
Anto01 93f796207f docs: Day 5 — extractor scope + stale follow-ups cleaned

Documents the LLM-assisted extractor's in-scope / out-of-scope
categories derived from the first live triage pass (16 promoted,
35 rejected). Five in-scope classes, six explicit out-of-scope
classes, trust model summary, multi-model future direction.

Cleaned up stale follow-up items in next-steps.md: rule expansion
marked deprioritized, LLM extractor marked done, retrieval harness
marked done with expansion pending.

Fixed docstring timeout (45s -> 90s) in extractor_llm.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-12 06:24:25 -04:00

12 KiB

AtoCore Next Steps

Current Position

AtoCore now has:

canonical runtime and machine storage on Dalidou
separated source and machine-data boundaries
initial self-knowledge ingested into the live instance
trusted project-state entries for AtoCore itself
a first read-only OpenClaw integration path on the T420
a first real active-project corpus batch for:
- p04-gigabit
- p05-interferometer
- p06-polisher

This working list should be read alongside:

master-plan-status.md

Immediate Next Steps

~~Re-run the backup/restore drill~~: DONE 2026-04-11, full pass
~~Turn on auto-capture of Claude Code sessions~~: DONE 2026-04-11. A Stop hook (deploy/hooks/capture_stop.py) sends each interaction to POST /interactions with reinforce=false; kill switch: ATOCORE_CAPTURE_DISABLED=1
2a. Run a short real-use pilot with auto-capture on
- verify interactions are landing in Dalidou
- check prompt/response quality and truncation
- confirm fail-open: no user-visible impact when Dalidou is down
Use the T420 atocore-context skill and the new organic routing layer in real OpenClaw workflows
- confirm auto-context feels natural
- confirm project inference is good enough in practice
- confirm the fail-open behavior remains acceptable in practice
Review retrieval quality after the first real project ingestion batch
- check whether the top hits are useful
- check whether trusted project state remains dominant
- reduce cross-project competition and prompt ambiguity where needed
- use debug-context to inspect the exact last AtoCore supplement
Treat the active-project full markdown/text wave as complete
- p04-gigabit
- p05-interferometer
- p06-polisher
Define a cleaner source refresh model
- make the difference between source truth, staged inputs, and machine store explicit
- move toward a project source registry and refresh workflow
- foundation now exists via project registry + per-project refresh API
- registration policy + template + proposal + approved registration are now the normal path for new projects
Move to Wave 2 trusted-operational ingestion
- curated dashboards
- decision logs
- milestone/current-status views
- operational truth, not just raw project notes
Integrate the new engineering architecture docs into active planning, not immediate schema code
- keep docs/architecture/engineering-knowledge-hybrid-architecture.md as the target layer model
- keep docs/architecture/engineering-ontology-v1.md as the V1 structured-domain target
- do not start entity/relationship persistence until the ingestion, retrieval, registry, and backup baseline feels boring and stable
Finish the boring operations baseline around backup
- retention policy cleanup script (snapshots dir grows monotonically today)
- off-Dalidou backup target (at minimum an rsync to laptop or another host so a single-disk failure isn't terminal)
- automatic post-backup validation (have create_runtime_backup call validate_backup on its own output and refuse to declare success if validation fails)
- DONE in commits be40994 / 0382238 / 3362080 / this one:
  - create_runtime_backup + list_runtime_backups + validate_backup + restore_runtime_backup with CLI
  - POST /admin/backup with include_chroma=true under the ingestion lock
  - /health build_sha / build_time / build_branch provenance
  - deploy.sh self-update re-exec guard + build_sha drift verification
  - live drill procedure in docs/backup-restore-procedure.md with failure-mode table and the memory_type=episodic marker pattern from the 2026-04-09 drill
Keep deeper automatic runtime integration modest until the organic read-only model has proven value
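The auto-capture item combines three requirements: post to POST /interactions with reinforce=false, honor ATOCORE_CAPTURE_DISABLED=1, and fail open when Dalidou is down. The following is a minimal standard-library sketch of that shape. It is not deploy/hooks/capture_stop.py: the payload keys (prompt, response, project), the hypothetical ATOCORE_URL variable and default URL, and the 3 s timeout are all assumptions.

```python
import http.client
import json
import os
import urllib.request
from typing import Optional

# Hypothetical default; the real hook's endpoint configuration may differ.
DEFAULT_BASE_URL = "http://localhost:8100"


def capture_interaction(prompt: str, response: str, project: Optional[str] = None,
                        base_url: Optional[str] = None, timeout_s: float = 3.0) -> bool:
    """POST one finished interaction to /interactions.

    Returns True only if the server answered 2xx. Never raises: capture is
    fail-open, so an unreachable Dalidou must not affect the user's session.
    """
    if os.environ.get("ATOCORE_CAPTURE_DISABLED") == "1":
        return False  # kill switch: skip capture entirely
    # ATOCORE_URL is a hypothetical override used only in this sketch.
    base = base_url or os.environ.get("ATOCORE_URL", DEFAULT_BASE_URL)
    payload = {
        "prompt": prompt,        # assumed key name
        "response": response,    # assumed key name
        "project": project,      # assumed key name
        "reinforce": False,      # matches the reinforce=false setting described above
    }
    try:
        req = urllib.request.Request(
            base.rstrip("/") + "/interactions",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError/timeouts are OSError subclasses; bad URLs raise ValueError.
        return False
```

Returning a boolean instead of raising keeps the hook's exit path the same whether or not the POST landed, which is what "no user-visible impact" requires.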
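The two open backup items (a retention cleanup, and validation that gates success) can be sketched together. This is a minimal sketch, not the existing create_runtime_backup / validate_backup code. It assumes snapshots are directories whose names sort chronologically and takes the backup and validation steps as injected callables. The names create_validated_backup, prune_snapshots, BackupValidationError, and keep_last are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable


class BackupValidationError(RuntimeError):
    """Raised when a freshly written snapshot fails validation."""


def create_validated_backup(backup_fn: Callable[[], Path],
                            validate_fn: Callable[[Path], bool]) -> Path:
    """Create a snapshot, then refuse to report success unless it validates."""
    snapshot = backup_fn()
    if not validate_fn(snapshot):
        raise BackupValidationError(f"snapshot failed validation: {snapshot}")
    return snapshot


def prune_snapshots(snapshots_dir: Path, keep_last: int = 7) -> list[Path]:
    """Delete all but the newest `keep_last` snapshot directories.

    Assumes directory names sort chronologically (e.g. 20260411T062425Z).
    Returns the paths that were removed.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be >= 1 so at least one snapshot survives")
    snapshots = sorted(p for p in snapshots_dir.iterdir() if p.is_dir())
    doomed = snapshots[:-keep_last]  # empty when there are <= keep_last snapshots
    for path in doomed:
        shutil.rmtree(path)
    return doomed
```

Raising instead of returning a flag means a caller cannot accidentally report success on a snapshot that failed validation.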

Trusted State Status

The first conservative trusted-state promotion pass is now complete for:

p04-gigabit
p05-interferometer
p06-polisher

Each project now has a small set of stable entries covering:

summary
architecture or boundary decision
key constraints
current next focus

This materially improves context/build quality for project-hinted prompts.

Recommended Near-Term Project Work

The active-project full markdown/text wave is now in.

The near-term work is:

strengthen retrieval quality
promote or refine trusted operational truth where the broad corpus is now too noisy
keep trusted project state concise and high-confidence
widen only through named ingestion waves

Recommended Next Wave Inputs

Wave 2 should emphasize trusted operational truth, not bulk historical notes.

P04:

current status dashboard
current selected design path
current frame interface truth
current next-step milestone view

P05:

selected vendor path
current error-budget baseline
current architecture freeze or open decisions
current procurement / next-action view

P06:

current system map
current shared contracts baseline
current calibration procedure truth
current July / proving roadmap view

Deferred On Purpose

automatic write-back from OpenClaw into AtoCore
automatic memory promotion
~~reflection loop integration~~: baseline landed 2026-04-11. The Stop hook runs reinforce automatically, project memories are folded into the context pack, and batch-extract and triage CLIs exist. What remains deferred: scheduled/automatic batch extraction. Extractor rule tuning (the rule-based extractor produced 1 candidate from 42 real captures) has since been deprioritized in favor of the LLM-assisted extractor.
replacing OpenClaw's own memory system
syncing the live machine DB between machines

Success Criteria For The Next Batch

The next batch is successful if:

OpenClaw can use AtoCore naturally when context is needed
OpenClaw can infer registered projects and call AtoCore organically for project-knowledge questions
the active-project full corpus wave can be inspected and used concretely through auto-context, context-build, and debug-context
OpenClaw can also register a new project cleanly before refreshing it
existing project registrations can be refined safely before refresh when the staged source set evolves
AtoCore answers correctly for the active project set
retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
trusted project state remains concise and high confidence
project ingestion remains controlled rather than noisy
the canonical Dalidou instance stays stable

Retrieval Quality Review — 2026-04-11

First sweep with real project-hinted queries on Dalidou. Used POST /context/build against p04, p05, p06 with representative questions and inspected formatted_context.

Findings:

Trusted Project State is surfacing correctly. The DECISION and REQUIREMENT categories appear at the top of the pack and include the expected key facts (e.g. p04 "Option B conical-back mirror architecture"). This is the strongest signal in the pack today.
Chunk retrieval is on-topic but broad. Top chunks for the p04 architecture query are the PDR intro, the CAD assembly overview, and the index: all on the right project, but none directly answers the "why was Option B chosen" question. The authoritative answer sits in Project State, not in the chunks.
Active memories are NOT reaching the pack. The context builder surfaces Trusted Project State and retrieved chunks but does not include the 21 active project/knowledge memories. Reinforcement (Phase 9 Commit B) bumps memory confidence without the memory ever being read back into a prompt — the reflection loop has no outlet on the retrieval side. This is a design gap, not a bug: needs a decision on whether memories should feed into context assembly, and if so at what trust level (below project_state, above chunks).
Cross-project bleed is low. The p04 query did pull one p05 chunk (CGH_Design_Input_for_AOM) as the bottom hit, but the top four hits were all p04.
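The trust ordering the memory finding proposes (trusted project state first, memories below it but above chunks), combined with the 0.25 memory budget ratio and canonical-project gating recorded in the follow-ups, can be sketched as a character-budgeted assembler. This is an illustrative sketch, not the AtoCore context builder: assemble_context, the band headers other than --- Project Memories ---, the total budget, and the greedy whole-item packing are assumptions, and the identity/preference band is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    project: str
    text: str


def _take(items: list[str], budget: int) -> tuple[list[str], int]:
    """Greedily keep whole items, in order, until the next one would exceed `budget`."""
    kept, used = [], 0
    for item in items:
        if used + len(item) > budget:
            break  # an oversized first item starves the whole band
        kept.append(item)
        used += len(item)
    return kept, used


def assemble_context(project_state: list[str], memories: list[Memory], chunks: list[str],
                     project_hint: Optional[str], total_budget: int = 4000,
                     memory_ratio: float = 0.25) -> str:
    """Build a pack in trust order: project state, then project memories, then chunks.

    Budgets count item text only, not headers or separators.
    """
    parts: list[str] = []
    remaining = total_budget

    state, used = _take(project_state, remaining)
    remaining -= used
    if state:
        parts.append("--- Trusted Project State ---\n" + "\n".join(state))

    # Memories appear only with a canonical project hint, and only for that
    # project, so one project's memories cannot bleed into another's prompt.
    if project_hint is not None:
        relevant = [m.text for m in memories if m.project == project_hint]
        mem_budget = min(remaining, int(total_budget * memory_ratio))
        mem, used = _take(relevant, mem_budget)
        remaining -= used
        if mem:
            parts.append("--- Project Memories ---\n" + "\n".join(mem))

    kept_chunks, _ = _take(chunks, remaining)
    if kept_chunks:
        parts.append("--- Retrieved Chunks ---\n" + "\n".join(kept_chunks))
    return "\n\n".join(parts)
```

The greedy stop-at-first-misfit rule reproduces the failure mode noted in the follow-ups: if the memory band's budget is even one character short of the first memory, the band is empty.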

Proposed follow-ups (not yet scheduled):

~~Decide whether memories should be folded into formatted_context and under what section header.~~ DONE 2026-04-11 (commits 8ea53f4, 5913da5, 1161645). A --- Project Memories --- band now sits between identity/preference and retrieved chunks, gated on a canonical project hint to prevent cross-project bleed. The budget ratio is 0.25, tuned empirically: paragraph memories are ~400 chars, and the earlier 0.15 ratio starved the first entry by one char. Verified live: the p04 architecture query surfaces the Option B memory.
Re-run the same three queries after any builder change and compare formatted_context diffs. Still open; this is the natural entry point for the retrieval eval harness on the roadmap.
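Re-running the three queries and diffing formatted_context needs only a saved baseline per query plus a unified diff. A minimal standard-library sketch follows; fetching the text (for example from a saved POST /context/build response) is left out, and diff_formatted_context and diff_all are hypothetical helper names.

```python
import difflib


def diff_formatted_context(before: str, after: str, label: str = "query") -> str:
    """Return a unified diff of two formatted_context strings ('' if identical)."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{label} (before)",
        tofile=f"{label} (after)",
    )
    return "".join(lines)


def diff_all(baseline: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Diff every query present in both runs; keep only queries whose pack changed."""
    changed = {}
    for query in sorted(baseline.keys() & current.keys()):
        d = diff_formatted_context(baseline[query], current[query], label=query)
        if d:
            changed[query] = d
    return changed
```

Keeping only changed queries means an unchanged builder produces an empty report, so any output is worth reading.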

Reflection Loop Live Check — 2026-04-11

First real run of batch-extract across 42 captured Claude Code interactions on Dalidou produced exactly 1 candidate, and that candidate was a synthetic test capture from earlier in the session (rejected). Finding:

The rule-based extractor in src/atocore/memory/extractor.py keys on explicit structural cues (decision headings like ## Decision: ..., preference sentences, etc.). Real Claude Code responses are conversational and almost never contain those cues.
This means the capture → extract half of the reflection loop is effectively inert against organic LLM sessions until either the rules are broadened (new cue families: "we chose X because...", "the selected approach is...", etc.) or an LLM-assisted extraction path is added alongside the rule-based one.
Capture → reinforce is working correctly on live data (length-aware matcher verified on live paraphrase of a p04 memory).
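To make the cue mismatch concrete: a structural rule keys on something like a ## Decision: heading, while conversational responses state the same commitment in prose. The patterns below are illustrative assumptions modeled on the examples in this section, not the rules in src/atocore/memory/extractor.py; the sketch places one structural cue family next to the proposed conversational ones.

```python
import re

# Structural cue: an explicit markdown decision heading.
STRUCTURAL_CUES = [
    re.compile(r"^##\s*Decision:\s*(?P<text>.+)$", re.MULTILINE),
]

# Conversational cue families, modeled on the phrasings listed above.
CONVERSATIONAL_CUES = [
    re.compile(r"\bwe chose (?P<text>[^.]+?) because\b", re.IGNORECASE),
    re.compile(r"\bthe selected approach is (?P<text>[^.]+)", re.IGNORECASE),
]


def extract_cues(text: str, patterns: list[re.Pattern]) -> list[str]:
    """Return the captured `text` group of every match, in pattern order."""
    hits = []
    for pattern in patterns:
        hits.extend(m.group("text").strip() for m in pattern.finditer(text))
    return hits
```

Each new phrasing needs its own pattern, which suggests why broadening rules scales poorly against free-form LLM responses.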

Follow-up candidates:

~~Extractor rule expansion~~ — Day 2 baseline showed 0% recall across 5 distinct miss classes; rule expansion cannot close a 5-way miss. Deprioritized.
~~LLM-assisted extractor~~ — DONE 2026-04-12. extractor_llm.py shells out to claude -p (Haiku, OAuth, no API key). First live run: 100% recall, 2.55 yield/interaction on a 20-interaction labeled set. First triage: 51 candidates → 16 promoted, 35 rejected (31% accept rate). Active memories 20 → 36.
~~Retrieval eval harness~~ — DONE 2026-04-11 (scripts/retrieval_eval.py, 6/6 passing). Expansion to 15-20 fixtures is mini-phase Day 6.
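The LLM-assisted path shells out to claude -p instead of calling an API, and the commit note for this document records a 90 s timeout. The following is a minimal fail-soft subprocess wrapper. The JSON-array output contract, the content key, and the function name are assumptions, not the format used by extractor_llm.py; model selection (Haiku) and prompt construction are omitted.

```python
import json
import subprocess
from typing import Sequence


def run_llm_extractor(prompt: str, cmd: Sequence[str] = ("claude", "-p"),
                      timeout_s: float = 90.0) -> list[dict]:
    """Run the extractor CLI once and parse stdout as a JSON list of candidates.

    Fail-soft: a missing binary, timeout, non-zero exit, or unparseable output
    all yield [], so a bad run adds no candidates.
    """
    try:
        proc = subprocess.run(
            [*cmd, prompt],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if proc.returncode != 0:
        return []
    try:
        data = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    # Keep only well-formed items and force status=candidate, whatever the model said.
    return [
        {**item, "status": "candidate"}
        for item in data
        if isinstance(item, dict) and isinstance(item.get("content"), str)
    ]
```

Injecting cmd keeps the wrapper testable with a stand-in process and avoids hard-coding how the CLI is located.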

Extractor Scope — 2026-04-12

What the LLM-assisted extractor (src/atocore/memory/extractor_llm.py) extracts from conversational Claude Code captures:

In scope:

Architectural commitments (e.g. "Z-axis is engage/retract, not continuous position")
Ratified decisions with project scope (e.g. "USB SSD mandatory on RPi for telemetry storage")
Durable engineering facts (e.g. "telemetry data rate ~29 MB/hour")
Working rules and adaptation patterns (e.g. "extraction stays off the capture hot path")
Interface invariants (e.g. "controller-job.v1 in, run-log.v1 out; no firmware change needed")

Out of scope (intentionally rejected by triage):

Transient roadmap / plan steps that will be stale in a week
Operational instructions ("run this command to deploy")
Process rules that live in DEV-LEDGER.md / AGENTS.md, not in memory
Implementation details that are too granular (individual field names when the parent concept is already captured)
Already-fixed review findings (P1/P2 that no longer apply)
Duplicates of existing active memories with wrong project tags

Trust model:

Extraction stays off the capture hot path (batch / manual only)
All candidates land as status=candidate, never auto-promoted
Human or auto-triage reviews before promotion to active
Future direction: multi-model extraction + triage (Codex/Gemini as second-pass reviewers for robustness against single-model bias)
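The trust model reads as a small status machine: extraction can only create candidates, and only a review step can move them. A minimal sketch under that reading; the rejected status, the transition table, MemoryRecord, and the reviewer argument are assumptions, since this section names only candidate and active.

```python
from dataclasses import dataclass, field

# Allowed status transitions; active and rejected are terminal in this sketch.
TRANSITIONS = {
    "candidate": {"active", "rejected"},
    "active": set(),
    "rejected": set(),
}


@dataclass
class MemoryRecord:
    content: str
    project: str
    status: str = "candidate"
    history: list[str] = field(default_factory=list)


def new_candidate(content: str, project: str) -> MemoryRecord:
    """Everything an extractor produces starts as a candidate, never active."""
    return MemoryRecord(content=content, project=project)


def review(record: MemoryRecord, decision: str, reviewer: str) -> MemoryRecord:
    """Apply a triage decision (human or auto-triage); reject illegal moves."""
    if decision not in TRANSITIONS[record.status]:
        raise ValueError(f"cannot move {record.status!r} -> {decision!r}")
    record.history.append(f"{record.status}->{decision} by {reviewer}")
    record.status = decision
    return record
```

Because new_candidate never accepts a status argument, an extractor has no code path to create an active memory directly.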

Long-Run Goal

The long-run target is:

continue working normally inside PKM project stacks and Gitea repos
let OpenClaw keep its own memory and runtime behavior
let AtoCore supplement LLM work with stronger trusted context, retrieval, and context assembly

That means AtoCore should behave like a durable external context engine and machine-memory layer, not a replacement for normal repo work or OpenClaw memory.