AtoCore Next Steps
Current Position
AtoCore now has:
- canonical runtime and machine storage on Dalidou
- separated source and machine-data boundaries
- initial self-knowledge ingested into the live instance
- trusted project-state entries for AtoCore itself
- a first read-only OpenClaw integration path on the T420
- a first real active-project corpus batch for:
  - p04-gigabit
  - p05-interferometer
  - p06-polisher
This working list should be read alongside:
Immediate Next Steps
- Re-run the backup/restore drill: DONE 2026-04-11, full pass
- Turn on auto-capture of Claude Code sessions: DONE 2026-04-11, Stop hook via `deploy/hooks/capture_stop.py` → `POST /interactions` with `reinforce=false`; kill switch: `ATOCORE_CAPTURE_DISABLED=1`
- Run a short real-use pilot with auto-capture on
  - verify interactions are landing in Dalidou
  - check prompt/response quality and truncation
  - confirm fail-open: no user-visible impact when Dalidou is down
- Use the T420 `atocore-context` skill and the new organic routing layer in real OpenClaw workflows
  - confirm `auto-context` feels natural
  - confirm project inference is good enough in practice
  - confirm the fail-open behavior remains acceptable in practice
- Review retrieval quality after the first real project ingestion batch
  - check whether the top hits are useful
  - check whether trusted project state remains dominant
  - reduce cross-project competition and prompt ambiguity where needed
  - use `debug-context` to inspect the exact last AtoCore supplement
- Treat the active-project full markdown/text wave as complete
  - p04-gigabit
  - p05-interferometer
  - p06-polisher
- Define a cleaner source refresh model
  - make the difference between source truth, staged inputs, and machine store explicit
  - move toward a project source registry and refresh workflow
  - foundation now exists via project registry + per-project refresh API
  - registration policy + template + proposal + approved registration are now the normal path for new projects
- Move to Wave 2 trusted-operational ingestion
  - curated dashboards
  - decision logs
  - milestone/current-status views
  - operational truth, not just raw project notes
- Integrate the new engineering architecture docs into active planning, not immediate schema code
  - keep `docs/architecture/engineering-knowledge-hybrid-architecture.md` as the target layer model
  - keep `docs/architecture/engineering-ontology-v1.md` as the V1 structured-domain target
  - do not start entity/relationship persistence until the ingestion, retrieval, registry, and backup baseline feels boring and stable
- Finish the boring operations baseline around backup
  - retention policy cleanup script (snapshots dir grows monotonically today)
  - off-Dalidou backup target (at minimum an rsync to laptop or another host so a single-disk failure isn't terminal)
  - automatic post-backup validation (have `create_runtime_backup` call `validate_backup` on its own output and refuse to declare success if validation fails)
  - DONE in commits `be40994` / `0382238` / `3362080` / this one:
    - `create_runtime_backup` + `list_runtime_backups` + `validate_backup` + `restore_runtime_backup` with CLI
    - `POST /admin/backup` with `include_chroma=true` under the ingestion lock
    - `/health` build_sha / build_time / build_branch provenance
    - `deploy.sh` self-update re-exec guard + build_sha drift verification
    - live drill procedure in `docs/backup-restore-procedure.md` with failure-mode table and the `memory_type=episodic` marker pattern from the 2026-04-09 drill
- Keep deeper automatic runtime integration modest until the organic read-only model has proven value
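The auto-capture item above comes down to a few rules. The hook honours the kill switch, posts with `reinforce=false`, and never lets a Dalidou outage reach the user. Below is a minimal sketch of that fail-open shape. It assumes a JSON POST to `/interactions`. The URL, host, port and payload field names, and the `capture` / `_post_json` helpers, are hypothetical and are not taken from the real `deploy/hooks/capture_stop.py`.

```python
import json
import os
import urllib.request
from typing import Callable

# Placeholder endpoint: host and port are assumptions, not AtoCore's config.
DEFAULT_URL = "http://dalidou.example:8100/interactions"


def _post_json(url: str, payload: dict, timeout: float) -> int:
    """POST a JSON body and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def capture(
    prompt: str,
    response: str,
    url: str = DEFAULT_URL,
    post: Callable[[str, dict, float], int] = _post_json,
    timeout: float = 5.0,
) -> str:
    """Send one interaction to AtoCore without ever raising (fail-open)."""
    if os.environ.get("ATOCORE_CAPTURE_DISABLED") == "1":
        return "disabled"  # kill switch: do nothing at all
    payload = {
        "prompt": prompt,      # hypothetical field names
        "response": response,
        "reinforce": False,    # capture only, as in the Stop hook above
    }
    try:
        status = post(url, payload, timeout)
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are all OSError subclasses:
        # swallow them so a down Dalidou never surfaces to the user.
        return "skipped"
    return "captured" if 200 <= status < 300 else "skipped"
```

The `post` parameter exists only so the fail-open paths can be exercised without a live server.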
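The post-backup validation item reduces to one rule: back up, validate the output, and refuse to report success otherwise. Below is a minimal sketch under loud assumptions:

- the machine store is a single SQLite file;
- writes are paused during the copy (the list mentions the ingestion lock);
- Chroma data is out of scope.

`backup_and_validate`, `validate_snapshot` and `BackupValidationError` are hypothetical names, not the real `create_runtime_backup` / `validate_backup`.

```python
import sqlite3
import time
from pathlib import Path


class BackupValidationError(RuntimeError):
    """Raised when a fresh snapshot fails its own validation."""


def validate_snapshot(snapshot: Path, expected_counts: dict) -> None:
    """Check SQLite integrity and that per-table row counts match the source."""
    conn = sqlite3.connect(snapshot)
    try:
        (result,) = conn.execute("PRAGMA integrity_check").fetchone()
        if result != "ok":
            raise BackupValidationError(f"integrity_check: {result}")
        for table, expected in expected_counts.items():
            (got,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            if got != expected:
                raise BackupValidationError(f"{table}: {got} rows, expected {expected}")
    finally:
        conn.close()


def backup_and_validate(db_path: Path, snapshot_dir: Path) -> Path:
    """Snapshot the DB, validate the copy, and only then report success."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    snapshot = snapshot_dir / f"atocore-{time.strftime('%Y%m%dT%H%M%S')}.sqlite3"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(snapshot)
    try:
        # Row counts are taken just before the copy; this assumes no concurrent
        # writes (e.g. the backup runs under an ingestion lock).
        tables = [r[0] for r in src.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        counts = {t: src.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
                  for t in tables}
        src.backup(dst)  # consistent copy via SQLite's online backup API
    finally:
        dst.close()
        src.close()
    try:
        validate_snapshot(snapshot, counts)
    except BackupValidationError:
        snapshot.unlink(missing_ok=True)  # never leave a bad snapshot looking valid
        raise
    return snapshot
```

Deleting a snapshot that fails validation means nothing left on disk can be mistaken for a good backup.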
Trusted State Status
The first conservative trusted-state promotion pass is now complete for:
- p04-gigabit
- p05-interferometer
- p06-polisher
Each project now has a small set of stable entries covering:
- summary
- architecture or boundary decision
- key constraints
- current next focus
This materially improves context/build quality for project-hinted prompts.
Recommended Near-Term Project Work
The active-project full markdown/text wave is now in.
The near-term work is now:
- strengthen retrieval quality
- promote or refine trusted operational truth where the broad corpus is now too noisy
- keep trusted project state concise and high-confidence
- widen only through named ingestion waves
Recommended Next Wave Inputs
Wave 2 should emphasize trusted operational truth, not bulk historical notes.
P04:
- current status dashboard
- current selected design path
- current frame interface truth
- current next-step milestone view
P05:
- selected vendor path
- current error-budget baseline
- current architecture freeze or open decisions
- current procurement / next-action view
P06:
- current system map
- current shared contracts baseline
- current calibration procedure truth
- current July / proving roadmap view
Deferred On Purpose
- automatic write-back from OpenClaw into AtoCore
- automatic memory promotion
- reflection loop integration: baseline landed 2026-04-11 (Stop hook runs reinforce automatically, project memories are folded into the context pack, batch-extract and triage CLIs exist). Still deferred: scheduled/automatic batch extraction and extractor rule tuning (the rule-based extractor produced 1 candidate from 42 real captures and needs new cues for conversational LLM content).
- replacing OpenClaw's own memory system
- syncing the live machine DB between machines
Success Criteria For The Next Batch
The next batch is successful if:
- OpenClaw can use AtoCore naturally when context is needed
- OpenClaw can infer registered projects and call AtoCore organically for project-knowledge questions
- the active-project full corpus wave can be inspected and used concretely through `auto-context`, `context-build`, and `debug-context`
- OpenClaw can also register a new project cleanly before refreshing it
- existing project registrations can be refined safely before refresh when the staged source set evolves
- AtoCore answers correctly for the active project set
- retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
- trusted project state remains concise and high confidence
- project ingestion remains controlled rather than noisy
- the canonical Dalidou instance stays stable
Retrieval Quality Review — 2026-04-11
First sweep with real project-hinted queries on Dalidou. Used
`POST /context/build` against p04, p05, p06 with representative
questions and inspected `formatted_context`.
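The review above (fixed project-hinted queries, then inspect `formatted_context`) can be made repeatable with a small fixture check. Below is a sketch that assumes a `build_context(project, query)` callable returning the formatted context string, for example a thin wrapper around `POST /context/build`. The `Fixture` fields, `run_fixtures`, and the idea of matching expected snippets are illustrative assumptions, not the contents of `scripts/retrieval_eval.py`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Fixture:
    project: str                   # canonical project hint, e.g. "p04-gigabit"
    query: str                     # representative question
    must_contain: Tuple[str, ...]  # snippets expected in formatted_context


def run_fixtures(
    fixtures: List[Fixture],
    build_context: Callable[[str, str], str],
) -> Tuple[int, List[str]]:
    """Return (passed count, failure messages) for a fixture set."""
    passed, failures = 0, []
    for fx in fixtures:
        context = build_context(fx.project, fx.query)
        missing = [s for s in fx.must_contain if s not in context]
        if missing:
            failures.append(f"{fx.project}: {fx.query!r} missing {missing}")
        else:
            passed += 1
    return passed, failures
```

Substring checks are crude, but they are cheap to write per query and catch the main regression: the authoritative fact dropping out of the pack.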
Findings:
- Trusted Project State is surfacing correctly. The DECISION and REQUIREMENT categories appear at the top of the pack and include the expected key facts (e.g. p04 "Option B conical-back mirror architecture"). This is the strongest signal in the pack today.
- Chunk retrieval is relevant on-topic but broad. Top chunks for the p04 architecture query are PDR intro, CAD assembly overview, and the index — all on the right project but none of them directly answer the "why was Option B chosen" question. The authoritative answer sits in Project State, not in the chunks.
- Active memories are NOT reaching the pack. The context builder surfaces Trusted Project State and retrieved chunks but does not include the 21 active project/knowledge memories. Reinforcement (Phase 9 Commit B) bumps memory confidence without the memory ever being read back into a prompt — the reflection loop has no outlet on the retrieval side. This is a design gap, not a bug: needs a decision on whether memories should feed into context assembly, and if so at what trust level (below project_state, above chunks).
- Cross-project bleed is low. The p04 query did pull one p05 chunk (CGH_Design_Input_for_AOM) as the bottom hit but the top-4 were all p04.
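Cross-project bleed can be tracked as a number rather than judged by eye. The sketch below assumes each retrieved chunk carries the project id it was ingested under; `bleed_rate` is a hypothetical helper. It also assumes the p04 query returned five hits, which the finding above suggests (four p04 hits on top, one p05 hit at the bottom). Under that assumption the rate is 1/5 over all hits and 0 over the top four.

```python
from typing import Sequence


def bleed_rate(hit_projects: Sequence[str], hinted_project: str, top_k: int = 0) -> float:
    """Fraction of the top-k retrieved chunks that belong to another project.

    top_k=0 means "use all hits".
    """
    hits = list(hit_projects)[:top_k] if top_k else list(hit_projects)
    if not hits:
        return 0.0
    off_project = sum(1 for p in hits if p != hinted_project)
    return off_project / len(hits)


hits = ["p04-gigabit"] * 4 + ["p05-interferometer"]
print(bleed_rate(hits, "p04-gigabit"))           # 0.2
print(bleed_rate(hits, "p04-gigabit", top_k=4))  # 0.0
```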
Proposed follow-ups (not yet scheduled):
- Decide whether memories should be folded into `formatted_context` and under what section header: DONE 2026-04-11 (commits `8ea53f4`, `5913da5`, `1161645`). A `--- Project Memories ---` band now sits between identity/preference and retrieved chunks, gated on a canonical project hint to prevent cross-project bleed. The budget ratio is 0.25, tuned empirically: paragraph memories are ~400 chars, and the earlier 0.15 ratio starved the first entry by one char. Verified live: the p04 architecture query surfaces the Option B memory.
- Re-run the same three queries after any builder change and compare `formatted_context` diffs: still open, and the natural entry point for the retrieval eval harness on the roadmap.
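The memory band can be sketched as follows. It sits after the leading sections and before retrieved chunks, appears only for a canonical project hint, and receives a fraction of the budget. This is an illustrative assumption, not the real builder. The following are all hypothetical:

- the character budget;
- whole-entry packing in rank order;
- not counting headers and newlines;
- `assemble_context`, `_pack` and `CANONICAL_PROJECTS`.

Only the `--- Project Memories ---` header and the 0.25 ratio come from the follow-up above.

```python
from typing import List, Optional

# Hypothetical registry of canonical project ids.
CANONICAL_PROJECTS = {"p04-gigabit", "p05-interferometer", "p06-polisher"}


def _pack(entries: List[str], budget: int) -> List[str]:
    """Take whole entries in rank order until the next one would not fit."""
    out, used = [], 0
    for entry in entries:
        if used + len(entry) > budget:
            break
        out.append(entry)
        used += len(entry)
    return out


def assemble_context(
    leading: List[str],      # already-formatted top sections (e.g. project state)
    memories: List[str],
    chunks: List[str],
    budget: int,
    project_hint: Optional[str],
    memory_ratio: float = 0.25,
) -> str:
    parts = list(leading)
    remaining = budget - sum(len(p) for p in parts)
    if project_hint in CANONICAL_PROJECTS and memories:
        # Memories only for a canonical hint, to avoid cross-project bleed.
        band = _pack(memories, int(budget * memory_ratio))
        if band:
            parts.append("--- Project Memories ---")
            parts.extend(band)
            remaining -= sum(len(m) for m in band)
    # Band header and newlines are not counted against the budget, for simplicity.
    parts.extend(_pack(chunks, max(remaining, 0)))
    return "\n".join(parts)
```

Whole-entry packing is what makes the ratio so sensitive. With a 2000-character budget, a 0.15 ratio leaves 300 characters, so a single 400-character memory drops out entirely. At 0.25 the band gets 500 characters and the memory fits.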
Reflection Loop Live Check — 2026-04-11
First real run of batch-extract across 42 captured Claude Code
interactions on Dalidou produced exactly 1 candidate, and that
candidate was a synthetic test capture from earlier in the session
(rejected). Finding:
- The rule-based extractor in `src/atocore/memory/extractor.py` keys on explicit structural cues (decision headings like `## Decision: ...`, preference sentences, etc.). Real Claude Code responses are conversational and almost never contain those cues.
- This means the capture → extract half of the reflection loop is effectively inert against organic LLM sessions until either the rules are broadened (new cue families: "we chose X because...", "the selected approach is...", etc.) or an LLM-assisted extraction path is added alongside the rule-based one.
- Capture → reinforce is working correctly on live data (length-aware matcher verified on live paraphrase of a p04 memory).
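The gap between structural cues and conversational text is easy to see with two toy rule sets. This is a sketch only. The patterns are illustrative, not the rules in `src/atocore/memory/extractor.py`. The "conversational" set uses the two cue families named above, and `extract` is a hypothetical helper.

```python
import re

# Structural cue: an explicit decision heading (illustrative pattern).
STRUCTURAL = [re.compile(r"^##\s*Decision:\s*(?P<body>.+)$", re.MULTILINE)]

# Broadened conversational cue families, as proposed above (illustrative).
CONVERSATIONAL = [
    re.compile(r"\bwe chose (?P<body>[^.]+? because [^.]+)\.", re.IGNORECASE),
    re.compile(r"\bthe selected approach is (?P<body>[^.]+)\.", re.IGNORECASE),
]


def extract(text: str, rules: list) -> list:
    """Return the captured body of every rule match, in rule order."""
    return [m.group("body").strip() for rule in rules for m in rule.finditer(text)]


reply = (
    "Looking at the trade-offs, we chose the conical back because it "
    "keeps the frame interface unchanged. The selected approach is a "
    "USB SSD for telemetry storage."
)
print(extract(reply, STRUCTURAL))      # []
print(extract(reply, CONVERSATIONAL))  # two candidate bodies
```

Each new cue family only covers phrasings someone anticipated, which is why rule expansion scales poorly against open-ended conversational output.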
Follow-up candidates:
- Extractor rule expansion: deprioritized. The Day 2 baseline showed 0% recall across 5 distinct miss classes, and rule expansion cannot close a 5-way miss.
- LLM-assisted extractor: DONE 2026-04-12. `extractor_llm.py` shells out to `claude -p` (Haiku, OAuth, no API key). First live run: 100% recall and a yield of 2.55 per interaction on a 20-interaction labeled set. First triage: 51 candidates → 16 promoted, 35 rejected (31% accept rate). Active memories went from 20 to 36.
- Retrieval eval harness: DONE 2026-04-11 (`scripts/retrieval_eval.py`, 6/6 passing). Expansion to 15-20 fixtures is mini-phase Day 6.
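The LLM-assisted path shells out to the `claude` CLI in print mode (`-p`) instead of calling an API directly. Below is a minimal sketch of that pattern. It assumes the model is asked to answer with one JSON object per line. The prompt text, the JSON shape, the 90-second timeout and `llm_extract` are all assumptions, not the contents of `extractor_llm.py`. Model selection is left out.

```python
import json
import subprocess
from typing import Callable, List

# Hypothetical prompt; the real extractor's instructions are not reproduced here.
PROMPT = (
    "Extract durable engineering memories from the interaction below. "
    "Reply with one JSON object per line: "
    '{"type": ..., "content": ..., "project": ...}\n\n'
)


def llm_extract(
    interaction_text: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout_s: int = 90,
) -> List[dict]:
    """Run `claude -p` once and parse its output into candidate dicts."""
    cmd = ["claude", "-p", PROMPT + interaction_text]
    try:
        proc = run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return []  # CLI missing or too slow: skip this interaction, keep the batch going
    if proc.returncode != 0:
        return []
    candidates = []
    for line in proc.stdout.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # ignore any prose around the JSON lines
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(item, dict) and item.get("content"):
            item["status"] = "candidate"  # never auto-promoted
            candidates.append(item)
    return candidates
```

Injecting `run` keeps the parser testable without the CLI installed. Returning an empty list on failure suits a batch job, where one bad interaction should not abort the run.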
Extractor Scope — 2026-04-12
What the LLM-assisted extractor (`src/atocore/memory/extractor_llm.py`)
extracts from conversational Claude Code captures:
In scope:
- Architectural commitments (e.g. "Z-axis is engage/retract, not continuous position")
- Ratified decisions with project scope (e.g. "USB SSD mandatory on RPi for telemetry storage")
- Durable engineering facts (e.g. "telemetry data rate ~29 MB/hour")
- Working rules and adaptation patterns (e.g. "extraction stays off the capture hot path")
- Interface invariants (e.g. "controller-job.v1 in, run-log.v1 out; no firmware change needed")
Out of scope (intentionally rejected by triage):
- Transient roadmap / plan steps that will be stale in a week
- Operational instructions ("run this command to deploy")
- Process rules that live in DEV-LEDGER.md / AGENTS.md, not in memory
- Implementation details that are too granular (individual field names when the parent concept is already captured)
- Already-fixed review findings (P1/P2 that no longer apply)
- Duplicates of existing active memories with wrong project tags
Trust model:
- Extraction stays off the capture hot path (batch / manual only)
- All candidates land as `status=candidate`, never auto-promoted
- Human or auto-triage reviews before promotion to active
- Future direction: multi-model extraction + triage (Codex/Gemini as second-pass reviewers for robustness against single-model bias)
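The trust model amounts to a small state machine. Extraction can only create `candidate` records, and only an explicit triage step moves them to `active` or to a rejected state. Below is a sketch with hypothetical names that is not AtoCore's actual schema. The hypothetical names are `Memory`, `triage`, the `rejected` status and the reason codes; the reason codes paraphrase the out-of-scope list above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reason codes paraphrasing the out-of-scope classes above.
OUT_OF_SCOPE = {
    "transient_plan",
    "operational_instruction",
    "process_rule",
    "too_granular",
    "already_fixed_finding",
    "duplicate",
}


@dataclass
class Memory:
    content: str
    project: str
    status: str = "candidate"  # extraction can only ever create candidates
    reject_reason: Optional[str] = None


def triage(mem: Memory, accept: bool, reason: Optional[str] = None) -> Memory:
    """Human or auto-triage decision: the only path from candidate to active."""
    if mem.status != "candidate":
        raise ValueError(f"already triaged: {mem.status}")
    if accept:
        mem.status = "active"
    else:
        if reason not in OUT_OF_SCOPE:
            raise ValueError(f"unknown rejection reason: {reason!r}")
        mem.status = "rejected"
        mem.reject_reason = reason
    return mem
```

Requiring a named reason on rejection keeps triage outcomes countable per out-of-scope class. That would also help a future second-pass reviewer model compare its verdicts against the first pass.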
Long-Run Goal
The long-run target is:
- continue working normally inside PKM project stacks and Gitea repos
- let OpenClaw keep its own memory and runtime behavior
- let AtoCore supplement LLM work with stronger trusted context, retrieval, and context assembly
That means AtoCore should behave like a durable external context engine and machine-memory layer, not a replacement for normal repo work or OpenClaw memory.