Documents the LLM-assisted extractor's in-scope / out-of-scope categories derived from the first live triage pass (16 promoted, 35 rejected). Five in-scope classes, six explicit out-of-scope classes, trust model summary, multi-model future direction. Cleaned up stale follow-up items in next-steps.md: rule expansion marked deprioritized, LLM extractor marked done, retrieval harness marked done with expansion pending. Fixed docstring timeout (45s -> 90s) in extractor_llm.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# AtoCore Next Steps

## Current Position

AtoCore now has:

- canonical runtime and machine storage on Dalidou
- separated source and machine-data boundaries
- initial self-knowledge ingested into the live instance
- trusted project-state entries for AtoCore itself
- a first read-only OpenClaw integration path on the T420
- a first real active-project corpus batch for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

This working list should be read alongside:

- [master-plan-status.md](C:/Users/antoi/ATOCore/docs/master-plan-status.md)

## Immediate Next Steps

1. ~~Re-run the backup/restore drill~~ - DONE 2026-04-11, full pass
2. ~~Turn on auto-capture of Claude Code sessions~~ - DONE 2026-04-11,
   Stop hook via `deploy/hooks/capture_stop.py` → `POST /interactions`
   with `reinforce=false`; kill switch: `ATOCORE_CAPTURE_DISABLED=1`
   - 2a. Run a short real-use pilot with auto-capture on
     - verify interactions are landing in Dalidou
     - check prompt/response quality and truncation
     - confirm fail-open: no user-visible impact when Dalidou is down
3. Use the T420 `atocore-context` skill and the new organic routing layer in
   real OpenClaw workflows
   - confirm `auto-context` feels natural
   - confirm project inference is good enough in practice
   - confirm the fail-open behavior remains acceptable in practice
4. Review retrieval quality after the first real project ingestion batch
   - check whether the top hits are useful
   - check whether trusted project state remains dominant
   - reduce cross-project competition and prompt ambiguity where needed
   - use `debug-context` to inspect the exact last AtoCore supplement
5. Treat the active-project full markdown/text wave as complete
   - `p04-gigabit`
   - `p05-interferometer`
   - `p06-polisher`
6. Define a cleaner source refresh model
   - make the difference between source truth, staged inputs, and machine store
     explicit
   - move toward a project source registry and refresh workflow
   - foundation now exists via project registry + per-project refresh API
   - registration policy + template + proposal + approved registration are now
     the normal path for new projects
7. Move to Wave 2 trusted-operational ingestion
   - curated dashboards
   - decision logs
   - milestone/current-status views
   - operational truth, not just raw project notes
8. Integrate the new engineering architecture docs into active planning, not immediate schema code
   - keep `docs/architecture/engineering-knowledge-hybrid-architecture.md` as the target layer model
   - keep `docs/architecture/engineering-ontology-v1.md` as the V1 structured-domain target
   - do not start entity/relationship persistence until the ingestion, retrieval, registry, and backup baseline feels boring and stable
9. Finish the boring operations baseline around backup
   - retention policy cleanup script (snapshots dir grows
     monotonically today)
   - off-Dalidou backup target (at minimum an rsync to laptop or
     another host so a single-disk failure isn't terminal)
   - automatic post-backup validation (have `create_runtime_backup`
     call `validate_backup` on its own output and refuse to
     declare success if validation fails)
   - DONE in commits be40994 / 0382238 / 3362080 / this one:
     - `create_runtime_backup` + `list_runtime_backups` +
       `validate_backup` + `restore_runtime_backup` with CLI
     - `POST /admin/backup` with `include_chroma=true` under
       the ingestion lock
     - `/health` build_sha / build_time / build_branch provenance
     - `deploy.sh` self-update re-exec guard + build_sha drift
       verification
     - live drill procedure in `docs/backup-restore-procedure.md`
       with failure-mode table and the memory_type=episodic
       marker pattern from the 2026-04-09 drill
10. Keep deeper automatic runtime integration modest until the organic read-only
    model has proven value

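The capture behavior in item 2 and its pilot checks (post to `POST /interactions` with `reinforce=false`, honor the `ATOCORE_CAPTURE_DISABLED=1` kill switch, fail open when Dalidou is down) can be sketched roughly as follows. This is an illustration only, not the contents of `deploy/hooks/capture_stop.py`: the function name, the `prompt` / `response` payload fields, and the timeout are assumptions.

```python
import json
import os
import urllib.request


def capture_interaction(base_url, prompt, response, timeout=2.0):
    """Post one interaction to AtoCore without ever raising (fail-open).

    Returns True if the POST succeeded, False if capture was skipped or failed.
    The `prompt`/`response` field names and the timeout are hypothetical;
    only `reinforce=false` and the kill switch come from the notes above.
    """
    # Kill switch: skip capture entirely when the variable is set to "1".
    if os.environ.get("ATOCORE_CAPTURE_DISABLED") == "1":
        return False

    payload = json.dumps(
        {"prompt": prompt, "response": response, "reinforce": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/interactions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            return 200 <= reply.status < 300
    except Exception:
        # Fail-open: an unreachable or misbehaving server must not
        # affect the user's session, so every error is swallowed here.
        return False
```

Returning a boolean instead of raising keeps the hook invisible to the user; a real hook would probably also log the failure somewhere off the hot path.
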
## Trusted State Status

The first conservative trusted-state promotion pass is now complete for:

- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`

Each project now has a small set of stable entries covering:

- summary
- architecture or boundary decision
- key constraints
- current next focus

This materially improves `context/build` quality for project-hinted prompts.

## Recommended Near-Term Project Work

The active-project full markdown/text wave is now in.

The near-term work is now:

1. strengthen retrieval quality
2. promote or refine trusted operational truth where the broad corpus is now too noisy
3. keep trusted project state concise and high-confidence
4. widen only through named ingestion waves

## Recommended Next Wave Inputs

Wave 2 should emphasize trusted operational truth, not bulk historical notes.

P04:

- current status dashboard
- current selected design path
- current frame interface truth
- current next-step milestone view

P05:

- selected vendor path
- current error-budget baseline
- current architecture freeze or open decisions
- current procurement / next-action view

P06:

- current system map
- current shared contracts baseline
- current calibration procedure truth
- current July / proving roadmap view

## Deferred On Purpose

- automatic write-back from OpenClaw into AtoCore
- automatic memory promotion
- ~~reflection loop integration~~ - baseline now landed (2026-04-11):
  Stop hook runs reinforce automatically, project memories are folded
  into the context pack, batch-extract and triage CLIs exist. What
  remains deferred: scheduled/automatic batch extraction and extractor
  rule tuning (rule-based extractor produced 1 candidate from 42 real
  captures; it needs new cues for conversational LLM content).
- replacing OpenClaw's own memory system
- syncing the live machine DB between machines

## Success Criteria For The Next Batch

The next batch is successful if:

- OpenClaw can use AtoCore naturally when context is needed
- OpenClaw can infer registered projects and call AtoCore organically for
  project-knowledge questions
- the active-project full corpus wave can be inspected and used concretely
  through `auto-context`, `context-build`, and `debug-context`
- OpenClaw can also register a new project cleanly before refreshing it
- existing project registrations can be refined safely before refresh when the
  staged source set evolves
- AtoCore answers correctly for the active project set
- retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
- trusted project state remains concise and high confidence
- project ingestion remains controlled rather than noisy
- the canonical Dalidou instance stays stable

## Retrieval Quality Review - 2026-04-11

First sweep with real project-hinted queries on Dalidou. Used
`POST /context/build` against p04, p05, p06 with representative
questions and inspected `formatted_context`.

Findings:

- **Trusted Project State is surfacing correctly.** The DECISION and
  REQUIREMENT categories appear at the top of the pack and include
  the expected key facts (e.g. p04 "Option B conical-back mirror
  architecture"). This is the strongest signal in the pack today.
- **Chunk retrieval is on-topic but broad.** Top chunks for
  the p04 architecture query are PDR intro, CAD assembly overview,
  and the index. All are on the right project, but none of them directly
  answers the "why was Option B chosen" question. The authoritative
  answer sits in Project State, not in the chunks.
- **Active memories are NOT reaching the pack.** The context builder
  surfaces Trusted Project State and retrieved chunks but does not
  include the 21 active project/knowledge memories. Reinforcement
  (Phase 9 Commit B) bumps memory confidence without the memory ever
  being read back into a prompt, so the reflection loop has no outlet
  on the retrieval side. This is a design gap, not a bug: it needs a
  decision on whether memories should feed into context assembly,
  and if so at what trust level (below project_state, above chunks).
- **Cross-project bleed is low.** The p04 query did pull one p05
  chunk (CGH_Design_Input_for_AOM) as the bottom hit, but the top 4
  were all p04.

Proposed follow-ups (not yet scheduled):

1. ~~Decide whether memories should be folded into `formatted_context`
   and under what section header.~~ DONE 2026-04-11 (commits 8ea53f4,
   5913da5, 1161645). A `--- Project Memories ---` band now sits
   between identity/preference and retrieved chunks, gated on a
   canonical project hint to prevent cross-project bleed. Budget
   ratio 0.25 (tuned empirically: paragraph memories are ~400 chars
   and the earlier 0.15 ratio starved the first entry by one char).
   Verified live: p04 architecture query surfaces the Option B memory.
2. Re-run the same three queries after any builder change and compare
   `formatted_context` diffs. This is still open, and is the natural entry
   point for the retrieval eval harness on the roadmap.

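The comparison in follow-up 2 could be done with a small helper like the one below. It is a sketch under stated assumptions: the function name and the `label` parameter are hypothetical, and it only diffs two `formatted_context` strings that have already been fetched; it does not call `POST /context/build` itself.

```python
import difflib


def diff_formatted_context(before, after, label="query"):
    """Return a unified diff between two `formatted_context` strings.

    An empty string means the builder change did not alter the pack
    for this query.
    """
    lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{label}:before",
        tofile=f"{label}:after",
        lineterm="",
    )
    return "\n".join(lines)
```

Saving the pre-change `formatted_context` for each of the three queries and diffing it against the post-change output makes a new or missing band (for example `--- Project Memories ---`) show up as added or removed lines.
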
## Reflection Loop Live Check - 2026-04-11

First real run of `batch-extract` across 42 captured Claude Code
interactions on Dalidou produced exactly **1 candidate**, and that
candidate was a synthetic test capture from earlier in the session
(rejected). Findings:

- The rule-based extractor in `src/atocore/memory/extractor.py` keys
  on explicit structural cues (decision headings like
  `## Decision: ...`, preference sentences, etc.). Real Claude Code
  responses are conversational and almost never contain those cues.
- This means the capture → extract half of the reflection loop is
  effectively inert against organic LLM sessions until either the
  rules are broadened (new cue families: "we chose X because...",
  "the selected approach is...", etc.) or an LLM-assisted extraction
  path is added alongside the rule-based one.
- Capture → reinforce is working correctly on live data (length-aware
  matcher verified on live paraphrase of a p04 memory).

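The miss described in the findings above can be illustrated with a toy structural cue. This is not the rule set in `src/atocore/memory/extractor.py`; the regex and the function name are assumptions chosen only to show why heading-style cues produce nothing on conversational text.

```python
import re

# Hypothetical cue: a markdown heading of the form "## Decision: <text>".
DECISION_HEADING = re.compile(r"^#{1,6}[ \t]+Decision:[ \t]*(.+)$", re.MULTILINE)


def extract_decisions(text):
    """Return the text of every decision heading found in `text`."""
    return [match.group(1).strip() for match in DECISION_HEADING.finditer(text)]


# A structured note carries the cue; a conversational reply does not.
structured = "## Decision: use approach X\nDetails follow."
conversational = "We chose approach X because it was the simpler option."

print(extract_decisions(structured))      # one match
print(extract_decisions(conversational))  # no matches
```

Both inputs state the same decision, but only the first carries the heading the rule looks for.
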
Follow-up candidates:

1. ~~Extractor rule expansion~~ - Day 2 baseline showed 0% recall
   across 5 distinct miss classes; rule expansion cannot close a
   5-way miss. Deprioritized.
2. ~~LLM-assisted extractor~~ - DONE 2026-04-12. `extractor_llm.py`
   shells out to `claude -p` (Haiku, OAuth, no API key). First live
   run: 100% recall, 2.55 yield/interaction on a 20-interaction
   labeled set. First triage: 51 candidates → 16 promoted, 35
   rejected (31% accept rate). Active memories 20 → 36.
3. ~~Retrieval eval harness~~ - DONE 2026-04-11 (scripts/retrieval_eval.py,
   6/6 passing). Expansion to 15-20 fixtures is mini-phase Day 6.

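The triage figures in item 2 are mutually consistent, which the short check below makes explicit. It assumes the 51 triaged candidates are the ones produced from the 20-interaction labeled set; the notes do not state this outright, but the 2.55 yield figure matches it. The function name and dictionary keys are illustrative.

```python
def triage_summary(promoted, rejected, interactions, active_before):
    """Derive the headline triage metrics from the raw counts."""
    candidates = promoted + rejected
    return {
        "candidates": candidates,
        "accept_rate": promoted / candidates,
        "yield_per_interaction": candidates / interactions,
        "active_after": active_before + promoted,
    }


summary = triage_summary(promoted=16, rejected=35, interactions=20, active_before=20)
print(summary)
```

With these inputs, the candidate count is 51, the accept rate rounds to 31%, the yield is 2.55 per interaction, and the active-memory count moves from 20 to 36.
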
## Extractor Scope - 2026-04-12

What the LLM-assisted extractor (`src/atocore/memory/extractor_llm.py`)
extracts from conversational Claude Code captures:

**In scope:**

- Architectural commitments (e.g. "Z-axis is engage/retract, not
  continuous position")
- Ratified decisions with project scope (e.g. "USB SSD mandatory on
  RPi for telemetry storage")
- Durable engineering facts (e.g. "telemetry data rate ~29 MB/hour")
- Working rules and adaptation patterns (e.g. "extraction stays off
  the capture hot path")
- Interface invariants (e.g. "controller-job.v1 in, run-log.v1 out;
  no firmware change needed")

**Out of scope (intentionally rejected by triage):**

- Transient roadmap / plan steps that will be stale in a week
- Operational instructions ("run this command to deploy")
- Process rules that live in DEV-LEDGER.md / AGENTS.md, not in memory
- Implementation details that are too granular (individual field names
  when the parent concept is already captured)
- Already-fixed review findings (P1/P2 that no longer apply)
- Duplicates of existing active memories with wrong project tags

**Trust model:**

- Extraction stays off the capture hot path (batch / manual only)
- All candidates land as `status=candidate`, never auto-promoted
- Human or auto-triage reviews before promotion to active
- Future direction: multi-model extraction + triage (Codex/Gemini as
  second-pass reviewers for robustness against single-model bias)

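The trust model above amounts to a small state rule: extractor output always starts as a candidate, and only an explicit review can move it to active. The sketch below shows that rule in isolation. The `Memory` class, the `promote` function, and the `reviewed_by` parameter are hypothetical names for illustration, not AtoCore's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    status: str = "candidate"  # extractor output always starts here


def promote(memory, reviewed_by):
    """Promote a candidate to active only after an explicit review.

    `reviewed_by` names the reviewer (a human or an auto-triage pass);
    an empty value is rejected so nothing is promoted implicitly.
    """
    if not reviewed_by:
        raise ValueError("promotion requires a reviewer (human or auto-triage)")
    if memory.status != "candidate":
        raise ValueError(f"cannot promote from status {memory.status!r}")
    memory.status = "active"
    return memory
```

Keeping promotion as the only path to `active` means a noisy extraction run can add review work but cannot change what reaches the context pack by itself.
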
## Long-Run Goal

The long-run target is:

- continue working normally inside PKM project stacks and Gitea repos
- let OpenClaw keep its own memory and runtime behavior
- let AtoCore supplement LLM work with stronger trusted context, retrieval, and
  context assembly

That means AtoCore should behave like a durable external context engine and
machine-memory layer, not a replacement for normal repo work or OpenClaw memory.