
# AtoCore Next Steps

## Current Position

AtoCore now has:

- canonical runtime and machine storage on Dalidou
- separated source and machine-data boundaries
- initial self-knowledge ingested into the live instance
- trusted project-state entries for AtoCore itself
- a first read-only OpenClaw integration path on the T420
- a first real active-project corpus batch for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

This working list should be read alongside:

- [master-plan-status.md](C:/Users/antoi/ATOCore/docs/master-plan-status.md)

## Immediate Next Steps

1. ~~Re-run the backup/restore drill~~ — DONE 2026-04-11, full pass
2. ~~Turn on auto-capture of Claude Code sessions~~ — DONE 2026-04-11,
   Stop hook via `deploy/hooks/capture_stop.py` → `POST /interactions`
   with `reinforce=false`; kill switch: `ATOCORE_CAPTURE_DISABLED=1`
2a. Run a short real-use pilot with auto-capture on
   - verify interactions are landing in Dalidou
   - check prompt/response quality and truncation
   - confirm fail-open: no user-visible impact when Dalidou is down
3. Use the T420 `atocore-context` skill and the new organic routing layer in
   real OpenClaw workflows
   - confirm `auto-context` feels natural
   - confirm project inference is good enough in practice
   - confirm the fail-open behavior remains acceptable in practice
4. Review retrieval quality after the first real project ingestion batch
   - check whether the top hits are useful
   - check whether trusted project state remains dominant
   - reduce cross-project competition and prompt ambiguity where needed
   - use `debug-context` to inspect the exact last AtoCore supplement
5. Treat the active-project full markdown/text wave as complete
   - `p04-gigabit`
   - `p05-interferometer`
   - `p06-polisher`
6. Define a cleaner source refresh model
   - make the difference between source truth, staged inputs, and machine store
     explicit
   - move toward a project source registry and refresh workflow
   - foundation now exists via project registry + per-project refresh API
   - registration policy + template + proposal + approved registration are now
     the normal path for new projects
7. Move to Wave 2 trusted-operational ingestion
   - curated dashboards
   - decision logs
   - milestone/current-status views
   - operational truth, not just raw project notes
8. Integrate the new engineering architecture docs into active planning, not immediate schema code
   - keep `docs/architecture/engineering-knowledge-hybrid-architecture.md` as the target layer model
   - keep `docs/architecture/engineering-ontology-v1.md` as the V1 structured-domain target
   - do not start entity/relationship persistence until the ingestion, retrieval, registry, and backup baseline feels boring and stable
9. Finish the boring operations baseline around backup
   - retention policy cleanup script (snapshots dir grows
     monotonically today)
   - off-Dalidou backup target (at minimum an rsync to laptop or
     another host so a single-disk failure isn't terminal)
   - automatic post-backup validation (have `create_runtime_backup`
     call `validate_backup` on its own output and refuse to
     declare success if validation fails)
   - DONE in commits be40994 / 0382238 / 3362080 / this one:
     - `create_runtime_backup` + `list_runtime_backups` +
       `validate_backup` + `restore_runtime_backup` with CLI
     - `POST /admin/backup` with `include_chroma=true` under
       the ingestion lock
     - `/health` build_sha / build_time / build_branch provenance
     - `deploy.sh` self-update re-exec guard + build_sha drift
       verification
     - live drill procedure in `docs/backup-restore-procedure.md`
       with failure-mode table and the memory_type=episodic
       marker pattern from the 2026-04-09 drill
10. Keep deeper automatic runtime integration modest until the organic read-only
    model has proven value
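
The two open backup items in step 9 are post-backup validation and a retention cleanup. Both can be sketched in a few lines. This is a hypothetical Python sketch, not AtoCore code. `create` and `validate` stand in for `create_runtime_backup` and `validate_backup`, whose real signatures are not shown here. The `keep_last` rule and the assumption that snapshot directory names sort chronologically are also illustrative.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Callable


class BackupValidationError(RuntimeError):
    """Raised when a freshly created snapshot fails validation."""


def backup_with_validation(
    create: Callable[[], Path],
    validate: Callable[[Path], bool],
) -> Path:
    """Create a snapshot, then refuse to report success unless it validates.

    `create` and `validate` are hypothetical stand-ins for
    `create_runtime_backup` and `validate_backup`.
    """
    snapshot = create()
    if not validate(snapshot):
        raise BackupValidationError(f"snapshot failed validation: {snapshot}")
    return snapshot


def prune_snapshots(snapshots_dir: Path, keep_last: int = 7) -> list[Path]:
    """Delete all but the newest `keep_last` snapshot directories.

    Assumes snapshot directory names sort chronologically (for example
    ISO-8601 timestamps). Returns the paths that were removed.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    snapshots = sorted(p for p in snapshots_dir.iterdir() if p.is_dir())
    to_remove = snapshots[:-keep_last]  # empty when there are <= keep_last
    for path in to_remove:
        shutil.rmtree(path)
    return to_remove
```

Raising an exception instead of returning a flag means a failed snapshot can never be reported as a success. Calling `prune_snapshots` only after `backup_with_validation` returns would also ensure cleanup never runs right after a failed backup.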

## Trusted State Status

The first conservative trusted-state promotion pass is now complete for:

- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`

Each project now has a small set of stable entries covering:

- summary
- architecture or boundary decision
- key constraints
- current next focus

This materially improves `context/build` quality for project-hinted prompts.

## Recommended Near-Term Project Work

The active-project full markdown/text wave is now in, so the near-term work is:

1. strengthen retrieval quality
2. promote or refine trusted operational truth where the broad corpus is now too noisy
3. keep trusted project state concise and high-confidence
4. widen only through named ingestion waves

## Recommended Next Wave Inputs

Wave 2 should emphasize trusted operational truth, not bulk historical notes.

P04:

- current status dashboard
- current selected design path
- current frame interface truth
- current next-step milestone view

P05:

- selected vendor path
- current error-budget baseline
- current architecture freeze or open decisions
- current procurement / next-action view

P06:

- current system map
- current shared contracts baseline
- current calibration procedure truth
- current July / proving roadmap view

## Deferred On Purpose

- automatic write-back from OpenClaw into AtoCore
- automatic memory promotion
- ~~reflection loop integration~~ — baseline now landed (2026-04-11):
  Stop hook runs reinforce automatically, project memories are folded
  into the context pack, batch-extract and triage CLIs exist. What
  remains deferred: scheduled/automatic batch extraction and extractor
  rule tuning (rule-based extractor produced 1 candidate from 42 real
  captures — needs new cues for conversational LLM content).
- replacing OpenClaw's own memory system
- syncing the live machine DB between machines

## Success Criteria For The Next Batch

The next batch is successful if:

- OpenClaw can use AtoCore naturally when context is needed
- OpenClaw can infer registered projects and call AtoCore organically for
  project-knowledge questions
- the active-project full corpus wave can be inspected and used concretely
  through `auto-context`, `context-build`, and `debug-context`
- OpenClaw can also register a new project cleanly before refreshing it
- existing project registrations can be refined safely before refresh when the
  staged source set evolves
- AtoCore answers correctly for the active project set
- retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
- trusted project state remains concise and high confidence
- project ingestion remains controlled rather than noisy
- the canonical Dalidou instance stays stable

## Retrieval Quality Review — 2026-04-11

First sweep with real project-hinted queries on Dalidou. Used
`POST /context/build` against p04, p05, p06 with representative
questions and inspected `formatted_context`.

Findings:

- **Trusted Project State is surfacing correctly.** The DECISION and
  REQUIREMENT categories appear at the top of the pack and include
  the expected key facts (e.g. p04 "Option B conical-back mirror
  architecture"). This is the strongest signal in the pack today.
- **Chunk retrieval is relevant on-topic but broad.** Top chunks for
  the p04 architecture query are PDR intro, CAD assembly overview,
  and the index — all on the right project but none of them directly
  answer the "why was Option B chosen" question. The authoritative
  answer sits in Project State, not in the chunks.
- **Active memories are NOT reaching the pack.** The context builder
  surfaces Trusted Project State and retrieved chunks but does not
  include the 21 active project/knowledge memories. Reinforcement
  (Phase 9 Commit B) bumps memory confidence without the memory ever
  being read back into a prompt — the reflection loop has no outlet
  on the retrieval side. This is a design gap, not a bug: needs a
  decision on whether memories should feed into context assembly,
  and if so at what trust level (below project_state, above chunks).
- **Cross-project bleed is low.** The p04 query did pull one p05
  chunk (CGH_Design_Input_for_AOM) as the bottom hit, but the top four
  hits were all p04.
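
When these queries are re-run, bleed can be tracked as a number rather than judged by eye. One measure is the share of top-k retrieved chunks whose project differs from the query's project hint. The sketch below assumes each hit exposes a project id. The `Hit` type and its fields are hypothetical and are not the `/context/build` response schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # chunk source document name
    project: str  # project id the chunk was ingested under (assumed field)


def bleed_ratio(hits: list[Hit], project_hint: str, top_k: int | None = None) -> float:
    """Fraction of the top-k hits that belong to a project other than the hint."""
    window = hits if top_k is None else hits[:top_k]
    if not window:
        return 0.0
    foreign = sum(1 for h in window if h.project != project_hint)
    return foreign / len(window)
```

Consider a pack shaped like the p04 result above, with four p04 hits followed by one p05 hit. This gives 0.2 over all five hits and 0.0 over the top four.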

Proposed follow-ups (not yet scheduled):

1. ~~Decide whether memories should be folded into `formatted_context`
   and under what section header.~~ DONE 2026-04-11 (commits 8ea53f4,
   5913da5, 1161645). A `--- Project Memories ---` band now sits
   between identity/preference and retrieved chunks, gated on a
   canonical project hint to prevent cross-project bleed. The budget
   ratio is 0.25 (tuned empirically: paragraph memories are ~400 chars,
   and the earlier 0.15 ratio starved the first entry by one char).
   Verified live: p04 architecture query surfaces the Option B memory.
2. Re-run the same three queries after any builder change and compare
   `formatted_context` diffs. Still open; this is the natural entry
   point for the retrieval eval harness on the roadmap.
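
Follow-up 1 describes gating and budget behavior that a small sketch can illustrate. The sketch is hypothetical. The function name, the character-based budget, and the choice to stop at the first memory that does not fit whole are assumptions. It is not a copy of the builder in the commits above.

```python
from __future__ import annotations

HEADER = "--- Project Memories ---"


def build_memory_band(
    memories: list[str],
    project_hint: str | None,
    known_projects: set[str],
    total_budget_chars: int,
    ratio: float = 0.25,
) -> str:
    """Return the memory band text, or "" when it should be omitted.

    The band is gated on a canonical project hint to avoid cross-project
    bleed. Memories are assumed to arrive in priority order, and each one
    is included whole or not at all (hypothetical policy).
    """
    if project_hint is None or project_hint not in known_projects:
        return ""
    budget = int(total_budget_chars * ratio)
    lines = [HEADER]
    used = len(HEADER)
    for memory in memories:
        cost = 1 + len(memory)  # newline separator plus the memory text
        if used + cost > budget:
            break  # stop at the first entry that does not fit whole
        lines.append(memory)
        used += cost
    return "\n".join(lines) if len(lines) > 1 else ""
```

Suppose the total budget is a hypothetical 2,000 chars. A single ~400-char memory then fits under a 0.25 ratio (500 chars) but not under 0.15 (300 chars). This is the same all-or-nothing effect that made the ratio tuning matter.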

## Reflection Loop Live Check — 2026-04-11

First real run of `batch-extract` across 42 captured Claude Code
interactions on Dalidou produced exactly **1 candidate**, and that
candidate was a synthetic test capture from earlier in the session
(rejected). Finding:

- The rule-based extractor in `src/atocore/memory/extractor.py` keys
  on explicit structural cues (decision headings like
  `## Decision: ...`, preference sentences, etc.). Real Claude Code
  responses are conversational and almost never contain those cues.
- This means the capture → extract half of the reflection loop is
  effectively inert against organic LLM sessions until either the
  rules are broadened (new cue families: "we chose X because...",
  "the selected approach is...", etc.) or an LLM-assisted extraction
  path is added alongside the rule-based one.
- Capture → reinforce is working correctly on live data (length-aware
  matcher verified on live paraphrase of a p04 memory).
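
A toy cue matcher makes the gap easy to see. The first pattern imitates a structural `## Decision: ...` heading, and the other two are the proposed conversational cue families. All three regexes are illustrative assumptions, not the rules in `src/atocore/memory/extractor.py`.

```python
from __future__ import annotations

import re

# Structural cue of the kind the current rule set keys on (illustrative).
STRUCTURAL_CUES = [
    re.compile(r"^##\s*Decision:\s*(?P<claim>.+)$", re.MULTILINE),
]

# Proposed conversational cue families (hypothetical patterns).
CONVERSATIONAL_CUES = [
    re.compile(r"\bwe chose (?P<claim>.+?) because\b", re.IGNORECASE),
    re.compile(r"\bthe selected approach is (?P<claim>[^.]+)", re.IGNORECASE),
]


def match_cues(text: str, cues: list[re.Pattern[str]]) -> list[str]:
    """Return the captured claim for every cue match in `text`."""
    claims: list[str] = []
    for cue in cues:
        claims.extend(m.group("claim").strip() for m in cue.finditer(text))
    return claims
```

A typical conversational reply matches none of the structural cues. Every new phrasing also needs its own pattern, which is why rule expansion scales poorly against organic sessions.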

Follow-up candidates:

1. ~~Extractor rule expansion~~ — Day 2 baseline showed 0% recall
   across 5 distinct miss classes; rule expansion cannot close a
   5-way miss. Deprioritized.
2. ~~LLM-assisted extractor~~ — DONE 2026-04-12. `extractor_llm.py`
   shells out to `claude -p` (Haiku, OAuth, no API key). First live
   run: 100% recall, 2.55 candidates/interaction on a 20-interaction
   labeled set. First triage: 51 candidates → 16 promoted, 35
   rejected (31% accept rate). Active memories 20 → 36.
3. ~~Retrieval eval harness~~ — DONE 2026-04-11 (scripts/retrieval_eval.py,
   6/6 passing). Expansion to 15-20 fixtures is mini-phase Day 6.
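
The following sketches a fixture-based harness of this kind. It is not `scripts/retrieval_eval.py`. The `Fixture` fields, the substring check, and the injected `build_context(query, project)` callable are assumptions chosen to keep the sketch self-contained; in practice the callable could wrap `POST /context/build`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Fixture:
    name: str
    project: str
    query: str
    must_contain: tuple[str, ...]  # substrings expected in formatted_context


def run_eval(
    fixtures: list[Fixture],
    build_context: Callable[[str, str], str],
) -> tuple[int, list[str]]:
    """Run each fixture and return (passed count, failure messages)."""
    passed = 0
    failures: list[str] = []
    for fx in fixtures:
        context = build_context(fx.query, fx.project)
        missing = [s for s in fx.must_contain if s not in context]
        if missing:
            failures.append(f"{fx.name}: missing {missing}")
        else:
            passed += 1
    return passed, failures
```

Injecting the builder keeps fixtures testable offline. The returned pass count also maps directly onto a "6/6 passing" style summary.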

## Extractor Scope — 2026-04-12

What the LLM-assisted extractor (`src/atocore/memory/extractor_llm.py`)
extracts from conversational Claude Code captures:

**In scope:**

- Architectural commitments (e.g. "Z-axis is engage/retract, not
  continuous position")
- Ratified decisions with project scope (e.g. "USB SSD mandatory on
  RPi for telemetry storage")
- Durable engineering facts (e.g. "telemetry data rate ~29 MB/hour")
- Working rules and adaptation patterns (e.g. "extraction stays off
  the capture hot path")
- Interface invariants (e.g. "controller-job.v1 in, run-log.v1 out;
  no firmware change needed")

**Out of scope (intentionally rejected by triage):**

- Transient roadmap / plan steps that will be stale in a week
- Operational instructions ("run this command to deploy")
- Process rules that live in DEV-LEDGER.md / AGENTS.md, not in memory
- Implementation details that are too granular (individual field names
  when the parent concept is already captured)
- Already-fixed review findings (P1/P2 that no longer apply)
- Duplicates of existing active memories with wrong project tags

**Trust model:**

- Extraction stays off the capture hot path (batch / manual only)
- All candidates land as `status=candidate`, never auto-promoted
- A human or auto-triage pass reviews each candidate before promotion to active
- Future direction: multi-model extraction + triage (Codex/Gemini as
  second-pass reviewers for robustness against single-model bias)
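
The trust model reduces to a small state machine: extraction can only create candidates, and only a triage decision moves them. The sketch below makes several assumptions. The `Memory` type and the function names are illustrative, as is the rule that active or rejected memories are not re-triaged. Only the `candidate`, `active`, and `rejected` states come from this document.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    CANDIDATE = "candidate"
    ACTIVE = "active"
    REJECTED = "rejected"


# Assumption: triage only acts on candidates.
ALLOWED_TRIAGE = {
    Status.CANDIDATE: {Status.ACTIVE, Status.REJECTED},
}


@dataclass
class Memory:
    text: str
    project: str
    status: Status = Status.CANDIDATE  # extraction never sets anything else


def extract(text: str, project: str) -> Memory:
    """Extraction output always lands as a candidate, never auto-promoted."""
    return Memory(text=text, project=project)


def triage(memory: Memory, decision: Status) -> Memory:
    """Apply a human or auto-triage decision, rejecting illegal transitions."""
    allowed = ALLOWED_TRIAGE.get(memory.status, set())
    if decision not in allowed:
        raise ValueError(f"cannot move {memory.status.value} -> {decision.value}")
    memory.status = decision
    return memory
```

Placing the transition table in one place makes "never auto-promoted" a property that can be checked in code rather than only stated as a convention.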

## Long-Run Goal

The long-run target is:

- continue working normally inside PKM project stacks and Gitea repos
- let OpenClaw keep its own memory and runtime behavior
- let AtoCore supplement LLM work with stronger trusted context, retrieval, and
  context assembly

That means AtoCore should behave like a durable external context engine and
machine-memory layer, not a replacement for normal repo work or OpenClaw memory.