scripts/retrieval_eval.py walks a fixture file of project-hinted
questions, runs each against POST /context/build, and scores the
returned formatted_context against per-fixture expect_present and
expect_absent substring checklists. Exit 0 on all-pass, 1 on any
miss. Human-readable by default, --json for automation.
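A minimal sketch of the scoring step described above, assuming each fixture is a dict carrying `expect_present` and `expect_absent` lists and that matching is plain case-sensitive substring search. The names `score_fixture` and `exit_code` are hypothetical and are not taken from scripts/retrieval_eval.py.

```python
from typing import Dict, List


def score_fixture(formatted_context: str, fixture: Dict[str, List[str]]) -> Dict[str, object]:
    """Check one fixture's substring checklists against a context pack.

    Hypothetical helper: the field names follow the commit message
    (expect_present / expect_absent); everything else is assumed.
    """
    # Substrings that should be in the pack but are not.
    missing = [s for s in fixture.get("expect_present", []) if s not in formatted_context]
    # Substrings that should stay out of the pack but leaked in.
    unexpected = [s for s in fixture.get("expect_absent", []) if s in formatted_context]
    return {
        "passed": not missing and not unexpected,
        "missing": missing,
        "unexpected": unexpected,
    }


def exit_code(results: List[Dict[str, object]]) -> int:
    """Return 0 when every fixture passed, 1 on any miss."""
    return 0 if all(r["passed"] for r in results) else 1
```

Returning the `missing` and `unexpected` lists alongside the pass flag lets the same result feed both a human-readable report and a JSON dump.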
First live run against Dalidou at SHA 1161645: 4/6 pass. The two
failures are real findings, not harness bugs:
- p05-configuration FAIL: "GigaBIT M1" appears in the p05 pack.
  Cross-project bleed from a shared p05 doc that legitimately
  mentions the p04 mirror under test. Fixture kept strict so
  future ranker tuning can close the gap.
- p05-vendor-signal FAIL: "Zygo" missing. The vendor memory exists
  with confidence 0.9 but get_memories_for_context walks memories
  in fixed order (effectively by updated_at / confidence), so
  lower-ranked memories get pushed out of the per-project budget
  slice by higher-confidence ones even when the query is
  specifically about the lower-ranked content. Query-relevance
  ordering of memories is the natural next fix.
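To illustrate the p05-vendor-signal failure mode, the sketch below packs memories into a character budget twice: once in a fixed confidence order, once ordered by token overlap with the query. The function names, the character-based budget, the overlap score, and the sample memory strings are all assumptions made up for illustration; this is not the code in get_memories_for_context.

```python
import re
from typing import Callable, Dict, List, Set

Memory = Dict[str, object]


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pack(memories: List[Memory], budget_chars: int,
         key: Callable[[Memory], float]) -> List[str]:
    """Take memories in descending key order until the budget is spent."""
    picked: List[str] = []
    used = 0
    for mem in sorted(memories, key=key, reverse=True):
        content = str(mem["content"])
        if used + len(content) > budget_chars:
            continue  # this memory does not fit in the remaining budget
        picked.append(content)
        used += len(content)
    return picked


def by_confidence(mem: Memory) -> float:
    """Fixed ordering: ignores the query entirely."""
    return float(mem["confidence"])


def by_query_overlap(query: str) -> Callable[[Memory], float]:
    """Query-relevance ordering: count of tokens shared with the query."""
    query_tokens = tokens(query)

    def key(mem: Memory) -> float:
        return float(len(query_tokens & tokens(str(mem["content"]))))

    return key


# Invented sample data, sized so that only one memory fits the budget.
memories = [
    {"content": "Baseline configuration uses the 4-inch transmission flat.", "confidence": 0.95},
    {"content": "Zygo is the preferred interferometer vendor.", "confidence": 0.9},
]
query = "which vendor supplies the interferometer"

fixed = pack(memories, budget_chars=60, key=by_confidence)
relevant = pack(memories, budget_chars=60, key=by_query_overlap(query))
```

With the fixed ordering the higher-confidence memory consumes the budget and the vendor memory is dropped. With the overlap ordering the vendor memory is packed first because it shares more tokens with the query.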
Docs sync:
- master-plan-status.md: Phase 9 reflection entry now notes that
  capture→reinforce runs automatically and project memories reach
  the context pack, while extract remains batch/manual. First
  batch-extract pass surfaced 1 candidate from 42 interactions;
  extractor rule tuning is a known follow-up.
- next-steps.md: the 2026-04-11 retrieval quality review entry now
  shows the project-memory-band work as DONE, and a new
  "Reflection Loop Live Check" subsection records the
  extractor-coverage finding from the first batch run.
- Both files now agree with the code; follow-up reviewers
  (Codex, future Claude) should no longer see narrative drift.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
197 lines
7.4 KiB
Markdown
# AtoCore Master Plan Status

## Current Position

AtoCore is currently between **Phase 7** and **Phase 8**.

The platform is no longer just a proof of concept. The local engine exists, the
core correctness pass is complete, Dalidou hosts the canonical runtime and
machine database, and OpenClaw on the T420 can consume AtoCore safely in
read-only additive mode.

## Phase Status

### Completed

- Phase 0 - Foundation
- Phase 0.5 - Proof of Concept
- Phase 1 - Ingestion

### Baseline Complete

- Phase 2 - Memory Core
- Phase 3 - Retrieval
- Phase 5 - Project State
- Phase 7 - Context Builder

### Partial

- Phase 4 - Identity / Preferences
- Phase 8 - OpenClaw Integration

### Baseline Complete

- Phase 9 - Reflection (all three foundation commits landed:
  A capture, B reinforcement, C candidate extraction + review queue).
  As of 2026-04-11 the capture → reinforce half runs automatically on
  every Stop-hook capture (length-aware token-overlap matcher handles
  paragraph-length memories), and project-scoped memories now reach
  the context pack via a dedicated `--- Project Memories ---` band
  between identity/preference and retrieved chunks. The extract half
  is still a manual / batch flow by design (`scripts/atocore_client.py
  batch-extract` + `triage`). First live batch-extract run over 42
  captured interactions produced 1 candidate (rule extractor is
  conservative and keys on structural cues like `## Decision:`
  headings that rarely appear in conversational LLM responses);
  extractor tuning is a known follow-up.

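The low candidate yield from the first batch-extract run follows from how a heading-keyed rule behaves. A minimal sketch, assuming the rule is a regular expression over `## Decision:` headings; the pattern and the `extract_candidates` name are hypothetical and do not reproduce the extractor's actual rules.

```python
import re
from typing import List

# Assumed cue: a level-2 Markdown heading of the form "## Decision: <text>".
DECISION_HEADING = re.compile(r"^##[ \t]+Decision:[ \t]*(.+)$", re.MULTILINE)


def extract_candidates(response_text: str) -> List[str]:
    """Return the text after each '## Decision:' heading in a response."""
    return [m.group(1).strip() for m in DECISION_HEADING.finditer(response_text)]


structured = "## Decision: keep extraction manual for now\nRationale follows."
conversational = "I think we should keep extraction manual for now."

from_structured = extract_candidates(structured)
from_conversational = extract_candidates(conversational)
```

The structured response yields one candidate, while the conversational one yields none even though it states the same decision. That gap is the kind of coverage loss extractor tuning would need to address.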
### Not Yet Complete In The Intended Sense

- Phase 6 - AtoDrive
- Phase 10 - Write-back
- Phase 11 - Multi-model
- Phase 12 - Evaluation
- Phase 13 - Hardening

### Engineering Layer Planning Sprint

**Status: complete.** All 8 architecture docs are drafted. The
engineering layer is now ready for V1 implementation against the
active project set.

- [engineering-query-catalog.md](architecture/engineering-query-catalog.md):
  the 20 v1-required queries the engineering layer must answer
- [memory-vs-entities.md](architecture/memory-vs-entities.md):
  canonical home split between memory and entity tables
- [promotion-rules.md](architecture/promotion-rules.md):
  Layer 0 → Layer 2 pipeline, triggers, review queue mechanics
- [conflict-model.md](architecture/conflict-model.md):
  detection, representation, and resolution of contradictory facts
- [tool-handoff-boundaries.md](architecture/tool-handoff-boundaries.md):
  KB-CAD / KB-FEM one-way mirror stance, ingest endpoints, drift handling
- [representation-authority.md](architecture/representation-authority.md):
  canonical home matrix across PKM / KB / repos / AtoCore for 22 fact kinds
- [human-mirror-rules.md](architecture/human-mirror-rules.md):
  templates, regeneration triggers, edit flow, "do not edit" enforcement
- [engineering-v1-acceptance.md](architecture/engineering-v1-acceptance.md):
  measurable done definition with 23 acceptance criteria
- [engineering-knowledge-hybrid-architecture.md](architecture/engineering-knowledge-hybrid-architecture.md):
  the 5-layer model (from the previous planning wave)
- [engineering-ontology-v1.md](architecture/engineering-ontology-v1.md):
  the initial V1 object and relationship inventory (previous wave)
- [project-identity-canonicalization.md](architecture/project-identity-canonicalization.md):
  the helper-at-every-service-boundary contract that keeps the
  trust hierarchy dependable across alias and canonical-id callers;
  required reading before adding new project-keyed entity surfaces
  in the V1 implementation sprint

The next concrete step is the V1 implementation sprint, which
should follow engineering-v1-acceptance.md as its checklist, and
must apply the project-identity-canonicalization contract at every
new service-layer entry point.

### LLM Client Integration

A separate but related architectural concern: how AtoCore is reachable
from many different LLM client contexts (OpenClaw, Claude Code, future
Codex skills, future MCP server). The layering rule is documented in:

- [llm-client-integration.md](architecture/llm-client-integration.md):
  three-layer shape: HTTP API → shared operator client
  (`scripts/atocore_client.py`) → per-agent thin frontends; the
  shared client is the canonical backbone every new client should
  shell out to instead of reimplementing HTTP calls

This sits implicitly between Phase 8 (OpenClaw) and Phase 11
(multi-model). Memory-review and engineering-entity commands are
deferred from the shared client until their workflows are exercised.

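A minimal sketch of a per-agent thin frontend under the layering rule above: it shells out to the shared client rather than issuing HTTP calls itself, and returns `None` on any failure so the calling agent can proceed without AtoCore context. The function name, the default timeout, the choice to fail open, and the assumption that the client writes its result to stdout are all illustrative; none of them is taken from llm-client-integration.md.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_shared_client(
    args: Sequence[str],
    client_cmd: Sequence[str] = (sys.executable, "scripts/atocore_client.py"),
    timeout_s: float = 10.0,
) -> Optional[str]:
    """Run the shared operator client; return its stdout, or None on failure.

    Hypothetical wrapper: the subcommands and output format are whatever
    the shared client defines and are not modeled here.
    """
    try:
        proc = subprocess.run(
            [*client_cmd, *args],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # client missing or too slow: fail open
    if proc.returncode != 0:
        return None  # client reported an error: fail open
    return proc.stdout
```

Keeping the frontend to a single subprocess call means request formatting and endpoint details live only in the shared client, so a new agent integration does not duplicate them.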
## What Is Real Today

- canonical AtoCore runtime on Dalidou
- canonical machine DB and vector store on Dalidou
- project registry with:
  - template
  - proposal preview
  - register
  - update
  - refresh
- read-only additive OpenClaw helper on the T420
- seeded project corpus for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
- conservative Trusted Project State for those active projects
- first operational backup foundation for SQLite + project registry
- implementation-facing architecture notes for future engineering knowledge work
- first organic routing layer in OpenClaw via:
  - `detect-project`
  - `auto-context`

## Now

These are the current practical priorities.

1. Finish practical OpenClaw integration
   - make the helper lifecycle feel natural in daily use
   - use the new organic routing layer for project-knowledge questions
   - confirm fail-open behavior remains acceptable
   - keep AtoCore clearly additive
2. Tighten retrieval quality
   - reduce cross-project competition
   - improve ranking on short or ambiguous prompts
   - add only a few anchor docs where retrieval is still weak
3. Continue controlled ingestion
   - deepen active projects selectively
   - avoid noisy bulk corpus growth
4. Strengthen operational boringness
   - backup and restore procedure
   - Chroma rebuild / backup policy
   - retention and restore validation

## Next

These are the next major layers after the current practical pass.

1. Clarify AtoDrive as a real operational truth layer
2. Mature identity / preferences handling
3. Improve observability for:
   - retrieval quality
   - context-pack inspection
   - comparison of behavior with and without AtoCore

## Later

These are the deliberate future expansions already supported by the architecture
direction, but not yet ready for immediate implementation.

1. Minimal engineering knowledge layer
   - driven by `docs/architecture/engineering-knowledge-hybrid-architecture.md`
   - guided by `docs/architecture/engineering-ontology-v1.md`
2. Minimal typed objects and relationships
3. Evidence-linking and provenance-rich structured records
4. Human mirror generation from structured state

## Not Yet

These remain intentionally deferred.

- automatic write-back from OpenClaw into AtoCore
- automatic memory promotion
- ~~reflection loop integration~~: baseline now in (capture→reinforce
  auto, extract batch/manual). Extractor tuning and scheduled batch
  extraction still open.
- replacing OpenClaw's own memory system
- live machine-DB sync between machines
- full ontology / graph expansion before the current baseline is stable

## Working Rule

The next sensible implementation threshold for the engineering ontology work is:

- after the current ingestion, retrieval, registry, OpenClaw helper, organic
  routing, and backup baseline feels boring and dependable

Until then, the architecture docs should shape decisions, not force premature
schema work.