
Anto01 4da81c9e4e feat: retrieval eval harness + doc sync

scripts/retrieval_eval.py walks a fixture file of project-hinted
questions, runs each against POST /context/build, and scores the
returned formatted_context against per-fixture expect_present and
expect_absent substring checklists. Exit 0 on all-pass, 1 on any
miss. Human-readable by default, --json for automation.
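The scoring rule described above can be sketched as follows. This is a minimal illustration, not the contents of scripts/retrieval_eval.py. The Fixture and Result shapes, the name/question/project fields, and case-sensitive matching are assumptions. The POST /context/build call is left out so the scoring logic stands alone.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Fixture:
    # Hypothetical fixture shape; only expect_present / expect_absent come from the harness description.
    name: str
    question: str
    project: str
    expect_present: list[str] = field(default_factory=list)
    expect_absent: list[str] = field(default_factory=list)


@dataclass
class Result:
    name: str
    missing: list[str]  # expect_present substrings not found in the context
    leaked: list[str]   # expect_absent substrings that were found in the context

    @property
    def passed(self) -> bool:
        return not self.missing and not self.leaked


def score(fixture: Fixture, formatted_context: str) -> Result:
    """Case-sensitive substring checks against the returned formatted_context."""
    missing = [s for s in fixture.expect_present if s not in formatted_context]
    leaked = [s for s in fixture.expect_absent if s in formatted_context]
    return Result(fixture.name, missing, leaked)


def report(results: list[Result], as_json: bool = False) -> int:
    """Print results and return the process exit code: 0 if all pass, else 1."""
    if as_json:
        rows = [
            {"name": r.name, "passed": r.passed, "missing": r.missing, "leaked": r.leaked}
            for r in results
        ]
        print(json.dumps(rows, indent=2))
    else:
        for r in results:
            status = "PASS" if r.passed else "FAIL"
            print(f"{status} {r.name} missing={r.missing} leaked={r.leaked}")
    return 0 if all(r.passed for r in results) else 1
```

A command-line entry point would pass the return value of report(...) to sys.exit, which gives the exit 0 / exit 1 contract described above.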

First live run against Dalidou at SHA 1161645: 4/6 pass. The two
failures are real findings, not harness bugs:

- p05-configuration FAIL: "GigaBIT M1" appears in the p05 pack.
  Cross-project bleed from a shared p05 doc that legitimately
  mentions the p04 mirror under test. Fixture kept strict so
  future ranker tuning can close the gap.
- p05-vendor-signal FAIL: "Zygo" missing. The vendor memory exists
  with confidence 0.9 but get_memories_for_context walks memories
  in fixed order (effectively by updated_at / confidence), so lower-
  ranked memories get pushed out of the per-project budget slice by
  higher-confidence ones even when the query is specifically about
  the lower-ranked content. Query-relevance ordering of memories is
  the natural next fix.
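Query-relevance ordering, the fix suggested above, could look roughly like the following sketch. Everything in it is hypothetical. The Memory shape, the plain token-overlap relevance score (a stand-in for whatever scorer is eventually chosen), and the character budget are assumptions, not the current get_memories_for_context behavior. Memories are ranked by relevance to the query first and by confidence second, and are then packed greedily into the budget.

```python
import re
from dataclasses import dataclass


@dataclass
class Memory:
    # Hypothetical memory record for illustration.
    content: str
    confidence: float


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, memory: Memory) -> float:
    """Share of query tokens that also appear in the memory (stand-in scorer)."""
    q = tokens(query)
    if not q:
        return 0.0
    return len(q & tokens(memory.content)) / len(q)


def select_memories(query: str, memories: list[Memory], budget_chars: int) -> list[Memory]:
    """Rank by (relevance, confidence) descending, then fill the character budget greedily."""
    ranked = sorted(memories, key=lambda m: (relevance(query, m), m.confidence), reverse=True)
    chosen: list[Memory] = []
    used = 0
    for m in ranked:
        if used + len(m.content) > budget_chars:
            continue  # this one would overflow; a shorter one later may still fit
        chosen.append(m)
        used += len(m.content)
    return chosen
```

With confidence-only ordering, a long, higher-confidence memory can consume the budget before a vendor memory is considered. With relevance first, a query that names the vendor topic pulls the vendor memory in even though its confidence is lower.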

Docs sync:

- master-plan-status.md: Phase 9 reflection entry now notes that
  capture→reinforce runs automatically and project memories reach
  the context pack, while extract remains batch/manual. First batch-
  extract pass surfaced 1 candidate from 42 interactions — extractor
  rule tuning is a known follow-up.
- next-steps.md: the 2026-04-11 retrieval quality review entry now
  shows the project-memory-band work as DONE, and a new
  "Reflection Loop Live Check" subsection records the extractor-
  coverage finding from the first batch run.
- Both files now agree with the code; follow-up reviewers
  (Codex, future Claude) should no longer see narrative drift.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-11 12:39:03 -04:00

AtoCore Master Plan Status

Current Position

AtoCore is currently between Phase 7 and Phase 8.

The platform is no longer just a proof of concept. The local engine exists, the core correctness pass is complete, Dalidou hosts the canonical runtime and machine database, and OpenClaw on the T420 can consume AtoCore safely in read-only additive mode.

Phase Status

Completed

Phase 0 - Foundation
Phase 0.5 - Proof of Concept
Phase 1 - Ingestion

Baseline Complete

Phase 2 - Memory Core
Phase 3 - Retrieval
Phase 5 - Project State
Phase 7 - Context Builder

Partial

Phase 4 - Identity / Preferences
Phase 8 - OpenClaw Integration

Baseline Complete

Phase 9 - Reflection (all three foundation commits have landed: A capture, B reinforcement, C candidate extraction + review queue). As of 2026-04-11, the capture → reinforce half runs automatically on every Stop-hook capture (a length-aware token-overlap matcher handles paragraph-length memories), and project-scoped memories now reach the context pack via a dedicated --- Project Memories --- band between the identity/preference band and the retrieved chunks. The extract half is still a manual / batch flow by design (scripts/atocore_client.py batch-extract + triage). The first live batch-extract run over 42 captured interactions produced 1 candidate. The rule extractor is conservative and keys on structural cues like ## Decision: headings, which rarely appear in conversational LLM responses, so extractor tuning is a known follow-up.
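One way a length-aware token-overlap matcher can work is sketched below. The stopword list, the size bands, and the thresholds are illustrative assumptions, not the matcher AtoCore ships. The idea shown is only this: short memories must be echoed almost completely, while paragraph-length memories count as reinforced when a smaller share of their content tokens reappears in the captured response.

```python
import re

# Illustrative stopword list; a real matcher would likely use a larger one.
STOPWORDS = frozenset({"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for", "on", "with"})


def content_tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def required_overlap(n_tokens: int) -> float:
    """Hypothetical thresholds: the longer the memory, the smaller the share that must reappear."""
    if n_tokens <= 8:
        return 0.7
    if n_tokens <= 30:
        return 0.5
    return 0.35


def reinforces(memory: str, response: str) -> bool:
    """True if enough of the memory's content tokens appear in the response."""
    mem = content_tokens(memory)
    if not mem:
        return False
    overlap = len(mem & content_tokens(response)) / len(mem)
    return overlap >= required_overlap(len(mem))
```

A fixed threshold would either reinforce short memories on incidental word matches or never reinforce long ones, since responses rarely restate a whole paragraph. Scaling the threshold by memory length is one simple way to avoid both failure modes.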

Not Yet Complete In The Intended Sense

Phase 6 - AtoDrive
Phase 10 - Write-back
Phase 11 - Multi-model
Phase 12 - Evaluation
Phase 13 - Hardening

Engineering Layer Planning Sprint

Status: complete. All 8 planning-sprint architecture docs are drafted (listed first below, followed by earlier-wave and supporting docs). The engineering layer is now ready for V1 implementation against the active project set.

engineering-query-catalog.md — the 20 v1-required queries the engineering layer must answer
memory-vs-entities.md — canonical home split between memory and entity tables
promotion-rules.md — Layer 0 → Layer 2 pipeline, triggers, review queue mechanics
conflict-model.md — detection, representation, and resolution of contradictory facts
tool-handoff-boundaries.md — KB-CAD / KB-FEM one-way mirror stance, ingest endpoints, drift handling
representation-authority.md — canonical home matrix across PKM / KB / repos / AtoCore for 22 fact kinds
human-mirror-rules.md — templates, regeneration triggers, edit flow, "do not edit" enforcement
engineering-v1-acceptance.md — measurable done definition with 23 acceptance criteria
engineering-knowledge-hybrid-architecture.md — the 5-layer model (from the previous planning wave)
engineering-ontology-v1.md — the initial V1 object and relationship inventory (previous wave)
project-identity-canonicalization.md — the helper-at-every-service-boundary contract that keeps the trust hierarchy dependable across alias and canonical-id callers; required reading before adding new project-keyed entity surfaces in the V1 implementation sprint

The next concrete step is the V1 implementation sprint. It should follow engineering-v1-acceptance.md as its checklist and must apply the project-identity-canonicalization contract at every new service-layer entry point.
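The canonicalization contract referenced above can be illustrated with a small sketch. One helper resolves any accepted alias to the canonical project id, and every service-layer entry point calls it before touching storage, so alias callers and canonical-id callers read and write the same rows. The alias table, UnknownProject, and list_entities are hypothetical names invented for this example; project-identity-canonicalization.md remains the authority on the actual contract.

```python
# Hypothetical alias table; the real project registry is not reproduced here.
PROJECT_ALIASES = {
    "p04-gigabit": {"p04", "gigabit"},
    "p05-interferometer": {"p05", "interferometer"},
    "p06-polisher": {"p06", "polisher"},
}

# Flattened lookup: every alias, and the canonical id itself, maps to the canonical id.
_LOOKUP = {
    alias: canonical
    for canonical, aliases in PROJECT_ALIASES.items()
    for alias in aliases | {canonical}
}


class UnknownProject(ValueError):
    """Raised when a caller passes a name that is neither an alias nor a canonical id."""


def canonicalize_project(name: str) -> str:
    """Resolve an alias or canonical id (case-insensitive) to the canonical id."""
    key = name.strip().lower()
    if key not in _LOOKUP:
        raise UnknownProject(name)
    return _LOOKUP[key]


def list_entities(project: str, store: dict[str, list[str]]) -> list[str]:
    """Example service-layer entry point: canonicalize first, then read storage by canonical id."""
    return store.get(canonicalize_project(project), [])
```

Raising on unknown names, rather than passing them through, keeps a typo from silently creating or querying a second, unrelated project key.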

LLM Client Integration

A separate but related architectural concern: how AtoCore is reachable from many different LLM client contexts (OpenClaw, Claude Code, future Codex skills, future MCP server). The layering rule is documented in:

llm-client-integration.md — three-layer shape: HTTP API → shared operator client (scripts/atocore_client.py) → per-agent thin frontends; the shared client is the canonical backbone every new client should shell out to instead of reimplementing HTTP calls

This sits implicitly between Phase 8 (OpenClaw) and Phase 11 (multi-model). Memory-review and engineering-entity commands are deferred from the shared client until their workflows are exercised.
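A per-agent frontend in this layering stays thin by delegating to the shared client rather than speaking HTTP itself. The sketch below shows the shape only. The CLIENT path, the helper names, and the 30-second timeout are assumptions, and batch-extract is the only subcommand this document names.

```python
import subprocess
import sys
from pathlib import Path

# Assumed path to the shared operator client, relative to the repo root.
CLIENT = Path("scripts/atocore_client.py")


def client_command(*args: str, client: Path = CLIENT) -> list[str]:
    """Build the argv for invoking the shared client with the current interpreter."""
    return [sys.executable, str(client), *args]


def run_client(*args: str, client: Path = CLIENT, timeout: float = 30.0) -> str:
    """Run the shared client and return its stdout; raises CalledProcessError on non-zero exit."""
    result = subprocess.run(
        client_command(*args, client=client),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Because the frontend only builds an argv and relays output, a change to how AtoCore's HTTP API is called lands in the shared client once instead of in every frontend.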

What Is Real Today

canonical AtoCore runtime on Dalidou
canonical machine DB and vector store on Dalidou
project registry with:
- template
- proposal preview
- register
- update
- refresh
read-only additive OpenClaw helper on the T420
seeded project corpus for:
- p04-gigabit
- p05-interferometer
- p06-polisher
conservative Trusted Project State for those active projects
first operational backup foundation for SQLite + project registry
implementation-facing architecture notes for future engineering knowledge work
first organic routing layer in OpenClaw via:
- detect-project
- auto-context

Now

These are the current practical priorities.

Finish practical OpenClaw integration
- make the helper lifecycle feel natural in daily use
- use the new organic routing layer for project-knowledge questions
- confirm fail-open behavior remains acceptable
- keep AtoCore clearly additive
Tighten retrieval quality
- reduce cross-project competition
- improve ranking on short or ambiguous prompts
- add only a few anchor docs where retrieval is still weak
Continue controlled ingestion
- deepen active projects selectively
- avoid noisy bulk corpus growth
Strengthen operational boringness
- backup and restore procedure
- Chroma rebuild / backup policy
- retention and restore validation
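Fail-open behavior, as the list above uses the term, means a missing or broken AtoCore must never block the agent. The following is a minimal sketch that assumes a JSON POST to /context/build whose response carries formatted_context. The request fields (prompt, project) are hypothetical. Any URL, network, HTTP, or decoding failure yields an empty context, and an empty context leaves the prompt unchanged.

```python
import http.client
import json
import urllib.request
from typing import Optional


def fetch_context(base_url: str, prompt: str, project: Optional[str], timeout: float = 5.0) -> str:
    """Request a context pack; return "" on any failure so callers fail open."""
    try:
        # The request body field names are assumptions for this sketch.
        body = json.dumps({"prompt": prompt, "project": project}).encode("utf-8")
        req = urllib.request.Request(
            f"{base_url}/context/build",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are OSError subclasses; bad URLs and bad JSON raise ValueError.
        return ""
    if not isinstance(payload, dict):
        return ""
    context = payload.get("formatted_context", "")
    return context if isinstance(context, str) else ""


def augment_prompt(prompt: str, context: str) -> str:
    """Additive only: without context the prompt is passed through unchanged."""
    if not context:
        return prompt
    return f"{context}\n\n{prompt}"
```

Keeping the failure handling inside fetch_context means the caller has a single code path: it always gets a string, and an empty string means "behave as if AtoCore were not there".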

Next

These are the next major layers after the current practical pass.

Clarify AtoDrive as a real operational truth layer
Mature identity / preferences handling
Improve observability for:
- retrieval quality
- context-pack inspection
- comparison of behavior with and without AtoCore

Later

These are the deliberate future expansions already supported by the architecture direction, but not yet ready for immediate implementation.

Minimal engineering knowledge layer
- driven by docs/architecture/engineering-knowledge-hybrid-architecture.md
- guided by docs/architecture/engineering-ontology-v1.md
Minimal typed objects and relationships
Evidence-linking and provenance-rich structured records
Human mirror generation from structured state

Not Yet

These remain intentionally deferred.

automatic write-back from OpenClaw into AtoCore
automatic memory promotion
~~reflection loop integration~~: the baseline is now in place (capture→reinforce runs automatically; extract is batch/manual). Extractor tuning and scheduled batch extraction are still open.
replacing OpenClaw's own memory system
live machine-DB sync between machines
full ontology / graph expansion before the current baseline is stable

Working Rule

The next sensible implementation threshold for the engineering ontology work is:

after the current ingestion, retrieval, registry, OpenClaw helper, organic routing, and backup baseline feels boring and dependable

Until then, the architecture docs should shape decisions, not force premature schema work.
