fix(retrieval): enforce project-scoped context boundaries

This commit is contained in:
2026-04-24 10:46:56 -04:00
parent c53e61eb67
commit c7212900b0
11 changed files with 737 additions and 68 deletions

View File

@@ -1,6 +1,7 @@
# AtoCore Current State (2026-04-22)
# AtoCore - Current State (2026-04-24)
Live deploy: `2712c5d` · Dalidou health: ok · Harness: 17/18 · Tests: 547 passing.
Live deploy: `2b86543` · Dalidou health: ok · Harness: 18/20 with 1 known
content gap and 1 current blocking project-bleed guard · Tests: 553 passing.
## V1-0 landed 2026-04-22
@@ -13,9 +14,8 @@ supersede) with Q-3 fail-open. Prod backfill ran cleanly — 31 legacy
active/superseded entities flagged `hand_authored=1`, follow-up dry-run
returned 0 remaining rows. Test count 533 → 547 (+14).
R14 (P2, non-blocking): `POST /entities/{id}/promote` route fix translates
the new `ValueError` into 400. Branch `claude/r14-promote-400` pending
Codex review + squash-merge.
R14 is closed: `POST /entities/{id}/promote` now translates the new
caller-fixable V1-0 `ValueError` into HTTP 400.
**Next in the V1 track:** V1-A (minimal query slice + Q-6 killer-correctness
integration). Gated on pipeline soak (~2026-04-26) + 100+ active memory
@@ -65,10 +65,10 @@ Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs
| 7G | Re-extraction on prompt version bump | pending |
| 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |
## Known gaps (honest)
## Known gaps (honest, refreshed 2026-04-24)
1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
2. **OpenClaw is capture-only, not context-grounded.** The plugin POSTs `/interactions` on `llm_output` but does NOT call `/context/build` on `before_agent_start`. OpenClaw's underlying agent runs blind. See Q2 below.
3. **Human interface (wiki) is thin and static.** 5 project cards + a "System" line. No dashboard for the autonomous activity. No per-memory detail page. See Q3/Q5.
4. **Harness 17/18** — the `p04-constraints` fixture wants "Zerodur" but retrieval surfaces related-not-exact terms. Content gap, not a retrieval regression.
5. **Two projects under-populated**: p05-interferometer (4 memories, 18 state) and atomizer-v2 (1 memory, 6 state). Batch re-extract with the new llm-0.6.0 prompt would help.
2. **Project-scoped retrieval still needs deployment verification.** The April 24 audit reproduced cross-project competition on broad p05 prompts. The current branch adds registry-aware project filtering and a harness guard; verify after deploy.
3. **Human interface is useful but not yet the V1 Human Mirror.** Wiki/dashboard pages exist, but the spec routes, deterministic mirror files, disputed markers, and curated annotations remain V1-D work.
4. **Harness known issue:** `p04-constraints` wants "Zerodur" and "1.2"; live retrieval surfaces related constraints but not those exact strings. Treat as content/state gap until fixed.
5. **Formal docs lag the ledger during fast work.** Use `DEV-LEDGER.md` and `python scripts/live_status.py` for live truth, then copy verified claims into these docs.

View File

@@ -70,9 +70,14 @@ read-only additive mode.
- Phase 6 - AtoDrive
- Phase 10 - Write-back
- Phase 11 - Multi-model
- Phase 12 - Evaluation
- Phase 13 - Hardening
### Partial / Operational Baseline
- Phase 12 - Evaluation. The retrieval/context harness exists and runs
against live Dalidou, but coverage is still intentionally small and
should grow before this is complete in the intended sense.
### Engineering Layer Planning Sprint
**Status: complete.** All 8 architecture docs are drafted. The
@@ -126,11 +131,13 @@ This sits implicitly between Phase 8 (OpenClaw) and Phase 11
(multi-model). Memory-review and engineering-entity commands are
deferred from the shared client until their workflows are exercised.
## What Is Real Today (updated 2026-04-16)
## What Is Real Today (updated 2026-04-24)
- canonical AtoCore runtime on Dalidou (`775960c`, deploy.sh verified)
- canonical AtoCore runtime on Dalidou (`2b86543`, deploy.sh verified)
- 33,253 vectors across 6 registered projects
- 234 captured interactions (192 claude-code, 38 openclaw, 4 test)
- 950 captured interactions as of the 2026-04-24 live dashboard; refresh
exact live counts with
`python scripts/live_status.py`
- 6 registered projects:
- `p04-gigabit` (483 docs, 15 state entries)
- `p05-interferometer` (109 docs, 18 state entries)
@@ -138,12 +145,15 @@ deferred from the shared client until their workflows are exercised.
- `atomizer-v2` (568 docs, 5 state entries)
- `abb-space` (6 state entries)
- `atocore` (drive source, 47 state entries)
- 110 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
- 84 active memories (31 project, 23 knowledge, 10 episodic, 8 adaptation, 7 preference, 5 identity)
- 128 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
- 290 active memories and 0 candidate memories as of the 2026-04-24 live
dashboard
- context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks
- query-relevance memory ranking with overlap-density scoring
- retrieval eval harness: 18 fixtures, 17/18 passing on live
- 303 tests passing
- retrieval eval harness: 20 fixtures; current live has 18 pass, 1 known
content gap, and 1 blocking cross-project bleed guard targeted by the
current retrieval-scoping branch
- 553 tests passing on the audit-improvements branch
- nightly pipeline: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → triage → **auto-promote/expire** → weekly synth/lint → **retrieval harness****pipeline summary to project state**
- Phase 10 operational: reinforcement-based auto-promotion (ref_count ≥ 3, confidence ≥ 0.7) + stale candidate expiry (14 days unreinforced)
- pipeline health visible in dashboard: interaction totals by client, pipeline last_run, harness results, triage stats
@@ -190,9 +200,9 @@ where surfaces are disjoint, pauses when they collide.
| V1-E | Memory→entity graduation end-to-end + remaining Q-4 trust tests | pending V1-D (note: collides with memory extractor; pauses for multi-model triage work) |
| V1-F | F-5 detector generalization + route alias + O-1/O-2/O-3 operational + D-1/D-3/D-4 docs | finish line |
R14 (P2, non-blocking): `POST /entities/{id}/promote` route returns 500
on the new V1-0 `ValueError` instead of 400. Fix on branch
`claude/r14-promote-400`, pending Codex review.
R14 is closed: `POST /entities/{id}/promote` now translates
caller-fixable V1-0 provenance validation failures into HTTP 400 instead
of leaking as HTTP 500.
## Next