5 Commits

Author SHA1 Message Date
4e6fba7cb9 docs(ledger): Wave 1 session log + Orientation refresh
Records the 2026-04-29 audit findings, the Codex review, and the three
memory-write-path fixes shipped on this branch. Refreshes Orientation
with live state verified against Dalidou: build 7042eae, harness 20/20,
1054 interactions, 315/1091 dashboard/integrity memory counts (the
sampling bug behind the discrepancy is fixed in the prior commit; the
two will agree post-deploy).

Also captures the unregistered emerging-projects list (apm=63, openclaw=9,
lead-space=2, drill=1, optiques-fullum=1) so the Wave 1 follow-up
(one-click registration proposal) doesn't get lost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:41:18 -04:00
fb4d55cbcd fix(memory): SQL-aggregate dashboard counts, project on update, id-based invalidate
Three bugs surfaced by the 2026-04-29 Codex review of the state-of-the-service
plan, all in the memory write/read path:

1. /admin/dashboard memory counts were derived from a confidence-sorted
   get_memories(limit=500) sample. With prod at 1091 active memories the
   dashboard reported 315 ("active in the top 500"), while integrity
   reported the SQL aggregate 1091. Replaced the sampling block with a
   new get_memory_count_summary() helper that does straight SQL aggregates
   over status/type/project. Dashboard memories.{active,candidates,...}
   now match integrity. Adds memories.{by_status,total} for completeness.

2. PUT /memory/{id} silently dropped project changes because
   MemoryUpdateRequest had no project field and update_memory() didn't
   accept one. auto_triage.py:407 detects suggested_project drift and
   issues a PUT to fix it; the fix never landed. Added project to the
   request schema and the service signature, with resolve_project_name
   canonicalization, before/after audit snapshot, and the existing
   duplicate-active check now scoped to the new project.

3. POST /memory/{id}/invalidate did _get_memories(status="active",
   limit=1) and looked for the target inside that single
   highest-confidence row. Any other active memory 404'd. Replaced with
   a direct id lookup via the new get_memory(id) helper; status branching
   stays the same (404 unknown / 200 already-invalid / 409 wrong-status /
   200 invalidated).

Tests added (9):
- test_get_memory_count_summary_returns_full_table_aggregates
- test_get_memory_returns_single_row_or_none
- test_update_memory_can_change_project_with_canonicalization
- test_update_memory_project_unchanged_when_not_passed
- test_api_invalidate_finds_active_memory_outside_top_one
- test_api_invalidate_already_invalid_is_idempotent
- test_api_invalidate_candidate_returns_409
- test_api_invalidate_unknown_id_is_404
- test_admin_dashboard_active_count_matches_full_table

Test count: 572 -> 581. Full suite green locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:40:10 -04:00
7042eaea46 docs: record fully green retrieval harness 2026-04-24 21:04:13 -04:00
d3de9f67ea fix(context): rank trusted state by query relevance 2026-04-24 20:54:56 -04:00
0fc6705173 docs: record retrieval stabilization state 2026-04-24 20:45:05 -04:00
10 changed files with 457 additions and 78 deletions


@@ -6,26 +6,26 @@
## Orientation ## Orientation
- **live_sha** (Dalidou `/health` build_sha): `f44a211` (verified 2026-04-24T14:48:44Z post audit-improvements deploy; status=ok) - **live_sha** (Dalidou `/health` build_sha): `7042eae` (verified 2026-04-29T01:19Z; status=ok; deployed 2026-04-25T01:04Z, docs-only on top of `d3de9f6`)
- **last_updated**: 2026-04-24 by Codex (retrieval boundary deployed; project_id metadata branch started) - **last_updated**: 2026-04-29 by Claude (Wave 1 debt-pay branch open; Codex review of plan applied)
- **main_tip**: `f44a211` - **main_tip**: `7042eae`
- **test_count**: 567 on `codex/project-id-metadata-retrieval` (deployed main baseline: 553) - **test_count**: 581 on `claude/wave1-dashboard-counts-and-memory-fixes` (572 on main + 9 Wave 1 regressions)
- **harness**: `19/20 PASS` on live Dalidou, 0 blocking failures, 1 known content gap (`p04-constraints`) - **harness**: `20/20 PASS` on live Dalidou, 0 blocking failures, 0 known issues (last harness run nightly 2026-04-28T03:00:30Z)
- **vectors**: 33,253 - **vectors**: 33,253
- **active_memories**: 290 (`/admin/dashboard` 2026-04-24; note integrity panel reports a separate active_memory_count=951 and needs reconciliation) - **active_memories**: 315 dashboard / 1091 integrity (verified live 2026-04-29). Discrepancy was a dashboard sampling bug — Wave 1 commit `fb4d55c` replaces it with SQL aggregates; post-deploy the two will agree.
- **candidate_memories**: 0 (triage queue drained) - **candidate_memories**: 0 (queue drained, +103 captures since 2026-04-25)
- **interactions**: 951 (`/admin/dashboard` 2026-04-24) - **interactions**: 1054 (claude-code 474, openclaw 576; verified `/admin/dashboard` 2026-04-29)
- **registered_projects**: atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08) - **registered_projects**: atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08). **Auto-detected, unregistered**: apm (63 active memories), openclaw (9), lead-space (2), drill (1), optiques-fullum (1) — see Wave 1 follow-up below.
- **project_state_entries**: 128 across registered projects (`/admin/dashboard` 2026-04-24) - **project_state_entries**: 128 across registered projects (verified 2026-04-29)
- **entities**: 66 (up from 35 — V1-0 backfill + ongoing work; 0 open conflicts) - **entities**: 66 (V1-0 backfill complete; 0 open conflicts)
- **off_host_backup**: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron, verified - **off_host_backup**: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron, verified
- **nightly_pipeline**: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → auto-triage → **auto-promote/expire (NEW)** → weekly synth/lint Sundays → **retrieval harness (NEW)** → **pipeline summary (NEW)** - **nightly_pipeline**: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → auto-triage → auto-promote/expire → weekly synth/lint Sundays → retrieval harness → pipeline summary
- **capture_clients**: claude-code (Stop hook + cwd project inference), openclaw (before_agent_start + llm_output plugin, verified live) - **capture_clients**: claude-code (Stop hook + cwd project inference), openclaw (before_agent_start + llm_output plugin, verified live)
- **wiki**: http://dalidou:8100/wiki (browse), /wiki/projects/{id}, /wiki/entities/{id}, /wiki/search - **wiki**: http://dalidou:8100/wiki (browse), /wiki/projects/{id}, /wiki/entities/{id}, /wiki/search
- **dashboard**: http://dalidou:8100/admin/dashboard (now shows pipeline health, interaction totals by client, all registered projects) - **dashboard**: http://dalidou:8100/admin/dashboard
- **active_track**: Engineering V1 Completion (started 2026-04-22). V1-0 landed (`2712c5d`). V1-A density gate CLEARED (784 active ≫ 100 target as of 2026-04-23). V1-A soak gate at day 5/~7 (F4 first run 2026-04-19; nightly clean 2026-04-19 through 2026-04-23; failures confined to the known p04-constraints content gap). Plan: `docs/plans/engineering-v1-completion-plan.md`. Resume map: `docs/plans/v1-resume-state.md`. - **active_track**: Wave 1 debt-pay (this session) → V1-A start. V1-A gates have cleared (soak ended 2026-04-26; density 315 ≫ 100). Plan: `docs/plans/engineering-v1-completion-plan.md`. Resume map: `docs/plans/v1-resume-state.md`.
- **last_nightly_pipeline**: `2026-04-23T03:00:20Z` — harness 17/18, triage promoted=3 rejected=7 human=0, dedup 7 clusters (1 tier1 + 6 tier2 auto-merged), graduation 30-skipped 0-graduated 0-errors, auto-triage drained the queue (0 new candidates 2026-04-22T00:52Z run) - **last_nightly_pipeline**: `2026-04-28T03:00:30Z` — harness 20/20, triage promoted=1 rejected=1 human=0
- **open_branches**: none — R14 squash-merged as `0989fed` and deployed 2026-04-23T15:20:53Z. V1-A is the next scheduled work - **open_branches**: `claude/wave1-dashboard-counts-and-memory-fixes` (commit `fb4d55c`) — three memory-write-path bugs surfaced by Codex review; awaiting Codex audit before merge/deploy.
## Active Plan ## Active Plan
@@ -170,6 +170,12 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha
## Session Log ## Session Log
- **2026-04-29 Claude (Wave 1 debt-pay started; Codex review of state-of-service plan)** Audited live state on Dalidou: `/health` build_sha `7042eae` (4d old), harness 20/20, 33,253 vectors, 1,748 docs, 1054 interactions (+103/4d), dashboard memories.active=315 vs integrity.active_memory_count=1091. Drafted a state-of-service assessment + Wave 1/2/3/4 plan, then asked Codex (gpt-5.5) for an adversarial review via `codex exec`. Codex verdict: assessment MIXED, plan ENDORSE WITH CHANGES. Codex caught two factual corrections (the 315-vs-1091 gap is a *sampling bug* not a definitional gap; my "R9 drops unregistered tags" framing is wrong — `extractor_llm.py:213-233` preserves them) and two new memory-write-path bugs I missed. Branched `claude/wave1-dashboard-counts-and-memory-fixes` from `7042eae`, fixed all three: (1) `/admin/dashboard` now uses a new `get_memory_count_summary()` SQL aggregate helper instead of counting inside a confidence-sorted `get_memories(limit=500)` sample; (2) `MemoryUpdateRequest` and `update_memory()` accept `project` with `resolve_project_name` canonicalization + before/after audit, so `auto_triage.py:407` suggested-project corrections will now actually apply; (3) `POST /memory/{id}/invalidate` replaces the `_get_memories(status="active", limit=1)` lookup (which only saw the highest-confidence active row) with a direct id lookup via new `get_memory(id)` helper. 9 regression tests added across `test_memory.py` and `test_invalidate_supersede.py`. Full local suite: 581 passed (572 → 581). Commit `fb4d55c`. Branch not pushed/deployed yet — awaiting Codex audit per working model. Refreshed Orientation block (live_sha, last_updated, test_count, memory counts, capture cadence, registered/unregistered project breakdown, open_branches). 
Wave 1 follow-up still open: (W1.2) one-click registration proposal for unregistered projects with ≥10 active memories (apm=63 is overdue); (W1.3 done by this entry) sync ledger; (W1.4) committed measurable-win probe fixture+JSON output. After this branch lands, V1-A is unblocked: gates have effectively cleared (soak ended 2026-04-26; density 315 ≫ 100 target).
- **2026-04-25 Codex (p04 constraint gap closed; harness fully green)** Root-caused the remaining `p04-constraints` fixture: the `Zerodur` / `1.2` fact already existed in Trusted Project State (`requirement/key_constraints`), but project-state formatting was category/key ordered and then truncated to the 20% state budget, so contacts/decisions consumed the budget before the relevant requirement. Added query-relevance ranking for Trusted Project State entries before formatting/truncation, with regression coverage in `test_project_state_query_relevance_before_truncation`. Removed the fixture's `known_issue` lane so future p04 constraint regressions are blocking. Cleaned up a duplicate live requirement entry created during diagnosis by invalidating `requirement/mirror-blank-core-constraints`; canonical `requirement/key_constraints` remains active. Verified focused suite: 35 passed. Verified full local suite: 572 passed. Deployed `d3de9f67eaa08dfc5b2d86e8221b8c70fef266d3`; live exact p04 probe now surfaces `[REQUIREMENT] key_constraints` with `1.2` and `Zerodur`. Live retrieval harness: 20/20, 0 known issues, 0 blocking failures.
- **2026-04-25 Codex (project_id backfill + retrieval stabilization closed)** Merged `codex/project-id-metadata-retrieval` into `main` (`867a1ab`) and deployed to Dalidou. Took Chroma-inclusive backup `/srv/storage/atocore/backups/snapshots/20260424T154358Z`, then ran `scripts/backfill_chunk_project_ids.py` per project; populated projects `p04-gigabit`, `p05-interferometer`, `p06-polisher`, `atomizer-v2`, and `atocore` applied cleanly for 33,253 vectors total, with 0 missing/malformed and an immediate final dry-run showing 33,253 already tagged / 0 updates. Post-backfill harness exposed p06 memory-ranking misses (`Tailscale`, `encoder`), so Codex shipped `4744c69` then `a87d984 fix(memory): widen query-time context candidates`. Full local suite: 571 passed. Live `/health` reports `a87d9845a8c34395a02890f0cf22aa7a46afaf62`, vectors=33,253, sources_ready=true. Live retrieval harness: 19/20, 0 blocking failures, 1 known issue (`p04-constraints` missing `Zerodur` / `1.2`). A repeat backfill dry-run after the code-only stabilization deploy was aborted after the one-off container ran too long; the live service stayed healthy and the earlier post-apply idempotency result remains the migration acceptance record. Dalidou HTTP push credentials are still not configured; this session pushed through the Windows credential path.
- **2026-04-24 Codex (retrieval boundary deployed + project_id metadata tranche)** Merged `codex/audit-improvements-foundation` to `main` as `f44a211` and pushed to Dalidou Gitea. Took pre-deploy runtime backup `/srv/storage/atocore/backups/snapshots/20260424T144810Z` (DB + registry, no Chroma). Deployed via `papa@dalidou` canonical `deploy/dalidou/deploy.sh`; live `/health` reports build_sha `f44a2114970008a7eec4e7fc2860c8f072914e38`, build_time `2026-04-24T14:48:44Z`, status ok. Post-deploy retrieval harness: 20 fixtures, 19 pass, 0 blocking failures, 1 known issue (`p04-constraints`). The former blocker `p05-broad-status-no-atomizer` now passes. Manual p05 `context-build "current status"` spot check shows no p04/Atomizer source bleed in retrieved chunks. Started follow-up branch `codex/project-id-metadata-retrieval`: registered-project ingestion now writes explicit `project_id` into DB chunk metadata and Chroma vector metadata; retrieval prefers exact `project_id` when present and keeps path/tag matching as legacy fallback; added dry-run-by-default `scripts/backfill_chunk_project_ids.py` to backfill SQLite + Chroma metadata; added tests for project-id ingestion, registered refresh propagation, exact project-id retrieval, and collision fallback. Verified targeted suite (`test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 36 passed. Verified full suite: 556 passed in 72.44s. Branch not merged or deployed yet. - **2026-04-24 Codex (retrieval boundary deployed + project_id metadata tranche)** Merged `codex/audit-improvements-foundation` to `main` as `f44a211` and pushed to Dalidou Gitea. Took pre-deploy runtime backup `/srv/storage/atocore/backups/snapshots/20260424T144810Z` (DB + registry, no Chroma). Deployed via `papa@dalidou` canonical `deploy/dalidou/deploy.sh`; live `/health` reports build_sha `f44a2114970008a7eec4e7fc2860c8f072914e38`, build_time `2026-04-24T14:48:44Z`, status ok. 
Post-deploy retrieval harness: 20 fixtures, 19 pass, 0 blocking failures, 1 known issue (`p04-constraints`). The former blocker `p05-broad-status-no-atomizer` now passes. Manual p05 `context-build "current status"` spot check shows no p04/Atomizer source bleed in retrieved chunks. Started follow-up branch `codex/project-id-metadata-retrieval`: registered-project ingestion now writes explicit `project_id` into DB chunk metadata and Chroma vector metadata; retrieval prefers exact `project_id` when present and keeps path/tag matching as legacy fallback; added dry-run-by-default `scripts/backfill_chunk_project_ids.py` to backfill SQLite + Chroma metadata; added tests for project-id ingestion, registered refresh propagation, exact project-id retrieval, and collision fallback. Verified targeted suite (`test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 36 passed. Verified full suite: 556 passed in 72.44s. Branch not merged or deployed yet.
- **2026-04-24 Codex (project_id audit response)** Applied independent-audit fixes on `codex/project-id-metadata-retrieval`. Closed the nightly `/ingest/sources` clobber risk by adding registry-level `derive_project_id_for_path()` and making unscoped `ingest_file()` derive ownership from registered ingest roots when possible; `refresh_registered_project()` still passes the canonical project id directly. Changed retrieval so empty `project_id` falls through to legacy path/tag ownership instead of short-circuiting as unowned. Hardened `scripts/backfill_chunk_project_ids.py`: `--apply` now requires `--chroma-snapshot-confirmed`, runs Chroma metadata updates before SQLite writes, batches updates, skips/report missing vectors, skips/report malformed metadata, reports already-tagged rows, and turns missing ingestion tables into a JSON `db_warning` instead of a traceback. Added tests for auto-derive ingestion, empty-project fallback, ingest-root overlap rejection, and backfill dry-run/apply/snapshot/missing-vector/malformed cases. Verified targeted suite (`test_backfill_chunk_project_ids.py`, `test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 45 passed. Verified full suite: 565 passed in 73.16s. Local dry-run on empty/default data returns 0 updates with `db_warning` rather than crashing. Branch still not merged/deployed. - **2026-04-24 Codex (project_id audit response)** Applied independent-audit fixes on `codex/project-id-metadata-retrieval`. Closed the nightly `/ingest/sources` clobber risk by adding registry-level `derive_project_id_for_path()` and making unscoped `ingest_file()` derive ownership from registered ingest roots when possible; `refresh_registered_project()` still passes the canonical project id directly. Changed retrieval so empty `project_id` falls through to legacy path/tag ownership instead of short-circuiting as unowned. 
Hardened `scripts/backfill_chunk_project_ids.py`: `--apply` now requires `--chroma-snapshot-confirmed`, runs Chroma metadata updates before SQLite writes, batches updates, skips/report missing vectors, skips/report malformed metadata, reports already-tagged rows, and turns missing ingestion tables into a JSON `db_warning` instead of a traceback. Added tests for auto-derive ingestion, empty-project fallback, ingest-root overlap rejection, and backfill dry-run/apply/snapshot/missing-vector/malformed cases. Verified targeted suite (`test_backfill_chunk_project_ids.py`, `test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 45 passed. Verified full suite: 565 passed in 73.16s. Local dry-run on empty/default data returns 0 updates with `db_warning` rather than crashing. Branch still not merged/deployed.
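
The dashboard undercount described in the 2026-04-29 entry can be reproduced in miniature. The sketch below is illustrative only: the `memories` table layout, its column names, and the functions `build_demo_db`, `sampled_active_count`, and `count_summary` are assumptions made for the example, not AtoCore's schema or its `get_memory_count_summary()` implementation. It contrasts counting inside a confidence-sorted, capped sample with counting via SQL aggregates over the full table.

```python
import sqlite3


def build_demo_db(active_rows: int = 1091, candidate_rows: int = 3) -> sqlite3.Connection:
    """Create an in-memory table with a hypothetical memories schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories ("
        "id INTEGER PRIMARY KEY, status TEXT, memory_type TEXT, "
        "project TEXT, confidence REAL, reference_count INTEGER)"
    )
    rows = []
    for i in range(active_rows):
        memory_type = "knowledge" if i % 2 else "preference"
        project = "p04" if i % 3 else ""  # empty string stands in for "no project"
        rows.append(("active", memory_type, project, 0.5, i % 4))
    for _ in range(candidate_rows):
        rows.append(("candidate", "knowledge", "p04", 0.9, 0))
    conn.executemany(
        "INSERT INTO memories (status, memory_type, project, confidence, reference_count) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def sampled_active_count(conn: sqlite3.Connection, limit: int = 500) -> int:
    """Old pattern: count active rows inside a confidence-sorted, capped sample."""
    rows = conn.execute(
        "SELECT status FROM memories ORDER BY confidence DESC LIMIT ?", (limit,)
    ).fetchall()
    return sum(1 for (status,) in rows if status == "active")


def count_summary(conn: sqlite3.Connection) -> dict:
    """New pattern: aggregate over the whole table, so no row is invisible."""
    by_status = dict(
        conn.execute("SELECT status, COUNT(*) FROM memories GROUP BY status").fetchall()
    )
    by_type = dict(
        conn.execute(
            "SELECT memory_type, COUNT(*) FROM memories "
            "WHERE status = 'active' GROUP BY memory_type"
        ).fetchall()
    )
    by_project = dict(
        conn.execute(
            "SELECT COALESCE(NULLIF(project, ''), '(none)') AS p, COUNT(*) "
            "FROM memories WHERE status = 'active' GROUP BY p"
        ).fetchall()
    )
    (reinforced,) = conn.execute(
        "SELECT COUNT(*) FROM memories WHERE status = 'active' AND reference_count > 0"
    ).fetchone()
    return {
        "total": sum(by_status.values()),
        "by_status": by_status,
        "active": {
            "total": by_status.get("active", 0),
            "by_type": by_type,
            "by_project": by_project,
            "reinforced": reinforced,
        },
    }
```

With more rows than the cap, the sampled count can never exceed `limit`, while the aggregate count is independent of how many rows exist. The returned dictionary shape loosely mirrors the keys the dashboard diff reads (`counts["active"]["total"]`, `counts["by_status"]`, `counts["total"]`).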


@@ -1,11 +1,18 @@
# AtoCore - Current State (2026-04-24) # AtoCore - Current State (2026-04-25)
Update 2026-04-24: audit-improvements deployed as `f44a211`; live harness is Update 2026-04-25: project-id chunk/vector metadata is deployed and backfilled,
19/20 with 0 blocking failures and 1 known content gap. Active follow-up branch and the final p04 Trusted Project State budget/ranking gap is closed. Live
`codex/project-id-metadata-retrieval` is at 567 passing tests. Dalidou is on `d3de9f6`; `/health` is ok with 33,253 vectors and sources
ready. Live retrieval harness is 20/20 with 0 blocking failures and 0 known
issues. Full local suite: 572 passed.
Live deploy: `2b86543` · Dalidou health: ok · Harness: 18/20 with 1 known The project-id backfill was applied per populated project after a
content gap and 1 current blocking project-bleed guard · Tests: 553 passing. Chroma-inclusive backup at
`/srv/storage/atocore/backups/snapshots/20260424T154358Z`. The immediate
post-apply dry-run reported 33,253 already tagged, 0 updates, 0 missing, and 0
malformed. A later repeat dry-run after the code-only ranking deploy was
aborted because the one-off container ran too long; the earlier post-apply
idempotency result remains the migration acceptance record.
## V1-0 landed 2026-04-22 ## V1-0 landed 2026-04-22
@@ -69,10 +76,10 @@ Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs
| 7G | Re-extraction on prompt version bump | pending | | 7G | Re-extraction on prompt version bump | pending |
| 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending | | 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |
## Known gaps (honest, refreshed 2026-04-24) ## Known gaps (honest, refreshed 2026-04-25)
1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below. 1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
2. **Project-scoped retrieval guard is deployed and passing.** The April 24 p05 broad-status bleed guard now passes on live Dalidou. The active follow-up branch adds explicit `project_id` chunk/vector metadata so the deployed path/tag heuristic can become a legacy fallback. 2. **Project-scoped retrieval guard is deployed and passing.** Explicit `project_id` chunk/vector metadata is now present in SQLite and Chroma for the 33,253-vector corpus. Retrieval prefers exact metadata ownership and keeps path/tag matching as a legacy fallback.
3. **Human interface is useful but not yet the V1 Human Mirror.** Wiki/dashboard pages exist, but the spec routes, deterministic mirror files, disputed markers, and curated annotations remain V1-D work. 3. **Human interface is useful but not yet the V1 Human Mirror.** Wiki/dashboard pages exist, but the spec routes, deterministic mirror files, disputed markers, and curated annotations remain V1-D work.
4. **Harness known issue:** `p04-constraints` wants "Zerodur" and "1.2"; live retrieval surfaces related constraints but not those exact strings. Treat as content/state gap until fixed. 4. **Harness is currently green.** The former `p04-constraints` known issue is closed; query-relevant Trusted Project State entries now rank before state-budget truncation.
5. **Formal docs lag the ledger during fast work.** Use `DEV-LEDGER.md` and `python scripts/live_status.py` for live truth, then copy verified claims into these docs. 5. **Formal docs lag the ledger during fast work.** Use `DEV-LEDGER.md` and `python scripts/live_status.py` for live truth, then copy verified claims into these docs.
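
Item 4 above turns on ordering: if entries are formatted in category/key order and then cut at a budget, a relevant requirement can be truncated away before it is ever emitted. The following is a minimal sketch of ranking before truncation. The `StateEntry` class, the token-overlap score, the character budget, and the demo entries are all hypothetical and chosen for illustration; the real ranking in AtoCore may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class StateEntry:
    category: str
    key: str
    value: str


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; '.' is kept so "1.2" stays one token.
    return set(re.findall(r"[a-z0-9.]+", text.lower()))


def rank_by_query(entries: list[StateEntry], query: str) -> list[StateEntry]:
    """Order entries by token overlap with the query (assumed scoring rule)."""
    query_tokens = _tokens(query)

    def score(entry: StateEntry) -> int:
        return len(query_tokens & _tokens(f"{entry.category} {entry.key} {entry.value}"))

    # sorted() is stable, so ties keep their original category/key order.
    return sorted(entries, key=score, reverse=True)


def format_within_budget(entries: list[StateEntry], budget_chars: int) -> str:
    """Emit entries in the given order until the character budget is exhausted."""
    lines: list[str] = []
    used = 0
    for entry in entries:
        line = f"[{entry.category.upper()}] {entry.key}: {entry.value}"
        if used + len(line) + 1 > budget_chars:
            break
        lines.append(line)
        used += len(line) + 1
    return "\n".join(lines)


# Demo data is invented; only the category/key naming echoes the ledger.
demo_entries = [
    StateEntry("contact", "lead", "Primary contact is the optics vendor account manager"),
    StateEntry("decision", "vendor", "Selected vendor B for coating"),
    StateEntry("requirement", "key_constraints", "Zerodur / 1.2 (placeholder text)"),
]
unranked = format_within_budget(demo_entries, budget_chars=120)
ranked = format_within_budget(
    rank_by_query(demo_entries, "p04 key constraints Zerodur"), budget_chars=120
)
```

In this toy run the unranked output spends the budget on the contact and decision entries, while the ranked output leads with the requirement. Making the harness fixture blocking, as the `known_issue` removal does, guards against a regression to the first behavior.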


@@ -131,10 +131,12 @@ This sits implicitly between Phase 8 (OpenClaw) and Phase 11
(multi-model). Memory-review and engineering-entity commands are (multi-model). Memory-review and engineering-entity commands are
deferred from the shared client until their workflows are exercised. deferred from the shared client until their workflows are exercised.
## What Is Real Today (updated 2026-04-24) ## What Is Real Today (updated 2026-04-25)
- canonical AtoCore runtime on Dalidou (`2b86543`, deploy.sh verified) - canonical AtoCore runtime on Dalidou (`d3de9f6`, deploy.sh verified)
- 33,253 vectors across 6 registered projects - 33,253 vectors across 6 registered projects, with explicit `project_id`
metadata backfilled into SQLite and Chroma after snapshot
`/srv/storage/atocore/backups/snapshots/20260424T154358Z`
- 951 captured interactions as of the 2026-04-24 live dashboard; refresh - 951 captured interactions as of the 2026-04-24 live dashboard; refresh
exact live counts with exact live counts with
`python scripts/live_status.py` `python scripts/live_status.py`
@@ -149,10 +151,13 @@ deferred from the shared client until their workflows are exercised.
- 290 active memories and 0 candidate memories as of the 2026-04-24 live - 290 active memories and 0 candidate memories as of the 2026-04-24 live
dashboard dashboard
- context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks - context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks
- query-relevance memory ranking with overlap-density scoring - query-relevance memory ranking with overlap-density scoring and widened
- retrieval eval harness: 20 fixtures; current live has 19 pass, 1 known query-time candidate pools so older exact-intent project memories can rank
content gap, and 0 blocking failures after the audit-improvements deploy ahead of generic high-confidence notes
- 567 tests passing on the active `codex/project-id-metadata-retrieval` branch - query-relevance Trusted Project State ranking before state-budget truncation
- retrieval eval harness: 20 fixtures; current live has 20 pass, 0 known
issues, and 0 blocking failures
- 572 tests passing on `main`
- nightly pipeline: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → triage → **auto-promote/expire** → weekly synth/lint → **retrieval harness** → **pipeline summary to project state** - nightly pipeline: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → triage → **auto-promote/expire** → weekly synth/lint → **retrieval harness** → **pipeline summary to project state**
- Phase 10 operational: reinforcement-based auto-promotion (ref_count ≥ 3, confidence ≥ 0.7) + stale candidate expiry (14 days unreinforced) - Phase 10 operational: reinforcement-based auto-promotion (ref_count ≥ 3, confidence ≥ 0.7) + stale candidate expiry (14 days unreinforced)
- pipeline health visible in dashboard: interaction totals by client, pipeline last_run, harness results, triage stats - pipeline health visible in dashboard: interaction totals by client, pipeline last_run, harness results, triage stats
@@ -173,9 +178,10 @@ These are the current practical priorities.
Target: 100+ active memories. Target: 100+ active memories.
3. **Multi-model triage** (Phase 11 entry) — switch auto-triage to a 3. **Multi-model triage** (Phase 11 entry) — switch auto-triage to a
different model than the extractor for independent validation different model than the extractor for independent validation
4. **Fix p04-constraints harness failure** — retrieval doesn't surface 4. **Fix Dalidou Git credentials** — the host checkout can fetch but cannot
"Zerodur" for p04 constraint queries. Investigate if it's a missing push to Gitea over HTTP in non-interactive SSH sessions. Prefer switching
memory or retrieval ranking issue. the deploy checkout to a Gitea SSH key; PAT-backed `credential.helper store`
is the fallback.
## Active — Engineering V1 Completion Track (started 2026-04-22) ## Active — Engineering V1 Completion Track (started 2026-04-22)


@@ -27,8 +27,7 @@
"expect_absent": [ "expect_absent": [
"polisher suite" "polisher suite"
], ],
"known_issue": true, "notes": "Regression guard: query-relevant Trusted Project State requirements must survive the project-state budget cap."
"notes": "Known content gap as of 2026-04-24: live retrieval surfaces related constraints but not the exact Zerodur / 1.2 strings. Keep visible, but do not make nightly harness red until the source/state gap is fixed."
}, },
{ {
"name": "p04-short-ambiguous", "name": "p04-short-ambiguous",


@@ -303,6 +303,7 @@ class MemoryUpdateRequest(BaseModel):
memory_type: str | None = None memory_type: str | None = None
domain_tags: list[str] | None = None domain_tags: list[str] | None = None
valid_until: str | None = None valid_until: str | None = None
project: str | None = None
class ProjectStateSetRequest(BaseModel): class ProjectStateSetRequest(BaseModel):
@@ -636,6 +637,7 @@ def api_update_memory(memory_id: str, req: MemoryUpdateRequest) -> dict:
memory_type=req.memory_type, memory_type=req.memory_type,
domain_tags=req.domain_tags, domain_tags=req.domain_tags,
valid_until=req.valid_until, valid_until=req.valid_until,
project=req.project,
) )
except ValueError as e: except ValueError as e:
raise HTTPException(status_code=400, detail=str(e)) raise HTTPException(status_code=400, detail=str(e))
@@ -794,33 +796,25 @@ def api_invalidate_memory(
req: MemoryInvalidateRequest | None = None, req: MemoryInvalidateRequest | None = None,
) -> dict: ) -> dict:
"""Retract an active memory (Issue E — active → invalid).""" """Retract an active memory (Issue E — active → invalid)."""
from atocore.memory.service import get_memories as _get_memories, invalidate_memory from atocore.memory.service import get_memory, invalidate_memory
reason = req.reason if req else "" reason = req.reason if req else ""
# Quick existence/status check for a clean 404 vs 409. # Direct id lookup — earlier code used get_memories(status='active', limit=1)
existing = [ # which only saw the highest-confidence active row, so any other active
m for m in _get_memories(status="active", limit=1) # memory would 404 here even though it existed.
if m.id == memory_id target = get_memory(memory_id)
] if target is None:
if not existing:
# Fall through to generic not-active if the id exists in another status.
all_match = [
m for m in _get_memories(status="candidate", limit=5000)
+ _get_memories(status="invalid", limit=5000)
+ _get_memories(status="superseded", limit=5000)
if m.id == memory_id
]
if all_match:
if all_match[0].status == "invalid":
return {"status": "already_invalid", "id": memory_id}
raise HTTPException(
status_code=409,
detail=(
f"Memory {memory_id} is {all_match[0].status}; "
"use /reject for candidates"
),
)
raise HTTPException(status_code=404, detail=f"Memory not found: {memory_id}") raise HTTPException(status_code=404, detail=f"Memory not found: {memory_id}")
if target.status == "invalid":
return {"status": "already_invalid", "id": memory_id}
if target.status != "active":
raise HTTPException(
status_code=409,
detail=(
f"Memory {memory_id} is {target.status}; "
"use /reject for candidates"
),
)
success = invalidate_memory(memory_id, actor="api-http", reason=reason) success = invalidate_memory(memory_id, actor="api-http", reason=reason)
if not success: if not success:
@@ -1280,16 +1274,20 @@ def api_dashboard() -> dict:
health beyond the basic /health endpoint. health beyond the basic /health endpoint.
""" """
import json as _json import json as _json
from collections import Counter
from datetime import datetime as _dt, timezone as _tz from datetime import datetime as _dt, timezone as _tz
all_memories = get_memories(active_only=False, limit=500) from atocore.memory.service import get_memory_count_summary
active = [m for m in all_memories if m.status == "active"]
candidates = [m for m in all_memories if m.status == "candidate"]
type_counts = dict(Counter(m.memory_type for m in active)) # SQL-backed counts. Earlier code derived these by sampling the top
project_counts = dict(Counter(m.project or "(none)" for m in active)) # 500 rows of get_memories() ordered by confidence — anything past
reinforced = [m for m in active if m.reference_count > 0] # the cap was invisible, so /admin/dashboard silently undercounted
# active memories once the corpus crossed ~500 active rows.
counts = get_memory_count_summary()
active_total = counts["active"]["total"]
candidate_total = counts["by_status"].get("candidate", 0)
type_counts = counts["active"]["by_type"]
project_counts = counts["active"]["by_project"]
reinforced_total = counts["active"]["reinforced"]
# Interaction stats — total + by_client from DB directly # Interaction stats — total + by_client from DB directly
interaction_stats: dict = {"most_recent": None, "total": 0, "by_client": {}} interaction_stats: dict = {"most_recent": None, "total": 0, "by_client": {}}
@@ -1402,13 +1400,13 @@ def api_dashboard() -> dict:
# Triage queue health
triage: dict = {
- "pending": len(candidates),
+ "pending": candidate_total,
"review_url": "/admin/triage",
}
- if len(candidates) > 50:
- triage["warning"] = f"High queue: {len(candidates)} candidates pending review."
- elif len(candidates) > 20:
- triage["notice"] = f"{len(candidates)} candidates awaiting triage."
+ if candidate_total > 50:
+ triage["warning"] = f"High queue: {candidate_total} candidates pending review."
+ elif candidate_total > 20:
+ triage["notice"] = f"{candidate_total} candidates awaiting triage."
# Recent audit activity (Phase 4 V1): last 10 mutations for operator
recent_audit: list[dict] = []
@@ -1420,11 +1418,13 @@ def api_dashboard() -> dict:
return {
"memories": {
- "active": len(active),
- "candidates": len(candidates),
+ "active": active_total,
+ "candidates": candidate_total,
"by_type": type_counts,
"by_project": project_counts,
- "reinforced": len(reinforced),
+ "reinforced": reinforced_total,
+ "by_status": counts["by_status"],
+ "total": counts["total"],
},
"project_state": {
"counts": ps_counts,


@@ -11,7 +11,7 @@ from dataclasses import dataclass, field
from pathlib import Path
import atocore.config as _config
- from atocore.context.project_state import format_project_state, get_state
+ from atocore.context.project_state import ProjectStateEntry, format_project_state, get_state
from atocore.memory.service import get_memories_for_context
from atocore.observability.logger import get_logger
from atocore.engineering.service import get_entities, get_entity_with_context
@@ -116,6 +116,11 @@ def build_context(
if canonical_project:
state_entries = get_state(canonical_project)
if state_entries:
+ state_entries = _rank_project_state_entries(
+ state_entries,
+ query=user_prompt,
+ project=canonical_project,
+ )
project_state_text = format_project_state(state_entries)
project_state_text, project_state_chars = _truncate_text_block(
project_state_text,
@@ -284,6 +289,55 @@ def get_last_context_pack() -> ContextPack | None:
return _last_context_pack
def _rank_project_state_entries(
entries: list[ProjectStateEntry],
query: str,
project: str,
) -> list[ProjectStateEntry]:
"""Promote query-relevant trusted state before the state band is truncated."""
if not query or len(entries) <= 1:
return entries
from atocore.memory.reinforcement import _normalize, _tokenize
query_text = _normalize(query.replace("_", " "))
query_tokens = set(_tokenize(query_text))
query_tokens -= {
"how",
"what",
"when",
"where",
"which",
"who",
"why",
"current",
"status",
"project",
}
for part in (project or "").lower().replace("_", "-").split("-"):
query_tokens.discard(part)
if not query_tokens:
return entries
scored: list[tuple[int, float, float, int, ProjectStateEntry]] = []
for index, entry in enumerate(entries):
entry_text = " ".join(
[
entry.category,
entry.key.replace("_", " "),
entry.value,
entry.source,
]
)
entry_tokens = _tokenize(_normalize(entry_text))
overlap = len(entry_tokens & query_tokens) if entry_tokens else 0
density = overlap / len(entry_tokens) if entry_tokens else 0.0
scored.append((overlap, density, entry.confidence, -index, entry))
scored.sort(key=lambda item: (item[0], item[1], item[2], item[3]), reverse=True)
return [entry for _, _, _, _, entry in scored]
def _rank_chunks(
candidates: list[ChunkResult],
project_hint: str | None,
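Review note: the ordering used by `_rank_project_state_entries` is a lexicographic sort on `(overlap, density, confidence, -index)`. The sketch below illustrates that tuple ordering on plain `(text, confidence)` pairs. The `tokenize` helper and the stopword set are stand-ins invented for this example; they are not the real `_tokenize` / `_normalize` from `atocore.memory.reinforcement`, whose behavior is not shown in this diff.

```python
import re

# Hypothetical stopword list for the example only.
STOPWORDS = frozenset({"what", "are", "the"})


def tokenize(text: str) -> set[str]:
    """Stand-in tokenizer: lowercase alphanumeric runs as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank(entries: list[tuple[str, float]], query: str) -> list[str]:
    """Order entries by token overlap, then density, confidence, and input order."""
    query_tokens = tokenize(query) - STOPWORDS
    if not query_tokens or len(entries) <= 1:
        return [text for text, _ in entries]
    scored = []
    for index, (text, confidence) in enumerate(entries):
        tokens = tokenize(text)
        overlap = len(tokens & query_tokens)
        density = overlap / len(tokens) if tokens else 0.0
        # -index keeps earlier entries first among ties once reversed.
        scored.append((overlap, density, confidence, -index, text))
    scored.sort(key=lambda item: item[:4], reverse=True)
    return [item[4] for item in scored]


entries = [
    ("vendor contact for polishing", 1.0),
    ("mirror mass constraints below 103.5 kg", 1.0),
    ("back structure decision", 1.0),
]
ranked = rank(entries, "what are the key mirror constraints")
print(ranked)
```

The entry sharing two query tokens moves to the front; the two entries with no overlap keep their original relative order because `-index` breaks the tie.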


@@ -347,6 +347,83 @@ def get_memories(
return [_row_to_memory(r) for r in rows]
def get_memory(memory_id: str) -> Memory | None:
"""Return a single memory by id, or None if missing.
Direct id lookup (no LIMIT, no confidence ordering) — the right
primitive for routes that need to check a specific memory's status
before acting. Avoids the sampling pitfall where ``get_memories``
with a small ``limit`` could hide a target row sorted past the cap.
"""
with get_connection() as conn:
row = conn.execute(
"SELECT * FROM memories WHERE id = ?", (memory_id,)
).fetchone()
return _row_to_memory(row) if row else None
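Review note: the pitfall named in the docstring above can be shown with two rows. This is a minimal sketch against a hypothetical three-column table, not the service's schema; `found_via_top_row` mimics a `limit=1` fetch sorted by confidence, and `found_via_id` mimics a direct id lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id TEXT PRIMARY KEY, status TEXT, confidence REAL)"
)
conn.executemany(
    "INSERT INTO memories VALUES (?, ?, ?)",
    [("high", "active", 0.99), ("low", "active", 0.55)],
)


def found_via_top_row(conn: sqlite3.Connection, memory_id: str) -> bool:
    """Look for the id inside a one-row, confidence-sorted sample."""
    rows = conn.execute(
        "SELECT id FROM memories WHERE status = 'active' "
        "ORDER BY confidence DESC LIMIT 1"
    ).fetchall()
    return any(row[0] == memory_id for row in rows)


def found_via_id(conn: sqlite3.Connection, memory_id: str) -> bool:
    """Fetch the row by primary key, then check its status."""
    row = conn.execute(
        "SELECT status FROM memories WHERE id = ?", (memory_id,)
    ).fetchone()
    return row is not None and row[0] == "active"


print(found_via_top_row(conn, "low"), found_via_id(conn, "low"))
```

The sampled lookup only ever sees the single highest-confidence row, so the lower-confidence active memory looks missing; the id lookup finds it regardless of sort position.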
def get_memory_count_summary() -> dict:
"""Aggregate memory counts straight from SQL (no sampling).
Returned shape:
{
"total": int, # all rows
"by_status": {status: int, ...}, # full table
"active": {
"total": int,
"reinforced": int, # active with reference_count > 0
"by_type": {memory_type: int, ...},
"by_project": {project_or_none: int, ...},
},
}
Distinct from ``get_memories(...)``, which is a row-fetcher with a
confidence-sorted LIMIT and is therefore not safe for counting.
"""
summary: dict = {
"total": 0,
"by_status": {},
"active": {
"total": 0,
"reinforced": 0,
"by_type": {},
"by_project": {},
},
}
with get_connection() as conn:
row = conn.execute("SELECT count(*) FROM memories").fetchone()
summary["total"] = row[0] if row else 0
rows = conn.execute(
"SELECT status, count(*) FROM memories GROUP BY status"
).fetchall()
summary["by_status"] = {r[0]: r[1] for r in rows}
active_total = summary["by_status"].get("active", 0)
summary["active"]["total"] = active_total
rows = conn.execute(
"SELECT memory_type, count(*) FROM memories "
"WHERE status = 'active' GROUP BY memory_type"
).fetchall()
summary["active"]["by_type"] = {r[0]: r[1] for r in rows}
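# Group on the label expression itself so NULL and '' projects
# collapse into a single "(none)" row instead of two same-labelled rows.
rows = conn.execute(
"SELECT COALESCE(NULLIF(project, ''), '(none)') AS project_label, count(*) "
"FROM memories WHERE status = 'active' "
"GROUP BY COALESCE(NULLIF(project, ''), '(none)')"
).fetchall()
summary["active"]["by_project"] = {r[0]: r[1] for r in rows}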
row = conn.execute(
"SELECT count(*) FROM memories "
"WHERE status = 'active' AND reference_count > 0"
).fetchone()
summary["active"]["reinforced"] = row[0] if row else 0
return summary
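Review note on the `by_project` aggregate: the label is computed with `COALESCE(NULLIF(project, ''), '(none)')`, so NULL and empty-string projects share the `(none)` label. If the `GROUP BY` term resolves to the raw column rather than the label, those two cases can come back as separate rows with the same label, and a dict comprehension would keep only one of the counts. Grouping on the label expression avoids relying on how the name is resolved. A minimal sketch, assuming a simplified hypothetical table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, status TEXT, project TEXT)"
)
conn.executemany(
    "INSERT INTO memories (status, project) VALUES (?, ?)",
    [
        ("active", None),           # NULL project
        ("active", ""),             # empty-string project
        ("active", "p04-gigabit"),
        ("invalid", "p04-gigabit"),  # excluded by the status filter
    ],
)

# The same expression is used for the label and for the grouping key.
LABEL = "COALESCE(NULLIF(project, ''), '(none)')"
rows = conn.execute(
    f"SELECT {LABEL} AS project_label, count(*) "
    f"FROM memories WHERE status = 'active' GROUP BY {LABEL}"
).fetchall()
by_project = dict(rows)
print(by_project)
```

Both the NULL and the empty-string row land in one `(none)` bucket with a count of 2.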
def update_memory(
memory_id: str,
content: str | None = None,
@@ -355,6 +432,7 @@ def update_memory(
memory_type: str | None = None,
domain_tags: list[str] | None = None,
valid_until: str | None = None,
+ project: str | None = None,
actor: str = "api",
note: str = "",
) -> bool:
@@ -368,6 +446,10 @@ def update_memory(
next_content = content if content is not None else existing["content"]
next_status = status if status is not None else existing["status"]
+ next_project = (
+ resolve_project_name(project) if project is not None
+ else (existing["project"] or "")
+ )
if confidence is not None:
_validate_confidence(confidence)
@@ -375,7 +457,7 @@ def update_memory(
duplicate = conn.execute(
"SELECT id FROM memories "
"WHERE memory_type = ? AND content = ? AND project = ? AND status = 'active' AND id != ?",
- (existing["memory_type"], next_content, existing["project"] or "", memory_id),
+ (existing["memory_type"], next_content, next_project, memory_id),
).fetchone()
if duplicate:
raise ValueError("Update would create a duplicate active memory")
@@ -386,6 +468,7 @@ def update_memory(
"status": existing["status"],
"confidence": existing["confidence"],
"memory_type": existing["memory_type"],
+ "project": existing["project"] or "",
}
after_snapshot = dict(before_snapshot)
@@ -422,6 +505,10 @@ def update_memory(
updates.append("valid_until = ?")
params.append(vu)
after_snapshot["valid_until"] = vu or ""
+ if project is not None:
+ updates.append("project = ?")
+ params.append(next_project)
+ after_snapshot["project"] = next_project
if not updates:
return False


@@ -143,6 +143,52 @@ def test_project_state_respects_total_budget(tmp_data_dir, sample_markdown):
assert len(pack.formatted_context) <= 120
def test_project_state_query_relevance_before_truncation(tmp_data_dir, sample_markdown):
"""Relevant trusted state should survive the project-state budget cap."""
init_db()
init_project_state_schema()
ingest_file(sample_markdown)
set_state(
"p04-gigabit",
"contact",
"abb-space",
"ABB Space is the primary vendor contact for polishing, CCP, IBF, procurement coordination, "
"contract administration, interface planning, and delivery discussions.",
)
set_state(
"p04-gigabit",
"decision",
"back-structure",
"Option B selected: conical isogrid back structure with variable rib density. "
"Chosen over flat-back for stiffness-to-weight ratio and manufacturability.",
)
set_state(
"p04-gigabit",
"decision",
"polishing-vendor",
"ABB Space selected as polishing vendor. Contract includes computer-controlled polishing "
"and ion beam figuring.",
)
set_state(
"p04-gigabit",
"requirement",
"key_constraints",
"The program targets a 1.2 m lightweight Zerodur mirror with filtered mechanical WFE below 15 nm "
"and mass below 103.5 kg.",
)
pack = build_context(
"what are the key GigaBIT M1 program constraints",
project_hint="p04-gigabit",
budget=3000,
)
assert "Zerodur" in pack.formatted_context
assert "1.2" in pack.formatted_context
assert pack.formatted_context.find("[REQUIREMENT]") < pack.formatted_context.find("[CONTACT]")
def test_project_hint_matches_state_case_insensitively(tmp_data_dir, sample_markdown):
"""Project state lookup should not depend on exact casing."""
init_db()


@@ -192,3 +192,91 @@ def test_v1_aliases_present(env):
"/v1/memory/{memory_id}/supersede",
):
assert p in paths, f"{p} missing"
# ---------------------------------------------------------------------------
# Wave 1 (2026-04-29) — invalidation route used to do
# `_get_memories(status='active', limit=1)` and look for the target id
# inside that single highest-confidence row, so any active memory
# outside slot 0 fell through as 404. Direct id lookup fixes it.
# ---------------------------------------------------------------------------
def test_api_invalidate_finds_active_memory_outside_top_one(env):
"""An active memory not at the top of the confidence sort must still
be invalidatable via POST /memory/{id}/invalidate."""
high = create_memory(
memory_type="knowledge",
content="high-confidence top row",
confidence=0.99,
)
low = create_memory(
memory_type="knowledge",
content="lower-confidence target",
confidence=0.55,
)
client = TestClient(app)
r = client.post(f"/memory/{low.id}/invalidate", json={"reason": "wave1 regression"})
assert r.status_code == 200, r.text
assert r.json()["status"] == "invalidated"
# And confirm the high-confidence row is untouched
assert _get_memory(high.id).status == "active"
assert _get_memory(low.id).status == "invalid"
def test_api_invalidate_already_invalid_is_idempotent(env):
m = create_memory(memory_type="knowledge", content="already invalid")
client = TestClient(app)
r1 = client.post(f"/memory/{m.id}/invalidate", json={"reason": "first"})
assert r1.status_code == 200
r2 = client.post(f"/memory/{m.id}/invalidate", json={"reason": "again"})
assert r2.status_code == 200
assert r2.json()["status"] == "already_invalid"
def test_api_invalidate_candidate_returns_409(env):
m = create_memory(
memory_type="knowledge", content="candidate route", status="candidate"
)
client = TestClient(app)
r = client.post(f"/memory/{m.id}/invalidate", json={"reason": "wrong route"})
assert r.status_code == 409
def test_api_invalidate_unknown_id_is_404(env):
client = TestClient(app)
r = client.post("/memory/no-such-id/invalidate", json={"reason": "ghost"})
assert r.status_code == 404
def test_admin_dashboard_active_count_matches_full_table(env):
"""/admin/dashboard memories.active must match the SQL aggregate even
when there are more active memories than the legacy sample limit (500).
This guards the Codex finding that the dashboard was deriving counts
from a confidence-sorted limit=500 fetch, hiding rows past the cap.
We don't need 500 rows in the test — a small corpus that exercises
the SQL-aggregate path is enough; the integrity-vs-dashboard equality
is the invariant being asserted.
"""
# Mix of statuses to exercise the by_status aggregate
create_memory(memory_type="knowledge", content="a")
create_memory(memory_type="knowledge", content="b", project="p06-polisher")
create_memory(memory_type="project", content="c-cand", status="candidate")
cand = create_memory(memory_type="project", content="d-cand", status="candidate")
# Invalidate one to seed an "invalid" bucket
from atocore.memory.service import invalidate_memory
target_id = cand.id
# No promotion step: invalidate_memory() is called on the candidate
# directly, so the service path flips it from candidate to invalid.
invalidate_memory(target_id)
client = TestClient(app)
dash = client.get("/admin/dashboard").json()
assert dash["memories"]["active"] == 2
assert dash["memories"]["candidates"] == 1
assert dash["memories"]["by_status"]["invalid"] == 1
assert dash["memories"]["total"] == 4
assert dash["memories"]["by_project"].get("p06-polisher") == 1
# "(none)" bucket is the COALESCE label for empty/null project
assert "(none)" in dash["memories"]["by_project"]


@@ -575,3 +575,89 @@ def test_expire_stale_candidates_keeps_reinforced(isolated_db):
assert mid not in expired
mem = _get_memory_by_id(mid)
assert mem["status"] == "candidate"
# ---------------------------------------------------------------------------
# Wave 1 (2026-04-29) — counts come from SQL, not from the top-N sample.
# Exposed by Codex audit when prod /admin/dashboard reported 315 active
# while /admin/integrity-check reported 1091. The dashboard was building
# its counts from a confidence-sorted limit=500 fetch.
# ---------------------------------------------------------------------------
def test_get_memory_count_summary_returns_full_table_aggregates(isolated_db):
"""Counts come from SQL aggregates, not a sampled fetch."""
from atocore.memory.service import (
create_memory,
get_memory_count_summary,
invalidate_memory,
)
# Create more rows than any reasonable sampling LIMIT so any
# LIMIT-based counter would visibly disagree with reality.
for i in range(120):
create_memory(
"knowledge",
f"fact-{i}",
project="p04-gigabit",
confidence=0.9,
status="active",
)
for i in range(7):
create_memory("knowledge", f"cand-{i}", status="candidate")
invalid_obj = create_memory("knowledge", "to-invalidate", status="active")
invalidate_memory(invalid_obj.id)
summary = get_memory_count_summary()
assert summary["total"] == 120 + 7 + 1
assert summary["by_status"]["active"] == 120
assert summary["by_status"]["candidate"] == 7
assert summary["by_status"]["invalid"] == 1
assert summary["active"]["total"] == 120
assert summary["active"]["by_type"] == {"knowledge": 120}
assert summary["active"]["by_project"] == {"p04-gigabit": 120}
def test_get_memory_returns_single_row_or_none(isolated_db):
from atocore.memory.service import create_memory, get_memory
mem = create_memory("knowledge", "single-row test")
fetched = get_memory(mem.id)
assert fetched is not None
assert fetched.id == mem.id
assert get_memory("non-existent-id") is None
def test_update_memory_can_change_project_with_canonicalization(
isolated_db, project_registry
):
"""update_memory(project=...) canonicalizes aliases and writes audit."""
project_registry(("p04-gigabit", ("p04", "gigabit")))
from atocore.memory.service import (
create_memory,
get_memory,
get_memory_audit,
update_memory,
)
mem = create_memory("knowledge", "retargetable fact", project="atocore")
ok = update_memory(mem.id, project="p04") # alias
assert ok is True
refreshed = get_memory(mem.id)
assert refreshed.project == "p04-gigabit" # canonical, not "p04"
audit_rows = get_memory_audit(mem.id, limit=10)
update_rows = [r for r in audit_rows if r.get("action") == "updated"]
assert update_rows, f"expected an updated audit row, got {audit_rows}"
head = update_rows[0]
assert head["before"]["project"] == "atocore"
assert head["after"]["project"] == "p04-gigabit"
def test_update_memory_project_unchanged_when_not_passed(isolated_db):
from atocore.memory.service import create_memory, get_memory, update_memory
mem = create_memory("knowledge", "untouched project", project="p06-polisher")
update_memory(mem.id, content="edited content")
assert get_memory(mem.id).project == "p06-polisher"