10 Commits

Author SHA1 Message Date
9604c3e9ae docs(ledger): Wave 1 Codex audit closure + amend session log
Records Codex's formal GO WITH CONDITIONS verdict, the two P2 amends
shipped on 3a474f7, and the deployment checklist carried into the
merge plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:54:20 -04:00
3a474f750c fix(memory): close Codex Wave 1 audit conditions (auto_triage + supersede guard)
Codex's formal audit of fb4d55c said GO WITH CONDITIONS. Two P2 findings
to fold in before merge:

1. auto_triage.py:417 still PUT {"content": cand["content"]} — the
   suggested-project correction was unreachable even with
   MemoryUpdateRequest.project in place. Changed body to
   {"project": suggested} so misattribution flags actually retarget the
   memory. Added a regression test that asserts the script source
   contains the new PUT shape, so a future "optimization" can't silently
   undo this.

2. POST /memory/{id}/supersede had no status guard — calling
   supersede_memory() delegated to update_memory(status="superseded"),
   which would silently flip a candidate to superseded. Mirrored the
   invalidate route: get_memory(id) lookup, 404 unknown / 200
   already_superseded / 409 wrong-status / 200 superseded.
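The invalidate and supersede routes share the same ladder: 404 when unknown, 200 when already in the target state, 409 for the wrong status, and 200 on success. That ladder can be written as a pure function. This is a minimal sketch, not the route code. `guard_transition()` is a hypothetical name, and the real handlers raise `HTTPException` rather than returning tuples.

```python
from typing import Optional, Tuple


def guard_transition(current_status: Optional[str], target_status: str) -> Tuple[int, str]:
    """Return (http_code, outcome) for an active -> target_status request.

    Hypothetical helper: the real routes raise HTTPException for 404/409
    instead of returning a tuple.
    """
    if current_status is None:
        return 404, "not_found"                  # get_memory(id) found nothing
    if current_status == target_status:
        return 200, f"already_{target_status}"   # idempotent repeat call
    if current_status != "active":
        return 409, "wrong_status"               # e.g. a candidate must not flip silently
    return 200, target_status                    # safe to call the service mutation
```

Passing the terminal status as a parameter lets one table cover both routes (`"invalid"` and `"superseded"`).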

Plus a P3 from the same audit: covered the "retarget to project=''
when a global active duplicate exists" case via
test_update_memory_to_empty_project_detects_global_duplicate.

Tests: 581 -> 586 (+5: 3 supersede route + 1 project-empty duplicate +
1 auto_triage caller invariant).
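The auto_triage caller invariant can be expressed as a source-lint check. This sketch rests on assumptions: the constant names, `put_body_is_project_retarget()`, and `check_script()` are hypothetical, and the actual regression test may match different text.

```python
from pathlib import Path

# Hypothetical guard strings. The whole json.dumps(...) call is matched, so a
# comment that merely mentions the old dict literal does not trip the check.
OLD_PUT_BODY = 'json.dumps({"content": cand["content"]})'
NEW_PUT_BODY = 'json.dumps({"project": suggested})'


def put_body_is_project_retarget(script_source: str) -> bool:
    """True when the misattribution PUT sends the project field and the
    old content-only body is gone."""
    return NEW_PUT_BODY in script_source and OLD_PUT_BODY not in script_source


def check_script(path: Path) -> bool:
    """Read a script from disk and apply the source-lint guard."""
    return put_body_is_project_retarget(path.read_text(encoding="utf-8"))
```

In a test this would read something like `assert check_script(Path("scripts/auto_triage.py"))`. Matching the full call rather than the bare dict keeps explanatory comments about the old body from causing false failures.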

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:53:39 -04:00
4e6fba7cb9 docs(ledger): Wave 1 session log + Orientation refresh
Records the 2026-04-29 audit findings, the Codex review, and the three
memory-write-path fixes shipped on this branch. Refreshes Orientation
with live state verified against Dalidou: build 7042eae, harness 20/20,
1054 interactions, 315/1091 dashboard/integrity memory counts (the
sampling bug behind the discrepancy is fixed in the prior commit; the
two will agree post-deploy).

Also captures the unregistered emerging-projects list (apm=63, openclaw=9,
lead-space=2, drill=1, optiques-fullum=1) so the Wave 1 follow-up
(one-click registration proposal) doesn't get lost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:41:18 -04:00
fb4d55cbcd fix(memory): SQL-aggregate dashboard counts, project on update, id-based invalidate
Three bugs surfaced by the 2026-04-29 Codex review of the state-of-the-service
plan, all in the memory write/read path:

1. /admin/dashboard memory counts were derived from a confidence-sorted
   get_memories(limit=500) sample. With prod at 1091 active memories the
   dashboard reported 315 ("active in the top 500"), while integrity
   reported the SQL aggregate 1091. Replaced the sampling block with a
   new get_memory_count_summary() helper that does straight SQL aggregates
   over status/type/project. Dashboard memories.{active,candidates,...}
   now match integrity. Adds memories.{by_status,total} for completeness.

2. PUT /memory/{id} silently dropped project changes because
   MemoryUpdateRequest had no project field and update_memory() didn't
   accept one. auto_triage.py:407 detects suggested_project drift and
   issues a PUT to fix it; the fix never landed. Added project to the
   request schema and the service signature, with resolve_project_name
   canonicalization, before/after audit snapshot, and the existing
   duplicate-active check now scoped to the new project.

3. POST /memory/{id}/invalidate did _get_memories(status="active",
   limit=1) and looked for the target inside that single
   highest-confidence row. Any other active memory 404'd. Replaced with
   a direct id lookup via the new get_memory(id) helper; status branching
   stays the same (404 unknown / 200 already-invalid / 409 wrong-status /
   200 invalidated).
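The helper in item 1 amounts to a handful of GROUP BY queries. The sketch below assumes a `memories` table with `status`, `memory_type`, `project` and `reference_count` columns. It returns the nested shape that the dashboard hunk reads (`counts["active"]["total"]`, `counts["by_status"]`, `counts["total"]`). The real schema and SQL in `get_memory_count_summary()` may differ.

```python
import sqlite3


def get_memory_count_summary(conn: sqlite3.Connection) -> dict:
    """Full-table memory counts via SQL aggregates (no top-N sampling).

    Assumes a hypothetical `memories` table with columns
    status, memory_type, project, reference_count.
    """
    by_status = dict(conn.execute(
        "SELECT status, COUNT(*) FROM memories GROUP BY status"
    ).fetchall())
    by_type = dict(conn.execute(
        "SELECT memory_type, COUNT(*) FROM memories "
        "WHERE status = 'active' GROUP BY memory_type"
    ).fetchall())
    # NULL and '' both bucket as "(none)", like the old `m.project or "(none)"`.
    by_project = dict(conn.execute(
        "SELECT COALESCE(NULLIF(project, ''), '(none)') AS proj, COUNT(*) "
        "FROM memories WHERE status = 'active' GROUP BY proj"
    ).fetchall())
    reinforced = conn.execute(
        "SELECT COUNT(*) FROM memories "
        "WHERE status = 'active' AND reference_count > 0"
    ).fetchone()[0]
    return {
        "total": sum(by_status.values()),
        "by_status": by_status,
        "active": {
            "total": by_status.get("active", 0),
            "by_type": by_type,
            "by_project": by_project,
            "reinforced": reinforced,
        },
    }
```

Every number comes from a full-table aggregate, so no active row can fall outside the count the way rows did with the `limit=500` sample.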

Tests added (9):
- test_get_memory_count_summary_returns_full_table_aggregates
- test_get_memory_returns_single_row_or_none
- test_update_memory_can_change_project_with_canonicalization
- test_update_memory_project_unchanged_when_not_passed
- test_api_invalidate_finds_active_memory_outside_top_one
- test_api_invalidate_already_invalid_is_idempotent
- test_api_invalidate_candidate_returns_409
- test_api_invalidate_unknown_id_is_404
- test_admin_dashboard_active_count_matches_full_table

Test count: 572 -> 581. Full suite green locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:40:10 -04:00
7042eaea46 docs: record fully green retrieval harness 2026-04-24 21:04:13 -04:00
d3de9f67ea fix(context): rank trusted state by query relevance 2026-04-24 20:54:56 -04:00
0fc6705173 docs: record retrieval stabilization state 2026-04-24 20:45:05 -04:00
a87d9845a8 fix(memory): widen query-time context candidates 2026-04-24 20:29:00 -04:00
Antoine Letarte
4744c69d10 fix(memory): rank project memories by query intent 2026-04-24 20:49:53 +00:00
867a1abfaa merge: explicit project id retrieval metadata 2026-04-24 11:36:19 -04:00
11 changed files with 728 additions and 93 deletions

@@ -6,26 +6,26 @@
## Orientation
- **live_sha** (Dalidou `/health` build_sha): `f44a211` (verified 2026-04-24T14:48:44Z post audit-improvements deploy; status=ok)
- **last_updated**: 2026-04-24 by Codex (retrieval boundary deployed; project_id metadata branch started)
- **main_tip**: `f44a211`
- **test_count**: 567 on `codex/project-id-metadata-retrieval` (deployed main baseline: 553)
- **harness**: `19/20 PASS` on live Dalidou, 0 blocking failures, 1 known content gap (`p04-constraints`)
- **live_sha** (Dalidou `/health` build_sha): `7042eae` (verified 2026-04-29T01:19Z; status=ok; deployed 2026-04-25T01:04Z, docs-only on top of `d3de9f6`)
- **last_updated**: 2026-04-29 by Claude (Wave 1 debt-pay branch open; Codex review of plan applied)
- **main_tip**: `7042eae`
- **test_count**: 586 on `claude/wave1-dashboard-counts-and-memory-fixes` (572 on main + 14 Wave 1 regression tests: 9 in `fb4d55c` and 5 from the Codex review amends in `3a474f7`)
- **harness**: `20/20 PASS` on live Dalidou, 0 blocking failures, 0 known issues (last harness run nightly 2026-04-28T03:00:30Z)
- **vectors**: 33,253
- **active_memories**: 290 (`/admin/dashboard` 2026-04-24; note integrity panel reports a separate active_memory_count=951 and needs reconciliation)
- **candidate_memories**: 0 (triage queue drained)
- **interactions**: 951 (`/admin/dashboard` 2026-04-24)
- **registered_projects**: atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08)
- **project_state_entries**: 128 across registered projects (`/admin/dashboard` 2026-04-24)
- **entities**: 66 (up from 35 — V1-0 backfill + ongoing work; 0 open conflicts)
- **active_memories**: 315 dashboard / 1091 integrity (verified live 2026-04-29). Discrepancy was a dashboard sampling bug — Wave 1 commit `fb4d55c` replaces it with SQL aggregates; post-deploy the two will agree.
- **candidate_memories**: 0 (queue drained, +103 captures since 2026-04-25)
- **interactions**: 1054 (claude-code 474, openclaw 576; verified `/admin/dashboard` 2026-04-29)
- **registered_projects**: atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08). **Auto-detected, unregistered**: apm (63 active memories), openclaw (9), lead-space (2), drill (1), optiques-fullum (1) — see Wave 1 follow-up below.
- **project_state_entries**: 128 across registered projects (verified 2026-04-29)
- **entities**: 66 (V1-0 backfill complete; 0 open conflicts)
- **off_host_backup**: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron, verified
- **nightly_pipeline**: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → auto-triage → **auto-promote/expire (NEW)** → weekly synth/lint Sundays → **retrieval harness (NEW)****pipeline summary (NEW)**
- **nightly_pipeline**: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → auto-triage → auto-promote/expire → weekly synth/lint Sundays → retrieval harness → pipeline summary
- **capture_clients**: claude-code (Stop hook + cwd project inference), openclaw (before_agent_start + llm_output plugin, verified live)
- **wiki**: http://dalidou:8100/wiki (browse), /wiki/projects/{id}, /wiki/entities/{id}, /wiki/search
- **dashboard**: http://dalidou:8100/admin/dashboard (now shows pipeline health, interaction totals by client, all registered projects)
- **active_track**: Engineering V1 Completion (started 2026-04-22). V1-0 landed (`2712c5d`). V1-A density gate CLEARED (784 active ≫ 100 target as of 2026-04-23). V1-A soak gate at day 5/~7 (F4 first run 2026-04-19; nightly clean 2026-04-19 through 2026-04-23; failures confined to the known p04-constraints content gap). Plan: `docs/plans/engineering-v1-completion-plan.md`. Resume map: `docs/plans/v1-resume-state.md`.
- **last_nightly_pipeline**: `2026-04-23T03:00:20Z` — harness 17/18, triage promoted=3 rejected=7 human=0, dedup 7 clusters (1 tier1 + 6 tier2 auto-merged), graduation 30-skipped 0-graduated 0-errors, auto-triage drained the queue (0 new candidates 2026-04-22T00:52Z run)
- **open_branches**: none — R14 squash-merged as `0989fed` and deployed 2026-04-23T15:20:53Z. V1-A is the next scheduled work
- **dashboard**: http://dalidou:8100/admin/dashboard
- **active_track**: Wave 1 debt-pay (this session) → V1-A start. V1-A gates have cleared (soak ended 2026-04-26; density 315 ≫ 100). Plan: `docs/plans/engineering-v1-completion-plan.md`. Resume map: `docs/plans/v1-resume-state.md`.
- **last_nightly_pipeline**: `2026-04-28T03:00:30Z` — harness 20/20, triage promoted=1 rejected=1 human=0
- **open_branches**: `claude/wave1-dashboard-counts-and-memory-fixes` (tip `3a474f7`) — three memory-write-path bugs + two follow-on P2 fixes from Codex's formal audit (`auto_triage.py` PUT body + `/memory/{id}/supersede` status guard). Awaiting Codex re-review of the amended branch before squash-merge/deploy.
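The Wave 1 follow-up for unregistered projects proposes one-click registration once a project has at least 10 active memories. In practice that is a filter over the per-project active counts. This is a minimal sketch that assumes the counts arrive as a dict like the dashboard's `by_project`. `registration_proposals()` and the `REGISTERED` set are illustrative, not existing code.

```python
from typing import Dict, List, Set

# Registered projects as listed in the Orientation block (p08 is an alias of abb-space).
REGISTERED: Set[str] = {
    "atocore", "p04-gigabit", "p05-interferometer",
    "p06-polisher", "atomizer-v2", "abb-space",
}


def registration_proposals(
    active_by_project: Dict[str, int],
    registered: Set[str],
    min_active: int = 10,
) -> List[str]:
    """Unregistered project tags with >= min_active active memories, largest first."""
    eligible = [
        (name, count)
        for name, count in active_by_project.items()
        if name not in registered and name != "(none)" and count >= min_active
    ]
    # Sort by descending count, then by name for a stable proposal order.
    eligible.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in eligible]
```

With the 2026-04-29 counts (apm=63, openclaw=9, lead-space=2, drill=1, optiques-fullum=1), only `apm` clears the threshold.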
## Active Plan
@@ -170,6 +170,14 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha
## Session Log
- **2026-04-29 Codex + Claude (Wave 1 formal audit closed + amends)** Codex's formal audit of `fb4d55c` (Wave 1 first commit on `claude/wave1-dashboard-counts-and-memory-fixes`): verdict GO WITH CONDITIONS. Two P1-prior closures confirmed (dashboard count bug + invalidate top-1 lookup). Project-update P2 was only "partially closed" — API/service plumbing fine, but `scripts/auto_triage.py:417` still PUT `{"content": cand["content"]}` so the operational suggested-project correction was unreachable even with `MemoryUpdateRequest.project` in place. Codex also flagged the symmetric supersede-route gap as same-class adjacent surface and recommended pulling it in here, not in Wave 1.5. Plus one P3: cover retarget-to-empty-project against a global active duplicate. Amended on `3a474f7`: (1) auto_triage PUT body now `{"project": suggested}` with a guard test that lints the script source for the new shape; (2) `/memory/{id}/supersede` mirrors the invalidate guard via `get_memory(id)` — 404 unknown / 200 already_superseded / 409 wrong-status / 200 superseded; (3) regression test for project-empty duplicate detection. Test count 581 → 586. Codex's recommended deployment checklist (post-deploy verifications, including the run-or-simulate auto-triage retarget probe) carried into the merge plan. Awaiting Codex re-review of the amended tip before squash-merge.
- **2026-04-29 Claude (Wave 1 debt-pay started; Codex review of state-of-service plan)** Audited live state on Dalidou: `/health` build_sha `7042eae` (4d old), harness 20/20, 33,253 vectors, 1,748 docs, 1054 interactions (+103/4d), dashboard memories.active=315 vs integrity.active_memory_count=1091. Drafted a state-of-service assessment + Wave 1/2/3/4 plan, then asked Codex (gpt-5.5) for an adversarial review via `codex exec`. Codex verdict: assessment MIXED, plan ENDORSE WITH CHANGES. Codex caught two factual corrections (the 315-vs-1091 gap is a *sampling bug* not a definitional gap; my "R9 drops unregistered tags" framing is wrong — `extractor_llm.py:213-233` preserves them) and two new memory-write-path bugs I missed. Branched `claude/wave1-dashboard-counts-and-memory-fixes` from `7042eae`, fixed all three: (1) `/admin/dashboard` now uses a new `get_memory_count_summary()` SQL aggregate helper instead of counting inside a confidence-sorted `get_memories(limit=500)` sample; (2) `MemoryUpdateRequest` and `update_memory()` accept `project` with `resolve_project_name` canonicalization + before/after audit, so `auto_triage.py:407` suggested-project corrections will now actually apply; (3) `POST /memory/{id}/invalidate` replaces the `_get_memories(status="active", limit=1)` lookup (which only saw the highest-confidence active row) with a direct id lookup via new `get_memory(id)` helper. 9 regression tests added across `test_memory.py` and `test_invalidate_supersede.py`. Full local suite: 581 passed (572 → 581). Commit `fb4d55c`. Branch not pushed/deployed yet — awaiting Codex audit per working model. Refreshed Orientation block (live_sha, last_updated, test_count, memory counts, capture cadence, registered/unregistered project breakdown, open_branches). 
Wave 1 follow-up still open: (W1.2) one-click registration proposal for unregistered projects with ≥10 active memories (apm=63 is overdue); (W1.3 done by this entry) sync ledger; (W1.4) committed measurable-win probe fixture+JSON output. After this branch lands, V1-A is unblocked: gates have effectively cleared (soak ended 2026-04-26; density 315 ≫ 100 target).
- **2026-04-25 Codex (p04 constraint gap closed; harness fully green)** Root-caused the remaining `p04-constraints` fixture: the `Zerodur` / `1.2` fact already existed in Trusted Project State (`requirement/key_constraints`), but project-state formatting was category/key ordered and then truncated to the 20% state budget, so contacts/decisions consumed the budget before the relevant requirement. Added query-relevance ranking for Trusted Project State entries before formatting/truncation, with regression coverage in `test_project_state_query_relevance_before_truncation`. Removed the fixture's `known_issue` lane so future p04 constraint regressions are blocking. Cleaned up a duplicate live requirement entry created during diagnosis by invalidating `requirement/mirror-blank-core-constraints`; canonical `requirement/key_constraints` remains active. Verified focused suite: 35 passed. Verified full local suite: 572 passed. Deployed `d3de9f67eaa08dfc5b2d86e8221b8c70fef266d3`; live exact p04 probe now surfaces `[REQUIREMENT] key_constraints` with `1.2` and `Zerodur`. Live retrieval harness: 20/20, 0 known issues, 0 blocking failures.
- **2026-04-25 Codex (project_id backfill + retrieval stabilization closed)** Merged `codex/project-id-metadata-retrieval` into `main` (`867a1ab`) and deployed to Dalidou. Took Chroma-inclusive backup `/srv/storage/atocore/backups/snapshots/20260424T154358Z`, then ran `scripts/backfill_chunk_project_ids.py` per project; populated projects `p04-gigabit`, `p05-interferometer`, `p06-polisher`, `atomizer-v2`, and `atocore` applied cleanly for 33,253 vectors total, with 0 missing/malformed and an immediate final dry-run showing 33,253 already tagged / 0 updates. Post-backfill harness exposed p06 memory-ranking misses (`Tailscale`, `encoder`), so Codex shipped `4744c69` then `a87d984 fix(memory): widen query-time context candidates`. Full local suite: 571 passed. Live `/health` reports `a87d9845a8c34395a02890f0cf22aa7a46afaf62`, vectors=33,253, sources_ready=true. Live retrieval harness: 19/20, 0 blocking failures, 1 known issue (`p04-constraints` missing `Zerodur` / `1.2`). A repeat backfill dry-run after the code-only stabilization deploy was aborted after the one-off container ran too long; the live service stayed healthy and the earlier post-apply idempotency result remains the migration acceptance record. Dalidou HTTP push credentials are still not configured; this session pushed through the Windows credential path.
- **2026-04-24 Codex (retrieval boundary deployed + project_id metadata tranche)** Merged `codex/audit-improvements-foundation` to `main` as `f44a211` and pushed to Dalidou Gitea. Took pre-deploy runtime backup `/srv/storage/atocore/backups/snapshots/20260424T144810Z` (DB + registry, no Chroma). Deployed via `papa@dalidou` canonical `deploy/dalidou/deploy.sh`; live `/health` reports build_sha `f44a2114970008a7eec4e7fc2860c8f072914e38`, build_time `2026-04-24T14:48:44Z`, status ok. Post-deploy retrieval harness: 20 fixtures, 19 pass, 0 blocking failures, 1 known issue (`p04-constraints`). The former blocker `p05-broad-status-no-atomizer` now passes. Manual p05 `context-build "current status"` spot check shows no p04/Atomizer source bleed in retrieved chunks. Started follow-up branch `codex/project-id-metadata-retrieval`: registered-project ingestion now writes explicit `project_id` into DB chunk metadata and Chroma vector metadata; retrieval prefers exact `project_id` when present and keeps path/tag matching as legacy fallback; added dry-run-by-default `scripts/backfill_chunk_project_ids.py` to backfill SQLite + Chroma metadata; added tests for project-id ingestion, registered refresh propagation, exact project-id retrieval, and collision fallback. Verified targeted suite (`test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 36 passed. Verified full suite: 556 passed in 72.44s. Branch not merged or deployed yet.
- **2026-04-24 Codex (project_id audit response)** Applied independent-audit fixes on `codex/project-id-metadata-retrieval`. Closed the nightly `/ingest/sources` clobber risk by adding registry-level `derive_project_id_for_path()` and making unscoped `ingest_file()` derive ownership from registered ingest roots when possible; `refresh_registered_project()` still passes the canonical project id directly. Changed retrieval so empty `project_id` falls through to legacy path/tag ownership instead of short-circuiting as unowned. Hardened `scripts/backfill_chunk_project_ids.py`: `--apply` now requires `--chroma-snapshot-confirmed`, runs Chroma metadata updates before SQLite writes, batches updates, skips and reports missing vectors, skips and reports malformed metadata, reports already-tagged rows, and turns missing ingestion tables into a JSON `db_warning` instead of a traceback. Added tests for auto-derive ingestion, empty-project fallback, ingest-root overlap rejection, and backfill dry-run/apply/snapshot/missing-vector/malformed cases. Verified targeted suite (`test_backfill_chunk_project_ids.py`, `test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 45 passed. Verified full suite: 565 passed in 73.16s. Local dry-run on empty/default data returns 0 updates with `db_warning` rather than crashing. Branch still not merged/deployed.
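The p04 fix in the 2026-04-25 entry hinges on ordering. Trusted Project State is ranked by query relevance first, then truncated to the state budget. The sketch below shows that rank-then-truncate idea. It assumes plain token-overlap scoring and a regex tokenizer in place of atocore's internal `_normalize`/`_tokenize`. `StateEntry`, `rank_state_entries()`, and `format_and_truncate()` are hypothetical, and the real `_rank_project_state_entries()` may score differently.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class StateEntry:
    """Hypothetical stand-in for ProjectStateEntry."""
    category: str
    key: str
    value: str


# Stop words copied from the diff; they carry no topical signal.
STOP_WORDS = {"how", "what", "when", "where", "which", "who", "why",
              "current", "status", "project"}


def _tokens(text: str) -> Set[str]:
    # Assumed tokenizer: keeps dotted numbers such as "1.2" as one token.
    return set(re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower().replace("_", " ")))


def rank_state_entries(entries: List[StateEntry], query: str, project: str) -> List[StateEntry]:
    query_tokens = _tokens(query) - STOP_WORDS
    # Project-name parts ("p04", "gigabit") would match every entry, so drop them.
    query_tokens -= set((project or "").lower().replace("_", "-").split("-"))
    if not query_tokens or len(entries) <= 1:
        return list(entries)

    def overlap(entry: StateEntry) -> int:
        return len(query_tokens & _tokens(f"{entry.category} {entry.key} {entry.value}"))

    # sorted() is stable: ties keep the original category/key order.
    return sorted(entries, key=lambda e: -overlap(e))


def format_and_truncate(entries: List[StateEntry], budget_chars: int) -> str:
    text = "\n".join(f"[{e.category.upper()}] {e.key}: {e.value}" for e in entries)
    return text[:budget_chars]
```

When the budget is too small for every entry, the unranked order spends it on whatever sorts first. The ranked order keeps the entry that carries the query terms.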

@@ -1,11 +1,18 @@
# AtoCore - Current State (2026-04-24)
# AtoCore - Current State (2026-04-25)
Update 2026-04-24: audit-improvements deployed as `f44a211`; live harness is
19/20 with 0 blocking failures and 1 known content gap. Active follow-up branch
`codex/project-id-metadata-retrieval` is at 567 passing tests.
Update 2026-04-25: project-id chunk/vector metadata is deployed and backfilled,
and the final p04 Trusted Project State budget/ranking gap is closed. Live
Dalidou is on `d3de9f6`; `/health` is ok with 33,253 vectors and sources
ready. Live retrieval harness is 20/20 with 0 blocking failures and 0 known
issues. Full local suite: 572 passed.
Live deploy: `2b86543` · Dalidou health: ok · Harness: 18/20 with 1 known
content gap and 1 current blocking project-bleed guard · Tests: 553 passing.
The project-id backfill was applied per populated project after a
Chroma-inclusive backup at
`/srv/storage/atocore/backups/snapshots/20260424T154358Z`. The immediate
post-apply dry-run reported 33,253 already tagged, 0 updates, 0 missing, and 0
malformed. A later repeat dry-run after the code-only ranking deploy was
aborted because the one-off container ran too long; the earlier post-apply
idempotency result remains the migration acceptance record.
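The idempotency check described here is a post-apply dry run that reports 0 updates. A classify-then-apply pair illustrates the idea. Everything in this sketch is an assumption made for illustration. The row shape, `plan_backfill()`, and `apply_updates()` are hypothetical and do not mirror `scripts/backfill_chunk_project_ids.py`.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical row shape: {"id": str, "vector_metadata": dict | None | <malformed>}
Row = Dict[str, Any]


def plan_backfill(rows: List[Row], owner_for: Callable[[Row], str]) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Dry run: classify rows and list (chunk_id, project_id) updates without writing."""
    report = {"already_tagged": 0, "updates": 0, "missing": 0, "malformed": 0}
    updates: List[Tuple[str, str]] = []
    for row in rows:
        meta = row.get("vector_metadata")
        if meta is None:
            report["missing"] += 1          # no vector to tag
        elif not isinstance(meta, dict):
            report["malformed"] += 1        # skip and report instead of crashing
        elif meta.get("project_id"):
            report["already_tagged"] += 1
        else:
            updates.append((row["id"], owner_for(row)))
    report["updates"] = len(updates)
    return report, updates


def apply_updates(rows: List[Row], updates: List[Tuple[str, str]]) -> None:
    """Write project_id into the planned rows' metadata in place."""
    by_id = {row["id"]: row for row in rows}
    for chunk_id, project_id in updates:
        by_id[chunk_id]["vector_metadata"]["project_id"] = project_id
```

Running `plan_backfill()` again after `apply_updates()` should move every updated row into `already_tagged` and report 0 updates. That is the property the post-apply dry run checks.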
## V1-0 landed 2026-04-22
@@ -69,10 +76,10 @@ Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs
| 7G | Re-extraction on prompt version bump | pending |
| 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |
## Known gaps (honest, refreshed 2026-04-24)
## Known gaps (honest, refreshed 2026-04-25)
1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
2. **Project-scoped retrieval guard is deployed and passing.** The April 24 p05 broad-status bleed guard now passes on live Dalidou. The active follow-up branch adds explicit `project_id` chunk/vector metadata so the deployed path/tag heuristic can become a legacy fallback.
2. **Project-scoped retrieval guard is deployed and passing.** Explicit `project_id` chunk/vector metadata is now present in SQLite and Chroma for the 33,253-vector corpus. Retrieval prefers exact metadata ownership and keeps path/tag matching as a legacy fallback.
3. **Human interface is useful but not yet the V1 Human Mirror.** Wiki/dashboard pages exist, but the spec routes, deterministic mirror files, disputed markers, and curated annotations remain V1-D work.
4. **Harness known issue:** `p04-constraints` wants "Zerodur" and "1.2"; live retrieval surfaces related constraints but not those exact strings. Treat as content/state gap until fixed.
4. **Harness is currently green.** The former `p04-constraints` known issue is closed; query-relevant Trusted Project State entries now rank before state-budget truncation.
5. **Formal docs lag the ledger during fast work.** Use `DEV-LEDGER.md` and `python scripts/live_status.py` for live truth, then copy verified claims into these docs.
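The ownership rule in gap 2 uses exact `project_id` metadata first and legacy matching only when that field is empty. It fits in a few lines. This sketch assumes chunks carry a metadata dict and that registered projects map to non-overlapping ingest-root paths. `resolve_chunk_owner()` and the example roots are hypothetical, and tag-based matching is left out.

```python
from typing import Dict, Optional


def resolve_chunk_owner(
    metadata: Dict[str, str],
    source_path: str,
    ingest_roots: Dict[str, str],
) -> Optional[str]:
    """Return the owning project for a chunk, or None if unowned."""
    project_id = (metadata.get("project_id") or "").strip()
    if project_id:
        return project_id                  # exact metadata ownership wins
    # Legacy fallback: empty or missing project_id falls through to path matching.
    for project, root in ingest_roots.items():
        # Trailing "/" stops "/srv/x/p04" from matching "/srv/x/p04x/...".
        if source_path.startswith(root.rstrip("/") + "/"):
            return project                 # roots cannot overlap, so first match is the only match
    return None                            # unowned
```

Treating an empty string like a missing key matches the audit fix that stopped empty `project_id` values from short-circuiting as unowned.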

@@ -131,10 +131,12 @@ This sits implicitly between Phase 8 (OpenClaw) and Phase 11
(multi-model). Memory-review and engineering-entity commands are
deferred from the shared client until their workflows are exercised.
## What Is Real Today (updated 2026-04-24)
## What Is Real Today (updated 2026-04-25)
- canonical AtoCore runtime on Dalidou (`2b86543`, deploy.sh verified)
- 33,253 vectors across 6 registered projects
- canonical AtoCore runtime on Dalidou (`d3de9f6`, deploy.sh verified)
- 33,253 vectors across 6 registered projects, with explicit `project_id`
metadata backfilled into SQLite and Chroma after snapshot
`/srv/storage/atocore/backups/snapshots/20260424T154358Z`
- 951 captured interactions as of the 2026-04-24 live dashboard; refresh
exact live counts with
`python scripts/live_status.py`
@@ -149,10 +151,13 @@ deferred from the shared client until their workflows are exercised.
- 290 active memories and 0 candidate memories as of the 2026-04-24 live
dashboard
- context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks
- query-relevance memory ranking with overlap-density scoring
- retrieval eval harness: 20 fixtures; current live has 19 pass, 1 known
content gap, and 0 blocking failures after the audit-improvements deploy
- 567 tests passing on the active `codex/project-id-metadata-retrieval` branch
- query-relevance memory ranking with overlap-density scoring and widened
query-time candidate pools so older exact-intent project memories can rank
ahead of generic high-confidence notes
- query-relevance Trusted Project State ranking before state-budget truncation
- retrieval eval harness: 20 fixtures; current live has 20 pass, 0 known
issues, and 0 blocking failures
- 572 tests passing on `main`
- nightly pipeline: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → triage → **auto-promote/expire** → weekly synth/lint → **retrieval harness****pipeline summary to project state**
- Phase 10 operational: reinforcement-based auto-promotion (ref_count ≥ 3, confidence ≥ 0.7) + stale candidate expiry (14 days unreinforced)
- pipeline health visible in dashboard: interaction totals by client, pipeline last_run, harness results, triage stats
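The Phase 10 thresholds (ref_count ≥ 3 with confidence ≥ 0.7 to promote, expiry after 14 days unreinforced) translate into a per-candidate decision. This is a minimal sketch. The `Candidate` fields and `nightly_decision()` are hypothetical. Whether expiry fires at exactly 14 days or only after it is also an assumption, and this version expires strictly after.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

PROMOTE_MIN_REFS = 3
PROMOTE_MIN_CONFIDENCE = 0.7
EXPIRE_AFTER = timedelta(days=14)


@dataclass
class Candidate:
    reference_count: int
    confidence: float
    last_reinforced_at: datetime  # hypothetical field; creation time if never reinforced


def nightly_decision(c: Candidate, now: datetime) -> str:
    """Return "promote", "expire", or "keep" for one candidate memory."""
    if c.reference_count >= PROMOTE_MIN_REFS and c.confidence >= PROMOTE_MIN_CONFIDENCE:
        return "promote"
    if now - c.last_reinforced_at > EXPIRE_AFTER:
        return "expire"
    return "keep"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(nightly_decision(Candidate(3, 0.8, now), now))
```

Promotion is checked first, so a candidate that is both reinforced enough and old is promoted rather than expired.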
@@ -173,9 +178,10 @@ These are the current practical priorities.
Target: 100+ active memories.
3. **Multi-model triage** (Phase 11 entry) — switch auto-triage to a
different model than the extractor for independent validation
4. **Fix p04-constraints harness failure** — retrieval doesn't surface
"Zerodur" for p04 constraint queries. Investigate if it's a missing
memory or retrieval ranking issue.
4. **Fix Dalidou Git credentials** — the host checkout can fetch but cannot
push to Gitea over HTTP in non-interactive SSH sessions. Prefer switching
the deploy checkout to a Gitea SSH key; PAT-backed `credential.helper store`
is the fallback.
## Active — Engineering V1 Completion Track (started 2026-04-22)

@@ -404,19 +404,23 @@ def process_candidate(cand, base_url, active_cache, state_cache, known_projects,
known_projects, TIER1_MODEL, DEFAULT_TIMEOUT_S,
)
# Project misattribution fix: suggested_project surfaces from tier 1
# Project misattribution fix: suggested_project surfaces from tier 1.
# Earlier code sent only {"content": cand["content"]} in the PUT, which left
# the project field unchanged because MemoryUpdateRequest had no
# project key and the service signature didn't accept one. Wave 1
# added project to MemoryUpdateRequest and update_memory(); this
# caller now actually applies the suggested project.
suggested = (v1.get("suggested_project") or "").strip()
if suggested and suggested != project and suggested in known_projects:
# Try to re-canonicalize the memory's project
if not dry_run:
try:
import urllib.request as _ur
req = _ur.Request(
f"{base_url}/memory/{mid}", method="PUT",
headers={"Content-Type": "application/json"},
data=json.dumps({"content": cand["content"]}).encode("utf-8"),
data=json.dumps({"project": suggested}).encode("utf-8"),
)
_ur.urlopen(req, timeout=10).read() # triggers canonicalization via update
_ur.urlopen(req, timeout=10).read()
except Exception:
pass
print(f" ↺ misattribution flagged: {project!r}{suggested!r}")

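Outside the script, the corrected request can be built and inspected without sending it. This sketch uses only `urllib.request` from the standard library. `build_project_retarget_request()` is a hypothetical helper, and the base URL and memory id in the test are placeholders.

```python
import json
import urllib.request


def build_project_retarget_request(base_url: str, memory_id: str, suggested: str) -> urllib.request.Request:
    """Build (but do not send) the PUT that retargets a memory's project."""
    body = json.dumps({"project": suggested}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/memory/{memory_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

A caller would send it with `urllib.request.urlopen(req, timeout=10).read()`, as the script does. Returning the `Request` keeps the body testable. The script's `except Exception: pass` appears to swallow a failed retarget while still printing the misattribution line, so failures are silent.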
@@ -27,8 +27,7 @@
"expect_absent": [
"polisher suite"
],
"known_issue": true,
"notes": "Known content gap as of 2026-04-24: live retrieval surfaces related constraints but not the exact Zerodur / 1.2 strings. Keep visible, but do not make nightly harness red until the source/state gap is fixed."
"notes": "Regression guard: query-relevant Trusted Project State requirements must survive the project-state budget cap."
},
{
"name": "p04-short-ambiguous",

@@ -303,6 +303,7 @@ class MemoryUpdateRequest(BaseModel):
memory_type: str | None = None
domain_tags: list[str] | None = None
valid_until: str | None = None
project: str | None = None
class ProjectStateSetRequest(BaseModel):
@@ -636,6 +637,7 @@ def api_update_memory(memory_id: str, req: MemoryUpdateRequest) -> dict:
memory_type=req.memory_type,
domain_tags=req.domain_tags,
valid_until=req.valid_until,
project=req.project,
)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
@@ -794,33 +796,25 @@ def api_invalidate_memory(
req: MemoryInvalidateRequest | None = None,
) -> dict:
"""Retract an active memory (Issue E — active → invalid)."""
from atocore.memory.service import get_memories as _get_memories, invalidate_memory
from atocore.memory.service import get_memory, invalidate_memory
reason = req.reason if req else ""
# Quick existence/status check for a clean 404 vs 409.
existing = [
m for m in _get_memories(status="active", limit=1)
if m.id == memory_id
]
if not existing:
# Fall through to generic not-active if the id exists in another status.
all_match = [
m for m in _get_memories(status="candidate", limit=5000)
+ _get_memories(status="invalid", limit=5000)
+ _get_memories(status="superseded", limit=5000)
if m.id == memory_id
]
if all_match:
if all_match[0].status == "invalid":
# Direct id lookup — earlier code used get_memories(status='active', limit=1)
# which only saw the highest-confidence active row, so any other active
# memory would 404 here even though it existed.
target = get_memory(memory_id)
if target is None:
raise HTTPException(status_code=404, detail=f"Memory not found: {memory_id}")
if target.status == "invalid":
return {"status": "already_invalid", "id": memory_id}
if target.status != "active":
raise HTTPException(
status_code=409,
detail=(
f"Memory {memory_id} is {all_match[0].status}; "
f"Memory {memory_id} is {target.status}; "
"use /reject for candidates"
),
)
raise HTTPException(status_code=404, detail=f"Memory not found: {memory_id}")
success = invalidate_memory(memory_id, actor="api-http", reason=reason)
if not success:
@@ -833,15 +827,33 @@ def api_supersede_memory(
memory_id: str,
req: MemorySupersedeRequest | None = None,
) -> dict:
"""Supersede an active memory (Issue E — active → superseded)."""
from atocore.memory.service import supersede_memory
"""Supersede an active memory (Issue E — active → superseded).
Mirrors the invalidate route's status guard: candidates and other
non-active rows must not silently flip to superseded.
"""
from atocore.memory.service import get_memory, supersede_memory
reason = req.reason if req else ""
target = get_memory(memory_id)
if target is None:
raise HTTPException(status_code=404, detail=f"Memory not found: {memory_id}")
if target.status == "superseded":
return {"status": "already_superseded", "id": memory_id}
if target.status != "active":
raise HTTPException(
status_code=409,
detail=(
f"Memory {memory_id} is {target.status}; "
"only active memories can be superseded"
),
)
success = supersede_memory(memory_id, actor="api-http", reason=reason)
if not success:
raise HTTPException(
status_code=404,
detail=f"Memory not found or not active: {memory_id}",
status_code=409,
detail=f"Memory {memory_id} could not be superseded",
)
return {"status": "superseded", "id": memory_id}
@@ -1280,16 +1292,20 @@ def api_dashboard() -> dict:
health beyond the basic /health endpoint.
"""
import json as _json
from collections import Counter
from datetime import datetime as _dt, timezone as _tz
all_memories = get_memories(active_only=False, limit=500)
active = [m for m in all_memories if m.status == "active"]
candidates = [m for m in all_memories if m.status == "candidate"]
from atocore.memory.service import get_memory_count_summary
type_counts = dict(Counter(m.memory_type for m in active))
project_counts = dict(Counter(m.project or "(none)" for m in active))
reinforced = [m for m in active if m.reference_count > 0]
# SQL-backed counts. Earlier code derived these by sampling the top
# 500 rows of get_memories() ordered by confidence — anything past
# the cap was invisible, so /admin/dashboard silently undercounted
# active memories once the corpus crossed ~500 active rows.
counts = get_memory_count_summary()
active_total = counts["active"]["total"]
candidate_total = counts["by_status"].get("candidate", 0)
type_counts = counts["active"]["by_type"]
project_counts = counts["active"]["by_project"]
reinforced_total = counts["active"]["reinforced"]
# Interaction stats — total + by_client from DB directly
interaction_stats: dict = {"most_recent": None, "total": 0, "by_client": {}}
@@ -1402,13 +1418,13 @@ def api_dashboard() -> dict:
# Triage queue health
triage: dict = {
"pending": len(candidates),
"pending": candidate_total,
"review_url": "/admin/triage",
}
if len(candidates) > 50:
triage["warning"] = f"High queue: {len(candidates)} candidates pending review."
elif len(candidates) > 20:
triage["notice"] = f"{len(candidates)} candidates awaiting triage."
if candidate_total > 50:
triage["warning"] = f"High queue: {candidate_total} candidates pending review."
elif candidate_total > 20:
triage["notice"] = f"{candidate_total} candidates awaiting triage."
# Recent audit activity (Phase 4 V1) — last 10 mutations for operator
recent_audit: list[dict] = []
@@ -1420,11 +1436,13 @@ def api_dashboard() -> dict:
return {
"memories": {
"active": len(active),
"candidates": len(candidates),
"active": active_total,
"candidates": candidate_total,
"by_type": type_counts,
"by_project": project_counts,
"reinforced": len(reinforced),
"reinforced": reinforced_total,
"by_status": counts["by_status"],
"total": counts["total"],
},
"project_state": {
"counts": ps_counts,
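The dashboard hunk above replaces counts derived from a confidence-sorted `limit=500` fetch with SQL aggregates. Below is a minimal sketch of why the old path undercounted, using an in-memory SQLite table. The schema and the helper names (`make_db`, `sampled_active_count`, `aggregate_status_counts`) are hypothetical simplifications. They are not the project's real `memories` table or service functions.

```python
import sqlite3


def make_db(n_active: int, n_candidate: int) -> sqlite3.Connection:
    """Build a toy memories table (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories ("
        "id INTEGER PRIMARY KEY, status TEXT, confidence REAL, project TEXT)"
    )
    rows = [("active", 0.9, "p04-gigabit")] * n_active
    rows += [("candidate", 0.5, "")] * n_candidate
    conn.executemany(
        "INSERT INTO memories (status, confidence, project) VALUES (?, ?, ?)", rows
    )
    return conn


def sampled_active_count(conn: sqlite3.Connection, limit: int = 500) -> int:
    # Old pattern: fetch a confidence-sorted top-N, then count in Python.
    rows = conn.execute(
        "SELECT status FROM memories ORDER BY confidence DESC LIMIT ?", (limit,)
    ).fetchall()
    return sum(1 for (status,) in rows if status == "active")


def aggregate_status_counts(conn: sqlite3.Connection) -> dict[str, int]:
    # New pattern: SQL counts every row; no LIMIT is involved.
    rows = conn.execute(
        "SELECT status, count(*) FROM memories GROUP BY status"
    ).fetchall()
    return {status: n for status, n in rows}


conn = make_db(n_active=700, n_candidate=40)
print(sampled_active_count(conn))               # 500: capped by the LIMIT
print(aggregate_status_counts(conn)["active"])  # 700: the real total
```

Once the corpus grows past the cap, the sampled count stops tracking reality, while the `GROUP BY` path stays exact regardless of table size.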


@@ -11,7 +11,7 @@ from dataclasses import dataclass, field
from pathlib import Path
import atocore.config as _config
from atocore.context.project_state import format_project_state, get_state
from atocore.context.project_state import ProjectStateEntry, format_project_state, get_state
from atocore.memory.service import get_memories_for_context
from atocore.observability.logger import get_logger
from atocore.engineering.service import get_entities, get_entity_with_context
@@ -116,6 +116,11 @@ def build_context(
if canonical_project:
state_entries = get_state(canonical_project)
if state_entries:
state_entries = _rank_project_state_entries(
state_entries,
query=user_prompt,
project=canonical_project,
)
project_state_text = format_project_state(state_entries)
project_state_text, project_state_chars = _truncate_text_block(
project_state_text,
@@ -284,6 +289,55 @@ def get_last_context_pack() -> ContextPack | None:
return _last_context_pack
def _rank_project_state_entries(
entries: list[ProjectStateEntry],
query: str,
project: str,
) -> list[ProjectStateEntry]:
"""Promote query-relevant trusted state before the state band is truncated."""
if not query or len(entries) <= 1:
return entries
from atocore.memory.reinforcement import _normalize, _tokenize
query_text = _normalize(query.replace("_", " "))
query_tokens = set(_tokenize(query_text))
query_tokens -= {
"how",
"what",
"when",
"where",
"which",
"who",
"why",
"current",
"status",
"project",
}
for part in (project or "").lower().replace("_", "-").split("-"):
query_tokens.discard(part)
if not query_tokens:
return entries
scored: list[tuple[int, float, float, int, ProjectStateEntry]] = []
for index, entry in enumerate(entries):
entry_text = " ".join(
[
entry.category,
entry.key.replace("_", " "),
entry.value,
entry.source,
]
)
entry_tokens = _tokenize(_normalize(entry_text))
overlap = len(entry_tokens & query_tokens) if entry_tokens else 0
density = overlap / len(entry_tokens) if entry_tokens else 0.0
scored.append((overlap, density, entry.confidence, -index, entry))
scored.sort(key=lambda item: (item[0], item[1], item[2], item[3]), reverse=True)
return [entry for _, _, _, _, entry in scored]
def _rank_chunks(
candidates: list[ChunkResult],
project_hint: str | None,
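`_rank_project_state_entries` above scores each trusted-state entry by the tuple `(overlap, density, confidence, -index)` before the state band is truncated. The sketch below reproduces that ordering in isolation. `StateEntry` is a hypothetical stand-in for `ProjectStateEntry`. The regex `tokenize` is an assumption, because the real `_normalize`/`_tokenize` helpers are not shown in this diff.

```python
import re
from dataclasses import dataclass


@dataclass
class StateEntry:
    # Hypothetical stand-in for ProjectStateEntry (fields taken from the diff).
    category: str
    key: str
    value: str
    source: str = ""
    confidence: float = 1.0


STOP = {"how", "what", "when", "where", "which", "who", "why",
        "current", "status", "project"}


def tokenize(text: str) -> set[str]:
    # Assumed tokenizer: lowercase alphanumeric runs; the real one may differ.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_state(entries: list[StateEntry], query: str, project: str) -> list[StateEntry]:
    if not query or len(entries) <= 1:
        return entries
    query_tokens = tokenize(query.replace("_", " ")) - STOP
    # Drop project-name fragments such as "p04" and "gigabit".
    query_tokens -= set(project.lower().replace("_", "-").split("-"))
    if not query_tokens:
        return entries
    scored = []
    for index, entry in enumerate(entries):
        text = " ".join(
            [entry.category, entry.key.replace("_", " "), entry.value, entry.source]
        )
        tokens = tokenize(text)
        overlap = len(tokens & query_tokens)
        density = overlap / len(tokens) if tokens else 0.0
        # Overlap, then density, then confidence; -index keeps original order on ties.
        scored.append((overlap, density, entry.confidence, -index, entry))
    scored.sort(key=lambda t: t[:4], reverse=True)
    return [t[4] for t in scored]


entries = [
    StateEntry("contact", "abb-space", "ABB Space is the primary vendor contact."),
    StateEntry("requirement", "key_constraints", "1.2 m Zerodur mirror; mass below 103.5 kg."),
]
ranked = rank_state(entries, "what are the key GigaBIT M1 program constraints", "p04-gigabit")
print([e.category for e in ranked])  # ['requirement', 'contact']
```

Sorting only on the first four tuple fields means the entry objects themselves are never compared, and `-index` under `reverse=True` keeps the original order as the final tiebreak.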


@@ -50,6 +50,9 @@ MEMORY_STATUSES = [
"graduated", # Phase 5: memory has become an entity; content frozen, forward pointer in properties
]
DEFAULT_CONTEXT_MEMORY_LIMIT = 30
QUERY_CONTEXT_MEMORY_LIMIT = 120
@dataclass
class Memory:
@@ -344,6 +347,83 @@ def get_memories(
return [_row_to_memory(r) for r in rows]
def get_memory(memory_id: str) -> Memory | None:
"""Return a single memory by id, or None if missing.
Direct id lookup (no LIMIT, no confidence ordering) — the right
primitive for routes that need to check a specific memory's status
before acting. Avoids the sampling pitfall where ``get_memories``
with a small ``limit`` could hide a target row sorted past the cap.
"""
with get_connection() as conn:
row = conn.execute(
"SELECT * FROM memories WHERE id = ?", (memory_id,)
).fetchone()
return _row_to_memory(row) if row else None
def get_memory_count_summary() -> dict:
"""Aggregate memory counts straight from SQL (no sampling).
Returned shape:
{
"total": int, # all rows
"by_status": {status: int, ...}, # full table
"active": {
"total": int,
"reinforced": int, # active with reference_count > 0
"by_type": {memory_type: int, ...},
"by_project": {project_or_none: int, ...},
},
}
Distinct from ``get_memories(...)``, which is a row-fetcher with a
confidence-sorted LIMIT and is therefore not safe for counting.
"""
summary: dict = {
"total": 0,
"by_status": {},
"active": {
"total": 0,
"reinforced": 0,
"by_type": {},
"by_project": {},
},
}
with get_connection() as conn:
row = conn.execute("SELECT count(*) FROM memories").fetchone()
summary["total"] = row[0] if row else 0
rows = conn.execute(
"SELECT status, count(*) FROM memories GROUP BY status"
).fetchall()
summary["by_status"] = {r[0]: r[1] for r in rows}
active_total = summary["by_status"].get("active", 0)
summary["active"]["total"] = active_total
rows = conn.execute(
"SELECT memory_type, count(*) FROM memories "
"WHERE status = 'active' GROUP BY memory_type"
).fetchall()
summary["active"]["by_type"] = {r[0]: r[1] for r in rows}
rows = conn.execute(
"SELECT COALESCE(NULLIF(project, ''), '(none)') AS project, count(*) "
"FROM memories WHERE status = 'active' "
"GROUP BY COALESCE(NULLIF(project, ''), '(none)')"
).fetchall()
summary["active"]["by_project"] = {r[0]: r[1] for r in rows}
row = conn.execute(
"SELECT count(*) FROM memories "
"WHERE status = 'active' AND reference_count > 0"
).fetchone()
summary["active"]["reinforced"] = row[0] if row else 0
return summary
def update_memory(
memory_id: str,
content: str | None = None,
@@ -352,6 +432,7 @@ def update_memory(
memory_type: str | None = None,
domain_tags: list[str] | None = None,
valid_until: str | None = None,
project: str | None = None,
actor: str = "api",
note: str = "",
) -> bool:
@@ -365,6 +446,10 @@ def update_memory(
next_content = content if content is not None else existing["content"]
next_status = status if status is not None else existing["status"]
next_project = (
resolve_project_name(project) if project is not None
else (existing["project"] or "")
)
if confidence is not None:
_validate_confidence(confidence)
@@ -372,7 +457,7 @@ def update_memory(
duplicate = conn.execute(
"SELECT id FROM memories "
"WHERE memory_type = ? AND content = ? AND project = ? AND status = 'active' AND id != ?",
(existing["memory_type"], next_content, existing["project"] or "", memory_id),
(existing["memory_type"], next_content, next_project, memory_id),
).fetchone()
if duplicate:
raise ValueError("Update would create a duplicate active memory")
@@ -383,6 +468,7 @@ def update_memory(
"status": existing["status"],
"confidence": existing["confidence"],
"memory_type": existing["memory_type"],
"project": existing["project"] or "",
}
after_snapshot = dict(before_snapshot)
@@ -419,6 +505,10 @@ def update_memory(
updates.append("valid_until = ?")
params.append(vu)
after_snapshot["valid_until"] = vu or ""
if project is not None:
updates.append("project = ?")
params.append(next_project)
after_snapshot["project"] = next_project
if not updates:
return False
@@ -896,6 +986,7 @@ def get_memories_for_context(
from atocore.memory.reinforcement import _normalize, _tokenize
query_tokens = _tokenize(_normalize(query))
query_tokens = _prepare_memory_query_tokens(query_tokens, project=project)
if not query_tokens:
query_tokens = None
@@ -908,12 +999,13 @@ def get_memories_for_context(
# ``_rank_memories_for_query`` via Python's stable sort.
pool: list[Memory] = []
seen_ids: set[str] = set()
candidate_limit = QUERY_CONTEXT_MEMORY_LIMIT if query_tokens is not None else DEFAULT_CONTEXT_MEMORY_LIMIT
for mtype in memory_types:
for mem in get_memories(
memory_type=mtype,
project=project,
min_confidence=0.5,
limit=30,
limit=candidate_limit,
):
if mem.id in seen_ids:
continue
@@ -980,11 +1072,11 @@ def _rank_memories_for_query(
) -> list["Memory"]:
"""Rerank a memory list by lexical overlap with a pre-tokenized query.
Primary key: overlap_density (overlap_count / memory_token_count),
which rewards short focused memories that match the query precisely
over long overview memories that incidentally share a few tokens.
Secondary: absolute overlap count. Tertiary: domain-tag match.
Quaternary: confidence.
Primary key: absolute overlap count, which keeps a richer memory
matching multiple query-intent terms ahead of a short memory that
only happens to share one term. Secondary: overlap_density
(overlap_count / memory_token_count), so ties still prefer short
focused memories. Tertiary: domain-tag match. Quaternary: confidence.
Phase 3: domain_tags contribute a boost when they appear in the
query text. A memory tagged [optics, thermal] for a query about
@@ -1010,10 +1102,46 @@ def _rank_memories_for_query(
tag_hits += 1
scored.append((density, overlap, tag_hits, mem.confidence, mem))
scored.sort(key=lambda t: (t[0], t[1], t[2], t[3]), reverse=True)
scored.sort(key=lambda t: (t[1], t[0], t[2], t[3]), reverse=True)
return [mem for _, _, _, _, mem in scored]
_MEMORY_QUERY_STOP_TOKENS = {
"how",
"what",
"when",
"where",
"which",
"who",
"why",
"current",
"status",
"project",
"machine",
}
_MEMORY_QUERY_TOKEN_EXPANSIONS = {
"remotely": {"remote"},
}
def _prepare_memory_query_tokens(
query_tokens: set[str],
project: str | None = None,
) -> set[str]:
"""Remove project-scope noise and add tiny intent-preserving expansions."""
prepared = set(query_tokens)
for token in list(prepared):
prepared.update(_MEMORY_QUERY_TOKEN_EXPANSIONS.get(token, set()))
prepared -= _MEMORY_QUERY_STOP_TOKENS
if project:
for part in project.lower().replace("_", "-").split("-"):
if part:
prepared.discard(part)
return prepared
def _row_to_memory(row) -> Memory:
"""Convert a DB row to Memory dataclass."""
import json as _json
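The service hunks above make two changes to query-time memory ranking. `_prepare_memory_query_tokens` strips scope noise and adds small expansions, and `_rank_memories_for_query` now sorts on absolute overlap before density. The sketch below isolates both. The `FILLER` set and the regex `tokenize` are assumptions that stand in for the unshown `_normalize`/`_tokenize`. `rank` with its `overlap_first` flag is a hypothetical helper for comparing the old and new sort keys side by side.

```python
from __future__ import annotations

import re

QUERY_STOP = {"how", "what", "when", "where", "which", "who", "why",
              "current", "status", "project", "machine"}
EXPANSIONS = {"remotely": {"remote"}}
# Assumed filler-word filter; stands in for whatever the real _tokenize drops.
FILLER = {"a", "an", "the", "is", "for", "do", "we", "of", "to", "and"}


def tokenize(text: str) -> set[str]:
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in FILLER}


def prepare(tokens: set[str], project: str | None = None) -> set[str]:
    prepared = set(tokens)
    for token in list(prepared):
        prepared |= EXPANSIONS.get(token, set())  # e.g. remotely -> remote
    prepared -= QUERY_STOP
    if project:
        prepared -= {p for p in project.lower().replace("_", "-").split("-") if p}
    return prepared


def rank(memories: list[str], query_tokens: set[str], overlap_first: bool = True) -> list[str]:
    scored = []
    for mem in memories:
        tokens = tokenize(mem)
        overlap = len(tokens & query_tokens)
        density = overlap / len(tokens) if tokens else 0.0
        primary = (overlap, density) if overlap_first else (density, overlap)
        scored.append((primary, mem))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable on ties
    return [mem for _, mem in scored]


terse = "CGH vendor selected for p05."
rich = "Vendor-summary current signal: 4D is the strongest Twyman-Green candidate."
q = prepare(
    tokenize("what is the current vendor signal for the interferometer procurement"),
    project="p05-interferometer",
)
print(sorted(q))                                      # ['procurement', 'signal', 'vendor']
print(rank([terse, rich], q, overlap_first=False)[0] == terse)  # True: old density-first key
print(rank([terse, rich], q, overlap_first=True)[0] == rich)    # True: new overlap-first key
```

The terse memory matches one query term in four tokens (density 0.25), and the rich one matches two in nine (about 0.22). Density-first ranking therefore promotes the one-hit memory, which is the behavior the new overlap-first key corrects.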


@@ -143,6 +143,52 @@ def test_project_state_respects_total_budget(tmp_data_dir, sample_markdown):
assert len(pack.formatted_context) <= 120
def test_project_state_query_relevance_before_truncation(tmp_data_dir, sample_markdown):
"""Relevant trusted state should survive the project-state budget cap."""
init_db()
init_project_state_schema()
ingest_file(sample_markdown)
set_state(
"p04-gigabit",
"contact",
"abb-space",
"ABB Space is the primary vendor contact for polishing, CCP, IBF, procurement coordination, "
"contract administration, interface planning, and delivery discussions.",
)
set_state(
"p04-gigabit",
"decision",
"back-structure",
"Option B selected: conical isogrid back structure with variable rib density. "
"Chosen over flat-back for stiffness-to-weight ratio and manufacturability.",
)
set_state(
"p04-gigabit",
"decision",
"polishing-vendor",
"ABB Space selected as polishing vendor. Contract includes computer-controlled polishing "
"and ion beam figuring.",
)
set_state(
"p04-gigabit",
"requirement",
"key_constraints",
"The program targets a 1.2 m lightweight Zerodur mirror with filtered mechanical WFE below 15 nm "
"and mass below 103.5 kg.",
)
pack = build_context(
"what are the key GigaBIT M1 program constraints",
project_hint="p04-gigabit",
budget=3000,
)
assert "Zerodur" in pack.formatted_context
assert "1.2" in pack.formatted_context
assert pack.formatted_context.find("[REQUIREMENT]") < pack.formatted_context.find("[CONTACT]")
def test_project_hint_matches_state_case_insensitively(tmp_data_dir, sample_markdown):
"""Project state lookup should not depend on exact casing."""
init_db()


@@ -192,3 +192,120 @@ def test_v1_aliases_present(env):
"/v1/memory/{memory_id}/supersede",
):
assert p in paths, f"{p} missing"
# ---------------------------------------------------------------------------
# Wave 1 (2026-04-29) — invalidation route used to do
# `_get_memories(status='active', limit=1)` and look for the target id
# inside that single highest-confidence row, so any active memory
# outside slot 0 fell through as 404. Direct id lookup fixes it.
# ---------------------------------------------------------------------------
def test_api_invalidate_finds_active_memory_outside_top_one(env):
"""An active memory not at the top of the confidence sort must still
be invalidatable via POST /memory/{id}/invalidate."""
high = create_memory(
memory_type="knowledge",
content="high-confidence top row",
confidence=0.99,
)
low = create_memory(
memory_type="knowledge",
content="lower-confidence target",
confidence=0.55,
)
client = TestClient(app)
r = client.post(f"/memory/{low.id}/invalidate", json={"reason": "wave1 regression"})
assert r.status_code == 200, r.text
assert r.json()["status"] == "invalidated"
# And confirm the high-confidence row is untouched
assert _get_memory(high.id).status == "active"
assert _get_memory(low.id).status == "invalid"
def test_api_invalidate_already_invalid_is_idempotent(env):
m = create_memory(memory_type="knowledge", content="already invalid")
client = TestClient(app)
r1 = client.post(f"/memory/{m.id}/invalidate", json={"reason": "first"})
assert r1.status_code == 200
r2 = client.post(f"/memory/{m.id}/invalidate", json={"reason": "again"})
assert r2.status_code == 200
assert r2.json()["status"] == "already_invalid"
def test_api_invalidate_candidate_returns_409(env):
m = create_memory(
memory_type="knowledge", content="candidate route", status="candidate"
)
client = TestClient(app)
r = client.post(f"/memory/{m.id}/invalidate", json={"reason": "wrong route"})
assert r.status_code == 409
def test_api_invalidate_unknown_id_is_404(env):
client = TestClient(app)
r = client.post("/memory/no-such-id/invalidate", json={"reason": "ghost"})
assert r.status_code == 404
def test_api_supersede_candidate_returns_409(env):
"""Mirror of the invalidate guard: candidates must not silently flip
to superseded via the active-only supersede route."""
m = create_memory(
memory_type="knowledge", content="candidate target", status="candidate"
)
client = TestClient(app)
r = client.post(f"/memory/{m.id}/supersede", json={"reason": "wrong route"})
assert r.status_code == 409
# Row should still be a candidate
assert _get_memory(m.id).status == "candidate"
def test_api_supersede_already_superseded_is_idempotent(env):
m = create_memory(memory_type="knowledge", content="will be superseded")
client = TestClient(app)
r1 = client.post(f"/memory/{m.id}/supersede", json={"reason": "first"})
assert r1.status_code == 200
r2 = client.post(f"/memory/{m.id}/supersede", json={"reason": "again"})
assert r2.status_code == 200
assert r2.json()["status"] == "already_superseded"
def test_api_supersede_unknown_id_is_404(env):
client = TestClient(app)
r = client.post("/memory/no-such-id/supersede", json={"reason": "ghost"})
assert r.status_code == 404
def test_admin_dashboard_active_count_matches_full_table(env):
"""/admin/dashboard memories.active must match the SQL aggregate even
when there are more active memories than the legacy sample limit (500).
This guards the Codex finding that the dashboard was deriving counts
from a confidence-sorted limit=500 fetch, hiding rows past the cap.
We don't need 500 rows in the test — a small corpus that exercises
the SQL-aggregate path is enough; the integrity-vs-dashboard equality
is the invariant being asserted.
"""
# Mix of statuses to exercise the by_status aggregate
create_memory(memory_type="knowledge", content="a")
create_memory(memory_type="knowledge", content="b", project="p06-polisher")
create_memory(memory_type="project", content="c-cand", status="candidate")
cand = create_memory(memory_type="project", content="d-cand", status="candidate")
# Invalidate one candidate via the service path to seed an "invalid" bucket
from atocore.memory.service import invalidate_memory
invalidate_memory(cand.id)
client = TestClient(app)
dash = client.get("/admin/dashboard").json()
assert dash["memories"]["active"] == 2
assert dash["memories"]["candidates"] == 1
assert dash["memories"]["by_status"]["invalid"] == 1
assert dash["memories"]["total"] == 4
assert dash["memories"]["by_project"].get("p06-polisher") == 1
# "(none)" bucket is the COALESCE label for empty/null project
assert "(none)" in dash["memories"]["by_project"]
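The invalidate and supersede route tests above pin down one guard order: 404 for an unknown id, 200 `already_<status>` when the row is already in the target state, 409 for any other non-active status, and 200 after a successful transition. The sketch below restates that decision table as a pure function, without FastAPI. `Mem`, `guard_transition`, and the `detail` strings are hypothetical and are not the project's route code.

```python
from dataclasses import dataclass


@dataclass
class Mem:
    # Hypothetical minimal record; the real Memory dataclass has more fields.
    id: str
    status: str


def guard_transition(
    store: dict[str, Mem], memory_id: str, target: str, applied_label: str
) -> tuple[int, dict]:
    """Return (http_status, body) for an active-only transition, e.g.
    invalidate (target="invalid") or supersede (target="superseded")."""
    mem = store.get(memory_id)  # direct id lookup, no LIMIT-based sampling
    if mem is None:
        return 404, {"detail": f"Memory {memory_id} not found"}
    if mem.status == target:
        return 200, {"status": f"already_{target}", "id": memory_id}
    if mem.status != "active":
        # Candidates (and other states) must not silently flip.
        return 409, {"detail": f"Memory {memory_id} is {mem.status}, not active"}
    mem.status = target
    return 200, {"status": applied_label, "id": memory_id}


store = {"m1": Mem("m1", "active"), "m2": Mem("m2", "candidate")}
print(guard_transition(store, "m1", "superseded", "superseded")[1]["status"])  # superseded
print(guard_transition(store, "m1", "superseded", "superseded")[1]["status"])  # already_superseded
print(guard_transition(store, "m2", "superseded", "superseded")[0])            # 409
print(guard_transition(store, "ghost", "invalid", "invalidated")[0])           # 404
```

Checking `already_<status>` before the active-only guard is what makes a repeated call idempotent instead of a 409.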


@@ -428,6 +428,136 @@ def test_context_builder_tag_boost_orders_results(isolated_db):
assert idx_tagged < idx_untagged
def test_project_memory_ranking_ignores_scope_noise(isolated_db):
"""Project words should not crowd out the actual query intent."""
from atocore.memory.service import create_memory, get_memories_for_context
create_memory(
"project",
"Norman is the end operator for p06-polisher and requires an explicit manual mode to operate the machine.",
project="p06-polisher",
confidence=0.7,
)
create_memory(
"project",
"Polisher Control firmware spec document titled 'Fulum Polisher Machine Control Firmware Spec v1' lives in PKM.",
project="p06-polisher",
confidence=0.7,
)
create_memory(
"project",
"Machine design principle: works fully offline and independently; network connection is for remote access only",
project="p06-polisher",
confidence=0.5,
)
create_memory(
"project",
"Use Tailscale mesh for RPi remote access to provide SSH, file transfer, and NAT traversal without port forwarding.",
project="p06-polisher",
confidence=0.5,
)
text, _ = get_memories_for_context(
memory_types=["project"],
project="p06-polisher",
budget=360,
query="how do we access the polisher machine remotely",
)
assert "Tailscale" in text
assert text.find("remote access only") < text.find("Tailscale")
assert "manual mode" not in text
def test_project_memory_ranking_prefers_multiple_intent_hits(isolated_db):
"""A rich memory with several query hits should beat a terse one-hit memory."""
from atocore.memory.service import create_memory, get_memories_for_context
create_memory(
"project",
"CGH vendor selected for p05. Active integration coordination with Katie/AOM.",
project="p05-interferometer",
confidence=0.7,
)
create_memory(
"knowledge",
"Vendor-summary current signal: 4D is the strongest technical Twyman-Green candidate; "
"a certified used Zygo Verifire SV around $55k emerged as a strong value path.",
project="p05-interferometer",
confidence=0.9,
)
text, _ = get_memories_for_context(
memory_types=["project", "knowledge"],
project="p05-interferometer",
budget=220,
query="what is the current vendor signal for the interferometer procurement",
)
assert "4D" in text
assert "Zygo" in text
def test_project_memory_query_ranks_beyond_confidence_prefilter(isolated_db):
"""Query-time ranking should see older low-confidence but exact-intent memories."""
from atocore.memory.service import create_memory, get_memories_for_context
for idx in range(35):
create_memory(
"project",
f"High confidence p06 filler memory {idx}: Polisher Control planning note.",
project="p06-polisher",
confidence=0.9,
)
create_memory(
"project",
"Use Tailscale mesh for RPi remote access to provide SSH, file transfer, and NAT traversal without port forwarding.",
project="p06-polisher",
confidence=0.5,
)
text, _ = get_memories_for_context(
memory_types=["project"],
project="p06-polisher",
budget=360,
query="how do we access the polisher machine remotely",
)
assert "Tailscale" in text
def test_project_memory_query_prefers_exact_cam_fact(isolated_db):
from atocore.memory.service import create_memory, get_memories_for_context
create_memory(
"project",
"Polisher Control firmware spec document titled 'Fulum Polisher Machine Control Firmware Spec v1' lives in PKM.",
project="p06-polisher",
confidence=0.9,
)
create_memory(
"project",
"Polisher Control doc must cover manual mode for Norman as a required deliverable per the plan.",
project="p06-polisher",
confidence=0.9,
)
create_memory(
"project",
"Cam amplitude and offset are mechanically set by operator and read via encoders; no actuators control them.",
project="p06-polisher",
confidence=0.5,
)
text, _ = get_memories_for_context(
memory_types=["project"],
project="p06-polisher",
budget=300,
query="how is cam amplitude controlled on the polisher",
)
assert "encoders" in text
def test_expire_stale_candidates_keeps_reinforced(isolated_db):
from atocore.memory.service import create_memory, expire_stale_candidates
from atocore.models.database import get_connection
@@ -445,3 +575,121 @@ def test_expire_stale_candidates_keeps_reinforced(isolated_db):
assert mid not in expired
mem = _get_memory_by_id(mid)
assert mem["status"] == "candidate"
# ---------------------------------------------------------------------------
# Wave 1 (2026-04-29) — counts come from SQL, not from the top-N sample.
# Exposed by Codex audit when prod /admin/dashboard reported 315 active
# while /admin/integrity-check reported 1091. The dashboard was building
# its counts from a confidence-sorted limit=500 fetch.
# ---------------------------------------------------------------------------
def test_get_memory_count_summary_returns_full_table_aggregates(isolated_db):
"""Counts come from SQL aggregates, not a sampled fetch."""
from atocore.memory.service import (
create_memory,
get_memory_count_summary,
invalidate_memory,
)
# Create more rows than any reasonable sampling LIMIT so any
# LIMIT-based counter would visibly disagree with reality.
for i in range(120):
create_memory(
"knowledge",
f"fact-{i}",
project="p04-gigabit",
confidence=0.9,
status="active",
)
for i in range(7):
create_memory("knowledge", f"cand-{i}", status="candidate")
invalid_obj = create_memory("knowledge", "to-invalidate", status="active")
invalidate_memory(invalid_obj.id)
summary = get_memory_count_summary()
assert summary["total"] == 120 + 7 + 1
assert summary["by_status"]["active"] == 120
assert summary["by_status"]["candidate"] == 7
assert summary["by_status"]["invalid"] == 1
assert summary["active"]["total"] == 120
assert summary["active"]["by_type"] == {"knowledge": 120}
assert summary["active"]["by_project"] == {"p04-gigabit": 120}
def test_get_memory_returns_single_row_or_none(isolated_db):
from atocore.memory.service import create_memory, get_memory
mem = create_memory("knowledge", "single-row test")
fetched = get_memory(mem.id)
assert fetched is not None
assert fetched.id == mem.id
assert get_memory("non-existent-id") is None
def test_update_memory_can_change_project_with_canonicalization(
isolated_db, project_registry
):
"""update_memory(project=...) canonicalizes aliases and writes audit."""
project_registry(("p04-gigabit", ("p04", "gigabit")))
from atocore.memory.service import (
create_memory,
get_memory,
get_memory_audit,
update_memory,
)
mem = create_memory("knowledge", "retargetable fact", project="atocore")
ok = update_memory(mem.id, project="p04") # alias
assert ok is True
refreshed = get_memory(mem.id)
assert refreshed.project == "p04-gigabit" # canonical, not "p04"
audit_rows = get_memory_audit(mem.id, limit=10)
update_rows = [r for r in audit_rows if r.get("action") == "updated"]
assert update_rows, f"expected an updated audit row, got {audit_rows}"
head = update_rows[0]
assert head["before"]["project"] == "atocore"
assert head["after"]["project"] == "p04-gigabit"
def test_update_memory_project_unchanged_when_not_passed(isolated_db):
from atocore.memory.service import create_memory, get_memory, update_memory
mem = create_memory("knowledge", "untouched project", project="p06-polisher")
update_memory(mem.id, content="edited content")
assert get_memory(mem.id).project == "p06-polisher"
def test_update_memory_to_empty_project_detects_global_duplicate(isolated_db):
"""Codex P3: when retargeting to project='' (global), the duplicate
check must scope to the new project. If a global active memory with
the same content already exists, the update must raise."""
import pytest as _pytest
from atocore.memory.service import create_memory, update_memory
create_memory("knowledge", "shared global fact", project="")
scoped = create_memory("knowledge", "shared global fact", project="p04-gigabit")
with _pytest.raises(ValueError, match="duplicate active memory"):
update_memory(scoped.id, project="")
def test_auto_triage_suggested_project_put_body_uses_project_key():
"""Regression: the auto_triage caller used to PUT {"content": ...}
which silently dropped the suggested project change. The fix sends
{"project": suggested}. Inspect the script source so we don't have
to spin up a live triage run."""
from pathlib import Path
src = Path(__file__).resolve().parents[1] / "scripts" / "auto_triage.py"
text = src.read_text(encoding="utf-8")
# The block that PUTs to /memory/{mid} for a suggested_project fix
assert 'json.dumps({"project": suggested})' in text, (
"auto_triage.py must PUT {\"project\": suggested} so the "
"suggested-project correction actually applies. See Wave 1."
)
# And must not be back to the old shape
assert 'json.dumps({"content": cand["content"]})' not in text
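The P3 test `test_update_memory_to_empty_project_detects_global_duplicate` relies on `update_memory` scoping its duplicate query to `next_project` rather than to the row's current project. The SQLite sketch below shows the difference. The five-column schema and the `setup`/`would_duplicate` helpers are hypothetical simplifications of the service code.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    # Hypothetical minimal schema: one global fact and one project-scoped copy.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories (id TEXT PRIMARY KEY, memory_type TEXT, "
        "content TEXT, project TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO memories VALUES (?, ?, ?, ?, ?)",
        [
            ("g1", "knowledge", "shared global fact", "", "active"),
            ("s1", "knowledge", "shared global fact", "p04-gigabit", "active"),
        ],
    )
    return conn


def would_duplicate(conn: sqlite3.Connection, memory_id: str, project: str) -> bool:
    """True if moving memory_id into `project` would collide with another active row."""
    memory_type, content = conn.execute(
        "SELECT memory_type, content FROM memories WHERE id = ?", (memory_id,)
    ).fetchone()
    dup = conn.execute(
        "SELECT id FROM memories WHERE memory_type = ? AND content = ? "
        "AND project = ? AND status = 'active' AND id != ?",
        (memory_type, content, project, memory_id),
    ).fetchone()
    return dup is not None


conn = setup()
print(would_duplicate(conn, "s1", "p04-gigabit"))  # False: checking the current project misses it
print(would_duplicate(conn, "s1", ""))             # True: checking the target project finds g1
```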