docs(ledger): Wave 1 Codex audit closure + amend session log

Records Codex's formal GO WITH CONDITIONS verdict, the two P2 amends shipped on 3a474f7, and the deployment checklist carried into the merge plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(memory): close Codex Wave 1 audit conditions (auto_triage + supersede guard)
2026-04-28 21:54:20 -04:00 · 2026-04-28 21:53:39 -04:00
5 changed files with 95 additions and 10 deletions
--- a/DEV-LEDGER.md
+++ b/DEV-LEDGER.md
@@ -9,7 +9,7 @@
 - **live_sha** (Dalidou `/health` build_sha): `7042eae` (verified 2026-04-29T01:19Z; status=ok; deployed 2026-04-25T01:04Z, docs-only on top of `d3de9f6`)
 - **last_updated**: 2026-04-29 by Claude (Wave 1 debt-pay branch open; Codex review of plan applied)
 - **main_tip**: `7042eae`
- **test_count**: 581 on `claude/wave1-dashboard-counts-and-memory-fixes` (572 on main + 9 Wave 1 regressions)
+- **test_count**: 586 on `claude/wave1-dashboard-counts-and-memory-fixes` (572 on main + 14 Wave 1 regressions; +5 from Codex review amends)
 - **harness**: `20/20 PASS` on live Dalidou, 0 blocking failures, 0 known issues (last harness run nightly 2026-04-28T03:00:30Z)
 - **vectors**: 33,253
 - **active_memories**: 315 dashboard / 1091 integrity (verified live 2026-04-29). Discrepancy was a dashboard sampling bug — Wave 1 commit `fb4d55c` replaces it with SQL aggregates; post-deploy the two will agree.
@@ -25,7 +25,7 @@
 - **dashboard**: http://dalidou:8100/admin/dashboard
 - **active_track**: Wave 1 debt-pay (this session) → V1-A start. V1-A gates have cleared (soak ended 2026-04-26; density 315 ≫ 100). Plan: `docs/plans/engineering-v1-completion-plan.md`. Resume map: `docs/plans/v1-resume-state.md`.
 - **last_nightly_pipeline**: `2026-04-28T03:00:30Z` — harness 20/20, triage promoted=1 rejected=1 human=0
- **open_branches**: `claude/wave1-dashboard-counts-and-memory-fixes` (commit `fb4d55c`) — three memory-write-path bugs surfaced by Codex review; awaiting Codex audit before merge/deploy.
+- **open_branches**: `claude/wave1-dashboard-counts-and-memory-fixes` (tip `3a474f7`) — three memory-write-path bugs + two follow-on P2 fixes from Codex's formal audit (`auto_triage.py` PUT body + `/memory/{id}/supersede` status guard). Awaiting Codex re-review of the amended branch before squash-merge/deploy.

 ## Active Plan

@@ -170,6 +170,8 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha

 ## Session Log

+- **2026-04-29 Codex + Claude (Wave 1 formal audit closed + amends)** Codex's formal audit of `fb4d55c` (Wave 1 first commit on `claude/wave1-dashboard-counts-and-memory-fixes`): verdict GO WITH CONDITIONS. Two P1-prior closures confirmed (dashboard count bug + invalidate top-1 lookup). Project-update P2 was only "partially closed" — API/service plumbing fine, but `scripts/auto_triage.py:417` still PUT `{"content": cand["content"]}` so the operational suggested-project correction was unreachable even with `MemoryUpdateRequest.project` in place. Codex also flagged the symmetric supersede-route gap as same-class adjacent surface and recommended pulling it in here, not in Wave 1.5. Plus one P3: cover retarget-to-empty-project against a global active duplicate. Amended on `3a474f7`: (1) auto_triage PUT body now `{"project": suggested}` with a guard test that lints the script source for the new shape; (2) `/memory/{id}/supersede` mirrors the invalidate guard via `get_memory(id)` — 404 unknown / 200 already_superseded / 409 wrong-status / 200 superseded; (3) regression test for project-empty duplicate detection. Test count 581 → 586. Codex's recommended deployment checklist (post-deploy verifications, including the run-or-simulate auto-triage retarget probe) carried into the merge plan. Awaiting Codex re-review of the amended tip before squash-merge.
+
 - **2026-04-29 Claude (Wave 1 debt-pay started; Codex review of state-of-service plan)** Audited live state on Dalidou: `/health` build_sha `7042eae` (4d old), harness 20/20, 33,253 vectors, 1,748 docs, 1054 interactions (+103/4d), dashboard memories.active=315 vs integrity.active_memory_count=1091. Drafted a state-of-service assessment + Wave 1/2/3/4 plan, then asked Codex (gpt-5.5) for an adversarial review via `codex exec`. Codex verdict: assessment MIXED, plan ENDORSE WITH CHANGES. Codex caught two factual corrections (the 315-vs-1091 gap is a *sampling bug* not a definitional gap; my "R9 drops unregistered tags" framing is wrong — `extractor_llm.py:213-233` preserves them) and two new memory-write-path bugs I missed. Branched `claude/wave1-dashboard-counts-and-memory-fixes` from `7042eae`, fixed all three: (1) `/admin/dashboard` now uses a new `get_memory_count_summary()` SQL aggregate helper instead of counting inside a confidence-sorted `get_memories(limit=500)` sample; (2) `MemoryUpdateRequest` and `update_memory()` accept `project` with `resolve_project_name` canonicalization + before/after audit, so `auto_triage.py:407` suggested-project corrections will now actually apply; (3) `POST /memory/{id}/invalidate` replaces the `_get_memories(status="active", limit=1)` lookup (which only saw the highest-confidence active row) with a direct id lookup via new `get_memory(id)` helper. 9 regression tests added across `test_memory.py` and `test_invalidate_supersede.py`. Full local suite: 581 passed (572 → 581). Commit `fb4d55c`. Branch not pushed/deployed yet — awaiting Codex audit per working model. Refreshed Orientation block (live_sha, last_updated, test_count, memory counts, capture cadence, registered/unregistered project breakdown, open_branches). Wave 1 follow-up still open: (W1.2) one-click registration proposal for unregistered projects with ≥10 active memories (apm=63 is overdue); (W1.3 done by this entry) sync ledger; (W1.4) committed measurable-win probe fixture+JSON output. After this branch lands, V1-A is unblocked: gates have effectively cleared (soak ended 2026-04-26; density 315 ≫ 100 target).

 - **2026-04-25 Codex (p04 constraint gap closed; harness fully green)** Root-caused the remaining `p04-constraints` fixture: the `Zerodur` / `1.2` fact already existed in Trusted Project State (`requirement/key_constraints`), but project-state formatting was category/key ordered and then truncated to the 20% state budget, so contacts/decisions consumed the budget before the relevant requirement. Added query-relevance ranking for Trusted Project State entries before formatting/truncation, with regression coverage in `test_project_state_query_relevance_before_truncation`. Removed the fixture's `known_issue` lane so future p04 constraint regressions are blocking. Cleaned up a duplicate live requirement entry created during diagnosis by invalidating `requirement/mirror-blank-core-constraints`; canonical `requirement/key_constraints` remains active. Verified focused suite: 35 passed. Verified full local suite: 572 passed. Deployed `d3de9f67eaa08dfc5b2d86e8221b8c70fef266d3`; live exact p04 probe now surfaces `[REQUIREMENT] key_constraints` with `1.2` and `Zerodur`. Live retrieval harness: 20/20, 0 known issues, 0 blocking failures.
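The 315-vs-1091 gap recorded in the ledger can be reproduced in miniature. The following is a minimal sketch, assuming a hypothetical `memories` table with `status` and `confidence` columns; the real schema and the body of `get_memory_count_summary()` are not part of this diff. Counting active rows inside a confidence-sorted, limit-capped sample undercounts whenever enough non-active rows outrank the active ones, while a SQL aggregate sees the whole table.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real AtoCore table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, status TEXT, confidence REAL)"
)
conn.executemany(
    "INSERT INTO memories (status, confidence) VALUES (?, ?)",
    [("candidate", 0.9)] * 300 + [("active", 0.5)] * 700,
)


def sampled_active_count(limit=500):
    """Legacy-style count: filter for 'active' inside a capped, sorted sample."""
    rows = conn.execute(
        "SELECT status FROM memories ORDER BY confidence DESC LIMIT ?", (limit,)
    ).fetchall()
    return sum(1 for (status,) in rows if status == "active")


def aggregate_active_count():
    """Aggregate-style count: let SQL count every active row in the table."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM memories WHERE status = 'active'"
    ).fetchone()
    return n


sampled = sampled_active_count()
aggregate = aggregate_active_count()
```

With these made-up rows the sample holds all 300 candidates and only 200 of the 700 active memories, so the two counts disagree for the same reason the dashboard and integrity numbers did.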
--- a/scripts/auto_triage.py
+++ b/scripts/auto_triage.py
@@ -404,19 +404,23 @@ def process_candidate(cand, base_url, active_cache, state_cache, known_projects,
        known_projects, TIER1_MODEL, DEFAULT_TIMEOUT_S,
    )

-    # Project misattribution fix: suggested_project surfaces from tier 1
+    # Project misattribution fix: suggested_project surfaces from tier 1.
+    # Earlier code sent a PUT with only {"content": cand["content"]}, which left
+    # the project field unchanged because MemoryUpdateRequest had no
+    # project key and the service signature didn't accept one. Wave 1
+    # added project to MemoryUpdateRequest and update_memory(); this
+    # caller now actually applies the suggested project.
    suggested = (v1.get("suggested_project") or "").strip()
    if suggested and suggested != project and suggested in known_projects:
-        # Try to re-canonicalize the memory's project
        if not dry_run:
            try:
                import urllib.request as _ur
                req = _ur.Request(
                    f"{base_url}/memory/{mid}", method="PUT",
                    headers={"Content-Type": "application/json"},
-                    data=json.dumps({"content": cand["content"]}).encode("utf-8"),
+                    data=json.dumps({"project": suggested}).encode("utf-8"),
                )
-                _ur.urlopen(req, timeout=10).read()  # triggers canonicalization via update
+                _ur.urlopen(req, timeout=10).read()
            except Exception:
                pass
        print(f"    ↺ misattribution flagged: {project!r} → {suggested!r}")
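The guard and the request body in the hunk above can be exercised without a live server. This is a sketch under stated assumptions: `pick_retarget` and `build_retarget_request` are hypothetical helper names that do not exist in `auto_triage.py`, and the base URL and memory id are placeholders. It only builds the request; nothing is sent.

```python
import json
import urllib.request


def pick_retarget(suggested_raw, project, known_projects):
    """Return the project to retarget to, or None when no change applies."""
    suggested = (suggested_raw or "").strip()
    if suggested and suggested != project and suggested in known_projects:
        return suggested
    return None


def build_retarget_request(base_url, mid, suggested):
    """Build (but do not send) the PUT that carries only the project key."""
    return urllib.request.Request(
        f"{base_url}/memory/{mid}",
        method="PUT",
        headers={"Content-Type": "application/json"},
        data=json.dumps({"project": suggested}).encode("utf-8"),
    )


known = {"p04-gigabit", "p06-polisher"}
target = pick_retarget(" p04-gigabit ", "p06-polisher", known)
req = build_retarget_request("http://localhost:8100", "abc123", target)
```

Separating the decision from the HTTP call would also let a test assert on the request body directly instead of inspecting the script source.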
--- a/src/atocore/api/routes.py
+++ b/src/atocore/api/routes.py
@@ -827,15 +827,33 @@ def api_supersede_memory(
    memory_id: str,
    req: MemorySupersedeRequest | None = None,
 ) -> dict:
-    """Supersede an active memory (Issue E — active → superseded)."""
-    from atocore.memory.service import supersede_memory
+    """Supersede an active memory (Issue E — active → superseded).
+
+    Mirrors the invalidate route's status guard: candidates and other
+    non-active rows must not silently flip to superseded.
+    """
+    from atocore.memory.service import get_memory, supersede_memory

    reason = req.reason if req else ""
+    target = get_memory(memory_id)
+    if target is None:
+        raise HTTPException(status_code=404, detail=f"Memory not found: {memory_id}")
+    if target.status == "superseded":
+        return {"status": "already_superseded", "id": memory_id}
+    if target.status != "active":
+        raise HTTPException(
+            status_code=409,
+            detail=(
+                f"Memory {memory_id} is {target.status}; "
+                "only active memories can be superseded"
+            ),
+        )
+
    success = supersede_memory(memory_id, actor="api-http", reason=reason)
    if not success:
        raise HTTPException(
-            status_code=404,
-            detail=f"Memory not found or not active: {memory_id}",
+            status_code=409,
+            detail=f"Memory {memory_id} could not be superseded",
        )
    return {"status": "superseded", "id": memory_id}
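The status guard added to the supersede route reduces to a small decision table. A framework-free sketch, assuming a hypothetical `supersede_outcome` function; the `not_found` and `conflict` labels are illustrative stand-ins for the `HTTPException` responses the real route raises, and the final branch assumes `supersede_memory()` succeeds.

```python
def supersede_outcome(status):
    """Map a memory's current status (None if the id is unknown) to
    (http_status, label), following the order of checks in the route."""
    if status is None:
        return 404, "not_found"
    if status == "superseded":
        # Idempotent: repeating the call on a superseded row is not an error.
        return 200, "already_superseded"
    if status != "active":
        # Candidates and other non-active rows must not flip silently.
        return 409, "conflict"
    # Active rows proceed to supersede_memory(); assumed to succeed here.
    return 200, "superseded"


outcomes = {
    s: supersede_outcome(s) for s in (None, "superseded", "candidate", "active")
}
```

The order matters: the idempotent `already_superseded` check has to run before the generic non-active check, otherwise a repeated call would return 409 instead of 200.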

--- a/tests/test_invalidate_supersede.py
+++ b/tests/test_invalidate_supersede.py
@@ -249,6 +249,35 @@ def test_api_invalidate_unknown_id_is_404(env):
    assert r.status_code == 404


+def test_api_supersede_candidate_returns_409(env):
+    """Mirror of the invalidate guard: candidates must not silently flip
+    to superseded via the active-only supersede route."""
+    m = create_memory(
+        memory_type="knowledge", content="candidate target", status="candidate"
+    )
+    client = TestClient(app)
+    r = client.post(f"/memory/{m.id}/supersede", json={"reason": "wrong route"})
+    assert r.status_code == 409
+    # Row should still be a candidate
+    assert _get_memory(m.id).status == "candidate"
+
+
+def test_api_supersede_already_superseded_is_idempotent(env):
+    m = create_memory(memory_type="knowledge", content="will be superseded")
+    client = TestClient(app)
+    r1 = client.post(f"/memory/{m.id}/supersede", json={"reason": "first"})
+    assert r1.status_code == 200
+    r2 = client.post(f"/memory/{m.id}/supersede", json={"reason": "again"})
+    assert r2.status_code == 200
+    assert r2.json()["status"] == "already_superseded"
+
+
+def test_api_supersede_unknown_id_is_404(env):
+    client = TestClient(app)
+    r = client.post("/memory/no-such-id/supersede", json={"reason": "ghost"})
+    assert r.status_code == 404
+
+
 def test_admin_dashboard_active_count_matches_full_table(env):
    """/admin/dashboard memories.active must match the SQL aggregate even
    when there are more active memories than the legacy sample limit (500).
--- a/tests/test_memory.py
+++ b/tests/test_memory.py
@@ -661,3 +661,35 @@ def test_update_memory_project_unchanged_when_not_passed(isolated_db):
    mem = create_memory("knowledge", "untouched project", project="p06-polisher")
    update_memory(mem.id, content="edited content")
    assert get_memory(mem.id).project == "p06-polisher"
+
+
+def test_update_memory_to_empty_project_detects_global_duplicate(isolated_db):
+    """Codex P3: when retargeting to project='' (global), the duplicate
+    check must scope to the new project. If a global active memory with
+    the same content already exists, the update must raise."""
+    import pytest as _pytest
+    from atocore.memory.service import create_memory, update_memory
+
+    create_memory("knowledge", "shared global fact", project="")
+    scoped = create_memory("knowledge", "shared global fact", project="p04-gigabit")
+
+    with _pytest.raises(ValueError, match="duplicate active memory"):
+        update_memory(scoped.id, project="")
+
+
+def test_auto_triage_suggested_project_put_body_uses_project_key():
+    """Regression: the auto_triage caller used to PUT {"content": ...}
+    which silently dropped the suggested project change. The fix sends
+    {"project": suggested}. Inspect the script source so we don't have
+    to spin up a live triage run."""
+    from pathlib import Path
+
+    src = Path(__file__).resolve().parents[1] / "scripts" / "auto_triage.py"
+    text = src.read_text(encoding="utf-8")
+    # The block that PUTs to /memory/{mid} for a suggested_project fix
+    assert 'json.dumps({"project": suggested})' in text, (
+        "auto_triage.py must PUT {\"project\": suggested} so the "
+        "suggested-project correction actually applies. See Wave 1."
+    )
+    # And must not be back to the old shape
+    assert 'json.dumps({"content": cand["content"]})' not in text
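The source-lint test above matches an exact substring, so a purely cosmetic reformat of the `json.dumps(...)` call would make it fail. One possible alternative, sketched here with a hypothetical helper `json_dumps_dict_keys` that is not part of the repository, is to parse the script with the standard `ast` module and compare the keys of dict literals passed to `json.dumps`.

```python
import ast


def json_dumps_dict_keys(source):
    """Return the key sets of every dict literal passed directly to json.dumps()."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "dumps"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "json"
            and node.args
            and isinstance(node.args[0], ast.Dict)
        ):
            # Keep only literal keys; **unpacking shows up as None and is skipped.
            keys = frozenset(
                k.value for k in node.args[0].keys if isinstance(k, ast.Constant)
            )
            found.append(keys)
    return found


# A reformatted call that a substring check would miss.
sample = 'data = json.dumps(\n    {"project": suggested}\n).encode("utf-8")\n'
keys = json_dumps_dict_keys(sample)
```

A test built on this would assert that `frozenset({"project"})` is present and `frozenset({"content"})` is absent, at the cost of being less obvious to read than the substring check.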