> Section headers are stable - do not rename them. Trim Session Log and Recent Decisions to the last 20 entries at session end; older history lives in `git log` and `docs/`.
Pick 30 real captures: 10 that should produce 0 candidates, 10 that should plausibly produce 1, 10 ambiguous/hard. Store as a stable artifact (interaction id, expected count, expected type, notes). Add a runner that scores extractor output against labels.
Success: 30 labeled interactions in a stable artifact, one-command precision/recall output.
Fail-early: if labeling 30 takes more than a day because the concept is unclear, tighten the extraction target before touching code.
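A minimal sketch of how a runner might score extractor output against the labeled artifact. The `Label` fields mirror the ones listed above (interaction id, expected count, expected type, notes); the `score` function, the shape of `produced`, and the type-match rule for counting true positives are all assumptions for illustration, not the project's actual runner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    interaction_id: str
    expected_count: int
    expected_type: str  # "" when no candidate is expected
    notes: str = ""


def score(labels, produced):
    """Score extractor output against labels.

    `produced` (hypothetical shape) maps interaction_id -> list of candidate
    types the extractor emitted for that interaction.
    """
    tp = fp = fn = 0
    for label in labels:
        types = produced.get(label.interaction_id, [])
        same_type = sum(1 for t in types if t == label.expected_type)
        matched = min(label.expected_count, same_type)
        tp += matched
        fp += len(types) - matched          # anything emitted beyond the matches
        fn += label.expected_count - matched  # expected but not produced
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

Counting matches by type keeps the scorer deterministic; a stricter runner could also compare candidate text, at the cost of fuzzier labels.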
Run the rule-based extractor on all 30. Record yield, TP, FP, FN. Bucket misses by class (conversational preference, decision summary, status/constraint, meta chatter).
Success: short scorecard with counts by miss type, top 2 miss classes obvious.
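One way the scorecard could be tallied, assuming each false negative has already been hand-tagged with one of the miss classes named above. The `scorecard` helper and its return shape are hypothetical.

```python
from collections import Counter

# Miss classes taken from the plan text above.
MISS_CLASSES = (
    "conversational preference",
    "decision summary",
    "status/constraint",
    "meta chatter",
)


def scorecard(misses):
    """Count misses per class and surface the two most frequent classes."""
    unknown = [m for m in misses if m not in MISS_CLASSES]
    if unknown:
        raise ValueError(f"unrecognized miss class: {unknown[0]!r}")
    counts = Counter(misses)
    top2 = [name for name, _ in counts.most_common(2)]
    return {"counts": dict(counts), "top2": top2}
```

Rejecting unknown class names keeps the buckets stable between runs, so counts stay comparable across scorecards.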
If rule expansion reaches a **meaningfully reviewable queue**, keep going with rules. Otherwise prototype an LLM-assisted extraction mode behind a flag.
Hard stop: if candidate yield is still under 10% after this point, stop rule tinkering and switch to architecture review (LLM-assisted OR narrower extraction scope).
Grow across p04/p05/p06. Include short ambiguous prompts, cross-project collision cases, expected project-state wins, expected project-memory wins, and 1-2 "should fail open / low confidence" cases.
Run harness on current code vs live Dalidou. Inspect failures (ranking, ingestion gap, project bleed, budget). Make at most ONE ranking/budget tweak if the harness clearly justifies it. Do not mix harness expansion and ranking changes in a single commit unless tightly coupled.
Success: harness still passes or improves after extractor work; any ranking tweak is justified by a concrete fixture delta.
Fail-early: if > 20-25% of harness fixtures regress after extractor changes, separate concerns before merging.
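The fail-early check above is simple arithmetic over two harness runs. A sketch, assuming each run is a mapping from fixture name to pass/fail and taking the lower bound (20%) as the trigger; both the data shape and the function names are hypothetical.

```python
REGRESSION_LIMIT = 0.20  # assumption: lower bound of the 20-25% range


def regression_rate(before, after):
    """Return (regressed fixture names, fraction of all fixtures that regressed).

    A fixture regresses when it passed in `before` and does not pass in `after`
    (a fixture missing from `after` counts as not passing).
    """
    regressed = sorted(
        name for name, passed in before.items()
        if passed and not after.get(name, False)
    )
    rate = len(regressed) / len(before) if before else 0.0
    return regressed, rate


def should_separate_concerns(before, after, limit=REGRESSION_LIMIT):
    """True when the regression rate exceeds the fail-early limit."""
    _, rate = regression_rate(before, after)
    return rate > limit
```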
One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-harness-expansion` for Day 6-7. Keeps extraction and retrieval judgments auditable.
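| ID | Finder | Severity | Location | Finding | Status | Owner | Opened | Resolved by |
|----|--------|----------|----------|---------|--------|-------|--------|-------------|
| R1 | Codex | P1 | deploy/hooks/capture_stop.py:76-85 | Live Claude capture still omits `extract`, so "loop closed both sides" remains overstated in practice even though the API supports it | fixed | Claude | 2026-04-11 | c67bec0 |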
| R3 | Claude | P2 | src/atocore/memory/extractor.py | Rule cues (`## Decision:`) never fire on conversational LLM text | declined | Claude | 2026-04-11 | see 2026-04-14 session log |
| R5 | Codex | P1 | src/atocore/interactions/service.py:157-174 | The deployed extraction path still calls only the rule extractor; the new LLM extractor is eval/script-only, so Day 4 "gate cleared" is true as a benchmark result but not as an operational extraction path | fixed | Claude | 2026-04-12 | c67bec0 |
| R6 | Codex | P1 | src/atocore/memory/extractor_llm.py:258-276 | LLM extraction accepts model-supplied `project` verbatim with no fallback to `interaction.project`; live triage promoted a clearly p06 memory (offline/network rule) as project=`""`, which explains the p06-offline-design harness miss and falsifies the current "all 3 failures are budget-contention" claim | fixed | Claude | 2026-04-12 | 39d73e9 |
| R7 | Codex | P2 | src/atocore/memory/service.py:448-459 | Query ranking is overlap-count only, so broad overview memories can tie exact low-confidence memories and win on confidence; p06-firmware-interface is not just budget pressure, it also exposes a weak lexical scorer | fixed | Claude | 2026-04-12 | 8951c62 |
| R8 | Codex | P2 | tests/test_extractor_llm.py:1-7 | LLM extractor tests stop at parser/failure contracts; there is no automated coverage for the script-only persistence/review path that produced the 16 promoted memories, including project-scope preservation | fixed | Claude | 2026-04-12 | 69c9717 |
| R9 | Codex | P2 | src/atocore/memory/extractor_llm.py:258-259 | The R6 fallback only repairs empty project output. A wrong non-empty model project still overrides the interaction's known scope, so project attribution is improved but not yet trust-preserving. | fixed | Claude | 2026-04-12 | e5e9a99 |
| R10 | Codex | P2 | docs/master-plan-status.md:31-33 | "Phase 8 - OpenClaw Integration" is fair as a baseline milestone, but not as a "primary" integration claim. `t420-openclaw/atocore.py` currently covers a narrow read-oriented subset (13 request shapes vs 32 API routes) plus fail-open health, while memory/interactions/admin write paths remain out of surface. | fixed | Claude | 2026-04-12 | (pending) |
| R11 | Codex | P2 | src/atocore/api/routes.py:773-845 | `POST /admin/extract-batch` still accepts `mode="llm"` inside the container and returns a successful 0-candidate result instead of surfacing that host-only LLM extraction is unavailable from this runtime. That is a misleading API contract for operators. | fixed | Claude | 2026-04-12 | (pending) |
| R12 | Codex | P2 | scripts/batch_llm_extract_live.py:39-190 | The host-side extractor duplicates the LLM system prompt and JSON parsing logic from `src/atocore/memory/extractor_llm.py`. It works today, but this is now a prompt/parser drift risk across the container and host implementations. | fixed | Claude | 2026-04-12 | (pending) |
| R13 | Codex | P2 | DEV-LEDGER.md:12 | The new `286 passing` test-count claim is not reproducibly auditable from the current audit environments: neither Dalidou nor the clean worktree has `pytest` available. The claim may be true in Claude's dev shell, but it remains unverified in this audit. | fixed | Claude | 2026-04-12 | (pending) |
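The project-attribution problem behind R6 and R9 comes down to a precedence rule. The sketch below follows the policy the session log records for the R9 fix (interaction scope wins when set; the model's project is only a fallback for unscoped interactions and must be registered). The function name and signature are hypothetical, not the code in `extractor_llm.py`.

```python
def resolve_project(interaction_project, model_project, registered_projects):
    """Pick the project for an extracted candidate.

    Precedence (assumed from the ledger's description of the R9 fix):
      1. the interaction's own project, when set
      2. the model-supplied project, only if it is a registered project
      3. "" (unscoped) otherwise
    """
    if interaction_project:
        return interaction_project
    if model_project and model_project in registered_projects:
        return model_project
    return ""
```

As the ledger itself notes, this only protects scope: it cannot tell whether the candidate's content is semantically right within the chosen project.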
- **2026-04-22** **Engineering V1 Completion Plan — Codex sign-off (third round)**. Codex's third-round audit closed the remaining five open questions with concrete resolutions, patched inline in `docs/plans/engineering-v1-completion-plan.md`: (1) F-7 row rewritten with ground truth — schema + preserve-original + test coverage already exist (`graduated_to_entity_id` at `database.py:143-146`, `graduated` status in memory service, promote hook at `service.py:354-356,389-451`, tests at `test_engineering_v1_phase5.py:67-90`); **real gaps** are missing direct `POST /memory/{id}/graduate` route and spec's `knowledge→Fact` mismatch (no `fact` entity type exists; reconcile to `parameter` or similar); V1-E 2 → **3–4 days**; (2) Q-5 determinism reframed — don't stabilize the call to `datetime.now()`, inject regenerated timestamp + checksum as renderer inputs, remove DB iteration ordering dependencies; V1-D scope updated; (3) `project` vs `project_id` — doc note only, no rename, resolved; (4) total estimate 16.5–17.5 → **17.5–19.5 focused days** with calendar buffer on top; (5) "Minions" must not be canonized in D-3 release notes — neutral wording ("queued background processing / async workers") only. **Agreement reached**: Claude + Codex + Antoine aligned. V1-0 is ready to start once the current pipeline soak window ends (~2026-04-26) and the 100-memory density target is hit. *Patched by:* Codex. *Signed off by:* Codex ("with those edits, I'd sign off on the five questions"). *Accepted by:* Antoine. *Executor (V1-0 onwards):* Claude.
- **2026-04-22** **Engineering V1 Completion Plan revised per Codex first-round review** — original six-phase order (queries → ingest → mirror → graduation → provenance → ops) rejected by Codex as backward: provenance-at-write (F-8) and conflict-detection hooks (F-5 minimal) must precede any phase that writes active entities. Revised to seven phases: V1-0 write-time invariants (F-8 + F-5 hooks + F-1 audit) as hard prerequisite, V1-A minimum query slice proving the model, V1-B ingest, V1-C full query catalog, V1-D mirror, V1-E graduation, V1-F full F-5 spec + ops + docs. Also softened "parallel with Now list" — real collision points listed explicitly; schedule shifted ~4 weeks to reflect that V1-0 cannot start during pipeline soak. Withdrew the "50–70% built" global framing in favor of the per-criterion gap table. Workspace sync note added: Codex's Playground workspace can't see the plan file; canonical dev tree is Windows `C:\Users\antoi\ATOCore`. Plan: `docs/plans/engineering-v1-completion-plan.md`. Awaiting Codex file-level audit once workspace syncs. *Proposed by:* Claude. *First-round review by:* Codex.
- **2026-04-22** gbrain-inspired "Phase 8 Minions + typed edges" plan **rejected as packaged** — wrong sequencing (leapfrogged `master-plan-status.md` Now list), wrong predicate set (6 vs V1's 17), wrong canonical boundary (edges-on-wikilinks instead of typed entities+relationships per `memory-vs-entities.md`). Mechanic (durable jobs + typed graph) deferred to V1 home. Record: `docs/decisions/2026-04-22-gbrain-plan-rejection.md`. *Proposed by:* Claude. *Reviewed/rejected by:* Codex. *Ratified by:* Antoine.
- **2026-04-12** Day 4 gate cleared: LLM-assisted extraction via `claude -p` (OAuth, no API key) is the path forward. Rule extractor stays as default for structural cues. *Proposed by:* Claude. *Ratified by:* Antoine.
- **2026-04-12** First live triage: 16 promoted, 35 rejected from 51 LLM-extracted candidates. 31% accept rate. Active memory count 20->36. *Executed by:* Claude. *Ratified by:* Antoine.
- **2026-04-12** No API keys allowed in AtoCore — LLM-assisted features use OAuth via `claude -p` or equivalent CLI-authenticated paths. *Proposed by:* Antoine.
- **2026-04-12** Multi-model extraction direction: extraction/triage should be model-agnostic, with Codex/Gemini/Ollama as second-pass reviewers for robustness. *Proposed by:* Antoine.
- **2026-04-11** Review findings live in `DEV-LEDGER.md` with Codex owning finding text and Claude updating status fields only. *Proposed by:* Codex. *Ratified by:* Antoine.
- **2026-04-11** Project memories land in the pack under `--- Project Memories ---` at 25% budget ratio, gated on canonical project hint. *Proposed by:* Claude.
- **2026-04-11** Extraction stays off the capture hot path. Batch / manual only. *Proposed by:* Antoine.
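The `claude -p` decisions above imply a CLI call rather than an SDK call. A minimal sketch of what that could look like with the standard library, assuming the prompt is passed as a single positional argument and nothing else; the helper names, the timeout, and the error handling are illustrative assumptions.

```python
import shutil
import subprocess


def cli_available(name: str = "claude") -> bool:
    """True when the named executable can be found on PATH."""
    return shutil.which(name) is not None


def build_args(prompt: str) -> list[str]:
    """Argument vector for a one-shot `claude -p` call.

    Passing a list (and never shell=True) means the prompt is not
    interpreted by a shell, whatever characters it contains.
    """
    return ["claude", "-p", prompt]


def run_extraction(prompt: str, timeout_s: float = 60.0) -> str:
    """Hypothetical wrapper: run the CLI and return its stdout."""
    if not cli_available():
        # A caller could map this to an operator-facing "unavailable" response.
        raise RuntimeError("claude CLI not on PATH")
    result = subprocess.run(
        build_args(prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout
```

Because the prompt travels as a process argument, it is visible in the process list while the command runs, which is worth weighing if the host is ever shared.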
- **2026-04-22 Codex (late night)** Third-round audit closed the remaining five open questions. Patched `docs/plans/engineering-v1-completion-plan.md` inline (no commit by Codex). **F-7 finding (P1):** graduation stack is partially built — `_graduation_prompt.py`, `scripts/graduate_memories.py`, `database.py:143-146` (`graduated_to_entity_id`), memory `graduated` status, promote-preserves-original at `service.py:354-356,389-451`, tests at `test_engineering_v1_phase5.py:67-90` all exist. Real gaps: no direct `POST /memory/{id}/graduate` route at `routes.py:756`; spec's `knowledge→Fact` doesn't match ontology (`service.py:16` has no `fact` type — reconcile to `parameter` or similar). V1-E estimate 2 → 3–4 days. **Q-5 finding (P2):** "stabilize timestamp" insufficient — renderer reads wall-clock in `_footer()` at `mirror.py:320`; the fix is to inject regenerated timestamp + checksum as renderer inputs + sort DB iteration + remove dict ordering deps. V1-D scope patched. **Remaining three (P3):** `project` stays as doc-note equivalence (no rename); total estimate 17.5–19.5 focused days; release notes must NOT canonize "Minions" — neutral "queued background processing / async workers" only. **Sign-off:** "with those edits, I'd sign off on the five questions. The only non-architectural uncertainty left in the plan is scheduling discipline against the current Now list; that does not block V1-0 once the soak window and memory-density gate clear." **Status:** Claude + Codex agreed. Plan frozen pending Antoine final accept and gate clearance. Claude to commit Codex's patches + push.
- **2026-04-22 Claude (late night)** Codex second-round review did the full file-level audit and came back with three P1/P2 findings, all with exact file:line refs. Verified each against current code before revising. (1) **F-1 not clean**: `Entity` dataclass at `service.py:67` and `entities` table schema are missing the `extractor_version` and `canonical_home` shared-header fields required by `engineering-v1-acceptance.md:45`; `project` field is the project identifier but not named `project_id` as spec writes (spec wording "fields equivalent to" allows the naming, but needs explicit doc note). V1-0 scope now includes adding both missing fields via additive `_apply_migrations` pattern. (2) **F-2 needed exact statuses, not guesses**: per-function audit gave ground truth — 9 of 20 v1-required queries done, 1 partial (Q-001 returns project-wide tree not subsystem-scoped expand=contains per `engineering-query-catalog.md:71`), 10 missing. V1-A scope shrank to Q-001 shape fix + Q-6 integration (most pillar queries already implemented); V1-C closes the 8 net-new queries + Q-020 to V1-D. (3) **F-5 misframed**: the generic `conflicts` + `conflict_members` schema is ALREADY spec-compliant at `database.py:190`; divergence is detector body at `conflicts.py:36` (per-type dispatch needs generalization) + route path (`/admin/conflicts/*` needs `/conflicts/*` alias). V1-F no longer includes a schema migration; detector generalization + route alignment only. Totals revised to 16.5–17.5 days, ~60 tests (previously 12–17 days / ~65 tests; V1-A and V1-F scopes both shrank after audit). Three of the eight open questions resolved. Remaining open: F-7 graduation depth, mirror determinism, `project` naming, velocity calibration, minions-as-V2 naming. No code changes this session — plan + ledger only. Next: commit + push revised plan, then await Antoine+Codex joint sign-off before V1-0 starts.
- **2026-04-22 Claude (night)** Codex first-round review of the V1 Completion Plan summary came back with four findings. Three substantive, one workspace-sync: (1) "50–70% built" too loose — replaced with per-criterion table, global framing withdrawn; (2) phase order backward — provenance-at-write (F-8) and conflict hooks (F-5 minimal) are depended upon by every later phase but were in V1-E; new V1-0 prerequisite phase inserted to establish write-time invariants, and V1-A shrunk to a minimum query slice (four pillars Q-001/Q-005/Q-006/Q-017 + Q-6 integration) rather than full catalog closure; (3) "parallel with Now list / disjoint surfaces" too strong — real collisions listed explicitly (V1-0 provenance + memory extractor write path, V1-E graduation + memory module, V1-F conflicts migration + memory promote); schedule shifted ~4 weeks, V1-0 cannot start during pipeline soak; (4) Codex's Playground workspace can't see the plan file or the `src/atocore/engineering/` code — added a Workspace note to the plan directing per-file audit at the Windows canonical dev tree (`C:\Users\antoi\ATOCore`) and noting the three visible file paths (`docs/plans/engineering-v1-completion-plan.md`, `docs/decisions/2026-04-22-gbrain-plan-rejection.md`, `DEV-LEDGER.md`). Revised plan estimate: 12–17 days across 7 phases (up from 11–14 / 6), ~65 tests added (up from ~50). V1-0 is a hard prerequisite; no later phase starts until it lands. Pending Antoine decision on workspace sync (commit+push vs paste-to-Codex) so Codex can do the file-level audit. No code changes this session.
- **2026-04-22 Claude (late eve)** After the rejection, read the four core V1 architecture docs end-to-end (`engineering-ontology-v1.md`, `engineering-query-catalog.md`, `memory-vs-entities.md`, `engineering-v1-acceptance.md`) plus the four supporting docs (`promotion-rules.md`, `conflict-model.md`, `human-mirror-rules.md`, `tool-handoff-boundaries.md`). Cross-referenced against current code in `src/atocore/engineering/`. **Key finding:** V1 is already 50–70% built — entity types (16, superset of V1's 12), all 18 V1 relationship types, 4-state lifecycle, CRUD + supersede + invalidate + PATCH, queries module with most killer-correctness queries (orphan_requirements, risky_decisions, unsupported_claims, impact_analysis, evidence_chain), conflicts module scaffolded, mirror scaffolded, graduation endpoint scaffolded. Recent commits e147ab2/b94f9df/081c058/069d155/b1a3dd0 are all V1 entity-layer work. Drafted `docs/plans/engineering-v1-completion-plan.md` reframing the work as **V1 completion, not V1 start**. Six sequential phases V1-A through V1-F, estimated 11–14 days, ~50 new tests (533 → ~580). Phases run in parallel with the Now list (pipeline soak + density + multi-model triage + p04-constraints) because surfaces are disjoint. Plan explicitly defers the minions/queue mechanic per acceptance-doc negative list. Pending Codex audit of the plan itself — especially the F-2 query gap list (Claude didn't read each query function end-to-end), F-5 conflicts schema divergence (per-type detectors vs spec's generic slot-keyed shape), and F-7 graduation depth. No code changes this session.
- **2026-04-22 Claude (eve)** gbrain review session. Antoine surfaced https://github.com/garrytan/gbrain for compare/contrast. Claude drafted a "Phase 8 Minions + typed edges" plan pairing a durable job queue with a 6-predicate edge upgrade over wikilinks. Codex reviewed and rejected as packaged: (1) sequencing leapfrogged the `master-plan-status.md` Now list (pipeline soak → 100+ memories → multi-model triage → p04-constraints fix); (2) 6 predicates vs V1's 17 across Structural/Intent/Validation/Provenance families — would have been schema debt on day one per `engineering-ontology-v1.md:112-137`; (3) "edges over wikilinks" bypassed the V1 canonical entity + promotion contract in `memory-vs-entities.md`. Claude verified each high finding against the cited files and concurred. The underlying mechanic (durable background jobs + typed relationship graph) is still a valid future direction, but its correct home is the Engineering V1 sprint under **Next** in `master-plan-status.md:179`, not a leapfrog phase. Decision record: `docs/decisions/2026-04-22-gbrain-plan-rejection.md`. No code changes this session. Next (pending Antoine ack): read the four V1 architecture docs end-to-end, then draft an Engineering V1 foundation plan that follows the existing contract, not a new external reference. Phase 8 (OpenClaw) name remains untouched — Claude's misuse of "Phase 8" in the rejected plan was a naming collision, not a renaming.
- **2026-04-22 Claude (pm)** Issue B (wiki redlinks) landed — last remaining P2 from Antoine's sprint plan. `_wikilink_transform(text, current_project)` in `src/atocore/engineering/wiki.py` replaces `[[Name]]` / `[[Name|Display]]` tokens (pre-markdown) with HTML anchors. Resolution order: same-project exact-name match → live `wikilink`; other-project match → live link with `(in project X)` scope indicator (`wikilink-cross`); no match → `redlink` pointing at `/wiki/new?name=<quoted>&project=<current>`. New route `GET /wiki/new` renders a pre-filled "create this entity" form that POSTs to `/v1/entities` via a minimal inline fetch() and redirects to the new entity's wiki page on success. Transform applied in `render_project` (over the mirror markdown) and `render_entity` (over the description body). CSS: dashed-underline accent for live wikilinks, red italic + dashed for redlinks. 12 new tests including the regression from the spec (entity A references `[[EntityB]]` → initial render has `class="redlink"`; after EntityB is created, re-render no longer has redlink and includes `/wiki/entities/{b.id}`). Tests 521 → 533. All 6 acceptance criteria from the sprint plan ("daily-usable") now green: retract/supersede, edit without cloning, cross-project has a home, visual evidence, wiki readable, AKC can capture reliably.
- **2026-04-22 Claude** PATCH `/entities/{id}` + Issue D (/v1/engineering/* aliases) landed. New `update_entity()` in `src/atocore/engineering/service.py` supports partial updates to description (replace), properties (shallow merge — `null` value deletes a key), confidence (0..1, 400 on bounds violation), source_refs (append + dedup). Writes an `updated` audit row with full before/after snapshots. Forbidden via this path: entity_type / project / name / status — those require supersede+create or the dedicated status endpoints, by design. New route `PATCH /entities/{id}` aliased under `/v1`. Issue D: all 10 `/engineering/*` query paths (decisions, systems, components/{id}/requirements, changes, gaps + sub-paths, impact, evidence) added to the `/v1` allowlist. 12 new PATCH tests (merge, null-delete, confidence bounds, source_refs dedup, 404, audit row, v1 alias). Tests 509 → 521. Next: commit + deploy, then Issue B (wiki redlinks) as the last remaining P2 per Antoine's sprint order.
- **2026-04-21 Claude (night)** Issue E (retraction path for active entities + memories) landed. Two new entity endpoints and two new memory endpoints, all aliased under `/v1`: `POST /entities/{id}/invalidate` (active→invalid, 200 idempotent on already-invalid, 409 if candidate/superseded, 404 if missing), `POST /entities/{id}/supersede` (active→superseded + auto-creates `supersedes` relationship from the new entity to the old one; rejects self-supersede and unknown superseded_by with 400), `POST /memory/{id}/invalidate`, `POST /memory/{id}/supersede`. `invalidate_memory`/`supersede_memory` in service.py now take a `reason` string that lands in the audit `note`. New service helper `invalidate_active_entity(id, reason)` returns `(ok, code)` where code is one of `invalidated | already_invalid | not_active | not_found` for a clean HTTP-status mapping. 15 new tests. Tests 494 → 509. Unblocks correction workflows — no more SQL required to retract mistakes.
- **2026-04-21 Claude (cleanup)** One-time SQL cleanup on live Dalidou: flipped 8 `status='active' → 'invalid'` rows in `entities` (CGH, tower, "interferometer mirror tower", steel, "steel (likely)" in p05-interferometer + 3 remaining `AKC-E2E-Test-*` rows that were still active). Each update paired with a `memory_audit` row (action=`invalidated`, actor=`sql-cleanup`, note references Issue E pending). Executed inside the `atocore` container via `docker exec` since `/srv/storage/atocore/data/db/atocore.db` is root-owned and the service holds write perms. Verification: `GET /entities?project=p05-interferometer&scope_only=true` now 21 active, zero pollution. Issue E (public `POST /v1/entities/{id}/invalidate` for active→invalid) remains open — this cleanup should not be needed again once E ships.
- **2026-04-21 Claude (evening)** Issue F (visual evidence) landed. New `src/atocore/assets/` module provides hash-dedup binary storage (`<assets_dir>/<hash[:2]>/<hash>.<ext>`) with on-demand JPEG thumbnails cached under `.thumbnails/<size>/`. New `assets` table (hash_sha256 unique, mime_type, size, width/height, source_refs, status). `artifact` added to `ENTITY_TYPES`; no schema change needed on entities (`properties` stays free-form JSON carrying `kind`/`asset_id`/`caption`/`capture_context`). `EVIDENCED_BY` already in the relationship enum — no change. New API: `POST /assets` (multipart, 20 MB cap, MIME allowlist: png/jpeg/webp/gif/pdf/step/iges), `GET /assets/{id}` (streams original), `GET /assets/{id}/thumbnail?size=N` (Pillow, 16-2048 px clamp), `GET /assets/{id}/meta`, `GET /admin/assets/orphans`, `DELETE /assets/{id}` (409 if referenced), `GET /entities/{id}/evidence` (returns EVIDENCED_BY artifacts with asset metadata resolved). All aliased under `/v1`. Wiki: artifact entity pages render full image + caption + capture_context; other entity pages render a "Visual evidence" strip of EVIDENCED_BY thumbnails linking to full-res + artifact detail page. PDFs render as a link; other artifact kinds render as labeled chips. Added `python-multipart` + `Pillow>=10.0.0` to deps; docker-compose gets an `${ATOCORE_ASSETS_DIR}` bind mount; Dalidou `.env` updated with `ATOCORE_ASSETS_DIR=/srv/storage/atocore/data/assets`. 16 new tests (hash dedup, size cap, mime allowlist, thumbnail cache, orphan detection, invalidate gating, multipart upload, evidence API, v1 aliases, wiki rendering). Tests 478 → 494.
- **2026-04-21 Claude (pm)** Issue C (inbox + cross-project entities) landed. `inbox` is a reserved pseudo-project: auto-exists, cannot be registered/updated/aliased (enforced in `src/atocore/projects/registry.py` via `is_reserved_project` + `register_project`/`update_project` guards). `project=""` remains the cross-project/global bucket for facts that apply to every project. `resolve_project_name("inbox")` is stable and does not hit the registry. `get_entities` now scopes: `project=""` → only globals; `project="inbox"` → only inbox; `project="<real>"` default → that project plus globals; `scope_only=true` → strict. `POST /entities` accepts `project=null` as equivalent to `""`. `POST /entities/{id}/promote` accepts `{target_project}` to retarget an inbox/global lead into a real project on promote (new "retargeted" audit action). Wiki homepage shows a new "📥 Inbox & Global" section with live counts, linking to scoped `/entities` lists. 15 new tests in `test_inbox_crossproject.py` cover reserved-name enforcement, scoping rules, API shape, and promote retargeting. Tests 463 → 478. Pending: commit, push, deploy. Issue B (wiki redlinks) deferred per AKC thread — P1 cosmetic, not a blocker.
- **2026-04-21 Claude** Issue A (API versioning) landed on `main` working tree (not yet committed/deployed). `src/atocore/main.py` now mounts a second `/v1` router that re-registers an explicit allowlist of public handlers (`_V1_PUBLIC_PATHS`) against the same endpoint functions — entities, relationships, ingest, context/build, query, projects, memory, interactions, project/state, health, sources, stats, and their sub-paths. Unversioned paths are untouched; OpenClaw and hooks keep working. Added `tests/test_v1_aliases.py` (5 tests: health parity, projects parity, entities reachable, v1 paths present in OpenAPI, unversioned paths still present in OpenAPI) and an "API versioning" section in the README documenting the rule (new endpoints at latest prefix, breaking changes bump prefix, unversioned retained for internal callers). Tests 459 → 463. Next: commit + deploy, then relay to the AKC thread so Phase 2 can code against `/v1`. Issues B (wiki redlinks) and C (inbox/cross-project) remain open, unstarted.
- **2026-04-15 Claude (pm)** Closed the last harness failure honestly. **p06-tailscale fixed: 18/18 PASS.** Root-caused: not a retrieval bug — the p06 `ARCHITECTURE.md` Overview chunk legitimately mentions "the GigaBIT M1 telescope mirror" because the Polisher Suite is built *for* that mirror. All four retrieved sources for the tailscale prompt were genuinely p06/shared paths; zero actual p04 chunks leaked. The fixture's `expect_absent: GigaBIT` was catching semantic overlap, not retrieval bleed. Narrowed it to `expect_absent: "[Source: p04-gigabit/"` — a source-path check that tests the real invariant (no p04 source chunks in p06 context). Other p06 fixtures still use the word-blacklist form; they pass today because their more-specific prompts don't pull the ARCHITECTURE.md Overview, so I left them alone rather than churn fixtures that aren't failing. Did NOT change retrieval/ranking — no code change, fixture-only fix. Tests unchanged at 299.
- **2026-04-15 Claude** Deploy + doc debt sweep. Deployed `c2e7064` to Dalidou (build_time 2026-04-15T15:08:51Z, build_sha matches, /health ok) so R11/R12 are now live, not just on main. **R11 verified on live**: `POST /admin/extract-batch {"mode":"llm"}` against http://127.0.0.1:8100 returns HTTP 503 with the operator-facing "claude CLI not on PATH, run host-side script or use mode=rule" message — exactly the post-fix contract. **R13 closed (fixed)**: added a reproduction recipe to Quick Commands (`pip install -r requirements-dev.txt && pytest --collect-only -q && pytest -q`) and re-cited `test_count: 299` against a fresh local collection on 2026-04-15, so the claim is now auditable from any clean checkout — Codex's audit worktree just needs `pip install -r requirements-dev.txt`. **R10 closed (fixed)**: rewrote the `docs/master-plan-status.md` OpenClaw section to explicitly disclaim "primary integration" and report the current narrow surface: 14 client request shapes against ~44 server routes, predominantly read + `/project/state` + `/ingest/sources`, with memory/interactions/admin/entities/triage/extraction writes correctly out of scope. Open findings now: none blocking. Next natural move: the last harness failure `p06-tailscale` (chunk bleed).
- **2026-04-14 Claude (pm)** Closed R11+R12, declined R3. **R11 (fixed):** `POST /admin/extract-batch` with `mode="llm"` now returns 503 when the `claude` CLI is not on PATH, with a message pointing at the host-side script. Previously it silently returned a success-0 payload, masking host-vs-container truth. 2 new tests in `test_extraction_pipeline.py` cover the 503 path and the rule-mode-still-works path. **R12 (fixed):** extracted shared `SYSTEM_PROMPT` + `parse_llm_json_array` + `normalize_candidate_item` + `build_user_message` into stdlib-only `src/atocore/memory/_llm_prompt.py`. Both `src/atocore/memory/extractor_llm.py` (container) and `scripts/batch_llm_extract_live.py` (host) now import from it. The host script uses `sys.path` to reach the stdlib-only module without needing the full atocore package. Project-attribution policy stays path-specific (container uses registry-check; host defers to server). **R3 (declined):** rule cues not firing on conversational LLM text is by design now — the LLM extractor (llm-0.4.0) is the production path for conversational content as of the Day 4 gate (2026-04-12). Expanding rules to match conversational prose risks the FP blowup Day 2 already showed. Rule extractor stays narrow for structural PKM text. Tests 297 → 299. Live `/health` still `58ea21d`; this session's changes need deploy.
- **2026-04-12 Codex (branch `codex/openclaw-capture-plugin`)** added a minimal external OpenClaw plugin at `openclaw-plugins/atocore-capture/` that mirrors Claude Code capture semantics: user-triggered assistant turns are POSTed to AtoCore `/interactions` with `client="openclaw"` and `reinforce=true`, fail-open, no extraction in-path. For live verification, temporarily added the local plugin load path to OpenClaw config and restarted the gateway so the plugin can load. Branch truth is ready; end-to-end verification still needs one fresh post-restart OpenClaw user turn to confirm new `client=openclaw` interactions appear on Dalidou.
- **2026-04-12 Claude** Batch 3 (R9 fix): `144dbbd..e5e9a99`. Trust hierarchy for project attribution — interaction scope always wins when set, model project only used for unscoped interactions + registered check. 7 case tests (A-G) cover every combination. Harness 17/18 (no regression). Tests 286->290. Before: wrong registered project could silently override interaction scope. After: interaction.project is the strongest signal; model project is only a fallback for unscoped captures. Not yet guaranteed: nothing prevents the *same* project's model output from being semantically wrong within that project. R9 marked fixed.
- **2026-04-12 Codex (audit branch `codex/audit-batch2`)** audited `69c9717..origin/main` against the current branch tip and live Dalidou. Verified: live build is `8951c62`, retrieval harness improved to **17/18 PASS**, candidate queue is now empty, active memories rose to **41**, and `python3 scripts/auto_triage.py --dry-run --base-url http://127.0.0.1:8100` runs cleanly on Dalidou but only exercised the empty-queue path. Updated R7 to **fixed** (`8951c62`) and R8 to **fixed** (`69c9717`). Kept R9 **open** because project trust-preservation still allows a wrong non-empty registered project from the model to override the interaction scope. Added R13 because the new `286 passing` claim could not be independently reproduced in this audit: `pytest` is absent on both Dalidou and the clean audit worktree. Also corrected stale Orientation fields (live SHA, main tip, harness, active/candidate memory counts).
- **2026-04-12 Codex (audit branch `codex/audit-2026-04-12-extraction`)** audited `54d84b5..ac7f77d` with live Dalidou verification. Confirmed the host-side LLM extraction pipeline is operational: nightly cron points at `deploy/dalidou/cron-backup.sh`, Step 4 calls `deploy/dalidou/batch-extract.sh`, the batch script exists/executable on Dalidou, and a manual host-side run produced candidates successfully. Updated R1 and R5 to **fixed** (`c67bec0`) because extraction now runs unattended off-container. Live state during audit: build `39d73e9`, active memories **36**, candidate queue **29** (16 existing + 13 added by manual verification run), and `last_extract_batch_run` populated in AtoCore project state. Added R11-R12 for the misleading container `mode=llm` no-op and host/container prompt-parser duplication. Security note: CLI positional prompt/response text is visible in process args while `claude -p` runs; acceptable on a single-user home host, but worth remembering if Dalidou's trust boundary changes.
- **2026-04-12 Codex (audit branch `codex/audit-2026-04-12-final`)** audited `c5bad99..e2895b5` against origin/main, live Dalidou, and the OpenClaw client script. Live state checked: build `39d73e9`, harness reproducible at **16/18 PASS**, active memories **36**, and `t420-openclaw/atocore.py health` fails open correctly with `fail_open=true`. Spot-checks of Wave 2 project-state entries matched their cited vault docs. Updated R5-R8 statuses to match reality (R6 fixed by `39d73e9`), added R9-R10, and corrected Orientation `main_tip` to `e2895b5` because the ledger had drifted behind origin/main. Note: live Dalidou is still on `39d73e9`, so branch-truth and deploy-truth are not the same yet.
- **2026-04-12 Codex (audit branch `codex/audit-2026-04-12`)** audited `c5bad99..146f2e4` against code, live Dalidou, and the 36 active memories. Confirmed: `claude -p` invocation is not shell-injection-prone (`subprocess.run(args)` with no shell), off-host backup wiring matches the ledger, and R1 remains unresolved in practice. Added R5-R8. Corrected Orientation `main_tip` (`146f2e4`, not `5c69f77`) and tightened the harness note: p06-firmware-interface is a ranking-tie issue, p06-offline-design comes from a project-scope miss in live triage, and p06-tailscale is retrieved-chunk bleed rather than memory-band budget contention.
- **2026-04-12 Claude** `330ecfb..06792d8` (merged eval-loop branch + triage). Day 1-4 of the mini-phase completed in one session. Day 2 baseline: rule extractor 0% recall, 5 distinct miss classes. Day 4 gate cleared: LLM extractor (claude -p haiku, OAuth) hit 100% recall, 2.55 yield/interaction. Refactored from anthropic SDK to subprocess after "no API key" rule. First live triage: 51 candidates -> 16 promoted, 35 rejected. Active memories 20->36. p06-polisher went from 2 to 16 memories (firmware/telemetry architecture set). POST /memory now accepts status field. Test count 264->278.
- **2026-04-11 Claude** `claude/extractor-eval-loop @ 7d8d599` — Day 1+2 of the mini-phase. Froze a 64-interaction snapshot (`scripts/eval_data/interactions_snapshot_2026-04-11.json`) and labeled 20 by length-stratified random sample (5 positive, 15 zero; 7 total expected candidates). Built `scripts/extractor_eval.py` as a file-based eval runner. **Day 2 baseline: rule extractor hit 0% yield / 0% recall / 0% precision on the labeled set; 5 false negatives across 5 distinct miss classes (recommendation_prose, architectural_change_summary, spec_update_announcement, layered_recommendation, alignment_assertion).** This is the Day 4 hard-stop signal arriving two days early — a single rule expansion cannot close a 5-way miss, and widening rules blindly will collapse precision. The Day 4 decision gate is escalated to Antoine for ratification before Day 3 touches any extractor code. No extractor code on main has changed.
- **2026-04-11 Codex (ledger audit)** fixed stale `main_tip`, retargeted R1 from the API surface to the live Claude Stop hook, and formalized the review write protocol so Claude can consume findings without rewriting them.
- **2026-04-11 Claude** `b3253f3..59331e5` (1 commit). Wired the DEV-LEDGER, added session protocol to AGENTS.md, created project-local CLAUDE.md, deleted stale `codex/port-atocore-ops-client` remote branch. No code changes, no redeploy needed.
- **2026-04-11 Codex (async review)** identified 2 P1s against a stale checkout. R1 was fair (extraction not automated), R2 was outdated (project memories already landed on main). Delivered the 8-day execution plan now in Active Plan.
- **2026-04-06 Antoine** created `codex/atocore-integration-pass` with the `t420-openclaw/` workspace (merged 2026-04-11).
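
Illustration for the 2026-04-12 Batch 3 (R9 fix) entry above: a minimal sketch of the trust hierarchy as that entry describes it, not the actual AtoCore code. The function name `resolve_project`, its signature, the project ids, and the choice to return `None` when nothing can be attributed are all assumptions made for this example. The sketch also does not validate the interaction scope itself, since the entry does not say whether that check exists.

```python
from typing import Iterable, Optional


def resolve_project(
    interaction_project: Optional[str],
    model_project: Optional[str],
    registered: Iterable[str],
) -> Optional[str]:
    """Pick the project a candidate is attributed to (hypothetical helper).

    Rule 1: a non-empty interaction scope always wins.
    Rule 2: the model's project is only a fallback for unscoped
            interactions, and only if it is a registered project.
    Otherwise the candidate stays unattributed (None is an assumption).
    """
    if interaction_project:
        return interaction_project
    if model_project and model_project in set(registered):
        return model_project
    return None


# Hypothetical registry for the example only.
REGISTERED = {"p04", "p05", "p06"}

# Scoped capture: the model's (wrong but registered) project cannot override it.
print(resolve_project("p06", "p04", REGISTERED))
# Unscoped capture: a registered model project is accepted as a fallback.
print(resolve_project(None, "p05", REGISTERED))
# Unscoped capture with an unregistered model project: left unattributed.
print(resolve_project("", "p99", REGISTERED))
```

The first call is the pre-fix failure mode from the entry: a wrong registered project from the model no longer overrides the interaction scope.
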
## Working Rules
- Claude builds; Codex audits. No parallel work on the same files.