From 9ab5b3c9d847df69036743eae3fdbff1fc6b7786 Mon Sep 17 00:00:00 2001
From: Anto01
Date: Wed, 22 Apr 2026 14:24:43 -0400
Subject: [PATCH] =?UTF-8?q?docs(planning):=20V1=20Completion=20Plan=20?=
 =?UTF-8?q?=E2=80=94=20Codex=20sign-off=20(third=20round)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Codex's third-round audit closed the remaining five open questions
with concrete file:line resolutions, patched inline in the plan:

- F-7 (P1): graduation stack is partially built — graduated_to_entity_id
  at database.py:143-146, graduated memory status, promote preserves
  original at service.py:354-356, tests at
  test_engineering_v1_phase5.py. Gaps: missing direct
  POST /memory/{id}/graduate route; spec's knowledge -> Fact mismatches
  ontology (no fact type). Reconcile to parameter or similar.
  V1-E 2 days -> 3-4 days.
- Q-5 / V1-D (P2): renderer reads wall-clock in _footer at
  mirror.py:320. Fix is injecting regenerated timestamp + checksum as
  renderer inputs, sorting DB iteration, removing dict ordering deps.
  Render code must not call wall-clock directly.
- project vs project_id (P3): doc note only, no storage rename.
- Total estimate: 17.5-19.5 focused days (calendar buffer on top).
- Release notes must NOT canonize "Minions" as a V2 name. Use neutral
  "queued background processing / async workers" wording.

Sign-off from Codex:
"with those edits, I'd sign off on the five questions. The only
non-architectural uncertainty left in the plan is scheduling
discipline against the current Now list; that does not block V1-0
once the soak window and memory-density gate clear."

Plan frozen. V1-0 starts after pipeline soak (~2026-04-26) and the
100-active-memory density gate clear.

Co-Authored-By: Codex
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 DEV-LEDGER.md                                |  3 +
 docs/plans/engineering-v1-completion-plan.md | 87 +++++++++++---------
 2 files changed, 51 insertions(+), 39 deletions(-)

diff --git a/DEV-LEDGER.md b/DEV-LEDGER.md
index 8861fd7..329f2d4 100644
--- a/DEV-LEDGER.md
+++ b/DEV-LEDGER.md
@@ -146,6 +146,7 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha

 ## Recent Decisions

+- **2026-04-22** **Engineering V1 Completion Plan — Codex sign-off (third round)**. Codex's third-round audit closed the remaining five open questions with concrete resolutions, patched inline in `docs/plans/engineering-v1-completion-plan.md`: (1) F-7 row rewritten with ground truth — schema + preserve-original + test coverage already exist (`graduated_to_entity_id` at `database.py:143-146`, `graduated` status in memory service, promote hook at `service.py:354-356,389-451`, tests at `test_engineering_v1_phase5.py:67-90`); **real gaps** are missing direct `POST /memory/{id}/graduate` route and spec's `knowledge→Fact` mismatch (no `fact` entity type exists; reconcile to `parameter` or similar); V1-E 2 → **3–4 days**; (2) Q-5 determinism reframed — don't stabilize the call to `datetime.now()`, inject regenerated timestamp + checksum as renderer inputs, remove DB iteration ordering dependencies; V1-D scope updated; (3) `project` vs `project_id` — doc note only, no rename, resolved; (4) total estimate 16.5–17.5 → **17.5–19.5 focused days** with calendar buffer on top; (5) "Minions" must not be canonized in D-3 release notes — neutral wording ("queued background processing / async workers") only. **Agreement reached**: Claude + Codex + Antoine aligned. V1-0 is ready to start once the current pipeline soak window ends (~2026-04-26) and the 100-memory density target is hit. *Patched by:* Codex. *Signed off by:* Codex ("with those edits, I'd sign off on the five questions"). *Accepted by:* Antoine. *Executor (V1-0 onwards):* Claude.
 - **2026-04-22** **Engineering V1 Completion Plan revised per Codex second-round file-level audit** — three findings folded in, all with exact file:line refs from Codex: (1) F-1 downgraded from ✅ to 🟡 — `extractor_version` and `canonical_home` missing from `Entity` dataclass and `entities` table per `engineering-v1-acceptance.md:45`; V1-0 scope now adds both fields via additive migration + doc note that `project` IS `project_id` per "fields equivalent to" spec wording; (2) F-2 replaced with ground-truth per-query status: 9 of 20 v1-required queries done (Q-004/Q-005/Q-006/Q-008/Q-009/Q-011/Q-013/Q-016/Q-017), 1 partial (Q-001 needs subsystem-scoped variant), 10 missing (Q-002/003/007/010/012/014/018/019/020); V1-A scope shrank to Q-001 shape fix + Q-6 integration (pillar queries already implemented); V1-C closes the 8 remaining new queries + Q-020 deferred to V1-D; (3) F-5 reframed — generic `conflicts` + `conflict_members` schema already present at `database.py:190`, no migration needed; divergence is detector body (per-type dispatch needs generalization) + routes (`/admin/conflicts/*` needs `/conflicts/*` alias). Total revised to 16.5–17.5 days, ~60 tests. Plan: `docs/plans/engineering-v1-completion-plan.md` at commit `ce3a878` (Codex pulled clean). Three of Codex's eight open questions now answered; remaining: F-7 graduation depth, mirror determinism, `project` rename question, velocity calibration, minions naming. *Proposed by:* Claude. *Reviewed by:* Codex (two rounds).
 - **2026-04-22** **Engineering V1 Completion Plan revised per Codex first-round review** — original six-phase order (queries → ingest → mirror → graduation → provenance → ops) rejected by Codex as backward: provenance-at-write (F-8) and conflict-detection hooks (F-5 minimal) must precede any phase that writes active entities. Revised to seven phases: V1-0 write-time invariants (F-8 + F-5 hooks + F-1 audit) as hard prerequisite, V1-A minimum query slice proving the model, V1-B ingest, V1-C full query catalog, V1-D mirror, V1-E graduation, V1-F full F-5 spec + ops + docs. Also softened "parallel with Now list" — real collision points listed explicitly; schedule shifted ~4 weeks to reflect that V1-0 cannot start during pipeline soak. Withdrew the "50–70% built" global framing in favor of the per-criterion gap table. Workspace sync note added: Codex's Playground workspace can't see the plan file; canonical dev tree is Windows `C:\Users\antoi\ATOCore`. Plan: `docs/plans/engineering-v1-completion-plan.md`. Awaiting Codex file-level audit once workspace syncs. *Proposed by:* Claude. *First-round review by:* Codex.
 - **2026-04-22** gbrain-inspired "Phase 8 Minions + typed edges" plan **rejected as packaged** — wrong sequencing (leapfrogged `master-plan-status.md` Now list), wrong predicate set (6 vs V1's 17), wrong canonical boundary (edges-on-wikilinks instead of typed entities+relationships per `memory-vs-entities.md`). Mechanic (durable jobs + typed graph) deferred to V1 home. Record: `docs/decisions/2026-04-22-gbrain-plan-rejection.md`. *Proposed by:* Claude. *Reviewed/rejected by:* Codex. *Ratified by:* Antoine.
@@ -163,6 +164,8 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha

 ## Session Log

+- **2026-04-22 Codex (late night)** Third-round audit closed the remaining five open questions. Patched `docs/plans/engineering-v1-completion-plan.md` inline (no commit by Codex). **F-7 finding (P1):** graduation stack is partially built — `_graduation_prompt.py`, `scripts/graduate_memories.py`, `database.py:143-146` (`graduated_to_entity_id`), memory `graduated` status, promote-preserves-original at `service.py:354-356,389-451`, tests at `test_engineering_v1_phase5.py:67-90` all exist. Real gaps: no direct `POST /memory/{id}/graduate` route at `routes.py:756`; spec's `knowledge→Fact` doesn't match ontology (`service.py:16` has no `fact` type — reconcile to `parameter` or similar). V1-E estimate 2 → 3–4 days. **Q-5 finding (P2):** "stabilize timestamp" insufficient — renderer reads wall-clock in `_footer()` at `mirror.py:320`; fix is inject regenerated timestamp + checksum as renderer inputs + sort DB iteration + remove dict ordering deps. V1-D scope patched. **Remaining three (P3):** `project` stays as doc-note equivalence (no rename); total estimate 17.5–19.5 focused days; release notes must NOT canonize "Minions" — neutral "queued background processing / async workers" only. **Sign-off:** "with those edits, I'd sign off on the five questions. The only non-architectural uncertainty left in the plan is scheduling discipline against the current Now list; that does not block V1-0 once the soak window and memory-density gate clear." **Status:** Claude + Codex agreed. Plan frozen pending Antoine final accept and gate clearance. Claude to commit Codex's patches + push.
+
 - **2026-04-22 Claude (late night)** Codex second-round review did the full file-level audit and came back with three P1/P2 findings, all with exact file:line refs. Verified each against current code before revising. (1) **F-1 not clean**: `Entity` dataclass at `service.py:67` and `entities` table schema are missing the `extractor_version` and `canonical_home` shared-header fields required by `engineering-v1-acceptance.md:45`; `project` field is the project identifier but not named `project_id` as spec writes (spec wording "fields equivalent to" allows the naming, but needs explicit doc note). V1-0 scope now includes adding both missing fields via additive `_apply_migrations` pattern. (2) **F-2 needed exact statuses, not guesses**: per-function audit gave ground truth — 9 of 20 v1-required queries done, 1 partial (Q-001 returns project-wide tree not subsystem-scoped expand=contains per `engineering-query-catalog.md:71`), 10 missing. V1-A scope shrank to Q-001 shape fix + Q-6 integration (most pillar queries already implemented); V1-C closes the 8 net-new queries + Q-020 to V1-D. (3) **F-5 misframed**: the generic `conflicts` + `conflict_members` schema is ALREADY spec-compliant at `database.py:190`; divergence is detector body at `conflicts.py:36` (per-type dispatch needs generalization) + route path (`/admin/conflicts/*` needs `/conflicts/*` alias). V1-F no longer includes a schema migration; detector generalization + route alignment only. Totals revised to 16.5–17.5 days, ~60 tests (down from 12–17 / 65 because V1-A and V1-F scopes both shrank after audit). Three of the eight open questions resolved. Remaining open: F-7 graduation depth, mirror determinism, `project` naming, velocity calibration, minions-as-V2 naming. No code changes this session — plan + ledger only. Next: commit + push revised plan, then await Antoine+Codex joint sign-off before V1-0 starts.

 - **2026-04-22 Claude (night)** Codex first-round review of the V1 Completion Plan summary came back with four findings. Three substantive, one workspace-sync: (1) "50–70% built" too loose — replaced with per-criterion table, global framing withdrawn; (2) phase order backward — provenance-at-write (F-8) and conflict hooks (F-5 minimal) depend-upon by every later phase but were in V1-E; new V1-0 prerequisite phase inserted to establish write-time invariants, and V1-A shrunk to a minimum query slice (four pillars Q-001/Q-005/Q-006/Q-017 + Q-6 integration) rather than full catalog closure; (3) "parallel with Now list / disjoint surfaces" too strong — real collisions listed explicitly (V1-0 provenance + memory extractor write path, V1-E graduation + memory module, V1-F conflicts migration + memory promote); schedule shifted ~4 weeks, V1-0 cannot start during pipeline soak; (4) Codex's Playground workspace can't see the plan file or the `src/atocore/engineering/` code — added a Workspace note to the plan directing per-file audit at the Windows canonical dev tree (`C:\Users\antoi\ATOCore`) and noting the three visible file paths (`docs/plans/engineering-v1-completion-plan.md`, `docs/decisions/2026-04-22-gbrain-plan-rejection.md`, `DEV-LEDGER.md`). Revised plan estimate: 12–17 days across 7 phases (up from 11–14 / 6), ~65 tests added (up from ~50). V1-0 is a hard prerequisite; no later phase starts until it lands. Pending Antoine decision on workspace sync (commit+push vs paste-to-Codex) so Codex can do the file-level audit. No code changes this session.
diff --git a/docs/plans/engineering-v1-completion-plan.md b/docs/plans/engineering-v1-completion-plan.md
index ed93fff..5991608 100644
--- a/docs/plans/engineering-v1-completion-plan.md
+++ b/docs/plans/engineering-v1-completion-plan.md
@@ -83,7 +83,7 @@ capability exists but does not yet match spec shape or coverage.
 | F-4 | Candidate review queue end-to-end (list/promote/reject/edit) | 🟡 partial for entities | Memory side shipped in Phase 9 Commit C. Entity side has `promote_entity`, `supersede_entity`, `invalidate_active_entity` but reject path and editable-before-promote may not match spec shape. Need to verify `GET /entities?status=candidate` returns spec shape |
 | F-5 | Conflict detector fires synchronously; `POST /conflicts/{id}/resolve` + dismiss | 🟡 partial (per Codex 2026-04-22 audit — schema present, detector+routes divergent) | **Schema is already spec-shaped**: `database.py:190` defines the generic `conflicts` + `conflict_members` tables per `conflict-model.md`; `conflicts.py:154` persists through them. **Divergences are in detection and API, not schema**: (1) `conflicts.py:36` dispatches per-type detectors only (`_check_component_conflicts`, `_check_requirement_conflicts`) — needs generalization to slot-key-driven detection; (2) routes live at `/admin/conflicts/*`, spec says `/conflicts/*` — needs alias + deprecation. **No schema migration needed** |
 | F-6 | Mirror: `/mirror/{project}/overview`, `/decisions`, `/subsystems/{id}`, `/regenerate`; files under `/srv/storage/atocore/data/mirror/`; disputed + curated markers; deterministic output | 🟡 partial | `mirror.py` has `generate_project_overview` with header/state/system/decisions/requirements/materials/vendors/memories/footer sections. API at `/projects/{project_name}/mirror` and `.html`. **Gaps**: no separate `/mirror/{project}/decisions` or `/mirror/{project}/subsystems/{id}` routes, no `POST /regenerate` endpoint, no debounced-async-on-write, no daily refresh, no `⚠ disputed` markers wired to conflicts, no `(curated)` override annotations verified, no golden-file test for determinism |
-| F-7 | Memory→entity graduation: `POST /memory/{id}/graduate` + `graduated` status + forward pointer + original preserved | 🟡 partial | `_graduation_prompt.py` exists; `api_request_graduation` + `api_graduation_status` + `api_graduation_stats` routes exist (routes.py:1573, 1607, 2065). Need to verify full flow against F-7 spec — original preserved? `graduated` status row added? forward pointer column present? |
+| F-7 | Memory→entity graduation: `POST /memory/{id}/graduate` + `graduated` status + forward pointer + original preserved | 🟡 partial (per Codex 2026-04-22 third-round audit) | `_graduation_prompt.py` exists; `scripts/graduate_memories.py` creates entity candidates from active memories; `database.py:143-146` adds `graduated_to_entity_id`; `memory.service` already has a `graduated` status; `service.py:354-356,389-451` preserves the original memory and marks it `graduated` with a forward pointer on entity promote; `tests/test_engineering_v1_phase5.py:67-90` covers that flow. **Gaps vs spec**: no direct `POST /memory/{id}/graduate` route yet (current surface is batch/admin-driven via `/admin/graduation/request`); no explicit acceptance tests yet for `adaptation→decision` and `project→requirement`; spec wording `knowledge→Fact` does not match the current ontology (there is no `fact` entity type in `service.py` / `_graduation_prompt.py`) and should be reconciled to an actual V1 type such as `parameter` or another ontology-defined entity. |
 | F-8 | Every active entity has `source_refs`; Q-017 returns ≥1 row for every active entity | 🟡 partial | `Entity.source_refs` field exists; Q-017 (`evidence_chain`) exists. **Gap**: is provenance enforced at write time (not NULL), or just encouraged? Per spec it must be mandatory |

 ### Quality (Q-1 through Q-6)
@@ -94,7 +94,7 @@ capability exists but does not yet match spec shape or coverage.
 | Q-2 | Each F criterion has happy-path + error-path test, <10s each, <30s total | 🟡 partial | 16 + 15 + 15 + 12 = 58 tests in engineering/queries/v1-phase5/patch files. Need to verify coverage of each F criterion one-for-one |
 | Q-3 | Conflict invariants enforced by tests (contradictory imports produce conflict, can't promote both, flag-never-block) | 🟡 partial | Tests likely exist in `test_engineering_v1_phase5.py` — verify explicit coverage of the three invariants |
 | Q-4 | Trust hierarchy enforced by tests (candidates never in context, active-only reinforcement, no auto-project-state writes) | 🟡 partial | Phase 9 Commit B covered the memory side; verify entity side has equivalent tests |
-| Q-5 | Mirror has golden-file test, deterministic output | ❌ missing | No golden file seen; mirror output includes `now` timestamp (line 326) which is non-deterministic — would fail Q-5 as written |
+| Q-5 | Mirror has golden-file test, deterministic output | ❌ missing | No golden file seen; mirror output reads wall-clock time inside `_footer()` (`mirror.py:320-327`). Determinism should come from injecting the regenerated timestamp/checksum as inputs to the renderer and pinning them in the golden-file test, not from calling `datetime.now()` inside render code |
 | Q-6 | Killer correctness queries pass against seeded real-ish data (5 seed cases per Q-006/Q-009/Q-011) | ❌ likely missing | No fixture file named for this seen. The three queries exist but there's no evidence of the single integration test described in Q-6 |

 ### Operational (O-1 through O-5)
@@ -296,9 +296,10 @@ They all sit on top of the same entity store and V1-0 invariants.
   `/mirror/{project}/decisions`, `/mirror/{project}/subsystems/{subsystem}`.
 - Add `POST /mirror/{project}/regenerate`.
 - Move generated files to `/srv/storage/atocore/data/mirror/{project}/`.
-- **Deterministic output:** stabilize the `now` timestamp (input
-  parameter pinned by golden tests), sort every iteration, remove
-  `dict` ordering dependencies.
+- **Deterministic output:** inject regenerated timestamp + checksum as
+  renderer inputs (pinned by golden tests), sort every iteration, and
+  remove `dict` / database ordering dependencies. The renderer should
+  not call wall-clock time directly.
 - `⚠ disputed` markers inline wherever an open conflict touches a
   rendered field (uses V1-0's F-5 hook output).
 - `(curated)` annotations where project_state overrides entity state.
@@ -322,20 +323,28 @@ lands after everything it depends on is stable.

 **Scope:**
 - Verify and close F-7 spec gaps:
-  - Original memory gets `status="graduated"` (new status).
-  - Forward-pointer column from graduated memory to entity candidate id.
-  - Promote-entity preserves original memory.
-  - Flow tested for `adaptation` → Decision, `project` → Requirement,
-    `knowledge` → Fact.
-- Minimal schema additions: one column + one new status value; additive
-  migration only.
+  - Add the missing direct `POST /memory/{id}/graduate` route, reusing the
+    same prompt/parser as the batch graduation path.
+  - Keep `/admin/graduation/request` as the bulk lane; direct route is the
+    per-memory acceptance surface.
+  - Preserve current behavior where promote marks source memories
+    `status="graduated"` and sets `graduated_to_entity_id`.
+  - Flow tested for `adaptation` → Decision and `project` → Requirement.
+  - Reconcile the spec's `knowledge` → Fact wording with the actual V1
+    ontology (no `fact` entity type exists today). Prefer doc alignment to
+    an existing typed entity such as `parameter`, rather than adding a vague
+    catch-all `Fact` type late in V1.
+- Schema is mostly already in place: `graduated` status exists in memory
+  service, `graduated_to_entity_id` column + index exist, and promote
+  preserves the original memory. Remaining work is route surface,
+  ontology/spec reconciliation, and targeted end-to-end tests.
 - **Q-4 full trust-hierarchy tests**: no auto-write to project_state from
   any promote path; active-only reinforcement for entities; etc. (The
   entity-candidates-excluded-from-context test shipped in V1-0.)

 **Acceptance:** F-7 ✅, Q-4 ✅.

-**Estimated size:** 2 days.
+**Estimated size:** 3–4 days.

 **Tests added:** ~8.

@@ -388,10 +397,9 @@ tests).
 ### Total (revised after Codex 2026-04-22 audit)

 - Phase budgets: V1-0 (3) + V1-A (1.5) + V1-B (2) + V1-C (2) + V1-D (3-4)
-  + V1-E (2) + V1-F (3) ≈ **16.5–17.5 days of focused work**. Revised down
-  slightly from the previous 12–17 estimate because V1-A scope shrank
-  (four pillar queries are mostly already implemented per Codex audit)
-  and V1-F F-5 work shrank (no schema migration needed).
+  + V1-E (3-4) + V1-F (3) ≈ **17.5–19.5 days of focused work**. This is a
+  realistic engineering-effort estimate, but a single-operator calendar
+  plan should still carry context-switch / soak / review buffer on top.
 - Adds roughly **60 tests** (533 → ~593).
 - Branch strategy: one branch per phase (V1-0 → V1-F), each squash-merged
   to main after Codex review. Phases sequential because each builds on
@@ -496,38 +504,39 @@ following are **explicitly out of scope** for this plan:

 Three of the original eight questions (F-1 field audit, F-2 per-query
 audit, F-5 schema divergence) were answered by Codex's 2026-04-22 audit
-and folded into the plan. Remaining open questions:
+and folded into the plan. One open question remains; the rest are now
+resolved in-plan:

 1. **Parallel schedule vs Now list.** The first-round review correctly
    softened this from "fully parallel" to "less disjoint than claimed".
    Is the revised collision table + pause-points section enough, or
    should specific Now-list items gate specific V1 phases more strictly?

-2. **F-7 graduation gap depth.** Still unaudited. `_graduation_prompt.py`
-   + `api_request_graduation` + DB schema need one Codex read to tell
-   us whether V1-E is a 2-day phase or a 4-day phase.
+2. **F-7 graduation gap depth.** Resolved by Codex audit. The schema and
+   preserve-original-memory hook are already in place, so V1-E is not a
+   greenfield build. But the direct `/memory/{id}/graduate` route and the
+   ontology/spec mismatch around `knowledge` → `Fact` are still open, so
+   V1-E is closer to **3–4 days** than 2.

-3. **Mirror determinism — where does `now` go?** The current mirror
-   footer has a live timestamp (`mirror.py:326`). Spec says
-   deterministic output, spec also shows a `Regenerated:` header with
-   timestamp (`human-mirror-rules.md:265`). Reconciliation proposal:
-   timestamp allowed in the header banner but must be an input
-   parameter so the golden-file test can pin it. Sound right?
+3. **Mirror determinism — where does `now` go?** Resolved. Keep the
+   regenerated timestamp in the rendered output if desired, but pass it
+   into the renderer as an input value. Golden-file tests pin that input;
+   render code must not read the clock directly.

-4. **`project` field naming.** The spec writes `project_id`; the code
-   writes `project`. The spec says "fields equivalent to" so naming is
-   technically flexible. Proposal: V1-0 adds a doc note making this
-   explicit, no column rename. Is this acceptable, or does Codex want
-   the rename for cleanliness?
+4. **`project` field naming.** Resolved. Keep the existing `project`
+   field; add the explicit doc note that it is the project identifier for
+   V1 acceptance purposes. No storage rename needed.

-5. **Velocity calibration.** Revised 16.5–17.5 days total. Given Phase
-   7A took a week and Phase 7D fit in one session, is this a fair
-   estimate for a single-operator sprint, or should we build in buffer
-   for multi-phase context switches?
+5. **Velocity calibration.** Resolved. **17.5–19.5 focused days** is a
+   fair engineering-effort estimate after the F-7 audit. For an actual
+   operator schedule, keep additional buffer for context switching, soak,
+   and review rounds.

-6. **Minions/queue as V2 item in D-3.** Should we name it explicitly in
-   V1 release notes as a future track, or leave it unnamed until V2
-   planning starts?
+6. **Minions/queue as V2 item in D-3.** Resolved. Do not name the
+   rejected "Minions" plan in V1 release notes. If D-3 includes a future
+   work section, refer to it neutrally as "queued background processing /
+   async workers" rather than canonizing a V2 codename before V2 is
+   designed.

 ---
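
Note following the patch: the Q-5 resolution in the hunks above (inject the regenerated timestamp and checksum into the renderer, sort every iteration, never read the clock inside render code) could look roughly like the sketch below. This is a minimal illustration under stated assumptions, not the code in `mirror.py`: `content_checksum`, `render_footer`, and `render_overview` are hypothetical names, and the SHA-256 checksum and footer wording are assumed rather than taken from the spec.

```python
import hashlib
from datetime import datetime, timezone


def content_checksum(body: str) -> str:
    """Short checksum of the rendered body (hypothetical helper; SHA-256 is an assumption)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]


def render_footer(regenerated_at: datetime, checksum: str) -> str:
    """Build the footer from injected values only; this function never reads the clock."""
    stamp = regenerated_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"Regenerated: {stamp} | checksum: {checksum}"


def render_overview(rows: dict[str, str], regenerated_at: datetime) -> str:
    """Render rows in sorted key order so dict or DB iteration order cannot leak into output."""
    body = "\n".join(f"{key}: {rows[key]}" for key in sorted(rows))
    return body + "\n" + render_footer(regenerated_at, content_checksum(body))


# Golden-file style check: same pinned timestamp, different input ordering.
PINNED = datetime(2026, 4, 22, 18, 0, 0, tzinfo=timezone.utc)
first = render_overview({"b": "2", "a": "1"}, PINNED)
second = render_overview({"a": "1", "b": "2"}, PINNED)
```

With the timestamp passed in, the caller (for example a regenerate route) is the only place that would call `datetime.now()`, and a golden-file test can compare `first` against a stored file byte for byte.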