# Compare commits: `claude/v1-` ... `codex/p04-`

18 commits
| Author | SHA1 | Date |
|---|---|---|
| | d3de9f67ea | |
| | 0fc6705173 | |
| | a87d9845a8 | |
| | 4744c69d10 | |
| | 867a1abfaa | |
| | 05c11fd4fb | |
| | ce6ffdbb63 | |
| | c03022d864 | |
| | f44a211497 | |
| | c7212900b0 | |
| | c53e61eb67 | |
| | 2b86543e6a | |
| | 0989fed9ee | |
| | 98bc848184 | |
| | 69176d11c5 | |
| | 4ca81e9b36 | |
| | 22a37a7241 | |
| | 2712c5d2d0 | |
**`.gitignore`** (vendored) · 5 lines changed

````diff
@@ -14,3 +14,8 @@ venv/
 .claude/*
 !.claude/commands/
 !.claude/commands/**
+
+# Editor / IDE state — user-specific, not project config
+.obsidian/
+.vscode/
+.idea/
````

**`DEV-LEDGER.md`**
````diff
@@ -6,23 +6,26 @@
 
 ## Orientation
 
-- **live_sha** (Dalidou `/health` build_sha): `775960c` (verified 2026-04-16 via /health, build_time 2026-04-16T17:59:30Z)
+- **live_sha** (Dalidou `/health` build_sha): `a87d984` (verified 2026-04-25T00:37Z post memory-ranking stabilization deploy; status=ok)
-- **last_updated**: 2026-04-18 by Claude (Phase 7A — Memory Consolidation "sleep cycle" V1 on branch, not yet deployed)
+- **last_updated**: 2026-04-25 by Codex (project_id metadata backfill complete; memory ranking stabilized)
-- **main_tip**: `999788b`
+- **main_tip**: `a87d984`
-- **test_count**: 533 (prior 521 + 12 new wikilink/redlink tests)
+- **test_count**: 571 on `main`
-- **harness**: `17/18 PASS` on live Dalidou (p04-constraints expects "Zerodur" — retrieval content gap, not regression)
+- **harness**: `19/20 PASS` on live Dalidou, 0 blocking failures, 1 known content gap (`p04-constraints`)
 - **vectors**: 33,253
-- **active_memories**: 84 (31 project, 23 knowledge, 10 episodic, 8 adaptation, 7 preference, 5 identity)
+- **active_memories**: 290 (`/admin/dashboard` 2026-04-24; note integrity panel reports a separate active_memory_count=951 and needs reconciliation)
-- **candidate_memories**: 2
+- **candidate_memories**: 0 (triage queue drained)
-- **interactions**: 234 total (192 claude-code, 38 openclaw, 4 test)
+- **interactions**: 951 (`/admin/dashboard` 2026-04-24)
 - **registered_projects**: atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08)
-- **project_state_entries**: 110 total (atocore=47, p06=19, p05=18, p04=15, abb=6, atomizer=5)
+- **project_state_entries**: 128 across registered projects (`/admin/dashboard` 2026-04-24)
-- **entities**: 35 (engineering knowledge graph, Layer 2)
+- **entities**: 66 (up from 35 — V1-0 backfill + ongoing work; 0 open conflicts)
 - **off_host_backup**: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron, verified
 - **nightly_pipeline**: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → auto-triage → **auto-promote/expire (NEW)** → weekly synth/lint Sundays → **retrieval harness (NEW)** → **pipeline summary (NEW)**
 - **capture_clients**: claude-code (Stop hook + cwd project inference), openclaw (before_agent_start + llm_output plugin, verified live)
 - **wiki**: http://dalidou:8100/wiki (browse), /wiki/projects/{id}, /wiki/entities/{id}, /wiki/search
 - **dashboard**: http://dalidou:8100/admin/dashboard (now shows pipeline health, interaction totals by client, all registered projects)
+- **active_track**: Engineering V1 Completion (started 2026-04-22). V1-0 landed (`2712c5d`). V1-A density gate CLEARED (784 active ≫ 100 target as of 2026-04-23). V1-A soak gate at day 5/~7 (F4 first run 2026-04-19; nightly clean 2026-04-19 through 2026-04-23; failures confined to the known p04-constraints content gap). Plan: `docs/plans/engineering-v1-completion-plan.md`. Resume map: `docs/plans/v1-resume-state.md`.
+- **last_nightly_pipeline**: `2026-04-23T03:00:20Z` — harness 17/18, triage promoted=3 rejected=7 human=0, dedup 7 clusters (1 tier1 + 6 tier2 auto-merged), graduation 30-skipped 0-graduated 0-errors, auto-triage drained the queue (0 new candidates 2026-04-22T00:52Z run)
+- **open_branches**: `codex/retrieval-memory-ranking-stabilization` pushed and fast-forwarded into `main` as `a87d984`; no active unmerged code branch for this tranche.
 
 ## Active Plan
 
````
````diff
@@ -143,9 +146,12 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha
 | R11 | Codex | P2 | src/atocore/api/routes.py:773-845 | `POST /admin/extract-batch` still accepts `mode="llm"` inside the container and returns a successful 0-candidate result instead of surfacing that host-only LLM extraction is unavailable from this runtime. That is a misleading API contract for operators. | fixed | Claude | 2026-04-12 | (pending) |
 | R12 | Codex | P2 | scripts/batch_llm_extract_live.py:39-190 | The host-side extractor duplicates the LLM system prompt and JSON parsing logic from `src/atocore/memory/extractor_llm.py`. It works today, but this is now a prompt/parser drift risk across the container and host implementations. | fixed | Claude | 2026-04-12 | (pending) |
 | R13 | Codex | P2 | DEV-LEDGER.md:12 | The new `286 passing` test-count claim is not reproducibly auditable from the current audit environments: neither Dalidou nor the clean worktree has `pytest` available. The claim may be true in Claude's dev shell, but it remains unverified in this audit. | fixed | Claude | 2026-04-12 | (pending) |
+| R14 | Codex | P2 | src/atocore/api/routes.py (POST /entities/{id}/promote) | The HTTP `POST /entities/{id}/promote` route does not translate the new service-layer `ValueError("source_refs required: cannot promote a candidate with no provenance...")` into a 400. A legacy no-provenance candidate promoted through the API currently surfaces as a 500. Does not block V1-0 acceptance; tidy in a follow-up. | fixed | Claude | 2026-04-22 | 0989fed |
 
 ## Recent Decisions
 
+- **2026-04-22** **Wiki reorg reframed as read-only operator orientation.** The earlier `docs/plans/wiki-reorg-plan.md` (5 additions W-1…W-5 adding `/wiki/interactions`, `/wiki/memories`, project-page restructure, recent feed, topnav exposure) is **superseded** and marked as such in-tree. Successor is `docs/plans/operator-orientation-plan.md` in the `ATOCore-clean` workspace: no `/wiki` surface, no Human Mirror implementation, no new API routes, no schema changes, no new source of truth. First deliverables are docs-only: `docs/where-things-live.md` (operator map of source/staged/machine/registry/trusted-state/memories/interactions/logs/backups with explicit edit-vs-don't-edit boundaries) and `docs/operator-home.md` (short daily starting page indexing those docs + a 4-command read-only orientation sequence). README and operations.md link both. Optional future work (`project-overview` and `memory-candidates` CLI helpers over existing endpoints) is gated on the docs proving useful in practice. *Proposed by:* Antoine. *Drafted by:* Claude. *Pending Codex review.*
+- **2026-04-22** **V1-0 done: approved, merged, deployed, prod backfilled.** Codex pulled `f16cd52`, re-ran the two original probes (both pass), re-ran the three targeted regression suites (all pass). Squash-merged to main as `2712c5d`. Dalidou deployed via canonical deploy script; `/health` reports build_sha=`2712c5d2d03cb2a6af38b559664afd1c4cd0e050`, status=ok. Validated backup snapshot taken at `/srv/storage/atocore/backups/snapshots/20260422T190624Z` before backfill. Prod backfill: `--dry-run` found 31 active/superseded entities with no provenance; list reviewed and sane; live run updated 31 rows via the default `hand_authored=1` flag path; follow-up dry-run returned 0 rows remaining. Residual logged as R14 (P2): `POST /entities/{id}/promote` HTTP route doesn't translate the new service-layer `ValueError` into a 400 — a legacy bad candidate promoted via the API returns 500 instead. Does not block V1-0 acceptance. V1-0 closed. Next: V1-A (Q-001 subsystem-scoped variant + Q-6 integration). V1-A holds until the soak window ends ~2026-04-26 and the 100-memory density target is hit. *Approved + landed by:* Codex. *Ratified by:* Antoine.
 - **2026-04-22** **Engineering V1 Completion Plan — Codex sign-off (third round)**. Codex's third-round audit closed the remaining five open questions with concrete resolutions, patched inline in `docs/plans/engineering-v1-completion-plan.md`: (1) F-7 row rewritten with ground truth — schema + preserve-original + test coverage already exist (`graduated_to_entity_id` at `database.py:143-146`, `graduated` status in memory service, promote hook at `service.py:354-356,389-451`, tests at `test_engineering_v1_phase5.py:67-90`); **real gaps** are missing direct `POST /memory/{id}/graduate` route and spec's `knowledge→Fact` mismatch (no `fact` entity type exists; reconcile to `parameter` or similar); V1-E 2 → **3–4 days**; (2) Q-5 determinism reframed — don't stabilize the call to `datetime.now()`, inject regenerated timestamp + checksum as renderer inputs, remove DB iteration ordering dependencies; V1-D scope updated; (3) `project` vs `project_id` — doc note only, no rename, resolved; (4) total estimate 16.5–17.5 → **17.5–19.5 focused days** with calendar buffer on top; (5) "Minions" must not be canonized in D-3 release notes — neutral wording ("queued background processing / async workers") only. **Agreement reached**: Claude + Codex + Antoine aligned. V1-0 is ready to start once the current pipeline soak window ends (~2026-04-26) and the 100-memory density target is hit. *Patched by:* Codex. *Signed off by:* Codex ("with those edits, I'd sign off on the five questions"). *Accepted by:* Antoine. *Executor (V1-0 onwards):* Claude.
 - **2026-04-22** **Engineering V1 Completion Plan revised per Codex second-round file-level audit** — three findings folded in, all with exact file:line refs from Codex: (1) F-1 downgraded from ✅ to 🟡 — `extractor_version` and `canonical_home` missing from `Entity` dataclass and `entities` table per `engineering-v1-acceptance.md:45`; V1-0 scope now adds both fields via additive migration + doc note that `project` IS `project_id` per "fields equivalent to" spec wording; (2) F-2 replaced with ground-truth per-query status: 9 of 20 v1-required queries done (Q-004/Q-005/Q-006/Q-008/Q-009/Q-011/Q-013/Q-016/Q-017), 1 partial (Q-001 needs subsystem-scoped variant), 10 missing (Q-002/003/007/010/012/014/018/019/020); V1-A scope shrank to Q-001 shape fix + Q-6 integration (pillar queries already implemented); V1-C closes the 8 remaining new queries + Q-020 deferred to V1-D; (3) F-5 reframed — generic `conflicts` + `conflict_members` schema already present at `database.py:190`, no migration needed; divergence is detector body (per-type dispatch needs generalization) + routes (`/admin/conflicts/*` needs `/conflicts/*` alias). Total revised to 16.5–17.5 days, ~60 tests. Plan: `docs/plans/engineering-v1-completion-plan.md` at commit `ce3a878` (Codex pulled clean). Three of Codex's eight open questions now answered; remaining: F-7 graduation depth, mirror determinism, `project` rename question, velocity calibration, minions naming. *Proposed by:* Claude. *Reviewed by:* Codex (two rounds).
 - **2026-04-22** **Engineering V1 Completion Plan revised per Codex first-round review** — original six-phase order (queries → ingest → mirror → graduation → provenance → ops) rejected by Codex as backward: provenance-at-write (F-8) and conflict-detection hooks (F-5 minimal) must precede any phase that writes active entities. Revised to seven phases: V1-0 write-time invariants (F-8 + F-5 hooks + F-1 audit) as hard prerequisite, V1-A minimum query slice proving the model, V1-B ingest, V1-C full query catalog, V1-D mirror, V1-E graduation, V1-F full F-5 spec + ops + docs. Also softened "parallel with Now list" — real collision points listed explicitly; schedule shifted ~4 weeks to reflect that V1-0 cannot start during pipeline soak. Withdrew the "50–70% built" global framing in favor of the per-criterion gap table. Workspace sync note added: Codex's Playground workspace can't see the plan file; canonical dev tree is Windows `C:\Users\antoi\ATOCore`. Plan: `docs/plans/engineering-v1-completion-plan.md`. Awaiting Codex file-level audit once workspace syncs. *Proposed by:* Claude. *First-round review by:* Codex.
````
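R14's fix is a standard FastAPI pattern: translate a caller-fixable service-layer `ValueError` into a 400 at the route boundary while leaving the 404 path intact. A minimal sketch with hypothetical route and service names; the real handler lives in `src/atocore/api/routes.py` and is not shown in this compare:

```python
from fastapi import FastAPI, HTTPException

app = FastAPI()

def promote_entity(entity_id: str) -> bool:
    """Stand-in for the real service call: returns False when the entity is
    missing or not a candidate, raises ValueError on provenance violations."""
    raise ValueError("source_refs required: cannot promote a candidate with no provenance")

@app.post("/entities/{entity_id}/promote")
def promote(entity_id: str) -> dict:
    try:
        promoted = promote_entity(entity_id)
    except ValueError as exc:
        # R14: caller-fixable validation failure now surfaces as HTTP 400.
        raise HTTPException(status_code=400, detail=str(exc))
    if not promoted:
        # Existing behavior preserved: not-found / not-a-candidate stays 404.
        raise HTTPException(status_code=404, detail="entity not found or not a candidate")
    return {"entity_id": entity_id, "status": "active"}
```

This matches Codex's review note below: `promote_entity()` returning False keeps the 404 behavior, while validation failures become 400.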
````diff
@@ -164,6 +170,26 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha
 
 ## Session Log
 
+- **2026-04-25 Codex (project_id backfill + retrieval stabilization closed)** Merged `codex/project-id-metadata-retrieval` into `main` (`867a1ab`) and deployed to Dalidou. Took Chroma-inclusive backup `/srv/storage/atocore/backups/snapshots/20260424T154358Z`, then ran `scripts/backfill_chunk_project_ids.py` per project; the populated projects `p04-gigabit`, `p05-interferometer`, `p06-polisher`, `atomizer-v2`, and `atocore` all applied cleanly, 33,253 vectors total, with 0 missing/malformed and an immediate final dry-run showing 33,253 already tagged / 0 updates. Post-backfill harness exposed p06 memory-ranking misses (`Tailscale`, `encoder`), so Codex shipped `4744c69` then `a87d984 fix(memory): widen query-time context candidates`. Full local suite: 571 passed. Live `/health` reports `a87d9845a8c34395a02890f0cf22aa7a46afaf62`, vectors=33,253, sources_ready=true. Live retrieval harness: 19/20, 0 blocking failures, 1 known issue (`p04-constraints` missing `Zerodur` / `1.2`). A repeat backfill dry-run after the code-only stabilization deploy was aborted after the one-off container ran too long; the live service stayed healthy and the earlier post-apply idempotency result remains the migration acceptance record. Dalidou HTTP push credentials are still not configured; this session pushed through the Windows credential path.
+
+- **2026-04-24 Codex (retrieval boundary deployed + project_id metadata tranche)** Merged `codex/audit-improvements-foundation` to `main` as `f44a211` and pushed to Dalidou Gitea. Took pre-deploy runtime backup `/srv/storage/atocore/backups/snapshots/20260424T144810Z` (DB + registry, no Chroma). Deployed via `papa@dalidou` canonical `deploy/dalidou/deploy.sh`; live `/health` reports build_sha `f44a2114970008a7eec4e7fc2860c8f072914e38`, build_time `2026-04-24T14:48:44Z`, status ok. Post-deploy retrieval harness: 20 fixtures, 19 pass, 0 blocking failures, 1 known issue (`p04-constraints`). The former blocker `p05-broad-status-no-atomizer` now passes. Manual p05 `context-build "current status"` spot check shows no p04/Atomizer source bleed in retrieved chunks. Started follow-up branch `codex/project-id-metadata-retrieval`: registered-project ingestion now writes explicit `project_id` into DB chunk metadata and Chroma vector metadata; retrieval prefers exact `project_id` when present and keeps path/tag matching as legacy fallback; added dry-run-by-default `scripts/backfill_chunk_project_ids.py` to backfill SQLite + Chroma metadata; added tests for project-id ingestion, registered refresh propagation, exact project-id retrieval, and collision fallback. Verified targeted suite (`test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 36 passed. Verified full suite: 556 passed in 72.44s. Branch not merged or deployed yet.
+
+- **2026-04-24 Codex (project_id audit response)** Applied independent-audit fixes on `codex/project-id-metadata-retrieval`. Closed the nightly `/ingest/sources` clobber risk by adding registry-level `derive_project_id_for_path()` and making unscoped `ingest_file()` derive ownership from registered ingest roots when possible; `refresh_registered_project()` still passes the canonical project id directly. Changed retrieval so empty `project_id` falls through to legacy path/tag ownership instead of short-circuiting as unowned. Hardened `scripts/backfill_chunk_project_ids.py`: `--apply` now requires `--chroma-snapshot-confirmed`, runs Chroma metadata updates before SQLite writes, batches updates, skips/reports missing vectors, skips/reports malformed metadata, reports already-tagged rows, and turns missing ingestion tables into a JSON `db_warning` instead of a traceback. Added tests for auto-derive ingestion, empty-project fallback, ingest-root overlap rejection, and backfill dry-run/apply/snapshot/missing-vector/malformed cases. Verified targeted suite (`test_backfill_chunk_project_ids.py`, `test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 45 passed. Verified full suite: 565 passed in 73.16s. Local dry-run on empty/default data returns 0 updates with `db_warning` rather than crashing. Branch still not merged/deployed.
+
+- **2026-04-24 Codex (project_id final hardening before merge)** Applied the final independent-review P2s on `codex/project-id-metadata-retrieval`: `ingest_file()` still fails open when project-id derivation fails, but now emits `project_id_derivation_failed` with file path and error; retrieval now catches registry failures both at project-scope resolution and the soft project-match boost path, logs warnings, and serves unscoped rather than raising. Added regression tests for both fail-open paths. Verified targeted suite (`test_ingestion.py`, `test_retrieval.py`, `test_backfill_chunk_project_ids.py`, `test_project_registry.py`): 47 passed. Verified full suite: 567 passed in 79.66s. Branch still not merged/deployed.
+
+- **2026-04-24 Codex (audit improvements foundation)** Started implementation of the audit recommendations on branch `codex/audit-improvements-foundation` from `origin/main@c53e61e`. First tranche: registry-aware project-scoped retrieval filtering (`ATOCORE_RANK_PROJECT_SCOPE_FILTER`, widened candidate pull before filtering), eval harness known-issue lane, two p05 project-bleed fixtures, `scripts/live_status.py`, README/current-state/master-plan status refresh. Verified `pytest -q`: 550 passed in 67.11s. Live retrieval harness against production (branch not yet deployed): 20 fixtures, 18 pass, 1 known issue (`p04-constraints` Zerodur/1.2 content gap), 1 blocking guard (`p05-broad-status-no-atomizer`) still failing because production has not yet deployed the retrieval filter and currently pulls `P04-GigaBIT-M1-KB-design` into broad p05 status context. Live dashboard refresh: health ok, build `2b86543`, docs 1748, chunks/vectors 33253, interactions 948, active memories 289, candidates 0, project_state total 128. Noted count discrepancy: dashboard memories.active=289 while integrity active_memory_count=951; schedule reconciliation in a follow-up.
+
+- **2026-04-24 Codex (independent-audit hardening)** Applied the Opus independent audit's fast follow-ups before merge/deploy. Closed the two P1s by making project-scope ownership path/tag-based only, adding path-segment/tag-exact matching to avoid short-alias substring collisions, and keeping title/heading text out of provenance decisions. Added regression tests for title poisoning, substring collision, and unknown-project fallback. Added retrieval log fields `raw_results_count`, `post_filter_count`, `post_filter_dropped`, and `underfilled`. Added retrieval-eval run metadata (`generated_at`, `base_url`, `/health`) and `live_status.py` auth-token/status support. README now documents the ranking knobs and clarifies that the hard scope filter and soft project match boost are separate controls. Verified `pytest -q`: 553 passed in 66.07s. Live production remains in the expected pre-deploy state: 20 fixtures, 18 pass, 1 known content gap, 1 blocking p05 bleed guard. Latest live dashboard: build `2b86543`, docs 1748, chunks/vectors 33253, interactions 950, active memories 290, candidates 0, project_state total 128.
+
+- **2026-04-23 Codex + Claude (R14 closed)** Codex reviewed `claude/r14-promote-400` at `3888db9`, no findings: "The route change is narrowly scoped: `promote_entity()` still returns False for not-found/not-candidate cases, so the existing 404 behavior remains intact, while caller-fixable validation failures now surface as 400." Ran `pytest tests/test_v1_0_write_invariants.py -q` from an isolated worktree: 15 passed in 1.91s. Claude squash-merged to main as `0989fed`, followed by ledger close-out `2b86543`, then deployed via canonical script. Dalidou `/health` reports build_sha=`2b86543e6ad26011b39a44509cc8df3809725171`, build_time `2026-04-23T15:20:53Z`, status=ok. R14 closed. Orientation refreshed earlier this session also reflected the V1-A gate status: **density gate CLEARED** (784 active memories vs 100 target — density batch-extract ran between 2026-04-22 and 2026-04-23 and more than crushed the gate), **soak gate at day 5 of ~7** (F4 first run 2026-04-19; nightly clean 2026-04-19 through 2026-04-23; only chronic failure is the known p04-constraints "Zerodur" content gap). V1-A branches from a clean V1-0 baseline as soon as the soak is called done.
+
+- **2026-04-22 Codex + Antoine (V1-0 closed)** Codex approved `f16cd52` after re-running both original probes (legacy-candidate promote + supersede hook — both correct) and the three targeted regression suites (`test_v1_0_write_invariants.py`, `test_engineering_v1_phase5.py`, `test_inbox_crossproject.py` — all pass). Squash-merged to main as `2712c5d` ("feat(engineering): enforce V1-0 write invariants"). Deployed to Dalidou via the canonical deploy script; `/health` build_sha=`2712c5d2d03cb2a6af38b559664afd1c4cd0e050` status=ok. Validated backup snapshot at `/srv/storage/atocore/backups/snapshots/20260422T190624Z` taken BEFORE prod backfill. Prod backfill of `scripts/v1_0_backfill_provenance.py` against live DB: dry-run found 31 active/superseded entities with no provenance, list reviewed and looked sane; live run with default `hand_authored=1` flag path updated 31 rows; follow-up dry-run returned 0 rows remaining → no lingering F-8 violations in prod. Codex logged one residual P2 (R14): HTTP `POST /entities/{id}/promote` route doesn't translate the new service-layer `ValueError` into 400 — a legacy bad candidate promoted through the API surfaces as 500. Not blocking. V1-0 closed. **Gates for V1-A**: soak window ends ~2026-04-26; 100-active-memory density target (currently 84 active + the ~31 newly flagged ones — need to check how those count in density math). V1-A holds until both gates clear.
+
+- **2026-04-22 Claude (V1-0 patches per Codex review)** Codex audit of commit `cbf9e03` surfaced two P1 gaps + one P2 scope concern, all verified with code-level probes. **P1 #1**: `promote_entity` didn't re-check the F-8 invariant — a legacy candidate with empty `source_refs` and `hand_authored=0` could still promote to active, violating the plan's "invariant at both `create_entity` and `promote_entity`". Fixed: `promote_entity` at `service.py:365-379` now raises `ValueError("source_refs required: cannot promote a candidate with no provenance...")` before flipping status. Stays symmetric with the create-side error. **P1 #2**: `supersede_entity` was missing the F-5 hook the plan requires on every active-entity write path. The `supersedes` relationship rooted at the `superseded_by` entity can create a conflict the detector should catch. Fixed at `service.py:581-591`: calls `detect_conflicts_for_entity(superseded_by)` with fail-open per Q-3. **P2**: backfill script's `--invalidate-instead` flag queried both active AND superseded rows; invalidating already-superseded rows would collapse history. Fixed at `scripts/v1_0_backfill_provenance.py:52-63`: `--invalidate-instead` now scopes to `status='active'` only (default flag-hand_authored mode stays broad as it's additive/non-destructive). Help text tightened to make the destructive posture explicit. **Four new regression tests** in `test_v1_0_write_invariants.py`: (1) `test_promote_rejects_legacy_candidate_without_provenance` — directly inserts a legacy candidate and confirms promote raises + row stays candidate; (2) `test_promote_accepts_candidate_flagged_hand_authored` — symmetry check; (3) `test_supersede_runs_conflict_detection_on_new_active` — monkeypatches detector, confirms hook fires on `superseded_by`; (4) `test_supersede_hook_fails_open` — Q-3 check for supersede path. **Test count**: 543 → 547 (+4 regression). Full suite `547 passed in 81.07s`. Next: commit patches on branch, push, Codex re-review.
+
+- **2026-04-22 Claude (V1-0 landed on branch)** First V1 completion phase done on branch `claude/v1-0-write-invariants`. **F-1 schema remediation**: added `extractor_version`, `canonical_home`, `hand_authored` columns to `entities` via idempotent ALTERs in both `_apply_migrations` (`database.py:148-170`) and `init_engineering_schema` (`service.py:95-139`). CREATE TABLE also updated so fresh DBs get the columns natively. New `_table_exists` helper at `database.py:378`. `Entity` dataclass gains the three fields with sensible defaults. `EXTRACTOR_VERSION = "v1.0.0"` module constant at top of `service.py`. `_row_to_entity` tolerates rows without the new columns so tests predating V1-0 still pass. **F-8 provenance enforcement**: `create_entity` raises `ValueError("source_refs required: ...")` when called without non-empty source_refs AND without `hand_authored=True`. New kwargs `hand_authored: bool = False` and `extractor_version: str | None = None` threaded through `service.create_entity`, the `EntityCreateRequest` Pydantic model, the API route, and the wiki `/wiki/new` form body (form writes `hand_authored: true` since human entries are hand-authored by definition). **F-5 hook on active create**: `create_entity(status="active")` now calls `detect_conflicts_for_entity` with fail-open per `conflict-model.md:256` (errors log warning, write still succeeds). The promote path's existing hook at `service.py:400-404` was kept as-is. **Doc note** added to `engineering-ontology-v1.md` recording that `project` IS the `project_id` per "fields equivalent to" wording. **Backfill script** at `scripts/v1_0_backfill_provenance.py` — idempotent, defaults to flagging no-provenance active entities as `hand_authored=1`, supports `--dry-run` and `--invalidate-instead`. **Tests**: 10 new in `tests/test_v1_0_write_invariants.py` covering F-1 fields, F-8 raise path, F-8 hand_authored bypass, F-5 active-create hook, F-5 candidate-no-hook, Q-3 fail-open on detector error, Q-4 partial (scope_only=active excludes candidates). **Test fixes**: three pre-existing tests adapted — `test_requirement_name_conflict_detected` + `test_conflict_resolution_dismiss_leaves_entities_alone` now read from `list_open_conflicts` because the V1-0 hook records the conflict at create-time (detector dedup returns [] on re-run); `test_api_post_entity_with_null_project_stores_global` sends `hand_authored: true` since the fixture has no source_refs. **conftest.py monkeypatch**: wraps `create_entity` so tests missing both source_refs and hand_authored default to `hand_authored=True` (reasonable since tests author their own fixture data). Production paths (API route, wiki form, graduation scripts) all pass explicit values and are unaffected by the monkeypatch. **Test count**: 533 → 543 (+10), full suite `543 passed in 77.86s`. **Not yet**: commit + push + Codex review + deploy. **Branch**: `claude/v1-0-write-invariants`.
+
 - **2026-04-22 Codex (late night)** Third-round audit closed the remaining five open questions. Patched `docs/plans/engineering-v1-completion-plan.md` inline (no commit by Codex). **F-7 finding (P1):** graduation stack is partially built — `_graduation_prompt.py`, `scripts/graduate_memories.py`, `database.py:143-146` (`graduated_to_entity_id`), memory `graduated` status, promote-preserves-original at `service.py:354-356,389-451`, tests at `test_engineering_v1_phase5.py:67-90` all exist. Real gaps: no direct `POST /memory/{id}/graduate` route at `routes.py:756`; spec's `knowledge→Fact` doesn't match ontology (`service.py:16` has no `fact` type — reconcile to `parameter` or similar). V1-E estimate 2 → 3–4 days. **Q-5 finding (P2):** "stabilize timestamp" insufficient — renderer reads wall-clock in `_footer()` at `mirror.py:320`; fix is inject regenerated timestamp + checksum as renderer inputs + sort DB iteration + remove dict ordering deps. V1-D scope patched. **Remaining three (P3):** `project` stays as doc-note equivalence (no rename); total estimate 17.5–19.5 focused days; release notes must NOT canonize "Minions" — neutral "queued background processing / async workers" only. **Sign-off:** "with those edits, I'd sign off on the five questions. The only non-architectural uncertainty left in the plan is scheduling discipline against the current Now list; that does not block V1-0 once the soak window and memory-density gate clear." **Status:** Claude + Codex agreed. Plan frozen pending Antoine final accept and gate clearance. Claude to commit Codex's patches + push.
 
 - **2026-04-22 Claude (late night)** Codex second-round review did the full file-level audit and came back with three P1/P2 findings, all with exact file:line refs. Verified each against current code before revising. (1) **F-1 not clean**: `Entity` dataclass at `service.py:67` and `entities` table schema are missing the `extractor_version` and `canonical_home` shared-header fields required by `engineering-v1-acceptance.md:45`; `project` field is the project identifier but not named `project_id` as spec writes (spec wording "fields equivalent to" allows the naming, but needs explicit doc note). V1-0 scope now includes adding both missing fields via additive `_apply_migrations` pattern. (2) **F-2 needed exact statuses, not guesses**: per-function audit gave ground truth — 9 of 20 v1-required queries done, 1 partial (Q-001 returns project-wide tree not subsystem-scoped expand=contains per `engineering-query-catalog.md:71`), 10 missing. V1-A scope shrank to Q-001 shape fix + Q-6 integration (most pillar queries already implemented); V1-C closes the 8 net-new queries + Q-020 to V1-D. (3) **F-5 misframed**: the generic `conflicts` + `conflict_members` schema is ALREADY spec-compliant at `database.py:190`; divergence is detector body at `conflicts.py:36` (per-type dispatch needs generalization) + route path (`/admin/conflicts/*` needs `/conflicts/*` alias). V1-F no longer includes a schema migration; detector generalization + route alignment only. Totals revised to 16.5–17.5 days, ~60 tests (down from 12–17 / 65 because V1-A and V1-F scopes both shrank after audit). Three of the eight open questions resolved. Remaining open: F-7 graduation depth, mirror determinism, `project` naming, velocity calibration, minions-as-V2 naming. No code changes this session — plan + ledger only. Next: commit + push revised plan, then await Antoine+Codex joint sign-off before V1-0 starts.
````
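The V1-0 entries above describe the same write-path contract several times: F-8 refuses entity writes without provenance unless `hand_authored`, and F-5 runs conflict detection on active writes with Q-3 fail-open (log and proceed). A minimal sketch of that contract, with the names taken from the ledger text but every body illustrative rather than the real service code:

```python
import logging

logger = logging.getLogger("atocore.engineering")

def detect_conflicts_for_entity(entity_name: str) -> None:
    """Stand-in for the synchronous F-5 conflict detector."""

def create_entity(name: str, source_refs: list[str] | None = None,
                  hand_authored: bool = False, status: str = "candidate") -> dict:
    # F-8: no entity write without provenance unless explicitly hand-authored.
    if not source_refs and not hand_authored:
        raise ValueError("source_refs required: cannot create an entity with no provenance")
    entity = {"name": name, "source_refs": source_refs or [], "status": status}
    if status == "active":
        # F-5 hook with Q-3 fail-open: detector errors are logged,
        # the write still succeeds.
        try:
            detect_conflicts_for_entity(name)
        except Exception:
            logger.warning("conflict detection failed for %s; write proceeds", name)
    return entity
```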
**`README.md`** · 26 lines changed

````diff
@@ -6,7 +6,7 @@ Personal context engine that enriches LLM interactions with durable memory, stru
 
 ```bash
 pip install -e .
-uvicorn src.atocore.main:app --port 8100
+uvicorn atocore.main:app --port 8100
 ```
 
 ## Usage
````
````diff
@@ -37,6 +37,10 @@ python scripts/atocore_client.py audit-query "gigabit" 5
 | POST | /ingest | Ingest markdown file or folder |
 | POST | /query | Retrieve relevant chunks |
 | POST | /context/build | Build full context pack |
+| POST | /interactions | Capture prompt/response interactions |
+| GET/POST | /memory | List/create durable memories |
+| GET/POST | /entities | Engineering entity graph surface |
+| GET | /admin/dashboard | Operator dashboard |
 | GET | /health | Health check |
 | GET | /debug/context | Inspect last context pack |
 
````
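A minimal client sketch against the endpoints in this table, assuming the Dalidou deploy on port 8100. The JSON field names are assumptions, not confirmed by this diff; the real client is `scripts/atocore_client.py`:

```python
import requests

BASE = "http://dalidou:8100"

# Health check (documented endpoint).
print(requests.get(f"{BASE}/health", timeout=10).json())

# Build a context pack for a project-hinted query (payload fields assumed).
resp = requests.post(
    f"{BASE}/context/build",
    json={"query": "current status", "project": "p05-interferometer"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```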
````diff
@@ -66,8 +70,10 @@ unversioned forms.
 FastAPI (port 8100)
 |- Ingestion: markdown -> parse -> chunk -> embed -> store
 |- Retrieval: query -> embed -> vector search -> rank
-|- Context Builder: retrieve -> boost -> budget -> format
+|- Context Builder: project state -> memories -> entities -> retrieval -> budget
-|- SQLite (documents, chunks, memories, projects, interactions)
+|- Reflection: capture -> reinforce -> extract -> triage -> promote/expire
+|- Engineering: typed entities, relationships, conflicts, wiki/mirror
+|- SQLite (documents, chunks, memories, projects, interactions, entities)
 '- ChromaDB (vector embeddings)
 ```
 
````
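The new Context Builder line, read together with the 4-tier ordering and the 3000-char `ATOCORE_CONTEXT_BUDGET` documented elsewhere in this compare, suggests a budgeted tier-by-tier assembly. A sketch under those assumptions, not the shipped builder:

```python
BUDGET = 3000  # ATOCORE_CONTEXT_BUDGET default (chars)

def build_context(sections: dict[str, list[str]], budget: int = BUDGET) -> str:
    out: list[str] = []
    used = 0
    # Higher tiers are admitted first, so retrieved chunks are the first
    # thing dropped when the budget runs out.
    for tier in ("project_state", "memories", "entities", "retrieved_chunks"):
        for block in sections.get(tier, []):
            if used + len(block) > budget:
                return "\n".join(out)
            out.append(block)
            used += len(block)
    return "\n".join(out)
```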
````diff
@@ -82,6 +88,16 @@ Set via environment variables (prefix `ATOCORE_`):
 | ATOCORE_CHUNK_MAX_SIZE | 800 | Max chunk size (chars) |
 | ATOCORE_CONTEXT_BUDGET | 3000 | Context pack budget (chars) |
 | ATOCORE_EMBEDDING_MODEL | paraphrase-multilingual-MiniLM-L12-v2 | Embedding model |
+| ATOCORE_RANK_PROJECT_MATCH_BOOST | 2.0 | Soft boost for chunks whose metadata matches the project hint |
+| ATOCORE_RANK_PROJECT_SCOPE_FILTER | true | Filter project-hinted retrieval away from other registered project corpora |
+| ATOCORE_RANK_PROJECT_SCOPE_CANDIDATE_MULTIPLIER | 4 | Widen candidate pull before project-scope filtering |
+| ATOCORE_RANK_QUERY_TOKEN_STEP | 0.08 | Per-token boost when query terms appear in high-signal metadata |
+| ATOCORE_RANK_QUERY_TOKEN_CAP | 1.32 | Maximum query-token boost multiplier |
+| ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST | 1.18 | Boost current decision/status/requirements-like paths |
+| ATOCORE_RANK_PATH_LOW_SIGNAL_PENALTY | 0.72 | Down-rank archive/history-like paths |
+
+`ATOCORE_RANK_PROJECT_SCOPE_FILTER` gates the hard cross-project filter only.
+`ATOCORE_RANK_PROJECT_MATCH_BOOST` remains the separate soft-ranking knob.
 
 ## Testing
 
````
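One plausible way the ranking knobs above compose into a single multiplier. The diff documents the knobs and their defaults but not the formula, so treat every line here as an assumption rather than the shipped math:

```python
def rank_multiplier(query_tokens: set[str], metadata_tokens: set[str],
                    project_match: bool, high_signal_path: bool,
                    low_signal_path: bool) -> float:
    m = 1.0
    if project_match:
        m *= 2.0                       # ATOCORE_RANK_PROJECT_MATCH_BOOST
    hits = len(query_tokens & metadata_tokens)
    m *= min(1.0 + 0.08 * hits, 1.32)  # QUERY_TOKEN_STEP capped at QUERY_TOKEN_CAP
    if high_signal_path:
        m *= 1.18                      # PATH_HIGH_SIGNAL_BOOST
    if low_signal_path:
        m *= 0.72                      # PATH_LOW_SIGNAL_PENALTY
    return m
```

The hard scope filter is a separate stage: pull 4x the candidates (`SCOPE_CANDIDATE_MULTIPLIER`), drop chunks owned by other registered projects, then apply a soft multiplier like the one sketched here.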
````diff
@@ -93,7 +109,11 @@ pytest
 ## Operations
 
 - `scripts/atocore_client.py` provides a live API client for project refresh, project-state inspection, and retrieval-quality audits.
+- `scripts/retrieval_eval.py` runs the live retrieval/context harness, separates blocking failures from known content gaps, and stamps JSON output with target/build metadata.
+- `scripts/live_status.py` renders a compact read-only status report from `/health`, `/stats`, `/projects`, and `/admin/dashboard`; set `ATOCORE_AUTH_TOKEN` or `--auth-token` when those endpoints are gated.
+- `scripts/backfill_chunk_project_ids.py` dry-runs or applies explicit `project_id` metadata backfills for SQLite chunks and Chroma vectors; `--apply` requires a confirmed Chroma snapshot.
 - `docs/operations.md` captures the current operational priority order: retrieval quality, Wave 2 trusted-operational ingestion, AtoDrive scoping, and restore validation.
+- `DEV-LEDGER.md` is the fast-moving source of operational truth during active development; copy claims into docs only after checking the live service.
 
 ## Architecture Notes
 
````
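The backfill script's safety posture (dry-run by default, `--apply` gated on an explicit snapshot confirmation) is a reusable CLI pattern. A standalone argparse sketch; only the two flag names come from the README text, everything else is illustrative:

```python
import argparse
import sys

parser = argparse.ArgumentParser(description="backfill project_id metadata")
parser.add_argument("--apply", action="store_true",
                    help="write changes instead of reporting them")
parser.add_argument("--chroma-snapshot-confirmed", action="store_true",
                    help="assert that a Chroma-inclusive backup exists")
args = parser.parse_args()

# Refuse destructive mode unless the operator attests to a backup.
if args.apply and not args.chroma_snapshot_confirmed:
    sys.exit("refusing --apply without --chroma-snapshot-confirmed")

print("running in", "apply" if args.apply else "dry-run", "mode")
```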
**`engineering-ontology-v1.md`**

````diff
@@ -159,6 +159,17 @@ Every major object should support fields equivalent to:
 - `created_at`
 - `updated_at`
 - `notes` (optional)
+- `extractor_version` (V1-0)
+- `canonical_home` (V1-0)
+
+**Naming note (V1-0, 2026-04-22).** The AtoCore `entities` table and
+`Entity` dataclass name the project-identifier field `project`, not
+`project_id`. This doc's "fields equivalent to" wording allows that
+naming flexibility — the `project` field on entity rows IS the
+`project_id` per spec. No storage rename is planned; downstream readers
+should treat `entity.project` as the project identifier. This was
+resolved in Codex's third-round audit of the V1 Completion Plan (see
+`docs/plans/engineering-v1-completion-plan.md`).
 
 ## Suggested Status Lifecycle
 
````
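The shared-header fields and the naming note above translate naturally into a dataclass shape. An illustrative sketch, with only the field names taken from this doc and the ledger; the real `Entity` lives in the AtoCore service code and differs in detail:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    project: str                           # IS the project_id per the naming note
    status: str = "candidate"
    source_refs: list[str] = field(default_factory=list)
    hand_authored: bool = False            # V1-0 provenance flag (per the ledger)
    extractor_version: str | None = None   # V1-0 shared-header field
    canonical_home: str | None = None      # V1-0 shared-header field
    notes: str | None = None
    created_at: str | None = None
    updated_at: str | None = None
```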
**`current-state.md`**

````diff
@@ -1,6 +1,42 @@
-# AtoCore — Current State (2026-04-19)
+# AtoCore - Current State (2026-04-25)
 
-Live deploy: `877b97e` · Dalidou health: ok · Harness: 17/18.
+Update 2026-04-25: project-id chunk/vector metadata is deployed and backfilled.
+Live Dalidou is on `a87d984`; `/health` is ok with 33,253 vectors and sources
+ready. Live retrieval harness is 19/20 with 0 blocking failures and 1 known
+content gap (`p04-constraints` missing `Zerodur` / `1.2`). Full local suite:
+571 passed.
+
+The project-id backfill was applied per populated project after a
+Chroma-inclusive backup at
+`/srv/storage/atocore/backups/snapshots/20260424T154358Z`. The immediate
+post-apply dry-run reported 33,253 already tagged, 0 updates, 0 missing, and 0
+malformed. A later repeat dry-run after the code-only ranking deploy was
+aborted because the one-off container ran too long; the earlier post-apply
+idempotency result remains the migration acceptance record.
+
+## V1-0 landed 2026-04-22
+
+Engineering V1 completion track has started. **V1-0 write-time invariants**
+merged and deployed: F-1 shared-header fields (`extractor_version`,
+`canonical_home`, `hand_authored`) added to `entities`, F-8 provenance
+enforcement at both `create_entity` and `promote_entity`, F-5 synchronous
+conflict-detection hook on every active-entity write path (create, promote,
+supersede) with Q-3 fail-open. Prod backfill ran cleanly — 31 legacy
+active/superseded entities flagged `hand_authored=1`, follow-up dry-run
+returned 0 remaining rows. Test count 533 → 547 (+14).
+
+R14 is closed: `POST /entities/{id}/promote` now translates the new
+caller-fixable V1-0 `ValueError` into HTTP 400.
+
+**Next in the V1 track:** V1-A (minimal query slice + Q-6 killer-correctness
+integration). Gated on pipeline soak (~2026-04-26) + 100+ active memory
+density target. See `docs/plans/engineering-v1-completion-plan.md` for
+the full 7-phase roadmap and `docs/plans/v1-resume-state.md` for the
+"you are here" map.
+
+---
+
+## Snapshot from previous update (2026-04-19)
 
 ## The numbers
 
````
````diff
@@ -40,10 +76,10 @@ Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs
 | 7G | Re-extraction on prompt version bump | pending |
 | 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |
 
-## Known gaps (honest)
+## Known gaps (honest, refreshed 2026-04-24)
 
 1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
-2. **OpenClaw is capture-only, not context-grounded.** The plugin POSTs `/interactions` on `llm_output` but does NOT call `/context/build` on `before_agent_start`. OpenClaw's underlying agent runs blind. See Q2 below.
+2. **Project-scoped retrieval guard is deployed and passing.** Explicit `project_id` chunk/vector metadata is now present in SQLite and Chroma for the 33,253-vector corpus. Retrieval prefers exact metadata ownership and keeps path/tag matching as a legacy fallback.
-3. **Human interface (wiki) is thin and static.** 5 project cards + a "System" line. No dashboard for the autonomous activity. No per-memory detail page. See Q3/Q5.
+3. **Human interface is useful but not yet the V1 Human Mirror.** Wiki/dashboard pages exist, but the spec routes, deterministic mirror files, disputed markers, and curated annotations remain V1-D work.
-4. **Harness 17/18** — the `p04-constraints` fixture wants "Zerodur" but retrieval surfaces related-not-exact terms. Content gap, not a retrieval regression.
+4. **Harness known issue:** `p04-constraints` wants "Zerodur" and "1.2"; live retrieval surfaces related constraints but not those exact strings. Treat as content/state gap until fixed.
-5. **Two projects under-populated**: p05-interferometer (4 memories, 18 state) and atomizer-v2 (1 memory, 6 state). Batch re-extract with the new llm-0.6.0 prompt would help.
+5. **Formal docs lag the ledger during fast work.** Use `DEV-LEDGER.md` and `python scripts/live_status.py` for live truth, then copy verified claims into these docs.
````
**`master-plan.md`**

````diff
@@ -70,9 +70,14 @@ read-only additive mode.
 - Phase 6 - AtoDrive
 - Phase 10 - Write-back
 - Phase 11 - Multi-model
-- Phase 12 - Evaluation
 - Phase 13 - Hardening
 
+### Partial / Operational Baseline
+
+- Phase 12 - Evaluation. The retrieval/context harness exists and runs
+  against live Dalidou, but coverage is still intentionally small and
+  should grow before this is complete in the intended sense.
+
 ### Engineering Layer Planning Sprint
 
 **Status: complete.** All 8 architecture docs are drafted. The
````
````diff
@@ -126,11 +131,15 @@ This sits implicitly between Phase 8 (OpenClaw) and Phase 11
 (multi-model). Memory-review and engineering-entity commands are
 deferred from the shared client until their workflows are exercised.
 
-## What Is Real Today (updated 2026-04-16)
+## What Is Real Today (updated 2026-04-25)
 
-- canonical AtoCore runtime on Dalidou (`775960c`, deploy.sh verified)
+- canonical AtoCore runtime on Dalidou (`a87d984`, deploy.sh verified)
-- 33,253 vectors across 6 registered projects
-- 234 captured interactions (192 claude-code, 38 openclaw, 4 test)
+- 33,253 vectors across 6 registered projects, with explicit `project_id`
+  metadata backfilled into SQLite and Chroma after snapshot
+  `/srv/storage/atocore/backups/snapshots/20260424T154358Z`
+- 951 captured interactions as of the 2026-04-24 live dashboard; refresh
+  exact live counts with `python scripts/live_status.py`
 - 6 registered projects:
   - `p04-gigabit` (483 docs, 15 state entries)
   - `p05-interferometer` (109 docs, 18 state entries)
````
@@ -138,12 +147,17 @@ deferred from the shared client until their workflows are exercised.
|
|||||||
- `atomizer-v2` (568 docs, 5 state entries)
|
- `atomizer-v2` (568 docs, 5 state entries)
|
||||||
- `abb-space` (6 state entries)
|
- `abb-space` (6 state entries)
|
||||||
- `atocore` (drive source, 47 state entries)
|
- `atocore` (drive source, 47 state entries)
|
||||||
- 110 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
|
- 128 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
|
||||||
- 84 active memories (31 project, 23 knowledge, 10 episodic, 8 adaptation, 7 preference, 5 identity)
|
- 290 active memories and 0 candidate memories as of the 2026-04-24 live
|
||||||
|
dashboard
|
||||||
- context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks
|
- context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks
|
||||||
- query-relevance memory ranking with overlap-density scoring
|
- query-relevance memory ranking with overlap-density scoring and widened
|
||||||
- retrieval eval harness: 18 fixtures, 17/18 passing on live
|
query-time candidate pools so older exact-intent project memories can rank
|
||||||
- 303 tests passing
|
ahead of generic high-confidence notes
|
||||||
|
- retrieval eval harness: 20 fixtures; current live has 19 pass, 1 known
|
||||||
|
content gap, and 0 blocking failures after the project-id backfill and
|
||||||
|
memory-ranking stabilization deploy
|
||||||
|
- 571 tests passing on `main`
|
||||||
 - nightly pipeline: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → triage → **auto-promote/expire** → weekly synth/lint → **retrieval harness** → **pipeline summary to project state**
 - Phase 10 operational: reinforcement-based auto-promotion (ref_count ≥ 3, confidence ≥ 0.7) + stale candidate expiry (14 days unreinforced)
 - pipeline health visible in dashboard: interaction totals by client, pipeline last_run, harness results, triage stats
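The Phase 10 thresholds are mechanical enough to state as code. A sketch with assumed field names (the real logic lives in the memory service); thresholds match the ledger:

```python
# Hedged sketch — ref_count >= 3, confidence >= 0.7, 14 days unreinforced.
# Timestamps are assumed to be timezone-aware.
from datetime import datetime, timedelta, timezone

def should_promote(ref_count: int, confidence: float) -> bool:
    return ref_count >= 3 and confidence >= 0.7

def should_expire(last_reinforced: datetime, now: datetime | None = None) -> bool:
    now = now or datetime.now(timezone.utc)
    return (now - last_reinforced) > timedelta(days=14)
```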
@@ -167,17 +181,45 @@ These are the current practical priorities.
 4. **Fix p04-constraints harness failure** — retrieval doesn't surface
    "Zerodur" for p04 constraint queries. Investigate whether it's a missing
    memory or a retrieval-ranking issue.
+5. **Fix Dalidou Git credentials** — the host checkout can fetch but cannot
+   push to Gitea over HTTP in non-interactive SSH sessions. Prefer switching
+   the deploy checkout to a Gitea SSH key; PAT-backed `credential.helper store`
+   is the fallback.
+
+## Active — Engineering V1 Completion Track (started 2026-04-22)
+
+The Engineering V1 sprint moved from **Next** to **Active** on 2026-04-22.
+The discovery from the gbrain review was that V1 entity infrastructure had
+already been built incrementally; the sprint is a **completion** plan
+against `engineering-v1-acceptance.md`, not a greenfield build. Full plan:
+`docs/plans/engineering-v1-completion-plan.md`. "You are here" single-page
+map: `docs/plans/v1-resume-state.md`.
+
+Seven phases, ~17.5–19.5 focused days; the track runs in parallel with the
+Now list where surfaces are disjoint and pauses when they collide.
+
+| Phase | Scope | Status |
+|---|---|---|
+| V1-0 | Write-time invariants: F-1 header fields + F-8 provenance enforcement + F-5 hook on every active-entity write + Q-3 flag-never-block | ✅ done 2026-04-22 (`2712c5d`) |
+| V1-A | Minimum query slice: Q-001 subsystem-scoped variant + Q-6 killer-correctness integration test on p05-interferometer | 🟡 gated — starts when the soak (~2026-04-26) and density (100+ active memories) gates clear |
+| V1-B | KB-CAD + KB-FEM ingest (`POST /ingest/kb-cad/export`, `POST /ingest/kb-fem/export`) + D-2 schema docs | pending V1-A |
+| V1-C | Close the remaining 8 queries (Q-002/003/007/010/012/014/018/019; Q-020 to V1-D) | pending V1-B |
+| V1-D | Full mirror surface (3 spec routes + regenerate + determinism + disputed + curated markers) + Q-5 golden file | pending V1-C |
+| V1-E | Memory→entity graduation end-to-end + remaining Q-4 trust tests | pending V1-D (note: collides with the memory extractor; pauses for multi-model triage work) |
+| V1-F | F-5 detector generalization + route alias + O-1/O-2/O-3 operational + D-1/D-3/D-4 docs | finish line |
+
+R14 is closed: `POST /entities/{id}/promote` now translates caller-fixable
+V1-0 provenance validation failures into HTTP 400 instead of leaking them
+as HTTP 500.
 ## Next

-These are the next major layers after the current stabilization pass.
+These are the next major layers after V1 and the current stabilization pass.

 1. Phase 6 AtoDrive — clarify Google Drive as a trusted operational
    source and ingest from it
 2. Phase 13 Hardening — Chroma backup policy, monitoring, alerting,
    failure visibility beyond log files
-3. Engineering V1 implementation sprint — once knowledge density is
-   sufficient and the pipeline feels boring and dependable

 ## Later
161 docs/plans/v1-resume-state.md Normal file
@@ -0,0 +1,161 @@
# V1 Completion — Resume State

**Last updated:** 2026-04-22 (after V1-0 landed + R14 branch pushed)
**Purpose:** single-page "you are here" so any future session can pick up
the V1 completion sprint without re-reading the full plan history.

## State of play

- **V1-0 is DONE.** Merged to main as `2712c5d`, deployed to Dalidou,
  prod backfill ran cleanly (31 legacy entities flagged
  `hand_authored=1`, zero violations remaining).
- **R14 is on a branch.** `claude/r14-promote-400` at `3888db9` —
  the HTTP promote route returns 400 instead of 500 on V1-0 `ValueError`.
  Pending Codex review + squash-merge. Non-blocking for V1-A.
- **V1-A is next but GATED.** It doesn't start until both gates clear.

## Start-gates for V1-A

| Gate | Condition | Status as of 2026-04-22 |
|---|---|---|
| Soak | Four clean nightly cycles since the F4 confidence-decay first real run on 2026-04-19 | Day 3 of 4 — expected clear around **2026-04-26** |
| Density | 100+ active memories | 84 active as of the last ledger update — need +16. Lever: `scripts/batch_llm_extract_live.py` against the 234-interaction backlog |

**When both are green, start V1-A.** If only one is green, hold.

## Pre-flight checklist when resuming

Before opening the V1-A branch, run through this in order:

1. `git checkout main && git pull` — make sure you're at the tip
2. Check `DEV-LEDGER.md` **Orientation** for the current `live_sha`, `test_count`, `active_memories`
3. Check that `/health` on Dalidou returns the same `build_sha` as Orientation
4. Check the dashboard for pipeline health: http://dalidou:8100/admin/dashboard
5. Confirm R14 branch status — either merged or explicitly deferred
6. Re-read the two core plan docs:
   - `docs/plans/engineering-v1-completion-plan.md` — the full 7-phase plan
   - `docs/architecture/engineering-v1-acceptance.md` — the acceptance contract
7. Skim the relevant spec docs for the phase you're about to start:
   - V1-A: `engineering-query-catalog.md` (Q-001 + Q-006/Q-009/Q-011 killer queries)
   - V1-B: `tool-handoff-boundaries.md` (KB-CAD/KB-FEM export shapes)
   - V1-C: `engineering-query-catalog.md` (all remaining v1-required queries)
   - V1-D: `human-mirror-rules.md` (mirror spec end-to-end)
   - V1-E: `memory-vs-entities.md` (graduation flow)
   - V1-F: `conflict-model.md` (generic slot-key detector)

## What V1-A looks like when started

**Branch:** `claude/v1-a-pillar-queries`

**Scope (~1.5 days):**

- **Q-001 shape fix.** Add a subsystem-scoped variant of `system_map()`
  matching `GET /entities/Subsystem/<id>?expand=contains` per
  `engineering-query-catalog.md:71`. The project-wide version stays
  (it serves Q-004).
- **Q-6 integration test.** Seed p05-interferometer with five cases:
  1 satisfying Component, 1 orphan Requirement, 1 Decision on a flagged
  Assumption, 1 supported ValidationClaim, 1 unsupported ValidationClaim.
  One test asserts that Q-006 / Q-009 / Q-011 return exactly the expected
  members (a sketch of the membership assertions follows at the end of
  this section).
- The four "pillar" queries (Q-001, Q-005, Q-006, Q-017) already work
  per Codex's 2026-04-22 audit. V1-A does NOT re-implement them —
  V1-A verifies them on seeded data.

**Acceptance:** Q-001 subsystem-scoped variant + Q-6 integration test both
green. F-2 moves from 🟡 partial to slightly-less-partial.

**Estimated tests added:** ~4 (not ~12 — V1-A scope shrank after Codex
confirmed most queries already work).
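A minimal shape for the Q-6 membership test, self-contained so it runs as-is. Every name here (`Entity`, `orphan_requirements`, `unsupported_claims`) is an illustrative stand-in for the real service-layer API, and it covers a subset of the five seeded cases:

```python
# Hedged sketch — plain dataclasses instead of the real entity schema.
from dataclasses import dataclass, field

@dataclass
class Entity:
    kind: str
    name: str
    satisfies: list[str] = field(default_factory=list)  # Component -> Requirement links
    supported: bool = False                             # ValidationClaim evidence flag

def orphan_requirements(entities: list[Entity]) -> set[str]:
    """Q-009-style check: requirements no component satisfies."""
    satisfied = {r for e in entities if e.kind == "Component" for r in e.satisfies}
    return {e.name for e in entities if e.kind == "Requirement"} - satisfied

def unsupported_claims(entities: list[Entity]) -> set[str]:
    """Q-011-style check: validation claims with no supporting evidence."""
    return {e.name for e in entities if e.kind == "ValidationClaim" and not e.supported}

def test_q6_membership():
    seeded = [
        Entity("Requirement", "REQ-SAT"),
        Entity("Component", "COMP-1", satisfies=["REQ-SAT"]),
        Entity("Requirement", "REQ-ORPHAN"),
        Entity("ValidationClaim", "VC-OK", supported=True),
        Entity("ValidationClaim", "VC-BAD"),
    ]
    assert orphan_requirements(seeded) == {"REQ-ORPHAN"}
    assert unsupported_claims(seeded) == {"VC-BAD"}
```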
## Map of the remaining phases

```
V1-0 ✅ write-time invariants landed 2026-04-22 (2712c5d)
  ↓
V1-A 🟡 minimum query slice — gated on soak + density   ~1.5d when started
  ↓
V1-B    KB-CAD/KB-FEM ingest + D-2                      ~2d
  ↓
V1-C    close 8 remaining queries                       ~2d
  ↓
V1-D    full mirror + determinism                       ~3-4d (biggest phase)
  ↓
V1-E    graduation + trust tests                        ~3-4d (pauses for multi-model triage)
  ↓
V1-F    F-5 generalization + ops + docs                 ~3d — V1 done
```

## Parallel work that can run WITHOUT touching V1

These are genuinely disjoint surfaces; pick any of them during the gate
pause or as scheduling allows:

- **Density batch-extract** — *required* to unblock V1-A. Not optional.
- **p04-constraints harness fix** — a retrieval-ranking change, fully
  disjoint from entities. Safe to do anywhere in the V1 track.
- **Multi-model triage (Phase 11 entry)** — memory-side work, disjoint
  from V1-A/B/C/D. **Pause before V1-E starts** because V1-E touches
  memory-module semantics.

## What NOT to do

- Don't start V1-A until both gates are green.
- Don't touch the memory-extractor write path while V1-E is open.
- Don't name the rejected "Minions" plan in any doc — neutral wording
  only ("queued background processing / async workers") per Codex
  sign-off.
- Don't rename the `project` field to `project_id` — Codex + Antoine
  agreed it stays as `project`, with a doc note in
  `engineering-ontology-v1.md` that this IS the project_id per spec.

## Open review findings

| id | severity | summary | status |
|---|---|---|---|
| R14 | P2 | `POST /entities/{id}/promote` returns 500 on V1-0 `ValueError` instead of 400 | fixed on branch `claude/r14-promote-400`, pending Codex review |

Closed V1-0 findings: P1 "promote path allows provenance-less legacy
candidates" (service.py:365-379), P1 "supersede path missing F-5 hook"
(service.py:581-591), P2 "`--invalidate-instead` backfill too broad"
(v1_0_backfill_provenance.py:52-63). All three patched and approved in
the squash-merge to `2712c5d`.

## How agreement between Claude + Codex has worked so far

Three review rounds before V1-0 started + three during implementation:

1. **Rejection round.** Claude drafted a gbrain-inspired "Phase 8
   Minions + typed edges" plan; Codex rejected it as wrong packaging.
   Record: `docs/decisions/2026-04-22-gbrain-plan-rejection.md`.
2. **Completion-plan rewrite.** Claude rewrote against
   `engineering-v1-acceptance.md`. Codex's first-round review fixed the
   phase order (provenance-first).
3. **Per-file audit.** Codex's second-round audit found F-1 / F-2 /
   F-5 gaps, all folded in.
4. **Sign-off round.** Codex's third-round review resolved the five
   remaining open questions inline and signed off: *"with those edits,
   I'd sign off on the five questions."*
5. **V1-0 review.** Codex found two P1 gaps (promote re-check missing,
   supersede hook missing) + one P2 (backfill scope too broad). All
   three patched. Codex re-ran probes + regression suites, approved,
   squash-merged.
6. **V1-0 deploy + prod backfill.** Codex deployed + ran the backfill,
   logged R14 as a P2 residual.

The protocol has been: Claude writes, Codex audits, human Antoine
ratifies. Continue this for V1-A onward.

## References

- `docs/plans/engineering-v1-completion-plan.md` — full 7-phase plan
- `docs/decisions/2026-04-22-gbrain-plan-rejection.md` — prior rejection
- `docs/architecture/engineering-ontology-v1.md` — V1 ontology (18 predicates)
- `docs/architecture/engineering-query-catalog.md` — Q-001 through Q-020 spec
- `docs/architecture/engineering-v1-acceptance.md` — F/Q/O/D acceptance table
- `docs/architecture/promotion-rules.md` — candidate → active flow
- `docs/architecture/conflict-model.md` — F-5 spec
- `docs/architecture/human-mirror-rules.md` — V1-D spec
- `docs/architecture/memory-vs-entities.md` — V1-E spec
- `docs/architecture/tool-handoff-boundaries.md` — V1-B KB-CAD/KB-FEM
- `docs/master-plan-status.md` — Now / Active / Next / Later
- `DEV-LEDGER.md` — Orientation + Open Review Findings + Session Log
216 docs/plans/wiki-reorg-plan.md Normal file
@@ -0,0 +1,216 @@
# Wiki Reorg Plan — Human-Readable Navigation of AtoCore State

> **SUPERSEDED — 2026-04-22.** Do not implement this plan. It has been
> replaced by the read-only operator orientation work at
> `docs/plans/operator-orientation-plan.md` (in the `ATOCore-clean`
> workspace). The successor reframes the problem as orientation over
> existing APIs and docs rather than a new `/wiki` surface, and drops
> the interaction browser / memory index / project-page restructure from
> scope. Nothing below should be picked up as a work item. Kept in tree
> for context only.

**Date:** 2026-04-22
**Author:** Claude (after walking the live wiki at `http://dalidou:8100/wiki`
and reading `src/atocore/engineering/wiki.py` + the `/wiki/*` routes in
`src/atocore/api/routes.py`)
**Status:** SUPERSEDED (see banner above). Previously: draft, pending Codex review
**Scope boundary:** This plan does NOT touch Inbox / Global / Emerging.
Those three surfaces stay exactly as they are today — the registration
flow and scope semantics around them are load-bearing and out of scope
for this reorg.

---

## Position

The wiki is Layer 3 of the engineering architecture (Human Mirror —
see `docs/architecture/engineering-knowledge-hybrid-architecture.md`).
It is a **derived view** over the same database the LLM reads via
`atocore_context` / `atocore_search`. There is no parallel store; if a
fact is in the DB it is reachable from the wiki.

The user-facing problem is not representation, it is **findability and
structure**. Content the operator produced (proposals, decisions,
constraints, captured conversations) is in the DB but is not reachable
along a natural human reading path. Today the only routes to it are:

- typing the right keyword into `/wiki/search`
- remembering which project it lived under and scrolling the
  auto-generated mirror markdown on `/wiki/projects/{id}`
- the homepage "What the brain is doing" strip, which shows counts of
  audit actions but no content

There is no "recent conversations" view, no memory index, no way to
browse by memory type (decision vs preference vs episodic), and the
topnav only exposes Home / Activity / Triage / Dashboard.

## Goal

Make the wiki a surface the operator actually uses to answer three
recurring questions:

1. **"Where did I say / decide / propose X?"** — find a memory or
   interaction by topic, not by exact-string search.
2. **"What is the current state of project Y?"** — read decisions,
   constraints, and open questions as distinct sections, not as one
   wall of auto-generated markdown.
3. **"What happened in the system recently, and what does it mean?"**
   — a human-meaningful recent feed (captures, promotions, decisions
   recorded), not a count of audit-action names.

Success criterion: for each of the three questions above, a first-time
visitor reaches a useful answer in **≤ 2 clicks from `/wiki`**.

## Non-goals

- No schema changes. All data already exists.
- No change to ingestion, extraction, promotion, or the trust rules.
- No change to Inbox / Global / Emerging surfaces or semantics.
- No change to `/admin/triage` or `/admin/dashboard`.
- No change to the `atocore_context` / `atocore_search` LLM-facing APIs.
- No new storage. Every new page is a new render function over existing
  service-layer calls.

## What it will do

Five additions, in priority order. Each is independently mergeable and
independently revertable.

### W-1 — Interaction browser: `/wiki/interactions`

The DB holds 234 captured interactions (per the 2026-04-22 ledger). None
are readable in the wiki. Add a paginated list view and a detail view.

- `/wiki/interactions?project=&client=&since=` — list, newest first,
  showing timestamp, client (claude-code / openclaw / test), inferred
  project, and the first ~160 chars of the user prompt.
- `/wiki/interactions/{id}` — full prompt + response, plus any
  candidate / active memories extracted from it (back-link from the
  memory → interaction relation if present, else a "no extractions"
  note).
- Filters: project (multi), client, date range, "has extractions" boolean.

Answers Q1 ("where did I say X") by giving the user a time-ordered
stream of their own conversations, not just extracted summaries.

### W-2 — Memory index: `/wiki/memories`

Today `/wiki/memories/{id}` exists but there is no list view. Add one.

- `/wiki/memories?project=&type=&status=&tag=` — filterable list.
  Status defaults to `active`. Types: decision, preference, episodic,
  knowledge, identity, adaptation.
- Faceted counts in the sidebar (e.g. "decision: 14 · preference: 7").
- Each row links to `/wiki/memories/{id}` and, where present, the
  originating interaction.

### W-3 — Project page restructure (tabbed, not flat)

`/wiki/projects/{id}` today renders the full mirror markdown in one
scroll. Restructure it into tabs (or anchored sections, semantically
the same thing) over the data already produced by
`generate_project_overview`:

- **Overview** — stage, client, type, description, headline state.
- **Decisions** — memories of type `decision` for this project.
- **Constraints** — project_state entries in the `constraints` category.
- **Proposals** — memories tagged `proposal` or of type `preference` with
  proposal semantics (exact filter TBD during W-3; see Open Questions).
- **Entities** — current list, already rendered inline today.
- **Memories** — all memories, reusing the W-2 list component filtered
  to this project.
- **Timeline** — interactions for this project, reusing W-1 filtered
  to this project.

No new data is produced; this is purely a slicing change.

### W-4 — Recent feed on homepage

Replace the current "What the brain is doing" strip (which shows audit
action counts) with a **content** feed: the last N events a human
cares about —

- new active memory (not candidate)
- memory promoted candidate → active
- state entry added / changed
- new registered project
- interaction captured (optional, may be noisy — gate behind a toggle)

Each feed row links to the thing it describes. The audit-count strip
can move to `/wiki/activity`, where the full audit timeline already lives.

### W-5 — Topnav exposure

Surface the new pages in the topnav so they are discoverable:

Current: `🏠 Home · 📡 Activity · 🔀 Triage · 📊 Dashboard`
Proposed: `🏠 Home · 💬 Interactions · 🧠 Memories · 📡 Activity · 🔀 Triage · 📊 Dashboard`

Domain pages (`/wiki/domains/{tag}`) stay reachable via tag chips on
memory rows, as today — no new topnav entry for them.

## Outcome

- Every captured interaction is readable in the wiki, with a clear
  path from interaction → extracted memories and back.
- Memories are browsable without knowing a keyword in advance.
- A project page separates decisions, constraints, and proposals
  instead of interleaving them in auto-generated markdown.
- The homepage tells the operator what **content** is new, not which
  audit-action counters incremented.
- Nothing in the Inbox / Global / Emerging flow changes.
- Nothing the LLM reads changes.

## Non-outcomes (explicitly)

- This plan does not improve retrieval quality. It does not touch the
  extractor, ranking, or harness.
- This plan does not change what is captured or when.
- This plan does not replace `/admin/triage`. Candidate review stays there.

## Sequencing

1. **W-1 first.** The interaction browser is the biggest findability win
   and has no coupling to memory code.
2. **W-2 next.** The memory index reuses list/filter infra from W-1.
3. **W-3** depends on W-2 (reuses the memory list component).
4. **W-4** is independent; it can land any time after W-1.
5. **W-5** lands last so the topnav only exposes pages that exist.

Each step ships with at least one render test (HTML contains expected
anchors / row counts against a seeded fixture), following the pattern
already in `tests/engineering/` for existing wiki renders.

## Risk & reversibility

- All additions are new routes / new render functions. Reverting any
  step is a file delete + a topnav edit.
- The project page restructure (W-3) is the only edit to an existing
  surface. Keep the flat-markdown render behind a query flag
  (`?layout=flat`) for one release so regressions are observable.
- No DB migrations. No service-layer signatures change.

## Open questions for Codex

1. **"Proposal" as a first-class filter** (W-3) — is this a memory type,
   a domain tag, a structural field we should add, or should it stay
   derived by filter? The current DB has no explicit proposal type; we'd
   be inferring from tags/content. If that inference is unreliable, W-3's
   Proposals tab becomes noise.
2. **Interaction → memory back-link** (W-1) — does the current schema
   already record which interaction an extracted memory came from? If
   not, is exposing that link in the wiki worth a schema addition, or
   should W-1 ship without it?
3. **Recent feed noise floor** (W-4) — should every captured
   interaction appear in the feed, or only interactions that produced
   at least one candidate memory? The former is complete but may drown
   out signal at current capture rates (~10/day).
4. **Ordering vs the V1 Completion track** — should any of this land
   before V1-A (currently gated on soak ~2026-04-26 + density 100+), or
   is it strictly after V1 closes?

## Workspace note

The canonical dev workspace is `C:\Users\antoi\ATOCore` (per `CLAUDE.md`).
Any Codex audit of this plan should sync from `origin/main` at or after
`2712c5d` before reviewing.
178 scripts/backfill_chunk_project_ids.py Normal file
@@ -0,0 +1,178 @@
"""Backfill explicit project_id into chunk and vector metadata.

Dry-run by default. The script derives ownership from the registered project
ingest roots and updates both SQLite source_chunks.metadata and Chroma vector
metadata only when --apply is provided.
"""

from __future__ import annotations

import argparse
import json
import sqlite3
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "src"))

from atocore.models.database import get_connection  # noqa: E402
from atocore.projects.registry import derive_project_id_for_path  # noqa: E402
from atocore.retrieval.vector_store import get_vector_store  # noqa: E402

DEFAULT_BATCH_SIZE = 500


def _decode_metadata(raw: str | None) -> dict | None:
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def _chunk_rows() -> tuple[list[dict], str]:
    try:
        with get_connection() as conn:
            rows = conn.execute(
                """
                SELECT
                    sc.id AS chunk_id,
                    sc.metadata AS chunk_metadata,
                    sd.file_path AS file_path
                FROM source_chunks sc
                JOIN source_documents sd ON sd.id = sc.document_id
                ORDER BY sd.file_path, sc.chunk_index
                """
            ).fetchall()
    except sqlite3.OperationalError as exc:
        if "source_chunks" in str(exc) or "source_documents" in str(exc):
            return [], f"missing ingestion tables: {exc}"
        raise
    return [dict(row) for row in rows], ""


def _batches(items: list, batch_size: int) -> list[list]:
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def backfill(
    apply: bool = False,
    project_filter: str = "",
    batch_size: int = DEFAULT_BATCH_SIZE,
    require_chroma_snapshot: bool = False,
) -> dict:
    rows, db_warning = _chunk_rows()
    updates: list[tuple[str, str, dict]] = []
    by_project: dict[str, int] = {}
    skipped_unowned = 0
    already_tagged = 0
    malformed_metadata = 0

    for row in rows:
        project_id = derive_project_id_for_path(row["file_path"])
        if project_filter and project_id != project_filter:
            continue
        if not project_id:
            skipped_unowned += 1
            continue
        metadata = _decode_metadata(row["chunk_metadata"])
        if metadata is None:
            malformed_metadata += 1
            continue
        if metadata.get("project_id") == project_id:
            already_tagged += 1
            continue
        metadata["project_id"] = project_id
        updates.append((row["chunk_id"], project_id, metadata))
        by_project[project_id] = by_project.get(project_id, 0) + 1

    missing_vectors: list[str] = []
    applied_updates = 0
    if apply and updates:
        if not require_chroma_snapshot:
            raise ValueError(
                "--apply requires --chroma-snapshot-confirmed after taking a Chroma backup"
            )
        vector_store = get_vector_store()
        for batch in _batches(updates, max(1, batch_size)):
            chunk_ids = [chunk_id for chunk_id, _, _ in batch]
            vector_payload = vector_store.get_metadatas(chunk_ids)
            existing_vector_metadata = {
                chunk_id: metadata
                for chunk_id, metadata in zip(
                    vector_payload.get("ids", []),
                    vector_payload.get("metadatas", []),
                    strict=False,
                )
                if isinstance(metadata, dict)
            }

            vector_ids = []
            vector_metadatas = []
            sql_updates = []
            for chunk_id, project_id, chunk_metadata in batch:
                vector_metadata = existing_vector_metadata.get(chunk_id)
                if vector_metadata is None:
                    missing_vectors.append(chunk_id)
                    continue
                vector_metadata = dict(vector_metadata)
                vector_metadata["project_id"] = project_id
                vector_ids.append(chunk_id)
                vector_metadatas.append(vector_metadata)
                sql_updates.append((json.dumps(chunk_metadata, ensure_ascii=True), chunk_id))

            if not vector_ids:
                continue

            vector_store.update_metadatas(vector_ids, vector_metadatas)
            with get_connection() as conn:
                cursor = conn.executemany(
                    "UPDATE source_chunks SET metadata = ? WHERE id = ?",
                    sql_updates,
                )
                if cursor.rowcount != len(sql_updates):
                    raise RuntimeError(
                        f"SQLite rowcount mismatch: {cursor.rowcount} != {len(sql_updates)}"
                    )
            applied_updates += len(sql_updates)

    return {
        "apply": apply,
        "total_chunks": len(rows),
        "updates": len(updates),
        "applied_updates": applied_updates,
        "already_tagged": already_tagged,
        "skipped_unowned": skipped_unowned,
        "malformed_metadata": malformed_metadata,
        "missing_vectors": len(missing_vectors),
        "db_warning": db_warning,
        "by_project": dict(sorted(by_project.items())),
    }


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--apply", action="store_true", help="write SQLite and Chroma metadata updates")
    parser.add_argument("--project", default="", help="optional canonical project_id filter")
    parser.add_argument("--batch-size", type=int, default=DEFAULT_BATCH_SIZE)
    parser.add_argument(
        "--chroma-snapshot-confirmed",
        action="store_true",
        help="required with --apply; confirms a Chroma snapshot exists",
    )
    args = parser.parse_args()

    payload = backfill(
        apply=args.apply,
        project_filter=args.project.strip(),
        batch_size=args.batch_size,
        require_chroma_snapshot=args.chroma_snapshot_confirmed,
    )
    print(json.dumps(payload, indent=2, ensure_ascii=True))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
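A typical run order, following the flags defined above — report first, scope if needed, then apply only after confirming a Chroma snapshot exists:

```
python scripts/backfill_chunk_project_ids.py                                # dry-run report
python scripts/backfill_chunk_project_ids.py --project p05-interferometer  # scoped dry-run
python scripts/backfill_chunk_project_ids.py --apply --chroma-snapshot-confirmed
```

The `--chroma-snapshot-confirmed` gate is deliberate: `backfill()` raises a `ValueError` on `--apply` without it, so an accidental apply cannot mutate vector metadata before a backup exists.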
131 scripts/live_status.py Normal file
@@ -0,0 +1,131 @@
"""Render a compact live-status report from a running AtoCore instance.

This is intentionally read-only and stdlib-only so it can be used from a
fresh checkout, a cron job, or a Codex/Claude session without installing the
full app package. The output is meant to reduce docs drift: copy the report
into status docs only after it was generated from the live service.
"""

from __future__ import annotations

import argparse
import errno
import json
import os
import sys
import urllib.error
import urllib.request
from typing import Any

DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100").rstrip("/")
DEFAULT_TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30"))
DEFAULT_AUTH_TOKEN = os.environ.get("ATOCORE_AUTH_TOKEN", "").strip()


def request_json(base_url: str, path: str, timeout: int, auth_token: str = "") -> dict[str, Any]:
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    req = urllib.request.Request(f"{base_url}{path}", method="GET", headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        body = response.read().decode("utf-8")
        status = getattr(response, "status", None)
    payload = json.loads(body) if body.strip() else {}
    if not isinstance(payload, dict):
        payload = {"value": payload}
    if status is not None:
        payload["_http_status"] = status
    return payload


def collect_status(base_url: str, timeout: int, auth_token: str = "") -> dict[str, Any]:
    payload: dict[str, Any] = {"base_url": base_url}
    for name, path in {
        "health": "/health",
        "stats": "/stats",
        "projects": "/projects",
        "dashboard": "/admin/dashboard",
    }.items():
        try:
            payload[name] = request_json(base_url, path, timeout, auth_token)
        except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
            payload[name] = {"error": str(exc)}
    return payload


def render_markdown(status: dict[str, Any]) -> str:
    health = status.get("health", {})
    stats = status.get("stats", {})
    projects = status.get("projects", {}).get("projects", [])
    dashboard = status.get("dashboard", {})
    memories = dashboard.get("memories", {}) if isinstance(dashboard.get("memories"), dict) else {}
    project_state = dashboard.get("project_state", {}) if isinstance(dashboard.get("project_state"), dict) else {}
    interactions = dashboard.get("interactions", {}) if isinstance(dashboard.get("interactions"), dict) else {}
    pipeline = dashboard.get("pipeline", {}) if isinstance(dashboard.get("pipeline"), dict) else {}

    lines = [
        "# AtoCore Live Status",
        "",
        f"- base_url: `{status.get('base_url', '')}`",
        "- endpoint_http_statuses: "
        f"`health={health.get('_http_status', 'error')}, "
        f"stats={stats.get('_http_status', 'error')}, "
        f"projects={status.get('projects', {}).get('_http_status', 'error')}, "
        f"dashboard={dashboard.get('_http_status', 'error')}`",
        f"- service_status: `{health.get('status', 'unknown')}`",
        f"- code_version: `{health.get('code_version', health.get('version', 'unknown'))}`",
        f"- build_sha: `{health.get('build_sha', 'unknown')}`",
        f"- build_branch: `{health.get('build_branch', 'unknown')}`",
        f"- build_time: `{health.get('build_time', 'unknown')}`",
        f"- env: `{health.get('env', 'unknown')}`",
        f"- documents: `{stats.get('total_documents', 'unknown')}`",
        f"- chunks: `{stats.get('total_chunks', 'unknown')}`",
        f"- vectors: `{stats.get('total_vectors', health.get('vectors_count', 'unknown'))}`",
        f"- registered_projects: `{len(projects)}`",
        f"- active_memories: `{memories.get('active', 'unknown')}`",
        f"- candidate_memories: `{memories.get('candidates', 'unknown')}`",
        f"- interactions: `{interactions.get('total', 'unknown')}`",
        f"- project_state_entries: `{project_state.get('total', 'unknown')}`",
        f"- pipeline_last_run: `{pipeline.get('last_run', 'unknown')}`",
    ]

    if projects:
        lines.extend(["", "## Projects"])
        for project in projects:
            aliases = ", ".join(project.get("aliases", []))
            suffix = f" ({aliases})" if aliases else ""
            lines.append(f"- `{project.get('id', '')}`{suffix}")

    return "\n".join(lines) + "\n"


def main() -> int:
    parser = argparse.ArgumentParser(description="Render live AtoCore status")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
    parser.add_argument("--timeout", type=int, default=DEFAULT_TIMEOUT)
    parser.add_argument(
        "--auth-token",
        default=DEFAULT_AUTH_TOKEN,
        help="Bearer token; defaults to ATOCORE_AUTH_TOKEN when set",
    )
    parser.add_argument("--json", action="store_true", help="emit raw JSON")
    args = parser.parse_args()

    status = collect_status(args.base_url.rstrip("/"), args.timeout, args.auth_token)
    if args.json:
        output = json.dumps(status, indent=2, ensure_ascii=True) + "\n"
    else:
        output = render_markdown(status)

    try:
        sys.stdout.write(output)
    except BrokenPipeError:
        return 0
    except OSError as exc:
        if exc.errno in {errno.EINVAL, errno.EPIPE}:
            return 0
        raise
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
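Typical invocations, using only the flags and environment variables defined above:

```
python scripts/live_status.py                  # markdown report against http://dalidou:8100
python scripts/live_status.py --json           # raw JSON for scripting
ATOCORE_BASE_URL=http://localhost:8100 python scripts/live_status.py --timeout 10
```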
@@ -44,6 +44,7 @@ import urllib.error
 import urllib.parse
 import urllib.request
 from dataclasses import dataclass, field
+from datetime import datetime, timezone
 from pathlib import Path

 DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100")
@@ -52,6 +53,13 @@ DEFAULT_BUDGET = 3000
 DEFAULT_FIXTURES = Path(__file__).parent / "retrieval_eval_fixtures.json"


+def request_json(base_url: str, path: str, timeout: int) -> dict:
+    req = urllib.request.Request(f"{base_url}{path}", method="GET")
+    with urllib.request.urlopen(req, timeout=timeout) as resp:
+        body = resp.read().decode("utf-8")
+    return json.loads(body) if body.strip() else {}
+
+
 @dataclass
 class Fixture:
     name: str
@@ -60,6 +68,7 @@ class Fixture:
     budget: int = DEFAULT_BUDGET
     expect_present: list[str] = field(default_factory=list)
     expect_absent: list[str] = field(default_factory=list)
+    known_issue: bool = False
     notes: str = ""


@@ -70,8 +79,13 @@ class FixtureResult:
     missing_present: list[str]
     unexpected_absent: list[str]
     total_chars: int
+    known_issue: bool = False
     error: str = ""

+    @property
+    def blocking_failure(self) -> bool:
+        return not self.ok and not self.known_issue
+

 def load_fixtures(path: Path) -> list[Fixture]:
     data = json.loads(path.read_text(encoding="utf-8"))
@@ -89,6 +103,7 @@ def load_fixtures(path: Path) -> list[Fixture]:
             budget=int(raw.get("budget", DEFAULT_BUDGET)),
             expect_present=list(raw.get("expect_present", [])),
             expect_absent=list(raw.get("expect_absent", [])),
+            known_issue=bool(raw.get("known_issue", False)),
             notes=raw.get("notes", ""),
         )
     )
@@ -117,6 +132,7 @@ def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
             missing_present=list(fixture.expect_present),
             unexpected_absent=[],
             total_chars=0,
+            known_issue=fixture.known_issue,
             error=f"http_error: {exc}",
         )

@@ -129,16 +145,26 @@ def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
         missing_present=missing,
         unexpected_absent=unexpected,
         total_chars=len(formatted),
+        known_issue=fixture.known_issue,
     )


-def print_human_report(results: list[FixtureResult]) -> None:
+def print_human_report(results: list[FixtureResult], metadata: dict) -> None:
     total = len(results)
     passed = sum(1 for r in results if r.ok)
+    known = sum(1 for r in results if not r.ok and r.known_issue)
+    blocking = sum(1 for r in results if r.blocking_failure)
     print(f"Retrieval eval: {passed}/{total} fixtures passed")
+    print(
+        "Target: "
+        f"{metadata.get('base_url', 'unknown')} "
+        f"build={metadata.get('health', {}).get('build_sha', 'unknown')}"
+    )
+    if known or blocking:
+        print(f"Blocking failures: {blocking}  Known issues: {known}")
     print()
     for r in results:
-        marker = "PASS" if r.ok else "FAIL"
+        marker = "PASS" if r.ok else ("KNOWN" if r.known_issue else "FAIL")
         print(f"[{marker}] {r.fixture.name} project={r.fixture.project} chars={r.total_chars}")
         if r.error:
             print(f"  error: {r.error}")
@@ -150,15 +176,21 @@ def print_human_report(results: list[FixtureResult]) -> None:
         print(f"  notes: {r.fixture.notes}")


-def print_json_report(results: list[FixtureResult]) -> None:
+def print_json_report(results: list[FixtureResult], metadata: dict) -> None:
     payload = {
+        "generated_at": metadata.get("generated_at"),
+        "base_url": metadata.get("base_url"),
+        "health": metadata.get("health", {}),
         "total": len(results),
         "passed": sum(1 for r in results if r.ok),
+        "known_issues": sum(1 for r in results if not r.ok and r.known_issue),
+        "blocking_failures": sum(1 for r in results if r.blocking_failure),
         "fixtures": [
             {
                 "name": r.fixture.name,
                 "project": r.fixture.project,
                 "ok": r.ok,
+                "known_issue": r.known_issue,
                 "total_chars": r.total_chars,
                 "missing_present": r.missing_present,
                 "unexpected_absent": r.unexpected_absent,
@@ -179,15 +211,26 @@ def main() -> int:
     parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
     args = parser.parse_args()

+    base_url = args.base_url.rstrip("/")
+    try:
+        health = request_json(base_url, "/health", args.timeout)
+    except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
+        health = {"error": str(exc)}
+    metadata = {
+        "generated_at": datetime.now(timezone.utc).isoformat(),
+        "base_url": base_url,
+        "health": health,
+    }
+
     fixtures = load_fixtures(args.fixtures)
-    results = [run_fixture(f, args.base_url, args.timeout) for f in fixtures]
+    results = [run_fixture(f, base_url, args.timeout) for f in fixtures]

     if args.json:
-        print_json_report(results)
+        print_json_report(results, metadata)
     else:
-        print_human_report(results)
+        print_human_report(results, metadata)

-    return 0 if all(r.ok for r in results) else 1
+    return 0 if not any(r.blocking_failure for r in results) else 1


 if __name__ == "__main__":
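The net effect of `known_issue` on the exit code, in miniature (plain-dict stand-ins for `FixtureResult`):

```python
# A failing fixture marked known_issue reports as KNOWN but does not flip
# the process exit code; only blocking failures do.
results = [
    {"ok": True,  "known_issue": False},  # PASS
    {"ok": False, "known_issue": True},   # KNOWN (e.g. p04-constraints)
    {"ok": False, "known_issue": False},  # FAIL -> blocking
]
blocking = [r for r in results if not r["ok"] and not r["known_issue"]]
exit_code = 0 if not blocking else 1
assert exit_code == 1  # drop the last entry and the run would exit 0
```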
@@ -27,7 +27,7 @@
       "expect_absent": [
         "polisher suite"
       ],
-      "notes": "Key constraints are in Trusted Project State and in the mission-framing memory"
+      "notes": "Regression guard: query-relevant Trusted Project State requirements must survive the project-state budget cap."
     },
     {
       "name": "p04-short-ambiguous",
@@ -80,6 +80,36 @@
       ],
       "notes": "CGH is a core p05 concept. Should surface via chunks and possibly the architecture memory. Must not bleed p06 polisher-suite terms."
     },
+    {
+      "name": "p05-broad-status-no-atomizer",
+      "project": "p05-interferometer",
+      "prompt": "current status",
+      "expect_present": [
+        "--- Trusted Project State ---",
+        "--- Project Memories ---",
+        "Zygo"
+      ],
+      "expect_absent": [
+        "atomizer-v2",
+        "ATOMIZER_PODCAST_BRIEFING",
+        "[Source: atomizer-v2/",
+        "P04-GigaBIT-M1-KB-design"
+      ],
+      "notes": "Regression guard for the April 24 audit finding: broad p05 status queries must not pull Atomizer/archive context into project-scoped packs."
+    },
+    {
+      "name": "p05-vendor-decision-no-archive-first",
+      "project": "p05-interferometer",
+      "prompt": "vendor selection decision",
+      "expect_present": [
+        "Selection-Decision"
+      ],
+      "expect_absent": [
+        "[Source: atomizer-v2/",
+        "ATOMIZER_PODCAST_BRIEFING"
+      ],
+      "notes": "Project-scoped decision query should stay inside p05 and prefer current decision/vendor material over unrelated project archives."
+    },
     {
       "name": "p06-suite-split",
       "project": "p06-polisher",
167 scripts/v1_0_backfill_provenance.py Normal file
@@ -0,0 +1,167 @@
"""V1-0 one-time backfill: flag existing active entities that have no
provenance (empty source_refs) as hand_authored=1 so they stop failing
the F-8 invariant.

Runs against the live AtoCore DB. Idempotent: a second run after the
first touches nothing because the flagged rows already have
hand_authored=1.

Per the Engineering V1 Completion Plan (V1-0 scope), the three options
for an existing active entity without provenance are:

1. Attach provenance — impossible without human review, not automatable
2. Flag hand-authored — safe, additive, this script's default
3. Invalidate — destructive, requires operator sign-off

This script picks option (2) by default. Add --dry-run to see what
would change without writing. Add --invalidate-instead to pick option
(3) for all rows (not recommended for a first run).

Usage:
    python scripts/v1_0_backfill_provenance.py --dry-run
    python scripts/v1_0_backfill_provenance.py
"""

from __future__ import annotations

import argparse
import json
import sqlite3
import sys
from pathlib import Path


def run(db_path: Path, dry_run: bool, invalidate_instead: bool) -> int:
    if not db_path.exists():
        print(f"ERROR: db not found: {db_path}", file=sys.stderr)
        return 2

    conn = sqlite3.connect(str(db_path))
    conn.row_factory = sqlite3.Row

    # Verify the V1-0 migration ran: if the hand_authored column is missing
    # the operator hasn't deployed V1-0 yet, and running this script
    # would crash. Fail loud rather than attempt the ALTER here.
    cols = {r["name"] for r in conn.execute("PRAGMA table_info(entities)").fetchall()}
    if "hand_authored" not in cols:
        print(
            "ERROR: entities table lacks the hand_authored column. "
            "Deploy V1-0 migrations first (init_db + init_engineering_schema).",
            file=sys.stderr,
        )
        return 2

    # Scope differs by mode:
    # - Default (flag hand_authored=1): safe/additive, applies to active
    #   AND superseded rows so the historical trail is consistent.
    # - --invalidate-instead: destructive — scope to ACTIVE rows only.
    #   Invalidating already-superseded history would collapse the audit
    #   trail, which the plan's remediation scope never intended
    #   (V1-0 talks about existing active no-provenance entities).
    if invalidate_instead:
        scope_sql = "status = 'active'"
    else:
        scope_sql = "status IN ('active', 'superseded')"
    rows = conn.execute(
        f"SELECT id, entity_type, name, project, status, source_refs, hand_authored "
        f"FROM entities WHERE {scope_sql} AND hand_authored = 0"
    ).fetchall()

    needs_fix = []
    for row in rows:
        refs_raw = row["source_refs"] or "[]"
        try:
            refs = json.loads(refs_raw)
        except Exception:
            refs = []
        if not refs:
            needs_fix.append(row)

    print(f"found {len(needs_fix)} active/superseded entities with no provenance")
    for row in needs_fix:
        print(
            f"  - {row['id'][:8]} [{row['entity_type']}] "
            f"{row['name']!r} project={row['project']!r} status={row['status']}"
        )

    if dry_run:
        print("--dry-run: no changes written")
        return 0

    if not needs_fix:
        print("nothing to do")
        return 0

    action = "invalidate" if invalidate_instead else "flag hand_authored=1"
    print(f"applying: {action}")

    cur = conn.cursor()
    for row in needs_fix:
        if invalidate_instead:
            cur.execute(
                "UPDATE entities SET status = 'invalid', "
                "updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                (row["id"],),
            )
            cur.execute(
                "INSERT INTO memory_audit "
                "(id, memory_id, action, actor, before_json, after_json, note, entity_kind) "
                "VALUES (?, ?, 'invalidated', 'v1_0_backfill', ?, ?, ?, 'entity')",
                (
                    f"v10bf-{row['id'][:8]}-inv",
                    row["id"],
                    json.dumps({"status": row["status"]}),
                    json.dumps({"status": "invalid"}),
                    "V1-0 backfill: invalidated, no provenance",
                ),
            )
        else:
            cur.execute(
                "UPDATE entities SET hand_authored = 1, "
                "updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                (row["id"],),
            )
            cur.execute(
                "INSERT INTO memory_audit "
                "(id, memory_id, action, actor, before_json, after_json, note, entity_kind) "
                "VALUES (?, ?, 'hand_authored_flagged', 'v1_0_backfill', ?, ?, ?, 'entity')",
                (
                    f"v10bf-{row['id'][:8]}-ha",
                    row["id"],
                    json.dumps({"hand_authored": False}),
                    json.dumps({"hand_authored": True}),
                    "V1-0 backfill: flagged hand_authored since source_refs empty",
                ),
            )

    conn.commit()
    print(f"done: updated {len(needs_fix)} entities")
    return 0


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--db",
        type=Path,
        default=Path("data/db/atocore.db"),
        help="Path to the SQLite database (default: data/db/atocore.db)",
    )
    parser.add_argument("--dry-run", action="store_true", help="Report only; no writes")
    parser.add_argument(
        "--invalidate-instead",
        action="store_true",
        help=(
            "DESTRUCTIVE. Invalidate active rows with no provenance instead "
            "of flagging them hand_authored. Scoped to status='active' only "
            "(superseded rows are left alone to preserve audit history). "
            "Not recommended for a first run — start with --dry-run, then "
            "the default hand_authored flag path."
        ),
    )
    args = parser.parse_args()
    return run(args.db, args.dry_run, args.invalidate_instead)


if __name__ == "__main__":
    sys.exit(main())
@@ -1457,6 +1457,11 @@ class EntityCreateRequest(BaseModel):
     status: str = "active"
     confidence: float = 1.0
     source_refs: list[str] | None = None
+    # V1-0 provenance enforcement (F-8). Clients must either pass
+    # non-empty source_refs or set hand_authored=true. The service layer
+    # raises ValueError otherwise, surfaced here as 400.
+    hand_authored: bool = False
+    extractor_version: str | None = None


 class EntityPromoteRequest(BaseModel):
@@ -1486,6 +1491,8 @@ def api_create_entity(req: EntityCreateRequest) -> dict:
             confidence=req.confidence,
             source_refs=req.source_refs,
             actor="api-http",
+            hand_authored=req.hand_authored,
+            extractor_version=req.extractor_version,
         )
     except ValueError as e:
         raise HTTPException(status_code=400, detail=str(e))
@@ -2180,12 +2187,17 @@ def api_promote_entity(
     from atocore.engineering.service import promote_entity
     target_project = req.target_project if req is not None else None
     note = req.note if req is not None else ""
-    success = promote_entity(
-        entity_id,
-        actor="api-http",
-        note=note,
-        target_project=target_project,
-    )
+    try:
+        success = promote_entity(
+            entity_id,
+            actor="api-http",
+            note=note,
+            target_project=target_project,
+        )
+    except ValueError as e:
+        # V1-0 F-8 re-check raises ValueError for no-provenance candidates
+        # (see service.promote_entity). Surface as 400, not 500.
+        raise HTTPException(status_code=400, detail=str(e))
     if not success:
         raise HTTPException(status_code=404, detail=f"Entity not found or not a candidate: {entity_id}")
     result = {"status": "promoted", "id": entity_id}
@@ -46,6 +46,8 @@ class Settings(BaseSettings):
     # All multipliers default to the values used since Wave 1; tighten or
     # loosen them via ATOCORE_* env vars without touching code.
     rank_project_match_boost: float = 2.0
+    rank_project_scope_filter: bool = True
+    rank_project_scope_candidate_multiplier: int = 4
     rank_query_token_step: float = 0.08
     rank_query_token_cap: float = 1.32
     rank_path_high_signal_boost: float = 1.18
@@ -11,7 +11,7 @@ from dataclasses import dataclass, field
 from pathlib import Path
 
 import atocore.config as _config
-from atocore.context.project_state import format_project_state, get_state
+from atocore.context.project_state import ProjectStateEntry, format_project_state, get_state
 from atocore.memory.service import get_memories_for_context
 from atocore.observability.logger import get_logger
 from atocore.engineering.service import get_entities, get_entity_with_context
@@ -116,6 +116,11 @@ def build_context(
     if canonical_project:
         state_entries = get_state(canonical_project)
         if state_entries:
+            state_entries = _rank_project_state_entries(
+                state_entries,
+                query=user_prompt,
+                project=canonical_project,
+            )
             project_state_text = format_project_state(state_entries)
             project_state_text, project_state_chars = _truncate_text_block(
                 project_state_text,
@@ -284,6 +289,55 @@ def get_last_context_pack() -> ContextPack | None:
     return _last_context_pack
 
 
+def _rank_project_state_entries(
+    entries: list[ProjectStateEntry],
+    query: str,
+    project: str,
+) -> list[ProjectStateEntry]:
+    """Promote query-relevant trusted state before the state band is truncated."""
+    if not query or len(entries) <= 1:
+        return entries
+
+    from atocore.memory.reinforcement import _normalize, _tokenize
+
+    query_text = _normalize(query.replace("_", " "))
+    query_tokens = set(_tokenize(query_text))
+    query_tokens -= {
+        "how",
+        "what",
+        "when",
+        "where",
+        "which",
+        "who",
+        "why",
+        "current",
+        "status",
+        "project",
+    }
+    for part in (project or "").lower().replace("_", "-").split("-"):
+        query_tokens.discard(part)
+    if not query_tokens:
+        return entries
+
+    scored: list[tuple[int, float, float, int, ProjectStateEntry]] = []
+    for index, entry in enumerate(entries):
+        entry_text = " ".join(
+            [
+                entry.category,
+                entry.key.replace("_", " "),
+                entry.value,
+                entry.source,
+            ]
+        )
+        entry_tokens = _tokenize(_normalize(entry_text))
+        overlap = len(entry_tokens & query_tokens) if entry_tokens else 0
+        density = overlap / len(entry_tokens) if entry_tokens else 0.0
+        scored.append((overlap, density, entry.confidence, -index, entry))
+
+    scored.sort(key=lambda item: (item[0], item[1], item[2], item[3]), reverse=True)
+    return [entry for _, _, _, _, entry in scored]
+
+
 def _rank_chunks(
     candidates: list[ChunkResult],
     project_hint: str | None,
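To make the effect of that sort key concrete, here is a tiny self-contained sketch (toy data, not the real ProjectStateEntry type) of the overlap-then-density ordering:

```python
# Toy illustration of the primary sort key used above: higher lexical
# overlap wins; density (overlap / entry length) breaks ties.
entries = [
    ("contact", {"abb", "space", "vendor"}),
    ("key_constraints", {"zerodur", "mirror", "constraints", "mass"}),
]
query_tokens = {"key", "constraints", "zerodur"}

def key(item):
    name, tokens = item
    overlap = len(tokens & query_tokens)
    density = overlap / len(tokens)
    return (overlap, density)

ranked = sorted(entries, key=key, reverse=True)
assert ranked[0][0] == "key_constraints"  # 2 overlapping tokens beats 0
```
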
@@ -63,6 +63,12 @@ RELATIONSHIP_TYPES = [
 
 ENTITY_STATUSES = ["candidate", "active", "superseded", "invalid"]
 
+# V1-0: extractor version this module writes into new entity rows.
+# Per promotion-rules.md:268, every candidate must record the version of
+# the extractor that produced it so later re-evaluation is auditable.
+# Bump this when extraction logic materially changes.
+EXTRACTOR_VERSION = "v1.0.0"
+
 
 @dataclass
 class Entity:
@@ -77,6 +83,10 @@ class Entity:
     source_refs: list[str] = field(default_factory=list)
     created_at: str = ""
     updated_at: str = ""
+    # V1-0 shared-header fields per engineering-v1-acceptance.md:45.
+    extractor_version: str = ""
+    canonical_home: str = "entity"
+    hand_authored: bool = False
 
 
 @dataclass
@@ -103,10 +113,25 @@ def init_engineering_schema() -> None:
             status TEXT NOT NULL DEFAULT 'active',
             confidence REAL NOT NULL DEFAULT 1.0,
             source_refs TEXT NOT NULL DEFAULT '[]',
+            extractor_version TEXT NOT NULL DEFAULT '',
+            canonical_home TEXT NOT NULL DEFAULT 'entity',
+            hand_authored INTEGER NOT NULL DEFAULT 0,
             created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
             updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
         )
     """)
+    # V1-0 (Engineering V1 completion): the three shared-header fields
+    # per engineering-v1-acceptance.md:45. Idempotent ALTERs for
+    # databases created before V1-0 land these columns without a full
+    # migration. Fresh DBs get them via the CREATE TABLE above; the
+    # ALTERs below are a no-op there.
+    from atocore.models.database import _column_exists  # late import; avoids cycle
+    if not _column_exists(conn, "entities", "extractor_version"):
+        conn.execute("ALTER TABLE entities ADD COLUMN extractor_version TEXT DEFAULT ''")
+    if not _column_exists(conn, "entities", "canonical_home"):
+        conn.execute("ALTER TABLE entities ADD COLUMN canonical_home TEXT DEFAULT 'entity'")
+    if not _column_exists(conn, "entities", "hand_authored"):
+        conn.execute("ALTER TABLE entities ADD COLUMN hand_authored INTEGER DEFAULT 0")
     conn.execute("""
         CREATE TABLE IF NOT EXISTS relationships (
             id TEXT PRIMARY KEY,
@@ -149,6 +174,8 @@ def create_entity(
     confidence: float = 1.0,
     source_refs: list[str] | None = None,
     actor: str = "api",
+    hand_authored: bool = False,
+    extractor_version: str | None = None,
 ) -> Entity:
     if entity_type not in ENTITY_TYPES:
         raise ValueError(f"Invalid entity type: {entity_type}. Must be one of {ENTITY_TYPES}")
@@ -157,6 +184,21 @@ def create_entity(
     if not name or not name.strip():
         raise ValueError("Entity name must be non-empty")
 
+    refs = list(source_refs) if source_refs else []
+
+    # V1-0 (F-8 provenance enforcement, engineering-v1-acceptance.md:147):
+    # every new entity row must carry non-empty source_refs OR be explicitly
+    # flagged hand_authored. This is the non-negotiable invariant every
+    # later V1 phase depends on — without it, active entities can escape
+    # into the graph with no traceable origin. Raises at the write seam so
+    # the bug is impossible to introduce silently.
+    if not refs and not hand_authored:
+        raise ValueError(
+            "source_refs required: every entity must carry provenance "
+            "(source_chunk_id / source_interaction_id / kb_cad_export_id / ...) "
+            "or set hand_authored=True to explicitly flag a direct human write"
+        )
+
     # Phase 5: enforce project canonicalization contract at the write seam.
     # Aliases like "p04" become "p04-gigabit" so downstream reads stay
     # consistent with the registry.
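From the caller's side, the F-8 invariant reduces to three cases; a minimal sketch using the signature above (the source_refs string format is illustrative, not confirmed by this diff):

```python
from atocore.engineering.service import create_entity

# Extracted rows must carry provenance (ref format hypothetical):
ok = create_entity(
    "requirement", "WFE < 15 nm", project="p04",
    source_refs=["source_chunk_id:abc123"],
)

# A direct human write must say so explicitly:
manual = create_entity("component", "M1 blank", project="p04", hand_authored=True)

# Anything else is rejected before the INSERT:
try:
    create_entity("component", "mystery part", project="p04")
except ValueError as e:
    assert "source_refs required" in str(e)
```
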
@@ -165,18 +207,22 @@ def create_entity(
     entity_id = str(uuid.uuid4())
     now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
     props = properties or {}
-    refs = source_refs or []
+    ev = extractor_version if extractor_version is not None else EXTRACTOR_VERSION
 
     with get_connection() as conn:
         conn.execute(
             """INSERT INTO entities
             (id, entity_type, name, project, description, properties,
-            status, confidence, source_refs, created_at, updated_at)
-            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+            status, confidence, source_refs,
+            extractor_version, canonical_home, hand_authored,
+            created_at, updated_at)
+            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
             (
                 entity_id, entity_type, name.strip(), project,
                 description, json.dumps(props), status, confidence,
-                json.dumps(refs), now, now,
+                json.dumps(refs),
+                ev, "entity", 1 if hand_authored else 0,
+                now, now,
             ),
         )
@@ -194,14 +240,31 @@ def create_entity(
             "project": project,
             "status": status,
             "confidence": confidence,
+            "hand_authored": hand_authored,
+            "extractor_version": ev,
         },
     )
 
+    # V1-0 (F-5 hook, engineering-v1-acceptance.md:99): synchronous
+    # conflict detection on any active-entity write. The promote path
+    # already had this hook (see promote_entity below); V1-0 adds it to
+    # direct-active creates so every active row — however it got that
+    # way — is checked. Fail-open per "flag, never block" rule in
+    # conflict-model.md:256: detector errors log but never fail the write.
+    if status == "active":
+        try:
+            from atocore.engineering.conflicts import detect_conflicts_for_entity
+            detect_conflicts_for_entity(entity_id)
+        except Exception as e:
+            log.warning("conflict_detection_failed", entity_id=entity_id, error=str(e))
+
     return Entity(
         id=entity_id, entity_type=entity_type, name=name.strip(),
         project=project, description=description, properties=props,
         status=status, confidence=confidence, source_refs=refs,
         created_at=now, updated_at=now,
+        extractor_version=ev, canonical_home="entity",
+        hand_authored=hand_authored,
     )
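The "flag, never block" error handling is the same shape in every hook that follows; distilled to its skeleton (an illustrative helper, not code from this branch):

```python
# Fail-open detection: the detector may flag conflicts, but a detector
# crash must never fail the write it is attached to.
def run_detector_fail_open(detect, log, **log_fields):
    try:
        detect()
    except Exception as e:  # deliberate broad catch: never block the write
        log.warning("conflict_detection_failed", error=str(e), **log_fields)
```
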
@@ -361,6 +424,20 @@ def promote_entity(
     if entity is None or entity.status != "candidate":
         return False
 
+    # V1-0 (F-8 provenance re-check at promote). The invariant must hold at
+    # BOTH create_entity AND promote_entity per the plan, because candidate
+    # rows can exist in the DB from before V1-0 (no enforcement at their
+    # create time) or can be inserted by code paths that bypass the service
+    # layer. Block any candidate with empty source_refs that is NOT flagged
+    # hand_authored from ever becoming active. Same error shape as the
+    # create-side check for symmetry.
+    if not (entity.source_refs or []) and not entity.hand_authored:
+        raise ValueError(
+            "source_refs required: cannot promote a candidate with no "
+            "provenance. Attach source_refs via PATCH /entities/{id}, "
+            "or flag hand_authored=true before promoting."
+        )
+
     if target_project is not None:
         new_project = (
             resolve_project_name(target_project) if target_project else ""
@@ -503,6 +580,22 @@ def supersede_entity(
                 superseded_by=superseded_by,
                 error=str(e),
             )
+
+    # V1-0 (F-5 hook on supersede, per plan's "every active-entity
+    # write path"). Supersede demotes `entity_id` AND adds a
+    # `supersedes` relationship rooted at the already-active
+    # `superseded_by`. That new edge can create a conflict the
+    # detector should catch synchronously. Fail-open per
+    # conflict-model.md:256.
+    try:
+        from atocore.engineering.conflicts import detect_conflicts_for_entity
+        detect_conflicts_for_entity(superseded_by)
+    except Exception as e:
+        log.warning(
+            "conflict_detection_failed",
+            entity_id=superseded_by,
+            error=str(e),
+        )
     return True
@@ -774,6 +867,15 @@ def get_entity_with_context(entity_id: str) -> dict | None:
 
 
 def _row_to_entity(row) -> Entity:
+    # V1-0 shared-header fields are optional on read — rows that predate
+    # V1-0 migration have NULL / missing values, so defaults kick in and
+    # older tests that build Entity() without the new fields keep passing.
+    # `row.keys()` lets us tolerate SQLite rows that lack the columns
+    # entirely (pre-migration sqlite3.Row).
+    keys = set(row.keys())
+    extractor_version = (row["extractor_version"] or "") if "extractor_version" in keys else ""
+    canonical_home = (row["canonical_home"] or "entity") if "canonical_home" in keys else "entity"
+    hand_authored = bool(row["hand_authored"]) if "hand_authored" in keys and row["hand_authored"] is not None else False
     return Entity(
         id=row["id"],
         entity_type=row["entity_type"],
@@ -786,6 +888,9 @@ def _row_to_entity(row) -> Entity:
         source_refs=json.loads(row["source_refs"] or "[]"),
         created_at=row["created_at"] or "",
         updated_at=row["updated_at"] or "",
+        extractor_version=extractor_version,
+        canonical_home=canonical_home,
+        hand_authored=hand_authored,
     )
@@ -391,6 +391,8 @@ def render_new_entity_form(name: str = "", project: str = "") -> str:
         entity_type: fd.get('entity_type'),
         project: fd.get('project') || '',
         description: fd.get('description') || '',
+        // V1-0: human writes via the wiki form are hand_authored by definition.
+        hand_authored: true,
       };
       try {
         const r = await fetch('/v1/entities', {
@@ -32,10 +32,23 @@ def exclusive_ingestion():
         _INGESTION_LOCK.release()
 
 
-def ingest_file(file_path: Path) -> dict:
+def ingest_file(file_path: Path, project_id: str = "") -> dict:
     """Ingest a single markdown file. Returns stats."""
     start = time.time()
     file_path = file_path.resolve()
+    project_id = (project_id or "").strip()
+    if not project_id:
+        try:
+            from atocore.projects.registry import derive_project_id_for_path
+
+            project_id = derive_project_id_for_path(file_path)
+        except Exception as exc:
+            log.warning(
+                "project_id_derivation_failed",
+                file_path=str(file_path),
+                error=str(exc),
+            )
+            project_id = ""
+
     if not file_path.exists():
         raise FileNotFoundError(f"File not found: {file_path}")
@@ -65,6 +78,7 @@ def ingest_file(file_path: Path) -> dict:
         "source_file": str(file_path),
         "tags": parsed.tags,
         "title": parsed.title,
+        "project_id": project_id,
     }
     chunks = chunk_markdown(parsed.body, base_metadata=base_meta)
 
@@ -116,6 +130,7 @@ def ingest_file(file_path: Path) -> dict:
             "source_file": str(file_path),
             "tags": json.dumps(parsed.tags),
             "title": parsed.title,
+            "project_id": project_id,
         })
 
         conn.execute(
@@ -173,7 +188,17 @@ def ingest_folder(folder_path: Path, purge_deleted: bool = True) -> list[dict]:
         purge_deleted: If True, remove DB/vector entries for files
             that no longer exist on disk.
     """
+    return ingest_project_folder(folder_path, purge_deleted=purge_deleted, project_id="")
+
+
+def ingest_project_folder(
+    folder_path: Path,
+    purge_deleted: bool = True,
+    project_id: str = "",
+) -> list[dict]:
+    """Ingest a folder and annotate chunks with an optional project id."""
     folder_path = folder_path.resolve()
+    project_id = (project_id or "").strip()
     if not folder_path.is_dir():
         raise NotADirectoryError(f"Not a directory: {folder_path}")
@@ -187,7 +212,7 @@ def ingest_folder(folder_path: Path, purge_deleted: bool = True) -> list[dict]:
     # Ingest new/changed files
     for md_file in md_files:
         try:
-            result = ingest_file(md_file)
+            result = ingest_file(md_file, project_id=project_id)
             results.append(result)
         except Exception as e:
             log.error("ingestion_error", file_path=str(md_file), error=str(e))
@@ -50,6 +50,9 @@ MEMORY_STATUSES = [
     "graduated",  # Phase 5: memory has become an entity; content frozen, forward pointer in properties
 ]
 
+DEFAULT_CONTEXT_MEMORY_LIMIT = 30
+QUERY_CONTEXT_MEMORY_LIMIT = 120
+
 
 @dataclass
 class Memory:
@@ -896,6 +899,7 @@ def get_memories_for_context(
     from atocore.memory.reinforcement import _normalize, _tokenize
 
     query_tokens = _tokenize(_normalize(query))
+    query_tokens = _prepare_memory_query_tokens(query_tokens, project=project)
     if not query_tokens:
         query_tokens = None
 
@@ -908,12 +912,13 @@ def get_memories_for_context(
     # ``_rank_memories_for_query`` via Python's stable sort.
     pool: list[Memory] = []
     seen_ids: set[str] = set()
+    candidate_limit = QUERY_CONTEXT_MEMORY_LIMIT if query_tokens is not None else DEFAULT_CONTEXT_MEMORY_LIMIT
     for mtype in memory_types:
         for mem in get_memories(
             memory_type=mtype,
             project=project,
             min_confidence=0.5,
-            limit=30,
+            limit=candidate_limit,
         ):
             if mem.id in seen_ids:
                 continue
@@ -980,11 +985,11 @@ def _rank_memories_for_query(
 ) -> list["Memory"]:
     """Rerank a memory list by lexical overlap with a pre-tokenized query.
 
-    Primary key: overlap_density (overlap_count / memory_token_count),
-    which rewards short focused memories that match the query precisely
-    over long overview memories that incidentally share a few tokens.
-    Secondary: absolute overlap count. Tertiary: domain-tag match.
-    Quaternary: confidence.
+    Primary key: absolute overlap count, which keeps a richer memory
+    matching multiple query-intent terms ahead of a short memory that
+    only happens to share one term. Secondary: overlap_density
+    (overlap_count / memory_token_count), so ties still prefer short
+    focused memories. Tertiary: domain-tag match. Quaternary: confidence.
 
     Phase 3: domain_tags contribute a boost when they appear in the
     query text. A memory tagged [optics, thermal] for a query about
@@ -1010,10 +1015,46 @@ def _rank_memories_for_query(
                 tag_hits += 1
 
         scored.append((density, overlap, tag_hits, mem.confidence, mem))
-    scored.sort(key=lambda t: (t[0], t[1], t[2], t[3]), reverse=True)
+    scored.sort(key=lambda t: (t[1], t[0], t[2], t[3]), reverse=True)
     return [mem for _, _, _, _, mem in scored]
 
 
+_MEMORY_QUERY_STOP_TOKENS = {
+    "how",
+    "what",
+    "when",
+    "where",
+    "which",
+    "who",
+    "why",
+    "current",
+    "status",
+    "project",
+    "machine",
+}
+
+_MEMORY_QUERY_TOKEN_EXPANSIONS = {
+    "remotely": {"remote"},
+}
+
+
+def _prepare_memory_query_tokens(
+    query_tokens: set[str],
+    project: str | None = None,
+) -> set[str]:
+    """Remove project-scope noise and add tiny intent-preserving expansions."""
+    prepared = set(query_tokens)
+    for token in list(prepared):
+        prepared.update(_MEMORY_QUERY_TOKEN_EXPANSIONS.get(token, set()))
+
+    prepared -= _MEMORY_QUERY_STOP_TOKENS
+    if project:
+        for part in project.lower().replace("_", "-").split("-"):
+            if part:
+                prepared.discard(part)
+    return prepared
+
+
 def _row_to_memory(row) -> Memory:
     """Convert a DB row to Memory dataclass."""
     import json as _json
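A worked pass through the preparation step, with a hand-made token set (the real `_tokenize` may already drop some of these words):

```python
# Illustrative walk-through of _prepare_memory_query_tokens:
tokens = {"how", "do", "i", "access", "the", "polisher", "remotely"}
prepared = _prepare_memory_query_tokens(tokens, project="p06-polisher")
# 1. expansion: "remotely" also contributes "remote"
# 2. stop tokens: "how" is dropped
# 3. project parts: "p06" and "polisher" are discarded for p06-polisher
assert prepared == {"do", "i", "access", "the", "remotely", "remote"}
```
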
@@ -146,6 +146,28 @@ def _apply_migrations(conn: sqlite3.Connection) -> None:
         "CREATE INDEX IF NOT EXISTS idx_memories_graduated ON memories(graduated_to_entity_id)"
     )
 
+    # V1-0 (Engineering V1 completion): shared header fields per
+    # engineering-v1-acceptance.md:45. Three columns on `entities`:
+    # - extractor_version: which extractor produced this row. Lets old
+    #   candidates be re-evaluated with a newer extractor per
+    #   promotion-rules.md:268.
+    # - canonical_home: which layer holds the canonical record. Always
+    #   "entity" for rows written via create_entity; reserved for future
+    #   cross-layer bookkeeping.
+    # - hand_authored: 1 when the row was created directly by a human
+    #   without source provenance. Enforced by the write path so every
+    #   non-hand-authored row must carry non-empty source_refs (F-8).
+    # The entities table itself is created by init_engineering_schema
+    # (see engineering/service.py); these ALTERs cover existing DBs
+    # where the original CREATE TABLE predates V1-0.
+    if _table_exists(conn, "entities"):
+        if not _column_exists(conn, "entities", "extractor_version"):
+            conn.execute("ALTER TABLE entities ADD COLUMN extractor_version TEXT DEFAULT ''")
+        if not _column_exists(conn, "entities", "canonical_home"):
+            conn.execute("ALTER TABLE entities ADD COLUMN canonical_home TEXT DEFAULT 'entity'")
+        if not _column_exists(conn, "entities", "hand_authored"):
+            conn.execute("ALTER TABLE entities ADD COLUMN hand_authored INTEGER DEFAULT 0")
+
     # Phase 4 (Robustness V1): append-only audit log for memory mutations.
     # Every create/update/promote/reject/supersede/invalidate/reinforce/expire/
     # auto_promote writes one row here. before/after are JSON snapshots of the
@@ -352,6 +374,14 @@ def _column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
     return any(row["name"] == column for row in rows)
 
 
+def _table_exists(conn: sqlite3.Connection, table: str) -> bool:
+    row = conn.execute(
+        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
+        (table,),
+    ).fetchone()
+    return row is not None
+
+
 @contextmanager
 def get_connection() -> Generator[sqlite3.Connection, None, None]:
     """Get a database connection with row factory."""
@@ -8,7 +8,6 @@ from dataclasses import asdict, dataclass
 from pathlib import Path
 
 import atocore.config as _config
-from atocore.ingestion.pipeline import ingest_folder
 
 
 # Reserved pseudo-projects. `inbox` holds pre-project / lead / quote
@@ -260,6 +259,7 @@ def load_project_registry() -> list[RegisteredProject]:
         )
 
     _validate_unique_project_names(projects)
+    _validate_ingest_root_overlaps(projects)
     return projects
 
 
@@ -307,6 +307,28 @@ def resolve_project_name(name: str | None) -> str:
     return name
 
 
+def derive_project_id_for_path(file_path: str | Path) -> str:
+    """Return the registered project that owns a source path, if any."""
+    if not file_path:
+        return ""
+    doc_path = Path(file_path).resolve(strict=False)
+    matches: list[tuple[int, int, str]] = []
+
+    for project in load_project_registry():
+        for source_ref in project.ingest_roots:
+            root_path = _resolve_ingest_root(source_ref)
+            try:
+                doc_path.relative_to(root_path)
+            except ValueError:
+                continue
+            matches.append((len(root_path.parts), len(str(root_path)), project.project_id))
+
+    if not matches:
+        return ""
+    matches.sort(reverse=True)
+    return matches[0][2]
+
+
 def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
     """Ingest all configured source roots for a registered project.
 
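The sort on `(root depth, root length, project_id)` makes the deepest matching ingest root win, so a file under a nested root resolves to the more specific project. A sketch of the expected behavior (paths illustrative):

```python
# Deepest-matching root wins; unowned paths return "".
derive_project_id_for_path("/vault/incoming/projects/p04-gigabit/status.md")
# -> "p04-gigabit"  (sits under that project's ingest root)
derive_project_id_for_path("/vault/random/note.md")
# -> ""             (no registered root owns it)
```
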
@@ -322,6 +344,8 @@ def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
     if project is None:
         raise ValueError(f"Unknown project: {project_name}")
 
+    from atocore.ingestion.pipeline import ingest_project_folder
+
     roots = []
     ingested_count = 0
     skipped_count = 0
@@ -346,7 +370,11 @@ def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
             {
                 **root_result,
                 "status": "ingested",
-                "results": ingest_folder(resolved, purge_deleted=purge_deleted),
+                "results": ingest_project_folder(
+                    resolved,
+                    purge_deleted=purge_deleted,
+                    project_id=project.project_id,
+                ),
             }
         )
         ingested_count += 1
@@ -443,6 +471,33 @@ def _validate_unique_project_names(projects: list[RegisteredProject]) -> None:
         seen[key] = project.project_id
 
 
+def _validate_ingest_root_overlaps(projects: list[RegisteredProject]) -> None:
+    roots: list[tuple[str, Path]] = []
+    for project in projects:
+        for source_ref in project.ingest_roots:
+            roots.append((project.project_id, _resolve_ingest_root(source_ref)))
+
+    for i, (left_project, left_root) in enumerate(roots):
+        for right_project, right_root in roots[i + 1:]:
+            if left_project == right_project:
+                continue
+            try:
+                left_root.relative_to(right_root)
+                overlaps = True
+            except ValueError:
+                try:
+                    right_root.relative_to(left_root)
+                    overlaps = True
+                except ValueError:
+                    overlaps = False
+            if overlaps:
+                raise ValueError(
+                    "Project registry ingest root overlap: "
+                    f"'{left_root}' ({left_project}) and "
+                    f"'{right_root}' ({right_project})"
+                )
+
+
 def _find_name_collisions(
     project_id: str,
     aliases: list[str],
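The containment test behind the validator is just `Path.relative_to`, which succeeds only when one path lies inside the other; a self-contained sketch of the idea (paths illustrative):

```python
from pathlib import PurePosixPath

left = PurePosixPath("/vault/incoming/projects")
right = PurePosixPath("/vault/incoming/projects/p04-gigabit")

def overlaps(a: PurePosixPath, b: PurePosixPath) -> bool:
    # Same test the validator uses, tried in both directions.
    for x, y in ((a, b), (b, a)):
        try:
            x.relative_to(y)
            return True
        except ValueError:
            continue
    return False

assert overlaps(left, right)  # nested roots are rejected at registry load
```
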
@@ -1,5 +1,6 @@
 """Retrieval: query to ranked chunks."""
 
+import json
 import re
 import time
 from dataclasses import dataclass
@@ -7,7 +8,7 @@ from dataclasses import dataclass
 import atocore.config as _config
 from atocore.models.database import get_connection
 from atocore.observability.logger import get_logger
-from atocore.projects.registry import get_registered_project
+from atocore.projects.registry import RegisteredProject, get_registered_project, load_project_registry
 from atocore.retrieval.embeddings import embed_query
 from atocore.retrieval.vector_store import get_vector_store
@@ -83,6 +84,27 @@ def retrieve(
     """Retrieve the most relevant chunks for a query."""
     top_k = top_k or _config.settings.context_top_k
     start = time.time()
+    try:
+        scoped_project = get_registered_project(project_hint) if project_hint else None
+    except Exception as exc:
+        log.warning(
+            "project_scope_resolution_failed",
+            project_hint=project_hint,
+            error=str(exc),
+        )
+        scoped_project = None
+    scope_filter_enabled = bool(scoped_project and _config.settings.rank_project_scope_filter)
+    registered_projects = None
+    query_top_k = top_k
+    if scope_filter_enabled:
+        query_top_k = max(
+            top_k,
+            top_k * max(1, _config.settings.rank_project_scope_candidate_multiplier),
+        )
+        try:
+            registered_projects = load_project_registry()
+        except Exception:
+            registered_projects = None
+
     query_embedding = embed_query(query)
     store = get_vector_store()
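The over-fetch arithmetic, concretely: scoped queries widen the raw candidate pool so the cross-project filter can drop rows without starving the final top-k.

```python
# With context_top_k = 8 and the default multiplier of 4, a scoped
# query pulls 32 raw candidates before filtering back down to 8.
top_k = 8
multiplier = 4
query_top_k = max(top_k, top_k * max(1, multiplier))
assert query_top_k == 32
```
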
@@ -101,11 +123,12 @@ def retrieve(
 
     results = store.query(
         query_embedding=query_embedding,
-        top_k=top_k,
+        top_k=query_top_k,
         where=where,
     )
 
     chunks = []
+    raw_result_count = len(results["ids"][0]) if results and results["ids"] and results["ids"][0] else 0
     if results and results["ids"] and results["ids"][0]:
         existing_ids = _existing_chunk_ids(results["ids"][0])
         for i, chunk_id in enumerate(results["ids"][0]):
@@ -117,6 +140,13 @@ def retrieve(
             meta = results["metadatas"][0][i] if results["metadatas"] else {}
             content = results["documents"][0][i] if results["documents"] else ""
 
+            if scope_filter_enabled and not _is_allowed_for_project_scope(
+                scoped_project,
+                meta,
+                registered_projects,
+            ):
+                continue
+
             score *= _query_match_boost(query, meta)
             score *= _path_signal_boost(meta)
             if project_hint:
@@ -137,42 +167,151 @@ def retrieve(
 
     duration_ms = int((time.time() - start) * 1000)
     chunks.sort(key=lambda chunk: chunk.score, reverse=True)
+    post_filter_count = len(chunks)
+    chunks = chunks[:top_k]
+
     log.info(
         "retrieval_done",
         query=query[:100],
         top_k=top_k,
+        query_top_k=query_top_k,
+        raw_results_count=raw_result_count,
+        post_filter_count=post_filter_count,
         results_count=len(chunks),
+        post_filter_dropped=max(0, raw_result_count - post_filter_count),
+        underfilled=bool(raw_result_count >= query_top_k and len(chunks) < top_k),
         duration_ms=duration_ms,
     )
 
     return chunks
 
 
+def _is_allowed_for_project_scope(
+    project: RegisteredProject,
+    metadata: dict,
+    registered_projects: list[RegisteredProject] | None = None,
+) -> bool:
+    """Return True when a chunk is target-project or not project-owned.
+
+    Project-hinted retrieval should not let one registered project's corpus
+    compete with another's. At the same time, unowned/global sources should
+    remain eligible because shared docs and cross-project references can be
+    genuinely useful. The registry gives us the boundary: if metadata matches
+    a registered project and it is not the requested project, filter it out.
+    """
+    if _metadata_matches_project(project, metadata):
+        return True
+
+    if registered_projects is None:
+        try:
+            registered_projects = load_project_registry()
+        except Exception:
+            return True
+
+    for other in registered_projects:
+        if other.project_id == project.project_id:
+            continue
+        if _metadata_matches_project(other, metadata):
+            return False
+    return True
+
+
+def _metadata_matches_project(project: RegisteredProject, metadata: dict) -> bool:
+    stored_project_id = str(metadata.get("project_id", "")).strip().lower()
+    if stored_project_id:
+        return stored_project_id == project.project_id.lower()
+
+    path = _metadata_source_path(metadata)
+    tags = _metadata_tags(metadata)
+    for term in _project_scope_terms(project):
+        if _path_matches_term(path, term) or term in tags:
+            return True
+    return False
+
+
+def _project_scope_terms(project: RegisteredProject) -> set[str]:
+    terms = {project.project_id.lower()}
+    terms.update(alias.lower() for alias in project.aliases)
+    for source_ref in project.ingest_roots:
+        normalized = source_ref.subpath.replace("\\", "/").strip("/").lower()
+        if normalized:
+            terms.add(normalized)
+            terms.add(normalized.split("/")[-1])
+    return {term for term in terms if term}
+
+
+def _metadata_searchable(metadata: dict) -> str:
+    return " ".join(
+        [
+            str(metadata.get("source_file", "")).replace("\\", "/").lower(),
+            str(metadata.get("title", "")).lower(),
+            str(metadata.get("heading_path", "")).lower(),
+            str(metadata.get("tags", "")).lower(),
+        ]
+    )
+
+
+def _metadata_source_path(metadata: dict) -> str:
+    return str(metadata.get("source_file", "")).replace("\\", "/").strip("/").lower()
+
+
+def _metadata_tags(metadata: dict) -> set[str]:
+    raw_tags = metadata.get("tags", [])
+    if isinstance(raw_tags, (list, tuple, set)):
+        return {str(tag).strip().lower() for tag in raw_tags if str(tag).strip()}
+    if isinstance(raw_tags, str):
+        try:
+            parsed = json.loads(raw_tags)
+        except json.JSONDecodeError:
+            parsed = [raw_tags]
+        if isinstance(parsed, (list, tuple, set)):
+            return {str(tag).strip().lower() for tag in parsed if str(tag).strip()}
+        if isinstance(parsed, str) and parsed.strip():
+            return {parsed.strip().lower()}
+    return set()
+
+
+def _path_matches_term(path: str, term: str) -> bool:
+    normalized = term.replace("\\", "/").strip("/").lower()
+    if not path or not normalized:
+        return False
+    if "/" in normalized:
+        return path == normalized or path.startswith(f"{normalized}/")
+    return normalized in set(path.split("/"))
+
+
+def _metadata_has_term(metadata: dict, term: str) -> bool:
+    normalized = term.replace("\\", "/").strip("/").lower()
+    if not normalized:
+        return False
+    if _path_matches_term(_metadata_source_path(metadata), normalized):
+        return True
+    if normalized in _metadata_tags(metadata):
+        return True
+    return re.search(
+        rf"(?<![a-z0-9]){re.escape(normalized)}(?![a-z0-9])",
+        _metadata_searchable(metadata),
+    ) is not None
+
+
 def _project_match_boost(project_hint: str, metadata: dict) -> float:
     """Return a project-aware relevance multiplier for raw retrieval."""
     hint_lower = project_hint.strip().lower()
     if not hint_lower:
         return 1.0
 
-    source_file = str(metadata.get("source_file", "")).lower()
-    title = str(metadata.get("title", "")).lower()
-    tags = str(metadata.get("tags", "")).lower()
-    searchable = " ".join([source_file, title, tags])
-
-    project = get_registered_project(project_hint)
-    candidate_names = {hint_lower}
-    if project is not None:
-        candidate_names.add(project.project_id.lower())
-        candidate_names.update(alias.lower() for alias in project.aliases)
-        candidate_names.update(
-            source_ref.subpath.replace("\\", "/").strip("/").split("/")[-1].lower()
-            for source_ref in project.ingest_roots
-            if source_ref.subpath.strip("/\\")
-        )
+    try:
+        project = get_registered_project(project_hint)
+    except Exception as exc:
+        log.warning(
+            "project_match_boost_resolution_failed",
+            project_hint=project_hint,
+            error=str(exc),
+        )
+        project = None
+    candidate_names = _project_scope_terms(project) if project is not None else {hint_lower}
     for candidate in candidate_names:
-        if candidate and candidate in searchable:
+        if _metadata_has_term(metadata, candidate):
             return _config.settings.rank_project_match_boost
 
     return 1.0
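Assuming the helpers above are importable, the path matching comes down to whole-segment semantics — a bare term must match an entire path component, and a slashed term must be a prefix at a directory boundary:

```python
# Whole-component match succeeds:
assert _path_matches_term("incoming/projects/p04-gigabit/status.md", "p04-gigabit")
# Substring inside a component does not (no more "p04" matching "p04-gigabit-summary"):
assert not _path_matches_term("notes/p04-gigabit-summary.md", "p04-gigabit")
# Slashed terms match only at directory boundaries:
assert _path_matches_term("incoming/projects/p04-gigabit/a.md", "incoming/projects/p04-gigabit")
```
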
@@ -64,6 +64,18 @@ class VectorStore:
         self._collection.delete(ids=ids)
         log.debug("vectors_deleted", count=len(ids))
 
+    def get_metadatas(self, ids: list[str]) -> dict:
+        """Fetch vector metadata by chunk IDs."""
+        if not ids:
+            return {"ids": [], "metadatas": []}
+        return self._collection.get(ids=ids, include=["metadatas"])
+
+    def update_metadatas(self, ids: list[str], metadatas: list[dict]) -> None:
+        """Update vector metadata without re-embedding documents."""
+        if ids:
+            self._collection.update(ids=ids, metadatas=metadatas)
+            log.debug("vector_metadatas_updated", count=len(ids))
+
     @property
     def count(self) -> int:
         return self._collection.count()
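These two methods give the backfill its read-modify-write loop over vector metadata without touching embeddings; a minimal sketch of the round-trip (chunk id illustrative):

```python
# Read-modify-write on vector metadata only — no re-embedding.
store = get_vector_store()
snapshot = store.get_metadatas(["chunk-1"])
metas = snapshot["metadatas"]
for meta in metas:
    meta["project_id"] = "p04-gigabit"
store.update_metadatas(snapshot["ids"], metas)
```
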
@@ -16,6 +16,36 @@ os.environ["ATOCORE_DATA_DIR"] = _default_test_dir
 os.environ["ATOCORE_DEBUG"] = "true"
 
 
+# V1-0: every entity created in a test is "hand authored" by the test
+# author — fixture data, not extracted content. Rather than rewrite 100+
+# existing test call sites, wrap create_entity so that tests which don't
+# provide source_refs get hand_authored=True automatically. Tests that
+# explicitly pass source_refs or hand_authored are unaffected. This keeps
+# the F-8 invariant enforced in production (the API, the wiki form, and
+# graduation scripts all go through the unwrapped function) while leaving
+# the existing test corpus intact.
+def _patch_create_entity_for_tests():
+    from atocore.engineering import service as _svc
+
+    _original = _svc.create_entity
+
+    def _create_entity_test(*args, **kwargs):
+        # Only auto-flag when hand_authored isn't explicitly specified.
+        # Tests that want to exercise the F-8 raise path pass
+        # hand_authored=False explicitly and should hit the error.
+        if (
+            not kwargs.get("source_refs")
+            and "hand_authored" not in kwargs
+        ):
+            kwargs["hand_authored"] = True
+        return _original(*args, **kwargs)
+
+    _svc.create_entity = _create_entity_test
+
+
+_patch_create_entity_for_tests()
+
+
 @pytest.fixture
 def tmp_data_dir(tmp_path):
     """Provide a temporary data directory for tests."""
tests/test_backfill_chunk_project_ids.py (new file, 154 lines)
@@ -0,0 +1,154 @@
+"""Tests for explicit chunk project_id metadata backfill."""
+
+import json
+
+import atocore.config as config
+from atocore.models.database import get_connection, init_db
+from scripts import backfill_chunk_project_ids as backfill
+
+
+def _write_registry(tmp_path, monkeypatch):
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
+    project_dir.mkdir(parents=True)
+    drive_dir.mkdir()
+    config_dir.mkdir()
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "p04-gigabit",
+                        "aliases": ["p04"],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
+                        ],
+                    }
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+    config.settings = config.Settings()
+    return project_dir
+
+
+def _insert_chunk(file_path, metadata=None, chunk_id="chunk-1"):
+    with get_connection() as conn:
+        conn.execute(
+            """
+            INSERT INTO source_documents (id, file_path, file_hash, title, doc_type, tags)
+            VALUES (?, ?, ?, ?, ?, ?)
+            """,
+            ("doc-1", str(file_path), "hash", "Title", "markdown", "[]"),
+        )
+        conn.execute(
+            """
+            INSERT INTO source_chunks
+            (id, document_id, chunk_index, content, heading_path, char_count, metadata)
+            VALUES (?, ?, ?, ?, ?, ?, ?)
+            """,
+            (
+                chunk_id,
+                "doc-1",
+                0,
+                "content",
+                "Overview",
+                7,
+                json.dumps(metadata if metadata is not None else {}),
+            ),
+        )
+
+
+class FakeVectorStore:
+    def __init__(self, metadatas):
+        self.metadatas = dict(metadatas)
+        self.updated = []
+
+    def get_metadatas(self, ids):
+        returned_ids = [chunk_id for chunk_id in ids if chunk_id in self.metadatas]
+        return {
+            "ids": returned_ids,
+            "metadatas": [self.metadatas[chunk_id] for chunk_id in returned_ids],
+        }
+
+    def update_metadatas(self, ids, metadatas):
+        self.updated.append((list(ids), list(metadatas)))
+        for chunk_id, metadata in zip(ids, metadatas, strict=True):
+            self.metadatas[chunk_id] = metadata
+
+
+def test_backfill_dry_run_is_non_mutating(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md")
+
+    result = backfill.backfill(apply=False)
+
+    assert result["updates"] == 1
+    with get_connection() as conn:
+        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
+        assert json.loads(row["metadata"]) == {}
+
+
+def test_backfill_apply_updates_chroma_then_sql(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md", metadata={"source_file": "status.md"})
+    fake_store = FakeVectorStore({"chunk-1": {"source_file": "status.md"}})
+    monkeypatch.setattr(backfill, "get_vector_store", lambda: fake_store)
+
+    result = backfill.backfill(apply=True, require_chroma_snapshot=True)
+
+    assert result["applied_updates"] == 1
+    assert fake_store.metadatas["chunk-1"]["project_id"] == "p04-gigabit"
+    with get_connection() as conn:
+        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
+        assert json.loads(row["metadata"])["project_id"] == "p04-gigabit"
+
+
+def test_backfill_apply_requires_snapshot_confirmation(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md")
+
+    try:
+        backfill.backfill(apply=True)
+    except ValueError as exc:
+        assert "Chroma backup" in str(exc)
+    else:
+        raise AssertionError("Expected snapshot confirmation requirement")
+
+
+def test_backfill_missing_vector_skips_sql_update(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md")
+    fake_store = FakeVectorStore({})
+    monkeypatch.setattr(backfill, "get_vector_store", lambda: fake_store)
+
+    result = backfill.backfill(apply=True, require_chroma_snapshot=True)
+
+    assert result["updates"] == 1
+    assert result["applied_updates"] == 0
+    assert result["missing_vectors"] == 1
+    with get_connection() as conn:
+        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
+        assert json.loads(row["metadata"]) == {}
+
+
+def test_backfill_skips_malformed_metadata(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md", metadata=[])
+
+    result = backfill.backfill(apply=False)
+
+    assert result["updates"] == 0
+    assert result["malformed_metadata"] == 1
@@ -46,6 +46,8 @@ def test_settings_keep_legacy_db_path_when_present(tmp_path, monkeypatch):
 
 def test_ranking_weights_are_tunable_via_env(monkeypatch):
     monkeypatch.setenv("ATOCORE_RANK_PROJECT_MATCH_BOOST", "3.5")
+    monkeypatch.setenv("ATOCORE_RANK_PROJECT_SCOPE_FILTER", "false")
+    monkeypatch.setenv("ATOCORE_RANK_PROJECT_SCOPE_CANDIDATE_MULTIPLIER", "6")
     monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_STEP", "0.12")
     monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_CAP", "1.5")
     monkeypatch.setenv("ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST", "1.25")
@@ -54,6 +56,8 @@ def test_ranking_weights_are_tunable_via_env(monkeypatch):
     settings = config.Settings()
 
     assert settings.rank_project_match_boost == 3.5
+    assert settings.rank_project_scope_filter is False
+    assert settings.rank_project_scope_candidate_multiplier == 6
     assert settings.rank_query_token_step == 0.12
     assert settings.rank_query_token_cap == 1.5
     assert settings.rank_path_high_signal_boost == 1.25
@@ -143,6 +143,52 @@ def test_project_state_respects_total_budget(tmp_data_dir, sample_markdown):
     assert len(pack.formatted_context) <= 120
 
 
+def test_project_state_query_relevance_before_truncation(tmp_data_dir, sample_markdown):
+    """Relevant trusted state should survive the project-state budget cap."""
+    init_db()
+    init_project_state_schema()
+    ingest_file(sample_markdown)
+
+    set_state(
+        "p04-gigabit",
+        "contact",
+        "abb-space",
+        "ABB Space is the primary vendor contact for polishing, CCP, IBF, procurement coordination, "
+        "contract administration, interface planning, and delivery discussions.",
+    )
+    set_state(
+        "p04-gigabit",
+        "decision",
+        "back-structure",
+        "Option B selected: conical isogrid back structure with variable rib density. "
+        "Chosen over flat-back for stiffness-to-weight ratio and manufacturability.",
+    )
+    set_state(
+        "p04-gigabit",
+        "decision",
+        "polishing-vendor",
+        "ABB Space selected as polishing vendor. Contract includes computer-controlled polishing "
+        "and ion beam figuring.",
+    )
+    set_state(
+        "p04-gigabit",
+        "requirement",
+        "key_constraints",
+        "The program targets a 1.2 m lightweight Zerodur mirror with filtered mechanical WFE below 15 nm "
+        "and mass below 103.5 kg.",
+    )
+
+    pack = build_context(
+        "what are the key GigaBIT M1 program constraints",
+        project_hint="p04-gigabit",
+        budget=3000,
+    )
+
+    assert "Zerodur" in pack.formatted_context
+    assert "1.2" in pack.formatted_context
+    assert pack.formatted_context.find("[REQUIREMENT]") < pack.formatted_context.find("[CONTACT]")
+
+
 def test_project_hint_matches_state_case_insensitively(tmp_data_dir, sample_markdown):
     """Project state lookup should not depend on exact casing."""
     init_db()
@@ -143,8 +143,11 @@ def test_requirement_name_conflict_detected(tmp_data_dir):
     r2 = create_entity("requirement", "Surface figure < 25nm",
                        project="p-test", description="Different interpretation")

-    detected = detect_conflicts_for_entity(r2.id)
-    assert len(detected) == 1
+    # V1-0 synchronous hook: the conflict is already detected at r2's
+    # create-time, so a redundant detect call returns [] due to
+    # _record_conflict dedup. Assert on list_open_conflicts instead —
+    # that is what this test is really about: duplicate active
+    # requirements surfacing as an open conflict.
     conflicts = list_open_conflicts(project="p-test")
     assert any(c["slot_kind"] == "requirement.name" for c in conflicts)

@@ -191,8 +194,12 @@ def test_conflict_resolution_dismiss_leaves_entities_alone(tmp_data_dir):
                        description="first meaning")
     r2 = create_entity("requirement", "Dup req", project="p-test",
                        description="second meaning")
-    detected = detect_conflicts_for_entity(r2.id)
-    conflict_id = detected[0]
+    # V1-0 synchronous hook already recorded the conflict at r2's
+    # create-time. Look it up via list_open_conflicts rather than
+    # calling the detector again (which returns [] due to dedup).
+    open_list = list_open_conflicts(project="p-test")
+    assert open_list, "expected conflict recorded by create-time hook"
+    conflict_id = open_list[0]["id"]

     assert resolve_conflict(conflict_id, "dismiss")
     # Both still active — dismiss just clears the conflict marker
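For context on the dedup behavior those comments lean on, here is a minimal sketch of a `_record_conflict`-style guard, assuming a uniqueness key of (slot_kind, entity pair); the table and column names are illustrative, not the shipped schema.

# Hypothetical sketch: record a conflict once; a second detection of the
# same (slot_kind, entity pair) is a no-op, which is why re-running the
# detector after the create-time hook returns [].
def record_conflict(conn, slot_kind: str, entity_a: str, entity_b: str) -> bool:
    key = tuple(sorted((entity_a, entity_b)))
    row = conn.execute(
        "SELECT 1 FROM conflicts WHERE slot_kind = ? AND entity_a = ? "
        "AND entity_b = ? AND status = 'open'",
        (slot_kind, *key),
    ).fetchone()
    if row:
        return False  # deduped: already recorded as an open conflict
    conn.execute(
        "INSERT INTO conflicts (slot_kind, entity_a, entity_b, status) "
        "VALUES (?, ?, ?, 'open')",
        (slot_kind, *key),
    )
    return True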
@@ -132,6 +132,7 @@ def test_api_post_entity_with_null_project_stores_global(seeded_db):
         "entity_type": "material",
         "name": "Titanium",
         "project": None,
+        "hand_authored": True,  # V1-0 F-8: test fixture, no source_refs
     })
     assert r.status_code == 200

@@ -1,8 +1,10 @@
 """Tests for the ingestion pipeline."""

+import json
+
 from atocore.ingestion.parser import parse_markdown
 from atocore.models.database import get_connection, init_db
-from atocore.ingestion.pipeline import ingest_file, ingest_folder
+from atocore.ingestion.pipeline import ingest_file, ingest_folder, ingest_project_folder


 def test_parse_markdown(sample_markdown):
@@ -69,6 +71,153 @@ def test_ingest_updates_changed(tmp_data_dir, sample_markdown):
     assert result["status"] == "ingested"


+def test_ingest_file_records_project_id_metadata(tmp_data_dir, sample_markdown, monkeypatch):
+    """Project-aware ingestion should tag DB and vector metadata exactly."""
+    init_db()
+
+    class FakeVectorStore:
+        def __init__(self):
+            self.metadatas = []
+
+        def add(self, ids, documents, metadatas):
+            self.metadatas.extend(metadatas)
+
+        def delete(self, ids):
+            return None
+
+    fake_store = FakeVectorStore()
+    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)
+
+    result = ingest_file(sample_markdown, project_id="p04-gigabit")
+
+    assert result["status"] == "ingested"
+    assert fake_store.metadatas
+    assert all(meta["project_id"] == "p04-gigabit" for meta in fake_store.metadatas)
+
+    with get_connection() as conn:
+        rows = conn.execute("SELECT metadata FROM source_chunks").fetchall()
+    assert rows
+    assert all(
+        json.loads(row["metadata"])["project_id"] == "p04-gigabit"
+        for row in rows
+    )
+
+
+def test_ingest_file_derives_project_id_from_registry_root(tmp_data_dir, tmp_path, monkeypatch):
+    """Unscoped ingest should preserve ownership for files under registered roots."""
+    import atocore.config as config
+
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
+    project_dir.mkdir(parents=True)
+    drive_dir.mkdir()
+    config_dir.mkdir()
+    note = project_dir / "status.md"
+    note.write_text(
+        "# Status\n\nCurrent project status with enough detail to create "
+        "a retrievable chunk for the ingestion pipeline test.",
+        encoding="utf-8",
+    )
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "p04-gigabit",
+                        "aliases": ["p04"],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
+                        ],
+                    }
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    class FakeVectorStore:
+        def __init__(self):
+            self.metadatas = []
+
+        def add(self, ids, documents, metadatas):
+            self.metadatas.extend(metadatas)
+
+        def delete(self, ids):
+            return None
+
+    fake_store = FakeVectorStore()
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+    config.settings = config.Settings()
+    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)
+
+    init_db()
+    result = ingest_file(note)
+
+    assert result["status"] == "ingested"
+    assert fake_store.metadatas
+    assert all(meta["project_id"] == "p04-gigabit" for meta in fake_store.metadatas)
+
+
+def test_ingest_file_logs_and_fails_open_when_project_derivation_fails(
+    tmp_data_dir,
+    sample_markdown,
+    monkeypatch,
+):
+    """A broken registry should be visible but should not block ingestion."""
+    init_db()
+    warnings = []
+
+    class FakeVectorStore:
+        def __init__(self):
+            self.metadatas = []
+
+        def add(self, ids, documents, metadatas):
+            self.metadatas.extend(metadatas)
+
+        def delete(self, ids):
+            return None
+
+    fake_store = FakeVectorStore()
+    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)
+    monkeypatch.setattr(
+        "atocore.projects.registry.derive_project_id_for_path",
+        lambda path: (_ for _ in ()).throw(ValueError("registry broken")),
+    )
+    monkeypatch.setattr(
+        "atocore.ingestion.pipeline.log.warning",
+        lambda event, **kwargs: warnings.append((event, kwargs)),
+    )
+
+    result = ingest_file(sample_markdown)
+
+    assert result["status"] == "ingested"
+    assert fake_store.metadatas
+    assert all(meta["project_id"] == "" for meta in fake_store.metadatas)
+    assert warnings[0][0] == "project_id_derivation_failed"
+    assert "registry broken" in warnings[0][1]["error"]
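A minimal sketch of the fail-open pattern the test above pins down. The helper and log names (`derive_project_id_for_path`, `project_id_derivation_failed`) are taken from the test; everything else here is an assumption about the pipeline internals.

# Hypothetical sketch, not the shipped pipeline code: derivation errors are
# logged and swallowed, and ingestion proceeds with an empty project_id.
def _safe_project_id(path, derive, log_warning) -> str:
    try:
        return derive(path) or ""
    except Exception as exc:  # fail open: a broken registry must not block ingest
        log_warning("project_id_derivation_failed", error=str(exc), path=str(path))
        return ""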
+
+
+def test_ingest_project_folder_passes_project_id_to_files(tmp_data_dir, sample_folder, monkeypatch):
+    seen = []
+
+    def fake_ingest_file(path, project_id=""):
+        seen.append((path.name, project_id))
+        return {"file": str(path), "status": "ingested"}
+
+    monkeypatch.setattr("atocore.ingestion.pipeline.ingest_file", fake_ingest_file)
+    monkeypatch.setattr("atocore.ingestion.pipeline._purge_deleted_files", lambda *args, **kwargs: 0)
+
+    ingest_project_folder(sample_folder, project_id="p05-interferometer")
+
+    assert seen
+    assert {project_id for _, project_id in seen} == {"p05-interferometer"}
+
+
 def test_parse_markdown_uses_supplied_text(sample_markdown):
     """Parsing should be able to reuse pre-read content from ingestion."""
     latin_text = """---\ntags: parser\n---\n# Parser Title\n\nBody text."""
|||||||
@@ -428,6 +428,136 @@ def test_context_builder_tag_boost_orders_results(isolated_db):
|
|||||||
assert idx_tagged < idx_untagged
|
assert idx_tagged < idx_untagged
|
||||||
|
|
||||||
|
|
||||||
|
def test_project_memory_ranking_ignores_scope_noise(isolated_db):
|
||||||
|
"""Project words should not crowd out the actual query intent."""
|
||||||
|
from atocore.memory.service import create_memory, get_memories_for_context
|
||||||
|
|
||||||
|
create_memory(
|
||||||
|
"project",
|
||||||
|
"Norman is the end operator for p06-polisher and requires an explicit manual mode to operate the machine.",
|
||||||
|
project="p06-polisher",
|
||||||
|
confidence=0.7,
|
||||||
|
)
|
||||||
|
create_memory(
|
||||||
|
"project",
|
||||||
|
"Polisher Control firmware spec document titled 'Fulum Polisher Machine Control Firmware Spec v1' lives in PKM.",
|
||||||
|
project="p06-polisher",
|
||||||
|
confidence=0.7,
|
||||||
|
)
|
||||||
|
create_memory(
|
||||||
|
"project",
|
||||||
|
"Machine design principle: works fully offline and independently; network connection is for remote access only",
|
||||||
|
project="p06-polisher",
|
||||||
|
confidence=0.5,
|
||||||
|
)
|
||||||
|
create_memory(
|
||||||
|
"project",
|
||||||
|
"Use Tailscale mesh for RPi remote access to provide SSH, file transfer, and NAT traversal without port forwarding.",
|
||||||
|
project="p06-polisher",
|
||||||
|
confidence=0.5,
|
||||||
|
)
|
||||||
|
|
||||||
|
text, _ = get_memories_for_context(
|
||||||
|
memory_types=["project"],
|
||||||
|
project="p06-polisher",
|
||||||
|
budget=360,
|
||||||
|
query="how do we access the polisher machine remotely",
|
||||||
|
)
|
||||||
|
|
||||||
|
assert "Tailscale" in text
|
||||||
|
assert text.find("remote access only") < text.find("Tailscale")
|
||||||
|
assert "manual mode" not in text
|
||||||
|
|
||||||
|
|
||||||
|
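These ranking tests pair with the ATOCORE_RANK_QUERY_TOKEN_STEP / _CAP knobs above. Here is a minimal sketch of that per-token boost under the simplest reading of the knobs; every name in it is illustrative, not the shipped scorer.

# Hypothetical scorer sketch: each distinct query token found in the memory
# text adds `step` to the score, capped at `cap`, so several intent hits can
# outrank a single high-confidence but off-topic memory.
def query_token_boost(memory_text: str, query: str, step: float = 0.12, cap: float = 1.5) -> float:
    tokens = {t for t in query.lower().split() if len(t) > 2}  # drop tiny stopwords
    hits = sum(1 for t in tokens if t in memory_text.lower())
    return min(hits * step, cap)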
+
+
+def test_project_memory_ranking_prefers_multiple_intent_hits(isolated_db):
+    """A rich memory with several query hits should beat a terse one-hit memory."""
+    from atocore.memory.service import create_memory, get_memories_for_context
+
+    create_memory(
+        "project",
+        "CGH vendor selected for p05. Active integration coordination with Katie/AOM.",
+        project="p05-interferometer",
+        confidence=0.7,
+    )
+    create_memory(
+        "knowledge",
+        "Vendor-summary current signal: 4D is the strongest technical Twyman-Green candidate; "
+        "a certified used Zygo Verifire SV around $55k emerged as a strong value path.",
+        project="p05-interferometer",
+        confidence=0.9,
+    )
+
+    text, _ = get_memories_for_context(
+        memory_types=["project", "knowledge"],
+        project="p05-interferometer",
+        budget=220,
+        query="what is the current vendor signal for the interferometer procurement",
+    )
+
+    assert "4D" in text
+    assert "Zygo" in text
+
+
+def test_project_memory_query_ranks_beyond_confidence_prefilter(isolated_db):
+    """Query-time ranking should see older low-confidence but exact-intent memories."""
+    from atocore.memory.service import create_memory, get_memories_for_context
+
+    for idx in range(35):
+        create_memory(
+            "project",
+            f"High confidence p06 filler memory {idx}: Polisher Control planning note.",
+            project="p06-polisher",
+            confidence=0.9,
+        )
+    create_memory(
+        "project",
+        "Use Tailscale mesh for RPi remote access to provide SSH, file transfer, and NAT traversal without port forwarding.",
+        project="p06-polisher",
+        confidence=0.5,
+    )
+
+    text, _ = get_memories_for_context(
+        memory_types=["project"],
+        project="p06-polisher",
+        budget=360,
+        query="how do we access the polisher machine remotely",
+    )
+
+    assert "Tailscale" in text
+
+
+def test_project_memory_query_prefers_exact_cam_fact(isolated_db):
+    from atocore.memory.service import create_memory, get_memories_for_context
+
+    create_memory(
+        "project",
+        "Polisher Control firmware spec document titled 'Fulum Polisher Machine Control Firmware Spec v1' lives in PKM.",
+        project="p06-polisher",
+        confidence=0.9,
+    )
+    create_memory(
+        "project",
+        "Polisher Control doc must cover manual mode for Norman as a required deliverable per the plan.",
+        project="p06-polisher",
+        confidence=0.9,
+    )
+    create_memory(
+        "project",
+        "Cam amplitude and offset are mechanically set by operator and read via encoders; no actuators control them.",
+        project="p06-polisher",
+        confidence=0.5,
+    )
+
+    text, _ = get_memories_for_context(
+        memory_types=["project"],
+        project="p06-polisher",
+        budget=300,
+        query="how is cam amplitude controlled on the polisher",
+    )
+
+    assert "encoders" in text
+
+
 def test_expire_stale_candidates_keeps_reinforced(isolated_db):
     from atocore.memory.service import create_memory, expire_stale_candidates
     from atocore.models.database import get_connection
|||||||
@@ -5,6 +5,7 @@ import json
|
|||||||
import atocore.config as config
|
import atocore.config as config
|
||||||
from atocore.projects.registry import (
|
from atocore.projects.registry import (
|
||||||
build_project_registration_proposal,
|
build_project_registration_proposal,
|
||||||
|
derive_project_id_for_path,
|
||||||
get_registered_project,
|
get_registered_project,
|
||||||
get_project_registry_template,
|
get_project_registry_template,
|
||||||
list_registered_projects,
|
list_registered_projects,
|
||||||
@@ -103,6 +104,98 @@ def test_project_registry_resolves_alias(tmp_path, monkeypatch):
     assert project.project_id == "p05-interferometer"


+def test_derive_project_id_for_path_uses_registered_roots(tmp_path, monkeypatch):
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
+    project_dir.mkdir(parents=True)
+    drive_dir.mkdir()
+    config_dir.mkdir()
+    note = project_dir / "status.md"
+    note.write_text("# Status\n\nCurrent work.", encoding="utf-8")
+
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "p04-gigabit",
+                        "aliases": ["p04"],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
+                        ],
+                    }
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+
+    original_settings = config.settings
+    try:
+        config.settings = config.Settings()
+        assert derive_project_id_for_path(note) == "p04-gigabit"
+        assert derive_project_id_for_path(tmp_path / "elsewhere.md") == ""
+    finally:
+        config.settings = original_settings
+
+
+def test_project_registry_rejects_cross_project_ingest_root_overlap(tmp_path, monkeypatch):
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    vault_dir.mkdir()
+    drive_dir.mkdir()
+    config_dir.mkdir()
+
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "parent",
+                        "aliases": [],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/parent"}
+                        ],
+                    },
+                    {
+                        "id": "child",
+                        "aliases": [],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/parent/child"}
+                        ],
+                    },
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+
+    original_settings = config.settings
+    try:
+        config.settings = config.Settings()
+        try:
+            list_registered_projects()
+        except ValueError as exc:
+            assert "ingest root overlap" in str(exc)
+        else:
+            raise AssertionError("Expected overlapping ingest roots to raise")
+    finally:
+        config.settings = original_settings
+
+
 def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
     vault_dir = tmp_path / "vault"
     drive_dir = tmp_path / "drive"
@@ -133,8 +226,8 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
     calls = []

-    def fake_ingest_folder(path, purge_deleted=True):
-        calls.append((str(path), purge_deleted))
+    def fake_ingest_folder(path, purge_deleted=True, project_id=""):
+        calls.append((str(path), purge_deleted, project_id))
         return [{"file": str(path / "README.md"), "status": "ingested"}]

     monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -144,7 +237,7 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
     original_settings = config.settings
     try:
         config.settings = config.Settings()
-        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fake_ingest_folder)
+        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fake_ingest_folder)
         result = refresh_registered_project("polisher")
     finally:
         config.settings = original_settings
@@ -153,6 +246,7 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
     assert len(calls) == 1
     assert calls[0][0].endswith("p06-polisher")
     assert calls[0][1] is False
+    assert calls[0][2] == "p06-polisher"
     assert result["roots"][0]["status"] == "ingested"
     assert result["status"] == "ingested"
     assert result["roots_ingested"] == 1
@@ -188,7 +282,7 @@ def test_refresh_registered_project_reports_nothing_to_ingest_when_all_missing(
         encoding="utf-8",
     )

-    def fail_ingest_folder(path, purge_deleted=True):
+    def fail_ingest_folder(path, purge_deleted=True, project_id=""):
         raise AssertionError(f"ingest_folder should not be called for missing root: {path}")

     monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -198,7 +292,7 @@ def test_refresh_registered_project_reports_nothing_to_ingest_when_all_missing(
     original_settings = config.settings
     try:
         config.settings = config.Settings()
-        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fail_ingest_folder)
+        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fail_ingest_folder)
         result = refresh_registered_project("ghost")
     finally:
         config.settings = original_settings
@@ -238,7 +332,7 @@ def test_refresh_registered_project_reports_partial_status(tmp_path, monkeypatch):
         encoding="utf-8",
     )

-    def fake_ingest_folder(path, purge_deleted=True):
+    def fake_ingest_folder(path, purge_deleted=True, project_id=""):
         return [{"file": str(path / "README.md"), "status": "ingested"}]

     monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -248,7 +342,7 @@ def test_refresh_registered_project_reports_partial_status(tmp_path, monkeypatch):
     original_settings = config.settings
     try:
         config.settings = config.Settings()
-        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fake_ingest_folder)
+        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fake_ingest_folder)
         result = refresh_registered_project("mixed")
     finally:
         config.settings = original_settings
@@ -70,8 +70,28 @@ def test_retrieve_skips_stale_vector_entries(tmp_data_dir, sample_markdown, monkeypatch):


 def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+
     class FakeStore:
         def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 8
             return {
                 "ids": [["chunk-a", "chunk-b"]],
                 "documents": [["project doc", "other doc"]],
@@ -102,22 +122,501 @@ def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
     )
     monkeypatch.setattr(
         "atocore.retrieval.retriever.get_registered_project",
-        lambda project_name: type(
-            "Project",
-            (),
-            {
-                "project_id": "p04-gigabit",
-                "aliases": ("p04", "gigabit"),
-                "ingest_roots": (),
-            },
-        )(),
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
     )

     results = retrieve("mirror architecture", top_k=2, project_hint="p04")

-    assert len(results) == 2
     assert results[0].chunk_id == "chunk-a"
-    assert results[0].score > results[1].score
+    assert len(results) == 1
+
+
+def test_retrieve_project_scope_allows_unowned_global_chunks(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-a", "chunk-global"]],
+                "documents": [["project doc", "global doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/pkm/_index.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "P04",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "shared/engineering-rules.md",
+                        "tags": "[]",
+                        "title": "Shared engineering rules",
+                        "document_id": "doc-global",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-global"]
+
+
+def test_retrieve_project_scope_filter_can_be_disabled(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 2
+            return {
+                "ids": [["chunk-a", "chunk-b"]],
+                "documents": [["project doc", "other project doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/pkm/_index.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "P04",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/pkm/_index.md",
+                        "tags": '["p05-interferometer"]',
+                        "title": "P05",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.2]],
+            }
+
+    monkeypatch.setattr("atocore.config.settings.rank_project_scope_filter", False)
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert {r.chunk_id for r in results} == {"chunk-a", "chunk-b"}
+
+
+def test_retrieve_project_scope_ignores_title_for_ownership(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p06-polisher",
+            "aliases": ("p06", "polisher", "p11"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-poisoned-title"]],
+                "documents": [["p04 doc", "p06 doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/pkm/_index.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "P04",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p06-polisher/pkm/architecture.md",
+                        "tags": '["p06-polisher"]',
+                        "title": "GigaBIT M1 mirror lessons",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.19]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-target"]
+
+
+def test_retrieve_project_scope_uses_path_segments_not_substrings(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+    abb_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "abb-space",
+            "aliases": ("abb",),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-global"]],
+                "documents": [["p05 doc", "global doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/pkm/_index.md",
+                        "tags": '["p05-interferometer"]',
+                        "title": "P05",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Abbreviation notes",
+                        "source_file": "shared/cabbage-abbreviations.md",
+                        "tags": "[]",
+                        "title": "ABB-style abbreviations",
+                        "document_id": "doc-global",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, abb_project],
+    )
+
+    results = retrieve("abbreviations", top_k=2, project_hint="p05")
+
+    assert [r.chunk_id for r in results] == ["chunk-target", "chunk-global"]
+
+
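A minimal sketch of the segment-based ownership check this test pins down. `owning_project_for_path` and its inputs are illustrative names; the assumption is simple whole-segment matching rather than substring matching.

# Hypothetical sketch, not the shipped retriever: ownership is decided by
# whole path segments, so "p05-interferometer" owns
# "p05-interferometer/pkm/_index.md", while the alias "abb" does NOT own
# "shared/cabbage-abbreviations.md" even though "abb" is a substring of it.
from pathlib import PurePosixPath


def owning_project_for_path(source_file: str, projects) -> str:
    segments = set(PurePosixPath(source_file).parts)
    for project in projects:
        names = {project.project_id, *project.aliases}
        if segments & names:
            return project.project_id
    return ""  # no owner: treat as global/unowned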
+def test_retrieve_project_scope_prefers_exact_project_id(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p06-polisher",
+            "aliases": ("p06", "polisher"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-other", "chunk-global"]],
+                "documents": [["target doc", "other doc", "global doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "legacy/unhelpful-path.md",
+                        "tags": "[]",
+                        "title": "Target",
+                        "project_id": "p04-gigabit",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/title-poisoned.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "Looks target-owned but is explicit p06",
+                        "project_id": "p06-polisher",
+                        "document_id": "doc-b",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "shared/global.md",
+                        "tags": "[]",
+                        "title": "Shared",
+                        "project_id": "",
+                        "document_id": "doc-global",
+                    },
+                ]],
+                "distances": [[0.2, 0.19, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=3, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-target", "chunk-global"]
+
+
+def test_retrieve_empty_project_id_falls_back_to_path_ownership(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-other"]],
+                "documents": [["target doc", "other doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/status.md",
+                        "tags": "[]",
+                        "title": "Target",
+                        "project_id": "",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/status.md",
+                        "tags": "[]",
+                        "title": "Other",
+                        "project_id": "",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.19]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-target"]
+
+
+def test_retrieve_unknown_project_hint_does_not_widen_or_filter(monkeypatch):
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 2
+            return {
+                "ids": [["chunk-a", "chunk-b"]],
+                "documents": [["doc a", "doc b"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "project-a/file.md",
+                        "tags": "[]",
+                        "title": "A",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "project-b/file.md",
+                        "tags": "[]",
+                        "title": "B",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: None,
+    )
+
+    results = retrieve("overview", top_k=2, project_hint="unknown-project")
+
+    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-b"]
+
+
+def test_retrieve_fails_open_when_project_scope_resolution_fails(monkeypatch):
+    warnings = []
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 2
+            return {
+                "ids": [["chunk-a", "chunk-b"]],
+                "documents": [["doc a", "doc b"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/file.md",
+                        "tags": "[]",
+                        "title": "A",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/file.md",
+                        "tags": "[]",
+                        "title": "B",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: (_ for _ in ()).throw(ValueError("registry overlap")),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.log.warning",
+        lambda event, **kwargs: warnings.append((event, kwargs)),
+    )
+
+    results = retrieve("overview", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-b"]
+    assert {warning[0] for warning in warnings} == {
+        "project_scope_resolution_failed",
+        "project_match_boost_resolution_failed",
+    }
+    assert all("registry overlap" in warning[1]["error"] for warning in warnings)
+
+
 def test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths(monkeypatch):
398	tests/test_v1_0_write_invariants.py	Normal file
@@ -0,0 +1,398 @@
"""V1-0 write-time invariant tests.

Covers the Engineering V1 completion plan Phase V1-0 acceptance:
- F-1 shared-header fields: extractor_version + canonical_home + hand_authored
  land in the entities table with working defaults
- F-8 provenance enforcement: create_entity raises without source_refs
  unless hand_authored=True
- F-5 synchronous conflict-detection hook on any active-entity write
  (create_entity with status="active" + the pre-existing promote_entity
  path); fail-open per conflict-model.md:256
- Q-3 "flag, never block": a conflict never 4xx-blocks the write
- Q-4 partial trust: get_entities scope_only filters candidates out

Plan: docs/plans/engineering-v1-completion-plan.md
Spec: docs/architecture/engineering-v1-acceptance.md
"""

from __future__ import annotations

import pytest

from atocore.engineering.service import (
    EXTRACTOR_VERSION,
    create_entity,
    create_relationship,
    get_entities,
    get_entity,
    init_engineering_schema,
    promote_entity,
    supersede_entity,
)
from atocore.models.database import get_connection, init_db
# ---------- F-1: shared-header fields ----------


def test_entity_row_has_shared_header_fields(tmp_data_dir):
    init_db()
    init_engineering_schema()
    with get_connection() as conn:
        cols = {row["name"] for row in conn.execute("PRAGMA table_info(entities)").fetchall()}
    assert "extractor_version" in cols
    assert "canonical_home" in cols
    assert "hand_authored" in cols


def test_created_entity_has_default_extractor_version_and_canonical_home(tmp_data_dir):
    init_db()
    init_engineering_schema()
    e = create_entity(
        entity_type="component",
        name="Pivot Pin",
        project="p04-gigabit",
        source_refs=["test:fixture"],
    )
    assert e.extractor_version == EXTRACTOR_VERSION
    assert e.canonical_home == "entity"
    assert e.hand_authored is False

    # round-trip through get_entity to confirm the row mapper returns
    # the same values (not just the return-by-construct path)
    got = get_entity(e.id)
    assert got is not None
    assert got.extractor_version == EXTRACTOR_VERSION
    assert got.canonical_home == "entity"
    assert got.hand_authored is False


def test_explicit_extractor_version_is_persisted(tmp_data_dir):
    init_db()
    init_engineering_schema()
    e = create_entity(
        entity_type="decision",
        name="Pick GF-PTFE pads",
        project="p04-gigabit",
        source_refs=["interaction:abc"],
        extractor_version="custom-v2.3",
    )
    got = get_entity(e.id)
    assert got.extractor_version == "custom-v2.3"
# ---------- F-8: provenance enforcement ----------


def test_create_entity_without_provenance_raises(tmp_data_dir):
    init_db()
    init_engineering_schema()
    with pytest.raises(ValueError, match="source_refs required"):
        create_entity(
            entity_type="component",
            name="No Provenance",
            project="p04-gigabit",
            hand_authored=False,  # explicit — bypasses the test-conftest auto-flag
        )


def test_create_entity_with_hand_authored_needs_no_source_refs(tmp_data_dir):
    init_db()
    init_engineering_schema()
    e = create_entity(
        entity_type="component",
        name="Human Entry",
        project="p04-gigabit",
        hand_authored=True,
    )
    assert e.hand_authored is True
    got = get_entity(e.id)
    assert got.hand_authored is True
    # source_refs stays empty — the hand_authored flag IS the provenance
    assert got.source_refs == []


def test_create_entity_with_empty_source_refs_list_is_treated_as_missing(tmp_data_dir):
    init_db()
    init_engineering_schema()
    with pytest.raises(ValueError, match="source_refs required"):
        create_entity(
            entity_type="component",
            name="Empty Refs",
            project="p04-gigabit",
            source_refs=[],
            hand_authored=False,
        )
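A minimal sketch of the F-8 guard these tests describe, assuming the simplest possible check inside `create_entity`; the helper name and parameter handling are illustrative, not the shipped service code.

# Hypothetical sketch of the F-8 invariant: machine-extracted entities must
# carry provenance; hand-authored ones may omit it, but only by saying so
# explicitly via the hand_authored flag.
def _check_provenance(source_refs, hand_authored: bool) -> None:
    if not hand_authored and not source_refs:
        raise ValueError("source_refs required unless hand_authored=True")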
def test_promote_rejects_legacy_candidate_without_provenance(tmp_data_dir):
    """Regression (Codex V1-0 probe): candidate rows can exist in the DB
    from before V1-0 enforcement (or from paths that bypass create_entity).
    promote_entity must re-check the invariant and refuse to flip a
    no-provenance candidate to active. Without this check, the active
    store can leak F-8 violations in from legacy data."""
    init_db()
    init_engineering_schema()

    # Simulate a pre-V1-0 candidate by inserting directly into the table,
    # bypassing the service-layer invariant. Real legacy rows look exactly
    # like this: empty source_refs, hand_authored=0.
    import uuid as _uuid
    entity_id = str(_uuid.uuid4())
    with get_connection() as conn:
        conn.execute(
            "INSERT INTO entities (id, entity_type, name, project, "
            "description, properties, status, confidence, source_refs, "
            "extractor_version, canonical_home, hand_authored, "
            "created_at, updated_at) "
            "VALUES (?, 'component', 'Legacy Orphan', 'p04-gigabit', "
            "'', '{}', 'candidate', 1.0, '[]', '', 'entity', 0, "
            "CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)",
            (entity_id,),
        )

    with pytest.raises(ValueError, match="source_refs required"):
        promote_entity(entity_id)

    # And the row stays a candidate — no half-transition.
    got = get_entity(entity_id)
    assert got is not None
    assert got.status == "candidate"
def test_api_promote_returns_400_on_legacy_no_provenance(tmp_data_dir):
    """R14 (Codex, 2026-04-22): the HTTP promote route must translate
    the V1-0 ValueError for no-provenance candidates into 400, not 500.
    Previously the route didn't catch ValueError, so legacy bad
    candidates surfaced as a server error."""
    init_db()
    init_engineering_schema()

    import uuid as _uuid
    from fastapi.testclient import TestClient
    from atocore.main import app

    entity_id = str(_uuid.uuid4())
    with get_connection() as conn:
        conn.execute(
            "INSERT INTO entities (id, entity_type, name, project, "
            "description, properties, status, confidence, source_refs, "
            "extractor_version, canonical_home, hand_authored, "
            "created_at, updated_at) "
            "VALUES (?, 'component', 'Legacy HTTP', 'p04-gigabit', "
            "'', '{}', 'candidate', 1.0, '[]', '', 'entity', 0, "
            "CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)",
            (entity_id,),
        )

    client = TestClient(app)
    r = client.post(f"/entities/{entity_id}/promote")
    assert r.status_code == 400
    assert "source_refs required" in r.json().get("detail", "")

    # Row still candidate — the 400 didn't half-transition.
    got = get_entity(entity_id)
    assert got is not None
    assert got.status == "candidate"
def test_promote_accepts_candidate_flagged_hand_authored(tmp_data_dir):
    """The other side of the promote re-check: hand_authored=1 with
    empty source_refs still lets promote succeed, matching
    create_entity's symmetry."""
    init_db()
    init_engineering_schema()

    import uuid as _uuid
    entity_id = str(_uuid.uuid4())
    with get_connection() as conn:
        conn.execute(
            "INSERT INTO entities (id, entity_type, name, project, "
            "description, properties, status, confidence, source_refs, "
            "extractor_version, canonical_home, hand_authored, "
            "created_at, updated_at) "
            "VALUES (?, 'component', 'Hand Authored Candidate', "
            "'p04-gigabit', '', '{}', 'candidate', 1.0, '[]', '', "
            "'entity', 1, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)",
            (entity_id,),
        )

    assert promote_entity(entity_id) is True
    assert get_entity(entity_id).status == "active"
# ---------- F-5: synchronous conflict-detection hook ----------


def test_active_create_runs_conflict_detection_hook(tmp_data_dir, monkeypatch):
    """status=active writes trigger detect_conflicts_for_entity."""
    init_db()
    init_engineering_schema()

    called_with: list[str] = []

    def _fake_detect(entity_id: str):
        called_with.append(entity_id)
        return []

    import atocore.engineering.conflicts as conflicts_mod
    monkeypatch.setattr(conflicts_mod, "detect_conflicts_for_entity", _fake_detect)

    e = create_entity(
        entity_type="component",
        name="Active With Hook",
        project="p04-gigabit",
        source_refs=["test:hook"],
        status="active",
    )

    assert called_with == [e.id]
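A minimal sketch of the synchronous, fail-open hook shape these F-5/Q-3 tests assert; the wrapper name and the structured-logger call are illustrative assumptions.

# Hypothetical sketch: run conflict detection synchronously on active-entity
# writes, but never let a detector failure block the write itself
# (Q-3 flag-never-block).
def _run_conflict_hook(entity_id: str, detect, log_warning) -> None:
    try:
        detect(entity_id)
    except Exception as exc:
        log_warning("conflict_detection_failed", entity_id=entity_id, error=str(exc))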
def test_supersede_runs_conflict_detection_on_new_active(tmp_data_dir, monkeypatch):
    """Regression (Codex V1-0 probe): per plan's 'every active-entity
    write path', supersede_entity must trigger synchronous conflict
    detection. The subject is the `superseded_by` entity — the one
    whose graph state just changed because a new `supersedes` edge was
    rooted at it."""
    init_db()
    init_engineering_schema()

    old = create_entity(
        entity_type="component",
        name="Old Pad",
        project="p04-gigabit",
        source_refs=["test:old"],
        status="active",
    )
    new = create_entity(
        entity_type="component",
        name="New Pad",
        project="p04-gigabit",
        source_refs=["test:new"],
        status="active",
    )

    called_with: list[str] = []

    def _fake_detect(entity_id: str):
        called_with.append(entity_id)
        return []

    import atocore.engineering.conflicts as conflicts_mod
    monkeypatch.setattr(conflicts_mod, "detect_conflicts_for_entity", _fake_detect)

    assert supersede_entity(old.id, superseded_by=new.id) is True

    # The detector fires on the `superseded_by` entity — the one whose
    # edges just grew a new `supersedes` relationship.
    assert new.id in called_with
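
# Illustrative sketch: on supersede, the detection subject is the entity that remains
# active. `new` just gained a `supersedes` edge, while `old` is being retired, so
# re-checking `old` would be wasted work. _supersede_with_hook is hypothetical;
# supersede_entity is the real helper used above and _after_entity_write is the sketch
# above.
def _supersede_with_hook(old_id: str, new_id: str) -> bool:
    ok = supersede_entity(old_id, superseded_by=new_id)
    if ok:
        _after_entity_write(new_id, status="active")  # subject = superseded_by
    return ok
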
def test_supersede_hook_fails_open(tmp_data_dir, monkeypatch):
    """Supersede must survive a broken detector per Q-3 flag-never-block."""
    init_db()
    init_engineering_schema()

    old = create_entity(
        entity_type="component", name="Old2", project="p04-gigabit",
        source_refs=["test:old"], status="active",
    )
    new = create_entity(
        entity_type="component", name="New2", project="p04-gigabit",
        source_refs=["test:new"], status="active",
    )

    def _boom(entity_id: str):
        raise RuntimeError("synthetic detector failure")

    import atocore.engineering.conflicts as conflicts_mod
    monkeypatch.setattr(conflicts_mod, "detect_conflicts_for_entity", _boom)

    # The supersede still succeeds despite the detector blowing up.
    assert supersede_entity(old.id, superseded_by=new.id) is True
    assert get_entity(old.id).status == "superseded"
def test_candidate_create_does_not_run_conflict_hook(tmp_data_dir, monkeypatch):
    """status=candidate writes do NOT trigger detection — the hook is
    for active rows only, per V1-0 scope. Candidates are checked at
    promote time."""
    init_db()
    init_engineering_schema()

    called: list[str] = []

    def _fake_detect(entity_id: str):
        called.append(entity_id)
        return []

    import atocore.engineering.conflicts as conflicts_mod
    monkeypatch.setattr(conflicts_mod, "detect_conflicts_for_entity", _fake_detect)

    create_entity(
        entity_type="component",
        name="Candidate No Hook",
        project="p04-gigabit",
        source_refs=["test:cand"],
        status="candidate",
    )

    assert called == []
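
# Illustrative sketch of "checked at promote time": promotion is itself an active-entity
# write, so the same hook fires once the row flips to active. _promote_with_hook is
# hypothetical; promote_entity is the real helper used in the promote tests above and
# _after_entity_write is the earlier sketch. The real promote path may wire this
# differently.
def _promote_with_hook(entity_id: str) -> bool:
    ok = promote_entity(entity_id)  # candidate -> active
    if ok:
        _after_entity_write(entity_id, status="active")
    return ok
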
# ---------- Q-3: flag, never block ----------


def test_conflict_detector_failure_does_not_block_write(tmp_data_dir, monkeypatch):
    """Per conflict-model.md:256: detection errors must not fail the
    write. The entity is still created; only a warning is logged."""
    init_db()
    init_engineering_schema()

    def _boom(entity_id: str):
        raise RuntimeError("synthetic detector failure")

    import atocore.engineering.conflicts as conflicts_mod
    monkeypatch.setattr(conflicts_mod, "detect_conflicts_for_entity", _boom)

    # The write still succeeds — no exception propagates.
    e = create_entity(
        entity_type="component",
        name="Hook Fails Open",
        project="p04-gigabit",
        source_refs=["test:failopen"],
        status="active",
    )
    assert get_entity(e.id) is not None
# ---------- Q-4 (partial): trust-hierarchy — scope_only filters candidates ----------


def test_scope_only_active_does_not_return_candidates(tmp_data_dir):
    """V1-0 partial Q-4: active-scoped listing never returns candidates.
    Full trust-hierarchy coverage (no-auto-project-state, etc.) ships in
    V1-E per plan."""
    init_db()
    init_engineering_schema()

    active = create_entity(
        entity_type="component",
        name="Active Alpha",
        project="p04-gigabit",
        source_refs=["test:alpha"],
        status="active",
    )
    candidate = create_entity(
        entity_type="component",
        name="Candidate Beta",
        project="p04-gigabit",
        source_refs=["test:beta"],
        status="candidate",
    )

    listed = get_entities(project="p04-gigabit", status="active", scope_only=True)
    ids = {e.id for e in listed}
    assert active.id in ids
    assert candidate.id not in ids
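
# Illustrative sketch of the Q-4 partial guarantee: with scope_only=True the listing is
# driven by an explicit status filter, so candidate rows cannot leak into an
# active-scoped read. _scoped_active is hypothetical and the query is only a plausible
# shape of what get_entities(project=..., status="active", scope_only=True) resolves to;
# the real query may differ. get_connection and the entities columns are as used above.
def _scoped_active(project: str) -> list:
    with get_connection() as conn:
        rows = conn.execute(
            "SELECT id FROM entities WHERE project = ? AND status = 'active'",
            (project,),
        ).fetchall()
    return rows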