docs(ledger): V1-A landed and deployed

build 23cdb31. Live harness 20/20. V1-A status in master-plan-status.md flipped to ✅; V1-B is the next phase. 605 tests on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(engineering): V1-A — Q-001 subsystem-scoped + pillar query integration test
2026-04-29 13:11:07 -04:00 · 2026-04-29 13:03:58 -04:00 · 2026-04-29 11:41:21 -04:00 · 2026-04-29 10:32:41 -04:00 · 2026-04-28 22:23:09 -04:00 · 2026-04-28 22:16:18 -04:00
29 changed files with 3099 additions and 152 deletions
--- a/DEV-LEDGER.md
+++ b/DEV-LEDGER.md
@@ -6,26 +6,26 @@

 ## Orientation

- **live_sha** (Dalidou `/health` build_sha): `2b86543` (verified 2026-04-23T15:20:53Z post-R14 deploy; status=ok)
- **last_updated**: 2026-04-23 by Claude (R14 squash-merged + deployed; Orientation refreshed)
- **main_tip**: `2b86543`
- **test_count**: 548 (547 + 1 R14 regression test)
- **harness**: `17/18 PASS` on live Dalidou (p04-constraints expects "Zerodur" — known content gap, not regression; consistent since 2026-04-19)
+- **live_sha** (Dalidou `/health` build_sha): `23cdb31` (verified 2026-04-29T17:04:05Z post V1-A deploy; status=ok)
+- **last_updated**: 2026-04-29 by Claude (Wave 1 + Wave 1.5 + V1-A all squash-merged and deployed today)
+- **main_tip**: `23cdb31`
+- **test_count**: 605 on `main` (572 + 14 Wave 1 + 10 Wave 1.5 + 9 V1-A)
+- **harness**: `20/20 PASS` on live Dalidou against `23cdb31`, 0 blocking failures, 0 known issues
 - **vectors**: 33,253
- **active_memories**: 784 (up from 84 pre-density-batch — density gate CRUSHED vs V1-A's 100-target)
- **candidate_memories**: 2 (triage queue drained)
- **interactions**: 500+ (limit=2000 query returned 500 — density batch has been running; actual may be higher, confirm via /stats next update)
- **registered_projects**: atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08)
- **project_state_entries**: 63 (atocore alone; full cross-project count not re-sampled this update)
- **entities**: 66 (up from 35 — V1-0 backfill + ongoing work; 0 open conflicts)
+- **active_memories**: 1170 dashboard (live SQL post-fix) vs 1091 integrity-panel snapshot (last nightly 2026-04-28T03:00:30Z; difference is ~80 captures since the snapshot — no longer a sampling artifact). Total memories: 2436. Pre-deploy the dashboard reported 315 due to a confidence-sorted limit=500 sampling bug, now closed.
+- **candidate_memories**: 44 (live as of 2026-04-29T02:00Z; queue accumulated since the 2026-04-28 nightly autotriage)
+- **interactions**: 1054 (claude-code 474, openclaw 576; verified `/admin/dashboard` 2026-04-29)
+- **registered_projects** (8): atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08), **apm** (registered 2026-04-29 — Atomaste Parts Manager I03), **openclaw** (registered 2026-04-29 — external LLM agent harness, read-only pull integration; alias `clawd`). `/admin/projects/proposals?min_active=10` is now empty.
+- **project_state_entries**: 128 across registered projects (verified 2026-04-29)
+- **entities**: 66 (V1-0 backfill complete; 0 open conflicts)
 - **off_host_backup**: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron, verified
- **nightly_pipeline**: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → auto-triage → **auto-promote/expire (NEW)** → weekly synth/lint Sundays → **retrieval harness (NEW)** → **pipeline summary (NEW)**
+- **nightly_pipeline**: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → auto-triage → auto-promote/expire → weekly synth/lint Sundays → retrieval harness → pipeline summary
 - **capture_clients**: claude-code (Stop hook + cwd project inference), openclaw (before_agent_start + llm_output plugin, verified live)
 - **wiki**: http://dalidou:8100/wiki (browse), /wiki/projects/{id}, /wiki/entities/{id}, /wiki/search
- **dashboard**: http://dalidou:8100/admin/dashboard (now shows pipeline health, interaction totals by client, all registered projects)
- **active_track**: Engineering V1 Completion (started 2026-04-22). V1-0 landed (`2712c5d`). V1-A density gate CLEARED (784 active ≫ 100 target as of 2026-04-23). V1-A soak gate at day 5/~7 (F4 first run 2026-04-19; nightly clean 2026-04-19 through 2026-04-23; failures confined to the known p04-constraints content gap). Plan: `docs/plans/engineering-v1-completion-plan.md`. Resume map: `docs/plans/v1-resume-state.md`.
- **last_nightly_pipeline**: `2026-04-23T03:00:20Z` — harness 17/18, triage promoted=3 rejected=7 human=0, dedup 7 clusters (1 tier1 + 6 tier2 auto-merged), graduation 30-skipped 0-graduated 0-errors, auto-triage drained the queue (0 new candidates 2026-04-22T00:52Z run)
- **open_branches**: none — R14 squash-merged as `0989fed` and deployed 2026-04-23T15:20:53Z. V1-A is the next scheduled work
+- **dashboard**: http://dalidou:8100/admin/dashboard (now reports SQL-aggregate counts; new fields: `memories.total`, `memories.by_status`)
+- **active_track**: Wave 1 + Wave 1.5 + V1-A closed → V1-B is the next phase (KB-CAD/KB-FEM ingest + D-2 schema docs per the V1 completion plan). Plan: `docs/plans/engineering-v1-completion-plan.md`. Resume map: `docs/plans/v1-resume-state.md`.
+- **last_nightly_pipeline**: `2026-04-28T03:00:30Z` — harness 20/20, triage promoted=1 rejected=1 human=0
+- **open_branches**: none. Wave 1 (`4c70756`), Wave 1.5 (`b69d2c7`), and V1-A (`23cdb31`) all squash-merged and deployed today; branches cleaned up.

 ## Active Plan

@@ -170,6 +170,34 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha

 ## Session Log

+- **2026-04-29 Codex + Claude (V1-A landed and deployed)** Engineering V1 Completion Plan Phase V1-A done. Branch `claude/v1-a-pillar-query-slice` shipped the Q-001 subsystem-scoped shape fix exactly per `engineering-query-catalog.md:71`: new `subsystem_contents()` helper in `queries.py` walking inbound `part_of` edges, new `GET /entities/Subsystem/{id}?expand=contains` route returning `{subsystem, contains: [{id, type, entity_type, name, status}]}`, dual-emit of `type` (spec) and `entity_type` (rest-of-module convention) for compat. The existing project-wide `/engineering/projects/{name}/systems` route stays for Q-004. Aliased under `/v1`. V1-A acceptance integration test runs all four pillar queries (Q-001 subsystem, Q-005 requirements_for, Q-006 orphan_requirements, Q-017 evidence_chain) plus Q-009/Q-011 against a single p05-interferometer seed graph. Codex audited at `b575773` (GO WITH CONDITIONS, two P2s + two P3s) → all four amends folded in at `0e83525` → re-review GO. Squash-merged to main as `23cdb31`. Deployed via canonical script; live `/health` build_sha `23cdb3149fde2185f0eaf1350c5b065b08f80319`, build_time `2026-04-29T17:04:05Z`, status=ok. **Live post-deploy probes** all green: `/entities/Subsystem/{p05-id}?expand=contains` → 200, spec-shaped response; `/v1/entities/Subsystem/{p05-id}?expand=contains` → 200; `?expand=parents` → 400 with informative detail; component-id against the Subsystem route → 404 with informative detail. Live retrieval harness 20/20 against `23cdb31`. Tests on main: 596 → 605 (+9). master-plan-status.md V1-A status flipped to ✅; V1-B is now "next".
+
+- **2026-04-29 Antoine + Claude (openclaw registered, proposals queue drained)** Talked through the framing tension — openclaw is "a tool" but external (we don't own its product), so the 17 memories are really integration notes (plugin architecture, `cleanedBody` hook field, read-only pull contract, hermes-agent comparison, atocore-capture v0.2.0 plugin). Picked Option A from the four options laid out: register `openclaw` straight (zero migration) with description that names the integration framing explicitly. Aliases: `clawd` (matches local repo path on Dalidou + T420). POSTed to `/admin/projects/register-emerging`. Live `/admin/projects/proposals?min_active=10` now returns count=0 — registered project set is 8, no unregistered labels above threshold. Wave 1.5 closed end-to-end: code + audit + deploy + both registrations.
+
+- **2026-04-29 Antoine + Claude (apm registered)** Antoine confirmed apm is a product alongside atomizer-v2 (I03 in the Atomaste tools I-series). Registered via `POST /admin/projects/register-emerging` mirroring atomizer-v2's pattern: aliases `i03`, `atomaste-parts-manager`, `parts-manager`; description "Atomaste Parts Manager — parts/COTS lifecycle tool (I03 in Atomaste tool series alongside I01 Atomizer-v2 and I02 AtoCore)". Default ingest_root `vault:incoming/projects/apm/` (does not yet exist on disk; may want a `/repo` mirror via PUT /projects/apm if a code repo gets staged, like atomizer-v2's `incoming/projects/atomizer-v2/repo`). Live `/admin/projects/proposals?min_active=10` post-registration: count drops 2→1 (apm gone, openclaw remains). 165 active + 22 candidate apm-tagged memories now resolve to a registered canonical id.
+
+- **2026-04-29 Codex + Claude (Wave 1.5 audited, merged, deployed)** Branch `claude/wave1.5-emerging-project-proposals` shipped `GET /admin/projects/proposals?min_active=N` — live, on-demand companion to the nightly `scripts/detect_emerging.py` cache. Each proposal includes project_id, active+candidate counts, sample_memories (3 most recent), suggested_aliases (sibling labels sharing a >=4-char token, case-insensitive), guessed_ingest_root. New `propose_emerging_projects()` helper in `service.py`. Codex audited tip `e8ac8bb`: verdict GO with one P2 (test was trivially true) and one P3 (case-sensitive tokens). Both folded into `f70fa6b` plus a registered-token-leak guard test. Tip `f70fa6b` squash-merged to main as `b69d2c7`. Deployed via canonical script; live `/health` reports build_sha `b69d2c70881b661e6fd8f25194cbc5c20a3947be`, build_time `2026-04-29T02:16:25Z`, status=ok. Live `/admin/projects/proposals?min_active=10` returns the expected 2 proposals: **apm (165 active / 22 candidate, no aliases)** and **openclaw (17 active / 0 candidate, no aliases)**. Hydrotech and lead-space variants below threshold or fewer captures than the dashboard `by_project` showed pre-deploy (live SQL is current). apm sample memories are clearly real APM project content (parts manager, McMaster cleanup workflows). Tests: 586 → 596 (+10). Branches deleted local + remote. **Open Wave 1.5 follow-on (P2)**: Codex recommended a one-step shortcut so the operator doesn't have to copy `suggested_aliases` from the proposal JSON into the register-emerging POST — e.g., `POST /admin/projects/register-emerging` with `from_proposal: {project_id}` that reads server-side. Deferred — the JSON copy is acceptable for the small number of projects in flight. Next: register apm (operator decision pending).
+
+- **2026-04-29 Claude (Wave 1 deployed, post-deploy probes green)** Squash-merged `claude/wave1-dashboard-counts-and-memory-fixes` (tip `9604c3e`) to main as `4c70756` after Codex's GO verdict on the amended branch. Pushed main, deployed via `ssh papa@dalidou "bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh"`. Live `/health` reports build_sha `4c7075650c8187473bc5992b306fb8fa67542074`, build_time `2026-04-29T01:58:59Z`, status=ok. **Post-deploy acceptance** per Codex's checklist: (1) `/admin/dashboard memories.active = 1170` — was 315 pre-deploy, sampling bug closed; new fields `memories.total = 2436`, `memories.by_status` and full `memories.by_project` breakdown live; integrity panel still reports the 22h-old 1091 snapshot from last nightly which is expected behavior. (2) Live retrieval harness 20/20 PASS against `4c70756`, no regression. (3) Supersede/invalidate route branches probed live: unknown id → 404; candidate target → 409 with status-specific detail (`...is candidate; only active memories can be superseded`, `...is candidate; use /reject for candidates`). (4) Auto-triage suggested-project correction path covered by the script-source invariant test plus end-to-end TestClient coverage of the new `MemoryUpdateRequest.project` plumbing — not exercised against live prod state to avoid casual mutation. **Live by-project dashboard breakdown** revealed 14 unregistered projects accumulating memories: apm (165!), openclaw (17), hydrotech-mining variants (13), lead-space variants (5), and ten others. apm is severely overdue. Wave 1.5 starts next: one-click registration proposal endpoint with sample-memory + alias suggestions. **Test count**: 586 on main. **Open branches**: none — Wave 1 closed.
+
+- **2026-04-29 Codex + Claude (Wave 1 formal audit closed + amends)** Codex's formal audit of `fb4d55c` (Wave 1 first commit on `claude/wave1-dashboard-counts-and-memory-fixes`): verdict GO WITH CONDITIONS. Two P1-prior closures confirmed (dashboard count bug + invalidate top-1 lookup). Project-update P2 was only "partially closed" — API/service plumbing fine, but `scripts/auto_triage.py:417` still PUT `{"content": cand["content"]}` so the operational suggested-project correction was unreachable even with `MemoryUpdateRequest.project` in place. Codex also flagged the symmetric supersede-route gap as same-class adjacent surface and recommended pulling it in here, not in Wave 1.5. Plus one P3: cover retarget-to-empty-project against a global active duplicate. Amended on `3a474f7`: (1) auto_triage PUT body now `{"project": suggested}` with a guard test that lints the script source for the new shape; (2) `/memory/{id}/supersede` mirrors the invalidate guard via `get_memory(id)` — 404 unknown / 200 already_superseded / 409 wrong-status / 200 superseded; (3) regression test for project-empty duplicate detection. Test count 581 → 586. Codex's recommended deployment checklist (post-deploy verifications, including the run-or-simulate auto-triage retarget probe) carried into the merge plan. Awaiting Codex re-review of the amended tip before squash-merge.
+
+- **2026-04-29 Claude (Wave 1 debt-pay started; Codex review of state-of-service plan)** Audited live state on Dalidou: `/health` build_sha `7042eae` (4d old), harness 20/20, 33,253 vectors, 1,748 docs, 1054 interactions (+103/4d), dashboard memories.active=315 vs integrity.active_memory_count=1091. Drafted a state-of-service assessment + Wave 1/2/3/4 plan, then asked Codex (gpt-5.5) for an adversarial review via `codex exec`. Codex verdict: assessment MIXED, plan ENDORSE WITH CHANGES. Codex caught two factual corrections (the 315-vs-1091 gap is a *sampling bug* not a definitional gap; my "R9 drops unregistered tags" framing is wrong — `extractor_llm.py:213-233` preserves them) and two new memory-write-path bugs I missed. Branched `claude/wave1-dashboard-counts-and-memory-fixes` from `7042eae`, fixed all three: (1) `/admin/dashboard` now uses a new `get_memory_count_summary()` SQL aggregate helper instead of counting inside a confidence-sorted `get_memories(limit=500)` sample; (2) `MemoryUpdateRequest` and `update_memory()` accept `project` with `resolve_project_name` canonicalization + before/after audit, so `auto_triage.py:407` suggested-project corrections will now actually apply; (3) `POST /memory/{id}/invalidate` replaces the `_get_memories(status="active", limit=1)` lookup (which only saw the highest-confidence active row) with a direct id lookup via new `get_memory(id)` helper. 9 regression tests added across `test_memory.py` and `test_invalidate_supersede.py`. Full local suite: 581 passed (572 → 581). Commit `fb4d55c`. Branch not pushed/deployed yet — awaiting Codex audit per working model. Refreshed Orientation block (live_sha, last_updated, test_count, memory counts, capture cadence, registered/unregistered project breakdown, open_branches). Wave 1 follow-up still open: (W1.2) one-click registration proposal for unregistered projects with ≥10 active memories (apm=63 is overdue); (W1.3 done by this entry) sync ledger; (W1.4) committed measurable-win probe fixture+JSON output. After this branch lands, V1-A is unblocked: gates have effectively cleared (soak ended 2026-04-26; density 315 ≫ 100 target).
+
+- **2026-04-25 Codex (p04 constraint gap closed; harness fully green)** Root-caused the remaining `p04-constraints` fixture: the `Zerodur` / `1.2` fact already existed in Trusted Project State (`requirement/key_constraints`), but project-state formatting was category/key ordered and then truncated to the 20% state budget, so contacts/decisions consumed the budget before the relevant requirement. Added query-relevance ranking for Trusted Project State entries before formatting/truncation, with regression coverage in `test_project_state_query_relevance_before_truncation`. Removed the fixture's `known_issue` lane so future p04 constraint regressions are blocking. Cleaned up a duplicate live requirement entry created during diagnosis by invalidating `requirement/mirror-blank-core-constraints`; canonical `requirement/key_constraints` remains active. Verified focused suite: 35 passed. Verified full local suite: 572 passed. Deployed `d3de9f67eaa08dfc5b2d86e8221b8c70fef266d3`; live exact p04 probe now surfaces `[REQUIREMENT] key_constraints` with `1.2` and `Zerodur`. Live retrieval harness: 20/20, 0 known issues, 0 blocking failures.
+
+- **2026-04-25 Codex (project_id backfill + retrieval stabilization closed)** Merged `codex/project-id-metadata-retrieval` into `main` (`867a1ab`) and deployed to Dalidou. Took Chroma-inclusive backup `/srv/storage/atocore/backups/snapshots/20260424T154358Z`, then ran `scripts/backfill_chunk_project_ids.py` per project; populated projects `p04-gigabit`, `p05-interferometer`, `p06-polisher`, `atomizer-v2`, and `atocore` applied cleanly for 33,253 vectors total, with 0 missing/malformed and an immediate final dry-run showing 33,253 already tagged / 0 updates. Post-backfill harness exposed p06 memory-ranking misses (`Tailscale`, `encoder`), so Codex shipped `4744c69` then `a87d984 fix(memory): widen query-time context candidates`. Full local suite: 571 passed. Live `/health` reports `a87d9845a8c34395a02890f0cf22aa7a46afaf62`, vectors=33,253, sources_ready=true. Live retrieval harness: 19/20, 0 blocking failures, 1 known issue (`p04-constraints` missing `Zerodur` / `1.2`). A repeat backfill dry-run after the code-only stabilization deploy was aborted after the one-off container ran too long; the live service stayed healthy and the earlier post-apply idempotency result remains the migration acceptance record. Dalidou HTTP push credentials are still not configured; this session pushed through the Windows credential path.
+
+- **2026-04-24 Codex (retrieval boundary deployed + project_id metadata tranche)** Merged `codex/audit-improvements-foundation` to `main` as `f44a211` and pushed to Dalidou Gitea. Took pre-deploy runtime backup `/srv/storage/atocore/backups/snapshots/20260424T144810Z` (DB + registry, no Chroma). Deployed via `papa@dalidou` canonical `deploy/dalidou/deploy.sh`; live `/health` reports build_sha `f44a2114970008a7eec4e7fc2860c8f072914e38`, build_time `2026-04-24T14:48:44Z`, status ok. Post-deploy retrieval harness: 20 fixtures, 19 pass, 0 blocking failures, 1 known issue (`p04-constraints`). The former blocker `p05-broad-status-no-atomizer` now passes. Manual p05 `context-build "current status"` spot check shows no p04/Atomizer source bleed in retrieved chunks. Started follow-up branch `codex/project-id-metadata-retrieval`: registered-project ingestion now writes explicit `project_id` into DB chunk metadata and Chroma vector metadata; retrieval prefers exact `project_id` when present and keeps path/tag matching as legacy fallback; added dry-run-by-default `scripts/backfill_chunk_project_ids.py` to backfill SQLite + Chroma metadata; added tests for project-id ingestion, registered refresh propagation, exact project-id retrieval, and collision fallback. Verified targeted suite (`test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 36 passed. Verified full suite: 556 passed in 72.44s. Branch not merged or deployed yet.
+
+- **2026-04-24 Codex (project_id audit response)** Applied independent-audit fixes on `codex/project-id-metadata-retrieval`. Closed the nightly `/ingest/sources` clobber risk by adding registry-level `derive_project_id_for_path()` and making unscoped `ingest_file()` derive ownership from registered ingest roots when possible; `refresh_registered_project()` still passes the canonical project id directly. Changed retrieval so empty `project_id` falls through to legacy path/tag ownership instead of short-circuiting as unowned. Hardened `scripts/backfill_chunk_project_ids.py`: `--apply` now requires `--chroma-snapshot-confirmed`, runs Chroma metadata updates before SQLite writes, batches updates, skips/report missing vectors, skips/report malformed metadata, reports already-tagged rows, and turns missing ingestion tables into a JSON `db_warning` instead of a traceback. Added tests for auto-derive ingestion, empty-project fallback, ingest-root overlap rejection, and backfill dry-run/apply/snapshot/missing-vector/malformed cases. Verified targeted suite (`test_backfill_chunk_project_ids.py`, `test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 45 passed. Verified full suite: 565 passed in 73.16s. Local dry-run on empty/default data returns 0 updates with `db_warning` rather than crashing. Branch still not merged/deployed.
+
+- **2026-04-24 Codex (project_id final hardening before merge)** Applied the final independent-review P2s on `codex/project-id-metadata-retrieval`: `ingest_file()` still fails open when project-id derivation fails, but now emits `project_id_derivation_failed` with file path and error; retrieval now catches registry failures both at project-scope resolution and the soft project-match boost path, logs warnings, and serves unscoped rather than raising. Added regression tests for both fail-open paths. Verified targeted suite (`test_ingestion.py`, `test_retrieval.py`, `test_backfill_chunk_project_ids.py`, `test_project_registry.py`): 47 passed. Verified full suite: 567 passed in 79.66s. Branch still not merged/deployed.
+
+- **2026-04-24 Codex (audit improvements foundation)** Started implementation of the audit recommendations on branch `codex/audit-improvements-foundation` from `origin/main@c53e61e`. First tranche: registry-aware project-scoped retrieval filtering (`ATOCORE_RANK_PROJECT_SCOPE_FILTER`, widened candidate pull before filtering), eval harness known-issue lane, two p05 project-bleed fixtures, `scripts/live_status.py`, README/current-state/master-plan status refresh. Verified `pytest -q`: 550 passed in 67.11s. Live retrieval harness against undeployed production: 20 fixtures, 18 pass, 1 known issue (`p04-constraints` Zerodur/1.2 content gap), 1 blocking guard (`p05-broad-status-no-atomizer`) still failing because production has not yet deployed the retrieval filter and currently pulls `P04-GigaBIT-M1-KB-design` into broad p05 status context. Live dashboard refresh: health ok, build `2b86543`, docs 1748, chunks/vectors 33253, interactions 948, active memories 289, candidates 0, project_state total 128. Noted count discrepancy: dashboard memories.active=289 while integrity active_memory_count=951; schedule reconciliation in a follow-up.
+
+- **2026-04-24 Codex (independent-audit hardening)** Applied the Opus independent audit's fast follow-ups before merge/deploy. Closed the two P1s by making project-scope ownership path/tag-based only, adding path-segment/tag-exact matching to avoid short-alias substring collisions, and keeping title/heading text out of provenance decisions. Added regression tests for title poisoning, substring collision, and unknown-project fallback. Added retrieval log fields `raw_results_count`, `post_filter_count`, `post_filter_dropped`, and `underfilled`. Added retrieval-eval run metadata (`generated_at`, `base_url`, `/health`) and `live_status.py` auth-token/status support. README now documents the ranking knobs and clarifies that the hard scope filter and soft project match boost are separate controls. Verified `pytest -q`: 553 passed in 66.07s. Live production remains expected-predeploy: 20 fixtures, 18 pass, 1 known content gap, 1 blocking p05 bleed guard. Latest live dashboard: build `2b86543`, docs 1748, chunks/vectors 33253, interactions 950, active memories 290, candidates 0, project_state total 128.
+
 - **2026-04-23 Codex + Claude (R14 closed)** Codex reviewed `claude/r14-promote-400` at `3888db9`, no findings: "The route change is narrowly scoped: `promote_entity()` still returns False for not-found/not-candidate cases, so the existing 404 behavior remains intact, while caller-fixable validation failures now surface as 400." Ran `pytest tests/test_v1_0_write_invariants.py -q` from an isolated worktree: 15 passed in 1.91s. Claude squash-merged to main as `0989fed`, followed by ledger close-out `2b86543`, then deployed via canonical script. Dalidou `/health` reports build_sha=`2b86543e6ad26011b39a44509cc8df3809725171`, build_time `2026-04-23T15:20:53Z`, status=ok. R14 closed. Orientation refreshed earlier this session also reflected the V1-A gate status: **density gate CLEARED** (784 active memories vs 100 target — density batch-extract ran between 2026-04-22 and 2026-04-23 and more than crushed the gate), **soak gate at day 5 of ~7** (F4 first run 2026-04-19; nightly clean 2026-04-19 through 2026-04-23; only chronic failure is the known p04-constraints "Zerodur" content gap). V1-A branches from a clean V1-0 baseline as soon as the soak is called done.

 - **2026-04-22 Codex + Antoine (V1-0 closed)** Codex approved `f16cd52` after re-running both original probes (legacy-candidate promote + supersede hook — both correct) and the three targeted regression suites (`test_v1_0_write_invariants.py`, `test_engineering_v1_phase5.py`, `test_inbox_crossproject.py` — all pass). Squash-merged to main as `2712c5d` ("feat(engineering): enforce V1-0 write invariants"). Deployed to Dalidou via the canonical deploy script; `/health` build_sha=`2712c5d2d03cb2a6af38b559664afd1c4cd0e050` status=ok. Validated backup snapshot at `/srv/storage/atocore/backups/snapshots/20260422T190624Z` taken BEFORE prod backfill. Prod backfill of `scripts/v1_0_backfill_provenance.py` against live DB: dry-run found 31 active/superseded entities with no provenance, list reviewed and looked sane; live run with default `hand_authored=1` flag path updated 31 rows; follow-up dry-run returned 0 rows remaining → no lingering F-8 violations in prod. Codex logged one residual P2 (R14): HTTP `POST /entities/{id}/promote` route doesn't translate the new service-layer `ValueError` into 400 — legacy bad candidate promoted through the API surfaces as 500. Not blocking. V1-0 closed. **Gates for V1-A**: soak window ends ~2026-04-26; 100-active-memory density target (currently 84 active + the ~31 newly flagged ones — need to check how those count in density math). V1-A holds until both gates clear.
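
Illustrative note on the Wave 1 dashboard fix recorded in the session log above: the ledger attributes the 315-vs-1091 gap to counting active rows inside a confidence-sorted `limit=500` sample instead of using a SQL aggregate. The sketch below reproduces that class of bug on a toy in-memory SQLite table. It is not the AtoCore code: the `memories` table layout and the function names `sampled_active_count` and `aggregate_counts` are assumptions made for this example, and the real `get_memory_count_summary()` helper may look quite different.

```python
import sqlite3


def build_demo_db(n_active=700, n_candidate=300):
    """Create a toy table; the schema is hypothetical, not AtoCore's."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories ("
        "id INTEGER PRIMARY KEY, status TEXT, confidence REAL)"
    )
    # Candidates get higher confidence so they crowd the top of the sample.
    rows = [("active", 0.5)] * n_active + [("candidate", 0.9)] * n_candidate
    conn.executemany(
        "INSERT INTO memories (status, confidence) VALUES (?, ?)", rows
    )
    return conn


def sampled_active_count(conn, limit=500):
    """Buggy pattern: count 'active' rows inside a limited, sorted sample."""
    rows = conn.execute(
        "SELECT status FROM memories ORDER BY confidence DESC, id LIMIT ?",
        (limit,),
    ).fetchall()
    return sum(1 for (status,) in rows if status == "active")


def aggregate_counts(conn):
    """Fixed pattern: let SQL aggregate over every row, grouped by status."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM memories GROUP BY status"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print("sampled active:", sampled_active_count(conn))
    print("aggregate:", aggregate_counts(conn))
```

In this toy setup the sampled count undercounts because the limit cuts off rows before the count is taken, while the aggregate sees the whole table regardless of sort order.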
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@ Personal context engine that enriches LLM interactions with durable memory, stru

 ```bash
 pip install -e .
-uvicorn src.atocore.main:app --port 8100
+uvicorn atocore.main:app --port 8100
 ```

 ## Usage
@@ -37,6 +37,10 @@ python scripts/atocore_client.py audit-query "gigabit" 5
 | POST | /ingest | Ingest markdown file or folder |
 | POST | /query | Retrieve relevant chunks |
 | POST | /context/build | Build full context pack |
+| POST | /interactions | Capture prompt/response interactions |
+| GET/POST | /memory | List/create durable memories |
+| GET/POST | /entities | Engineering entity graph surface |
+| GET | /admin/dashboard | Operator dashboard |
 | GET | /health | Health check |
 | GET | /debug/context | Inspect last context pack |

@@ -66,8 +70,10 @@ unversioned forms.
 FastAPI (port 8100)
  |- Ingestion: markdown -> parse -> chunk -> embed -> store
  |- Retrieval: query -> embed -> vector search -> rank
-  |- Context Builder: retrieve -> boost -> budget -> format
-  |- SQLite (documents, chunks, memories, projects, interactions)
+  |- Context Builder: project state -> memories -> entities -> retrieval -> budget
+  |- Reflection: capture -> reinforce -> extract -> triage -> promote/expire
+  |- Engineering: typed entities, relationships, conflicts, wiki/mirror
+  |- SQLite (documents, chunks, memories, projects, interactions, entities)
  '- ChromaDB (vector embeddings)
 ```

@@ -82,6 +88,16 @@ Set via environment variables (prefix `ATOCORE_`):
 | ATOCORE_CHUNK_MAX_SIZE | 800 | Max chunk size (chars) |
 | ATOCORE_CONTEXT_BUDGET | 3000 | Context pack budget (chars) |
 | ATOCORE_EMBEDDING_MODEL | paraphrase-multilingual-MiniLM-L12-v2 | Embedding model |
+| ATOCORE_RANK_PROJECT_MATCH_BOOST | 2.0 | Soft boost for chunks whose metadata matches the project hint |
+| ATOCORE_RANK_PROJECT_SCOPE_FILTER | true | Filter project-hinted retrieval away from other registered project corpora |
+| ATOCORE_RANK_PROJECT_SCOPE_CANDIDATE_MULTIPLIER | 4 | Widen candidate pull before project-scope filtering |
+| ATOCORE_RANK_QUERY_TOKEN_STEP | 0.08 | Per-token boost when query terms appear in high-signal metadata |
+| ATOCORE_RANK_QUERY_TOKEN_CAP | 1.32 | Maximum query-token boost multiplier |
+| ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST | 1.18 | Boost current decision/status/requirements-like paths |
+| ATOCORE_RANK_PATH_LOW_SIGNAL_PENALTY | 0.72 | Down-rank archive/history-like paths |
+
+`ATOCORE_RANK_PROJECT_SCOPE_FILTER` gates the hard cross-project filter only.
+`ATOCORE_RANK_PROJECT_MATCH_BOOST` remains the separate soft-ranking knob.

 ## Testing

@@ -93,7 +109,11 @@ pytest
 ## Operations

 - `scripts/atocore_client.py` provides a live API client for project refresh, project-state inspection, and retrieval-quality audits.
+- `scripts/retrieval_eval.py` runs the live retrieval/context harness, separates blocking failures from known content gaps, and stamps JSON output with target/build metadata.
+- `scripts/live_status.py` renders a compact read-only status report from `/health`, `/stats`, `/projects`, and `/admin/dashboard`; set `ATOCORE_AUTH_TOKEN` or `--auth-token` when those endpoints are gated.
+- `scripts/backfill_chunk_project_ids.py` dry-runs or applies explicit `project_id` metadata backfills for SQLite chunks and Chroma vectors; `--apply` requires a confirmed Chroma snapshot.
 - `docs/operations.md` captures the current operational priority order: retrieval quality, Wave 2 trusted-operational ingestion, AtoDrive scoping, and restore validation.
+- `DEV-LEDGER.md` is the fast-moving source of operational truth during active development; copy claims into docs only after checking the live service.

 ## Architecture Notes

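
Illustrative note on the README configuration rows above: the README states that `ATOCORE_RANK_PROJECT_SCOPE_FILTER` gates a hard cross-project filter while `ATOCORE_RANK_PROJECT_MATCH_BOOST` is a separate soft-ranking knob. The sketch below shows one way those two controls could differ in effect. It is a minimal sketch under stated assumptions, not the AtoCore retrieval code: the `Chunk` type, the `rank` function, and the choice to apply the boost as a score multiplier are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    project_id: str  # "" means no known owner (hypothetical convention)
    score: float


def rank(chunks, project_hint, registered, scope_filter=True, match_boost=2.0):
    """Apply a hard scope filter, then a soft project-match boost.

    Assumption: the boost multiplies the base score; the real ranking
    formula is not shown in the source.
    """
    scored = []
    for chunk in chunks:
        owned_by_other = (
            chunk.project_id in registered and chunk.project_id != project_hint
        )
        if scope_filter and owned_by_other:
            continue  # hard filter: drop chunks owned by another project
        boost = match_boost if chunk.project_id == project_hint else 1.0
        scored.append((chunk.score * boost, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored]


if __name__ == "__main__":
    chunks = [
        Chunk("p04 design note", "p04-gigabit", 0.9),
        Chunk("p05 status", "p05-interferometer", 0.4),
        Chunk("unowned note", "", 0.5),
    ]
    registered = {"p04-gigabit", "p05-interferometer"}
    for c in rank(chunks, "p05-interferometer", registered):
        print(c.text)
```

With the filter on, the higher-scoring chunk from the other registered project never reaches ranking; with the filter off, only the boost separates the results, so a strong off-project chunk can still come first.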
--- a/docs/current-state.md
+++ b/docs/current-state.md
@@ -1,6 +1,18 @@
-# AtoCore — Current State (2026-04-22)
+# AtoCore - Current State (2026-04-25)

-Live deploy: `2712c5d` · Dalidou health: ok · Harness: 17/18 · Tests: 547 passing.
+Update 2026-04-25: project-id chunk/vector metadata is deployed and backfilled,
+and the final p04 Trusted Project State budget/ranking gap is closed. Live
+Dalidou is on `d3de9f6`; `/health` is ok with 33,253 vectors and sources
+ready. Live retrieval harness is 20/20 with 0 blocking failures and 0 known
+issues. Full local suite: 572 passed.
+
+The project-id backfill was applied per populated project after a
+Chroma-inclusive backup at
+`/srv/storage/atocore/backups/snapshots/20260424T154358Z`. The immediate
+post-apply dry-run reported 33,253 already tagged, 0 updates, 0 missing, and 0
+malformed. A later repeat dry-run after the code-only ranking deploy was
+aborted because the one-off container ran too long; the earlier post-apply
+idempotency result remains the migration acceptance record.

 ## V1-0 landed 2026-04-22

@@ -13,9 +25,8 @@ supersede) with Q-3 fail-open. Prod backfill ran cleanly — 31 legacy
 active/superseded entities flagged `hand_authored=1`, follow-up dry-run
 returned 0 remaining rows. Test count 533 → 547 (+14).

-R14 (P2, non-blocking): `POST /entities/{id}/promote` route fix translates
-the new `ValueError` into 400. Branch `claude/r14-promote-400` pending
-Codex review + squash-merge.
+R14 is closed: `POST /entities/{id}/promote` now translates the new
+caller-fixable V1-0 `ValueError` into HTTP 400.

 **Next in the V1 track:** V1-A (minimal query slice + Q-6 killer-correctness
 integration). Gated on pipeline soak (~2026-04-26) + 100+ active memory
@@ -65,10 +76,10 @@ Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs
 | 7G | Re-extraction on prompt version bump | pending |
 | 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |

-## Known gaps (honest)
+## Known gaps (honest, refreshed 2026-04-25)

 1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
-2. **OpenClaw is capture-only, not context-grounded.** The plugin POSTs `/interactions` on `llm_output` but does NOT call `/context/build` on `before_agent_start`. OpenClaw's underlying agent runs blind. See Q2 below.
-3. **Human interface (wiki) is thin and static.** 5 project cards + a "System" line. No dashboard for the autonomous activity. No per-memory detail page. See Q3/Q5.
-4. **Harness 17/18** — the `p04-constraints` fixture wants "Zerodur" but retrieval surfaces related-not-exact terms. Content gap, not a retrieval regression.
-5. **Two projects under-populated**: p05-interferometer (4 memories, 18 state) and atomizer-v2 (1 memory, 6 state). Batch re-extract with the new llm-0.6.0 prompt would help.
+2. **Project-scoped retrieval guard is deployed and passing.** Explicit `project_id` chunk/vector metadata is now present in SQLite and Chroma for the 33,253-vector corpus. Retrieval prefers exact metadata ownership and keeps path/tag matching as a legacy fallback.
+3. **Human interface is useful but not yet the V1 Human Mirror.** Wiki/dashboard pages exist, but the spec routes, deterministic mirror files, disputed markers, and curated annotations remain V1-D work.
+4. **Harness is currently green.** The former `p04-constraints` known issue is closed; query-relevant Trusted Project State entries now rank before state-budget truncation.
+5. **Formal docs lag the ledger during fast work.** Use `DEV-LEDGER.md` and `python scripts/live_status.py` for live truth, then copy verified claims into these docs.
--- a/docs/master-plan-status.md
+++ b/docs/master-plan-status.md
@@ -70,9 +70,14 @@ read-only additive mode.
 - Phase 6 - AtoDrive
 - Phase 10 - Write-back
 - Phase 11 - Multi-model
-- Phase 12 - Evaluation
 - Phase 13 - Hardening

+### Partial / Operational Baseline
+
+- Phase 12 - Evaluation. The retrieval/context harness exists and runs
+  against live Dalidou, but coverage is still intentionally small and
+  should grow before this is complete in the intended sense.
+
 ### Engineering Layer Planning Sprint

 **Status: complete.** All 8 architecture docs are drafted. The
@@ -126,11 +131,15 @@ This sits implicitly between Phase 8 (OpenClaw) and Phase 11
 (multi-model). Memory-review and engineering-entity commands are
 deferred from the shared client until their workflows are exercised.

-## What Is Real Today (updated 2026-04-16)
+## What Is Real Today (updated 2026-04-25)

-- canonical AtoCore runtime on Dalidou (`775960c`, deploy.sh verified)
-- 33,253 vectors across 6 registered projects
-- 234 captured interactions (192 claude-code, 38 openclaw, 4 test)
+- canonical AtoCore runtime on Dalidou (`d3de9f6`, deploy.sh verified)
+- 33,253 vectors across 6 registered projects, with explicit `project_id`
+  metadata backfilled into SQLite and Chroma after snapshot
+  `/srv/storage/atocore/backups/snapshots/20260424T154358Z`
+- 951 captured interactions as of the 2026-04-24 live dashboard; refresh
+  exact live counts with
+  `python scripts/live_status.py`
 - 6 registered projects:
  - `p04-gigabit` (483 docs, 15 state entries)
  - `p05-interferometer` (109 docs, 18 state entries)
@@ -138,12 +147,17 @@ deferred from the shared client until their workflows are exercised.
  - `atomizer-v2` (568 docs, 5 state entries)
  - `abb-space` (6 state entries)
  - `atocore` (drive source, 47 state entries)
-- 110 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
-- 84 active memories (31 project, 23 knowledge, 10 episodic, 8 adaptation, 7 preference, 5 identity)
+- 128 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
+- 290 active memories and 0 candidate memories as of the 2026-04-24 live
+  dashboard
 - context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks
-- query-relevance memory ranking with overlap-density scoring
-- retrieval eval harness: 18 fixtures, 17/18 passing on live
-- 303 tests passing
+- query-relevance memory ranking with overlap-density scoring and widened
+  query-time candidate pools so older exact-intent project memories can rank
+  ahead of generic high-confidence notes
+- query-relevance Trusted Project State ranking before state-budget truncation
+- retrieval eval harness: 20 fixtures; current live has 20 pass, 0 known
+  issues, and 0 blocking failures
+- 572 tests passing on `main`
 - nightly pipeline: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → triage → **auto-promote/expire** → weekly synth/lint → **retrieval harness** → **pipeline summary to project state**
 - Phase 10 operational: reinforcement-based auto-promotion (ref_count ≥ 3, confidence ≥ 0.7) + stale candidate expiry (14 days unreinforced)
 - pipeline health visible in dashboard: interaction totals by client, pipeline last_run, harness results, triage stats
@@ -164,9 +178,10 @@ These are the current practical priorities.
   Target: 100+ active memories.
 3. **Multi-model triage** (Phase 11 entry) — switch auto-triage to a
   different model than the extractor for independent validation
-4. **Fix p04-constraints harness failure** — retrieval doesn't surface
-   "Zerodur" for p04 constraint queries. Investigate if it's a missing
-   memory or retrieval ranking issue.
+4. **Fix Dalidou Git credentials** — the host checkout can fetch but cannot
+   push to Gitea over HTTP in non-interactive SSH sessions. Prefer switching
+   the deploy checkout to a Gitea SSH key; PAT-backed `credential.helper store`
+   is the fallback.

 ## Active — Engineering V1 Completion Track (started 2026-04-22)

@@ -183,16 +198,16 @@ where surfaces are disjoint, pauses when they collide.
 | Phase | Scope | Status |
 |---|---|---|
 | V1-0 | Write-time invariants: F-1 header fields + F-8 provenance enforcement + F-5 hook on every active-entity write + Q-3 flag-never-block | ✅ done 2026-04-22 (`2712c5d`) |
-| V1-A | Minimum query slice: Q-001 subsystem-scoped variant + Q-6 killer-correctness integration test on p05-interferometer | 🟡 gated — starts when soak (~2026-04-26) + density (100+ active memories) gates clear |
-| V1-B | KB-CAD + KB-FEM ingest (`POST /ingest/kb-cad/export`, `POST /ingest/kb-fem/export`) + D-2 schema docs | pending V1-A |
+| V1-A | Minimum query slice: Q-001 subsystem-scoped variant + Q-6 killer-correctness integration test on p05-interferometer | ✅ done 2026-04-29 (`23cdb31`) |
+| V1-B | KB-CAD + KB-FEM ingest (`POST /ingest/kb-cad/export`, `POST /ingest/kb-fem/export`) + D-2 schema docs | next |
 | V1-C | Close the remaining 8 queries (Q-002/003/007/010/012/014/018/019; Q-020 to V1-D) | pending V1-B |
 | V1-D | Full mirror surface (3 spec routes + regenerate + determinism + disputed + curated markers) + Q-5 golden file | pending V1-C |
 | V1-E | Memory→entity graduation end-to-end + remaining Q-4 trust tests | pending V1-D (note: collides with memory extractor; pauses for multi-model triage work) |
 | V1-F | F-5 detector generalization + route alias + O-1/O-2/O-3 operational + D-1/D-3/D-4 docs | finish line |

-R14 (P2, non-blocking): `POST /entities/{id}/promote` route returns 500
-on the new V1-0 `ValueError` instead of 400. Fix on branch
-`claude/r14-promote-400`, pending Codex review.
+R14 is closed: `POST /entities/{id}/promote` now translates
+caller-fixable V1-0 provenance validation failures into HTTP 400 instead
+of leaking as HTTP 500.

 ## Next

--- a/scripts/auto_triage.py
+++ b/scripts/auto_triage.py
@@ -404,19 +404,23 @@ def process_candidate(cand, base_url, active_cache, state_cache, known_projects,
        known_projects, TIER1_MODEL, DEFAULT_TIMEOUT_S,
    )

-    # Project misattribution fix: suggested_project surfaces from tier 1
+    # Project misattribution fix: suggested_project surfaces from tier 1.
+    # Earlier code POSTed only {"content": cand["content"]}, which left
+    # the project field unchanged because MemoryUpdateRequest had no
+    # project key and the service signature didn't accept one. Wave 1
+    # added project to MemoryUpdateRequest and update_memory(); this
+    # caller now actually applies the suggested project.
    suggested = (v1.get("suggested_project") or "").strip()
    if suggested and suggested != project and suggested in known_projects:
-        # Try to re-canonicalize the memory's project
        if not dry_run:
            try:
                import urllib.request as _ur
                req = _ur.Request(
                    f"{base_url}/memory/{mid}", method="PUT",
                    headers={"Content-Type": "application/json"},
-                    data=json.dumps({"content": cand["content"]}).encode("utf-8"),
+                    data=json.dumps({"project": suggested}).encode("utf-8"),
                )
-                _ur.urlopen(req, timeout=10).read()  # triggers canonicalization via update
+                _ur.urlopen(req, timeout=10).read()
            except Exception:
                pass
        print(f"    ↺ misattribution flagged: {project!r} → {suggested!r}")
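The payload change in the hunk above can be illustrated in isolation. This is a minimal sketch, not the script's actual helper: `build_project_update_request`, the `localhost` base URL, and the `mem-123` id are hypothetical names chosen for illustration. It builds the same kind of `PUT /memory/{id}` request without sending it, so the body can be inspected.

```python
import json
import urllib.request


def build_project_update_request(
    base_url: str, memory_id: str, suggested: str
) -> urllib.request.Request:
    """Build (but do not send) a PUT that reassigns a memory's project.

    Hypothetical helper: the real script builds this request inline.
    """
    # Only the project key is sent; the other MemoryUpdateRequest fields
    # are optional and stay unset.
    body = json.dumps({"project": suggested}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/memory/{memory_id}",
        method="PUT",
        headers={"Content-Type": "application/json"},
        data=body,
    )


# Placeholder values for illustration only.
req = build_project_update_request(
    "http://localhost:8100", "mem-123", "p05-interferometer"
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Keeping the request construction separate from `urlopen` makes the payload easy to assert on in a unit test, which is one way the earlier content-only payload could have been caught.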
--- a/scripts/backfill_chunk_project_ids.py
+++ b/scripts/backfill_chunk_project_ids.py
@@ -0,0 +1,178 @@
+"""Backfill explicit project_id into chunk and vector metadata.
+
+Dry-run by default. The script derives ownership from the registered project
+ingest roots and updates both SQLite source_chunks.metadata and Chroma vector
+metadata only when --apply is provided.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sqlite3
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "src"))
+
+from atocore.models.database import get_connection  # noqa: E402
+from atocore.projects.registry import derive_project_id_for_path  # noqa: E402
+from atocore.retrieval.vector_store import get_vector_store  # noqa: E402
+
+DEFAULT_BATCH_SIZE = 500
+
+
+def _decode_metadata(raw: str | None) -> dict | None:
+    if not raw:
+        return {}
+    try:
+        parsed = json.loads(raw)
+    except json.JSONDecodeError:
+        return None
+    return parsed if isinstance(parsed, dict) else None
+
+
+def _chunk_rows() -> tuple[list[dict], str]:
+    try:
+        with get_connection() as conn:
+            rows = conn.execute(
+                """
+                SELECT
+                    sc.id AS chunk_id,
+                    sc.metadata AS chunk_metadata,
+                    sd.file_path AS file_path
+                FROM source_chunks sc
+                JOIN source_documents sd ON sd.id = sc.document_id
+                ORDER BY sd.file_path, sc.chunk_index
+                """
+            ).fetchall()
+    except sqlite3.OperationalError as exc:
+        if "source_chunks" in str(exc) or "source_documents" in str(exc):
+            return [], f"missing ingestion tables: {exc}"
+        raise
+    return [dict(row) for row in rows], ""
+
+
+def _batches(items: list, batch_size: int) -> list[list]:
+    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
+
+
+def backfill(
+    apply: bool = False,
+    project_filter: str = "",
+    batch_size: int = DEFAULT_BATCH_SIZE,
+    require_chroma_snapshot: bool = False,
+) -> dict:
+    rows, db_warning = _chunk_rows()
+    updates: list[tuple[str, str, dict]] = []
+    by_project: dict[str, int] = {}
+    skipped_unowned = 0
+    already_tagged = 0
+    malformed_metadata = 0
+
+    for row in rows:
+        project_id = derive_project_id_for_path(row["file_path"])
+        if project_filter and project_id != project_filter:
+            continue
+        if not project_id:
+            skipped_unowned += 1
+            continue
+        metadata = _decode_metadata(row["chunk_metadata"])
+        if metadata is None:
+            malformed_metadata += 1
+            continue
+        if metadata.get("project_id") == project_id:
+            already_tagged += 1
+            continue
+        metadata["project_id"] = project_id
+        updates.append((row["chunk_id"], project_id, metadata))
+        by_project[project_id] = by_project.get(project_id, 0) + 1
+
+    missing_vectors: list[str] = []
+    applied_updates = 0
+    if apply and updates:
+        if not require_chroma_snapshot:
+            raise ValueError(
+                "--apply requires --chroma-snapshot-confirmed after taking a Chroma backup"
+            )
+        vector_store = get_vector_store()
+        for batch in _batches(updates, max(1, batch_size)):
+            chunk_ids = [chunk_id for chunk_id, _, _ in batch]
+            vector_payload = vector_store.get_metadatas(chunk_ids)
+            existing_vector_metadata = {
+                chunk_id: metadata
+                for chunk_id, metadata in zip(
+                    vector_payload.get("ids", []),
+                    vector_payload.get("metadatas", []),
+                    strict=False,
+                )
+                if isinstance(metadata, dict)
+            }
+
+            vector_ids = []
+            vector_metadatas = []
+            sql_updates = []
+            for chunk_id, project_id, chunk_metadata in batch:
+                vector_metadata = existing_vector_metadata.get(chunk_id)
+                if vector_metadata is None:
+                    missing_vectors.append(chunk_id)
+                    continue
+                vector_metadata = dict(vector_metadata)
+                vector_metadata["project_id"] = project_id
+                vector_ids.append(chunk_id)
+                vector_metadatas.append(vector_metadata)
+                sql_updates.append((json.dumps(chunk_metadata, ensure_ascii=True), chunk_id))
+
+            if not vector_ids:
+                continue
+
+            vector_store.update_metadatas(vector_ids, vector_metadatas)
+            with get_connection() as conn:
+                cursor = conn.executemany(
+                    "UPDATE source_chunks SET metadata = ? WHERE id = ?",
+                    sql_updates,
+                )
+                if cursor.rowcount != len(sql_updates):
+                    raise RuntimeError(
+                        f"SQLite rowcount mismatch: {cursor.rowcount} != {len(sql_updates)}"
+                    )
+            applied_updates += len(sql_updates)
+
+    return {
+        "apply": apply,
+        "total_chunks": len(rows),
+        "updates": len(updates),
+        "applied_updates": applied_updates,
+        "already_tagged": already_tagged,
+        "skipped_unowned": skipped_unowned,
+        "malformed_metadata": malformed_metadata,
+        "missing_vectors": len(missing_vectors),
+        "db_warning": db_warning,
+        "by_project": dict(sorted(by_project.items())),
+    }
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--apply", action="store_true", help="write SQLite and Chroma metadata updates")
+    parser.add_argument("--project", default="", help="optional canonical project_id filter")
+    parser.add_argument("--batch-size", type=int, default=DEFAULT_BATCH_SIZE)
+    parser.add_argument(
+        "--chroma-snapshot-confirmed",
+        action="store_true",
+        help="required with --apply; confirms a Chroma snapshot exists",
+    )
+    args = parser.parse_args()
+
+    payload = backfill(
+        apply=args.apply,
+        project_filter=args.project.strip(),
+        batch_size=args.batch_size,
+        require_chroma_snapshot=args.chroma_snapshot_confirmed,
+    )
+    print(json.dumps(payload, indent=2, ensure_ascii=True))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
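The dry-run classification in `backfill()` can be sketched without SQLite or Chroma. The snippet below is an illustrative reduction under stated assumptions: `fake_derive` is a hypothetical stand-in for `derive_project_id_for_path` (the real function consults the project registry), and the rows are invented sample data. The `--project` filter and the apply path are omitted.

```python
import json


def decode_metadata(raw):
    """Return a dict, {} for empty input, or None when the JSON is unusable."""
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def classify(rows, derive_project_id):
    """Count how each chunk row would be handled by a dry run."""
    counts = {
        "updates": 0,
        "already_tagged": 0,
        "skipped_unowned": 0,
        "malformed_metadata": 0,
    }
    for row in rows:
        project_id = derive_project_id(row["file_path"])
        if not project_id:
            counts["skipped_unowned"] += 1
            continue
        metadata = decode_metadata(row["chunk_metadata"])
        if metadata is None:
            counts["malformed_metadata"] += 1
        elif metadata.get("project_id") == project_id:
            counts["already_tagged"] += 1
        else:
            counts["updates"] += 1
    return counts


# Hypothetical ownership rule: the first path segment, if it is a known project.
KNOWN_PROJECTS = {"p04-gigabit", "p05-interferometer"}


def fake_derive(path):
    head = path.split("/", 1)[0]
    return head if head in KNOWN_PROJECTS else ""


rows = [
    {"file_path": "p04-gigabit/a.md", "chunk_metadata": None},
    {"file_path": "p04-gigabit/b.md", "chunk_metadata": '{"project_id": "p04-gigabit"}'},
    {"file_path": "misc/c.md", "chunk_metadata": "{}"},
    {"file_path": "p05-interferometer/d.md", "chunk_metadata": "[1, 2]"},
    {"file_path": "p05-interferometer/e.md", "chunk_metadata": "{not json"},
]
summary = classify(rows, fake_derive)
print(summary)
```

A second run over rows that already carry the right `project_id` reports only `already_tagged`, which is the idempotency property the migration notes rely on.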
--- a/scripts/live_status.py
+++ b/scripts/live_status.py
@@ -0,0 +1,131 @@
+"""Render a compact live-status report from a running AtoCore instance.
+
+This is intentionally read-only and stdlib-only so it can be used from a
+fresh checkout, a cron job, or a Codex/Claude session without installing the
+full app package. The output is meant to reduce docs drift: copy the report
+into status docs only after it was generated from the live service.
+"""
+
+from __future__ import annotations
+
+import argparse
+import errno
+import json
+import os
+import sys
+import urllib.error
+import urllib.request
+from typing import Any
+
+
+DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100").rstrip("/")
+DEFAULT_TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30"))
+DEFAULT_AUTH_TOKEN = os.environ.get("ATOCORE_AUTH_TOKEN", "").strip()
+
+
+def request_json(base_url: str, path: str, timeout: int, auth_token: str = "") -> dict[str, Any]:
+    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
+    req = urllib.request.Request(f"{base_url}{path}", method="GET", headers=headers)
+    with urllib.request.urlopen(req, timeout=timeout) as response:
+        body = response.read().decode("utf-8")
+        status = getattr(response, "status", None)
+    payload = json.loads(body) if body.strip() else {}
+    if not isinstance(payload, dict):
+        payload = {"value": payload}
+    if status is not None:
+        payload["_http_status"] = status
+    return payload
+
+
+def collect_status(base_url: str, timeout: int, auth_token: str = "") -> dict[str, Any]:
+    payload: dict[str, Any] = {"base_url": base_url}
+    for name, path in {
+        "health": "/health",
+        "stats": "/stats",
+        "projects": "/projects",
+        "dashboard": "/admin/dashboard",
+    }.items():
+        try:
+            payload[name] = request_json(base_url, path, timeout, auth_token)
+        except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
+            payload[name] = {"error": str(exc)}
+    return payload
+
+
+def render_markdown(status: dict[str, Any]) -> str:
+    health = status.get("health", {})
+    stats = status.get("stats", {})
+    projects = status.get("projects", {}).get("projects", [])
+    dashboard = status.get("dashboard", {})
+    memories = dashboard.get("memories", {}) if isinstance(dashboard.get("memories"), dict) else {}
+    project_state = dashboard.get("project_state", {}) if isinstance(dashboard.get("project_state"), dict) else {}
+    interactions = dashboard.get("interactions", {}) if isinstance(dashboard.get("interactions"), dict) else {}
+    pipeline = dashboard.get("pipeline", {}) if isinstance(dashboard.get("pipeline"), dict) else {}
+
+    lines = [
+        "# AtoCore Live Status",
+        "",
+        f"- base_url: `{status.get('base_url', '')}`",
+        "- endpoint_http_statuses: "
+        f"`health={health.get('_http_status', 'error')}, "
+        f"stats={stats.get('_http_status', 'error')}, "
+        f"projects={status.get('projects', {}).get('_http_status', 'error')}, "
+        f"dashboard={dashboard.get('_http_status', 'error')}`",
+        f"- service_status: `{health.get('status', 'unknown')}`",
+        f"- code_version: `{health.get('code_version', health.get('version', 'unknown'))}`",
+        f"- build_sha: `{health.get('build_sha', 'unknown')}`",
+        f"- build_branch: `{health.get('build_branch', 'unknown')}`",
+        f"- build_time: `{health.get('build_time', 'unknown')}`",
+        f"- env: `{health.get('env', 'unknown')}`",
+        f"- documents: `{stats.get('total_documents', 'unknown')}`",
+        f"- chunks: `{stats.get('total_chunks', 'unknown')}`",
+        f"- vectors: `{stats.get('total_vectors', health.get('vectors_count', 'unknown'))}`",
+        f"- registered_projects: `{len(projects)}`",
+        f"- active_memories: `{memories.get('active', 'unknown')}`",
+        f"- candidate_memories: `{memories.get('candidates', 'unknown')}`",
+        f"- interactions: `{interactions.get('total', 'unknown')}`",
+        f"- project_state_entries: `{project_state.get('total', 'unknown')}`",
+        f"- pipeline_last_run: `{pipeline.get('last_run', 'unknown')}`",
+    ]
+
+    if projects:
+        lines.extend(["", "## Projects"])
+        for project in projects:
+            aliases = ", ".join(project.get("aliases", []))
+            suffix = f" ({aliases})" if aliases else ""
+            lines.append(f"- `{project.get('id', '')}`{suffix}")
+
+    return "\n".join(lines) + "\n"
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Render live AtoCore status")
+    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
+    parser.add_argument("--timeout", type=int, default=DEFAULT_TIMEOUT)
+    parser.add_argument(
+        "--auth-token",
+        default=DEFAULT_AUTH_TOKEN,
+        help="Bearer token; defaults to ATOCORE_AUTH_TOKEN when set",
+    )
+    parser.add_argument("--json", action="store_true", help="emit raw JSON")
+    args = parser.parse_args()
+
+    status = collect_status(args.base_url.rstrip("/"), args.timeout, args.auth_token)
+    if args.json:
+        output = json.dumps(status, indent=2, ensure_ascii=True) + "\n"
+    else:
+        output = render_markdown(status)
+
+    try:
+        sys.stdout.write(output)
+    except BrokenPipeError:
+        return 0
+    except OSError as exc:
+        if exc.errno in {errno.EINVAL, errno.EPIPE}:
+            return 0
+        raise
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
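`render_markdown` degrades to `unknown` or `error` when an endpoint fails or returns an unexpected shape. A minimal sketch of that fallback pattern follows; `section` and `summarize` are hypothetical helpers and the payloads are invented samples, not captured output from the live service.

```python
from typing import Any


def section(payload: dict[str, Any], key: str) -> dict[str, Any]:
    """Return payload[key] only when it is a dict, else an empty dict."""
    value = payload.get(key)
    return value if isinstance(value, dict) else {}


def summarize(dashboard: dict[str, Any]) -> list[str]:
    memories = section(dashboard, "memories")
    return [
        f"- dashboard_http_status: `{dashboard.get('_http_status', 'error')}`",
        f"- active_memories: `{memories.get('active', 'unknown')}`",
    ]


# Sample payloads: one successful response, one failure in the
# {"error": ...} shape that collect_status stores on exceptions.
dashboard_ok = {"memories": {"active": 290, "candidates": 0}, "_http_status": 200}
dashboard_failed = {"error": "timed out"}

print("\n".join(summarize(dashboard_ok)))
print("\n".join(summarize(dashboard_failed)))
```

Because every lookup has a default, a gated or unreachable `/admin/dashboard` yields a partial report rather than a traceback.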
--- a/scripts/retrieval_eval.py
+++ b/scripts/retrieval_eval.py
@@ -44,6 +44,7 @@ import urllib.error
 import urllib.parse
 import urllib.request
 from dataclasses import dataclass, field
+from datetime import datetime, timezone
 from pathlib import Path

 DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100")
@@ -52,6 +53,13 @@ DEFAULT_BUDGET = 3000
 DEFAULT_FIXTURES = Path(__file__).parent / "retrieval_eval_fixtures.json"


+def request_json(base_url: str, path: str, timeout: int) -> dict:
+    req = urllib.request.Request(f"{base_url}{path}", method="GET")
+    with urllib.request.urlopen(req, timeout=timeout) as resp:
+        body = resp.read().decode("utf-8")
+    return json.loads(body) if body.strip() else {}
+
+
 @dataclass
 class Fixture:
    name: str
@@ -60,6 +68,7 @@ class Fixture:
    budget: int = DEFAULT_BUDGET
    expect_present: list[str] = field(default_factory=list)
    expect_absent: list[str] = field(default_factory=list)
+    known_issue: bool = False
    notes: str = ""


@@ -70,8 +79,13 @@ class FixtureResult:
    missing_present: list[str]
    unexpected_absent: list[str]
    total_chars: int
+    known_issue: bool = False
    error: str = ""

+    @property
+    def blocking_failure(self) -> bool:
+        return not self.ok and not self.known_issue
+

 def load_fixtures(path: Path) -> list[Fixture]:
    data = json.loads(path.read_text(encoding="utf-8"))
@@ -89,6 +103,7 @@ def load_fixtures(path: Path) -> list[Fixture]:
                budget=int(raw.get("budget", DEFAULT_BUDGET)),
                expect_present=list(raw.get("expect_present", [])),
                expect_absent=list(raw.get("expect_absent", [])),
+                known_issue=bool(raw.get("known_issue", False)),
                notes=raw.get("notes", ""),
            )
        )
@@ -117,6 +132,7 @@ def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
            missing_present=list(fixture.expect_present),
            unexpected_absent=[],
            total_chars=0,
+            known_issue=fixture.known_issue,
            error=f"http_error: {exc}",
        )

@@ -129,16 +145,26 @@ def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
        missing_present=missing,
        unexpected_absent=unexpected,
        total_chars=len(formatted),
+        known_issue=fixture.known_issue,
    )


-def print_human_report(results: list[FixtureResult]) -> None:
+def print_human_report(results: list[FixtureResult], metadata: dict) -> None:
    total = len(results)
    passed = sum(1 for r in results if r.ok)
+    known = sum(1 for r in results if not r.ok and r.known_issue)
+    blocking = sum(1 for r in results if r.blocking_failure)
    print(f"Retrieval eval: {passed}/{total} fixtures passed")
+    print(
+        "Target: "
+        f"{metadata.get('base_url', 'unknown')} "
+        f"build={metadata.get('health', {}).get('build_sha', 'unknown')}"
+    )
+    if known or blocking:
+        print(f"Blocking failures: {blocking}  Known issues: {known}")
    print()
    for r in results:
-        marker = "PASS" if r.ok else "FAIL"
+        marker = "PASS" if r.ok else ("KNOWN" if r.known_issue else "FAIL")
        print(f"[{marker}] {r.fixture.name}  project={r.fixture.project}  chars={r.total_chars}")
        if r.error:
            print(f"       error: {r.error}")
@@ -150,15 +176,21 @@ def print_human_report(results: list[FixtureResult]) -> None:
            print(f"       notes: {r.fixture.notes}")


-def print_json_report(results: list[FixtureResult]) -> None:
+def print_json_report(results: list[FixtureResult], metadata: dict) -> None:
    payload = {
+        "generated_at": metadata.get("generated_at"),
+        "base_url": metadata.get("base_url"),
+        "health": metadata.get("health", {}),
        "total": len(results),
        "passed": sum(1 for r in results if r.ok),
+        "known_issues": sum(1 for r in results if not r.ok and r.known_issue),
+        "blocking_failures": sum(1 for r in results if r.blocking_failure),
        "fixtures": [
            {
                "name": r.fixture.name,
                "project": r.fixture.project,
                "ok": r.ok,
+                "known_issue": r.known_issue,
                "total_chars": r.total_chars,
                "missing_present": r.missing_present,
                "unexpected_absent": r.unexpected_absent,
@@ -179,15 +211,26 @@ def main() -> int:
    parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    args = parser.parse_args()

+    base_url = args.base_url.rstrip("/")
+    try:
+        health = request_json(base_url, "/health", args.timeout)
+    except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
+        health = {"error": str(exc)}
+    metadata = {
+        "generated_at": datetime.now(timezone.utc).isoformat(),
+        "base_url": base_url,
+        "health": health,
+    }
+
    fixtures = load_fixtures(args.fixtures)
-    results = [run_fixture(f, args.base_url, args.timeout) for f in fixtures]
+    results = [run_fixture(f, base_url, args.timeout) for f in fixtures]

    if args.json:
-        print_json_report(results)
+        print_json_report(results, metadata)
    else:
-        print_human_report(results)
+        print_human_report(results, metadata)

-    return 0 if all(r.ok for r in results) else 1
+    return 0 if not any(r.blocking_failure for r in results) else 1


 if __name__ == "__main__":
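The new exit-code rule treats a failing fixture marked `known_issue` as non-blocking. The sketch below isolates that rule; `Result`, `marker`, and `exit_code` are simplified stand-ins for `FixtureResult`, the report marker expression, and the return statement in `main()`, and the fixture names in the usage lines are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Reduced stand-in for FixtureResult with only the fields used here."""

    name: str
    ok: bool
    known_issue: bool = False

    @property
    def blocking_failure(self) -> bool:
        # A failure blocks only when it is not a documented known issue.
        return not self.ok and not self.known_issue


def marker(result: Result) -> str:
    return "PASS" if result.ok else ("KNOWN" if result.known_issue else "FAIL")


def exit_code(results: list[Result]) -> int:
    return 0 if not any(r.blocking_failure for r in results) else 1


green = [Result("a", ok=True), Result("b", ok=False, known_issue=True)]
red = green + [Result("c", ok=False)]

print([marker(r) for r in red], exit_code(green), exit_code(red))
```

One consequence worth noting: a passing fixture that still carries `known_issue=True` reports `PASS`, so a stale flag is silent until the fixture file is cleaned up.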
--- a/scripts/retrieval_eval_fixtures.json
+++ b/scripts/retrieval_eval_fixtures.json
@@ -27,7 +27,7 @@
    "expect_absent": [
      "polisher suite"
    ],
-    "notes": "Key constraints are in Trusted Project State and in the mission-framing memory"
+    "notes": "Regression guard: query-relevant Trusted Project State requirements must survive the project-state budget cap."
  },
  {
    "name": "p04-short-ambiguous",
@@ -80,6 +80,36 @@
    ],
    "notes": "CGH is a core p05 concept. Should surface via chunks and possibly the architecture memory. Must not bleed p06 polisher-suite terms."
  },
+  {
+    "name": "p05-broad-status-no-atomizer",
+    "project": "p05-interferometer",
+    "prompt": "current status",
+    "expect_present": [
+      "--- Trusted Project State ---",
+      "--- Project Memories ---",
+      "Zygo"
+    ],
+    "expect_absent": [
+      "atomizer-v2",
+      "ATOMIZER_PODCAST_BRIEFING",
+      "[Source: atomizer-v2/",
+      "P04-GigaBIT-M1-KB-design"
+    ],
+    "notes": "Regression guard for the April 24 audit finding: broad p05 status queries must not pull Atomizer/archive context into project-scoped packs."
+  },
+  {
+    "name": "p05-vendor-decision-no-archive-first",
+    "project": "p05-interferometer",
+    "prompt": "vendor selection decision",
+    "expect_present": [
+      "Selection-Decision"
+    ],
+    "expect_absent": [
+      "[Source: atomizer-v2/",
+      "ATOMIZER_PODCAST_BRIEFING"
+    ],
+    "notes": "Project-scoped decision query should stay inside p05 and prefer current decision/vendor material over unrelated project archives."
+  },
  {
    "name": "p06-suite-split",
    "project": "p06-polisher",
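How a fixture such as `p05-broad-status-no-atomizer` is evaluated can be sketched as follows. This assumes plain case-sensitive substring matching against the formatted context pack, which this diff does not show; `check_pack` is a hypothetical helper and `pack` is an invented sample, not real harness output.

```python
def check_pack(
    formatted: str, expect_present: list[str], expect_absent: list[str]
) -> tuple[bool, list[str], list[str]]:
    """Return (ok, missing_present, unexpected_absent) for one context pack."""
    # Assumption: a term counts as present when it is a literal substring.
    missing_present = [s for s in expect_present if s not in formatted]
    unexpected_absent = [s for s in expect_absent if s in formatted]
    return (not missing_present and not unexpected_absent,
            missing_present, unexpected_absent)


# Invented pack that leaks a chunk from another project.
pack = (
    "--- Trusted Project State ---\n"
    "vendor: Zygo\n"
    "--- Project Memories ---\n"
    "[Source: atomizer-v2/notes.md] unrelated content\n"
)

ok, missing, unexpected = check_pack(
    pack,
    expect_present=["--- Trusted Project State ---", "--- Project Memories ---", "Zygo"],
    expect_absent=["[Source: atomizer-v2/", "ATOMIZER_PODCAST_BRIEFING"],
)
print(ok, missing, unexpected)
```

Under this assumption the `expect_absent` entries act as cross-project bleed guards: any chunk sourced from `atomizer-v2/` fails the fixture even when all required terms are present.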
--- a/src/atocore/api/routes.py
+++ b/src/atocore/api/routes.py
@@ -303,6 +303,7 @@ class MemoryUpdateRequest(BaseModel):
    memory_type: str | None = None
    domain_tags: list[str] | None = None
    valid_until: str | None = None
+    project: str | None = None


 class ProjectStateSetRequest(BaseModel):
@@ -473,6 +474,33 @@ def api_register_emerging_project(req: RegisterEmergingRequest) -> dict:
    return result


+@router.get("/admin/projects/proposals")
+def api_project_proposals(min_active: int = 10) -> dict:
+    """Live registration proposals for unregistered projects.
+
+    Reads SQL + the registry directly, so the result is current — unlike
+    `/admin/dashboard.proposals.unregistered_projects` which is the
+    nightly cache from `scripts/detect_emerging.py`. Each proposal
+    includes a guessed ingest root, sibling labels suggested as aliases,
+    and a few sample memories so the operator can sanity-check before
+    POSTing to /admin/projects/register-emerging.
+
+    Query params:
+        min_active: minimum active-memory count for a label to surface
+                    (default 10).
+    """
+    from atocore.memory.service import propose_emerging_projects
+
+    if min_active < 1:
+        raise HTTPException(status_code=400, detail="min_active must be >= 1")
+    proposals = propose_emerging_projects(min_active=min_active)
+    return {
+        "proposals": proposals,
+        "count": len(proposals),
+        "min_active": min_active,
+    }
+
+
 @router.put("/projects/{project_name}")
 def api_project_update(project_name: str, req: ProjectUpdateRequest) -> dict:
    """Update an existing project registration."""
@@ -636,6 +664,7 @@ def api_update_memory(memory_id: str, req: MemoryUpdateRequest) -> dict:
            memory_type=req.memory_type,
            domain_tags=req.domain_tags,
            valid_until=req.valid_until,
+            project=req.project,
        )
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
@@ -794,33 +823,25 @@ def api_invalidate_memory(
    req: MemoryInvalidateRequest | None = None,
 ) -> dict:
    """Retract an active memory (Issue E — active → invalid)."""
-    from atocore.memory.service import get_memories as _get_memories, invalidate_memory
+    from atocore.memory.service import get_memory, invalidate_memory

    reason = req.reason if req else ""
-    # Quick existence/status check for a clean 404 vs 409.
-    existing = [
-        m for m in _get_memories(status="active", limit=1)
-        if m.id == memory_id
-    ]
-    if not existing:
-        # Fall through to generic not-active if the id exists in another status.
-        all_match = [
-            m for m in _get_memories(status="candidate", limit=5000)
-            + _get_memories(status="invalid", limit=5000)
-            + _get_memories(status="superseded", limit=5000)
-            if m.id == memory_id
-        ]
-        if all_match:
-            if all_match[0].status == "invalid":
-                return {"status": "already_invalid", "id": memory_id}
-            raise HTTPException(
-                status_code=409,
-                detail=(
-                    f"Memory {memory_id} is {all_match[0].status}; "
-                    "use /reject for candidates"
-                ),
-            )
+    # Direct id lookup — earlier code used get_memories(status='active', limit=1)
+    # which only saw the highest-confidence active row, so any other active
+    # memory would 404 here even though it existed.
+    target = get_memory(memory_id)
+    if target is None:
        raise HTTPException(status_code=404, detail=f"Memory not found: {memory_id}")
+    if target.status == "invalid":
+        return {"status": "already_invalid", "id": memory_id}
+    if target.status != "active":
+        raise HTTPException(
+            status_code=409,
+            detail=(
+                f"Memory {memory_id} is {target.status}; "
+                "use /reject for candidates"
+            ),
+        )

    success = invalidate_memory(memory_id, actor="api-http", reason=reason)
    if not success:
@@ -833,15 +854,33 @@ def api_supersede_memory(
    memory_id: str,
    req: MemorySupersedeRequest | None = None,
 ) -> dict:
-    """Supersede an active memory (Issue E — active → superseded)."""
-    from atocore.memory.service import supersede_memory
+    """Supersede an active memory (Issue E — active → superseded).
+
+    Mirrors the invalidate route's status guard: candidates and other
+    non-active rows must not silently flip to superseded.
+    """
+    from atocore.memory.service import get_memory, supersede_memory

    reason = req.reason if req else ""
+    target = get_memory(memory_id)
+    if target is None:
+        raise HTTPException(status_code=404, detail=f"Memory not found: {memory_id}")
+    if target.status == "superseded":
+        return {"status": "already_superseded", "id": memory_id}
+    if target.status != "active":
+        raise HTTPException(
+            status_code=409,
+            detail=(
+                f"Memory {memory_id} is {target.status}; "
+                "only active memories can be superseded"
+            ),
+        )
+
    success = supersede_memory(memory_id, actor="api-http", reason=reason)
    if not success:
        raise HTTPException(
-            status_code=404,
-            detail=f"Memory not found or not active: {memory_id}",
+            status_code=409,
+            detail=f"Memory {memory_id} could not be superseded",
        )
    return {"status": "superseded", "id": memory_id}

@@ -1280,16 +1319,20 @@ def api_dashboard() -> dict:
    health beyond the basic /health endpoint.
    """
    import json as _json
-    from collections import Counter
    from datetime import datetime as _dt, timezone as _tz

-    all_memories = get_memories(active_only=False, limit=500)
-    active = [m for m in all_memories if m.status == "active"]
-    candidates = [m for m in all_memories if m.status == "candidate"]
+    from atocore.memory.service import get_memory_count_summary

-    type_counts = dict(Counter(m.memory_type for m in active))
-    project_counts = dict(Counter(m.project or "(none)" for m in active))
-    reinforced = [m for m in active if m.reference_count > 0]
+    # SQL-backed counts. Earlier code derived these by sampling the top
+    # 500 rows of get_memories() ordered by confidence — anything past
+    # the cap was invisible, so /admin/dashboard silently undercounted
+    # active memories once the corpus crossed ~500 active rows.
+    counts = get_memory_count_summary()
+    active_total = counts["active"]["total"]
+    candidate_total = counts["by_status"].get("candidate", 0)
+    type_counts = counts["active"]["by_type"]
+    project_counts = counts["active"]["by_project"]
+    reinforced_total = counts["active"]["reinforced"]

    # Interaction stats — total + by_client from DB directly
    interaction_stats: dict = {"most_recent": None, "total": 0, "by_client": {}}
@@ -1402,13 +1445,13 @@ def api_dashboard() -> dict:

    # Triage queue health
    triage: dict = {
-        "pending": len(candidates),
+        "pending": candidate_total,
        "review_url": "/admin/triage",
    }
-    if len(candidates) > 50:
-        triage["warning"] = f"High queue: {len(candidates)} candidates pending review."
-    elif len(candidates) > 20:
-        triage["notice"] = f"{len(candidates)} candidates awaiting triage."
+    if candidate_total > 50:
+        triage["warning"] = f"High queue: {candidate_total} candidates pending review."
+    elif candidate_total > 20:
+        triage["notice"] = f"{candidate_total} candidates awaiting triage."

    # Recent audit activity (Phase 4 V1) — last 10 mutations for operator
    recent_audit: list[dict] = []
@@ -1420,11 +1463,13 @@ def api_dashboard() -> dict:

    return {
        "memories": {
-            "active": len(active),
-            "candidates": len(candidates),
+            "active": active_total,
+            "candidates": candidate_total,
            "by_type": type_counts,
            "by_project": project_counts,
-            "reinforced": len(reinforced),
+            "reinforced": reinforced_total,
+            "by_status": counts["by_status"],
+            "total": counts["total"],
        },
        "project_state": {
            "counts": ps_counts,
@@ -2348,6 +2393,41 @@ def api_entity_audit(entity_id: str, limit: int = 100) -> dict:
    return {"entity_id": entity_id, "entries": entries, "count": len(entries)}


+@router.get("/entities/Subsystem/{subsystem_id}")
+def api_subsystem_contents(subsystem_id: str, expand: str = "contains") -> dict:
+    """Q-001 (subsystem-scoped variant) — the spec-shaped invocation.
+
+    Per ``docs/architecture/engineering-query-catalog.md`` Q-001:
+        ``GET /entities/Subsystem/<id>?expand=contains``
+        → ``{ subsystem, contains: [{ id, type, name, status }] }``
+
+    Distinct from the project-wide tree at
+    ``GET /engineering/projects/{name}/systems`` (Q-004), which stays
+    as-is. V1-A only adds this single shape fix; broader query catalog
+    closure is V1-C.
+
+    ``expand`` is currently restricted to ``"contains"``; other expand
+    facets (Q-007 ``constraints``, Q-008 ``decisions``) land in V1-C.
+    """
+    if expand != "contains":
+        raise HTTPException(
+            status_code=400,
+            detail=(
+                f"unsupported expand={expand!r} for /entities/Subsystem; "
+                "V1-A supports only 'contains'."
+            ),
+        )
+    from atocore.engineering.queries import subsystem_contents
+
+    result = subsystem_contents(subsystem_id)
+    if result is None:
+        raise HTTPException(
+            status_code=404,
+            detail=f"Subsystem not found or not a subsystem: {subsystem_id}",
+        )
+    return result
+
+
@router.get("/entities/{entity_id}")
 def api_get_entity(entity_id: str) -> dict:
    """Get an entity with its relationships and related entities."""
--- a/src/atocore/config.py
+++ b/src/atocore/config.py
@@ -46,6 +46,8 @@ class Settings(BaseSettings):
    # All multipliers default to the values used since Wave 1; tighten or
    # loosen them via ATOCORE_* env vars without touching code.
    rank_project_match_boost: float = 2.0
+    rank_project_scope_filter: bool = True
+    rank_project_scope_candidate_multiplier: int = 4
    rank_query_token_step: float = 0.08
    rank_query_token_cap: float = 1.32
    rank_path_high_signal_boost: float = 1.18
--- a/src/atocore/context/builder.py
+++ b/src/atocore/context/builder.py
@@ -11,7 +11,7 @@ from dataclasses import dataclass, field
 from pathlib import Path

 import atocore.config as _config
-from atocore.context.project_state import format_project_state, get_state
+from atocore.context.project_state import ProjectStateEntry, format_project_state, get_state
 from atocore.memory.service import get_memories_for_context
 from atocore.observability.logger import get_logger
 from atocore.engineering.service import get_entities, get_entity_with_context
@@ -116,6 +116,11 @@ def build_context(
    if canonical_project:
        state_entries = get_state(canonical_project)
        if state_entries:
+            state_entries = _rank_project_state_entries(
+                state_entries,
+                query=user_prompt,
+                project=canonical_project,
+            )
            project_state_text = format_project_state(state_entries)
            project_state_text, project_state_chars = _truncate_text_block(
                project_state_text,
@@ -284,6 +289,55 @@ def get_last_context_pack() -> ContextPack | None:
    return _last_context_pack


+def _rank_project_state_entries(
+    entries: list[ProjectStateEntry],
+    query: str,
+    project: str,
+) -> list[ProjectStateEntry]:
+    """Promote query-relevant trusted state before the state band is truncated."""
+    if not query or len(entries) <= 1:
+        return entries
+
+    from atocore.memory.reinforcement import _normalize, _tokenize
+
+    query_text = _normalize(query.replace("_", " "))
+    query_tokens = set(_tokenize(query_text))
+    query_tokens -= {
+        "how",
+        "what",
+        "when",
+        "where",
+        "which",
+        "who",
+        "why",
+        "current",
+        "status",
+        "project",
+    }
+    for part in (project or "").lower().replace("_", "-").split("-"):
+        query_tokens.discard(part)
+    if not query_tokens:
+        return entries
+
+    scored: list[tuple[int, float, float, int, ProjectStateEntry]] = []
+    for index, entry in enumerate(entries):
+        entry_text = " ".join(
+            [
+                entry.category,
+                entry.key.replace("_", " "),
+                entry.value,
+                entry.source,
+            ]
+        )
+        entry_tokens = _tokenize(_normalize(entry_text))
+        overlap = len(entry_tokens & query_tokens) if entry_tokens else 0
+        density = overlap / len(entry_tokens) if entry_tokens else 0.0
+        scored.append((overlap, density, entry.confidence, -index, entry))
+
+    scored.sort(key=lambda item: (item[0], item[1], item[2], item[3]), reverse=True)
+    return [entry for _, _, _, _, entry in scored]
+
+
 def _rank_chunks(
    candidates: list[ChunkResult],
    project_hint: str | None,
--- a/src/atocore/engineering/queries.py
+++ b/src/atocore/engineering/queries.py
@@ -118,6 +118,70 @@ def system_map(project: str) -> dict:
    return out


+def subsystem_contents(subsystem_id: str) -> dict | None:
+    """Q-001 subsystem-scoped variant: a single subsystem and its
+    direct ``CONTAINS`` children.
+
+    Spec: ``GET /entities/Subsystem/<id>?expand=contains`` per
+    ``docs/architecture/engineering-query-catalog.md`` Q-001.
+
+    Differs from :func:`system_map` (Q-004) which returns the
+    project-wide tree. The subsystem-scoped form is what individual
+    operator queries actually need: "what's inside this one subsystem?"
+    rather than "show me the whole project."
+
+    The relationship walk uses inbound ``part_of`` edges (the inverse
+    of ``CONTAINS``) so both child Components and child Subsystems
+    surface uniformly. Filters to active children only — superseded
+    or invalid rows do not belong in a "current contents" answer.
+
+    Returns:
+        ``{"subsystem": {id, name, project, status, description},
+           "contains": [{id, type, entity_type, name, status}]}``
+        or ``None`` when the entity does not exist or is not a subsystem.
+    """
+    with get_connection() as conn:
+        ss = conn.execute(
+            "SELECT * FROM entities WHERE id = ?",
+            (subsystem_id,),
+        ).fetchone()
+        if ss is None or ss["entity_type"] != "subsystem":
+            return None
+        rows = conn.execute(
+            "SELECT e.id, e.entity_type, e.name, e.status "
+            "FROM relationships r "
+            "JOIN entities e ON e.id = r.source_entity_id "
+            "WHERE r.relationship_type = 'part_of' "
+            "AND r.target_entity_id = ? "
+            "AND e.status = 'active' "
+            "ORDER BY e.entity_type, e.name",
+            (subsystem_id,),
+        ).fetchall()
+
+    return {
+        "subsystem": {
+            "id": ss["id"],
+            "name": ss["name"],
+            "project": ss["project"],
+            "status": ss["status"],
+            "description": ss["description"] or "",
+        },
+        "contains": [
+            {
+                "id": r["id"],
+                # V1-A spec uses `type` per engineering-query-catalog.md Q-001;
+                # `entity_type` is duplicated for parity with the rest of
+                # this module's response shape (see `_entity_dict`).
+                "type": r["entity_type"],
+                "entity_type": r["entity_type"],
+                "name": r["name"],
+                "status": r["status"],
+            }
+            for r in rows
+        ],
+    }
+
+
 def decisions_affecting(project: str, subsystem_id: str | None = None) -> dict:
    """Q-008: decisions that affect a subsystem (or whole project).

--- a/src/atocore/ingestion/pipeline.py
+++ b/src/atocore/ingestion/pipeline.py
@@ -32,10 +32,23 @@ def exclusive_ingestion():
        _INGESTION_LOCK.release()


-def ingest_file(file_path: Path) -> dict:
+def ingest_file(file_path: Path, project_id: str = "") -> dict:
    """Ingest a single markdown file. Returns stats."""
    start = time.time()
    file_path = file_path.resolve()
+    project_id = (project_id or "").strip()
+    if not project_id:
+        try:
+            from atocore.projects.registry import derive_project_id_for_path
+
+            project_id = derive_project_id_for_path(file_path)
+        except Exception as exc:
+            log.warning(
+                "project_id_derivation_failed",
+                file_path=str(file_path),
+                error=str(exc),
+            )
+            project_id = ""

    if not file_path.exists():
        raise FileNotFoundError(f"File not found: {file_path}")
@@ -65,6 +78,7 @@ def ingest_file(file_path: Path) -> dict:
        "source_file": str(file_path),
        "tags": parsed.tags,
        "title": parsed.title,
+        "project_id": project_id,
    }
    chunks = chunk_markdown(parsed.body, base_metadata=base_meta)

@@ -116,6 +130,7 @@ def ingest_file(file_path: Path) -> dict:
                        "source_file": str(file_path),
                        "tags": json.dumps(parsed.tags),
                        "title": parsed.title,
+                        "project_id": project_id,
                    })

                    conn.execute(
@@ -173,7 +188,17 @@ def ingest_folder(folder_path: Path, purge_deleted: bool = True) -> list[dict]:
        purge_deleted: If True, remove DB/vector entries for files
                       that no longer exist on disk.
    """
+    return ingest_project_folder(folder_path, purge_deleted=purge_deleted, project_id="")
+
+
+def ingest_project_folder(
+    folder_path: Path,
+    purge_deleted: bool = True,
+    project_id: str = "",
+) -> list[dict]:
+    """Ingest a folder and annotate chunks with an optional project id."""
    folder_path = folder_path.resolve()
+    project_id = (project_id or "").strip()
    if not folder_path.is_dir():
        raise NotADirectoryError(f"Not a directory: {folder_path}")

@@ -187,7 +212,7 @@ def ingest_folder(folder_path: Path, purge_deleted: bool = True) -> list[dict]:
    # Ingest new/changed files
    for md_file in md_files:
        try:
-            result = ingest_file(md_file)
+            result = ingest_file(md_file, project_id=project_id)
            results.append(result)
        except Exception as e:
            log.error("ingestion_error", file_path=str(md_file), error=str(e))
--- a/src/atocore/main.py
+++ b/src/atocore/main.py
@@ -66,6 +66,8 @@ _V1_PUBLIC_PATHS = {
    "/entities/{entity_id}/invalidate",
    "/entities/{entity_id}/supersede",
    "/entities/{entity_id}/audit",
+    # V1-A: Q-001 subsystem-scoped variant per engineering-query-catalog
+    "/entities/Subsystem/{subsystem_id}",
    "/relationships",
    "/ingest",
    "/ingest/sources",
--- a/src/atocore/memory/service.py
+++ b/src/atocore/memory/service.py
@@ -50,6 +50,9 @@ MEMORY_STATUSES = [
    "graduated",  # Phase 5: memory has become an entity; content frozen, forward pointer in properties
 ]

+DEFAULT_CONTEXT_MEMORY_LIMIT = 30
+QUERY_CONTEXT_MEMORY_LIMIT = 120
+

@dataclass
 class Memory:
@@ -344,6 +347,203 @@ def get_memories(
    return [_row_to_memory(r) for r in rows]


+def get_memory(memory_id: str) -> Memory | None:
+    """Return a single memory by id, or None if missing.
+
+    Direct id lookup (no LIMIT, no confidence ordering) — the right
+    primitive for routes that need to check a specific memory's status
+    before acting. Avoids the sampling pitfall where ``get_memories``
+    with a small ``limit`` could hide a target row sorted past the cap.
+    """
+    with get_connection() as conn:
+        row = conn.execute(
+            "SELECT * FROM memories WHERE id = ?", (memory_id,)
+        ).fetchone()
+    return _row_to_memory(row) if row else None
+
+
+def get_memory_count_summary() -> dict:
+    """Aggregate memory counts straight from SQL (no sampling).
+
+    Returned shape:
+        {
+            "total": int,                          # all rows
+            "by_status": {status: int, ...},       # full table
+            "active": {
+                "total": int,
+                "reinforced": int,                 # active with reference_count > 0
+                "by_type": {memory_type: int, ...},
+                "by_project": {project_or_none: int, ...},
+            },
+        }
+
+    Distinct from ``get_memories(...)``, which is a row-fetcher with a
+    confidence-sorted LIMIT and is therefore not safe for counting.
+    """
+    summary: dict = {
+        "total": 0,
+        "by_status": {},
+        "active": {
+            "total": 0,
+            "reinforced": 0,
+            "by_type": {},
+            "by_project": {},
+        },
+    }
+
+    with get_connection() as conn:
+        row = conn.execute("SELECT count(*) FROM memories").fetchone()
+        summary["total"] = row[0] if row else 0
+
+        rows = conn.execute(
+            "SELECT status, count(*) FROM memories GROUP BY status"
+        ).fetchall()
+        summary["by_status"] = {r[0]: r[1] for r in rows}
+
+        active_total = summary["by_status"].get("active", 0)
+        summary["active"]["total"] = active_total
+
+        rows = conn.execute(
+            "SELECT memory_type, count(*) FROM memories "
+            "WHERE status = 'active' GROUP BY memory_type"
+        ).fetchall()
+        summary["active"]["by_type"] = {r[0]: r[1] for r in rows}
+
+        rows = conn.execute(
+            "SELECT COALESCE(NULLIF(project, ''), '(none)') AS project_label, count(*) "
+            "FROM memories WHERE status = 'active' GROUP BY project_label"
+        ).fetchall()
+        summary["active"]["by_project"] = {r[0]: r[1] for r in rows}
+
+        row = conn.execute(
+            "SELECT count(*) FROM memories "
+            "WHERE status = 'active' AND reference_count > 0"
+        ).fetchone()
+        summary["active"]["reinforced"] = row[0] if row else 0
+
+    return summary
+
+
+def propose_emerging_projects(min_active: int = 10) -> list[dict]:
+    """Return live, on-demand registration proposals for unregistered projects.
+
+    Differs from the nightly ``scripts/detect_emerging.py`` cache (which
+    is refreshed once a day and lives in ``project_state.proposals``) by
+    reading current SQL and the registry directly. Each proposal is
+    operator-ready: a guessed ingest root, sibling labels suggested as
+    aliases, and a few sample memories so the operator can sanity-check
+    the bucket before committing it.
+
+    Args:
+        min_active: minimum number of active memories required for a
+            label to surface as a proposal. Defaults to 10 — anything
+            smaller is too noisy to register without more signal.
+
+    Returns:
+        list of proposal dicts, sorted by active_count desc:
+            {
+                "project_id": str,
+                "active_count": int,
+                "candidate_count": int,
+                "suggested_aliases": list[str],
+                "guessed_ingest_root": {"source": "vault", "subpath": ...},
+                "sample_memories": [{id, content_preview, updated_at}, ...],
+            }
+    """
+    from atocore.projects.registry import load_project_registry
+
+    # Build the set of names already known to the registry (canonical + aliases),
+    # lowercased. Anything in this set is "registered" and not a proposal.
+    registered_names: set[str] = set()
+    try:
+        for project in load_project_registry():
+            registered_names.add(project.project_id.lower())
+            for alias in project.aliases:
+                registered_names.add(alias.lower())
+    except Exception:
+        # Fail-open: if the registry can't load, assume nothing is
+        # registered and let every label above the threshold surface.
+        pass
+
+    with get_connection() as conn:
+        # Active counts per project (excluding empty/null project — that's
+        # the global bucket, not a proposal candidate).
+        active_rows = conn.execute(
+            "SELECT project, count(*) AS c FROM memories "
+            "WHERE status = 'active' AND project IS NOT NULL AND project != '' "
+            "GROUP BY project"
+        ).fetchall()
+        cand_rows = conn.execute(
+            "SELECT project, count(*) AS c FROM memories "
+            "WHERE status = 'candidate' AND project IS NOT NULL AND project != '' "
+            "GROUP BY project"
+        ).fetchall()
+
+    cand_counts = {r["project"]: r["c"] for r in cand_rows}
+
+    # Filter to unregistered labels above threshold
+    unregistered: list[tuple[str, int, int]] = []  # (project, active_n, candidate_n)
+    for r in active_rows:
+        proj = r["project"]
+        if proj.lower() in registered_names:
+            continue
+        if r["c"] < min_active:
+            continue
+        unregistered.append((proj, r["c"], cand_counts.get(proj, 0)))
+
+    # Sibling alias detection: two unregistered labels are siblings if
+    # they share a non-trivial token (length >= 4 after splitting on
+    # '-' and '_'). Cheap, defensible, and the operator gets to veto.
+    def _tokens(label: str) -> set[str]:
+        parts = label.lower().replace("_", "-").split("-")
+        return {p for p in parts if len(p) >= 4}
+
+    label_tokens = {label: _tokens(label) for label, _a, _c in unregistered}
+
+    proposals = []
+    for proj, active_n, candidate_n in sorted(
+        unregistered, key=lambda t: (-t[1], t[0])
+    ):
+        siblings = [
+            other
+            for other in label_tokens
+            if other != proj and (label_tokens[proj] & label_tokens[other])
+        ]
+        siblings.sort()
+
+        # Sample memories: top 3 active by updated_at desc
+        with get_connection() as conn:
+            sample_rows = conn.execute(
+                "SELECT id, content, updated_at FROM memories "
+                "WHERE status = 'active' AND project = ? "
+                "ORDER BY updated_at DESC LIMIT 3",
+                (proj,),
+            ).fetchall()
+        samples = [
+            {
+                "id": r["id"],
+                "content_preview": (r["content"] or "")[:160],
+                "updated_at": r["updated_at"],
+            }
+            for r in sample_rows
+        ]
+
+        proposals.append(
+            {
+                "project_id": proj,
+                "active_count": active_n,
+                "candidate_count": candidate_n,
+                "suggested_aliases": siblings,
+                "guessed_ingest_root": {
+                    "source": "vault",
+                    "subpath": f"incoming/projects/{proj}/",
+                },
+                "sample_memories": samples,
+            }
+        )
+    return proposals
+
+
 def update_memory(
    memory_id: str,
    content: str | None = None,
@@ -352,6 +552,7 @@ def update_memory(
    memory_type: str | None = None,
    domain_tags: list[str] | None = None,
    valid_until: str | None = None,
+    project: str | None = None,
    actor: str = "api",
    note: str = "",
 ) -> bool:
@@ -365,6 +566,10 @@ def update_memory(

        next_content = content if content is not None else existing["content"]
        next_status = status if status is not None else existing["status"]
+        next_project = (
+            resolve_project_name(project) if project is not None
+            else (existing["project"] or "")
+        )
        if confidence is not None:
            _validate_confidence(confidence)

@@ -372,7 +577,7 @@ def update_memory(
            duplicate = conn.execute(
                "SELECT id FROM memories "
                "WHERE memory_type = ? AND content = ? AND project = ? AND status = 'active' AND id != ?",
-                (existing["memory_type"], next_content, existing["project"] or "", memory_id),
+                (existing["memory_type"], next_content, next_project, memory_id),
            ).fetchone()
            if duplicate:
                raise ValueError("Update would create a duplicate active memory")
@@ -383,6 +588,7 @@ def update_memory(
            "status": existing["status"],
            "confidence": existing["confidence"],
            "memory_type": existing["memory_type"],
+            "project": existing["project"] or "",
        }
        after_snapshot = dict(before_snapshot)

@@ -419,6 +625,10 @@ def update_memory(
            updates.append("valid_until = ?")
            params.append(vu)
            after_snapshot["valid_until"] = vu or ""
+        if project is not None:
+            updates.append("project = ?")
+            params.append(next_project)
+            after_snapshot["project"] = next_project

        if not updates:
            return False
@@ -896,6 +1106,7 @@ def get_memories_for_context(
        from atocore.memory.reinforcement import _normalize, _tokenize

        query_tokens = _tokenize(_normalize(query))
+        query_tokens = _prepare_memory_query_tokens(query_tokens, project=project)
        if not query_tokens:
            query_tokens = None

@@ -908,12 +1119,13 @@ def get_memories_for_context(
    # ``_rank_memories_for_query`` via Python's stable sort.
    pool: list[Memory] = []
    seen_ids: set[str] = set()
+    candidate_limit = QUERY_CONTEXT_MEMORY_LIMIT if query_tokens is not None else DEFAULT_CONTEXT_MEMORY_LIMIT
    for mtype in memory_types:
        for mem in get_memories(
            memory_type=mtype,
            project=project,
            min_confidence=0.5,
-            limit=30,
+            limit=candidate_limit,
        ):
            if mem.id in seen_ids:
                continue
@@ -980,11 +1192,11 @@ def _rank_memories_for_query(
 ) -> list["Memory"]:
    """Rerank a memory list by lexical overlap with a pre-tokenized query.

-    Primary key: overlap_density (overlap_count / memory_token_count),
-    which rewards short focused memories that match the query precisely
-    over long overview memories that incidentally share a few tokens.
-    Secondary: absolute overlap count. Tertiary: domain-tag match.
-    Quaternary: confidence.
+    Primary key: absolute overlap count, which keeps a richer memory
+    matching multiple query-intent terms ahead of a short memory that
+    only happens to share one term. Secondary: overlap_density
+    (overlap_count / memory_token_count), so ties still prefer short
+    focused memories. Tertiary: domain-tag match. Quaternary: confidence.

    Phase 3: domain_tags contribute a boost when they appear in the
    query text. A memory tagged [optics, thermal] for a query about
@@ -1010,10 +1222,46 @@ def _rank_memories_for_query(
                tag_hits += 1

        scored.append((density, overlap, tag_hits, mem.confidence, mem))
-    scored.sort(key=lambda t: (t[0], t[1], t[2], t[3]), reverse=True)
+    scored.sort(key=lambda t: (t[1], t[0], t[2], t[3]), reverse=True)
    return [mem for _, _, _, _, mem in scored]


+_MEMORY_QUERY_STOP_TOKENS = {
+    "how",
+    "what",
+    "when",
+    "where",
+    "which",
+    "who",
+    "why",
+    "current",
+    "status",
+    "project",
+    "machine",
+}
+
+_MEMORY_QUERY_TOKEN_EXPANSIONS = {
+    "remotely": {"remote"},
+}
+
+
+def _prepare_memory_query_tokens(
+    query_tokens: set[str],
+    project: str | None = None,
+) -> set[str]:
+    """Remove project-scope noise and add tiny intent-preserving expansions."""
+    prepared = set(query_tokens)
+    for token in list(prepared):
+        prepared.update(_MEMORY_QUERY_TOKEN_EXPANSIONS.get(token, set()))
+
+    prepared -= _MEMORY_QUERY_STOP_TOKENS
+    if project:
+        for part in project.lower().replace("_", "-").split("-"):
+            if part:
+                prepared.discard(part)
+    return prepared
+
+
 def _row_to_memory(row) -> Memory:
    """Convert a DB row to Memory dataclass."""
    import json as _json
--- a/src/atocore/projects/registry.py
+++ b/src/atocore/projects/registry.py
@@ -8,7 +8,6 @@ from dataclasses import asdict, dataclass
 from pathlib import Path

 import atocore.config as _config
-from atocore.ingestion.pipeline import ingest_folder


 # Reserved pseudo-projects. `inbox` holds pre-project / lead / quote
@@ -260,6 +259,7 @@ def load_project_registry() -> list[RegisteredProject]:
        )

    _validate_unique_project_names(projects)
+    _validate_ingest_root_overlaps(projects)
    return projects


@@ -307,6 +307,28 @@ def resolve_project_name(name: str | None) -> str:
    return name


+def derive_project_id_for_path(file_path: str | Path) -> str:
+    """Return the registered project that owns a source path, if any."""
+    if not file_path:
+        return ""
+    doc_path = Path(file_path).resolve(strict=False)
+    matches: list[tuple[int, int, str]] = []
+
+    for project in load_project_registry():
+        for source_ref in project.ingest_roots:
+            root_path = _resolve_ingest_root(source_ref)
+            try:
+                doc_path.relative_to(root_path)
+            except ValueError:
+                continue
+            matches.append((len(root_path.parts), len(str(root_path)), project.project_id))
+
+    if not matches:
+        return ""
+    matches.sort(reverse=True)
+    return matches[0][2]
+
+
 def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
    """Ingest all configured source roots for a registered project.

@@ -322,6 +344,8 @@ def refresh_registered_project(project_name: str, purge_deleted: bool = False) -
    if project is None:
        raise ValueError(f"Unknown project: {project_name}")

+    from atocore.ingestion.pipeline import ingest_project_folder
+
    roots = []
    ingested_count = 0
    skipped_count = 0
@@ -346,7 +370,11 @@ def refresh_registered_project(project_name: str, purge_deleted: bool = False) -
            {
                **root_result,
                "status": "ingested",
-                "results": ingest_folder(resolved, purge_deleted=purge_deleted),
+                "results": ingest_project_folder(
+                    resolved,
+                    purge_deleted=purge_deleted,
+                    project_id=project.project_id,
+                ),
            }
        )
        ingested_count += 1
@@ -443,6 +471,33 @@ def _validate_unique_project_names(projects: list[RegisteredProject]) -> None:
            seen[key] = project.project_id


+def _validate_ingest_root_overlaps(projects: list[RegisteredProject]) -> None:
+    roots: list[tuple[str, Path]] = []
+    for project in projects:
+        for source_ref in project.ingest_roots:
+            roots.append((project.project_id, _resolve_ingest_root(source_ref)))
+
+    for i, (left_project, left_root) in enumerate(roots):
+        for right_project, right_root in roots[i + 1:]:
+            if left_project == right_project:
+                continue
+            try:
+                left_root.relative_to(right_root)
+                overlaps = True
+            except ValueError:
+                try:
+                    right_root.relative_to(left_root)
+                    overlaps = True
+                except ValueError:
+                    overlaps = False
+            if overlaps:
+                raise ValueError(
+                    "Project registry ingest root overlap: "
+                    f"'{left_root}' ({left_project}) and "
+                    f"'{right_root}' ({right_project})"
+                )
+
+
 def _find_name_collisions(
    project_id: str,
    aliases: list[str],
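A minimal sketch of the ownership rule that `derive_project_id_for_path` applies, assuming a plain `{project_id: root}` mapping in place of the real registry. The name `owner_for_path` and the example roots are hypothetical; only the `relative_to` containment test and the deepest-root-wins ordering mirror the diff above.

```python
from pathlib import PurePosixPath


def owner_for_path(doc: str, roots: dict[str, str]) -> str:
    """Return the project whose root is the deepest ancestor of doc, else ''.

    Hypothetical stand-in for the registry lookup: `roots` maps a
    project id to a single ingest root.
    """
    doc_path = PurePosixPath(doc)
    matches: list[tuple[int, int, str]] = []
    for project_id, root in roots.items():
        root_path = PurePosixPath(root)
        try:
            # relative_to raises ValueError when root_path is not an ancestor.
            doc_path.relative_to(root_path)
        except ValueError:
            continue
        # Rank by component depth first, then by string length.
        matches.append((len(root_path.parts), len(str(root_path)), project_id))
    if not matches:
        return ""
    return max(matches)[2]


roots = {"p04-gigabit": "/vault/incoming/projects/p04-gigabit"}
print(owner_for_path("/vault/incoming/projects/p04-gigabit/status.md", roots))
```

Containment is checked per path component, so a sibling directory that merely shares a string prefix (for example `p04-gigabit-old`) is not claimed by the project.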
--- a/src/atocore/retrieval/retriever.py
+++ b/src/atocore/retrieval/retriever.py
@@ -1,5 +1,6 @@
 """Retrieval: query to ranked chunks."""

+import json
 import re
 import time
 from dataclasses import dataclass
@@ -7,7 +8,7 @@ from dataclasses import dataclass
 import atocore.config as _config
 from atocore.models.database import get_connection
 from atocore.observability.logger import get_logger
-from atocore.projects.registry import get_registered_project
+from atocore.projects.registry import RegisteredProject, get_registered_project, load_project_registry
 from atocore.retrieval.embeddings import embed_query
 from atocore.retrieval.vector_store import get_vector_store

@@ -83,6 +84,27 @@ def retrieve(
    """Retrieve the most relevant chunks for a query."""
    top_k = top_k or _config.settings.context_top_k
    start = time.time()
+    try:
+        scoped_project = get_registered_project(project_hint) if project_hint else None
+    except Exception as exc:
+        log.warning(
+            "project_scope_resolution_failed",
+            project_hint=project_hint,
+            error=str(exc),
+        )
+        scoped_project = None
+    scope_filter_enabled = bool(scoped_project and _config.settings.rank_project_scope_filter)
+    registered_projects = None
+    query_top_k = top_k
+    if scope_filter_enabled:
+        query_top_k = max(
+            top_k,
+            top_k * max(1, _config.settings.rank_project_scope_candidate_multiplier),
+        )
+        try:
+            registered_projects = load_project_registry()
+        except Exception:
+            registered_projects = None

    query_embedding = embed_query(query)
    store = get_vector_store()
@@ -101,11 +123,12 @@ def retrieve(

    results = store.query(
        query_embedding=query_embedding,
-        top_k=top_k,
+        top_k=query_top_k,
        where=where,
    )

    chunks = []
+    raw_result_count = len(results["ids"][0]) if results and results["ids"] and results["ids"][0] else 0
    if results and results["ids"] and results["ids"][0]:
        existing_ids = _existing_chunk_ids(results["ids"][0])
        for i, chunk_id in enumerate(results["ids"][0]):
@@ -117,6 +140,13 @@ def retrieve(
            meta = results["metadatas"][0][i] if results["metadatas"] else {}
            content = results["documents"][0][i] if results["documents"] else ""

+            if scope_filter_enabled and not _is_allowed_for_project_scope(
+                scoped_project,
+                meta,
+                registered_projects,
+            ):
+                continue
+
            score *= _query_match_boost(query, meta)
            score *= _path_signal_boost(meta)
            if project_hint:
@@ -137,42 +167,151 @@ def retrieve(

    duration_ms = int((time.time() - start) * 1000)
    chunks.sort(key=lambda chunk: chunk.score, reverse=True)
+    post_filter_count = len(chunks)
+    chunks = chunks[:top_k]

    log.info(
        "retrieval_done",
        query=query[:100],
        top_k=top_k,
+        query_top_k=query_top_k,
+        raw_results_count=raw_result_count,
+        post_filter_count=post_filter_count,
        results_count=len(chunks),
+        post_filter_dropped=max(0, raw_result_count - post_filter_count),
+        underfilled=bool(raw_result_count >= query_top_k and len(chunks) < top_k),
        duration_ms=duration_ms,
    )

    return chunks


+def _is_allowed_for_project_scope(
+    project: RegisteredProject,
+    metadata: dict,
+    registered_projects: list[RegisteredProject] | None = None,
+) -> bool:
+    """Return True when a chunk belongs to the target project or to no registered project.
+
+    Project-hinted retrieval should not let one registered project's corpus
+    compete with another's. At the same time, unowned/global sources should
+    remain eligible because shared docs and cross-project references can be
+    genuinely useful. The registry gives us the boundary: if metadata matches
+    a registered project and it is not the requested project, filter it out.
+    """
+    if _metadata_matches_project(project, metadata):
+        return True
+
+    if registered_projects is None:
+        try:
+            registered_projects = load_project_registry()
+        except Exception:
+            return True
+
+    for other in registered_projects:
+        if other.project_id == project.project_id:
+            continue
+        if _metadata_matches_project(other, metadata):
+            return False
+    return True
+
+
+def _metadata_matches_project(project: RegisteredProject, metadata: dict) -> bool:
+    stored_project_id = str(metadata.get("project_id", "")).strip().lower()
+    if stored_project_id:
+        return stored_project_id == project.project_id.lower()
+
+    path = _metadata_source_path(metadata)
+    tags = _metadata_tags(metadata)
+    for term in _project_scope_terms(project):
+        if _path_matches_term(path, term) or term in tags:
+            return True
+    return False
+
+
+def _project_scope_terms(project: RegisteredProject) -> set[str]:
+    terms = {project.project_id.lower()}
+    terms.update(alias.lower() for alias in project.aliases)
+    for source_ref in project.ingest_roots:
+        normalized = source_ref.subpath.replace("\\", "/").strip("/").lower()
+        if normalized:
+            terms.add(normalized)
+            terms.add(normalized.split("/")[-1])
+    return {term for term in terms if term}
+
+
+def _metadata_searchable(metadata: dict) -> str:
+    return " ".join(
+        [
+            str(metadata.get("source_file", "")).replace("\\", "/").lower(),
+            str(metadata.get("title", "")).lower(),
+            str(metadata.get("heading_path", "")).lower(),
+            str(metadata.get("tags", "")).lower(),
+        ]
+    )
+
+
+def _metadata_source_path(metadata: dict) -> str:
+    return str(metadata.get("source_file", "")).replace("\\", "/").strip("/").lower()
+
+
+def _metadata_tags(metadata: dict) -> set[str]:
+    raw_tags = metadata.get("tags", [])
+    if isinstance(raw_tags, (list, tuple, set)):
+        return {str(tag).strip().lower() for tag in raw_tags if str(tag).strip()}
+    if isinstance(raw_tags, str):
+        try:
+            parsed = json.loads(raw_tags)
+        except json.JSONDecodeError:
+            parsed = [raw_tags]
+        if isinstance(parsed, (list, tuple, set)):
+            return {str(tag).strip().lower() for tag in parsed if str(tag).strip()}
+        if isinstance(parsed, str) and parsed.strip():
+            return {parsed.strip().lower()}
+    return set()
+
+
+def _path_matches_term(path: str, term: str) -> bool:
+    normalized = term.replace("\\", "/").strip("/").lower()
+    if not path or not normalized:
+        return False
+    if "/" in normalized:
+        return path == normalized or path.startswith(f"{normalized}/")
+    return normalized in set(path.split("/"))
+
+
+def _metadata_has_term(metadata: dict, term: str) -> bool:
+    normalized = term.replace("\\", "/").strip("/").lower()
+    if not normalized:
+        return False
+    if _path_matches_term(_metadata_source_path(metadata), normalized):
+        return True
+    if normalized in _metadata_tags(metadata):
+        return True
+    return re.search(
+        rf"(?<![a-z0-9]){re.escape(normalized)}(?![a-z0-9])",
+        _metadata_searchable(metadata),
+    ) is not None
+
+
 def _project_match_boost(project_hint: str, metadata: dict) -> float:
    """Return a project-aware relevance multiplier for raw retrieval."""
    hint_lower = project_hint.strip().lower()
    if not hint_lower:
        return 1.0

-    source_file = str(metadata.get("source_file", "")).lower()
-    title = str(metadata.get("title", "")).lower()
-    tags = str(metadata.get("tags", "")).lower()
-    searchable = " ".join([source_file, title, tags])
-
-    project = get_registered_project(project_hint)
-    candidate_names = {hint_lower}
-    if project is not None:
-        candidate_names.add(project.project_id.lower())
-        candidate_names.update(alias.lower() for alias in project.aliases)
-        candidate_names.update(
-            source_ref.subpath.replace("\\", "/").strip("/").split("/")[-1].lower()
-            for source_ref in project.ingest_roots
-            if source_ref.subpath.strip("/\\")
+    try:
+        project = get_registered_project(project_hint)
+    except Exception as exc:
+        log.warning(
+            "project_match_boost_resolution_failed",
+            project_hint=project_hint,
+            error=str(exc),
        )
-
+        project = None
+    candidate_names = _project_scope_terms(project) if project is not None else {hint_lower}
    for candidate in candidate_names:
-        if candidate and candidate in searchable:
+        if _metadata_has_term(metadata, candidate):
            return _config.settings.rank_project_match_boost

    return 1.0
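The term matching introduced in this file can be illustrated in isolation. The sketch below is a standalone copy of the `_path_matches_term` logic under the hypothetical name `path_matches_term`, with path normalization folded in so it runs on its own; the example paths are made up for illustration.

```python
def path_matches_term(path: str, term: str) -> bool:
    """Match a scope term against a source path by whole segments."""
    normalized = term.replace("\\", "/").strip("/").lower()
    path = path.replace("\\", "/").strip("/").lower()
    if not path or not normalized:
        return False
    if "/" in normalized:
        # Multi-segment terms (ingest-root subpaths) must be a leading prefix.
        return path == normalized or path.startswith(f"{normalized}/")
    # Single-segment terms (ids, aliases) must equal one full path segment.
    return normalized in set(path.split("/"))


print(path_matches_term("incoming/projects/p04-gigabit/status.md", "p04-gigabit"))
print(path_matches_term("incoming/projects/p04-gigabit-archive/status.md", "p04-gigabit"))
```

Compared with the substring test this diff removes (`candidate in searchable`), whole-segment matching keeps a term such as `p04-gigabit` from matching a directory named `p04-gigabit-archive`.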
--- a/src/atocore/retrieval/vector_store.py
+++ b/src/atocore/retrieval/vector_store.py
@@ -64,6 +64,18 @@ class VectorStore:
            self._collection.delete(ids=ids)
            log.debug("vectors_deleted", count=len(ids))

+    def get_metadatas(self, ids: list[str]) -> dict:
+        """Fetch vector metadata by chunk IDs."""
+        if not ids:
+            return {"ids": [], "metadatas": []}
+        return self._collection.get(ids=ids, include=["metadatas"])
+
+    def update_metadatas(self, ids: list[str], metadatas: list[dict]) -> None:
+        """Update vector metadata without re-embedding documents."""
+        if ids:
+            self._collection.update(ids=ids, metadatas=metadatas)
+            log.debug("vector_metadatas_updated", count=len(ids))
+
    @property
    def count(self) -> int:
        return self._collection.count()
--- a/tests/test_backfill_chunk_project_ids.py
+++ b/tests/test_backfill_chunk_project_ids.py
@@ -0,0 +1,154 @@
+"""Tests for explicit chunk project_id metadata backfill."""
+
+import json
+
+import atocore.config as config
+from atocore.models.database import get_connection, init_db
+from scripts import backfill_chunk_project_ids as backfill
+
+
+def _write_registry(tmp_path, monkeypatch):
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
+    project_dir.mkdir(parents=True)
+    drive_dir.mkdir()
+    config_dir.mkdir()
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "p04-gigabit",
+                        "aliases": ["p04"],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
+                        ],
+                    }
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+    config.settings = config.Settings()
+    return project_dir
+
+
+def _insert_chunk(file_path, metadata=None, chunk_id="chunk-1"):
+    with get_connection() as conn:
+        conn.execute(
+            """
+            INSERT INTO source_documents (id, file_path, file_hash, title, doc_type, tags)
+            VALUES (?, ?, ?, ?, ?, ?)
+            """,
+            ("doc-1", str(file_path), "hash", "Title", "markdown", "[]"),
+        )
+        conn.execute(
+            """
+            INSERT INTO source_chunks
+                (id, document_id, chunk_index, content, heading_path, char_count, metadata)
+            VALUES (?, ?, ?, ?, ?, ?, ?)
+            """,
+            (
+                chunk_id,
+                "doc-1",
+                0,
+                "content",
+                "Overview",
+                7,
+                json.dumps(metadata if metadata is not None else {}),
+            ),
+        )
+
+
+class FakeVectorStore:
+    def __init__(self, metadatas):
+        self.metadatas = dict(metadatas)
+        self.updated = []
+
+    def get_metadatas(self, ids):
+        returned_ids = [chunk_id for chunk_id in ids if chunk_id in self.metadatas]
+        return {
+            "ids": returned_ids,
+            "metadatas": [self.metadatas[chunk_id] for chunk_id in returned_ids],
+        }
+
+    def update_metadatas(self, ids, metadatas):
+        self.updated.append((list(ids), list(metadatas)))
+        for chunk_id, metadata in zip(ids, metadatas, strict=True):
+            self.metadatas[chunk_id] = metadata
+
+
+def test_backfill_dry_run_is_non_mutating(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md")
+
+    result = backfill.backfill(apply=False)
+
+    assert result["updates"] == 1
+    with get_connection() as conn:
+        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
+    assert json.loads(row["metadata"]) == {}
+
+
+def test_backfill_apply_updates_chroma_then_sql(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md", metadata={"source_file": "status.md"})
+    fake_store = FakeVectorStore({"chunk-1": {"source_file": "status.md"}})
+    monkeypatch.setattr(backfill, "get_vector_store", lambda: fake_store)
+
+    result = backfill.backfill(apply=True, require_chroma_snapshot=True)
+
+    assert result["applied_updates"] == 1
+    assert fake_store.metadatas["chunk-1"]["project_id"] == "p04-gigabit"
+    with get_connection() as conn:
+        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
+    assert json.loads(row["metadata"])["project_id"] == "p04-gigabit"
+
+
+def test_backfill_apply_requires_snapshot_confirmation(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md")
+
+    try:
+        backfill.backfill(apply=True)
+    except ValueError as exc:
+        assert "Chroma backup" in str(exc)
+    else:
+        raise AssertionError("Expected snapshot confirmation requirement")
+
+
+def test_backfill_missing_vector_skips_sql_update(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md")
+    fake_store = FakeVectorStore({})
+    monkeypatch.setattr(backfill, "get_vector_store", lambda: fake_store)
+
+    result = backfill.backfill(apply=True, require_chroma_snapshot=True)
+
+    assert result["updates"] == 1
+    assert result["applied_updates"] == 0
+    assert result["missing_vectors"] == 1
+    with get_connection() as conn:
+        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
+    assert json.loads(row["metadata"]) == {}
+
+
+def test_backfill_skips_malformed_metadata(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md", metadata=[])
+
+    result = backfill.backfill(apply=False)
+
+    assert result["updates"] == 0
+    assert result["malformed_metadata"] == 1
--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -46,6 +46,8 @@ def test_settings_keep_legacy_db_path_when_present(tmp_path, monkeypatch):

 def test_ranking_weights_are_tunable_via_env(monkeypatch):
    monkeypatch.setenv("ATOCORE_RANK_PROJECT_MATCH_BOOST", "3.5")
+    monkeypatch.setenv("ATOCORE_RANK_PROJECT_SCOPE_FILTER", "false")
+    monkeypatch.setenv("ATOCORE_RANK_PROJECT_SCOPE_CANDIDATE_MULTIPLIER", "6")
    monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_STEP", "0.12")
    monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_CAP", "1.5")
    monkeypatch.setenv("ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST", "1.25")
@@ -54,6 +56,8 @@ def test_ranking_weights_are_tunable_via_env(monkeypatch):
    settings = config.Settings()

    assert settings.rank_project_match_boost == 3.5
+    assert settings.rank_project_scope_filter is False
+    assert settings.rank_project_scope_candidate_multiplier == 6
    assert settings.rank_query_token_step == 0.12
    assert settings.rank_query_token_cap == 1.5
    assert settings.rank_path_high_signal_boost == 1.25
--- a/tests/test_context_builder.py
+++ b/tests/test_context_builder.py
@@ -143,6 +143,52 @@ def test_project_state_respects_total_budget(tmp_data_dir, sample_markdown):
    assert len(pack.formatted_context) <= 120


+def test_project_state_query_relevance_before_truncation(tmp_data_dir, sample_markdown):
+    """Relevant trusted state should survive the project-state budget cap."""
+    init_db()
+    init_project_state_schema()
+    ingest_file(sample_markdown)
+
+    set_state(
+        "p04-gigabit",
+        "contact",
+        "abb-space",
+        "ABB Space is the primary vendor contact for polishing, CCP, IBF, procurement coordination, "
+        "contract administration, interface planning, and delivery discussions.",
+    )
+    set_state(
+        "p04-gigabit",
+        "decision",
+        "back-structure",
+        "Option B selected: conical isogrid back structure with variable rib density. "
+        "Chosen over flat-back for stiffness-to-weight ratio and manufacturability.",
+    )
+    set_state(
+        "p04-gigabit",
+        "decision",
+        "polishing-vendor",
+        "ABB Space selected as polishing vendor. Contract includes computer-controlled polishing "
+        "and ion beam figuring.",
+    )
+    set_state(
+        "p04-gigabit",
+        "requirement",
+        "key_constraints",
+        "The program targets a 1.2 m lightweight Zerodur mirror with filtered mechanical WFE below 15 nm "
+        "and mass below 103.5 kg.",
+    )
+
+    pack = build_context(
+        "what are the key GigaBIT M1 program constraints",
+        project_hint="p04-gigabit",
+        budget=3000,
+    )
+
+    assert "Zerodur" in pack.formatted_context
+    assert "1.2" in pack.formatted_context
+    assert pack.formatted_context.find("[REQUIREMENT]") < pack.formatted_context.find("[CONTACT]")
+
+
 def test_project_hint_matches_state_case_insensitively(tmp_data_dir, sample_markdown):
    """Project state lookup should not depend on exact casing."""
    init_db()
--- a/tests/test_emerging_project_proposals.py
+++ b/tests/test_emerging_project_proposals.py
@@ -0,0 +1,217 @@
+"""Wave 1.5 — live emerging-project registration proposals.
+
+The nightly `scripts/detect_emerging.py` writes a cache to
+`project_state.proposals.unregistered_projects` that goes stale between runs.
+This endpoint is the on-demand alternative that operators can hit before deciding
+which unregistered project to register via `/admin/projects/register-emerging`.
+"""
+
+import json
+
+import pytest
+from fastapi.testclient import TestClient
+
+import atocore.config as config
+from atocore.main import app
+from atocore.memory.service import create_memory
+from atocore.models.database import init_db
+
+
+@pytest.fixture
+def env(tmp_data_dir, tmp_path, monkeypatch):
+    """Fresh DB + a registry holding a single registered project so we
+    can prove the proposals endpoint excludes registered names."""
+    registry_path = tmp_path / "registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "p04-gigabit",
+                        "aliases": ["p04", "gigabit"],
+                        "description": "test",
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
+                        ],
+                    }
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+    config.settings = config.Settings()
+    init_db()
+    yield tmp_data_dir
+
+
+def test_proposals_excludes_registered_project_and_its_aliases(env):
+    """Memories tagged on a registered canonical id or any of its
+    aliases must not appear as a registration proposal."""
+    # Registered: p04-gigabit (aliases p04, gigabit)
+    for i in range(15):
+        create_memory("knowledge", f"p04 fact {i}", project="p04-gigabit")
+    for i in range(15):
+        create_memory("knowledge", f"alias fact {i}", project="p04")  # alias
+
+    # Unregistered, above threshold
+    for i in range(12):
+        create_memory("knowledge", f"apm fact {i}", project="apm")
+
+    client = TestClient(app)
+    body = client.get("/admin/projects/proposals?min_active=10").json()
+    ids = [p["project_id"] for p in body["proposals"]]
+    assert "apm" in ids
+    assert "p04-gigabit" not in ids
+    assert "p04" not in ids
+    assert "gigabit" not in ids
+
+
+def test_proposals_threshold_filters_low_count_labels(env):
+    create_memory("knowledge", "single one-off", project="discrawl")
+    for i in range(3):
+        create_memory("knowledge", f"low-volume {i}", project="drill")
+    for i in range(11):
+        create_memory("knowledge", f"high-volume {i}", project="apm")
+
+    client = TestClient(app)
+    proposals = client.get("/admin/projects/proposals?min_active=10").json()["proposals"]
+    ids = [p["project_id"] for p in proposals]
+    assert "apm" in ids
+    assert "drill" not in ids
+    assert "discrawl" not in ids
+
+
+def test_proposals_suggest_sibling_aliases_via_shared_tokens(env):
+    """Lead-space + lead-space-exploration-ltd + space-exploration-ltd
+    should cluster: each proposes the others as suggested_aliases via
+    shared non-trivial tokens (length >= 4)."""
+    for label in ("lead-space", "lead-space-exploration-ltd", "space-exploration-ltd"):
+        for i in range(11):
+            create_memory("knowledge", f"{label} content {i}", project=label)
+
+    client = TestClient(app)
+    proposals = {p["project_id"]: p for p in client.get("/admin/projects/proposals").json()["proposals"]}
+
+    # All three appear and each suggests at least one of the others
+    for label in ("lead-space", "lead-space-exploration-ltd", "space-exploration-ltd"):
+        assert label in proposals
+        siblings = set(proposals[label]["suggested_aliases"])
+        # Every label shares "space" (and others share "exploration"/"lead")
+        # so at least one sibling must be present.
+        assert siblings & ({"lead-space", "lead-space-exploration-ltd", "space-exploration-ltd"} - {label})
+
+
+def test_proposals_short_token_does_not_match(env):
+    """Per Codex Wave 1.5 P2: previously this test only asserted apm
+    and drill have empty siblings, which is trivially true because they
+    share no tokens at all. The real risk is an accidental relaxation
+    that lets <4-char tokens trigger clustering. Construct a setup where
+    that would matter:
+      - 'apm' and 'apm-fpga': only the 3-char 'apm' is shared. They must
+        NOT cluster, because 'apm' is too short.
+      - 'foo-fpga' and 'bar-fpga': the 4-char 'fpga' is shared. They
+        MUST cluster.
+    """
+    for label in ("apm", "apm-fpga", "foo-fpga", "bar-fpga"):
+        for i in range(11):
+            create_memory("knowledge", f"{label} fact {i}", project=label)
+
+    client = TestClient(app)
+    proposals = {p["project_id"]: p for p in client.get("/admin/projects/proposals").json()["proposals"]}
+
+    # Negative: short-token match must not happen
+    assert "apm-fpga" not in proposals["apm"]["suggested_aliases"], (
+        "'apm' (3 chars) is below the 4-char minimum; 'apm' and 'apm-fpga' "
+        "must not cluster via the 'apm' token."
+    )
+
+    # Positive: long-token match must happen — both directions
+    assert "bar-fpga" in proposals["foo-fpga"]["suggested_aliases"]
+    assert "foo-fpga" in proposals["bar-fpga"]["suggested_aliases"]
+    # And 'apm-fpga' clusters with the others via 'fpga'
+    assert "apm-fpga" in proposals["foo-fpga"]["suggested_aliases"]
+
+
+def test_proposals_clustering_is_case_insensitive(env):
+    """Token comparison must be case-insensitive so labels captured
+    with mixed casing still cluster. Codex Wave 1.5 P3."""
+    for label in ("HydroTech-Mining", "hydrotech-split-tank"):
+        for i in range(11):
+            create_memory("knowledge", f"{label} fact {i}", project=label)
+
+    client = TestClient(app)
+    proposals = {p["project_id"]: p for p in client.get("/admin/projects/proposals").json()["proposals"]}
+    assert "hydrotech-split-tank" in proposals["HydroTech-Mining"]["suggested_aliases"]
+    assert "HydroTech-Mining" in proposals["hydrotech-split-tank"]["suggested_aliases"]
+
+
+def test_proposals_registered_token_does_not_leak_into_sibling_set(env, monkeypatch):
+    """Registered project ids must be filtered BEFORE clustering so a
+    registered token doesn't get suggested as an alias for an
+    unregistered sibling. p04-gigabit is registered in env; an
+    unregistered 'gigabit-other' must not list 'p04-gigabit' as alias."""
+    for i in range(15):
+        create_memory("knowledge", f"p04 fact {i}", project="p04-gigabit")
+    for i in range(11):
+        create_memory("knowledge", f"gigabit-other fact {i}", project="gigabit-other")
+
+    client = TestClient(app)
+    proposals = {p["project_id"]: p for p in client.get("/admin/projects/proposals").json()["proposals"]}
+    assert "p04-gigabit" not in proposals
+    assert "gigabit-other" in proposals
+    # And the registered name must not surface as a sibling
+    assert "p04-gigabit" not in proposals["gigabit-other"]["suggested_aliases"]
+
+
+def test_proposals_include_sample_memories_and_guessed_root(env):
+    for i in range(11):
+        create_memory("knowledge", f"sample content {i}", project="apm")
+
+    client = TestClient(app)
+    body = client.get("/admin/projects/proposals").json()
+    apm = next(p for p in body["proposals"] if p["project_id"] == "apm")
+    assert apm["active_count"] == 11
+    assert apm["candidate_count"] == 0
+    assert apm["guessed_ingest_root"] == {
+        "source": "vault",
+        "subpath": "incoming/projects/apm/",
+    }
+    assert len(apm["sample_memories"]) == 3
+    for s in apm["sample_memories"]:
+        assert s["id"]
+        assert "sample content" in s["content_preview"]
+
+
+def test_proposals_count_candidates_separately(env):
+    for i in range(11):
+        create_memory("knowledge", f"active {i}", project="apm")
+    for i in range(4):
+        create_memory("knowledge", f"candidate {i}", project="apm", status="candidate")
+
+    client = TestClient(app)
+    apm = next(
+        p for p in client.get("/admin/projects/proposals").json()["proposals"]
+        if p["project_id"] == "apm"
+    )
+    assert apm["active_count"] == 11
+    assert apm["candidate_count"] == 4
+
+
+def test_proposals_min_active_param_validation(env):
+    client = TestClient(app)
+    r = client.get("/admin/projects/proposals?min_active=0")
+    assert r.status_code == 400
+
+
+def test_proposals_sorted_by_active_count_desc(env):
+    for i in range(20):
+        create_memory("knowledge", f"big {i}", project="apm")
+    for i in range(11):
+        create_memory("knowledge", f"small {i}", project="openclaw")
+
+    client = TestClient(app)
+    proposals = client.get("/admin/projects/proposals").json()["proposals"]
+    ids = [p["project_id"] for p in proposals]
+    assert ids[0] == "apm"
+    assert ids[1] == "openclaw"
--- a/tests/test_ingestion.py
+++ b/tests/test_ingestion.py
@@ -1,8 +1,10 @@
 """Tests for the ingestion pipeline."""

+import json
+
 from atocore.ingestion.parser import parse_markdown
 from atocore.models.database import get_connection, init_db
-from atocore.ingestion.pipeline import ingest_file, ingest_folder
+from atocore.ingestion.pipeline import ingest_file, ingest_folder, ingest_project_folder


 def test_parse_markdown(sample_markdown):
@@ -69,6 +71,153 @@ def test_ingest_updates_changed(tmp_data_dir, sample_markdown):
    assert result["status"] == "ingested"


+def test_ingest_file_records_project_id_metadata(tmp_data_dir, sample_markdown, monkeypatch):
+    """Project-aware ingestion should tag DB and vector metadata exactly."""
+    init_db()
+
+    class FakeVectorStore:
+        def __init__(self):
+            self.metadatas = []
+
+        def add(self, ids, documents, metadatas):
+            self.metadatas.extend(metadatas)
+
+        def delete(self, ids):
+            return None
+
+    fake_store = FakeVectorStore()
+    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)
+
+    result = ingest_file(sample_markdown, project_id="p04-gigabit")
+
+    assert result["status"] == "ingested"
+    assert fake_store.metadatas
+    assert all(meta["project_id"] == "p04-gigabit" for meta in fake_store.metadatas)
+
+    with get_connection() as conn:
+        rows = conn.execute("SELECT metadata FROM source_chunks").fetchall()
+    assert rows
+    assert all(
+        json.loads(row["metadata"])["project_id"] == "p04-gigabit"
+        for row in rows
+    )
+
+
+def test_ingest_file_derives_project_id_from_registry_root(tmp_data_dir, tmp_path, monkeypatch):
+    """Unscoped ingest should preserve ownership for files under registered roots."""
+    import atocore.config as config
+
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
+    project_dir.mkdir(parents=True)
+    drive_dir.mkdir()
+    config_dir.mkdir()
+    note = project_dir / "status.md"
+    note.write_text(
+        "# Status\n\nCurrent project status with enough detail to create "
+        "a retrievable chunk for the ingestion pipeline test.",
+        encoding="utf-8",
+    )
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "p04-gigabit",
+                        "aliases": ["p04"],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
+                        ],
+                    }
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    class FakeVectorStore:
+        def __init__(self):
+            self.metadatas = []
+
+        def add(self, ids, documents, metadatas):
+            self.metadatas.extend(metadatas)
+
+        def delete(self, ids):
+            return None
+
+    fake_store = FakeVectorStore()
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+    config.settings = config.Settings()
+    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)
+
+    init_db()
+    result = ingest_file(note)
+
+    assert result["status"] == "ingested"
+    assert fake_store.metadatas
+    assert all(meta["project_id"] == "p04-gigabit" for meta in fake_store.metadatas)
+
+
+def test_ingest_file_logs_and_fails_open_when_project_derivation_fails(
+    tmp_data_dir,
+    sample_markdown,
+    monkeypatch,
+):
+    """A broken registry should be visible but should not block ingestion."""
+    init_db()
+    warnings = []
+
+    class FakeVectorStore:
+        def __init__(self):
+            self.metadatas = []
+
+        def add(self, ids, documents, metadatas):
+            self.metadatas.extend(metadatas)
+
+        def delete(self, ids):
+            return None
+
+    fake_store = FakeVectorStore()
+    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)
+    monkeypatch.setattr(
+        "atocore.projects.registry.derive_project_id_for_path",
+        lambda path: (_ for _ in ()).throw(ValueError("registry broken")),
+    )
+    monkeypatch.setattr(
+        "atocore.ingestion.pipeline.log.warning",
+        lambda event, **kwargs: warnings.append((event, kwargs)),
+    )
+
+    result = ingest_file(sample_markdown)
+
+    assert result["status"] == "ingested"
+    assert fake_store.metadatas
+    assert all(meta["project_id"] == "" for meta in fake_store.metadatas)
+    assert warnings[0][0] == "project_id_derivation_failed"
+    assert "registry broken" in warnings[0][1]["error"]
+
+
+def test_ingest_project_folder_passes_project_id_to_files(tmp_data_dir, sample_folder, monkeypatch):
+    seen = []
+
+    def fake_ingest_file(path, project_id=""):
+        seen.append((path.name, project_id))
+        return {"file": str(path), "status": "ingested"}
+
+    monkeypatch.setattr("atocore.ingestion.pipeline.ingest_file", fake_ingest_file)
+    monkeypatch.setattr("atocore.ingestion.pipeline._purge_deleted_files", lambda *args, **kwargs: 0)
+
+    ingest_project_folder(sample_folder, project_id="p05-interferometer")
+
+    assert seen
+    assert {project_id for _, project_id in seen} == {"p05-interferometer"}
+
+
 def test_parse_markdown_uses_supplied_text(sample_markdown):
    """Parsing should be able to reuse pre-read content from ingestion."""
    latin_text = """---\ntags: parser\n---\n# Parser Title\n\nBody text."""
--- a/tests/test_invalidate_supersede.py
+++ b/tests/test_invalidate_supersede.py
@@ -192,3 +192,120 @@ def test_v1_aliases_present(env):
        "/v1/memory/{memory_id}/supersede",
    ):
        assert p in paths, f"{p} missing"
+
+
+# ---------------------------------------------------------------------------
+# Wave 1 (2026-04-29) — invalidation route used to do
+# `_get_memories(status='active', limit=1)` and look for the target id
+# inside that single highest-confidence row, so any active memory
+# outside slot 0 fell through as 404. Direct id lookup fixes it.
+# ---------------------------------------------------------------------------
+
+
+def test_api_invalidate_finds_active_memory_outside_top_one(env):
+    """An active memory not at the top of the confidence sort must still
+    be invalidatable via POST /memory/{id}/invalidate."""
+    high = create_memory(
+        memory_type="knowledge",
+        content="high-confidence top row",
+        confidence=0.99,
+    )
+    low = create_memory(
+        memory_type="knowledge",
+        content="lower-confidence target",
+        confidence=0.55,
+    )
+    client = TestClient(app)
+    r = client.post(f"/memory/{low.id}/invalidate", json={"reason": "wave1 regression"})
+    assert r.status_code == 200, r.text
+    assert r.json()["status"] == "invalidated"
+    # And confirm the high-confidence row is untouched
+    assert _get_memory(high.id).status == "active"
+    assert _get_memory(low.id).status == "invalid"
+
+
+def test_api_invalidate_already_invalid_is_idempotent(env):
+    m = create_memory(memory_type="knowledge", content="already invalid")
+    client = TestClient(app)
+    r1 = client.post(f"/memory/{m.id}/invalidate", json={"reason": "first"})
+    assert r1.status_code == 200
+    r2 = client.post(f"/memory/{m.id}/invalidate", json={"reason": "again"})
+    assert r2.status_code == 200
+    assert r2.json()["status"] == "already_invalid"
+
+
+def test_api_invalidate_candidate_returns_409(env):
+    m = create_memory(
+        memory_type="knowledge", content="candidate route", status="candidate"
+    )
+    client = TestClient(app)
+    r = client.post(f"/memory/{m.id}/invalidate", json={"reason": "wrong route"})
+    assert r.status_code == 409
+
+
+def test_api_invalidate_unknown_id_is_404(env):
+    client = TestClient(app)
+    r = client.post("/memory/no-such-id/invalidate", json={"reason": "ghost"})
+    assert r.status_code == 404
+
+
+def test_api_supersede_candidate_returns_409(env):
+    """Mirror of the invalidate guard: candidates must not silently flip
+    to superseded via the active-only supersede route."""
+    m = create_memory(
+        memory_type="knowledge", content="candidate target", status="candidate"
+    )
+    client = TestClient(app)
+    r = client.post(f"/memory/{m.id}/supersede", json={"reason": "wrong route"})
+    assert r.status_code == 409
+    # Row should still be a candidate
+    assert _get_memory(m.id).status == "candidate"
+
+
+def test_api_supersede_already_superseded_is_idempotent(env):
+    m = create_memory(memory_type="knowledge", content="will be superseded")
+    client = TestClient(app)
+    r1 = client.post(f"/memory/{m.id}/supersede", json={"reason": "first"})
+    assert r1.status_code == 200
+    r2 = client.post(f"/memory/{m.id}/supersede", json={"reason": "again"})
+    assert r2.status_code == 200
+    assert r2.json()["status"] == "already_superseded"
+
+
+def test_api_supersede_unknown_id_is_404(env):
+    client = TestClient(app)
+    r = client.post("/memory/no-such-id/supersede", json={"reason": "ghost"})
+    assert r.status_code == 404
+
+
+def test_admin_dashboard_active_count_matches_full_table(env):
+    """/admin/dashboard memories.active must match the SQL aggregate even
+    when there are more active memories than the legacy sample limit (500).
+
+    This guards the Codex finding that the dashboard was deriving counts
+    from a confidence-sorted limit=500 fetch, hiding rows past the cap.
+    We don't need 500 rows in the test — a small corpus that exercises
+    the SQL-aggregate path is enough; the integrity-vs-dashboard equality
+    is the invariant being asserted.
+    """
+    # Mix of statuses to exercise the by_status aggregate
+    create_memory(memory_type="knowledge", content="a")
+    create_memory(memory_type="knowledge", content="b", project="p06-polisher")
+    create_memory(memory_type="project", content="c-cand", status="candidate")
+    cand = create_memory(memory_type="project", content="d-cand", status="candidate")
+    # Invalidate one to seed an "invalid" bucket
+    from atocore.memory.service import invalidate_memory
+    target_id = cand.id
+    # No promotion step is needed: the service path flips this candidate
+    # straight to invalid, leaving one candidate and one invalid row.
+    invalidate_memory(target_id)
+
+    client = TestClient(app)
+    dash = client.get("/admin/dashboard").json()
+    assert dash["memories"]["active"] == 2
+    assert dash["memories"]["candidates"] == 1
+    assert dash["memories"]["by_status"]["invalid"] == 1
+    assert dash["memories"]["total"] == 4
+    assert dash["memories"]["by_project"].get("p06-polisher") == 1
+    # "(none)" bucket is the COALESCE label for empty/null project
+    assert "(none)" in dash["memories"]["by_project"]
--- a/tests/test_memory.py
+++ b/tests/test_memory.py
@@ -428,6 +428,136 @@ def test_context_builder_tag_boost_orders_results(isolated_db):
    assert idx_tagged < idx_untagged


+def test_project_memory_ranking_ignores_scope_noise(isolated_db):
+    """Project words should not crowd out the actual query intent."""
+    from atocore.memory.service import create_memory, get_memories_for_context
+
+    create_memory(
+        "project",
+        "Norman is the end operator for p06-polisher and requires an explicit manual mode to operate the machine.",
+        project="p06-polisher",
+        confidence=0.7,
+    )
+    create_memory(
+        "project",
+        "Polisher Control firmware spec document titled 'Fulum Polisher Machine Control Firmware Spec v1' lives in PKM.",
+        project="p06-polisher",
+        confidence=0.7,
+    )
+    create_memory(
+        "project",
+        "Machine design principle: works fully offline and independently; network connection is for remote access only",
+        project="p06-polisher",
+        confidence=0.5,
+    )
+    create_memory(
+        "project",
+        "Use Tailscale mesh for RPi remote access to provide SSH, file transfer, and NAT traversal without port forwarding.",
+        project="p06-polisher",
+        confidence=0.5,
+    )
+
+    text, _ = get_memories_for_context(
+        memory_types=["project"],
+        project="p06-polisher",
+        budget=360,
+        query="how do we access the polisher machine remotely",
+    )
+
+    assert "Tailscale" in text
+    assert text.find("remote access only") < text.find("Tailscale")
+    assert "manual mode" not in text
+
+
+def test_project_memory_ranking_prefers_multiple_intent_hits(isolated_db):
+    """A rich memory with several query hits should beat a terse one-hit memory."""
+    from atocore.memory.service import create_memory, get_memories_for_context
+
+    create_memory(
+        "project",
+        "CGH vendor selected for p05. Active integration coordination with Katie/AOM.",
+        project="p05-interferometer",
+        confidence=0.7,
+    )
+    create_memory(
+        "knowledge",
+        "Vendor-summary current signal: 4D is the strongest technical Twyman-Green candidate; "
+        "a certified used Zygo Verifire SV around $55k emerged as a strong value path.",
+        project="p05-interferometer",
+        confidence=0.9,
+    )
+
+    text, _ = get_memories_for_context(
+        memory_types=["project", "knowledge"],
+        project="p05-interferometer",
+        budget=220,
+        query="what is the current vendor signal for the interferometer procurement",
+    )
+
+    assert "4D" in text
+    assert "Zygo" in text
+
+
+def test_project_memory_query_ranks_beyond_confidence_prefilter(isolated_db):
+    """Query-time ranking should see low-confidence but exact-intent memories past the confidence prefilter."""
+    from atocore.memory.service import create_memory, get_memories_for_context
+
+    for idx in range(35):
+        create_memory(
+            "project",
+            f"High confidence p06 filler memory {idx}: Polisher Control planning note.",
+            project="p06-polisher",
+            confidence=0.9,
+        )
+    create_memory(
+        "project",
+        "Use Tailscale mesh for RPi remote access to provide SSH, file transfer, and NAT traversal without port forwarding.",
+        project="p06-polisher",
+        confidence=0.5,
+    )
+
+    text, _ = get_memories_for_context(
+        memory_types=["project"],
+        project="p06-polisher",
+        budget=360,
+        query="how do we access the polisher machine remotely",
+    )
+
+    assert "Tailscale" in text
+
+
+def test_project_memory_query_prefers_exact_cam_fact(isolated_db):
+    from atocore.memory.service import create_memory, get_memories_for_context
+
+    create_memory(
+        "project",
+        "Polisher Control firmware spec document titled 'Fulum Polisher Machine Control Firmware Spec v1' lives in PKM.",
+        project="p06-polisher",
+        confidence=0.9,
+    )
+    create_memory(
+        "project",
+        "Polisher Control doc must cover manual mode for Norman as a required deliverable per the plan.",
+        project="p06-polisher",
+        confidence=0.9,
+    )
+    create_memory(
+        "project",
+        "Cam amplitude and offset are mechanically set by operator and read via encoders; no actuators control them.",
+        project="p06-polisher",
+        confidence=0.5,
+    )
+
+    text, _ = get_memories_for_context(
+        memory_types=["project"],
+        project="p06-polisher",
+        budget=300,
+        query="how is cam amplitude controlled on the polisher",
+    )
+
+    assert "encoders" in text
+
+
 def test_expire_stale_candidates_keeps_reinforced(isolated_db):
    from atocore.memory.service import create_memory, expire_stale_candidates
    from atocore.models.database import get_connection
@@ -445,3 +575,121 @@ def test_expire_stale_candidates_keeps_reinforced(isolated_db):
    assert mid not in expired
    mem = _get_memory_by_id(mid)
    assert mem["status"] == "candidate"
+
+
+# ---------------------------------------------------------------------------
+# Wave 1 (2026-04-29) — counts come from SQL, not from the top-N sample.
+# Exposed by Codex audit when prod /admin/dashboard reported 315 active
+# while /admin/integrity-check reported 1091. The dashboard was building
+# its counts from a confidence-sorted limit=500 fetch.
+# ---------------------------------------------------------------------------
+
+
+def test_get_memory_count_summary_returns_full_table_aggregates(isolated_db):
+    """Counts come from SQL aggregates, not a sampled fetch."""
+    from atocore.memory.service import (
+        create_memory,
+        get_memory_count_summary,
+        invalidate_memory,
+    )
+
+    # Create more rows than any reasonable sampling LIMIT so any
+    # LIMIT-based counter would visibly disagree with reality.
+    for i in range(120):
+        create_memory(
+            "knowledge",
+            f"fact-{i}",
+            project="p04-gigabit",
+            confidence=0.9,
+            status="active",
+        )
+    for i in range(7):
+        create_memory("knowledge", f"cand-{i}", status="candidate")
+    invalid_obj = create_memory("knowledge", "to-invalidate", status="active")
+    invalidate_memory(invalid_obj.id)
+
+    summary = get_memory_count_summary()
+    assert summary["total"] == 120 + 7 + 1
+    assert summary["by_status"]["active"] == 120
+    assert summary["by_status"]["candidate"] == 7
+    assert summary["by_status"]["invalid"] == 1
+    assert summary["active"]["total"] == 120
+    assert summary["active"]["by_type"] == {"knowledge": 120}
+    assert summary["active"]["by_project"] == {"p04-gigabit": 120}
+
+
+def test_get_memory_returns_single_row_or_none(isolated_db):
+    from atocore.memory.service import create_memory, get_memory
+
+    mem = create_memory("knowledge", "single-row test")
+    fetched = get_memory(mem.id)
+    assert fetched is not None
+    assert fetched.id == mem.id
+    assert get_memory("non-existent-id") is None
+
+
+def test_update_memory_can_change_project_with_canonicalization(
+    isolated_db, project_registry
+):
+    """update_memory(project=...) canonicalizes aliases and writes an audit row."""
+    project_registry(("p04-gigabit", ("p04", "gigabit")))
+    from atocore.memory.service import (
+        create_memory,
+        get_memory,
+        get_memory_audit,
+        update_memory,
+    )
+
+    mem = create_memory("knowledge", "retargetable fact", project="atocore")
+    ok = update_memory(mem.id, project="p04")  # alias
+    assert ok is True
+
+    refreshed = get_memory(mem.id)
+    assert refreshed.project == "p04-gigabit"  # canonical, not "p04"
+
+    audit_rows = get_memory_audit(mem.id, limit=10)
+    update_rows = [r for r in audit_rows if r.get("action") == "updated"]
+    assert update_rows, f"expected an updated audit row, got {audit_rows}"
+    head = update_rows[0]
+    assert head["before"]["project"] == "atocore"
+    assert head["after"]["project"] == "p04-gigabit"
+
+
+def test_update_memory_project_unchanged_when_not_passed(isolated_db):
+    from atocore.memory.service import create_memory, get_memory, update_memory
+
+    mem = create_memory("knowledge", "untouched project", project="p06-polisher")
+    update_memory(mem.id, content="edited content")
+    assert get_memory(mem.id).project == "p06-polisher"
+
+
+def test_update_memory_to_empty_project_detects_global_duplicate(isolated_db):
+    """Codex P3: when retargeting to project='' (global), the duplicate
+    check must scope to the new project. If a global active memory with
+    the same content already exists, the update must raise."""
+    import pytest as _pytest
+    from atocore.memory.service import create_memory, update_memory
+
+    create_memory("knowledge", "shared global fact", project="")
+    scoped = create_memory("knowledge", "shared global fact", project="p04-gigabit")
+
+    with _pytest.raises(ValueError, match="duplicate active memory"):
+        update_memory(scoped.id, project="")
+
+
+def test_auto_triage_suggested_project_put_body_uses_project_key():
+    """Regression: the auto_triage caller used to PUT {"content": ...}
+    which silently dropped the suggested project change. The fix sends
+    {"project": suggested}. Inspect the script source so we don't have
+    to spin up a live triage run."""
+    from pathlib import Path
+
+    src = Path(__file__).resolve().parents[1] / "scripts" / "auto_triage.py"
+    text = src.read_text(encoding="utf-8")
+    # The block that PUTs to /memory/{mid} for a suggested_project fix
+    assert 'json.dumps({"project": suggested})' in text, (
+        "auto_triage.py must PUT {\"project\": suggested} so the "
+        "suggested-project correction actually applies. See Wave 1."
+    )
+    # And it must not have reverted to the old shape
+    assert 'json.dumps({"content": cand["content"]})' not in text
--- a/tests/test_project_registry.py
+++ b/tests/test_project_registry.py
@@ -5,6 +5,7 @@ import json
 import atocore.config as config
 from atocore.projects.registry import (
    build_project_registration_proposal,
+    derive_project_id_for_path,
    get_registered_project,
    get_project_registry_template,
    list_registered_projects,
@@ -103,6 +104,98 @@ def test_project_registry_resolves_alias(tmp_path, monkeypatch):
    assert project.project_id == "p05-interferometer"


+def test_derive_project_id_for_path_uses_registered_roots(tmp_path, monkeypatch):
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
+    project_dir.mkdir(parents=True)
+    drive_dir.mkdir()
+    config_dir.mkdir()
+    note = project_dir / "status.md"
+    note.write_text("# Status\n\nCurrent work.", encoding="utf-8")
+
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "p04-gigabit",
+                        "aliases": ["p04"],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
+                        ],
+                    }
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+
+    original_settings = config.settings
+    try:
+        config.settings = config.Settings()
+        assert derive_project_id_for_path(note) == "p04-gigabit"
+        assert derive_project_id_for_path(tmp_path / "elsewhere.md") == ""
+    finally:
+        config.settings = original_settings
+
+
+def test_project_registry_rejects_cross_project_ingest_root_overlap(tmp_path, monkeypatch):
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    vault_dir.mkdir()
+    drive_dir.mkdir()
+    config_dir.mkdir()
+
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "parent",
+                        "aliases": [],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/parent"}
+                        ],
+                    },
+                    {
+                        "id": "child",
+                        "aliases": [],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/parent/child"}
+                        ],
+                    },
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+
+    original_settings = config.settings
+    try:
+        config.settings = config.Settings()
+        try:
+            list_registered_projects()
+        except ValueError as exc:
+            assert "ingest root overlap" in str(exc)
+        else:
+            raise AssertionError("Expected overlapping ingest roots to raise")
+    finally:
+        config.settings = original_settings
+
+
 def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
@@ -133,8 +226,8 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypat

    calls = []

-    def fake_ingest_folder(path, purge_deleted=True):
-        calls.append((str(path), purge_deleted))
+    def fake_ingest_folder(path, purge_deleted=True, project_id=""):
+        calls.append((str(path), purge_deleted, project_id))
        return [{"file": str(path / "README.md"), "status": "ingested"}]

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -144,7 +237,7 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypat
    original_settings = config.settings
    try:
        config.settings = config.Settings()
-        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fake_ingest_folder)
+        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fake_ingest_folder)
        result = refresh_registered_project("polisher")
    finally:
        config.settings = original_settings
@@ -153,6 +246,7 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypat
    assert len(calls) == 1
    assert calls[0][0].endswith("p06-polisher")
    assert calls[0][1] is False
+    assert calls[0][2] == "p06-polisher"
    assert result["roots"][0]["status"] == "ingested"
    assert result["status"] == "ingested"
    assert result["roots_ingested"] == 1
@@ -188,7 +282,7 @@ def test_refresh_registered_project_reports_nothing_to_ingest_when_all_missing(
        encoding="utf-8",
    )

-    def fail_ingest_folder(path, purge_deleted=True):
+    def fail_ingest_folder(path, purge_deleted=True, project_id=""):
        raise AssertionError(f"ingest_folder should not be called for missing root: {path}")

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -198,7 +292,7 @@ def test_refresh_registered_project_reports_nothing_to_ingest_when_all_missing(
    original_settings = config.settings
    try:
        config.settings = config.Settings()
-        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fail_ingest_folder)
+        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fail_ingest_folder)
        result = refresh_registered_project("ghost")
    finally:
        config.settings = original_settings
@@ -238,7 +332,7 @@ def test_refresh_registered_project_reports_partial_status(tmp_path, monkeypatch
        encoding="utf-8",
    )

-    def fake_ingest_folder(path, purge_deleted=True):
+    def fake_ingest_folder(path, purge_deleted=True, project_id=""):
        return [{"file": str(path / "README.md"), "status": "ingested"}]

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -248,7 +342,7 @@ def test_refresh_registered_project_reports_partial_status(tmp_path, monkeypatch
    original_settings = config.settings
    try:
        config.settings = config.Settings()
-        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fake_ingest_folder)
+        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fake_ingest_folder)
        result = refresh_registered_project("mixed")
    finally:
        config.settings = original_settings
--- a/tests/test_retrieval.py
+++ b/tests/test_retrieval.py
@@ -70,8 +70,28 @@ def test_retrieve_skips_stale_vector_entries(tmp_data_dir, sample_markdown, monk


 def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+
    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 8
            return {
                "ids": [["chunk-a", "chunk-b"]],
                "documents": [["project doc", "other doc"]],
@@ -102,22 +122,501 @@ def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
-        lambda project_name: type(
-            "Project",
-            (),
-            {
-                "project_id": "p04-gigabit",
-                "aliases": ("p04", "gigabit"),
-                "ingest_roots": (),
-            },
-        )(),
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
    )

    results = retrieve("mirror architecture", top_k=2, project_hint="p04")

-    assert len(results) == 2
+    assert len(results) == 1
    assert results[0].chunk_id == "chunk-a"
-    assert results[0].score > results[1].score
+
+
+def test_retrieve_project_scope_allows_unowned_global_chunks(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-a", "chunk-global"]],
+                "documents": [["project doc", "global doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/pkm/_index.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "P04",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "shared/engineering-rules.md",
+                        "tags": "[]",
+                        "title": "Shared engineering rules",
+                        "document_id": "doc-global",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-global"]
+
+
+def test_retrieve_project_scope_filter_can_be_disabled(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 2
+            return {
+                "ids": [["chunk-a", "chunk-b"]],
+                "documents": [["project doc", "other project doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/pkm/_index.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "P04",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/pkm/_index.md",
+                        "tags": '["p05-interferometer"]',
+                        "title": "P05",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.2]],
+            }
+
+    monkeypatch.setattr("atocore.config.settings.rank_project_scope_filter", False)
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert {r.chunk_id for r in results} == {"chunk-a", "chunk-b"}
+
+
+def test_retrieve_project_scope_ignores_title_for_ownership(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p06-polisher",
+            "aliases": ("p06", "polisher", "p11"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-poisoned-title"]],
+                "documents": [["p04 doc", "p06 doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/pkm/_index.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "P04",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p06-polisher/pkm/architecture.md",
+                        "tags": '["p06-polisher"]',
+                        "title": "GigaBIT M1 mirror lessons",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.19]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-target"]
+
+
+def test_retrieve_project_scope_uses_path_segments_not_substrings(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+    abb_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "abb-space",
+            "aliases": ("abb",),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-global"]],
+                "documents": [["p05 doc", "global doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/pkm/_index.md",
+                        "tags": '["p05-interferometer"]',
+                        "title": "P05",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Abbreviation notes",
+                        "source_file": "shared/cabbage-abbreviations.md",
+                        "tags": "[]",
+                        "title": "ABB-style abbreviations",
+                        "document_id": "doc-global",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, abb_project],
+    )
+
+    results = retrieve("abbreviations", top_k=2, project_hint="p05")
+
+    assert [r.chunk_id for r in results] == ["chunk-target", "chunk-global"]
+
+
+def test_retrieve_project_scope_prefers_exact_project_id(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p06-polisher",
+            "aliases": ("p06", "polisher"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-other", "chunk-global"]],
+                "documents": [["target doc", "other doc", "global doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "legacy/unhelpful-path.md",
+                        "tags": "[]",
+                        "title": "Target",
+                        "project_id": "p04-gigabit",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/title-poisoned.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "Looks target-owned but is explicit p06",
+                        "project_id": "p06-polisher",
+                        "document_id": "doc-b",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "shared/global.md",
+                        "tags": "[]",
+                        "title": "Shared",
+                        "project_id": "",
+                        "document_id": "doc-global",
+                    },
+                ]],
+                "distances": [[0.2, 0.19, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=3, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-target", "chunk-global"]
+
+
+def test_retrieve_empty_project_id_falls_back_to_path_ownership(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-other"]],
+                "documents": [["target doc", "other doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/status.md",
+                        "tags": "[]",
+                        "title": "Target",
+                        "project_id": "",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/status.md",
+                        "tags": "[]",
+                        "title": "Other",
+                        "project_id": "",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.19]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-target"]
+
+
+def test_retrieve_unknown_project_hint_does_not_widen_or_filter(monkeypatch):
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 2
+            return {
+                "ids": [["chunk-a", "chunk-b"]],
+                "documents": [["doc a", "doc b"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "project-a/file.md",
+                        "tags": "[]",
+                        "title": "A",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "project-b/file.md",
+                        "tags": "[]",
+                        "title": "B",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: None,
+    )
+
+    results = retrieve("overview", top_k=2, project_hint="unknown-project")
+
+    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-b"]
+
+
+def test_retrieve_fails_open_when_project_scope_resolution_fails(monkeypatch):
+    warnings = []
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 2
+            return {
+                "ids": [["chunk-a", "chunk-b"]],
+                "documents": [["doc a", "doc b"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/file.md",
+                        "tags": "[]",
+                        "title": "A",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/file.md",
+                        "tags": "[]",
+                        "title": "B",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: (_ for _ in ()).throw(ValueError("registry overlap")),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.log.warning",
+        lambda event, **kwargs: warnings.append((event, kwargs)),
+    )
+
+    results = retrieve("overview", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-b"]
+    assert {warning[0] for warning in warnings} == {
+        "project_scope_resolution_failed",
+        "project_match_boost_resolution_failed",
+    }
+    assert all("registry overlap" in warning[1]["error"] for warning in warnings)


 def test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths(monkeypatch):
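The `test_retrieve_project_scope_uses_path_segments_not_substrings` case above guards against a naive substring match: the alias `abb` appears inside `cabbage-abbreviations.md`, yet that file is a shared document. A minimal sketch of the distinction follows, assuming ownership is decided by comparing whole path segments. Both helpers are hypothetical illustrations and are not the functions in `atocore.retrieval.retriever`.

```python
from pathlib import PurePosixPath


def owns_by_path_segment(source_file, project_id, aliases=()):
    """Hypothetical helper: a file belongs to a project only when one
    whole path segment equals the project id or one of its aliases."""
    names = {project_id.lower(), *(alias.lower() for alias in aliases)}
    segments = {part.lower() for part in PurePosixPath(source_file).parts}
    return not names.isdisjoint(segments)


def owns_by_substring(source_file, project_id, aliases=()):
    """Hypothetical naive check, shown only for contrast: any occurrence
    of the id or an alias anywhere in the path counts as ownership."""
    haystack = source_file.lower()
    return any(name.lower() in haystack for name in (project_id, *aliases))


shared = "shared/cabbage-abbreviations.md"
print(owns_by_substring(shared, "abb-space", ("abb",)))     # substring check
print(owns_by_path_segment(shared, "abb-space", ("abb",)))  # segment check
```

Under the segment rule the shared file is unowned, so a query scoped to `p05` can still return it as global context, which is what the test asserts.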
--- a/tests/test_v1_a_pillar_queries.py
+++ b/tests/test_v1_a_pillar_queries.py
@@ -0,0 +1,278 @@
+"""V1-A — minimum query slice that proves the entity model end-to-end.
+
+Per ``docs/plans/engineering-v1-completion-plan.md`` Phase V1-A:
+the four pillar queries (Q-001 subsystem-scoped, Q-005, Q-006, Q-017)
+must run against a single seed graph and report the expected shapes.
+
+This file is intentionally separate from ``test_engineering_queries.py``
+so the V1-A acceptance is auditable in isolation. The seed graph mirrors
+what the V1-A plan called for: one satisfying Component, one orphan
+Requirement, one supported ValidationClaim, one unsupported one, one
+Decision on a flagged Assumption.
+"""
+
+from __future__ import annotations
+
+import pytest
+from fastapi.testclient import TestClient
+
+from atocore.engineering.queries import (
+    evidence_chain,
+    orphan_requirements,
+    requirements_for,
+    risky_decisions,
+    subsystem_contents,
+    unsupported_claims,
+)
+from atocore.engineering.service import (
+    create_entity,
+    create_relationship,
+    init_engineering_schema,
+    invalidate_active_entity,
+)
+from atocore.main import app
+from atocore.models.database import init_db
+
+
+@pytest.fixture
+def v1a_seed(tmp_data_dir):
+    """Seed the Q-6 integration data exactly as V1-A's plan describes:
+    one satisfying Component + one orphan Requirement + one Decision on
+    a flagged Assumption + one supported ValidationClaim + one
+    unsupported ValidationClaim, plus the Subsystem they live under."""
+    init_db()
+    init_engineering_schema()
+
+    # Subsystem (Q-001 target)
+    optics = create_entity("subsystem", "Optics", project="p05-interferometer")
+
+    # Components: both are PART_OF the subsystem; only Primary Mirror
+    # SATISFIES a requirement (Diverger Lens has no requirement edge).
+    primary = create_entity("component", "Primary Mirror", project="p05-interferometer")
+    diverger = create_entity("component", "Diverger Lens", project="p05-interferometer")
+    create_relationship(primary.id, optics.id, "part_of")
+    create_relationship(diverger.id, optics.id, "part_of")
+
+    # Requirements — one satisfied by primary, one orphan (Q-006)
+    req_satisfied = create_entity(
+        "requirement", "Surface figure < 25 nm RMS", project="p05-interferometer"
+    )
+    req_orphan = create_entity(
+        "requirement", "Calibration repeatable to lambda/20", project="p05-interferometer"
+    )
+    create_relationship(primary.id, req_satisfied.id, "satisfies")
+
+    # Decision on a flagged Assumption (Q-009 surface; asserted in
+    # test_v1a_seed_also_exercises_q009_and_q011_killers below)
+    flagged_assumption = create_entity(
+        "parameter",
+        "Vendor lead time = 6 weeks",
+        project="p05-interferometer",
+        properties={"flagged": True},
+    )
+    risky = create_entity(
+        "decision", "Pre-order CGH from external vendor", project="p05-interferometer"
+    )
+    create_relationship(risky.id, flagged_assumption.id, "based_on_assumption")
+
+    # Validation claims — one supported by a Result, one unsupported (Q-017
+    # touches the supported one; Q-011 surfaces the unsupported one)
+    fea_result = create_entity(
+        "result", "FEA thermal sweep 2026-04", project="p05-interferometer"
+    )
+    claim_supported = create_entity(
+        "validation_claim", "Thermal margin is adequate", project="p05-interferometer"
+    )
+    claim_unsupported = create_entity(
+        "validation_claim", "Vibration isolation passes spec", project="p05-interferometer"
+    )
+    create_relationship(fea_result.id, claim_supported.id, "supports")
+
+    return {
+        "subsystem": optics,
+        "primary": primary,
+        "diverger": diverger,
+        "req_satisfied": req_satisfied,
+        "req_orphan": req_orphan,
+        "claim_supported": claim_supported,
+        "claim_unsupported": claim_unsupported,
+        "fea_result": fea_result,
+        "risky_decision": risky,
+        "flagged_assumption": flagged_assumption,
+    }
+
+
+# ---------------------------------------------------------------------------
+# Q-001 subsystem-scoped — the V1-A code change
+# ---------------------------------------------------------------------------
+
+
+def test_subsystem_contents_returns_spec_shape(v1a_seed):
+    """The shape must match the catalog spec
+    (engineering-query-catalog.md Q-001):
+    ``{subsystem, contains: [{id, type, name, status}]}``.
+    The implementation also emits ``entity_type`` for parity with the
+    rest of this module's response style — both must be present."""
+    result = subsystem_contents(v1a_seed["subsystem"].id)
+
+    assert result is not None
+    assert set(result.keys()) == {"subsystem", "contains"}
+    ss = result["subsystem"]
+    assert ss["id"] == v1a_seed["subsystem"].id
+    assert ss["name"] == "Optics"
+    assert ss["project"] == "p05-interferometer"
+    assert ss["status"] == "active"
+
+    contained = {c["name"] for c in result["contains"]}
+    assert contained == {"Primary Mirror", "Diverger Lens"}
+    for child in result["contains"]:
+        # Spec requires `type`; we also include `entity_type` for parity.
+        assert "type" in child
+        assert "entity_type" in child
+        assert child["type"] == child["entity_type"]
+        assert set(child.keys()) >= {"id", "type", "entity_type", "name", "status"}
+        assert child["entity_type"] == "component"
+        assert child["status"] == "active"
+
+
+def test_subsystem_contents_returns_none_for_missing_id(tmp_data_dir):
+    init_db()
+    init_engineering_schema()
+    assert subsystem_contents("no-such-id") is None
+
+
+def test_subsystem_contents_returns_none_when_entity_is_not_a_subsystem(v1a_seed):
+    """Calling the subsystem-scoped query against a Component must
+    return None, not the component dressed up as a subsystem."""
+    assert subsystem_contents(v1a_seed["primary"].id) is None
+
+
+def test_subsystem_contents_route_matches_spec(v1a_seed):
+    """Spec invocation: GET /entities/Subsystem/<id>?expand=contains.
+    Verify the route is registered, the expand=contains path works,
+    and unsupported expand values 400."""
+    client = TestClient(app)
+    sid = v1a_seed["subsystem"].id
+
+    r = client.get(f"/entities/Subsystem/{sid}?expand=contains")
+    assert r.status_code == 200
+    body = r.json()
+    assert body["subsystem"]["id"] == sid
+    names = {c["name"] for c in body["contains"]}
+    assert names == {"Primary Mirror", "Diverger Lens"}
+
+    # Default expand is "contains" so omitting the param should also work
+    r = client.get(f"/entities/Subsystem/{sid}")
+    assert r.status_code == 200
+
+    # Unsupported expand value 400s with an informative message
+    r = client.get(f"/entities/Subsystem/{sid}?expand=parents")
+    assert r.status_code == 400
+    assert "expand" in r.json()["detail"].lower()
+
+    # A wrong-type id (here a Component) 404s, as does a missing id
+    r = client.get(f"/entities/Subsystem/{v1a_seed['primary'].id}")
+    assert r.status_code == 404
+
+
+def test_subsystem_contents_v1_alias_is_present():
+    """The subsystem-scoped variant must also be reachable under /v1."""
+    client = TestClient(app)
+    spec = client.get("/openapi.json").json()
+    assert "/v1/entities/Subsystem/{subsystem_id}" in spec["paths"]
+
+
+def test_subsystem_contents_v1_alias_serves_real_response(v1a_seed):
+    """Behavior, not just OpenAPI presence — catches a future
+    route-copy or ordering regression (Codex V1-A P3)."""
+    client = TestClient(app)
+    sid = v1a_seed["subsystem"].id
+
+    r = client.get(f"/v1/entities/Subsystem/{sid}?expand=contains")
+    assert r.status_code == 200
+    body = r.json()
+    assert body["subsystem"]["id"] == sid
+    assert {c["name"] for c in body["contains"]} == {"Primary Mirror", "Diverger Lens"}
+
+
+def test_subsystem_contents_includes_child_subsystems_and_excludes_inactive(tmp_data_dir):
+    """Codex V1-A P3: lock in two intended behaviors that aren't
+    obvious from the data shape:
+      1. Child *Subsystems* (nested) surface in `contains` — the impl
+         walks part_of irrespective of child entity_type.
+      2. Children whose status is not 'active' are filtered out."""
+    init_db()
+    init_engineering_schema()
+
+    parent = create_entity("subsystem", "Parent System", project="p-nested")
+    child_ss = create_entity("subsystem", "Child Subsystem", project="p-nested")
+    child_comp = create_entity("component", "Child Component", project="p-nested")
+    invalid_comp = create_entity("component", "Invalidated Component", project="p-nested")
+    create_relationship(child_ss.id, parent.id, "part_of")
+    create_relationship(child_comp.id, parent.id, "part_of")
+    create_relationship(invalid_comp.id, parent.id, "part_of")
+
+    invalidate_active_entity(invalid_comp.id, reason="v1a regression seed")
+
+    result = subsystem_contents(parent.id)
+    names_by_type = {(c["entity_type"], c["name"]) for c in result["contains"]}
+    assert ("subsystem", "Child Subsystem") in names_by_type
+    assert ("component", "Child Component") in names_by_type
+    # Inactive child must not appear
+    assert all(c["name"] != "Invalidated Component" for c in result["contains"])
+    assert all(c["status"] == "active" for c in result["contains"])
+
+
+# ---------------------------------------------------------------------------
+# V1-A acceptance — the four pillar queries against the single seed graph
+# ---------------------------------------------------------------------------
+
+
+def test_v1a_pillar_queries_run_end_to_end_against_single_seed(v1a_seed):
+    """The V1-A acceptance test. All four pillar queries must report
+    the expected shape against the same seed graph. If this test fails,
+    V1-A is not done — full stop. (Per the plan: "If the four pillar
+    queries don't work, stopping here is cheap.")"""
+    project = "p05-interferometer"
+
+    # Q-001 subsystem-scoped
+    q1 = subsystem_contents(v1a_seed["subsystem"].id)
+    assert q1 is not None
+    assert {c["name"] for c in q1["contains"]} == {"Primary Mirror", "Diverger Lens"}
+
+    # Q-005 — requirements satisfied by Primary Mirror
+    q5 = requirements_for(v1a_seed["primary"].id)
+    assert q5["count"] == 1
+    assert q5["requirements"][0]["name"] == "Surface figure < 25 nm RMS"
+
+    # Q-006 (killer correctness) — orphan requirements in the project
+    q6 = orphan_requirements(project)
+    orphan_names = {r["name"] for r in q6["gaps"]}
+    assert "Calibration repeatable to lambda/20" in orphan_names
+    assert "Surface figure < 25 nm RMS" not in orphan_names
+
+    # Q-017 — evidence chain for the supported ValidationClaim
+    q17 = evidence_chain(v1a_seed["claim_supported"].id)
+    via_supports = [edge for edge in q17["evidence_chain"] if edge["via"] == "supports"]
+    assert any(edge["source_name"] == "FEA thermal sweep 2026-04" for edge in via_supports)
+
+    # And the unsupported claim must have an empty supports chain
+    q17_unsup = evidence_chain(v1a_seed["claim_unsupported"].id)
+    assert not any(edge["via"] == "supports" for edge in q17_unsup["evidence_chain"])
+
+
+def test_v1a_seed_also_exercises_q009_and_q011_killers(v1a_seed):
+    """Per Codex V1-A P2: the seed already includes Q-009 (decision on
+    flagged assumption) and Q-011 (unsupported validation claim) data,
+    so the V1-A integration test should assert those killer queries
+    surface them. Otherwise the seed entities are dead weight."""
+    project = "p05-interferometer"
+
+    q9 = risky_decisions(project)
+    risky_names = {row["decision_name"] for row in q9["gaps"]}
+    assert "Pre-order CGH from external vendor" in risky_names
+
+    q11 = unsupported_claims(project)
+    unsupported_names = {row["name"] for row in q11["gaps"]}
+    assert "Vibration isolation passes spec" in unsupported_names
+    assert "Thermal margin is adequate" not in unsupported_names
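The Q-001 tests in this file pin down three behaviors: a non-subsystem or unknown id yields `None`, children are found by walking inbound `part_of` edges regardless of child type, and inactive children are dropped. The sketch below reproduces that logic over plain in-memory data. The `Entity` dataclass, the edge tuples, and `subsystem_contents_sketch` are assumptions made for illustration; the real `subsystem_contents()` reads from the engineering store and may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    # Hypothetical stand-in for the stored entity row.
    id: str
    entity_type: str
    name: str
    project: str = ""
    status: str = "active"


def subsystem_contents_sketch(subsystem_id, entities, edges):
    """entities: dict of id -> Entity.
    edges: iterable of (source_id, target_id, relationship_type)."""
    subsystem = entities.get(subsystem_id)
    if subsystem is None or subsystem.entity_type != "subsystem":
        return None  # missing id or wrong entity type

    contains = []
    for source_id, target_id, rel_type in edges:
        # Inbound part_of edge: child --part_of--> subsystem.
        if rel_type != "part_of" or target_id != subsystem_id:
            continue
        child = entities.get(source_id)
        if child is None or child.status != "active":
            continue  # skip dangling edges and inactive children
        contains.append({
            "id": child.id,
            "type": child.entity_type,         # key named by the catalog spec
            "entity_type": child.entity_type,  # key kept for module parity
            "name": child.name,
            "status": child.status,
        })

    return {
        "subsystem": {
            "id": subsystem.id,
            "name": subsystem.name,
            "project": subsystem.project,
            "status": subsystem.status,
        },
        "contains": contains,
    }
```

Emitting both `type` and `entity_type` mirrors the dual-key shape the tests require; a caller that only knows the catalog spec can read `type` and ignore the second key.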
Author	SHA1	Message	Date
Anto01	365259fde0	docs(ledger): V1-A landed and deployed build `23cdb31`. Live harness 20/20. V1-A status in master-plan-status.md flipped to ✅; V1-B is the next phase. 605 tests on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 13:11:07 -04:00
Anto01	23cdb3149f	feat(engineering): V1-A — Q-001 subsystem-scoped + pillar query integration test Phase V1-A of the Engineering V1 Completion Plan. Scope was kept tight per the plan: a single Q-001 shape fix and an integration test that proves the four pillar queries work end-to-end against one seed graph. Code change: - subsystem_contents() in src/atocore/engineering/queries.py returns {subsystem, contains: [{id, type, entity_type, name, status}]} by walking inbound part_of edges (the inverse of CONTAINS), filtered to active children. `type` matches the catalog spec; `entity_type` preserves parity with the rest of this module's response shape. - GET /entities/Subsystem/{id}?expand=contains route in routes.py matches the Q-001 spec invocation verbatim; 400 on unsupported expand, 404 on missing-or-wrong-type. - Aliased under /v1. - The existing project-wide /engineering/projects/{name}/systems route stays as-is for Q-004 (whole-project tree). V1-A acceptance test (tests/test_v1_a_pillar_queries.py): - Q-001 subsystem-scoped: Optics → Primary Mirror + Diverger Lens - Q-005 requirements_for: Primary Mirror has its single satisfied req - Q-006 orphan_requirements: orphan surfaces, satisfied does not - Q-017 evidence_chain: supported claim has the FEA result via 'supports'; unsupported claim does not - Q-009 risky_decisions / Q-011 unsupported_claims also asserted against the same seed (Codex P2: don't seed data you don't assert) Plus targeted tests for the Q-001 helper: missing id, wrong type, nested child subsystems, inactive-child filtering, real /v1 GET behavior. Reviewed by Codex twice: GO WITH CONDITIONS at `b575773`, GO at `0e83525` after the dual-key emit + Q-009/Q-011 + /v1-behavior + nested/inactive amends folded in. Tests: 596 -> 605 (+9). Full local suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 13:03:58 -04:00
Anto01	785756fb58	docs(ledger): openclaw registered, proposals queue drained Wave 1.5 fully closed. 8 registered projects total. No unregistered labels above the 10-active-memory threshold remain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 11:41:21 -04:00
Anto01	a8f4d51f06	docs(ledger): apm registered as product I03 Atomaste Parts Manager joins the registered project set alongside atomizer-v2. Aliases: i03, atomaste-parts-manager, parts-manager. 165 active + 22 candidate memories now resolve to a registered canonical id. /admin/projects/proposals count drops 2 -> 1 (openclaw remains, framing pending). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 10:32:41 -04:00
Anto01	848ad9db2d	docs(ledger): Wave 1.5 deployed, apm + openclaw proposals live build `b69d2c7`. /admin/projects/proposals returns apm (165 active) and openclaw (17). Both have empty suggested_aliases — they're standalone, not part of a sibling cluster. Branches cleaned up. Operator decision pending: register apm via POST /admin/projects/register-emerging with body {"project_id":"apm"}. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 22:23:09 -04:00
Anto01	b69d2c7088	feat(projects): Wave 1.5 — live emerging-project registration proposals GET /admin/projects/proposals?min_active=N — on-demand companion to the nightly scripts/detect_emerging.py cache. Reads SQL + the registry directly so the result is current. Each proposal: - project_id (literal label as captured) - active_count / candidate_count from current SQL - sample_memories: 3 most recent active rows with content preview - suggested_aliases: sibling labels sharing a >=4-char token, case-insensitive (lead-space + lead-space-exploration-ltd + space-exploration-ltd cluster; apm and apm-fpga do NOT cluster via the 3-char 'apm') - guessed_ingest_root: vault:incoming/projects/<id>/ Workflow: hit /admin/projects/proposals to see "what should I register?", then POST to existing /admin/projects/register-emerging. For prod: apm has 165 active memories, openclaw has 17, hydrotech-mining variants combine to 13. apm is overdue. Closes Codex's prior P2 from the state-of-service review. Reviewed by Codex on tip `e8ac8bb` (verdict GO); two follow-on improvements (stronger negative-clustering test + case-insensitive tokens) folded into `f70fa6b`. 10 regression tests covering: registered canonical/alias exclusion, threshold filtering, sibling clustering, short-token negative, case-insensitive clustering, registered-token-leak guard, sample shape, candidate counting, param validation, sort order. Test count: 586 -> 596. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 22:16:18 -04:00
Anto01	d4ee52729c	docs(ledger): Wave 1 deployed, post-deploy probes green live_sha `4c70756` verified. /admin/dashboard now reports SQL-aggregate counts (1170 active visible vs 315 pre-deploy). Harness 20/20. Supersede/invalidate route branches probed live: 404 unknown / 409 candidate / 200 active. Wave 1 closed; Wave 1.5 (emerging-project registration proposal) is the next branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 22:05:48 -04:00
Anto01	4c7075650c	fix(memory): Wave 1 — SQL-aggregate dashboard counts + memory write-path fixes Closes three live-affecting bugs surfaced by the 2026-04-29 Codex review, all in the memory write/read path. Pre-deploy on Dalidou the live discrepancy was dashboard.memories.active=315 vs integrity active=1091. 1. /admin/dashboard counts now SQL-aggregate (no sampling). New get_memory_count_summary() helper. Dashboard memories.{active, candidates,by_type,by_project,reinforced,by_status,total} all derive from full-table SQL, not a confidence-sorted limit=500 sample. Post deploy the dashboard active count must match the integrity panel. 2. PUT /memory/{id} accepts project; auto-triage now applies it. Added project to MemoryUpdateRequest and update_memory() with resolve_project_name canonicalization, before/after audit, and duplicate-active check scoped to the new project. scripts/auto_triage.py suggested-project correction now PUTs {"project": suggested} so misattribution flags actually retarget the memory. 3. POST /memory/{id}/invalidate uses direct id lookup. New get_memory(id) helper. Replaces the old _get_memories(status="active", limit=1) lookup, which only saw the highest-confidence active row. Active memories outside slot 0 no longer 404. Same status-guard structure applied to POST /memory/{id}/supersede so candidates can't silently flip to superseded. 14 regression tests added (572 -> 586 locally). Reviewed by Codex twice: verdict GO on tip `9604c3e`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 21:57:08 -04:00
Anto01	7042eaea46	docs: record fully green retrieval harness	2026-04-24 21:04:13 -04:00
Anto01	d3de9f67ea	fix(context): rank trusted state by query relevance	2026-04-24 20:54:56 -04:00
Anto01	0fc6705173	docs: record retrieval stabilization state	2026-04-24 20:45:05 -04:00
Anto01	a87d9845a8	fix(memory): widen query-time context candidates	2026-04-24 20:29:00 -04:00
Antoine Letarte	4744c69d10	fix(memory): rank project memories by query intent	2026-04-24 20:49:53 +00:00
Anto01	867a1abfaa	merge: explicit project id retrieval metadata	2026-04-24 11:36:19 -04:00
Anto01	05c11fd4fb	fix(retrieval): fail open on registry resolution errors	2026-04-24 11:32:46 -04:00
Anto01	ce6ffdbb63	fix(retrieval): preserve project ids across unscoped ingest	2026-04-24 11:22:13 -04:00
Anto01	c03022d864	feat(retrieval): persist explicit chunk project ids	2026-04-24 11:02:30 -04:00
Anto01	f44a211497	merge: project-scoped retrieval audit improvements	2026-04-24 10:47:15 -04:00
Anto01	c7212900b0	fix(retrieval): enforce project-scoped context boundaries	2026-04-24 10:46:56 -04:00
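The Wave 1.5 commit message describes `suggested_aliases` as sibling labels that share a token of at least four characters, compared case-insensitively. A minimal sketch of that rule is below, assuming labels are tokenized on hyphens. The function names and the hyphen split are illustrative assumptions, not the code behind `/admin/projects/proposals`.

```python
def significant_tokens(label, min_len=4):
    """Lower-cased hyphen-separated tokens of at least min_len characters."""
    return {token for token in label.lower().split("-") if len(token) >= min_len}


def suggested_aliases_sketch(label, all_labels, min_len=4):
    """Other labels that share at least one significant token with `label`."""
    own = significant_tokens(label, min_len)
    return sorted(
        other
        for other in all_labels
        if other != label and own & significant_tokens(other, min_len)
    )


labels = [
    "lead-space",
    "lead-space-exploration-ltd",
    "space-exploration-ltd",
    "apm",
    "apm-fpga",
]
print(suggested_aliases_sketch("lead-space", labels))
print(suggested_aliases_sketch("apm", labels))
```

With a four-character minimum, the three `space` labels cluster through `lead`, `space`, or `exploration`, while `apm` and `apm-fpga` stay apart because their only common token has three characters, matching the behavior the commit message describes.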