Compare commits
1 Commits
codex/proj
...
codex/open
| Author | SHA1 | Date |
|---|---|---|
| | 87ededb71c | |
@@ -6,17 +6,17 @@

## Orientation

- **live_sha** (Dalidou `/health` build_sha): `f44a211` (verified 2026-04-24T14:48:44Z post audit-improvements deploy; status=ok)
- **last_updated**: 2026-04-24 by Codex (retrieval boundary deployed; project_id metadata branch started)
- **main_tip**: `f44a211`
- **test_count**: 567 on `codex/project-id-metadata-retrieval` (deployed main baseline: 553)
- **harness**: `19/20 PASS` on live Dalidou, 0 blocking failures, 1 known content gap (`p04-constraints`)
- **live_sha** (Dalidou `/health` build_sha): `2b86543` (verified 2026-04-23T15:20:53Z post-R14 deploy; status=ok)
- **last_updated**: 2026-04-23 by Claude (R14 squash-merged + deployed; Orientation refreshed)
- **main_tip**: `2b86543`
- **test_count**: 548 (547 + 1 R14 regression test)
- **harness**: `17/18 PASS` on live Dalidou (p04-constraints expects "Zerodur" — known content gap, not regression; consistent since 2026-04-19)
- **vectors**: 33,253
- **active_memories**: 290 (`/admin/dashboard` 2026-04-24; note integrity panel reports a separate active_memory_count=951 and needs reconciliation)
- **candidate_memories**: 0 (triage queue drained)
- **interactions**: 951 (`/admin/dashboard` 2026-04-24)
- **active_memories**: 784 (up from 84 pre-density-batch — density gate CRUSHED vs V1-A's 100-target)
- **candidate_memories**: 2 (triage queue drained)
- **interactions**: 500+ (limit=2000 query returned 500 — density batch has been running; actual may be higher, confirm via /stats next update)
- **registered_projects**: atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08)
- **project_state_entries**: 128 across registered projects (`/admin/dashboard` 2026-04-24)
- **project_state_entries**: 63 (atocore alone; full cross-project count not re-sampled this update)
- **entities**: 66 (up from 35 — V1-0 backfill + ongoing work; 0 open conflicts)
- **off_host_backup**: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron, verified
- **nightly_pipeline**: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → auto-triage → **auto-promote/expire (NEW)** → weekly synth/lint Sundays → **retrieval harness (NEW)** → **pipeline summary (NEW)**
@@ -170,16 +170,6 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha

## Session Log

- **2026-04-24 Codex (retrieval boundary deployed + project_id metadata tranche)** Merged `codex/audit-improvements-foundation` to `main` as `f44a211` and pushed to Dalidou Gitea. Took pre-deploy runtime backup `/srv/storage/atocore/backups/snapshots/20260424T144810Z` (DB + registry, no Chroma). Deployed via `papa@dalidou` canonical `deploy/dalidou/deploy.sh`; live `/health` reports build_sha `f44a2114970008a7eec4e7fc2860c8f072914e38`, build_time `2026-04-24T14:48:44Z`, status ok. Post-deploy retrieval harness: 20 fixtures, 19 pass, 0 blocking failures, 1 known issue (`p04-constraints`). The former blocker `p05-broad-status-no-atomizer` now passes. Manual p05 `context-build "current status"` spot check shows no p04/Atomizer source bleed in retrieved chunks. Started follow-up branch `codex/project-id-metadata-retrieval`: registered-project ingestion now writes explicit `project_id` into DB chunk metadata and Chroma vector metadata; retrieval prefers exact `project_id` when present and keeps path/tag matching as legacy fallback; added dry-run-by-default `scripts/backfill_chunk_project_ids.py` to backfill SQLite + Chroma metadata; added tests for project-id ingestion, registered refresh propagation, exact project-id retrieval, and collision fallback. Verified targeted suite (`test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 36 passed. Verified full suite: 556 passed in 72.44s. Branch not merged or deployed yet.

- **2026-04-24 Codex (project_id audit response)** Applied independent-audit fixes on `codex/project-id-metadata-retrieval`. Closed the nightly `/ingest/sources` clobber risk by adding registry-level `derive_project_id_for_path()` and making unscoped `ingest_file()` derive ownership from registered ingest roots when possible; `refresh_registered_project()` still passes the canonical project id directly. Changed retrieval so empty `project_id` falls through to legacy path/tag ownership instead of short-circuiting as unowned. Hardened `scripts/backfill_chunk_project_ids.py`: `--apply` now requires `--chroma-snapshot-confirmed`, runs Chroma metadata updates before SQLite writes, batches updates, skips and reports missing vectors, skips and reports malformed metadata, reports already-tagged rows, and turns missing ingestion tables into a JSON `db_warning` instead of a traceback. Added tests for auto-derive ingestion, empty-project fallback, ingest-root overlap rejection, and backfill dry-run/apply/snapshot/missing-vector/malformed cases. Verified targeted suite (`test_backfill_chunk_project_ids.py`, `test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 45 passed. Verified full suite: 565 passed in 73.16s. Local dry-run on empty/default data returns 0 updates with `db_warning` rather than crashing. Branch still not merged/deployed.

- **2026-04-24 Codex (project_id final hardening before merge)** Applied the final independent-review P2s on `codex/project-id-metadata-retrieval`: `ingest_file()` still fails open when project-id derivation fails, but now emits `project_id_derivation_failed` with file path and error; retrieval now catches registry failures both at project-scope resolution and the soft project-match boost path, logs warnings, and serves unscoped rather than raising. Added regression tests for both fail-open paths. Verified targeted suite (`test_ingestion.py`, `test_retrieval.py`, `test_backfill_chunk_project_ids.py`, `test_project_registry.py`): 47 passed. Verified full suite: 567 passed in 79.66s. Branch still not merged/deployed.

- **2026-04-24 Codex (audit improvements foundation)** Started implementation of the audit recommendations on branch `codex/audit-improvements-foundation` from `origin/main@c53e61e`. First tranche: registry-aware project-scoped retrieval filtering (`ATOCORE_RANK_PROJECT_SCOPE_FILTER`, widened candidate pull before filtering), eval harness known-issue lane, two p05 project-bleed fixtures, `scripts/live_status.py`, README/current-state/master-plan status refresh. Verified `pytest -q`: 550 passed in 67.11s. Live retrieval harness against undeployed production: 20 fixtures, 18 pass, 1 known issue (`p04-constraints` Zerodur/1.2 content gap), 1 blocking guard (`p05-broad-status-no-atomizer`) still failing because production has not yet deployed the retrieval filter and currently pulls `P04-GigaBIT-M1-KB-design` into broad p05 status context. Live dashboard refresh: health ok, build `2b86543`, docs 1748, chunks/vectors 33253, interactions 948, active memories 289, candidates 0, project_state total 128. Noted count discrepancy: dashboard memories.active=289 while integrity active_memory_count=951; schedule reconciliation in a follow-up.

- **2026-04-24 Codex (independent-audit hardening)** Applied the Opus independent audit's fast follow-ups before merge/deploy. Closed the two P1s by making project-scope ownership path/tag-based only, adding path-segment/tag-exact matching to avoid short-alias substring collisions, and keeping title/heading text out of provenance decisions. Added regression tests for title poisoning, substring collision, and unknown-project fallback. Added retrieval log fields `raw_results_count`, `post_filter_count`, `post_filter_dropped`, and `underfilled`. Added retrieval-eval run metadata (`generated_at`, `base_url`, `/health`) and `live_status.py` auth-token/status support. README now documents the ranking knobs and clarifies that the hard scope filter and soft project match boost are separate controls. Verified `pytest -q`: 553 passed in 66.07s. Live production remains expected-predeploy: 20 fixtures, 18 pass, 1 known content gap, 1 blocking p05 bleed guard. Latest live dashboard: build `2b86543`, docs 1748, chunks/vectors 33253, interactions 950, active memories 290, candidates 0, project_state total 128.

- **2026-04-23 Codex + Claude (R14 closed)** Codex reviewed `claude/r14-promote-400` at `3888db9`, no findings: "The route change is narrowly scoped: `promote_entity()` still returns False for not-found/not-candidate cases, so the existing 404 behavior remains intact, while caller-fixable validation failures now surface as 400." Ran `pytest tests/test_v1_0_write_invariants.py -q` from an isolated worktree: 15 passed in 1.91s. Claude squash-merged to main as `0989fed`, followed by ledger close-out `2b86543`, then deployed via canonical script. Dalidou `/health` reports build_sha=`2b86543e6ad26011b39a44509cc8df3809725171`, build_time `2026-04-23T15:20:53Z`, status=ok. R14 closed. Orientation refreshed earlier this session also reflected the V1-A gate status: **density gate CLEARED** (784 active memories vs 100 target — density batch-extract ran between 2026-04-22 and 2026-04-23 and more than crushed the gate), **soak gate at day 5 of ~7** (F4 first run 2026-04-19; nightly clean 2026-04-19 through 2026-04-23; only chronic failure is the known p04-constraints "Zerodur" content gap). V1-A branches from a clean V1-0 baseline as soon as the soak is called done.

- **2026-04-22 Codex + Antoine (V1-0 closed)** Codex approved `f16cd52` after re-running both original probes (legacy-candidate promote + supersede hook — both correct) and the three targeted regression suites (`test_v1_0_write_invariants.py`, `test_engineering_v1_phase5.py`, `test_inbox_crossproject.py` — all pass). Squash-merged to main as `2712c5d` ("feat(engineering): enforce V1-0 write invariants"). Deployed to Dalidou via the canonical deploy script; `/health` build_sha=`2712c5d2d03cb2a6af38b559664afd1c4cd0e050` status=ok. Validated backup snapshot at `/srv/storage/atocore/backups/snapshots/20260422T190624Z` taken BEFORE prod backfill. Prod backfill of `scripts/v1_0_backfill_provenance.py` against live DB: dry-run found 31 active/superseded entities with no provenance, list reviewed and looked sane; live run with default `hand_authored=1` flag path updated 31 rows; follow-up dry-run returned 0 rows remaining → no lingering F-8 violations in prod. Codex logged one residual P2 (R14): HTTP `POST /entities/{id}/promote` route doesn't translate the new service-layer `ValueError` into 400 — legacy bad candidate promoted through the API surfaces as 500. Not blocking. V1-0 closed. **Gates for V1-A**: soak window ends ~2026-04-26; 100-active-memory density target (currently 84 active + the ~31 newly flagged ones — need to check how those count in density math). V1-A holds until both gates clear.
26
README.md
@@ -6,7 +6,7 @@ Personal context engine that enriches LLM interactions with durable memory, stru

```bash
pip install -e .
uvicorn atocore.main:app --port 8100
uvicorn src.atocore.main:app --port 8100
```
## Usage

@@ -37,10 +37,6 @@ python scripts/atocore_client.py audit-query "gigabit" 5

| POST | /ingest | Ingest markdown file or folder |
| POST | /query | Retrieve relevant chunks |
| POST | /context/build | Build full context pack |
| POST | /interactions | Capture prompt/response interactions |
| GET/POST | /memory | List/create durable memories |
| GET/POST | /entities | Engineering entity graph surface |
| GET | /admin/dashboard | Operator dashboard |
| GET | /health | Health check |
| GET | /debug/context | Inspect last context pack |
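As a quick illustration of the read path above, a minimal client call against `/query` could be built like this. This is a sketch: the payload field names (`query`, `top_k`) are assumptions, not verified against the live API schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8100"  # default port from the quickstart

def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request against the AtoCore API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_json(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage against a running instance (field names are illustrative):
# chunks = post_json(build_post("/query", {"query": "interferometer status", "top_k": 5}))
```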
@@ -70,10 +66,8 @@ unversioned forms.

FastAPI (port 8100)
|- Ingestion: markdown -> parse -> chunk -> embed -> store
|- Retrieval: query -> embed -> vector search -> rank
|- Context Builder: project state -> memories -> entities -> retrieval -> budget
|- Reflection: capture -> reinforce -> extract -> triage -> promote/expire
|- Engineering: typed entities, relationships, conflicts, wiki/mirror
|- SQLite (documents, chunks, memories, projects, interactions, entities)
|- Context Builder: retrieve -> boost -> budget -> format
|- SQLite (documents, chunks, memories, projects, interactions)
'- ChromaDB (vector embeddings)
```
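The Context Builder's budget step can be pictured as a priority fill: higher-trust tiers (trusted project state, then memories, then retrieved chunks) are packed first, and lower tiers are dropped once the character budget is exhausted. A minimal sketch, assuming the 3000-char default; the real tiering and formatting live in the service code:

```python
def budget_context(tiers: list[list[str]], budget: int = 3000) -> list[str]:
    """Fill the context pack from the highest-trust tier down, stopping at the budget.

    `tiers` is ordered: trusted project state first, retrieved chunks last.
    """
    pack: list[str] = []
    used = 0
    for tier in tiers:
        for item in tier:
            if used + len(item) > budget:
                return pack  # budget exhausted; remaining items are dropped
            pack.append(item)
            used += len(item)
    return pack

# A small budget keeps the trusted entry and drops the long lower-tier items:
pack = budget_context(
    [["decision: use Zerodur"], ["memory " * 50], ["chunk " * 200]],
    budget=120,
)
```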
@@ -88,16 +82,6 @@ Set via environment variables (prefix `ATOCORE_`):

| ATOCORE_CHUNK_MAX_SIZE | 800 | Max chunk size (chars) |
| ATOCORE_CONTEXT_BUDGET | 3000 | Context pack budget (chars) |
| ATOCORE_EMBEDDING_MODEL | paraphrase-multilingual-MiniLM-L12-v2 | Embedding model |
| ATOCORE_RANK_PROJECT_MATCH_BOOST | 2.0 | Soft boost for chunks whose metadata matches the project hint |
| ATOCORE_RANK_PROJECT_SCOPE_FILTER | true | Filter project-hinted retrieval away from other registered project corpora |
| ATOCORE_RANK_PROJECT_SCOPE_CANDIDATE_MULTIPLIER | 4 | Widen candidate pull before project-scope filtering |
| ATOCORE_RANK_QUERY_TOKEN_STEP | 0.08 | Per-token boost when query terms appear in high-signal metadata |
| ATOCORE_RANK_QUERY_TOKEN_CAP | 1.32 | Maximum query-token boost multiplier |
| ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST | 1.18 | Boost current decision/status/requirements-like paths |
| ATOCORE_RANK_PATH_LOW_SIGNAL_PENALTY | 0.72 | Down-rank archive/history-like paths |
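Reading these `ATOCORE_`-prefixed variables with typed defaults might look like the helper below. This is a sketch for illustration; the project presumably has its own settings layer, and the bool parsing shown here is an assumption.

```python
import os

def atocore_setting(name: str, default, cast=None):
    """Read an ATOCORE_-prefixed environment variable, falling back to a typed default."""
    raw = os.environ.get("ATOCORE_" + name)
    if raw is None:
        return default
    cast = cast or type(default)
    if cast is bool:
        # Assumed truthy spellings; check the real settings code for the actual rule.
        return raw.strip().lower() in ("1", "true", "yes", "on")
    return cast(raw)

chunk_max = atocore_setting("CHUNK_MAX_SIZE", 800)
scope_filter = atocore_setting("RANK_PROJECT_SCOPE_FILTER", True)
boost = atocore_setting("RANK_PROJECT_MATCH_BOOST", 2.0)
```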
`ATOCORE_RANK_PROJECT_SCOPE_FILTER` gates the hard cross-project filter only.
`ATOCORE_RANK_PROJECT_MATCH_BOOST` remains the separate soft-ranking knob.
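The ownership test behind the hard scope filter is described elsewhere in this diff as path-segment/tag-exact matching, so that a short alias like `p04` cannot match as a substring of a longer name. A minimal sketch of that idea (illustrative only; the deployed matcher lives in the ranking code):

```python
from pathlib import PurePosixPath

def chunk_owned_by(project: str, aliases: set[str], path: str, tags: set[str]) -> bool:
    """Path-segment and tag-exact matching: a project name or alias must equal a
    whole path segment or a whole tag, never a substring of a longer segment."""
    names = {project, *aliases}
    segments = set(PurePosixPath(path).parts)
    return bool(names & segments) or bool(names & tags)

# 'p04' matches the directory segment, but not a substring of 'p04x-notes':
assert chunk_owned_by("p04-gigabit", {"p04"}, "vault/p04/design.md", set())
assert not chunk_owned_by("p04-gigabit", {"p04"}, "vault/p04x-notes/design.md", set())
```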
## Testing

@@ -109,11 +93,7 @@ pytest

## Operations

- `scripts/atocore_client.py` provides a live API client for project refresh, project-state inspection, and retrieval-quality audits.
- `scripts/retrieval_eval.py` runs the live retrieval/context harness, separates blocking failures from known content gaps, and stamps JSON output with target/build metadata.
- `scripts/live_status.py` renders a compact read-only status report from `/health`, `/stats`, `/projects`, and `/admin/dashboard`; set `ATOCORE_AUTH_TOKEN` or `--auth-token` when those endpoints are gated.
- `scripts/backfill_chunk_project_ids.py` dry-runs or applies explicit `project_id` metadata backfills for SQLite chunks and Chroma vectors; `--apply` requires a confirmed Chroma snapshot.
- `docs/operations.md` captures the current operational priority order: retrieval quality, Wave 2 trusted-operational ingestion, AtoDrive scoping, and restore validation.
- `DEV-LEDGER.md` is the fast-moving source of operational truth during active development; copy claims into docs only after checking the live service.
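The backfill script's dry-run-by-default stance and its `--apply` gate can be expressed as an argparse check like the one below. This is a sketch of the described behavior, not the script's actual code; only the two flag names come from the docs above.

```python
import argparse

def parse_backfill_args(argv: list[str]) -> argparse.Namespace:
    """Dry run by default; --apply refuses to run without a confirmed Chroma snapshot."""
    parser = argparse.ArgumentParser(prog="backfill_chunk_project_ids.py")
    parser.add_argument("--apply", action="store_true",
                        help="write changes (default: dry run)")
    parser.add_argument("--chroma-snapshot-confirmed", action="store_true",
                        help="attest that a Chroma snapshot exists before applying")
    args = parser.parse_args(argv)
    if args.apply and not args.chroma_snapshot_confirmed:
        # parser.error() prints usage and exits non-zero, so a bare
        # `--apply` can never reach the write path.
        parser.error("--apply requires --chroma-snapshot-confirmed")
    return args

args = parse_backfill_args([])  # no flags: a safe dry run
```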
## Architecture Notes
@@ -1,11 +1,6 @@

# AtoCore - Current State (2026-04-24)
# AtoCore — Current State (2026-04-22)

Update 2026-04-24: audit-improvements deployed as `f44a211`; live harness is
19/20 with 0 blocking failures and 1 known content gap. Active follow-up branch
`codex/project-id-metadata-retrieval` is at 567 passing tests.

Live deploy: `2b86543` · Dalidou health: ok · Harness: 18/20 with 1 known
content gap and 1 current blocking project-bleed guard · Tests: 553 passing.
Live deploy: `2712c5d` · Dalidou health: ok · Harness: 17/18 · Tests: 547 passing.

## V1-0 landed 2026-04-22

@@ -18,8 +13,9 @@ supersede) with Q-3 fail-open. Prod backfill ran cleanly — 31 legacy
active/superseded entities flagged `hand_authored=1`, follow-up dry-run
returned 0 remaining rows. Test count 533 → 547 (+14).

R14 is closed: `POST /entities/{id}/promote` now translates the new
caller-fixable V1-0 `ValueError` into HTTP 400.
R14 (P2, non-blocking): `POST /entities/{id}/promote` route fix translates
the new `ValueError` into 400. Branch `claude/r14-promote-400` pending
Codex review + squash-merge.

**Next in the V1 track:** V1-A (minimal query slice + Q-6 killer-correctness
integration). Gated on pipeline soak (~2026-04-26) + 100+ active memory

@@ -69,10 +65,10 @@ Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs

| 7G | Re-extraction on prompt version bump | pending |
| 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |

## Known gaps (honest, refreshed 2026-04-24)
## Known gaps (honest)

1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
2. **Project-scoped retrieval guard is deployed and passing.** The April 24 p05 broad-status bleed guard now passes on live Dalidou. The active follow-up branch adds explicit `project_id` chunk/vector metadata so the deployed path/tag heuristic can become a legacy fallback.
3. **Human interface is useful but not yet the V1 Human Mirror.** Wiki/dashboard pages exist, but the spec routes, deterministic mirror files, disputed markers, and curated annotations remain V1-D work.
4. **Harness known issue:** `p04-constraints` wants "Zerodur" and "1.2"; live retrieval surfaces related constraints but not those exact strings. Treat as a content/state gap until fixed.
5. **Formal docs lag the ledger during fast work.** Use `DEV-LEDGER.md` and `python scripts/live_status.py` for live truth, then copy verified claims into these docs.
2. **OpenClaw is capture-only, not context-grounded.** The plugin POSTs `/interactions` on `llm_output` but does NOT call `/context/build` on `before_agent_start`. OpenClaw's underlying agent runs blind. See Q2 below.
3. **Human interface (wiki) is thin and static.** 5 project cards + a "System" line. No dashboard for the autonomous activity. No per-memory detail page. See Q3/Q5.
4. **Harness 17/18** — the `p04-constraints` fixture wants "Zerodur" but retrieval surfaces related-not-exact terms. Content gap, not a retrieval regression.
5. **Two projects under-populated**: p05-interferometer (4 memories, 18 state) and atomizer-v2 (1 memory, 6 state). Batch re-extract with the new llm-0.6.0 prompt would help.
@@ -70,14 +70,9 @@ read-only additive mode.

- Phase 6 - AtoDrive
- Phase 10 - Write-back
- Phase 11 - Multi-model
- Phase 12 - Evaluation
- Phase 13 - Hardening

### Partial / Operational Baseline

- Phase 12 - Evaluation. The retrieval/context harness exists and runs
against live Dalidou, but coverage is still intentionally small and
should grow before this is complete in the intended sense.

### Engineering Layer Planning Sprint

**Status: complete.** All 8 architecture docs are drafted. The

@@ -131,13 +126,11 @@ This sits implicitly between Phase 8 (OpenClaw) and Phase 11

(multi-model). Memory-review and engineering-entity commands are
deferred from the shared client until their workflows are exercised.

## What Is Real Today (updated 2026-04-24)
## What Is Real Today (updated 2026-04-16)

- canonical AtoCore runtime on Dalidou (`2b86543`, deploy.sh verified)
- canonical AtoCore runtime on Dalidou (`775960c`, deploy.sh verified)
- 33,253 vectors across 6 registered projects
- 951 captured interactions as of the 2026-04-24 live dashboard; refresh
exact live counts with
`python scripts/live_status.py`
- 234 captured interactions (192 claude-code, 38 openclaw, 4 test)
- 6 registered projects:
  - `p04-gigabit` (483 docs, 15 state entries)
  - `p05-interferometer` (109 docs, 18 state entries)

@@ -145,14 +138,12 @@ deferred from the shared client until their workflows are exercised.

  - `atomizer-v2` (568 docs, 5 state entries)
  - `abb-space` (6 state entries)
  - `atocore` (drive source, 47 state entries)
- 128 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
- 290 active memories and 0 candidate memories as of the 2026-04-24 live
dashboard
- 110 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
- 84 active memories (31 project, 23 knowledge, 10 episodic, 8 adaptation, 7 preference, 5 identity)
- context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks
- query-relevance memory ranking with overlap-density scoring
- retrieval eval harness: 20 fixtures; current live has 19 pass, 1 known
content gap, and 0 blocking failures after the audit-improvements deploy
- 567 tests passing on the active `codex/project-id-metadata-retrieval` branch
- retrieval eval harness: 18 fixtures, 17/18 passing on live
- 303 tests passing
- nightly pipeline: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → triage → **auto-promote/expire** → weekly synth/lint → **retrieval harness** → **pipeline summary to project state**
- Phase 10 operational: reinforcement-based auto-promotion (ref_count ≥ 3, confidence ≥ 0.7) + stale candidate expiry (14 days unreinforced)
- pipeline health visible in dashboard: interaction totals by client, pipeline last_run, harness results, triage stats
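The Phase 10 promote/expire rule stated above (ref_count ≥ 3 and confidence ≥ 0.7 to promote; 14 days unreinforced to expire) reduces to a small decision function. A sketch under exactly those thresholds; the real pipeline operates on stored candidate records:

```python
from datetime import datetime, timedelta

def triage_candidate(ref_count: int, confidence: float,
                     last_reinforced: datetime, now: datetime) -> str:
    """Return 'promote', 'expire', or 'keep' for a candidate memory."""
    if ref_count >= 3 and confidence >= 0.7:
        return "promote"  # reinforcement-based auto-promotion
    if now - last_reinforced > timedelta(days=14):
        return "expire"  # stale candidate expiry
    return "keep"

now = datetime(2026, 4, 24)
assert triage_candidate(3, 0.8, now - timedelta(days=2), now) == "promote"
assert triage_candidate(1, 0.5, now - timedelta(days=20), now) == "expire"
assert triage_candidate(1, 0.9, now - timedelta(days=3), now) == "keep"
```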
@@ -199,9 +190,9 @@ where surfaces are disjoint, pauses when they collide.

| V1-E | Memory→entity graduation end-to-end + remaining Q-4 trust tests | pending V1-D (note: collides with memory extractor; pauses for multi-model triage work) |
| V1-F | F-5 detector generalization + route alias + O-1/O-2/O-3 operational + D-1/D-3/D-4 docs | finish line |

R14 is closed: `POST /entities/{id}/promote` now translates
caller-fixable V1-0 provenance validation failures into HTTP 400 instead
of leaking as HTTP 500.
R14 (P2, non-blocking): `POST /entities/{id}/promote` route returns 500
on the new V1-0 `ValueError` instead of 400. Fix on branch
`claude/r14-promote-400`, pending Codex review.
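Stripped of FastAPI specifics, the R14 fix amounts to a status mapping over `promote_entity()` outcomes. A sketch of that mapping only; the real route raises FastAPI's `HTTPException` rather than returning an integer:

```python
def promote_status(promote) -> int:
    """Map promote_entity() outcomes to HTTP status codes:
    ValueError (caller-fixable validation failure) -> 400,
    False (not found / not a candidate)            -> 404,
    True                                           -> 200."""
    try:
        ok = promote()
    except ValueError:
        return 400  # previously leaked as a 500
    return 200 if ok else 404

assert promote_status(lambda: True) == 200
assert promote_status(lambda: False) == 404
```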
## Next

317
docs/openclaw-atocore-audit-note.md
Normal file
@@ -0,0 +1,317 @@

# OpenClaw x AtoCore V1 Audit Note

## Scope

This note is the Phase 1 audit for a safe OpenClaw x AtoCore operating model.
It covers only what was directly verified in `/home/papa/ATOCore` and `/home/papa/clawd` on 2026-04-23, plus assumptions that are explicitly called out as such.

This phase does not change code, runtime behavior, skills, helpers, or automation.

## Files requested and verified

The following requested AtoCore files were present and reviewed:

- `docs/openclaw-integration-contract.md`
- `docs/architecture/llm-client-integration.md`
- `docs/architecture/representation-authority.md`
- `docs/operating-model.md`
- `docs/current-state.md`
- `docs/master-plan-status.md`
- `docs/operations.md`
- `AGENTS.md`
- `CLAUDE.md`
- `DEV-LEDGER.md`

No requested files were missing.

## What was directly verified

### 1. OpenClaw instruction surface

In `/home/papa/clawd/AGENTS.md`, OpenClaw is currently instructed to:

- use the `atocore-context` skill for project-dependent work
- treat AtoCore as additive and fail-open
- prefer `auto-context` for project knowledge questions
- prefer `project-state` for trusted current truth
- use `refresh-project` if the human explicitly asked to refresh or ingest project changes
- use `discrawl` automatically when Antoine asks about prior Discord discussions

This is already close to the intended additive read path, but it also exposes mutating project operations in a general operator workflow.

### 2. OpenClaw helper skill surface

The current helper skill is:

- `/home/papa/clawd/skills/atocore-context/SKILL.md`
- `/home/papa/clawd/skills/atocore-context/scripts/atocore.sh`

The skill describes AtoCore as a read-only additive context service, but the helper script currently exposes the following commands:

- `health`
- `sources`
- `stats`
- `projects`
- `project-template`
- `detect-project`
- `auto-context`
- `debug-context`
- `propose-project`
- `register-project`
- `update-project`
- `refresh-project`
- `project-state`
- `query`
- `context-build`
- `ingest-sources`

That means the helper is not actually read-only: it can drive registry mutation and ingestion-related operations.

### 3. AtoCore shared operator client surface

The shared operator client in `/home/papa/ATOCore/scripts/atocore_client.py` exposes a broader surface than the OpenClaw helper, including:

- all of the project and context operations above
- `project-state-set`
- `project-state-invalidate`
- `capture`
- `extract`
- `reinforce-interaction`
- `list-interactions`
- `get-interaction`
- `queue`
- `promote`
- `reject`
- `batch-extract`
- `triage`

This matches the architectural intent in `docs/architecture/llm-client-integration.md`: a shared operator client should be the canonical reusable surface for multiple frontends.

### 4. Actual layering status today

The intended layering is documented in `docs/architecture/llm-client-integration.md` as:

- AtoCore HTTP API
- shared operator client
- thin per-agent frontends

But the current OpenClaw helper is still its own Bash implementation. It does not shell out to the shared operator client today.

So the shared-client pattern is documented, but not yet applied to OpenClaw.

### 5. AtoCore availability and fail-open behavior

The OpenClaw helper successfully reached the live AtoCore instance during this audit.

Verified live behavior:

- `health` worked
- `projects` worked
- the helper still has fail-open logic when network access fails

This part is consistent with the stated additive and fail-open stance.
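The fail-open stance verified here can be summarized as: any helper failure degrades to "no extra context", never to a blocked agent. An illustrative sketch of that contract, not the helper's actual Bash implementation:

```python
def fetch_context_fail_open(fetch) -> str:
    """Call the AtoCore context fetcher; on ANY failure return empty context so
    the agent proceeds without enrichment instead of erroring out."""
    try:
        return fetch() or ""
    except Exception:
        return ""  # additive service: absence of context is acceptable

assert fetch_context_fail_open(lambda: "project context") == "project context"

def down():
    raise ConnectionError("atocore unreachable")

assert fetch_context_fail_open(down) == ""
```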
### 6. Discrawl availability

The `discrawl` CLI is installed locally and available.

Verified during audit:

- binary present
- version `0.3.0`
- OpenClaw workspace instructions explicitly route project-history recall through `discrawl`

This supports the desired framing of Discord and Discrawl as an evidence stream.

### 7. Screenpipe status

`screenpipe` was not present as a local command in this environment during the audit.

For V1, Screenpipe is deferred and out of scope. No active Screenpipe input lane was verified or adopted in the final V1 policy.

## Current implementation shape

### What OpenClaw can do safely right now

The current safe, directly verified OpenClaw -> AtoCore path is:

- project detection
- context build
- query and retrieval
- project-state read
- service inspection
- fail-open fallback

That is the mature part of the integration.

### What OpenClaw can also do today, but should be treated as controlled operator actions

The current helper also exposes:

- project proposal preview
- project registration
- project update
- project refresh
- ingest-sources

These should not be treated as background or conversational automation. They are operator actions and need an explicit approval policy.

### What exists in AtoCore but is not exposed through the OpenClaw helper

The shared operator client already supports:

- interaction capture
- candidate extraction
- queue review
- promote or reject
- trusted project-state write and invalidate

The current OpenClaw helper does not expose that surface.

This is important for V1 design: the write-capable lanes already exist in AtoCore, but they are not yet safely shaped for Discord-originated automation.
## Conflicts with the target V1 stance

The following conflicts are real and should be named explicitly.

### Conflict 1 - the OpenClaw helper is described as read-only, but it is not read-only

`SKILL.md` frames the integration as read-only additive context.
`atocore.sh` exposes mutating operations:

- `register-project`
- `update-project`
- `refresh-project`
- `ingest-sources`

That mismatch needs a policy fix in Phase 2. For Phase 1 it must be documented as a conflict.

### Conflict 2 - OpenClaw duplicates client logic instead of using the shared operator client

The architecture docs prefer a shared operator client reused across frontends.
The OpenClaw helper currently reimplements request logic and project detection in Bash.

That is a direct conflict with the preferred shared-client pattern.

### Conflict 3 - mutating project operations are too close to the conversational surface

The helper makes registry and ingestion operations reachable from the OpenClaw side without a dedicated Discord-specific approval gate.

Even if the human explicitly asks for a refresh, the current shape does not yet distinguish between:

- a direct trusted operator action in a controlled session
- a Discord-originated conversational path that should require an explicit human approval step before mutation

The Phase 2 V1 policy needs that distinction.

### Conflict 4 - current docs overstate or blur write capabilities

`docs/current-state.md` says OpenClaw can seed AtoCore through project-scoped memory entries and staged document ingestion.
That was not directly verified through the current OpenClaw helper surface in `/home/papa/clawd`.

The helper script does not expose:

- `capture`
- `extract`
- `promote`
- `reject`
- `project-state-set`

So there is at least a documentation and runtime-surface mismatch.

### Conflict 5 - there was no single OpenClaw-facing evidence lane description before this doc set

The target architecture needs a clean distinction between:

- raw evidence
- reviewable candidates
- active memories and entities
- trusted project_state

Today that distinction exists conceptually across several AtoCore docs, but before this Phase 1 doc set there was no single OpenClaw-facing operating model that told an operator exactly where Discord and Discrawl signals are allowed to land.

That is the main gap this doc set closes.
## What is already aligned with the target V1 stance

Several important pieces are already aligned.
### Aligned 1 - additive plus fail-open

Both AtoCore and OpenClaw docs consistently say AtoCore should be additive and fail-open from the OpenClaw side.
That is the right baseline and was verified live.
### Aligned 2 - project_state is already treated as special and curated

AtoCore architecture docs already treat `project_state` as the highest-trust curated layer.
This supports the rule that raw signals must not directly auto-write trusted project state.
### Aligned 3 - canonical-home thinking already exists

`docs/architecture/representation-authority.md` already establishes that each fact type needs one canonical home.
That is exactly the right foundation for the Discord and Discrawl design.
### Aligned 4 - reflection and candidate lifecycle already exists in AtoCore

The shared operator client and AtoCore docs already have a candidate workflow:

- capture
- extract
- queue
- promote or reject

That means V1 does not need to invent a new trust model. It needs to apply the existing one correctly to Discord and Discrawl signals.
## Recommended V1 operating interpretation

Until implementation work begins, the safest V1 operating interpretation is:

1. Discord and Discrawl are evidence sources, not truth sources.
2. OpenClaw is the orchestrator and operator, not canonical storage.
3. AtoCore memories may hold reviewed episodic, personal, and loose project signal.
4. Future AtoCore entities should hold reviewed structured decisions, requirements, and constraints.
5. `project_state` remains manual or tightly gated only.
6. Registry mutation, refresh, ingestion, and candidate promotion or rejection require explicit human approval on Discord-originated paths.
7. The shared operator client should become the only write-capable operator surface reused by OpenClaw and other frontends.
8. Screenpipe remains deferred and out of V1 scope.
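The approval gate described in items 5 and 6 can be sketched as a small guard. This is a minimal illustration, not the real helper's API; the action names match the policy lists above, but the function, the `origin` values, and the `explicitly_approved` flag are assumptions of this sketch.

```python
# Hypothetical gate for Discord-originated mutating actions.
# Read-path actions stay additive and fail-open; mutating actions
# from a Discord-originated path need explicit in-session approval.
MUTATING_ACTIONS = {
    "register-project", "update-project", "refresh-project",
    "ingest-sources", "project-state-set", "promote", "reject",
}

def is_allowed(action: str, origin: str, explicitly_approved: bool) -> bool:
    """Return True if the action may proceed under the V1 interpretation."""
    if action not in MUTATING_ACTIONS:
        return True  # additive read path: always allowed
    if origin != "discord":
        return True  # direct trusted operator session
    return explicitly_approved  # Discord-originated mutation needs approval
```

Under this sketch, `is_allowed("auto-context", "discord", False)` passes while `is_allowed("refresh-project", "discord", False)` is blocked until the human approves the specific action.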
## Assumption log

The following points were not directly verified and must stay labeled as assumptions.

1. Screenpipe integration shape is unverified and deferred.
   - The `screenpipe` command was not present locally.
   - No verified Screenpipe pipeline files were found in the inspected workspaces.
   - V1 therefore excludes Screenpipe from active policy and runtime scope.

2. No direct Discord -> AtoCore auto-mutation path was verified in code.
   - The OpenClaw workspace clearly contains read and query context behavior and a Discrawl retrieval rule.
   - It does not clearly expose a verified Discord-triggered path that auto-calls `project-state-set`, `promote`, `reject`, or `register-project`.
   - The risk is therefore policy and proximity of commands, not a proven live mutation bug.

3. OpenClaw runtime use of the shared operator client was not verified because it is not implemented yet.
   - The shared client exists in the AtoCore repo.
   - The OpenClaw helper is still its own Bash implementation.

4. A dedicated evidence store was not verified as a first-class AtoCore schema layer.
   - Existing AtoCore surfaces clearly support interactions and candidate memories.
   - This V1 model therefore uses evidence artifacts, interactions, and archive bundles as an architectural lane, without claiming a new implemented table already exists.

5. Future entities remain future.
   - The entity layer is architected in AtoCore docs.
   - This audit did not verify a production entity promotion flow being used by OpenClaw.
## Bottom line

The good news is that the trust foundations already exist.

The main conclusion is that the current system is closest to a safe V1 when interpreted this way:

- keep AtoCore additive and fail-open
- treat Discord and Discrawl as evidence only
- route reviewed signal into memory candidates first
- reserve `project_state` for explicit curation only
- move OpenClaw toward the shared operator client instead of maintaining a separate write-capable helper surface
- keep Screenpipe out of V1

That gives a coherent path to Phase 2 without pretending the current implementation is already there.
224
docs/openclaw-atocore-clawd-governance-review.patch
Normal file
@@ -0,0 +1,224 @@
commit 80bd99aaea1bcab2ea5ea732df2f749e84d84318
Author: Anto01 <antoine.letarte@gmail.com>
Date: Thu Apr 23 15:59:59 2026 +0000

Tighten OpenClaw AtoCore governance policy

diff --git a/AGENTS.md b/AGENTS.md
index 1da3385..ea4d103 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -105,7 +105,7 @@ Reactions are lightweight social signals. Humans use them constantly — they sa

## Tools

-When a task is contextual and project-dependent, use the `atocore-context` skill to query Dalidou-hosted AtoCore for trusted project state, retrieval, context-building, registered project refresh, or project registration discovery when that will improve accuracy. Treat AtoCore as additive and fail-open; do not replace OpenClaw's own memory with it. Prefer `projects` and `refresh-project <id>` when a known project needs a clean source refresh, and use `project-template` when proposing a new project registration, and `propose-project ...` when you want a normalized preview before editing the registry manually.
+When a task is contextual and project-dependent, use the `atocore-context` skill to query Dalidou-hosted AtoCore for trusted project-state reads, retrieval, and context-building when that will improve accuracy. Treat AtoCore as additive and fail-open; do not replace OpenClaw's own memory with it.

### Organic AtoCore Routing

@@ -116,14 +116,60 @@ Use AtoCore first when the prompt:
- asks about architecture, constraints, status, requirements, vendors, planning, prior decisions, or current project truth
- would benefit from cross-source context instead of only the local repo

-Preferred flow:
+Preferred read path:
1. `auto-context "<prompt>" 3000` for most project knowledge questions
2. `project-state <project>` when the user is clearly asking for trusted current truth
-3. `refresh-project <id>` before answering if the user explicitly asked to refresh or ingest project changes
+3. fall back to normal OpenClaw tools and memory if AtoCore returns `no_project_match` or is unavailable

Do not force AtoCore for purely local coding actions like fixing a function, editing one file, or running tests, unless broader project context is likely to matter.

-If `auto-context` returns `no_project_match` or AtoCore is unavailable, continue normally with OpenClaw's own tools and memory.
+### AtoCore Governance
+
+Default Discord posture for AtoCore is read-only and additive.
+
+Discord-originated or Discrawl-originated context may inform:
+- evidence collection
+- retrieval
+- context building
+- candidate review preparation
+
+It must not directly perform AtoCore mutating actions.
+
+Mutating AtoCore actions include:
+- `register-project`
+- `update-project`
+- `refresh-project`
+- `ingest-sources`
+- `project-state-set`
+- `project-state-invalidate`
+- `promote`
+- `reject`
+- any future trusted-state or review mutation
+
+These actions require explicit human approval for the specific action in the current thread or session.
+Do not infer approval from:
+- prior Discord discussion
+- Discrawl archive recall
+- screener output
+- vague intent like "we should probably refresh this"
+
+Hard rules:
+- no direct Discord -> `project_state`
+- no direct Discord -> register / update / refresh / ingest / promote / reject
+- no hidden mutation inside screening or review-prep flows
+- PKM notes are not the main operator instruction surface for AtoCore behavior
+
+### Discord Archive Retrieval (discrawl)
+
+When Antoine asks in natural language about prior project discussions, decisions, thread history, answers, or whether something was already discussed in Discord, use the local `discrawl` archive automatically.
+
+Rules:
+- Antoine should not need to remember or type `discrawl` commands.
+- Treat Discord history as a normal background retrieval source, like memory or project docs.
+- Use `discrawl` silently when it will materially improve recall or confidence.
+- Prefer this for prompts like "what did we decide", "did we discuss", "summarize the thread", "what were the open questions", or anything clearly anchored in prior Discord conversation.
+- If both AtoCore and Discord history are relevant, use both and synthesize.
+- If `discrawl` is stale or unavailable, say so briefly and continue with the best available context.

Skills provide your tools. When you need one, check its `SKILL.md`. Keep local notes (camera names, SSH details, voice preferences) in `TOOLS.md`.

diff --git a/skills/atocore-context/SKILL.md b/skills/atocore-context/SKILL.md
index e42a7b7..fa23207 100644
--- a/skills/atocore-context/SKILL.md
+++ b/skills/atocore-context/SKILL.md
@@ -1,12 +1,11 @@
---
name: atocore-context
-description: Use Dalidou-hosted AtoCore as a read-only external context service for project state, retrieval, and context-building without touching OpenClaw's own memory.
+description: Use Dalidou-hosted AtoCore as an additive external context service for project-state reads, retrieval, and context-building without replacing OpenClaw's own memory.
---

# AtoCore Context

-Use this skill when you need trusted project context, retrieval help, or AtoCore
-health/status from the canonical Dalidou instance.
+Use this skill when you need trusted project context, retrieval help, or AtoCore health and status from the canonical Dalidou instance.

## Purpose

@@ -14,7 +13,7 @@ AtoCore is an additive external context service.

- It does not replace OpenClaw's own memory.
- It should be used for contextual work, not trivial prompts.
-- It is read-only in this first integration batch.
+- The default posture is read-only and fail-open.
- If AtoCore is unavailable, continue normally.

## Canonical Endpoint
@@ -31,27 +30,22 @@ Override with:
ATOCORE_BASE_URL=http://host:port
```

-## Safe Usage
+## V1 scope

-Use AtoCore for:
-- project-state checks
+Use this skill in V1 for:
+
+- project-state reads
- automatic project detection for normal project questions
-- retrieval over ingested project/ecosystem docs
+- retrieval over ingested project and ecosystem docs
- context-building for complex project prompts
- verifying current AtoCore hosting and architecture state
-- listing registered projects and refreshing a known project source set
-- inspecting the project registration template before proposing a new project entry
-- generating a proposal preview for a new project registration without writing it
-- registering an approved project entry when explicitly requested
-- updating an existing registered project when aliases or description need refinement
+- inspecting project registrations and proposal previews when operator review is needed

-Do not use AtoCore for:
-- automatic memory write-back
-- replacing OpenClaw memory
-- silent ingestion of broad new corpora without approval
-- mutating the registry automatically without human approval
+Screenpipe is out of V1 scope. Do not treat it as an active input lane or dependency for this skill.
+
+## Read path commands

-## Commands
+These are the normal additive commands:

```bash
~/clawd/skills/atocore-context/scripts/atocore.sh health
@@ -62,15 +56,56 @@ Do not use AtoCore for:
~/clawd/skills/atocore-context/scripts/atocore.sh detect-project "what's the interferometer error budget?"
~/clawd/skills/atocore-context/scripts/atocore.sh auto-context "what's the interferometer error budget?" 3000
~/clawd/skills/atocore-context/scripts/atocore.sh debug-context
-~/clawd/skills/atocore-context/scripts/atocore.sh propose-project p07-example "p07,example-project" vault incoming/projects/p07-example "Example project" "Primary staged project docs"
-~/clawd/skills/atocore-context/scripts/atocore.sh register-project p07-example "p07,example-project" vault incoming/projects/p07-example "Example project" "Primary staged project docs"
-~/clawd/skills/atocore-context/scripts/atocore.sh update-project p05 "Curated staged docs for the P05 interferometer architecture, vendors, and error-budget project."
-~/clawd/skills/atocore-context/scripts/atocore.sh refresh-project p05
~/clawd/skills/atocore-context/scripts/atocore.sh project-state atocore
~/clawd/skills/atocore-context/scripts/atocore.sh query "What is AtoDrive?"
~/clawd/skills/atocore-context/scripts/atocore.sh context-build "Need current AtoCore architecture" atocore 3000
```

+## Approved operator actions only
+
+The helper currently exposes some mutating commands, but they are not normal background behavior.
+Treat them as approved operator actions only:
+
+```bash
+~/clawd/skills/atocore-context/scripts/atocore.sh propose-project ...
+~/clawd/skills/atocore-context/scripts/atocore.sh register-project ...
+~/clawd/skills/atocore-context/scripts/atocore.sh update-project ...
+~/clawd/skills/atocore-context/scripts/atocore.sh refresh-project ...
+~/clawd/skills/atocore-context/scripts/atocore.sh ingest-sources
+```
+
+Do not use these from a Discord-originated path unless the human explicitly approves the specific action in the current thread or session.
+
+## Explicit approval rule
+
+Explicit approval means all of the following:
+
+- the human directly instructs the specific mutating action
+- the instruction is in the current thread or current session
+- the approval is for that specific action
+- the approval is not inferred from Discord evidence, Discrawl recall, screener output, or vague intent
+
+Examples of explicit approval:
+
+- "refresh p05 now"
+- "register this project"
+- "update the aliases"
+
+Non-examples:
+
+- "we should probably refresh this"
+- archived discussion suggesting a refresh
+- a screener note recommending promotion or ingestion
+
+## Do not use AtoCore for
+
+- automatic memory write-back
+- replacing OpenClaw memory
+- silent ingestion of broad new corpora without approval
+- automatic registry mutation
+- direct Discord-originated mutation of trusted or operator state
+- direct Discord-originated promote or reject actions
+
## Contract

- prefer AtoCore only when additional context is genuinely useful
@@ -79,10 +114,6 @@ Do not use AtoCore for:
- cite when information came from AtoCore rather than local OpenClaw memory
- for normal project knowledge questions, prefer `auto-context "<prompt>" 3000` before answering
- use `detect-project "<prompt>"` when you want to inspect project inference explicitly
-- use `debug-context` right after `auto-context` or `context-build` when you want
- to inspect the exact last AtoCore context pack
-- prefer `projects` plus `refresh-project <id>` over long ad hoc ingest instructions when the project is already registered
-- use `project-template` when preparing a new project registration proposal
-- use `propose-project ...` to draft a normalized entry and review collisions first
-- use `register-project ...` only after the proposal has been reviewed and approved
-- use `update-project ...` when a registered project's description or aliases need refinement before refresh
+- use `debug-context` right after `auto-context` or `context-build` when you want to inspect the exact last AtoCore context pack
+- use `project-template` and `propose-project ...` when preparing a reviewed registration proposal
+- use `register-project ...`, `update-project ...`, `refresh-project ...`, and `ingest-sources` only after explicit approval
354
docs/openclaw-atocore-nightly-screener-runbook.md
Normal file
@@ -0,0 +1,354 @@
# OpenClaw x AtoCore Nightly Screener Runbook

## Purpose

The nightly screener is the V1 bridge between broad evidence capture and narrow trusted state.

Its job is to:

- gather raw evidence from approved V1 sources
- reduce noise
- produce reviewable candidate material
- prepare operator review work
- never silently create trusted truth

## Scope

The nightly screener is a screening and preparation job.
It is not a trusted-state writer.
It is not a registry operator.
It is not a hidden reviewer.

V1 active inputs are:

- Discord and Discrawl evidence
- OpenClaw interaction evidence
- PKM, repos, and KB references
- read-only AtoCore context for comparison and deduplication

## Explicit approval rule

If the screener output points at a mutating operator action, that action still requires:

- direct human instruction
- in the current thread or current session
- for that specific action
- with no inference from evidence or screener output alone

The screener may recommend review. It may not manufacture approval.
## Inputs

The screener may consume the following inputs when available.

### 1. Discord and Discrawl evidence

Examples:

- recent archived Discord messages
- thread excerpts relevant to known projects
- conversation clusters around decisions, requirements, constraints, or repeated questions

### 2. OpenClaw interaction evidence

Examples:

- captured interactions
- recent operator conversations relevant to projects
- already-logged evidence bundles

### 3. Read-only AtoCore context inputs

Examples:

- project registry lookup for project matching
- project_state read for comparison only
- memory or entity lookups for deduplication only

These reads may help the screener rank or classify candidates, but they must not be used as a write side effect.

### 4. Optional canonical-source references

Examples:

- PKM notes
- repo docs
- KB-export summaries

These may be consulted to decide whether a signal appears to duplicate or contradict already-canonical truth.
## Outputs

The screener should produce output in four buckets.

### 1. Nightly screener report

A compact report describing:

- inputs seen
- items skipped
- candidate counts
- project match confidence distribution
- failures or unavailable sources
- items requiring human review

### 2. Evidence bundle or manifest

A structured bundle of the source snippets that justified each candidate or unresolved item.
This is the reviewer's provenance package.

### 3. Candidate manifests

Separate candidate manifests for:

- memory candidates
- entity candidates (later)
- unresolved "needs canonical-source update first" items

### 4. Operator action queue

A short list of items needing explicit human action, such as:

- review these candidates
- decide whether to refresh project X
- decide whether to curate project_state
- decide whether a Discord-originated claim should first be reflected in PKM, repo, or KB

## Required non-output

The screener must not directly produce any of the following:

- active memories without review
- active entities without review
- project_state writes
- registry mutation
- refresh operations
- ingestion operations
- promote or reject decisions
## Nightly procedure

### Step 1 - load last-run checkpoint

Read the last successful screener checkpoint so the run knows:

- what time range to inspect
- what evidence was already processed
- which items were already dropped or bundled

If no checkpoint exists, use a conservative bounded time window and mark the run as bootstrap mode.
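Step 1 can be sketched as follows. The checkpoint file path, field names, and the 24-hour bootstrap window are assumptions of this sketch, not a defined format.

```python
import json
import os
import time

def load_checkpoint(path: str, bootstrap_window_s: int = 24 * 3600) -> dict:
    """Return the last checkpoint, or a conservative bounded window
    flagged as bootstrap mode when no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            cp = json.load(f)
        cp["bootstrap"] = False
        return cp
    # Bootstrap mode: bounded look-back, nothing marked as processed.
    now = int(time.time())
    return {
        "since": now - bootstrap_window_s,
        "processed_ids": [],
        "bootstrap": True,
    }
```

A missing checkpoint never aborts the run; it just narrows the window and flags the run as bootstrap.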
### Step 2 - gather evidence

Collect available evidence from each configured source.

Per-source rule:

- source unavailable -> note it, continue
- source empty -> note it, continue
- source noisy -> keep raw capture bounded and deduplicated
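The per-source rule above is essentially fail-open collection. A minimal sketch, assuming each source is a callable returning a list (the source interface is hypothetical):

```python
def gather_evidence(sources: dict) -> tuple[list, list]:
    """Collect from each source; unavailable or empty sources are
    noted in the run report, never fatal."""
    items, notes = [], []
    for name, fetch in sources.items():
        try:
            batch = fetch()
        except Exception as exc:
            notes.append(f"{name}: unavailable ({exc})")  # note it, continue
            continue
        if not batch:
            notes.append(f"{name}: empty")  # note it, continue
            continue
        items.extend(batch)
    return items, notes
```

A failing source produces a note in the report rather than aborting the run, matching the fail-open posture.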
### Step 3 - normalize and deduplicate

For each collected item:

- normalize timestamps, source ids, and project hints
- remove exact duplicates
- group repeated or near-identical evidence when practical
- keep provenance pointers intact

The goal is to avoid flooding review with repeated copies of the same conversation.
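The exact-duplicate pass with provenance preserved can be sketched like this; the `text`/`provenance` item shape is an assumption of the sketch:

```python
def deduplicate(items: list[dict]) -> list[dict]:
    """Drop exact duplicates by whitespace-normalized text, merging
    provenance pointers so no source reference is lost."""
    by_text: dict[str, dict] = {}
    for item in items:
        key = " ".join(item["text"].split()).lower()
        if key in by_text:
            # Duplicate: keep the item once, accumulate its provenance.
            by_text[key]["provenance"].extend(item["provenance"])
        else:
            by_text[key] = {"text": item["text"],
                            "provenance": list(item["provenance"])}
    return list(by_text.values())
```

Merging provenance instead of discarding it is what keeps the reviewer's provenance package complete.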
### Step 4 - attempt project association

For each evidence item, try to associate it with:

- a registered project id, or
- `unassigned` if confidence is low

Rules:

- high confidence match -> attach project id
- low confidence match -> mark as uncertain
- no good match -> leave unassigned

Do not force a project assignment just to make the output tidier.
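The three association outcomes map to two thresholds on a match score. The thresholds below are illustrative placeholders, not tuned values:

```python
def associate(score: float, project_id: str,
              high: float = 0.8, low: float = 0.5) -> dict:
    """Map a project-match score to the three ruled outcomes."""
    if score >= high:
        return {"project": project_id, "status": "matched"}
    if score >= low:
        return {"project": project_id, "status": "uncertain"}
    # No good match: never force an assignment.
    return {"project": "unassigned", "status": "unassigned"}
```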
### Step 5 - classify signal type

Classify each normalized item into one of these buckets:

- noise / ignore
- evidence only
- memory candidate
- entity candidate
- needs canonical-source update first
- needs explicit operator decision

If the classification is uncertain, choose the lower-trust bucket.

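The "choose the lower-trust bucket" rule can be encoded as an ordering. The specific order below is this sketch's assumption about relative trust, not something the runbook fixes:

```python
# Buckets ordered lowest-trust first; ambiguity resolves toward index 0.
BUCKETS = ["noise", "evidence_only", "memory_candidate",
           "entity_candidate", "needs_canonical_update",
           "needs_operator_decision"]

def classify(plausible: set[str]) -> str:
    """Given the set of plausible buckets, return the lowest-trust one."""
    for bucket in BUCKETS:
        if bucket in plausible:
            return bucket
    return "noise"  # nothing plausible at all: treat as noise
```

So an item that looks like either a memory candidate or plain evidence lands in the evidence lane.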
### Step 6 - compare against higher-trust layers

For non-noise items, compare against the current higher-trust landscape.

Check for:

- already-active equivalent memory
- already-active equivalent entity (later)
- existing project_state answer
- obvious duplication of canonical source truth
- obvious contradiction with canonical source truth

This comparison is read-only.
It is used only to rank and annotate output.

### Step 7 - build candidate bundles

For each candidate:

- include the candidate text or shape
- include provenance snippets
- include source type
- include project association confidence
- include reason for candidate classification
- include conflict or duplicate notes if found

### Step 8 - build unresolved operator queue

Some items should not become candidates yet.
Examples:

- "This looks like current truth but should first be updated in PKM, repo, or KB."
- "This Discord-originated request asks for refresh or ingest."
- "This might be a decision, but confidence is too low."

These belong in a small operator queue, not in trusted state.
### Step 9 - persist report artifacts only

Persist only:

- screener report
- evidence manifests
- candidate manifests
- checkpoint metadata

If candidate persistence into AtoCore is enabled later, it still remains a candidate-only path and must not skip review.

### Step 10 - exit fail-open

If the screener could not reach AtoCore or some source system:

- write the failure or skip into the report
- keep the checkpoint conservative
- do not fake success
- do not silently mutate anything elsewhere
## Failure modes

### Failure mode 1 - AtoCore unavailable

Behavior:

- continue in fail-open mode if possible
- write a report that the run was evidence-only or degraded
- do not attempt write-side recovery actions

### Failure mode 2 - Discrawl unavailable or stale

Behavior:

- note Discord archive input unavailable or stale
- continue with other sources
- do not invent Discord evidence summaries

### Failure mode 3 - candidate explosion

Behavior:

- rank candidates
- keep only a bounded top set for review
- put the remainder into a dropped or deferred manifest
- do not overwhelm the reviewer queue
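The bounded-top-set behavior for candidate explosion is a simple rank-and-cap. The cap value and the `score` field are assumptions of this sketch:

```python
def bound_candidates(candidates: list[dict], cap: int = 25):
    """Keep only the top-ranked set for review; the remainder goes
    into a deferred manifest instead of the reviewer queue."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return ranked[:cap], ranked[cap:]  # (review set, deferred manifest)
```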
### Failure mode 4 - low-confidence project mapping

Behavior:

- leave items unassigned or uncertain
- do not force them into a project-specific truth lane

### Failure mode 5 - contradiction with trusted truth

Behavior:

- flag the contradiction in the report
- keep the evidence or candidate for review if useful
- do not overwrite project_state

### Failure mode 6 - direct operator-action request found in evidence

Examples:

- "register this project"
- "refresh this source"
- "promote this memory"

Behavior:

- place the item into the operator action queue
- require explicit human approval
- do not perform the mutation as part of the screener
## Review handoff format

Each screener run should hand off a compact review package containing:

1. a run summary
2. candidate counts by type and project
3. top candidates with provenance
4. unresolved items needing explicit operator choice
5. unavailable-source notes
6. checkpoint status

The handoff should be short enough for a human to review without reading the entire raw archive.
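The six elements above can be assembled into one compact structure. The field names and candidate shape here are illustrative, not a fixed schema:

```python
def build_handoff(run_summary: str, candidates: list[dict],
                  unresolved: list[str], unavailable: list[str],
                  checkpoint_ok: bool, top_n: int = 10) -> dict:
    """Assemble the six handoff elements into one review package."""
    counts: dict[str, int] = {}
    for c in candidates:
        key = f'{c["type"]}/{c["project"]}'  # counts by type and project
        counts[key] = counts.get(key, 0) + 1
    return {
        "summary": run_summary,
        "candidate_counts": counts,
        "top_candidates": candidates[:top_n],  # assumed pre-ranked
        "unresolved": unresolved,
        "unavailable_sources": unavailable,
        "checkpoint_ok": checkpoint_ok,
    }
```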
## Safety rules

The screener must obey these rules every night.

1. No direct project_state writes.
2. No direct registry mutation.
3. No direct refresh or ingest.
4. No direct promote or reject.
5. No treating Discord or Discrawl as trusted truth.
6. No hiding source uncertainty.
7. No inventing missing integrations.
8. No bringing deferred sources into V1 through policy drift or hidden dependency.

## Minimum useful run

A useful screener run can still succeed even if it only does this:

- gathers available Discord and OpenClaw evidence
- filters obvious noise
- produces a small candidate manifest
- notes unavailable archive inputs if any
- leaves trusted state untouched

That is still a correct V1 run.

## Deferred from V1

Screenpipe is deferred from V1. It is not an active input, not a required dependency, and not part of the runtime behavior of this V1 screener.

## Bottom line

The nightly screener is not the brain of the system.
It is the filter.

Its purpose is to make human review easier while preserving the trust hierarchy:

- broad capture in
- narrow reviewed truth out
- no hidden mutations in the middle
360
docs/openclaw-atocore-promotion-pipeline.md
Normal file
@@ -0,0 +1,360 @@
# OpenClaw x AtoCore V1 Promotion Pipeline

## Purpose

This document defines the V1 promotion pipeline for signals coming from Discord, Discrawl, OpenClaw, PKM, and repos.

The rule is simple:

- raw capture is evidence
- screening turns evidence into candidate material
- review promotes candidates into canonical homes
- trusted state is curated explicitly, not inferred automatically

## V1 scope

V1 active inputs are:

- Discord live conversation
- Discrawl archive retrieval
- OpenClaw interaction logs and evidence bundles
- PKM notes
- repos, KB exports, and repo docs

Read-only AtoCore context may be consulted for comparison and deduplication.

## Explicit approval rule

When this pipeline refers to approval or review for a mutating action, it means:

- the human directly instructs the specific action
- the instruction is in the current thread or current session
- the approval is for that specific action
- the approval is not inferred from evidence, archives, or screener output

## Pipeline summary

```text
raw capture
-> evidence bundle
-> nightly screening
-> candidate queue
-> human review
-> canonical home
-> optional trusted-state curation
```
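The staged flow above can be sketched as one function that threads signals through the lanes; `screen` and `review` stand in for the nightly screener and the human reviewer, and the whole shape is illustrative only:

```python
def run_pipeline(raw: list[str], screen, review) -> dict:
    """Thread raw signals through the stages. Nothing reaches the
    canonical lane without passing the human review gate."""
    evidence = list(raw)                              # Stage 1: all raw capture lands as evidence
    candidates = [s for s in evidence if screen(s)]   # Stage 2: screening selects candidates
    promoted = [c for c in candidates if review(c)]   # Human review is the only promotion gate
    return {"evidence": evidence, "candidates": candidates,
            "canonical": promoted}
```

Note that even a `review` callable that approves everything still models an explicit gate; removing the gate entirely would violate the pipeline's rule.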
## Stage 0 - raw capture

### Inputs

Raw capture may come from:

- Discord live conversation
- Discrawl archive retrieval
- OpenClaw interaction logs
- PKM notes
- repos / KB exports / repo docs

### Rule at this stage

Nothing captured here is trusted truth yet.
Everything is either:

- raw evidence, or
- a pointer to an already-canonical source

## Stage 1 - evidence bundle

The first durable V1 destination for raw signals is the evidence lane.

Examples of evidence bundle forms:

- AtoCore interaction records
- Discrawl retrieval result sets
- nightly screener input bundles
- local archived artifacts or manifests
- optional source snapshots used only for review preparation

### What evidence is for

Evidence exists so the operator can later answer:

- what did we actually see?
- where did this claim come from?
- what context supported the candidate?
- what should the reviewer inspect before promoting anything?

### What evidence is not for

Evidence is not:

- active memory
- active entity
- trusted project_state
- registry truth
## Stage 2 - screening

The nightly screener or an explicit review flow reads evidence and classifies it.

### Screening outputs

Each observed signal should be classified into one of these lanes:

1. Ignore / noise
   - chatter
   - duplicate archive material
   - ambiguous fragments
   - low-signal scraps

2. Keep as evidence only
   - useful context, but too ambiguous or too raw to promote

3. Memory candidate
   - stable enough to review as episodic, personal, or loose project signal

4. Entity candidate
   - structured enough to review as a future decision, requirement, constraint, or validation fact

5. Needs canonical-source update first
   - appears to assert current trusted truth but should first be reflected in the real canonical home, such as PKM, repo, or KB tool

### Key screening rule

If the screener cannot confidently tell whether a signal is:

- raw evidence,
- a loose durable memory,
- or a structured project truth,

then it must pick the lower-trust lane.

In V1, uncertainty resolves downward.
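The downward-resolution rule can be sketched as a fallback over trust-ordered lanes. This is an illustrative sketch; the lane names, confidence score, and threshold are assumptions, not the real screener implementation:

```python
# Lanes ordered from lowest to highest trust; names are hypothetical.
LANES_BY_TRUST = ["evidence_only", "memory_candidate", "entity_candidate"]

def choose_lane(proposed_lane: str, confidence: float, threshold: float = 0.8) -> str:
    """Keep the proposed lane only when the screener is confident;
    otherwise resolve downward to the lowest-trust lane (evidence only)."""
    if confidence >= threshold:
        return proposed_lane
    return LANES_BY_TRUST[0]
```

An unconfident entity-candidate classification therefore lands in the evidence lane rather than the structured lane.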
## Stage 3 - candidate queue

Only screened outputs may enter the candidate queue.

### Memory-candidate lane

Use this lane for reviewed-signal candidates such as:

- preferences
- episodic facts
- identity facts
- loose stable project signal that is useful to remember but not yet a formal structured entity

Examples:

- "Antoine prefers operator summaries without extra ceremony."
- "The team discussed moving OpenClaw toward a shared operator client."
- "Discord history is useful as evidence but not as direct truth."

### Entity-candidate lane

Use this lane for future structured facts such as:

- decisions
- requirements
- constraints
- validation claims

Examples:

- "Decision: use the shared operator client instead of duplicated frontend logic."
- "Constraint: Discord-originated paths must not directly mutate project_state."

### What cannot enter directly from raw capture

The following must not be created directly from raw Discord or Discrawl evidence without a screening step:

- active memories
- active entities
- project_state entries
- registry mutations
- promote or reject decisions
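The Stage 3 entry gate above can be sketched as a small predicate: only screened candidate-kind items get in, and trusted-layer writes are refused outright. All names here are hypothetical illustrations, not the real queue API:

```python
# Kinds that may never be created directly from raw capture.
BLOCKED_FROM_RAW = {
    "active_memory", "active_entity", "project_state_entry",
    "registry_mutation", "promote", "reject",
}

# The only kinds the candidate queue accepts.
CANDIDATE_KINDS = {"memory_candidate", "entity_candidate"}

def enqueue_candidate(queue: list, item: dict) -> bool:
    """Append to the candidate queue only if the item was screened and is
    a candidate kind; anything else is refused."""
    if not item.get("screened", False):
        return False
    if item.get("kind") in BLOCKED_FROM_RAW:
        return False
    if item.get("kind") not in CANDIDATE_KINDS:
        return False
    queue.append(item)
    return True
```

Unscreened items and trusted-layer kinds both fail the gate; only screened candidates accumulate.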
## Stage 4 - human review

This is the load-bearing stage.

A human reviewer, mediated by OpenClaw and eventually using the shared operator client, decides whether the candidate:

- should be promoted
- should be rejected
- should stay pending
- should first be rewritten into the actual canonical source
- should become project_state only after stronger curation

### Review questions

For every candidate, the reviewer should ask:

1. Is this actually stable enough to preserve?
2. Is this fact ambiguous, historical, or current?
3. What is the one canonical home for this fact type?
4. Is memory the right home, or should this be an entity later?
5. Is project_state justified, or is this still only evidence or candidate material?
6. Does the source prove current truth, or only past conversation?
## Stage 5 - canonical promotion

After review, the signal can move into exactly one canonical home.

## Promotion rules by fact shape

### A. Personal, episodic, or loose project signal

Promotion destination:

- AtoCore memory

Use when the fact is durable and useful, but not a formal structured engineering record.

### B. Structured engineering fact

Promotion destination:

- future AtoCore entity

Use when the fact is really a:

- decision
- requirement
- constraint
- validation claim

### C. Current trusted project answer

Promotion destination:

- AtoCore project_state

But only after explicit curation.

A candidate does not become project_state just because it looks important.
The reviewer must decide that it now represents the trusted current answer.

### D. Human or tool source truth

Promotion destination:

- PKM / repo / KB tool of origin

If a Discord-originated signal claims current truth but the canonical home is not AtoCore memory or entity, the right move may be:

1. update the canonical source first
2. then optionally refresh or ingest, with explicit approval if the action is mutating
3. then optionally curate a project_state answer

This prevents Discord from becoming the hidden source of truth.
## Stage 6 - optional trusted-state curation

`project_state` is not the general destination for important facts.
It is the curated destination for current trusted project answers.

Examples that may justify explicit project_state curation:

- current selected architecture
- current next milestone
- current status summary
- current trusted decision outcome

Examples that usually do not justify immediate project_state curation:

- a raw Discord debate
- a speculative suggestion
- a historical conversation retrieved through Discrawl
## Discord-originated pipeline examples

### Example 1 - raw discussion about operator-client refactor

1. Discord message enters the evidence lane.
2. Nightly screener marks it as evidence-only or as a decision candidate.
3. Human review checks whether it is an actual decision or just discussion.
4. If stable and approved, it becomes a memory or future entity.
5. It reaches project_state only if explicitly curated as the trusted current answer.

### Example 2 - Discord thread says "refresh this project now"

1. Discord message is evidence of operator intent.
2. It does not auto-trigger a refresh.
3. OpenClaw asks for or recognizes explicit human approval.
4. The approved operator action invokes the shared operator client.
5. The refresh result may later influence candidates or trusted state, but the raw Discord message never performed the mutation by itself.

### Example 3 - archived thread says a requirement might be current

1. Discrawl retrieval enters the evidence lane.
2. Screener marks it as evidence-only or as a requirement candidate.
3. Human review checks alignment with the canonical source.
4. If accepted later, it becomes an entity candidate or active entity.
5. project_state remains a separate explicit curation step.
## Promotion invariants

The pipeline must preserve these invariants.

### Invariant 1 - raw evidence is not trusted truth

No raw Discord or Discrawl signal can directly become trusted project_state.

### Invariant 2 - unreviewed signals can at most become candidates

Automatic processing stops at evidence or candidate creation.

### Invariant 3 - each fact has one canonical home

A fact may be supported by many evidence items, but after review it belongs in one canonical place.

### Invariant 4 - operator mutations require explicit approval

Registry mutation, refresh, ingest, promote, reject, and project_state writes are operator actions that require explicit human approval.

### Invariant 5 - OpenClaw orchestrates; it does not become storage

OpenClaw should coordinate the pipeline, not silently become the canonical data layer.
## Decision table

| Observed signal type | Default pipeline outcome | Canonical destination if accepted |
|---|---|---|
| Ambiguous or raw conversation | Evidence only | none |
| Historical archive context | Evidence only or candidate | memory or entity only after review |
| Personal preference | Memory candidate | AtoCore memory |
| Episodic fact | Memory candidate | AtoCore memory |
| Loose stable project signal | Memory candidate | AtoCore memory |
| Structured decision / requirement / constraint | Entity candidate | future AtoCore entity |
| Claimed current trusted answer | Needs explicit curation | project_state, but only after review |
| Tool-origin engineering fact | Canonical source update first | repo / KB / PKM tool of origin |
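The decision table can be restated as a lookup so the defaults are machine-checkable. The keys and labels below mirror the table rows; the dict itself is an illustrative sketch, not a real AtoCore structure:

```python
# (default outcome, canonical destination if accepted); None = no destination.
DEFAULT_OUTCOME = {
    "ambiguous_conversation": ("evidence_only", None),
    "historical_archive":     ("evidence_or_candidate", "memory_or_entity_after_review"),
    "personal_preference":    ("memory_candidate", "atocore_memory"),
    "episodic_fact":          ("memory_candidate", "atocore_memory"),
    "loose_project_signal":   ("memory_candidate", "atocore_memory"),
    "structured_fact":        ("entity_candidate", "future_atocore_entity"),
    "claimed_current_answer": ("needs_explicit_curation", "project_state_after_review"),
    "tool_origin_fact":       ("canonical_source_update_first", "tool_of_origin"),
}

def default_outcome(signal_type: str):
    # Unknown signal types resolve downward to evidence only.
    return DEFAULT_OUTCOME.get(signal_type, ("evidence_only", None))
```

Note that the unknown-type fallback reuses the screening rule: uncertainty resolves downward.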
## What the pipeline deliberately prevents

This V1 pipeline deliberately prevents these bad paths:

- Discord -> project_state directly
- Discrawl archive -> project_state directly
- Discord -> registry mutation directly
- Discord -> refresh or ingest directly without explicit approval
- raw chat -> promote or reject directly
- OpenClaw turning evidence into truth without a review gate
## Deferred from V1

Screenpipe is deferred from V1. It is not an active input lane in this pipeline and it is not a runtime dependency of it. If it is revisited later, it should be handled in a separate future design, not treated as an implicit part of this pipeline.
## Bottom line

The promotion pipeline is intentionally conservative.

Its job is not to maximize writes.
Its job is to preserve trust while still letting Discord, Discrawl, OpenClaw, PKM, and repos contribute useful signal.

That means the safe default path is:

- capture broadly
- trust narrowly
- promote deliberately
docs/openclaw-atocore-shared-client-consolidation-preview.md (new file, 96 lines)
# OpenClaw x AtoCore Shared-Client Consolidation Preview

## Status

Proposal only. Not applied.

## Why this exists

The current OpenClaw helper script duplicates AtoCore-calling logic that already exists in the shared operator client:

- request handling
- fail-open behavior
- project detection
- project lifecycle command surface

The preferred direction is to consolidate OpenClaw toward the shared operator client pattern documented in `docs/architecture/llm-client-integration.md`.

## Goal

Keep the OpenClaw skill and operator policy in OpenClaw, but stop maintaining a separate Bash implementation of the AtoCore client surface when the shared client already exists in `/home/papa/ATOCore/scripts/atocore_client.py`.

## Non-goals for this preview

- no implementation in this phase
- no runtime change in this phase
- no new helper command in this phase
- no change to approval policy in this preview
## Preview diff

This is a conceptual diff preview only.
It is not applied.

```diff
--- a/skills/atocore-context/scripts/atocore.sh
+++ b/skills/atocore-context/scripts/atocore.sh
@@
-#!/usr/bin/env bash
-set -euo pipefail
-
-BASE_URL="${ATOCORE_BASE_URL:-http://dalidou:8100}"
-TIMEOUT="${ATOCORE_TIMEOUT_SECONDS:-30}"
-REFRESH_TIMEOUT="${ATOCORE_REFRESH_TIMEOUT_SECONDS:-1800}"
-FAIL_OPEN="${ATOCORE_FAIL_OPEN:-true}"
-
-request() {
-  # local curl-based request logic
-}
-
-detect_project() {
-  # local project detection logic
-}
-
-case "$cmd" in
-  health) request GET /health ;;
-  projects) request GET /projects ;;
-  auto-context) ... ;;
-  register-project) ... ;;
-  refresh-project) ... ;;
-  ingest-sources) ... ;;
-esac
+#!/usr/bin/env bash
+set -euo pipefail
+
+CLIENT="${ATOCORE_SHARED_CLIENT:-/home/papa/ATOCore/scripts/atocore_client.py}"
+
+if [[ ! -f "$CLIENT" ]]; then
+  echo "Shared AtoCore client not found: $CLIENT" >&2
+  exit 1
+fi
+
+exec python3 "$CLIENT" "$@"
```
## Recommended implementation shape later

If and when this is implemented, the safer shape is:

1. keep policy and approval guidance in OpenClaw instructions and skill text
2. delegate actual AtoCore client behavior to the shared operator client
3. avoid adding any new helper command unless explicitly approved
4. keep read-path and approved-operator-path distinctions in the OpenClaw guidance layer

## Risk notes

Potential follow-up concerns to handle before applying:

- path dependency on `/home/papa/ATOCore/scripts/atocore_client.py`
- what should happen if the AtoCore repo is unavailable from the OpenClaw machine
- whether a thin compatibility wrapper is needed for help text or argument normalization
- ensuring OpenClaw policy still blocks unapproved Discord-originated mutations even if the shared client exposes them

## Bottom line

The duplication is real and consolidation is still the right direction.
But in this phase it remains a proposal only.
docs/openclaw-atocore-v1-architecture.md (new file, 362 lines)
# OpenClaw x AtoCore V1 Architecture

## Purpose

This document defines the safe V1 operating model for how Discord, Discrawl, OpenClaw, PKM, repos, and AtoCore work together.

The goal is to let these systems contribute useful signal into AtoCore without turning AtoCore into a raw dump and without blurring trust boundaries.

## V1 scope

V1 active inputs are:

- Discord and Discrawl evidence
- OpenClaw interaction evidence
- PKM, repos, and KB sources
- read-only AtoCore context for comparison and deduplication
## Core stance

The V1 stance is simple:

- Discord and Discrawl are evidence streams.
- OpenClaw is the operator and orchestrator.
- PKM, repos, and KB tools remain the canonical human and tool truth.
- AtoCore memories hold reviewed episodic, personal, and loose project signal.
- AtoCore project_state holds the current trusted project answer, manually curated or tightly gated only.
- Future AtoCore entities hold reviewed structured decisions, requirements, constraints, and related facts.

## Architectural principles

1. AtoCore remains additive and fail-open from the OpenClaw side.
2. Every fact type has exactly one canonical home.
3. Raw evidence is not trusted truth.
4. Unreviewed signals become evidence or candidates, not active truth.
5. Discord-originated paths never directly mutate project_state, registry state, refresh state, ingestion state, or review decisions without explicit human approval.
6. OpenClaw is not canonical storage. It retrieves, compares, summarizes, requests approval, and performs approved operator actions.
7. The shared operator client is the canonical mutating operator surface. Frontends should reuse it instead of reimplementing AtoCore-calling logic.
## Explicit approval rule

In this V1 policy, explicit approval means all of the following:

- the human directly instructs the specific mutating action
- the instruction appears in the current thread or current session
- the approval is for that specific action, not vague intent
- the approval is not inferred from Discord evidence, Discrawl recall, screener output, or general discussion

Examples of explicit approval:

- "refresh p05 now"
- "register this project"
- "promote that candidate"
- "write this to project_state"

Examples that are not explicit approval:

- "we should probably refresh this sometime"
- "I think this is the current answer"
- archived discussion saying a mutation might be useful
- a screener report recommending a mutation
## System roles

### Discord

Discord is a live conversational source.
It contains fresh context, discussion, uncertainty, and project language grounded in real work.
It is not authoritative by itself.

Discord-originated material should be treated as:

- raw evidence
- candidate material after screening
- possible justification for a later human-reviewed promotion into a canonical home

Discord should never be treated as direct trusted project truth just because someone said it in chat.

### Discrawl

Discrawl is a retrieval and archive layer over Discord history.
It turns prior conversation into searchable evidence.
That is useful for recall, context building, and finding prior decisions or open questions.

Discrawl is still evidence, not authority.
A retrieved Discord thread may show what people thought or said. It does not by itself become trusted project_state.

### OpenClaw

OpenClaw is the orchestrator and operator.
It is where the human interacts, where approvals happen, and where cross-source reasoning happens.

OpenClaw's job is to:

- retrieve
- compare
- summarize
- ask for approval when mutation is requested
- call the shared operator client for approved writes
- fail open when AtoCore is unavailable

OpenClaw is not the canonical place where project facts live long-term.

### PKM

PKM is a canonical human-authored prose source.
It is where notes, thinking, and ongoing project writing live.

PKM is the canonical home for:

- project prose notes
- working notes
- long-form summaries
- journal-style project history

PKM is not the place where OpenClaw should be taught how to operate AtoCore. Operator instructions belong in repo docs and in OpenClaw instructions and skills.

### Repos and KB tools

Repos and KB tools are canonical human and tool truth for code and structured engineering artifacts.

They are the canonical home for:

- source code
- repo design docs
- structured tool outputs
- KB-CAD and KB-FEM facts where those systems are the tool of origin
### AtoCore memories

AtoCore memories are for reviewed, durable machine-usable signal that is still loose enough to belong in memory rather than in a stricter structured layer.

Examples:

- episodic facts
- preferences
- identity facts
- reviewed loose project facts

AtoCore memories are not a place to dump raw Discord capture.

### AtoCore project_state

AtoCore project_state is the trusted current answer layer.
It is the place for questions like:

- what is the current selected architecture?
- what is the current next focus?
- what is the trusted status answer right now?

Because this layer answers current-truth questions, it must remain manually curated or tightly gated.

### Future AtoCore entities

Future entities are the canonical home for structured engineering facts that deserve stronger representation than freeform memory.

Examples:

- decisions
- requirements
- constraints
- validation claims
- structured relationships later

These should be promoted from evidence or candidates only after review.
## Logical flow

```text
Discord live chat --.
Discrawl archive ----+--> evidence bundle / interactions / screener input
OpenClaw evidence ---'
                     |
                     v
              nightly screener
                     |
            .--------+--------.
            v                 v
    memory candidates   entity candidates (later)
            |                 |
            '--------+--------'
                     v
          human review in OpenClaw
                     |
   .-----------------+-----------------.
   v                 v                 v
active memory   active entity   explicit curation
                                       |
                                       v
                                 project_state
```

The load-bearing rule is that review happens before trust.
## Canonical-home table

Every named fact type below has exactly one canonical home.

| Fact type | Canonical home | Why |
|---|---|---|
| Raw Discord message | Discord / Discrawl archive | It is conversational evidence, not normalized truth |
| Archived Discord thread history | Discrawl archive | It is the retrieval form of Discord evidence |
| OpenClaw operator instructions | OpenClaw repo docs / skills / instructions | Operating behavior should live in code-adjacent instructions, not PKM |
| Project prose notes | PKM | Human-authored project prose belongs in PKM |
| Source code | Repo | Code truth lives in version control |
| Repo design or architecture doc | Repo | The documentation belongs with the code or system it describes |
| Structured KB-CAD / KB-FEM fact | KB tool of origin | Tool-managed structured engineering facts belong in their tool of origin |
| Personal identity fact | AtoCore memory (`identity`) | AtoCore memory is the durable machine-usable home |
| Preference fact | AtoCore memory (`preference`) | Same reason |
| Episodic fact | AtoCore memory (`episodic`) | It is durable recall, not project_state |
| Loose reviewed project signal | AtoCore memory (`project`) | Good fit for reviewed but not fully structured project signal |
| Engineering decision | Future AtoCore entity (`Decision`) | Decisions need structured lifecycle and supersession |
| Requirement | Future AtoCore entity (`Requirement`) | Requirements need structured management |
| Constraint | Future AtoCore entity (`Constraint`) | Constraints need structured management |
| Current trusted project answer | AtoCore `project_state` | This layer is explicitly for current trusted truth |
| Project registration metadata | AtoCore project registry | Registry state is its own canonical operator layer |
| Review action (promote / reject / invalidate) | AtoCore audit trail / operator action log | Review decisions are operator events, not source facts |
## What this means for Discord-originated facts

A Discord-originated signal can end up in more than one place, but not directly.

### If the signal is conversational, ambiguous, or historical

It stays in the evidence lane:

- Discord
- Discrawl archive
- optional screener artifact
- optional candidate queue

It does not become trusted project_state.

### If the signal is a stable personal or episodic fact

It may be promoted to AtoCore memory after review.

Examples:

- "Antoine prefers concise operator summaries."
- "We decided in discussion to keep AtoCore additive."

These belong in reviewed memory, not in project_state.

### If the signal expresses a structured engineering fact

It may become an entity candidate and later an active entity.

Examples:

- a requirement
- a decision
- a constraint

Again, not directly from raw chat. The chat is evidence for the candidate.

### If the signal is the current trusted answer

It still should not jump directly from Discord into project_state.
Instead, a human should explicitly curate it into project_state after checking it against the right canonical home.

That canonical home may be:

- PKM for prose and project notes
- repo for code and design docs
- KB tools for structured engineering facts
- active entity if the engineering layer is the canonical home
## Approval boundaries

### Reads

The following may be invoked automatically when useful:

- `health`
- `projects`
- `detect-project`
- `auto-context`
- `query`
- `project-state` read
- Discrawl retrieval

These are additive and fail-open.

### Mutations requiring explicit human approval

The following are operator actions, not conversational automation:

- `register-project`
- `update-project`
- `refresh-project`
- `ingest-sources`
- `project-state-set`
- `project-state-invalidate`
- `capture` when used as a durable write outside conservative logging policy
- `extract` with persistence
- `promote`
- `reject`
- future entity promotion or rejection

For Discord-originated paths, approval must satisfy the explicit approval rule above.
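The read/mutation boundary above can be sketched as a small command gate: reads run automatically and fail open, mutations run only with explicit approval, and anything unrecognized is refused. The command names come from the lists above; the gate function itself is a hypothetical sketch, not the real OpenClaw dispatch:

```python
# Commands from the approval-boundary lists above.
READ_COMMANDS = {"health", "projects", "detect-project", "auto-context",
                 "query", "project-state"}
MUTATING_COMMANDS = {"register-project", "update-project", "refresh-project",
                     "ingest-sources", "project-state-set",
                     "project-state-invalidate", "capture", "extract",
                     "promote", "reject"}

def gate(command: str, explicitly_approved: bool = False) -> str:
    """Decide what happens to a requested command under the V1 boundary."""
    if command in READ_COMMANDS:
        return "run"                     # additive, fail-open
    if command in MUTATING_COMMANDS:
        return "run" if explicitly_approved else "ask-for-approval"
    return "reject"                      # unknown surface: refuse by default
```

A `refresh-project` request without approval is held for approval rather than executed; the same request with explicit approval runs.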
## Shared operator client rule

The preferred V1 architecture is:

- AtoCore HTTP API as system interface
- shared operator client as reusable mutating surface
- OpenClaw as a thin frontend and operator around that client

That avoids duplicating:

- project detection logic
- request logic
- failure handling
- mutation surface behavior
- approval wrappers

OpenClaw should keep its own high-level operating instructions, but it should not keep growing a parallel AtoCore mutation implementation.
## V1 boundary summary

### Allowed automatic behavior

- read-only retrieval
- context build
- Discrawl recall
- evidence collection
- nightly screening into reviewable output
- fail-open fallback when AtoCore is unavailable

### Allowed only after explicit human review or approval

- candidate persistence from evidence
- candidate promotion or rejection
- project refresh or ingestion
- registry mutation
- trusted project_state writes

### Not allowed as automatic behavior

- direct Discord -> project_state writes
- direct Discord -> register / update / refresh / ingest / promote / reject
- hidden mutation inside the screener
- treating PKM as the main operator-instruction layer for AtoCore behavior
## Deferred from V1

Screenpipe is deferred.
It is not an active input lane in V1 and it must not become a runtime, skill, or policy dependency in V1.
If it is revisited later, it must be treated as a separate future design decision, not as an implicit V1 extension.
## Bottom line

The safe V1 architecture is not "everything can write into AtoCore."
It is a layered system where:

- evidence comes in broadly
- trust rises slowly
- canonical homes stay singular
- OpenClaw remains the operator
- AtoCore remains the additive machine-memory and trusted-state layer
- the shared operator client becomes the one reusable write-capable surface
docs/openclaw-atocore-v1-proof-runbook.md (new file, 207 lines)
# OpenClaw x AtoCore V1 Proof Runbook

## Purpose

This is the concise proof and operator runbook for the final V1 policy.
It shows, in concrete paths, that:

- a Discord-originated signal cannot reach `project_state` without candidate or review gating
- Discord cannot directly execute `register-project`, `update-project`, `refresh-project`, `ingest-sources`, `promote`, or `reject` without explicit approval

## Explicit approval definition

For V1, explicit approval means:

- the human directly instructs the specific mutating action
- the instruction is in the current thread or current session
- the approval is for that exact action
- the approval is not inferred from evidence, archives, or screener output

Examples:

- "refresh p05 now"
- "register this project"
- "promote that candidate"
- "write this to project_state"

Non-examples:

- "this looks like the current answer"
- "we should probably refresh this"
- an old Discord thread saying a refresh might help
- a screener report recommending a mutation
## Proof 1 - Discord cannot directly reach project_state

Blocked path:

```text
Discord message
-> evidence
-> optional candidate
-> review
-> optional explicit curation
-> project_state
```

What is blocked:

- Discord -> project_state directly
- Discrawl archive -> project_state directly
- screener output -> project_state directly

What is allowed:

1. Discord message enters the evidence lane.
2. It may become a memory or entity candidate after screening.
3. A human reviews the candidate.
4. If the fact is truly the current trusted answer, the human may explicitly curate it into `project_state`.

Conclusion:

`project_state` is reachable only after review and explicit curation. There is no direct Discord-originated write path.
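The reachability claim can be checked mechanically over the allowed edges. This sketch collapses the optional steps into a single linear chain for simplicity; the node names restate the path above and the code is illustrative, not part of the real pipeline:

```python
# Allowed transitions of the promotion pipeline, as adjacency sets.
# There is deliberately no edge discord_message -> project_state.
ALLOWED_EDGES = {
    "discord_message": {"evidence"},
    "evidence": {"candidate"},
    "candidate": {"review"},
    "review": {"explicit_curation"},
    "explicit_curation": {"project_state"},
}

def paths_to(target: str, start: str = "discord_message"):
    """Enumerate all paths from start to target over the allowed edges."""
    found = []
    def walk(node, path):
        if node == target:
            found.append(path)
            return
        for nxt in ALLOWED_EDGES.get(node, set()):
            walk(nxt, path + [nxt])
    walk(start, [start])
    return found
```

Every enumerated path to `project_state` passes through `review` and `explicit_curation`, which is exactly the gating the proof asserts.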
## Proof 2 - Discord cannot directly execute mutating operator actions

Blocked direct actions:

- `register-project`
- `update-project`
- `refresh-project`
- `ingest-sources`
- `promote`
- `reject`
- `project-state-set`
- `project-state-invalidate`

Blocked path:

```text
Discord message
-> evidence or operator request context
-X-> direct mutation
```

Allowed path:

```text
Discord message
-> OpenClaw recognizes requested operator action
-> explicit approval check
-> approved operator action
-> shared operator client or helper call
```

Conclusion:

Discord can request or justify a mutation, but it cannot perform it on its own.
## Proof 3 - Discrawl does not create approval

Discrawl is evidence retrieval.
It may surface:

- prior discussions
- earlier decisions
- unresolved questions
- prior suggestions to mutate state

It does not create approval for mutation.

Blocked path:

```text
Discrawl recall
-X-> refresh-project
-X-> promote
-X-> project_state write
```

Allowed path:

```text
Discrawl recall
-> evidence for human review
-> explicit approval in current thread/session if mutation is desired
-> approved operator action
```

Conclusion:

Archive recall informs review. It does not authorize writes.
## Proof 4 - Screener has no hidden mutation lane
|
||||
|
||||
The screener may:
|
||||
|
||||
- gather evidence
|
||||
- classify evidence
|
||||
- prepare candidates
|
||||
- prepare operator queues
|
||||
- report contradictions or missing context
|
||||
|
||||
The screener may not:
|
||||
|
||||
- write `project_state`
|
||||
- mutate registry state
|
||||
- refresh or ingest directly
|
||||
- promote or reject directly
|
||||
|
||||
Blocked path:
|
||||
|
||||
```text
|
||||
screener output
|
||||
-X-> hidden mutation
|
||||
```
|
||||
|
||||
Allowed path:
|
||||
|
||||
```text
|
||||
screener output
|
||||
-> review queue or operator queue
|
||||
-> explicit approval if mutation is wanted
|
||||
-> approved operator action
|
||||
```
|
||||
|
||||
Conclusion:
|
||||
|
||||
The screener is a filter, not a hidden writer.

## Minimal operator decision table

| Situation | Allowed next step | Blocked next step |
|---|---|---|
| Discord says "this is the current answer" | evidence, then review, then possible explicit curation | direct `project_state` write |
| Discord says "refresh p05" without direct instruction | ask for explicit approval | direct `refresh-project` |
| Discord says "refresh p05 now" | approved operator action may run | none, if approval is explicit |
| Discrawl finds an old thread asking for registration | use as review context only | direct `register-project` |
| Screener recommends promotion | ask for an explicit review decision | direct `promote` |

## Practical runbook

### Case A - current-truth claim from Discord

1. Treat the message as evidence.
2. Check the canonical home.
3. If needed, prepare a candidate or review note.
4. Do not write `project_state` unless the human explicitly approves that curation step.

### Case B - requested refresh from Discord

1. Determine whether the message is a direct instruction or only discussion.
2. If not explicit, ask for approval.
3. Only perform `refresh-project` after explicit approval in the current thread or session.

### Case C - candidate promotion request

1. Candidate exists or is proposed.
2. Review the evidence and the candidate text.
3. Only perform `promote` or `reject` after an explicit review decision.

## Bottom line

The V1 rule is easy to test:

If the path starts from Discord or Discrawl and ends in trusted or operator state, there must be a visible approval or review step in the middle.

If that visible step is missing, the action is not allowed.
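The rule above is mechanical enough to check in code. A minimal sketch with hypothetical step names (the real pipeline's identifiers may differ):

```python
# Sources and targets the V1 rule constrains; all names here are illustrative.
UNTRUSTED_SOURCES = {"discord", "discrawl", "screener"}
PROTECTED_TARGETS = {"project_state", "registry", "active_memory", "active_entity"}
VISIBLE_GATES = {"human_review", "explicit_approval"}


def path_is_allowed(path: list[str]) -> bool:
    """V1 rule: an untrusted source reaching trusted or operator state
    must pass through a visible review/approval step in the middle."""
    if len(path) < 2:
        return True  # nothing end-to-end to judge
    source, target = path[0], path[-1]
    if source not in UNTRUSTED_SOURCES or target not in PROTECTED_TARGETS:
        return True  # the rule only constrains untrusted -> protected paths
    return any(step in VISIBLE_GATES for step in path[1:-1])
```

A path such as `discord -> evidence -> candidate -> human_review -> project_state` passes; `discord -> project_state` does not.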
docs/openclaw-atocore-write-policy-matrix.md (new file, 184 lines)
@@ -0,0 +1,184 @@
# OpenClaw x AtoCore V1 Write-Policy Matrix

## Purpose

This matrix defines what each source is allowed to write to each target in V1.

Policy meanings:

- `auto-write` = allowed automatically without a human approval gate
- `candidate-only` = may create reviewable candidate material, but not active truth
- `human-review` = allowed only after explicit human review or explicit human approval
- `never-auto-write` = never allowed as an automatic write path

## Explicit approval rule

In this matrix, `human-review` is concrete, not vague.
For Discord-originated or Discrawl-originated paths it means:

- the human directly instructs the specific mutating action
- the instruction is in the current thread or current session
- the approval is for that specific action
- the approval is not inferred from evidence, archives, screener output, or general discussion

Examples of explicit approval:

- "refresh p05 now"
- "register this project"
- "promote this candidate"
- "write this to project_state"

Non-examples:

- "this looks important"
- "we should probably refresh this"
- archived discussion that once mentioned a similar mutation
- a screener note recommending promotion

## V1 scope note

V1 active inputs are:

- Discord and Discrawl
- OpenClaw interaction evidence
- PKM, repos, and KB sources
- read-only AtoCore context for comparison and deduplication

## Targets

The targets below are the only ones that matter for this policy.

- Evidence artifacts
- Memory candidates
- Active memories
- Entity candidates
- Active entities
- Trusted project_state
- Registry / refresh / ingest mutations
- Review actions

## Matrix

| Source | Target | Policy | Notes / gate |
|---|---|---|---|
| Discord live message | Evidence artifacts | auto-write | Safe evidence capture or archive only |
| Discord live message | Memory candidates | candidate-only | Only after screening or extraction; never direct active write |
| Discord live message | Active memories | human-review | Promote only after review of the candidate and evidence |
| Discord live message | Entity candidates | candidate-only | Only when structured signal is extracted from evidence |
| Discord live message | Active entities | human-review | Review required before promotion |
| Discord live message | Trusted project_state | human-review | Only via explicit curation; never directly from raw chat |
| Discord live message | Registry / refresh / ingest mutations | human-review | Requires explicit approval in the current thread or session |
| Discord live message | Review actions | human-review | Discord cannot silently promote or reject on its own |
| Discrawl archive result | Evidence artifacts | auto-write | Archive or search result is evidence by design |
| Discrawl archive result | Memory candidates | candidate-only | Extract reviewed signal from archived conversation |
| Discrawl archive result | Active memories | human-review | Promotion required |
| Discrawl archive result | Entity candidates | candidate-only | Archived discussion may justify candidate creation |
| Discrawl archive result | Active entities | human-review | Promotion required |
| Discrawl archive result | Trusted project_state | human-review | Must be explicitly curated; never inferred directly from archive |
| Discrawl archive result | Registry / refresh / ingest mutations | human-review | Archive recall cannot directly mutate operator state |
| Discrawl archive result | Review actions | human-review | Archive evidence informs review; it does not perform review |
| OpenClaw read/query flow | Evidence artifacts | auto-write | Conservative interaction or evidence logging is acceptable |
| OpenClaw read/query flow | Memory candidates | candidate-only | Only through explicit extraction path |
| OpenClaw read/query flow | Active memories | human-review | Requires operator review |
| OpenClaw read/query flow | Entity candidates | candidate-only | Future extraction path |
| OpenClaw read/query flow | Active entities | human-review | Requires operator review |
| OpenClaw read/query flow | Trusted project_state | never-auto-write | Read/query flow must stay additive |
| OpenClaw read/query flow | Registry / refresh / ingest mutations | never-auto-write | Read/query automation must not mutate operator state |
| OpenClaw read/query flow | Review actions | never-auto-write | Read automation cannot silently promote or reject |
| OpenClaw approved operator action | Evidence artifacts | auto-write | May create operator or audit artifacts |
| OpenClaw approved operator action | Memory candidates | human-review | Candidate persistence is itself an approved operator action |
| OpenClaw approved operator action | Active memories | human-review | Promotion allowed only through reviewed operator action |
| OpenClaw approved operator action | Entity candidates | human-review | Same rule for future entities |
| OpenClaw approved operator action | Active entities | human-review | Promotion allowed only through reviewed operator action |
| OpenClaw approved operator action | Trusted project_state | human-review | Allowed only as explicit curation |
| OpenClaw approved operator action | Registry / refresh / ingest mutations | human-review | Explicit approval required |
| OpenClaw approved operator action | Review actions | human-review | Explicit review required |
| PKM note | Evidence artifacts | human-review | Snapshotting into evidence is optional, not the primary path |
| PKM note | Memory candidates | candidate-only | Extraction from PKM is allowed into the candidate lane |
| PKM note | Active memories | human-review | Promotion required |
| PKM note | Entity candidates | candidate-only | Extract structured signal into the candidate lane |
| PKM note | Active entities | human-review | Promotion required |
| PKM note | Trusted project_state | human-review | Only via explicit curation of current truth |
| PKM note | Registry / refresh / ingest mutations | human-review | A human may choose to refresh based on PKM changes |
| PKM note | Review actions | human-review | PKM may support the decision, but not execute it automatically |
| Repo / KB source | Evidence artifacts | human-review | Optional audit or screener snapshot only |
| Repo / KB source | Memory candidates | candidate-only | Extract loose durable signal if useful |
| Repo / KB source | Active memories | human-review | Promotion required |
| Repo / KB source | Entity candidates | candidate-only | Strong future path for structured facts |
| Repo / KB source | Active entities | human-review | Promotion required |
| Repo / KB source | Trusted project_state | human-review | Explicit curation only |
| Repo / KB source | Registry / refresh / ingest mutations | human-review | A human may refresh or ingest based on source changes |
| Repo / KB source | Review actions | human-review | Source can justify review; it does not perform review |
| AtoCore active memory | Evidence artifacts | never-auto-write | Active memory is already above the evidence layer |
| AtoCore active memory | Memory candidates | never-auto-write | Do not recursively re-candidate active memory |
| AtoCore active memory | Active memories | never-auto-write | Already active |
| AtoCore active memory | Entity candidates | human-review | Graduation proposal only with review |
| AtoCore active memory | Active entities | human-review | Requires graduation plus promotion |
| AtoCore active memory | Trusted project_state | human-review | A human may explicitly curate current truth from memory |
| AtoCore active memory | Registry / refresh / ingest mutations | never-auto-write | Memory must not mutate registry or ingestion state |
| AtoCore active memory | Review actions | human-review | Human reviewer decides |
| AtoCore active entity | Evidence artifacts | never-auto-write | Already above the evidence layer |
| AtoCore active entity | Memory candidates | never-auto-write | Do not backflow structured truth into memory candidates automatically |
| AtoCore active entity | Active memories | never-auto-write | Canonical home is the entity, not a new memory |
| AtoCore active entity | Entity candidates | never-auto-write | Already active |
| AtoCore active entity | Active entities | never-auto-write | Already active |
| AtoCore active entity | Trusted project_state | human-review | Explicit curation may publish the current trusted answer |
| AtoCore active entity | Registry / refresh / ingest mutations | never-auto-write | Entities do not operate the registry |
| AtoCore active entity | Review actions | human-review | Human reviewer decides |
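The matrix lends itself to a table-driven check. The sketch below is illustrative only: the source/target keys are assumptions, not the service's real identifiers, only a slice of the rows above is encoded, and any unlisted pair deliberately defaults to the most restrictive policy.

```python
# (source, target) -> policy; unlisted pairs default to never-auto-write.
POLICY = {
    ("discord_live", "evidence"): "auto-write",
    ("discord_live", "memory_candidate"): "candidate-only",
    ("discord_live", "active_memory"): "human-review",
    ("discord_live", "project_state"): "human-review",
    ("read_query_flow", "project_state"): "never-auto-write",
    ("active_entity", "active_memory"): "never-auto-write",
}


def policy_for(source: str, target: str) -> str:
    """Look up the write policy, failing closed for unknown pairs."""
    return POLICY.get((source, target), "never-auto-write")


def may_auto_write(source: str, target: str) -> bool:
    """Only auto-write pairs may be written without a human gate."""
    return policy_for(source, target) == "auto-write"
```

Failing closed on unknown pairs matches the bottom-line rule: anything not explicitly granted evidence or candidate privileges requires a human.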

## Discord-originated trace examples

### Example 1 - conversational decision in Discord

Allowed path:

1. Discord live message -> Evidence artifacts (`auto-write`)
2. Evidence artifacts -> Memory candidates or Entity candidates (`candidate-only`)
3. Candidate -> Active memory or Active entity (`human-review`)
4. If it becomes the current trusted answer, a human may explicitly curate it into Trusted project_state (`human-review`)

There is no direct Discord -> project_state automatic path.

### Example 2 - archived Discord thread via Discrawl

Allowed path:

1. Discrawl result -> Evidence artifacts (`auto-write`)
2. Discrawl result -> Memory candidates or Entity candidates (`candidate-only`)
3. Human review decides promotion
4. Optional explicit curation into project_state later

Again, there is no direct archive -> trusted-truth path.

### Example 3 - Discord request to refresh a project

Allowed path:

1. Discord message is evidence of requested operator intent
2. No mutation happens automatically
3. OpenClaw requires explicit approval in the current thread or session for `refresh-project`
4. Only then may OpenClaw perform the approved operator action

There is no direct Discord -> refresh path without explicit approval.

## V1 interpretation rules

1. Evidence can flow in broadly.
2. Truth can only rise through review.
3. project_state is the narrowest lane.
4. Registry and ingestion operations are operator actions, not evidence effects.
5. Discord-originated paths can inform operator actions, but they cannot silently execute them.
6. Deferred sources that are out of V1 scope have no automatic or manual role in this V1 matrix.

## Deferred from V1

Screenpipe is deferred and intentionally omitted from this V1 matrix.

## Bottom line

If a source is noisy, conversational, or archived, its maximum automatic privilege in V1 is:

- evidence capture, or
- candidate creation

Everything above that requires explicit human review or explicit human approval.
@@ -1,178 +0,0 @@
"""Backfill explicit project_id into chunk and vector metadata.

Dry-run by default. The script derives ownership from the registered project
ingest roots and updates both SQLite source_chunks.metadata and Chroma vector
metadata only when --apply is provided.
"""

from __future__ import annotations

import argparse
import json
import sqlite3
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "src"))

from atocore.models.database import get_connection  # noqa: E402
from atocore.projects.registry import derive_project_id_for_path  # noqa: E402
from atocore.retrieval.vector_store import get_vector_store  # noqa: E402

DEFAULT_BATCH_SIZE = 500


def _decode_metadata(raw: str | None) -> dict | None:
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def _chunk_rows() -> tuple[list[dict], str]:
    try:
        with get_connection() as conn:
            rows = conn.execute(
                """
                SELECT
                    sc.id AS chunk_id,
                    sc.metadata AS chunk_metadata,
                    sd.file_path AS file_path
                FROM source_chunks sc
                JOIN source_documents sd ON sd.id = sc.document_id
                ORDER BY sd.file_path, sc.chunk_index
                """
            ).fetchall()
    except sqlite3.OperationalError as exc:
        if "source_chunks" in str(exc) or "source_documents" in str(exc):
            return [], f"missing ingestion tables: {exc}"
        raise
    return [dict(row) for row in rows], ""


def _batches(items: list, batch_size: int) -> list[list]:
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def backfill(
    apply: bool = False,
    project_filter: str = "",
    batch_size: int = DEFAULT_BATCH_SIZE,
    require_chroma_snapshot: bool = False,
) -> dict:
    rows, db_warning = _chunk_rows()
    updates: list[tuple[str, str, dict]] = []
    by_project: dict[str, int] = {}
    skipped_unowned = 0
    already_tagged = 0
    malformed_metadata = 0

    for row in rows:
        project_id = derive_project_id_for_path(row["file_path"])
        if project_filter and project_id != project_filter:
            continue
        if not project_id:
            skipped_unowned += 1
            continue
        metadata = _decode_metadata(row["chunk_metadata"])
        if metadata is None:
            malformed_metadata += 1
            continue
        if metadata.get("project_id") == project_id:
            already_tagged += 1
            continue
        metadata["project_id"] = project_id
        updates.append((row["chunk_id"], project_id, metadata))
        by_project[project_id] = by_project.get(project_id, 0) + 1

    missing_vectors: list[str] = []
    applied_updates = 0
    if apply and updates:
        if not require_chroma_snapshot:
            raise ValueError(
                "--apply requires --chroma-snapshot-confirmed after taking a Chroma backup"
            )
        vector_store = get_vector_store()
        for batch in _batches(updates, max(1, batch_size)):
            chunk_ids = [chunk_id for chunk_id, _, _ in batch]
            vector_payload = vector_store.get_metadatas(chunk_ids)
            existing_vector_metadata = {
                chunk_id: metadata
                for chunk_id, metadata in zip(
                    vector_payload.get("ids", []),
                    vector_payload.get("metadatas", []),
                    strict=False,
                )
                if isinstance(metadata, dict)
            }

            vector_ids = []
            vector_metadatas = []
            sql_updates = []
            for chunk_id, project_id, chunk_metadata in batch:
                vector_metadata = existing_vector_metadata.get(chunk_id)
                if vector_metadata is None:
                    missing_vectors.append(chunk_id)
                    continue
                vector_metadata = dict(vector_metadata)
                vector_metadata["project_id"] = project_id
                vector_ids.append(chunk_id)
                vector_metadatas.append(vector_metadata)
                sql_updates.append((json.dumps(chunk_metadata, ensure_ascii=True), chunk_id))

            if not vector_ids:
                continue

            vector_store.update_metadatas(vector_ids, vector_metadatas)
            with get_connection() as conn:
                cursor = conn.executemany(
                    "UPDATE source_chunks SET metadata = ? WHERE id = ?",
                    sql_updates,
                )
                if cursor.rowcount != len(sql_updates):
                    raise RuntimeError(
                        f"SQLite rowcount mismatch: {cursor.rowcount} != {len(sql_updates)}"
                    )
            applied_updates += len(sql_updates)

    return {
        "apply": apply,
        "total_chunks": len(rows),
        "updates": len(updates),
        "applied_updates": applied_updates,
        "already_tagged": already_tagged,
        "skipped_unowned": skipped_unowned,
        "malformed_metadata": malformed_metadata,
        "missing_vectors": len(missing_vectors),
        "db_warning": db_warning,
        "by_project": dict(sorted(by_project.items())),
    }


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--apply", action="store_true", help="write SQLite and Chroma metadata updates")
    parser.add_argument("--project", default="", help="optional canonical project_id filter")
    parser.add_argument("--batch-size", type=int, default=DEFAULT_BATCH_SIZE)
    parser.add_argument(
        "--chroma-snapshot-confirmed",
        action="store_true",
        help="required with --apply; confirms a Chroma snapshot exists",
    )
    args = parser.parse_args()

    payload = backfill(
        apply=args.apply,
        project_filter=args.project.strip(),
        batch_size=args.batch_size,
        require_chroma_snapshot=args.chroma_snapshot_confirmed,
    )
    print(json.dumps(payload, indent=2, ensure_ascii=True))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
@@ -1,131 +0,0 @@
"""Render a compact live-status report from a running AtoCore instance.

This is intentionally read-only and stdlib-only so it can be used from a
fresh checkout, a cron job, or a Codex/Claude session without installing the
full app package. The output is meant to reduce docs drift: copy the report
into status docs only after it was generated from the live service.
"""

from __future__ import annotations

import argparse
import errno
import json
import os
import sys
import urllib.error
import urllib.request
from typing import Any

DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100").rstrip("/")
DEFAULT_TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30"))
DEFAULT_AUTH_TOKEN = os.environ.get("ATOCORE_AUTH_TOKEN", "").strip()


def request_json(base_url: str, path: str, timeout: int, auth_token: str = "") -> dict[str, Any]:
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    req = urllib.request.Request(f"{base_url}{path}", method="GET", headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        body = response.read().decode("utf-8")
        status = getattr(response, "status", None)
    payload = json.loads(body) if body.strip() else {}
    if not isinstance(payload, dict):
        payload = {"value": payload}
    if status is not None:
        payload["_http_status"] = status
    return payload


def collect_status(base_url: str, timeout: int, auth_token: str = "") -> dict[str, Any]:
    payload: dict[str, Any] = {"base_url": base_url}
    for name, path in {
        "health": "/health",
        "stats": "/stats",
        "projects": "/projects",
        "dashboard": "/admin/dashboard",
    }.items():
        try:
            payload[name] = request_json(base_url, path, timeout, auth_token)
        except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
            payload[name] = {"error": str(exc)}
    return payload


def render_markdown(status: dict[str, Any]) -> str:
    health = status.get("health", {})
    stats = status.get("stats", {})
    projects = status.get("projects", {}).get("projects", [])
    dashboard = status.get("dashboard", {})
    memories = dashboard.get("memories", {}) if isinstance(dashboard.get("memories"), dict) else {}
    project_state = dashboard.get("project_state", {}) if isinstance(dashboard.get("project_state"), dict) else {}
    interactions = dashboard.get("interactions", {}) if isinstance(dashboard.get("interactions"), dict) else {}
    pipeline = dashboard.get("pipeline", {}) if isinstance(dashboard.get("pipeline"), dict) else {}

    lines = [
        "# AtoCore Live Status",
        "",
        f"- base_url: `{status.get('base_url', '')}`",
        "- endpoint_http_statuses: "
        f"`health={health.get('_http_status', 'error')}, "
        f"stats={stats.get('_http_status', 'error')}, "
        f"projects={status.get('projects', {}).get('_http_status', 'error')}, "
        f"dashboard={dashboard.get('_http_status', 'error')}`",
        f"- service_status: `{health.get('status', 'unknown')}`",
        f"- code_version: `{health.get('code_version', health.get('version', 'unknown'))}`",
        f"- build_sha: `{health.get('build_sha', 'unknown')}`",
        f"- build_branch: `{health.get('build_branch', 'unknown')}`",
        f"- build_time: `{health.get('build_time', 'unknown')}`",
        f"- env: `{health.get('env', 'unknown')}`",
        f"- documents: `{stats.get('total_documents', 'unknown')}`",
        f"- chunks: `{stats.get('total_chunks', 'unknown')}`",
        f"- vectors: `{stats.get('total_vectors', health.get('vectors_count', 'unknown'))}`",
        f"- registered_projects: `{len(projects)}`",
        f"- active_memories: `{memories.get('active', 'unknown')}`",
        f"- candidate_memories: `{memories.get('candidates', 'unknown')}`",
        f"- interactions: `{interactions.get('total', 'unknown')}`",
        f"- project_state_entries: `{project_state.get('total', 'unknown')}`",
        f"- pipeline_last_run: `{pipeline.get('last_run', 'unknown')}`",
    ]

    if projects:
        lines.extend(["", "## Projects"])
        for project in projects:
            aliases = ", ".join(project.get("aliases", []))
            suffix = f" ({aliases})" if aliases else ""
            lines.append(f"- `{project.get('id', '')}`{suffix}")

    return "\n".join(lines) + "\n"


def main() -> int:
    parser = argparse.ArgumentParser(description="Render live AtoCore status")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
    parser.add_argument("--timeout", type=int, default=DEFAULT_TIMEOUT)
    parser.add_argument(
        "--auth-token",
        default=DEFAULT_AUTH_TOKEN,
        help="Bearer token; defaults to ATOCORE_AUTH_TOKEN when set",
    )
    parser.add_argument("--json", action="store_true", help="emit raw JSON")
    args = parser.parse_args()

    status = collect_status(args.base_url.rstrip("/"), args.timeout, args.auth_token)
    if args.json:
        output = json.dumps(status, indent=2, ensure_ascii=True) + "\n"
    else:
        output = render_markdown(status)

    try:
        sys.stdout.write(output)
    except BrokenPipeError:
        return 0
    except OSError as exc:
        if exc.errno in {errno.EINVAL, errno.EPIPE}:
            return 0
        raise
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
@@ -44,7 +44,6 @@ import urllib.error
|
||||
import urllib.parse
|
||||
import urllib.request
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100")
|
||||
@@ -53,13 +52,6 @@ DEFAULT_BUDGET = 3000
|
||||
DEFAULT_FIXTURES = Path(__file__).parent / "retrieval_eval_fixtures.json"
|
||||
|
||||
|
||||
def request_json(base_url: str, path: str, timeout: int) -> dict:
|
||||
req = urllib.request.Request(f"{base_url}{path}", method="GET")
|
||||
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
||||
body = resp.read().decode("utf-8")
|
||||
return json.loads(body) if body.strip() else {}
|
||||
|
||||
|
||||
@dataclass
|
||||
class Fixture:
|
||||
name: str
|
||||
@@ -68,7 +60,6 @@ class Fixture:
|
||||
budget: int = DEFAULT_BUDGET
|
||||
expect_present: list[str] = field(default_factory=list)
|
||||
expect_absent: list[str] = field(default_factory=list)
|
||||
known_issue: bool = False
|
||||
notes: str = ""
|
||||
|
||||
|
||||
@@ -79,13 +70,8 @@ class FixtureResult:
|
||||
missing_present: list[str]
|
||||
unexpected_absent: list[str]
|
||||
total_chars: int
|
||||
known_issue: bool = False
|
||||
error: str = ""
|
||||
|
||||
@property
|
||||
def blocking_failure(self) -> bool:
|
||||
return not self.ok and not self.known_issue
|
||||
|
||||
|
||||
def load_fixtures(path: Path) -> list[Fixture]:
|
||||
data = json.loads(path.read_text(encoding="utf-8"))
|
||||
@@ -103,7 +89,6 @@ def load_fixtures(path: Path) -> list[Fixture]:
|
||||
budget=int(raw.get("budget", DEFAULT_BUDGET)),
|
||||
expect_present=list(raw.get("expect_present", [])),
|
||||
expect_absent=list(raw.get("expect_absent", [])),
|
||||
known_issue=bool(raw.get("known_issue", False)),
|
||||
notes=raw.get("notes", ""),
|
||||
)
|
||||
)
|
||||
@@ -132,7 +117,6 @@ def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
|
||||
missing_present=list(fixture.expect_present),
|
||||
unexpected_absent=[],
|
||||
total_chars=0,
|
||||
known_issue=fixture.known_issue,
|
||||
error=f"http_error: {exc}",
|
||||
)
|
||||
|
||||
@@ -145,26 +129,16 @@ def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
|
||||
missing_present=missing,
|
||||
unexpected_absent=unexpected,
|
||||
total_chars=len(formatted),
|
||||
known_issue=fixture.known_issue,
|
||||
)
|
||||
|
||||
|
||||
def print_human_report(results: list[FixtureResult], metadata: dict) -> None:
|
||||
def print_human_report(results: list[FixtureResult]) -> None:
|
||||
total = len(results)
|
||||
passed = sum(1 for r in results if r.ok)
    known = sum(1 for r in results if not r.ok and r.known_issue)
    blocking = sum(1 for r in results if r.blocking_failure)
    print(f"Retrieval eval: {passed}/{total} fixtures passed")
    print(
        "Target: "
        f"{metadata.get('base_url', 'unknown')} "
        f"build={metadata.get('health', {}).get('build_sha', 'unknown')}"
    )
    if known or blocking:
        print(f"Blocking failures: {blocking} Known issues: {known}")
    print()
    for r in results:
        marker = "PASS" if r.ok else ("KNOWN" if r.known_issue else "FAIL")
        marker = "PASS" if r.ok else "FAIL"
        print(f"[{marker}] {r.fixture.name} project={r.fixture.project} chars={r.total_chars}")
        if r.error:
            print(f" error: {r.error}")
@@ -176,21 +150,15 @@ def print_human_report(results: list[FixtureResult], metadata: dict) -> None:
            print(f" notes: {r.fixture.notes}")


def print_json_report(results: list[FixtureResult], metadata: dict) -> None:
def print_json_report(results: list[FixtureResult]) -> None:
    payload = {
        "generated_at": metadata.get("generated_at"),
        "base_url": metadata.get("base_url"),
        "health": metadata.get("health", {}),
        "total": len(results),
        "passed": sum(1 for r in results if r.ok),
        "known_issues": sum(1 for r in results if not r.ok and r.known_issue),
        "blocking_failures": sum(1 for r in results if r.blocking_failure),
        "fixtures": [
            {
                "name": r.fixture.name,
                "project": r.fixture.project,
                "ok": r.ok,
                "known_issue": r.known_issue,
                "total_chars": r.total_chars,
                "missing_present": r.missing_present,
                "unexpected_absent": r.unexpected_absent,
@@ -211,26 +179,15 @@ def main() -> int:
    parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    args = parser.parse_args()

    base_url = args.base_url.rstrip("/")
    try:
        health = request_json(base_url, "/health", args.timeout)
    except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
        health = {"error": str(exc)}
    metadata = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "base_url": base_url,
        "health": health,
    }

    fixtures = load_fixtures(args.fixtures)
    results = [run_fixture(f, base_url, args.timeout) for f in fixtures]
    results = [run_fixture(f, args.base_url, args.timeout) for f in fixtures]

    if args.json:
        print_json_report(results, metadata)
        print_json_report(results)
    else:
        print_human_report(results, metadata)
        print_human_report(results)

    return 0 if not any(r.blocking_failure for r in results) else 1
    return 0 if all(r.ok for r in results) else 1


if __name__ == "__main__":
@@ -27,8 +27,7 @@
      "expect_absent": [
        "polisher suite"
      ],
      "known_issue": true,
      "notes": "Known content gap as of 2026-04-24: live retrieval surfaces related constraints but not the exact Zerodur / 1.2 strings. Keep visible, but do not make nightly harness red until the source/state gap is fixed."
      "notes": "Key constraints are in Trusted Project State and in the mission-framing memory"
    },
    {
      "name": "p04-short-ambiguous",
@@ -81,36 +80,6 @@
      ],
      "notes": "CGH is a core p05 concept. Should surface via chunks and possibly the architecture memory. Must not bleed p06 polisher-suite terms."
    },
    {
      "name": "p05-broad-status-no-atomizer",
      "project": "p05-interferometer",
      "prompt": "current status",
      "expect_present": [
        "--- Trusted Project State ---",
        "--- Project Memories ---",
        "Zygo"
      ],
      "expect_absent": [
        "atomizer-v2",
        "ATOMIZER_PODCAST_BRIEFING",
        "[Source: atomizer-v2/",
        "P04-GigaBIT-M1-KB-design"
      ],
      "notes": "Regression guard for the April 24 audit finding: broad p05 status queries must not pull Atomizer/archive context into project-scoped packs."
    },
    {
      "name": "p05-vendor-decision-no-archive-first",
      "project": "p05-interferometer",
      "prompt": "vendor selection decision",
      "expect_present": [
        "Selection-Decision"
      ],
      "expect_absent": [
        "[Source: atomizer-v2/",
        "ATOMIZER_PODCAST_BRIEFING"
      ],
      "notes": "Project-scoped decision query should stay inside p05 and prefer current decision/vendor material over unrelated project archives."
    },
    {
      "name": "p06-suite-split",
      "project": "p06-polisher",
@@ -46,8 +46,6 @@ class Settings(BaseSettings):
    # All multipliers default to the values used since Wave 1; tighten or
    # loosen them via ATOCORE_* env vars without touching code.
    rank_project_match_boost: float = 2.0
    rank_project_scope_filter: bool = True
    rank_project_scope_candidate_multiplier: int = 4
    rank_query_token_step: float = 0.08
    rank_query_token_cap: float = 1.32
    rank_path_high_signal_boost: float = 1.18
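The ranking knobs above are ordinary settings fields, so each one can be overridden per deployment via an `ATOCORE_*` environment variable. A minimal standalone sketch of that override pattern (a plain-dataclass stand-in, not the actual pydantic `BaseSettings` class; the field names mirror the diff, the `from_env` helper is hypothetical):

```python
import os
from dataclasses import dataclass, fields

# Stand-in for the settings class: each field maps to ATOCORE_<FIELD_NAME>.
@dataclass
class RankingWeights:
    rank_project_match_boost: float = 2.0
    rank_query_token_step: float = 0.08
    rank_query_token_cap: float = 1.32

    @classmethod
    def from_env(cls) -> "RankingWeights":
        # Read only the variables that are actually set; keep defaults otherwise.
        kwargs = {}
        for f in fields(cls):
            raw = os.environ.get(f"ATOCORE_{f.name.upper()}")
            if raw is not None:
                kwargs[f.name] = float(raw)
        return cls(**kwargs)

os.environ["ATOCORE_RANK_PROJECT_MATCH_BOOST"] = "3.5"
weights = RankingWeights.from_env()
```

This is the same behavior the `test_ranking_weights_are_tunable_via_env` test below exercises against the real `Settings` class.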
@@ -32,23 +32,10 @@ def exclusive_ingestion():
        _INGESTION_LOCK.release()


def ingest_file(file_path: Path, project_id: str = "") -> dict:
def ingest_file(file_path: Path) -> dict:
    """Ingest a single markdown file. Returns stats."""
    start = time.time()
    file_path = file_path.resolve()
    project_id = (project_id or "").strip()
    if not project_id:
        try:
            from atocore.projects.registry import derive_project_id_for_path

            project_id = derive_project_id_for_path(file_path)
        except Exception as exc:
            log.warning(
                "project_id_derivation_failed",
                file_path=str(file_path),
                error=str(exc),
            )
            project_id = ""

    if not file_path.exists():
        raise FileNotFoundError(f"File not found: {file_path}")
@@ -78,7 +65,6 @@ def ingest_file(file_path: Path, project_id: str = "") -> dict:
        "source_file": str(file_path),
        "tags": parsed.tags,
        "title": parsed.title,
        "project_id": project_id,
    }
    chunks = chunk_markdown(parsed.body, base_metadata=base_meta)

@@ -130,7 +116,6 @@ def ingest_file(file_path: Path, project_id: str = "") -> dict:
            "source_file": str(file_path),
            "tags": json.dumps(parsed.tags),
            "title": parsed.title,
            "project_id": project_id,
        })

        conn.execute(
@@ -188,17 +173,7 @@ def ingest_folder(folder_path: Path, purge_deleted: bool = True) -> list[dict]:
        purge_deleted: If True, remove DB/vector entries for files
            that no longer exist on disk.
    """
    return ingest_project_folder(folder_path, purge_deleted=purge_deleted, project_id="")


def ingest_project_folder(
    folder_path: Path,
    purge_deleted: bool = True,
    project_id: str = "",
) -> list[dict]:
    """Ingest a folder and annotate chunks with an optional project id."""
    folder_path = folder_path.resolve()
    project_id = (project_id or "").strip()
    if not folder_path.is_dir():
        raise NotADirectoryError(f"Not a directory: {folder_path}")

@@ -212,7 +187,7 @@ def ingest_project_folder(
    # Ingest new/changed files
    for md_file in md_files:
        try:
            result = ingest_file(md_file, project_id=project_id)
            result = ingest_file(md_file)
            results.append(result)
        except Exception as e:
            log.error("ingestion_error", file_path=str(md_file), error=str(e))

@@ -8,6 +8,7 @@ from dataclasses import asdict, dataclass
from pathlib import Path

import atocore.config as _config
from atocore.ingestion.pipeline import ingest_folder


# Reserved pseudo-projects. `inbox` holds pre-project / lead / quote
@@ -259,7 +260,6 @@ def load_project_registry() -> list[RegisteredProject]:
        )

    _validate_unique_project_names(projects)
    _validate_ingest_root_overlaps(projects)
    return projects


@@ -307,28 +307,6 @@ def resolve_project_name(name: str | None) -> str:
    return name


def derive_project_id_for_path(file_path: str | Path) -> str:
    """Return the registered project that owns a source path, if any."""
    if not file_path:
        return ""
    doc_path = Path(file_path).resolve(strict=False)
    matches: list[tuple[int, int, str]] = []

    for project in load_project_registry():
        for source_ref in project.ingest_roots:
            root_path = _resolve_ingest_root(source_ref)
            try:
                doc_path.relative_to(root_path)
            except ValueError:
                continue
            matches.append((len(root_path.parts), len(str(root_path)), project.project_id))

    if not matches:
        return ""
    matches.sort(reverse=True)
    return matches[0][2]
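When a file sits under more than one registered ingest root, the derivation above sorts matches so the deepest root wins. A standalone sketch of that rule (the roots, ids, and `derive` helper below are illustrative, not the live registry):

```python
from pathlib import PurePosixPath

# Hypothetical registry: a shallow catch-all root and a deeper project root.
ROOTS = {
    "/vault/incoming/projects": "inbox",
    "/vault/incoming/projects/p04-gigabit": "p04-gigabit",
}

def derive(path: str) -> str:
    """Return the project id of the deepest root containing `path`, else ""."""
    matches = []
    for root, project_id in ROOTS.items():
        root_path = PurePosixPath(root)
        try:
            PurePosixPath(path).relative_to(root_path)
        except ValueError:
            continue  # path is not under this root
        matches.append((len(root_path.parts), project_id))
    if not matches:
        return ""
    return max(matches)[1]  # deepest (longest) root wins
```

A file inside the p04 subtree matches both roots, but the deeper `p04-gigabit` root takes precedence over the catch-all.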


def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
    """Ingest all configured source roots for a registered project.

@@ -344,8 +322,6 @@ def refresh_registered_project(project_name: str, purge_deleted: bool = False) -
    if project is None:
        raise ValueError(f"Unknown project: {project_name}")

    from atocore.ingestion.pipeline import ingest_project_folder

    roots = []
    ingested_count = 0
    skipped_count = 0
@@ -370,11 +346,7 @@ def refresh_registered_project(project_name: str, purge_deleted: bool = False) -
            {
                **root_result,
                "status": "ingested",
                "results": ingest_project_folder(
                    resolved,
                    purge_deleted=purge_deleted,
                    project_id=project.project_id,
                ),
                "results": ingest_folder(resolved, purge_deleted=purge_deleted),
            }
        )
        ingested_count += 1
@@ -471,33 +443,6 @@ def _validate_unique_project_names(projects: list[RegisteredProject]) -> None:
        seen[key] = project.project_id


def _validate_ingest_root_overlaps(projects: list[RegisteredProject]) -> None:
    roots: list[tuple[str, Path]] = []
    for project in projects:
        for source_ref in project.ingest_roots:
            roots.append((project.project_id, _resolve_ingest_root(source_ref)))

    for i, (left_project, left_root) in enumerate(roots):
        for right_project, right_root in roots[i + 1:]:
            if left_project == right_project:
                continue
            try:
                left_root.relative_to(right_root)
                overlaps = True
            except ValueError:
                try:
                    right_root.relative_to(left_root)
                    overlaps = True
                except ValueError:
                    overlaps = False
            if overlaps:
                raise ValueError(
                    "Project registry ingest root overlap: "
                    f"'{left_root}' ({left_project}) and "
                    f"'{right_root}' ({right_project})"
                )
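The overlap rule above treats two ingest roots as conflicting when either path is a prefix of the other (including when they are equal). A minimal standalone restatement of that check, assuming POSIX-style paths for illustration:

```python
from pathlib import PurePosixPath

def roots_overlap(a: str, b: str) -> bool:
    """True when either root contains the other (or they are identical)."""
    left, right = PurePosixPath(a), PurePosixPath(b)
    for inner, outer in ((left, right), (right, left)):
        try:
            inner.relative_to(outer)  # succeeds only when inner is under outer
            return True
        except ValueError:
            continue
    return False
```

This is why registering `p04` at `/vault/projects` and `p05` at `/vault/projects/p05` would be rejected: ownership of files under the nested root would be ambiguous.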


def _find_name_collisions(
    project_id: str,
    aliases: list[str],

@@ -1,6 +1,5 @@
"""Retrieval: query to ranked chunks."""

import json
import re
import time
from dataclasses import dataclass
@@ -8,7 +7,7 @@ from dataclasses import dataclass
import atocore.config as _config
from atocore.models.database import get_connection
from atocore.observability.logger import get_logger
from atocore.projects.registry import RegisteredProject, get_registered_project, load_project_registry
from atocore.projects.registry import get_registered_project
from atocore.retrieval.embeddings import embed_query
from atocore.retrieval.vector_store import get_vector_store

@@ -84,27 +83,6 @@ def retrieve(
    """Retrieve the most relevant chunks for a query."""
    top_k = top_k or _config.settings.context_top_k
    start = time.time()
    try:
        scoped_project = get_registered_project(project_hint) if project_hint else None
    except Exception as exc:
        log.warning(
            "project_scope_resolution_failed",
            project_hint=project_hint,
            error=str(exc),
        )
        scoped_project = None
    scope_filter_enabled = bool(scoped_project and _config.settings.rank_project_scope_filter)
    registered_projects = None
    query_top_k = top_k
    if scope_filter_enabled:
        query_top_k = max(
            top_k,
            top_k * max(1, _config.settings.rank_project_scope_candidate_multiplier),
        )
        try:
            registered_projects = load_project_registry()
        except Exception:
            registered_projects = None

    query_embedding = embed_query(query)
    store = get_vector_store()
@@ -123,12 +101,11 @@

    results = store.query(
        query_embedding=query_embedding,
        top_k=query_top_k,
        top_k=top_k,
        where=where,
    )

    chunks = []
    raw_result_count = len(results["ids"][0]) if results and results["ids"] and results["ids"][0] else 0
    if results and results["ids"] and results["ids"][0]:
        existing_ids = _existing_chunk_ids(results["ids"][0])
        for i, chunk_id in enumerate(results["ids"][0]):
@@ -140,13 +117,6 @@
            meta = results["metadatas"][0][i] if results["metadatas"] else {}
            content = results["documents"][0][i] if results["documents"] else ""

            if scope_filter_enabled and not _is_allowed_for_project_scope(
                scoped_project,
                meta,
                registered_projects,
            ):
                continue

            score *= _query_match_boost(query, meta)
            score *= _path_signal_boost(meta)
            if project_hint:
@@ -167,151 +137,42 @@

    duration_ms = int((time.time() - start) * 1000)
    chunks.sort(key=lambda chunk: chunk.score, reverse=True)
    post_filter_count = len(chunks)
    chunks = chunks[:top_k]

    log.info(
        "retrieval_done",
        query=query[:100],
        top_k=top_k,
        query_top_k=query_top_k,
        raw_results_count=raw_result_count,
        post_filter_count=post_filter_count,
        results_count=len(chunks),
        post_filter_dropped=max(0, raw_result_count - post_filter_count),
        underfilled=bool(raw_result_count >= query_top_k and len(chunks) < top_k),
        duration_ms=duration_ms,
    )

    return chunks


def _is_allowed_for_project_scope(
    project: RegisteredProject,
    metadata: dict,
    registered_projects: list[RegisteredProject] | None = None,
) -> bool:
    """Return True when a chunk is target-project or not project-owned.

    Project-hinted retrieval should not let one registered project's corpus
    compete with another's. At the same time, unowned/global sources should
    remain eligible because shared docs and cross-project references can be
    genuinely useful. The registry gives us the boundary: if metadata matches
    a registered project and it is not the requested project, filter it out.
    """
    if _metadata_matches_project(project, metadata):
        return True

    if registered_projects is None:
        try:
            registered_projects = load_project_registry()
        except Exception:
            return True

    for other in registered_projects:
        if other.project_id == project.project_id:
            continue
        if _metadata_matches_project(other, metadata):
            return False
    return True
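The docstring above states the scope boundary precisely; stripped of the registry plumbing, the decision reduces to a three-way rule. A standalone restatement (the project ids below are illustrative):

```python
# Hypothetical set of registered project ids.
REGISTERED = {"p05-interferometer", "p06-polisher"}

def allowed_for(scoped_project: str, chunk_owner: str) -> bool:
    """Scope rule: keep own-project and unowned chunks, drop rival-owned ones."""
    if chunk_owner == scoped_project:
        return True   # the target project's own corpus
    if chunk_owner in REGISTERED:
        return False  # owned by a different registered project: filter out
    return True       # unowned/global sources remain eligible
```

The fail-open branches in the real function (registry load failure returns True) exist so that a broken registry degrades to unfiltered retrieval rather than an empty context pack.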


def _metadata_matches_project(project: RegisteredProject, metadata: dict) -> bool:
    stored_project_id = str(metadata.get("project_id", "")).strip().lower()
    if stored_project_id:
        return stored_project_id == project.project_id.lower()

    path = _metadata_source_path(metadata)
    tags = _metadata_tags(metadata)
    for term in _project_scope_terms(project):
        if _path_matches_term(path, term) or term in tags:
            return True
    return False


def _project_scope_terms(project: RegisteredProject) -> set[str]:
    terms = {project.project_id.lower()}
    terms.update(alias.lower() for alias in project.aliases)
    for source_ref in project.ingest_roots:
        normalized = source_ref.subpath.replace("\\", "/").strip("/").lower()
        if normalized:
            terms.add(normalized)
            terms.add(normalized.split("/")[-1])
    return {term for term in terms if term}


def _metadata_searchable(metadata: dict) -> str:
    return " ".join(
        [
            str(metadata.get("source_file", "")).replace("\\", "/").lower(),
            str(metadata.get("title", "")).lower(),
            str(metadata.get("heading_path", "")).lower(),
            str(metadata.get("tags", "")).lower(),
        ]
    )


def _metadata_source_path(metadata: dict) -> str:
    return str(metadata.get("source_file", "")).replace("\\", "/").strip("/").lower()


def _metadata_tags(metadata: dict) -> set[str]:
    raw_tags = metadata.get("tags", [])
    if isinstance(raw_tags, (list, tuple, set)):
        return {str(tag).strip().lower() for tag in raw_tags if str(tag).strip()}
    if isinstance(raw_tags, str):
        try:
            parsed = json.loads(raw_tags)
        except json.JSONDecodeError:
            parsed = [raw_tags]
        if isinstance(parsed, (list, tuple, set)):
            return {str(tag).strip().lower() for tag in parsed if str(tag).strip()}
        if isinstance(parsed, str) and parsed.strip():
            return {parsed.strip().lower()}
    return set()
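Tags reach this helper in several shapes: as a Python list (fresh ingestion), as a JSON-encoded string (stored chunk metadata, written with `json.dumps`), or as a bare string. A self-contained restatement of the normalization with worked inputs (the `normalize_tags` name is for illustration only):

```python
import json

def normalize_tags(raw) -> set:
    """Collapse list / JSON-string / bare-string tag shapes to a lowercase set."""
    if isinstance(raw, (list, tuple, set)):
        return {str(t).strip().lower() for t in raw if str(t).strip()}
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)  # stored metadata: JSON-encoded list
        except json.JSONDecodeError:
            parsed = [raw]            # bare string: treat as a single tag
        if isinstance(parsed, (list, tuple, set)):
            return {str(t).strip().lower() for t in parsed if str(t).strip()}
        if isinstance(parsed, str) and parsed.strip():
            return {parsed.strip().lower()}
    return set()  # None, numbers, malformed shapes: no tags
```

All three shapes converge on the same lowercase set, which is what lets `_metadata_matches_project` compare scope terms against tags with plain set membership.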


def _path_matches_term(path: str, term: str) -> bool:
    normalized = term.replace("\\", "/").strip("/").lower()
    if not path or not normalized:
        return False
    if "/" in normalized:
        return path == normalized or path.startswith(f"{normalized}/")
    return normalized in set(path.split("/"))
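The function above has two matching modes: a multi-segment term must match the path as a prefix, while a single-segment term may match any path component. A standalone copy with worked examples (paths and terms below are illustrative):

```python
def path_matches(path: str, term: str) -> bool:
    """Prefix match for multi-segment terms, component match for single ones."""
    normalized = term.replace("\\", "/").strip("/").lower()
    if not path or not normalized:
        return False
    if "/" in normalized:
        # Multi-segment: the term must be the path or an ancestor directory.
        return path == normalized or path.startswith(f"{normalized}/")
    # Single segment: match any whole path component, never a substring.
    return normalized in set(path.split("/"))
```

The component check is what prevents `p04` from matching a path like `p04-gigabit/...` by substring alone; that looser matching is handled separately by the word-boundary regex in `_metadata_has_term`.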


def _metadata_has_term(metadata: dict, term: str) -> bool:
    normalized = term.replace("\\", "/").strip("/").lower()
    if not normalized:
        return False
    if _path_matches_term(_metadata_source_path(metadata), normalized):
        return True
    if normalized in _metadata_tags(metadata):
        return True
    return re.search(
        rf"(?<![a-z0-9]){re.escape(normalized)}(?![a-z0-9])",
        _metadata_searchable(metadata),
    ) is not None
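The regex fallback above uses lookaround guards so a term matches only at alphanumeric boundaries: `p04` should hit `p04-gigabit` (the hyphen is a boundary) without `p0` hitting the inside of `p04`. A minimal standalone check of just that regex behavior:

```python
import re

def has_term(haystack: str, term: str) -> bool:
    """Word-boundary-style search: term must not abut other [a-z0-9] chars."""
    return re.search(
        rf"(?<![a-z0-9]){re.escape(term)}(?![a-z0-9])",
        haystack,
    ) is not None
```

`re.escape` matters here because scope terms can contain `/` or other characters with regex meaning.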


def _project_match_boost(project_hint: str, metadata: dict) -> float:
    """Return a project-aware relevance multiplier for raw retrieval."""
    hint_lower = project_hint.strip().lower()
    if not hint_lower:
        return 1.0

    try:
        project = get_registered_project(project_hint)
    except Exception as exc:
        log.warning(
            "project_match_boost_resolution_failed",
            project_hint=project_hint,
            error=str(exc),
    source_file = str(metadata.get("source_file", "")).lower()
    title = str(metadata.get("title", "")).lower()
    tags = str(metadata.get("tags", "")).lower()
    searchable = " ".join([source_file, title, tags])

    project = get_registered_project(project_hint)
    candidate_names = {hint_lower}
    if project is not None:
        candidate_names.add(project.project_id.lower())
        candidate_names.update(alias.lower() for alias in project.aliases)
        candidate_names.update(
            source_ref.subpath.replace("\\", "/").strip("/").split("/")[-1].lower()
            for source_ref in project.ingest_roots
            if source_ref.subpath.strip("/\\")
        )
        project = None
    candidate_names = _project_scope_terms(project) if project is not None else {hint_lower}

    for candidate in candidate_names:
        if _metadata_has_term(metadata, candidate):
        if candidate and candidate in searchable:
            return _config.settings.rank_project_match_boost

    return 1.0

@@ -64,18 +64,6 @@ class VectorStore:
        self._collection.delete(ids=ids)
        log.debug("vectors_deleted", count=len(ids))

    def get_metadatas(self, ids: list[str]) -> dict:
        """Fetch vector metadata by chunk IDs."""
        if not ids:
            return {"ids": [], "metadatas": []}
        return self._collection.get(ids=ids, include=["metadatas"])

    def update_metadatas(self, ids: list[str], metadatas: list[dict]) -> None:
        """Update vector metadata without re-embedding documents."""
        if ids:
            self._collection.update(ids=ids, metadatas=metadatas)
            log.debug("vector_metadatas_updated", count=len(ids))

    @property
    def count(self) -> int:
        return self._collection.count()
@@ -1,154 +0,0 @@
"""Tests for explicit chunk project_id metadata backfill."""

import json

import atocore.config as config
from atocore.models.database import get_connection, init_db
from scripts import backfill_chunk_project_ids as backfill


def _write_registry(tmp_path, monkeypatch):
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
    project_dir.mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()
    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(
        json.dumps(
            {
                "projects": [
                    {
                        "id": "p04-gigabit",
                        "aliases": ["p04"],
                        "ingest_roots": [
                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
                        ],
                    }
                ]
            }
        ),
        encoding="utf-8",
    )
    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()
    return project_dir


def _insert_chunk(file_path, metadata=None, chunk_id="chunk-1"):
    with get_connection() as conn:
        conn.execute(
            """
            INSERT INTO source_documents (id, file_path, file_hash, title, doc_type, tags)
            VALUES (?, ?, ?, ?, ?, ?)
            """,
            ("doc-1", str(file_path), "hash", "Title", "markdown", "[]"),
        )
        conn.execute(
            """
            INSERT INTO source_chunks
                (id, document_id, chunk_index, content, heading_path, char_count, metadata)
            VALUES (?, ?, ?, ?, ?, ?, ?)
            """,
            (
                chunk_id,
                "doc-1",
                0,
                "content",
                "Overview",
                7,
                json.dumps(metadata if metadata is not None else {}),
            ),
        )


class FakeVectorStore:
    def __init__(self, metadatas):
        self.metadatas = dict(metadatas)
        self.updated = []

    def get_metadatas(self, ids):
        returned_ids = [chunk_id for chunk_id in ids if chunk_id in self.metadatas]
        return {
            "ids": returned_ids,
            "metadatas": [self.metadatas[chunk_id] for chunk_id in returned_ids],
        }

    def update_metadatas(self, ids, metadatas):
        self.updated.append((list(ids), list(metadatas)))
        for chunk_id, metadata in zip(ids, metadatas, strict=True):
            self.metadatas[chunk_id] = metadata


def test_backfill_dry_run_is_non_mutating(tmp_data_dir, tmp_path, monkeypatch):
    init_db()
    project_dir = _write_registry(tmp_path, monkeypatch)
    _insert_chunk(project_dir / "status.md")

    result = backfill.backfill(apply=False)

    assert result["updates"] == 1
    with get_connection() as conn:
        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
    assert json.loads(row["metadata"]) == {}


def test_backfill_apply_updates_chroma_then_sql(tmp_data_dir, tmp_path, monkeypatch):
    init_db()
    project_dir = _write_registry(tmp_path, monkeypatch)
    _insert_chunk(project_dir / "status.md", metadata={"source_file": "status.md"})
    fake_store = FakeVectorStore({"chunk-1": {"source_file": "status.md"}})
    monkeypatch.setattr(backfill, "get_vector_store", lambda: fake_store)

    result = backfill.backfill(apply=True, require_chroma_snapshot=True)

    assert result["applied_updates"] == 1
    assert fake_store.metadatas["chunk-1"]["project_id"] == "p04-gigabit"
    with get_connection() as conn:
        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
    assert json.loads(row["metadata"])["project_id"] == "p04-gigabit"


def test_backfill_apply_requires_snapshot_confirmation(tmp_data_dir, tmp_path, monkeypatch):
    init_db()
    project_dir = _write_registry(tmp_path, monkeypatch)
    _insert_chunk(project_dir / "status.md")

    try:
        backfill.backfill(apply=True)
    except ValueError as exc:
        assert "Chroma backup" in str(exc)
    else:
        raise AssertionError("Expected snapshot confirmation requirement")


def test_backfill_missing_vector_skips_sql_update(tmp_data_dir, tmp_path, monkeypatch):
    init_db()
    project_dir = _write_registry(tmp_path, monkeypatch)
    _insert_chunk(project_dir / "status.md")
    fake_store = FakeVectorStore({})
    monkeypatch.setattr(backfill, "get_vector_store", lambda: fake_store)

    result = backfill.backfill(apply=True, require_chroma_snapshot=True)

    assert result["updates"] == 1
    assert result["applied_updates"] == 0
    assert result["missing_vectors"] == 1
    with get_connection() as conn:
        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
    assert json.loads(row["metadata"]) == {}


def test_backfill_skips_malformed_metadata(tmp_data_dir, tmp_path, monkeypatch):
    init_db()
    project_dir = _write_registry(tmp_path, monkeypatch)
    _insert_chunk(project_dir / "status.md", metadata=[])

    result = backfill.backfill(apply=False)

    assert result["updates"] == 0
    assert result["malformed_metadata"] == 1
@@ -46,8 +46,6 @@ def test_settings_keep_legacy_db_path_when_present(tmp_path, monkeypatch):

def test_ranking_weights_are_tunable_via_env(monkeypatch):
    monkeypatch.setenv("ATOCORE_RANK_PROJECT_MATCH_BOOST", "3.5")
    monkeypatch.setenv("ATOCORE_RANK_PROJECT_SCOPE_FILTER", "false")
    monkeypatch.setenv("ATOCORE_RANK_PROJECT_SCOPE_CANDIDATE_MULTIPLIER", "6")
    monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_STEP", "0.12")
    monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_CAP", "1.5")
    monkeypatch.setenv("ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST", "1.25")
@@ -56,8 +54,6 @@ def test_ranking_weights_are_tunable_via_env(monkeypatch):
    settings = config.Settings()

    assert settings.rank_project_match_boost == 3.5
    assert settings.rank_project_scope_filter is False
    assert settings.rank_project_scope_candidate_multiplier == 6
    assert settings.rank_query_token_step == 0.12
    assert settings.rank_query_token_cap == 1.5
    assert settings.rank_path_high_signal_boost == 1.25
@@ -1,10 +1,8 @@
"""Tests for the ingestion pipeline."""

import json

from atocore.ingestion.parser import parse_markdown
from atocore.models.database import get_connection, init_db
from atocore.ingestion.pipeline import ingest_file, ingest_folder, ingest_project_folder
from atocore.ingestion.pipeline import ingest_file, ingest_folder


def test_parse_markdown(sample_markdown):
@@ -71,153 +69,6 @@ def test_ingest_updates_changed(tmp_data_dir, sample_markdown):
    assert result["status"] == "ingested"


def test_ingest_file_records_project_id_metadata(tmp_data_dir, sample_markdown, monkeypatch):
    """Project-aware ingestion should tag DB and vector metadata exactly."""
    init_db()

    class FakeVectorStore:
        def __init__(self):
            self.metadatas = []

        def add(self, ids, documents, metadatas):
            self.metadatas.extend(metadatas)

        def delete(self, ids):
            return None

    fake_store = FakeVectorStore()
    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)

    result = ingest_file(sample_markdown, project_id="p04-gigabit")

    assert result["status"] == "ingested"
    assert fake_store.metadatas
    assert all(meta["project_id"] == "p04-gigabit" for meta in fake_store.metadatas)

    with get_connection() as conn:
        rows = conn.execute("SELECT metadata FROM source_chunks").fetchall()
    assert rows
    assert all(
        json.loads(row["metadata"])["project_id"] == "p04-gigabit"
        for row in rows
    )


def test_ingest_file_derives_project_id_from_registry_root(tmp_data_dir, tmp_path, monkeypatch):
    """Unscoped ingest should preserve ownership for files under registered roots."""
    import atocore.config as config

    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
    project_dir.mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()
    note = project_dir / "status.md"
    note.write_text(
        "# Status\n\nCurrent project status with enough detail to create "
        "a retrievable chunk for the ingestion pipeline test.",
        encoding="utf-8",
    )
    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(
        json.dumps(
            {
                "projects": [
                    {
                        "id": "p04-gigabit",
                        "aliases": ["p04"],
                        "ingest_roots": [
                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
                        ],
                    }
                ]
            }
        ),
        encoding="utf-8",
    )

    class FakeVectorStore:
        def __init__(self):
            self.metadatas = []

        def add(self, ids, documents, metadatas):
            self.metadatas.extend(metadatas)

        def delete(self, ids):
            return None

    fake_store = FakeVectorStore()
    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()
    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)

    init_db()
    result = ingest_file(note)

    assert result["status"] == "ingested"
    assert fake_store.metadatas
    assert all(meta["project_id"] == "p04-gigabit" for meta in fake_store.metadatas)


def test_ingest_file_logs_and_fails_open_when_project_derivation_fails(
    tmp_data_dir,
    sample_markdown,
    monkeypatch,
):
    """A broken registry should be visible but should not block ingestion."""
    init_db()
    warnings = []

    class FakeVectorStore:
        def __init__(self):
            self.metadatas = []

        def add(self, ids, documents, metadatas):
            self.metadatas.extend(metadatas)

        def delete(self, ids):
            return None

    fake_store = FakeVectorStore()
    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)
    monkeypatch.setattr(
        "atocore.projects.registry.derive_project_id_for_path",
        lambda path: (_ for _ in ()).throw(ValueError("registry broken")),
    )
    monkeypatch.setattr(
        "atocore.ingestion.pipeline.log.warning",
        lambda event, **kwargs: warnings.append((event, kwargs)),
    )

    result = ingest_file(sample_markdown)

    assert result["status"] == "ingested"
    assert fake_store.metadatas
    assert all(meta["project_id"] == "" for meta in fake_store.metadatas)
    assert warnings[0][0] == "project_id_derivation_failed"
    assert "registry broken" in warnings[0][1]["error"]


def test_ingest_project_folder_passes_project_id_to_files(tmp_data_dir, sample_folder, monkeypatch):
    seen = []

    def fake_ingest_file(path, project_id=""):
        seen.append((path.name, project_id))
        return {"file": str(path), "status": "ingested"}

    monkeypatch.setattr("atocore.ingestion.pipeline.ingest_file", fake_ingest_file)
    monkeypatch.setattr("atocore.ingestion.pipeline._purge_deleted_files", lambda *args, **kwargs: 0)

    ingest_project_folder(sample_folder, project_id="p05-interferometer")

    assert seen
    assert {project_id for _, project_id in seen} == {"p05-interferometer"}


def test_parse_markdown_uses_supplied_text(sample_markdown):
    """Parsing should be able to reuse pre-read content from ingestion."""
    latin_text = """---\ntags: parser\n---\n# Parser Title\n\nBody text."""
@@ -5,7 +5,6 @@ import json
import atocore.config as config
from atocore.projects.registry import (
    build_project_registration_proposal,
    derive_project_id_for_path,
    get_registered_project,
    get_project_registry_template,
    list_registered_projects,
@@ -104,98 +103,6 @@ def test_project_registry_resolves_alias(tmp_path, monkeypatch):
    assert project.project_id == "p05-interferometer"


def test_derive_project_id_for_path_uses_registered_roots(tmp_path, monkeypatch):
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
    project_dir.mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()
    note = project_dir / "status.md"
    note.write_text("# Status\n\nCurrent work.", encoding="utf-8")

    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(
        json.dumps(
            {
                "projects": [
                    {
                        "id": "p04-gigabit",
                        "aliases": ["p04"],
                        "ingest_roots": [
                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
                        ],
                    }
                ]
            }
        ),
        encoding="utf-8",
    )

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        assert derive_project_id_for_path(note) == "p04-gigabit"
        assert derive_project_id_for_path(tmp_path / "elsewhere.md") == ""
    finally:
        config.settings = original_settings


def test_project_registry_rejects_cross_project_ingest_root_overlap(tmp_path, monkeypatch):
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    vault_dir.mkdir()
    drive_dir.mkdir()
    config_dir.mkdir()

    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(
        json.dumps(
            {
                "projects": [
                    {
                        "id": "parent",
                        "aliases": [],
                        "ingest_roots": [
                            {"source": "vault", "subpath": "incoming/projects/parent"}
                        ],
                    },
                    {
                        "id": "child",
                        "aliases": [],
                        "ingest_roots": [
                            {"source": "vault", "subpath": "incoming/projects/parent/child"}
                        ],
                    },
                ]
            }
        ),
        encoding="utf-8",
    )

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        try:
            list_registered_projects()
        except ValueError as exc:
            assert "ingest root overlap" in str(exc)
        else:
            raise AssertionError("Expected overlapping ingest roots to raise")
    finally:
        config.settings = original_settings


def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
@@ -226,8 +133,8 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypat

    calls = []

    def fake_ingest_folder(path, purge_deleted=True, project_id=""):
        calls.append((str(path), purge_deleted, project_id))
    def fake_ingest_folder(path, purge_deleted=True):
        calls.append((str(path), purge_deleted))
        return [{"file": str(path / "README.md"), "status": "ingested"}]

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -237,7 +144,7 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypat
    original_settings = config.settings
    try:
        config.settings = config.Settings()
        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fake_ingest_folder)
        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fake_ingest_folder)
        result = refresh_registered_project("polisher")
    finally:
        config.settings = original_settings
@@ -246,7 +153,6 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypat
    assert len(calls) == 1
    assert calls[0][0].endswith("p06-polisher")
    assert calls[0][1] is False
    assert calls[0][2] == "p06-polisher"
    assert result["roots"][0]["status"] == "ingested"
    assert result["status"] == "ingested"
    assert result["roots_ingested"] == 1
@@ -282,7 +188,7 @@ def test_refresh_registered_project_reports_nothing_to_ingest_when_all_missing(
        encoding="utf-8",
    )

    def fail_ingest_folder(path, purge_deleted=True, project_id=""):
    def fail_ingest_folder(path, purge_deleted=True):
        raise AssertionError(f"ingest_folder should not be called for missing root: {path}")

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -292,7 +198,7 @@ def test_refresh_registered_project_reports_nothing_to_ingest_when_all_missing(
    original_settings = config.settings
    try:
        config.settings = config.Settings()
        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fail_ingest_folder)
        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fail_ingest_folder)
        result = refresh_registered_project("ghost")
    finally:
        config.settings = original_settings
@@ -332,7 +238,7 @@ def test_refresh_registered_project_reports_partial_status(tmp_path, monkeypatch
        encoding="utf-8",
    )

    def fake_ingest_folder(path, purge_deleted=True, project_id=""):
    def fake_ingest_folder(path, purge_deleted=True):
        return [{"file": str(path / "README.md"), "status": "ingested"}]

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -342,7 +248,7 @@ def test_refresh_registered_project_reports_partial_status(tmp_path, monkeypatch
    original_settings = config.settings
    try:
        config.settings = config.Settings()
        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fake_ingest_folder)
        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fake_ingest_folder)
        result = refresh_registered_project("mixed")
    finally:
        config.settings = original_settings

@@ -70,28 +70,8 @@ def test_retrieve_skips_stale_vector_entries(tmp_data_dir, sample_markdown, monk


def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
    target_project = type(
        "Project",
        (),
        {
            "project_id": "p04-gigabit",
            "aliases": ("p04", "gigabit"),
            "ingest_roots": (),
        },
    )()
    other_project = type(
        "Project",
        (),
        {
            "project_id": "p05-interferometer",
            "aliases": ("p05", "interferometer"),
            "ingest_roots": (),
        },
    )()

    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            assert top_k == 8
            return {
                "ids": [["chunk-a", "chunk-b"]],
                "documents": [["project doc", "other doc"]],
@@ -122,501 +102,22 @@ def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: target_project,
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.load_project_registry",
        lambda: [target_project, other_project],
        lambda project_name: type(
            "Project",
            (),
            {
                "project_id": "p04-gigabit",
                "aliases": ("p04", "gigabit"),
                "ingest_roots": (),
            },
        )(),
    )

    results = retrieve("mirror architecture", top_k=2, project_hint="p04")

    assert len(results) == 1
    assert len(results) == 2
    assert results[0].chunk_id == "chunk-a"


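The tests in this diff repeatedly fabricate lightweight registry records with the three-argument form of `type()`. As a minimal sketch of that idiom (the `Project` name and attributes mirror the test fixtures, not a real class in the codebase): `type(name, bases, namespace)` builds a class on the fly, and the trailing `()` instantiates it, so attribute access works like on any ordinary object.

```python
# Build an ad-hoc class named "Project" with no bases and a dict of
# attributes, then instantiate it immediately — the same pattern the
# retrieval tests use for fake registry entries.
project = type(
    "Project",
    (),
    {
        "project_id": "p04-gigabit",
        "aliases": ("p04", "gigabit"),
        "ingest_roots": (),
    },
)()

# The namespace dict entries become class attributes, visible on the instance.
pid = project.project_id
alias_count = len(project.aliases)
```

Compared with defining a throwaway `class Project: ...` per test, this keeps each fake record inline next to the assertions that depend on it.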
def test_retrieve_project_scope_allows_unowned_global_chunks(monkeypatch):
    target_project = type(
        "Project",
        (),
        {
            "project_id": "p04-gigabit",
            "aliases": ("p04", "gigabit"),
            "ingest_roots": (),
        },
    )()

    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            return {
                "ids": [["chunk-a", "chunk-global"]],
                "documents": [["project doc", "global doc"]],
                "metadatas": [[
                    {
                        "heading_path": "Overview",
                        "source_file": "p04-gigabit/pkm/_index.md",
                        "tags": '["p04-gigabit"]',
                        "title": "P04",
                        "document_id": "doc-a",
                    },
                    {
                        "heading_path": "Overview",
                        "source_file": "shared/engineering-rules.md",
                        "tags": "[]",
                        "title": "Shared engineering rules",
                        "document_id": "doc-global",
                    },
                ]],
                "distances": [[0.2, 0.21]],
            }

    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
    monkeypatch.setattr(
        "atocore.retrieval.retriever._existing_chunk_ids",
        lambda chunk_ids: set(chunk_ids),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: target_project,
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.load_project_registry",
        lambda: [target_project],
    )

    results = retrieve("mirror architecture", top_k=2, project_hint="p04")

    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-global"]


def test_retrieve_project_scope_filter_can_be_disabled(monkeypatch):
    target_project = type(
        "Project",
        (),
        {
            "project_id": "p04-gigabit",
            "aliases": ("p04", "gigabit"),
            "ingest_roots": (),
        },
    )()
    other_project = type(
        "Project",
        (),
        {
            "project_id": "p05-interferometer",
            "aliases": ("p05", "interferometer"),
            "ingest_roots": (),
        },
    )()

    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            assert top_k == 2
            return {
                "ids": [["chunk-a", "chunk-b"]],
                "documents": [["project doc", "other project doc"]],
                "metadatas": [[
                    {
                        "heading_path": "Overview",
                        "source_file": "p04-gigabit/pkm/_index.md",
                        "tags": '["p04-gigabit"]',
                        "title": "P04",
                        "document_id": "doc-a",
                    },
                    {
                        "heading_path": "Overview",
                        "source_file": "p05-interferometer/pkm/_index.md",
                        "tags": '["p05-interferometer"]',
                        "title": "P05",
                        "document_id": "doc-b",
                    },
                ]],
                "distances": [[0.2, 0.2]],
            }

    monkeypatch.setattr("atocore.config.settings.rank_project_scope_filter", False)
    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
    monkeypatch.setattr(
        "atocore.retrieval.retriever._existing_chunk_ids",
        lambda chunk_ids: set(chunk_ids),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: target_project,
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.load_project_registry",
        lambda: [target_project, other_project],
    )

    results = retrieve("mirror architecture", top_k=2, project_hint="p04")

    assert {r.chunk_id for r in results} == {"chunk-a", "chunk-b"}


def test_retrieve_project_scope_ignores_title_for_ownership(monkeypatch):
    target_project = type(
        "Project",
        (),
        {
            "project_id": "p04-gigabit",
            "aliases": ("p04", "gigabit"),
            "ingest_roots": (),
        },
    )()
    other_project = type(
        "Project",
        (),
        {
            "project_id": "p06-polisher",
            "aliases": ("p06", "polisher", "p11"),
            "ingest_roots": (),
        },
    )()

    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            return {
                "ids": [["chunk-target", "chunk-poisoned-title"]],
                "documents": [["p04 doc", "p06 doc"]],
                "metadatas": [[
                    {
                        "heading_path": "Overview",
                        "source_file": "p04-gigabit/pkm/_index.md",
                        "tags": '["p04-gigabit"]',
                        "title": "P04",
                        "document_id": "doc-a",
                    },
                    {
                        "heading_path": "Overview",
                        "source_file": "p06-polisher/pkm/architecture.md",
                        "tags": '["p06-polisher"]',
                        "title": "GigaBIT M1 mirror lessons",
                        "document_id": "doc-b",
                    },
                ]],
                "distances": [[0.2, 0.19]],
            }

    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
    monkeypatch.setattr(
        "atocore.retrieval.retriever._existing_chunk_ids",
        lambda chunk_ids: set(chunk_ids),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: target_project,
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.load_project_registry",
        lambda: [target_project, other_project],
    )

    results = retrieve("mirror architecture", top_k=2, project_hint="p04")

    assert [r.chunk_id for r in results] == ["chunk-target"]


def test_retrieve_project_scope_uses_path_segments_not_substrings(monkeypatch):
    target_project = type(
        "Project",
        (),
        {
            "project_id": "p05-interferometer",
            "aliases": ("p05", "interferometer"),
            "ingest_roots": (),
        },
    )()
    abb_project = type(
        "Project",
        (),
        {
            "project_id": "abb-space",
            "aliases": ("abb",),
            "ingest_roots": (),
        },
    )()

    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            return {
                "ids": [["chunk-target", "chunk-global"]],
                "documents": [["p05 doc", "global doc"]],
                "metadatas": [[
                    {
                        "heading_path": "Overview",
                        "source_file": "p05-interferometer/pkm/_index.md",
                        "tags": '["p05-interferometer"]',
                        "title": "P05",
                        "document_id": "doc-a",
                    },
                    {
                        "heading_path": "Abbreviation notes",
                        "source_file": "shared/cabbage-abbreviations.md",
                        "tags": "[]",
                        "title": "ABB-style abbreviations",
                        "document_id": "doc-global",
                    },
                ]],
                "distances": [[0.2, 0.21]],
            }

    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
    monkeypatch.setattr(
        "atocore.retrieval.retriever._existing_chunk_ids",
        lambda chunk_ids: set(chunk_ids),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: target_project,
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.load_project_registry",
        lambda: [target_project, abb_project],
    )

    results = retrieve("abbreviations", top_k=2, project_hint="p05")

    assert [r.chunk_id for r in results] == ["chunk-target", "chunk-global"]


def test_retrieve_project_scope_prefers_exact_project_id(monkeypatch):
    target_project = type(
        "Project",
        (),
        {
            "project_id": "p04-gigabit",
            "aliases": ("p04", "gigabit"),
            "ingest_roots": (),
        },
    )()
    other_project = type(
        "Project",
        (),
        {
            "project_id": "p06-polisher",
            "aliases": ("p06", "polisher"),
            "ingest_roots": (),
        },
    )()

    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            return {
                "ids": [["chunk-target", "chunk-other", "chunk-global"]],
                "documents": [["target doc", "other doc", "global doc"]],
                "metadatas": [[
                    {
                        "heading_path": "Overview",
                        "source_file": "legacy/unhelpful-path.md",
                        "tags": "[]",
                        "title": "Target",
                        "project_id": "p04-gigabit",
                        "document_id": "doc-a",
                    },
                    {
                        "heading_path": "Overview",
                        "source_file": "p04-gigabit/title-poisoned.md",
                        "tags": '["p04-gigabit"]',
                        "title": "Looks target-owned but is explicit p06",
                        "project_id": "p06-polisher",
                        "document_id": "doc-b",
                    },
                    {
                        "heading_path": "Overview",
                        "source_file": "shared/global.md",
                        "tags": "[]",
                        "title": "Shared",
                        "project_id": "",
                        "document_id": "doc-global",
                    },
                ]],
                "distances": [[0.2, 0.19, 0.21]],
            }

    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
    monkeypatch.setattr(
        "atocore.retrieval.retriever._existing_chunk_ids",
        lambda chunk_ids: set(chunk_ids),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: target_project,
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.load_project_registry",
        lambda: [target_project, other_project],
    )

    results = retrieve("mirror architecture", top_k=3, project_hint="p04")

    assert [r.chunk_id for r in results] == ["chunk-target", "chunk-global"]


def test_retrieve_empty_project_id_falls_back_to_path_ownership(monkeypatch):
    target_project = type(
        "Project",
        (),
        {
            "project_id": "p04-gigabit",
            "aliases": ("p04", "gigabit"),
            "ingest_roots": (),
        },
    )()
    other_project = type(
        "Project",
        (),
        {
            "project_id": "p05-interferometer",
            "aliases": ("p05", "interferometer"),
            "ingest_roots": (),
        },
    )()

    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            return {
                "ids": [["chunk-target", "chunk-other"]],
                "documents": [["target doc", "other doc"]],
                "metadatas": [[
                    {
                        "heading_path": "Overview",
                        "source_file": "p04-gigabit/status.md",
                        "tags": "[]",
                        "title": "Target",
                        "project_id": "",
                        "document_id": "doc-a",
                    },
                    {
                        "heading_path": "Overview",
                        "source_file": "p05-interferometer/status.md",
                        "tags": "[]",
                        "title": "Other",
                        "project_id": "",
                        "document_id": "doc-b",
                    },
                ]],
                "distances": [[0.2, 0.19]],
            }

    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
    monkeypatch.setattr(
        "atocore.retrieval.retriever._existing_chunk_ids",
        lambda chunk_ids: set(chunk_ids),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: target_project,
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.load_project_registry",
        lambda: [target_project, other_project],
    )

    results = retrieve("mirror architecture", top_k=2, project_hint="p04")

    assert [r.chunk_id for r in results] == ["chunk-target"]


def test_retrieve_unknown_project_hint_does_not_widen_or_filter(monkeypatch):
    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            assert top_k == 2
            return {
                "ids": [["chunk-a", "chunk-b"]],
                "documents": [["doc a", "doc b"]],
                "metadatas": [[
                    {
                        "heading_path": "Overview",
                        "source_file": "project-a/file.md",
                        "tags": "[]",
                        "title": "A",
                        "document_id": "doc-a",
                    },
                    {
                        "heading_path": "Overview",
                        "source_file": "project-b/file.md",
                        "tags": "[]",
                        "title": "B",
                        "document_id": "doc-b",
                    },
                ]],
                "distances": [[0.2, 0.21]],
            }

    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
    monkeypatch.setattr(
        "atocore.retrieval.retriever._existing_chunk_ids",
        lambda chunk_ids: set(chunk_ids),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: None,
    )

    results = retrieve("overview", top_k=2, project_hint="unknown-project")

    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-b"]


def test_retrieve_fails_open_when_project_scope_resolution_fails(monkeypatch):
    warnings = []

    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            assert top_k == 2
            return {
                "ids": [["chunk-a", "chunk-b"]],
                "documents": [["doc a", "doc b"]],
                "metadatas": [[
                    {
                        "heading_path": "Overview",
                        "source_file": "p04-gigabit/file.md",
                        "tags": "[]",
                        "title": "A",
                        "document_id": "doc-a",
                    },
                    {
                        "heading_path": "Overview",
                        "source_file": "p05-interferometer/file.md",
                        "tags": "[]",
                        "title": "B",
                        "document_id": "doc-b",
                    },
                ]],
                "distances": [[0.2, 0.21]],
            }

    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
    monkeypatch.setattr(
        "atocore.retrieval.retriever._existing_chunk_ids",
        lambda chunk_ids: set(chunk_ids),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: (_ for _ in ()).throw(ValueError("registry overlap")),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.log.warning",
        lambda event, **kwargs: warnings.append((event, kwargs)),
    )

    results = retrieve("overview", top_k=2, project_hint="p04")

    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-b"]
    assert {warning[0] for warning in warnings} == {
        "project_scope_resolution_failed",
        "project_match_boost_resolution_failed",
    }
    assert all("registry overlap" in warning[1]["error"] for warning in warnings)
    assert results[0].score > results[1].score


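The failure-injection stubs in these tests rely on an idiom worth spelling out: a lambda body cannot contain a `raise` statement, so `lambda ...: (_ for _ in ()).throw(exc)` builds a throwaway empty generator and calls its `.throw()` method, which raises `exc` the moment the lambda is invoked. A minimal sketch (the `raising_stub` helper name is introduced here for illustration, it is not part of the codebase):

```python
# A lambda cannot use `raise`, but generator.throw() raises for us:
# calling .throw(exc) on a fresh generator propagates exc immediately.
def raising_stub(exc):
    """Return a callable that raises `exc` whenever it is invoked."""
    return lambda *args, **kwargs: (_ for _ in ()).throw(exc)

broken = raising_stub(ValueError("registry broken"))

try:
    broken("any/path")  # raises ValueError("registry broken")
except ValueError as err:
    caught = str(err)
```

This is how the tests make a monkeypatched `derive_project_id_for_path` or `get_registered_project` blow up on call, so the fail-open behavior of the caller can be asserted.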
def test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths(monkeypatch):