Compare commits
31 Commits
ac14f8d6a4
...
d0ff8b5738
| Author | SHA1 | Date | |
|---|---|---|---|
| d0ff8b5738 | |||
| b9da5b6d84 | |||
| bd291ff874 | |||
| 480f13a6df | |||
| 53147d326c | |||
| 2704997256 | |||
| 2e449a4c33 | |||
| ea3fed3d44 | |||
| c9b9eede25 | |||
| 14ab7c8e9f | |||
| bdb42dba05 | |||
| 46a5d5887a | |||
| 9943338846 | |||
| 26bfa94c65 | |||
| 4aa2b696a9 | |||
| af01dd3e70 | |||
| 8f74cab0e6 | |||
| 06aa931273 | |||
| c9757e313a | |||
| 9715fe3143 | |||
| 1f1e6b5749 | |||
| 827dcf2cd1 | |||
| d8028f406e | |||
| 3b8d717bdf | |||
| 8293099025 | |||
| 0f95415530 | |||
| 82c7535d15 | |||
| 8a94da4bf4 | |||
| 5069d5b1b6 | |||
| 440fc1d9ba | |||
| 6bfa1fcc37 |
9
.dockerignore
Normal file
9
.dockerignore
Normal file
@@ -0,0 +1,9 @@
|
||||
.git
|
||||
.pytest_cache
|
||||
.coverage
|
||||
.claude
|
||||
data
|
||||
__pycache__
|
||||
*.pyc
|
||||
tests
|
||||
docs
|
||||
16
.env.example
16
.env.example
@@ -1,7 +1,23 @@
|
||||
ATOCORE_ENV=development
|
||||
ATOCORE_DEBUG=false
|
||||
ATOCORE_LOG_LEVEL=INFO
|
||||
ATOCORE_DATA_DIR=./data
|
||||
ATOCORE_DB_DIR=
|
||||
ATOCORE_CHROMA_DIR=
|
||||
ATOCORE_CACHE_DIR=
|
||||
ATOCORE_TMP_DIR=
|
||||
ATOCORE_VAULT_SOURCE_DIR=./sources/vault
|
||||
ATOCORE_DRIVE_SOURCE_DIR=./sources/drive
|
||||
ATOCORE_SOURCE_VAULT_ENABLED=true
|
||||
ATOCORE_SOURCE_DRIVE_ENABLED=true
|
||||
ATOCORE_LOG_DIR=./logs
|
||||
ATOCORE_BACKUP_DIR=./backups
|
||||
ATOCORE_RUN_DIR=./run
|
||||
ATOCORE_PROJECT_REGISTRY_DIR=./config
|
||||
ATOCORE_PROJECT_REGISTRY_PATH=./config/project-registry.json
|
||||
ATOCORE_HOST=127.0.0.1
|
||||
ATOCORE_PORT=8100
|
||||
ATOCORE_DB_BUSY_TIMEOUT_MS=5000
|
||||
ATOCORE_EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
|
||||
ATOCORE_CHUNK_MAX_SIZE=800
|
||||
ATOCORE_CHUNK_OVERLAP=100
|
||||
|
||||
1
.gitignore
vendored
1
.gitignore
vendored
@@ -10,3 +10,4 @@ htmlcov/
|
||||
.coverage
|
||||
venv/
|
||||
.venv/
|
||||
.claude/
|
||||
|
||||
37
AGENTS.md
Normal file
37
AGENTS.md
Normal file
@@ -0,0 +1,37 @@
|
||||
# AGENTS.md
|
||||
|
||||
## Project role
|
||||
This repository is AtoCore, the runtime and machine-memory layer of the Ato ecosystem.
|
||||
|
||||
## Ecosystem definitions
|
||||
- AtoCore = app/runtime/API/ingestion/retrieval/context builder/machine DB logic
|
||||
- AtoMind = future intelligence layer for promotion, reflection, conflict handling, trust decisions
|
||||
- AtoVault = human-readable memory source, intended for Obsidian
|
||||
- AtoDrive = trusted operational project source, higher trust than general vault notes
|
||||
|
||||
## Storage principles
|
||||
- Human-readable source layers and machine operational storage must remain separate
|
||||
- AtoVault is not the live vector database location
|
||||
- AtoDrive is not the live vector database location
|
||||
- Machine operational storage includes SQLite, vector store, indexes, embeddings, and runtime metadata
|
||||
- The machine DB is derived operational state, not the primary human source of truth
|
||||
|
||||
## Deployment principles
|
||||
- Dalidou is the canonical host for AtoCore service and machine database
|
||||
- OpenClaw on the T420 should consume AtoCore over API/network/Tailscale
|
||||
- Do not design around Syncthing for the live SQLite/vector DB
|
||||
- Prefer one canonical running service over multi-node live DB replication
|
||||
|
||||
## Coding guidance
|
||||
- Keep path handling explicit and configurable via environment variables
|
||||
- Do not hard-code machine-specific absolute paths
|
||||
- Keep implementation small, testable, and reversible
|
||||
- Preserve current working behavior unless a change is necessary
|
||||
- Add or update tests when changing config, storage, or path logic
|
||||
|
||||
## Change policy
|
||||
Before large refactors:
|
||||
1. explain the architectural reason
|
||||
2. propose the smallest safe batch
|
||||
3. implement incrementally
|
||||
4. summarize changed files and migration impact
|
||||
21
Dockerfile
Normal file
21
Dockerfile
Normal file
@@ -0,0 +1,21 @@
|
||||
FROM python:3.12-slim
|
||||
|
||||
ENV PYTHONDONTWRITEBYTECODE=1
|
||||
ENV PYTHONUNBUFFERED=1
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
RUN apt-get update \
|
||||
&& apt-get install -y --no-install-recommends build-essential curl git \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
COPY pyproject.toml README.md requirements.txt requirements-dev.txt ./
|
||||
COPY config ./config
|
||||
COPY src ./src
|
||||
|
||||
RUN pip install --no-cache-dir --upgrade pip \
|
||||
&& pip install --no-cache-dir .
|
||||
|
||||
EXPOSE 8100
|
||||
|
||||
CMD ["python", "-m", "uvicorn", "atocore.main:app", "--host", "0.0.0.0", "--port", "8100"]
|
||||
24
README.md
24
README.md
@@ -42,13 +42,13 @@ python scripts/atocore_client.py audit-query "gigabit" 5
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
```text
|
||||
FastAPI (port 8100)
|
||||
├── Ingestion: markdown → parse → chunk → embed → store
|
||||
├── Retrieval: query → embed → vector search → rank
|
||||
├── Context Builder: retrieve → boost → budget → format
|
||||
├── SQLite (documents, chunks, memories, projects, interactions)
|
||||
└── ChromaDB (vector embeddings)
|
||||
|- Ingestion: markdown -> parse -> chunk -> embed -> store
|
||||
|- Retrieval: query -> embed -> vector search -> rank
|
||||
|- Context Builder: retrieve -> boost -> budget -> format
|
||||
|- SQLite (documents, chunks, memories, projects, interactions)
|
||||
'- ChromaDB (vector embeddings)
|
||||
```
|
||||
|
||||
## Configuration
|
||||
@@ -74,3 +74,15 @@ pytest
|
||||
|
||||
- `scripts/atocore_client.py` provides a live API client for project refresh, project-state inspection, and retrieval-quality audits.
|
||||
- `docs/operations.md` captures the current operational priority order: retrieval quality, Wave 2 trusted-operational ingestion, AtoDrive scoping, and restore validation.
|
||||
|
||||
## Architecture Notes
|
||||
|
||||
Implementation-facing architecture notes live under `docs/architecture/`.
|
||||
|
||||
Current additions:
|
||||
- `docs/architecture/engineering-knowledge-hybrid-architecture.md` — 5-layer hybrid model
|
||||
- `docs/architecture/engineering-ontology-v1.md` — V1 object and relationship inventory
|
||||
- `docs/architecture/engineering-query-catalog.md` — 20 v1-required queries
|
||||
- `docs/architecture/memory-vs-entities.md` — canonical home split
|
||||
- `docs/architecture/promotion-rules.md` — Layer 0 to Layer 2 pipeline
|
||||
- `docs/architecture/conflict-model.md` — contradictory facts detection and resolution
|
||||
|
||||
21
config/project-registry.example.json
Normal file
21
config/project-registry.example.json
Normal file
@@ -0,0 +1,21 @@
|
||||
{
|
||||
"projects": [
|
||||
{
|
||||
"id": "p07-example",
|
||||
"aliases": ["p07", "example-project"],
|
||||
"description": "Short description of the project and the staged source set.",
|
||||
"ingest_roots": [
|
||||
{
|
||||
"source": "vault",
|
||||
"subpath": "incoming/projects/p07-example",
|
||||
"label": "Primary staged project docs"
|
||||
},
|
||||
{
|
||||
"source": "drive",
|
||||
"subpath": "projects/p07-example",
|
||||
"label": "Trusted operational docs"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
52
config/project-registry.json
Normal file
52
config/project-registry.json
Normal file
@@ -0,0 +1,52 @@
|
||||
{
|
||||
"projects": [
|
||||
{
|
||||
"id": "atocore",
|
||||
"aliases": ["ato core"],
|
||||
"description": "AtoCore platform docs and trusted project materials.",
|
||||
"ingest_roots": [
|
||||
{
|
||||
"source": "drive",
|
||||
"subpath": "atocore",
|
||||
"label": "AtoCore drive docs"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "p04-gigabit",
|
||||
"aliases": ["p04", "gigabit", "gigaBIT"],
|
||||
"description": "Active P04 GigaBIT mirror project corpus from PKM plus staged operational docs.",
|
||||
"ingest_roots": [
|
||||
{
|
||||
"source": "vault",
|
||||
"subpath": "incoming/projects/p04-gigabit",
|
||||
"label": "P04 staged project docs"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "p05-interferometer",
|
||||
"aliases": ["p05", "interferometer"],
|
||||
"description": "Active P05 interferometer corpus from PKM plus selected repo context and vendor documentation.",
|
||||
"ingest_roots": [
|
||||
{
|
||||
"source": "vault",
|
||||
"subpath": "incoming/projects/p05-interferometer",
|
||||
"label": "P05 staged project docs"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "p06-polisher",
|
||||
"aliases": ["p06", "polisher"],
|
||||
"description": "Active P06 polisher corpus from PKM, software-suite notes, and selected repo context.",
|
||||
"ingest_roots": [
|
||||
{
|
||||
"source": "vault",
|
||||
"subpath": "incoming/projects/p06-polisher",
|
||||
"label": "P06 staged project docs"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
19
deploy/dalidou/.env.example
Normal file
19
deploy/dalidou/.env.example
Normal file
@@ -0,0 +1,19 @@
|
||||
ATOCORE_ENV=production
|
||||
ATOCORE_DEBUG=false
|
||||
ATOCORE_LOG_LEVEL=INFO
|
||||
ATOCORE_HOST=0.0.0.0
|
||||
ATOCORE_PORT=8100
|
||||
|
||||
ATOCORE_DATA_DIR=/srv/storage/atocore/data
|
||||
ATOCORE_DB_DIR=/srv/storage/atocore/data/db
|
||||
ATOCORE_CHROMA_DIR=/srv/storage/atocore/data/chroma
|
||||
ATOCORE_CACHE_DIR=/srv/storage/atocore/data/cache
|
||||
ATOCORE_TMP_DIR=/srv/storage/atocore/data/tmp
|
||||
ATOCORE_LOG_DIR=/srv/storage/atocore/logs
|
||||
ATOCORE_BACKUP_DIR=/srv/storage/atocore/backups
|
||||
ATOCORE_RUN_DIR=/srv/storage/atocore/run
|
||||
|
||||
ATOCORE_VAULT_SOURCE_DIR=/srv/storage/atocore/sources/vault
|
||||
ATOCORE_DRIVE_SOURCE_DIR=/srv/storage/atocore/sources/drive
|
||||
ATOCORE_SOURCE_VAULT_ENABLED=true
|
||||
ATOCORE_SOURCE_DRIVE_ENABLED=true
|
||||
28
deploy/dalidou/docker-compose.yml
Normal file
28
deploy/dalidou/docker-compose.yml
Normal file
@@ -0,0 +1,28 @@
|
||||
services:
|
||||
atocore:
|
||||
build:
|
||||
context: ../../
|
||||
dockerfile: Dockerfile
|
||||
container_name: atocore
|
||||
restart: unless-stopped
|
||||
ports:
|
||||
- "${ATOCORE_PORT:-8100}:8100"
|
||||
env_file:
|
||||
- .env
|
||||
volumes:
|
||||
- ${ATOCORE_DB_DIR}:${ATOCORE_DB_DIR}
|
||||
- ${ATOCORE_CHROMA_DIR}:${ATOCORE_CHROMA_DIR}
|
||||
- ${ATOCORE_CACHE_DIR}:${ATOCORE_CACHE_DIR}
|
||||
- ${ATOCORE_TMP_DIR}:${ATOCORE_TMP_DIR}
|
||||
- ${ATOCORE_LOG_DIR}:${ATOCORE_LOG_DIR}
|
||||
- ${ATOCORE_BACKUP_DIR}:${ATOCORE_BACKUP_DIR}
|
||||
- ${ATOCORE_RUN_DIR}:${ATOCORE_RUN_DIR}
|
||||
- ${ATOCORE_PROJECT_REGISTRY_DIR}:${ATOCORE_PROJECT_REGISTRY_DIR}
|
||||
- ${ATOCORE_VAULT_SOURCE_DIR}:${ATOCORE_VAULT_SOURCE_DIR}:ro
|
||||
- ${ATOCORE_DRIVE_SOURCE_DIR}:${ATOCORE_DRIVE_SOURCE_DIR}:ro
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-fsS", "http://127.0.0.1:8100/health"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 5
|
||||
start_period: 20s
|
||||
332
docs/architecture/conflict-model.md
Normal file
332
docs/architecture/conflict-model.md
Normal file
@@ -0,0 +1,332 @@
|
||||
# Conflict Model (how AtoCore handles contradictory facts)
|
||||
|
||||
## Why this document exists
|
||||
|
||||
Any system that accumulates facts from multiple sources — interactions,
|
||||
ingested documents, repo history, PKM notes — will eventually see
|
||||
contradictory facts about the same thing. AtoCore's operating model
|
||||
already has the hard rule:
|
||||
|
||||
> **Bad memory is worse than no memory.**
|
||||
|
||||
The practical consequence of that rule is: AtoCore must never
|
||||
silently merge contradictory facts, never silently pick a winner,
|
||||
and never silently discard evidence. Every conflict must be
|
||||
surfaced to a human reviewer with full audit context.
|
||||
|
||||
This document defines what "conflict" means in AtoCore, how
|
||||
conflicts are detected, how they are represented, how they are
|
||||
surfaced, and how they are resolved.
|
||||
|
||||
## What counts as a conflict
|
||||
|
||||
A conflict exists when two or more facts in the system claim
|
||||
incompatible values for the same conceptual slot. More precisely:
|
||||
|
||||
A conflict is a set of two or more **active** rows (across memories,
|
||||
entities, project_state) such that:
|
||||
|
||||
1. They share the same **target identity** — same entity type and
|
||||
same semantic key
|
||||
2. Their **claimed values** are incompatible
|
||||
3. They are all in an **active** status (not superseded, not
|
||||
invalid, not candidate)
|
||||
|
||||
Examples that are conflicts:
|
||||
|
||||
- Two active `Decision` entities affecting the same `Subsystem`
|
||||
with contradictory values for the same decided field (e.g.
|
||||
lateral support material = GF-PTFE vs lateral support material = PEEK)
|
||||
- An active `preference` memory "prefers rebase workflow" and an
|
||||
active `preference` memory "prefers merge-commit workflow"
|
||||
- A `project_state` entry `p05 / decision / lateral_support_material = GF-PTFE`
|
||||
and an active `Decision` entity also claiming the lateral support
|
||||
material is PEEK (cross-layer conflict)
|
||||
|
||||
Examples that are NOT conflicts:
|
||||
|
||||
- Two active memories both saying "prefers small diffs" — same
|
||||
meaning, not contradictory
|
||||
- An active memory saying X and a candidate memory saying Y —
|
||||
candidates are not active, so this is part of the review queue
|
||||
flow, not the conflict flow
|
||||
- A superseded `Decision` saying X and an active `Decision` saying Y
|
||||
— supersession is a resolved history, not a conflict
|
||||
- Two active `Requirement` entities each constraining the same
|
||||
component in different but compatible ways (e.g. one caps mass,
|
||||
one caps heat flux) — different fields, no contradiction
|
||||
|
||||
## Detection triggers
|
||||
|
||||
Conflict detection must fire at every write that could create a new
|
||||
active fact. That means the following hook points:
|
||||
|
||||
1. **`POST /memory` creating an active memory** (legacy path)
|
||||
2. **`POST /memory/{id}/promote`** (candidate → active)
|
||||
3. **`POST /entities` creating an active entity** (future)
|
||||
4. **`POST /entities/{id}/promote`** (candidate → active, future)
|
||||
5. **`POST /project/state`** (curating trusted state directly)
|
||||
6. **`POST /memory/{id}/graduate`** (memory → entity graduation,
|
||||
future — the resulting entity could conflict with something)
|
||||
|
||||
Extraction passes do NOT trigger conflict detection at candidate
|
||||
write time. Candidates are allowed to sit in the queue in an
|
||||
apparently-conflicting state; the reviewer will see them during
|
||||
promotion and decision-detection fires at that moment.
|
||||
|
||||
## Detection strategy per layer
|
||||
|
||||
### Memory layer
|
||||
|
||||
For identity / preference / episodic memories (the ones that stay
|
||||
in the memory layer):
|
||||
|
||||
- Matching key: `(memory_type, project, normalized_content_family)`
|
||||
- `normalized_content_family` is not a hash of the content — that
|
||||
would require exact equality — but a slot identifier extracted
|
||||
by a small per-type rule set:
|
||||
- identity: slot is "role" / "background" / "credentials"
|
||||
- preference: slot is the first content word after "prefers" / "uses" / "likes"
|
||||
normalized to a lowercase noun stem, OR the rule id that extracted it
|
||||
- episodic: no slot — episodic entries are intrinsically tied to
|
||||
a moment in time and rarely conflict
|
||||
|
||||
A conflict is flagged when two active memories share a
|
||||
`(memory_type, project, slot)` but have different content bodies.
|
||||
|
||||
### Entity layer (V1)
|
||||
|
||||
For each V1 entity type, the conflict key is a short tuple that
|
||||
uniquely identifies the "slot" that entity is claiming:
|
||||
|
||||
| Entity type | Conflict slot |
|
||||
|-------------------|-------------------------------------------------------|
|
||||
| Project | `(project_id)` |
|
||||
| Subsystem | `(project_id, subsystem_name)` |
|
||||
| Component | `(project_id, subsystem_name, component_name)` |
|
||||
| Requirement | `(project_id, requirement_key)` |
|
||||
| Constraint | `(project_id, constraint_target, constraint_kind)` |
|
||||
| Decision | `(project_id, decision_target, decision_field)` |
|
||||
| Material | `(project_id, component_id)` |
|
||||
| Parameter | `(project_id, parameter_scope, parameter_name)` |
|
||||
| AnalysisModel | `(project_id, subsystem_id, model_name)` |
|
||||
| Result | `(project_id, analysis_model_id, result_key)` |
|
||||
| ValidationClaim | `(project_id, claim_key)` |
|
||||
| Artifact | no conflict detection — artifacts are additive |
|
||||
|
||||
A conflict is two active entities with the same slot but
|
||||
different structural values. The exact "which fields count as
|
||||
structural" list is per-type and lives in the entity schema doc
|
||||
(not yet written — tracked as future `engineering-ontology-v1.md`
|
||||
updates).
|
||||
|
||||
### Cross-layer (memory vs entity vs trusted project state)
|
||||
|
||||
Trusted project state trumps active entities trumps active
|
||||
memories. This is the trust hierarchy from the operating model.
|
||||
|
||||
Cross-layer conflict detection works by a nightly job that walks
|
||||
the three layers and flags any slot that has entries in more than
|
||||
one layer with incompatible values:
|
||||
|
||||
- If trusted project state and an entity disagree: the entity is
|
||||
flagged; trusted state is assumed correct
|
||||
- If an entity and a memory disagree: the memory is flagged; the
|
||||
entity is assumed correct
|
||||
- If trusted state and a memory disagree: the memory is flagged;
|
||||
trusted state is assumed correct
|
||||
|
||||
In all three cases the lower-trust row gets a `conflicts_with`
|
||||
reference pointing at the higher-trust row but does NOT auto-move
|
||||
to superseded. The flag is an alert, not an action.
|
||||
|
||||
## Representation
|
||||
|
||||
Conflicts are represented as rows in a new `conflicts` table
|
||||
(V1 schema, not yet shipped):
|
||||
|
||||
```sql
|
||||
CREATE TABLE conflicts (
|
||||
id TEXT PRIMARY KEY,
|
||||
detected_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
slot_kind TEXT NOT NULL, -- "memory_slot" or "entity_slot" or "cross_layer"
|
||||
slot_key TEXT NOT NULL, -- JSON-encoded tuple identifying the slot
|
||||
project TEXT DEFAULT '',
|
||||
status TEXT NOT NULL DEFAULT 'open', -- open | resolved | dismissed
|
||||
resolved_at DATETIME,
|
||||
resolution TEXT DEFAULT '', -- free text from the reviewer
|
||||
-- links to conflicting rows live in conflict_members
|
||||
UNIQUE(slot_kind, slot_key, status) -- ensures only one open conflict per slot
|
||||
);
|
||||
|
||||
CREATE TABLE conflict_members (
|
||||
conflict_id TEXT NOT NULL REFERENCES conflicts(id) ON DELETE CASCADE,
|
||||
member_kind TEXT NOT NULL, -- "memory" | "entity" | "project_state"
|
||||
member_id TEXT NOT NULL,
|
||||
member_layer_trust INTEGER NOT NULL,-- 1=memory, 2=entity, 3=project_state
|
||||
PRIMARY KEY (conflict_id, member_kind, member_id)
|
||||
);
|
||||
```
|
||||
|
||||
Constraint rationale:
|
||||
|
||||
- `UNIQUE(slot_kind, slot_key, status)` where status='open' prevents
|
||||
duplicate "conflict already open for this slot" rows. At most one
|
||||
open conflict exists per slot at a time; new conflicting rows are
|
||||
added as members to the existing conflict, not as a new conflict.
|
||||
- `conflict_members.member_layer_trust` is denormalized so the
|
||||
conflict resolution UI can sort conflicting rows by trust tier
|
||||
without re-querying.
|
||||
- `status='dismissed'` exists separately from `resolved` because
|
||||
"the reviewer looked at this and declared it not a real conflict"
|
||||
is a valid distinct outcome (the two rows really do describe
|
||||
different things and the detector was overfitting).
|
||||
|
||||
## API shape
|
||||
|
||||
```
|
||||
GET /conflicts list open conflicts
|
||||
GET /conflicts?status=resolved list resolved conflicts
|
||||
GET /conflicts?project=p05-interferometer scope by project
|
||||
GET /conflicts/{id} full detail including all members
|
||||
|
||||
POST /conflicts/{id}/resolve mark resolved with notes
|
||||
body: {
|
||||
"resolution_notes": "...",
|
||||
"winner_member_id": "...", # optional: if specified,
|
||||
# other members are auto-superseded
|
||||
"action": "supersede_others" # or "no_action" if reviewer
|
||||
# wants to resolve without touching rows
|
||||
}
|
||||
|
||||
POST /conflicts/{id}/dismiss mark dismissed ("not a real conflict")
|
||||
body: {
|
||||
"reason": "..."
|
||||
}
|
||||
```
|
||||
|
||||
Conflict detection must also surface in existing endpoints:
|
||||
|
||||
- `GET /memory/{id}` — response includes a `conflicts` array if
|
||||
the memory is a member of any open conflict
|
||||
- `GET /entities/{type}/{id}` (future) — same
|
||||
- `GET /health` — includes `open_conflicts_count` so the operator
|
||||
sees at a glance that review is pending
|
||||
|
||||
## Supersession as a conflict resolution tool
|
||||
|
||||
When the reviewer resolves a conflict with `action: "supersede_others"`,
|
||||
the winner stays active and every other member is flipped to
|
||||
status="superseded" with a `superseded_by` pointer to the winner.
|
||||
This is the normal path: "we used to think X, now we know Y, flag
|
||||
X as superseded so the audit trail keeps X visible but X no longer
|
||||
influences context".
|
||||
|
||||
The conflict resolution audit record links back to all superseded
|
||||
members, so the conflict history itself is queryable:
|
||||
|
||||
- "Show me every conflict that touched Subsystem X"
|
||||
- "Show me every Decision that superseded another Decision because
|
||||
of a conflict"
|
||||
|
||||
These are entries in the V1 query catalog (see Q-014 decision history).
|
||||
|
||||
## Detection latency
|
||||
|
||||
Conflict detection runs at two latencies:
|
||||
|
||||
1. **Synchronous (at write time)** — every create/promote/update of
|
||||
an active row in a conflict-enabled type runs a synchronous
|
||||
same-layer detector. If a conflict is detected the write still
|
||||
succeeds but a row is inserted into `conflicts` and the API
|
||||
response includes a `conflict_id` field so the caller knows
|
||||
immediately.
|
||||
|
||||
2. **Asynchronous (nightly sweep)** — a scheduled job walks all
|
||||
three layers looking for cross-layer conflicts that slipped
|
||||
past write-time detection (e.g. a memory that was already
|
||||
active before an entity with the same slot was promoted). The
|
||||
sweep also looks for slot overlaps that the synchronous
|
||||
detector can't see because the slot key extraction rules have
|
||||
improved since the row was written.
|
||||
|
||||
Both paths write to the same `conflicts` table and both are
|
||||
surfaced in the same review queue.
|
||||
|
||||
## The "flag, never block" rule
|
||||
|
||||
Detection **never** blocks writes. The operating rule is:
|
||||
|
||||
- If the write is otherwise valid (schema, permissions, trust
|
||||
hierarchy), accept it
|
||||
- Log the conflict
|
||||
- Surface it to the reviewer
|
||||
- Let the system keep functioning with the conflict in place
|
||||
|
||||
The alternative — blocking writes on conflict — would mean that
|
||||
one stale fact could prevent all future writes until manually
|
||||
resolved, which in practice makes the system unusable for normal
|
||||
work. The "flag, never block" rule keeps AtoCore responsive while
|
||||
still making conflicts impossible to ignore (the `/health`
|
||||
endpoint's `open_conflicts_count` makes them loud).
|
||||
|
||||
The one exception: writing to `project_state` (layer 3) when an
|
||||
open conflict already exists on that slot will return a warning
|
||||
in the response body. The write still happens, but the reviewer
|
||||
is explicitly told "you just wrote to a slot that has an open
|
||||
conflict". This is the highest-trust layer so we want extra
|
||||
friction there without actually blocking.
|
||||
|
||||
## Showing conflicts in the Human Mirror
|
||||
|
||||
When the Human Mirror template renders a project overview, any
|
||||
open conflict in that project shows as a **"⚠ disputed"** marker
|
||||
next to the affected field, with a link to the conflict detail.
|
||||
This makes conflicts visible to anyone reading the derived
|
||||
human-facing pages, not just to reviewers who think to check the
|
||||
`/conflicts` endpoint.
|
||||
|
||||
The Human Mirror render rules (not yet written — tracked as future
|
||||
`human-mirror-rules.md`) will specify exactly where and how the
|
||||
disputed marker appears.
|
||||
|
||||
## What this document does NOT solve
|
||||
|
||||
1. **Automatic conflict resolution.** No policy will ever
|
||||
automatically promote one conflict member over another. The
|
||||
trust hierarchy is an *alert ordering* for reviewers, not an
|
||||
auto-resolve rule. The human signs off on every resolution.
|
||||
|
||||
2. **Cross-project conflicts.** If p04 and p06 both have
|
||||
entities claiming conflicting things about a shared component,
|
||||
that is currently out of scope because the V1 slot keys all
|
||||
include `project_id`. Cross-project conflict detection is a
|
||||
future concern that needs its own slot key strategy.
|
||||
|
||||
3. **Temporal conflicts with partial overlap.** If a fact was
|
||||
true during a time window and another fact is true in a
|
||||
different time window, that is not a conflict — it's history.
|
||||
Representing time-bounded facts is deferred to a future
|
||||
temporal-entities doc.
|
||||
|
||||
4. **Probabilistic "soft" conflicts.** If two entities claim the
|
||||
same slot with slightly different values (e.g. "4.8 kg" vs
|
||||
"4.82 kg"), is that a conflict? For V1, yes — the string
|
||||
values are unequal so they're flagged. Tolerance-aware
|
||||
numeric comparisons are a V2 concern.
|
||||
|
||||
## TL;DR
|
||||
|
||||
- Conflicts = two or more active rows claiming the same slot with
|
||||
incompatible values
|
||||
- Detection fires on every active write AND in a nightly sweep
|
||||
- Conflicts are stored in a dedicated `conflicts` table with a
|
||||
`conflict_members` join
|
||||
- Resolution is always human (promote-winner / supersede-others
|
||||
/ dismiss-as-not-a-conflict)
|
||||
- "Flag, never block" — writes always succeed, conflicts are
|
||||
surfaced via `/conflicts`, `/health`, per-entity responses, and
|
||||
the Human Mirror
|
||||
- Trusted project state is the top of the trust hierarchy and is
|
||||
assumed correct in any cross-layer conflict until the reviewer
|
||||
says otherwise
|
||||
205
docs/architecture/engineering-knowledge-hybrid-architecture.md
Normal file
205
docs/architecture/engineering-knowledge-hybrid-architecture.md
Normal file
@@ -0,0 +1,205 @@
|
||||
# Engineering Knowledge Hybrid Architecture
|
||||
|
||||
## Purpose
|
||||
|
||||
This note defines how **AtoCore** can evolve into the machine foundation for a **living engineering project knowledge system** while remaining aligned with core AtoCore philosophy.
|
||||
|
||||
AtoCore remains:
|
||||
- the trust engine
|
||||
- the memory/context engine
|
||||
- the retrieval/context assembly layer
|
||||
- the runtime-facing augmentation layer
|
||||
|
||||
It does **not** become a generic wiki app or a PLM clone.
|
||||
|
||||
## Core Architectural Thesis
|
||||
|
||||
AtoCore should act as the **machine truth / context / memory substrate** for project knowledge systems.
|
||||
|
||||
That substrate can then support:
|
||||
- engineering knowledge accumulation
|
||||
- human-readable mirrors
|
||||
- OpenClaw augmentation
|
||||
- future engineering copilots
|
||||
- project traceability across design, analysis, manufacturing, and operations
|
||||
|
||||
## Layer Model
|
||||
|
||||
### Layer 0 — Raw Artifact Layer
|
||||
Examples:
|
||||
- CAD exports
|
||||
- FEM exports
|
||||
- videos / transcripts
|
||||
- screenshots
|
||||
- PDFs
|
||||
- source code
|
||||
- spreadsheets
|
||||
- reports
|
||||
- test data
|
||||
|
||||
### Layer 1 — AtoCore Core Machine Layer
|
||||
Canonical machine substrate.
|
||||
|
||||
Contains:
|
||||
- source registry
|
||||
- source chunks
|
||||
- embeddings / vector retrieval
|
||||
- structured memory
|
||||
- trusted project state
|
||||
- entity and relationship stores
|
||||
- provenance and confidence metadata
|
||||
- interactions / retrieval logs / context packs
|
||||
|
||||
### Layer 2 — Engineering Knowledge Layer
|
||||
Domain-specific project model built on top of AtoCore.
|
||||
|
||||
Represents typed engineering objects such as:
|
||||
- Project
|
||||
- System
|
||||
- Subsystem
|
||||
- Component
|
||||
- Interface
|
||||
- Requirement
|
||||
- Constraint
|
||||
- Assumption
|
||||
- Decision
|
||||
- Material
|
||||
- Parameter
|
||||
- Equation
|
||||
- Analysis Model
|
||||
- Result
|
||||
- Validation Claim
|
||||
- Manufacturing Process
|
||||
- Test
|
||||
- Software Module
|
||||
- Vendor
|
||||
- Artifact
|
||||
|
||||
### Layer 3 — Human Mirror
|
||||
Derived human-readable support surface.
|
||||
|
||||
Examples:
|
||||
- project overview
|
||||
- current state
|
||||
- subsystem pages
|
||||
- component pages
|
||||
- decision log
|
||||
- validation summary
|
||||
- timeline
|
||||
- open questions / risks
|
||||
|
||||
This layer is **derived** from structured state and approved synthesis. It is not canonical machine truth.
|
||||
|
||||
### Layer 4 — Runtime / Clients
|
||||
Consumers such as:
|
||||
- OpenClaw
|
||||
- CLI tools
|
||||
- dashboards
|
||||
- future IDE integrations
|
||||
- engineering copilots
|
||||
- reporting systems
|
||||
- Atomizer / optimization tooling
|
||||
|
||||
## Non-Negotiable Rule
|
||||
|
||||
**Human-readable pages are support artifacts. They are not the primary machine truth layer.**
|
||||
|
||||
Runtime trust order should remain:
|
||||
1. trusted current project state
|
||||
2. validated structured records
|
||||
3. selected reviewed synthesis
|
||||
4. retrieved source evidence
|
||||
5. historical / low-confidence material
|
||||
|
||||
## Responsibilities
|
||||
|
||||
### AtoCore core owns
|
||||
- memory CRUD
|
||||
- trusted project state CRUD
|
||||
- retrieval orchestration
|
||||
- context assembly
|
||||
- provenance
|
||||
- confidence / status
|
||||
- conflict flags
|
||||
- runtime APIs
|
||||
|
||||
### Engineering Knowledge Layer owns
|
||||
- engineering object taxonomy
|
||||
- engineering relationships
|
||||
- domain adapters
|
||||
- project-specific interpretation logic
|
||||
- design / analysis / manufacturing / operations linkage
|
||||
|
||||
### Human Mirror owns
|
||||
- readability
|
||||
- navigation
|
||||
- overview pages
|
||||
- subsystem summaries
|
||||
- decision digests
|
||||
- human inspection / audit comfort
|
||||
|
||||
## Update Model
|
||||
|
||||
New artifacts should not directly overwrite trusted state.
|
||||
|
||||
Recommended update flow:
|
||||
1. ingest source
|
||||
2. parse / chunk / register artifact
|
||||
3. extract candidate objects / claims / relationships
|
||||
4. compare against current trusted state
|
||||
5. flag conflicts or supersessions
|
||||
6. promote updates only under explicit rules
|
||||
7. regenerate affected human-readable pages
|
||||
8. log history and provenance
|
||||
|
||||
## Integration with Existing Knowledge Base System
|
||||
|
||||
The existing engineering Knowledge Base project can be treated as the first major domain adapter.
|
||||
|
||||
Bridge targets include:
|
||||
- KB-CAD component and architecture pages
|
||||
- KB-FEM models / results / validation pages
|
||||
- generation history
|
||||
- images / transcripts / session captures
|
||||
|
||||
AtoCore should absorb the structured value of that system, not replace it with plain retrieval.
|
||||
|
||||
## Suggested First Implementation Scope
|
||||
|
||||
1. stabilize current AtoCore core behavior
|
||||
2. define engineering ontology v1
|
||||
3. add minimal entity / relationship support
|
||||
4. create a Knowledge Base bridge for existing project structures
|
||||
5. generate Human Mirror v1 pages:
|
||||
- overview
|
||||
- current state
|
||||
- decision log
|
||||
- subsystem summary
|
||||
6. add engineering-aware context assembly for OpenClaw
|
||||
|
||||
## Why This Is Aligned With AtoCore Philosophy
|
||||
|
||||
This architecture preserves the original core ideas:
|
||||
- owned memory layer
|
||||
- owned context assembly
|
||||
- machine-human separation
|
||||
- provenance and trust clarity
|
||||
- portability across runtimes
|
||||
- robustness before sophistication
|
||||
|
||||
## Long-Range Outcome
|
||||
|
||||
AtoCore can become the substrate for a **knowledge twin** of an engineering project:
|
||||
- structure
|
||||
- intent
|
||||
- rationale
|
||||
- validation
|
||||
- manufacturing impact
|
||||
- operational behavior
|
||||
- change history
|
||||
- evidence traceability
|
||||
|
||||
That is significantly more powerful than either:
|
||||
- a generic wiki
|
||||
- plain document RAG
|
||||
- an assistant with only chat memory
|
||||
250
docs/architecture/engineering-ontology-v1.md
Normal file
250
docs/architecture/engineering-ontology-v1.md
Normal file
@@ -0,0 +1,250 @@
|
||||
# Engineering Ontology V1
|
||||
|
||||
## Purpose
|
||||
|
||||
Define the first practical engineering ontology that can sit on top of AtoCore and represent a real engineering project as structured knowledge.
|
||||
|
||||
This ontology is intended to be:
|
||||
- useful to machines
|
||||
- inspectable by humans through derived views
|
||||
- aligned with AtoCore trust / provenance rules
|
||||
- expandable across mechanical, FEM, electrical, software, manufacturing, and operations
|
||||
|
||||
## Goal
|
||||
|
||||
Represent a project as a **system of objects and relationships**, not as a pile of notes.
|
||||
|
||||
The ontology should support queries such as:
|
||||
- what is this subsystem?
|
||||
- what requirements does this component satisfy?
|
||||
- what result validates this claim?
|
||||
- what changed recently?
|
||||
- what interfaces are affected by a design change?
|
||||
- what is active vs superseded?
|
||||
|
||||
## Object Families
|
||||
|
||||
### Project structure
|
||||
- Project
|
||||
- System
|
||||
- Subsystem
|
||||
- Assembly
|
||||
- Component
|
||||
- Interface
|
||||
|
||||
### Intent / design logic
|
||||
- Requirement
|
||||
- Constraint
|
||||
- Assumption
|
||||
- Decision
|
||||
- Rationale
|
||||
- Risk
|
||||
- Issue
|
||||
- Open Question
|
||||
- Change Request
|
||||
|
||||
### Physical / technical definition
|
||||
- Material
|
||||
- Parameter
|
||||
- Equation
|
||||
- Configuration
|
||||
- Geometry Artifact
|
||||
- CAD Artifact
|
||||
- Tolerance
|
||||
- Operating Mode
|
||||
|
||||
### Analysis / validation
|
||||
- Analysis Model
|
||||
- Load Case
|
||||
- Boundary Condition
|
||||
- Solver Setup
|
||||
- Result
|
||||
- Validation Claim
|
||||
- Test
|
||||
- Correlation Record
|
||||
|
||||
### Manufacturing / delivery
|
||||
- Manufacturing Process
|
||||
- Vendor
|
||||
- BOM Item
|
||||
- Part Number
|
||||
- Assembly Procedure
|
||||
- Inspection Step
|
||||
- Cost Driver
|
||||
|
||||
### Software / controls / electrical
|
||||
- Software Module
|
||||
- Control Function
|
||||
- State Machine
|
||||
- Signal
|
||||
- Sensor
|
||||
- Actuator
|
||||
- Electrical Interface
|
||||
- Firmware Artifact
|
||||
|
||||
### Evidence / provenance
|
||||
- Source Document
|
||||
- Transcript Segment
|
||||
- Image / Screenshot
|
||||
- Session
|
||||
- Report
|
||||
- External Reference
|
||||
- Generated Summary
|
||||
|
||||
## Minimum Viable V1 Scope
|
||||
|
||||
Initial implementation should start with:
|
||||
- Project
|
||||
- Subsystem
|
||||
- Component
|
||||
- Requirement
|
||||
- Constraint
|
||||
- Decision
|
||||
- Material
|
||||
- Parameter
|
||||
- Analysis Model
|
||||
- Result
|
||||
- Validation Claim
|
||||
- Artifact
|
||||
|
||||
This is enough to represent meaningful project state without trying to model everything immediately.
|
||||
|
||||
## Core Relationship Types
|
||||
|
||||
### Structural
|
||||
- `CONTAINS`
|
||||
- `PART_OF`
|
||||
- `INTERFACES_WITH`
|
||||
|
||||
### Intent / logic
|
||||
- `SATISFIES`
|
||||
- `CONSTRAINED_BY`
|
||||
- `BASED_ON_ASSUMPTION`
|
||||
- `AFFECTED_BY_DECISION`
|
||||
- `SUPERSEDES`
|
||||
|
||||
### Validation
|
||||
- `ANALYZED_BY`
|
||||
- `VALIDATED_BY`
|
||||
- `SUPPORTS`
|
||||
- `CONFLICTS_WITH`
|
||||
- `DEPENDS_ON`
|
||||
|
||||
### Artifact / provenance
|
||||
- `DESCRIBED_BY`
|
||||
- `UPDATED_BY_SESSION`
|
||||
- `EVIDENCED_BY`
|
||||
- `SUMMARIZED_IN`
|
||||
|
||||
## Example Statements
|
||||
|
||||
- `Subsystem:Lateral Support CONTAINS Component:Pivot Pin`
|
||||
- `Component:Pivot Pin SATISFIES Requirement:low lateral friction`
|
||||
- `Subsystem:Lateral Support AFFECTED_BY_DECISION Decision:Use GF-PTFE pad`
|
||||
- `Subsystem:Reference Frame ANALYZED_BY AnalysisModel:M1 static model`
|
||||
- `Result:deflection case 03 SUPPORTS ValidationClaim:vertical stiffness acceptable`
|
||||
- `Component:Reference Frame DESCRIBED_BY Artifact:NX assembly`
|
||||
- `Component:Vertical Support UPDATED_BY_SESSION Session:gen-004`
|
||||
|
||||
## Shared Required Fields
|
||||
|
||||
Every major object should support fields equivalent to:
|
||||
- `id`
|
||||
- `type`
|
||||
- `name`
|
||||
- `project_id`
|
||||
- `status`
|
||||
- `confidence`
|
||||
- `source_refs`
|
||||
- `created_at`
|
||||
- `updated_at`
|
||||
- `notes` (optional)
|
||||
|
||||
## Suggested Status Lifecycle
|
||||
|
||||
For objects and claims:
|
||||
- `candidate`
|
||||
- `active`
|
||||
- `superseded`
|
||||
- `invalid`
|
||||
- `needs_review`
|
||||
|
||||
## Trust Rules
|
||||
|
||||
1. An object may exist before it becomes trusted.
|
||||
2. A generated markdown summary is not canonical truth by default.
|
||||
3. If evidence conflicts, prefer:
|
||||
1. trusted current project state
|
||||
2. validated structured records
|
||||
3. reviewed derived synthesis
|
||||
4. raw evidence
|
||||
5. historical notes
|
||||
4. Conflicts should be surfaced, not silently blended.
|
||||
|
||||
## Mapping to the Existing Knowledge Base System
|
||||
|
||||
### KB-CAD can map to
|
||||
- System
|
||||
- Subsystem
|
||||
- Component
|
||||
- Material
|
||||
- Decision
|
||||
- Constraint
|
||||
- Artifact
|
||||
|
||||
### KB-FEM can map to
|
||||
- Analysis Model
|
||||
- Load Case
|
||||
- Boundary Condition
|
||||
- Result
|
||||
- Validation Claim
|
||||
- Correlation Record
|
||||
|
||||
### Session generations can map to
|
||||
- Session
|
||||
- Generated Summary
|
||||
- object update history
|
||||
- provenance events
|
||||
|
||||
## Human Mirror Possibilities
|
||||
|
||||
Once the ontology exists, AtoCore can generate pages such as:
|
||||
- project overview
|
||||
- subsystem page
|
||||
- component page
|
||||
- decision log
|
||||
- validation summary
|
||||
- requirement trace page
|
||||
|
||||
These should remain **derived representations** of structured state.
|
||||
|
||||
## Recommended V1 Deliverables
|
||||
|
||||
1. minimal typed object registry
|
||||
2. minimal typed relationship registry
|
||||
3. evidence-linking support
|
||||
4. practical query support for:
|
||||
- component summary
|
||||
- subsystem current state
|
||||
- requirement coverage
|
||||
- result-to-claim mapping
|
||||
- decision history
|
||||
|
||||
## What Not To Do In V1
|
||||
|
||||
- do not model every engineering concept immediately
|
||||
- do not build a giant graph with no practical queries
|
||||
- do not collapse structured objects back into only markdown
|
||||
- do not let generated prose outrank structured truth
|
||||
- do not auto-promote trusted state too aggressively
|
||||
|
||||
## Summary
|
||||
|
||||
Ontology V1 should be:
|
||||
- small enough to implement
|
||||
- rich enough to be useful
|
||||
- aligned with AtoCore trust philosophy
|
||||
- capable of absorbing the existing engineering Knowledge Base work
|
||||
|
||||
The first goal is not to model everything.
|
||||
The first goal is to represent enough of a real project that AtoCore can reason over structure, not just notes.
|
||||
380
docs/architecture/engineering-query-catalog.md
Normal file
380
docs/architecture/engineering-query-catalog.md
Normal file
@@ -0,0 +1,380 @@
|
||||
# Engineering Query Catalog (V1 driving target)
|
||||
|
||||
## Purpose
|
||||
|
||||
This document is the **single most important driver** of the engineering
|
||||
layer V1 design. The ontology, the schema, the relationship types, and
|
||||
the human mirror templates should all be designed *to answer the queries
|
||||
in this catalog*. Anything in the ontology that does not serve at least
|
||||
one of these queries is overdesign for V1.
|
||||
|
||||
The rule is:
|
||||
|
||||
> If we cannot describe what question a typed object or relationship
|
||||
> lets us answer, that object or relationship is not in V1.
|
||||
|
||||
The catalog is also the **acceptance test** for the engineering layer.
|
||||
"V1 is done" means: AtoCore can answer at least the V1-required queries
|
||||
in this list against the active project set (`p04-gigabit`,
|
||||
`p05-interferometer`, `p06-polisher`).
|
||||
|
||||
## Structure of each entry
|
||||
|
||||
Each query is documented as:
|
||||
|
||||
- **id**: stable identifier (`Q-001`, `Q-002`, ...)
|
||||
- **question**: the natural-language question a human or LLM would ask
|
||||
- **example invocation**: how a client would call AtoCore to ask it
|
||||
- **expected result shape**: the structure of the answer (not real data)
|
||||
- **objects required**: which engineering objects must exist
|
||||
- **relationships required**: which relationships must exist
|
||||
- **provenance requirement**: what evidence must be linkable
|
||||
- **tier**: `v1-required` | `v1-stretch` | `v2`
|
||||
|
||||
## Tiering
|
||||
|
||||
- **v1-required** queries are the floor. The engineering layer cannot
|
||||
ship without all of them working.
|
||||
- **v1-stretch** queries should be doable with V1 objects but may need
|
||||
additional adapters.
|
||||
- **v2** queries are aspirational; they belong to a later wave of
|
||||
ontology work and are listed here only to make sure V1 does not
|
||||
paint us into a corner.
|
||||
|
||||
## V1 minimum object set (recap)
|
||||
|
||||
For reference, the V1 ontology includes:
|
||||
|
||||
- Project, Subsystem, Component
|
||||
- Requirement, Constraint, Decision
|
||||
- Material, Parameter
|
||||
- AnalysisModel, Result, ValidationClaim
|
||||
- Artifact
|
||||
|
||||
And the four relationship families:
|
||||
|
||||
- Structural: `CONTAINS`, `PART_OF`, `INTERFACES_WITH`
|
||||
- Intent: `SATISFIES`, `CONSTRAINED_BY`, `BASED_ON_ASSUMPTION`,
|
||||
`AFFECTED_BY_DECISION`, `SUPERSEDES`
|
||||
- Validation: `ANALYZED_BY`, `VALIDATED_BY`, `SUPPORTS`,
|
||||
`CONFLICTS_WITH`, `DEPENDS_ON`
|
||||
- Provenance: `DESCRIBED_BY`, `UPDATED_BY_SESSION`, `EVIDENCED_BY`,
|
||||
`SUMMARIZED_IN`
|
||||
|
||||
Every query below is annotated with which of these it depends on, so
|
||||
that the V1 implementation order is unambiguous.
|
||||
|
||||
---
|
||||
|
||||
## Tier 1: Structure queries
|
||||
|
||||
### Q-001 — What does this subsystem contain?
|
||||
- **question**: "What components and child subsystems make up
|
||||
Subsystem `<name>`?"
|
||||
- **invocation**: `GET /entities/Subsystem/<id>?expand=contains`
|
||||
- **expected**: `{ subsystem, contains: [{ id, type, name, status }] }`
|
||||
- **objects**: Subsystem, Component
|
||||
- **relationships**: `CONTAINS`
|
||||
- **provenance**: each child must link back to at least one Artifact or
|
||||
source chunk via `DESCRIBED_BY` / `EVIDENCED_BY`
|
||||
- **tier**: v1-required
|
||||
|
||||
### Q-002 — What is this component a part of?
|
||||
- **question**: "Which subsystem(s) does Component `<name>` belong to?"
|
||||
- **invocation**: `GET /entities/Component/<id>?expand=parents`
|
||||
- **expected**: `{ component, part_of: [{ id, type, name, status }] }`
|
||||
- **objects**: Component, Subsystem
|
||||
- **relationships**: `PART_OF` (inverse of `CONTAINS`)
|
||||
- **provenance**: same as Q-001
|
||||
- **tier**: v1-required
|
||||
|
||||
### Q-003 — What interfaces does this subsystem have, and to what?
|
||||
- **question**: "What does Subsystem `<name>` interface with, and on
|
||||
which interfaces?"
|
||||
- **invocation**: `GET /entities/Subsystem/<id>/interfaces`
|
||||
- **expected**: `[{ interface_id, peer: { id, type, name }, role }]`
|
||||
- **objects**: Subsystem (Interface object deferred to v2)
|
||||
- **relationships**: `INTERFACES_WITH`
|
||||
- **tier**: v1-required (with simplified Interface = string label;
|
||||
full Interface object becomes v2)
|
||||
|
||||
### Q-004 — What is the system map for this project right now?
|
||||
- **question**: "Give me the current structural tree of Project `<id>`."
|
||||
- **invocation**: `GET /projects/<id>/system-map`
|
||||
- **expected**: nested tree of `{ id, type, name, status, children: [] }`
|
||||
- **objects**: Project, Subsystem, Component
|
||||
- **relationships**: `CONTAINS`, `PART_OF`
|
||||
- **tier**: v1-required
|
||||
|
||||
---
|
||||
|
||||
## Tier 2: Intent queries
|
||||
|
||||
### Q-005 — Which requirements does this component satisfy?
|
||||
- **question**: "Which Requirements does Component `<name>` satisfy
|
||||
today?"
|
||||
- **invocation**: `GET /entities/Component/<id>?expand=satisfies`
|
||||
- **expected**: `[{ requirement_id, name, status, confidence }]`
|
||||
- **objects**: Component, Requirement
|
||||
- **relationships**: `SATISFIES`
|
||||
- **provenance**: each `SATISFIES` edge must link to a Result or
|
||||
ValidationClaim that supports the satisfaction (or be flagged as
|
||||
`unverified`)
|
||||
- **tier**: v1-required
|
||||
|
||||
### Q-006 — Which requirements are not satisfied by anything?
|
||||
- **question**: "Show me orphan Requirements in Project `<id>` —
|
||||
requirements with no `SATISFIES` edge from any Component."
|
||||
- **invocation**: `GET /projects/<id>/requirements?coverage=orphan`
|
||||
- **expected**: `[{ requirement_id, name, status, last_updated }]`
|
||||
- **objects**: Project, Requirement, Component
|
||||
- **relationships**: absence of `SATISFIES`
|
||||
- **tier**: v1-required (this is the killer correctness query — it's
|
||||
the engineering equivalent of "untested code")
|
||||
|
||||
### Q-007 — What constrains this component?
|
||||
- **question**: "What Constraints apply to Component `<name>`?"
|
||||
- **invocation**: `GET /entities/Component/<id>?expand=constraints`
|
||||
- **expected**: `[{ constraint_id, name, value, source_decision_id? }]`
|
||||
- **objects**: Component, Constraint
|
||||
- **relationships**: `CONSTRAINED_BY`
|
||||
- **tier**: v1-required
|
||||
|
||||
### Q-008 — Which decisions affect this subsystem or component?
|
||||
- **question**: "Show me every Decision that affects `<entity>`."
|
||||
- **invocation**: `GET /entities/<type>/<id>?expand=decisions`
|
||||
- **expected**: `[{ decision_id, name, status, made_at, supersedes? }]`
|
||||
- **objects**: Decision, plus the affected entity
|
||||
- **relationships**: `AFFECTED_BY_DECISION`, `SUPERSEDES`
|
||||
- **tier**: v1-required
|
||||
|
||||
### Q-009 — Which decisions are based on assumptions that are now flagged?
|
||||
- **question**: "Are any active Decisions in Project `<id>` based on an
|
||||
Assumption that has been marked invalid or needs_review?"
|
||||
- **invocation**: `GET /projects/<id>/decisions?assumption_status=needs_review,invalid`
|
||||
- **expected**: `[{ decision_id, assumption_id, assumption_status }]`
|
||||
- **objects**: Decision, Assumption
|
||||
- **relationships**: `BASED_ON_ASSUMPTION`
|
||||
- **tier**: v1-required (this is the second killer correctness query —
|
||||
catches fragile design)
|
||||
|
||||
---
|
||||
|
||||
## Tier 3: Validation queries
|
||||
|
||||
### Q-010 — What result validates this claim?
|
||||
- **question**: "Show me the Result(s) supporting ValidationClaim
|
||||
`<name>`."
|
||||
- **invocation**: `GET /entities/ValidationClaim/<id>?expand=supports`
|
||||
- **expected**: `[{ result_id, analysis_model_id, summary, confidence }]`
|
||||
- **objects**: ValidationClaim, Result, AnalysisModel
|
||||
- **relationships**: `SUPPORTS`, `ANALYZED_BY`
|
||||
- **provenance**: every Result must link to its AnalysisModel and an
|
||||
Artifact via `DESCRIBED_BY`
|
||||
- **tier**: v1-required
|
||||
|
||||
### Q-011 — Are there any active validation claims with no supporting result?
|
||||
- **question**: "Which active ValidationClaims in Project `<id>` have
|
||||
no `SUPPORTS` edge from any Result?"
|
||||
- **invocation**: `GET /projects/<id>/validation?coverage=unsupported`
|
||||
- **expected**: `[{ claim_id, name, status, last_updated }]`
|
||||
- **objects**: ValidationClaim, Result
|
||||
- **relationships**: absence of `SUPPORTS`
|
||||
- **tier**: v1-required (third killer correctness query — catches
|
||||
claims that are not yet evidenced)
|
||||
|
||||
### Q-012 — Are there conflicting results for the same claim?
|
||||
- **question**: "Show me ValidationClaims where multiple Results
|
||||
disagree (one `SUPPORTS`, another `CONFLICTS_WITH`)."
|
||||
- **invocation**: `GET /projects/<id>/validation?coverage=conflict`
|
||||
- **expected**: `[{ claim_id, supporting_results, conflicting_results }]`
|
||||
- **objects**: ValidationClaim, Result
|
||||
- **relationships**: `SUPPORTS`, `CONFLICTS_WITH`
|
||||
- **tier**: v1-required
|
||||
|
||||
---
|
||||
|
||||
## Tier 4: Change / time queries
|
||||
|
||||
### Q-013 — What changed in this project recently?
|
||||
- **question**: "List entities in Project `<id>` whose `updated_at`
|
||||
is within the last `<window>`."
|
||||
- **invocation**: `GET /projects/<id>/changes?since=<iso>`
|
||||
- **expected**: `[{ id, type, name, status, updated_at, change_kind }]`
|
||||
- **objects**: any
|
||||
- **relationships**: any
|
||||
- **tier**: v1-required
|
||||
|
||||
### Q-014 — What is the decision history for this subsystem?
|
||||
- **question**: "Show me all Decisions affecting Subsystem `<id>` in
|
||||
chronological order, including superseded ones."
|
||||
- **invocation**: `GET /entities/Subsystem/<id>/decision-log`
|
||||
- **expected**: ordered list with supersession chain
|
||||
- **objects**: Decision, Subsystem
|
||||
- **relationships**: `AFFECTED_BY_DECISION`, `SUPERSEDES`
|
||||
- **tier**: v1-required (this is what a human-readable decision log
|
||||
is generated from)
|
||||
|
||||
### Q-015 — What was the trusted state of this entity at time T?
|
||||
- **question**: "Reconstruct the active fields of `<entity>` as of
|
||||
timestamp `<T>`."
|
||||
- **invocation**: `GET /entities/<type>/<id>?as_of=<iso>`
|
||||
- **expected**: the entity record as it would have been seen at T
|
||||
- **objects**: any
|
||||
- **relationships**: status lifecycle
|
||||
- **tier**: v1-stretch (requires status history table — defer if
|
||||
baseline implementation runs long)
|
||||
|
||||
---
|
||||
|
||||
## Tier 5: Cross-cutting queries
|
||||
|
||||
### Q-016 — Which interfaces are affected by changing this component?
|
||||
- **question**: "If Component `<name>` changes, which Interfaces and
|
||||
which peer subsystems are impacted?"
|
||||
- **invocation**: `GET /entities/Component/<id>/impact`
|
||||
- **expected**: `[{ interface_id, peer_id, peer_type, peer_name }]`
|
||||
- **objects**: Component, Subsystem
|
||||
- **relationships**: `PART_OF`, `INTERFACES_WITH`
|
||||
- **tier**: v1-required (this is the change-impact-analysis query the
|
||||
whole engineering layer exists for)
|
||||
|
||||
### Q-017 — What evidence supports this fact?
|
||||
- **question**: "Give me the source documents and chunks that support
|
||||
the current value of `<entity>.<field>`."
|
||||
- **invocation**: `GET /entities/<type>/<id>/evidence?field=<field>`
|
||||
- **expected**: `[{ source_file, chunk_id, heading_path, score }]`
|
||||
- **objects**: any
|
||||
- **relationships**: `EVIDENCED_BY`, `DESCRIBED_BY`
|
||||
- **tier**: v1-required (without this the engineering layer cannot
|
||||
pass the AtoCore "trust + provenance" rule)
|
||||
|
||||
### Q-018 — What is active vs superseded for this concept?
|
||||
- **question**: "Show me the current active record for `<key>` plus
|
||||
the chain of superseded versions."
|
||||
- **invocation**: `GET /entities/<type>/<id>?include=superseded`
|
||||
- **expected**: `{ active, superseded_chain: [...] }`
|
||||
- **objects**: any
|
||||
- **relationships**: `SUPERSEDES`
|
||||
- **tier**: v1-required
|
||||
|
||||
### Q-019 — Which components depend on this material?
|
||||
- **question**: "List every Component whose Material is `<material>`."
|
||||
- **invocation**: `GET /entities/Material/<id>/components`
|
||||
- **expected**: `[{ component_id, name, subsystem_id }]`
|
||||
- **objects**: Component, Material
|
||||
- **relationships**: derived from Component.material field, no edge
|
||||
needed
|
||||
- **tier**: v1-required
|
||||
|
||||
### Q-020 — What does this project look like as a project overview?
|
||||
- **question**: "Generate the human-readable Project Overview for
|
||||
Project `<id>` from current trusted state."
|
||||
- **invocation**: `GET /projects/<id>/mirror/overview`
|
||||
- **expected**: formatted markdown derived from active entities
|
||||
- **objects**: Project, Subsystem, Component, Decision, Requirement,
|
||||
ValidationClaim
|
||||
- **relationships**: structural + intent
|
||||
- **tier**: v1-required (this is the Layer 3 Human Mirror entry
|
||||
point — the moment the engineering layer becomes useful to humans
|
||||
who do not want to call APIs)
|
||||
|
||||
---
|
||||
|
||||
## v1-stretch (nice to have)
|
||||
|
||||
### Q-021 — Which parameters drive this analysis result?
|
||||
- **objects**: AnalysisModel, Parameter, Result
|
||||
- **relationships**: `ANALYZED_BY`, plus a new `DRIVEN_BY` edge
|
||||
|
||||
### Q-022 — Which decisions cite which prior decisions?
|
||||
- **objects**: Decision
|
||||
- **relationships**: `BASED_ON_DECISION` (new)
|
||||
|
||||
### Q-023 — Cross-project comparison
|
||||
- **question**: "Are any Materials shared between p04, p05, and p06,
|
||||
and are their Constraints consistent?"
|
||||
- **objects**: Project, Material, Constraint
|
||||
|
||||
---
|
||||
|
||||
## v2 (deferred)
|
||||
|
||||
### Q-024 — Cost rollup
|
||||
- requires BOM Item, Cost Driver, Vendor — out of V1 scope
|
||||
|
||||
### Q-025 — Manufacturing readiness
|
||||
- requires Manufacturing Process, Inspection Step, Assembly Procedure
|
||||
— out of V1 scope
|
||||
|
||||
### Q-026 — Software / control state
|
||||
- requires Software Module, State Machine, Sensor, Actuator — out
|
||||
of V1 scope
|
||||
|
||||
### Q-027 — Test correlation across analyses
|
||||
- requires Test, Correlation Record — out of V1 scope
|
||||
|
||||
---
|
||||
|
||||
## What this catalog implies for V1 implementation order
|
||||
|
||||
The 19 v1-required queries above tell us what to build first, in
|
||||
roughly this order:
|
||||
|
||||
1. **Structural** (Q-001 to Q-004): need Project, Subsystem, Component
|
||||
and `CONTAINS` / `PART_OF` / `INTERFACES_WITH` (with Interface as a
|
||||
simple string label, not its own entity).
|
||||
2. **Intent core** (Q-005 to Q-008): need Requirement, Constraint,
|
||||
Decision and `SATISFIES` / `CONSTRAINED_BY` / `AFFECTED_BY_DECISION`.
|
||||
3. **Killer correctness queries** (Q-006, Q-009, Q-011): need the
|
||||
absence-of-edge query patterns and the Assumption object.
|
||||
4. **Validation** (Q-010 to Q-012): need AnalysisModel, Result,
|
||||
ValidationClaim and `SUPPORTS` / `ANALYZED_BY` / `CONFLICTS_WITH`.
|
||||
5. **Change/time** (Q-013, Q-014): need a write log per entity (the
|
||||
existing `updated_at` plus a status history if Q-015 is in scope).
|
||||
6. **Cross-cutting** (Q-016 to Q-019): impact analysis is mostly a
|
||||
graph traversal once the structural and intent edges exist.
|
||||
7. **Provenance** (Q-017): the entity store must always link to
|
||||
chunks/artifacts via `EVIDENCED_BY` / `DESCRIBED_BY`. This is
|
||||
non-negotiable and should be enforced at insert time, not later.
|
||||
8. **Human Mirror** (Q-020): the markdown generator is the *last*
|
||||
thing built, not the first. It is derived from everything above.
|
||||
|
||||
## What is intentionally left out of V1
|
||||
|
||||
- BOM, manufacturing, vendor, cost objects (entire family deferred)
|
||||
- Software, control, electrical objects (entire family deferred)
|
||||
- Test correlation objects (entire family deferred)
|
||||
- Full Interface as its own entity (string label is enough for V1)
|
||||
- Time-travel queries beyond `since=<iso>` (Q-015 is stretch)
|
||||
- Multi-project rollups (Q-023 is stretch)
|
||||
|
||||
## Open questions this catalog raises
|
||||
|
||||
These are the design questions that need to be answered in the next
|
||||
planning docs (memory-vs-entities, conflict-model, promotion-rules):
|
||||
|
||||
- **Q-006, Q-011 (orphan / unsupported queries)**: do orphans get
|
||||
flagged at insert time, computed at query time, or both?
|
||||
- **Q-009 (assumption-driven decisions)**: when an Assumption flips
|
||||
to `needs_review`, are all dependent Decisions auto-flagged or do
|
||||
they only show up when this query is run?
|
||||
- **Q-012 (conflicting results)**: does AtoCore *block* a conflict
|
||||
from being saved, or always save and flag? (The trust rule says
|
||||
flag, never block — but the implementation needs the explicit nod.)
|
||||
- **Q-017 (evidence)**: is `EVIDENCED_BY` mandatory at insert? If yes,
|
||||
how do we backfill entities extracted from older interactions where
|
||||
the source link is fuzzy?
|
||||
- **Q-020 (Project Overview mirror)**: when does it regenerate?
|
||||
On every entity write? On a schedule? On demand?
|
||||
|
||||
These are the questions the next architecture docs in the planning
|
||||
sprint should resolve before any code is written.
|
||||
|
||||
## Working rule
|
||||
|
||||
> If a v1-required query in this catalog cannot be answered against
|
||||
> at least one of `p04-gigabit`, `p05-interferometer`, or
|
||||
> `p06-polisher`, the engineering layer is not done.
|
||||
|
||||
This catalog is the contract.
|
||||
309
docs/architecture/memory-vs-entities.md
Normal file
309
docs/architecture/memory-vs-entities.md
Normal file
@@ -0,0 +1,309 @@
|
||||
# Memory vs Entities (Engineering Layer V1 boundary)
|
||||
|
||||
## Why this document exists
|
||||
|
||||
The engineering layer introduces a new representation — typed
|
||||
entities with explicit relationships — alongside AtoCore's existing
|
||||
memory system and its six memory types. The question that blocks
|
||||
every other engineering-layer planning doc is:
|
||||
|
||||
> When we extract a fact from an interaction or a document, does it
|
||||
> become a memory, an entity, or both? And if both, which one is
|
||||
> canonical?
|
||||
|
||||
Without an answer, the rest of the engineering layer cannot be
|
||||
designed. This document is the answer.
|
||||
|
||||
## The short version
|
||||
|
||||
- **Memories stay.** They are still the canonical home for
|
||||
*unstructured, attributed, personal, natural-language* facts.
|
||||
- **Entities are new.** They are the canonical home for *structured,
|
||||
typed, relational, engineering-domain* facts.
|
||||
- **No concept lives in both at full fidelity.** Every concept has
|
||||
exactly one canonical home. The other layer may hold a pointer or
|
||||
a rendered view, never a second source of truth.
|
||||
- **The two layers share one review queue.** Candidates from
|
||||
extraction flow into the same `status=candidate` lifecycle
|
||||
regardless of whether they are memory-bound or entity-bound.
|
||||
- **Memories can "graduate" into entities** when enough structure has
|
||||
accumulated, but the upgrade is an explicit, logged promotion, not
|
||||
a silent rewrite.
|
||||
|
||||
## The split per memory type
|
||||
|
||||
The six memory types from the current Phase 2 implementation each
|
||||
map to exactly one outcome in V1:
|
||||
|
||||
| Memory type | V1 destination | Rationale |
|
||||
|---------------|-------------------------------|-------------------------------------------------------------------------------------------------------------|
|
||||
| identity | **memory only** | Always about the human user. No engineering domain structure. Never gets entity-shaped. |
|
||||
| preference | **memory only** | Always about the human user's working style. Same reasoning. |
|
||||
| episodic | **memory only** | "What happened in this conversation / this day." Attribution and time are the point, not typed structure. |
|
||||
| knowledge | **entity when possible**, memory otherwise | If the knowledge maps to a typed engineering object (material property, constant, tolerance), it becomes a Fact entity with provenance. If it's loose general knowledge, stays a memory. |
|
||||
| project | **entity** | Anything that belonged in the "project" memory type is really a Requirement, Constraint, Decision, Subsystem attribute, etc. It belongs in the engineering layer once entities exist. |
|
||||
| adaptation | **entity (Decision)** | "We decided to X" is literally a Decision entity in the ontology. This is the clearest migration. |
|
||||
|
||||
**Practical consequence:** when the engineering layer V1 ships, the
|
||||
`project`, `knowledge`, and `adaptation` memory types are deprecated
|
||||
as a canonical home for new facts. Existing rows are not deleted —
|
||||
they are backfilled as entities through the promotion-rules flow
|
||||
(see `promotion-rules.md`), and the old memory rows become frozen
|
||||
references pointing at their graduated entity.
|
||||
|
||||
The `identity`, `preference`, and `episodic` memory types continue
|
||||
to exist exactly as they do today and do not interact with the
|
||||
engineering layer at all.
|
||||
|
||||
## What "canonical home" actually means
|
||||
|
||||
A concept's canonical home is the single place where:
|
||||
|
||||
- its *current active value* is stored
|
||||
- its *status lifecycle* is managed (active/superseded/invalid)
|
||||
- its *confidence* is tracked
|
||||
- its *provenance chain* is rooted
|
||||
- edits, supersessions, and invalidations are applied
|
||||
- conflict resolution is arbitrated
|
||||
|
||||
Everything else is a derived view of that canonical row.
|
||||
|
||||
If a `Decision` entity is the canonical home for "we switched to
|
||||
GF-PTFE pads", then:
|
||||
|
||||
- there is no `adaptation` memory row with the same content; the
|
||||
extractor creates a `Decision` candidate directly
|
||||
- the context builder, when asked to include relevant state, reaches
|
||||
into the entity store via the engineering layer, not the memory
|
||||
store
|
||||
- if the user wants to see "recent decisions" they hit the entity
|
||||
API, never the memory API
|
||||
- if they want to invalidate the decision, they do so via the entity
|
||||
API
|
||||
|
||||
The memory API remains the canonical home for `identity`,
|
||||
`preference`, and `episodic` — same rules, just a different set of
|
||||
types.
|
||||
|
||||
## Why not a unified table with a `kind` column?
|
||||
|
||||
It would be simpler to implement. It is rejected for three reasons:
|
||||
|
||||
1. **Different query shapes.** Memories are queried by type, project,
|
||||
confidence, recency. Entities are queried by type, relationships,
|
||||
graph traversal, coverage gaps ("orphan requirements"). Cramming
|
||||
both into one table forces the schema to be the union of both
|
||||
worlds and makes each query slower.
|
||||
|
||||
2. **Different lifecycles.** Memories have a simple four-state
|
||||
lifecycle (candidate/active/superseded/invalid). Entities have
|
||||
the same four states *plus* per-relationship supersession,
|
||||
per-field versioning for the killer correctness queries, and
|
||||
structured conflict flagging. The unified table would have to
|
||||
carry all entity apparatus for every memory row.
|
||||
|
||||
3. **Different provenance semantics.** A preference memory is
|
||||
provenanced by "the user told me" — one author, one time.
|
||||
An entity like a `Requirement` is provenanced by "this source
|
||||
chunk + this source document + these supporting Results" — a
|
||||
graph. The tables want to be different because their provenance
|
||||
models are different.
|
||||
|
||||
So: two tables, one review queue, one promotion flow, one trust
|
||||
hierarchy.
|
||||
|
||||
## The shared review queue
|
||||
|
||||
Both the memory extractor (Phase 9 Commit C, already shipped) and
|
||||
the future entity extractor write into the same conceptual queue:
|
||||
everything lands at `status=candidate` in its own table, and the
|
||||
human reviewer sees a unified list. The reviewer UI (future work)
|
||||
shows candidates of all kinds side by side, grouped by source
|
||||
interaction / source document, with the rule that fired.
|
||||
|
||||
From the data side this means:
|
||||
|
||||
- the memories table gets a `candidate` status (**already done in
|
||||
Phase 9 Commit B/C**)
|
||||
- the future entities table will get the same `candidate` status
|
||||
- both tables get the same `promote` / `reject` API shape: one verb
|
||||
per candidate, with an audit log entry
|
||||
|
||||
Implementation note: the API routes should evolve from
|
||||
`POST /memory/{id}/promote` to `POST /candidates/{id}/promote` once
|
||||
both tables exist, so the reviewer tooling can treat them
|
||||
uniformly. The current memory-only route stays in place for
|
||||
backward compatibility and is aliased by the unified route.
|
||||
|
||||
## Memory-to-entity graduation
|
||||
|
||||
Even though the split is clean on paper, real usage will reveal
|
||||
memories that deserve to be entities but started as plain text.
|
||||
Four signals are good candidates for proposing graduation:
|
||||
|
||||
1. **Reference count crosses a threshold.** A memory that has been
|
||||
reinforced 5+ times across multiple interactions is a strong
|
||||
signal that it deserves structure.
|
||||
|
||||
2. **Memory content matches a known entity template.** If a
|
||||
`knowledge` memory's content matches the shape "X = value [unit]"
|
||||
it can be proposed as a `Fact` or `Parameter` entity.
|
||||
|
||||
3. **A user explicitly asks for promotion.** `POST /memory/{id}/graduate`
|
||||
is the simplest explicit path — it returns a proposal for an
|
||||
entity structured from the memory's content, which the user can
|
||||
accept or reject.
|
||||
|
||||
4. **Extraction pass proposes an entity that happens to match an
|
||||
existing memory.** The entity extractor, when scanning a new
|
||||
interaction, sees the same content already exists as a memory
|
||||
and proposes graduation as part of its candidate output.
|
||||
|
||||
The graduation flow is:
|
||||
|
||||
```
|
||||
memory row (active, confidence C)
|
||||
|
|
||||
| propose_graduation()
|
||||
v
|
||||
entity candidate row (candidate, confidence C)
|
||||
+
|
||||
memory row gets status="graduated" and a forward pointer to the
|
||||
entity candidate
|
||||
|
|
||||
| human promotes the candidate entity
|
||||
v
|
||||
entity row (active)
|
||||
+
|
||||
memory row stays "graduated" permanently (historical record)
|
||||
```
|
||||
|
||||
The memory is never deleted. It becomes a frozen historical
|
||||
pointer to the entity it became. This keeps the audit trail intact
|
||||
and lets the Human Mirror show "this decision started life as a
|
||||
memory on April 2, was graduated to an entity on April 15, now has
|
||||
2 supporting ValidationClaims".
|
||||
|
||||
The `graduated` status is a new memory status that gets added when
|
||||
the graduation flow is implemented. For now (Phase 9) no memory row
carries it: the three non-graduating types (identity/preference/
episodic) will never receive it, and the three graduating types stay
in their current memory-only state until the engineering layer ships.
|
||||
|
||||
## Context pack assembly after the split
|
||||
|
||||
The context builder today (`src/atocore/context/builder.py`) pulls:
|
||||
|
||||
1. Trusted Project State
|
||||
2. Identity + Preference memories
|
||||
3. Retrieved chunks
|
||||
|
||||
After the split, it pulls:
|
||||
|
||||
1. Trusted Project State (unchanged)
|
||||
2. **Identity + Preference memories** (unchanged — these stay memories)
|
||||
3. **Engineering-layer facts relevant to the prompt**, queried through
|
||||
the entity API (new)
|
||||
4. Retrieved chunks (unchanged, lowest trust)
|
||||
|
||||
Note the ordering: identity/preference memories stay above entities,
|
||||
because personal style information is always more trusted than
|
||||
extracted engineering facts. Entities sit below the personal layer
|
||||
but above raw retrieval, because they have structured provenance
|
||||
that raw chunks lack.
|
||||
|
||||
The budget allocation gains a new slot:
|
||||
|
||||
- trusted project state: 20% (unchanged, highest trust)
|
||||
- identity memories: 5% (unchanged)
|
||||
- preference memories: 5% (unchanged)
|
||||
- **engineering entities: 15%** (new — pulls only V1-required
|
||||
objects relevant to the prompt)
|
||||
- retrieval: 55% (reduced from 70% to make room)
|
||||
|
||||
These are starting numbers. After the engineering layer ships and
|
||||
real usage tunes retrieval quality, these will be revisited.
|
||||
|
||||
## What the shipped memory types still mean after the split
|
||||
|
||||
| Memory type | Still accepts new writes? | V1 destination for new extractions |
|
||||
|-------------|---------------------------|------------------------------------|
|
||||
| identity | **yes** | memory (no change) |
|
||||
| preference | **yes** | memory (no change) |
|
||||
| episodic | **yes** | memory (no change) |
|
||||
| knowledge | yes, but only for loose facts | entity (Fact / Parameter) for structured things; memory is a fallback |
|
||||
| project | **no new writes after engineering V1 ships** | entity (Requirement / Constraint / Subsystem attribute) |
|
||||
| adaptation | **no new writes after engineering V1 ships** | entity (Decision) |
|
||||
|
||||
"No new writes" means the `create_memory` path will refuse to
|
||||
create new `project` or `adaptation` memories once the engineering
|
||||
layer V1 ships. Existing rows stay queryable and reinforceable but
|
||||
new facts of those kinds must become entities. This keeps the
|
||||
canonical-home rule clean going forward.
|
||||
|
||||
The deprecation is deferred: it does not happen until the engineering
|
||||
layer V1 is demonstrably working against the active project set. Until
|
||||
then, the existing memory types continue to accept writes so the
|
||||
Phase 9 loop can be exercised without waiting on the engineering
|
||||
layer.
|
||||
|
||||
## Consequences for Phase 9 (what we just built)
|
||||
|
||||
The capture loop, reinforcement, and extractor we shipped today
|
||||
are *memory-facing*. They produce memory candidates, reinforce
|
||||
memory confidence, and respect the memory status lifecycle. None
|
||||
of that changes.
|
||||
|
||||
When the engineering layer V1 ships, the extractor in
|
||||
`src/atocore/memory/extractor.py` gets a sibling in
|
||||
`src/atocore/entities/extractor.py` that uses the same
|
||||
interaction-scanning approach but produces entity candidates
|
||||
instead. The `POST /interactions/{id}/extract` endpoint either:
|
||||
|
||||
- runs both extractors and returns a combined result, or
|
||||
- gains a `?target=memory|entities|both` query parameter
|
||||
|
||||
and the decision between those two shapes can wait until the
|
||||
entity extractor actually exists.
|
||||
|
||||
Until the entity layer is real, the memory extractor also has to
|
||||
cover some things that will eventually move to entities (decisions,
|
||||
constraints, requirements). **That overlap is temporary and
|
||||
intentional.** Rather than leave those cues unextracted for months
|
||||
while the entity layer is being built, the memory extractor
|
||||
surfaces them as memory candidates. Later, a migration pass will
|
||||
propose graduation on every active memory created by
|
||||
`decision_heading`, `constraint_heading`, and `requirement_heading`
|
||||
rules once the entity types exist to receive them.
|
||||
|
||||
So: **no rework in Phase 9, no wasted extraction, clean handoff
|
||||
once the entity layer lands**.
|
||||
|
||||
## Open questions this document does NOT answer
|
||||
|
||||
These are deliberately deferred to later planning docs:
|
||||
|
||||
1. **When exactly does extraction fire?** (answered by
|
||||
`promotion-rules.md`)
|
||||
2. **How are conflicts between a memory and an entity handled
|
||||
during graduation?** (answered by `conflict-model.md`)
|
||||
3. **Does the context builder traverse the entity graph for
|
||||
relationship-rich queries, or does it only surface direct facts?**
|
||||
(answered by the context-builder spec in a future
|
||||
`engineering-context-integration.md` doc)
|
||||
4. **What is the exact API shape of the unified candidate review
|
||||
queue?** (answered by a future `review-queue-api.md` doc when
|
||||
the entity extractor exists and both tables need one UI)
|
||||
|
||||
## TL;DR
|
||||
|
||||
- memories = user-facing unstructured facts, still own identity/preference/episodic
|
||||
- entities = engineering-facing typed facts, own project/adaptation and structured knowledge (loose knowledge stays a memory)
|
||||
- one canonical home per concept, never both
|
||||
- one shared candidate-review queue, same promote/reject shape
|
||||
- graduated memories stay as frozen historical pointers
|
||||
- Phase 9 stays memory-only and ships today; entity V1 follows the
|
||||
remaining architecture docs in this planning sprint
|
||||
- no rework required when the entity layer lands; the current memory
|
||||
extractor's structural cues get migrated forward via explicit
|
||||
graduation
|
||||
343
docs/architecture/promotion-rules.md
Normal file
343
docs/architecture/promotion-rules.md
Normal file
@@ -0,0 +1,343 @@
|
||||
# Promotion Rules (Layer 0 → Layer 2 pipeline)
|
||||
|
||||
## Purpose
|
||||
|
||||
AtoCore ingests raw human-authored content (markdown, repo notes,
|
||||
interaction transcripts) and eventually must turn some of it into
|
||||
typed engineering entities that the V1 query catalog can answer.
|
||||
The path from raw text to typed entity has to be:
|
||||
|
||||
- **explicit**: every step has a named operation, a trigger, and an
|
||||
audit log
|
||||
- **reversible**: every promotion can be undone without data loss
|
||||
- **conservative**: no automatic movement into trusted state; a human
|
||||
(or later, a very confident policy) always signs off
|
||||
- **traceable**: every typed entity must carry a back-pointer to
|
||||
the raw source that produced it
|
||||
|
||||
This document defines that path.
|
||||
|
||||
## The four layers
|
||||
|
||||
Promotion is described in terms of four layers, all of which exist
|
||||
simultaneously in the system once the engineering layer V1 ships:
|
||||
|
||||
| Layer | Name | Canonical storage | Trust | Who writes |
|
||||
|-------|-------------------|------------------------------------------|-------|------------|
|
||||
| L0 | Raw source | source_documents + source_chunks | low | ingestion pipeline |
|
||||
| L1 | Memory candidate | memories (status="candidate") | low | extractor |
|
||||
| L1' | Active memory | memories (status="active") | med | human promotion |
|
||||
| L2 | Entity candidate | entities (status="candidate") | low | extractor + graduation |
|
||||
| L2' | Active entity | entities (status="active") | high | human promotion |
|
||||
| L3 | Trusted state | project_state | highest | human curation |
|
||||
|
||||
Layer 3 (trusted project state) is already implemented and stays
|
||||
manually curated — automatic promotion into L3 is **never** allowed.
|
||||
|
||||
## The promotion graph
|
||||
|
||||
```
|
||||
[L0] source chunks
|
||||
|
|
||||
| extraction (memory extractor, Phase 9 Commit C)
|
||||
v
|
||||
[L1] memory candidate
|
||||
|
|
||||
| promote_memory()
|
||||
v
|
||||
[L1'] active memory
|
||||
|
|
||||
| (optional) propose_graduation()
|
||||
v
|
||||
[L2] entity candidate
|
||||
|
|
||||
| promote_entity()
|
||||
v
|
||||
[L2'] active entity
|
||||
|
|
||||
| (manual curation, NEVER automatic)
|
||||
v
|
||||
[L3] trusted project state
|
||||
```
|
||||
|
||||
Short path (direct entity extraction, once the entity extractor
|
||||
exists):
|
||||
|
||||
```
|
||||
[L0] source chunks
|
||||
|
|
||||
| entity extractor
|
||||
v
|
||||
[L2] entity candidate
|
||||
|
|
||||
| promote_entity()
|
||||
v
|
||||
[L2'] active entity
|
||||
```
|
||||
|
||||
A single fact can travel either path depending on what the
|
||||
extractor saw. The graduation path exists for facts that started
|
||||
life as memories before the entity layer existed, and for the
|
||||
memory extractor's structural cues (decisions, constraints,
|
||||
requirements) which are eventually entity-shaped.
|
||||
|
||||
## Triggers (when does extraction fire?)
|
||||
|
||||
Phase 9 already shipped one trigger: **on explicit API request**
|
||||
(`POST /interactions/{id}/extract`). The V1 engineering layer adds
|
||||
two more:
|
||||
|
||||
1. **On interaction capture (automatic)**
|
||||
- Same event that runs reinforcement today
|
||||
- Controlled by an `extract` boolean flag on the record request
|
||||
(default: `false` for memory extractor, `true` once an
|
||||
engineering extractor exists and has been validated)
|
||||
- Output goes to the candidate queue; nothing auto-promotes
|
||||
|
||||
2. **On ingestion (batched, per wave)**
|
||||
- After a wave of markdown ingestion finishes, a batch extractor
|
||||
pass sweeps all newly-added source chunks and produces
|
||||
candidates from them
|
||||
- Batched per wave (not per chunk) to keep the review queue
|
||||
digestible and to let the reviewer see all candidates from a
|
||||
single ingestion in one place
|
||||
- Output: a report artifact plus a review queue entry per
|
||||
candidate
|
||||
|
||||
3. **On explicit human request (existing)**
|
||||
- `POST /interactions/{id}/extract` for a single interaction
|
||||
- Future: `POST /ingestion/wave/{id}/extract` for a whole wave
|
||||
- Future: `POST /memory/{id}/graduate` to propose graduation
|
||||
of one specific memory into an entity
|
||||
|
||||
Batch size rule: **extraction passes never write more than N
|
||||
candidates per human review cycle, where N = 50 by default**. If
|
||||
a pass produces more, it ranks by (rule confidence × content
|
||||
length × novelty) and only writes the top N. The remaining
|
||||
candidates are logged, not persisted. This protects the reviewer
|
||||
from getting buried.
|
||||
|
||||
## Confidence and ranking of candidates
|
||||
|
||||
Each rule-based extraction rule carries a *prior confidence*
|
||||
based on how specific its pattern is:
|
||||
|
||||
| Rule class | Prior | Rationale |
|
||||
|---------------------------|-------|-----------|
|
||||
| Heading with explicit type (`## Decision:`) | 0.7 | Very specific structural cue, intentional author marker |
|
||||
| Typed list item (`- [Decision] ...`) | 0.65 | Explicit but often embedded in looser prose |
|
||||
| Sentence pattern (`I prefer X`) | 0.5 | Moderate structure, more false positives |
|
||||
| Regex pattern matching a value+unit (`X = 4.8 kg`) | 0.6 | Structural but prone to coincidence |
|
||||
| LLM-based (future) | variable | Depends on model's returned confidence |
|
||||
|
||||
The candidate's final confidence at write time is:
|
||||
|
||||
```
|
||||
final = prior * structural_signal_multiplier * freshness_bonus
|
||||
```
|
||||
|
||||
Where:
|
||||
|
||||
- `structural_signal_multiplier` is 1.1 if the source chunk path
|
||||
contains any of `_HIGH_SIGNAL_HINTS` from the retriever (status,
|
||||
decision, requirements, charter, ...) and 0.9 if it contains
|
||||
`_LOW_SIGNAL_HINTS` (`_archive`, `_history`, ...)
|
||||
- `freshness_bonus` is 1.05 if the source chunk was updated in the
|
||||
last 30 days, else 1.0
|
||||
|
||||
This formula is tuned later; the numbers are starting values.
|
||||
|
||||
## Review queue mechanics
|
||||
|
||||
### Queue population
|
||||
|
||||
- Each candidate writes one row into its target table
|
||||
(memories or entities) with `status="candidate"`
|
||||
- Each candidate carries: `rule`, `source_span`, `source_chunk_id`,
|
||||
`source_interaction_id`, `extractor_version`
|
||||
- No two candidates ever share the same (type, normalized_content,
  project, status) — if a second extraction pass would produce a
  duplicate, it is dropped before being written (see
  "Candidate-to-candidate deduplication across passes" below)
|
||||
|
||||
### Queue surfacing
|
||||
|
||||
- `GET /memory?status=candidate` lists memory candidates
|
||||
- `GET /entities?status=candidate` (future) lists entity candidates
|
||||
- `GET /candidates` (future unified route) lists both
|
||||
|
||||
### Reviewer actions
|
||||
|
||||
For each candidate, exactly one of:
|
||||
|
||||
- **promote**: `POST /memory/{id}/promote` or
|
||||
`POST /entities/{id}/promote`
|
||||
- sets `status="active"`
|
||||
- preserves the audit trail (source_chunk_id, rule, source_span)
|
||||
- **reject**: `POST /memory/{id}/reject` or
|
||||
`POST /entities/{id}/reject`
|
||||
- sets `status="invalid"`
|
||||
- preserves audit trail so repeat extractions don't re-propose
|
||||
- **edit-then-promote**: `PUT /memory/{id}` to adjust content, then
|
||||
`POST /memory/{id}/promote`
|
||||
- every edit is logged, original content preserved in a
|
||||
`previous_content_log` column (schema addition deferred to
|
||||
the first implementation sprint)
|
||||
- **defer**: no action; candidate stays in queue indefinitely
|
||||
(future: add a `pending_since` staleness indicator to the UI)
|
||||
|
||||
### Reviewer authentication
|
||||
|
||||
In V1 the review queue is single-user by convention. There is no
|
||||
per-reviewer authorization. Every promote/reject call is logged
|
||||
with the same default identity. Multi-user review is a V2 concern.
|
||||
|
||||
## Auto-promotion policies (deferred, but designed for)
|
||||
|
||||
The current V1 stance is: **no auto-promotion, ever**. All
|
||||
promotions require a human reviewer.
|
||||
|
||||
The schema and API are designed so that automatic policies can be
|
||||
added later without schema changes. The anticipated policies:
|
||||
|
||||
1. **Reference-count threshold**
|
||||
- If a candidate accumulates N+ references across multiple
|
||||
interactions within M days AND the reviewer hasn't seen it yet
|
||||
(indicating the system sees it often but the human hasn't
|
||||
gotten to it), propose auto-promote
|
||||
- Starting thresholds: N=5, M=7 days. Never auto-promote
|
||||
entity candidates that affect validation claims or decisions
|
||||
without explicit human review — those are too consequential.
|
||||
|
||||
2. **Confidence threshold**
|
||||
- If `final_confidence >= 0.85` AND the rule is a heading
|
||||
rule (not a sentence rule), eligible for auto-promotion
|
||||
|
||||
3. **Identity/preference lane**
|
||||
- identity and preference memories extracted from an
|
||||
interaction where the user explicitly says "I am X" or
|
||||
"I prefer X" with a first-person subject and high-signal
|
||||
verb could auto-promote. This is the safest lane because
|
||||
the user is the authoritative source for their own identity.
|
||||
|
||||
None of these run in V1. The APIs and data shape are designed so
|
||||
they can be added as a separate policy module without disrupting
|
||||
existing tests.
|
||||
|
||||
## Reversibility
|
||||
|
||||
Every promotion step must be undoable:
|
||||
|
||||
| Operation | How to undo |
|
||||
|---------------------------|-------------------------------------------------------|
|
||||
| memory candidate written | delete the candidate row (low-risk, it was never in context) |
|
||||
| memory candidate promoted | `PUT /memory/{id}` status=candidate (reverts to queue) |
|
||||
| memory candidate rejected | `PUT /memory/{id}` status=candidate |
|
||||
| memory graduated | memory stays as a frozen pointer; delete the entity candidate to undo |
|
||||
| entity candidate promoted | `PUT /entities/{id}` status=candidate |
|
||||
| entity promoted to active | supersede with a new active, or `PUT` back to candidate |
|
||||
|
||||
The only irreversible operation is manual curation into L3
|
||||
(trusted project state). That is by design — L3 is small, curated,
|
||||
and human-authored end to end.
|
||||
|
||||
## Provenance (what every candidate must carry)
|
||||
|
||||
Every candidate row, memory or entity, MUST have:
|
||||
|
||||
- `source_chunk_id` — if extracted from ingested content, the chunk it came from
|
||||
- `source_interaction_id` — if extracted from a captured interaction, the interaction it came from
|
||||
- `rule` — the extractor rule id that fired
|
||||
- `extractor_version` — a semver-ish string the extractor module carries
|
||||
so old candidates can be re-evaluated with a newer extractor
|
||||
|
||||
If both `source_chunk_id` and `source_interaction_id` are null, the
|
||||
candidate was hand-authored (via `POST /memory` directly) and must
|
||||
be flagged as such. Hand-authored candidates are allowed but
|
||||
discouraged — the preference is to extract from real content, not
|
||||
dictate candidates directly.
|
||||
|
||||
The active rows inherit all of these fields from their candidate
|
||||
row at promotion time. They are never overwritten.
|
||||
|
||||
## Extractor versioning
|
||||
|
||||
The extractor is going to change — new rules added, old rules
|
||||
refined, precision/recall tuned over time. The promotion flow
|
||||
must survive extractor changes:
|
||||
|
||||
- every extractor module exposes an `EXTRACTOR_VERSION = "0.1.0"`
|
||||
constant
|
||||
- every candidate row records this version
|
||||
- when the extractor version changes, the change log explains
|
||||
what the new rules do
|
||||
- old candidates are NOT automatically re-evaluated by the new
|
||||
extractor — that would lose the auditable history of why the
|
||||
old candidate was created
|
||||
- future `POST /memory/{id}/re-extract` can optionally propose
|
||||
an updated candidate from the same source chunk with the new
|
||||
extractor, but it produces a *new* candidate alongside the old
|
||||
one, never a silent rewrite
|
||||
|
||||
## Ingestion-wave extraction semantics
|
||||
|
||||
When the batched extraction pass fires on an ingestion wave, it
|
||||
produces a report artifact:
|
||||
|
||||
```
|
||||
data/extraction-reports/<wave-id>/
|
||||
├── report.json # summary counts, rule distribution
|
||||
├── candidates.ndjson # one JSON line per persisted candidate
|
||||
├── dropped.ndjson # one JSON line per candidate dropped
|
||||
│ # (over batch cap, duplicate, below
|
||||
│ # min content length, etc.)
|
||||
└── errors.log # any rule-level errors
|
||||
```
|
||||
|
||||
The report artifact lives under the configured `data_dir` and is
|
||||
retained per the backup retention policy. The ingestion-waves doc
|
||||
(`docs/ingestion-waves.md`) is updated to include an "extract"
|
||||
step after each wave, with the expectation that the human
|
||||
reviews the candidates before the next wave fires.
|
||||
|
||||
## Candidate-to-candidate deduplication across passes
|
||||
|
||||
Two extraction passes over the same chunk (or two different
|
||||
chunks containing the same fact) should not produce two identical
|
||||
candidate rows. The deduplication key is:
|
||||
|
||||
```
|
||||
(memory_type_or_entity_type, normalized_content, project, status)
|
||||
```
|
||||
|
||||
Normalization strips whitespace variants, lowercases, and drops
|
||||
trailing punctuation (same rules as the extractor's `_clean_value`
|
||||
function). If a second pass would produce a duplicate, it instead
|
||||
increments a `re_extraction_count` column on the existing
|
||||
candidate row and updates `last_re_extracted_at`. This gives the
|
||||
reviewer a "saw this N times" signal without flooding the queue.
|
||||
|
||||
This column is a future schema addition — current candidates do
|
||||
not track re-extraction. The promotion-rules implementation will
|
||||
land the column as part of its first migration.
|
||||
|
||||
## The "never auto-promote into trusted state" invariant
|
||||
|
||||
Regardless of what auto-promotion policies might exist between
|
||||
L0 → L2', **nothing ever moves into L3 (trusted project state)
|
||||
without explicit human action via `POST /project/state`**. This
|
||||
is the one hard line in the promotion graph and it is enforced
|
||||
by having no API endpoint that takes a candidate id and writes
|
||||
to `project_state`.
|
||||
|
||||
## Summary
|
||||
|
||||
- Four layers: L0 raw, L1 memory candidate/active, L2 entity
|
||||
candidate/active, L3 trusted state
|
||||
- Three triggers for extraction: on capture, on ingestion wave, on
|
||||
explicit request
|
||||
- Per-rule prior confidence, tuned by structural signals at write time
|
||||
- Shared candidate review queue, promote/reject/edit/defer actions
|
||||
- No auto-promotion in V1 (but the schema allows it later)
|
||||
- Every candidate carries full provenance and extractor version
|
||||
- Every promotion step is reversible except L3 curation
|
||||
- L3 is never touched automatically
|
||||
155
docs/atocore-ecosystem-and-hosting.md
Normal file
155
docs/atocore-ecosystem-and-hosting.md
Normal file
@@ -0,0 +1,155 @@
|
||||
# AtoCore Ecosystem And Hosting
|
||||
|
||||
## Purpose
|
||||
|
||||
This document defines the intended boundaries between the Ato ecosystem layers
|
||||
and the current hosting model.
|
||||
|
||||
## Ecosystem Roles
|
||||
|
||||
- `AtoCore`
|
||||
- runtime, ingestion, retrieval, memory, context builder, API
|
||||
- owns the machine-memory and context assembly system
|
||||
- `AtoMind`
|
||||
- future intelligence layer
|
||||
- will own promotion, reflection, conflict handling, and trust decisions
|
||||
- `AtoVault`
|
||||
- human-readable memory source
|
||||
- intended for Obsidian and manual inspection/editing
|
||||
- `AtoDrive`
|
||||
- trusted operational project source
|
||||
- curated project truth with higher trust than general notes
|
||||
|
||||
## Trust Model
|
||||
|
||||
Current intended trust precedence:
|
||||
|
||||
1. Trusted Project State
|
||||
2. AtoDrive artifacts
|
||||
3. Recent validated memory
|
||||
4. AtoVault summaries
|
||||
5. PKM chunks
|
||||
6. Historical or low-confidence material
|
||||
|
||||
## Storage Boundaries
|
||||
|
||||
Human-readable source layers and machine operational storage must remain
|
||||
separate.
|
||||
|
||||
- `AtoVault` is a source layer, not the live vector database
|
||||
- `AtoDrive` is a source layer, not the live vector database
|
||||
- machine operational state includes:
|
||||
- SQLite database
|
||||
- vector store
|
||||
- indexes
|
||||
- embeddings
|
||||
- runtime metadata
|
||||
- cache and temp artifacts
|
||||
|
||||
The machine database is derived operational state, not the primary
|
||||
human-readable source of truth.
|
||||
|
||||
## Source Snapshot Vs Machine Store
|
||||
|
||||
The human-readable files visible under `sources/vault` or `sources/drive` are
|
||||
not the final "smart storage" format of AtoCore.
|
||||
|
||||
They are source snapshots made visible to the canonical Dalidou instance so
|
||||
AtoCore can ingest them.
|
||||
|
||||
The actual machine-processed state lives in:
|
||||
|
||||
- `source_documents`
|
||||
- `source_chunks`
|
||||
- vector embeddings and indexes
|
||||
- project memories
|
||||
- trusted project state
|
||||
- context-builder output
|
||||
|
||||
This means the staged markdown can still look very similar to the original PKM
|
||||
or repo docs. That is normal.
|
||||
|
||||
The intelligence does not come from rewriting everything into a new markdown
|
||||
vault. It comes from ingesting selected source material into the machine store
|
||||
and then using that store for retrieval, trust-aware context assembly, and
|
||||
memory.
|
||||
|
||||
## Canonical Hosting Model
|
||||
|
||||
Dalidou is the canonical host for the AtoCore service and machine database.
|
||||
|
||||
OpenClaw on the T420 should consume AtoCore over API and network, ideally over
|
||||
Tailscale or another trusted internal network path.
|
||||
|
||||
The live SQLite and vector store must not be treated as a multi-node synced
|
||||
filesystem. The architecture should prefer one canonical running service over
|
||||
file replication of the live machine store.
|
||||
|
||||
## Canonical Dalidou Layout
|
||||
|
||||
```text
|
||||
/srv/storage/atocore/
|
||||
app/ # deployed AtoCore repository
|
||||
data/ # canonical machine state
|
||||
db/
|
||||
chroma/
|
||||
cache/
|
||||
tmp/
|
||||
sources/ # human-readable source inputs
|
||||
vault/
|
||||
drive/
|
||||
logs/
|
||||
backups/
|
||||
run/
|
||||
```
|
||||
|
||||
## Operational Rules
|
||||
|
||||
- source directories are treated as read-only by the AtoCore runtime
|
||||
- Dalidou holds the canonical machine DB
|
||||
- OpenClaw should use AtoCore as an additive context service
|
||||
- OpenClaw must continue to work if AtoCore is unavailable
|
||||
- write-back from OpenClaw into AtoCore is deferred until later phases
|
||||
|
||||
Current staging behavior:
|
||||
|
||||
- selected project docs may be copied into a readable staging area on Dalidou
|
||||
- AtoCore ingests from that staging area into the machine store
|
||||
- the staging area is not itself the durable intelligence layer
|
||||
- changes to the original PKM or repo source do not propagate automatically
|
||||
until a refresh or re-ingest happens
|
||||
|
||||
## Intended Daily Operating Model
|
||||
|
||||
The target workflow is:
|
||||
|
||||
- the human continues to work primarily in PKM project notes, Git/Gitea repos,
|
||||
Discord, and normal OpenClaw sessions
|
||||
- OpenClaw keeps its own runtime behavior and memory system
|
||||
- AtoCore acts as the durable external context layer that compiles trusted
|
||||
project state, retrieval, and long-lived machine-readable context
|
||||
- AtoCore improves prompt quality and robustness without replacing direct repo
|
||||
work, direct file reads, or OpenClaw's own memory
|
||||
|
||||
In other words:
|
||||
|
||||
- PKM and repos remain the human-authoritative project sources
|
||||
- OpenClaw remains the active operating environment
|
||||
- AtoCore remains the compiled context engine and machine-memory host
|
||||
|
||||
## Current Status
|
||||
|
||||
As of the current implementation pass:
|
||||
|
||||
- the AtoCore runtime is deployed on Dalidou
|
||||
- the canonical machine-data layout exists on Dalidou
|
||||
- the service is running from Dalidou
|
||||
- the T420/OpenClaw machine can reach AtoCore over network
|
||||
- a first read-only OpenClaw-side helper exists
|
||||
- the live corpus now includes initial AtoCore self-knowledge and a first
|
||||
curated batch for active projects
|
||||
- the long-term content corpus still needs broader project and vault ingestion
|
||||
|
||||
This means the platform is hosted on Dalidou now, the first cross-machine
|
||||
integration path exists, and the live content corpus is partially populated but
|
||||
not yet fully ingested.
|
||||
80
docs/backup-strategy.md
Normal file
80
docs/backup-strategy.md
Normal file
@@ -0,0 +1,80 @@
|
||||
# AtoCore Backup Strategy
|
||||
|
||||
## Purpose
|
||||
|
||||
This document describes the current backup baseline for the Dalidou-hosted
|
||||
AtoCore machine store.
|
||||
|
||||
The immediate goal is not full disaster-proof automation yet. The goal is to
|
||||
have one safe, repeatable way to snapshot the most important writable state.
|
||||
|
||||
## Current Backup Baseline
|
||||
|
||||
Today, the safest hot-backup target is:
|
||||
|
||||
- SQLite machine database
|
||||
- project registry JSON
|
||||
- backup metadata describing what was captured
|
||||
|
||||
This is now supported by:
|
||||
|
||||
- `python -m atocore.ops.backup`
|
||||
|
||||
## What The Script Captures
|
||||
|
||||
The backup command creates a timestamped snapshot under:
|
||||
|
||||
- `ATOCORE_BACKUP_DIR/snapshots/<timestamp>/`
|
||||
|
||||
It currently writes:
|
||||
|
||||
- `db/atocore.db`
|
||||
- created with SQLite's backup API
|
||||
- `config/project-registry.json`
|
||||
- copied if it exists
|
||||
- `backup-metadata.json`
|
||||
- timestamp, paths, and backup notes
|
||||
|
||||
## What It Does Not Yet Capture
|
||||
|
||||
The current script does not hot-backup Chroma.
|
||||
|
||||
That is intentional.
|
||||
|
||||
For now, Chroma should be treated as one of:
|
||||
|
||||
- rebuildable derived state
|
||||
- or something that needs a deliberate cold snapshot/export workflow
|
||||
|
||||
Until that workflow exists, do not rely on ad hoc live file copies of the
|
||||
vector store while the service is actively writing.
|
||||
|
||||
## Dalidou Use
|
||||
|
||||
On Dalidou, the canonical machine paths are:
|
||||
|
||||
- DB:
|
||||
- `/srv/storage/atocore/data/db/atocore.db`
|
||||
- registry:
|
||||
- `/srv/storage/atocore/config/project-registry.json`
|
||||
- backups:
|
||||
- `/srv/storage/atocore/backups`
|
||||
|
||||
So a normal backup run should happen on Dalidou itself, not from another
|
||||
machine.
|
||||
|
||||
## Next Backup Improvements
|
||||
|
||||
1. decide Chroma policy clearly
|
||||
- rebuild vs cold snapshot vs export
|
||||
2. add a simple scheduled backup routine on Dalidou
|
||||
3. add retention policy for old snapshots
|
||||
4. optionally add a restore validation check
|
||||
|
||||
## Healthy Rule
|
||||
|
||||
Do not design around syncing the live machine DB/vector store between machines.
|
||||
|
||||
Back up the canonical Dalidou state.
|
||||
Restore from Dalidou state.
|
||||
Keep OpenClaw as a client of AtoCore, not a storage peer.
|
||||
247
docs/current-state.md
Normal file
247
docs/current-state.md
Normal file
@@ -0,0 +1,247 @@
|
||||
# AtoCore Current State
|
||||
|
||||
## Status Summary
|
||||
|
||||
AtoCore is no longer just a proof of concept. The local engine exists, the
|
||||
correctness pass is complete, Dalidou now hosts the canonical runtime and
|
||||
machine-storage location, and the T420/OpenClaw side now has a safe read-only
|
||||
path to consume AtoCore. The live corpus is no longer just self-knowledge: it
|
||||
now includes a first curated ingestion batch for the active projects.
|
||||
|
||||
## Phase Assessment
|
||||
|
||||
- completed
|
||||
- Phase 0
|
||||
- Phase 0.5
|
||||
- Phase 1
|
||||
- baseline complete
|
||||
- Phase 2
|
||||
- Phase 3
|
||||
- Phase 5
|
||||
- Phase 7
|
||||
- Phase 9 (Commits A/B/C: capture, reinforcement, extractor + review queue)
|
||||
- partial
|
||||
- Phase 4
|
||||
- Phase 8
|
||||
- not started
|
||||
- Phase 6
|
||||
- Phase 10
|
||||
- Phase 11
|
||||
- Phase 12
|
||||
- Phase 13
|
||||
|
||||
## What Exists Today
|
||||
|
||||
- ingestion pipeline
|
||||
- parser and chunker
|
||||
- SQLite-backed memory and project state
|
||||
- vector retrieval
|
||||
- context builder
|
||||
- API routes for query, context, health, and source status
|
||||
- project registry and per-project refresh foundation
|
||||
- project registration lifecycle:
|
||||
- template
|
||||
- proposal preview
|
||||
- approved registration
|
||||
- safe update of existing project registrations
|
||||
- refresh
|
||||
- implementation-facing architecture notes for:
|
||||
- engineering knowledge hybrid architecture
|
||||
- engineering ontology v1
|
||||
- env-driven storage and deployment paths
|
||||
- Dalidou Docker deployment foundation
|
||||
- initial AtoCore self-knowledge corpus ingested on Dalidou
|
||||
- T420/OpenClaw read-only AtoCore helper skill
|
||||
- full active-project markdown/text corpus wave for:
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
|
||||
## What Is True On Dalidou
|
||||
|
||||
- deployed repo location:
|
||||
- `/srv/storage/atocore/app`
|
||||
- canonical machine DB location:
|
||||
- `/srv/storage/atocore/data/db/atocore.db`
|
||||
- canonical vector store location:
|
||||
- `/srv/storage/atocore/data/chroma`
|
||||
- source input locations:
|
||||
- `/srv/storage/atocore/sources/vault`
|
||||
- `/srv/storage/atocore/sources/drive`
|
||||
|
||||
The service and storage foundation are live on Dalidou.
|
||||
|
||||
The machine-data host is real and canonical.
|
||||
|
||||
The project registry is now also persisted in a canonical mounted config path on
|
||||
Dalidou:
|
||||
|
||||
- `/srv/storage/atocore/config/project-registry.json`
|
||||
|
||||
The content corpus is partially populated now.
|
||||
|
||||
The Dalidou instance already contains:
|
||||
|
||||
- AtoCore ecosystem and hosting docs
|
||||
- current-state and OpenClaw integration docs
|
||||
- Master Plan V3
|
||||
- Build Spec V1
|
||||
- trusted project-state entries for `atocore`
|
||||
- full staged project markdown/text corpora for:
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
- curated repo-context docs for:
|
||||
- `p05`: `Fullum-Interferometer`
|
||||
- `p06`: `polisher-sim`
|
||||
- trusted project-state entries for:
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
|
||||
Current live stats after the full active-project wave are now far beyond the
|
||||
initial seed stage:
|
||||
|
||||
- more than `1,100` source documents
|
||||
- more than `20,000` chunks
|
||||
- matching vector count
|
||||
|
||||
The broader long-term corpus is still not fully populated yet. Wider project and
|
||||
vault ingestion remains a deliberate next step rather than something already
|
||||
completed, but the corpus is now meaningfully seeded beyond AtoCore's own docs.
|
||||
|
||||
For human-readable quality review, the current staged project markdown corpus is
|
||||
primarily visible under:
|
||||
|
||||
- `/srv/storage/atocore/sources/vault/incoming/projects`
|
||||
|
||||
This staged area is now useful for review because it contains the markdown/text
|
||||
project docs that were actually ingested for the full active-project wave.
|
||||
|
||||
It is important to read this staged area correctly:
|
||||
|
||||
- it is a readable ingestion input layer
|
||||
- it is not the final machine-memory representation itself
|
||||
- seeing familiar PKM-style notes there is expected
|
||||
- the machine-processed intelligence lives in the DB, chunks, vectors, memory,
|
||||
trusted project state, and context-builder outputs
|
||||
|
||||
## What Is True On The T420
|
||||
|
||||
- SSH access is working
|
||||
- OpenClaw workspace inspected at `/home/papa/clawd`
|
||||
- OpenClaw's own memory system remains unchanged
|
||||
- a read-only AtoCore integration skill exists in the workspace:
|
||||
- `/home/papa/clawd/skills/atocore-context/`
|
||||
- the T420 can successfully reach Dalidou AtoCore over network/Tailscale
|
||||
- fail-open behavior has been verified for the helper path
|
||||
- OpenClaw can now seed AtoCore in two distinct ways:
|
||||
- project-scoped memory entries
|
||||
- staged document ingestion into the retrieval corpus
|
||||
- the helper now supports the practical registered-project lifecycle:
|
||||
- projects
|
||||
- project-template
|
||||
- propose-project
|
||||
- register-project
|
||||
- update-project
|
||||
- refresh-project
|
||||
- the helper now also supports the first organic routing layer:
|
||||
- `detect-project "<prompt>"`
|
||||
- `auto-context "<prompt>" [budget] [project]`
|
||||
- OpenClaw can now default to AtoCore for project-knowledge questions without
|
||||
requiring explicit helper commands from the human every time
|
||||
|
||||
## What Exists In Memory vs Corpus
|
||||
|
||||
These remain separate and that is intentional.
|
||||
|
||||
In `/memory`:
|
||||
|
||||
- project-scoped curated memories now exist for:
|
||||
- `p04-gigabit`: 5 memories
|
||||
- `p05-interferometer`: 6 memories
|
||||
- `p06-polisher`: 8 memories
|
||||
|
||||
These are curated summaries and extracted stable project signals.
|
||||
|
||||
In `source_documents` / retrieval corpus:
|
||||
|
||||
- full project markdown/text corpora are now present for the active project set
|
||||
- retrieval is no longer limited to AtoCore self-knowledge only
|
||||
- the current corpus is broad enough that ranking quality matters more than
|
||||
corpus presence alone
|
||||
- underspecified prompts can still pull in historical or archive material, so
|
||||
project-aware routing and better ranking remain important
|
||||
|
||||
The source refresh model now has a concrete foundation in code:
|
||||
|
||||
- a project registry file defines known project ids, aliases, and ingest roots
|
||||
- the API can list registered projects
|
||||
- the API can return a registration template
|
||||
- the API can preview a registration without mutating state
|
||||
- the API can persist an approved registration
|
||||
- the API can update an existing registered project without changing its canonical id
|
||||
- the API can refresh one registered project at a time
|
||||
|
||||
This lifecycle is now coherent end to end for normal use.
|
||||
|
||||
The first live update passes on existing registered projects have now been
|
||||
verified against `p04-gigabit` and `p05-interferometer`:
|
||||
|
||||
- the registration description can be updated safely
|
||||
- the canonical project id remains unchanged
|
||||
- refresh still behaves cleanly after the update
|
||||
- `context/build` still returns useful project-specific context afterward
|
||||
|
||||
## Reliability Baseline
|
||||
|
||||
The runtime has now been hardened in a few practical ways:
|
||||
|
||||
- SQLite connections use a configurable busy timeout
|
||||
- SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
|
||||
- project registry writes are atomic file replacements rather than in-place rewrites
|
||||
- a first runtime backup path now exists for:
|
||||
- SQLite
|
||||
- project registry
|
||||
- backup metadata
|
||||
|
||||
This does not eliminate every concurrency edge, but it materially improves the
|
||||
current operational baseline.
|
||||
|
||||
In `Trusted Project State`:
|
||||
|
||||
- each active seeded project now has a conservative trusted-state set
|
||||
- promoted facts cover:
|
||||
- summary
|
||||
- core architecture or boundary decision
|
||||
- key constraints
|
||||
- next focus
|
||||
|
||||
This separation is healthy:
|
||||
|
||||
- memory stores distilled project facts
|
||||
- corpus stores the underlying retrievable documents
|
||||
|
||||
## Immediate Next Focus
|
||||
|
||||
1. Use the new T420-side organic routing layer in real OpenClaw workflows
|
||||
2. Tighten retrieval quality for the now fully ingested active project corpora
|
||||
3. Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
|
||||
4. Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
|
||||
5. Expand the boring operations baseline:
|
||||
- restore validation
|
||||
- Chroma rebuild / backup policy
|
||||
- retention
|
||||
6. Only later consider write-back, reflection, or deeper autonomous behaviors
|
||||
|
||||
See also:
|
||||
|
||||
- [ingestion-waves.md](ingestion-waves.md)
|
||||
- [master-plan-status.md](master-plan-status.md)
|
||||
|
||||
## Guiding Constraints
|
||||
|
||||
- bad memory is worse than no memory
|
||||
- trusted project state must remain highest priority
|
||||
- human-readable sources and machine storage stay separate
|
||||
- OpenClaw integration must not degrade OpenClaw baseline behavior
|
||||
91
docs/dalidou-deployment.md
Normal file
91
docs/dalidou-deployment.md
Normal file
@@ -0,0 +1,91 @@
|
||||
# Dalidou Deployment
|
||||
|
||||
## Purpose
|
||||
Deploy AtoCore on Dalidou as the canonical runtime and machine-memory host.
|
||||
|
||||
## Model
|
||||
|
||||
- Dalidou hosts the canonical AtoCore service.
|
||||
- OpenClaw on the T420 consumes AtoCore over network/Tailscale API.
|
||||
- `sources/vault` and `sources/drive` are read-only inputs by convention.
|
||||
- SQLite/Chroma machine state stays on Dalidou and is not treated as a sync peer.
|
||||
- The app and machine-storage host can be live before the long-term content
|
||||
corpus is fully populated.
|
||||
|
||||
## Directory layout
|
||||
|
||||
```text
|
||||
/srv/storage/atocore/
|
||||
app/ # deployed repo checkout
|
||||
data/
|
||||
db/
|
||||
chroma/
|
||||
cache/
|
||||
tmp/
|
||||
sources/
|
||||
vault/
|
||||
drive/
|
||||
logs/
|
||||
backups/
|
||||
run/
|
||||
```
|
||||
|
||||
## Compose workflow
|
||||
|
||||
The compose definition lives in:
|
||||
|
||||
```text
|
||||
deploy/dalidou/docker-compose.yml
|
||||
```
|
||||
|
||||
The Dalidou environment file should be copied to:
|
||||
|
||||
```text
|
||||
deploy/dalidou/.env
|
||||
```
|
||||
|
||||
starting from:
|
||||
|
||||
```text
|
||||
deploy/dalidou/.env.example
|
||||
```
|
||||
|
||||
## Deployment steps
|
||||
|
||||
1. Place the repository under `/srv/storage/atocore/app`.
|
||||
2. Create the canonical directories listed above.
|
||||
3. Copy `deploy/dalidou/.env.example` to `deploy/dalidou/.env`.
|
||||
4. Adjust the source paths if your AtoVault/AtoDrive mirrors live elsewhere.
|
||||
5. Run:
|
||||
|
||||
```bash
|
||||
cd /srv/storage/atocore/app/deploy/dalidou
|
||||
docker compose up -d --build
|
||||
```
|
||||
|
||||
6. Validate:
|
||||
|
||||
```bash
|
||||
curl http://127.0.0.1:8100/health
|
||||
curl http://127.0.0.1:8100/sources
|
||||
```
|
||||
|
||||
## Deferred
|
||||
|
||||
- backup automation
|
||||
- restore/snapshot tooling
|
||||
- reverse proxy / TLS exposure
|
||||
- automated source ingestion job
|
||||
- OpenClaw client wiring
|
||||
|
||||
## Current Reality Check
|
||||
|
||||
When this deployment is first brought up, the service may be healthy before the
|
||||
real corpus has been ingested.
|
||||
|
||||
That means:
|
||||
|
||||
- AtoCore the system can already be hosted on Dalidou
|
||||
- the canonical machine-data location can already be on Dalidou
|
||||
- but the live knowledge/content corpus may still be empty or only partially
|
||||
loaded until source ingestion is run
|
||||
61
docs/dalidou-storage-migration.md
Normal file
61
docs/dalidou-storage-migration.md
Normal file
@@ -0,0 +1,61 @@
|
||||
# Dalidou Storage Migration
|
||||
|
||||
## Goal
|
||||
Establish Dalidou as the canonical AtoCore host while keeping human-readable
|
||||
source layers separate from machine operational storage.
|
||||
|
||||
## Canonical layout
|
||||
|
||||
```text
|
||||
/srv/atocore/
|
||||
app/ # git checkout of this repository
|
||||
data/ # machine operational state
|
||||
db/
|
||||
atocore.db
|
||||
chroma/
|
||||
cache/
|
||||
tmp/
|
||||
sources/
|
||||
vault/ # AtoVault input, read-only by convention
|
||||
drive/ # AtoDrive input, read-only by convention
|
||||
logs/
|
||||
backups/
|
||||
run/
|
||||
config/
|
||||
.env
|
||||
```
|
||||
|
||||
## Environment variables
|
||||
|
||||
Suggested Dalidou values:
|
||||
|
||||
```bash
|
||||
ATOCORE_ENV=production
|
||||
ATOCORE_DATA_DIR=/srv/atocore/data
|
||||
ATOCORE_DB_DIR=/srv/atocore/data/db
|
||||
ATOCORE_CHROMA_DIR=/srv/atocore/data/chroma
|
||||
ATOCORE_CACHE_DIR=/srv/atocore/data/cache
|
||||
ATOCORE_TMP_DIR=/srv/atocore/data/tmp
|
||||
ATOCORE_VAULT_SOURCE_DIR=/srv/atocore/sources/vault
|
||||
ATOCORE_DRIVE_SOURCE_DIR=/srv/atocore/sources/drive
|
||||
ATOCORE_LOG_DIR=/srv/atocore/logs
|
||||
ATOCORE_BACKUP_DIR=/srv/atocore/backups
|
||||
ATOCORE_RUN_DIR=/srv/atocore/run
|
||||
```
|
||||
|
||||
## Migration notes
|
||||
|
||||
- Existing local installs remain backward-compatible.
|
||||
- If `data/atocore.db` already exists, AtoCore continues using it.
|
||||
- Fresh installs default to `data/db/atocore.db`.
|
||||
- Source directories are inputs only; AtoCore should ingest from them but not
|
||||
treat them as writable runtime state.
|
||||
- Avoid syncing live SQLite/Chroma state between Dalidou and other machines.
|
||||
Prefer one canonical running service and API access from OpenClaw.
|
||||
|
||||
## Deferred work
|
||||
|
||||
- service manager wiring
|
||||
- backup/snapshot procedures
|
||||
- automated source registration jobs
|
||||
- OpenClaw integration
|
||||
129
docs/ingestion-waves.md
Normal file
129
docs/ingestion-waves.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# AtoCore Ingestion Waves
|
||||
|
||||
## Purpose
|
||||
|
||||
This document tracks how the corpus should grow without losing signal quality.
|
||||
|
||||
The rule is:
|
||||
|
||||
- ingest in waves
|
||||
- validate retrieval after each wave
|
||||
- only then widen the source scope
|
||||
|
||||
## Wave 1 - Active Project Full Markdown Corpus
|
||||
|
||||
Status: complete
|
||||
|
||||
Projects:
|
||||
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
|
||||
What was ingested:
|
||||
|
||||
- the full markdown/text PKM stacks for the three active projects
|
||||
- selected staged operational docs already under the Dalidou source roots
|
||||
- selected repo markdown/text context for:
|
||||
- `Fullum-Interferometer`
|
||||
- `polisher-sim`
|
||||
- `Polisher-Toolhead` (when markdown exists)
|
||||
|
||||
What was intentionally excluded:
|
||||
|
||||
- binaries
|
||||
- images
|
||||
- PDFs
|
||||
- generated outputs unless they were plain text reports
|
||||
- dependency folders
|
||||
- hidden runtime junk
|
||||
|
||||
Practical result:
|
||||
|
||||
- AtoCore moved from a curated-seed corpus to a real active-project corpus
|
||||
- the live corpus now contains well over one thousand source documents and over
|
||||
twenty thousand chunks
|
||||
- project-specific context building is materially stronger than before
|
||||
|
||||
Main lesson from Wave 1:
|
||||
|
||||
- full project ingestion is valuable
|
||||
- but broad historical/archive material can dilute retrieval for underspecified
|
||||
prompts
|
||||
- context quality now depends more strongly on good project hints and better
|
||||
ranking than on corpus size alone
|
||||
|
||||
## Wave 2 - Trusted Operational Layer Expansion
|
||||
|
||||
Status: next
|
||||
|
||||
Goal:
|
||||
|
||||
- expand `AtoDrive`-style operational truth for the active projects
|
||||
|
||||
Candidate inputs:
|
||||
|
||||
- current status dashboards
|
||||
- decision logs
|
||||
- milestone tracking
|
||||
- curated requirements baselines
|
||||
- explicit next-step plans
|
||||
|
||||
Why this matters:
|
||||
|
||||
- this raises the quality of the high-trust layer instead of only widening
|
||||
general retrieval
|
||||
|
||||
## Wave 3 - Broader Active Engineering References
|
||||
|
||||
Status: planned
|
||||
|
||||
Goal:
|
||||
|
||||
- ingest reusable engineering references that support the active project set
|
||||
without dumping the entire vault
|
||||
|
||||
Candidate inputs:
|
||||
|
||||
- interferometry reference notes directly tied to `p05`
|
||||
- polishing physics references directly tied to `p06`
|
||||
- mirror and structural reference material directly tied to `p04`
|
||||
|
||||
Rule:
|
||||
|
||||
- only bring in references with a clear connection to active work
|
||||
|
||||
## Wave 4 - Wider PKM Population
|
||||
|
||||
Status: deferred
|
||||
|
||||
Goal:
|
||||
|
||||
- widen beyond the active projects while preserving retrieval quality
|
||||
|
||||
Preconditions:
|
||||
|
||||
- stronger ranking
|
||||
- better project-aware routing
|
||||
- stable operational restore path
|
||||
- clearer promotion rules for trusted state
|
||||
|
||||
## Validation After Each Wave
|
||||
|
||||
After every ingestion wave, verify:
|
||||
|
||||
- `stats`
|
||||
- project-specific `query`
|
||||
- project-specific `context-build`
|
||||
- `debug-context`
|
||||
- whether trusted project state still dominates when it should
|
||||
- whether cross-project bleed is getting worse or better
|
||||
|
||||
## Working Rule
|
||||
|
||||
The next wave should only happen when the current wave is:
|
||||
|
||||
- ingested
|
||||
- inspected
|
||||
- retrieval-tested
|
||||
- operationally stable
|
||||
155
docs/master-plan-status.md
Normal file
155
docs/master-plan-status.md
Normal file
@@ -0,0 +1,155 @@
|
||||
# AtoCore Master Plan Status
|
||||
|
||||
## Current Position
|
||||
|
||||
AtoCore is currently between **Phase 7** and **Phase 8**.
|
||||
|
||||
The platform is no longer just a proof of concept. The local engine exists, the
|
||||
core correctness pass is complete, Dalidou hosts the canonical runtime and
|
||||
machine database, and OpenClaw on the T420 can consume AtoCore safely in
|
||||
read-only additive mode.
|
||||
|
||||
## Phase Status
|
||||
|
||||
### Completed
|
||||
|
||||
- Phase 0 - Foundation
|
||||
- Phase 0.5 - Proof of Concept
|
||||
- Phase 1 - Ingestion
|
||||
|
||||
### Baseline Complete
|
||||
|
||||
- Phase 2 - Memory Core
|
||||
- Phase 3 - Retrieval
|
||||
- Phase 5 - Project State
|
||||
- Phase 7 - Context Builder
|
||||
|
||||
### Partial
|
||||
|
||||
- Phase 4 - Identity / Preferences
|
||||
- Phase 8 - OpenClaw Integration
|
||||
|
||||
### Baseline Complete (Phase 9)
|
||||
|
||||
- Phase 9 - Reflection (all three foundation commits landed:
|
||||
A capture, B reinforcement, C candidate extraction + review queue)
|
||||
|
||||
### Not Yet Complete In The Intended Sense
|
||||
|
||||
- Phase 6 - AtoDrive
|
||||
- Phase 10 - Write-back
|
||||
- Phase 11 - Multi-model
|
||||
- Phase 12 - Evaluation
|
||||
- Phase 13 - Hardening
|
||||
|
||||
### Engineering Layer Planning Sprint
|
||||
|
||||
The engineering layer is intentionally in planning, not implementation.
|
||||
The architecture docs below are the current state of that planning:
|
||||
|
||||
- [engineering-query-catalog.md](architecture/engineering-query-catalog.md) —
|
||||
the 20 v1-required queries the engineering layer must answer
|
||||
- [memory-vs-entities.md](architecture/memory-vs-entities.md) —
|
||||
canonical home split between memory and entity tables
|
||||
- [promotion-rules.md](architecture/promotion-rules.md) —
|
||||
Layer 0 → Layer 2 pipeline, triggers, review queue mechanics
|
||||
- [conflict-model.md](architecture/conflict-model.md) —
|
||||
detection, representation, and resolution of contradictory facts
|
||||
- [engineering-knowledge-hybrid-architecture.md](architecture/engineering-knowledge-hybrid-architecture.md) —
|
||||
the 5-layer model (from the previous planning wave)
|
||||
- [engineering-ontology-v1.md](architecture/engineering-ontology-v1.md) —
|
||||
the initial V1 object and relationship inventory (previous wave)
|
||||
|
||||
Still to draft before engineering-layer implementation begins:
|
||||
|
||||
- tool-handoff-boundaries.md (KB-CAD / KB-FEM read vs write)
|
||||
- human-mirror-rules.md (templates, triggers, edit flow)
|
||||
- representation-authority.md (PKM / KB / repo / AtoCore canonical home matrix)
|
||||
- engineering-v1-acceptance.md (done definition)
|
||||
|
||||
## What Is Real Today
|
||||
|
||||
- canonical AtoCore runtime on Dalidou
|
||||
- canonical machine DB and vector store on Dalidou
|
||||
- project registry with:
|
||||
- template
|
||||
- proposal preview
|
||||
- register
|
||||
- update
|
||||
- refresh
|
||||
- read-only additive OpenClaw helper on the T420
|
||||
- seeded project corpus for:
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
- conservative Trusted Project State for those active projects
|
||||
- first operational backup foundation for SQLite + project registry
|
||||
- implementation-facing architecture notes for future engineering knowledge work
|
||||
- first organic routing layer in OpenClaw via:
|
||||
- `detect-project`
|
||||
- `auto-context`
|
||||
|
||||
## Now
|
||||
|
||||
These are the current practical priorities.
|
||||
|
||||
1. Finish practical OpenClaw integration
|
||||
- make the helper lifecycle feel natural in daily use
|
||||
- use the new organic routing layer for project-knowledge questions
|
||||
- confirm fail-open behavior remains acceptable
|
||||
- keep AtoCore clearly additive
|
||||
2. Tighten retrieval quality
|
||||
- reduce cross-project competition
|
||||
- improve ranking on short or ambiguous prompts
|
||||
- add only a few anchor docs where retrieval is still weak
|
||||
3. Continue controlled ingestion
|
||||
- deepen active projects selectively
|
||||
- avoid noisy bulk corpus growth
|
||||
4. Strengthen operational boringness
|
||||
- backup and restore procedure
|
||||
- Chroma rebuild / backup policy
|
||||
- retention and restore validation
|
||||
|
||||
## Next
|
||||
|
||||
These are the next major layers after the current practical pass.
|
||||
|
||||
1. Clarify AtoDrive as a real operational truth layer
|
||||
2. Mature identity / preferences handling
|
||||
3. Improve observability for:
|
||||
- retrieval quality
|
||||
- context-pack inspection
|
||||
- comparison of behavior with and without AtoCore
|
||||
|
||||
## Later
|
||||
|
||||
These are the deliberate future expansions already supported by the architecture
|
||||
direction, but not yet ready for immediate implementation.
|
||||
|
||||
1. Minimal engineering knowledge layer
|
||||
- driven by `docs/architecture/engineering-knowledge-hybrid-architecture.md`
|
||||
- guided by `docs/architecture/engineering-ontology-v1.md`
|
||||
2. Minimal typed objects and relationships
|
||||
3. Evidence-linking and provenance-rich structured records
|
||||
4. Human mirror generation from structured state
|
||||
|
||||
## Not Yet
|
||||
|
||||
These remain intentionally deferred.
|
||||
|
||||
- automatic write-back from OpenClaw into AtoCore
|
||||
- automatic memory promotion
|
||||
- reflection loop integration
|
||||
- replacing OpenClaw's own memory system
|
||||
- live machine-DB sync between machines
|
||||
- full ontology / graph expansion before the current baseline is stable
|
||||
|
||||
## Working Rule
|
||||
|
||||
The next sensible implementation threshold for the engineering ontology work is:
|
||||
|
||||
- after the current ingestion, retrieval, registry, OpenClaw helper, organic
|
||||
routing, and backup baseline feels boring and dependable
|
||||
|
||||
Until then, the architecture docs should shape decisions, not force premature
|
||||
schema work.
|
||||
152
docs/next-steps.md
Normal file
152
docs/next-steps.md
Normal file
@@ -0,0 +1,152 @@
|
||||
# AtoCore Next Steps
|
||||
|
||||
## Current Position
|
||||
|
||||
AtoCore now has:
|
||||
|
||||
- canonical runtime and machine storage on Dalidou
|
||||
- separated source and machine-data boundaries
|
||||
- initial self-knowledge ingested into the live instance
|
||||
- trusted project-state entries for AtoCore itself
|
||||
- a first read-only OpenClaw integration path on the T420
|
||||
- a first real active-project corpus batch for:
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
|
||||
This working list should be read alongside:
|
||||
|
||||
- [master-plan-status.md](master-plan-status.md)
|
||||
|
||||
## Immediate Next Steps
|
||||
|
||||
1. Use the T420 `atocore-context` skill and the new organic routing layer in
|
||||
real OpenClaw workflows
|
||||
- confirm `auto-context` feels natural
|
||||
- confirm project inference is good enough in practice
|
||||
- confirm the fail-open behavior remains acceptable in practice
|
||||
2. Review retrieval quality after the first real project ingestion batch
|
||||
- check whether the top hits are useful
|
||||
- check whether trusted project state remains dominant
|
||||
- reduce cross-project competition and prompt ambiguity where needed
|
||||
- use `debug-context` to inspect the exact last AtoCore supplement
|
||||
3. Treat the active-project full markdown/text wave as complete
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
4. Define a cleaner source refresh model
|
||||
- make the difference between source truth, staged inputs, and machine store
|
||||
explicit
|
||||
- move toward a project source registry and refresh workflow
|
||||
- foundation now exists via project registry + per-project refresh API
|
||||
- registration policy + template + proposal + approved registration are now
|
||||
the normal path for new projects
|
||||
5. Move to Wave 2 trusted-operational ingestion
|
||||
- curated dashboards
|
||||
- decision logs
|
||||
- milestone/current-status views
|
||||
- operational truth, not just raw project notes
|
||||
6. Integrate the new engineering architecture docs into active planning, not immediate schema code
|
||||
- keep `docs/architecture/engineering-knowledge-hybrid-architecture.md` as the target layer model
|
||||
- keep `docs/architecture/engineering-ontology-v1.md` as the V1 structured-domain target
|
||||
- do not start entity/relationship persistence until the ingestion, retrieval, registry, and backup baseline feels boring and stable
|
||||
7. Define backup and export procedures for Dalidou
|
||||
- exercise the new SQLite + registry snapshot path on Dalidou
|
||||
- Chroma backup or rebuild policy
|
||||
- retention and restore validation
|
||||
- admin backup endpoint now supports `include_chroma` cold snapshot
|
||||
under the ingestion lock and `validate` confirms each snapshot is
|
||||
openable; remaining work is the operational retention policy
|
||||
8. Keep deeper automatic runtime integration modest until the organic read-only
|
||||
model has proven value
|
||||
|
||||
## Trusted State Status
|
||||
|
||||
The first conservative trusted-state promotion pass is now complete for:
|
||||
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
|
||||
Each project now has a small set of stable entries covering:
|
||||
|
||||
- summary
|
||||
- architecture or boundary decision
|
||||
- key constraints
|
||||
- current next focus
|
||||
|
||||
This materially improves `context/build` quality for project-hinted prompts.
|
||||
|
||||
## Recommended Near-Term Project Work
|
||||
|
||||
The active-project full markdown/text wave is now in.
|
||||
|
||||
The near-term work is now:
|
||||
|
||||
1. strengthen retrieval quality
|
||||
2. promote or refine trusted operational truth where the broad corpus is now too noisy
|
||||
3. keep trusted project state concise and high-confidence
|
||||
4. widen only through named ingestion waves
|
||||
|
||||
## Recommended Next Wave Inputs
|
||||
|
||||
Wave 2 should emphasize trusted operational truth, not bulk historical notes.
|
||||
|
||||
P04:
|
||||
|
||||
- current status dashboard
|
||||
- current selected design path
|
||||
- current frame interface truth
|
||||
- current next-step milestone view
|
||||
|
||||
P05:
|
||||
|
||||
- selected vendor path
|
||||
- current error-budget baseline
|
||||
- current architecture freeze or open decisions
|
||||
- current procurement / next-action view
|
||||
|
||||
P06:
|
||||
|
||||
- current system map
|
||||
- current shared contracts baseline
|
||||
- current calibration procedure truth
|
||||
- current July / proving roadmap view
|
||||
|
||||
## Deferred On Purpose
|
||||
|
||||
- automatic write-back from OpenClaw into AtoCore
|
||||
- automatic memory promotion
|
||||
- reflection loop integration
|
||||
- replacing OpenClaw's own memory system
|
||||
- syncing the live machine DB between machines
|
||||
|
||||
## Success Criteria For The Next Batch
|
||||
|
||||
The next batch is successful if:
|
||||
|
||||
- OpenClaw can use AtoCore naturally when context is needed
|
||||
- OpenClaw can infer registered projects and call AtoCore organically for
|
||||
project-knowledge questions
|
||||
- the active-project full corpus wave can be inspected and used concretely
|
||||
through `auto-context`, `context-build`, and `debug-context`
|
||||
- OpenClaw can also register a new project cleanly before refreshing it
|
||||
- existing project registrations can be refined safely before refresh when the
|
||||
staged source set evolves
|
||||
- AtoCore answers correctly for the active project set
|
||||
- retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
|
||||
- trusted project state remains concise and high confidence
|
||||
- project ingestion remains controlled rather than noisy
|
||||
- the canonical Dalidou instance stays stable
|
||||
|
||||
## Long-Run Goal
|
||||
|
||||
The long-run target is:
|
||||
|
||||
- continue working normally inside PKM project stacks and Gitea repos
|
||||
- let OpenClaw keep its own memory and runtime behavior
|
||||
- let AtoCore supplement LLM work with stronger trusted context, retrieval, and
|
||||
context assembly
|
||||
|
||||
That means AtoCore should behave like a durable external context engine and
|
||||
machine-memory layer, not a replacement for normal repo work or OpenClaw memory.
|
||||
157
docs/openclaw-integration-contract.md
Normal file
157
docs/openclaw-integration-contract.md
Normal file
@@ -0,0 +1,157 @@
|
||||
# OpenClaw Integration Contract
|
||||
|
||||
## Purpose
|
||||
|
||||
This document defines the first safe integration contract between OpenClaw and
|
||||
AtoCore.
|
||||
|
||||
The goal is to let OpenClaw consume AtoCore as an external context service
|
||||
without degrading OpenClaw's existing baseline behavior.
|
||||
|
||||
## Current Implemented State
|
||||
|
||||
The first safe integration foundation now exists on the T420 workspace:
|
||||
|
||||
- OpenClaw's own memory system is unchanged
|
||||
- a local read-only helper skill exists at:
|
||||
- `/home/papa/clawd/skills/atocore-context/`
|
||||
- the helper currently talks to the canonical Dalidou instance
|
||||
- the helper has verified:
|
||||
- `health`
|
||||
- `project-state`
|
||||
- `query`
|
||||
- `detect-project`
|
||||
- `auto-context`
|
||||
- fail-open fallback when AtoCore is unavailable
|
||||
|
||||
This means the network and workflow foundation is working, and the first
|
||||
organic routing layer now exists, even though deeper autonomous integration
|
||||
into OpenClaw runtime behavior is still deferred.
|
||||
|
||||
## Integration Principles
|
||||
|
||||
- OpenClaw remains the runtime and orchestration layer
|
||||
- AtoCore remains the context enrichment layer
|
||||
- AtoCore is optional at runtime
|
||||
- if AtoCore is unavailable, OpenClaw must continue operating normally
|
||||
- initial integration is read-only
|
||||
- OpenClaw should not automatically write memories, project state, or ingestion
|
||||
updates during the first integration batch
|
||||
|
||||
## First Safe Responsibilities
|
||||
|
||||
OpenClaw may use AtoCore for:
|
||||
|
||||
- health and readiness checks
|
||||
- context building for contextual prompts
|
||||
- retrieval/query support
|
||||
- project-state lookup when a project is detected
|
||||
- automatic project-context augmentation for project-knowledge questions
|
||||
|
||||
OpenClaw should not yet use AtoCore for:
|
||||
|
||||
- automatic memory write-back
|
||||
- automatic reflection
|
||||
- conflict resolution decisions
|
||||
- replacing OpenClaw's own memory system
|
||||
|
||||
## First API Surface
|
||||
|
||||
OpenClaw should treat these as the initial contract:
|
||||
|
||||
- `GET /health`
|
||||
- check service readiness
|
||||
- `GET /sources`
|
||||
- inspect source registration state
|
||||
- `POST /context/build`
|
||||
- ask AtoCore for a budgeted context pack
|
||||
- `POST /query`
|
||||
- use retrieval when useful
|
||||
|
||||
Additional project-state inspection can be added if needed, but the first
|
||||
integration should stay small and resilient.
|
||||
|
||||
## Current Helper Surface
|
||||
|
||||
The current helper script exposes:
|
||||
|
||||
- `health`
|
||||
- `sources`
|
||||
- `stats`
|
||||
- `projects`
|
||||
- `project-template`
|
||||
- `detect-project <prompt>`
|
||||
- `auto-context <prompt> [budget] [project]`
|
||||
- `debug-context`
|
||||
- `propose-project ...`
|
||||
- `register-project ...`
|
||||
- `update-project ...`
|
||||
- `refresh-project <project>`
|
||||
- `project-state <project>`
|
||||
- `query <prompt> [top_k]`
|
||||
- `context-build <prompt> [project] [budget]`
|
||||
- `ingest-sources`
|
||||
|
||||
This means OpenClaw can now use the full practical registry lifecycle for known
|
||||
projects without dropping down to raw API calls.
|
||||
|
||||
## Failure Behavior
|
||||
|
||||
OpenClaw must treat AtoCore as additive.
|
||||
|
||||
If AtoCore times out, returns an error, or is unavailable:
|
||||
|
||||
- OpenClaw should continue with its own normal baseline behavior
|
||||
- no hard dependency should block the user's run
|
||||
- no partially written AtoCore state should be assumed
|
||||
|
||||
## Suggested OpenClaw Configuration
|
||||
|
||||
OpenClaw should eventually expose configuration like:
|
||||
|
||||
- `ATOCORE_ENABLED`
|
||||
- `ATOCORE_BASE_URL`
|
||||
- `ATOCORE_TIMEOUT_MS`
|
||||
- `ATOCORE_FAIL_OPEN`
|
||||
|
||||
Recommended first behavior:
|
||||
|
||||
- enabled only when configured
|
||||
- low timeout
|
||||
- fail open by default
|
||||
- no writeback enabled
|
||||
|
||||
## Suggested Usage Pattern
|
||||
|
||||
1. OpenClaw receives a user request
|
||||
2. If the prompt looks like project knowledge, OpenClaw should try:
|
||||
- `auto-context "<prompt>" 3000`
|
||||
- optionally `debug-context` immediately after if a human wants to inspect
|
||||
the exact AtoCore supplement
|
||||
3. If the prompt is clearly asking for trusted current truth, OpenClaw should
|
||||
prefer:
|
||||
- `project-state <project>`
|
||||
4. If the user explicitly asked for source refresh or ingestion, OpenClaw
|
||||
should use:
|
||||
- `refresh-project <id>`
|
||||
5. If AtoCore returns usable context, OpenClaw includes it
|
||||
6. If AtoCore fails, returns `no_project_match`, or is unavailable, OpenClaw
|
||||
proceeds normally
|
||||
|
||||
## Deferred Work
|
||||
|
||||
- deeper automatic runtime wiring inside OpenClaw itself
|
||||
- memory promotion rules
|
||||
- identity and preference write flows
|
||||
- reflection loop
|
||||
- automatic ingestion requests from OpenClaw
|
||||
- write-back policy
|
||||
- conflict-resolution integration
|
||||
|
||||
## Precondition Before Wider Ingestion
|
||||
|
||||
Before bulk ingestion of projects or ecosystem notes:
|
||||
|
||||
- the AtoCore service should be reachable from the T420
|
||||
- the OpenClaw failure fallback path should be confirmed
|
||||
- the initial contract should be documented and stable
|
||||
142
docs/operating-model.md
Normal file
142
docs/operating-model.md
Normal file
@@ -0,0 +1,142 @@
|
||||
# AtoCore Operating Model
|
||||
|
||||
## Purpose
|
||||
|
||||
This document makes the intended day-to-day operating model explicit.
|
||||
|
||||
The goal is not to replace how work already happens. The goal is to make that
|
||||
existing workflow stronger by adding a durable context engine.
|
||||
|
||||
## Core Idea
|
||||
|
||||
Normal work continues in:
|
||||
|
||||
- PKM project notes
|
||||
- Gitea repositories
|
||||
- Discord and OpenClaw workflows
|
||||
|
||||
OpenClaw keeps:
|
||||
|
||||
- its own memory
|
||||
- its own runtime and orchestration behavior
|
||||
- its own workspace and direct file/repo tooling
|
||||
|
||||
AtoCore adds:
|
||||
|
||||
- trusted project state
|
||||
- retrievable cross-source context
|
||||
- durable machine memory
|
||||
- context assembly that improves prompt quality and robustness
|
||||
|
||||
## Layer Responsibilities
|
||||
|
||||
- PKM and repos
|
||||
- human-authoritative project sources
|
||||
- where knowledge is created, edited, reviewed, and maintained
|
||||
- OpenClaw
|
||||
- active operating environment
|
||||
- orchestration, direct repo work, messaging, agent workflows, local memory
|
||||
- AtoCore
|
||||
- compiled context engine
|
||||
- durable machine-memory host
|
||||
- retrieval and context assembly layer
|
||||
|
||||
## Why This Architecture Works
|
||||
|
||||
Each layer has different strengths and weaknesses.
|
||||
|
||||
- PKM and repos are rich but noisy and manual to search
|
||||
- OpenClaw memory is useful but session-shaped and not the whole project record
|
||||
- raw LLM repo work is powerful but can miss trusted broader context
|
||||
- AtoCore can compile context across sources and provide a better prompt input
|
||||
|
||||
The result should be:
|
||||
|
||||
- stronger prompts
|
||||
- more robust outputs
|
||||
- less manual reconstruction
|
||||
- better continuity across sessions and models
|
||||
|
||||
## What AtoCore Should Not Replace
|
||||
|
||||
AtoCore should not replace:
|
||||
|
||||
- normal file reads
|
||||
- direct repo search
|
||||
- direct PKM work
|
||||
- OpenClaw's own memory
|
||||
- OpenClaw's runtime and tool behavior
|
||||
|
||||
It should supplement those systems.
|
||||
|
||||
## What Healthy Usage Looks Like
|
||||
|
||||
When working on a project:
|
||||
|
||||
1. OpenClaw still uses local workspace/repo context
|
||||
2. OpenClaw still uses its own memory
|
||||
3. AtoCore adds:
|
||||
- trusted current project state
|
||||
- retrieved project documents
|
||||
- cross-source project context
|
||||
- context assembly for more robust model prompts
|
||||
|
||||
## Practical Rule
|
||||
|
||||
Think of AtoCore as the durable external context hard drive for LLM work:
|
||||
|
||||
- fast machine-readable context
|
||||
- persistent project understanding
|
||||
- stronger prompt inputs
|
||||
- no need to replace the normal project workflow
|
||||
|
||||
That is the architecture target.
|
||||
|
||||
## Why The Staged Markdown Exists
|
||||
|
||||
The staged markdown on Dalidou is a source-input layer, not the end product of
|
||||
the system.
|
||||
|
||||
In the current deployment model:
|
||||
|
||||
1. selected PKM, AtoDrive, or repo docs are copied or mirrored into a Dalidou
|
||||
source path
|
||||
2. AtoCore ingests them
|
||||
3. the machine store keeps the processed representation
|
||||
4. retrieval and context building operate on that machine store
|
||||
|
||||
So if the staged docs look very similar to your original PKM notes, that is
|
||||
expected. They are source material, not the compiled context layer itself.
|
||||
|
||||
## What Happens When A Source Changes
|
||||
|
||||
If you edit a PKM note or repo doc at the original source, AtoCore does not
|
||||
magically know yet.
|
||||
|
||||
The current model is refresh-based:
|
||||
|
||||
1. update the human-authoritative source
|
||||
2. refresh or re-stage the relevant project source set on Dalidou
|
||||
3. run ingestion again
|
||||
4. let AtoCore update the machine representation
|
||||
|
||||
This is still an intermediate workflow. The long-run target is a cleaner source
|
||||
registry and refresh model so that commands like `refresh p05-interferometer`
|
||||
become natural and reliable.
|
||||
|
||||
## Current Scope Of Ingestion
|
||||
|
||||
The current project corpus is intentionally selective, not exhaustive.
|
||||
|
||||
For active projects, the goal right now is to ingest:
|
||||
|
||||
- high-value anchor docs
|
||||
- strong meeting notes with real decisions
|
||||
- architecture and constraints docs
|
||||
- selected repo context that explains the system shape
|
||||
|
||||
The goal is not to dump the entire PKM or whole repo tree into AtoCore on the
|
||||
first pass.
|
||||
|
||||
So if a project only has some curated notes and not the full project universe in
|
||||
the staged area yet, that is normal for the current phase.
|
||||
321
docs/phase9-first-real-use.md
Normal file
321
docs/phase9-first-real-use.md
Normal file
@@ -0,0 +1,321 @@
|
||||
# Phase 9 First Real Use Report
|
||||
|
||||
## What this is
|
||||
|
||||
The first empirical exercise of the Phase 9 reflection loop after
|
||||
Commits A, B, and C all landed. The goal is to find out where the
|
||||
extractor and the reinforcement matcher actually behave well versus
|
||||
where their behaviour drifts from the design intent.
|
||||
|
||||
The validation is reproducible. To re-run:
|
||||
|
||||
```bash
|
||||
python scripts/phase9_first_real_use.py
|
||||
```
|
||||
|
||||
This writes an isolated SQLite + Chroma store under
|
||||
`data/validation/phase9-first-use/` (gitignored), seeds three active
|
||||
memories, then runs eight sample interactions through the full
|
||||
capture → reinforce → extract pipeline.
|
||||
|
||||
## What we ran
|
||||
|
||||
Eight synthetic interactions, each paraphrased from a real working
|
||||
session about AtoCore itself or the active engineering projects:
|
||||
|
||||
| # | Label | Project | Expected |
|
||||
|---|--------------------------------------|----------------------|---------------------------|
|
||||
| 1 | exdev-mount-merge-decision | atocore | 1 decision_heading |
|
||||
| 2 | ownership-was-the-real-fix | atocore | 1 fact_heading |
|
||||
| 3 | memory-vs-entity-canonical-home | atocore | 1 decision_heading (long) |
|
||||
| 4 | auto-promotion-deferred | atocore | 1 decision_heading |
|
||||
| 5 | preference-rebase-workflow | atocore | 1 preference_sentence |
|
||||
| 6 | constraint-from-doc-cite | p05-interferometer | 1 constraint_heading |
|
||||
| 7 | prose-only-no-cues | atocore | 0 candidates |
|
||||
| 8 | multiple-cues-in-one-interaction | p06-polisher | 3 distinct rules |
|
||||
|
||||
Plus 3 seed memories were inserted before the run:
|
||||
|
||||
- `pref_rebase`: "prefers rebase-based workflows because history stays linear" (preference, 0.6)
|
||||
- `pref_concise`: "writes commit messages focused on the why, not the what" (preference, 0.6)
|
||||
- `identity_runs_atocore`: "mechanical engineer who runs AtoCore for context engineering" (identity, 0.9)
|
||||
|
||||
## What happened — extraction (the good news)
|
||||
|
||||
**Every extraction expectation was met exactly.** All eight samples
|
||||
produced the predicted candidate count and the predicted rule
|
||||
classifications:
|
||||
|
||||
| Sample | Expected | Got | Pass |
|
||||
|---------------------------------------|----------|-----|------|
|
||||
| exdev-mount-merge-decision | 1 | 1 | ✅ |
|
||||
| ownership-was-the-real-fix | 1 | 1 | ✅ |
|
||||
| memory-vs-entity-canonical-home | 1 | 1 | ✅ |
|
||||
| auto-promotion-deferred | 1 | 1 | ✅ |
|
||||
| preference-rebase-workflow | 1 | 1 | ✅ |
|
||||
| constraint-from-doc-cite | 1 | 1 | ✅ |
|
||||
| prose-only-no-cues | **0** | **0** | ✅ |
|
||||
| multiple-cues-in-one-interaction | 3 | 3 | ✅ |
|
||||
|
||||
**Total: 9 candidates from 8 interactions, 0 false positives, 0 misses
|
||||
on heading patterns or sentence patterns.**
|
||||
|
||||
The extractor's strictness is well-tuned for the kinds of structural
|
||||
cues we actually use. Things worth noting:
|
||||
|
||||
- **Sample 7 (`prose-only-no-cues`) produced zero candidates as
|
||||
designed.** This is the most important sanity check — it confirms
|
||||
the extractor won't fill the review queue with general prose when
|
||||
there's no structural intent.
|
||||
- **Sample 3's long content was preserved without truncation.** The
|
||||
280-char max wasn't hit, and the content kept its full meaning.
|
||||
- **Sample 8 produced three distinct rules in one interaction**
|
||||
(decision_heading, constraint_heading, requirement_heading) without
|
||||
the dedup key collapsing them. The dedup key is
|
||||
`(memory_type, normalized_content, rule)` and the three are all
|
||||
different on at least one axis, so they coexist as expected.
|
||||
- **The prose around each heading was correctly ignored.** Sample 6
|
||||
has a second sentence ("the error budget allocates 6 nm to the
|
||||
laser source...") that does NOT have a structural cue, and the
|
||||
extractor correctly didn't fire on it.
|
||||
|
||||
## What happened — reinforcement (the empirical finding)
|
||||
|
||||
**Reinforcement matched zero seeded memories across all 8 samples,
|
||||
even when the response clearly echoed the seed.**
|
||||
|
||||
Sample 5's response was:
|
||||
|
||||
> *"I prefer rebase-based workflows because the history stays linear
|
||||
> and reviewers have an easier time."*
|
||||
|
||||
The seeded `pref_rebase` memory was:
|
||||
|
||||
> *"prefers rebase-based workflows because history stays linear"*
|
||||
|
||||
A human reading both says these are the same fact. The reinforcement
|
||||
matcher disagrees. After all 8 interactions:
|
||||
|
||||
```
|
||||
pref_rebase: confidence=0.6000 refs=0 last=-
|
||||
pref_concise: confidence=0.6000 refs=0 last=-
|
||||
identity_runs_atocore: confidence=0.9000 refs=0 last=-
|
||||
```
|
||||
|
||||
**Nothing moved.** This is the most important finding from this
|
||||
validation pass.
|
||||
|
||||
### Why the matcher missed it
|
||||
|
||||
The current `_memory_matches` rule (in
|
||||
`src/atocore/memory/reinforcement.py`) does a normalized substring
|
||||
match: it lowercases both sides, collapses whitespace, then asks
|
||||
"does the leading 80-char window of the memory content appear as a
|
||||
substring in the response?"
|
||||
|
||||
For the rebase example:
|
||||
|
||||
- needle (normalized): `prefers rebase-based workflows because history stays linear`
|
||||
- haystack (normalized): `i prefer rebase-based workflows because the history stays linear and reviewers have an easier time.`
|
||||
|
||||
The needle starts with `prefers` (with the trailing `s`), and the
|
||||
haystack has `prefer` (without the `s`, because of the first-person
|
||||
voice). And the needle has `because history stays linear`, while the
|
||||
haystack has `because the history stays linear`. **Two small natural
|
||||
paraphrases, and the substring fails.**
|
||||
|
||||
This isn't a bug in the matcher's implementation — it's doing
|
||||
exactly what it was specified to do. It's a design limitation: the
|
||||
substring rule is too brittle for real prose, where the same fact
|
||||
gets re-stated with different verb forms, articles, and word order.
|
||||
|
||||
### Severity
|
||||
|
||||
**Medium-high.** Reinforcement is the entire point of Commit B.
|
||||
A reinforcement matcher that never fires on natural paraphrases
|
||||
will leave seeded memories with stale confidence forever. The
|
||||
reflection loop runs but it doesn't actually reinforce anything.
|
||||
That hollows out the value of having reinforcement at all.
|
||||
|
||||
It is not a critical bug because:
|
||||
- Nothing breaks. The pipeline still runs cleanly.
|
||||
- Reinforcement is supposed to be a *signal*, not the only path to
|
||||
high confidence — humans can still curate confidence directly.
|
||||
- The candidate-extraction path (Commit C) is unaffected and works
|
||||
perfectly.
|
||||
|
||||
But it does need to be addressed before Phase 9 can be considered
|
||||
operationally complete.
|
||||
|
||||
## Recommended fix (deferred to a follow-up commit)
|
||||
|
||||
Replace the substring matcher with a token-overlap matcher. The
|
||||
specification:
|
||||
|
||||
1. Tokenize both memory content and response into lowercase words
|
||||
of length >= 3, dropping a small stop list (`the`, `a`, `an`,
|
||||
`and`, `or`, `of`, `to`, `is`, `was`, `that`, `this`, `with`,
|
||||
`for`, `from`, `into`).
|
||||
2. Stem aggressively (or at minimum, fold trailing `s` and `ed`
|
||||
so `prefers`/`prefer`/`preferred` collapse to one token).
|
||||
3. A match exists if **at least 70% of the memory's content
|
||||
tokens** appear in the response token set.
|
||||
4. Memory content must still be at least `_MIN_MEMORY_CONTENT_LENGTH`
|
||||
characters to be considered.
|
||||
|
||||
This is more permissive than the substring rule but still tight
|
||||
enough to avoid spurious matches on generic words. It would have
|
||||
caught the rebase example because:
|
||||
|
||||
- memory tokens (after stop-list and stemming):
|
||||
`{prefer, rebase-bas, workflow, because, history, stay, linear}`
|
||||
- response tokens:
|
||||
`{prefer, rebase-bas, workflow, because, history, stay, linear,
|
||||
reviewer, easi, time}`
|
||||
- overlap: 7 / 7 memory tokens = 100% > 70% threshold → match
|
||||
|
||||
### Why not fix it in this report
|
||||
|
||||
Three reasons:
|
||||
|
||||
1. The validation report is supposed to be evidence, not a fix
|
||||
spec. A separate commit will introduce the new matcher with
|
||||
its own tests.
|
||||
2. The token-overlap matcher needs its own design review for edge
|
||||
cases (very long memories, very short responses, technical
|
||||
abbreviations, code snippets in responses).
|
||||
3. Mixing the report and the fix into one commit would muddle the
|
||||
audit trail. The report is the empirical evidence; the fix is
|
||||
the response.
|
||||
|
||||
The fix is queued as the next Phase 9 maintenance commit and is
|
||||
flagged in the next-steps section below.
|
||||
|
||||
## Other observations
|
||||
|
||||
### Extraction is conservative on purpose, and that's working
|
||||
|
||||
Sample 7 is the most important data point in the whole run.
|
||||
A natural prose response with no structural cues produced zero
|
||||
candidates. **This is exactly the design intent** — the extractor
|
||||
should be loud about explicit decisions/constraints/requirements
|
||||
and quiet about everything else. If the extractor were too loose
|
||||
the review queue would fill up with low-value items and the human
|
||||
would stop reviewing.
|
||||
|
||||
After this run I have measurably more confidence that the V0 rule
|
||||
set is the right starting point. Future rules can be added one at
|
||||
a time as we see specific patterns the extractor misses, instead of
|
||||
guessing at what might be useful.
|
||||
|
||||
### Confidence on candidates
|
||||
|
||||
All extracted candidates landed at the default `confidence=0.5`,
|
||||
which is what the extractor is currently hardcoded to do. The
|
||||
`promotion-rules.md` doc proposes a per-rule prior with a
|
||||
structural-signal multiplier and freshness bonus. None of that is
|
||||
implemented yet. The validation didn't reveal any urgency around
|
||||
this — humans review the candidates either way — but it confirms
|
||||
that the priors-and-multipliers refinement is a reasonable next
|
||||
step rather than a critical one.
|
||||
|
||||
### Multiple cues in one interaction
|
||||
|
||||
Sample 8 confirmed an important property: **three structural
|
||||
cues in the same response do not collide in dedup**. The dedup
|
||||
key is `(memory_type, normalized_content, rule)`, and since each
|
||||
cue produced a distinct (type, content, rule) tuple, all three
|
||||
landed cleanly.
|
||||
|
||||
This matters because real working sessions naturally bundle
|
||||
multiple decisions/constraints/requirements into one summary.
|
||||
The extractor handles those bundles correctly.
|
||||
|
||||
### Project scoping
|
||||
|
||||
Each candidate carries the `project` from the source interaction
|
||||
into its own `project` field. Sample 6 (p05) and sample 8 (p06)
|
||||
both produced candidates with the right project. This is
|
||||
non-obvious because the extractor module never explicitly looks
|
||||
at project — it inherits from the interaction it's scanning. Worth
|
||||
keeping in mind when the entity extractor is built: same pattern
|
||||
should apply.
|
||||
|
||||
## What this validates and what it doesn't
|
||||
|
||||
### Validates
|
||||
|
||||
- The Phase 9 Commit C extractor's rule set is well-tuned for
|
||||
hand-written structural cues
|
||||
- The dedup logic does the right thing across multiple cues
|
||||
- The "drop candidates that match an existing active memory" filter
|
||||
works (would have been visible if any seeded memory had matched
|
||||
one of the heading texts — none did, but the code path is the
|
||||
same one that's covered in `tests/test_extractor.py`)
|
||||
- The `prose-only-no-cues` no-fire case is solid
|
||||
- Long content is preserved without truncation
|
||||
- Project scoping flows through the pipeline
|
||||
|
||||
### Does NOT validate
|
||||
|
||||
- The reinforcement matcher (clearly, since it caught nothing)
|
||||
- The behaviour against very long documents (each sample was
|
||||
under 700 chars; real interaction responses can be 10× that)
|
||||
- The behaviour against responses that contain code blocks (the
|
||||
extractor's regex rules don't handle code-block fenced sections
|
||||
specially)
|
||||
- Cross-interaction promotion-to-active flow (no candidate was
|
||||
promoted in this run; the lifecycle is covered by the unit tests
|
||||
but not by this empirical exercise)
|
||||
- The behaviour at scale: 8 interactions is a one-shot. We need
|
||||
to see the queue after 50+ before judging reviewer ergonomics.
|
||||
|
||||
### Recommended next empirical exercises
|
||||
|
||||
1. **Real conversation capture**, using a slash command from a
|
||||
real Claude Code session against either a local or Dalidou
|
||||
AtoCore instance. The synthetic responses in this script are
|
||||
honest paraphrases but they're still hand-curated.
|
||||
2. **Bulk capture from existing PKM**, ingesting a few real
|
||||
project notes through the extractor as if they were
|
||||
interactions. This stresses the rules against documents that
|
||||
weren't written with the extractor in mind.
|
||||
3. **Reinforcement matcher rerun** after the token-overlap
|
||||
matcher lands.
|
||||
|
||||
## Action items from this report
|
||||
|
||||
- [ ] **Fix reinforcement matcher** with token-overlap rule
|
||||
described in the "Recommended fix" section above. Owner:
|
||||
next session. Severity: medium-high.
|
||||
- [x] **Document the extractor's V0 strictness** as a working
|
||||
property, not a limitation. Sample 7 makes the case.
|
||||
- [ ] **Build the slash command** so the next validation run
|
||||
can use real (not synthetic) interactions. Tracked in
|
||||
Session 2 of the current planning sprint.
|
||||
- [ ] **Run a 50+ interaction batch** to evaluate reviewer
|
||||
ergonomics. Deferred until the slash command exists.
|
||||
|
||||
## Reproducibility
|
||||
|
||||
The script is deterministic. Re-running it will produce
|
||||
identical results because:
|
||||
|
||||
- the data dir is wiped on every run
|
||||
- the sample interactions are constants
|
||||
- memory uuids are generated non-deterministically, but the fields this
  report depends on (content, type, count, rule) are fully deterministic
|
||||
- the `data/validation/phase9-first-use/` directory is gitignored,
|
||||
so no state leaks across runs
|
||||
|
||||
To reproduce this exact report:
|
||||
|
||||
```bash
|
||||
python scripts/phase9_first_real_use.py
|
||||
```
|
||||
|
||||
To get JSON output for downstream tooling:
|
||||
|
||||
```bash
|
||||
python scripts/phase9_first_real_use.py --json
|
||||
```
|
||||
129
docs/project-registration-policy.md
Normal file
129
docs/project-registration-policy.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# AtoCore Project Registration Policy
|
||||
|
||||
## Purpose
|
||||
|
||||
This document defines the normal path for adding a new project to AtoCore and
|
||||
for safely updating an existing registration later.
|
||||
|
||||
The goal is to make `register + refresh` the standard workflow instead of
|
||||
relying on long custom ingestion prompts every time.
|
||||
|
||||
## What Registration Means
|
||||
|
||||
Registering a project does not ingest it by itself.
|
||||
|
||||
Registration means:
|
||||
|
||||
- the project gets a canonical AtoCore id
|
||||
- known aliases are recorded
|
||||
- the staged source roots for that project are defined
|
||||
- AtoCore and OpenClaw can later refresh that project consistently
|
||||
|
||||
Updating a project means:
|
||||
|
||||
- aliases can be corrected or expanded
|
||||
- the short registry description can be improved
|
||||
- ingest roots can be adjusted deliberately
|
||||
- the canonical project id remains stable
|
||||
|
||||
## Required Fields
|
||||
|
||||
Each project registry entry must include:
|
||||
|
||||
- `id`
|
||||
- stable canonical project id
|
||||
- prefer lowercase kebab-case
|
||||
- examples:
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
- `aliases`
|
||||
- short common names or abbreviations
|
||||
- examples:
|
||||
- `p05`
|
||||
- `interferometer`
|
||||
- `description`
|
||||
- short explanation of what the registered source set represents
|
||||
- `ingest_roots`
|
||||
- one or more staged roots under configured source layers
|
||||
|
||||
## Allowed Source Roots
|
||||
|
||||
Current allowed `source` values are:
|
||||
|
||||
- `vault`
|
||||
- `drive`
|
||||
|
||||
These map to the configured Dalidou source boundaries.
|
||||
|
||||
## Recommended Registration Rules
|
||||
|
||||
1. Prefer one canonical project id
|
||||
2. Keep aliases short and practical
|
||||
3. Start with the smallest useful staged roots
|
||||
4. Prefer curated high-signal docs before broad corpora
|
||||
5. Keep repo context selective at first
|
||||
6. Avoid registering noisy or generated trees
|
||||
7. Use `drive` for trusted operational material when available
|
||||
8. Use `vault` for curated staged PKM and repo-doc snapshots
|
||||
|
||||
## Normal Workflow
|
||||
|
||||
For a new project:
|
||||
|
||||
1. stage the initial source docs on Dalidou
|
||||
2. inspect the expected shape with:
|
||||
- `GET /projects/template`
|
||||
- or `atocore.sh project-template`
|
||||
3. preview the entry without mutating state:
|
||||
- `POST /projects/proposal`
|
||||
- or `atocore.sh propose-project ...`
|
||||
4. register the approved entry:
|
||||
- `POST /projects/register`
|
||||
- or `atocore.sh register-project ...`
|
||||
5. verify the entry with:
|
||||
- `GET /projects`
|
||||
- or the T420 helper `atocore.sh projects`
|
||||
6. refresh it with:
|
||||
- `POST /projects/{id}/refresh`
|
||||
- or `atocore.sh refresh-project <id>`
|
||||
7. verify retrieval and context quality
|
||||
8. only later promote stable facts into Trusted Project State
|
||||
|
||||
For an existing registered project:
|
||||
|
||||
1. inspect the current entry with:
|
||||
- `GET /projects`
|
||||
- or `atocore.sh projects`
|
||||
2. update the registration if aliases, description, or roots need refinement:
|
||||
- `PUT /projects/{id}`
|
||||
3. verify the updated entry
|
||||
4. refresh the project again
|
||||
5. verify retrieval and context quality did not regress
|
||||
|
||||
## What Not To Do
|
||||
|
||||
Do not:
|
||||
|
||||
- register giant noisy trees blindly
|
||||
- treat registration as equivalent to trusted state
|
||||
- dump the full PKM by default
|
||||
- rely on aliases that collide across projects
|
||||
- use the live machine DB as a source root
|
||||
|
||||
## Template
|
||||
|
||||
Use:
|
||||
|
||||
- [project-registry.example.json](../config/project-registry.example.json)
|
||||
|
||||
And the API template endpoint:
|
||||
|
||||
- `GET /projects/template`
|
||||
|
||||
Other lifecycle endpoints:
|
||||
|
||||
- `POST /projects/proposal`
|
||||
- `POST /projects/register`
|
||||
- `PUT /projects/{id}`
|
||||
- `POST /projects/{id}/refresh`
|
||||
103
docs/source-refresh-model.md
Normal file
103
docs/source-refresh-model.md
Normal file
@@ -0,0 +1,103 @@
|
||||
# AtoCore Source Refresh Model
|
||||
|
||||
## Purpose
|
||||
|
||||
This document explains how human-authored project material should flow into the
|
||||
Dalidou-hosted AtoCore machine store.
|
||||
|
||||
It exists to make one distinction explicit:
|
||||
|
||||
- source markdown is not the same thing as the machine-memory layer
|
||||
- source refresh is how changes in PKM or repos become visible to AtoCore
|
||||
|
||||
## Current Model
|
||||
|
||||
Today, the flow is:
|
||||
|
||||
1. human-authoritative project material exists in PKM, AtoDrive, and repos
|
||||
2. selected high-value files are staged into Dalidou source paths
|
||||
3. AtoCore ingests those source files
|
||||
4. AtoCore stores the processed representation in:
|
||||
- document records
|
||||
- chunks
|
||||
- vectors
|
||||
- project memory
|
||||
- trusted project state
|
||||
5. retrieval and context assembly use the machine store, not the staged folder
|
||||
|
||||
## Why This Feels Redundant
|
||||
|
||||
The staged source files can look almost identical to the original PKM notes or
|
||||
repo docs because they are still source material.
|
||||
|
||||
That is expected.
|
||||
|
||||
The staged source area exists because the canonical AtoCore instance on Dalidou
|
||||
needs a server-visible path to ingest from.
|
||||
|
||||
## What Happens When A Project Source Changes
|
||||
|
||||
If you edit a note in PKM or a doc in a repo:
|
||||
|
||||
- the original source changes immediately
|
||||
- the staged Dalidou copy does not change automatically
|
||||
- the AtoCore machine store also does not change automatically
|
||||
|
||||
To refresh AtoCore:
|
||||
|
||||
1. select the updated project source set
|
||||
2. copy or mirror the new version into the Dalidou source area
|
||||
3. run ingestion again
|
||||
4. verify that retrieval and context reflect the new material
|
||||
|
||||
## Current Intentional Limits
|
||||
|
||||
The current active-project ingestion strategy is selective.
|
||||
|
||||
That means:
|
||||
|
||||
- not every note from a project is staged
|
||||
- not every repo file is staged
|
||||
- the goal is to start with high-value anchor docs
|
||||
- broader ingestion comes later if needed
|
||||
|
||||
This is why the staged source area for a project may look partial or uneven at
|
||||
this stage.
|
||||
|
||||
## Long-Run Target
|
||||
|
||||
The long-run workflow should become much more natural:
|
||||
|
||||
- each project has a registered source map
|
||||
- PKM root
|
||||
- AtoDrive root
|
||||
- repo root
|
||||
- preferred docs
|
||||
- excluded noisy paths
|
||||
- a command like `refresh p06-polisher` resolves the right sources
|
||||
- AtoCore refreshes the machine representation cleanly
|
||||
- OpenClaw consumes the improved context over API
|
||||
|
||||
## Current Foundation
|
||||
|
||||
The first concrete foundation for this now exists in AtoCore:
|
||||
|
||||
- a project registry file records known project ids, aliases, and ingest roots
|
||||
- the API can list those registered projects
|
||||
- the API can return a registration template for new projects
|
||||
- the API can preview a proposed registration before writing it
|
||||
- the API can persist an approved registration to the registry
|
||||
- the API can refresh a single registered project from its configured roots
|
||||
|
||||
This is not full source automation yet, but it gives the refresh model a real
|
||||
home in the system.
|
||||
|
||||
## Healthy Mental Model
|
||||
|
||||
Use this distinction:
|
||||
|
||||
- PKM / AtoDrive / repos = human-authoritative sources
|
||||
- staged Dalidou markdown = server-visible ingestion inputs
|
||||
- AtoCore DB/vector state = compiled machine context layer
|
||||
|
||||
That separation is intentional and healthy.
|
||||
393
scripts/phase9_first_real_use.py
Normal file
393
scripts/phase9_first_real_use.py
Normal file
@@ -0,0 +1,393 @@
|
||||
"""Phase 9 first-real-use validation script.
|
||||
|
||||
Captures a small set of representative interactions drawn from a real
|
||||
working session, runs the full Phase 9 loop (capture -> reinforce ->
|
||||
extract) over them, and prints what each step produced. The intent is
|
||||
to generate empirical evidence about the extractor's behaviour against
|
||||
prose that wasn't written to make the test pass.
|
||||
|
||||
Usage:
|
||||
python scripts/phase9_first_real_use.py [--data-dir PATH]
|
||||
|
||||
The script writes a fresh isolated SQLite + Chroma store under the
|
||||
given data dir (default: ./data/validation/phase9-first-use). The
|
||||
data dir is gitignored so the script can be re-run cleanly.
|
||||
|
||||
Each interaction is printed with:
|
||||
- the captured interaction id
|
||||
- the reinforcement results (which seeded memories were echoed)
|
||||
- the extraction results (which candidates were proposed and why)
|
||||
- notes on what the extractor MISSED (manually annotated below)
|
||||
|
||||
The output is intentionally human-readable so the run can be saved as
|
||||
the body of docs/phase9-first-real-use.md.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import shutil
|
||||
import sys
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
|
||||
# Make the package importable when run directly from the repo root.
|
||||
_REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
sys.path.insert(0, str(_REPO_ROOT / "src"))
|
||||
|
||||
|
||||
@dataclass
|
||||
class SampleInteraction:
|
||||
label: str
|
||||
prompt: str
|
||||
response: str
|
||||
project: str
|
||||
notes_for_human: str
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Sample interactions, paraphrased from real working sessions on AtoCore
|
||||
# itself. Each one is a real-feeling exchange with at least one structural
|
||||
# cue the extractor should catch, plus prose the extractor should NOT
|
||||
# accidentally catch.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
SAMPLES: list[SampleInteraction] = [
|
||||
SampleInteraction(
|
||||
label="exdev-mount-merge-decision",
|
||||
project="atocore",
|
||||
prompt="The TMPDIR fix didn't work, the LOCAL_COPY_PATH change didn't work, what now?",
|
||||
response=(
|
||||
"We've been chasing this through three layers without success. "
|
||||
"Time to stop guessing and apply the simple fix Dalidou Claude "
|
||||
"proposed in the very first diagnosis.\n\n"
|
||||
"## Decision: merge the two bind mounts into one\n\n"
|
||||
"Both /srv/storage/repos and /srv/storage/gitea/data live on "
|
||||
"/dev/sdb1, so the move is a same-filesystem rename — instant, "
|
||||
"atomic, no data copy. After the move, the EXDEV class of bug "
|
||||
"becomes structurally impossible because there is only one "
|
||||
"mount inside the container."
|
||||
),
|
||||
notes_for_human=(
|
||||
"Should fire decision_heading rule. Should NOT fire requirement "
|
||||
"or constraint rules. The 'becomes structurally impossible' line "
|
||||
"is prose, not a candidate."
|
||||
),
|
||||
),
|
||||
SampleInteraction(
|
||||
label="ownership-was-the-real-fix",
|
||||
project="atocore",
|
||||
prompt="Still failing after the mount merge. What's actually wrong?",
|
||||
response=(
|
||||
"Same rejection even after the mounts were merged, which kills "
|
||||
"the EXDEV theory. The error 'unable to migrate objects to "
|
||||
"permanent storage' is git's catch-all for any failure in "
|
||||
"tmp_objdir_migrate, and the real errno is being thrown away.\n\n"
|
||||
"## Fact: the actual root cause was ownership, not cross-device link\n\n"
|
||||
"After mv on the host, the moved directory had perms git could "
|
||||
"navigate for some subdirs but not others. Recursive chown "
|
||||
"git:git plus chmod u+rwX,g+rwX fixed it on the first try."
|
||||
),
|
||||
notes_for_human=(
|
||||
"Should fire fact_heading rule, target type knowledge. The "
|
||||
"'unable to migrate' string is inside prose and should NOT "
|
||||
"trigger any rule."
|
||||
),
|
||||
),
|
||||
SampleInteraction(
|
||||
label="memory-vs-entity-canonical-home",
|
||||
project="atocore",
|
||||
prompt="Does a Decision live in memories or entities?",
|
||||
response=(
|
||||
"Both layers exist for different reasons. After thinking through "
|
||||
"the ontology question, I'm landing on a clean rule: one "
|
||||
"canonical home per concept, with explicit graduation when a "
|
||||
"memory eventually deserves structure.\n\n"
|
||||
"## Decision: project, knowledge, and adaptation memory types graduate to entities\n\n"
|
||||
"The identity, preference, and episodic types stay in the memory "
|
||||
"layer because they're personal and unstructured. The other "
|
||||
"three become entities once the engineering layer V1 ships. The "
|
||||
"original memory rows are NEVER deleted — they become frozen "
|
||||
"historical pointers to whichever entity they graduated into."
|
||||
),
|
||||
notes_for_human=(
|
||||
"Should fire decision_heading rule. Long content (over 100 "
|
||||
"chars) — verifies the max-length cap of 280 doesn't truncate."
|
||||
),
|
||||
),
|
||||
SampleInteraction(
|
||||
label="auto-promotion-deferred",
|
||||
project="atocore",
|
||||
prompt="Should the engineering layer auto-promote candidates?",
|
||||
response=(
|
||||
"The trust hierarchy is the whole point of AtoCore. If we let "
|
||||
"the system auto-promote, we trade away the 'bad memory is "
|
||||
"worse than no memory' invariant for a marginal speed gain.\n\n"
|
||||
"## Decision: no auto-promotion in V1, ever\n\n"
|
||||
"The schema and API are designed so policies can be added "
|
||||
"later without migration, but the V1 stance is that every "
|
||||
"promotion needs a human signoff. The hard line: nothing ever "
|
||||
"moves into trusted project state automatically, even with the "
|
||||
"engineering layer running."
|
||||
),
|
||||
notes_for_human=(
|
||||
"Should fire decision_heading rule. Has 'invariant' jargon — "
|
||||
"verify the extractor doesn't get distracted."
|
||||
),
|
||||
),
|
||||
SampleInteraction(
|
||||
label="preference-rebase-workflow",
|
||||
project="atocore",
|
||||
prompt="Should we use rebase or merge commits for this branch?",
|
||||
response=(
|
||||
"I prefer rebase-based workflows because the history stays "
|
||||
"linear and reviewers have an easier time. For the dalidou "
|
||||
"storage-foundation branch we'll rebase before merging into "
|
||||
"main so the timeline reads as one coherent narrative."
|
||||
),
|
||||
notes_for_human=(
|
||||
"Should fire preference_sentence rule. Note: the rule looks "
|
||||
"for 'I prefer' which is exactly the wording used."
|
||||
),
|
||||
),
|
||||
SampleInteraction(
|
||||
label="constraint-from-doc-cite",
|
||||
project="p05-interferometer",
|
||||
prompt="What's the budget situation for p05 right now?",
|
||||
response=(
|
||||
"Looking at the latest design notes, the constraint is fairly "
|
||||
"tight on the optical path side.\n\n"
|
||||
"### Constraint: total optical path stability must stay within 10 nm RMS at 1 kHz\n\n"
|
||||
"The error budget allocates 6 nm to the laser source and 4 nm "
|
||||
"to the mechanical mounts and frame combined."
|
||||
),
|
||||
notes_for_human=(
|
||||
"Should fire constraint_heading rule with type=project. The "
|
||||
"second sentence ('error budget allocates 6 nm') is descriptive "
|
||||
"and should NOT fire any rule because it doesn't have a "
|
||||
"structural cue."
|
||||
),
|
||||
),
|
||||
SampleInteraction(
|
||||
label="prose-only-no-cues",
|
||||
project="atocore",
|
||||
prompt="What should I work on next?",
|
||||
response=(
|
||||
"Looking at the current state of AtoCore, the next natural "
|
||||
"step is to validate Phase 9 against real interactions before "
|
||||
"starting the engineering layer implementation. Empirical "
|
||||
"evidence is missing and the planning docs would benefit from "
|
||||
"real signal."
|
||||
),
|
||||
notes_for_human=(
|
||||
"Should produce ZERO candidates. Pure recommendation prose, no "
|
||||
"structural cues. If this fires anything the extractor is too "
|
||||
"loose."
|
||||
),
|
||||
),
|
||||
SampleInteraction(
|
||||
label="multiple-cues-in-one-interaction",
|
||||
project="p06-polisher",
|
||||
prompt="Summarize today's polisher session",
|
||||
response=(
|
||||
"We worked through three things in the polisher session today.\n\n"
|
||||
"## Decision: defer the laser interlock redesign to after the July milestone\n\n"
|
||||
"## Constraint: the calibration routine must complete in under 90 seconds for production use\n\n"
|
||||
"## Requirement: the polisher must hold position to within 0.5 micron at 1 g loading\n\n"
|
||||
"Action items captured for the next sync."
|
||||
),
|
||||
notes_for_human=(
|
||||
"Three rules should fire on the same interaction: "
|
||||
"decision_heading -> adaptation, constraint_heading -> project, "
|
||||
"requirement_heading -> project. Verify dedup doesn't merge them."
|
||||
),
|
||||
),
|
||||
]
|
||||
|
||||
|
||||
def setup_environment(data_dir: Path) -> None:
|
||||
"""Configure AtoCore to use an isolated data directory for this run."""
|
||||
if data_dir.exists():
|
||||
shutil.rmtree(data_dir)
|
||||
data_dir.mkdir(parents=True, exist_ok=True)
|
||||
os.environ["ATOCORE_DATA_DIR"] = str(data_dir)
|
||||
os.environ.setdefault("ATOCORE_DEBUG", "true")
|
||||
# Reset cached settings so the new env vars take effect
|
||||
import atocore.config as config
|
||||
|
||||
config.settings = config.Settings()
|
||||
import atocore.retrieval.vector_store as vs
|
||||
|
||||
vs._store = None
|
||||
|
||||
|
||||
def seed_memories() -> dict[str, str]:
|
||||
"""Insert a small set of seed active memories so reinforcement has
|
||||
something to match against."""
|
||||
from atocore.memory.service import create_memory
|
||||
|
||||
seeded: dict[str, str] = {}
|
||||
seeded["pref_rebase"] = create_memory(
|
||||
memory_type="preference",
|
||||
content="prefers rebase-based workflows because history stays linear",
|
||||
confidence=0.6,
|
||||
).id
|
||||
seeded["pref_concise"] = create_memory(
|
||||
memory_type="preference",
|
||||
content="writes commit messages focused on the why, not the what",
|
||||
confidence=0.6,
|
||||
).id
|
||||
seeded["identity_runs_atocore"] = create_memory(
|
||||
memory_type="identity",
|
||||
content="mechanical engineer who runs AtoCore for context engineering",
|
||||
confidence=0.9,
|
||||
).id
|
||||
return seeded
|
||||
|
||||
|
||||
def run_sample(sample: SampleInteraction) -> dict:
|
||||
"""Capture one sample, run extraction, return a result dict."""
|
||||
from atocore.interactions.service import record_interaction
|
||||
from atocore.memory.extractor import extract_candidates_from_interaction
|
||||
|
||||
interaction = record_interaction(
|
||||
prompt=sample.prompt,
|
||||
response=sample.response,
|
||||
project=sample.project,
|
||||
client="phase9-first-real-use",
|
||||
session_id="first-real-use",
|
||||
reinforce=True,
|
||||
)
|
||||
candidates = extract_candidates_from_interaction(interaction)
|
||||
|
||||
return {
|
||||
"label": sample.label,
|
||||
"project": sample.project,
|
||||
"interaction_id": interaction.id,
|
||||
"expected_notes": sample.notes_for_human,
|
||||
"candidate_count": len(candidates),
|
||||
"candidates": [
|
||||
{
|
||||
"memory_type": c.memory_type,
|
||||
"rule": c.rule,
|
||||
"content": c.content,
|
||||
"source_span": c.source_span[:120],
|
||||
}
|
||||
for c in candidates
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
def report_seed_memory_state(seeded_ids: dict[str, str]) -> dict:
|
||||
from atocore.memory.service import get_memories
|
||||
|
||||
state = {}
|
||||
for label, mid in seeded_ids.items():
|
||||
rows = [m for m in get_memories(limit=200) if m.id == mid]
|
||||
if not rows:
|
||||
state[label] = None
|
||||
continue
|
||||
m = rows[0]
|
||||
state[label] = {
|
||||
"id": m.id,
|
||||
"memory_type": m.memory_type,
|
||||
"content_preview": m.content[:80],
|
||||
"confidence": round(m.confidence, 4),
|
||||
"reference_count": m.reference_count,
|
||||
"last_referenced_at": m.last_referenced_at,
|
||||
}
|
||||
return state
|
||||
|
||||
|
||||
def main() -> int:
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument(
|
||||
"--data-dir",
|
||||
default=str(_REPO_ROOT / "data" / "validation" / "phase9-first-use"),
|
||||
help="Isolated data directory to use for this validation run",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--json",
|
||||
action="store_true",
|
||||
help="Emit machine-readable JSON instead of human prose",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
data_dir = Path(args.data_dir).resolve()
|
||||
setup_environment(data_dir)
|
||||
|
||||
from atocore.models.database import init_db
|
||||
from atocore.context.project_state import init_project_state_schema
|
||||
|
||||
init_db()
|
||||
init_project_state_schema()
|
||||
|
||||
seeded = seed_memories()
|
||||
sample_results = [run_sample(s) for s in SAMPLES]
|
||||
final_seed_state = report_seed_memory_state(seeded)
|
||||
|
||||
if args.json:
|
||||
json.dump(
|
||||
{
|
||||
"data_dir": str(data_dir),
|
||||
"seeded_memories_initial": list(seeded.keys()),
|
||||
"samples": sample_results,
|
||||
"seed_memory_state_after_run": final_seed_state,
|
||||
},
|
||||
sys.stdout,
|
||||
indent=2,
|
||||
default=str,
|
||||
)
|
||||
return 0
|
||||
|
||||
print("=" * 78)
|
||||
print("Phase 9 first-real-use validation run")
|
||||
print("=" * 78)
|
||||
print(f"Isolated data dir: {data_dir}")
|
||||
print()
|
||||
print("Seeded the memory store with 3 active memories:")
|
||||
for label, mid in seeded.items():
|
||||
print(f" - {label} ({mid[:8]})")
|
||||
print()
|
||||
print("-" * 78)
|
||||
print(f"Running {len(SAMPLES)} sample interactions ...")
|
||||
print("-" * 78)
|
||||
|
||||
for result in sample_results:
|
||||
print()
|
||||
print(f"## {result['label']} [project={result['project']}]")
|
||||
print(f" interaction_id={result['interaction_id'][:8]}")
|
||||
print(f" expected: {result['expected_notes']}")
|
||||
print(f" candidates produced: {result['candidate_count']}")
|
||||
for i, cand in enumerate(result["candidates"], 1):
|
||||
print(
|
||||
f" [{i}] type={cand['memory_type']:11s} "
|
||||
f"rule={cand['rule']:21s} "
|
||||
f"content={cand['content']!r}"
|
||||
)
|
||||
|
||||
print()
|
||||
print("-" * 78)
|
||||
print("Reinforcement state on seeded memories AFTER all interactions:")
|
||||
print("-" * 78)
|
||||
for label, state in final_seed_state.items():
|
||||
if state is None:
|
||||
print(f" {label}: <missing>")
|
||||
continue
|
||||
print(
|
||||
f" {label}: confidence={state['confidence']:.4f} "
|
||||
f"refs={state['reference_count']} "
|
||||
f"last={state['last_referenced_at'] or '-'}"
|
||||
)
|
||||
|
||||
print()
|
||||
print("=" * 78)
|
||||
print("Run complete. Data written to:", data_dir)
|
||||
print("=" * 78)
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(main())
|
||||
@@ -5,6 +5,7 @@ from pathlib import Path
|
||||
from fastapi import APIRouter, HTTPException
|
||||
from pydantic import BaseModel
|
||||
|
||||
import atocore.config as _config
|
||||
from atocore.context.builder import (
|
||||
build_context,
|
||||
get_last_context_pack,
|
||||
@@ -16,16 +17,50 @@ from atocore.context.project_state import (
|
||||
invalidate_state,
|
||||
set_state,
|
||||
)
|
||||
from atocore.ingestion.pipeline import ingest_file, ingest_folder, get_ingestion_stats
|
||||
from atocore.ingestion.pipeline import (
|
||||
exclusive_ingestion,
|
||||
get_ingestion_stats,
|
||||
get_source_status,
|
||||
ingest_configured_sources,
|
||||
ingest_file,
|
||||
ingest_folder,
|
||||
)
|
||||
from atocore.interactions.service import (
|
||||
get_interaction,
|
||||
list_interactions,
|
||||
record_interaction,
|
||||
)
|
||||
from atocore.memory.extractor import (
|
||||
EXTRACTOR_VERSION,
|
||||
MemoryCandidate,
|
||||
extract_candidates_from_interaction,
|
||||
)
|
||||
from atocore.memory.reinforcement import reinforce_from_interaction
|
||||
from atocore.memory.service import (
|
||||
MEMORY_STATUSES,
|
||||
MEMORY_TYPES,
|
||||
create_memory,
|
||||
get_memories,
|
||||
invalidate_memory,
|
||||
promote_memory,
|
||||
reject_candidate_memory,
|
||||
supersede_memory,
|
||||
update_memory,
|
||||
)
|
||||
from atocore.observability.logger import get_logger
|
||||
from atocore.ops.backup import (
|
||||
create_runtime_backup,
|
||||
list_runtime_backups,
|
||||
validate_backup,
|
||||
)
|
||||
from atocore.projects.registry import (
|
||||
build_project_registration_proposal,
|
||||
get_project_registry_template,
|
||||
list_registered_projects,
|
||||
register_project,
|
||||
refresh_registered_project,
|
||||
update_project,
|
||||
)
|
||||
from atocore.retrieval.retriever import retrieve
|
||||
from atocore.retrieval.vector_store import get_vector_store
|
||||
|
||||
@@ -44,10 +79,39 @@ class IngestResponse(BaseModel):
|
||||
results: list[dict]
|
||||
|
||||
|
||||
class IngestSourcesResponse(BaseModel):
|
||||
results: list[dict]
|
||||
|
||||
|
||||
class ProjectRefreshResponse(BaseModel):
|
||||
project: str
|
||||
aliases: list[str]
|
||||
description: str
|
||||
purge_deleted: bool
|
||||
status: str
|
||||
roots_ingested: int
|
||||
roots_skipped: int
|
||||
roots: list[dict]
|
||||
|
||||
|
||||
class ProjectRegistrationProposalRequest(BaseModel):
|
||||
project_id: str
|
||||
aliases: list[str] = []
|
||||
description: str = ""
|
||||
ingest_roots: list[dict]
|
||||
|
||||
|
||||
class ProjectUpdateRequest(BaseModel):
|
||||
aliases: list[str] | None = None
|
||||
description: str | None = None
|
||||
ingest_roots: list[dict] | None = None
|
||||
|
||||
|
||||
class QueryRequest(BaseModel):
|
||||
prompt: str
|
||||
top_k: int = 10
|
||||
filter_tags: list[str] | None = None
|
||||
project: str | None = None
|
||||
|
||||
|
||||
class QueryResponse(BaseModel):
|
||||
@@ -112,12 +176,13 @@ def api_ingest(req: IngestRequest) -> IngestResponse:
|
||||
"""Ingest a markdown file or folder."""
|
||||
target = Path(req.path)
|
||||
try:
|
||||
if target.is_file():
|
||||
results = [ingest_file(target)]
|
||||
elif target.is_dir():
|
||||
results = ingest_folder(target)
|
||||
else:
|
||||
raise HTTPException(status_code=404, detail=f"Path not found: {req.path}")
|
||||
with exclusive_ingestion():
|
||||
if target.is_file():
|
||||
results = [ingest_file(target)]
|
||||
elif target.is_dir():
|
||||
results = ingest_folder(target)
|
||||
else:
|
||||
raise HTTPException(status_code=404, detail=f"Path not found: {req.path}")
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
@@ -126,11 +191,106 @@ def api_ingest(req: IngestRequest) -> IngestResponse:
|
||||
return IngestResponse(results=results)
|
||||
|
||||
|
||||
@router.post("/ingest/sources", response_model=IngestSourcesResponse)
|
||||
def api_ingest_sources() -> IngestSourcesResponse:
|
||||
"""Ingest enabled configured source directories."""
|
||||
try:
|
||||
with exclusive_ingestion():
|
||||
results = ingest_configured_sources()
|
||||
except Exception as e:
|
||||
log.error("ingest_sources_failed", error=str(e))
|
||||
raise HTTPException(status_code=500, detail=f"Configured source ingestion failed: {e}")
|
||||
return IngestSourcesResponse(results=results)
|
||||
|
||||
|
||||
@router.get("/projects")
|
||||
def api_projects() -> dict:
|
||||
"""Return registered projects and their resolved ingest roots."""
|
||||
return {
|
||||
"projects": list_registered_projects(),
|
||||
"registry_path": str(_config.settings.resolved_project_registry_path),
|
||||
}
|
||||
|
||||
|
||||
@router.get("/projects/template")
|
||||
def api_projects_template() -> dict:
|
||||
"""Return a starter template for project registry entries."""
|
||||
return {
|
||||
"template": get_project_registry_template(),
|
||||
"registry_path": str(_config.settings.resolved_project_registry_path),
|
||||
"allowed_sources": ["vault", "drive"],
|
||||
}
|
||||
|
||||
|
||||
@router.post("/projects/proposal")
|
||||
def api_project_registration_proposal(req: ProjectRegistrationProposalRequest) -> dict:
|
||||
"""Return a normalized project registration proposal without writing it."""
|
||||
try:
|
||||
return build_project_registration_proposal(
|
||||
project_id=req.project_id,
|
||||
aliases=req.aliases,
|
||||
description=req.description,
|
||||
ingest_roots=req.ingest_roots,
|
||||
)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
|
||||
|
||||
@router.post("/projects/register")
|
||||
def api_project_registration(req: ProjectRegistrationProposalRequest) -> dict:
|
||||
"""Persist a validated project registration to the registry file."""
|
||||
try:
|
||||
return register_project(
|
||||
project_id=req.project_id,
|
||||
aliases=req.aliases,
|
||||
description=req.description,
|
||||
ingest_roots=req.ingest_roots,
|
||||
)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
|
||||
|
||||
@router.put("/projects/{project_name}")
|
||||
def api_project_update(project_name: str, req: ProjectUpdateRequest) -> dict:
|
||||
"""Update an existing project registration."""
|
||||
try:
|
||||
return update_project(
|
||||
project_name=project_name,
|
||||
aliases=req.aliases,
|
||||
description=req.description,
|
||||
ingest_roots=req.ingest_roots,
|
||||
)
|
||||
except ValueError as e:
|
||||
detail = str(e)
|
||||
if detail.startswith("Unknown project"):
|
||||
raise HTTPException(status_code=404, detail=detail)
|
||||
raise HTTPException(status_code=400, detail=detail)
|
||||
|
||||
|
||||
@router.post("/projects/{project_name}/refresh", response_model=ProjectRefreshResponse)
|
||||
def api_refresh_project(project_name: str, purge_deleted: bool = False) -> ProjectRefreshResponse:
|
||||
"""Refresh one registered project from its configured ingest roots."""
|
||||
try:
|
||||
with exclusive_ingestion():
|
||||
result = refresh_registered_project(project_name, purge_deleted=purge_deleted)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=404, detail=str(e))
|
||||
except Exception as e:
|
||||
log.error("project_refresh_failed", project=project_name, error=str(e))
|
||||
raise HTTPException(status_code=500, detail=f"Project refresh failed: {e}")
|
||||
return ProjectRefreshResponse(**result)
|
||||
|
||||
|
||||
@router.post("/query", response_model=QueryResponse)
|
||||
def api_query(req: QueryRequest) -> QueryResponse:
|
||||
"""Retrieve relevant chunks for a prompt."""
|
||||
try:
|
||||
chunks = retrieve(req.prompt, top_k=req.top_k, filter_tags=req.filter_tags)
|
||||
chunks = retrieve(
|
||||
req.prompt,
|
||||
top_k=req.top_k,
|
||||
filter_tags=req.filter_tags,
|
||||
project_hint=req.project,
|
||||
)
|
||||
except Exception as e:
|
||||
log.error("query_failed", prompt=req.prompt[:100], error=str(e))
|
||||
raise HTTPException(status_code=500, detail=f"Query failed: {e}")
|
||||
@@ -196,15 +356,25 @@ def api_get_memories(
|
||||
active_only: bool = True,
|
||||
min_confidence: float = 0.0,
|
||||
limit: int = 50,
|
||||
status: str | None = None,
|
||||
) -> dict:
|
||||
"""List memories, optionally filtered."""
|
||||
memories = get_memories(
|
||||
memory_type=memory_type,
|
||||
project=project,
|
||||
active_only=active_only,
|
||||
min_confidence=min_confidence,
|
||||
limit=limit,
|
||||
)
|
||||
"""List memories, optionally filtered.
|
||||
|
||||
When ``status`` is given explicitly it overrides ``active_only`` so
|
||||
the Phase 9 Commit C review queue can be listed via
|
||||
``GET /memory?status=candidate``.
|
||||
"""
|
||||
try:
|
||||
memories = get_memories(
|
||||
memory_type=memory_type,
|
||||
project=project,
|
||||
active_only=active_only,
|
||||
min_confidence=min_confidence,
|
||||
limit=limit,
|
||||
status=status,
|
||||
)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
return {
|
||||
"memories": [
|
||||
{
|
||||
@@ -214,11 +384,14 @@ def api_get_memories(
|
||||
"project": m.project,
|
||||
"confidence": m.confidence,
|
||||
"status": m.status,
|
||||
"reference_count": m.reference_count,
|
||||
"last_referenced_at": m.last_referenced_at,
|
||||
"updated_at": m.updated_at,
|
||||
}
|
||||
for m in memories
|
||||
],
|
||||
"types": MEMORY_TYPES,
|
||||
"statuses": MEMORY_STATUSES,
|
||||
}
|
||||
|
||||
|
||||
@@ -248,6 +421,33 @@ def api_invalidate_memory(memory_id: str) -> dict:
|
||||
return {"status": "invalidated", "id": memory_id}
|
||||
|
||||
|
||||
@router.post("/memory/{memory_id}/promote")
|
||||
def api_promote_memory(memory_id: str) -> dict:
|
||||
"""Promote a candidate memory to active (Phase 9 Commit C)."""
|
||||
try:
|
||||
success = promote_memory(memory_id)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
if not success:
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=f"Memory not found or not a candidate: {memory_id}",
|
||||
)
|
||||
return {"status": "promoted", "id": memory_id}
|
||||
|
||||
|
||||
@router.post("/memory/{memory_id}/reject")
|
||||
def api_reject_candidate_memory(memory_id: str) -> dict:
|
||||
"""Reject a candidate memory (Phase 9 Commit C review queue)."""
|
||||
success = reject_candidate_memory(memory_id)
|
||||
if not success:
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=f"Memory not found or not a candidate: {memory_id}",
|
||||
)
|
||||
return {"status": "rejected", "id": memory_id}
|
||||
|
||||
|
||||
@router.post("/project/state")
|
||||
def api_set_project_state(req: ProjectStateSetRequest) -> dict:
|
||||
"""Set or update a trusted project state entry."""
|
||||
@@ -300,14 +500,278 @@ def api_invalidate_project_state(req: ProjectStateInvalidateRequest) -> dict:
|
||||
return {"status": "invalidated", "project": req.project, "category": req.category, "key": req.key}
|
||||
|
||||
|
||||
class InteractionRecordRequest(BaseModel):
|
||||
prompt: str
|
||||
response: str = ""
|
||||
response_summary: str = ""
|
||||
project: str = ""
|
||||
client: str = ""
|
||||
session_id: str = ""
|
||||
memories_used: list[str] = []
|
||||
chunks_used: list[str] = []
|
||||
context_pack: dict | None = None
|
||||
reinforce: bool = True
|
||||
|
||||
|
||||
@router.post("/interactions")
|
||||
def api_record_interaction(req: InteractionRecordRequest) -> dict:
|
||||
"""Capture one interaction (prompt + response + what was used).
|
||||
|
||||
This is the foundation of the AtoCore reflection loop. It records
|
||||
what the system fed to an LLM and what came back. If ``reinforce``
|
||||
is true (default) and there is response content, the Phase 9
|
||||
Commit B reinforcement pass runs automatically, bumping the
|
||||
confidence of any active memory echoed in the response. Nothing is
|
||||
ever promoted into trusted state automatically.
|
||||
"""
|
||||
try:
|
||||
interaction = record_interaction(
|
||||
prompt=req.prompt,
|
||||
response=req.response,
|
||||
response_summary=req.response_summary,
|
||||
project=req.project,
|
||||
client=req.client,
|
||||
session_id=req.session_id,
|
||||
memories_used=req.memories_used,
|
||||
chunks_used=req.chunks_used,
|
||||
context_pack=req.context_pack,
|
||||
reinforce=req.reinforce,
|
||||
)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
return {
|
||||
"status": "recorded",
|
||||
"id": interaction.id,
|
||||
"created_at": interaction.created_at,
|
||||
}
|
||||
|
||||
|
||||
@router.post("/interactions/{interaction_id}/reinforce")
|
||||
def api_reinforce_interaction(interaction_id: str) -> dict:
|
||||
"""Run the reinforcement pass on an already-captured interaction.
|
||||
|
||||
Useful for backfilling reinforcement over historical interactions,
|
||||
or for retrying after a transient failure in the automatic pass
|
||||
that runs inside ``POST /interactions``.
|
||||
"""
|
||||
interaction = get_interaction(interaction_id)
|
||||
if interaction is None:
|
||||
raise HTTPException(status_code=404, detail=f"Interaction not found: {interaction_id}")
|
||||
results = reinforce_from_interaction(interaction)
|
||||
return {
|
||||
"interaction_id": interaction_id,
|
||||
"reinforced_count": len(results),
|
||||
"reinforced": [
|
||||
{
|
||||
"memory_id": r.memory_id,
|
||||
"memory_type": r.memory_type,
|
||||
"old_confidence": round(r.old_confidence, 4),
|
||||
"new_confidence": round(r.new_confidence, 4),
|
||||
}
|
||||
for r in results
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
class InteractionExtractRequest(BaseModel):
|
||||
persist: bool = False
|
||||
|
||||
|
||||
@router.post("/interactions/{interaction_id}/extract")
|
||||
def api_extract_from_interaction(
|
||||
interaction_id: str,
|
||||
req: InteractionExtractRequest | None = None,
|
||||
) -> dict:
|
||||
"""Extract candidate memories from a captured interaction.
|
||||
|
||||
Phase 9 Commit C. The extractor is rule-based and deliberately
|
||||
conservative — it only surfaces candidates that matched an explicit
|
||||
structural cue (decision heading, preference sentence, etc.). By
|
||||
default the candidates are returned *without* being persisted so a
|
||||
caller can preview them before committing to a review queue. Pass
|
||||
``persist: true`` to immediately create candidate memories for
|
||||
each extraction result.
|
||||
"""
|
||||
interaction = get_interaction(interaction_id)
|
||||
if interaction is None:
|
||||
raise HTTPException(status_code=404, detail=f"Interaction not found: {interaction_id}")
|
||||
payload = req or InteractionExtractRequest()
|
||||
candidates: list[MemoryCandidate] = extract_candidates_from_interaction(interaction)
|
||||
|
||||
persisted_ids: list[str] = []
|
||||
if payload.persist:
|
||||
for candidate in candidates:
|
||||
try:
|
||||
mem = create_memory(
|
||||
memory_type=candidate.memory_type,
|
||||
content=candidate.content,
|
||||
project=candidate.project,
|
||||
confidence=candidate.confidence,
|
||||
status="candidate",
|
||||
)
|
||||
persisted_ids.append(mem.id)
|
||||
except ValueError as e:
|
||||
log.error(
|
||||
"extract_persist_failed",
|
||||
interaction_id=interaction_id,
|
||||
rule=candidate.rule,
|
||||
error=str(e),
|
||||
)
|
||||
|
||||
return {
|
||||
"interaction_id": interaction_id,
|
||||
"candidate_count": len(candidates),
|
||||
"persisted": payload.persist,
|
||||
"persisted_ids": persisted_ids,
|
||||
"extractor_version": EXTRACTOR_VERSION,
|
||||
"candidates": [
|
||||
{
|
||||
"memory_type": c.memory_type,
|
||||
"content": c.content,
|
||||
"project": c.project,
|
||||
"confidence": c.confidence,
|
||||
"rule": c.rule,
|
||||
"source_span": c.source_span,
|
||||
"extractor_version": c.extractor_version,
|
||||
}
|
||||
for c in candidates
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
@router.get("/interactions")
|
||||
def api_list_interactions(
|
||||
project: str | None = None,
|
||||
session_id: str | None = None,
|
||||
client: str | None = None,
|
||||
since: str | None = None,
|
||||
limit: int = 50,
|
||||
) -> dict:
|
||||
"""List captured interactions, optionally filtered by project, session,
|
||||
client, or creation time. Hard-capped at 500 entries per call."""
|
||||
interactions = list_interactions(
|
||||
project=project,
|
||||
session_id=session_id,
|
||||
client=client,
|
||||
since=since,
|
||||
limit=limit,
|
||||
)
|
||||
return {
|
||||
"count": len(interactions),
|
||||
"interactions": [
|
||||
{
|
||||
"id": i.id,
|
||||
"prompt": i.prompt,
|
||||
"response_summary": i.response_summary,
|
||||
"response_chars": len(i.response),
|
||||
"project": i.project,
|
||||
"client": i.client,
|
||||
"session_id": i.session_id,
|
||||
"memories_used": i.memories_used,
|
||||
"chunks_used": i.chunks_used,
|
||||
"created_at": i.created_at,
|
||||
}
|
||||
for i in interactions
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
@router.get("/interactions/{interaction_id}")
|
||||
def api_get_interaction(interaction_id: str) -> dict:
|
||||
"""Fetch a single interaction with the full response and context pack."""
|
||||
interaction = get_interaction(interaction_id)
|
||||
if interaction is None:
|
||||
raise HTTPException(status_code=404, detail=f"Interaction not found: {interaction_id}")
|
||||
return {
|
||||
"id": interaction.id,
|
||||
"prompt": interaction.prompt,
|
||||
"response": interaction.response,
|
||||
"response_summary": interaction.response_summary,
|
||||
"project": interaction.project,
|
||||
"client": interaction.client,
|
||||
"session_id": interaction.session_id,
|
||||
"memories_used": interaction.memories_used,
|
||||
"chunks_used": interaction.chunks_used,
|
||||
"context_pack": interaction.context_pack,
|
||||
"created_at": interaction.created_at,
|
||||
}
|
||||
|
||||
|
||||
class BackupCreateRequest(BaseModel):
|
||||
include_chroma: bool = False
|
||||
|
||||
|
||||
@router.post("/admin/backup")
|
||||
def api_create_backup(req: BackupCreateRequest | None = None) -> dict:
|
||||
"""Create a runtime backup snapshot.
|
||||
|
||||
When ``include_chroma`` is true the call holds the ingestion lock so a
|
||||
safe cold copy of the vector store can be taken without racing against
|
||||
refresh or ingest endpoints.
|
||||
"""
|
||||
payload = req or BackupCreateRequest()
|
||||
try:
|
||||
if payload.include_chroma:
|
||||
with exclusive_ingestion():
|
||||
metadata = create_runtime_backup(include_chroma=True)
|
||||
else:
|
||||
metadata = create_runtime_backup(include_chroma=False)
|
||||
except Exception as e:
|
||||
log.error("admin_backup_failed", error=str(e))
|
||||
raise HTTPException(status_code=500, detail=f"Backup failed: {e}")
|
||||
return metadata
|
||||
|
||||
|
||||
@router.get("/admin/backup")
|
||||
def api_list_backups() -> dict:
|
||||
"""List all runtime backups under the configured backup directory."""
|
||||
return {
|
||||
"backup_dir": str(_config.settings.resolved_backup_dir),
|
||||
"backups": list_runtime_backups(),
|
||||
}
|
||||
|
||||
|
||||
@router.get("/admin/backup/{stamp}/validate")
|
||||
def api_validate_backup(stamp: str) -> dict:
|
||||
"""Validate that a previously created backup is structurally usable."""
|
||||
result = validate_backup(stamp)
|
||||
if not result.get("exists", False):
|
||||
raise HTTPException(status_code=404, detail=f"Backup not found: {stamp}")
|
||||
return result
|
||||
|
||||
|
||||
@router.get("/health")
|
||||
def api_health() -> dict:
|
||||
"""Health check."""
|
||||
store = get_vector_store()
|
||||
source_status = get_source_status()
|
||||
return {
|
||||
"status": "ok",
|
||||
"version": "0.1.0",
|
||||
"vectors_count": store.count,
|
||||
"env": _config.settings.env,
|
||||
"machine_paths": {
|
||||
"db_path": str(_config.settings.db_path),
|
||||
"chroma_path": str(_config.settings.chroma_path),
|
||||
"log_dir": str(_config.settings.resolved_log_dir),
|
||||
"backup_dir": str(_config.settings.resolved_backup_dir),
|
||||
"run_dir": str(_config.settings.resolved_run_dir),
|
||||
},
|
||||
"sources_ready": all(
|
||||
(not source["enabled"]) or (source["exists"] and source["is_dir"])
|
||||
for source in source_status
|
||||
),
|
||||
"source_status": source_status,
|
||||
}
|
||||
|
||||
|
||||
@router.get("/sources")
|
||||
def api_sources() -> dict:
|
||||
"""Return configured ingestion source directories and readiness."""
|
||||
return {
|
||||
"sources": get_source_status(),
|
||||
"vault_enabled": _config.settings.source_vault_enabled,
|
||||
"drive_enabled": _config.settings.source_drive_enabled,
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -6,10 +6,25 @@ from pydantic_settings import BaseSettings
|
||||
|
||||
|
||||
class Settings(BaseSettings):
|
||||
env: str = "development"
|
||||
debug: bool = False
|
||||
log_level: str = "INFO"
|
||||
data_dir: Path = Path("./data")
|
||||
db_dir: Path | None = None
|
||||
chroma_dir: Path | None = None
|
||||
cache_dir: Path | None = None
|
||||
tmp_dir: Path | None = None
|
||||
vault_source_dir: Path = Path("./sources/vault")
|
||||
drive_source_dir: Path = Path("./sources/drive")
|
||||
source_vault_enabled: bool = True
|
||||
source_drive_enabled: bool = True
|
||||
log_dir: Path = Path("./logs")
|
||||
backup_dir: Path = Path("./backups")
|
||||
run_dir: Path = Path("./run")
|
||||
project_registry_path: Path = Path("./config/project-registry.json")
|
||||
host: str = "127.0.0.1"
|
||||
port: int = 8100
|
||||
db_busy_timeout_ms: int = 5000
|
||||
|
||||
# Embedding
|
||||
embedding_model: str = (
|
||||
@@ -25,15 +40,118 @@ class Settings(BaseSettings):
|
||||
context_budget: int = 3000
|
||||
context_top_k: int = 15
|
||||
|
||||
# Retrieval ranking weights (tunable per environment).
|
||||
# All multipliers default to the values used since Wave 1; tighten or
|
||||
# loosen them via ATOCORE_* env vars without touching code.
|
||||
rank_project_match_boost: float = 2.0
|
||||
rank_query_token_step: float = 0.08
|
||||
rank_query_token_cap: float = 1.32
|
||||
rank_path_high_signal_boost: float = 1.18
|
||||
rank_path_low_signal_penalty: float = 0.72
|
||||
|
||||
model_config = {"env_prefix": "ATOCORE_"}
|
||||
|
||||
@property
|
||||
def db_path(self) -> Path:
|
||||
return self.data_dir / "atocore.db"
|
||||
legacy_path = self.resolved_data_dir / "atocore.db"
|
||||
if self.db_dir is not None:
|
||||
return self.resolved_db_dir / "atocore.db"
|
||||
if legacy_path.exists():
|
||||
return legacy_path
|
||||
return self.resolved_db_dir / "atocore.db"
|
||||
|
||||
@property
|
||||
def chroma_path(self) -> Path:
|
||||
return self.data_dir / "chroma"
|
||||
return self._resolve_path(self.chroma_dir or (self.resolved_data_dir / "chroma"))
|
||||
|
||||
@property
|
||||
def cache_path(self) -> Path:
|
||||
return self._resolve_path(self.cache_dir or (self.resolved_data_dir / "cache"))
|
||||
|
||||
@property
|
||||
def tmp_path(self) -> Path:
|
||||
return self._resolve_path(self.tmp_dir or (self.resolved_data_dir / "tmp"))
|
||||
|
||||
@property
|
||||
def resolved_data_dir(self) -> Path:
|
||||
return self._resolve_path(self.data_dir)
|
||||
|
||||
@property
|
||||
def resolved_db_dir(self) -> Path:
|
||||
return self._resolve_path(self.db_dir or (self.resolved_data_dir / "db"))
|
||||
|
||||
@property
|
||||
def resolved_vault_source_dir(self) -> Path:
|
||||
return self._resolve_path(self.vault_source_dir)
|
||||
|
||||
@property
|
||||
def resolved_drive_source_dir(self) -> Path:
|
||||
return self._resolve_path(self.drive_source_dir)
|
||||
|
||||
@property
|
||||
def resolved_log_dir(self) -> Path:
|
||||
return self._resolve_path(self.log_dir)
|
||||
|
||||
@property
|
||||
def resolved_backup_dir(self) -> Path:
|
||||
return self._resolve_path(self.backup_dir)
|
||||
|
||||
@property
|
||||
def resolved_run_dir(self) -> Path:
|
||||
if self.run_dir == Path("./run"):
|
||||
return self._resolve_path(self.resolved_data_dir.parent / "run")
|
||||
return self._resolve_path(self.run_dir)
|
||||
|
||||
@property
|
||||
def resolved_project_registry_path(self) -> Path:
|
||||
return self._resolve_path(self.project_registry_path)
|
||||
|
||||
@property
|
||||
def machine_dirs(self) -> list[Path]:
|
||||
return [
|
||||
self.db_path.parent,
|
||||
self.chroma_path,
|
||||
self.cache_path,
|
||||
self.tmp_path,
|
||||
self.resolved_log_dir,
|
||||
self.resolved_backup_dir,
|
||||
self.resolved_run_dir,
|
||||
self.resolved_project_registry_path.parent,
|
||||
]
|
||||
|
||||
@property
|
||||
def source_specs(self) -> list[dict[str, object]]:
|
||||
return [
|
||||
{
|
||||
"name": "vault",
|
||||
"enabled": self.source_vault_enabled,
|
||||
"path": self.resolved_vault_source_dir,
|
||||
"read_only": True,
|
||||
},
|
||||
{
|
||||
"name": "drive",
|
||||
"enabled": self.source_drive_enabled,
|
||||
"path": self.resolved_drive_source_dir,
|
||||
"read_only": True,
|
||||
},
|
||||
]
|
||||
|
||||
@property
|
||||
def source_dirs(self) -> list[Path]:
|
||||
return [spec["path"] for spec in self.source_specs if spec["enabled"]]
|
||||
|
||||
def _resolve_path(self, path: Path) -> Path:
|
||||
return path.expanduser().resolve(strict=False)
|
||||
|
||||
|
||||
settings = Settings()
|
||||
|
||||
|
||||
def ensure_runtime_dirs() -> None:
|
||||
"""Create writable runtime directories for machine state and logs.
|
||||
|
||||
Source directories are intentionally excluded because they are treated as
|
||||
read-only ingestion inputs by convention.
|
||||
"""
|
||||
for directory in settings.machine_dirs:
|
||||
directory.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
@@ -104,7 +104,15 @@ def build_context(
|
||||
retrieval_budget = budget - project_state_chars - memory_chars
|
||||
|
||||
# 4. Retrieve candidates
|
||||
candidates = retrieve(user_prompt, top_k=_config.settings.context_top_k) if retrieval_budget > 0 else []
|
||||
candidates = (
|
||||
retrieve(
|
||||
user_prompt,
|
||||
top_k=_config.settings.context_top_k,
|
||||
project_hint=project_hint,
|
||||
)
|
||||
if retrieval_budget > 0
|
||||
else []
|
||||
)
|
||||
|
||||
# 5. Score and rank
|
||||
scored = _rank_chunks(candidates, project_hint)
|
||||
|
||||
@@ -2,10 +2,13 @@
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
import threading
|
||||
import time
|
||||
import uuid
|
||||
from contextlib import contextmanager
|
||||
from pathlib import Path
|
||||
|
||||
import atocore.config as _config
|
||||
from atocore.ingestion.chunker import chunk_markdown
|
||||
from atocore.ingestion.parser import parse_markdown
|
||||
from atocore.models.database import get_connection
|
||||
@@ -16,6 +19,17 @@ log = get_logger("ingestion")
|
||||
|
||||
# Encodings to try when reading markdown files
|
||||
_ENCODINGS = ["utf-8", "utf-8-sig", "latin-1", "cp1252"]
|
||||
_INGESTION_LOCK = threading.Lock()
|
||||
|
||||
|
||||
@contextmanager
|
||||
def exclusive_ingestion():
|
||||
"""Serialize long-running ingestion operations across API requests."""
|
||||
_INGESTION_LOCK.acquire()
|
||||
try:
|
||||
yield
|
||||
finally:
|
||||
_INGESTION_LOCK.release()
|
||||
|
||||
|
||||
def ingest_file(file_path: Path) -> dict:
|
||||
@@ -189,6 +203,52 @@ def ingest_folder(folder_path: Path, purge_deleted: bool = True) -> list[dict]:
|
||||
return results
|
||||
|
||||
|
||||
def get_source_status() -> list[dict]:
|
||||
"""Describe configured source directories and their readiness."""
|
||||
sources = []
|
||||
for spec in _config.settings.source_specs:
|
||||
path = spec["path"]
|
||||
assert isinstance(path, Path)
|
||||
sources.append(
|
||||
{
|
||||
"name": spec["name"],
|
||||
"enabled": spec["enabled"],
|
||||
"path": str(path),
|
||||
"exists": path.exists(),
|
||||
"is_dir": path.is_dir(),
|
||||
"read_only": spec["read_only"],
|
||||
}
|
||||
)
|
||||
return sources
|
||||
|
||||
|
||||
def ingest_configured_sources(purge_deleted: bool = False) -> list[dict]:
|
||||
"""Ingest enabled source directories declared in config.
|
||||
|
||||
Purge is disabled by default here because sources are intended to be
|
||||
read-only inputs and should not be treated as the primary writable state.
|
||||
"""
|
||||
results = []
|
||||
for source in get_source_status():
|
||||
if not source["enabled"]:
|
||||
results.append({"source": source["name"], "status": "disabled", "path": source["path"]})
|
||||
continue
|
||||
if not source["exists"] or not source["is_dir"]:
|
||||
results.append({"source": source["name"], "status": "missing", "path": source["path"]})
|
||||
continue
|
||||
|
||||
folder_results = ingest_folder(Path(source["path"]), purge_deleted=purge_deleted)
|
||||
results.append(
|
||||
{
|
||||
"source": source["name"],
|
||||
"status": "ingested",
|
||||
"path": source["path"],
|
||||
"results": folder_results,
|
||||
}
|
||||
)
|
||||
return results
|
||||
|
||||
|
||||
def get_ingestion_stats() -> dict:
|
||||
"""Return ingestion statistics."""
|
||||
with get_connection() as conn:
|
||||
|
||||
27
src/atocore/interactions/__init__.py
Normal file
27
src/atocore/interactions/__init__.py
Normal file
@@ -0,0 +1,27 @@
|
||||
"""Interactions: capture loop for AtoCore.
|
||||
|
||||
This module is the foundation for Phase 9 (Reflection) and Phase 10
|
||||
(Write-back). It records what AtoCore fed to an LLM and what came back,
|
||||
so that later phases can:
|
||||
|
||||
- reinforce active memories that the LLM actually relied on
|
||||
- extract candidate memories / project state from real conversations
|
||||
- inspect the audit trail of any answer the system helped produce
|
||||
|
||||
Nothing here automatically promotes information into trusted state.
|
||||
The capture loop is intentionally read-only with respect to trust.
|
||||
"""
|
||||
|
||||
from atocore.interactions.service import (
|
||||
Interaction,
|
||||
get_interaction,
|
||||
list_interactions,
|
||||
record_interaction,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"Interaction",
|
||||
"get_interaction",
|
||||
"list_interactions",
|
||||
"record_interaction",
|
||||
]
|
||||
245
src/atocore/interactions/service.py
Normal file
245
src/atocore/interactions/service.py
Normal file
@@ -0,0 +1,245 @@
|
||||
"""Interaction capture service.
|
||||
|
||||
An *interaction* is one round-trip of:
|
||||
- a user prompt
|
||||
- the AtoCore context pack that was assembled for it
|
||||
- the LLM response (full text or a summary, caller's choice)
|
||||
- which memories and chunks were actually used in the pack
|
||||
- a client identifier (e.g. ``openclaw``, ``claude-code``, ``manual``)
|
||||
- an optional session identifier so multi-turn conversations can be
|
||||
reconstructed later
|
||||
|
||||
The capture is intentionally additive: it never modifies memories,
|
||||
project state, or chunks. Reflection (Phase 9 Commit B/C) and
|
||||
write-back (Phase 10) are layered on top of this audit trail without
|
||||
violating the AtoCore trust hierarchy.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import uuid
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime, timezone
|
||||
|
||||
from atocore.models.database import get_connection
|
||||
from atocore.observability.logger import get_logger
|
||||
|
||||
log = get_logger("interactions")
|
||||
|
||||
|
||||
@dataclass
|
||||
class Interaction:
|
||||
id: str
|
||||
prompt: str
|
||||
response: str
|
||||
response_summary: str
|
||||
project: str
|
||||
client: str
|
||||
session_id: str
|
||||
memories_used: list[str] = field(default_factory=list)
|
||||
chunks_used: list[str] = field(default_factory=list)
|
||||
context_pack: dict = field(default_factory=dict)
|
||||
created_at: str = ""
|
||||
|
||||
|
||||
def record_interaction(
|
||||
prompt: str,
|
||||
response: str = "",
|
||||
response_summary: str = "",
|
||||
project: str = "",
|
||||
client: str = "",
|
||||
session_id: str = "",
|
||||
memories_used: list[str] | None = None,
|
||||
chunks_used: list[str] | None = None,
|
||||
context_pack: dict | None = None,
|
||||
reinforce: bool = True,
|
||||
) -> Interaction:
|
||||
"""Persist a single interaction to the audit trail.
|
||||
|
||||
The only required field is ``prompt`` so this can be called even when
|
||||
the caller is in the middle of a partial turn (for example to record
|
||||
that AtoCore was queried even before the LLM response is back).
|
||||
|
||||
When ``reinforce`` is True (default) and the interaction has response
|
||||
content, the Phase 9 Commit B reinforcement pass runs automatically
|
||||
against the active memory set. This bumps the confidence of any
|
||||
memory whose content is echoed in the response. Set ``reinforce`` to
|
||||
False to capture the interaction without touching memory confidence,
|
||||
which is useful for backfill and for tests that want to isolate the
|
||||
audit trail from the reinforcement loop.
|
||||
"""
|
||||
if not prompt or not prompt.strip():
|
||||
raise ValueError("Interaction prompt must be non-empty")
|
||||
|
||||
interaction_id = str(uuid.uuid4())
|
||||
# Store created_at explicitly so the same string lives in both the DB
|
||||
# column and the returned dataclass. SQLite's CURRENT_TIMESTAMP uses
|
||||
# 'YYYY-MM-DD HH:MM:SS' which would not compare cleanly against ISO
|
||||
# timestamps with 'T' and tz offset, breaking the `since` filter on
|
||||
# list_interactions.
|
||||
now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
|
||||
memories_used = list(memories_used or [])
|
||||
chunks_used = list(chunks_used or [])
|
||||
context_pack_payload = context_pack or {}
|
||||
|
||||
with get_connection() as conn:
|
||||
conn.execute(
|
||||
"""
|
||||
INSERT INTO interactions (
|
||||
id, prompt, context_pack, response_summary, response,
|
||||
memories_used, chunks_used, client, session_id, project,
|
||||
created_at
|
||||
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
interaction_id,
|
||||
prompt,
|
||||
json.dumps(context_pack_payload, ensure_ascii=True),
|
||||
response_summary,
|
||||
response,
|
||||
json.dumps(memories_used, ensure_ascii=True),
|
||||
json.dumps(chunks_used, ensure_ascii=True),
|
||||
client,
|
||||
session_id,
|
||||
project,
|
||||
now,
|
||||
),
|
||||
)
|
||||
|
||||
log.info(
|
||||
"interaction_recorded",
|
||||
interaction_id=interaction_id,
|
||||
project=project,
|
||||
client=client,
|
||||
session_id=session_id,
|
||||
memories_used=len(memories_used),
|
||||
chunks_used=len(chunks_used),
|
||||
response_chars=len(response),
|
||||
)
|
||||
|
||||
interaction = Interaction(
|
||||
id=interaction_id,
|
||||
prompt=prompt,
|
||||
response=response,
|
||||
response_summary=response_summary,
|
||||
project=project,
|
||||
client=client,
|
||||
session_id=session_id,
|
||||
memories_used=memories_used,
|
||||
chunks_used=chunks_used,
|
||||
context_pack=context_pack_payload,
|
||||
created_at=now,
|
||||
)
|
||||
|
||||
if reinforce and (response or response_summary):
|
||||
# Import inside the function to avoid a circular import between
|
||||
# the interactions service and the reinforcement module which
|
||||
# depends on it.
|
||||
try:
|
||||
from atocore.memory.reinforcement import reinforce_from_interaction
|
||||
|
||||
reinforce_from_interaction(interaction)
|
||||
except Exception as exc: # pragma: no cover - reinforcement must never block capture
|
||||
log.error(
|
||||
"reinforcement_failed_on_capture",
|
||||
interaction_id=interaction_id,
|
||||
error=str(exc),
|
||||
)
|
||||
|
||||
return interaction
|
||||
|
||||
|
||||
def list_interactions(
|
||||
project: str | None = None,
|
||||
session_id: str | None = None,
|
||||
client: str | None = None,
|
||||
since: str | None = None,
|
||||
limit: int = 50,
|
||||
) -> list[Interaction]:
|
||||
"""List captured interactions, optionally filtered.
|
||||
|
||||
``since`` is an ISO timestamp string; only interactions created at or
|
||||
after that time are returned. ``limit`` is hard-capped at 500 to keep
|
||||
casual API listings cheap.
|
||||
"""
|
||||
if limit <= 0:
|
||||
return []
|
||||
limit = min(limit, 500)
|
||||
|
||||
query = "SELECT * FROM interactions WHERE 1=1"
|
||||
params: list = []
|
||||
|
||||
if project:
|
||||
query += " AND project = ?"
|
||||
params.append(project)
|
||||
if session_id:
|
||||
query += " AND session_id = ?"
|
||||
params.append(session_id)
|
||||
if client:
|
||||
query += " AND client = ?"
|
||||
params.append(client)
|
||||
if since:
|
||||
query += " AND created_at >= ?"
|
||||
params.append(since)
|
||||
|
||||
query += " ORDER BY created_at DESC LIMIT ?"
|
||||
params.append(limit)
|
||||
|
||||
with get_connection() as conn:
|
||||
rows = conn.execute(query, params).fetchall()
|
||||
|
||||
return [_row_to_interaction(row) for row in rows]
|
||||
|
||||
|
||||
def get_interaction(interaction_id: str) -> Interaction | None:
|
||||
"""Fetch one interaction by id, or return None if it does not exist."""
|
||||
if not interaction_id:
|
||||
return None
|
||||
with get_connection() as conn:
|
||||
row = conn.execute(
|
||||
"SELECT * FROM interactions WHERE id = ?", (interaction_id,)
|
||||
).fetchone()
|
||||
if row is None:
|
||||
return None
|
||||
return _row_to_interaction(row)
|
||||
|
||||
|
||||
def _row_to_interaction(row) -> Interaction:
|
||||
return Interaction(
|
||||
id=row["id"],
|
||||
prompt=row["prompt"],
|
||||
response=row["response"] or "",
|
||||
response_summary=row["response_summary"] or "",
|
||||
project=row["project"] or "",
|
||||
client=row["client"] or "",
|
||||
session_id=row["session_id"] or "",
|
||||
memories_used=_safe_json_list(row["memories_used"]),
|
||||
chunks_used=_safe_json_list(row["chunks_used"]),
|
||||
context_pack=_safe_json_dict(row["context_pack"]),
|
||||
created_at=row["created_at"] or "",
|
||||
)
|
||||
|
||||
|
||||
def _safe_json_list(raw: str | None) -> list[str]:
|
||||
if not raw:
|
||||
return []
|
||||
try:
|
||||
value = json.loads(raw)
|
||||
except json.JSONDecodeError:
|
||||
return []
|
||||
if not isinstance(value, list):
|
||||
return []
|
||||
return [str(item) for item in value]
|
||||
|
||||
|
||||
def _safe_json_dict(raw: str | None) -> dict:
|
||||
if not raw:
|
||||
return {}
|
||||
try:
|
||||
value = json.loads(raw)
|
||||
except json.JSONDecodeError:
|
||||
return {}
|
||||
if not isinstance(value, dict):
|
||||
return {}
|
||||
return value
|
||||
@@ -1,29 +1,55 @@
|
||||
"""AtoCore — FastAPI application entry point."""
|
||||
|
||||
from contextlib import asynccontextmanager
|
||||
|
||||
from fastapi import FastAPI
|
||||
|
||||
from atocore.api.routes import router
|
||||
import atocore.config as _config
|
||||
from atocore.context.project_state import init_project_state_schema
|
||||
from atocore.ingestion.pipeline import get_source_status
|
||||
from atocore.models.database import init_db
|
||||
from atocore.observability.logger import setup_logging
|
||||
from atocore.observability.logger import get_logger, setup_logging
|
||||
|
||||
|
||||
log = get_logger("main")
|
||||
|
||||
|
||||
@asynccontextmanager
|
||||
async def lifespan(app: FastAPI):
|
||||
"""Run setup before the first request and teardown after shutdown.
|
||||
|
||||
Replaces the deprecated ``@app.on_event("startup")`` hook with the
|
||||
modern ``lifespan`` context manager. Setup runs synchronously (the
|
||||
underlying calls are blocking I/O) so no await is needed; the
|
||||
function still must be async per the FastAPI contract.
|
||||
"""
|
||||
setup_logging()
|
||||
_config.ensure_runtime_dirs()
|
||||
init_db()
|
||||
init_project_state_schema()
|
||||
log.info(
|
||||
"startup_ready",
|
||||
env=_config.settings.env,
|
||||
db_path=str(_config.settings.db_path),
|
||||
chroma_path=str(_config.settings.chroma_path),
|
||||
source_status=get_source_status(),
|
||||
)
|
||||
yield
|
||||
# No teardown work needed today; SQLite connections are short-lived
|
||||
# and the Chroma client cleans itself up on process exit.
|
||||
|
||||
|
||||
app = FastAPI(
|
||||
title="AtoCore",
|
||||
description="Personal Context Engine for LLM interactions",
|
||||
version="0.1.0",
|
||||
lifespan=lifespan,
|
||||
)
|
||||
|
||||
app.include_router(router)
|
||||
|
||||
|
||||
@app.on_event("startup")
|
||||
def startup():
|
||||
setup_logging()
|
||||
init_db()
|
||||
init_project_state_schema()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import uvicorn
|
||||
|
||||
|
||||
242
src/atocore/memory/extractor.py
Normal file
242
src/atocore/memory/extractor.py
Normal file
@@ -0,0 +1,242 @@
|
||||
"""Rule-based candidate-memory extraction from captured interactions.
|
||||
|
||||
Phase 9 Commit C. This module reads an interaction's response text and
|
||||
produces a list of *candidate* memories that a human can later review
|
||||
and either promote to active or reject. Nothing extracted here is ever
|
||||
automatically promoted into trusted state — the AtoCore trust rule is
|
||||
that bad memory is worse than no memory, so the extractor is
|
||||
conservative on purpose.
|
||||
|
||||
Design rules for V0
|
||||
-------------------
|
||||
1. Rule-based only. No LLM calls. The extractor should be fast, cheap,
|
||||
fully explainable, and produce the same output for the same input
|
||||
across runs.
|
||||
2. Patterns match obvious, high-signal structures and are intentionally
|
||||
narrow. False positives are more harmful than false negatives because
|
||||
every candidate means review work for a human.
|
||||
3. Every extracted candidate records which pattern fired and which text
|
||||
span it came from, so a reviewer can audit the extractor's reasoning.
|
||||
4. Patterns should feel like idioms the user already writes in their
|
||||
PKM and interaction notes:
|
||||
* ``## Decision: ...`` and variants
|
||||
* ``## Constraint: ...`` and variants
|
||||
* ``I prefer <X>`` / ``the user prefers <X>``
|
||||
* ``decided to <X>``
|
||||
* ``<X> is a requirement`` / ``requirement: <X>``
|
||||
5. Candidates are de-duplicated against already-active memories of the
|
||||
same type+project so review queues don't fill up with things the
|
||||
user has already curated.
|
||||
|
||||
The extractor produces ``MemoryCandidate`` objects. The caller decides
|
||||
whether to persist them via ``create_memory(..., status="candidate")``.
|
||||
Persistence is kept out of the extractor itself so it can be tested
|
||||
without touching the database and so future extractors (LLM-based,
|
||||
structural, ontology-driven) can be swapped in cleanly.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
|
||||
from atocore.interactions.service import Interaction
|
||||
from atocore.memory.service import MEMORY_TYPES, get_memories
|
||||
from atocore.observability.logger import get_logger
|
||||
|
||||
log = get_logger("extractor")
|
||||
|
||||
|
||||
# Version stamp recorded on every candidate so stale extractions can be
# re-evaluated (or deliberately kept) when the rule set evolves. Bump this
# whenever the rules, regex shapes, or post-processing semantics change in
# a way that could alter candidate output, per the promotion-rules doc.
#
# History:
#   0.1.0 - initial Phase 9 Commit C rule set (Apr 6, 2026)
EXTRACTOR_VERSION = "0.1.0"


@dataclass
class MemoryCandidate:
    """A proposed memory awaiting human review.

    Every candidate records which rule fired and the exact text span it
    came from, so a reviewer can audit the extractor's reasoning.
    """

    memory_type: str  # one of MEMORY_TYPES
    content: str  # cleaned candidate text
    rule: str  # id of the rule that produced this candidate
    source_span: str  # raw matched text, kept for auditability
    project: str = ""
    confidence: float = 0.5  # default review-queue confidence
    source_interaction_id: str = ""
    extractor_version: str = EXTRACTOR_VERSION
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Pattern definitions
|
||||
# ---------------------------------------------------------------------------
|
||||
#
|
||||
# Each pattern maps to:
|
||||
# - the memory type the candidate should land in
|
||||
# - a compiled regex over the response text
|
||||
# - a short human-readable rule id
|
||||
#
|
||||
# Regexes are intentionally anchored to obvious structural cues so random
|
||||
# prose doesn't light them up. All are case-insensitive and DOTALL so
|
||||
# they can span a line break inside a single logical phrase.
|
||||
|
||||
_RULES: list[tuple[str, str, re.Pattern]] = [
|
||||
(
|
||||
"decision_heading",
|
||||
"adaptation",
|
||||
re.compile(
|
||||
r"^[ \t]*#{1,6}[ \t]*decision[ \t]*[:\-\u2014][ \t]*(?P<value>.+?)$",
|
||||
re.IGNORECASE | re.MULTILINE,
|
||||
),
|
||||
),
|
||||
(
|
||||
"constraint_heading",
|
||||
"project",
|
||||
re.compile(
|
||||
r"^[ \t]*#{1,6}[ \t]*constraint[ \t]*[:\-\u2014][ \t]*(?P<value>.+?)$",
|
||||
re.IGNORECASE | re.MULTILINE,
|
||||
),
|
||||
),
|
||||
(
|
||||
"requirement_heading",
|
||||
"project",
|
||||
re.compile(
|
||||
r"^[ \t]*#{1,6}[ \t]*requirement[ \t]*[:\-\u2014][ \t]*(?P<value>.+?)$",
|
||||
re.IGNORECASE | re.MULTILINE,
|
||||
),
|
||||
),
|
||||
(
|
||||
"fact_heading",
|
||||
"knowledge",
|
||||
re.compile(
|
||||
r"^[ \t]*#{1,6}[ \t]*fact[ \t]*[:\-\u2014][ \t]*(?P<value>.+?)$",
|
||||
re.IGNORECASE | re.MULTILINE,
|
||||
),
|
||||
),
|
||||
(
|
||||
"preference_sentence",
|
||||
"preference",
|
||||
re.compile(
|
||||
r"(?:^|[\s\.])(?:I|the user)\s+prefer(?:s)?\s+(?P<value>[^\n\.\!]{6,200})",
|
||||
re.IGNORECASE,
|
||||
),
|
||||
),
|
||||
(
|
||||
"decided_to_sentence",
|
||||
"adaptation",
|
||||
re.compile(
|
||||
r"(?:^|[\s\.])(?:I|we|the user)\s+decided\s+to\s+(?P<value>[^\n\.\!]{6,200})",
|
||||
re.IGNORECASE,
|
||||
),
|
||||
),
|
||||
(
|
||||
"requirement_sentence",
|
||||
"project",
|
||||
re.compile(
|
||||
r"(?:^|[\s\.])(?:the[ \t]+)?requirement\s+(?:is|was)\s+(?P<value>[^\n\.\!]{6,200})",
|
||||
re.IGNORECASE,
|
||||
),
|
||||
),
|
||||
]
|
||||
|
||||
# A minimum content length after trimming stops silly one-word candidates.
|
||||
_MIN_CANDIDATE_LENGTH = 8
|
||||
# A maximum content length keeps candidates reviewable at a glance.
|
||||
_MAX_CANDIDATE_LENGTH = 280
|
||||
|
||||
|
||||
def extract_candidates_from_interaction(
    interaction: Interaction,
) -> list[MemoryCandidate]:
    """Return a list of candidate memories for human review.

    Nothing is persisted here; a caller that wants to land a candidate
    does so via ``create_memory(..., status="candidate")``. Candidates
    that duplicate an already-active memory of the same type+project are
    filtered out so reviewers never re-curate settled facts.
    """
    text = _combined_response_text(interaction)
    if not text:
        return []

    candidates: list[MemoryCandidate] = []
    # (memory_type, lowercased content, rule id) triples already emitted,
    # so one response can't queue the same fact twice via the same rule.
    emitted: set[tuple[str, str, str]] = set()

    for rule_id, memory_type, pattern in _RULES:
        for match in pattern.finditer(text):
            content = _clean_value(match.group("value"))
            if not (_MIN_CANDIDATE_LENGTH <= len(content) <= _MAX_CANDIDATE_LENGTH):
                continue
            key = (memory_type, content.lower(), rule_id)
            if key in emitted:
                continue
            emitted.add(key)
            candidates.append(
                MemoryCandidate(
                    memory_type=memory_type,
                    content=content,
                    rule=rule_id,
                    source_span=match.group(0).strip(),
                    project=interaction.project or "",
                    confidence=0.5,
                    source_interaction_id=interaction.id,
                )
            )

    # Never ask a reviewer about something already promoted: drop anything
    # that duplicates an active memory of the same type and project.
    kept = [cand for cand in candidates if not _matches_existing_active(cand)]

    if kept:
        log.info(
            "extraction_produced_candidates",
            interaction_id=interaction.id,
            candidate_count=len(kept),
            dropped_as_duplicate=len(candidates) - len(kept),
        )
    return kept
|
||||
|
||||
|
||||
def _combined_response_text(interaction: Interaction) -> str:
|
||||
parts: list[str] = []
|
||||
if interaction.response:
|
||||
parts.append(interaction.response)
|
||||
if interaction.response_summary:
|
||||
parts.append(interaction.response_summary)
|
||||
return "\n".join(parts).strip()
|
||||
|
||||
|
||||
def _clean_value(raw: str) -> str:
|
||||
"""Trim whitespace, strip trailing punctuation, collapse inner spaces."""
|
||||
cleaned = re.sub(r"\s+", " ", raw).strip()
|
||||
# Trim trailing punctuation that commonly trails sentences but is not
|
||||
# part of the fact itself.
|
||||
cleaned = cleaned.rstrip(".;,!?\u2014-")
|
||||
return cleaned.strip()
|
||||
|
||||
|
||||
def _matches_existing_active(candidate: MemoryCandidate) -> bool:
    """Return True if an identical active memory already exists.

    Comparison is case-insensitive and scoped to the candidate's
    type+project. Lookup failures are treated as "no match" so the
    extractor degrades to proposing possibly-duplicate candidates rather
    than crashing the capture path.
    """
    if candidate.memory_type not in MEMORY_TYPES:
        return False
    try:
        active = get_memories(
            memory_type=candidate.memory_type,
            project=candidate.project or None,
            active_only=True,
            limit=200,
        )
    except Exception as exc:  # pragma: no cover - defensive
        log.error("extractor_existing_lookup_failed", error=str(exc))
        return False
    target = candidate.content.lower()
    return any(mem.content.lower() == target for mem in active)
|
||||
155
src/atocore/memory/reinforcement.py
Normal file
155
src/atocore/memory/reinforcement.py
Normal file
@@ -0,0 +1,155 @@
|
||||
"""Reinforce active memories from captured interactions (Phase 9 Commit B).
|
||||
|
||||
When an interaction is captured with a non-empty response, this module
|
||||
scans the response text against currently-active memories and bumps the
|
||||
confidence of any memory whose content appears in the response. The
|
||||
intent is to surface a weak signal that the LLM actually relied on a
|
||||
given memory, without ever promoting anything new into trusted state.
|
||||
|
||||
Design notes
|
||||
------------
|
||||
- Matching is intentionally simple and explainable:
|
||||
* normalize both sides (lowercase, collapse whitespace)
|
||||
* require the normalized memory content (or its first 80 chars) to
|
||||
appear as a substring in the normalized response
|
||||
- Candidates and invalidated memories are NEVER considered — reinforcement
|
||||
must not revive history.
|
||||
- Reinforcement is capped at 1.0 and monotonically non-decreasing.
|
||||
- The function is idempotent with respect to a single call but will
|
||||
accumulate confidence across multiple calls; that is intentional — if
|
||||
the same memory is mentioned in 10 separate conversations it is, by
|
||||
definition, more confidently useful.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
|
||||
from atocore.interactions.service import Interaction
|
||||
from atocore.memory.service import (
|
||||
Memory,
|
||||
get_memories,
|
||||
reinforce_memory,
|
||||
)
|
||||
from atocore.observability.logger import get_logger
|
||||
|
||||
log = get_logger("reinforcement")
|
||||
|
||||
# Minimum memory content length to consider for matching. Too-short
|
||||
# memories (e.g. "use SI") would otherwise fire on almost every response
|
||||
# and generate noise. 12 characters is long enough to require real
|
||||
# semantic content but short enough to match one-liner identity
|
||||
# memories like "prefers Python".
|
||||
_MIN_MEMORY_CONTENT_LENGTH = 12
|
||||
|
||||
# When a memory's content is very long, match on its leading window only
|
||||
# to avoid punishing small paraphrases further into the body.
|
||||
_MATCH_WINDOW_CHARS = 80
|
||||
|
||||
DEFAULT_CONFIDENCE_DELTA = 0.02
|
||||
|
||||
|
||||
@dataclass
|
||||
class ReinforcementResult:
|
||||
memory_id: str
|
||||
memory_type: str
|
||||
old_confidence: float
|
||||
new_confidence: float
|
||||
|
||||
|
||||
def reinforce_from_interaction(
    interaction: Interaction,
    confidence_delta: float = DEFAULT_CONFIDENCE_DELTA,
) -> list[ReinforcementResult]:
    """Bump confidence on active memories echoed by the response text.

    Returns the reinforcements that were actually applied; empty when the
    interaction has no response content or nothing matches. The candidate
    pool is the interaction's project-scoped active memories (when a
    project is set) plus identity and preference memories, which are
    global by nature. Candidates and invalidated memories are never
    touched — ``reinforce_memory`` refuses non-active rows.
    """
    normalized_response = _normalize(_combined_response_text(interaction))
    if not normalized_response:
        return []

    # Build the pool without duplicates, preserving first-seen order
    # (dict insertion order is guaranteed).
    pool: dict[str, Memory] = {}

    def _collect(memories: list[Memory]) -> None:
        for mem in memories:
            pool.setdefault(mem.id, mem)

    if interaction.project:
        _collect(get_memories(project=interaction.project, active_only=True, limit=200))
    _collect(get_memories(memory_type="identity", active_only=True, limit=50))
    _collect(get_memories(memory_type="preference", active_only=True, limit=50))

    results: list[ReinforcementResult] = []
    for memory in pool.values():
        if not _memory_matches(memory.content, normalized_response):
            continue
        applied, before, after = reinforce_memory(
            memory.id, confidence_delta=confidence_delta
        )
        if not applied:
            continue
        results.append(
            ReinforcementResult(
                memory_id=memory.id,
                memory_type=memory.memory_type,
                old_confidence=before,
                new_confidence=after,
            )
        )

    if results:
        log.info(
            "reinforcement_applied",
            interaction_id=interaction.id,
            project=interaction.project,
            reinforced_count=len(results),
        )
    return results
|
||||
|
||||
|
||||
def _combined_response_text(interaction: Interaction) -> str:
|
||||
"""Pick the best available response text from an interaction."""
|
||||
parts: list[str] = []
|
||||
if interaction.response:
|
||||
parts.append(interaction.response)
|
||||
if interaction.response_summary:
|
||||
parts.append(interaction.response_summary)
|
||||
return "\n".join(parts).strip()
|
||||
|
||||
|
||||
def _normalize(text: str) -> str:
|
||||
"""Lowercase and collapse whitespace for substring matching."""
|
||||
if not text:
|
||||
return ""
|
||||
lowered = text.lower()
|
||||
# Collapse any run of whitespace (including newlines and tabs) to
|
||||
# a single space so multi-line responses match single-line memories.
|
||||
collapsed = re.sub(r"\s+", " ", lowered)
|
||||
return collapsed.strip()
|
||||
|
||||
|
||||
def _memory_matches(memory_content: str, normalized_response: str) -> bool:
    """Return True if the memory content appears in the response.

    Very short memories are rejected outright (noise guard), and long
    ones are matched on their first ``_MATCH_WINDOW_CHARS`` characters
    only, so paraphrases deep in the body don't block the match.
    """
    if not memory_content:
        return False
    needle = _normalize(memory_content)
    if len(needle) < _MIN_MEMORY_CONTENT_LENGTH:
        return False
    return needle[:_MATCH_WINDOW_CHARS] in normalized_response
|
||||
@@ -10,7 +10,16 @@ Memory types (per Master Plan):
|
||||
|
||||
Memories have:
|
||||
- confidence (0.0–1.0): how certain we are
|
||||
- status (active/superseded/invalid): lifecycle state
|
||||
- status: lifecycle state, one of MEMORY_STATUSES
|
||||
* candidate: extracted from an interaction, awaiting human review
|
||||
(Phase 9 Commit C). Candidates are NEVER included in
|
||||
context packs.
|
||||
* active: promoted/curated, visible to retrieval and context
|
||||
* superseded: replaced by a newer entry
|
||||
* invalid: rejected / error-corrected
|
||||
- last_referenced_at / reference_count: reinforcement signal
|
||||
(Phase 9 Commit B). Bumped whenever a captured interaction's
|
||||
response content echoes this memory.
|
||||
- optional link to source chunk: traceability
|
||||
"""
|
||||
|
||||
@@ -32,6 +41,13 @@ MEMORY_TYPES = [
|
||||
"adaptation",
|
||||
]
|
||||
|
||||
MEMORY_STATUSES = [
|
||||
"candidate",
|
||||
"active",
|
||||
"superseded",
|
||||
"invalid",
|
||||
]
|
||||
|
||||
|
||||
@dataclass
|
||||
class Memory:
|
||||
@@ -44,6 +60,8 @@ class Memory:
|
||||
status: str
|
||||
created_at: str
|
||||
updated_at: str
|
||||
last_referenced_at: str = ""
|
||||
reference_count: int = 0
|
||||
|
||||
|
||||
def create_memory(
|
||||
@@ -52,35 +70,57 @@ def create_memory(
|
||||
project: str = "",
|
||||
source_chunk_id: str = "",
|
||||
confidence: float = 1.0,
|
||||
status: str = "active",
|
||||
) -> Memory:
|
||||
"""Create a new memory entry."""
|
||||
"""Create a new memory entry.
|
||||
|
||||
``status`` defaults to ``active`` for backward compatibility. Pass
|
||||
``candidate`` when the memory is being proposed by the Phase 9 Commit C
|
||||
extractor and still needs human review before it can influence context.
|
||||
"""
|
||||
if memory_type not in MEMORY_TYPES:
|
||||
raise ValueError(f"Invalid memory type '{memory_type}'. Must be one of: {MEMORY_TYPES}")
|
||||
if status not in MEMORY_STATUSES:
|
||||
raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
|
||||
_validate_confidence(confidence)
|
||||
|
||||
memory_id = str(uuid.uuid4())
|
||||
now = datetime.now(timezone.utc).isoformat()
|
||||
|
||||
# Check for duplicate content within same type+project
|
||||
# Check for duplicate content within the same type+project at the same status.
|
||||
# Scoping by status keeps active curation separate from the candidate
|
||||
# review queue: a candidate and an active memory with identical text can
|
||||
# legitimately coexist if the candidate is a fresh extraction of something
|
||||
# already curated.
|
||||
with get_connection() as conn:
|
||||
existing = conn.execute(
|
||||
"SELECT id FROM memories "
|
||||
"WHERE memory_type = ? AND content = ? AND project = ? AND status = 'active'",
|
||||
(memory_type, content, project),
|
||||
"WHERE memory_type = ? AND content = ? AND project = ? AND status = ?",
|
||||
(memory_type, content, project, status),
|
||||
).fetchone()
|
||||
if existing:
|
||||
log.info("memory_duplicate_skipped", memory_type=memory_type, content_preview=content[:80])
|
||||
log.info(
|
||||
"memory_duplicate_skipped",
|
||||
memory_type=memory_type,
|
||||
status=status,
|
||||
content_preview=content[:80],
|
||||
)
|
||||
return _row_to_memory(
|
||||
conn.execute("SELECT * FROM memories WHERE id = ?", (existing["id"],)).fetchone()
|
||||
)
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO memories (id, memory_type, content, project, source_chunk_id, confidence, status) "
|
||||
"VALUES (?, ?, ?, ?, ?, ?, 'active')",
|
||||
(memory_id, memory_type, content, project, source_chunk_id or None, confidence),
|
||||
"VALUES (?, ?, ?, ?, ?, ?, ?)",
|
||||
(memory_id, memory_type, content, project, source_chunk_id or None, confidence, status),
|
||||
)
|
||||
|
||||
log.info("memory_created", memory_type=memory_type, content_preview=content[:80])
|
||||
log.info(
|
||||
"memory_created",
|
||||
memory_type=memory_type,
|
||||
status=status,
|
||||
content_preview=content[:80],
|
||||
)
|
||||
|
||||
return Memory(
|
||||
id=memory_id,
|
||||
@@ -89,9 +129,11 @@ def create_memory(
|
||||
project=project,
|
||||
source_chunk_id=source_chunk_id,
|
||||
confidence=confidence,
|
||||
status="active",
|
||||
status=status,
|
||||
created_at=now,
|
||||
updated_at=now,
|
||||
last_referenced_at="",
|
||||
reference_count=0,
|
||||
)
|
||||
|
||||
|
||||
@@ -101,8 +143,18 @@ def get_memories(
|
||||
active_only: bool = True,
|
||||
min_confidence: float = 0.0,
|
||||
limit: int = 50,
|
||||
status: str | None = None,
|
||||
) -> list[Memory]:
|
||||
"""Retrieve memories, optionally filtered."""
|
||||
"""Retrieve memories, optionally filtered.
|
||||
|
||||
When ``status`` is provided explicitly, it takes precedence over
|
||||
``active_only`` so callers can list the candidate review queue via
|
||||
``get_memories(status='candidate')``. When ``status`` is omitted the
|
||||
legacy ``active_only`` behaviour still applies.
|
||||
"""
|
||||
if status is not None and status not in MEMORY_STATUSES:
|
||||
raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
|
||||
|
||||
query = "SELECT * FROM memories WHERE 1=1"
|
||||
params: list = []
|
||||
|
||||
@@ -112,7 +164,10 @@ def get_memories(
|
||||
if project is not None:
|
||||
query += " AND project = ?"
|
||||
params.append(project)
|
||||
if active_only:
|
||||
if status is not None:
|
||||
query += " AND status = ?"
|
||||
params.append(status)
|
||||
elif active_only:
|
||||
query += " AND status = 'active'"
|
||||
if min_confidence > 0:
|
||||
query += " AND confidence >= ?"
|
||||
@@ -163,8 +218,8 @@ def update_memory(
|
||||
updates.append("confidence = ?")
|
||||
params.append(confidence)
|
||||
if status is not None:
|
||||
if status not in ("active", "superseded", "invalid"):
|
||||
raise ValueError(f"Invalid status '{status}'")
|
||||
if status not in MEMORY_STATUSES:
|
||||
raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
|
||||
updates.append("status = ?")
|
||||
params.append(status)
|
||||
|
||||
@@ -195,6 +250,83 @@ def supersede_memory(memory_id: str) -> bool:
|
||||
return update_memory(memory_id, status="superseded")
|
||||
|
||||
|
||||
def promote_memory(memory_id: str) -> bool:
    """Promote a candidate memory to active (Phase 9 Commit C review queue).

    Returns False when the memory does not exist or is not currently a
    candidate. The status write is delegated to ``update_memory`` so its
    validation (including the duplicate-active check) and logging apply;
    it raises ValueError only if promotion would duplicate an active memory.
    """
    with get_connection() as conn:
        row = conn.execute(
            "SELECT status FROM memories WHERE id = ?", (memory_id,)
        ).fetchone()
        if row is None or row["status"] != "candidate":
            return False
        return update_memory(memory_id, status="active")
|
||||
|
||||
|
||||
def reject_candidate_memory(memory_id: str) -> bool:
    """Reject a candidate memory (Phase 9 Commit C).

    Marks the candidate ``invalid`` so it leaves the review queue without
    polluting the active set. Returns False when the memory does not
    exist or is not currently a candidate.
    """
    with get_connection() as conn:
        row = conn.execute(
            "SELECT status FROM memories WHERE id = ?", (memory_id,)
        ).fetchone()
        if row is None or row["status"] != "candidate":
            return False
        return update_memory(memory_id, status="invalid")
|
||||
|
||||
|
||||
def reinforce_memory(
    memory_id: str,
    confidence_delta: float = 0.02,
) -> tuple[bool, float, float]:
    """Bump a memory's confidence and reference count (Phase 9 Commit B).

    Returns ``(applied, old_confidence, new_confidence)``. ``applied`` is
    False when the memory is missing or not in the ``active`` state —
    reinforcement only touches live memories, so the candidate queue and
    invalidated history are never silently revived.

    Confidence is capped at 1.0. ``last_referenced_at`` is stamped with
    the current UTC time in SQLite-comparable format, and
    ``reference_count`` increments by exactly one per call regardless of
    the delta size.

    Raises ValueError if ``confidence_delta`` is negative.
    """
    if confidence_delta < 0:
        raise ValueError("confidence_delta must be non-negative for reinforcement")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    with get_connection() as conn:
        row = conn.execute(
            "SELECT confidence, status FROM memories WHERE id = ?", (memory_id,)
        ).fetchone()
        if row is None or row["status"] != "active":
            return False, 0.0, 0.0
        before = float(row["confidence"])
        after = min(1.0, before + confidence_delta)
        conn.execute(
            "UPDATE memories SET confidence = ?, last_referenced_at = ?, "
            "reference_count = COALESCE(reference_count, 0) + 1 "
            "WHERE id = ?",
            (after, stamp, memory_id),
        )
        log.info(
            "memory_reinforced",
            memory_id=memory_id,
            old_confidence=round(before, 4),
            new_confidence=round(after, 4),
        )
        return True, before, after
|
||||
|
||||
|
||||
def get_memories_for_context(
|
||||
memory_types: list[str] | None = None,
|
||||
project: str | None = None,
|
||||
@@ -251,6 +383,9 @@ def get_memories_for_context(
|
||||
|
||||
def _row_to_memory(row) -> Memory:
|
||||
"""Convert a DB row to Memory dataclass."""
|
||||
keys = row.keys() if hasattr(row, "keys") else []
|
||||
last_ref = row["last_referenced_at"] if "last_referenced_at" in keys else None
|
||||
ref_count = row["reference_count"] if "reference_count" in keys else 0
|
||||
return Memory(
|
||||
id=row["id"],
|
||||
memory_type=row["memory_type"],
|
||||
@@ -261,6 +396,8 @@ def _row_to_memory(row) -> Memory:
|
||||
status=row["status"],
|
||||
created_at=row["created_at"],
|
||||
updated_at=row["updated_at"],
|
||||
last_referenced_at=last_ref or "",
|
||||
reference_count=int(ref_count or 0),
|
||||
)
|
||||
|
||||
|
||||
|
||||
@@ -41,6 +41,8 @@ CREATE TABLE IF NOT EXISTS memories (
|
||||
source_chunk_id TEXT REFERENCES source_chunks(id),
|
||||
confidence REAL DEFAULT 1.0,
|
||||
status TEXT DEFAULT 'active',
|
||||
last_referenced_at DATETIME,
|
||||
reference_count INTEGER DEFAULT 0,
|
||||
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
|
||||
);
|
||||
@@ -59,6 +61,12 @@ CREATE TABLE IF NOT EXISTS interactions (
|
||||
prompt TEXT NOT NULL,
|
||||
context_pack TEXT DEFAULT '{}',
|
||||
response_summary TEXT DEFAULT '',
|
||||
response TEXT DEFAULT '',
|
||||
memories_used TEXT DEFAULT '[]',
|
||||
chunks_used TEXT DEFAULT '[]',
|
||||
client TEXT DEFAULT '',
|
||||
session_id TEXT DEFAULT '',
|
||||
project TEXT DEFAULT '',
|
||||
project_id TEXT REFERENCES projects(id),
|
||||
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
|
||||
);
|
||||
@@ -68,11 +76,14 @@ CREATE INDEX IF NOT EXISTS idx_memories_type ON memories(memory_type);
|
||||
CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project);
|
||||
CREATE INDEX IF NOT EXISTS idx_memories_status ON memories(status);
|
||||
CREATE INDEX IF NOT EXISTS idx_interactions_project ON interactions(project_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project);
|
||||
CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_interactions_created_at ON interactions(created_at);
|
||||
"""
|
||||
|
||||
|
||||
def _ensure_data_dir() -> None:
|
||||
_config.settings.data_dir.mkdir(parents=True, exist_ok=True)
|
||||
_config.ensure_runtime_dirs()
|
||||
|
||||
|
||||
def init_db() -> None:
|
||||
@@ -90,6 +101,47 @@ def _apply_migrations(conn: sqlite3.Connection) -> None:
|
||||
conn.execute("ALTER TABLE memories ADD COLUMN project TEXT DEFAULT ''")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project)")
|
||||
|
||||
# Phase 9 Commit B: reinforcement columns.
|
||||
# last_referenced_at records when a memory was most recently referenced
|
||||
# in a captured interaction; reference_count is a monotonically
|
||||
# increasing counter bumped on every reference. Together they let
|
||||
# Reflection (Commit C) and decay (deferred) reason about which
|
||||
# memories are actually being used versus which have gone cold.
|
||||
if not _column_exists(conn, "memories", "last_referenced_at"):
|
||||
conn.execute("ALTER TABLE memories ADD COLUMN last_referenced_at DATETIME")
|
||||
if not _column_exists(conn, "memories", "reference_count"):
|
||||
conn.execute("ALTER TABLE memories ADD COLUMN reference_count INTEGER DEFAULT 0")
|
||||
conn.execute(
|
||||
"CREATE INDEX IF NOT EXISTS idx_memories_last_referenced ON memories(last_referenced_at)"
|
||||
)
|
||||
|
||||
# Phase 9 Commit A: capture loop columns on the interactions table.
|
||||
# The original schema only carried prompt + project_id + a context_pack
|
||||
# JSON blob. To make interactions a real audit trail of what AtoCore fed
|
||||
# the LLM and what came back, we record the full response, the chunk
|
||||
# and memory ids that were actually used, plus client + session metadata.
|
||||
if not _column_exists(conn, "interactions", "response"):
|
||||
conn.execute("ALTER TABLE interactions ADD COLUMN response TEXT DEFAULT ''")
|
||||
if not _column_exists(conn, "interactions", "memories_used"):
|
||||
conn.execute("ALTER TABLE interactions ADD COLUMN memories_used TEXT DEFAULT '[]'")
|
||||
if not _column_exists(conn, "interactions", "chunks_used"):
|
||||
conn.execute("ALTER TABLE interactions ADD COLUMN chunks_used TEXT DEFAULT '[]'")
|
||||
if not _column_exists(conn, "interactions", "client"):
|
||||
conn.execute("ALTER TABLE interactions ADD COLUMN client TEXT DEFAULT ''")
|
||||
if not _column_exists(conn, "interactions", "session_id"):
|
||||
conn.execute("ALTER TABLE interactions ADD COLUMN session_id TEXT DEFAULT ''")
|
||||
if not _column_exists(conn, "interactions", "project"):
|
||||
conn.execute("ALTER TABLE interactions ADD COLUMN project TEXT DEFAULT ''")
|
||||
conn.execute(
|
||||
"CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id)"
|
||||
)
|
||||
conn.execute(
|
||||
"CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project)"
|
||||
)
|
||||
conn.execute(
|
||||
"CREATE INDEX IF NOT EXISTS idx_interactions_created_at ON interactions(created_at)"
|
||||
)
|
||||
|
||||
|
||||
def _column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
|
||||
rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
|
||||
@@ -100,9 +152,15 @@ def _column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
|
||||
def get_connection() -> Generator[sqlite3.Connection, None, None]:
|
||||
"""Get a database connection with row factory."""
|
||||
_ensure_data_dir()
|
||||
conn = sqlite3.connect(str(_config.settings.db_path))
|
||||
conn = sqlite3.connect(
|
||||
str(_config.settings.db_path),
|
||||
timeout=_config.settings.db_busy_timeout_ms / 1000,
|
||||
)
|
||||
conn.row_factory = sqlite3.Row
|
||||
conn.execute("PRAGMA foreign_keys = ON")
|
||||
conn.execute(f"PRAGMA busy_timeout = {_config.settings.db_busy_timeout_ms}")
|
||||
conn.execute("PRAGMA journal_mode = WAL")
|
||||
conn.execute("PRAGMA synchronous = NORMAL")
|
||||
try:
|
||||
yield conn
|
||||
conn.commit()
|
||||
|
||||
@@ -16,15 +16,18 @@ _LOG_LEVELS = {
|
||||
def setup_logging() -> None:
|
||||
"""Configure structlog with JSON output."""
|
||||
log_level = "DEBUG" if _config.settings.debug else "INFO"
|
||||
renderer = (
|
||||
structlog.dev.ConsoleRenderer()
|
||||
if _config.settings.debug
|
||||
else structlog.processors.JSONRenderer()
|
||||
)
|
||||
|
||||
structlog.configure(
|
||||
processors=[
|
||||
structlog.contextvars.merge_contextvars,
|
||||
structlog.processors.add_log_level,
|
||||
structlog.processors.TimeStamper(fmt="iso"),
|
||||
structlog.dev.ConsoleRenderer()
|
||||
if settings.debug
|
||||
else structlog.processors.JSONRenderer(),
|
||||
renderer,
|
||||
],
|
||||
wrapper_class=structlog.make_filtering_bound_logger(
|
||||
_LOG_LEVELS.get(log_level, logging.INFO)
|
||||
|
||||
1
src/atocore/ops/__init__.py
Normal file
1
src/atocore/ops/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
"""Operational utilities for running AtoCore safely."""
|
||||
250
src/atocore/ops/backup.py
Normal file
250
src/atocore/ops/backup.py
Normal file
@@ -0,0 +1,250 @@
|
||||
"""Create safe runtime backups for the AtoCore machine store.
|
||||
|
||||
This module is intentionally conservative:
|
||||
|
||||
- The SQLite snapshot uses the online ``conn.backup()`` API and is safe to
|
||||
call while the database is in use.
|
||||
- The project registry snapshot is a simple file copy of the canonical
|
||||
registry JSON.
|
||||
- The Chroma snapshot is a *cold* directory copy. To stay safe it must be
|
||||
taken while no ingestion is running. The recommended pattern from the API
|
||||
layer is to acquire ``exclusive_ingestion()`` for the duration of the
|
||||
backup so refreshes and ingestions cannot run concurrently with the copy.
|
||||
|
||||
The backup metadata file records what was actually included so restore
|
||||
tooling does not have to guess.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import shutil
|
||||
import sqlite3
|
||||
from datetime import datetime, UTC
|
||||
from pathlib import Path
|
||||
|
||||
import atocore.config as _config
|
||||
from atocore.models.database import init_db
|
||||
from atocore.observability.logger import get_logger
|
||||
|
||||
log = get_logger("backup")
|
||||
|
||||
|
||||
def create_runtime_backup(
    timestamp: datetime | None = None,
    include_chroma: bool = False,
) -> dict:
    """Create a hot SQLite backup plus registry/config metadata.

    When ``include_chroma`` is true the Chroma persistence directory is also
    snapshotted as a cold directory copy. The caller is responsible for
    ensuring no ingestion is running concurrently. The HTTP layer enforces
    this by holding ``exclusive_ingestion()`` around the call.

    Args:
        timestamp: Fixed timestamp to use for the snapshot directory name;
            defaults to the current UTC time.
        include_chroma: Also take a cold directory copy of the Chroma store.

    Returns:
        The metadata dict, which is also persisted to ``backup-metadata.json``
        inside the snapshot directory.
    """
    # Ensure the schema exists so backing up a brand-new install still works.
    init_db()
    now = timestamp or datetime.now(UTC)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")

    # Layout: <backup_dir>/snapshots/<stamp>/{db,config,chroma,backup-metadata.json}
    backup_root = _config.settings.resolved_backup_dir / "snapshots" / stamp
    db_backup_dir = backup_root / "db"
    config_backup_dir = backup_root / "config"
    chroma_backup_dir = backup_root / "chroma"
    metadata_path = backup_root / "backup-metadata.json"

    db_backup_dir.mkdir(parents=True, exist_ok=True)
    config_backup_dir.mkdir(parents=True, exist_ok=True)

    # Online (hot) SQLite snapshot — safe while the DB is in use.
    db_snapshot_path = db_backup_dir / _config.settings.db_path.name
    _backup_sqlite_db(_config.settings.db_path, db_snapshot_path)

    # Registry snapshot is a plain text copy; missing registry is not an error.
    registry_snapshot = None
    registry_path = _config.settings.resolved_project_registry_path
    if registry_path.exists():
        registry_snapshot = config_backup_dir / registry_path.name
        registry_snapshot.write_text(
            registry_path.read_text(encoding="utf-8"), encoding="utf-8"
        )

    # Chroma snapshot is opt-in because it is only safe with no ingestion running.
    chroma_snapshot_path = ""
    chroma_files_copied = 0
    chroma_bytes_copied = 0
    if include_chroma:
        source_chroma = _config.settings.chroma_path
        if source_chroma.exists() and source_chroma.is_dir():
            chroma_backup_dir.mkdir(parents=True, exist_ok=True)
            chroma_files_copied, chroma_bytes_copied = _copy_directory_tree(
                source_chroma, chroma_backup_dir
            )
            chroma_snapshot_path = str(chroma_backup_dir)
        else:
            # A missing Chroma dir is logged but does not fail the backup.
            log.info(
                "chroma_snapshot_skipped_missing",
                path=str(source_chroma),
            )

    # Record exactly what was included so restore tooling does not guess.
    metadata = {
        "created_at": now.isoformat(),
        "backup_root": str(backup_root),
        "db_snapshot_path": str(db_snapshot_path),
        "db_size_bytes": db_snapshot_path.stat().st_size,
        "registry_snapshot_path": str(registry_snapshot) if registry_snapshot else "",
        "chroma_snapshot_path": chroma_snapshot_path,
        "chroma_snapshot_bytes": chroma_bytes_copied,
        "chroma_snapshot_files": chroma_files_copied,
        "chroma_snapshot_included": include_chroma,
        "vector_store_note": (
            "Chroma snapshot included as cold directory copy."
            if include_chroma and chroma_snapshot_path
            else "Chroma hot backup is not included; rerun with include_chroma=True under exclusive_ingestion()."
        ),
    }
    metadata_path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=True) + "\n",
        encoding="utf-8",
    )

    log.info(
        "runtime_backup_created",
        backup_root=str(backup_root),
        db_snapshot=str(db_snapshot_path),
        chroma_included=include_chroma,
        chroma_bytes=chroma_bytes_copied,
    )
    return metadata
|
||||
|
||||
|
||||
def list_runtime_backups() -> list[dict]:
    """List all runtime backups under the configured backup directory."""
    snapshots_root = _config.settings.resolved_backup_dir / "snapshots"
    if not (snapshots_root.exists() and snapshots_root.is_dir()):
        return []

    entries: list[dict] = []
    for snapshot_dir in sorted(snapshots_root.iterdir()):
        if not snapshot_dir.is_dir():
            continue
        metadata_path = snapshot_dir / "backup-metadata.json"
        has_metadata = metadata_path.exists()
        entry: dict = {
            "stamp": snapshot_dir.name,
            "path": str(snapshot_dir),
            "has_metadata": has_metadata,
        }
        if has_metadata:
            try:
                entry["metadata"] = json.loads(metadata_path.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                # Surface the corruption instead of silently dropping the snapshot.
                entry["metadata"] = None
                entry["metadata_error"] = "invalid_json"
        entries.append(entry)
    return entries
|
||||
|
||||
|
||||
def validate_backup(stamp: str) -> dict:
    """Validate that a previously created backup is structurally usable.

    Checks:
    - the snapshot directory exists
    - the SQLite snapshot is openable and ``PRAGMA integrity_check`` returns ok
    - the registry snapshot, if recorded, parses as JSON
    - the chroma snapshot directory, if recorded, exists

    Args:
        stamp: Snapshot directory name (the ``%Y%m%dT%H%M%SZ`` stamp).

    Returns:
        A dict with per-component flags, accumulated ``errors``, and — once
        all structural checks have run — an overall ``valid`` boolean.
        ``registry_ok``/``chroma_ok`` stay ``None`` when the metadata did not
        record that component.
    """
    snapshot_dir = _config.settings.resolved_backup_dir / "snapshots" / stamp
    result: dict = {
        "stamp": stamp,
        "path": str(snapshot_dir),
        "exists": snapshot_dir.exists(),
        "db_ok": False,
        "registry_ok": None,
        "chroma_ok": None,
        "errors": [],
    }
    # Early returns below intentionally omit the "valid" key: validation
    # could not even start, so there is nothing meaningful to summarize.
    if not snapshot_dir.exists():
        result["errors"].append("snapshot_directory_missing")
        return result

    metadata_path = snapshot_dir / "backup-metadata.json"
    if not metadata_path.exists():
        result["errors"].append("metadata_missing")
        return result

    try:
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        result["errors"].append(f"metadata_invalid_json: {exc}")
        return result
    result["metadata"] = metadata

    # The DB snapshot is mandatory; check it opens and passes integrity_check.
    db_path = Path(metadata.get("db_snapshot_path", ""))
    if not db_path.exists():
        result["errors"].append("db_snapshot_missing")
    else:
        try:
            with sqlite3.connect(str(db_path)) as conn:
                row = conn.execute("PRAGMA integrity_check").fetchone()
                result["db_ok"] = bool(row and row[0] == "ok")
                if not result["db_ok"]:
                    result["errors"].append(
                        f"db_integrity_check_failed: {row[0] if row else 'no_row'}"
                    )
        except sqlite3.DatabaseError as exc:
            result["errors"].append(f"db_open_failed: {exc}")

    # Registry snapshot is optional; only validate when metadata recorded one.
    registry_snapshot_path = metadata.get("registry_snapshot_path", "")
    if registry_snapshot_path:
        registry_path = Path(registry_snapshot_path)
        if not registry_path.exists():
            result["registry_ok"] = False
            result["errors"].append("registry_snapshot_missing")
        else:
            try:
                json.loads(registry_path.read_text(encoding="utf-8"))
                result["registry_ok"] = True
            except json.JSONDecodeError as exc:
                result["registry_ok"] = False
                result["errors"].append(f"registry_invalid_json: {exc}")

    # Chroma snapshot is only checked for existence (cold directory copy).
    chroma_snapshot_path = metadata.get("chroma_snapshot_path", "")
    if chroma_snapshot_path:
        chroma_dir = Path(chroma_snapshot_path)
        if chroma_dir.exists() and chroma_dir.is_dir():
            result["chroma_ok"] = True
        else:
            result["chroma_ok"] = False
            result["errors"].append("chroma_snapshot_missing")

    result["valid"] = not result["errors"]
    return result
|
||||
|
||||
|
||||
def _backup_sqlite_db(source_path: Path, dest_path: Path) -> None:
|
||||
source_conn = sqlite3.connect(str(source_path))
|
||||
dest_conn = sqlite3.connect(str(dest_path))
|
||||
try:
|
||||
source_conn.backup(dest_conn)
|
||||
finally:
|
||||
dest_conn.close()
|
||||
source_conn.close()
|
||||
|
||||
|
||||
def _copy_directory_tree(source: Path, dest: Path) -> tuple[int, int]:
|
||||
"""Copy a directory tree and return (file_count, total_bytes)."""
|
||||
if dest.exists():
|
||||
shutil.rmtree(dest)
|
||||
shutil.copytree(source, dest)
|
||||
|
||||
file_count = 0
|
||||
total_bytes = 0
|
||||
for path in dest.rglob("*"):
|
||||
if path.is_file():
|
||||
file_count += 1
|
||||
total_bytes += path.stat().st_size
|
||||
return file_count, total_bytes
|
||||
|
||||
|
||||
def main() -> None:
    """CLI entry point: create a runtime backup and print its metadata as JSON."""
    metadata = create_runtime_backup()
    print(json.dumps(metadata, indent=2, ensure_ascii=True))
|
||||
|
||||
|
||||
# Allow ad-hoc backups via `python -m atocore.ops.backup` or direct execution.
if __name__ == "__main__":
    main()
|
||||
1
src/atocore/projects/__init__.py
Normal file
1
src/atocore/projects/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
"""Project registry and source refresh helpers."""
|
||||
437
src/atocore/projects/registry.py
Normal file
437
src/atocore/projects/registry.py
Normal file
@@ -0,0 +1,437 @@
|
||||
"""Registered project source metadata and refresh helpers."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import tempfile
|
||||
from dataclasses import asdict, dataclass
|
||||
from pathlib import Path
|
||||
|
||||
import atocore.config as _config
|
||||
from atocore.ingestion.pipeline import ingest_folder
|
||||
|
||||
|
||||
@dataclass(frozen=True)
class ProjectSourceRef:
    """A (source, subpath) reference into one of the configured source roots."""

    # Source root key; "vault" or "drive" (resolved by _resolve_ingest_root).
    source: str
    # Path relative to that source root's base directory.
    subpath: str
    # Optional human-readable label for operators.
    label: str = ""
|
||||
|
||||
|
||||
@dataclass(frozen=True)
class RegisteredProject:
    """A validated project registry entry loaded from the registry JSON."""

    # Canonical project id.
    project_id: str
    # Alternative lookup names; ids and aliases are unique case-insensitively.
    aliases: tuple[str, ...]
    # Free-form description shown in listings.
    description: str
    # Where this project's documents are staged for ingestion (non-empty).
    ingest_roots: tuple[ProjectSourceRef, ...]
|
||||
|
||||
|
||||
def get_project_registry_template() -> dict:
    """Return a minimal template for registering a new project."""
    example_root = {
        "source": "vault",
        "subpath": "incoming/projects/p07-example",
        "label": "Primary staged project docs",
    }
    example_project = {
        "id": "p07-example",
        "aliases": ["p07", "example-project"],
        "description": "Short description of the project and staged corpus.",
        "ingest_roots": [example_root],
    }
    return {"projects": [example_project]}
|
||||
|
||||
|
||||
def build_project_registration_proposal(
    project_id: str,
    aliases: list[str] | tuple[str, ...] | None = None,
    description: str = "",
    ingest_roots: list[dict] | tuple[dict, ...] | None = None,
) -> dict:
    """Build a normalized project registration proposal without mutating state.

    Args:
        project_id: Requested canonical id; must be non-empty after stripping.
        aliases: Optional alternative names; deduped case-insensitively.
        description: Optional free-form description.
        ingest_roots: Dicts with "source"/"subpath" (and optional "label");
            at least one usable root is required.

    Returns:
        A dict with the normalized "project" entry, each root resolved to an
        absolute path with exists/is_dir readiness flags, any name
        "collisions" against already-registered projects, and a "valid" flag.

    Raises:
        ValueError: On an empty id, no usable ingest roots, or an
            unsupported source key.
    """
    normalized_id = project_id.strip()
    if not normalized_id:
        raise ValueError("Project id must be non-empty")

    normalized_aliases = _normalize_aliases(aliases or [])
    normalized_roots = _normalize_ingest_roots(ingest_roots or [])
    if not normalized_roots:
        raise ValueError("At least one ingest root is required")

    # Collisions are reported, not raised, so callers can preview a proposal.
    collisions = _find_name_collisions(normalized_id, normalized_aliases)
    resolved_roots = []
    for root in normalized_roots:
        source_ref = ProjectSourceRef(
            source=root["source"],
            subpath=root["subpath"],
            label=root.get("label", ""),
        )
        resolved_path = _resolve_ingest_root(source_ref)
        resolved_roots.append(
            {
                **root,
                "path": str(resolved_path),
                "exists": resolved_path.exists(),
                "is_dir": resolved_path.is_dir(),
            }
        )

    return {
        "project": {
            "id": normalized_id,
            "aliases": normalized_aliases,
            "description": description.strip(),
            "ingest_roots": normalized_roots,
        },
        "resolved_ingest_roots": resolved_roots,
        "collisions": collisions,
        "registry_path": str(_config.settings.resolved_project_registry_path),
        "valid": not collisions,
    }
|
||||
|
||||
|
||||
def register_project(
    project_id: str,
    aliases: list[str] | tuple[str, ...] | None = None,
    description: str = "",
    ingest_roots: list[dict] | tuple[dict, ...] | None = None,
) -> dict:
    """Persist a validated project registration to the registry file."""
    proposal = build_project_registration_proposal(
        project_id=project_id,
        aliases=aliases,
        description=description,
        ingest_roots=ingest_roots,
    )
    # Refuse to write an entry whose id or aliases clash with existing projects.
    if not proposal["valid"]:
        collision_names = ", ".join(item["name"] for item in proposal["collisions"])
        raise ValueError(f"Project registration has collisions: {collision_names}")

    registry_path = _config.settings.resolved_project_registry_path
    payload = _load_registry_payload(registry_path)
    payload.setdefault("projects", []).append(proposal["project"])
    _write_registry_payload(registry_path, payload)

    return {**proposal, "status": "registered"}
|
||||
|
||||
|
||||
def update_project(
    project_name: str,
    aliases: list[str] | tuple[str, ...] | None = None,
    description: str | None = None,
    ingest_roots: list[dict] | tuple[dict, ...] | None = None,
) -> dict:
    """Update an existing project registration in the registry file.

    ``None`` for any field means "keep the existing value"; the project id
    itself cannot be changed.

    Args:
        project_name: Existing project id or alias to update.
        aliases: Replacement aliases, or ``None`` to keep the current ones.
        description: Replacement description, or ``None`` to keep it.
        ingest_roots: Replacement ingest roots, or ``None`` to keep them.

    Returns:
        A dict mirroring the registration proposal shape with
        ``status="updated"``.

    Raises:
        ValueError: For an unknown project, no usable ingest roots, or alias
            collisions with other registered projects.
    """
    existing = get_registered_project(project_name)
    if existing is None:
        raise ValueError(f"Unknown project: {project_name}")

    # Merge requested changes over the stored entry field by field.
    final_aliases = _normalize_aliases(aliases) if aliases is not None else list(existing.aliases)
    final_description = description.strip() if description is not None else existing.description
    final_roots = (
        _normalize_ingest_roots(ingest_roots)
        if ingest_roots is not None
        else [asdict(root) for root in existing.ingest_roots]
    )
    if not final_roots:
        raise ValueError("At least one ingest root is required")

    # Exclude the project itself so keeping its own names is not a collision.
    collisions = _find_name_collisions(
        existing.project_id,
        final_aliases,
        exclude_project_id=existing.project_id,
    )
    if collisions:
        collision_names = ", ".join(collision["name"] for collision in collisions)
        raise ValueError(f"Project update has collisions: {collision_names}")

    updated_entry = {
        "id": existing.project_id,
        "aliases": final_aliases,
        "description": final_description,
        "ingest_roots": final_roots,
    }

    # Resolve roots to absolute paths with readiness flags for the response.
    resolved_roots = []
    for root in final_roots:
        source_ref = ProjectSourceRef(
            source=root["source"],
            subpath=root["subpath"],
            label=root.get("label", ""),
        )
        resolved_path = _resolve_ingest_root(source_ref)
        resolved_roots.append(
            {
                **root,
                "path": str(resolved_path),
                "exists": resolved_path.exists(),
                "is_dir": resolved_path.is_dir(),
            }
        )

    # Replace the matching entry in place, preserving registry order.
    registry_path = _config.settings.resolved_project_registry_path
    payload = _load_registry_payload(registry_path)
    payload["projects"] = [
        updated_entry if str(entry.get("id", "")).strip() == existing.project_id else entry
        for entry in payload.get("projects", [])
    ]
    _write_registry_payload(registry_path, payload)

    return {
        "project": updated_entry,
        "resolved_ingest_roots": resolved_roots,
        "collisions": [],
        "registry_path": str(registry_path),
        "valid": True,
        "status": "updated",
    }
|
||||
|
||||
|
||||
def load_project_registry() -> list[RegisteredProject]:
    """Load project registry entries from JSON config.

    Returns:
        All entries parsed into :class:`RegisteredProject` values, in file
        order. A missing registry file yields an empty list.

    Raises:
        ValueError: For an entry without an id or ingest roots, or when two
            projects claim the same id/alias (case-insensitive).
        KeyError: When an entry lacks the "id" key entirely.
    """
    registry_path = _config.settings.resolved_project_registry_path
    payload = _load_registry_payload(registry_path)
    entries = payload.get("projects", [])
    projects: list[RegisteredProject] = []

    for entry in entries:
        project_id = str(entry["id"]).strip()
        if not project_id:
            raise ValueError("Project registry entry is missing a non-empty id")
        # Non-string or blank aliases are silently dropped.
        aliases = tuple(
            alias.strip()
            for alias in entry.get("aliases", [])
            if isinstance(alias, str) and alias.strip()
        )
        description = str(entry.get("description", "")).strip()
        # Roots missing either source or subpath are silently dropped; an
        # entry with no usable roots at all is an error below.
        ingest_roots = tuple(
            ProjectSourceRef(
                source=str(root["source"]).strip(),
                subpath=str(root["subpath"]).strip(),
                label=str(root.get("label", "")).strip(),
            )
            for root in entry.get("ingest_roots", [])
            if str(root.get("source", "")).strip()
            and str(root.get("subpath", "")).strip()
        )
        if not ingest_roots:
            raise ValueError(f"Project registry entry '{project_id}' has no ingest_roots")
        projects.append(
            RegisteredProject(
                project_id=project_id,
                aliases=aliases,
                description=description,
                ingest_roots=ingest_roots,
            )
        )

    _validate_unique_project_names(projects)
    return projects
|
||||
|
||||
|
||||
def list_registered_projects() -> list[dict]:
    """Return registry entries with resolved source readiness."""
    projects = load_project_registry()
    return [_project_to_dict(entry) for entry in projects]
|
||||
|
||||
|
||||
def get_registered_project(project_name: str) -> RegisteredProject | None:
    """Resolve a registry entry by id or alias (case-insensitive)."""
    needle = project_name.strip().lower()
    if not needle:
        return None

    for project in load_project_registry():
        if needle == project.project_id.lower():
            return project
        if any(needle == alias.lower() for alias in project.aliases):
            return project
    return None
|
||||
|
||||
|
||||
def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
    """Ingest all configured source roots for a registered project.

    The returned dict carries an overall ``status`` so callers can tell at a
    glance whether the refresh was fully successful, partial, or did nothing
    at all because every configured root was missing or not a directory:

    - ``ingested``: every root was a real directory and was ingested
    - ``partial``: at least one root ingested and at least one was unusable
    - ``nothing_to_ingest``: no roots were usable

    Args:
        project_name: Project id or alias to refresh.
        purge_deleted: Forwarded to ``ingest_folder`` for each usable root.

    Raises:
        ValueError: When no registered project matches ``project_name``.
    """
    project = get_registered_project(project_name)
    if project is None:
        raise ValueError(f"Unknown project: {project_name}")

    # Per-root outcomes plus counters that determine the overall status.
    roots = []
    ingested_count = 0
    skipped_count = 0
    for source_ref in project.ingest_roots:
        resolved = _resolve_ingest_root(source_ref)
        root_result = {
            "source": source_ref.source,
            "subpath": source_ref.subpath,
            "label": source_ref.label,
            "path": str(resolved),
        }
        # Unusable roots are recorded and skipped rather than failing the run.
        if not resolved.exists():
            roots.append({**root_result, "status": "missing"})
            skipped_count += 1
            continue
        if not resolved.is_dir():
            roots.append({**root_result, "status": "not_directory"})
            skipped_count += 1
            continue

        roots.append(
            {
                **root_result,
                "status": "ingested",
                "results": ingest_folder(resolved, purge_deleted=purge_deleted),
            }
        )
        ingested_count += 1

    if ingested_count == 0:
        overall_status = "nothing_to_ingest"
    elif skipped_count == 0:
        overall_status = "ingested"
    else:
        overall_status = "partial"

    return {
        "project": project.project_id,
        "aliases": list(project.aliases),
        "description": project.description,
        "purge_deleted": purge_deleted,
        "status": overall_status,
        "roots_ingested": ingested_count,
        "roots_skipped": skipped_count,
        "roots": roots,
    }
|
||||
|
||||
|
||||
def _normalize_aliases(aliases: list[str] | tuple[str, ...]) -> list[str]:
|
||||
deduped: list[str] = []
|
||||
seen: set[str] = set()
|
||||
for alias in aliases:
|
||||
candidate = alias.strip()
|
||||
if not candidate:
|
||||
continue
|
||||
key = candidate.lower()
|
||||
if key in seen:
|
||||
continue
|
||||
seen.add(key)
|
||||
deduped.append(candidate)
|
||||
return deduped
|
||||
|
||||
|
||||
def _normalize_ingest_roots(ingest_roots: list[dict] | tuple[dict, ...]) -> list[dict]:
|
||||
normalized: list[dict] = []
|
||||
for root in ingest_roots:
|
||||
source = str(root.get("source", "")).strip()
|
||||
subpath = str(root.get("subpath", "")).strip()
|
||||
label = str(root.get("label", "")).strip()
|
||||
if not source or not subpath:
|
||||
continue
|
||||
if source not in {"vault", "drive"}:
|
||||
raise ValueError(f"Unsupported source root: {source}")
|
||||
normalized.append({"source": source, "subpath": subpath, "label": label})
|
||||
return normalized
|
||||
|
||||
|
||||
def _project_to_dict(project: RegisteredProject) -> dict:
    """Serialize a registry entry, annotating each root with on-disk readiness.

    Args:
        project: The registry entry to serialize.

    Returns:
        A JSON-serializable dict; each ingest root gains "path", "exists"
        and "is_dir" keys describing its resolved filesystem state.
    """
    ingest_roots = []
    for source_ref in project.ingest_roots:
        # Fix: resolve each root once instead of calling _resolve_ingest_root
        # three separate times (for path, exists, and is_dir).
        resolved = _resolve_ingest_root(source_ref)
        ingest_roots.append(
            {
                **asdict(source_ref),
                "path": str(resolved),
                "exists": resolved.exists(),
                "is_dir": resolved.is_dir(),
            }
        )
    return {
        "id": project.project_id,
        "aliases": list(project.aliases),
        "description": project.description,
        "ingest_roots": ingest_roots,
    }
|
||||
|
||||
|
||||
def _resolve_ingest_root(source_ref: ProjectSourceRef) -> Path:
    """Map a (source, subpath) reference onto an absolute filesystem path."""
    if source_ref.source == "vault":
        base_dir = _config.settings.resolved_vault_source_dir
    elif source_ref.source == "drive":
        base_dir = _config.settings.resolved_drive_source_dir
    else:
        raise ValueError(f"Unsupported source root: {source_ref.source}")
    # strict=False: the path is resolved even if it does not exist yet.
    return (base_dir / source_ref.subpath).resolve(strict=False)
|
||||
|
||||
|
||||
def _validate_unique_project_names(projects: list[RegisteredProject]) -> None:
|
||||
seen: dict[str, str] = {}
|
||||
for project in projects:
|
||||
names = [project.project_id, *project.aliases]
|
||||
for name in names:
|
||||
key = name.lower()
|
||||
if key in seen and seen[key] != project.project_id:
|
||||
raise ValueError(
|
||||
f"Project registry name collision: '{name}' is used by both "
|
||||
f"'{seen[key]}' and '{project.project_id}'"
|
||||
)
|
||||
seen[key] = project.project_id
|
||||
|
||||
|
||||
def _find_name_collisions(
    project_id: str,
    aliases: list[str],
    exclude_project_id: str | None = None,
) -> list[dict]:
    """Report requested names already claimed by other registered projects."""
    # Filter the exclusion up front so the inner loop stays simple.
    existing = [
        project
        for project in load_project_registry()
        if exclude_project_id is None or project.project_id != exclude_project_id
    ]

    collisions: list[dict] = []
    for requested in (project_id, *aliases):
        requested_key = requested.lower()
        for project in existing:
            taken = {name.lower() for name in (project.project_id, *project.aliases)}
            if requested_key in taken:
                # Report only the first owning project per requested name.
                collisions.append(
                    {"name": requested, "existing_project": project.project_id}
                )
                break
    return collisions
|
||||
|
||||
|
||||
def _load_registry_payload(registry_path: Path) -> dict:
|
||||
if not registry_path.exists():
|
||||
return {"projects": []}
|
||||
return json.loads(registry_path.read_text(encoding="utf-8"))
|
||||
|
||||
|
||||
def _write_registry_payload(registry_path: Path, payload: dict) -> None:
|
||||
registry_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
rendered = json.dumps(payload, indent=2, ensure_ascii=True) + "\n"
|
||||
with tempfile.NamedTemporaryFile(
|
||||
mode="w",
|
||||
encoding="utf-8",
|
||||
dir=registry_path.parent,
|
||||
prefix=f"{registry_path.stem}.",
|
||||
suffix=".tmp",
|
||||
delete=False,
|
||||
) as tmp_file:
|
||||
tmp_file.write(rendered)
|
||||
temp_path = Path(tmp_file.name)
|
||||
temp_path.replace(registry_path)
|
||||
@@ -1,16 +1,66 @@
|
||||
"""Retrieval: query → ranked chunks."""
|
||||
"""Retrieval: query to ranked chunks."""
|
||||
|
||||
import re
|
||||
import time
|
||||
from dataclasses import dataclass
|
||||
|
||||
import atocore.config as _config
|
||||
from atocore.models.database import get_connection
|
||||
from atocore.observability.logger import get_logger
|
||||
from atocore.projects.registry import get_registered_project
|
||||
from atocore.retrieval.embeddings import embed_query
|
||||
from atocore.retrieval.vector_store import get_vector_store
|
||||
|
||||
log = get_logger("retriever")
|
||||
|
||||
_STOP_TOKENS = {
|
||||
"about",
|
||||
"and",
|
||||
"current",
|
||||
"for",
|
||||
"from",
|
||||
"into",
|
||||
"like",
|
||||
"project",
|
||||
"shared",
|
||||
"system",
|
||||
"that",
|
||||
"the",
|
||||
"this",
|
||||
"what",
|
||||
"with",
|
||||
}
|
||||
|
||||
_HIGH_SIGNAL_HINTS = (
|
||||
"status",
|
||||
"decision",
|
||||
"requirements",
|
||||
"requirement",
|
||||
"roadmap",
|
||||
"charter",
|
||||
"system-map",
|
||||
"system_map",
|
||||
"contracts",
|
||||
"schema",
|
||||
"architecture",
|
||||
"workflow",
|
||||
"error-budget",
|
||||
"comparison-matrix",
|
||||
"selection-decision",
|
||||
)
|
||||
|
||||
_LOW_SIGNAL_HINTS = (
|
||||
"/_archive/",
|
||||
"\\_archive\\",
|
||||
"/archive/",
|
||||
"\\archive\\",
|
||||
"_history",
|
||||
"history",
|
||||
"pre-cleanup",
|
||||
"pre-migration",
|
||||
"reviews/",
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class ChunkResult:
|
||||
@@ -28,6 +78,7 @@ def retrieve(
|
||||
query: str,
|
||||
top_k: int | None = None,
|
||||
filter_tags: list[str] | None = None,
|
||||
project_hint: str | None = None,
|
||||
) -> list[ChunkResult]:
|
||||
"""Retrieve the most relevant chunks for a query."""
|
||||
top_k = top_k or _config.settings.context_top_k
|
||||
@@ -36,10 +87,6 @@ def retrieve(
|
||||
query_embedding = embed_query(query)
|
||||
store = get_vector_store()
|
||||
|
||||
# Build filter
|
||||
# Tags are stored as JSON strings like '["tag1", "tag2"]'.
|
||||
# We use $contains with quoted tag to avoid substring false positives
|
||||
# (e.g. searching "prod" won't match "production" because we search '"prod"').
|
||||
where = None
|
||||
if filter_tags:
|
||||
if len(filter_tags) == 1:
|
||||
@@ -64,13 +111,17 @@ def retrieve(
|
||||
for i, chunk_id in enumerate(results["ids"][0]):
|
||||
if chunk_id not in existing_ids:
|
||||
continue
|
||||
# ChromaDB returns distances (lower = more similar for cosine)
|
||||
# Convert to similarity score (1 - distance)
|
||||
|
||||
distance = results["distances"][0][i] if results["distances"] else 0
|
||||
score = 1.0 - distance
|
||||
meta = results["metadatas"][0][i] if results["metadatas"] else {}
|
||||
content = results["documents"][0][i] if results["documents"] else ""
|
||||
|
||||
score *= _query_match_boost(query, meta)
|
||||
score *= _path_signal_boost(meta)
|
||||
if project_hint:
|
||||
score *= _project_match_boost(project_hint, meta)
|
||||
|
||||
chunks.append(
|
||||
ChunkResult(
|
||||
chunk_id=chunk_id,
|
||||
@@ -85,6 +136,8 @@ def retrieve(
|
||||
)
|
||||
|
||||
duration_ms = int((time.time() - start) * 1000)
|
||||
chunks.sort(key=lambda chunk: chunk.score, reverse=True)
|
||||
|
||||
log.info(
|
||||
"retrieval_done",
|
||||
query=query[:100],
|
||||
@@ -96,6 +149,79 @@ def retrieve(
|
||||
return chunks
|
||||
|
||||
|
||||
def _project_match_boost(project_hint: str, metadata: dict) -> float:
    """Return a project-aware relevance multiplier for raw retrieval."""
    hint_lower = project_hint.strip().lower()
    if not hint_lower:
        return 1.0

    searchable = " ".join(
        str(metadata.get(field, "")).lower()
        for field in ("source_file", "title", "tags")
    )

    # Collect every name the project is known by: the raw hint, the canonical
    # id, its aliases, and the last path segment of each ingest root.
    candidate_names = {hint_lower}
    project = get_registered_project(project_hint)
    if project is not None:
        candidate_names.add(project.project_id.lower())
        for alias in project.aliases:
            candidate_names.add(alias.lower())
        for source_ref in project.ingest_roots:
            if source_ref.subpath.strip("/\\"):
                normalized = source_ref.subpath.replace("\\", "/").strip("/")
                candidate_names.add(normalized.split("/")[-1].lower())

    if any(name and name in searchable for name in candidate_names):
        return _config.settings.rank_project_match_boost
    return 1.0
|
||||
|
||||
|
||||
def _query_match_boost(query: str, metadata: dict) -> float:
    """Boost chunks whose path/title/headings echo the query's high-signal terms."""
    # Keep only alphanumeric-ish tokens of length >= 3 that are not stopwords.
    significant = {
        token
        for token in re.findall(r"[a-z0-9][a-z0-9_-]{2,}", query.lower())
        if token not in _STOP_TOKENS
    }
    if not significant:
        return 1.0

    haystack = " ".join(
        str(metadata.get(field, "")).lower()
        for field in ("source_file", "title", "heading_path")
    )
    matches = sum(1 for token in significant if token in haystack)
    if matches <= 0:
        return 1.0

    # Linear boost per distinct matched token, capped to keep ranking stable.
    boost = 1.0 + matches * _config.settings.rank_query_token_step
    return min(boost, _config.settings.rank_query_token_cap)
|
||||
|
||||
|
||||
def _path_signal_boost(metadata: dict) -> float:
    """Prefer current high-signal docs and gently down-rank archival noise."""
    haystack = " ".join(
        str(metadata.get(field, "")).lower()
        for field in ("source_file", "title", "heading_path")
    )

    multiplier = 1.0
    # Both adjustments are applied at most once each, even on multiple hits.
    if any(hint in haystack for hint in _LOW_SIGNAL_HINTS):
        multiplier *= _config.settings.rank_path_low_signal_penalty
    if any(hint in haystack for hint in _HIGH_SIGNAL_HINTS):
        multiplier *= _config.settings.rank_path_high_signal_boost
    return multiplier
|
||||
|
||||
|
||||
def _existing_chunk_ids(chunk_ids: list[str]) -> set[str]:
|
||||
"""Filter out stale vector entries whose chunk rows no longer exist."""
|
||||
if not chunk_ids:
|
||||
|
||||
577
tests/test_api_storage.py
Normal file
577
tests/test_api_storage.py
Normal file
@@ -0,0 +1,577 @@
|
||||
"""Tests for storage-related API readiness endpoints."""
|
||||
|
||||
from contextlib import contextmanager
|
||||
|
||||
from fastapi.testclient import TestClient
|
||||
|
||||
import atocore.config as config
|
||||
from atocore.main import app
|
||||
|
||||
|
||||
def test_sources_endpoint_reports_configured_sources(tmp_data_dir, monkeypatch):
    """GET /sources lists both configured roots as enabled and read-only."""
    vault_dir = tmp_data_dir / "vault-source"
    drive_dir = tmp_data_dir / "drive-source"
    vault_dir.mkdir()
    drive_dir.mkdir()

    # Rebuild settings after patching the env so the app reads the tmp dirs.
    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    config.settings = config.Settings()

    client = TestClient(app)
    response = client.get("/sources")

    assert response.status_code == 200
    body = response.json()
    assert body["vault_enabled"] is True
    assert body["drive_enabled"] is True
    assert len(body["sources"]) == 2
    assert all(source["read_only"] for source in body["sources"])
|
||||
|
||||
|
||||
def test_health_endpoint_exposes_machine_paths_and_source_readiness(tmp_data_dir, monkeypatch):
    """/health reports ok status, source readiness, and machine path keys."""
    vault_src = tmp_data_dir / "vault-source"
    drive_src = tmp_data_dir / "drive-source"
    for directory in (vault_src, drive_src):
        directory.mkdir()

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_src))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_src))
    config.settings = config.Settings()

    resp = TestClient(app).get("/health")

    assert resp.status_code == 200
    payload = resp.json()
    assert payload["status"] == "ok"
    assert payload["sources_ready"] is True
    assert "db_path" in payload["machine_paths"]
    assert "run_dir" in payload["machine_paths"]
|
||||
|
||||
|
||||
def test_projects_endpoint_reports_registered_projects(tmp_data_dir, monkeypatch):
    """/projects surfaces registry entries and resolves ingest-root existence."""
    vault_src = tmp_data_dir / "vault-source"
    drive_src = tmp_data_dir / "drive-source"
    cfg_dir = tmp_data_dir / "config"
    # Creating the project dir with parents also creates the vault root.
    (vault_src / "incoming" / "projects" / "p04-gigabit").mkdir(parents=True)
    drive_src.mkdir()
    cfg_dir.mkdir()

    registry_path = cfg_dir / "project-registry.json"
    registry_text = """
{
  "projects": [
    {
      "id": "p04-gigabit",
      "aliases": ["p04"],
      "description": "P04 docs",
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
      ]
    }
  ]
}
""".strip()
    registry_path.write_text(registry_text, encoding="utf-8")

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_src))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_src))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    resp = TestClient(app).get("/projects")

    assert resp.status_code == 200
    payload = resp.json()
    first = payload["projects"][0]
    assert first["id"] == "p04-gigabit"
    assert first["ingest_roots"][0]["exists"] is True
|
||||
|
||||
|
||||
def test_project_refresh_endpoint_uses_registered_roots(tmp_data_dir, monkeypatch):
    """POST /projects/{alias}/refresh resolves the alias and delegates to the service."""
    vault_src = tmp_data_dir / "vault-source"
    drive_src = tmp_data_dir / "drive-source"
    cfg_dir = tmp_data_dir / "config"
    staged_root = vault_src / "incoming" / "projects" / "p05-interferometer"
    staged_root.mkdir(parents=True)
    drive_src.mkdir()
    cfg_dir.mkdir()

    registry_path = cfg_dir / "project-registry.json"
    registry_text = """
{
  "projects": [
    {
      "id": "p05-interferometer",
      "aliases": ["p05"],
      "description": "P05 docs",
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
      ]
    }
  ]
}
""".strip()
    registry_path.write_text(registry_text, encoding="utf-8")

    calls = []

    def fake_refresh_registered_project(project_name, purge_deleted=False):
        # Record exactly what the route passed through.
        calls.append((project_name, purge_deleted))
        return {
            "project": "p05-interferometer",
            "aliases": ["p05"],
            "description": "P05 docs",
            "purge_deleted": purge_deleted,
            "status": "ingested",
            "roots_ingested": 1,
            "roots_skipped": 0,
            "roots": [
                {
                    "source": "vault",
                    "subpath": "incoming/projects/p05-interferometer",
                    "path": str(staged_root),
                    "status": "ingested",
                    "results": [],
                }
            ],
        }

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_src))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_src))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()
    monkeypatch.setattr("atocore.api.routes.refresh_registered_project", fake_refresh_registered_project)

    resp = TestClient(app).post("/projects/p05/refresh")

    assert resp.status_code == 200
    assert calls == [("p05", False)]
    assert resp.json()["project"] == "p05-interferometer"
|
||||
|
||||
|
||||
def test_project_refresh_endpoint_serializes_ingestion(tmp_data_dir, monkeypatch):
    """The refresh route runs the service call inside the exclusive ingestion lock."""
    config.settings = config.Settings()
    events = []

    @contextmanager
    def fake_lock():
        # Bracket the service call so ordering is observable.
        events.append("enter")
        try:
            yield
        finally:
            events.append("exit")

    def fake_refresh_registered_project(project_name, purge_deleted=False):
        events.append(("refresh", project_name, purge_deleted))
        return {
            "project": "p05-interferometer",
            "aliases": ["p05"],
            "description": "P05 docs",
            "purge_deleted": purge_deleted,
            "status": "nothing_to_ingest",
            "roots_ingested": 0,
            "roots_skipped": 0,
            "roots": [],
        }

    monkeypatch.setattr("atocore.api.routes.exclusive_ingestion", fake_lock)
    monkeypatch.setattr("atocore.api.routes.refresh_registered_project", fake_refresh_registered_project)

    resp = TestClient(app).post("/projects/p05/refresh")

    assert resp.status_code == 200
    assert events == ["enter", ("refresh", "p05", False), "exit"]
|
||||
|
||||
|
||||
def test_projects_template_endpoint_returns_template(tmp_data_dir, monkeypatch):
    """/projects/template exposes allowed sources and a sample registry entry."""
    config.settings = config.Settings()

    resp = TestClient(app).get("/projects/template")

    assert resp.status_code == 200
    payload = resp.json()
    assert payload["allowed_sources"] == ["vault", "drive"]
    assert payload["template"]["projects"][0]["id"] == "p07-example"
|
||||
|
||||
|
||||
def test_project_proposal_endpoint_returns_normalized_preview(tmp_data_dir, monkeypatch):
    """/projects/proposal de-duplicates aliases and resolves staged root paths."""
    vault_src = tmp_data_dir / "vault-source"
    drive_src = tmp_data_dir / "drive-source"
    cfg_dir = tmp_data_dir / "config"
    (vault_src / "incoming" / "projects" / "p07-example").mkdir(parents=True)
    drive_src.mkdir()
    cfg_dir.mkdir()

    registry_path = cfg_dir / "project-registry.json"
    registry_path.write_text('{"projects": []}', encoding="utf-8")

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_src))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_src))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    proposal = {
        "project_id": "p07-example",
        # "p07" appears twice on purpose: the endpoint must drop the duplicate.
        "aliases": ["p07", "example-project", "p07"],
        "description": "Example project",
        "ingest_roots": [
            {
                "source": "vault",
                "subpath": "incoming/projects/p07-example",
                "label": "Primary docs",
            }
        ],
    }
    resp = TestClient(app).post("/projects/proposal", json=proposal)

    assert resp.status_code == 200
    payload = resp.json()
    assert payload["project"]["aliases"] == ["p07", "example-project"]
    assert payload["resolved_ingest_roots"][0]["exists"] is True
    assert payload["valid"] is True
|
||||
|
||||
|
||||
def test_project_register_endpoint_persists_entry(tmp_data_dir, monkeypatch):
    """/projects/register writes the new project into the registry file."""
    vault_src = tmp_data_dir / "vault-source"
    drive_src = tmp_data_dir / "drive-source"
    cfg_dir = tmp_data_dir / "config"
    (vault_src / "incoming" / "projects" / "p07-example").mkdir(parents=True)
    drive_src.mkdir()
    cfg_dir.mkdir()

    registry_path = cfg_dir / "project-registry.json"
    registry_path.write_text('{"projects": []}', encoding="utf-8")

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_src))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_src))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    registration = {
        "project_id": "p07-example",
        "aliases": ["p07", "example-project"],
        "description": "Example project",
        "ingest_roots": [
            {
                "source": "vault",
                "subpath": "incoming/projects/p07-example",
                "label": "Primary docs",
            }
        ],
    }
    resp = TestClient(app).post("/projects/register", json=registration)

    assert resp.status_code == 200
    payload = resp.json()
    assert payload["status"] == "registered"
    assert payload["project"]["id"] == "p07-example"
    # The entry must actually land in the registry file on disk.
    assert '"p07-example"' in registry_path.read_text(encoding="utf-8")
|
||||
|
||||
|
||||
def test_project_register_endpoint_rejects_collisions(tmp_data_dir, monkeypatch):
    """Registering an alias already owned by another project is a 400."""
    vault_src = tmp_data_dir / "vault-source"
    drive_src = tmp_data_dir / "drive-source"
    cfg_dir = tmp_data_dir / "config"
    for directory in (vault_src, drive_src, cfg_dir):
        directory.mkdir()

    registry_path = cfg_dir / "project-registry.json"
    registry_text = """
{
  "projects": [
    {
      "id": "p05-interferometer",
      "aliases": ["p05", "interferometer"],
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
      ]
    }
  ]
}
""".strip()
    registry_path.write_text(registry_text, encoding="utf-8")

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_src))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_src))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    # "interferometer" already belongs to p05-interferometer above.
    resp = TestClient(app).post(
        "/projects/register",
        json={
            "project_id": "p07-example",
            "aliases": ["interferometer"],
            "ingest_roots": [
                {
                    "source": "vault",
                    "subpath": "incoming/projects/p07-example",
                }
            ],
        },
    )

    assert resp.status_code == 400
    assert "collisions" in resp.json()["detail"]
|
||||
|
||||
|
||||
def test_project_update_endpoint_persists_changes(tmp_data_dir, monkeypatch):
    """PUT /projects/{alias} updates aliases and description of an entry."""
    vault_src = tmp_data_dir / "vault-source"
    drive_src = tmp_data_dir / "drive-source"
    cfg_dir = tmp_data_dir / "config"
    (vault_src / "incoming" / "projects" / "p04-gigabit").mkdir(parents=True)
    drive_src.mkdir()
    cfg_dir.mkdir()

    registry_path = cfg_dir / "project-registry.json"
    registry_text = """
{
  "projects": [
    {
      "id": "p04-gigabit",
      "aliases": ["p04", "gigabit"],
      "description": "Old description",
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
      ]
    }
  ]
}
""".strip()
    registry_path.write_text(registry_text, encoding="utf-8")

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_src))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_src))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    resp = TestClient(app).put(
        "/projects/p04",
        json={
            "aliases": ["p04", "gigabit", "gigabit-project"],
            "description": "Updated P04 docs",
        },
    )

    assert resp.status_code == 200
    payload = resp.json()
    assert payload["status"] == "updated"
    assert payload["project"]["aliases"] == ["p04", "gigabit", "gigabit-project"]
    assert payload["project"]["description"] == "Updated P04 docs"
|
||||
|
||||
|
||||
def test_project_update_endpoint_rejects_collisions(tmp_data_dir, monkeypatch):
    """Updating a project with an alias taken by a sibling project is a 400."""
    vault_src = tmp_data_dir / "vault-source"
    drive_src = tmp_data_dir / "drive-source"
    cfg_dir = tmp_data_dir / "config"
    for directory in (vault_src, drive_src, cfg_dir):
        directory.mkdir()

    registry_path = cfg_dir / "project-registry.json"
    registry_text = """
{
  "projects": [
    {
      "id": "p04-gigabit",
      "aliases": ["p04", "gigabit"],
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
      ]
    },
    {
      "id": "p05-interferometer",
      "aliases": ["p05", "interferometer"],
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
      ]
    }
  ]
}
""".strip()
    registry_path.write_text(registry_text, encoding="utf-8")

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_src))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_src))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    # "interferometer" belongs to p05-interferometer, so p04 cannot claim it.
    resp = TestClient(app).put(
        "/projects/p04",
        json={
            "aliases": ["p04", "interferometer"],
        },
    )

    assert resp.status_code == 400
    assert "collisions" in resp.json()["detail"]
|
||||
|
||||
|
||||
def test_admin_backup_create_without_chroma(tmp_data_dir, monkeypatch):
    """POST /admin/backup defaults to skipping the chroma snapshot."""
    config.settings = config.Settings()
    captured = {}

    def fake_create_runtime_backup(timestamp=None, include_chroma=False):
        captured["include_chroma"] = include_chroma
        return {
            "created_at": "2026-04-06T23:00:00+00:00",
            "backup_root": "/tmp/fake",
            "db_snapshot_path": "/tmp/fake/db/atocore.db",
            "db_size_bytes": 0,
            "registry_snapshot_path": "",
            "chroma_snapshot_path": "",
            "chroma_snapshot_bytes": 0,
            "chroma_snapshot_files": 0,
            "chroma_snapshot_included": False,
            "vector_store_note": "skipped",
        }

    monkeypatch.setattr("atocore.api.routes.create_runtime_backup", fake_create_runtime_backup)

    resp = TestClient(app).post("/admin/backup", json={})

    assert resp.status_code == 200
    assert captured == {"include_chroma": False}
    assert resp.json()["chroma_snapshot_included"] is False
|
||||
|
||||
|
||||
def test_admin_backup_create_with_chroma_holds_lock(tmp_data_dir, monkeypatch):
    """Including chroma in a backup runs inside the exclusive ingestion lock."""
    config.settings = config.Settings()
    events = []

    @contextmanager
    def fake_lock():
        # Bracket the backup call so lock ordering is observable.
        events.append("enter")
        try:
            yield
        finally:
            events.append("exit")

    def fake_create_runtime_backup(timestamp=None, include_chroma=False):
        events.append(("backup", include_chroma))
        return {
            "created_at": "2026-04-06T23:30:00+00:00",
            "backup_root": "/tmp/fake",
            "db_snapshot_path": "/tmp/fake/db/atocore.db",
            "db_size_bytes": 0,
            "registry_snapshot_path": "",
            "chroma_snapshot_path": "/tmp/fake/chroma",
            "chroma_snapshot_bytes": 4,
            "chroma_snapshot_files": 1,
            "chroma_snapshot_included": True,
            "vector_store_note": "included",
        }

    monkeypatch.setattr("atocore.api.routes.exclusive_ingestion", fake_lock)
    monkeypatch.setattr("atocore.api.routes.create_runtime_backup", fake_create_runtime_backup)

    resp = TestClient(app).post("/admin/backup", json={"include_chroma": True})

    assert resp.status_code == 200
    assert events == ["enter", ("backup", True), "exit"]
    assert resp.json()["chroma_snapshot_included"] is True
|
||||
|
||||
|
||||
def test_admin_backup_list_and_validate_endpoints(tmp_data_dir, monkeypatch):
    """/admin/backup listing and per-stamp validation routes round-trip fakes."""
    config.settings = config.Settings()

    def fake_list_runtime_backups():
        return [
            {
                "stamp": "20260406T220000Z",
                "path": "/tmp/fake/snapshots/20260406T220000Z",
                "has_metadata": True,
                "metadata": {"db_snapshot_path": "/tmp/fake/snapshots/20260406T220000Z/db/atocore.db"},
            }
        ]

    def fake_validate_backup(stamp):
        # The sentinel stamp simulates a snapshot directory that was removed.
        if stamp == "missing":
            return {
                "stamp": stamp,
                "path": f"/tmp/fake/snapshots/{stamp}",
                "exists": False,
                "errors": ["snapshot_directory_missing"],
            }
        return {
            "stamp": stamp,
            "path": f"/tmp/fake/snapshots/{stamp}",
            "exists": True,
            "db_ok": True,
            "registry_ok": True,
            "chroma_ok": None,
            "valid": True,
            "errors": [],
        }

    monkeypatch.setattr("atocore.api.routes.list_runtime_backups", fake_list_runtime_backups)
    monkeypatch.setattr("atocore.api.routes.validate_backup", fake_validate_backup)

    api = TestClient(app)

    listing = api.get("/admin/backup")
    assert listing.status_code == 200
    listing_payload = listing.json()
    assert "backup_dir" in listing_payload
    assert listing_payload["backups"][0]["stamp"] == "20260406T220000Z"

    valid = api.get("/admin/backup/20260406T220000Z/validate")
    assert valid.status_code == 200
    assert valid.json()["valid"] is True

    missing = api.get("/admin/backup/missing/validate")
    assert missing.status_code == 404
|
||||
|
||||
|
||||
def test_query_endpoint_accepts_project_hint(monkeypatch):
    """/query forwards prompt, top_k, and the project hint to retrieval."""

    def fake_retrieve(prompt, top_k=10, filter_tags=None, project_hint=None):
        # Verify the route passed each request field through unchanged.
        assert prompt == "architecture"
        assert top_k == 3
        assert project_hint == "p04-gigabit"
        return []

    monkeypatch.setattr("atocore.api.routes.retrieve", fake_retrieve)

    resp = TestClient(app).post(
        "/query",
        json={
            "prompt": "architecture",
            "top_k": 3,
            "project": "p04-gigabit",
        },
    )

    assert resp.status_code == 200
    assert resp.json()["results"] == []
|
||||
158
tests/test_backup.py
Normal file
158
tests/test_backup.py
Normal file
@@ -0,0 +1,158 @@
|
||||
"""Tests for runtime backup creation."""
|
||||
|
||||
import json
|
||||
import sqlite3
|
||||
from datetime import UTC, datetime
|
||||
|
||||
import atocore.config as config
|
||||
from atocore.models.database import init_db
|
||||
from atocore.ops.backup import (
|
||||
create_runtime_backup,
|
||||
list_runtime_backups,
|
||||
validate_backup,
|
||||
)
|
||||
|
||||
|
||||
def test_create_runtime_backup_copies_db_and_registry(tmp_path, monkeypatch):
    """A backup snapshot contains the live DB rows, registry copy, and metadata."""
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    registry_path = tmp_path / "config" / "project-registry.json"
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))

    registry_path.parent.mkdir(parents=True)
    registry_path.write_text('{"projects":[{"id":"p01-example","aliases":[],"ingest_roots":[{"source":"vault","subpath":"incoming/projects/p01-example"}]}]}\n', encoding="utf-8")

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        # Seed one row so we can prove the snapshot copies live data.
        with sqlite3.connect(str(config.settings.db_path)) as conn:
            conn.execute("INSERT INTO projects (id, name) VALUES (?, ?)", ("p01", "P01 Example"))
            conn.commit()

        result = create_runtime_backup(datetime(2026, 4, 6, 18, 0, 0, tzinfo=UTC))
    finally:
        config.settings = original_settings

    snapshot_root = tmp_path / "backups" / "snapshots" / "20260406T180000Z"
    db_snapshot = snapshot_root / "db" / "atocore.db"
    registry_snapshot = snapshot_root / "config" / "project-registry.json"
    metadata_path = snapshot_root / "backup-metadata.json"

    assert result["db_snapshot_path"] == str(db_snapshot)
    assert db_snapshot.exists()
    assert registry_snapshot.exists()
    assert metadata_path.exists()

    with sqlite3.connect(str(db_snapshot)) as conn:
        row = conn.execute("SELECT name FROM projects WHERE id = ?", ("p01",)).fetchone()
    assert row[0] == "P01 Example"

    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    assert metadata["registry_snapshot_path"] == str(registry_snapshot)
|
||||
|
||||
|
||||
def test_create_runtime_backup_includes_chroma_when_requested(tmp_path, monkeypatch):
    """include_chroma=True copies the whole chroma tree into the snapshot."""
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()

        # Build a minimal fake chroma tree: one nested binary file plus metadata.
        chroma_dir = config.settings.chroma_path
        (chroma_dir / "collection-a").mkdir(parents=True, exist_ok=True)
        (chroma_dir / "collection-a" / "data.bin").write_bytes(b"\x00\x01\x02\x03")
        (chroma_dir / "metadata.json").write_text('{"ok":true}', encoding="utf-8")

        result = create_runtime_backup(
            datetime(2026, 4, 6, 20, 0, 0, tzinfo=UTC),
            include_chroma=True,
        )
    finally:
        config.settings = original_settings

    chroma_snapshot_root = tmp_path / "backups" / "snapshots" / "20260406T200000Z" / "chroma"
    assert result["chroma_snapshot_included"] is True
    assert result["chroma_snapshot_path"] == str(chroma_snapshot_root)
    assert result["chroma_snapshot_files"] >= 2
    assert result["chroma_snapshot_bytes"] > 0
    assert (chroma_snapshot_root / "collection-a" / "data.bin").exists()
    assert (chroma_snapshot_root / "metadata.json").exists()
|
||||
|
||||
|
||||
def test_list_and_validate_runtime_backups(tmp_path, monkeypatch):
    """Listing finds every snapshot; validation distinguishes real from missing."""
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        first = create_runtime_backup(datetime(2026, 4, 6, 21, 0, 0, tzinfo=UTC))
        second = create_runtime_backup(datetime(2026, 4, 6, 22, 0, 0, tzinfo=UTC))

        listing = list_runtime_backups()
        first_validation = validate_backup("20260406T210000Z")
        second_validation = validate_backup("20260406T220000Z")
        missing_validation = validate_backup("20260101T000000Z")
    finally:
        config.settings = original_settings

    assert len(listing) == 2
    assert {entry["stamp"] for entry in listing} == {
        "20260406T210000Z",
        "20260406T220000Z",
    }
    for entry in listing:
        assert entry["has_metadata"] is True
        assert entry["metadata"]["db_snapshot_path"]

    assert first_validation["valid"] is True
    assert first_validation["db_ok"] is True
    assert first_validation["errors"] == []

    assert second_validation["valid"] is True

    assert missing_validation["exists"] is False
    assert "snapshot_directory_missing" in missing_validation["errors"]

    # The on-disk metadata agrees with what create_runtime_backup returned.
    first_metadata = json.loads(
        (tmp_path / "backups" / "snapshots" / "20260406T210000Z" / "backup-metadata.json")
        .read_text(encoding="utf-8")
    )
    assert first_metadata["db_snapshot_path"] == first["db_snapshot_path"]
    assert second["db_snapshot_path"].endswith("atocore.db")
|
||||
|
||||
|
||||
def test_create_runtime_backup_handles_missing_registry(tmp_path, monkeypatch):
    """A backup without a registry file records an empty registry snapshot path."""
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        result = create_runtime_backup(datetime(2026, 4, 6, 19, 0, 0, tzinfo=UTC))
    finally:
        config.settings = original_settings

    assert result["registry_snapshot_path"] == ""
|
||||
89
tests/test_config.py
Normal file
89
tests/test_config.py
Normal file
@@ -0,0 +1,89 @@
|
||||
"""Tests for configuration and canonical path boundaries."""
|
||||
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
import atocore.config as config
|
||||
|
||||
|
||||
def test_settings_resolve_canonical_directories(tmp_path, monkeypatch):
    """Every canonical path setting resolves under its configured env directory."""
    env = {
        "ATOCORE_DATA_DIR": str(tmp_path / "data"),
        "ATOCORE_VAULT_SOURCE_DIR": str(tmp_path / "vault-source"),
        "ATOCORE_DRIVE_SOURCE_DIR": str(tmp_path / "drive-source"),
        "ATOCORE_LOG_DIR": str(tmp_path / "logs"),
        "ATOCORE_BACKUP_DIR": str(tmp_path / "backups"),
        "ATOCORE_PROJECT_REGISTRY_PATH": str(tmp_path / "config" / "project-registry.json"),
    }
    for key, value in env.items():
        monkeypatch.setenv(key, value)

    settings = config.Settings()

    assert settings.db_path == (tmp_path / "data" / "db" / "atocore.db").resolve()
    assert settings.chroma_path == (tmp_path / "data" / "chroma").resolve()
    assert settings.cache_path == (tmp_path / "data" / "cache").resolve()
    assert settings.tmp_path == (tmp_path / "data" / "tmp").resolve()
    assert settings.resolved_vault_source_dir == (tmp_path / "vault-source").resolve()
    assert settings.resolved_drive_source_dir == (tmp_path / "drive-source").resolve()
    assert settings.resolved_log_dir == (tmp_path / "logs").resolve()
    assert settings.resolved_backup_dir == (tmp_path / "backups").resolve()
    assert settings.resolved_run_dir == (tmp_path / "run").resolve()
    assert settings.resolved_project_registry_path == (
        tmp_path / "config" / "project-registry.json"
    ).resolve()
|
||||
|
||||
|
||||
def test_settings_keep_legacy_db_path_when_present(tmp_path, monkeypatch):
    """An existing legacy data/atocore.db keeps being used instead of data/db/."""
    data_dir = tmp_path / "data"
    data_dir.mkdir()
    legacy_db = data_dir / "atocore.db"
    # An empty file is enough: only its existence drives the fallback.
    legacy_db.write_text("", encoding="utf-8")
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(data_dir))

    assert config.Settings().db_path == legacy_db.resolve()
|
||||
|
||||
|
||||
def test_ranking_weights_are_tunable_via_env(monkeypatch):
    """Each ranking weight can be overridden through its ATOCORE_RANK_* variable."""
    overrides = {
        "ATOCORE_RANK_PROJECT_MATCH_BOOST": "3.5",
        "ATOCORE_RANK_QUERY_TOKEN_STEP": "0.12",
        "ATOCORE_RANK_QUERY_TOKEN_CAP": "1.5",
        "ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST": "1.25",
        "ATOCORE_RANK_PATH_LOW_SIGNAL_PENALTY": "0.5",
    }
    for key, value in overrides.items():
        monkeypatch.setenv(key, value)

    settings = config.Settings()

    assert settings.rank_project_match_boost == 3.5
    assert settings.rank_query_token_step == 0.12
    assert settings.rank_query_token_cap == 1.5
    assert settings.rank_path_high_signal_boost == 1.25
    assert settings.rank_path_low_signal_penalty == 0.5
|
||||
|
||||
|
||||
def test_ensure_runtime_dirs_creates_machine_dirs_only(tmp_path, monkeypatch):
    """ensure_runtime_dirs creates machine-owned dirs but never source roots."""
    env = {
        "ATOCORE_DATA_DIR": str(tmp_path / "data"),
        "ATOCORE_VAULT_SOURCE_DIR": str(tmp_path / "vault-source"),
        "ATOCORE_DRIVE_SOURCE_DIR": str(tmp_path / "drive-source"),
        "ATOCORE_LOG_DIR": str(tmp_path / "logs"),
        "ATOCORE_BACKUP_DIR": str(tmp_path / "backups"),
        "ATOCORE_PROJECT_REGISTRY_PATH": str(tmp_path / "config" / "project-registry.json"),
    }
    for key, value in env.items():
        monkeypatch.setenv(key, value)

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        config.ensure_runtime_dirs()

        assert config.settings.db_path.parent.exists()
        assert config.settings.chroma_path.exists()
        assert config.settings.cache_path.exists()
        assert config.settings.tmp_path.exists()
        assert config.settings.resolved_log_dir.exists()
        assert config.settings.resolved_backup_dir.exists()
        assert config.settings.resolved_run_dir.exists()
        assert config.settings.resolved_project_registry_path.parent.exists()
        # Source roots are user-owned: they must never be auto-created.
        assert not config.settings.resolved_vault_source_dir.exists()
        assert not config.settings.resolved_drive_source_dir.exists()
    finally:
        config.settings = original_settings
|
||||
@@ -41,6 +41,23 @@ def test_context_with_project_hint(tmp_data_dir, sample_markdown):
|
||||
assert pack.total_chars > 0
|
||||
|
||||
|
||||
def test_context_builder_passes_project_hint_to_retrieval(monkeypatch):
    """build_context forwards the project hint into the retrieval layer."""
    init_db()
    init_project_state_schema()

    calls = []

    def fake_retrieve(query, top_k=None, filter_tags=None, project_hint=None):
        calls.append((query, project_hint))
        return []

    monkeypatch.setattr("atocore.context.builder.retrieve", fake_retrieve)

    build_context("architecture", project_hint="p05-interferometer", budget=300)

    assert calls == [("architecture", "p05-interferometer")]
|
||||
|
||||
|
||||
def test_last_context_pack_stored(tmp_data_dir, sample_markdown):
|
||||
"""Test that last context pack is stored for debug."""
|
||||
init_db()
|
||||
|
||||
49
tests/test_database.py
Normal file
49
tests/test_database.py
Normal file
@@ -0,0 +1,49 @@
|
||||
"""Tests for SQLite connection pragmas and runtime behavior."""
|
||||
|
||||
import sqlite3
|
||||
|
||||
import atocore.config as config
|
||||
from atocore.models.database import get_connection, init_db
|
||||
|
||||
|
||||
def test_get_connection_applies_busy_timeout_and_wal(tmp_path, monkeypatch):
    """Connections honour the configured busy timeout, WAL mode, and FK pragma."""
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_DB_BUSY_TIMEOUT_MS", "7000")

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        with get_connection() as conn:
            # Read back the pragmas the connection factory is supposed to set.
            busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
            journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
            foreign_keys = conn.execute("PRAGMA foreign_keys").fetchone()[0]
    finally:
        config.settings = original_settings

    assert busy_timeout == 7000
    assert str(journal_mode).lower() == "wal"
    assert foreign_keys == 1
|
||||
|
||||
|
||||
def test_get_connection_uses_configured_timeout_value(tmp_path, monkeypatch):
    """The ms env setting is converted to seconds on sqlite3.connect(timeout=...)."""
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_DB_BUSY_TIMEOUT_MS", "2500")

    original_settings = config.settings
    original_connect = sqlite3.connect
    calls = []

    def fake_connect(*args, **kwargs):
        # Capture the timeout kwarg, then delegate to the real connect.
        calls.append(kwargs.get("timeout"))
        return original_connect(*args, **kwargs)

    try:
        config.settings = config.Settings()
        monkeypatch.setattr("atocore.models.database.sqlite3.connect", fake_connect)
        init_db()
    finally:
        config.settings = original_settings

    assert calls
    assert calls[0] == 2.5
|
||||
374
tests/test_extractor.py
Normal file
374
tests/test_extractor.py
Normal file
@@ -0,0 +1,374 @@
|
||||
"""Tests for Phase 9 Commit C rule-based candidate extractor."""
|
||||
|
||||
from fastapi.testclient import TestClient
|
||||
|
||||
from atocore.interactions.service import record_interaction
|
||||
from atocore.main import app
|
||||
from atocore.memory.extractor import (
|
||||
MemoryCandidate,
|
||||
extract_candidates_from_interaction,
|
||||
)
|
||||
from atocore.memory.service import (
|
||||
create_memory,
|
||||
get_memories,
|
||||
promote_memory,
|
||||
reject_candidate_memory,
|
||||
)
|
||||
from atocore.models.database import init_db
|
||||
|
||||
|
||||
def _capture(**fields):
|
||||
return record_interaction(
|
||||
prompt=fields.get("prompt", "unused"),
|
||||
response=fields.get("response", ""),
|
||||
response_summary=fields.get("response_summary", ""),
|
||||
project=fields.get("project", ""),
|
||||
reinforce=False,
|
||||
)
|
||||
|
||||
|
||||
# --- extractor: heading patterns ------------------------------------------
|
||||
|
||||
|
||||
def test_extractor_finds_decision_heading(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(
|
||||
response=(
|
||||
"We talked about the frame.\n\n"
|
||||
"## Decision: switch the lateral supports to GF-PTFE pads\n\n"
|
||||
"Rationale: thermal stability."
|
||||
),
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
assert len(results) == 1
|
||||
assert results[0].memory_type == "adaptation"
|
||||
assert "GF-PTFE" in results[0].content
|
||||
assert results[0].rule == "decision_heading"
|
||||
|
||||
|
||||
def test_extractor_finds_constraint_and_requirement_headings(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(
|
||||
response=(
|
||||
"### Constraint: total mass must stay under 4.8 kg\n"
|
||||
"## Requirement: survives 12g shock in any axis\n"
|
||||
),
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
rules = {r.rule for r in results}
|
||||
assert "constraint_heading" in rules
|
||||
assert "requirement_heading" in rules
|
||||
constraint = next(r for r in results if r.rule == "constraint_heading")
|
||||
requirement = next(r for r in results if r.rule == "requirement_heading")
|
||||
assert constraint.memory_type == "project"
|
||||
assert requirement.memory_type == "project"
|
||||
assert "4.8 kg" in constraint.content
|
||||
assert "12g" in requirement.content
|
||||
|
||||
|
||||
def test_extractor_finds_fact_heading(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(
|
||||
response="## Fact: the polisher sim uses floating-point deltas in microns\n",
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
assert len(results) == 1
|
||||
assert results[0].memory_type == "knowledge"
|
||||
assert results[0].rule == "fact_heading"
|
||||
|
||||
|
||||
def test_extractor_heading_separator_variants(tmp_data_dir):
|
||||
"""Decision headings should match with `:`, `-`, or em-dash."""
|
||||
init_db()
|
||||
for sep in (":", "-", "\u2014"):
|
||||
interaction = _capture(
|
||||
response=f"## Decision {sep} adopt option B for the mount interface\n",
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
assert len(results) == 1, f"sep={sep!r}"
|
||||
assert "option B" in results[0].content
|
||||
|
||||
|
||||
# --- extractor: sentence patterns -----------------------------------------
|
||||
|
||||
|
||||
def test_extractor_finds_preference_sentence(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(
|
||||
response=(
|
||||
"I prefer rebase-based workflows because history stays linear "
|
||||
"and reviewers have an easier time."
|
||||
),
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
pref_matches = [r for r in results if r.rule == "preference_sentence"]
|
||||
assert len(pref_matches) == 1
|
||||
assert pref_matches[0].memory_type == "preference"
|
||||
assert "rebase" in pref_matches[0].content.lower()
|
||||
|
||||
|
||||
def test_extractor_finds_decided_to_sentence(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(
|
||||
response=(
|
||||
"After going through the options we decided to keep the legacy "
|
||||
"calibration routine for the July milestone."
|
||||
),
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
decision_matches = [r for r in results if r.rule == "decided_to_sentence"]
|
||||
assert len(decision_matches) == 1
|
||||
assert decision_matches[0].memory_type == "adaptation"
|
||||
assert "legacy calibration" in decision_matches[0].content.lower()
|
||||
|
||||
|
||||
def test_extractor_finds_requirement_sentence(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(
|
||||
response=(
|
||||
"One of the findings: the requirement is that the interferometer "
|
||||
"must resolve 50 picometer displacements at 1 kHz bandwidth."
|
||||
),
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
req_matches = [r for r in results if r.rule == "requirement_sentence"]
|
||||
assert len(req_matches) == 1
|
||||
assert req_matches[0].memory_type == "project"
|
||||
assert "picometer" in req_matches[0].content.lower()
|
||||
|
||||
|
||||
# --- extractor: content rules ---------------------------------------------
|
||||
|
||||
|
||||
def test_extractor_rejects_too_short_matches(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(response="## Decision: yes\n") # too short after clean
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
assert results == []
|
||||
|
||||
|
||||
def test_extractor_deduplicates_identical_matches(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(
|
||||
response=(
|
||||
"## Decision: use the modular frame variant for prototyping\n"
|
||||
"## Decision: use the modular frame variant for prototyping\n"
|
||||
),
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
assert len(results) == 1
|
||||
|
||||
|
||||
def test_extractor_strips_trailing_punctuation(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(
|
||||
response="## Decision: defer the laser redesign to Q3.\n",
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
assert len(results) == 1
|
||||
assert results[0].content.endswith("Q3")
|
||||
|
||||
|
||||
def test_extractor_includes_project_and_source_interaction_id(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(
|
||||
project="p05-interferometer",
|
||||
response="## Decision: freeze the optical path for the prototype run\n",
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
assert len(results) == 1
|
||||
assert results[0].project == "p05-interferometer"
|
||||
assert results[0].source_interaction_id == interaction.id
|
||||
|
||||
|
||||
def test_extractor_drops_candidates_matching_existing_active(tmp_data_dir):
|
||||
init_db()
|
||||
# Seed an active memory that the extractor would otherwise re-propose
|
||||
create_memory(
|
||||
memory_type="preference",
|
||||
content="prefers small reviewable diffs",
|
||||
)
|
||||
interaction = _capture(
|
||||
response="Remember that I prefer small reviewable diffs because they merge faster.",
|
||||
)
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
# The only candidate would have been the preference, now dropped
|
||||
assert not any(r.content.lower() == "small reviewable diffs" for r in results)
|
||||
|
||||
|
||||
def test_extractor_returns_empty_for_no_patterns(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = _capture(response="Nothing structural here, just prose.")
|
||||
results = extract_candidates_from_interaction(interaction)
|
||||
assert results == []
|
||||
|
||||
|
||||
# --- service: candidate lifecycle -----------------------------------------
|
||||
|
||||
|
||||
def test_candidate_and_active_can_coexist(tmp_data_dir):
|
||||
init_db()
|
||||
active = create_memory(
|
||||
memory_type="preference",
|
||||
content="logs every config change to the change log",
|
||||
status="active",
|
||||
)
|
||||
candidate = create_memory(
|
||||
memory_type="preference",
|
||||
content="logs every config change to the change log",
|
||||
status="candidate",
|
||||
)
|
||||
# The two are distinct rows because status is part of the dedup key
|
||||
assert active.id != candidate.id
|
||||
|
||||
|
||||
def test_promote_memory_moves_candidate_to_active(tmp_data_dir):
|
||||
init_db()
|
||||
candidate = create_memory(
|
||||
memory_type="adaptation",
|
||||
content="moved the staging scripts into deploy/staging",
|
||||
status="candidate",
|
||||
)
|
||||
ok = promote_memory(candidate.id)
|
||||
assert ok is True
|
||||
|
||||
active_list = get_memories(memory_type="adaptation", status="active")
|
||||
assert any(m.id == candidate.id for m in active_list)
|
||||
|
||||
|
||||
def test_promote_memory_on_non_candidate_returns_false(tmp_data_dir):
|
||||
init_db()
|
||||
active = create_memory(
|
||||
memory_type="adaptation",
|
||||
content="already active adaptation entry",
|
||||
status="active",
|
||||
)
|
||||
assert promote_memory(active.id) is False
|
||||
|
||||
|
||||
def test_reject_candidate_moves_it_to_invalid(tmp_data_dir):
|
||||
init_db()
|
||||
candidate = create_memory(
|
||||
memory_type="knowledge",
|
||||
content="the calibration uses barometric pressure compensation",
|
||||
status="candidate",
|
||||
)
|
||||
ok = reject_candidate_memory(candidate.id)
|
||||
assert ok is True
|
||||
|
||||
invalid_list = get_memories(memory_type="knowledge", status="invalid")
|
||||
assert any(m.id == candidate.id for m in invalid_list)
|
||||
|
||||
|
||||
def test_reject_on_non_candidate_returns_false(tmp_data_dir):
|
||||
init_db()
|
||||
active = create_memory(memory_type="preference", content="always uses structured logging")
|
||||
assert reject_candidate_memory(active.id) is False
|
||||
|
||||
|
||||
def test_get_memories_filters_by_candidate_status(tmp_data_dir):
|
||||
init_db()
|
||||
create_memory(memory_type="preference", content="active one", status="active")
|
||||
create_memory(memory_type="preference", content="candidate one", status="candidate")
|
||||
create_memory(memory_type="preference", content="another candidate", status="candidate")
|
||||
candidates = get_memories(status="candidate", memory_type="preference")
|
||||
assert len(candidates) == 2
|
||||
assert all(c.status == "candidate" for c in candidates)
|
||||
|
||||
|
||||
# --- API: extract / promote / reject / list -------------------------------
|
||||
|
||||
|
||||
def test_api_extract_interaction_without_persist(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = record_interaction(
|
||||
prompt="review",
|
||||
response="## Decision: flip the default budget to 4000 for p05\n",
|
||||
reinforce=False,
|
||||
)
|
||||
client = TestClient(app)
|
||||
response = client.post(f"/interactions/{interaction.id}/extract", json={})
|
||||
assert response.status_code == 200
|
||||
body = response.json()
|
||||
assert body["candidate_count"] == 1
|
||||
assert body["persisted"] is False
|
||||
assert body["persisted_ids"] == []
|
||||
# The candidate should NOT have been written to the memory table
|
||||
queue = get_memories(status="candidate")
|
||||
assert queue == []
|
||||
|
||||
|
||||
def test_api_extract_interaction_with_persist(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = record_interaction(
|
||||
prompt="review",
|
||||
response=(
|
||||
"## Decision: pin the embedding model to v2.3 for Wave 2\n"
|
||||
"## Constraint: context budget must stay under 4000 chars\n"
|
||||
),
|
||||
reinforce=False,
|
||||
)
|
||||
client = TestClient(app)
|
||||
response = client.post(
|
||||
f"/interactions/{interaction.id}/extract", json={"persist": True}
|
||||
)
|
||||
assert response.status_code == 200
|
||||
body = response.json()
|
||||
assert body["candidate_count"] == 2
|
||||
assert body["persisted"] is True
|
||||
assert len(body["persisted_ids"]) == 2
|
||||
|
||||
queue = get_memories(status="candidate", limit=50)
|
||||
assert len(queue) == 2
|
||||
|
||||
|
||||
def test_api_extract_returns_404_for_missing_interaction(tmp_data_dir):
|
||||
init_db()
|
||||
client = TestClient(app)
|
||||
response = client.post("/interactions/nope/extract", json={})
|
||||
assert response.status_code == 404
|
||||
|
||||
|
||||
def test_api_promote_and_reject_endpoints(tmp_data_dir):
|
||||
init_db()
|
||||
candidate = create_memory(
|
||||
memory_type="adaptation",
|
||||
content="restructured the ingestion pipeline into layered stages",
|
||||
status="candidate",
|
||||
)
|
||||
client = TestClient(app)
|
||||
|
||||
promote_response = client.post(f"/memory/{candidate.id}/promote")
|
||||
assert promote_response.status_code == 200
|
||||
assert promote_response.json()["status"] == "promoted"
|
||||
|
||||
# Promoting it again should 404 because it's no longer a candidate
|
||||
second_promote = client.post(f"/memory/{candidate.id}/promote")
|
||||
assert second_promote.status_code == 404
|
||||
|
||||
reject_response = client.post("/memory/does-not-exist/reject")
|
||||
assert reject_response.status_code == 404
|
||||
|
||||
|
||||
def test_api_get_memory_candidate_status_filter(tmp_data_dir):
|
||||
init_db()
|
||||
create_memory(memory_type="preference", content="prefers explicit types", status="active")
|
||||
create_memory(
|
||||
memory_type="preference",
|
||||
content="prefers pull requests sized by diff lines not files",
|
||||
status="candidate",
|
||||
)
|
||||
client = TestClient(app)
|
||||
response = client.get("/memory", params={"status": "candidate"})
|
||||
assert response.status_code == 200
|
||||
body = response.json()
|
||||
assert "candidate" in body["statuses"]
|
||||
assert len(body["memories"]) == 1
|
||||
assert body["memories"][0]["status"] == "candidate"
|
||||
|
||||
|
||||
def test_api_get_memory_invalid_status_returns_400(tmp_data_dir):
|
||||
init_db()
|
||||
client = TestClient(app)
|
||||
response = client.get("/memory", params={"status": "not-a-status"})
|
||||
assert response.status_code == 400
|
||||
211
tests/test_interactions.py
Normal file
211
tests/test_interactions.py
Normal file
@@ -0,0 +1,211 @@
|
||||
"""Tests for the Phase 9 Commit A interaction capture loop."""
|
||||
|
||||
import time
|
||||
|
||||
import pytest
|
||||
from fastapi.testclient import TestClient
|
||||
|
||||
from atocore.interactions.service import (
|
||||
get_interaction,
|
||||
list_interactions,
|
||||
record_interaction,
|
||||
)
|
||||
from atocore.main import app
|
||||
from atocore.models.database import init_db
|
||||
|
||||
|
||||
# --- Service-level tests --------------------------------------------------
|
||||
|
||||
|
||||
def test_record_interaction_persists_all_fields(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = record_interaction(
|
||||
prompt="What is the lateral support material for p05?",
|
||||
response="The current lateral support uses GF-PTFE pads per Decision D-024.",
|
||||
response_summary="lateral support: GF-PTFE per D-024",
|
||||
project="p05-interferometer",
|
||||
client="claude-code",
|
||||
session_id="sess-001",
|
||||
memories_used=["mem-aaa", "mem-bbb"],
|
||||
chunks_used=["chunk-111", "chunk-222", "chunk-333"],
|
||||
context_pack={"budget": 3000, "chunks": 3},
|
||||
)
|
||||
|
||||
assert interaction.id
|
||||
assert interaction.created_at
|
||||
|
||||
fetched = get_interaction(interaction.id)
|
||||
assert fetched is not None
|
||||
assert fetched.prompt.startswith("What is the lateral support")
|
||||
assert fetched.response.startswith("The current lateral support")
|
||||
assert fetched.response_summary == "lateral support: GF-PTFE per D-024"
|
||||
assert fetched.project == "p05-interferometer"
|
||||
assert fetched.client == "claude-code"
|
||||
assert fetched.session_id == "sess-001"
|
||||
assert fetched.memories_used == ["mem-aaa", "mem-bbb"]
|
||||
assert fetched.chunks_used == ["chunk-111", "chunk-222", "chunk-333"]
|
||||
assert fetched.context_pack == {"budget": 3000, "chunks": 3}
|
||||
|
||||
|
||||
def test_record_interaction_minimum_fields(tmp_data_dir):
|
||||
init_db()
|
||||
interaction = record_interaction(prompt="ping")
|
||||
assert interaction.id
|
||||
assert interaction.prompt == "ping"
|
||||
assert interaction.response == ""
|
||||
assert interaction.memories_used == []
|
||||
assert interaction.chunks_used == []
|
||||
|
||||
|
||||
def test_record_interaction_rejects_empty_prompt(tmp_data_dir):
|
||||
init_db()
|
||||
with pytest.raises(ValueError):
|
||||
record_interaction(prompt="")
|
||||
with pytest.raises(ValueError):
|
||||
record_interaction(prompt=" ")
|
||||
|
||||
|
||||
def test_get_interaction_returns_none_for_unknown_id(tmp_data_dir):
|
||||
init_db()
|
||||
assert get_interaction("does-not-exist") is None
|
||||
assert get_interaction("") is None
|
||||
|
||||
|
||||
def test_list_interactions_filters_by_project(tmp_data_dir):
|
||||
init_db()
|
||||
record_interaction(prompt="p04 question", project="p04-gigabit")
|
||||
record_interaction(prompt="p05 question", project="p05-interferometer")
|
||||
record_interaction(prompt="another p05", project="p05-interferometer")
|
||||
|
||||
p05 = list_interactions(project="p05-interferometer")
|
||||
p04 = list_interactions(project="p04-gigabit")
|
||||
|
||||
assert len(p05) == 2
|
||||
assert len(p04) == 1
|
||||
assert all(i.project == "p05-interferometer" for i in p05)
|
||||
assert p04[0].prompt == "p04 question"
|
||||
|
||||
|
||||
def test_list_interactions_filters_by_session_and_client(tmp_data_dir):
|
||||
init_db()
|
||||
record_interaction(prompt="a", session_id="sess-A", client="openclaw")
|
||||
record_interaction(prompt="b", session_id="sess-A", client="claude-code")
|
||||
record_interaction(prompt="c", session_id="sess-B", client="openclaw")
|
||||
|
||||
sess_a = list_interactions(session_id="sess-A")
|
||||
openclaw = list_interactions(client="openclaw")
|
||||
|
||||
assert len(sess_a) == 2
|
||||
assert len(openclaw) == 2
|
||||
assert {i.client for i in sess_a} == {"openclaw", "claude-code"}
|
||||
|
||||
|
||||
def test_list_interactions_orders_newest_first_and_respects_limit(tmp_data_dir):
|
||||
init_db()
|
||||
# created_at has 1-second resolution; sleep enough to keep ordering
|
||||
# deterministic regardless of insert speed.
|
||||
for index in range(5):
|
||||
record_interaction(prompt=f"prompt-{index}")
|
||||
time.sleep(1.05)
|
||||
|
||||
items = list_interactions(limit=3)
|
||||
assert len(items) == 3
|
||||
# Newest first: prompt-4, prompt-3, prompt-2
|
||||
assert items[0].prompt == "prompt-4"
|
||||
assert items[1].prompt == "prompt-3"
|
||||
assert items[2].prompt == "prompt-2"
|
||||
|
||||
|
||||
def test_list_interactions_respects_since_filter(tmp_data_dir):
|
||||
init_db()
|
||||
first = record_interaction(prompt="early")
|
||||
time.sleep(1.05)
|
||||
second = record_interaction(prompt="late")
|
||||
|
||||
after_first = list_interactions(since=first.created_at)
|
||||
ids_after_first = {item.id for item in after_first}
|
||||
assert second.id in ids_after_first
|
||||
assert first.id in ids_after_first # cutoff is inclusive
|
||||
|
||||
after_second = list_interactions(since=second.created_at)
|
||||
ids_after_second = {item.id for item in after_second}
|
||||
assert second.id in ids_after_second
|
||||
assert first.id not in ids_after_second
|
||||
|
||||
|
||||
def test_list_interactions_zero_limit_returns_empty(tmp_data_dir):
|
||||
init_db()
|
||||
record_interaction(prompt="ping")
|
||||
assert list_interactions(limit=0) == []
|
||||
|
||||
|
||||
# --- API-level tests ------------------------------------------------------
|
||||
|
||||
|
||||
def test_post_interactions_endpoint_records_interaction(tmp_data_dir):
|
||||
init_db()
|
||||
client = TestClient(app)
|
||||
response = client.post(
|
||||
"/interactions",
|
||||
json={
|
||||
"prompt": "What changed in p06 this week?",
|
||||
"response": "Polisher kinematic frame parameters updated to v0.3.",
|
||||
"response_summary": "p06 frame parameters bumped to v0.3",
|
||||
"project": "p06-polisher",
|
||||
"client": "claude-code",
|
||||
"session_id": "sess-xyz",
|
||||
"memories_used": ["mem-1"],
|
||||
"chunks_used": ["chunk-a", "chunk-b"],
|
||||
"context_pack": {"chunks": 2},
|
||||
},
|
||||
)
|
||||
assert response.status_code == 200
|
||||
body = response.json()
|
||||
assert body["status"] == "recorded"
|
||||
interaction_id = body["id"]
|
||||
|
||||
# Round-trip via the GET endpoint
|
||||
fetched = client.get(f"/interactions/{interaction_id}")
|
||||
assert fetched.status_code == 200
|
||||
fetched_body = fetched.json()
|
||||
assert fetched_body["prompt"].startswith("What changed in p06")
|
||||
assert fetched_body["response"].startswith("Polisher kinematic frame")
|
||||
assert fetched_body["project"] == "p06-polisher"
|
||||
assert fetched_body["chunks_used"] == ["chunk-a", "chunk-b"]
|
||||
assert fetched_body["context_pack"] == {"chunks": 2}
|
||||
|
||||
|
||||
def test_post_interactions_rejects_empty_prompt(tmp_data_dir):
|
||||
init_db()
|
||||
client = TestClient(app)
|
||||
response = client.post("/interactions", json={"prompt": ""})
|
||||
assert response.status_code == 400
|
||||
|
||||
|
||||
def test_get_unknown_interaction_returns_404(tmp_data_dir):
|
||||
init_db()
|
||||
client = TestClient(app)
|
||||
response = client.get("/interactions/does-not-exist")
|
||||
assert response.status_code == 404
|
||||
|
||||
|
||||
def test_list_interactions_endpoint_returns_summaries(tmp_data_dir):
|
||||
init_db()
|
||||
client = TestClient(app)
|
||||
client.post(
|
||||
"/interactions",
|
||||
json={"prompt": "alpha", "project": "p04-gigabit", "response": "x" * 10},
|
||||
)
|
||||
client.post(
|
||||
"/interactions",
|
||||
json={"prompt": "beta", "project": "p05-interferometer", "response": "y" * 50},
|
||||
)
|
||||
|
||||
response = client.get("/interactions", params={"project": "p05-interferometer"})
|
||||
assert response.status_code == 200
|
||||
body = response.json()
|
||||
assert body["count"] == 1
|
||||
assert body["interactions"][0]["prompt"] == "beta"
|
||||
assert body["interactions"][0]["response_chars"] == 50
|
||||
# The list endpoint never includes the full response body
|
||||
assert "response" not in body["interactions"][0]
|
||||
18
tests/test_logging.py
Normal file
18
tests/test_logging.py
Normal file
@@ -0,0 +1,18 @@
|
||||
"""Tests for logging configuration."""
|
||||
|
||||
from types import SimpleNamespace
|
||||
|
||||
import atocore.config as config
|
||||
from atocore.observability.logger import setup_logging
|
||||
|
||||
|
||||
def test_setup_logging_uses_dynamic_settings_without_name_error():
|
||||
original_settings = config.settings
|
||||
try:
|
||||
config.settings = SimpleNamespace(debug=False)
|
||||
setup_logging()
|
||||
|
||||
config.settings = SimpleNamespace(debug=True)
|
||||
setup_logging()
|
||||
finally:
|
||||
config.settings = original_settings
|
||||
596
tests/test_project_registry.py
Normal file
596
tests/test_project_registry.py
Normal file
@@ -0,0 +1,596 @@
|
||||
"""Tests for project registry resolution and refresh behavior."""
|
||||
|
||||
import json
|
||||
|
||||
import atocore.config as config
|
||||
from atocore.projects.registry import (
|
||||
build_project_registration_proposal,
|
||||
get_registered_project,
|
||||
get_project_registry_template,
|
||||
list_registered_projects,
|
||||
register_project,
|
||||
refresh_registered_project,
|
||||
update_project,
|
||||
)
|
||||
|
||||
|
||||
def test_project_registry_lists_projects_with_resolved_roots(tmp_path, monkeypatch):
|
||||
vault_dir = tmp_path / "vault"
|
||||
drive_dir = tmp_path / "drive"
|
||||
config_dir = tmp_path / "config"
|
||||
vault_dir.mkdir()
|
||||
drive_dir.mkdir()
|
||||
config_dir.mkdir()
|
||||
(vault_dir / "incoming" / "projects" / "p04-gigabit").mkdir(parents=True)
|
||||
|
||||
registry_path = config_dir / "project-registry.json"
|
||||
registry_path.write_text(
|
||||
json.dumps(
|
||||
{
|
||||
"projects": [
|
||||
{
|
||||
"id": "p04-gigabit",
|
||||
"aliases": ["p04", "gigabit"],
|
||||
"description": "P04 docs",
|
||||
"ingest_roots": [
|
||||
{
|
||||
"source": "vault",
|
||||
"subpath": "incoming/projects/p04-gigabit",
|
||||
"label": "P04 staged docs",
|
||||
}
|
||||
],
|
||||
}
|
||||
]
|
||||
}
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
|
||||
monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
|
||||
monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
|
||||
|
||||
original_settings = config.settings
|
||||
try:
|
||||
config.settings = config.Settings()
|
||||
projects = list_registered_projects()
|
||||
finally:
|
||||
config.settings = original_settings
|
||||
|
||||
assert len(projects) == 1
|
||||
assert projects[0]["id"] == "p04-gigabit"
|
||||
assert projects[0]["ingest_roots"][0]["exists"] is True
|
||||
|
||||
|
||||
def test_project_registry_resolves_alias(tmp_path, monkeypatch):
|
||||
vault_dir = tmp_path / "vault"
|
||||
drive_dir = tmp_path / "drive"
|
||||
config_dir = tmp_path / "config"
|
||||
vault_dir.mkdir()
|
||||
drive_dir.mkdir()
|
||||
config_dir.mkdir()
|
||||
|
||||
registry_path = config_dir / "project-registry.json"
|
||||
registry_path.write_text(
|
||||
json.dumps(
|
||||
{
|
||||
"projects": [
|
||||
{
|
||||
"id": "p05-interferometer",
|
||||
"aliases": ["p05", "interferometer"],
|
||||
"ingest_roots": [
|
||||
{"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
|
||||
],
|
||||
}
|
||||
]
|
||||
}
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
|
||||
monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
|
||||
monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
|
||||
|
||||
original_settings = config.settings
|
||||
try:
|
||||
config.settings = config.Settings()
|
||||
project = get_registered_project("p05")
|
||||
finally:
|
||||
config.settings = original_settings
|
||||
|
||||
assert project is not None
|
||||
assert project.project_id == "p05-interferometer"
|
||||
|
||||
|
||||
def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
|
||||
vault_dir = tmp_path / "vault"
|
||||
drive_dir = tmp_path / "drive"
|
||||
config_dir = tmp_path / "config"
|
||||
project_dir = vault_dir / "incoming" / "projects" / "p06-polisher"
|
||||
project_dir.mkdir(parents=True)
|
||||
drive_dir.mkdir()
|
||||
config_dir.mkdir()
|
||||
|
||||
registry_path = config_dir / "project-registry.json"
|
||||
registry_path.write_text(
|
||||
json.dumps(
|
||||
{
|
||||
"projects": [
|
||||
{
|
||||
"id": "p06-polisher",
|
||||
"aliases": ["p06", "polisher"],
|
||||
"description": "P06 docs",
|
||||
"ingest_roots": [
|
||||
{"source": "vault", "subpath": "incoming/projects/p06-polisher"}
|
||||
],
|
||||
}
|
||||
]
|
||||
}
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
calls = []
|
||||
|
||||
def fake_ingest_folder(path, purge_deleted=True):
|
||||
calls.append((str(path), purge_deleted))
|
||||
return [{"file": str(path / "README.md"), "status": "ingested"}]
|
||||
|
||||
monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
|
||||
monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
|
||||
monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
|
||||
|
||||
original_settings = config.settings
|
||||
try:
|
||||
config.settings = config.Settings()
|
||||
monkeypatch.setattr("atocore.projects.registry.ingest_folder", fake_ingest_folder)
|
||||
result = refresh_registered_project("polisher")
|
||||
finally:
|
||||
config.settings = original_settings
|
||||
|
||||
assert result["project"] == "p06-polisher"
|
||||
assert len(calls) == 1
|
||||
assert calls[0][0].endswith("p06-polisher")
|
||||
assert calls[0][1] is False
|
||||
assert result["roots"][0]["status"] == "ingested"
|
||||
assert result["status"] == "ingested"
|
||||
assert result["roots_ingested"] == 1
|
||||
assert result["roots_skipped"] == 0
|
||||
|
||||
|
||||
def test_refresh_registered_project_reports_nothing_to_ingest_when_all_missing(
|
||||
tmp_path, monkeypatch
|
||||
):
|
||||
vault_dir = tmp_path / "vault"
|
||||
drive_dir = tmp_path / "drive"
|
||||
config_dir = tmp_path / "config"
|
||||
vault_dir.mkdir()
|
||||
drive_dir.mkdir()
|
||||
config_dir.mkdir()
|
||||
|
||||
registry_path = config_dir / "project-registry.json"
|
||||
registry_path.write_text(
|
||||
json.dumps(
|
||||
{
|
||||
"projects": [
|
||||
{
|
||||
"id": "p07-ghost",
|
||||
"aliases": ["ghost"],
|
||||
"description": "Project whose roots do not exist on disk",
|
||||
"ingest_roots": [
|
||||
{"source": "vault", "subpath": "incoming/projects/p07-ghost"}
|
||||
],
|
||||
}
|
||||
]
|
||||
}
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
def fail_ingest_folder(path, purge_deleted=True):
|
||||
raise AssertionError(f"ingest_folder should not be called for missing root: {path}")
|
||||
|
||||
monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
|
||||
monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
|
||||
monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
|
||||
|
||||
original_settings = config.settings
|
||||
try:
|
||||
config.settings = config.Settings()
|
||||
monkeypatch.setattr("atocore.projects.registry.ingest_folder", fail_ingest_folder)
|
||||
result = refresh_registered_project("ghost")
|
||||
finally:
|
||||
config.settings = original_settings
|
||||
|
||||
assert result["status"] == "nothing_to_ingest"
|
||||
assert result["roots_ingested"] == 0
|
||||
assert result["roots_skipped"] == 1
|
||||
assert result["roots"][0]["status"] == "missing"
|
||||
|
||||
|
||||
def test_refresh_registered_project_reports_partial_status(tmp_path, monkeypatch):
|
||||
vault_dir = tmp_path / "vault"
|
||||
drive_dir = tmp_path / "drive"
|
||||
config_dir = tmp_path / "config"
|
||||
real_root = vault_dir / "incoming" / "projects" / "p08-mixed"
|
||||
real_root.mkdir(parents=True)
|
||||
drive_dir.mkdir()
|
||||
config_dir.mkdir()
|
||||
|
||||
registry_path = config_dir / "project-registry.json"
|
||||
registry_path.write_text(
|
||||
json.dumps(
|
||||
{
|
||||
"projects": [
|
||||
{
|
||||
"id": "p08-mixed",
|
||||
"aliases": ["mixed"],
|
||||
"description": "One root present, one missing",
|
||||
"ingest_roots": [
|
||||
{"source": "vault", "subpath": "incoming/projects/p08-mixed"},
|
||||
{"source": "vault", "subpath": "incoming/projects/p08-mixed-missing"},
|
||||
],
|
||||
}
|
||||
]
|
||||
}
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
def fake_ingest_folder(path, purge_deleted=True):
|
||||
return [{"file": str(path / "README.md"), "status": "ingested"}]
|
||||
|
||||
monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
|
||||
monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
|
||||
monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
|
||||
|
||||
original_settings = config.settings
|
||||
try:
|
||||
config.settings = config.Settings()
|
||||
monkeypatch.setattr("atocore.projects.registry.ingest_folder", fake_ingest_folder)
|
||||
result = refresh_registered_project("mixed")
|
||||
finally:
|
||||
config.settings = original_settings
|
||||
|
||||
assert result["status"] == "partial"
|
||||
assert result["roots_ingested"] == 1
|
||||
assert result["roots_skipped"] == 1
|
||||
statuses = sorted(root["status"] for root in result["roots"])
|
||||
assert statuses == ["ingested", "missing"]
|
||||
|
||||
|
||||
def test_project_registry_template_has_expected_shape():
|
||||
template = get_project_registry_template()
|
||||
assert "projects" in template
|
||||
assert template["projects"][0]["id"] == "p07-example"
|
||||
assert template["projects"][0]["ingest_roots"][0]["source"] == "vault"
|
||||
|
||||
|
||||
def test_project_registry_rejects_alias_collision(tmp_path, monkeypatch):
    """Loading a registry where two projects share an alias must raise."""
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    for directory in (vault_dir, drive_dir, config_dir):
        directory.mkdir()

    payload = {
        "projects": [
            {
                "id": "p04-gigabit",
                "aliases": ["shared"],
                "ingest_roots": [
                    {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
                ],
            },
            {
                "id": "p05-interferometer",
                "aliases": ["shared"],
                "ingest_roots": [
                    {"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
                ],
            },
        ]
    }
    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(json.dumps(payload), encoding="utf-8")

    env = {
        "ATOCORE_VAULT_SOURCE_DIR": str(vault_dir),
        "ATOCORE_DRIVE_SOURCE_DIR": str(drive_dir),
        "ATOCORE_PROJECT_REGISTRY_PATH": str(registry_path),
    }
    for key, value in env.items():
        monkeypatch.setenv(key, value)

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        raised = None
        try:
            list_registered_projects()
        except ValueError as exc:
            raised = exc
        assert raised is not None, "Expected project registry collision to raise"
        assert "collision" in str(raised)
    finally:
        # Always restore the module-level settings object for later tests.
        config.settings = original_settings
|
||||
|
||||
|
||||
def test_project_registration_proposal_normalizes_and_resolves_paths(tmp_path, monkeypatch):
    """Proposal building dedupes aliases and resolves ingest roots on disk."""
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    (vault_dir / "incoming" / "projects" / "p07-example").mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()
    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(json.dumps({"projects": []}), encoding="utf-8")

    env = {
        "ATOCORE_VAULT_SOURCE_DIR": str(vault_dir),
        "ATOCORE_DRIVE_SOURCE_DIR": str(drive_dir),
        "ATOCORE_PROJECT_REGISTRY_PATH": str(registry_path),
    }
    for key, value in env.items():
        monkeypatch.setenv(key, value)

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        proposal = build_project_registration_proposal(
            project_id="p07-example",
            aliases=["p07", "example-project", "p07"],  # duplicate on purpose
            description="Example project",
            ingest_roots=[
                {
                    "source": "vault",
                    "subpath": "incoming/projects/p07-example",
                    "label": "Primary docs",
                }
            ],
        )
    finally:
        config.settings = original_settings

    # Duplicate alias collapsed, staged path found, proposal accepted.
    assert proposal["project"]["aliases"] == ["p07", "example-project"]
    assert proposal["resolved_ingest_roots"][0]["exists"] is True
    assert proposal["valid"] is True
|
||||
|
||||
|
||||
def test_project_registration_proposal_reports_collisions(tmp_path, monkeypatch):
    """A proposal reusing an existing project's alias is reported invalid."""
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    for directory in (vault_dir, drive_dir, config_dir):
        directory.mkdir()

    payload = {
        "projects": [
            {
                "id": "p05-interferometer",
                "aliases": ["p05", "interferometer"],
                "ingest_roots": [
                    {"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
                ],
            }
        ]
    }
    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(json.dumps(payload), encoding="utf-8")

    env = {
        "ATOCORE_VAULT_SOURCE_DIR": str(vault_dir),
        "ATOCORE_DRIVE_SOURCE_DIR": str(drive_dir),
        "ATOCORE_PROJECT_REGISTRY_PATH": str(registry_path),
    }
    for key, value in env.items():
        monkeypatch.setenv(key, value)

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        proposal = build_project_registration_proposal(
            project_id="p08-example",
            aliases=["interferometer"],  # collides with p05-interferometer
            ingest_roots=[
                {"source": "vault", "subpath": "incoming/projects/p08-example"}
            ],
        )
    finally:
        config.settings = original_settings

    assert proposal["valid"] is False
    assert proposal["collisions"][0]["existing_project"] == "p05-interferometer"
|
||||
|
||||
|
||||
def test_register_project_persists_new_entry(tmp_path, monkeypatch):
    """Registering a project writes its normalized entry to the registry file."""
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    (vault_dir / "incoming" / "projects" / "p07-example").mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()
    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(json.dumps({"projects": []}), encoding="utf-8")

    env = {
        "ATOCORE_VAULT_SOURCE_DIR": str(vault_dir),
        "ATOCORE_DRIVE_SOURCE_DIR": str(drive_dir),
        "ATOCORE_PROJECT_REGISTRY_PATH": str(registry_path),
    }
    for key, value in env.items():
        monkeypatch.setenv(key, value)

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        result = register_project(
            project_id="p07-example",
            aliases=["p07", "example-project"],
            description="Example project",
            ingest_roots=[
                {
                    "source": "vault",
                    "subpath": "incoming/projects/p07-example",
                    "label": "Primary docs",
                }
            ],
        )
    finally:
        config.settings = original_settings

    assert result["status"] == "registered"
    # The registry file on disk should now carry the new project verbatim.
    saved = json.loads(registry_path.read_text(encoding="utf-8"))
    assert saved["projects"][0]["id"] == "p07-example"
    assert saved["projects"][0]["aliases"] == ["p07", "example-project"]
|
||||
|
||||
|
||||
def test_register_project_rejects_collisions(tmp_path, monkeypatch):
    """register_project must refuse aliases already claimed by another project."""
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    for directory in (vault_dir, drive_dir, config_dir):
        directory.mkdir()

    payload = {
        "projects": [
            {
                "id": "p05-interferometer",
                "aliases": ["p05", "interferometer"],
                "ingest_roots": [
                    {"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
                ],
            }
        ]
    }
    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(json.dumps(payload), encoding="utf-8")

    env = {
        "ATOCORE_VAULT_SOURCE_DIR": str(vault_dir),
        "ATOCORE_DRIVE_SOURCE_DIR": str(drive_dir),
        "ATOCORE_PROJECT_REGISTRY_PATH": str(registry_path),
    }
    for key, value in env.items():
        monkeypatch.setenv(key, value)

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        raised = None
        try:
            register_project(
                project_id="p07-example",
                aliases=["interferometer"],  # already owned by p05-interferometer
                ingest_roots=[
                    {"source": "vault", "subpath": "incoming/projects/p07-example"}
                ],
            )
        except ValueError as exc:
            raised = exc
        assert raised is not None, "Expected collision to prevent project registration"
        assert "collisions" in str(raised)
    finally:
        config.settings = original_settings
|
||||
|
||||
|
||||
def test_update_project_persists_description_and_aliases(tmp_path, monkeypatch):
    """update_project (looked up by alias) persists new aliases and description."""
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    (vault_dir / "incoming" / "projects" / "p04-gigabit").mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()

    payload = {
        "projects": [
            {
                "id": "p04-gigabit",
                "aliases": ["p04", "gigabit"],
                "description": "Old description",
                "ingest_roots": [
                    {
                        "source": "vault",
                        "subpath": "incoming/projects/p04-gigabit",
                        "label": "Primary docs",
                    }
                ],
            }
        ]
    }
    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(json.dumps(payload), encoding="utf-8")

    env = {
        "ATOCORE_VAULT_SOURCE_DIR": str(vault_dir),
        "ATOCORE_DRIVE_SOURCE_DIR": str(drive_dir),
        "ATOCORE_PROJECT_REGISTRY_PATH": str(registry_path),
    }
    for key, value in env.items():
        monkeypatch.setenv(key, value)

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        result = update_project(
            "p04",  # alias lookup, not the canonical id
            aliases=["p04", "gigabit", "gigabit-project"],
            description="Updated P04 project docs",
        )
    finally:
        config.settings = original_settings

    assert result["status"] == "updated"
    assert result["project"]["id"] == "p04-gigabit"
    assert result["project"]["aliases"] == ["p04", "gigabit", "gigabit-project"]
    assert result["project"]["description"] == "Updated P04 project docs"

    # The change must also be written through to the registry file.
    saved = json.loads(registry_path.read_text(encoding="utf-8"))
    assert saved["projects"][0]["aliases"] == ["p04", "gigabit", "gigabit-project"]
    assert saved["projects"][0]["description"] == "Updated P04 project docs"
|
||||
|
||||
|
||||
def test_update_project_rejects_colliding_aliases(tmp_path, monkeypatch):
    """update_project must refuse an alias owned by a different project."""
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    config_dir = tmp_path / "config"
    for directory in (vault_dir, drive_dir, config_dir):
        directory.mkdir()

    payload = {
        "projects": [
            {
                "id": "p04-gigabit",
                "aliases": ["p04", "gigabit"],
                "ingest_roots": [
                    {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
                ],
            },
            {
                "id": "p05-interferometer",
                "aliases": ["p05", "interferometer"],
                "ingest_roots": [
                    {"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
                ],
            },
        ]
    }
    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(json.dumps(payload), encoding="utf-8")

    env = {
        "ATOCORE_VAULT_SOURCE_DIR": str(vault_dir),
        "ATOCORE_DRIVE_SOURCE_DIR": str(drive_dir),
        "ATOCORE_PROJECT_REGISTRY_PATH": str(registry_path),
    }
    for key, value in env.items():
        monkeypatch.setenv(key, value)

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        raised = None
        try:
            update_project(
                "p04-gigabit",
                aliases=["p04", "interferometer"],  # "interferometer" belongs to p05
            )
        except ValueError as exc:
            raised = exc
        assert raised is not None, "Expected collision to prevent project update"
        assert "collisions" in str(raised)
    finally:
        config.settings = original_settings
|
||||
316
tests/test_reinforcement.py
Normal file
316
tests/test_reinforcement.py
Normal file
@@ -0,0 +1,316 @@
|
||||
"""Tests for Phase 9 Commit B reinforcement loop."""
|
||||
|
||||
from fastapi.testclient import TestClient
|
||||
|
||||
from atocore.interactions.service import record_interaction
|
||||
from atocore.main import app
|
||||
from atocore.memory.reinforcement import (
|
||||
DEFAULT_CONFIDENCE_DELTA,
|
||||
reinforce_from_interaction,
|
||||
)
|
||||
from atocore.memory.service import (
|
||||
create_memory,
|
||||
get_memories,
|
||||
reinforce_memory,
|
||||
)
|
||||
from atocore.models.database import init_db
|
||||
|
||||
|
||||
# --- service-level tests: reinforce_memory primitive ----------------------
|
||||
|
||||
|
||||
def test_reinforce_memory_bumps_active_memory(tmp_data_dir):
    """Reinforcing an active memory raises confidence and records the reference."""
    init_db()
    mem = create_memory(
        memory_type="preference",
        content="prefers Python over Ruby for scripting",
        confidence=0.6,
    )

    applied, old_conf, new_conf = reinforce_memory(mem.id, confidence_delta=0.05)

    assert applied is True
    assert old_conf == 0.6
    assert abs(new_conf - 0.65) < 1e-9

    # Reload from storage to confirm the bump was persisted.
    stored = {m.id: m for m in get_memories(memory_type="preference", limit=10)}
    match = stored.get(mem.id)
    assert match is not None
    assert abs(match.confidence - 0.65) < 1e-9
    assert match.reference_count == 1
    assert match.last_referenced_at  # timestamp must be recorded, non-empty
|
||||
|
||||
|
||||
def test_reinforce_memory_caps_at_one(tmp_data_dir):
    """Confidence is clamped at 1.0 even when the delta would overshoot."""
    init_db()
    mem = create_memory(
        memory_type="identity",
        content="is a mechanical engineer who runs AtoCore",
        confidence=0.98,
    )
    # 0.98 + 0.05 would exceed 1.0; the primitive must clamp, not overflow.
    applied, old_conf, new_conf = reinforce_memory(mem.id, confidence_delta=0.05)
    assert applied is True
    assert old_conf == 0.98
    assert new_conf == 1.0
|
||||
|
||||
|
||||
def test_reinforce_memory_rejects_candidate_and_missing(tmp_data_dir):
    """Neither candidate memories nor unknown ids may be reinforced."""
    init_db()
    candidate = create_memory(
        memory_type="knowledge",
        content="the lateral support uses GF-PTFE pads",
        confidence=0.5,
        status="candidate",
    )
    # Candidate entries are not yet trusted, so reinforcement is refused.
    applied, _, _ = reinforce_memory(candidate.id)
    assert applied is False

    # Unknown ids are a silent no-op rather than an error.
    missing, _, _ = reinforce_memory("no-such-id")
    assert missing is False
|
||||
|
||||
|
||||
def test_reinforce_memory_accumulates_reference_count(tmp_data_dir):
    """Each reinforcement adds its delta and bumps the reference counter."""
    init_db()
    mem = create_memory(
        memory_type="preference",
        content="likes concise code reviews that focus on the why",
        confidence=0.5,
    )
    repeats = 5
    for _ in range(repeats):
        reinforce_memory(mem.id, confidence_delta=0.01)
    reloaded = next(
        m for m in get_memories(memory_type="preference", limit=10) if m.id == mem.id
    )
    assert reloaded.reference_count == repeats
    assert abs(reloaded.confidence - 0.55) < 1e-9
|
||||
|
||||
|
||||
def test_reinforce_memory_rejects_negative_delta(tmp_data_dir):
    """A negative confidence delta is invalid input and must raise ValueError.

    Uses a plain try/except/else check instead of a mid-function
    ``import pytest`` + ``pytest.raises`` — imports belong at module top, and
    this matches the try/except assertion style used by the sibling tests.
    """
    init_db()
    mem = create_memory(memory_type="preference", content="always uses structured logging")
    try:
        reinforce_memory(mem.id, confidence_delta=-0.01)
    except ValueError:
        pass  # expected: negative deltas are rejected
    else:
        raise AssertionError("Expected negative confidence_delta to raise ValueError")
|
||||
|
||||
|
||||
# --- reinforce_from_interaction: the high-level matcher -------------------
|
||||
|
||||
|
||||
def _make_interaction(**overrides):
    """Record a minimal interaction with auto-reinforcement turned off.

    The matcher is exercised separately in these tests, so ``reinforce=False``
    keeps the stored interaction inert. Known fields may be overridden by
    keyword; unknown keywords are ignored, exactly like the original helper.
    """
    kwargs = {
        "prompt": "ignored",
        "response": "",
        "response_summary": "",
        "project": "",
        "client": "",
        "session_id": "",
    }
    for field in kwargs:
        if field in overrides:
            kwargs[field] = overrides[field]
    return record_interaction(reinforce=False, **kwargs)
|
||||
|
||||
|
||||
def test_reinforce_from_interaction_matches_active_memory(tmp_data_dir):
    """An interaction whose response quotes an active memory reinforces it."""
    init_db()
    mem = create_memory(
        memory_type="preference",
        content="prefers tests that describe behaviour in plain English",
        confidence=0.5,
    )
    response_text = (
        "I wrote the new tests in plain English, since the project "
        "prefers tests that describe behaviour in plain English and "
        "that makes them easier to review."
    )
    interaction = _make_interaction(response=response_text)

    results = reinforce_from_interaction(interaction)

    assert len(results) == 1
    assert results[0].memory_id == mem.id
    expected = 0.5 + DEFAULT_CONFIDENCE_DELTA
    assert abs(results[0].new_confidence - expected) < 1e-9
|
||||
|
||||
|
||||
def test_reinforce_from_interaction_ignores_candidates_and_inactive(tmp_data_dir):
    """Matching text must not reinforce memories still in candidate status."""
    init_db()
    candidate = create_memory(
        memory_type="knowledge",
        content="the polisher frame uses kinematic mounts for thermal isolation",
        confidence=0.6,
        status="candidate",
    )
    interaction = _make_interaction(
        response=(
            "The polisher frame uses kinematic mounts for thermal isolation, "
            "which matches the note in the design log."
        ),
    )
    results = reinforce_from_interaction(interaction)
    # Candidate should NOT be reinforced even though the text matches.
    matched = [r for r in results if r.memory_id == candidate.id]
    assert matched == []
|
||||
|
||||
|
||||
def test_reinforce_from_interaction_requires_min_content_length(tmp_data_dir):
    """Memories below the minimum content length never match a response."""
    init_db()
    short_mem = create_memory(
        memory_type="preference",
        content="uses SI",  # below min length
    )
    interaction = _make_interaction(
        response="Everything uses SI for this project, consistently.",
    )
    results = reinforce_from_interaction(interaction)
    assert not any(r.memory_id == short_mem.id for r in results)
|
||||
|
||||
|
||||
def test_reinforce_from_interaction_empty_response_is_noop(tmp_data_dir):
    """With no response text at all, the reinforcement pass returns nothing."""
    init_db()
    create_memory(memory_type="preference", content="prefers structured logging")
    interaction = _make_interaction(response="", response_summary="")
    assert reinforce_from_interaction(interaction) == []
|
||||
|
||||
|
||||
def test_reinforce_from_interaction_is_normalized(tmp_data_dir):
    """Matching ignores casing and whitespace differences in the response."""
    init_db()
    mem = create_memory(
        memory_type="preference",
        content="Prefers concise commit messages focused on the why",
    )
    # Response has different casing and extra whitespace — should still match
    interaction = _make_interaction(
        response=(
            "The commit message was short on purpose — the user\n\n"
            "PREFERS concise commit MESSAGES focused on the WHY, "
            "so I stuck to one sentence."
        ),
    )
    results = reinforce_from_interaction(interaction)
    matched_ids = {r.memory_id for r in results}
    assert mem.id in matched_ids
|
||||
|
||||
|
||||
def test_reinforce_from_interaction_deduplicates_across_buckets(tmp_data_dir):
    """A memory reachable via two lookup buckets is reinforced only once."""
    init_db()
    mem = create_memory(
        memory_type="identity",
        content="mechanical engineer who runs AtoCore",
        project="",
    )
    # This memory belongs to the identity bucket AND would also be fetched
    # via the project query if project matched. We want to ensure we don't
    # double-reinforce.
    interaction = _make_interaction(
        response="The mechanical engineer who runs AtoCore asked for this patch.",
        project="p05-interferometer",
    )
    results = reinforce_from_interaction(interaction)
    hits = [r for r in results if r.memory_id == mem.id]
    assert len(hits) == 1
|
||||
|
||||
|
||||
# --- automatic reinforcement on record_interaction ------------------------
|
||||
|
||||
|
||||
def test_record_interaction_auto_reinforces_by_default(tmp_data_dir):
    """record_interaction runs the reinforcement pass unless told otherwise."""
    init_db()
    mem = create_memory(
        memory_type="preference",
        content="writes tests before hooking features into API routes",
        confidence=0.5,
    )
    record_interaction(
        prompt="please add the /foo endpoint with tests",
        response=(
            "Wrote tests first, then added the /foo endpoint. The project "
            "writes tests before hooking features into API routes so the "
            "order is enforced."
        ),
    )
    reloaded = next(
        m for m in get_memories(memory_type="preference", limit=20) if m.id == mem.id
    )
    assert reloaded.confidence > 0.5
    assert reloaded.reference_count == 1
|
||||
|
||||
|
||||
def test_record_interaction_reinforce_false_skips_pass(tmp_data_dir):
    """reinforce=False leaves matching memories completely untouched."""
    init_db()
    mem = create_memory(
        memory_type="preference",
        content="always includes a rollback note in risky commits",
        confidence=0.5,
    )
    record_interaction(
        prompt="ignored",
        response=(
            "I always includes a rollback note in risky commits, so the "
            "commit message mentions how to revert if needed."
        ),
        reinforce=False,
    )
    reloaded = next(
        m for m in get_memories(memory_type="preference", limit=20) if m.id == mem.id
    )
    assert reloaded.confidence == 0.5
    assert reloaded.reference_count == 0
|
||||
|
||||
|
||||
def test_record_interaction_auto_reinforce_handles_empty_response(tmp_data_dir):
    """An empty response makes the automatic reinforcement pass a silent no-op."""
    init_db()
    mem = create_memory(memory_type="preference", content="prefers descriptive branch names")
    record_interaction(prompt="hi", response="", response_summary="")
    reloaded = next(
        m for m in get_memories(memory_type="preference", limit=20) if m.id == mem.id
    )
    assert reloaded.reference_count == 0
|
||||
|
||||
|
||||
# --- API level ------------------------------------------------------------
|
||||
|
||||
|
||||
def test_api_reinforce_endpoint_runs_against_stored_interaction(tmp_data_dir):
    """POST /interactions/{id}/reinforce reinforces matching memories on demand."""
    init_db()
    mem = create_memory(
        memory_type="preference",
        content="rejects commits that touch credential files",
        confidence=0.5,
    )
    interaction = record_interaction(
        prompt="review commit",
        response=(
            "I rejects commits that touch credential files on sight. "
            "That commit touched ~/.git-credentials, so it was blocked."
        ),
        reinforce=False,  # leave untouched for the endpoint to do it
    )

    client = TestClient(app)
    response = client.post(f"/interactions/{interaction.id}/reinforce")

    assert response.status_code == 200
    body = response.json()
    assert body["interaction_id"] == interaction.id
    assert body["reinforced_count"] >= 1
    reinforced_ids = [entry["memory_id"] for entry in body["reinforced"]]
    assert mem.id in reinforced_ids
|
||||
|
||||
|
||||
def test_api_reinforce_endpoint_returns_404_for_missing(tmp_data_dir):
    """Reinforcing an unknown interaction id yields HTTP 404."""
    init_db()
    response = TestClient(app).post("/interactions/does-not-exist/reinforce")
    assert response.status_code == 404
|
||||
|
||||
|
||||
def test_api_post_interactions_accepts_reinforce_false(tmp_data_dir):
    """POST /interactions with reinforce=false must skip the matcher entirely."""
    init_db()
    mem = create_memory(
        memory_type="preference",
        content="writes runbooks alongside new services",
        confidence=0.5,
    )
    payload = {
        "prompt": "review",
        "response": (
            "I writes runbooks alongside new services and the diff includes "
            "one under docs/runbooks/."
        ),
        "reinforce": False,
    }
    response = TestClient(app).post("/interactions", json=payload)
    assert response.status_code == 200

    reloaded = next(
        m for m in get_memories(memory_type="preference", limit=20) if m.id == mem.id
    )
    assert reloaded.confidence == 0.5
    assert reloaded.reference_count == 0
|
||||
@@ -67,3 +67,109 @@ def test_retrieve_skips_stale_vector_entries(tmp_data_dir, sample_markdown, monk
|
||||
results = retrieve("overview", top_k=2)
|
||||
assert len(results) == 1
|
||||
assert results[0].chunk_id == chunk_ids[0]
|
||||
|
||||
|
||||
def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
    """A project hint should rank chunks tagged with that project first."""

    meta_a = {
        "heading_path": "Overview",
        "source_file": "p04-gigabit/pkm/_index.md",
        "tags": '["p04-gigabit"]',
        "title": "P04",
        "document_id": "doc-a",
    }
    meta_b = {
        "heading_path": "Overview",
        "source_file": "p05-interferometer/pkm/_index.md",
        "tags": '["p05-interferometer"]',
        "title": "P05",
        "document_id": "doc-b",
    }

    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            # chunk-b is nominally closer (0.25 < 0.3); the hint must flip the order.
            return {
                "ids": [["chunk-a", "chunk-b"]],
                "documents": [["project doc", "other doc"]],
                "metadatas": [[meta_a, meta_b]],
                "distances": [[0.3, 0.25]],
            }

    class FakeProject:
        project_id = "p04-gigabit"
        aliases = ("p04", "gigabit")
        ingest_roots = ()

    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
    monkeypatch.setattr(
        "atocore.retrieval.retriever._existing_chunk_ids",
        lambda chunk_ids: set(chunk_ids),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: FakeProject(),
    )

    results = retrieve("mirror architecture", top_k=2, project_hint="p04")

    assert len(results) == 2
    assert results[0].chunk_id == "chunk-a"
    assert results[0].score > results[1].score
|
||||
|
||||
|
||||
def test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths(monkeypatch):
    """Archive paths are down-ranked even when nominally closer in vector space."""

    archive_meta = {
        "heading_path": "History",
        "source_file": "p05-interferometer/pkm/_archive/old/Error-Budget.md",
        "tags": '["p05-interferometer"]',
        "title": "Old Error Budget",
        "document_id": "doc-a",
    }
    requirements_meta = {
        "heading_path": "Overview",
        "source_file": "p05-interferometer/pkm/Requirements/Error-Budget.md",
        "tags": '["p05-interferometer"]',
        "title": "Error Budget",
        "document_id": "doc-b",
    }

    class FakeStore:
        def query(self, query_embedding, top_k=10, where=None):
            # The archived chunk is closer (0.2 < 0.24); ranking must still
            # prefer the live Requirements document.
            return {
                "ids": [["chunk-archive", "chunk-requirements"]],
                "documents": [["archive doc", "requirements doc"]],
                "metadatas": [[archive_meta, requirements_meta]],
                "distances": [[0.2, 0.24]],
            }

    class FakeProject:
        project_id = "p05-interferometer"
        aliases = ("p05", "interferometer")
        ingest_roots = ()

    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
    monkeypatch.setattr(
        "atocore.retrieval.retriever._existing_chunk_ids",
        lambda chunk_ids: set(chunk_ids),
    )
    monkeypatch.setattr(
        "atocore.retrieval.retriever.get_registered_project",
        lambda project_name: FakeProject(),
    )

    results = retrieve(
        "interferometer error budget vendor constraints",
        top_k=2,
        project_hint="p05-interferometer",
    )

    assert len(results) == 2
    assert results[0].chunk_id == "chunk-requirements"
    assert results[0].score > results[1].score
|
||||
|
||||
70
tests/test_sources.py
Normal file
70
tests/test_sources.py
Normal file
@@ -0,0 +1,70 @@
|
||||
"""Tests for configured source registration and readiness."""
|
||||
|
||||
import atocore.config as config
|
||||
from atocore.ingestion.pipeline import get_source_status, ingest_configured_sources
|
||||
|
||||
|
||||
def test_get_source_status_reports_read_only_inputs(tmp_path, monkeypatch):
    """Source status lists vault/drive with enabled, read-only and exists flags."""
    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(tmp_path / "vault"))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(tmp_path / "drive"))
    monkeypatch.setenv("ATOCORE_SOURCE_DRIVE_ENABLED", "false")

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        status = get_source_status()
    finally:
        config.settings = original_settings

    vault, drive = status[0], status[1]
    assert vault["name"] == "vault"
    assert vault["enabled"] is True
    assert vault["read_only"] is True
    assert vault["exists"] is False  # directory was never created
    assert drive["name"] == "drive"
    assert drive["enabled"] is False
||||
|
||||
|
||||
def test_ingest_configured_sources_reports_missing_and_disabled(tmp_path, monkeypatch):
    """Missing directories and disabled sources are reported, not ingested."""
    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(tmp_path / "vault"))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(tmp_path / "drive"))
    monkeypatch.setenv("ATOCORE_SOURCE_DRIVE_ENABLED", "false")

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        results = ingest_configured_sources()
    finally:
        config.settings = original_settings

    vault_result, drive_result = results[0], results[1]
    assert vault_result["source"] == "vault"
    assert vault_result["status"] == "missing"  # directory never created
    assert drive_result["source"] == "drive"
    assert drive_result["status"] == "disabled"
|
||||
|
||||
|
||||
def test_ingest_configured_sources_uses_ingest_folder(tmp_path, monkeypatch):
    """Each enabled, existing source is handed to ingest_folder without purging."""
    vault_dir = tmp_path / "vault"
    drive_dir = tmp_path / "drive"
    for directory in (vault_dir, drive_dir):
        directory.mkdir()
    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))

    calls = []

    def fake_ingest_folder(path, purge_deleted=True):
        calls.append((str(path), purge_deleted))
        return [{"file": str(path / "note.md"), "status": "ingested"}]

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_folder", fake_ingest_folder)
        results = ingest_configured_sources()
    finally:
        config.settings = original_settings

    assert len(calls) == 2
    # Configured-source sweeps must never purge; deletion handling lives elsewhere.
    assert all(purge_deleted is False for _, purge_deleted in calls)
    assert results[0]["status"] == "ingested"
    assert "results" in results[0]
|
||||
Reference in New Issue
Block a user