fix(retrieval): enforce project-scoped context boundaries

2026-04-24 10:46:56 -04:00
parent c53e61eb67
commit c7212900b0
11 changed files with 737 additions and 68 deletions


@@ -7,16 +7,16 @@
## Orientation
- **live_sha** (Dalidou `/health` build_sha): `2b86543` (verified 2026-04-23T15:20:53Z post-R14 deploy; status=ok)
-- **last_updated**: 2026-04-23 by Claude (R14 squash-merged + deployed; Orientation refreshed)
+- **last_updated**: 2026-04-24 by Codex (audit-improvements foundation branch; live status refreshed)
- **main_tip**: `2b86543`
-- **test_count**: 548 (547 + 1 R14 regression test)
+- **test_count**: 553
-- **harness**: `17/18 PASS` on live Dalidou (p04-constraints expects "Zerodur" — known content gap, not regression; consistent since 2026-04-19)
+- **harness**: `18/20 PASS` on live Dalidou plus 1 known content gap and 1 blocking project-bleed guard pending deploy of this branch
- **vectors**: 33,253
-- **active_memories**: 784 (up from 84 pre-density-batch — density gate CRUSHED vs V1-A's 100-target)
+- **active_memories**: 290 (`/admin/dashboard` 2026-04-24; note integrity panel reports a separate active_memory_count=951 and needs reconciliation)
-- **candidate_memories**: 2 (triage queue drained)
+- **candidate_memories**: 0 (triage queue drained)
-- **interactions**: 500+ (limit=2000 query returned 500 — density batch has been running; actual may be higher, confirm via /stats next update)
+- **interactions**: 950 (`/admin/dashboard` 2026-04-24)
- **registered_projects**: atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08)
-- **project_state_entries**: 63 (atocore alone; full cross-project count not re-sampled this update)
+- **project_state_entries**: 128 across registered projects (`/admin/dashboard` 2026-04-24)
- **entities**: 66 (up from 35 — V1-0 backfill + ongoing work; 0 open conflicts)
- **off_host_backup**: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron, verified
- **nightly_pipeline**: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → auto-triage → **auto-promote/expire (NEW)** → weekly synth/lint Sundays → **retrieval harness (NEW)** → **pipeline summary (NEW)**
@@ -170,6 +170,10 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha
## Session Log
+- **2026-04-24 Codex (audit improvements foundation)** Started implementation of the audit recommendations on branch `codex/audit-improvements-foundation` from `origin/main@c53e61e`. First tranche: registry-aware project-scoped retrieval filtering (`ATOCORE_RANK_PROJECT_SCOPE_FILTER`, widened candidate pull before filtering), eval harness known-issue lane, two p05 project-bleed fixtures, `scripts/live_status.py`, README/current-state/master-plan status refresh. Verified `pytest -q`: 550 passed in 67.11s. Live retrieval harness against undeployed production: 20 fixtures, 18 pass, 1 known issue (`p04-constraints` Zerodur/1.2 content gap), 1 blocking guard (`p05-broad-status-no-atomizer`) still failing because production has not yet deployed the retrieval filter and currently pulls `P04-GigaBIT-M1-KB-design` into broad p05 status context. Live dashboard refresh: health ok, build `2b86543`, docs 1748, chunks/vectors 33253, interactions 948, active memories 289, candidates 0, project_state total 128. Noted count discrepancy: dashboard memories.active=289 while integrity active_memory_count=951; schedule reconciliation in a follow-up.
+- **2026-04-24 Codex (independent-audit hardening)** Applied the Opus independent audit's fast follow-ups before merge/deploy. Closed the two P1s by making project-scope ownership path/tag-based only, adding path-segment/tag-exact matching to avoid short-alias substring collisions, and keeping title/heading text out of provenance decisions. Added regression tests for title poisoning, substring collision, and unknown-project fallback. Added retrieval log fields `raw_results_count`, `post_filter_count`, `post_filter_dropped`, and `underfilled`. Added retrieval-eval run metadata (`generated_at`, `base_url`, `/health`) and `live_status.py` auth-token/status support. README now documents the ranking knobs and clarifies that the hard scope filter and soft project match boost are separate controls. Verified `pytest -q`: 553 passed in 66.07s. Live production remains expected-predeploy: 20 fixtures, 18 pass, 1 known content gap, 1 blocking p05 bleed guard. Latest live dashboard: build `2b86543`, docs 1748, chunks/vectors 33253, interactions 950, active memories 290, candidates 0, project_state total 128.
- **2026-04-23 Codex + Claude (R14 closed)** Codex reviewed `claude/r14-promote-400` at `3888db9`, no findings: "The route change is narrowly scoped: `promote_entity()` still returns False for not-found/not-candidate cases, so the existing 404 behavior remains intact, while caller-fixable validation failures now surface as 400." Ran `pytest tests/test_v1_0_write_invariants.py -q` from an isolated worktree: 15 passed in 1.91s. Claude squash-merged to main as `0989fed`, followed by ledger close-out `2b86543`, then deployed via canonical script. Dalidou `/health` reports build_sha=`2b86543e6ad26011b39a44509cc8df3809725171`, build_time `2026-04-23T15:20:53Z`, status=ok. R14 closed.
Orientation refreshed earlier this session also reflected the V1-A gate status: **density gate CLEARED** (784 active memories vs 100 target — density batch-extract ran between 2026-04-22 and 2026-04-23 and more than crushed the gate), **soak gate at day 5 of ~7** (F4 first run 2026-04-19; nightly clean 2026-04-19 through 2026-04-23; only chronic failure is the known p04-constraints "Zerodur" content gap). V1-A branches from a clean V1-0 baseline as soon as the soak is called done.
- **2026-04-22 Codex + Antoine (V1-0 closed)** Codex approved `f16cd52` after re-running both original probes (legacy-candidate promote + supersede hook — both correct) and the three targeted regression suites (`test_v1_0_write_invariants.py`, `test_engineering_v1_phase5.py`, `test_inbox_crossproject.py` — all pass). Squash-merged to main as `2712c5d` ("feat(engineering): enforce V1-0 write invariants"). Deployed to Dalidou via the canonical deploy script; `/health` build_sha=`2712c5d2d03cb2a6af38b559664afd1c4cd0e050` status=ok.
Validated backup snapshot at `/srv/storage/atocore/backups/snapshots/20260422T190624Z` taken BEFORE prod backfill. Prod backfill of `scripts/v1_0_backfill_provenance.py` against live DB: dry-run found 31 active/superseded entities with no provenance, list reviewed and looked sane; live run with default `hand_authored=1` flag path updated 31 rows; follow-up dry-run returned 0 rows remaining → no lingering F-8 violations in prod. Codex logged one residual P2 (R14): HTTP `POST /entities/{id}/promote` route doesn't translate the new service-layer `ValueError` into 400 — legacy bad candidate promoted through the API surfaces as 500. Not blocking. V1-0 closed. **Gates for V1-A**: soak window ends ~2026-04-26; 100-active-memory density target (currently 84 active + the ~31 newly flagged ones — need to check how those count in density math). V1-A holds until both gates clear.
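The hardening entry above describes making project-scope ownership path/tag-based, with path-segment and tag-exact matching so a short alias cannot substring-match an unrelated corpus. A minimal sketch of that matching rule, assuming hypothetical names and shapes (this is not the repo's actual code):

```python
def project_owns_chunk(project_id: str, aliases: list[str], path: str, tags: list[str]) -> bool:
    """Hypothetical sketch: exact path-segment / tag-exact ownership matching.

    Comparing whole path segments and tags (not substrings) means a short
    alias like "p05" cannot collide with a segment like "p05x-archive".
    """
    names = {project_id.lower(), *(alias.lower() for alias in aliases)}
    # Split the source path into segments; normalize separators first.
    segments = {seg.lower() for seg in path.replace("\\", "/").split("/") if seg}
    tag_set = {tag.lower() for tag in tags}
    return bool(names & (segments | tag_set))
```

Note that titles and headings never enter this decision, matching the audit's point that provenance should come from path/tag metadata only.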


@@ -6,7 +6,7 @@ Personal context engine that enriches LLM interactions with durable memory, stru
```bash
pip install -e .
-uvicorn src.atocore.main:app --port 8100
+uvicorn atocore.main:app --port 8100
```
## Usage
@@ -37,6 +37,10 @@ python scripts/atocore_client.py audit-query "gigabit" 5
| POST | /ingest | Ingest markdown file or folder |
| POST | /query | Retrieve relevant chunks |
| POST | /context/build | Build full context pack |
+| POST | /interactions | Capture prompt/response interactions |
+| GET/POST | /memory | List/create durable memories |
+| GET/POST | /entities | Engineering entity graph surface |
+| GET | /admin/dashboard | Operator dashboard |
| GET | /health | Health check |
| GET | /debug/context | Inspect last context pack |
@@ -66,8 +70,10 @@ unversioned forms.
FastAPI (port 8100)
|- Ingestion: markdown -> parse -> chunk -> embed -> store
|- Retrieval: query -> embed -> vector search -> rank
-|- Context Builder: retrieve -> boost -> budget -> format
-|- SQLite (documents, chunks, memories, projects, interactions)
+|- Context Builder: project state -> memories -> entities -> retrieval -> budget
+|- Reflection: capture -> reinforce -> extract -> triage -> promote/expire
+|- Engineering: typed entities, relationships, conflicts, wiki/mirror
+|- SQLite (documents, chunks, memories, projects, interactions, entities)
'- ChromaDB (vector embeddings)
```
@@ -82,6 +88,16 @@ Set via environment variables (prefix `ATOCORE_`):
| ATOCORE_CHUNK_MAX_SIZE | 800 | Max chunk size (chars) |
| ATOCORE_CONTEXT_BUDGET | 3000 | Context pack budget (chars) |
| ATOCORE_EMBEDDING_MODEL | paraphrase-multilingual-MiniLM-L12-v2 | Embedding model |
+| ATOCORE_RANK_PROJECT_MATCH_BOOST | 2.0 | Soft boost for chunks whose metadata matches the project hint |
+| ATOCORE_RANK_PROJECT_SCOPE_FILTER | true | Filter project-hinted retrieval away from other registered project corpora |
+| ATOCORE_RANK_PROJECT_SCOPE_CANDIDATE_MULTIPLIER | 4 | Widen candidate pull before project-scope filtering |
+| ATOCORE_RANK_QUERY_TOKEN_STEP | 0.08 | Per-token boost when query terms appear in high-signal metadata |
+| ATOCORE_RANK_QUERY_TOKEN_CAP | 1.32 | Maximum query-token boost multiplier |
+| ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST | 1.18 | Boost current decision/status/requirements-like paths |
+| ATOCORE_RANK_PATH_LOW_SIGNAL_PENALTY | 0.72 | Down-rank archive/history-like paths |
+
+`ATOCORE_RANK_PROJECT_SCOPE_FILTER` gates the hard cross-project filter only.
+`ATOCORE_RANK_PROJECT_MATCH_BOOST` remains the separate soft-ranking knob.
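The note above separates the hard scope filter from the soft match boost. A hypothetical sketch of how the two controls and the candidate multiplier could interact during ranking (function name and data shapes are invented for illustration; defaults are taken from the table above, not from the real implementation):

```python
def scoped_rank(candidates, project_hint, registered, top_k,
                scope_filter=True, match_boost=2.0, multiplier=4):
    """Illustrative sketch: hard scope filter vs. soft project-match boost.

    candidates: list of {"project": str | None, "score": float} dicts,
    already sorted by raw vector-search score (hypothetical shape).
    """
    # Widen the candidate pull so filtering still leaves enough results.
    pool = candidates[: top_k * multiplier]
    if scope_filter and project_hint:
        # Hard filter: drop chunks owned by *other* registered projects.
        # Unowned chunks (project=None) are kept.
        others = registered - {project_hint}
        pool = [c for c in pool if c["project"] not in others]
    for c in pool:
        if c["project"] == project_hint:
            # Soft boost: reweights scores, independent of the hard filter.
            c["score"] *= match_boost
    return sorted(pool, key=lambda c: c["score"], reverse=True)[:top_k]
```

With the filter off, other projects' chunks merely lose the boost; with it on, they are excluded outright, which is the distinction the two knobs encode.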
## Testing
@@ -93,7 +109,10 @@ pytest
## Operations
- `scripts/atocore_client.py` provides a live API client for project refresh, project-state inspection, and retrieval-quality audits.
+- `scripts/retrieval_eval.py` runs the live retrieval/context harness, separates blocking failures from known content gaps, and stamps JSON output with target/build metadata.
+- `scripts/live_status.py` renders a compact read-only status report from `/health`, `/stats`, `/projects`, and `/admin/dashboard`; set `ATOCORE_AUTH_TOKEN` or `--auth-token` when those endpoints are gated.
- `docs/operations.md` captures the current operational priority order: retrieval quality, Wave 2 trusted-operational ingestion, AtoDrive scoping, and restore validation.
+- `DEV-LEDGER.md` is the fast-moving source of operational truth during active development; copy claims into docs only after checking the live service.
## Architecture Notes


@@ -1,6 +1,7 @@
-# AtoCore Current State (2026-04-22)
+# AtoCore - Current State (2026-04-24)

-Live deploy: `2712c5d` · Dalidou health: ok · Harness: 17/18 · Tests: 547 passing.
+Live deploy: `2b86543` · Dalidou health: ok · Harness: 18/20 with 1 known
+content gap and 1 current blocking project-bleed guard · Tests: 553 passing.

## V1-0 landed 2026-04-22
@@ -13,9 +14,8 @@ supersede) with Q-3 fail-open. Prod backfill ran cleanly — 31 legacy
active/superseded entities flagged `hand_authored=1`, follow-up dry-run
returned 0 remaining rows. Test count 533 → 547 (+14).

-R14 (P2, non-blocking): `POST /entities/{id}/promote` route fix translates
-the new `ValueError` into 400. Branch `claude/r14-promote-400` pending
-Codex review + squash-merge.
+R14 is closed: `POST /entities/{id}/promote` now translates the new
+caller-fixable V1-0 `ValueError` into HTTP 400.

**Next in the V1 track:** V1-A (minimal query slice + Q-6 killer-correctness
integration). Gated on pipeline soak (~2026-04-26) + 100+ active memory
@@ -65,10 +65,10 @@ Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs
| 7G | Re-extraction on prompt version bump | pending |
| 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |

-## Known gaps (honest)
+## Known gaps (honest, refreshed 2026-04-24)

1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
-2. **OpenClaw is capture-only, not context-grounded.** The plugin POSTs `/interactions` on `llm_output` but does NOT call `/context/build` on `before_agent_start`. OpenClaw's underlying agent runs blind. See Q2 below.
+2. **Project-scoped retrieval still needs deployment verification.** The April 24 audit reproduced cross-project competition on broad p05 prompts. The current branch adds registry-aware project filtering and a harness guard; verify after deploy.
-3. **Human interface (wiki) is thin and static.** 5 project cards + a "System" line. No dashboard for the autonomous activity. No per-memory detail page. See Q3/Q5.
+3. **Human interface is useful but not yet the V1 Human Mirror.** Wiki/dashboard pages exist, but the spec routes, deterministic mirror files, disputed markers, and curated annotations remain V1-D work.
-4. **Harness 17/18** — the `p04-constraints` fixture wants "Zerodur" but retrieval surfaces related-not-exact terms. Content gap, not a retrieval regression.
+4. **Harness known issue:** `p04-constraints` wants "Zerodur" and "1.2"; live retrieval surfaces related constraints but not those exact strings. Treat as content/state gap until fixed.
-5. **Two projects under-populated**: p05-interferometer (4 memories, 18 state) and atomizer-v2 (1 memory, 6 state). Batch re-extract with the new llm-0.6.0 prompt would help.
+5. **Formal docs lag the ledger during fast work.** Use `DEV-LEDGER.md` and `python scripts/live_status.py` for live truth, then copy verified claims into these docs.
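Known gap 4 refers to a harness fixture. The eval script's `load_fixtures` reads plain JSON with `name`, `budget`, `expect_present`, `expect_absent`, `known_issue`, and `notes` fields; a hypothetical entry for the `p04-constraints` gap (values illustrative, not the real fixture file) could look like:

```json
{
  "name": "p04-constraints",
  "project": "p04-gigabit",
  "expect_present": ["Zerodur", "1.2"],
  "expect_absent": [],
  "known_issue": true,
  "notes": "Content gap: exact strings not yet in the corpus; non-blocking."
}
```

Setting `known_issue` routes the failure into the non-blocking lane so it is reported as KNOWN rather than FAIL.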


@@ -70,9 +70,14 @@ read-only additive mode.
- Phase 6 - AtoDrive
- Phase 10 - Write-back
- Phase 11 - Multi-model
-- Phase 12 - Evaluation
- Phase 13 - Hardening

+### Partial / Operational Baseline
+
+- Phase 12 - Evaluation. The retrieval/context harness exists and runs
+against live Dalidou, but coverage is still intentionally small and
+should grow before this is complete in the intended sense.

### Engineering Layer Planning Sprint
**Status: complete.** All 8 architecture docs are drafted. The
@@ -126,11 +131,13 @@ This sits implicitly between Phase 8 (OpenClaw) and Phase 11
(multi-model). Memory-review and engineering-entity commands are
deferred from the shared client until their workflows are exercised.

-## What Is Real Today (updated 2026-04-16)
+## What Is Real Today (updated 2026-04-24)

-- canonical AtoCore runtime on Dalidou (`775960c`, deploy.sh verified)
+- canonical AtoCore runtime on Dalidou (`2b86543`, deploy.sh verified)
- 33,253 vectors across 6 registered projects
-- 234 captured interactions (192 claude-code, 38 openclaw, 4 test)
+- 950 captured interactions as of the 2026-04-24 live dashboard; refresh
+  exact live counts with `python scripts/live_status.py`
- 6 registered projects:
  - `p04-gigabit` (483 docs, 15 state entries)
  - `p05-interferometer` (109 docs, 18 state entries)
@@ -138,12 +145,15 @@ deferred from the shared client until their workflows are exercised.
  - `atomizer-v2` (568 docs, 5 state entries)
  - `abb-space` (6 state entries)
  - `atocore` (drive source, 47 state entries)
-- 110 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
+- 128 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
-- 84 active memories (31 project, 23 knowledge, 10 episodic, 8 adaptation, 7 preference, 5 identity)
+- 290 active memories and 0 candidate memories as of the 2026-04-24 live dashboard
- context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks
- query-relevance memory ranking with overlap-density scoring
-- retrieval eval harness: 18 fixtures, 17/18 passing on live
-- 303 tests passing
+- retrieval eval harness: 20 fixtures; current live has 18 pass, 1 known
+  content gap, and 1 blocking cross-project bleed guard targeted by the
+  current retrieval-scoping branch
+- 553 tests passing on the audit-improvements branch
- nightly pipeline: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → triage → **auto-promote/expire** → weekly synth/lint → **retrieval harness** → **pipeline summary to project state**
- Phase 10 operational: reinforcement-based auto-promotion (ref_count ≥ 3, confidence ≥ 0.7) + stale candidate expiry (14 days unreinforced)
- pipeline health visible in dashboard: interaction totals by client, pipeline last_run, harness results, triage stats
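The auto-promotion/expiry thresholds above (ref_count ≥ 3, confidence ≥ 0.7, 14-day stale expiry) amount to a small per-candidate decision rule. A hypothetical sketch, with the function name and signature invented and only the thresholds taken from the doc:

```python
from datetime import datetime, timedelta, timezone

PROMOTE_MIN_REFS = 3           # ref_count >= 3 (from the doc)
PROMOTE_MIN_CONFIDENCE = 0.7   # confidence >= 0.7 (from the doc)
EXPIRE_AFTER = timedelta(days=14)  # stale candidates expire after 14 days

def triage_candidate(ref_count, confidence, last_reinforced, now=None):
    """Illustrative promote/expire decision for one candidate memory."""
    now = now or datetime.now(timezone.utc)
    if ref_count >= PROMOTE_MIN_REFS and confidence >= PROMOTE_MIN_CONFIDENCE:
        return "promote"
    if now - last_reinforced > EXPIRE_AFTER:
        return "expire"
    return "keep"
```

The real nightly step likely batches this over the triage queue; the sketch only captures the thresholds as stated.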
@@ -190,9 +200,9 @@ where surfaces are disjoint, pauses when they collide.
| V1-E | Memory→entity graduation end-to-end + remaining Q-4 trust tests | pending V1-D (note: collides with memory extractor; pauses for multi-model triage work) |
| V1-F | F-5 detector generalization + route alias + O-1/O-2/O-3 operational + D-1/D-3/D-4 docs | finish line |

-R14 (P2, non-blocking): `POST /entities/{id}/promote` route returns 500
-on the new V1-0 `ValueError` instead of 400. Fix on branch
-`claude/r14-promote-400`, pending Codex review.
+R14 is closed: `POST /entities/{id}/promote` now translates
+caller-fixable V1-0 provenance validation failures into HTTP 400 instead
+of leaking as HTTP 500.

## Next

scripts/live_status.py (new file, 131 lines)

@@ -0,0 +1,131 @@
"""Render a compact live-status report from a running AtoCore instance.
This is intentionally read-only and stdlib-only so it can be used from a
fresh checkout, a cron job, or a Codex/Claude session without installing the
full app package. The output is meant to reduce docs drift: copy the report
into status docs only after it was generated from the live service.
"""
from __future__ import annotations
import argparse
import errno
import json
import os
import sys
import urllib.error
import urllib.request
from typing import Any
DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100").rstrip("/")
DEFAULT_TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30"))
DEFAULT_AUTH_TOKEN = os.environ.get("ATOCORE_AUTH_TOKEN", "").strip()
def request_json(base_url: str, path: str, timeout: int, auth_token: str = "") -> dict[str, Any]:
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    req = urllib.request.Request(f"{base_url}{path}", method="GET", headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        body = response.read().decode("utf-8")
        status = getattr(response, "status", None)
    payload = json.loads(body) if body.strip() else {}
    if not isinstance(payload, dict):
        payload = {"value": payload}
    if status is not None:
        payload["_http_status"] = status
    return payload


def collect_status(base_url: str, timeout: int, auth_token: str = "") -> dict[str, Any]:
    payload: dict[str, Any] = {"base_url": base_url}
    for name, path in {
        "health": "/health",
        "stats": "/stats",
        "projects": "/projects",
        "dashboard": "/admin/dashboard",
    }.items():
        try:
            payload[name] = request_json(base_url, path, timeout, auth_token)
        except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
            payload[name] = {"error": str(exc)}
    return payload


def render_markdown(status: dict[str, Any]) -> str:
    health = status.get("health", {})
    stats = status.get("stats", {})
    projects = status.get("projects", {}).get("projects", [])
    dashboard = status.get("dashboard", {})
    memories = dashboard.get("memories", {}) if isinstance(dashboard.get("memories"), dict) else {}
    project_state = dashboard.get("project_state", {}) if isinstance(dashboard.get("project_state"), dict) else {}
    interactions = dashboard.get("interactions", {}) if isinstance(dashboard.get("interactions"), dict) else {}
    pipeline = dashboard.get("pipeline", {}) if isinstance(dashboard.get("pipeline"), dict) else {}
    lines = [
        "# AtoCore Live Status",
        "",
        f"- base_url: `{status.get('base_url', '')}`",
        "- endpoint_http_statuses: "
        f"`health={health.get('_http_status', 'error')}, "
        f"stats={stats.get('_http_status', 'error')}, "
        f"projects={status.get('projects', {}).get('_http_status', 'error')}, "
        f"dashboard={dashboard.get('_http_status', 'error')}`",
        f"- service_status: `{health.get('status', 'unknown')}`",
        f"- code_version: `{health.get('code_version', health.get('version', 'unknown'))}`",
        f"- build_sha: `{health.get('build_sha', 'unknown')}`",
        f"- build_branch: `{health.get('build_branch', 'unknown')}`",
        f"- build_time: `{health.get('build_time', 'unknown')}`",
        f"- env: `{health.get('env', 'unknown')}`",
        f"- documents: `{stats.get('total_documents', 'unknown')}`",
        f"- chunks: `{stats.get('total_chunks', 'unknown')}`",
        f"- vectors: `{stats.get('total_vectors', health.get('vectors_count', 'unknown'))}`",
        f"- registered_projects: `{len(projects)}`",
        f"- active_memories: `{memories.get('active', 'unknown')}`",
        f"- candidate_memories: `{memories.get('candidates', 'unknown')}`",
        f"- interactions: `{interactions.get('total', 'unknown')}`",
        f"- project_state_entries: `{project_state.get('total', 'unknown')}`",
        f"- pipeline_last_run: `{pipeline.get('last_run', 'unknown')}`",
    ]
    if projects:
        lines.extend(["", "## Projects"])
        for project in projects:
            aliases = ", ".join(project.get("aliases", []))
            suffix = f" ({aliases})" if aliases else ""
            lines.append(f"- `{project.get('id', '')}`{suffix}")
    return "\n".join(lines) + "\n"


def main() -> int:
    parser = argparse.ArgumentParser(description="Render live AtoCore status")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
    parser.add_argument("--timeout", type=int, default=DEFAULT_TIMEOUT)
    parser.add_argument(
        "--auth-token",
        default=DEFAULT_AUTH_TOKEN,
        help="Bearer token; defaults to ATOCORE_AUTH_TOKEN when set",
    )
    parser.add_argument("--json", action="store_true", help="emit raw JSON")
    args = parser.parse_args()
    status = collect_status(args.base_url.rstrip("/"), args.timeout, args.auth_token)
    if args.json:
        output = json.dumps(status, indent=2, ensure_ascii=True) + "\n"
    else:
        output = render_markdown(status)
    try:
        sys.stdout.write(output)
    except BrokenPipeError:
        return 0
    except OSError as exc:
        if exc.errno in {errno.EINVAL, errno.EPIPE}:
            return 0
        raise
    return 0


if __name__ == "__main__":
    raise SystemExit(main())


@@ -44,6 +44,7 @@ import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass, field
+from datetime import datetime, timezone
from pathlib import Path

DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100")
@@ -52,6 +53,13 @@ DEFAULT_BUDGET = 3000
DEFAULT_FIXTURES = Path(__file__).parent / "retrieval_eval_fixtures.json"

+def request_json(base_url: str, path: str, timeout: int) -> dict:
+    req = urllib.request.Request(f"{base_url}{path}", method="GET")
+    with urllib.request.urlopen(req, timeout=timeout) as resp:
+        body = resp.read().decode("utf-8")
+        return json.loads(body) if body.strip() else {}

@dataclass
class Fixture:
    name: str
@@ -60,6 +68,7 @@ class Fixture:
    budget: int = DEFAULT_BUDGET
    expect_present: list[str] = field(default_factory=list)
    expect_absent: list[str] = field(default_factory=list)
+    known_issue: bool = False
    notes: str = ""
@@ -70,8 +79,13 @@ class FixtureResult:
    missing_present: list[str]
    unexpected_absent: list[str]
    total_chars: int
+    known_issue: bool = False
    error: str = ""

+    @property
+    def blocking_failure(self) -> bool:
+        return not self.ok and not self.known_issue
def load_fixtures(path: Path) -> list[Fixture]:
    data = json.loads(path.read_text(encoding="utf-8"))
@@ -89,6 +103,7 @@ def load_fixtures(path: Path) -> list[Fixture]:
                budget=int(raw.get("budget", DEFAULT_BUDGET)),
                expect_present=list(raw.get("expect_present", [])),
                expect_absent=list(raw.get("expect_absent", [])),
+                known_issue=bool(raw.get("known_issue", False)),
                notes=raw.get("notes", ""),
            )
        )
@@ -117,6 +132,7 @@ def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
            missing_present=list(fixture.expect_present),
            unexpected_absent=[],
            total_chars=0,
+            known_issue=fixture.known_issue,
            error=f"http_error: {exc}",
        )
@@ -129,16 +145,26 @@ def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
        missing_present=missing,
        unexpected_absent=unexpected,
        total_chars=len(formatted),
+        known_issue=fixture.known_issue,
    )
def print_human_report(results: list[FixtureResult], metadata: dict) -> None:
total = len(results)
passed = sum(1 for r in results if r.ok)
known = sum(1 for r in results if not r.ok and r.known_issue)
blocking = sum(1 for r in results if r.blocking_failure)
print(f"Retrieval eval: {passed}/{total} fixtures passed")
print(
"Target: "
f"{metadata.get('base_url', 'unknown')} "
f"build={metadata.get('health', {}).get('build_sha', 'unknown')}"
)
if known or blocking:
print(f"Blocking failures: {blocking} Known issues: {known}")
print()
for r in results:
marker = "PASS" if r.ok else ("KNOWN" if r.known_issue else "FAIL")
print(f"[{marker}] {r.fixture.name} project={r.fixture.project} chars={r.total_chars}")
if r.error:
print(f" error: {r.error}")
@@ -150,15 +176,21 @@ def print_human_report(results: list[FixtureResult]) -> None:
print(f" notes: {r.fixture.notes}")
def print_json_report(results: list[FixtureResult], metadata: dict) -> None:
payload = {
"generated_at": metadata.get("generated_at"),
"base_url": metadata.get("base_url"),
"health": metadata.get("health", {}),
"total": len(results),
"passed": sum(1 for r in results if r.ok),
"known_issues": sum(1 for r in results if not r.ok and r.known_issue),
"blocking_failures": sum(1 for r in results if r.blocking_failure),
"fixtures": [
{
"name": r.fixture.name,
"project": r.fixture.project,
"ok": r.ok,
"known_issue": r.known_issue,
"total_chars": r.total_chars,
"missing_present": r.missing_present,
"unexpected_absent": r.unexpected_absent,
@@ -179,15 +211,26 @@ def main() -> int:
parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
args = parser.parse_args()
base_url = args.base_url.rstrip("/")
try:
health = request_json(base_url, "/health", args.timeout)
except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
health = {"error": str(exc)}
metadata = {
"generated_at": datetime.now(timezone.utc).isoformat(),
"base_url": base_url,
"health": health,
}
fixtures = load_fixtures(args.fixtures)
results = [run_fixture(f, base_url, args.timeout) for f in fixtures]
if args.json:
print_json_report(results, metadata)
else:
print_human_report(results, metadata)
return 0 if not any(r.blocking_failure for r in results) else 1
if __name__ == "__main__":

View File

@@ -27,7 +27,8 @@
"expect_absent": [
"polisher suite"
],
"known_issue": true,
"notes": "Known content gap as of 2026-04-24: live retrieval surfaces related constraints but not the exact Zerodur / 1.2 strings. Keep visible, but do not make nightly harness red until the source/state gap is fixed."
},
{
"name": "p04-short-ambiguous",
@@ -80,6 +81,36 @@
],
"notes": "CGH is a core p05 concept. Should surface via chunks and possibly the architecture memory. Must not bleed p06 polisher-suite terms."
},
{
"name": "p05-broad-status-no-atomizer",
"project": "p05-interferometer",
"prompt": "current status",
"expect_present": [
"--- Trusted Project State ---",
"--- Project Memories ---",
"Zygo"
],
"expect_absent": [
"atomizer-v2",
"ATOMIZER_PODCAST_BRIEFING",
"[Source: atomizer-v2/",
"P04-GigaBIT-M1-KB-design"
],
"notes": "Regression guard for the April 24 audit finding: broad p05 status queries must not pull Atomizer/archive context into project-scoped packs."
},
{
"name": "p05-vendor-decision-no-archive-first",
"project": "p05-interferometer",
"prompt": "vendor selection decision",
"expect_present": [
"Selection-Decision"
],
"expect_absent": [
"[Source: atomizer-v2/",
"ATOMIZER_PODCAST_BRIEFING"
],
"notes": "Project-scoped decision query should stay inside p05 and prefer current decision/vendor material over unrelated project archives."
},
{
"name": "p06-suite-split",
"project": "p06-polisher",

View File

@@ -46,6 +46,8 @@ class Settings(BaseSettings):
# All multipliers default to the values used since Wave 1; tighten or
# loosen them via ATOCORE_* env vars without touching code.
rank_project_match_boost: float = 2.0
rank_project_scope_filter: bool = True
rank_project_scope_candidate_multiplier: int = 4
rank_query_token_step: float = 0.08
rank_query_token_cap: float = 1.32
rank_path_high_signal_boost: float = 1.18

View File

@@ -1,5 +1,6 @@
"""Retrieval: query to ranked chunks."""
import json
import re
import time
from dataclasses import dataclass
@@ -7,7 +8,7 @@ from dataclasses import dataclass
import atocore.config as _config
from atocore.models.database import get_connection
from atocore.observability.logger import get_logger
from atocore.projects.registry import RegisteredProject, get_registered_project, load_project_registry
from atocore.retrieval.embeddings import embed_query
from atocore.retrieval.vector_store import get_vector_store
@@ -83,6 +84,19 @@ def retrieve(
"""Retrieve the most relevant chunks for a query."""
top_k = top_k or _config.settings.context_top_k
start = time.time()
scoped_project = get_registered_project(project_hint) if project_hint else None
scope_filter_enabled = bool(scoped_project and _config.settings.rank_project_scope_filter)
registered_projects = None
query_top_k = top_k
if scope_filter_enabled:
query_top_k = max(
top_k,
top_k * max(1, _config.settings.rank_project_scope_candidate_multiplier),
)
try:
registered_projects = load_project_registry()
except Exception:
registered_projects = None
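The widening arithmetic above is simple but worth pinning down: with the multiplier this diff defaults to (4), a top_k of 8 asks the vector store for 32 candidates, so post-filter pruning can still fill the final pack; a multiplier at or below 1 degenerates to plain top_k. A standalone sketch (the function name is illustrative, not the module's):

```python
def widened_top_k(top_k: int, multiplier: int) -> int:
    # Over-fetch so scope filtering can drop candidates and still fill top_k.
    return max(top_k, top_k * max(1, multiplier))

print(widened_top_k(8, 4))  # → 32
print(widened_top_k(8, 0))  # → 8  (multiplier clamped to 1)
```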
query_embedding = embed_query(query)
store = get_vector_store()
@@ -101,11 +115,12 @@ def retrieve(
results = store.query(
query_embedding=query_embedding,
top_k=query_top_k,
where=where,
)
chunks = []
raw_result_count = len(results["ids"][0]) if results and results["ids"] and results["ids"][0] else 0
if results and results["ids"] and results["ids"][0]:
existing_ids = _existing_chunk_ids(results["ids"][0])
for i, chunk_id in enumerate(results["ids"][0]):
@@ -117,6 +132,13 @@ def retrieve(
meta = results["metadatas"][0][i] if results["metadatas"] else {}
content = results["documents"][0][i] if results["documents"] else ""
if scope_filter_enabled and not _is_allowed_for_project_scope(
scoped_project,
meta,
registered_projects,
):
continue
score *= _query_match_boost(query, meta)
score *= _path_signal_boost(meta)
if project_hint:
@@ -137,42 +159,139 @@ def retrieve(
duration_ms = int((time.time() - start) * 1000)
chunks.sort(key=lambda chunk: chunk.score, reverse=True)
post_filter_count = len(chunks)
chunks = chunks[:top_k]
log.info(
"retrieval_done",
query=query[:100],
top_k=top_k,
query_top_k=query_top_k,
raw_results_count=raw_result_count,
post_filter_count=post_filter_count,
results_count=len(chunks),
post_filter_dropped=max(0, raw_result_count - post_filter_count),
underfilled=bool(raw_result_count >= query_top_k and len(chunks) < top_k),
duration_ms=duration_ms,
)
return chunks
def _is_allowed_for_project_scope(
project: RegisteredProject,
metadata: dict,
registered_projects: list[RegisteredProject] | None = None,
) -> bool:
"""Return True when a chunk is target-project or not project-owned.
Project-hinted retrieval should not let one registered project's corpus
compete with another's. At the same time, unowned/global sources should
remain eligible because shared docs and cross-project references can be
genuinely useful. The registry gives us the boundary: if metadata matches
a registered project and it is not the requested project, filter it out.
"""
if _metadata_matches_project(project, metadata):
return True
if registered_projects is None:
try:
registered_projects = load_project_registry()
except Exception:
return True
for other in registered_projects:
if other.project_id == project.project_id:
continue
if _metadata_matches_project(other, metadata):
return False
return True
def _metadata_matches_project(project: RegisteredProject, metadata: dict) -> bool:
path = _metadata_source_path(metadata)
tags = _metadata_tags(metadata)
for term in _project_scope_terms(project):
if _path_matches_term(path, term) or term in tags:
return True
return False
def _project_scope_terms(project: RegisteredProject) -> set[str]:
terms = {project.project_id.lower()}
terms.update(alias.lower() for alias in project.aliases)
for source_ref in project.ingest_roots:
normalized = source_ref.subpath.replace("\\", "/").strip("/").lower()
if normalized:
terms.add(normalized)
terms.add(normalized.split("/")[-1])
return {term for term in terms if term}
def _metadata_searchable(metadata: dict) -> str:
return " ".join(
[
str(metadata.get("source_file", "")).replace("\\", "/").lower(),
str(metadata.get("title", "")).lower(),
str(metadata.get("heading_path", "")).lower(),
str(metadata.get("tags", "")).lower(),
]
)
def _metadata_source_path(metadata: dict) -> str:
return str(metadata.get("source_file", "")).replace("\\", "/").strip("/").lower()
def _metadata_tags(metadata: dict) -> set[str]:
raw_tags = metadata.get("tags", [])
if isinstance(raw_tags, (list, tuple, set)):
return {str(tag).strip().lower() for tag in raw_tags if str(tag).strip()}
if isinstance(raw_tags, str):
try:
parsed = json.loads(raw_tags)
except json.JSONDecodeError:
parsed = [raw_tags]
if isinstance(parsed, (list, tuple, set)):
return {str(tag).strip().lower() for tag in parsed if str(tag).strip()}
if isinstance(parsed, str) and parsed.strip():
return {parsed.strip().lower()}
return set()
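Tag metadata arrives in three shapes here: a real list, a JSON-encoded string (as string-valued metadata stores often persist), or a bare string. A standalone copy of the normalization above (renamed `parse_tags` for illustration):

```python
import json

def parse_tags(raw) -> set[str]:
    # Lists/tuples/sets: normalize each tag directly.
    if isinstance(raw, (list, tuple, set)):
        return {str(t).strip().lower() for t in raw if str(t).strip()}
    if isinstance(raw, str):
        # Try JSON first; fall back to treating the string as a single tag.
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = [raw]
        if isinstance(parsed, (list, tuple, set)):
            return {str(t).strip().lower() for t in parsed if str(t).strip()}
        if isinstance(parsed, str) and parsed.strip():
            return {parsed.strip().lower()}
    return set()

print(parse_tags('["p04-gigabit"]'))  # → {'p04-gigabit'}
print(parse_tags("p05"))              # → {'p05'}
print(parse_tags("[]"))               # → set()
```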
def _path_matches_term(path: str, term: str) -> bool:
normalized = term.replace("\\", "/").strip("/").lower()
if not path or not normalized:
return False
if "/" in normalized:
return path == normalized or path.startswith(f"{normalized}/")
return normalized in set(path.split("/"))
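The segment-based matching above is what keeps the alias `abb` from claiming `shared/cabbage-abbreviations.md`: a bare term must equal a whole path segment, while a slashed term must be the path itself or a path prefix. A standalone copy of the predicate:

```python
def path_matches_term(path: str, term: str) -> bool:
    normalized = term.replace("\\", "/").strip("/").lower()
    if not path or not normalized:
        return False
    if "/" in normalized:
        # Slashed terms: exact path or directory-prefix match.
        return path == normalized or path.startswith(f"{normalized}/")
    # Bare terms: must equal a full segment, never a substring.
    return normalized in set(path.split("/"))

print(path_matches_term("shared/cabbage-abbreviations.md", "abb"))                  # → False
print(path_matches_term("p05-interferometer/pkm/_index.md", "p05-interferometer"))  # → True
print(path_matches_term("p04-gigabit/pkm/_index.md", "p04-gigabit/pkm"))            # → True
```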
def _metadata_has_term(metadata: dict, term: str) -> bool:
normalized = term.replace("\\", "/").strip("/").lower()
if not normalized:
return False
if _path_matches_term(_metadata_source_path(metadata), normalized):
return True
if normalized in _metadata_tags(metadata):
return True
return re.search(
rf"(?<![a-z0-9]){re.escape(normalized)}(?![a-z0-9])",
_metadata_searchable(metadata),
) is not None
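The regex fallback above uses custom lookarounds rather than `\b`, so anything outside `[a-z0-9]` (spaces, hyphens, slashes, underscores) counts as a token boundary: `p04` still matches inside `p04-gigabit` but not inside `p04x`. A minimal illustration of that boundary (helper name is illustrative):

```python
import re

def has_term(text: str, term: str) -> bool:
    # (?<![a-z0-9]) / (?![a-z0-9]): the term must not touch other alphanumerics.
    return re.search(
        rf"(?<![a-z0-9]){re.escape(term)}(?![a-z0-9])", text
    ) is not None

print(has_term("title: gigabit m1 mirror", "gigabit"))  # → True
print(has_term("title: p04x-archive", "p04"))           # → False
print(has_term("tags: p04-gigabit", "p04"))             # → True
```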
def _project_match_boost(project_hint: str, metadata: dict) -> float:
"""Return a project-aware relevance multiplier for raw retrieval."""
hint_lower = project_hint.strip().lower()
if not hint_lower:
return 1.0
project = get_registered_project(project_hint)
candidate_names = _project_scope_terms(project) if project is not None else {hint_lower}
for candidate in candidate_names:
if _metadata_has_term(metadata, candidate):
return _config.settings.rank_project_match_boost
return 1.0

View File

@@ -46,6 +46,8 @@ def test_settings_keep_legacy_db_path_when_present(tmp_path, monkeypatch):
def test_ranking_weights_are_tunable_via_env(monkeypatch):
monkeypatch.setenv("ATOCORE_RANK_PROJECT_MATCH_BOOST", "3.5")
monkeypatch.setenv("ATOCORE_RANK_PROJECT_SCOPE_FILTER", "false")
monkeypatch.setenv("ATOCORE_RANK_PROJECT_SCOPE_CANDIDATE_MULTIPLIER", "6")
monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_STEP", "0.12")
monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_CAP", "1.5")
monkeypatch.setenv("ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST", "1.25")
@@ -54,6 +56,8 @@ def test_ranking_weights_are_tunable_via_env(monkeypatch):
settings = config.Settings()
assert settings.rank_project_match_boost == 3.5
assert settings.rank_project_scope_filter is False
assert settings.rank_project_scope_candidate_multiplier == 6
assert settings.rank_query_token_step == 0.12
assert settings.rank_query_token_cap == 1.5
assert settings.rank_path_high_signal_boost == 1.25

View File

@@ -70,8 +70,28 @@ def test_retrieve_skips_stale_vector_entries(tmp_data_dir, sample_markdown, monk
def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
target_project = type(
"Project",
(),
{
"project_id": "p04-gigabit",
"aliases": ("p04", "gigabit"),
"ingest_roots": (),
},
)()
other_project = type(
"Project",
(),
{
"project_id": "p05-interferometer",
"aliases": ("p05", "interferometer"),
"ingest_roots": (),
},
)()
class FakeStore:
def query(self, query_embedding, top_k=10, where=None):
assert top_k == 8
return {
"ids": [["chunk-a", "chunk-b"]],
"documents": [["project doc", "other doc"]],
@@ -102,7 +122,21 @@ def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
)
monkeypatch.setattr(
"atocore.retrieval.retriever.get_registered_project",
lambda project_name: target_project,
)
monkeypatch.setattr(
"atocore.retrieval.retriever.load_project_registry",
lambda: [target_project, other_project],
)
results = retrieve("mirror architecture", top_k=2, project_hint="p04")
assert len(results) == 1
assert results[0].chunk_id == "chunk-a"
def test_retrieve_project_scope_allows_unowned_global_chunks(monkeypatch):
target_project = type(
"Project",
(),
{
@@ -110,14 +144,286 @@ def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
"aliases": ("p04", "gigabit"),
"ingest_roots": (),
},
)()
class FakeStore:
def query(self, query_embedding, top_k=10, where=None):
return {
"ids": [["chunk-a", "chunk-global"]],
"documents": [["project doc", "global doc"]],
"metadatas": [[
{
"heading_path": "Overview",
"source_file": "p04-gigabit/pkm/_index.md",
"tags": '["p04-gigabit"]',
"title": "P04",
"document_id": "doc-a",
},
{
"heading_path": "Overview",
"source_file": "shared/engineering-rules.md",
"tags": "[]",
"title": "Shared engineering rules",
"document_id": "doc-global",
},
]],
"distances": [[0.2, 0.21]],
}
monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
monkeypatch.setattr(
"atocore.retrieval.retriever._existing_chunk_ids",
lambda chunk_ids: set(chunk_ids),
)
monkeypatch.setattr(
"atocore.retrieval.retriever.get_registered_project",
lambda project_name: target_project,
)
monkeypatch.setattr(
"atocore.retrieval.retriever.load_project_registry",
lambda: [target_project],
)
results = retrieve("mirror architecture", top_k=2, project_hint="p04")
assert [r.chunk_id for r in results] == ["chunk-a", "chunk-global"]
def test_retrieve_project_scope_filter_can_be_disabled(monkeypatch):
target_project = type(
"Project",
(),
{
"project_id": "p04-gigabit",
"aliases": ("p04", "gigabit"),
"ingest_roots": (),
},
)()
other_project = type(
"Project",
(),
{
"project_id": "p05-interferometer",
"aliases": ("p05", "interferometer"),
"ingest_roots": (),
},
)()
class FakeStore:
def query(self, query_embedding, top_k=10, where=None):
assert top_k == 2
return {
"ids": [["chunk-a", "chunk-b"]],
"documents": [["project doc", "other project doc"]],
"metadatas": [[
{
"heading_path": "Overview",
"source_file": "p04-gigabit/pkm/_index.md",
"tags": '["p04-gigabit"]',
"title": "P04",
"document_id": "doc-a",
},
{
"heading_path": "Overview",
"source_file": "p05-interferometer/pkm/_index.md",
"tags": '["p05-interferometer"]',
"title": "P05",
"document_id": "doc-b",
},
]],
"distances": [[0.2, 0.2]],
}
monkeypatch.setattr("atocore.config.settings.rank_project_scope_filter", False)
monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
monkeypatch.setattr(
"atocore.retrieval.retriever._existing_chunk_ids",
lambda chunk_ids: set(chunk_ids),
)
monkeypatch.setattr(
"atocore.retrieval.retriever.get_registered_project",
lambda project_name: target_project,
)
monkeypatch.setattr(
"atocore.retrieval.retriever.load_project_registry",
lambda: [target_project, other_project],
)
results = retrieve("mirror architecture", top_k=2, project_hint="p04")
assert {r.chunk_id for r in results} == {"chunk-a", "chunk-b"}
def test_retrieve_project_scope_ignores_title_for_ownership(monkeypatch):
target_project = type(
"Project",
(),
{
"project_id": "p04-gigabit",
"aliases": ("p04", "gigabit"),
"ingest_roots": (),
},
)()
other_project = type(
"Project",
(),
{
"project_id": "p06-polisher",
"aliases": ("p06", "polisher", "p11"),
"ingest_roots": (),
},
)()
class FakeStore:
def query(self, query_embedding, top_k=10, where=None):
return {
"ids": [["chunk-target", "chunk-poisoned-title"]],
"documents": [["p04 doc", "p06 doc"]],
"metadatas": [[
{
"heading_path": "Overview",
"source_file": "p04-gigabit/pkm/_index.md",
"tags": '["p04-gigabit"]',
"title": "P04",
"document_id": "doc-a",
},
{
"heading_path": "Overview",
"source_file": "p06-polisher/pkm/architecture.md",
"tags": '["p06-polisher"]',
"title": "GigaBIT M1 mirror lessons",
"document_id": "doc-b",
},
]],
"distances": [[0.2, 0.19]],
}
monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
monkeypatch.setattr(
"atocore.retrieval.retriever._existing_chunk_ids",
lambda chunk_ids: set(chunk_ids),
)
monkeypatch.setattr(
"atocore.retrieval.retriever.get_registered_project",
lambda project_name: target_project,
)
monkeypatch.setattr(
"atocore.retrieval.retriever.load_project_registry",
lambda: [target_project, other_project],
)
results = retrieve("mirror architecture", top_k=2, project_hint="p04")
assert [r.chunk_id for r in results] == ["chunk-target"]
def test_retrieve_project_scope_uses_path_segments_not_substrings(monkeypatch):
target_project = type(
"Project",
(),
{
"project_id": "p05-interferometer",
"aliases": ("p05", "interferometer"),
"ingest_roots": (),
},
)()
abb_project = type(
"Project",
(),
{
"project_id": "abb-space",
"aliases": ("abb",),
"ingest_roots": (),
},
)()
class FakeStore:
def query(self, query_embedding, top_k=10, where=None):
return {
"ids": [["chunk-target", "chunk-global"]],
"documents": [["p05 doc", "global doc"]],
"metadatas": [[
{
"heading_path": "Overview",
"source_file": "p05-interferometer/pkm/_index.md",
"tags": '["p05-interferometer"]',
"title": "P05",
"document_id": "doc-a",
},
{
"heading_path": "Abbreviation notes",
"source_file": "shared/cabbage-abbreviations.md",
"tags": "[]",
"title": "ABB-style abbreviations",
"document_id": "doc-global",
},
]],
"distances": [[0.2, 0.21]],
}
monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
monkeypatch.setattr(
"atocore.retrieval.retriever._existing_chunk_ids",
lambda chunk_ids: set(chunk_ids),
)
monkeypatch.setattr(
"atocore.retrieval.retriever.get_registered_project",
lambda project_name: target_project,
)
monkeypatch.setattr(
"atocore.retrieval.retriever.load_project_registry",
lambda: [target_project, abb_project],
)
results = retrieve("abbreviations", top_k=2, project_hint="p05")
assert [r.chunk_id for r in results] == ["chunk-target", "chunk-global"]
def test_retrieve_unknown_project_hint_does_not_widen_or_filter(monkeypatch):
class FakeStore:
def query(self, query_embedding, top_k=10, where=None):
assert top_k == 2
return {
"ids": [["chunk-a", "chunk-b"]],
"documents": [["doc a", "doc b"]],
"metadatas": [[
{
"heading_path": "Overview",
"source_file": "project-a/file.md",
"tags": "[]",
"title": "A",
"document_id": "doc-a",
},
{
"heading_path": "Overview",
"source_file": "project-b/file.md",
"tags": "[]",
"title": "B",
"document_id": "doc-b",
},
]],
"distances": [[0.2, 0.21]],
}
monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
monkeypatch.setattr(
"atocore.retrieval.retriever._existing_chunk_ids",
lambda chunk_ids: set(chunk_ids),
)
monkeypatch.setattr(
"atocore.retrieval.retriever.get_registered_project",
lambda project_name: None,
)
results = retrieve("overview", top_k=2, project_hint="unknown-project")
assert [r.chunk_id for r in results] == ["chunk-a", "chunk-b"]
def test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths(monkeypatch):