# Compare commits: `claude/v1-` ... `codex/p04-`

18 commits
| Author | SHA1 | Date |
|---|---|---|
| | d3de9f67ea | |
| | 0fc6705173 | |
| | a87d9845a8 | |
| | 4744c69d10 | |
| | 867a1abfaa | |
| | 05c11fd4fb | |
| | ce6ffdbb63 | |
| | c03022d864 | |
| | f44a211497 | |
| | c7212900b0 | |
| | c53e61eb67 | |
| | 2b86543e6a | |
| | 0989fed9ee | |
| | 98bc848184 | |
| | 69176d11c5 | |
| | 4ca81e9b36 | |
| | 22a37a7241 | |
| | 2712c5d2d0 | |
**`.gitignore`** (vendored) · 5 lines changed

````diff
@@ -14,3 +14,8 @@ venv/
 .claude/*
 !.claude/commands/
 !.claude/commands/**
+
+# Editor / IDE state — user-specific, not project config
+.obsidian/
+.vscode/
+.idea/
````

**`DEV-LEDGER.md`**
````diff
@@ -6,23 +6,26 @@
 
 ## Orientation
 
-- **live_sha** (Dalidou `/health` build_sha): `775960c` (verified 2026-04-16 via /health, build_time 2026-04-16T17:59:30Z)
+- **live_sha** (Dalidou `/health` build_sha): `a87d984` (verified 2026-04-25T00:37Z post memory-ranking stabilization deploy; status=ok)
-- **last_updated**: 2026-04-18 by Claude (Phase 7A — Memory Consolidation "sleep cycle" V1 on branch, not yet deployed)
+- **last_updated**: 2026-04-25 by Codex (project_id metadata backfill complete; memory ranking stabilized)
-- **main_tip**: `999788b`
+- **main_tip**: `a87d984`
-- **test_count**: 533 (prior 521 + 12 new wikilink/redlink tests)
+- **test_count**: 571 on `main`
-- **harness**: `17/18 PASS` on live Dalidou (p04-constraints expects "Zerodur" — retrieval content gap, not regression)
+- **harness**: `19/20 PASS` on live Dalidou, 0 blocking failures, 1 known content gap (`p04-constraints`)
 - **vectors**: 33,253
-- **active_memories**: 84 (31 project, 23 knowledge, 10 episodic, 8 adaptation, 7 preference, 5 identity)
+- **active_memories**: 290 (`/admin/dashboard` 2026-04-24; note integrity panel reports a separate active_memory_count=951 and needs reconciliation)
-- **candidate_memories**: 2
+- **candidate_memories**: 0 (triage queue drained)
-- **interactions**: 234 total (192 claude-code, 38 openclaw, 4 test)
+- **interactions**: 951 (`/admin/dashboard` 2026-04-24)
 - **registered_projects**: atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08)
-- **project_state_entries**: 110 total (atocore=47, p06=19, p05=18, p04=15, abb=6, atomizer=5)
+- **project_state_entries**: 128 across registered projects (`/admin/dashboard` 2026-04-24)
-- **entities**: 35 (engineering knowledge graph, Layer 2)
+- **entities**: 66 (up from 35 — V1-0 backfill + ongoing work; 0 open conflicts)
 - **off_host_backup**: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron, verified
 - **nightly_pipeline**: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → auto-triage → **auto-promote/expire (NEW)** → weekly synth/lint Sundays → **retrieval harness (NEW)** → **pipeline summary (NEW)**
 - **capture_clients**: claude-code (Stop hook + cwd project inference), openclaw (before_agent_start + llm_output plugin, verified live)
 - **wiki**: http://dalidou:8100/wiki (browse), /wiki/projects/{id}, /wiki/entities/{id}, /wiki/search
 - **dashboard**: http://dalidou:8100/admin/dashboard (now shows pipeline health, interaction totals by client, all registered projects)
+- **active_track**: Engineering V1 Completion (started 2026-04-22). V1-0 landed (`2712c5d`). V1-A density gate CLEARED (784 active ≫ 100 target as of 2026-04-23). V1-A soak gate at day 5/~7 (F4 first run 2026-04-19; nightly clean 2026-04-19 through 2026-04-23; failures confined to the known p04-constraints content gap). Plan: `docs/plans/engineering-v1-completion-plan.md`. Resume map: `docs/plans/v1-resume-state.md`.
+- **last_nightly_pipeline**: `2026-04-23T03:00:20Z` — harness 17/18, triage promoted=3 rejected=7 human=0, dedup 7 clusters (1 tier1 + 6 tier2 auto-merged), graduation 30-skipped 0-graduated 0-errors, auto-triage drained the queue (0 new candidates 2026-04-22T00:52Z run)
+- **open_branches**: `codex/retrieval-memory-ranking-stabilization` pushed and fast-forwarded into `main` as `a87d984`; no active unmerged code branch for this tranche.
 
 ## Active Plan
 
````
````diff
@@ -143,9 +146,12 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha
 | R11 | Codex | P2 | src/atocore/api/routes.py:773-845 | `POST /admin/extract-batch` still accepts `mode="llm"` inside the container and returns a successful 0-candidate result instead of surfacing that host-only LLM extraction is unavailable from this runtime. That is a misleading API contract for operators. | fixed | Claude | 2026-04-12 | (pending) |
 | R12 | Codex | P2 | scripts/batch_llm_extract_live.py:39-190 | The host-side extractor duplicates the LLM system prompt and JSON parsing logic from `src/atocore/memory/extractor_llm.py`. It works today, but this is now a prompt/parser drift risk across the container and host implementations. | fixed | Claude | 2026-04-12 | (pending) |
 | R13 | Codex | P2 | DEV-LEDGER.md:12 | The new `286 passing` test-count claim is not reproducibly auditable from the current audit environments: neither Dalidou nor the clean worktree has `pytest` available. The claim may be true in Claude's dev shell, but it remains unverified in this audit. | fixed | Claude | 2026-04-12 | (pending) |
+| R14 | Codex | P2 | src/atocore/api/routes.py (POST /entities/{id}/promote) | The HTTP `POST /entities/{id}/promote` route does not translate the new service-layer `ValueError("source_refs required: cannot promote a candidate with no provenance...")` into a 400. A legacy no-provenance candidate promoted through the API currently surfaces as a 500. Does not block V1-0 acceptance; tidy in a follow-up. | fixed | Claude | 2026-04-22 | 0989fed |
 
 ## Recent Decisions
 
+- **2026-04-22** **Wiki reorg reframed as read-only operator orientation.** The earlier `docs/plans/wiki-reorg-plan.md` (5 additions W-1…W-5 adding `/wiki/interactions`, `/wiki/memories`, project-page restructure, recent feed, topnav exposure) is **superseded** and marked as such in-tree. Successor is `docs/plans/operator-orientation-plan.md` in the `ATOCore-clean` workspace: no `/wiki` surface, no Human Mirror implementation, no new API routes, no schema changes, no new source of truth. First deliverables are docs-only: `docs/where-things-live.md` (operator map of source/staged/machine/registry/trusted-state/memories/interactions/logs/backups with explicit edit-vs-don't-edit boundaries) and `docs/operator-home.md` (short daily starting page indexing those docs + a 4-command read-only orientation sequence). README and operations.md link both. Optional future work (`project-overview` and `memory-candidates` CLI helpers over existing endpoints) is gated on the docs proving useful in practice. *Proposed by:* Antoine. *Drafted by:* Claude. *Pending Codex review.*
+- **2026-04-22** **V1-0 done: approved, merged, deployed, prod backfilled.** Codex pulled `f16cd52`, re-ran the two original probes (both pass), re-ran the three targeted regression suites (all pass). Squash-merged to main as `2712c5d`. Dalidou deployed via canonical deploy script; `/health` reports build_sha=`2712c5d2d03cb2a6af38b559664afd1c4cd0e050`, status=ok. Validated backup snapshot taken at `/srv/storage/atocore/backups/snapshots/20260422T190624Z` before backfill. Prod backfill: `--dry-run` found 31 active/superseded entities with no provenance; list reviewed and sane; live run updated 31 rows via the default `hand_authored=1` flag path; follow-up dry-run returned 0 rows remaining. Residual logged as R14 (P2): `POST /entities/{id}/promote` HTTP route doesn't translate the new service-layer `ValueError` into a 400 — a legacy bad candidate promoted via the API returns 500 instead. Does not block V1-0 acceptance. V1-0 closed. Next: V1-A (Q-001 subsystem-scoped variant + Q-6 integration). V1-A holds until the soak window ends ~2026-04-26 and the 100-memory density target is hit. *Approved + landed by:* Codex. *Ratified by:* Antoine.
 - **2026-04-22** **Engineering V1 Completion Plan — Codex sign-off (third round)**. Codex's third-round audit closed the remaining five open questions with concrete resolutions, patched inline in `docs/plans/engineering-v1-completion-plan.md`: (1) F-7 row rewritten with ground truth — schema + preserve-original + test coverage already exist (`graduated_to_entity_id` at `database.py:143-146`, `graduated` status in memory service, promote hook at `service.py:354-356,389-451`, tests at `test_engineering_v1_phase5.py:67-90`); **real gaps** are missing direct `POST /memory/{id}/graduate` route and spec's `knowledge→Fact` mismatch (no `fact` entity type exists; reconcile to `parameter` or similar); V1-E 2 → **3–4 days**; (2) Q-5 determinism reframed — don't stabilize the call to `datetime.now()`, inject regenerated timestamp + checksum as renderer inputs, remove DB iteration ordering dependencies; V1-D scope updated; (3) `project` vs `project_id` — doc note only, no rename, resolved; (4) total estimate 16.5–17.5 → **17.5–19.5 focused days** with calendar buffer on top; (5) "Minions" must not be canonized in D-3 release notes — neutral wording ("queued background processing / async workers") only. **Agreement reached**: Claude + Codex + Antoine aligned. V1-0 is ready to start once the current pipeline soak window ends (~2026-04-26) and the 100-memory density target is hit. *Patched by:* Codex. *Signed off by:* Codex ("with those edits, I'd sign off on the five questions"). *Accepted by:* Antoine. *Executor (V1-0 onwards):* Claude.
 - **2026-04-22** **Engineering V1 Completion Plan revised per Codex second-round file-level audit** — three findings folded in, all with exact file:line refs from Codex: (1) F-1 downgraded from ✅ to 🟡 — `extractor_version` and `canonical_home` missing from `Entity` dataclass and `entities` table per `engineering-v1-acceptance.md:45`; V1-0 scope now adds both fields via additive migration + doc note that `project` IS `project_id` per "fields equivalent to" spec wording; (2) F-2 replaced with ground-truth per-query status: 9 of 20 v1-required queries done (Q-004/Q-005/Q-006/Q-008/Q-009/Q-011/Q-013/Q-016/Q-017), 1 partial (Q-001 needs subsystem-scoped variant), 10 missing (Q-002/003/007/010/012/014/018/019/020); V1-A scope shrank to Q-001 shape fix + Q-6 integration (pillar queries already implemented); V1-C closes the 8 remaining new queries + Q-020 deferred to V1-D; (3) F-5 reframed — generic `conflicts` + `conflict_members` schema already present at `database.py:190`, no migration needed; divergence is detector body (per-type dispatch needs generalization) + routes (`/admin/conflicts/*` needs `/conflicts/*` alias). Total revised to 16.5–17.5 days, ~60 tests. Plan: `docs/plans/engineering-v1-completion-plan.md` at commit `ce3a878` (Codex pulled clean). Three of Codex's eight open questions now answered; remaining: F-7 graduation depth, mirror determinism, `project` rename question, velocity calibration, minions naming. *Proposed by:* Claude. *Reviewed by:* Codex (two rounds).
 - **2026-04-22** **Engineering V1 Completion Plan revised per Codex first-round review** — original six-phase order (queries → ingest → mirror → graduation → provenance → ops) rejected by Codex as backward: provenance-at-write (F-8) and conflict-detection hooks (F-5 minimal) must precede any phase that writes active entities. Revised to seven phases: V1-0 write-time invariants (F-8 + F-5 hooks + F-1 audit) as hard prerequisite, V1-A minimum query slice proving the model, V1-B ingest, V1-C full query catalog, V1-D mirror, V1-E graduation, V1-F full F-5 spec + ops + docs. Also softened "parallel with Now list" — real collision points listed explicitly; schedule shifted ~4 weeks to reflect that V1-0 cannot start during pipeline soak. Withdrew the "50–70% built" global framing in favor of the per-criterion gap table. Workspace sync note added: Codex's Playground workspace can't see the plan file; canonical dev tree is Windows `C:\Users\antoi\ATOCore`. Plan: `docs/plans/engineering-v1-completion-plan.md`. Awaiting Codex file-level audit once workspace syncs. *Proposed by:* Claude. *First-round review by:* Codex.
````
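R14's fix is a standard FastAPI pattern: translate a caller-fixable service-layer `ValueError` into a 400 at the route boundary while leaving the 404 path intact. A minimal sketch with hypothetical route and service names; the real handler lives in `src/atocore/api/routes.py` and is not shown in this compare:

```python
from fastapi import FastAPI, HTTPException

app = FastAPI()

def promote_entity(entity_id: str) -> bool:
    """Stand-in for the real service call: returns False when the entity is
    missing or not a candidate, raises ValueError on provenance violations."""
    raise ValueError("source_refs required: cannot promote a candidate with no provenance")

@app.post("/entities/{entity_id}/promote")
def promote(entity_id: str) -> dict:
    try:
        promoted = promote_entity(entity_id)
    except ValueError as exc:
        # R14: caller-fixable validation failure now surfaces as HTTP 400.
        raise HTTPException(status_code=400, detail=str(exc))
    if not promoted:
        # Existing behavior preserved: not-found / not-a-candidate stays 404.
        raise HTTPException(status_code=404, detail="entity not found or not a candidate")
    return {"entity_id": entity_id, "status": "active"}
```

This matches Codex's review note below: `promote_entity()` returning False keeps the 404 behavior, while validation failures become 400.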
````diff
@@ -164,6 +170,26 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha
 
 ## Session Log
 
+- **2026-04-25 Codex (project_id backfill + retrieval stabilization closed)** Merged `codex/project-id-metadata-retrieval` into `main` (`867a1ab`) and deployed to Dalidou. Took Chroma-inclusive backup `/srv/storage/atocore/backups/snapshots/20260424T154358Z`, then ran `scripts/backfill_chunk_project_ids.py` per project; the populated projects `p04-gigabit`, `p05-interferometer`, `p06-polisher`, `atomizer-v2`, and `atocore` all applied cleanly, 33,253 vectors total, with 0 missing/malformed and an immediate final dry-run showing 33,253 already tagged / 0 updates. Post-backfill harness exposed p06 memory-ranking misses (`Tailscale`, `encoder`), so Codex shipped `4744c69` then `a87d984 fix(memory): widen query-time context candidates`. Full local suite: 571 passed. Live `/health` reports `a87d9845a8c34395a02890f0cf22aa7a46afaf62`, vectors=33,253, sources_ready=true. Live retrieval harness: 19/20, 0 blocking failures, 1 known issue (`p04-constraints` missing `Zerodur` / `1.2`). A repeat backfill dry-run after the code-only stabilization deploy was aborted after the one-off container ran too long; the live service stayed healthy and the earlier post-apply idempotency result remains the migration acceptance record. Dalidou HTTP push credentials are still not configured; this session pushed through the Windows credential path.
+
+- **2026-04-24 Codex (retrieval boundary deployed + project_id metadata tranche)** Merged `codex/audit-improvements-foundation` to `main` as `f44a211` and pushed to Dalidou Gitea. Took pre-deploy runtime backup `/srv/storage/atocore/backups/snapshots/20260424T144810Z` (DB + registry, no Chroma). Deployed via `papa@dalidou` canonical `deploy/dalidou/deploy.sh`; live `/health` reports build_sha `f44a2114970008a7eec4e7fc2860c8f072914e38`, build_time `2026-04-24T14:48:44Z`, status ok. Post-deploy retrieval harness: 20 fixtures, 19 pass, 0 blocking failures, 1 known issue (`p04-constraints`). The former blocker `p05-broad-status-no-atomizer` now passes. Manual p05 `context-build "current status"` spot check shows no p04/Atomizer source bleed in retrieved chunks. Started follow-up branch `codex/project-id-metadata-retrieval`: registered-project ingestion now writes explicit `project_id` into DB chunk metadata and Chroma vector metadata; retrieval prefers exact `project_id` when present and keeps path/tag matching as legacy fallback; added dry-run-by-default `scripts/backfill_chunk_project_ids.py` to backfill SQLite + Chroma metadata; added tests for project-id ingestion, registered refresh propagation, exact project-id retrieval, and collision fallback. Verified targeted suite (`test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 36 passed. Verified full suite: 556 passed in 72.44s. Branch not merged or deployed yet.
+
+- **2026-04-24 Codex (project_id audit response)** Applied independent-audit fixes on `codex/project-id-metadata-retrieval`. Closed the nightly `/ingest/sources` clobber risk by adding registry-level `derive_project_id_for_path()` and making unscoped `ingest_file()` derive ownership from registered ingest roots when possible; `refresh_registered_project()` still passes the canonical project id directly. Changed retrieval so empty `project_id` falls through to legacy path/tag ownership instead of short-circuiting as unowned. Hardened `scripts/backfill_chunk_project_ids.py`: `--apply` now requires `--chroma-snapshot-confirmed`, runs Chroma metadata updates before SQLite writes, batches updates, skips/reports missing vectors, skips/reports malformed metadata, reports already-tagged rows, and turns missing ingestion tables into a JSON `db_warning` instead of a traceback. Added tests for auto-derive ingestion, empty-project fallback, ingest-root overlap rejection, and backfill dry-run/apply/snapshot/missing-vector/malformed cases. Verified targeted suite (`test_backfill_chunk_project_ids.py`, `test_ingestion.py`, `test_project_registry.py`, `test_retrieval.py`): 45 passed. Verified full suite: 565 passed in 73.16s. Local dry-run on empty/default data returns 0 updates with `db_warning` rather than crashing. Branch still not merged/deployed.
+
+- **2026-04-24 Codex (project_id final hardening before merge)** Applied the final independent-review P2s on `codex/project-id-metadata-retrieval`: `ingest_file()` still fails open when project-id derivation fails, but now emits `project_id_derivation_failed` with file path and error; retrieval now catches registry failures both at project-scope resolution and the soft project-match boost path, logs warnings, and serves unscoped rather than raising. Added regression tests for both fail-open paths. Verified targeted suite (`test_ingestion.py`, `test_retrieval.py`, `test_backfill_chunk_project_ids.py`, `test_project_registry.py`): 47 passed. Verified full suite: 567 passed in 79.66s. Branch still not merged/deployed.
+
+- **2026-04-24 Codex (audit improvements foundation)** Started implementation of the audit recommendations on branch `codex/audit-improvements-foundation` from `origin/main@c53e61e`. First tranche: registry-aware project-scoped retrieval filtering (`ATOCORE_RANK_PROJECT_SCOPE_FILTER`, widened candidate pull before filtering), eval harness known-issue lane, two p05 project-bleed fixtures, `scripts/live_status.py`, README/current-state/master-plan status refresh. Verified `pytest -q`: 550 passed in 67.11s. Live retrieval harness against production (branch not yet deployed): 20 fixtures, 18 pass, 1 known issue (`p04-constraints` Zerodur/1.2 content gap), 1 blocking guard (`p05-broad-status-no-atomizer`) still failing because production has not yet deployed the retrieval filter and currently pulls `P04-GigaBIT-M1-KB-design` into broad p05 status context. Live dashboard refresh: health ok, build `2b86543`, docs 1748, chunks/vectors 33253, interactions 948, active memories 289, candidates 0, project_state total 128. Noted count discrepancy: dashboard memories.active=289 while integrity active_memory_count=951; schedule reconciliation in a follow-up.
+
+- **2026-04-24 Codex (independent-audit hardening)** Applied the Opus independent audit's fast follow-ups before merge/deploy. Closed the two P1s by making project-scope ownership path/tag-based only, adding path-segment/tag-exact matching to avoid short-alias substring collisions, and keeping title/heading text out of provenance decisions. Added regression tests for title poisoning, substring collision, and unknown-project fallback. Added retrieval log fields `raw_results_count`, `post_filter_count`, `post_filter_dropped`, and `underfilled`. Added retrieval-eval run metadata (`generated_at`, `base_url`, `/health`) and `live_status.py` auth-token/status support. README now documents the ranking knobs and clarifies that the hard scope filter and soft project match boost are separate controls. Verified `pytest -q`: 553 passed in 66.07s. Live production remains in the expected pre-deploy state: 20 fixtures, 18 pass, 1 known content gap, 1 blocking p05 bleed guard. Latest live dashboard: build `2b86543`, docs 1748, chunks/vectors 33253, interactions 950, active memories 290, candidates 0, project_state total 128.
+
+- **2026-04-23 Codex + Claude (R14 closed)** Codex reviewed `claude/r14-promote-400` at `3888db9`, no findings: "The route change is narrowly scoped: `promote_entity()` still returns False for not-found/not-candidate cases, so the existing 404 behavior remains intact, while caller-fixable validation failures now surface as 400." Ran `pytest tests/test_v1_0_write_invariants.py -q` from an isolated worktree: 15 passed in 1.91s. Claude squash-merged to main as `0989fed`, followed by ledger close-out `2b86543`, then deployed via canonical script. Dalidou `/health` reports build_sha=`2b86543e6ad26011b39a44509cc8df3809725171`, build_time `2026-04-23T15:20:53Z`, status=ok. R14 closed. Orientation refreshed earlier this session also reflected the V1-A gate status: **density gate CLEARED** (784 active memories vs 100 target — density batch-extract ran between 2026-04-22 and 2026-04-23 and more than crushed the gate), **soak gate at day 5 of ~7** (F4 first run 2026-04-19; nightly clean 2026-04-19 through 2026-04-23; only chronic failure is the known p04-constraints "Zerodur" content gap). V1-A branches from a clean V1-0 baseline as soon as the soak is called done.
+
+- **2026-04-22 Codex + Antoine (V1-0 closed)** Codex approved `f16cd52` after re-running both original probes (legacy-candidate promote + supersede hook — both correct) and the three targeted regression suites (`test_v1_0_write_invariants.py`, `test_engineering_v1_phase5.py`, `test_inbox_crossproject.py` — all pass). Squash-merged to main as `2712c5d` ("feat(engineering): enforce V1-0 write invariants"). Deployed to Dalidou via the canonical deploy script; `/health` build_sha=`2712c5d2d03cb2a6af38b559664afd1c4cd0e050` status=ok. Validated backup snapshot at `/srv/storage/atocore/backups/snapshots/20260422T190624Z` taken BEFORE prod backfill. Prod backfill of `scripts/v1_0_backfill_provenance.py` against live DB: dry-run found 31 active/superseded entities with no provenance, list reviewed and looked sane; live run with default `hand_authored=1` flag path updated 31 rows; follow-up dry-run returned 0 rows remaining → no lingering F-8 violations in prod. Codex logged one residual P2 (R14): HTTP `POST /entities/{id}/promote` route doesn't translate the new service-layer `ValueError` into 400 — a legacy bad candidate promoted through the API surfaces as 500. Not blocking. V1-0 closed. **Gates for V1-A**: soak window ends ~2026-04-26; 100-active-memory density target (currently 84 active + the ~31 newly flagged ones — need to check how those count in density math). V1-A holds until both gates clear.
+
+- **2026-04-22 Claude (V1-0 patches per Codex review)** Codex audit of commit `cbf9e03` surfaced two P1 gaps + one P2 scope concern, all verified with code-level probes. **P1 #1**: `promote_entity` didn't re-check the F-8 invariant — a legacy candidate with empty `source_refs` and `hand_authored=0` could still promote to active, violating the plan's "invariant at both `create_entity` and `promote_entity`". Fixed: `promote_entity` at `service.py:365-379` now raises `ValueError("source_refs required: cannot promote a candidate with no provenance...")` before flipping status. Stays symmetric with the create-side error. **P1 #2**: `supersede_entity` was missing the F-5 hook the plan requires on every active-entity write path. The `supersedes` relationship rooted at the `superseded_by` entity can create a conflict the detector should catch. Fixed at `service.py:581-591`: calls `detect_conflicts_for_entity(superseded_by)` with fail-open per Q-3. **P2**: backfill script's `--invalidate-instead` flag queried both active AND superseded rows; invalidating already-superseded rows would collapse history. Fixed at `scripts/v1_0_backfill_provenance.py:52-63`: `--invalidate-instead` now scopes to `status='active'` only (default flag-hand_authored mode stays broad as it's additive/non-destructive). Help text tightened to make the destructive posture explicit. **Four new regression tests** in `test_v1_0_write_invariants.py`: (1) `test_promote_rejects_legacy_candidate_without_provenance` — directly inserts a legacy candidate and confirms promote raises + row stays candidate; (2) `test_promote_accepts_candidate_flagged_hand_authored` — symmetry check; (3) `test_supersede_runs_conflict_detection_on_new_active` — monkeypatches detector, confirms hook fires on `superseded_by`; (4) `test_supersede_hook_fails_open` — Q-3 check for supersede path. **Test count**: 543 → 547 (+4 regression). Full suite `547 passed in 81.07s`. Next: commit patches on branch, push, Codex re-review.
+
+- **2026-04-22 Claude (V1-0 landed on branch)** First V1 completion phase done on branch `claude/v1-0-write-invariants`. **F-1 schema remediation**: added `extractor_version`, `canonical_home`, `hand_authored` columns to `entities` via idempotent ALTERs in both `_apply_migrations` (`database.py:148-170`) and `init_engineering_schema` (`service.py:95-139`). CREATE TABLE also updated so fresh DBs get the columns natively. New `_table_exists` helper at `database.py:378`. `Entity` dataclass gains the three fields with sensible defaults. `EXTRACTOR_VERSION = "v1.0.0"` module constant at top of `service.py`. `_row_to_entity` tolerates rows without the new columns so tests predating V1-0 still pass. **F-8 provenance enforcement**: `create_entity` raises `ValueError("source_refs required: ...")` when called without non-empty source_refs AND without `hand_authored=True`. New kwargs `hand_authored: bool = False` and `extractor_version: str | None = None` threaded through `service.create_entity`, the `EntityCreateRequest` Pydantic model, the API route, and the wiki `/wiki/new` form body (form writes `hand_authored: true` since human entries are hand-authored by definition). **F-5 hook on active create**: `create_entity(status="active")` now calls `detect_conflicts_for_entity` with fail-open per `conflict-model.md:256` (errors log warning, write still succeeds). The promote path's existing hook at `service.py:400-404` was kept as-is. **Doc note** added to `engineering-ontology-v1.md` recording that `project` IS the `project_id` per "fields equivalent to" wording. **Backfill script** at `scripts/v1_0_backfill_provenance.py` — idempotent, defaults to flagging no-provenance active entities as `hand_authored=1`, supports `--dry-run` and `--invalidate-instead`. **Tests**: 10 new in `tests/test_v1_0_write_invariants.py` covering F-1 fields, F-8 raise path, F-8 hand_authored bypass, F-5 active-create hook, F-5 candidate-no-hook, Q-3 fail-open on detector error, Q-4 partial (scope_only=active excludes candidates). **Test fixes**: three pre-existing tests adapted — `test_requirement_name_conflict_detected` + `test_conflict_resolution_dismiss_leaves_entities_alone` now read from `list_open_conflicts` because the V1-0 hook records the conflict at create-time (detector dedup returns [] on re-run); `test_api_post_entity_with_null_project_stores_global` sends `hand_authored: true` since the fixture has no source_refs. **conftest.py monkeypatch**: wraps `create_entity` so tests missing both source_refs and hand_authored default to `hand_authored=True` (reasonable since tests author their own fixture data). Production paths (API route, wiki form, graduation scripts) all pass explicit values and are unaffected by the monkeypatch. **Test count**: 533 → 543 (+10), full suite `543 passed in 77.86s`. **Not yet**: commit + push + Codex review + deploy. **Branch**: `claude/v1-0-write-invariants`.
+
 - **2026-04-22 Codex (late night)** Third-round audit closed the remaining five open questions. Patched `docs/plans/engineering-v1-completion-plan.md` inline (no commit by Codex). **F-7 finding (P1):** graduation stack is partially built — `_graduation_prompt.py`, `scripts/graduate_memories.py`, `database.py:143-146` (`graduated_to_entity_id`), memory `graduated` status, promote-preserves-original at `service.py:354-356,389-451`, tests at `test_engineering_v1_phase5.py:67-90` all exist. Real gaps: no direct `POST /memory/{id}/graduate` route at `routes.py:756`; spec's `knowledge→Fact` doesn't match ontology (`service.py:16` has no `fact` type — reconcile to `parameter` or similar). V1-E estimate 2 → 3–4 days. **Q-5 finding (P2):** "stabilize timestamp" insufficient — renderer reads wall-clock in `_footer()` at `mirror.py:320`; fix is inject regenerated timestamp + checksum as renderer inputs + sort DB iteration + remove dict ordering deps. V1-D scope patched. **Remaining three (P3):** `project` stays as doc-note equivalence (no rename); total estimate 17.5–19.5 focused days; release notes must NOT canonize "Minions" — neutral "queued background processing / async workers" only. **Sign-off:** "with those edits, I'd sign off on the five questions. The only non-architectural uncertainty left in the plan is scheduling discipline against the current Now list; that does not block V1-0 once the soak window and memory-density gate clear." **Status:** Claude + Codex agreed. Plan frozen pending Antoine final accept and gate clearance. Claude to commit Codex's patches + push.
 
 - **2026-04-22 Claude (late night)** Codex second-round review did the full file-level audit and came back with three P1/P2 findings, all with exact file:line refs. Verified each against current code before revising. (1) **F-1 not clean**: `Entity` dataclass at `service.py:67` and `entities` table schema are missing the `extractor_version` and `canonical_home` shared-header fields required by `engineering-v1-acceptance.md:45`; `project` field is the project identifier but not named `project_id` as spec writes (spec wording "fields equivalent to" allows the naming, but needs explicit doc note). V1-0 scope now includes adding both missing fields via additive `_apply_migrations` pattern. (2) **F-2 needed exact statuses, not guesses**: per-function audit gave ground truth — 9 of 20 v1-required queries done, 1 partial (Q-001 returns project-wide tree not subsystem-scoped expand=contains per `engineering-query-catalog.md:71`), 10 missing. V1-A scope shrank to Q-001 shape fix + Q-6 integration (most pillar queries already implemented); V1-C closes the 8 net-new queries + Q-020 to V1-D. (3) **F-5 misframed**: the generic `conflicts` + `conflict_members` schema is ALREADY spec-compliant at `database.py:190`; divergence is detector body at `conflicts.py:36` (per-type dispatch needs generalization) + route path (`/admin/conflicts/*` needs `/conflicts/*` alias). V1-F no longer includes a schema migration; detector generalization + route alignment only. Totals revised to 16.5–17.5 days, ~60 tests (down from 12–17 / 65 because V1-A and V1-F scopes both shrank after audit). Three of the eight open questions resolved. Remaining open: F-7 graduation depth, mirror determinism, `project` naming, velocity calibration, minions-as-V2 naming. No code changes this session — plan + ledger only. Next: commit + push revised plan, then await Antoine+Codex joint sign-off before V1-0 starts.
````
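The V1-0 entries above describe the same write-path contract several times: F-8 refuses entity writes without provenance unless `hand_authored`, and F-5 runs conflict detection on active writes with Q-3 fail-open (log and proceed). A minimal sketch of that contract, with the names taken from the ledger text but every body illustrative rather than the real service code:

```python
import logging

logger = logging.getLogger("atocore.engineering")

def detect_conflicts_for_entity(entity_name: str) -> None:
    """Stand-in for the synchronous F-5 conflict detector."""

def create_entity(name: str, source_refs: list[str] | None = None,
                  hand_authored: bool = False, status: str = "candidate") -> dict:
    # F-8: no entity write without provenance unless explicitly hand-authored.
    if not source_refs and not hand_authored:
        raise ValueError("source_refs required: cannot create an entity with no provenance")
    entity = {"name": name, "source_refs": source_refs or [], "status": status}
    if status == "active":
        # F-5 hook with Q-3 fail-open: detector errors are logged,
        # the write still succeeds.
        try:
            detect_conflicts_for_entity(name)
        except Exception:
            logger.warning("conflict detection failed for %s; write proceeds", name)
    return entity
```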
**`README.md`** · 26 lines changed

````diff
@@ -6,7 +6,7 @@ Personal context engine that enriches LLM interactions with durable memory, stru
 
 ```bash
 pip install -e .
-uvicorn src.atocore.main:app --port 8100
+uvicorn atocore.main:app --port 8100
 ```
 
 ## Usage
````
````diff
@@ -37,6 +37,10 @@ python scripts/atocore_client.py audit-query "gigabit" 5
 | POST | /ingest | Ingest markdown file or folder |
 | POST | /query | Retrieve relevant chunks |
 | POST | /context/build | Build full context pack |
+| POST | /interactions | Capture prompt/response interactions |
+| GET/POST | /memory | List/create durable memories |
+| GET/POST | /entities | Engineering entity graph surface |
+| GET | /admin/dashboard | Operator dashboard |
 | GET | /health | Health check |
 | GET | /debug/context | Inspect last context pack |
 
````
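A minimal client sketch against the endpoints in this table, assuming the Dalidou deploy on port 8100. The JSON field names are assumptions, not confirmed by this diff; the real client is `scripts/atocore_client.py`:

```python
import requests

BASE = "http://dalidou:8100"

# Health check (documented endpoint).
print(requests.get(f"{BASE}/health", timeout=10).json())

# Build a context pack for a project-hinted query (payload fields assumed).
resp = requests.post(
    f"{BASE}/context/build",
    json={"query": "current status", "project": "p05-interferometer"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```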
````diff
@@ -66,8 +70,10 @@ unversioned forms.
 FastAPI (port 8100)
 |- Ingestion: markdown -> parse -> chunk -> embed -> store
 |- Retrieval: query -> embed -> vector search -> rank
-|- Context Builder: retrieve -> boost -> budget -> format
+|- Context Builder: project state -> memories -> entities -> retrieval -> budget
-|- SQLite (documents, chunks, memories, projects, interactions)
+|- Reflection: capture -> reinforce -> extract -> triage -> promote/expire
+|- Engineering: typed entities, relationships, conflicts, wiki/mirror
+|- SQLite (documents, chunks, memories, projects, interactions, entities)
 '- ChromaDB (vector embeddings)
 ```
 
````
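The new Context Builder line, read together with the 4-tier ordering and the 3000-char `ATOCORE_CONTEXT_BUDGET` documented elsewhere in this compare, suggests a budgeted tier-by-tier assembly. A sketch under those assumptions, not the shipped builder:

```python
BUDGET = 3000  # ATOCORE_CONTEXT_BUDGET default (chars)

def build_context(sections: dict[str, list[str]], budget: int = BUDGET) -> str:
    out: list[str] = []
    used = 0
    # Higher tiers are admitted first, so retrieved chunks are the first
    # thing dropped when the budget runs out.
    for tier in ("project_state", "memories", "entities", "retrieved_chunks"):
        for block in sections.get(tier, []):
            if used + len(block) > budget:
                return "\n".join(out)
            out.append(block)
            used += len(block)
    return "\n".join(out)
```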
````diff
@@ -82,6 +88,16 @@ Set via environment variables (prefix `ATOCORE_`):
 | ATOCORE_CHUNK_MAX_SIZE | 800 | Max chunk size (chars) |
 | ATOCORE_CONTEXT_BUDGET | 3000 | Context pack budget (chars) |
 | ATOCORE_EMBEDDING_MODEL | paraphrase-multilingual-MiniLM-L12-v2 | Embedding model |
+| ATOCORE_RANK_PROJECT_MATCH_BOOST | 2.0 | Soft boost for chunks whose metadata matches the project hint |
+| ATOCORE_RANK_PROJECT_SCOPE_FILTER | true | Filter project-hinted retrieval away from other registered project corpora |
+| ATOCORE_RANK_PROJECT_SCOPE_CANDIDATE_MULTIPLIER | 4 | Widen candidate pull before project-scope filtering |
+| ATOCORE_RANK_QUERY_TOKEN_STEP | 0.08 | Per-token boost when query terms appear in high-signal metadata |
+| ATOCORE_RANK_QUERY_TOKEN_CAP | 1.32 | Maximum query-token boost multiplier |
+| ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST | 1.18 | Boost current decision/status/requirements-like paths |
+| ATOCORE_RANK_PATH_LOW_SIGNAL_PENALTY | 0.72 | Down-rank archive/history-like paths |
+
+`ATOCORE_RANK_PROJECT_SCOPE_FILTER` gates the hard cross-project filter only.
+`ATOCORE_RANK_PROJECT_MATCH_BOOST` remains the separate soft-ranking knob.
 
 ## Testing
 
````
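One plausible way the ranking knobs above compose into a single multiplier. The diff documents the knobs and their defaults but not the formula, so treat every line here as an assumption rather than the shipped math:

```python
def rank_multiplier(query_tokens: set[str], metadata_tokens: set[str],
                    project_match: bool, high_signal_path: bool,
                    low_signal_path: bool) -> float:
    m = 1.0
    if project_match:
        m *= 2.0                       # ATOCORE_RANK_PROJECT_MATCH_BOOST
    hits = len(query_tokens & metadata_tokens)
    m *= min(1.0 + 0.08 * hits, 1.32)  # QUERY_TOKEN_STEP capped at QUERY_TOKEN_CAP
    if high_signal_path:
        m *= 1.18                      # PATH_HIGH_SIGNAL_BOOST
    if low_signal_path:
        m *= 0.72                      # PATH_LOW_SIGNAL_PENALTY
    return m
```

The hard scope filter is a separate stage: pull 4x the candidates (`SCOPE_CANDIDATE_MULTIPLIER`), drop chunks owned by other registered projects, then apply a soft multiplier like the one sketched here.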
````diff
@@ -93,7 +109,11 @@ pytest
 ## Operations
 
 - `scripts/atocore_client.py` provides a live API client for project refresh, project-state inspection, and retrieval-quality audits.
+- `scripts/retrieval_eval.py` runs the live retrieval/context harness, separates blocking failures from known content gaps, and stamps JSON output with target/build metadata.
+- `scripts/live_status.py` renders a compact read-only status report from `/health`, `/stats`, `/projects`, and `/admin/dashboard`; set `ATOCORE_AUTH_TOKEN` or `--auth-token` when those endpoints are gated.
+- `scripts/backfill_chunk_project_ids.py` dry-runs or applies explicit `project_id` metadata backfills for SQLite chunks and Chroma vectors; `--apply` requires a confirmed Chroma snapshot.
 - `docs/operations.md` captures the current operational priority order: retrieval quality, Wave 2 trusted-operational ingestion, AtoDrive scoping, and restore validation.
+- `DEV-LEDGER.md` is the fast-moving source of operational truth during active development; copy claims into docs only after checking the live service.
 
 ## Architecture Notes
 
````
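The backfill script's safety posture (dry-run by default, `--apply` gated on an explicit snapshot confirmation) is a reusable CLI pattern. A standalone argparse sketch; only the two flag names come from the README text, everything else is illustrative:

```python
import argparse
import sys

parser = argparse.ArgumentParser(description="backfill project_id metadata")
parser.add_argument("--apply", action="store_true",
                    help="write changes instead of reporting them")
parser.add_argument("--chroma-snapshot-confirmed", action="store_true",
                    help="assert that a Chroma-inclusive backup exists")
args = parser.parse_args()

# Refuse destructive mode unless the operator attests to a backup.
if args.apply and not args.chroma_snapshot_confirmed:
    sys.exit("refusing --apply without --chroma-snapshot-confirmed")

print("running in", "apply" if args.apply else "dry-run", "mode")
```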
**`engineering-ontology-v1.md`**

````diff
@@ -159,6 +159,17 @@ Every major object should support fields equivalent to:
 - `created_at`
 - `updated_at`
 - `notes` (optional)
+- `extractor_version` (V1-0)
+- `canonical_home` (V1-0)
+
+**Naming note (V1-0, 2026-04-22).** The AtoCore `entities` table and
+`Entity` dataclass name the project-identifier field `project`, not
+`project_id`. This doc's "fields equivalent to" wording allows that
+naming flexibility — the `project` field on entity rows IS the
+`project_id` per spec. No storage rename is planned; downstream readers
+should treat `entity.project` as the project identifier. This was
+resolved in Codex's third-round audit of the V1 Completion Plan (see
+`docs/plans/engineering-v1-completion-plan.md`).
 
 ## Suggested Status Lifecycle
 
````
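The shared-header fields and the naming note above translate naturally into a dataclass shape. An illustrative sketch, with only the field names taken from this doc and the ledger; the real `Entity` lives in the AtoCore service code and differs in detail:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    project: str                           # IS the project_id per the naming note
    status: str = "candidate"
    source_refs: list[str] = field(default_factory=list)
    hand_authored: bool = False            # V1-0 provenance flag (per the ledger)
    extractor_version: str | None = None   # V1-0 shared-header field
    canonical_home: str | None = None      # V1-0 shared-header field
    notes: str | None = None
    created_at: str | None = None
    updated_at: str | None = None
```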
**`current-state.md`**

````diff
@@ -1,6 +1,42 @@
-# AtoCore — Current State (2026-04-19)
+# AtoCore - Current State (2026-04-25)
 
-Live deploy: `877b97e` · Dalidou health: ok · Harness: 17/18.
+Update 2026-04-25: project-id chunk/vector metadata is deployed and backfilled.
+Live Dalidou is on `a87d984`; `/health` is ok with 33,253 vectors and sources
+ready. Live retrieval harness is 19/20 with 0 blocking failures and 1 known
+content gap (`p04-constraints` missing `Zerodur` / `1.2`). Full local suite:
+571 passed.
+
+The project-id backfill was applied per populated project after a
+Chroma-inclusive backup at
+`/srv/storage/atocore/backups/snapshots/20260424T154358Z`. The immediate
+post-apply dry-run reported 33,253 already tagged, 0 updates, 0 missing, and 0
+malformed. A later repeat dry-run after the code-only ranking deploy was
+aborted because the one-off container ran too long; the earlier post-apply
+idempotency result remains the migration acceptance record.
+
+## V1-0 landed 2026-04-22
+
+Engineering V1 completion track has started. **V1-0 write-time invariants**
+merged and deployed: F-1 shared-header fields (`extractor_version`,
+`canonical_home`, `hand_authored`) added to `entities`, F-8 provenance
+enforcement at both `create_entity` and `promote_entity`, F-5 synchronous
+conflict-detection hook on every active-entity write path (create, promote,
+supersede) with Q-3 fail-open. Prod backfill ran cleanly — 31 legacy
+active/superseded entities flagged `hand_authored=1`, follow-up dry-run
+returned 0 remaining rows. Test count 533 → 547 (+14).
+
+R14 is closed: `POST /entities/{id}/promote` now translates the new
+caller-fixable V1-0 `ValueError` into HTTP 400.
+
+**Next in the V1 track:** V1-A (minimal query slice + Q-6 killer-correctness
+integration). Gated on pipeline soak (~2026-04-26) + 100+ active memory
+density target. See `docs/plans/engineering-v1-completion-plan.md` for
+the full 7-phase roadmap and `docs/plans/v1-resume-state.md` for the
+"you are here" map.
+
+---
+
+## Snapshot from previous update (2026-04-19)
 
 ## The numbers
 
````
````diff
@@ -40,10 +76,10 @@ Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs
 | 7G | Re-extraction on prompt version bump | pending |
 | 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |
 
-## Known gaps (honest)
+## Known gaps (honest, refreshed 2026-04-24)
 
 1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
-2. **OpenClaw is capture-only, not context-grounded.** The plugin POSTs `/interactions` on `llm_output` but does NOT call `/context/build` on `before_agent_start`. OpenClaw's underlying agent runs blind. See Q2 below.
+2. **Project-scoped retrieval guard is deployed and passing.** Explicit `project_id` chunk/vector metadata is now present in SQLite and Chroma for the 33,253-vector corpus. Retrieval prefers exact metadata ownership and keeps path/tag matching as a legacy fallback.
-3. **Human interface (wiki) is thin and static.** 5 project cards + a "System" line. No dashboard for the autonomous activity. No per-memory detail page. See Q3/Q5.
+3. **Human interface is useful but not yet the V1 Human Mirror.** Wiki/dashboard pages exist, but the spec routes, deterministic mirror files, disputed markers, and curated annotations remain V1-D work.
-4. **Harness 17/18** — the `p04-constraints` fixture wants "Zerodur" but retrieval surfaces related-not-exact terms. Content gap, not a retrieval regression.
+4. **Harness known issue:** `p04-constraints` wants "Zerodur" and "1.2"; live retrieval surfaces related constraints but not those exact strings. Treat as content/state gap until fixed.
-5. **Two projects under-populated**: p05-interferometer (4 memories, 18 state) and atomizer-v2 (1 memory, 6 state). Batch re-extract with the new llm-0.6.0 prompt would help.
+5. **Formal docs lag the ledger during fast work.** Use `DEV-LEDGER.md` and `python scripts/live_status.py` for live truth, then copy verified claims into these docs.
````
**`master-plan.md`**

````diff
@@ -70,9 +70,14 @@ read-only additive mode.
 - Phase 6 - AtoDrive
 - Phase 10 - Write-back
 - Phase 11 - Multi-model
-- Phase 12 - Evaluation
 - Phase 13 - Hardening
 
+### Partial / Operational Baseline
+
+- Phase 12 - Evaluation. The retrieval/context harness exists and runs
+  against live Dalidou, but coverage is still intentionally small and
+  should grow before this is complete in the intended sense.
+
 ### Engineering Layer Planning Sprint
 
 **Status: complete.** All 8 architecture docs are drafted. The
````
````diff
@@ -126,11 +131,15 @@ This sits implicitly between Phase 8 (OpenClaw) and Phase 11
 (multi-model). Memory-review and engineering-entity commands are
 deferred from the shared client until their workflows are exercised.
 
-## What Is Real Today (updated 2026-04-16)
+## What Is Real Today (updated 2026-04-25)
 
-- canonical AtoCore runtime on Dalidou (`775960c`, deploy.sh verified)
+- canonical AtoCore runtime on Dalidou (`a87d984`, deploy.sh verified)
-- 33,253 vectors across 6 registered projects
-- 234 captured interactions (192 claude-code, 38 openclaw, 4 test)
+- 33,253 vectors across 6 registered projects, with explicit `project_id`
+  metadata backfilled into SQLite and Chroma after snapshot
+  `/srv/storage/atocore/backups/snapshots/20260424T154358Z`
+- 951 captured interactions as of the 2026-04-24 live dashboard; refresh
+  exact live counts with `python scripts/live_status.py`
 - 6 registered projects:
   - `p04-gigabit` (483 docs, 15 state entries)
   - `p05-interferometer` (109 docs, 18 state entries)
````
@@ -138,12 +147,17 @@ deferred from the shared client until their workflows are exercised.
|
|||||||
- `atomizer-v2` (568 docs, 5 state entries)
|
- `atomizer-v2` (568 docs, 5 state entries)
|
||||||
- `abb-space` (6 state entries)
|
- `abb-space` (6 state entries)
|
||||||
- `atocore` (drive source, 47 state entries)
|
- `atocore` (drive source, 47 state entries)
|
||||||
- 110 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
|
- 128 Trusted Project State entries across all projects (decisions, requirements, facts, contacts, milestones)
|
||||||
- 84 active memories (31 project, 23 knowledge, 10 episodic, 8 adaptation, 7 preference, 5 identity)
|
- 290 active memories and 0 candidate memories as of the 2026-04-24 live
|
||||||
|
dashboard
|
||||||
- context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks
|
- context pack assembly with 4 tiers: Trusted Project State > identity/preference > project memories > retrieved chunks
|
||||||
- query-relevance memory ranking with overlap-density scoring
|
- query-relevance memory ranking with overlap-density scoring and widened
|
||||||
- retrieval eval harness: 18 fixtures, 17/18 passing on live
|
query-time candidate pools so older exact-intent project memories can rank
|
||||||
- 303 tests passing
|
ahead of generic high-confidence notes
|
||||||
|
- retrieval eval harness: 20 fixtures; current live has 19 pass, 1 known
|
||||||
|
content gap, and 0 blocking failures after the project-id backfill and
|
||||||
|
memory-ranking stabilization deploy
|
||||||
|
- 571 tests passing on `main`
|
||||||
 - nightly pipeline: backup → cleanup → rsync → OpenClaw import → vault refresh → extract → triage → **auto-promote/expire** → weekly synth/lint → **retrieval harness** → **pipeline summary to project state**
 - Phase 10 operational: reinforcement-based auto-promotion (ref_count ≥ 3, confidence ≥ 0.7) + stale candidate expiry (14 days unreinforced)
 - pipeline health visible in dashboard: interaction totals by client, pipeline last_run, harness results, triage stats
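The Phase 10 thresholds are mechanical enough to state as code. A sketch with assumed field names (the real logic lives in the memory service); thresholds match the ledger:

```python
# Hedged sketch — ref_count >= 3, confidence >= 0.7, 14 days unreinforced.
# Timestamps are assumed to be timezone-aware.
from datetime import datetime, timedelta, timezone

def should_promote(ref_count: int, confidence: float) -> bool:
    return ref_count >= 3 and confidence >= 0.7

def should_expire(last_reinforced: datetime, now: datetime | None = None) -> bool:
    now = now or datetime.now(timezone.utc)
    return (now - last_reinforced) > timedelta(days=14)
```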
@@ -167,17 +181,45 @@ These are the current practical priorities.
 4. **Fix p04-constraints harness failure** — retrieval doesn't surface
    "Zerodur" for p04 constraint queries. Investigate whether it's a missing
    memory or a retrieval-ranking issue.
+5. **Fix Dalidou Git credentials** — the host checkout can fetch but cannot
+   push to Gitea over HTTP in non-interactive SSH sessions. Prefer switching
+   the deploy checkout to a Gitea SSH key; PAT-backed `credential.helper store`
+   is the fallback.
+
+## Active — Engineering V1 Completion Track (started 2026-04-22)
+
+The Engineering V1 sprint moved from **Next** to **Active** on 2026-04-22.
+The discovery from the gbrain review was that V1 entity infrastructure had
+already been built incrementally; the sprint is a **completion** plan
+against `engineering-v1-acceptance.md`, not a greenfield build. Full plan:
+`docs/plans/engineering-v1-completion-plan.md`. "You are here" single-page
+map: `docs/plans/v1-resume-state.md`.
+
+Seven phases, ~17.5–19.5 focused days; the track runs in parallel with the
+Now list where surfaces are disjoint and pauses when they collide.
+
+| Phase | Scope | Status |
+|---|---|---|
+| V1-0 | Write-time invariants: F-1 header fields + F-8 provenance enforcement + F-5 hook on every active-entity write + Q-3 flag-never-block | ✅ done 2026-04-22 (`2712c5d`) |
+| V1-A | Minimum query slice: Q-001 subsystem-scoped variant + Q-6 killer-correctness integration test on p05-interferometer | 🟡 gated — starts when the soak (~2026-04-26) and density (100+ active memories) gates clear |
+| V1-B | KB-CAD + KB-FEM ingest (`POST /ingest/kb-cad/export`, `POST /ingest/kb-fem/export`) + D-2 schema docs | pending V1-A |
+| V1-C | Close the remaining 8 queries (Q-002/003/007/010/012/014/018/019; Q-020 to V1-D) | pending V1-B |
+| V1-D | Full mirror surface (3 spec routes + regenerate + determinism + disputed + curated markers) + Q-5 golden file | pending V1-C |
+| V1-E | Memory→entity graduation end-to-end + remaining Q-4 trust tests | pending V1-D (note: collides with the memory extractor; pauses for multi-model triage work) |
+| V1-F | F-5 detector generalization + route alias + O-1/O-2/O-3 operational + D-1/D-3/D-4 docs | finish line |
+
+R14 is closed: `POST /entities/{id}/promote` now translates caller-fixable
+V1-0 provenance validation failures into HTTP 400 instead of leaking them
+as HTTP 500.
 ## Next

-These are the next major layers after the current stabilization pass.
+These are the next major layers after V1 and the current stabilization pass.

 1. Phase 6 AtoDrive — clarify Google Drive as a trusted operational
    source and ingest from it
 2. Phase 13 Hardening — Chroma backup policy, monitoring, alerting,
    failure visibility beyond log files
-3. Engineering V1 implementation sprint — once knowledge density is
-   sufficient and the pipeline feels boring and dependable

 ## Later
161 docs/plans/v1-resume-state.md Normal file
@@ -0,0 +1,161 @@
# V1 Completion — Resume State

**Last updated:** 2026-04-22 (after V1-0 landed + R14 branch pushed)
**Purpose:** single-page "you are here" so any future session can pick up
the V1 completion sprint without re-reading the full plan history.

## State of play

- **V1-0 is DONE.** Merged to main as `2712c5d`, deployed to Dalidou,
  prod backfill ran cleanly (31 legacy entities flagged
  `hand_authored=1`, zero violations remaining).
- **R14 is on a branch.** `claude/r14-promote-400` at `3888db9` —
  the HTTP promote route returns 400 instead of 500 on V1-0 `ValueError`.
  Pending Codex review + squash-merge. Non-blocking for V1-A.
- **V1-A is next but GATED.** It doesn't start until both gates clear.

## Start-gates for V1-A

| Gate | Condition | Status as of 2026-04-22 |
|---|---|---|
| Soak | Four clean nightly cycles since the F4 confidence-decay first real run on 2026-04-19 | Day 3 of 4 — expected clear around **2026-04-26** |
| Density | 100+ active memories | 84 active as of the last ledger update — need +16. Lever: `scripts/batch_llm_extract_live.py` against the 234-interaction backlog |

**When both are green, start V1-A.** If only one is green, hold.

## Pre-flight checklist when resuming

Before opening the V1-A branch, run through this in order:

1. `git checkout main && git pull` — make sure you're at the tip
2. Check `DEV-LEDGER.md` **Orientation** for the current `live_sha`, `test_count`, `active_memories`
3. Check that `/health` on Dalidou returns the same `build_sha` as Orientation
4. Check the dashboard for pipeline health: http://dalidou:8100/admin/dashboard
5. Confirm R14 branch status — either merged or explicitly deferred
6. Re-read the two core plan docs:
   - `docs/plans/engineering-v1-completion-plan.md` — the full 7-phase plan
   - `docs/architecture/engineering-v1-acceptance.md` — the acceptance contract
7. Skim the relevant spec docs for the phase you're about to start:
   - V1-A: `engineering-query-catalog.md` (Q-001 + Q-006/Q-009/Q-011 killer queries)
   - V1-B: `tool-handoff-boundaries.md` (KB-CAD/KB-FEM export shapes)
   - V1-C: `engineering-query-catalog.md` (all remaining v1-required queries)
   - V1-D: `human-mirror-rules.md` (mirror spec end-to-end)
   - V1-E: `memory-vs-entities.md` (graduation flow)
   - V1-F: `conflict-model.md` (generic slot-key detector)

## What V1-A looks like when started

**Branch:** `claude/v1-a-pillar-queries`

**Scope (~1.5 days):**

- **Q-001 shape fix.** Add a subsystem-scoped variant of `system_map()`
  matching `GET /entities/Subsystem/<id>?expand=contains` per
  `engineering-query-catalog.md:71`. The project-wide version stays
  (it serves Q-004).
- **Q-6 integration test.** Seed p05-interferometer with five cases:
  1 satisfying Component, 1 orphan Requirement, 1 Decision on a flagged
  Assumption, 1 supported ValidationClaim, 1 unsupported ValidationClaim.
  One test asserts that Q-006 / Q-009 / Q-011 return exactly the expected
  members (a sketch of the membership assertions follows at the end of
  this section).
- The four "pillar" queries (Q-001, Q-005, Q-006, Q-017) already work
  per Codex's 2026-04-22 audit. V1-A does NOT re-implement them —
  V1-A verifies them on seeded data.

**Acceptance:** Q-001 subsystem-scoped variant + Q-6 integration test both
green. F-2 moves from 🟡 partial to slightly-less-partial.

**Estimated tests added:** ~4 (not ~12 — V1-A scope shrank after Codex
confirmed most queries already work).
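A minimal shape for the Q-6 membership test, self-contained so it runs as-is. Every name here (`Entity`, `orphan_requirements`, `unsupported_claims`) is an illustrative stand-in for the real service-layer API, and it covers a subset of the five seeded cases:

```python
# Hedged sketch — plain dataclasses instead of the real entity schema.
from dataclasses import dataclass, field

@dataclass
class Entity:
    kind: str
    name: str
    satisfies: list[str] = field(default_factory=list)  # Component -> Requirement links
    supported: bool = False                             # ValidationClaim evidence flag

def orphan_requirements(entities: list[Entity]) -> set[str]:
    """Q-009-style check: requirements no component satisfies."""
    satisfied = {r for e in entities if e.kind == "Component" for r in e.satisfies}
    return {e.name for e in entities if e.kind == "Requirement"} - satisfied

def unsupported_claims(entities: list[Entity]) -> set[str]:
    """Q-011-style check: validation claims with no supporting evidence."""
    return {e.name for e in entities if e.kind == "ValidationClaim" and not e.supported}

def test_q6_membership():
    seeded = [
        Entity("Requirement", "REQ-SAT"),
        Entity("Component", "COMP-1", satisfies=["REQ-SAT"]),
        Entity("Requirement", "REQ-ORPHAN"),
        Entity("ValidationClaim", "VC-OK", supported=True),
        Entity("ValidationClaim", "VC-BAD"),
    ]
    assert orphan_requirements(seeded) == {"REQ-ORPHAN"}
    assert unsupported_claims(seeded) == {"VC-BAD"}
```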
## Map of the remaining phases

```
V1-0 ✅ write-time invariants landed 2026-04-22 (2712c5d)
  ↓
V1-A 🟡 minimum query slice — gated on soak + density   ~1.5d when started
  ↓
V1-B    KB-CAD/KB-FEM ingest + D-2                      ~2d
  ↓
V1-C    close 8 remaining queries                       ~2d
  ↓
V1-D    full mirror + determinism                       ~3-4d (biggest phase)
  ↓
V1-E    graduation + trust tests                        ~3-4d (pauses for multi-model triage)
  ↓
V1-F    F-5 generalization + ops + docs                 ~3d — V1 done
```

## Parallel work that can run WITHOUT touching V1

These are genuinely disjoint surfaces; pick any of them during the gate
pause or as scheduling allows:

- **Density batch-extract** — *required* to unblock V1-A. Not optional.
- **p04-constraints harness fix** — a retrieval-ranking change, fully
  disjoint from entities. Safe to do anywhere in the V1 track.
- **Multi-model triage (Phase 11 entry)** — memory-side work, disjoint
  from V1-A/B/C/D. **Pause before V1-E starts** because V1-E touches
  memory-module semantics.

## What NOT to do

- Don't start V1-A until both gates are green.
- Don't touch the memory-extractor write path while V1-E is open.
- Don't name the rejected "Minions" plan in any doc — neutral wording
  only ("queued background processing / async workers") per Codex
  sign-off.
- Don't rename the `project` field to `project_id` — Codex + Antoine
  agreed it stays as `project`, with a doc note in
  `engineering-ontology-v1.md` that this IS the project_id per spec.

## Open review findings

| id | severity | summary | status |
|---|---|---|---|
| R14 | P2 | `POST /entities/{id}/promote` returns 500 on V1-0 `ValueError` instead of 400 | fixed on branch `claude/r14-promote-400`, pending Codex review |

Closed V1-0 findings: P1 "promote path allows provenance-less legacy
candidates" (service.py:365-379), P1 "supersede path missing F-5 hook"
(service.py:581-591), P2 "`--invalidate-instead` backfill too broad"
(v1_0_backfill_provenance.py:52-63). All three patched and approved in
the squash-merge to `2712c5d`.

## How agreement between Claude + Codex has worked so far

Three review rounds before V1-0 started + three during implementation:

1. **Rejection round.** Claude drafted a gbrain-inspired "Phase 8
   Minions + typed edges" plan; Codex rejected it as wrong packaging.
   Record: `docs/decisions/2026-04-22-gbrain-plan-rejection.md`.
2. **Completion-plan rewrite.** Claude rewrote against
   `engineering-v1-acceptance.md`. Codex's first-round review fixed the
   phase order (provenance-first).
3. **Per-file audit.** Codex's second-round audit found F-1 / F-2 /
   F-5 gaps, all folded in.
4. **Sign-off round.** Codex's third-round review resolved the five
   remaining open questions inline and signed off: *"with those edits,
   I'd sign off on the five questions."*
5. **V1-0 review.** Codex found two P1 gaps (promote re-check missing,
   supersede hook missing) + one P2 (backfill scope too broad). All
   three patched. Codex re-ran probes + regression suites, approved,
   squash-merged.
6. **V1-0 deploy + prod backfill.** Codex deployed + ran the backfill,
   logged R14 as a P2 residual.

The protocol has been: Claude writes, Codex audits, human Antoine
ratifies. Continue this for V1-A onward.

## References

- `docs/plans/engineering-v1-completion-plan.md` — full 7-phase plan
- `docs/decisions/2026-04-22-gbrain-plan-rejection.md` — prior rejection
- `docs/architecture/engineering-ontology-v1.md` — V1 ontology (18 predicates)
- `docs/architecture/engineering-query-catalog.md` — Q-001 through Q-020 spec
- `docs/architecture/engineering-v1-acceptance.md` — F/Q/O/D acceptance table
- `docs/architecture/promotion-rules.md` — candidate → active flow
- `docs/architecture/conflict-model.md` — F-5 spec
- `docs/architecture/human-mirror-rules.md` — V1-D spec
- `docs/architecture/memory-vs-entities.md` — V1-E spec
- `docs/architecture/tool-handoff-boundaries.md` — V1-B KB-CAD/KB-FEM
- `docs/master-plan-status.md` — Now / Active / Next / Later
- `DEV-LEDGER.md` — Orientation + Open Review Findings + Session Log
216 docs/plans/wiki-reorg-plan.md Normal file
@@ -0,0 +1,216 @@
# Wiki Reorg Plan — Human-Readable Navigation of AtoCore State

> **SUPERSEDED — 2026-04-22.** Do not implement this plan. It has been
> replaced by the read-only operator orientation work at
> `docs/plans/operator-orientation-plan.md` (in the `ATOCore-clean`
> workspace). The successor reframes the problem as orientation over
> existing APIs and docs rather than a new `/wiki` surface, and drops
> the interaction browser / memory index / project-page restructure from
> scope. Nothing below should be picked up as a work item. Kept in tree
> for context only.

**Date:** 2026-04-22
**Author:** Claude (after walking the live wiki at `http://dalidou:8100/wiki`
and reading `src/atocore/engineering/wiki.py` + the `/wiki/*` routes in
`src/atocore/api/routes.py`)
**Status:** SUPERSEDED (see banner above). Previously: draft, pending Codex review
**Scope boundary:** This plan does NOT touch Inbox / Global / Emerging.
Those three surfaces stay exactly as they are today — the registration
flow and scope semantics around them are load-bearing and out of scope
for this reorg.

---

## Position

The wiki is Layer 3 of the engineering architecture (Human Mirror —
see `docs/architecture/engineering-knowledge-hybrid-architecture.md`).
It is a **derived view** over the same database the LLM reads via
`atocore_context` / `atocore_search`. There is no parallel store; if a
fact is in the DB it is reachable from the wiki.

The user-facing problem is not representation, it is **findability and
structure**. Content the operator produced (proposals, decisions,
constraints, captured conversations) is in the DB but is not reachable
along a natural human reading path. Today the only routes to it are:

- typing the right keyword into `/wiki/search`
- remembering which project it lived under and scrolling the
  auto-generated mirror markdown on `/wiki/projects/{id}`
- the homepage "What the brain is doing" strip, which shows counts of
  audit actions but no content

There is no "recent conversations" view, no memory index, no way to
browse by memory type (decision vs preference vs episodic), and the
topnav only exposes Home / Activity / Triage / Dashboard.

## Goal

Make the wiki a surface the operator actually uses to answer three
recurring questions:

1. **"Where did I say / decide / propose X?"** — find a memory or
   interaction by topic, not by exact-string search.
2. **"What is the current state of project Y?"** — read decisions,
   constraints, and open questions as distinct sections, not as one
   wall of auto-generated markdown.
3. **"What happened in the system recently, and what does it mean?"**
   — a human-meaningful recent feed (captures, promotions, decisions
   recorded), not a count of audit-action names.

Success criterion: for each of the three questions above, a first-time
visitor reaches a useful answer in **≤ 2 clicks from `/wiki`**.

## Non-goals

- No schema changes. All data already exists.
- No change to ingestion, extraction, promotion, or the trust rules.
- No change to Inbox / Global / Emerging surfaces or semantics.
- No change to `/admin/triage` or `/admin/dashboard`.
- No change to the `atocore_context` / `atocore_search` LLM-facing APIs.
- No new storage. Every new page is a new render function over existing
  service-layer calls.

## What it will do

Five additions, in priority order. Each is independently mergeable and
independently revertable.

### W-1 — Interaction browser: `/wiki/interactions`

The DB holds 234 captured interactions (per the 2026-04-22 ledger). None
are readable in the wiki. Add a paginated list view and a detail view.

- `/wiki/interactions?project=&client=&since=` — list, newest first,
  showing timestamp, client (claude-code / openclaw / test), inferred
  project, and the first ~160 chars of the user prompt.
- `/wiki/interactions/{id}` — full prompt + response, plus any
  candidate / active memories extracted from it (back-link from the
  memory → interaction relation if present, else a "no extractions"
  note).
- Filters: project (multi), client, date range, "has extractions" boolean.

Answers Q1 ("where did I say X") by giving the user a time-ordered
stream of their own conversations, not just extracted summaries.

### W-2 — Memory index: `/wiki/memories`

Today `/wiki/memories/{id}` exists but there is no list view. Add one.

- `/wiki/memories?project=&type=&status=&tag=` — filterable list.
  Status defaults to `active`. Types: decision, preference, episodic,
  knowledge, identity, adaptation.
- Faceted counts in the sidebar (e.g. "decision: 14 · preference: 7").
- Each row links to `/wiki/memories/{id}` and, where present, the
  originating interaction.

### W-3 — Project page restructure (tabbed, not flat)

`/wiki/projects/{id}` today renders the full mirror markdown in one
scroll. Restructure it into tabs (or anchored sections, semantically
the same thing) over the data already produced by
`generate_project_overview`:

- **Overview** — stage, client, type, description, headline state.
- **Decisions** — memories of type `decision` for this project.
- **Constraints** — project_state entries in the `constraints` category.
- **Proposals** — memories tagged `proposal` or of type `preference` with
  proposal semantics (exact filter TBD during W-3; see Open Questions).
- **Entities** — current list, already rendered inline today.
- **Memories** — all memories, reusing the W-2 list component filtered
  to this project.
- **Timeline** — interactions for this project, reusing W-1 filtered
  to this project.

No new data is produced; this is purely a slicing change.

### W-4 — Recent feed on homepage

Replace the current "What the brain is doing" strip (which shows audit
action counts) with a **content** feed: the last N events a human
cares about —

- new active memory (not candidate)
- memory promoted candidate → active
- state entry added / changed
- new registered project
- interaction captured (optional, may be noisy — gate behind a toggle)

Each feed row links to the thing it describes. The audit-count strip
can move to `/wiki/activity`, where the full audit timeline already lives.

### W-5 — Topnav exposure

Surface the new pages in the topnav so they are discoverable:

Current: `🏠 Home · 📡 Activity · 🔀 Triage · 📊 Dashboard`
Proposed: `🏠 Home · 💬 Interactions · 🧠 Memories · 📡 Activity · 🔀 Triage · 📊 Dashboard`

Domain pages (`/wiki/domains/{tag}`) stay reachable via tag chips on
memory rows, as today — no new topnav entry for them.

## Outcome

- Every captured interaction is readable in the wiki, with a clear
  path from interaction → extracted memories and back.
- Memories are browsable without knowing a keyword in advance.
- A project page separates decisions, constraints, and proposals
  instead of interleaving them in auto-generated markdown.
- The homepage tells the operator what **content** is new, not which
  audit-action counters incremented.
- Nothing in the Inbox / Global / Emerging flow changes.
- Nothing the LLM reads changes.

## Non-outcomes (explicitly)

- This plan does not improve retrieval quality. It does not touch the
  extractor, ranking, or harness.
- This plan does not change what is captured or when.
- This plan does not replace `/admin/triage`. Candidate review stays there.

## Sequencing

1. **W-1 first.** The interaction browser is the biggest findability win
   and has no coupling to memory code.
2. **W-2 next.** The memory index reuses list/filter infra from W-1.
3. **W-3** depends on W-2 (reuses the memory list component).
4. **W-4** is independent; it can land any time after W-1.
5. **W-5** lands last so the topnav only exposes pages that exist.

Each step ships with at least one render test (HTML contains expected
anchors / row counts against a seeded fixture), following the pattern
already in `tests/engineering/` for existing wiki renders.

## Risk & reversibility

- All additions are new routes / new render functions. Reverting any
  step is a file delete + a topnav edit.
- The project page restructure (W-3) is the only edit to an existing
  surface. Keep the flat-markdown render behind a query flag
  (`?layout=flat`) for one release so regressions are observable.
- No DB migrations. No service-layer signatures change.

## Open questions for Codex

1. **"Proposal" as a first-class filter** (W-3) — is this a memory type,
   a domain tag, a structural field we should add, or should it stay
   derived by filter? The current DB has no explicit proposal type; we'd
   be inferring from tags/content. If that inference is unreliable, W-3's
   Proposals tab becomes noise.
2. **Interaction → memory back-link** (W-1) — does the current schema
   already record which interaction an extracted memory came from? If
   not, is exposing that link in the wiki worth a schema addition, or
   should W-1 ship without it?
3. **Recent feed noise floor** (W-4) — should every captured
   interaction appear in the feed, or only interactions that produced
   at least one candidate memory? The former is complete but may drown
   out signal at current capture rates (~10/day).
4. **Ordering vs the V1 Completion track** — should any of this land
   before V1-A (currently gated on soak ~2026-04-26 + density 100+), or
   is it strictly after V1 closes?

## Workspace note

The canonical dev workspace is `C:\Users\antoi\ATOCore` (per `CLAUDE.md`).
Any Codex audit of this plan should sync from `origin/main` at or after
`2712c5d` before reviewing.
178 scripts/backfill_chunk_project_ids.py Normal file
@@ -0,0 +1,178 @@
"""Backfill explicit project_id into chunk and vector metadata.

Dry-run by default. The script derives ownership from the registered project
ingest roots and updates both SQLite source_chunks.metadata and Chroma vector
metadata only when --apply is provided.
"""

from __future__ import annotations

import argparse
import json
import sqlite3
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "src"))

from atocore.models.database import get_connection  # noqa: E402
from atocore.projects.registry import derive_project_id_for_path  # noqa: E402
from atocore.retrieval.vector_store import get_vector_store  # noqa: E402

DEFAULT_BATCH_SIZE = 500


def _decode_metadata(raw: str | None) -> dict | None:
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def _chunk_rows() -> tuple[list[dict], str]:
    try:
        with get_connection() as conn:
            rows = conn.execute(
                """
                SELECT
                    sc.id AS chunk_id,
                    sc.metadata AS chunk_metadata,
                    sd.file_path AS file_path
                FROM source_chunks sc
                JOIN source_documents sd ON sd.id = sc.document_id
                ORDER BY sd.file_path, sc.chunk_index
                """
            ).fetchall()
    except sqlite3.OperationalError as exc:
        if "source_chunks" in str(exc) or "source_documents" in str(exc):
            return [], f"missing ingestion tables: {exc}"
        raise
    return [dict(row) for row in rows], ""


def _batches(items: list, batch_size: int) -> list[list]:
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def backfill(
    apply: bool = False,
    project_filter: str = "",
    batch_size: int = DEFAULT_BATCH_SIZE,
    require_chroma_snapshot: bool = False,
) -> dict:
    rows, db_warning = _chunk_rows()
    updates: list[tuple[str, str, dict]] = []
    by_project: dict[str, int] = {}
    skipped_unowned = 0
    already_tagged = 0
    malformed_metadata = 0

    for row in rows:
        project_id = derive_project_id_for_path(row["file_path"])
        if project_filter and project_id != project_filter:
            continue
        if not project_id:
            skipped_unowned += 1
            continue
        metadata = _decode_metadata(row["chunk_metadata"])
        if metadata is None:
            malformed_metadata += 1
            continue
        if metadata.get("project_id") == project_id:
            already_tagged += 1
            continue
        metadata["project_id"] = project_id
        updates.append((row["chunk_id"], project_id, metadata))
        by_project[project_id] = by_project.get(project_id, 0) + 1

    missing_vectors: list[str] = []
    applied_updates = 0
    if apply and updates:
        if not require_chroma_snapshot:
            raise ValueError(
                "--apply requires --chroma-snapshot-confirmed after taking a Chroma backup"
            )
        vector_store = get_vector_store()
        for batch in _batches(updates, max(1, batch_size)):
            chunk_ids = [chunk_id for chunk_id, _, _ in batch]
            vector_payload = vector_store.get_metadatas(chunk_ids)
            existing_vector_metadata = {
                chunk_id: metadata
                for chunk_id, metadata in zip(
                    vector_payload.get("ids", []),
                    vector_payload.get("metadatas", []),
                    strict=False,
                )
                if isinstance(metadata, dict)
            }

            vector_ids = []
            vector_metadatas = []
            sql_updates = []
            for chunk_id, project_id, chunk_metadata in batch:
                vector_metadata = existing_vector_metadata.get(chunk_id)
                if vector_metadata is None:
                    missing_vectors.append(chunk_id)
                    continue
                vector_metadata = dict(vector_metadata)
                vector_metadata["project_id"] = project_id
                vector_ids.append(chunk_id)
                vector_metadatas.append(vector_metadata)
                sql_updates.append((json.dumps(chunk_metadata, ensure_ascii=True), chunk_id))

            if not vector_ids:
                continue

            vector_store.update_metadatas(vector_ids, vector_metadatas)
            with get_connection() as conn:
                cursor = conn.executemany(
                    "UPDATE source_chunks SET metadata = ? WHERE id = ?",
                    sql_updates,
                )
                if cursor.rowcount != len(sql_updates):
                    raise RuntimeError(
                        f"SQLite rowcount mismatch: {cursor.rowcount} != {len(sql_updates)}"
                    )
            applied_updates += len(sql_updates)

    return {
        "apply": apply,
        "total_chunks": len(rows),
        "updates": len(updates),
        "applied_updates": applied_updates,
        "already_tagged": already_tagged,
        "skipped_unowned": skipped_unowned,
        "malformed_metadata": malformed_metadata,
        "missing_vectors": len(missing_vectors),
        "db_warning": db_warning,
        "by_project": dict(sorted(by_project.items())),
    }


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--apply", action="store_true", help="write SQLite and Chroma metadata updates")
    parser.add_argument("--project", default="", help="optional canonical project_id filter")
    parser.add_argument("--batch-size", type=int, default=DEFAULT_BATCH_SIZE)
    parser.add_argument(
        "--chroma-snapshot-confirmed",
        action="store_true",
        help="required with --apply; confirms a Chroma snapshot exists",
    )
    args = parser.parse_args()

    payload = backfill(
        apply=args.apply,
        project_filter=args.project.strip(),
        batch_size=args.batch_size,
        require_chroma_snapshot=args.chroma_snapshot_confirmed,
    )
    print(json.dumps(payload, indent=2, ensure_ascii=True))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
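A typical run order, following the flags defined above — report first, scope if needed, then apply only after confirming a Chroma snapshot exists:

```
python scripts/backfill_chunk_project_ids.py                                # dry-run report
python scripts/backfill_chunk_project_ids.py --project p05-interferometer  # scoped dry-run
python scripts/backfill_chunk_project_ids.py --apply --chroma-snapshot-confirmed
```

The `--chroma-snapshot-confirmed` gate is deliberate: `backfill()` raises a `ValueError` on `--apply` without it, so an accidental apply cannot mutate vector metadata before a backup exists.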
131 scripts/live_status.py Normal file
@@ -0,0 +1,131 @@
"""Render a compact live-status report from a running AtoCore instance.

This is intentionally read-only and stdlib-only so it can be used from a
fresh checkout, a cron job, or a Codex/Claude session without installing the
full app package. The output is meant to reduce docs drift: copy the report
into status docs only after it was generated from the live service.
"""

from __future__ import annotations

import argparse
import errno
import json
import os
import sys
import urllib.error
import urllib.request
from typing import Any

DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100").rstrip("/")
DEFAULT_TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30"))
DEFAULT_AUTH_TOKEN = os.environ.get("ATOCORE_AUTH_TOKEN", "").strip()


def request_json(base_url: str, path: str, timeout: int, auth_token: str = "") -> dict[str, Any]:
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    req = urllib.request.Request(f"{base_url}{path}", method="GET", headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        body = response.read().decode("utf-8")
        status = getattr(response, "status", None)
    payload = json.loads(body) if body.strip() else {}
    if not isinstance(payload, dict):
        payload = {"value": payload}
    if status is not None:
        payload["_http_status"] = status
    return payload


def collect_status(base_url: str, timeout: int, auth_token: str = "") -> dict[str, Any]:
    payload: dict[str, Any] = {"base_url": base_url}
    for name, path in {
        "health": "/health",
        "stats": "/stats",
        "projects": "/projects",
        "dashboard": "/admin/dashboard",
    }.items():
        try:
            payload[name] = request_json(base_url, path, timeout, auth_token)
        except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
            payload[name] = {"error": str(exc)}
    return payload


def render_markdown(status: dict[str, Any]) -> str:
    health = status.get("health", {})
    stats = status.get("stats", {})
    projects = status.get("projects", {}).get("projects", [])
    dashboard = status.get("dashboard", {})
    memories = dashboard.get("memories", {}) if isinstance(dashboard.get("memories"), dict) else {}
    project_state = dashboard.get("project_state", {}) if isinstance(dashboard.get("project_state"), dict) else {}
    interactions = dashboard.get("interactions", {}) if isinstance(dashboard.get("interactions"), dict) else {}
    pipeline = dashboard.get("pipeline", {}) if isinstance(dashboard.get("pipeline"), dict) else {}

    lines = [
        "# AtoCore Live Status",
        "",
        f"- base_url: `{status.get('base_url', '')}`",
        "- endpoint_http_statuses: "
        f"`health={health.get('_http_status', 'error')}, "
        f"stats={stats.get('_http_status', 'error')}, "
        f"projects={status.get('projects', {}).get('_http_status', 'error')}, "
        f"dashboard={dashboard.get('_http_status', 'error')}`",
        f"- service_status: `{health.get('status', 'unknown')}`",
        f"- code_version: `{health.get('code_version', health.get('version', 'unknown'))}`",
        f"- build_sha: `{health.get('build_sha', 'unknown')}`",
        f"- build_branch: `{health.get('build_branch', 'unknown')}`",
        f"- build_time: `{health.get('build_time', 'unknown')}`",
        f"- env: `{health.get('env', 'unknown')}`",
        f"- documents: `{stats.get('total_documents', 'unknown')}`",
        f"- chunks: `{stats.get('total_chunks', 'unknown')}`",
        f"- vectors: `{stats.get('total_vectors', health.get('vectors_count', 'unknown'))}`",
        f"- registered_projects: `{len(projects)}`",
        f"- active_memories: `{memories.get('active', 'unknown')}`",
        f"- candidate_memories: `{memories.get('candidates', 'unknown')}`",
        f"- interactions: `{interactions.get('total', 'unknown')}`",
        f"- project_state_entries: `{project_state.get('total', 'unknown')}`",
        f"- pipeline_last_run: `{pipeline.get('last_run', 'unknown')}`",
    ]

    if projects:
        lines.extend(["", "## Projects"])
        for project in projects:
            aliases = ", ".join(project.get("aliases", []))
            suffix = f" ({aliases})" if aliases else ""
            lines.append(f"- `{project.get('id', '')}`{suffix}")

    return "\n".join(lines) + "\n"


def main() -> int:
    parser = argparse.ArgumentParser(description="Render live AtoCore status")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
    parser.add_argument("--timeout", type=int, default=DEFAULT_TIMEOUT)
    parser.add_argument(
        "--auth-token",
        default=DEFAULT_AUTH_TOKEN,
        help="Bearer token; defaults to ATOCORE_AUTH_TOKEN when set",
    )
    parser.add_argument("--json", action="store_true", help="emit raw JSON")
    args = parser.parse_args()

    status = collect_status(args.base_url.rstrip("/"), args.timeout, args.auth_token)
    if args.json:
        output = json.dumps(status, indent=2, ensure_ascii=True) + "\n"
    else:
        output = render_markdown(status)

    try:
        sys.stdout.write(output)
    except BrokenPipeError:
        return 0
    except OSError as exc:
        if exc.errno in {errno.EINVAL, errno.EPIPE}:
            return 0
        raise
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
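Typical invocations, using only the flags and environment variables defined above:

```
python scripts/live_status.py                  # markdown report against http://dalidou:8100
python scripts/live_status.py --json           # raw JSON for scripting
ATOCORE_BASE_URL=http://localhost:8100 python scripts/live_status.py --timeout 10
```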
@@ -44,6 +44,7 @@ import urllib.error
 import urllib.parse
 import urllib.request
 from dataclasses import dataclass, field
+from datetime import datetime, timezone
 from pathlib import Path

 DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100")
@@ -52,6 +53,13 @@ DEFAULT_BUDGET = 3000
 DEFAULT_FIXTURES = Path(__file__).parent / "retrieval_eval_fixtures.json"


+def request_json(base_url: str, path: str, timeout: int) -> dict:
+    req = urllib.request.Request(f"{base_url}{path}", method="GET")
+    with urllib.request.urlopen(req, timeout=timeout) as resp:
+        body = resp.read().decode("utf-8")
+    return json.loads(body) if body.strip() else {}
+
+
 @dataclass
 class Fixture:
     name: str
@@ -60,6 +68,7 @@ class Fixture:
     budget: int = DEFAULT_BUDGET
     expect_present: list[str] = field(default_factory=list)
     expect_absent: list[str] = field(default_factory=list)
+    known_issue: bool = False
     notes: str = ""


@@ -70,8 +79,13 @@ class FixtureResult:
     missing_present: list[str]
     unexpected_absent: list[str]
     total_chars: int
+    known_issue: bool = False
     error: str = ""

+    @property
+    def blocking_failure(self) -> bool:
+        return not self.ok and not self.known_issue
+

 def load_fixtures(path: Path) -> list[Fixture]:
     data = json.loads(path.read_text(encoding="utf-8"))
@@ -89,6 +103,7 @@ def load_fixtures(path: Path) -> list[Fixture]:
             budget=int(raw.get("budget", DEFAULT_BUDGET)),
             expect_present=list(raw.get("expect_present", [])),
             expect_absent=list(raw.get("expect_absent", [])),
+            known_issue=bool(raw.get("known_issue", False)),
             notes=raw.get("notes", ""),
         )
     )
@@ -117,6 +132,7 @@ def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
             missing_present=list(fixture.expect_present),
             unexpected_absent=[],
             total_chars=0,
+            known_issue=fixture.known_issue,
             error=f"http_error: {exc}",
         )

@@ -129,16 +145,26 @@ def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
         missing_present=missing,
         unexpected_absent=unexpected,
         total_chars=len(formatted),
+        known_issue=fixture.known_issue,
     )


-def print_human_report(results: list[FixtureResult]) -> None:
+def print_human_report(results: list[FixtureResult], metadata: dict) -> None:
     total = len(results)
     passed = sum(1 for r in results if r.ok)
+    known = sum(1 for r in results if not r.ok and r.known_issue)
+    blocking = sum(1 for r in results if r.blocking_failure)
     print(f"Retrieval eval: {passed}/{total} fixtures passed")
+    print(
+        "Target: "
+        f"{metadata.get('base_url', 'unknown')} "
+        f"build={metadata.get('health', {}).get('build_sha', 'unknown')}"
+    )
+    if known or blocking:
+        print(f"Blocking failures: {blocking}  Known issues: {known}")
     print()
     for r in results:
-        marker = "PASS" if r.ok else "FAIL"
+        marker = "PASS" if r.ok else ("KNOWN" if r.known_issue else "FAIL")
         print(f"[{marker}] {r.fixture.name} project={r.fixture.project} chars={r.total_chars}")
         if r.error:
             print(f"  error: {r.error}")
@@ -150,15 +176,21 @@ def print_human_report(results: list[FixtureResult]) -> None:
         print(f"  notes: {r.fixture.notes}")


-def print_json_report(results: list[FixtureResult]) -> None:
+def print_json_report(results: list[FixtureResult], metadata: dict) -> None:
     payload = {
+        "generated_at": metadata.get("generated_at"),
+        "base_url": metadata.get("base_url"),
+        "health": metadata.get("health", {}),
         "total": len(results),
         "passed": sum(1 for r in results if r.ok),
+        "known_issues": sum(1 for r in results if not r.ok and r.known_issue),
+        "blocking_failures": sum(1 for r in results if r.blocking_failure),
         "fixtures": [
             {
                 "name": r.fixture.name,
                 "project": r.fixture.project,
                 "ok": r.ok,
+                "known_issue": r.known_issue,
                 "total_chars": r.total_chars,
                 "missing_present": r.missing_present,
                 "unexpected_absent": r.unexpected_absent,
@@ -179,15 +211,26 @@ def main() -> int:
     parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
     args = parser.parse_args()

+    base_url = args.base_url.rstrip("/")
+    try:
+        health = request_json(base_url, "/health", args.timeout)
+    except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
+        health = {"error": str(exc)}
+    metadata = {
+        "generated_at": datetime.now(timezone.utc).isoformat(),
+        "base_url": base_url,
+        "health": health,
+    }
+
     fixtures = load_fixtures(args.fixtures)
-    results = [run_fixture(f, args.base_url, args.timeout) for f in fixtures]
+    results = [run_fixture(f, base_url, args.timeout) for f in fixtures]

     if args.json:
-        print_json_report(results)
+        print_json_report(results, metadata)
     else:
-        print_human_report(results)
+        print_human_report(results, metadata)

-    return 0 if all(r.ok for r in results) else 1
+    return 0 if not any(r.blocking_failure for r in results) else 1


 if __name__ == "__main__":
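The net effect of `known_issue` on the exit code, in miniature (plain-dict stand-ins for `FixtureResult`):

```python
# A failing fixture marked known_issue reports as KNOWN but does not flip
# the process exit code; only blocking failures do.
results = [
    {"ok": True,  "known_issue": False},  # PASS
    {"ok": False, "known_issue": True},   # KNOWN (e.g. p04-constraints)
    {"ok": False, "known_issue": False},  # FAIL -> blocking
]
blocking = [r for r in results if not r["ok"] and not r["known_issue"]]
exit_code = 0 if not blocking else 1
assert exit_code == 1  # drop the last entry and the run would exit 0
```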
@@ -27,7 +27,7 @@
       "expect_absent": [
         "polisher suite"
       ],
-      "notes": "Key constraints are in Trusted Project State and in the mission-framing memory"
+      "notes": "Regression guard: query-relevant Trusted Project State requirements must survive the project-state budget cap."
     },
     {
       "name": "p04-short-ambiguous",
@@ -80,6 +80,36 @@
       ],
       "notes": "CGH is a core p05 concept. Should surface via chunks and possibly the architecture memory. Must not bleed p06 polisher-suite terms."
     },
+    {
+      "name": "p05-broad-status-no-atomizer",
+      "project": "p05-interferometer",
+      "prompt": "current status",
+      "expect_present": [
+        "--- Trusted Project State ---",
+        "--- Project Memories ---",
+        "Zygo"
+      ],
+      "expect_absent": [
+        "atomizer-v2",
+        "ATOMIZER_PODCAST_BRIEFING",
+        "[Source: atomizer-v2/",
+        "P04-GigaBIT-M1-KB-design"
+      ],
+      "notes": "Regression guard for the April 24 audit finding: broad p05 status queries must not pull Atomizer/archive context into project-scoped packs."
+    },
+    {
+      "name": "p05-vendor-decision-no-archive-first",
+      "project": "p05-interferometer",
+      "prompt": "vendor selection decision",
+      "expect_present": [
+        "Selection-Decision"
+      ],
+      "expect_absent": [
+        "[Source: atomizer-v2/",
+        "ATOMIZER_PODCAST_BRIEFING"
+      ],
+      "notes": "Project-scoped decision query should stay inside p05 and prefer current decision/vendor material over unrelated project archives."
+    },
     {
       "name": "p06-suite-split",
       "project": "p06-polisher",
167 scripts/v1_0_backfill_provenance.py Normal file
@@ -0,0 +1,167 @@
"""V1-0 one-time backfill: flag existing active entities that have no
provenance (empty source_refs) as hand_authored=1 so they stop failing
the F-8 invariant.

Runs against the live AtoCore DB. Idempotent: a second run after the
first touches nothing because the flagged rows already have
hand_authored=1.

Per the Engineering V1 Completion Plan (V1-0 scope), the three options
for an existing active entity without provenance are:

1. Attach provenance — impossible without human review, not automatable
2. Flag hand-authored — safe, additive, this script's default
3. Invalidate — destructive, requires operator sign-off

This script picks option (2) by default. Add --dry-run to see what
would change without writing. Add --invalidate-instead to pick option
(3) for all rows (not recommended for a first run).

Usage:
    python scripts/v1_0_backfill_provenance.py --dry-run
    python scripts/v1_0_backfill_provenance.py
"""

from __future__ import annotations

import argparse
import json
import sqlite3
import sys
from pathlib import Path


def run(db_path: Path, dry_run: bool, invalidate_instead: bool) -> int:
    if not db_path.exists():
        print(f"ERROR: db not found: {db_path}", file=sys.stderr)
        return 2

    conn = sqlite3.connect(str(db_path))
    conn.row_factory = sqlite3.Row

    # Verify the V1-0 migration ran: if the hand_authored column is missing
    # the operator hasn't deployed V1-0 yet, and running this script
    # would crash. Fail loud rather than attempt the ALTER here.
    cols = {r["name"] for r in conn.execute("PRAGMA table_info(entities)").fetchall()}
    if "hand_authored" not in cols:
        print(
            "ERROR: entities table lacks the hand_authored column. "
            "Deploy V1-0 migrations first (init_db + init_engineering_schema).",
            file=sys.stderr,
        )
        return 2

    # Scope differs by mode:
    # - Default (flag hand_authored=1): safe/additive, applies to active
    #   AND superseded rows so the historical trail is consistent.
    # - --invalidate-instead: destructive — scope to ACTIVE rows only.
    #   Invalidating already-superseded history would collapse the audit
    #   trail, which the plan's remediation scope never intended
    #   (V1-0 talks about existing active no-provenance entities).
    if invalidate_instead:
        scope_sql = "status = 'active'"
    else:
        scope_sql = "status IN ('active', 'superseded')"
    rows = conn.execute(
        f"SELECT id, entity_type, name, project, status, source_refs, hand_authored "
        f"FROM entities WHERE {scope_sql} AND hand_authored = 0"
    ).fetchall()

    needs_fix = []
    for row in rows:
        refs_raw = row["source_refs"] or "[]"
        try:
            refs = json.loads(refs_raw)
        except Exception:
            refs = []
        if not refs:
            needs_fix.append(row)

    print(f"found {len(needs_fix)} active/superseded entities with no provenance")
    for row in needs_fix:
        print(
            f"  - {row['id'][:8]} [{row['entity_type']}] "
            f"{row['name']!r} project={row['project']!r} status={row['status']}"
        )

    if dry_run:
        print("--dry-run: no changes written")
        return 0

    if not needs_fix:
        print("nothing to do")
        return 0

    action = "invalidate" if invalidate_instead else "flag hand_authored=1"
    print(f"applying: {action}")

    cur = conn.cursor()
    for row in needs_fix:
        if invalidate_instead:
            cur.execute(
                "UPDATE entities SET status = 'invalid', "
                "updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                (row["id"],),
            )
            cur.execute(
                "INSERT INTO memory_audit "
                "(id, memory_id, action, actor, before_json, after_json, note, entity_kind) "
                "VALUES (?, ?, 'invalidated', 'v1_0_backfill', ?, ?, ?, 'entity')",
                (
                    f"v10bf-{row['id'][:8]}-inv",
                    row["id"],
                    json.dumps({"status": row["status"]}),
                    json.dumps({"status": "invalid"}),
                    "V1-0 backfill: invalidated, no provenance",
                ),
            )
        else:
            cur.execute(
                "UPDATE entities SET hand_authored = 1, "
                "updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                (row["id"],),
            )
            cur.execute(
                "INSERT INTO memory_audit "
                "(id, memory_id, action, actor, before_json, after_json, note, entity_kind) "
                "VALUES (?, ?, 'hand_authored_flagged', 'v1_0_backfill', ?, ?, ?, 'entity')",
                (
                    f"v10bf-{row['id'][:8]}-ha",
                    row["id"],
                    json.dumps({"hand_authored": False}),
                    json.dumps({"hand_authored": True}),
                    "V1-0 backfill: flagged hand_authored since source_refs empty",
                ),
            )

    conn.commit()
    print(f"done: updated {len(needs_fix)} entities")
    return 0


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--db",
        type=Path,
        default=Path("data/db/atocore.db"),
        help="Path to the SQLite database (default: data/db/atocore.db)",
    )
    parser.add_argument("--dry-run", action="store_true", help="Report only; no writes")
    parser.add_argument(
        "--invalidate-instead",
        action="store_true",
        help=(
            "DESTRUCTIVE. Invalidate active rows with no provenance instead "
            "of flagging them hand_authored. Scoped to status='active' only "
            "(superseded rows are left alone to preserve audit history). "
            "Not recommended for a first run — start with --dry-run, then "
            "the default hand_authored flag path."
        ),
    )
    args = parser.parse_args()
    return run(args.db, args.dry_run, args.invalidate_instead)


if __name__ == "__main__":
    sys.exit(main())
@@ -1457,6 +1457,11 @@ class EntityCreateRequest(BaseModel):
     status: str = "active"
     confidence: float = 1.0
     source_refs: list[str] | None = None
+    # V1-0 provenance enforcement (F-8). Clients must either pass
+    # non-empty source_refs or set hand_authored=true. The service layer
+    # raises ValueError otherwise, surfaced here as 400.
+    hand_authored: bool = False
+    extractor_version: str | None = None


 class EntityPromoteRequest(BaseModel):
@@ -1486,6 +1491,8 @@ def api_create_entity(req: EntityCreateRequest) -> dict:
             confidence=req.confidence,
             source_refs=req.source_refs,
             actor="api-http",
+            hand_authored=req.hand_authored,
+            extractor_version=req.extractor_version,
         )
     except ValueError as e:
         raise HTTPException(status_code=400, detail=str(e))
@@ -2180,12 +2187,17 @@ def api_promote_entity(
     from atocore.engineering.service import promote_entity
     target_project = req.target_project if req is not None else None
     note = req.note if req is not None else ""
-    success = promote_entity(
-        entity_id,
-        actor="api-http",
-        note=note,
-        target_project=target_project,
-    )
+    try:
+        success = promote_entity(
+            entity_id,
+            actor="api-http",
+            note=note,
+            target_project=target_project,
+        )
+    except ValueError as e:
+        # V1-0 F-8 re-check raises ValueError for no-provenance candidates
+        # (see service.promote_entity). Surface as 400, not 500.
+        raise HTTPException(status_code=400, detail=str(e))
     if not success:
         raise HTTPException(status_code=404, detail=f"Entity not found or not a candidate: {entity_id}")
     result = {"status": "promoted", "id": entity_id}
@@ -46,6 +46,8 @@ class Settings(BaseSettings):
     # All multipliers default to the values used since Wave 1; tighten or
     # loosen them via ATOCORE_* env vars without touching code.
     rank_project_match_boost: float = 2.0
+    rank_project_scope_filter: bool = True
+    rank_project_scope_candidate_multiplier: int = 4
     rank_query_token_step: float = 0.08
     rank_query_token_cap: float = 1.32
     rank_path_high_signal_boost: float = 1.18
@@ -11,7 +11,7 @@ from dataclasses import dataclass, field
 from pathlib import Path
 
 import atocore.config as _config
-from atocore.context.project_state import format_project_state, get_state
+from atocore.context.project_state import ProjectStateEntry, format_project_state, get_state
 from atocore.memory.service import get_memories_for_context
 from atocore.observability.logger import get_logger
 from atocore.engineering.service import get_entities, get_entity_with_context
@@ -116,6 +116,11 @@ def build_context(
     if canonical_project:
         state_entries = get_state(canonical_project)
         if state_entries:
+            state_entries = _rank_project_state_entries(
+                state_entries,
+                query=user_prompt,
+                project=canonical_project,
+            )
             project_state_text = format_project_state(state_entries)
             project_state_text, project_state_chars = _truncate_text_block(
                 project_state_text,
@@ -284,6 +289,55 @@ def get_last_context_pack() -> ContextPack | None:
     return _last_context_pack
 
 
+def _rank_project_state_entries(
+    entries: list[ProjectStateEntry],
+    query: str,
+    project: str,
+) -> list[ProjectStateEntry]:
+    """Promote query-relevant trusted state before the state band is truncated."""
+    if not query or len(entries) <= 1:
+        return entries
+
+    from atocore.memory.reinforcement import _normalize, _tokenize
+
+    query_text = _normalize(query.replace("_", " "))
+    query_tokens = set(_tokenize(query_text))
+    query_tokens -= {
+        "how",
+        "what",
+        "when",
+        "where",
+        "which",
+        "who",
+        "why",
+        "current",
+        "status",
+        "project",
+    }
+    for part in (project or "").lower().replace("_", "-").split("-"):
+        query_tokens.discard(part)
+    if not query_tokens:
+        return entries
+
+    scored: list[tuple[int, float, float, int, ProjectStateEntry]] = []
+    for index, entry in enumerate(entries):
+        entry_text = " ".join(
+            [
+                entry.category,
+                entry.key.replace("_", " "),
+                entry.value,
+                entry.source,
+            ]
+        )
+        entry_tokens = _tokenize(_normalize(entry_text))
+        overlap = len(entry_tokens & query_tokens) if entry_tokens else 0
+        density = overlap / len(entry_tokens) if entry_tokens else 0.0
+        scored.append((overlap, density, entry.confidence, -index, entry))
+
+    scored.sort(key=lambda item: (item[0], item[1], item[2], item[3]), reverse=True)
+    return [entry for _, _, _, _, entry in scored]
+
+
 def _rank_chunks(
     candidates: list[ChunkResult],
     project_hint: str | None,
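To make the effect of that sort key concrete, here is a tiny self-contained sketch (toy data, not the real ProjectStateEntry type) of the overlap-then-density ordering:

```python
# Toy illustration of the primary sort key used above: higher lexical
# overlap wins; density (overlap / entry length) breaks ties.
entries = [
    ("contact", {"abb", "space", "vendor"}),
    ("key_constraints", {"zerodur", "mirror", "constraints", "mass"}),
]
query_tokens = {"key", "constraints", "zerodur"}

def key(item):
    name, tokens = item
    overlap = len(tokens & query_tokens)
    density = overlap / len(tokens)
    return (overlap, density)

ranked = sorted(entries, key=key, reverse=True)
assert ranked[0][0] == "key_constraints"  # 2 overlapping tokens beats 0
```
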
@@ -63,6 +63,12 @@ RELATIONSHIP_TYPES = [
 
 ENTITY_STATUSES = ["candidate", "active", "superseded", "invalid"]
 
+# V1-0: extractor version this module writes into new entity rows.
+# Per promotion-rules.md:268, every candidate must record the version of
+# the extractor that produced it so later re-evaluation is auditable.
+# Bump this when extraction logic materially changes.
+EXTRACTOR_VERSION = "v1.0.0"
+
 
 @dataclass
 class Entity:
@@ -77,6 +83,10 @@ class Entity:
     source_refs: list[str] = field(default_factory=list)
     created_at: str = ""
     updated_at: str = ""
+    # V1-0 shared-header fields per engineering-v1-acceptance.md:45.
+    extractor_version: str = ""
+    canonical_home: str = "entity"
+    hand_authored: bool = False
 
 
 @dataclass
@@ -103,10 +113,25 @@ def init_engineering_schema() -> None:
             status TEXT NOT NULL DEFAULT 'active',
             confidence REAL NOT NULL DEFAULT 1.0,
             source_refs TEXT NOT NULL DEFAULT '[]',
+            extractor_version TEXT NOT NULL DEFAULT '',
+            canonical_home TEXT NOT NULL DEFAULT 'entity',
+            hand_authored INTEGER NOT NULL DEFAULT 0,
             created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
             updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
         )
     """)
+    # V1-0 (Engineering V1 completion): the three shared-header fields
+    # per engineering-v1-acceptance.md:45. Idempotent ALTERs for
+    # databases created before V1-0 land these columns without a full
+    # migration. Fresh DBs get them via the CREATE TABLE above; the
+    # ALTERs below are a no-op there.
+    from atocore.models.database import _column_exists  # late import; avoids cycle
+    if not _column_exists(conn, "entities", "extractor_version"):
+        conn.execute("ALTER TABLE entities ADD COLUMN extractor_version TEXT DEFAULT ''")
+    if not _column_exists(conn, "entities", "canonical_home"):
+        conn.execute("ALTER TABLE entities ADD COLUMN canonical_home TEXT DEFAULT 'entity'")
+    if not _column_exists(conn, "entities", "hand_authored"):
+        conn.execute("ALTER TABLE entities ADD COLUMN hand_authored INTEGER DEFAULT 0")
     conn.execute("""
         CREATE TABLE IF NOT EXISTS relationships (
             id TEXT PRIMARY KEY,
@@ -149,6 +174,8 @@ def create_entity(
     confidence: float = 1.0,
     source_refs: list[str] | None = None,
     actor: str = "api",
+    hand_authored: bool = False,
+    extractor_version: str | None = None,
 ) -> Entity:
     if entity_type not in ENTITY_TYPES:
         raise ValueError(f"Invalid entity type: {entity_type}. Must be one of {ENTITY_TYPES}")
@@ -157,6 +184,21 @@ def create_entity(
     if not name or not name.strip():
         raise ValueError("Entity name must be non-empty")
 
+    refs = list(source_refs) if source_refs else []
+
+    # V1-0 (F-8 provenance enforcement, engineering-v1-acceptance.md:147):
+    # every new entity row must carry non-empty source_refs OR be explicitly
+    # flagged hand_authored. This is the non-negotiable invariant every
+    # later V1 phase depends on — without it, active entities can escape
+    # into the graph with no traceable origin. Raises at the write seam so
+    # the bug is impossible to introduce silently.
+    if not refs and not hand_authored:
+        raise ValueError(
+            "source_refs required: every entity must carry provenance "
+            "(source_chunk_id / source_interaction_id / kb_cad_export_id / ...) "
+            "or set hand_authored=True to explicitly flag a direct human write"
+        )
+
     # Phase 5: enforce project canonicalization contract at the write seam.
     # Aliases like "p04" become "p04-gigabit" so downstream reads stay
     # consistent with the registry.
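From the caller's side, the F-8 invariant reduces to three cases; a minimal sketch using the signature above (the source_refs string format is illustrative, not confirmed by this diff):

```python
from atocore.engineering.service import create_entity

# Extracted rows must carry provenance (ref format hypothetical):
ok = create_entity(
    "requirement", "WFE < 15 nm", project="p04",
    source_refs=["source_chunk_id:abc123"],
)

# A direct human write must say so explicitly:
manual = create_entity("component", "M1 blank", project="p04", hand_authored=True)

# Anything else is rejected before the INSERT:
try:
    create_entity("component", "mystery part", project="p04")
except ValueError as e:
    assert "source_refs required" in str(e)
```
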
@@ -165,18 +207,22 @@ def create_entity(
     entity_id = str(uuid.uuid4())
     now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
     props = properties or {}
-    refs = source_refs or []
+    ev = extractor_version if extractor_version is not None else EXTRACTOR_VERSION
 
     with get_connection() as conn:
         conn.execute(
             """INSERT INTO entities
             (id, entity_type, name, project, description, properties,
-            status, confidence, source_refs, created_at, updated_at)
-            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+            status, confidence, source_refs,
+            extractor_version, canonical_home, hand_authored,
+            created_at, updated_at)
+            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
             (
                 entity_id, entity_type, name.strip(), project,
                 description, json.dumps(props), status, confidence,
-                json.dumps(refs), now, now,
+                json.dumps(refs),
+                ev, "entity", 1 if hand_authored else 0,
+                now, now,
             ),
         )
@@ -194,14 +240,31 @@ def create_entity(
             "project": project,
             "status": status,
             "confidence": confidence,
+            "hand_authored": hand_authored,
+            "extractor_version": ev,
         },
     )
 
+    # V1-0 (F-5 hook, engineering-v1-acceptance.md:99): synchronous
+    # conflict detection on any active-entity write. The promote path
+    # already had this hook (see promote_entity below); V1-0 adds it to
+    # direct-active creates so every active row — however it got that
+    # way — is checked. Fail-open per "flag, never block" rule in
+    # conflict-model.md:256: detector errors log but never fail the write.
+    if status == "active":
+        try:
+            from atocore.engineering.conflicts import detect_conflicts_for_entity
+            detect_conflicts_for_entity(entity_id)
+        except Exception as e:
+            log.warning("conflict_detection_failed", entity_id=entity_id, error=str(e))
+
     return Entity(
         id=entity_id, entity_type=entity_type, name=name.strip(),
         project=project, description=description, properties=props,
         status=status, confidence=confidence, source_refs=refs,
         created_at=now, updated_at=now,
+        extractor_version=ev, canonical_home="entity",
+        hand_authored=hand_authored,
     )
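The "flag, never block" error handling is the same shape in every hook that follows; distilled to its skeleton (an illustrative helper, not code from this branch):

```python
# Fail-open detection: the detector may flag conflicts, but a detector
# crash must never fail the write it is attached to.
def run_detector_fail_open(detect, log, **log_fields):
    try:
        detect()
    except Exception as e:  # deliberate broad catch: never block the write
        log.warning("conflict_detection_failed", error=str(e), **log_fields)
```
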
@@ -361,6 +424,20 @@ def promote_entity(
     if entity is None or entity.status != "candidate":
         return False
 
+    # V1-0 (F-8 provenance re-check at promote). The invariant must hold at
+    # BOTH create_entity AND promote_entity per the plan, because candidate
+    # rows can exist in the DB from before V1-0 (no enforcement at their
+    # create time) or can be inserted by code paths that bypass the service
+    # layer. Block any candidate with empty source_refs that is NOT flagged
+    # hand_authored from ever becoming active. Same error shape as the
+    # create-side check for symmetry.
+    if not (entity.source_refs or []) and not entity.hand_authored:
+        raise ValueError(
+            "source_refs required: cannot promote a candidate with no "
+            "provenance. Attach source_refs via PATCH /entities/{id}, "
+            "or flag hand_authored=true before promoting."
+        )
+
     if target_project is not None:
         new_project = (
             resolve_project_name(target_project) if target_project else ""
@@ -503,6 +580,22 @@ def supersede_entity(
                 superseded_by=superseded_by,
                 error=str(e),
             )
+
+    # V1-0 (F-5 hook on supersede, per plan's "every active-entity
+    # write path"). Supersede demotes `entity_id` AND adds a
+    # `supersedes` relationship rooted at the already-active
+    # `superseded_by`. That new edge can create a conflict the
+    # detector should catch synchronously. Fail-open per
+    # conflict-model.md:256.
+    try:
+        from atocore.engineering.conflicts import detect_conflicts_for_entity
+        detect_conflicts_for_entity(superseded_by)
+    except Exception as e:
+        log.warning(
+            "conflict_detection_failed",
+            entity_id=superseded_by,
+            error=str(e),
+        )
     return True
@@ -774,6 +867,15 @@ def get_entity_with_context(entity_id: str) -> dict | None:
 
 
 def _row_to_entity(row) -> Entity:
+    # V1-0 shared-header fields are optional on read — rows that predate
+    # V1-0 migration have NULL / missing values, so defaults kick in and
+    # older tests that build Entity() without the new fields keep passing.
+    # `row.keys()` lets us tolerate SQLite rows that lack the columns
+    # entirely (pre-migration sqlite3.Row).
+    keys = set(row.keys())
+    extractor_version = (row["extractor_version"] or "") if "extractor_version" in keys else ""
+    canonical_home = (row["canonical_home"] or "entity") if "canonical_home" in keys else "entity"
+    hand_authored = bool(row["hand_authored"]) if "hand_authored" in keys and row["hand_authored"] is not None else False
     return Entity(
         id=row["id"],
         entity_type=row["entity_type"],
@@ -786,6 +888,9 @@ def _row_to_entity(row) -> Entity:
         source_refs=json.loads(row["source_refs"] or "[]"),
         created_at=row["created_at"] or "",
         updated_at=row["updated_at"] or "",
+        extractor_version=extractor_version,
+        canonical_home=canonical_home,
+        hand_authored=hand_authored,
     )
@@ -391,6 +391,8 @@ def render_new_entity_form(name: str = "", project: str = "") -> str:
         entity_type: fd.get('entity_type'),
         project: fd.get('project') || '',
         description: fd.get('description') || '',
+        // V1-0: human writes via the wiki form are hand_authored by definition.
+        hand_authored: true,
       };
       try {
         const r = await fetch('/v1/entities', {
@@ -32,10 +32,23 @@ def exclusive_ingestion():
         _INGESTION_LOCK.release()
 
 
-def ingest_file(file_path: Path) -> dict:
+def ingest_file(file_path: Path, project_id: str = "") -> dict:
     """Ingest a single markdown file. Returns stats."""
     start = time.time()
     file_path = file_path.resolve()
+    project_id = (project_id or "").strip()
+    if not project_id:
+        try:
+            from atocore.projects.registry import derive_project_id_for_path
+
+            project_id = derive_project_id_for_path(file_path)
+        except Exception as exc:
+            log.warning(
+                "project_id_derivation_failed",
+                file_path=str(file_path),
+                error=str(exc),
+            )
+            project_id = ""
+
     if not file_path.exists():
         raise FileNotFoundError(f"File not found: {file_path}")
@@ -65,6 +78,7 @@ def ingest_file(file_path: Path) -> dict:
         "source_file": str(file_path),
         "tags": parsed.tags,
         "title": parsed.title,
+        "project_id": project_id,
     }
     chunks = chunk_markdown(parsed.body, base_metadata=base_meta)
 
@@ -116,6 +130,7 @@ def ingest_file(file_path: Path) -> dict:
             "source_file": str(file_path),
             "tags": json.dumps(parsed.tags),
             "title": parsed.title,
+            "project_id": project_id,
         })
 
         conn.execute(
@@ -173,7 +188,17 @@ def ingest_folder(folder_path: Path, purge_deleted: bool = True) -> list[dict]:
         purge_deleted: If True, remove DB/vector entries for files
             that no longer exist on disk.
     """
+    return ingest_project_folder(folder_path, purge_deleted=purge_deleted, project_id="")
+
+
+def ingest_project_folder(
+    folder_path: Path,
+    purge_deleted: bool = True,
+    project_id: str = "",
+) -> list[dict]:
+    """Ingest a folder and annotate chunks with an optional project id."""
     folder_path = folder_path.resolve()
+    project_id = (project_id or "").strip()
     if not folder_path.is_dir():
         raise NotADirectoryError(f"Not a directory: {folder_path}")
@@ -187,7 +212,7 @@ def ingest_folder(folder_path: Path, purge_deleted: bool = True) -> list[dict]:
     # Ingest new/changed files
     for md_file in md_files:
         try:
-            result = ingest_file(md_file)
+            result = ingest_file(md_file, project_id=project_id)
             results.append(result)
         except Exception as e:
             log.error("ingestion_error", file_path=str(md_file), error=str(e))
@@ -50,6 +50,9 @@ MEMORY_STATUSES = [
     "graduated",  # Phase 5: memory has become an entity; content frozen, forward pointer in properties
 ]
 
+DEFAULT_CONTEXT_MEMORY_LIMIT = 30
+QUERY_CONTEXT_MEMORY_LIMIT = 120
+
 
 @dataclass
 class Memory:
@@ -896,6 +899,7 @@ def get_memories_for_context(
     from atocore.memory.reinforcement import _normalize, _tokenize
 
     query_tokens = _tokenize(_normalize(query))
+    query_tokens = _prepare_memory_query_tokens(query_tokens, project=project)
     if not query_tokens:
         query_tokens = None
 
@@ -908,12 +912,13 @@ def get_memories_for_context(
     # ``_rank_memories_for_query`` via Python's stable sort.
     pool: list[Memory] = []
     seen_ids: set[str] = set()
+    candidate_limit = QUERY_CONTEXT_MEMORY_LIMIT if query_tokens is not None else DEFAULT_CONTEXT_MEMORY_LIMIT
     for mtype in memory_types:
         for mem in get_memories(
             memory_type=mtype,
             project=project,
             min_confidence=0.5,
-            limit=30,
+            limit=candidate_limit,
         ):
             if mem.id in seen_ids:
                 continue
@@ -980,11 +985,11 @@ def _rank_memories_for_query(
 ) -> list["Memory"]:
     """Rerank a memory list by lexical overlap with a pre-tokenized query.
 
-    Primary key: overlap_density (overlap_count / memory_token_count),
-    which rewards short focused memories that match the query precisely
-    over long overview memories that incidentally share a few tokens.
-    Secondary: absolute overlap count. Tertiary: domain-tag match.
-    Quaternary: confidence.
+    Primary key: absolute overlap count, which keeps a richer memory
+    matching multiple query-intent terms ahead of a short memory that
+    only happens to share one term. Secondary: overlap_density
+    (overlap_count / memory_token_count), so ties still prefer short
+    focused memories. Tertiary: domain-tag match. Quaternary: confidence.
 
     Phase 3: domain_tags contribute a boost when they appear in the
     query text. A memory tagged [optics, thermal] for a query about
@@ -1010,10 +1015,46 @@ def _rank_memories_for_query(
                 tag_hits += 1
 
         scored.append((density, overlap, tag_hits, mem.confidence, mem))
-    scored.sort(key=lambda t: (t[0], t[1], t[2], t[3]), reverse=True)
+    scored.sort(key=lambda t: (t[1], t[0], t[2], t[3]), reverse=True)
     return [mem for _, _, _, _, mem in scored]
 
 
+_MEMORY_QUERY_STOP_TOKENS = {
+    "how",
+    "what",
+    "when",
+    "where",
+    "which",
+    "who",
+    "why",
+    "current",
+    "status",
+    "project",
+    "machine",
+}
+
+_MEMORY_QUERY_TOKEN_EXPANSIONS = {
+    "remotely": {"remote"},
+}
+
+
+def _prepare_memory_query_tokens(
+    query_tokens: set[str],
+    project: str | None = None,
+) -> set[str]:
+    """Remove project-scope noise and add tiny intent-preserving expansions."""
+    prepared = set(query_tokens)
+    for token in list(prepared):
+        prepared.update(_MEMORY_QUERY_TOKEN_EXPANSIONS.get(token, set()))
+
+    prepared -= _MEMORY_QUERY_STOP_TOKENS
+    if project:
+        for part in project.lower().replace("_", "-").split("-"):
+            if part:
+                prepared.discard(part)
+    return prepared
+
+
 def _row_to_memory(row) -> Memory:
     """Convert a DB row to Memory dataclass."""
     import json as _json
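A worked pass through the preparation step, with a hand-made token set (the real `_tokenize` may already drop some of these words):

```python
# Illustrative walk-through of _prepare_memory_query_tokens:
tokens = {"how", "do", "i", "access", "the", "polisher", "remotely"}
prepared = _prepare_memory_query_tokens(tokens, project="p06-polisher")
# 1. expansion: "remotely" also contributes "remote"
# 2. stop tokens: "how" is dropped
# 3. project parts: "p06" and "polisher" are discarded for p06-polisher
assert prepared == {"do", "i", "access", "the", "remotely", "remote"}
```
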
@@ -146,6 +146,28 @@ def _apply_migrations(conn: sqlite3.Connection) -> None:
         "CREATE INDEX IF NOT EXISTS idx_memories_graduated ON memories(graduated_to_entity_id)"
     )
 
+    # V1-0 (Engineering V1 completion): shared header fields per
+    # engineering-v1-acceptance.md:45. Three columns on `entities`:
+    # - extractor_version: which extractor produced this row. Lets old
+    #   candidates be re-evaluated with a newer extractor per
+    #   promotion-rules.md:268.
+    # - canonical_home: which layer holds the canonical record. Always
+    #   "entity" for rows written via create_entity; reserved for future
+    #   cross-layer bookkeeping.
+    # - hand_authored: 1 when the row was created directly by a human
+    #   without source provenance. Enforced by the write path so every
+    #   non-hand-authored row must carry non-empty source_refs (F-8).
+    # The entities table itself is created by init_engineering_schema
+    # (see engineering/service.py); these ALTERs cover existing DBs
+    # where the original CREATE TABLE predates V1-0.
+    if _table_exists(conn, "entities"):
+        if not _column_exists(conn, "entities", "extractor_version"):
+            conn.execute("ALTER TABLE entities ADD COLUMN extractor_version TEXT DEFAULT ''")
+        if not _column_exists(conn, "entities", "canonical_home"):
+            conn.execute("ALTER TABLE entities ADD COLUMN canonical_home TEXT DEFAULT 'entity'")
+        if not _column_exists(conn, "entities", "hand_authored"):
+            conn.execute("ALTER TABLE entities ADD COLUMN hand_authored INTEGER DEFAULT 0")
+
     # Phase 4 (Robustness V1): append-only audit log for memory mutations.
     # Every create/update/promote/reject/supersede/invalidate/reinforce/expire/
     # auto_promote writes one row here. before/after are JSON snapshots of the
@@ -352,6 +374,14 @@ def _column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
     return any(row["name"] == column for row in rows)
 
 
+def _table_exists(conn: sqlite3.Connection, table: str) -> bool:
+    row = conn.execute(
+        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
+        (table,),
+    ).fetchone()
+    return row is not None
+
+
 @contextmanager
 def get_connection() -> Generator[sqlite3.Connection, None, None]:
     """Get a database connection with row factory."""
@@ -8,7 +8,6 @@ from dataclasses import asdict, dataclass
 from pathlib import Path
 
 import atocore.config as _config
-from atocore.ingestion.pipeline import ingest_folder
 
 
 # Reserved pseudo-projects. `inbox` holds pre-project / lead / quote
@@ -260,6 +259,7 @@ def load_project_registry() -> list[RegisteredProject]:
         )
 
     _validate_unique_project_names(projects)
+    _validate_ingest_root_overlaps(projects)
     return projects
 
 
@@ -307,6 +307,28 @@ def resolve_project_name(name: str | None) -> str:
     return name
 
 
+def derive_project_id_for_path(file_path: str | Path) -> str:
+    """Return the registered project that owns a source path, if any."""
+    if not file_path:
+        return ""
+    doc_path = Path(file_path).resolve(strict=False)
+    matches: list[tuple[int, int, str]] = []
+
+    for project in load_project_registry():
+        for source_ref in project.ingest_roots:
+            root_path = _resolve_ingest_root(source_ref)
+            try:
+                doc_path.relative_to(root_path)
+            except ValueError:
+                continue
+            matches.append((len(root_path.parts), len(str(root_path)), project.project_id))
+
+    if not matches:
+        return ""
+    matches.sort(reverse=True)
+    return matches[0][2]
+
+
 def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
     """Ingest all configured source roots for a registered project.
 
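The sort on `(root depth, root length, project_id)` makes the deepest matching ingest root win, so a file under a nested root resolves to the more specific project. A sketch of the expected behavior (paths illustrative):

```python
# Deepest-matching root wins; unowned paths return "".
derive_project_id_for_path("/vault/incoming/projects/p04-gigabit/status.md")
# -> "p04-gigabit"  (sits under that project's ingest root)
derive_project_id_for_path("/vault/random/note.md")
# -> ""             (no registered root owns it)
```
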
@@ -322,6 +344,8 @@ def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
     if project is None:
         raise ValueError(f"Unknown project: {project_name}")
 
+    from atocore.ingestion.pipeline import ingest_project_folder
+
     roots = []
     ingested_count = 0
     skipped_count = 0
@@ -346,7 +370,11 @@ def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
             {
                 **root_result,
                 "status": "ingested",
-                "results": ingest_folder(resolved, purge_deleted=purge_deleted),
+                "results": ingest_project_folder(
+                    resolved,
+                    purge_deleted=purge_deleted,
+                    project_id=project.project_id,
+                ),
             }
         )
         ingested_count += 1
@@ -443,6 +471,33 @@ def _validate_unique_project_names(projects: list[RegisteredProject]) -> None:
         seen[key] = project.project_id
 
 
+def _validate_ingest_root_overlaps(projects: list[RegisteredProject]) -> None:
+    roots: list[tuple[str, Path]] = []
+    for project in projects:
+        for source_ref in project.ingest_roots:
+            roots.append((project.project_id, _resolve_ingest_root(source_ref)))
+
+    for i, (left_project, left_root) in enumerate(roots):
+        for right_project, right_root in roots[i + 1:]:
+            if left_project == right_project:
+                continue
+            try:
+                left_root.relative_to(right_root)
+                overlaps = True
+            except ValueError:
+                try:
+                    right_root.relative_to(left_root)
+                    overlaps = True
+                except ValueError:
+                    overlaps = False
+            if overlaps:
+                raise ValueError(
+                    "Project registry ingest root overlap: "
+                    f"'{left_root}' ({left_project}) and "
+                    f"'{right_root}' ({right_project})"
+                )
+
+
 def _find_name_collisions(
     project_id: str,
     aliases: list[str],
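The containment test behind the validator is just `Path.relative_to`, which succeeds only when one path lies inside the other; a self-contained sketch of the idea (paths illustrative):

```python
from pathlib import PurePosixPath

left = PurePosixPath("/vault/incoming/projects")
right = PurePosixPath("/vault/incoming/projects/p04-gigabit")

def overlaps(a: PurePosixPath, b: PurePosixPath) -> bool:
    # Same test the validator uses, tried in both directions.
    for x, y in ((a, b), (b, a)):
        try:
            x.relative_to(y)
            return True
        except ValueError:
            continue
    return False

assert overlaps(left, right)  # nested roots are rejected at registry load
```
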
@@ -1,5 +1,6 @@
 """Retrieval: query to ranked chunks."""
 
+import json
 import re
 import time
 from dataclasses import dataclass
@@ -7,7 +8,7 @@ from dataclasses import dataclass
 import atocore.config as _config
 from atocore.models.database import get_connection
 from atocore.observability.logger import get_logger
-from atocore.projects.registry import get_registered_project
+from atocore.projects.registry import RegisteredProject, get_registered_project, load_project_registry
 from atocore.retrieval.embeddings import embed_query
 from atocore.retrieval.vector_store import get_vector_store
@@ -83,6 +84,27 @@ def retrieve(
     """Retrieve the most relevant chunks for a query."""
     top_k = top_k or _config.settings.context_top_k
     start = time.time()
+    try:
+        scoped_project = get_registered_project(project_hint) if project_hint else None
+    except Exception as exc:
+        log.warning(
+            "project_scope_resolution_failed",
+            project_hint=project_hint,
+            error=str(exc),
+        )
+        scoped_project = None
+    scope_filter_enabled = bool(scoped_project and _config.settings.rank_project_scope_filter)
+    registered_projects = None
+    query_top_k = top_k
+    if scope_filter_enabled:
+        query_top_k = max(
+            top_k,
+            top_k * max(1, _config.settings.rank_project_scope_candidate_multiplier),
+        )
+        try:
+            registered_projects = load_project_registry()
+        except Exception:
+            registered_projects = None
+
     query_embedding = embed_query(query)
     store = get_vector_store()
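The over-fetch arithmetic, concretely: scoped queries widen the raw candidate pool so the cross-project filter can drop rows without starving the final top-k.

```python
# With context_top_k = 8 and the default multiplier of 4, a scoped
# query pulls 32 raw candidates before filtering back down to 8.
top_k = 8
multiplier = 4
query_top_k = max(top_k, top_k * max(1, multiplier))
assert query_top_k == 32
```
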
@@ -101,11 +123,12 @@ def retrieve(
 
     results = store.query(
         query_embedding=query_embedding,
-        top_k=top_k,
+        top_k=query_top_k,
         where=where,
     )
 
     chunks = []
+    raw_result_count = len(results["ids"][0]) if results and results["ids"] and results["ids"][0] else 0
     if results and results["ids"] and results["ids"][0]:
         existing_ids = _existing_chunk_ids(results["ids"][0])
         for i, chunk_id in enumerate(results["ids"][0]):
@@ -117,6 +140,13 @@ def retrieve(
             meta = results["metadatas"][0][i] if results["metadatas"] else {}
             content = results["documents"][0][i] if results["documents"] else ""
 
+            if scope_filter_enabled and not _is_allowed_for_project_scope(
+                scoped_project,
+                meta,
+                registered_projects,
+            ):
+                continue
+
             score *= _query_match_boost(query, meta)
             score *= _path_signal_boost(meta)
             if project_hint:
@@ -137,42 +167,151 @@ def retrieve(
 
     duration_ms = int((time.time() - start) * 1000)
     chunks.sort(key=lambda chunk: chunk.score, reverse=True)
+    post_filter_count = len(chunks)
+    chunks = chunks[:top_k]
+
     log.info(
         "retrieval_done",
         query=query[:100],
         top_k=top_k,
+        query_top_k=query_top_k,
+        raw_results_count=raw_result_count,
+        post_filter_count=post_filter_count,
         results_count=len(chunks),
+        post_filter_dropped=max(0, raw_result_count - post_filter_count),
+        underfilled=bool(raw_result_count >= query_top_k and len(chunks) < top_k),
         duration_ms=duration_ms,
     )
 
     return chunks
 
 
+def _is_allowed_for_project_scope(
+    project: RegisteredProject,
+    metadata: dict,
+    registered_projects: list[RegisteredProject] | None = None,
+) -> bool:
+    """Return True when a chunk is target-project or not project-owned.
+
+    Project-hinted retrieval should not let one registered project's corpus
+    compete with another's. At the same time, unowned/global sources should
+    remain eligible because shared docs and cross-project references can be
+    genuinely useful. The registry gives us the boundary: if metadata matches
+    a registered project and it is not the requested project, filter it out.
+    """
+    if _metadata_matches_project(project, metadata):
+        return True
+
+    if registered_projects is None:
+        try:
+            registered_projects = load_project_registry()
+        except Exception:
+            return True
+
+    for other in registered_projects:
+        if other.project_id == project.project_id:
+            continue
+        if _metadata_matches_project(other, metadata):
+            return False
+    return True
+
+
+def _metadata_matches_project(project: RegisteredProject, metadata: dict) -> bool:
+    stored_project_id = str(metadata.get("project_id", "")).strip().lower()
+    if stored_project_id:
+        return stored_project_id == project.project_id.lower()
+
+    path = _metadata_source_path(metadata)
+    tags = _metadata_tags(metadata)
+    for term in _project_scope_terms(project):
+        if _path_matches_term(path, term) or term in tags:
+            return True
+    return False
+
+
+def _project_scope_terms(project: RegisteredProject) -> set[str]:
+    terms = {project.project_id.lower()}
+    terms.update(alias.lower() for alias in project.aliases)
+    for source_ref in project.ingest_roots:
+        normalized = source_ref.subpath.replace("\\", "/").strip("/").lower()
+        if normalized:
+            terms.add(normalized)
+            terms.add(normalized.split("/")[-1])
+    return {term for term in terms if term}
+
+
+def _metadata_searchable(metadata: dict) -> str:
+    return " ".join(
+        [
+            str(metadata.get("source_file", "")).replace("\\", "/").lower(),
+            str(metadata.get("title", "")).lower(),
+            str(metadata.get("heading_path", "")).lower(),
+            str(metadata.get("tags", "")).lower(),
+        ]
+    )
+
+
+def _metadata_source_path(metadata: dict) -> str:
+    return str(metadata.get("source_file", "")).replace("\\", "/").strip("/").lower()
+
+
+def _metadata_tags(metadata: dict) -> set[str]:
+    raw_tags = metadata.get("tags", [])
+    if isinstance(raw_tags, (list, tuple, set)):
+        return {str(tag).strip().lower() for tag in raw_tags if str(tag).strip()}
+    if isinstance(raw_tags, str):
+        try:
+            parsed = json.loads(raw_tags)
+        except json.JSONDecodeError:
+            parsed = [raw_tags]
+        if isinstance(parsed, (list, tuple, set)):
+            return {str(tag).strip().lower() for tag in parsed if str(tag).strip()}
+        if isinstance(parsed, str) and parsed.strip():
+            return {parsed.strip().lower()}
+    return set()
+
+
+def _path_matches_term(path: str, term: str) -> bool:
+    normalized = term.replace("\\", "/").strip("/").lower()
+    if not path or not normalized:
+        return False
+    if "/" in normalized:
+        return path == normalized or path.startswith(f"{normalized}/")
+    return normalized in set(path.split("/"))
+
+
+def _metadata_has_term(metadata: dict, term: str) -> bool:
+    normalized = term.replace("\\", "/").strip("/").lower()
+    if not normalized:
+        return False
+    if _path_matches_term(_metadata_source_path(metadata), normalized):
+        return True
+    if normalized in _metadata_tags(metadata):
+        return True
+    return re.search(
+        rf"(?<![a-z0-9]){re.escape(normalized)}(?![a-z0-9])",
+        _metadata_searchable(metadata),
+    ) is not None
+
+
 def _project_match_boost(project_hint: str, metadata: dict) -> float:
     """Return a project-aware relevance multiplier for raw retrieval."""
     hint_lower = project_hint.strip().lower()
     if not hint_lower:
         return 1.0
 
-    source_file = str(metadata.get("source_file", "")).lower()
-    title = str(metadata.get("title", "")).lower()
-    tags = str(metadata.get("tags", "")).lower()
-    searchable = " ".join([source_file, title, tags])
-
-    project = get_registered_project(project_hint)
-    candidate_names = {hint_lower}
-    if project is not None:
-        candidate_names.add(project.project_id.lower())
-        candidate_names.update(alias.lower() for alias in project.aliases)
-        candidate_names.update(
-            source_ref.subpath.replace("\\", "/").strip("/").split("/")[-1].lower()
-            for source_ref in project.ingest_roots
-            if source_ref.subpath.strip("/\\")
-        )
+    try:
+        project = get_registered_project(project_hint)
+    except Exception as exc:
+        log.warning(
+            "project_match_boost_resolution_failed",
+            project_hint=project_hint,
+            error=str(exc),
+        )
+        project = None
+    candidate_names = _project_scope_terms(project) if project is not None else {hint_lower}
     for candidate in candidate_names:
-        if candidate and candidate in searchable:
+        if _metadata_has_term(metadata, candidate):
             return _config.settings.rank_project_match_boost
 
     return 1.0
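Assuming the helpers above are importable, the path matching comes down to whole-segment semantics — a bare term must match an entire path component, and a slashed term must be a prefix at a directory boundary:

```python
# Whole-component match succeeds:
assert _path_matches_term("incoming/projects/p04-gigabit/status.md", "p04-gigabit")
# Substring inside a component does not (no more "p04" matching "p04-gigabit-summary"):
assert not _path_matches_term("notes/p04-gigabit-summary.md", "p04-gigabit")
# Slashed terms match only at directory boundaries:
assert _path_matches_term("incoming/projects/p04-gigabit/a.md", "incoming/projects/p04-gigabit")
```
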
@@ -64,6 +64,18 @@ class VectorStore:
         self._collection.delete(ids=ids)
         log.debug("vectors_deleted", count=len(ids))
 
+    def get_metadatas(self, ids: list[str]) -> dict:
+        """Fetch vector metadata by chunk IDs."""
+        if not ids:
+            return {"ids": [], "metadatas": []}
+        return self._collection.get(ids=ids, include=["metadatas"])
+
+    def update_metadatas(self, ids: list[str], metadatas: list[dict]) -> None:
+        """Update vector metadata without re-embedding documents."""
+        if ids:
+            self._collection.update(ids=ids, metadatas=metadatas)
+            log.debug("vector_metadatas_updated", count=len(ids))
+
     @property
     def count(self) -> int:
         return self._collection.count()
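These two methods give the backfill its read-modify-write loop over vector metadata without touching embeddings; a minimal sketch of the round-trip (chunk id illustrative):

```python
# Read-modify-write on vector metadata only — no re-embedding.
store = get_vector_store()
snapshot = store.get_metadatas(["chunk-1"])
metas = snapshot["metadatas"]
for meta in metas:
    meta["project_id"] = "p04-gigabit"
store.update_metadatas(snapshot["ids"], metas)
```
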
@@ -16,6 +16,36 @@ os.environ["ATOCORE_DATA_DIR"] = _default_test_dir
 os.environ["ATOCORE_DEBUG"] = "true"
 
 
+# V1-0: every entity created in a test is "hand authored" by the test
+# author — fixture data, not extracted content. Rather than rewrite 100+
+# existing test call sites, wrap create_entity so that tests which don't
+# provide source_refs get hand_authored=True automatically. Tests that
+# explicitly pass source_refs or hand_authored are unaffected. This keeps
+# the F-8 invariant enforced in production (the API, the wiki form, and
+# graduation scripts all go through the unwrapped function) while leaving
+# the existing test corpus intact.
+def _patch_create_entity_for_tests():
+    from atocore.engineering import service as _svc
+
+    _original = _svc.create_entity
+
+    def _create_entity_test(*args, **kwargs):
+        # Only auto-flag when hand_authored isn't explicitly specified.
+        # Tests that want to exercise the F-8 raise path pass
+        # hand_authored=False explicitly and should hit the error.
+        if (
+            not kwargs.get("source_refs")
+            and "hand_authored" not in kwargs
+        ):
+            kwargs["hand_authored"] = True
+        return _original(*args, **kwargs)
+
+    _svc.create_entity = _create_entity_test
+
+
+_patch_create_entity_for_tests()
+
+
 @pytest.fixture
 def tmp_data_dir(tmp_path):
     """Provide a temporary data directory for tests."""
tests/test_backfill_chunk_project_ids.py (new file, 154 lines)
@@ -0,0 +1,154 @@
+"""Tests for explicit chunk project_id metadata backfill."""
+
+import json
+
+import atocore.config as config
+from atocore.models.database import get_connection, init_db
+from scripts import backfill_chunk_project_ids as backfill
+
+
+def _write_registry(tmp_path, monkeypatch):
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
+    project_dir.mkdir(parents=True)
+    drive_dir.mkdir()
+    config_dir.mkdir()
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "p04-gigabit",
+                        "aliases": ["p04"],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
+                        ],
+                    }
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+    config.settings = config.Settings()
+    return project_dir
+
+
+def _insert_chunk(file_path, metadata=None, chunk_id="chunk-1"):
+    with get_connection() as conn:
+        conn.execute(
+            """
+            INSERT INTO source_documents (id, file_path, file_hash, title, doc_type, tags)
+            VALUES (?, ?, ?, ?, ?, ?)
+            """,
+            ("doc-1", str(file_path), "hash", "Title", "markdown", "[]"),
+        )
+        conn.execute(
+            """
+            INSERT INTO source_chunks
+            (id, document_id, chunk_index, content, heading_path, char_count, metadata)
+            VALUES (?, ?, ?, ?, ?, ?, ?)
+            """,
+            (
+                chunk_id,
+                "doc-1",
+                0,
+                "content",
+                "Overview",
+                7,
+                json.dumps(metadata if metadata is not None else {}),
+            ),
+        )
+
+
+class FakeVectorStore:
+    def __init__(self, metadatas):
+        self.metadatas = dict(metadatas)
+        self.updated = []
+
+    def get_metadatas(self, ids):
+        returned_ids = [chunk_id for chunk_id in ids if chunk_id in self.metadatas]
+        return {
+            "ids": returned_ids,
+            "metadatas": [self.metadatas[chunk_id] for chunk_id in returned_ids],
+        }
+
+    def update_metadatas(self, ids, metadatas):
+        self.updated.append((list(ids), list(metadatas)))
+        for chunk_id, metadata in zip(ids, metadatas, strict=True):
+            self.metadatas[chunk_id] = metadata
+
+
+def test_backfill_dry_run_is_non_mutating(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md")
+
+    result = backfill.backfill(apply=False)
+
+    assert result["updates"] == 1
+    with get_connection() as conn:
+        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
+        assert json.loads(row["metadata"]) == {}
+
+
+def test_backfill_apply_updates_chroma_then_sql(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md", metadata={"source_file": "status.md"})
+    fake_store = FakeVectorStore({"chunk-1": {"source_file": "status.md"}})
+    monkeypatch.setattr(backfill, "get_vector_store", lambda: fake_store)
+
+    result = backfill.backfill(apply=True, require_chroma_snapshot=True)
+
+    assert result["applied_updates"] == 1
+    assert fake_store.metadatas["chunk-1"]["project_id"] == "p04-gigabit"
+    with get_connection() as conn:
+        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
+        assert json.loads(row["metadata"])["project_id"] == "p04-gigabit"
+
+
+def test_backfill_apply_requires_snapshot_confirmation(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md")
+
+    try:
+        backfill.backfill(apply=True)
+    except ValueError as exc:
+        assert "Chroma backup" in str(exc)
+    else:
+        raise AssertionError("Expected snapshot confirmation requirement")
+
+
+def test_backfill_missing_vector_skips_sql_update(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md")
+    fake_store = FakeVectorStore({})
+    monkeypatch.setattr(backfill, "get_vector_store", lambda: fake_store)
+
+    result = backfill.backfill(apply=True, require_chroma_snapshot=True)
+
+    assert result["updates"] == 1
+    assert result["applied_updates"] == 0
+    assert result["missing_vectors"] == 1
+    with get_connection() as conn:
+        row = conn.execute("SELECT metadata FROM source_chunks WHERE id = ?", ("chunk-1",)).fetchone()
+        assert json.loads(row["metadata"]) == {}
+
+
+def test_backfill_skips_malformed_metadata(tmp_data_dir, tmp_path, monkeypatch):
+    init_db()
+    project_dir = _write_registry(tmp_path, monkeypatch)
+    _insert_chunk(project_dir / "status.md", metadata=[])
+
+    result = backfill.backfill(apply=False)
+
+    assert result["updates"] == 0
+    assert result["malformed_metadata"] == 1
@@ -46,6 +46,8 @@ def test_settings_keep_legacy_db_path_when_present(tmp_path, monkeypatch):
 
 def test_ranking_weights_are_tunable_via_env(monkeypatch):
     monkeypatch.setenv("ATOCORE_RANK_PROJECT_MATCH_BOOST", "3.5")
+    monkeypatch.setenv("ATOCORE_RANK_PROJECT_SCOPE_FILTER", "false")
+    monkeypatch.setenv("ATOCORE_RANK_PROJECT_SCOPE_CANDIDATE_MULTIPLIER", "6")
     monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_STEP", "0.12")
     monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_CAP", "1.5")
     monkeypatch.setenv("ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST", "1.25")
@@ -54,6 +56,8 @@ def test_ranking_weights_are_tunable_via_env(monkeypatch):
     settings = config.Settings()
 
     assert settings.rank_project_match_boost == 3.5
+    assert settings.rank_project_scope_filter is False
+    assert settings.rank_project_scope_candidate_multiplier == 6
     assert settings.rank_query_token_step == 0.12
     assert settings.rank_query_token_cap == 1.5
     assert settings.rank_path_high_signal_boost == 1.25
@@ -143,6 +143,52 @@ def test_project_state_respects_total_budget(tmp_data_dir, sample_markdown):
     assert len(pack.formatted_context) <= 120
 
 
+def test_project_state_query_relevance_before_truncation(tmp_data_dir, sample_markdown):
+    """Relevant trusted state should survive the project-state budget cap."""
+    init_db()
+    init_project_state_schema()
+    ingest_file(sample_markdown)
+
+    set_state(
+        "p04-gigabit",
+        "contact",
+        "abb-space",
+        "ABB Space is the primary vendor contact for polishing, CCP, IBF, procurement coordination, "
+        "contract administration, interface planning, and delivery discussions.",
+    )
+    set_state(
+        "p04-gigabit",
+        "decision",
+        "back-structure",
+        "Option B selected: conical isogrid back structure with variable rib density. "
+        "Chosen over flat-back for stiffness-to-weight ratio and manufacturability.",
+    )
+    set_state(
+        "p04-gigabit",
+        "decision",
+        "polishing-vendor",
+        "ABB Space selected as polishing vendor. Contract includes computer-controlled polishing "
+        "and ion beam figuring.",
+    )
+    set_state(
+        "p04-gigabit",
+        "requirement",
+        "key_constraints",
+        "The program targets a 1.2 m lightweight Zerodur mirror with filtered mechanical WFE below 15 nm "
+        "and mass below 103.5 kg.",
+    )
+
+    pack = build_context(
+        "what are the key GigaBIT M1 program constraints",
+        project_hint="p04-gigabit",
+        budget=3000,
+    )
+
+    assert "Zerodur" in pack.formatted_context
+    assert "1.2" in pack.formatted_context
+    assert pack.formatted_context.find("[REQUIREMENT]") < pack.formatted_context.find("[CONTACT]")
+
+
 def test_project_hint_matches_state_case_insensitively(tmp_data_dir, sample_markdown):
     """Project state lookup should not depend on exact casing."""
     init_db()
@@ -143,8 +143,11 @@ def test_requirement_name_conflict_detected(tmp_data_dir):
     r2 = create_entity("requirement", "Surface figure < 25nm",
                        project="p-test", description="Different interpretation")

-    detected = detect_conflicts_for_entity(r2.id)
-    assert len(detected) == 1
+    # V1-0 synchronous hook: the conflict is already detected at r2's
+    # create-time, so a redundant detect call returns [] due to
+    # _record_conflict dedup. Assert on list_open_conflicts instead —
+    # that is what this test is really about: duplicate active
+    # requirements surfacing as an open conflict.
     conflicts = list_open_conflicts(project="p-test")
     assert any(c["slot_kind"] == "requirement.name" for c in conflicts)

@@ -191,8 +194,12 @@ def test_conflict_resolution_dismiss_leaves_entities_alone(tmp_data_dir):
                        description="first meaning")
     r2 = create_entity("requirement", "Dup req", project="p-test",
                        description="second meaning")
-    detected = detect_conflicts_for_entity(r2.id)
-    conflict_id = detected[0]
+    # V1-0 synchronous hook already recorded the conflict at r2's
+    # create-time. Look it up via list_open_conflicts rather than
+    # calling the detector again (which returns [] due to dedup).
+    open_list = list_open_conflicts(project="p-test")
+    assert open_list, "expected conflict recorded by create-time hook"
+    conflict_id = open_list[0]["id"]

     assert resolve_conflict(conflict_id, "dismiss")
     # Both still active — dismiss just clears the conflict marker
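For context on the dedup behavior those comments lean on, here is a minimal sketch of a `_record_conflict`-style guard, assuming a uniqueness key of (slot_kind, entity pair); the table and column names are illustrative, not the shipped schema.

# Hypothetical sketch: record a conflict once; a second detection of the
# same (slot_kind, entity pair) is a no-op, which is why re-running the
# detector after the create-time hook returns [].
def record_conflict(conn, slot_kind: str, entity_a: str, entity_b: str) -> bool:
    key = tuple(sorted((entity_a, entity_b)))
    row = conn.execute(
        "SELECT 1 FROM conflicts WHERE slot_kind = ? AND entity_a = ? "
        "AND entity_b = ? AND status = 'open'",
        (slot_kind, *key),
    ).fetchone()
    if row:
        return False  # deduped: already recorded as an open conflict
    conn.execute(
        "INSERT INTO conflicts (slot_kind, entity_a, entity_b, status) "
        "VALUES (?, ?, ?, 'open')",
        (slot_kind, *key),
    )
    return True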
@@ -132,6 +132,7 @@ def test_api_post_entity_with_null_project_stores_global(seeded_db):
         "entity_type": "material",
         "name": "Titanium",
         "project": None,
+        "hand_authored": True,  # V1-0 F-8: test fixture, no source_refs
     })
     assert r.status_code == 200

@@ -1,8 +1,10 @@
 """Tests for the ingestion pipeline."""

+import json
+
 from atocore.ingestion.parser import parse_markdown
 from atocore.models.database import get_connection, init_db
-from atocore.ingestion.pipeline import ingest_file, ingest_folder
+from atocore.ingestion.pipeline import ingest_file, ingest_folder, ingest_project_folder


 def test_parse_markdown(sample_markdown):
@@ -69,6 +71,153 @@ def test_ingest_updates_changed(tmp_data_dir, sample_markdown):
     assert result["status"] == "ingested"


+def test_ingest_file_records_project_id_metadata(tmp_data_dir, sample_markdown, monkeypatch):
+    """Project-aware ingestion should tag DB and vector metadata exactly."""
+    init_db()
+
+    class FakeVectorStore:
+        def __init__(self):
+            self.metadatas = []
+
+        def add(self, ids, documents, metadatas):
+            self.metadatas.extend(metadatas)
+
+        def delete(self, ids):
+            return None
+
+    fake_store = FakeVectorStore()
+    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)
+
+    result = ingest_file(sample_markdown, project_id="p04-gigabit")
+
+    assert result["status"] == "ingested"
+    assert fake_store.metadatas
+    assert all(meta["project_id"] == "p04-gigabit" for meta in fake_store.metadatas)
+
+    with get_connection() as conn:
+        rows = conn.execute("SELECT metadata FROM source_chunks").fetchall()
+    assert rows
+    assert all(
+        json.loads(row["metadata"])["project_id"] == "p04-gigabit"
+        for row in rows
+    )
+
+
+def test_ingest_file_derives_project_id_from_registry_root(tmp_data_dir, tmp_path, monkeypatch):
+    """Unscoped ingest should preserve ownership for files under registered roots."""
+    import atocore.config as config
+
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
+    project_dir.mkdir(parents=True)
+    drive_dir.mkdir()
+    config_dir.mkdir()
+    note = project_dir / "status.md"
+    note.write_text(
+        "# Status\n\nCurrent project status with enough detail to create "
+        "a retrievable chunk for the ingestion pipeline test.",
+        encoding="utf-8",
+    )
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "p04-gigabit",
+                        "aliases": ["p04"],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
+                        ],
+                    }
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    class FakeVectorStore:
+        def __init__(self):
+            self.metadatas = []
+
+        def add(self, ids, documents, metadatas):
+            self.metadatas.extend(metadatas)
+
+        def delete(self, ids):
+            return None
+
+    fake_store = FakeVectorStore()
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+    config.settings = config.Settings()
+    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)
+
+    init_db()
+    result = ingest_file(note)
+
+    assert result["status"] == "ingested"
+    assert fake_store.metadatas
+    assert all(meta["project_id"] == "p04-gigabit" for meta in fake_store.metadatas)
+
+
+def test_ingest_file_logs_and_fails_open_when_project_derivation_fails(
+    tmp_data_dir,
+    sample_markdown,
+    monkeypatch,
+):
+    """A broken registry should be visible but should not block ingestion."""
+    init_db()
+    warnings = []
+
+    class FakeVectorStore:
+        def __init__(self):
+            self.metadatas = []
+
+        def add(self, ids, documents, metadatas):
+            self.metadatas.extend(metadatas)
+
+        def delete(self, ids):
+            return None
+
+    fake_store = FakeVectorStore()
+    monkeypatch.setattr("atocore.ingestion.pipeline.get_vector_store", lambda: fake_store)
+    monkeypatch.setattr(
+        "atocore.projects.registry.derive_project_id_for_path",
+        lambda path: (_ for _ in ()).throw(ValueError("registry broken")),
+    )
+    monkeypatch.setattr(
+        "atocore.ingestion.pipeline.log.warning",
+        lambda event, **kwargs: warnings.append((event, kwargs)),
+    )
+
+    result = ingest_file(sample_markdown)
+
+    assert result["status"] == "ingested"
+    assert fake_store.metadatas
+    assert all(meta["project_id"] == "" for meta in fake_store.metadatas)
+    assert warnings[0][0] == "project_id_derivation_failed"
+    assert "registry broken" in warnings[0][1]["error"]
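A minimal sketch of the fail-open pattern the test above pins down. The helper and log names (`derive_project_id_for_path`, `project_id_derivation_failed`) are taken from the test; everything else here is an assumption about the pipeline internals.

# Hypothetical sketch, not the shipped pipeline code: derivation errors are
# logged and swallowed, and ingestion proceeds with an empty project_id.
def _safe_project_id(path, derive, log_warning) -> str:
    try:
        return derive(path) or ""
    except Exception as exc:  # fail open: a broken registry must not block ingest
        log_warning("project_id_derivation_failed", error=str(exc), path=str(path))
        return ""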
+
+
+def test_ingest_project_folder_passes_project_id_to_files(tmp_data_dir, sample_folder, monkeypatch):
+    seen = []
+
+    def fake_ingest_file(path, project_id=""):
+        seen.append((path.name, project_id))
+        return {"file": str(path), "status": "ingested"}
+
+    monkeypatch.setattr("atocore.ingestion.pipeline.ingest_file", fake_ingest_file)
+    monkeypatch.setattr("atocore.ingestion.pipeline._purge_deleted_files", lambda *args, **kwargs: 0)
+
+    ingest_project_folder(sample_folder, project_id="p05-interferometer")
+
+    assert seen
+    assert {project_id for _, project_id in seen} == {"p05-interferometer"}
+
+
 def test_parse_markdown_uses_supplied_text(sample_markdown):
     """Parsing should be able to reuse pre-read content from ingestion."""
     latin_text = """---\ntags: parser\n---\n# Parser Title\n\nBody text."""
|||||||
@@ -428,6 +428,136 @@ def test_context_builder_tag_boost_orders_results(isolated_db):
|
|||||||
assert idx_tagged < idx_untagged
|
assert idx_tagged < idx_untagged
|
||||||
|
|
||||||
|
|
||||||
|
def test_project_memory_ranking_ignores_scope_noise(isolated_db):
|
||||||
|
"""Project words should not crowd out the actual query intent."""
|
||||||
|
from atocore.memory.service import create_memory, get_memories_for_context
|
||||||
|
|
||||||
|
create_memory(
|
||||||
|
"project",
|
||||||
|
"Norman is the end operator for p06-polisher and requires an explicit manual mode to operate the machine.",
|
||||||
|
project="p06-polisher",
|
||||||
|
confidence=0.7,
|
||||||
|
)
|
||||||
|
create_memory(
|
||||||
|
"project",
|
||||||
|
"Polisher Control firmware spec document titled 'Fulum Polisher Machine Control Firmware Spec v1' lives in PKM.",
|
||||||
|
project="p06-polisher",
|
||||||
|
confidence=0.7,
|
||||||
|
)
|
||||||
|
create_memory(
|
||||||
|
"project",
|
||||||
|
"Machine design principle: works fully offline and independently; network connection is for remote access only",
|
||||||
|
project="p06-polisher",
|
||||||
|
confidence=0.5,
|
||||||
|
)
|
||||||
|
create_memory(
|
||||||
|
"project",
|
||||||
|
"Use Tailscale mesh for RPi remote access to provide SSH, file transfer, and NAT traversal without port forwarding.",
|
||||||
|
project="p06-polisher",
|
||||||
|
confidence=0.5,
|
||||||
|
)
|
||||||
|
|
||||||
|
text, _ = get_memories_for_context(
|
||||||
|
memory_types=["project"],
|
||||||
|
project="p06-polisher",
|
||||||
|
budget=360,
|
||||||
|
query="how do we access the polisher machine remotely",
|
||||||
|
)
|
||||||
|
|
||||||
|
assert "Tailscale" in text
|
||||||
|
assert text.find("remote access only") < text.find("Tailscale")
|
||||||
|
assert "manual mode" not in text
|
||||||
|
|
||||||
|
|
||||||
|
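These ranking tests pair with the ATOCORE_RANK_QUERY_TOKEN_STEP / _CAP knobs above. Here is a minimal sketch of that per-token boost under the simplest reading of the knobs; every name in it is illustrative, not the shipped scorer.

# Hypothetical scorer sketch: each distinct query token found in the memory
# text adds `step` to the score, capped at `cap`, so several intent hits can
# outrank a single high-confidence but off-topic memory.
def query_token_boost(memory_text: str, query: str, step: float = 0.12, cap: float = 1.5) -> float:
    tokens = {t for t in query.lower().split() if len(t) > 2}  # drop tiny stopwords
    hits = sum(1 for t in tokens if t in memory_text.lower())
    return min(hits * step, cap)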
+
+
+def test_project_memory_ranking_prefers_multiple_intent_hits(isolated_db):
+    """A rich memory with several query hits should beat a terse one-hit memory."""
+    from atocore.memory.service import create_memory, get_memories_for_context
+
+    create_memory(
+        "project",
+        "CGH vendor selected for p05. Active integration coordination with Katie/AOM.",
+        project="p05-interferometer",
+        confidence=0.7,
+    )
+    create_memory(
+        "knowledge",
+        "Vendor-summary current signal: 4D is the strongest technical Twyman-Green candidate; "
+        "a certified used Zygo Verifire SV around $55k emerged as a strong value path.",
+        project="p05-interferometer",
+        confidence=0.9,
+    )
+
+    text, _ = get_memories_for_context(
+        memory_types=["project", "knowledge"],
+        project="p05-interferometer",
+        budget=220,
+        query="what is the current vendor signal for the interferometer procurement",
+    )
+
+    assert "4D" in text
+    assert "Zygo" in text
+
+
+def test_project_memory_query_ranks_beyond_confidence_prefilter(isolated_db):
+    """Query-time ranking should see older low-confidence but exact-intent memories."""
+    from atocore.memory.service import create_memory, get_memories_for_context
+
+    for idx in range(35):
+        create_memory(
+            "project",
+            f"High confidence p06 filler memory {idx}: Polisher Control planning note.",
+            project="p06-polisher",
+            confidence=0.9,
+        )
+    create_memory(
+        "project",
+        "Use Tailscale mesh for RPi remote access to provide SSH, file transfer, and NAT traversal without port forwarding.",
+        project="p06-polisher",
+        confidence=0.5,
+    )
+
+    text, _ = get_memories_for_context(
+        memory_types=["project"],
+        project="p06-polisher",
+        budget=360,
+        query="how do we access the polisher machine remotely",
+    )
+
+    assert "Tailscale" in text
+
+
+def test_project_memory_query_prefers_exact_cam_fact(isolated_db):
+    from atocore.memory.service import create_memory, get_memories_for_context
+
+    create_memory(
+        "project",
+        "Polisher Control firmware spec document titled 'Fulum Polisher Machine Control Firmware Spec v1' lives in PKM.",
+        project="p06-polisher",
+        confidence=0.9,
+    )
+    create_memory(
+        "project",
+        "Polisher Control doc must cover manual mode for Norman as a required deliverable per the plan.",
+        project="p06-polisher",
+        confidence=0.9,
+    )
+    create_memory(
+        "project",
+        "Cam amplitude and offset are mechanically set by operator and read via encoders; no actuators control them.",
+        project="p06-polisher",
+        confidence=0.5,
+    )
+
+    text, _ = get_memories_for_context(
+        memory_types=["project"],
+        project="p06-polisher",
+        budget=300,
+        query="how is cam amplitude controlled on the polisher",
+    )
+
+    assert "encoders" in text
+
+
 def test_expire_stale_candidates_keeps_reinforced(isolated_db):
     from atocore.memory.service import create_memory, expire_stale_candidates
     from atocore.models.database import get_connection
|||||||
@@ -5,6 +5,7 @@ import json
|
|||||||
import atocore.config as config
|
import atocore.config as config
|
||||||
from atocore.projects.registry import (
|
from atocore.projects.registry import (
|
||||||
build_project_registration_proposal,
|
build_project_registration_proposal,
|
||||||
|
derive_project_id_for_path,
|
||||||
get_registered_project,
|
get_registered_project,
|
||||||
get_project_registry_template,
|
get_project_registry_template,
|
||||||
list_registered_projects,
|
list_registered_projects,
|
||||||
@@ -103,6 +104,98 @@ def test_project_registry_resolves_alias(tmp_path, monkeypatch):
     assert project.project_id == "p05-interferometer"


+def test_derive_project_id_for_path_uses_registered_roots(tmp_path, monkeypatch):
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
+    project_dir.mkdir(parents=True)
+    drive_dir.mkdir()
+    config_dir.mkdir()
+    note = project_dir / "status.md"
+    note.write_text("# Status\n\nCurrent work.", encoding="utf-8")
+
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "p04-gigabit",
+                        "aliases": ["p04"],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
+                        ],
+                    }
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+
+    original_settings = config.settings
+    try:
+        config.settings = config.Settings()
+        assert derive_project_id_for_path(note) == "p04-gigabit"
+        assert derive_project_id_for_path(tmp_path / "elsewhere.md") == ""
+    finally:
+        config.settings = original_settings
+
+
+def test_project_registry_rejects_cross_project_ingest_root_overlap(tmp_path, monkeypatch):
+    vault_dir = tmp_path / "vault"
+    drive_dir = tmp_path / "drive"
+    config_dir = tmp_path / "config"
+    vault_dir.mkdir()
+    drive_dir.mkdir()
+    config_dir.mkdir()
+
+    registry_path = config_dir / "project-registry.json"
+    registry_path.write_text(
+        json.dumps(
+            {
+                "projects": [
+                    {
+                        "id": "parent",
+                        "aliases": [],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/parent"}
+                        ],
+                    },
+                    {
+                        "id": "child",
+                        "aliases": [],
+                        "ingest_roots": [
+                            {"source": "vault", "subpath": "incoming/projects/parent/child"}
+                        ],
+                    },
+                ]
+            }
+        ),
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
+    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
+    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+
+    original_settings = config.settings
+    try:
+        config.settings = config.Settings()
+        try:
+            list_registered_projects()
+        except ValueError as exc:
+            assert "ingest root overlap" in str(exc)
+        else:
+            raise AssertionError("Expected overlapping ingest roots to raise")
+    finally:
+        config.settings = original_settings
+
+
 def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
     vault_dir = tmp_path / "vault"
     drive_dir = tmp_path / "drive"
@@ -133,8 +226,8 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
     calls = []

-    def fake_ingest_folder(path, purge_deleted=True):
-        calls.append((str(path), purge_deleted))
+    def fake_ingest_folder(path, purge_deleted=True, project_id=""):
+        calls.append((str(path), purge_deleted, project_id))
         return [{"file": str(path / "README.md"), "status": "ingested"}]

     monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -144,7 +237,7 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
     original_settings = config.settings
     try:
         config.settings = config.Settings()
-        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fake_ingest_folder)
+        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fake_ingest_folder)
         result = refresh_registered_project("polisher")
     finally:
         config.settings = original_settings
@@ -153,6 +246,7 @@ def test_refresh_registered_project_ingests_registered_roots(tmp_path, monkeypatch):
     assert len(calls) == 1
     assert calls[0][0].endswith("p06-polisher")
     assert calls[0][1] is False
+    assert calls[0][2] == "p06-polisher"
     assert result["roots"][0]["status"] == "ingested"
     assert result["status"] == "ingested"
     assert result["roots_ingested"] == 1
@@ -188,7 +282,7 @@ def test_refresh_registered_project_reports_nothing_to_ingest_when_all_missing(
         encoding="utf-8",
     )

-    def fail_ingest_folder(path, purge_deleted=True):
+    def fail_ingest_folder(path, purge_deleted=True, project_id=""):
         raise AssertionError(f"ingest_folder should not be called for missing root: {path}")

     monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -198,7 +292,7 @@ def test_refresh_registered_project_reports_nothing_to_ingest_when_all_missing(
     original_settings = config.settings
     try:
         config.settings = config.Settings()
-        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fail_ingest_folder)
+        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fail_ingest_folder)
         result = refresh_registered_project("ghost")
     finally:
         config.settings = original_settings
@@ -238,7 +332,7 @@ def test_refresh_registered_project_reports_partial_status(tmp_path, monkeypatch):
         encoding="utf-8",
     )

-    def fake_ingest_folder(path, purge_deleted=True):
+    def fake_ingest_folder(path, purge_deleted=True, project_id=""):
         return [{"file": str(path / "README.md"), "status": "ingested"}]

     monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
@@ -248,7 +342,7 @@ def test_refresh_registered_project_reports_partial_status(tmp_path, monkeypatch):
     original_settings = config.settings
     try:
         config.settings = config.Settings()
-        monkeypatch.setattr("atocore.projects.registry.ingest_folder", fake_ingest_folder)
+        monkeypatch.setattr("atocore.ingestion.pipeline.ingest_project_folder", fake_ingest_folder)
         result = refresh_registered_project("mixed")
     finally:
         config.settings = original_settings
@@ -70,8 +70,28 @@ def test_retrieve_skips_stale_vector_entries(tmp_data_dir, sample_markdown, monkeypatch):


 def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+
     class FakeStore:
         def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 8
             return {
                 "ids": [["chunk-a", "chunk-b"]],
                 "documents": [["project doc", "other doc"]],
@@ -102,22 +122,501 @@ def test_retrieve_project_hint_boosts_matching_chunks(monkeypatch):
     )
     monkeypatch.setattr(
         "atocore.retrieval.retriever.get_registered_project",
-        lambda project_name: type(
-            "Project",
-            (),
-            {
-                "project_id": "p04-gigabit",
-                "aliases": ("p04", "gigabit"),
-                "ingest_roots": (),
-            },
-        )(),
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
     )

     results = retrieve("mirror architecture", top_k=2, project_hint="p04")

-    assert len(results) == 2
     assert results[0].chunk_id == "chunk-a"
-    assert results[0].score > results[1].score
+    assert len(results) == 1
+
+
+def test_retrieve_project_scope_allows_unowned_global_chunks(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-a", "chunk-global"]],
+                "documents": [["project doc", "global doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/pkm/_index.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "P04",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "shared/engineering-rules.md",
+                        "tags": "[]",
+                        "title": "Shared engineering rules",
+                        "document_id": "doc-global",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-global"]
+
+
+def test_retrieve_project_scope_filter_can_be_disabled(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 2
+            return {
+                "ids": [["chunk-a", "chunk-b"]],
+                "documents": [["project doc", "other project doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/pkm/_index.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "P04",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/pkm/_index.md",
+                        "tags": '["p05-interferometer"]',
+                        "title": "P05",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.2]],
+            }
+
+    monkeypatch.setattr("atocore.config.settings.rank_project_scope_filter", False)
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert {r.chunk_id for r in results} == {"chunk-a", "chunk-b"}
+
+
+def test_retrieve_project_scope_ignores_title_for_ownership(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p06-polisher",
+            "aliases": ("p06", "polisher", "p11"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-poisoned-title"]],
+                "documents": [["p04 doc", "p06 doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/pkm/_index.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "P04",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p06-polisher/pkm/architecture.md",
+                        "tags": '["p06-polisher"]',
+                        "title": "GigaBIT M1 mirror lessons",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.19]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-target"]
+
+
+def test_retrieve_project_scope_uses_path_segments_not_substrings(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+    abb_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "abb-space",
+            "aliases": ("abb",),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-global"]],
+                "documents": [["p05 doc", "global doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/pkm/_index.md",
+                        "tags": '["p05-interferometer"]',
+                        "title": "P05",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Abbreviation notes",
+                        "source_file": "shared/cabbage-abbreviations.md",
+                        "tags": "[]",
+                        "title": "ABB-style abbreviations",
+                        "document_id": "doc-global",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, abb_project],
+    )
+
+    results = retrieve("abbreviations", top_k=2, project_hint="p05")
+
+    assert [r.chunk_id for r in results] == ["chunk-target", "chunk-global"]
+
+
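A minimal sketch of the segment-based ownership check this test pins down. `owning_project_for_path` and its inputs are illustrative names; the assumption is simple whole-segment matching rather than substring matching.

# Hypothetical sketch, not the shipped retriever: ownership is decided by
# whole path segments, so "p05-interferometer" owns
# "p05-interferometer/pkm/_index.md", while the alias "abb" does NOT own
# "shared/cabbage-abbreviations.md" even though "abb" is a substring of it.
from pathlib import PurePosixPath


def owning_project_for_path(source_file: str, projects) -> str:
    segments = set(PurePosixPath(source_file).parts)
    for project in projects:
        names = {project.project_id, *project.aliases}
        if segments & names:
            return project.project_id
    return ""  # no owner: treat as global/unowned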
+def test_retrieve_project_scope_prefers_exact_project_id(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p06-polisher",
+            "aliases": ("p06", "polisher"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-other", "chunk-global"]],
+                "documents": [["target doc", "other doc", "global doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "legacy/unhelpful-path.md",
+                        "tags": "[]",
+                        "title": "Target",
+                        "project_id": "p04-gigabit",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/title-poisoned.md",
+                        "tags": '["p04-gigabit"]',
+                        "title": "Looks target-owned but is explicit p06",
+                        "project_id": "p06-polisher",
+                        "document_id": "doc-b",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "shared/global.md",
+                        "tags": "[]",
+                        "title": "Shared",
+                        "project_id": "",
+                        "document_id": "doc-global",
+                    },
+                ]],
+                "distances": [[0.2, 0.19, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=3, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-target", "chunk-global"]
+
+
+def test_retrieve_empty_project_id_falls_back_to_path_ownership(monkeypatch):
+    target_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p04-gigabit",
+            "aliases": ("p04", "gigabit"),
+            "ingest_roots": (),
+        },
+    )()
+    other_project = type(
+        "Project",
+        (),
+        {
+            "project_id": "p05-interferometer",
+            "aliases": ("p05", "interferometer"),
+            "ingest_roots": (),
+        },
+    )()
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            return {
+                "ids": [["chunk-target", "chunk-other"]],
+                "documents": [["target doc", "other doc"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/status.md",
+                        "tags": "[]",
+                        "title": "Target",
+                        "project_id": "",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/status.md",
+                        "tags": "[]",
+                        "title": "Other",
+                        "project_id": "",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.19]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: target_project,
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.load_project_registry",
+        lambda: [target_project, other_project],
+    )
+
+    results = retrieve("mirror architecture", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-target"]
+
+
+def test_retrieve_unknown_project_hint_does_not_widen_or_filter(monkeypatch):
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 2
+            return {
+                "ids": [["chunk-a", "chunk-b"]],
+                "documents": [["doc a", "doc b"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "project-a/file.md",
+                        "tags": "[]",
+                        "title": "A",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "project-b/file.md",
+                        "tags": "[]",
+                        "title": "B",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: None,
+    )
+
+    results = retrieve("overview", top_k=2, project_hint="unknown-project")
+
+    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-b"]
+
+
+def test_retrieve_fails_open_when_project_scope_resolution_fails(monkeypatch):
+    warnings = []
+
+    class FakeStore:
+        def query(self, query_embedding, top_k=10, where=None):
+            assert top_k == 2
+            return {
+                "ids": [["chunk-a", "chunk-b"]],
+                "documents": [["doc a", "doc b"]],
+                "metadatas": [[
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p04-gigabit/file.md",
+                        "tags": "[]",
+                        "title": "A",
+                        "document_id": "doc-a",
+                    },
+                    {
+                        "heading_path": "Overview",
+                        "source_file": "p05-interferometer/file.md",
+                        "tags": "[]",
+                        "title": "B",
+                        "document_id": "doc-b",
+                    },
+                ]],
+                "distances": [[0.2, 0.21]],
+            }
+
+    monkeypatch.setattr("atocore.retrieval.retriever.get_vector_store", lambda: FakeStore())
+    monkeypatch.setattr("atocore.retrieval.retriever.embed_query", lambda query: [0.0, 0.1])
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever._existing_chunk_ids",
+        lambda chunk_ids: set(chunk_ids),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.get_registered_project",
+        lambda project_name: (_ for _ in ()).throw(ValueError("registry overlap")),
+    )
+    monkeypatch.setattr(
+        "atocore.retrieval.retriever.log.warning",
+        lambda event, **kwargs: warnings.append((event, kwargs)),
+    )
+
+    results = retrieve("overview", top_k=2, project_hint="p04")
+
+    assert [r.chunk_id for r in results] == ["chunk-a", "chunk-b"]
+    assert {warning[0] for warning in warnings} == {
+        "project_scope_resolution_failed",
+        "project_match_boost_resolution_failed",
+    }
+    assert all("registry overlap" in warning[1]["error"] for warning in warnings)
+
+
 def test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths(monkeypatch):
398	tests/test_v1_0_write_invariants.py	Normal file
@@ -0,0 +1,398 @@
"""V1-0 write-time invariant tests.

Covers the Engineering V1 completion plan Phase V1-0 acceptance:
- F-1 shared-header fields: extractor_version + canonical_home + hand_authored
  land in the entities table with working defaults
- F-8 provenance enforcement: create_entity raises without source_refs
  unless hand_authored=True
- F-5 synchronous conflict-detection hook on any active-entity write
  (create_entity with status="active" + the pre-existing promote_entity
  path); fail-open per conflict-model.md:256
- Q-3 "flag, never block": a conflict never 4xx-blocks the write
- Q-4 partial trust: get_entities scope_only filters candidates out

Plan: docs/plans/engineering-v1-completion-plan.md
Spec: docs/architecture/engineering-v1-acceptance.md
"""

from __future__ import annotations

import pytest

from atocore.engineering.service import (
    EXTRACTOR_VERSION,
    create_entity,
    create_relationship,
    get_entities,
    get_entity,
    init_engineering_schema,
    promote_entity,
    supersede_entity,
)
from atocore.models.database import get_connection, init_db
# ---------- F-1: shared-header fields ----------


def test_entity_row_has_shared_header_fields(tmp_data_dir):
    init_db()
    init_engineering_schema()
    with get_connection() as conn:
        cols = {row["name"] for row in conn.execute("PRAGMA table_info(entities)").fetchall()}
    assert "extractor_version" in cols
    assert "canonical_home" in cols
    assert "hand_authored" in cols


def test_created_entity_has_default_extractor_version_and_canonical_home(tmp_data_dir):
    init_db()
    init_engineering_schema()
    e = create_entity(
        entity_type="component",
        name="Pivot Pin",
        project="p04-gigabit",
        source_refs=["test:fixture"],
    )
    assert e.extractor_version == EXTRACTOR_VERSION
    assert e.canonical_home == "entity"
    assert e.hand_authored is False

    # round-trip through get_entity to confirm the row mapper returns
    # the same values (not just the return-by-construct path)
    got = get_entity(e.id)
    assert got is not None
    assert got.extractor_version == EXTRACTOR_VERSION
    assert got.canonical_home == "entity"
    assert got.hand_authored is False


def test_explicit_extractor_version_is_persisted(tmp_data_dir):
    init_db()
    init_engineering_schema()
    e = create_entity(
        entity_type="decision",
        name="Pick GF-PTFE pads",
        project="p04-gigabit",
        source_refs=["interaction:abc"],
        extractor_version="custom-v2.3",
    )
    got = get_entity(e.id)
    assert got.extractor_version == "custom-v2.3"
# ---------- F-8: provenance enforcement ----------


def test_create_entity_without_provenance_raises(tmp_data_dir):
    init_db()
    init_engineering_schema()
    with pytest.raises(ValueError, match="source_refs required"):
        create_entity(
            entity_type="component",
            name="No Provenance",
            project="p04-gigabit",
            hand_authored=False,  # explicit — bypasses the test-conftest auto-flag
        )


def test_create_entity_with_hand_authored_needs_no_source_refs(tmp_data_dir):
    init_db()
    init_engineering_schema()
    e = create_entity(
        entity_type="component",
        name="Human Entry",
        project="p04-gigabit",
        hand_authored=True,
    )
    assert e.hand_authored is True
    got = get_entity(e.id)
    assert got.hand_authored is True
    # source_refs stays empty — the hand_authored flag IS the provenance
    assert got.source_refs == []


def test_create_entity_with_empty_source_refs_list_is_treated_as_missing(tmp_data_dir):
    init_db()
    init_engineering_schema()
    with pytest.raises(ValueError, match="source_refs required"):
        create_entity(
            entity_type="component",
            name="Empty Refs",
            project="p04-gigabit",
            source_refs=[],
            hand_authored=False,
        )
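A minimal sketch of the F-8 guard these tests describe, assuming the simplest possible check inside `create_entity`; the helper name and parameter handling are illustrative, not the shipped service code.

# Hypothetical sketch of the F-8 invariant: machine-extracted entities must
# carry provenance; hand-authored ones may omit it, but only by saying so
# explicitly via the hand_authored flag.
def _check_provenance(source_refs, hand_authored: bool) -> None:
    if not hand_authored and not source_refs:
        raise ValueError("source_refs required unless hand_authored=True")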
def test_promote_rejects_legacy_candidate_without_provenance(tmp_data_dir):
    """Regression (Codex V1-0 probe): candidate rows can exist in the DB
    from before V1-0 enforcement (or from paths that bypass create_entity).
    promote_entity must re-check the invariant and refuse to flip a
    no-provenance candidate to active. Without this check, the active
    store can leak F-8 violations in from legacy data."""
    init_db()
    init_engineering_schema()

    # Simulate a pre-V1-0 candidate by inserting directly into the table,
    # bypassing the service-layer invariant. Real legacy rows look exactly
    # like this: empty source_refs, hand_authored=0.
    import uuid as _uuid
    entity_id = str(_uuid.uuid4())
    with get_connection() as conn:
        conn.execute(
            "INSERT INTO entities (id, entity_type, name, project, "
            "description, properties, status, confidence, source_refs, "
            "extractor_version, canonical_home, hand_authored, "
            "created_at, updated_at) "
            "VALUES (?, 'component', 'Legacy Orphan', 'p04-gigabit', "
            "'', '{}', 'candidate', 1.0, '[]', '', 'entity', 0, "
            "CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)",
            (entity_id,),
        )

    with pytest.raises(ValueError, match="source_refs required"):
        promote_entity(entity_id)

    # And the row stays a candidate — no half-transition.
    got = get_entity(entity_id)
    assert got is not None
    assert got.status == "candidate"
def test_api_promote_returns_400_on_legacy_no_provenance(tmp_data_dir):
    """R14 (Codex, 2026-04-22): the HTTP promote route must translate
    the V1-0 ValueError for no-provenance candidates into 400, not 500.
    Previously the route didn't catch ValueError, so legacy bad
    candidates surfaced as a server error."""
    init_db()
    init_engineering_schema()

    import uuid as _uuid
    from fastapi.testclient import TestClient
    from atocore.main import app

    entity_id = str(_uuid.uuid4())
    with get_connection() as conn:
        conn.execute(
            "INSERT INTO entities (id, entity_type, name, project, "
            "description, properties, status, confidence, source_refs, "
            "extractor_version, canonical_home, hand_authored, "
            "created_at, updated_at) "
            "VALUES (?, 'component', 'Legacy HTTP', 'p04-gigabit', "
            "'', '{}', 'candidate', 1.0, '[]', '', 'entity', 0, "
            "CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)",
            (entity_id,),
        )

    client = TestClient(app)
    r = client.post(f"/entities/{entity_id}/promote")
    assert r.status_code == 400
    assert "source_refs required" in r.json().get("detail", "")

    # Row still candidate — the 400 didn't half-transition.
    got = get_entity(entity_id)
    assert got is not None
    assert got.status == "candidate"
def test_promote_accepts_candidate_flagged_hand_authored(tmp_data_dir):
    """The other side of the promote re-check: hand_authored=1 with
    empty source_refs still lets promote succeed, matching
    create_entity's symmetry."""
    init_db()
    init_engineering_schema()

    import uuid as _uuid
    entity_id = str(_uuid.uuid4())
    with get_connection() as conn:
        conn.execute(
            "INSERT INTO entities (id, entity_type, name, project, "
            "description, properties, status, confidence, source_refs, "
            "extractor_version, canonical_home, hand_authored, "
            "created_at, updated_at) "
            "VALUES (?, 'component', 'Hand Authored Candidate', "
            "'p04-gigabit', '', '{}', 'candidate', 1.0, '[]', '', "
            "'entity', 1, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)",
            (entity_id,),
        )

    assert promote_entity(entity_id) is True
    assert get_entity(entity_id).status == "active"
# ---------- F-5: synchronous conflict-detection hook ----------


def test_active_create_runs_conflict_detection_hook(tmp_data_dir, monkeypatch):
    """status=active writes trigger detect_conflicts_for_entity."""
    init_db()
    init_engineering_schema()

    called_with: list[str] = []

    def _fake_detect(entity_id: str):
        called_with.append(entity_id)
        return []

    import atocore.engineering.conflicts as conflicts_mod
    monkeypatch.setattr(conflicts_mod, "detect_conflicts_for_entity", _fake_detect)

    e = create_entity(
        entity_type="component",
        name="Active With Hook",
        project="p04-gigabit",
        source_refs=["test:hook"],
        status="active",
    )

    assert called_with == [e.id]
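A minimal sketch of the synchronous, fail-open hook shape these F-5/Q-3 tests assert; the wrapper name and the structured-logger call are illustrative assumptions.

# Hypothetical sketch: run conflict detection synchronously on active-entity
# writes, but never let a detector failure block the write itself
# (Q-3 flag-never-block).
def _run_conflict_hook(entity_id: str, detect, log_warning) -> None:
    try:
        detect(entity_id)
    except Exception as exc:
        log_warning("conflict_detection_failed", entity_id=entity_id, error=str(exc))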
def test_supersede_runs_conflict_detection_on_new_active(tmp_data_dir, monkeypatch):
    """Regression (Codex V1-0 probe): per plan's 'every active-entity
    write path', supersede_entity must trigger synchronous conflict
    detection. The subject is the `superseded_by` entity — the one
    whose graph state just changed because a new `supersedes` edge was
    rooted at it."""
    init_db()
    init_engineering_schema()

    old = create_entity(
        entity_type="component",
        name="Old Pad",
        project="p04-gigabit",
        source_refs=["test:old"],
        status="active",
    )
    new = create_entity(
        entity_type="component",
        name="New Pad",
        project="p04-gigabit",
        source_refs=["test:new"],
        status="active",
    )

    called_with: list[str] = []

    def _fake_detect(entity_id: str):
        called_with.append(entity_id)
        return []

    import atocore.engineering.conflicts as conflicts_mod
    monkeypatch.setattr(conflicts_mod, "detect_conflicts_for_entity", _fake_detect)

    assert supersede_entity(old.id, superseded_by=new.id) is True

    # The detector fires on the `superseded_by` entity — the one whose
    # edges just grew a new `supersedes` relationship.
    assert new.id in called_with
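
# Illustrative sketch: on supersede, the detection subject is the entity that remains
# active. `new` just gained a `supersedes` edge, while `old` is being retired, so
# re-checking `old` would be wasted work. _supersede_with_hook is hypothetical;
# supersede_entity is the real helper used above and _after_entity_write is the sketch
# above.
def _supersede_with_hook(old_id: str, new_id: str) -> bool:
    ok = supersede_entity(old_id, superseded_by=new_id)
    if ok:
        _after_entity_write(new_id, status="active")  # subject = superseded_by
    return ok
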
def test_supersede_hook_fails_open(tmp_data_dir, monkeypatch):
    """Supersede must survive a broken detector per Q-3 flag-never-block."""
    init_db()
    init_engineering_schema()

    old = create_entity(
        entity_type="component", name="Old2", project="p04-gigabit",
        source_refs=["test:old"], status="active",
    )
    new = create_entity(
        entity_type="component", name="New2", project="p04-gigabit",
        source_refs=["test:new"], status="active",
    )

    def _boom(entity_id: str):
        raise RuntimeError("synthetic detector failure")

    import atocore.engineering.conflicts as conflicts_mod
    monkeypatch.setattr(conflicts_mod, "detect_conflicts_for_entity", _boom)

    # The supersede still succeeds despite the detector blowing up.
    assert supersede_entity(old.id, superseded_by=new.id) is True
    assert get_entity(old.id).status == "superseded"
def test_candidate_create_does_not_run_conflict_hook(tmp_data_dir, monkeypatch):
    """status=candidate writes do NOT trigger detection — the hook is
    for active rows only, per V1-0 scope. Candidates are checked at
    promote time."""
    init_db()
    init_engineering_schema()

    called: list[str] = []

    def _fake_detect(entity_id: str):
        called.append(entity_id)
        return []

    import atocore.engineering.conflicts as conflicts_mod
    monkeypatch.setattr(conflicts_mod, "detect_conflicts_for_entity", _fake_detect)

    create_entity(
        entity_type="component",
        name="Candidate No Hook",
        project="p04-gigabit",
        source_refs=["test:cand"],
        status="candidate",
    )

    assert called == []
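
# Illustrative sketch of "checked at promote time": promotion is itself an active-entity
# write, so the same hook fires once the row flips to active. _promote_with_hook is
# hypothetical; promote_entity is the real helper used in the promote tests above and
# _after_entity_write is the earlier sketch. The real promote path may wire this
# differently.
def _promote_with_hook(entity_id: str) -> bool:
    ok = promote_entity(entity_id)  # candidate -> active
    if ok:
        _after_entity_write(entity_id, status="active")
    return ok
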
# ---------- Q-3: flag, never block ----------


def test_conflict_detector_failure_does_not_block_write(tmp_data_dir, monkeypatch):
    """Per conflict-model.md:256: detection errors must not fail the
    write. The entity is still created; only a warning is logged."""
    init_db()
    init_engineering_schema()

    def _boom(entity_id: str):
        raise RuntimeError("synthetic detector failure")

    import atocore.engineering.conflicts as conflicts_mod
    monkeypatch.setattr(conflicts_mod, "detect_conflicts_for_entity", _boom)

    # The write still succeeds — no exception propagates.
    e = create_entity(
        entity_type="component",
        name="Hook Fails Open",
        project="p04-gigabit",
        source_refs=["test:failopen"],
        status="active",
    )
    assert get_entity(e.id) is not None
# ---------- Q-4 (partial): trust-hierarchy — scope_only filters candidates ----------


def test_scope_only_active_does_not_return_candidates(tmp_data_dir):
    """V1-0 partial Q-4: active-scoped listing never returns candidates.
    Full trust-hierarchy coverage (no-auto-project-state, etc.) ships in
    V1-E per plan."""
    init_db()
    init_engineering_schema()

    active = create_entity(
        entity_type="component",
        name="Active Alpha",
        project="p04-gigabit",
        source_refs=["test:alpha"],
        status="active",
    )
    candidate = create_entity(
        entity_type="component",
        name="Candidate Beta",
        project="p04-gigabit",
        source_refs=["test:beta"],
        status="candidate",
    )

    listed = get_entities(project="p04-gigabit", status="active", scope_only=True)
    ids = {e.id for e in listed}
    assert active.id in ids
    assert candidate.id not in ids
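
# Illustrative sketch of the Q-4 partial guarantee: with scope_only=True the listing is
# driven by an explicit status filter, so candidate rows cannot leak into an
# active-scoped read. _scoped_active is hypothetical and the query is only a plausible
# shape of what get_entities(project=..., status="active", scope_only=True) resolves to;
# the real query may differ. get_connection and the entities columns are as used above.
def _scoped_active(project: str) -> list:
    with get_connection() as conn:
        rows = conn.execute(
            "SELECT id FROM entities WHERE project = ? AND status = 'active'",
            (project,),
        ).fetchall()
    return rows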