scripts/retrieval_eval.py walks a fixture file of project-hinted
questions, runs each against POST /context/build, and scores the
returned formatted_context against per-fixture expect_present and
expect_absent substring checklists. Exit 0 on all-pass, 1 on any
miss. Human-readable by default, --json for automation.
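
The scoring step described above can be sketched as follows. This is a minimal illustration, not the code in scripts/retrieval_eval.py: FixtureResult, score_fixture, and exit_code are hypothetical names, and case-sensitive substring matching is an assumption. The HTTP call to POST /context/build is left out; the sketch starts from an already-returned formatted_context string.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FixtureResult:
    """Outcome of one fixture (hypothetical structure, for illustration)."""
    name: str
    missing: list[str] = field(default_factory=list)     # expect_present entries not found
    unexpected: list[str] = field(default_factory=list)  # expect_absent entries that were found

    @property
    def passed(self) -> bool:
        return not self.missing and not self.unexpected


def score_fixture(fixture: dict, formatted_context: str) -> FixtureResult:
    """Apply one fixture's substring checklists to a returned context pack.

    Assumption: matching is plain, case-sensitive substring containment.
    """
    result = FixtureResult(name=fixture["name"])
    for needle in fixture.get("expect_present", []):
        if needle not in formatted_context:
            result.missing.append(needle)
    for needle in fixture.get("expect_absent", []):
        if needle in formatted_context:
            result.unexpected.append(needle)
    return result


def exit_code(results: list[FixtureResult]) -> int:
    """0 when every fixture passed, 1 on any miss."""
    return 0 if all(r.passed for r in results) else 1
```

Keeping missing and unexpected as separate lists lets a report distinguish a retrieval gap from cross-project bleed, which are the two failure modes seen in the first live run.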
First live run against Dalidou at SHA 1161645: 4/6 pass. The two
failures are real findings, not harness bugs:
- p05-configuration FAIL: "GigaBIT M1" appears in the p05 pack.
Cross-project bleed from a shared p05 doc that legitimately
mentions the p04 mirror under test. Fixture kept strict so
future ranker tuning can close the gap.
- p05-vendor-signal FAIL: "Zygo" missing. The vendor memory exists
with confidence 0.9, but get_memories_for_context walks memories
in a fixed, query-independent order (effectively by updated_at /
confidence). Lower-ranked memories get pushed out of the
per-project budget slice by higher-confidence ones, even when the
query is specifically about the lower-ranked content. Ordering
memories by query relevance is the natural next fix.
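
One possible shape for that fix, as a rough sketch only. Nothing here is taken from get_memories_for_context: the token-overlap score, the character budget, the select_memories name, and the example data are all assumptions for illustration. Only the 0.9 confidence of the vendor memory comes from the run above; the 0.95 value and both memory texts are made up.

```python
import re


def _tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query, content):
    """Fraction of query tokens that also occur in the memory content."""
    q = _tokens(query)
    return len(q & _tokens(content)) / len(q) if q else 0.0


def select_memories(memories, query, budget_chars, by_relevance=True):
    """Fill a character budget from an ordered list of memories.

    by_relevance=False sorts by confidence only, a stand-in for the
    current fixed order; by_relevance=True sorts by query overlap
    first and uses confidence as the tie-breaker.
    """
    if by_relevance:
        key = lambda m: (relevance(query, m["content"]), m["confidence"])
    else:
        key = lambda m: m["confidence"]
    picked, used = [], 0
    for m in sorted(memories, key=key, reverse=True):
        if used + len(m["content"]) > budget_chars:
            continue  # does not fit in the remaining budget slice
        picked.append(m)
        used += len(m["content"])
    return picked


# Illustrative data only (hypothetical memory texts and budget).
EXAMPLE_QUERY = "what is the current vendor signal for the interferometer procurement"
EXAMPLE_MEMORIES = [
    {"content": "Interferometer architecture: folded-beam with CGH null",
     "confidence": 0.95},
    {"content": "Vendor signal: 4D strongest technical candidate, "
                "Zygo Verifire SV value path",
     "confidence": 0.9},
]
```

With a budget of 100 characters only one of the two example memories fits. The confidence-only order keeps the architecture memory and drops the vendor one; the relevance order keeps the vendor memory, which is the behaviour the p05-vendor-signal fixture expects. A real ranker would likely need something stronger than token overlap.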
Docs sync:
- master-plan-status.md: Phase 9 reflection entry now notes that
capture→reinforce runs automatically and project memories reach
the context pack, while extract remains batch/manual. First batch-
extract pass surfaced 1 candidate from 42 interactions — extractor
rule tuning is a known follow-up.
- next-steps.md: the 2026-04-11 retrieval quality review entry now
shows the project-memory-band work as DONE, and a new
"Reflection Loop Live Check" subsection records the extractor-
coverage finding from the first batch run.
- Both files now agree with the code; follow-up reviewers
(Codex, future Claude) should no longer see narrative drift.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
86 lines
2.3 KiB
JSON
[
  {
    "name": "p04-architecture-decision",
    "project": "p04-gigabit",
    "prompt": "what mirror architecture was selected for GigaBIT M1 and why",
    "expect_present": [
      "--- Trusted Project State ---",
      "Option B",
      "conical",
      "--- Project Memories ---"
    ],
    "expect_absent": [
      "p06-polisher",
      "folded-beam"
    ],
    "notes": "Canonical p04 decision — should surface both Trusted Project State (selected_mirror_architecture) and the project-memory band with the Option B memory"
  },
  {
    "name": "p04-constraints",
    "project": "p04-gigabit",
    "prompt": "what are the key GigaBIT M1 program constraints",
    "expect_present": [
      "--- Trusted Project State ---",
      "Zerodur",
      "1.2"
    ],
    "expect_absent": [
      "polisher suite"
    ],
    "notes": "Key constraints are in Trusted Project State (key_constraints) and in the mission-framing memory"
  },
  {
    "name": "p05-configuration",
    "project": "p05-interferometer",
    "prompt": "what is the selected interferometer configuration",
    "expect_present": [
      "folded-beam",
      "CGH"
    ],
    "expect_absent": [
      "p04-gigabit",
      "GigaBIT M1"
    ],
    "notes": "P05 architecture memory covers folded-beam + CGH; should not bleed p04"
  },
  {
    "name": "p05-vendor-signal",
    "project": "p05-interferometer",
    "prompt": "what is the current vendor signal for the interferometer procurement",
    "expect_present": [
      "4D",
      "Zygo"
    ],
    "expect_absent": [
      "polisher"
    ],
    "notes": "Vendor memory mentions 4D as strongest technical candidate and Zygo Verifire SV as value path"
  },
  {
    "name": "p06-suite-split",
    "project": "p06-polisher",
    "prompt": "how is the polisher software suite split across layers",
    "expect_present": [
      "polisher-sim",
      "polisher-post",
      "polisher-control"
    ],
    "expect_absent": [
      "GigaBIT"
    ],
    "notes": "The three-layer split is in multiple p06 memories; check all three names surface together"
  },
  {
    "name": "p06-control-rule",
    "project": "p06-polisher",
    "prompt": "what is the polisher control design rule",
    "expect_present": [
      "interlocks"
    ],
    "expect_absent": [
      "interferometer"
    ],
    "notes": "Control design rule memory mentions interlocks and state transitions"
  }
]