Files
ATOCore/scripts/retrieval_eval_fixtures.json
Anto01 3921c5ffc7 test: Day 6 — retrieval harness expanded from 6 to 18 fixtures
Added 12 new fixtures across all three active projects:

- p04: 1 short/ambiguous case ('current status')
- p05: 1 CGH calibration case with cross-project bleed guard
- p06: 7 new fixtures targeting triage-promoted memories
  (firmware interface, z-axis, cam encoder, telemetry rate,
  offline design, USB SSD, Tailscale)
- Adversarial: cross-project-no-bleed (p04 query must not surface
  p06 telemetry rate), no-project-hint (project memories must not
  appear without a hint)

First run: 14/18 passing.

4 failures (p06-firmware-interface, p06-z-axis, p06-offline-design,
p06-tailscale) share the same root cause: long pre-existing p06
memories (530+ chars, confidence 0.9+) fill the 25% project-memory
budget before the query-relevant newly-promoted memories (shorter,
confidence 0.5) get a slot. Budget contention at equal overlap
score tiebroken by confidence. Day 7 ranking tweak target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 06:32:47 -04:00

226 lines
6.3 KiB
JSON

[
{
"name": "p04-architecture-decision",
"project": "p04-gigabit",
"prompt": "what mirror architecture was selected for GigaBIT M1 and why",
"expect_present": [
"--- Trusted Project State ---",
"Option B",
"conical",
"--- Project Memories ---"
],
"expect_absent": [
"p06-polisher",
"folded-beam"
],
"notes": "Canonical p04 decision — should surface both Trusted Project State and the project-memory band"
},
{
"name": "p04-constraints",
"project": "p04-gigabit",
"prompt": "what are the key GigaBIT M1 program constraints",
"expect_present": [
"--- Trusted Project State ---",
"Zerodur",
"1.2"
],
"expect_absent": [
"polisher suite"
],
"notes": "Key constraints are in Trusted Project State and in the mission-framing memory"
},
{
"name": "p04-short-ambiguous",
"project": "p04-gigabit",
"prompt": "current status",
"expect_present": [
"--- Trusted Project State ---"
],
"expect_absent": [],
"notes": "Short ambiguous prompt — at minimum project state should surface. Hard case: the prompt is generic enough that chunks may not rank well."
},
{
"name": "p05-configuration",
"project": "p05-interferometer",
"prompt": "what is the selected interferometer configuration",
"expect_present": [
"folded-beam",
"CGH"
],
"expect_absent": [
"Option B",
"conical back",
"polisher suite"
],
"notes": "P05 architecture memory covers folded-beam + CGH. GigaBIT M1 legitimately appears in p05 source docs."
},
{
"name": "p05-vendor-signal",
"project": "p05-interferometer",
"prompt": "what is the current vendor signal for the interferometer procurement",
"expect_present": [
"4D",
"Zygo"
],
"expect_absent": [
"polisher"
],
"notes": "Vendor memory mentions 4D as strongest technical candidate and Zygo Verifire SV as value path"
},
{
"name": "p05-cgh-calibration",
"project": "p05-interferometer",
"prompt": "how does CGH calibration work for the interferometer",
"expect_present": [
"CGH"
],
"expect_absent": [
"polisher-sim",
"polisher-post"
],
"notes": "CGH is a core p05 concept. Should surface via chunks and possibly the architecture memory. Must not bleed p06 polisher-suite terms."
},
{
"name": "p06-suite-split",
"project": "p06-polisher",
"prompt": "how is the polisher software suite split across layers",
"expect_present": [
"polisher-sim",
"polisher-post",
"polisher-control"
],
"expect_absent": [
"GigaBIT"
],
"notes": "The three-layer split is in multiple p06 memories"
},
{
"name": "p06-control-rule",
"project": "p06-polisher",
"prompt": "what is the polisher control design rule",
"expect_present": [
"interlocks"
],
"expect_absent": [
"interferometer"
],
"notes": "Control design rule memory mentions interlocks and state transitions"
},
{
"name": "p06-firmware-interface",
"project": "p06-polisher",
"prompt": "what is the firmware interface contract for the polisher machine",
"expect_present": [
"controller-job"
],
"expect_absent": [
"interferometer",
"GigaBIT"
],
"notes": "New p06 memory from the first triage: firmware interface contract is invariant controller-job.v1 in, run-log.v1 out"
},
{
"name": "p06-z-axis",
"project": "p06-polisher",
"prompt": "how does the polisher Z-axis work",
"expect_present": [
"engage"
],
"expect_absent": [
"interferometer"
],
"notes": "New p06 memory: Z-axis is binary engage/retract, not continuous position. The word 'engage' should appear."
},
{
"name": "p06-cam-mechanism",
"project": "p06-polisher",
"prompt": "how is cam amplitude controlled on the polisher",
"expect_present": [
"encoder"
],
"expect_absent": [
"GigaBIT"
],
"notes": "New p06 memory: cam set mechanically by operator, read by encoders. The word 'encoder' should appear."
},
{
"name": "p06-telemetry-rate",
"project": "p06-polisher",
"prompt": "what is the expected polishing telemetry data rate",
"expect_present": [
"29 MB"
],
"expect_absent": [
"interferometer"
],
"notes": "New p06 knowledge memory: approximately 29 MB per hour at 100 Hz"
},
{
"name": "p06-offline-design",
"project": "p06-polisher",
"prompt": "does the polisher machine need network to operate",
"expect_present": [
"offline"
],
"expect_absent": [
"CGH"
],
"notes": "New p06 memory: machine works fully offline and independently; network is for remote access only"
},
{
"name": "p06-short-ambiguous",
"project": "p06-polisher",
"prompt": "current status",
"expect_present": [
"--- Trusted Project State ---"
],
"expect_absent": [],
"notes": "Short ambiguous prompt — project state should surface at minimum"
},
{
"name": "cross-project-no-bleed",
"project": "p04-gigabit",
"prompt": "what telemetry rate should we target",
"expect_present": [],
"expect_absent": [
"29 MB",
"polisher"
],
"notes": "Adversarial: telemetry rate is a p06 fact. A p04 query for 'telemetry rate' must NOT surface p06 memories. Tests cross-project gating."
},
{
"name": "no-project-hint",
"project": "",
"prompt": "tell me about the current projects",
"expect_present": [],
"expect_absent": [
"--- Project Memories ---"
],
"notes": "Without a project hint, project memories must not appear (cross-project bleed guard). Chunks may appear if any match."
},
{
"name": "p06-usb-ssd",
"project": "p06-polisher",
"prompt": "what storage solution is specified for the polisher RPi",
"expect_present": [
"USB SSD"
],
"expect_absent": [
"interferometer"
],
"notes": "New p06 memory from triage: USB SSD mandatory, not SD card"
},
{
"name": "p06-tailscale",
"project": "p06-polisher",
"prompt": "how do we access the polisher machine remotely",
"expect_present": [
"Tailscale"
],
"expect_absent": [
"GigaBIT"
],
"notes": "New p06 memory: Tailscale mesh for RPi remote access"
}
]