docs: Day 5 — extractor scope + stale follow-ups cleaned

Documents the LLM-assisted extractor's in-scope / out-of-scope
categories derived from the first live triage pass (16 promoted,
35 rejected). Five in-scope classes, six explicit out-of-scope
classes, trust model summary, multi-model future direction.

Cleaned up stale follow-up items in next-steps.md: rule expansion
marked deprioritized, LLM extractor marked done, retrieval harness
marked done with expansion pending.

Fixed docstring timeout (45s -> 90s) in extractor_llm.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-12 06:24:25 -04:00
parent b98a658831
commit 93f796207f
2 changed files with 47 additions and 8 deletions

View File

@@ -226,14 +226,53 @@ candidate was a synthetic test capture from earlier in the session
- Capture → reinforce is working correctly on live data (length-aware
matcher verified on live paraphrase of a p04 memory).
Follow-up candidates (not yet scheduled):
Follow-up candidates:
1. Extractor rule expansion — add conversational-form rules so real
session text has a chance of surfacing candidates.
2. LLM-assisted extractor as a separate rule family, guarded by
confidence and always landing in `status=candidate` (never active).
3. Retrieval eval harness — diffable scorecard of
`formatted_context` across a fixed question set per active project.
1. ~~Extractor rule expansion~~ — Day 2 baseline showed 0% recall
across 5 distinct miss classes; rule expansion cannot close a
5-way miss. Deprioritized.
2. ~~LLM-assisted extractor~~ — DONE 2026-04-12. `extractor_llm.py`
shells out to `claude -p` (Haiku, OAuth, no API key). First live
run: 100% recall, 2.55 yield/interaction on a 20-interaction
labeled set. First triage: 51 candidates → 16 promoted, 35
rejected (31% accept rate). Active memories 20 → 36.
3. ~~Retrieval eval harness~~ — DONE 2026-04-11 (scripts/retrieval_eval.py,
6/6 passing). Expansion to 15-20 fixtures is mini-phase Day 6.
## Extractor Scope — 2026-04-12
What the LLM-assisted extractor (`src/atocore/memory/extractor_llm.py`)
extracts from conversational Claude Code captures:
**In scope:**
- Architectural commitments (e.g. "Z-axis is engage/retract, not
continuous position")
- Ratified decisions with project scope (e.g. "USB SSD mandatory on
RPi for telemetry storage")
- Durable engineering facts (e.g. "telemetry data rate ~29 MB/hour")
- Working rules and adaptation patterns (e.g. "extraction stays off
the capture hot path")
- Interface invariants (e.g. "controller-job.v1 in, run-log.v1 out;
no firmware change needed")
**Out of scope (intentionally rejected by triage):**
- Transient roadmap / plan steps that will be stale in a week
- Operational instructions ("run this command to deploy")
- Process rules that live in DEV-LEDGER.md / AGENTS.md, not in memory
- Implementation details that are too granular (individual field names
when the parent concept is already captured)
- Already-fixed review findings (P1/P2 that no longer apply)
- Duplicates of existing active memories with wrong project tags
**Trust model:**
- Extraction stays off the capture hot path (batch / manual only)
- All candidates land as `status=candidate`, never auto-promoted
- Human or auto-triage reviews before promotion to active
- Future direction: multi-model extraction + triage (Codex/Gemini as
second-pass reviewers for robustness against single-model bias)
## Long-Run Goal

View File

@@ -29,7 +29,7 @@ Configuration:
- ``ATOCORE_LLM_EXTRACTOR_MODEL`` overrides the model alias (default
``haiku``).
- ``ATOCORE_LLM_EXTRACTOR_TIMEOUT_S`` overrides the per-call timeout
(default 45 seconds — first invocation is slow because Node.js
(default 90 seconds — first invocation is slow because Node.js
startup plus OAuth check is non-trivial).
Implementation notes: