docs: Day 5 — extractor scope + stale follow-ups cleaned

Documents the LLM-assisted extractor's in-scope / out-of-scope categories derived from the first live triage pass (16 promoted, 35 rejected). Five in-scope classes, six explicit out-of-scope classes, trust model summary, multi-model future direction. Cleaned up stale follow-up items in next-steps.md: rule expansion marked deprioritized, LLM extractor marked done, retrieval harness marked done with expansion pending. Fixed docstring timeout (45s -> 90s) in extractor_llm.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 06:24:25 -04:00
parent b98a658831
commit 93f796207f
2 changed files with 47 additions and 8 deletions
--- a/docs/next-steps.md
+++ b/docs/next-steps.md
@@ -226,14 +226,53 @@ candidate was a synthetic test capture from earlier in the session
 - Capture → reinforce is working correctly on live data (length-aware
  matcher verified on live paraphrase of a p04 memory).

-Follow-up candidates (not yet scheduled):
+Follow-up candidates:

-1. Extractor rule expansion — add conversational-form rules so real
-   session text has a chance of surfacing candidates.
-2. LLM-assisted extractor as a separate rule family, guarded by
-   confidence and always landing in `status=candidate` (never active).
-3. Retrieval eval harness — diffable scorecard of
-   `formatted_context` across a fixed question set per active project.
+1. ~~Extractor rule expansion~~ — Day 2 baseline showed 0% recall
+   across 5 distinct miss classes; rule expansion cannot close a
+   5-way miss. Deprioritized.
+2. ~~LLM-assisted extractor~~ — DONE 2026-04-12. `extractor_llm.py`
+   shells out to `claude -p` (Haiku, OAuth, no API key). First live
+   run: 100% recall, 2.55 yield/interaction on a 20-interaction
+   labeled set. First triage: 51 candidates → 16 promoted, 35
+   rejected (31% accept rate). Active memories 20 → 36.
+3. ~~Retrieval eval harness~~ — DONE 2026-04-11 (scripts/retrieval_eval.py,
+   6/6 passing). Expansion to 15-20 fixtures is mini-phase Day 6.
+
+## Extractor Scope — 2026-04-12
+
+What the LLM-assisted extractor (`src/atocore/memory/extractor_llm.py`)
+extracts from conversational Claude Code captures:
+
+**In scope:**
+
+- Architectural commitments (e.g. "Z-axis is engage/retract, not
+  continuous position")
+- Ratified decisions with project scope (e.g. "USB SSD mandatory on
+  RPi for telemetry storage")
+- Durable engineering facts (e.g. "telemetry data rate ~29 MB/hour")
+- Working rules and adaptation patterns (e.g. "extraction stays off
+  the capture hot path")
+- Interface invariants (e.g. "controller-job.v1 in, run-log.v1 out;
+  no firmware change needed")
+
+**Out of scope (intentionally rejected by triage):**
+
+- Transient roadmap / plan steps that will be stale in a week
+- Operational instructions ("run this command to deploy")
+- Process rules that live in DEV-LEDGER.md / AGENTS.md, not in memory
+- Implementation details that are too granular (individual field names
+  when the parent concept is already captured)
+- Already-fixed review findings (P1/P2 that no longer apply)
+- Duplicates of existing active memories with wrong project tags
+
+**Trust model:**
+
+- Extraction stays off the capture hot path (batch / manual only)
+- All candidates land as `status=candidate`, never auto-promoted
+- Human or auto-triage reviews before promotion to active
+- Future direction: multi-model extraction + triage (Codex/Gemini as
+  second-pass reviewers for robustness against single-model bias)

 ## Long-Run Goal

--- a/src/atocore/memory/extractor_llm.py
+++ b/src/atocore/memory/extractor_llm.py
@@ -29,7 +29,7 @@ Configuration:
 - ``ATOCORE_LLM_EXTRACTOR_MODEL`` overrides the model alias (default
  ``haiku``).
 - ``ATOCORE_LLM_EXTRACTOR_TIMEOUT_S`` overrides the per-call timeout
-  (default 45 seconds — first invocation is slow because Node.js
+  (default 90 seconds — first invocation is slow because Node.js
  startup plus OAuth check is non-trivial).

 Implementation notes: