docs: Day 5 — extractor scope + stale follow-ups cleaned

Documents the LLM-assisted extractor's in-scope / out-of-scope categories derived from the first live triage pass (16 promoted, 35 rejected). Five in-scope classes, six explicit out-of-scope classes, trust model summary, multi-model future direction.

Cleaned up stale follow-up items in next-steps.md: rule expansion marked deprioritized, LLM extractor marked done, retrieval harness marked done with expansion pending.

Fixed docstring timeout (45s -> 90s) in extractor_llm.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 06:24:25 -04:00
parent b98a658831
commit 93f796207f
2 changed files with 47 additions and 8 deletions
--- a/docs/next-steps.md
+++ b/docs/next-steps.md
@@ -226,14 +226,53 @@ candidate was a synthetic test capture from earlier in the session
 - Capture → reinforce is working correctly on live data (length-aware
  matcher verified on live paraphrase of a p04 memory).
-Follow-up candidates (not yet scheduled):
-1. Extractor rule expansion — add conversational-form rules so real
-   session text has a chance of surfacing candidates.
-2. LLM-assisted extractor as a separate rule family, guarded by
-   confidence and always landing in `status=candidate` (never active).
-3. Retrieval eval harness — diffable scorecard of
-   `formatted_context` across a fixed question set per active project.
+Follow-up candidates:
+1. ~~Extractor rule expansion~~ — Day 2 baseline showed 0% recall
+   across 5 distinct miss classes; rule expansion cannot close a
+   5-way miss. Deprioritized.
+2. ~~LLM-assisted extractor~~ — DONE 2026-04-12. `extractor_llm.py`
+   shells out to `claude -p` (Haiku, OAuth, no API key). First live
+   run: 100% recall, 2.55 yield/interaction on a 20-interaction
   labeled set. First triage: 51 candidates → 16 promoted, 35
   rejected (31% accept rate). Active memories 20 → 36.
 3. ~~Retrieval eval harness~~ — DONE 2026-04-11 (scripts/retrieval_eval.py,
   6/6 passing). Expansion to 15-20 fixtures is mini-phase Day 6.
 ## Extractor Scope — 2026-04-12
 What the LLM-assisted extractor (`src/atocore/memory/extractor_llm.py`)
 extracts from conversational Claude Code captures:
 **In scope:**
 - Architectural commitments (e.g. "Z-axis is engage/retract, not
  continuous position")
 - Ratified decisions with project scope (e.g. "USB SSD mandatory on
  RPi for telemetry storage")
 - Durable engineering facts (e.g. "telemetry data rate ~29 MB/hour")
 - Working rules and adaptation patterns (e.g. "extraction stays off
  the capture hot path")
 - Interface invariants (e.g. "controller-job.v1 in, run-log.v1 out;
  no firmware change needed")
 **Out of scope (intentionally rejected by triage):**
 - Transient roadmap / plan steps that will be stale in a week
 - Operational instructions ("run this command to deploy")
 - Process rules that live in DEV-LEDGER.md / AGENTS.md, not in memory
 - Implementation details that are too granular (individual field names
  when the parent concept is already captured)
 - Already-fixed review findings (P1/P2 that no longer apply)
 - Duplicates of existing active memories with wrong project tags
 **Trust model:**
 - Extraction stays off the capture hot path (batch / manual only)
 - All candidates land as `status=candidate`, never auto-promoted
 - Human or auto-triage reviews before promotion to active
 - Future direction: multi-model extraction + triage (Codex/Gemini as
  second-pass reviewers for robustness against single-model bias)
 ## Long-Run Goal
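
Note on the trust model in the hunk above: the rule that every extracted item lands as `status=candidate` and reaches `active` only after review can be sketched as below. This is a minimal illustration, not the AtoCore implementation. `MemoryCandidate`, `promote`, and `accept_rate` are hypothetical names, and the field layout is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryCandidate:
    """Hypothetical record for one extracted memory (not the real schema)."""
    text: str
    project: str
    status: str = "candidate"  # extractor output always starts here
    reviewed_by: Optional[str] = None


def promote(candidate: MemoryCandidate, reviewer: str) -> MemoryCandidate:
    """Move a candidate to active only when a named reviewer signs off."""
    if not reviewer:
        raise ValueError("promotion requires a reviewer (human or auto-triage)")
    if candidate.status != "candidate":
        raise ValueError(f"cannot promote from status={candidate.status!r}")
    candidate.status = "active"
    candidate.reviewed_by = reviewer
    return candidate


def accept_rate(promoted: int, rejected: int) -> float:
    """Fraction of triaged candidates that were promoted."""
    total = promoted + rejected
    return promoted / total if total else 0.0
```

With the figures from the first triage pass, `accept_rate(16, 35)` is 16/51, which rounds to the 31% quoted in the follow-up list. Keeping promotion in a separate function from extraction is what makes "never auto-promoted" enforceable in one place.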
--- a/src/atocore/memory/extractor_llm.py
+++ b/src/atocore/memory/extractor_llm.py
@@ -29,7 +29,7 @@ Configuration:
 - ``ATOCORE_LLM_EXTRACTOR_MODEL`` overrides the model alias (default
  ``haiku``).
 - ``ATOCORE_LLM_EXTRACTOR_TIMEOUT_S`` overrides the per-call timeout
-  (default 45 seconds — first invocation is slow because Node.js
+  (default 90 seconds — first invocation is slow because Node.js
  startup plus OAuth check is non-trivial).
 Implementation notes:
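
Note on the configuration documented in this docstring: one way the two environment variables could be resolved and applied to the `claude -p` call is sketched below. The function names are hypothetical, and passing the alias through a `--model` flag is an assumption about the CLI invocation; the actual code in `extractor_llm.py` may differ.

```python
import os
import subprocess

DEFAULT_MODEL = "haiku"
DEFAULT_TIMEOUT_S = 90.0  # matches the corrected docstring default


def resolve_extractor_config(env=None):
    """Return (model, timeout_s), falling back to defaults on bad input."""
    env = os.environ if env is None else env
    model = env.get("ATOCORE_LLM_EXTRACTOR_MODEL", DEFAULT_MODEL)
    raw = env.get("ATOCORE_LLM_EXTRACTOR_TIMEOUT_S", "")
    try:
        timeout_s = float(raw) if raw else DEFAULT_TIMEOUT_S
    except ValueError:
        timeout_s = DEFAULT_TIMEOUT_S
    if timeout_s <= 0:
        timeout_s = DEFAULT_TIMEOUT_S
    return model, timeout_s


def run_extractor(prompt, env=None):
    """Shell out to the CLI once; return stdout, or None on any failure."""
    model, timeout_s = resolve_extractor_config(env)
    try:
        result = subprocess.run(
            ["claude", "-p", prompt, "--model", model],  # flag is assumed
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Slow first invocation or missing binary: treat as "no candidates".
        return None
    return result.stdout if result.returncode == 0 else None
```

For example, `resolve_extractor_config({})` yields `("haiku", 90.0)`, and setting `ATOCORE_LLM_EXTRACTOR_TIMEOUT_S=120` raises the per-call limit to 120 seconds. Returning `None` on timeout, rather than raising, fits a batch extractor that should skip one slow call and continue.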