Commit Graph

1 Commits

Author SHA1 Message Date
53147d326c feat(phase9-C): rule-based candidate extractor and review queue
Phase 9 Commit C. Closes the capture loop: Commit A records what
AtoCore fed the LLM and what came back, Commit B bumps confidence on
active memories the response actually references, and this commit
turns structured cues in the response into candidate memories for a
human review queue.

Nothing extracted here is ever automatically promoted into trusted
state. Every candidate sits at status="candidate" until a human (or
later, a confident automatic policy) calls /memory/{id}/promote or
/memory/{id}/reject. This keeps the "bad memory is worse than no
memory" invariant from the operating model intact.

New module: src/atocore/memory/extractor.py
- MemoryCandidate dataclass (type, content, rule, source_span,
  project, confidence, source_interaction_id)
- extract_candidates_from_interaction(interaction): runs a fixed set
  of regex rules over the response + response_summary and returns
  a list of candidates

V0 rule set (deliberately narrow to keep false positives low):
- decision_heading     ## Decision: / ## Decision - / ## Decision —
                       -> adaptation candidate
- constraint_heading   ## Constraint: ...      -> project candidate
- requirement_heading  ## Requirement: ...     -> project candidate
- fact_heading         ## Fact: ...            -> knowledge candidate
- preference_sentence  "I prefer X" / "the user prefers X"
                       -> preference candidate
- decided_to_sentence  "decided to X"          -> adaptation candidate
- requirement_sentence "the requirement is X"  -> project candidate

Extractor post-processing:
- clean_value: collapse whitespace, strip trailing punctuation
- min content length 8 chars, max 280 (keeps candidates reviewable)
- dedupe by (memory_type, normalized value, rule)
- drop candidates whose content already matches an active memory of
  the same type+project so the queue doesn't ask humans to re-curate
  things they already promoted

Memory service (extends Commit B candidate-status foundation):
- promote_memory(id): candidate -> active (404 if not a candidate)
- reject_candidate_memory(id): candidate -> invalid
- both are no-ops if the target isn't currently a candidate so the
  API can surface 404 without the caller needing to pre-check

API endpoints (new):
- POST /interactions/{id}/extract             run extractor, preview-only
  body: {"persist": false}                    (default) returns candidates
        {"persist": true}                     creates candidate memories
- POST /memory/{id}/promote                   candidate -> active
- POST /memory/{id}/reject                    candidate -> invalid
- GET  /memory?status=candidate               list review queue explicitly
      (existing endpoint now accepts status= override)
- GET  /memory now also returns reference_count and last_referenced_at
  per memory so the Commit B reinforcement signal is visible to clients

Trust model unchanged:
- candidates NEVER appear in context packs (get_memories_for_context
  still filters to active via the active_only default)
- candidates NEVER get reinforced by the Commit B loop (reinforcement
  refuses non-active memories)
- trusted project state is untouched end-to-end

Tests (25 new, all green):
- heading pattern: decision, constraint, requirement, fact
- separator variants :, -, em-dash
- sentence patterns: preference, decided_to, requirement
- rejects too-short matches
- dedupes identical matches
- strips trailing punctuation
- carries project and source_interaction_id onto candidates
- drops candidates that duplicate an existing active memory
- returns empty for prose without structural cues
- candidate and active coexist in the memory table
- promote_memory moves candidate -> active
- promote on non-candidate returns False
- reject_candidate_memory moves candidate -> invalid
- reject on non-candidate returns False
- get_memories(status="candidate") returns just the queue
- POST /interactions/{id}/extract preview-only path
- POST /interactions/{id}/extract persist=true path
- POST /interactions/{id}/extract 404 for missing interaction
- POST /memory/{id}/promote success + 404 on non-candidate
- POST /memory/{id}/reject 404 on missing
- GET /memory?status=candidate surfaces the queue
- GET /memory?status=<invalid> returns 400

Full suite: 160 passing (was 135).

What Phase 9 looks like end to end after this commit
----------------------------------------------------
prompt
  -> context pack assembled
    -> LLM response
      -> POST /interactions (capture)
         -> automatic Commit B reinforcement (active memories only)
         -> [optional] POST /interactions/{id}/extract
            -> Commit C extractor proposes candidates
               -> human reviews via GET /memory?status=candidate
                  -> POST /memory/{id}/promote  (candidate -> active)
                  OR POST /memory/{id}/reject   (candidate -> invalid)

Not in this commit (deferred on purpose):
- Decay of unused memories (we keep reference_count and
  last_referenced_at so a later decay job has the signal it needs)
- LLM-based extractor as an alternative to the regex rules
- Automatic promotion of high-confidence candidates
- Candidate-to-entity upgrade path (needs the engineering layer
  memory-vs-entities decision, planned in a coming architecture doc)
2026-04-06 21:24:17 -04:00