feat(phase9-C): rule-based candidate extractor and review queue
Phase 9 Commit C. Closes the capture loop: Commit A records what
AtoCore fed the LLM and what came back, Commit B bumps confidence on
active memories the response actually references, and this commit
turns structured cues in the response into candidate memories for a
human review queue.
Nothing extracted here is ever automatically promoted into trusted
state. Every candidate sits at status="candidate" until a human (or
later, a confident automatic policy) calls /memory/{id}/promote or
/memory/{id}/reject. This keeps the "bad memory is worse than no
memory" invariant from the operating model intact.
New module: src/atocore/memory/extractor.py
- MemoryCandidate dataclass (type, content, rule, source_span,
project, confidence, source_interaction_id)
- extract_candidates_from_interaction(interaction): runs a fixed set
of regex rules over the response + response_summary and returns
a list of candidates
V0 rule set (deliberately narrow to keep false positives low):
- decision_heading ## Decision: / ## Decision - / ## Decision —
-> adaptation candidate
- constraint_heading ## Constraint: ... -> project candidate
- requirement_heading ## Requirement: ... -> project candidate
- fact_heading ## Fact: ... -> knowledge candidate
- preference_sentence "I prefer X" / "the user prefers X"
-> preference candidate
- decided_to_sentence "decided to X" -> adaptation candidate
- requirement_sentence "the requirement is X" -> project candidate
Extractor post-processing:
- clean_value: collapse whitespace, strip trailing punctuation
- min content length 8 chars, max 280 (keeps candidates reviewable)
- dedupe by (memory_type, normalized value, rule)
- drop candidates whose content already matches an active memory of
the same type+project so the queue doesn't ask humans to re-curate
things they already promoted
Memory service (extends Commit B candidate-status foundation):
- promote_memory(id): candidate -> active (404 if not a candidate)
- reject_candidate_memory(id): candidate -> invalid
- both are no-ops if the target isn't currently a candidate so the
API can surface 404 without the caller needing to pre-check
API endpoints (new):
- POST /interactions/{id}/extract run extractor, preview-only
body: {"persist": false} (default) returns candidates
{"persist": true} creates candidate memories
- POST /memory/{id}/promote candidate -> active
- POST /memory/{id}/reject candidate -> invalid
- GET /memory?status=candidate list review queue explicitly
(existing endpoint now accepts status= override)
- GET /memory now also returns reference_count and last_referenced_at
per memory so the Commit B reinforcement signal is visible to clients
Trust model unchanged:
- candidates NEVER appear in context packs (get_memories_for_context
still filters to active via the active_only default)
- candidates NEVER get reinforced by the Commit B loop (reinforcement
refuses non-active memories)
- trusted project state is untouched end-to-end
Tests (25 new, all green):
- heading pattern: decision, constraint, requirement, fact
- separator variants :, -, em-dash
- sentence patterns: preference, decided_to, requirement
- rejects too-short matches
- dedupes identical matches
- strips trailing punctuation
- carries project and source_interaction_id onto candidates
- drops candidates that duplicate an existing active memory
- returns empty for prose without structural cues
- candidate and active coexist in the memory table
- promote_memory moves candidate -> active
- promote on non-candidate returns False
- reject_candidate_memory moves candidate -> invalid
- reject on non-candidate returns False
- get_memories(status="candidate") returns just the queue
- POST /interactions/{id}/extract preview-only path
- POST /interactions/{id}/extract persist=true path
- POST /interactions/{id}/extract 404 for missing interaction
- POST /memory/{id}/promote success + 404 on non-candidate
- POST /memory/{id}/reject 404 on missing
- GET /memory?status=candidate surfaces the queue
- GET /memory?status=<invalid> returns 400
Full suite: 160 passing (was 135).
What Phase 9 looks like end to end after this commit
----------------------------------------------------
prompt
-> context pack assembled
-> LLM response
-> POST /interactions (capture)
-> automatic Commit B reinforcement (active memories only)
-> [optional] POST /interactions/{id}/extract
-> Commit C extractor proposes candidates
-> human reviews via GET /memory?status=candidate
-> POST /memory/{id}/promote (candidate -> active)
OR POST /memory/{id}/reject (candidate -> invalid)
Not in this commit (deferred on purpose):
- Decay of unused memories (we keep reference_count and
last_referenced_at so a later decay job has the signal it needs)
- LLM-based extractor as an alternative to the regex rules
- Automatic promotion of high-confidence candidates
- Candidate-to-entity upgrade path (needs the engineering layer
memory-vs-entities decision, planned in a coming architecture doc)
This commit is contained in:
@@ -30,6 +30,10 @@ from atocore.interactions.service import (
|
||||
list_interactions,
|
||||
record_interaction,
|
||||
)
|
||||
from atocore.memory.extractor import (
|
||||
MemoryCandidate,
|
||||
extract_candidates_from_interaction,
|
||||
)
|
||||
from atocore.memory.reinforcement import reinforce_from_interaction
|
||||
from atocore.memory.service import (
|
||||
MEMORY_STATUSES,
|
||||
@@ -351,15 +355,25 @@ def api_get_memories(
|
||||
active_only: bool = True,
|
||||
min_confidence: float = 0.0,
|
||||
limit: int = 50,
|
||||
status: str | None = None,
|
||||
) -> dict:
|
||||
"""List memories, optionally filtered."""
|
||||
memories = get_memories(
|
||||
memory_type=memory_type,
|
||||
project=project,
|
||||
active_only=active_only,
|
||||
min_confidence=min_confidence,
|
||||
limit=limit,
|
||||
)
|
||||
"""List memories, optionally filtered.
|
||||
|
||||
When ``status`` is given explicitly it overrides ``active_only`` so
|
||||
the Phase 9 Commit C review queue can be listed via
|
||||
``GET /memory?status=candidate``.
|
||||
"""
|
||||
try:
|
||||
memories = get_memories(
|
||||
memory_type=memory_type,
|
||||
project=project,
|
||||
active_only=active_only,
|
||||
min_confidence=min_confidence,
|
||||
limit=limit,
|
||||
status=status,
|
||||
)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
return {
|
||||
"memories": [
|
||||
{
|
||||
@@ -369,11 +383,14 @@ def api_get_memories(
|
||||
"project": m.project,
|
||||
"confidence": m.confidence,
|
||||
"status": m.status,
|
||||
"reference_count": m.reference_count,
|
||||
"last_referenced_at": m.last_referenced_at,
|
||||
"updated_at": m.updated_at,
|
||||
}
|
||||
for m in memories
|
||||
],
|
||||
"types": MEMORY_TYPES,
|
||||
"statuses": MEMORY_STATUSES,
|
||||
}
|
||||
|
||||
|
||||
@@ -403,6 +420,33 @@ def api_invalidate_memory(memory_id: str) -> dict:
|
||||
return {"status": "invalidated", "id": memory_id}
|
||||
|
||||
|
||||
@router.post("/memory/{memory_id}/promote")
|
||||
def api_promote_memory(memory_id: str) -> dict:
|
||||
"""Promote a candidate memory to active (Phase 9 Commit C)."""
|
||||
try:
|
||||
success = promote_memory(memory_id)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
if not success:
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=f"Memory not found or not a candidate: {memory_id}",
|
||||
)
|
||||
return {"status": "promoted", "id": memory_id}
|
||||
|
||||
|
||||
@router.post("/memory/{memory_id}/reject")
|
||||
def api_reject_candidate_memory(memory_id: str) -> dict:
|
||||
"""Reject a candidate memory (Phase 9 Commit C review queue)."""
|
||||
success = reject_candidate_memory(memory_id)
|
||||
if not success:
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=f"Memory not found or not a candidate: {memory_id}",
|
||||
)
|
||||
return {"status": "rejected", "id": memory_id}
|
||||
|
||||
|
||||
@router.post("/project/state")
|
||||
def api_set_project_state(req: ProjectStateSetRequest) -> dict:
|
||||
"""Set or update a trusted project state entry."""
|
||||
@@ -528,6 +572,70 @@ def api_reinforce_interaction(interaction_id: str) -> dict:
|
||||
}
|
||||
|
||||
|
||||
class InteractionExtractRequest(BaseModel):
|
||||
persist: bool = False
|
||||
|
||||
|
||||
@router.post("/interactions/{interaction_id}/extract")
|
||||
def api_extract_from_interaction(
|
||||
interaction_id: str,
|
||||
req: InteractionExtractRequest | None = None,
|
||||
) -> dict:
|
||||
"""Extract candidate memories from a captured interaction.
|
||||
|
||||
Phase 9 Commit C. The extractor is rule-based and deliberately
|
||||
conservative — it only surfaces candidates that matched an explicit
|
||||
structural cue (decision heading, preference sentence, etc.). By
|
||||
default the candidates are returned *without* being persisted so a
|
||||
caller can preview them before committing to a review queue. Pass
|
||||
``persist: true`` to immediately create candidate memories for
|
||||
each extraction result.
|
||||
"""
|
||||
interaction = get_interaction(interaction_id)
|
||||
if interaction is None:
|
||||
raise HTTPException(status_code=404, detail=f"Interaction not found: {interaction_id}")
|
||||
payload = req or InteractionExtractRequest()
|
||||
candidates: list[MemoryCandidate] = extract_candidates_from_interaction(interaction)
|
||||
|
||||
persisted_ids: list[str] = []
|
||||
if payload.persist:
|
||||
for candidate in candidates:
|
||||
try:
|
||||
mem = create_memory(
|
||||
memory_type=candidate.memory_type,
|
||||
content=candidate.content,
|
||||
project=candidate.project,
|
||||
confidence=candidate.confidence,
|
||||
status="candidate",
|
||||
)
|
||||
persisted_ids.append(mem.id)
|
||||
except ValueError as e:
|
||||
log.error(
|
||||
"extract_persist_failed",
|
||||
interaction_id=interaction_id,
|
||||
rule=candidate.rule,
|
||||
error=str(e),
|
||||
)
|
||||
|
||||
return {
|
||||
"interaction_id": interaction_id,
|
||||
"candidate_count": len(candidates),
|
||||
"persisted": payload.persist,
|
||||
"persisted_ids": persisted_ids,
|
||||
"candidates": [
|
||||
{
|
||||
"memory_type": c.memory_type,
|
||||
"content": c.content,
|
||||
"project": c.project,
|
||||
"confidence": c.confidence,
|
||||
"rule": c.rule,
|
||||
"source_span": c.source_span,
|
||||
}
|
||||
for c in candidates
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
@router.get("/interactions")
|
||||
def api_list_interactions(
|
||||
project: str | None = None,
|
||||
|
||||
Reference in New Issue
Block a user