feat: Day 3 — auto-triage via LLM second pass

scripts/auto_triage.py: fetches candidate memories, asks a triage model (claude -p, default sonnet) to classify each as promote / reject / needs_human, and executes the verdict via the API.

Trust model:
- Auto-promote: model says promote AND confidence >= 0.8 AND dedup-checked against existing active memories for the project
- Auto-reject: model says reject
- needs_human: everything else stays in queue for manual review

The triage model receives both the candidate content AND a summary of existing active memories for the same project, so it can detect duplicates and near-duplicates. The system prompt explicitly lists the rejection categories learned from the first two manual triage passes (stale snapshots, impl details, planned-not-implemented, process rules that belong in ledger not memory).

deploy/dalidou/batch-extract.sh now runs extraction (Step A) then auto-triage (Step B) in sequence. The nightly cron at 03:00 UTC will run the full pipeline: backup → cleanup → rsync → extract → triage. Only needs_human candidates reach the human.

Supports --dry-run for preview without executing. Supports --model override for multi-model triage (e.g. opus for higher-quality review, or a future Gemini/Ollama backend).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
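
For readers skimming the commit, the trust model in the message above can be sketched as one small decision function. This is an illustration only, not the code in scripts/auto_triage.py: the names `Verdict`, `decide`, and `is_duplicate` are hypothetical, and the sketch assumes the dedup check has already been reduced to a boolean.

```python
from dataclasses import dataclass

# Threshold taken from the commit message: auto-promote needs confidence >= 0.8.
PROMOTE_THRESHOLD = 0.8


@dataclass(frozen=True)
class Verdict:
    """Hypothetical shape of one triage-model answer for a candidate."""
    label: str         # "promote", "reject", or "needs_human"
    confidence: float  # assumed to be in the range 0.0 to 1.0


def decide(verdict: Verdict, is_duplicate: bool) -> str:
    """Map a model verdict to the action the pipeline would take.

    is_duplicate is an assumed input: True when the candidate matches an
    existing active memory for the same project.
    """
    # Auto-reject: the model says reject, with no confidence gate.
    if verdict.label == "reject":
        return "reject"
    # Auto-promote: all three conditions must hold.
    if (
        verdict.label == "promote"
        and verdict.confidence >= PROMOTE_THRESHOLD
        and not is_duplicate
    ):
        return "promote"
    # Everything else stays in the queue for manual review.
    return "needs_human"
```

Under these assumptions, a low-confidence promote or a promote that duplicates an active memory falls through to needs_human rather than being rejected, which matches the "everything else stays in queue" rule.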
2026-04-12 12:30:57 -04:00
parent 9b149d4bfd
commit 1a2ee5e07f
4 changed files with 289 additions and 3 deletions
--- a/scripts/eval_data/candidate_queue_2026-04-12.json
+++ b/scripts/eval_data/candidate_queue_2026-04-12.json
--- a/scripts/eval_data/candidate_queue_2026-04-12.txt
+++ b/scripts/eval_data/candidate_queue_2026-04-12.txt
@@ -0,0 +1,29 @@
+ 1. [project    ] proj=atocore          AtoCore extraction must stay off the hot capture path; batch endpoint only
+ 2. [project    ] proj=atocore          Auto-promote gate: confidence ≥0.8 AND no duplicate in active memories
+ 3. [project    ] proj=atocore          AtoCore LLM extraction pipeline deployed on Dalidou host, runs via cron at 03:00 UTC via scripts/batch_llm_extract_live.py
+ 4. [project    ] proj=atocore          LLM extractor runs host-side (not in container) because claude CLI not available in container environment
+ 5. [project    ] proj=atocore          Host-side extraction script scripts/batch_llm_extract_live.py uses pure stdlib, no atocore imports for deployment simplicity
+ 6. [project    ] proj=atocore          POST /admin/extract-batch accepts mode: rule|llm, POST /interactions/{id}/extract now mode-aware
+ 7. [knowledge  ] proj=atocore          claude CLI 2.0.60 removed --no-session-persistence flag, extraction sessions now persist in claude history
+ 8. [adaptation ] proj=atocore          Durable memory extraction candidates must be <200 chars, stand-alone, typed as project|knowledge|preference|adaptation
+ 9. [adaptation ] proj=atocore          Memory extraction confidence defaults to 0.5, raise to 0.6 only for unambiguous committed claims
+10. [project    ] proj=atocore          Live Dalidou is on commit 39d73e9, not e2895b5
+11. [project    ] proj=atocore          Live harness is reproducible at 16/18 PASS
+12. [project    ] proj=atocore          Live active memories count is 36
+13. [project    ] proj=atocore          Wave 2 project-state entries on live: p04=5, p05=6, p06=6
+14. [project    ] proj=atocore          R6 is fixed by commit 39d73e9
+15. [project    ] proj=atocore          R9: R6 fix only covers empty project fallback; wrong non-empty model project can still override known interaction scope
+16. [project    ] proj=atocore          R10: Phase 8 is baseline-complete but not primary-complete; OpenClaw client covers narrow read-oriented slice of API
+17. [project    ] proj=atocore          Phase 8 is decent baseline integration milestone but not primary-ready yet
+18. [project    ] proj=atocore          4-step roadmap complete: extractor → harness → Wave 2 → OpenClaw
+19. [project    ] proj=atocore          Codex audit loop proven across two full round-trips in one session
+20. [project    ] proj=atocore          Session end state: 36 active memories, 17 project-state entries, 16/18 harness, 280 tests, main at 54d84b5
+21. [project    ] proj=atocore          AtoCore extraction stays off the hot capture path; LLM extraction runs as scheduled batch, not inline with POST /interactions.
+22. [project    ] proj=atocore          AtoCore auto-triage trust model: auto-promote only when confidence ≥0.8 AND no duplicate active memory; else needs_human.
+23. [project    ] proj=atocore          Multi-model triage: use different model for triage reviewer than extractor (sonnet for extract)
+24. [project    ] proj=atocore          R9 fix: when interaction has known project, prefer it over model's non-matching project unless model's is registered
+25. [project    ] proj=atocore          R7 ranking fix: add overlap-density as secondary signal (overlap_count / memory_token_count)
+26. [project    ] proj=atocore          Extraction pipeline skips interactions with response_chars < 50 to avoid low-signal content
+27. [project    ] proj=atocore          AtoCore triage uses independent model from extractor (extractor: sonnet, triage: different model or different prompt).
+28. [project    ] proj=atocore          AtoCore ranking scorer adds overlap-density (overlap_count / memory_tokens) as secondary signal to fix short-memory ranking.
+29. [project    ] proj=atocore          AtoCore project trust: when interaction has known project and model returns different project, prefer interaction's project unless