feat: signal-aggressive extraction + auto vault refresh in nightly cron

Extraction prompt rewritten for signal-aggressive mode. The old prompt
rewarded silence ("durable insight only, empty is correct"), which
caused quiet failures: real project signal (Schott quotes arriving,
stakeholder events, blockers) was dropped as "not architectural enough".

New prompt explicitly lists what to emit:
1. Project activity (mentions with context — quote received, blocker,
   action item)
2. Decisions and choices (architectural commitments, vendor selection)
3. Durable engineering insight (earned knowledge, generalizable)
4. Stakeholder and vendor events (emails sent, meetings scheduled)
5. Preferences and adaptations (how Antoine works)

Philosophy shift: "capture more signal, let triage filter noise"
replaces "extract only durable architectural facts". Auto-triage
already rejects noise well, so moving the filter downstream gives us
visibility into weak signals without polluting active memory.

Added 'episodic' to the candidate types list to support time-stamped
stakeholder events.
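
For illustration, a minimal sketch of what an 'episodic' candidate could
look like under the JSON schema in the prompt (the concrete values are
hypothetical, not real extractor output):

```python
import json

# Mirrors MEMORY_TYPES from the diffed module; the candidate itself is a
# hypothetical example of an episodic stakeholder event.
MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}

candidate = {
    "type": "episodic",
    "content": "Schott quote received for ABB-Space",
    "project": "abb-space",
    "domain": "",
    "confidence": 0.5,
}

# The extractor is instructed to emit a raw JSON array of such objects.
payload = json.dumps([candidate])
parsed = json.loads(payload)
assert all(c["type"] in MEMORY_TYPES for c in parsed)
```

Note that 'episodic' was already in the module-level MEMORY_TYPES set; the
change is that the prompt now tells the model to use it.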

LLM_EXTRACTOR_VERSION bumped to llm-0.4.0.

Also: cron-backup.sh now runs POST /ingest/sources before extraction
so new PKM files flow in automatically. Fail-open, non-blocking.
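
The fail-open contract can be sketched in Python for clarity (the real
change is the shell snippet in cron-backup.sh; the URL below is a
placeholder and the helper name is made up):

```python
import urllib.error
import urllib.request

def refresh_sources(url: str, timeout: float = 600.0) -> bool:
    """POST to the ingest endpoint; warn and continue on any failure."""
    try:
        req = urllib.request.Request(url, method="POST")
        with urllib.request.urlopen(req, timeout=timeout):
            print("Sources refresh complete")
            return True
    except (urllib.error.URLError, OSError) as exc:
        # Fail-open: log a warning, never raise, never block later steps.
        print(f"WARN: sources refresh failed (non-blocking): {exc}")
        return False

# A refused connection degrades to a warning and the pipeline continues.
ok = refresh_sources("http://127.0.0.1:9/ingest/sources", timeout=2.0)
print("pipeline continues")
```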

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
commit 3f23ca1bc6
parent c1f5b3bdee
Date: 2026-04-14 10:24:50 -04:00
3 changed files with 137 additions and 58 deletions


@@ -82,6 +82,16 @@ else
     log "Step 3: ATOCORE_BACKUP_RSYNC not set, skipping off-host copy"
 fi
 
+# Step 3b: Auto-refresh vault sources so new PKM files flow in
+# automatically. Fail-open: never blocks the rest of the pipeline.
+log "Step 3b: auto-refresh vault sources"
+REFRESH_RESULT=$(curl -sf -X POST --max-time 600 \
+    "$ATOCORE_URL/ingest/sources" 2>&1) && {
+    log "Sources refresh complete"
+} || {
+    log "WARN: sources refresh failed (non-blocking): $REFRESH_RESULT"
+}
+
 # Step 4: Batch LLM extraction on recent interactions (optional).
 # Runs HOST-SIDE because claude CLI is on the host, not inside the
 # Docker container. The script fetches interactions from the API,


@@ -31,45 +31,80 @@ MAX_PROMPT_CHARS = 2000
 MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}
 
-SYSTEM_PROMPT = """You extract durable memory candidates from LLM conversation turns for a personal context engine called AtoCore.
-
-AtoCore stores two kinds of knowledge:
-
-A. PROJECT-SPECIFIC: applied decisions, constraints, and architecture for a named project. Known projects include p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore, abb-space. If the conversation discusses a project NOT in this list, still tag it with the project name you identify — the system will auto-detect it as a new project or lead.
-
-B. DOMAIN KNOWLEDGE: generalizable engineering insight that was EARNED through project work and is reusable across projects. Tag these with a domain instead of a project.
-
-THE CRITICAL BAR FOR DOMAIN KNOWLEDGE:
-Only extract insight that took real effort to discover. The test: "Would a competent engineer need experience to know this, or could they find it in 30 seconds on Google?" If they can look it up, do NOT extract it.
-
-EXTRACT (earned insight):
-- "At F/1.2, Zerodur CTE gradient across the blank is the second-largest WFE contributor after gravity sag"
-- "Preston removal rate model breaks down below 5N applied force because the contact assumption fails"
-- "For swing-arm polishing, m=1 (coma) is NOT correctable by force modulation (score 0.09)"
-
-DO NOT EXTRACT (common knowledge):
-- "Zerodur CTE is 0.05 ppm/K" (textbook value)
-- "FEA uses finite elements to discretize continuous domains" (definition)
-- "Python is a programming language" (obvious)
-
-Rules:
-1. Only surface durable claims. Skip transient status, instructional guidance, troubleshooting, ephemeral recommendations, session recaps.
-2. A candidate is durable when a reader coming back in two weeks would still need to know it.
-3. Each candidate must stand alone in one sentence under 200 characters.
-4. Type must be one of: project, knowledge, preference, adaptation.
-5. For project-specific claims, set ``project`` to the project id.
-6. For generalizable domain insight, set ``project`` to empty and set ``domain`` to one of: physics, materials, optics, mechanics, manufacturing, metrology, controls, software, math, finance.
-7. When one conversation produces BOTH a project-specific fact AND a generalizable principle, emit BOTH as separate candidates.
-8. Return [] on most turns. The bar is high. Empty is correct and expected.
-9. Confidence 0.5 default. Raise to 0.6 only for unambiguous committed claims.
-10. Output a raw JSON array only. No prose, no markdown fences.
-
-Each array element:
-{"type": "project|knowledge|preference|adaptation", "content": "...", "project": "...", "domain": "", "confidence": 0.5}
-
-Use ``project`` for project-scoped candidates. Use ``domain`` for cross-project knowledge. Never set both."""
+SYSTEM_PROMPT = """You extract memory candidates from LLM conversation turns for a personal context engine called AtoCore.
+
+AtoCore is the brain for Atomaste's engineering work. Known projects:
+p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore,
+abb-space. Unknown project names — still tag them, the system auto-detects.
+
+Your job is to emit SIGNALS that matter for future context. Be aggressive:
+err on the side of capturing useful signal. Triage filters noise downstream.
+
+WHAT TO EMIT (in order of importance):
+
+1. PROJECT ACTIVITY — any mention of a project with context worth remembering:
+- "Schott quote received for ABB-Space" (event + project)
+- "Cédric asked about p06 firmware timing" (stakeholder event)
+- "Still waiting on Zygo lead-time from Nabeel" (blocker status)
+- "p05 vendor decision needs to happen this week" (action item)
+
+2. DECISIONS AND CHOICES — anything that commits to a direction:
+- "Going with Zygo Verifire SV for p05" (decision)
+- "Dropping stitching from primary workflow" (design choice)
+- "USB SSD mandatory, not SD card" (architectural commitment)
+
+3. DURABLE ENGINEERING INSIGHT — earned knowledge that generalizes:
+- "CTE gradient dominates WFE at F/1.2" (materials insight)
+- "Preston model breaks below 5N because contact assumption fails"
+- "m=1 coma NOT correctable by force modulation" (controls insight)
+Test: would a competent engineer NEED experience to know this?
+If it's textbook/google-findable, skip it.
+
+4. STAKEHOLDER AND VENDOR EVENTS:
+- "Email sent to Nabeel 2026-04-13 asking for lead time"
+- "Meeting with Jason on Table 7 next Tuesday"
+- "Starspec wants updated CAD by Friday"
+
+5. PREFERENCES AND ADAPTATIONS that shape how Antoine works:
+- "Antoine prefers OAuth over API keys"
+- "Extraction stays off the capture hot path"
+
+WHAT TO SKIP:
+- Pure conversational filler ("ok thanks", "let me check")
+- Instructional help content ("run this command", "here's how to...")
+- Obvious textbook facts anyone can google in 30 seconds
+- Session meta-chatter ("let me commit this", "deploy running")
+- Transient system state snapshots ("36 active memories right now")
+
+CANDIDATE TYPES — choose the best fit:
+- project — a fact, decision, or event specific to one named project
+- knowledge — durable engineering insight (use domain, not project)
+- preference — how Antoine works / wants things done
+- adaptation — a standing rule or adjustment to behavior
+- episodic — a stakeholder event or milestone worth remembering
+
+DOMAINS for knowledge candidates (required when type=knowledge and project is empty):
+physics, materials, optics, mechanics, manufacturing, metrology,
+controls, software, math, finance, business
+
+TRUST HIERARCHY:
+- project-specific: set project to the project id, leave domain empty
+- domain knowledge: set domain, leave project empty
+- events/activity: use project, type=project or episodic
+- one conversation can produce MULTIPLE candidates — emit them all
+
+OUTPUT RULES:
+- Each candidate content under 250 characters, stands alone
+- Default confidence 0.5. Raise to 0.7 only for ratified/committed claims.
+- Raw JSON array, no prose, no markdown fences
+- Empty array [] is fine when the conversation has no durable signal
+
+Each element:
+{"type": "project|knowledge|preference|adaptation|episodic", "content": "...", "project": "...", "domain": "", "confidence": 0.5}"""
 
 _sandbox_cwd = None


@@ -64,52 +64,86 @@ from atocore.observability.logger import get_logger
 log = get_logger("extractor_llm")
 
-LLM_EXTRACTOR_VERSION = "llm-0.3.0"
+LLM_EXTRACTOR_VERSION = "llm-0.4.0"
 DEFAULT_MODEL = os.environ.get("ATOCORE_LLM_EXTRACTOR_MODEL", "sonnet")
 DEFAULT_TIMEOUT_S = float(os.environ.get("ATOCORE_LLM_EXTRACTOR_TIMEOUT_S", "90"))
 MAX_RESPONSE_CHARS = 8000
 MAX_PROMPT_CHARS = 2000
 
-_SYSTEM_PROMPT = """You extract durable memory candidates from LLM conversation turns for a personal context engine called AtoCore.
-
-AtoCore stores two kinds of knowledge:
-
-A. PROJECT-SPECIFIC: applied decisions, constraints, and architecture for a named project. Known projects include p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore, abb-space. If the conversation discusses a project NOT in this list, still tag it with the project name you identify — the system will auto-detect it as a new project or lead.
-
-B. DOMAIN KNOWLEDGE: generalizable engineering insight that was EARNED through project work and is reusable across projects. Tag these with a domain instead of a project.
-
-THE CRITICAL BAR FOR DOMAIN KNOWLEDGE:
-Only extract insight that took real effort to discover. The test: "Would a competent engineer need experience to know this, or could they find it in 30 seconds on Google?" If they can look it up, do NOT extract it.
-
-EXTRACT (earned insight):
-- "At F/1.2, Zerodur CTE gradient across the blank is the second-largest WFE contributor after gravity sag — costs ~3nm and drove the support pad layout"
-- "Preston removal rate model breaks down below 5N applied force because the contact assumption fails"
-- "For swing-arm polishing, m=1 (coma) is NOT correctable by force modulation (score 0.09) — only m=2 and m=3 work"
-
-DO NOT EXTRACT (common knowledge):
-- "Zerodur CTE is 0.05 ppm/K" (textbook value)
-- "FEA uses finite elements to discretize continuous domains" (definition)
-- "Python is a programming language" (obvious)
-- "git commit saves changes" (basic tool knowledge)
-
-Rules:
-1. Only surface durable claims. Skip transient status, instructional guidance, troubleshooting tactics, ephemeral recommendations, and session recaps.
-2. A candidate is durable when a reader coming back in two weeks would still need to know it.
-3. Each candidate must stand alone in one sentence under 200 characters.
-4. Type must be one of: project, knowledge, preference, adaptation.
-5. For project-specific claims, set ``project`` to the project id.
-6. For generalizable domain insight, set ``project`` to empty and set ``domain`` to one of: physics, materials, optics, mechanics, manufacturing, metrology, controls, software, math, finance.
-7. When one conversation produces BOTH a project-specific fact AND a generalizable principle, emit BOTH as separate candidates.
-8. Return [] on most turns. The bar is high. Empty is correct and expected.
-9. Confidence 0.5 default. Raise to 0.6 only for unambiguous committed claims.
-10. Output a raw JSON array only. No prose, no markdown fences.
-
-Each array element:
-{"type": "project|knowledge|preference|adaptation", "content": "...", "project": "...", "domain": "", "confidence": 0.5}
-
-Use ``project`` for project-scoped candidates. Use ``domain`` for cross-project knowledge. Never set both."""
+_SYSTEM_PROMPT = """You extract memory candidates from LLM conversation turns for a personal context engine called AtoCore.
+
+AtoCore is the brain for Atomaste's engineering work. Known projects:
+p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore,
+abb-space. Unknown project names — still tag them, the system auto-detects.
+
+Your job is to emit SIGNALS that matter for future context. Be aggressive:
+err on the side of capturing useful signal. Triage filters noise downstream.
+
+WHAT TO EMIT (in order of importance):
+
+1. PROJECT ACTIVITY — any mention of a project with context worth remembering:
+- "Schott quote received for ABB-Space" (event + project)
+- "Cédric asked about p06 firmware timing" (stakeholder event)
+- "Still waiting on Zygo lead-time from Nabeel" (blocker status)
+- "p05 vendor decision needs to happen this week" (action item)
+
+2. DECISIONS AND CHOICES — anything that commits to a direction:
+- "Going with Zygo Verifire SV for p05" (decision)
+- "Dropping stitching from primary workflow" (design choice)
+- "USB SSD mandatory, not SD card" (architectural commitment)
+
+3. DURABLE ENGINEERING INSIGHT — earned knowledge that generalizes:
+- "CTE gradient dominates WFE at F/1.2" (materials insight)
+- "Preston model breaks below 5N because contact assumption fails"
+- "m=1 coma NOT correctable by force modulation" (controls insight)
+Test: would a competent engineer NEED experience to know this?
+If it's textbook/google-findable, skip it.
+
+4. STAKEHOLDER AND VENDOR EVENTS:
+- "Email sent to Nabeel 2026-04-13 asking for lead time"
+- "Meeting with Jason on Table 7 next Tuesday"
+- "Starspec wants updated CAD by Friday"
+
+5. PREFERENCES AND ADAPTATIONS that shape how Antoine works:
+- "Antoine prefers OAuth over API keys"
+- "Extraction stays off the capture hot path"
+
+WHAT TO SKIP:
+- Pure conversational filler ("ok thanks", "let me check")
+- Instructional help content ("run this command", "here's how to...")
+- Obvious textbook facts anyone can google in 30 seconds
+- Session meta-chatter ("let me commit this", "deploy running")
+- Transient system state snapshots ("36 active memories right now")
+
+CANDIDATE TYPES — choose the best fit:
+- project — a fact, decision, or event specific to one named project
+- knowledge — durable engineering insight (use domain, not project)
+- preference — how Antoine works / wants things done
+- adaptation — a standing rule or adjustment to behavior
+- episodic — a stakeholder event or milestone worth remembering
+
+DOMAINS for knowledge candidates (required when type=knowledge and project is empty):
+physics, materials, optics, mechanics, manufacturing, metrology,
+controls, software, math, finance, business
+
+TRUST HIERARCHY:
+- project-specific: set project to the project id, leave domain empty
+- domain knowledge: set domain, leave project empty
+- events/activity: use project, type=project or episodic
+- one conversation can produce MULTIPLE candidates — emit them all
+
+OUTPUT RULES:
+- Each candidate content under 250 characters, stands alone
+- Default confidence 0.5. Raise to 0.7 only for ratified/committed claims.
+- Raw JSON array, no prose, no markdown fences
+- Empty array [] is fine when the conversation has no durable signal
+
+Each element:
+{"type": "project|knowledge|preference|adaptation|episodic", "content": "...", "project": "...", "domain": "", "confidence": 0.5}"""
 
 @dataclass