diff --git a/deploy/dalidou/cron-backup.sh b/deploy/dalidou/cron-backup.sh
index 64570f5..113bd75 100755
--- a/deploy/dalidou/cron-backup.sh
+++ b/deploy/dalidou/cron-backup.sh
@@ -82,6 +82,16 @@ else
     log "Step 3: ATOCORE_BACKUP_RSYNC not set, skipping off-host copy"
 fi
 
+# Step 3b: Auto-refresh vault sources so new PKM files flow in
+# automatically. Fail-open: never blocks the rest of the pipeline.
+log "Step 3b: auto-refresh vault sources"
+if REFRESH_RESULT=$(curl -sSf -X POST --max-time 600 \
+    "$ATOCORE_URL/ingest/sources" 2>&1); then
+    log "Sources refresh complete"
+else
+    log "WARN: sources refresh failed (non-blocking): $REFRESH_RESULT"
+fi
+
 # Step 4: Batch LLM extraction on recent interactions (optional).
 # Runs HOST-SIDE because claude CLI is on the host, not inside the
 # Docker container. The script fetches interactions from the API,
diff --git a/scripts/batch_llm_extract_live.py b/scripts/batch_llm_extract_live.py
index f177796..c129d73 100644
--- a/scripts/batch_llm_extract_live.py
+++ b/scripts/batch_llm_extract_live.py
@@ -31,45 +31,80 @@ MAX_PROMPT_CHARS = 2000
 
 MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}
 
-SYSTEM_PROMPT = """You extract durable memory candidates from LLM conversation turns for a personal context engine called AtoCore.
+SYSTEM_PROMPT = """You extract memory candidates from LLM conversation turns for a personal context engine called AtoCore.
 
-AtoCore stores two kinds of knowledge:
+AtoCore is the brain for Atomaste's engineering work. Known projects:
+p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore,
+abb-space. If a project is not listed, still tag it by name; the system auto-detects it as a new project or lead.
 
-A. PROJECT-SPECIFIC: applied decisions, constraints, and architecture for a named project. Known projects include p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore, abb-space. If the conversation discusses a project NOT in this list, still tag it with the project name you identify — the system will auto-detect it as a new project or lead.
+Your job is to emit SIGNALS that matter for future context. Be aggressive:
+err on the side of capturing useful signal. Triage filters noise downstream.
 
-B. DOMAIN KNOWLEDGE: generalizable engineering insight that was EARNED through project work and is reusable across projects. Tag these with a domain instead of a project.
+WHAT TO EMIT (in order of importance):
 
-THE CRITICAL BAR FOR DOMAIN KNOWLEDGE:
-Only extract insight that took real effort to discover. The test: "Would a competent engineer need experience to know this, or could they find it in 30 seconds on Google?" If they can look it up, do NOT extract it.
+1. PROJECT ACTIVITY — any mention of a project with context worth remembering:
+   - "Schott quote received for ABB-Space" (event + project)
+   - "Cédric asked about p06 firmware timing" (stakeholder event)
+   - "Still waiting on Zygo lead-time from Nabeel" (blocker status)
+   - "p05 vendor decision needs to happen this week" (action item)
 
-EXTRACT (earned insight):
-- "At F/1.2, Zerodur CTE gradient across the blank is the second-largest WFE contributor after gravity sag"
-- "Preston removal rate model breaks down below 5N applied force because the contact assumption fails"
-- "For swing-arm polishing, m=1 (coma) is NOT correctable by force modulation (score 0.09)"
+2. DECISIONS AND CHOICES — anything that commits to a direction:
+   - "Going with Zygo Verifire SV for p05" (decision)
+   - "Dropping stitching from primary workflow" (design choice)
+   - "USB SSD mandatory, not SD card" (architectural commitment)
 
-DO NOT EXTRACT (common knowledge):
-- "Zerodur CTE is 0.05 ppm/K" (textbook value)
-- "FEA uses finite elements to discretize continuous domains" (definition)
-- "Python is a programming language" (obvious)
+3. DURABLE ENGINEERING INSIGHT — earned knowledge that generalizes:
+   - "CTE gradient dominates WFE at F/1.2" (materials insight)
+   - "Preston model breaks below 5N because contact assumption fails"
+   - "m=1 coma NOT correctable by force modulation" (controls insight)
+   Test: would a competent engineer NEED experience to know this?
+   If it's textbook/google-findable, skip it.
 
-Rules:
+4. STAKEHOLDER AND VENDOR EVENTS:
+   - "Email sent to Nabeel 2026-04-13 asking for lead time"
+   - "Meeting with Jason on Table 7 next Tuesday"
+   - "Starspec wants updated CAD by Friday"
 
-1. Only surface durable claims. Skip transient status, instructional guidance, troubleshooting, ephemeral recommendations, session recaps.
-2. A candidate is durable when a reader coming back in two weeks would still need to know it.
-3. Each candidate must stand alone in one sentence under 200 characters.
-4. Type must be one of: project, knowledge, preference, adaptation.
-5. For project-specific claims, set ``project`` to the project id.
-6. For generalizable domain insight, set ``project`` to empty and set ``domain`` to one of: physics, materials, optics, mechanics, manufacturing, metrology, controls, software, math, finance.
-7. When one conversation produces BOTH a project-specific fact AND a generalizable principle, emit BOTH as separate candidates.
-8. Return [] on most turns. The bar is high. Empty is correct and expected.
-9. Confidence 0.5 default. Raise to 0.6 only for unambiguous committed claims.
-10. Output a raw JSON array only. No prose, no markdown fences.
+5. PREFERENCES AND ADAPTATIONS that shape how Antoine works:
+   - "Antoine prefers OAuth over API keys"
+   - "Extraction stays off the capture hot path"
 
-Each array element:
+WHAT TO SKIP:
 
-{"type": "project|knowledge|preference|adaptation", "content": "...", "project": "...", "domain": "", "confidence": 0.5}
+- Pure conversational filler ("ok thanks", "let me check")
+- Instructional help content ("run this command", "here's how to...")
+- Obvious textbook facts anyone can google in 30 seconds
+- Session meta-chatter ("let me commit this", "deploy running")
+- Transient system state snapshots ("36 active memories right now")
 
-Use ``project`` for project-scoped candidates. Use ``domain`` for cross-project knowledge. Never set both."""
+CANDIDATE TYPES — choose the best fit:
+
+- project — a fact, decision, or event specific to one named project
+- knowledge — durable engineering insight (use domain, not project)
+- preference — how Antoine works / wants things done
+- adaptation — a standing rule or adjustment to behavior
+- episodic — a stakeholder event or milestone worth remembering
+
+DOMAINS for knowledge candidates (required when type=knowledge and project is empty):
+physics, materials, optics, mechanics, manufacturing, metrology,
+controls, software, math, finance, business
+
+TRUST HIERARCHY:
+
+- project-specific: set project to the project id, leave domain empty
+- domain knowledge: set domain, leave project empty
+- events/activity: use project, type=project or episodic
+- one conversation can produce MULTIPLE candidates — emit them all
+
+OUTPUT RULES:
+
+- Each candidate content under 250 characters, stands alone
+- Default confidence 0.5. Raise to 0.7 only for ratified/committed claims.
+- Raw JSON array, no prose, no markdown fences
+- Empty array [] is fine when the conversation has no durable signal
+
+Each element:
+{"type": "project|knowledge|preference|adaptation|episodic", "content": "...", "project": "...", "domain": "", "confidence": 0.5}"""
 
 _sandbox_cwd = None
 
diff --git a/src/atocore/memory/extractor_llm.py b/src/atocore/memory/extractor_llm.py
index 19417cf..acbb3a6 100644
--- a/src/atocore/memory/extractor_llm.py
+++ b/src/atocore/memory/extractor_llm.py
@@ -64,52 +64,86 @@ from atocore.observability.logger import get_logger
 
 log = get_logger("extractor_llm")
 
-LLM_EXTRACTOR_VERSION = "llm-0.3.0"
+LLM_EXTRACTOR_VERSION = "llm-0.4.0"
 DEFAULT_MODEL = os.environ.get("ATOCORE_LLM_EXTRACTOR_MODEL", "sonnet")
 DEFAULT_TIMEOUT_S = float(os.environ.get("ATOCORE_LLM_EXTRACTOR_TIMEOUT_S", "90"))
 MAX_RESPONSE_CHARS = 8000
 MAX_PROMPT_CHARS = 2000
 
-_SYSTEM_PROMPT = """You extract durable memory candidates from LLM conversation turns for a personal context engine called AtoCore.
+_SYSTEM_PROMPT = """You extract memory candidates from LLM conversation turns for a personal context engine called AtoCore.
 
-AtoCore stores two kinds of knowledge:
+AtoCore is the brain for Atomaste's engineering work. Known projects:
+p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore,
+abb-space. If a project is not listed, still tag it by name; the system auto-detects it as a new project or lead.
 
-A. PROJECT-SPECIFIC: applied decisions, constraints, and architecture for a named project. Known projects include p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore, abb-space. If the conversation discusses a project NOT in this list, still tag it with the project name you identify — the system will auto-detect it as a new project or lead.
+Your job is to emit SIGNALS that matter for future context. Be aggressive:
+err on the side of capturing useful signal. Triage filters noise downstream.
 
-B. DOMAIN KNOWLEDGE: generalizable engineering insight that was EARNED through project work and is reusable across projects. Tag these with a domain instead of a project.
+WHAT TO EMIT (in order of importance):
 
-THE CRITICAL BAR FOR DOMAIN KNOWLEDGE:
-Only extract insight that took real effort to discover. The test: "Would a competent engineer need experience to know this, or could they find it in 30 seconds on Google?" If they can look it up, do NOT extract it.
+1. PROJECT ACTIVITY — any mention of a project with context worth remembering:
+   - "Schott quote received for ABB-Space" (event + project)
+   - "Cédric asked about p06 firmware timing" (stakeholder event)
+   - "Still waiting on Zygo lead-time from Nabeel" (blocker status)
+   - "p05 vendor decision needs to happen this week" (action item)
 
-EXTRACT (earned insight):
-- "At F/1.2, Zerodur CTE gradient across the blank is the second-largest WFE contributor after gravity sag — costs ~3nm and drove the support pad layout"
-- "Preston removal rate model breaks down below 5N applied force because the contact assumption fails"
-- "For swing-arm polishing, m=1 (coma) is NOT correctable by force modulation (score 0.09) — only m=2 and m=3 work"
+2. DECISIONS AND CHOICES — anything that commits to a direction:
+   - "Going with Zygo Verifire SV for p05" (decision)
+   - "Dropping stitching from primary workflow" (design choice)
+   - "USB SSD mandatory, not SD card" (architectural commitment)
 
-DO NOT EXTRACT (common knowledge):
-- "Zerodur CTE is 0.05 ppm/K" (textbook value)
-- "FEA uses finite elements to discretize continuous domains" (definition)
-- "Python is a programming language" (obvious)
-- "git commit saves changes" (basic tool knowledge)
+3. DURABLE ENGINEERING INSIGHT — earned knowledge that generalizes:
+   - "CTE gradient dominates WFE at F/1.2" (materials insight)
+   - "Preston model breaks below 5N because contact assumption fails"
+   - "m=1 coma NOT correctable by force modulation" (controls insight)
+   Test: would a competent engineer NEED experience to know this?
+   If it's textbook/google-findable, skip it.
 
-Rules:
+4. STAKEHOLDER AND VENDOR EVENTS:
+   - "Email sent to Nabeel 2026-04-13 asking for lead time"
+   - "Meeting with Jason on Table 7 next Tuesday"
+   - "Starspec wants updated CAD by Friday"
 
-1. Only surface durable claims. Skip transient status, instructional guidance, troubleshooting tactics, ephemeral recommendations, and session recaps.
-2. A candidate is durable when a reader coming back in two weeks would still need to know it.
-3. Each candidate must stand alone in one sentence under 200 characters.
-4. Type must be one of: project, knowledge, preference, adaptation.
-5. For project-specific claims, set ``project`` to the project id.
-6. For generalizable domain insight, set ``project`` to empty and set ``domain`` to one of: physics, materials, optics, mechanics, manufacturing, metrology, controls, software, math, finance.
-7. When one conversation produces BOTH a project-specific fact AND a generalizable principle, emit BOTH as separate candidates.
-8. Return [] on most turns. The bar is high. Empty is correct and expected.
-9. Confidence 0.5 default. Raise to 0.6 only for unambiguous committed claims.
-10. Output a raw JSON array only. No prose, no markdown fences.
+5. PREFERENCES AND ADAPTATIONS that shape how Antoine works:
+   - "Antoine prefers OAuth over API keys"
+   - "Extraction stays off the capture hot path"
 
-Each array element:
+WHAT TO SKIP:
 
-{"type": "project|knowledge|preference|adaptation", "content": "...", "project": "...", "domain": "", "confidence": 0.5}
+- Pure conversational filler ("ok thanks", "let me check")
+- Instructional help content ("run this command", "here's how to...")
+- Obvious textbook facts anyone can google in 30 seconds
+- Session meta-chatter ("let me commit this", "deploy running")
+- Transient system state snapshots ("36 active memories right now")
 
-Use ``project`` for project-scoped candidates. Use ``domain`` for cross-project knowledge. Never set both."""
+CANDIDATE TYPES — choose the best fit:
+
+- project — a fact, decision, or event specific to one named project
+- knowledge — durable engineering insight (use domain, not project)
+- preference — how Antoine works / wants things done
+- adaptation — a standing rule or adjustment to behavior
+- episodic — a stakeholder event or milestone worth remembering
+
+DOMAINS for knowledge candidates (required when type=knowledge and project is empty):
+physics, materials, optics, mechanics, manufacturing, metrology,
+controls, software, math, finance, business
+
+TRUST HIERARCHY:
+
+- project-specific: set project to the project id, leave domain empty
+- domain knowledge: set domain, leave project empty
+- events/activity: use project, type=project or episodic
+- one conversation can produce MULTIPLE candidates — emit them all
+
+OUTPUT RULES:
+
+- Each candidate content under 250 characters, stands alone
+- Default confidence 0.5. Raise to 0.7 only for ratified/committed claims.
+- Raw JSON array, no prose, no markdown fences
+- Empty array [] is fine when the conversation has no durable signal
+
+Each element:
+{"type": "project|knowledge|preference|adaptation|episodic", "content": "...", "project": "...", "domain": "", "confidence": 0.5}"""
 
 
 @dataclass