diff --git a/deploy/dalidou/cron-backup.sh b/deploy/dalidou/cron-backup.sh
index 64570f5..113bd75 100755
--- a/deploy/dalidou/cron-backup.sh
+++ b/deploy/dalidou/cron-backup.sh
@@ -82,6 +82,16 @@ else
     log "Step 3: ATOCORE_BACKUP_RSYNC not set, skipping off-host copy"
 fi
 
+# Step 3b: Auto-refresh vault sources so new PKM files flow in
+# automatically. Fail-open: never blocks the rest of the pipeline.
+log "Step 3b: auto-refresh vault sources"
+REFRESH_RESULT=$(curl -sf -X POST --max-time 600 \
+    "$ATOCORE_URL/ingest/sources" 2>&1) && {
+    log "Sources refresh complete"
+} || {
+    log "WARN: sources refresh failed (non-blocking): $REFRESH_RESULT"
+}
+
 # Step 4: Batch LLM extraction on recent interactions (optional).
 # Runs HOST-SIDE because claude CLI is on the host, not inside the
 # Docker container. The script fetches interactions from the API,
diff --git a/scripts/batch_llm_extract_live.py b/scripts/batch_llm_extract_live.py
index f177796..c129d73 100644
--- a/scripts/batch_llm_extract_live.py
+++ b/scripts/batch_llm_extract_live.py
@@ -31,45 +31,80 @@ MAX_PROMPT_CHARS = 2000
 
 MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}
 
-SYSTEM_PROMPT = """You extract durable memory candidates from LLM conversation turns for a personal context engine called AtoCore.
+SYSTEM_PROMPT = """You extract memory candidates from LLM conversation turns for a personal context engine called AtoCore.
 
-AtoCore stores two kinds of knowledge:
+AtoCore is the brain for Atomaste's engineering work. Known projects:
+p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore,
+abb-space. Unknown project names — still tag them, the system auto-detects.
 
-A. PROJECT-SPECIFIC: applied decisions, constraints, and architecture for a named project. Known projects include p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore, abb-space. If the conversation discusses a project NOT in this list, still tag it with the project name you identify — the system will auto-detect it as a new project or lead.
+Your job is to emit SIGNALS that matter for future context. Be aggressive:
+err on the side of capturing useful signal. Triage filters noise downstream.
 
-B. DOMAIN KNOWLEDGE: generalizable engineering insight that was EARNED through project work and is reusable across projects. Tag these with a domain instead of a project.
+WHAT TO EMIT (in order of importance):
 
-THE CRITICAL BAR FOR DOMAIN KNOWLEDGE:
-Only extract insight that took real effort to discover. The test: "Would a competent engineer need experience to know this, or could they find it in 30 seconds on Google?" If they can look it up, do NOT extract it.
+1. PROJECT ACTIVITY — any mention of a project with context worth remembering:
+   - "Schott quote received for ABB-Space" (event + project)
+   - "Cédric asked about p06 firmware timing" (stakeholder event)
+   - "Still waiting on Zygo lead-time from Nabeel" (blocker status)
+   - "p05 vendor decision needs to happen this week" (action item)
 
-EXTRACT (earned insight):
-- "At F/1.2, Zerodur CTE gradient across the blank is the second-largest WFE contributor after gravity sag"
-- "Preston removal rate model breaks down below 5N applied force because the contact assumption fails"
-- "For swing-arm polishing, m=1 (coma) is NOT correctable by force modulation (score 0.09)"
+2. DECISIONS AND CHOICES — anything that commits to a direction:
+   - "Going with Zygo Verifire SV for p05" (decision)
+   - "Dropping stitching from primary workflow" (design choice)
+   - "USB SSD mandatory, not SD card" (architectural commitment)
 
-DO NOT EXTRACT (common knowledge):
-- "Zerodur CTE is 0.05 ppm/K" (textbook value)
-- "FEA uses finite elements to discretize continuous domains" (definition)
-- "Python is a programming language" (obvious)
+3. DURABLE ENGINEERING INSIGHT — earned knowledge that generalizes:
+   - "CTE gradient dominates WFE at F/1.2" (materials insight)
+   - "Preston model breaks below 5N because contact assumption fails"
+   - "m=1 coma NOT correctable by force modulation" (controls insight)
+   Test: would a competent engineer NEED experience to know this?
+   If it's textbook/google-findable, skip it.
 
-Rules:
+4. STAKEHOLDER AND VENDOR EVENTS:
+   - "Email sent to Nabeel 2026-04-13 asking for lead time"
+   - "Meeting with Jason on Table 7 next Tuesday"
+   - "Starspec wants updated CAD by Friday"
 
-1. Only surface durable claims. Skip transient status, instructional guidance, troubleshooting, ephemeral recommendations, session recaps.
-2. A candidate is durable when a reader coming back in two weeks would still need to know it.
-3. Each candidate must stand alone in one sentence under 200 characters.
-4. Type must be one of: project, knowledge, preference, adaptation.
-5. For project-specific claims, set ``project`` to the project id.
-6. For generalizable domain insight, set ``project`` to empty and set ``domain`` to one of: physics, materials, optics, mechanics, manufacturing, metrology, controls, software, math, finance.
-7. When one conversation produces BOTH a project-specific fact AND a generalizable principle, emit BOTH as separate candidates.
-8. Return [] on most turns. The bar is high. Empty is correct and expected.
-9. Confidence 0.5 default. Raise to 0.6 only for unambiguous committed claims.
-10. Output a raw JSON array only. No prose, no markdown fences.
+5. PREFERENCES AND ADAPTATIONS that shape how Antoine works:
+   - "Antoine prefers OAuth over API keys"
+   - "Extraction stays off the capture hot path"
 
-Each array element:
+WHAT TO SKIP:
 
-{"type": "project|knowledge|preference|adaptation", "content": "...", "project": "...", "domain": "", "confidence": 0.5}
+- Pure conversational filler ("ok thanks", "let me check")
+- Instructional help content ("run this command", "here's how to...")
+- Obvious textbook facts anyone can google in 30 seconds
+- Session meta-chatter ("let me commit this", "deploy running")
+- Transient system state snapshots ("36 active memories right now")
 
-Use ``project`` for project-scoped candidates. Use ``domain`` for cross-project knowledge. Never set both."""
+CANDIDATE TYPES — choose the best fit:
+
+- project — a fact, decision, or event specific to one named project
+- knowledge — durable engineering insight (use domain, not project)
+- preference — how Antoine works / wants things done
+- adaptation — a standing rule or adjustment to behavior
+- episodic — a stakeholder event or milestone worth remembering
+
+DOMAINS for knowledge candidates (required when type=knowledge and project is empty):
+physics, materials, optics, mechanics, manufacturing, metrology,
+controls, software, math, finance, business
+
+TRUST HIERARCHY:
+
+- project-specific: set project to the project id, leave domain empty
+- domain knowledge: set domain, leave project empty
+- events/activity: use project, type=project or episodic
+- one conversation can produce MULTIPLE candidates — emit them all
+
+OUTPUT RULES:
+
+- Each candidate content under 250 characters, stands alone
+- Default confidence 0.5. Raise to 0.7 only for ratified/committed claims.
+- Raw JSON array, no prose, no markdown fences
+- Empty array [] is fine when the conversation has no durable signal
+
+Each element:
+{"type": "project|knowledge|preference|adaptation|episodic", "content": "...", "project": "...", "domain": "", "confidence": 0.5}"""
 
 
 _sandbox_cwd = None
diff --git a/src/atocore/memory/extractor_llm.py b/src/atocore/memory/extractor_llm.py
index 19417cf..acbb3a6 100644
--- a/src/atocore/memory/extractor_llm.py
+++ b/src/atocore/memory/extractor_llm.py
@@ -64,52 +64,86 @@ from atocore.observability.logger import get_logger
 
 log = get_logger("extractor_llm")
 
-LLM_EXTRACTOR_VERSION = "llm-0.3.0"
+LLM_EXTRACTOR_VERSION = "llm-0.4.0"
 DEFAULT_MODEL = os.environ.get("ATOCORE_LLM_EXTRACTOR_MODEL", "sonnet")
 DEFAULT_TIMEOUT_S = float(os.environ.get("ATOCORE_LLM_EXTRACTOR_TIMEOUT_S", "90"))
 MAX_RESPONSE_CHARS = 8000
 MAX_PROMPT_CHARS = 2000
 
-_SYSTEM_PROMPT = """You extract durable memory candidates from LLM conversation turns for a personal context engine called AtoCore.
+_SYSTEM_PROMPT = """You extract memory candidates from LLM conversation turns for a personal context engine called AtoCore.
 
-AtoCore stores two kinds of knowledge:
+AtoCore is the brain for Atomaste's engineering work. Known projects:
+p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore,
+abb-space. Unknown project names — still tag them, the system auto-detects.
 
-A. PROJECT-SPECIFIC: applied decisions, constraints, and architecture for a named project. Known projects include p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore, abb-space. If the conversation discusses a project NOT in this list, still tag it with the project name you identify — the system will auto-detect it as a new project or lead.
+Your job is to emit SIGNALS that matter for future context. Be aggressive:
+err on the side of capturing useful signal. Triage filters noise downstream.
 
-B. DOMAIN KNOWLEDGE: generalizable engineering insight that was EARNED through project work and is reusable across projects. Tag these with a domain instead of a project.
+WHAT TO EMIT (in order of importance):
 
-THE CRITICAL BAR FOR DOMAIN KNOWLEDGE:
-Only extract insight that took real effort to discover. The test: "Would a competent engineer need experience to know this, or could they find it in 30 seconds on Google?" If they can look it up, do NOT extract it.
+1. PROJECT ACTIVITY — any mention of a project with context worth remembering:
+   - "Schott quote received for ABB-Space" (event + project)
+   - "Cédric asked about p06 firmware timing" (stakeholder event)
+   - "Still waiting on Zygo lead-time from Nabeel" (blocker status)
+   - "p05 vendor decision needs to happen this week" (action item)
 
-EXTRACT (earned insight):
-- "At F/1.2, Zerodur CTE gradient across the blank is the second-largest WFE contributor after gravity sag — costs ~3nm and drove the support pad layout"
-- "Preston removal rate model breaks down below 5N applied force because the contact assumption fails"
-- "For swing-arm polishing, m=1 (coma) is NOT correctable by force modulation (score 0.09) — only m=2 and m=3 work"
+2. DECISIONS AND CHOICES — anything that commits to a direction:
+   - "Going with Zygo Verifire SV for p05" (decision)
+   - "Dropping stitching from primary workflow" (design choice)
+   - "USB SSD mandatory, not SD card" (architectural commitment)
 
-DO NOT EXTRACT (common knowledge):
-- "Zerodur CTE is 0.05 ppm/K" (textbook value)
-- "FEA uses finite elements to discretize continuous domains" (definition)
-- "Python is a programming language" (obvious)
-- "git commit saves changes" (basic tool knowledge)
+3. DURABLE ENGINEERING INSIGHT — earned knowledge that generalizes:
+   - "CTE gradient dominates WFE at F/1.2" (materials insight)
+   - "Preston model breaks below 5N because contact assumption fails"
+   - "m=1 coma NOT correctable by force modulation" (controls insight)
+   Test: would a competent engineer NEED experience to know this?
+   If it's textbook/google-findable, skip it.
 
-Rules:
+4. STAKEHOLDER AND VENDOR EVENTS:
+   - "Email sent to Nabeel 2026-04-13 asking for lead time"
+   - "Meeting with Jason on Table 7 next Tuesday"
+   - "Starspec wants updated CAD by Friday"
 
-1. Only surface durable claims. Skip transient status, instructional guidance, troubleshooting tactics, ephemeral recommendations, and session recaps.
-2. A candidate is durable when a reader coming back in two weeks would still need to know it.
-3. Each candidate must stand alone in one sentence under 200 characters.
-4. Type must be one of: project, knowledge, preference, adaptation.
-5. For project-specific claims, set ``project`` to the project id.
-6. For generalizable domain insight, set ``project`` to empty and set ``domain`` to one of: physics, materials, optics, mechanics, manufacturing, metrology, controls, software, math, finance.
-7. When one conversation produces BOTH a project-specific fact AND a generalizable principle, emit BOTH as separate candidates.
-8. Return [] on most turns. The bar is high. Empty is correct and expected.
-9. Confidence 0.5 default. Raise to 0.6 only for unambiguous committed claims.
-10. Output a raw JSON array only. No prose, no markdown fences.
+5. PREFERENCES AND ADAPTATIONS that shape how Antoine works:
+   - "Antoine prefers OAuth over API keys"
+   - "Extraction stays off the capture hot path"
 
-Each array element:
+WHAT TO SKIP:
 
-{"type": "project|knowledge|preference|adaptation", "content": "...", "project": "...", "domain": "", "confidence": 0.5}
+- Pure conversational filler ("ok thanks", "let me check")
+- Instructional help content ("run this command", "here's how to...")
+- Obvious textbook facts anyone can google in 30 seconds
+- Session meta-chatter ("let me commit this", "deploy running")
+- Transient system state snapshots ("36 active memories right now")
 
-Use ``project`` for project-scoped candidates. Use ``domain`` for cross-project knowledge. Never set both."""
+CANDIDATE TYPES — choose the best fit:
+
+- project — a fact, decision, or event specific to one named project
+- knowledge — durable engineering insight (use domain, not project)
+- preference — how Antoine works / wants things done
+- adaptation — a standing rule or adjustment to behavior
+- episodic — a stakeholder event or milestone worth remembering
+
+DOMAINS for knowledge candidates (required when type=knowledge and project is empty):
+physics, materials, optics, mechanics, manufacturing, metrology,
+controls, software, math, finance, business
+
+TRUST HIERARCHY:
+
+- project-specific: set project to the project id, leave domain empty
+- domain knowledge: set domain, leave project empty
+- events/activity: use project, type=project or episodic
+- one conversation can produce MULTIPLE candidates — emit them all
+
+OUTPUT RULES:
+
+- Each candidate content under 250 characters, stands alone
+- Default confidence 0.5. Raise to 0.7 only for ratified/committed claims.
+- Raw JSON array, no prose, no markdown fences
+- Empty array [] is fine when the conversation has no durable signal
+
+Each element:
+{"type": "project|knowledge|preference|adaptation|episodic", "content": "...", "project": "...", "domain": "", "confidence": 0.5}"""
 
 
 @dataclass