feat: dual-layer knowledge extraction + domain knowledge band
The extraction system now produces two kinds of candidates from
the same conversation:
A. PROJECT-SPECIFIC: applied facts scoped to a named project
(unchanged behavior)
B. DOMAIN KNOWLEDGE: generalizable engineering insight earned
through project work, tagged with a domain (physics, materials,
optics, mechanics, manufacturing, metrology, controls, software,
math, finance) and stored with project="" so it surfaces across
all projects.
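For illustration only, one turn can yield both kinds as separate candidates. The project-specific fact below is invented for the example; the domain insight is one of the prompt's own examples:

```python
# Illustration only: one conversation turn yielding both candidate kinds.
# The "project" entry is a hypothetical fact, not real project state.
candidates = [
    {"type": "project",
     "content": "p06-polisher pins the swing-arm dwell model to the latest calibration run.",
     "project": "p06-polisher", "domain": "", "confidence": 0.5},
    {"type": "knowledge",
     "content": "Preston removal rate model breaks down below 5N applied force because the contact assumption fails.",
     "project": "", "domain": "manufacturing", "confidence": 0.6},
]
```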
Critical quality bar enforced in the system prompt: "Would a
competent engineer need experience to know this, or could they
find it in 30 seconds on Google?" Textbook values, definitions,
and obvious facts are explicitly excluded. Only hard-won insight
qualifies — the kind that takes weeks of FEA or real machining
experience to discover.
Domain tags are embedded in the content as a prefix ("[physics]",
"[materials]"), so no schema migration is needed. A future dedicated
column can be backfilled by parsing the prefix back out.
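A sketch of what that future parse could look like. `split_domain` is a hypothetical helper, not part of this commit; it recovers the tag only for the ten known domains:

```python
import re

# Hypothetical helper: recover the "[domain] " prefix that the extractor
# embeds in stored content. A future schema column could be backfilled by
# running this over existing rows. Only the ten known domains match.
_DOMAIN_TAG = re.compile(
    r"^\[(physics|materials|optics|mechanics|manufacturing"
    r"|metrology|controls|software|math|finance)\]\s+"
)

def split_domain(content: str) -> tuple[str, str]:
    """Return (domain, bare_content); domain is "" when no tag is present."""
    m = _DOMAIN_TAG.match(content)
    if not m:
        return "", content
    return m.group(1), content[m.end():]
```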
Context builder gains a new tier between project memories and
retrieved chunks:
Tier 1: Trusted Project State (project-specific)
Tier 2: Identity / Preferences (global)
Tier 3: Project Memories (project-specific)
Tier 4: Domain Knowledge (NEW) (cross-project, 10% budget)
Tier 5: Retrieved Chunks (project-boosted)
Trim order: chunks -> domain knowledge -> project memories ->
identity/preference -> project state.
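A minimal sketch of this trim policy, assuming whole-entry trimming against a token budget. The tier names, dict shape, and `len`-based cost function are illustrative, not the actual context builder:

```python
# Illustrative sketch: trim tiers whole-entry, lowest priority first,
# until the assembled context fits the budget. Not the shipped builder.
TRIM_ORDER = ["chunks", "domain_knowledge", "project_memories",
              "identity_preference", "project_state"]

def trim_to_budget(tiers: dict[str, list[str]], budget: int, cost=len) -> dict[str, list[str]]:
    tiers = {k: list(v) for k, v in tiers.items()}  # copy, don't mutate input
    total = sum(cost(e) for v in tiers.values() for e in v)
    for name in TRIM_ORDER:
        while total > budget and tiers.get(name):
            total -= cost(tiers[name].pop())  # drop from the cheapest tier first
        if total <= budget:
            break
    return tiers
```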
Host-side extraction script updated with the same prompt and
domain-tag handling.
LLM_EXTRACTOR_VERSION bumped to llm-0.3.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -33,24 +33,43 @@ MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "a
 SYSTEM_PROMPT = """You extract durable memory candidates from LLM conversation turns for a personal context engine called AtoCore.
 
 Your job is to read one user prompt plus the assistant's response and decide which durable facts, decisions, preferences, architectural rules, or project invariants should be remembered across future sessions.
 
+AtoCore stores two kinds of knowledge:
+
+A. PROJECT-SPECIFIC: applied decisions, constraints, and architecture for a named project (p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore). These stay scoped to one project.
+
+B. DOMAIN KNOWLEDGE: generalizable engineering insight that was EARNED through project work and is reusable across projects. Tag these with a domain instead of a project.
+
+THE CRITICAL BAR FOR DOMAIN KNOWLEDGE:
+Only extract insight that took real effort to discover. The test: "Would a competent engineer need experience to know this, or could they find it in 30 seconds on Google?" If they can look it up, do NOT extract it.
+
+EXTRACT (earned insight):
+- "At F/1.2, Zerodur CTE gradient across the blank is the second-largest WFE contributor after gravity sag"
+- "Preston removal rate model breaks down below 5N applied force because the contact assumption fails"
+- "For swing-arm polishing, m=1 (coma) is NOT correctable by force modulation (score 0.09)"
+
+DO NOT EXTRACT (common knowledge):
+- "Zerodur CTE is 0.05 ppm/K" (textbook value)
+- "FEA uses finite elements to discretize continuous domains" (definition)
+- "Python is a programming language" (obvious)
+
 Rules:
 
-1. Only surface durable claims. Skip transient status ("deploy is still running"), instructional guidance ("here is how to run the command"), troubleshooting tactics, ephemeral recommendations ("merge this PR now"), and session recaps.
-2. A candidate is durable when a reader coming back in two weeks would still need to know it. Architectural choices, named rules, ratified decisions, invariants, procurement commitments, and project-level constraints qualify. Conversational fillers and step-by-step instructions do not.
-3. Each candidate must stand alone. Rewrite the claim in one sentence under 200 characters with enough context that a reader without the conversation understands it.
-4. Each candidate must have a type from this closed set: project, knowledge, preference, adaptation.
-5. If the conversation is clearly scoped to a project (p04-gigabit, p05-interferometer, p06-polisher, atocore), set ``project`` to that id. Otherwise leave ``project`` empty.
-6. If the response makes no durable claim, return an empty list. It is correct and expected to return [] on most conversational turns.
-7. Confidence should be 0.5 by default so human review workload is honest. Raise to 0.6 only when the response states the claim in an unambiguous, committed form (e.g. "the decision is X", "the selected approach is Y", "X is non-negotiable").
-8. Output must be a raw JSON array and nothing else. No prose before or after. No markdown fences. No explanations.
+1. Only surface durable claims. Skip transient status, instructional guidance, troubleshooting, ephemeral recommendations, session recaps.
+2. A candidate is durable when a reader coming back in two weeks would still need to know it.
+3. Each candidate must stand alone in one sentence under 200 characters.
+4. Type must be one of: project, knowledge, preference, adaptation.
+5. For project-specific claims, set ``project`` to the project id.
+6. For generalizable domain insight, set ``project`` to empty and set ``domain`` to one of: physics, materials, optics, mechanics, manufacturing, metrology, controls, software, math, finance.
+7. When one conversation produces BOTH a project-specific fact AND a generalizable principle, emit BOTH as separate candidates.
+8. Return [] on most turns. The bar is high. Empty is correct and expected.
+9. Confidence 0.5 default. Raise to 0.6 only for unambiguous committed claims.
+10. Output a raw JSON array only. No prose, no markdown fences.
 
-Each array element has exactly this shape:
+Each array element:
 
-{"type": "project|knowledge|preference|adaptation", "content": "...", "project": "...", "confidence": 0.5}
+{"type": "project|knowledge|preference|adaptation", "content": "...", "project": "...", "domain": "", "confidence": 0.5}
 
-Return [] when there is nothing to extract."""
+Use ``project`` for project-scoped candidates. Use ``domain`` for cross-project knowledge. Never set both."""
 
 _sandbox_cwd = None
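The "raw JSON array and nothing else" contract is worth enforcing defensively on the host side, since models occasionally wrap output in prose or fences anyway. A minimal sketch; `parse_raw_array` is a hypothetical helper, not from this commit:

```python
import json

# Sketch: accept only a bare JSON array, per the prompt's output contract.
# Prose, markdown fences, or a top-level object all collapse to [].
def parse_raw_array(raw: str) -> list:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # not valid JSON at all (e.g. fenced or prose-wrapped)
    return data if isinstance(data, list) else []
```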
@@ -192,14 +211,17 @@ def parse_candidates(raw, interaction_project):
         mem_type = str(item.get("type") or "").strip().lower()
         content = str(item.get("content") or "").strip()
         model_project = str(item.get("project") or "").strip()
+        domain = str(item.get("domain") or "").strip().lower()
         # R9 trust hierarchy: interaction scope always wins when set.
         # Model project only used for unscoped interactions + registered check.
         if interaction_project:
             project = interaction_project
         elif model_project and model_project in _known_projects:
             project = model_project
         else:
             project = ""
+        # Domain knowledge: embed tag in content for cross-project retrieval
+        if domain and not project:
+            content = f"[{domain}] {content}"
         conf = item.get("confidence", 0.5)
         if mem_type not in MEMORY_TYPES or not content:
             continue
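The scope rules above can be exercised in isolation. A standalone re-sketch for illustration, not the shipped function; `known_projects` stands in for the module-level `_known_projects` registry:

```python
# Illustrative re-sketch of the scope + domain-tag logic from parse_candidates.
def resolve_scope(interaction_project, model_project, domain, content, known_projects):
    if interaction_project:
        project = interaction_project          # interaction scope always wins
    elif model_project and model_project in known_projects:
        project = model_project                # model's claim, if registered
    else:
        project = ""
    if domain and not project:                 # tag only unscoped domain knowledge
        content = f"[{domain}] {content}"
    return project, content
```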