fix(extraction): R11 container 503 + R12 shared prompt module

R11: POST /admin/extract-batch with mode=llm now returns 503 when the claude CLI is unavailable (was silently returning success with 0 candidates), with a message pointing at the host-side script. +2 tests.

R12: extracted SYSTEM_PROMPT + parse_llm_json_array + normalize_candidate_item + build_user_message into stdlib-only src/atocore/memory/_llm_prompt.py. Both the container extractor and scripts/batch_llm_extract_live.py now import from it, eliminating the prompt/parser drift risk.

Tests 297 -> 299.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:47:01 -04:00
parent dc9fdd3a38
commit c2e7064238
6 changed files with 310 additions and 302 deletions
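
The R11 behaviour described in the commit message can be pictured with a small stdlib-only sketch. The container-side handler is not part of the diff shown here, so the function name `llm_extract_precheck`, the `(status, body)` return shape, and the message wording are all hypothetical; only the 503 status and the pointer to the host-side script come from the commit message.

```python
import shutil

# Path taken from the commit message; everything else here is illustrative.
HOST_SCRIPT = "scripts/batch_llm_extract_live.py"


def llm_extract_precheck(cli_name="claude"):
    """Hypothetical guard: report 503 instead of a silent empty success.

    Returns a (status_code, body) tuple so the sketch stays independent
    of whatever web framework the real endpoint uses (an assumption).
    """
    if shutil.which(cli_name) is None:
        return 503, {
            "detail": (
                f"'{cli_name}' CLI is not available in this environment; "
                f"run {HOST_SCRIPT} on the host instead."
            )
        }
    return 200, {"detail": "ok"}
```

A caller would map the tuple onto its framework's own error response, so a missing CLI surfaces as an explicit failure rather than a batch that appears to succeed with 0 candidates.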
--- a/scripts/batch_llm_extract_live.py
+++ b/scripts/batch_llm_extract_live.py
@@ -1,12 +1,15 @@
-"""Host-side LLM batch extraction — pure HTTP client, no atocore imports.
+"""Host-side LLM batch extraction — HTTP client + shared prompt module.

 Fetches interactions from the AtoCore API, runs ``claude -p`` locally
-for each, and POSTs candidates back. Zero dependency on atocore source
-or Python packages — only uses stdlib + the ``claude`` CLI on PATH.
+for each, and POSTs candidates back. Uses stdlib + the ``claude`` CLI
+on PATH, plus the stdlib-only shared prompt/parser module at
+``atocore.memory._llm_prompt`` to eliminate prompt/parser drift
+against the in-container extractor (R12).

 This is necessary because the ``claude`` CLI is on the Dalidou HOST
 but not inside the Docker container, and the host's Python doesn't
-have the container's dependencies (pydantic_settings, etc.).
+have the container's dependencies (pydantic_settings, etc.) — so we
+only import the one stdlib-only module, not the full atocore package.
 """

 from __future__ import annotations
@@ -23,88 +26,26 @@ import urllib.parse
 import urllib.request
 from datetime import datetime, timezone

+# R12: share the prompt + parser with the in-container extractor so
+# the two paths can't drift. The imported module is stdlib-only by
+# design; see src/atocore/memory/_llm_prompt.py.
+_SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
+_SRC_DIR = os.path.abspath(os.path.join(_SCRIPT_DIR, "..", "src"))
+if _SRC_DIR not in sys.path:
+    sys.path.insert(0, _SRC_DIR)
+
+from atocore.memory._llm_prompt import (  # noqa: E402
+    MEMORY_TYPES,
+    SYSTEM_PROMPT,
+    build_user_message,
+    normalize_candidate_item,
+    parse_llm_json_array,
+)
+
 DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://localhost:8100")
 DEFAULT_MODEL = os.environ.get("ATOCORE_LLM_EXTRACTOR_MODEL", "sonnet")
 DEFAULT_TIMEOUT_S = float(os.environ.get("ATOCORE_LLM_EXTRACTOR_TIMEOUT_S", "90"))
-MAX_RESPONSE_CHARS = 8000
-MAX_PROMPT_CHARS = 2000

-MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}
-
-SYSTEM_PROMPT = """You extract memory candidates from LLM conversation turns for a personal context engine called AtoCore.
-
-AtoCore is the brain for Atomaste's engineering work. Known projects:
-p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore,
-abb-space. Unknown project names — still tag them, the system auto-detects.
-
-Your job is to emit SIGNALS that matter for future context. Be aggressive:
-err on the side of capturing useful signal. Triage filters noise downstream.
-
-WHAT TO EMIT (in order of importance):
-
-1. PROJECT ACTIVITY — any mention of a project with context worth remembering:
-   - "Schott quote received for ABB-Space" (event + project)
-   - "Cédric asked about p06 firmware timing" (stakeholder event)
-   - "Still waiting on Zygo lead-time from Nabeel" (blocker status)
-   - "p05 vendor decision needs to happen this week" (action item)
-
-2. DECISIONS AND CHOICES — anything that commits to a direction:
-   - "Going with Zygo Verifire SV for p05" (decision)
-   - "Dropping stitching from primary workflow" (design choice)
-   - "USB SSD mandatory, not SD card" (architectural commitment)
-
-3. DURABLE ENGINEERING INSIGHT — earned knowledge that generalizes:
-   - "CTE gradient dominates WFE at F/1.2" (materials insight)
-   - "Preston model breaks below 5N because contact assumption fails"
-   - "m=1 coma NOT correctable by force modulation" (controls insight)
-   Test: would a competent engineer NEED experience to know this?
-   If it's textbook/google-findable, skip it.
-
-4. STAKEHOLDER AND VENDOR EVENTS:
-   - "Email sent to Nabeel 2026-04-13 asking for lead time"
-   - "Meeting with Jason on Table 7 next Tuesday"
-   - "Starspec wants updated CAD by Friday"
-
-5. PREFERENCES AND ADAPTATIONS that shape how Antoine works:
-   - "Antoine prefers OAuth over API keys"
-   - "Extraction stays off the capture hot path"
-
-WHAT TO SKIP:
-
-- Pure conversational filler ("ok thanks", "let me check")
-- Instructional help content ("run this command", "here's how to...")
-- Obvious textbook facts anyone can google in 30 seconds
-- Session meta-chatter ("let me commit this", "deploy running")
-- Transient system state snapshots ("36 active memories right now")
-
-CANDIDATE TYPES — choose the best fit:
-
-- project — a fact, decision, or event specific to one named project
-- knowledge — durable engineering insight (use domain, not project)
-- preference — how Antoine works / wants things done
-- adaptation — a standing rule or adjustment to behavior
-- episodic — a stakeholder event or milestone worth remembering
-
-DOMAINS for knowledge candidates (required when type=knowledge and project is empty):
-physics, materials, optics, mechanics, manufacturing, metrology,
-controls, software, math, finance, business
-
-TRUST HIERARCHY:
-
-- project-specific: set project to the project id, leave domain empty
-- domain knowledge: set domain, leave project empty
-- events/activity: use project, type=project or episodic
-- one conversation can produce MULTIPLE candidates — emit them all
-
-OUTPUT RULES:
-
-- Each candidate content under 250 characters, stands alone
-- Default confidence 0.5. Raise to 0.7 only for ratified/committed claims.
-- Raw JSON array, no prose, no markdown fences
-- Empty array [] is fine when the conversation has no durable signal
-
-Each element:
-{"type": "project|knowledge|preference|adaptation|episodic", "content": "...", "project": "...", "domain": "", "confidence": 0.5}"""

 _sandbox_cwd = None

@@ -175,14 +116,7 @@ def extract_one(prompt, response, project, model, timeout_s):
    if not shutil.which("claude"):
        return [], "claude_cli_missing"

-    prompt_excerpt = prompt[:MAX_PROMPT_CHARS]
-    response_excerpt = response[:MAX_RESPONSE_CHARS]
-    user_message = (
-        f"PROJECT HINT (may be empty): {project}\n\n"
-        f"USER PROMPT:\n{prompt_excerpt}\n\n"
-        f"ASSISTANT RESPONSE:\n{response_excerpt}\n\n"
-        "Return the JSON array now."
-    )
+    user_message = build_user_message(prompt, response, project)

    args = [
        "claude", "-p",
@@ -211,66 +145,25 @@ def extract_one(prompt, response, project, model, timeout_s):


 def parse_candidates(raw, interaction_project):
-    """Parse model JSON output into candidate dicts."""
-    text = raw.strip()
-    if text.startswith("```"):
-        text = text.strip("`")
-        nl = text.find("\n")
-        if nl >= 0:
-            text = text[nl + 1:]
-        if text.endswith("```"):
-            text = text[:-3]
-        text = text.strip()
-
-    if not text or text == "[]":
-        return []
-
-    if not text.lstrip().startswith("["):
-        start = text.find("[")
-        end = text.rfind("]")
-        if start >= 0 and end > start:
-            text = text[start:end + 1]
-
-    try:
-        parsed = json.loads(text)
-    except json.JSONDecodeError:
-        return []
-
-    if not isinstance(parsed, list):
-        return []
+    """Parse model JSON output into candidate dicts.

+    Stripping + per-item normalization come from the shared
+    ``_llm_prompt`` module. Host-side project attribution: interaction
+    scope wins, otherwise keep the model's tag (the API's own R9
+    registry-check will happen server-side in the container on write;
+    here we preserve the signal instead of dropping it).
+    """
    results = []
-    for item in parsed:
-        if not isinstance(item, dict):
+    for item in parse_llm_json_array(raw):
+        normalized = normalize_candidate_item(item)
+        if normalized is None:
            continue
-        mem_type = str(item.get("type") or "").strip().lower()
-        content = str(item.get("content") or "").strip()
-        model_project = str(item.get("project") or "").strip()
-        domain = str(item.get("domain") or "").strip().lower()
-        # R9 trust hierarchy: interaction scope always wins when set.
-        # For unscoped interactions, keep model's project tag even if
-        # unregistered — the system will detect new projects/leads.
-        if interaction_project:
-            project = interaction_project
-        elif model_project:
-            project = model_project
-        else:
-            project = ""
-        # Domain knowledge: embed tag in content for cross-project retrieval
-        if domain and not project:
-            content = f"[{domain}] {content}"
-        conf = item.get("confidence", 0.5)
-        if mem_type not in MEMORY_TYPES or not content:
-            continue
-        try:
-            conf = max(0.0, min(1.0, float(conf)))
-        except (TypeError, ValueError):
-            conf = 0.5
+        project = interaction_project or normalized["project"] or ""
        results.append({
-            "memory_type": mem_type,
-            "content": content[:1000],
+            "memory_type": normalized["type"],
+            "content": normalized["content"],
            "project": project,
-            "confidence": conf,
+            "confidence": normalized["confidence"],
        })
    return results
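
The contents of src/atocore/memory/_llm_prompt.py are not shown in this part of the diff. As a rough sketch of what the three imported helpers might look like, the block below reassembles them from the lines removed from the script above. Treat it as an assumption about the shared module, not its actual source: in particular, the returned dict keys are inferred from how `parse_candidates` reads them, and the `[domain]` prefix is applied here using the model's own project tag, which may differ from the real module.

```python
from __future__ import annotations

import json

# Limits and type set copied from the constants removed from the script.
MAX_RESPONSE_CHARS = 8000
MAX_PROMPT_CHARS = 2000
MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}
FENCE = "`" * 3  # markdown code-fence marker, built indirectly for readability


def build_user_message(prompt, response, project):
    """Truncate both sides of the turn and wrap them in the extraction request."""
    return (
        f"PROJECT HINT (may be empty): {project}\n\n"
        f"USER PROMPT:\n{prompt[:MAX_PROMPT_CHARS]}\n\n"
        f"ASSISTANT RESPONSE:\n{response[:MAX_RESPONSE_CHARS]}\n\n"
        "Return the JSON array now."
    )


def parse_llm_json_array(raw):
    """Return the JSON list found in the model output, or [] on any failure."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the fence markers and the optional language tag on the first line.
        text = text.strip("`")
        newline = text.find("\n")
        if newline >= 0:
            text = text[newline + 1:]
        text = text.strip()
    if not text.startswith("["):
        # Tolerate prose around the array by slicing from first "[" to last "]".
        start, end = text.find("["), text.rfind("]")
        if start >= 0 and end > start:
            text = text[start:end + 1]
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else []


def normalize_candidate_item(item):
    """Validate one raw item; return a cleaned dict or None if unusable."""
    if not isinstance(item, dict):
        return None
    mem_type = str(item.get("type") or "").strip().lower()
    content = str(item.get("content") or "").strip()
    project = str(item.get("project") or "").strip()
    domain = str(item.get("domain") or "").strip().lower()
    if mem_type not in MEMORY_TYPES or not content:
        return None
    if domain and not project:
        # Assumption: embed the domain tag when the model gave no project.
        content = f"[{domain}] {content}"
    try:
        confidence = max(0.0, min(1.0, float(item.get("confidence", 0.5))))
    except (TypeError, ValueError):
        confidence = 0.5
    return {
        "type": mem_type,
        "content": content[:1000],
        "project": project,
        "domain": domain,
        "confidence": confidence,
    }
```

With helpers of this shape, the host script's `parse_candidates` only has to decide project attribution (interaction scope first, then the model's tag), which matches the loop in the hunk above.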