feat: Karpathy-inspired upgrades — contradiction, lint, synthesis

Three additive upgrades borrowed from Karpathy's LLM Wiki pattern: 1. CONTRADICTION DETECTION: auto-triage now has a fourth verdict — "contradicts". When a candidate conflicts with an existing memory (not duplicates, genuine disagreement like "Option A selected" vs "Option B selected"), the triage model flags it and leaves it in the queue for human review instead of silently rejecting or double-storing. Preserves source tension rather than suppressing it. 2. WEEKLY LINT PASS: scripts/lint_knowledge_base.py checks for: - Orphan memories (active but zero references after 14 days) - Stale candidates (>7 days unreviewed) - Unused entities (no relationships) - Empty-state projects - Unregistered projects auto-detected in memories Runs Sundays via the cron. Outputs a report. 3. WEEKLY SYNTHESIS: scripts/synthesize_projects.py uses sonnet to generate a 3-5 sentence "current state" paragraph per project from state + memories + entities. Cached in project_state under status/synthesis_cache. Wiki project pages now show this at the top under "Current State (auto-synthesis)". Falls back to a deterministic summary if no cache exists. deploy/dalidou/batch-extract.sh: added Step C (synthesis) and Step D (lint) gated to Sundays via date check. All additive — nothing existing changes behavior. The database remains the source of truth; these operations just produce better synthesized views and catch rot. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 21:08:13 -04:00
parent 761c483474
commit c1f5b3bdee
5 changed files with 421 additions and 5 deletions
--- a/deploy/dalidou/batch-extract.sh
+++ b/deploy/dalidou/batch-extract.sh
@@ -51,4 +51,19 @@ python3 "$APP_DIR/scripts/auto_triage.py" \
    log "WARN: auto-triage failed (non-blocking)"
 }

+# Step C: Weekly synthesis (Sundays only)
+if [[ "$(date -u +%u)" == "7" ]]; then
+    log "Step C: weekly project synthesis"
+    python3 "$APP_DIR/scripts/synthesize_projects.py" \
+        --base-url "$ATOCORE_URL" \
+        2>&1 || {
+        log "WARN: synthesis failed (non-blocking)"
+    }
+
+    log "Step D: weekly lint pass"
+    python3 "$APP_DIR/scripts/lint_knowledge_base.py" \
+        --base-url "$ATOCORE_URL" \
+        2>&1 || true
+fi
+
 log "=== AtoCore batch extraction + triage complete ==="
--- a/scripts/auto_triage.py
+++ b/scripts/auto_triage.py
@@ -47,7 +47,7 @@ You will receive:

 For each candidate, output exactly one JSON object:

-{"verdict": "promote|reject|needs_human", "confidence": 0.0-1.0, "reason": "one sentence"}
+{"verdict": "promote|reject|needs_human|contradicts", "confidence": 0.0-1.0, "reason": "one sentence", "conflicts_with": "id of existing memory if contradicts"}

 Rules:

@@ -61,9 +61,11 @@ Rules:
   - A session observation or conversational filler
   - A process rule that belongs in DEV-LEDGER.md or AGENTS.md, not memory

-3. NEEDS_HUMAN when you're genuinely unsure — the candidate might be valuable but you can't tell without domain knowledge. This should be rare (< 20% of candidates).
+3. CONTRADICTS when the candidate *conflicts* with an existing active memory (not a duplicate, but states something that can't both be true). Set `conflicts_with` to the existing memory id. This flags the tension for human review instead of silently rejecting or double-storing. Examples: "Option A selected" vs "Option B selected" for the same decision; "uses material X" vs "uses material Y" for the same component.

-4. Output ONLY the JSON object. No prose, no markdown, no explanation outside the reason field."""
+4. NEEDS_HUMAN when you're genuinely unsure — the candidate might be valuable but you can't tell without domain knowledge. This should be rare (< 20% of candidates).
+
+5. Output ONLY the JSON object. No prose, no markdown, no explanation outside the reason field."""

 _sandbox_cwd = None

@@ -169,7 +171,7 @@ def parse_verdict(raw):
        return {"verdict": "needs_human", "confidence": 0.0, "reason": "failed to parse triage output"}

    verdict = str(parsed.get("verdict", "needs_human")).strip().lower()
-    if verdict not in {"promote", "reject", "needs_human"}:
+    if verdict not in {"promote", "reject", "needs_human", "contradicts"}:
        verdict = "needs_human"

    confidence = parsed.get("confidence", 0.5)
@@ -179,7 +181,13 @@ def parse_verdict(raw):
        confidence = 0.5

    reason = str(parsed.get("reason", "")).strip()[:200]
-    return {"verdict": verdict, "confidence": confidence, "reason": reason}
+    conflicts_with = str(parsed.get("conflicts_with", "")).strip()
+    return {
+        "verdict": verdict,
+        "confidence": confidence,
+        "reason": reason,
+        "conflicts_with": conflicts_with,
+    }


 def main():
@@ -211,6 +219,7 @@ def main():
        verdict = verdict_obj["verdict"]
        conf = verdict_obj["confidence"]
        reason = verdict_obj["reason"]
+        conflicts_with = verdict_obj.get("conflicts_with", "")

        mid = cand["id"]
        label = f"[{i:2d}/{len(candidates)}] {mid[:8]} [{cand['memory_type']}]"
@@ -236,6 +245,13 @@ def main():
                except Exception:
                    errors += 1
            rejected += 1
+        elif verdict == "contradicts":
+            # Leave candidate in queue but flag the conflict in content
+            # so the wiki/triage shows it. This is conservative: we
+            # don't silently merge or reject when sources disagree.
+            print(f"  CONTRADICTS    {label}  vs {conflicts_with[:8] if conflicts_with else '?'}  {reason}")
+            contradicts_count = locals().get('contradicts_count', 0) + 1
+            needs_human += 1
        else:
            print(f"  NEEDS_HUMAN    {label}  conf={conf:.2f}  {reason}")
            needs_human += 1
--- a/scripts/lint_knowledge_base.py
+++ b/scripts/lint_knowledge_base.py
@@ -0,0 +1,170 @@
+"""Weekly lint pass — health check for the AtoCore knowledge base.
+
+Inspired by Karpathy's LLM Wiki pattern (the 'lint' operation).
+Checks for orphans, stale claims, contradictions, and gaps.
+Outputs a report that can be posted to the wiki as needs_review.
+
+Usage:
+  python3 scripts/lint_knowledge_base.py --base-url http://dalidou:8100
+
+Run weekly via cron, or on-demand when the knowledge base feels stale.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import urllib.request
+from datetime import datetime, timezone, timedelta
+
+DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://localhost:8100")
+ORPHAN_AGE_DAYS = 14
+
+
+def api_get(base_url: str, path: str):
+    with urllib.request.urlopen(f"{base_url}{path}", timeout=15) as r:
+        return json.loads(r.read())
+
+
+def parse_ts(ts: str) -> datetime | None:
+    if not ts:
+        return None
+    try:
+        return datetime.strptime(ts[:19], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
+    except Exception:
+        return None
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
+    args = parser.parse_args()
+    b = args.base_url
+    now = datetime.now(timezone.utc)
+    orphan_threshold = now - timedelta(days=ORPHAN_AGE_DAYS)
+
+    print(f"=== AtoCore Lint — {now.strftime('%Y-%m-%d %H:%M UTC')} ===\n")
+
+    findings = {
+        "orphan_memories": [],
+        "stale_candidates": [],
+        "unused_entities": [],
+        "empty_state_projects": [],
+        "unregistered_projects": [],
+    }
+
+    # 1. Orphan memories: active but never reinforced after N days
+    memories = api_get(b, "/memory?active_only=true&limit=500").get("memories", [])
+    for m in memories:
+        updated = parse_ts(m.get("updated_at", ""))
+        if m.get("reference_count", 0) == 0 and updated and updated < orphan_threshold:
+            findings["orphan_memories"].append({
+                "id": m["id"],
+                "type": m["memory_type"],
+                "project": m.get("project") or "(none)",
+                "age_days": (now - updated).days,
+                "content": m["content"][:120],
+            })
+
+    # 2. Stale candidates: been in queue > 7 days without triage
+    candidates = api_get(b, "/memory?status=candidate&limit=500").get("memories", [])
+    stale_threshold = now - timedelta(days=7)
+    for c in candidates:
+        updated = parse_ts(c.get("updated_at", ""))
+        if updated and updated < stale_threshold:
+            findings["stale_candidates"].append({
+                "id": c["id"],
+                "age_days": (now - updated).days,
+                "content": c["content"][:120],
+            })
+
+    # 3. Unused entities: no relationships in either direction
+    entities = api_get(b, "/entities?limit=500").get("entities", [])
+    for e in entities:
+        try:
+            detail = api_get(b, f"/entities/{e['id']}")
+            if not detail.get("relationships"):
+                findings["unused_entities"].append({
+                    "id": e["id"],
+                    "type": e["entity_type"],
+                    "name": e["name"],
+                    "project": e.get("project") or "(none)",
+                })
+        except Exception:
+            pass
+
+    # 4. Registered projects with no state entries
+    try:
+        projects = api_get(b, "/projects").get("projects", [])
+        for p in projects:
+            state = api_get(b, f"/project/state/{p['id']}").get("entries", [])
+            if not state:
+                findings["empty_state_projects"].append(p["id"])
+    except Exception:
+        pass
+
+    # 5. Memories tagged to unregistered projects (auto-detection candidates)
+    registered_ids = {p["id"] for p in projects} | {
+        a for p in projects for a in p.get("aliases", [])
+    }
+    all_mems = api_get(b, "/memory?limit=500").get("memories", [])
+    for m in all_mems:
+        proj = m.get("project", "")
+        if proj and proj not in registered_ids and proj != "(none)":
+            if proj not in findings["unregistered_projects"]:
+                findings["unregistered_projects"].append(proj)
+
+    # Print report
+    print(f"## Orphan memories (active, no reinforcement, >{ORPHAN_AGE_DAYS} days old)")
+    if findings["orphan_memories"]:
+        print(f"  Found: {len(findings['orphan_memories'])}")
+        for o in findings["orphan_memories"][:10]:
+            print(f"  - [{o['type']}] {o['project']} ({o['age_days']}d): {o['content']}")
+    else:
+        print("  (none)")
+
+    print(f"\n## Stale candidates (>7 days in queue)")
+    if findings["stale_candidates"]:
+        print(f"  Found: {len(findings['stale_candidates'])}")
+        for s in findings["stale_candidates"][:10]:
+            print(f"  - ({s['age_days']}d): {s['content']}")
+    else:
+        print("  (none)")
+
+    print(f"\n## Unused entities (no relationships)")
+    if findings["unused_entities"]:
+        print(f"  Found: {len(findings['unused_entities'])}")
+        for u in findings["unused_entities"][:10]:
+            print(f"  - [{u['type']}] {u['project']}: {u['name']}")
+    else:
+        print("  (none)")
+
+    print(f"\n## Empty-state projects")
+    if findings["empty_state_projects"]:
+        print(f"  Found: {len(findings['empty_state_projects'])}")
+        for p in findings["empty_state_projects"]:
+            print(f"  - {p}")
+    else:
+        print("  (none)")
+
+    print(f"\n## Unregistered projects detected in memories")
+    if findings["unregistered_projects"]:
+        print(f"  Found: {len(findings['unregistered_projects'])}")
+        print("  These were auto-detected by extraction — consider registering them:")
+        for p in findings["unregistered_projects"]:
+            print(f"  - {p}")
+    else:
+        print("  (none)")
+
+    total_findings = sum(
+        len(v) if isinstance(v, list) else 0 for v in findings.values()
+    )
+    print(f"\n=== Total findings: {total_findings} ===")
+
+    # Return exit code based on findings count (for CI)
+    return 0 if total_findings == 0 else 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/scripts/synthesize_projects.py
+++ b/scripts/synthesize_projects.py
@@ -0,0 +1,168 @@
+"""Weekly project synthesis — LLM-generated 'current state' paragraph per project.
+
+Reads each registered project's state entries, memories, and entities,
+asks sonnet for a 3-5 sentence synthesis, and caches it under
+project_state/status/synthesis_cache. The wiki's project page reads
+this cached synthesis as the top band.
+
+Runs weekly via cron (or manually). Cheap — one LLM call per project.
+
+Usage:
+  python3 scripts/synthesize_projects.py --base-url http://localhost:8100
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import shutil
+import subprocess
+import tempfile
+import urllib.request
+
+DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://localhost:8100")
+DEFAULT_MODEL = os.environ.get("ATOCORE_SYNTHESIS_MODEL", "sonnet")
+TIMEOUT_S = 60
+
+SYSTEM_PROMPT = """You are summarizing the current state of an engineering project for a personal context engine called AtoCore.
+
+You will receive:
+- Project state entries (decisions, requirements, status)
+- Active memories tagged to this project
+- Entity graph (subsystems, components, materials, decisions)
+
+Write a 3-5 sentence synthesis covering:
+1. What the project is and its current stage
+2. The key locked-in decisions and architecture
+3. What the next focus is
+
+Rules:
+- Plain prose, no bullet lists
+- Factual, grounded in what the data says — don't invent or speculate
+- Present tense
+- Under 500 characters total
+- No markdown formatting, just prose
+- If the data is sparse, say so honestly ("limited project data available")
+
+Output ONLY the synthesis paragraph. No preamble, no JSON, no markdown headers."""
+
+
+_cwd = None
+
+
+def get_cwd():
+    global _cwd
+    if _cwd is None:
+        _cwd = tempfile.mkdtemp(prefix="ato-synth-")
+    return _cwd
+
+
+def api_get(base_url, path):
+    with urllib.request.urlopen(f"{base_url}{path}", timeout=15) as r:
+        return json.loads(r.read())
+
+
+def api_post(base_url, path, body):
+    data = json.dumps(body).encode("utf-8")
+    req = urllib.request.Request(
+        f"{base_url}{path}", method="POST",
+        headers={"Content-Type": "application/json"}, data=data,
+    )
+    with urllib.request.urlopen(req, timeout=15) as r:
+        return json.loads(r.read())
+
+
+def synthesize_project(base_url, project_id, model):
+    # Gather context
+    state = api_get(base_url, f"/project/state/{project_id}").get("entries", [])
+    memories = api_get(base_url, f"/memory?project={project_id}&active_only=true&limit=20").get("memories", [])
+    entities = api_get(base_url, f"/entities?project={project_id}&limit=50").get("entities", [])
+
+    if not (state or memories or entities):
+        return None
+
+    lines = [f"PROJECT: {project_id}\n"]
+    if state:
+        lines.append("STATE ENTRIES:")
+        for e in state[:15]:
+            if e.get("key") == "synthesis_cache":
+                continue
+            lines.append(f"  [{e['category']}] {e['key']}: {e['value'][:200]}")
+
+    if memories:
+        lines.append("\nACTIVE MEMORIES:")
+        for m in memories[:10]:
+            lines.append(f"  [{m['memory_type']}] {m['content'][:200]}")
+
+    if entities:
+        lines.append("\nENTITIES:")
+        by_type = {}
+        for e in entities:
+            by_type.setdefault(e["entity_type"], []).append(e["name"])
+        for t, names in by_type.items():
+            lines.append(f"  {t}: {', '.join(names[:8])}")
+
+    user_msg = "\n".join(lines) + "\n\nWrite the synthesis paragraph now."
+
+    if not shutil.which("claude"):
+        print(f"  ! claude CLI not available, skipping {project_id}")
+        return None
+
+    try:
+        result = subprocess.run(
+            ["claude", "-p", "--model", model,
+             "--append-system-prompt", SYSTEM_PROMPT,
+             "--disable-slash-commands",
+             user_msg],
+            capture_output=True, text=True, timeout=TIMEOUT_S,
+            cwd=get_cwd(), encoding="utf-8", errors="replace",
+        )
+    except Exception as e:
+        print(f"  ! subprocess failed for {project_id}: {e}")
+        return None
+
+    if result.returncode != 0:
+        print(f"  ! claude exit {result.returncode} for {project_id}")
+        return None
+
+    synthesis = (result.stdout or "").strip()
+    if not synthesis or len(synthesis) < 50:
+        return None
+    return synthesis[:1000]
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
+    parser.add_argument("--model", default=DEFAULT_MODEL)
+    parser.add_argument("--project", default=None, help="single project to synthesize")
+    args = parser.parse_args()
+
+    projects = api_get(args.base_url, "/projects").get("projects", [])
+    if args.project:
+        projects = [p for p in projects if p["id"] == args.project]
+
+    print(f"Synthesizing {len(projects)} project(s) with {args.model}...")
+
+    for p in projects:
+        pid = p["id"]
+        print(f"\n- {pid}")
+        synthesis = synthesize_project(args.base_url, pid, args.model)
+        if synthesis:
+            print(f"  {synthesis[:200]}...")
+            try:
+                api_post(args.base_url, "/project/state", {
+                    "project": pid,
+                    "category": "status",
+                    "key": "synthesis_cache",
+                    "value": synthesis,
+                    "source": "weekly synthesis pass",
+                })
+                print(f"  + cached")
+            except Exception as e:
+                print(f"  ! save failed: {e}")
+
+
+if __name__ == "__main__":
+    main()
--- a/src/atocore/engineering/mirror.py
+++ b/src/atocore/engineering/mirror.py
@@ -28,6 +28,7 @@ def generate_project_overview(project: str) -> str:
    """Generate a full project overview page in markdown."""
    sections = [
        _header(project),
+        _synthesis_section(project),
        _state_section(project),
        _system_architecture(project),
        _decisions_section(project),
@@ -40,6 +41,52 @@ def generate_project_overview(project: str) -> str:
    return "\n\n".join(s for s in sections if s)


+def _synthesis_section(project: str) -> str:
+    """Generate a short LLM synthesis of the current project state.
+
+    Reads the cached synthesis from project_state if available
+    (category=status, key=synthesis_cache). If not cached, returns
+    a deterministic summary from the existing structured data.
+    The actual LLM-generated synthesis is produced by the weekly
+    lint/synthesis pass on Dalidou (where claude CLI is available).
+    """
+    entries = get_state(project)
+    cached = ""
+    for e in entries:
+        if e.category == "status" and e.key == "synthesis_cache":
+            cached = e.value
+            break
+
+    if cached:
+        return f"## Current State (auto-synthesis)\n\n> {cached}"
+
+    # Fallback: deterministic summary from structured data
+    stage = ""
+    summary = ""
+    next_focus = ""
+    for e in entries:
+        if e.category == "status":
+            if e.key == "stage":
+                stage = e.value
+            elif e.key == "summary":
+                summary = e.value
+            elif e.key == "next_focus":
+                next_focus = e.value
+
+    if not (stage or summary or next_focus):
+        return ""
+
+    bits = []
+    if summary:
+        bits.append(summary)
+    if stage:
+        bits.append(f"**Stage**: {stage}")
+    if next_focus:
+        bits.append(f"**Next**: {next_focus}")
+
+    return "## Current State\n\n" + "\n\n".join(bits)
+
+
 def _header(project: str) -> str:
    return (
        f"# {project} — Project Overview\n\n"