feat: Karpathy-inspired upgrades — contradiction, lint, synthesis

Three additive upgrades borrowed from Karpathy's LLM Wiki pattern:

1. CONTRADICTION DETECTION: auto-triage now has a fourth verdict, "contradicts". When a candidate conflicts with an existing memory (not a duplicate, but a genuine disagreement such as "Option A selected" vs "Option B selected"), the triage model flags it and leaves it in the queue for human review instead of silently rejecting or double-storing it. This preserves source tension rather than suppressing it.

2. WEEKLY LINT PASS: scripts/lint_knowledge_base.py checks for:
   - Orphan memories (active but zero references after 14 days)
   - Stale candidates (>7 days unreviewed)
   - Unused entities (no relationships)
   - Empty-state projects
   - Unregistered projects auto-detected in memories
   It runs on Sundays via the cron job and outputs a report.

3. WEEKLY SYNTHESIS: scripts/synthesize_projects.py uses Sonnet to generate a 3-5 sentence "current state" paragraph per project from its state, memories, and entities. The result is cached in project_state under status/synthesis_cache. Wiki project pages now show it at the top under "Current State (auto-synthesis)", falling back to a deterministic summary if no cache exists.

deploy/dalidou/batch-extract.sh: added Step C (synthesis) and Step D (lint), gated to Sundays via a date check.

All additive: no existing behavior changes. The database remains the source of truth; these operations only produce better synthesized views and catch rot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 21:08:13 -04:00
parent 761c483474
commit c1f5b3bdee
5 changed files with 421 additions and 5 deletions
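The lint script itself is not part of the diff shown here. As a rough illustration of the age- and link-based checks listed in item 2 of the message, here is a minimal Python sketch. The `Memory` record, its `status`/`created_at`/`reference_count` fields, and the `find_*` helpers are hypothetical stand-ins, not the real schema or functions of scripts/lint_knowledge_base.py, and treating "after 14 days" as "at least 14 days old" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record shape: the real schema used by
# scripts/lint_knowledge_base.py is not shown in this commit.
@dataclass
class Memory:
    id: str
    status: str              # assumed values: "active" or "candidate"
    created_at: datetime
    reference_count: int = 0


ORPHAN_AGE = timedelta(days=14)  # assumed: "after 14 days" means age >= 14 days
STALE_AGE = timedelta(days=7)    # ">7 days unreviewed"


def find_orphans(memories, now):
    """Active memories with zero references that are at least ORPHAN_AGE old."""
    return [
        m.id
        for m in memories
        if m.status == "active"
        and m.reference_count == 0
        and now - m.created_at >= ORPHAN_AGE
    ]


def find_stale_candidates(memories, now):
    """Candidates that have sat in the review queue for more than STALE_AGE."""
    return [
        m.id
        for m in memories
        if m.status == "candidate" and now - m.created_at > STALE_AGE
    ]


def find_unused_entities(entity_ids, relationships):
    """Entities that appear in no (source, target) relationship pair."""
    linked = {e for pair in relationships for e in pair}
    return sorted(set(entity_ids) - linked)


# Illustrative data only.
now = datetime(2026, 4, 13)
memories = [
    Memory("m1", "active", now - timedelta(days=20)),
    Memory("m2", "active", now - timedelta(days=20), reference_count=3),
    Memory("m3", "candidate", now - timedelta(days=9)),
    Memory("m4", "candidate", now - timedelta(days=2)),
]
print(find_orphans(memories, now))           # ['m1']
print(find_stale_candidates(memories, now))  # ['m3']
print(find_unused_entities(["e1", "e2", "e3"], [("e1", "e2")]))  # ['e3']
```

Each check is a pure function over records and a reference time, so a weekly report can be assembled by calling them in turn without touching the database's write path.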
--- a/scripts/auto_triage.py
+++ b/scripts/auto_triage.py
@@ -47,7 +47,7 @@ You will receive:

 For each candidate, output exactly one JSON object:

-{"verdict": "promote|reject|needs_human", "confidence": 0.0-1.0, "reason": "one sentence"}
+{"verdict": "promote|reject|needs_human|contradicts", "confidence": 0.0-1.0, "reason": "one sentence", "conflicts_with": "id of existing memory if contradicts"}

 Rules:

@@ -61,9 +61,11 @@ Rules:
   - A session observation or conversational filler
   - A process rule that belongs in DEV-LEDGER.md or AGENTS.md, not memory

-3. NEEDS_HUMAN when you're genuinely unsure — the candidate might be valuable but you can't tell without domain knowledge. This should be rare (< 20% of candidates).
+3. CONTRADICTS when the candidate *conflicts* with an existing active memory (not a duplicate: the two statements cannot both be true). Set `conflicts_with` to the existing memory id. This flags the tension for human review instead of silently rejecting or double-storing. Examples: "Option A selected" vs "Option B selected" for the same decision; "uses material X" vs "uses material Y" for the same component.

-4. Output ONLY the JSON object. No prose, no markdown, no explanation outside the reason field."""
+4. NEEDS_HUMAN when you're genuinely unsure — the candidate might be valuable but you can't tell without domain knowledge. This should be rare (< 20% of candidates).
+
+5. Output ONLY the JSON object. No prose, no markdown, no explanation outside the reason field."""

 _sandbox_cwd = None

@@ -169,7 +171,7 @@ def parse_verdict(raw):
        return {"verdict": "needs_human", "confidence": 0.0, "reason": "failed to parse triage output"}

    verdict = str(parsed.get("verdict", "needs_human")).strip().lower()
-    if verdict not in {"promote", "reject", "needs_human"}:
+    if verdict not in {"promote", "reject", "needs_human", "contradicts"}:
        verdict = "needs_human"

    confidence = parsed.get("confidence", 0.5)
@@ -179,7 +181,13 @@ def parse_verdict(raw):
        confidence = 0.5

    reason = str(parsed.get("reason", "")).strip()[:200]
-    return {"verdict": verdict, "confidence": confidence, "reason": reason}
+    conflicts_with = str(parsed.get("conflicts_with") or "").strip()
+    return {
+        "verdict": verdict,
+        "confidence": confidence,
+        "reason": reason,
+        "conflicts_with": conflicts_with,
+    }


 def main():
@@ -211,6 +219,7 @@ def main():
        verdict = verdict_obj["verdict"]
        conf = verdict_obj["confidence"]
        reason = verdict_obj["reason"]
+        conflicts_with = verdict_obj.get("conflicts_with", "")

        mid = cand["id"]
        label = f"[{i:2d}/{len(candidates)}] {mid[:8]} [{cand['memory_type']}]"
@@ -236,6 +245,13 @@ def main():
                except Exception:
                    errors += 1
            rejected += 1
+        elif verdict == "contradicts":
+            # Leave the candidate in the queue for human review and report
+            # the conflicting memory id. This is conservative: we don't
+            # silently merge or reject when sources disagree.
+            print(f"  CONTRADICTS    {label}  vs {conflicts_with[:8] if conflicts_with else '?'}  {reason}")
+            contradicts_count = locals().get('contradicts_count', 0) + 1
+            needs_human += 1
        else:
            print(f"  NEEDS_HUMAN    {label}  conf={conf:.2f}  {reason}")
            needs_human += 1
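In the branch above, a "contradicts" verdict is counted under `needs_human` (so the candidate stays in the review queue) while the conflicting memory id is reported and a separate contradiction tally is kept. A minimal sketch of that bookkeeping follows, assuming verdict dicts shaped like the return value of `parse_verdict` and ignoring any confidence thresholds the real promote/reject branches may apply. `route` and `tally` are hypothetical helpers, not functions in scripts/auto_triage.py.

```python
from collections import Counter


def route(verdict_obj):
    """Map a parsed triage verdict to a queue action (hypothetical labels)."""
    verdict = verdict_obj["verdict"]
    if verdict in ("promote", "reject"):
        return verdict
    # "contradicts" and "needs_human" both leave the candidate queued for review.
    return "hold_for_human"


def tally(verdict_objs):
    """Count queue actions and collect the memory ids flagged as conflicting."""
    counts = Counter(route(v) for v in verdict_objs)
    conflicts = [
        v.get("conflicts_with") or "?"
        for v in verdict_objs
        if v["verdict"] == "contradicts"
    ]
    return counts, conflicts


# Illustrative verdicts; "3f2a9c1e" is a made-up memory id.
verdicts = [
    {"verdict": "promote", "confidence": 0.9, "reason": "", "conflicts_with": ""},
    {"verdict": "contradicts", "confidence": 0.8, "reason": "", "conflicts_with": "3f2a9c1e"},
    {"verdict": "needs_human", "confidence": 0.4, "reason": "", "conflicts_with": ""},
]
counts, conflicts = tally(verdicts)
print(counts["hold_for_human"], conflicts)  # 2 ['3f2a9c1e']
```

Keeping the contradiction count in an explicit `Counter` makes it an ordinary local value that a summary line can print, rather than one read back through `locals()`.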