# AtoCore Master Brain Plan

> Vision: AtoCore becomes the **single source of truth** that grounds every LLM
> interaction across the entire ecosystem (Claude, OpenClaw, Codex, Ollama, future
> agents). Every prompt is automatically enriched with full project context. The
> brain self-grows from daily work, auto-organizes its metadata, and stays
> flawlessly reliable.

## The Core Insight

AtoCore today is a **well-architected capture + curation system with a critical gap on the consumption side**. We pour water into the bucket (capture from the Claude Code Stop hook and the OpenClaw message hooks), but nothing drinks from it at prompt time. Fixing that gap is the single highest-leverage move.

**Once every LLM call is AtoCore-grounded automatically, the feedback loop closes**: LLMs use the context → produce better responses → those responses reference the injected memories → reinforcement fires → knowledge curates itself.

The capture side is already working. The pull side is what's missing.

## Universal Consumption Strategy

MCP is great for Claude clients (Claude Desktop, Claude Code, Cursor, Zed, Windsurf) but is **not universal**. OpenClaw has its own plugin SDK. Codex, Ollama, and GPT don't natively support MCP. The right strategy: **the HTTP API is the truth; every client gets the thinnest possible adapter.**

```
                 ┌─────────────────────┐
                 │  AtoCore HTTP API   │  ← canonical interface
                 │  /context/build     │
                 │  /query             │
                 │  /memory            │
                 │  /project/state     │
                 └──────────┬──────────┘
                            │
   ┌────────────┬───────────┼───────────┬───────────┐
   │            │           │           │           │
┌──┴───┐   ┌────┴────┐  ┌───┴───┐   ┌───┴────┐  ┌───┴────┐
│ MCP  │   │OpenClaw │  │Claude │   │ Codex  │  │ Ollama │
│server│   │ plugin  │  │ Code  │   │ skill  │  │ proxy  │
│      │   │ (pull)  │  │ hook  │   │        │  │        │
└──┬───┘   └────┬────┘  └───┬───┘   └───┬────┘  └───┬────┘
   │            │           │           │           │
 Claude      OpenClaw   Claude Code  Codex CLI    Ollama
Desktop,       agent                               local
Cursor,                                            models
  Zed,
Windsurf
```

Each adapter's only job: accept a prompt, call AtoCore HTTP, and prepend the returned context pack.
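That contract is small enough to sketch in full. The sketch below assumes `/context/build` accepts a JSON body with `query` and `project` fields and returns a JSON object with a `context` string; those field names and the port are assumptions, not the finalized API:

```python
import json
import urllib.request

ATOCORE_URL = "http://localhost:8750"  # assumed port; point at your AtoCore instance

def build_grounded_prompt(prompt: str, project: str) -> str:
    """Fetch a context pack from AtoCore and prepend it to the prompt.

    Fails open: if AtoCore is unreachable or returns junk, the prompt
    passes through unchanged, so the client keeps working.
    """
    req = urllib.request.Request(
        f"{ATOCORE_URL}/context/build",
        data=json.dumps({"query": prompt, "project": project}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            pack = json.load(resp).get("context", "")
    except (OSError, ValueError):
        return prompt  # fail open: an ungrounded answer beats no answer
    return f"{pack}\n\n{prompt}" if pack else prompt
```

Every adapter in the diagram reduces to some variant of this function; the fail-open behavior mirrors the pattern the OpenClaw capture plugin already uses.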
The adapter itself carries no logic.

## Three Integration Tiers

### Tier 1: MCP-native clients (Claude ecosystem)

Build **atocore-mcp** — a standalone MCP server that wraps the HTTP API. It exposes:

- `context(query, project)` → context pack
- `search(query)` → raw retrieval
- `remember(type, content, project)` → create candidate memory
- `recall(project, key)` → project state lookup
- `list_projects()` → registered projects

This works with Claude Desktop, Claude Code (via `claude mcp add atocore`), Cursor, Zed, and Windsurf without any per-client work beyond configuration.

### Tier 2: Custom plugin ecosystems (OpenClaw)

Extend the existing `atocore-capture` plugin on T420 to also register a **`before_prompt_build`** hook that pulls context from AtoCore and injects it into the agent's system prompt. The plugin already has the HTTP client, the authentication, and the fail-open pattern; this is roughly 30 lines of added code.

### Tier 3: Everything else (Codex, Ollama, custom agents)

For clients without plugin/hook systems, ship a **thin proxy/middleware** that the user configures as the LLM endpoint:

- `atocore-proxy` listens on `localhost:PORT`
- Intercepts OpenAI-compatible chat/completion calls
- Pulls context from AtoCore and injects it into the system prompt
- Forwards to the real model endpoint (OpenAI, Ollama, Anthropic, etc.)
- Returns the response, then captures the interaction back to AtoCore

This makes AtoCore a "drop-in" layer for anything that speaks OpenAI-compatible HTTP — which is nearly every modern LLM runtime.

## Knowledge Density Plan

The brain is only as smart as what it knows. Current state: 80 active memories across 6 projects, with 324 candidates in the queue being processed. Target: **1,000+ curated memories** to become a real master brain.

Mechanisms:

1. **Finish the current triage pass** (324 candidates → ~80 more promotions expected).
2. **Re-extract the existing 236 interactions with a stronger prompt** — tune the LLM extractor system prompt to pull more durable facts and fewer ephemeral snapshots.
3. **Ingest all drive/vault documents as memory candidates** (not just chunks). Every structured markdown section with a decision/fact/requirement header becomes a candidate memory.
4. **Multi-source triangulation**: the same fact in 3+ sources is auto-promoted to confidence 0.95.
5. **Cross-project synthesis**: facts appearing in multiple project contexts get promoted to global domain knowledge.

## Auto-Organization of Metadata

Current metadata: `type`, `project`, `confidence`, `status`, `reference_count`. For a master brain we need more structure, inferred automatically:

| Addition | Purpose | Mechanism |
|---|---|---|
| **Domain tags** (optics, mechanics, firmware, business…) | Cross-cutting retrieval | LLM inference during triage |
| **Temporal scope** (permanent, valid_until_X, transient) | Avoid stale truth | LLM classifies during triage |
| **Source refs** (chunk_id[], interaction_id[]) | Provenance for every fact | Enforced at creation time |
| **Relationships** (contradicts, updates, depends_on) | Memory graph | Triage infers during review |
| **Semantic clusters** | Detect duplicates, find gaps | Weekly HDBSCAN pass on embeddings |

Layer these in progressively — none of them require schema rewrites, just additional fields and batch jobs.

## Self-Growth Mechanisms

Four loops make AtoCore grow autonomously:

### 1. Drift detection (nightly)

Compare new chunk embeddings to the existing vector distribution. A centroid more than X cosine distance from every existing centroid signals a new knowledge area. Log it to the dashboard; a human decides whether it's noise or a domain worth curating.

### 2. Gap identification (continuous)

Every `/context/build` call logs `query + chunks_returned + memories_returned`. A weekly report lists the top 10 queries with weak coverage — targeted curation opportunities.

### 3. Multi-source triangulation (weekly)

Scan memory content similarity across sources.
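The similarity scan at the heart of loops 1 and 3 is plain cosine arithmetic over embeddings. A minimal sketch, in which the 0.9 threshold and the memory dict shape (`id`, `source`, `embedding`) are illustrative assumptions:

```python
import math

def cosine_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def similar_cross_source_pairs(memories: list[dict],
                               threshold: float = 0.9) -> list[tuple[str, str]]:
    """Return ids of memory pairs whose embeddings agree above `threshold`
    and whose sources differ: these are triangulation candidates."""
    pairs = []
    for i, m in enumerate(memories):
        for n in memories[i + 1:]:
            if (m["source"] != n["source"]
                    and cosine_sim(m["embedding"], n["embedding"]) >= threshold):
                pairs.append((m["id"], n["id"]))
    return pairs
```

The same `cosine_sim` primitive serves drift detection: a new centroid whose similarity to every existing centroid falls below a tuned cutoff is logged as a potential new knowledge area.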
When a fact appears in 3+ independent sources (vault doc + drive doc + interaction), auto-promote it to high confidence and mark it "triangulated."

### 4. Active learning prompts (monthly)

Surface prompts like "you have 200 p06 memories but only 15 p04 memories. Spend 30 min curating p04?" via the dashboard digest.

## Robustness Strategy (Flawless Operation Bar)

Current state: nightly backup, off-host rsync, health endpoint, 303 tests, the harness, and an enhanced dashboard with pipeline health (this session). To reach "flawless":

| Gap | Fix | Priority |
|---|---|---|
| Silent pipeline failures | Alerting webhook on harness drop / pipeline skip | P1 |
| Memory mutations untracked | Append-only audit log table | P1 |
| Integrity drift | Nightly FK + vector-chunk parity checks | P1 |
| Schema migrations ad hoc | Formal migration framework with rollback | P2 |
| Single point of failure | Daily backup to user's main computer (new) | P1 |
| No hot standby | Second instance following primary via WAL | P3 |
| No temporal history | Memory audit + valid_until fields | P2 |

### Daily Backup to Main Computer

Currently: Dalidou → T420 (192.168.86.39) via rsync. Add: Dalidou → main computer via a pull (the main computer runs the rsync and pulls from Dalidou). Pull-based is simpler than push — no SSH keys are needed on Dalidou to reach the Windows machine.

```bash
# On the main computer, as a daily scheduled task:
rsync -a papa@dalidou:/srv/storage/atocore/backups/snapshots/ \
  /path/to/local/atocore-backups/
```

Configure this via Windows Task Scheduler or a cron-like runner, and verify weekly that the latest snapshot is present.

## Human Interface Auto-Evolution

Current: the wiki at `/wiki` regenerates on every request from the DB, but synthesis (the "current state" paragraph at the top of project pages) runs **weekly on Sundays only**. That's why it feels stalled. Fixes:

1. **Run synthesis daily, not weekly.** It's cheap (one claude call per project) and keeps the human-readable overview fresh.
2. **Trigger synthesis on major events** — when 5+ new memories land for a project, regenerate its synthesis.
3. **Add a "What's New" feed** — the wiki homepage shows recent additions across all projects (last 7 days of memory promotions, state entries, and entities).
4. **Memory timeline view** — each project page gets a chronological list of what we learned when.

## Phased Roadmap (8-10 weeks)

### Phase 1 (weeks 1-2): Universal Consumption

**Goal: every LLM call is AtoCore-grounded automatically.**

- [ ] Build the `atocore-mcp` server (wraps the HTTP API, stdio transport)
- [ ] Publish to npm, or run via `pipx` / stdlib HTTP
- [ ] Configure in Claude Desktop (`~/.claude/mcp_servers.json`)
- [ ] Configure in Claude Code (`claude mcp add atocore …`)
- [ ] Extend the OpenClaw plugin with a `before_prompt_build` pull
- [ ] Write the `atocore-proxy` middleware for Codex/Ollama/generic clients
- [ ] Document configuration for each client

**Success:** open a fresh Claude Code session, ask a project question, and verify the response references AtoCore memories without manual context commands.

### Phase 2 (weeks 2-3): Knowledge Density + Wiki Evolution

- [ ] Finish the current triage pass (324 candidates → active)
- [ ] Tune the extractor prompt for a higher promotion rate on durable facts
- [ ] Daily synthesis in cron (not just Sundays)
- [ ] Event-triggered synthesis on significant project changes
- [ ] Wiki "What's New" feed
- [ ] Memory timeline per project

**Target:** 300+ active memories; the wiki feels alive daily.
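The synthesis cadence fixes above combine a daily clock with an event trigger, and the decision rule is tiny. A sketch, with illustrative names and the thresholds taken from the plan (5+ new memories, 24-hour staleness):

```python
def should_regenerate_synthesis(new_memories_since_synthesis: int,
                                hours_since_synthesis: float,
                                event_threshold: int = 5,
                                max_age_hours: float = 24.0) -> bool:
    """Regenerate a project's synthesis when enough new memories have
    landed (event trigger) or when the last run is more than a day old
    (daily cadence replacing the Sunday-only schedule)."""
    return (new_memories_since_synthesis >= event_threshold
            or hours_since_synthesis >= max_age_hours)
```

A nightly cron job would evaluate this per project and fire one claude call for each project that returns true, keeping the wiki overview fresh without regenerating everything every night.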
### Phase 3 (weeks 3-4): Auto-Organization

- [ ] Schema: add `domain_tags`, `valid_until`, `source_refs`, `triangulated_count`
- [ ] Upgrade the triage prompt: infer tags + temporal scope + relationships
- [ ] Weekly HDBSCAN clustering of embeddings → dup detection + gap reports
- [ ] Relationship edges in a new `memory_relationships` table

### Phase 4 (weeks 4-5): Robustness Hardening

- [ ] Append-only `memory_audit` table + retrofit mutations
- [ ] Nightly integrity checks (FK validation, orphan detection, parity)
- [ ] Alerting webhook (Discord/email) on pipeline anomalies
- [ ] Daily backup to user's main computer (pull-based)
- [ ] Formal migration framework

### Phase 5 (weeks 6-7): Engineering V1 Implementation

Execute the 23 acceptance criteria in `docs/architecture/engineering-v1-acceptance.md` against p06-polisher as the test bed. The ontology and queries are designed; this phase implements them.

### Phase 6 (weeks 8-9): Self-Growth Loops

- [ ] Drift detection (nightly)
- [ ] Gap identification from `/context/build` logs
- [ ] Multi-source triangulation
- [ ] Active learning digest (monthly)
- [ ] Cross-project synthesis

### Phase 7 (ongoing): Scale & Polish

- [ ] Multi-model validation (sonnet triages, opus cross-checks on disagreements)
- [ ] AtoDrive integration (Google Drive as trusted source)
- [ ] Hot standby when real production dependence materializes
- [ ] More MCP tools (write-back, memory search, entity queries)

## Success Criteria

AtoCore is a master brain when:

1. **Zero manual context commands.** A fresh Claude/OpenClaw session answers a project question without being told "use AtoCore context."
2. **1,000+ active memories** with >90% provenance coverage (every fact traceable to a source).
3. **Every project has a current, human-readable overview** updated within 24h of significant changes.
4. **The harness stays >95%** across 20+ fixtures covering all active projects.
5. **Zero silent pipeline failures** for 30 consecutive days (all failures surface via alert within the hour).
6. **Claude on any task knows what we know** — the user asks "what did we decide about X?" and the answer is grounded in AtoCore, not reconstructed from scratch.

## Where We Are Now (2026-04-16)

- ✅ Core infrastructure: HTTP API, SQLite, Chroma, deploy pipeline
- ✅ Capture pipes: Claude Code Stop hook, OpenClaw message hooks
- ✅ Nightly pipeline: backup, extract, triage, synthesis, lint, harness, summary
- ✅ Phase 10: auto-promotion from reinforcement + candidate expiry
- ✅ Dashboard shows pipeline health + interaction totals + all projects
- ⚡ 324 candidates being triaged (down from 439), ~80 active memories, growing
- ❌ No consumption at prompt time (capture-only)
- ❌ Wiki auto-evolves only on Sundays (synthesis cadence)
- ❌ No MCP adapter
- ❌ No daily backup to main computer
- ❌ Engineering V1 not implemented
- ❌ No alerting on pipeline failures

The path is clear. Phase 1 is the keystone.
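Success criterion 2 ties memory count to provenance coverage, and once `source_refs` lands in Phase 3 that coverage becomes a one-line audit. A sketch, under the assumption that each memory row is exposed as a dict that may carry a `source_refs` list:

```python
def provenance_coverage(memories: list[dict]) -> float:
    """Fraction of memories carrying at least one source reference.

    The >90% success criterion means this should return a value above 0.9
    over all active memories; memories with a missing or empty `source_refs`
    count against coverage.
    """
    if not memories:
        return 0.0
    covered = sum(1 for m in memories if m.get("source_refs"))
    return covered / len(memories)
```

Running this inside the nightly integrity checks would turn the provenance criterion into a tracked metric rather than a one-off audit.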