AtoCore Master Brain Plan
Vision: AtoCore becomes the single source of truth that grounds every LLM interaction across the entire ecosystem (Claude, OpenClaw, Codex, Ollama, future agents). Every prompt is automatically enriched with full project context. The brain self-grows from daily work, auto-organizes its metadata, and stays flawlessly reliable.
The Core Insight
AtoCore today is a well-architected capture + curation system with a critical gap on the consumption side. We pour water into the bucket (capture from Claude Code Stop hook + OpenClaw message hooks) but nothing is drinking from it at prompt time. Fixing that gap is the single highest-leverage move.
Once every LLM call is AtoCore-grounded automatically, the feedback loop closes: LLMs use the context → produce better responses → those responses reference the injected memories → reinforcement fires → knowledge curates itself. The capture side is already working. The pull side is what's missing.
Universal Consumption Strategy
MCP is great for Claude (Claude Desktop, Claude Code, Cursor, Zed, Windsurf) but is not universal. OpenClaw has its own plugin SDK. Codex, Ollama, and GPT don't natively support MCP. The right strategy:
HTTP API is the truth; every client gets the thinnest possible adapter.
┌─────────────────────┐
│ AtoCore HTTP API │ ← canonical interface
│ /context/build │
│ /query │
│ /memory │
│ /project/state │
└──────────┬──────────┘
│
┌────────────┬───────────┼──────────┬────────────┐
│ │ │ │ │
┌──┴───┐ ┌────┴────┐ ┌───┴───┐ ┌───┴────┐ ┌───┴────┐
│ MCP │ │OpenClaw │ │Claude │ │ Codex │ │ Ollama │
│server│ │ plugin │ │ Code │ │ skill │ │ proxy │
│ │ │ (pull) │ │ hook │ │ │ │ │
└──┬───┘ └────┬────┘ └───┬───┘ └────┬───┘ └────┬───┘
│ │ │ │ │
Claude OpenClaw Claude Code Codex CLI Ollama
Desktop, agent local
Cursor, models
Zed,
Windsurf
Each adapter's only job: accept a prompt, call AtoCore HTTP, prepend the returned context pack. The adapter itself carries no logic.
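As a sketch of how thin that adapter can be (the `/context/build` path comes from the API surface above; the port, payload shape, and `"context"` response key are assumptions):

```python
import json
import urllib.request

ATOCORE_URL = "http://localhost:8765"  # hypothetical port

def enrich(prompt: str, project: str, fetch=None) -> str:
    """Prepend AtoCore's context pack to a prompt; fail open on any error."""
    if fetch is None:
        def fetch(url, payload):
            req = urllib.request.Request(
                url,
                data=json.dumps(payload).encode(),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=2) as resp:
                return json.loads(resp.read())
    try:
        pack = fetch(f"{ATOCORE_URL}/context/build",
                     {"query": prompt, "project": project})
        return pack["context"] + "\n\n" + prompt
    except Exception:
        return prompt  # fail open: never block the LLM call
```

The `fetch` parameter exists only so the adapter can be exercised without a live server; the fail-open branch is what keeps a down brain from blocking any client.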
Three Integration Tiers
Tier 1: MCP-native clients (Claude ecosystem)
Build atocore-mcp — a standalone MCP server that wraps the HTTP API. Exposes:
- `context(query, project)` → context pack
- `search(query)` → raw retrieval
- `remember(type, content, project)` → create candidate memory
- `recall(project, key)` → project state lookup
- `list_projects()` → registered projects
Works with Claude Desktop, Claude Code (via claude mcp add atocore), Cursor,
Zed, Windsurf without any per-client work beyond config.
Tier 2: Custom plugin ecosystems (OpenClaw)
Extend the existing atocore-capture plugin on T420 to also register a
before_prompt_build hook that pulls context from AtoCore and injects it
into the agent's system prompt. The plugin already has the HTTP client, the
authentication, the fail-open pattern. This is ~30 lines of added code.
Tier 3: Everything else (Codex, Ollama, custom agents)
For clients without plugin/hook systems, ship a thin proxy/middleware the user configures as the LLM endpoint:
- `atocore-proxy` listens on `localhost:PORT`
- Intercepts OpenAI-compatible chat/completion calls
- Pulls context from AtoCore, injects into system prompt
- Forwards to the real model endpoint (OpenAI, Ollama, Anthropic, etc.)
- Returns the response, then captures the interaction back to AtoCore
This makes AtoCore a "drop-in" layer for anything that speaks OpenAI-compatible HTTP — which is nearly every modern LLM runtime.
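The heart of the proxy is one transformation on the request body. A minimal sketch, assuming the standard OpenAI-style `messages` array:

```python
def inject_context(body: dict, context: str) -> dict:
    """Prepend an AtoCore context pack to an OpenAI-style chat request."""
    messages = list(body.get("messages", []))
    if messages and messages[0].get("role") == "system":
        # Merge into the existing system message rather than adding a second one.
        first = dict(messages[0])
        first["content"] = context + "\n\n" + first["content"]
        messages[0] = first
    else:
        messages.insert(0, {"role": "system", "content": context})
    return {**body, "messages": messages}
```

Everything else the proxy does (listen, forward, capture back) is plumbing around this function.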
Knowledge Density Plan
The brain is only as smart as what it knows. Current state: 80 active memories across 6 projects, 324 candidates in the queue being processed. Target: 1,000+ curated memories to become a real master brain.
Mechanisms:
- Finish the current triage pass (324 → ~80 more promotions expected).
- Re-extract with stronger prompt on existing 236 interactions — tune the LLM extractor system prompt to pull more durable facts and fewer ephemeral snapshots.
- Ingest all drive/vault documents as memory candidates (not just chunks). Every structured markdown section with a decision/fact/requirement header becomes a candidate memory.
- Multi-source triangulation: same fact in 3+ sources = auto-promote to confidence 0.95.
- Cross-project synthesis: facts appearing in multiple project contexts get promoted to global domain knowledge.
Auto-Organization of Metadata
Currently: type, project, confidence, status, reference_count. For
master brain we need more structure, inferred automatically:
| Addition | Purpose | Mechanism |
|---|---|---|
| Domain tags (optics, mechanics, firmware, business…) | Cross-cutting retrieval | LLM inference during triage |
| Temporal scope (permanent, valid_until_X, transient) | Avoid stale truth | LLM classifies during triage |
| Source refs (chunk_id[], interaction_id[]) | Provenance for every fact | Enforced at creation time |
| Relationships (contradicts, updates, depends_on) | Memory graph | Triage infers during review |
| Semantic clusters | Detect duplicates, find gaps | Weekly HDBSCAN pass on embeddings |
Layer these in progressively — none of them require schema rewrites, just additional fields and batch jobs.
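To make the "no schema rewrites" claim concrete, here is a hypothetical additive migration in SQLite; the `memories` table name and column names mirror the table above, but exact types and storage (e.g. JSON-in-TEXT) are assumptions:

```python
import sqlite3

# New columns layered onto the existing memories table; no rewrite needed.
NEW_COLUMNS = [
    ("domain_tags", "TEXT"),     # JSON array of inferred tags
    ("temporal_scope", "TEXT"),  # permanent / valid_until_X / transient
    ("source_refs", "TEXT"),     # JSON array of chunk/interaction ids
]

def migrate(con: sqlite3.Connection) -> None:
    existing = {row[1] for row in con.execute("PRAGMA table_info(memories)")}
    for name, sqltype in NEW_COLUMNS:
        if name not in existing:  # idempotent: safe to re-run nightly
            con.execute(f"ALTER TABLE memories ADD COLUMN {name} {sqltype}")
```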
Self-Growth Mechanisms
Four loops that make AtoCore grow autonomously:
1. Drift detection (nightly)
Compare new chunk embeddings to the existing vector distribution. A new centroid more than X cosine distance from every existing centroid signals a new knowledge area. Log it to the dashboard; a human decides whether it's noise or a domain worth curating.
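The distance check itself is small. A sketch in plain Python (the 0.35 threshold is a placeholder standing in for the X above, to be tuned against real embeddings):

```python
def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (na * nb)

def is_new_area(centroid, known_centroids, threshold=0.35):
    """Flag a nightly chunk centroid that sits far from every known one."""
    return all(cosine_distance(centroid, c) > threshold
               for c in known_centroids)
```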
2. Gap identification (continuous)
Every /context/build logs query + chunks_returned + memories_returned.
Weekly report: "top 10 queries with weak coverage." Those are targeted
curation opportunities.
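The weekly report reduces to a count over the logged tuples. A sketch, assuming each `/context/build` call logs `(query, chunks_returned, memories_returned)`; the `min_hits` cutoff is an assumption:

```python
from collections import Counter

def weak_coverage_report(logs, top=10, min_hits=3):
    """logs: iterable of (query, chunks_returned, memories_returned).
    Returns the most frequent queries whose retrieval came back thin."""
    weak = Counter()
    for query, chunks, memories in logs:
        if chunks + memories < min_hits:
            weak[query] += 1
    return weak.most_common(top)
```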
3. Multi-source triangulation (weekly)
Scan memory content similarity across sources. When a fact appears in 3+ independent sources (vault doc + drive doc + interaction), auto-promote to high confidence and mark as "triangulated."
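The promotion rule is mechanical once facts are grouped by similarity. A sketch of that final step (source-id naming is illustrative; the 3-source and 0.95 numbers are from this plan):

```python
def triangulate(fact_sources, min_sources=3, promoted_confidence=0.95):
    """fact_sources: fact id -> set of independent source ids.
    Returns per-fact updates for facts confirmed by enough sources."""
    return {
        fact: {"confidence": promoted_confidence, "triangulated": True}
        for fact, sources in fact_sources.items()
        if len(sources) >= min_sources
    }
```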
4. Active learning prompts (monthly)
Surface "you have 200 p06 memories but only 15 p04 memories. Spend 30 min curating p04?" via dashboard digest.
Robustness Strategy (Flawless Operation Bar)
Current: nightly backup, off-host rsync, health endpoint, 303 tests, harness, enhanced dashboard with pipeline health (this session).
To reach "flawless":
| Gap | Fix | Priority |
|---|---|---|
| Silent pipeline failures | Alerting webhook on harness drop / pipeline skip | P1 |
| Memory mutations untracked | Append-only audit log table | P1 |
| Integrity drift | Nightly FK + vector-chunk parity checks | P1 |
| Schema migrations ad-hoc | Formal migration framework with rollback | P2 |
| Single point of failure | Daily backup to user's main computer (new) | P1 |
| No hot standby | Second instance following primary via WAL | P3 |
| No temporal history | Memory audit + valid_until fields | P2 |
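The P1 audit-log row above can be enforced inside SQLite itself, so no application code path can skip it. A sketch, assuming the `memory_audit` name from the table; column names and the update-only trigger are illustrative:

```python
import sqlite3

AUDIT_DDL = """
CREATE TABLE IF NOT EXISTS memory_audit (
  id INTEGER PRIMARY KEY,
  memory_id INTEGER NOT NULL,
  op TEXT NOT NULL,
  old_content TEXT,
  new_content TEXT,
  at TEXT DEFAULT (datetime('now'))
);
CREATE TRIGGER IF NOT EXISTS memories_audit_update
AFTER UPDATE ON memories BEGIN
  INSERT INTO memory_audit (memory_id, op, old_content, new_content)
  VALUES (old.id, 'update', old.content, new.content);
END;
"""
```

Analogous `AFTER INSERT` / `AFTER DELETE` triggers would complete the append-only history.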
Daily Backup to Main Computer
Currently: Dalidou → T420 (192.168.86.39) via rsync.
Add: Dalidou → main computer via a pull (main computer runs the rsync, pulls from Dalidou). Pull-based is simpler than push — no need for SSH keys on Dalidou to reach the Windows machine.
# On main computer, daily scheduled task:
rsync -a papa@dalidou:/srv/storage/atocore/backups/snapshots/ \
/path/to/local/atocore-backups/
Configure via Windows Task Scheduler or a cron-like runner. Verify weekly that the latest snapshot is present.
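The weekly verification can be a small script on the main computer. A sketch (the two-day freshness window is an assumption; tune it to the backup cadence):

```python
import pathlib
import time

def latest_snapshot_age_days(backup_dir):
    """Age in days of the newest file under backup_dir, or None if empty."""
    files = [p for p in pathlib.Path(backup_dir).rglob("*") if p.is_file()]
    if not files:
        return None
    newest = max(p.stat().st_mtime for p in files)
    return (time.time() - newest) / 86400

def backup_is_fresh(backup_dir, max_age_days=2):
    age = latest_snapshot_age_days(backup_dir)
    return age is not None and age <= max_age_days
```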
Human Interface Auto-Evolution
Current: wiki at /wiki, regenerates on every request from DB. Synthesis
(the "current state" paragraph at top of project pages) runs weekly on
Sundays only. That's why it feels stalled.
Fixes:
- Run synthesis daily, not weekly. It's cheap (one claude call per project) and keeps the human-readable overview fresh.
- Trigger synthesis on major events — when 5+ new memories land for a project, regenerate its synthesis.
- Add "What's New" feed — wiki homepage shows recent additions across all projects (last 7 days of memory promotions, state entries, entities).
- Memory timeline view — project page gets a chronological list of what we learned when.
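The event trigger in the fixes above is a simple per-project counter. A hypothetical sketch (the threshold of 5 is from this plan; the class and method names are illustrative):

```python
class SynthesisTrigger:
    """Regenerate a project's synthesis once enough new memories land,
    then reset that project's counter."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.counts = {}

    def record_promotion(self, project: str) -> bool:
        self.counts[project] = self.counts.get(project, 0) + 1
        if self.counts[project] >= self.threshold:
            self.counts[project] = 0
            return True  # caller regenerates synthesis now
        return False
```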
Phased Roadmap (8-10 weeks)
Phase 1 (week 1-2): Universal Consumption
Goal: every LLM call is AtoCore-grounded automatically.
- Build `atocore-mcp` server (wraps HTTP API, stdio transport)
- Publish to npm, or run via pipx / stdlib HTTP
- Configure in Claude Desktop (`~/.claude/mcp_servers.json`)
- Configure in Claude Code (`claude mcp add atocore …`)
- Extend OpenClaw plugin with `before_prompt_build` pull
- Write `atocore-proxy` middleware for Codex/Ollama/generic clients
- Document configuration for each client
Success: open a fresh Claude Code session, ask a project question, verify the response references AtoCore memories without manual context commands.
Phase 2 (week 2-3): Knowledge Density + Wiki Evolution
- Finish current triage pass (324 candidates → active)
- Tune extractor prompt for higher promotion rate on durable facts
- Daily synthesis in cron (not just Sundays)
- Event-triggered synthesis on significant project changes
- Wiki "What's New" feed
- Memory timeline per project
Target: 300+ active memories, wiki feels alive daily.
Phase 3 (week 3-4): Auto-Organization
- Schema: add `domain_tags`, `valid_until`, `source_refs`, `triangulated_count`
- Triage prompt upgraded: infer tags + temporal scope + relationships
- Weekly HDBSCAN clustering of embeddings → dup detection + gap reports
- Relationship edges in a new `memory_relationships` table
Phase 4 (week 4-5): Robustness Hardening
- Append-only `memory_audit` table + retrofit mutations
- Nightly integrity checks (FK validation, orphan detection, parity)
- Alerting webhook (Discord/email) on pipeline anomalies
- Daily backup to user's main computer (pull-based)
- Formal migration framework
Phase 5 (week 6-7): Engineering V1 Implementation
Execute the 23 acceptance criteria in docs/architecture/engineering-v1-acceptance.md
against p06-polisher as the test bed. The ontology and queries are designed;
this phase implements them.
Phase 6 (week 8-9): Self-Growth Loops
- Drift detection (nightly)
- Gap identification from `/context/build` logs
- Multi-source triangulation
- Active learning digest (monthly)
- Cross-project synthesis
Phase 7 (ongoing): Scale & Polish
- Multi-model validation (sonnet triages, opus cross-checks on disagreements)
- AtoDrive integration (Google Drive as trusted source)
- Hot standby when real production dependence materializes
- More MCP tools (write-back, memory search, entity queries)
Success Criteria
AtoCore is a master brain when:
- Zero manual context commands. A fresh Claude/OpenClaw session answers a project question without being told "use AtoCore context."
- 1,000+ active memories with >90% provenance coverage (every fact traceable to a source).
- Every project has a current, human-readable overview updated within 24h of significant changes.
- Harness stays >95% across 20+ fixtures covering all active projects.
- Zero silent pipeline failures for 30 consecutive days (all failures surface via alert within the hour).
- Claude on any task knows what we know — user asks "what did we decide about X?" and the answer is grounded in AtoCore, not reconstructed from scratch.
Where We Are Now (2026-04-16)
- ✅ Core infrastructure: HTTP API, SQLite, Chroma, deploy pipeline
- ✅ Capture pipes: Claude Code Stop hook, OpenClaw message hooks
- ✅ Nightly pipeline: backup, extract, triage, synthesis, lint, harness, summary
- ✅ Phase 10: auto-promotion from reinforcement + candidate expiry
- ✅ Dashboard shows pipeline health + interaction totals + all projects
- ⚡ 324 candidates being triaged (down from 439), ~80 active memories, growing
- ❌ No consumption at prompt time (capture-only)
- ❌ Wiki auto-evolves only on Sundays (synthesis cadence)
- ❌ No MCP adapter
- ❌ No daily backup to main computer
- ❌ Engineering V1 not implemented
- ❌ No alerting on pipeline failures
The path is clear. Phase 1 is the keystone.