# Universal Consumption — Connecting LLM Clients to AtoCore

Phase 1 of the Master Brain plan. Every LLM interaction across the ecosystem
pulls context from AtoCore automatically, without the user or agent having
to remember to ask for it.

## Architecture

```
                 ┌─────────────────────┐
                 │  AtoCore HTTP API   │ ← single source of truth
                 │ http://dalidou:8100 │
                 └──────────┬──────────┘
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
    ┌───┴────┐         ┌────┴─────┐        ┌────┴────┐
    │  MCP   │         │ OpenClaw │        │  HTTP   │
    │ server │         │  plugin  │        │  proxy  │
    └───┬────┘         └────┬─────┘        └────┬────┘
        │                   │                   │
 Claude/Cursor/         OpenClaw          Codex/Ollama/
  Zed/Windsurf                       any OpenAI-compat client
```

Three adapters, one HTTP backend. Each adapter is a thin passthrough — no
business logic duplicated.

---

## Adapter 1: MCP Server (Claude Desktop, Claude Code, Cursor, Zed, Windsurf)

The MCP server is `scripts/atocore_mcp.py` — stdlib-only Python, stdio
transport, wraps the HTTP API. Claude-family clients see AtoCore as built-in
tools just like `Read` or `Bash`.

### Tools exposed

- **`atocore_context`** (most important): Full context pack for a query —
  Trusted Project State + memories + retrieved chunks. Use at the start of
  any project-related conversation to ground it.
- **`atocore_search`**: Semantic search over ingested documents (top-K chunks).
- **`atocore_memory_list`**: List active memories, filterable by project + type.
- **`atocore_memory_create`**: Propose a candidate memory (enters the triage queue).
- **`atocore_project_state`**: Get Trusted Project State entries by category.
- **`atocore_projects`**: List registered projects + aliases.
- **`atocore_health`**: Service status check.

### Registration

#### Claude Code (CLI)

```bash
claude mcp add atocore -- python C:/Users/antoi/ATOCore/scripts/atocore_mcp.py
claude mcp list   # verify: "atocore ... ✓ Connected"
```

#### Claude Desktop (GUI)

Edit `~/Library/Application Support/Claude/claude_desktop_config.json`
(macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "atocore": {
      "command": "python",
      "args": ["C:/Users/antoi/ATOCore/scripts/atocore_mcp.py"],
      "env": {
        "ATOCORE_URL": "http://dalidou:8100"
      }
    }
  }
}
```

Restart Claude Desktop.

#### Cursor / Zed / Windsurf

Similar JSON config in each tool's MCP settings. Consult their docs —
the config schema is standard MCP.

### Configuration

Environment variables the MCP server honors:

| Var | Default | Purpose |
|---|---|---|
| `ATOCORE_URL` | `http://dalidou:8100` | Where to reach AtoCore |
| `ATOCORE_TIMEOUT` | `10` | Per-request HTTP timeout (seconds) |

### Behavior

- Fail-open: if Dalidou is unreachable, tools return "AtoCore unavailable"
  error messages but don't crash the client.
- Zero business logic: every tool is a direct HTTP passthrough.
- stdlib only: no MCP SDK dependency.

---

## Adapter 2: OpenClaw Plugin (`openclaw-plugins/atocore-capture/handler.js`)

The plugin on T420 OpenClaw has two responsibilities:

1. **CAPTURE**: On `before_agent_start` + `llm_output`, POST completed turns
   to AtoCore `/interactions` (existing).
2. **PULL**: On `before_prompt_build`, call `/context/build` and inject the
   context pack via `prependContext` so the agent's system prompt includes
   AtoCore knowledge.

### Deployment

The plugin is loaded from
`/tmp/atocore-openclaw-capture-plugin/openclaw-plugins/atocore-capture/`
on the T420 (per OpenClaw's plugin config at `~/.openclaw/openclaw.json`).

To update:

```bash
scp openclaw-plugins/atocore-capture/handler.js \
  papa@192.168.86.39:/tmp/atocore-openclaw-capture-plugin/openclaw-plugins/atocore-capture/index.js
ssh papa@192.168.86.39 'systemctl --user restart openclaw-gateway'
```

Verify in gateway logs: look for "ready (7 plugins: acpx, atocore-capture, ...)"

### Configuration (env vars set on T420)

| Var | Default | Purpose |
|---|---|---|
| `ATOCORE_BASE_URL` | `http://dalidou:8100` | AtoCore HTTP endpoint |
| `ATOCORE_PULL_DISABLED` | (unset) | Set to `1` to disable context pull |

### Behavior

- Fail-open: AtoCore unreachable = no injection, no capture, agent runs
  normally.
- 6s timeout on context pull, 10s on capture — won't stall the agent.
- Context pack prepended as a clearly-bracketed block so the agent can see
  it's auto-injected grounding info.

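The pull half of the plugin is JavaScript, but its fail-open logic is simple enough to sketch in Python. `fetch_pack` stands in for the HTTP call to `/context/build`, and the bracket wording is an assumption, not the plugin's literal output:

```python
def pull_context(prompt: str, fetch_pack, timeout: float = 6.0) -> str:
    """Return a bracketed context block to prepend, or "" if AtoCore is down."""
    try:
        pack = fetch_pack(prompt, timeout=timeout)
    except Exception:
        return ""  # fail-open: no injection, the agent runs normally
    if not pack:
        return ""
    return (
        "[AtoCore context -- auto-injected grounding info]\n"
        f"{pack}\n"
        "[end AtoCore context]"
    )
```

The key design point is that every failure path collapses to the empty string, so a dead AtoCore can never block an agent turn.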
---

## Adapter 3: HTTP Proxy (`scripts/atocore_proxy.py`)

A stdlib-only, OpenAI-compatible HTTP proxy. It sits between any
OpenAI-API-speaking client and the real provider and enriches every
`/chat/completions` request with AtoCore context.

Works with:

- **Codex CLI** (OpenAI-compatible endpoint)
- **Ollama** (has an OpenAI-compatible `/v1` endpoint since 0.1.24)
- **LiteLLM**, **llama.cpp server**, custom agents
- anything that can be pointed at a custom base URL

### Start it

```bash
# For Ollama (local models):
ATOCORE_UPSTREAM=http://localhost:11434/v1 \
  python scripts/atocore_proxy.py

# For OpenAI cloud:
ATOCORE_UPSTREAM=https://api.openai.com/v1 \
ATOCORE_CLIENT_LABEL=codex \
  python scripts/atocore_proxy.py

# Test:
curl http://127.0.0.1:11435/healthz
```

### Point a client at it

Set the client's OpenAI base URL to `http://127.0.0.1:11435/v1`.

#### Ollama example

```bash
OPENAI_BASE_URL=http://127.0.0.1:11435/v1 \
  some-openai-client --model llama3:8b
```

#### Codex CLI

Set `OPENAI_BASE_URL=http://127.0.0.1:11435/v1` in your codex config.

### Configuration

| Var | Default | Purpose |
|---|---|---|
| `ATOCORE_URL` | `http://dalidou:8100` | AtoCore HTTP endpoint |
| `ATOCORE_UPSTREAM` | (required) | Real provider base URL |
| `ATOCORE_PROXY_PORT` | `11435` | Proxy listen port |
| `ATOCORE_PROXY_HOST` | `127.0.0.1` | Proxy bind address |
| `ATOCORE_CLIENT_LABEL` | `proxy` | Client id in captures |
| `ATOCORE_INJECT` | `1` | Inject context (set `0` to disable) |
| `ATOCORE_CAPTURE` | `1` | Capture interactions (set `0` to disable) |

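A sketch of how the table above might be resolved at startup — illustrative only, not the script's actual parsing code; the dictionary keys are made up for the example:

```python
def load_config(env: dict) -> dict:
    """Resolve proxy settings from an environment mapping, with defaults."""
    cfg = {
        "atocore_url": env.get("ATOCORE_URL", "http://dalidou:8100"),
        "upstream": env.get("ATOCORE_UPSTREAM"),
        "port": int(env.get("ATOCORE_PROXY_PORT", "11435")),
        "host": env.get("ATOCORE_PROXY_HOST", "127.0.0.1"),
        "label": env.get("ATOCORE_CLIENT_LABEL", "proxy"),
        # Injection and capture default on; "0" turns them off.
        "inject": env.get("ATOCORE_INJECT", "1") != "0",
        "capture": env.get("ATOCORE_CAPTURE", "1") != "0",
    }
    if not cfg["upstream"]:
        raise SystemExit("ATOCORE_UPSTREAM is required")
    return cfg
```

Passing the environment in as a plain dict (rather than reading `os.environ` directly) keeps the resolution logic trivially testable.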
### Behavior

- GET requests (model listing, etc.) pass through unchanged
- POST to `/chat/completions` (or `/v1/chat/completions`) gets enriched:
  1. Last user message extracted as query
  2. AtoCore `/context/build` called with 6s timeout
  3. Pack injected as system message (or prepended to existing system)
  4. Enriched body forwarded to upstream
  5. After success, interaction POSTed to `/interactions` in background
- Fail-open: AtoCore unreachable = pass through without injection
- Streaming responses: currently buffered (not a true stream). Good enough for
  most cases; can be upgraded later if needed.

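Steps 1–3 of the enrichment can be sketched as a pure function over the request body. This is a simplified illustration of the described behavior, not code from `atocore_proxy.py`:

```python
def enrich(body: dict, pack: str) -> dict:
    """Inject a context pack into an OpenAI chat-completions request body."""
    msgs = list(body.get("messages", []))
    # Step 1: the last user message is the query (used to build the pack).
    query = next((m["content"] for m in reversed(msgs)
                  if m.get("role") == "user"), None)
    if query is None or not pack:
        return body  # nothing to ground: pass through unchanged
    # Step 3: new system message, or prepend to the existing one.
    if msgs and msgs[0].get("role") == "system":
        msgs[0] = {**msgs[0], "content": pack + "\n\n" + msgs[0]["content"]}
    else:
        msgs.insert(0, {"role": "system", "content": pack})
    return {**body, "messages": msgs}
```

Returning a new dict instead of mutating the original keeps the fail-open path trivial: on any AtoCore error the proxy simply forwards `body` as received.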
### Running as a service

On Linux, create `~/.config/systemd/user/atocore-proxy.service`:

```ini
[Unit]
Description=AtoCore HTTP proxy

[Service]
Environment=ATOCORE_UPSTREAM=http://localhost:11434/v1
Environment=ATOCORE_CLIENT_LABEL=ollama
ExecStart=/usr/bin/python3 /path/to/scripts/atocore_proxy.py
Restart=on-failure

[Install]
WantedBy=default.target
```

Then: `systemctl --user enable --now atocore-proxy`

On Windows, register via Task Scheduler (similar pattern to the backup task)
or use NSSM to install it as a service.

---

## Verification Checklist

Fresh end-to-end test to confirm Phase 1 is working:

### For Claude Code (MCP)

1. Open a new Claude Code session (not this one).
2. Ask: "what do we know about p06 polisher's control architecture?"
3. Claude should invoke `atocore_context` or `atocore_project_state`
   on its own and answer grounded in AtoCore data.

### For OpenClaw (plugin pull)

1. Send a Discord message to OpenClaw: "what's the status on p04?"
2. Check T420 logs: `journalctl --user -u openclaw-gateway --since "1 min ago" | grep atocore-pull`
3. Expect: `atocore-pull:injected project=p04-gigabit chars=NNN`

### For proxy (any OpenAI-compat client)

1. Start proxy with the appropriate upstream
2. Run a client query through it
3. Check stderr: `[atocore-proxy] inject: project=... chars=...`
4. Check `curl http://127.0.0.1:8100/interactions?client=proxy` — should
   show the captured turn

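For step 3 of the proxy check, a hypothetical helper can tally the inject lines from captured stderr. The log format is taken from the line quoted above; the function itself is not part of the shipped scripts:

```python
import re

# Matches the proxy's stderr inject line as quoted in the checklist.
INJECT_RE = re.compile(r"\[atocore-proxy\] inject: project=(\S+) chars=(\d+)")

def count_injections(stderr_text: str) -> dict:
    """Return {project: injection_count} from captured proxy stderr."""
    counts: dict = {}
    for project, _chars in INJECT_RE.findall(stderr_text):
        counts[project] = counts.get(project, 0) + 1
    return counts
```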
---

## Why not just MCP everywhere?

MCP is great for Claude-family clients, but:

- Not supported natively by Codex CLI, Ollama, or OpenAI's own API
- No universal "attach MCP" mechanism across LLM runtimes
- HTTP APIs are truly universal

The HTTP API is the source of truth; each adapter is the thinnest possible
shim for its ecosystem. When new adapters are needed (Gemini CLI, the Claude
Code plugin system, etc.), they follow the same pattern.

---

## Future enhancements

- **Streaming passthrough** in the proxy (currently buffered for simplicity)
- **Response grounding check**: parse assistant output for references to
  injected context, count reinforcement events
- **Per-client metrics** in the dashboard: how often each client pulls,
  context pack size, injection rate
- **Smart project detection**: today we use keyword matching; could use
  AtoCore's own project resolver endpoint
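The current keyword-matching approach from the last bullet can be sketched as a minimal alias lookup. The alias table here is illustrative; the real aliases live in AtoCore's project registry (see `atocore_projects`):

```python
def detect_project(query: str, aliases: dict) -> str:
    """Return the first project whose alias appears in the query, else ""."""
    q = query.lower()
    for project, words in aliases.items():
        if any(w.lower() in q for w in words):
            return project
    return ""
```

A resolver endpoint would replace this substring scan with AtoCore's own (presumably smarter) matching, so every adapter resolves projects identically.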