🔧 08 — System Implementation Status
How the multi-agent system actually works right now, as built. Last updated: 2026-02-15
1. Architecture Overview
Multi-Instance Cluster: 8 independent OpenClaw gateway processes, one per agent. Each has its own systemd service, Discord bot token, port, and state directory.
┌──────────────────────────────────────────────────────────────────┐
│ T420 (clawdbot) │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ OpenClaw Gateway — Mario (main instance) │ │
│ │ Port 18789 │ Slack: Antoine's personal workspace │ │
│ │ State: ~/.openclaw/ │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────── Atomizer Cluster ────────────────────────┐ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Manager │ │ Tech Lead │ │ Secretary │ │ │
│ │ │ :18800 │ │ :18804 │ │ :18808 │ │ │
│ │ │ Opus 4.6 │ │ Opus 4.6 │ │ Gemini 2.5 │ │ │
│ │ └──────┬───────┘ └──────┬──────┘ └──────┬───────┘ │ │
│ │ │ │ │ │ │
│ │ ┌──────┴───────┐ ┌─────┴──────┐ ┌──────┴───────┐ │ │
│ │ │ Auditor │ │ Optimizer │ │ Study Builder│ │ │
│ │ │ :18812 │ │ :18816 │ │ :18820 │ │ │
│ │ │ Gemini 2.5 │ │ Sonnet 4.5 │ │ Gemini 2.5 │ │ │
│ │ └──────────────┘ └────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ NX Expert │ │ Webster │ │ │
│ │ │ :18824 │ │ :18828 │ │ │
│ │ │ Sonnet 4.5 │ │ Gemini 2.5 │ │ │
│ │ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ Inter-agent: hooks API (curl between ports) │ │
│ │ Shared token: 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd… │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Discord: Atomizer-HQ Server │
│ Guild: 1471858733452890132 │
│ │
│ 📋 COMMAND: #ceo-office, #announcements, #daily-standup │
│ 🔧 ENGINEERING: #technical, #code-review, #fea-analysis, #nx │
│ 📊 OPERATIONS: #task-board, #meeting-notes, #reports │
│ 🔬 RESEARCH: #literature, #materials-data │
│ 🏗️ PROJECTS: #active-projects │
│ 📚 KNOWLEDGE: #knowledge-base, #lessons-learned │
│ 🤖 SYSTEM: #agent-logs, #inter-agent, #it-ops │
│ │
│ Each agent = its own Discord bot with unique name & avatar │
└──────────────────────────────────────────────────────────────────┘
2. Why Multi-Instance (Not Single Gateway)
OpenClaw's native Discord provider (@buape/carbon) has a race condition bug when multiple bot tokens connect from one process. Since we need 8 separate bot accounts, we run 8 separate processes — each handles exactly one token, bypassing the bug entirely.
Advantages over the previous bridge approach:
- Native Discord streaming, threads, reactions, attachments
- Fault isolation — one agent crashing doesn't take down the others
- No middleware polling session files on disk
- Each agent appears as its own Discord user with independent presence
3. Port Map
| Agent | Port | Model | Notes |
|---|---|---|---|
| Manager | 18800 | Opus 4.6 | Orchestrates, delegates. Heartbeat disabled (Discord delivery bug) |
| Tech Lead | 18804 | Opus 4.6 | Technical authority |
| Secretary | 18808 | Gemini 2.5 Pro | Task tracking, notes. Changed from Codex 2026-02-15 (OAuth expired) |
| Auditor | 18812 | Gemini 2.5 Pro | Quality review. Changed from Codex 2026-02-15 (OAuth expired) |
| Optimizer | 18816 | Sonnet 4.5 | Optimization work |
| Study Builder | 18820 | Gemini 2.5 Pro | Study setup. Changed from Codex 2026-02-15 (OAuth expired) |
| NX Expert | 18824 | Sonnet 4.5 | CAD/NX work |
| Webster | 18828 | Gemini 2.5 Pro | Research. Heartbeat disabled (Discord delivery bug) |
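⚠️ Port spacing = 4. OpenClaw uses port N AND N+3 (browser service), so with a spacing of 3 one instance's gateway port would land on its neighbor's browser port. Keep the 4-port spacing and never assign adjacent ports.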
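The spacing rule can be checked mechanically. Below is a minimal sketch. It assumes ports are assigned as 18800 + 4*i in the roster order of the port map above. It models only the two ports named in the warning (N and N+3), so it can catch the collision described there but cannot prove that other spacings are safe. The function names are hypothetical and not part of cluster.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive each instance's gateway port (N) and browser
# port (N+3), then verify that no port is claimed twice.
# Assumption: ports are BASE_PORT + SPACING*i in the roster order below.

BASE_PORT=18800
SPACING=4
AGENTS=(manager tech-lead secretary auditor optimizer study-builder nx-expert webster)

# Print "agent gateway_port browser_port", one line per agent.
list_ports() {
  local i=0 agent port
  for agent in "${AGENTS[@]}"; do
    port=$((BASE_PORT + SPACING * i))
    echo "$agent $port $((port + 3))"
    i=$((i + 1))
  done
}

# Succeed only if every gateway and browser port is unique.
check_no_collisions() {
  local dupes
  dupes=$(list_ports | awk '{ print $2; print $3 }' | sort | uniq -d)
  if [ -n "$dupes" ]; then
    echo "port collision: $dupes" >&2
    return 1
  fi
}
```

With SPACING=3, tech-lead's gateway port becomes 18803, which is manager's browser port, and the check fails. That is the failure mode the warning guards against.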
4. Systemd Setup
Template Service
File: ~/.config/systemd/user/openclaw-atomizer@.service
[Unit]
Description=OpenClaw Atomizer - %i
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
Type=simple
ExecStart=/usr/bin/node /home/papa/.local/lib/node_modules/openclaw/dist/index.js gateway
Environment=PATH=/home/papa/.local/bin:/usr/local/bin:/usr/bin:/bin
Environment=HOME=/home/papa
Environment=OPENCLAW_STATE_DIR=/home/papa/atomizer/instances/%i
Environment=OPENCLAW_CONFIG_PATH=/home/papa/atomizer/instances/%i/openclaw.json
Environment=OPENCLAW_GATEWAY_TOKEN=31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1
EnvironmentFile=/home/papa/atomizer/instances/%i/env
EnvironmentFile=/home/papa/atomizer/config/.discord-tokens.env
Restart=always
RestartSec=5
[Install]
WantedBy=default.target
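systemd creates one unit per agent from this template. It substitutes the text after the @ in the unit name (e.g. manager in openclaw-atomizer@manager.service) wherever %i appears. The rough sketch below shows that substitution and is useful for checking which state directory and config path an instance will actually get. It handles only %i, not the other specifiers, and the function names are hypothetical. %i is the right specifier here because it yields the raw instance string. Under systemd's escaping rules, %I would unescape it and turn the hyphen in tech-lead into a slash.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate systemd's %i substitution for one
# instance of openclaw-atomizer@.service. Only %i is handled; other
# specifiers are left untouched.
# Reads the template on stdin and writes the rendered unit to stdout.
render_unit() {
  local instance=$1
  # Instance names here contain only [a-z-], so they are safe in a sed replacement.
  sed "s/%i/${instance}/g"
}

# Unit name systemd uses for a given instance.
unit_for() {
  printf 'openclaw-atomizer@%s.service\n' "$1"
}
```

For example, `render_unit tech-lead < ~/.config/systemd/user/openclaw-atomizer@.service | grep OPENCLAW_` previews the resolved environment. `systemctl --user enable --now "$(unit_for tech-lead)"` then starts the real instance.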
Cluster Management Script
File: ~/atomizer/cluster.sh
# Start all: bash cluster.sh start
# Stop all: bash cluster.sh stop
# Restart all: bash cluster.sh restart
# Status: bash cluster.sh status
# Logs: bash cluster.sh logs [agent-name]
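The script's internals aren't reproduced here. The sketch below shows one plausible shape for it and is not the contents of cluster.sh. It assumes each subcommand simply fans out to the template units above. The cluster function and the DRY_RUN switch are hypothetical. With DRY_RUN=1, the function prints each command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a cluster.sh-style dispatcher (not the real script).
# Assumes every agent runs as the user unit openclaw-atomizer@<agent>.service.

AGENTS=(manager tech-lead secretary auditor optimizer study-builder nx-expert webster)

unit_for() { printf 'openclaw-atomizer@%s.service\n' "$1"; }

cluster() {
  # With DRY_RUN set, prefix each command with "echo" instead of executing it.
  local run=${DRY_RUN:+echo}
  local cmd=${1:-status} agent
  local units=()
  for agent in "${AGENTS[@]}"; do
    units+=("$(unit_for "$agent")")
  done

  case "$cmd" in
    start|stop|restart|status)
      $run systemctl --user "$cmd" "${units[@]}"
      ;;
    logs)
      if [ -n "${2:-}" ]; then
        $run journalctl --user -u "$(unit_for "$2")" -f
      else
        # journalctl -u also accepts a glob pattern.
        $run journalctl --user -u 'openclaw-atomizer@*' -f
      fi
      ;;
    *)
      echo "usage: cluster.sh {start|stop|restart|status|logs [agent-name]}" >&2
      return 2
      ;;
  esac
}
```

After sourcing the file, `cluster restart` or `DRY_RUN=1 cluster logs webster` mirror the commands listed above.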
5. File System Layout
~/atomizer/
├── cluster.sh ← Cluster management script
├── config/
│ ├── .discord-tokens.env ← All 8 bot tokens (env vars)
│ └── atomizer-discord.env ← Legacy (can remove)
├── instances/ ← Per-agent OpenClaw state
│ ├── manager/
│ │ ├── openclaw.json ← Agent config (1 agent per instance)
│ │ ├── env ← Instance-specific env vars
│ │ └── agents/main/sessions/ ← Session data (auto-created)
│ ├── tech-lead/
│ ├── secretary/
│ ├── auditor/
│ ├── optimizer/
│ ├── study-builder/
│ ├── nx-expert/
│ └── webster/
├── workspaces/ ← Agent workspaces (SOUL, AGENTS, memory)
│ ├── manager/
│ │ ├── SOUL.md
│ │ ├── AGENTS.md
│ │ ├── MEMORY.md
│ │ └── memory/
│ ├── secretary/
│ ├── technical-lead/
│ ├── auditor/
│ ├── optimizer/
│ ├── study-builder/
│ ├── nx-expert/
│ ├── webster/
│ └── shared/ ← Shared context (CLUSTER.md, protocols)
└── tools/
└── nxopen-mcp/ ← NX Open MCP server (for CAD)
Key distinction: instances/ = OpenClaw runtime state (configs, sessions, SQLite). workspaces/ = agent personality and memory (SOUL.md, AGENTS.md, etc.).
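Runtime state and workspaces live in parallel trees, so a quick consistency check is handy after adding or renaming an agent. The sketch below is hypothetical and assumes the layout above. Note that the Tech Lead's instance directory is tech-lead while its workspace is technical-lead. It also assumes every workspace carries a SOUL.md. ATOMIZER_ROOT and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical layout check for ~/atomizer (not part of cluster.sh).
ATOMIZER_ROOT=${ATOMIZER_ROOT:-$HOME/atomizer}
AGENTS=(manager tech-lead secretary auditor optimizer study-builder nx-expert webster)

# Workspace directory for an agent; the Tech Lead's workspace uses a longer name.
workspace_for() {
  case "$1" in
    tech-lead) echo technical-lead ;;
    *) echo "$1" ;;
  esac
}

# Print each missing path (relative to ATOMIZER_ROOT); fail if any are missing.
check_layout() {
  local agent rel missing=0
  for agent in "${AGENTS[@]}"; do
    for rel in "instances/$agent/openclaw.json" \
               "instances/$agent/env" \
               "workspaces/$(workspace_for "$agent")/SOUL.md"; do
      if [ ! -e "$ATOMIZER_ROOT/$rel" ]; then
        echo "missing: $rel"
        missing=1
      fi
    done
  done
  return "$missing"
}
```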
6. Inter-Agent Communication
Delegation Skill (Primary Method)
Manager and Tech Lead use the delegate skill to assign tasks to other agents. The skill wraps the OpenClaw Hooks API with port mapping, auth, error handling, and logging.
Location: /home/papa/atomizer/workspaces/shared/skills/delegate/
Installed on: Manager, Tech Lead (symlinked from shared)
# Usage
bash /home/papa/atomizer/workspaces/shared/skills/delegate/delegate.sh <agent> "<instruction>" [options]
# Examples
delegate.sh webster "Find CTE of Zerodur Class 0 between 20-40°C"
delegate.sh nx-expert "Mesh the M2 mirror" --channel C0AEJV13TEU --deliver
delegate.sh auditor "Review thermal analysis" --no-deliver
How it works:
- Looks up the target agent's port from hardcoded port map
- Checks if the target is running
- POSTs to http://127.0.0.1:PORT/hooks/agent with the auth token
- Target agent processes the task asynchronously in an isolated session
- Response is delivered to Discord if --deliver is set
Options: --channel <id>, --deliver (default), --no-deliver
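The steps above can be sketched as follows. This is a hypothetical reconstruction, not the contents of delegate.sh. It assumes curl and jq are installed and that the gateway token is exported as OPENCLAW_GATEWAY_TOKEN, the same variable the systemd template sets. The liveness probe treats any HTTP response on the gateway port as "running", which is an assumption. --channel is omitted because its mapping onto the hooks payload isn't documented here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the delegate flow (not the real delegate.sh).
# Assumes: bash 4+ (associative arrays), curl, jq, and OPENCLAW_GATEWAY_TOKEN.

declare -A PORTS=(
  [manager]=18800 [tech-lead]=18804 [secretary]=18808 [auditor]=18812
  [optimizer]=18816 [study-builder]=18820 [nx-expert]=18824 [webster]=18828
)

# Step 1: map an agent name to its gateway port; fail for unknown names.
port_for() {
  [ -n "${1:-}" ] && [ -n "${PORTS[$1]:-}" ] || return 1
  echo "${PORTS[$1]}"
}

# Request body; jq --arg takes care of JSON string escaping.
build_payload() {
  local message=$1 deliver=${2:-true}
  jq -cn --arg message "$message" --argjson deliver "$deliver" \
    '{message: $message, deliver: $deliver, channel: "discord"}'
}

delegate() {
  local agent=$1 message=$2 deliver=${3:-true} port
  port=$(port_for "$agent") || { echo "unknown agent: $agent" >&2; return 2; }
  [ -n "${OPENCLAW_GATEWAY_TOKEN:-}" ] || { echo "OPENCLAW_GATEWAY_TOKEN is not set" >&2; return 4; }

  # Step 2: is the target up? curl exits non-zero if nothing is listening.
  if ! curl -s -o /dev/null --max-time 2 "http://127.0.0.1:$port/"; then
    echo "$agent is not responding on port $port" >&2
    return 3
  fi

  # Step 3: POST the task; the target handles it asynchronously.
  build_payload "$message" "$deliver" |
    curl -s -X POST "http://127.0.0.1:$port/hooks/agent" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
      --data-binary @-
}
```

`delegate webster "Find CTE of Zerodur Class 0 between 20-40°C"` corresponds to the first example above. Passing false as the third argument plays the role of --no-deliver.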
Delegation Authority
| Agent | Can Delegate To |
|---|---|
| Manager | All agents |
| Tech Lead | All agents except Manager |
| All others | Cannot delegate — request via Manager or Tech Lead |
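The authority table reduces to a small predicate that a wrapper could check before sending anything. The sketch below is hypothetical and encodes the table as written. The hierarchical sub-delegation added by the orchestration engine (see the 2026-02-15 entry in the history) is not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical guard encoding the Delegation Authority table.
# Returns 0 if FROM may delegate to TO, 1 otherwise.
can_delegate() {
  local from=$1 to=$2
  case "$from" in
    manager)   return 0 ;;                 # all agents
    tech-lead) [ "$to" != "manager" ] ;;   # all agents except Manager
    *)         return 1 ;;                 # must go through Manager or Tech Lead
  esac
}
```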
Hooks Protocol
All agents follow /home/papa/atomizer/workspaces/shared/HOOKS-PROTOCOL.md:
- Hook messages = high-priority assignments, processed before other work
- After completing tasks, agents append status to shared/project_log.md
- Only the Manager updates shared/PROJECT_STATUS.md (gatekeeper pattern)
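On the writing side, the gatekeeper pattern comes down to two rules: any agent may append to project_log.md, and only the Manager rewrites PROJECT_STATUS.md. The sketch below implements both and is hypothetical. It assumes flock (from util-linux) is available to serialize appends from concurrent agents. The log-line format and function names are illustrative, since HOOKS-PROTOCOL.md's exact format isn't reproduced here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the gatekeeper pattern.
SHARED_DIR=${SHARED_DIR:-/home/papa/atomizer/workspaces/shared}

# Any agent: append one status line. flock serializes concurrent writers.
log_status() {
  local agent=$1 status=$2
  (
    flock -x 9
    printf -- '- %s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$agent" "$status" >&9
  ) 9>>"$SHARED_DIR/project_log.md"
}

# Manager only: replace PROJECT_STATUS.md with the content on stdin.
update_status() {
  local agent=$1 tmp
  if [ "$agent" != "manager" ]; then
    echo "$agent may not edit PROJECT_STATUS.md; use log_status instead" >&2
    return 1
  fi
  # Write to a temp file, then rename, so readers never see a half-written file.
  tmp=$(mktemp "$SHARED_DIR/.PROJECT_STATUS.XXXXXX") || return 1
  cat > "$tmp" && mv "$tmp" "$SHARED_DIR/PROJECT_STATUS.md"
}
```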
Raw Hooks API (Reference)
The delegate skill wraps this, but for reference:
curl -s -X POST http://127.0.0.1:PORT/hooks/agent \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1" \
-d '{"message": "your request here", "deliver": true, "channel": "discord"}'
sessions_send / sessions_spawn
Agents configured with agentToAgent.enabled: true can use OpenClaw's built-in sessions_send and sessions_spawn tools to communicate within the same instance. Cross-instance communication requires the hooks API / delegate skill.
7. Current Status
✅ Working
- All 8 instances running as systemd services (auto-start on boot)
- Each agent has its own Discord bot identity (name, avatar, presence)
- Native Discord features: streaming, typing indicators, message chunking
- Agent workspaces with SOUL.md, AGENTS.md, MEMORY.md
- Hooks API enabled on all instances (Google Gemini + Anthropic auth configured)
- Delegation skill deployed: via delegate.sh, Manager can delegate tasks to any agent and Tech Lead to any agent except Manager
- Hooks protocol: all agents know how to receive and prioritize delegated tasks
- Gatekeeper pattern: Manager owns PROJECT_STATUS.md; others append to project_log.md
- Cluster management via cluster.sh
- Estimated total RAM: ~4.2 GB for 8 instances
❌ Known Issues
- DELEGATE syntax is fake → ✅ RESOLVED (2026-02-14): Replaced with the delegate.sh skill using the hooks API
- Discord "Ambiguous recipient" bug (2026-02-15): The OpenClaw Discord plugin requires a user: or channel: prefix on message targets. When a heartbeat tries to reply to a session that originated from a Discord DM, it uses the bare user ID, so delivery fails. Workaround: heartbeat disabled on Manager and Webster. Other agents are unaffected (their sessions don't originate from Discord DMs). A proper fix requires an OpenClaw patch that auto-infers user: for known user IDs.
- Codex OAuth expired (2026-02-15): refresh_token_reused error, caused by multiple instances racing to refresh the same shared Codex token. Secretary, Auditor, and Study-Builder switched to Gemini 2.5 Pro. To restore Codex: Antoine must re-run codex login via SSH tunnel, then run ~/atomizer/scripts/sync-codex-tokens.sh.
- No automated orchestration layer: Manager delegates manually (but now has proper tooling for it: orchestrate.sh and the workflow engine)
- 5 agents not yet created: Post-Processor, Reporter, Developer, Knowledge Base, IT (from the original 13-agent plan)
- Windows execution bridge (atomizer_job_watcher.py): exists but not connected end-to-end
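The proposed fix for the ambiguous-recipient bug amounts to normalizing bare IDs before delivery. The sketch below illustrates that inference and is hypothetical. KNOWN_USER_IDS and the sample ID are placeholders, and the real patch would live inside OpenClaw's Discord plugin rather than in shell.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of target normalization for Discord delivery.
# Placeholder ID; a real list would come from the instance's configuration.
KNOWN_USER_IDS=(123456789012345678)

# Print a prefixed target ("user:<id>" or "channel:<id>") or fail if ambiguous.
normalize_target() {
  local target=$1 id
  case "$target" in
    user:*|channel:*) echo "$target"; return 0 ;;   # already unambiguous
  esac
  for id in "${KNOWN_USER_IDS[@]}"; do
    if [ "$id" = "$target" ]; then
      echo "user:$target"   # bare ID of a known user, e.g. from a DM session
      return 0
    fi
  done
  echo "ambiguous recipient: $target" >&2
  return 1
}
```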
8. Evolution History
| Date | Phase | What Changed |
|---|---|---|
| 2026-02-07 | Phase 0 | Vision doc created, 13-agent plan designed |
| 2026-02-08 | Phase 0 | Single gateway (port 18790) running on Slack |
| 2026-02-13 | Discord Migration | Discord server created, 8 bot tokens obtained |
| 2026-02-14 (AM) | Bridge Attempt | discord-bridge.js built — worked but fragile (no streaming, polled session files) |
| 2026-02-14 (PM) | Multi-Instance Cluster | Pivoted to 8 independent OpenClaw instances. Bridge killed. Native Discord restored. |
| 2026-02-14 (PM) | Delegation System | Built delegate.sh skill, hooks protocol, gatekeeper pattern. Fake DELEGATE syntax replaced with real hooks API calls. Google Gemini auth added to all instances. |
| 2026-02-15 | Orchestration Engine | Phases 1-3 complete: synchronous delegation (orchestrate.py), smart routing (capability registry), hierarchical delegation (Tech-Lead + Optimizer can sub-delegate), YAML workflow engine with parallel execution + approval gates. See 10-ORCHESTRATION-ENGINE-PLAN.md. |
| 2026-02-15 | Stability Fixes | Discord heartbeat delivery bug identified (ambiguous recipient). Codex OAuth token expired (refresh_token_reused). Heartbeat disabled on Manager + Webster. Secretary/Auditor/Study-Builder switched from Codex to Gemini 2.5 Pro. HEARTBEAT.md created for all agents. |
Created: 2026-02-14 by Mario. This is the "as-built" document, updated as the implementation evolves.