# 🔧 08 — System Implementation Status

> How the multi-agent system actually works right now, as built.
> Last updated: 2026-02-15

---

## 1. Architecture Overview

**Multi-Instance Cluster:** 8 independent OpenClaw gateway processes, one per agent. Each has its own systemd service, Discord bot token, port, and state directory.

```
┌──────────────────────────────────────────────────────────────────┐
│  T420 (clawdbot)                                                 │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  OpenClaw Gateway - Mario (main instance)                  │  │
│  │  Port 18789 │ Slack: Antoine's personal workspace          │  │
│  │  State: ~/.openclaw/                                       │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌───────────────────── Atomizer Cluster ─────────────────────┐  │
│  │                                                            │  │
│  │   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   │  │
│  │   │ Manager      │   │ Tech Lead    │   │ Secretary    │   │  │
│  │   │ :18800       │   │ :18804       │   │ :18808       │   │  │
│  │   │ Opus 4.6     │   │ Opus 4.6     │   │ Gemini 2.5   │   │  │
│  │   └──────┬───────┘   └──────┬───────┘   └──────┬───────┘   │  │
│  │          │                  │                  │           │  │
│  │   ┌──────┴───────┐   ┌──────┴───────┐   ┌──────┴───────┐   │  │
│  │   │ Auditor      │   │ Optimizer    │   │ Study Builder│   │  │
│  │   │ :18812       │   │ :18816       │   │ :18820       │   │  │
│  │   │ Gemini 2.5   │   │ Sonnet 4.5   │   │ Gemini 2.5   │   │  │
│  │   └──────────────┘   └──────────────┘   └──────────────┘   │  │
│  │                                                            │  │
│  │   ┌──────────────┐   ┌──────────────┐                      │  │
│  │   │ NX Expert    │   │ Webster      │                      │  │
│  │   │ :18824       │   │ :18828       │                      │  │
│  │   │ Sonnet 4.5   │   │ Gemini 2.5   │                      │  │
│  │   └──────────────┘   └──────────────┘                      │  │
│  │                                                            │  │
│  │  Inter-agent: hooks API (curl between ports)               │  │
│  │  Shared token: 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd…   │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────┐
│  Discord: Atomizer-HQ Server                                     │
│  Guild: 1471858733452890132                                      │
│                                                                  │
│  📋 COMMAND: #ceo-office, #announcements, #daily-standup         │
│  🔧 ENGINEERING: #technical, #code-review, #fea-analysis, #nx    │
│  📊 OPERATIONS: #task-board, #meeting-notes, #reports            │
│  🔬 RESEARCH: #literature, #materials-data                       │
│  🏗️ PROJECTS: #active-projects                                   │
│  📚 KNOWLEDGE: #knowledge-base, #lessons-learned                 │
│  🤖 SYSTEM: #agent-logs, #inter-agent, #it-ops                   │
│                                                                  │
│  Each agent = its own Discord bot with unique name & avatar      │
└──────────────────────────────────────────────────────────────────┘
```

---

## 2. Why Multi-Instance (Not Single Gateway)

OpenClaw's native Discord provider (`@buape/carbon`) has a race condition when multiple bot tokens connect from one process. Since we need 8 separate bot accounts, we run 8 separate processes. Each process handles exactly one token, which bypasses the bug entirely.

**Advantages over the previous bridge approach:**
- Native Discord streaming, threads, reactions, attachments
- Fault isolation: one agent crashing doesn't take down the others
- No middleware polling session files on disk
- Each agent appears as its own Discord user with independent presence

---

## 3. Port Map

| Agent | Port | Model | Notes |
|-------|------|-------|-------|
| Manager | 18800 | Opus 4.6 | Orchestrates, delegates. Heartbeat disabled (Discord delivery bug) |
| Tech Lead | 18804 | Opus 4.6 | Technical authority |
| Secretary | 18808 | Gemini 2.5 Pro | Task tracking, notes. Changed from Codex 2026-02-15 (OAuth expired) |
| Auditor | 18812 | Gemini 2.5 Pro | Quality review. Changed from Codex 2026-02-15 (OAuth expired) |
| Optimizer | 18816 | Sonnet 4.5 | Optimization work |
| Study Builder | 18820 | Gemini 2.5 Pro | Study setup. Changed from Codex 2026-02-15 (OAuth expired) |
| NX Expert | 18824 | Sonnet 4.5 | CAD/NX work |
| Webster | 18828 | Gemini 2.5 Pro | Research. Heartbeat disabled (Discord delivery bug) |

> **⚠️ Port spacing = 4.** OpenClaw uses port N and N+3 (browser service), so base ports must be at least 4 apart. Never assign adjacent ports.

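
The spacing rule can be checked mechanically. The following is a minimal sketch and not part of the cluster tooling: the function names and the `BASE_PORT` / `SPACING` variables are illustrative, and it assumes only what the note states (each instance uses its base port and base port + 3).

```bash
#!/usr/bin/env bash
# Illustrative helpers for the port map above (all names are hypothetical).
BASE_PORT=18800   # Manager's port in the table
SPACING=4         # base ports are spaced 4 apart
AGENTS=(manager tech-lead secretary auditor optimizer study-builder nx-expert webster)

# Gateway port for the agent at zero-based index $1.
gateway_port() { echo $(( BASE_PORT + SPACING * $1 )); }

# Browser-service port: gateway port + 3, per the note above.
browser_port() { echo $(( $(gateway_port "$1") + 3 )); }

# Succeeds (exit 0) when two base ports are closer together than SPACING.
ports_too_close() {
  local diff=$(( $1 - $2 ))
  (( diff < 0 )) && diff=$(( -diff ))
  (( diff < SPACING ))
}

# Print one line per agent: name, gateway port, browser port.
print_port_map() {
  local i
  for i in "${!AGENTS[@]}"; do
    printf '%-14s gateway=%d browser=%d\n' \
      "${AGENTS[$i]}" "$(gateway_port "$i")" "$(browser_port "$i")"
  done
}

# Example: print_port_map
# Example: ports_too_close 18800 18802 && echo "conflict"
```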

---

## 4. Systemd Setup

### Template Service
File: `~/.config/systemd/user/openclaw-atomizer@.service`

```ini
[Unit]
# %i expands to the instance name after "@", e.g. "manager"
Description=OpenClaw Atomizer - %i
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/node /home/papa/.local/lib/node_modules/openclaw/dist/index.js gateway
Environment=PATH=/home/papa/.local/bin:/usr/local/bin:/usr/bin:/bin
Environment=HOME=/home/papa
# Per-instance state and config, selected by %i
Environment=OPENCLAW_STATE_DIR=/home/papa/atomizer/instances/%i
Environment=OPENCLAW_CONFIG_PATH=/home/papa/atomizer/instances/%i/openclaw.json
# Shared token used for inter-agent hooks calls
Environment=OPENCLAW_GATEWAY_TOKEN=31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1
EnvironmentFile=/home/papa/atomizer/instances/%i/env
EnvironmentFile=/home/papa/atomizer/config/.discord-tokens.env
Restart=always
RestartSec=5
# Note: systemd documents the two StartLimit* settings under [Unit];
# under [Service], StartLimitIntervalSec= is ignored with a warning.
StartLimitIntervalSec=60
StartLimitBurst=5

[Install]
WantedBy=default.target
```
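
Because the unit is a template, `%i` is replaced by whatever follows the `@` when an instance is started, so `openclaw-atomizer@manager.service` reads its state from `instances/manager`. Both `EnvironmentFile=` paths are written without a leading `-`, which means systemd refuses to start an instance if either file is missing. A small pre-flight check can catch that earlier. The sketch below is hypothetical: the function names and the `ATOMIZER_ROOT` override are not part of the cluster tooling, and the paths are taken from the unit file above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for one instance of the template unit.
# ATOMIZER_ROOT is an assumed override; the unit hardcodes /home/papa/atomizer.
ATOMIZER_ROOT="${ATOMIZER_ROOT:-/home/papa/atomizer}"

# Unit name systemd derives from the template for agent $1.
unit_name() { printf 'openclaw-atomizer@%s.service\n' "$1"; }

# Print every file the unit needs that is missing for agent $1.
# Returns 0 when nothing is missing, 1 otherwise.
missing_files() {
  local agent=$1 f status=0
  for f in \
    "$ATOMIZER_ROOT/instances/$agent/openclaw.json" \
    "$ATOMIZER_ROOT/instances/$agent/env" \
    "$ATOMIZER_ROOT/config/.discord-tokens.env"; do
    if [[ ! -f $f ]]; then
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}

# Example (not executed here):
#   missing_files manager && systemctl --user start "$(unit_name manager)"
```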

### Cluster Management Script
File: `~/atomizer/cluster.sh`

```bash
# Start all:    bash cluster.sh start
# Stop all:     bash cluster.sh stop
# Restart all:  bash cluster.sh restart
# Status:       bash cluster.sh status
# Logs:         bash cluster.sh logs [agent-name]
```
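
The script itself is not reproduced here. As a rough idea of what such a wrapper involves, the sketch below maps each subcommand onto `systemctl --user` and `journalctl --user` calls for the template unit. It is an illustration under assumptions, not the contents of `cluster.sh`: the function names, the `DRY_RUN` switch, and the choice to follow all instances when no agent name is given are invented for the example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a cluster wrapper; NOT the real cluster.sh.
AGENTS=(manager tech-lead secretary auditor optimizer study-builder nx-expert webster)

unit_name() { printf 'openclaw-atomizer@%s.service' "$1"; }

# Run a command, or only print it when DRY_RUN is set (invented switch).
run() {
  if [[ -n ${DRY_RUN:-} ]]; then echo "$*"; else "$@"; fi
}

cluster() {
  local action=${1:-status} agent
  case $action in
    start|stop|restart)
      for agent in "${AGENTS[@]}"; do
        run systemctl --user "$action" "$(unit_name "$agent")"
      done ;;
    status)
      for agent in "${AGENTS[@]}"; do
        # is-active prints "active", "inactive", "failed", ...
        printf '%-14s %s\n' "$agent" \
          "$(run systemctl --user is-active "$(unit_name "$agent")")"
      done ;;
    logs)
      # Follow one agent's journal, or all instances via a unit glob.
      run journalctl --user -u "$(unit_name "${2:-*}")" -f ;;
    *)
      echo "usage: cluster {start|stop|restart|status|logs [agent]}" >&2
      return 2 ;;
  esac
}

# Example: DRY_RUN=1 cluster restart
```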

---

## 5. File System Layout

```
~/atomizer/
├── cluster.sh                    ← Cluster management script
├── config/
│   ├── .discord-tokens.env       ← All 8 bot tokens (env vars)
│   └── atomizer-discord.env      ← Legacy (can remove)
├── instances/                    ← Per-agent OpenClaw state
│   ├── manager/
│   │   ├── openclaw.json         ← Agent config (1 agent per instance)
│   │   ├── env                   ← Instance-specific env vars
│   │   └── agents/main/sessions/ ← Session data (auto-created)
│   ├── tech-lead/
│   ├── secretary/
│   ├── auditor/
│   ├── optimizer/
│   ├── study-builder/
│   ├── nx-expert/
│   └── webster/
├── workspaces/                   ← Agent workspaces (SOUL, AGENTS, memory)
│   ├── manager/
│   │   ├── SOUL.md
│   │   ├── AGENTS.md
│   │   ├── MEMORY.md
│   │   └── memory/
│   ├── secretary/
│   ├── technical-lead/
│   ├── auditor/
│   ├── optimizer/
│   ├── study-builder/
│   ├── nx-expert/
│   ├── webster/
│   └── shared/                   ← Shared context (CLUSTER.md, protocols)
└── tools/
    └── nxopen-mcp/               ← NX Open MCP server (for CAD)
```

**Key distinction:** `instances/` = OpenClaw runtime state (configs, sessions, SQLite). `workspaces/` = agent personality and memory (SOUL.md, AGENTS.md, etc.).

---

## 6. Inter-Agent Communication

### Delegation Skill (Primary Method)
Manager and Tech Lead use the `delegate` skill to assign tasks to other agents. The skill wraps the OpenClaw Hooks API with port mapping, auth, error handling, and logging.

**Location:** `/home/papa/atomizer/workspaces/shared/skills/delegate/`
**Installed on:** Manager, Tech Lead (symlinked from shared)

```bash
# Usage
bash /home/papa/atomizer/workspaces/shared/skills/delegate/delegate.sh <agent> "<instruction>" [options]

# Examples
delegate.sh webster "Find CTE of Zerodur Class 0 between 20-40°C"
delegate.sh nx-expert "Mesh the M2 mirror" --channel C0AEJV13TEU --deliver
delegate.sh auditor "Review thermal analysis" --no-deliver
```

**How it works:**
1. Looks up the target agent's port in a hardcoded port map
2. Checks whether the target is running
3. POSTs to `http://127.0.0.1:PORT/hooks/agent` with the auth token
4. The target agent processes the task asynchronously in an isolated session
5. The response is delivered to Discord if `--deliver` is set

**Options:** `--channel <id>`, `--deliver` (default), `--no-deliver`
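
Steps 1 and 2 can be sketched as follows. This is not the contents of `delegate.sh`; it is a minimal illustration that takes the ports from the Port Map table and assumes that a plain TCP connection test is an acceptable way to decide whether an instance is running (the real script may check something else). All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the lookup and liveness steps; names are illustrative.
declare -A AGENT_PORTS=(
  [manager]=18800 [tech-lead]=18804 [secretary]=18808 [auditor]=18812
  [optimizer]=18816 [study-builder]=18820 [nx-expert]=18824 [webster]=18828
)

# Step 1: print the port for agent $1, or fail if the agent is unknown.
resolve_port() {
  local port=${AGENT_PORTS[$1]:-}
  [[ -n $port ]] || { echo "unknown agent: $1" >&2; return 1; }
  echo "$port"
}

# Step 2 (assumed method): succeed if something accepts TCP connections
# on 127.0.0.1:$1. Uses bash's /dev/tcp pseudo-device in a subshell.
is_listening() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Hooks endpoint for agent $1, as used in step 3.
hook_url() {
  local port
  port=$(resolve_port "$1") || return 1
  echo "http://127.0.0.1:${port}/hooks/agent"
}
```

A caller could then guard the POST with something like `is_listening "$(resolve_port webster)" || echo "webster is not running"`.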

### Delegation Authority
| Agent | Can Delegate To |
|-------|----------------|
| Manager | All agents |
| Tech Lead | All agents except Manager |
| All others | Cannot delegate; must request via Manager or Tech Lead |

### Hooks Protocol
All agents follow `/home/papa/atomizer/workspaces/shared/HOOKS-PROTOCOL.md`:
- Hook messages = **high-priority assignments**, processed before other work
- After completing tasks, agents **append** status to `shared/project_log.md`
- Only the Manager updates `shared/PROJECT_STATUS.md` (gatekeeper pattern)
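
The append-only rule for `project_log.md` could look like the sketch below. The log line format and the function name are assumptions made for illustration; `HOOKS-PROTOCOL.md` remains the authority on what agents actually write.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append one status line to the shared project log.
# Usage: log_status <log-file> <agent> <message>
log_status() {
  local log=$1 agent=$2 msg=$3 stamp
  stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  # ">>" opens the file in append mode, so each call adds a line at the
  # end and never rewrites what other agents have already logged.
  printf -- '- %s [%s] %s\n' "$stamp" "$agent" "$msg" >>"$log"
}
```

For example, `log_status ~/atomizer/workspaces/shared/project_log.md webster "CTE lookup done"` would add one timestamped line, leaving `PROJECT_STATUS.md` for the Manager to update.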

### Raw Hooks API (Reference)
The delegate skill wraps this, but for reference:
```bash
curl -s -X POST http://127.0.0.1:PORT/hooks/agent \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1" \
  -d '{"message": "your request here", "deliver": true, "channel": "discord"}'
```
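
When the message comes from a variable rather than a literal, it has to be JSON-escaped before it goes into the `-d` body. The sketch below shows one way to do that in plain bash and to read the token from `OPENCLAW_GATEWAY_TOKEN` (the variable name used in the systemd unit) instead of pasting it on the command line. The function names are illustrative, and the payload uses only the three fields shown in the example above; how `--channel <id>` maps onto the payload is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the raw hooks call; names are illustrative.

# Escape backslashes, double quotes, and common control characters for JSON.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  printf '%s' "$s"
}

# Build the request body. $1 = message, $2 = deliver flag (true|false).
build_payload() {
  printf '{"message": "%s", "deliver": %s, "channel": "discord"}' \
    "$(json_escape "$1")" "${2:-true}"
}

# POST the payload to the instance on port $1.
# Aborts with an error if OPENCLAW_GATEWAY_TOKEN is unset or empty.
post_hook() {
  local port=$1 message=$2 deliver=${3:-true}
  curl -s -X POST "http://127.0.0.1:${port}/hooks/agent" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${OPENCLAW_GATEWAY_TOKEN:?token not set}" \
    -d "$(build_payload "$message" "$deliver")"
}
```

With the token exported, a call such as `post_hook 18812 'Review "thermal" analysis' false` sends a correctly quoted body to the Auditor without delivering the reply to Discord.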

### sessions_send / sessions_spawn
Agents configured with `agentToAgent.enabled: true` can use OpenClaw's built-in `sessions_send` and `sessions_spawn` tools to communicate within the same instance. Cross-instance communication requires the hooks API / delegate skill.

---

## 7. Current Status

### ✅ Working
- All 8 instances running as systemd services (auto-start on boot)
- Each agent has its own Discord bot identity (name, avatar, presence)
- Native Discord features: streaming, typing indicators, message chunking
- Agent workspaces with SOUL.md, AGENTS.md, MEMORY.md
- Hooks API enabled on all instances (Google Gemini + Anthropic auth configured)
- **Delegation skill deployed:** Manager and Tech Lead can delegate tasks to other agents via `delegate.sh`
- **Hooks protocol:** all agents know how to receive and prioritize delegated tasks
- **Gatekeeper pattern:** Manager owns PROJECT_STATUS.md; others append to project_log.md
- Cluster management via `cluster.sh`
- Estimated total RAM: ~4.2 GB for 8 instances

### ❌ Known Issues
- ~~**DELEGATE syntax is fake**~~ → ✅ RESOLVED (2026-02-14): Replaced with `delegate.sh` skill using hooks API
- **Discord "Ambiguous recipient" bug** (2026-02-15): The OpenClaw Discord plugin requires a `user:` or `channel:` prefix on message targets. When a heartbeat tries to reply to a session that originated from a Discord DM, it uses the bare user ID and delivery fails. **Workaround:** Heartbeat disabled on Manager + Webster. Other agents are unaffected because their sessions don't originate from Discord DMs. A proper fix requires an OpenClaw patch that auto-infers `user:` for known user IDs.
- **Codex OAuth expired** (2026-02-15): `refresh_token_reused` error, caused by multiple instances racing to refresh the same shared Codex token. Secretary, Auditor, and Study Builder were switched to Gemini 2.5 Pro. To restore Codex, Antoine must re-run `codex login` via SSH tunnel, then run `~/atomizer/scripts/sync-codex-tokens.sh`.
- **No automated orchestration layer:** Manager delegates manually, though it now has proper tooling to do so (orchestrate.sh, workflow engine)
- **5 agents not yet created:** Post-Processor, Reporter, Developer, Knowledge Base, IT (from the original 13-agent plan)
- **Windows execution bridge** (`atomizer_job_watcher.py`): exists but is not connected end-to-end

---

## 8. Evolution History

| Date | Phase | What Changed |
|------|-------|-------------|
| 2026-02-07 | Phase 0 | Vision doc created, 13-agent plan designed |
| 2026-02-08 | Phase 0 | Single gateway (port 18790) running on Slack |
| 2026-02-13 | Discord Migration | Discord server created, 8 bot tokens obtained |
| 2026-02-14 (AM) | Bridge Attempt | discord-bridge.js built; worked but fragile (no streaming, polled session files) |
| 2026-02-14 (PM) | **Multi-Instance Cluster** | Pivoted to 8 independent OpenClaw instances. Bridge killed. Native Discord restored. |
| 2026-02-14 (PM) | **Delegation System** | Built `delegate.sh` skill, hooks protocol, gatekeeper pattern. Fake DELEGATE syntax replaced with real hooks API calls. Google Gemini auth added to all instances. |
| 2026-02-15 | **Orchestration Engine** | Phases 1-3 complete: synchronous delegation (`orchestrate.py`), smart routing (capability registry), hierarchical delegation (Tech-Lead + Optimizer can sub-delegate), YAML workflow engine with parallel execution + approval gates. See `10-ORCHESTRATION-ENGINE-PLAN.md`. |
| 2026-02-15 | **Stability Fixes** | Discord heartbeat delivery bug identified (ambiguous recipient). Codex OAuth token expired (refresh_token_reused). Heartbeat disabled on Manager + Webster. Secretary/Auditor/Study-Builder switched from Codex to Gemini 2.5 Pro. HEARTBEAT.md created for all agents. |

---

*Created: 2026-02-14 by Mario*
*This is the "as-built" document, updated as implementation evolves.*