Atomizer/docs/hq/08-SYSTEM-IMPLEMENTATION-STATUS.md

🔧 08 — System Implementation Status

How the multi-agent system actually works right now, as built.

Last updated: 2026-02-15


1. Architecture Overview

Multi-Instance Cluster: 8 independent OpenClaw gateway processes, one per agent. Each has its own systemd service, Discord bot token, port, and state directory.

┌──────────────────────────────────────────────────────────────────┐
│                        T420 (clawdbot)                           │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  OpenClaw Gateway — Mario (main instance)                  │  │
│  │  Port 18789 │ Slack: Antoine's personal workspace          │  │
│  │  State: ~/.openclaw/                                       │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌──────────────── Atomizer Cluster ────────────────────────┐   │
│  │                                                           │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │   │
│  │  │  Manager     │  │  Tech Lead  │  │  Secretary   │      │   │
│  │  │  :18800      │  │  :18804     │  │  :18808      │      │   │
│  │  │  Opus 4.6    │  │  Opus 4.6   │  │  Gemini 2.5  │      │   │
│  │  └──────┬───────┘  └──────┬──────┘  └──────┬───────┘      │   │
│  │         │                 │                 │              │   │
│  │  ┌──────┴───────┐  ┌─────┴──────┐  ┌──────┴───────┐      │   │
│  │  │  Auditor     │  │  Optimizer  │  │ Study Builder│      │   │
│  │  │  :18812      │  │  :18816     │  │  :18820      │      │   │
│  │  │  Gemini 2.5  │  │  Sonnet 4.5 │  │  Gemini 2.5  │      │   │
│  │  └──────────────┘  └────────────┘  └──────────────┘      │   │
│  │                                                           │   │
│  │  ┌─────────────┐  ┌─────────────┐                        │   │
│  │  │  NX Expert   │  │  Webster    │                        │   │
│  │  │  :18824      │  │  :18828     │                        │   │
│  │  │  Sonnet 4.5  │  │  Gemini 2.5 │                        │   │
│  │  └─────────────┘  └─────────────┘                        │   │
│  │                                                           │   │
│  │  Inter-agent: hooks API (curl between ports)              │   │
│  │  Shared token: 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd…  │   │
│  └───────────────────────────────────────────────────────────┘   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────────┐
│                 Discord: Atomizer-HQ Server                      │
│                 Guild: 1471858733452890132                        │
│                                                                  │
│  📋 COMMAND: #ceo-office, #announcements, #daily-standup         │
│  🔧 ENGINEERING: #technical, #code-review, #fea-analysis, #nx   │
│  📊 OPERATIONS: #task-board, #meeting-notes, #reports            │
│  🔬 RESEARCH: #literature, #materials-data                       │
│  🏗️ PROJECTS: #active-projects                                  │
│  📚 KNOWLEDGE: #knowledge-base, #lessons-learned                 │
│  🤖 SYSTEM: #agent-logs, #inter-agent, #it-ops                  │
│                                                                  │
│  Each agent = its own Discord bot with unique name & avatar      │
└──────────────────────────────────────────────────────────────────┘

2. Why Multi-Instance (Not Single Gateway)

OpenClaw's native Discord provider (@buape/carbon) has a race condition bug when multiple bot tokens connect from one process. Since we need 8 separate bot accounts, we run 8 separate processes — each handles exactly one token, bypassing the bug entirely.

Advantages over previous bridge approach:

  • Native Discord streaming, threads, reactions, attachments
  • Fault isolation — one agent crashing doesn't take down the others
  • No middleware polling session files on disk
  • Each agent appears as its own Discord user with independent presence

3. Port Map

| Agent | Port | Model | Notes |
|---|---|---|---|
| Manager | 18800 | Opus 4.6 | Orchestrates, delegates. Heartbeat disabled (Discord delivery bug) |
| Tech Lead | 18804 | Opus 4.6 | Technical authority |
| Secretary | 18808 | Gemini 2.5 Pro | Task tracking, notes. Changed from Codex 2026-02-15 (OAuth expired) |
| Auditor | 18812 | Gemini 2.5 Pro | Quality review. Changed from Codex 2026-02-15 (OAuth expired) |
| Optimizer | 18816 | Sonnet 4.5 | Optimization work |
| Study Builder | 18820 | Gemini 2.5 Pro | Study setup. Changed from Codex 2026-02-15 (OAuth expired) |
| NX Expert | 18824 | Sonnet 4.5 | CAD/NX work |
| Webster | 18828 | Gemini 2.5 Pro | Research. Heartbeat disabled (Discord delivery bug) |

⚠️ Port spacing = 4. Each OpenClaw instance uses port N and port N+3 (browser service), so ports are allocated in steps of 4. Never assign adjacent ports.
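The port rule can be checked mechanically. The sketch below is only an illustration: `ports_are_safe` is a hypothetical helper, not part of OpenClaw or cluster.sh. It expands every gateway port N into the pair N and N+3 and reports any port that would be claimed twice.

```shell
# Hypothetical helper: succeeds only if no port is claimed twice,
# counting both the gateway port N and the browser-service port N+3.
ports_are_safe() {
  dupes=$(for p in "$@"; do
            echo "$p"            # gateway port
            echo $((p + 3))      # browser service port
          done | sort -n | uniq -d)
  if [ -n "$dupes" ]; then
    echo "port conflict on: $dupes" >&2
    return 1
  fi
}

# Usage: the eight ports from the port map pass the check.
# ports_are_safe 18800 18804 18808 18812 18816 18820 18824 18828
```

A pair such as 18800 and 18803 fails the check, because the second gateway port lands on the first instance's browser-service port.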


4. Systemd Setup

Template Service

File: ~/.config/systemd/user/openclaw-atomizer@.service

[Unit]
Description=OpenClaw Atomizer - %i
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/node /home/papa/.local/lib/node_modules/openclaw/dist/index.js gateway
Environment=PATH=/home/papa/.local/bin:/usr/local/bin:/usr/bin:/bin
Environment=HOME=/home/papa
Environment=OPENCLAW_STATE_DIR=/home/papa/atomizer/instances/%i
Environment=OPENCLAW_CONFIG_PATH=/home/papa/atomizer/instances/%i/openclaw.json
Environment=OPENCLAW_GATEWAY_TOKEN=31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1
EnvironmentFile=/home/papa/atomizer/instances/%i/env
EnvironmentFile=/home/papa/atomizer/config/.discord-tokens.env
Restart=always
RestartSec=5
StartLimitIntervalSec=60
StartLimitBurst=5

[Install]
WantedBy=default.target
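Both `EnvironmentFile=` lines are written without the optional `-` prefix, so systemd refuses to start an instance when either file is missing. A small preflight check can catch that before starting a unit. This is a sketch under assumptions: `check_instance` is a hypothetical helper, and it only checks the per-instance files named in the template.

```shell
# Hypothetical preflight: verify the per-instance files the unit template expects.
# $1 = atomizer root (e.g. /home/papa/atomizer), $2 = instance name (the %i value)
check_instance() {
  root=$1
  agent=$2
  dir="$root/instances/$agent"
  missing=0
  for f in openclaw.json env; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_instance /home/papa/atomizer manager && \
#   systemctl --user start openclaw-atomizer@manager.service
```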

Cluster Management Script

File: ~/atomizer/cluster.sh

# Start all:   bash cluster.sh start
# Stop all:    bash cluster.sh stop
# Restart all: bash cluster.sh restart
# Status:      bash cluster.sh status
# Logs:        bash cluster.sh logs [agent-name]
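The body of cluster.sh is not reproduced in this document. As a rough sketch of how such a script could drive the template unit, assuming it simply loops over the instance names and calls `systemctl --user` (the `cluster` and `unit_name` functions and the `SYSTEMCTL` override are illustrative, not the actual script):

```shell
# Instance names as they appear under ~/atomizer/instances/
AGENTS="manager tech-lead secretary auditor optimizer study-builder nx-expert webster"

# Template units are instantiated as openclaw-atomizer@<name>.service
unit_name() { echo "openclaw-atomizer@$1.service"; }

cluster() {
  action=$1
  case "$action" in
    start|stop|restart|status) ;;
    *) echo "usage: cluster {start|stop|restart|status}" >&2; return 2 ;;
  esac
  for agent in $AGENTS; do
    # SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) for a dry run.
    ${SYSTEMCTL:-systemctl} --user --no-pager "$action" "$(unit_name "$agent")"
  done
}

# Usage:
# SYSTEMCTL=echo cluster restart   # print the commands without running them
# cluster status
```

A `logs` subcommand would typically follow the journal for one unit, for example `journalctl --user -u openclaw-atomizer@manager.service -f`; whether cluster.sh does exactly this is an assumption.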

5. File System Layout

~/atomizer/
├── cluster.sh                     ← Cluster management script
├── config/
│   ├── .discord-tokens.env        ← All 8 bot tokens (env vars)
│   └── atomizer-discord.env       ← Legacy (can remove)
├── instances/                     ← Per-agent OpenClaw state
│   ├── manager/
│   │   ├── openclaw.json          ← Agent config (1 agent per instance)
│   │   ├── env                    ← Instance-specific env vars
│   │   └── agents/main/sessions/  ← Session data (auto-created)
│   ├── tech-lead/
│   ├── secretary/
│   ├── auditor/
│   ├── optimizer/
│   ├── study-builder/
│   ├── nx-expert/
│   └── webster/
├── workspaces/                    ← Agent workspaces (SOUL, AGENTS, memory)
│   ├── manager/
│   │   ├── SOUL.md
│   │   ├── AGENTS.md
│   │   ├── MEMORY.md
│   │   └── memory/
│   ├── secretary/
│   ├── technical-lead/
│   ├── auditor/
│   ├── optimizer/
│   ├── study-builder/
│   ├── nx-expert/
│   ├── webster/
│   └── shared/                    ← Shared context (CLUSTER.md, protocols)
└── tools/
    └── nxopen-mcp/                ← NX Open MCP server (for CAD)

Key distinction: instances/ = OpenClaw runtime state (configs, sessions, SQLite). workspaces/ = agent personality and memory (SOUL.md, AGENTS.md, etc.).


6. Inter-Agent Communication

Delegation Skill (Primary Method)

Manager and Tech Lead use the delegate skill to assign tasks to other agents. The skill wraps the OpenClaw Hooks API with port mapping, auth, error handling, and logging.

Location: /home/papa/atomizer/workspaces/shared/skills/delegate/

Installed on: Manager, Tech Lead (symlinked from shared)

# Usage
bash /home/papa/atomizer/workspaces/shared/skills/delegate/delegate.sh <agent> "<instruction>" [options]

# Examples
delegate.sh webster "Find CTE of Zerodur Class 0 between 20-40°C"
delegate.sh nx-expert "Mesh the M2 mirror" --channel C0AEJV13TEU --deliver
delegate.sh auditor "Review thermal analysis" --no-deliver

How it works:

  1. Looks up the target agent's port from hardcoded port map
  2. Checks if the target is running
  3. POSTs to http://127.0.0.1:PORT/hooks/agent with auth token
  4. Target agent processes the task asynchronously in an isolated session
  5. Response delivered to Discord if --deliver is set

Options: --channel <id>, --deliver (default), --no-deliver
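The steps above can be sketched as a few shell functions. This is not the contents of delegate.sh: the function names are hypothetical, the "is it running" check via systemd is an assumption (the real script may probe the port instead), and how `--channel <id>` maps into the request body is not documented here, so the sketch keeps the `"channel": "discord"` field from the raw API example. The token is read from `OPENCLAW_GATEWAY_TOKEN` rather than hardcoded.

```shell
# Step 1: hardcoded port map (values from the Port Map section).
agent_port() {
  case "$1" in
    manager)       echo 18800 ;;
    tech-lead)     echo 18804 ;;
    secretary)     echo 18808 ;;
    auditor)       echo 18812 ;;
    optimizer)     echo 18816 ;;
    study-builder) echo 18820 ;;
    nx-expert)     echo 18824 ;;
    webster)       echo 18828 ;;
    *)             return 1 ;;
  esac
}

# Escape backslashes and double quotes for use inside a JSON string.
# Newlines and control characters are not handled in this sketch.
json_escape() { printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'; }

# $1 = instruction text, $2 = true|false (deliver the response to Discord)
build_payload() {
  printf '{"message": "%s", "deliver": %s, "channel": "discord"}' \
    "$(json_escape "$1")" "$2"
}

# $1 = agent, $2 = instruction, $3 = true|false (default true, like --deliver)
delegate() {
  agent=$1
  message=$2
  deliver=${3:-true}
  port=$(agent_port "$agent") || { echo "unknown agent: $agent" >&2; return 1; }
  # Step 2 (assumption): treat an active systemd unit as "running".
  systemctl --user is-active --quiet "openclaw-atomizer@$agent.service" \
    || { echo "$agent is not running" >&2; return 1; }
  # Step 3: POST to the target's hooks endpoint with the shared token.
  curl -s -X POST "http://127.0.0.1:$port/hooks/agent" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
    -d "$(build_payload "$message" "$deliver")"
}
```

Steps 4 and 5 happen on the receiving instance, so nothing else is needed on the calling side.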

Delegation Authority

| Agent | Can Delegate To |
|---|---|
| Manager | All agents |
| Tech Lead | All agents except Manager |
| All others | Cannot delegate; request via Manager or Tech Lead |
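The authority rules are small enough to encode as a single check. The function below is a hypothetical illustration of the table, using the instance names from the file layout; it is not taken from delegate.sh.

```shell
# Returns success if agent $1 may delegate a task to agent $2.
can_delegate() {
  from=$1
  to=$2
  case "$from" in
    manager)   return 0 ;;                 # Manager: all agents
    tech-lead) [ "$to" != "manager" ] ;;   # Tech Lead: everyone except Manager
    *)         return 1 ;;                 # all others: must ask Manager or Tech Lead
  esac
}

# Usage:
# can_delegate tech-lead webster && echo "allowed"
```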

Hooks Protocol

All agents follow /home/papa/atomizer/workspaces/shared/HOOKS-PROTOCOL.md:

  • Hook messages = high-priority assignments, processed before other work
  • After completing tasks, agents append status to shared/project_log.md
  • Only the Manager updates shared/PROJECT_STATUS.md (gatekeeper pattern)
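As an illustration of the append-only log and the gatekeeper rule, the sketch below shows one way the two file operations could look. The function names, the log line format, and the `SHARED_DIR` variable are assumptions; HOOKS-PROTOCOL.md remains the authority on the actual format.

```shell
SHARED_DIR=${SHARED_DIR:-/home/papa/atomizer/workspaces/shared}

# Any agent: append one timestamped status line to the shared log.
# $1 = agent name, $2 = one-line status
log_status() {
  printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" \
    >> "$SHARED_DIR/project_log.md"
}

# Gatekeeper: only the manager may replace PROJECT_STATUS.md.
# $1 = agent name, $2 = file holding the new status content
update_project_status() {
  if [ "$1" != "manager" ]; then
    echo "only the manager may update PROJECT_STATUS.md" >&2
    return 1
  fi
  cp "$2" "$SHARED_DIR/PROJECT_STATUS.md"
}

# Usage:
# log_status webster "CTE lookup for Zerodur complete"
```

Appending with `>>` keeps writers from overwriting each other's entries, which is why the log is append-only while the status file has a single owner.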

Raw Hooks API (Reference)

The delegate skill wraps this, but for reference:

curl -s -X POST http://127.0.0.1:PORT/hooks/agent \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1" \
  -d '{"message": "your request here", "deliver": true, "channel": "discord"}'

sessions_send / sessions_spawn

Agents configured with agentToAgent.enabled: true can use OpenClaw's built-in sessions_send and sessions_spawn tools to communicate within the same instance. Cross-instance communication requires the hooks API / delegate skill.


7. Current Status

Working

  • All 8 instances running as systemd services (auto-start on boot)
  • Each agent has its own Discord bot identity (name, avatar, presence)
  • Native Discord features: streaming, typing indicators, message chunking
  • Agent workspaces with SOUL.md, AGENTS.md, MEMORY.md
  • Hooks API enabled on all instances (Google Gemini + Anthropic auth configured)
  • Delegation skill deployed: Manager and Tech Lead can delegate tasks to other agents via delegate.sh (Tech Lead cannot delegate to the Manager)
  • Hooks protocol — all agents know how to receive and prioritize delegated tasks
  • Gatekeeper pattern — Manager owns PROJECT_STATUS.md; others append to project_log.md
  • Cluster management via cluster.sh
  • Estimated total RAM: ~4.2GB for 8 instances

Known Issues

  • DELEGATE syntax was fake. RESOLVED (2026-02-14): replaced with the delegate.sh skill, which uses the hooks API
  • Discord "Ambiguous recipient" bug (2026-02-15): OpenClaw Discord plugin requires user: or channel: prefix for message targets. When heartbeat tries to reply to a session that originated from a Discord DM, it uses the bare user ID → delivery fails. Workaround: Heartbeat disabled on Manager + Webster. Other agents unaffected (their sessions don't originate from Discord DMs). Proper fix requires OpenClaw patch to auto-infer user: for known user IDs.
  • Codex OAuth expired (2026-02-15): refresh_token_reused error — multiple instances racing to refresh the same shared Codex token. Secretary, Auditor, Study-Builder switched to Gemini 2.5 Pro. To restore Codex: Antoine must re-run codex login via SSH tunnel, then run ~/atomizer/scripts/sync-codex-tokens.sh.
  • No automated orchestration layer: Manager delegates manually (but now has proper tooling to do so — orchestrate.sh, workflow engine)
  • 5 agents not yet created: Post-Processor, Reporter, Developer, Knowledge Base, IT (from the original 13-agent plan)
  • Windows execution bridge (atomizer_job_watcher.py): exists but not connected end-to-end
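For the "Ambiguous recipient" issue above, the missing piece is an explicit `user:` or `channel:` prefix on the message target. The helper below is a hypothetical illustration of that normalization in a wrapper script; it is not the OpenClaw patch and does not infer the kind automatically.

```shell
# Make a Discord message target explicit.
# $1 = kind ("user" or "channel"), $2 = target ID, bare or already prefixed
qualify_target() {
  case "$2" in
    user:*|channel:*) echo "$2" ;;        # already explicit, pass through
    *)
      case "$1" in
        user|channel) echo "$1:$2" ;;
        *) echo "unknown target kind: $1" >&2; return 1 ;;
      esac
      ;;
  esac
}

# Usage:
# qualify_target channel 1471858733452890132
```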

8. Evolution History

| Date | Phase | What Changed |
|---|---|---|
| 2026-02-07 | Phase 0 | Vision doc created, 13-agent plan designed |
| 2026-02-08 | Phase 0 | Single gateway (port 18790) running on Slack |
| 2026-02-13 | Discord Migration | Discord server created, 8 bot tokens obtained |
| 2026-02-14 (AM) | Bridge Attempt | discord-bridge.js built; worked but fragile (no streaming, polled session files) |
| 2026-02-14 (PM) | Multi-Instance Cluster | Pivoted to 8 independent OpenClaw instances. Bridge killed. Native Discord restored. |
| 2026-02-14 (PM) | Delegation System | Built delegate.sh skill, hooks protocol, gatekeeper pattern. Fake DELEGATE syntax replaced with real hooks API calls. Google Gemini auth added to all instances. |
| 2026-02-15 | Orchestration Engine | Phases 1-3 complete: synchronous delegation (orchestrate.py), smart routing (capability registry), hierarchical delegation (Tech-Lead + Optimizer can sub-delegate), YAML workflow engine with parallel execution + approval gates. See 10-ORCHESTRATION-ENGINE-PLAN.md. |
| 2026-02-15 | Stability Fixes | Discord heartbeat delivery bug identified (ambiguous recipient). Codex OAuth token expired (refresh_token_reused). Heartbeat disabled on Manager + Webster. Secretary/Auditor/Study-Builder switched from Codex to Gemini 2.5 Pro. HEARTBEAT.md created for all agents. |

Created: 2026-02-14 by Mario

This is the "as-built" document, updated as the implementation evolves.