Antoine cf82de4f06 docs: add HQ multi-agent framework documentation from PKM

- Project plan, agent roster, architecture, roadmap
- Decision log, full system plan, Discord setup/migration guides
- System implementation status (as-built)
- Cluster pivot history
- Orchestration engine plan (Phases 1-4)
- Webster and Auditor reviews

2026-02-15 21:44:07 +00:00


🔧 08 — System Implementation Status

How the multi-agent system actually works right now, as built.

Last updated: 2026-02-15

1. Architecture Overview

Multi-Instance Cluster: 8 independent OpenClaw gateway processes, one per agent. Each has its own systemd service, Discord bot token, port, and state directory.

┌──────────────────────────────────────────────────────────────────┐
│                        T420 (clawdbot)                           │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  OpenClaw Gateway — Mario (main instance)                  │  │
│  │  Port 18789 │ Slack: Antoine's personal workspace          │  │
│  │  State: ~/.openclaw/                                       │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌──────────────── Atomizer Cluster ────────────────────────┐   │
│  │                                                           │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │   │
│  │  │  Manager     │  │  Tech Lead  │  │  Secretary   │      │   │
│  │  │  :18800      │  │  :18804     │  │  :18808      │      │   │
│  │  │  Opus 4.6    │  │  Opus 4.6   │  │  Gemini 2.5  │      │   │
│  │  └──────┬───────┘  └──────┬──────┘  └──────┬───────┘      │   │
│  │         │                 │                 │              │   │
│  │  ┌──────┴───────┐  ┌─────┴──────┐  ┌──────┴───────┐      │   │
│  │  │  Auditor     │  │  Optimizer  │  │ Study Builder│      │   │
│  │  │  :18812      │  │  :18816     │  │  :18820      │      │   │
│  │  │  Gemini 2.5  │  │  Sonnet 4.5 │  │  Gemini 2.5  │      │   │
│  │  └──────────────┘  └────────────┘  └──────────────┘      │   │
│  │                                                           │   │
│  │  ┌─────────────┐  ┌─────────────┐                        │   │
│  │  │  NX Expert   │  │  Webster    │                        │   │
│  │  │  :18824      │  │  :18828     │                        │   │
│  │  │  Sonnet 4.5  │  │  Gemini 2.5 │                        │   │
│  │  └─────────────┘  └─────────────┘                        │   │
│  │                                                           │   │
│  │  Inter-agent: hooks API (curl between ports)              │   │
│  │  Shared token: 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd…  │   │
│  └───────────────────────────────────────────────────────────┘   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────────┐
│                 Discord: Atomizer-HQ Server                      │
│                 Guild: 1471858733452890132                        │
│                                                                  │
│  📋 COMMAND: #ceo-office, #announcements, #daily-standup         │
│  🔧 ENGINEERING: #technical, #code-review, #fea-analysis, #nx   │
│  📊 OPERATIONS: #task-board, #meeting-notes, #reports            │
│  🔬 RESEARCH: #literature, #materials-data                       │
│  🏗️ PROJECTS: #active-projects                                  │
│  📚 KNOWLEDGE: #knowledge-base, #lessons-learned                 │
│  🤖 SYSTEM: #agent-logs, #inter-agent, #it-ops                  │
│                                                                  │
│  Each agent = its own Discord bot with unique name & avatar      │
└──────────────────────────────────────────────────────────────────┘

2. Why Multi-Instance (Not Single Gateway)

OpenClaw's native Discord provider (@buape/carbon) has a race condition bug when multiple bot tokens connect from one process. Since we need 8 separate bot accounts, we run 8 separate processes — each handles exactly one token, bypassing the bug entirely.

Advantages over previous bridge approach:

Native Discord streaming, threads, reactions, attachments
Fault isolation — one agent crashing doesn't take down the others
No middleware polling session files on disk
Each agent appears as its own Discord user with independent presence

3. Port Map

Agent	Port	Model	Notes
Manager	18800	Opus 4.6	Orchestrates, delegates. Heartbeat disabled (Discord delivery bug)
Tech Lead	18804	Opus 4.6	Technical authority
Secretary	18808	Gemini 2.5 Pro	Task tracking, notes. Changed from Codex 2026-02-15 (OAuth expired)
Auditor	18812	Gemini 2.5 Pro	Quality review. Changed from Codex 2026-02-15 (OAuth expired)
Optimizer	18816	Sonnet 4.5	Optimization work
Study Builder	18820	Gemini 2.5 Pro	Study setup. Changed from Codex 2026-02-15 (OAuth expired)
NX Expert	18824	Sonnet 4.5	CAD/NX work
Webster	18828	Gemini 2.5 Pro	Research. Heartbeat disabled (Discord delivery bug)

⚠️ Port spacing = 4. Each OpenClaw instance binds port N and port N+3 (browser service), so gateway ports must be at least 4 apart. Never assign adjacent ports.
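
The spacing rule can be checked mechanically before a new agent is added. The following is a minimal sketch, assuming the base port 18800 and stride 4 seen in the port map; the function names are hypothetical and not part of OpenClaw or cluster.sh.

```shell
#!/usr/bin/env bash
# Sketch only: derives ports from the spacing rule in the port map.
# BASE_PORT and PORT_STRIDE mirror the table; all function names are hypothetical.
BASE_PORT=18800
PORT_STRIDE=4

# Gateway port for the Nth agent (0-based index, in port-map order).
gateway_port() {
  echo $(( BASE_PORT + PORT_STRIDE * $1 ))
}

# Browser-service port that the same instance also binds (N+3).
browser_port() {
  echo $(( $1 + 3 ))
}

# Succeeds if two gateway ports are closer together than the required spacing.
ports_too_close() {
  local diff=$(( $1 - $2 ))
  [ "${diff#-}" -lt "$PORT_STRIDE" ]
}
```

For example, `gateway_port 7` yields the Webster port, and `ports_too_close 18800 18803` succeeds because 18803 is already taken by the Manager's browser service.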

4. Systemd Setup

Template Service

File: ~/.config/systemd/user/openclaw-atomizer@.service

[Unit]
Description=OpenClaw Atomizer - %i
After=network.target
# Start-rate limits are read from [Unit] (systemd v230+); StartLimitIntervalSec= is not a [Service] key.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
Type=simple
ExecStart=/usr/bin/node /home/papa/.local/lib/node_modules/openclaw/dist/index.js gateway
Environment=PATH=/home/papa/.local/bin:/usr/local/bin:/usr/bin:/bin
Environment=HOME=/home/papa
# %i expands to the instance name, e.g. "manager" for openclaw-atomizer@manager.service
Environment=OPENCLAW_STATE_DIR=/home/papa/atomizer/instances/%i
Environment=OPENCLAW_CONFIG_PATH=/home/papa/atomizer/instances/%i/openclaw.json
Environment=OPENCLAW_GATEWAY_TOKEN=31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1
EnvironmentFile=/home/papa/atomizer/instances/%i/env
EnvironmentFile=/home/papa/atomizer/config/.discord-tokens.env
Restart=always
RestartSec=5

[Install]
WantedBy=default.target

Cluster Management Script

File: ~/atomizer/cluster.sh

# Start all:   bash cluster.sh start
# Stop all:    bash cluster.sh stop
# Restart all: bash cluster.sh restart
# Status:      bash cluster.sh status
# Logs:        bash cluster.sh logs [agent-name]
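
The body of cluster.sh is not reproduced here. As a rough illustration of how those subcommands could map onto the template unit, here is a minimal sketch. It assumes the script drives `systemctl --user` and `journalctl --user-unit`, and that instance names match the directories under instances/. The `DRY_RUN` switch and all function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the actual cluster.sh. Agent names are taken from instances/.
AGENTS="manager tech-lead secretary auditor optimizer study-builder nx-expert webster"

# Template instance name for one agent.
unit_name() {
  echo "openclaw-atomizer@$1.service"
}

# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

cluster() {
  local action=$1 agent
  case "$action" in
    start|stop|restart)
      for agent in $AGENTS; do
        run systemctl --user "$action" "$(unit_name "$agent")"
      done
      ;;
    status)
      for agent in $AGENTS; do
        run systemctl --user --no-pager status "$(unit_name "$agent")"
      done
      ;;
    logs)
      # Follow one agent's journal; defaults to the manager if no name is given.
      run journalctl --user-unit "$(unit_name "${2:-manager}")" -f
      ;;
    *)
      echo "usage: cluster {start|stop|restart|status|logs [agent-name]}" >&2
      return 2
      ;;
  esac
}
```

Running `DRY_RUN=1 cluster restart` prints the eight `systemctl` invocations without touching any service, which is a cheap way to review the unit names first.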

5. File System Layout

~/atomizer/
├── cluster.sh                     ← Cluster management script
├── config/
│   ├── .discord-tokens.env        ← All 8 bot tokens (env vars)
│   └── atomizer-discord.env       ← Legacy (can remove)
├── instances/                     ← Per-agent OpenClaw state
│   ├── manager/
│   │   ├── openclaw.json          ← Agent config (1 agent per instance)
│   │   ├── env                    ← Instance-specific env vars
│   │   └── agents/main/sessions/  ← Session data (auto-created)
│   ├── tech-lead/
│   ├── secretary/
│   ├── auditor/
│   ├── optimizer/
│   ├── study-builder/
│   ├── nx-expert/
│   └── webster/
├── workspaces/                    ← Agent workspaces (SOUL, AGENTS, memory)
│   ├── manager/
│   │   ├── SOUL.md
│   │   ├── AGENTS.md
│   │   ├── MEMORY.md
│   │   └── memory/
│   ├── secretary/
│   ├── technical-lead/
│   ├── auditor/
│   ├── optimizer/
│   ├── study-builder/
│   ├── nx-expert/
│   ├── webster/
│   └── shared/                    ← Shared context (CLUSTER.md, protocols)
└── tools/
    └── nxopen-mcp/                ← NX Open MCP server (for CAD)

Key distinction: instances/ = OpenClaw runtime state (configs, sessions, SQLite). workspaces/ = agent personality and memory (SOUL.md, AGENTS.md, etc.).
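
Because the systemd template points each instance at instances/<name>/openclaw.json and instances/<name>/env, a missing file there will stop the service from starting cleanly. The sketch below checks for both files. `ATOMIZER_ROOT` and `check_instance` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: verifies the two per-instance files shown in the layout.
# ATOMIZER_ROOT and check_instance are hypothetical names.
ATOMIZER_ROOT="${ATOMIZER_ROOT:-$HOME/atomizer}"

# Prints each missing file; returns non-zero if anything is missing.
check_instance() {
  local dir="$ATOMIZER_ROOT/instances/$1" missing=0 f
  for f in openclaw.json env; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

A loop such as `for a in manager tech-lead webster; do check_instance "$a"; done` would report gaps before a cluster start.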

6. Inter-Agent Communication

Delegation Skill (Primary Method)

Manager and Tech Lead use the delegate skill to assign tasks to other agents. The skill wraps the OpenClaw Hooks API with port mapping, auth, error handling, and logging.

Location: /home/papa/atomizer/workspaces/shared/skills/delegate/
Installed on: Manager, Tech Lead (symlinked from shared)

# Usage
bash /home/papa/atomizer/workspaces/shared/skills/delegate/delegate.sh <agent> "<instruction>" [options]

# Examples
delegate.sh webster "Find CTE of Zerodur Class 0 between 20-40°C"
delegate.sh nx-expert "Mesh the M2 mirror" --channel C0AEJV13TEU --deliver
delegate.sh auditor "Review thermal analysis" --no-deliver

How it works:

Looks up the target agent's port from hardcoded port map
Checks if the target is running
POSTs to http://127.0.0.1:PORT/hooks/agent with auth token
Target agent processes the task asynchronously in an isolated session
Response delivered to Discord if --deliver is set

Options: --channel <id>, --deliver (default), --no-deliver
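
The steps under "How it works" can be sketched as follows. This is not the actual delegate.sh. It assumes the token is available as `OPENCLAW_GATEWAY_TOKEN` (as set in the systemd unit), that "running" can be approximated by the port accepting a connection, and that instructions are single-line strings. Function names are hypothetical, and `--channel` handling and logging are omitted.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the actual delegate.sh. Function names are hypothetical.

# Step 1: hardcoded port map (values from the port map section).
port_for() {
  case "$1" in
    manager)       echo 18800 ;;
    tech-lead)     echo 18804 ;;
    secretary)     echo 18808 ;;
    auditor)       echo 18812 ;;
    optimizer)     echo 18816 ;;
    study-builder) echo 18820 ;;
    nx-expert)     echo 18824 ;;
    webster)       echo 18828 ;;
    *) echo "unknown agent: $1" >&2; return 1 ;;
  esac
}

# Escape backslashes and double quotes so a single-line instruction is a valid JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body in the shape used by the raw hooks API call.
build_payload() {
  local message deliver=$2
  message=$(json_escape "$1")
  printf '{"message": "%s", "deliver": %s, "channel": "discord"}' "$message" "$deliver"
}

delegate() {
  local agent=$1 instruction=$2 deliver=${3:-true} port
  port=$(port_for "$agent") || return 1
  # Step 2: assumption: an agent counts as running if its port accepts a connection.
  if ! curl -s -o /dev/null --max-time 2 "http://127.0.0.1:$port/"; then
    echo "agent $agent is not reachable on port $port" >&2
    return 1
  fi
  # Steps 3-5: POST the task; the target handles it asynchronously in its own session.
  curl -s -X POST "http://127.0.0.1:$port/hooks/agent" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
    -d "$(build_payload "$instruction" "$deliver")"
}
```

Reading the token from the environment rather than embedding it keeps the secret out of the script and of any logs that echo the command line.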

Delegation Authority

Agent	Can Delegate To
Manager	All agents
Tech Lead	All agents except Manager
All others	Cannot delegate — request via Manager or Tech Lead
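
The authority table is small enough to enforce with a guard at the top of a delegation script. Here is a minimal sketch, assuming agents are identified by their instance names. `can_delegate` is a hypothetical helper, not something delegate.sh is known to contain.

```shell
#!/usr/bin/env bash
# Sketch only: encodes the delegation authority table. can_delegate is hypothetical.

# Succeeds if <from> may delegate to <to>.
can_delegate() {
  local from=$1 to=$2
  case "$from" in
    manager)   return 0 ;;                  # Manager: all agents
    tech-lead) [ "$to" != "manager" ] ;;    # Tech Lead: everyone except Manager
    *)         return 1 ;;                  # all others: must go via Manager or Tech Lead
  esac
}
```

A caller could then write `can_delegate "$SELF" "$TARGET" || exit 1` before any request is sent, where `SELF` and `TARGET` are again illustrative variable names.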

Hooks Protocol

All agents follow /home/papa/atomizer/workspaces/shared/HOOKS-PROTOCOL.md:

Hook messages = high-priority assignments, processed before other work
After completing tasks, agents append status to shared/project_log.md
Only the Manager updates shared/PROJECT_STATUS.md (gatekeeper pattern)
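
The append-only half of the gatekeeper pattern might look like the sketch below. The entry format (UTC timestamp, agent name, free-text status) is an assumption, since HOOKS-PROTOCOL.md defines the real one. `PROJECT_LOG` and `log_status` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: the entry format is an assumption; HOOKS-PROTOCOL.md is authoritative.
PROJECT_LOG="${PROJECT_LOG:-/home/papa/atomizer/workspaces/shared/project_log.md}"

# Append one status line; agents never rewrite or truncate the shared log.
log_status() {
  local agent=$1 status=$2
  printf -- '- %s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$agent" "$status" >> "$PROJECT_LOG"
}
```

Each entry is a single short append, which keeps agents from overwriting one another's lines, while PROJECT_STATUS.md stays under the Manager's sole control.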

Raw Hooks API (Reference)

The delegate skill wraps this, but for reference:

curl -s -X POST http://127.0.0.1:PORT/hooks/agent \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1" \
  -d '{"message": "your request here", "deliver": true, "channel": "discord"}'

sessions_send / sessions_spawn

Agents configured with agentToAgent.enabled: true can use OpenClaw's built-in sessions_send and sessions_spawn tools to communicate within the same instance. Cross-instance communication requires the hooks API / delegate skill.

7. Current Status

✅ Working

All 8 instances running as systemd services (auto-start on boot)
Each agent has its own Discord bot identity (name, avatar, presence)
Native Discord features: streaming, typing indicators, message chunking
Agent workspaces with SOUL.md, AGENTS.md, MEMORY.md
Hooks API enabled on all instances (Google Gemini + Anthropic auth configured)
Delegation skill deployed: Manager can delegate to any agent, and Tech Lead to any agent except Manager, via delegate.sh
Hooks protocol — all agents know how to receive and prioritize delegated tasks
Gatekeeper pattern — Manager owns PROJECT_STATUS.md; others append to project_log.md
Cluster management via cluster.sh
Estimated total RAM: ~4.2GB for 8 instances

❌ Known Issues

~~DELEGATE syntax is fake~~ → ✅ RESOLVED (2026-02-14): Replaced with delegate.sh skill using hooks API
Discord "Ambiguous recipient" bug (2026-02-15): OpenClaw Discord plugin requires user: or channel: prefix for message targets. When heartbeat tries to reply to a session that originated from a Discord DM, it uses the bare user ID → delivery fails. Workaround: Heartbeat disabled on Manager + Webster. Other agents unaffected (their sessions don't originate from Discord DMs). Proper fix requires OpenClaw patch to auto-infer user: for known user IDs.
Codex OAuth expired (2026-02-15): refresh_token_reused error — multiple instances racing to refresh the same shared Codex token. Secretary, Auditor, Study-Builder switched to Gemini 2.5 Pro. To restore Codex: Antoine must re-run codex login via SSH tunnel, then run ~/atomizer/scripts/sync-codex-tokens.sh.
No automated orchestration layer: Manager delegates manually (but now has proper tooling to do so — orchestrate.sh, workflow engine)
5 agents not yet created: Post-Processor, Reporter, Developer, Knowledge Base, IT (from the original 13-agent plan)
Windows execution bridge (atomizer_job_watcher.py): exists but not connected end-to-end
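
For the "Ambiguous recipient" issue, the missing step is turning a bare ID into an explicitly prefixed target. The sketch below only illustrates that normalization. It is not OpenClaw code, `qualify_target` is a hypothetical name, and defaulting every bare ID to `user:` is an assumption that holds only for sessions known to originate from DMs.

```shell
#!/usr/bin/env bash
# Sketch only: illustrates the user:/channel: prefixing described in the issue.
# Not OpenClaw code; qualify_target is a hypothetical helper.

# Pass already-prefixed targets through; treat a bare ID as a user ID (assumption).
qualify_target() {
  case "$1" in
    user:*|channel:*) echo "$1" ;;
    *)                echo "user:$1" ;;
  esac
}
```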

8. Evolution History

Date	Phase	What Changed
2026-02-07	Phase 0	Vision doc created, 13-agent plan designed
2026-02-08	Phase 0	Single gateway (port 18790) running on Slack
2026-02-13	Discord Migration	Discord server created, 8 bot tokens obtained
2026-02-14 (AM)	Bridge Attempt	discord-bridge.js built — worked but fragile (no streaming, polled session files)
2026-02-14 (PM)	Multi-Instance Cluster	Pivoted to 8 independent OpenClaw instances. Bridge killed. Native Discord restored.
2026-02-14 (PM)	Delegation System	Built delegate.sh skill, hooks protocol, gatekeeper pattern. Fake DELEGATE syntax replaced with real hooks API calls. Google Gemini auth added to all instances.
2026-02-15	Orchestration Engine	Phases 1-3 complete: synchronous delegation (orchestrate.py), smart routing (capability registry), hierarchical delegation (Tech-Lead + Optimizer can sub-delegate), YAML workflow engine with parallel execution + approval gates. See 10-ORCHESTRATION-ENGINE-PLAN.md.
2026-02-15	Stability Fixes	Discord heartbeat delivery bug identified (ambiguous recipient). Codex OAuth token expired (refresh_token_reused). Heartbeat disabled on Manager + Webster. Secretary/Auditor/Study-Builder switched from Codex to Gemini 2.5 Pro. HEARTBEAT.md created for all agents.

Created: 2026-02-14 by Mario

This is the "as-built" document, updated as implementation evolves.
