- Project plan, agent roster, architecture, roadmap
- Decision log, full system plan, Discord setup/migration guides
- System implementation status (as-built)
- Cluster pivot history
- Orchestration engine plan (Phases 1-4)
- Webster and Auditor reviews
| tags | up | date | description | summary | status |
|---|---|---|---|---|---|
| | | 2026-02-14 | | | |
Gemini said
This is a fascinating architectural evolution. You’ve successfully pivoted from a monolithic "simulated" multi-agent system (Slack Bridge) to a distributed micro-service architecture (The "Cluster" on Discord).
This "Cluster" approach (8 independent systemd services on one machine) is the correct long-term play for stability and native features, but it introduces fragmentation that you haven't fully solved yet.
Here is my challenge to your current implementation, followed by concrete technical proposals to fix the "Missing Glue."
🚨 Critical Challenge: The "Fake Delegation" Problem
The Issue: In 08-SYSTEM-IMPLEMENTATION-STATUS.md, you admit:
"Manager's `[DELEGATE:...]` syntax is fake... It's just text output. Fix needed: Replace with actual exec + curl to hooks API."
Right now, you have 8 geniuses sitting in 8 separate rooms (ports), but the Manager is shouting instructions at a wall. If the Manager says `[DELEGATE: nx-expert "mesh this"]`, nothing happens, because OpenClaw doesn't know what that syntax means.
The Fix: The `delegate_task` Skill
You need to build a custom tool (skill) for the Manager (and Tech Lead) that wraps the curl command to the Hooks API.
Proposed Implementation: Create a skill `skills/orchestration/index.ts` (or a Python equivalent) available to the Manager:
```typescript
// Map agent names to their local ports (from your Doc 08)
const PORT_MAP: Record<string, number> = {
  "technical": 18804,
  "secretary": 18808,
  "auditor": 18812,
  "optimizer": 18816,
  "study-builder": 18820,
  "nx-expert": 18824,
  "webster": 18828,
};

async function delegateTask(targetAgent: string, instruction: string, channelId: string): Promise<string> {
  const port = PORT_MAP[targetAgent];
  if (!port) throw new Error(`Unknown agent: ${targetAgent}`);

  // Call the OpenClaw Hooks API of the target agent.
  // This makes the target agent "hear" the instruction.
  const response = await fetch(`http://localhost:${port}/hooks/message`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      message: `🚨 **ASSIGNMENT FROM MANAGER:**\n${instruction}`,
      channel_id: channelId, // respond in the same Discord thread/channel
      user: "Manager_Agent", // identify the sender so the target has context
    }),
  });
  if (!response.ok) {
    throw new Error(`Hook call to ${targetAgent} failed: ${response.status}`);
  }
  return `Signal sent to ${targetAgent} on port ${port}. Monitor channel for response.`;
}
```
Why this changes everything:
- The Manager actually triggers the other process.
- The target agent wakes up, sees the message in the correct Discord channel (via `channel_id`), and processes it.
- The response appears natively in Discord.
🧠 Architectural Analysis: The T420 Bottleneck
The Constraint: You are running this on a Lenovo T420.
- RAM: You estimated 4.2 GB for 8 idle instances.
- CPU: When 2-3 agents are active (e.g., Manager delegating to Tech Lead + Auditor), that dual-core i5 is going to scream. Node.js processes + Syncthing + potential Python execution will create latency.
Risk: If OpenClaw keeps the full context window loaded in memory for all 8 agents, you will run out of memory and swap to disk, killing performance.
Proposed Improvements:
- Aggressive Context Pruning: In your `clawdbot.json` (or equivalent OpenClaw config per instance), ensure `context_limit` is strict. The "Secretary" (Haiku) doesn't need 200k tokens.
- Lazy Loading: Ensure agents that are rarely used (like "Auditor") don't keep heavy skills/models loaded until summoned.
- Process Niceness: In your `cluster.sh`, start non-critical agents (Webster, Secretary) with a lower process priority (`nice -n 10`) so the Manager and Tech Lead always get CPU preference.
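The niceness idea could look like this in `cluster.sh` (a sketch only: the `start_agent` helper and its arguments are placeholders, since the real launch commands aren't shown here):

```shell
# Excerpt sketch for cluster.sh -- start_agent is a hypothetical launch helper.
# Critical agents run at default priority:
start_agent manager 18800 &
start_agent technical 18804 &

# Non-critical agents are niced so the scheduler favors the Manager/Tech Lead
# under load (nice -n 10 lowers CPU priority; it does not limit memory):
nice -n 10 start_agent webster 18828 &
nice -n 10 start_agent secretary 18808 &
```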
⚡ The Windows Bridge (Speed vs. Reliability)
The Issue: You are relying on Syncthing for the Job Queue (Doc 05/08).
- Pros: Bulletproof. If the network drops, it syncs later.
- Cons: Latency. A 5-30 second delay. In a chat interface, 30 seconds of "thinking" feels like broken software.
The Upgrade: Hybrid Triggering
Keep Syncthing for the heavy files (NX part files, results databases), but use HTTP for the trigger.
- On Windows: Run a tiny FastAPI server (listening on the Tailscale IP).
- On the T420 (Study Builder Agent): When `run_optimization.py` is ready:
  1. Write files to the Syncthing folder.
  2. Wait 5 seconds.
  3. Send an HTTP POST to Windows: `http://[Windows-Tailscale-IP]:8000/trigger-run`.
- On Windows: The server receives the POST, checks that the files match, and executes the script.
This cuts the "polling" loop out of the equation. The Windows machine reacts the moment the agent commands it.
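The T420 side of this flow might look like the following sketch (the `/trigger-run` payload shape and the `job_id` field are assumptions; `[Windows-Tailscale-IP]` stays a placeholder):

```typescript
// Hypothetical Tailscale endpoint of the Windows FastAPI server
const WINDOWS_TRIGGER_URL = "http://[Windows-Tailscale-IP]:8000/trigger-run";

// Build the trigger request (split out so the payload is easy to inspect)
function buildTriggerRequest(jobId: string) {
  return {
    url: WINDOWS_TRIGGER_URL,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ job_id: jobId }),
  };
}

// Fire the trigger after giving Syncthing a head start on the heavy files
async function triggerWindowsRun(jobId: string): Promise<boolean> {
  await new Promise((resolve) => setTimeout(resolve, 5000)); // let Syncthing propagate
  const { url, ...init } = buildTriggerRequest(jobId);
  const res = await fetch(url, init);
  return res.ok; // the Windows side still verifies the files before executing
}
```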
📂 Shared State & Concurrency
The Issue: You moved away from the centralized bridge, which is good, but now you have distributed state. If the "Manager" writes to PROJECT_STATUS.md at the same time the "Secretary" tries to read it, you might get partial reads or file locks.
Recommendation: The "Bulletin Board" Protocol
Since you are using a file system as a database (`/opt/atomizer/workspaces/shared_context/`), implement a strict protocol in the `atomizer-protocols` skill:
- Append-Only Logs: For status updates, agents should append to `project_log.md` rather than overwriting a status file.
- The "Talking Stick": Only the Secretary should have permission to rewrite/summarize `PROJECT_STATUS.md` based on the logs. Other agents just add logs.
  - Agent: "I finished task X" -> appends to the log.
  - Secretary (periodic): reads the log -> updates the status board.
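A minimal sketch of the append-only write (the path comes from the docs; the one-line-per-event markdown format is an assumption):

```typescript
import { appendFileSync } from "node:fs";

// Shared bulletin-board log (path per the docs)
const LOG_PATH = "/opt/atomizer/workspaces/shared_context/project_log.md";

// One timestamped markdown list item per event
function formatLogLine(agent: string, message: string): string {
  return `- ${new Date().toISOString()} [${agent}] ${message}\n`;
}

// Append-only: agents add lines, never rewrite the file.
// Small appends don't interleave mid-line on POSIX, so concurrent
// agents can log safely without explicit locking.
function appendStatus(agent: string, message: string, logPath: string = LOG_PATH): void {
  appendFileSync(logPath, formatLogLine(agent, message), { flag: "a" });
}
```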
📝 Immediate To-Do List (Next 24 Hours)
Based on your docs, here is the exact path to move from "deployed" to "functional":
1. Implement the `delegate_task` tool: Give the Manager the ability to `curl` the other ports. Without this, you have no company, just 8 freelancers ignoring each other.
2. Fix the Discord permissions: Ensure your `cluster.sh` sets the environment variables correctly so 18800 (Manager) behaves like a Manager and 18808 (Secretary) behaves like a Secretary.
3. Define the "Handshake": Update `atomizer-protocols` to explicitly tell agents: "When you receive a message via hooks, treat it as a high-priority direct command."
4. Smoke Test Orchestration:
   - User (Discord): "Manager, ask Webster to find the latest density of Ti-6Al-4V."
   - Manager (internal): Calls `delegate_task("webster", "Find density of Ti-6Al-4V")`.
   - Webster (Discord): Replies in the thread with the data.
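The "Handshake" rule can be made concrete on the receiving side; a sketch assuming the payload fields from the `delegate_task` proposal (the `_Agent` naming convention for senders is an assumption):

```typescript
// Assumed shape of an incoming hooks payload (mirrors the delegate_task proposal)
interface HookMessage {
  message: string;
  channel_id: string;
  user: string;
}

// Validate a raw hook body and apply the handshake rule:
// anything arriving via hooks from another agent is a high-priority direct command.
function classifyHookPayload(raw: string): { payload: HookMessage; priority: "direct-command" | "normal" } {
  const data = JSON.parse(raw);
  if (typeof data.message !== "string" || typeof data.channel_id !== "string") {
    throw new Error("Malformed hook payload");
  }
  const payload: HookMessage = {
    message: data.message,
    channel_id: data.channel_id,
    user: typeof data.user === "string" ? data.user : "unknown",
  };
  // Senders named like "Manager_Agent" are other cluster agents (assumed convention)
  const priority = payload.user.endsWith("_Agent") ? "direct-command" : "normal";
  return { payload, priority };
}
```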
This setup is impressive, Mario. You've effectively built a Kubernetes-style cluster using systemd and a laptop. Just add the networking glue (hooks), and it's alive.