- Project plan, agent roster, architecture, roadmap
- Decision log, full system plan, Discord setup/migration guides
- System implementation status (as-built)
- Cluster pivot history
- Orchestration engine plan (Phases 1-4)
- Webster and Auditor reviews
| tags | up | date | description | summary | status |
|---|---|---|---|---|---|
| | | 2026-02-14 | | | |
Gemini said
This is a fascinating architectural evolution. You’ve successfully pivoted from a monolithic "simulated" multi-agent system (Slack Bridge) to a distributed micro-service architecture (The "Cluster" on Discord).
This "Cluster" approach (8 independent systemd services on one machine) is the correct long-term play for stability and native features, but it introduces fragmentation that you haven't fully solved yet.
Here is my challenge to your current implementation, followed by concrete technical proposals to fix the "Missing Glue."
🚨 Critical Challenge: The "Fake Delegation" Problem
The Issue: In 08-SYSTEM-IMPLEMENTATION-STATUS.md, you admit:
"Manager's `[DELEGATE:...]` syntax is fake... It's just text output. Fix needed: Replace with actual exec + curl to hooks API."
Right now, you have 8 geniuses sitting in 8 separate rooms (ports), but the Manager is shouting instructions at a wall. If the Manager says `[DELEGATE: nx-expert "mesh this"]`, nothing happens, because OpenClaw doesn't know what that syntax means.
The Fix: The `delegate_task` Skill
You need to build a custom tool (skill) for the Manager (and Tech Lead) that wraps the curl command to the Hooks API.
Proposed Implementation: Create a skill `skills/orchestration/index.ts` (or a Python equivalent) available to the Manager:
```typescript
// Map agent names to their local ports (from your Doc 08)
const PORT_MAP: Record<string, number> = {
  "technical": 18804,
  "secretary": 18808,
  "auditor": 18812,
  "optimizer": 18816,
  "study-builder": 18820,
  "nx-expert": 18824,
  "webster": 18828,
};

async function delegateTask(targetAgent: string, instruction: string, channelId: string): Promise<string> {
  const port = PORT_MAP[targetAgent];
  if (!port) throw new Error(`Unknown agent: ${targetAgent}`);

  // Call the OpenClaw Hooks API of the target agent.
  // This makes the target agent "hear" the instruction.
  const response = await fetch(`http://localhost:${port}/hooks/message`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      message: `🚨 **ASSIGNMENT FROM MANAGER:**\n${instruction}`,
      channel_id: channelId, // respond in the same Discord thread/channel
      user: "Manager_Agent", // identify the sender so the target has context
    }),
  });
  if (!response.ok) {
    throw new Error(`Hook call to ${targetAgent} failed: ${response.status}`);
  }
  return `Signal sent to ${targetAgent} on port ${port}. Monitor channel for response.`;
}
```
Why this changes everything:
- The Manager actually triggers the other process.
- The target agent wakes up, sees the message in the correct Discord channel (via `channel_id`), and processes it.
- The response appears natively in Discord.
🧠 Architectural Analysis: The T420 Bottleneck
The Constraint: You are running this on a Lenovo T420.
- RAM: You estimated 4.2 GB for 8 idle instances.
- CPU: When 2-3 agents are active (e.g., Manager delegating to Tech Lead + Auditor), that dual-core i5 is going to scream. Node.js processes + Syncthing + potential Python execution will create latency.
Risk: If OpenClaw keeps the full context window loaded in memory for all 8 agents, you will run out of memory and swap to disk, killing performance.
Proposed Improvements:
- Aggressive Context Pruning: In your `clawdbot.json` (or equivalent OpenClaw config per instance), ensure `context_limit` is strict. The "Secretary" (Haiku) doesn't need 200k tokens.
- Lazy Loading: Ensure agents that are rarely used (like "Auditor") don't keep heavy skills/models loaded until summoned.
- Process Niceness: In your `cluster.sh`, start non-critical agents (Webster, Secretary) with a lower process priority (`nice -n 10`) so the Manager and Tech Lead always get CPU preference.
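The niceness idea could look like this in `cluster.sh` (a sketch only: the `start_agent` helper and its arguments are placeholders, since the real launch commands aren't shown here):

```shell
# Excerpt sketch for cluster.sh -- start_agent is a hypothetical launch helper.
# Critical agents run at default priority:
start_agent manager 18800 &
start_agent technical 18804 &

# Non-critical agents are niced so the scheduler favors the Manager/Tech Lead
# under load (nice -n 10 lowers CPU priority; it does not limit memory):
nice -n 10 start_agent webster 18828 &
nice -n 10 start_agent secretary 18808 &
```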
⚡ The Windows Bridge (Speed vs. Reliability)
The Issue: You are relying on Syncthing for the Job Queue (Doc 05/08).
- Pros: Bulletproof. If the network drops, it syncs later.
- Cons: Latency. A 5-30 second delay. In a chat interface, 30 seconds of "thinking" feels like broken software.
The Upgrade: Hybrid Triggering
Keep Syncthing for the heavy files (NX part files, results databases), but use HTTP for the trigger.
- On Windows: Run a tiny FastAPI server (listening on the Tailscale IP).
- On the T420 (Study Builder Agent): When `run_optimization.py` is ready:
  1. Write files to the Syncthing folder.
  2. Wait 5 seconds.
  3. Send an HTTP POST to Windows: `http://[Windows-Tailscale-IP]:8000/trigger-run`.
- On Windows: The server receives the POST, checks that the files match, and executes the script.
This cuts the "polling" loop out of the equation. The Windows machine reacts the moment the agent commands it.
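The T420 side of this flow might look like the following sketch (the `/trigger-run` payload shape and the `job_id` field are assumptions; `[Windows-Tailscale-IP]` stays a placeholder):

```typescript
// Hypothetical Tailscale endpoint of the Windows FastAPI server
const WINDOWS_TRIGGER_URL = "http://[Windows-Tailscale-IP]:8000/trigger-run";

// Build the trigger request (split out so the payload is easy to inspect)
function buildTriggerRequest(jobId: string) {
  return {
    url: WINDOWS_TRIGGER_URL,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ job_id: jobId }),
  };
}

// Fire the trigger after giving Syncthing a head start on the heavy files
async function triggerWindowsRun(jobId: string): Promise<boolean> {
  await new Promise((resolve) => setTimeout(resolve, 5000)); // let Syncthing propagate
  const { url, ...init } = buildTriggerRequest(jobId);
  const res = await fetch(url, init);
  return res.ok; // the Windows side still verifies the files before executing
}
```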
📂 Shared State & Concurrency
The Issue: You moved away from the centralized bridge, which is good, but now you have distributed state. If the "Manager" writes to PROJECT_STATUS.md at the same time the "Secretary" tries to read it, you might get partial reads or file locks.
Recommendation: The "Bulletin Board" Protocol
Since you are using a file system as a database (`/opt/atomizer/workspaces/shared_context/`), implement a strict protocol in the `atomizer-protocols` skill:
- Append-Only Logs: For status updates, agents should append to `project_log.md` rather than overwriting a status file.
- The "Talking Stick": Only the Secretary should have permission to rewrite/summarize `PROJECT_STATUS.md` based on the logs. Other agents just add logs.
  - Agent: "I finished task X" -> appends to the log.
  - Secretary (periodic): reads the log -> updates the status board.
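A minimal sketch of the append-only write (the path comes from the docs; the one-line-per-event markdown format is an assumption):

```typescript
import { appendFileSync } from "node:fs";

// Shared bulletin-board log (path per the docs)
const LOG_PATH = "/opt/atomizer/workspaces/shared_context/project_log.md";

// One timestamped markdown list item per event
function formatLogLine(agent: string, message: string): string {
  return `- ${new Date().toISOString()} [${agent}] ${message}\n`;
}

// Append-only: agents add lines, never rewrite the file.
// Small appends don't interleave mid-line on POSIX, so concurrent
// agents can log safely without explicit locking.
function appendStatus(agent: string, message: string, logPath: string = LOG_PATH): void {
  appendFileSync(logPath, formatLogLine(agent, message), { flag: "a" });
}
```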
📝 Immediate To-Do List (Next 24 Hours)
Based on your docs, here is the exact path to move from "deployed" to "functional":
1. Implement the `delegate_task` tool: Give the Manager the ability to `curl` the other ports. Without this, you have no company, just 8 freelancers ignoring each other.
2. Fix the Discord permissions: Ensure your `cluster.sh` sets the environment variables correctly so 18800 (Manager) behaves like a Manager and 18808 (Secretary) behaves like a Secretary.
3. Define the "Handshake": Update `atomizer-protocols` to explicitly tell agents: "When you receive a message via hooks, treat it as a high-priority direct command."
4. Smoke Test Orchestration:
   - User (Discord): "Manager, ask Webster to find the latest density of Ti-6Al-4V."
   - Manager (internal): Calls `delegate_task("webster", "Find density of Ti-6Al-4V")`.
   - Webster (Discord): Replies in the thread with the data.
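The "Handshake" rule can be made concrete on the receiving side; a sketch assuming the payload fields from the `delegate_task` proposal (the `_Agent` naming convention for senders is an assumption):

```typescript
// Assumed shape of an incoming hooks payload (mirrors the delegate_task proposal)
interface HookMessage {
  message: string;
  channel_id: string;
  user: string;
}

// Validate a raw hook body and apply the handshake rule:
// anything arriving via hooks from another agent is a high-priority direct command.
function classifyHookPayload(raw: string): { payload: HookMessage; priority: "direct-command" | "normal" } {
  const data = JSON.parse(raw);
  if (typeof data.message !== "string" || typeof data.channel_id !== "string") {
    throw new Error("Malformed hook payload");
  }
  const payload: HookMessage = {
    message: data.message,
    channel_id: data.channel_id,
    user: typeof data.user === "string" ? data.user : "unknown",
  };
  // Senders named like "Manager_Agent" are other cluster agents (assumed convention)
  const priority = payload.user.endsWith("_Agent") ? "direct-command" : "normal";
  return { payload, priority };
}
```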
This setup is impressive, Mario. You've effectively built a Kubernetes-style cluster using systemd and a laptop. Just add the networking glue (hooks), and it's alive.