---
tags:
- Agentic
- Project/Atomizer
up:
date: 2026-02-14
description:
summary: ""
status:
- ongoing
---

###### Gemini said

This is a fascinating architectural evolution. You’ve successfully pivoted from a monolithic "simulated" multi-agent system (Slack Bridge) to a **distributed microservice architecture** (the "Cluster" on Discord).

This "Cluster" approach (8 independent systemd services on one machine) is the correct long-term play for stability and native Discord features, but it introduces **fragmentation** that you haven't fully solved yet.

Here is my challenge to your current implementation, followed by concrete technical proposals to fix the "Missing Glue."

---

### 🚨 Critical Challenge: The "Fake Delegation" Problem

**The Issue:** In `08-SYSTEM-IMPLEMENTATION-STATUS.md`, you admit:

> _"Manager's `[DELEGATE:...]` syntax is fake... It's just text output. Fix needed: Replace with actual exec + curl to hooks API."_

Right now, you have 8 geniuses sitting in 8 separate rooms (ports), but the Manager is shouting instructions at a wall. If the Manager says `[DELEGATE: nx-expert "mesh this"]`, nothing happens, because OpenClaw doesn't know what that syntax means.

**The Fix: The `delegate_task` Skill.** Build a custom tool (skill) for the Manager (and Tech Lead) that wraps the `curl`-style HTTP call to each target agent's Hooks API.

**Proposed Implementation:** Create a skill `skills/orchestration/index.ts` (or a Python equivalent) available to the Manager:

```typescript
// Map agent names to their local ports (from your Doc 08).
// The Manager (18800) is the caller, so it is not a delegation target.
const PORT_MAP: Record<string, number> = {
  "technical": 18804,
  "secretary": 18808,
  "auditor": 18812,
  "optimizer": 18816,
  "study-builder": 18820,
  "nx-expert": 18824,
  "webster": 18828,
};

async function delegateTask(targetAgent: string, instruction: string, channelId: string): Promise<string> {
  const port = PORT_MAP[targetAgent];
  if (!port) throw new Error(`Unknown agent: ${targetAgent}`);

  // Call the OpenClaw Hooks API of the target agent.
  // This makes the target agent "hear" the instruction.
  // Assumption: verify the /hooks/message path and payload field names
  // against the hooks documentation for your OpenClaw version.
  const response = await fetch(`http://localhost:${port}/hooks/message`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      message: `🚨 **ASSIGNMENT FROM MANAGER:**\n${instruction}`,
      channel_id: channelId, // Respond in the same Discord thread/channel
      user: "Manager_Agent", // Identify the sender so the target has context
    }),
  });

  // Surface rejected calls instead of reporting a signal that never arrived
  if (!response.ok) {
    throw new Error(`Delegation to ${targetAgent} failed: HTTP ${response.status}`);
  }

  return `Signal sent to ${targetAgent} on port ${port}. Monitor channel for response.`;
}
```

**Why this changes everything:**

1. The Manager actually _triggers_ the other process.
2. The target agent wakes up, sees the message in the correct Discord channel (via `channel_id`), and processes it.
3. The response appears natively in Discord.

---

### 🧠 Architectural Analysis: The T420 Bottleneck

**The Constraint:** You are running this on a **Lenovo T420**.

- **RAM:** You estimated 4.2 GB for 8 idle instances (roughly 525 MB each).
- **CPU:** When 2-3 agents are active at once (e.g., the Manager delegating to the Tech Lead and the Auditor), that dual-core i5 will be saturated. Node.js processes, Syncthing, and any Python execution will all compete for the same two cores, adding latency.

**Risk:** If OpenClaw keeps the full context window in memory for all 8 agents, the machine will start swapping to disk or hit out-of-memory (OOM) kills, and performance will collapse.

**Proposed Improvements:**

1. **Aggressive Context Pruning:** In your `clawdbot.json` (or the equivalent OpenClaw config for each instance), make the context limit (`context_limit`, or whatever your version calls it) strict. The "Secretary" (Haiku) doesn't need 200k tokens.
2. **Lazy Loading:** Make sure rarely used agents (like the "Auditor") don't load heavy skills or models until they are summoned.
3. **Process Niceness:** In your `cluster.sh`, start non-critical agents (Webster, Secretary) with a lower process priority (`nice -n 10`, or `Nice=10` in their systemd units) so the Manager and Tech Lead win the CPU whenever the cores are contended.

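
Here is a minimal TypeScript sketch of the three improvements. It assumes you control how each agent's history is assembled and how its process is launched. `PROFILES`, the token budgets, and the ~4-characters-per-token estimate are illustrative assumptions, not OpenClaw config keys. `lazy()` is a generic load-on-first-use helper, not a built-in OpenClaw feature.

```typescript
// Hypothetical per-agent resource profile for the T420 cluster.
// Budgets and nice levels are illustrative assumptions, not OpenClaw config keys.
interface AgentProfile {
  maxContextTokens: number; // cap on history sent to the model per turn
  nice: number; // 0 = normal priority; higher values yield CPU under contention
}

const PROFILES: Record<string, AgentProfile> = {
  manager: { maxContextTokens: 100_000, nice: 0 },
  technical: { maxContextTokens: 100_000, nice: 0 },
  secretary: { maxContextTokens: 16_000, nice: 10 },
  webster: { maxContextTokens: 32_000, nice: 10 },
};

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough estimate (~4 characters per token); use a real tokenizer if you have one.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// 1. Context pruning: keep system messages plus the newest turns that fit the budget.
function pruneHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const recent: ChatMessage[] = [];
  for (const m of [...messages].reverse()) {
    if (m.role === "system") continue;
    const cost = estimateTokens(m.content);
    if (used + cost > maxTokens) break; // this turn and all older ones are dropped
    used += cost;
    recent.unshift(m); // restore chronological order
  }
  return [...system, ...recent];
}

// 2. Lazy loading: run an expensive loader on first use only, then reuse the result.
function lazy<T>(load: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) cached = { value: load() };
    return cached.value;
  };
}

// 3. Niceness: prefix a low-priority agent's launch command with `nice -n N`.
function launchCommand(agent: string, cmd: string[]): string[] {
  const nice = PROFILES[agent]?.nice ?? 0;
  return nice > 0 ? ["nice", "-n", String(nice), ...cmd] : cmd;
}
```

The fixed per-role budget is what matters. The Secretary's history stays small no matter how long a thread runs, while the Manager and Tech Lead keep their long context.
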

---

### ⚡ The Windows Bridge (Speed vs. Reliability)

**The Issue:** You are relying on **Syncthing** for the Job Queue (Doc 05/08).

- _Pros:_ Bulletproof. If the network drops, it syncs later.
- _Cons:_ **Latency.** 5-30 seconds of delay. In a chat interface, 30 seconds of "thinking" feels like broken software.

**The Upgrade: Hybrid Triggering.** Keep Syncthing for the heavy files (NX part files, results databases), but use **HTTP for the trigger**.

1. **On Windows:** Run a tiny FastAPI server listening on the machine's Tailscale IP.
2. **On the T420 (Study Builder agent):** When `run_optimization.py` is ready:
   - Write the files to the Syncthing folder.
   - _Wait 5 seconds._
   - Send an HTTP POST to Windows: `http://[Windows-Tailscale-IP]:8000/trigger-run`.
3. **On Windows:** The server receives the POST, checks that the synced files are complete and match what the agent wrote, and only then executes the script.

This takes the polling loop out of the equation. The Windows machine reacts as soon as the agent sends the trigger, rather than on the next poll of the job queue.

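
Here is a minimal sketch of the T420 side, assuming Node 18+ (global `fetch`). The 5-second wait alone does not prove the files have synced. This version also sends a SHA-256 manifest of the job files, so the Windows server can confirm Syncthing has delivered exactly those bytes before it runs anything. The `{ job, files }` payload and all helper names are hypothetical; only the `/trigger-run` URL comes from the plan. `manifestMatches` shows the check the FastAPI server would perform, written in TypeScript for reference only.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import { basename } from "node:path";

// file name -> SHA-256 hex digest of its contents
type Manifest = Record<string, string>;

function sha256(data: string | Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hash each job file the agent wrote to the Syncthing folder.
function buildManifest(paths: string[]): Manifest {
  const manifest: Manifest = {};
  for (const p of paths) manifest[basename(p)] = sha256(readFileSync(p));
  return manifest;
}

// The Windows-side check: every expected file is present with the same hash.
// On a mismatch the server should wait and re-check instead of running stale files.
function manifestMatches(expected: Manifest, actual: Manifest): boolean {
  return Object.entries(expected).every(([name, hash]) => actual[name] === hash);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical trigger, e.g. triggerUrl = "http://<windows-tailscale-ip>:8000/trigger-run"
async function triggerRun(triggerUrl: string, job: string, paths: string[], settleMs = 5000): Promise<void> {
  const files = buildManifest(paths);
  await sleep(settleMs); // give Syncthing a head start, as in step 2 of the plan
  const res = await fetch(triggerUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ job, files }),
  });
  if (!res.ok) throw new Error(`Trigger failed: HTTP ${res.status}`);
}
```

Hashing on the T420 turns "checks if files match" into a concrete comparison. A mismatch just means Syncthing is still catching up.
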

---

### 📂 Shared State & Concurrency

**The Issue:** Moving away from the centralized bridge was the right call, but now your state is distributed. If the Manager writes to `PROJECT_STATUS.md` while the Secretary is reading it, the Secretary can get a partial read or run into a file lock.

**Recommendation: The "Bulletin Board" Protocol.** Since you are using the file system as a database (`/opt/atomizer/workspaces/shared_context/`), implement a strict protocol in the `atomizer-protocols` skill:

1. **Append-Only Logs:** For status updates, agents append to `project_log.md` rather than overwriting a status file.
2. **The "Talking Stick":** Only the **Secretary** may _rewrite/summarize_ `PROJECT_STATUS.md` based on the logs. Other agents only add log entries.
   - _Agent:_ "I finished task X" -> Appends to Log.
   - _Secretary (Periodic):_ Reads Log -> Updates Status Board.

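
Here is a minimal sketch of both rules, assuming the agents can run small Node helpers against the shared folder. Log lines use a hypothetical `timestamp | agent | message` format, and the function names are illustrative. `appendLog` adds each entry with a single append call, so on a local Linux filesystem agents never overwrite each other's lines. `updateStatusBoard` refuses any caller except the Secretary. It writes the new board to a temporary file and then uses `rename`, which replaces the file atomically on a POSIX filesystem. A reader therefore sees either the old board or the new one, never a half-written one.

```typescript
import { appendFileSync, readFileSync, renameSync, writeFileSync } from "node:fs";

// Any agent: record a status update as one appended line; never rewrite the log.
function appendLog(logPath: string, agent: string, message: string): void {
  const line = `${new Date().toISOString()} | ${agent} | ${message.replace(/\n/g, " ")}\n`;
  appendFileSync(logPath, line, "utf8");
}

// Secretary only: summarize the log into the status board.
function updateStatusBoard(agent: string, logPath: string, statusPath: string): void {
  if (agent !== "secretary") throw new Error(`${agent} may not rewrite the status board`);

  // Keep the latest entry per agent (later lines overwrite earlier ones).
  const latest = new Map<string, string>();
  for (const line of readFileSync(logPath, "utf8").split("\n")) {
    const [ts, who, ...rest] = line.split(" | ");
    if (!ts || !who || rest.length === 0) continue; // skip blank or malformed lines
    latest.set(who, `${rest.join(" | ")} (${ts})`);
  }

  const entries = Array.from(latest, ([who, msg]) => `- **${who}:** ${msg}`);
  const board = ["# PROJECT_STATUS", "", ...entries, ""].join("\n");

  // Write to a temp file in the same folder, then rename it over the old board.
  const tmpPath = `${statusPath}.tmp`;
  writeFileSync(tmpPath, board, "utf8");
  renameSync(tmpPath, statusPath);
}
```

Run `updateStatusBoard` on a timer in the Secretary. Every other agent only ever calls `appendLog`.
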

---

### 📝 Immediate To-Do List (Next 24 Hours)

Based on your docs, here is the path from "deployed" to "functional":

1. **Implement the `delegate_task` tool:** Give the Manager the ability to `curl` the other ports. Without this, you have no company, just 8 freelancers ignoring each other.
2. **Fix the Discord permissions:** Ensure your `cluster.sh` sets the environment variables correctly so that `18800` (Manager) behaves like a Manager and `18808` (Secretary) behaves like a Secretary.
3. **Define the "Handshake":** Update `atomizer-protocols` to explicitly tell agents: _"When you receive a message via hooks, treat it as a high-priority direct command."_
4. **Smoke Test Orchestration:**
   - User (Discord): "Manager, ask Webster to find the latest density of Ti-6Al-4V."
   - Manager (Internal): Calls `delegate_task("webster", "Find density of Ti-6Al-4V")`.
   - Webster (Discord): Replies in the thread with the data.

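
Before running the smoke test, confirm that every agent process is actually listening. Here is a minimal sketch, assuming all services bind to localhost on the ports quoted in this review (Manager on 18800, the others as in `PORT_MAP`). It only checks that a TCP connection succeeds, not that the agent behaves correctly. `CLUSTER_PORTS`, `isListening`, and `preflight` are hypothetical names.

```typescript
import * as net from "node:net";

// Ports as quoted in this review; adjust if your cluster.sh assigns others.
const CLUSTER_PORTS: Record<string, number> = {
  manager: 18800, technical: 18804, secretary: 18808, auditor: 18812,
  optimizer: 18816, "study-builder": 18820, "nx-expert": 18824, webster: 18828,
};

// Resolve true if a TCP connection to host:port succeeds within timeoutMs.
function isListening(port: number, host = "127.0.0.1", timeoutMs = 1000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ port, host });
    const done = (ok: boolean) => {
      socket.destroy(); // release the socket whichever outcome fired first
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => done(false));
    socket.once("connect", () => done(true));
    socket.once("error", () => done(false));
  });
}

// Return the agents that did not accept a connection (empty = all reachable).
async function preflight(): Promise<string[]> {
  const down: string[] = [];
  for (const [agent, port] of Object.entries(CLUSTER_PORTS)) {
    if (!(await isListening(port))) down.push(`${agent} (${port})`);
  }
  return down;
}
```

Call `preflight()` and check for an empty list before typing the Discord message. Any entry it returns names an agent whose service is down or bound to a different port.
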

This setup is impressive, Mario. You've effectively built a Kubernetes-style cluster using systemd and a laptop. Just add the networking glue (hooks), and it's alive.