docs: add HQ multi-agent framework documentation from PKM
- Project plan, agent roster, architecture, roadmap
- Decision log, full system plan, Discord setup/migration guides
- System implementation status (as-built)
- Cluster pivot history
- Orchestration engine plan (Phases 1-4)
- Webster and Auditor reviews
docs/hq/reviews/Review 2.md
@@ -0,0 +1,167 @@
---
tags:
  - Agentic
  - Project/Atomizer
up:
date: 2026-02-14
description:
summary: ""
status:
  - ongoing
---

###### Gemini said

This is a fascinating architectural evolution. You’ve successfully pivoted from a monolithic "simulated" multi-agent system (Slack Bridge) to a **distributed micro-service architecture** (The "Cluster" on Discord).

This "Cluster" approach (8 independent systemd services on one machine) is the correct long-term play for stability and native features, but it introduces **fragmentation** that you haven't fully solved yet.

Here is my challenge to your current implementation, followed by concrete technical proposals to fix the "Missing Glue."

---

### 🚨 Critical Challenge: The "Fake Delegation" Problem

**The Issue:** In `08-SYSTEM-IMPLEMENTATION-STATUS.md`, you admit:

> _"Manager's `[DELEGATE:...]` syntax is fake... It's just text output. Fix needed: Replace with actual exec + curl to hooks API."_

Right now, you have 8 geniuses sitting in 8 separate rooms (ports), but the Manager is shouting instructions at a wall. If the Manager says `[DELEGATE: nx-expert "mesh this"]`, nothing happens because OpenClaw doesn't know what that syntax means.

**The Fix: The `delegate_task` Skill** You need to build a custom tool (skill) for the Manager (and Tech Lead) that wraps the `curl` command to the Hooks API.

**Proposed Implementation:** Create a skill `skills/orchestration/index.ts` (or Python equivalent) available to the Manager:

```typescript
// Map agent names to their local ports (from your Doc 08)
const PORT_MAP: Record<string, number> = {
  "technical": 18804,
  "secretary": 18808,
  "auditor": 18812,
  "optimizer": 18816,
  "study-builder": 18820,
  "nx-expert": 18824,
  "webster": 18828
};

async function delegateTask(targetAgent: string, instruction: string, channelId: string): Promise<string> {
  const port = PORT_MAP[targetAgent];
  if (!port) throw new Error(`Unknown agent: ${targetAgent}`);

  // Call the OpenClaw Hooks API of the target agent.
  // This makes the target agent "hear" the instruction.
  // Assumption: the `/hooks/message` path and the payload fields are the proposed
  // shape; verify them against your OpenClaw hooks configuration.
  const response = await fetch(`http://localhost:${port}/hooks/message`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      message: `🚨 **ASSIGNMENT FROM MANAGER:**\n${instruction}`,
      channel_id: channelId, // Respond in the same Discord thread/channel
      user: "Manager_Agent"  // Identify sender so they know context
    })
  });

  // Surface delivery failures instead of reporting success unconditionally
  if (!response.ok) {
    throw new Error(`Delegation to ${targetAgent} failed: HTTP ${response.status}`);
  }

  return `Signal sent to ${targetAgent} on port ${port}. Monitor channel for response.`;
}
```

**Why this changes everything:**

1. The Manager actually _triggers_ the other process.

2. The target agent wakes up, sees the message in the correct Discord channel (via `channel_id`), and processes it.

3. The response appears natively in Discord.

---

### 🧠 Architectural Analysis: The T420 Bottleneck

**The Constraint:** You are running this on a **Lenovo T420**.

- **RAM:** You estimated 4.2 GB for 8 idle instances.

- **CPU:** When 2-3 agents are active (e.g., Manager delegating to Tech Lead + Auditor), that dual-core i5 is going to scream. Node.js processes + Syncthing + potential Python execution will create latency.

**Risk:** If OpenClaw keeps the full context window loaded in memory for all 8 agents, you will run out of memory (OOM) and start swapping to disk, killing performance.

**Proposed Improvements:**

1. **Aggressive Context Pruning:** In your `clawdbot.json` (or equivalent OpenClaw config per instance), ensure `context_limit` is strict. The "Secretary" (Haiku) doesn't need 200k tokens.

2. **Lazy Loading:** Ensure agents that are rarely used (like "Auditor") don't keep heavy skills/models loaded until summoned.

3. **Process Niceness:** In your `cluster.sh`, start non-critical agents (Webster, Secretary) with a lower process priority (`nice -n 10`) so the Manager and Tech Lead always get CPU preference.

---

### ⚡ The Windows Bridge (Speed vs. Reliability)

**The Issue:** You are relying on **Syncthing** for the Job Queue (Doc 05/08).

- _Pros:_ Bulletproof. If the network drops, it syncs later.

- _Cons:_ **Latency.** A 5-30 second delay. In a chat interface, 30 seconds of "thinking" feels like broken software.

**The Upgrade: Hybrid Triggering** Keep Syncthing for the heavy files (NX part files, results databases), but use **HTTP for the Trigger**.

1. **On Windows:** Run a tiny FastAPI server (listening on the Tailscale IP).

2. **On T420 (Study Builder Agent):** When `run_optimization.py` is ready:

    - Write files to the Syncthing folder.

    - _Wait 5 seconds._

    - Send an HTTP POST to Windows: `http://[Windows-Tailscale-IP]:8000/trigger-run`.

3. **On Windows:** The server receives the POST, checks if files match, and executes the script.

This cuts the "polling" loop out of the equation. The Windows machine reacts the moment the agent commands it.
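
The T420-side steps above (wait briefly, then POST the trigger) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `triggerRun`, `TriggerPayload`, `PostFn`, and the `studyId` and `files` fields are hypothetical names, and the JSON body is an assumed shape, since the proposal only specifies the `/trigger-run` URL. The HTTP function is passed in so that Node's global `fetch` or a test double can be used.

```typescript
// Hypothetical payload: the proposal only says "send HTTP POST", so these fields are assumptions.
interface TriggerPayload {
  studyId: string; // assumed identifier for the optimization run
  files: string[]; // assumed list of file names the Windows side should verify
}

// Minimal fetch-like signature, so the global fetch (Node 18+) or a test double can be passed in.
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call this only after the job files have been written to the Syncthing folder.
async function triggerRun(
  baseUrl: string,
  payload: TriggerPayload,
  post: PostFn,
  settleMs: number = 5000,
): Promise<number> {
  // Give Syncthing a head start before the trigger arrives.
  await sleep(settleMs);

  // POST the trigger to the Windows-side server.
  const response = await post(`${baseUrl}/trigger-run`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });

  // A non-2xx reply means the run was not accepted (for example, files not yet synced).
  if (!response.ok) {
    throw new Error(`Trigger rejected: HTTP ${response.status}`);
  }
  return response.status;
}
```

A call might look like `await triggerRun("http://[Windows-Tailscale-IP]:8000", { studyId: "study-01", files: ["run_optimization.py"] }, fetch)`. Because Syncthing can take up to 30 seconds, a fixed 5-second wait does not guarantee the files have arrived, so the file check on the Windows side stays essential and the caller may want to retry when the trigger is rejected.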

---

### 📂 Shared State & Concurrency

**The Issue:** You moved away from the centralized bridge, which is good, but now you have distributed state. If the "Manager" writes to `PROJECT_STATUS.md` at the same time the "Secretary" tries to read it, you might get partial reads or file locks.

**Recommendation: The "Bulletin Board" Protocol** Since you are using a file system as a database (`/opt/atomizer/workspaces/shared_context/`), implement a strict protocol in the `atomizer-protocols` skill:

1. **Append-Only Logs:** For status updates, agents should append to `project_log.md` rather than overwriting a status file.

2. **The "Talking Stick":** Only the **Secretary** should have permission to _rewrite/summarize_ the `PROJECT_STATUS.md` based on the logs. Other agents just add logs.

    - _Agent:_ "I finished task X" -> Appends to Log.

    - _Secretary (Periodic):_ Reads Log -> Updates Status Board.
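
The two roles in this protocol could be sketched as below. This is an illustrative sketch only: the one-line entry format (`<ISO timestamp> | <agent> | <message>`) and the function names `formatLogEntry`, `appendLogEntry`, and `latestStatusByAgent` are assumptions, not something the existing `atomizer-protocols` skill defines. It uses Node's `fs/promises` API; whether concurrent appends can interleave depends on the underlying filesystem, so treat the append as a way to avoid read-modify-write races rather than as a hard guarantee.

```typescript
import { appendFile, readFile } from "node:fs/promises";

// Assumed entry format: "<ISO timestamp> | <agent> | <message>", one entry per line.
function formatLogEntry(agent: string, message: string, at: Date = new Date()): string {
  // Collapse whitespace and newlines so each entry stays on exactly one line.
  const singleLine = message.replace(/\s+/g, " ").trim();
  return `${at.toISOString()} | ${agent} | ${singleLine}\n`;
}

// Any agent: append a new entry, never rewrite existing ones.
async function appendLogEntry(
  logPath: string,
  agent: string,
  message: string,
  at?: Date,
): Promise<void> {
  await appendFile(logPath, formatLogEntry(agent, message, at), "utf8");
}

// Secretary only: read the whole log and keep the most recent message per agent.
async function latestStatusByAgent(logPath: string): Promise<Map<string, string>> {
  const text = await readFile(logPath, "utf8");
  const latest = new Map<string, string>();
  for (const line of text.split("\n")) {
    const parts = line.split(" | ");
    if (parts.length < 3) continue; // skip blank or malformed lines
    const [, agent, ...rest] = parts;
    latest.set(agent, rest.join(" | ")); // later lines overwrite earlier ones
  }
  return latest;
}
```

The Secretary would periodically call `latestStatusByAgent` on `project_log.md` and rewrite `PROJECT_STATUS.md` from the result, which keeps a single writer on the status file while every other agent only appends.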

---

### 📝 Immediate To-Do List (Next 24 Hours)

Based on your docs, here is the exact path to move from "deployed" to "functional":

1. **Implement the `delegate_task` tool:** Give the Manager the ability to `curl` the other ports. Without this, you have no company, just 8 freelancers ignoring each other.

2. **Fix the Discord permissions:** Ensure your `cluster.sh` sets the environment variables correctly so port `18800` (Manager) behaves like a Manager and port `18808` (Secretary) behaves like a Secretary.

3. **Define the "Handshake":** Update `atomizer-protocols` to explicitly tell agents: _"When you receive a message via hooks, treat it as a high-priority direct command."_

4. **Smoke Test Orchestration:**

    - User (Discord): "Manager, ask Webster to find the latest density of Ti-6Al-4V."

    - Manager (Internal): Calls `delegate_task("webster", "Find density of Ti-6Al-4V", channel_id)`, passing the ID of the current Discord channel.

    - Webster (Discord): Replies in the thread with the data.

This setup is impressive, Mario. You've effectively built a Kubernetes-style cluster using systemd and a laptop. Just add the networking glue (hooks), and it's alive.