# 10 — Orchestration Engine: Multi-Instance Intelligence

> **Status:** Phases 1-3 Complete — Phase 4 (Metrics + Docs) in progress
> **Author:** Mario Lavoie (with Antoine)
> **Date:** 2026-02-15
> **Revised:** 2026-02-15 — Incorporated Webster's review (validation loops, error handling, hierarchical delegation)

---

## Problem Statement

The Atomizer HQ cluster runs 8 independent OpenClaw instances (one per agent). This gives us true parallelism, specialized contexts, and independent Discord identities — but we lost the orchestration primitives that make a single OpenClaw instance powerful:

- **`sessions_spawn`** — synchronous delegation with result return
- **`sessions_history`** — cross-session context reading
- **`sessions_send`** — bidirectional inter-session messaging

The current `delegate.sh` is fire-and-forget. Manager throws a task over the wall and hopes. No result flows back. No chaining. No intelligent multi-step workflows.

**Goal:** Rebuild OpenClaw's orchestration power at the inter-instance level, enhanced with Discord channel context and a capability registry.

---

## Architecture Overview

Three layers, each building on the last:

```
┌─────────────────────────────────────────────────────┐
│  LAYER 3: WORKFLOWS                                 │
│  YAML-defined multi-step pipelines                  │
│  (sequential, parallel, conditional branching)      │
├─────────────────────────────────────────────────────┤
│  LAYER 2: SMART ROUTING                             │
│  Capability registry + channel context              │
│  (manager knows who can do what + project state)    │
├─────────────────────────────────────────────────────┤
│  LAYER 1: ORCHESTRATION CORE                        │
│  Synchronous delegation + result return protocol    │
│  (replaces fire-and-forget delegate.sh)             │
├─────────────────────────────────────────────────────┤
│  EXISTING INFRASTRUCTURE                            │
│  8 OpenClaw instances, hooks API, shared filesystem │
└─────────────────────────────────────────────────────┘
```

---

## Layer 1: Orchestration Core

**What it does:** Replaces `delegate.sh` with synchronous delegation.
Manager sends a task, waits for the result, gets structured output back. Can then chain to the next agent.

### 1.1 — The Orchestrate Script

**File:** `/home/papa/atomizer/workspaces/shared/skills/orchestrate/orchestrate.sh`

**Behavior:**
1. Send task to target agent via `/hooks/agent` (existing mechanism)
2. Poll the agent's session for completion via `/hooks/status/{runId}` or `/sessions` API
3. Capture the agent's response (structured output)
4. Return it to the calling agent's session

```bash
# Usage
result=$(bash orchestrate.sh <agent> "<task>" [options])

# Example: synchronous delegation
result=$(bash orchestrate.sh webster "Find CTE of Zerodur Class 0 at 20-40°C" --wait --timeout 120)
echo "$result"  # Structured findings returned to manager's session
```

**Options:**
- `--wait` — Block until agent completes (default for orchestrate)
- `--timeout <seconds>` — Max wait time (default: 300)
- `--retries <N>` — Retry on failure (default: 1, max: 3)
- `--format json|text` — Expected response format
- `--context <file>` — Attach context file to the task
- `--channel-context <channel> [--messages N]` — Include recent channel history as context
- `--validate` — Run lightweight self-check on agent output before returning
- `--no-deliver` — Don't post to Discord (manager will synthesize and post)

### 1.2 — Report-Back Protocol

Each agent gets instructions in their SOUL.md to format delegation responses:

```markdown
## When responding to a delegated task:
Structure your response as:

**TASK:** [restate what was asked]
**STATUS:** complete | partial | blocked | failed
**RESULT:** [your findings/output]
**ARTIFACTS:** [any files created, with paths]
**CONFIDENCE:** high | medium | low
**NOTES:** [caveats, assumptions, open questions]
```

This gives manager structured data to reason about, not just a wall of text.
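To show what "structured data to reason about" could look like on the receiving side, here is a minimal Python sketch that splits a report-back response into fields. The function name `parse_report` and the handling of multi-line values are assumptions made for illustration; this is not taken from `orchestrate.py`.

```python
import re

# Field names come from the report-back template above.
FIELDS = ("TASK", "STATUS", "RESULT", "ARTIFACTS", "CONFIDENCE", "NOTES")
_FIELD_RE = re.compile(r"^\*\*(" + "|".join(FIELDS) + r"):\*\*\s*(.*)$")


def parse_report(text: str) -> dict:
    """Split a report-back response into a dict keyed by lowercase field name."""
    report = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = _FIELD_RE.match(stripped)
        if match:
            # A new "**FIELD:** value" line starts a new field.
            current = match.group(1).lower()
            report[current] = match.group(2).strip()
        elif current is not None and stripped:
            # Assumption: any other non-empty line continues the current field.
            report[current] = (report[current] + "\n" + stripped).strip()
    return report
```

A caller can then branch on `report["status"]` and `report["confidence"]` instead of scanning free text. Fields the agent omitted are simply absent from the dict, which makes a missing `STATUS` easy to detect.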
### 1.3 — Validation & Self-Check Protocol

Every delegated response goes through a lightweight validation before the orchestrator accepts it:

**Self-Check (built into agent SOUL.md instructions):**
Each agent, when responding to a delegated task, must verify:
- Did I answer all parts of the question?
- Did I provide sources/evidence where applicable?
- Is my confidence rating honest?

If the agent's self-check identifies gaps, it sets `STATUS: partial` and explains what's missing in `NOTES`.

**Orchestrator-Side Validation (in `orchestrate.sh`):**
When `--validate` is passed (or for workflow steps with `validation` blocks):
1. Check that handoff JSON has all required fields (status, result, confidence)
2. If `STATUS: failed` or `STATUS: blocked` → trigger retry (up to `--retries` limit)
3. If `STATUS: partial` and confidence is `low` → retry with refined prompt including the partial result
4. If retries exhausted → return partial result with warning flag for the orchestrator to decide

**Full Audit Validation (for high-stakes steps):**
Workflow YAML can specify a validation agent (typically auditor) for critical steps:

```yaml
- id: research
  agent: webster
  task: "Research materials..."
  validation:
    agent: auditor
    criteria: "Are all requested properties present with credible sources?"
    on_fail: retry
    max_retries: 2
```

This runs the auditor on the output before passing it downstream. Prevents garbage-in-garbage-out in critical pipelines.
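The four orchestrator-side rules can be read as a small decision function. The sketch below is one possible reading, not the shipped logic: the `Action` names are hypothetical, and treating a handoff with missing required fields as retryable is an assumption (the rules above only say the fields are checked).

```python
from enum import Enum

# Required fields named in rule 1 above.
REQUIRED = ("status", "result", "confidence")


class Action(Enum):
    ACCEPT = "accept"
    RETRY = "retry"
    RETRY_WITH_PARTIAL = "retry_with_partial"
    RETURN_WITH_WARNING = "return_with_warning"


def next_action(handoff: dict, attempt: int, max_attempts: int) -> Action:
    """Decide what to do with a handoff after attempt number `attempt` (1-based)."""
    status = handoff.get("status")
    missing = [name for name in REQUIRED if name not in handoff]
    # Rules 1 and 2: incomplete, failed, or blocked handoffs need a plain retry.
    needs_retry = bool(missing) or status in ("failed", "blocked")
    # Rule 3: a low-confidence partial result is retried with the partial attached.
    needs_refine = status == "partial" and handoff.get("confidence") == "low"
    if not (needs_retry or needs_refine):
        return Action.ACCEPT
    # Rule 4: out of attempts, so hand back what we have with a warning.
    if attempt >= max_attempts:
        return Action.RETURN_WITH_WARNING
    return Action.RETRY if needs_retry else Action.RETRY_WITH_PARTIAL
```

Note that under this reading a `partial` result with `medium` or `high` confidence is accepted as-is, since the rules only call for a retry when confidence is `low`.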
### 1.4 — Error Handling (Phase 1 Priority)

Error handling is not deferred — it ships with the orchestration core:

**Agent unreachable:**
- `orchestrate.sh` checks health endpoint before sending
- If agent is down: log error, return immediately with `STATUS: error, reason: agent_unreachable`
- Caller (manager or workflow engine) decides whether to retry, skip, or abort

**Timeout:**
- Configurable per call (`--timeout`) and per workflow step
- On timeout: kill the polling loop, check if partial handoff exists
- If partial result available: return it with `STATUS: timeout_partial`
- If no result: return `STATUS: timeout`

**Malformed response:**
- Agent didn't write handoff file or wrote invalid JSON
- `orchestrate.sh` validates JSON schema before returning
- On malformed: retry once with explicit reminder to write structured output
- If still malformed: return raw text with `STATUS: malformed`

**Retry logic (with idempotency):**
```
Attempt 1: Generate idempotencyKey={wfRunId}_{stepId}_1 → Send task → wait → check result
  If timeout → Check if handoff file exists (late arrival). If yes → use it. If no:
Attempt 2: idempotencyKey={wfRunId}_{stepId}_2 → Resend with "Previous attempt failed: {reason}. Please retry."
  If timeout → Same late-arrival check. If no:
Attempt 3 (if --retries 3): Same pattern
  If fail → Return error to caller with all attempt details
```

**Key rule:** Before every retry, check if the handoff file from the previous attempt landed. Prevents duplicate work when an agent was just slow, not dead.
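The retry pattern above, including the late-arrival check, can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in `orchestrate.py`: `send_and_wait` is a hypothetical callable that sends the task and returns `False` on timeout, and handoff files are assumed to be named after the idempotency key so each attempt has its own file.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def read_handoff(path: Path) -> Optional[dict]:
    """Return the parsed handoff, or None if the file is missing or malformed."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None


def run_with_retries(
    send_and_wait: Callable[[str, str], bool],  # hypothetical: (key, prompt) -> finished in time?
    handoff_dir: Path,
    wf_run_id: str,
    step_id: str,
    task: str,
    max_attempts: int = 3,
) -> dict:
    attempts = []
    prompt = task
    for attempt in range(1, max_attempts + 1):
        key = f"{wf_run_id}_{step_id}_{attempt}"
        finished = send_and_wait(key, prompt)
        # Late-arrival check: look for the file even after a timeout, because
        # a slow agent may have written it just as the wait expired.
        handoff = read_handoff(handoff_dir / f"{key}.json")
        if handoff is not None and handoff.get("status") not in ("failed", "blocked"):
            return handoff
        if handoff is not None:
            reason = str(handoff.get("status"))
        else:
            reason = "malformed_or_missing" if finished else "timeout"
        attempts.append({"idempotencyKey": key, "reason": reason})
        prompt = f"{task}\n\nPrevious attempt failed: {reason}. Please retry."
    # All attempts used: return an error carrying every attempt's details.
    return {"status": "error", "reason": "retries_exhausted", "attempts": attempts}
```

Passing `send_and_wait` in as a parameter keeps the retry policy separate from the transport, so the same loop can be exercised in tests with a fake agent.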
### 1.5 — Result Capture Mechanism

Two options (implement both, prefer A):

**Option A — File-based handoff:**
- Agent writes result to `/home/papa/atomizer/handoffs/{runId}.json`
- Orchestrate script polls for file existence
- Clean, simple, works with shared filesystem

```json
{
  "schemaVersion": "1.0",
  "runId": "hook-delegation-1739587200",
  "idempotencyKey": "wf-mat-study-001_research_1",
  "workflowRunId": "wf-mat-study-001",
  "stepId": "research",
  "attempt": 1,
  "agent": "webster",
  "status": "complete",
  "result": "Zerodur Class 0 CTE: 0 ± 0.007 ppm/K (20-40°C)...",
  "artifacts": [],
  "confidence": "high",
  "latencyMs": 45200,
  "timestamp": "2026-02-15T03:00:00Z"
}
```

**Required fields:** `schemaVersion`, `runId`, `agent`, `status`, `result`, `confidence`, `timestamp`
**Trace fields (required):** `workflowRunId`, `stepId`, `attempt`, `latencyMs`
**Idempotency:** `idempotencyKey` = `{workflowRunId}_{stepId}_{attempt}`. Orchestrator checks for existing handoff before retrying — if result exists, skip resend.
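A handoff check against the field lists above might look like the following Python sketch. The function names are hypothetical and the idempotency-key consistency check is an added assumption; the document only states how the key is composed.

```python
import json

# Field lists copied from the handoff schema described above.
REQUIRED_FIELDS = ("schemaVersion", "runId", "agent", "status",
                   "result", "confidence", "timestamp")
TRACE_FIELDS = ("workflowRunId", "stepId", "attempt", "latencyMs")


def idempotency_key(workflow_run_id: str, step_id: str, attempt: int) -> str:
    """Compose the key as {workflowRunId}_{stepId}_{attempt}."""
    return f"{workflow_run_id}_{step_id}_{attempt}"


def validate_handoff(raw: str) -> list:
    """Return a list of problems; an empty list means the handoff passes."""
    try:
        handoff = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(handoff, dict):
        return ["top-level value is not a JSON object"]
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS + TRACE_FIELDS
                if name not in handoff]
    # Assumption: if a key is present, it should match the trace fields.
    if not problems and "idempotencyKey" in handoff:
        expected = idempotency_key(
            handoff["workflowRunId"], handoff["stepId"], handoff["attempt"])
        if handoff["idempotencyKey"] != expected:
            problems.append(f"idempotencyKey mismatch: expected {expected}")
    return problems
```

Returning a list of problems rather than a boolean lets the orchestrator include the specific gaps in a retry prompt or in the orchestration log.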
**Option B — Hooks callback:**
- Agent calls manager's `/hooks/report` endpoint with result
- More real-time but adds complexity
- Use for time-sensitive workflows

### 1.6 — Chaining Example

```bash
# Manager orchestrates a material trade study

# Step 1: Research
data=$(bash orchestrate.sh webster "Research Clearceram-Z HS vs Zerodur Class 0: CTE, density, cost, lead time" --wait)

# Step 2: Technical evaluation (pass webster's findings as context)
echo "$data" > /tmp/material_data.json
assessment=$(bash orchestrate.sh tech-lead "Evaluate these materials for M2/M3 mirrors against our thermal requirements" --context /tmp/material_data.json --wait)

# Step 3: Audit
echo "$assessment" > /tmp/assessment.json
audit=$(bash orchestrate.sh auditor "Review this technical assessment for completeness" --context /tmp/assessment.json --wait)

# Step 4: Manager synthesizes and delivers
# (Manager has all three results in-session, reasons about them, posts to Discord)
```

---

## Layer 2: Smart Routing

**What it does:** Manager knows each agent's capabilities, strengths, and model. Routes tasks intelligently without hardcoded logic.
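As a rough illustration of capability-based routing, the Python sketch below ranks agents by how many requested capabilities they list. It assumes registry entries shaped like those in section 2.1 (`agents` → name → `capabilities`). In this design the Manager makes the routing decision by reading the registry, so a deterministic helper like this hypothetical `route` function would at most serve as a pre-filter.

```python
def route(registry: dict, needed: set) -> list:
    """Rank agent names by how many needed capabilities they cover, best first."""
    scored = []
    for name, entry in registry["agents"].items():
        overlap = needed & set(entry.get("capabilities", []))
        if overlap:
            scored.append((len(overlap), name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[0] and item[1]))
    return [name for _, name in scored]


# Two-agent excerpt using capability names from the registry in section 2.1.
registry = {
    "agents": {
        "webster": {"capabilities": ["web-research", "data-lookup"]},
        "tech-lead": {"capabilities": ["material-selection", "trade-studies"]},
    }
}
print(route(registry, {"web-research", "data-lookup"}))
```

An empty result means no agent advertises any of the requested capabilities, which is a useful signal that the task needs decomposition or a registry update.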
### 2.1 — Agent Capability Registry

**File:** `/home/papa/atomizer/workspaces/shared/AGENTS_REGISTRY.json`

```json
{
  "agents": {
    "tech-lead": {
      "port": 18804,
      "model": "anthropic/claude-opus-4-6",
      "capabilities": [
        "fea-review", "design-decisions", "technical-analysis",
        "material-selection", "requirements-validation", "trade-studies"
      ],
      "strengths": "Deep reasoning, technical judgment, complex analysis",
      "limitations": "Slow (Opus), expensive tokens — use for high-value decisions",
      "inputFormat": "Technical problem with context and constraints",
      "outputFormat": "Structured analysis with recommendations and rationale",
      "channels": ["#hq", "#technical"]
    },
    "webster": {
      "port": 18828,
      "model": "google/gemini-2.5-pro",
      "capabilities": [
        "web-research", "literature-review", "data-lookup",
        "supplier-search", "standards-lookup", "competitive-analysis"
      ],
      "strengths": "Fast research, broad knowledge, cheap tokens, web access",
      "limitations": "No deep technical judgment — finds data, doesn't evaluate it",
      "inputFormat": "Natural language query with specifics",
      "outputFormat": "Structured findings with sources and confidence",
      "channels": ["#hq", "#research"]
    },
    "optimizer": {
      "port": 18816,
      "model": "anthropic/claude-sonnet-4-20250514",
      "capabilities": [
        "optimization-setup", "parameter-studies", "objective-definition",
        "constraint-formulation", "result-interpretation", "sensitivity-analysis"
      ],
      "strengths": "Optimization methodology, mathematical formulation, DOE",
      "limitations": "Needs clear problem definition — not for open-ended exploration",
      "inputFormat": "Optimization problem with objectives, variables, constraints",
      "outputFormat": "Study configuration, parameter definitions, result analysis",
      "channels": ["#hq", "#optimization"]
    },
    "study-builder": {
      "port": 18820,
      "model": "anthropic/claude-sonnet-4-20250514",
      "capabilities": [
        "study-configuration", "doe-setup", "batch-generation",
        "parameter-sweeps", "study-templates"
      ],
      "strengths": "Translating optimization plans into executable study configs",
      "limitations": "Needs optimizer's plan as input — doesn't design studies independently",
      "inputFormat": "Study plan from optimizer with parameter ranges",
      "outputFormat": "Ready-to-run study configuration files",
      "channels": ["#hq", "#optimization"]
    },
    "nx-expert": {
      "port": 18824,
      "model": "anthropic/claude-sonnet-4-20250514",
      "capabilities": [
        "nx-operations", "mesh-generation", "boundary-conditions",
        "nastran-setup", "cad-manipulation", "post-processing"
      ],
      "strengths": "NX/Simcenter expertise, FEA model setup, hands-on CAD/FEM work",
      "limitations": "Needs clear instructions — not for high-level design decisions",
      "inputFormat": "Specific NX task with model reference and parameters",
      "outputFormat": "Completed operation with verification screenshots/data",
      "channels": ["#hq", "#nx-work"]
    },
    "auditor": {
      "port": 18812,
      "model": "anthropic/claude-opus-4-6",
      "capabilities": [
        "quality-review", "compliance-check", "methodology-audit",
        "assumption-validation", "report-review", "standards-compliance"
      ],
      "strengths": "Critical eye, finds gaps and errors, ensures rigor",
      "limitations": "Reviews work, doesn't create it — needs output from other agents",
      "inputFormat": "Work product to review with applicable standards/requirements",
      "outputFormat": "Structured review: findings, severity, recommendations",
      "channels": ["#hq", "#quality"]
    },
    "secretary": {
      "port": 18808,
      "model": "google/gemini-2.5-flash",
      "capabilities": [
        "meeting-notes", "status-reports", "documentation",
        "scheduling", "action-tracking", "communication-drafting"
      ],
      "strengths": "Fast, cheap, good at summarization and admin tasks",
      "limitations": "Not for technical work — administrative and organizational only",
      "inputFormat": "Admin task or raw content to organize",
      "outputFormat": "Clean documentation, summaries, action lists",
      "channels": ["#hq", "#admin"]
    },
    "manager": {
      "port": 18800,
      "model": "anthropic/claude-opus-4-6",
      "capabilities": [
        "orchestration", "project-planning", "task-decomposition",
        "priority-management", "stakeholder-communication", "workflow-execution"
      ],
      "strengths": "Strategic thinking, orchestration, synthesis across agents",
      "limitations": "Should not do technical work — delegates everything",
      "inputFormat": "High-level directives from Antoine (CEO)",
      "outputFormat": "Plans, status updates, synthesized deliverables",
      "channels": ["#hq"]
    }
  }
}
```

### 2.2 — Manager Routing Logic

Added to Manager's SOUL.md as a skill directive:

```markdown
## Smart Routing
Before delegating, consult `/home/papa/atomizer/workspaces/shared/AGENTS_REGISTRY.json`.
- Match task requirements to agent capabilities
- Consider model strengths (Opus for reasoning, Gemini for speed, Sonnet for balanced)
- For multi-step tasks, plan the full pipeline before starting
- Prefer parallel execution when steps are independent
- Always specify what you need back (don't let agents guess)
```

### 2.3 — Discord Channel Context Integration

**How channels feed context into orchestration:**

Each Discord channel accumulates project-specific conversation history. The orchestration layer can pull this as context:

```bash
# In orchestrate.sh, --channel-context fetches recent messages
bash orchestrate.sh tech-lead "Review thermal margins for M2" \
  --channel-context "#gigabit-m1" --messages 50 \
  --wait
```

**Implementation:** Use Discord bot API (each instance has a bot token) to fetch channel message history. Format as context block prepended to the task.
**Channel strategy for Atomizer HQ Discord:**

| Channel | Purpose | Context Value |
|---------|---------|---------------|
| `#hq` | Cross-team coordination, announcements | Project-wide decisions |
| `#technical` | FEA discussions, design decisions | Technical context for analysis tasks |
| `#optimization` | Study configs, results, methodology | Optimization history and patterns |
| `#research` | Webster's findings, literature | Reference data for technical work |
| `#quality` | Audit findings, compliance notes | Review standards and past issues |
| `#nx-work` | CAD/FEM operations, model updates | Model state and recent changes |
| `#admin` | Meeting notes, schedules, action items | Project timeline and commitments |
| `#handoffs` | Automated orchestration results (bot-only) | Pipeline audit trail |

**Key insight:** Channels become **persistent, queryable context stores**. Instead of passing massive context blocks between agents, you say "read #technical for the last 20 messages" and the agent absorbs project state naturally.

**Channel Context Sanitization (security):**
Discord history is untrusted input. Before injecting into an agent's context:
- Cap at configurable token window (default: last 30 messages, max ~4K tokens)
- Strip any system-prompt-like instructions from message content
- Tag entire block as `[CHANNEL CONTEXT — untrusted, for reference only]`
- Never let channel content override task instructions

This prevents prompt injection via crafted Discord messages in channel history.

---

## Layer 3: Workflow Engine

**What it does:** Defines reusable multi-step pipelines as YAML. Manager reads and executes them. No coding needed to create new workflows.
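To make "executes them" concrete, here is a minimal Python sketch of how steps carrying a `depends_on` list (the field used in the workflow definitions in this section) could be grouped into execution layers, where every step in a layer can run in parallel. It uses the standard-library `graphlib` module (Python 3.9+) and illustrates the technique only; the function name `execution_layers` is hypothetical and this is not the code in `workflow.py`.

```python
from graphlib import CycleError, TopologicalSorter


def execution_layers(steps: list) -> list:
    """Group step IDs into layers; each layer depends only on earlier layers."""
    # Map each step ID to the set of step IDs it depends on.
    graph = {step["id"]: set(step.get("depends_on", [])) for step in steps}
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"unknown step IDs in depends_on: {sorted(unknown)}")
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        layers.append(ready)
        sorter.done(*ready)
    return layers


# Step IDs and dependencies mirror the Design Review template in section 3.2.
design_review = [
    {"id": "prepare"},
    {"id": "technical_review", "depends_on": ["prepare"]},
    {"id": "optimization_review", "depends_on": ["prepare"]},
    {"id": "audit", "depends_on": ["technical_review", "optimization_review"]},
    {"id": "deliver", "depends_on": ["audit"]},
]
print(execution_layers(design_review))
```

A real executor would dispatch each layer's steps concurrently and call `done` per step as results arrive, rather than waiting for the whole layer, but the layered view is enough for a dry-run plan.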
### 3.1 — Workflow Definition Format

**Location:** `/home/papa/atomizer/workspaces/shared/workflows/`

```yaml
# /home/papa/atomizer/workspaces/shared/workflows/material-trade-study.yaml
name: Material Trade Study
description: Research, evaluate, and audit material options for optical components
trigger: manual  # or: keyword, schedule

inputs:
  materials:
    type: list
    description: "Materials to compare"
  requirements:
    type: text
    description: "Performance requirements and constraints"
  project_channel:
    type: channel
    description: "Project channel for context"

steps:
  - id: research
    agent: webster
    task: |
      Research the following materials: {materials}
      For each material, find: CTE (with temperature range), density, Young's modulus,
      cost per kg, lead time, availability, and any known issues for optical applications.
      Provide sources for all data.
    channel_context: "{project_channel}"
    channel_messages: 30
    timeout: 180
    retries: 2
    output: material_data
    validation:
      agent: auditor
      criteria: "Are all requested material properties present with credible sources? Flag any missing data."
      on_fail: retry

  - id: evaluate
    agent: tech-lead
    task: |
      Evaluate these materials against our requirements:

      REQUIREMENTS:
      {requirements}

      MATERIAL DATA:
      {material_data}

      Provide a recommendation with full rationale.
      Include a comparison matrix.
    depends_on: [research]
    timeout: 300
    retries: 1
    output: technical_assessment

  - id: audit
    agent: auditor
    task: |
      Review this material trade study for completeness,
      methodological rigor, and potential gaps:

      {technical_assessment}

      Check: Are all requirements addressed? Are sources credible?
      Are there materials that should have been considered but weren't?
    depends_on: [evaluate]
    timeout: 180
    output: audit_result

  - id: synthesize
    agent: manager
    action: synthesize  # Manager processes internally, doesn't delegate
    inputs: [material_data, technical_assessment, audit_result]
    deliver:
      channel: "{project_channel}"
      format: summary  # Manager writes a clean summary post

notifications:
  on_complete: "#hq"
  on_failure: "#hq"
```

### 3.2 — More Workflow Templates

**Design Review:**
```yaml
name: Design Review
steps:
  - id: prepare
    agent: secretary
    task: "Compile design package: gather latest CAD screenshots, analysis results, and requirements from {project_channel}"

  - id: technical_review
    agent: tech-lead
    task: "Review design against requirements: {prepare}"
    depends_on: [prepare]

  - id: optimization_review
    agent: optimizer
    task: "Assess optimization potential: {prepare}"
    depends_on: [prepare]

  # technical_review and optimization_review run in PARALLEL (no dependency between them)

  - id: audit
    agent: auditor
    task: "Final review: {technical_review} + {optimization_review}"
    depends_on: [technical_review, optimization_review]

  - id: deliver
    agent: secretary
    task: "Format design review report from: {audit}"
    depends_on: [audit]
    deliver:
      channel: "{project_channel}"
```

**Quick Research:**
```yaml
name: Quick Research
steps:
  - id: research
    agent: webster
    task: "{query}"
    timeout: 120
    output: findings

  - id: validate
    agent: tech-lead
    task: "Verify these findings are accurate and relevant: {findings}"
    depends_on: [research]
    deliver:
      channel: "{request_channel}"
```

### 3.3 — Workflow Executor

**File:** `/home/papa/atomizer/workspaces/shared/skills/orchestrate/workflow.sh`

The manager's orchestration skill reads YAML workflows and executes them:

```bash
# Run a workflow
bash workflow.sh material-trade-study \
  --input materials="Zerodur Class 0, Clearceram-Z HS, ULE" \
  --input requirements="CTE < 0.01 ppm/K at 22°C, aperture 250mm" \
  --input project_channel="#gigabit-m1"
```

**Executor logic:**
1. Parse YAML workflow definition
2. Resolve dependencies → build execution graph
3. Execute steps in order (parallel when no dependencies)
4. For each step: call `orchestrate.sh` with task + resolved inputs
5. Store results in `/home/papa/atomizer/handoffs/workflows/{workflow-run-id}/`
6. On completion: deliver final output to specified channel
7. On failure: notify `#hq` with error details and partial results

---

## Implementation Plan

### Phase 1: Orchestration Core + Validation + Error Handling (Day 1 — Feb 15) ✅ COMPLETE

**Actual effort: ~6 hours**

- [x] **1.1** Created `/home/papa/atomizer/workspaces/shared/skills/orchestrate/` directory
- [x] **1.2** Built `orchestrate.py` (Python, not bash) — synchronous delegation with inotify-based waiting
  - Send via `/hooks/agent` (existing)
  - inotify watches handoff directory for result file
  - Timeout handling (configurable per call, `--timeout`)
  - Retry logic (`--retries N`, max 3, with error context)
  - Returns structured JSON result to caller
  - Thin bash wrapper: `orchestrate.sh`
- [x] **1.3** Created `/home/papa/atomizer/handoffs/` directory for result passing
- [x] **1.4** Updated all 8 agent SOUL.md files with:
  - Structured response format for delegated tasks (JSON handoff protocol)
  - Self-check protocol (verify completeness before submitting)
  - Write result to `/home/papa/atomizer/handoffs/{runId}.json` on completion
- [x] **1.5** Implemented error handling in `orchestrate.py`
  - Health check before sending (agent health endpoint)
  - Timeout with partial result recovery
  - Malformed response detection and retry
  - Idempotency check before retry (check if handoff file landed late)
  - All errors logged to `/home/papa/atomizer/logs/orchestration/`
- [x] **1.6** Implemented trace logging in handoff files
  - Required fields validated: `schemaVersion`, `runId`, `agent`, `status`, `result`, `confidence`, `timestamp`
  - Unified JSONL logging with trace fields
- [x] **1.7** Implemented `--validate` flag for strict orchestrator-side output validation
- [x] **1.8**
Deployed `orchestrate` skill to Manager (SOUL.md + TOOLS.md updated)
- [x] **1.9** Test: Manager → Webster smoke tests passed (18-49s response times, 12 successful handoffs)
  - Chain test (Webster → Tech-Lead): Webster completed, Tech-Lead returned `partial` due to missing context passthrough — engine bug, not protocol bug
- [x] **1.10** Test: ACL enforcement works (deny/allow), strict validation works
- [x] **1.11** `delegate.sh` kept as fallback for fire-and-forget use cases

**Key implementation decisions:**
- Python (`orchestrate.py`) over bash for all logic — better JSON handling, inotify support, error handling
- `inotify_simple` for instant file detection (no polling)
- Session key format: `hook:orchestrate:{run_id}:{attempt}`
- ACL matrix hardcoded: Manager → all; Tech-Lead → webster/nx-expert/study-builder/secretary; Optimizer → webster/study-builder/secretary

**Known issues to fix in Phase 2:**
- Chain context passthrough: when chaining A→B→C, B's result must be explicitly injected into C's task
- Webster's Brave API key intermittently fails (recovered on retry)
- Manager Discord WebSocket reconnect loop (code 1005) — doesn't affect orchestration but blocks channel posting

### Phase 2: Smart Routing + Channel Context + Hierarchical Delegation (Day 1-2 — Feb 15-16)

**Estimated effort: 4-5 hours**

- [x] **2.1** Create `AGENTS_REGISTRY.json` in shared workspace *(completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)*
- [x] **2.2** Update Manager's SOUL.md with routing instructions *(completed 2026-02-15)*
- [x] **2.3** Build channel context fetcher (`fetch-channel-context.sh`) *(completed 2026-02-15)*
  - Uses Discord bot token to pull recent messages
  - Formats as markdown context block
  - Integrates with `orchestrate.sh` via `--channel-context` flag
- [x] **2.4** Set up Discord channels per the channel strategy table *(completed 2026-02-15)*
- [x] **2.5** Implement hierarchical delegation *(completed 2026-02-15)*
  - Deploy `orchestrate` skill to Tech-Lead and Optimizer
  - Add sub-orchestration rules to their SOUL.md (can delegate to: Webster, Study-Builder, NX-Expert, Secretary)
  - Cannot delegate to: Manager, Auditor, each other (prevents loops)
  - All sub-delegations logged to `/home/papa/atomizer/handoffs/sub/` for Manager visibility
- [x] **2.6** Enforce delegation ACL matrix in `orchestrate.sh` runtime *(completed 2026-02-15)*
  - Hardcoded check: caller + target validated against allowed pairs
  - Manager → can delegate to all agents
  - Tech-Lead → can delegate to: Webster, NX-Expert, Study-Builder, Secretary
  - Optimizer → can delegate to: Webster, Study-Builder, Secretary
  - All others → cannot sub-delegate (must go through Manager)
  - Block self-delegation and circular paths at runtime (not just SOUL.md policy)
- [x] **2.7** Implement channel context sanitization *(completed 2026-02-15)*
  - Cap token window, strip system-like instructions, tag as untrusted
- [x] **2.8** Test: Manager auto-routes a task based on registry + includes channel context *(completed 2026-02-15)*
- [x] **2.9** Test: Tech-Lead delegates a data lookup to Webster mid-analysis *(completed 2026-02-15)*
- [x] **2.10** Test: Auditor tries to sub-delegate → blocked by ACL *(completed 2026-02-15)*

### Phase 3: Workflow Engine (Day 2-3 — Feb 16-17)

**Estimated effort: 6-8 hours**

- [x] **3.1** Build YAML workflow parser (Python script)
  - Implemented in `workflow.py` with name/path resolution from `/home/papa/atomizer/workspaces/shared/workflows/`, schema checks, step-ID validation, dependency validation, and cycle detection.
- [x] **3.2** Build workflow executor (`workflow.sh`)
  - Dependency resolution
  - Parallel step execution
  - Variable substitution
  - Error handling and partial results
  - Implemented executor in `workflow.py` with `ThreadPoolExecutor`, dependency-aware scheduling, step-level `on_fail` handling (`skip`/`abort`), overall timeout enforcement, approval gates, and JSON summary output.
  - Added thin wrapper `workflow.sh`.
- [x] **3.3** Create initial workflow templates:
  - `material-trade-study.yaml`
  - `design-review.yaml`
  - `quick-research.yaml`
- [x] **3.4** Deploy workflow skill to Manager
  - Updated Manager `SOUL.md` with a dedicated "Running Workflows" section and command example.
  - Updated Manager `TOOLS.md` with `workflow.py`/`workflow.sh` references and usage.
- [x] **3.5** Implement approval gates in workflow YAML
  - `workflow.py` now supports `approval_gate` prompts (`yes`/`no`) before step execution.
  - In `--non-interactive` mode, approval gates are skipped with warnings.
- [x] **3.6** Add workflow dry-run mode (`--dry-run`)
  - Validates dependency graph and variable substitutions without executing
  - Reports: step metadata, dependency-based execution layers, and run output directory
  - Implemented dry-run planning output including step metadata, dependency layers, and run result directory.
- [x] **3.7** Test: Run full material trade study workflow end-to-end
  - quick-research workflow tested E2E twice — Webster→Tech-Lead chain, 50s and 149s runs, Manager posted results to Discord
- [x] **3.8** Create `#handoffs` channel for orchestration audit trail
  - Skipped — using workflow result directories instead of dedicated #handoffs channel

**Phase 3 completion notes:**
- `workflow.py`: 15KB Python, supports YAML parsing, dependency graphs, parallel execution (`ThreadPoolExecutor`), variable substitution, approval gates, dry-run, per-step result persistence
- 3 workflow templates: `material-trade-study`, `quick-research`, `design-review`
- `design-review` dry-run confirmed parallel execution detection (tech-lead + optimizer simultaneous)
- Manager successfully ran workflow from Discord prompt, parsed JSON output, and posted synthesized results
- Known issue fixed: Manager initially did not post results back — added explicit "Always Post Results Back" instructions to SOUL.md

### Phase 4: Metrics + Documentation (Day 3 — Feb 17)

**Estimated effort: 2-3 hours**

- [x] **4.1** Metrics: track delegation count, success rate, avg response time per agent
  - Implemented `metrics.py` to analyze handoff JSON and workflow summaries; supports JSON/text output with per-agent latency and success stats
- [x] **4.2** Per-workflow token usage tracking across all agents
  - Added `metrics.sh` wrapper for easy execution from orchestrate skill directory
- [x] **4.3** Document everything in this PKM project folder
  - Added Manager `TOOLS.md` reference for metrics usage under Agent Communication
- [x] **4.4** Create orchestration documentation README
  - Created
`/home/papa/atomizer/workspaces/shared/skills/orchestrate/README.md` with architecture, usage, ACL, workflows, and storage docs

---

## Context Flow Diagram

```
                      Antoine (CEO)
                            │
                            ▼
                     ┌─────────────┐
                     │   MANAGER   │ ◄── Reads AGENTS_REGISTRY.json
                     │ (Opus 4.6)  │ ◄── Reads workflow YAML
                     └──────┬──────┘ ◄── Validates results
                            │
            ┌───────────────┼───────────────┐
            ▼               ▼               ▼
      ┌────────────┐  ┌──────────┐    ┌───────────┐
      │ TECH-LEAD  │  │ AUDITOR  │    │ OPTIMIZER │
      │ (Opus)     │  │ (Opus)   │    │ (Sonnet)  │
      │ [can sub-  │  └──────────┘    │ [can sub- │
      │ delegate]  │                  │ delegate] │
      └─────┬──────┘                  └─────┬─────┘
            │       sub-orchestration       │
      ┌─────┴────┐                   ┌──────┴──────┐
      ▼          ▼                   ▼             ▼
  ┌────────┐ ┌─────────┐       ┌───────────┐  ┌──────────┐
  │WEBSTER │ │NX-EXPERT│       │STUDY-BLDR │  │SECRETARY │
  │(Gemini)│ │(Sonnet) │       │ (Sonnet)  │  │ (Flash)  │
  └───┬────┘ └───┬─────┘       └─────┬─────┘  └────┬─────┘
      │          │                   │             │
      ▼          ▼                   ▼             ▼
  ┌──────────────────────────────────────────────────────┐
  │                  HANDOFF DIRECTORY                   │
  │        /home/papa/atomizer/handoffs/                 │
  │          {runId}.json — structured results           │
  │          /sub/ — sub-delegation logs (visibility)    │
  └──────────────────────────────────────────────────────┘
      │          │                   │             │
      └─┬────────┴──────────┬────────┴──────────┬──┘
        ▼                   ▼                   ▼
  ┌────────────┐      ┌──────────┐     ┌─────────────────┐
  │  DISCORD   │      │VALIDATION│     │  SHARED FILES   │
  │  CHANNELS  │      │  LOOPS   │     │ (Atomizer repo  │
  │ (context)  │      │(self-chk │     │  PKM, configs)  │
  └────────────┘      │+ auditor)│     └─────────────────┘
                      └──────────┘

CONTEXT SOURCES (per delegation):
  1. Task context      → Orchestrator passes explicitly
  2. Channel context   → Fetched from Discord history
  3. Handoff context   → Results from prior pipeline steps
  4. Knowledge context → Shared filesystem (always available)

VALIDATION FLOW:
  Agent output → Self-check → Orchestrator validation → [Auditor review if critical] → Accept/Retry

HIERARCHY:
  Manager → delegates to all agents
  Tech-Lead, Optimizer → sub-delegate to Webster, NX-Expert, Study-Builder, Secretary
  All sub-delegations logged for Manager visibility
```

---

## Comparison: Before vs After

| Aspect | Before (delegate.sh) | After (Orchestration Engine) |
|--------|----------------------|------------------------------|
| Delegation | Fire-and-forget | Synchronous with result return |
| Result flow | None — check Discord manually | Structured JSON via handoff files |
| Chaining | Impossible | Native — output feeds next step |
| Parallel work | Manual — delegate multiple, hope | Workflow engine handles automatically |
| Context passing | None | Task + channel + handoff + filesystem |
| Routing | Hardcoded agent names | Capability-based via registry |
| Reusability | One-off bash calls | YAML workflow templates |
| Audit trail | Discord messages only | Handoff logs + orchestration logs |
| Validation | None | Self-check + auditor loops on critical steps |
| Error handling | None | Timeout, retry, partial results (Phase 1) |
| Hierarchy | Flat (manager only) | Hierarchical (Tech-Lead/Optimizer can sub-delegate) |
| Adding agents | Edit bash script | Add entry to registry JSON |

---

## Future Extensions (Post-MVP)

- **Conditional branching:** If auditor flags issues → route back to tech-lead for revision
- **Human-in-the-loop gates:** Workflow pauses for Antoine's approval at critical steps
- **Learning loops:** Store workflow results → agents learn from past runs
- **Cost tracking:** Per-workflow token usage across all agents
- **Web UI dashboard:** Visualize active workflows, agent status, handoff queue
- **Inter-company workflows:** External client triggers → full analysis pipeline → deliverable

---

## Key Design Decisions

1. **File-based handoffs over HTTP callbacks** — Simpler, debuggable, works with shared filesystem we already have. HTTP callbacks are Phase 2 optimization if needed.

2. **Manager as primary orchestrator, with hierarchical delegation (Phase 2)** — Manager runs workflows and chains tasks. In Phase 2, senior agents (Tech-Lead, Optimizer) gain sub-orchestration rights to delegate directly to supporting agents (e.g., Tech-Lead → Webster for a data lookup mid-analysis) without routing through Manager. All sub-delegations are logged to the handoff directory so Manager retains visibility. No circular delegation — hierarchy is strict.

3. **YAML workflows over hardcoded scripts** — Workflows are data, not code. Antoine can define new ones. Manager can read and execute them. Future: manager could even *generate* workflows from natural language directives.

4. **Channel context is opt-in per step** — Not every step needs channel history. Explicit `channel_context` parameter keeps token usage efficient.

5. **Preserve fire-and-forget option** — `delegate.sh` stays for simple one-off tasks where you don't need the result back. `orchestrate.sh` is for pipeline work.

---

## Review Amendments (2026-02-15)

**Source:** Webster's review (`reviews/REVIEW-Orchestration-Engine-Webster.md`)

| Webster's Recommendation | Decision | Where |
|---|---|---|
| Hierarchical delegation | ✅ Adopted — Phase 2 | Tech-Lead + Optimizer get sub-orchestration rights |
| Validation/critic loops | ✅ Adopted — Phase 1 | Self-check in agents + `--validate` flag + auditor validation blocks in YAML |
| Error handling in Phase 1 | ✅ Adopted — Phase 1 | Timeouts, retries, health checks, malformed response handling |
| Shared blackboard state | ⏳ Deferred | Not needed until workflows exceed 5+ steps. File-based handoffs sufficient for now |
| Role-based dynamic routing | ⏳ Deferred | Only one agent per role currently. Revisit when we scale to redundant agents |
| AutoGen group chat pattern | 📝 Noted | Interesting for brainstorming workflows. Not MVP priority |
| LangGraph state graphs | 📝 Noted | YAML with `on_fail: goto` covers our needs without importing a paradigm |

**Source:** Auditor's review (`reviews/REVIEW-Orchestration-Engine-Auditor-V2.md`)

| Auditor's Recommendation | Decision | Where |
|---|---|---|
| Idempotency keys | ✅ Adopted — Phase 1 | `idempotencyKey` in handoff schema + existence check before retry |
| Handoff schema versioning | ✅ Adopted — Phase 1 | `schemaVersion: "1.0"` + required fields validation in `orchestrate.sh` |
| Approval gates | ✅ Adopted — Phase 3 | `approval_gate: ceo` in workflow YAML, posts to `#hq` and waits |
| Per-run state blackboard | ⏳ Deferred | Same as Webster's — file handoffs sufficient for 3-5 step workflows |
| Trace logging / observability | ✅ Adopted — Phase 1 | `workflowRunId`, `stepId`, `attempt`, `latencyMs` in every handoff |
| Channel context sanitization | ✅ Adopted — Phase 2 | Token cap, instruction stripping, untrusted tagging |
| ACL enforcement (runtime) | ✅ Adopted — Phase 2 | Hardcoded delegation matrix in `orchestrate.sh`, not just SOUL.md policy |
| Quality score (0-1) | ⏳ Deferred | Nice-to-have for dashboards, not MVP |
| Artifact checksums | ⏳ Deferred | Reproducibility concern — revisit for client deliverables |
| Workflow dry-run mode | ✅ Adopted — Phase 3 | Validate dependency graph + substitutions without execution |

---

> **Next step:** Implementation begins 2026-02-15. Start with Phase 1 (orchestrate.sh + handoff directory + agent SOUL.md updates). Test with a simple Webster → Tech-Lead chain before building the full workflow engine.
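
The runtime ACL enforcement adopted in the review amendments, combined with the hierarchy shown in the Context Flow Diagram, could be sketched roughly as follows. This is a minimal illustration, not the actual `orchestrate.sh` code: the `DELEGATION_MATRIX` array, the `can_delegate` function, and the lowercase agent IDs (`tech-lead`, `nx-expert`, `study-builder`, and so on) are assumed names, and the sketch needs bash 4+ for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a hardcoded delegation matrix (names are assumptions,
# not taken from the real orchestrate.sh). Requires bash 4+.

# caller -> space-separated list of agents it may delegate to.
# Manager can reach every agent; Tech-Lead and Optimizer can only
# sub-delegate to the four supporting agents.
declare -A DELEGATION_MATRIX=(
  [manager]="tech-lead auditor optimizer webster nx-expert study-builder secretary"
  [tech-lead]="webster nx-expert study-builder secretary"
  [optimizer]="webster nx-expert study-builder secretary"
)

# can_delegate <caller> <target>
# Returns 0 if the caller is allowed to delegate to the target, 1 otherwise.
can_delegate() {
  local caller="${1:-}" target="${2:-}" allowed
  # Empty arguments are never allowed (and an empty key is an invalid subscript).
  [[ -n "$caller" && -n "$target" ]] || return 1
  # Callers missing from the matrix expand to an empty list, so the loop is skipped.
  for allowed in ${DELEGATION_MATRIX[$caller]:-}; do
    [[ "$allowed" == "$target" ]] && return 0
  done
  return 1
}

# Example: reject a disallowed delegation before any task is sent.
if ! can_delegate "webster" "manager"; then
  echo "ACL: webster may not delegate to manager" >&2
fi
```

Keeping the matrix in the script rather than only in SOUL.md means a disallowed call is rejected before a task is ever sent. Agents absent from the matrix (for example `auditor` or `webster`) cannot delegate at all, which keeps the hierarchy strict and rules out circular delegation.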