Files
Atomizer/docs/hq/10-ORCHESTRATION-ENGINE-PLAN.md
Antoine cf82de4f06 docs: add HQ multi-agent framework documentation from PKM
- Project plan, agent roster, architecture, roadmap
- Decision log, full system plan, Discord setup/migration guides
- System implementation status (as-built)
- Cluster pivot history
- Orchestration engine plan (Phases 1-4)
- Webster and Auditor reviews
2026-02-15 21:44:07 +00:00

40 KiB

10 — Orchestration Engine: Multi-Instance Intelligence

Status: Phases 1-3 Complete — Phase 4 (Metrics + Docs) in progress Author: Mario Lavoie (with Antoine) Date: 2026-02-15 Revised: 2026-02-15 — Incorporated Webster's review (validation loops, error handling, hierarchical delegation)


Problem Statement

The Atomizer HQ cluster runs 8 independent OpenClaw instances (one per agent). This gives us true parallelism, specialized contexts, and independent Discord identities — but we lost the orchestration primitives that make a single OpenClaw instance powerful:

  • sessions_spawn — synchronous delegation with result return
  • sessions_history — cross-session context reading
  • sessions_send — bidirectional inter-session messaging

The current delegate.sh is fire-and-forget. Manager throws a task over the wall and hopes. No result flows back. No chaining. No intelligent multi-step workflows.

Goal: Rebuild OpenClaw's orchestration power at the inter-instance level, enhanced with Discord channel context and a capability registry.


Architecture Overview

Three layers, each building on the last:

┌─────────────────────────────────────────────────────┐
│                 LAYER 3: WORKFLOWS                  │
│         YAML-defined multi-step pipelines           │
│     (sequential, parallel, conditional branching)   │
├─────────────────────────────────────────────────────┤
│              LAYER 2: SMART ROUTING                 │
│      Capability registry + channel context          │
│    (manager knows who can do what + project state)  │
├─────────────────────────────────────────────────────┤
│            LAYER 1: ORCHESTRATION CORE              │
│    Synchronous delegation + result return protocol  │
│       (replaces fire-and-forget delegate.sh)        │
├─────────────────────────────────────────────────────┤
│              EXISTING INFRASTRUCTURE                │
│   8 OpenClaw instances, hooks API, shared filesystem│
└─────────────────────────────────────────────────────┘

Layer 1: Orchestration Core

What it does: Replaces delegate.sh with synchronous delegation. Manager sends a task, waits for the result, gets structured output back. Can then chain to the next agent.

1.1 — The Orchestrate Script

File: /home/papa/atomizer/workspaces/shared/skills/orchestrate/orchestrate.sh

Behavior:

  1. Send task to target agent via /hooks/agent (existing mechanism)
  2. Poll the agent's session for completion via /hooks/status/{runId} or /sessions API
  3. Capture the agent's response (structured output)
  4. Return it to the calling agent's session
# Usage
result=$(bash orchestrate.sh <agent> "<task>" [options])

# Example: synchronous delegation
result=$(bash orchestrate.sh webster "Find CTE of Zerodur Class 0 at 20-40°C" --wait --timeout 120)
echo "$result"  # Structured findings returned to manager's session

Options:

  • --wait — Block until agent completes (default for orchestrate)
  • --timeout <seconds> — Max wait time (default: 300)
  • --retries <N> — Retry on failure (default: 1, max: 3)
  • --format json|text — Expected response format
  • --context <file> — Attach context file to the task
  • --channel-context <channel-id> [--messages N] — Include recent channel history as context
  • --validate — Run lightweight self-check on agent output before returning
  • --no-deliver — Don't post to Discord (manager will synthesize and post)

1.2 — Report-Back Protocol

Each agent gets instructions in their SOUL.md to format delegation responses:

## When responding to a delegated task:
Structure your response as:

**TASK:** [restate what was asked]
**STATUS:** complete | partial | blocked | failed
**RESULT:** [your findings/output]
**ARTIFACTS:** [any files created, with paths]
**CONFIDENCE:** high | medium | low
**NOTES:** [caveats, assumptions, open questions]

This gives manager structured data to reason about, not just a wall of text.

1.3 — Validation & Self-Check Protocol

Every delegated response goes through a lightweight validation before the orchestrator accepts it:

Self-Check (built into agent SOUL.md instructions): Each agent, when responding to a delegated task, must verify:

  • Did I answer all parts of the question?
  • Did I provide sources/evidence where applicable?
  • Is my confidence rating honest?

If the agent's self-check identifies gaps, it sets STATUS: partial and explains what's missing in NOTES.

Orchestrator-Side Validation (in orchestrate.sh): When --validate is passed (or for workflow steps with validation blocks):

  1. Check that handoff JSON has all required fields (status, result, confidence)
  2. If STATUS: failed or STATUS: blocked → trigger retry (up to --retries limit)
  3. If STATUS: partial and confidence is low → retry with refined prompt including the partial result
  4. If retries exhausted → return partial result with warning flag for the orchestrator to decide

Full Audit Validation (for high-stakes steps): Workflow YAML can specify a validation agent (typically auditor) for critical steps:

  - id: research
    agent: webster
    task: "Research materials..."
    validation:
      agent: auditor
      criteria: "Are all requested properties present with credible sources?"
      on_fail: retry
      max_retries: 2

This runs the auditor on the output before passing it downstream. Prevents garbage-in-garbage-out in critical pipelines.

1.4 — Error Handling (Phase 1 Priority)

Error handling is not deferred — it ships with the orchestration core:

Agent unreachable:

  • orchestrate.sh checks health endpoint before sending
  • If agent is down: log error, return immediately with STATUS: error, reason: agent_unreachable
  • Caller (manager or workflow engine) decides whether to retry, skip, or abort

Timeout:

  • Configurable per call (--timeout) and per workflow step
  • On timeout: kill the polling loop, check if partial handoff exists
  • If partial result available: return it with STATUS: timeout_partial
  • If no result: return STATUS: timeout

Malformed response:

  • Agent didn't write handoff file or wrote invalid JSON
  • orchestrate.sh validates JSON schema before returning
  • On malformed: retry once with explicit reminder to write structured output
  • If still malformed: return raw text with STATUS: malformed

Retry logic (with idempotency):

Attempt 1: Generate idempotencyKey={wfRunId}_{stepId}_1 → Send task → wait → check result
  If timeout → Check if handoff file exists (late arrival). If yes → use it. If no:
  Attempt 2: idempotencyKey={wfRunId}_{stepId}_2 → Resend with "Previous attempt failed: {reason}. Please retry."
    If timeout → Same late-arrival check. If no:
    Attempt 3 (if --retries 3): Same pattern
      If fail → Return error to caller with all attempt details

Key rule: Before every retry, check if the handoff file from the previous attempt landed. Prevents duplicate work when an agent was just slow, not dead.

1.5 — Result Capture Mechanism

Two options (implement both, prefer A):

Option A — File-based handoff:

  • Agent writes result to /home/papa/atomizer/handoffs/{runId}.json
  • Orchestrate script polls for file existence
  • Clean, simple, works with shared filesystem
{
  "schemaVersion": "1.0",
  "runId": "hook-delegation-1739587200",
  "idempotencyKey": "wf-mat-study-001_research_1",
  "workflowRunId": "wf-mat-study-001",
  "stepId": "research",
  "attempt": 1,
  "agent": "webster",
  "status": "complete",
  "result": "Zerodur Class 0 CTE: 0 ± 0.007 ppm/K (20-40°C)...",
  "artifacts": [],
  "confidence": "high",
  "latencyMs": 45200,
  "timestamp": "2026-02-15T03:00:00Z"
}

Required fields: schemaVersion, runId, agent, status, result, confidence, timestamp Trace fields (required): workflowRunId, stepId, attempt, latencyMs Idempotency: idempotencyKey = {workflowRunId}_{stepId}_{attempt}. Orchestrator checks for existing handoff before retrying — if result exists, skip resend.

Option B — Hooks callback:

  • Agent calls manager's /hooks/report endpoint with result
  • More real-time but adds complexity
  • Use for time-sensitive workflows

1.6 — Chaining Example

# Manager orchestrates a material trade study
# Step 1: Research
data=$(bash orchestrate.sh webster "Research Clearceram-Z HS vs Zerodur Class 0: CTE, density, cost, lead time" --wait)

# Step 2: Technical evaluation (pass webster's findings as context)
echo "$data" > /tmp/material_data.json
assessment=$(bash orchestrate.sh tech-lead "Evaluate these materials for M2/M3 mirrors against our thermal requirements" --context /tmp/material_data.json --wait)

# Step 3: Audit
echo "$assessment" > /tmp/assessment.json
audit=$(bash orchestrate.sh auditor "Review this technical assessment for completeness" --context /tmp/assessment.json --wait)

# Step 4: Manager synthesizes and delivers
# (Manager has all three results in-session, reasons about them, posts to Discord)

Layer 2: Smart Routing

What it does: Manager knows each agent's capabilities, strengths, and model. Routes tasks intelligently without hardcoded logic.

2.1 — Agent Capability Registry

File: /home/papa/atomizer/workspaces/shared/AGENTS_REGISTRY.json

{
  "agents": {
    "tech-lead": {
      "port": 18804,
      "model": "anthropic/claude-opus-4-6",
      "capabilities": [
        "fea-review",
        "design-decisions",
        "technical-analysis",
        "material-selection",
        "requirements-validation",
        "trade-studies"
      ],
      "strengths": "Deep reasoning, technical judgment, complex analysis",
      "limitations": "Slow (Opus), expensive tokens — use for high-value decisions",
      "inputFormat": "Technical problem with context and constraints",
      "outputFormat": "Structured analysis with recommendations and rationale",
      "channels": ["#hq", "#technical"]
    },
    "webster": {
      "port": 18828,
      "model": "google/gemini-2.5-pro",
      "capabilities": [
        "web-research",
        "literature-review",
        "data-lookup",
        "supplier-search",
        "standards-lookup",
        "competitive-analysis"
      ],
      "strengths": "Fast research, broad knowledge, cheap tokens, web access",
      "limitations": "No deep technical judgment — finds data, doesn't evaluate it",
      "inputFormat": "Natural language query with specifics",
      "outputFormat": "Structured findings with sources and confidence",
      "channels": ["#hq", "#research"]
    },
    "optimizer": {
      "port": 18816,
      "model": "anthropic/claude-sonnet-4-20250514",
      "capabilities": [
        "optimization-setup",
        "parameter-studies",
        "objective-definition",
        "constraint-formulation",
        "result-interpretation",
        "sensitivity-analysis"
      ],
      "strengths": "Optimization methodology, mathematical formulation, DOE",
      "limitations": "Needs clear problem definition — not for open-ended exploration",
      "inputFormat": "Optimization problem with objectives, variables, constraints",
      "outputFormat": "Study configuration, parameter definitions, result analysis",
      "channels": ["#hq", "#optimization"]
    },
    "study-builder": {
      "port": 18820,
      "model": "anthropic/claude-sonnet-4-20250514",
      "capabilities": [
        "study-configuration",
        "doe-setup",
        "batch-generation",
        "parameter-sweeps",
        "study-templates"
      ],
      "strengths": "Translating optimization plans into executable study configs",
      "limitations": "Needs optimizer's plan as input — doesn't design studies independently",
      "inputFormat": "Study plan from optimizer with parameter ranges",
      "outputFormat": "Ready-to-run study configuration files",
      "channels": ["#hq", "#optimization"]
    },
    "nx-expert": {
      "port": 18824,
      "model": "anthropic/claude-sonnet-4-20250514",
      "capabilities": [
        "nx-operations",
        "mesh-generation",
        "boundary-conditions",
        "nastran-setup",
        "cad-manipulation",
        "post-processing"
      ],
      "strengths": "NX/Simcenter expertise, FEA model setup, hands-on CAD/FEM work",
      "limitations": "Needs clear instructions — not for high-level design decisions",
      "inputFormat": "Specific NX task with model reference and parameters",
      "outputFormat": "Completed operation with verification screenshots/data",
      "channels": ["#hq", "#nx-work"]
    },
    "auditor": {
      "port": 18812,
      "model": "anthropic/claude-opus-4-6",
      "capabilities": [
        "quality-review",
        "compliance-check",
        "methodology-audit",
        "assumption-validation",
        "report-review",
        "standards-compliance"
      ],
      "strengths": "Critical eye, finds gaps and errors, ensures rigor",
      "limitations": "Reviews work, doesn't create it — needs output from other agents",
      "inputFormat": "Work product to review with applicable standards/requirements",
      "outputFormat": "Structured review: findings, severity, recommendations",
      "channels": ["#hq", "#quality"]
    },
    "secretary": {
      "port": 18808,
      "model": "google/gemini-2.5-flash",
      "capabilities": [
        "meeting-notes",
        "status-reports",
        "documentation",
        "scheduling",
        "action-tracking",
        "communication-drafting"
      ],
      "strengths": "Fast, cheap, good at summarization and admin tasks",
      "limitations": "Not for technical work — administrative and organizational only",
      "inputFormat": "Admin task or raw content to organize",
      "outputFormat": "Clean documentation, summaries, action lists",
      "channels": ["#hq", "#admin"]
    },
    "manager": {
      "port": 18800,
      "model": "anthropic/claude-opus-4-6",
      "capabilities": [
        "orchestration",
        "project-planning",
        "task-decomposition",
        "priority-management",
        "stakeholder-communication",
        "workflow-execution"
      ],
      "strengths": "Strategic thinking, orchestration, synthesis across agents",
      "limitations": "Should not do technical work — delegates everything",
      "inputFormat": "High-level directives from Antoine (CEO)",
      "outputFormat": "Plans, status updates, synthesized deliverables",
      "channels": ["#hq"]
    }
  }
}

2.2 — Manager Routing Logic

Added to Manager's SOUL.md as a skill directive:

## Smart Routing
Before delegating, consult `/home/papa/atomizer/workspaces/shared/AGENTS_REGISTRY.json`.
- Match task requirements to agent capabilities
- Consider model strengths (Opus for reasoning, Gemini for speed, Sonnet for balanced)
- For multi-step tasks, plan the full pipeline before starting
- Prefer parallel execution when steps are independent
- Always specify what you need back (don't let agents guess)

2.3 — Discord Channel Context Integration

How channels feed context into orchestration:

Each Discord channel accumulates project-specific conversation history. The orchestration layer can pull this as context:

# In orchestrate.sh, --channel-context fetches recent messages
bash orchestrate.sh tech-lead "Review thermal margins for M2" \
  --channel-context "#gigabit-m1" --messages 50 \
  --wait

Implementation: Use Discord bot API (each instance has a bot token) to fetch channel message history. Format as context block prepended to the task.

Channel strategy for Atomizer HQ Discord:

Channel Purpose Context Value
#hq Cross-team coordination, announcements Project-wide decisions
#technical FEA discussions, design decisions Technical context for analysis tasks
#optimization Study configs, results, methodology Optimization history and patterns
#research Webster's findings, literature Reference data for technical work
#quality Audit findings, compliance notes Review standards and past issues
#nx-work CAD/FEM operations, model updates Model state and recent changes
#admin Meeting notes, schedules, action items Project timeline and commitments
#handoffs Automated orchestration results (bot-only) Pipeline audit trail

Key insight: Channels become persistent, queryable context stores. Instead of passing massive context blocks between agents, you say "read #technical for the last 20 messages" and the agent absorbs project state naturally.

Channel Context Sanitization (security): Discord history is untrusted input. Before injecting into an agent's context:

  • Cap at configurable token window (default: last 30 messages, max ~4K tokens)
  • Strip any system-prompt-like instructions from message content
  • Tag entire block as [CHANNEL CONTEXT — untrusted, for reference only]
  • Never let channel content override task instructions

This prevents prompt injection via crafted Discord messages in channel history.


Layer 3: Workflow Engine

What it does: Defines reusable multi-step pipelines as YAML. Manager reads and executes them. No coding needed to create new workflows.

3.1 — Workflow Definition Format

Location: /home/papa/atomizer/workspaces/shared/workflows/

# /home/papa/atomizer/workspaces/shared/workflows/material-trade-study.yaml
name: Material Trade Study
description: Research, evaluate, and audit material options for optical components
trigger: manual  # or: keyword, schedule

inputs:
  materials:
    type: list
    description: "Materials to compare"
  requirements:
    type: text
    description: "Performance requirements and constraints"
  project_channel:
    type: channel
    description: "Project channel for context"

steps:
  - id: research
    agent: webster
    task: |
      Research the following materials: {materials}
      For each material, find: CTE (with temperature range), density, Young's modulus,
      cost per kg, lead time, availability, and any known issues for optical applications.
      Provide sources for all data.
    channel_context: "{project_channel}"
    channel_messages: 30
    timeout: 180
    retries: 2
    output: material_data
    validation:
      agent: auditor
      criteria: "Are all requested material properties present with credible sources? Flag any missing data."
      on_fail: retry

  - id: evaluate
    agent: tech-lead
    task: |
      Evaluate these materials against our requirements:

      REQUIREMENTS:
      {requirements}

      MATERIAL DATA:
      {material_data}

      Provide a recommendation with full rationale. Include a comparison matrix.
    depends_on: [research]
    timeout: 300
    retries: 1
    output: technical_assessment

  - id: audit
    agent: auditor
    task: |
      Review this material trade study for completeness, methodological rigor,
      and potential gaps:

      {technical_assessment}

      Check: Are all requirements addressed? Are sources credible?
      Are there materials that should have been considered but weren't?
    depends_on: [evaluate]
    timeout: 180
    output: audit_result

  - id: synthesize
    agent: manager
    action: synthesize  # Manager processes internally, doesn't delegate
    inputs: [material_data, technical_assessment, audit_result]
    deliver:
      channel: "{project_channel}"
      format: summary  # Manager writes a clean summary post

notifications:
  on_complete: "#hq"
  on_failure: "#hq"

3.2 — More Workflow Templates

Design Review:

name: Design Review
steps:
  - id: prepare
    agent: secretary
    task: "Compile design package: gather latest CAD screenshots, analysis results, and requirements from {project_channel}"

  - id: technical_review
    agent: tech-lead
    task: "Review design against requirements: {prepare}"
    depends_on: [prepare]

  - id: optimization_review
    agent: optimizer
    task: "Assess optimization potential: {prepare}"
    depends_on: [prepare]

  # technical_review and optimization_review run in PARALLEL (no dependency between them)

  - id: audit
    agent: auditor
    task: "Final review: {technical_review} + {optimization_review}"
    depends_on: [technical_review, optimization_review]

  - id: deliver
    agent: secretary
    task: "Format design review report from: {audit}"
    depends_on: [audit]
    deliver:
      channel: "{project_channel}"

Quick Research:

name: Quick Research
steps:
  - id: research
    agent: webster
    task: "{query}"
    timeout: 120
    output: findings

  - id: validate
    agent: tech-lead
    task: "Verify these findings are accurate and relevant: {findings}"
    depends_on: [research]
    deliver:
      channel: "{request_channel}"

3.3 — Workflow Executor

File: /home/papa/atomizer/workspaces/shared/skills/orchestrate/workflow.sh

The manager's orchestration skill reads YAML workflows and executes them:

# Run a workflow
bash workflow.sh material-trade-study \
  --input materials="Zerodur Class 0, Clearceram-Z HS, ULE" \
  --input requirements="CTE < 0.01 ppm/K at 22°C, aperture 250mm" \
  --input project_channel="#gigabit-m1"

Executor logic:

  1. Parse YAML workflow definition
  2. Resolve dependencies → build execution graph
  3. Execute steps in order (parallel when no dependencies)
  4. For each step: call orchestrate.sh with task + resolved inputs
  5. Store results in /home/papa/atomizer/handoffs/workflows/{workflow-run-id}/
  6. On completion: deliver final output to specified channel
  7. On failure: notify #hq with error details and partial results

Implementation Plan

Phase 1: Orchestration Core + Validation + Error Handling (Day 1 — Feb 15) COMPLETE

Actual effort: ~6 hours

  • 1.1 Created /home/papa/atomizer/workspaces/shared/skills/orchestrate/ directory
  • 1.2 Built orchestrate.py (Python, not bash) — synchronous delegation with inotify-based waiting
    • Send via /hooks/agent (existing)
    • inotify watches handoff directory for result file
    • Timeout handling (configurable per call, --timeout)
    • Retry logic (--retries N, max 3, with error context)
    • Returns structured JSON result to caller
    • Thin bash wrapper: orchestrate.sh
  • 1.3 Created /home/papa/atomizer/handoffs/ directory for result passing
  • 1.4 Updated all 8 agent SOUL.md files with:
    • Structured response format for delegated tasks (JSON handoff protocol)
    • Self-check protocol (verify completeness before submitting)
    • Write result to /home/papa/atomizer/handoffs/{runId}.json on completion
  • 1.5 Implemented error handling in orchestrate.py
    • Health check before sending (agent health endpoint)
    • Timeout with partial result recovery
    • Malformed response detection and retry
    • Idempotency check before retry (check if handoff file landed late)
    • All errors logged to /home/papa/atomizer/logs/orchestration/
  • 1.6 Implemented trace logging in handoff files
    • Required fields validated: schemaVersion, runId, agent, status, result, confidence, timestamp
    • Unified JSONL logging with trace fields
  • 1.7 Implemented --validate flag for strict orchestrator-side output validation
  • 1.8 Deployed orchestrate skill to Manager (SOUL.md + TOOLS.md updated)
  • 1.9 Test: Manager → Webster smoke tests passed (18-49s response times, 12 successful handoffs)
    • Chain test (Webster → Tech-Lead): Webster completed, Tech-Lead returned partial due to missing context passthrough — engine bug, not protocol bug
  • 1.10 Test: ACL enforcement works (deny/allow), strict validation works
  • 1.11 delegate.sh kept as fallback for fire-and-forget use cases

Key implementation decisions:

  • Python (orchestrate.py) over bash for all logic — better JSON handling, inotify support, error handling
  • inotify_simple for instant file detection (no polling)
  • Session key format: hook:orchestrate:{run_id}:{attempt}
  • ACL matrix hardcoded: Manager → all; Tech-Lead → webster/nx-expert/study-builder/secretary; Optimizer → webster/study-builder/secretary

Known issues to fix in Phase 2:

  • Chain context passthrough: when chaining A→B→C, B's result must be explicitly injected into C's task
  • Webster's Brave API key intermittently fails (recovered on retry)
  • Manager Discord WebSocket reconnect loop (code 1005) — doesn't affect orchestration but blocks channel posting

Phase 2: Smart Routing + Channel Context + Hierarchical Delegation (Day 1-2 — Feb 15-16)

Estimated effort: 4-5 hours

  • 2.1 Create AGENTS_REGISTRY.json in shared workspace (completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)
  • 2.2 Update Manager's SOUL.md with routing instructions (completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)
  • 2.3 Build channel context fetcher (fetch-channel-context.sh) (completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)
    • Uses Discord bot token to pull recent messages
    • Formats as markdown context block
    • Integrates with orchestrate.sh via --channel-context flag
  • 2.4 Set up Discord channels per the channel strategy table (completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)
  • 2.5 Implement hierarchical delegation (completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)
    • Deploy orchestrate skill to Tech-Lead and Optimizer
    • Add sub-orchestration rules to their SOUL.md (can delegate to: Webster, Study-Builder, NX-Expert, Secretary)
    • Cannot delegate to: Manager, Auditor, each other (prevents loops)
    • All sub-delegations logged to /home/papa/atomizer/handoffs/sub/ for Manager visibility
  • 2.6 Enforce delegation ACL matrix in orchestrate.sh runtime (completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)
    • Hardcoded check: caller + target validated against allowed pairs
    • Manager → can delegate to all agents
    • Tech-Lead → can delegate to: Webster, NX-Expert, Study-Builder, Secretary
    • Optimizer → can delegate to: Webster, Study-Builder, Secretary
    • All others → cannot sub-delegate (must go through Manager)
    • Block self-delegation and circular paths at runtime (not just SOUL.md policy)
  • 2.7 Implement channel context sanitization (completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)
    • Cap token window, strip system-like instructions, tag as untrusted
  • 2.8 Test: Manager auto-routes a task based on registry + includes channel context (completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)
  • 2.9 Test: Tech-Lead delegates a data lookup to Webster mid-analysis (completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)
  • 2.10 Test: Auditor tries to sub-delegate → blocked by ACL (completed 2026-02-15 — channel context fetcher built, hierarchical delegation deployed to Tech-Lead + Optimizer, ACL tested, all tests pass)

Phase 3: Workflow Engine (Day 2-3 — Feb 16-17)

Estimated effort: 6-8 hours

  • 3.1 Build YAML workflow parser (Python script)
    • Implemented in workflow.py with name/path resolution from /home/papa/atomizer/workspaces/shared/workflows/, schema checks, step-ID validation, dependency validation, and cycle detection.
  • 3.2 Build workflow executor (workflow.sh)
    • Dependency resolution
    • Parallel step execution
    • Variable substitution
    • Error handling and partial results
    • Implemented executor in workflow.py with ThreadPoolExecutor, dependency-aware scheduling, step-level on_fail handling (skip/abort), overall timeout enforcement, approval gates, and JSON summary output.
    • Added thin wrapper workflow.sh.
  • 3.3 Create initial workflow templates:
    • material-trade-study.yaml
    • design-review.yaml
    • quick-research.yaml
  • 3.4 Deploy workflow skill to Manager
    • Updated Manager SOUL.md with a dedicated "Running Workflows" section and command example.
    • Updated Manager TOOLS.md with workflow.py/workflow.sh references and usage.
  • 3.5 Implement approval gates in workflow YAML
    • workflow.py now supports approval_gate prompts (yes/no) before step execution.
    • In --non-interactive mode, approval gates are skipped with warnings.
  • 3.6 Add workflow dry-run mode (--dry-run)
    • Validates dependency graph and variable substitutions without executing
    • Reports: step metadata, dependency-based execution layers, and run output directory
    • Implemented dry-run planning output including step metadata, dependency layers, and run result directory.
  • 3.7 Test: Run full material trade study workflow end-to-end
    • quick-research workflow tested E2E twice — Webster→Tech-Lead chain, 50s and 149s runs, Manager posted results to Discord
  • 3.8 Create #handoffs channel for orchestration audit trail
    • Skipped — using workflow result directories instead of dedicated #handoffs channel

Phase 3 completion notes:

  • workflow.py: 15KB Python, supports YAML parsing, dependency graphs, parallel execution (ThreadPoolExecutor), variable substitution, approval gates, dry-run, per-step result persistence
  • 3 workflow templates: material-trade-study, quick-research, design-review
  • design-review dry-run confirmed parallel execution detection (tech-lead + optimizer simultaneous)
  • Manager successfully ran workflow from Discord prompt, parsed JSON output, and posted synthesized results
  • Known issue fixed: Manager initially did not post results back — added explicit "Always Post Results Back" instructions to SOUL.md

Phase 4: Metrics + Documentation (Day 3 — Feb 17)

Estimated effort: 2-3 hours

  • 4.1 Metrics: track delegation count, success rate, avg response time per agent
    • Implemented metrics.py to analyze handoff JSON and workflow summaries; supports JSON/text output with per-agent latency and success stats
  • 4.2 Per-workflow token usage tracking across all agents
    • Added metrics.sh wrapper for easy execution from orchestrate skill directory
  • 4.3 Document everything in this PKM project folder
    • Added Manager TOOLS.md reference for metrics usage under Agent Communication
  • 4.4 Create orchestration documentation README
    • Created /home/papa/atomizer/workspaces/shared/skills/orchestrate/README.md with architecture, usage, ACL, workflows, and storage docs

Context Flow Diagram

                    Antoine (CEO)
                        │
                        ▼
                 ┌─────────────┐
                 │   MANAGER   │ ◄── Reads AGENTS_REGISTRY.json
                 │  (Opus 4.6) │ ◄── Reads workflow YAML
                 └──────┬──────┘     ◄── Validates results
                        │
          ┌─────────────┼─────────────┐
          ▼             ▼             ▼
   ┌────────────┐ ┌──────────┐ ┌──────────┐
   │ TECH-LEAD  │ │ AUDITOR  │ │OPTIMIZER │
   │  (Opus)    │ │  (Opus)  │ │ (Sonnet) │
   │ [can sub-  │ └──────────┘ │ [can sub-│
   │  delegate] │              │  delegate]│
   └─────┬──────┘              └─────┬─────┘
         │ sub-orchestration         │
    ┌────┴─────┐              ┌──────┴──────┐
    ▼          ▼              ▼             ▼
┌────────┐┌────────┐  ┌───────────┐┌──────────┐
│WEBSTER ││NX-EXPERT│  │STUDY-BLDR ││SECRETARY │
│(Gemini)││(Sonnet) │  │ (Sonnet)  ││ (Flash)  │
└───┬────┘└───┬─────┘  └─────┬─────┘└────┬─────┘
    │         │              │            │
    ▼         ▼              ▼            ▼
  ┌──────────────────────────────────────────────┐
  │            HANDOFF DIRECTORY                 │
  │  /home/papa/atomizer/handoffs/               │
  │  {runId}.json — structured results           │
  │  /sub/ — sub-delegation logs (visibility)    │
  └──────────────────────────────────────────────┘
    │         │              │            │
    └────┬────┘──────┬───────┘────┬───────┘
         ▼           ▼            ▼
  ┌────────────┐ ┌──────────┐ ┌─────────────────┐
  │  DISCORD   │ │VALIDATION│ │  SHARED FILES   │
  │  CHANNELS  │ │  LOOPS   │ │  (Atomizer repo │
  │ (context)  │ │(self-chk │ │   PKM, configs) │
  └────────────┘ │+ auditor)│ └─────────────────┘
                 └──────────┘

CONTEXT SOURCES (per delegation):
  1. Task context     → Orchestrator passes explicitly
  2. Channel context  → Fetched from Discord history
  3. Handoff context  → Results from prior pipeline steps
  4. Knowledge context → Shared filesystem (always available)

VALIDATION FLOW:
  Agent output → Self-check → Orchestrator validation → [Auditor review if critical] → Accept/Retry

HIERARCHY:
  Manager → delegates to all agents
  Tech-Lead, Optimizer → sub-delegate to Webster, NX-Expert, Study-Builder, Secretary
  All sub-delegations logged for Manager visibility

Comparison: Before vs After

Aspect Before (delegate.sh) After (Orchestration Engine)
Delegation Fire-and-forget Synchronous with result return
Result flow None — check Discord manually Structured JSON via handoff files
Chaining Impossible Native — output feeds next step
Parallel work Manual — delegate multiple, hope Workflow engine handles automatically
Context passing None Task + channel + handoff + filesystem
Routing Hardcoded agent names Capability-based via registry
Reusability One-off bash calls YAML workflow templates
Audit trail Discord messages only Handoff logs + orchestration logs
Validation None Self-check + auditor loops on critical steps
Error handling None Timeout, retry, partial results (Phase 1)
Hierarchy Flat (manager only) Hierarchical (Tech-Lead/Optimizer can sub-delegate)
Adding agents Edit bash script Add entry to registry JSON

Future Extensions (Post-MVP)

  • Conditional branching: If auditor flags issues → route back to tech-lead for revision
  • Human-in-the-loop gates: Workflow pauses for Antoine's approval at critical steps
  • Learning loops: Store workflow results → agents learn from past runs
  • Cost tracking: Per-workflow token usage across all agents
  • Web UI dashboard: Visualize active workflows, agent status, handoff queue
  • Inter-company workflows: External client triggers → full analysis pipeline → deliverable

Key Design Decisions

  1. File-based handoffs over HTTP callbacks — Simpler, debuggable, works with shared filesystem we already have. HTTP callbacks are Phase 2 optimization if needed.

  2. Manager as primary orchestrator, with hierarchical delegation (Phase 2) — Manager runs workflows and chains tasks. In Phase 2, senior agents (Tech-Lead, Optimizer) gain sub-orchestration rights to delegate directly to supporting agents (e.g., Tech-Lead → Webster for a data lookup mid-analysis) without routing through Manager. All sub-delegations are logged to the handoff directory so Manager retains visibility. No circular delegation — hierarchy is strict.

  3. YAML workflows over hardcoded scripts — Workflows are data, not code. Antoine can define new ones. Manager can read and execute them. Future: manager could even generate workflows from natural language directives.

  4. Channel context is opt-in per step — Not every step needs channel history. Explicit channel_context parameter keeps token usage efficient.

  5. Preserve fire-and-forget optiondelegate.sh stays for simple one-off tasks where you don't need the result back. orchestrate.sh is for pipeline work.



Review Amendments (2026-02-15)

Source: Webster's review (reviews/REVIEW-Orchestration-Engine-Webster.md)

Webster's Recommendation Decision Where
Hierarchical delegation Adopted — Phase 2 Tech-Lead + Optimizer get sub-orchestration rights
Validation/critic loops Adopted — Phase 1 Self-check in agents + --validate flag + auditor validation blocks in YAML
Error handling in Phase 1 Adopted — Phase 1 Timeouts, retries, health checks, malformed response handling
Shared blackboard state Deferred Not needed until workflows exceed 5+ steps. File-based handoffs sufficient for now
Role-based dynamic routing Deferred Only one agent per role currently. Revisit when we scale to redundant agents
AutoGen group chat pattern 📝 Noted Interesting for brainstorming workflows. Not MVP priority
LangGraph state graphs 📝 Noted YAML with on_fail: goto covers our needs without importing a paradigm

Source: Auditor's review (reviews/REVIEW-Orchestration-Engine-Auditor-V2.md)

Auditor's Recommendation Decision Where
Idempotency keys Adopted — Phase 1 idempotencyKey in handoff schema + existence check before retry
Handoff schema versioning Adopted — Phase 1 schemaVersion: "1.0" + required fields validation in orchestrate.sh
Approval gates Adopted — Phase 3 approval_gate: ceo in workflow YAML, posts to #hq and waits
Per-run state blackboard Deferred Same as Webster's — file handoffs sufficient for 3-5 step workflows
Trace logging / observability Adopted — Phase 1 workflowRunId, stepId, attempt, latencyMs in every handoff
Channel context sanitization Adopted — Phase 2 Token cap, instruction stripping, untrusted tagging
ACL enforcement (runtime) Adopted — Phase 2 Hardcoded delegation matrix in orchestrate.sh, not just SOUL.md policy
Quality score (0-1) Deferred Nice-to-have for dashboards, not MVP
Artifact checksums Deferred Reproducibility concern — revisit for client deliverables
Workflow dry-run mode Adopted — Phase 3 Validate dependency graph + substitutions without execution

Next step: Implementation begins 2026-02-15. Start with Phase 1 (orchestrate.sh + handoff directory + agent SOUL.md updates). Test with a simple Webster → Tech-Lead chain before building the full workflow engine.