feat: add Atomizer HQ multi-agent cluster infrastructure

- 8-agent OpenClaw cluster (Manager, Tech-Lead, Secretary, Auditor,
  Optimizer, Study-Builder, NX-Expert, Webster)
- Orchestration engine: orchestrate.py (sync delegation + handoffs)
- Workflow engine: YAML-defined multi-step pipelines
- Agent workspaces: SOUL.md, AGENTS.md, MEMORY.md per agent
- Shared skills: delegate, orchestrate, atomizer-protocols
- Capability registry (AGENTS_REGISTRY.json)
- Cluster management: cluster.sh, systemd template
- All secrets replaced with env var references
2026-02-15 21:18:18 +00:00
parent d6a1d6eee1
commit 3289a76e19
170 changed files with 24949 additions and 0 deletions


@@ -0,0 +1,70 @@
{
  "schemaVersion": "1.0",
  "updated": "2026-02-15",
  "agents": {
    "tech-lead": {
      "port": 18804,
      "model": "anthropic/claude-opus-4-6",
      "capabilities": ["fea-review", "design-decisions", "technical-analysis", "material-selection", "requirements-validation", "trade-studies"],
      "strengths": "Deep reasoning, technical judgment, complex analysis",
      "limitations": "Slow (Opus), expensive — use for high-value decisions",
      "channels": ["#hq", "#technical"]
    },
    "webster": {
      "port": 18828,
      "model": "google/gemini-2.5-pro",
      "capabilities": ["web-research", "literature-review", "data-lookup", "supplier-search", "standards-lookup"],
      "strengths": "Fast research, broad knowledge, web access",
      "limitations": "No deep technical judgment — finds data, doesn't evaluate it",
      "channels": ["#hq", "#research"]
    },
    "optimizer": {
      "port": 18816,
      "model": "anthropic/claude-sonnet-4-20250514",
      "capabilities": ["optimization-setup", "parameter-studies", "objective-definition", "constraint-formulation", "sensitivity-analysis"],
      "strengths": "Optimization methodology, mathematical formulation, DOE",
      "limitations": "Needs clear problem definition",
      "channels": ["#hq", "#optimization"]
    },
    "study-builder": {
      "port": 18820,
      "model": "anthropic/claude-sonnet-4-20250514",
      "capabilities": ["study-configuration", "doe-setup", "batch-generation", "parameter-sweeps"],
      "strengths": "Translating optimization plans into executable configs",
      "limitations": "Needs optimizer's plan as input",
      "channels": ["#hq", "#optimization"]
    },
    "nx-expert": {
      "port": 18824,
      "model": "anthropic/claude-sonnet-4-20250514",
      "capabilities": ["nx-operations", "mesh-generation", "boundary-conditions", "nastran-setup", "post-processing"],
      "strengths": "NX/Simcenter expertise, FEA model setup",
      "limitations": "Needs clear instructions",
      "channels": ["#hq", "#nx-work"]
    },
    "auditor": {
      "port": 18812,
      "model": "anthropic/claude-opus-4-6",
      "capabilities": ["quality-review", "compliance-check", "methodology-audit", "assumption-validation", "report-review"],
      "strengths": "Critical eye, finds gaps and errors",
      "limitations": "Reviews work, doesn't create it",
      "channels": ["#hq", "#quality"]
    },
    "secretary": {
      "port": 18808,
      "model": "google/gemini-2.5-flash",
      "capabilities": ["meeting-notes", "status-reports", "documentation", "scheduling", "action-tracking"],
      "strengths": "Fast, cheap, good at summarization and admin",
      "limitations": "Not for technical work",
      "channels": ["#hq", "#admin"]
    },
    "manager": {
      "port": 18800,
      "model": "anthropic/claude-opus-4-6",
      "capabilities": ["orchestration", "project-planning", "task-decomposition", "workflow-execution"],
      "strengths": "Strategic thinking, orchestration, synthesis",
      "limitations": "Should not do technical work — delegates everything",
      "channels": ["#hq"]
    }
  }
}
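Consumers of this registry only need a few lines to resolve an agent. A minimal Python sketch (the inline sample mirrors two of the entries above; `resolve_agent` is an illustrative helper, not part of the repo):

```python
import json

# Inline sample mirroring two AGENTS_REGISTRY.json entries (abbreviated).
REGISTRY_JSON = """
{
  "schemaVersion": "1.0",
  "agents": {
    "tech-lead": {"port": 18804, "capabilities": ["fea-review"]},
    "webster": {"port": 18828, "capabilities": ["web-research"]}
  }
}
"""

def resolve_agent(registry: dict, name: str) -> dict:
    """Return the registry entry for an agent, or raise KeyError."""
    agents = registry.get("agents", {})
    if name not in agents:
        raise KeyError(f"Unknown agent: {name}")
    return agents[name]

registry = json.loads(REGISTRY_JSON)
entry = resolve_agent(registry, "tech-lead")
print(entry["port"])  # the port used for http://127.0.0.1:<port>/hooks/agent
```

In the real cluster the dict would come from `AGENTS_REGISTRY.json` on disk rather than an inline string.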


@@ -0,0 +1,82 @@
# Atomizer Agent Cluster
## Agent Directory
| Agent | ID | Port | Role |
|-------|-----|------|------|
| 🎯 Manager | manager | 18800 | Orchestration, delegation, strategy |
| 🔧 Tech Lead | technical-lead | 18804 | FEA, R&D, technical review |
| 📋 Secretary | secretary | 18808 | Admin, notes, reports, knowledge |
| 🔍 Auditor | auditor | 18812 | Quality gatekeeper, reviews |
| ⚡ Optimizer | optimizer | 18816 | Optimization algorithms & strategy |
| 🏗️ Study Builder | study-builder | 18820 | Study code engineering |
| 🖥️ NX Expert | nx-expert | 18824 | Siemens NX/CAD/CAE |
| 🔬 Webster | webster | 18828 | Research & literature |
## Inter-Agent Communication
Each agent runs as an independent OpenClaw gateway. To send a message to another agent:
```bash
curl -s -X POST http://127.0.0.1:PORT/hooks/agent \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1" \
-d '{"message": "your message", "agentId": "AGENT_ID"}'
```
### Examples
```bash
# Report to manager
curl -s -X POST http://127.0.0.1:18800/hooks/agent \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1" \
-d '{"message": "Status update: FEA analysis complete", "agentId": "manager"}'
# Delegate to tech-lead
curl -s -X POST http://127.0.0.1:18804/hooks/agent \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1" \
-d '{"message": "Please review the beam optimization study", "agentId": "technical-lead"}'
# Ask webster for research
curl -s -X POST http://127.0.0.1:18828/hooks/agent \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1" \
-d '{"message": "Find papers on topology optimization", "agentId": "webster"}'
```
## Discord Channel Ownership
- **Manager**: #ceo-office, #announcements, #daily-standup, #active-projects, #agent-logs, #inter-agent, #general, #hydrotech-beam
- **Tech Lead**: #technical, #code-review, #fea-analysis
- **Secretary**: #task-board, #meeting-notes, #reports, #knowledge-base, #lessons-learned, #it-ops
- **NX Expert**: #nx-cad
- **Webster**: #literature, #materials-data
- **Auditor, Optimizer, Study Builder**: DM + hooks (no dedicated channels)
## Slack (Manager only)
Manager also handles Slack channels: #all-atomizer-hq, #secretary, etc.
## Rules
1. Always respond to Discord messages — NEVER reply NO_REPLY
2. When delegating, be specific about what you need
3. Post results back in the originating Discord channel
4. Use hooks API for inter-agent communication
## Response Arbitration (Anti-Collision)
To prevent multiple agents replying at once in the same public channel:
1. **Single channel owner speaks by default.**
- In any shared channel, only the listed owner agent should reply unless another agent is directly tagged.
2. **Non-owners are mention-gated.**
- If a non-owner is not explicitly @mentioned, it should stay silent and route updates via hooks to the owner.
3. **Tagged specialist = scoped reply only.**
- When tagged, reply only to the tagged request (no broad channel takeover), then return to silent mode.
4. **Manager synthesis for multi-agent asks.**
- If a user asks multiple roles at once, specialists send inputs to Manager via hooks; Manager posts one consolidated reply.
5. **Duplicate suppression window (30s).**
- If an equivalent answer has just been posted by another agent, post only incremental/new info.
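The arbitration rules above reduce to a small decision function. A hedged sketch (the owner map and the mention set are illustrative assumptions, not shipped code):

```python
# Sketch of rules 1-3: the channel owner speaks by default;
# everyone else is mention-gated. Owner map is illustrative.
CHANNEL_OWNERS = {"#technical": "tech-lead", "#research": "webster", "#hq": "manager"}

def should_reply(agent: str, channel: str, mentioned: set[str]) -> bool:
    """Return True if `agent` may post in `channel` for this message."""
    owner = CHANNEL_OWNERS.get(channel)
    if agent == owner:
        return True          # rule 1: owner speaks by default
    return agent in mentioned  # rules 2-3: non-owners only when tagged

print(should_reply("tech-lead", "#technical", set()))       # True (owner)
print(should_reply("webster", "#technical", set()))         # False (silent)
print(should_reply("webster", "#technical", {"webster"}))   # True (tagged)
```

Rules 4 and 5 (Manager synthesis, duplicate suppression) would sit on top of this gate.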


@@ -0,0 +1,35 @@
# Hooks Protocol — Inter-Agent Communication
## When You Receive a Hook Message
Messages arriving via the Hooks API (delegated tasks from other agents) are **high-priority direct assignments**. They appear as regular messages but come from the delegation system.
### How to Recognize
Hook messages typically contain specific task instructions — e.g., "Find density of Ti-6Al-4V" or "Review the thermal analysis assumptions." They arrive outside of normal Discord conversation flow.
### How to Respond
1. **Treat as top priority** — process before other pending work
2. **Do the work** — execute the requested task fully
3. **Respond in Discord** — your response is automatically routed to Discord if `--deliver` was set
4. **Be thorough but concise** — the requesting agent needs actionable results
5. **If you can't complete the task**, explain why clearly so the requester can reassign or adjust
### Status Reporting
After completing a delegated task, **append a status line** to `/home/papa/atomizer/workspaces/shared/project_log.md`:
```
[YYYY-MM-DD HH:MM] <your-agent-name>: Completed — <brief description of what was done>
```
Only the **Manager** updates `PROJECT_STATUS.md`. Everyone else appends to the log.
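The status-line format above can be generated mechanically. A minimal sketch, writing to a temporary file here instead of the real `project_log.md` (`append_status` is an illustrative helper):

```python
from datetime import datetime, timezone
from pathlib import Path
import tempfile

def append_status(log_path: Path, agent: str, summary: str) -> str:
    """Append one status line in the documented format and return it."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    line = f"[{stamp}] {agent}: Completed — {summary}"
    with open(log_path, "a") as fh:
        fh.write(line + "\n")
    return line

# Demo target; real agents append to .../workspaces/shared/project_log.md
log = Path(tempfile.gettempdir()) / "project_log_demo.md"
line = append_status(log, "webster", "Found CTE of Zerodur Class 0")
print(line)
```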
## Delegation Authority
| Agent | Can Delegate To |
|-------|----------------|
| Manager | All agents |
| Tech Lead | All agents except Manager |
| All others | Cannot delegate (request via Manager or Tech Lead) |


@@ -0,0 +1,13 @@
# Project Status Dashboard
Updated: 2026-02-15 10:25 AM
## Active Tasks
- **Material Research (Webster):**
- [x] Zerodur Class 0 CTE data acknowledged (2026-02-15 10:07)
- [x] Ohara Clearceram-Z HS density confirmed: 2.55 g/cm³ (2026-02-15 10:12)
- [x] Zerodur Young's Modulus logged: 90.3 GPa (2026-02-15 10:18)
## Recent Activity
- Webster logged Young's Modulus for Zerodur (90.3 GPa) via orchestration hook.
- Webster confirmed receipt of orchestration ping.
- Webster reported density for Ohara Clearceram-Z HS (2.55 g/cm³).


@@ -0,0 +1,6 @@
[2026-02-15 18:12] webster: Completed — Research on Ohara Clearceram-Z HS vs Schott Zerodur.
[2026-02-15 18:12] webster: Completed — Updated and refined the research summary for Clearceram-Z HS vs. Zerodur with more nuanced data.
[2026-02-15 18:12] webster: Completed — Received duplicate refined research summary (Clearceram-Z HS vs. Zerodur). No action taken as data is already in memory.
[2026-02-15 18:30] webster: Completed — Logged new material property (Invar 36 Young's modulus) to memory.
[2026-02-15 18:30] webster: Completed — Received duplicate material property for Invar 36. No action taken as data is already in memory.


@@ -0,0 +1,68 @@
# Delegate Task to Another Agent
Sends a task to another Atomizer agent via the OpenClaw Hooks API. The target agent processes the task in an isolated session and optionally delivers the response to Discord.
## When to Use
- You need another agent to perform a task (research, analysis, NX work, etc.)
- You want to assign work and get a response in a Discord channel
- Cross-agent orchestration
## Usage
```bash
bash /home/papa/atomizer/workspaces/shared/skills/delegate/delegate.sh <agent> "<instruction>" [options]
```
### Agents
| Agent | Specialty |
|-------|-----------|
| `manager` | Orchestration, project oversight |
| `tech-lead` | Technical decisions, FEA review |
| `secretary` | Meeting notes, admin, status updates |
| `auditor` | Quality checks, compliance review |
| `optimizer` | Optimization setup, parameter studies |
| `study-builder` | Study configuration, DOE |
| `nx-expert` | NX/Simcenter operations |
| `webster` | Web research, literature search |
### Options
- `--channel <discord-channel-id>` — Route response to a specific Discord channel
- `--deliver` / `--no-deliver` — Whether to post response to Discord (default: deliver)
### Examples
```bash
# Ask Webster to research something
bash /home/papa/atomizer/workspaces/shared/skills/delegate/delegate.sh webster "Find the CTE of Zerodur Class 0 between 20-40°C"
# Assign NX work with channel routing
bash /home/papa/atomizer/workspaces/shared/skills/delegate/delegate.sh nx-expert "Create mesh convergence study for M2 mirror" --channel C0AEJV13TEU
# Ask auditor to review without posting to Discord
bash /home/papa/atomizer/workspaces/shared/skills/delegate/delegate.sh auditor "Review the thermal analysis assumptions" --no-deliver
```
## How It Works
1. Looks up the target agent's port from the cluster port map
2. Checks if the target agent is running
3. Sends a `POST /hooks/agent` request to the target's OpenClaw instance
4. Target agent processes the task in an isolated session
5. Response is delivered to Discord if `--deliver` is set
## Response
The script outputs:
- ✅ confirmation with run ID on success
- ❌ error message with HTTP code on failure
The delegated task runs **asynchronously** — you won't get the result inline. The target agent will respond in Discord.
## Notes
- Tasks are fire-and-forget. Monitor the Discord channel for the response.
- The target agent sees the message as a hook trigger, not a Discord message.
- For complex multi-step workflows, delegate one step at a time.


@@ -0,0 +1,118 @@
#!/usr/bin/env bash
# delegate.sh — Send a task to another Atomizer agent via OpenClaw Hooks API
# Usage: delegate.sh <agent> <message> [--channel <discord-channel-id>] [--deliver] [--wait]
#
# Examples:
# delegate.sh webster "Find density of Ti-6Al-4V"
# delegate.sh nx-expert "Mesh the M2 mirror" --channel C0AEJV13TEU --deliver
# delegate.sh tech-lead "Review optimization results" --deliver
set -euo pipefail

# --- Port Map (from cluster config) ---
declare -A PORT_MAP=(
  [manager]=18800
  [tech-lead]=18804
  [secretary]=18808
  [auditor]=18812
  [optimizer]=18816
  [study-builder]=18820
  [nx-expert]=18824
  [webster]=18828
)

# --- Config ---
TOKEN="${GATEWAY_TOKEN:?GATEWAY_TOKEN env var is required}"
HOST="127.0.0.1"

# --- Parse args ---
if [[ $# -lt 2 ]]; then
  echo "Usage: delegate.sh <agent> <message> [--channel <id>] [--deliver] [--wait]"
  echo ""
  echo "Agents: ${!PORT_MAP[*]}"
  exit 1
fi
AGENT="$1"
MESSAGE="$2"
shift 2

CHANNEL=""
DELIVER="true"
WAIT=""  # accepted for forward compatibility; waiting is not implemented (tasks are fire-and-forget)
while [[ $# -gt 0 ]]; do
  case "$1" in
    --channel) CHANNEL="$2"; shift 2 ;;
    --deliver) DELIVER="true"; shift ;;
    --no-deliver) DELIVER="false"; shift ;;
    --wait) WAIT="true"; shift ;;
    *) echo "Unknown option: $1"; exit 1 ;;
  esac
done

# --- Validate agent ---
PORT="${PORT_MAP[$AGENT]:-}"
if [[ -z "$PORT" ]]; then
  echo "❌ Unknown agent: $AGENT"
  echo "Available agents: ${!PORT_MAP[*]}"
  exit 1
fi

# --- Don't delegate to yourself ---
SELF_PORT="${ATOMIZER_SELF_PORT:-}"
if [[ -n "$SELF_PORT" && "$PORT" == "$SELF_PORT" ]]; then
  echo "❌ Cannot delegate to yourself"
  exit 1
fi

# --- Check if target is running ---
if ! curl -sf "http://$HOST:$PORT/health" > /dev/null 2>&1; then
  # Try a simple connection check instead
  if ! timeout 2 bash -c "echo > /dev/tcp/$HOST/$PORT" 2>/dev/null; then
    echo "❌ Agent '$AGENT' is not running on port $PORT"
    exit 1
  fi
fi

# --- Build payload ---
PAYLOAD=$(cat <<EOF
{
  "message": $(printf '%s' "$MESSAGE" | python3 -c "import json,sys; print(json.dumps(sys.stdin.read()))"),
  "name": "delegation",
  "sessionKey": "hook:delegation:$(date +%s)",
  "deliver": $DELIVER,
  "channel": "discord"
}
EOF
)

# Add Discord channel routing if specified
if [[ -n "$CHANNEL" ]]; then
  PAYLOAD=$(echo "$PAYLOAD" | python3 -c "
import json, sys
d = json.load(sys.stdin)
d['to'] = 'channel:$CHANNEL'
print(json.dumps(d))
")
fi

# --- Send ---
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "http://$HOST:$PORT/hooks/agent" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
BODY=$(echo "$RESPONSE" | head -n -1)

if [[ "$HTTP_CODE" == "202" ]]; then
  RUN_ID=$(echo "$BODY" | python3 -c "import json,sys; print(json.loads(sys.stdin.read()).get('runId','unknown'))" 2>/dev/null || echo "unknown")
  echo "✅ Task delegated to $AGENT (port $PORT)"
  echo "   Run ID: $RUN_ID"
  echo "   Deliver to Discord: $DELIVER"
else
  echo "❌ Delegation failed (HTTP $HTTP_CODE)"
  echo "   Response: $BODY"
  exit 1
fi


@@ -0,0 +1,116 @@
# Orchestration Engine — Atomizer HQ
> Multi-instance synchronous delegation, workflow pipelines, and inter-agent coordination.
## Overview
The Orchestration Engine enables structured communication between 8 independent OpenClaw agent instances running on Discord. It replaces fire-and-forget delegation with synchronous handoffs, chaining, validation, and reusable YAML workflows.
## Architecture
```
┌─────────────────────────────────────────────────┐
│ LAYER 3: WORKFLOWS │
│ YAML multi-step pipelines │
│ (workflow.py — parallel, sequential, gates) │
├─────────────────────────────────────────────────┤
│ LAYER 2: SMART ROUTING │
│ Capability registry + channel context │
│ (AGENTS_REGISTRY.json + fetch-channel-context) │
├─────────────────────────────────────────────────┤
│ LAYER 1: ORCHESTRATION CORE │
│ Synchronous delegation + result return │
│ (orchestrate.py — inotify + handoffs) │
├─────────────────────────────────────────────────┤
│ EXISTING INFRASTRUCTURE │
│ 8 OpenClaw instances, hooks API, shared fs │
└─────────────────────────────────────────────────┘
```
## Files
| File | Purpose |
|------|---------|
| `orchestrate.py` | Core delegation engine — sends tasks, waits for handoff files via inotify |
| `orchestrate.sh` | Thin bash wrapper for orchestrate.py |
| `workflow.py` | YAML workflow engine — parses, resolves deps, executes pipelines |
| `workflow.sh` | Thin bash wrapper for workflow.py |
| `fetch-channel-context.sh` | Fetches Discord channel history as formatted context |
| `metrics.py` | Analyzes handoff files and workflow runs for stats |
| `metrics.sh` | Thin bash wrapper for metrics.py |
## Usage
### Single delegation
```bash
# Synchronous — blocks until agent responds
python3 orchestrate.py webster "Find CTE of Zerodur" --caller manager --timeout 120
# With channel context
python3 orchestrate.py tech-lead "Review thermal margins" --caller manager --channel-context technical --channel-messages 20
# With validation
python3 orchestrate.py webster "Research ULE properties" --caller manager --validate --timeout 120
```
### Workflow execution
```bash
# Dry-run (validate without executing)
python3 workflow.py quick-research --input query="CTE of ULE" --caller manager --dry-run
# Live run
python3 workflow.py quick-research --input query="CTE of ULE" --caller manager --non-interactive
# Material trade study (3-step pipeline)
python3 workflow.py material-trade-study \
--input materials="Zerodur, Clearceram-Z HS, ULE" \
--input requirements="CTE < 0.01 ppm/K" \
--caller manager --non-interactive
```
### Metrics
```bash
python3 metrics.py text # Human-readable
python3 metrics.py json # JSON output
```
## Handoff Protocol
Agents write structured JSON to `/home/papa/atomizer/handoffs/{runId}.json`:
```json
{
"schemaVersion": "1.0",
"runId": "orch-...",
"agent": "webster",
"status": "complete|partial|blocked|failed",
"result": "...",
"artifacts": [],
"confidence": "high|medium|low",
"notes": "...",
"timestamp": "ISO-8601"
}
```
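Receivers can sanity-check a handoff before trusting it. A minimal validation sketch based on the schema above (`validate_handoff` is illustrative, not shipped code):

```python
# Field names and status values taken from the handoff schema above.
REQUIRED = ("schemaVersion", "runId", "agent", "status", "result",
            "confidence", "timestamp")
VALID_STATUS = {"complete", "partial", "blocked", "failed"}

def validate_handoff(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the handoff looks well-formed."""
    problems = [f"missing field: {f}" for f in REQUIRED if f not in doc]
    if "status" in doc and doc["status"] not in VALID_STATUS:
        problems.append(f"bad status: {doc['status']!r}")
    return problems

good = {
    "schemaVersion": "1.0", "runId": "orch-123", "agent": "webster",
    "status": "complete", "result": "CTE data found", "artifacts": [],
    "confidence": "high", "notes": "", "timestamp": "2026-02-15T18:12:00Z",
}
print(validate_handoff(good))              # []
print(validate_handoff({"status": "done"}))  # missing fields plus bad status
```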
## ACL Matrix
| Caller | Can delegate to |
|--------|----------------|
| manager | All agents |
| tech-lead | webster, nx-expert, study-builder, secretary |
| optimizer | webster, study-builder, secretary |
| Others | Cannot sub-delegate |
## Workflow Templates
- `quick-research.yaml` — 2 steps: Webster research → Tech-Lead validation
- `material-trade-study.yaml` — 3 steps: Webster research → Tech-Lead evaluation → Auditor review
- `design-review.yaml` — 3 steps: Tech-Lead + Optimizer (parallel) → Auditor consolidation
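For orientation, a hypothetical `quick-research.yaml` could look like the following; the exact keys are an assumption, since the real templates are not reproduced here:

```yaml
# Hypothetical sketch only — the shipped template keys may differ.
name: quick-research
inputs: [query]
steps:
  - id: research
    agent: webster
    task: "Research: {query}"
  - id: validate
    agent: tech-lead
    needs: [research]
    task: "Validate the findings from step 'research'"
```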
## Result Storage
- Individual handoffs: `/home/papa/atomizer/handoffs/orch-*.json`
- Sub-delegations: `/home/papa/atomizer/handoffs/sub/`
- Workflow runs: `/home/papa/atomizer/handoffs/workflows/{workflow-run-id}/`
- Per-step: `{step-id}.json`
- Summary: `summary.json`


@@ -0,0 +1,192 @@
#!/usr/bin/env bash
# Usage: fetch-channel-context.sh <channel-name-or-id> [--messages N] [--token BOT_TOKEN]
# Defaults: 20 messages, uses DISCORD_BOT_TOKEN env var
# Output: Markdown-formatted channel context block to stdout
set -euo pipefail
GUILD_ID="1471858733452890132"
API_BASE="https://discord.com/api/v10"
DEFAULT_MESSAGES=20
MAX_MESSAGES=30
MAX_OUTPUT_CHARS=4000
usage() {
  echo "Usage: $0 <channel-name-or-id> [--messages N] [--token BOT_TOKEN]" >&2
}

if [[ $# -lt 1 ]]; then
  usage
  exit 1
fi
CHANNEL_INPUT="$1"
shift

MESSAGES="$DEFAULT_MESSAGES"
TOKEN="${DISCORD_BOT_TOKEN:-}"
while [[ $# -gt 0 ]]; do
  case "$1" in
    --messages)
      [[ $# -ge 2 ]] || { echo "Missing value for --messages" >&2; exit 1; }
      MESSAGES="$2"
      shift 2
      ;;
    --token)
      [[ $# -ge 2 ]] || { echo "Missing value for --token" >&2; exit 1; }
      TOKEN="$2"
      shift 2
      ;;
    *)
      echo "Unknown option: $1" >&2
      usage
      exit 1
      ;;
  esac
done

if [[ -z "$TOKEN" ]]; then
  echo "Missing bot token. Use --token or set DISCORD_BOT_TOKEN." >&2
  exit 1
fi
if ! [[ "$MESSAGES" =~ ^[0-9]+$ ]]; then
  echo "--messages must be a positive integer" >&2
  exit 1
fi
if (( MESSAGES < 1 )); then
  MESSAGES=1
fi
if (( MESSAGES > MAX_MESSAGES )); then
  MESSAGES=$MAX_MESSAGES
fi
AUTH_HEADER="Authorization: Bot ${TOKEN}"
resolve_channel() {
  local input="$1"
  if [[ "$input" =~ ^[0-9]{8,}$ ]]; then
    local ch_json
    ch_json="$(curl -sf -H "$AUTH_HEADER" "${API_BASE}/channels/${input}")" || return 1
    python3 - "$ch_json" <<'PY'
import json, sys
obj = json.loads(sys.argv[1])
cid = obj.get("id", "")
name = obj.get("name", cid)
if not cid:
    sys.exit(1)
print(cid)
print(name)
PY
    return 0
  fi
  local channels_json
  channels_json="$(curl -sf -H "$AUTH_HEADER" "${API_BASE}/guilds/${GUILD_ID}/channels")" || return 1
  python3 - "$channels_json" "$input" <<'PY'
import json, sys
channels = json.loads(sys.argv[1])
needle = sys.argv[2].strip().lstrip('#').lower()
for ch in channels:
    if str(ch.get("type")) not in {"0", "5", "15"}:
        continue
    name = (ch.get("name") or "").lower()
    if name == needle:
        print(ch.get("id", ""))
        print(ch.get("name", ""))
        sys.exit(0)
print("", end="")
sys.exit(1)
PY
}
if ! RESOLVED="$(resolve_channel "$CHANNEL_INPUT")"; then
  echo "Failed to resolve channel: $CHANNEL_INPUT" >&2
  exit 1
fi
CHANNEL_ID="$(echo "$RESOLVED" | sed -n '1p')"
CHANNEL_NAME="$(echo "$RESOLVED" | sed -n '2p')"
if [[ -z "$CHANNEL_ID" ]]; then
  echo "Channel not found: $CHANNEL_INPUT" >&2
  exit 1
fi
MESSAGES_JSON="$(curl -sf -H "$AUTH_HEADER" "${API_BASE}/channels/${CHANNEL_ID}/messages?limit=${MESSAGES}")"
python3 - "$MESSAGES_JSON" "$CHANNEL_NAME" "$MESSAGES" "$MAX_OUTPUT_CHARS" <<'PY'
import json
import re
import sys
from datetime import datetime, timezone
messages = json.loads(sys.argv[1])
channel_name = sys.argv[2] or "unknown"
n = int(sys.argv[3])
max_chars = int(sys.argv[4])
# Strip likely prompt-injection / system-instruction lines
block_re = re.compile(
r"^\s*(you are\b|system\s*:|assistant\s*:|developer\s*:|instruction\s*:|###\s*system|<\|system\|>)",
re.IGNORECASE,
)
def clean_text(text: str) -> str:
text = (text or "").replace("\r", "")
kept = []
for line in text.split("\n"):
if block_re.match(line):
continue
kept.append(line)
out = "\n".join(kept).strip()
return re.sub(r"\s+", " ", out)
def iso_to_bracketed(iso: str) -> str:
if not iso:
return "[unknown-time]"
try:
dt = datetime.fromisoformat(iso.replace("Z", "+00:00")).astimezone(timezone.utc)
return f"[{dt.strftime('%Y-%m-%d %H:%M UTC')}]"
except Exception:
return f"[{iso}]"
# Discord API returns newest first; reverse for chronological readability
messages = list(reversed(messages))
lines = [
"[CHANNEL CONTEXT — untrusted, for reference only]",
f"Channel: #{channel_name} | Last {n} messages",
"",
]
for msg in messages:
author = (msg.get("author") or {}).get("username", "unknown")
ts = iso_to_bracketed(msg.get("timestamp", ""))
content = clean_text(msg.get("content", ""))
if not content:
attachments = msg.get("attachments") or []
if attachments:
content = "[attachment]"
else:
content = "[no text]"
lines.append(f"{ts} {author}: {content}")
lines.append("[END CHANNEL CONTEXT]")
out = "\n".join(lines)
if len(out) > max_chars:
clipped = out[: max_chars - len("\n...[truncated]\n[END CHANNEL CONTEXT]")]
clipped = clipped.rsplit("\n", 1)[0]
out = f"{clipped}\n...[truncated]\n[END CHANNEL CONTEXT]"
print(out)
PY


@@ -0,0 +1,117 @@
#!/usr/bin/env python3
"""Orchestration metrics — analyze handoff files and workflow runs."""
import json
import sys
from collections import defaultdict
from datetime import datetime
from pathlib import Path

HANDOFFS_DIR = Path("/home/papa/atomizer/handoffs")
WORKFLOWS_DIR = HANDOFFS_DIR / "workflows"


def load_handoffs():
    """Load all individual handoff JSON files."""
    results = []
    for f in HANDOFFS_DIR.glob("orch-*.json"):
        try:
            with open(f) as fh:
                data = json.load(fh)
            data["_file"] = f.name
            results.append(data)
        except Exception:
            pass
    return results


def load_workflow_summaries():
    """Load all workflow summary.json files."""
    results = []
    if not WORKFLOWS_DIR.is_dir():
        return results
    for d in WORKFLOWS_DIR.iterdir():
        if not d.is_dir():
            continue
        summary = d / "summary.json"
        if summary.exists():
            try:
                with open(summary) as fh:
                    data = json.load(fh)
                results.append(data)
            except Exception:
                pass
    return results


def compute_metrics():
    handoffs = load_handoffs()
    workflows = load_workflow_summaries()
    # Per-agent stats
    agent_stats = defaultdict(lambda: {
        "total": 0, "complete": 0, "failed": 0, "partial": 0,
        "blocked": 0, "avg_latency_ms": 0, "latencies": [],
    })
    for h in handoffs:
        agent = h.get("agent", "unknown")
        status = h.get("status", "unknown")
        agent_stats[agent]["total"] += 1
        if status in agent_stats[agent]:
            agent_stats[agent][status] += 1
        lat = h.get("latencyMs")
        if lat:
            agent_stats[agent]["latencies"].append(lat)
    # Compute averages
    for agent, stats in agent_stats.items():
        lats = stats.pop("latencies")
        if lats:
            stats["avg_latency_ms"] = int(sum(lats) / len(lats))
            stats["min_latency_ms"] = min(lats)
            stats["max_latency_ms"] = max(lats)
        stats["success_rate"] = f"{stats['complete']/stats['total']*100:.0f}%" if stats["total"] > 0 else "N/A"
    # Workflow stats
    wf_stats = {"total": len(workflows), "complete": 0, "failed": 0, "partial": 0, "avg_duration_s": 0, "durations": []}
    for w in workflows:
        status = w.get("status", "unknown")
        if status == "complete":
            wf_stats["complete"] += 1
        elif status in ("failed", "error"):
            wf_stats["failed"] += 1
        else:
            wf_stats["partial"] += 1
        dur = w.get("duration_s")
        if dur:
            wf_stats["durations"].append(dur)
    durs = wf_stats.pop("durations")
    if durs:
        wf_stats["avg_duration_s"] = round(sum(durs) / len(durs), 1)
        wf_stats["min_duration_s"] = round(min(durs), 1)
        wf_stats["max_duration_s"] = round(max(durs), 1)
    wf_stats["success_rate"] = f"{wf_stats['complete']/wf_stats['total']*100:.0f}%" if wf_stats["total"] > 0 else "N/A"
    return {
        "generated_at": datetime.utcnow().isoformat() + "Z",
        "total_handoffs": len(handoffs),
        "total_workflows": len(workflows),
        "agent_stats": dict(agent_stats),
        "workflow_stats": wf_stats,
    }


def main():
    fmt = sys.argv[1] if len(sys.argv) > 1 else "json"
    metrics = compute_metrics()
    if fmt == "text":
        print("=== Orchestration Metrics ===")
        print(f"Generated: {metrics['generated_at']}")
        print(f"Total handoffs: {metrics['total_handoffs']}")
        print(f"Total workflows: {metrics['total_workflows']}")
        print()
        print("--- Per-Agent Stats ---")
        for agent, stats in sorted(metrics["agent_stats"].items()):
            print(f"  {agent}: {stats['total']} tasks, {stats['success_rate']} success, avg {stats.get('avg_latency_ms', 'N/A')}ms")
        print()
        print("--- Workflow Stats ---")
        ws = metrics["workflow_stats"]
        print(f"  {ws['total']} runs, {ws['success_rate']} success, avg {ws.get('avg_duration_s', 'N/A')}s")
    else:
        print(json.dumps(metrics, indent=2))


if __name__ == "__main__":
    main()


@@ -0,0 +1,2 @@
#!/usr/bin/env bash
exec python3 "$(dirname "$0")/metrics.py" "$@"


@@ -0,0 +1,582 @@
#!/usr/bin/env python3
"""
Atomizer HQ Orchestration Engine — Phase 1b
Synchronous delegation with file-based handoffs, inotify, validation, retries, error handling.
Usage:
python3 orchestrate.py <agent> "<task>" [options]
Options:
--wait Block until agent completes (default: true)
--timeout <sec> Max wait time per attempt (default: 300)
--format json|text Expected response format (default: json)
--context <file> Attach context file to the task
--no-deliver Don't post to Discord
--run-id <id> Custom run ID (default: auto-generated)
--retries <N> Retry on failure (default: 1, max: 3)
--validate Validate required handoff fields strictly
--workflow-id <id> Workflow run ID (for tracing)
--step-id <id> Workflow step ID (for tracing)
--caller <agent> Calling agent (for ACL enforcement)
--channel-context <channel> Include recent Discord channel history as untrusted context
--channel-messages <N> Number of channel messages to fetch (default: 20, max: 30)
"""
import argparse
import json
import os
import subprocess
import sys
import time
import uuid
from pathlib import Path
# ── Constants ────────────────────────────────────────────────────────────────
HANDOFF_DIR = Path("/home/papa/atomizer/handoffs")
LOG_DIR = Path("/home/papa/atomizer/logs/orchestration")
REGISTRY_PATH = Path("/home/papa/atomizer/workspaces/shared/AGENTS_REGISTRY.json")
ORCHESTRATE_DIR = Path("/home/papa/atomizer/workspaces/shared/skills/orchestrate")
GATEWAY_TOKEN = "31422bb39bc9e7a4d34f789d8a7cbc582dece8dd170dadd1"
# Port map (fallback if registry unavailable)
AGENT_PORTS = {
"manager": 18800,
"tech-lead": 18804,
"secretary": 18808,
"auditor": 18812,
"optimizer": 18816,
"study-builder": 18820,
"nx-expert": 18824,
"webster": 18828,
}
# Delegation ACL — who can delegate to whom
DELEGATION_ACL = {
"manager": ["tech-lead", "auditor", "optimizer", "study-builder", "nx-expert", "webster", "secretary"],
"tech-lead": ["webster", "nx-expert", "study-builder", "secretary"],
"optimizer": ["webster", "study-builder", "secretary"],
# All others: no sub-delegation allowed
}
# Required handoff fields for strict validation
REQUIRED_FIELDS = ["status", "result"]
STRICT_FIELDS = ["schemaVersion", "status", "result", "confidence", "timestamp"]
# ── Helpers ──────────────────────────────────────────────────────────────────
def get_agent_port(agent: str) -> int:
"""Resolve agent name to port, checking registry first."""
if REGISTRY_PATH.exists():
try:
registry = json.loads(REGISTRY_PATH.read_text())
agent_info = registry.get("agents", {}).get(agent)
if agent_info and "port" in agent_info:
return agent_info["port"]
except (json.JSONDecodeError, KeyError):
pass
port = AGENT_PORTS.get(agent)
if port is None:
emit_error(f"Unknown agent '{agent}'")
sys.exit(1)
return port
def check_acl(caller: str | None, target: str) -> bool:
"""Check if caller is allowed to delegate to target."""
if caller is None:
return True # No caller specified = no ACL enforcement
if caller == target:
return False # No self-delegation
allowed = DELEGATION_ACL.get(caller)
if allowed is None:
return False # Agent not in ACL = cannot delegate
return target in allowed
def check_health(agent: str, port: int) -> bool:
"""Quick health check — can we reach the agent's gateway?"""
try:
result = subprocess.run(
["curl", "-sf", "-o", "/dev/null", "-w", "%{http_code}",
f"http://127.0.0.1:{port}/healthz"],
capture_output=True, text=True, timeout=5
)
return result.stdout.strip() in ("200", "204")
except (subprocess.TimeoutExpired, Exception):
return False
def send_task(agent: str, port: int, task: str, run_id: str,
attempt: int = 1, prev_error: str = None,
context: str = None, no_deliver: bool = False) -> bool:
"""Send a task to the agent via /hooks/agent endpoint."""
handoff_path = HANDOFF_DIR / f"{run_id}.json"
# Build retry context if this is a retry
retry_note = ""
if attempt > 1 and prev_error:
retry_note = f"\n⚠️ RETRY (attempt {attempt}): Previous attempt failed: {prev_error}\nPlease try again carefully.\n"
message = f"""[ORCHESTRATED TASK — run_id: {run_id}]
{retry_note}
IMPORTANT: Answer this task DIRECTLY. Do NOT spawn sub-agents, Codex, or background processes.
Use your own knowledge and tools (web_search, web_fetch) directly. Keep your response focused and concise.
{task}
---
IMPORTANT: When you complete this task, write your response as a JSON file to:
{handoff_path}
Use this exact format:
```json
{{
"schemaVersion": "1.0",
"runId": "{run_id}",
"agent": "{agent}",
"status": "complete",
"result": "<your findings/output here>",
"artifacts": [],
"confidence": "high|medium|low",
"notes": "<any caveats or open questions>",
"timestamp": "<ISO-8601 timestamp>"
}}
```
Status values: complete | partial | blocked | failed
Write the file BEFORE posting to Discord. The orchestrator is waiting for it."""
if context:
message = f"CONTEXT:\n{context}\n\n{message}"
payload = {
"message": message,
"name": f"orchestrate:{run_id}",
"sessionKey": f"hook:orchestrate:{run_id}:{attempt}",
"deliver": not no_deliver,
"wakeMode": "now",
"timeoutSeconds": 600,
}
try:
result = subprocess.run(
["curl", "-sf", "-X", "POST",
f"http://127.0.0.1:{port}/hooks/agent",
"-H", f"Authorization: Bearer {GATEWAY_TOKEN}",
"-H", "Content-Type: application/json",
"-d", json.dumps(payload)],
capture_output=True, text=True, timeout=15
)
return result.returncode == 0
    except Exception as e:  # covers subprocess.TimeoutExpired
        log_event(run_id, agent, "send_error", str(e), attempt=attempt)
        return False
def wait_for_handoff(run_id: str, timeout: int) -> dict | None:
"""Wait for the handoff file using inotify. Falls back to polling."""
handoff_path = HANDOFF_DIR / f"{run_id}.json"
# Check if already exists (agent was fast, or late arrival from prev attempt)
if handoff_path.exists():
return read_handoff(handoff_path)
try:
from inotify_simple import INotify, flags
inotify = INotify()
watch_flags = flags.CREATE | flags.MOVED_TO | flags.CLOSE_WRITE
wd = inotify.add_watch(str(HANDOFF_DIR), watch_flags)
deadline = time.time() + timeout
target_name = f"{run_id}.json"
while time.time() < deadline:
remaining = max(0.1, deadline - time.time())
events = inotify.read(timeout=int(remaining * 1000))
for event in events:
if event.name == target_name:
time.sleep(0.3) # Ensure file is fully written
inotify.close()
return read_handoff(handoff_path)
# Direct check in case we missed the inotify event
if handoff_path.exists():
inotify.close()
return read_handoff(handoff_path)
inotify.close()
return None
except ImportError:
return poll_for_handoff(handoff_path, timeout)
def poll_for_handoff(handoff_path: Path, timeout: int) -> dict | None:
"""Fallback polling if inotify unavailable."""
deadline = time.time() + timeout
while time.time() < deadline:
if handoff_path.exists():
time.sleep(0.3)
return read_handoff(handoff_path)
time.sleep(2)
return None
def read_handoff(path: Path) -> dict | None:
"""Read and parse a handoff file."""
try:
raw = path.read_text().strip()
data = json.loads(raw)
return data
    except json.JSONDecodeError:
        return {
            "status": "malformed",
            "result": raw[:2000],
            "notes": "Invalid JSON in handoff file",
            "_raw": True,
        }
except Exception as e:
return {
"status": "error",
"result": str(e),
"notes": f"Failed to read handoff file: {e}",
}
def validate_handoff(data: dict, strict: bool = False) -> tuple[bool, str]:
"""Validate handoff data. Returns (valid, error_message)."""
if data is None:
return False, "No handoff data"
fields = STRICT_FIELDS if strict else REQUIRED_FIELDS
missing = [f for f in fields if f not in data]
if missing:
return False, f"Missing fields: {', '.join(missing)}"
status = data.get("status", "")
if status not in ("complete", "partial", "blocked", "failed"):
return False, f"Invalid status: '{status}'"
if status == "failed":
return False, f"Agent reported failure: {data.get('notes', 'no details')}"
if status == "blocked":
return False, f"Agent blocked: {data.get('notes', 'no details')}"
return True, ""
def should_retry(result: dict | None, attempt: int, max_retries: int) -> tuple[bool, str]:
"""Decide whether to retry based on result and attempt count."""
if attempt >= max_retries:
return False, "Max retries reached"
if result is None:
return True, "timeout"
status = result.get("status", "")
if status == "malformed":
return True, "malformed response"
if status == "failed":
return True, f"agent failed: {result.get('notes', '')}"
if status == "partial" and result.get("confidence") == "low":
return True, "partial with low confidence"
if status == "error":
return True, f"error: {result.get('notes', '')}"
return False, ""
def clear_handoff(run_id: str):
"""Remove handoff file before retry."""
handoff_path = HANDOFF_DIR / f"{run_id}.json"
if handoff_path.exists():
# Rename to .prev instead of deleting (for debugging)
handoff_path.rename(handoff_path.with_suffix(".prev.json"))
def log_event(run_id: str, agent: str, event_type: str, detail: str = "",
attempt: int = 1, elapsed_ms: int = 0, **extra):
"""Unified logging."""
LOG_DIR.mkdir(parents=True, exist_ok=True)
log_file = LOG_DIR / f"{time.strftime('%Y-%m-%d')}.jsonl"
entry = {
"timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
"runId": run_id,
"agent": agent,
"event": event_type,
"detail": detail[:500],
"attempt": attempt,
"elapsedMs": elapsed_ms,
**extra,
}
with open(log_file, "a") as f:
f.write(json.dumps(entry) + "\n")
def emit_error(msg: str):
"""Print error to stderr."""
print(f"ERROR: {msg}", file=sys.stderr)
def get_discord_token_for_caller(caller: str) -> str | None:
"""Load caller bot token from instance config."""
cfg = Path(f"/home/papa/atomizer/instances/{caller}/openclaw.json")
if not cfg.exists():
return None
try:
data = json.loads(cfg.read_text())
return data.get("channels", {}).get("discord", {}).get("token")
except Exception:
return None
def fetch_channel_context(channel: str, messages: int, token: str) -> str | None:
"""Fetch formatted channel context via helper script."""
script = ORCHESTRATE_DIR / "fetch-channel-context.sh"
if not script.exists():
return None
try:
result = subprocess.run(
[str(script), channel, "--messages", str(messages), "--token", token],
capture_output=True,
text=True,
timeout=30,
check=False,
)
if result.returncode != 0:
emit_error(f"Channel context fetch failed: {result.stderr.strip()}")
return None
return result.stdout.strip()
except Exception as e:
emit_error(f"Channel context fetch error: {e}")
return None
# ── Main ─────────────────────────────────────────────────────────────────────
def main():
parser = argparse.ArgumentParser(description="Atomizer Orchestration Engine")
parser.add_argument("agent", help="Target agent name")
parser.add_argument("task", help="Task to delegate")
    parser.add_argument("--wait", action=argparse.BooleanOptionalAction, default=True,
                        help="Wait for handoff file (use --no-wait for fire-and-forget)")
parser.add_argument("--timeout", type=int, default=300,
help="Timeout per attempt in seconds (default: 300)")
parser.add_argument("--format", choices=["json", "text"], default="json")
parser.add_argument("--context", type=str, default=None,
help="Path to context file")
parser.add_argument("--no-deliver", action="store_true")
parser.add_argument("--run-id", type=str, default=None)
parser.add_argument("--retries", type=int, default=1,
help="Max attempts (default: 1, max: 3)")
parser.add_argument("--validate", action="store_true",
help="Strict validation of handoff fields")
parser.add_argument("--workflow-id", type=str, default=None,
help="Workflow run ID for tracing")
parser.add_argument("--step-id", type=str, default=None,
help="Workflow step ID for tracing")
parser.add_argument("--caller", type=str, default=None,
help="Calling agent for ACL enforcement")
parser.add_argument("--channel-context", type=str, default=None,
help="Discord channel name or ID to include as context")
parser.add_argument("--channel-messages", type=int, default=20,
help="Number of channel messages to fetch (default: 20, max: 30)")
args = parser.parse_args()
# Clamp retries
max_retries = min(max(args.retries, 1), 3)
# Generate run ID
run_id = args.run_id or f"orch-{int(time.time())}-{uuid.uuid4().hex[:8]}"
# Task text can be augmented (e.g., channel context prepend)
delegated_task = args.task
# ACL check
if not check_acl(args.caller, args.agent):
result = {
"status": "error",
"result": None,
"notes": f"ACL denied: '{args.caller}' cannot delegate to '{args.agent}'",
"agent": args.agent,
"runId": run_id,
}
print(json.dumps(result, indent=2))
log_event(run_id, args.agent, "acl_denied", f"caller={args.caller}")
sys.exit(1)
# Resolve agent port
port = get_agent_port(args.agent)
# Health check
if not check_health(args.agent, port):
result = {
"status": "error",
"result": None,
"notes": f"Agent '{args.agent}' unreachable at port {port}",
"agent": args.agent,
"runId": run_id,
}
print(json.dumps(result, indent=2))
log_event(run_id, args.agent, "health_failed", f"port={port}")
sys.exit(1)
# Load context
context = None
if args.context:
ctx_path = Path(args.context)
if ctx_path.exists():
context = ctx_path.read_text()
else:
emit_error(f"Context file not found: {args.context}")
# Optional channel context
if args.channel_context:
if not args.caller:
emit_error("--channel-context requires --caller so bot token can be resolved")
sys.exit(1)
token = get_discord_token_for_caller(args.caller)
if not token:
emit_error(f"Could not resolve Discord bot token for caller '{args.caller}'")
sys.exit(1)
channel_messages = min(max(args.channel_messages, 1), 30)
ch_ctx = fetch_channel_context(args.channel_context, channel_messages, token)
if not ch_ctx:
emit_error(f"Failed to fetch channel context for '{args.channel_context}'")
sys.exit(1)
delegated_task = f"{ch_ctx}\n\n{delegated_task}"
# ── Retry loop ───────────────────────────────────────────────────────
result = None
prev_error = None
for attempt in range(1, max_retries + 1):
attempt_start = time.time()
log_event(run_id, args.agent, "attempt_start", delegated_task[:200],
attempt=attempt)
# Idempotency check: if handoff file exists from a previous attempt, use it
handoff_path = HANDOFF_DIR / f"{run_id}.json"
if attempt > 1 and handoff_path.exists():
result = read_handoff(handoff_path)
if result and result.get("status") in ("complete", "partial"):
log_event(run_id, args.agent, "late_arrival",
"Handoff file arrived between retries",
attempt=attempt)
break
# Previous result was bad, clear it for retry
clear_handoff(run_id)
# Send task
sent = send_task(args.agent, port, delegated_task, run_id,
attempt=attempt, prev_error=prev_error,
context=context, no_deliver=args.no_deliver)
if not sent:
prev_error = "Failed to send task"
log_event(run_id, args.agent, "send_failed", prev_error,
attempt=attempt)
if attempt < max_retries:
time.sleep(5) # Brief pause before retry
continue
result = {
"status": "error",
"result": None,
"notes": f"Failed to send task after {attempt} attempts",
}
break
# Wait for result
if args.wait:
result = wait_for_handoff(run_id, args.timeout)
elapsed = time.time() - attempt_start
# Validate
if result is not None:
valid, error_msg = validate_handoff(result, strict=args.validate)
if not valid:
log_event(run_id, args.agent, "validation_failed",
error_msg, attempt=attempt,
elapsed_ms=int(elapsed * 1000))
do_retry, reason = should_retry(result, attempt, max_retries)
if do_retry:
prev_error = reason
clear_handoff(run_id)
time.sleep(3)
continue
# No retry — return what we have
break
else:
# Valid result
log_event(run_id, args.agent, "complete",
result.get("status", ""),
attempt=attempt,
elapsed_ms=int(elapsed * 1000),
confidence=result.get("confidence"))
break
else:
# Timeout
log_event(run_id, args.agent, "timeout", "",
attempt=attempt,
elapsed_ms=int(elapsed * 1000))
do_retry, reason = should_retry(result, attempt, max_retries)
if do_retry:
prev_error = "timeout"
continue
result = {
"status": "timeout",
"result": None,
"notes": f"Agent did not respond within {args.timeout}s "
f"(attempt {attempt}/{max_retries})",
}
break
else:
# Fire and forget
print(json.dumps({"status": "sent", "runId": run_id, "agent": args.agent}))
sys.exit(0)
# ── Output ───────────────────────────────────────────────────────────
if result is None:
result = {
"status": "error",
"result": None,
"notes": "No result after all attempts",
}
# Add metadata
    # attempt_start is always bound here: the fire-and-forget path exits earlier
    total_elapsed = time.time() - attempt_start  # elapsed time of the final attempt
result["runId"] = run_id
result["agent"] = args.agent
result["latencyMs"] = int(total_elapsed * 1000)
if args.workflow_id:
result["workflowRunId"] = args.workflow_id
if args.step_id:
result["stepId"] = args.step_id
if args.format == "json":
print(json.dumps(result, indent=2))
    else:
        print(result.get("result") or "")
status = result.get("status", "error")
sys.exit(0 if status in ("complete", "partial") else 1)
if __name__ == "__main__":
main()
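The handoff contract above can be exercised end-to-end without an agent in the loop; a minimal stdlib-only sketch (the `REQUIRED` tuple and helpers below are assumptions mirroring the documented format, not the real `REQUIRED_FIELDS` constant):

```python
import json
import tempfile
from pathlib import Path

# Field names assumed from the handoff format above; the actual
# REQUIRED_FIELDS constant lives earlier in orchestrate.py and may differ.
REQUIRED = ("runId", "agent", "status", "result")

def write_handoff(handoff_dir: Path, run_id: str, agent: str, result: str) -> Path:
    """Simulate an agent finishing a task: write the handoff JSON file."""
    payload = {
        "schemaVersion": "1.0",
        "runId": run_id,
        "agent": agent,
        "status": "complete",
        "result": result,
        "artifacts": [],
        "confidence": "high",
        "notes": "",
        "timestamp": "2026-02-15T00:00:00Z",
    }
    path = handoff_dir / f"{run_id}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path

def validate(data: dict) -> tuple[bool, str]:
    """Same shape of checks as validate_handoff(): field presence + status whitelist."""
    missing = [f for f in REQUIRED if f not in data]
    if missing:
        return False, f"Missing fields: {', '.join(missing)}"
    if data["status"] not in ("complete", "partial", "blocked", "failed"):
        return False, f"Invalid status: '{data['status']}'"
    return True, ""

with tempfile.TemporaryDirectory() as d:
    path = write_handoff(Path(d), "orch-demo-1", "webster", "3 suppliers found")
    ok, err = validate(json.loads(path.read_text()))
print(ok, err)
```

A handoff that round-trips through this pair is exactly what `wait_for_handoff` plus `validate_handoff` expect to find on disk.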


@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Thin wrapper around orchestrate.py
# Usage: bash orchestrate.sh <agent> "<task>" [options]
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
exec python3 "$SCRIPT_DIR/orchestrate.py" "$@"
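orchestrate.py's retry policy is worth pinning down in isolation: retry on timeout, malformed output, reported failure, or a low-confidence partial, but never past the attempt cap. A stdlib sketch mirroring `should_retry()`:

```python
def should_retry(result, attempt, max_retries):
    """Mirror of orchestrate.py's retry policy."""
    if attempt >= max_retries:
        return False, "Max retries reached"
    if result is None:  # timeout: no handoff file ever appeared
        return True, "timeout"
    status = result.get("status", "")
    if status == "malformed":
        return True, "malformed response"
    if status == "failed":
        return True, f"agent failed: {result.get('notes', '')}"
    if status == "partial" and result.get("confidence") == "low":
        return True, "partial with low confidence"
    if status == "error":
        return True, f"error: {result.get('notes', '')}"
    return False, ""

# A complete result never retries; a timeout retries until attempts run out.
print(should_retry({"status": "complete"}, 1, 3))  # → (False, '')
print(should_retry(None, 1, 3))                    # → (True, 'timeout')
print(should_retry(None, 3, 3))                    # → (False, 'Max retries reached')
```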


@@ -0,0 +1,437 @@
#!/usr/bin/env python3
"""YAML workflow engine for Atomizer orchestration."""
from __future__ import annotations
import argparse
import json
import os
import re
import subprocess
import sys
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
try:
import yaml
except ImportError:
print(json.dumps({"status": "error", "error": "PyYAML is required (pip install pyyaml)"}, indent=2))
sys.exit(1)
WORKFLOWS_DIR = Path("/home/papa/atomizer/workspaces/shared/workflows")
ORCHESTRATE_PY = Path("/home/papa/atomizer/workspaces/shared/skills/orchestrate/orchestrate.py")
HANDOFF_WORKFLOWS_DIR = Path("/home/papa/atomizer/handoffs/workflows")
def now_iso() -> str:
return datetime.now(timezone.utc).isoformat()
def parse_inputs(items: list[str]) -> dict[str, Any]:
parsed: dict[str, Any] = {}
for item in items:
if "=" not in item:
raise ValueError(f"Invalid --input '{item}', expected key=value")
k, v = item.split("=", 1)
parsed[k.strip()] = v.strip()
return parsed
def resolve_workflow_path(name_or_path: str) -> Path:
p = Path(name_or_path)
if p.exists():
return p
candidates = [WORKFLOWS_DIR / name_or_path, WORKFLOWS_DIR / f"{name_or_path}.yaml", WORKFLOWS_DIR / f"{name_or_path}.yml"]
for c in candidates:
if c.exists():
return c
raise FileNotFoundError(f"Workflow not found: {name_or_path}")
def load_workflow(path: Path) -> dict[str, Any]:
data = yaml.safe_load(path.read_text())
if not isinstance(data, dict):
raise ValueError("Workflow YAML must be an object")
if not isinstance(data.get("steps"), list) or not data["steps"]:
raise ValueError("Workflow must define non-empty 'steps'")
return data
def validate_graph(steps: list[dict[str, Any]]) -> tuple[dict[str, dict[str, Any]], dict[str, set[str]], dict[str, set[str]], list[list[str]]]:
step_map: dict[str, dict[str, Any]] = {}
deps: dict[str, set[str]] = {}
reverse: dict[str, set[str]] = {}
for step in steps:
sid = step.get("id")
if not sid or not isinstance(sid, str):
raise ValueError("Each step needs string 'id'")
if sid in step_map:
raise ValueError(f"Duplicate step id: {sid}")
step_map[sid] = step
deps[sid] = set(step.get("depends_on", []) or [])
reverse[sid] = set()
for sid, dset in deps.items():
for dep in dset:
if dep not in step_map:
raise ValueError(f"Step '{sid}' depends on unknown step '{dep}'")
reverse[dep].add(sid)
# topological layering + cycle check
indeg = {sid: len(dset) for sid, dset in deps.items()}
ready = sorted([sid for sid, d in indeg.items() if d == 0])
visited = 0
layers: list[list[str]] = []
while ready:
layer = list(ready)
layers.append(layer)
visited += len(layer)
next_ready: list[str] = []
for sid in layer:
for child in sorted(reverse[sid]):
indeg[child] -= 1
if indeg[child] == 0:
next_ready.append(child)
ready = sorted(next_ready)
if visited != len(step_map):
cycle_nodes = [sid for sid, d in indeg.items() if d > 0]
raise ValueError(f"Dependency cycle detected involving: {', '.join(cycle_nodes)}")
return step_map, deps, reverse, layers
_VAR_RE = re.compile(r"\{([^{}]+)\}")
def substitute(text: str, step_outputs: dict[str, Any], inputs: dict[str, Any]) -> str:
def repl(match: re.Match[str]) -> str:
key = match.group(1).strip()
if key.startswith("inputs."):
iv = key.split(".", 1)[1]
if iv not in inputs:
return match.group(0)
return str(inputs[iv])
if key in step_outputs:
val = step_outputs[key]
if isinstance(val, (dict, list)):
return json.dumps(val, ensure_ascii=False)
return str(val)
return match.group(0)
return _VAR_RE.sub(repl, text)
def approval_check(step: dict[str, Any], non_interactive: bool) -> bool:
gate = step.get("approval_gate")
if not gate:
return True
if non_interactive:
print(f"WARNING: non-interactive mode, skipping approval gate '{gate}' for step '{step['id']}'", file=sys.stderr)
return True
print(f"Approval gate required for step '{step['id']}' ({gate}). Approve? [yes/no]: ", end="", flush=True)
response = sys.stdin.readline().strip().lower()
return response in {"y", "yes"}
def run_orchestrate(agent: str, task: str, timeout_s: int, caller: str, workflow_run_id: str, step_id: str, retries: int) -> dict[str, Any]:
cmd = [
"python3", str(ORCHESTRATE_PY),
agent,
task,
"--timeout", str(timeout_s),
"--caller", caller,
"--workflow-id", workflow_run_id,
"--step-id", step_id,
"--retries", str(max(1, retries)),
"--format", "json",
]
    try:
        # Wall-clock guard: orchestrate.py enforces its own per-attempt timeout,
        # but protect the workflow engine against a hung child process too.
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s * max(1, retries) + 60)
    except subprocess.TimeoutExpired:
        return {
            "status": "failed",
            "result": None,
            "notes": f"orchestrate.py did not return within the wall-clock budget",
            "exitCode": -1,
        }
    out = (proc.stdout or "").strip()
    if not out:
        return {
            "status": "failed",
            "result": None,
            "notes": f"No stdout from orchestrate.py; stderr: {(proc.stderr or '').strip()[:1000]}",
            "exitCode": proc.returncode,
        }
try:
data = json.loads(out)
except json.JSONDecodeError:
return {
"status": "failed",
"result": out,
"notes": f"Non-JSON response from orchestrate.py; stderr: {(proc.stderr or '').strip()[:1000]}",
"exitCode": proc.returncode,
}
data["exitCode"] = proc.returncode
if proc.stderr:
data["stderr"] = proc.stderr.strip()[:2000]
return data
def validation_passed(validation_result: dict[str, Any]) -> bool:
if validation_result.get("status") not in {"complete", "partial"}:
return False
body = str(validation_result.get("result", "")).strip()
# If validator returned JSON in result, try to parse decision.
try:
obj = json.loads(body)
decision = str(obj.get("decision", "")).lower()
if decision in {"accept", "approved", "pass", "passed"}:
return True
if decision in {"reject", "fail", "failed"}:
return False
except Exception:
pass
    lowered = body.lower()
    # Coarse plain-text fallback: substring matching ("failure modes" would trip
    # it), so validators should prefer the JSON decision format parsed above.
    if "reject" in lowered or "fail" in lowered:
        return False
return True
def execute_step(
step: dict[str, Any],
inputs: dict[str, Any],
step_outputs: dict[str, Any],
caller: str,
workflow_run_id: str,
remaining_timeout: int,
non_interactive: bool,
out_dir: Path,
lock: threading.Lock,
) -> dict[str, Any]:
sid = step["id"]
start = time.time()
if not approval_check(step, non_interactive):
result = {
"step_id": sid,
"status": "failed",
"error": "approval_denied",
"started_at": now_iso(),
"finished_at": now_iso(),
"duration_s": 0,
}
(out_dir / f"{sid}.json").write_text(json.dumps(result, indent=2))
return result
task = substitute(str(step.get("task", "")), step_outputs, inputs)
agent = step.get("agent")
if not agent:
result = {
"step_id": sid,
"status": "failed",
"error": "missing_agent",
"started_at": now_iso(),
"finished_at": now_iso(),
"duration_s": 0,
}
(out_dir / f"{sid}.json").write_text(json.dumps(result, indent=2))
return result
step_timeout = int(step.get("timeout", 300))
timeout_s = max(1, min(step_timeout, remaining_timeout))
retries = int(step.get("retries", 1))
run_res = run_orchestrate(agent, task, timeout_s, caller, workflow_run_id, sid, retries)
step_result: dict[str, Any] = {
"step_id": sid,
"agent": agent,
"status": run_res.get("status", "failed"),
"result": run_res.get("result"),
"notes": run_res.get("notes"),
"run": run_res,
"started_at": datetime.fromtimestamp(start, tz=timezone.utc).isoformat(),
"finished_at": now_iso(),
"duration_s": round(time.time() - start, 3),
}
validation_cfg = step.get("validation")
if validation_cfg and step_result["status"] in {"complete", "partial"}:
v_agent = validation_cfg.get("agent")
criteria = validation_cfg.get("criteria", "Validate this output for quality and correctness.")
if v_agent:
v_task = (
"Validate the following workflow step output. Return a decision in JSON like "
"{\"decision\":\"accept|reject\",\"reason\":\"...\"}.\n\n"
f"Step ID: {sid}\n"
f"Criteria: {criteria}\n\n"
f"Output to validate:\n{step_result.get('result')}"
)
v_timeout = int(validation_cfg.get("timeout", min(180, timeout_s)))
validation_res = run_orchestrate(v_agent, v_task, max(1, v_timeout), caller, workflow_run_id, f"{sid}__validation", 1)
step_result["validation"] = validation_res
if not validation_passed(validation_res):
step_result["status"] = "failed"
step_result["error"] = "validation_failed"
step_result["notes"] = f"Validation failed by {v_agent}: {validation_res.get('result') or validation_res.get('notes')}"
with lock:
(out_dir / f"{sid}.json").write_text(json.dumps(step_result, indent=2))
return step_result
def main() -> None:
parser = argparse.ArgumentParser(description="Run YAML workflows using orchestrate.py")
parser.add_argument("workflow")
parser.add_argument("--input", action="append", default=[], help="key=value (repeatable)")
parser.add_argument("--caller", default="manager")
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--non-interactive", action="store_true")
parser.add_argument("--timeout", type=int, default=1800, help="Overall workflow timeout seconds")
args = parser.parse_args()
wf_path = resolve_workflow_path(args.workflow)
wf = load_workflow(wf_path)
inputs = parse_inputs(args.input)
steps = wf["steps"]
step_map, deps, reverse, layers = validate_graph(steps)
workflow_run_id = f"wf-{int(time.time())}-{uuid.uuid4().hex[:8]}"
out_dir = HANDOFF_WORKFLOWS_DIR / workflow_run_id
out_dir.mkdir(parents=True, exist_ok=True)
if args.dry_run:
plan = {
"status": "dry_run",
"workflow": wf.get("name", wf_path.name),
"workflow_file": str(wf_path),
"workflow_run_id": workflow_run_id,
"inputs": inputs,
"steps": [
{
"id": s["id"],
"agent": s.get("agent"),
"depends_on": s.get("depends_on", []),
"timeout": s.get("timeout", 300),
"retries": s.get("retries", 1),
"approval_gate": s.get("approval_gate"),
"has_validation": bool(s.get("validation")),
}
for s in steps
],
"execution_layers": layers,
"result_dir": str(out_dir),
}
print(json.dumps(plan, indent=2))
return
started = time.time()
deadline = started + args.timeout
lock = threading.Lock()
state: dict[str, str] = {sid: "pending" for sid in step_map}
step_results: dict[str, dict[str, Any]] = {}
step_outputs: dict[str, Any] = {}
overall_status = "complete"
max_workers = max(1, min(len(step_map), (os.cpu_count() or 4)))
while True:
if time.time() >= deadline:
overall_status = "timeout"
break
pending = [sid for sid, st in state.items() if st == "pending"]
if not pending:
break
ready = []
for sid in pending:
if all(state[d] in {"complete", "skipped"} for d in deps[sid]):
ready.append(sid)
if not ready:
# deadlock due to upstream abort/fail on pending deps
if any(st == "aborted" for st in state.values()):
break
overall_status = "failed"
break
futures = {}
with ThreadPoolExecutor(max_workers=max_workers) as pool:
for sid in ready:
state[sid] = "running"
remaining_timeout = int(max(1, deadline - time.time()))
futures[pool.submit(
execute_step,
step_map[sid],
inputs,
step_outputs,
args.caller,
workflow_run_id,
remaining_timeout,
args.non_interactive,
out_dir,
lock,
)] = sid
for fut in as_completed(futures):
sid = futures[fut]
res = fut.result()
step_results[sid] = res
st = res.get("status", "failed")
if st in {"complete", "partial"}:
state[sid] = "complete"
step_outputs[sid] = res.get("result")
out_name = step_map[sid].get("output")
if out_name:
step_outputs[str(out_name)] = res.get("result")
else:
on_fail = str(step_map[sid].get("on_fail", "abort")).lower()
if on_fail == "skip":
state[sid] = "skipped"
overall_status = "partial"
else:
state[sid] = "failed"
overall_status = "failed"
# abort all pending steps
for psid in list(state):
if state[psid] == "pending":
state[psid] = "aborted"
finished = time.time()
if overall_status == "complete" and any(st == "skipped" for st in state.values()):
overall_status = "partial"
summary = {
"status": overall_status,
"workflow": wf.get("name", wf_path.name),
"workflow_file": str(wf_path),
"workflow_run_id": workflow_run_id,
"caller": args.caller,
"started_at": datetime.fromtimestamp(started, tz=timezone.utc).isoformat(),
"finished_at": datetime.fromtimestamp(finished, tz=timezone.utc).isoformat(),
"duration_s": round(finished - started, 3),
"timeout_s": args.timeout,
"inputs": inputs,
"state": state,
"results": step_results,
"result_dir": str(out_dir),
"notifications": wf.get("notifications", {}),
}
(out_dir / "summary.json").write_text(json.dumps(summary, indent=2))
print(json.dumps(summary, indent=2))
if overall_status in {"complete", "partial"}:
sys.exit(0)
sys.exit(1)
if __name__ == "__main__":
main()
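`validate_graph()` layers the step DAG with Kahn's algorithm: repeatedly emit every zero-in-degree step as one parallel layer, then decrement its children's in-degrees; anything left unvisited is on a cycle. A self-contained sketch of the same layering on a fan-out/fan-in graph:

```python
def layer_steps(deps):
    """Kahn-style layering: deps maps step id -> set of prerequisite ids."""
    reverse = {sid: set() for sid in deps}
    for sid, dset in deps.items():
        for dep in dset:
            reverse[dep].add(sid)
    indeg = {sid: len(dset) for sid, dset in deps.items()}
    ready = sorted(sid for sid, d in indeg.items() if d == 0)
    layers, visited = [], 0
    while ready:
        layers.append(list(ready))
        visited += len(ready)
        nxt = []
        for sid in ready:
            for child in sorted(reverse[sid]):
                indeg[child] -= 1
                if indeg[child] == 0:
                    nxt.append(child)
        ready = sorted(nxt)
    if visited != len(deps):
        raise ValueError("cycle detected")
    return layers

# research fans out to two parallel reviews, which join at audit
deps = {
    "research": set(),
    "tech_review": {"research"},
    "opt_review": {"research"},
    "audit": {"tech_review", "opt_review"},
}
print(layer_steps(deps))
# → [['research'], ['opt_review', 'tech_review'], ['audit']]
```

Steps in the same layer have no path between them, which is what lets the engine hand a whole layer to the thread pool at once.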


@@ -0,0 +1,2 @@
#!/usr/bin/env bash
exec python3 "$(dirname "$0")/workflow.py" "$@"
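Step tasks are templated with single-brace placeholders: `{inputs.key}` pulls a workflow input, `{step_id}` (or an `output:` alias) pulls an upstream step's result, and unknown keys pass through untouched rather than raising. A sketch of `substitute()`:

```python
import json
import re

_VAR_RE = re.compile(r"\{([^{}]+)\}")

def substitute(text, step_outputs, inputs):
    """Single-pass template expansion; unknown placeholders pass through."""
    def repl(m):
        key = m.group(1).strip()
        if key.startswith("inputs."):
            name = key.split(".", 1)[1]
            return str(inputs[name]) if name in inputs else m.group(0)
        if key in step_outputs:
            val = step_outputs[key]
            # Structured outputs are embedded as JSON text
            return json.dumps(val) if isinstance(val, (dict, list)) else str(val)
        return m.group(0)
    return _VAR_RE.sub(repl, text)

out = substitute(
    "Evaluate {inputs.materials} using {research}; {unknown} stays",
    {"research": "CTE table"},
    {"materials": "Invar, Zerodur"},
)
print(out)  # → Evaluate Invar, Zerodur using CTE table; {unknown} stays
```

Leaving unknown placeholders intact means a literal `{...}` in a task prompt survives, at the cost of silent typos in variable names.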


@@ -0,0 +1,57 @@
name: Design Review
description: Multi-agent design review pipeline
trigger: manual
inputs:
design_description:
type: text
description: "What is being reviewed"
requirements:
type: text
description: "Requirements to review against"
steps:
- id: technical_review
agent: tech-lead
task: |
Perform a technical review of the following design:
DESIGN: {inputs.design_description}
REQUIREMENTS: {inputs.requirements}
Assess: structural adequacy, thermal performance, manufacturability,
and compliance with requirements. Identify risks and gaps.
timeout: 300
- id: optimization_review
agent: optimizer
task: |
Assess optimization potential for the following design:
DESIGN: {inputs.design_description}
REQUIREMENTS: {inputs.requirements}
Identify: parameters that could be optimized, potential weight/cost savings,
and whether a formal optimization study is warranted.
timeout: 300
  # technical_review and optimization_review run in PARALLEL (no depends_on between them)
- id: audit
agent: auditor
task: |
Perform a final quality review combining both the technical and optimization assessments:
TECHNICAL REVIEW:
{technical_review}
OPTIMIZATION REVIEW:
{optimization_review}
Assess completeness, identify conflicts between reviewers, and provide
a consolidated recommendation.
depends_on: [technical_review, optimization_review]
timeout: 180
notifications:
on_complete: "Design review complete"
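Beyond `agent`/`task`/`depends_on`/`timeout`, workflow.py reads several optional per-step fields (`retries`, `output`, `on_fail`, `approval_gate`, `validation`). A fully annotated step, with field names taken from the engine above and values purely illustrative:

```yaml
- id: evaluate
  agent: tech-lead
  task: |
    Evaluate {research} against {inputs.requirements}
  depends_on: [research]
  timeout: 300            # per-step cap, clamped to the remaining workflow budget
  retries: 2              # passed through to orchestrate.py --retries
  output: assessment      # alias: later steps may reference {assessment}
  on_fail: skip           # "abort" (default) cancels pending steps; "skip" continues
  approval_gate: human    # prompts yes/no unless run --non-interactive
  validation:
    agent: auditor        # second agent accepts/rejects this step's result
    criteria: "Check sources and units"
    timeout: 120
```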


@@ -0,0 +1,58 @@
name: Material Trade Study
description: Research, evaluate, and audit material options for optical components
trigger: manual
inputs:
materials:
type: list
description: "Materials to compare"
requirements:
type: text
description: "Performance requirements and constraints"
steps:
- id: research
agent: webster
task: |
Research the following materials: {inputs.materials}
For each material, find: CTE (with temperature range), density, Young's modulus,
cost per kg, lead time, availability, and any known issues for optical applications.
Provide sources for all data.
timeout: 180
retries: 2
output: material_data
- id: evaluate
agent: tech-lead
task: |
Evaluate these materials against our requirements:
REQUIREMENTS:
{inputs.requirements}
MATERIAL DATA:
{research}
Provide a recommendation with full rationale. Include a comparison matrix.
depends_on: [research]
timeout: 300
retries: 1
output: technical_assessment
- id: audit
agent: auditor
task: |
Review this material trade study for completeness, methodological rigor,
and potential gaps:
{evaluate}
Check: Are all requirements addressed? Are sources credible?
Are there materials that should have been considered but weren't?
depends_on: [evaluate]
timeout: 180
output: audit_result
notifications:
on_complete: "Workflow complete"
on_failure: "Workflow failed"


@@ -0,0 +1,29 @@
name: Quick Research
description: Fast web research with technical validation
trigger: manual
inputs:
query:
type: text
description: "Research question"
steps:
- id: research
agent: webster
task: "{inputs.query}"
timeout: 120
retries: 1
- id: validate
agent: tech-lead
task: |
Verify these research findings are accurate and relevant for engineering use:
{research}
Flag any concerns about accuracy, missing context, or applicability.
depends_on: [research]
timeout: 180
notifications:
on_complete: "Research complete"
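Workflow inputs such as `query` above arrive on the command line as repeatable `--input key=value` flags; `parse_inputs()` splits on the first `=` only, so values may themselves contain `=`. A sketch mirroring that parser:

```python
def parse_inputs(items):
    """Mirror of workflow.py's parse_inputs: split each item on the first '=' only."""
    parsed = {}
    for item in items:
        if "=" not in item:
            raise ValueError(f"Invalid --input '{item}', expected key=value")
        k, v = item.split("=", 1)
        parsed[k.strip()] = v.strip()
    return parsed

print(parse_inputs(["query=CTE of Invar", "eq=a=b"]))
# → {'query': 'CTE of Invar', 'eq': 'a=b'}
```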