Atomizer/docs/archive/review/CANVAS_ROBUSTNESS_PLAN.md
Antoine 8d9d55356c docs: Archive stale docs and create Atomizer-HQ agent documentation
Archive Management:
- Moved RALPH_LOOP, CANVAS, and dashboard implementation plans to archive/review/ for CEO review
- Moved completed restructuring plan and protocol v1 to archive/historical/
- Moved old session summaries to archive/review/

New HQ Documentation (docs/hq/):
- README.md: Overview of Atomizer-HQ multi-agent optimization team
- PROJECT_STRUCTURE.md: Standard KB-integrated project layout with Hydrotech reference
- KB_CONVENTIONS.md: Knowledge Base accumulation principles with generation tracking
- AGENT_WORKFLOWS.md: Project lifecycle phases and agent handoffs (OP_09 integration)
- STUDY_CONVENTIONS.md: Technical study execution standards and atomizer_spec.json format

Index Update:
- Reorganized docs/00_INDEX.md with HQ docs prominent
- Updated structure to reflect new agent-focused organization
- Maintained core documentation access for engineers

No files deleted, only moved to appropriate archive locations.
2026-02-09 02:48:35 +00:00


Canvas Builder Robustness & Enhancement Plan

Created: January 21, 2026
Branch: feature/studio-enhancement
Status: Planning


Executive Summary

This plan addresses critical issues and enhancements to make the Canvas Builder robust and production-ready:

  1. Panel Management - Panels (Introspection, Config, Chat) disappear unexpectedly
  2. Pre-run Validation - No validation before starting optimization
  3. Error Handling - Poor feedback when things go wrong
  4. Live Updates - Polling is inefficient; updates should be pushed over a WebSocket
  5. Visualization - No convergence charts or progress indicators
  6. Testing - No automated tests for critical flows

Phase 1: Panel Management System (HIGH PRIORITY)

Problem

  • IntrospectionPanel disappears when the user clicks elsewhere on the canvas
  • Panel state is lost (e.g., introspection results, expanded sections)
  • No way to have multiple panels open simultaneously
  • Chat panel and Config panel are mutually exclusive

Root Cause

```typescript
// Current: Local state in ModelNodeConfig (NodeConfigPanelV2.tsx:275)
const [showIntrospection, setShowIntrospection] = useState(false);

// When selectedNodeId changes, ModelNodeConfig unmounts, losing state
```

Solution: Centralized Panel Store

Create usePanelStore.ts, a Zustand store that owns all panel state so that it survives component unmounts:

```typescript
// atomizer-dashboard/frontend/src/hooks/usePanelStore.ts
// IntrospectionResult and ValidationError are defined elsewhere (ValidationError: see Phase 2).

type PanelName = 'introspection' | 'config' | 'chat' | 'validation' | 'results';

interface PanelState {
  // Panel visibility
  panels: {
    introspection: { open: boolean; filePath?: string; data?: IntrospectionResult };
    config: { open: boolean; nodeId?: string };
    chat: { open: boolean; powerMode: boolean };
    validation: { open: boolean; errors?: ValidationError[] };
    results: { open: boolean; trialId?: number };
  };

  // Actions
  openPanel: (panel: PanelName, data?: any) => void;
  closePanel: (panel: PanelName) => void;
  togglePanel: (panel: PanelName) => void;

  // Panel data persistence
  setIntrospectionData: (data: IntrospectionResult) => void;
  clearIntrospectionData: () => void;
}
```
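
The following is a minimal, framework-free sketch of the transitions behind `openPanel`, `closePanel` and `togglePanel`. It assumes a trimmed copy of the panel shape above. Closing a panel only flips its `open` flag, so cached data such as introspection results survives. Panels are also independent of each other, so config and chat can be open at the same time. Zustand's `create((set) => ...)` could wrap these pure functions. `IntrospectionResult` here is a hypothetical placeholder, not the real type.

```typescript
// Hypothetical placeholder for the real introspection result type.
type IntrospectionResult = { expressions: string[] };

// Trimmed copy of the planned panel shape.
interface Panels {
  introspection: { open: boolean; filePath?: string; data?: IntrospectionResult };
  config: { open: boolean; nodeId?: string };
  chat: { open: boolean; powerMode: boolean };
  validation: { open: boolean };
  results: { open: boolean; trialId?: number };
}

type PanelName = keyof Panels;

// Flip only the `open` flag; every other field (cached data) is kept as-is.
function setOpen(panels: Panels, name: PanelName, open: boolean): Panels {
  return { ...panels, [name]: { ...panels[name], open } };
}

const openPanel = (p: Panels, name: PanelName): Panels => setOpen(p, name, true);
const closePanel = (p: Panels, name: PanelName): Panels => setOpen(p, name, false);
const togglePanel = (p: Panels, name: PanelName): Panels => setOpen(p, name, !p[name].open);

// Store introspection results independently of visibility.
function setIntrospectionData(p: Panels, filePath: string, data: IntrospectionResult): Panels {
  return { ...p, introspection: { ...p.introspection, filePath, data } };
}

const initialPanels: Panels = {
  introspection: { open: false },
  config: { open: false },
  chat: { open: false, powerMode: false },
  validation: { open: false },
  results: { open: false },
};

// Usage: open, fill, close; the data stays cached for the next open.
let state = openPanel(initialPanels, 'introspection');
state = setIntrospectionData(state, 'bracket.sim', { expressions: ['thickness'] }); // hypothetical file
state = closePanel(state, 'introspection');
```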

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 1.1 | usePanelStore.ts | Create Zustand store for panel state |
| 1.2 | PanelContainer.tsx | Create container that renders open panels |
| 1.3 | IntrospectionPanel.tsx | Refactor to use store instead of local state |
| 1.4 | NodeConfigPanelV2.tsx | Remove local panel state, use store |
| 1.5 | CanvasView.tsx | Integrate PanelContainer, remove chat panel logic |
| 1.6 | SpecRenderer.tsx | Add panel trigger buttons (introspect, validate) |

UI Changes

Before:

```
[Canvas] [Config Panel OR Chat Panel]
         ↑ mutually exclusive
```

After:

```
[Canvas] [Right Panel Area]
         ├── Config Panel (pinnable)
         ├── Chat Panel (collapsible)
         └── Floating Panels:
             ├── Introspection (draggable, persistent)
             ├── Validation Results
             └── Trial Details
```

Panel Behaviors

| Panel | Trigger | Persistence | Position |
|-------|---------|-------------|----------|
| Config | Node click | While node selected | Right sidebar |
| Chat | Toggle button | Always available | Right sidebar (below config) |
| Introspection | "Introspect" button | Until explicitly closed | Floating, draggable |
| Validation | "Validate" or pre-run | Until fixed or dismissed | Floating |
| Results | Click on result badge | Until dismissed | Floating |
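
Read literally, the Persistence column means that a change in node selection should affect only the config panel. The sketch below shows that rule with a hypothetical `onSelectionChange` helper and a reduced panel shape that tracks visibility only.

```typescript
interface PanelVisibility {
  config: { open: boolean; nodeId?: string };
  chat: { open: boolean };
  introspection: { open: boolean };
  validation: { open: boolean };
  results: { open: boolean };
}

// Config follows the selection ("while node selected"); chat and the floating
// panels keep their state until explicitly closed or dismissed.
function onSelectionChange(panels: PanelVisibility, selectedNodeId: string | null): PanelVisibility {
  const config = selectedNodeId === null
    ? { open: false }
    : { open: true, nodeId: selectedNodeId };
  return { ...panels, config };
}
```

This is the behavior that the current local `useState` cannot provide: clicking another node no longer unmounts, and therefore no longer closes, the introspection panel.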

Phase 2: Pre-run Validation (HIGH PRIORITY)

Problem

  • User can click "Run" with an incomplete spec
  • No feedback about missing extractors, objectives, or connections
  • Optimization fails silently or with cryptic errors

Solution: Validation Pipeline

```typescript
// Types of validation
interface ValidationResult {
  valid: boolean;                 // false if errors is non-empty
  errors: ValidationError[];      // Must fix before running
  warnings: ValidationWarning[];  // Can proceed but risky
}

interface ValidationError {
  code: string;
  severity: 'error' | 'warning';
  path: string;       // e.g., "objectives[0]"
  message: string;
  suggestion?: string;
  autoFix?: () => void;
}

// Warnings reuse the same shape, narrowed by severity
type ValidationWarning = ValidationError & { severity: 'warning' };
```

Validation Rules

| Rule | Severity | Message |
|------|----------|---------|
| No design variables | Error | "Add at least one design variable" |
| No objectives | Error | "Add at least one objective" |
| Objective not connected to extractor | Error | "Objective '{name}' has no source extractor" |
| Extractor type not set | Error | "Extractor '{name}' needs a type selected" |
| Design var bounds invalid | Error | "Min must be less than max for '{name}'" |
| No model file | Error | "No simulation file configured" |
| Custom extractor no code | Warning | "Custom extractor '{name}' has no code" |
| High trial count (>500) | Warning | "Large budget may take hours to complete" |
| Single trial | Warning | "Only 1 trial - results won't be meaningful" |
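
Here is a sketch of how specValidator.ts could encode the table above. The `Spec` shape, the issue `code` strings and the `'custom'` extractor type are hypothetical simplifications, because the real spec format is not shown in this plan. The messages and severities follow the table.

```typescript
// Hypothetical, simplified spec shape (not the real atomizer_spec.json schema).
interface DesignVariable { name: string; min: number; max: number }
interface Extractor { id: string; name: string; type?: string; code?: string }
interface Objective { name: string; extractorId?: string }
interface Spec {
  modelFile?: string;
  designVariables: DesignVariable[];
  extractors: Extractor[];
  objectives: Objective[];
  nTrials: number;
}

interface Issue { code: string; severity: 'error' | 'warning'; path: string; message: string }

function validateSpec(spec: Spec): { valid: boolean; errors: Issue[]; warnings: Issue[] } {
  const issues: Issue[] = [];
  const add = (severity: Issue['severity'], code: string, path: string, message: string): void => {
    issues.push({ code, severity, path, message });
  };

  if (spec.designVariables.length === 0) add('error', 'NO_DESIGN_VARS', 'designVariables', 'Add at least one design variable');
  if (spec.objectives.length === 0) add('error', 'NO_OBJECTIVES', 'objectives', 'Add at least one objective');
  if (!spec.modelFile) add('error', 'NO_MODEL', 'modelFile', 'No simulation file configured');

  // An objective is connected only if it points at an extractor that exists.
  const extractorIds = new Set(spec.extractors.map((e) => e.id));
  spec.objectives.forEach((o, i) => {
    if (!o.extractorId || !extractorIds.has(o.extractorId))
      add('error', 'OBJ_UNCONNECTED', `objectives[${i}]`, `Objective '${o.name}' has no source extractor`);
  });

  spec.extractors.forEach((e, i) => {
    if (!e.type) add('error', 'EXTRACTOR_NO_TYPE', `extractors[${i}]`, `Extractor '${e.name}' needs a type selected`);
    else if (e.type === 'custom' && !e.code?.trim())
      add('warning', 'CUSTOM_NO_CODE', `extractors[${i}]`, `Custom extractor '${e.name}' has no code`);
  });

  // `!(min < max)` also rejects NaN bounds.
  spec.designVariables.forEach((dv, i) => {
    if (!(dv.min < dv.max)) add('error', 'BAD_BOUNDS', `designVariables[${i}]`, `Min must be less than max for '${dv.name}'`);
  });

  if (spec.nTrials > 500) add('warning', 'LARGE_BUDGET', 'nTrials', 'Large budget may take hours to complete');
  if (spec.nTrials === 1) add('warning', 'SINGLE_TRIAL', 'nTrials', "Only 1 trial - results won't be meaningful");

  const errors = issues.filter((x) => x.severity === 'error');
  const warnings = issues.filter((x) => x.severity === 'warning');
  return { valid: errors.length === 0, errors, warnings };
}
```

Returning both lists lets the ValidationPanel block the run on errors while still listing warnings.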

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 2.1 | validation/specValidator.ts | Client-side validation rules |
| 2.2 | ValidationPanel.tsx | Display validation results |
| 2.3 | SpecRenderer.tsx | Add "Validate" button, pre-run check |
| 2.4 | api/routes/spec.py | Server-side validation endpoint |
| 2.5 | useSpecStore.ts | Add validate() action |

UI Flow

```
User clicks "Run Optimization"
    ↓
[Validate Spec] ──failed──→ [Show ValidationPanel]
    ↓ passed                      │
[Confirm Dialog]                  │
    ↓ confirmed                   │
[Start Optimization] ←── fix ─────┘
```
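
The flow above can be expressed as a small async gate. The sketch assumes hypothetical injected callbacks (`validate`, `showValidationPanel`, `confirm`, `startOptimization`) that SpecRenderer.tsx would supply. The function returns which branch was taken so that the UI can react.

```typescript
interface GateResult { valid: boolean; errors: { message: string }[] }

// Hypothetical side effects supplied by the caller.
interface RunDeps {
  validate: () => Promise<GateResult>;
  showValidationPanel: (errors: { message: string }[]) => void;
  confirm: () => Promise<boolean>;
  startOptimization: () => Promise<void>;
}

type RunOutcome = 'blocked' | 'cancelled' | 'started';

async function runWithValidation(deps: RunDeps): Promise<RunOutcome> {
  const result = await deps.validate();
  if (!result.valid) {
    // The user fixes the spec in the panel, then clicks Run again.
    deps.showValidationPanel(result.errors);
    return 'blocked';
  }
  if (!(await deps.confirm())) return 'cancelled';
  await deps.startOptimization();
  return 'started';
}
```

Injecting the side effects keeps the gate testable without rendering the canvas.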

Phase 3: Error Handling & Recovery (HIGH PRIORITY)

Problem

  • NX crashes don't show useful feedback
  • Solver failures leave the user without a clear explanation
  • No way to resume after errors

Solution: Error Classification & Display

```typescript
interface OptimizationError {
  type: 'nx_crash' | 'solver_fail' | 'extractor_error' | 'config_error' | 'system_error';
  trial?: number;
  message: string;
  details?: string;
  recoverable: boolean;
  suggestions: string[];
}
```

Error Handling Strategy

| Error Type | Display | Recovery |
|------------|---------|----------|
| NX Crash | Toast + Error Panel | Retry trial, skip trial |
| Solver Failure | Badge on trial | Mark infeasible, continue |
| Extractor Error | Log + badge | Use NaN, continue |
| Config Error | Block run | Show validation panel |
| System Error | Full modal | Restart optimization |
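
One way to keep the strategy table and the code in sync is a typed lookup keyed by the `OptimizationError` type union. Treat this as a sketch. The `Display` and `RecoveryAction` string values are hypothetical labels for the table cells. `autoContinue` is an interpretation of the "..., continue" entries.

```typescript
type OptimizationErrorType =
  | 'nx_crash' | 'solver_fail' | 'extractor_error' | 'config_error' | 'system_error';

type Display = 'toast_and_error_panel' | 'trial_badge' | 'log_and_badge' | 'block_run' | 'full_modal';
type RecoveryAction =
  | 'retry_trial' | 'skip_trial' | 'mark_infeasible' | 'use_nan'
  | 'show_validation_panel' | 'restart_optimization';

interface ErrorPolicy {
  display: Display;
  recovery: RecoveryAction[];
  autoContinue: boolean; // true where the table says the run simply continues
}

// Record<> makes the map exhaustive: adding an error type without a policy fails to compile.
const ERROR_POLICY: Record<OptimizationErrorType, ErrorPolicy> = {
  nx_crash:        { display: 'toast_and_error_panel', recovery: ['retry_trial', 'skip_trial'], autoContinue: false },
  solver_fail:     { display: 'trial_badge', recovery: ['mark_infeasible'], autoContinue: true },
  extractor_error: { display: 'log_and_badge', recovery: ['use_nan'], autoContinue: true },
  config_error:    { display: 'block_run', recovery: ['show_validation_panel'], autoContinue: false },
  system_error:    { display: 'full_modal', recovery: ['restart_optimization'], autoContinue: false },
};
```

The backend would still decide the `type` (task 3.3). The frontend then only looks up how to present the error.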

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 3.1 | ErrorBoundary.tsx | Wrap canvas in error boundary |
| 3.2 | ErrorPanel.tsx | Detailed error display with suggestions |
| 3.3 | optimization.py | Enhanced error responses with type/recovery |
| 3.4 | SpecRenderer.tsx | Error state handling, retry buttons |
| 3.5 | useOptimizationStatus.ts | Hook for status polling with error handling |

Phase 4: Live Updates via WebSocket (MEDIUM PRIORITY)

Problem

  • Polling every 3 s is inefficient and delays updates by up to 3 s
  • Short-lived changes between polls (e.g., a trial starting) can be missed
  • No real-time progress indication

Solution: WebSocket for Trial Updates

```typescript
// WebSocket events
interface TrialStartEvent {
  type: 'trial_start';
  trial_number: number;
  params: Record<string, number>;
}

interface TrialCompleteEvent {
  type: 'trial_complete';
  trial_number: number;
  objectives: Record<string, number>;
  is_best: boolean;
  is_feasible: boolean;
}

interface OptimizationCompleteEvent {
  type: 'optimization_complete';
  best_trial: number;
  total_trials: number;
}
```
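
On the client, the three event interfaces combine naturally into a discriminated union on `type`. The sketch below parses incoming WebSocket messages and folds them into UI state. The `LiveState` shape and the `parseEvent` helper are hypothetical, and the parser checks only the `type` field.

```typescript
interface TrialStartEvent { type: 'trial_start'; trial_number: number; params: Record<string, number> }
interface TrialCompleteEvent {
  type: 'trial_complete'; trial_number: number;
  objectives: Record<string, number>; is_best: boolean; is_feasible: boolean;
}
interface OptimizationCompleteEvent { type: 'optimization_complete'; best_trial: number; total_trials: number }

type OptimizationEvent = TrialStartEvent | TrialCompleteEvent | OptimizationCompleteEvent;

// Hypothetical view model driven by the socket.
interface LiveState {
  running: boolean;
  currentTrial: number | null;
  completed: number;
  bestTrial: number | null;
  bestObjectives: Record<string, number> | null;
}

const KNOWN_TYPES = new Set(['trial_start', 'trial_complete', 'optimization_complete']);

// Shallow check on `type` only; malformed or unknown messages are dropped.
function parseEvent(raw: string): OptimizationEvent | null {
  try {
    const msg = JSON.parse(raw);
    return msg && typeof msg === 'object' && KNOWN_TYPES.has(msg.type) ? (msg as OptimizationEvent) : null;
  } catch {
    return null;
  }
}

function applyEvent(state: LiveState, event: OptimizationEvent): LiveState {
  switch (event.type) {
    case 'trial_start':
      return { ...state, running: true, currentTrial: event.trial_number };
    case 'trial_complete':
      return {
        ...state,
        currentTrial: null,
        completed: state.completed + 1,
        ...(event.is_best ? { bestTrial: event.trial_number, bestObjectives: event.objectives } : {}),
      };
    case 'optimization_complete':
      return { ...state, running: false, currentTrial: null, bestTrial: event.best_trial };
    default:
      return state;
  }
}
```

Because `applyEvent` is pure, the useOptimizationWebSocket hook (task 4.3) would only need to call it from the socket's `message` handler.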

Implementation Tasks

Task File Description
4.1 websocket.py Add optimization events to WS
4.2 run_optimization.py Emit events during optimization
4.3 useOptimizationWebSocket.ts Hook for WS subscription
4.4 SpecRenderer.tsx Use WS instead of polling
4.5 ResultBadge.tsx Animate on new results

Phase 5: Convergence Visualization (MEDIUM PRIORITY)

Problem

  • No visual feedback on optimization progress
  • No way to tell whether the optimization is converging or stuck
  • No Pareto front visualization for multi-objective studies

Solution: Embedded Charts

Components

| Component | Description |
|-----------|-------------|
| ConvergenceSparkline | Tiny chart in ObjectiveNode showing trend |
| ProgressRing | Circular progress in header (trials/total) |
| ConvergenceChart | Full chart in Results panel |
| ParetoPlot | 2D Pareto front for multi-objective |
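
A convergence sparkline commonly plots the best value found so far rather than raw trial values, so a stall shows up as a flat line. The sketch assumes a single minimized objective and skips failed trials (NaN). The helper names are hypothetical.

```typescript
// Running best (minimum) per trial; non-finite values (failed trials) are skipped,
// and trials before the first finite value produce no point.
function bestSoFar(values: number[]): number[] {
  const out: number[] = [];
  let best = Infinity;
  for (const v of values) {
    if (Number.isFinite(v) && v < best) best = v;
    if (Number.isFinite(best)) out.push(best);
  }
  return out;
}

// Map a series to an SVG polyline `points` string inside a width x height box.
function sparklinePoints(series: number[], width: number, height: number, pad = 1): string {
  if (series.length === 0) return '';
  const min = Math.min(...series);
  const max = Math.max(...series);
  const span = max - min || 1; // flat series: avoid division by zero
  const step = series.length > 1 ? (width - 2 * pad) / (series.length - 1) : 0;
  return series
    .map((v, i) => {
      const x = pad + i * step;
      const y = pad + (height - 2 * pad) * (1 - (v - min) / span); // SVG y grows downward
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(' ');
}
```

In ConvergenceSparkline.tsx, the returned string would feed an SVG `<polyline points={...} fill="none" />`.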

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 5.1 | ConvergenceSparkline.tsx | SVG sparkline component |
| 5.2 | ObjectiveNode.tsx | Integrate sparkline |
| 5.3 | ProgressRing.tsx | Circular progress indicator |
| 5.4 | ConvergenceChart.tsx | Full chart with Recharts |
| 5.5 | ResultsPanel.tsx | Panel showing detailed results |
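
The ParetoPlot component listed in the Components table needs the non-dominated subset of trials. The sketch below handles two objectives. It assumes both are minimized and that each trial has been reduced to a hypothetical `{ trial, f1, f2 }` record.

```typescript
interface ObjectivePoint { trial: number; f1: number; f2: number }

// a dominates b if it is no worse in both objectives and strictly better in at least one.
function dominates(a: ObjectivePoint, b: ObjectivePoint): boolean {
  return a.f1 <= b.f1 && a.f2 <= b.f2 && (a.f1 < b.f1 || a.f2 < b.f2);
}

// Drop failed trials (non-finite values), keep the non-dominated ones, sort along f1 for plotting.
function paretoFront(points: ObjectivePoint[]): ObjectivePoint[] {
  const valid = points.filter((p) => Number.isFinite(p.f1) && Number.isFinite(p.f2));
  return valid
    .filter((p) => !valid.some((q) => dominates(q, p)))
    .sort((a, b) => a.f1 - b.f1);
}
```

The quadratic scan is simple and adequate for budgets in the hundreds of trials. Larger studies would call for a sort-and-sweep method.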

Phase 6: End-to-End Testing (MEDIUM PRIORITY)

Problem

  • No automated tests for canvas operations
  • Manual testing is time-consuming and error-prone
  • Regressions go unnoticed

Solution: Playwright E2E Tests

Test Scenarios

| Test | Steps | Assertions |
|------|-------|------------|
| Load study | Navigate to /canvas/{id} | Spec loads, nodes render |
| Add design var | Drag from palette | Node appears, spec updates |
| Connect nodes | Drag edge | Edge renders, spec has edge |
| Edit node | Click node, change value | Value persists, API called |
| Run validation | Click validate | Errors shown for incomplete spec |
| Start optimization | Complete spec, click run | Status shows running |
| View results | Wait for trial | Badge shows value |
| Stop optimization | Click stop | Status shows stopped |

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 6.1 | e2e/canvas.spec.ts | Basic canvas operations |
| 6.2 | e2e/optimization.spec.ts | Run/stop/status flow |
| 6.3 | e2e/panels.spec.ts | Panel open/close/persist |
| 6.4 | playwright.config.ts | Configure Playwright |
| 6.5 | CI workflow | Run tests in GitHub Actions |
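
A possible starting point for playwright.config.ts (task 6.4) follows. The options are standard Playwright configuration. Several details are assumptions about this repo rather than known facts:

- the frontend runs a Vite dev server on its default port 5173, started with `npm run dev`;
- the config file sits next to the `e2e/` directory.

```typescript
// playwright.config.ts (sketch): port, dev command, and testDir location are assumptions.
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,       // fail CI if a test.only slipped in
  retries: process.env.CI ? 2 : 0,
  use: {
    baseURL: 'http://localhost:5173', // assumed Vite default port
    trace: 'on-first-retry',
  },
  projects: [{ name: 'chromium', use: { ...devices['Desktop Chrome'] } }],
  webServer: {
    command: 'npm run dev',           // assumed frontend dev script
    url: 'http://localhost:5173',
    reuseExistingServer: !process.env.CI,
  },
});
```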

Implementation Order

```
Week 1:
├── Phase 1: Panel Management (critical UX fix)
│   ├── Day 1-2: usePanelStore + PanelContainer
│   └── Day 3-4: Refactor existing panels
│
└── Phase 2: Validation (prevent user errors)
    └── Day 5: Validation rules + UI

Week 2:
├── Phase 3: Error Handling
│   ├── Day 1-2: Error types + ErrorPanel
│   └── Day 3: Integration with optimization flow
│
└── Phase 4: WebSocket Updates
    └── Day 4-5: WS events + frontend hook

Week 3:
├── Phase 5: Visualization
│   ├── Day 1-2: Sparklines
│   └── Day 3: Progress indicators
│
└── Phase 6: Testing
    └── Day 4-5: Playwright setup + core tests
```

Quick Wins (Can Do Now)

These can be implemented immediately with minimal changes:

  1. Persist introspection data in localStorage

    • Cache introspection results
    • Restore on panel reopen
  2. Add loading states to all buttons

    • Disable during operations
    • Show spinners
  3. Add confirmation dialogs

    • Before stopping optimization
    • Before clearing canvas
  4. Improve error messages

    • Parse NX error logs
    • Show actionable suggestions
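
Quick win 1 can be sketched as a small cache keyed by simulation file path. The key prefix and the age-based expiry are assumptions, since the plan does not say when cached introspection should be invalidated. `KeyValueStorage` is the subset of the Web Storage API that the code needs. In the browser you can pass `window.localStorage`; elsewhere, an in-memory stand-in works.

```typescript
// Subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface CacheEntry<T> { savedAt: number; data: T }

// Hypothetical key scheme: one entry per simulation file path.
const keyFor = (filePath: string): string => `atomizer:introspection:${filePath}`;

function saveIntrospection<T>(storage: KeyValueStorage, filePath: string, data: T, now = Date.now()): void {
  const entry: CacheEntry<T> = { savedAt: now, data };
  storage.setItem(keyFor(filePath), JSON.stringify(entry));
}

// Returns cached data, or null if missing, malformed, or older than maxAgeMs
// (stale and malformed entries are removed).
function loadIntrospection<T>(storage: KeyValueStorage, filePath: string, maxAgeMs: number, now = Date.now()): T | null {
  const key = keyFor(filePath);
  const raw = storage.getItem(key);
  if (raw === null) return null;
  try {
    const entry = JSON.parse(raw) as CacheEntry<T> | null;
    if (typeof entry?.savedAt !== 'number' || now - entry.savedAt > maxAgeMs) {
      storage.removeItem(key);
      return null;
    }
    return entry.data;
  } catch {
    storage.removeItem(key); // unparseable JSON
    return null;
  }
}

// In-memory stand-in for localStorage (tests, server-side rendering).
class MemoryStorage implements KeyValueStorage {
  private items = new Map<string, string>();
  getItem(key: string): string | null { return this.items.get(key) ?? null; }
  setItem(key: string, value: string): void { this.items.set(key, value); }
  removeItem(key: string): void { this.items.delete(key); }
}
```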

Files to Create/Modify

New Files

```
atomizer-dashboard/frontend/src/
├── hooks/
│   ├── usePanelStore.ts
│   └── useOptimizationWebSocket.ts
├── components/canvas/
│   ├── PanelContainer.tsx
│   ├── panels/
│   │   ├── ValidationPanel.tsx
│   │   ├── ErrorPanel.tsx
│   │   └── ResultsPanel.tsx
│   └── visualization/
│       ├── ConvergenceSparkline.tsx
│       ├── ProgressRing.tsx
│       └── ConvergenceChart.tsx
└── lib/
    └── validation/
        └── specValidator.ts
```

```
e2e/
├── canvas.spec.ts
├── optimization.spec.ts
└── panels.spec.ts
```

Modified Files

```
atomizer-dashboard/frontend/src/
├── pages/CanvasView.tsx
├── components/canvas/SpecRenderer.tsx
├── components/canvas/panels/IntrospectionPanel.tsx
├── components/canvas/panels/NodeConfigPanelV2.tsx
├── components/canvas/nodes/ObjectiveNode.tsx
└── hooks/useSpecStore.ts
```

```
atomizer-dashboard/backend/api/
├── routes/optimization.py
├── routes/spec.py
└── websocket.py
```

Success Criteria

| Phase | Success Metric |
|-------|----------------|
| 1 | Introspection panel persists across node selections |
| 2 | Invalid spec shows clear error before run |
| 3 | NX errors display with recovery options |
| 4 | Results update within 500 ms of trial completion |
| 5 | Convergence trend visible on objective nodes |
| 6 | All E2E tests pass in CI |

Next Steps

  1. Review this plan
  2. Start with Phase 1 (Panel Management), which fixes the immediate disappearing-panel issue
  3. Implement incrementally, commit after each phase