Atomizer/docs/plans/CANVAS_ROBUSTNESS_PLAN.md
Anto01 c224b16ac3 feat: Add panel management, validation, and error handling to canvas
Phase 1 - Panel Management System:
- Create usePanelStore.ts for centralized panel state management
- Add PanelContainer.tsx for draggable floating panels
- Create FloatingIntrospectionPanel.tsx (persistent, doesn't disappear on node click)
- Create ResultsPanel.tsx for trial result details
- Refactor NodeConfigPanelV2 to use panel store for introspection
- Integrate PanelContainer into CanvasView

Phase 2 - Pre-run Validation:
- Create specValidator.ts with comprehensive validation rules
- Add ValidationPanel (enhanced version with error navigation)
- Add Validate button to SpecRenderer with status indicator
- Block run if validation fails
- Check for: design vars, objectives, extractors, bounds, connections

Phase 3 - Error Handling & Recovery:
- Create ErrorPanel.tsx for displaying optimization errors
- Add error classification (nx_crash, solver_fail, extractor_error, etc.)
- Add recovery suggestions based on error type
- Update status endpoint to return error info
- Add _get_study_error_info helper to check error_status.json and DB
- Integrate error detection into status polling

Documentation:
- Add CANVAS_ROBUSTNESS_PLAN.md with full implementation plan
2026-01-21 21:35:31 -05:00


Canvas Builder Robustness & Enhancement Plan

Created: January 21, 2026
Branch: feature/studio-enhancement
Status: Planning


Executive Summary

This plan covers six areas of work needed to make the Canvas Builder robust and production-ready:

  1. Panel Management - Panels (Introspection, Config, Chat) disappear unexpectedly
  2. Pre-run Validation - No validation before starting optimization
  3. Error Handling - Poor feedback when things go wrong
  4. Live Updates - Polling is inefficient; need WebSocket
  5. Visualization - No convergence charts or progress indicators
  6. Testing - No automated tests for critical flows

Phase 1: Panel Management System (HIGH PRIORITY)

Problem

  • IntrospectionPanel disappears when user clicks elsewhere on canvas
  • Panel state is lost (e.g., introspection results, expanded sections)
  • No way to have multiple panels open simultaneously
  • Chat panel and Config panel are mutually exclusive

Root Cause

// Current: Local state in ModelNodeConfig (NodeConfigPanelV2.tsx:275)
const [showIntrospection, setShowIntrospection] = useState(false);

// When selectedNodeId changes, ModelNodeConfig unmounts, losing state

Solution: Centralized Panel Store

Create usePanelStore.ts - a Zustand store for panel management:

// atomizer-dashboard/frontend/src/hooks/usePanelStore.ts

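// Names of the managed panels, derived from the keys of PanelState.panels
type PanelName = keyof PanelState['panels'];

interface PanelState {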
  // Panel visibility
  panels: {
    introspection: { open: boolean; filePath?: string; data?: IntrospectionResult };
    config: { open: boolean; nodeId?: string };
    chat: { open: boolean; powerMode: boolean };
    validation: { open: boolean; errors?: ValidationError[] };
    results: { open: boolean; trialId?: number };
  };
  
  // Actions
  openPanel: (panel: PanelName, data?: any) => void;
  closePanel: (panel: PanelName) => void;
  togglePanel: (panel: PanelName) => void;
  
  // Panel data persistence
  setIntrospectionData: (data: IntrospectionResult) => void;
  clearIntrospectionData: () => void;
}
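To make the intended behavior concrete, here is a minimal, framework-free sketch of the panel transitions. It is not the actual store: the `Panels` shape is a trimmed-down copy of the interface above, `IntrospectionResult` is a placeholder type, and the file name `model.prt` in the usage note is made up. The point it illustrates is that closing a panel only flips `open`, so cached data such as introspection results survives a close and reopen. The same pure functions could be wrapped in a Zustand store.

```typescript
// Placeholder type (assumption): the real IntrospectionResult lives elsewhere.
type IntrospectionResult = Record<string, unknown>;

// Trimmed-down copy of the panel slots from PanelState (assumption: three panels only).
interface Panels {
  introspection: { open: boolean; filePath?: string; data?: IntrospectionResult };
  config: { open: boolean; nodeId?: string };
  chat: { open: boolean; powerMode: boolean };
}
type PanelName = keyof Panels;

const initialPanels: Panels = {
  introspection: { open: false },
  config: { open: false },
  chat: { open: false, powerMode: false },
};

// Open a panel and merge an optional payload (e.g. filePath, nodeId) into its slot.
function openPanel(panels: Panels, name: PanelName, data: Record<string, unknown> = {}): Panels {
  return { ...panels, [name]: { ...panels[name], ...data, open: true } };
}

// Close a panel but keep its other fields, so its data is still there on reopen.
function closePanel(panels: Panels, name: PanelName): Panels {
  return { ...panels, [name]: { ...panels[name], open: false } };
}

// Flip a panel between open and closed.
function togglePanel(panels: Panels, name: PanelName): Panels {
  return panels[name].open ? closePanel(panels, name) : openPanel(panels, name);
}
```

For example, `openPanel(initialPanels, "introspection", { filePath: "model.prt" })` followed by `openPanel(state, "config", { nodeId: "n1" })` leaves both panels open at once, which is exactly what the current local-state approach cannot do.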

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 1.1 | usePanelStore.ts | Create Zustand store for panel state |
| 1.2 | PanelContainer.tsx | Create container that renders open panels |
| 1.3 | IntrospectionPanel.tsx | Refactor to use store instead of local state |
| 1.4 | NodeConfigPanelV2.tsx | Remove local panel state, use store |
| 1.5 | CanvasView.tsx | Integrate PanelContainer, remove chat panel logic |
| 1.6 | SpecRenderer.tsx | Add panel trigger buttons (introspect, validate) |

UI Changes

Before:

[Canvas] [Config Panel OR Chat Panel]
         ↑ mutually exclusive

After:

[Canvas] [Right Panel Area]
         ├── Config Panel (pinnable)
         ├── Chat Panel (collapsible)
         └── Floating Panels:
             ├── Introspection (draggable, persistent)
             ├── Validation Results
             └── Trial Details

Panel Behaviors

| Panel | Trigger | Persistence | Position |
|-------|---------|-------------|----------|
| Config | Node click | While node selected | Right sidebar |
| Chat | Toggle button | Always available | Right sidebar (below config) |
| Introspection | "Introspect" button | Until explicitly closed | Floating, draggable |
| Validation | "Validate" or pre-run | Until fixed or dismissed | Floating |
| Results | Click on result badge | Until dismissed | Floating |

Phase 2: Pre-run Validation (HIGH PRIORITY)

Problem

  • User can click "Run" with incomplete spec
  • No feedback about missing extractors, objectives, or connections
  • Optimization fails silently or with cryptic errors

Solution: Validation Pipeline

// Types of validation
interface ValidationResult {
  valid: boolean;
  errors: ValidationError[];   // Must fix before running
  warnings: ValidationWarning[]; // Can proceed but risky
}

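// Assumption: warnings reuse the ValidationError shape and differ only in severity
type ValidationWarning = ValidationError;

interface ValidationError {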
  code: string;
  severity: 'error' | 'warning';
  path: string;       // e.g., "objectives[0]"
  message: string;
  suggestion?: string;
  autoFix?: () => void;
}

Validation Rules

| Rule | Severity | Message |
|------|----------|---------|
| No design variables | Error | "Add at least one design variable" |
| No objectives | Error | "Add at least one objective" |
| Objective not connected to extractor | Error | "Objective '{name}' has no source extractor" |
| Extractor type not set | Error | "Extractor '{name}' needs a type selected" |
| Design var bounds invalid | Error | "Min must be less than max for '{name}'" |
| No model file | Error | "No simulation file configured" |
| Custom extractor no code | Warning | "Custom extractor '{name}' has no code" |
| High trial count (>500) | Warning | "Large budget may take hours to complete" |
| Single trial | Warning | "Only 1 trial - results won't be meaningful" |
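The rules above could be expressed as a single pure function, roughly as sketched below. This is an illustration rather than the contents of specValidator.ts: the `Spec` shape, its field names, and the error codes (`NO_MODEL`, `INVALID_BOUNDS`, and so on) are all assumptions, and the custom-extractor rule is left out for brevity. Messages are taken from the table.

```typescript
// Hypothetical, simplified spec shape (assumption); the real spec schema may differ.
interface Spec {
  modelFile?: string;
  designVariables: { name: string; min: number; max: number }[];
  extractors: { name: string; type?: string }[];
  objectives: { name: string; extractor?: string }[];
  maxTrials: number;
}

interface Issue {
  code: string; // hypothetical codes, not taken from the plan
  severity: "error" | "warning";
  path: string;
  message: string;
}

interface ValidationReport {
  valid: boolean;
  errors: Issue[];
  warnings: Issue[];
}

function validateSpec(spec: Spec): ValidationReport {
  const issues: Issue[] = [];
  const error = (code: string, path: string, message: string) =>
    issues.push({ code, severity: "error", path, message });
  const warn = (code: string, path: string, message: string) =>
    issues.push({ code, severity: "warning", path, message });

  if (!spec.modelFile) error("NO_MODEL", "model", "No simulation file configured");
  if (spec.designVariables.length === 0)
    error("NO_DESIGN_VARS", "designVariables", "Add at least one design variable");
  spec.designVariables.forEach((dv, i) => {
    // Written as !(min < max) so that NaN bounds are rejected too.
    if (!(dv.min < dv.max))
      error("INVALID_BOUNDS", `designVariables[${i}]`, `Min must be less than max for '${dv.name}'`);
  });

  if (spec.objectives.length === 0) error("NO_OBJECTIVES", "objectives", "Add at least one objective");
  const extractorNames = new Set(spec.extractors.map((e) => e.name));
  spec.objectives.forEach((obj, i) => {
    if (!obj.extractor || !extractorNames.has(obj.extractor))
      error("OBJECTIVE_NO_SOURCE", `objectives[${i}]`, `Objective '${obj.name}' has no source extractor`);
  });
  spec.extractors.forEach((ex, i) => {
    if (!ex.type) error("EXTRACTOR_NO_TYPE", `extractors[${i}]`, `Extractor '${ex.name}' needs a type selected`);
  });

  if (spec.maxTrials > 500) warn("HIGH_TRIAL_COUNT", "maxTrials", "Large budget may take hours to complete");
  if (spec.maxTrials === 1) warn("SINGLE_TRIAL", "maxTrials", "Only 1 trial - results won't be meaningful");

  const errors = issues.filter((i) => i.severity === "error");
  const warnings = issues.filter((i) => i.severity === "warning");
  return { valid: errors.length === 0, errors, warnings };
}
```

Keeping the validator free of UI and store dependencies means the same rules can back both the "Validate" button and the pre-run check, and they are easy to unit test. Only errors affect `valid`; warnings are reported but do not block the run.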

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 2.1 | validation/specValidator.ts | Client-side validation rules |
| 2.2 | ValidationPanel.tsx | Display validation results |
| 2.3 | SpecRenderer.tsx | Add "Validate" button, pre-run check |
| 2.4 | api/routes/spec.py | Server-side validation endpoint |
| 2.5 | useSpecStore.ts | Add validate() action |

UI Flow

User clicks "Run Optimization"
    ↓
[Validate Spec] ──failed──→ [Show ValidationPanel]
    ↓ passed                      │
[Confirm Dialog]                  │
    ↓ confirmed                   │
[Start Optimization] ←── fix ─────┘

Phase 3: Error Handling & Recovery (HIGH PRIORITY)

Problem

  • NX crashes don't show useful feedback
  • Solver failures leave user confused
  • No way to resume after errors

Solution: Error Classification & Display

interface OptimizationError {
  type: 'nx_crash' | 'solver_fail' | 'extractor_error' | 'config_error' | 'system_error';
  trial?: number;
  message: string;
  details?: string;
  recoverable: boolean;
  suggestions: string[];
}

Error Handling Strategy

| Error Type | Display | Recovery |
|------------|---------|----------|
| NX Crash | Toast + Error Panel | Retry trial, skip trial |
| Solver Failure | Badge on trial | Mark infeasible, continue |
| Extractor Error | Log + badge | Use NaN, continue |
| Config Error | Block run | Show validation panel |
| System Error | Full modal | Restart optimization |
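One way to keep this strategy in a single place on the frontend is a typed lookup table keyed by the error type, as in the sketch below. The names `ERROR_HANDLING` and `handlingFor` are hypothetical, and falling back to the system-error handling for an unrecognized type is an assumption, not something the plan specifies.

```typescript
type ErrorType = "nx_crash" | "solver_fail" | "extractor_error" | "config_error" | "system_error";

interface ErrorHandling {
  display: string;    // how the error is surfaced in the UI
  recovery: string[]; // actions offered to the user
}

// Record<ErrorType, ...> makes the compiler flag any error type left without an entry.
const ERROR_HANDLING: Record<ErrorType, ErrorHandling> = {
  nx_crash: { display: "Toast + Error Panel", recovery: ["Retry trial", "Skip trial"] },
  solver_fail: { display: "Badge on trial", recovery: ["Mark infeasible", "Continue"] },
  extractor_error: { display: "Log + badge", recovery: ["Use NaN", "Continue"] },
  config_error: { display: "Block run", recovery: ["Show validation panel"] },
  system_error: { display: "Full modal", recovery: ["Restart optimization"] },
};

// Type guard for strings arriving from the backend.
function isErrorType(value: string): value is ErrorType {
  return Object.prototype.hasOwnProperty.call(ERROR_HANDLING, value);
}

// Assumption: unknown types are treated like a system error.
function handlingFor(type: string): ErrorHandling {
  return isErrorType(type) ? ERROR_HANDLING[type] : ERROR_HANDLING.system_error;
}
```

With this in place, ErrorPanel can render `handlingFor(error.type).recovery` as buttons without a chain of conditionals.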

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 3.1 | ErrorBoundary.tsx | Wrap canvas in error boundary |
| 3.2 | ErrorPanel.tsx | Detailed error display with suggestions |
| 3.3 | optimization.py | Enhanced error responses with type/recovery |
| 3.4 | SpecRenderer.tsx | Error state handling, retry buttons |
| 3.5 | useOptimizationStatus.ts | Hook for status polling with error handling |

Phase 4: Live Updates via WebSocket (MEDIUM PRIORITY)

Problem

  • Current polling (every 3 s) is inefficient and delays each update by up to one polling interval
  • Missed updates between polls
  • No real-time progress indication

Solution: WebSocket for Trial Updates

// WebSocket events
interface TrialStartEvent {
  type: 'trial_start';
  trial_number: number;
  params: Record<string, number>;
}

interface TrialCompleteEvent {
  type: 'trial_complete';
  trial_number: number;
  objectives: Record<string, number>;
  is_best: boolean;
  is_feasible: boolean;
}

interface OptimizationCompleteEvent {
  type: 'optimization_complete';
  best_trial: number;
  total_trials: number;
}
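On the client, these events form a discriminated union on `type`, so incoming messages can be parsed once and folded into a small progress state. The sketch below assumes each WebSocket message is one JSON-encoded event. `RunProgress`, `parseEvent` and `applyEvent` are hypothetical names, and `parseEvent` only checks the `type` field rather than validating the full payload.

```typescript
interface TrialStartEvent { type: "trial_start"; trial_number: number; params: Record<string, number>; }
interface TrialCompleteEvent {
  type: "trial_complete"; trial_number: number; objectives: Record<string, number>;
  is_best: boolean; is_feasible: boolean;
}
interface OptimizationCompleteEvent { type: "optimization_complete"; best_trial: number; total_trials: number; }

type OptimizationEvent = TrialStartEvent | TrialCompleteEvent | OptimizationCompleteEvent;

// Hypothetical UI-facing summary of the run.
interface RunProgress {
  running: boolean;
  currentTrial: number | null;
  completedTrials: number;
  bestTrial: number | null;
}

const initialProgress: RunProgress = { running: false, currentTrial: null, completedTrials: 0, bestTrial: null };

// Parse a raw message; returns null for malformed JSON or unknown event types.
function parseEvent(raw: string): OptimizationEvent | null {
  try {
    const msg: unknown = JSON.parse(raw);
    if (typeof msg !== "object" || msg === null) return null;
    const type = (msg as { type?: unknown }).type;
    if (type === "trial_start" || type === "trial_complete" || type === "optimization_complete") {
      return msg as OptimizationEvent;
    }
    return null;
  } catch {
    return null;
  }
}

// Pure reducer: the switch narrows `event` to the matching interface in each case.
function applyEvent(state: RunProgress, event: OptimizationEvent): RunProgress {
  switch (event.type) {
    case "trial_start":
      return { ...state, running: true, currentTrial: event.trial_number };
    case "trial_complete":
      return {
        ...state,
        completedTrials: state.completedTrials + 1,
        bestTrial: event.is_best ? event.trial_number : state.bestTrial,
      };
    case "optimization_complete":
      return { ...state, running: false, currentTrial: null, bestTrial: event.best_trial };
  }
}
```

A hook such as useOptimizationWebSocket could call `parseEvent` in its message handler and feed the result through `applyEvent`, which keeps the state logic testable without a live socket.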

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 4.1 | websocket.py | Add optimization events to WS |
| 4.2 | run_optimization.py | Emit events during optimization |
| 4.3 | useOptimizationWebSocket.ts | Hook for WS subscription |
| 4.4 | SpecRenderer.tsx | Use WS instead of polling |
| 4.5 | ResultBadge.tsx | Animate on new results |

Phase 5: Convergence Visualization (MEDIUM PRIORITY)

Problem

  • No visual feedback on optimization progress
  • Can't tell if converging or stuck
  • No Pareto front visualization for multi-objective

Solution: Embedded Charts

Components

| Component | Description |
|-----------|-------------|
| ConvergenceSparkline | Tiny chart in ObjectiveNode showing trend |
| ProgressRing | Circular progress in header (trials/total) |
| ConvergenceChart | Full chart in Results panel |
| ParetoPlot | 2D Pareto front for multi-objective |
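The data behind a convergence sparkline is typically the best objective value seen so far at each trial, scaled into the SVG viewport. The sketch below shows one possible way to compute both. The function names are hypothetical, and carrying the previous best forward over non-finite values (for example a NaN from a failed extractor) is an assumption about the desired behavior.

```typescript
// Best-so-far series: element i is the best finite value among trials 0..i.
// Non-finite values (e.g. NaN from a failed extractor) carry the previous best forward.
function runningBest(values: number[], direction: "minimize" | "maximize" = "minimize"): number[] {
  const out: number[] = [];
  let best = NaN;
  for (const v of values) {
    if (Number.isFinite(v)) {
      if (Number.isNaN(best)) best = v;
      else best = direction === "minimize" ? Math.min(best, v) : Math.max(best, v);
    }
    out.push(best);
  }
  return out;
}

// Map a series onto a width x height box and return a string usable as
// the `points` attribute of an SVG <polyline>. Non-finite entries are skipped.
function sparklinePoints(values: number[], width: number, height: number): string {
  const finite = values.filter((v) => Number.isFinite(v));
  if (finite.length === 0) return "";
  const lo = Math.min(...finite);
  const hi = Math.max(...finite);
  const span = hi - lo || 1; // avoid division by zero for a flat series
  const step = finite.length > 1 ? width / (finite.length - 1) : 0;
  return finite
    .map((v, i) => {
      const x = i * step;
      const y = height - ((v - lo) / span) * height; // SVG y grows downward
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(" ");
}
```

A ConvergenceSparkline component could then render `<polyline points={sparklinePoints(runningBest(history), 60, 16)} />`, where `history` holds the objective values per trial. A flat line at the end of the series is the visual cue that the optimization has stopped improving.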

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 5.1 | ConvergenceSparkline.tsx | SVG sparkline component |
| 5.2 | ObjectiveNode.tsx | Integrate sparkline |
| 5.3 | ProgressRing.tsx | Circular progress indicator |
| 5.4 | ConvergenceChart.tsx | Full chart with Recharts |
| 5.5 | ResultsPanel.tsx | Panel showing detailed results |

Phase 6: End-to-End Testing (MEDIUM PRIORITY)

Problem

  • No automated tests for canvas operations
  • Manual testing is time-consuming and error-prone
  • Regressions go unnoticed

Solution: Playwright E2E Tests

Test Scenarios

| Test | Steps | Assertions |
|------|-------|------------|
| Load study | Navigate to /canvas/{id} | Spec loads, nodes render |
| Add design var | Drag from palette | Node appears, spec updates |
| Connect nodes | Drag edge | Edge renders, spec has edge |
| Edit node | Click node, change value | Value persists, API called |
| Run validation | Click validate | Errors shown for incomplete |
| Start optimization | Complete spec, click run | Status shows running |
| View results | Wait for trial | Badge shows value |
| Stop optimization | Click stop | Status shows stopped |

Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 6.1 | e2e/canvas.spec.ts | Basic canvas operations |
| 6.2 | e2e/optimization.spec.ts | Run/stop/status flow |
| 6.3 | e2e/panels.spec.ts | Panel open/close/persist |
| 6.4 | playwright.config.ts | Configure Playwright |
| 6.5 | CI workflow | Run tests in GitHub Actions |

Implementation Order

Week 1:
├── Phase 1: Panel Management (critical UX fix)
│   ├── Day 1-2: usePanelStore + PanelContainer
│   └── Day 3-4: Refactor existing panels
│
├── Phase 2: Validation (prevent user errors)
│   └── Day 5: Validation rules + UI

Week 2:
├── Phase 3: Error Handling
│   ├── Day 1-2: Error types + ErrorPanel
│   └── Day 3: Integration with optimization flow
│
├── Phase 4: WebSocket Updates
│   └── Day 4-5: WS events + frontend hook

Week 3:
├── Phase 5: Visualization
│   ├── Day 1-2: Sparklines
│   └── Day 3: Progress indicators
│
├── Phase 6: Testing
│   └── Day 4-5: Playwright setup + core tests

Quick Wins (Can Do Now)

These can be implemented immediately with minimal changes:

  1. Persist introspection data in localStorage
    • Cache introspection results
    • Restore on panel reopen
  2. Add loading states to all buttons
    • Disable during operations
    • Show spinners
  3. Add confirmation dialogs
    • Before stopping optimization
    • Before clearing canvas
  4. Improve error messages
    • Parse NX error logs
    • Show actionable suggestions
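As an illustration of the first quick win, introspection results could be cached per file path behind a small storage interface. This is a sketch under assumptions: the `introspection:` key prefix and the function names are made up, and `KeyValueStore` is a minimal subset of the Web Storage API so that `window.localStorage` can be passed in directly while tests use an in-memory stand-in.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it structurally.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CACHE_PREFIX = "introspection:"; // hypothetical key prefix

// Cache a result under the model file path. Write failures (e.g. quota exceeded)
// are swallowed because the cache is only an optimization.
function saveIntrospection(store: KeyValueStore, filePath: string, data: unknown): void {
  try {
    store.setItem(CACHE_PREFIX + filePath, JSON.stringify(data));
  } catch {
    // ignore: the panel still works without a cached copy
  }
}

// Restore a cached result, or null if nothing usable is stored.
function loadIntrospection<T>(store: KeyValueStore, filePath: string): T | null {
  const raw = store.getItem(CACHE_PREFIX + filePath);
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as T;
  } catch {
    return null; // corrupted entry: treat as a cache miss
  }
}

// In-memory stand-in for tests or non-browser environments.
function createMemoryStore(): KeyValueStore {
  const map = new Map<string, string>();
  return {
    getItem: (key) => (map.has(key) ? map.get(key)! : null),
    setItem: (key, value) => { map.set(key, value); },
  };
}
```

In the browser the panel would call `loadIntrospection(window.localStorage, filePath)` when it opens and `saveIntrospection(...)` after each successful introspection run. A cached entry can go stale if the model file changes, so a real implementation would likely need some invalidation rule, which is not covered here.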

Files to Create/Modify

New Files

atomizer-dashboard/frontend/src/
├── hooks/
│   ├── usePanelStore.ts
│   └── useOptimizationWebSocket.ts
├── components/canvas/
│   ├── PanelContainer.tsx
│   ├── panels/
│   │   ├── ValidationPanel.tsx
│   │   ├── ErrorPanel.tsx
│   │   └── ResultsPanel.tsx
│   └── visualization/
│       ├── ConvergenceSparkline.tsx
│       ├── ProgressRing.tsx
│       └── ConvergenceChart.tsx
└── lib/
    └── validation/
        └── specValidator.ts

e2e/
├── canvas.spec.ts
├── optimization.spec.ts
└── panels.spec.ts

Modified Files

atomizer-dashboard/frontend/src/
├── pages/CanvasView.tsx
├── components/canvas/SpecRenderer.tsx
├── components/canvas/panels/IntrospectionPanel.tsx
├── components/canvas/panels/NodeConfigPanelV2.tsx
├── components/canvas/nodes/ObjectiveNode.tsx
└── hooks/useSpecStore.ts

atomizer-dashboard/backend/api/
├── routes/optimization.py
├── routes/spec.py
└── websocket.py

Success Criteria

| Phase | Success Metric |
|-------|----------------|
| 1 | Introspection panel persists across node selections |
| 2 | Invalid spec shows clear error before run |
| 3 | NX errors display with recovery options |
| 4 | Results update within 500ms of trial completion |
| 5 | Convergence trend visible on objective nodes |
| 6 | All E2E tests pass in CI |

Next Steps

  1. Review this plan
  2. Start with Phase 1 (Panel Management) - fixes your immediate issue
  3. Implement incrementally, commit after each phase