# Canvas Builder Robustness & Enhancement Plan

**Created**: January 21, 2026  
**Branch**: `feature/studio-enhancement`  
**Status**: Planning

---

## Executive Summary

This plan covers six areas of work needed to make the Canvas Builder robust and production-ready:

1. **Panel Management** - Panels (Introspection, Config, Chat) disappear unexpectedly
2. **Pre-run Validation** - No validation before starting optimization
3. **Error Handling** - Poor feedback when things go wrong
4. **Live Updates** - Polling is inefficient; need WebSocket
5. **Visualization** - No convergence charts or progress indicators
6. **Testing** - No automated tests for critical flows

---

## Phase 1: Panel Management System (HIGH PRIORITY)

### Problem
- IntrospectionPanel disappears when user clicks elsewhere on canvas
- Panel state is lost (e.g., introspection results, expanded sections)
- No way to have multiple panels open simultaneously
- Chat panel and Config panel are mutually exclusive

### Root Cause
```typescript
// Current: Local state in ModelNodeConfig (NodeConfigPanelV2.tsx:275)
const [showIntrospection, setShowIntrospection] = useState(false);

// When selectedNodeId changes, ModelNodeConfig unmounts, losing state
```

### Solution: Centralized Panel Store

Create `usePanelStore.ts` - a Zustand store for panel management:

```typescript
// atomizer-dashboard/frontend/src/hooks/usePanelStore.ts

// Panel names are derived from the keys of `panels`, so the two cannot drift apart
type PanelName = keyof PanelState['panels'];

interface PanelState {
  // Panel visibility
  panels: {
    introspection: { open: boolean; filePath?: string; data?: IntrospectionResult };
    config: { open: boolean; nodeId?: string };
    chat: { open: boolean; powerMode: boolean };
    validation: { open: boolean; errors?: ValidationError[] };
    results: { open: boolean; trialId?: number };
  };

  // Actions; `data` is limited to the fields of the panel being opened
  openPanel: <P extends PanelName>(panel: P, data?: Partial<PanelState['panels'][P]>) => void;
  closePanel: (panel: PanelName) => void;
  togglePanel: (panel: PanelName) => void;
  
  // Panel data persistence
  setIntrospectionData: (data: IntrospectionResult) => void;
  clearIntrospectionData: () => void;
}
```
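
The actions in this interface reduce to small pure state transitions. The following is a minimal sketch under a trimmed-down panel map: `PanelMap`, `PanelEntry`, and the helper functions are illustrative names, not existing code. Each function could serve as the body of the corresponding Zustand `set` updater.

```typescript
// Hypothetical, trimmed-down panel map; the real store would carry the
// per-panel payload fields listed in PanelState above.
type PanelName = 'introspection' | 'config' | 'chat' | 'validation' | 'results';

interface PanelEntry {
  open: boolean;
  data?: Record<string, unknown>; // payload such as filePath or nodeId
}

type PanelMap = Record<PanelName, PanelEntry>;

const initialPanels: PanelMap = {
  introspection: { open: false },
  config: { open: false },
  chat: { open: false },
  validation: { open: false },
  results: { open: false },
};

// Open a panel, merging any new payload into what is already stored.
function openPanel(panels: PanelMap, panel: PanelName, data?: Record<string, unknown>): PanelMap {
  const current = panels[panel];
  return { ...panels, [panel]: { open: true, data: { ...current.data, ...data } } };
}

// Close a panel but keep its payload, so reopening restores the previous content.
function closePanel(panels: PanelMap, panel: PanelName): PanelMap {
  return { ...panels, [panel]: { ...panels[panel], open: false } };
}

// Flip a panel between open and closed without touching the other panels.
function togglePanel(panels: PanelMap, panel: PanelName): PanelMap {
  return panels[panel].open ? closePanel(panels, panel) : openPanel(panels, panel);
}
```

Because closing keeps the payload, the introspection result survives a close/reopen cycle, which is the behavior the local `useState` approach loses on unmount.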

### Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 1.1 | `usePanelStore.ts` | Create Zustand store for panel state |
| 1.2 | `PanelContainer.tsx` | Create container that renders open panels |
| 1.3 | `IntrospectionPanel.tsx` | Refactor to use store instead of local state |
| 1.4 | `NodeConfigPanelV2.tsx` | Remove local panel state, use store |
| 1.5 | `CanvasView.tsx` | Integrate PanelContainer, remove chat panel logic |
| 1.6 | `SpecRenderer.tsx` | Add panel trigger buttons (introspect, validate) |

### UI Changes

**Before:**
```
[Canvas] [Config Panel OR Chat Panel]
         ↑ mutually exclusive
```

**After:**
```
[Canvas] [Right Panel Area]
         ├── Config Panel (pinnable)
         ├── Chat Panel (collapsible)
         └── Floating Panels:
             ├── Introspection (draggable, persistent)
             ├── Validation Results
             └── Trial Details
```

### Panel Behaviors

| Panel | Trigger | Persistence | Position |
|-------|---------|-------------|----------|
| **Config** | Node click | While node selected | Right sidebar |
| **Chat** | Toggle button | Always available | Right sidebar (below config) |
| **Introspection** | "Introspect" button | Until explicitly closed | Floating, draggable |
| **Validation** | "Validate" or pre-run | Until fixed or dismissed | Floating |
| **Results** | Click on result badge | Until dismissed | Floating |
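
The persistence column can be encoded as data so that a selection change only touches panels tied to the selected node. This is a sketch under assumptions: the policy names and `onSelectionChange` are hypothetical, and panel visibility is reduced to a plain boolean map.

```typescript
// Persistence policy per panel, transcribed from the table above.
type Persistence = 'while-node-selected' | 'always-available' | 'until-dismissed';

const PANEL_PERSISTENCE: Record<string, Persistence> = {
  config: 'while-node-selected',
  chat: 'always-available',
  introspection: 'until-dismissed',
  validation: 'until-dismissed',
  results: 'until-dismissed',
};

type OpenPanels = Record<string, boolean>;

// Recompute visibility when the selected node changes.
// Only panels bound to the selection react; floating panels are left untouched.
function onSelectionChange(openPanels: OpenPanels, selectedNodeId: string | null): OpenPanels {
  const next: OpenPanels = { ...openPanels };
  for (const panel of Object.keys(PANEL_PERSISTENCE)) {
    if (PANEL_PERSISTENCE[panel] === 'while-node-selected') {
      next[panel] = selectedNodeId !== null;
    }
  }
  return next;
}
```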

---

## Phase 2: Pre-run Validation (HIGH PRIORITY)

### Problem
- User can click "Run" with incomplete spec
- No feedback about missing extractors, objectives, or connections
- Optimization fails silently or with cryptic errors

### Solution: Validation Pipeline

```typescript
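// Types of validation
interface ValidationResult {
  valid: boolean;                // true when `errors` is empty
  errors: ValidationError[];     // Must fix before running
  warnings: ValidationWarning[]; // Can proceed but risky
}

// Fields shared by errors and warnings
interface ValidationIssue {
  code: string;
  path: string;       // e.g., "objectives[0]"
  message: string;
  suggestion?: string;
  autoFix?: () => void;
}

interface ValidationError extends ValidationIssue { severity: 'error'; }
interface ValidationWarning extends ValidationIssue { severity: 'warning'; }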
```

### Validation Rules

| Rule | Severity | Message |
|------|----------|---------|
| No design variables | Error | "Add at least one design variable" |
| No objectives | Error | "Add at least one objective" |
| Objective not connected to extractor | Error | "Objective '{name}' has no source extractor" |
| Extractor type not set | Error | "Extractor '{name}' needs a type selected" |
| Design var bounds invalid | Error | "Min must be less than max for '{name}'" |
| No model file | Error | "No simulation file configured" |
| Custom extractor no code | Warning | "Custom extractor '{name}' has no code" |
| High trial count (>500) | Warning | "Large budget may take hours to complete" |
| Single trial | Warning | "Only 1 trial - results won't be meaningful" |
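
A few of these rules could be implemented as follows. This is a minimal sketch: the `StudySpec` shape, the field names, and the issue codes are assumptions made for illustration, since the real spec schema is not shown here.

```typescript
// Hypothetical, simplified spec shape; the real spec schema may differ.
interface DesignVariable { name: string; min: number; max: number; }
interface Objective { name: string; extractorId?: string; }
interface StudySpec {
  modelFile?: string;
  designVariables: DesignVariable[];
  objectives: Objective[];
  maxTrials: number;
}

interface Issue {
  code: string;
  severity: 'error' | 'warning';
  path: string;
  message: string;
}

function validateSpec(spec: StudySpec): { valid: boolean; errors: Issue[]; warnings: Issue[] } {
  const issues: Issue[] = [];
  const add = (severity: Issue['severity'], code: string, path: string, message: string): void => {
    issues.push({ code, severity, path, message });
  };

  // Structural rules: the run cannot start without these.
  if (!spec.modelFile) add('error', 'NO_MODEL', 'model', 'No simulation file configured');
  if (spec.designVariables.length === 0)
    add('error', 'NO_DESIGN_VARS', 'designVariables', 'Add at least one design variable');
  if (spec.objectives.length === 0)
    add('error', 'NO_OBJECTIVES', 'objectives', 'Add at least one objective');

  // Per-item rules; `path` points at the offending entry.
  spec.designVariables.forEach((dv, i) => {
    if (!(dv.min < dv.max))
      add('error', 'INVALID_BOUNDS', `designVariables[${i}]`, `Min must be less than max for '${dv.name}'`);
  });
  spec.objectives.forEach((obj, i) => {
    if (!obj.extractorId)
      add('error', 'OBJECTIVE_NO_SOURCE', `objectives[${i}]`, `Objective '${obj.name}' has no source extractor`);
  });

  // Budget warnings: the run may proceed.
  if (spec.maxTrials > 500)
    add('warning', 'HIGH_TRIALS', 'maxTrials', 'Large budget may take hours to complete');
  if (spec.maxTrials === 1)
    add('warning', 'SINGLE_TRIAL', 'maxTrials', "Only 1 trial - results won't be meaningful");

  const errors = issues.filter((issue) => issue.severity === 'error');
  const warnings = issues.filter((issue) => issue.severity === 'warning');
  return { valid: errors.length === 0, errors, warnings };
}
```

Keeping the rules in one pure function means the same checks can back both the "Validate" button and the pre-run gate.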

### Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 2.1 | `validation/specValidator.ts` | Client-side validation rules |
| 2.2 | `ValidationPanel.tsx` | Display validation results |
| 2.3 | `SpecRenderer.tsx` | Add "Validate" button, pre-run check |
| 2.4 | `api/routes/spec.py` | Server-side validation endpoint |
| 2.5 | `useSpecStore.ts` | Add `validate()` action |

### UI Flow

```
User clicks "Run Optimization"
    ↓
[Validate Spec] ──failed──→ [Show ValidationPanel]
    ↓ passed                      │
[Confirm Dialog]                  │
    ↓ confirmed                   │
[Start Optimization] ←── fix ─────┘
```

---

## Phase 3: Error Handling & Recovery (HIGH PRIORITY)

### Problem
- NX crashes don't show useful feedback
- Solver failures leave user confused
- No way to resume after errors

### Solution: Error Classification & Display

```typescript
interface OptimizationError {
  type: 'nx_crash' | 'solver_fail' | 'extractor_error' | 'config_error' | 'system_error';
  trial?: number;
  message: string;
  details?: string;
  recoverable: boolean;
  suggestions: string[];
}
```

### Error Handling Strategy

| Error Type | Display | Recovery |
|------------|---------|----------|
| NX Crash | Toast + Error Panel | Retry trial, skip trial |
| Solver Failure | Badge on trial | Mark infeasible, continue |
| Extractor Error | Log + badge | Use NaN, continue |
| Config Error | Block run | Show validation panel |
| System Error | Full modal | Restart optimization |
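
One way to keep the frontend consistent with this table is a single lookup keyed by error type. The sketch below is illustrative: `ERROR_HANDLING`, `handlingFor`, and the action identifiers are hypothetical names, and `autoContinue` is set only where the table says the run continues.

```typescript
type ErrorType = 'nx_crash' | 'solver_fail' | 'extractor_error' | 'config_error' | 'system_error';

type RecoveryAction =
  | 'retry_trial' | 'skip_trial' | 'mark_infeasible'
  | 'use_nan' | 'show_validation_panel' | 'restart_optimization';

interface ErrorHandling {
  display: 'toast_and_panel' | 'trial_badge' | 'log_and_badge' | 'block_run' | 'modal';
  recovery: RecoveryAction[];
  autoContinue: boolean; // true where the table says the run continues
}

const ERROR_HANDLING: Record<ErrorType, ErrorHandling> = {
  nx_crash: { display: 'toast_and_panel', recovery: ['retry_trial', 'skip_trial'], autoContinue: false },
  solver_fail: { display: 'trial_badge', recovery: ['mark_infeasible'], autoContinue: true },
  extractor_error: { display: 'log_and_badge', recovery: ['use_nan'], autoContinue: true },
  config_error: { display: 'block_run', recovery: ['show_validation_panel'], autoContinue: false },
  system_error: { display: 'modal', recovery: ['restart_optimization'], autoContinue: false },
};

// Look up how a given error type should be displayed and recovered from.
function handlingFor(type: ErrorType): ErrorHandling {
  return ERROR_HANDLING[type];
}
```

Typing the table as `Record<ErrorType, ErrorHandling>` makes the compiler flag any error type added to the union without a matching row.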

### Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 3.1 | `ErrorBoundary.tsx` | Wrap canvas in error boundary |
| 3.2 | `ErrorPanel.tsx` | Detailed error display with suggestions |
| 3.3 | `optimization.py` | Enhanced error responses with type/recovery |
| 3.4 | `SpecRenderer.tsx` | Error state handling, retry buttons |
| 3.5 | `useOptimizationStatus.ts` | Hook for status polling with error handling |

---

## Phase 4: Live Updates via WebSocket (MEDIUM PRIORITY)

### Problem
- Current polling (every 3s) is inefficient and adds up to 3s of latency per update
- Missed updates between polls
- No real-time progress indication

### Solution: WebSocket for Trial Updates

```typescript
// WebSocket events
interface TrialStartEvent {
  type: 'trial_start';
  trial_number: number;
  params: Record<string, number>;
}

interface TrialCompleteEvent {
  type: 'trial_complete';
  trial_number: number;
  objectives: Record<string, number>;
  is_best: boolean;
  is_feasible: boolean;
}

interface OptimizationCompleteEvent {
  type: 'optimization_complete';
  best_trial: number;
  total_trials: number;
}
```
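
On the frontend, these events form a discriminated union that a small reducer can fold into run state. This is a sketch, not the planned hook: `RunState`, `applyEvent`, and `parseEvent` are hypothetical names, and the parser only checks the `type` field rather than validating the full payload.

```typescript
type OptimizationEvent =
  | { type: 'trial_start'; trial_number: number; params: Record<string, number> }
  | {
      type: 'trial_complete';
      trial_number: number;
      objectives: Record<string, number>;
      is_best: boolean;
      is_feasible: boolean;
    }
  | { type: 'optimization_complete'; best_trial: number; total_trials: number };

interface RunState {
  running: boolean;
  currentTrial: number | null;
  completedTrials: number;
  bestTrial: number | null;
}

const idleState: RunState = { running: false, currentTrial: null, completedTrials: 0, bestTrial: null };

// Fold one event into the run state; the switch is exhaustive over the union.
function applyEvent(state: RunState, ev: OptimizationEvent): RunState {
  switch (ev.type) {
    case 'trial_start':
      return { ...state, running: true, currentTrial: ev.trial_number };
    case 'trial_complete':
      return {
        ...state,
        currentTrial: null,
        completedTrials: state.completedTrials + 1,
        bestTrial: ev.is_best ? ev.trial_number : state.bestTrial,
      };
    case 'optimization_complete':
      return { ...state, running: false, currentTrial: null, bestTrial: ev.best_trial };
  }
}

// Parse a raw WebSocket message; returns null for malformed or unknown payloads.
function parseEvent(raw: string): OptimizationEvent | null {
  try {
    const msg = JSON.parse(raw);
    const known = ['trial_start', 'trial_complete', 'optimization_complete'];
    return msg && known.indexOf(msg.type) !== -1 ? (msg as OptimizationEvent) : null;
  } catch {
    return null;
  }
}
```

A WebSocket `onmessage` handler would then call `parseEvent` on the message data and, if the result is non-null, pass it to `applyEvent`.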

### Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 4.1 | `websocket.py` | Add optimization events to WS |
| 4.2 | `run_optimization.py` | Emit events during optimization |
| 4.3 | `useOptimizationWebSocket.ts` | Hook for WS subscription |
| 4.4 | `SpecRenderer.tsx` | Use WS instead of polling |
| 4.5 | `ResultBadge.tsx` | Animate on new results |

---

## Phase 5: Convergence Visualization (MEDIUM PRIORITY)

### Problem
- No visual feedback on optimization progress
- Can't tell if converging or stuck
- No Pareto front visualization for multi-objective

### Solution: Embedded Charts

### Components

| Component | Description |
|-----------|-------------|
| `ConvergenceSparkline` | Tiny chart in ObjectiveNode showing trend |
| `ProgressRing` | Circular progress in header (trials/total) |
| `ConvergenceChart` | Full chart in Results panel |
| `ParetoPlot` | 2D Pareto front for multi-objective |
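
The data side of `ConvergenceSparkline` is small enough to sketch here. The helpers below are illustrative and assume a minimization objective with finite values: `bestSoFar` computes the running best, and `sparklinePoints` maps a series onto the `points` attribute of an SVG `<polyline>`.

```typescript
// Running best (minimum) of an objective history, assuming minimization.
function bestSoFar(values: number[]): number[] {
  const out: number[] = [];
  let best = Infinity;
  for (const v of values) {
    if (v < best) best = v;
    out.push(best);
  }
  return out;
}

// Scale a series into a width x height box and return an SVG polyline `points` string.
function sparklinePoints(values: number[], width: number, height: number): string {
  if (values.length === 0) return '';
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  const span = hi - lo || 1; // avoid division by zero on a flat series
  const stepX = values.length > 1 ? width / (values.length - 1) : 0;
  return values
    .map((v, i) => {
      const x = i * stepX;
      const y = height - ((v - lo) / span) * height; // SVG y grows downward
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(' ');
}
```

A flat tail in the `bestSoFar` series is the visual cue for a stuck or converged run.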

### Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 5.1 | `ConvergenceSparkline.tsx` | SVG sparkline component |
| 5.2 | `ObjectiveNode.tsx` | Integrate sparkline |
| 5.3 | `ProgressRing.tsx` | Circular progress indicator |
| 5.4 | `ConvergenceChart.tsx` | Full chart with Recharts |
| 5.5 | `ResultsPanel.tsx` | Panel showing detailed results |

---

## Phase 6: End-to-End Testing (MEDIUM PRIORITY)

### Problem
- No automated tests for canvas operations
- Manual testing is time-consuming and error-prone
- Regressions go unnoticed

### Solution: Playwright E2E Tests

### Test Scenarios

| Test | Steps | Assertions |
|------|-------|------------|
| Load study | Navigate to /canvas/{id} | Spec loads, nodes render |
| Add design var | Drag from palette | Node appears, spec updates |
| Connect nodes | Drag edge | Edge renders, spec has edge |
| Edit node | Click node, change value | Value persists, API called |
| Run validation | Click validate | Errors shown for incomplete |
| Start optimization | Complete spec, click run | Status shows running |
| View results | Wait for trial | Badge shows value |
| Stop optimization | Click stop | Status shows stopped |

### Implementation Tasks

| Task | File | Description |
|------|------|-------------|
| 6.1 | `e2e/canvas.spec.ts` | Basic canvas operations |
| 6.2 | `e2e/optimization.spec.ts` | Run/stop/status flow |
| 6.3 | `e2e/panels.spec.ts` | Panel open/close/persist |
| 6.4 | `playwright.config.ts` | Configure Playwright |
| 6.5 | CI workflow | Run tests in GitHub Actions |
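
A starting point for task 6.4 might look like the fragment below. The dev-server URL, the `npm run dev` command, and the retry count are assumptions to adjust for the actual project setup; the option names themselves are standard Playwright Test configuration.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  // Retry only in CI to absorb flakiness without hiding it locally.
  retries: process.env.CI ? 2 : 0,
  use: {
    baseURL: 'http://localhost:5173', // assumed dev-server URL
    trace: 'on-first-retry',
  },
  webServer: {
    command: 'npm run dev', // assumed start script
    url: 'http://localhost:5173',
    reuseExistingServer: !process.env.CI,
  },
});
```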

---

## Implementation Order

```
Week 1:
├── Phase 1: Panel Management (critical UX fix)
│   ├── Day 1-2: usePanelStore + PanelContainer
│   └── Day 3-4: Refactor existing panels
│
├── Phase 2: Validation (prevent user errors)
│   └── Day 5: Validation rules + UI

Week 2:
├── Phase 3: Error Handling
│   ├── Day 1-2: Error types + ErrorPanel
│   └── Day 3: Integration with optimization flow
│
├── Phase 4: WebSocket Updates
│   └── Day 4-5: WS events + frontend hook

Week 3:
├── Phase 5: Visualization
│   ├── Day 1-2: Sparklines
│   └── Day 3: Progress indicators
│
├── Phase 6: Testing
│   └── Day 4-5: Playwright setup + core tests
```

---

## Quick Wins (Can Do Now)

These can be implemented immediately with minimal changes:

1. **Persist introspection data in localStorage**
   - Cache introspection results
   - Restore on panel reopen

2. **Add loading states to all buttons**
   - Disable during operations
   - Show spinners

3. **Add confirmation dialogs**
   - Before stopping optimization
   - Before clearing canvas

4. **Improve error messages**
   - Parse NX error logs
   - Show actionable suggestions
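
Quick win 1 could be as small as two helpers around the Web Storage API. This is a sketch under assumptions: the key prefix and function names are hypothetical, and storage is passed in as a parameter (in the browser, `window.localStorage`) so the helpers can be tested without a DOM.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const CACHE_PREFIX = 'introspection:'; // hypothetical key prefix

// Cache an introspection result under the model file path.
function saveIntrospection(storage: KeyValueStorage, filePath: string, data: unknown): void {
  try {
    storage.setItem(CACHE_PREFIX + filePath, JSON.stringify(data));
  } catch {
    // Quota exceeded or storage disabled: caching is best-effort, so ignore.
  }
}

// Restore a cached result, or null if nothing usable is stored.
function loadIntrospection<T>(storage: KeyValueStorage, filePath: string): T | null {
  const raw = storage.getItem(CACHE_PREFIX + filePath);
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as T;
  } catch {
    storage.removeItem(CACHE_PREFIX + filePath); // drop corrupted entries
    return null;
  }
}
```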

---

## Files to Create/Modify

### New Files
```
atomizer-dashboard/frontend/src/
├── hooks/
│   ├── usePanelStore.ts
│   └── useOptimizationWebSocket.ts
├── components/canvas/
│   ├── PanelContainer.tsx
│   ├── panels/
│   │   ├── ValidationPanel.tsx
│   │   ├── ErrorPanel.tsx
│   │   └── ResultsPanel.tsx
│   └── visualization/
│       ├── ConvergenceSparkline.tsx
│       ├── ProgressRing.tsx
│       └── ConvergenceChart.tsx
└── lib/
    └── validation/
        └── specValidator.ts

e2e/
├── canvas.spec.ts
├── optimization.spec.ts
└── panels.spec.ts
```

### Modified Files
```
atomizer-dashboard/frontend/src/
├── pages/CanvasView.tsx
├── components/canvas/SpecRenderer.tsx
├── components/canvas/panels/IntrospectionPanel.tsx
├── components/canvas/panels/NodeConfigPanelV2.tsx
├── components/canvas/nodes/ObjectiveNode.tsx
└── hooks/useSpecStore.ts

atomizer-dashboard/backend/api/
├── routes/optimization.py
├── routes/spec.py
└── websocket.py
```

---

## Success Criteria

| Phase | Success Metric |
|-------|----------------|
| 1 | Introspection panel persists across node selections |
| 2 | Invalid spec shows clear error before run |
| 3 | NX errors display with recovery options |
| 4 | Results update within 500ms of trial completion |
| 5 | Convergence trend visible on objective nodes |
| 6 | All E2E tests pass in CI |

---

## Next Steps

1. Review this plan
2. Start with Phase 1 (Panel Management) - fixes your immediate issue
3. Implement incrementally, commit after each phase