feat: Add panel management, validation, and error handling to canvas

Phase 1 - Panel Management System:
- Create usePanelStore.ts for centralized panel state management
- Add PanelContainer.tsx for draggable floating panels
- Create FloatingIntrospectionPanel.tsx (persistent, doesn't disappear on node click)
- Create ResultsPanel.tsx for trial result details
- Refactor NodeConfigPanelV2 to use panel store for introspection
- Integrate PanelContainer into CanvasView

Phase 2 - Pre-run Validation:
- Create specValidator.ts with comprehensive validation rules
- Add ValidationPanel (enhanced version with error navigation)
- Add Validate button to SpecRenderer with status indicator
- Block run if validation fails
- Check for: design vars, objectives, extractors, bounds, connections

Phase 3 - Error Handling & Recovery:
- Create ErrorPanel.tsx for displaying optimization errors
- Add error classification (nx_crash, solver_fail, extractor_error, etc.)
- Add recovery suggestions based on error type
- Update status endpoint to return error info
- Add _get_study_error_info helper to check error_status.json and DB
- Integrate error detection into status polling

Documentation:
- Add CANVAS_ROBUSTNESS_PLAN.md with full implementation plan
2026-01-21 21:35:31 -05:00
parent e1c59a51c1
commit c224b16ac3
12 changed files with 2853 additions and 29 deletions
--- a/docs/plans/CANVAS_ROBUSTNESS_PLAN.md
+++ b/docs/plans/CANVAS_ROBUSTNESS_PLAN.md
@@ -0,0 +1,438 @@
+# Canvas Builder Robustness & Enhancement Plan
+
+**Created**: January 21, 2026  
+**Branch**: `feature/studio-enhancement`  
+**Status**: Planning
+
+---
+
+## Executive Summary
+
+This plan addresses critical issues and enhancements to make the Canvas Builder robust and production-ready:
+
+1. **Panel Management** - Panels (Introspection, Config, Chat) disappear unexpectedly
+2. **Pre-run Validation** - No validation before starting optimization
+3. **Error Handling** - Poor feedback when things go wrong
+4. **Live Updates** - Polling is inefficient; need WebSocket
+5. **Visualization** - No convergence charts or progress indicators
+6. **Testing** - No automated tests for critical flows
+
+---
+
+## Phase 1: Panel Management System (HIGH PRIORITY)
+
+### Problem
+- IntrospectionPanel disappears when user clicks elsewhere on canvas
+- Panel state is lost (e.g., introspection results, expanded sections)
+- No way to have multiple panels open simultaneously
+- Chat panel and Config panel are mutually exclusive
+
+### Root Cause
+```typescript
+// Current: Local state in ModelNodeConfig (NodeConfigPanelV2.tsx:275)
+const [showIntrospection, setShowIntrospection] = useState(false);
+
+// When selectedNodeId changes, ModelNodeConfig unmounts, losing state
+```
+
+### Solution: Centralized Panel Store
+
+Create `usePanelStore.ts` - a Zustand store for panel management:
+
+```typescript
+// atomizer-dashboard/frontend/src/hooks/usePanelStore.ts
+
+interface PanelState {
+  // Panel visibility
+  panels: {
+    introspection: { open: boolean; filePath?: string; data?: IntrospectionResult };
+    config: { open: boolean; nodeId?: string };
+    chat: { open: boolean; powerMode: boolean };
+    validation: { open: boolean; errors?: ValidationError[] };
+    results: { open: boolean; trialId?: number };
+  };
+  
+  // Actions
+  openPanel: (panel: PanelName, data?: any) => void;
+  closePanel: (panel: PanelName) => void;
+  togglePanel: (panel: PanelName) => void;
+  
+  // Panel data persistence
+  setIntrospectionData: (data: IntrospectionResult) => void;
+  clearIntrospectionData: () => void;
+}
+```
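+
+The actions above can be sketched as pure functions over the `panels` slice, which is roughly what the Zustand `set` callbacks would do. This is a minimal, dependency-free sketch: `PanelName`, `Panels`, `initialPanels`, and the `model.sim` path in the usage note are assumptions for illustration, not the contents of `usePanelStore.ts`.
+
+```typescript
+// Hypothetical panel names, taken from the keys of `panels` in PanelState.
+type PanelName = "introspection" | "config" | "chat" | "validation" | "results";
+
+// Each slot has an `open` flag plus arbitrary payload (filePath, trialId, ...).
+type Panels = Record<PanelName, { open: boolean; [key: string]: unknown }>;
+
+const initialPanels: Panels = {
+  introspection: { open: false },
+  config: { open: false },
+  chat: { open: false, powerMode: false },
+  validation: { open: false },
+  results: { open: false },
+};
+
+// Open a panel and merge the optional payload into its slot.
+function openPanel(panels: Panels, name: PanelName, data: Record<string, unknown> = {}): Panels {
+  return { ...panels, [name]: { ...panels[name], ...data, open: true } };
+}
+
+// Close a panel but keep its payload, so reopening restores the previous content.
+function closePanel(panels: Panels, name: PanelName): Panels {
+  return { ...panels, [name]: { ...panels[name], open: false } };
+}
+
+// Flip a panel between open and closed.
+function togglePanel(panels: Panels, name: PanelName): Panels {
+  return panels[name].open ? closePanel(panels, name) : openPanel(panels, name);
+}
+```
+
+For example, `openPanel(initialPanels, "introspection", { filePath: "model.sim" })` followed by `closePanel(state, "config")` leaves the introspection panel open. Because this state lives outside `ModelNodeConfig`, it survives that component unmounting.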
+
+### Implementation Tasks
+
+| Task | File | Description |
+|------|------|-------------|
+| 1.1 | `usePanelStore.ts` | Create Zustand store for panel state |
+| 1.2 | `PanelContainer.tsx` | Create container that renders open panels |
+| 1.3 | `IntrospectionPanel.tsx` | Refactor to use store instead of local state |
+| 1.4 | `NodeConfigPanelV2.tsx` | Remove local panel state, use store |
+| 1.5 | `CanvasView.tsx` | Integrate PanelContainer, remove chat panel logic |
+| 1.6 | `SpecRenderer.tsx` | Add panel trigger buttons (introspect, validate) |
+
+### UI Changes
+
+**Before:**
+```
+[Canvas] [Config Panel OR Chat Panel]
+         ↑ mutually exclusive
+```
+
+**After:**
+```
+[Canvas] [Right Panel Area]
+         ├── Config Panel (pinnable)
+         ├── Chat Panel (collapsible)
+         └── Floating Panels:
+             ├── Introspection (draggable, persistent)
+             ├── Validation Results
+             └── Trial Details
+```
+
+### Panel Behaviors
+
+| Panel | Trigger | Persistence | Position |
+|-------|---------|-------------|----------|
+| **Config** | Node click | While node selected | Right sidebar |
+| **Chat** | Toggle button | Always available | Right sidebar (below config) |
+| **Introspection** | "Introspect" button | Until explicitly closed | Floating, draggable |
+| **Validation** | "Validate" or pre-run | Until fixed or dismissed | Floating |
+| **Results** | Click on result badge | Until dismissed | Floating |
+
+---
+
+## Phase 2: Pre-run Validation (HIGH PRIORITY)
+
+### Problem
+- User can click "Run" with incomplete spec
+- No feedback about missing extractors, objectives, or connections
+- Optimization fails silently or with cryptic errors
+
+### Solution: Validation Pipeline
+
+```typescript
+// Types of validation
+interface ValidationResult {
+  valid: boolean;
+  errors: ValidationError[];   // Must fix before running
+  warnings: ValidationWarning[]; // Can proceed but risky
+}
+
+interface ValidationError {
+  code: string;
+  severity: 'error' | 'warning';
+  path: string;       // e.g., "objectives[0]"
+  message: string;
+  suggestion?: string;
+  autoFix?: () => void;
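+}
+
+// Assumption: warnings share the ValidationError shape, with severity 'warning'.
+type ValidationWarning = ValidationError;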
+```
+
+### Validation Rules
+
+| Rule | Severity | Message |
+|------|----------|---------|
+| No design variables | Error | "Add at least one design variable" |
+| No objectives | Error | "Add at least one objective" |
+| Objective not connected to extractor | Error | "Objective '{name}' has no source extractor" |
+| Extractor type not set | Error | "Extractor '{name}' needs a type selected" |
+| Design var bounds invalid | Error | "Min must be less than max for '{name}'" |
+| No model file | Error | "No simulation file configured" |
+| Custom extractor no code | Warning | "Custom extractor '{name}' has no code" |
+| High trial count (>500) | Warning | "Large budget may take hours to complete" |
+| Single trial | Warning | "Only 1 trial - results won't be meaningful" |
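+
+A few of the rules above could be implemented along these lines. This is a sketch under stated assumptions: the `Spec` shape (`modelFile`, `designVars`, `extractors`, `objectives`, `maxTrials`) and the error codes are hypothetical, since the real spec schema is not shown in this plan.
+
+```typescript
+// Hypothetical spec shape, for illustration only.
+interface DesignVar { name: string; min: number; max: number }
+interface Extractor { name: string; type?: string }
+interface Objective { name: string; extractor?: string }
+interface Spec {
+  modelFile?: string;
+  designVars: DesignVar[];
+  extractors: Extractor[];
+  objectives: Objective[];
+  maxTrials: number;
+}
+
+interface Issue { code: string; severity: "error" | "warning"; path: string; message: string }
+
+function validateSpec(spec: Spec): { valid: boolean; errors: Issue[]; warnings: Issue[] } {
+  const issues: Issue[] = [];
+  const add = (severity: Issue["severity"], code: string, path: string, message: string): void => {
+    issues.push({ code, severity, path, message });
+  };
+
+  if (!spec.modelFile) add("error", "NO_MODEL", "model", "No simulation file configured");
+  if (spec.designVars.length === 0) add("error", "NO_DESIGN_VARS", "designVars", "Add at least one design variable");
+  if (spec.objectives.length === 0) add("error", "NO_OBJECTIVES", "objectives", "Add at least one objective");
+
+  spec.designVars.forEach((dv, i) => {
+    // Also rejects NaN bounds, because any comparison with NaN is false.
+    if (!(dv.min < dv.max)) add("error", "BAD_BOUNDS", `designVars[${i}]`, `Min must be less than max for '${dv.name}'`);
+  });
+  spec.extractors.forEach((ex, i) => {
+    if (!ex.type) add("error", "NO_EXTRACTOR_TYPE", `extractors[${i}]`, `Extractor '${ex.name}' needs a type selected`);
+  });
+
+  const extractorNames = new Set(spec.extractors.map((e) => e.name));
+  spec.objectives.forEach((obj, i) => {
+    if (!obj.extractor || !extractorNames.has(obj.extractor)) {
+      add("error", "OBJECTIVE_UNCONNECTED", `objectives[${i}]`, `Objective '${obj.name}' has no source extractor`);
+    }
+  });
+
+  if (spec.maxTrials > 500) add("warning", "HIGH_TRIALS", "maxTrials", "Large budget may take hours to complete");
+  if (spec.maxTrials === 1) add("warning", "SINGLE_TRIAL", "maxTrials", "Only 1 trial - results won't be meaningful");
+
+  const errors = issues.filter((i) => i.severity === "error");
+  const warnings = issues.filter((i) => i.severity === "warning");
+  return { valid: errors.length === 0, errors, warnings };
+}
+```
+
+Only errors affect `valid`, so a spec that produces warnings alone can still run.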
+
+### Implementation Tasks
+
+| Task | File | Description |
+|------|------|-------------|
+| 2.1 | `validation/specValidator.ts` | Client-side validation rules |
+| 2.2 | `ValidationPanel.tsx` | Display validation results |
+| 2.3 | `SpecRenderer.tsx` | Add "Validate" button, pre-run check |
+| 2.4 | `api/routes/spec.py` | Server-side validation endpoint |
+| 2.5 | `useSpecStore.ts` | Add `validate()` action |
+
+### UI Flow
+
+```
+User clicks "Run Optimization"
+    ↓
+[Validate Spec] ──failed──→ [Show ValidationPanel]
+    ↓ passed                      │
+[Confirm Dialog]                  │
+    ↓ confirmed                   │
+[Start Optimization] ←── fix ─────┘
+```
+
+---
+
+## Phase 3: Error Handling & Recovery (HIGH PRIORITY)
+
+### Problem
+- NX crashes don't show useful feedback
+- Solver failures leave user confused
+- No way to resume after errors
+
+### Solution: Error Classification & Display
+
+```typescript
+interface OptimizationError {
+  type: 'nx_crash' | 'solver_fail' | 'extractor_error' | 'config_error' | 'system_error';
+  trial?: number;
+  message: string;
+  details?: string;
+  recoverable: boolean;
+  suggestions: string[];
+}
+```
+
+### Error Handling Strategy
+
+| Error Type | Display | Recovery |
+|------------|---------|----------|
+| NX Crash | Toast + Error Panel | Retry trial, skip trial |
+| Solver Failure | Badge on trial | Mark infeasible, continue |
+| Extractor Error | Log + badge | Use NaN, continue |
+| Config Error | Block run | Show validation panel |
+| System Error | Full modal | Restart optimization |
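+
+One way to keep this table and the `OptimizationError` payload in sync is a single lookup keyed by error type. The sketch below is illustrative only: `STRATEGY` and `buildError` are hypothetical names, and the `recoverable` values are an assumption (true when the run can continue without a restart), since the table does not state them.
+
+```typescript
+type ErrorType = "nx_crash" | "solver_fail" | "extractor_error" | "config_error" | "system_error";
+
+interface OptimizationError {
+  type: ErrorType;
+  trial?: number;
+  message: string;
+  recoverable: boolean;
+  suggestions: string[];
+}
+
+// Display and recovery entries mirror the strategy table; `recoverable` is assumed.
+const STRATEGY: Record<ErrorType, { display: string; suggestions: string[]; recoverable: boolean }> = {
+  nx_crash: { display: "Toast + Error Panel", suggestions: ["Retry trial", "Skip trial"], recoverable: true },
+  solver_fail: { display: "Badge on trial", suggestions: ["Mark infeasible", "Continue"], recoverable: true },
+  extractor_error: { display: "Log + badge", suggestions: ["Use NaN", "Continue"], recoverable: true },
+  config_error: { display: "Block run", suggestions: ["Show validation panel"], recoverable: false },
+  system_error: { display: "Full modal", suggestions: ["Restart optimization"], recoverable: false },
+};
+
+// Build the payload for a classified error; `trial` is set only when known.
+function buildError(type: ErrorType, message: string, trial?: number): OptimizationError {
+  const { suggestions, recoverable } = STRATEGY[type];
+  return {
+    type,
+    message,
+    recoverable,
+    suggestions: [...suggestions], // copy so callers cannot mutate the table
+    ...(trial !== undefined ? { trial } : {}),
+  };
+}
+```
+
+Because `STRATEGY` is typed as `Record<ErrorType, ...>`, adding a new error type fails to compile until it gets an entry here.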
+
+### Implementation Tasks
+
+| Task | File | Description |
+|------|------|-------------|
+| 3.1 | `ErrorBoundary.tsx` | Wrap canvas in error boundary |
+| 3.2 | `ErrorPanel.tsx` | Detailed error display with suggestions |
+| 3.3 | `optimization.py` | Enhanced error responses with type/recovery |
+| 3.4 | `SpecRenderer.tsx` | Error state handling, retry buttons |
+| 3.5 | `useOptimizationStatus.ts` | Hook for status polling with error handling |
+
+---
+
+## Phase 4: Live Updates via WebSocket (MEDIUM PRIORITY)
+
+### Problem
+- Current polling (3s) is inefficient and has latency
+- Missed updates between polls
+- No real-time progress indication
+
+### Solution: WebSocket for Trial Updates
+
+```typescript
+// WebSocket events
+interface TrialStartEvent {
+  type: 'trial_start';
+  trial_number: number;
+  params: Record<string, number>;
+}
+
+interface TrialCompleteEvent {
+  type: 'trial_complete';
+  trial_number: number;
+  objectives: Record<string, number>;
+  is_best: boolean;
+  is_feasible: boolean;
+}
+
+interface OptimizationCompleteEvent {
+  type: 'optimization_complete';
+  best_trial: number;
+  total_trials: number;
+}
+```
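+
+On the frontend, these events form a discriminated union on `type`, so a hook such as `useOptimizationWebSocket` could fold them into run state with one reducer. A minimal sketch follows; `RunState`, `initialRunState`, and `applyEvent` are hypothetical names, and the event interfaces are repeated only to keep the snippet self-contained.
+
+```typescript
+interface TrialStartEvent { type: "trial_start"; trial_number: number; params: Record<string, number> }
+interface TrialCompleteEvent {
+  type: "trial_complete";
+  trial_number: number;
+  objectives: Record<string, number>;
+  is_best: boolean;
+  is_feasible: boolean;
+}
+interface OptimizationCompleteEvent { type: "optimization_complete"; best_trial: number; total_trials: number }
+
+type OptimizationEvent = TrialStartEvent | TrialCompleteEvent | OptimizationCompleteEvent;
+
+// Hypothetical view of a run, derived purely from the event stream.
+interface RunState {
+  status: "idle" | "running" | "done";
+  currentTrial: number | null;
+  completed: number;
+  bestTrial: number | null;
+}
+
+const initialRunState: RunState = { status: "idle", currentTrial: null, completed: 0, bestTrial: null };
+
+// Pure reducer: the switch narrows `event` to the matching interface in each case.
+function applyEvent(state: RunState, event: OptimizationEvent): RunState {
+  switch (event.type) {
+    case "trial_start":
+      return { ...state, status: "running", currentTrial: event.trial_number };
+    case "trial_complete":
+      return {
+        ...state,
+        currentTrial: null,
+        completed: state.completed + 1,
+        bestTrial: event.is_best ? event.trial_number : state.bestTrial,
+      };
+    case "optimization_complete":
+      return { status: "done", currentTrial: null, completed: event.total_trials, bestTrial: event.best_trial };
+  }
+}
+```
+
+Keeping the reducer pure makes it testable without a socket: a WebSocket `onmessage` handler would only need to parse the message and call `applyEvent`.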
+
+### Implementation Tasks
+
+| Task | File | Description |
+|------|------|-------------|
+| 4.1 | `websocket.py` | Add optimization events to WS |
+| 4.2 | `run_optimization.py` | Emit events during optimization |
+| 4.3 | `useOptimizationWebSocket.ts` | Hook for WS subscription |
+| 4.4 | `SpecRenderer.tsx` | Use WS instead of polling |
+| 4.5 | `ResultBadge.tsx` | Animate on new results |
+
+---
+
+## Phase 5: Convergence Visualization (MEDIUM PRIORITY)
+
+### Problem
+- No visual feedback on optimization progress
+- Can't tell if converging or stuck
+- No Pareto front visualization for multi-objective
+
+### Solution: Embedded Charts
+
+### Components
+
+| Component | Description |
+|-----------|-------------|
+| `ConvergenceSparkline` | Tiny chart in ObjectiveNode showing trend |
+| `ProgressRing` | Circular progress in header (trials/total) |
+| `ConvergenceChart` | Full chart in Results panel |
+| `ParetoPlot` | 2D Pareto front for multi-objective |
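+
+The data side of `ConvergenceSparkline` can be small. The sketch below assumes a minimization objective and an SVG `<polyline>` as the rendering target; `bestSoFar` and `sparklinePoints` are hypothetical helper names.
+
+```typescript
+// Running best value (assuming minimization): the curve a convergence plot shows.
+function bestSoFar(values: number[]): number[] {
+  const out: number[] = [];
+  let best = Infinity;
+  for (const v of values) {
+    best = Math.min(best, v);
+    out.push(best);
+  }
+  return out;
+}
+
+// Map values to an SVG polyline `points` string inside a width x height box.
+// SVG's y axis points down, so larger values get smaller y coordinates.
+function sparklinePoints(values: number[], width: number, height: number): string {
+  if (values.length === 0) return "";
+  const min = Math.min(...values);
+  const max = Math.max(...values);
+  const range = max - min || 1; // avoid division by zero for flat series
+  const step = values.length > 1 ? width / (values.length - 1) : 0;
+  return values
+    .map((v, i) => {
+      const x = i * step;
+      const y = height - ((v - min) / range) * height;
+      return `${x.toFixed(1)},${y.toFixed(1)}`;
+    })
+    .join(" ");
+}
+```
+
+For instance, `sparklinePoints([3, 1, 2], 100, 20)` yields `"0.0,0.0 50.0,20.0 100.0,10.0"`, which can be passed directly as the `points` attribute. Plotting `bestSoFar(values)` rather than the raw trial values gives a curve that flattens visibly once the optimizer stalls.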
+
+### Implementation Tasks
+
+| Task | File | Description |
+|------|------|-------------|
+| 5.1 | `ConvergenceSparkline.tsx` | SVG sparkline component |
+| 5.2 | `ObjectiveNode.tsx` | Integrate sparkline |
+| 5.3 | `ProgressRing.tsx` | Circular progress indicator |
+| 5.4 | `ConvergenceChart.tsx` | Full chart with Recharts |
+| 5.5 | `ResultsPanel.tsx` | Panel showing detailed results |
+
+---
+
+## Phase 6: End-to-End Testing (MEDIUM PRIORITY)
+
+### Problem
+- No automated tests for canvas operations
+- Manual testing is time-consuming and error-prone
+- Regressions go unnoticed
+
+### Solution: Playwright E2E Tests
+
+### Test Scenarios
+
+| Test | Steps | Assertions |
+|------|-------|------------|
+| Load study | Navigate to /canvas/{id} | Spec loads, nodes render |
+| Add design var | Drag from palette | Node appears, spec updates |
+| Connect nodes | Drag edge | Edge renders, spec has edge |
+| Edit node | Click node, change value | Value persists, API called |
+| Run validation | Click validate | Errors shown for incomplete |
+| Start optimization | Complete spec, click run | Status shows running |
+| View results | Wait for trial | Badge shows value |
+| Stop optimization | Click stop | Status shows stopped |
+
+### Implementation Tasks
+
+| Task | File | Description |
+|------|------|-------------|
+| 6.1 | `e2e/canvas.spec.ts` | Basic canvas operations |
+| 6.2 | `e2e/optimization.spec.ts` | Run/stop/status flow |
+| 6.3 | `e2e/panels.spec.ts` | Panel open/close/persist |
+| 6.4 | `playwright.config.ts` | Configure Playwright |
+| 6.5 | `CI workflow` | Run tests in GitHub Actions |
+
+---
+
+## Implementation Order
+
+```
+Week 1:
+├── Phase 1: Panel Management (critical UX fix)
+│   ├── Day 1-2: usePanelStore + PanelContainer
+│   └── Day 3-4: Refactor existing panels
+│
+├── Phase 2: Validation (prevent user errors)
+│   └── Day 5: Validation rules + UI
+
+Week 2:
+├── Phase 3: Error Handling
+│   ├── Day 1-2: Error types + ErrorPanel
+│   └── Day 3: Integration with optimization flow
+│
+├── Phase 4: WebSocket Updates
+│   └── Day 4-5: WS events + frontend hook
+
+Week 3:
+├── Phase 5: Visualization
+│   ├── Day 1-2: Sparklines
+│   └── Day 3: Progress indicators
+│
+├── Phase 6: Testing
+│   └── Day 4-5: Playwright setup + core tests
+```
+
+---
+
+## Quick Wins (Can Do Now)
+
+These can be implemented immediately with minimal changes:
+
+1. **Persist introspection data in localStorage**
+   - Cache introspection results
+   - Restore on panel reopen
+
+2. **Add loading states to all buttons**
+   - Disable during operations
+   - Show spinners
+
+3. **Add confirmation dialogs**
+   - Before stopping optimization
+   - Before clearing canvas
+
+4. **Improve error messages**
+   - Parse NX error logs
+   - Show actionable suggestions
+
+---
+
+## Files to Create/Modify
+
+### New Files
+```
+atomizer-dashboard/frontend/src/
+├── hooks/
+│   ├── usePanelStore.ts
+│   └── useOptimizationWebSocket.ts
+├── components/canvas/
+│   ├── PanelContainer.tsx
+│   ├── panels/
+│   │   ├── ValidationPanel.tsx
+│   │   ├── ErrorPanel.tsx
+│   │   └── ResultsPanel.tsx
+│   └── visualization/
+│       ├── ConvergenceSparkline.tsx
+│       ├── ProgressRing.tsx
+│       └── ConvergenceChart.tsx
+└── lib/
+    └── validation/
+        └── specValidator.ts
+
+e2e/
+├── canvas.spec.ts
+├── optimization.spec.ts
+└── panels.spec.ts
+```
+
+### Modified Files
+```
+atomizer-dashboard/frontend/src/
+├── pages/CanvasView.tsx
+├── components/canvas/SpecRenderer.tsx
+├── components/canvas/panels/IntrospectionPanel.tsx
+├── components/canvas/panels/NodeConfigPanelV2.tsx
+├── components/canvas/nodes/ObjectiveNode.tsx
+└── hooks/useSpecStore.ts
+
+atomizer-dashboard/backend/api/
+├── routes/optimization.py
+├── routes/spec.py
+└── websocket.py
+```
+
+---
+
+## Success Criteria
+
+| Phase | Success Metric |
+|-------|----------------|
+| 1 | Introspection panel persists across node selections |
+| 2 | Invalid spec shows clear error before run |
+| 3 | NX errors display with recovery options |
+| 4 | Results update within 500ms of trial completion |
+| 5 | Convergence trend visible on objective nodes |
+| 6 | All E2E tests pass in CI |
+
+---
+
+## Next Steps
+
+1. Review this plan
+2. Start with Phase 1 (Panel Management) - fixes your immediate issue
+3. Implement incrementally, commit after each phase