feat: Add panel management, validation, and error handling to canvas

Phase 1 - Panel Management System:
- Create usePanelStore.ts for centralized panel state management
- Add PanelContainer.tsx for draggable floating panels
- Create FloatingIntrospectionPanel.tsx (persistent, doesn't disappear on node click)
- Create ResultsPanel.tsx for trial result details
- Refactor NodeConfigPanelV2 to use panel store for introspection
- Integrate PanelContainer into CanvasView

Phase 2 - Pre-run Validation:
- Create specValidator.ts with comprehensive validation rules
- Add ValidationPanel (enhanced version with error navigation)
- Add Validate button to SpecRenderer with status indicator
- Block run if validation fails
- Check for: design vars, objectives, extractors, bounds, connections

Phase 3 - Error Handling & Recovery:
- Create ErrorPanel.tsx for displaying optimization errors
- Add error classification (nx_crash, solver_fail, extractor_error, etc.)
- Add recovery suggestions based on error type
- Update status endpoint to return error info
- Add _get_study_error_info helper to check error_status.json and DB
- Integrate error detection into status polling

Documentation:
- Add CANVAS_ROBUSTNESS_PLAN.md with full implementation plan
2026-01-21 21:35:31 -05:00
parent e1c59a51c1
commit c224b16ac3
12 changed files with 2853 additions and 29 deletions


@@ -0,0 +1,438 @@
# Canvas Builder Robustness & Enhancement Plan
**Created**: January 21, 2026

**Branch**: `feature/studio-enhancement`

**Status**: Planning

---
## Executive Summary
This plan addresses critical issues and enhancements to make the Canvas Builder robust and production-ready:
1. **Panel Management** - Panels (Introspection, Config, Chat) disappear unexpectedly
2. **Pre-run Validation** - No validation before starting optimization
3. **Error Handling** - Poor feedback when things go wrong
4. **Live Updates** - Polling is inefficient; updates should be pushed over a WebSocket
5. **Visualization** - No convergence charts or progress indicators
6. **Testing** - No automated tests for critical flows
---
## Phase 1: Panel Management System (HIGH PRIORITY)
### Problem
- IntrospectionPanel disappears when user clicks elsewhere on canvas
- Panel state is lost (e.g., introspection results, expanded sections)
- No way to have multiple panels open simultaneously
- Chat panel and Config panel are mutually exclusive
### Root Cause
```typescript
// Current: Local state in ModelNodeConfig (NodeConfigPanelV2.tsx:275)
const [showIntrospection, setShowIntrospection] = useState(false);
// When selectedNodeId changes, ModelNodeConfig unmounts, losing state
```
### Solution: Centralized Panel Store
Create `usePanelStore.ts` - a Zustand store for panel management:
```typescript
// atomizer-dashboard/frontend/src/hooks/usePanelStore.ts
// IntrospectionResult and ValidationError are defined elsewhere (ValidationError in Phase 2)
interface PanelState {
  // Panel visibility
  panels: {
    introspection: { open: boolean; filePath?: string; data?: IntrospectionResult };
    config: { open: boolean; nodeId?: string };
    chat: { open: boolean; powerMode: boolean };
    validation: { open: boolean; errors?: ValidationError[] };
    results: { open: boolean; trialId?: number };
  };
  // Actions
  openPanel: (panel: PanelName, data?: any) => void;
  closePanel: (panel: PanelName) => void;
  togglePanel: (panel: PanelName) => void;
  // Panel data persistence
  setIntrospectionData: (data: IntrospectionResult) => void;
  clearIntrospectionData: () => void;
}
// 'introspection' | 'config' | 'chat' | 'validation' | 'results'
type PanelName = keyof PanelState['panels'];
```
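
Before wiring in Zustand, the store's actions can be prototyped as pure functions over the `panels` object. This is a minimal, framework-free sketch: Zustand is not imported, and `IntrospectionResult` and `ValidationError` are simplified stand-ins for the real types. The property it is meant to show is that each action touches only its own panel, and that closing a panel hides it without discarding its data, so introspection results survive node selection changes.

```typescript
// Simplified stand-ins for the real types (assumption for this sketch).
type IntrospectionResult = { expressions: string[] };
type ValidationError = { code: string; message: string };

interface Panels {
  introspection: { open: boolean; filePath?: string; data?: IntrospectionResult };
  config: { open: boolean; nodeId?: string };
  chat: { open: boolean; powerMode: boolean };
  validation: { open: boolean; errors?: ValidationError[] };
  results: { open: boolean; trialId?: number };
}
type PanelName = keyof Panels;

const initialPanels: Panels = {
  introspection: { open: false },
  config: { open: false },
  chat: { open: false, powerMode: false },
  validation: { open: false },
  results: { open: false },
};

// Return a new Panels object with `patch` merged into one panel only.
function patchPanel<K extends PanelName>(panels: Panels, name: K, patch: Partial<Panels[K]>): Panels {
  const next: Panels = { ...panels };
  next[name] = { ...panels[name], ...patch };
  return next;
}

function openPanel(panels: Panels, name: PanelName, patch: Partial<Panels[PanelName]> = {}): Panels {
  return patchPanel(panels, name, { ...patch, open: true });
}

// Closing hides the panel but keeps its data (e.g. cached introspection results).
function closePanel(panels: Panels, name: PanelName): Panels {
  return patchPanel(panels, name, { open: false });
}

function togglePanel(panels: Panels, name: PanelName): Panels {
  return patchPanel(panels, name, { open: !panels[name].open });
}

function setIntrospectionData(panels: Panels, data: IntrospectionResult): Panels {
  return patchPanel(panels, 'introspection', { data });
}

function clearIntrospectionData(panels: Panels): Panels {
  return patchPanel(panels, 'introspection', { data: undefined });
}
```

In the store itself, each action would then be a one-line wrapper around Zustand's functional `set` form, which keeps the persistence behavior unit-testable without React.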
### Implementation Tasks
| Task | File | Description |
|------|------|-------------|
| 1.1 | `usePanelStore.ts` | Create Zustand store for panel state |
| 1.2 | `PanelContainer.tsx` | Create container that renders open panels |
| 1.3 | `IntrospectionPanel.tsx` | Refactor to use store instead of local state |
| 1.4 | `NodeConfigPanelV2.tsx` | Remove local panel state, use store |
| 1.5 | `CanvasView.tsx` | Integrate PanelContainer, remove chat panel logic |
| 1.6 | `SpecRenderer.tsx` | Add panel trigger buttons (introspect, validate) |
### UI Changes
**Before:**
```
[Canvas] [Config Panel OR Chat Panel]
                       ↑ mutually exclusive
```
**After:**
```
[Canvas] [Right Panel Area]
         ├── Config Panel (pinnable)
         ├── Chat Panel (collapsible)
         └── Floating Panels:
             ├── Introspection (draggable, persistent)
             ├── Validation Results
             └── Trial Details
```
### Panel Behaviors
| Panel | Trigger | Persistence | Position |
|-------|---------|-------------|----------|
| **Config** | Node click | While node selected | Right sidebar |
| **Chat** | Toggle button | Always available | Right sidebar (below config) |
| **Introspection** | "Introspect" button | Until explicitly closed | Floating, draggable |
| **Validation** | "Validate" or pre-run | Until fixed or dismissed | Floating |
| **Results** | Click on result badge | Until dismissed | Floating |
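
The Persistence column can be encoded as data so that auto-close rules live in one place rather than in each component. In this sketch the event names are hypothetical, and explicit close buttons are assumed to be handled separately (they work for every panel). The point it illustrates is that only the Config panel reacts to selection changes, which is what keeps Introspection alive across node clicks.

```typescript
type PanelName = 'config' | 'chat' | 'introspection' | 'validation' | 'results';

// Hypothetical canvas events that may close panels automatically.
type CanvasEvent = 'nodeSelected' | 'nodeDeselected' | 'validationPassed';

// Events that auto-close each panel, mirroring the Persistence column.
// Explicit close buttons are handled separately and work for every panel.
const AUTO_CLOSE_ON: Record<PanelName, CanvasEvent[]> = {
  config: ['nodeDeselected'],       // while a node is selected
  chat: [],                         // always available
  introspection: [],                // until explicitly closed
  validation: ['validationPassed'], // until fixed or dismissed
  results: [],                      // until dismissed
};

// Return the panels that remain open after `event` fires.
function applyCanvasEvent(open: PanelName[], event: CanvasEvent): PanelName[] {
  return open.filter((panel) => AUTO_CLOSE_ON[panel].indexOf(event) === -1);
}
```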
---
## Phase 2: Pre-run Validation (HIGH PRIORITY)
### Problem
- User can click "Run" with incomplete spec
- No feedback about missing extractors, objectives, or connections
- Optimization fails silently or with cryptic errors
### Solution: Validation Pipeline
```typescript
// Types of validation
interface ValidationResult {
  valid: boolean;
  errors: ValidationError[];     // Must fix before running
  warnings: ValidationWarning[]; // Can proceed but risky
}
interface ValidationError {
  code: string;
  severity: 'error' | 'warning';
  path: string; // e.g., "objectives[0]"
  message: string;
  suggestion?: string;
  autoFix?: () => void;
}
// Warnings share the same shape, narrowed to severity 'warning'
type ValidationWarning = ValidationError & { severity: 'warning' };
```
### Validation Rules
| Rule | Severity | Message |
|------|----------|---------|
| No design variables | Error | "Add at least one design variable" |
| No objectives | Error | "Add at least one objective" |
| Objective not connected to extractor | Error | "Objective '{name}' has no source extractor" |
| Extractor type not set | Error | "Extractor '{name}' needs a type selected" |
| Design var bounds invalid | Error | "Min must be less than max for '{name}'" |
| No model file | Error | "No simulation file configured" |
| Custom extractor has no code | Warning | "Custom extractor '{name}' has no code" |
| High trial count (>500) | Warning | "Large budget may take hours to complete" |
| Single trial | Warning | "Only 1 trial - results won't be meaningful" |
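
The rules in the table map almost one-to-one onto a pure function. The sketch below assumes a simplified, hypothetical `Spec` shape (field names such as `sourceExtractorId` and `nTrials`, the `'custom'` extractor type, and the error codes are illustrative, not the real schema). It separates blocking errors from warnings in the same way `ValidationResult` does.

```typescript
// Hypothetical, simplified spec shape; the real spec schema may differ.
interface DesignVar { name: string; min: number; max: number }
interface Extractor { id: string; name: string; type?: string; code?: string }
interface Objective { name: string; sourceExtractorId?: string }
interface Spec {
  modelFile?: string;
  designVars: DesignVar[];
  extractors: Extractor[];
  objectives: Objective[];
  nTrials: number;
}

interface Issue { code: string; severity: 'error' | 'warning'; path: string; message: string }

function validateSpec(spec: Spec): { valid: boolean; errors: Issue[]; warnings: Issue[] } {
  const issues: Issue[] = [];
  const add = (severity: Issue['severity'], code: string, path: string, message: string): void => {
    issues.push({ code, severity, path, message });
  };

  // Structural rules: the spec must contain the basic building blocks.
  if (spec.designVars.length === 0) add('error', 'NO_DESIGN_VARS', 'designVars', 'Add at least one design variable');
  if (spec.objectives.length === 0) add('error', 'NO_OBJECTIVES', 'objectives', 'Add at least one objective');
  if (!spec.modelFile) add('error', 'NO_MODEL_FILE', 'modelFile', 'No simulation file configured');

  // Per-item rules, reported with a path such as "objectives[0]".
  spec.designVars.forEach((dv, i) => {
    // `!(min < max)` also rejects NaN bounds.
    if (!(dv.min < dv.max)) add('error', 'INVALID_BOUNDS', `designVars[${i}]`, `Min must be less than max for '${dv.name}'`);
  });
  const extractorIds = spec.extractors.map((e) => e.id);
  spec.objectives.forEach((obj, i) => {
    if (!obj.sourceExtractorId || extractorIds.indexOf(obj.sourceExtractorId) === -1) {
      add('error', 'OBJECTIVE_NOT_CONNECTED', `objectives[${i}]`, `Objective '${obj.name}' has no source extractor`);
    }
  });
  spec.extractors.forEach((ex, i) => {
    if (!ex.type) add('error', 'EXTRACTOR_NO_TYPE', `extractors[${i}]`, `Extractor '${ex.name}' needs a type selected`);
    else if (ex.type === 'custom' && !ex.code) add('warning', 'CUSTOM_NO_CODE', `extractors[${i}]`, `Custom extractor '${ex.name}' has no code`);
  });

  // Budget rules only warn; the user may still run.
  if (spec.nTrials > 500) add('warning', 'HIGH_TRIAL_COUNT', 'nTrials', 'Large budget may take hours to complete');
  if (spec.nTrials === 1) add('warning', 'SINGLE_TRIAL', 'nTrials', "Only 1 trial - results won't be meaningful");

  const errors = issues.filter((x) => x.severity === 'error');
  const warnings = issues.filter((x) => x.severity === 'warning');
  return { valid: errors.length === 0, errors, warnings };
}
```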
### Implementation Tasks
| Task | File | Description |
|------|------|-------------|
| 2.1 | `validation/specValidator.ts` | Client-side validation rules |
| 2.2 | `ValidationPanel.tsx` | Display validation results |
| 2.3 | `SpecRenderer.tsx` | Add "Validate" button, pre-run check |
| 2.4 | `api/routes/spec.py` | Server-side validation endpoint |
| 2.5 | `useSpecStore.ts` | Add `validate()` action |
### UI Flow
```
User clicks "Run Optimization"
        ↓
[Validate Spec] ──failed──→ [Show ValidationPanel]
        ↓ passed                      │
[Confirm Dialog]                      │
        ↓ confirmed                   │
[Start Optimization] ←── fix ─────────┘
```
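
The gate in this flow can be kept as a small synchronous function with its side effects injected, which makes "block run if validation fails" directly testable. The callback names are hypothetical, and `confirm` stands in for the confirmation dialog.

```typescript
// Result of running the validator; errors are plain messages in this sketch.
interface ValidationOutcome { valid: boolean; errors: string[] }

// Side effects are injected so the gate can be tested without a UI (hypothetical names).
interface RunDeps {
  validate: () => ValidationOutcome;
  showValidationPanel: (errors: string[]) => void;
  confirm: () => boolean; // stands in for the confirmation dialog
  startOptimization: () => void;
}

type RunOutcome = 'blocked' | 'cancelled' | 'started';

// Validate -> (failed: show ValidationPanel) -> confirm -> start.
function runOptimization(deps: RunDeps): RunOutcome {
  const result = deps.validate();
  if (!result.valid) {
    deps.showValidationPanel(result.errors);
    return 'blocked'; // the user fixes the spec and clicks Run again
  }
  if (!deps.confirm()) return 'cancelled';
  deps.startOptimization();
  return 'started';
}
```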
---
## Phase 3: Error Handling & Recovery (HIGH PRIORITY)
### Problem
- NX crashes don't show useful feedback
- Solver failures leave user confused
- No way to resume after errors
### Solution: Error Classification & Display
```typescript
interface OptimizationError {
  type: 'nx_crash' | 'solver_fail' | 'extractor_error' | 'config_error' | 'system_error';
  trial?: number;
  message: string;
  details?: string;
  recoverable: boolean;
  suggestions: string[];
}
```
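
A first version of the classification could be a keyword-based mapper from raw error text to `OptimizationError`. The patterns below are hypothetical placeholders, not actual NX or Nastran message formats, and the suggestions paraphrase the recovery column of the strategy table. Since the commit message says the backend already checks `error_status.json` and the study DB, the server may be the better place for this logic; the sketch only shows the shape of the mapping.

```typescript
type ErrorType = 'nx_crash' | 'solver_fail' | 'extractor_error' | 'config_error' | 'system_error';

interface OptimizationError {
  type: ErrorType;
  trial?: number;
  message: string;
  details?: string;
  recoverable: boolean;
  suggestions: string[];
}

interface Rule { type: ErrorType; pattern: RegExp; recoverable: boolean; suggestions: string[] }

// Hypothetical keyword rules, checked in order; real log formats should drive these.
const RULES: Rule[] = [
  { type: 'nx_crash', pattern: /\bNX\b.*(crash|terminated|not responding)/i, recoverable: true,
    suggestions: ['Retry the trial', 'Skip the trial and continue'] },
  { type: 'solver_fail', pattern: /(solver|nastran).*(fail|fatal|diverge)/i, recoverable: true,
    suggestions: ['Mark the trial infeasible and continue'] },
  { type: 'extractor_error', pattern: /extractor/i, recoverable: true,
    suggestions: ['Check the extractor configuration', 'Record NaN for this trial and continue'] },
  { type: 'config_error', pattern: /(config|spec|missing|invalid)/i, recoverable: false,
    suggestions: ['Open the validation panel and fix the spec'] },
];

function classifyError(message: string, trial?: number, details?: string): OptimizationError {
  const rule = RULES.filter((r) => r.pattern.test(message))[0];
  if (rule === undefined) {
    // Anything unrecognized is treated as a system error.
    return { type: 'system_error', trial, message, details, recoverable: false, suggestions: ['Restart the optimization'] };
  }
  return { type: rule.type, trial, message, details, recoverable: rule.recoverable, suggestions: rule.suggestions.slice() };
}
```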
### Error Handling Strategy
| Error Type | Display | Recovery |
|------------|---------|----------|
| NX Crash | Toast + Error Panel | Retry trial, skip trial |
| Solver Failure | Badge on trial | Mark infeasible, continue |
| Extractor Error | Log + badge | Use NaN, continue |
| Config Error | Block run | Show validation panel |
| System Error | Full modal | Restart optimization |
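
The strategy table can be kept as a single lookup so that the UI and the run loop agree on what each error type means. The display labels and the ranking used to pick one display when several errors arrive together are assumptions made for this sketch.

```typescript
type ErrorType = 'nx_crash' | 'solver_fail' | 'extractor_error' | 'config_error' | 'system_error';
type Display = 'log+badge' | 'trial badge' | 'toast+panel' | 'block run' | 'modal';

interface RecoveryPlan {
  display: Display;
  actions: string[];     // options offered to the user or applied automatically
  continuesRun: boolean; // true if the optimizer keeps going without user input
}

// One entry per row of the strategy table (illustrative labels).
const RECOVERY: Record<ErrorType, RecoveryPlan> = {
  nx_crash: { display: 'toast+panel', actions: ['retry trial', 'skip trial'], continuesRun: false },
  solver_fail: { display: 'trial badge', actions: ['mark infeasible'], continuesRun: true },
  extractor_error: { display: 'log+badge', actions: ['use NaN'], continuesRun: true },
  config_error: { display: 'block run', actions: ['show validation panel'], continuesRun: false },
  system_error: { display: 'modal', actions: ['restart optimization'], continuesRun: false },
};

// Least to most intrusive; used when several errors need displaying at once (assumed order).
const DISPLAY_RANK: Display[] = ['log+badge', 'trial badge', 'toast+panel', 'block run', 'modal'];

// True as soon as any error cannot be absorbed automatically.
function needsUserAttention(errors: ErrorType[]): boolean {
  return errors.some((type) => !RECOVERY[type].continuesRun);
}

// Pick the most intrusive display among the given errors, or null if there are none.
function displayFor(errors: ErrorType[]): Display | null {
  let chosen: Display | null = null;
  for (const type of errors) {
    const d = RECOVERY[type].display;
    if (chosen === null || DISPLAY_RANK.indexOf(d) > DISPLAY_RANK.indexOf(chosen)) chosen = d;
  }
  return chosen;
}
```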
### Implementation Tasks
| Task | File | Description |
|------|------|-------------|
| 3.1 | `ErrorBoundary.tsx` | Wrap canvas in error boundary |
| 3.2 | `ErrorPanel.tsx` | Detailed error display with suggestions |
| 3.3 | `optimization.py` | Enhanced error responses with type/recovery |
| 3.4 | `SpecRenderer.tsx` | Error state handling, retry buttons |
| 3.5 | `useOptimizationStatus.ts` | Hook for status polling with error handling |
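
The non-React core of `useOptimizationStatus` can be a pure decision function called after each poll. This sketch assumes a hypothetical `StatusResponse` shape with an optional `error` field (the commit message says the status endpoint now returns error info, but its exact fields are not shown here). The 3 s base interval matches the current polling mentioned in Phase 4; the backoff factor and 30 s cap are illustrative, and the caller is expected to reset its failure counter after any successful response.

```typescript
// Hypothetical status payload; the real endpoint's field names may differ.
interface StatusResponse {
  status: 'idle' | 'running' | 'completed' | 'failed' | 'stopped';
  completedTrials: number;
  error?: { type: string; message: string };
}

interface PollDecision {
  keepPolling: boolean;
  delayMs: number;
  error?: StatusResponse['error'];
}

// Base interval while healthy; exponential backoff (capped) after failed requests.
function nextPollDelay(consecutiveFailures: number, baseMs = 3000, maxMs = 30000): number {
  if (consecutiveFailures <= 0) return baseMs;
  return Math.min(maxMs, baseMs * Math.pow(2, consecutiveFailures));
}

// Decide what to do after a poll. `res` is null when the request itself failed;
// `consecutiveFailures` counts failed requests before this one.
function afterPoll(res: StatusResponse | null, consecutiveFailures: number): PollDecision {
  if (res === null) {
    return { keepPolling: true, delayMs: nextPollDelay(consecutiveFailures + 1) };
  }
  // Trial-level errors may be reported while the run continues, so only the
  // run status decides whether polling stops; the error is surfaced either way.
  return { keepPolling: res.status === 'running', delayMs: nextPollDelay(0), error: res.error };
}
```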
---
## Phase 4: Live Updates via WebSocket (MEDIUM PRIORITY)
### Problem
- The current 3-second polling interval wastes requests and delays updates by up to one interval
- Events that occur between polls (such as individual trial starts) are not observed
- No real-time progress indication
### Solution: WebSocket for Trial Updates
```typescript
// WebSocket events
interface TrialStartEvent {
  type: 'trial_start';
  trial_number: number;
  params: Record<string, number>;
}
interface TrialCompleteEvent {
  type: 'trial_complete';
  trial_number: number;
  objectives: Record<string, number>;
  is_best: boolean;
  is_feasible: boolean;
}
interface OptimizationCompleteEvent {
  type: 'optimization_complete';
  best_trial: number;
  total_trials: number;
}
```
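
On the client, the three event types form a discriminated union on `type`, which lets a single reducer update progress state with full type narrowing. The `Progress` shape is hypothetical. In a React hook, each parsed WebSocket message would be passed through this reducer; validating the parsed JSON first is advisable, since `JSON.parse` returns an untyped value.

```typescript
interface TrialStartEvent {
  type: 'trial_start';
  trial_number: number;
  params: Record<string, number>;
}
interface TrialCompleteEvent {
  type: 'trial_complete';
  trial_number: number;
  objectives: Record<string, number>;
  is_best: boolean;
  is_feasible: boolean;
}
interface OptimizationCompleteEvent {
  type: 'optimization_complete';
  best_trial: number;
  total_trials: number;
}
type OptimizationEvent = TrialStartEvent | TrialCompleteEvent | OptimizationCompleteEvent;

// Hypothetical client-side progress state derived from the event stream.
interface Progress {
  running: boolean;
  currentTrial: number | null;
  completed: number;
  infeasible: number;
  bestTrial: number | null;
  bestObjectives: Record<string, number> | null;
}

const startProgress: Progress = {
  running: true, currentTrial: null, completed: 0, infeasible: 0, bestTrial: null, bestObjectives: null,
};

// Pure reducer: switching on `type` narrows `event` to the matching interface.
function applyEvent(state: Progress, event: OptimizationEvent): Progress {
  switch (event.type) {
    case 'trial_start':
      return { ...state, currentTrial: event.trial_number };
    case 'trial_complete':
      return {
        ...state,
        currentTrial: null,
        completed: state.completed + 1,
        infeasible: state.infeasible + (event.is_feasible ? 0 : 1),
        bestTrial: event.is_best ? event.trial_number : state.bestTrial,
        bestObjectives: event.is_best ? event.objectives : state.bestObjectives,
      };
    case 'optimization_complete':
      return { ...state, running: false, currentTrial: null, bestTrial: event.best_trial };
    default:
      return state; // unreachable for well-typed input
  }
}
```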
### Implementation Tasks
| Task | File | Description |
|------|------|-------------|
| 4.1 | `websocket.py` | Add optimization events to WS |
| 4.2 | `run_optimization.py` | Emit events during optimization |
| 4.3 | `useOptimizationWebSocket.ts` | Hook for WS subscription |
| 4.4 | `SpecRenderer.tsx` | Use WS instead of polling |
| 4.5 | `ResultBadge.tsx` | Animate on new results |
---
## Phase 5: Convergence Visualization (MEDIUM PRIORITY)
### Problem
- No visual feedback on optimization progress
- Can't tell if converging or stuck
- No Pareto front visualization for multi-objective
### Solution: Embedded Charts
### Components
| Component | Description |
|-----------|-------------|
| `ConvergenceSparkline` | Tiny chart in ObjectiveNode showing trend |
| `ProgressRing` | Circular progress in header (trials/total) |
| `ConvergenceChart` | Full chart in Results panel |
| `ParetoPlot` | 2D Pareto front for multi-objective |
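
For `ParetoPlot`, the front is the set of trials not dominated by any other trial. A minimal sketch, assuming two objectives that are both minimized (negate a maximized objective first) and skipping trials whose values are NaN, for example extractor errors recorded as NaN. The `Point` shape is hypothetical.

```typescript
// One trial in objective space; both objectives are assumed to be minimized.
interface Point { trial: number; f1: number; f2: number }

// `a` dominates `b` if it is no worse in both objectives and strictly better in at least one.
function dominates(a: Point, b: Point): boolean {
  return a.f1 <= b.f1 && a.f2 <= b.f2 && (a.f1 < b.f1 || a.f2 < b.f2);
}

// Simple O(n^2) non-dominated filter, sorted by f1 so the plot can draw the front as a line.
function paretoFront(points: Point[]): Point[] {
  const valid = points.filter((p) => isFinite(p.f1) && isFinite(p.f2));
  return valid
    .filter((p) => !valid.some((q) => dominates(q, p)))
    .sort((a, b) => a.f1 - b.f1);
}
```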
### Implementation Tasks
| Task | File | Description |
|------|------|-------------|
| 5.1 | `ConvergenceSparkline.tsx` | SVG sparkline component |
| 5.2 | `ObjectiveNode.tsx` | Integrate sparkline |
| 5.3 | `ProgressRing.tsx` | Circular progress indicator |
| 5.4 | `ConvergenceChart.tsx` | Full chart with Recharts |
| 5.5 | `ResultsPanel.tsx` | Panel showing detailed results |
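
The sparkline and progress ring reduce to small geometric helpers that the components can call before rendering plain SVG. The sketch assumes minimization by default for the running-best trace and SVG coordinates where y grows downward; the helper names are hypothetical and not part of Recharts or any other charting library.

```typescript
// Running best value per trial, the usual convergence trace. NaN entries
// (trials without a value) carry the previous best forward.
function bestSoFar(values: number[], minimize = true): number[] {
  const out: number[] = [];
  let best = NaN;
  for (const v of values) {
    if (isFinite(v) && (isNaN(best) || (minimize ? v < best : v > best))) best = v;
    out.push(best);
  }
  return out;
}

const round2 = (n: number): number => Math.round(n * 100) / 100;

// SVG polyline `points` for a series in a width x height box. SVG's y axis
// points down, so the largest value is drawn at y = 0.
function sparklinePoints(series: number[], width: number, height: number): string {
  const finite = series.filter((v) => isFinite(v));
  if (finite.length === 0) return '';
  const lo = finite.reduce((a, b) => Math.min(a, b));
  const hi = finite.reduce((a, b) => Math.max(a, b));
  const span = hi - lo || 1; // flat series: avoid dividing by zero
  const step = series.length > 1 ? width / (series.length - 1) : 0;
  const pts: string[] = [];
  series.forEach((v, i) => {
    if (!isFinite(v)) return; // skip trials without a value
    pts.push(`${round2(i * step)},${round2(height - ((v - lo) / span) * height)}`);
  });
  return pts.join(' ');
}

// stroke-dashoffset for a ProgressRing circle of radius r showing done/total.
function ringDashOffset(done: number, total: number, r: number): number {
  const circumference = 2 * Math.PI * r;
  const fraction = total > 0 ? Math.min(1, Math.max(0, done / total)) : 0;
  return circumference * (1 - fraction);
}
```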
---
## Phase 6: End-to-End Testing (MEDIUM PRIORITY)
### Problem
- No automated tests for canvas operations
- Manual testing is time-consuming and error-prone
- Regressions go unnoticed
### Solution: Playwright E2E Tests
### Test Scenarios
| Test | Steps | Assertions |
|------|-------|------------|
| Load study | Navigate to /canvas/{id} | Spec loads, nodes render |
| Add design var | Drag from palette | Node appears, spec updates |
| Connect nodes | Drag edge | Edge renders, spec has edge |
| Edit node | Click node, change value | Value persists, API called |
| Run validation | Click validate | Errors shown for incomplete |
| Start optimization | Complete spec, click run | Status shows running |
| View results | Wait for trial | Badge shows value |
| Stop optimization | Click stop | Status shows stopped |
### Implementation Tasks
| Task | File | Description |
|------|------|-------------|
| 6.1 | `e2e/canvas.spec.ts` | Basic canvas operations |
| 6.2 | `e2e/optimization.spec.ts` | Run/stop/status flow |
| 6.3 | `e2e/panels.spec.ts` | Panel open/close/persist |
| 6.4 | `playwright.config.ts` | Configure Playwright |
| 6.5 | `CI workflow` | Run tests in GitHub Actions |
---
## Implementation Order
```
Week 1:
├── Phase 1: Panel Management (critical UX fix)
│   ├── Day 1-2: usePanelStore + PanelContainer
│   └── Day 3-4: Refactor existing panels
└── Phase 2: Validation (prevent user errors)
    └── Day 5: Validation rules + UI
Week 2:
├── Phase 3: Error Handling
│   ├── Day 1-2: Error types + ErrorPanel
│   └── Day 3: Integration with optimization flow
└── Phase 4: WebSocket Updates
    └── Day 4-5: WS events + frontend hook
Week 3:
├── Phase 5: Visualization
│   ├── Day 1-2: Sparklines
│   └── Day 3: Progress indicators
└── Phase 6: Testing
    └── Day 4-5: Playwright setup + core tests
```
---
## Quick Wins (Can Do Now)
These can be implemented immediately with minimal changes:
1. **Persist introspection data in localStorage**
   - Cache introspection results
   - Restore on panel reopen
2. **Add loading states to all buttons**
   - Disable during operations
   - Show spinners
3. **Add confirmation dialogs**
   - Before stopping optimization
   - Before clearing canvas
4. **Improve error messages**
   - Parse NX error logs
   - Show actionable suggestions
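
The localStorage quick win can be sketched against a minimal `getItem`/`setItem` interface, so `window.localStorage` can be passed in the browser and an in-memory stub in tests. The key prefix and the 24-hour expiry are assumptions; invalidating on the model file's modification time would be more precise if that information is available to the frontend.

```typescript
// The subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface CachedEntry<T> { savedAt: number; data: T }

// Hypothetical key prefix; one cache entry per model file path.
const KEY_PREFIX = 'atomizer:introspection:';
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

function saveIntrospection<T>(store: KeyValueStore, filePath: string, data: T, now: number = Date.now()): void {
  const entry: CachedEntry<T> = { savedAt: now, data };
  store.setItem(KEY_PREFIX + filePath, JSON.stringify(entry));
}

// Returns the cached data, or null when missing, corrupted, or older than maxAgeMs.
function loadIntrospection<T>(
  store: KeyValueStore,
  filePath: string,
  maxAgeMs: number = ONE_DAY_MS,
  now: number = Date.now(),
): T | null {
  const raw = store.getItem(KEY_PREFIX + filePath);
  if (raw === null) return null;
  try {
    const entry = JSON.parse(raw) as CachedEntry<T>;
    return now - entry.savedAt <= maxAgeMs ? entry.data : null;
  } catch {
    return null; // unparsable entry: treat as a cache miss
  }
}
```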
---
## Files to Create/Modify
### New Files
```
atomizer-dashboard/frontend/src/
├── hooks/
│   ├── usePanelStore.ts
│   └── useOptimizationWebSocket.ts
├── components/canvas/
│   ├── PanelContainer.tsx
│   ├── panels/
│   │   ├── ValidationPanel.tsx
│   │   ├── ErrorPanel.tsx
│   │   └── ResultsPanel.tsx
│   └── visualization/
│       ├── ConvergenceSparkline.tsx
│       ├── ProgressRing.tsx
│       └── ConvergenceChart.tsx
└── lib/
    └── validation/
        └── specValidator.ts
e2e/
├── canvas.spec.ts
├── optimization.spec.ts
└── panels.spec.ts
```
### Modified Files
```
atomizer-dashboard/frontend/src/
├── pages/CanvasView.tsx
├── components/canvas/SpecRenderer.tsx
├── components/canvas/panels/IntrospectionPanel.tsx
├── components/canvas/panels/NodeConfigPanelV2.tsx
├── components/canvas/nodes/ObjectiveNode.tsx
└── hooks/useSpecStore.ts
atomizer-dashboard/backend/api/
├── routes/optimization.py
├── routes/spec.py
└── websocket.py
```
---
## Success Criteria
| Phase | Success Metric |
|-------|----------------|
| 1 | Introspection panel persists across node selections |
| 2 | Invalid spec shows clear error before run |
| 3 | NX errors display with recovery options |
| 4 | Results update within 500ms of trial completion |
| 5 | Convergence trend visible on objective nodes |
| 6 | All E2E tests pass in CI |
---
## Next Steps
1. Review this plan
2. Start with Phase 1 (Panel Management), which fixes the immediate issue of panels disappearing
3. Implement incrementally, commit after each phase