# OP_03: Monitor Progress
<!--
PROTOCOL: Monitor Optimization Progress
LAYER: Operations
VERSION: 1.0
STATUS: Active
LAST_UPDATED: 2025-12-05
PRIVILEGE: user
LOAD_WITH: [SYS_13_DASHBOARD_TRACKING]
-->
## Overview
This protocol covers monitoring optimization progress through console output, dashboard, database queries, and Optuna's built-in tools.
---
## When to Use
| Trigger | Action |
|---------|--------|
| "status", "progress" | Follow this protocol |
| "how many trials" | Query database |
| "what's happening" | Check console or dashboard |
| "is it running" | Check process status |
---
## Quick Reference
| Method | Command/URL | Best For |
|--------|-------------|----------|
| Console | Watch terminal output | Quick check |
| Dashboard | `http://localhost:3000` | Visual monitoring |
| Database query | Python one-liner | Scripted checks |
| Optuna Dashboard | `http://localhost:8080` | Detailed analysis |
---
## Monitoring Methods
### 1. Console Output
If the optimization is running in the foreground, watch the terminal output:
```
[10:15:30] Trial 15/50 started
[10:17:45] Trial 15/50 complete: mass=0.234 kg (best: 0.212 kg)
[10:17:46] Trial 16/50 started
```
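
When a run's output is redirected to a log file, the same lines can be parsed in a script. The following is a minimal sketch that assumes the log format matches the sample above exactly; `parse_progress` is a hypothetical helper, not part of Atomizer, and the pattern would need adjusting if the real format differs.

```python
import re

# Pattern derived from the sample console lines above (an assumption about the real format)
PROGRESS_LINE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\] Trial (?P<n>\d+)/(?P<total>\d+) complete: "
    r"(?P<name>\w+)=(?P<value>[-+0-9.eE]+) \w+ \(best: (?P<best>[-+0-9.eE]+) \w+\)"
)

def parse_progress(line):
    """Return trial progress from a 'complete' line, or None for any other line."""
    m = PROGRESS_LINE.search(line)
    if m is None:
        return None
    return {
        "time": m["time"],
        "trial": int(m["n"]),
        "total": int(m["total"]),
        "objective": m["name"],
        "value": float(m["value"]),
        "best": float(m["best"]),
    }

sample = "[10:17:45] Trial 15/50 complete: mass=0.234 kg (best: 0.212 kg)"
print(parse_progress(sample))
```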
### 2. Atomizer Dashboard
**Start Dashboard** (if not running):
```bash
# Terminal 1: Backend
cd atomizer-dashboard/backend
python -m uvicorn api.main:app --reload --port 8000
# Terminal 2: Frontend
cd atomizer-dashboard/frontend
npm run dev
```
**View at**: `http://localhost:3000`
**Features**:
- Real-time trial progress bar
- Current optimizer phase (if Protocol 10)
- Pareto front visualization (if multi-objective)
- Parallel coordinates plot
- Convergence chart
### 3. Database Query
**Quick status**:
```bash
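python -c "
import optuna
from optuna.trial import TrialState

study = optuna.load_study(
    study_name='my_study',
    storage='sqlite:///studies/my_study/2_results/study.db'
)
# Count only finished trials; study.trials also includes running, pruned, and failed ones
done = study.get_trials(deepcopy=False, states=(TrialState.COMPLETE,))
print(f'Trials completed: {len(done)}')
# best_value and best_params are defined only for single-objective studies
print(f'Best value: {study.best_value}')
print(f'Best params: {study.best_params}')
"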
```
**Detailed status**:
```python
from collections import Counter

import optuna

study = optuna.load_study(
    study_name='my_study',
    storage='sqlite:///studies/my_study/2_results/study.db'
)

# Trial counts by state
states = Counter(t.state.name for t in study.trials)
print(f"Complete: {states.get('COMPLETE', 0)}")
print(f"Pruned: {states.get('PRUNED', 0)}")
print(f"Failed: {states.get('FAIL', 0)}")
print(f"Running: {states.get('RUNNING', 0)}")

# Best trials: a multi-objective study has a Pareto front, not a single best value
if len(study.directions) > 1:
    print(f"Pareto front size: {len(study.best_trials)}")
elif states.get('COMPLETE', 0) > 0:
    # best_value raises an error until at least one trial has completed
    print(f"Best value: {study.best_value}")
else:
    print("No completed trials yet")
```
### 4. Optuna Dashboard
```bash
optuna-dashboard sqlite:///studies/my_study/2_results/study.db
# Open http://localhost:8080
```
**Features**:
- Trial history table
- Parameter importance
- Optimization history plot
- Slice plot (parameter vs objective)
### 5. Check Running Processes
```bash
# Linux/Mac (the [r] bracket keeps grep from matching its own process)
ps aux | grep "[r]un_optimization"
# Windows
tasklist | findstr python
```
---
## Key Metrics to Monitor
### Trial Progress
- Completed trials vs target
- Completion rate (trials/hour)
- Estimated time remaining
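
The completion rate and remaining time can be estimated from trial completion timestamps. The sketch below uses made-up timestamps and a hypothetical helper, `estimate_progress`; with Optuna, the timestamps could be taken from the `datetime_complete` attribute of completed trials. It assumes trials finish at a roughly constant rate, which may not hold if solve times vary.

```python
from datetime import datetime, timedelta

def estimate_progress(completion_times, target_trials):
    """Return (trials_per_hour, estimated_time_remaining) from completion timestamps."""
    done = len(completion_times)
    if done < 2:
        return 0.0, None  # not enough data to estimate a rate
    elapsed = max(completion_times) - min(completion_times)
    hours = elapsed.total_seconds() / 3600
    if hours <= 0:
        return 0.0, None
    # `done` timestamps are separated by `done - 1` intervals
    rate = (done - 1) / hours
    remaining = max(target_trials - done, 0)
    return rate, timedelta(hours=remaining / rate)

# Made-up example: one trial completes every 15 minutes
start = datetime(2025, 12, 5, 10, 0, 0)
times = [start + timedelta(minutes=15 * i) for i in range(5)]
rate, eta = estimate_progress(times, target_trials=50)
print(f"{rate:.1f} trials/hour, about {eta} remaining")
# prints: 4.0 trials/hour, about 11:15:00 remaining
```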
### Objective Improvement
- Current best value
- Improvement trend
- Plateau detection
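
One simple way to detect a plateau is to count how many trials have passed since the best value last improved. This is a minimal sketch with a hypothetical helper and made-up objective values; the tolerance and any threshold for "stalled" are choices left to the user, not Atomizer settings.

```python
def trials_since_improvement(values, minimize=True, tol=0.0):
    """Count the trailing trials since the best value last improved by more than tol."""
    best = None
    last_improved = -1
    for i, v in enumerate(values):
        improved = best is None or (v < best - tol if minimize else v > best + tol)
        if improved:
            best, last_improved = v, i
    return len(values) - 1 - last_improved if values else 0

# Made-up objective history in trial order (minimization)
history = [0.300, 0.260, 0.234, 0.240, 0.250, 0.236, 0.241]
stalled = trials_since_improvement(history)
print(f"No improvement for {stalled} trials")
# prints: No improvement for 4 trials
```

A large count relative to the trial budget suggests convergence or a local minimum, as described under Potential Issues.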
### Constraint Satisfaction
- Feasibility rate (% passing constraints)
- Most violated constraint
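
Both constraint metrics can be computed once per-trial constraint values are available. The sketch below assumes a hypothetical data layout: one dict per trial mapping a constraint name to a violation value, where a value greater than 0 means the constraint is violated. The constraint names and numbers are invented for illustration, and how Atomizer actually stores constraint results is not specified here.

```python
from collections import Counter

def constraint_summary(trial_violations):
    """Return (feasibility_rate, most_often_violated_constraint_name)."""
    if not trial_violations:
        return 0.0, None
    # A trial is feasible when none of its constraints is violated (all values <= 0)
    feasible = sum(1 for v in trial_violations if all(x <= 0 for x in v.values()))
    rate = feasible / len(trial_violations)
    # Count how many trials violate each constraint
    counts = Counter(name for v in trial_violations for name, x in v.items() if x > 0)
    worst = counts.most_common(1)[0][0] if counts else None
    return rate, worst

# Invented example data: positive values are violations
violations = [
    {"max_stress": -10.0, "max_disp": -0.10},
    {"max_stress": 5.0, "max_disp": -0.20},
    {"max_stress": 12.0, "max_disp": 0.05},
    {"max_stress": -3.0, "max_disp": -0.30},
]
rate_feasible, worst = constraint_summary(violations)
print(f"Feasibility rate: {rate_feasible:.0%}, most violated: {worst}")
# prints: Feasibility rate: 50%, most violated: max_stress
```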
### For Protocol 10 (IMSO)
- Current phase (Characterization vs Optimization)
- Current strategy (TPE, GP, CMA-ES)
- Characterization confidence
### For Protocol 11 (Multi-Objective)
- Pareto front size
- Hypervolume indicator
- Spread of solutions
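
To make the Pareto front size and hypervolume concrete, here is a self-contained sketch for two minimization objectives. The helper names, points, and reference point are illustrative assumptions; this is not how the dashboard computes the indicator, and studies with three or more objectives need a general hypervolume algorithm.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming every objective is minimized."""
    front = []
    for p in points:
        dominated = any(
            q != p and all(q[i] <= p[i] for i in range(len(p))) for q in points
        )
        if not dominated:
            front.append(p)
    return front

def hypervolume_2d(front, ref):
    """Area dominated by a 2-objective minimization front, bounded by reference point ref."""
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:
            # Add the horizontal strip that this point contributes below the previous one
            area += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return area

# Invented objective pairs, e.g. (objective 1, objective 2)
points = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
front = pareto_front(points)
print(f"Pareto front size: {len(front)}")
print(f"Hypervolume: {hypervolume_2d(front, ref=(5, 5))}")
# prints: Pareto front size: 3
# prints: Hypervolume: 11.0
```

A growing hypervolume between checks indicates the front is still advancing; a flat value plays the same role as a plateau in the single-objective case.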
---
## Interpreting Results
### Healthy Optimization
```
Trial 45/50: mass=0.198 kg (best: 0.195 kg)
Feasibility rate: 78%
```
- Progress toward target
- Reasonable feasibility rate (60-90%)
- Gradual improvement
### Potential Issues
**All Trials Pruned**:
```
Trial 20 pruned: constraint violated
Trial 21 pruned: constraint violated
...
```
→ Constraints too tight. Consider relaxing.
**No Improvement**:
```
Trial 30: best=0.234 (unchanged since trial 8)
Trial 31: best=0.234 (unchanged since trial 8)
```
→ The optimization may have converged, or it may be stuck in a local minimum.
**High Failure Rate**:
```
Failed: 15/50 (30%)
```
→ Model issues. Check NX logs.
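
The pruning and failure checks above can be automated from trial state counts, such as those produced by the detailed status query. This is a sketch only: `diagnose` is a hypothetical helper and the 20% failure threshold is an assumed default, not a value defined by this protocol.

```python
def diagnose(states, max_fail_rate=0.2):
    """Return warning strings for a dict of Optuna state names to trial counts."""
    total = sum(states.values())
    if total == 0:
        return ["no trials recorded yet"]
    warnings = []
    finished = total - states.get("RUNNING", 0)
    if finished and states.get("PRUNED", 0) == finished:
        warnings.append("all finished trials pruned: constraints may be too tight")
    fail_rate = states.get("FAIL", 0) / total
    if fail_rate > max_fail_rate:  # threshold is an assumption; tune per study
        warnings.append(f"high failure rate ({fail_rate:.0%}): check NX logs")
    return warnings

print(diagnose({"COMPLETE": 35, "FAIL": 15}))
# prints: ['high failure rate (30%): check NX logs']
```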
---
## Real-Time State File
If using Protocol 10, check:
```bash
cat studies/my_study/2_results/intelligent_optimizer/optimizer_state.json
```
```json
{
  "timestamp": "2025-12-05T10:15:30",
  "trial_number": 29,
  "total_trials": 50,
  "current_phase": "adaptive_optimization",
  "current_strategy": "GP_UCB",
  "is_multi_objective": false
}
```
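
For scripted checks, the state file can be read and summarized in a few lines. The sketch below assumes the file contains exactly the keys shown in the sample above; `read_state` and `summarize_state` are hypothetical helpers, and whether `trial_number` is a zero-based index or a count is not stated here, so the percentage is approximate.

```python
import json
from pathlib import Path

def read_state(path):
    """Load the optimizer state file, or return None if it does not exist yet."""
    p = Path(path)
    if not p.exists():
        return None
    return json.loads(p.read_text())

def summarize_state(state):
    """Format a one-line progress summary from the state dict."""
    done, total = state["trial_number"], state["total_trials"]
    pct = 100 * done / total if total else 0.0
    return (
        f"Trial {done}/{total} ({pct:.0f}%), "
        f"phase={state['current_phase']}, strategy={state['current_strategy']}"
    )

# Same content as the sample state file above
example = {
    "timestamp": "2025-12-05T10:15:30",
    "trial_number": 29,
    "total_trials": 50,
    "current_phase": "adaptive_optimization",
    "current_strategy": "GP_UCB",
    "is_multi_objective": False,
}
print(summarize_state(example))
# prints: Trial 29/50 (58%), phase=adaptive_optimization, strategy=GP_UCB
```

In practice, `read_state("studies/my_study/2_results/intelligent_optimizer/optimizer_state.json")` would supply the dict, returning `None` when the file has not been written yet.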
---
## Troubleshooting
| Symptom | Cause | Solution |
|---------|-------|----------|
| Dashboard shows old data | Backend not running | Start backend |
| "No study found" | Wrong path | Check study name and path |
| Trial count not increasing | Process stopped | Check if still running |
| Dashboard not updating | Polling issue | Refresh browser |
---
## Cross-References
- **Preceded By**: [OP_02_RUN_OPTIMIZATION](./OP_02_RUN_OPTIMIZATION.md)
- **Followed By**: [OP_04_ANALYZE_RESULTS](./OP_04_ANALYZE_RESULTS.md)
- **Integrates With**: [SYS_13_DASHBOARD_TRACKING](../system/SYS_13_DASHBOARD_TRACKING.md)
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-05 | Initial release |