# OP_02: Run Optimization
<!--
PROTOCOL: Run Optimization
LAYER: Operations
VERSION: 1.0
STATUS: Active
LAST_UPDATED: 2025-12-05
PRIVILEGE: user
LOAD_WITH: []
-->
## Overview
This protocol covers executing optimization runs, including pre-flight validation, execution modes, monitoring, and handling common issues.
---
## When to Use
| Trigger | Action |
|---------|--------|
| "start", "run", "execute" | Follow this protocol |
| "begin optimization" | Follow this protocol |
| Study setup complete | Execute this protocol |
---
## Quick Reference
**Start Command**:
```bash
conda activate atomizer
cd studies/{study_name}
python run_optimization.py
```
**Common Options**:
| Flag | Purpose |
|------|---------|
| `--n-trials 100` | Override trial count |
| `--resume` | Continue interrupted run |
| `--test` | Run single trial for validation |
| `--export-training` | Export data for neural training |
---
## Pre-Flight Checklist
Before running, verify:
- [ ] **Environment**: `conda activate atomizer`
- [ ] **Config exists**: `1_setup/optimization_config.json`
- [ ] **Script exists**: `run_optimization.py`
- [ ] **Model files**: NX files in `1_setup/model/`
- [ ] **No conflicts**: No other optimization running on same study
- [ ] **Disk space**: Sufficient for results
**Quick Validation**:
```bash
python run_optimization.py --test
```
This runs a single trial to verify setup.
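The checklist above can also be automated. A minimal sketch, assuming the directory layout shown in the checklist; `preflight()` is a hypothetical helper, not part of the shipped tooling:

```python
from pathlib import Path

def preflight(study_dir: str) -> list[str]:
    """Return a list of problems found before launching a run (empty = ready)."""
    root = Path(study_dir)
    problems = []
    if not (root / "1_setup" / "optimization_config.json").is_file():
        problems.append("missing 1_setup/optimization_config.json")
    if not (root / "run_optimization.py").is_file():
        problems.append("missing run_optimization.py")
    model_dir = root / "1_setup" / "model"
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        problems.append("no NX files in 1_setup/model/")
    return problems

# Usage: abort early instead of failing mid-run
# issues = preflight("studies/my_study")
# if issues:
#     raise SystemExit("\n".join(issues))
```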
---
## Execution Modes
### 1. Standard Run
```bash
python run_optimization.py
```
Uses settings from `optimization_config.json`.
### 2. Override Trials
```bash
python run_optimization.py --n-trials 100
```
Override trial count from config.
### 3. Resume Interrupted
```bash
python run_optimization.py --resume
```
Continues from last completed trial.
### 4. Neural Acceleration
```bash
python run_optimization.py --neural
```
Requires trained surrogate model.
### 5. Export Training Data
```bash
python run_optimization.py --export-training
```
Saves BDF/OP2 for neural network training.
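The modes above map to CLI flags. A sketch of how such a parser might be wired with `argparse`; the real script's parser may differ in defaults and help text:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Flags matching the execution modes above (illustrative, not definitive)."""
    p = argparse.ArgumentParser(prog="run_optimization.py")
    p.add_argument("--n-trials", type=int, default=None,
                   help="Override trial count from optimization_config.json")
    p.add_argument("--resume", action="store_true",
                   help="Continue from the last completed trial")
    p.add_argument("--test", action="store_true",
                   help="Run a single trial to validate setup")
    p.add_argument("--neural", action="store_true",
                   help="Evaluate with a trained surrogate model")
    p.add_argument("--export-training", action="store_true",
                   help="Save BDF/OP2 files for neural network training")
    return p

# args = build_parser().parse_args(["--n-trials", "100", "--resume"])
```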
---
## Monitoring Progress
### Option 1: Console Output
The script prints progress:
```
Trial 15/50 complete. Best: 0.234 kg
Trial 16/50 complete. Best: 0.234 kg
```
### Option 2: Dashboard
See [SYS_13_DASHBOARD_TRACKING](../system/SYS_13_DASHBOARD_TRACKING.md).
```bash
# Terminal 1: backend API
cd atomizer-dashboard/backend && python -m uvicorn api.main:app --port 8000
# Terminal 2: frontend
cd atomizer-dashboard/frontend && npm run dev
# Then open http://localhost:3000 in a browser
```
### Option 3: Query Database
```bash
python -c "
import optuna
study = optuna.load_study(study_name='study_name', storage='sqlite:///2_results/study.db')
print(f'Trials: {len(study.trials)}')
print(f'Best value: {study.best_value}')
"
```
### Option 4: Optuna Dashboard
```bash
optuna-dashboard sqlite:///2_results/study.db
# Open http://localhost:8080
```
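For lightweight polling without loading Optuna, the study database can be queried directly with the standard library. This reads Optuna's internal `trials` table, so the schema is an assumption tied to the Optuna version in use and may change between releases:

```python
import sqlite3

def trial_counts(db_path: str) -> dict[str, int]:
    """Count trials per state (COMPLETE, PRUNED, FAIL, ...) straight from the
    study database, without importing optuna."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute("SELECT state, COUNT(*) FROM trials GROUP BY state")
        return {state: n for state, n in rows}

# Usage: print(trial_counts("2_results/study.db"))
```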
---
## During Execution
### What Happens Per Trial
1. **Sample parameters**: Optuna suggests design variable values
2. **Update model**: NX expressions updated via journal
3. **Solve**: NX Nastran runs FEA simulation
4. **Extract results**: Extractors read OP2 file
5. **Evaluate**: Check constraints, compute objectives
6. **Record**: Trial stored in Optuna database
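The six steps above take roughly this shape as an Optuna objective function. The NX/Nastran helpers below are stand-in stubs (the real hooks live elsewhere in the engine), the stub results echo the sample log values, and in real code the constraint check would raise `optuna.TrialPruned` rather than a custom exception:

```python
# Stand-ins for the NX/Nastran plumbing (hypothetical stubs).
def update_nx_model(params):
    pass  # step 2: push values into NX expressions via journal

def run_nastran():
    pass  # step 3: run the FEA solve

def extract_op2(op2_path):
    # step 4: read results from the OP2 file (stubbed values here)
    return {"mass": 0.342, "stress": 198.5}

STRESS_LIMIT_MPA = 250.0

class ConstraintViolated(Exception):
    """Stand-in for optuna.TrialPruned."""

def objective(trial):
    # 1. Sample parameters (bounds here are illustrative)
    params = {"thickness": trial.suggest_float("thickness", 2.0, 10.0)}
    # 2-3. Update the NX model and solve
    update_nx_model(params)
    run_nastran()
    # 4. Extract results
    results = extract_op2("model.op2")
    # 5. Check constraints; prune infeasible designs
    if results["stress"] > STRESS_LIMIT_MPA:
        raise ConstraintViolated(
            f"{results['stress']} MPa > {STRESS_LIMIT_MPA} MPa")
    # 6. Optuna records the returned objective in the study database
    return results["mass"]
```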
### Normal Output
```
[2025-12-05 10:15:30] Trial 1 started
[2025-12-05 10:17:45] NX solve complete (135.2s)
[2025-12-05 10:17:46] Extraction complete
[2025-12-05 10:17:46] Trial 1 complete: mass=0.342 kg, stress=198.5 MPa
[2025-12-05 10:17:47] Trial 2 started
...
```
### Expected Timing
| Operation | Typical Time |
|-----------|--------------|
| NX solve | 30s - 30min |
| Extraction | <1s |
| Per trial total | 1-30 min |
| 50 trials | 1-24 hours |
---
## Handling Issues
### Trial Failed / Pruned
```
[WARNING] Trial 12 pruned: Stress constraint violated (312.5 MPa > 250 MPa)
```
**Normal behavior** - optimizer learns from failures.
### NX Session Timeout
```
[ERROR] NX session timeout after 600s
```
**Solution**: Increase timeout in config or simplify model.
### Expression Not Found
```
[ERROR] Expression 'thicknes' not found in model
```
**Solution**: Check spelling, verify expression exists in NX.
### OP2 File Missing
```
[ERROR] OP2 file not found: model.op2
```
**Solution**: Check NX solve completed. Review NX log file.
### Database Locked
```
[ERROR] Database is locked
```
**Solution**: Another process is holding the database lock. Wait for it to finish, or kill the stale process.
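When lock contention is transient (e.g. the dashboard polling the same SQLite file), a retry with backoff is often enough. A generic sketch of the idea, not the engine's actual handling:

```python
import sqlite3
import time

def with_db_retry(fn, attempts=5, base_delay=0.5):
    """Retry fn() when SQLite reports a locked database, backing off
    exponentially between attempts. Tune attempts/delay to your runs."""
    for i in range(attempts):
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)
```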
---
## Stopping and Resuming
### Graceful Stop
Press `Ctrl+C` once. Current trial completes, then exits.
### Force Stop
Press `Ctrl+C` twice. Immediate exit (may lose current trial).
### Resume
```bash
python run_optimization.py --resume
```
Continues from last completed trial. Same study database used.
---
## Post-Run Actions
After optimization completes:
1. **Check results**:
```bash
python -c "import optuna; s=optuna.load_study(...); print(s.best_params)"
```
2. **View in dashboard**: `http://localhost:3000`
3. **Generate report**: See [OP_04_ANALYZE_RESULTS](./OP_04_ANALYZE_RESULTS.md)
4. **Update STUDY_REPORT.md**: Fill in results template
---
## Protocol Integration
### With Protocol 10 (IMSO)
If enabled, optimization runs in two phases:
1. Characterization (10-30 trials)
2. Optimization (remaining trials)
Dashboard shows phase transitions.
### With Protocol 11 (Multi-Objective)
If the study has 2+ objectives, the sampler is NSGA-II and the result is a Pareto front of non-dominated trials, not a single best trial.
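"Pareto front" means the set of designs none of which is beaten on every objective at once. A minimal non-dominated filter, assuming all objectives are minimized (Optuna itself exposes this set as `study.best_trials`):

```python
def pareto_front(points):
    """Return the non-dominated subset of `points` (tuples of objective
    values), assuming every objective is minimized."""
    def dominates(a, b):
        # a dominates b if it is no worse everywhere and strictly better somewhere
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Example with illustrative (mass_kg, stress_MPa) pairs: the lightest design
# has the higher stress, so neither dominates the other and both survive.
designs = [(0.30, 220.0), (0.25, 240.0), (0.35, 210.0), (0.32, 230.0)]
```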
### With Protocol 13 (Dashboard)
Writes `optimizer_state.json` every trial for real-time updates.
### With Protocol 14 (Neural)
If `--neural` flag, uses trained surrogate for fast evaluation.
---
## Troubleshooting
| Symptom | Cause | Solution |
|---------|-------|----------|
| "ModuleNotFoundError" | Wrong environment | `conda activate atomizer` |
| All trials pruned | Constraints too tight | Relax constraints |
| Very slow | Model too complex | Simplify mesh, increase timeout |
| No improvement | Wrong sampler | Try different algorithm |
| "NX license error" | License unavailable | Check NX license server |
---
## Cross-References
- **Preceded By**: [OP_01_CREATE_STUDY](./OP_01_CREATE_STUDY.md)
- **Followed By**: [OP_03_MONITOR_PROGRESS](./OP_03_MONITOR_PROGRESS.md), [OP_04_ANALYZE_RESULTS](./OP_04_ANALYZE_RESULTS.md)
- **Integrates With**: [SYS_10_IMSO](../system/SYS_10_IMSO.md), [SYS_13_DASHBOARD_TRACKING](../system/SYS_13_DASHBOARD_TRACKING.md)
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-05 | Initial release |