# SYS_14: Neural Network Acceleration
<!--
PROTOCOL: Neural Network Surrogate Acceleration
LAYER: System
VERSION: 2.0
STATUS: Active
LAST_UPDATED: 2025-12-06
PRIVILEGE: user
LOAD_WITH: [SYS_10_IMSO, SYS_11_MULTI_OBJECTIVE]
-->
## Overview
Atomizer provides **neural network surrogate acceleration**, enabling 100-1000x faster optimization by replacing expensive FEA evaluations with near-instant neural predictions.

**Two approaches are available**:
1. **MLP Surrogate** (simple, integrated) - MLP with four hidden layers, trained on FEA data; runs within the study
2. **GNN Field Predictor** (advanced) - Graph neural network for full field predictions

**Key Innovation**: Train once on FEA data, then explore 5,000-50,000+ designs in the time it takes to run 50 FEA trials.

---
## When to Use
| Trigger | Action |
|---------|--------|
| >50 trials needed | Consider neural acceleration |
| "neural", "surrogate", "NN" mentioned | Load this protocol |
| "fast", "acceleration", "speed" needed | Suggest neural acceleration |
| Training data available | Enable surrogate |
---
## Quick Reference
**Performance Comparison**:
| Metric | Traditional FEA | Neural Network | Improvement |
|--------|-----------------|----------------|-------------|
| Time per evaluation | 10-30 minutes | 4.5 milliseconds | **~130,000-400,000x** |
| Trials per hour | 2-6 | 800,000+ | **~130,000-400,000x** |
| Design exploration | ~50 designs | ~50,000 designs | **1000x** |

**Model Types**:
| Model | Purpose | Use When |
|-------|---------|----------|
| **MLP Surrogate** | Direct objective prediction | Simple studies, quick setup |
| Field Predictor GNN | Full displacement/stress fields | Need field visualization |
| Parametric Predictor GNN | Direct objective prediction | Complex geometry, need accuracy |
| Ensemble | Uncertainty quantification | Need confidence bounds |
---
## MLP Surrogate (Recommended for Quick Start)
### Overview
The MLP (Multi-Layer Perceptron) surrogate is a simple but effective neural network that predicts objectives directly from design parameters. It's integrated into the study workflow via `run_nn_optimization.py`.
### Architecture
```
Input Layer (N design variables)
Linear(N, 64) + ReLU + BatchNorm + Dropout(0.1)
Linear(64, 128) + ReLU + BatchNorm + Dropout(0.1)
Linear(128, 128) + ReLU + BatchNorm + Dropout(0.1)
Linear(128, 64) + ReLU + BatchNorm + Dropout(0.1)
Linear(64, M objectives)
```
**Parameters**: ~34,000 trainable
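
As a rough cross-check of that figure, the count can be reproduced by hand. The sketch below assumes every `Linear` layer carries a bias and every `BatchNorm` contributes one trainable scale and one shift per feature; the helper name `mlp_param_count` is hypothetical and not part of Atomizer.

```python
def mlp_param_count(n_inputs, n_outputs, hidden=(64, 128, 128, 64)):
    """Count trainable parameters of the MLP layout shown above (assumed details)."""
    total = 0
    width_in = n_inputs
    for width_out in hidden:
        total += width_in * width_out + width_out  # Linear weights + bias
        total += 2 * width_out                     # BatchNorm scale + shift
        width_in = width_out
    total += width_in * n_outputs + n_outputs      # output Linear layer
    return total


# Example: 4 design variables, 2 objectives
print(mlp_param_count(4, 2))  # 34306
```

Under these assumptions, 33,920 parameters are independent of the problem size; the remainder grows as 64·N + 65·M.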
### Workflow Modes
#### 1. Standard Hybrid Mode (`--all`)
Run all phases sequentially:
```bash
python run_nn_optimization.py --all
```
Phases:
1. **Export**: Extract training data from existing FEA trials
2. **Train**: Train MLP surrogate (300 epochs default)
3. **NN-Optimize**: Run 1000 NN trials with NSGA-II
4. **Validate**: Validate top 10 candidates with FEA
#### 2. Hybrid Loop Mode (`--hybrid-loop`)
Iterative refinement:
```bash
python run_nn_optimization.py --hybrid-loop --iterations 5 --nn-trials 500
```
Each iteration:
1. Train/retrain surrogate from current FEA data
2. Run NN optimization
3. Validate top candidates with FEA
4. Add validated results to training set
5. Repeat until convergence (max error < 5%)
#### 3. Turbo Mode (`--turbo`) ⚡ RECOMMENDED
Aggressive single-best validation:
```bash
python run_nn_optimization.py --turbo --nn-trials 5000 --batch-size 100 --retrain-every 10
```
Strategy:
- Run NN in small batches (100 trials)
- Validate ONLY the single best candidate with FEA
- Add to training data immediately
- Retrain surrogate every N FEA validations
- Repeat until total NN budget exhausted

**Example**: 5,000 NN trials with `--batch-size 100` → 50 FEA validations (one per batch) in ~12 minutes
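
The turbo strategy can be sketched as a plain loop. This is an illustration under stated assumptions, not the logic of `run_nn_optimization.py`: the function `turbo_loop` is hypothetical, random sampling stands in for NSGA-II, a 1-nearest-neighbour lookup stands in for the MLP surrogate, and `fea` is any callable that returns a scalar to minimize.

```python
import random


def turbo_loop(fea, bounds, nn_trials=5000, batch_size=100, retrain_every=10, seed=0):
    """Toy turbo loop: one FEA validation per NN batch, periodic retraining."""
    rng = random.Random(seed)

    def sample():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Seed the training set with a few "FEA" evaluations.
    data = [(x, fea(x)) for x in (sample() for _ in range(5))]
    snapshot = list(data)  # data the stand-in surrogate was last "trained" on
    validations = retrains = 0

    def predict(x):
        # 1-nearest-neighbour lookup stands in for the MLP surrogate.
        nearest = min(snapshot, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        return nearest[1]

    for _ in range(nn_trials // batch_size):
        batch = [sample() for _ in range(batch_size)]  # cheap NN "trials"
        best = min(batch, key=predict)                 # single best candidate
        data.append((best, fea(best)))                 # validate with FEA, add to data
        validations += 1
        if validations % retrain_every == 0:           # retrain every N validations
            snapshot = list(data)
            retrains += 1

    best_x, best_y = min(data, key=lambda p: p[1])
    return {"fea_validations": validations, "retrains": retrains,
            "best_x": best_x, "best_y": best_y}


result = turbo_loop(lambda x: sum(v * v for v in x), bounds=[(-1.0, 1.0)] * 2)
print(result["fea_validations"], result["retrains"])  # 50 5
```

With the defaults above, the loop performs 5000 / 100 = 50 FEA validations and retrains after every tenth one, matching the example figures.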
### Configuration
```json
{
  "neural_acceleration": {
    "enabled": true,
    "min_training_points": 50,
    "auto_train": true,
    "epochs": 300,
    "validation_split": 0.2,
    "nn_trials": 1000,
    "validate_top_n": 10,
    "model_file": "surrogate_best.pt",
    "separate_nn_database": true
  }
}
```
**Important**: `separate_nn_database: true` stores NN trials in `nn_study.db` instead of `study.db` to avoid overloading the dashboard with thousands of NN-only results.
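
One way a runner might interpret these settings is sketched below. The semantics are assumptions made for illustration: `min_training_points` is read as the minimum number of FEA trials required before training, and the helper `plan_nn_run` is hypothetical rather than an Atomizer function.

```python
import json

CONFIG_TEXT = """
{"neural_acceleration": {"enabled": true, "min_training_points": 50,
 "nn_trials": 1000, "separate_nn_database": true}}
"""


def plan_nn_run(config, n_fea_trials):
    """Decide whether to train and where NN trials are stored (assumed semantics)."""
    na = config["neural_acceleration"]
    ready = na["enabled"] and n_fea_trials >= na["min_training_points"]
    # NN-only trials go to their own database so the dashboard stays readable.
    database = "nn_study.db" if na["separate_nn_database"] else "study.db"
    return {"train": ready, "nn_trials": na["nn_trials"] if ready else 0,
            "database": database}


config = json.loads(CONFIG_TEXT)
print(plan_nn_run(config, n_fea_trials=30))
print(plan_nn_run(config, n_fea_trials=80))
```

With 30 FEA trials the sketch declines to train; with 80 it schedules the configured 1000 NN trials against `nn_study.db`.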
### Typical Accuracy
| Objective | Expected Error |
|-----------|----------------|
| Mass | 1-5% |
| Stress | 1-4% |
| Stiffness | 5-15% |
### Output Files
```
2_results/
├── study.db # Main FEA + validated results (dashboard)
├── nn_study.db # NN-only results (not in dashboard)
├── surrogate_best.pt # Trained model weights
├── training_data.json # Normalized training data
├── nn_optimization_state.json # NN optimization state
├── nn_pareto_front.json # NN-predicted Pareto front
├── validation_report.json # FEA validation results
└── turbo_report.json # Turbo mode results (if used)
```
---
## GNN Field Predictor (Advanced)
### Core Components
| Component | File | Purpose |
|-----------|------|---------|
| BDF/OP2 Parser | `neural_field_parser.py` | Convert NX files to neural format |
| Data Validator | `validate_parsed_data.py` | Physics and quality checks |
| Field Predictor | `field_predictor.py` | GNN for full field prediction |
| Parametric Predictor | `parametric_predictor.py` | GNN for direct objectives |
| Physics Loss | `physics_losses.py` | Physics-informed training |
| Neural Surrogate | `neural_surrogate.py` | Integration with Atomizer |
| Neural Runner | `runner_with_neural.py` | Optimization with NN acceleration |
### Workflow Diagram
```
Traditional:
Design → NX Model → Mesh → Solve (30 min) → Results → Objective
Neural (after training):
Design → Neural Network (4.5 ms) → Results → Objective
```
---
## Neural Model Types
### 1. Field Predictor GNN
**Use Case**: When you need full field predictions (stress distribution, deformation shape).
```
Input Features (12D per node):
├── Node coordinates (x, y, z)
├── Material properties (E, nu, rho)
├── Boundary conditions (fixed/free per DOF)
└── Load information (force magnitude, direction)
GNN Layers (6 message passing):
├── MeshGraphConv (custom for FEA topology)
├── Layer normalization
├── ReLU activation
└── Dropout (0.1)
Output (per node):
├── Displacement (6 DOF: Tx, Ty, Tz, Rx, Ry, Rz)
└── Von Mises stress (1 value)
```
**Parameters**: 718,221 trainable
### 2. Parametric Predictor GNN (Recommended)
**Use Case**: Direct optimization objective prediction (fastest option).
```
Design Parameters (ND) → Design Encoder (MLP) → GNN Backbone → Scalar Heads
Output (objectives):
├── mass (grams)
├── frequency (Hz)
├── max_displacement (mm)
└── max_stress (MPa)
```
**Parameters**: ~500,000 trainable
### 3. Ensemble Models
**Use Case**: Uncertainty quantification.
1. Train 3-5 models with different random seeds
2. At inference, run all models
3. Use mean for prediction, std for uncertainty
4. High uncertainty → trigger FEA validation
---
## Training Pipeline
### Step 1: Collect Training Data
Enable export in workflow config:
```json
{
  "training_data_export": {
    "enabled": true,
    "export_dir": "atomizer_field_training_data/my_study"
  }
}
```
Output structure:
```
atomizer_field_training_data/my_study/
├── trial_0001/
│ ├── input/model.bdf # Nastran input
│ ├── output/model.op2 # Binary results
│ └── metadata.json # Design params + objectives
├── trial_0002/
│ └── ...
└── study_summary.json
```
**Recommended**: 100-500 FEA samples for good generalization.
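
Before parsing, it can help to confirm that the export is complete. The following sketch walks the directory layout shown above and keeps only trials that contain all three files; the helper names are hypothetical, and the contents of `metadata.json` are not assumed beyond being valid JSON.

```python
import json
from pathlib import Path


def collect_trials(export_dir):
    """Return the metadata of every trial folder that has all three expected files."""
    trials = []
    for trial_dir in sorted(Path(export_dir).glob("trial_*")):
        required = [
            trial_dir / "input" / "model.bdf",
            trial_dir / "output" / "model.op2",
            trial_dir / "metadata.json",
        ]
        if all(path.is_file() for path in required):
            trials.append(json.loads(required[2].read_text()))
    return trials


def enough_samples(trials, minimum=100):
    """Compare against the lower end of the recommended 100-500 samples."""
    return len(trials) >= minimum


trials = collect_trials("atomizer_field_training_data/my_study")
print(len(trials), enough_samples(trials))
```

Incomplete trial folders are skipped silently here; a real pipeline would probably want to log them.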
### Step 2: Parse to Neural Format
```bash
cd atomizer-field
python batch_parser.py ../atomizer_field_training_data/my_study
```
Creates HDF5 + JSON files per trial.
### Step 3: Train Model
**Parametric Predictor** (recommended):
```bash
python train_parametric.py \
--train_dir ../training_data/parsed \
--val_dir ../validation_data/parsed \
--epochs 200 \
--hidden_channels 128 \
--num_layers 4
```
**Field Predictor**:
```bash
python train.py \
--train_dir ../training_data/parsed \
--epochs 200 \
--model FieldPredictorGNN \
--hidden_channels 128 \
--num_layers 6 \
--physics_loss_weight 0.3
```
### Step 4: Validate
```bash
python validate.py --checkpoint runs/my_model/checkpoint_best.pt
```
Expected output:
```
Validation Results:
├── Mean Absolute Error: 2.3% (mass), 1.8% (frequency)
├── R² Score: 0.987
├── Inference Time: 4.5ms ± 0.8ms
└── Physics Violations: 0.2%
```
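
For orientation, the first two metrics in that report could be computed along the following lines. This is a sketch, not the code in `validate.py`: it assumes the percentage error is a mean absolute error relative to the FEA value, and the sample numbers are made up.

```python
def mean_abs_pct_error(predicted, actual):
    """Mean absolute error relative to the reference values, in percent."""
    return 100.0 * sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)


def r2_score(predicted, actual):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


fea_mass = [100.0, 120.0, 140.0, 160.0]  # made-up FEA results
nn_mass = [102.0, 118.0, 141.0, 158.0]   # made-up surrogate predictions
print(f"MAE: {mean_abs_pct_error(nn_mass, fea_mass):.2f}%")
print(f"R2:  {r2_score(nn_mass, fea_mass):.4f}")
```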
### Step 5: Deploy
```json
{
  "neural_surrogate": {
    "enabled": true,
    "model_checkpoint": "atomizer-field/runs/my_model/checkpoint_best.pt",
    "confidence_threshold": 0.85
  }
}
```
---
## Configuration
### Full Neural Configuration Example
```json
{
  "study_name": "bracket_neural_optimization",
  "surrogate_settings": {
    "enabled": true,
    "model_type": "parametric_gnn",
    "model_path": "models/bracket_surrogate.pt",
    "confidence_threshold": 0.85,
    "validation_frequency": 10,
    "fallback_to_fea": true
  },
  "training_data_export": {
    "enabled": true,
    "export_dir": "atomizer_field_training_data/bracket_study",
    "export_bdf": true,
    "export_op2": true,
    "export_fields": ["displacement", "stress"]
  },
  "neural_optimization": {
    "initial_fea_trials": 50,
    "neural_trials": 5000,
    "retraining_interval": 500,
    "uncertainty_threshold": 0.15
  }
}
```
### Configuration Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enabled` | bool | false | Enable neural surrogate |
| `model_type` | string | "parametric_gnn" | Model architecture |
| `model_path` | string | - | Path to trained model |
| `confidence_threshold` | float | 0.85 | Min confidence for predictions |
| `validation_frequency` | int | 10 | FEA validation every N trials |
| `fallback_to_fea` | bool | true | Use FEA when uncertain |
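
How these parameters might interact for a single trial is sketched below. The decision order is an assumption made for illustration, and `choose_evaluator` is a hypothetical helper, not part of `neural_surrogate.py`.

```python
def choose_evaluator(trial_index, confidence, settings):
    """Return "fea" or "neural" for one trial (assumed reading of the settings)."""
    if not settings["enabled"]:
        return "fea"
    # Periodic ground-truth check: every N-th trial goes to FEA.
    if trial_index > 0 and trial_index % settings["validation_frequency"] == 0:
        return "fea"
    # Low-confidence predictions fall back to FEA when that is allowed.
    if confidence < settings["confidence_threshold"] and settings["fallback_to_fea"]:
        return "fea"
    return "neural"


settings = {"enabled": True, "confidence_threshold": 0.85,
            "validation_frequency": 10, "fallback_to_fea": True}
print(choose_evaluator(3, 0.95, settings))   # neural
print(choose_evaluator(3, 0.60, settings))   # fea
print(choose_evaluator(10, 0.95, settings))  # fea
```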
---
## Hybrid FEA/Neural Workflow
### Phase 1: FEA Exploration (50-100 trials)
- Run standard FEA optimization
- Export training data automatically
- Build landscape understanding
### Phase 2: Neural Training
- Parse collected data
- Train parametric predictor
- Validate accuracy
### Phase 3: Neural Acceleration (1000s of trials)
- Use neural network for rapid exploration
- Periodic FEA validation
- Retrain if distribution shifts
### Phase 4: FEA Refinement (10-20 trials)
- Validate top candidates with FEA
- Ensure results are physically accurate
- Generate final Pareto front
---
## Adaptive Iteration Loop
For complex optimizations, use iterative refinement:
```
┌─────────────────────────────────────────────────────────────────┐
│ Iteration 1: │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Initial FEA │ -> │ Train NN │ -> │ NN Search │ │
│ │ (50-100) │ │ Surrogate │ │ (1000 trials)│ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ Iteration 2+: ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Validate Top │ -> │ Retrain NN │ -> │ NN Search │ │
│ │ NN with FEA │ │ with new data│ │ (1000 trials)│ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
### Adaptive Configuration
```json
{
  "adaptive_settings": {
    "enabled": true,
    "initial_fea_trials": 50,
    "nn_trials_per_iteration": 1000,
    "fea_validation_per_iteration": 5,
    "max_iterations": 10,
    "convergence_threshold": 0.01,
    "retrain_epochs": 100
  }
}
```
### Convergence Criteria
Stop when:
- No improvement for 2-3 consecutive iterations
- Reached FEA budget limit
- Objective improvement < 1% threshold
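
The three stopping rules could be combined as in the sketch below. It assumes a minimization problem and a history of best-so-far objective values (one per iteration); `should_stop` and its arguments are hypothetical names, with `threshold` playing the role of `convergence_threshold`.

```python
def should_stop(best_per_iteration, fea_used, fea_budget, threshold=0.01, patience=2):
    """Apply the three stopping rules to a best-so-far history (assumed form)."""
    if fea_used >= fea_budget:
        return True, "FEA budget reached"
    if len(best_per_iteration) <= patience:
        return False, "not enough iterations"
    recent = best_per_iteration[-(patience + 1):]
    # No improvement at all over the last `patience` iterations.
    if min(recent[1:]) >= recent[0]:
        return True, "no improvement"
    # Relative improvement in the latest iteration is below the threshold.
    previous, current = best_per_iteration[-2], best_per_iteration[-1]
    if previous != 0 and (previous - current) / abs(previous) < threshold:
        return True, "improvement below threshold"
    return False, "continue"


print(should_stop([10.0, 8.0, 6.0], fea_used=60, fea_budget=100))
print(should_stop([10.0, 8.0, 7.0, 6.97], fea_used=65, fea_budget=100))
print(should_stop([5.0, 5.0, 5.0], fea_used=60, fea_budget=100))
```

The first call continues, the second stops because the last step improved by less than 1%, and the third stops because nothing improved for two iterations.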
### Output Files
```
studies/my_study/3_results/
├── adaptive_state.json # Current iteration state
├── surrogate_model.pt # Trained neural network
└── training_history.json # NN training metrics
```
---
## Loss Functions
### Data Loss (MSE)
Standard prediction error:
```python
data_loss = MSE(predicted, target)
```
### Physics Loss
Enforce physical constraints:
```python
physics_loss = (
    equilibrium_loss +    # Force balance
    boundary_loss +       # BC satisfaction
    compatibility_loss    # Strain compatibility
)
```
### Combined Training
```python
total_loss = data_loss + 0.3 * physics_loss
```
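
To make the arithmetic concrete, here is a plain-Python illustration of the weighted sum. It is not training code: the three physics residuals are passed in as made-up numbers, and `total_loss` is a hypothetical helper rather than anything defined in `physics_losses.py`.

```python
def mse(predicted, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def total_loss(predicted, target, physics_terms, physics_weight=0.3):
    """data_loss + physics_weight * (sum of the physics residual terms)."""
    data_loss = mse(predicted, target)
    physics_loss = sum(physics_terms.values())
    return data_loss + physics_weight * physics_loss


# Made-up residuals for equilibrium, boundary and compatibility terms
terms = {"equilibrium": 0.02, "boundary": 0.01, "compatibility": 0.03}
print(total_loss([1.0, 2.0], [1.1, 1.9], terms))
```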
The physics loss weight is typically set between 0.1 and 0.5.

---
## Uncertainty Quantification
### Ensemble Method
```python
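import numpy as np

def predict_with_fallback(ensemble, x, threshold, run_fea):
    """Return the ensemble mean, or the FEA result when the members disagree."""
    # Run all N models; the stacked result has shape (N, n_objectives)
    predictions = np.array([model(x) for model in ensemble])
    # Per-objective statistics across ensemble members
    mean_prediction = predictions.mean(axis=0)
    uncertainty = predictions.std(axis=0)
    # Decision: if any objective is too uncertain, use FEA instead
    if np.any(uncertainty > threshold):
        return run_fea(x)
    return mean_prediction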
```
### Confidence Thresholds
| Uncertainty | Action |
|-------------|--------|
| < 5% | Use neural prediction |
| 5-15% | Use neural, flag for validation |
| > 15% | Fall back to FEA |
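
The table can be read as a three-way classification of the ensemble spread. The sketch below assumes that "uncertainty" means the ensemble standard deviation relative to the mean prediction for a single objective; `classify_prediction` is a hypothetical helper.

```python
from statistics import mean, pstdev


def classify_prediction(member_predictions):
    """Map ensemble spread to the three actions in the table (assumed metric)."""
    mu = mean(member_predictions)
    relative_std = pstdev(member_predictions) / abs(mu)  # assumes mu != 0
    if relative_std < 0.05:
        return "use_neural"
    if relative_std <= 0.15:
        return "use_neural_flag_for_validation"
    return "fall_back_to_fea"


print(classify_prediction([100.0, 101.0, 99.0]))  # use_neural
print(classify_prediction([100.0, 110.0, 90.0]))  # use_neural_flag_for_validation
print(classify_prediction([100.0, 140.0, 60.0]))  # fall_back_to_fea
```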
---
## Troubleshooting
| Symptom | Cause | Solution |
|---------|-------|----------|
| High prediction error | Insufficient training data | Collect more FEA samples |
| Out-of-distribution warnings | Design outside training range | Retrain with expanded range |
| Slow inference | Large mesh | Use parametric predictor instead |
| Physics violations | Low physics loss weight | Increase `physics_loss_weight` |
---
## Cross-References
- **Depends On**: [SYS_10_IMSO](./SYS_10_IMSO.md) for optimization framework
- **Used By**: [OP_02_RUN_OPTIMIZATION](../operations/OP_02_RUN_OPTIMIZATION.md), [OP_05_EXPORT_TRAINING_DATA](../operations/OP_05_EXPORT_TRAINING_DATA.md)
- **See Also**: [modules/neural-acceleration.md](../../.claude/skills/modules/neural-acceleration.md)
---
## Implementation Files
```
atomizer-field/
├── neural_field_parser.py # BDF/OP2 parsing
├── field_predictor.py # Field GNN
├── parametric_predictor.py # Parametric GNN
├── train.py # Field training
├── train_parametric.py # Parametric training
├── validate.py # Model validation
├── physics_losses.py # Physics-informed loss
└── batch_parser.py # Batch data conversion
optimization_engine/
├── neural_surrogate.py # Atomizer integration
└── runner_with_neural.py # Neural runner
```
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 2.0 | 2025-12-06 | Added MLP Surrogate with Turbo Mode |
| 1.0 | 2025-12-05 | Initial consolidation from neural docs |