# OP_05: Export Training Data
<!--
PROTOCOL: Export Neural Network Training Data
LAYER: Operations
VERSION: 1.0
STATUS: Active
LAST_UPDATED: 2025-12-05
PRIVILEGE: user
LOAD_WITH: [SYS_14_NEURAL_ACCELERATION]
-->
## Overview
This protocol covers exporting FEA simulation data for training neural network surrogates. Proper data export enables Protocol 14 (Neural Acceleration).
---
## When to Use
| Trigger | Action |
|---------|--------|
| "export training data" | Follow this protocol |
| "neural network data" | Follow this protocol |
| Planning >50 trials | Consider export for acceleration |
| Want to train surrogate | Follow this protocol |
---
## Quick Reference
**Export Command**:
```bash
python run_optimization.py --export-training
```
**Output Structure**:
```
atomizer_field_training_data/{study_name}/
├── trial_0001/
│   ├── input/model.bdf
│   ├── output/model.op2
│   └── metadata.json
├── trial_0002/
│   └── ...
└── study_summary.json
```
**Recommended Data Volume**:
| Complexity | Training Samples | Validation Samples |
|------------|-----------------|-------------------|
| Simple (2-3 params) | 50-100 | 20-30 |
| Medium (4-6 params) | 100-200 | 30-50 |
| Complex (7+ params) | 200-500 | 50-100 |
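When planning a study, the table above can be encoded in a small lookup helper (hypothetical, not part of the Atomizer toolchain) so scripts can budget trials automatically:

```python
# Hypothetical helper encoding the recommended data volumes above.
def recommended_samples(n_params: int) -> dict:
    """Return (min, max) training/validation sample ranges for a parameter count."""
    if n_params <= 3:           # simple
        return {"train": (50, 100), "val": (20, 30)}
    if n_params <= 6:           # medium
        return {"train": (100, 200), "val": (30, 50)}
    return {"train": (200, 500), "val": (50, 100)}  # complex

print(recommended_samples(4))  # medium complexity: {'train': (100, 200), 'val': (30, 50)}
```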
---
## Configuration
### Enable Export in Config
Add to `optimization_config.json`:
```json
{
  "training_data_export": {
    "enabled": true,
    "export_dir": "atomizer_field_training_data/my_study",
    "export_bdf": true,
    "export_op2": true,
    "export_fields": ["displacement", "stress"],
    "include_failed": false
  }
}
```
### Configuration Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | bool | false | Enable export |
| `export_dir` | string | - | Output directory |
| `export_bdf` | bool | true | Save Nastran input |
| `export_op2` | bool | true | Save binary results |
| `export_fields` | list | all | Which result fields |
| `include_failed` | bool | false | Include failed trials |
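The export block can also be added to the config programmatically rather than by hand. A minimal sketch, assuming `optimization_config.json` sits in the study directory as in Step 1:

```python
import json
from pathlib import Path

# Sketch: enable training-data export in an existing optimization_config.json.
# Keys and values mirror the example above; adjust paths for your study layout.
config_path = Path("optimization_config.json")
config = json.loads(config_path.read_text()) if config_path.exists() else {}

config["training_data_export"] = {
    "enabled": True,
    "export_dir": "atomizer_field_training_data/my_study",
    "export_bdf": True,
    "export_op2": True,
    "export_fields": ["displacement", "stress"],
    "include_failed": False,
}

config_path.write_text(json.dumps(config, indent=2))
```

This preserves any other settings already in the file and only overwrites the `training_data_export` section.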
---
## Export Workflow
### Step 1: Run with Export Enabled
```bash
conda activate atomizer
cd studies/my_study
python run_optimization.py --export-training
```
Alternatively, run a standard optimization with export enabled in the config file.
### Step 2: Verify Export
```bash
ls atomizer_field_training_data/my_study/
# Should see trial_0001/, trial_0002/, etc.
# Check a trial
ls atomizer_field_training_data/my_study/trial_0001/
# input/model.bdf
# output/model.op2
# metadata.json
```
### Step 3: Check Metadata
```bash
cat atomizer_field_training_data/my_study/trial_0001/metadata.json
```
```json
{
  "trial_number": 1,
  "design_parameters": {
    "thickness": 5.2,
    "width": 30.0
  },
  "objectives": {
    "mass": 0.234,
    "max_stress": 198.5
  },
  "constraints_satisfied": true,
  "simulation_time": 145.2
}
```
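Incomplete metadata breaks downstream parsing, so it is worth validating the fields shown above before Step 4. A small sketch (`check_metadata` is a hypothetical helper; the key set is just the fields from the example):

```python
import json
from pathlib import Path

# Fields every exported metadata.json should carry (per the example above).
REQUIRED_KEYS = {"trial_number", "design_parameters", "objectives",
                 "constraints_satisfied", "simulation_time"}

def check_metadata(path: Path) -> list:
    """Return a sorted list of required keys missing from a metadata.json file."""
    meta = json.loads(path.read_text())
    return sorted(REQUIRED_KEYS - meta.keys())
```

An empty return value means the trial's metadata is structurally complete.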
### Step 4: Check Study Summary
```bash
cat atomizer_field_training_data/my_study/study_summary.json
```
```json
{
  "study_name": "my_study",
  "total_trials": 50,
  "successful_exports": 47,
  "failed_exports": 3,
  "design_parameters": ["thickness", "width"],
  "objectives": ["mass", "max_stress"],
  "export_timestamp": "2025-12-05T15:30:00"
}
```
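The summary makes it easy to flag studies where too many trials failed to export. A minimal sketch, assuming the summary schema shown above:

```python
import json
from pathlib import Path

def export_success_rate(summary_path: Path) -> float:
    """Fraction of trials successfully exported, read from study_summary.json."""
    summary = json.loads(summary_path.read_text())
    return summary["successful_exports"] / summary["total_trials"]
```

For the summary above this returns 47 / 50 = 0.94; rates well below ~0.9 usually indicate over-tight constraints (see Troubleshooting).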
---
## Data Quality Checks
### Verify Sample Count
```python
from pathlib import Path

export_dir = Path("atomizer_field_training_data/my_study")
trials = sorted(export_dir.glob("trial_*"))
print(f"Exported trials: {len(trials)}")

# Check for missing files
for trial_dir in trials:
    bdf = trial_dir / "input" / "model.bdf"
    op2 = trial_dir / "output" / "model.op2"
    meta = trial_dir / "metadata.json"
    if not all([bdf.exists(), op2.exists(), meta.exists()]):
        print(f"Missing files in {trial_dir}")
```
### Check Parameter Coverage
```python
import json
from pathlib import Path

import pandas as pd

export_dir = Path("atomizer_field_training_data/my_study")

# Load design parameters from every trial's metadata
params = []
for trial_dir in sorted(export_dir.glob("trial_*")):
    with open(trial_dir / "metadata.json") as f:
        meta = json.load(f)
    params.append(meta["design_parameters"])

# Summarize coverage
df = pd.DataFrame(params)
print(df.describe())

# Look for gaps at the parameter bounds
for col in df.columns:
    print(f"{col}: min={df[col].min():.2f}, max={df[col].max():.2f}")
```
---
## Space-Filling Sampling
For best neural network training, use space-filling designs:
### Latin Hypercube Sampling
```python
from scipy.stats import qmc
# Generate space-filling samples
n_samples = 100
n_params = 4
sampler = qmc.LatinHypercube(d=n_params)
samples = sampler.random(n=n_samples)
# Scale to parameter bounds
lower = [2.0, 20.0, 5.0, 1.0]
upper = [10.0, 50.0, 15.0, 5.0]
scaled = qmc.scale(samples, lower, upper)
```
### Sobol Sequence
```python
sampler = qmc.Sobol(d=n_params)
samples = sampler.random_base2(m=7)  # 2**7 = 128 points; Sobol balance needs powers of 2
scaled = qmc.scale(samples, lower, upper)
```
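Either sampler's coverage can be checked with SciPy's discrepancy measure (lower means more uniform coverage of the unit cube). This sketch compares Latin Hypercube samples against plain pseudo-random sampling:

```python
import numpy as np
from scipy.stats import qmc

# Compare space-filling quality: LHS vs. plain pseudo-random samples.
n_samples, n_params = 100, 4
lhs = qmc.LatinHypercube(d=n_params, seed=42).random(n=n_samples)
rand = np.random.default_rng(42).random((n_samples, n_params))

# Centered discrepancy: lower = more uniform coverage
print(f"LHS discrepancy:    {qmc.discrepancy(lhs):.5f}")
print(f"Random discrepancy: {qmc.discrepancy(rand):.5f}")
```

LHS typically scores noticeably lower, which is why space-filling designs give better surrogate training data than the samples an optimizer happens to visit.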
---
## Next Steps After Export
### 1. Parse to Neural Format
```bash
cd atomizer-field
python batch_parser.py ../atomizer_field_training_data/my_study
```
### 2. Split Train/Validation
```python
from pathlib import Path

from sklearn.model_selection import train_test_split

export_dir = Path("atomizer_field_training_data/my_study")
all_trials = sorted(export_dir.glob("trial_*"))

# 80/20 split
train_trials, val_trials = train_test_split(
    all_trials,
    test_size=0.2,
    random_state=42,
)
```
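To keep the split reproducible across the parse and train steps, the assignment can be written to disk. A sketch (the trial names below are illustrative; in practice glob them from the export directory as above):

```python
from pathlib import Path

from sklearn.model_selection import train_test_split

# Illustrative trial names; in practice use the export directory's trial_* dirs.
all_trials = [f"trial_{i:04d}" for i in range(1, 51)]
train_trials, val_trials = train_test_split(all_trials, test_size=0.2, random_state=42)

# Persist the split so later batch_parser/training runs use identical sets
Path("train_split.txt").write_text("\n".join(sorted(train_trials)) + "\n")
Path("val_split.txt").write_text("\n".join(sorted(val_trials)) + "\n")
```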
### 3. Train Model
```bash
python train_parametric.py \
--train_dir ../training_data/parsed \
--val_dir ../validation_data/parsed \
--epochs 200
```
See [SYS_14_NEURAL_ACCELERATION](../system/SYS_14_NEURAL_ACCELERATION.md) for full training workflow.
---
## Troubleshooting
| Symptom | Cause | Solution |
|---------|-------|----------|
| No export directory | Export not enabled | Add `training_data_export` to config |
| Missing OP2 files | Solve failed | Expected when `include_failed` is false; check solver logs for the failed trials |
| Incomplete metadata | Extraction error | Check extractor logs |
| Low sample count | Too many failures | Relax constraints |
---
## Cross-References
- **Related**: [SYS_14_NEURAL_ACCELERATION](../system/SYS_14_NEURAL_ACCELERATION.md)
- **Preceded By**: [OP_02_RUN_OPTIMIZATION](./OP_02_RUN_OPTIMIZATION.md)
- **Skill**: `.claude/skills/modules/neural-acceleration.md`
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-05 | Initial release |