# OP_05: Export Training Data

<!--
PROTOCOL: Export Neural Network Training Data
LAYER: Operations
VERSION: 1.0
STATUS: Active
LAST_UPDATED: 2025-12-05
PRIVILEGE: user
LOAD_WITH: [SYS_14_NEURAL_ACCELERATION]
-->

## Overview

This protocol covers exporting FEA simulation data for training neural network surrogates. Proper data export enables Protocol 14 (Neural Acceleration).

---

## When to Use

| Trigger | Action |
|---------|--------|
| "export training data" | Follow this protocol |
| "neural network data" | Follow this protocol |
| Planning >50 trials | Consider export for acceleration |
| Want to train surrogate | Follow this protocol |

---

## Quick Reference

**Export Command**:
```bash
python run_optimization.py --export-training
```

**Output Structure**:
```
atomizer_field_training_data/{study_name}/
├── trial_0001/
│   ├── input/model.bdf
│   ├── output/model.op2
│   └── metadata.json
├── trial_0002/
│   └── ...
└── study_summary.json
```

**Recommended Data Volume**:
| Complexity | Training Samples | Validation Samples |
|------------|-----------------|-------------------|
| Simple (2-3 params) | 50-100 | 20-30 |
| Medium (4-6 params) | 100-200 | 30-50 |
| Complex (7+ params) | 200-500 | 50-100 |

---

## Configuration

### Enable Export in Config

Add to `optimization_config.json`:

```json
{
  "training_data_export": {
    "enabled": true,
    "export_dir": "atomizer_field_training_data/my_study",
    "export_bdf": true,
    "export_op2": true,
    "export_fields": ["displacement", "stress"],
    "include_failed": false
  }
}
```

### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | bool | false | Enable export |
| `export_dir` | string | - | Output directory |
| `export_bdf` | bool | true | Save Nastran input |
| `export_op2` | bool | true | Save binary results |
| `export_fields` | list | all | Which result fields |
| `include_failed` | bool | false | Include failed trials |

---

## Export Workflow

### Step 1: Run with Export Enabled

```bash
conda activate atomizer
cd studies/my_study
python run_optimization.py --export-training
```

Or run standard optimization with config export enabled.

### Step 2: Verify Export

```bash
ls atomizer_field_training_data/my_study/
# Should see trial_0001/, trial_0002/, etc.

# Check a trial
ls atomizer_field_training_data/my_study/trial_0001/
# input/model.bdf
# output/model.op2
# metadata.json
```

### Step 3: Check Metadata

```bash
cat atomizer_field_training_data/my_study/trial_0001/metadata.json
```

```json
{
  "trial_number": 1,
  "design_parameters": {
    "thickness": 5.2,
    "width": 30.0
  },
  "objectives": {
    "mass": 0.234,
    "max_stress": 198.5
  },
  "constraints_satisfied": true,
  "simulation_time": 145.2
}
```

### Step 4: Check Study Summary

```bash
cat atomizer_field_training_data/my_study/study_summary.json
```

```json
{
  "study_name": "my_study",
  "total_trials": 50,
  "successful_exports": 47,
  "failed_exports": 3,
  "design_parameters": ["thickness", "width"],
  "objectives": ["mass", "max_stress"],
  "export_timestamp": "2025-12-05T15:30:00"
}
```

---

## Data Quality Checks

### Verify Sample Count

```python
from pathlib import Path
import json

export_dir = Path("atomizer_field_training_data/my_study")
trials = list(export_dir.glob("trial_*"))
print(f"Exported trials: {len(trials)}")

# Check for missing files
for trial_dir in trials:
    bdf = trial_dir / "input" / "model.bdf"
    op2 = trial_dir / "output" / "model.op2"
    meta = trial_dir / "metadata.json"

    if not all([bdf.exists(), op2.exists(), meta.exists()]):
        print(f"Missing files in {trial_dir}")
```

### Check Parameter Coverage

```python
import json
import numpy as np

# Load all metadata
params = []
for trial_dir in export_dir.glob("trial_*"):
    with open(trial_dir / "metadata.json") as f:
        meta = json.load(f)
        params.append(meta["design_parameters"])

# Check coverage
import pandas as pd
df = pd.DataFrame(params)
print(df.describe())

# Look for gaps
for col in df.columns:
    print(f"{col}: min={df[col].min():.2f}, max={df[col].max():.2f}")
```

---

## Space-Filling Sampling

For best neural network training, use space-filling designs:

### Latin Hypercube Sampling

```python
from scipy.stats import qmc

# Generate space-filling samples
n_samples = 100
n_params = 4

sampler = qmc.LatinHypercube(d=n_params)
samples = sampler.random(n=n_samples)

# Scale to parameter bounds
lower = [2.0, 20.0, 5.0, 1.0]
upper = [10.0, 50.0, 15.0, 5.0]
scaled = qmc.scale(samples, lower, upper)
```

### Sobol Sequence

```python
sampler = qmc.Sobol(d=n_params)
samples = sampler.random(n=n_samples)
scaled = qmc.scale(samples, lower, upper)
```

---

## Next Steps After Export

### 1. Parse to Neural Format

```bash
cd atomizer-field
python batch_parser.py ../atomizer_field_training_data/my_study
```

### 2. Split Train/Validation

```python
from sklearn.model_selection import train_test_split

# 80/20 split
train_trials, val_trials = train_test_split(
    all_trials,
    test_size=0.2,
    random_state=42
)
```

### 3. Train Model

```bash
python train_parametric.py \
  --train_dir ../training_data/parsed \
  --val_dir ../validation_data/parsed \
  --epochs 200
```

See [SYS_14_NEURAL_ACCELERATION](../system/SYS_14_NEURAL_ACCELERATION.md) for full training workflow.

---

## Troubleshooting

| Symptom | Cause | Solution |
|---------|-------|----------|
| No export directory | Export not enabled | Add `training_data_export` to config |
| Missing OP2 files | Solve failed | Check `include_failed: false` |
| Incomplete metadata | Extraction error | Check extractor logs |
| Low sample count | Too many failures | Relax constraints |

---

## Cross-References

- **Related**: [SYS_14_NEURAL_ACCELERATION](../system/SYS_14_NEURAL_ACCELERATION.md)
- **Preceded By**: [OP_02_RUN_OPTIMIZATION](./OP_02_RUN_OPTIMIZATION.md)
- **Skill**: `.claude/skills/modules/neural-acceleration.md`

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-05 | Initial release |