# OP_05: Export Training Data
<!--
PROTOCOL: Export Neural Network Training Data
LAYER: Operations
VERSION: 1.0
STATUS: Active
LAST_UPDATED: 2025-12-05
PRIVILEGE: user
LOAD_WITH: [SYS_14_NEURAL_ACCELERATION]
-->
## Overview
This protocol covers exporting FEA simulation data for training neural network surrogates. Proper data export enables Protocol 14 (Neural Acceleration).
---
## When to Use
| Trigger | Action |
|---------|--------|
| "export training data" | Follow this protocol |
| "neural network data" | Follow this protocol |
| Planning >50 trials | Consider export for acceleration |
| Want to train surrogate | Follow this protocol |
---
## Quick Reference
**Export Command**:
```bash
python run_optimization.py --export-training
```
**Output Structure**:
```
atomizer_field_training_data/{study_name}/
├── trial_0001/
│   ├── input/model.bdf
│   ├── output/model.op2
│   └── metadata.json
├── trial_0002/
│   └── ...
└── study_summary.json
```
**Recommended Data Volume**:
| Complexity | Training Samples | Validation Samples |
|------------|-----------------|-------------------|
| Simple (2-3 params) | 50-100 | 20-30 |
| Medium (4-6 params) | 100-200 | 30-50 |
| Complex (7+ params) | 200-500 | 50-100 |
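When planning a study, the table above can be encoded in a small lookup helper (hypothetical, not part of the Atomizer toolchain) so scripts can budget trials automatically:

```python
# Hypothetical helper encoding the recommended data volumes above.
def recommended_samples(n_params: int) -> dict:
    """Return (min, max) training/validation sample ranges for a parameter count."""
    if n_params <= 3:           # simple
        return {"train": (50, 100), "val": (20, 30)}
    if n_params <= 6:           # medium
        return {"train": (100, 200), "val": (30, 50)}
    return {"train": (200, 500), "val": (50, 100)}  # complex

print(recommended_samples(4))  # medium complexity: {'train': (100, 200), 'val': (30, 50)}
```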
---
## Configuration
### Enable Export in Config
Add to `optimization_config.json`:
```json
{
  "training_data_export": {
    "enabled": true,
    "export_dir": "atomizer_field_training_data/my_study",
    "export_bdf": true,
    "export_op2": true,
    "export_fields": ["displacement", "stress"],
    "include_failed": false
  }
}
```
### Configuration Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | bool | false | Enable export |
| `export_dir` | string | - | Output directory |
| `export_bdf` | bool | true | Save Nastran input |
| `export_op2` | bool | true | Save binary results |
| `export_fields` | list | all | Which result fields |
| `include_failed` | bool | false | Include failed trials |
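The export block can also be added to the config programmatically rather than by hand. A minimal sketch, assuming `optimization_config.json` sits in the study directory as in Step 1:

```python
import json
from pathlib import Path

# Sketch: enable training-data export in an existing optimization_config.json.
# Keys and values mirror the example above; adjust paths for your study layout.
config_path = Path("optimization_config.json")
config = json.loads(config_path.read_text()) if config_path.exists() else {}

config["training_data_export"] = {
    "enabled": True,
    "export_dir": "atomizer_field_training_data/my_study",
    "export_bdf": True,
    "export_op2": True,
    "export_fields": ["displacement", "stress"],
    "include_failed": False,
}

config_path.write_text(json.dumps(config, indent=2))
```

This preserves any other settings already in the file and only overwrites the `training_data_export` section.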
---
## Export Workflow
### Step 1: Run with Export Enabled
```bash
conda activate atomizer
cd studies/my_study
python run_optimization.py --export-training
```
Alternatively, run a standard optimization with export enabled in the config file.
### Step 2: Verify Export
```bash
ls atomizer_field_training_data/my_study/
# Should see trial_0001/, trial_0002/, etc.
# Check a trial
ls atomizer_field_training_data/my_study/trial_0001/
# input/model.bdf
# output/model.op2
# metadata.json
```
### Step 3: Check Metadata
```bash
cat atomizer_field_training_data/my_study/trial_0001/metadata.json
```
```json
{
  "trial_number": 1,
  "design_parameters": {
    "thickness": 5.2,
    "width": 30.0
  },
  "objectives": {
    "mass": 0.234,
    "max_stress": 198.5
  },
  "constraints_satisfied": true,
  "simulation_time": 145.2
}
```
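Incomplete metadata breaks downstream parsing, so it is worth validating the fields shown above before Step 4. A small sketch (`check_metadata` is a hypothetical helper; the key set is just the fields from the example):

```python
import json
from pathlib import Path

# Fields every exported metadata.json should carry (per the example above).
REQUIRED_KEYS = {"trial_number", "design_parameters", "objectives",
                 "constraints_satisfied", "simulation_time"}

def check_metadata(path: Path) -> list:
    """Return a sorted list of required keys missing from a metadata.json file."""
    meta = json.loads(path.read_text())
    return sorted(REQUIRED_KEYS - meta.keys())
```

An empty return value means the trial's metadata is structurally complete.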
### Step 4: Check Study Summary
```bash
cat atomizer_field_training_data/my_study/study_summary.json
```
```json
{
  "study_name": "my_study",
  "total_trials": 50,
  "successful_exports": 47,
  "failed_exports": 3,
  "design_parameters": ["thickness", "width"],
  "objectives": ["mass", "max_stress"],
  "export_timestamp": "2025-12-05T15:30:00"
}
```
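The summary makes it easy to flag studies where too many trials failed to export. A minimal sketch, assuming the summary schema shown above:

```python
import json
from pathlib import Path

def export_success_rate(summary_path: Path) -> float:
    """Fraction of trials successfully exported, read from study_summary.json."""
    summary = json.loads(summary_path.read_text())
    return summary["successful_exports"] / summary["total_trials"]
```

For the summary above this returns 47 / 50 = 0.94; rates well below ~0.9 usually indicate over-tight constraints (see Troubleshooting).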
---
## Data Quality Checks
### Verify Sample Count
```python
from pathlib import Path

export_dir = Path("atomizer_field_training_data/my_study")
trials = sorted(export_dir.glob("trial_*"))
print(f"Exported trials: {len(trials)}")

# Check for missing files
for trial_dir in trials:
    bdf = trial_dir / "input" / "model.bdf"
    op2 = trial_dir / "output" / "model.op2"
    meta = trial_dir / "metadata.json"
    if not all([bdf.exists(), op2.exists(), meta.exists()]):
        print(f"Missing files in {trial_dir}")
```
### Check Parameter Coverage
```python
import json
from pathlib import Path

import pandas as pd

export_dir = Path("atomizer_field_training_data/my_study")

# Load design parameters from every trial's metadata
params = []
for trial_dir in sorted(export_dir.glob("trial_*")):
    with open(trial_dir / "metadata.json") as f:
        meta = json.load(f)
    params.append(meta["design_parameters"])

# Summarize coverage
df = pd.DataFrame(params)
print(df.describe())

# Look for gaps at the parameter bounds
for col in df.columns:
    print(f"{col}: min={df[col].min():.2f}, max={df[col].max():.2f}")
```
---
## Space-Filling Sampling
For best neural network training, use space-filling designs:
### Latin Hypercube Sampling
```python
from scipy.stats import qmc
# Generate space-filling samples
n_samples = 100
n_params = 4
sampler = qmc.LatinHypercube(d=n_params)
samples = sampler.random(n=n_samples)
# Scale to parameter bounds
lower = [2.0, 20.0, 5.0, 1.0]
upper = [10.0, 50.0, 15.0, 5.0]
scaled = qmc.scale(samples, lower, upper)
```
### Sobol Sequence
```python
sampler = qmc.Sobol(d=n_params)
samples = sampler.random_base2(m=7)  # 2**7 = 128 points; Sobol balance needs powers of 2
scaled = qmc.scale(samples, lower, upper)
```
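Either sampler's coverage can be checked with SciPy's discrepancy measure (lower means more uniform coverage of the unit cube). This sketch compares Latin Hypercube samples against plain pseudo-random sampling:

```python
import numpy as np
from scipy.stats import qmc

# Compare space-filling quality: LHS vs. plain pseudo-random samples.
n_samples, n_params = 100, 4
lhs = qmc.LatinHypercube(d=n_params, seed=42).random(n=n_samples)
rand = np.random.default_rng(42).random((n_samples, n_params))

# Centered discrepancy: lower = more uniform coverage
print(f"LHS discrepancy:    {qmc.discrepancy(lhs):.5f}")
print(f"Random discrepancy: {qmc.discrepancy(rand):.5f}")
```

LHS typically scores noticeably lower, which is why space-filling designs give better surrogate training data than the samples an optimizer happens to visit.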
---
## Next Steps After Export
### 1. Parse to Neural Format
```bash
cd atomizer-field
python batch_parser.py ../atomizer_field_training_data/my_study
```
### 2. Split Train/Validation
```python
from pathlib import Path

from sklearn.model_selection import train_test_split

export_dir = Path("atomizer_field_training_data/my_study")
all_trials = sorted(export_dir.glob("trial_*"))

# 80/20 split
train_trials, val_trials = train_test_split(
    all_trials,
    test_size=0.2,
    random_state=42,
)
```
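To keep the split reproducible across the parse and train steps, the assignment can be written to disk. A sketch (the trial names below are illustrative; in practice glob them from the export directory as above):

```python
from pathlib import Path

from sklearn.model_selection import train_test_split

# Illustrative trial names; in practice use the export directory's trial_* dirs.
all_trials = [f"trial_{i:04d}" for i in range(1, 51)]
train_trials, val_trials = train_test_split(all_trials, test_size=0.2, random_state=42)

# Persist the split so later batch_parser/training runs use identical sets
Path("train_split.txt").write_text("\n".join(sorted(train_trials)) + "\n")
Path("val_split.txt").write_text("\n".join(sorted(val_trials)) + "\n")
```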
### 3. Train Model
```bash
python train_parametric.py \
--train_dir ../training_data/parsed \
--val_dir ../validation_data/parsed \
--epochs 200
```
See [SYS_14_NEURAL_ACCELERATION](../system/SYS_14_NEURAL_ACCELERATION.md) for full training workflow.
---
## Troubleshooting
| Symptom | Cause | Solution |
|---------|-------|----------|
| No export directory | Export not enabled | Add `training_data_export` to config |
| Missing OP2 files | Solve failed | Expected when `include_failed` is false; check solver logs for the failed trials |
| Incomplete metadata | Extraction error | Check extractor logs |
| Low sample count | Too many failures | Relax constraints |
---
## Cross-References
- **Related**: [SYS_14_NEURAL_ACCELERATION](../system/SYS_14_NEURAL_ACCELERATION.md)
- **Preceded By**: [OP_02_RUN_OPTIMIZATION](./OP_02_RUN_OPTIMIZATION.md)
- **Skill**: `.claude/skills/modules/neural-acceleration.md`
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-05 | Initial release |