# OP_05: Export Training Data

## Overview

This protocol covers exporting FEA simulation data for training neural network surrogates. Proper data export enables Protocol 14 (Neural Acceleration).

---

## When to Use

| Trigger | Action |
|---------|--------|
| "export training data" | Follow this protocol |
| "neural network data" | Follow this protocol |
| Planning >50 trials | Consider export for acceleration |
| Want to train surrogate | Follow this protocol |

---

## Quick Reference

**Export Command**:
```bash
python run_optimization.py --export-training
```

**Output Structure**:
```
atomizer_field_training_data/{study_name}/
├── trial_0001/
│   ├── input/model.bdf
│   ├── output/model.op2
│   └── metadata.json
├── trial_0002/
│   └── ...
└── study_summary.json
```

**Recommended Data Volume**:

| Complexity | Training Samples | Validation Samples |
|------------|-----------------|-------------------|
| Simple (2-3 params) | 50-100 | 20-30 |
| Medium (4-6 params) | 100-200 | 30-50 |
| Complex (7+ params) | 200-500 | 50-100 |

---

## Configuration

### Enable Export in Config

Add to `optimization_config.json`:

```json
{
  "training_data_export": {
    "enabled": true,
    "export_dir": "atomizer_field_training_data/my_study",
    "export_bdf": true,
    "export_op2": true,
    "export_fields": ["displacement", "stress"],
    "include_failed": false
  }
}
```

### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | bool | false | Enable export |
| `export_dir` | string | - | Output directory |
| `export_bdf` | bool | true | Save Nastran input |
| `export_op2` | bool | true | Save binary results |
| `export_fields` | list | all | Which result fields |
| `include_failed` | bool | false | Include failed trials |

---

## Export Workflow

### Step 1: Run with Export Enabled

```bash
conda activate atomizer
cd studies/my_study
python run_optimization.py --export-training
```

Or run standard optimization with config export enabled.
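For the config-driven route, the export block can also be switched on from a script rather than by hand. The following is a minimal sketch, assuming `optimization_config.json` is plain JSON laid out as in the Configuration section; the helper name `enable_training_export` is hypothetical and not part of Atomizer.

```python
import json
from pathlib import Path


def enable_training_export(config_path, export_dir):
    """Enable the training_data_export block in an optimization config.

    Hypothetical helper: the key names come from the Configuration section,
    the function itself is only an illustration.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())

    # Reuse an existing export block if present, otherwise create one
    export_cfg = config.setdefault("training_data_export", {})
    export_cfg["enabled"] = True
    export_cfg["export_dir"] = str(export_dir)

    # Fill in documented defaults without overwriting user choices
    export_cfg.setdefault("export_bdf", True)
    export_cfg.setdefault("export_op2", True)
    export_cfg.setdefault("include_failed", False)

    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return export_cfg


# Example call (run from the study directory):
# enable_training_export("optimization_config.json",
#                        "atomizer_field_training_data/my_study")
```

The helper leaves all other config keys untouched and leaves `export_fields` unset, which per the options table means all result fields are exported.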

### Step 2: Verify Export

```bash
ls atomizer_field_training_data/my_study/
# Should see trial_0001/, trial_0002/, etc.

# Check a trial
ls atomizer_field_training_data/my_study/trial_0001/
# input/model.bdf
# output/model.op2
# metadata.json
```

### Step 3: Check Metadata

```bash
cat atomizer_field_training_data/my_study/trial_0001/metadata.json
```

```json
{
  "trial_number": 1,
  "design_parameters": {
    "thickness": 5.2,
    "width": 30.0
  },
  "objectives": {
    "mass": 0.234,
    "max_stress": 198.5
  },
  "constraints_satisfied": true,
  "simulation_time": 145.2
}
```

### Step 4: Check Study Summary

```bash
cat atomizer_field_training_data/my_study/study_summary.json
```

```json
{
  "study_name": "my_study",
  "total_trials": 50,
  "successful_exports": 47,
  "failed_exports": 3,
  "design_parameters": ["thickness", "width"],
  "objectives": ["mass", "max_stress"],
  "export_timestamp": "2025-12-05T15:30:00"
}
```

---

## Data Quality Checks

### Verify Sample Count

```python
from pathlib import Path

export_dir = Path("atomizer_field_training_data/my_study")
trials = sorted(export_dir.glob("trial_*"))
print(f"Exported trials: {len(trials)}")

# Check for missing files
for trial_dir in trials:
    bdf = trial_dir / "input" / "model.bdf"
    op2 = trial_dir / "output" / "model.op2"
    meta = trial_dir / "metadata.json"

    if not all([bdf.exists(), op2.exists(), meta.exists()]):
        print(f"Missing files in {trial_dir}")
```

### Check Parameter Coverage

```python
import json
from pathlib import Path

import pandas as pd

export_dir = Path("atomizer_field_training_data/my_study")

# Load the design parameters from every trial's metadata
params = []
for trial_dir in sorted(export_dir.glob("trial_*")):
    with open(trial_dir / "metadata.json") as f:
        meta = json.load(f)
    params.append(meta["design_parameters"])

# Check coverage
df = pd.DataFrame(params)
print(df.describe())

# Look for gaps
for col in df.columns:
    print(f"{col}: min={df[col].min():.2f}, max={df[col].max():.2f}")
```

---

## Space-Filling Sampling

For best neural network training, use space-filling designs:

### Latin Hypercube Sampling

```python
from scipy.stats import qmc

# Generate space-filling samples
n_samples = 100
n_params = 4

sampler = qmc.LatinHypercube(d=n_params)
samples = sampler.random(n=n_samples)

# Scale to parameter bounds
lower = [2.0, 20.0, 5.0, 1.0]
upper = [10.0, 50.0, 15.0, 5.0]
scaled = qmc.scale(samples, lower, upper)
```

### Sobol Sequence

```python
from scipy.stats import qmc

# Reuses n_params, n_samples, lower, and upper from the Latin Hypercube example.
# SciPy warns when n is not a power of 2; a count such as 128 avoids the warning.
sampler = qmc.Sobol(d=n_params)
samples = sampler.random(n=n_samples)
scaled = qmc.scale(samples, lower, upper)
```

---

## Next Steps After Export

### 1. Parse to Neural Format

```bash
cd atomizer-field
python batch_parser.py ../atomizer_field_training_data/my_study
```

### 2. Split Train/Validation

```python
from pathlib import Path

from sklearn.model_selection import train_test_split

# Collect the exported trial directories
all_trials = sorted(Path("atomizer_field_training_data/my_study").glob("trial_*"))

# 80/20 split
train_trials, val_trials = train_test_split(
    all_trials, test_size=0.2, random_state=42
)
```

### 3. Train Model

```bash
python train_parametric.py \
    --train_dir ../training_data/parsed \
    --val_dir ../validation_data/parsed \
    --epochs 200
```

See [SYS_14_NEURAL_ACCELERATION](../system/SYS_14_NEURAL_ACCELERATION.md) for full training workflow.

---

## Troubleshooting

| Symptom | Cause | Solution |
|---------|-------|----------|
| No export directory | Export not enabled | Add `training_data_export` to config and set `enabled: true` |
| Missing OP2 files | Solve failed | Confirm `include_failed` is `false` so failed trials are not exported |
| Incomplete metadata | Extraction error | Check extractor logs |
| Low sample count | Too many failures | Relax constraints |

---

## Cross-References

- **Related**: [SYS_14_NEURAL_ACCELERATION](../system/SYS_14_NEURAL_ACCELERATION.md)
- **Preceded By**: [OP_02_RUN_OPTIMIZATION](./OP_02_RUN_OPTIMIZATION.md)
- **Skill**: `.claude/skills/modules/neural-acceleration.md`

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-05 | Initial release |