# Training Data Export for AtomizerField
## Overview
The Training Data Export feature automatically captures NX Nastran input/output files and metadata during Atomizer optimization runs. This data is used to train AtomizerField neural-network surrogate models that replace slow FEA evaluations (~30 minutes each) with fast predictions (~50 ms).
## Quick Start
Add this configuration to your `workflow_config.json`:
```json
{
  "study_name": "my_optimization",
  "design_variables": [...],
  "objectives": [...],
  "training_data_export": {
    "enabled": true,
    "export_dir": "atomizer_field_training_data/my_study_001"
  }
}
```
Run your optimization as normal:
```bash
cd studies/my_optimization
python run_optimization.py
```
The training data will be automatically exported to the specified directory.
## How It Works
### During Optimization
After each trial:
1. **FEA Solve Completes**: NX Nastran generates `.dat` (input deck) and `.op2` (binary results) files
2. **Results Extraction**: Atomizer extracts objectives, constraints, and other metrics
3. **Data Export**: The exporter copies the NX files and creates metadata
4. **Trial Directory Created**: Structured directory with input, output, and metadata
### After Optimization
When optimization completes:
1. **Finalize Called**: Creates `study_summary.json` with overall study metadata
2. **README Generated**: Instructions for using the data with AtomizerField
3. **Ready for Training**: Data is structured for AtomizerField batch parser
## Directory Structure
After running an optimization with training data export enabled:
```
atomizer_field_training_data/my_study_001/
├── trial_0001/
│   ├── input/
│   │   └── model.bdf      # NX Nastran input deck (BDF format)
│   ├── output/
│   │   └── model.op2      # NX Nastran binary results (OP2 format)
│   └── metadata.json      # Design parameters, objectives, constraints
├── trial_0002/
│   └── ...
├── trial_0003/
│   └── ...
├── study_summary.json     # Overall study metadata
└── README.md              # Usage instructions
```
### metadata.json Format
Each trial's `metadata.json` contains:
```json
{
  "trial_number": 42,
  "timestamp": "2025-01-15T10:30:45.123456",
  "atomizer_study": "my_optimization",
  "design_parameters": {
    "thickness": 3.5,
    "width": 50.0,
    "length": 200.0
  },
  "results": {
    "objectives": {
      "max_stress": 245.3,
      "mass": 1.25
    },
    "constraints": {
      "stress_limit": -54.7
    },
    "max_displacement": 1.23
  }
}
```
### study_summary.json Format
The `study_summary.json` file contains:
```json
{
  "study_name": "my_optimization",
  "total_trials": 100,
  "design_variables": ["thickness", "width", "length"],
  "objectives": ["max_stress", "mass"],
  "constraints": ["stress_limit"],
  "export_timestamp": "2025-01-15T12:00:00.000000",
  "metadata": {
    "atomizer_version": "1.0",
    "optimization_algorithm": "NSGA-II",
    "n_trials": 100
  }
}
```
## Configuration Options
### Basic Configuration
```json
"training_data_export": {
  "enabled": true,
  "export_dir": "path/to/export/directory"
}
```
**Parameters:**
- `enabled` (required): `true` to enable export, `false` to disable
- `export_dir` (required if enabled): Path to export directory (relative or absolute)
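The factory behaviour implied by these two parameters can be sketched in a few lines of standard-library Python (`resolve_export_config` is a hypothetical helper for illustration only; the real entry point is `create_exporter_from_config`, documented under API Reference below):

```python
import json
from pathlib import Path

def resolve_export_config(config: dict):
    """Return the export directory as a Path if export is enabled, else None.

    Illustrative helper mirroring the documented behaviour: export happens
    only when the 'training_data_export' block exists and 'enabled' is true.
    """
    export_cfg = config.get("training_data_export", {})
    if not export_cfg.get("enabled", False):
        return None
    return Path(export_cfg["export_dir"])

# Parse a config exactly as it appears in workflow_config.json
config = json.loads("""
{
  "study_name": "my_optimization",
  "training_data_export": {
    "enabled": true,
    "export_dir": "atomizer_field_training_data/my_study_001"
  }
}
""")
print(resolve_export_config(config))
```

Omitting the `training_data_export` block entirely, or setting `enabled` to `false`, yields `None` and disables export.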
### Recommended Directory Structure
For organizing multiple studies:
```
atomizer_field_training_data/
├── beam_study_001/        # First beam optimization
│   └── trial_0001/ ...
├── beam_study_002/        # Second beam optimization (different parameters)
│   └── trial_0001/ ...
├── bracket_study_001/     # Bracket optimization
│   └── trial_0001/ ...
└── plate_study_001/       # Plate optimization
    └── trial_0001/ ...
```
## Using Exported Data with AtomizerField
### Step 1: Parse Training Data
Convert BDF/OP2 files to PyTorch Geometric format:
```bash
cd Atomizer-Field
python batch_parser.py --data-dir "../Atomizer/atomizer_field_training_data/my_study_001"
```
This creates graph representations of the FEA data suitable for GNN training.
### Step 2: Validate Parsed Data
Ensure data was parsed correctly:
```bash
python validate_parsed_data.py
```
### Step 3: Train Neural Network
Train the GNN surrogate model:
```bash
python train.py --data-dir "training_data/parsed/" --epochs 200
```
### Step 4: Use Trained Model in Atomizer
Enable neural network surrogate in your optimization:
```bash
cd ../Atomizer
python run_optimization.py --config studies/my_study/workflow_config.json --use-neural
```
## Integration Points
The training data exporter hooks into Atomizer's optimization flow at three points: runner initialization, per-trial export, and finalization.
### In `optimization_engine/runner.py`:
```python
from optimization_engine.training_data_exporter import create_exporter_from_config


class OptimizationRunner:
    def __init__(self, config_path):
        # ... existing initialization ...

        # Initialize training data exporter (if enabled)
        self.training_data_exporter = create_exporter_from_config(self.config)
        if self.training_data_exporter:
            print(f"Training data export enabled: {self.training_data_exporter.export_dir}")

    def objective(self, trial):
        # ... simulation and results extraction ...

        # Export training data (if enabled)
        if self.training_data_exporter:
            simulation_files = {
                'dat_file': path_to_dat,
                'op2_file': path_to_op2
            }
            self.training_data_exporter.export_trial(
                trial_number=trial.number,
                design_variables=design_vars,
                results=extracted_results,
                simulation_files=simulation_files
            )

    def run(self):
        # ... optimization loop ...

        # Finalize training data export (if enabled)
        if self.training_data_exporter:
            self.training_data_exporter.finalize()
```
## File Formats
### BDF (.bdf) - Nastran Bulk Data File
- **Format**: ASCII text
- **Contains**:
- Mesh geometry (nodes, elements)
- Material properties
- Loads and boundary conditions
- Analysis parameters
### OP2 (.op2) - Nastran Output2
- **Format**: Binary
- **Contains**:
- Displacements
- Stresses (von Mises, principal, etc.)
- Strains
- Reaction forces
- Modal results (if applicable)
### JSON (.json) - Metadata
- **Format**: UTF-8 JSON
- **Contains**:
- Design parameter values
- Objective function values
- Constraint values
- Trial metadata (number, timestamp, study name)
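Because each `metadata.json` is plain UTF-8 JSON, a study can be scanned with the standard library alone. A minimal sketch, assuming only the metadata layout shown earlier (`collect_trials` is an illustrative helper, not part of Atomizer):

```python
import json
from pathlib import Path

def collect_trials(export_dir: Path):
    """Yield (design_parameters, objectives) from every trial's metadata.json.

    Field names follow the metadata.json format documented above.
    """
    for meta_path in sorted(export_dir.glob("trial_*/metadata.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        yield meta["design_parameters"], meta["results"]["objectives"]
```

Iterating over this generator is a quick way to build design/objective arrays for a sanity plot before committing to full GNN training.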
## Example: Complete Workflow
### 1. Create Optimization Study
```python
import json
from pathlib import Path

config = {
    "study_name": "beam_optimization",
    "sim_file": "examples/Models/Beam/Beam.sim",
    "fem_file": "examples/Models/Beam/Beam_fem1.fem",
    "design_variables": [
        {"name": "thickness", "expression_name": "thickness", "min": 2.0, "max": 8.0},
        {"name": "width", "expression_name": "width", "min": 20.0, "max": 60.0}
    ],
    "objectives": [
        {
            "name": "max_stress",
            "type": "minimize",
            "extractor": {"type": "result_parameter", "parameter_name": "Max Von Mises Stress"}
        },
        {
            "name": "mass",
            "type": "minimize",
            "extractor": {"type": "expression", "expression_name": "mass"}
        }
    ],
    "optimization": {
        "algorithm": "NSGA-II",
        "n_trials": 100
    },
    # Enable training data export
    "training_data_export": {
        "enabled": True,
        "export_dir": "atomizer_field_training_data/beam_study_001"
    }
}

# Save config
config_path = Path("studies/beam_optimization/1_setup/workflow_config.json")
config_path.parent.mkdir(parents=True, exist_ok=True)
with open(config_path, 'w') as f:
    json.dump(config, f, indent=2)
```
### 2. Run Optimization
```bash
cd studies/beam_optimization
python run_optimization.py
```
Console output will show:
```
Training data export enabled: atomizer_field_training_data/beam_study_001
...
Training data export finalized: 100 trials exported
```
### 3. Verify Export
```bash
dir atomizer_field_training_data\beam_study_001
```
You should see:
```
trial_0001/
trial_0002/
...
trial_0100/
study_summary.json
README.md
```
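Beyond eyeballing the listing, a short script can confirm that every trial directory is complete (`incomplete_trials` is an illustrative helper; the expected file names follow the directory structure documented earlier):

```python
from pathlib import Path

# Files every exported trial directory should contain
REQUIRED = ["input/model.bdf", "output/model.op2", "metadata.json"]

def incomplete_trials(export_dir):
    """Return trial directory names missing any expected file."""
    bad = []
    for trial_dir in sorted(Path(export_dir).glob("trial_*")):
        if any(not (trial_dir / rel).exists() for rel in REQUIRED):
            bad.append(trial_dir.name)
    return bad

# Example usage (path is illustrative):
# print(incomplete_trials("atomizer_field_training_data/beam_study_001"))
```

An empty return value means every trial is complete; anything listed is worth re-checking against the Troubleshooting section below.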
### 4. Train AtomizerField
```bash
cd Atomizer-Field
python batch_parser.py --data-dir "../Atomizer/atomizer_field_training_data/beam_study_001"
python train.py --data-dir "training_data/parsed/" --epochs 200
```
## Troubleshooting
### No .dat or .op2 Files Found
**Problem**: Export logs show "dat file not found" or "op2 file not found"
**Solution**:
- Ensure NX Nastran solver is writing these files
- Check NX simulation settings
- Verify file paths in `result_path`
### Export Directory Permission Error
**Problem**: `PermissionError` when creating export directory
**Solution**:
- Use absolute path or path relative to Atomizer root
- Ensure write permissions for the target directory
- Check disk space
### Missing Metadata Fields
**Problem**: `metadata.json` doesn't contain expected fields
**Solution**:
- Verify extractors are configured correctly in `workflow_config.json`
- Check that results are being extracted before export
- Review `extracted_results` dict in runner
### Large File Sizes
**Problem**: Export directory grows very large
**Solution**:
- OP2 files can be large (5-50 MB per trial is typical)
- For 1000 trials, expect on the order of 6-60 GB of training data (see Storage Requirements below)
- Use compression or cloud storage for large datasets
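Until built-in compression lands (see Future Enhancements), a post-run pass with the standard library can shrink a finished study. A sketch that gzips each `.op2` in place (`gzip_op2_files` is an illustrative helper; it deletes originals by default, so run it only on backed-up data):

```python
import gzip
import shutil
from pathlib import Path

def gzip_op2_files(export_dir, remove_original=True):
    """Compress every .op2 under export_dir to .op2.gz; return total bytes saved."""
    saved = 0
    for op2 in Path(export_dir).rglob("*.op2"):
        gz = op2.parent / (op2.name + ".gz")
        # Stream-copy so large result files never load fully into memory
        with open(op2, "rb") as src, gzip.open(gz, "wb") as dst:
            shutil.copyfileobj(src, dst)
        saved += op2.stat().st_size - gz.stat().st_size
        if remove_original:
            op2.unlink()
    return saved
```

Note that a batch parser would then need to decompress (or read through `gzip.open`) before handing files to the OP2 reader.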
## Performance Considerations
### Disk I/O
- Each trial export involves 2 file copies (.dat and .op2)
- Minimal overhead (~100-500ms per trial)
- Negligible compared to FEA solve time (30 minutes)
### Storage Requirements
Typical file sizes per trial:
- `.dat` file: 1-10 MB (depends on mesh density)
- `.op2` file: 5-50 MB (depends on results requested)
- `metadata.json`: 1-5 KB
- For 100 trials: ~600 MB - 6 GB
- For 1000 trials: ~6 GB - 60 GB
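These ranges are just the per-trial sizes multiplied out; a tiny helper makes the arithmetic explicit (illustrative only; the default ranges are the `.dat`/`.op2` figures listed above, with the negligible `metadata.json` ignored):

```python
def storage_range_mb(n_trials, dat_mb=(1, 10), op2_mb=(5, 50)):
    """Return (low, high) estimated storage in MB for n_trials exports."""
    low = n_trials * (dat_mb[0] + op2_mb[0])
    high = n_trials * (dat_mb[1] + op2_mb[1])
    return low, high

print(storage_range_mb(100))   # (600, 6000)   -> ~600 MB to ~6 GB
print(storage_range_mb(1000))  # (6000, 60000) -> ~6 GB to ~60 GB
```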
## API Reference
### TrainingDataExporter Class
```python
from pathlib import Path

from optimization_engine.training_data_exporter import TrainingDataExporter

exporter = TrainingDataExporter(
    export_dir=Path("training_data/study_001"),
    study_name="my_study",
    design_variable_names=["thickness", "width"],
    objective_names=["stress", "mass"],
    constraint_names=["stress_limit"],  # Optional
    metadata={"version": "1.0"}         # Optional
)
```
#### Methods
**export_trial(trial_number, design_variables, results, simulation_files)**
Export training data for a single trial.
- `trial_number` (int): Optuna trial number
- `design_variables` (dict): Design parameter names and values
- `results` (dict): Objectives, constraints, and other results
- `simulation_files` (dict): Paths to 'dat_file' and 'op2_file'
Returns `True` if successful, `False` otherwise.
**finalize()**
Finalize export by creating `study_summary.json`.
### Factory Function
**create_exporter_from_config(config)**
Create exporter from workflow configuration dict.
- `config` (dict): Workflow configuration
Returns `TrainingDataExporter` if enabled, `None` otherwise.
## Best Practices
### 1. Organize by Study Type
Group related studies together:
```
atomizer_field_training_data/
├── beams/
│   ├── cantilever_001/
│   ├── cantilever_002/
│   └── simply_supported_001/
└── brackets/
    ├── L_bracket_001/
    └── T_bracket_001/
```
### 2. Use Descriptive Names
Include important parameters in study names:
```
beam_study_thickness_2-8_width_20-60_100trials
```
### 3. Version Your Studies
Track changes to design space or objectives:
```
bracket_study_001 # Initial study
bracket_study_002 # Expanded design space
bracket_study_003 # Added constraint
```
### 4. Document Metadata
Add custom metadata to track study details:
```json
"metadata": {
  "description": "Initial beam study with basic design variables",
  "date": "2025-01-15",
  "engineer": "Your Name",
  "validation_status": "pending"
}
```
### 5. Backup Training Data
Training data is valuable:
- Expensive to generate (hours/days of computation)
- Back up to cloud storage
- Consider version control for study configurations
## Future Enhancements
Planned improvements:
- [ ] Incremental export (resume after crash)
- [ ] Compression options (gzip .dat and .op2 files)
- [ ] Cloud upload integration (S3, Azure Blob)
- [ ] Export filtering (only export Pareto-optimal trials)
- [ ] Multi-fidelity support (tag high/low fidelity trials)
## See Also
- [AtomizerField Documentation](../../Atomizer-Field/docs/)
- [How to Extend Optimization](HOW_TO_EXTEND_OPTIMIZATION.md)
- [Hybrid Mode Guide](HYBRID_MODE_GUIDE.md)
## Support
For issues or questions:
1. Check the troubleshooting section above
2. Review [AtomizerField integration test plan](../Atomizer-Field/AtomizerField_Integration_Test_Plan.md)
3. Open an issue on GitHub with:
- Your `workflow_config.json`
- Export logs
- Error messages