Atomizer/docs/04_USER_GUIDES/TRAINING_DATA_EXPORT_GUIDE.md
Anto01 e3bdb08a22 feat: Major update with validators, skills, dashboard, and docs reorganization
2025-11-25 19:23:58 -05:00

Training Data Export for AtomizerField

Overview

The Training Data Export feature automatically captures NX Nastran input and output files, together with trial metadata, during Atomizer optimization runs. The exported data is used to train AtomizerField neural network surrogate models, which can replace slow FEA evaluations (about 30 minutes each) with fast predictions (about 50 ms each).

Quick Start

Add this configuration to your workflow_config.json:

{
  "study_name": "my_optimization",
  "design_variables": [...],
  "objectives": [...],

  "training_data_export": {
    "enabled": true,
    "export_dir": "atomizer_field_training_data/my_study_001"
  }
}

Run your optimization as normal:

cd studies/my_optimization
python run_optimization.py

The training data will be automatically exported to the specified directory.
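If you prefer to switch export on from a script rather than editing the file by hand, the following is a minimal sketch. The helper name `enable_training_export` is hypothetical and not part of Atomizer; it only assumes that `workflow_config.json` is a plain JSON object with the keys shown above.

```python
import json
from pathlib import Path


def enable_training_export(config_path, export_dir):
    """Add or update the training_data_export block in a workflow config file."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))

    # Only this block is touched; every other key is preserved as-is.
    config["training_data_export"] = {
        "enabled": True,
        "export_dir": str(export_dir),
    }

    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Calling `enable_training_export("workflow_config.json", "atomizer_field_training_data/my_study_001")` rewrites the file in place, so keep the config under version control if you want to review the change.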

How It Works

During Optimization

After each trial:

  1. FEA Solve Completes: NX Nastran generates .dat (input deck) and .op2 (binary results) files
  2. Results Extraction: Atomizer extracts objectives, constraints, and other metrics
  3. Trial Directory Created: The exporter creates a trial_NNNN directory with input/ and output/ subdirectories
  4. Data Export: The exporter copies the .dat and .op2 files into that directory and writes metadata.json
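The per-trial steps above can be sketched roughly as follows. This is an illustration of the idea, not the actual exporter code: the function name `export_trial_sketch` is hypothetical, and it assumes the directory name is derived directly from the trial number passed in.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path


def export_trial_sketch(export_dir, study_name, trial_number,
                        design_variables, results, dat_file, op2_file):
    """Copy one trial's solver files into the documented layout (illustrative only)."""
    trial_dir = Path(export_dir) / f"trial_{trial_number:04d}"
    (trial_dir / "input").mkdir(parents=True, exist_ok=True)
    (trial_dir / "output").mkdir(parents=True, exist_ok=True)

    # The documented layout stores the input deck as model.bdf and results as model.op2.
    shutil.copy2(dat_file, trial_dir / "input" / "model.bdf")
    shutil.copy2(op2_file, trial_dir / "output" / "model.op2")

    metadata = {
        "trial_number": trial_number,
        "timestamp": datetime.now().isoformat(),
        "atomizer_study": study_name,
        "design_parameters": design_variables,
        "results": results,
    }
    (trial_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return trial_dir
```

`shutil.copy2` is used here so that file timestamps survive the copy, which can help when tracing a trial back to its solver run.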

After Optimization

When optimization completes:

  1. Finalize Called: Creates study_summary.json with overall study metadata
  2. README Generated: Instructions for using the data with AtomizerField
  3. Ready for Training: Data is structured for AtomizerField batch parser

Directory Structure

After running an optimization with training data export enabled:

atomizer_field_training_data/my_study_001/
├── trial_0001/
│   ├── input/
│   │   └── model.bdf          # NX Nastran input deck (BDF format)
│   ├── output/
│   │   └── model.op2          # NX Nastran binary results (OP2 format)
│   └── metadata.json          # Design parameters, objectives, constraints
├── trial_0002/
│   └── ...
├── trial_0003/
│   └── ...
├── study_summary.json         # Overall study metadata
└── README.md                  # Usage instructions
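To confirm that an export matches this layout, a small check like the one below may help. It is a sketch that relies only on the file names shown in the tree above; `find_incomplete_trials` is a hypothetical helper, not an Atomizer function.

```python
from pathlib import Path

# Relative paths every trial directory is expected to contain.
EXPECTED_FILES = ("input/model.bdf", "output/model.op2", "metadata.json")


def find_incomplete_trials(export_dir):
    """Return {trial directory name: [missing relative paths]} for incomplete trials."""
    incomplete = {}
    for trial_dir in sorted(Path(export_dir).glob("trial_*")):
        if not trial_dir.is_dir():
            continue
        missing = [rel for rel in EXPECTED_FILES if not (trial_dir / rel).is_file()]
        if missing:
            incomplete[trial_dir.name] = missing
    return incomplete
```

An empty dictionary means every trial directory contains all three expected files.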

metadata.json Format

Each trial's metadata.json contains:

{
  "trial_number": 42,
  "timestamp": "2025-01-15T10:30:45.123456",
  "atomizer_study": "my_optimization",
  "design_parameters": {
    "thickness": 3.5,
    "width": 50.0,
    "length": 200.0
  },
  "results": {
    "objectives": {
      "max_stress": 245.3,
      "mass": 1.25
    },
    "constraints": {
      "stress_limit": -54.7
    },
    "max_displacement": 1.23
  }
}
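For quick inspection outside AtomizerField, the nested metadata can be flattened into one row per trial. The sketch below assumes only the structure shown above; the function names are hypothetical.

```python
import json
from pathlib import Path


def flatten_metadata(metadata):
    """Flatten one metadata.json dict into a single-level row."""
    row = {"trial_number": metadata["trial_number"]}
    row.update(metadata.get("design_parameters", {}))
    for key, value in metadata.get("results", {}).items():
        if isinstance(value, dict):
            # "objectives" and "constraints" are nested one level deep.
            row.update(value)
        else:
            row[key] = value
    return row


def load_rows(export_dir):
    """Load every trial's metadata.json under export_dir as flat rows."""
    paths = sorted(Path(export_dir).glob("trial_*/metadata.json"))
    return [flatten_metadata(json.loads(p.read_text(encoding="utf-8"))) for p in paths]
```

Note that flattening merges names into one namespace, so a design parameter and an objective that share a name would overwrite each other; rename one of them if that applies to your study.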

study_summary.json Format

The study_summary.json file contains:

{
  "study_name": "my_optimization",
  "total_trials": 100,
  "design_variables": ["thickness", "width", "length"],
  "objectives": ["max_stress", "mass"],
  "constraints": ["stress_limit"],
  "export_timestamp": "2025-01-15T12:00:00.000000",
  "metadata": {
    "atomizer_version": "1.0",
    "optimization_algorithm": "NSGA-II",
    "n_trials": 100
  }
}
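One simple sanity check is to compare `total_trials` with the number of trial directories actually on disk. The helper below is a hypothetical sketch that uses only the field and directory names documented here.

```python
import json
from pathlib import Path


def check_trial_count(export_dir):
    """Return (total_trials from study_summary.json, trial directories on disk)."""
    export_dir = Path(export_dir)
    summary_path = export_dir / "study_summary.json"
    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    on_disk = sum(1 for p in export_dir.glob("trial_*") if p.is_dir())
    return summary["total_trials"], on_disk
```

If the two numbers differ, some trials may have failed to export or the run may have stopped before the summary was written.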

Configuration Options

Basic Configuration

"training_data_export": {
  "enabled": true,
  "export_dir": "path/to/export/directory"
}

Parameters:

  • enabled (required): true to enable export, false to disable
  • export_dir (required if enabled): Path to export directory (relative or absolute)
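The two rules above can be checked before a long run starts. The following sketch is not Atomizer's own validator; `validate_export_config` is a hypothetical helper, and treating a missing block as "export disabled" is an assumption.

```python
def validate_export_config(config):
    """Return a list of problems found in the training_data_export block."""
    block = config.get("training_data_export")
    if block is None:
        return []  # Assumption: an absent block means export is disabled.

    problems = []
    enabled = block.get("enabled")
    if not isinstance(enabled, bool):
        problems.append("'enabled' is required and must be true or false")
    if enabled is True and not block.get("export_dir"):
        problems.append("'export_dir' is required when 'enabled' is true")
    return problems
```

An empty list means the block satisfies both rules; anything else is worth fixing before launching a study that may run for hours.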

To keep multiple studies organized, give each study its own numbered export directory:

atomizer_field_training_data/
├── beam_study_001/          # First beam optimization
│   └── trial_0001/ ...
├── beam_study_002/          # Second beam optimization (different parameters)
│   └── trial_0001/ ...
├── bracket_study_001/       # Bracket optimization
│   └── trial_0001/ ...
└── plate_study_001/         # Plate optimization
    └── trial_0001/ ...
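If you follow this numbering scheme, picking the next free directory name can be automated. The sketch below assumes three-digit suffixes as in the tree above; `next_study_dir` is a hypothetical helper.

```python
import re
from pathlib import Path


def next_study_dir(root, prefix):
    """Return the next unused '<prefix>_NNN' directory path under root."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d{{3}})$")
    root = Path(root)
    used = []
    if root.is_dir():
        for child in root.iterdir():
            match = pattern.match(child.name)
            if child.is_dir() and match:
                used.append(int(match.group(1)))
    number = max(used, default=0) + 1
    return root / f"{prefix}_{number:03d}"
```

The function only computes the path; it does not create the directory, so it is safe to call while planning a study.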

Using Exported Data with AtomizerField

Step 1: Parse Training Data

Convert BDF/OP2 files to PyTorch Geometric format:

cd Atomizer-Field
python batch_parser.py --data-dir "../Atomizer/atomizer_field_training_data/my_study_001"

This creates graph representations of the FEA data suitable for GNN training.

Step 2: Validate Parsed Data

Ensure data was parsed correctly:

python validate_parsed_data.py

Step 3: Train Neural Network

Train the GNN surrogate model:

python train.py --data-dir "training_data/parsed/" --epochs 200

Step 4: Use Trained Model in Atomizer

Enable neural network surrogate in your optimization:

cd ../Atomizer
python run_optimization.py --config studies/my_study/workflow_config.json --use-neural

Integration Points

The training data exporter hooks into Atomizer's optimization flow at three points: runner initialization, the end of each trial, and the end of the run.

In optimization_engine/runner.py:

from optimization_engine.training_data_exporter import create_exporter_from_config

class OptimizationRunner:
    def __init__(self, config_path):
        # ... existing initialization ...

        # Initialize training data exporter (if enabled)
        self.training_data_exporter = create_exporter_from_config(self.config)
        if self.training_data_exporter:
            print(f"Training data export enabled: {self.training_data_exporter.export_dir}")

    def objective(self, trial):
        # ... simulation and results extraction ...

        # Export training data (if enabled)
        if self.training_data_exporter:
            simulation_files = {
                'dat_file': path_to_dat,
                'op2_file': path_to_op2
            }
            self.training_data_exporter.export_trial(
                trial_number=trial.number,
                design_variables=design_vars,
                results=extracted_results,
                simulation_files=simulation_files
            )

    def run(self):
        # ... optimization loop ...

        # Finalize training data export (if enabled)
        if self.training_data_exporter:
            self.training_data_exporter.finalize()

File Formats

BDF (.bdf) - Nastran Bulk Data File

  • Format: ASCII text
  • Contains:
    • Mesh geometry (nodes, elements)
    • Material properties
    • Loads and boundary conditions
    • Analysis parameters

OP2 (.op2) - Nastran Output2

  • Format: Binary
  • Contains:
    • Displacements
    • Stresses (von Mises, principal, etc.)
    • Strains
    • Reaction forces
    • Modal results (if applicable)

JSON (.json) - Metadata

  • Format: UTF-8 JSON
  • Contains:
    • Design parameter values
    • Objective function values
    • Constraint values
    • Trial metadata (number, timestamp, study name)

Example: Complete Workflow

1. Create Optimization Study

import json
from pathlib import Path

config = {
    "study_name": "beam_optimization",
    "sim_file": "examples/Models/Beam/Beam.sim",
    "fem_file": "examples/Models/Beam/Beam_fem1.fem",

    "design_variables": [
        {"name": "thickness", "expression_name": "thickness", "min": 2.0, "max": 8.0},
        {"name": "width", "expression_name": "width", "min": 20.0, "max": 60.0}
    ],

    "objectives": [
        {
            "name": "max_stress",
            "type": "minimize",
            "extractor": {"type": "result_parameter", "parameter_name": "Max Von Mises Stress"}
        },
        {
            "name": "mass",
            "type": "minimize",
            "extractor": {"type": "expression", "expression_name": "mass"}
        }
    ],

    "optimization": {
        "algorithm": "NSGA-II",
        "n_trials": 100
    },

    # Enable training data export
    "training_data_export": {
        "enabled": True,
        "export_dir": "atomizer_field_training_data/beam_study_001"
    }
}

# Save config
config_path = Path("studies/beam_optimization/1_setup/workflow_config.json")
config_path.parent.mkdir(parents=True, exist_ok=True)
with open(config_path, 'w') as f:
    json.dump(config, f, indent=2)

2. Run Optimization

cd studies/beam_optimization
python run_optimization.py

Console output will show:

Training data export enabled: atomizer_field_training_data/beam_study_001
...
Training data export finalized: 100 trials exported

3. Verify Export

dir atomizer_field_training_data\beam_study_001

You should see:

trial_0001/
trial_0002/
...
trial_0100/
study_summary.json
README.md

4. Train AtomizerField

cd Atomizer-Field
python batch_parser.py --data-dir "../Atomizer/atomizer_field_training_data/beam_study_001"
python train.py --data-dir "training_data/parsed/" --epochs 200

Troubleshooting

No .dat or .op2 Files Found

Problem: Export logs show "dat file not found" or "op2 file not found"

Solution:

  • Ensure the NX Nastran solver is writing these files
  • Check NX simulation settings
  • Verify file paths in result_path

Export Directory Permission Error

Problem: PermissionError when creating export directory

Solution:

  • Use an absolute path or a path relative to the Atomizer root
  • Ensure write permissions for the target directory
  • Check disk space
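These checks can be run up front with the standard library. The sketch below is a hypothetical pre-flight helper, and the default 1 GB free-space threshold is an arbitrary placeholder.

```python
import os
import shutil
from pathlib import Path


def preflight_export_dir(export_dir, min_free_gb=1.0):
    """Create export_dir if needed, then report write access and free space."""
    export_dir = Path(export_dir).resolve()
    # mkdir raises PermissionError here if the location is not writable.
    export_dir.mkdir(parents=True, exist_ok=True)

    writable = os.access(export_dir, os.W_OK)
    free_gb = shutil.disk_usage(export_dir).free / 1024**3
    return {
        "path": export_dir,
        "writable": writable,
        "free_gb": free_gb,
        "enough_space": free_gb >= min_free_gb,
    }
```

Resolving the path first also shows where a relative `export_dir` really points, which is a common source of confusion when the script is started from a different working directory.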

Missing Metadata Fields

Problem: metadata.json doesn't contain expected fields

Solution:

  • Verify extractors are configured correctly in workflow_config.json
  • Check that results are being extracted before export
  • Review extracted_results dict in runner

Large File Sizes

Problem: Export directory grows very large

Solution:

  • Expect roughly 6-60 MB per trial, most of it the OP2 file (see Storage Requirements)
  • For 1000 trials, that is roughly 6-60 GB of training data
  • Use compression or cloud storage for large datasets
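Until built-in compression is available, archived trials can be compressed with the standard library. This is a sketch with hypothetical helper names; it assumes the documented file layout, and compressed files would likely need to be decompressed again before running the batch parser.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(path, keep_original=False):
    """Compress path to '<name>.gz' next to it and return the new path."""
    path = Path(path)
    target = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if not keep_original:
        path.unlink()
    return target


def compress_trial(trial_dir):
    """Gzip the input deck and OP2 of one trial; metadata.json stays readable."""
    trial_dir = Path(trial_dir)
    compressed = []
    for rel in ("input/model.bdf", "output/model.op2"):
        candidate = trial_dir / rel
        if candidate.is_file():
            compressed.append(gzip_file(candidate))
    return compressed
```

ASCII input decks usually compress well; the savings on binary OP2 files vary with the results they contain.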

Performance Considerations

Disk I/O

  • Each trial export involves 2 file copies (.dat and .op2)
  • Minimal overhead (~100-500ms per trial)
  • Negligible compared to FEA solve time (30 minutes)

Storage Requirements

Typical file sizes per trial:

  • .dat file: 1-10 MB (depends on mesh density)
  • .op2 file: 5-50 MB (depends on results requested)
  • metadata.json: 1-5 KB

  • For 100 trials: ~600 MB - 6 GB
  • For 1000 trials: ~6 GB - 60 GB
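The totals follow directly from the per-trial ranges, as this small calculation shows. It is a sketch that uses decimal units (1 GB = 1000 MB) to match the figures in this section; adjust the constants to your own mesh sizes.

```python
# Per-trial size ranges in MB, taken from the typical file sizes listed above.
DAT_MB = (1.0, 10.0)
OP2_MB = (5.0, 50.0)
METADATA_MB = (0.001, 0.005)  # 1-5 KB


def estimate_storage_gb(n_trials):
    """Return (low, high) storage estimates in GB for n_trials exported trials."""
    low_mb = (DAT_MB[0] + OP2_MB[0] + METADATA_MB[0]) * n_trials
    high_mb = (DAT_MB[1] + OP2_MB[1] + METADATA_MB[1]) * n_trials
    return low_mb / 1000, high_mb / 1000


low, high = estimate_storage_gb(1000)
print(f"1000 trials: {low:.1f}-{high:.1f} GB")  # 1000 trials: 6.0-60.0 GB
```

The metadata files contribute only a few megabytes even at 1000 trials, so the estimate is dominated by the .dat and .op2 files.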

API Reference

TrainingDataExporter Class

from optimization_engine.training_data_exporter import TrainingDataExporter

exporter = TrainingDataExporter(
    export_dir=Path("training_data/study_001"),
    study_name="my_study",
    design_variable_names=["thickness", "width"],
    objective_names=["stress", "mass"],
    constraint_names=["stress_limit"],  # Optional
    metadata={"version": "1.0"}         # Optional
)

Methods

export_trial(trial_number, design_variables, results, simulation_files)

Export training data for a single trial.

  • trial_number (int): Optuna trial number
  • design_variables (dict): Design parameter names and values
  • results (dict): Objectives, constraints, and other results
  • simulation_files (dict): Paths to 'dat_file' and 'op2_file'

Returns True if successful, False otherwise.

finalize()

Finalize export by creating study_summary.json.

Factory Function

create_exporter_from_config(config)

Create exporter from workflow configuration dict.

  • config (dict): Workflow configuration

Returns TrainingDataExporter if enabled, None otherwise.

Best Practices

1. Organize by Study Type

Group related studies together:

atomizer_field_training_data/
├── beams/
│   ├── cantilever_001/
│   ├── cantilever_002/
│   └── simply_supported_001/
└── brackets/
    ├── L_bracket_001/
    └── T_bracket_001/

2. Use Descriptive Names

Include important parameters in study names:

beam_study_thickness_2-8_width_20-60_100trials
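A name in this style can be generated from the design variable definitions already present in the workflow config. The helper name below is hypothetical; it assumes each variable is a dict with `name`, `min`, and `max` keys, as in the Complete Workflow example.

```python
def descriptive_study_name(prefix, design_variables, n_trials):
    """Build a study name that encodes each variable's range and the trial count."""
    parts = [prefix]
    for var in design_variables:
        # ":g" drops trailing zeros, so 2.0 becomes "2" and 2.5 stays "2.5".
        parts.append(f"{var['name']}_{var['min']:g}-{var['max']:g}")
    parts.append(f"{n_trials}trials")
    return "_".join(parts)
```

With the thickness (2.0 to 8.0) and width (20.0 to 60.0) variables and 100 trials, this yields the name shown above.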

3. Version Your Studies

Track changes to design space or objectives:

bracket_study_001  # Initial study
bracket_study_002  # Expanded design space
bracket_study_003  # Added constraint

4. Document Metadata

Add custom metadata to track study details:

"metadata": {
  "description": "Initial beam study with basic design variables",
  "date": "2025-01-15",
  "engineer": "Your Name",
  "validation_status": "pending"
}

5. Backup Training Data

Training data is expensive to regenerate (hours or days of computation), so protect it:

  • Back up exported data to cloud storage
  • Consider version control for study configurations

Future Enhancements

Planned improvements:

  • Incremental export (resume after crash)
  • Compression options (gzip .dat and .op2 files)
  • Cloud upload integration (S3, Azure Blob)
  • Export filtering (only export Pareto-optimal trials)
  • Multi-fidelity support (tag high/low fidelity trials)

Support

For issues or questions:

  1. Check the troubleshooting section above
  2. Review the AtomizerField integration test plan
  3. Open an issue on GitHub with:
    • Your workflow_config.json
    • Export logs
    • Error messages