
OP_05: Export Training Data

Overview

This protocol covers exporting FEA simulation data for training neural network surrogates. Proper data export enables Protocol 14 (Neural Acceleration).


When to Use

| Trigger | Action |
| --- | --- |
| "export training data" | Follow this protocol |
| "neural network data" | Follow this protocol |
| Planning >50 trials | Consider export for acceleration |
| Want to train a surrogate | Follow this protocol |

Quick Reference

Export Command:

python run_optimization.py --export-training

Output Structure:

atomizer_field_training_data/{study_name}/
├── trial_0001/
│   ├── input/model.bdf
│   ├── output/model.op2
│   └── metadata.json
├── trial_0002/
│   └── ...
└── study_summary.json

Recommended Data Volume:

| Complexity | Training Samples | Validation Samples |
| --- | --- | --- |
| Simple (2-3 params) | 50-100 | 20-30 |
| Medium (4-6 params) | 100-200 | 30-50 |
| Complex (7+ params) | 200-500 | 50-100 |
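As a rule of thumb, the table above can be wrapped in a small lookup helper. This is a hypothetical sketch, not part of the Atomizer toolkit; it returns the upper end of each recommended range:

```python
def recommended_samples(n_params: int) -> tuple[int, int]:
    """Return (training, validation) sample counts for a study.

    Follows the sizing table above, using the upper end of each range.
    Hypothetical helper, not part of the Atomizer toolkit.
    """
    if n_params <= 3:        # simple studies
        return 100, 30
    elif n_params <= 6:      # medium studies
        return 200, 50
    else:                    # complex studies (7+ parameters)
        return 500, 100

print(recommended_samples(4))  # (200, 50)
```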

Configuration

Enable Export in Config

Add to optimization_config.json:

{
  "training_data_export": {
    "enabled": true,
    "export_dir": "atomizer_field_training_data/my_study",
    "export_bdf": true,
    "export_op2": true,
    "export_fields": ["displacement", "stress"],
    "include_failed": false
  }
}
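If you would rather toggle export from a script than edit JSON by hand, a minimal sketch could look like the following. The function name is illustrative; only the config keys come from this protocol:

```python
import json
import tempfile
from pathlib import Path

def enable_training_export(config_path: Path, export_dir: str) -> None:
    """Add (or overwrite) the training_data_export block in a config file."""
    config = json.loads(config_path.read_text())
    config["training_data_export"] = {
        "enabled": True,
        "export_dir": export_dir,
        "export_bdf": True,
        "export_op2": True,
        "export_fields": ["displacement", "stress"],
        "include_failed": False,
    }
    config_path.write_text(json.dumps(config, indent=2))

# Demo on a throwaway config; point at your real optimization_config.json instead.
demo = Path(tempfile.mkdtemp()) / "optimization_config.json"
demo.write_text("{}")
enable_training_export(demo, "atomizer_field_training_data/my_study")
print(json.loads(demo.read_text())["training_data_export"]["enabled"])  # True
```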

Configuration Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | bool | false | Enable export |
| `export_dir` | string | - | Output directory |
| `export_bdf` | bool | true | Save Nastran input |
| `export_op2` | bool | true | Save binary results |
| `export_fields` | list | all | Which result fields |
| `include_failed` | bool | false | Include failed trials |

Export Workflow

Step 1: Run with Export Enabled

conda activate atomizer
cd studies/my_study
python run_optimization.py --export-training

Or run standard optimization with config export enabled.

Step 2: Verify Export

ls atomizer_field_training_data/my_study/
# Should see trial_0001/, trial_0002/, etc.

# Check a trial
ls atomizer_field_training_data/my_study/trial_0001/
# input/model.bdf
# output/model.op2
# metadata.json

Step 3: Check Metadata

cat atomizer_field_training_data/my_study/trial_0001/metadata.json
{
  "trial_number": 1,
  "design_parameters": {
    "thickness": 5.2,
    "width": 30.0
  },
  "objectives": {
    "mass": 0.234,
    "max_stress": 198.5
  },
  "constraints_satisfied": true,
  "simulation_time": 145.2
}
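Beyond inspecting single trials, the metadata files can be aggregated to spot outlier objectives before training. A sketch assuming the layout above (the function name is illustrative):

```python
import json
from pathlib import Path
import pandas as pd

def collect_objectives(export_dir: Path) -> pd.DataFrame:
    """Gather the objectives block of every trial's metadata.json."""
    rows = []
    for meta_path in sorted(export_dir.glob("trial_*/metadata.json")):
        meta = json.loads(meta_path.read_text())
        rows.append({"trial": meta["trial_number"], **meta["objectives"]})
    return pd.DataFrame(rows)

# df = collect_objectives(Path("atomizer_field_training_data/my_study"))
# df.describe() flags outliers, e.g. a max_stress orders of magnitude off.
```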

Step 4: Check Study Summary

cat atomizer_field_training_data/my_study/study_summary.json
{
  "study_name": "my_study",
  "total_trials": 50,
  "successful_exports": 47,
  "failed_exports": 3,
  "design_parameters": ["thickness", "width"],
  "objectives": ["mass", "max_stress"],
  "export_timestamp": "2025-12-05T15:30:00"
}
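The summary counts can be cross-checked against what is actually on disk. A hypothetical helper, assuming failed trials are not written when include_failed is false:

```python
import json
from pathlib import Path

def check_export_counts(export_dir: Path) -> bool:
    """Compare study_summary.json's successful_exports to trial dirs on disk."""
    summary = json.loads((export_dir / "study_summary.json").read_text())
    on_disk = len(list(export_dir.glob("trial_*")))
    if on_disk != summary["successful_exports"]:
        print(f"Mismatch: summary says {summary['successful_exports']}, "
              f"found {on_disk} trial directories")
        return False
    return True
```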

Data Quality Checks

Verify Sample Count

from pathlib import Path
import json

export_dir = Path("atomizer_field_training_data/my_study")
trials = list(export_dir.glob("trial_*"))
print(f"Exported trials: {len(trials)}")

# Check for missing files
for trial_dir in trials:
    bdf = trial_dir / "input" / "model.bdf"
    op2 = trial_dir / "output" / "model.op2"
    meta = trial_dir / "metadata.json"

    if not all([bdf.exists(), op2.exists(), meta.exists()]):
        print(f"Missing files in {trial_dir}")

Check Parameter Coverage

import json
from pathlib import Path
import pandas as pd

export_dir = Path("atomizer_field_training_data/my_study")

# Load design parameters from every trial's metadata
params = []
for trial_dir in sorted(export_dir.glob("trial_*")):
    with open(trial_dir / "metadata.json") as f:
        meta = json.load(f)
    params.append(meta["design_parameters"])

# Summary statistics reveal clustering or unexplored regions
df = pd.DataFrame(params)
print(df.describe())

# Per-parameter range check: sampled ranges should span the design bounds
for col in df.columns:
    print(f"{col}: min={df[col].min():.2f}, max={df[col].max():.2f}")

Space-Filling Sampling

For best neural network training, use space-filling designs:

Latin Hypercube Sampling

from scipy.stats import qmc

# Generate space-filling samples
n_samples = 100
n_params = 4

sampler = qmc.LatinHypercube(d=n_params)
samples = sampler.random(n=n_samples)

# Scale to parameter bounds
lower = [2.0, 20.0, 5.0, 1.0]
upper = [10.0, 50.0, 15.0, 5.0]
scaled = qmc.scale(samples, lower, upper)

Sobol Sequence

sampler = qmc.Sobol(d=n_params, scramble=True)
samples = sampler.random_base2(m=7)  # 2**7 = 128 points; Sobol balance requires a power of two
scaled = qmc.scale(samples, lower, upper)
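Either sampler's scaled output maps row-by-row to one trial's parameter set. A minimal sketch of that conversion (the parameter names are illustrative):

```python
from scipy.stats import qmc

param_names = ["thickness", "width", "height", "fillet"]  # illustrative names
lower = [2.0, 20.0, 5.0, 1.0]
upper = [10.0, 50.0, 15.0, 5.0]

sampler = qmc.LatinHypercube(d=len(param_names), seed=42)
scaled = qmc.scale(sampler.random(n=100), lower, upper)

# One dict per trial, ready to feed into the optimization/solver loop.
trial_params = [dict(zip(param_names, row)) for row in scaled]
print(trial_params[0])
```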

Next Steps After Export

1. Parse to Neural Format

cd atomizer-field
python batch_parser.py ../atomizer_field_training_data/my_study

2. Split Train/Validation

from pathlib import Path
from sklearn.model_selection import train_test_split

# Collect the exported trial directories before splitting
all_trials = sorted(Path("atomizer_field_training_data/my_study").glob("trial_*"))

# 80/20 split
train_trials, val_trials = train_test_split(
    all_trials,
    test_size=0.2,
    random_state=42
)

3. Train Model

python train_parametric.py \
  --train_dir ../training_data/parsed \
  --val_dir ../validation_data/parsed \
  --epochs 200

See SYS_14_NEURAL_ACCELERATION for full training workflow.


Troubleshooting

| Symptom | Cause | Solution |
| --- | --- | --- |
| No export directory | Export not enabled | Add `training_data_export` to config |
| Missing OP2 files | Solver run failed | Failed trials are skipped when `include_failed` is false; check solver logs |
| Incomplete metadata | Extraction error | Check extractor logs |
| Low sample count | Too many failed trials | Relax constraints |

Cross-References


Version History

| Version | Date | Changes |
| --- | --- | --- |
| 1.0 | 2025-12-05 | Initial release |