Atomizer/docs/07_DEVELOPMENT/NN_SURROGATE_AUTOMATION_PLAN.md
Anto01 e3bdb08a22 feat: Major update with validators, skills, dashboard, and docs reorganization
- Add validation framework (config, model, results, study validators)
- Add Claude Code skills (create-study, run-optimization, generate-report,
  troubleshoot, analyze-model)
- Add Atomizer Dashboard (React frontend + FastAPI backend)
- Reorganize docs into structured directories (00-09)
- Add neural surrogate modules and training infrastructure
- Add multi-objective optimization support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 19:23:58 -05:00


Neural Network Surrogate Automation Plan

Vision: One-Click ML-Accelerated Optimization

Make neural network surrogates a first-class citizen in Atomizer, fully integrated into the optimization workflow so that:

  1. Non-coders can enable/configure NN acceleration via JSON config
  2. The system automatically builds, trains, and validates surrogates
  3. Knowledge accumulates in a reusable "Physics Knowledge Base"
  4. The dashboard provides full visibility and control

Current State (What We Have)

Manual Steps Required Today:
1. Run optimization (30+ FEA trials)
2. Manually run: generate_training_data.py
3. Manually run: run_training_fea.py
4. Manually run: train_nn_surrogate.py
5. Manually run: generate_nn_report.py
6. Manually enable the --enable-nn flag

In addition, there is no persistent knowledge storage across studies.

Target State (What We Want)

Automated Flow:
1. User creates optimization_config.json with surrogate_settings
2. User runs: python run_optimization.py --trials 100
3. System automatically:
   - Runs initial FEA exploration (20-30 trials)
   - Generates space-filling training points
   - Runs parallel FEA on training points
   - Trains and validates surrogate
   - Switches to NN-accelerated optimization
   - Validates top candidates with real FEA
   - Stores learned physics in Knowledge Base

Phase 1: Extended Configuration Schema

Current optimization_config.json

{
  "study_name": "uav_arm_optimization",
  "optimization_settings": {
    "protocol": "protocol_11_multi_objective",
    "n_trials": 30
  },
  "design_variables": [...],
  "objectives": [...],
  "constraints": [...]
}

Proposed Extended Schema

{
  "study_name": "uav_arm_optimization",
  "description": "UAV Camera Support Arm",
  "engineering_context": "Drone gimbal arm for 850g camera payload",

  "optimization_settings": {
    "protocol": "protocol_12_hybrid_surrogate",
    "n_trials": 200,
    "sampler": "NSGAIISampler"
  },

  "design_variables": [...],
  "objectives": [...],
  "constraints": [...],

  "surrogate_settings": {
    "enabled": true,
    "mode": "auto",

    "training": {
      "initial_fea_trials": 30,
      "space_filling_samples": 100,
      "sampling_method": "lhs_with_corners",
      "parallel_workers": 2
    },

    "model": {
      "architecture": "mlp",
      "hidden_layers": [64, 128, 64],
      "validation_method": "5_fold_cv",
      "min_accuracy_mape": 10.0,
      "retrain_threshold": 15.0
    },

    "optimization": {
      "nn_trials_per_fea": 50,
      "validate_top_n": 5,
      "adaptive_sampling": true
    },

    "knowledge_base": {
      "save_to_master": true,
      "master_db_path": "knowledge_base/physics_surrogates.db",
      "tags": ["cantilever", "aluminum", "modal", "static"],
      "reuse_similar": true
    }
  },

  "simulation": {...},
  "reporting": {...}
}
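The roadmap calls for updating the config validator to understand surrogate_settings. Below is a minimal sketch, assuming the example values above serve as defaults and that retrain_threshold should never be stricter than min_accuracy_mape. Both of these are assumptions, not decisions recorded in this plan, and validate_surrogate_settings and SURROGATE_DEFAULTS are hypothetical names.

```python
from typing import Any

# Hypothetical defaults copied from the example config above (an assumption).
SURROGATE_DEFAULTS: dict[str, dict[str, Any]] = {
    "training": {"initial_fea_trials": 30, "space_filling_samples": 100,
                 "sampling_method": "lhs_with_corners", "parallel_workers": 2},
    "model": {"architecture": "mlp", "hidden_layers": [64, 128, 64],
              "validation_method": "5_fold_cv", "min_accuracy_mape": 10.0,
              "retrain_threshold": 15.0},
    "optimization": {"nn_trials_per_fea": 50, "validate_top_n": 5,
                     "adaptive_sampling": True},
}


def validate_surrogate_settings(config: dict[str, Any]) -> dict[str, Any]:
    """Return surrogate_settings with defaults filled in; raise ValueError if invalid."""
    raw = config.get("surrogate_settings", {})
    settings: dict[str, Any] = {
        "enabled": bool(raw.get("enabled", False)),
        "mode": raw.get("mode", "auto"),
        # knowledge_base is passed through unchanged in this sketch
        "knowledge_base": dict(raw.get("knowledge_base", {})),
    }
    for section, defaults in SURROGATE_DEFAULTS.items():
        # User-supplied keys override the defaults, section by section
        settings[section] = {**defaults, **raw.get(section, {})}

    training = settings["training"]
    model = settings["model"]
    opt = settings["optimization"]
    if training["initial_fea_trials"] < 1 or training["space_filling_samples"] < 0:
        raise ValueError("need initial_fea_trials >= 1 and space_filling_samples >= 0")
    # Assumed invariant: retraining triggers at an error no lower than the accept level
    if not 0 < model["min_accuracy_mape"] <= model["retrain_threshold"]:
        raise ValueError("need 0 < min_accuracy_mape <= retrain_threshold")
    if not 1 <= opt["validate_top_n"] <= opt["nn_trials_per_fea"]:
        raise ValueError("need 1 <= validate_top_n <= nn_trials_per_fea")
    return settings
```

Because the defaults are merged per section, an existing config that sets only "enabled": true and, say, a custom min_accuracy_mape would still validate, which is one way the "migration for existing configs" item could stay lightweight.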

Phase 2: Protocol 12 - Hybrid Surrogate Optimization

Workflow Stages

┌─────────────────────────────────────────────────────────────────────┐
│                    PROTOCOL 12: HYBRID SURROGATE                     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  STAGE 1: EXPLORATION (FEA Only)                                    │
│  ├─ Run initial_fea_trials with real FEA                            │
│  ├─ Build baseline Pareto front                                     │
│  └─ Assess design space complexity                                  │
│                                                                      │
│  STAGE 2: TRAINING DATA GENERATION                                  │
│  ├─ Generate space_filling_samples (LHS + corners)                  │
│  ├─ Run parallel FEA on training points                             │
│  ├─ Store all results in training_data.db                           │
│  └─ Monitor for failures, retry if needed                           │
│                                                                      │
│  STAGE 3: SURROGATE TRAINING                                        │
│  ├─ Train NN on combined data (optimization + training)             │
│  ├─ Validate with k-fold cross-validation                           │
│  ├─ Check CV MAPE <= min_accuracy_mape                              │
│  └─ Generate performance report                                     │
│                                                                      │
│  STAGE 4: NN-ACCELERATED OPTIMIZATION                               │
│  ├─ Run nn_trials_per_fea NN evaluations per FEA validation         │
│  ├─ Validate top_n candidates with real FEA                         │
│  ├─ Update surrogate with new data (adaptive)                       │
│  └─ Repeat until n_trials reached                                   │
│                                                                      │
│  STAGE 5: FINAL VALIDATION & REPORTING                              │
│  ├─ Validate all Pareto-optimal designs with FEA                    │
│  ├─ Generate comprehensive report                                   │
│  └─ Save learned physics to Knowledge Base                          │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
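The plan does not define the lhs_with_corners sampling method further. One plausible reading, sketched here with the standard library only, is a Latin hypercube of space_filling_samples points plus every corner of the design box. The function name and the choice to add corners on top of the LHS count are assumptions. The corner count grows as 2^d, so this only suits a handful of design variables.

```python
import itertools
import random
from typing import Optional


def lhs_with_corners(bounds: dict[str, tuple[float, float]], n_samples: int,
                     seed: Optional[int] = None) -> list[dict[str, float]]:
    """n_samples Latin hypercube points plus all 2**d corners of the box."""
    rng = random.Random(seed)
    names = list(bounds)
    # Per variable: one uniform draw inside each of n_samples equal strata of
    # [0, 1), shuffled so strata are paired randomly across variables.
    columns = []
    for _ in names:
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    unit_points = [list(row) for row in zip(*columns)]
    # Corners: every lower/upper combination (2**d points)
    unit_points += [list(c) for c in itertools.product((0.0, 1.0), repeat=len(names))]
    # Map from the unit cube to the real variable bounds
    return [{name: bounds[name][0] + u * (bounds[name][1] - bounds[name][0])
             for name, u in zip(names, point)}
            for point in unit_points]
```

Each variable receives exactly one LHS point per stratum, which is what gives the training set even coverage. The corners pin down the extremes, where a surrogate would otherwise have to extrapolate.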

Implementation: runner_protocol_12.py

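class HybridSurrogateRunner:
    """Protocol 12: Automated hybrid FEA/NN optimization.

    The stage methods called in run() are planned work and not defined here.
    """

    def __init__(self, config: dict):
        self.config = config
        self.surrogate_config = config.get('surrogate_settings', {})
        self.nn_enabled = self.surrogate_config.get('enabled', False)
        self.stage = "exploration"

    def run(self) -> None:
        # Stage 1: Exploration (real FEA only)
        self.stage = "exploration"
        self.run_exploration_stage()

        if self.nn_enabled:
            # Stage 2: Training data (space-filling samples + parallel FEA)
            self.stage = "training_data"
            self.generate_training_data()
            self.run_parallel_fea_training()

            # Stage 3: Train surrogate and check cross-validated accuracy
            self.stage = "surrogate_training"
            self.train_and_validate_surrogate()

            # Stage 4: NN-accelerated optimization with periodic FEA validation
            self.stage = "nn_optimization"
            self.run_nn_accelerated_optimization()

        # Stage 5: Final FEA validation and reporting
        self.stage = "final_validation"
        self.validate_and_report()

        # Save to the Knowledge Base only if a surrogate was trained
        # and knowledge_base.save_to_master is set
        kb_settings = self.surrogate_config.get('knowledge_base', {})
        if self.nn_enabled and kb_settings.get('save_to_master', False):
            self.save_to_knowledge_base()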
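The body of run_nn_accelerated_optimization could be organized around a round like the one below. The sketch screens nn_trials_per_fea candidates with the NN and sends the validate_top_n best to FEA. It assumes a single minimized objective, whereas the plan itself targets multi-objective NSGA-II. propose, nn_predict and run_fea are hypothetical callables supplied by the runner, and using retrain_threshold as a MAPE cutoff on the validated designs is also an assumption.

```python
from typing import Callable, Sequence

Params = dict[str, float]


def nn_accelerated_round(
    propose: Callable[[int], Sequence[Params]],
    nn_predict: Callable[[Params], float],
    run_fea: Callable[[Params], float],
    nn_trials_per_fea: int = 50,
    validate_top_n: int = 5,
    retrain_threshold: float = 15.0,
) -> tuple[list[tuple[Params, float]], bool]:
    """One Stage 4 round for a single objective to minimize.

    Returns the FEA-validated (params, value) pairs, to be appended to the
    training data, and whether the surrogate's MAPE on them exceeded
    retrain_threshold (percent).
    """
    candidates = list(propose(nn_trials_per_fea))
    # Cheap screening: rank every candidate by its NN prediction
    top = sorted(candidates, key=nn_predict)[:validate_top_n]
    validated, errors = [], []
    for params in top:
        actual = run_fea(params)  # expensive ground truth
        validated.append((params, actual))
        if actual != 0:  # MAPE is undefined for zero targets
            errors.append(abs(nn_predict(params) - actual) / abs(actual) * 100.0)
    mape = sum(errors) / len(errors) if errors else 0.0
    return validated, mape > retrain_threshold
```

The runner would repeat this until n_trials is reached, retraining whenever the flag comes back True. Only designs that passed through run_fea ever reach the Pareto front, which keeps the "top candidates always FEA-verified" guarantee.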

Phase 3: Physics Knowledge Base Architecture

Purpose

Store learned physics relationships so future optimizations can:

  1. Warm-start with pre-trained surrogates for similar problems
  2. Apply transfer learning from related geometries/materials
  3. Build institutional knowledge over time

Database Schema: physics_surrogates.db

-- Master registry of all trained surrogates
CREATE TABLE surrogates (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    study_name TEXT,

    -- Problem characterization
    geometry_type TEXT,        -- 'cantilever', 'plate', 'shell', 'solid'
    material_family TEXT,      -- 'aluminum', 'steel', 'composite'
    analysis_types TEXT,       -- JSON: ['static', 'modal', 'buckling']

    -- Design space
    n_parameters INTEGER,
    parameter_names TEXT,      -- JSON array
    parameter_bounds TEXT,     -- JSON: {name: [min, max]}

    -- Objectives & Constraints
    objectives TEXT,           -- JSON: [{name, goal}]
    constraints TEXT,          -- JSON: [{name, type, threshold}]

    -- Model info
    model_path TEXT,           -- Path to .pt file
    architecture TEXT,         -- JSON: model architecture
    training_samples INTEGER,

    -- Performance metrics
    cv_mape_mass REAL,
    cv_mape_frequency REAL,
    cv_r2_mass REAL,
    cv_r2_frequency REAL,

    -- Metadata
    tags TEXT,                 -- JSON array for search
    description TEXT,
    engineering_context TEXT
);

-- Training data for each surrogate
CREATE TABLE training_data (
    id INTEGER PRIMARY KEY,
    surrogate_id INTEGER REFERENCES surrogates(id),

    -- Input parameters: raw values and values normalized to [0, 1] (JSON)
    params_json TEXT,
    params_normalized TEXT,

    -- Output values
    mass REAL,
    frequency REAL,
    max_displacement REAL,
    max_stress REAL,

    -- Source
    source TEXT,              -- 'optimization', 'lhs', 'corner', 'adaptive'
    fea_timestamp TIMESTAMP
);

-- Similarity index for finding related problems
CREATE TABLE problem_similarity (
    surrogate_id INTEGER REFERENCES surrogates(id),

    -- Embedding for similarity search
    geometry_embedding BLOB,   -- Vector embedding of geometry type
    physics_embedding BLOB,    -- Vector embedding of physics signature

    -- Precomputed similarity features
    feature_vector TEXT        -- JSON: normalized features for matching
);
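Below is a sketch of how the JSON-in-TEXT columns might be written and read with Python's built-in sqlite3 module, using a trimmed copy of the surrogates table. The summary_by_geometry query shows one way the Knowledge Base Browser's per-category model counts and average MAPE could be produced. Both helper names are hypothetical.

```python
import json
import sqlite3

# Trimmed to a few columns of the proposed `surrogates` table for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS surrogates (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    geometry_type TEXT,
    material_family TEXT,
    parameter_bounds TEXT,   -- JSON object {name: [min, max]}
    cv_mape_mass REAL,
    tags TEXT                -- JSON array
);
"""


def register_surrogate(conn: sqlite3.Connection, name: str, geometry_type: str,
                       material_family: str, bounds: dict, cv_mape_mass: float,
                       tags: list[str]) -> int:
    """Insert one surrogate row, serializing the JSON columns; return its id."""
    cur = conn.execute(
        "INSERT INTO surrogates (name, geometry_type, material_family,"
        " parameter_bounds, cv_mape_mass, tags) VALUES (?, ?, ?, ?, ?, ?)",
        (name, geometry_type, material_family, json.dumps(bounds),
         cv_mape_mass, json.dumps(tags)),
    )
    return cur.lastrowid


def summary_by_geometry(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Model count and average mass MAPE per geometry type."""
    return conn.execute(
        "SELECT geometry_type, COUNT(*), AVG(cv_mape_mass) FROM surrogates"
        " GROUP BY geometry_type ORDER BY geometry_type"
    ).fetchall()
```

For example, after conn = sqlite3.connect(":memory:") and conn.executescript(SCHEMA), the JSON columns round-trip through json.loads when read back.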

Knowledge Base API

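from dataclasses import dataclass
from typing import Optional

import torch.nn as nn


@dataclass
class SurrogateMatch:
    """A stored surrogate and its similarity to the current problem."""
    surrogate_id: int
    name: str
    similarity: float  # 0.0 (unrelated) to 1.0 (same problem signature)


class PhysicsKnowledgeBase:
    """Central repository for learned physics surrogates."""

    def __init__(self, db_path: str = "knowledge_base/physics_surrogates.db",
                 similarity_threshold: float = 0.8):
        self.db_path = db_path
        self.similarity_threshold = similarity_threshold

    def find_similar_surrogate(self, config: dict) -> Optional[SurrogateMatch]:
        """Find an existing surrogate that could transfer to this problem."""
        # Extract features (geometry, material, physics, parameters) from config
        features = self._extract_problem_features(config)

        # Query stored problems, best match first
        matches = self._query_similar(features)

        # Return the best match only if it clears the similarity threshold
        if matches and matches[0].similarity > self.similarity_threshold:
            return matches[0]
        return None

    def save_surrogate(self, study_name: str, model_path: str,
                       config: dict, metrics: dict) -> None:
        """Save a trained surrogate to the knowledge base."""
        # Store model and metadata in `surrogates` and `training_data`
        # Index in `problem_similarity` for future similarity search
        raise NotImplementedError

    def transfer_learn(self, base_surrogate_id: int,
                       new_config: dict) -> nn.Module:
        """Create a new surrogate by transfer learning from an existing one."""
        # Load base model, freeze early layers, fine-tune on new data
        raise NotImplementedError

    # Planned helpers (not yet designed)
    def _extract_problem_features(self, config: dict) -> dict:
        raise NotImplementedError

    def _query_similar(self, features: dict) -> list[SurrogateMatch]:
        raise NotImplementedError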
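The plan leaves feature extraction and similarity querying open. One simple scoring scheme, assumed here rather than taken from the plan, compares categorical fields exactly and set-valued fields by Jaccard overlap, then combines them with fixed weights. ProblemFeatures, jaccard, similarity_score and the weight values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemFeatures:
    geometry_type: str
    material_family: str
    analysis_types: frozenset[str]
    parameter_names: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Illustrative weights (summing to 1.0); real values would need tuning.
WEIGHTS = {"geometry": 0.35, "material": 0.15, "physics": 0.3, "parameters": 0.2}


def similarity_score(x: ProblemFeatures, y: ProblemFeatures) -> float:
    """Weighted similarity in [0, 1] between two problem descriptions."""
    return (WEIGHTS["geometry"] * (x.geometry_type == y.geometry_type)
            + WEIGHTS["material"] * (x.material_family == y.material_family)
            + WEIGHTS["physics"] * jaccard(x.analysis_types, y.analysis_types)
            + WEIGHTS["parameters"] * jaccard(x.parameter_names, y.parameter_names))
```

With these weights, a cantilever in a different material that shares only some analysis types and parameters scores well below the 0.8 threshold. This is arguably the intended behavior, since such a model would be a weak warm start.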

Phase 4: Dashboard Integration

New Dashboard Pages

1. Surrogate Status Panel (in existing Dashboard)

┌─────────────────────────────────────────────────────────┐
│ SURROGATE STATUS                                        │
├─────────────────────────────────────────────────────────┤
│ Mode: Hybrid (NN + FEA Validation)                      │
│ Stage: NN-Accelerated Optimization                      │
│                                                         │
│ Training Data: 150 samples (50 opt + 100 LHS)           │
│ Model Accuracy: MAPE 1.8% mass, 1.1% freq               │
│ Speedup: ~50x (10ms NN vs 500ms FEA)                    │
│                                                         │
│ [View Report] [Retrain] [Disable NN]                    │
└─────────────────────────────────────────────────────────┘
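The MAPE figures in this panel and Stage 3's acceptance check both rely on cross-validated error. The sketch below implements the 5_fold_cv method, assuming that MAPE is averaged over held-out folds and that the MLP training step is wrapped in a hypothetical fit callable that returns a predictor. Under this reading, a surrogate would be accepted when the result is at most min_accuracy_mape.

```python
import random
from typing import Callable, Sequence

Sample = tuple[list[float], float]            # (design inputs, FEA output)
Predictor = Callable[[list[float]], float]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def k_fold_mape(samples: Sequence[Sample],
                fit: Callable[[Sequence[Sample]], Predictor],
                k: int = 5, seed: int = 0) -> float:
    """Mean held-out MAPE over k folds."""
    if len(samples) < k:
        raise ValueError("need at least k samples")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint, near-equal folds
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [samples[i] for i in order if i not in held_out]
        predict = fit(train)  # train only on the other k-1 folds
        test = [samples[i] for i in fold]
        scores.append(mape([y for _, y in test], [predict(x) for x, _ in test]))
    return sum(scores) / k
```

Under that reading, the Stage 3 gate reduces to k_fold_mape(data, fit) <= min_accuracy_mape. Plain MAPE misbehaves for targets near zero (such as displacements at supports), so per-objective metrics like those in the surrogates table may need a different error measure.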

2. Knowledge Base Browser

┌─────────────────────────────────────────────────────────┐
│ PHYSICS KNOWLEDGE BASE                                  │
├─────────────────────────────────────────────────────────┤
│ Stored Surrogates: 12                                   │
│                                                         │
│ [Cantilever Beams]  5 models, avg MAPE 2.1%             │
│ [Shell Structures]  3 models, avg MAPE 3.4%             │
│ [Solid Parts]       4 models, avg MAPE 4.2%             │
│                                                         │
│ Search: [aluminum modal_______] [Find Similar]          │
│                                                         │
│ Matching Models:                                        │
│ - uav_arm_v2 (92% match) - Transfer Learning Available  │
│ - bracket_opt (78% match)                               │
└─────────────────────────────────────────────────────────┘

Phase 5: User Workflow (Non-Coder Experience)

Scenario: New Optimization with NN Acceleration

Step 1: Create Study via Dashboard
┌─────────────────────────────────────────────────────────┐
│ NEW OPTIMIZATION STUDY                                  │
├─────────────────────────────────────────────────────────┤
│ Study Name: [drone_motor_mount___________]              │
│ Description: [Motor mount bracket________]              │
│                                                         │
│ Model File: [Browse...] drone_mount.prt                 │
│ Sim File:   [Browse...] drone_mount_sim.sim             │
│                                                         │
│ ☑ Enable Neural Network Acceleration                    │
│   ├─ Initial FEA Trials: [30____]                       │
│   ├─ Training Samples:   [100___]                       │
│   ├─ Target Accuracy:    [10% MAPE]                     │
│   └─ ☑ Save to Knowledge Base                           │
│                                                         │
│ Similar existing model found: "uav_arm_optimization"    │
│ ☑ Use as starting point (transfer learning)             │
│                                                         │
│ [Create Study]                                          │
└─────────────────────────────────────────────────────────┘

Step 2: System Automatically Executes Protocol 12
- User sees progress in dashboard
- No command-line needed
- All stages automated

Step 3: Review Results
- Pareto front with FEA-validated designs
- NN performance report
- Knowledge saved for future use

Implementation Roadmap

Phase 1: Config Schema Extension (1-2 days)

  • Define surrogate_settings schema
  • Update config validator
  • Create migration for existing configs

Phase 2: Protocol 12 Runner (3-5 days)

  • Create HybridSurrogateRunner class
  • Implement stage transitions
  • Add progress callbacks for dashboard
  • Integrate existing scripts as modules
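For the "Add progress callbacks for dashboard" item, here is a minimal observer-style sketch in which the runner emits events and the dashboard backend subscribes to them. Stage, ProgressEvent and ProgressReporter are hypothetical names. How events reach the browser (for example, over a websocket in the FastAPI backend) is left open.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Stage(Enum):
    EXPLORATION = 1
    TRAINING_DATA = 2
    SURROGATE_TRAINING = 3
    NN_OPTIMIZATION = 4
    FINAL_VALIDATION = 5


@dataclass
class ProgressEvent:
    stage: Stage
    completed: int
    total: int
    message: str = ""

    @property
    def fraction(self) -> float:
        """Share of the current stage that is done (0.0 if total is unknown)."""
        return self.completed / self.total if self.total else 0.0


@dataclass
class ProgressReporter:
    """Fan out progress events to every subscribed listener."""
    listeners: list[Callable[[ProgressEvent], None]] = field(default_factory=list)

    def subscribe(self, listener: Callable[[ProgressEvent], None]) -> None:
        self.listeners.append(listener)

    def emit(self, stage: Stage, completed: int, total: int, message: str = "") -> None:
        event = ProgressEvent(stage, completed, total, message)
        for listener in self.listeners:
            listener(event)
```

Keeping the runner unaware of the dashboard means the same events could drive a CLI progress bar or a log file without changes to the Protocol 12 stages.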

Phase 3: Knowledge Base (2-3 days)

  • Create SQLite schema
  • Implement PhysicsKnowledgeBase API
  • Add similarity search
  • Basic transfer learning

Phase 4: Dashboard Integration (2-3 days)

  • Surrogate status panel
  • Knowledge base browser
  • Study creation wizard with NN options

Phase 5: Documentation & Testing (1-2 days)

  • User guide for non-coders
  • Integration tests
  • Example workflows

Data Flow Architecture

                    ┌──────────────────────────────────────┐
                    │      optimization_config.json        │
                    │  (Single source of truth for study)  │
                    └──────────────────┬───────────────────┘
                                       │
                    ┌──────────────────▼───────────────────┐
                    │         Protocol 12 Runner           │
                    │    (Orchestrates entire workflow)    │
                    └──────────────────┬───────────────────┘
                                       │
         ┌─────────────────┬───────────┼───────────┬─────────────────┐
         │                 │           │           │                 │
         ▼                 ▼           ▼           ▼                 ▼
    ┌─────────┐      ┌─────────┐ ┌─────────┐ ┌─────────┐      ┌─────────┐
    │  FEA    │      │Training │ │Surrogate│ │   NN    │      │Knowledge│
    │ Solver  │      │  Data   │ │ Trainer │ │  Optim  │      │  Base   │
    └────┬────┘      └────┬────┘ └────┬────┘ └────┬────┘      └────┬────┘
         │                │           │           │                 │
         ▼                ▼           ▼           ▼                 ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                         study.db                                 │
    │  (Optuna trials + training data + surrogate metadata)           │
    └─────────────────────────────────────────────────────────────────┘
                                       │
                    ┌──────────────────▼───────────────────┐
                    │        physics_surrogates.db         │
                    │   (Master knowledge base - global)   │
                    └──────────────────────────────────────┘

Key Benefits

For Non-Coders

  1. Single JSON config - No Python scripts to run manually
  2. Dashboard control - Start/stop/monitor from browser
  3. Automatic recommendations - System flags similar stored surrogates for reuse
  4. Knowledge reuse - Similar problems can warm-start from a pre-trained surrogate

For the Organization

  1. Institutional memory - Physics knowledge persists
  2. Faster iterations - Each new study benefits from past work
  3. Reproducibility - Everything tracked in databases
  4. Scalability - Add more workers, train better models

For the Workflow

  1. End-to-end automation - No manual steps between stages
  2. Adaptive optimization - Surrogate is updated with new FEA results during the run
  3. Validated results - Top candidates always FEA-verified
  4. Rich reporting - Performance metrics, comparisons, recommendations

Next Steps

  1. Review this plan - Get feedback on priorities
  2. Start with config schema - Extend optimization_config.json
  3. Build Protocol 12 - Core automation logic
  4. Knowledge Base MVP - Basic save/load functionality
  5. Dashboard integration - Visual control panel

Document Version: 1.0
Created: 2025-11-25
Author: Claude Code + Antoine