# Neural Network Surrogate Automation Plan

## Vision: One-Click ML-Accelerated Optimization

Make neural network surrogates a **first-class citizen** in Atomizer, fully integrated into the optimization workflow so that:
1. Non-coders can enable/configure NN acceleration via JSON config
2. The system automatically builds, trains, and validates surrogates
3. Knowledge accumulates in a reusable "Physics Knowledge Base"
4. The dashboard provides full visibility and control

---

## Current State (What We Have)

```
Manual Steps Required Today:
1. Run optimization (30+ FEA trials)
2. Manually run: generate_training_data.py
3. Manually run: run_training_fea.py
4. Manually run: train_nn_surrogate.py
5. Manually run: generate_nn_report.py
6. Manually enable --enable-nn flag
7. No persistent knowledge storage
```

---

## Target State (What We Want)

```
Automated Flow:
1. User creates optimization_config.json with surrogate_settings
2. User runs: python run_optimization.py --trials 100
3. System automatically:
   - Runs initial FEA exploration (20-30 trials)
   - Generates space-filling training points
   - Runs parallel FEA on training points
   - Trains and validates surrogate
   - Switches to NN-accelerated optimization
   - Validates top candidates with real FEA
   - Stores learned physics in Knowledge Base
```

---

## Phase 1: Extended Configuration Schema

### Current optimization_config.json
```json
{
  "study_name": "uav_arm_optimization",
  "optimization_settings": {
    "protocol": "protocol_11_multi_objective",
    "n_trials": 30
  },
  "design_variables": [...],
  "objectives": [...],
  "constraints": [...]
}
```

### Proposed Extended Schema
```json
{
  "study_name": "uav_arm_optimization",
  "description": "UAV Camera Support Arm",
  "engineering_context": "Drone gimbal arm for 850g camera payload",

  "optimization_settings": {
    "protocol": "protocol_12_hybrid_surrogate",
    "n_trials": 200,
    "sampler": "NSGAIISampler"
  },

  "design_variables": [...],
  "objectives": [...],
  "constraints": [...],

  "surrogate_settings": {
    "enabled": true,
    "mode": "auto",

    "training": {
      "initial_fea_trials": 30,
      "space_filling_samples": 100,
      "sampling_method": "lhs_with_corners",
      "parallel_workers": 2
    },

    "model": {
      "architecture": "mlp",
      "hidden_layers": [64, 128, 64],
      "validation_method": "5_fold_cv",
      "min_accuracy_mape": 10.0,
      "retrain_threshold": 15.0
    },

    "optimization": {
      "nn_trials_per_fea": 50,
      "validate_top_n": 5,
      "adaptive_sampling": true
    },

    "knowledge_base": {
      "save_to_master": true,
      "master_db_path": "knowledge_base/physics_surrogates.db",
      "tags": ["cantilever", "aluminum", "modal", "static"],
      "reuse_similar": true
    }
  },

  "simulation": {...},
  "reporting": {...}
}
```
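As a rough illustration of how the config validator might handle the new block, the sketch below overlays user-supplied `surrogate_settings` on defaults taken from the proposed schema and runs two sanity checks. The function names (`merge_defaults`, `resolve_surrogate_settings`), the choice of `enabled: false` for configs that omit the block, and the specific checks are assumptions for illustration, not the existing Atomizer validator.

```python
from copy import deepcopy

# Default values copied from the proposed schema; "enabled": False is an
# assumption so that existing configs keep their FEA-only behaviour.
DEFAULT_SURROGATE_SETTINGS = {
    "enabled": False,
    "mode": "auto",
    "training": {
        "initial_fea_trials": 30,
        "space_filling_samples": 100,
        "sampling_method": "lhs_with_corners",
        "parallel_workers": 2,
    },
    "model": {
        "architecture": "mlp",
        "hidden_layers": [64, 128, 64],
        "validation_method": "5_fold_cv",
        "min_accuracy_mape": 10.0,
        "retrain_threshold": 15.0,
    },
    "optimization": {
        "nn_trials_per_fea": 50,
        "validate_top_n": 5,
        "adaptive_sampling": True,
    },
}


def merge_defaults(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay user overrides on defaults without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_surrogate_settings(config: dict) -> dict:
    """Return surrogate_settings with defaults filled in; raise on invalid values."""
    settings = merge_defaults(DEFAULT_SURROGATE_SETTINGS,
                              config.get("surrogate_settings", {}))
    training, model = settings["training"], settings["model"]
    if training["initial_fea_trials"] < 1 or training["space_filling_samples"] < 0:
        raise ValueError("training sample counts are out of range")
    # MAPE is an error metric, so the acceptance limit should not exceed
    # the (looser) retrain limit.
    if not 0 < model["min_accuracy_mape"] <= model["retrain_threshold"]:
        raise ValueError("expected 0 < min_accuracy_mape <= retrain_threshold")
    return settings
```

With this shape, a config that only sets `"enabled": true` would inherit every other value from the defaults, which keeps the JSON a non-coder has to write short.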

---

## Phase 2: Protocol 12 - Hybrid Surrogate Optimization

### Workflow Stages

```
┌─────────────────────────────────────────────────────────────────────┐
│                    PROTOCOL 12: HYBRID SURROGATE                     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  STAGE 1: EXPLORATION (FEA Only)                                    │
│  ├─ Run initial_fea_trials with real FEA                            │
│  ├─ Build baseline Pareto front                                     │
│  └─ Assess design space complexity                                  │
│                                                                      │
│  STAGE 2: TRAINING DATA GENERATION                                  │
│  ├─ Generate space_filling_samples (LHS + corners)                  │
│  ├─ Run parallel FEA on training points                             │
│  ├─ Store all results in training_data.db                           │
│  └─ Monitor for failures, retry if needed                           │
│                                                                      │
│  STAGE 3: SURROGATE TRAINING                                        │
│  ├─ Train NN on combined data (optimization + training)             │
│  ├─ Validate with k-fold cross-validation                           │
│  ├─ Check CV MAPE <= min_accuracy_mape                              │
│  └─ Generate performance report                                     │
│                                                                      │
│  STAGE 4: NN-ACCELERATED OPTIMIZATION                               │
│  ├─ Run nn_trials_per_fea NN evaluations per FEA validation         │
│  ├─ Validate top_n candidates with real FEA                         │
│  ├─ Update surrogate with new data (adaptive)                       │
│  └─ Repeat until n_trials reached                                   │
│                                                                      │
│  STAGE 5: FINAL VALIDATION & REPORTING                              │
│  ├─ Validate all Pareto-optimal designs with FEA                    │
│  ├─ Generate comprehensive report                                   │
│  └─ Save learned physics to Knowledge Base                          │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
```
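The "LHS + corners" sampling in Stage 2 could look something like the following standard-library sketch: a Latin hypercube places exactly one point in each of `n_samples` equal-width bins along every axis, and the corners of the bounds box are appended so the surrogate sees the extremes of the design space. The function name `lhs_with_corners` and the `{name: [min, max]}` bounds format are assumptions borrowed from the config and schema naming in this plan, not an existing Atomizer function.

```python
import itertools
import random


def lhs_with_corners(bounds: dict, n_samples: int, seed: int = 0) -> list:
    """Return n_samples Latin hypercube points plus every corner of the bounds box.

    bounds maps each parameter name to [min, max]. Each returned point is a
    dict {name: value}.
    """
    rng = random.Random(seed)
    names = list(bounds)

    # For each variable, shuffle the bin order independently so that every
    # one of the n_samples bins along that axis is used exactly once.
    columns = {}
    for name in names:
        lo, hi = bounds[name]
        bins = list(range(n_samples))
        rng.shuffle(bins)
        columns[name] = [lo + (b + rng.random()) / n_samples * (hi - lo)
                         for b in bins]

    points = [{name: columns[name][i] for name in names}
              for i in range(n_samples)]

    # Corners: all 2**d combinations of lower and upper bounds.
    for combo in itertools.product(*(bounds[name] for name in names)):
        points.append(dict(zip(names, combo)))
    return points
```

Note that the corner count grows as 2^d, so for more than roughly 8 to 10 design variables a subset of corners would probably be needed; that trade-off is not decided in this plan.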

### Implementation: runner_protocol_12.py

```python
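class HybridSurrogateRunner:
    """Protocol 12: Automated hybrid FEA/NN optimization."""

    def __init__(self, config: dict):
        self.config = config
        self.surrogate_config = config.get('surrogate_settings', {})
        self.stage = "exploration"

    def run(self):
        surrogate_enabled = self.surrogate_config.get('enabled', False)

        # Stage 1: Exploration (always runs with real FEA)
        self.stage = "exploration"
        self.run_exploration_stage()

        if surrogate_enabled:
            # Stage 2: Training Data
            self.stage = "training_data"
            self.generate_training_data()
            self.run_parallel_fea_training()

            # Stage 3: Train Surrogate
            self.stage = "surrogate_training"
            self.train_and_validate_surrogate()

            # Stage 4: NN-Accelerated
            self.stage = "nn_optimization"
            self.run_nn_accelerated_optimization()

        # Stage 5: Final
        self.stage = "final_validation"
        self.validate_and_report()

        # Persist only if a surrogate was trained and saving is switched on
        kb_config = self.surrogate_config.get('knowledge_base', {})
        if surrogate_enabled and kb_config.get('save_to_master', False):
            self.save_to_knowledge_base()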
```
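The accuracy checks in Stages 3 and 4 could be expressed along these lines. Because MAPE is an error metric, a surrogate passes when its cross-validated MAPE is at or below `min_accuracy_mape`. How `retrain_threshold` is applied is not spelled out in this plan; the sketch assumes it is the NN-versus-FEA error on validated candidates above which the surrogate is retrained. The function names are hypothetical.

```python
def mape(actual: list, predicted: list) -> float:
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (true for mass and frequency).
    """
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def passes_accuracy_gate(cv_mape: dict, model_cfg: dict) -> bool:
    """Stage 3 gate: every objective's CV MAPE must be within the limit.

    cv_mape maps objective name -> cross-validated MAPE in percent,
    e.g. {"mass": 1.8, "frequency": 1.1}.
    """
    return all(err <= model_cfg["min_accuracy_mape"] for err in cv_mape.values())


def needs_retrain(fea_values: list, nn_values: list, model_cfg: dict) -> bool:
    """Stage 4 check (assumed semantics): retrain when NN predictions for the
    FEA-validated candidates drift past retrain_threshold."""
    return mape(fea_values, nn_values) > model_cfg["retrain_threshold"]
```

Using the worst objective rather than an average for the gate is a conservative choice: a surrogate that predicts mass well but frequency poorly would still mislead a multi-objective search.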

---

## Phase 3: Physics Knowledge Base Architecture

### Purpose
Store learned physics relationships so future optimizations can:
1. **Warm-start** with pre-trained surrogates for similar problems
2. **Transfer learn** from related geometries/materials
3. **Build institutional knowledge** over time

### Database Schema: physics_surrogates.db

```sql
-- Master registry of all trained surrogates
CREATE TABLE surrogates (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    study_name TEXT,

    -- Problem characterization
    geometry_type TEXT,        -- 'cantilever', 'plate', 'shell', 'solid'
    material_family TEXT,      -- 'aluminum', 'steel', 'composite'
    analysis_types TEXT,       -- JSON: ["static", "modal", "buckling"]

    -- Design space
    n_parameters INTEGER,
    parameter_names TEXT,      -- JSON array
    parameter_bounds TEXT,     -- JSON: {name: [min, max]}

    -- Objectives & Constraints
    objectives TEXT,           -- JSON: [{name, goal}]
    constraints TEXT,          -- JSON: [{name, type, threshold}]

    -- Model info
    model_path TEXT,           -- Path to .pt file
    architecture TEXT,         -- JSON: model architecture
    training_samples INTEGER,

    -- Performance metrics
    cv_mape_mass REAL,
    cv_mape_frequency REAL,
    cv_r2_mass REAL,
    cv_r2_frequency REAL,

    -- Metadata
    tags TEXT,                 -- JSON array for search
    description TEXT,
    engineering_context TEXT
);

-- Training data for each surrogate
CREATE TABLE training_data (
    id INTEGER PRIMARY KEY,
    surrogate_id INTEGER REFERENCES surrogates(id),

    -- Input parameters
    params_json TEXT,         -- JSON: raw parameter values
    params_normalized TEXT,   -- JSON: same values scaled to 0-1

    -- Output values
    mass REAL,
    frequency REAL,
    max_displacement REAL,
    max_stress REAL,

    -- Source
    source TEXT,              -- 'optimization', 'lhs', 'corner', 'adaptive'
    fea_timestamp TIMESTAMP
);

-- Similarity index for finding related problems
CREATE TABLE problem_similarity (
    surrogate_id INTEGER REFERENCES surrogates(id),

    -- Embedding for similarity search
    geometry_embedding BLOB,   -- Vector embedding of geometry type
    physics_embedding BLOB,    -- Vector embedding of physics signature

    -- Precomputed similarity features
    feature_vector TEXT        -- JSON: normalized features for matching
);
```
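To show how the registry might be used from Python, here is a small `sqlite3` sketch that stores a surrogate and looks it up by tag. It uses an in-memory database and a trimmed subset of the `surrogates` columns to stay short; the helper names `register_surrogate` and `find_by_tags` are hypothetical, and filtering tags in Python (rather than with SQLite's JSON functions) is just one option.

```python
import json
import sqlite3

# In-memory database for illustration; the plan's real file would be
# knowledge_base/physics_surrogates.db.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE surrogates (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        geometry_type TEXT,
        material_family TEXT,
        tags TEXT,             -- JSON array for search
        cv_mape_mass REAL
    )
""")


def register_surrogate(conn, name, geometry_type, material_family, tags, cv_mape_mass):
    """Insert one surrogate row, serialising the tag list to JSON; return its id."""
    cur = conn.execute(
        "INSERT INTO surrogates "
        "(name, geometry_type, material_family, tags, cv_mape_mass) "
        "VALUES (?, ?, ?, ?, ?)",
        (name, geometry_type, material_family, json.dumps(tags), cv_mape_mass),
    )
    conn.commit()
    return cur.lastrowid


def find_by_tags(conn, required_tags):
    """Return names of surrogates whose tag list contains every required tag."""
    required = set(required_tags)
    rows = conn.execute("SELECT name, tags FROM surrogates ORDER BY id").fetchall()
    return [name for name, tags in rows if required <= set(json.loads(tags))]
```

Parameterised queries (`?` placeholders) keep study names and tags entered through the dashboard from being interpreted as SQL.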

### Knowledge Base API

```python
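from dataclasses import dataclass
from typing import List, Optional

from torch import nn  # PyTorch, implied by nn.Module and the .pt model files

# Minimum similarity for a stored surrogate to count as a match
SIMILARITY_THRESHOLD = 0.8


@dataclass
class SurrogateMatch:
    """A stored surrogate and its similarity score (fields are illustrative)."""
    surrogate_id: int
    similarity: float  # 0.0 (unrelated) to 1.0 (identical problem)


class PhysicsKnowledgeBase:
    """Central repository for learned physics surrogates."""

    def __init__(self, db_path: str = "knowledge_base/physics_surrogates.db"):
        self.db_path = db_path

    def find_similar_surrogate(self, config: dict) -> Optional[SurrogateMatch]:
        """Find existing surrogate that could transfer to this problem."""
        # Extract features from config
        features = self._extract_problem_features(config)

        # Query similar problems, best match first
        matches = self._query_similar(features)

        # Return best match if similarity > threshold
        if matches and matches[0].similarity > SIMILARITY_THRESHOLD:
            return matches[0]
        return None

    def _extract_problem_features(self, config: dict) -> dict:
        """Derive comparable features from a study config (to be implemented)."""
        raise NotImplementedError

    def _query_similar(self, features: dict) -> List[SurrogateMatch]:
        """Return stored surrogates sorted by descending similarity (to be implemented)."""
        raise NotImplementedError

    def save_surrogate(self, study_name: str, model_path: str,
                       config: dict, metrics: dict) -> None:
        """Save trained surrogate to knowledge base."""
        # Store model and metadata
        # Index for future similarity search
        raise NotImplementedError

    def transfer_learn(self, base_surrogate_id: int,
                       new_config: dict) -> nn.Module:
        """Create new surrogate by transfer learning from existing one."""
        # Load base model
        # Freeze early layers
        # Fine-tune on new data
        raise NotImplementedError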
```
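The plan does not yet fix a similarity metric. As one simple possibility, the sketch below reduces a study config to a set of string tokens (tags, design-variable names, objective names) and scores overlap with the Jaccard index, applying the same 0.8 threshold as `find_similar_surrogate`. The function names, the assumption that each design variable and objective entry carries a `name` key, and the choice of Jaccard over the embedding columns in the schema are all illustrative.

```python
def extract_problem_features(config: dict) -> set:
    """Flatten a study config into a set of comparable string tokens."""
    kb = config.get("surrogate_settings", {}).get("knowledge_base", {})
    features = {f"tag:{t}" for t in kb.get("tags", [])}
    # Assumes each entry is a dict with a "name" key.
    features |= {f"param:{v['name']}" for v in config.get("design_variables", [])}
    features |= {f"objective:{o['name']}" for o in config.get("objectives", [])}
    return features


def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(query: set, stored: dict, threshold: float = 0.8):
    """Return (name, score) of the most similar stored problem.

    stored maps surrogate name -> feature set. Returns None when nothing
    scores above the threshold.
    """
    if not stored:
        return None
    scored = [(name, jaccard(query, feats)) for name, feats in stored.items()]
    name, score = max(scored, key=lambda item: item[1])
    return (name, score) if score > threshold else None
```

A set-overlap score like this is easy to explain in the dashboard ("92% match"), but it ignores parameter bounds and physics; the `geometry_embedding` and `physics_embedding` columns would support a richer comparison later.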

---

## Phase 4: Dashboard Integration

### New Dashboard Pages

#### 1. Surrogate Status Panel (in existing Dashboard)
```
┌─────────────────────────────────────────────────────────┐
│ SURROGATE STATUS                                        │
├─────────────────────────────────────────────────────────┤
│ Mode: Hybrid (NN + FEA Validation)                      │
│ Stage: NN-Accelerated Optimization                      │
│                                                         │
│ Training Data: 150 samples (50 opt + 100 LHS)          │
│ Model Accuracy: MAPE 1.8% mass, 1.1% freq              │
│ Speedup: ~50x (10ms NN vs 500ms FEA)                   │
│                                                         │
│ [View Report] [Retrain] [Disable NN]                   │
└─────────────────────────────────────────────────────────┘
```

#### 2. Knowledge Base Browser
```
┌─────────────────────────────────────────────────────────┐
│ PHYSICS KNOWLEDGE BASE                                  │
├─────────────────────────────────────────────────────────┤
│ Stored Surrogates: 12                                   │
│                                                         │
│ [Cantilever Beams]  5 models, avg MAPE 2.1%            │
│ [Shell Structures]  3 models, avg MAPE 3.4%            │
│ [Solid Parts]       4 models, avg MAPE 4.2%            │
│                                                         │
│ Search: [aluminum modal_______] [Find Similar]          │
│                                                         │
│ Matching Models:                                        │
│ - uav_arm_v2 (92% match) - Transfer Learning Available │
│ - bracket_opt (78% match)                              │
└─────────────────────────────────────────────────────────┘
```

---

## Phase 5: User Workflow (Non-Coder Experience)

### Scenario: New Optimization with NN Acceleration

```
Step 1: Create Study via Dashboard
┌─────────────────────────────────────────────────────────┐
│ NEW OPTIMIZATION STUDY                                  │
├─────────────────────────────────────────────────────────┤
│ Study Name: [drone_motor_mount___________]              │
│ Description: [Motor mount bracket________]              │
│                                                         │
│ Model File: [Browse...] drone_mount.prt                │
│ Sim File:   [Browse...] drone_mount_sim.sim            │
│                                                         │
│ ☑ Enable Neural Network Acceleration                    │
│   ├─ Initial FEA Trials: [30____]                      │
│   ├─ Training Samples:   [100___]                      │
│   ├─ Target Accuracy:    [10% MAPE]                    │
│   └─ ☑ Save to Knowledge Base                          │
│                                                         │
│ Similar existing model found: "uav_arm_optimization"   │
│ ☑ Use as starting point (transfer learning)            │
│                                                         │
│ [Create Study]                                          │
└─────────────────────────────────────────────────────────┘

Step 2: System Automatically Executes Protocol 12
- User sees progress in dashboard
- No command-line needed
- All stages automated

Step 3: Review Results
- Pareto front with FEA-validated designs
- NN performance report
- Knowledge saved for future use
```

---

## Implementation Roadmap

### Phase 1: Config Schema Extension (1-2 days)
- [ ] Define surrogate_settings schema
- [ ] Update config validator
- [ ] Create migration for existing configs

### Phase 2: Protocol 12 Runner (3-5 days)
- [ ] Create HybridSurrogateRunner class
- [ ] Implement stage transitions
- [ ] Add progress callbacks for dashboard
- [ ] Integrate existing scripts as modules

### Phase 3: Knowledge Base (2-3 days)
- [ ] Create SQLite schema
- [ ] Implement PhysicsKnowledgeBase API
- [ ] Add similarity search
- [ ] Basic transfer learning

### Phase 4: Dashboard Integration (2-3 days)
- [ ] Surrogate status panel
- [ ] Knowledge base browser
- [ ] Study creation wizard with NN options

### Phase 5: Documentation & Testing (1-2 days)
- [ ] User guide for non-coders
- [ ] Integration tests
- [ ] Example workflows

---

## Data Flow Architecture

```
                    ┌──────────────────────────────────────┐
                    │      optimization_config.json        │
                    │  (Single source of truth for study)  │
                    └──────────────────┬───────────────────┘
                                       │
                    ┌──────────────────▼───────────────────┐
                    │         Protocol 12 Runner           │
                    │    (Orchestrates entire workflow)    │
                    └──────────────────┬───────────────────┘
                                       │
         ┌─────────────────┬───────────┼───────────┬─────────────────┐
         │                 │           │           │                 │
         ▼                 ▼           ▼           ▼                 ▼
    ┌─────────┐      ┌─────────┐ ┌─────────┐ ┌─────────┐      ┌─────────┐
    │  FEA    │      │Training │ │Surrogate│ │   NN    │      │Knowledge│
    │ Solver  │      │  Data   │ │ Trainer │ │  Optim  │      │  Base   │
    └────┬────┘      └────┬────┘ └────┬────┘ └────┬────┘      └────┬────┘
         │                │           │           │                 │
         ▼                ▼           ▼           ▼                 ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                         study.db                                 │
    │  (Optuna trials + training data + surrogate metadata)           │
    └─────────────────────────────────────────────────────────────────┘
                                       │
                    ┌──────────────────▼───────────────────┐
                    │        physics_surrogates.db         │
                    │   (Master knowledge base - global)   │
                    └──────────────────────────────────────┘
```

---

## Key Benefits

### For Non-Coders
1. **Single JSON config** - No Python scripts to run manually
2. **Dashboard control** - Start/stop/monitor from browser
3. **Automatic recommendations** - System suggests best settings
4. **Knowledge reuse** - Similar problems can warm-start from stored surrogates

### For the Organization
1. **Institutional memory** - Physics knowledge persists
2. **Faster iterations** - Each new study benefits from past work
3. **Reproducibility** - Everything tracked in databases
4. **Scalability** - More parallel FEA workers produce training data faster, which supports more accurate models

### For the Workflow
1. **End-to-end automation** - No manual steps between stages
2. **Adaptive optimization** - System learns during run
3. **Validated results** - Top candidates always FEA-verified
4. **Rich reporting** - Performance metrics, comparisons, recommendations

---

## Next Steps

1. **Review this plan** - Get feedback on priorities
2. **Start with config schema** - Extend optimization_config.json
3. **Build Protocol 12** - Core automation logic
4. **Knowledge Base MVP** - Basic save/load functionality
5. **Dashboard integration** - Visual control panel

---

*Document Version: 1.0*
*Created: 2025-11-25*
*Author: Claude Code + Antoine*