Atomizer/docs/protocols/system/SYS_16_SELF_AWARE_TURBO.md
Anto01 b1ffc64407 feat: Implement SAT v3 achieving WS=205.58 (new campaign record)
Self-Aware Turbo v3 optimization validated on M1 Mirror flat back:
- Best WS: 205.58 (5.8% better than previous best 218.26)
- 100% feasibility rate, 100% unique designs
- Uses 556 training samples from V5-V8 campaign data

Key innovations in V9:
- Adaptive exploration schedule (15% → 8% → 3%)
- Mass threshold at 118 kg (optimal sweet spot)
- 70% exploitation near best design
- Seeded with best known design from V7
- Ensemble surrogate with R²=0.99

Updated documentation:
- SYS_16: SAT protocol updated to v3.0 VALIDATED
- Cheatsheet: Added SAT v3 as recommended method
- Context: Updated protocol overview

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 16:06:33 -05:00

SYS_16: Self-Aware Turbo (SAT) Optimization

Version: 3.0

Status: VALIDATED

Created: 2025-12-28

Updated: 2025-12-31


Quick Summary

SAT v3 achieved WS=205.58, beating all previous methods (V7 TPE: 218.26, V6 TPE: 225.41).

SAT is a surrogate-accelerated optimization method that:

  1. Trains an ensemble of 5 MLPs on historical FEA data
  2. Uses adaptive exploration that decreases over time (15%→8%→3%)
  3. Filters candidates to prevent duplicate evaluations
  4. Applies soft mass constraints in the acquisition function

Version History

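| Version | Study | Training Data | Key Fix | Best WS |
|---------|-------|---------------|---------|---------|
| v1 | V7 | 129 (V6 only) | - | 218.26 |
| v2 | V8 | 196 (V6 only) | Duplicate prevention | 271.38 |
| v3 | V9 | 556 (V5-V8) | Adaptive exploration + mass targeting | 205.58 |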

Problem Statement

V5 surrogate + L-BFGS failed catastrophically because:

  1. MLP predicted WS=280 but actual was WS=376 (30%+ error)
  2. L-BFGS descended to regions outside training distribution
  3. Surrogate had no way to signal uncertainty
  4. All L-BFGS solutions converged to the same "fake optimum"

Root cause: The surrogate is overconfident in regions where it has no data.


Solution: Uncertainty-Aware Surrogate with Active Learning

Core Principles

  1. Never trust a point prediction - Always require uncertainty bounds
  2. High uncertainty = run FEA - Don't optimize where you don't know
  3. Actively fill gaps - Prioritize FEA in high-uncertainty regions
  4. Validate gradient solutions - Check L-BFGS results against FEA before trusting

Architecture

1. Ensemble Surrogate (Epistemic Uncertainty)

Instead of one MLP, train N independent models with different initializations:

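import numpy as np

class EnsembleSurrogate:
    """Ensemble of MLPs; the spread between members estimates epistemic uncertainty."""

    def __init__(self, n_models=5):
        # MLP is assumed to be defined elsewhere and to expose predict();
        # each instance gets its own random initialization.
        self.models = [MLP() for _ in range(n_models)]

    def predict(self, x):
        preds = [m.predict(x) for m in self.models]
        mean = np.mean(preds, axis=0)
        std = np.std(preds, axis=0)  # Epistemic uncertainty
        return mean, std

    def is_confident(self, x, threshold=0.1):
        mean, std = self.predict(x)
        # Confident if std < 10% of |mean| (abs keeps the ratio valid for negative predictions)
        return (std / (np.abs(mean) + 1e-6)) < threshold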

Why this works: Models trained from different random seeds tend to agree in well-sampled regions and diverge when extrapolating, so the spread of their predictions is a usable proxy for epistemic uncertainty.
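
The effect can be reproduced with a small NumPy-only sketch. To stay dependency-free it swaps the MLPs for cubic polynomials fitted on bootstrap resamples, which is an assumption of this example and not the SAT implementation; the function names are hypothetical.

```python
import numpy as np

def fit_bootstrap_ensemble(x, y, n_models=5, degree=3, seed=0):
    """Fit n_models polynomials, each on a bootstrap resample of (x, y)."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
        members.append(np.polyfit(x[idx], y[idx], degree))
    return members

def ensemble_predict(members, x):
    """Return the ensemble mean and the member-to-member standard deviation."""
    preds = np.array([np.polyval(coeffs, x) for coeffs in members])
    return preds.mean(axis=0), preds.std(axis=0)

# Training data only covers x in [0, 1]
rng = np.random.default_rng(42)
x_train = rng.uniform(0.0, 1.0, size=60)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.05, size=60)

ensemble = fit_bootstrap_ensemble(x_train, y_train)
_, std_inside = ensemble_predict(ensemble, np.array([0.5]))   # inside the data range
_, std_outside = ensemble_predict(ensemble, np.array([2.0]))  # far outside it
print(f"std inside: {std_inside[0]:.4f}, std outside: {std_outside[0]:.4f}")
```

The members differ only slightly inside the sampled interval, but those small differences are amplified once the query point leaves it.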

2. Distance-Based OOD Detection

Track training data distribution and flag points that are "too far":

import numpy as np
from sklearn.neighbors import NearestNeighbors

class OODDetector:
    def __init__(self, X_train):
        self.X_train = X_train
        self.mean = X_train.mean(axis=0)
        self.std = X_train.std(axis=0)
        # Fit KNN for local density
        self.knn = NearestNeighbors(n_neighbors=5)
        self.knn.fit(X_train)

    def distance_to_training(self, x):
        """Return the mean distance to the 5 nearest training points."""
        distances, _ = self.knn.kneighbors(x.reshape(1, -1))
        return distances.mean()

    def is_in_distribution(self, x, threshold=2.0):
        """Check that every feature lies within `threshold` std of the training mean."""
        z_scores = np.abs((x - self.mean) / (self.std + 1e-6))
        return z_scores.max() < threshold
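
The two checks are complementary, which the following NumPy-only sketch tries to illustrate. It uses a brute-force neighbor search in place of scikit-learn and a made-up two-cluster dataset; both are assumptions of the example, and the helper names are hypothetical.

```python
import numpy as np

def max_abs_z(x, X_train):
    """Largest per-feature z-score of x relative to the training data."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    return float(np.max(np.abs((x - mu) / (sd + 1e-6))))

def mean_knn_distance(x, X_train, k=5):
    """Mean Euclidean distance from x to its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return float(np.sort(dists)[:k].mean())

# Two tight clusters with an empty gap between them
rng = np.random.default_rng(0)
cluster_a = rng.normal([-3.0, -3.0], 0.2, size=(50, 2))
cluster_b = rng.normal([3.0, 3.0], 0.2, size=(50, 2))
X_train = np.vstack([cluster_a, cluster_b])

gap_point = np.array([0.0, 0.0])    # sits at the global mean, but no data nearby
near_point = np.array([3.0, 3.0])   # sits inside cluster_b

z_gap = max_abs_z(gap_point, X_train)
d_gap = mean_knn_distance(gap_point, X_train)
d_near = mean_knn_distance(near_point, X_train)
print(f"gap point: z={z_gap:.2f}, knn distance={d_gap:.2f}; near point: knn distance={d_near:.2f}")
```

The gap point passes the z-score test because it lies at the global mean, yet its neighbor distance is far larger than that of a point inside a cluster. This suggests that thresholding `distance_to_training` as well may catch holes that the z-score check misses.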

3. Trust-Region L-BFGS

Constrain L-BFGS to stay within training distribution:

from scipy.optimize import minimize

def trust_region_lbfgs(surrogate, ood_detector, x0, max_iter=100):
    """L-BFGS that respects training data boundaries."""

    def constrained_objective(x):
        # If OOD, return large penalty
        if not ood_detector.is_in_distribution(x):
            return 1e9

        mean, std = surrogate.predict(x)
        # If uncertain, return upper confidence bound (pessimistic)
        if std > 0.1 * mean:
            return mean + 2 * std  # Be conservative

        return mean

    # max_iter caps the number of L-BFGS-B iterations
    result = minimize(constrained_objective, x0, method='L-BFGS-B',
                      options={'maxiter': max_iter})
    return result.x
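
A constant penalty leaves the objective flat outside the trusted region, so finite-difference gradients vanish there and the optimizer can stall if it ever lands on that plateau. One possible alternative, sketched below as an assumption and not as the SAT implementation, is to hand the same per-feature limits (training mean ± 2 std) to L-BFGS-B as box bounds. The helper names and the toy objective are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def box_from_training(X_train, n_std=2.0):
    """Per-feature bounds: training mean +/- n_std standard deviations."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    return mu - n_std * sd, mu + n_std * sd

def bounded_lbfgs(objective, x0, lower, upper, max_iter=100):
    """Run L-BFGS-B with box bounds so iterates never leave the trusted box."""
    result = minimize(objective, x0, method="L-BFGS-B",
                      bounds=list(zip(lower, upper)),
                      options={"maxiter": max_iter})
    return result.x

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 3))
lower, upper = box_from_training(X_train)

# Toy stand-in for the surrogate mean: its unconstrained minimum (x = 5)
# lies far outside the training data.
def toy_objective(x):
    return float(np.sum((x - 5.0) ** 2))

x_opt = bounded_lbfgs(toy_objective, X_train.mean(axis=0), lower, upper)
print("optimum stays inside the box:", bool(np.all((x_opt >= lower) & (x_opt <= upper))))
```

Box bounds only reproduce the z-score check; the uncertainty-based penalty in `constrained_objective` would still be needed inside the box.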

4. Acquisition Function with Uncertainty

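Score candidates by predicted improvement plus an uncertainty bonus, in the spirit of Bayesian Optimization acquisition functions (a simplified heuristic, not the closed-form Expected Improvement):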

def acquisition_score(x, surrogate, best_so_far):
    """Score = potential improvement weighted by confidence."""
    mean, std = surrogate.predict(x)

    # Predicted improvement over the incumbent (positive when the predicted mean is lower, i.e. better for minimization)
    improvement = best_so_far - mean

    # Exploration bonus for uncertain regions
    exploration = 0.5 * std

    # High score = worth evaluating with FEA
    return improvement + exploration

def select_next_fea_candidates(surrogate, candidates, best_so_far, n=5):
    """Select candidates balancing exploitation and exploration."""
    scores = [acquisition_score(c, surrogate, best_so_far) for c in candidates]

    # Pick top candidates by acquisition score
    top_indices = np.argsort(scores)[-n:]
    return [candidates[i] for i in top_indices]
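
A worked example of the scoring rule with made-up numbers may help. The vectorized helper below is a hypothetical restatement of `acquisition_score` for arrays, and the prediction values are purely illustrative.

```python
import numpy as np

def acquisition_scores(mean, std, best_so_far, explore_weight=0.5):
    """Vectorized form of: (best_so_far - mean) + explore_weight * std."""
    return (best_so_far - mean) + explore_weight * std

# Illustrative ensemble predictions for four candidates (not real study data)
mean = np.array([210.0, 200.0, 230.0, 205.0])
std = np.array([1.0, 2.0, 40.0, 1.0])

scores = acquisition_scores(mean, std, best_so_far=205.0)
top2 = np.argsort(scores)[::-1][:2]  # highest score first

print(scores.tolist())  # [-4.5, 6.0, -5.0, 0.5]
print(top2.tolist())    # [1, 3]
```

Candidate 2 has by far the largest uncertainty, but with a weight of 0.5 its bonus (+20) does not offset the predicted deficit (-25). The exploration weight therefore controls how far from the incumbent the search is willing to probe.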

Algorithm: Self-Aware Turbo (SAT)

INITIALIZE:
  - Load existing FEA data (X_train, Y_train)
  - Train ensemble surrogate on data
  - Fit OOD detector on X_train
  - Set best_ws = min(Y_train)

PHASE 1: UNCERTAINTY MAPPING (10% of budget)
  FOR i in 1..N_mapping:
    - Sample random point x
    - Get uncertainty: mean, std = surrogate.predict(x)
    - If std > threshold: run FEA, add to training data
    - Retrain ensemble periodically

  This fills in the "holes" in the surrogate's knowledge.

PHASE 2: EXPLOITATION WITH VALIDATION (80% of budget)
  FOR i in 1..N_exploit:
    - Generate 1000 TPE samples
    - Filter to keep only confident predictions (std < 10% of mean)
    - Filter to keep only in-distribution (OOD check)
    - Rank by predicted WS

    - Take top 5 candidates
    - Run FEA on all 5

    - For each FEA result:
      - Compare predicted vs actual
      - If error > 20%: mark region as "unreliable", force exploration there
      - If error < 10%: update best, retrain surrogate

    - Every 10 iterations: retrain ensemble with new data

PHASE 3: L-BFGS REFINEMENT (10% of budget)
  - Only run L-BFGS if ensemble R² > 0.95 on validation set
  - Use trust-region L-BFGS (stay within training distribution)

  FOR each L-BFGS solution:
    - Check ensemble disagreement
    - If models agree (std < 5%): run FEA to validate
    - If models disagree: skip, too uncertain

    - Compare L-BFGS prediction vs FEA
    - If error > 15%: ABORT L-BFGS phase, return to Phase 2
    - If error < 10%: accept as candidate

FINAL:
  - Return best FEA-validated design
  - Report uncertainty bounds for all objectives
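
The Phase 2 filtering step (keep confident, in-distribution candidates, then rank by predicted WS) could look roughly like the sketch below. The function name, the array-based interface, and the sample values are assumptions made for illustration.

```python
import numpy as np

def select_phase2_candidates(pred_mean, pred_std, in_distribution,
                             max_rel_std=0.10, top_k=5):
    """Return indices of the top_k lowest-WS candidates that pass both filters."""
    confident = pred_std < max_rel_std * np.abs(pred_mean)  # std < 10% of mean
    keep = np.flatnonzero(confident & in_distribution)      # both filters must pass
    order = np.argsort(pred_mean[keep])                     # lowest predicted WS first
    return keep[order][:top_k]

# Illustrative predictions for four candidates (not real study data)
pred_mean = np.array([210.0, 190.0, 205.0, 250.0])
pred_std = np.array([5.0, 40.0, 10.0, 2.0])
in_dist = np.array([True, True, False, True])

chosen = select_phase2_candidates(pred_mean, pred_std, in_dist, top_k=2)
print(chosen.tolist())  # [0, 3]
```

Candidate 1 has the best predicted WS but is dropped for high ensemble disagreement, and candidate 2 is dropped by the OOD check, so only candidates 0 and 3 would be sent to FEA.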

Key Differences from V5

| Aspect | V5 (Failed) | SAT (Proposed) |
|--------|-------------|----------------|
| Model | Single MLP | Ensemble of 5 MLPs |
| Uncertainty | None | Ensemble disagreement + OOD detection |
| L-BFGS | Trust blindly | Trust-region, validate every step |
| Extrapolation | Accept | Reject or penalize |
| Active learning | No | Yes - prioritize uncertain regions |
| Validation | After L-BFGS | Throughout |

Implementation Checklist

  1. EnsembleSurrogate class with N=5 MLPs
  2. OODDetector with KNN + z-score checks
  3. acquisition_score() balancing exploitation/exploration
  4. Trust-region L-BFGS with OOD penalties
  5. Automatic retraining when new FEA data arrives
  6. Logging of prediction errors to track surrogate quality
  7. Early abort if L-BFGS predictions consistently wrong

Expected Behavior

In well-sampled regions:

  • Ensemble agrees → Low uncertainty → Trust predictions
  • L-BFGS finds valid optima → FEA confirms → Success

In poorly-sampled regions:

  • Ensemble disagrees → High uncertainty → Run FEA instead
  • L-BFGS penalized → Stays in trusted zone → No fake optima

At distribution boundaries:

  • OOD detector flags → Reject predictions
  • Acquisition prioritizes → Active learning fills gaps

Metrics to Track

  1. Surrogate R² on validation set - Target > 0.95 before L-BFGS
  2. Prediction error histogram - Should be centered at 0
  3. OOD rejection rate - How often we refuse to predict
  4. Ensemble disagreement - Average std across predictions
  5. L-BFGS success rate - % of L-BFGS solutions that validate
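
Several of these metrics can be computed from arrays of FEA results and surrogate predictions. The sketch below is one way to do it; the function names, dictionary keys, and sample values are assumptions made for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def surrogate_report(y_true, y_pred, pred_std):
    """Summarize surrogate quality on a held-out validation set."""
    errors = y_pred - y_true
    return {
        "r2": r_squared(y_true, y_pred),
        "mean_error": float(errors.mean()),  # should be close to 0 (no bias)
        "mean_abs_rel_error": float(np.mean(np.abs(errors) / np.abs(y_true))),
        "mean_ensemble_std": float(pred_std.mean()),
    }

# Illustrative validation data (not real study results)
y_true = np.array([200.0, 220.0, 240.0, 260.0])
y_pred = np.array([202.0, 218.0, 243.0, 257.0])
pred_std = np.array([1.5, 2.0, 2.5, 3.0])

report = surrogate_report(y_true, y_pred, pred_std)
print(report)
```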

When to Use SAT vs Pure TPE

| Scenario | Recommendation |
|----------|----------------|
| < 100 existing samples | Pure TPE (not enough for good surrogate) |
| 100-500 samples | SAT Phase 1-2 only (no L-BFGS) |
| > 500 samples | Full SAT with L-BFGS refinement |
| High-dimensional (>20 params) | Pure TPE (curse of dimensionality) |
| Noisy FEA | Pure TPE (surrogates struggle with noise) |
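
The table can be read as a small decision rule. The sketch below assumes that the noise and dimensionality rows take precedence over the sample-count rows, which the table does not state explicitly; the function name is hypothetical.

```python
def recommend_method(n_samples, n_params, noisy_fea=False):
    """Map the scenario table to a recommendation string."""
    # Assumption: noise, high dimensionality, or too little data each force pure TPE
    if noisy_fea or n_params > 20 or n_samples < 100:
        return "Pure TPE"
    if n_samples <= 500:
        return "SAT Phase 1-2 only (no L-BFGS)"
    return "Full SAT with L-BFGS refinement"

print(recommend_method(n_samples=300, n_params=10))
print(recommend_method(n_samples=556, n_params=10))
```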

SAT v3 Implementation Details

Adaptive Exploration Schedule

def get_exploration_weight(trial_num):
    if trial_num <= 30:      return 0.15  # Phase 1: 15% exploration
    elif trial_num <= 80:    return 0.08  # Phase 2: 8% exploration
    else:                    return 0.03  # Phase 3: 3% exploration

Acquisition Function (v3)

# Normalize components (pred_ws, pred_mass, and distances are NumPy arrays over all candidates)
norm_ws = (pred_ws - pred_ws.min()) / (pred_ws.max() - pred_ws.min())
norm_dist = distances / distances.max()
mass_penalty = np.maximum(0, pred_mass - 118.0) * 5.0  # Soft threshold at 118 kg

# Adaptive acquisition (lower = better)
acquisition = norm_ws - exploration_weight * norm_dist + mass_penalty
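
A self-contained version with made-up numbers shows how the three terms interact. It assumes the mass penalty enters the sum unnormalized (5.0 per kg above the threshold) and adds guards against division by zero; the function name and the sample values are hypothetical.

```python
import numpy as np

def sat_v3_acquisition(pred_ws, pred_mass, distances, exploration_weight,
                       mass_threshold=118.0, mass_weight=5.0):
    """Lower is better: normalized WS, minus a distance bonus, plus a mass penalty."""
    eps = 1e-12  # guards against division by zero when all values are equal
    norm_ws = (pred_ws - pred_ws.min()) / max(pred_ws.max() - pred_ws.min(), eps)
    norm_dist = distances / max(distances.max(), eps)
    mass_penalty = np.maximum(0.0, pred_mass - mass_threshold) * mass_weight
    return norm_ws - exploration_weight * norm_dist + mass_penalty

# Illustrative candidates (not real study data)
pred_ws = np.array([210.0, 205.0, 300.0])
pred_mass = np.array([117.0, 119.0, 116.0])   # kg
distances = np.array([0.10, 0.05, 0.50])      # distance to nearest evaluated design

acq = sat_v3_acquisition(pred_ws, pred_mass, distances, exploration_weight=0.08)
best = int(np.argmin(acq))
print(acq.round(4).tolist(), "-> pick candidate", best)
```

Candidate 1 has the best predicted WS but sits 1 kg over the threshold, so its penalty of 5.0 dominates and candidate 0 is selected instead.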

Candidate Generation (v3)

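from random import random, uniform

# sample_near_point() and sample_random() are assumed to be defined elsewhere
candidates = []
for _ in range(1000):
    if random() < 0.7 and best_x is not None:
        # 70% exploitation: sample near best
        scale = uniform(0.05, 0.15)
        candidate = sample_near_point(best_x, scale)
    else:
        # 30% exploration: random sampling
        candidate = sample_random()
    candidates.append(candidate)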
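
A runnable sketch of the same 70/30 split follows. Here "sample near best" is implemented as a Gaussian perturbation scaled by each parameter's range and clipped to the bounds; that choice, the function name, and the bounds are assumptions, since the source does not show how `sample_near_point` works.

```python
import numpy as np

def generate_candidates(best_x, lower, upper, n=1000, exploit_ratio=0.7, seed=0):
    """Mix local perturbations of best_x with uniform random samples."""
    rng = np.random.default_rng(seed)
    span = upper - lower
    candidates = np.empty((n, len(lower)))
    for i in range(n):
        if best_x is not None and rng.random() < exploit_ratio:
            # Exploitation: perturb the best design by 5-15% of each parameter range
            scale = rng.uniform(0.05, 0.15)
            point = best_x + rng.normal(0.0, scale * span)
            candidates[i] = np.clip(point, lower, upper)
        else:
            # Exploration: uniform sample over the full design space
            candidates[i] = rng.uniform(lower, upper)
    return candidates

# Hypothetical 3-parameter design space
lower = np.array([0.0, 10.0, 100.0])
upper = np.array([1.0, 20.0, 200.0])
best_x = np.array([0.4, 15.0, 150.0])

cands = generate_candidates(best_x, lower, upper)
print(cands.shape)  # (1000, 3)
```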

Key Configuration (v3)

{
  "n_ensemble_models": 5,
  "training_epochs": 800,
  "candidates_per_round": 1000,
  "min_distance_threshold": 0.03,
  "mass_soft_threshold": 118.0,
  "exploit_near_best_ratio": 0.7,
  "lbfgs_polish_trials": 10
}
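
The `min_distance_threshold` setting presumably drives the duplicate filter. A minimal sketch of such a filter is shown below, assuming Euclidean distance in a design space normalized to [0, 1]; the source does not state the metric, and the function name is hypothetical.

```python
import numpy as np

def filter_duplicates(candidates, evaluated, min_distance=0.03):
    """Drop candidates closer than min_distance to any already-evaluated design."""
    # Pairwise distances, shape (n_candidates, n_evaluated)
    diffs = candidates[:, None, :] - evaluated[None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    return candidates[nearest >= min_distance]

evaluated = np.array([[0.50, 0.50]])
candidates = np.array([[0.50, 0.51],   # 0.01 away: treated as a duplicate
                       [0.90, 0.90]])  # far away: kept

unique = filter_duplicates(candidates, evaluated)
print(unique.tolist())  # [[0.9, 0.9]]
```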

V9 Results

| Phase | Trials | Best WS | Mean WS |
|-------|--------|---------|---------|
| Phase 1 (explore) | 30 | 232.00 | 394.48 |
| Phase 2 (balanced) | 50 | 222.01 | 360.51 |
| Phase 3 (exploit) | 57+ | 205.58 | 262.57 |

Key metrics:

  • 100% feasibility rate
  • 100% unique designs (no duplicates)
  • Surrogate R² = 0.99

References

  • Gaussian Process literature on uncertainty quantification
  • Deep Ensembles: Lakshminarayanan et al. (2017)
  • Bayesian Optimization with Expected Improvement
  • Trust-region methods for constrained optimization

Implementation

  • V9 Study: studies/M1_Mirror/m1_mirror_cost_reduction_flat_back_V9/
  • Script: run_sat_optimization.py
  • Ensemble: optimization_engine/surrogates/ensemble_surrogate.py

The key insight: A surrogate that knows when it doesn't know is far more valuable than one that's confidently wrong.