docs/protocols/system/SYS_16_SELF_AWARE_TURBO.md

# SYS_16: Self-Aware Turbo (SAT) Optimization

## Version: 1.0
## Status: PROPOSED
## Created: 2025-12-28

---

## Problem Statement

V5 surrogate + L-BFGS failed catastrophically because:
1. MLP predicted WS=280 but actual was WS=376 (a gap of 96: about 26% of the actual value, or 34% of the prediction)
2. L-BFGS descended to regions **outside training distribution**
3. Surrogate had no way to signal uncertainty
4. All L-BFGS solutions converged to the same "fake optimum"

**Root cause:** The surrogate is overconfident in regions where it has no data.

---

## Solution: Uncertainty-Aware Surrogate with Active Learning

### Core Principles

1. **Never trust a point prediction** - Always require uncertainty bounds
2. **High uncertainty = run FEA** - Don't optimize where you don't know
3. **Actively fill gaps** - Prioritize FEA in high-uncertainty regions
4. **Validate gradient solutions** - Check L-BFGS results against FEA before trusting

---

## Architecture

### 1. Ensemble Surrogate (Epistemic Uncertainty)

Instead of one MLP, train **N independent models** with different initializations:

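```python
import numpy as np

class EnsembleSurrogate:
    def __init__(self, n_models=5):
        # MLP is assumed to be defined elsewhere; each instance must start
        # from a different random initialization.
        self.models = [MLP() for _ in range(n_models)]

    def predict(self, x):
        # Stack per-model predictions along a new leading axis
        preds = np.stack([m.predict(x) for m in self.models])
        mean = preds.mean(axis=0)
        std = preds.std(axis=0)  # Epistemic uncertainty (ensemble disagreement)
        return mean, std

    def is_confident(self, x, threshold=0.1):
        mean, std = self.predict(x)
        # Confident if std < 10% of |mean|; abs() keeps the test valid for negative means
        return (std / (np.abs(mean) + 1e-6)) < threshold
```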

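**Why this works:** Models trained from different random seeds are all pulled toward the same data in well-sampled regions, so they tend to agree there. Where they must extrapolate, nothing constrains them and their predictions tend to diverge, which shows up as a large ensemble std.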
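The sketch below illustrates that agree/disagree behavior on a toy problem. It is not the project's surrogate: `RandomFeatureNet` is a hypothetical NumPy-only stand-in for an MLP (a fixed random tanh hidden layer with a least-squares output layer), and the quadratic target and the sampling box are assumptions chosen only for illustration.

```python
import numpy as np

class RandomFeatureNet:
    """Hypothetical stand-in for an MLP: random tanh hidden layer, least-squares output."""

    def __init__(self, n_hidden=50, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_hidden = n_hidden

    def _features(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # A different seed gives a different random hidden layer per model
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._features(X)
        # Ridge-regularized least squares for the output weights
        A = H.T @ H + 1e-3 * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return self._features(np.atleast_2d(X)) @ self.beta

def ensemble_predict(models, X):
    preds = np.stack([m.predict(X) for m in models])  # (n_models, n_points)
    return preds.mean(axis=0), preds.std(axis=0)

# Toy training set: inputs sampled only inside the box [-1, 1]^2
rng = np.random.default_rng(42)
X_train = rng.uniform(-1.0, 1.0, size=(200, 2))
y_train = np.sum(X_train ** 2, axis=1)

models = [RandomFeatureNet(seed=s).fit(X_train, y_train) for s in range(5)]

X_query = np.array([[0.0, 0.0],   # inside the sampled box
                    [4.0, 4.0]])  # far outside it
mean, std = ensemble_predict(models, X_query)
print("mean:", mean)
print("std: ", std)
```

On a setup like this the std at the far point is typically much larger than at the interior point, but that is an empirical tendency rather than a guarantee, which is why the OOD check in the next section is kept as a second line of defense.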

### 2. Distance-Based OOD Detection

Track training data distribution and flag points that are "too far":

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

class OODDetector:
    def __init__(self, X_train):
        self.X_train = X_train
        self.mean = X_train.mean(axis=0)
        self.std = X_train.std(axis=0)
        # Fit KNN for local density
        self.knn = NearestNeighbors(n_neighbors=5)
        self.knn.fit(X_train)

    def distance_to_training(self, x):
        """Return the mean distance to the 5 nearest training points."""
        distances, _ = self.knn.kneighbors(np.asarray(x).reshape(1, -1))
        return distances.mean()

    def is_in_distribution(self, x, threshold=2.0):
        """Check that every coordinate is within `threshold` std of the training mean."""
        z_scores = np.abs((x - self.mean) / (self.std + 1e-6))
        return z_scores.max() < threshold
```
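The z-score check only draws a box around the training mean, so it cannot see holes inside that box; the KNN distance can, but it needs a cutoff, and none is specified above. One possible way to pick it, sketched here as an assumption rather than as part of the protocol, is to take a high quantile of each training point's own neighbor distance. The function names, the 0.95 quantile, and the NumPy-only brute-force distance are all illustrative choices.

```python
import numpy as np

def knn_mean_distance(X_ref, x, k=5, exclude_self=False):
    """Mean Euclidean distance from x to its k nearest rows of X_ref."""
    d = np.sort(np.linalg.norm(X_ref - x, axis=1))
    if exclude_self:
        d = d[1:]  # drop the zero distance from the point to itself
    return d[:k].mean()

def calibrate_distance_threshold(X_train, k=5, quantile=0.95):
    """Assumed rule: threshold = quantile of the training points' own k-NN distances."""
    scores = [knn_mean_distance(X_train, x, k, exclude_self=True) for x in X_train]
    return float(np.quantile(scores, quantile))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 3))
threshold = calibrate_distance_threshold(X_train)

x_near = X_train.mean(axis=0)         # center of the data cloud
x_far = X_train.mean(axis=0) + 10.0   # far from every training point
for name, x in [("near", x_near), ("far", x_far)]:
    d = knn_mean_distance(X_train, x)
    print(name, round(float(d), 3), "in-distribution" if d <= threshold else "OOD")
```

Distances are only meaningful if the design variables share a comparable scale, so standardizing the inputs first (for example with the `mean` and `std` already stored by `OODDetector`) is probably advisable.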

### 3. Trust-Region L-BFGS

Constrain L-BFGS to stay within training distribution:

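```python
import numpy as np
from scipy.optimize import minimize

def trust_region_lbfgs(surrogate, ood_detector, x0, max_iter=100):
    """L-BFGS that respects training data boundaries."""

    def constrained_objective(x):
        # If OOD, return large penalty
        if not ood_detector.is_in_distribution(x):
            return 1e9

        mean, std = surrogate.predict(x)
        # Reduce single-point predictions to plain floats for the optimizer
        mean, std = float(np.squeeze(mean)), float(np.squeeze(std))
        # If uncertain, return upper confidence bound (pessimistic)
        if std > 0.1 * abs(mean):
            return mean + 2 * std  # Be conservative

        return mean

    # x0 should be in-distribution: the penalty is flat, so it gives the
    # optimizer no gradient to follow back from an OOD starting point.
    result = minimize(constrained_objective, x0, method='L-BFGS-B',
                      options={'maxiter': max_iter})
    return result.x
```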
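A constant penalty is discontinuous at the region boundary, which can confuse the finite-difference gradients L-BFGS-B falls back on. An alternative worth considering, sketched here under assumptions and not taken from the protocol, is to hand the same z-score box to the optimizer through its `bounds` argument so that every iterate stays feasible. `bounded_lbfgs`, `z_max`, and the toy objective are hypothetical names introduced for this example.

```python
import numpy as np
from scipy.optimize import minimize

def bounded_lbfgs(objective, x0, X_train, z_max=2.0, max_iter=100):
    """Run L-BFGS-B inside the box mean +/- z_max * std of the training inputs."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    lower, upper = mu - z_max * sigma, mu + z_max * sigma
    x0 = np.clip(x0, lower, upper)  # start from a feasible point
    result = minimize(objective, x0, method="L-BFGS-B",
                      bounds=list(zip(lower, upper)),
                      options={"maxiter": max_iter})
    return result.x, (lower, upper)

# Toy stand-in for the surrogate mean: its unconstrained minimum (3, 3)
# lies outside the region covered by the training inputs.
def toy_objective(x):
    return float(np.sum((x - 3.0) ** 2))

rng = np.random.default_rng(1)
X_train = rng.uniform(-1.0, 1.0, size=(500, 2))

x_opt, (lower, upper) = bounded_lbfgs(toy_objective, np.zeros(2), X_train)
print("optimum:    ", x_opt)
print("upper bound:", upper)
```

In this toy case the optimizer stops on the upper bound instead of chasing the minimum outside the data. A result that sits on a bound is itself a useful signal: the surrogate wants to leave the trusted region, so that direction is a candidate for more FEA samples.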

### 4. Acquisition Function with Uncertainty

Use an **uncertainty-aware acquisition score** in the spirit of Bayesian Optimization: predicted improvement plus an exploration bonus (a simpler stand-in for true Expected Improvement):

```python
import numpy as np

def acquisition_score(x, surrogate, best_so_far):
    """Score = predicted improvement plus an exploration bonus."""
    mean, std = surrogate.predict(x)
    # Reduce single-point predictions to plain floats so scores sort cleanly
    mean, std = float(np.squeeze(mean)), float(np.squeeze(std))

    # Predicted improvement over the incumbent (minimization: lower mean is better)
    improvement = best_so_far - mean

    # Exploration bonus for uncertain regions
    exploration = 0.5 * std

    # High score = worth evaluating with FEA
    return improvement + exploration

def select_next_fea_candidates(surrogate, candidates, best_so_far, n=5):
    """Select candidates balancing exploitation and exploration."""
    scores = [acquisition_score(c, surrogate, best_so_far) for c in candidates]

    # Pick the n highest-scoring candidates, best first
    top_indices = np.argsort(scores)[::-1][:n]
    return [candidates[i] for i in top_indices]
```
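For comparison, the closed-form Expected Improvement from the Bayesian Optimization literature is sketched below next to the additive score. It assumes the surrogate's prediction at a point can be treated as Gaussian with the ensemble mean and std, which is an approximation for a 5-model ensemble. `expected_improvement` and `additive_score` are illustrative names, not part of the existing code.

```python
import math

def expected_improvement(mean, std, best_so_far):
    """Closed-form EI for minimization under a Gaussian predictive distribution."""
    if std <= 0.0:
        return max(best_so_far - mean, 0.0)  # no uncertainty: plain improvement
    z = (best_so_far - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best_so_far - mean) * cdf + std * pdf

def additive_score(mean, std, best_so_far, kappa=0.5):
    """Additive form: predicted improvement plus kappa times the std."""
    return (best_so_far - mean) + kappa * std

# A candidate predicted slightly worse than the incumbent, but uncertain
print(additive_score(310.0, 20.0, 300.0))
print(expected_improvement(310.0, 20.0, 300.0))
```

The two scores rank candidates differently: EI is never negative and shrinks smoothly as the predicted mean moves above the incumbent, while the additive score trades mean against std linearly. Which one suits the FEA budget better is an empirical question.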

---

## Algorithm: Self-Aware Turbo (SAT)

```
INITIALIZE:
  - Load existing FEA data (X_train, Y_train)
  - Train ensemble surrogate on data
  - Fit OOD detector on X_train
  - Set best_ws = min(Y_train)

PHASE 1: UNCERTAINTY MAPPING (10% of budget)
  FOR i in 1..N_mapping:
    - Sample random point x
    - Get uncertainty: mean, std = surrogate.predict(x)
    - If std > threshold: run FEA, add to training data
    - Retrain ensemble periodically

  This fills in the "holes" in the surrogate's knowledge.

PHASE 2: EXPLOITATION WITH VALIDATION (80% of budget)
  FOR i in 1..N_exploit:
    - Generate 1000 TPE samples
    - Filter to keep only confident predictions (std < 10% of mean)
    - Filter to keep only in-distribution (OOD check)
    - Rank by predicted WS

    - Take top 5 candidates
    - Run FEA on all 5

    - For each FEA result:
      - Compare predicted vs actual
      - If error > 20%: mark region as "unreliable", force exploration there
      - If error < 10%: update best, retrain surrogate

    - Every 10 iterations: retrain ensemble with new data

PHASE 3: L-BFGS REFINEMENT (10% of budget)
  - Only run L-BFGS if ensemble R² > 0.95 on validation set
  - Use trust-region L-BFGS (stay within training distribution)

  FOR each L-BFGS solution:
    - Check ensemble disagreement
    - If models agree (std < 5%): run FEA to validate
    - If models disagree: skip, too uncertain

    - Compare L-BFGS prediction vs FEA
    - If error > 15%: ABORT L-BFGS phase, return to Phase 2
    - If error < 10%: accept as candidate

FINAL:
  - Return best FEA-validated design
  - Report uncertainty bounds for all objectives
```
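The Phase 2 filter-and-rank step and the predicted-vs-actual check can be sketched as below. This is a minimal illustration, not the implementation: the function names are hypothetical, the error is assumed to be measured relative to the FEA value, and the "borderline" label for errors between 10% and 20% is an assumption, since the algorithm above does not say what happens in that band.

```python
import numpy as np

def select_phase2_candidates(mean, std, in_dist, rel_std_max=0.10, top_k=5):
    """Indices of confident, in-distribution candidates, lowest predicted WS first."""
    confident = std < rel_std_max * np.abs(mean)
    keep = np.flatnonzero(confident & in_dist)
    order = keep[np.argsort(mean[keep])]  # ascending: lower WS is better
    return order[:top_k]

def validation_action(predicted, actual, reject_above=0.20, accept_below=0.10):
    """Classify one FEA check by relative error (measured against the FEA value)."""
    rel_err = abs(predicted - actual) / abs(actual)
    if rel_err > reject_above:
        return "unreliable"   # force exploration in this region
    if rel_err < accept_below:
        return "accept"       # prediction confirmed
    return "borderline"       # assumed label: band not specified by the algorithm

# Six hypothetical candidates with surrogate mean/std and an OOD flag
mean = np.array([300.0, 280.0, 350.0, 290.0, 310.0, 270.0])
std = np.array([10.0, 50.0, 5.0, 20.0, 40.0, 5.0])
in_dist = np.array([True, True, True, True, True, False])

chosen = select_phase2_candidates(mean, std, in_dist, top_k=2)
print("run FEA on candidates:", chosen)
print("V5 failure case:", validation_action(predicted=280.0, actual=376.0))
```

Note that the candidate with the lowest predicted WS (index 5) is dropped because it is out of distribution, and the second lowest (index 1) because the ensemble disagrees on it. That is the intended behavior: the V5 failure was exactly a case of trusting such a point.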

---

## Key Differences from V5

| Aspect | V5 (Failed) | SAT (Proposed) |
|--------|-------------|----------------|
| **Model** | Single MLP | Ensemble of 5 MLPs |
| **Uncertainty** | None | Ensemble disagreement + OOD detection |
| **L-BFGS** | Trust blindly | Trust-region, validate every solution with FEA |
| **Extrapolation** | Accept | Reject or penalize |
| **Active learning** | No | Yes - prioritize uncertain regions |
| **Validation** | After L-BFGS | Throughout |

---

## Implementation Checklist

1. [ ] `EnsembleSurrogate` class with N=5 MLPs
2. [ ] `OODDetector` with KNN + z-score checks
3. [ ] `acquisition_score()` balancing exploitation/exploration
4. [ ] Trust-region L-BFGS with OOD penalties
5. [ ] Automatic retraining when new FEA data arrives
6. [ ] Logging of prediction errors to track surrogate quality
7. [ ] Early abort if L-BFGS predictions consistently wrong

---

## Expected Behavior

**In well-sampled regions:**
- Ensemble agrees → Low uncertainty → Trust predictions
- L-BFGS finds valid optima → FEA confirms → Success

**In poorly-sampled regions:**
- Ensemble disagrees → High uncertainty → Run FEA instead
- L-BFGS penalized → Stays in trusted zone → No fake optima

**At distribution boundaries:**
- OOD detector flags → Reject predictions
- Acquisition prioritizes → Active learning fills gaps

---

## Metrics to Track

1. **Surrogate R² on validation set** - Target > 0.95 before L-BFGS
2. **Prediction error histogram** - Should be centered at 0
3. **OOD rejection rate** - How often we refuse to predict
4. **Ensemble disagreement** - Average std across predictions
5. **L-BFGS success rate** - % of L-BFGS solutions that validate
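
A minimal sketch of how the first four metrics could be computed from logged FEA validations is shown below. The function names, the argument layout, and the sample numbers are assumptions for illustration; only the metric definitions come from the list above.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def surrogate_report(y_fea, y_pred, y_std, n_rejected, n_queried):
    """Summarize surrogate quality from validated points and OOD counts."""
    y_fea = np.asarray(y_fea, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel_err = (y_pred - y_fea) / np.abs(y_fea)  # signed: negative = underprediction
    return {
        "r2": r_squared(y_fea, y_pred),
        "mean_rel_error": float(rel_err.mean()),  # should sit near 0
        "ood_rejection_rate": n_rejected / n_queried,
        "mean_ensemble_std": float(np.mean(y_std)),
    }

# Two hypothetical validated points; the first is the V5 failure case
report = surrogate_report(y_fea=[376.0, 300.0], y_pred=[280.0, 306.0],
                          y_std=[12.0, 4.0], n_rejected=30, n_queried=1000)
for key, value in report.items():
    print(f"{key}: {value:.4f}")
```

A strongly negative `mean_rel_error` means the surrogate is systematically optimistic for a minimization problem, which is the pattern behind the "fake optimum" in V5.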

---

## When to Use SAT vs Pure TPE

| Scenario | Recommendation |
|----------|----------------|
| < 100 existing samples | Pure TPE (not enough for good surrogate) |
| 100-500 samples | SAT Phase 1-2 only (no L-BFGS) |
| > 500 samples | Full SAT with L-BFGS refinement |
| High-dimensional (>20 params) | Pure TPE (curse of dimensionality) |
| Noisy FEA | Pure TPE (surrogates struggle with noise) |
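
The table can be encoded as a small decision helper, sketched below. Two points are assumptions because the table does not settle them: the "Pure TPE" rows are given precedence over the sample-count rows, and exactly 500 samples is treated as part of the 100-500 band. The function name and return labels are hypothetical.

```python
def recommend_strategy(n_samples, n_params, noisy_fea=False):
    """Map study characteristics to a strategy, following the table above."""
    # Assumed precedence: any Pure TPE condition overrides the sample count
    if noisy_fea or n_params > 20 or n_samples < 100:
        return "pure_tpe"
    if n_samples <= 500:
        return "sat_phase_1_2"  # no L-BFGS refinement
    return "full_sat"

print(recommend_strategy(n_samples=50, n_params=8))
print(recommend_strategy(n_samples=300, n_params=8))
print(recommend_strategy(n_samples=800, n_params=8))
print(recommend_strategy(n_samples=800, n_params=25))
```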

---

## References

- Gaussian Process literature on uncertainty quantification
- Deep Ensembles: Lakshminarayanan et al. (2017)
- Bayesian Optimization with Expected Improvement
- Trust-region methods for constrained optimization

---

*The key insight: A surrogate that knows when it doesn't know is far more valuable than one that's confidently wrong.*