# CMA-ES Explained for Engineers

**CMA-ES** = **Covariance Matrix Adaptation Evolution Strategy**

A derivative-free optimization algorithm ideal for:

- Local refinement around known good solutions
- 4-10 dimensional problems
- Smooth, continuous objective functions
- Problems where gradient information is unavailable (like FEA)

---
## The Core Idea

Imagine searching for the lowest point in a hilly landscape while blindfolded:

1. **Throw darts** around your current best guess
2. **Observe which darts land lower** (better objective)
3. **Learn the shape of the valley** from those results
4. **Adjust future throws** to follow the valley's direction

---
## Key Components

```
┌─────────────────────────────────────────────────────────────┐
│                    CMA-ES Components                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. MEAN (μ)            - Current best guess location       │
│     • Moves toward better solutions each generation         │
│                                                             │
│  2. STEP SIZE (σ)       - How far to throw darts            │
│     • Adapts: shrinks when close, grows when exploring      │
│     • sigma0=0.3 means 30% of parameter range initially     │
│                                                             │
│  3. COVARIANCE MATRIX (C) - Shape of the search cloud       │
│     • Learns parameter correlations                         │
│     • Stretches search along promising directions           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
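As a rough sketch, the three components can be set up like this. This is a hypothetical helper (the normalization scheme and the example bounds are our assumptions, not part of any library API), but it makes concrete why a single `sigma0=0.3` can mean "30% of each parameter's range":

```python
def init_cma_state(baseline, bounds, sigma0=0.3):
    """Initialize the three CMA-ES components for a baseline design.

    Hypothetical sketch: parameters are normalized to [0, 1] using their
    bounds, so one global sigma0=0.3 really is "30% of each range".
    """
    # MEAN: start at the known-good baseline (normalized to [0, 1])
    mean = [(x - lo) / (hi - lo) for x, (lo, hi) in zip(baseline, bounds)]
    # STEP SIZE: one global scale for the whole search cloud
    sigma = sigma0
    # COVARIANCE: identity -- no correlations assumed until they are learned
    n = len(baseline)
    C = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return mean, sigma, C

# Two of the document's baseline values, with illustrative (assumed) bounds
mean, sigma, C = init_cma_state([62.75, 4.43], [(40.0, 90.0), (0.0, 15.0)])
```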
---

## Visual: How the Search Evolves

```
Generation 1 (round search):        Generation 10 (learned shape):

      x x x                                            x
    x       x                                      x
    x   ●   x         ──────►                  x ● x
    x       x                              x
      x x x                            x

● = mean (center)                   Ellipse aligned with
x = samples                         the valley direction
```

CMA-ES learns that certain parameter combinations work well together and stretches its search cloud in that direction.

---
## The Algorithm (Simplified)

```python
def cma_es_generation():
    # 1. SAMPLE: generate λ candidates around the mean
    for i in range(population_size):
        candidates[i] = mean + sigma * sample_from_gaussian(covariance=C)

    # 2. EVALUATE: run FEA for each candidate
    fitness = [run_simulation(c) for c in candidates]

    # 3. SELECT: keep the best μ candidates
    selected = top_k(candidates, by=fitness, k=mu)

    # 4. UPDATE MEAN: move toward the best solutions
    new_mean = weighted_average(selected)

    # 5. UPDATE COVARIANCE: learn parameter correlations
    C = update_covariance(C, selected, mean, new_mean)

    # 6. UPDATE STEP SIZE: adapt the exploration range
    sigma = adapt_step_size(sigma, evolution_path)
```
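The block above is pseudocode. A minimal runnable sketch of one generation, covering steps 1-4 only (identity covariance and fixed step size, so no covariance or step-size adaptation -- exactly the parts CMA-ES adds on top), could look like:

```python
import random

def es_generation(mean, sigma, objective, pop_size=8, mu=4):
    """One generation of a simplified (μ/μ, λ) evolution strategy.

    Sample, evaluate, select, and update the mean; the search cloud
    stays round because steps 5-6 are omitted for brevity.
    """
    # 1. SAMPLE: isotropic Gaussian perturbations of the mean
    candidates = [[m + sigma * random.gauss(0.0, 1.0) for m in mean]
                  for _ in range(pop_size)]
    # 2. EVALUATE each candidate (a cheap stand-in for the FEA run)
    fitness = [objective(c) for c in candidates]
    # 3. SELECT the best mu candidates (minimization)
    ranked = sorted(zip(fitness, candidates), key=lambda fc: fc[0])
    selected = [c for _, c in ranked[:mu]]
    # 4. UPDATE MEAN: move to the average of the survivors
    return [sum(col) / mu for col in zip(*selected)]

# Refine toward the minimum of the sphere function f(x) = Σ x_i²
random.seed(0)
sphere = lambda x: sum(v * v for v in x)
mean = [3.0, -2.0]
for _ in range(100):
    mean = es_generation(mean, sigma=0.3, objective=sphere)
```

Even without covariance learning, the mean walks downhill toward the optimum; CMA-ES accelerates this by reshaping the sampling cloud as it goes.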
---

## The Covariance Matrix Magic

Consider 4 design variables:

```
Covariance Matrix C (4x4):

        var1   var2   var3   var4
var1  [  1.0    0.3   -0.5    0.1 ]
var2  [  0.3    1.0    0.2   -0.2 ]
var3  [ -0.5    0.2    1.0    0.4 ]
var4  [  0.1   -0.2    0.4    1.0 ]
```

**Reading the matrix:**

- **Diagonal (1.0)**: variance of each parameter
- **Off-diagonal**: correlations between parameters
  - **Positive (0.3)**: when var1 increases, var2 tends to increase too
  - **Negative (-0.5)**: when var1 increases, var3 tends to decrease

CMA-ES **learns these correlations automatically** from simulation results.
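To see what sampling from such a matrix means in practice, here is a self-contained sketch that draws correlated candidates through a Cholesky factor -- the standard way to turn independent Gaussian noise into correlated samples. The helper names are ours, not from any CMA-ES library:

```python
import math
import random

def cholesky(C):
    """Return lower-triangular L with L·Lᵀ = C (C must be positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

# The 4x4 covariance matrix from the text
C = [[ 1.0,  0.3, -0.5,  0.1],
     [ 0.3,  1.0,  0.2, -0.2],
     [-0.5,  0.2,  1.0,  0.4],
     [ 0.1, -0.2,  0.4,  1.0]]
L = cholesky(C)

def sample(mean, sigma, L):
    """Draw one candidate: mean + sigma * (L @ z), with z ~ N(0, I)."""
    z = [random.gauss(0.0, 1.0) for _ in L]
    return [m + sigma * sum(L[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

# Empirically, var1 and var3 should show the -0.5 covariance from C
random.seed(1)
draws = [sample([0.0] * 4, 1.0, L) for _ in range(20000)]
m1 = sum(x[0] for x in draws) / len(draws)
m3 = sum(x[2] for x in draws) / len(draws)
cov13 = sum((x[0] - m1) * (x[2] - m3) for x in draws) / len(draws)
```

The measured `cov13` comes out close to the -0.5 entry of `C`: increasing var1 and decreasing var3 together is exactly the direction the cloud is stretched in.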
---

## CMA-ES vs TPE

| Property | TPE | CMA-ES |
|----------|-----|--------|
| **Best for** | Global exploration | Local refinement |
| **Starting point** | Random | Known baseline |
| **Correlation learning** | None (independent) | Automatic |
| **Step size** | Fixed ranges | Adaptive |
| **Dimensionality** | Good for high-D | Best for 4-10D |
| **Sample efficiency** | Good | Excellent (locally) |

---
## Optuna Configuration

```python
import optuna
from optuna.samplers import CmaEsSampler

# Baseline values (starting point)
x0 = {
    'whiffle_min': 62.75,
    'whiffle_outer_to_vertical': 75.89,
    'whiffle_triangle_closeness': 65.65,
    'blank_backface_angle': 4.43,
}

sampler = CmaEsSampler(
    x0=x0,                    # Center of the initial distribution
    sigma0=0.3,               # Initial step size (30% of range)
    seed=42,                  # Reproducibility
    restart_strategy='ipop',  # Increase population on restart
)

study = optuna.create_study(sampler=sampler, direction="minimize")

# CRITICAL: enqueue the baseline as trial 0!
# x0 only sets the CENTER of the search; it doesn't evaluate the baseline.
study.enqueue_trial(x0)

study.optimize(objective, n_trials=200)
```
---

## Common Pitfalls

### 1. Not Evaluating the Baseline

**Problem**: CMA-ES samples AROUND x0 but never evaluates x0 itself.

**Solution**: Always enqueue the baseline:

```python
if len(study.trials) == 0:
    study.enqueue_trial(x0)
```

### 2. sigma0 Too Large or Too Small

| sigma0 | Effect |
|--------|--------|
| **Too large (>0.5)** | Explores too far, misses the local optimum |
| **Too small (<0.1)** | Gets stuck, slow convergence |
| **Recommended (0.2-0.3)** | Good balance for refinement |
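Why an in-between sigma works is easiest to see with the classic 1/5th success rule (Rechenberg) -- a much simpler ancestor of CMA-ES's cumulative step-size adaptation, sketched here purely for illustration:

```python
def one_fifth_rule(sigma, success_rate, factor=1.5):
    """Rechenberg's 1/5th success rule -- a simple stand-in for CMA-ES's
    cumulative step-size adaptation.

    If more than ~1/5 of recent offspring beat their parent, steps are
    too timid: grow sigma. If fewer succeed, shrink it and search locally.
    """
    if success_rate > 0.2:
        return sigma * factor   # progress is easy: explore further
    if success_rate < 0.2:
        return sigma / factor   # mostly failing: tighten the search
    return sigma

# A too-large sigma shrinks after a run of failures...
sigma_after_failures = one_fifth_rule(0.5, success_rate=0.05)
# ...and a too-small sigma grows when almost every step succeeds.
sigma_after_successes = one_fifth_rule(0.05, success_rate=0.6)
```

Either extreme self-corrects over a few generations, which is why the starting value mainly needs to be in the right ballpark, not exact.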
### 3. Wrong Problem Type

CMA-ES struggles with:

- Discrete/categorical variables
- Very high dimensions (>20)
- Multi-modal landscapes (use TPE first)
- Noisy objectives (add regularization)

---
## When to Use CMA-ES in Atomizer

| Scenario | Use CMA-ES? |
|----------|-------------|
| First exploration of design space | No, use TPE |
| Refining around known good design | **Yes** |
| 4-10 continuous variables | **Yes** |
| >15 variables | No, use TPE or NSGA-II |
| Need to learn variable correlations | **Yes** |
| Multi-objective optimization | No, use NSGA-II |

---
## References

- Hansen, N. (2016). *The CMA Evolution Strategy: A Tutorial*
- Optuna CmaEsSampler: https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.CmaEsSampler.html
- cmaes Python package: https://github.com/CyberAgentAILab/cmaes

---

*Created: 2025-12-19*
*Atomizer Framework*