# CMA-ES Explained for Engineers

**CMA-ES** = **Covariance Matrix Adaptation Evolution Strategy**

A derivative-free optimization algorithm ideal for:

- Local refinement around known good solutions
- 4-10 dimensional problems
- Smooth, continuous objective functions
- Problems where gradient information is unavailable (like FEA)

---
## The Core Idea

Imagine searching for the lowest point in a hilly landscape while blindfolded:

1. **Throw darts** around your current best guess
2. **Observe which darts land lower** (better objective)
3. **Learn the shape of the valley** from those results
4. **Adjust future throws** to follow the valley's direction

---
## Key Components

```
┌─────────────────────────────────────────────────────────────┐
│                    CMA-ES Components                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. MEAN (μ)            - Current best guess location       │
│     • Moves toward better solutions each generation         │
│                                                             │
│  2. STEP SIZE (σ)       - How far to throw darts            │
│     • Adapts: shrinks when close, grows when exploring      │
│     • sigma0=0.3 means 30% of parameter range initially     │
│                                                             │
│  3. COVARIANCE MATRIX (C) - Shape of the search cloud       │
│     • Learns parameter correlations                         │
│     • Stretches search along promising directions           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
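As a rough sketch, the three components can be set up like this. This is a hypothetical helper (the normalization scheme and the example bounds are our assumptions, not part of any library API), but it makes concrete why a single `sigma0=0.3` can mean "30% of each parameter's range":

```python
def init_cma_state(baseline, bounds, sigma0=0.3):
    """Initialize the three CMA-ES components for a baseline design.

    Hypothetical sketch: parameters are normalized to [0, 1] using their
    bounds, so one global sigma0=0.3 really is "30% of each range".
    """
    # MEAN: start at the known-good baseline (normalized to [0, 1])
    mean = [(x - lo) / (hi - lo) for x, (lo, hi) in zip(baseline, bounds)]
    # STEP SIZE: one global scale for the whole search cloud
    sigma = sigma0
    # COVARIANCE: identity -- no correlations assumed until they are learned
    n = len(baseline)
    C = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return mean, sigma, C

# Two of the document's baseline values, with illustrative (assumed) bounds
mean, sigma, C = init_cma_state([62.75, 4.43], [(40.0, 90.0), (0.0, 15.0)])
```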
---

## Visual: How the Search Evolves

```
Generation 1 (round search):        Generation 10 (learned shape):

      x x x                                            x
    x       x                                      x
    x   ●   x         ──────►                  x ● x
    x       x                              x
      x x x                            x

● = mean (center)                   Ellipse aligned with
x = samples                         the valley direction
```

CMA-ES learns that certain parameter combinations work well together and stretches its search cloud in that direction.

---
## The Algorithm (Simplified)

```python
def cma_es_generation():
    # 1. SAMPLE: generate λ candidates around the mean
    for i in range(population_size):
        candidates[i] = mean + sigma * sample_from_gaussian(covariance=C)

    # 2. EVALUATE: run FEA for each candidate
    fitness = [run_simulation(c) for c in candidates]

    # 3. SELECT: keep the best μ candidates
    selected = top_k(candidates, by=fitness, k=mu)

    # 4. UPDATE MEAN: move toward the best solutions
    new_mean = weighted_average(selected)

    # 5. UPDATE COVARIANCE: learn parameter correlations
    C = update_covariance(C, selected, mean, new_mean)

    # 6. UPDATE STEP SIZE: adapt the exploration range
    sigma = adapt_step_size(sigma, evolution_path)
```
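The block above is pseudocode. A minimal runnable sketch of one generation, covering steps 1-4 only (identity covariance and fixed step size, so no covariance or step-size adaptation -- exactly the parts CMA-ES adds on top), could look like:

```python
import random

def es_generation(mean, sigma, objective, pop_size=8, mu=4):
    """One generation of a simplified (μ/μ, λ) evolution strategy.

    Sample, evaluate, select, and update the mean; the search cloud
    stays round because steps 5-6 are omitted for brevity.
    """
    # 1. SAMPLE: isotropic Gaussian perturbations of the mean
    candidates = [[m + sigma * random.gauss(0.0, 1.0) for m in mean]
                  for _ in range(pop_size)]
    # 2. EVALUATE each candidate (a cheap stand-in for the FEA run)
    fitness = [objective(c) for c in candidates]
    # 3. SELECT the best mu candidates (minimization)
    ranked = sorted(zip(fitness, candidates), key=lambda fc: fc[0])
    selected = [c for _, c in ranked[:mu]]
    # 4. UPDATE MEAN: move to the average of the survivors
    return [sum(col) / mu for col in zip(*selected)]

# Refine toward the minimum of the sphere function f(x) = Σ x_i²
random.seed(0)
sphere = lambda x: sum(v * v for v in x)
mean = [3.0, -2.0]
for _ in range(100):
    mean = es_generation(mean, sigma=0.3, objective=sphere)
```

Even without covariance learning, the mean walks downhill toward the optimum; CMA-ES accelerates this by reshaping the sampling cloud as it goes.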
---

## The Covariance Matrix Magic

Consider 4 design variables:

```
Covariance Matrix C (4x4):

        var1   var2   var3   var4
var1  [  1.0    0.3   -0.5    0.1 ]
var2  [  0.3    1.0    0.2   -0.2 ]
var3  [ -0.5    0.2    1.0    0.4 ]
var4  [  0.1   -0.2    0.4    1.0 ]
```

**Reading the matrix:**

- **Diagonal (1.0)**: variance of each parameter
- **Off-diagonal**: correlations between parameters
  - **Positive (0.3)**: when var1 increases, var2 tends to increase too
  - **Negative (-0.5)**: when var1 increases, var3 tends to decrease

CMA-ES **learns these correlations automatically** from simulation results.
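To see what sampling from such a matrix means in practice, here is a self-contained sketch that draws correlated candidates through a Cholesky factor -- the standard way to turn independent Gaussian noise into correlated samples. The helper names are ours, not from any CMA-ES library:

```python
import math
import random

def cholesky(C):
    """Return lower-triangular L with L·Lᵀ = C (C must be positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

# The 4x4 covariance matrix from the text
C = [[ 1.0,  0.3, -0.5,  0.1],
     [ 0.3,  1.0,  0.2, -0.2],
     [-0.5,  0.2,  1.0,  0.4],
     [ 0.1, -0.2,  0.4,  1.0]]
L = cholesky(C)

def sample(mean, sigma, L):
    """Draw one candidate: mean + sigma * (L @ z), with z ~ N(0, I)."""
    z = [random.gauss(0.0, 1.0) for _ in L]
    return [m + sigma * sum(L[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

# Empirically, var1 and var3 should show the -0.5 covariance from C
random.seed(1)
draws = [sample([0.0] * 4, 1.0, L) for _ in range(20000)]
m1 = sum(x[0] for x in draws) / len(draws)
m3 = sum(x[2] for x in draws) / len(draws)
cov13 = sum((x[0] - m1) * (x[2] - m3) for x in draws) / len(draws)
```

The measured `cov13` comes out close to the -0.5 entry of `C`: increasing var1 and decreasing var3 together is exactly the direction the cloud is stretched in.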
---

## CMA-ES vs TPE

| Property | TPE | CMA-ES |
|----------|-----|--------|
| **Best for** | Global exploration | Local refinement |
| **Starting point** | Random | Known baseline |
| **Correlation learning** | None (independent) | Automatic |
| **Step size** | Fixed ranges | Adaptive |
| **Dimensionality** | Good for high-D | Best for 4-10D |
| **Sample efficiency** | Good | Excellent (locally) |

---
## Optuna Configuration

```python
import optuna
from optuna.samplers import CmaEsSampler

# Baseline values (starting point)
x0 = {
    'whiffle_min': 62.75,
    'whiffle_outer_to_vertical': 75.89,
    'whiffle_triangle_closeness': 65.65,
    'blank_backface_angle': 4.43,
}

sampler = CmaEsSampler(
    x0=x0,                    # Center of the initial distribution
    sigma0=0.3,               # Initial step size (30% of range)
    seed=42,                  # Reproducibility
    restart_strategy='ipop',  # Increase population on restart
)

study = optuna.create_study(sampler=sampler, direction="minimize")

# CRITICAL: enqueue the baseline as trial 0!
# x0 only sets the CENTER of the search; it doesn't evaluate the baseline.
study.enqueue_trial(x0)

study.optimize(objective, n_trials=200)
```
---

## Common Pitfalls

### 1. Not Evaluating the Baseline

**Problem**: CMA-ES samples AROUND x0 but never evaluates x0 itself.

**Solution**: Always enqueue the baseline:

```python
if len(study.trials) == 0:
    study.enqueue_trial(x0)
```

### 2. sigma0 Too Large or Too Small

| sigma0 | Effect |
|--------|--------|
| **Too large (>0.5)** | Explores too far, misses the local optimum |
| **Too small (<0.1)** | Gets stuck, slow convergence |
| **Recommended (0.2-0.3)** | Good balance for refinement |
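Why an in-between sigma works is easiest to see with the classic 1/5th success rule (Rechenberg) -- a much simpler ancestor of CMA-ES's cumulative step-size adaptation, sketched here purely for illustration:

```python
def one_fifth_rule(sigma, success_rate, factor=1.5):
    """Rechenberg's 1/5th success rule -- a simple stand-in for CMA-ES's
    cumulative step-size adaptation.

    If more than ~1/5 of recent offspring beat their parent, steps are
    too timid: grow sigma. If fewer succeed, shrink it and search locally.
    """
    if success_rate > 0.2:
        return sigma * factor   # progress is easy: explore further
    if success_rate < 0.2:
        return sigma / factor   # mostly failing: tighten the search
    return sigma

# A too-large sigma shrinks after a run of failures...
sigma_after_failures = one_fifth_rule(0.5, success_rate=0.05)
# ...and a too-small sigma grows when almost every step succeeds.
sigma_after_successes = one_fifth_rule(0.05, success_rate=0.6)
```

Either extreme self-corrects over a few generations, which is why the starting value mainly needs to be in the right ballpark, not exact.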
### 3. Wrong Problem Type

CMA-ES struggles with:

- Discrete/categorical variables
- Very high dimensions (>20)
- Multi-modal landscapes (use TPE first)
- Noisy objectives (add regularization)

---
## When to Use CMA-ES in Atomizer

| Scenario | Use CMA-ES? |
|----------|-------------|
| First exploration of design space | No, use TPE |
| Refining around known good design | **Yes** |
| 4-10 continuous variables | **Yes** |
| >15 variables | No, use TPE or NSGA-II |
| Need to learn variable correlations | **Yes** |
| Multi-objective optimization | No, use NSGA-II |

---
## References

- Hansen, N. (2016). *The CMA Evolution Strategy: A Tutorial*
- Optuna CmaEsSampler: https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.CmaEsSampler.html
- cmaes Python package: https://github.com/CyberAgentAILab/cmaes

---

*Created: 2025-12-19*
*Atomizer Framework*