# CMA-ES Explained for Engineers

**CMA-ES** = **Covariance Matrix Adaptation Evolution Strategy**

A derivative-free optimization algorithm ideal for:

- Local refinement around known good solutions
- 4-10 dimensional problems
- Smooth, continuous objective functions
- Problems where gradient information is unavailable (like FEA)

---

## The Core Idea

Imagine searching for the lowest point in a hilly landscape while blindfolded:

1. **Throw darts** around your current best guess
2. **Observe which darts land lower** (better objective)
3. **Learn the shape of the valley** from those results
4. **Adjust future throws** to follow the valley's direction

---

## Key Components

```
┌───────────────────────────────────────────────────────────┐
│                     CMA-ES Components                     │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  1. MEAN (μ) - Current best guess location                │
│     • Moves toward better solutions each generation       │
│                                                           │
│  2. STEP SIZE (σ) - How far to throw darts                │
│     • Adapts: shrinks when close, grows when exploring    │
│     • sigma0=0.3 means 30% of parameter range initially   │
│                                                           │
│  3. COVARIANCE MATRIX (C) - Shape of the search cloud     │
│     • Learns parameter correlations                       │
│     • Stretches search along promising directions         │
│                                                           │
└───────────────────────────────────────────────────────────┘
```

---

## Visual: How the Search Evolves

```
Generation 1 (Round search):     Generation 10 (Learned shape):

         x x x                                     x x
       x   ●   x       ──────►               x ● x
         x x x                         x x
           x

  ● = mean (center)              Ellipse aligned with
  x = samples                    the valley direction
```

CMA-ES learns that certain parameter combinations work well together and stretches its search cloud in that direction.

---

## The Algorithm (Simplified)

```python
def cma_es_generation():
    # 1. SAMPLE: Generate λ candidates around the mean
    for i in range(population_size):
        candidates[i] = mean + sigma * sample_from_gaussian(covariance=C)

    # 2. EVALUATE: Run FEA for each candidate
    for candidate in candidates:
        fitness[candidate] = run_simulation(candidate)

    # 3. SELECT: Keep the best μ candidates
    selected = top_k(candidates, by=fitness, k=mu)

    # 4. UPDATE MEAN: Move toward the best solutions
    new_mean = weighted_average(selected)

    # 5. UPDATE COVARIANCE: Learn parameter correlations
    C = update_covariance(C, selected, mean, new_mean)

    # 6. UPDATE STEP SIZE: Adapt exploration range
    sigma = adapt_step_size(sigma, evolution_path)
```

---

## The Covariance Matrix Magic

Consider 4 design variables:

```
Covariance Matrix C (4x4):

        var1   var2   var3   var4
var1 [  1.0    0.3   -0.5    0.1 ]
var2 [  0.3    1.0    0.2   -0.2 ]
var3 [ -0.5    0.2    1.0    0.4 ]
var4 [  0.1   -0.2    0.4    1.0 ]
```

**Reading the matrix:**

- **Diagonal (1.0)**: Variance of each parameter
- **Off-diagonal**: Correlations between parameters
- **Positive (0.3)**: When var1 increases, var2 should increase
- **Negative (-0.5)**: When var1 increases, var3 should decrease

CMA-ES **learns these correlations automatically** from simulation results!

---

## CMA-ES vs TPE

| Property | TPE | CMA-ES |
|----------|-----|--------|
| **Best for** | Global exploration | Local refinement |
| **Starting point** | Random | Known baseline |
| **Correlation learning** | None (independent) | Automatic |
| **Step size** | Fixed ranges | Adaptive |
| **Dimensionality** | Good for high-D | Best for 4-10D |
| **Sample efficiency** | Good | Excellent (locally) |

---

## Optuna Configuration

```python
import optuna
from optuna.samplers import CmaEsSampler

# Baseline values (starting point)
x0 = {
    'whiffle_min': 62.75,
    'whiffle_outer_to_vertical': 75.89,
    'whiffle_triangle_closeness': 65.65,
    'blank_backface_angle': 4.43
}

sampler = CmaEsSampler(
    x0=x0,                      # Center of initial distribution
    sigma0=0.3,                 # Initial step size (30% of range)
    seed=42,                    # Reproducibility
    restart_strategy='ipop'     # Increase population on restart
)

study = optuna.create_study(sampler=sampler, direction="minimize")

# CRITICAL: Enqueue baseline as trial 0!
# x0 only sets the CENTER, it doesn't evaluate the baseline
study.enqueue_trial(x0)

study.optimize(objective, n_trials=200)
```

---

## Common Pitfalls

### 1. Not Evaluating the Baseline

**Problem**: CMA-ES samples AROUND x0, but doesn't evaluate x0 itself.

**Solution**: Always enqueue the baseline:

```python
if len(study.trials) == 0:
    study.enqueue_trial(x0)
```

### 2. sigma0 Too Large or Too Small

| sigma0 | Effect |
|--------|--------|
| **Too large (>0.5)** | Explores too far, misses local optimum |
| **Too small (<0.1)** | Gets stuck, slow convergence |
| **Recommended (0.2-0.3)** | Good balance for refinement |

### 3. Wrong Problem Type

CMA-ES struggles with:

- Discrete/categorical variables
- Very high dimensions (>20)
- Multi-modal landscapes (use TPE first)
- Noisy objectives (add regularization)

---

## When to Use CMA-ES in Atomizer

| Scenario | Use CMA-ES? |
|----------|-------------|
| First exploration of design space | No, use TPE |
| Refining around known good design | **Yes** |
| 4-10 continuous variables | **Yes** |
| >15 variables | No, use TPE or NSGA-II |
| Need to learn variable correlations | **Yes** |
| Multi-objective optimization | No, use NSGA-II |

---

## References

- Hansen, N. (2016). *The CMA Evolution Strategy: A Tutorial*
- Optuna `CmaEsSampler`: https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.CmaEsSampler.html
- `cmaes` Python package: https://github.com/CyberAgentAILab/cmaes

---

*Created: 2025-12-19*
*Atomizer Framework*
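
---

## Appendix A: Sampling from the Example Covariance Matrix

To see what the 4x4 matrix in "The Covariance Matrix Magic" actually implies, you can draw samples from a Gaussian with exactly that C and measure the correlations in the resulting cloud. This is only an illustration using the document's example numbers, not values learned from any real optimization run:

```python
import numpy as np

# The 4x4 example matrix from the covariance section (positive definite)
C = np.array([
    [ 1.0,  0.3, -0.5,  0.1],
    [ 0.3,  1.0,  0.2, -0.2],
    [-0.5,  0.2,  1.0,  0.4],
    [ 0.1, -0.2,  0.4,  1.0],
])

rng = np.random.default_rng(42)
samples = rng.multivariate_normal(mean=np.zeros(4), cov=C, size=20_000)

# Empirical correlations recover the matrix entries: var1 and var2 tend to
# move together (+0.3), while var1 and var3 move oppositely (-0.5)
corr = np.corrcoef(samples, rowvar=False)
print(corr.round(2))
```

With 20,000 samples the empirical correlations land within a couple of hundredths of the matrix entries, which is exactly the "stretched search cloud" behaviour described above.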
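
---

## Appendix B: The Generation Loop as Runnable Code

The six steps in "The Algorithm (Simplified)" can be made concrete with plain NumPy. This is a toy sketch, not real CMA-ES: it keeps weighted recombination and a rank-μ covariance update, but replaces the evolution-path machinery (CSA) with a crude geometric step-size decay, and the 4-D quadratic objective, target point, and all constants are illustrative assumptions:

```python
import numpy as np

def toy_cma_generation_loop(objective, x0, sigma0=0.3, popsize=16,
                            n_gen=100, seed=0):
    """Toy CMA-ES-style loop: sample -> evaluate -> select -> update.
    Omits evolution paths; the step size simply decays geometrically."""
    rng = np.random.default_rng(seed)
    n = len(x0)
    mean = np.asarray(x0, dtype=float)
    sigma = sigma0
    C = np.eye(n)                      # search-distribution covariance
    mu = popsize // 2                  # number of selected parents
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                       # decreasing recombination weights

    for _ in range(n_gen):
        # 1. SAMPLE: lambda candidates around the mean
        X = rng.multivariate_normal(mean, sigma**2 * C, size=popsize)
        # 2. EVALUATE each candidate
        f = np.array([objective(x) for x in X])
        # 3. SELECT: keep the best mu candidates
        sel = X[np.argsort(f)[:mu]]
        # 4. UPDATE MEAN: weighted average of the selected points
        old_mean = mean
        mean = w @ sel
        # 5. UPDATE COVARIANCE: rank-mu update from the selected steps
        Y = (sel - old_mean) / sigma
        c_mu = 0.1                     # covariance learning rate
        C = (1 - c_mu) * C + c_mu * (Y.T * w) @ Y
        # 6. UPDATE STEP SIZE: crude decay instead of real CSA
        sigma *= 0.97
    return mean

# Minimize distance to a made-up "optimal" design in normalized [0, 1] space
target = np.array([0.63, 0.76, 0.66, 0.04])
best = toy_cma_generation_loop(lambda x: np.sum((x - target) ** 2),
                               x0=np.zeros(4))
```

The mean walks from the origin to the target in a few dozen generations; the `(Y.T * w) @ Y` term is the weighted sum of outer products that stretches C along the directions the selected candidates came from.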
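
---

## Appendix C: Feeling Step-Size Adaptation

The "shrinks when close, grows when exploring" behaviour of σ can be felt with a much older and simpler mechanism: the classic 1/5th success rule for a (1+1) evolution strategy. This is *not* what CMA-ES uses (CMA-ES adapts σ via cumulative step-size adaptation over an evolution path), but it demonstrates the same feedback idea; the objective and constants here are illustrative:

```python
import numpy as np

def one_plus_one_es(objective, x0, sigma=0.5, n_iter=300, seed=7):
    """(1+1)-ES with the classic 1/5th success rule for step-size control.
    On success the step grows; on failure it shrinks, balanced so that a
    ~20% success rate leaves sigma unchanged on average."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = objective(x)
    grow = 1.5                       # factor applied on success
    shrink = grow ** -0.25           # factor applied on failure
    for _ in range(n_iter):
        y = x + sigma * rng.standard_normal(x.shape)
        fy = objective(y)
        if fy < fx:                  # better point found: accept, widen search
            x, fx = y, fy
            sigma *= grow
        else:                        # no improvement: tighten the search
            sigma *= shrink
    return x, sigma

x_best, sigma_final = one_plus_one_es(lambda v: np.sum(v ** 2),
                                      np.full(4, 3.0))
```

While far from the optimum, successes are frequent and σ stays large; near the optimum most steps fail and σ collapses, which is the convergence behaviour the Key Components box describes.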