Optimization Strategy — Hydrotech Beam DoE & Landscape Mapping

Study: 01_doe_landscape
Project: Hydrotech Beam Structural Optimization
Author: Optimizer Agent
Date: 2025-02-09 (updated 2026-02-10 — introspection corrections)
Status: APPROVED WITH CONDITIONS — Auditor review 2026-02-10, blockers resolved inline
References: BREAKDOWN.md, DECISIONS.md, CONTEXT.md


1. Problem Formulation

1.1 Objective

$$\min_{x} \quad f(x) = \text{mass}(x) \quad [\text{kg}]$$

Single-objective minimization of total beam mass. This aligns with DEC-HB-001 (approved by Tech Lead, pending CEO confirmation).

1.2 Constraints

| ID | Constraint | Operator | Limit | Units | Source |
|---|---|---|---|---|---|
| g₁ | Tip displacement | ≤ | 10.0 | mm | NX Nastran SOL 101: displacement sensor at beam tip |
| g₂ | Max von Mises stress | ≤ | 130.0 | MPa | NX Nastran SOL 101: max elemental nodal VM stress |

Both are hard constraints — no trade-off or relaxation without CEO approval.

1.3 Design Variables

| ID | NX Expression | Type | Lower | Upper | Baseline | Units | Notes |
|---|---|---|---|---|---|---|---|
| DV1 | beam_half_core_thickness | Continuous | 10 | 40 | 25.162 | mm | Core half-thickness; stiffness scales ~quadratically via sandwich effect |
| DV2 | beam_face_thickness | Continuous | 10 | 40 | 21.504 | mm | Face sheet thickness; primary bending stiffness contributor |
| DV3 | holes_diameter | Continuous | 150 | 450 | 300 | mm | Lightening hole diameter; mass ∝ d² reduction |
| DV4 | hole_count (→ Pattern_p7) | Integer | 5 | 15 | 10 | | Number of lightening holes; 11 discrete levels |

Total design space: 3 continuous × 1 integer (11 levels) = effectively 3D continuous × 11 slices.

1.4 Integer Handling

Per DEC-HB-003, hole_count is treated as a true integer throughout:

  • Phase 1 (LHS): Generate continuous LHS, round DV4 to nearest integer. Use stratified integer sampling to ensure coverage across all 11 levels.
  • Phase 2 (TPE): Optuna IntDistribution(5, 15) — native integer support, no rounding hacks.
  • NX rebuild: The model requires integer hole count. Non-integer values will cause geometry failures.
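The Phase 1 sampling idea can be sketched in plain Python as follows. This is a minimal illustration, not the study's actual sampler: the function name is hypothetical, it uses a basic (non-maximin) Latin hypercube for the three continuous variables, and it assigns hole_count by cycling through the 11 levels and shuffling rather than rounding a continuous column, so that per-level counts differ by at most one.

```python
import random

# Bounds taken from the design-variable table in Section 1.3.
CONTINUOUS_BOUNDS = {
    "beam_half_core_thickness": (10.0, 40.0),
    "beam_face_thickness": (10.0, 40.0),
    "holes_diameter": (150.0, 450.0),
}
HOLE_COUNT_LEVELS = list(range(5, 16))  # 11 integer levels, 5..15


def lhs_with_stratified_hole_count(n_trials, seed=0):
    """Return n_trials sample dicts: LHS on continuous DVs, balanced hole_count."""
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in CONTINUOUS_BOUNDS.items():
        # One uniform draw inside each of n_trials equal-width strata.
        unit = [(i + rng.random()) / n_trials for i in range(n_trials)]
        rng.shuffle(unit)  # decouple the strata order between variables
        columns[name] = [lo + u * (hi - lo) for u in unit]

    # Cycle through the levels so counts differ by at most one, then shuffle.
    n_levels = len(HOLE_COUNT_LEVELS)
    levels = [HOLE_COUNT_LEVELS[i % n_levels] for i in range(n_trials)]
    rng.shuffle(levels)
    columns["hole_count"] = levels

    return [{k: v[i] for k, v in columns.items()} for i in range(n_trials)]


samples = lhs_with_stratified_hole_count(50, seed=42)
```

With 50 trials and 11 levels, six levels receive 5 trials and five receive 4. A maximin criterion, if wanted, would have to be layered on top (for example by generating several candidate designs and keeping the one with the largest minimum pairwise distance).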

1.5 Baseline Assessment

| Metric | Baseline Value | Constraint | Status |
|---|---|---|---|
| Mass | 1,133.01 kg | (minimize) | Overbuilt; room to reduce |
| Tip displacement | ~22 mm (unverified; awaiting baseline re-run) | ≤ 10 mm | Likely FAILS |
| VM stress | (unknown; awaiting baseline re-run) | ≤ 130 MPa | ⚠️ Unconfirmed |

⚠️ Critical: The baseline design likely violates the displacement constraint (~22 mm vs 10 mm limit). Baseline re-run pending — CEO running SOL 101 in parallel. The optimizer must first find the feasible region before it can meaningfully minimize mass. This shapes the entire strategy.

Introspection note (2026-02-10): Mass expression is p173 (body_property147.mass, kg). DV baselines are NOT round numbers (face=21.504mm, core=25.162mm). NX expression beam_lenght is misspelled in the model ('ht' instead of 'th'); use the name exactly as it appears in NX. hole_count links to Pattern_p7 in the NX pattern feature.


2. Algorithm Selection

2.1 Tech Lead's Recommendation

DEC-HB-002 proposes a two-phase strategy:

  • Phase 1: Latin Hypercube Sampling (LHS), 40-50 trials
  • Phase 2: TPE via Optuna, 60-100 trials

2.2 My Assessment: CONFIRMED with refinements

The two-phase approach is the right call. Here's why, and what I'd adjust:

Why LHS → TPE is correct for this problem

| Factor | Implication | Algorithm Fit |
|---|---|---|
| 4 design variables (low-dim) | All methods work; sample efficiency less critical | Any |
| 1 integer variable | Need native mixed-type support | TPE ✓, CMA-ES ≈ (rounding) |
| Infeasible baseline | Must map feasibility BEFORE optimizing | LHS first ✓ |
| Expected significant interactions (DV1×DV2, DV3×DV4) | Need space-filling to detect interactions | LHS ✓ |
| Potentially narrow feasible region | Risk of missing it with random search | LHS gives systematic coverage ✓ |
| NX-in-the-loop (medium cost) | ~100-200 trials is budget-appropriate | TPE efficient enough ✓ |

What I'd modify

  1. Phase 1 budget: 50 trials (not 40). With 4 variables, the 10× dimensionality rule makes 40 trials the bare minimum for a reliable DoE; 50 adds margin and gives 4-5 trials per hole_count level under stratified integer sampling (50 / 11 ≈ 4.5).

  2. Enqueue baseline as Trial 0. LAC critical lesson: CMA-ES doesn't evaluate x0 first. While we're using LHS (not CMA-ES), the same principle applies — always evaluate the baseline explicitly so we have a verified anchor point. This also validates the extractor pipeline before burning 50 trials.

  3. Phase 2 budget: 80 trials (flexible 60-100). Start with 60, apply convergence criteria (Section 6), extend to 100 if still improving.

  4. Seed Phase 2 from Phase 1 data. Use Optuna's enqueue_trial() to warm-start TPE with the best feasible point(s) from the DoE. This avoids the TPE startup penalty (first n_startup_trials are random).

Algorithms NOT selected (and why)

| Algorithm | Why Not |
|---|---|
| CMA-ES | Good option, but integer rounding is a hack; doesn't evaluate x0 first (LAC lesson); TPE is equally good at 4D |
| NSGA-II | Overkill for single-objective; population size wastes budget |
| Surrogate + L-BFGS | LAC CRITICAL: Gradient descent on surrogates finds fake optima. V5 mirror study: L-BFGS was 22% WORSE than pure TPE (WS=325 vs WS=290). V6 confirmed simple TPE beats complex surrogate methods. Do not use. |
| SOL 200 (Nastran native) | No integer support for hole_count; gradient-based so may miss global optimum; more NX setup effort. Keep as backup (Tech Lead's suggestion). |
| Nelder-Mead | No integer support; poor exploration; would miss the feasible region |

2.3 Final Algorithm Configuration

Phase 1: LHS DoE
  - Trials: 50 (+ 1 baseline = 51 total)
  - Sampling: Maximin LHS, DV4 rounded to nearest integer
  - Purpose: Landscape mapping, feasibility identification, sensitivity analysis

Phase 2: TPE Optimization
  - Trials: 60-100 (adaptive, see convergence criteria)
  - Sampler: Optuna TPESampler
  - n_startup_trials: 0 (warm-started from Phase 1 best)
  - Constraint handling: Optuna constraint interface with Deb's rules
  - Purpose: Converge to minimum-mass feasible design

Total budget: 111-151 evaluations

3. Constraint Handling

3.1 The Challenge

The baseline likely FAILS the displacement constraint by about 120% (~22 mm vs 10 mm, pending the baseline re-run). This means:

  • A significant portion of the design space may be infeasible
  • Random sampling may return few or zero feasible points
  • The optimizer must navigate toward feasibility AND optimality simultaneously

3.2 Approach: Deb's Feasibility Rules (Constraint Domination)

For ranking solutions during optimization, use Deb's feasibility rules (Deb 2000):

  1. Feasible vs feasible → compare by objective (lower mass wins)
  2. Feasible vs infeasible → feasible always wins
  3. Infeasible vs infeasible → lower total constraint violation wins

This is implemented via Optuna's constraint interface:

def constraints(trial):
    """Return Optuna constraint values: <= 0 is satisfied, > 0 is violated.

    Expects the objective to have stored both responses via trial.set_user_attr().
    """
    disp = trial.user_attrs["tip_displacement"]   # mm
    stress = trial.user_attrs["max_von_mises"]    # MPa
    return [
        disp - 10.0,     # <= 0 means displacement <= 10 mm
        stress - 130.0,  # <= 0 means stress <= 130 MPa
    ]
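Independently of Optuna, the three ranking rules can be written as a small pairwise comparison. The sketch below is illustrative only: the function names and the example designs are made up, and it sums raw violations in mm and MPa, which is an assumption (normalising each violation by its limit may be preferable when the units differ this much).

```python
def total_violation(g):
    """Sum of positive parts; g_i <= 0 means constraint i is satisfied."""
    return sum(max(0.0, gi) for gi in g)


def deb_better(a, b):
    """Return True if design a ranks strictly ahead of design b.

    Each design is a (mass_kg, g) pair, where g lists the constraint
    values in the same convention as constraints() above.
    """
    mass_a, g_a = a
    mass_b, g_b = b
    viol_a, viol_b = total_violation(g_a), total_violation(g_b)
    if viol_a == 0.0 and viol_b == 0.0:
        return mass_a < mass_b      # rule 1: both feasible, lower mass wins
    if viol_a == 0.0 or viol_b == 0.0:
        return viol_a == 0.0        # rule 2: feasible beats infeasible
    return viol_a < viol_b          # rule 3: smaller total violation wins


# Made-up designs for illustration only.
heavy_feasible = (1400.0, [-0.5, -20.0])    # 9.5 mm, 110 MPa
light_infeasible = (900.0, [4.0, -10.0])    # 14 mm, 120 MPa
very_infeasible = (800.0, [12.0, 15.0])     # 22 mm, 145 MPa
```

Here `heavy_feasible` outranks `light_infeasible` despite being 500 kg heavier, and `light_infeasible` outranks `very_infeasible` because its total violation is smaller.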

3.3 Why NOT Penalty Functions

| Method | Pros | Cons | Verdict |
|---|---|---|---|
| Deb's rules (selected) | No tuning params; feasible always beats infeasible; explores infeasible region for learning | Requires custom Optuna integration | Best for this case |
| Quadratic penalty | Simple to implement | Penalty weight requires tuning; wrong weight → optimizer ignores constraint OR over-penalizes | Fragile |
| Adaptive penalty | Self-tuning | Complex implementation; may oscillate | Over-engineered for 4 DVs |
| Death penalty (reject infeasible) | Simplest | With infeasible baseline, may reject 80%+ of trials → wasted budget | Dangerous |

3.4 Phase 1 (DoE) Constraint Handling

During the DoE phase, record all results without filtering. Every trial runs, every result is stored. Infeasible points are valuable for:

  • Mapping the feasibility boundary
  • Training the TPE model in Phase 2
  • Understanding which variables drive constraint violation

3.5 Constraint Margin Buffer

Consider a 5% inner margin during optimization to account for numerical noise:

  • Displacement target for optimizer: ≤ 9.5 mm (vs hard limit 10.0 mm)
  • Stress target for optimizer: ≤ 123.5 MPa (vs hard limit 130.0 MPa)

The hard limits remain 10 mm / 130 MPa for final validation. The buffer prevents the optimizer from converging to designs that are right on the boundary and may flip infeasible under mesh variation.
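One way to keep the optimizer targets and the hard limits separate is to derive the former from the latter. The helper names below are hypothetical and the candidate result is made up; the sketch only shows that a design can pass the hard limits while still being flagged by the tightened ones.

```python
HARD_LIMITS = {"tip_displacement": 10.0, "max_von_mises": 130.0}  # mm, MPa


def tightened_limits(hard_limits, margin=0.05):
    """Scale every upper limit down by the given fractional margin."""
    return {name: limit * (1.0 - margin) for name, limit in hard_limits.items()}


def constraint_values(results, limits):
    """Return result minus limit per constraint (<= 0 means satisfied)."""
    return [results[name] - limits[name] for name in limits]


optimizer_limits = tightened_limits(HARD_LIMITS)  # about 9.5 mm and 123.5 MPa

# Made-up result: inside the hard limits, outside the 5% displacement buffer.
candidate = {"tip_displacement": 9.8, "max_von_mises": 118.0}
during_optimization = constraint_values(candidate, optimizer_limits)
final_validation = constraint_values(candidate, HARD_LIMITS)
```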


4. Search Space Analysis

4.1 Bound Reasonableness

| Variable | Range | Span | Concern |
|---|---|---|---|
| DV1: half_core_thickness | 10-40 mm | 4× range | Reasonable. Lower bound = thin core, upper = thick. Stiffness-mass trade-off |
| DV2: face_thickness | 10-40 mm | 4× range | Reasonable. 10 mm face is already substantial for steel |
| DV3: holes_diameter | 150-450 mm | 3× range | ⚠️ Needs geometric check (see §4.2) |
| DV4: hole_count | 5-15 | 3× range | ⚠️ Needs geometric check (see §4.2) |

4.2 Geometric Feasibility: Hole Overlap Analysis

Critical concern: At extreme DV3 × DV4 combinations, holes may overlap or leave insufficient ligament (material between holes).

Overlap condition (CORRECTED — Auditor review 2026-02-10)

The NX pattern places n holes across a span of p6 mm using n-1 intervals (holes at both endpoints of the span). Confirmed by introspection: Pattern_p8 = 4000/9 = 444.44 mm.

Spacing between hole centers = hole_span / (hole_count - 1)
Ligament between holes = spacing - d = hole_span/(hole_count - 1) - d

For no overlap, we need: hole_span/(n-1) - d > 0, i.e., d < hole_span/(n-1)

With hole_span = 4,000 mm (fixed, p6):

Worst case: n=15 holes, d=450 mm

Spacing = 4000 / (15-1) = 285.7 mm
Ligament = 285.7 - 450 = -164.3 mm → INFEASIBLE (overlap)

Minimum ligament width

For structural integrity and mesh quality, a minimum ligament of ~30 mm is advisable:

Minimum ligament constraint: hole_span / (hole_count - 1) - holes_diameter ≥ 30 mm
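Rearranging the ligament constraint gives the largest admissible hole diameter for each hole count. The symbols $L_s$ (for hole_span), $d$ (holes_diameter) and $n$ (hole_count) are shorthand introduced here for the worked values:

```latex
\frac{L_s}{n-1} - d \ge 30
\quad\Longrightarrow\quad
d \le d_{\max}(n) = \frac{4000}{n-1} - 30 \quad [\text{mm}]

d_{\max}(9)  = \frac{4000}{8}  - 30 = 470.0 \text{ mm} \quad (\text{above the 450 mm DV3 bound})
d_{\max}(10) = \frac{4000}{9}  - 30 \approx 414.4 \text{ mm}
d_{\max}(15) = \frac{4000}{14} - 30 \approx 255.7 \text{ mm}
```

So the ligament limit only cuts into the DV3 range for hole counts of 10 and above; for 9 holes or fewer, every diameter up to 450 mm leaves at least 30 mm of ligament.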

Pre-flight geometric filter

Before sending any trial to NX, compute:

  1. ligament = 4000 / (hole_count - 1) - holes_diameter → must be ≥ 30 mm
  2. web_clear = 2 × beam_half_height - 2 × beam_face_thickness - holes_diameter → must be > 0

If either fails, skip NX evaluation and record as infeasible with max constraint violation. This saves compute and avoids NX geometry crashes.
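The two checks can be expressed as a small function that runs before any NX call. This is a sketch under the values stated in this document (hole_span = 4000 mm, beam_half_height = 250 mm, 30 mm minimum ligament); the function name and return shape are hypothetical.

```python
HOLE_SPAN_MM = 4000.0        # p6, fixed
BEAM_HALF_HEIGHT_MM = 250.0  # fixed, from introspection
MIN_LIGAMENT_MM = 30.0       # advisory minimum from this section


def preflight(params):
    """Return (ok, checks) for one candidate; no NX call is made."""
    n = params["hole_count"]
    d = params["holes_diameter"]
    face = params["beam_face_thickness"]
    ligament = HOLE_SPAN_MM / (n - 1) - d
    web_clear = 2.0 * BEAM_HALF_HEIGHT_MM - 2.0 * face - d
    checks = {"ligament": ligament, "web_clear": web_clear}
    ok = ligament >= MIN_LIGAMENT_MM and web_clear > 0.0
    return ok, checks


baseline = {"hole_count": 10, "holes_diameter": 300.0, "beam_face_thickness": 21.504}
worst = {"hole_count": 15, "holes_diameter": 450.0, "beam_face_thickness": 40.0}
print(preflight(baseline))  # passes both checks
print(preflight(worst))     # fails both checks
```

Returning the computed margins alongside the pass/fail flag lets the run script log why a trial was skipped, which is useful when mapping the geometric feasibility boundary afterwards.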

4.3 Hole-to-Web-Height Ratio (CORRECTED — Auditor review 2026-02-10)

The hole diameter must fit within the web clear height. From introspection:

  • Total beam height = 2 × beam_half_height = 2 × 250 = 500 mm (fixed)
  • Web clear height = total_height - 2 × face_thickness = 500 - 2 × beam_face_thickness
At baseline (face=21.504mm): web clear height = 500 - 2×21.504 = 456.99 mm → holes of 450mm barely fit (7mm clearance)
At face=40mm: web clear height = 500 - 2×40 = 420 mm → holes of 450mm DO NOT FIT
At face=10mm: web clear height = 500 - 2×10 = 480 mm → holes of 450mm fit (30mm clearance)

This means beam_face_thickness and holes_diameter interact geometrically — thicker faces reduce the web clear height available for holes. This constraint is captured in the pre-flight filter (§4.2):

web_clear = 500 - 2 × beam_face_thickness - holes_diameter > 0

4.4 Expected Feasible Region

Based on the physics (Tech Lead's analysis §1.2 and §1.3):

| To reduce displacement (currently 22→10 mm) | Effect on mass |
|---|---|
| ↑ DV1 (thicker core) | ↑ mass (but stiffness scales ~d², mass scales ~d) → efficient |
| ↑ DV2 (thicker face) | ↑ mass (direct) |
| ↓ DV3 (smaller holes) | ↑ mass (more web material) |
| ↓ DV4 (fewer holes) | ↑ mass (more web material) |

Prediction: The feasible region (displacement ≤ 10 mm) likely requires:

  • DV1 in upper range (25-40 mm) — the sandwich effect is the most mass-efficient stiffness lever
  • DV2 moderate (15-30 mm) — thicker faces help stiffness but cost mass directly
  • DV3 and DV4 constrained by stress — large/many holes save mass but increase stress

The optimizer should find a "sweet spot" where core thickness provides stiffness, and holes are sized to save mass without violating stress limits.

4.5 Estimated Design Space Volume

  • DV1: 30 mm span (continuous)
  • DV2: 30 mm span (continuous)
  • DV3: 300 mm span (continuous)
  • DV4: 11 integer levels

Total configurations: effectively infinite (3 continuous), but the integer dimension creates 11 "slices" of the space. With 50 DoE trials, we get ~4-5 trials per slice — sufficient for trend identification.


5. Trial Budget & Compute Estimate

5.1 Budget Breakdown

| Phase | Trials | Purpose |
|---|---|---|
| Trial 0 | 1 | Baseline validation (enqueued) |
| Phase 1: LHS DoE | 50 | Landscape mapping, feasibility, sensitivity |
| Phase 2: TPE | 60-100 | Directed optimization |
| Validation | 3-5 | Confirm optimum, check mesh sensitivity |
| Total | 114-156 | |

5.2 Compute Time Estimate

| Parameter | Estimate | Notes |
|---|---|---|
| DOF count | 10K-100K | Steel beam, SOL 101 |
| Single solve time | 30 s-3 min | Depends on mesh density |
| Model rebuild time | 10-30 s | NX parametric update + remesh |
| Total per trial | 1-4 min | Rebuild + solve + extraction |
| Phase 1 (51 trials) | 1-3.5 hrs | |
| Phase 2 (60-100 trials) | 1-7 hrs | |
| Total compute | 2-10 hrs | Likely ~4-5 hrs |
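The ranges in the table follow directly from the per-trial estimate. A quick check, assuming 1 to 4 minutes per trial and taking the total over the full 114-156 evaluations (validation included, which is an assumption about how the table's total was formed):

```python
def hours(trials, minutes_per_trial):
    """Wall-clock hours for a number of sequential trials."""
    return trials * minutes_per_trial / 60.0


FAST_MIN, SLOW_MIN = 1.0, 4.0  # per-trial estimate from the table

phase1 = (hours(51, FAST_MIN), hours(51, SLOW_MIN))    # 0.85 to 3.4 hrs
phase2 = (hours(60, FAST_MIN), hours(100, SLOW_MIN))   # 1.0 to about 6.7 hrs
total = (hours(114, FAST_MIN), hours(156, SLOW_MIN))   # 1.9 to 10.4 hrs
```

At the ~2 min/trial figure quoted in Section 7.4, the full budget comes to roughly 3.8 to 5.2 hours, which matches the "likely ~4-5 hrs" estimate.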

5.3 Budget Justification

For 4 design variables, rule-of-thumb budgets:

  • Minimum viable: 10 × n_vars = 40 trials (DoE only)
  • Standard: 25 × n_vars = 100 trials (DoE + optimization)
  • Thorough: 50 × n_vars = 200 trials (with validation)

Our budget of 114-156 falls in the standard-to-thorough range. Appropriate for a first study where we're mapping an unknown landscape with an infeasible baseline.


6. Convergence Criteria

6.1 Phase 1 (DoE) — No Convergence Criteria

The DoE runs all 50 planned trials. It's not iterative — it's a one-shot space-filling design. Stop conditions:

  • All 50 trials complete (or fail with documented errors)
  • Early abort: If >80% of trials fail to solve (NX crashes), stop and investigate

6.2 Phase 2 (TPE) — Convergence Criteria

| Criterion | Threshold | Action |
|---|---|---|
| Improvement stall | Best feasible objective unchanged for 20 consecutive trials | Consider stopping |
| Relative improvement | < 1% improvement over last 20 trials | Consider stopping |
| Budget exhausted | 100 trials completed in Phase 2 | Hard stop |
| Perfect convergence | Multiple trials within 0.5% of each other from different regions | Confident optimum found |
| Minimum budget | Always run at least 60 trials in Phase 2 | Ensures adequate exploration |

6.3 Decision Logic

After 60 Phase 2 trials:
  IF best_feasible improved by >2% in last 20 trials → continue to 80
  IF no feasible solution found → STOP, escalate (see §7.1)
  ELSE → assess convergence, decide 80 or 100

After 80 Phase 2 trials:
  IF still improving >1% per 20 trials → continue to 100
  ELSE → STOP, declare converged

After 100 Phase 2 trials:
  HARD STOP regardless
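The checkpoint logic can be sketched as a function over the running best-feasible mass. This is one possible reading of the rules above, not a definitive implementation: the function names and return strings are hypothetical, and a first feasible design appearing within the last 20 trials is treated as "still improving", which is an assumption.

```python
def relative_improvement(best_so_far, window=20):
    """Fractional drop in best feasible mass over the last `window` trials.

    best_so_far[i] is the best feasible mass after trial i + 1, or None
    while no feasible design has been found. The list must be non-empty.
    """
    current = best_so_far[-1]
    if current is None:
        return None                      # still no feasible design
    if len(best_so_far) <= window or best_so_far[-1 - window] is None:
        return float("inf")              # first feasible design is recent
    previous = best_so_far[-1 - window]
    return (previous - current) / previous


def next_action(best_so_far):
    """Map the Phase 2 history onto the checkpoints at 60/80/100 trials."""
    n = len(best_so_far)
    gain = relative_improvement(best_so_far)
    if n >= 100:
        return "stop: budget exhausted"
    if gain is None:
        return "escalate: no feasible design" if n >= 60 else "continue"
    if n == 60:
        return "continue to 80" if gain > 0.02 else "assess convergence"
    if n == 80:
        return "continue to 100" if gain > 0.01 else "stop: converged"
    return "continue"


# Made-up history: 1000 kg for 40 trials, then 950 kg for the next 20.
history = [1000.0] * 40 + [950.0] * 20
print(next_action(history))  # 5% gain over the last 20 trials
```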

6.4 Phase 1 → Phase 2 Gate

Before starting Phase 2, review DoE results:

| Check | Action if FAIL |
|---|---|
| At least 5 feasible points found | If 0 feasible: expand bounds or relax constraints (escalate to CEO) |
| NX solve success rate > 80% | If <80%: investigate failures, fix model, re-run failed trials |
| No systematic NX crashes at bounds | If crashes: tighten bounds away from failure region |
| Sensitivity trends visible | If flat: check extractors, may be reading wrong output |

7. Risk Mitigation

7.1 Risk: Feasible Region is Empty

Likelihood: Medium (baseline fails displacement by 120%)

Detection: After Phase 1, zero feasible points found.

Mitigation ladder:

  1. Check the data — Are extractors reading correctly? Validate against manual NX check.
  2. Examine near-feasible — Find the trial closest to feasibility. How far off? If displacement = 10.5 mm, we're close. If displacement = 18 mm, we have a problem.
  3. Targeted exploration — Run additional trials at extreme stiffness (max DV1, max DV2, min DV3, min DV4). If even the stiffest/heaviest design fails, the constraint is physically impossible with this geometry.
  4. Constraint relaxation — Propose to CEO: relax displacement to 12 or 15 mm. Document the mass-displacement Pareto front from DoE data to support the discussion.
  5. Geometric redesign — If the problem is fundamentally infeasible, the beam geometry needs redesign (out of optimization scope).

7.2 Risk: NX Crashes at Parameter Extremes

Likelihood: Medium (LAC: rib_thickness had undocumented CAD constraint at 9mm, causing 34% failure rate in V13)

Detection: Solver returns no results for certain parameter combinations.

Mitigation:

  1. Pre-flight corner tests — Before Phase 1, manually test the 16 corners of the design space (2⁴ combinations of min/max for each variable). This catches geometric rebuild failures early.
  2. Error-handling in run script — Every trial must catch exceptions and log:
    • NX rebuild failure (geometry Boolean crash)
    • Meshing failure (degenerate elements)
    • Solver failure (singularity, divergence)
    • Extraction failure (missing result)
  3. Infeasible-by-default — If a trial fails for any reason, record it as infeasible with maximum constraint violation (displacement=9999, stress=9999). This lets Deb's rules naturally steer away from crashing regions.
  4. NEVER kill NX processes directly — LAC CRITICAL RULE. Use NXSessionManager.close_nx_if_allowed() only. If NX hangs, implement a timeout (e.g., 10 min per trial) and let NX time out gracefully.
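Items 2 and 3 can be combined in a thin wrapper around the evaluation call. In this sketch `evaluate` is a hypothetical callable standing in for the rebuild, mesh, solve and extract pipeline; timeout handling and the NXSessionManager shutdown path are deliberately not shown.

```python
FAILURE_VALUE = 9999.0  # sentinel from this section, used for both responses


def run_trial_safely(evaluate, params):
    """Call evaluate(params) and never let an exception escape.

    `evaluate` is assumed to return a dict with 'mass' (kg),
    'tip_displacement' (mm) and 'max_von_mises' (MPa).
    """
    try:
        result = dict(evaluate(params))
        result["status"] = "ok"
        result["error"] = None
    except Exception as exc:  # rebuild, meshing, solver or extraction failure
        result = {
            "mass": None,
            "tip_displacement": FAILURE_VALUE,
            "max_von_mises": FAILURE_VALUE,
            "status": "failed",
            "error": f"{type(exc).__name__}: {exc}",
        }
    result["params"] = dict(params)  # keep inputs with outputs for the log
    return result


def always_crashes(params):
    """Stand-in for an NX failure, for demonstration only."""
    raise RuntimeError("geometry Boolean crash")


record = run_trial_safely(always_crashes, {"hole_count": 15})
```

Because a failed trial carries large positive constraint violations, it ranks behind every completed trial under Deb's rules without any special-case handling in the optimizer.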

7.3 Risk: Mesh-Dependent Stress Results

Likelihood: Medium (stress at hole edges is mesh-sensitive)

Mitigation:

  1. Mesh convergence pre-study — Run baseline at 3 mesh densities. If stress varies >10%, refine mesh or use stress averaging region.
  2. Consistent mesh controls — Ensure NX applies the same mesh size/refinement strategy regardless of parameter values. The parametric model should have mesh controls tied to hole geometry.
  3. Stress extraction method — Use elemental nodal stress (conservative) per LAC success pattern. Note: pyNastran returns stress in kPa for NX kg-mm-s unit system — divide by 1000 for MPa.

7.4 Risk: Surrogate Temptation

Mitigation: DON'T DO IT (yet).

LAC lessons from the M1 Mirror project are unequivocal:

  • V5 surrogate + L-BFGS was 22% worse than V6 pure TPE
  • MLP surrogates have smooth gradients everywhere → L-BFGS descends to fake optima outside training distribution
  • No uncertainty quantification = no way to detect out-of-distribution predictions

With only 4 variables and affordable FEA (~2 min/trial), direct FEA evaluation via TPE is both simpler and more reliable. Surrogate methods should only be considered if:

  • FEA solve time exceeds 30 minutes per trial, AND
  • We have 100+ validated training points, AND
  • We use ensemble surrogates with uncertainty quantification (SYS_16 protocol)

7.5 Risk: Study Corruption

Mitigation: LAC CRITICAL — Always copy working studies, never rewrite from scratch.

  • Phase 2 study will be created by copying the Phase 1 study directory and adding optimization logic
  • Never modify run_optimization.py in-place for a new phase — copy to a new version
  • Git-commit the study directory after each phase completion

8. AtomizerSpec Draft

See atomizer_spec_draft.json for the full JSON config.

8.1 Key Configuration Decisions

| Setting | Value | Rationale |
|---|---|---|
| algorithm.phase1.type | LHS | Space-filling DoE for landscape mapping |
| algorithm.phase2.type | TPE | Native mixed-integer, sample-efficient, LAC-proven |
| hole_count.type | integer | DEC-HB-003: true integer, no rounding |
| constraint_handling | deb_feasibility_rules | Best for infeasible baseline |
| baseline_trial | enqueued | LAC lesson: always validate baseline first |
| penalty_config.method | deb_rules | No penalty weight tuning needed |

8.2 Extractor Requirements

| ID | Type | Output | Source | Notes |
|---|---|---|---|---|
| ext_001 | expression | mass | NX expression p173 | Direct read from NX |
| ext_002 | displacement | tip_displacement | SOL 101 result sensor or .op2 parse | ⚠️ Need sensor setup or node ID |
| ext_003 | stress | max_von_mises | SOL 101 elemental nodal | kPa → MPa conversion needed |

8.3 Open Items for Spec Finalization

Before this spec can be promoted from _draft to production:

  1. Beam web length — Required to validate DV3 × DV4 geometric feasibility
  2. Displacement extraction method — Sensor in .sim, or node ID for .op2 parsing?
  3. Stress extraction scope — Whole model max, or specific element group?
  4. NX expression names confirmed — Verify p173 is mass, confirm displacement/stress expression names
  5. Solver runtime benchmark — Time one SOL 101 run to refine compute estimates
  6. Corner test results — Validate model rebuilds at all 16 bound corners
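The 16 bound corners in item 6 can be enumerated mechanically from the design-variable bounds. A minimal sketch (the `BOUNDS` dictionary simply restates the table in Section 1.3):

```python
from itertools import product

BOUNDS = {
    "beam_half_core_thickness": (10.0, 40.0),
    "beam_face_thickness": (10.0, 40.0),
    "holes_diameter": (150.0, 450.0),
    "hole_count": (5, 15),
}

# Every combination of lower/upper bound: 2**4 = 16 corner designs.
corners = [dict(zip(BOUNDS, combo)) for combo in product(*BOUNDS.values())]

for index, corner in enumerate(corners):
    print(index, corner)
```

Some of these corners, for example 15 holes at 450 mm diameter, already fail the ligament check in §4.2, so they would be expected to be rejected by the pre-flight filter rather than rebuilt in NX.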

9. Execution Plan Summary

┌─────────────────────────────────────────────────────────────────┐
│                    HYDROTECH BEAM OPTIMIZATION                  │
│                    Study: 01_doe_landscape                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  PRE-FLIGHT (before any trials)                                 │
│  ├── Validate baseline: run Trial 0, verify mass/disp/stress   │
│  ├── Corner tests: 16 extreme combinations, check NX rebuilds  │
│  ├── Mesh convergence: 3 density levels at baseline             │
│  └── Confirm extractors: mass, displacement, stress pipelines   │
│                                                                 │
│  PHASE 1: DoE LANDSCAPE (51 trials)                             │
│  ├── Trial 0: Baseline (enqueued)                               │
│  ├── Trials 1-50: LHS with integer rounding for hole_count     │
│  ├── Analysis: sensitivity, interaction, feasibility mapping    │
│  └── GATE: ≥5 feasible? NX success >80%? Proceed/escalate      │
│                                                                 │
│  PHASE 2: TPE OPTIMIZATION (60-100 trials)                      │
│  ├── Warm-start from best Phase 1 feasible point(s)            │
│  ├── Deb's feasibility rules for constraint handling            │
│  ├── Convergence check every 20 trials                          │
│  └── Hard stop at 100 trials                                    │
│                                                                 │
│  VALIDATION (3-5 trials)                                        │
│  ├── Re-run best design to confirm repeatability                │
│  ├── Perturb ±5% on each variable to check sensitivity          │
│  └── Document final design with full NX results                 │
│                                                                 │
│  TOTAL: 114-156 NX evaluations | ~4-5 hours compute            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

10. LAC Lessons Incorporated

| LAC Lesson | Source | How Applied |
|---|---|---|
| CMA-ES doesn't evaluate x0 first | Mirror V7 failure | Baseline enqueued as Trial 0 for both phases |
| Surrogate + L-BFGS = fake optima | Mirror V5 failure | No surrogates in this study; direct FEA only |
| Never kill NX processes directly | Dec 2025 incident | Timeout-based error handling; NXSessionManager only |
| Copy working studies, never rewrite | Mirror V5 failure | Phase 2 created by copying Phase 1 |
| pyNastran stress in kPa | Support arm success | Extractor divides by 1000 for MPa |
| CAD constraints can limit bounds | Mirror V13 (rib_thickness) | Pre-flight corner tests before DoE |
| Always include README.md | Repeated failures (Dec 2025, Jan 2026) | README.md created with study |
| Simple beats complex (TPE > surrogate) | Mirror V6 vs V5 | TPE selected over surrogate-based methods |

Optimizer — The algorithm is the strategy.