.claude/skills/core/study-creation-core.md

---
skill_id: SKILL_CORE_001
version: 2.4
last_updated: 2025-12-07
type: core
code_dependencies:
  - optimization_engine/base_runner.py
  - optimization_engine/extractors/__init__.py
  - optimization_engine/templates/registry.json
requires_skills: []
replaces: create-study.md
---

# Study Creation Core Skill

**Version**: 2.4
**Updated**: 2025-12-07
**Type**: Core Skill

You are helping the user create a complete Atomizer optimization study from a natural language description.

**CRITICAL**: This skill is your SINGLE SOURCE OF TRUTH. DO NOT improvise or look at other studies for patterns. Use ONLY the patterns documented here and in the loaded modules.

---

## Module Loading

This core skill is always loaded. Additional modules are loaded based on context:

| Module | Load When | Path |
|--------|-----------|------|
| **extractors-catalog** | Always (for reference) | `modules/extractors-catalog.md` |
| **zernike-optimization** | "telescope", "mirror", "optical", "wavefront" | `modules/zernike-optimization.md` |
| **neural-acceleration** | >50 trials, "neural", "surrogate", "fast" | `modules/neural-acceleration.md` |
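
The loading rules in the table can be sketched as a small lookup. This is only an illustration: `select_modules` is a hypothetical helper, not part of Atomizer, and plain substring matching on the description is an assumption about how the trigger words are detected.

```python
from typing import List

# Trigger keywords copied from the Module Loading table.
ZERNIKE_KEYWORDS = ("telescope", "mirror", "optical", "wavefront")
NEURAL_KEYWORDS = ("neural", "surrogate", "fast")


def select_modules(description: str, n_trials: int) -> List[str]:
    """Return the module paths to load for a study request (hypothetical helper)."""
    text = description.lower()
    modules = ["modules/extractors-catalog.md"]  # always loaded for reference
    if any(word in text for word in ZERNIKE_KEYWORDS):
        modules.append("modules/zernike-optimization.md")
    # The neural module is triggered by a keyword or by a trial count above 50.
    if n_trials > 50 or any(word in text for word in NEURAL_KEYWORDS):
        modules.append("modules/neural-acceleration.md")
    return modules
```

Substring matching is deliberately naive (for example, "fastener" would match "fast"), so a real implementation may want word-boundary matching.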

---

## MANDATORY: Model Introspection at Study Creation

**ALWAYS run introspection when creating a study, or whenever the user asks for it:**

```python
from optimization_engine.hooks.nx_cad.model_introspection import (
    introspect_part,
    introspect_simulation,
    introspect_op2,
    introspect_study
)

# Introspect entire study directory (recommended)
study_info = introspect_study("studies/my_study/")

# Or introspect individual files
part_info = introspect_part("path/to/model.prt")
sim_info = introspect_simulation("path/to/model.sim")
op2_info = introspect_op2("path/to/results.op2")
```

### Introspection Extracts

| Source | Information |
|--------|-------------|
| `.prt` | Expressions (count, values, types), bodies, mass, material, features |
| `.sim` | Solutions, boundary conditions, loads, materials, mesh info, output requests |
| `.op2` | Available results (displacement, stress, strain, SPC forces, etc.), subcases |

### Generate Introspection Report

**MANDATORY**: Save `MODEL_INTROSPECTION.md` to the study directory at creation:

```python
# After introspection, generate and save report
study_info = introspect_study(study_dir)
# Generate markdown report and save to studies/{study_name}/MODEL_INTROSPECTION.md
```
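
The report step is only described in comments above, so here is a minimal sketch of one way to write it. The structure of the value returned by `introspect_study` is not documented in this skill; the sketch assumes it is a dict mapping each source file to a dict of findings, and both function names are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict


def render_introspection_report(study_name: str, study_info: Dict[str, Any]) -> str:
    """Render introspection results as Markdown (assumed dict-of-dicts layout)."""
    lines = [f"# Model Introspection: {study_name}", ""]
    for source, details in study_info.items():
        lines.append(f"## {source}")
        lines.append("")
        if isinstance(details, dict):
            for key, value in details.items():
                lines.append(f"- **{key}**: {value}")
        else:
            # Fall back to a plain string for anything that is not a dict.
            lines.append(str(details))
        lines.append("")
    return "\n".join(lines)


def save_introspection_report(study_dir: Path, study_info: Dict[str, Any]) -> Path:
    """Write MODEL_INTROSPECTION.md into the study directory and return its path."""
    study_dir = Path(study_dir)
    study_dir.mkdir(parents=True, exist_ok=True)
    report_path = study_dir / "MODEL_INTROSPECTION.md"
    report_path.write_text(
        render_introspection_report(study_dir.name, study_info), encoding="utf-8"
    )
    return report_path
```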

---

## MANDATORY DOCUMENTATION CHECKLIST

**EVERY study MUST have these files. A study is NOT complete without them:**

| File | Purpose | When Created |
|------|---------|--------------|
| `MODEL_INTROSPECTION.md` | **Model Analysis** - Expressions, solutions, available results | At study creation |
| `README.md` | **Engineering Blueprint** - Full mathematical formulation | At study creation |
| `STUDY_REPORT.md` | **Results Tracking** - Progress, best designs, recommendations | At study creation (template) |

**README.md Requirements (11 sections)**:
1. Engineering Problem (objective, physical system)
2. Mathematical Formulation (objectives, design variables, constraints with LaTeX)
3. Optimization Algorithm (config, properties, return format)
4. Simulation Pipeline (trial execution flow diagram)
5. Result Extraction Methods (extractor details, code snippets)
6. Neural Acceleration (surrogate config, expected performance)
7. Study File Structure (directory tree)
8. Results Location (output files)
9. Quick Start (commands)
10. Configuration Reference (config.json mapping)
11. References

**FAILURE MODE**: If you create a study without MODEL_INTROSPECTION.md, README.md, and STUDY_REPORT.md, the study is incomplete.
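
The completeness rule can be checked mechanically. The sketch below assumes all three files live directly in the study directory, as the file checklist later in this skill shows; `missing_docs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# File names come from the mandatory documentation checklist.
REQUIRED_DOCS = ("MODEL_INTROSPECTION.md", "README.md", "STUDY_REPORT.md")


def missing_docs(study_dir: Path) -> List[str]:
    """Return the mandatory documentation files that are absent from study_dir."""
    study_dir = Path(study_dir)
    return [name for name in REQUIRED_DOCS if not (study_dir / name).is_file()]
```

An empty list means the documentation requirement is met; anything else names the files still to be written.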

---

## PR.3 NXSolver Interface

**Module**: `optimization_engine.nx_solver`

```python
from optimization_engine.nx_solver import NXSolver

nx_solver = NXSolver(
    nastran_version="2412",      # NX version
    timeout=600,                  # Max solve time (seconds)
    use_journal=True,             # Use journal mode (recommended)
    enable_session_management=True,
    study_name="my_study"
)
```

**Main Method - `run_simulation()`**:
```python
result = nx_solver.run_simulation(
    sim_file=sim_file,           # Path to .sim file
    working_dir=model_dir,       # Working directory
    expression_updates=design_vars,  # Dict: {'param_name': value}
    solution_name=None,          # None = solve ALL solutions
    cleanup=True                 # Remove temp files after
)

# Returns:
# {
#     'success': bool,
#     'op2_file': Path,
#     'log_file': Path,
#     'elapsed_time': float,
#     'errors': list,
#     'solution_name': str
# }
```

**CRITICAL**: For multi-solution workflows (static + modal), set `solution_name=None`.
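
Because `run_simulation()` reports failure through the returned dict rather than an exception, callers need to check `success` before touching `op2_file`. A minimal sketch of that check follows; `SimulationFailed` and `require_op2` are hypothetical names, and only the dict keys listed above are taken from this skill.

```python
from pathlib import Path
from typing import Any, Dict


class SimulationFailed(RuntimeError):
    """Raised when a solve reports failure or returns no OP2 file (hypothetical)."""


def require_op2(result: Dict[str, Any]) -> Path:
    """Return the OP2 path from a run_simulation() result dict, or raise."""
    if not result.get("success"):
        errors = result.get("errors") or ["no error details reported"]
        raise SimulationFailed("; ".join(str(e) for e in errors))
    op2_file = result.get("op2_file")
    if op2_file is None:
        raise SimulationFailed("solve succeeded but no op2_file was returned")
    return Path(op2_file)
```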

---

## PR.4 Sampler Configurations

| Sampler | Use Case | Import | Config |
|---------|----------|--------|--------|
| **NSGAIISampler** | Multi-objective (2-3 objectives) | `from optuna.samplers import NSGAIISampler` | `NSGAIISampler(population_size=20, mutation_prob=0.1, crossover_prob=0.9, seed=42)` |
| **TPESampler** | Single-objective | `from optuna.samplers import TPESampler` | `TPESampler(seed=42)` |
| **CmaEsSampler** | Single-objective, continuous | `from optuna.samplers import CmaEsSampler` | `CmaEsSampler(seed=42)` |

---

## PR.5 Study Creation Patterns

**Multi-Objective (NSGA-II)**:
```python
study = optuna.create_study(
    study_name=study_name,
    storage=f"sqlite:///{results_dir / 'study.db'}",
    sampler=NSGAIISampler(population_size=20, seed=42),
    directions=['minimize', 'maximize'],  # [obj1_dir, obj2_dir]
    load_if_exists=True
)
```

**Single-Objective (TPE)**:
```python
study = optuna.create_study(
    study_name=study_name,
    storage=f"sqlite:///{results_dir / 'study.db'}",
    sampler=TPESampler(seed=42),
    direction='minimize',  # or 'maximize'
    load_if_exists=True
)
```

---

## PR.6 Objective Function Return Formats

**Multi-Objective** (directions=['minimize', 'minimize']):
```python
def objective(trial) -> Tuple[float, float]:
    # ... extraction ...
    return (obj1, obj2)  # Both positive, framework handles direction
```

**Multi-Objective with maximize** (negation pattern, directions=['minimize', 'minimize']):
```python
def objective(trial) -> Tuple[float, float]:
    # ... extraction ...
    return (-stiffness, mass)  # Negated: minimizing -stiffness maximizes stiffness
```

**Single-Objective**:
```python
def objective(trial) -> float:
    # ... extraction ...
    return objective_value
```
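
When every direction passed to Optuna is `'minimize'`, the sign handling can be kept in one place instead of being scattered through objective functions. The helper below is a hypothetical sketch of that idea; the goal strings mirror the `"goal"` field used in `optimization_config.json`.

```python
from typing import Sequence, Tuple


def to_minimization(values: Sequence[float], goals: Sequence[str]) -> Tuple[float, ...]:
    """Negate each value whose goal is 'maximize' so all objectives can be minimized."""
    if len(values) != len(goals):
        raise ValueError("values and goals must have the same length")
    converted = []
    for value, goal in zip(values, goals):
        if goal == "maximize":
            converted.append(-value)
        elif goal == "minimize":
            converted.append(value)
        else:
            raise ValueError(f"unknown goal: {goal!r}")
    return tuple(converted)
```

For example, `to_minimization([2500.0, 1.2], ["maximize", "minimize"])` yields `(-2500.0, 1.2)`. Do not combine this with a `'maximize'` direction in `create_study`, or the objective is inverted twice.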

---

## PR.7 Hook System

**Available Hook Points** (from `optimization_engine.plugins.hooks`):

| Hook Point | When | Context Keys |
|------------|------|--------------|
| `PRE_MESH` | Before meshing | `trial_number, design_variables, sim_file` |
| `POST_MESH` | After mesh | `trial_number, design_variables, sim_file` |
| `PRE_SOLVE` | Before solve | `trial_number, design_variables, sim_file, working_dir` |
| `POST_SOLVE` | After solve | `trial_number, design_variables, op2_file, working_dir` |
| `POST_EXTRACTION` | After extraction | `trial_number, design_variables, results, working_dir` |
| `POST_CALCULATION` | After calculations | `trial_number, objectives, constraints, feasible` |
| `CUSTOM_OBJECTIVE` | Custom objectives | `trial_number, design_variables, extracted_results` |

See [EXT_02_CREATE_HOOK](../../docs/protocols/extensions/EXT_02_CREATE_HOOK.md) for creating custom hooks.
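
A hook can defend itself against an incomplete context by checking the keys listed in the table. The sketch below does not use the real `optimization_engine.plugins.hooks` API, whose signatures are not documented here; representing hook points as plain strings and the helper name `missing_context_keys` are assumptions.

```python
from typing import Any, Dict, List

# Expected context keys per hook point, copied from the table above.
HOOK_CONTEXT_KEYS: Dict[str, List[str]] = {
    "PRE_MESH": ["trial_number", "design_variables", "sim_file"],
    "POST_MESH": ["trial_number", "design_variables", "sim_file"],
    "PRE_SOLVE": ["trial_number", "design_variables", "sim_file", "working_dir"],
    "POST_SOLVE": ["trial_number", "design_variables", "op2_file", "working_dir"],
    "POST_EXTRACTION": ["trial_number", "design_variables", "results", "working_dir"],
    "POST_CALCULATION": ["trial_number", "objectives", "constraints", "feasible"],
    "CUSTOM_OBJECTIVE": ["trial_number", "design_variables", "extracted_results"],
}


def missing_context_keys(hook_point: str, context: Dict[str, Any]) -> List[str]:
    """Return the documented context keys that are absent for a given hook point."""
    expected = HOOK_CONTEXT_KEYS[hook_point]  # KeyError signals an unknown hook point
    return [key for key in expected if key not in context]
```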

---

## PR.8 Structured Logging (MANDATORY)

**Always use structured logging**:
```python
from optimization_engine.logger import get_logger

logger = get_logger(study_name, study_dir=results_dir)

# Study lifecycle
logger.study_start(study_name, n_trials, "NSGAIISampler")
logger.study_complete(study_name, total_trials, successful_trials)

# Trial lifecycle
logger.trial_start(trial.number, design_vars)
logger.trial_complete(trial.number, objectives_dict, constraints_dict, feasible)
logger.trial_failed(trial.number, error_message)

# General logging
logger.info("message")
logger.warning("message")
logger.error("message", exc_info=True)
```

---

## Study Structure

```
studies/{study_name}/
├── 1_setup/                          # INPUT: Configuration & Model
│   ├── model/                        # WORKING COPY of NX Files
│   │   ├── {Model}.prt               # Parametric part
│   │   ├── {Model}_sim1.sim          # Simulation setup
│   │   └── *.dat, *.op2, *.f06       # Solver outputs
│   ├── optimization_config.json      # Study configuration
│   └── workflow_config.json          # Workflow metadata
├── 2_results/                        # OUTPUT: Results
│   ├── study.db                      # Optuna SQLite database
│   └── optimization_history.json     # Trial history
├── run_optimization.py               # Main entry point
├── reset_study.py                    # Database reset
├── MODEL_INTROSPECTION.md            # Model analysis (mandatory)
├── README.md                         # Engineering blueprint
└── STUDY_REPORT.md                   # Results report template
```

---

## CRITICAL: Model File Protection

**NEVER modify the user's original/master model files.** Always work on copies.

```python
import shutil
from pathlib import Path

def setup_working_copy(source_dir: Path, model_dir: Path, file_patterns: list):
    """Copy model files from user's source to study working directory."""
    model_dir.mkdir(parents=True, exist_ok=True)

    for pattern in file_patterns:
        for src_file in source_dir.glob(pattern):
            dst_file = model_dir / src_file.name
            if not dst_file.exists():
                shutil.copy2(src_file, dst_file)
```

---

## Interactive Discovery Process

### Step 1: Problem Understanding

**Ask clarifying questions**:
- "What component are you optimizing?"
- "What do you want to optimize?" (minimize/maximize)
- "What limits must be satisfied?" (constraints)
- "What parameters can be changed?" (design variables)
- "Where are your NX files?"

### Step 2: Protocol Selection

| Scenario | Protocol | Sampler |
|----------|----------|---------|
| Single objective + constraints | Protocol 10 | TPE/CMA-ES |
| 2-3 objectives | Protocol 11 | NSGA-II |
| >50 trials, need speed | Protocol 14 | + Neural |
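
The selection table reduces to a few comparisons. In this sketch, `select_protocol` is a hypothetical helper, the returned labels are the table's own wording rather than real config values, and rejecting more than three objectives is an assumption, since the table does not cover that case.

```python
from typing import Dict


def select_protocol(n_objectives: int, n_trials: int) -> Dict[str, object]:
    """Pick a protocol and sampler family from objective and trial counts."""
    if n_objectives < 1 or n_objectives > 3:
        raise ValueError("the selection table only covers 1 to 3 objectives")
    if n_objectives == 1:
        choice: Dict[str, object] = {"protocol": "Protocol 10", "sampler": "TPE/CMA-ES"}
    else:
        choice = {"protocol": "Protocol 11", "sampler": "NSGA-II"}
    # Protocol 14 (neural acceleration) is layered on top for long runs.
    choice["neural"] = n_trials > 50
    return choice
```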

### Step 3: Extractor Mapping

Map user needs to extractors from [extractors-catalog module](../modules/extractors-catalog.md):

| Need | Extractor |
|------|-----------|
| Displacement | E1: `extract_displacement` |
| Stress | E3: `extract_solid_stress` |
| Frequency | E2: `extract_frequency` |
| Mass (FEM) | E4: `extract_mass_from_bdf` |
| Mass (CAD) | E5: `extract_mass_from_expression` |
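
The mapping can be held as a dictionary so that an unsupported need fails loudly instead of being improvised. The need keys (`"mass_fem"`, `"mass_cad"`, and so on) and the helper `extractors_for` are hypothetical; the extractor names are the ones in the table.

```python
from typing import Dict, Iterable, List

# Need -> extractor function name, from the table above (E-numbers in comments).
EXTRACTOR_FOR_NEED: Dict[str, str] = {
    "displacement": "extract_displacement",      # E1
    "frequency": "extract_frequency",            # E2
    "stress": "extract_solid_stress",            # E3
    "mass_fem": "extract_mass_from_bdf",         # E4
    "mass_cad": "extract_mass_from_expression",  # E5
}


def extractors_for(needs: Iterable[str]) -> List[str]:
    """Resolve user needs to extractor names, preserving order and dropping repeats."""
    resolved: List[str] = []
    for need in needs:
        key = need.strip().lower()
        if key not in EXTRACTOR_FOR_NEED:
            raise KeyError(f"no documented extractor for need: {need!r}")
        name = EXTRACTOR_FOR_NEED[key]
        if name not in resolved:
            resolved.append(name)
    return resolved
```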

### Step 4: Multi-Solution Detection

If user needs BOTH:
- Static results (stress, displacement)
- Modal results (frequency)

Then set `solution_name=None` to solve ALL solutions.
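
As a sketch, the detection rule is a set intersection over the result types the study needs. The result-type strings and the helper name are assumptions for illustration.

```python
from typing import Iterable

STATIC_RESULTS = {"stress", "displacement"}
MODAL_RESULTS = {"frequency"}


def needs_all_solutions(required_results: Iterable[str]) -> bool:
    """True when both static and modal results are required, i.e. solution_name=None."""
    required = {result.lower() for result in required_results}
    return bool(required & STATIC_RESULTS) and bool(required & MODAL_RESULTS)
```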

---

## File Generation

### 1. optimization_config.json

```json
{
  "study_name": "{study_name}",
  "description": "{concise description}",

  "optimization_settings": {
    "protocol": "protocol_11_multi_objective",
    "n_trials": 30,
    "sampler": "NSGAIISampler",
    "timeout_per_trial": 600
  },

  "design_variables": [
    {
      "parameter": "{nx_expression_name}",
      "bounds": [min, max],
      "description": "{what this controls}"
    }
  ],

  "objectives": [
    {
      "name": "{objective_name}",
      "goal": "minimize",
      "weight": 1.0,
      "description": "{what this measures}"
    }
  ],

  "constraints": [
    {
      "name": "{constraint_name}",
      "type": "less_than",
      "threshold": value,
      "description": "{engineering justification}"
    }
  ],

  "simulation": {
    "model_file": "{Model}.prt",
    "sim_file": "{Model}_sim1.sim",
    "solver": "nastran"
  }
}
```

### 2. run_optimization.py Template

```python
"""
{Study Name} Optimization
{Brief description}
"""

from pathlib import Path
import sys
import json
import argparse
from typing import Tuple

project_root = Path(__file__).resolve().parents[2]
sys.path.insert(0, str(project_root))

import optuna
from optuna.samplers import NSGAIISampler  # or TPESampler

from optimization_engine.nx_solver import NXSolver
from optimization_engine.logger import get_logger

# Import extractors - USE ONLY FROM extractors-catalog module
from optimization_engine.extractors.extract_displacement import extract_displacement
from optimization_engine.extractors.bdf_mass_extractor import extract_mass_from_bdf


def load_config(config_file: Path) -> dict:
    with open(config_file, 'r') as f:
        return json.load(f)


def objective(trial: optuna.Trial, config: dict, nx_solver: NXSolver,
              model_dir: Path, logger) -> Tuple[float, float]:
    """Multi-objective function. Returns (obj1, obj2)."""

    # 1. Sample design variables
    design_vars = {}
    for var in config['design_variables']:
        param_name = var['parameter']
        bounds = var['bounds']
        design_vars[param_name] = trial.suggest_float(param_name, bounds[0], bounds[1])

    logger.trial_start(trial.number, design_vars)

    try:
        # 2. Run simulation
        sim_file = model_dir / config['simulation']['sim_file']
        result = nx_solver.run_simulation(
            sim_file=sim_file,
            working_dir=model_dir,
            expression_updates=design_vars,
            solution_name=None,  # Solve ALL solutions
            cleanup=True
        )

        if not result['success']:
            logger.trial_failed(trial.number, f"Simulation failed: {result.get('errors')}")
            return (float('inf'), float('inf'))

        op2_file = result['op2_file']

        # 3. Extract results
        disp_result = extract_displacement(op2_file, subcase=1)
        max_displacement = disp_result['max_displacement']

        dat_file = model_dir / config['simulation'].get('dat_file', 'model.dat')
        mass_kg = extract_mass_from_bdf(str(dat_file))

        # 4. Calculate objectives
        applied_force = 1000.0  # N
        stiffness = applied_force / max(abs(max_displacement), 1e-6)

        # 5. Set trial attributes
        trial.set_user_attr('stiffness', stiffness)
        trial.set_user_attr('mass', mass_kg)

        objectives = {'stiffness': stiffness, 'mass': mass_kg}
        logger.trial_complete(trial.number, objectives, {}, True)

        return (-stiffness, mass_kg)  # Negate stiffness to maximize

    except Exception as e:
        logger.trial_failed(trial.number, str(e))
        return (float('inf'), float('inf'))


def main():
    parser = argparse.ArgumentParser(description='{Study Name} Optimization')

    stage_group = parser.add_mutually_exclusive_group()
    stage_group.add_argument('--discover', action='store_true')
    stage_group.add_argument('--validate', action='store_true')
    stage_group.add_argument('--test', action='store_true')
    stage_group.add_argument('--train', action='store_true')
    stage_group.add_argument('--run', action='store_true')

    parser.add_argument('--trials', type=int, default=100)
    parser.add_argument('--resume', action='store_true')
    parser.add_argument('--enable-nn', action='store_true')

    args = parser.parse_args()

    study_dir = Path(__file__).parent
    config_path = study_dir / "1_setup" / "optimization_config.json"
    model_dir = study_dir / "1_setup" / "model"
    results_dir = study_dir / "2_results"
    results_dir.mkdir(exist_ok=True)

    study_name = "{study_name}"

    logger = get_logger(study_name, study_dir=results_dir)
    config = load_config(config_path)
    nx_solver = NXSolver(
        timeout=config['optimization_settings']['timeout_per_trial'],
        study_name=study_name
    )

    storage = f"sqlite:///{results_dir / 'study.db'}"
    sampler = NSGAIISampler(population_size=20, seed=42)

    logger.study_start(study_name, args.trials, "NSGAIISampler")

    if args.resume:
        study = optuna.load_study(study_name=study_name, storage=storage, sampler=sampler)
    else:
        study = optuna.create_study(
            study_name=study_name,
            storage=storage,
            sampler=sampler,
            directions=['minimize', 'minimize'],
            load_if_exists=True
        )

    study.optimize(
        lambda trial: objective(trial, config, nx_solver, model_dir, logger),
        n_trials=args.trials,
        show_progress_bar=True
    )

    n_successful = len([t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE])
    logger.study_complete(study_name, len(study.trials), n_successful)


if __name__ == "__main__":
    main()
```

### 3. reset_study.py

```python
"""Reset {study_name} optimization study by deleting database."""
import optuna
from pathlib import Path

study_dir = Path(__file__).parent
storage = f"sqlite:///{study_dir / '2_results' / 'study.db'}"
study_name = "{study_name}"

try:
    optuna.delete_study(study_name=study_name, storage=storage)
    print(f"[OK] Deleted study: {study_name}")
except KeyError:
    print(f"[WARNING] Study '{study_name}' not found")
except Exception as e:
    print(f"[ERROR] Error: {e}")
```

---

## Common Patterns

### Pattern 1: Mass Minimization with Constraints

```
Objective: Minimize mass
Constraints: Stress < limit, Displacement < limit
Protocol: Protocol 10 (single-objective TPE)
Extractors: E4/E5, E3, E1
Multi-Solution: No (static only)
```

### Pattern 2: Mass vs Stiffness Trade-off

```
Objectives: Minimize mass, Maximize stiffness
Constraints: Stress < limit
Protocol: Protocol 11 (multi-objective NSGA-II)
Extractors: E4/E5, E1 (for stiffness = F/δ), E3
Multi-Solution: No (static only)
```

### Pattern 3: Mass vs Frequency Trade-off

```
Objectives: Minimize mass, Maximize frequency
Constraints: Stress < limit, Displacement < limit
Protocol: Protocol 11 (multi-objective NSGA-II)
Extractors: E4/E5, E2, E3, E1
Multi-Solution: Yes (static + modal)
```

---

## Validation Integration

### Pre-Flight Check

```python
import sys


def preflight_check(study_name: str) -> bool:
    """Validate study setup before running; exits with status 1 on failure."""
    from optimization_engine.validators import validate_study

    result = validate_study(study_name)

    if not result.is_ready_to_run:
        print("[X] Study validation failed!")
        print(result)
        sys.exit(1)

    print("[OK] Pre-flight check passed!")
    return True
```

### Validation Checklist

- [ ] All design variables have valid bounds (min < max)
- [ ] All objectives have proper extraction methods
- [ ] All constraints have thresholds defined
- [ ] Protocol matches objective count
- [ ] Part file (.prt) exists in model directory
- [ ] Simulation file (.sim) exists
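
Most of the checklist can be verified directly against `optimization_config.json`. The sketch below is not the `validate_study` implementation; it is a hypothetical stand-alone check that assumes the config layout shown in File Generation and that multi-objective protocol names contain `multi_objective`. Whether each objective has a proper extraction method cannot be read from that layout, so that item is not covered.

```python
from pathlib import Path
from typing import Any, Dict, List


def validate_config(config: Dict[str, Any], model_dir: Path) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: List[str] = []

    for var in config.get("design_variables", []):
        lower, upper = var["bounds"]
        if not lower < upper:
            problems.append(f"{var['parameter']}: bounds must satisfy min < max")

    for con in config.get("constraints", []):
        if con.get("threshold") is None:
            problems.append(f"{con.get('name')}: constraint has no threshold")

    n_objectives = len(config.get("objectives", []))
    protocol = config.get("optimization_settings", {}).get("protocol", "")
    is_multi = "multi_objective" in protocol  # naming convention is an assumption
    if n_objectives == 0:
        problems.append("no objectives defined")
    elif (n_objectives > 1) != is_multi:
        problems.append(f"protocol '{protocol}' does not match {n_objectives} objective(s)")

    sim = config.get("simulation", {})
    for key in ("model_file", "sim_file"):
        name = sim.get(key)
        if not name or not (Path(model_dir) / name).is_file():
            problems.append(f"simulation.{key}: file not found in model directory")

    return problems
```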

---

## Output Format

After completing study creation, provide:

**Summary Table**:
```
Study Created: {study_name}
Protocol: {protocol}
Objectives: {list}
Constraints: {list}
Design Variables: {list}
Multi-Solution: {Yes/No}
```

**File Checklist**:
```
✓ studies/{study_name}/1_setup/optimization_config.json
✓ studies/{study_name}/1_setup/workflow_config.json
✓ studies/{study_name}/run_optimization.py
✓ studies/{study_name}/reset_study.py
✓ studies/{study_name}/MODEL_INTROSPECTION.md    # MANDATORY - Model analysis
✓ studies/{study_name}/README.md
✓ studies/{study_name}/STUDY_REPORT.md
```

**Next Steps**:
```
1. Place your NX files in studies/{study_name}/1_setup/model/
2. Test with: python run_optimization.py --test
3. Monitor: http://localhost:3003
4. Full run: python run_optimization.py --run --trials {n_trials}
```

---

## Critical Reminders

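1. **Multi-Objective Return Format**: Returned values must agree with `directions`. With `directions=['minimize', 'minimize']` (as in the `run_optimization.py` template), negate any objective you want to maximize; if you declare `'maximize'` instead, return the raw value. Never do both.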
2. **Multi-Solution**: Set `solution_name=None` for static + modal workflows
3. **Always use centralized extractors** from `optimization_engine/extractors/`
4. **Never modify master model files** - always work on copies
5. **Structured logging is mandatory** - use `get_logger()`

---

## Assembly FEM (AFEM) Workflow

For complex assemblies with `.afm` files, the update sequence is critical:

```
.prt (geometry) → _fem1.fem (component mesh) → .afm (assembly mesh) → .sim (solution)
```

### The 4-Step Update Process

1. **Update Expressions in Geometry (.prt)**
   - Open part, update expressions, DoUpdate(), Save

2. **Update ALL Linked Geometry Parts** (CRITICAL!)
   - Open each linked part, DoUpdate(), Save
   - **Skipping this causes corrupt results ("billion nm" RMS)**

3. **Update Component FEMs (.fem)**
   - UpdateFemodel() regenerates mesh from updated geometry

4. **Update Assembly FEM (.afm)**
   - UpdateFemodel(), merge coincident nodes at interfaces

### Assembly Configuration

```json
{
  "nx_settings": {
    "expression_part": "M1_Blank",
    "component_fems": ["M1_Blank_fem1.fem", "M1_Support_fem1.fem"],
    "afm_file": "ASSY_M1_assyfem1.afm"
  }
}
```
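
The ordering constraint of the 4-step process can be made explicit by deriving an update plan from `nx_settings`. This sketch only builds the ordered list and does not call NX; `afem_update_plan` is a hypothetical helper, and `linked_parts` is an assumed extra input because the configuration above does not list linked geometry parts.

```python
from typing import Any, Dict, List, Sequence, Tuple


def afem_update_plan(
    nx_settings: Dict[str, Any], linked_parts: Sequence[str] = ()
) -> List[Tuple[str, str]]:
    """Return (file, action) pairs in the order the 4-step update process requires."""
    # Step 1: expressions live in the geometry part.
    plan = [(nx_settings["expression_part"], "update expressions, DoUpdate(), save")]
    # Step 2: every linked geometry part must be refreshed before meshing.
    plan += [(part, "DoUpdate(), save") for part in linked_parts]
    # Step 3: component FEMs regenerate their meshes from the updated geometry.
    plan += [(fem, "UpdateFemodel()") for fem in nx_settings["component_fems"]]
    # Step 4: the assembly FEM is updated last.
    plan.append((nx_settings["afm_file"], "UpdateFemodel(), merge coincident nodes"))
    return plan
```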

---

## Multi-Solution Solve Protocol

When the simulation has multiple solutions (static + modal), use the `SolveAllSolutions` API:

### Critical: Foreground Mode Required

```python
# WRONG - Returns immediately, async
theCAESimSolveManager.SolveChainOfSolutions(
    psolutions1,
    SolveMode.Background  # Returns before complete!
)

# CORRECT - Waits for completion
theCAESimSolveManager.SolveAllSolutions(
    SolveOption.Solve,
    SetupCheckOption.CompleteCheckAndOutputErrors,
    SolveMode.Foreground,  # Blocks until complete
    False
)
```

### When to Use

- `solution_name=None` passed to `NXSolver.run_simulation()`
- Multiple solutions that must all complete
- Multi-objective requiring results from different analysis types

### Solution Monitor Control

The solution monitor is automatically disabled when solving multiple solutions, to prevent monitor windows from piling up:

```python
propertyTable.SetBooleanPropertyValue("solution monitor", False)
```

### Verification

After solve, verify:
- Both `.dat` files written (one per solution)
- Both `.op2` files created with updated timestamps
- Results are unique per trial (frequency values vary)
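
The first two checks can be automated by comparing file modification times against the moment the solve started. The sketch assumes solver outputs land in the model directory with `.dat` and `.op2` suffixes, as the study structure shows; the helper name and the `expected_solutions` parameter are hypothetical. The third check (results varying between trials) needs trial history and is not covered here.

```python
from pathlib import Path
from typing import List


def stale_or_missing_outputs(
    model_dir: Path, solve_start: float, expected_solutions: int = 2
) -> List[str]:
    """Report output types with fewer fresh files than there are solutions.

    solve_start is a POSIX timestamp (e.g. time.time()) taken just before the solve.
    """
    problems: List[str] = []
    for suffix in (".dat", ".op2"):
        files = sorted(Path(model_dir).glob(f"*{suffix}"))
        # A file counts as fresh only if it was written after the solve began.
        fresh = [f for f in files if f.stat().st_mtime >= solve_start]
        if len(fresh) < expected_solutions:
            problems.append(
                f"{suffix}: expected {expected_solutions} fresh file(s), found {len(fresh)}"
            )
    return problems
```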

---

## Cross-References

- **Operations Protocol**: [OP_01_CREATE_STUDY](../../docs/protocols/operations/OP_01_CREATE_STUDY.md)
- **Extractors Module**: [extractors-catalog](../modules/extractors-catalog.md)
- **Zernike Module**: [zernike-optimization](../modules/zernike-optimization.md)
- **Neural Module**: [neural-acceleration](../modules/neural-acceleration.md)
- **System Protocols**: [SYS_10_IMSO](../../docs/protocols/system/SYS_10_IMSO.md), [SYS_11_MULTI_OBJECTIVE](../../docs/protocols/system/SYS_11_MULTI_OBJECTIVE.md)