Antoine 602560c46a feat: Add MLP surrogate with Turbo Mode for 100x faster optimization

Neural Acceleration (MLP Surrogate):
- Add run_nn_optimization.py with hybrid FEA/NN workflow
- MLP architecture: 4-layer (64->128->128->64) with BatchNorm/Dropout
- Three workflow modes:
  - --all: Sequential export->train->optimize->validate
  - --hybrid-loop: Iterative Train->NN->Validate->Retrain cycle
  - --turbo: Aggressive single-best validation (RECOMMENDED)
- Turbo mode: 5000 NN trials + 50 FEA validations in ~12 minutes
- Separate nn_study.db to avoid overloading dashboard

Performance Results (bracket_pareto_3obj study):
- NN prediction errors: mass 1-5%, stress 1-4%, stiffness 5-15%
- Found minimum mass designs at boundary (angle~30deg, thick~30mm)
- 100x speedup vs pure FEA exploration

Protocol Operating System:
- Add .claude/skills/ with Bootstrap, Cheatsheet, Context Loader
- Add docs/protocols/ with operations (OP_01-06) and system (SYS_10-14)
- Update SYS_14_NEURAL_ACCELERATION.md with MLP Turbo Mode docs

NX Automation:
- Add optimization_engine/hooks/ for NX CAD/CAE automation
- Add study_wizard.py for guided study creation
- Fix FEM mesh update: load idealized part before UpdateFemodel()

New Study:
- bracket_pareto_3obj: 3-objective Pareto (mass, stress, stiffness)
- 167 FEA trials + 5000 NN trials completed
- Demonstrates full hybrid workflow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-06 20:01:59 -05:00

OP_06: Troubleshoot

Overview

This protocol provides systematic troubleshooting for common optimization issues, covering NX errors, extraction failures, database problems, and performance issues.

When to Use

Trigger	Action
"error", "failed"	Follow this protocol
"not working", "crashed"	Follow this protocol
"help", "stuck"	Follow this protocol
Unexpected behavior	Follow this protocol

Quick Diagnostic

# 1. Check environment
conda activate atomizer
python --version  # Should be 3.9+

# 2. Check study structure
ls studies/my_study/
# Should have: 1_setup/, run_optimization.py

# 3. Check model files
ls studies/my_study/1_setup/model/
# Should have: .prt, .sim files

# 4. Test single trial
python run_optimization.py --test
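
Steps 2 and 3 can also be scripted. The following is a minimal sketch that assumes the layout listed above (`1_setup/`, `run_optimization.py`, and `.prt`/`.sim` files under `1_setup/model/`); the function name `check_study_layout` is hypothetical and not part of the optimization engine.

```python
from pathlib import Path


def check_study_layout(study_dir):
    """Return a list of human-readable problems; an empty list means the layout looks OK."""
    study = Path(study_dir)
    problems = []

    # Step 2: the study folder needs a setup directory and a run script.
    if not (study / "1_setup").is_dir():
        problems.append("missing directory: 1_setup/")
    if not (study / "run_optimization.py").is_file():
        problems.append("missing file: run_optimization.py")

    # Step 3: the model folder needs at least one .prt and one .sim file.
    model_dir = study / "1_setup" / "model"
    for ext in (".prt", ".sim"):
        if not any(model_dir.glob(f"*{ext}")):
            problems.append(f"no {ext} file in 1_setup/model/")

    return problems
```

Calling `check_study_layout("studies/my_study")` and printing the result gives a quick list of what to fix before running the single-trial test.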

Error Categories

1. Environment Errors

"ModuleNotFoundError: No module named 'optuna'"

Cause: Wrong Python environment

Solution:

conda activate atomizer
# Verify
conda list | grep optuna

"Python version mismatch"

Cause: Wrong Python version

Solution:

python --version  # Need 3.9+
conda activate atomizer

2. NX Model Setup Errors

"All optimization trials produce identical results"

Cause: Missing idealized part (*_i.prt) or broken file chain

Symptoms:

- Journal shows "FE model updated" but results don't change
- DAT files have the same node coordinates even though the expressions differ
- OP2 file timestamps update but values are identical

Root Cause: NX simulation files have a parent-child hierarchy:

.sim → .fem → _i.prt → .prt (geometry)

If the _i.prt (idealized part) is missing or not properly linked, UpdateFemodel() runs but the mesh doesn't regenerate because:

- The FEM mesh is tied to the idealized geometry, not the master geometry
- If the idealized part does not update, the FEM has no new geometry to mesh against

Solution:

Check file chain in NX:
- Open .sim file
- Go to Part Navigator or Assembly Navigator
- List ALL referenced parts

Copy ALL linked files to study folder:

# Typical file set needed:
Model.prt           # Geometry
Model_fem1_i.prt    # Idealized part ← OFTEN MISSING!
Model_fem1.fem      # FEM file
Model_sim1.sim      # Simulation file

Verify links are intact:
- Open model in NX after copying
- Check that updates propagate: Geometry → Idealized → FEM → Sim

CRITICAL CODE FIX (already implemented in solve_simulation.py): The idealized part MUST be explicitly loaded before UpdateFemodel():

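import os

# Excerpt: working_dir, theSession, and feModel are defined earlier in the journal.
# Load the idealized part BEFORE updating the FEM
for filename in os.listdir(working_dir):
    if filename.lower().endswith('_i.prt'):
        path = os.path.join(working_dir, filename)
        # Parts.Open returns the opened part and its load status
        idealized_part, status = theSession.Parts.Open(path)
        break

# Now UpdateFemodel() will work correctly
feModel.UpdateFemodel()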

Without loading the _i.prt, NX cannot propagate geometry changes to the mesh.

Prevention: Always use introspection to list all parts referenced by a simulation.
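
Before opening NX, a quick file-system check can catch a missing link in the chain. This is a sketch only: it assumes the naming pattern from the typical file set above (`*.sim`, `*.fem`, `*_i.prt`, `*.prt`), and `check_nx_file_set` is a hypothetical helper. It checks that the files exist, not that the links inside them are intact, so the NX verification step is still needed.

```python
from pathlib import Path


def check_nx_file_set(model_dir):
    """Classify NX files in model_dir and report which links of the chain are absent."""
    found = {"sim": [], "fem": [], "idealized": [], "geometry": []}
    for path in sorted(Path(model_dir).iterdir()):
        name = path.name.lower()
        if name.endswith(".sim"):
            found["sim"].append(path.name)
        elif name.endswith(".fem"):
            found["fem"].append(path.name)
        elif name.endswith("_i.prt"):
            # Idealized part: must be tested before the generic .prt case.
            found["idealized"].append(path.name)
        elif name.endswith(".prt"):
            found["geometry"].append(path.name)
    # Any empty category is a missing link in .sim -> .fem -> _i.prt -> .prt
    missing = [kind for kind, names in found.items() if not names]
    return found, missing
```

A result such as `missing == ["idealized"]` points directly at the failure mode described in this section.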

3. NX/Solver Errors

"NX session timeout after 600s"

Cause: Model too complex or NX stuck

Solution:

Increase timeout in config:
```
"simulation": {
  "timeout": 1200
}
```
Simplify mesh if possible
Check NX license availability
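
The timeout change can be applied without hand-editing the file. The sketch below assumes the config is the JSON file used in the Quick Health Check (`1_setup/optimization_config.json`) and that `simulation.timeout` is expressed in seconds, as the error message suggests; `set_simulation_timeout` is a hypothetical helper name.

```python
import json
from pathlib import Path


def set_simulation_timeout(config_path, seconds):
    """Set simulation.timeout in a JSON config and return the previous value (or None)."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Create the "simulation" section if the config does not have one yet.
    simulation = config.setdefault("simulation", {})
    previous = simulation.get("timeout")
    simulation["timeout"] = seconds
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous
```

Returning the previous value makes it easy to log the change or restore it once the slow model has been diagnosed.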

"Expression 'xxx' not found in model"

Cause: Expression name mismatch

Solution:

Open model in NX
Go to Tools → Expressions
Verify exact expression name (case-sensitive)
Update config to match
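
Because the match is case-sensitive, a small comparison helper can show whether a name is simply miscapitalized or genuinely absent. This is an illustrative sketch: the list of model expressions is assumed to be copied by hand from the NX Expressions dialog, and `diagnose_expression` is a hypothetical name.

```python
import difflib


def diagnose_expression(name, model_expressions):
    """Explain why `name` does not match any expression in the model."""
    if name in model_expressions:
        return f"'{name}' matches exactly"
    # Same spelling with different capitalization is easy to overlook.
    for candidate in model_expressions:
        if candidate.lower() == name.lower():
            return f"'{name}' differs only in case from '{candidate}'"
    # Otherwise suggest the most similar names, if any.
    close = difflib.get_close_matches(name, model_expressions, n=3)
    if close:
        return f"'{name}' not found; closest names: {', '.join(close)}"
    return f"'{name}' not found and no similar name exists"
```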

"NX license error"

Cause: License server unavailable

Solution:

Check license server status
Wait and retry
Contact IT if persistent

"NX solve failed - check log"

Cause: Nastran solver error

Solution:

Find log file: 1_setup/model/*.log or *.f06
Search for "FATAL" or "ERROR"
Common causes:
- Singular stiffness matrix (constraints issue)
- Bad mesh (distorted elements)
- Missing material properties
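
The log search can be automated so that each hit comes with a little context. The following is a sketch under the assumptions stated in the steps above (logs live in the model directory as `*.log` or `*.f06`, and the keywords are "FATAL" and "ERROR"); `find_solver_errors` is a hypothetical helper.

```python
import re
from pathlib import Path

# Case-sensitive on purpose: the steps above search for the upper-case keywords.
_PATTERN = re.compile(r"FATAL|ERROR")


def find_solver_errors(model_dir, context=2):
    """Yield (file name, line number, text block) for each FATAL/ERROR hit."""
    model = Path(model_dir)
    for path in sorted(list(model.glob("*.log")) + list(model.glob("*.f06"))):
        # errors="replace" keeps the scan going if the log has odd bytes.
        lines = path.read_text(errors="replace").splitlines()
        for i, line in enumerate(lines):
            if _PATTERN.search(line):
                # Include a few following lines, which often carry the detail.
                block = "\n".join(lines[i : i + 1 + context])
                yield path.name, i + 1, block
```

Iterating over `find_solver_errors("studies/my_study/1_setup/model")` and printing each tuple gives a compact excerpt to paste into a help request.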

4. Extraction Errors

"OP2 file not found"

Cause: Solve didn't produce output

Solution:

Check if solve completed
Look for .op2 file in model directory
Check NX log for solve errors

"No displacement data for subcase X"

Cause: Wrong subcase number

Solution:

Check available subcases in OP2:

from pyNastran.op2.op2 import OP2
op2 = OP2()
op2.read_op2('model.op2')
print(op2.displacements.keys())

Update subcase in extractor call

"Element type 'xxx' not supported"

Cause: Extractor doesn't support element type

Solution:

Check available types in extractor
Common types: cquad4, ctria3, ctetra, chexa
May need different extractor

5. Database Errors

"Database is locked"

Cause: Another process using database

Solution:

Check for running processes:
```
ps aux | grep run_optimization
```
Kill stale process if needed
Wait for other optimization to finish
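
To confirm that the lock is still held before killing anything, the database can be probed directly. This sketch assumes the study database is a SQLite file (as the `sqlite:///` URLs elsewhere in this protocol indicate); `is_database_locked` is a hypothetical helper. Note that `sqlite3.connect` creates an empty file if the path does not exist, so pass the path of an existing `study.db`.

```python
import sqlite3


def is_database_locked(db_path, wait_seconds=1.0):
    """Return True if another connection currently holds a write lock on the SQLite file."""
    # isolation_level=None disables implicit transactions so BEGIN is explicit.
    conn = sqlite3.connect(db_path, timeout=wait_seconds, isolation_level=None)
    try:
        # BEGIN IMMEDIATE requests the write lock right away without changing data.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("ROLLBACK")
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc).lower():
            return True
        raise  # Some other problem (for example, not a database file).
    finally:
        conn.close()
```

A `True` result after the other optimization has supposedly finished suggests a stale process is still attached.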

"Study 'xxx' not found"

Cause: Wrong study name or path

Solution:

Check exact study name in database:

import optuna

# List the names of all studies stored in the database
summaries = optuna.get_all_study_summaries(storage='sqlite:///study.db')
print([s.study_name for s in summaries])

Use correct name when loading

"IntegrityError: UNIQUE constraint failed"

Cause: Duplicate trial number

Solution:

Don't run multiple optimizations on same study simultaneously
Use --resume flag for continuation

6. Constraint/Feasibility Errors

"All trials pruned"

Cause: No feasible region (the constraints cannot be met anywhere within the current design variable bounds)

Solution:

Check constraint values:

# In objective function, print constraint values
print(f"Stress: {stress}, limit: 250")

Relax constraints
Widen design variable bounds
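
Once the constraint values have been collected, a short summary shows how far the study is from feasibility and therefore how much a constraint would need to be relaxed. This is a sketch: it assumes an upper-limit constraint like the stress limit of 250 printed above, and `feasibility_summary` is a hypothetical helper.

```python
def feasibility_summary(values, limit):
    """Summarize how many recorded constraint values stay at or below `limit`."""
    if not values:
        raise ValueError("no constraint values recorded")
    feasible = [v for v in values if v <= limit]
    best = min(values)
    return {
        "n_trials": len(values),
        "n_feasible": len(feasible),
        "feasible_fraction": len(feasible) / len(values),
        # Amount by which even the best trial exceeds the limit (0.0 if feasible).
        "best_violation": max(0.0, best - limit),
    }
```

If `best_violation` is small, relaxing the limit slightly may be enough; if it is large, widening the design variable bounds is more likely to help.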

"No improvement after N trials"

Cause: Stuck in local minimum or converged

Solution:

Check if truly converged (good result)
Try different starting region
Use different sampler
Increase exploration (raise n_startup_trials, which sets how many initial trials are sampled randomly)
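
A simple stall check on the history of objective values can make "no improvement" concrete. The sketch below assumes a single objective that is being minimized; the window and tolerance defaults are arbitrary illustration values, and `has_stalled` is a hypothetical helper. It only detects a plateau; it cannot tell a true optimum from a local minimum, so the steps above still apply.

```python
def has_stalled(values, window=20, rel_tol=1e-3):
    """Return True if the best value improved by less than rel_tol over the last `window` trials."""
    if len(values) <= window:
        return False  # Not enough history to judge.
    best_before = min(values[:-window])  # Best value before the window.
    best_now = min(values)               # Best value including the window.
    improvement = best_before - best_now
    scale = max(abs(best_before), 1e-12)  # Avoid division by zero.
    return improvement / scale < rel_tol
```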

7. Performance Issues

"Trials running very slowly"

Cause: Complex model or inefficient extraction

Solution:

Profile time per component:

import time
start = time.time()
# ... operation ...
print(f"Took: {time.time() - start:.1f}s")

Simplify mesh if NX is slow
Check extraction isn't re-parsing OP2 multiple times
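
For profiling across many trials, a reusable timer avoids scattering `time.time()` calls. This is a minimal sketch; the class name `StageTimer` and the stage labels are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage across many trials."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raised an exception.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (stage, total seconds, calls) tuples, slowest stage first."""
        return sorted(
            ((n, self.totals[n], self.counts[n]) for n in self.totals),
            key=lambda row: row[1],
            reverse=True,
        )
```

Wrapping each step as `with timer.stage("solve"): ...` and `with timer.stage("extract"): ...` shows which stage dominates; a call count for the extraction stage that is higher than expected hints at the OP2 being parsed more than once.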

"Memory error"

Cause: Large OP2 file or many trials

Solution:

Free large objects between trials (for example, delete the parsed OP2 object and call gc.collect())
Don't store all results in memory
Use database for persistence

Diagnostic Commands

Quick Health Check

# Environment
conda activate atomizer
python -c "import optuna; print('Optuna OK')"
python -c "import pyNastran; print('pyNastran OK')"

# Study structure
ls -la studies/my_study/

# Config validity
python -c "
import json
with open('studies/my_study/1_setup/optimization_config.json') as f:
    config = json.load(f)
print('Config OK')
print(f'Objectives: {len(config.get(\"objectives\", []))}')
"

# Database status
python -c "
import optuna
study = optuna.load_study(study_name='my_study', storage='sqlite:///studies/my_study/2_results/study.db')
print(f'Trials: {len(study.trials)}')
"

NX Log Analysis

# Find latest log
ls -lt studies/my_study/1_setup/model/*.log | head -1

# Search for errors
grep -i "error\|fatal\|fail" studies/my_study/1_setup/model/*.log

Trial Failure Analysis

import optuna

study = optuna.load_study(...)

# Failed trials
failed = [t for t in study.trials
          if t.state == optuna.trial.TrialState.FAIL]
print(f"Failed: {len(failed)}")

for t in failed[:5]:
    print(f"Trial {t.number}: {t.user_attrs}")

# Pruned trials
pruned = [t for t in study.trials
          if t.state == optuna.trial.TrialState.PRUNED]
print(f"Pruned: {len(pruned)}")

Recovery Actions

Reset Study (Start Fresh)

# Backup first
cp -r studies/my_study/2_results studies/my_study/2_results_backup

# Delete results
rm -rf studies/my_study/2_results/*

# Run fresh
python run_optimization.py
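
If a `2_results_backup` directory already exists, `cp -r` nests the new copy inside it instead of replacing it. A timestamped backup avoids that. The sketch below is one possible approach; the `2_results_backup_<stamp>` naming and the function `backup_results` are assumptions, so the restore command must be adjusted to the directory name it returns.

```python
import shutil
import time
from pathlib import Path


def backup_results(study_dir, stamp=None):
    """Copy 2_results/ to 2_results_backup_<stamp>/ and return the new path."""
    study = Path(study_dir)
    source = study / "2_results"
    stamp = stamp or time.strftime("%Y%m%d_%H%M%S")
    target = study / f"2_results_backup_{stamp}"
    # copytree refuses to overwrite an existing directory, so an earlier
    # backup can never be silently merged with a newer one.
    shutil.copytree(source, target)
    return target
```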

Resume Interrupted Study

python run_optimization.py --resume

Restore from Backup

cp -r studies/my_study/2_results_backup/* studies/my_study/2_results/

Getting Help

Information to Provide

When asking for help, include:

Error message (full traceback)
Config file contents
Study structure (ls -la)
What you tried
NX log excerpt (if NX error)

Log Locations

Log	Location
Optimization	Console output or redirect to file
NX Solve	`1_setup/model/*.log`, `*.f06`
Database	`2_results/study.db` (query with optuna)
Intelligence	`2_results/intelligent_optimizer/*.json`

Cross-References

Related: All operation protocols
System: SYS_10_IMSO, SYS_12_EXTRACTOR_LIBRARY

Version History

Version	Date	Changes
1.0	2025-12-05	Initial release
