Atomizer/docs/protocols/operations/OP_06_TROUBLESHOOT.md
Antoine 602560c46a feat: Add MLP surrogate with Turbo Mode for 100x faster optimization
2025-12-06 20:01:59 -05:00


OP_06: Troubleshoot

Overview

This protocol provides systematic troubleshooting for common optimization issues, covering NX errors, extraction failures, database problems, and performance issues.


When to Use

| Trigger | Action |
|---------|--------|
| "error", "failed" | Follow this protocol |
| "not working", "crashed" | Follow this protocol |
| "help", "stuck" | Follow this protocol |
| Unexpected behavior | Follow this protocol |

Quick Diagnostic

# 1. Check environment
conda activate atomizer
python --version  # Should be 3.9+

# 2. Check study structure
ls studies/my_study/
# Should have: 1_setup/, run_optimization.py

# 3. Check model files
ls studies/my_study/1_setup/model/
# Should have: .prt, .sim files

# 4. Test single trial
python run_optimization.py --test
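
Checks 2 and 3 above can also be scripted. The following is a minimal sketch, assuming the layout shown in the comments (`1_setup/`, `run_optimization.py`, and `.prt`/`.sim` files under `1_setup/model/`). The function name `check_study_layout` is hypothetical and not part of Atomizer.

```python
from pathlib import Path


def check_study_layout(study_dir):
    """Return a list of human-readable problems found in a study folder."""
    study = Path(study_dir)
    problems = []

    # Items the quick diagnostic expects at the top level of the study
    if not (study / "1_setup").is_dir():
        problems.append("missing folder: 1_setup/")
    if not (study / "run_optimization.py").is_file():
        problems.append("missing file: run_optimization.py")

    # The model folder should hold at least one .prt and one .sim file
    model_dir = study / "1_setup" / "model"
    for suffix in (".prt", ".sim"):
        if not model_dir.is_dir() or not any(
            p.suffix.lower() == suffix for p in model_dir.iterdir()
        ):
            problems.append(f"no {suffix} file in 1_setup/model/")

    return problems
```

An empty list means the basic layout is in place. Calling `check_study_layout("studies/my_study")` and printing each entry gives a quick summary before running a test trial.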

Error Categories

1. Environment Errors

"ModuleNotFoundError: No module named 'optuna'"

Cause: Wrong Python environment

Solution:

conda activate atomizer
# Verify
conda list | grep optuna

"Python version mismatch"

Cause: Wrong Python version

Solution:

python --version  # Need 3.9+
conda activate atomizer

2. NX Model Setup Errors

"All optimization trials produce identical results"

Cause: Missing idealized part (*_i.prt) or broken file chain

Symptoms:

  • Journal shows "FE model updated" but results don't change
  • DAT files have the same node coordinates even though the expression values differ
  • OP2 file timestamps update but values are identical

Root Cause: NX simulation files have a parent-child hierarchy:

.sim → .fem → _i.prt → .prt (geometry)

If the _i.prt (idealized part) is missing or not properly linked, UpdateFemodel() runs but the mesh doesn't regenerate because:

  • FEM mesh is tied to idealized geometry, not master geometry
  • Without idealized part updating, FEM has nothing new to mesh against

Solution:

  1. Check file chain in NX:

    • Open .sim file
    • Go to Part Navigator or Assembly Navigator
    • List ALL referenced parts
  2. Copy ALL linked files to study folder:

    # Typical file set needed:
    Model.prt           # Geometry
    Model_fem1_i.prt    # Idealized part ← OFTEN MISSING!
    Model_fem1.fem      # FEM file
    Model_sim1.sim      # Simulation file
    
  3. Verify links are intact:

    • Open model in NX after copying
    • Check that updates propagate: Geometry → Idealized → FEM → Sim
  4. CRITICAL CODE FIX (already implemented in solve_simulation.py): The idealized part MUST be explicitly loaded before UpdateFemodel():

    # Load idealized part BEFORE updating FEM
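    for filename in os.listdir(working_dir):
        if filename.lower().endswith('_i.prt'):
            # Parts.Open needs the full path to the idealized part
            path = os.path.join(working_dir, filename)
            idealized_part, status = theSession.Parts.Open(path)
            break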
    
    # Now UpdateFemodel() will work correctly
    feModel.UpdateFemodel()
    

    Without loading the _i.prt, NX cannot propagate geometry changes to the mesh.

Prevention: Always use introspection to list all parts referenced by a simulation.
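
A first-pass check of the file set can be done outside NX. The sketch below only tests for file presence using the naming convention from the typical file set above (`*_i.prt`, `*.fem`, `*.sim`, plus a geometry `.prt`). It does not verify the NX-internal links, and `find_missing_chain_files` is a hypothetical helper name.

```python
from pathlib import Path


def find_missing_chain_files(model_dir):
    """Report which links of the .sim -> .fem -> _i.prt -> .prt chain are absent."""
    names = [p.name.lower() for p in Path(model_dir).iterdir() if p.is_file()]

    has = {
        "simulation (.sim)": any(n.endswith(".sim") for n in names),
        "FEM (.fem)": any(n.endswith(".fem") for n in names),
        "idealized part (_i.prt)": any(n.endswith("_i.prt") for n in names),
        # A geometry part is any .prt that is not an idealized part
        "geometry (.prt)": any(
            n.endswith(".prt") and not n.endswith("_i.prt") for n in names
        ),
    }
    return [label for label, present in has.items() if not present]
```

If the result contains "idealized part (_i.prt)", the identical-results symptom described in this section is the likely outcome.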


3. NX/Solver Errors

"NX session timeout after 600s"

Cause: Model too complex or NX stuck

Solution:

  1. Increase timeout in config:
    "simulation": {
      "timeout": 1200
    }
    
  2. Simplify mesh if possible
  3. Check NX license availability

"Expression 'xxx' not found in model"

Cause: Expression name mismatch

Solution:

  1. Open model in NX
  2. Go to Tools → Expressions
  3. Verify exact expression name (case-sensitive)
  4. Update config to match
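
Once the expression names have been copied out of NX, the comparison against the config can be automated. This is a sketch under the assumption that `available` is a plain list of names taken from Tools → Expressions. The helper name and the example names in the usage note are hypothetical.

```python
import difflib


def suggest_expression_names(wanted, available):
    """Return likely intended names for `wanted` from the model's expressions."""
    # An exact match means the name is fine and nothing needs fixing
    if wanted in available:
        return [wanted]

    # Case-only mismatches are checked first because names are case-sensitive
    case_matches = [name for name in available if name.lower() == wanted.lower()]
    if case_matches:
        return case_matches

    # Otherwise fall back to fuzzy matching to catch typos
    return difflib.get_close_matches(wanted, available, n=3, cutoff=0.6)
```

For example, with `available = ["support_angle", "Tip_Thickness"]`, asking for `"tip_thickness"` returns `["Tip_Thickness"]`, which points at a case mismatch in the config.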

"NX license error"

Cause: License server unavailable

Solution:

  1. Check license server status
  2. Wait and retry
  3. Contact IT if persistent

"NX solve failed - check log"

Cause: Nastran solver error

Solution:

  1. Find log file: 1_setup/model/*.log or *.f06
  2. Search for "FATAL" or "ERROR"
  3. Common causes:
    • Singular stiffness matrix (constraints issue)
    • Bad mesh (distorted elements)
    • Missing material properties
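
Steps 1 and 2 can be combined into one scan. The sketch below assumes the logs are plain text files with `.log` or `.f06` extensions in the model folder. `find_solver_errors` is a hypothetical helper, and the `context` parameter is an illustrative addition that keeps a few lines after each hit.

```python
import re
from pathlib import Path

# Same keywords the manual search looks for
_PATTERN = re.compile(r"\b(FATAL|ERROR)\b")


def find_solver_errors(model_dir, context=2):
    """Collect FATAL/ERROR lines (plus trailing context) from .log and .f06 files."""
    hits = []
    for path in sorted(Path(model_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in (".log", ".f06"):
            continue
        lines = path.read_text(errors="replace").splitlines()
        for i, line in enumerate(lines):
            if _PATTERN.search(line):
                # Keep a few following lines, since messages often span several
                snippet = "\n".join(lines[i : i + 1 + context])
                hits.append((path.name, i + 1, snippet))
    return hits
```

Each hit is a `(file name, line number, text)` tuple, so the first entry usually shows where the solve started to fail.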

4. Extraction Errors

"OP2 file not found"

Cause: Solve didn't produce output

Solution:

  1. Check if solve completed
  2. Look for .op2 file in model directory
  3. Check NX log for solve errors

"No displacement data for subcase X"

Cause: Wrong subcase number

Solution:

  1. Check available subcases in OP2:
    from pyNastran.op2.op2 import OP2
    op2 = OP2()
    op2.read_op2('model.op2')
    print(op2.displacements.keys())
    
  2. Update subcase in extractor call

"Element type 'xxx' not supported"

Cause: Extractor doesn't support element type

Solution:

  1. Check available types in extractor
  2. Common types: cquad4, ctria3, ctetra, chexa
  3. May need different extractor

5. Database Errors

"Database is locked"

Cause: Another process using database

Solution:

  1. Check for running processes:
    ps aux | grep run_optimization
    
  2. Kill stale process if needed
  3. Wait for other optimization to finish
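
To confirm that the database is actually locked before killing anything, the lock can be probed directly. This is a minimal sketch assuming the study uses a SQLite file. `is_database_locked` is a hypothetical helper. Note that `sqlite3.connect` creates the file if it does not exist, so pass an existing path.

```python
import sqlite3


def is_database_locked(db_path, timeout=1.0):
    """Return True if another connection currently holds a write lock."""
    # isolation_level=None lets us issue BEGIN/ROLLBACK explicitly
    conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)
    try:
        # BEGIN IMMEDIATE asks for the write lock without changing any data
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("ROLLBACK")
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc).lower():
            return True
        raise
    finally:
        conn.close()
```

A `True` result means some other process is in the middle of a write. A `False` result right after a "Database is locked" error suggests the lock was transient and a retry may succeed.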

"Study 'xxx' not found"

Cause: Wrong study name or path

Solution:

  1. Check exact study name in database:
    import optuna
    summaries = optuna.get_all_study_summaries(storage='sqlite:///study.db')
    print([s.study_name for s in summaries])
    
  2. Use correct name when loading

"IntegrityError: UNIQUE constraint failed"

Cause: Duplicate trial number

Solution:

  1. Don't run multiple optimizations on same study simultaneously
  2. Use --resume flag for continuation
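
One way to enforce rule 1 is a lock file that a run creates on start and removes on exit. The sketch below is an illustration only, not something Atomizer is known to do, and `single_run_lock` is a hypothetical name. A crash can leave a stale lock behind, which then has to be deleted by hand.

```python
import os
from contextlib import contextmanager


@contextmanager
def single_run_lock(lock_path):
    """Hold an exclusive lock file for the duration of an optimization run."""
    try:
        # O_EXCL makes creation fail atomically if the file already exists
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(
            f"another run appears active: {lock_path} exists"
        ) from None
    try:
        # Record the owning process ID to help identify stale locks
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield
    finally:
        # Remove the lock on normal exit and on exceptions alike
        os.remove(lock_path)
```

Wrapping the optimization call in `with single_run_lock("studies/my_study/2_results/run.lock"):` (path chosen for illustration) makes a second launch fail immediately instead of colliding in the database.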

6. Constraint/Feasibility Errors

"All trials pruned"

Cause: No feasible region

Solution:

  1. Check constraint values:
    # In objective function, print constraint values
    print(f"Stress: {stress}, limit: 250")
    
  2. Relax constraints
  3. Widen design variable bounds
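
Printing values trial by trial gets noisy, so it can help to summarize how often each limit is exceeded. The following sketch assumes the constraint values have been collected as plain dictionaries and that every limit is an upper bound. The function name and the numbers in the usage note are hypothetical.

```python
def summarize_violations(trials, limits):
    """Count how often each upper limit is exceeded across trials.

    trials: list of dicts mapping a response name to its value
    limits: dict mapping a response name to its upper limit
    """
    summary = {}
    for name, limit in limits.items():
        values = [t[name] for t in trials if name in t]
        violated = [v for v in values if v > limit]
        summary[name] = {
            "violated": len(violated),
            "total": len(values),
            # Smallest observed value shows how far the limit would have to move
            "closest": min(values) if values else None,
        }
    return summary
```

For instance, if every trial reports a stress above the 250 limit and the smallest is 275.5, then "closest" shows that the limit would need to rise to about 276, or the bounds would need to widen, before any trial becomes feasible.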

"No improvement after N trials"

Cause: Stuck in local minimum or converged

Solution:

  1. Check if truly converged (good result)
  2. Try different starting region
  3. Use different sampler
  4. Increase exploration (raise n_startup_trials so more trials are sampled randomly before the sampler starts exploiting)
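
Before changing the sampler, it is worth measuring how long the study has actually been stalled. This is a small sketch for a minimized single objective. `trials_since_improvement` and its `tol` parameter are hypothetical names.

```python
def trials_since_improvement(values, tol=1e-9):
    """Return how many trailing trials failed to beat the best value (minimization)."""
    best = float("inf")
    since = 0
    for value in values:
        if value < best - tol:
            # A new best resets the stall counter
            best = value
            since = 0
        else:
            since += 1
    return since
```

For a single-objective Optuna study, the input could be `[t.value for t in study.trials if t.value is not None]`. Multi-objective studies would need one list per objective taken from `t.values` instead.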

7. Performance Issues

"Trials running very slowly"

Cause: Complex model or inefficient extraction

Solution:

  1. Profile time per component:
    import time
    start = time.time()
    # ... operation ...
    print(f"Took: {time.time() - start:.1f}s")
    
  2. Simplify mesh if NX is slow
  3. Check extraction isn't re-parsing OP2 multiple times
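
The manual timing pattern in step 1 becomes easier to reuse as a context manager that accumulates totals per stage. The sketch below is one possible shape, and the names `timed`, `stage_totals`, and `report` are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage
stage_totals = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the elapsed time of the enclosed block to stage_totals[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start


def report():
    """Return stage names ordered from slowest to fastest."""
    return sorted(stage_totals, key=stage_totals.get, reverse=True)
```

Wrapping the solve and extraction calls in `with timed("solve"):` and `with timed("extract"):` and then calling `report()` after a few trials shows which stage dominates. If "extract" is unexpectedly high, repeated OP2 parsing is a likely cause.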

"Memory error"

Cause: Large OP2 file or many trials

Solution:

  1. Release large objects between trials (for example, del the parsed OP2 object and call gc.collect())
  2. Don't store all results in memory
  3. Use database for persistence

Diagnostic Commands

Quick Health Check

# Environment
conda activate atomizer
python -c "import optuna; print('Optuna OK')"
python -c "import pyNastran; print('pyNastran OK')"

# Study structure
ls -la studies/my_study/

# Config validity
python -c "
import json
with open('studies/my_study/1_setup/optimization_config.json') as f:
    config = json.load(f)
print('Config OK')
print(f'Objectives: {len(config.get(\"objectives\", []))}')
"

# Database status
python -c "
import optuna
study = optuna.load_study(study_name='my_study', storage='sqlite:///studies/my_study/2_results/study.db')
print(f'Trials: {len(study.trials)}')
"

NX Log Analysis

# Find latest log
ls -lt studies/my_study/1_setup/model/*.log | head -1

# Search for errors
grep -i "error\|fatal\|fail" studies/my_study/1_setup/model/*.log

Trial Failure Analysis

import optuna

study = optuna.load_study(...)

# Failed trials
failed = [t for t in study.trials
          if t.state == optuna.trial.TrialState.FAIL]
print(f"Failed: {len(failed)}")

for t in failed[:5]:
    print(f"Trial {t.number}: {t.user_attrs}")

# Pruned trials
pruned = [t for t in study.trials
          if t.state == optuna.trial.TrialState.PRUNED]
print(f"Pruned: {len(pruned)}")

Recovery Actions

Reset Study (Start Fresh)

# Backup first
cp -r studies/my_study/2_results studies/my_study/2_results_backup

# Delete results
rm -rf studies/my_study/2_results/*

# Run fresh
python run_optimization.py
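
If `2_results_backup` already exists, `cp -r` places the new copy inside it rather than replacing it. A timestamped backup avoids that. The sketch below assumes the `2_results/` layout used above, and `backup_results` is a hypothetical helper name.

```python
import shutil
import time
from pathlib import Path


def backup_results(study_dir):
    """Copy 2_results/ to a timestamped sibling folder and return its path."""
    study = Path(study_dir)
    source = study / "2_results"
    stamp = time.strftime("%Y%m%d_%H%M%S")
    target = study / f"2_results_backup_{stamp}"
    # copytree refuses to overwrite, so an existing backup is never clobbered
    shutil.copytree(source, target)
    return target
```

Calling `backup_results("studies/my_study")` before deleting results leaves each backup in its own folder, which also makes it clear which one to restore from.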

Resume Interrupted Study

python run_optimization.py --resume

Restore from Backup

cp -r studies/my_study/2_results_backup/* studies/my_study/2_results/

Getting Help

Information to Provide

When asking for help, include:

  1. Error message (full traceback)
  2. Config file contents
  3. Study structure (ls -la)
  4. What you tried
  5. NX log excerpt (if NX error)

Log Locations

| Log | Location |
|-----|----------|
| Optimization | Console output or redirect to file |
| NX Solve | 1_setup/model/*.log, *.f06 |
| Database | 2_results/study.db (query with optuna) |
| Intelligence | 2_results/intelligent_optimizer/*.json |

Cross-References


Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-05 | Initial release |