feat: Add MLP surrogate with Turbo Mode for 100x faster optimization

Neural Acceleration (MLP Surrogate):
- Add run_nn_optimization.py with hybrid FEA/NN workflow
- MLP architecture: 4-layer (64->128->128->64) with BatchNorm/Dropout
- Three workflow modes:
  - --all: Sequential export->train->optimize->validate
  - --hybrid-loop: Iterative Train->NN->Validate->Retrain cycle
  - --turbo: Aggressive single-best validation (RECOMMENDED)
- Turbo mode: 5000 NN trials + 50 FEA validations in ~12 minutes
- Separate nn_study.db to avoid overloading dashboard

Performance Results (bracket_pareto_3obj study):
- NN prediction errors: mass 1-5%, stress 1-4%, stiffness 5-15%
- Found minimum mass designs at boundary (angle~30deg, thick~30mm)
- 100x speedup vs pure FEA exploration

Protocol Operating System:
- Add .claude/skills/ with Bootstrap, Cheatsheet, Context Loader
- Add docs/protocols/ with operations (OP_01-06) and system (SYS_10-14)
- Update SYS_14_NEURAL_ACCELERATION.md with MLP Turbo Mode docs

NX Automation:
- Add optimization_engine/hooks/ for NX CAD/CAE automation
- Add study_wizard.py for guided study creation
- Fix FEM mesh update: load idealized part before UpdateFemodel()

New Study:
- bracket_pareto_3obj: 3-objective Pareto (mass, stress, stiffness)
- 167 FEA trials + 5000 NN trials completed
- Demonstrates full hybrid workflow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 20:01:59 -05:00
parent 0cb2808c44
commit 602560c46a
70 changed files with 31018 additions and 289 deletions
--- a/docs/protocols/operations/OP_06_TROUBLESHOOT.md
+++ b/docs/protocols/operations/OP_06_TROUBLESHOOT.md
@@ -0,0 +1,437 @@
+# OP_06: Troubleshoot
+
+<!--
+PROTOCOL: Troubleshoot Optimization Issues
+LAYER: Operations
+VERSION: 1.0
+STATUS: Active
+LAST_UPDATED: 2025-12-05
+PRIVILEGE: user
+LOAD_WITH: []
+-->
+
+## Overview
+
+This protocol provides systematic troubleshooting for common optimization issues: environment and NX model setup errors, NX/solver and extraction failures, database problems, infeasible constraints, and slow or memory-hungry runs.
+
+---
+
+## When to Use
+
+| Trigger | Action |
+|---------|--------|
+| "error", "failed" | Follow this protocol |
+| "not working", "crashed" | Follow this protocol |
+| "help", "stuck" | Follow this protocol |
+| Unexpected behavior | Follow this protocol |
+
+---
+
+## Quick Diagnostic
+
+```bash
+# 1. Check environment
+conda activate atomizer
+python --version  # Should be 3.9+
+
+# 2. Check study structure
+ls studies/my_study/
+# Should have: 1_setup/, run_optimization.py
+
+# 3. Check model files
+ls studies/my_study/1_setup/model/
+# Should have: .prt, .sim files
+
+# 4. Test single trial (run from inside studies/my_study/)
+python run_optimization.py --test
+```
+
+---
+
+## Error Categories
+
+### 1. Environment Errors
+
+#### "ModuleNotFoundError: No module named 'optuna'"
+
+**Cause**: Wrong Python environment
+
+**Solution**:
+```bash
+conda activate atomizer
+# Verify
+conda list | grep optuna
+```
+
+#### "Python version mismatch"
+
+**Cause**: Wrong Python version
+
+**Solution**:
+```bash
+python --version  # Need 3.9+
+conda activate atomizer
+```
+
+---
+
+### 2. NX Model Setup Errors
+
+#### "All optimization trials produce identical results"
+
+**Cause**: Missing idealized part (`*_i.prt`) or broken file chain
+
+**Symptoms**:
+- Journal shows "FE model updated" but results don't change
+- DAT files have same node coordinates with different expressions
+- OP2 file timestamps update but values are identical
+
+**Root Cause**: NX simulation files have a parent-child hierarchy:
+```
+.sim → .fem → _i.prt → .prt (geometry)
+```
+
+If the `_i.prt` (idealized part) is missing or not properly linked, `UpdateFemodel()` runs but the mesh doesn't regenerate because:
+- FEM mesh is tied to idealized geometry, not master geometry
+- Without idealized part updating, FEM has nothing new to mesh against
+
+**Solution**:
+1. **Check file chain in NX**:
+   - Open `.sim` file
+   - Go to **Part Navigator** or **Assembly Navigator**
+   - List ALL referenced parts
+
+2. **Copy ALL linked files** to study folder:
+   ```bash
+   # Typical file set needed:
+   Model.prt           # Geometry
+   Model_fem1_i.prt    # Idealized part ← OFTEN MISSING!
+   Model_fem1.fem      # FEM file
+   Model_sim1.sim      # Simulation file
+   ```
+
+3. **Verify links are intact**:
+   - Open model in NX after copying
+   - Check that updates propagate: Geometry → Idealized → FEM → Sim
+
+4. **CRITICAL CODE FIX** (already implemented in `solve_simulation.py`):
+   The idealized part MUST be explicitly loaded before `UpdateFemodel()`:
+   ```python
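+   import os
+
+   # Load the idealized part BEFORE updating the FEM.
+   # theSession, working_dir, and feModel come from the surrounding journal.
+   for filename in os.listdir(working_dir):
+       if '_i.prt' in filename.lower():
+           path = os.path.join(working_dir, filename)
+           idealized_part, status = theSession.Parts.Open(path)
+           break
+
+   # With the idealized part loaded, UpdateFemodel() can regenerate the mesh
+   feModel.UpdateFemodel()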
+   ```
+   Without loading the `_i.prt`, NX cannot propagate geometry changes to the mesh.
+
+**Prevention**: Always use introspection to list all parts referenced by a simulation.
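+
+As a pre-flight check, the file set listed above can be verified on disk before a run. The following is a minimal sketch and not part of Atomizer: `find_missing_model_files` is a hypothetical helper, and the suffix patterns assume the `Model_fem1_i.prt` / `.fem` / `.sim` naming shown in the typical file set.
+
+```python
+from pathlib import Path
+
+# Suffixes taken from the typical file set above (assumed naming convention).
+REQUIRED_SUFFIXES = ("_i.prt", ".fem", ".sim")
+
+
+def find_missing_model_files(model_dir):
+    """Return the kinds of NX files that are absent from model_dir."""
+    names = [p.name.lower() for p in Path(model_dir).iterdir() if p.is_file()]
+    missing = [s for s in REQUIRED_SUFFIXES
+               if not any(n.endswith(s) for n in names)]
+    # The master geometry is any .prt that is not an idealized part.
+    if not any(n.endswith(".prt") and not n.endswith("_i.prt") for n in names):
+        missing.append(".prt (geometry)")
+    return missing
+```
+
+An empty list only means the files are present; it does not prove that the links inside NX are intact, so step 3 above still applies.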
+
+---
+
+### 3. NX/Solver Errors
+
+#### "NX session timeout after 600s"
+
+**Cause**: Model too complex or NX stuck
+
+**Solution**:
+1. Increase timeout in config:
+   ```json
+   "simulation": {
+     "timeout": 1200
+   }
+   ```
+2. Simplify mesh if possible
+3. Check NX license availability
+
+#### "Expression 'xxx' not found in model"
+
+**Cause**: Expression name mismatch
+
+**Solution**:
+1. Open model in NX
+2. Go to Tools → Expressions
+3. Verify exact expression name (case-sensitive)
+4. Update config to match
+
+#### "NX license error"
+
+**Cause**: License server unavailable
+
+**Solution**:
+1. Check license server status
+2. Wait and retry
+3. Contact IT if persistent
+
+#### "NX solve failed - check log"
+
+**Cause**: Nastran solver error
+
+**Solution**:
+1. Find log file: `1_setup/model/*.log` or `*.f06`
+2. Search for "FATAL" or "ERROR"
+3. Common causes:
+   - Singular stiffness matrix (constraints issue)
+   - Bad mesh (distorted elements)
+   - Missing material properties
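+
+Steps 1 and 2 can be scripted when there are many log files to inspect. This is a minimal sketch under stated assumptions: `find_solver_errors` is a hypothetical helper, and it only does a case-insensitive text search for "FATAL" or "ERROR"; it does not interpret Nastran messages.
+
+```python
+import re
+
+# Case-insensitive match for the two keywords named in step 2.
+PATTERN = re.compile(r"fatal|error", re.IGNORECASE)
+
+
+def find_solver_errors(log_path):
+    """Return (line_number, text) pairs for lines mentioning FATAL or ERROR."""
+    hits = []
+    # errors="replace" keeps the scan going if the log has odd bytes.
+    with open(log_path, errors="replace") as f:
+        for number, line in enumerate(f, start=1):
+            if PATTERN.search(line):
+                hits.append((number, line.strip()))
+    return hits
+```
+
+The line numbers make it easy to open the `.f06` at the first hit and read the surrounding message text.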
+
+---
+
+### 4. Extraction Errors
+
+#### "OP2 file not found"
+
+**Cause**: Solve didn't produce output
+
+**Solution**:
+1. Check if solve completed
+2. Look for `.op2` file in model directory
+3. Check NX log for solve errors
+
+#### "No displacement data for subcase X"
+
+**Cause**: Wrong subcase number
+
+**Solution**:
+1. Check available subcases in OP2:
+   ```python
+   from pyNastran.op2.op2 import OP2
+   op2 = OP2()
+   op2.read_op2('model.op2')
+   print(op2.displacements.keys())
+   ```
+2. Update subcase in extractor call
+
+#### "Element type 'xxx' not supported"
+
+**Cause**: Extractor doesn't support element type
+
+**Solution**:
+1. Check available types in extractor
+2. Common types: `cquad4`, `ctria3`, `ctetra`, `chexa`
+3. May need different extractor
+
+---
+
+### 5. Database Errors
+
+#### "Database is locked"
+
+**Cause**: Another process using database
+
+**Solution**:
+1. Check for running processes:
+   ```bash
+   ps aux | grep run_optimization
+   ```
+2. Kill stale process if needed
+3. Wait for other optimization to finish
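+
+To confirm that the lock is real before killing anything, the SQLite file can be probed directly. This is a minimal sketch using only the standard-library `sqlite3` module; `is_database_locked` is a hypothetical helper, and it assumes the study uses a local SQLite file such as `2_results/study.db`.
+
+```python
+import os
+import sqlite3
+
+
+def is_database_locked(db_path, timeout=0.5):
+    """Return True if another connection currently holds a write lock."""
+    if not os.path.exists(db_path):
+        # sqlite3.connect would silently create an empty file otherwise.
+        raise FileNotFoundError(db_path)
+    # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves.
+    conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)
+    try:
+        conn.execute("BEGIN IMMEDIATE")  # requests the write lock
+        conn.execute("ROLLBACK")         # release it again, nothing written
+        return False
+    except sqlite3.OperationalError as exc:
+        if "locked" in str(exc).lower():
+            return True
+        raise
+    finally:
+        conn.close()
+```
+
+The probe writes nothing: it only requests the write lock and rolls back immediately.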
+
+#### "Study 'xxx' not found"
+
+**Cause**: Wrong study name or path
+
+**Solution**:
+1. Check exact study name in database:
+   ```python
+   import optuna
+   # List the study names stored in the database
+   summaries = optuna.get_all_study_summaries(storage='sqlite:///study.db')
+   print([s.study_name for s in summaries])
+   ```
+2. Use correct name when loading
+
+#### "IntegrityError: UNIQUE constraint failed"
+
+**Cause**: Duplicate trial number
+
+**Solution**:
+1. Don't run multiple optimizations on same study simultaneously
+2. Use `--resume` flag for continuation
+
+---
+
+### 6. Constraint/Feasibility Errors
+
+#### "All trials pruned"
+
+**Cause**: No feasible region
+
+**Solution**:
+1. Check constraint values:
+   ```python
+   # In objective function, print constraint values
+   print(f"Stress: {stress}, limit: 250")
+   ```
+2. Relax constraints
+3. Widen design variable bounds
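+
+Once constraint values have been collected from a batch of trials, a short summary shows how far the design space is from feasibility. This is a minimal sketch: `summarize_feasibility` is a hypothetical helper, and it assumes an upper-bound constraint of the form `value <= limit`, like the stress limit of 250 in the example above.
+
+```python
+def summarize_feasibility(values, limit):
+    """Summarize how many recorded constraint values satisfy value <= limit."""
+    if not values:
+        raise ValueError("no constraint values recorded")
+    violations = [v - limit for v in values if v > limit]
+    n_feasible = len(values) - len(violations)
+    return {
+        "n_trials": len(values),
+        "n_feasible": n_feasible,
+        "feasible_fraction": n_feasible / len(values),
+        # Smallest amount by which any infeasible trial exceeded the limit.
+        "smallest_violation": min(violations) if violations else 0.0,
+    }
+```
+
+A small `smallest_violation` relative to the limit suggests that a modest relaxation may open a feasible region. A large one suggests the bounds or the design concept need to change.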
+
+#### "No improvement after N trials"
+
+**Cause**: Stuck in local minimum or converged
+
+**Solution**:
+1. Check if truly converged (good result)
+2. Try different starting region
+3. Use different sampler
+4. Increase exploration (raise the sampler's `n_startup_trials` so that more trials are sampled randomly)
+
+---
+
+### 7. Performance Issues
+
+#### "Trials running very slowly"
+
+**Cause**: Complex model or inefficient extraction
+
+**Solution**:
+1. Profile time per component:
+   ```python
+   import time
+   start = time.time()
+   # ... operation ...
+   print(f"Took: {time.time() - start:.1f}s")
+   ```
+2. Simplify mesh if NX is slow
+3. Check extraction isn't re-parsing OP2 multiple times
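+
+To compare components rather than time one operation at a time, the timing snippet from step 1 can be wrapped in a context manager. This is a minimal sketch: `timed` and `timings` are hypothetical names, and the `time.sleep` calls are stand-ins for the real solve and extraction steps.
+
+```python
+import time
+from contextlib import contextmanager
+
+timings = {}  # label -> accumulated wall-clock seconds
+
+
+@contextmanager
+def timed(label):
+    """Accumulate the time spent inside the block under `label`."""
+    start = time.perf_counter()
+    try:
+        yield
+    finally:
+        elapsed = time.perf_counter() - start
+        timings[label] = timings.get(label, 0.0) + elapsed
+
+
+with timed("solve"):
+    time.sleep(0.01)  # stand-in for the NX solve
+with timed("extract"):
+    time.sleep(0.01)  # stand-in for OP2 extraction
+
+for label, seconds in timings.items():
+    print(f"{label}: {seconds:.3f}s")
+```
+
+Because times accumulate per label, calling the same extractor twice in one trial shows up as a larger "extract" total, which helps with step 3.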
+
+#### "Memory error"
+
+**Cause**: Large OP2 file or many trials
+
+**Solution**:
+1. Release large objects between trials (for example, `del` the parsed OP2 object and call `gc.collect()`)
+2. Don't store all results in memory
+3. Use database for persistence
+
+---
+
+## Diagnostic Commands
+
+### Quick Health Check
+
+```bash
+# Environment
+conda activate atomizer
+python -c "import optuna; print('Optuna OK')"
+python -c "import pyNastran; print('pyNastran OK')"
+
+# Study structure
+ls -la studies/my_study/
+
+# Config validity
+python -c "
+import json
+with open('studies/my_study/1_setup/optimization_config.json') as f:
+    config = json.load(f)
+print('Config OK')
+print(f'Objectives: {len(config.get(\"objectives\", []))}')
+"
+
+# Database status
+python -c "
+import optuna
+study = optuna.load_study(study_name='my_study', storage='sqlite:///studies/my_study/2_results/study.db')
+print(f'Trials: {len(study.trials)}')
+"
+```
+
+### NX Log Analysis
+
+```bash
+# Find latest log
+ls -lt studies/my_study/1_setup/model/*.log | head -1
+
+# Search for errors
+grep -i "error\|fatal\|fail" studies/my_study/1_setup/model/*.log
+```
+
+### Trial Failure Analysis
+
+```python
+import optuna
+
+study = optuna.load_study(...)
+
+# Failed trials
+failed = [t for t in study.trials
+          if t.state == optuna.trial.TrialState.FAIL]
+print(f"Failed: {len(failed)}")
+
+for t in failed[:5]:
+    print(f"Trial {t.number}: {t.user_attrs}")
+
+# Pruned trials
+pruned = [t for t in study.trials
+          if t.state == optuna.trial.TrialState.PRUNED]
+print(f"Pruned: {len(pruned)}")
+```
+
+---
+
+## Recovery Actions
+
+### Reset Study (Start Fresh)
+
+```bash
+# Backup first
+cp -r studies/my_study/2_results studies/my_study/2_results_backup
+
+# Delete results
+rm -rf studies/my_study/2_results/*
+
+# Run fresh
+python run_optimization.py
+```
+
+### Resume Interrupted Study
+
+```bash
+python run_optimization.py --resume
+```
+
+### Restore from Backup
+
+```bash
+cp -r studies/my_study/2_results_backup/* studies/my_study/2_results/
+```
+
+---
+
+## Getting Help
+
+### Information to Provide
+
+When asking for help, include:
+1. Error message (full traceback)
+2. Config file contents
+3. Study structure (`ls -la`)
+4. What you tried
+5. NX log excerpt (if NX error)
+
+### Log Locations
+
+| Log | Location |
+|-----|----------|
+| Optimization | Console output or redirect to file |
+| NX Solve | `1_setup/model/*.log`, `*.f06` |
+| Database | `2_results/study.db` (query with optuna) |
+| Intelligence | `2_results/intelligent_optimizer/*.json` |
+
+---
+
+## Cross-References
+
+- **Related**: All operation protocols
+- **System**: [SYS_10_IMSO](../system/SYS_10_IMSO.md), [SYS_12_EXTRACTOR_LIBRARY](../system/SYS_12_EXTRACTOR_LIBRARY.md)
+
+---
+
+## Version History
+
+| Version | Date | Changes |
+|---------|------|---------|
+| 1.0 | 2025-12-05 | Initial release |