feat: Add MLP surrogate with Turbo Mode for 100x faster optimization

Neural Acceleration (MLP Surrogate):
- Add run_nn_optimization.py with hybrid FEA/NN workflow
- MLP architecture: 4-layer (64->128->128->64) with BatchNorm/Dropout
- Three workflow modes:
  - --all: Sequential export->train->optimize->validate
  - --hybrid-loop: Iterative Train->NN->Validate->Retrain cycle
  - --turbo: Aggressive single-best validation (RECOMMENDED)
- Turbo mode: 5000 NN trials + 50 FEA validations in ~12 minutes
- Separate nn_study.db to avoid overloading dashboard

Performance Results (bracket_pareto_3obj study):
- NN prediction errors: mass 1-5%, stress 1-4%, stiffness 5-15%
- Found minimum mass designs at boundary (angle~30deg, thick~30mm)
- 100x speedup vs pure FEA exploration

Protocol Operating System:
- Add .claude/skills/ with Bootstrap, Cheatsheet, Context Loader
- Add docs/protocols/ with operations (OP_01-06) and system (SYS_10-14)
- Update SYS_14_NEURAL_ACCELERATION.md with MLP Turbo Mode docs

NX Automation:
- Add optimization_engine/hooks/ for NX CAD/CAE automation
- Add study_wizard.py for guided study creation
- Fix FEM mesh update: load idealized part before UpdateFemodel()

New Study:
- bracket_pareto_3obj: 3-objective Pareto (mass, stress, stiffness)
- 167 FEA trials + 5000 NN trials completed
- Demonstrates full hybrid workflow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
437
docs/protocols/operations/OP_06_TROUBLESHOOT.md
Normal file
@@ -0,0 +1,437 @@
# OP_06: Troubleshoot

<!--
PROTOCOL: Troubleshoot Optimization Issues
LAYER: Operations
VERSION: 1.0
STATUS: Active
LAST_UPDATED: 2025-12-05
PRIVILEGE: user
LOAD_WITH: []
-->

## Overview

This protocol provides systematic troubleshooting for common optimization issues, covering NX errors, extraction failures, database problems, and performance issues.

---

## When to Use

| Trigger | Action |
|---------|--------|
| "error", "failed" | Follow this protocol |
| "not working", "crashed" | Follow this protocol |
| "help", "stuck" | Follow this protocol |
| Unexpected behavior | Follow this protocol |

---

## Quick Diagnostic

```bash
# 1. Check environment
conda activate atomizer
python --version  # Should be 3.9+

# 2. Check study structure
ls studies/my_study/
# Should have: 1_setup/, run_optimization.py

# 3. Check model files
ls studies/my_study/1_setup/model/
# Should have: .prt, .sim files

# 4. Test single trial
python run_optimization.py --test
```
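
Checks 2 and 3 can also be scripted. The following is a minimal sketch, assuming the layout described in the comments above (`1_setup/`, `run_optimization.py`, and `.prt`/`.sim` files under `1_setup/model/`). The function name `check_study` is hypothetical and not part of the toolkit.

```python
from pathlib import Path


def check_study(study_dir):
    """Return a list of human-readable problems found in a study folder."""
    study = Path(study_dir)
    problems = []

    # Items the quick diagnostic expects directly inside the study folder.
    if not (study / "1_setup").is_dir():
        problems.append("missing folder: 1_setup/")
    if not (study / "run_optimization.py").is_file():
        problems.append("missing file: run_optimization.py")

    # The model folder should contain at least one .prt and one .sim file.
    model_dir = study / "1_setup" / "model"
    files = list(model_dir.iterdir()) if model_dir.is_dir() else []
    for suffix in (".prt", ".sim"):
        if not any(p.suffix.lower() == suffix for p in files):
            problems.append(f"no {suffix} file in 1_setup/model/")

    return problems


if __name__ == "__main__":
    # Example path only; an empty result means the structure looks complete.
    for problem in check_study("studies/my_study"):
        print(problem)
```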

---

## Error Categories

### 1. Environment Errors

#### "ModuleNotFoundError: No module named 'optuna'"

**Cause**: Wrong Python environment

**Solution**:
```bash
conda activate atomizer
# Verify
conda list | grep optuna
```

#### "Python version mismatch"

**Cause**: Wrong Python version

**Solution**:
```bash
python --version  # Need 3.9+
conda activate atomizer
```

---

### 2. NX Model Setup Errors

#### "All optimization trials produce identical results"

**Cause**: Missing idealized part (`*_i.prt`) or broken file chain

**Symptoms**:
- Journal shows "FE model updated" but results don't change
- DAT files have same node coordinates with different expressions
- OP2 file timestamps update but values are identical

**Root Cause**: NX simulation files have a parent-child hierarchy:
```
.sim → .fem → _i.prt → .prt (geometry)
```

If the `_i.prt` (idealized part) is missing or not properly linked, `UpdateFemodel()` runs but the mesh doesn't regenerate because:
- FEM mesh is tied to idealized geometry, not master geometry
- Without idealized part updating, FEM has nothing new to mesh against

**Solution**:
1. **Check file chain in NX**:
   - Open `.sim` file
   - Go to **Part Navigator** or **Assembly Navigator**
   - List ALL referenced parts

2. **Copy ALL linked files** to study folder:
   ```bash
   # Typical file set needed:
   Model.prt           # Geometry
   Model_fem1_i.prt    # Idealized part ← OFTEN MISSING!
   Model_fem1.fem      # FEM file
   Model_sim1.sim      # Simulation file
   ```

3. **Verify links are intact**:
   - Open model in NX after copying
   - Check that updates propagate: Geometry → Idealized → FEM → Sim

4. **CRITICAL CODE FIX** (already implemented in `solve_simulation.py`):
   The idealized part MUST be explicitly loaded before `UpdateFemodel()`:
   ```python
   # Excerpt: `os`, `working_dir`, `theSession`, and `feModel` are
   # defined by the surrounding journal code.
   # Load idealized part BEFORE updating FEM
   for filename in os.listdir(working_dir):
       if '_i.prt' in filename.lower():
           path = os.path.join(working_dir, filename)
           idealized_part, status = theSession.Parts.Open(path)
           break

   # Now UpdateFemodel() will work correctly
   feModel.UpdateFemodel()
   ```
   Without loading the `_i.prt`, NX cannot propagate geometry changes to the mesh.
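
The file-set check from step 2 can be run automatically before an optimization starts. The following is a minimal sketch, assuming the naming pattern shown in the file list (`<name>.fem` paired with `<name>_i.prt`). `find_missing_idealized` is a hypothetical helper, and real models may use other names, so a hit is a prompt to check the chain in NX, not proof of a problem.

```python
from pathlib import Path


def find_missing_idealized(model_dir):
    """Return .fem files that have no matching <stem>_i.prt next to them."""
    folder = Path(model_dir)
    # Compare lower-cased names so Model_FEM1_I.PRT still counts as present.
    names = {p.name.lower() for p in folder.iterdir() if p.is_file()}
    missing = []
    for fem in sorted(folder.glob("*.fem")):
        expected = f"{fem.stem}_i.prt".lower()
        if expected not in names:
            missing.append(fem.name)
    return missing
```

An empty list means every `.fem` file in the folder has an idealized part beside it.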

**Prevention**: Always use introspection to list all parts referenced by a simulation.

---

### 3. NX/Solver Errors

#### "NX session timeout after 600s"

**Cause**: Model too complex or NX stuck

**Solution**:
1. Increase timeout in config:
   ```json
   "simulation": {
     "timeout": 1200
   }
   ```
2. Simplify mesh if possible
3. Check NX license availability

#### "Expression 'xxx' not found in model"

**Cause**: Expression name mismatch

**Solution**:
1. Open model in NX
2. Go to Tools → Expressions
3. Verify exact expression name (case-sensitive)
4. Update config to match
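
Because the match is case-sensitive, a near miss such as `Thickness` versus `thickness` is easy to overlook. The following is a minimal sketch that compares configured names against the names in the model. The function and both input lists are hypothetical, and the model's expression names have to be obtained separately (for example, copied from the Expressions dialog).

```python
import difflib


def suggest_expression_names(configured, in_model):
    """Map each configured name missing from the model to close candidates."""
    lowered = {name.lower(): name for name in in_model}
    suggestions = {}
    for name in configured:
        if name in in_model:
            continue  # exact, case-sensitive match: nothing to fix
        if name.lower() in lowered:
            # Same spelling, different case.
            suggestions[name] = [lowered[name.lower()]]
        else:
            # Fall back to fuzzy matching to catch typos.
            suggestions[name] = difflib.get_close_matches(name, in_model, n=3)
    return suggestions


print(suggest_expression_names(
    ["Thickness", "angle", "widht"],
    ["thickness", "angle", "width"],
))
```

Names that already match exactly are left out of the result, so an empty dictionary means the config and the model agree.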

#### "NX license error"

**Cause**: License server unavailable

**Solution**:
1. Check license server status
2. Wait and retry
3. Contact IT if persistent

#### "NX solve failed - check log"

**Cause**: Nastran solver error

**Solution**:
1. Find log file: `1_setup/model/*.log` or `*.f06`
2. Search for "FATAL" or "ERROR"
3. Common causes:
   - Singular stiffness matrix (constraints issue)
   - Bad mesh (distorted elements)
   - Missing material properties
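
Steps 1 and 2 can be scripted so that every log in the model folder is searched in one pass. The following is a minimal sketch, assuming plain-text `.log` and `.f06` files as listed in step 1. `scan_solver_logs` is a hypothetical helper, not part of the toolkit.

```python
import re
from pathlib import Path

# Keywords from step 2, matched case-insensitively.
PATTERN = re.compile(r"FATAL|ERROR", re.IGNORECASE)


def scan_solver_logs(model_dir):
    """Yield (file name, line number, text) for each suspicious log line."""
    folder = Path(model_dir)
    for path in sorted(folder.glob("*.log")) + sorted(folder.glob("*.f06")):
        # errors="replace" keeps the scan going on unexpected byte sequences.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if PATTERN.search(line):
                    yield path.name, number, line.rstrip()
```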

---

### 4. Extraction Errors

#### "OP2 file not found"

**Cause**: Solve didn't produce output

**Solution**:
1. Check if solve completed
2. Look for `.op2` file in model directory
3. Check NX log for solve errors

#### "No displacement data for subcase X"

**Cause**: Wrong subcase number

**Solution**:
1. Check available subcases in OP2:
   ```python
   from pyNastran.op2.op2 import OP2
   op2 = OP2()
   op2.read_op2('model.op2')
   print(op2.displacements.keys())
   ```
2. Update subcase in extractor call

#### "Element type 'xxx' not supported"

**Cause**: Extractor doesn't support element type

**Solution**:
1. Check available types in extractor
2. Common types: `cquad4`, `ctria3`, `ctetra`, `chexa`
3. May need different extractor

---

### 5. Database Errors

#### "Database is locked"

**Cause**: Another process using database

**Solution**:
1. Check for running processes:
   ```bash
   ps aux | grep run_optimization
   ```
2. Kill stale process if needed
3. Wait for other optimization to finish
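
To confirm that the file really is locked, instead of inferring it from the process list, the database can be probed directly. The following is a minimal sketch using only the standard-library `sqlite3` module. `is_locked` is a hypothetical helper, and it only detects write locks held by another connection.

```python
import sqlite3


def is_locked(db_path, timeout=1.0):
    """Return True if another connection currently holds a write lock."""
    # isolation_level=None puts the connection in autocommit mode so the
    # explicit BEGIN below is the only transaction that gets opened.
    conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)
    try:
        # BEGIN IMMEDIATE requests the write lock without changing any data.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("ROLLBACK")
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc).lower():
            return True
        raise
    finally:
        conn.close()
```

Note that `sqlite3.connect` creates the file if it does not exist, so pass the path of an existing database such as `2_results/study.db`.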

#### "Study 'xxx' not found"

**Cause**: Wrong study name or path

**Solution**:
1. Check exact study name in database:
   ```python
   import optuna
   for summary in optuna.get_all_study_summaries(storage='sqlite:///study.db'):
       print(summary.study_name)
   ```
2. Use correct name when loading

#### "IntegrityError: UNIQUE constraint failed"

**Cause**: Duplicate trial number

**Solution**:
1. Don't run multiple optimizations on same study simultaneously
2. Use `--resume` flag for continuation

---

### 6. Constraint/Feasibility Errors

#### "All trials pruned"

**Cause**: No feasible region

**Solution**:
1. Check constraint values:
   ```python
   # In objective function, print constraint values
   print(f"Stress: {stress}, limit: 250")
   ```
2. Relax constraints
3. Widen design variable bounds
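
Printing inside the objective works, but with many trials a summary is easier to read. The following is a minimal sketch that reports how often, and by how much, an upper limit was exceeded. The limit of 250 comes from the example above; the function name and the list of recorded stress values are hypothetical.

```python
def summarize_violations(values, limit):
    """Summarize how many recorded values exceed an upper limit."""
    violations = [v for v in values if v > limit]
    return {
        "total": len(values),
        "violating": len(violations),
        # The smallest overshoot shows how far the limit would have to
        # move before at least one recorded design becomes feasible.
        "min_overshoot": min(v - limit for v in violations) if violations else 0.0,
    }


# Illustrative values only, not results from a real study.
stresses = [310.0, 275.5, 262.0, 401.2]
print(summarize_violations(stresses, limit=250))
```

If every value violates the limit and `min_overshoot` is small, a slightly relaxed constraint may be enough; a large overshoot suggests looking at the design variable bounds first.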

#### "No improvement after N trials"

**Cause**: Stuck in local minimum or converged

**Solution**:
1. Check if truly converged (good result)
2. Try different starting region
3. Use different sampler
4. Increase exploration (for samplers such as TPE, raise `n_startup_trials` so that more trials are sampled randomly before the sampler starts exploiting)

---

### 7. Performance Issues

#### "Trials running very slowly"

**Cause**: Complex model or inefficient extraction

**Solution**:
1. Profile time per component:
   ```python
   import time
   start = time.time()
   # ... operation ...
   print(f"Took: {time.time() - start:.1f}s")
   ```
2. Simplify mesh if NX is slow
3. Check extraction isn't re-parsing OP2 multiple times
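
Steps 1 and 3 can be combined: time each stage, and cache the parsed result so that several extractors share one parse. The following is a minimal sketch. `timed` and `load_results` are hypothetical names, and the body of `load_results` stands in for the real OP2 parsing call.

```python
import time
from contextlib import contextmanager
from functools import lru_cache


@contextmanager
def timed(label, log):
    """Record the wall-clock duration of the enclosed block in `log`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[label] = time.perf_counter() - start


@lru_cache(maxsize=1)
def load_results(op2_path):
    """Placeholder for an expensive parse; the result is cached per path."""
    time.sleep(0.05)  # stands in for reading a large file
    return {"path": op2_path}


timings = {}
with timed("first extraction", timings):
    load_results("model.op2")
with timed("second extraction", timings):
    load_results("model.op2")  # served from the cache, no second parse

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

The cache is keyed on the path only. If every trial overwrites the same OP2 file, call `load_results.cache_clear()` at the start of each trial so that results from the previous trial are never reused.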

#### "Memory error"

**Cause**: Large OP2 file or many trials

**Solution**:
1. Release large objects between trials (for example, `del` the parsed OP2 object and call `gc.collect()`)
2. Don't store all results in memory
3. Use database for persistence

---

## Diagnostic Commands

### Quick Health Check

```bash
# Environment
conda activate atomizer
python -c "import optuna; print('Optuna OK')"
python -c "import pyNastran; print('pyNastran OK')"

# Study structure
ls -la studies/my_study/

# Config validity
python -c "
import json
with open('studies/my_study/1_setup/optimization_config.json') as f:
    config = json.load(f)
print('Config OK')
print(f'Objectives: {len(config.get(\"objectives\", []))}')
"

# Database status (load_study takes keyword-only arguments in Optuna 3.x)
python -c "
import optuna
study = optuna.load_study(study_name='my_study', storage='sqlite:///studies/my_study/2_results/study.db')
print(f'Trials: {len(study.trials)}')
"
```

### NX Log Analysis

```bash
# Find latest log
ls -lt studies/my_study/1_setup/model/*.log | head -1

# Search for errors
grep -i "error\|fatal\|fail" studies/my_study/1_setup/model/*.log
```

### Trial Failure Analysis

```python
import optuna

study = optuna.load_study(...)

# Failed trials
failed = [t for t in study.trials
          if t.state == optuna.trial.TrialState.FAIL]
print(f"Failed: {len(failed)}")

for t in failed[:5]:
    print(f"Trial {t.number}: {t.user_attrs}")

# Pruned trials
pruned = [t for t in study.trials
          if t.state == optuna.trial.TrialState.PRUNED]
print(f"Pruned: {len(pruned)}")
```

---

## Recovery Actions

### Reset Study (Start Fresh)

```bash
# Backup first
cp -r studies/my_study/2_results studies/my_study/2_results_backup

# Delete results
rm -rf studies/my_study/2_results/*

# Run fresh
python run_optimization.py
```
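
Running the `cp -r` backup a second time copies into the existing backup folder instead of replacing it. The following is a minimal sketch of a timestamped backup that avoids this. `backup_results` is a hypothetical helper, and the paths follow the layout used above.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_results(study_dir):
    """Copy 2_results/ to a new timestamped sibling folder and return its path."""
    study = Path(study_dir)
    source = study / "2_results"
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = study / f"2_results_backup_{stamp}"
    # copytree refuses to overwrite an existing folder, so an earlier
    # backup can never be clobbered silently.
    shutil.copytree(source, target)
    return target
```

Calling `backup_results("studies/my_study")` before deleting results leaves one folder per backup, each named after the time it was taken.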

### Resume Interrupted Study

```bash
python run_optimization.py --resume
```

### Restore from Backup

```bash
cp -r studies/my_study/2_results_backup/* studies/my_study/2_results/
```

---

## Getting Help

### Information to Provide

When asking for help, include:
1. Error message (full traceback)
2. Config file contents
3. Study structure (`ls -la`)
4. What you tried
5. NX log excerpt (if NX error)

### Log Locations

| Log | Location |
|-----|----------|
| Optimization | Console output or redirect to file |
| NX Solve | `1_setup/model/*.log`, `*.f06` |
| Database | `2_results/study.db` (query with optuna) |
| Intelligence | `2_results/intelligent_optimizer/*.json` |

---

## Cross-References

- **Related**: All operation protocols
- **System**: [SYS_10_IMSO](../system/SYS_10_IMSO.md), [SYS_12_EXTRACTOR_LIBRARY](../system/SYS_12_EXTRACTOR_LIBRARY.md)

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-05 | Initial release |