# OP_06: Troubleshoot

<!--
PROTOCOL: Troubleshoot Optimization Issues
LAYER: Operations
VERSION: 1.0
STATUS: Active
LAST_UPDATED: 2025-12-05
PRIVILEGE: user
LOAD_WITH: []
-->

## Overview

This protocol provides systematic troubleshooting for common optimization issues, covering NX errors, extraction failures, database problems, and performance issues.


---

## When to Use

| Trigger | Action |
|---------|--------|
| "error", "failed" | Follow this protocol |
| "not working", "crashed" | Follow this protocol |
| "help", "stuck" | Follow this protocol |
| Unexpected behavior | Follow this protocol |

---

## Quick Diagnostic

```bash
# 1. Check environment
conda activate atomizer
python --version  # Should be 3.9+

# 2. Check study structure
ls studies/my_study/
# Should have: 1_setup/, run_optimization.py

# 3. Check model files
ls studies/my_study/1_setup/model/
# Should have: .prt, .sim files

# 4. Test single trial
python run_optimization.py --test
```

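
Steps 2 and 3 of the quick diagnostic can also be scripted. The following is a minimal sketch, assuming the layout described above (`run_optimization.py` at the study root, `.prt` and `.sim` files under `1_setup/model/`); the function name `check_study_layout` is hypothetical and not part of the project.

```python
from pathlib import Path


def check_study_layout(study_dir):
    """Return a list of problems found; an empty list means the layout looks fine."""
    study = Path(study_dir)
    problems = []

    if not (study / "run_optimization.py").is_file():
        problems.append("missing run_optimization.py")

    model_dir = study / "1_setup" / "model"
    if not model_dir.is_dir():
        problems.append("missing 1_setup/model/")
        return problems

    # Collect the file extensions present in the model folder
    suffixes = {p.suffix.lower() for p in model_dir.iterdir() if p.is_file()}
    for required in (".prt", ".sim"):
        if required not in suffixes:
            problems.append(f"no {required} file in 1_setup/model/")
    return problems
```

Calling `check_study_layout("studies/my_study")` and printing the returned list gives the same information as the two `ls` commands, in a form that can be pasted into a help request.
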

---

## Error Categories

### 1. Environment Errors

#### "ModuleNotFoundError: No module named 'optuna'"

**Cause**: Wrong Python environment

**Solution**:
```bash
conda activate atomizer
# Verify
conda list | grep optuna
```

#### "Python version mismatch"

**Cause**: Wrong Python version

**Solution**:
```bash
python --version  # Need 3.9+
conda activate atomizer
```

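
Both environment checks can be combined into one call from inside Python. This is a sketch only: the function name `check_environment` is hypothetical, and the default module list simply mirrors the packages named in this protocol (`optuna`, `pyNastran`).

```python
import importlib.util
import sys


def check_environment(required=("optuna", "pyNastran"), min_version=(3, 9)):
    """Return a list of environment issues; empty means the interpreter looks usable."""
    issues = []

    # Compare (major, minor) of the running interpreter with the minimum
    if sys.version_info[:2] < tuple(min_version):
        found = ".".join(str(n) for n in sys.version_info[:2])
        wanted = ".".join(str(n) for n in min_version)
        issues.append(f"Python {found} found, need {wanted}+")

    # find_spec returns None when a top-level module cannot be located
    for name in required:
        if importlib.util.find_spec(name) is None:
            issues.append(f"module not importable: {name}")
    return issues
```

If the list is non-empty right after `conda activate atomizer`, the wrong interpreter is probably still first on `PATH`.
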

---

### 2. NX Model Setup Errors

#### "All optimization trials produce identical results"

**Cause**: Missing idealized part (`*_i.prt`) or broken file chain

**Symptoms**:
- Journal shows "FE model updated" but results don't change
- DAT files have the same node coordinates despite different expression values
- OP2 file timestamps update but values are identical

**Root Cause**: NX simulation files have a parent-child hierarchy:
```
.sim → .fem → _i.prt → .prt (geometry)
```

If the `_i.prt` (idealized part) is missing or not properly linked, `UpdateFemodel()` runs but the mesh doesn't regenerate because:
- FEM mesh is tied to idealized geometry, not master geometry
- Without the idealized part updating, the FEM has nothing new to mesh against

**Solution**:
1. **Check file chain in NX**:
   - Open `.sim` file
   - Go to **Part Navigator** or **Assembly Navigator**
   - List ALL referenced parts

2. **Copy ALL linked files** to study folder:
   ```bash
   # Typical file set needed:
   Model.prt           # Geometry
   Model_fem1_i.prt    # Idealized part ← OFTEN MISSING!
   Model_fem1.fem      # FEM file
   Model_sim1.sim      # Simulation file
   ```

3. **Verify links are intact**:
   - Open model in NX after copying
   - Check that updates propagate: Geometry → Idealized → FEM → Sim

4. **CRITICAL CODE FIX** (already implemented in `solve_simulation.py`):

   The idealized part MUST be explicitly loaded before `UpdateFemodel()`:
   ```python
   import os

   # Load idealized part BEFORE updating FEM
   # (theSession, working_dir and feModel come from the surrounding journal)
   for filename in os.listdir(working_dir):
       if '_i.prt' in filename.lower():
           path = os.path.join(working_dir, filename)
           idealized_part, status = theSession.Parts.Open(path)
           break

   # Now UpdateFemodel() will work correctly
   feModel.UpdateFemodel()
   ```
   Without loading the `_i.prt`, NX cannot propagate geometry changes to the mesh.

**Prevention**: Always use introspection to list all parts referenced by a simulation.

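
Before opening NX, the copied file set from step 2 can be sanity-checked by filename alone. The sketch below assumes the naming pattern shown above (`*.sim`, `*.fem`, `*_i.prt`, `*.prt`); `missing_chain_roles` is a hypothetical helper, and it only confirms that each role is present in the folder, not that the NX links between the files are intact.

```python
from pathlib import Path


def missing_chain_roles(model_dir):
    """Return the roles of the .sim -> .fem -> _i.prt -> .prt chain with no matching file."""
    names = [p.name.lower() for p in Path(model_dir).iterdir() if p.is_file()]
    found = {
        "simulation (.sim)": any(n.endswith(".sim") for n in names),
        "FEM (.fem)": any(n.endswith(".fem") for n in names),
        "idealized part (_i.prt)": any(n.endswith("_i.prt") for n in names),
        # A geometry part is any .prt that is not an idealized part
        "geometry (.prt)": any(
            n.endswith(".prt") and not n.endswith("_i.prt") for n in names
        ),
    }
    return [role for role, present in found.items() if not present]
```

A result containing `"idealized part (_i.prt)"` points to the most common cause of identical trial results described in this section.
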

---

### 3. NX/Solver Errors

#### "NX session timeout after 600s"

**Cause**: Model too complex or NX stuck

**Solution**:
1. Increase timeout in config:
   ```json
   "simulation": {
     "timeout": 1200
   }
   ```
2. Simplify mesh if possible
3. Check NX license availability

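
The timeout change in step 1 can be applied programmatically so the rest of the config is left untouched. This is a sketch assuming the `"simulation"` / `"timeout"` keys shown above and a JSON config such as `1_setup/optimization_config.json`; `set_simulation_timeout` is a hypothetical name.

```python
import json
from pathlib import Path


def set_simulation_timeout(config_path, seconds):
    """Set simulation.timeout in a JSON config and return the previous value (or None)."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Create the "simulation" section if the config does not have one yet
    simulation = config.setdefault("simulation", {})
    previous = simulation.get("timeout")
    simulation["timeout"] = seconds

    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return previous
```

Returning the previous value makes it easy to log what was changed, or to restore the old timeout after the long-running study finishes.
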
#### "Expression 'xxx' not found in model"

**Cause**: Expression name mismatch

**Solution**:
1. Open model in NX
2. Go to Tools → Expressions
3. Verify exact expression name (case-sensitive)
4. Update config to match

#### "NX license error"

**Cause**: License server unavailable

**Solution**:
1. Check license server status
2. Wait and retry
3. Contact IT if persistent

#### "NX solve failed - check log"

**Cause**: Nastran solver error

**Solution**:
1. Find log file: `1_setup/model/*.log` or `*.f06`
2. Search for "FATAL" or "ERROR"
3. Common causes:
   - Singular stiffness matrix (constraints issue)
   - Bad mesh (distorted elements)
   - Missing material properties

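
Steps 1 and 2 for a failed solve can be combined into a small scan that reports the file and line number of each hit. This is a sketch under the assumptions above (logs are `*.log` and `*.f06` files in the model directory, and the interesting lines contain "FATAL" or "ERROR"); `find_solver_errors` is a hypothetical helper.

```python
import re
from pathlib import Path

# Case-insensitive match for the keywords named in step 2
_ERROR_PATTERN = re.compile(r"fatal|error", re.IGNORECASE)


def find_solver_errors(model_dir, patterns=("*.log", "*.f06")):
    """Return (file name, line number, stripped line) for every suspicious log line."""
    hits = []
    for pattern in patterns:
        for path in sorted(Path(model_dir).glob(pattern)):
            # errors="replace" avoids crashing on odd bytes in solver output
            with path.open(errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if _ERROR_PATTERN.search(line):
                        hits.append((path.name, lineno, line.strip()))
    return hits
```

The line numbers make it easier to quote the surrounding context from the `.f06` file when asking for help.
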

---

### 4. Extraction Errors

#### "OP2 file not found"

**Cause**: Solve didn't produce output

**Solution**:
1. Check if solve completed
2. Look for `.op2` file in model directory
3. Check NX log for solve errors

#### "No displacement data for subcase X"

**Cause**: Wrong subcase number

**Solution**:
1. Check available subcases in OP2:
   ```python
   from pyNastran.op2.op2 import OP2
   op2 = OP2()
   op2.read_op2('model.op2')
   print(op2.displacements.keys())
   ```
2. Update subcase in extractor call

#### "Element type 'xxx' not supported"

**Cause**: Extractor doesn't support element type

**Solution**:
1. Check available types in extractor
2. Common types: `cquad4`, `ctria3`, `ctetra`, `chexa`
3. May need different extractor


---

### 5. Database Errors

#### "Database is locked"

**Cause**: Another process using database

**Solution**:
1. Check for running processes:
   ```bash
   ps aux | grep run_optimization
   ```
2. Kill stale process if needed
3. Wait for other optimization to finish

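
Whether the lock is still held can be probed directly with the standard-library `sqlite3` module. The sketch below tries to take a write lock for a short time and releases it immediately; `is_database_locked` is a hypothetical helper, and it assumes the study database is a SQLite file such as `2_results/study.db`. Note that `sqlite3.connect` creates the file if it does not exist, so pass a path that is known to be there.

```python
import sqlite3


def is_database_locked(db_path, timeout=1.0):
    """Return True if another connection currently holds a write lock on the database."""
    # isolation_level=None puts the connection in autocommit mode,
    # so the explicit BEGIN/ROLLBACK below are sent as written
    conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # requests the write lock
        conn.execute("ROLLBACK")         # release it again without changing anything
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc).lower():
            return True
        raise
    finally:
        conn.close()
```

A `True` result with no `run_optimization` process in the `ps` output suggests some other tool still has the database open.
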
#### "Study 'xxx' not found"

**Cause**: Wrong study name or path

**Solution**:
1. Check exact study name in database:
   ```python
   import optuna

   # List the names of all studies stored in the database
   for summary in optuna.get_all_study_summaries(storage='sqlite:///study.db'):
       print(summary.study_name)
   ```
2. Use correct name when loading

#### "IntegrityError: UNIQUE constraint failed"

**Cause**: Duplicate trial number

**Solution**:
1. Don't run multiple optimizations on same study simultaneously
2. Use `--resume` flag for continuation


---

### 6. Constraint/Feasibility Errors

#### "All trials pruned"

**Cause**: No feasible region

**Solution**:
1. Check constraint values:
   ```python
   # In objective function, print constraint values
   print(f"Stress: {stress}, limit: 250")
   ```
2. Relax constraints
3. Widen design variable bounds

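
Once the constraint values from step 1 have been collected, a short summary shows how far the study is from feasibility. This is a sketch that assumes an upper-limit constraint (value must be at most `limit`, as with the stress limit above); `summarize_feasibility` is a hypothetical helper.

```python
def summarize_feasibility(values, limit):
    """Summarize how a list of constraint values compares with an upper limit."""
    if not values:
        return {"n_trials": 0, "n_feasible": 0,
                "closest_value": None, "smallest_violation": None}

    closest = min(values)  # the value nearest to (or furthest below) the limit
    return {
        "n_trials": len(values),
        "n_feasible": sum(1 for v in values if v <= limit),
        "closest_value": closest,
        # 0.0 means at least one trial satisfied the constraint
        "smallest_violation": max(0.0, closest - limit),
    }
```

A small `smallest_violation` suggests that relaxing the constraint slightly (step 2) may be enough, while a large one points toward widening the design variable bounds (step 3).
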
#### "No improvement after N trials"

**Cause**: Stuck in local minimum or converged

**Solution**:
1. Check if truly converged (good result)
2. Try different starting region
3. Use different sampler
4. Increase exploration (raise `n_startup_trials` so more trials are sampled randomly before the sampler starts exploiting)

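
To judge step 1, it helps to know how long the best value has actually been flat. The following sketch works on a plain list of objective values in trial order; `trials_since_improvement` is a hypothetical helper, and `tol` is an assumed tolerance below which a change does not count as an improvement.

```python
def trials_since_improvement(values, minimize=True, tol=0.0):
    """Return how many trials have run since the best value last improved."""
    best = None
    last_improved = -1
    for index, value in enumerate(values):
        if best is None:
            improved = True
        elif minimize:
            improved = value < best - tol
        else:
            improved = value > best + tol
        if improved:
            best = value
            last_improved = index
    return len(values) - 1 - last_improved
```

For a single-objective study the input could be built from the completed trials' values; a long flat tail combined with a good best value usually means the study has converged rather than stalled.
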

---

### 7. Performance Issues

#### "Trials running very slowly"

**Cause**: Complex model or inefficient extraction

**Solution**:
1. Profile time per component:
   ```python
   import time
   start = time.time()
   # ... operation ...
   print(f"Took: {time.time() - start:.1f}s")
   ```
2. Simplify mesh if NX is slow
3. Check extraction isn't re-parsing OP2 multiple times

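
When several stages need timing (solve, OP2 parsing, extraction), a small context manager keeps the profiling code out of the way. This is a sketch, not project code: `timed` and the stage labels are hypothetical names.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, record):
    """Add the elapsed wall-clock time of the enclosed block to record[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate, so repeated stages (e.g. several extractions) are summed
        elapsed = time.perf_counter() - start
        record[label] = record.get(label, 0.0) + elapsed
```

Usage inside an objective function could look like `with timed("solve", timings): ...` followed by `with timed("extract", timings): ...`, after which `timings` shows which stage dominates the trial time.
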
#### "Memory error"

**Cause**: Large OP2 file or many trials

**Solution**:
1. Clear Python memory between trials
2. Don't store all results in memory
3. Use database for persistence

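
Steps 1 and 2 amount to keeping only the scalars a trial needs and dropping the large result object right away. The sketch below illustrates that pattern with two caller-supplied functions; `extract_and_release`, `load_result`, and `extract` are hypothetical names, and the loader would typically wrap whatever reads the OP2 file.

```python
import gc


def extract_and_release(load_result, extract):
    """Load a large result, pull out what is needed, then free the large object."""
    result = load_result()
    try:
        # extract() should return small values (floats, short lists), not views
        # that keep the large object alive
        return extract(result)
    finally:
        del result
        gc.collect()  # also reclaims objects caught in reference cycles
```

Optuna's `study.optimize` also accepts `gc_after_trial=True`, which runs a garbage collection after each trial; whether that alone is sufficient depends on what the objective function keeps references to.
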

---

## Diagnostic Commands

### Quick Health Check

```bash
# Environment
conda activate atomizer
python -c "import optuna; print('Optuna OK')"
python -c "import pyNastran; print('pyNastran OK')"

# Study structure
ls -la studies/my_study/

# Config validity
python -c "
import json
with open('studies/my_study/1_setup/optimization_config.json') as f:
    config = json.load(f)
print('Config OK')
print(f'Objectives: {len(config.get(\"objectives\", []))}')
"

# Database status
python -c "
import optuna
study = optuna.load_study(study_name='my_study', storage='sqlite:///studies/my_study/2_results/study.db')
print(f'Trials: {len(study.trials)}')
"
```

### NX Log Analysis

```bash
# Find latest log
ls -lt studies/my_study/1_setup/model/*.log | head -1

# Search for errors
grep -i "error\|fatal\|fail" studies/my_study/1_setup/model/*.log
```

### Trial Failure Analysis

```python
import optuna

study = optuna.load_study(...)

# Failed trials
failed = [t for t in study.trials
          if t.state == optuna.trial.TrialState.FAIL]
print(f"Failed: {len(failed)}")

for t in failed[:5]:
    print(f"Trial {t.number}: {t.user_attrs}")

# Pruned trials
pruned = [t for t in study.trials
          if t.state == optuna.trial.TrialState.PRUNED]
print(f"Pruned: {len(pruned)}")
```


---

## Recovery Actions

### Reset Study (Start Fresh)

```bash
# Backup first
cp -r studies/my_study/2_results studies/my_study/2_results_backup

# Delete results
rm -rf studies/my_study/2_results/*

# Run fresh
python run_optimization.py
```

### Resume Interrupted Study

```bash
python run_optimization.py --resume
```

### Restore from Backup

```bash
cp -r studies/my_study/2_results_backup/* studies/my_study/2_results/
```

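
One caveat with the backup step: running `cp -r` a second time copies into the existing `2_results_backup` folder instead of replacing it. A timestamped backup avoids that. The sketch below is one possible approach, with `backup_results` and the `2_results_backup_<stamp>` naming as assumptions rather than project conventions.

```python
import shutil
import time
from pathlib import Path


def backup_results(study_dir, stamp=None):
    """Copy 2_results/ to a new timestamped folder and return the backup path."""
    study = Path(study_dir)
    stamp = stamp or time.strftime("%Y%m%d_%H%M%S")
    target = study / f"2_results_backup_{stamp}"

    # copytree raises FileExistsError if the target exists,
    # so an earlier backup is never silently overwritten
    shutil.copytree(study / "2_results", target)
    return target
```

Each reset then leaves its own backup folder, and restoring means copying the chosen folder back over `2_results/`.
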

---

## Getting Help

### Information to Provide

When asking for help, include:
1. Error message (full traceback)
2. Config file contents
3. Study structure (`ls -la`)
4. What you tried
5. NX log excerpt (if NX error)

### Log Locations

| Log | Location |
|-----|----------|
| Optimization | Console output or redirect to file |
| NX Solve | `1_setup/model/*.log`, `*.f06` |
| Database | `2_results/study.db` (query with optuna) |
| Intelligence | `2_results/intelligent_optimizer/*.json` |

---

## Cross-References

- **Related**: All operation protocols
- **System**: [SYS_10_IMSO](../system/SYS_10_IMSO.md), [SYS_12_EXTRACTOR_LIBRARY](../system/SYS_12_EXTRACTOR_LIBRARY.md)

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-05 | Initial release |