# OP_06: Troubleshoot
<!--
PROTOCOL: Troubleshoot Optimization Issues
LAYER: Operations
VERSION: 1.0
STATUS: Active
LAST_UPDATED: 2025-12-05
PRIVILEGE: user
LOAD_WITH: []
-->
## Overview
This protocol provides systematic troubleshooting for common optimization issues, covering NX errors, extraction failures, database problems, and performance issues.
---
## When to Use
| Trigger | Action |
|---------|--------|
| "error", "failed" | Follow this protocol |
| "not working", "crashed" | Follow this protocol |
| "help", "stuck" | Follow this protocol |
| Unexpected behavior | Follow this protocol |
---
## Quick Diagnostic
```bash
# 1. Check environment
conda activate atomizer
python --version # Should be 3.9+
# 2. Check study structure
ls studies/my_study/
# Should have: 1_setup/, run_optimization.py
# 3. Check model files
ls studies/my_study/1_setup/model/
# Should have: .prt, .sim files
# 4. Test a single trial (run from inside studies/my_study/)
python run_optimization.py --test
```
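The structure and model-file checks above can also be scripted. This is a minimal sketch, not part of Atomizer: `check_study_structure` is a hypothetical helper that assumes the layout described above (`1_setup/`, `run_optimization.py`, and `.prt`/`.sim` files in `1_setup/model/`).

```python
from pathlib import Path


def check_study_structure(study_dir):
    """Return a list of problems found in a study folder (empty list = OK)."""
    study = Path(study_dir)
    problems = []
    if not (study / "1_setup").is_dir():
        problems.append("missing folder: 1_setup/")
    if not (study / "run_optimization.py").is_file():
        problems.append("missing file: run_optimization.py")
    model_dir = study / "1_setup" / "model"
    # At least one part file and one simulation file are expected
    for suffix in (".prt", ".sim"):
        if not model_dir.is_dir() or not any(model_dir.glob(f"*{suffix}")):
            problems.append(f"no {suffix} file in 1_setup/model/")
    return problems


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "studies/my_study"
    for problem in check_study_structure(target) or ["structure OK"]:
        print(problem)
```

An empty list only means the expected files exist; it does not prove the NX file chain is intact.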
---
## Error Categories
### 1. Environment Errors
#### "ModuleNotFoundError: No module named 'optuna'"
**Cause**: Wrong Python environment
**Solution**:
```bash
conda activate atomizer
# Verify
conda list | grep optuna
```
#### "Python version mismatch"
**Cause**: Wrong Python version
**Solution**:
```bash
python --version # Need 3.9+
conda activate atomizer
```
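Both environment errors can be checked in one pass with the standard library. In this sketch, `check_environment` is a hypothetical helper; the module list (`optuna`, `pyNastran`) and the 3.9 minimum are taken from this protocol.

```python
import importlib.util
import sys


def check_environment(modules=("optuna", "pyNastran"), min_version=(3, 9)):
    """Return a list of environment problems (empty list = OK)."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_version))
        problems.append(f"Python {found} found, need {wanted}+")
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not found: {name} (is the atomizer env active?)")
    return problems


if __name__ == "__main__":
    print("\n".join(check_environment()) or "Environment OK")
```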
---
### 2. NX Model Setup Errors
#### "All optimization trials produce identical results"
**Cause**: Missing idealized part (`*_i.prt`) or broken file chain
**Symptoms**:
- Journal shows "FE model updated" but results don't change
- DAT files show the same node coordinates even though the expression values differ
- OP2 file timestamps update but values are identical
**Root Cause**: NX simulation files have a parent-child hierarchy:
```
.sim → .fem → _i.prt → .prt (geometry)
```
If the `_i.prt` (idealized part) is missing or not properly linked, `UpdateFemodel()` runs but the mesh doesn't regenerate because:
- FEM mesh is tied to idealized geometry, not master geometry
- Without idealized part updating, FEM has nothing new to mesh against
**Solution**:
1. **Check file chain in NX**:
- Open `.sim` file
- Go to **Part Navigator** or **Assembly Navigator**
- List ALL referenced parts
2. **Copy ALL linked files** to study folder:
```
# Typical file set needed:
Model.prt # Geometry
Model_fem1_i.prt # Idealized part ← OFTEN MISSING!
Model_fem1.fem # FEM file
Model_sim1.sim # Simulation file
```
3. **Verify links are intact**:
- Open model in NX after copying
- Check that updates propagate: Geometry → Idealized → FEM → Sim
4. **CRITICAL CODE FIX** (already implemented in `solve_simulation.py`):
The idealized part MUST be explicitly loaded before `UpdateFemodel()`:
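```python
import os

# theSession, working_dir and feModel come from the surrounding journal (solve_simulation.py)
# Load the idealized part BEFORE updating the FEM
for filename in os.listdir(working_dir):
    if filename.lower().endswith('_i.prt'):
        path = os.path.join(working_dir, filename)
        idealized_part, status = theSession.Parts.Open(path)
        break

# Now UpdateFemodel() can regenerate the mesh from the updated idealized geometry
feModel.UpdateFemodel()
```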
Without loading the `_i.prt`, NX cannot propagate geometry changes to the mesh.
**Prevention**: Always use introspection to list all parts referenced by a simulation.
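Part of the file-chain check can be automated. This sketch only verifies that a file exists for each link of the `.sim → .fem → _i.prt → .prt` chain, using the naming convention from the typical file set above (an assumption; NX does not require these names). It cannot confirm that NX actually resolves the links, so steps 1 and 3 still apply. `missing_chain_links` is a hypothetical helper.

```python
from pathlib import Path

# One entry per link in the chain; each predicate tests a lower-cased file name.
# The suffix conventions are assumptions based on the typical file set.
CHAIN = {
    "simulation (.sim)": lambda n: n.endswith(".sim"),
    "FEM (.fem)": lambda n: n.endswith(".fem"),
    "idealized part (_i.prt)": lambda n: n.endswith("_i.prt"),
    "master geometry (.prt)": lambda n: n.endswith(".prt") and not n.endswith("_i.prt"),
}


def missing_chain_links(model_dir):
    """Return the chain links that have no matching file in model_dir."""
    names = [p.name.lower() for p in Path(model_dir).iterdir() if p.is_file()]
    return [link for link, matches in CHAIN.items()
            if not any(matches(n) for n in names)]
```

For example, `missing_chain_links("studies/my_study/1_setup/model")` returning `['idealized part (_i.prt)']` points directly at the most common cause of identical results.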
---
### 3. NX/Solver Errors
#### "NX session timeout after 600s"
**Cause**: Model too complex or NX stuck
**Solution**:
1. Increase timeout in config:
```json
"simulation": {
"timeout": 1200
}
```
2. Simplify mesh if possible
3. Check NX license availability
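To confirm which timeout a config file actually sets after editing it, a small reader can help. This sketch assumes the `simulation.timeout` key shown above lives in `1_setup/optimization_config.json`, and it treats 600 s as the default only because that is the value in the error message; the real runner's default may differ. `simulation_timeout` is a hypothetical helper.

```python
import json


def simulation_timeout(config_path, default=600):
    """Return simulation.timeout (seconds) from a config, or `default` if absent.

    The 600 s default is an assumption taken from the timeout error message.
    """
    with open(config_path) as f:
        config = json.load(f)
    timeout = config.get("simulation", {}).get("timeout", default)
    if not isinstance(timeout, (int, float)) or timeout <= 0:
        raise ValueError(f"simulation.timeout must be a positive number, got {timeout!r}")
    return timeout
```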
#### "Expression 'xxx' not found in model"
**Cause**: Expression name mismatch
**Solution**:
1. Open model in NX
2. Go to Tools → Expressions
3. Verify exact expression name (case-sensitive)
4. Update config to match
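Because expression names are case-sensitive, a near-miss such as `Flange_Width` vs `flange_width` is easy to overlook. This hypothetical helper compares the names in your config with names copied by hand from Tools → Expressions and uses Python's `difflib` to suggest likely matches; it does not read the NX model itself.

```python
import difflib


def check_expression_names(config_names, model_names):
    """Return {config name: close model matches} for names missing from the model.

    Comparison is exact and case-sensitive, like NX expression names.
    """
    model_names = list(model_names)
    report = {}
    for name in config_names:
        if name not in model_names:
            report[name] = difflib.get_close_matches(name, model_names, n=3, cutoff=0.6)
    return report


if __name__ == "__main__":
    # Hypothetical names: config entries vs. names copied from Tools > Expressions
    print(check_expression_names(
        ["rib_thickness", "Flange_Width"],
        ["rib_thickness", "flange_width", "hole_dia"],
    ))
```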
#### "NX license error"
**Cause**: License server unavailable
**Solution**:
1. Check license server status
2. Wait and retry
3. Contact IT if persistent
#### "NX solve failed - check log"
**Cause**: Nastran solver error
**Solution**:
1. Find log file: `1_setup/model/*.log` or `*.f06`
2. Search for "FATAL" or "ERROR"
3. Common causes:
- Singular stiffness matrix (often insufficient constraints, leaving rigid-body motion)
- Bad mesh (distorted elements)
- Missing material properties
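Steps 1 and 2 can be combined in a short script that works the same on Windows and Linux. This sketch (`scan_solver_logs` is a hypothetical name) reads every `*.log` and `*.f06` file in a model directory and reports lines containing "FATAL" or "ERROR" (case-insensitive) with their line numbers.

```python
import re
from pathlib import Path

# Same idea as: grep -i "fatal\|error" *.log *.f06
ERROR_PATTERN = re.compile(r"fatal|error", re.IGNORECASE)


def scan_solver_logs(model_dir, patterns=("*.log", "*.f06")):
    """Yield (file name, line number, line text) for lines mentioning FATAL or ERROR."""
    for pattern in patterns:
        for path in sorted(Path(model_dir).glob(pattern)):
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if ERROR_PATTERN.search(line):
                        yield path.name, lineno, line.rstrip()


if __name__ == "__main__":
    for name, lineno, text in scan_solver_logs("studies/my_study/1_setup/model"):
        print(f"{name}:{lineno}: {text}")
```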
---
### 4. Extraction Errors
#### "OP2 file not found"
**Cause**: The solve did not produce output
**Solution**:
1. Check if the solve completed
2. Look for the `.op2` file in the model directory
3. Check the NX log for solve errors
#### "No displacement data for subcase X"
**Cause**: Wrong subcase number
**Solution**:
1. Check available subcases in OP2:
```python
from pyNastran.op2.op2 import OP2
op2 = OP2()
op2.read_op2('model.op2')
# Keys are the subcases that actually contain displacement results
print(op2.displacements.keys())
```
2. Update the subcase in the extractor call
#### "Element type 'xxx' not supported"
**Cause**: Extractor doesn't support element type
**Solution**:
1. Check available types in extractor
2. Common types: `cquad4`, `ctria3`, `ctetra`, `chexa`
3. May need a different extractor
---
### 5. Database Errors
#### "Database is locked"
**Cause**: Another process is using the database
**Solution**:
1. Check for running processes:
```bash
ps aux | grep run_optimization
```
2. Kill stale process if needed
3. Wait for the other optimization to finish
#### "Study 'xxx' not found"
**Cause**: Wrong study name or path
**Solution**:
1. Check exact study names in the database:
```python
import optuna
# List every study stored in the results database
summaries = optuna.get_all_study_summaries(storage='sqlite:///studies/my_study/2_results/study.db')
for s in summaries:
    print(s.study_name, s.n_trials)
```
2. Use the correct name when loading
#### "IntegrityError: UNIQUE constraint failed"
**Cause**: Duplicate trial number
**Solution**:
1. Don't run multiple optimizations on the same study simultaneously
2. Use `--resume` flag for continuation
---
### 6. Constraint/Feasibility Errors
#### "All trials pruned"
**Cause**: No feasible region: every trial violates at least one constraint
**Solution**:
1. Check constraint values:
```python
# In objective function, print constraint values
print(f"Stress: {stress}, limit: 250")
```
2. Relax constraints
3. Widen design variable bounds
#### "No improvement after N trials"
**Cause**: The search is stuck in a local minimum, or it has converged
**Solution**:
1. Check if truly converged (good result)
2. Try a different starting region
3. Use a different sampler
4. Increase exploration (raise `n_startup_trials` so more random trials run before the model-based sampler takes over)
---
### 7. Performance Issues
#### "Trials running very slowly"
**Cause**: Complex model or inefficient extraction
**Solution**:
1. Profile time per component:
```python
import time
start = time.time()
# ... operation ...
print(f"Took: {time.time() - start:.1f}s")
```
2. Simplify mesh if NX is slow
3. Check extraction isn't re-parsing OP2 multiple times
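A slightly more structured version of the timing snippet accumulates durations per stage, so you can see whether NX solving or OP2 extraction dominates over several trials. In this sketch, `timed`, `report`, and the stage names are hypothetical; it uses `time.perf_counter()`, which is intended for measuring intervals.

```python
import time
from contextlib import contextmanager

# stage name -> list of durations in seconds
timings = {}


@contextmanager
def timed(stage):
    """Add the wall-clock duration of the enclosed block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(stage, []).append(time.perf_counter() - start)


def report():
    """Print total and mean time per stage, slowest stage first."""
    for stage, durations in sorted(timings.items(), key=lambda item: -sum(item[1])):
        mean = sum(durations) / len(durations)
        print(f"{stage:<10} total {sum(durations):8.2f}s  mean {mean:7.3f}s  n={len(durations)}")
```

Wrap each stage of a trial, for example `with timed("solve"): ...` around the NX call and `with timed("extract"): ...` around extraction, then call `report()` after a few trials.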
#### "Memory error"
**Cause**: Large OP2 file or many trials
**Solution**:
1. Release large objects (such as parsed OP2 data) once the needed values are extracted
2. Don't store all results in memory
3. Use database for persistence
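Before restructuring anything, it is worth confirming that memory really grows from trial to trial. The standard-library `tracemalloc` module reports memory allocated through Python after each trial. In this sketch, `run_trials_with_memory_report` and the `run_trial` callback are hypothetical stand-ins for your own trial loop; `tracemalloc.reset_peak()` requires Python 3.9+, which this protocol already assumes.

```python
import tracemalloc


def run_trials_with_memory_report(run_trial, n_trials):
    """Call run_trial(i) for each trial and print traced memory after each one.

    Returns the 'current' traced memory (bytes) recorded after every trial.
    """
    history = []
    tracemalloc.start()
    try:
        for i in range(n_trials):
            run_trial(i)
            current, peak = tracemalloc.get_traced_memory()
            history.append(current)
            print(f"trial {i}: current {current / 1e6:.1f} MB, peak {peak / 1e6:.1f} MB")
            tracemalloc.reset_peak()  # so each line shows that trial's own peak
    finally:
        tracemalloc.stop()
    return history
```

A steadily rising `current` value suggests results (for example OP2 objects) are being kept alive between trials.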
---
## Diagnostic Commands
### Quick Health Check
```bash
# Environment
conda activate atomizer
python -c "import optuna; print('Optuna OK')"
python -c "import pyNastran; print('pyNastran OK')"
# Study structure
ls -la studies/my_study/
# Config validity
python -c "
import json
with open('studies/my_study/1_setup/optimization_config.json') as f:
config = json.load(f)
print('Config OK')
print(f'Objectives: {len(config.get(\"objectives\", []))}')
"
# Database status
python -c "
import optuna
study = optuna.load_study(study_name='my_study', storage='sqlite:///studies/my_study/2_results/study.db')
print(f'Trials: {len(study.trials)}')
"
```
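For the "Database is locked" error, this sketch tests whether another connection currently holds the SQLite write lock, without modifying the file. It uses only the standard `sqlite3` module and opens the database in read-write mode, so a mistyped path raises an error instead of silently creating an empty database. `database_is_locked` is a hypothetical helper, not part of Optuna or Atomizer.

```python
import sqlite3
from pathlib import Path


def database_is_locked(db_path, wait_seconds=1.0):
    """Return True if another connection holds the SQLite write lock on db_path."""
    # mode=rw: fail instead of creating a new, empty database for a wrong path
    uri = Path(db_path).resolve().as_uri() + "?mode=rw"
    conn = sqlite3.connect(uri, uri=True, timeout=wait_seconds)
    try:
        # BEGIN IMMEDIATE requests the write lock without changing any data
        conn.execute("BEGIN IMMEDIATE")
        conn.rollback()
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc):
            return True
        raise
    finally:
        conn.close()
```

Usage: `database_is_locked("studies/my_study/2_results/study.db")`. A `True` result that persists after all optimization processes have exited points to a stale process still holding the file.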
### NX Log Analysis
```bash
# Find latest log
ls -lt studies/my_study/1_setup/model/*.log | head -1
# Search for errors
grep -i "error\|fatal\|fail" studies/my_study/1_setup/model/*.log
```
### Trial Failure Analysis
```python
import optuna
study = optuna.load_study(...)  # same study_name/storage as in the health check
# Failed trials
failed = [t for t in study.trials
          if t.state == optuna.trial.TrialState.FAIL]
print(f"Failed: {len(failed)}")
for t in failed[:5]:
    print(f"Trial {t.number}: {t.user_attrs}")
# Pruned trials
pruned = [t for t in study.trials
          if t.state == optuna.trial.TrialState.PRUNED]
print(f"Pruned: {len(pruned)}")
```
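When most trials are pruned, it helps to know which constraint is responsible. The sketch below tallies violations per constraint. It assumes (hypothetically) that the objective function stores each response in `trial.user_attrs` under the constraint's name and that every limit is an upper bound, like the 250 stress limit used in the Constraint/Feasibility section; adapt both to how your study records results.

```python
from collections import Counter


def tally_violations(trial_responses, limits):
    """Count how often each constraint is violated across trials.

    trial_responses: iterable of dicts, e.g. {"stress": 310.0, "mass": 1.2}
    limits: upper limits per constraint name, e.g. {"stress": 250.0}
    """
    counts = Counter()
    for responses in trial_responses:
        for name, limit in limits.items():
            if name in responses and responses[name] > limit:
                counts[name] += 1
    return counts
```

With the `pruned` list from the analysis above, a call would look like `tally_violations([t.user_attrs for t in pruned], {"stress": 250})`.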
---
## Recovery Actions
### Reset Study (Start Fresh)
```bash
# Backup first
cp -r studies/my_study/2_results studies/my_study/2_results_backup
# Delete results
rm -rf studies/my_study/2_results/*
# Run fresh
python run_optimization.py
```
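Running the `cp -r` backup a second time copies `2_results` *into* the existing `2_results_backup` folder rather than replacing it. A timestamped backup avoids that; `backup_results` is a hypothetical helper built on `shutil.copytree`.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_results(study_dir):
    """Copy <study>/2_results to a timestamped sibling folder and return its path."""
    results = Path(study_dir) / "2_results"
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup = results.with_name(f"2_results_backup_{stamp}")
    # copytree raises if the target exists, so an older backup is never overwritten
    shutil.copytree(results, backup)
    return backup
```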
### Resume Interrupted Study
```bash
python run_optimization.py --resume
```
### Restore from Backup
```bash
cp -r studies/my_study/2_results_backup/* studies/my_study/2_results/
```
---
## Getting Help
### Information to Provide
When asking for help, include:
1. Error message (full traceback)
2. Config file contents
3. Study structure (`ls -la`)
4. What you tried
5. NX log excerpt (if NX error)
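Items 2, 3 and 5 can be gathered automatically. This sketch (`help_report` is a hypothetical name) assumes the standard study layout and bundles the config, a listing of the study and model folders, and the tail of the most recent NX log into one text block; the traceback and what you tried still have to be added by hand.

```python
import platform
import sys
from pathlib import Path


def help_report(study_dir, log_lines=40):
    """Bundle config, folder listings and the latest NX log tail into one string."""
    study = Path(study_dir)
    model_dir = study / "1_setup" / "model"
    parts = [f"Python {sys.version.split()[0]} on {platform.platform()}"]

    config = study / "1_setup" / "optimization_config.json"
    parts.append("== Config ==")
    parts.append(config.read_text() if config.is_file() else f"(missing: {config})")

    # Top-level study folder and model folder only, to keep the report short
    for folder in (study, model_dir):
        parts.append(f"== Contents of {folder} ==")
        if folder.is_dir():
            parts.extend(sorted(p.name for p in folder.iterdir()))
        else:
            parts.append("(missing)")

    # Tail of the most recently modified NX log, if any
    logs = sorted(model_dir.glob("*.log"), key=lambda p: p.stat().st_mtime)
    if logs:
        tail = logs[-1].read_text(errors="replace").splitlines()[-log_lines:]
        parts.append(f"== Last {len(tail)} lines of {logs[-1].name} ==")
        parts.extend(tail)
    return "\n".join(parts)


if __name__ == "__main__":
    print(help_report("studies/my_study"))
```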
### Log Locations
| Log | Location |
|-----|----------|
| Optimization | Console output or redirect to file |
| NX Solve | `1_setup/model/*.log`, `*.f06` |
| Database | `2_results/study.db` (query with optuna) |
| Intelligence | `2_results/intelligent_optimizer/*.json` |
---
## Cross-References
- **Related**: All operation protocols
- **System**: [SYS_10_IMSO](../system/SYS_10_IMSO.md), [SYS_12_EXTRACTOR_LIBRARY](../system/SYS_12_EXTRACTOR_LIBRARY.md)
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-12-05 | Initial release |