Atomizer/.claude/skills/troubleshoot.md
Anto01 e3bdb08a22 feat: Major update with validators, skills, dashboard, and docs reorganization
- Add validation framework (config, model, results, study validators)
- Add Claude Code skills (create-study, run-optimization, generate-report,
  troubleshoot, analyze-model)
- Add Atomizer Dashboard (React frontend + FastAPI backend)
- Reorganize docs into structured directories (00-09)
- Add neural surrogate modules and training infrastructure
- Add multi-objective optimization support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 19:23:58 -05:00


# Troubleshoot Skill
**Last Updated**: November 25, 2025

**Version**: 1.0 - Debug Common Issues and Error Recovery

You are helping the user diagnose and fix problems with Atomizer optimization studies.
## Purpose
Diagnose and resolve common issues:
1. Configuration validation failures
2. Model file problems
3. NX solver errors
4. Optuna/database issues
5. Result extraction failures
6. Constraint violation patterns
## Triggers
- "troubleshoot"
- "debug"
- "error"
- "failed"
- "not working"
- "what's wrong"
- "fix"
## Prerequisites
- Study must exist in `studies/{study_name}/`
- User should describe the error or symptom
## Diagnostic Process
### Step 1: Run Full Validation
```python
from optimization_engine.validators import validate_study
result = validate_study("{study_name}")
print(result)
```
This provides a complete health check covering:
- Configuration validity
- Model file presence
- Results integrity (if any)
### Step 2: Identify Error Category
Classify the issue into one of these categories:
| Category | Symptoms | First Check |
|----------|----------|-------------|
| Config | "Invalid config", validation errors | `validate_config_file()` |
| Model | "File not found", NX errors | `validate_study_model()` |
| Solver | "Simulation failed", timeout | NX logs, OP2 files |
| Database | "Study not found", lock errors | `study.db` file, Optuna |
| Extraction | "Cannot extract", NaN values | OP2 file validity |
| Constraints | All trials infeasible | Constraint thresholds |
## Common Issues & Solutions
### Issue 1: Configuration Validation Fails
**Symptoms**:
```
[ERROR] [DESIGN_VAR_BOUNDS] beam_thickness: min (5) >= max (3)
```
**Diagnosis**:
```python
from optimization_engine.validators import validate_config_file
result = validate_config_file("studies/{study_name}/1_setup/optimization_config.json")
for error in result.errors:
    print(error)
```
**Solutions**:
| Error Code | Cause | Fix |
|------------|-------|-----|
| DESIGN_VAR_BOUNDS | Bounds inverted | Swap min/max values |
| MISSING_OBJECTIVES | No objectives defined | Add objectives array |
| INVALID_DIRECTION | Wrong goal value | Use "minimize" or "maximize" |
| PROTOCOL_MISMATCH | Wrong protocol for objectives | Match protocol to number of objectives |
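
Three of these codes correspond to checks simple enough to run by hand on the parsed config. This is a sketch only: `precheck_config` is a hypothetical helper, and the keys it reads (`design_variables` entries with `name`/`min`/`max`, `objectives` entries with `name`/`goal`) are assumptions about the layout of `optimization_config.json`. `PROTOCOL_MISMATCH` is left to the real validator.

```python
from typing import List


def precheck_config(config: dict) -> List[str]:
    """Return '[CODE] detail' messages for obvious config mistakes.

    Hypothetical helper: the config keys used here are assumed, not taken
    from the documented optimization_config.json schema.
    """
    errors = []
    # DESIGN_VAR_BOUNDS: lower bound must be strictly below upper bound
    for var in config.get("design_variables", []):
        lo, hi = var.get("min"), var.get("max")
        if lo is not None and hi is not None and lo >= hi:
            errors.append(f"[DESIGN_VAR_BOUNDS] {var.get('name')}: min ({lo}) >= max ({hi})")
    # MISSING_OBJECTIVES: at least one objective is required
    objectives = config.get("objectives", [])
    if not objectives:
        errors.append("[MISSING_OBJECTIVES] no objectives defined")
    # INVALID_DIRECTION: goal must be exactly "minimize" or "maximize"
    for obj in objectives:
        if obj.get("goal") not in ("minimize", "maximize"):
            errors.append(f"[INVALID_DIRECTION] {obj.get('name')}: goal={obj.get('goal')!r}")
    return errors


example = {
    "design_variables": [{"name": "beam_thickness", "min": 5, "max": 3}],
    "objectives": [{"name": "mass", "goal": "minimise"}],
}
for message in precheck_config(example):
    print(message)
```

The bounds message mirrors the symptom shown above, so a hit here points straight to the "Swap min/max values" fix.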
### Issue 2: Model Files Missing
**Symptoms**:
```
[ERROR] No part file (.prt) found in model directory
```
**Diagnosis**:
```python
from optimization_engine.validators import validate_study_model
result = validate_study_model("{study_name}")
print(f"Part: {result.prt_file}")
print(f"Sim: {result.sim_file}")
print(f"FEM: {result.fem_file}")
```
**Solutions**:
1. Ensure files are in `studies/{study_name}/1_setup/model/`
2. Check file naming convention (e.g., `Beam.prt`, `Beam_sim1.sim`)
3. FEM file auto-generates on first solve (not required initially)
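
If the validator cannot find the files, a quick glob over the model directory shows what is actually there. A minimal sketch: `find_model_files` is hypothetical and only checks file extensions, not Atomizer's naming convention. Sorting puts `Beam.prt` ahead of NX companion parts such as an idealized `Beam_fem1_i.prt`, because `.` sorts before `_`.

```python
from pathlib import Path
from typing import Dict, Optional


def find_model_files(model_dir: str) -> Dict[str, Optional[Path]]:
    """Return the first .prt, .sim and .fem file found in model_dir (or None).

    Hypothetical helper: matches by extension only.
    """
    root = Path(model_dir)
    found = {}
    for ext in ("prt", "sim", "fem"):
        matches = sorted(root.glob(f"*.{ext}"))
        found[ext] = matches[0] if matches else None
    return found


# Usage (path follows the layout described above):
# files = find_model_files("studies/my_study/1_setup/model")
# if files["prt"] is None: print("No part file (.prt) found")
# A missing .fem is expected before the first solve.
```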
### Issue 3: NX Solver Fails
**Symptoms**:
```
[NX SOLVER] Error: Simulation timeout after 600s
[NX SOLVER] Error: Unable to open simulation file
```
**Diagnosis**:
1. Check NX is installed and configured:
```python
from config import NX_VERSION, NX_INSTALL_PATH
print(f"NX Version: {NX_VERSION}")
print(f"Path: {NX_INSTALL_PATH}")
```
2. Check for running NX processes:
```bash
tasklist | findstr "ugraf"
```
3. Inspect the NX solve journal generated for the study:
```
studies/{study_name}/1_setup/model/_temp_solve_journal.py
```
**Solutions**:
| Error | Cause | Fix |
|-------|-------|-----|
| Timeout | Complex mesh or bad parameters | Increase timeout or simplify design |
| License error | NX license unavailable | Wait or check license server |
| File locked | Another NX process has file open | Close NX and retry |
| Expression not found | NX expression name mismatch | Verify expression names in NX |
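
Timeouts are easier to reason about when the solver launch is wrapped explicitly. The sketch below uses Python's `subprocess.run(..., timeout=...)`; the function name `run_solver` and the assumption that the solve is started as an external command are illustrative, so treat it as a pattern for timeout handling rather than Atomizer's own NX launcher.

```python
import subprocess
from typing import List, Tuple


def run_solver(cmd: List[str], timeout_s: float) -> Tuple[bool, str]:
    """Run a solver command and report timeouts or non-zero exits.

    Hypothetical wrapper; Atomizer's actual NX launch mechanism may differ.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except FileNotFoundError:
        return False, f"Executable not found: {cmd[0]}"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the launched process before re-raising
        return False, f"Simulation timeout after {timeout_s:g}s"
    if proc.returncode != 0:
        return False, f"Solver exited with code {proc.returncode}: {proc.stderr.strip()}"
    return True, proc.stdout
```

Raising `timeout_s` corresponds to the "Increase timeout" fix. Only the directly launched process is killed on timeout, so any child processes it started can keep running and hold files open; check for stray `ugraf` processes as in step 2.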
### Issue 4: Database Errors
**Symptoms**:
```
[ERROR] Study 'my_study' not found in storage
[ERROR] database is locked
```
**Diagnosis**:
```python
import optuna
storage = "sqlite:///studies/{study_name}/2_results/study.db"
studies = optuna.study.get_all_study_summaries(storage)
print([s.study_name for s in studies])
```
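
When the error is "Study not found", the stored name often differs from the requested one only by a typo. A small sketch using the standard-library `difflib.get_close_matches`; `suggest_study_name` is a hypothetical helper, and the list of stored names is assumed to come from the summaries printed above.

```python
import difflib
from typing import List


def suggest_study_name(requested: str, available: List[str]) -> List[str]:
    """Return stored study names that closely match the requested one."""
    if requested in available:
        return [requested]
    # Up to three candidates with a similarity ratio of at least 0.6
    return difflib.get_close_matches(requested, available, n=3, cutoff=0.6)


# available = [s.study_name for s in optuna.study.get_all_study_summaries(storage)]
print(suggest_study_name("beam_optimisation", ["beam_optimization", "plate_test"]))
```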
**Solutions**:
| Error | Cause | Fix |
|-------|-------|-----|
| Study not found | Wrong study name | Check exact name in database |
| Database locked | Multiple processes | Kill other optimization processes |
| Corrupted DB | Interrupted write | Back up `study.db`, then delete and restart |
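
To tell whether a "database is locked" error is still current before killing anything, you can try to take SQLite's write lock briefly. A minimal sketch using the standard-library `sqlite3` module; `is_db_locked` is a hypothetical helper and assumes the `study.db` location used elsewhere in this skill. The check itself holds the write lock for an instant, so run it when you believe the optimizer has stopped.

```python
import sqlite3
from pathlib import Path


def is_db_locked(db_path: str, wait_s: float = 0.5) -> bool:
    """Return True if another connection currently blocks writes to db_path."""
    if not Path(db_path).exists():
        raise FileNotFoundError(db_path)
    # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves
    conn = sqlite3.connect(db_path, timeout=wait_s, isolation_level=None)
    try:
        # BEGIN IMMEDIATE needs the write lock; it fails if another writer holds it
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("ROLLBACK")
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc):
            return True
        raise
    finally:
        conn.close()


# print(is_db_locked("studies/my_study/2_results/study.db"))
```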
### Issue 5: Result Extraction Fails
**Symptoms**:
```
[ERROR] Cannot extract displacement from OP2
[ERROR] NaN values in objectives
```
**Diagnosis**:
1. Check OP2 file exists:
```bash
dir studies\{study_name}\1_setup\model\*.op2
```
2. Validate OP2 contents:
```python
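from pyNastran.op2.op2 import OP2
op2 = OP2()
op2.read_op2("path/to/file.op2")
# Summary of every result table read from the file
print(op2.get_op2_stats())
# Displacement results are keyed by subcase ID
print(list(op2.displacements.keys()))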
```
3. Check extraction config matches OP2:
```json
{
  "extraction": {
    "params": {
      "subcase": 1,
      "result_type": "displacement"
    }
  }
}
```
**Solutions**:
| Error | Cause | Fix |
|-------|-------|-----|
| No OP2 file | Solve didn't run | Check NX solver output |
| Wrong subcase | Subcase ID mismatch | Match subcase to solution |
| Missing result | Result not requested | Enable output in NX |
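
The subcase and NaN checks can be run together once you know which subcases the OP2 file contains (with pyNastran, `op2.displacements` is a dict whose keys are normally the subcase IDs). A sketch with a hypothetical `check_extraction` helper; it assumes the `extraction.params.subcase` layout shown in step 3 and that objective values are plain floats.

```python
import math
from typing import Iterable, List


def check_extraction(config: dict, available_subcases: Iterable[int],
                     objective_values: Iterable[float]) -> List[str]:
    """List likely extraction problems. Hypothetical helper, not part of Atomizer."""
    problems = []
    params = config.get("extraction", {}).get("params", {})
    subcase = params.get("subcase")
    available = sorted(available_subcases)
    # "Wrong subcase": configured ID is not present in the OP2 results
    if subcase not in available:
        problems.append(f"Subcase {subcase} not in OP2 (available: {available})")
    # NaN objectives usually mean the extraction returned nothing usable
    if any(math.isnan(v) for v in objective_values):
        problems.append("NaN values in objectives")
    return problems


config = {"extraction": {"params": {"subcase": 1, "result_type": "displacement"}}}
# available_subcases would be op2.displacements.keys() in practice
print(check_extraction(config, available_subcases=[2], objective_values=[12.5, float("nan")]))
```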
### Issue 6: All Trials Infeasible
**Symptoms**:
```
Feasibility rate: 0%
All trials violating constraints
```
**Diagnosis**:
```python
from optimization_engine.validators import validate_results
result = validate_results("studies/{study_name}/2_results/study.db")
print(f"Feasibility: {result.info.feasibility_rate}%")
```
Check constraint violations in the Optuna dashboard, or in Python:
```python
import optuna
study = optuna.load_study(...)
for trial in study.trials:
    # Trials that the objective function flagged as infeasible
    if trial.user_attrs.get('feasible') is False:
        print(f"Trial {trial.number}: {trial.user_attrs.get('violated_constraints')}")
```
**Solutions**:
| Issue | Cause | Fix |
|-------|-------|-----|
| All trials fail constraint | Threshold too tight | Relax constraint threshold |
| Single constraint always fails | Wrong extraction | Check constraint extraction |
| Bounds cause violations | Design space infeasible | Expand design variable bounds |
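
Counting which constraints are violated most often helps separate the rows of this table: one constraint accounting for nearly every violation may point to its extraction, while violations spread across many constraints suggest thresholds or bounds. A sketch over the `user_attrs` dictionaries read in the diagnosis loop above; `summarize_feasibility` is hypothetical, and it assumes `violated_constraints` holds a list of constraint names.

```python
from collections import Counter
from typing import Dict, List


def summarize_feasibility(trial_attrs: List[Dict]) -> Dict:
    """Feasibility rate (%) and violation counts per constraint name."""
    total = len(trial_attrs)
    # Only trials explicitly flagged feasible=False count as infeasible
    infeasible = [a for a in trial_attrs if a.get("feasible") is False]
    # Tally each violated constraint across all infeasible trials
    counts = Counter(
        name for a in infeasible for name in a.get("violated_constraints", [])
    )
    rate = 100.0 * (total - len(infeasible)) / total if total else 0.0
    return {"feasibility_rate": rate, "violations": counts.most_common()}


attrs = [
    {"feasible": False, "violated_constraints": ["max_stress"]},
    {"feasible": False, "violated_constraints": ["max_stress", "max_disp"]},
    {"feasible": True},
    {"feasible": False, "violated_constraints": ["max_stress"]},
]
print(summarize_feasibility(attrs))
```

With a loaded study you would pass `[t.user_attrs for t in study.trials]`.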
## Quick Diagnostic Commands
### Validate Everything
```bash
python -m optimization_engine.validators.study_validator {study_name}
```
### Check Results
```bash
python -m optimization_engine.validators.results_validator {study_name}
```
### List All Studies
```bash
python -m optimization_engine.validators.study_validator
```
### Check Optuna Database
```python
import optuna
storage = "sqlite:///studies/{study_name}/2_results/study.db"
study = optuna.load_study(study_name="{study_name}", storage=storage)
print(f"Trials: {len(study.trials)}")
print(f"Completed: {len([t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE])}")
print(f"Failed: {len([t for t in study.trials if t.state == optuna.trial.TrialState.FAIL])}")
```
## Recovery Actions
### Reset Study Database
```python
import optuna
storage = "sqlite:///studies/{study_name}/2_results/study.db"
optuna.delete_study(study_name="{study_name}", storage=storage)
```
Or use the reset script:
```bash
python studies/{study_name}/reset_study.py
```
### Resume Interrupted Study
```bash
python studies/{study_name}/run_optimization.py --trials 30 --resume
```
### Clean Worker Directories
```bash
# Remove temp files from worker dirs
del /S /Q studies\{study_name}\1_setup\worker_*
```
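
`del` removes files but leaves directories in place. If the goal is to remove the worker directories entirely, a cross-platform sketch with `shutil.rmtree` follows; `clean_worker_dirs` is hypothetical and assumes the `worker_*` folders under `1_setup` hold only disposable files and that no optimization is running. It defaults to a dry run that only lists what it would delete.

```python
import shutil
from pathlib import Path
from typing import List


def clean_worker_dirs(setup_dir: str, dry_run: bool = True) -> List[Path]:
    """Return (and, unless dry_run, delete) the worker_* directories in setup_dir."""
    targets = sorted(p for p in Path(setup_dir).glob("worker_*") if p.is_dir())
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets


# Preview first, then delete:
# clean_worker_dirs(r"studies\my_study\1_setup")
# clean_worker_dirs(r"studies\my_study\1_setup", dry_run=False)
```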
### Backup and Restore Database
```bash
# Backup
copy studies\{study_name}\2_results\study.db studies\{study_name}\2_results\study_backup.db
# Restore
copy studies\{study_name}\2_results\study_backup.db studies\{study_name}\2_results\study.db
```
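
Copying `study.db` while an optimizer is writing to it can capture a half-written file. Python's `sqlite3` module exposes SQLite's online backup API (`Connection.backup`, Python 3.7+), which produces a consistent copy; the wrapper name `backup_sqlite` is hypothetical.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(src_path: str, dst_path: str) -> None:
    """Copy an SQLite database using the online backup API."""
    # sqlite3.connect would silently create an empty DB for a missing source
    if not Path(src_path).exists():
        raise FileNotFoundError(src_path)
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


# backup_sqlite("studies/my_study/2_results/study.db",
#               "studies/my_study/2_results/study_backup.db")
```

Restoring works the same way with the arguments swapped, but only once no optimization process is using `study.db`.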
## Error Message Reference
| Error Message | Category | Quick Fix |
|---------------|----------|-----------|
| "min >= max" | Config | Swap bounds |
| "No part file found" | Model | Add .prt file |
| "Simulation timeout" | Solver | Increase timeout or check NX |
| "database is locked" | Database | Kill other processes |
| "Cannot extract" | Extraction | Check OP2 and extraction config |
| "All trials infeasible" | Constraints | Relax thresholds |
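
The table can also be applied mechanically to a pasted log line to pick the right Issue section. A sketch; `classify_error` is hypothetical, the patterns are paraphrased from the table, and the bounds pattern is loosened so it also matches the real form `min (5) >= max (3)` shown in Issue 1.

```python
import re
from typing import Optional, Tuple

# (regex, category, quick fix), paraphrased from the reference table
ERROR_PATTERNS = [
    (r"min\b.*>=\s*max", "Config", "Swap bounds"),
    (r"No part file", "Model", "Add .prt file"),
    (r"Simulation timeout", "Solver", "Increase timeout or check NX"),
    (r"database is locked", "Database", "Kill other processes"),
    (r"Cannot extract", "Extraction", "Check OP2 and extraction config"),
    (r"infeasible|Feasibility rate: 0%", "Constraints", "Relax thresholds"),
]


def classify_error(message: str) -> Optional[Tuple[str, str]]:
    """Return (category, quick fix) for the first matching pattern, else None."""
    for pattern, category, fix in ERROR_PATTERNS:
        if re.search(pattern, message, flags=re.IGNORECASE):
            return category, fix
    return None


print(classify_error("[NX SOLVER] Error: Simulation timeout after 600s"))
```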
## Integration with Other Skills
- After fixing config -> `/run-optimization` to restart
- After debugging results -> `/generate-report` to analyze
- If model needs changes -> Check `NX_FILE_MODIFICATIONS_REQUIRED.md`
## Example Conversation
````
USER: My optimization is failing with "Cannot extract displacement"
CLAUDE: Let me diagnose the displacement extraction issue.
First, I'll validate the study:
[Runs validate_study()]
The validation shows model files are present. Let me check if the OP2 file exists
and contains displacement data...
[Checks OP2 file]
I found the issue: Your extraction config specifies subcase=1, but the displacement
results are in subcase=2 (the static analysis solution).
Fix: Update optimization_config.json, changing the subcase from 1 to 2:
```json
{
  "extraction": {
    "params": {
      "subcase": 2
    }
  }
}
```
After this change, run again with:
python run_optimization.py --trials 30 --resume
````
## Notes
- Always run full validation before investigating specific issues
- Most problems are configuration or file path issues
- NX solver errors often require checking NX directly
- Keep backups before modifying study.db
- Use `--resume` to continue after fixing issues