Atomizer/.claude/skills/troubleshoot.md

# Troubleshoot Skill

**Last Updated**: November 25, 2025
**Version**: 1.0 - Debug Common Issues and Error Recovery

You are helping the user diagnose and fix problems with Atomizer optimization studies.

## Purpose

Diagnose and resolve common issues:
1. Configuration validation failures
2. Model file problems
3. NX solver errors
4. Optuna/database issues
5. Result extraction failures
6. Constraint violation patterns

## Triggers

- "troubleshoot"
- "debug"
- "error"
- "failed"
- "not working"
- "what's wrong"
- "fix"

## Prerequisites

- Study must exist in `studies/{study_name}/`
- User should describe the error or symptom

## Diagnostic Process

### Step 1: Run Full Validation

```python
from optimization_engine.validators import validate_study

result = validate_study("{study_name}")
print(result)
```

This provides a complete health check covering:
- Configuration validity
- Model file presence
- Results integrity (if any)

### Step 2: Identify Error Category

Classify the issue into one of these categories:

| Category | Symptoms | First Check |
|----------|----------|-------------|
| Config | "Invalid config", validation errors | `validate_config_file()` |
| Model | "File not found", NX errors | `validate_study_model()` |
| Solver | "Simulation failed", timeout | NX logs, OP2 files |
| Database | "Study not found", lock errors | `study.db` file, Optuna |
| Extraction | "Cannot extract", NaN values | OP2 file validity |
| Constraints | All trials infeasible | Constraint thresholds |
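
The classification in the table above can be sketched as a simple keyword lookup. This is a minimal illustration, not Atomizer's actual logic: `classify_error` and the keyword lists are hypothetical, chosen from the symptom strings quoted in this document.

```python
# Hypothetical helper: keyword lists are illustrative and taken from the
# symptom strings in this document, not from Atomizer's code.
CATEGORY_KEYWORDS = [
    ("Config", ("invalid config", "design_var_bounds", "missing_objectives")),
    ("Model", ("file not found", "no part file")),
    ("Solver", ("simulation failed", "simulation timeout", "nx solver")),
    ("Database", ("not found in storage", "database is locked")),
    ("Extraction", ("cannot extract", "nan values")),
    ("Constraints", ("infeasible", "violating constraints")),
]


def classify_error(message: str) -> str:
    """Return the first category whose keywords appear in the message."""
    text = message.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return category
    return "Unknown"
```

A message that matches no keyword returns `"Unknown"`, which is a cue to fall back to the full validation in Step 1.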

## Common Issues & Solutions

### Issue 1: Configuration Validation Fails

**Symptoms**:
```
[ERROR] [DESIGN_VAR_BOUNDS] beam_thickness: min (5) >= max (3)
```

**Diagnosis**:
```python
from optimization_engine.validators import validate_config_file

result = validate_config_file("studies/{study_name}/1_setup/optimization_config.json")
for error in result.errors:
    print(error)
```

**Solutions**:
| Error Code | Cause | Fix |
|------------|-------|-----|
| DESIGN_VAR_BOUNDS | Bounds inverted | Swap min/max values |
| MISSING_OBJECTIVES | No objectives defined | Add objectives array |
| INVALID_DIRECTION | Wrong goal value | Use "minimize" or "maximize" |
| PROTOCOL_MISMATCH | Wrong protocol for objectives | Match protocol to the number of objectives |
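
To illustrate what a `DESIGN_VAR_BOUNDS` error means, the sketch below scans a config for inverted bounds. It assumes a hypothetical schema in which `design_variables` is a list of objects with `name`, `min`, and `max` keys; the real `optimization_config.json` layout may differ, and `find_inverted_bounds` is not part of Atomizer.

```python
import json


def find_inverted_bounds(config: dict) -> list:
    """Return names of design variables whose min is not below max."""
    problems = []
    # Assumed schema: a list of {"name", "min", "max"} objects.
    for var in config.get("design_variables", []):
        if var["min"] >= var["max"]:
            problems.append(var["name"])
    return problems


example = json.loads("""
{"design_variables": [
  {"name": "beam_thickness", "min": 5, "max": 3},
  {"name": "beam_width", "min": 10, "max": 40}
]}
""")
print(find_inverted_bounds(example))  # ['beam_thickness']
```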

### Issue 2: Model Files Missing

**Symptoms**:
```
[ERROR] No part file (.prt) found in model directory
```

**Diagnosis**:
```python
from optimization_engine.validators import validate_study_model

result = validate_study_model("{study_name}")
print(f"Part: {result.prt_file}")
print(f"Sim: {result.sim_file}")
print(f"FEM: {result.fem_file}")
```

**Solutions**:
1. Ensure files are in `studies/{study_name}/1_setup/model/`
2. Check file naming convention (e.g., `Beam.prt`, `Beam_sim1.sim`)
3. FEM file auto-generates on first solve (not required initially)
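
If the validator is unavailable, the same presence check can be approximated with the standard library. This is a rough sketch: `find_model_files` is a hypothetical helper, and it checks only file extensions, not the naming convention.

```python
from pathlib import Path


def find_model_files(model_dir) -> dict:
    """Map each expected extension to the matching file names in model_dir."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        raise FileNotFoundError(f"Model directory not found: {model_dir}")
    return {
        ext: sorted(p.name for p in model_dir.glob(f"*{ext}"))
        for ext in (".prt", ".sim", ".fem")
    }
```

Calling `find_model_files("studies/{study_name}/1_setup/model")` returns a dictionary keyed by extension. An empty `.fem` list is expected before the first solve.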

### Issue 3: NX Solver Fails

**Symptoms**:
```
[NX SOLVER] Error: Simulation timeout after 600s
[NX SOLVER] Error: Unable to open simulation file
```

**Diagnosis**:
1. Check NX is installed and configured:
   ```python
   from config import NX_VERSION, NX_INSTALL_PATH
   print(f"NX Version: {NX_VERSION}")
   print(f"Path: {NX_INSTALL_PATH}")
   ```

2. Check for running NX processes:
   ```bash
   tasklist | findstr "ugraf"
   ```

3. Inspect the generated NX solve journal:
   ```
   studies/{study_name}/1_setup/model/_temp_solve_journal.py
   ```

**Solutions**:
| Error | Cause | Fix |
|-------|-------|-----|
| Timeout | Complex mesh or bad parameters | Increase timeout or simplify design |
| License error | NX license unavailable | Wait or check license server |
| File locked | Another NX process has file open | Close NX and retry |
| Expression not found | NX expression name mismatch | Verify expression names in NX |

### Issue 4: Database Errors

**Symptoms**:
```
[ERROR] Study 'my_study' not found in storage
[ERROR] database is locked
```

**Diagnosis**:
```python
import optuna
storage = "sqlite:///studies/{study_name}/2_results/study.db"
studies = optuna.study.get_all_study_summaries(storage)
print([s.study_name for s in studies])
```

**Solutions**:
| Error | Cause | Fix |
|-------|-------|-----|
| Study not found | Wrong study name | Check exact name in database |
| Database locked | Multiple processes | Kill other optimization processes |
| Corrupted DB | Interrupted write | Back up the file, then delete it and restart |
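
Before deleting a database suspected of corruption, it can help to ask SQLite itself. The sketch below runs SQLite's built-in `PRAGMA integrity_check` through Python's standard `sqlite3` module. `check_sqlite_integrity` is a hypothetical helper, and it assumes `study.db` is an ordinary SQLite file, as the `sqlite:///` storage URL suggests.

```python
import sqlite3
from pathlib import Path


def check_sqlite_integrity(db_path) -> list:
    """Run SQLite's integrity check; a healthy file yields ['ok']."""
    # Open read-only through a file URI so the check cannot create or modify the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Any result other than `['ok']`, or a `sqlite3.DatabaseError` raised while opening the file, suggests the database should be restored from a backup.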

### Issue 5: Result Extraction Fails

**Symptoms**:
```
[ERROR] Cannot extract displacement from OP2
[ERROR] NaN values in objectives
```

**Diagnosis**:
1. Check OP2 file exists:
   ```bash
   dir studies\{study_name}\1_setup\model\*.op2
   ```

2. Validate OP2 contents:
   ```python
   from pyNastran.op2.op2 import OP2
   op2 = OP2()
   op2.read_op2("path/to/file.op2")
   print(op2.get_op2_stats(short=True))  # summary of the result tables in the file
   ```

3. Check extraction config matches OP2:
   ```json
   {
     "extraction": {
       "params": {
         "subcase": 1,
         "result_type": "displacement"
       }
     }
   }
   ```

**Solutions**:
| Error | Cause | Fix |
|-------|-------|-----|
| No OP2 file | Solve didn't run | Check NX solver output |
| Wrong subcase | Subcase ID mismatch | Match subcase to solution |
| Missing result | Result not requested | Enable output in NX |
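
A "NaN values in objectives" error can be narrowed down by checking which extracted values are unusable. The following is a minimal sketch; `find_invalid_objectives` and the objective names are hypothetical, and it assumes objectives are available as a name-to-value mapping.

```python
import math


def find_invalid_objectives(objectives: dict) -> list:
    """Return names of objectives whose value is missing, NaN, or infinite."""
    invalid = []
    for name, value in objectives.items():
        if value is None or not math.isfinite(value):
            invalid.append(name)
    return invalid


# Hypothetical objective names for illustration.
print(find_invalid_objectives({"mass": 1.2, "max_displacement": float("nan")}))
# ['max_displacement']
```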

### Issue 6: All Trials Infeasible

**Symptoms**:
```
Feasibility rate: 0%
All trials violating constraints
```

**Diagnosis**:
```python
from optimization_engine.validators import validate_results

result = validate_results("studies/{study_name}/2_results/study.db")
print(f"Feasibility: {result.info.feasibility_rate}%")
```

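Check constraint violations in the Optuna dashboard, or list them per trial:
```python
import optuna

storage = "sqlite:///studies/{study_name}/2_results/study.db"
study = optuna.load_study(study_name="{study_name}", storage=storage)
for trial in study.trials:
    # `feasible` and `violated_constraints` are user attributes stored per trial
    if trial.user_attrs.get("feasible") is False:
        print(f"Trial {trial.number}: {trial.user_attrs.get('violated_constraints')}")
```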

**Solutions**:
| Issue | Cause | Fix |
|-------|-------|-----|
| All trials fail constraint | Threshold too tight | Relax constraint threshold |
| Single constraint always fails | Wrong extraction | Check constraint extraction |
| Bounds cause violations | Design space infeasible | Expand design variable bounds |
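
To tell "every constraint is too tight" apart from "one constraint always fails", the violations can be tallied per constraint. This sketch assumes each trial's `violated_constraints` attribute is a list of constraint names (the source shows only the key, not its type). `tally_violations` and the constraint names are hypothetical.

```python
from collections import Counter


def tally_violations(trial_attrs: list) -> Counter:
    """Count how often each constraint appears among infeasible trials."""
    counts = Counter()
    for attrs in trial_attrs:
        if attrs.get("feasible") is False:
            # Assumption: the attribute holds a list of constraint names.
            counts.update(attrs.get("violated_constraints", []))
    return counts


attrs = [
    {"feasible": False, "violated_constraints": ["max_stress"]},
    {"feasible": False, "violated_constraints": ["max_stress", "max_displacement"]},
    {"feasible": True},
]
print(tally_violations(attrs).most_common())
# [('max_stress', 2), ('max_displacement', 1)]
```

With a loaded study, the input would be `[t.user_attrs for t in study.trials]`. A single dominant constraint points to its threshold or extraction rather than to the design space as a whole.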

## Quick Diagnostic Commands

### Validate Everything
```bash
python -m optimization_engine.validators.study_validator {study_name}
```

### Check Results
```bash
python -m optimization_engine.validators.results_validator {study_name}
```

### List All Studies
```bash
python -m optimization_engine.validators.study_validator
```

### Check Optuna Database
```python
import optuna
storage = "sqlite:///studies/{study_name}/2_results/study.db"
study = optuna.load_study(study_name="{study_name}", storage=storage)
print(f"Trials: {len(study.trials)}")
print(f"Completed: {len([t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE])}")
print(f"Failed: {len([t for t in study.trials if t.state == optuna.trial.TrialState.FAIL])}")
```

## Recovery Actions

### Reset Study Database

Deleting a study permanently removes all of its trials, so back up `study.db` first.

```python
import optuna
storage = "sqlite:///studies/{study_name}/2_results/study.db"
optuna.delete_study(study_name="{study_name}", storage=storage)
```

Or use the reset script:
```bash
python studies/{study_name}/reset_study.py
```

### Resume Interrupted Study
```bash
python studies/{study_name}/run_optimization.py --trials 30 --resume
```

### Clean Worker Directories
```bash
# Remove temp files from each worker dir (use %%D instead of %D inside a .bat file)
for /d %D in (studies\{study_name}\1_setup\worker_*) do del /S /Q "%D\*"
```

### Backup and Restore Database
```bash
# Backup
copy studies\{study_name}\2_results\study.db studies\{study_name}\2_results\study_backup.db

# Restore
copy studies\{study_name}\2_results\study_backup.db studies\{study_name}\2_results\study.db
```
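
A plain file copy can capture a half-written database if an optimization is still running. As an alternative sketch, Python's `sqlite3` module provides an online backup API (`Connection.backup`, Python 3.7+) that copies a consistent snapshot. `backup_study_db` and the timestamped file name are hypothetical conventions, not part of Atomizer.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_study_db(db_path) -> Path:
    """Copy a SQLite database to a timestamped sibling file and return its path."""
    src_path = Path(db_path)
    if not src_path.is_file():
        raise FileNotFoundError(f"Database not found: {src_path}")
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dst_path = src_path.with_name(f"{src_path.stem}_backup_{stamp}.db")
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # copies a consistent snapshot of the source
    finally:
        dst.close()
        src.close()
    return dst_path
```

To restore, copy the backup file back over `study.db` once no optimization process is using it.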

## Error Message Reference

| Error Message | Category | Quick Fix |
|---------------|----------|-----------|
| "min >= max" | Config | Swap bounds |
| "No part file found" | Model | Add .prt file |
| "Simulation timeout" | Solver | Increase timeout or check NX |
| "database is locked" | Database | Kill other processes |
| "Cannot extract" | Extraction | Check OP2 and extraction config |
| "All trials infeasible" | Constraints | Relax thresholds |

## Integration with Other Skills

- After fixing config -> `/run-optimization` to restart
- After debugging results -> `/generate-report` to analyze
- If model needs changes -> Check `NX_FILE_MODIFICATIONS_REQUIRED.md`

## Example Conversation

````
USER: My optimization is failing with "Cannot extract displacement"

CLAUDE: Let me diagnose the displacement extraction issue.

First, I'll validate the study:

[Runs validate_study()]

The validation shows model files are present. Let me check if the OP2 file exists
and contains displacement data...

[Checks OP2 file]

I found the issue: Your extraction config specifies subcase=1, but the displacement
results are in subcase=2 (the static analysis solution).

Fix: In optimization_config.json, change "subcase" from 1 to 2:
```json
{
  "extraction": {
    "params": {
      "subcase": 2
    }
  }
}
```

After this change, run again with:
  python run_optimization.py --trials 30 --resume
````

## Notes

- Always run full validation before investigating specific issues
- Most problems are configuration or file path issues
- NX solver errors often require checking NX directly
- Keep backups before modifying study.db
- Use --resume to continue after fixing issues