feat: Add TrialManager and DashboardDB for unified trial management

- Add TrialManager (trial_manager.py) for consistent trial_NNNN naming
- Add DashboardDB (dashboard_db.py) for Optuna-compatible database schema
- Update CLAUDE.md with trial management documentation
- Update ATOMIZER_CONTEXT.md with v1.8 trial system
- Update cheatsheet v2.2 with new utilities
- Update SYS_14 protocol to v2.3 with TrialManager integration
- Add LAC learnings for trial management patterns
- Add archive/README.md for deprecated code policy

Key principles:
- Trial numbers NEVER reset (monotonic)
- Folders NEVER get overwritten
- Database always synced with filesystem
- Surrogate predictions are NOT trials (only FEA results)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Commit cf454f6e40 (parent f13563d7ab), 2025-12-28 12:20:19 -05:00
10 changed files with 1402 additions and 9 deletions

@@ -676,10 +676,287 @@ optimization_engine/
---
## Self-Improving Turbo Optimization
### Overview
The **Self-Improving Turbo** pattern combines MLP surrogate exploration with iterative FEA validation and surrogate retraining. This creates a closed-loop optimization where the surrogate continuously improves from its own mistakes.
### Workflow
```
INITIALIZE:
  - Load pre-trained surrogate (from prior FEA data)
  - Load previous FEA params for diversity checking

REPEAT until converged or FEA budget exhausted:

  1. SURROGATE EXPLORE (~1 min)
     ├─ Run 5000 Optuna TPE trials with surrogate
     ├─ Quantize predictions to machining precision
     └─ Find diverse top candidates

  2. SELECT DIVERSE CANDIDATES
     ├─ Sort by weighted sum
     ├─ Select top 5 that are:
     │    ├─ At least 15% different from each other
     │    └─ At least 7.5% different from ALL previous FEA
     └─ Ensures exploration, not just exploitation

  3. FEA VALIDATE (~25 min for 5 candidates)
     ├─ For each candidate:
     │    ├─ Create iteration folder
     │    ├─ Update NX expressions
     │    ├─ Run Nastran solver
     │    ├─ Extract objectives (ZernikeOPD or other)
     │    └─ Log prediction error
     └─ Add results to training data

  4. RETRAIN SURROGATE (~2 min)
     ├─ Combine all FEA samples
     ├─ Retrain MLP for 100 epochs
     ├─ Save new checkpoint
     └─ Reload improved model

  5. CHECK CONVERGENCE
     ├─ Track best feasible objective
     ├─ If improved: reset patience counter
     └─ If no improvement for 3 iterations: STOP
```
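The diversity filter in step 2 can be sketched as follows. The function names and the mean-absolute-distance normalization are illustrative assumptions, not the project's actual implementation; the 15%/7.5% thresholds come from the workflow above.

```python
import numpy as np

def normalized_distance(a, b, lo, hi):
    """Mean absolute difference, expressed as a fraction of each parameter's range."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    span = np.asarray(hi, float) - np.asarray(lo, float)
    return float(np.mean(np.abs(a - b) / span))

def select_diverse(candidates, previous_fea, lo, hi,
                   n_select=5, min_dist=0.15, min_dist_fea=0.075):
    """Walk candidates (already sorted by weighted sum) and keep those that are
    >= min_dist from every already-selected candidate and >= min_dist_fea from
    ALL previous FEA-validated points."""
    selected = []
    for cand in candidates:
        if len(selected) >= n_select:
            break
        if any(normalized_distance(cand, s, lo, hi) < min_dist for s in selected):
            continue  # too close to something already picked this batch
        if any(normalized_distance(cand, p, lo, hi) < min_dist_fea for p in previous_fea):
            continue  # already explored by a prior FEA run
        selected.append(cand)
    return selected
```

Greedily walking a pre-sorted list means the best candidate is always kept, and diversity is enforced only against better-ranked picks.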
### Configuration Example
```json
{
  "turbo_settings": {
    "surrogate_trials_per_iteration": 5000,
    "fea_validations_per_iteration": 5,
    "max_fea_validations": 100,
    "max_iterations": 30,
    "convergence_patience": 3,
    "retrain_frequency": "every_iteration",
    "min_samples_for_retrain": 20
  }
}
```
### Key Parameters
| Parameter | Typical Value | Description |
|-----------|---------------|-------------|
| `surrogate_trials_per_iteration` | 5000 | NN trials per iteration |
| `fea_validations_per_iteration` | 5 | FEA runs per iteration |
| `max_fea_validations` | 100 | Total FEA budget |
| `convergence_patience` | 3 | Stop after N no-improvement iterations |
| `MIN_CANDIDATE_DISTANCE` | 0.15 | 15% of param range for diversity |
### Example Results (M1 Mirror Turbo V1)
| Metric | Value |
|--------|-------|
| FEA Validations | 45 |
| Best WS Found | 282.05 |
| Baseline (V11) | 284.19 |
| Improvement | 0.75% |
---
## Dashboard Integration for Neural Studies
### Problem
Neural surrogate studies generate thousands of NN-only trials that would overwhelm the dashboard. Only FEA-validated trials should be visible.
### Solution: Separate Optuna Study
Log FEA validation results to a separate Optuna study that the dashboard can read:
```python
import optuna

# Create Optuna study for dashboard visibility
optuna_db_path = RESULTS_DIR / "study.db"
optuna_storage = f"sqlite:///{optuna_db_path}"
optuna_study = optuna.create_study(
    study_name=study_name,
    storage=optuna_storage,
    direction="minimize",
    load_if_exists=True,
)

# After each FEA validation:
trial = optuna_study.ask()

# Set parameters (using suggest_float with fixed bounds)
for var_name, var_val in result['params'].items():
    trial.suggest_float(var_name, var_val, var_val)

# Set objectives as user attributes
for obj_name, obj_val in result['objectives'].items():
    trial.set_user_attr(obj_name, obj_val)

# Log iteration metadata
trial.set_user_attr('turbo_iteration', turbo_iter)
trial.set_user_attr('prediction_error', abs(actual_ws - predicted_ws))
trial.set_user_attr('is_feasible', is_feasible)

# Report the objective value
optuna_study.tell(trial, result['weighted_sum'])
```
### File Structure
```
3_results/
├── study.db               # Optuna format (for dashboard)
├── study_custom.db        # Custom SQLite (detailed turbo data)
├── checkpoints/
│   └── best_model.pt      # Surrogate model
├── turbo_logs/            # Per-iteration JSON logs
└── best_design_archive/   # Archived best designs
```
### Backfilling Existing Data
If you have existing turbo runs without Optuna logging, use the backfill script:
```python
# scripts/backfill_optuna.py
import optuna
import sqlite3
import json

# Read from custom database (row_factory makes rows addressable by column name)
conn = sqlite3.connect('study_custom.db')
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('''
    SELECT iter_num, turbo_iteration, weighted_sum, surrogate_predicted_ws,
           params, objectives, is_feasible
    FROM trials ORDER BY iter_num
''')
rows = c.fetchall()

# Create Optuna study
study = optuna.create_study(...)

# Backfill each trial
for row in rows:
    trial = study.ask()
    params = json.loads(row['params'])        # Stored as JSON
    objectives = json.loads(row['objectives'])
    for name, val in params.items():
        trial.suggest_float(name, float(val), float(val))
    for name, val in objectives.items():
        trial.set_user_attr(name, float(val))
    study.tell(trial, row['weighted_sum'])
```
### Dashboard View
After integration, the dashboard shows:
- Only FEA-validated trials (not NN-only)
- Objective convergence over FEA iterations
- Parameter distributions from validated designs
- Prediction error trends (via user attributes)
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 2.3 | 2025-12-28 | Added TrialManager, DashboardDB, proper trial_NNNN naming |
| 2.2 | 2025-12-24 | Added Self-Improving Turbo and Dashboard Integration sections |
| 2.1 | 2025-12-10 | Added Zernike GNN section for mirror optimization |
| 2.0 | 2025-12-06 | Added MLP Surrogate with Turbo Mode |
| 1.0 | 2025-12-05 | Initial consolidation from neural docs |
---
## New Trial Management System (v2.3)
### Overview
The new trial management system provides:
1. **Consistent trial naming**: `trial_NNNN/` folders (zero-padded, never reused)
2. **Dashboard compatibility**: Optuna-compatible SQLite schema
3. **Clear separation**: Surrogate predictions are ephemeral, only FEA results are trials
### Key Components
| Component | File | Purpose |
|-----------|------|---------|
| `TrialManager` | `optimization_engine/utils/trial_manager.py` | Trial folder + DB management |
| `DashboardDB` | `optimization_engine/utils/dashboard_db.py` | Optuna-compatible database ops |
### Usage Pattern
```python
from optimization_engine.utils.trial_manager import TrialManager

# Initialize
tm = TrialManager(study_dir, "my_study")

# Start trial (creates folder, reserves DB row)
trial = tm.new_trial(
    params={'rib_thickness': 10.5},
    source="turbo",
    metadata={'turbo_batch': 1, 'predicted_ws': 186.77},
)

# Run FEA...

# Complete trial (logs to DB)
tm.complete_trial(
    trial_number=trial['trial_number'],
    objectives={'wfe_40_20': 5.63, 'mass_kg': 118.67},
    weighted_sum=175.87,
    is_feasible=True,
    metadata={'solve_time': 211.7},
)
```
### Trial Folder Structure
```
2_iterations/
├── trial_0001/
│   ├── params.json        # Input parameters
│   ├── params.exp         # NX expression format
│   ├── results.json       # Output objectives
│   ├── _meta.json         # Full metadata (source, timestamps, predictions)
│   └── *.op2, *.fem...    # FEA files
├── trial_0002/
└── ...
```
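The exact `_meta.json` fields are defined by `TrialManager`; the shape below is a plausible illustration assembled from the metadata shown in the usage pattern above, and the field names are hypothetical:

```json
{
  "trial_number": 1,
  "source": "turbo",
  "state": "COMPLETE",
  "created_at": "2025-12-28T12:20:19-05:00",
  "metadata": {
    "turbo_batch": 1,
    "predicted_ws": 186.77,
    "solve_time": 211.7
  }
}
```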
### Database Schema
The `DashboardDB` class creates Optuna-compatible tables:
| Table | Purpose |
|-------|---------|
| `studies` | Study metadata |
| `trials` | Trial info with `state`, `number`, `study_id` |
| `trial_values` | Objective values |
| `trial_params` | Parameter values |
| `trial_user_attributes` | Custom metadata (turbo_batch, predicted_ws, etc.) |
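The table relationships can be sketched with plain `sqlite3`. This is a deliberately simplified layout showing only the columns named in the table above; the real Optuna RDB schema has additional columns (distributions, datetimes, an alembic version table, ...), so treat this as a reading aid, not `DashboardDB`'s actual DDL:

```python
import sqlite3

# Simplified sketch of the Optuna-style layout DashboardDB targets.
schema = """
CREATE TABLE IF NOT EXISTS studies (
    study_id   INTEGER PRIMARY KEY,
    study_name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS trials (
    trial_id INTEGER PRIMARY KEY,
    number   INTEGER NOT NULL,
    study_id INTEGER NOT NULL REFERENCES studies(study_id),
    state    TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS trial_values (
    trial_id  INTEGER NOT NULL REFERENCES trials(trial_id),
    objective INTEGER NOT NULL,
    value     REAL
);
CREATE TABLE IF NOT EXISTS trial_params (
    trial_id    INTEGER NOT NULL REFERENCES trials(trial_id),
    param_name  TEXT NOT NULL,
    param_value REAL
);
CREATE TABLE IF NOT EXISTS trial_user_attributes (
    trial_id   INTEGER NOT NULL REFERENCES trials(trial_id),
    key        TEXT NOT NULL,
    value_json TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```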
### Converting Legacy Databases
```python
from optimization_engine.utils.dashboard_db import convert_custom_to_optuna
# Convert custom schema to Optuna format
convert_custom_to_optuna(
db_path="3_results/study.db",
study_name="my_study"
)
```
### Key Principles
1. **Surrogate predictions are NOT trials** - only FEA-validated results are logged
2. **Trial numbers never reset** - monotonically increasing across all runs
3. **Folders never overwritten** - each trial gets a unique `trial_NNNN/` directory
4. **Metadata preserved** - predictions stored for accuracy analysis