---
## Self-Improving Turbo Optimization
### Overview

The **Self-Improving Turbo** pattern combines MLP surrogate exploration with iterative FEA validation and surrogate retraining. This creates a closed-loop optimization in which the surrogate continuously improves from its own mistakes.

### Workflow
```
INITIALIZE:
- Load pre-trained surrogate (from prior FEA data)
- Load previous FEA params for diversity checking

REPEAT until converged or FEA budget exhausted:

  1. SURROGATE EXPLORE (~1 min)
     ├─ Run 5000 Optuna TPE trials with surrogate
     ├─ Quantize predictions to machining precision
     └─ Find diverse top candidates

  2. SELECT DIVERSE CANDIDATES
     ├─ Sort by weighted sum
     ├─ Select top 5 that are:
     │  ├─ At least 15% different from each other
     │  └─ At least 7.5% different from ALL previous FEA
     └─ Ensures exploration, not just exploitation

  3. FEA VALIDATE (~25 min for 5 candidates)
     ├─ For each candidate:
     │  ├─ Create iteration folder
     │  ├─ Update NX expressions
     │  ├─ Run Nastran solver
     │  ├─ Extract objectives (ZernikeOPD or other)
     │  └─ Log prediction error
     └─ Add results to training data

  4. RETRAIN SURROGATE (~2 min)
     ├─ Combine all FEA samples
     ├─ Retrain MLP for 100 epochs
     ├─ Save new checkpoint
     └─ Reload improved model

  5. CHECK CONVERGENCE
     ├─ Track best feasible objective
     ├─ If improved: reset patience counter
     └─ If no improvement for 3 iterations: STOP
```
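The diversity filter in step 2 can be sketched as a greedy pass over the ranked candidates. This is a sketch, not the production implementation: the function names and the normalized L-infinity distance metric are assumptions.

```python
import numpy as np

def is_diverse(candidate, others, bounds, min_dist):
    """True if candidate differs from every design in `others` by at
    least `min_dist` fraction of each parameter's range (L-inf norm)."""
    span = bounds[:, 1] - bounds[:, 0]
    for other in others:
        if np.max(np.abs(candidate - other) / span) < min_dist:
            return False
    return True

def select_diverse(candidates, previous_fea, bounds, n_select=5,
                   min_candidate_dist=0.15, min_fea_dist=0.075):
    """Greedily pick up to n_select candidates (assumed pre-sorted by
    weighted sum) that are mutually diverse AND far from all prior FEA runs."""
    selected = []
    for cand in candidates:
        if (is_diverse(cand, selected, bounds, min_candidate_dist)
                and is_diverse(cand, previous_fea, bounds, min_fea_dist)):
            selected.append(cand)
        if len(selected) == n_select:
            break
    return selected
```

Because candidates arrive sorted by weighted sum, the greedy pass keeps the best-scoring representative of each neighborhood and discards near-duplicates.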
### Configuration Example

```json
{
  "turbo_settings": {
    "surrogate_trials_per_iteration": 5000,
    "fea_validations_per_iteration": 5,
    "max_fea_validations": 100,
    "max_iterations": 30,
    "convergence_patience": 3,
    "retrain_frequency": "every_iteration",
    "min_samples_for_retrain": 20
  }
}
```
### Key Parameters

| Parameter | Typical Value | Description |
|-----------|---------------|-------------|
| `surrogate_trials_per_iteration` | 5000 | Surrogate (NN) trials per iteration |
| `fea_validations_per_iteration` | 5 | FEA runs per iteration |
| `max_fea_validations` | 100 | Total FEA budget |
| `convergence_patience` | 3 | Stop after N iterations without improvement |
| `MIN_CANDIDATE_DISTANCE` | 0.15 | 15% of parameter range for diversity |
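The `convergence_patience` logic from step 5 of the workflow amounts to a small state machine. A minimal sketch, assuming minimization and a zero improvement tolerance by default (the class name is illustrative, not from the codebase):

```python
class ConvergenceTracker:
    """Stops the loop after `patience` consecutive iterations without
    improvement in the best feasible objective (minimization)."""

    def __init__(self, patience=3, tol=0.0):
        self.patience = patience
        self.tol = tol
        self.best = float("inf")
        self.stale = 0

    def update(self, best_feasible_objective):
        """Feed the iteration's best feasible value; returns True to stop."""
        if best_feasible_objective < self.best - self.tol:
            self.best = best_feasible_objective
            self.stale = 0  # improved: reset patience counter
        else:
            self.stale += 1
        return self.stale >= self.patience
```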
### Example Results (M1 Mirror Turbo V1)

| Metric | Value |
|--------|-------|
| FEA Validations | 45 |
| Best WS Found | 282.05 |
| Baseline (V11) | 284.19 |
| Improvement | 0.75% |
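The improvement row is just the relative reduction in weighted sum versus the baseline:

```python
best_ws = 282.05      # best weighted sum found by turbo
baseline_ws = 284.19  # V11 baseline
improvement_pct = (baseline_ws - best_ws) / baseline_ws * 100
print(f"{improvement_pct:.2f}%")  # 0.75%
```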
---

## Dashboard Integration for Neural Studies

### Problem

Neural surrogate studies generate thousands of NN-only trials that would overwhelm the dashboard. Only FEA-validated trials should be visible.

### Solution: Separate Optuna Study

Log FEA validation results to a separate Optuna study that the dashboard can read:
```python
import optuna

# Create an Optuna study for dashboard visibility
optuna_db_path = RESULTS_DIR / "study.db"
optuna_storage = f"sqlite:///{optuna_db_path}"
optuna_study = optuna.create_study(
    study_name=study_name,
    storage=optuna_storage,
    direction="minimize",
    load_if_exists=True,
)

# After each FEA validation:
trial = optuna_study.ask()

# Set parameters (suggest_float with identical bounds pins the value)
for var_name, var_val in result['params'].items():
    trial.suggest_float(var_name, var_val, var_val)

# Set objectives as user attributes
for obj_name, obj_val in result['objectives'].items():
    trial.set_user_attr(obj_name, obj_val)

# Log iteration metadata
trial.set_user_attr('turbo_iteration', turbo_iter)
trial.set_user_attr('prediction_error', abs(actual_ws - predicted_ws))
trial.set_user_attr('is_feasible', is_feasible)

# Report the objective value
optuna_study.tell(trial, result['weighted_sum'])
```
### File Structure

```
3_results/
├── study.db              # Optuna format (for dashboard)
├── study_custom.db       # Custom SQLite (detailed turbo data)
├── checkpoints/
│   └── best_model.pt     # Surrogate model
├── turbo_logs/           # Per-iteration JSON logs
└── best_design_archive/  # Archived best designs
```
### Backfilling Existing Data

If you have existing turbo runs without Optuna logging, use the backfill script:
```python
# scripts/backfill_optuna.py
import optuna
import sqlite3
import json

# Read from the custom database; Row factory allows dict-style access
conn = sqlite3.connect('study_custom.db')
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('''
    SELECT iter_num, turbo_iteration, weighted_sum, surrogate_predicted_ws,
           params, objectives, is_feasible
    FROM trials ORDER BY iter_num
''')
rows = c.fetchall()

# Create the Optuna study
study = optuna.create_study(...)

# Backfill each trial
for row in rows:
    trial = study.ask()
    params = json.loads(row['params'])  # Stored as JSON text
    objectives = json.loads(row['objectives'])

    for name, val in params.items():
        trial.suggest_float(name, float(val), float(val))
    for name, val in objectives.items():
        trial.set_user_attr(name, float(val))

    study.tell(trial, row['weighted_sum'])
```
### Dashboard View

After integration, the dashboard shows:

- Only FEA-validated trials (not NN-only)
- Objective convergence over FEA iterations
- Parameter distributions from validated designs
- Prediction error trends (via user attributes)

---
## Version History

| Version | Date | Changes |
|---------|------|---------|
| 2.3 | 2025-12-28 | Added TrialManager, DashboardDB, proper trial_NNNN naming |
| 2.2 | 2025-12-24 | Added Self-Improving Turbo and Dashboard Integration sections |
| 2.1 | 2025-12-10 | Added Zernike GNN section for mirror optimization |
| 2.0 | 2025-12-06 | Added MLP Surrogate with Turbo Mode |
| 1.0 | 2025-12-05 | Initial consolidation from neural docs |

---
## New Trial Management System (v2.3)

### Overview

The new trial management system provides:

1. **Consistent trial naming**: `trial_NNNN/` folders (zero-padded, never reused)
2. **Dashboard compatibility**: Optuna-compatible SQLite schema
3. **Clear separation**: surrogate predictions are ephemeral; only FEA results are trials
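This section does not show `TrialManager`'s internals, but the monotonic, never-reused `trial_NNNN` allocation can be sketched from the filesystem alone. A minimal sketch (function names are hypothetical; in practice the database is also consulted so numbers survive folder deletion):

```python
from pathlib import Path

def next_trial_number(iterations_dir: Path) -> int:
    """Scan existing trial_NNNN folders and return max+1, so numbers
    stay monotonic and are never reused after partial deletions."""
    numbers = [
        int(p.name.split("_")[1])
        for p in iterations_dir.glob("trial_[0-9][0-9][0-9][0-9]")
        if p.is_dir()
    ]
    return max(numbers, default=0) + 1

def create_trial_folder(iterations_dir: Path) -> Path:
    """Allocate the next number and create its zero-padded folder."""
    n = next_trial_number(iterations_dir)
    folder = iterations_dir / f"trial_{n:04d}"
    folder.mkdir(parents=True, exist_ok=False)  # never overwrite
    return folder
```

`exist_ok=False` makes an accidental reuse fail loudly instead of silently overwriting a completed trial.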
### Key Components

| Component | File | Purpose |
|-----------|------|---------|
| `TrialManager` | `optimization_engine/utils/trial_manager.py` | Trial folder + DB management |
| `DashboardDB` | `optimization_engine/utils/dashboard_db.py` | Optuna-compatible database ops |

### Usage Pattern
```python
from optimization_engine.utils.trial_manager import TrialManager

# Initialize
tm = TrialManager(study_dir, "my_study")

# Start a trial (creates folder, reserves DB row)
trial = tm.new_trial(
    params={'rib_thickness': 10.5},
    source="turbo",
    metadata={'turbo_batch': 1, 'predicted_ws': 186.77}
)

# Run FEA...

# Complete the trial (logs to DB)
tm.complete_trial(
    trial_number=trial['trial_number'],
    objectives={'wfe_40_20': 5.63, 'mass_kg': 118.67},
    weighted_sum=175.87,
    is_feasible=True,
    metadata={'solve_time': 211.7}
)
```
### Trial Folder Structure

```
2_iterations/
├── trial_0001/
│   ├── params.json       # Input parameters
│   ├── params.exp        # NX expression format
│   ├── results.json      # Output objectives
│   ├── _meta.json        # Full metadata (source, timestamps, predictions)
│   └── *.op2, *.fem...   # FEA files
├── trial_0002/
└── ...
```
### Database Schema

The `DashboardDB` class creates Optuna-compatible tables:

| Table | Purpose |
|-------|---------|
| `studies` | Study metadata |
| `trials` | Trial info with `state`, `number`, `study_id` |
| `trial_values` | Objective values |
| `trial_params` | Parameter values |
| `trial_user_attributes` | Custom metadata (turbo_batch, predicted_ws, etc.) |
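For illustration, a minimal subset of the tables above can be created directly with `sqlite3`. This is a sketch only: Optuna's real schema (and what its dashboard expects) includes additional tables and columns (e.g. study directions and migration metadata), so the column lists here are simplified assumptions.

```python
import sqlite3

# Simplified, illustrative subset of an Optuna-style relational schema
SCHEMA = """
CREATE TABLE IF NOT EXISTS studies (
    study_id   INTEGER PRIMARY KEY,
    study_name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS trials (
    trial_id          INTEGER PRIMARY KEY,
    number            INTEGER,
    study_id          INTEGER REFERENCES studies(study_id),
    state             TEXT,
    datetime_start    TEXT,
    datetime_complete TEXT
);
CREATE TABLE IF NOT EXISTS trial_values (
    trial_value_id INTEGER PRIMARY KEY,
    trial_id       INTEGER REFERENCES trials(trial_id),
    objective      INTEGER,
    value          REAL
);
CREATE TABLE IF NOT EXISTS trial_params (
    param_id    INTEGER PRIMARY KEY,
    trial_id    INTEGER REFERENCES trials(trial_id),
    param_name  TEXT,
    param_value REAL
);
CREATE TABLE IF NOT EXISTS trial_user_attributes (
    trial_user_attribute_id INTEGER PRIMARY KEY,
    trial_id                INTEGER REFERENCES trials(trial_id),
    key                     TEXT,
    value_json              TEXT
);
"""

def init_dashboard_db(path):
    """Create the illustrative tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn
```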
### Converting Legacy Databases

```python
from optimization_engine.utils.dashboard_db import convert_custom_to_optuna

# Convert the custom schema to Optuna format
convert_custom_to_optuna(
    db_path="3_results/study.db",
    study_name="my_study"
)
```
### Key Principles

1. **Surrogate predictions are NOT trials** - only FEA-validated results are logged
2. **Trial numbers never reset** - monotonically increasing across all runs
3. **Folders never overwritten** - each trial gets a unique `trial_NNNN/` directory
4. **Metadata preserved** - predictions stored for accuracy analysis