feat: Add NN Quality Assessor with relative accuracy thresholds

The Method Selector now uses relative accuracy thresholds to assess
NN suitability by comparing NN error to problem variability (CV ratio).

NNQualityAssessor features:
- Physics-based objective classification (linear, smooth, nonlinear, chaotic)
- CV ratio computation: nn_error / coefficient_of_variation
- Turbo suitability score based on relative thresholds
- Data collection from validation_report.json, turbo_report.json, and study.db

Quality thresholds by objective type:
- Linear (mass, volume): max 2% error, CV ratio < 0.5
- Smooth (frequency): max 5% error, CV ratio < 1.0
- Nonlinear (stress, stiffness): max 10% error, CV ratio < 2.0
- Chaotic (contact, buckling): max 20% error, CV ratio < 3.0

CLI output now includes:
- Per-objective NN quality table with error, CV, ratio, and quality indicator
- Turbo suitability and hybrid suitability percentages
- Warnings when NN error exceeds physics-based thresholds

Updated SYS_15_METHOD_SELECTOR.md to v2.0 with full NN Quality Assessment documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Antoine
2025-12-07 06:38:25 -05:00
parent 3e9488d9f0
commit 6cf12d9344
2 changed files with 583 additions and 53 deletions


@@ -3,9 +3,9 @@
<!--
PROTOCOL: Adaptive Method Selector
LAYER: System
VERSION: 1.0
VERSION: 2.0
STATUS: Active
LAST_UPDATED: 2025-12-06
LAST_UPDATED: 2025-12-07
PRIVILEGE: user
LOAD_WITH: [SYS_10_IMSO, SYS_11_MULTI_OBJECTIVE, SYS_14_NEURAL_ACCELERATION]
-->
@@ -16,9 +16,10 @@ The **Adaptive Method Selector (AMS)** analyzes optimization problems and recomm
1. **Static Analysis**: Problem characteristics from config (dimensionality, objectives, constraints)
2. **Dynamic Analysis**: Early FEA trial metrics (smoothness, correlations, feasibility)
3. **Runtime Monitoring**: Continuous optimization performance assessment
3. **NN Quality Assessment**: Relative accuracy thresholds comparing NN error to problem variability
4. **Runtime Monitoring**: Continuous optimization performance assessment
**Key Value**: Eliminates guesswork in choosing optimization strategies by providing data-driven recommendations.
**Key Value**: Eliminates guesswork in choosing optimization strategies by providing data-driven recommendations with relative accuracy thresholds.
---
@@ -100,36 +101,90 @@ print(recommendation.alternatives) # Other methods with scores
---
## NN Quality Assessment
The method selector uses **relative accuracy thresholds** to assess NN suitability. Instead of absolute error limits, it compares NN error to the problem's natural variability (coefficient of variation).
### Core Concept
```
NN Suitability = f(nn_error / coefficient_of_variation)
If nn_error >> CV → NN is unreliable (not learning, just noise)
If nn_error ≈ CV → NN captures the trend (hybrid recommended)
If nn_error << CV → NN is excellent (turbo viable)
```
### Physics-Based Classification
Objectives are classified by their expected predictability:
| Objective Type | Examples | Max Expected Error | CV Ratio Limit |
|----------------|----------|-------------------|----------------|
| **Linear** | mass, volume | 2% | 0.5 |
| **Smooth** | frequency, avg stress | 5% | 1.0 |
| **Nonlinear** | max stress, stiffness | 10% | 2.0 |
| **Chaotic** | contact, buckling | 20% | 3.0 |
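The table above can be read as a simple lookup. The following is a minimal sketch, assuming the thresholds are applied per objective as a plain "error above limit" check; `Threshold`, `THRESHOLDS`, and `check_objective` are hypothetical names for illustration and are not taken from `method_selector.py`.

```python
from typing import NamedTuple, Optional


class Threshold(NamedTuple):
    max_error_percent: float  # "Max Expected Error" column
    cv_ratio_limit: float     # "CV Ratio Limit" column


# Values copied from the classification table; the dict itself is illustrative.
THRESHOLDS = {
    "linear": Threshold(2.0, 0.5),
    "smooth": Threshold(5.0, 1.0),
    "nonlinear": Threshold(10.0, 2.0),
    "chaotic": Threshold(20.0, 3.0),
}


def check_objective(name: str, objective_type: str,
                    nn_error_percent: float) -> Optional[str]:
    """Return a warning string if the NN error exceeds the expected limit."""
    limit = THRESHOLDS[objective_type].max_error_percent
    if nn_error_percent > limit:
        return (f"{name}: NN error ({nn_error_percent:.1f}%) "
                f"above expected ({limit:.0f}%)")
    return None


print(check_objective("mass", "linear", 3.7))
print(check_objective("stress", "nonlinear", 2.0))
```

With these assumed names, a 3.7% error on a linear objective such as mass produces a warning, while a 2.0% error on a nonlinear objective does not.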
### CV Ratio Interpretation
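The **CV Ratio** is the NN error (in percent) divided by the coefficient of variation expressed in percent, i.e. NN Error / (CV × 100) with CV as a fraction. For example, a 3.7% NN error on an objective with a 16.0% CV gives 3.7 / 16.0 ≈ 0.23: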
| CV Ratio | Quality | Interpretation |
|----------|---------|----------------|
| < 0.5 | ✓ Great | NN captures physics much better than noise |
| 0.5 - 1.0 | ✓ Good | NN adds significant value for exploration |
| 1.0 - 2.0 | ~ OK | NN is marginal, use with validation |
| > 2.0 | ✗ Poor | NN not learning effectively, use FEA |
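The ratio and the quality bands above could be computed roughly as follows. This is a sketch under stated assumptions: `cv_ratio` and `quality_label` are hypothetical helpers, the CV is taken as population standard deviation over the absolute mean of the sampled objective values, and the handling of values exactly on a band boundary is a guess, since the table does not specify it.

```python
import statistics
from typing import Sequence


def cv_ratio(nn_error_percent: float, objective_values: Sequence[float]) -> float:
    """NN error (%) divided by the coefficient of variation (%)."""
    mean = statistics.fmean(objective_values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    # Assumption: population standard deviation; the real code may differ.
    cv_percent = statistics.pstdev(objective_values) / abs(mean) * 100.0
    if cv_percent == 0:
        raise ValueError("objective does not vary; CV ratio is undefined")
    return nn_error_percent / cv_percent


def quality_label(ratio: float) -> str:
    """Map a CV ratio to the quality bands of the interpretation table."""
    if ratio < 0.5:
        return "Great"
    if ratio < 1.0:
        return "Good"
    if ratio <= 2.0:  # assumption: 2.0 itself still counts as "OK"
        return "OK"
    return "Poor"


# Two samples with mean 100 and standard deviation 16, i.e. CV = 16%.
ratio = cv_ratio(3.7, [84.0, 116.0])
print(f"{ratio:.2f}", quality_label(ratio))
```

The sample values are made up so that the CV comes out at 16%; they are not data from a study.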
### Method Recommendations Based on Quality
| Turbo Suitability | Hybrid Suitability | Recommendation |
|-------------------|--------------------|-----------------------|
| > 80% | any | **TURBO** - trust NN fully |
| 50-80% | > 50% | **TURBO** with monitoring |
| < 50% | > 50% | **HYBRID_LOOP** - verify periodically |
| < 30% | < 50% | **PURE_FEA** or retrain first |
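Read literally, the table above maps two suitability scores to a method. The sketch below encodes only the rows as written; `recommend_from_quality` is a hypothetical function, scores are assumed to be fractions in the 0-1 range, and combinations the table does not cover return `None` rather than a guessed method. The actual selector combines these scores with other factors in `_score_methods()`, so this is not its implementation.

```python
from typing import Optional


def recommend_from_quality(turbo: float, hybrid: float) -> Optional[str]:
    """Look up the recommendation table; scores are fractions in [0, 1]."""
    if turbo > 0.80:
        return "TURBO"
    if turbo >= 0.50 and hybrid > 0.50:
        return "TURBO (with monitoring)"
    if turbo < 0.50 and hybrid > 0.50:
        return "HYBRID_LOOP"
    if turbo < 0.30 and hybrid < 0.50:
        return "PURE_FEA"
    # Combination not listed in the table (e.g. turbo 40%, hybrid 40%).
    return None


print(recommend_from_quality(0.77, 0.88))
print(recommend_from_quality(0.40, 0.60))
```

Returning `None` for unlisted combinations keeps the sketch faithful to the table instead of inventing a fallback rule.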
### Data Sources
NN quality metrics are collected from:
1. `validation_report.json` - FEA validation results
2. `turbo_report.json` - Turbo mode validation history
3. `study.db` - Trial `nn_error_percent` user attributes
---
## Architecture
```
┌───────────────────────────────────────────────────────────────────┐
│                      AdaptiveMethodSelector                       │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌──────────────────┐    ┌───────────────────────┐                │
│  │ ProblemProfiler  │    │ EarlyMetricsCollector │                │
│  │ (static analysis)│    │ (dynamic analysis)    │                │
│  └────────┬─────────┘    └───────────┬───────────┘                │
│           │                          │                            │
│           ▼                          ▼                            │
│  ┌───────────────────────────────────────────────┐                │
│  │               _score_methods()                │                │
│  │  (rule-based scoring with weighted factors)   │                │
│  └───────────────────────────────────────────────┘                │
│  ┌───────────────────────────────────────────────┐                │
│  │             MethodRecommendation              │                │
│  │   method, confidence, parameters, reasoning   │                │
│  └───────────────────────────────────────────────┘                │
│                                                                   │
│  ┌──────────────────┐                                             │
│  │ RuntimeAdvisor   │ ← Monitors during optimization              │
│  │ (pivot advisor)  │                                             │
│  └──────────────────┘                                             │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│                         AdaptiveMethodSelector                          │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌───────────────────────┐  ┌───────────────────┐  │
│  │ ProblemProfiler │  │ EarlyMetricsCollector │  │ NNQualityAssessor │  │
│  │(static analysis)│  │  (dynamic analysis)   │  │   (NN accuracy)   │  │
│  └────────┬────────┘  └───────────┬───────────┘  └─────────┬─────────┘  │
│           ▼                       ▼                        ▼            │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                         _score_methods()                          │  │
│  │      (rule-based scoring with static + dynamic + NN factors)      │  │
│  └─────────────────────────────────┬─────────────────────────────────┘  │
│                                    ▼                                    │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                       MethodRecommendation                        │  │
│  │        method, confidence, parameters, reasoning, warnings        │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌──────────────────┐                                                   │
│  │ RuntimeAdvisor   │ ← Monitors during optimization                    │
│  │ (pivot advisor)  │                                                   │
│  └──────────────────┘                                                   │
└─────────────────────────────────────────────────────────────────────────┘
```
---
@@ -173,16 +228,37 @@ class EarlyMetrics:
variable_sensitivity: Dict[str, float]
```
### 3. AdaptiveMethodSelector
### 3. NNQualityAssessor
Main entry point that combines static + dynamic analysis:
Assesses NN surrogate quality relative to problem complexity:
```python
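from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NNQualityMetrics:
    has_nn_data: bool = False
    n_validations: int = 0
    # Fields after a defaulted field must also have defaults;
    # the default values shown here are assumed.
    nn_errors: Dict[str, float] = field(default_factory=dict)        # Absolute % error per objective
    cv_ratios: Dict[str, float] = field(default_factory=dict)        # nn_error / (CV * 100) per objective
    expected_errors: Dict[str, float] = field(default_factory=dict)  # Physics-based threshold
    overall_quality: float = 0.0     # 0-1, based on absolute thresholds
    turbo_suitability: float = 0.0   # 0-1, based on CV ratios
    hybrid_suitability: float = 0.0  # 0-1, more lenient threshold
    objective_types: Dict[str, str] = field(default_factory=dict)    # 'linear', 'smooth', 'nonlinear', 'chaotic'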
```
### 4. AdaptiveMethodSelector
Main entry point that combines static + dynamic + NN quality analysis:
```python
selector = AdaptiveMethodSelector(min_trials=20)
recommendation = selector.recommend(config_path, db_path)
recommendation = selector.recommend(config_path, db_path, results_dir=results_dir)
# Access last NN quality for display
print(f"Turbo suitability: {selector.last_nn_quality.turbo_suitability:.0%}")
```
### 4. RuntimeAdvisor
### 5. RuntimeAdvisor
Monitors optimization progress and suggests pivots:
@@ -210,11 +286,24 @@ Problem Profile:
Constraints: 1
Max FEA budget: ~72 trials
NN Quality Assessment:
Validations analyzed: 10
| Objective | NN Error | CV | Ratio | Type | Quality |
|---------------|----------|--------|-------|------------|---------|
| mass | 3.7% | 16.0% | 0.23 | linear | ✓ Great |
| stress | 2.0% | 7.7% | 0.26 | nonlinear | ✓ Great |
| stiffness | 7.8% | 38.9% | 0.20 | nonlinear | ✓ Great |
Overall Quality: 22%
Turbo Suitability: 77%
Hybrid Suitability: 88%
----------------------------------------------------------------------
RECOMMENDED: TURBO
Confidence: 100%
Reason: low-dimensional design space; sufficient FEA budget; smooth landscape (79%)
Reason: low-dimensional design space; sufficient FEA budget; smooth landscape (79%); good NN quality (77%)
Suggested parameters:
--nn-trials: 5000
@@ -223,9 +312,12 @@ Problem Profile:
--epochs: 150
Alternatives:
- hybrid_loop (75%): uncertain landscape - hybrid adapts; adequate budget for iterations
- hybrid_loop (90%): uncertain landscape - hybrid adapts; NN adds value with periodic retraining
- pure_fea (50%): default recommendation
Warnings:
! mass: NN error (3.7%) above expected (2%) - consider retraining or using hybrid mode
======================================================================
```
@@ -312,8 +404,11 @@ optimization_engine/
└── method_selector.py              # Complete AMS implementation
    ├── ProblemProfiler             # Static config analysis
    ├── EarlyMetricsCollector       # Dynamic FEA metrics
    ├── NNQualityMetrics            # NN accuracy dataclass
    ├── NNQualityAssessor           # Relative accuracy assessment
    ├── AdaptiveMethodSelector      # Main recommendation engine
    ├── RuntimeAdvisor              # Mid-run pivot advisor
    ├── print_recommendation()      # CLI output with NN quality table
    └── recommend_method()          # Convenience function
```
@@ -323,4 +418,5 @@ optimization_engine/
| Version | Date | Changes |
|---------|------|---------|
| 2.0 | 2025-12-07 | Added NNQualityAssessor with relative accuracy thresholds |
| 1.0 | 2025-12-06 | Initial implementation with 4 methods |