feat: Add NN Quality Assessor with relative accuracy thresholds

The Method Selector now uses relative accuracy thresholds to assess NN suitability by comparing NN error to problem variability (CV ratio).

NNQualityAssessor features:
- Physics-based objective classification (linear, smooth, nonlinear, chaotic)
- CV ratio computation: nn_error / (coefficient_of_variation × 100)
- Turbo suitability score based on relative thresholds
- Data collection from validation_report.json, turbo_report.json, and study.db

Quality thresholds by objective type:
- Linear (mass, volume): max 2% error, CV ratio < 0.5
- Smooth (frequency): max 5% error, CV ratio < 1.0
- Nonlinear (stress, stiffness): max 10% error, CV ratio < 2.0
- Chaotic (contact, buckling): max 20% error, CV ratio < 3.0

CLI output now includes:
- Per-objective NN quality table with error, CV, ratio, and quality indicator
- Turbo suitability and hybrid suitability percentages
- Warnings when NN error exceeds physics-based thresholds

Updated SYS_15_METHOD_SELECTOR.md to v2.0 with full NN Quality Assessment documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 06:38:25 -05:00
parent 3e9488d9f0
commit 6cf12d9344
2 changed files with 583 additions and 53 deletions
--- a/docs/protocols/system/SYS_15_METHOD_SELECTOR.md
+++ b/docs/protocols/system/SYS_15_METHOD_SELECTOR.md
@@ -3,9 +3,9 @@
 <!--
 PROTOCOL: Adaptive Method Selector
 LAYER: System
-VERSION: 1.0
+VERSION: 2.0
 STATUS: Active
-LAST_UPDATED: 2025-12-06
+LAST_UPDATED: 2025-12-07
 PRIVILEGE: user
 LOAD_WITH: [SYS_10_IMSO, SYS_11_MULTI_OBJECTIVE, SYS_14_NEURAL_ACCELERATION]
 -->
@@ -16,9 +16,10 @@ The **Adaptive Method Selector (AMS)** analyzes optimization problems and recomm

 1. **Static Analysis**: Problem characteristics from config (dimensionality, objectives, constraints)
 2. **Dynamic Analysis**: Early FEA trial metrics (smoothness, correlations, feasibility)
-3. **Runtime Monitoring**: Continuous optimization performance assessment
+3. **NN Quality Assessment**: Relative accuracy thresholds comparing NN error to problem variability
+4. **Runtime Monitoring**: Continuous optimization performance assessment

-**Key Value**: Eliminates guesswork in choosing optimization strategies by providing data-driven recommendations.
+**Key Value**: Eliminates guesswork in choosing optimization strategies by providing data-driven recommendations that also check NN surrogate accuracy against relative thresholds.

 ---

@@ -100,36 +101,90 @@ print(recommendation.alternatives) # Other methods with scores

 ---

+## NN Quality Assessment
+
+The method selector uses **relative accuracy thresholds** to assess NN suitability. Rather than judging NN error against an absolute limit alone, it compares that error to the objective's natural variability, measured by its coefficient of variation (CV = standard deviation / mean).
+
+### Core Concept
+
+```
+NN suitability = f(cv_ratio),  where cv_ratio = nn_error_percent / (CV × 100)
+
+If cv_ratio >> 1 → NN is unreliable (its error exceeds the objective's natural spread)
+If cv_ratio ≈ 1  → NN captures the trend (hybrid recommended)
+If cv_ratio << 1 → NN is excellent (turbo viable)
+```
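
A minimal sketch of the ratio in Python follows. It assumes CV is the sample standard deviation divided by the mean of an objective's FEA results, and that the NN error is already a percentage. The helper names `coefficient_of_variation` and `cv_ratio` are hypothetical and are not the `method_selector.py` API.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV = sample standard deviation / |mean| of an objective's FEA results."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero-mean objective")
    return statistics.stdev(values) / abs(mean)


def cv_ratio(nn_error_percent: float, cv: float) -> float:
    """Relative accuracy: NN error (%) divided by CV expressed in percent."""
    return nn_error_percent / (cv * 100.0)


# Hypothetical FEA masses from early trials, paired with a 3.7% NN error
masses = [10.0, 12.0, 8.0, 11.0, 9.0]
cv = coefficient_of_variation(masses)
print(f"CV = {cv:.1%}, ratio = {cv_ratio(3.7, cv):.2f}")  # → CV = 15.8%, ratio = 0.23
```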
+
+### Physics-Based Classification
+
+Objectives are classified by their expected predictability:
+
+| Objective Type | Examples | Max Expected Error | CV Ratio Limit |
+|----------------|----------|-------------------|----------------|
+| **Linear** | mass, volume | 2% | 0.5 |
+| **Smooth** | frequency, avg stress | 5% | 1.0 |
+| **Nonlinear** | max stress, stiffness | 10% | 2.0 |
+| **Chaotic** | contact, buckling | 20% | 3.0 |
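
One way to assign these types is to match objective names against the examples in the table. The keyword rules, the `classify_objective` name, and the `nonlinear` fallback for unrecognized objectives are illustrative assumptions. The shipped classifier may use different rules.

```python
from typing import Dict, Tuple

# Thresholds from the table: (max expected NN error %, CV ratio limit)
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "linear": (2.0, 0.5),
    "smooth": (5.0, 1.0),
    "nonlinear": (10.0, 2.0),
    "chaotic": (20.0, 3.0),
}

# Hypothetical keyword rules built from the table's examples. First match wins,
# so "avg_stress" is classified as smooth before the generic "stress" rule fires.
KEYWORD_RULES = [
    (("mass", "volume"), "linear"),
    (("frequency", "avg_stress"), "smooth"),
    (("contact", "buckling"), "chaotic"),
    (("stress", "stiffness"), "nonlinear"),
]


def classify_objective(name: str) -> str:
    """Return the physics type for an objective name ('nonlinear' if unrecognized)."""
    key = name.lower().replace(" ", "_")
    for keywords, obj_type in KEYWORD_RULES:
        if any(k in key for k in keywords):
            return obj_type
    return "nonlinear"  # assumption: middle-of-the-table default for unknown objectives


for obj in ("mass", "avg stress", "max_stress", "buckling_load"):
    obj_type = classify_objective(obj)
    max_err, ratio_limit = THRESHOLDS[obj_type]
    print(f"{obj}: {obj_type} (max {max_err:g}%, CV ratio < {ratio_limit:g})")
```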
+
+### CV Ratio Interpretation
+
+The **CV ratio** is the NN error (%) divided by the coefficient of variation expressed as a percentage, NN Error / (CV × 100):
+
+| CV Ratio | Quality | Interpretation |
+|----------|---------|----------------|
+| < 0.5 | ✓ Great | NN error is well below the objective's natural variation |
+| 0.5 - 1.0 | ✓ Good | NN adds significant value for exploration |
+| 1.0 - 2.0 | ~ OK | NN is marginal, use with validation |
+| > 2.0 | ✗ Poor | NN not learning effectively, use FEA |
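
The interpretation table maps directly onto a bucketing function. `quality_label` is a hypothetical helper. The table does not say which bucket owns the exact boundary values, so this sketch assigns 1.0 to Good and 2.0 to OK.

```python
def quality_label(cv_ratio: float) -> str:
    """Bucket a CV ratio (NN error % / (CV * 100)) per the interpretation table."""
    if cv_ratio < 0.5:
        return "✓ Great"  # error well below the objective's natural variation
    if cv_ratio <= 1.0:
        return "✓ Good"   # NN adds significant value for exploration
    if cv_ratio <= 2.0:
        return "~ OK"     # marginal: use with FEA validation
    return "✗ Poor"       # NN not learning effectively: use FEA


for ratio in (0.23, 0.8, 1.5, 2.5):
    print(f"{ratio:.2f} -> {quality_label(ratio)}")
```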
+
+### Method Recommendations Based on Quality
+
+| Turbo Suitability | Hybrid Suitability | Recommendation |
+|-------------------|--------------------|-----------------------|
+| > 80% | any | **TURBO** - trust NN fully |
+| 50-80% | > 50% | **TURBO** with monitoring |
+| < 50% | > 50% | **HYBRID_LOOP** - verify periodically |
+| < 30% | < 50% | **PURE_FEA** or retrain first |
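
Taken on its own, the recommendation table can be written as a decision function over the two suitability scores, expressed as fractions in 0-1. `recommend_from_nn_quality` is a hypothetical helper. In the selector itself, NN quality is only one input to `_score_methods()`, alongside the static and dynamic factors. The table leaves some combinations uncovered, for example turbo at 30-50% with hybrid at or below 50%. For those cases this sketch conservatively falls back to PURE_FEA.

```python
def recommend_from_nn_quality(turbo: float, hybrid: float) -> str:
    """Map turbo/hybrid suitability (fractions in 0-1) to a method per the table."""
    if turbo > 0.8:
        return "TURBO"                      # trust NN fully
    if turbo >= 0.5 and hybrid > 0.5:
        return "TURBO (with monitoring)"
    if hybrid > 0.5:                        # reached only when turbo < 0.5
        return "HYBRID_LOOP"                # verify periodically
    # turbo < 0.3 with hybrid < 0.5 is PURE_FEA per the table; the combinations
    # the table leaves open also fall back to PURE_FEA here (an assumption).
    return "PURE_FEA"


print(recommend_from_nn_quality(turbo=0.77, hybrid=0.88))  # → TURBO (with monitoring)
```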
+
+### Data Sources
+
+NN quality metrics are collected from:
+1. `validation_report.json` - FEA validation results
+2. `turbo_report.json` - Turbo mode validation history
+3. `study.db` - Trial `nn_error_percent` user attributes
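
Here is a sketch of reading the `nn_error_percent` attribute directly from `study.db`, assuming it is an Optuna SQLite storage. Optuna stores user attributes JSON-encoded in the `value_json` column of its `trial_user_attributes` table. That is an internal schema, so `optuna.load_study(...)` followed by `trial.user_attrs` is the more stable public route. `load_nn_errors` is a hypothetical name, and the sketch assumes each stored value is a single number.

```python
import json
import sqlite3
from contextlib import closing
from typing import List


def load_nn_errors(db_path: str, key: str = "nn_error_percent") -> List[float]:
    """Read a numeric trial user attribute from an Optuna SQLite storage.

    Assumes Optuna's internal RDB schema: user attributes are stored
    JSON-encoded in trial_user_attributes.value_json.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT value_json FROM trial_user_attributes "
            "WHERE key = ? ORDER BY trial_id",
            (key,),
        ).fetchall()
    values = (json.loads(value_json) for (value_json,) in rows)
    # Keep plain numbers only (bool is a subclass of int, so exclude it explicitly)
    return [float(v) for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]
```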
+
+---
+
 ## Architecture

 ```
-┌───────────────────────────────────────────────────────────────────┐
-│                     AdaptiveMethodSelector                         │
-├───────────────────────────────────────────────────────────────────┤
-│                                                                    │
-│  ┌──────────────────┐    ┌──────────────────────┐                 │
-│  │ ProblemProfiler  │    │ EarlyMetricsCollector │                 │
-│  │ (static analysis)│    │ (dynamic analysis)    │                 │
-│  └────────┬─────────┘    └──────────┬────────────┘                 │
-│           │                         │                              │
-│           ▼                         ▼                              │
-│  ┌────────────────────────────────────────────────┐               │
-│  │               _score_methods()                  │               │
-│  │  (rule-based scoring with weighted factors)    │               │
-│  └──────────────────────┬─────────────────────────┘               │
-│                         │                                          │
-│                         ▼                                          │
-│  ┌────────────────────────────────────────────────┐               │
-│  │          MethodRecommendation                   │               │
-│  │  method, confidence, parameters, reasoning      │               │
-│  └────────────────────────────────────────────────┘               │
-│                                                                    │
-│  ┌──────────────────┐                                             │
-│  │  RuntimeAdvisor  │ ← Monitors during optimization               │
-│  │  (pivot advisor) │                                             │
-│  └──────────────────┘                                             │
-│                                                                    │
-└───────────────────────────────────────────────────────────────────┘
+┌───────────────────────────────────────────────────────────────────────────┐
+│                          AdaptiveMethodSelector                           │
+├───────────────────────────────────────────────────────────────────────────┤
+│                                                                           │
+│  ┌───────────────────┐  ┌───────────────────────┐  ┌───────────────────┐  │
+│  │ ProblemProfiler   │  │ EarlyMetricsCollector │  │ NNQualityAssessor │  │
+│  │ (static analysis) │  │ (dynamic analysis)    │  │ (NN accuracy)     │  │
+│  └───────┬───────────┘  └───────────┬───────────┘  └─────────┬─────────┘  │
+│          │                          │                        │            │
+│          ▼                          ▼                        ▼            │
+│  ┌─────────────────────────────────────────────────────────────────────┐  │
+│  │                          _score_methods()                           │  │
+│  │       (rule-based scoring with static + dynamic + NN factors)       │  │
+│  └──────────────────────────────────┬──────────────────────────────────┘  │
+│                                     │                                     │
+│                                     ▼                                     │
+│  ┌─────────────────────────────────────────────────────────────────────┐  │
+│  │                        MethodRecommendation                         │  │
+│  │         method, confidence, parameters, reasoning, warnings         │  │
+│  └─────────────────────────────────────────────────────────────────────┘  │
+│                                                                           │
+│  ┌──────────────────┐                                                     │
+│  │  RuntimeAdvisor  │ ← Monitors during optimization                      │
+│  │  (pivot advisor) │                                                     │
+│  └──────────────────┘                                                     │
+│                                                                           │
+└───────────────────────────────────────────────────────────────────────────┘
 ```

 ---
@@ -173,16 +228,37 @@ class EarlyMetrics:
    variable_sensitivity: Dict[str, float]
 ```

-### 3. AdaptiveMethodSelector
+### 3. NNQualityAssessor

-Main entry point that combines static + dynamic analysis:
+Assesses NN surrogate quality relative to each objective's variability and physics-based error threshold:
+
+```python
+from dataclasses import dataclass, field
+from typing import Dict
+
+@dataclass
+class NNQualityMetrics:
+    has_nn_data: bool = False
+    n_validations: int = 0
+    nn_errors: Dict[str, float] = field(default_factory=dict)        # Absolute % error per objective
+    cv_ratios: Dict[str, float] = field(default_factory=dict)        # nn_error / (CV * 100) per objective
+    expected_errors: Dict[str, float] = field(default_factory=dict)  # Physics-based threshold (%) per objective
+    overall_quality: float = 0.0                                     # 0-1, based on absolute thresholds
+    turbo_suitability: float = 0.0                                   # 0-1, based on CV ratios
+    hybrid_suitability: float = 0.0                                  # 0-1, more lenient threshold
+    objective_types: Dict[str, str] = field(default_factory=dict)    # 'linear', 'smooth', 'nonlinear', 'chaotic'
+```
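
The dataclass records that `turbo_suitability` is based on CV ratios and that `hybrid_suitability` uses a more lenient threshold, but it does not give the formulas. What follows is a hypothetical scoring rule, not the shipped one. Each objective scores its headroom below its type's CV ratio limit, clipped to [0, 1], and the scores are averaged. Using `leniency=2.0` for hybrid mode is an assumption. Applied to the error and CV figures in the example CLI output, the rule gives 77% and 88%. Treat that agreement as a sanity check of the sketch, not as confirmation of the actual formula.

```python
from typing import Dict

# CV ratio limits per objective type (Physics-Based Classification table)
CV_RATIO_LIMITS = {"linear": 0.5, "smooth": 1.0, "nonlinear": 2.0, "chaotic": 3.0}


def suitability(cv_ratios: Dict[str, float], objective_types: Dict[str, str],
                leniency: float = 1.0) -> float:
    """Average per-objective headroom below the CV ratio limit, in [0, 1].

    Hypothetical rule: score = clip(1 - ratio / (limit * leniency), 0, 1).
    leniency > 1 loosens every limit (e.g. for hybrid mode).
    """
    if not cv_ratios:
        return 0.0
    scores = []
    for name, ratio in cv_ratios.items():
        limit = CV_RATIO_LIMITS[objective_types[name]] * leniency
        scores.append(min(1.0, max(0.0, 1.0 - ratio / limit)))
    return sum(scores) / len(scores)


ratios = {"mass": 3.7 / 16.0, "stress": 2.0 / 7.7, "stiffness": 7.8 / 38.9}
types = {"mass": "linear", "stress": "nonlinear", "stiffness": "nonlinear"}
turbo = suitability(ratios, types)
hybrid = suitability(ratios, types, leniency=2.0)  # assumption: hybrid doubles the limits
print(f"turbo={turbo:.0%} hybrid={hybrid:.0%}")  # → turbo=77% hybrid=88%
```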
+
+### 4. AdaptiveMethodSelector
+
+Main entry point that combines static + dynamic + NN quality analysis:

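 ```python
+from optimization_engine.method_selector import AdaptiveMethodSelector
+
+# config_path, db_path and results_dir point at an existing study's files
 selector = AdaptiveMethodSelector(min_trials=20)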
-recommendation = selector.recommend(config_path, db_path)
+recommendation = selector.recommend(config_path, db_path, results_dir=results_dir)
+
+# Access last NN quality for display
+print(f"Turbo suitability: {selector.last_nn_quality.turbo_suitability:.0%}")
 ```

-### 4. RuntimeAdvisor
+### 5. RuntimeAdvisor

 Monitors optimization progress and suggests pivots:

@@ -210,11 +286,24 @@ Problem Profile:
  Constraints: 1
  Max FEA budget: ~72 trials

+NN Quality Assessment:
+  Validations analyzed: 10
+
+  | Objective     | NN Error | CV     | Ratio | Type       | Quality |
+  |---------------|----------|--------|-------|------------|---------|
+  | mass          |    3.7% |  16.0% |  0.23 | linear     | ✓ Great |
+  | stress        |    2.0% |   7.7% |  0.26 | nonlinear  | ✓ Great |
+  | stiffness     |    7.8% |  38.9% |  0.20 | nonlinear  | ✓ Great |
+
+  Overall Quality: 22%
+  Turbo Suitability: 77%
+  Hybrid Suitability: 88%
+
 ----------------------------------------------------------------------

  RECOMMENDED: TURBO
  Confidence: 100%
-  Reason: low-dimensional design space; sufficient FEA budget; smooth landscape (79%)
+  Reason: low-dimensional design space; sufficient FEA budget; smooth landscape (79%); good NN quality (77%)

  Suggested parameters:
    --nn-trials: 5000
@@ -223,9 +312,12 @@ Problem Profile:
    --epochs: 150

  Alternatives:
-    - hybrid_loop (75%): uncertain landscape - hybrid adapts; adequate budget for iterations
+    - hybrid_loop (90%): uncertain landscape - hybrid adapts; NN adds value with periodic retraining
    - pure_fea (50%): default recommendation

+  Warnings:
+    ! mass: NN error (3.7%) above expected (2%) - consider retraining or using hybrid mode
+
 ======================================================================
 ```
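
The warnings check absolute error against the physics-based "Max Expected Error", not against the CV ratio. This is why `mass` can rate ✓ Great on its ratio (0.23) and still be flagged for exceeding its 2% linear threshold. The sketch below reproduces that check and the message format shown in the example. `nn_warnings` is a hypothetical helper name.

```python
from typing import Dict, List


def nn_warnings(nn_errors: Dict[str, float], expected_errors: Dict[str, float]) -> List[str]:
    """Warn for each objective whose NN error (%) exceeds its physics-based threshold (%)."""
    return [
        f"{name}: NN error ({err:.1f}%) above expected ({expected_errors[name]:g}%)"
        " - consider retraining or using hybrid mode"
        for name, err in nn_errors.items()
        if err > expected_errors[name]
    ]


# Errors from the example table; thresholds from the Physics-Based Classification table
for line in nn_warnings({"mass": 3.7, "stress": 2.0, "stiffness": 7.8},
                        {"mass": 2.0, "stress": 10.0, "stiffness": 10.0}):
    print(f"! {line}")
```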

@@ -312,8 +404,11 @@ optimization_engine/
 └── method_selector.py    # Complete AMS implementation
    ├── ProblemProfiler          # Static config analysis
    ├── EarlyMetricsCollector    # Dynamic FEA metrics
+    ├── NNQualityMetrics         # NN accuracy dataclass
+    ├── NNQualityAssessor        # Relative accuracy assessment
    ├── AdaptiveMethodSelector   # Main recommendation engine
    ├── RuntimeAdvisor           # Mid-run pivot advisor
+    ├── print_recommendation()   # CLI output with NN quality table
    └── recommend_method()       # Convenience function
 ```

@@ -323,4 +418,5 @@ optimization_engine/

 | Version | Date | Changes |
 |---------|------|---------|
+| 2.0 | 2025-12-07 | Added NNQualityAssessor with relative accuracy thresholds |
 | 1.0 | 2025-12-06 | Initial implementation with 4 methods |