# Phase 2.5: Intelligent Codebase-Aware Gap Detection

## Problem Statement

The current Research Agent relies on naive keyword matching and has no model of what already exists in the Atomizer codebase. Consider a user request such as:

> "I want to evaluate strain on a part with sol101 and optimize this (minimize) using iterations and optuna to lower it varying all my geometry parameters that contains v_ in its expression"

**Current (Wrong) Behavior:**
- Detects keyword "geometry"
- Asks user for geometry examples
- Completely misses the actual request

**Expected (Correct) Behavior:**
```
Analyzing your optimization request...

Workflow Components Identified:
---------------------------------
1. Run SOL101 analysis                    [KNOWN - nx_solver.py]
2. Extract geometry parameters (v_ prefix) [KNOWN - expression system]
3. Update parameter values                 [KNOWN - parameter updater]
4. Optuna optimization loop               [KNOWN - optimization engine]
5. Extract strain from OP2                [MISSING - not implemented]
6. Minimize strain objective              [SIMPLE - max(strain values)]

Knowledge Gap Analysis:
-----------------------
HAVE:  - OP2 displacement extraction (op2_extractor_example.py)
HAVE:  - OP2 stress extraction (op2_extractor_example.py)
MISSING: - OP2 strain extraction

Research Needed:
----------------
Only need to learn: How to extract strain data from Nastran OP2 files using pyNastran

Would you like me to:
1. Search pyNastran documentation for strain extraction
2. Look for strain extraction examples that follow the op2_extractor_example.py pattern
3. Ask you for an example of strain extraction code
```

## Solution Architecture

### 1. Codebase Capability Analyzer

Scan the Atomizer codebase to build a capability index:

```python
from typing import Any, Dict


class CodebaseCapabilityAnalyzer:
    """Analyzes what Atomizer can already do."""

    def analyze_codebase(self) -> Dict[str, Any]:
        """
        Returns:
        {
            'optimization': {
                'optuna_integration': True,
                'parameter_updating': True,
                'expression_parsing': True
            },
            'simulation': {
                'nx_solver': True,
                'sol101': True,
                'sol103': False
            },
            'result_extraction': {
                'displacement': True,
                'stress': True,
                'strain': False,  # <-- THE GAP!
                'modal': False
            }
        }
        """
```
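One way such an index could be built is sketched below. This is a minimal illustration, not the actual `codebase_analyzer.py`: it assumes a capability counts as present when a marker function is defined somewhere in the scanned sources. The `CAPABILITY_MARKERS` table, the helper names, and the function names `extract_displacement` and `extract_strain` are hypothetical.

```python
import ast
from typing import Dict, Set

# Hypothetical mapping: category -> capability -> function name that signals it.
CAPABILITY_MARKERS: Dict[str, Dict[str, str]] = {
    "result_extraction": {
        "displacement": "extract_displacement",
        "stress": "extract_stress",
        "strain": "extract_strain",
    },
}


def defined_functions(source: str) -> Set[str]:
    """Return the names of all functions defined in a Python source string."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def build_capability_index(sources: Dict[str, str]) -> Dict[str, Dict[str, bool]]:
    """Mark a capability True when its marker function is defined in any file."""
    found: Set[str] = set()
    for source in sources.values():
        found |= defined_functions(source)
    return {
        category: {name: marker in found for name, marker in caps.items()}
        for category, caps in CAPABILITY_MARKERS.items()
    }


# Stand-in for file contents read from disk (file name -> source text).
example_sources = {
    "op2_extractor_example.py": (
        "def extract_displacement(op2_path):\n    pass\n\n"
        "def extract_stress(op2_path):\n    pass\n"
    ),
}
index = build_capability_index(example_sources)
```

Parsing with `ast` rather than searching raw text avoids false positives from comments or strings that merely mention a function name.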

### 2. Workflow Decomposer

Break the user request into atomic steps:

```python
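from __future__ import annotations  # defer evaluation of the WorkflowStep annotation

from typing import List


class WorkflowDecomposer: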
    """Breaks complex requests into atomic workflow steps."""

    def decompose(self, user_request: str) -> List[WorkflowStep]:
        """
        Input: "minimize strain using SOL101 and optuna varying v_ params"

        Output:
        [
            WorkflowStep("identify_parameters", domain="geometry", params={"filter": "v_"}),
            WorkflowStep("update_parameters", domain="geometry", params={"values": "from_optuna"}),
            WorkflowStep("run_analysis", domain="simulation", params={"solver": "SOL101"}),
            WorkflowStep("extract_strain", domain="results", params={"metric": "max_strain"}),
            WorkflowStep("optimize", domain="optimization", params={"objective": "minimize", "algorithm": "optuna"})
        ]
        """
```
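The decomposition in the docstring above could be approximated with a handful of pattern rules. The sketch below is only illustrative and is not taken from `workflow_decomposer.py`: the `WorkflowStep` field names and the regular expressions are assumptions. Unlike single-keyword triggering, each rule emits a concrete step with extracted parameters.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class WorkflowStep:
    # Field names are assumed for illustration.
    name: str
    domain: str
    params: Dict[str, Any] = field(default_factory=dict)


def decompose(user_request: str) -> List[WorkflowStep]:
    """Turn a free-text optimization request into ordered workflow steps."""
    text = user_request.lower()
    steps: List[WorkflowStep] = []

    # A bare token ending in "_" (e.g. "v_") is treated as a parameter prefix filter.
    prefix = re.search(r"\b([a-z]+_)(?!\w)", text)
    if prefix:
        steps.append(WorkflowStep("identify_parameters", "geometry", {"filter": prefix.group(1)}))
        steps.append(WorkflowStep("update_parameters", "geometry", {"values": "from_optuna"}))

    # Solver names such as "SOL101" or "sol 101".
    solver = re.search(r"\bsol\s*(\d+)\b", text)
    if solver:
        steps.append(WorkflowStep("run_analysis", "simulation", {"solver": f"SOL{solver.group(1)}"}))

    # The result quantity that has to be extracted.
    quantity = re.search(r"\b(strain|stress|displacement)\b", text)
    if quantity:
        q = quantity.group(1)
        steps.append(WorkflowStep(f"extract_{q}", "results", {"metric": f"max_{q}"}))

    # Optimization direction and algorithm.
    goal = re.search(r"\b(minimi[sz]e|maximi[sz]e)\b", text)
    if goal:
        objective = "minimize" if goal.group(1).startswith("min") else "maximize"
        algorithm = "optuna" if "optuna" in text else "unspecified"
        steps.append(WorkflowStep("optimize", "optimization", {"objective": objective, "algorithm": algorithm}))

    return steps


steps = decompose("minimize strain using SOL101 and optuna varying v_ params")
```

A rule table like this is easy to audit but brittle against unusual phrasing, so a production decomposer would likely need a more robust parser.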

### 3. Capability Matcher

Match workflow steps to existing capabilities:

```python
from __future__ import annotations  # defer evaluation of the CapabilityMatch annotation


class CapabilityMatcher:
    """Matches required workflow steps to existing capabilities."""

    def match(self, workflow_steps, capabilities) -> CapabilityMatch:
        """
        Returns:
        {
            'known_steps': [
                {'step': 'identify_parameters', 'implementation': 'expression_parser.py'},
                {'step': 'update_parameters', 'implementation': 'parameter_updater.py'},
                {'step': 'run_analysis', 'implementation': 'nx_solver.py'},
                {'step': 'optimize', 'implementation': 'optuna_optimizer.py'}
            ],
            'unknown_steps': [
                {'step': 'extract_strain', 'similar_to': 'extract_stress', 'gap': 'strain_from_op2'}
            ],
            'confidence': 0.80  # 4/5 steps known
        }
        """
```
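The matching logic can be sketched as a lookup against a registry of implemented steps. This is a minimal sketch under stated assumptions: the `REGISTRY` table and the verb-prefix similarity heuristic in `find_similar` are hypothetical, and confidence is computed here as plain step coverage, following the `4/5 steps known` comment above.

```python
from typing import Any, Dict, List, Optional

# Hypothetical registry: implemented step -> module that provides it.
REGISTRY: Dict[str, str] = {
    "identify_parameters": "expression_parser.py",
    "update_parameters": "parameter_updater.py",
    "run_analysis": "nx_solver.py",
    "optimize": "optuna_optimizer.py",
    "extract_stress": "op2_extractor_example.py",
}


def find_similar(step: str, registry: Dict[str, str]) -> Optional[str]:
    """Return an implemented step sharing the same leading verb, e.g. extract_*."""
    verb = step.split("_", 1)[0]
    for known in sorted(registry):
        if known != step and known.split("_", 1)[0] == verb:
            return known
    return None


def match(step_names: List[str], registry: Dict[str, str]) -> Dict[str, Any]:
    """Split steps into known and unknown and report coverage as confidence."""
    known: List[Dict[str, str]] = []
    unknown: List[Dict[str, Optional[str]]] = []
    for step in step_names:
        if step in registry:
            known.append({"step": step, "implementation": registry[step]})
        else:
            unknown.append({"step": step, "similar_to": find_similar(step, registry)})
    confidence = len(known) / len(step_names) if step_names else 0.0
    return {"known_steps": known, "unknown_steps": unknown, "confidence": confidence}


result = match(
    ["identify_parameters", "update_parameters", "run_analysis", "extract_strain", "optimize"],
    REGISTRY,
)
```

Here `result["unknown_steps"]` contains only `extract_strain`, with `extract_stress` suggested as the closest existing pattern.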

### 4. Targeted Research Planner

Create a research plan ONLY for the missing pieces:

```python
from __future__ import annotations  # defer evaluation of the ResearchPlan annotation


class TargetedResearchPlanner:
    """Creates a research plan focused on actual gaps."""

    def plan(self, unknown_steps) -> ResearchPlan:
        """
        For gap='strain_from_op2', similar_to='extract_stress':

        Research Plan:
        1. Read existing op2_extractor_example.py to understand pattern
        2. Search pyNastran docs for strain extraction API
        3. If not found, ask user for strain extraction example
        4. Generate extract_strain() function following same pattern as extract_stress()
        """
```
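The four-step plan in the docstring above could be generated from a gap description roughly as follows. This is an illustrative sketch only: the `ResearchPlan` fields, the `plan_for_gap` helper, and its parameters are assumptions, not the contents of `targeted_research_planner.py`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResearchPlan:
    # Field names are assumed for illustration.
    gap: str
    steps: List[str]


def plan_for_gap(
    gap: str,
    similar_to: Optional[str] = None,
    reference_file: Optional[str] = None,
) -> ResearchPlan:
    """Build an ordered research plan, reusing a similar capability when one exists."""
    steps: List[str] = []
    if similar_to and reference_file:
        # Start from existing code before looking anywhere else.
        steps.append(f"Read {reference_file} to understand the {similar_to} pattern")
    steps.append(f"Search library documentation for {gap}")
    steps.append(f"If not found, ask the user for an example of {gap}")
    if similar_to:
        steps.append(f"Generate code for {gap} following the pattern of {similar_to}")
    else:
        steps.append(f"Generate code for {gap} from the researched example")
    return ResearchPlan(gap=gap, steps=steps)


plan = plan_for_gap(
    "strain_from_op2",
    similar_to="extract_stress",
    reference_file="op2_extractor_example.py",
)
```

Ordering the sources from cheapest (existing code) to most expensive (asking the user) keeps the research effort proportional to the size of the gap.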

## Implementation Plan

### Week 1: Capability Analysis
- [X] Map existing Atomizer capabilities
- [X] Build capability index from code
- [X] Create capability query system

### Week 2: Workflow Decomposition
- [X] Build workflow step extractor
- [X] Create domain classifier
- [X] Implement step-to-capability matcher

### Week 3: Intelligent Gap Detection
- [X] Integrate all components
- [X] Test with strain optimization request
- [X] Verify correct gap identification

## Success Criteria

**Test Input:**
"minimize strain using SOL101 and optuna varying v_ parameters"

**Expected Output:**
```
Request Analysis Complete
-------------------------

Known Capabilities (80%):
- Parameter identification (v_ prefix filter)
- Parameter updating
- SOL101 simulation execution
- Optuna optimization loop

Missing Capability (20%):
- Strain extraction from OP2 files

Recommendation:
The only missing piece is extracting strain data from Nastran OP2 output files.
I found a similar implementation for stress extraction in op2_extractor_example.py.

Would you like me to:
1. Research pyNastran strain extraction API
2. Generate extract_max_strain() function following the stress extraction pattern
3. Integrate into your optimization workflow

Research needed: Minimal (1 function, ~50 lines of code)
```
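The percentages in the expected output follow directly from the known and missing step counts. A minimal sketch of how the header portion of that report might be rendered is shown below; the `render_report` function is hypothetical and covers only the two capability lists, not the recommendation text.

```python
from typing import List


def render_report(known: List[str], missing: List[str]) -> str:
    """Render known and missing capabilities with their share of all steps."""
    total = len(known) + len(missing)
    known_pct = round(100 * len(known) / total) if total else 0
    missing_pct = 100 - known_pct if total else 0
    lines = [
        "Request Analysis Complete",
        "-" * 25,
        "",
        f"Known Capabilities ({known_pct}%):",
    ]
    lines += [f"- {item}" for item in known]
    lines += ["", f"Missing Capability ({missing_pct}%):"]
    lines += [f"- {item}" for item in missing]
    return "\n".join(lines)


report = render_report(
    known=[
        "Parameter identification (v_ prefix filter)",
        "Parameter updating",
        "SOL101 simulation execution",
        "Optuna optimization loop",
    ],
    missing=["Strain extraction from OP2 files"],
)
```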

## Benefits

1. **Accurate Gap Detection**: Only identifies actual missing capabilities
2. **Minimal Research**: Focuses effort on real unknowns
3. **Leverages Existing Code**: Understands what you already have
4. **Better UX**: Clear explanation of what's known vs unknown
5. **Faster Iterations**: Doesn't waste time on known capabilities

## Current Status

- [X] Problem identified
- [X] Solution architecture designed
- [X] Implementation completed
- [X] All tests passing

## Implementation Summary

Phase 2.5 is implemented as four core components:

1. **CodebaseCapabilityAnalyzer** ([codebase_analyzer.py](../optimization_engine/codebase_analyzer.py))
   - Scans Atomizer codebase for existing capabilities
   - Identifies what's implemented vs missing
   - Finds similar capabilities for pattern reuse

2. **WorkflowDecomposer** ([workflow_decomposer.py](../optimization_engine/workflow_decomposer.py))
   - Breaks user requests into atomic workflow steps
   - Extracts parameters from natural language
   - Classifies steps by domain

3. **CapabilityMatcher** ([capability_matcher.py](../optimization_engine/capability_matcher.py))
   - Matches workflow steps to existing code
   - Identifies actual knowledge gaps
   - Calculates confidence based on pattern similarity

4. **TargetedResearchPlanner** ([targeted_research_planner.py](../optimization_engine/targeted_research_planner.py))
   - Creates focused research plans
   - Leverages similar capabilities when available
   - Prioritizes research sources

## Test Results

Run the comprehensive test:
```bash
python tests/test_phase_2_5_intelligent_gap_detection.py
```

**Test Output (strain optimization request):**
- Workflow: 5 steps identified
- Known: 4/5 steps (80% coverage)
- Missing: Only strain extraction
- Similar: Can adapt from displacement/stress extraction
- Overall confidence: 90%
- Research plan: 4 focused steps

## Next Steps

1. Integrate Phase 2.5 with the existing Research Agent
2. Update the interactive session to use the new gap detection
3. Test with diverse optimization requests
4. Build an MCP integration for documentation search