feat: Complete Phase 2.5-2.7 - Intelligent LLM-Powered Workflow Analysis

This commit implements three major architectural improvements to transform Atomizer from static pattern matching to intelligent AI-powered analysis. ## Phase 2.5: Intelligent Codebase-Aware Gap Detection ✅ Created intelligent system that understands existing capabilities before requesting examples: **New Files:** - optimization_engine/codebase_analyzer.py (379 lines) Scans Atomizer codebase for existing FEA/CAE capabilities - optimization_engine/workflow_decomposer.py (507 lines, v0.2.0) Breaks user requests into atomic workflow steps Complete rewrite with multi-objective, constraints, subcase targeting - optimization_engine/capability_matcher.py (312 lines) Matches workflow steps to existing code implementations - optimization_engine/targeted_research_planner.py (259 lines) Creates focused research plans for only missing capabilities **Results:** - 80-90% coverage on complex optimization requests - 87-93% confidence in capability matching - Fixed expression reading misclassification (geometry vs result_extraction) ## Phase 2.6: Intelligent Step Classification ✅ Distinguishes engineering features from simple math operations: **New Files:** - optimization_engine/step_classifier.py (335 lines) **Classification Types:** 1. Engineering Features - Complex FEA/CAE needing research 2. Inline Calculations - Simple math to auto-generate 3. Post-Processing Hooks - Middleware between FEA steps ## Phase 2.7: LLM-Powered Workflow Intelligence ✅ Replaces static regex patterns with Claude AI analysis: **New Files:** - optimization_engine/llm_workflow_analyzer.py (395 lines) Uses Claude API for intelligent request analysis Supports both Claude Code (dev) and API (production) modes - .claude/skills/analyze-workflow.md Skill template for LLM workflow analysis integration **Key Breakthrough:** - Detects ALL intermediate steps (avg, min, normalization, etc.) - Understands engineering context (CBUSH vs CBAR, directions, metrics) - Distinguishes OP2 extraction from part expression reading - Expected 95%+ accuracy with full nuance detection ## Test Coverage **New Test Files:** - tests/test_phase_2_5_intelligent_gap_detection.py (335 lines) - tests/test_complex_multiobj_request.py (130 lines) - tests/test_cbush_optimization.py (130 lines) - tests/test_cbar_genetic_algorithm.py (150 lines) - tests/test_step_classifier.py (140 lines) - tests/test_llm_complex_request.py (387 lines) All tests include: - UTF-8 encoding for Windows console - atomizer environment (not test_env) - Comprehensive validation checks ## Documentation **New Documentation:** - docs/PHASE_2_5_INTELLIGENT_GAP_DETECTION.md (254 lines) - docs/PHASE_2_7_LLM_INTEGRATION.md (227 lines) - docs/SESSION_SUMMARY_PHASE_2_5_TO_2_7.md (252 lines) **Updated:** - README.md - Added Phase 2.5-2.7 completion status - DEVELOPMENT_ROADMAP.md - Updated phase progress ## Critical Fixes 1. **Expression Reading Misclassification** (lines cited in session summary) - Updated codebase_analyzer.py pattern detection - Fixed workflow_decomposer.py domain classification - Added capability_matcher.py read_expression mapping 2. **Environment Standardization** - All code now uses 'atomizer' conda environment - Removed test_env references throughout 3. **Multi-Objective Support** - WorkflowDecomposer v0.2.0 handles multiple objectives - Constraint extraction and validation - Subcase and direction targeting ## Architecture Evolution **Before (Static & Dumb):** User Request → Regex Patterns → Hardcoded Rules → Missed Steps ❌ **After (LLM-Powered & Intelligent):** User Request → Claude AI Analysis → Structured JSON → ├─ Engineering (research needed) ├─ Inline (auto-generate Python) ├─ Hooks (middleware scripts) └─ Optimization (config) ✅ ## LLM Integration Strategy **Development Mode (Current):** - Use Claude Code directly for interactive analysis - No API consumption or costs - Perfect for iterative development **Production Mode (Future):** - Optional Anthropic API integration - Falls back to heuristics if no API key - For standalone batch processing ## Next Steps - Phase 2.8: Inline Code Generation - Phase 2.9: Post-Processing Hook Generation - Phase 3: MCP Integration for automated documentation research 🚀 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 13:35:41 -05:00
parent 986285d9cf
commit 0a7cca9c6a
94 changed files with 12761 additions and 10670 deletions
--- a/optimization_engine/codebase_analyzer.py
+++ b/optimization_engine/codebase_analyzer.py
@@ -0,0 +1,415 @@
+"""
+Codebase Capability Analyzer
+
+Scans the Atomizer codebase to build a capability index showing what features
+are already implemented. This enables intelligent gap detection.
+
+Author: Atomizer Development Team
+Version: 0.1.0 (Phase 2.5)
+Last Updated: 2025-01-16
+"""
+
+import ast
+import re
+from pathlib import Path
+from typing import Dict, List, Set, Any, Optional
+from dataclasses import dataclass
+
+
+@dataclass
+class CodeCapability:
+    """Represents a discovered capability in the codebase."""
+    name: str
+    category: str
+    file_path: Path
+    confidence: float
+    details: Dict[str, Any]
+
+
+class CodebaseCapabilityAnalyzer:
+    """Analyzes the Atomizer codebase to identify existing capabilities."""
+
+    def __init__(self, project_root: Optional[Path] = None):
+        if project_root is None:
+            # Auto-detect project root
+            current = Path(__file__).resolve()
+            while current.parent != current:
+                if (current / 'optimization_engine').exists():
+                    project_root = current
+                    break
+                current = current.parent
+
+        self.project_root = project_root
+        self.capabilities: Dict[str, Dict[str, Any]] = {}
+
+    def analyze_codebase(self) -> Dict[str, Any]:
+        """
+        Analyze the entire codebase and build capability index.
+
+        Returns:
+            {
+                'optimization': {
+                    'optuna_integration': True,
+                    'parameter_updating': True,
+                    'expression_parsing': True
+                },
+                'simulation': {
+                    'nx_solver': True,
+                    'sol101': True,
+                    'sol103': False
+                },
+                'result_extraction': {
+                    'displacement': True,
+                    'stress': True,
+                    'strain': False
+                },
+                'geometry': {
+                    'parameter_extraction': True,
+                    'expression_filtering': True
+                },
+                'materials': {
+                    'xml_generation': True
+                }
+            }
+        """
+        capabilities = {
+            'optimization': {},
+            'simulation': {},
+            'result_extraction': {},
+            'geometry': {},
+            'materials': {},
+            'loads_bc': {},
+            'mesh': {},
+            'reporting': {}
+        }
+
+        # Analyze optimization capabilities
+        capabilities['optimization'] = self._analyze_optimization()
+
+        # Analyze simulation capabilities
+        capabilities['simulation'] = self._analyze_simulation()
+
+        # Analyze result extraction capabilities
+        capabilities['result_extraction'] = self._analyze_result_extraction()
+
+        # Analyze geometry capabilities
+        capabilities['geometry'] = self._analyze_geometry()
+
+        # Analyze material capabilities
+        capabilities['materials'] = self._analyze_materials()
+
+        self.capabilities = capabilities
+        return capabilities
+
+    def _analyze_optimization(self) -> Dict[str, bool]:
+        """Analyze optimization-related capabilities."""
+        capabilities = {
+            'optuna_integration': False,
+            'parameter_updating': False,
+            'expression_parsing': False,
+            'history_tracking': False
+        }
+
+        # Check for Optuna integration
+        optuna_files = list(self.project_root.glob('optimization_engine/*optuna*.py'))
+        if optuna_files or self._file_contains_pattern(
+            self.project_root / 'optimization_engine',
+            r'import\s+optuna|from\s+optuna'
+        ):
+            capabilities['optuna_integration'] = True
+
+        # Check for parameter updating
+        if self._file_contains_pattern(
+            self.project_root / 'optimization_engine',
+            r'def\s+update_parameter|class\s+\w*Parameter\w*Updater'
+        ):
+            capabilities['parameter_updating'] = True
+
+        # Check for expression parsing
+        if self._file_contains_pattern(
+            self.project_root / 'optimization_engine',
+            r'def\s+parse_expression|def\s+extract.*expression'
+        ):
+            capabilities['expression_parsing'] = True
+
+        # Check for history tracking
+        if self._file_contains_pattern(
+            self.project_root / 'optimization_engine',
+            r'class\s+\w*History|def\s+track_history'
+        ):
+            capabilities['history_tracking'] = True
+
+        return capabilities
+
+    def _analyze_simulation(self) -> Dict[str, bool]:
+        """Analyze simulation-related capabilities."""
+        capabilities = {
+            'nx_solver': False,
+            'sol101': False,
+            'sol103': False,
+            'sol106': False,
+            'journal_execution': False
+        }
+
+        # Check for NX solver integration
+        nx_solver_file = self.project_root / 'optimization_engine' / 'nx_solver.py'
+        if nx_solver_file.exists():
+            capabilities['nx_solver'] = True
+            content = nx_solver_file.read_text(encoding='utf-8')
+
+            # Check for specific solution types
+            if 'sol101' in content.lower() or 'SOL101' in content:
+                capabilities['sol101'] = True
+            if 'sol103' in content.lower() or 'SOL103' in content:
+                capabilities['sol103'] = True
+            if 'sol106' in content.lower() or 'SOL106' in content:
+                capabilities['sol106'] = True
+
+        # Check for journal execution
+        if self._file_contains_pattern(
+            self.project_root / 'optimization_engine',
+            r'def\s+run.*journal|def\s+execute.*journal'
+        ):
+            capabilities['journal_execution'] = True
+
+        return capabilities
+
+    def _analyze_result_extraction(self) -> Dict[str, bool]:
+        """Analyze result extraction capabilities."""
+        capabilities = {
+            'displacement': False,
+            'stress': False,
+            'strain': False,
+            'modal': False,
+            'temperature': False
+        }
+
+        # Check result extractors directory
+        extractors_dir = self.project_root / 'optimization_engine' / 'result_extractors'
+        if extractors_dir.exists():
+            # Look for OP2 extraction capabilities
+            for py_file in extractors_dir.glob('*.py'):
+                content = py_file.read_text(encoding='utf-8')
+
+                # Check for displacement extraction
+                if re.search(r'displacement|displacements', content, re.IGNORECASE):
+                    capabilities['displacement'] = True
+
+                # Check for stress extraction
+                if re.search(r'stress|von_mises', content, re.IGNORECASE):
+                    capabilities['stress'] = True
+
+                # Check for strain extraction
+                if re.search(r'strain|strains', content, re.IGNORECASE):
+                    # Need to verify it's actual extraction, not just a comment
+                    if re.search(r'def\s+\w*extract.*strain|strain.*=.*op2', content, re.IGNORECASE):
+                        capabilities['strain'] = True
+
+                # Check for modal extraction
+                if re.search(r'modal|mode_shape|eigenvalue', content, re.IGNORECASE):
+                    capabilities['modal'] = True
+
+                # Check for temperature extraction
+                if re.search(r'temperature|thermal', content, re.IGNORECASE):
+                    capabilities['temperature'] = True
+
+        return capabilities
+
+    def _analyze_geometry(self) -> Dict[str, bool]:
+        """Analyze geometry-related capabilities."""
+        capabilities = {
+            'parameter_extraction': False,
+            'expression_filtering': False,
+            'feature_creation': False
+        }
+
+        # Check for parameter extraction (including expression reading/finding)
+        if self._file_contains_pattern(
+            self.project_root / 'optimization_engine',
+            r'def\s+extract.*parameter|def\s+get.*parameter|def\s+find.*expression|def\s+read.*expression|def\s+get.*expression'
+        ):
+            capabilities['parameter_extraction'] = True
+
+        # Check for expression filtering (v_ prefix)
+        if self._file_contains_pattern(
+            self.project_root / 'optimization_engine',
+            r'v_|filter.*expression|contains.*v_'
+        ):
+            capabilities['expression_filtering'] = True
+
+        # Check for feature creation
+        if self._file_contains_pattern(
+            self.project_root / 'optimization_engine',
+            r'def\s+create.*feature|def\s+add.*feature'
+        ):
+            capabilities['feature_creation'] = True
+
+        return capabilities
+
+    def _analyze_materials(self) -> Dict[str, bool]:
+        """Analyze material-related capabilities."""
+        capabilities = {
+            'xml_generation': False,
+            'material_assignment': False
+        }
+
+        # Check for material XML generation
+        material_files = list(self.project_root.glob('optimization_engine/custom_functions/*material*.py'))
+        if material_files:
+            capabilities['xml_generation'] = True
+
+        # Check for material assignment
+        if self._file_contains_pattern(
+            self.project_root / 'optimization_engine',
+            r'def\s+assign.*material|def\s+set.*material'
+        ):
+            capabilities['material_assignment'] = True
+
+        return capabilities
+
+    def _file_contains_pattern(self, directory: Path, pattern: str) -> bool:
+        """Check if any Python file in directory contains the regex pattern."""
+        if not directory.exists():
+            return False
+
+        for py_file in directory.rglob('*.py'):
+            try:
+                content = py_file.read_text(encoding='utf-8')
+                if re.search(pattern, content):
+                    return True
+            except Exception:
+                continue
+
+        return False
+
+    def get_capability_details(self, category: str, capability: str) -> Optional[Dict[str, Any]]:
+        """Get detailed information about a specific capability."""
+        if category not in self.capabilities:
+            return None
+
+        if capability not in self.capabilities[category]:
+            return None
+
+        if not self.capabilities[category][capability]:
+            return None
+
+        # Find the file that implements this capability
+        details = {
+            'exists': True,
+            'category': category,
+            'name': capability,
+            'implementation_files': []
+        }
+
+        # Search for implementation files based on category
+        search_patterns = {
+            'optimization': ['optuna', 'parameter', 'expression'],
+            'simulation': ['nx_solver', 'journal'],
+            'result_extraction': ['op2', 'extractor', 'result'],
+            'geometry': ['parameter', 'expression', 'geometry'],
+            'materials': ['material', 'xml']
+        }
+
+        if category in search_patterns:
+            for pattern in search_patterns[category]:
+                for py_file in (self.project_root / 'optimization_engine').rglob(f'*{pattern}*.py'):
+                    if py_file.is_file():
+                        details['implementation_files'].append(str(py_file.relative_to(self.project_root)))
+
+        return details
+
+    def find_similar_capabilities(self, missing_capability: str, category: str) -> List[str]:
+        """Find existing capabilities similar to the missing one."""
+        if category not in self.capabilities:
+            return []
+
+        similar = []
+
+        # Special case: for result_extraction, all extraction types are similar
+        # because they use the same OP2 extraction pattern
+        if category == 'result_extraction':
+            for capability, exists in self.capabilities[category].items():
+                if exists and capability != missing_capability:
+                    similar.append(capability)
+            return similar
+
+        # Simple similarity: check if words overlap
+        missing_words = set(missing_capability.lower().split('_'))
+
+        for capability, exists in self.capabilities[category].items():
+            if not exists:
+                continue
+
+            capability_words = set(capability.lower().split('_'))
+
+            # If there's word overlap, consider it similar
+            if missing_words & capability_words:
+                similar.append(capability)
+
+        return similar
+
+    def get_summary(self) -> str:
+        """Get a human-readable summary of capabilities."""
+        if not self.capabilities:
+            self.analyze_codebase()
+
+        lines = ["Atomizer Codebase Capabilities Summary", "=" * 50, ""]
+
+        for category, caps in self.capabilities.items():
+            if not caps:
+                continue
+
+            existing = [name for name, exists in caps.items() if exists]
+            missing = [name for name, exists in caps.items() if not exists]
+
+            if existing:
+                lines.append(f"{category.upper()}:")
+                lines.append(f"  Implemented ({len(existing)}):")
+                for cap in existing:
+                    lines.append(f"    - {cap}")
+
+                if missing:
+                    lines.append(f"  Not Found ({len(missing)}):")
+                    for cap in missing:
+                        lines.append(f"    - {cap}")
+                lines.append("")
+
+        return "\n".join(lines)
+
+
+def main():
+    """Test the codebase analyzer."""
+    analyzer = CodebaseCapabilityAnalyzer()
+
+    print("Analyzing Atomizer codebase...")
+    print("=" * 80)
+
+    capabilities = analyzer.analyze_codebase()
+
+    print("\nCapabilities Found:")
+    print("-" * 80)
+    print(analyzer.get_summary())
+
+    print("\nDetailed Check: Result Extraction")
+    print("-" * 80)
+    for capability, exists in capabilities['result_extraction'].items():
+        status = "FOUND" if exists else "MISSING"
+        print(f"  {capability:20s} : {status}")
+
+        if exists:
+            details = analyzer.get_capability_details('result_extraction', capability)
+            if details and details.get('implementation_files'):
+                print(f"    Files: {', '.join(details['implementation_files'][:2])}")
+
+    print("\nSimilar to 'strain':")
+    print("-" * 80)
+    similar = analyzer.find_similar_capabilities('strain', 'result_extraction')
+    if similar:
+        for cap in similar:
+            print(f"  - {cap} (could be used as pattern)")
+    else:
+        print("  No similar capabilities found")
+
+
+if __name__ == '__main__':
+    main()