# Phase 3.2: LLM Integration Roadmap

**Status**: ✅ **WEEK 1 COMPLETE** - 🎯 **Week 2 IN PROGRESS**
**Timeline**: 2-4 weeks
**Last Updated**: 2025-11-17
**Current Progress**: 25% (Week 1/4 Complete)

---

## Executive Summary

### The Problem

We've built 85% of an LLM-native optimization system, but **it's not integrated into production**. The components exist but are disconnected islands:

- ✅ **LLMWorkflowAnalyzer** - Parses natural language → workflow (Phase 2.7)
- ✅ **ExtractorOrchestrator** - Auto-generates result extractors (Phase 3.1)
- ✅ **InlineCodeGenerator** - Creates custom calculations (Phase 2.8)
- ✅ **HookGenerator** - Generates post-processing hooks (Phase 2.9)
- ✅ **LLMOptimizationRunner** - Orchestrates the LLM workflow (Phase 3.2)
- ⚠️ **ResearchAgent** - Learns from examples (Phase 2, partially complete)

**Reality**: Users still write 100+ lines of JSON config manually instead of using 3 lines of natural language.

### The Solution

**Phase 3.2 Integration Sprint**: Wire the LLM components into the production workflow behind a single `--llm` flag.

---

## Strategic Roadmap

### Week 1: Make LLM Mode Accessible (16 hours)

**Goal**: Users can invoke LLM mode with a single command

#### Tasks

**1.1 Create Unified Entry Point** (4 hours) ✅ COMPLETE

- [x] Create `optimization_engine/run_optimization.py` as unified CLI
- [x] Add `--llm` flag for natural language mode
- [x] Add `--request` parameter for natural language input
- [x] Preserve existing `--config` for traditional JSON mode
- [x] Support both modes in parallel (no breaking changes)

**Files**:
- `optimization_engine/run_optimization.py` (NEW)

**Success Metric**:

```bash
python optimization_engine/run_optimization.py --llm \
    --request "Minimize stress for bracket. Vary wall thickness 3-8mm" \
    --prt studies/bracket/model/Bracket.prt \
    --sim studies/bracket/model/Bracket_sim1.sim
```

---

**1.2 Wire LLMOptimizationRunner to Production** (8 hours) ✅ COMPLETE

- [x] Connect LLMWorkflowAnalyzer to the entry point
- [x] Bridge LLMOptimizationRunner → OptimizationRunner for execution
- [x] Pass model updater and simulation runner callables
- [x] Integrate with the existing hook system
- [x] Preserve all logging (detailed logs, optimization.log)
- [x] Add workflow validation and error handling
- [x] Create comprehensive integration test suite (5/5 tests passing)

**Files Modified**:
- `optimization_engine/run_optimization.py`
- `optimization_engine/llm_optimization_runner.py` (integration points)

**Success Metric**: LLM workflow generates extractors → runs FEA → logs results

---

**1.3 Create Minimal Example** (2 hours) ✅ COMPLETE

- [x] Create `examples/llm_mode_simple_example.py`
- [x] Show: Natural language request → Optimization results
- [x] Compare: Traditional mode (100 lines of JSON) vs LLM mode (3 lines)
- [x] Include troubleshooting tips

**Files Created**:
- `examples/llm_mode_simple_example.py`

**Success Metric**: Example runs successfully and demonstrates the value ✅

---

**1.4 End-to-End Integration Test** (2 hours) ✅ COMPLETE

- [x] Test with the simple_beam_optimization study
- [x] Natural language → JSON workflow → NX solve → Results
- [x] Verify all extractors generated correctly
- [x] Check logs created properly
- [x] Validate output matches manual mode
- [x] Test graceful failure without API key
- [x] Comprehensive verification of all output files

**Files Created**:
- `tests/test_phase_3_2_e2e.py`

**Success Metric**: LLM mode completes beam optimization without errors ✅

---

### Week 2: Robustness & Safety (16 hours)

**Goal**: LLM mode handles failures gracefully, never crashes

#### Tasks

**2.1 Code Validation Pipeline** (6 hours)

- [ ] Create `optimization_engine/code_validator.py`
- [ ] Implement syntax validation (ast.parse)
- [ ] Implement security scanning (whitelist imports)
- [ ] Implement test execution on an example OP2
- [ ] Implement output schema validation
- [ ] Add retry with LLM feedback on validation failure

**Files Created**:
- `optimization_engine/code_validator.py`

**Integration Points**:
- `optimization_engine/extractor_orchestrator.py` (validate before saving)
- `optimization_engine/inline_code_generator.py` (validate calculations)

**Success Metric**: Generated code passes validation, or the LLM fixes it based on feedback
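The core of task 1.1 is the mode switch: `--llm` and `--config` must be accepted in parallel but never together. A minimal sketch of how argparse enforces this (the full entry point appears under Implementation Details below):

```python
import argparse

# Minimal sketch of the mode selection in run_optimization.py:
# exactly one of --llm / --config must be given.
parser = argparse.ArgumentParser()
mode_group = parser.add_mutually_exclusive_group(required=True)
mode_group.add_argument('--llm', action='store_true',
                        help='LLM-assisted natural language mode')
mode_group.add_argument('--config', help='JSON config file (traditional mode)')

args = parser.parse_args(['--llm'])
assert args.llm and args.config is None  # passing both flags exits with an error
```

Because the group is `required=True`, omitting both flags also produces a usage error, so traditional-mode users get an explicit message rather than silent LLM behavior.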
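The last bullet above (retry with LLM feedback) is not shown in the `CodeValidator` listing later in this document; a hypothetical sketch of the loop, assuming a `validate` callable returning the `{'valid': ..., 'error': ...}` dict that the validator produces, and `generate`/`fix` callables wrapping the LLM (names illustrative):

```python
def generate_with_validation(generate, validate, fix, max_attempts=3):
    """Generate code, validate it, and feed validation errors back to the LLM.

    generate() -> str         initial LLM code generation
    validate(code) -> dict    {'valid': bool, 'error': str, 'stage': str}
    fix(code, error) -> str   LLM repair attempt, given the error message
    """
    code = generate()
    for _ in range(max_attempts):
        result = validate(code)
        if result['valid']:
            return code
        # Retry with LLM feedback: the validation error becomes the next prompt
        code = fix(code, result['error'])
    raise RuntimeError(f"Generated code still invalid after {max_attempts} attempts")
```

Capping the attempts keeps a persistently failing generation from looping forever; after the cap, the graceful-fallback path (task 2.2) takes over.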
---

**2.2 Graceful Fallback Mechanisms** (4 hours)

- [ ] Wrap all LLM calls in try/except
- [ ] Provide clear error messages
- [ ] Offer fallback to manual mode
- [ ] Log failures to the audit trail
- [ ] Never crash on LLM failure

**Files Modified**:
- `optimization_engine/run_optimization.py`
- `optimization_engine/llm_workflow_analyzer.py`
- `optimization_engine/llm_optimization_runner.py`

**Success Metric**: LLM failures degrade gracefully to manual mode

---

**2.3 LLM Audit Trail** (3 hours)

- [ ] Create `optimization_engine/llm_audit.py`
- [ ] Log all LLM requests and responses
- [ ] Log generated code with prompts
- [ ] Log validation results
- [ ] Create `llm_audit.json` in the study output directory

**Files Created**:
- `optimization_engine/llm_audit.py`

**Integration Points**:
- All LLM components log to the audit trail

**Success Metric**: Full LLM decision trace available for debugging

---

**2.4 Failure Scenario Testing** (3 hours)

- [ ] Test: Invalid natural language request
- [ ] Test: LLM unavailable (API down)
- [ ] Test: Generated code has a syntax error
- [ ] Test: Generated code fails validation
- [ ] Test: Unexpected OP2 file format
- [ ] Verify all fail gracefully

**Files Created**:
- `tests/test_llm_failure_modes.py`

**Success Metric**: All failure scenarios handled without crashes

---

### Week 3: Learning System (12 hours)

**Goal**: System learns from successful workflows and reuses patterns

#### Tasks

**3.1 Knowledge Base Implementation** (4 hours)

- [ ] Create `optimization_engine/knowledge_base.py`
- [ ] Implement `save_session()` - Save successful workflows
- [ ] Implement `search_templates()` - Find similar past workflows
- [ ] Implement `get_template()` - Retrieve a reusable pattern
- [ ] Add confidence scoring (user-validated > LLM-generated)

**Files Created**:
- `optimization_engine/knowledge_base.py`
- `knowledge_base/sessions/` (directory for session logs)
- `knowledge_base/templates/` (directory for reusable patterns)

**Success Metric**: Successful workflows saved with metadata
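One possible shape for the `knowledge_base.py` API above, as a minimal file-backed sketch; the JSON field names and scoring scheme are assumptions for illustration, not the final design:

```python
import json
import time
from pathlib import Path


class KnowledgeBase:
    """Minimal file-backed session store (illustrative sketch)."""

    def __init__(self, root: Path):
        self.sessions = root / "sessions"
        self.sessions.mkdir(parents=True, exist_ok=True)

    def save_session(self, name: str, workflow: dict, confidence: float) -> Path:
        """Persist a successful workflow together with its metadata."""
        path = self.sessions / f"{name}.json"
        path.write_text(json.dumps({
            "workflow": workflow,
            "confidence": confidence,  # user-validated scores higher than LLM-generated
            "saved_at": time.time(),
        }, indent=2))
        return path

    def search_templates(self, feature: str) -> list:
        """Return saved sessions mentioning a feature, best confidence first."""
        hits = [json.loads(p.read_text()) for p in self.sessions.glob("*.json")]
        hits = [h for h in hits
                if feature in h["workflow"].get("engineering_features", [])]
        return sorted(hits, key=lambda h: -h["confidence"])
```

Plain JSON files in `knowledge_base/sessions/` keep the store inspectable and diffable; a real implementation would likely add `get_template()` and the confidence-based pruning mentioned under Risk Mitigation.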
---

**3.2 Template Extraction** (4 hours)

- [ ] Analyze generated extractor code to identify patterns
- [ ] Extract the reusable template structure
- [ ] Parameterize the variable parts
- [ ] Save the template with usage examples
- [ ] Implement template application to new requests

**Files Modified**:
- `optimization_engine/extractor_orchestrator.py`

**Integration**:

```python
# After successful generation:
template = extract_template(generated_code)
knowledge_base.save_template(feature_name, template, confidence=0.5)  # numeric "medium"

# On next request:
existing_template = knowledge_base.search_templates(feature_name)
if existing_template and existing_template.confidence > 0.7:
    code = existing_template.apply(new_params)  # Reuse!
```

**Success Metric**: Second identical request reuses template (faster)
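"Parameterize the variable parts" can be as simple as `$`-placeholders handled by `string.Template`. An illustrative sketch; the stored template text, placeholder names, and `apply_template` helper are invented for the example:

```python
from string import Template

# Hypothetical stored template: an extractor skeleton with the variable
# parts ($feature, $result_table, $subcase) left as placeholders.
EXTRACTOR_TEMPLATE = Template(
    "def extract(op2):\n"
    "    data = op2.$result_table[$subcase].data\n"
    "    return {'$feature': float(data.max())}\n"
)


def apply_template(template: Template, params: dict) -> str:
    """Fill a saved template's placeholders for a new request."""
    return template.substitute(params)


code = apply_template(EXTRACTOR_TEMPLATE, {
    "feature": "max_von_mises",
    "result_table": "cquad4_stress",
    "subcase": 1,
})
```

`substitute()` raises `KeyError` on a missing placeholder, which gives template application a cheap built-in sanity check before the generated code ever reaches the validator.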
documentation **Files Created**: - `docs/demo/llm_mode_demo.gif` **Success Metric**: Visual demo shows value proposition clearly --- **4.4 Update All Planning Docs** (2 hours) - [ ] Update DEVELOPMENT.md with Phase 3.2 completion status - [ ] Update DEVELOPMENT_GUIDANCE.md progress (80-90% → 90-95%) - [ ] Update DEVELOPMENT_ROADMAP.md Phase 3 status - [ ] Mark Phase 3.2 as ✅ Complete **Files Modified**: - `DEVELOPMENT.md` - `DEVELOPMENT_GUIDANCE.md` - `DEVELOPMENT_ROADMAP.md` **Success Metric**: All docs reflect completed Phase 3.2 --- ## Implementation Details ### Entry Point Architecture ```python # optimization_engine/run_optimization.py (NEW) import argparse from pathlib import Path def main(): parser = argparse.ArgumentParser( description="Atomizer Optimization Engine - Manual or LLM-powered mode" ) # Mode selection mode_group = parser.add_mutually_exclusive_group(required=True) mode_group.add_argument('--llm', action='store_true', help='Use LLM-assisted workflow (natural language mode)') mode_group.add_argument('--config', type=Path, help='JSON config file (traditional mode)') # LLM mode parameters parser.add_argument('--request', type=str, help='Natural language optimization request (required with --llm)') # Common parameters parser.add_argument('--prt', type=Path, required=True, help='Path to .prt file') parser.add_argument('--sim', type=Path, required=True, help='Path to .sim file') parser.add_argument('--output', type=Path, help='Output directory (default: auto-generated)') parser.add_argument('--trials', type=int, default=50, help='Number of optimization trials') args = parser.parse_args() if args.llm: run_llm_mode(args) else: run_traditional_mode(args) def run_llm_mode(args): """LLM-powered natural language mode.""" from optimization_engine.llm_workflow_analyzer import LLMWorkflowAnalyzer from optimization_engine.llm_optimization_runner import LLMOptimizationRunner from optimization_engine.nx_updater import NXParameterUpdater from 
---

**4.4 Update All Planning Docs** (2 hours)

- [ ] Update DEVELOPMENT.md with Phase 3.2 completion status
- [ ] Update DEVELOPMENT_GUIDANCE.md progress (80-90% → 90-95%)
- [ ] Update DEVELOPMENT_ROADMAP.md Phase 3 status
- [ ] Mark Phase 3.2 as ✅ Complete

**Files Modified**:
- `DEVELOPMENT.md`
- `DEVELOPMENT_GUIDANCE.md`
- `DEVELOPMENT_ROADMAP.md`

**Success Metric**: All docs reflect completed Phase 3.2

---

## Implementation Details

### Entry Point Architecture

```python
# optimization_engine/run_optimization.py (NEW)

import argparse
from pathlib import Path


def main():
    parser = argparse.ArgumentParser(
        description="Atomizer Optimization Engine - Manual or LLM-powered mode"
    )

    # Mode selection
    mode_group = parser.add_mutually_exclusive_group(required=True)
    mode_group.add_argument('--llm', action='store_true',
                            help='Use LLM-assisted workflow (natural language mode)')
    mode_group.add_argument('--config', type=Path,
                            help='JSON config file (traditional mode)')

    # LLM mode parameters
    parser.add_argument('--request', type=str,
                        help='Natural language optimization request (required with --llm)')

    # Common parameters
    parser.add_argument('--prt', type=Path, required=True, help='Path to .prt file')
    parser.add_argument('--sim', type=Path, required=True, help='Path to .sim file')
    parser.add_argument('--output', type=Path,
                        help='Output directory (default: auto-generated)')
    parser.add_argument('--trials', type=int, default=50,
                        help='Number of optimization trials')

    args = parser.parse_args()

    if args.llm:
        run_llm_mode(args)
    else:
        run_traditional_mode(args)


def run_llm_mode(args):
    """LLM-powered natural language mode."""
    from optimization_engine.llm_workflow_analyzer import LLMWorkflowAnalyzer
    from optimization_engine.llm_optimization_runner import LLMOptimizationRunner
    from optimization_engine.nx_updater import NXParameterUpdater
    from optimization_engine.nx_solver import NXSolver
    from optimization_engine.llm_audit import LLMAuditLogger

    if not args.request:
        raise ValueError("--request required with --llm mode")

    # --output is optional, so resolve a default before first use
    output_dir = args.output or Path("llm_optimization")

    print("🤖 LLM Mode: Analyzing request...")
    print(f"   Request: {args.request}")

    # Initialize audit logger
    audit_logger = LLMAuditLogger(output_dir / "llm_audit.json")

    # Analyze natural language request
    analyzer = LLMWorkflowAnalyzer(use_claude_code=True)
    try:
        workflow = analyzer.analyze_request(args.request)
        audit_logger.log_analysis(args.request, workflow,
                                  reasoning=workflow.get('llm_reasoning', ''))
        print("✓ Workflow created:")
        print(f"  - Design variables: {len(workflow['design_variables'])}")
        print(f"  - Objectives: {len(workflow['objectives'])}")
        print(f"  - Extractors: {len(workflow['engineering_features'])}")
    except Exception as e:
        print(f"✗ LLM analysis failed: {e}")
        print("  Falling back to manual mode. Please provide --config instead.")
        return

    # Create model updater and solver callables
    updater = NXParameterUpdater(args.prt)
    solver = NXSolver()

    def model_updater(design_vars):
        updater.update_expressions(design_vars)

    def simulation_runner():
        result = solver.run_simulation(args.sim)
        return result['op2_file']

    # Run LLM-powered optimization
    runner = LLMOptimizationRunner(
        llm_workflow=workflow,
        model_updater=model_updater,
        simulation_runner=simulation_runner,
        study_name=output_dir.name,
        output_dir=output_dir,
    )

    study = runner.run(n_trials=args.trials)

    print("\n✓ Optimization complete!")
    print(f"  Best trial: {study.best_trial.number}")
    print(f"  Best value: {study.best_value:.6f}")
    print(f"  Results: {output_dir}")


def run_traditional_mode(args):
    """Traditional JSON configuration mode."""
    from optimization_engine.runner import OptimizationRunner

    print("📄 Traditional Mode: Loading config...")

    # OptimizationRunner parses the JSON config itself
    runner = OptimizationRunner(
        config_file=args.config,
        prt_file=args.prt,
        sim_file=args.sim,
        output_dir=args.output,
    )
    study = runner.run(n_trials=args.trials)

    print("\n✓ Optimization complete!")
    print(f"  Results: {args.output}")


if __name__ == '__main__':
    main()
```

---

### Validation Pipeline

```python
# optimization_engine/code_validator.py (NEW)

import ast
import subprocess
import tempfile
from pathlib import Path
from typing import Any, Dict


class CodeValidator:
    """
    Validates LLM-generated code before execution.

    Checks:
    1. Syntax (ast.parse)
    2. Security (whitelist imports)
    3. Test execution on example data
    4. Output schema validation
    """

    ALLOWED_IMPORTS = {
        'pyNastran', 'numpy', 'pathlib', 'typing', 'dataclasses',
        'json', 'sys', 'os', 'math', 'collections'
    }

    FORBIDDEN_CALLS = {
        'eval', 'exec', 'compile', '__import__', 'open',
        'subprocess', 'os.system', 'os.popen'
    }

    def validate_extractor(self, code: str, test_op2_file: Path) -> Dict[str, Any]:
        """
        Validate generated extractor code.

        Args:
            code: Generated Python code
            test_op2_file: Example OP2 file for testing

        Returns:
            {
                'valid': bool,
                'error': str (if invalid),
                'test_result': dict (if valid)
            }
        """
        # 1. Syntax check
        try:
            tree = ast.parse(code)
        except SyntaxError as e:
            return {'valid': False, 'error': f'Syntax error: {e}', 'stage': 'syntax'}

        # 2. Security scan
        security_result = self._check_security(tree)
        if not security_result['safe']:
            return {'valid': False, 'error': security_result['error'],
                    'stage': 'security'}

        # 3. Test execution
        try:
            test_result = self._test_execution(code, test_op2_file)
        except Exception as e:
            return {'valid': False, 'error': f'Runtime error: {e}',
                    'stage': 'execution'}

        # 4. Output schema validation
        schema_result = self._validate_output_schema(test_result)
        if not schema_result['valid']:
            return {'valid': False, 'error': schema_result['error'],
                    'stage': 'schema'}

        return {'valid': True, 'test_result': test_result}

    def _check_security(self, tree: ast.AST) -> Dict[str, Any]:
        """Check for dangerous imports and function calls."""
        for node in ast.walk(tree):
            # Check imports (both `import x` and `from x import y`)
            if isinstance(node, ast.Import):
                for alias in node.names:
                    module = alias.name.split('.')[0]
                    if module not in self.ALLOWED_IMPORTS:
                        return {'safe': False,
                                'error': f'Disallowed import: {alias.name}'}
            if isinstance(node, ast.ImportFrom) and node.module:
                module = node.module.split('.')[0]
                if module not in self.ALLOWED_IMPORTS:
                    return {'safe': False,
                            'error': f'Disallowed import: {node.module}'}

            # Check function calls; dotted names like os.system arrive as
            # ast.Attribute, not ast.Name, so handle both forms
            if isinstance(node, ast.Call):
                name = None
                if isinstance(node.func, ast.Name):
                    name = node.func.id
                elif (isinstance(node.func, ast.Attribute)
                        and isinstance(node.func.value, ast.Name)):
                    name = f'{node.func.value.id}.{node.func.attr}'
                if name in self.FORBIDDEN_CALLS:
                    return {'safe': False,
                            'error': f'Forbidden function call: {name}'}

        return {'safe': True}

    def _test_execution(self, code: str, test_file: Path) -> Dict[str, Any]:
        """Execute code in a sandboxed subprocess with test data."""
        # Write code to temp file
        with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
            f.write(code)
            temp_code_file = Path(f.name)

        try:
            # Execute in subprocess (sandboxed)
            result = subprocess.run(
                ['python', str(temp_code_file), str(test_file)],
                capture_output=True,
                text=True,
                timeout=30
            )
            if result.returncode != 0:
                raise RuntimeError(f"Execution failed: {result.stderr}")

            # Parse JSON output
            import json
            return json.loads(result.stdout)
        finally:
            temp_code_file.unlink()

    def _validate_output_schema(self, output: Dict[str, Any]) -> Dict[str, Any]:
        """Validate output matches the expected extractor schema."""
        # All extractors must return a dict with numeric values
        if not isinstance(output, dict):
            return {'valid': False, 'error': 'Output must be a dictionary'}

        # Check for at least one result value
        if not any(key for key in output if not key.startswith('_')):
            return {'valid': False, 'error': 'No result values found in output'}

        # All values must be numeric
        for key, value in output.items():
            if not key.startswith('_'):  # Skip metadata
                if not isinstance(value, (int, float)):
                    return {'valid': False,
                            'error': f'Non-numeric value for {key}: {type(value)}'}

        return {'valid': True}
```

---

## Success Metrics

### Week 1 Success

- [x] LLM mode accessible via `--llm` flag
- [x] Natural language request → Workflow generation works
- [x] End-to-end test passes (simple_beam_optimization)
- [x] Example demonstrates value (100 lines → 3 lines)

### Week 2 Success

- [ ] Generated code validated before execution
- [ ] All failure scenarios degrade gracefully (no crashes)
- [ ] Complete LLM audit trail in `llm_audit.json`
- [ ] Test suite covers failure modes

### Week 3 Success

- [ ] Successful workflows saved to knowledge base
- [ ] Second identical request reuses template (faster)
- [ ] Unknown features trigger ResearchAgent learning loop
- [ ] Knowledge base grows over time

### Week 4 Success

- [ ] README shows LLM mode prominently
- [ ] docs/LLM_MODE.md complete and clear
- [ ] Demo video/GIF shows the value proposition
- [ ] All planning docs updated

---

## Risk Mitigation

### Risk: LLM generates unsafe code
**Mitigation**: Multi-stage validation pipeline (syntax, security, test, schema)

### Risk: LLM unavailable (API down)
**Mitigation**: Graceful fallback to manual mode with a clear error message

### Risk: Generated code fails at runtime
**Mitigation**: Sandboxed test execution before saving; retry with LLM feedback

### Risk: Users don't discover LLM mode
**Mitigation**: Prominent README section, demo video, clear examples

### Risk: Learning system fills disk with templates
**Mitigation**: Confidence-based pruning, max template limit, user confirmation for saves

---

## Next Steps After Phase 3.2

Once integration is complete:

1. **Validate with Real Studies**
   - Run simple_beam_optimization in LLM mode
   - Create a new study using only natural language
   - Compare results: manual vs LLM mode

2. **Fix atomizer Conda Environment**
   - Rebuild a clean environment
   - Test visualization in the atomizer env

3. **NXOpen Documentation Integration** (Phase 2, remaining tasks)
   - Research Siemens docs portal access
   - Integrate NXOpen stub files for IntelliSense
   - Enable the LLM to reference the NXOpen API

4. **Phase 4: Dynamic Code Generation** (Roadmap)
   - Journal script generator
   - Custom function templates
   - Safe execution sandbox

---

**Last Updated**: 2025-11-17
**Owner**: Antoine Polvé
**Status**: Week 1 complete - Week 2 in progress