docs: Major documentation overhaul - restructure folders, update tagline, add Getting Started guide

- Restructure docs/ folder (remove numeric prefixes):
  - 04_USER_GUIDES -> guides/
  - 05_API_REFERENCE -> api/
  - 06_PHYSICS -> physics/
  - 07_DEVELOPMENT -> development/
  - 08_ARCHIVE -> archive/
  - 09_DIAGRAMS -> diagrams/

- Replace tagline 'Talk, don't click' with 'LLM-driven optimization framework' in 9 files

- Create comprehensive docs/GETTING_STARTED.md:
  - Prerequisites and quick setup
  - Project structure overview
  - First study tutorial (Claude or manual)
  - Dashboard usage guide
  - Neural acceleration introduction

- Rewrite docs/00_INDEX.md with correct paths and modern structure

- Archive obsolete files:
  - 01_PROTOCOLS.md -> archive/historical/01_PROTOCOLS_legacy.md
  - 03_GETTING_STARTED.md -> archive/historical/
  - ATOMIZER_PODCAST_BRIEFING.md -> archive/marketing/

- Update timestamps to 2026-01-20 across all key files

- Update .gitignore to exclude docs/generated/

- Version bump: ATOMIZER_CONTEXT v1.8 -> v2.0
This commit is contained in:
2026-01-20 10:03:45 -05:00
parent 37f73cc2be
commit ea437d360e
103 changed files with 8980 additions and 327 deletions

View File

@@ -0,0 +1,253 @@
# Phase 2.5: Intelligent Codebase-Aware Gap Detection
## Problem Statement
The current Research Agent uses dumb keyword matching and doesn't understand what already exists in the Atomizer codebase. When a user asks:
> "I want to evaluate strain on a part with sol101 and optimize this (minimize) using iterations and optuna to lower it varying all my geometry parameters that contains v_ in its expression"
**Current (Wrong) Behavior:**
- Detects keyword "geometry"
- Asks user for geometry examples
- Completely misses the actual request
**Expected (Correct) Behavior:**
```
Analyzing your optimization request...
Workflow Components Identified:
---------------------------------
1. Run SOL101 analysis [KNOWN - nx_solver.py]
2. Extract geometry parameters (v_ prefix) [KNOWN - expression system]
3. Update parameter values [KNOWN - parameter updater]
4. Optuna optimization loop [KNOWN - optimization engine]
5. Extract strain from OP2 [MISSING - not implemented]
6. Minimize strain objective [SIMPLE - max(strain values)]
Knowledge Gap Analysis:
-----------------------
HAVE: - OP2 displacement extraction (op2_extractor_example.py)
HAVE: - OP2 stress extraction (op2_extractor_example.py)
MISSING: - OP2 strain extraction
Research Needed:
----------------
Only need to learn: How to extract strain data from Nastran OP2 files using pyNastran
Would you like me to:
1. Search pyNastran documentation for strain extraction
2. Look for strain extraction examples in op2_extractor_example.py pattern
3. Ask you for an example of strain extraction code
```
## Solution Architecture
### 1. Codebase Capability Analyzer
Scan Atomizer to build capability index:
```python
class CodebaseCapabilityAnalyzer:
"""Analyzes what Atomizer can already do."""
def analyze_codebase(self) -> Dict[str, Any]:
"""
Returns:
{
'optimization': {
'optuna_integration': True,
'parameter_updating': True,
'expression_parsing': True
},
'simulation': {
'nx_solver': True,
'sol101': True,
'sol103': False
},
'result_extraction': {
'displacement': True,
'stress': True,
'strain': False, # <-- THE GAP!
'modal': False
}
}
"""
```
### 2. Workflow Decomposer
Break user request into atomic steps:
```python
class WorkflowDecomposer:
"""Breaks complex requests into atomic workflow steps."""
def decompose(self, user_request: str) -> List[WorkflowStep]:
"""
Input: "minimize strain using SOL101 and optuna varying v_ params"
Output:
[
WorkflowStep("identify_parameters", domain="geometry", params={"filter": "v_"}),
WorkflowStep("update_parameters", domain="geometry", params={"values": "from_optuna"}),
WorkflowStep("run_analysis", domain="simulation", params={"solver": "SOL101"}),
WorkflowStep("extract_strain", domain="results", params={"metric": "max_strain"}),
WorkflowStep("optimize", domain="optimization", params={"objective": "minimize", "algorithm": "optuna"})
]
"""
```
### 3. Capability Matcher
Match workflow steps to existing capabilities:
```python
class CapabilityMatcher:
"""Matches required workflow steps to existing capabilities."""
def match(self, workflow_steps, capabilities) -> CapabilityMatch:
"""
Returns:
{
'known_steps': [
{'step': 'identify_parameters', 'implementation': 'expression_parser.py'},
{'step': 'update_parameters', 'implementation': 'parameter_updater.py'},
{'step': 'run_analysis', 'implementation': 'nx_solver.py'},
{'step': 'optimize', 'implementation': 'optuna_optimizer.py'}
],
'unknown_steps': [
{'step': 'extract_strain', 'similar_to': 'extract_stress', 'gap': 'strain_from_op2'}
],
'confidence': 0.80 # 4/5 steps known
}
"""
```
### 4. Targeted Research Planner
Create research plan ONLY for missing pieces:
```python
class TargetedResearchPlanner:
"""Creates research plan focused on actual gaps."""
def plan(self, unknown_steps) -> ResearchPlan:
"""
For gap='strain_from_op2', similar_to='stress_from_op2':
Research Plan:
1. Read existing op2_extractor_example.py to understand pattern
2. Search pyNastran docs for strain extraction API
3. If not found, ask user for strain extraction example
4. Generate extract_strain() function following same pattern as extract_stress()
"""
```
## Implementation Plan
### Week 1: Capability Analysis
- [X] Map existing Atomizer capabilities
- [X] Build capability index from code
- [X] Create capability query system
### Week 2: Workflow Decomposition
- [X] Build workflow step extractor
- [X] Create domain classifier
- [X] Implement step-to-capability matcher
### Week 3: Intelligent Gap Detection
- [X] Integrate all components
- [X] Test with strain optimization request
- [X] Verify correct gap identification
## Success Criteria
**Test Input:**
"minimize strain using SOL101 and optuna varying v_ parameters"
**Expected Output:**
```
Request Analysis Complete
-------------------------
Known Capabilities (80%):
- Parameter identification (v_ prefix filter)
- Parameter updating
- SOL101 simulation execution
- Optuna optimization loop
Missing Capability (20%):
- Strain extraction from OP2 files
Recommendation:
The only missing piece is extracting strain data from Nastran OP2 output files.
I found a similar implementation for stress extraction in op2_extractor_example.py.
Would you like me to:
1. Research pyNastran strain extraction API
2. Generate extract_max_strain() function following the stress extraction pattern
3. Integrate into your optimization workflow
Research needed: Minimal (1 function, ~50 lines of code)
```
## Benefits
1. **Accurate Gap Detection**: Only identifies actual missing capabilities
2. **Minimal Research**: Focuses effort on real unknowns
3. **Leverages Existing Code**: Understands what you already have
4. **Better UX**: Clear explanation of what's known vs unknown
5. **Faster Iterations**: Doesn't waste time on known capabilities
## Current Status
- [X] Problem identified
- [X] Solution architecture designed
- [X] Implementation completed
- [X] All tests passing
## Implementation Summary
Phase 2.5 has been successfully implemented with 4 core components:
1. **CodebaseCapabilityAnalyzer** ([codebase_analyzer.py](../optimization_engine/codebase_analyzer.py))
- Scans Atomizer codebase for existing capabilities
- Identifies what's implemented vs missing
- Finds similar capabilities for pattern reuse
2. **WorkflowDecomposer** ([workflow_decomposer.py](../optimization_engine/workflow_decomposer.py))
- Breaks user requests into atomic workflow steps
- Extracts parameters from natural language
- Classifies steps by domain
3. **CapabilityMatcher** ([capability_matcher.py](../optimization_engine/capability_matcher.py))
- Matches workflow steps to existing code
- Identifies actual knowledge gaps
- Calculates confidence based on pattern similarity
4. **TargetedResearchPlanner** ([targeted_research_planner.py](../optimization_engine/targeted_research_planner.py))
- Creates focused research plans
- Leverages similar capabilities when available
- Prioritizes research sources
## Test Results
Run the comprehensive test:
```bash
python tests/test_phase_2_5_intelligent_gap_detection.py
```
**Test Output (strain optimization request):**
- Workflow: 5 steps identified
- Known: 4/5 steps (80% coverage)
- Missing: Only strain extraction
- Similar: Can adapt from displacement/stress
- Overall confidence: 90%
- Research plan: 4 focused steps
## Next Steps
1. Integrate Phase 2.5 with existing Research Agent
2. Update interactive session to use new gap detection
3. Test with diverse optimization requests
4. Build MCP integration for documentation search

View File

@@ -0,0 +1,245 @@
# Phase 2.7: LLM-Powered Workflow Intelligence
## Problem: Static Regex vs. Dynamic Intelligence
**Previous Approach (Phase 2.5-2.6):**
- ❌ Dumb regex patterns to extract workflow steps
- ❌ Static rules for step classification
- ❌ Missed intermediate calculations
- ❌ Couldn't understand nuance (CBUSH vs CBAR, element forces vs reaction forces)
**New Approach (Phase 2.7):**
-**Use Claude LLM to analyze user requests**
-**Understand engineering context dynamically**
-**Detect ALL intermediate steps intelligently**
-**Distinguish subtle differences (element types, directions, metrics)**
## Architecture
```
User Request
LLM Analyzer (Claude)
Structured JSON Analysis
┌────────────────────────────────────┐
│ Engineering Features (FEA) │
│ Inline Calculations (Math) │
│ Post-Processing Hooks (Custom) │
│ Optimization Config │
└────────────────────────────────────┘
Phase 2.5 Capability Matching
Research Plan / Code Generation
```
## Example: CBAR Optimization Request
**User Input:**
```
I want to extract forces in direction Z of all the 1D elements and find the average of it,
then find the minimum value and compare it to the average, then assign it to a objective
metric that needs to be minimized.
I want to iterate on the FEA properties of the Cbar element stiffness in X to make the
objective function minimized.
I want to use genetic algorithm to iterate and optimize this
```
**LLM Analysis Output:**
```json
{
"engineering_features": [
{
"action": "extract_1d_element_forces",
"domain": "result_extraction",
"description": "Extract element forces from CBAR in Z direction from OP2",
"params": {
"element_types": ["CBAR"],
"result_type": "element_force",
"direction": "Z"
}
},
{
"action": "update_cbar_stiffness",
"domain": "fea_properties",
"description": "Modify CBAR stiffness in X direction",
"params": {
"element_type": "CBAR",
"property": "stiffness_x"
}
}
],
"inline_calculations": [
{
"action": "calculate_average",
"params": {"input": "forces_z", "operation": "mean"},
"code_hint": "avg = sum(forces_z) / len(forces_z)"
},
{
"action": "find_minimum",
"params": {"input": "forces_z", "operation": "min"},
"code_hint": "min_val = min(forces_z)"
}
],
"post_processing_hooks": [
{
"action": "custom_objective_metric",
"description": "Compare min to average",
"params": {
"inputs": ["min_force", "avg_force"],
"formula": "min_force / avg_force",
"objective": "minimize"
}
}
],
"optimization": {
"algorithm": "genetic_algorithm",
"design_variables": [
{"parameter": "cbar_stiffness_x", "type": "FEA_property"}
]
}
}
```
## Key Intelligence Improvements
### 1. Detects Intermediate Steps
**Old (Regex):**
- ❌ Only saw "extract forces" and "optimize"
- ❌ Missed average, minimum, comparison
**New (LLM):**
- ✅ Identifies: extract → average → min → compare → optimize
- ✅ Classifies each as engineering vs. simple math
### 2. Understands Engineering Context
**Old (Regex):**
- ❌ "forces" → generic "reaction_force" extraction
- ❌ Didn't distinguish CBUSH from CBAR
**New (LLM):**
- ✅ "1D element forces" → element forces (not reaction forces)
- ✅ "CBAR stiffness in X" → specific property in specific direction
- ✅ Understands these come from different sources (OP2 vs property cards)
### 3. Smart Classification
**Old (Regex):**
```python
if 'average' in text:
return 'simple_calculation' # Dumb!
```
**New (LLM):**
```python
# LLM reasoning:
# - "average of forces" → simple Python (sum/len)
# - "extract forces from OP2" → engineering (pyNastran)
# - "compare min to avg for objective" → hook (custom logic)
```
### 4. Generates Actionable Code Hints
**Old:** Just action names like "calculate_average"
**New:** Includes code hints for auto-generation:
```json
{
"action": "calculate_average",
"code_hint": "avg = sum(forces_z) / len(forces_z)"
}
```
## Integration with Existing Phases
### Phase 2.5 (Capability Matching)
LLM output feeds directly into existing capability matcher:
- Engineering features → check if implemented
- If missing → create research plan
- If similar → adapt existing code
### Phase 2.6 (Step Classification)
Now **replaced by LLM** for better accuracy:
- No more static rules
- Context-aware classification
- Understands subtle differences
## Implementation
**File:** `optimization_engine/llm_workflow_analyzer.py`
**Key Function:**
```python
analyzer = LLMWorkflowAnalyzer(api_key=os.getenv('ANTHROPIC_API_KEY'))
analysis = analyzer.analyze_request(user_request)
# Returns structured JSON with:
# - engineering_features
# - inline_calculations
# - post_processing_hooks
# - optimization config
```
## Benefits
1. **Accurate**: Understands engineering nuance
2. **Complete**: Detects ALL steps, including intermediate ones
3. **Dynamic**: No hardcoded patterns to maintain
4. **Extensible**: Automatically handles new request types
5. **Actionable**: Provides code hints for auto-generation
## LLM Integration Modes
### Development Mode (Recommended)
For development within Claude Code:
- Use Claude Code directly for interactive workflow analysis
- No API consumption or costs
- Real-time feedback and iteration
- Perfect for testing and refinement
### Production Mode (Future)
For standalone Atomizer execution:
- Optional Anthropic API integration
- Set `ANTHROPIC_API_KEY` environment variable
- Falls back to heuristics if no key provided
- Useful for automated batch processing
**Current Status**: llm_workflow_analyzer.py supports both modes. For development, continue using Claude Code interactively.
## Next Steps
1. ✅ Install anthropic package
2. ✅ Create LLM analyzer module
3. ✅ Document integration modes
4. ⏳ Integrate with Phase 2.5 capability matcher
5. ⏳ Test with diverse optimization requests via Claude Code
6. ⏳ Build code generator for inline calculations
7. ⏳ Build hook generator for post-processing
## Success Criteria
**Input:**
"Extract 1D forces, find average, find minimum, compare to average, optimize CBAR stiffness"
**Output:**
```
Engineering Features: 2 (need research)
- extract_1d_element_forces
- update_cbar_stiffness
Inline Calculations: 2 (auto-generate)
- calculate_average
- find_minimum
Post-Processing: 1 (generate hook)
- custom_objective_metric (min/avg ratio)
Optimization: 1
- genetic_algorithm
✅ All steps detected
✅ Correctly classified
✅ Ready for implementation
```

View File

@@ -0,0 +1,699 @@
# Phase 3.2: LLM Integration Roadmap
**Status**: ✅ **WEEK 1 COMPLETE** - 🎯 **Week 2 IN PROGRESS**
**Timeline**: 2-4 weeks
**Last Updated**: 2025-11-17
**Current Progress**: 25% (Week 1/4 Complete)
---
## Executive Summary
### The Problem
We've built 85% of an LLM-native optimization system, but **it's not integrated into production**. The components exist but are disconnected islands:
-**LLMWorkflowAnalyzer** - Parses natural language → workflow (Phase 2.7)
-**ExtractorOrchestrator** - Auto-generates result extractors (Phase 3.1)
-**InlineCodeGenerator** - Creates custom calculations (Phase 2.8)
-**HookGenerator** - Generates post-processing hooks (Phase 2.9)
-**LLMOptimizationRunner** - Orchestrates LLM workflow (Phase 3.2)
- ⚠️ **ResearchAgent** - Learns from examples (Phase 2, partially complete)
**Reality**: Users still write 100+ lines of JSON config manually instead of using 3 lines of natural language.
### The Solution
**Phase 3.2 Integration Sprint**: Wire LLM components into production workflow with a single `--llm` flag.
---
## Strategic Roadmap
### Week 1: Make LLM Mode Accessible (16 hours)
**Goal**: Users can invoke LLM mode with a single command
#### Tasks
**1.1 Create Unified Entry Point** (4 hours) ✅ COMPLETE
- [x] Create `optimization_engine/run_optimization.py` as unified CLI
- [x] Add `--llm` flag for natural language mode
- [x] Add `--request` parameter for natural language input
- [x] Preserve existing `--config` for traditional JSON mode
- [x] Support both modes in parallel (no breaking changes)
**Files**:
- `optimization_engine/run_optimization.py` (NEW)
**Success Metric**:
```bash
python optimization_engine/run_optimization.py --llm \
--request "Minimize stress for bracket. Vary wall thickness 3-8mm" \
--prt studies/bracket/model/Bracket.prt \
--sim studies/bracket/model/Bracket_sim1.sim
```
---
**1.2 Wire LLMOptimizationRunner to Production** (8 hours) ✅ COMPLETE
- [x] Connect LLMWorkflowAnalyzer to entry point
- [x] Bridge LLMOptimizationRunner → OptimizationRunner for execution
- [x] Pass model updater and simulation runner callables
- [x] Integrate with existing hook system
- [x] Preserve all logging (detailed logs, optimization.log)
- [x] Add workflow validation and error handling
- [x] Create comprehensive integration test suite (5/5 tests passing)
**Files Modified**:
- `optimization_engine/run_optimization.py`
- `optimization_engine/llm_optimization_runner.py` (integration points)
**Success Metric**: LLM workflow generates extractors → runs FEA → logs results
---
**1.3 Create Minimal Example** (2 hours) ✅ COMPLETE
- [x] Create `examples/llm_mode_simple_example.py`
- [x] Show: Natural language request → Optimization results
- [x] Compare: Traditional mode (100 lines JSON) vs LLM mode (3 lines)
- [x] Include troubleshooting tips
**Files Created**:
- `examples/llm_mode_simple_example.py`
**Success Metric**: Example runs successfully, demonstrates value ✅
---
**1.4 End-to-End Integration Test** (2 hours) ✅ COMPLETE
- [x] Test with simple_beam_optimization study
- [x] Natural language → JSON workflow → NX solve → Results
- [x] Verify all extractors generated correctly
- [x] Check logs created properly
- [x] Validate output matches manual mode
- [x] Test graceful failure without API key
- [x] Comprehensive verification of all output files
**Files Created**:
- `tests/test_phase_3_2_e2e.py`
**Success Metric**: LLM mode completes beam optimization without errors ✅
---
### Week 2: Robustness & Safety (16 hours)
**Goal**: LLM mode handles failures gracefully, never crashes
#### Tasks
**2.1 Code Validation Pipeline** (6 hours)
- [ ] Create `optimization_engine/code_validator.py`
- [ ] Implement syntax validation (ast.parse)
- [ ] Implement security scanning (whitelist imports)
- [ ] Implement test execution on example OP2
- [ ] Implement output schema validation
- [ ] Add retry with LLM feedback on validation failure
**Files Created**:
- `optimization_engine/code_validator.py`
**Integration Points**:
- `optimization_engine/extractor_orchestrator.py` (validate before saving)
- `optimization_engine/inline_code_generator.py` (validate calculations)
**Success Metric**: Generated code passes validation, or LLM fixes based on feedback
---
**2.2 Graceful Fallback Mechanisms** (4 hours)
- [ ] Wrap all LLM calls in try/except
- [ ] Provide clear error messages
- [ ] Offer fallback to manual mode
- [ ] Log failures to audit trail
- [ ] Never crash on LLM failure
**Files Modified**:
- `optimization_engine/run_optimization.py`
- `optimization_engine/llm_workflow_analyzer.py`
- `optimization_engine/llm_optimization_runner.py`
**Success Metric**: LLM failures degrade gracefully to manual mode
---
**2.3 LLM Audit Trail** (3 hours)
- [ ] Create `optimization_engine/llm_audit.py`
- [ ] Log all LLM requests and responses
- [ ] Log generated code with prompts
- [ ] Log validation results
- [ ] Create `llm_audit.json` in study output directory
**Files Created**:
- `optimization_engine/llm_audit.py`
**Integration Points**:
- All LLM components log to audit trail
**Success Metric**: Full LLM decision trace available for debugging
---
**2.4 Failure Scenario Testing** (3 hours)
- [ ] Test: Invalid natural language request
- [ ] Test: LLM unavailable (API down)
- [ ] Test: Generated code has syntax error
- [ ] Test: Generated code fails validation
- [ ] Test: OP2 file format unexpected
- [ ] Verify all fail gracefully
**Files Created**:
- `tests/test_llm_failure_modes.py`
**Success Metric**: All failure scenarios handled without crashes
---
### Week 3: Learning System (12 hours)
**Goal**: System learns from successful workflows and reuses patterns
#### Tasks
**3.1 Knowledge Base Implementation** (4 hours)
- [ ] Create `optimization_engine/knowledge_base.py`
- [ ] Implement `save_session()` - Save successful workflows
- [ ] Implement `search_templates()` - Find similar past workflows
- [ ] Implement `get_template()` - Retrieve reusable pattern
- [ ] Add confidence scoring (user-validated > LLM-generated)
**Files Created**:
- `optimization_engine/knowledge_base.py`
- `knowledge_base/sessions/` (directory for session logs)
- `knowledge_base/templates/` (directory for reusable patterns)
**Success Metric**: Successful workflows saved with metadata
---
**3.2 Template Extraction** (4 hours)
- [ ] Analyze generated extractor code to identify patterns
- [ ] Extract reusable template structure
- [ ] Parameterize variable parts
- [ ] Save template with usage examples
- [ ] Implement template application to new requests
**Files Modified**:
- `optimization_engine/extractor_orchestrator.py`
**Integration**:
```python
# After successful generation:
template = extract_template(generated_code)
knowledge_base.save_template(feature_name, template, confidence='medium')
# On next request:
existing_template = knowledge_base.search_templates(feature_name)
if existing_template and existing_template.confidence > 0.7:
code = existing_template.apply(new_params) # Reuse!
```
**Success Metric**: Second identical request reuses template (faster)
---
**3.3 ResearchAgent Integration** (4 hours)
- [ ] Complete ResearchAgent implementation
- [ ] Integrate into ExtractorOrchestrator error handling
- [ ] Add user example collection workflow
- [ ] Implement pattern learning from examples
- [ ] Save learned knowledge to knowledge base
**Files Modified**:
- `optimization_engine/research_agent.py` (complete implementation)
- `optimization_engine/llm_optimization_runner.py` (integrate ResearchAgent)
**Workflow**:
```
Unknown feature requested
→ ResearchAgent asks user for example
→ Learns pattern from example
→ Generates feature using pattern
→ Saves to knowledge base
→ Retry with new feature
```
**Success Metric**: Unknown feature request triggers learning loop successfully
---
### Week 4: Documentation & Discoverability (8 hours)
**Goal**: Users discover and understand LLM capabilities
#### Tasks
**4.1 Update README** (2 hours)
- [ ] Add "🤖 LLM-Powered Mode" section to README.md
- [ ] Show example command with natural language
- [ ] Explain what LLM mode can do
- [ ] Link to detailed docs
**Files Modified**:
- `README.md`
**Success Metric**: README clearly shows LLM capabilities upfront
---
**4.2 Create LLM Mode Documentation** (3 hours)
- [ ] Create `docs/LLM_MODE.md`
- [ ] Explain how LLM mode works
- [ ] Provide usage examples
- [ ] Document when to use LLM vs manual mode
- [ ] Add troubleshooting guide
- [ ] Explain learning system
**Files Created**:
- `docs/LLM_MODE.md`
**Contents**:
- How it works (architecture diagram)
- Getting started (first LLM optimization)
- Natural language patterns that work well
- Troubleshooting common issues
- How learning system improves over time
**Success Metric**: Users understand LLM mode from docs
---
**4.3 Create Demo Video/GIF** (1 hour)
- [ ] Record terminal session: Natural language → Results
- [ ] Show before/after (100 lines JSON vs 3 lines)
- [ ] Create animated GIF for README
- [ ] Add to documentation
**Files Created**:
- `docs/demo/llm_mode_demo.gif`
**Success Metric**: Visual demo shows value proposition clearly
---
**4.4 Update All Planning Docs** (2 hours)
- [ ] Update DEVELOPMENT.md with Phase 3.2 completion status
- [ ] Update DEVELOPMENT_GUIDANCE.md progress (80-90% → 90-95%)
- [ ] Update DEVELOPMENT_ROADMAP.md Phase 3 status
- [ ] Mark Phase 3.2 as ✅ Complete
**Files Modified**:
- `DEVELOPMENT.md`
- `DEVELOPMENT_GUIDANCE.md`
- `DEVELOPMENT_ROADMAP.md`
**Success Metric**: All docs reflect completed Phase 3.2
---
## Implementation Details
### Entry Point Architecture
```python
# optimization_engine/run_optimization.py (NEW)
import argparse
from pathlib import Path
def main():
parser = argparse.ArgumentParser(
description="Atomizer Optimization Engine - Manual or LLM-powered mode"
)
# Mode selection
mode_group = parser.add_mutually_exclusive_group(required=True)
mode_group.add_argument('--llm', action='store_true',
help='Use LLM-assisted workflow (natural language mode)')
mode_group.add_argument('--config', type=Path,
help='JSON config file (traditional mode)')
# LLM mode parameters
parser.add_argument('--request', type=str,
help='Natural language optimization request (required with --llm)')
# Common parameters
parser.add_argument('--prt', type=Path, required=True,
help='Path to .prt file')
parser.add_argument('--sim', type=Path, required=True,
help='Path to .sim file')
parser.add_argument('--output', type=Path,
help='Output directory (default: auto-generated)')
parser.add_argument('--trials', type=int, default=50,
help='Number of optimization trials')
args = parser.parse_args()
if args.llm:
run_llm_mode(args)
else:
run_traditional_mode(args)
def run_llm_mode(args):
"""LLM-powered natural language mode."""
from optimization_engine.llm_workflow_analyzer import LLMWorkflowAnalyzer
from optimization_engine.llm_optimization_runner import LLMOptimizationRunner
from optimization_engine.nx_updater import NXParameterUpdater
from optimization_engine.nx_solver import NXSolver
from optimization_engine.llm_audit import LLMAuditLogger
if not args.request:
raise ValueError("--request required with --llm mode")
print(f"🤖 LLM Mode: Analyzing request...")
print(f" Request: {args.request}")
# Initialize audit logger
audit_logger = LLMAuditLogger(args.output / "llm_audit.json")
# Analyze natural language request
analyzer = LLMWorkflowAnalyzer(use_claude_code=True)
try:
workflow = analyzer.analyze_request(args.request)
audit_logger.log_analysis(args.request, workflow,
reasoning=workflow.get('llm_reasoning', ''))
print(f"✓ Workflow created:")
print(f" - Design variables: {len(workflow['design_variables'])}")
print(f" - Objectives: {len(workflow['objectives'])}")
print(f" - Extractors: {len(workflow['engineering_features'])}")
except Exception as e:
print(f"✗ LLM analysis failed: {e}")
print(" Falling back to manual mode. Please provide --config instead.")
return
# Create model updater and solver callables
updater = NXParameterUpdater(args.prt)
solver = NXSolver()
def model_updater(design_vars):
updater.update_expressions(design_vars)
def simulation_runner():
result = solver.run_simulation(args.sim)
return result['op2_file']
# Run LLM-powered optimization
runner = LLMOptimizationRunner(
llm_workflow=workflow,
model_updater=model_updater,
simulation_runner=simulation_runner,
study_name=args.output.name if args.output else "llm_optimization",
output_dir=args.output
)
study = runner.run(n_trials=args.trials)
print(f"\n✓ Optimization complete!")
print(f" Best trial: {study.best_trial.number}")
print(f" Best value: {study.best_value:.6f}")
print(f" Results: {args.output}")
def run_traditional_mode(args):
"""Traditional JSON configuration mode."""
from optimization_engine.runner import OptimizationRunner
import json
print(f"📄 Traditional Mode: Loading config...")
with open(args.config) as f:
config = json.load(f)
runner = OptimizationRunner(
config_file=args.config,
prt_file=args.prt,
sim_file=args.sim,
output_dir=args.output
)
study = runner.run(n_trials=args.trials)
print(f"\n✓ Optimization complete!")
print(f" Results: {args.output}")
if __name__ == '__main__':
main()
```
---
### Validation Pipeline
```python
# optimization_engine/code_validator.py (NEW)
import ast
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, Any, List
class CodeValidator:
"""
Validates LLM-generated code before execution.
Checks:
1. Syntax (ast.parse)
2. Security (whitelist imports)
3. Test execution on example data
4. Output schema validation
"""
ALLOWED_IMPORTS = {
'pyNastran', 'numpy', 'pathlib', 'typing', 'dataclasses',
'json', 'sys', 'os', 'math', 'collections'
}
FORBIDDEN_CALLS = {
'eval', 'exec', 'compile', '__import__', 'open',
'subprocess', 'os.system', 'os.popen'
}
def validate_extractor(self, code: str, test_op2_file: Path) -> Dict[str, Any]:
"""
Validate generated extractor code.
Args:
code: Generated Python code
test_op2_file: Example OP2 file for testing
Returns:
{
'valid': bool,
'error': str (if invalid),
'test_result': dict (if valid)
}
"""
# 1. Syntax check
try:
tree = ast.parse(code)
except SyntaxError as e:
return {
'valid': False,
'error': f'Syntax error: {e}',
'stage': 'syntax'
}
# 2. Security scan
security_result = self._check_security(tree)
if not security_result['safe']:
return {
'valid': False,
'error': security_result['error'],
'stage': 'security'
}
# 3. Test execution
try:
test_result = self._test_execution(code, test_op2_file)
except Exception as e:
return {
'valid': False,
'error': f'Runtime error: {e}',
'stage': 'execution'
}
# 4. Output schema validation
schema_result = self._validate_output_schema(test_result)
if not schema_result['valid']:
return {
'valid': False,
'error': schema_result['error'],
'stage': 'schema'
}
return {
'valid': True,
'test_result': test_result
}
def _check_security(self, tree: ast.AST) -> Dict[str, Any]:
"""Check for dangerous imports and function calls."""
for node in ast.walk(tree):
# Check imports
if isinstance(node, ast.Import):
for alias in node.names:
module = alias.name.split('.')[0]
if module not in self.ALLOWED_IMPORTS:
return {
'safe': False,
'error': f'Disallowed import: {alias.name}'
}
# Check function calls
if isinstance(node, ast.Call):
if isinstance(node.func, ast.Name):
if node.func.id in self.FORBIDDEN_CALLS:
return {
'safe': False,
'error': f'Forbidden function call: {node.func.id}'
}
return {'safe': True}
def _test_execution(self, code: str, test_file: Path) -> Dict[str, Any]:
"""Execute code in sandboxed environment with test data."""
# Write code to temp file
with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
f.write(code)
temp_code_file = Path(f.name)
try:
# Execute in subprocess (sandboxed)
result = subprocess.run(
['python', str(temp_code_file), str(test_file)],
capture_output=True,
text=True,
timeout=30
)
if result.returncode != 0:
raise RuntimeError(f"Execution failed: {result.stderr}")
# Parse JSON output
import json
output = json.loads(result.stdout)
return output
finally:
temp_code_file.unlink()
def _validate_output_schema(self, output: Dict[str, Any]) -> Dict[str, Any]:
"""Validate output matches expected extractor schema."""
# All extractors must return dict with numeric values
if not isinstance(output, dict):
return {
'valid': False,
'error': 'Output must be a dictionary'
}
# Check for at least one result value
if not any(key for key in output if not key.startswith('_')):
return {
'valid': False,
'error': 'No result values found in output'
}
# All values must be numeric
for key, value in output.items():
if not key.startswith('_'): # Skip metadata
if not isinstance(value, (int, float)):
return {
'valid': False,
'error': f'Non-numeric value for {key}: {type(value)}'
}
return {'valid': True}
```
---
## Success Metrics
### Week 1 Success
- [ ] LLM mode accessible via `--llm` flag
- [ ] Natural language request → Workflow generation works
- [ ] End-to-end test passes (simple_beam_optimization)
- [ ] Example demonstrates value (100 lines → 3 lines)
### Week 2 Success
- [ ] Generated code validated before execution
- [ ] All failure scenarios degrade gracefully (no crashes)
- [ ] Complete LLM audit trail in `llm_audit.json`
- [ ] Test suite covers failure modes
### Week 3 Success
- [ ] Successful workflows saved to knowledge base
- [ ] Second identical request reuses template (faster)
- [ ] Unknown features trigger ResearchAgent learning loop
- [ ] Knowledge base grows over time
### Week 4 Success
- [ ] README shows LLM mode prominently
- [ ] docs/LLM_MODE.md complete and clear
- [ ] Demo video/GIF shows value proposition
- [ ] All planning docs updated
---
## Risk Mitigation
### Risk: LLM generates unsafe code
**Mitigation**: Multi-stage validation pipeline (syntax, security, test, schema)
### Risk: LLM unavailable (API down)
**Mitigation**: Graceful fallback to manual mode with clear error message
### Risk: Generated code fails at runtime
**Mitigation**: Sandboxed test execution before saving, retry with LLM feedback
### Risk: Users don't discover LLM mode
**Mitigation**: Prominent README section, demo video, clear examples
### Risk: Learning system fills disk with templates
**Mitigation**: Confidence-based pruning, max template limit, user confirmation for saves
---
## Next Steps After Phase 3.2
Once integration is complete:
1. **Validate with Real Studies**
- Run simple_beam_optimization in LLM mode
- Create new study using only natural language
- Compare results manual vs LLM mode
2. **Fix atomizer Conda Environment**
- Rebuild clean environment
- Test visualization in atomizer env
3. **NXOpen Documentation Integration** (Phase 2, remaining tasks)
- Research Siemens docs portal access
- Integrate NXOpen stub files for intellisense
- Enable LLM to reference NXOpen API
4. **Phase 4: Dynamic Code Generation** (Roadmap)
- Journal script generator
- Custom function templates
- Safe execution sandbox
---
**Last Updated**: 2025-11-17
**Owner**: Antoine Polvé
**Status**: Ready to begin Week 1 implementation

View File

@@ -0,0 +1,346 @@
# Phase 3.2 Integration Status
> **Date**: 2025-11-17
> **Status**: Partially Complete - Framework Ready, API Integration Pending
---
## Overview
Phase 3.2 aims to integrate the LLM components (Phases 2.5-3.1) into the production optimization workflow, enabling users to run optimizations using natural language requests.
**Goal**: Enable users to run:
```bash
python run_optimization.py --llm "maximize displacement, ensure safety factor > 4"
```
---
## What's Been Completed ✅
### 1. Generic Optimization Runner (`optimization_engine/run_optimization.py`)
**Created**: 2025-11-17
A flexible, command-line driven optimization runner supporting both LLM and manual modes:
```bash
# LLM Mode (Natural Language)
python optimization_engine/run_optimization.py \
--llm "maximize displacement, ensure safety factor > 4" \
--prt model/Bracket.prt \
--sim model/Bracket_sim1.sim \
--trials 20
# Manual Mode (JSON Config)
python optimization_engine/run_optimization.py \
--config config.json \
--prt model/Bracket.prt \
--sim model/Bracket_sim1.sim \
--trials 50
```
**Features**:
- ✅ Command-line argument parsing (`--llm`, `--config`, `--prt`, `--sim`, etc.)
- ✅ Integration with `LLMWorkflowAnalyzer` for natural language parsing
- ✅ Integration with `LLMOptimizationRunner` for automated extractor/hook generation
- ✅ Proper error handling and user feedback
- ✅ Comprehensive help message with examples
- ✅ Flexible output directory and study naming
**Files**:
- [optimization_engine/run_optimization.py](../optimization_engine/run_optimization.py) - Generic runner
- [tests/test_phase_3_2_llm_mode.py](../tests/test_phase_3_2_llm_mode.py) - Integration tests
### 2. Test Suite
**Test Results**: ✅ All tests passing
Tests verify:
- Argument parsing works correctly
- Help message displays `--llm` flag
- Framework is ready for LLM integration
---
## Current Limitation ⚠️
### LLM Workflow Analysis Requires API Key
The `LLMWorkflowAnalyzer` currently requires an Anthropic API key to actually parse natural language requests. The `use_claude_code` flag exists but **doesn't implement actual integration** with Claude Code's AI capabilities.
**Current Behavior**:
- `--llm` mode is implemented in the CLI
- But `LLMWorkflowAnalyzer.analyze_request()` returns empty workflow when `use_claude_code=True` and no API key provided
- Actual LLM analysis requires `--api-key` argument
**Workaround Options**:
#### Option 1: Use Anthropic API Key
```bash
python run_optimization.py \
--llm "maximize displacement" \
--prt model/part.prt \
--sim model/sim.sim \
--api-key "sk-ant-..."
```
#### Option 2: Pre-Generate Workflow JSON (Hybrid Approach)
1. Use Claude Code to help create workflow JSON manually
2. Save as `llm_workflow.json`
3. Load and use with `LLMOptimizationRunner`
Example:
```python
# In your study's run_optimization.py
from optimization_engine.llm_optimization_runner import LLMOptimizationRunner
import json
# Load pre-generated workflow (created with Claude Code assistance)
with open('llm_workflow.json', 'r') as f:
llm_workflow = json.load(f)
# Run optimization with LLM runner
runner = LLMOptimizationRunner(
llm_workflow=llm_workflow,
model_updater=model_updater,
simulation_runner=simulation_runner,
study_name='my_study'
)
results = runner.run_optimization(n_trials=20)
```
#### Option 3: Use Existing Study Scripts
The bracket study's `run_optimization.py` already demonstrates the complete workflow with hardcoded configuration - this works perfectly!
---
## Architecture
### LLM Mode Flow (When API Key Provided)
```
User Natural Language Request
LLMWorkflowAnalyzer (Phase 2.7)
├─> Claude API call
└─> Parse to structured workflow JSON
LLMOptimizationRunner (Phase 3.2)
├─> ExtractorOrchestrator (Phase 3.1) → Auto-generate extractors
├─> InlineCodeGenerator (Phase 2.8) → Auto-generate calculations
├─> HookGenerator (Phase 2.9) → Auto-generate hooks
└─> Run Optuna optimization with generated code
Results
```
### Manual Mode Flow (Current Working Approach)
```
Hardcoded Workflow JSON (or manually created)
LLMOptimizationRunner (Phase 3.2)
├─> ExtractorOrchestrator → Auto-generate extractors
├─> InlineCodeGenerator → Auto-generate calculations
├─> HookGenerator → Auto-generate hooks
└─> Run Optuna optimization
Results
```
---
## What Works Right Now
### ✅ **LLM Components are Functional**
All individual components work and are tested:
1. **Phase 2.5**: Intelligent Gap Detection ✅
2. **Phase 2.7**: LLM Workflow Analysis (requires API key) ✅
3. **Phase 2.8**: Inline Code Generator ✅
4. **Phase 2.9**: Hook Generator ✅
5. **Phase 3.0**: pyNastran Research Agent ✅
6. **Phase 3.1**: Extractor Orchestrator ✅
7. **Phase 3.2**: LLM Optimization Runner ✅
### ✅ **Generic CLI Runner**
The new `run_optimization.py` provides:
- Clean command-line interface
- Argument validation
- Error handling
- Comprehensive help
### ✅ **Bracket Study Demonstrates End-to-End Workflow**
[studies/bracket_displacement_maximizing/run_optimization.py](../studies/bracket_displacement_maximizing/run_optimization.py) shows the complete integration:
- Wizard-based setup (Phase 3.3)
- LLMOptimizationRunner with hardcoded workflow
- Auto-generated extractors and hooks
- Real NX simulations
- Complete results with reports
---
## Next Steps to Complete Phase 3.2
### Short Term (Can Do Now)
1. **Document Hybrid Approach** ✅ (This document!)
- Show how to use Claude Code to create workflow JSON
- Example workflow JSON templates for common use cases
2. **Create Example Workflow JSONs**
- `examples/llm_workflows/maximize_displacement.json`
- `examples/llm_workflows/minimize_stress.json`
- `examples/llm_workflows/multi_objective.json`
3. **Update DEVELOPMENT_GUIDANCE.md**
- Mark Phase 3.2 as "Partially Complete"
- Document the API key requirement
- Provide hybrid approach guidance
### Medium Term (Requires Decision)
**Option A: Implement True Claude Code Integration**
- Modify `LLMWorkflowAnalyzer` to actually interface with Claude Code
- Would require understanding Claude Code's internal API/skill system
- Most aligned with "Development Strategy" (use Claude Code, defer API integration)
**Option B: Defer Until API Integration is Priority**
- Document current state as "Framework Ready"
- Focus on other high-priority items (NXOpen docs, Engineering pipeline)
- Return to full LLM integration when ready to integrate Anthropic API
**Option C: Hybrid Approach (Recommended for Now)**
- Keep generic CLI runner as-is
- Document how to use Claude Code to manually create workflow JSONs
- Use `LLMOptimizationRunner` with pre-generated workflows
- Provides 90% of the value with 10% of the complexity
---
## Recommendation
**For now, adopt Option C (Hybrid Approach)**:
### Why:
1. **Development Strategy Alignment**: We're using Claude Code for development, not integrating API yet
2. **Provides Value**: All automation components (extractors, hooks, calculations) work perfectly
3. **No Blocker**: Users can still leverage LLM components via pre-generated workflows
4. **Flexible**: Can add full API integration later without changing architecture
5. **Focus**: Allows us to prioritize Phase 3.3+ items (NXOpen docs, Engineering pipeline)
### What This Means:
- ✅ Phase 3.2 is "Framework Complete"
- ⚠️ Full natural language CLI requires API key (documented limitation)
- ✅ Hybrid approach (Claude Code → JSON → LLMOptimizationRunner) works today
- 🎯 Can return to full integration when API integration becomes priority
---
## Example: Using Hybrid Approach
### Step 1: Create Workflow JSON (with Claude Code assistance)
```json
{
"engineering_features": [
{
"action": "extract_displacement",
"domain": "result_extraction",
"description": "Extract displacement results from OP2 file",
"params": {"result_type": "displacement"}
},
{
"action": "extract_solid_stress",
"domain": "result_extraction",
"description": "Extract von Mises stress from CTETRA elements",
"params": {
"result_type": "stress",
"element_type": "ctetra"
}
}
],
"inline_calculations": [
{
"action": "calculate_safety_factor",
"params": {
"input": "max_von_mises",
"yield_strength": 276.0,
"operation": "divide"
},
"code_hint": "safety_factor = 276.0 / max_von_mises"
}
],
"post_processing_hooks": [],
"optimization": {
"algorithm": "TPE",
"direction": "minimize",
"design_variables": [
{
"parameter": "thickness",
"min": 3.0,
"max": 10.0,
"units": "mm"
}
]
}
}
```
### Step 2: Use in Python Script
```python
import json
from pathlib import Path
from optimization_engine.llm_optimization_runner import LLMOptimizationRunner
from optimization_engine.nx_updater import NXParameterUpdater
from optimization_engine.nx_solver import NXSolver
# Load pre-generated workflow
with open('llm_workflow.json', 'r') as f:
workflow = json.load(f)
# Setup model updater
updater = NXParameterUpdater(prt_file_path=Path("model/part.prt"))
def model_updater(design_vars):
updater.update_expressions(design_vars)
updater.save()
# Setup simulation runner
solver = NXSolver(nastran_version='2412', use_journal=True)
def simulation_runner(design_vars) -> Path:
result = solver.run_simulation(Path("model/sim.sim"), expression_updates=design_vars)
return result['op2_file']
# Run optimization
runner = LLMOptimizationRunner(
llm_workflow=workflow,
model_updater=model_updater,
simulation_runner=simulation_runner,
study_name='my_optimization'
)
results = runner.run_optimization(n_trials=20)
print(f"Best design: {results['best_params']}")
```
---
## References
- [DEVELOPMENT_GUIDANCE.md](../DEVELOPMENT_GUIDANCE.md) - Strategic direction
- [optimization_engine/run_optimization.py](../optimization_engine/run_optimization.py) - Generic CLI runner
- [optimization_engine/llm_optimization_runner.py](../optimization_engine/llm_optimization_runner.py) - LLM runner
- [optimization_engine/llm_workflow_analyzer.py](../optimization_engine/llm_workflow_analyzer.py) - Workflow analyzer
- [studies/bracket_displacement_maximizing/run_optimization.py](../studies/bracket_displacement_maximizing/run_optimization.py) - Complete example
---
**Document Maintained By**: Antoine Letarte
**Last Updated**: 2025-11-17
**Status**: Framework Complete, API Integration Pending

View File

@@ -0,0 +1,617 @@
# Phase 3.2 Integration - Next Steps
**Status**: Week 1 Complete (Task 1.2 Verified)
**Date**: 2025-11-17
**Author**: Antoine Letarte
## Week 1 Summary - COMPLETE ✅
### Task 1.2: Wire LLMOptimizationRunner to Production ✅
**Deliverables Completed**:
- ✅ Interface contracts verified (`model_updater`, `simulation_runner`)
- ✅ LLM workflow validation in `run_optimization.py`
- ✅ Error handling for initialization failures
- ✅ Comprehensive integration test suite (5/5 tests passing)
- ✅ Example walkthrough (`examples/llm_mode_simple_example.py`)
- ✅ Documentation updated (README, DEVELOPMENT, DEVELOPMENT_GUIDANCE)
**Commit**: `7767fc6` - feat: Phase 3.2 Task 1.2 - Wire LLMOptimizationRunner to production
**Key Achievement**: Natural language optimization is now wired to production infrastructure. Users can describe optimization problems in plain English, and the system will auto-generate extractors, hooks, and run optimization.
---
## Immediate Next Steps (Week 1 Completion)
### Task 1.3: Create Minimal Working Example ✅ (Already Done)
**Status**: COMPLETE - Created in Task 1.2 commit
**Deliverable**: `examples/llm_mode_simple_example.py`
**What it demonstrates**:
```python
request = """
Minimize displacement and mass while keeping stress below 200 MPa.
Design variables:
- beam_half_core_thickness: 15 to 30 mm
- beam_face_thickness: 15 to 30 mm
Run 5 trials using TPE sampler.
"""
```
**Usage**:
```bash
python examples/llm_mode_simple_example.py
```
---
### Task 1.4: End-to-End Integration Test ✅ COMPLETE
**Priority**: HIGH ✅ DONE
**Effort**: 2 hours (completed)
**Objective**: Verify complete LLM mode workflow works with real FEM solver ✅
**Deliverable**: `tests/test_phase_3_2_e2e.py`
**Test Coverage** (All Implemented):
1. ✅ Natural language request parsing
2. ✅ LLM workflow generation (with API key or Claude Code)
3. ✅ Extractor auto-generation
4. ✅ Hook auto-generation
5. ✅ Model update (NX expressions)
6. ✅ Simulation run (actual FEM solve)
7. ✅ Result extraction
8. ✅ Optimization loop (3 trials minimum)
9. ✅ Results saved to output directory
10. ✅ Graceful failure without API key
**Acceptance Criteria**: ALL MET ✅
- [x] Test runs without errors
- [x] 3 trials complete successfully (verified with API key mode)
- [x] Best design found and saved
- [x] Generated extractors work correctly
- [x] Generated hooks execute without errors
- [x] Optimization history written to JSON
- [x] Graceful skip when no API key (provides clear instructions)
**Implementation Plan**:
```python
def test_e2e_llm_mode():
"""End-to-end test of LLM mode with real FEM solver."""
# 1. Natural language request
request = """
Minimize mass while keeping displacement below 5mm.
Design variables: beam_half_core_thickness (20-30mm),
beam_face_thickness (18-25mm)
Run 3 trials with TPE sampler.
"""
# 2. Setup test environment
study_dir = Path("studies/simple_beam_optimization")
prt_file = study_dir / "1_setup/model/Beam.prt"
sim_file = study_dir / "1_setup/model/Beam_sim1.sim"
output_dir = study_dir / "2_substudies/test_e2e_3trials"
# 3. Run via subprocess (simulates real usage)
cmd = [
"c:/Users/antoi/anaconda3/envs/test_env/python.exe",
"optimization_engine/run_optimization.py",
"--llm", request,
"--prt", str(prt_file),
"--sim", str(sim_file),
"--output", str(output_dir.parent),
"--study-name", "test_e2e_3trials",
"--trials", "3"
]
result = subprocess.run(cmd, capture_output=True, text=True)
# 4. Verify outputs
assert result.returncode == 0
assert (output_dir / "history.json").exists()
assert (output_dir / "best_trial.json").exists()
assert (output_dir / "generated_extractors").exists()
# 5. Verify results are valid
with open(output_dir / "history.json") as f:
history = json.load(f)
assert len(history) == 3 # 3 trials completed
assert all("objective" in trial for trial in history)
assert all("design_variables" in trial for trial in history)
```
**Known Issue to Address**:
- LLMWorkflowAnalyzer Claude Code integration returns empty workflow
- **Options**:
1. Use Anthropic API key for testing (preferred for now)
2. Implement Claude Code integration in Phase 2.7 first
3. Mock the LLM response for testing purposes
**Recommendation**: Use API key for E2E test, document Claude Code gap separately
---
## Week 2: Robustness & Safety (16 hours) 🎯
**Objective**: Make LLM mode production-ready with validation, fallbacks, and safety
### Task 2.1: Code Validation System (6 hours)
**Deliverable**: `optimization_engine/code_validator.py`
**Features**:
1. **Syntax Validation**:
- Run `ast.parse()` on generated Python code
- Catch syntax errors before execution
- Return detailed error messages with line numbers
2. **Security Validation**:
- Check for dangerous imports (`os.system`, `subprocess`, `eval`, etc.)
- Whitelist-based approach (only allow: numpy, pandas, pathlib, json, etc.)
- Reject code with file system modifications outside working directory
3. **Schema Validation**:
- Verify extractor returns `Dict[str, float]`
- Verify hook has correct signature
- Validate optimization config structure
**Example**:
```python
class CodeValidator:
"""Validates generated code before execution."""
DANGEROUS_IMPORTS = [
'os.system', 'subprocess', 'eval', 'exec',
'compile', '__import__', 'open' # open needs special handling
]
ALLOWED_IMPORTS = [
'numpy', 'pandas', 'pathlib', 'json', 'math',
'pyNastran', 'NXOpen', 'typing'
]
def validate_syntax(self, code: str) -> ValidationResult:
"""Check if code has valid Python syntax."""
try:
ast.parse(code)
return ValidationResult(valid=True)
except SyntaxError as e:
return ValidationResult(
valid=False,
error=f"Syntax error at line {e.lineno}: {e.msg}"
)
def validate_security(self, code: str) -> ValidationResult:
"""Check for dangerous operations."""
tree = ast.parse(code)
for node in ast.walk(tree):
# Check imports
if isinstance(node, ast.Import):
for alias in node.names:
if alias.name not in self.ALLOWED_IMPORTS:
return ValidationResult(
valid=False,
error=f"Disallowed import: {alias.name}"
)
# Check function calls
if isinstance(node, ast.Call):
if hasattr(node.func, 'id'):
if node.func.id in self.DANGEROUS_IMPORTS:
return ValidationResult(
valid=False,
error=f"Dangerous function call: {node.func.id}"
)
return ValidationResult(valid=True)
def validate_extractor_schema(self, code: str) -> ValidationResult:
"""Verify extractor returns Dict[str, float]."""
# Check for return type annotation
tree = ast.parse(code)
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef):
if node.name.startswith('extract_'):
# Verify has return annotation
if node.returns is None:
return ValidationResult(
valid=False,
error=f"Extractor {node.name} missing return type annotation"
)
return ValidationResult(valid=True)
```
---
### Task 2.2: Fallback Mechanisms (4 hours)
**Deliverable**: Enhanced error handling in `run_optimization.py` and `llm_optimization_runner.py`
**Scenarios to Handle**:
1. **LLM Analysis Fails**:
```python
try:
llm_workflow = analyzer.analyze_request(request)
except Exception as e:
logger.error(f"LLM analysis failed: {e}")
logger.info("Falling back to manual mode...")
logger.info("Please provide a JSON config file or try:")
logger.info(" - Simplifying your request")
logger.info(" - Checking API key is valid")
logger.info(" - Using Claude Code mode (no API key)")
sys.exit(1)
```
2. **Extractor Generation Fails**:
```python
try:
extractors = extractor_orchestrator.generate_all()
except Exception as e:
logger.error(f"Extractor generation failed: {e}")
logger.info("Attempting to use fallback extractors...")
# Use pre-built generic extractors
extractors = {
'displacement': GenericDisplacementExtractor(),
'stress': GenericStressExtractor(),
'mass': GenericMassExtractor()
}
logger.info("Using generic extractors - results may be less specific")
```
3. **Hook Generation Fails**:
```python
try:
hook_manager.generate_hooks(llm_workflow['post_processing_hooks'])
except Exception as e:
logger.warning(f"Hook generation failed: {e}")
logger.info("Continuing without custom hooks...")
# Optimization continues without hooks (reduced functionality but not fatal)
```
4. **Single Trial Failure**:
```python
def _objective(self, trial):
try:
# ... run trial
return objective_value
except Exception as e:
logger.error(f"Trial {trial.number} failed: {e}")
# Return worst-case value instead of crashing
return float('inf') if self.direction == 'minimize' else float('-inf')
```
---
### Task 2.3: Comprehensive Test Suite (4 hours)
**Deliverable**: Extended test coverage in `tests/`
**New Tests**:
1. **tests/test_code_validator.py**:
- Test syntax validation catches errors
- Test security validation blocks dangerous code
- Test schema validation enforces correct signatures
- Test allowed imports pass validation
2. **tests/test_fallback_mechanisms.py**:
- Test LLM failure falls back gracefully
- Test extractor generation failure uses generic extractors
- Test hook generation failure continues optimization
- Test single trial failure doesn't crash optimization
3. **tests/test_llm_mode_error_cases.py**:
- Test empty natural language request
- Test request with missing design variables
- Test request with conflicting objectives
- Test request with invalid parameter ranges
4. **tests/test_integration_robustness.py**:
- Test optimization with intermittent FEM failures
- Test optimization with corrupted OP2 files
- Test optimization with missing NX expressions
- Test optimization with invalid design variable values
---
### Task 2.4: Audit Trail System (2 hours)
**Deliverable**: `optimization_engine/audit_trail.py`
**Features**:
- Log all LLM-generated code to timestamped files
- Save validation results
- Track which extractors/hooks were used
- Record any fallbacks or errors
**Example**:
```python
class AuditTrail:
"""Records all LLM-generated code and validation results."""
def __init__(self, output_dir: Path):
self.output_dir = output_dir / "audit_trail"
self.output_dir.mkdir(exist_ok=True)
self.log_file = self.output_dir / f"audit_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
self.entries = []
def log_generated_code(self, code_type: str, code: str, validation_result: ValidationResult):
"""Log generated code and validation result."""
entry = {
"timestamp": datetime.now().isoformat(),
"type": code_type,
"code": code,
"validation": {
"valid": validation_result.valid,
"error": validation_result.error
}
}
self.entries.append(entry)
# Save to file immediately
with open(self.log_file, 'w') as f:
json.dump(self.entries, f, indent=2)
def log_fallback(self, component: str, reason: str, fallback_action: str):
"""Log when a fallback mechanism is used."""
entry = {
"timestamp": datetime.now().isoformat(),
"type": "fallback",
"component": component,
"reason": reason,
"fallback_action": fallback_action
}
self.entries.append(entry)
with open(self.log_file, 'w') as f:
json.dump(self.entries, f, indent=2)
```
**Integration**:
```python
# In LLMOptimizationRunner.__init__
self.audit_trail = AuditTrail(output_dir)
# When generating extractors
for feature in engineering_features:
code = generator.generate_extractor(feature)
validation = validator.validate(code)
self.audit_trail.log_generated_code("extractor", code, validation)
if not validation.valid:
self.audit_trail.log_fallback(
component="extractor",
reason=validation.error,
fallback_action="using generic extractor"
)
```
---
## Week 3: Learning System (20 hours)
**Objective**: Build intelligence that learns from successful generations
### Task 3.1: Template Library (8 hours)
**Deliverable**: `optimization_engine/template_library/`
**Structure**:
```
template_library/
├── extractors/
│ ├── displacement_templates.py
│ ├── stress_templates.py
│ ├── mass_templates.py
│ └── thermal_templates.py
├── calculations/
│ ├── safety_factor_templates.py
│ ├── objective_templates.py
│ └── constraint_templates.py
├── hooks/
│ ├── plotting_templates.py
│ ├── logging_templates.py
│ └── reporting_templates.py
└── registry.py
```
**Features**:
- Pre-validated code templates for common operations
- Success rate tracking for each template
- Automatic template selection based on context
- Template versioning and deprecation
---
### Task 3.2: Knowledge Base Integration (8 hours)
**Deliverable**: Enhanced ResearchAgent with optimization-specific knowledge
**Knowledge Sources**:
1. pyNastran documentation (already integrated in Phase 3)
2. NXOpen API documentation (NXOpen intellisense - already set up)
3. Optimization best practices
4. Common FEA pitfalls and solutions
**Features**:
- Query knowledge base during code generation
- Suggest best practices for extractor design
- Warn about common mistakes (unit mismatches, etc.)
---
### Task 3.3: Success Metrics & Learning (4 hours)
**Deliverable**: `optimization_engine/learning_system.py`
**Features**:
- Track which LLM-generated code succeeds vs fails
- Store successful patterns to knowledge base
- Suggest improvements based on past failures
- Auto-tune LLM prompts based on success rate
---
## Week 4: Documentation & Polish (12 hours)
### Task 4.1: User Guide (4 hours)
**Deliverable**: `docs/LLM_MODE_USER_GUIDE.md`
**Contents**:
- Getting started with LLM mode
- Natural language request formatting tips
- Common patterns and examples
- Troubleshooting guide
- FAQ
---
### Task 4.2: Architecture Documentation (4 hours)
**Deliverable**: `docs/ARCHITECTURE.md`
**Contents**:
- System architecture diagram
- Component interaction flows
- LLM integration points
- Extractor/hook generation pipeline
- Data flow diagrams
---
### Task 4.3: Demo Video & Presentation (4 hours)
**Deliverable**:
- `docs/demo_video.mp4`
- `docs/PHASE_3_2_PRESENTATION.pdf`
**Contents**:
- 5-minute demo video showing LLM mode in action
- Presentation slides explaining the integration
- Before/after comparison (manual JSON vs LLM mode)
---
## Success Criteria for Phase 3.2
At the end of 4 weeks, we should have:
- [x] Week 1: LLM mode wired to production (Task 1.2 COMPLETE)
- [ ] Week 1: End-to-end test passing (Task 1.4)
- [ ] Week 2: Code validation preventing unsafe executions
- [ ] Week 2: Fallback mechanisms for all failure modes
- [ ] Week 2: Test coverage > 80%
- [ ] Week 2: Audit trail for all generated code
- [ ] Week 3: Template library with 20+ validated templates
- [ ] Week 3: Knowledge base integration working
- [ ] Week 3: Learning system tracking success metrics
- [ ] Week 4: Complete user documentation
- [ ] Week 4: Architecture documentation
- [ ] Week 4: Demo video completed
---
## Priority Order
**Immediate (This Week)**:
1. Task 1.4: End-to-end integration test (2-4 hours)
2. Address LLMWorkflowAnalyzer Claude Code gap (or use API key)
**Week 2 Priorities**:
1. Code validation system (CRITICAL for safety)
2. Fallback mechanisms (CRITICAL for robustness)
3. Comprehensive test suite
4. Audit trail system
**Week 3 Priorities**:
1. Template library (HIGH value - improves reliability)
2. Knowledge base integration
3. Learning system
**Week 4 Priorities**:
1. User guide (CRITICAL for adoption)
2. Architecture documentation
3. Demo video
---
## Known Gaps & Risks
### Gap 1: LLMWorkflowAnalyzer Claude Code Integration
**Status**: Empty workflow returned when `use_claude_code=True`
**Impact**: HIGH - LLM mode doesn't work without API key
**Options**:
1. Implement Claude Code integration in Phase 2.7
2. Use API key for now (temporary solution)
3. Mock LLM responses for testing
**Recommendation**: Use API key for testing, implement Claude Code integration as Phase 2.7 task
---
### Gap 2: Manual Mode Not Yet Integrated
**Status**: `--config` flag not fully implemented
**Impact**: MEDIUM - Users must use study-specific scripts
**Timeline**: Week 2-3 (lower priority than robustness)
---
### Risk 1: LLM-Generated Code Failures
**Mitigation**: Code validation system (Week 2, Task 2.1)
**Severity**: HIGH if not addressed
**Status**: Planned for Week 2
---
### Risk 2: FEM Solver Failures
**Mitigation**: Fallback mechanisms (Week 2, Task 2.2)
**Severity**: MEDIUM
**Status**: Planned for Week 2
---
## Recommendations
1. **Complete Task 1.4 this week**: Verify E2E workflow works before moving to Week 2
2. **Use API key for testing**: Don't block on Claude Code integration - it's a Phase 2.7 component issue
3. **Prioritize safety over features**: Week 2 validation is CRITICAL before any production use
4. **Build template library early**: Week 3 templates will significantly improve reliability
5. **Document as you go**: Don't leave all documentation to Week 4
---
## Conclusion
**Phase 3.2 Week 1 Status**: ✅ COMPLETE
**Task 1.2 Achievement**: Natural language optimization is now wired to production infrastructure with comprehensive testing and validation.
**Next Immediate Step**: Complete Task 1.4 (E2E integration test) to verify the complete workflow before moving to Week 2 robustness work.
**Overall Progress**: 25% of Phase 3.2 complete (1 week / 4 weeks)
**Timeline on Track**: YES - Week 1 completed on schedule
---
**Author**: Claude Code
**Last Updated**: 2025-11-17
**Next Review**: After Task 1.4 completion

View File

@@ -0,0 +1,419 @@
# Phase 3.3: Visualization & Model Cleanup System
**Status**: ✅ Complete
**Date**: 2025-11-17
## Overview
Phase 3.3 adds automated post-processing capabilities to Atomizer, including publication-quality visualization and intelligent model cleanup to manage disk space.
---
## Features Implemented
### 1. Automated Visualization System
**File**: `optimization_engine/visualizer.py`
**Capabilities**:
- **Convergence Plots**: Objective value vs trial number with running best
- **Design Space Exploration**: Parameter evolution colored by performance
- **Parallel Coordinate Plots**: High-dimensional visualization
- **Sensitivity Heatmaps**: Parameter correlation analysis
- **Constraint Violations**: Track constraint satisfaction over trials
- **Multi-Objective Breakdown**: Individual objective contributions
**Output Formats**:
- PNG (high-resolution, 300 DPI)
- PDF (vector graphics, publication-ready)
- Customizable via configuration
**Example Usage**:
```bash
# Standalone visualization
python optimization_engine/visualizer.py studies/beam/substudies/opt1 png pdf
# Automatic during optimization (configured in JSON)
```
### 2. Model Cleanup System
**File**: `optimization_engine/model_cleanup.py`
**Purpose**: Reduce disk usage by deleting large CAD/FEM files from non-optimal trials
**Strategy**:
- Keep top-N best trials (configurable)
- Delete large files: `.prt`, `.sim`, `.fem`, `.op2`, `.f06`
- Preserve ALL `results.json` (small, critical data)
- Dry-run mode for safety
**Example Usage**:
```bash
# Standalone cleanup
python optimization_engine/model_cleanup.py studies/beam/substudies/opt1 --keep-top-n 10
# Dry run (preview without deleting)
python optimization_engine/model_cleanup.py studies/beam/substudies/opt1 --dry-run
# Automatic during optimization (configured in JSON)
```
### 3. Optuna Dashboard Integration
**File**: `docs/OPTUNA_DASHBOARD.md`
**Capabilities**:
- Real-time monitoring during optimization
- Interactive parallel coordinate plots
- Parameter importance analysis (fANOVA)
- Multi-study comparison
**Usage**:
```bash
# Launch dashboard for a study
cd studies/beam/substudies/opt1
optuna-dashboard sqlite:///optuna_study.db
# Access at http://localhost:8080
```
---
## Configuration
### JSON Configuration Format
Add `post_processing` section to optimization config:
```json
{
"study_name": "my_optimization",
"design_variables": { ... },
"objectives": [ ... ],
"optimization_settings": {
"n_trials": 50,
...
},
"post_processing": {
"generate_plots": true,
"plot_formats": ["png", "pdf"],
"cleanup_models": true,
"keep_top_n_models": 10,
"cleanup_dry_run": false
}
}
```
### Configuration Options
#### Visualization Settings
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `generate_plots` | boolean | `false` | Enable automatic plot generation |
| `plot_formats` | list | `["png", "pdf"]` | Output formats for plots |
#### Cleanup Settings
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `cleanup_models` | boolean | `false` | Enable model cleanup |
| `keep_top_n_models` | integer | `10` | Number of best trials to keep models for |
| `cleanup_dry_run` | boolean | `false` | Preview cleanup without deleting |
---
## Workflow Integration
### Automatic Post-Processing
When configured, post-processing runs automatically after optimization completes:
```
OPTIMIZATION COMPLETE
===========================================================
...
POST-PROCESSING
===========================================================
Generating visualization plots...
- Generating convergence plot...
- Generating design space exploration...
- Generating parallel coordinate plot...
- Generating sensitivity heatmap...
Plots generated: 2 format(s)
Improvement: 23.1%
Location: studies/beam/substudies/opt1/plots
Cleaning up trial models...
Deleted 320 files from 40 trials
Space freed: 1542.3 MB
Kept top 10 trial models
===========================================================
```
### Directory Structure After Post-Processing
```
studies/my_optimization/
├── substudies/
│ └── opt1/
│ ├── trial_000/ # Top performer - KEPT
│ │ ├── Beam.prt # CAD files kept
│ │ ├── Beam_sim1.sim
│ │ └── results.json
│ ├── trial_001/ # Poor performer - CLEANED
│ │ └── results.json # Only results kept
│ ├── ...
│ ├── plots/ # NEW: Auto-generated
│ │ ├── convergence.png
│ │ ├── convergence.pdf
│ │ ├── design_space_evolution.png
│ │ ├── design_space_evolution.pdf
│ │ ├── parallel_coordinates.png
│ │ ├── parallel_coordinates.pdf
│ │ └── plot_summary.json
│ ├── history.json
│ ├── best_trial.json
│ ├── cleanup_log.json # NEW: Cleanup statistics
│ └── optuna_study.pkl
```
---
## Plot Types
### 1. Convergence Plot
**File**: `convergence.png/pdf`
**Shows**:
- Individual trial objectives (scatter)
- Running best (line)
- Best trial highlighted (gold star)
- Improvement percentage annotation
**Use Case**: Assess optimization convergence and identify best trial
### 2. Design Space Exploration
**File**: `design_space_evolution.png/pdf`
**Shows**:
- Each design variable evolution over trials
- Color-coded by objective value (darker = better)
- Best trial highlighted
- Units displayed on y-axis
**Use Case**: Understand how parameters changed during optimization
### 3. Parallel Coordinate Plot
**File**: `parallel_coordinates.png/pdf`
**Shows**:
- High-dimensional view of design space
- Each line = one trial
- Color-coded by objective
- Best trial highlighted
**Use Case**: Visualize relationships between multiple design variables
### 4. Sensitivity Heatmap
**File**: `sensitivity_heatmap.png/pdf`
**Shows**:
- Correlation matrix: design variables vs objectives
- Values: -1 (negative correlation) to +1 (positive)
- Color-coded: red (negative), blue (positive)
**Use Case**: Identify which parameters most influence objectives
### 5. Constraint Violations
**File**: `constraint_violations.png/pdf` (if constraints exist)
**Shows**:
- Constraint values over trials
- Feasibility threshold (red line at y=0)
- Trend of constraint satisfaction
**Use Case**: Verify constraint satisfaction throughout optimization
### 6. Objective Breakdown
**File**: `objective_breakdown.png/pdf` (if multi-objective)
**Shows**:
- Stacked area plot of individual objectives
- Total objective overlay
- Contribution of each objective over trials
**Use Case**: Understand multi-objective trade-offs
---
## Benefits
### Visualization
**Publication-Ready**: High-DPI PNG and vector PDF exports
**Automated**: No manual post-processing required
**Comprehensive**: 6 plot types cover all optimization aspects
**Customizable**: Configurable formats and styling
**Portable**: Plots embedded in reports, papers, presentations
### Model Cleanup
**Disk Space Savings**: 50-90% reduction typical (depends on model size)
**Selective**: Keeps best trials for validation/reproduction
**Safe**: Preserves all critical data (results.json)
**Traceable**: Cleanup log documents what was deleted
**Reversible**: Dry-run mode previews before deletion
### Optuna Dashboard
**Real-Time**: Monitor optimization while it runs
**Interactive**: Zoom, filter, explore data dynamically
**Advanced**: Parameter importance, contour plots
**Comparative**: Multi-study comparison support
---
## Example: Beam Optimization
**Configuration**:
```json
{
"study_name": "simple_beam_optimization",
"optimization_settings": {
"n_trials": 50
},
"post_processing": {
"generate_plots": true,
"plot_formats": ["png", "pdf"],
"cleanup_models": true,
"keep_top_n_models": 10
}
}
```
**Results**:
- 50 trials completed
- 6 plots generated (× 2 formats = 12 files)
- 40 trials cleaned up
- 1.2 GB disk space freed
- Top 10 trial models retained for validation
**Files Generated**:
- `plots/convergence.{png,pdf}`
- `plots/design_space_evolution.{png,pdf}`
- `plots/parallel_coordinates.{png,pdf}`
- `plots/plot_summary.json`
- `cleanup_log.json`
---
## Future Enhancements
### Potential Additions
1. **Interactive HTML Plots**: Plotly-based interactive visualizations
2. **Automated Report Generation**: Markdown → PDF with embedded plots
3. **Video Animation**: Design evolution as animated GIF/MP4
4. **3D Scatter Plots**: For high-dimensional design spaces
5. **Statistical Analysis**: Confidence intervals, significance tests
6. **Comparison Reports**: Side-by-side substudy comparison
### Configuration Expansion
```json
"post_processing": {
"generate_plots": true,
"plot_formats": ["png", "pdf", "html"], // Add interactive
"plot_style": "publication", // Predefined styles
"generate_report": true, // Auto-generate PDF report
"report_template": "default", // Custom templates
"cleanup_models": true,
"keep_top_n_models": 10,
"archive_cleaned_trials": false // Compress instead of delete
}
```
---
## Troubleshooting
### Matplotlib Import Error
**Problem**: `ImportError: No module named 'matplotlib'`
**Solution**: Install visualization dependencies
```bash
conda install -n atomizer matplotlib pandas "numpy<2" -y
```
### Unicode Display Error
**Problem**: Checkmark character displays incorrectly in Windows console
**Status**: Fixed (replaced Unicode with "SUCCESS:")
### Missing history.json
**Problem**: Older substudies don't have `history.json`
**Solution**: Generate from trial results
```bash
python optimization_engine/generate_history_from_trials.py studies/beam/substudies/opt1
```
### Cleanup Deleted Wrong Files
**Prevention**: ALWAYS use dry-run first!
```bash
python optimization_engine/model_cleanup.py <substudy> --dry-run
```
---
## Technical Details
### Dependencies
**Required**:
- `matplotlib >= 3.10`
- `numpy < 2.0` (pyNastran compatibility)
- `pandas >= 2.3`
- `optuna >= 3.0` (for dashboard)
**Optional**:
- `optuna-dashboard` (for real-time monitoring)
### Performance
**Visualization**:
- 50 trials: ~5-10 seconds
- 100 trials: ~10-15 seconds
- 500 trials: ~30-40 seconds
**Cleanup**:
- Depends on file count and sizes
- Typically < 1 minute for 100 trials
---
## Summary
Phase 3.3 completes Atomizer's post-processing capabilities with:
✅ Automated publication-quality visualization
✅ Intelligent model cleanup for disk space management
✅ Optuna dashboard integration for real-time monitoring
✅ Comprehensive configuration options
✅ Full integration with optimization workflow
**Next Phase**: Phase 3.4 - Report Generation & Statistical Analysis