Phase 3.2 Integration - Next Steps
Status: Week 1 Complete (Task 1.2 Verified)
Date: 2025-11-17
Author: Antoine Letarte
Week 1 Summary - COMPLETE ✅
Task 1.2: Wire LLMOptimizationRunner to Production ✅
Deliverables Completed:
- ✅ Interface contracts verified (`model_updater`, `simulation_runner`)
- ✅ LLM workflow validation in `run_optimization.py`
- ✅ Error handling for initialization failures
- ✅ Comprehensive integration test suite (5/5 tests passing)
- ✅ Example walkthrough (`examples/llm_mode_simple_example.py`)
- ✅ Documentation updated (README, DEVELOPMENT, DEVELOPMENT_GUIDANCE)
Commit: 7767fc6 - feat: Phase 3.2 Task 1.2 - Wire LLMOptimizationRunner to production
Key Achievement: Natural language optimization is now wired to production infrastructure. Users can describe optimization problems in plain English, and the system will auto-generate extractors, hooks, and run optimization.
Immediate Next Steps (Week 1 Completion)
Task 1.3: Create Minimal Working Example ✅ (Already Done)
Status: COMPLETE - Created in Task 1.2 commit
Deliverable: examples/llm_mode_simple_example.py
What it demonstrates:
```python
request = """
Minimize displacement and mass while keeping stress below 200 MPa.
Design variables:
- beam_half_core_thickness: 15 to 30 mm
- beam_face_thickness: 15 to 30 mm
Run 5 trials using TPE sampler.
"""
```
Usage:

```shell
python examples/llm_mode_simple_example.py
```
Task 1.4: End-to-End Integration Test ✅ COMPLETE
Priority: HIGH ✅ DONE
Effort: 2 hours (completed)
Objective: Verify the complete LLM mode workflow works with the real FEM solver ✅
Deliverable: tests/test_phase_3_2_e2e.py ✅
Test Coverage (All Implemented):
- ✅ Natural language request parsing
- ✅ LLM workflow generation (with API key or Claude Code)
- ✅ Extractor auto-generation
- ✅ Hook auto-generation
- ✅ Model update (NX expressions)
- ✅ Simulation run (actual FEM solve)
- ✅ Result extraction
- ✅ Optimization loop (3 trials minimum)
- ✅ Results saved to output directory
- ✅ Graceful failure without API key
Acceptance Criteria: ALL MET ✅
- Test runs without errors
- 3 trials complete successfully (verified with API key mode)
- Best design found and saved
- Generated extractors work correctly
- Generated hooks execute without errors
- Optimization history written to JSON
- Graceful skip when no API key (provides clear instructions)
Implementation Plan:

```python
import json
import subprocess
from pathlib import Path

def test_e2e_llm_mode():
    """End-to-end test of LLM mode with real FEM solver."""
    # 1. Natural language request
    request = """
    Minimize mass while keeping displacement below 5mm.
    Design variables: beam_half_core_thickness (20-30mm),
    beam_face_thickness (18-25mm)
    Run 3 trials with TPE sampler.
    """

    # 2. Set up the test environment
    study_dir = Path("studies/simple_beam_optimization")
    prt_file = study_dir / "1_setup/model/Beam.prt"
    sim_file = study_dir / "1_setup/model/Beam_sim1.sim"
    output_dir = study_dir / "2_substudies/test_e2e_3trials"

    # 3. Run via subprocess (simulates real usage)
    cmd = [
        "c:/Users/antoi/anaconda3/envs/test_env/python.exe",
        "optimization_engine/run_optimization.py",
        "--llm", request,
        "--prt", str(prt_file),
        "--sim", str(sim_file),
        "--output", str(output_dir.parent),
        "--study-name", "test_e2e_3trials",
        "--trials", "3",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)

    # 4. Verify outputs exist
    assert result.returncode == 0
    assert (output_dir / "history.json").exists()
    assert (output_dir / "best_trial.json").exists()
    assert (output_dir / "generated_extractors").exists()

    # 5. Verify results are valid
    with open(output_dir / "history.json") as f:
        history = json.load(f)
    assert len(history) == 3  # 3 trials completed
    assert all("objective" in trial for trial in history)
    assert all("design_variables" in trial for trial in history)
```
Known Issue to Address:
- LLMWorkflowAnalyzer Claude Code integration returns empty workflow
- Options:
- Use Anthropic API key for testing (preferred for now)
- Implement Claude Code integration in Phase 2.7 first
- Mock the LLM response for testing purposes
Recommendation: Use API key for E2E test, document Claude Code gap separately
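The mocking option above can be sketched as a drop-in stand-in for `LLMWorkflowAnalyzer`. This is a minimal sketch for offline testing only: the workflow keys and the `analyze_request` return shape are assumptions for illustration, not the analyzer's confirmed schema.

```python
# Canned workflow for offline tests; the keys below are assumptions
# based on this plan, not a confirmed LLMWorkflowAnalyzer schema.
CANNED_WORKFLOW = {
    "objectives": [{"name": "mass", "direction": "minimize"}],
    "constraints": [{"name": "displacement", "limit_mm": 5.0}],
    "design_variables": {
        "beam_half_core_thickness": (20.0, 30.0),
        "beam_face_thickness": (18.0, 25.0),
    },
    "post_processing_hooks": [],
}

class MockWorkflowAnalyzer:
    """Stand-in analyzer: ignores the request text, returns the canned plan."""

    def analyze_request(self, request: str) -> dict:
        # Shallow copy so tests cannot mutate the shared canned workflow
        return dict(CANNED_WORKFLOW)
```

Tests that only exercise the downstream pipeline (extractor generation, optimization loop, result saving) can then run without any API key.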
Week 2: Robustness & Safety (16 hours) 🎯
Objective: Make LLM mode production-ready with validation, fallbacks, and safety
Task 2.1: Code Validation System (6 hours)
Deliverable: optimization_engine/code_validator.py
Features:

1. Syntax Validation:
   - Run `ast.parse()` on generated Python code
   - Catch syntax errors before execution
   - Return detailed error messages with line numbers

2. Security Validation:
   - Check for dangerous imports and calls (`os.system`, `subprocess`, `eval`, etc.)
   - Whitelist-based approach (only allow: numpy, pandas, pathlib, json, etc.)
   - Reject code with file system modifications outside the working directory

3. Schema Validation:
   - Verify the extractor returns `Dict[str, float]`
   - Verify the hook has the correct signature
   - Validate the optimization config structure
Example:

```python
import ast
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationResult:
    valid: bool
    error: Optional[str] = None

class CodeValidator:
    """Validates generated code before execution."""

    # Calls blocked by name; os.system and subprocess are caught by the
    # import whitelist below, since a call check on node.func.id cannot
    # match dotted names.
    DANGEROUS_CALLS = [
        'eval', 'exec', 'compile', '__import__',
        'open',  # open needs special handling
    ]
    ALLOWED_IMPORTS = [
        'numpy', 'pandas', 'pathlib', 'json', 'math',
        'pyNastran', 'NXOpen', 'typing',
    ]

    def validate_syntax(self, code: str) -> ValidationResult:
        """Check if code has valid Python syntax."""
        try:
            ast.parse(code)
            return ValidationResult(valid=True)
        except SyntaxError as e:
            return ValidationResult(
                valid=False,
                error=f"Syntax error at line {e.lineno}: {e.msg}"
            )

    def validate_security(self, code: str) -> ValidationResult:
        """Check for dangerous operations."""
        tree = ast.parse(code)
        for node in ast.walk(tree):
            # Check imports against the whitelist
            if isinstance(node, ast.Import):
                for alias in node.names:
                    if alias.name not in self.ALLOWED_IMPORTS:
                        return ValidationResult(
                            valid=False,
                            error=f"Disallowed import: {alias.name}"
                        )
            # Check function calls against the blocklist
            if isinstance(node, ast.Call) and hasattr(node.func, 'id'):
                if node.func.id in self.DANGEROUS_CALLS:
                    return ValidationResult(
                        valid=False,
                        error=f"Dangerous function call: {node.func.id}"
                    )
        return ValidationResult(valid=True)

    def validate_extractor_schema(self, code: str) -> ValidationResult:
        """Verify every extractor declares a return type annotation."""
        tree = ast.parse(code)
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                if node.name.startswith('extract_') and node.returns is None:
                    return ValidationResult(
                        valid=False,
                        error=f"Extractor {node.name} missing return type annotation"
                    )
        return ValidationResult(valid=True)
```
Task 2.2: Fallback Mechanisms (4 hours)
Deliverable: Enhanced error handling in run_optimization.py and llm_optimization_runner.py
Scenarios to Handle:

1. LLM analysis fails:

```python
try:
    llm_workflow = analyzer.analyze_request(request)
except Exception as e:
    logger.error(f"LLM analysis failed: {e}")
    logger.info("Falling back to manual mode...")
    logger.info("Please provide a JSON config file or try:")
    logger.info("  - Simplifying your request")
    logger.info("  - Checking API key is valid")
    logger.info("  - Using Claude Code mode (no API key)")
    sys.exit(1)
```

2. Extractor generation fails:

```python
try:
    extractors = extractor_orchestrator.generate_all()
except Exception as e:
    logger.error(f"Extractor generation failed: {e}")
    logger.info("Attempting to use fallback extractors...")
    # Use pre-built generic extractors
    extractors = {
        'displacement': GenericDisplacementExtractor(),
        'stress': GenericStressExtractor(),
        'mass': GenericMassExtractor(),
    }
    logger.info("Using generic extractors - results may be less specific")
```

3. Hook generation fails:

```python
try:
    hook_manager.generate_hooks(llm_workflow['post_processing_hooks'])
except Exception as e:
    logger.warning(f"Hook generation failed: {e}")
    logger.info("Continuing without custom hooks...")
    # Optimization continues without hooks (reduced functionality, not fatal)
```

4. A single trial fails:

```python
def _objective(self, trial):
    try:
        # ... run trial
        return objective_value
    except Exception as e:
        logger.error(f"Trial {trial.number} failed: {e}")
        # Return worst-case value instead of crashing
        return float('inf') if self.direction == 'minimize' else float('-inf')
```
Task 2.3: Comprehensive Test Suite (4 hours)
Deliverable: Extended test coverage in tests/
New Tests:

1. `tests/test_code_validator.py`:
   - Test syntax validation catches errors
   - Test security validation blocks dangerous code
   - Test schema validation enforces correct signatures
   - Test allowed imports pass validation

2. `tests/test_fallback_mechanisms.py`:
   - Test LLM failure falls back gracefully
   - Test extractor generation failure uses generic extractors
   - Test hook generation failure continues optimization
   - Test single trial failure doesn't crash optimization

3. `tests/test_llm_mode_error_cases.py`:
   - Test empty natural language request
   - Test request with missing design variables
   - Test request with conflicting objectives
   - Test request with invalid parameter ranges

4. `tests/test_integration_robustness.py`:
   - Test optimization with intermittent FEM failures
   - Test optimization with corrupted OP2 files
   - Test optimization with missing NX expressions
   - Test optimization with invalid design variable values
Task 2.4: Audit Trail System (2 hours)
Deliverable: optimization_engine/audit_trail.py
Features:
- Log all LLM-generated code to timestamped files
- Save validation results
- Track which extractors/hooks were used
- Record any fallbacks or errors
Example:

```python
import json
from datetime import datetime
from pathlib import Path

class AuditTrail:
    """Records all LLM-generated code and validation results."""

    def __init__(self, output_dir: Path):
        self.output_dir = output_dir / "audit_trail"
        self.output_dir.mkdir(exist_ok=True)
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        self.log_file = self.output_dir / f"audit_{timestamp}.json"
        self.entries = []

    def log_generated_code(self, code_type: str, code: str,
                           validation_result: ValidationResult):
        """Log generated code and its validation result."""
        self.entries.append({
            "timestamp": datetime.now().isoformat(),
            "type": code_type,
            "code": code,
            "validation": {
                "valid": validation_result.valid,
                "error": validation_result.error,
            },
        })
        self._flush()

    def log_fallback(self, component: str, reason: str, fallback_action: str):
        """Log when a fallback mechanism is used."""
        self.entries.append({
            "timestamp": datetime.now().isoformat(),
            "type": "fallback",
            "component": component,
            "reason": reason,
            "fallback_action": fallback_action,
        })
        self._flush()

    def _flush(self):
        # Save to file immediately so a crash never loses the trail
        with open(self.log_file, 'w') as f:
            json.dump(self.entries, f, indent=2)
```
Integration:

```python
# In LLMOptimizationRunner.__init__
self.audit_trail = AuditTrail(output_dir)

# When generating extractors
for feature in engineering_features:
    code = generator.generate_extractor(feature)
    validation = validator.validate(code)
    self.audit_trail.log_generated_code("extractor", code, validation)
    if not validation.valid:
        self.audit_trail.log_fallback(
            component="extractor",
            reason=validation.error,
            fallback_action="using generic extractor",
        )
```
Week 3: Learning System (20 hours)
Objective: Build intelligence that learns from successful generations
Task 3.1: Template Library (8 hours)
Deliverable: optimization_engine/template_library/
Structure:

```
template_library/
├── extractors/
│   ├── displacement_templates.py
│   ├── stress_templates.py
│   ├── mass_templates.py
│   └── thermal_templates.py
├── calculations/
│   ├── safety_factor_templates.py
│   ├── objective_templates.py
│   └── constraint_templates.py
├── hooks/
│   ├── plotting_templates.py
│   ├── logging_templates.py
│   └── reporting_templates.py
└── registry.py
```
Features:
- Pre-validated code templates for common operations
- Success rate tracking for each template
- Automatic template selection based on context
- Template versioning and deprecation
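The success-rate tracking and automatic selection in `registry.py` could take the following minimal shape; the `Template` and `TemplateRegistry` names and fields are assumptions for illustration, not the final design.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Template:
    """A pre-validated code template with usage statistics (hypothetical)."""
    name: str
    code: str
    successes: int = 0
    failures: int = 0

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.0

class TemplateRegistry:
    """Registers templates per category and selects the most reliable one."""

    def __init__(self) -> None:
        self._templates: Dict[str, List[Template]] = defaultdict(list)

    def register(self, category: str, template: Template) -> None:
        self._templates[category].append(template)

    def record(self, category: str, name: str, succeeded: bool) -> None:
        """Update success statistics after a template was used in a run."""
        for t in self._templates[category]:
            if t.name == name:
                if succeeded:
                    t.successes += 1
                else:
                    t.failures += 1

    def best(self, category: str) -> Optional[Template]:
        """Return the template with the highest success rate, if any."""
        return max(self._templates[category],
                   key=lambda t: t.success_rate, default=None)
```

Versioning and deprecation could be layered on by adding a version field to `Template` and filtering deprecated entries out of `best`.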
Task 3.2: Knowledge Base Integration (8 hours)
Deliverable: Enhanced ResearchAgent with optimization-specific knowledge
Knowledge Sources:
- pyNastran documentation (already integrated in Phase 3)
- NXOpen API documentation (NXOpen intellisense - already set up)
- Optimization best practices
- Common FEA pitfalls and solutions
Features:
- Query knowledge base during code generation
- Suggest best practices for extractor design
- Warn about common mistakes (unit mismatches, etc.)
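The "warn about common mistakes" feature can be illustrated with a toy keyword lookup; a real implementation would query the ResearchAgent knowledge base rather than this hardcoded dict, and the entries below are examples only.

```python
# Hardcoded stand-in for a knowledge-base query; entries are examples only.
FEA_PITFALLS = {
    "stress": "Confirm result units (MPa vs Pa) before applying limits.",
    "displacement": "Check the output coordinate system before comparing to bounds.",
}

def lookup_warnings(feature_names):
    """Return pitfall notes for the requested engineering features."""
    return [FEA_PITFALLS[name] for name in feature_names if name in FEA_PITFALLS]
```

During code generation, the returned notes would be injected into the LLM prompt alongside the API documentation snippets.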
Task 3.3: Success Metrics & Learning (4 hours)
Deliverable: optimization_engine/learning_system.py
Features:
- Track which LLM-generated code succeeds vs fails
- Store successful patterns to knowledge base
- Suggest improvements based on past failures
- Auto-tune LLM prompts based on success rate
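One possible shape for the success tracking in `learning_system.py`, sketched under the assumption that each piece of generated code can be tagged with a pattern name; the class and method names are illustrative.

```python
from collections import Counter

class LearningSystem:
    """Tracks generation outcomes per pattern (illustrative sketch)."""

    def __init__(self) -> None:
        self._outcomes: Counter = Counter()

    def record(self, pattern: str, succeeded: bool) -> None:
        """Record one success or failure for a generation pattern."""
        self._outcomes[(pattern, succeeded)] += 1

    def success_rate(self, pattern: str) -> float:
        ok = self._outcomes[(pattern, True)]
        bad = self._outcomes[(pattern, False)]
        return ok / (ok + bad) if (ok + bad) else 0.0

    def worst_patterns(self, threshold: float = 0.5) -> list:
        """Patterns below the threshold: candidates for prompt tuning
        or template replacement."""
        patterns = {p for (p, _) in self._outcomes}
        return sorted(p for p in patterns if self.success_rate(p) < threshold)
```

Successful patterns would be promoted into the Week 3 template library, while `worst_patterns` would drive the prompt auto-tuning.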
Week 4: Documentation & Polish (12 hours)
Task 4.1: User Guide (4 hours)
Deliverable: docs/LLM_MODE_USER_GUIDE.md
Contents:
- Getting started with LLM mode
- Natural language request formatting tips
- Common patterns and examples
- Troubleshooting guide
- FAQ
Task 4.2: Architecture Documentation (4 hours)
Deliverable: docs/ARCHITECTURE.md
Contents:
- System architecture diagram
- Component interaction flows
- LLM integration points
- Extractor/hook generation pipeline
- Data flow diagrams
Task 4.3: Demo Video & Presentation (4 hours)
Deliverables:
- docs/demo_video.mp4
- docs/PHASE_3_2_PRESENTATION.pdf
Contents:
- 5-minute demo video showing LLM mode in action
- Presentation slides explaining the integration
- Before/after comparison (manual JSON vs LLM mode)
Success Criteria for Phase 3.2
At the end of 4 weeks, we should have:
- Week 1: LLM mode wired to production (Task 1.2 COMPLETE)
- Week 1: End-to-end test passing (Task 1.4)
- Week 2: Code validation preventing unsafe executions
- Week 2: Fallback mechanisms for all failure modes
- Week 2: Test coverage > 80%
- Week 2: Audit trail for all generated code
- Week 3: Template library with 20+ validated templates
- Week 3: Knowledge base integration working
- Week 3: Learning system tracking success metrics
- Week 4: Complete user documentation
- Week 4: Architecture documentation
- Week 4: Demo video completed
Priority Order
Immediate (This Week):
- Task 1.4: End-to-end integration test (2-4 hours)
- Address LLMWorkflowAnalyzer Claude Code gap (or use API key)
Week 2 Priorities:
- Code validation system (CRITICAL for safety)
- Fallback mechanisms (CRITICAL for robustness)
- Comprehensive test suite
- Audit trail system
Week 3 Priorities:
- Template library (HIGH value - improves reliability)
- Knowledge base integration
- Learning system
Week 4 Priorities:
- User guide (CRITICAL for adoption)
- Architecture documentation
- Demo video
Known Gaps & Risks
Gap 1: LLMWorkflowAnalyzer Claude Code Integration
Status: Empty workflow returned when use_claude_code=True
Impact: HIGH - LLM mode doesn't work without API key
Options:
- Implement Claude Code integration in Phase 2.7
- Use API key for now (temporary solution)
- Mock LLM responses for testing
Recommendation: Use API key for testing, implement Claude Code integration as Phase 2.7 task
Gap 2: Manual Mode Not Yet Integrated
Status: --config flag not fully implemented
Impact: MEDIUM - Users must use study-specific scripts
Timeline: Week 2-3 (lower priority than robustness)
Risk 1: LLM-Generated Code Failures
Mitigation: Code validation system (Week 2, Task 2.1)
Severity: HIGH if not addressed
Status: Planned for Week 2
Risk 2: FEM Solver Failures
Mitigation: Fallback mechanisms (Week 2, Task 2.2)
Severity: MEDIUM
Status: Planned for Week 2
Recommendations
1. Complete Task 1.4 this week: Verify the E2E workflow works before moving to Week 2
2. Use an API key for testing: Don't block on Claude Code integration - it's a Phase 2.7 component issue
3. Prioritize safety over features: Week 2 validation is CRITICAL before any production use
4. Build the template library early: Week 3 templates will significantly improve reliability
5. Document as you go: Don't leave all documentation to Week 4
Conclusion
Phase 3.2 Week 1 Status: ✅ COMPLETE
Task 1.2 Achievement: Natural language optimization is now wired to production infrastructure with comprehensive testing and validation.
Next Immediate Step: Complete Task 1.4 (E2E integration test) to verify the complete workflow before moving to Week 2 robustness work.
Overall Progress: 25% of Phase 3.2 complete (1 week / 4 weeks)
Timeline on Track: YES - Week 1 completed on schedule
Author: Claude Code
Last Updated: 2025-11-17
Next Review: After Task 1.4 completion