Files
Atomizer/docs/archive/phase_documents/PHASE_3_2_NEXT_STEPS.md
Anto01 ea437d360e docs: Major documentation overhaul - restructure folders, update tagline, add Getting Started guide
- Restructure docs/ folder (remove numeric prefixes):
  - 04_USER_GUIDES -> guides/
  - 05_API_REFERENCE -> api/
  - 06_PHYSICS -> physics/
  - 07_DEVELOPMENT -> development/
  - 08_ARCHIVE -> archive/
  - 09_DIAGRAMS -> diagrams/

- Replace tagline 'Talk, don't click' with 'LLM-driven optimization framework' in 9 files

- Create comprehensive docs/GETTING_STARTED.md:
  - Prerequisites and quick setup
  - Project structure overview
  - First study tutorial (Claude or manual)
  - Dashboard usage guide
  - Neural acceleration introduction

- Rewrite docs/00_INDEX.md with correct paths and modern structure

- Archive obsolete files:
  - 01_PROTOCOLS.md -> archive/historical/01_PROTOCOLS_legacy.md
  - 03_GETTING_STARTED.md -> archive/historical/
  - ATOMIZER_PODCAST_BRIEFING.md -> archive/marketing/

- Update timestamps to 2026-01-20 across all key files

- Update .gitignore to exclude docs/generated/

- Version bump: ATOMIZER_CONTEXT v1.8 -> v2.0
2026-01-20 10:03:45 -05:00

19 KiB

Phase 3.2 Integration - Next Steps

Status: Week 1 Complete (Task 1.2 Verified) Date: 2025-11-17 Author: Antoine Letarte

Week 1 Summary - COMPLETE

Task 1.2: Wire LLMOptimizationRunner to Production

Deliverables Completed:

  • Interface contracts verified (model_updater, simulation_runner)
  • LLM workflow validation in run_optimization.py
  • Error handling for initialization failures
  • Comprehensive integration test suite (5/5 tests passing)
  • Example walkthrough (examples/llm_mode_simple_example.py)
  • Documentation updated (README, DEVELOPMENT, DEVELOPMENT_GUIDANCE)

Commit: 7767fc6 - feat: Phase 3.2 Task 1.2 - Wire LLMOptimizationRunner to production

Key Achievement: Natural language optimization is now wired to production infrastructure. Users can describe optimization problems in plain English, and the system will auto-generate extractors, hooks, and run optimization.


Immediate Next Steps (Week 1 Completion)

Task 1.3: Create Minimal Working Example (Already Done)

Status: COMPLETE - Created in Task 1.2 commit

Deliverable: examples/llm_mode_simple_example.py

What it demonstrates:

request = """
Minimize displacement and mass while keeping stress below 200 MPa.

Design variables:
- beam_half_core_thickness: 15 to 30 mm
- beam_face_thickness: 15 to 30 mm

Run 5 trials using TPE sampler.
"""

Usage:

python examples/llm_mode_simple_example.py

Task 1.4: End-to-End Integration Test COMPLETE

Priority: HIGH DONE Effort: 2 hours (completed) Objective: Verify complete LLM mode workflow works with real FEM solver

Deliverable: tests/test_phase_3_2_e2e.py

Test Coverage (All Implemented):

  1. Natural language request parsing
  2. LLM workflow generation (with API key or Claude Code)
  3. Extractor auto-generation
  4. Hook auto-generation
  5. Model update (NX expressions)
  6. Simulation run (actual FEM solve)
  7. Result extraction
  8. Optimization loop (3 trials minimum)
  9. Results saved to output directory
  10. Graceful failure without API key

Acceptance Criteria: ALL MET

  • Test runs without errors
  • 3 trials complete successfully (verified with API key mode)
  • Best design found and saved
  • Generated extractors work correctly
  • Generated hooks execute without errors
  • Optimization history written to JSON
  • Graceful skip when no API key (provides clear instructions)

Implementation Plan:

def test_e2e_llm_mode():
    """End-to-end test of LLM mode with real FEM solver."""

    # 1. Natural language request
    request = """
    Minimize mass while keeping displacement below 5mm.
    Design variables: beam_half_core_thickness (20-30mm),
                      beam_face_thickness (18-25mm)
    Run 3 trials with TPE sampler.
    """

    # 2. Setup test environment
    study_dir = Path("studies/simple_beam_optimization")
    prt_file = study_dir / "1_setup/model/Beam.prt"
    sim_file = study_dir / "1_setup/model/Beam_sim1.sim"
    output_dir = study_dir / "2_substudies/test_e2e_3trials"

    # 3. Run via subprocess (simulates real usage)
    cmd = [
        "c:/Users/antoi/anaconda3/envs/test_env/python.exe",
        "optimization_engine/run_optimization.py",
        "--llm", request,
        "--prt", str(prt_file),
        "--sim", str(sim_file),
        "--output", str(output_dir.parent),
        "--study-name", "test_e2e_3trials",
        "--trials", "3"
    ]

    result = subprocess.run(cmd, capture_output=True, text=True)

    # 4. Verify outputs
    assert result.returncode == 0
    assert (output_dir / "history.json").exists()
    assert (output_dir / "best_trial.json").exists()
    assert (output_dir / "generated_extractors").exists()

    # 5. Verify results are valid
    with open(output_dir / "history.json") as f:
        history = json.load(f)

    assert len(history) == 3  # 3 trials completed
    assert all("objective" in trial for trial in history)
    assert all("design_variables" in trial for trial in history)

Known Issue to Address:

  • LLMWorkflowAnalyzer Claude Code integration returns empty workflow
  • Options:
    1. Use Anthropic API key for testing (preferred for now)
    2. Implement Claude Code integration in Phase 2.7 first
    3. Mock the LLM response for testing purposes

Recommendation: Use API key for E2E test, document Claude Code gap separately


Week 2: Robustness & Safety (16 hours) 🎯

Objective: Make LLM mode production-ready with validation, fallbacks, and safety

Task 2.1: Code Validation System (6 hours)

Deliverable: optimization_engine/code_validator.py

Features:

  1. Syntax Validation:

    • Run ast.parse() on generated Python code
    • Catch syntax errors before execution
    • Return detailed error messages with line numbers
  2. Security Validation:

    • Check for dangerous imports (os.system, subprocess, eval, etc.)
    • Whitelist-based approach (only allow: numpy, pandas, pathlib, json, etc.)
    • Reject code with file system modifications outside working directory
  3. Schema Validation:

    • Verify extractor returns Dict[str, float]
    • Verify hook has correct signature
    • Validate optimization config structure

Example:

class CodeValidator:
    """Validates generated code before execution."""

    DANGEROUS_IMPORTS = [
        'os.system', 'subprocess', 'eval', 'exec',
        'compile', '__import__', 'open'  # open needs special handling
    ]

    ALLOWED_IMPORTS = [
        'numpy', 'pandas', 'pathlib', 'json', 'math',
        'pyNastran', 'NXOpen', 'typing'
    ]

    def validate_syntax(self, code: str) -> ValidationResult:
        """Check if code has valid Python syntax."""
        try:
            ast.parse(code)
            return ValidationResult(valid=True)
        except SyntaxError as e:
            return ValidationResult(
                valid=False,
                error=f"Syntax error at line {e.lineno}: {e.msg}"
            )

    def validate_security(self, code: str) -> ValidationResult:
        """Check for dangerous operations."""
        tree = ast.parse(code)

        for node in ast.walk(tree):
            # Check imports
            if isinstance(node, ast.Import):
                for alias in node.names:
                    if alias.name not in self.ALLOWED_IMPORTS:
                        return ValidationResult(
                            valid=False,
                            error=f"Disallowed import: {alias.name}"
                        )

            # Check function calls
            if isinstance(node, ast.Call):
                if hasattr(node.func, 'id'):
                    if node.func.id in self.DANGEROUS_IMPORTS:
                        return ValidationResult(
                            valid=False,
                            error=f"Dangerous function call: {node.func.id}"
                        )

        return ValidationResult(valid=True)

    def validate_extractor_schema(self, code: str) -> ValidationResult:
        """Verify extractor returns Dict[str, float]."""
        # Check for return type annotation
        tree = ast.parse(code)

        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                if node.name.startswith('extract_'):
                    # Verify has return annotation
                    if node.returns is None:
                        return ValidationResult(
                            valid=False,
                            error=f"Extractor {node.name} missing return type annotation"
                        )

        return ValidationResult(valid=True)

Task 2.2: Fallback Mechanisms (4 hours)

Deliverable: Enhanced error handling in run_optimization.py and llm_optimization_runner.py

Scenarios to Handle:

  1. LLM Analysis Fails:

    try:
        llm_workflow = analyzer.analyze_request(request)
    except Exception as e:
        logger.error(f"LLM analysis failed: {e}")
        logger.info("Falling back to manual mode...")
        logger.info("Please provide a JSON config file or try:")
        logger.info("  - Simplifying your request")
        logger.info("  - Checking API key is valid")
        logger.info("  - Using Claude Code mode (no API key)")
        sys.exit(1)
    
  2. Extractor Generation Fails:

    try:
        extractors = extractor_orchestrator.generate_all()
    except Exception as e:
        logger.error(f"Extractor generation failed: {e}")
        logger.info("Attempting to use fallback extractors...")
    
        # Use pre-built generic extractors
        extractors = {
            'displacement': GenericDisplacementExtractor(),
            'stress': GenericStressExtractor(),
            'mass': GenericMassExtractor()
        }
        logger.info("Using generic extractors - results may be less specific")
    
  3. Hook Generation Fails:

    try:
        hook_manager.generate_hooks(llm_workflow['post_processing_hooks'])
    except Exception as e:
        logger.warning(f"Hook generation failed: {e}")
        logger.info("Continuing without custom hooks...")
        # Optimization continues without hooks (reduced functionality but not fatal)
    
  4. Single Trial Failure:

    def _objective(self, trial):
        try:
            # ... run trial
            return objective_value
        except Exception as e:
            logger.error(f"Trial {trial.number} failed: {e}")
            # Return worst-case value instead of crashing
            return float('inf') if self.direction == 'minimize' else float('-inf')
    

Task 2.3: Comprehensive Test Suite (4 hours)

Deliverable: Extended test coverage in tests/

New Tests:

  1. tests/test_code_validator.py:

    • Test syntax validation catches errors
    • Test security validation blocks dangerous code
    • Test schema validation enforces correct signatures
    • Test allowed imports pass validation
  2. tests/test_fallback_mechanisms.py:

    • Test LLM failure falls back gracefully
    • Test extractor generation failure uses generic extractors
    • Test hook generation failure continues optimization
    • Test single trial failure doesn't crash optimization
  3. tests/test_llm_mode_error_cases.py:

    • Test empty natural language request
    • Test request with missing design variables
    • Test request with conflicting objectives
    • Test request with invalid parameter ranges
  4. tests/test_integration_robustness.py:

    • Test optimization with intermittent FEM failures
    • Test optimization with corrupted OP2 files
    • Test optimization with missing NX expressions
    • Test optimization with invalid design variable values

Task 2.4: Audit Trail System (2 hours)

Deliverable: optimization_engine/audit_trail.py

Features:

  • Log all LLM-generated code to timestamped files
  • Save validation results
  • Track which extractors/hooks were used
  • Record any fallbacks or errors

Example:

class AuditTrail:
    """Records all LLM-generated code and validation results."""

    def __init__(self, output_dir: Path):
        self.output_dir = output_dir / "audit_trail"
        self.output_dir.mkdir(exist_ok=True)

        self.log_file = self.output_dir / f"audit_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
        self.entries = []

    def log_generated_code(self, code_type: str, code: str, validation_result: ValidationResult):
        """Log generated code and validation result."""
        entry = {
            "timestamp": datetime.now().isoformat(),
            "type": code_type,
            "code": code,
            "validation": {
                "valid": validation_result.valid,
                "error": validation_result.error
            }
        }
        self.entries.append(entry)

        # Save to file immediately
        with open(self.log_file, 'w') as f:
            json.dump(self.entries, f, indent=2)

    def log_fallback(self, component: str, reason: str, fallback_action: str):
        """Log when a fallback mechanism is used."""
        entry = {
            "timestamp": datetime.now().isoformat(),
            "type": "fallback",
            "component": component,
            "reason": reason,
            "fallback_action": fallback_action
        }
        self.entries.append(entry)

        with open(self.log_file, 'w') as f:
            json.dump(self.entries, f, indent=2)

Integration:

# In LLMOptimizationRunner.__init__
self.audit_trail = AuditTrail(output_dir)

# When generating extractors
for feature in engineering_features:
    code = generator.generate_extractor(feature)
    validation = validator.validate(code)
    self.audit_trail.log_generated_code("extractor", code, validation)

    if not validation.valid:
        self.audit_trail.log_fallback(
            component="extractor",
            reason=validation.error,
            fallback_action="using generic extractor"
        )

Week 3: Learning System (20 hours)

Objective: Build intelligence that learns from successful generations

Task 3.1: Template Library (8 hours)

Deliverable: optimization_engine/template_library/

Structure:

template_library/
├── extractors/
│   ├── displacement_templates.py
│   ├── stress_templates.py
│   ├── mass_templates.py
│   └── thermal_templates.py
├── calculations/
│   ├── safety_factor_templates.py
│   ├── objective_templates.py
│   └── constraint_templates.py
├── hooks/
│   ├── plotting_templates.py
│   ├── logging_templates.py
│   └── reporting_templates.py
└── registry.py

Features:

  • Pre-validated code templates for common operations
  • Success rate tracking for each template
  • Automatic template selection based on context
  • Template versioning and deprecation

Task 3.2: Knowledge Base Integration (8 hours)

Deliverable: Enhanced ResearchAgent with optimization-specific knowledge

Knowledge Sources:

  1. pyNastran documentation (already integrated in Phase 3)
  2. NXOpen API documentation (NXOpen intellisense - already set up)
  3. Optimization best practices
  4. Common FEA pitfalls and solutions

Features:

  • Query knowledge base during code generation
  • Suggest best practices for extractor design
  • Warn about common mistakes (unit mismatches, etc.)

Task 3.3: Success Metrics & Learning (4 hours)

Deliverable: optimization_engine/learning_system.py

Features:

  • Track which LLM-generated code succeeds vs fails
  • Store successful patterns to knowledge base
  • Suggest improvements based on past failures
  • Auto-tune LLM prompts based on success rate

Week 4: Documentation & Polish (12 hours)

Task 4.1: User Guide (4 hours)

Deliverable: docs/LLM_MODE_USER_GUIDE.md

Contents:

  • Getting started with LLM mode
  • Natural language request formatting tips
  • Common patterns and examples
  • Troubleshooting guide
  • FAQ

Task 4.2: Architecture Documentation (4 hours)

Deliverable: docs/ARCHITECTURE.md

Contents:

  • System architecture diagram
  • Component interaction flows
  • LLM integration points
  • Extractor/hook generation pipeline
  • Data flow diagrams

Task 4.3: Demo Video & Presentation (4 hours)

Deliverable:

  • docs/demo_video.mp4
  • docs/PHASE_3_2_PRESENTATION.pdf

Contents:

  • 5-minute demo video showing LLM mode in action
  • Presentation slides explaining the integration
  • Before/after comparison (manual JSON vs LLM mode)

Success Criteria for Phase 3.2

At the end of 4 weeks, we should have:

  • Week 1: LLM mode wired to production (Task 1.2 COMPLETE)
  • Week 1: End-to-end test passing (Task 1.4)
  • Week 2: Code validation preventing unsafe executions
  • Week 2: Fallback mechanisms for all failure modes
  • Week 2: Test coverage > 80%
  • Week 2: Audit trail for all generated code
  • Week 3: Template library with 20+ validated templates
  • Week 3: Knowledge base integration working
  • Week 3: Learning system tracking success metrics
  • Week 4: Complete user documentation
  • Week 4: Architecture documentation
  • Week 4: Demo video completed

Priority Order

Immediate (This Week):

  1. Task 1.4: End-to-end integration test (2-4 hours)
  2. Address LLMWorkflowAnalyzer Claude Code gap (or use API key)

Week 2 Priorities:

  1. Code validation system (CRITICAL for safety)
  2. Fallback mechanisms (CRITICAL for robustness)
  3. Comprehensive test suite
  4. Audit trail system

Week 3 Priorities:

  1. Template library (HIGH value - improves reliability)
  2. Knowledge base integration
  3. Learning system

Week 4 Priorities:

  1. User guide (CRITICAL for adoption)
  2. Architecture documentation
  3. Demo video

Known Gaps & Risks

Gap 1: LLMWorkflowAnalyzer Claude Code Integration

Status: Empty workflow returned when use_claude_code=True Impact: HIGH - LLM mode doesn't work without API key Options:

  1. Implement Claude Code integration in Phase 2.7
  2. Use API key for now (temporary solution)
  3. Mock LLM responses for testing

Recommendation: Use API key for testing, implement Claude Code integration as Phase 2.7 task


Gap 2: Manual Mode Not Yet Integrated

Status: --config flag not fully implemented Impact: MEDIUM - Users must use study-specific scripts Timeline: Week 2-3 (lower priority than robustness)


Risk 1: LLM-Generated Code Failures

Mitigation: Code validation system (Week 2, Task 2.1) Severity: HIGH if not addressed Status: Planned for Week 2


Risk 2: FEM Solver Failures

Mitigation: Fallback mechanisms (Week 2, Task 2.2) Severity: MEDIUM Status: Planned for Week 2


Recommendations

  1. Complete Task 1.4 this week: Verify E2E workflow works before moving to Week 2

  2. Use API key for testing: Don't block on Claude Code integration - it's a Phase 2.7 component issue

  3. Prioritize safety over features: Week 2 validation is CRITICAL before any production use

  4. Build template library early: Week 3 templates will significantly improve reliability

  5. Document as you go: Don't leave all documentation to Week 4


Conclusion

Phase 3.2 Week 1 Status: COMPLETE

Task 1.2 Achievement: Natural language optimization is now wired to production infrastructure with comprehensive testing and validation.

Next Immediate Step: Complete Task 1.4 (E2E integration test) to verify the complete workflow before moving to Week 2 robustness work.

Overall Progress: 25% of Phase 3.2 complete (1 week / 4 weeks)

Timeline on Track: YES - Week 1 completed on schedule


Author: Claude Code Last Updated: 2025-11-17 Next Review: After Task 1.4 completion