# Feature Registry Architecture

> Comprehensive guide to Atomizer's LLM-instructed feature database system

**Last Updated**: 2025-01-16
**Status**: Phase 2 - Design Document

---

## Table of Contents

1. [Vision and Goals](#vision-and-goals)
2. [Feature Categorization System](#feature-categorization-system)
3. [Feature Registry Structure](#feature-registry-structure)
4. [LLM Instruction Format](#llm-instruction-format)
5. [Feature Documentation Strategy](#feature-documentation-strategy)
6. [Dynamic Tool Building](#dynamic-tool-building)
7. [Examples](#examples)
8. [Implementation Plan](#implementation-plan)

---

## Vision and Goals

### Core Philosophy

Atomizer's feature registry is not just a catalog: it is an **LLM instruction system** that enables:

1. **Self-Documentation**: Features describe themselves to the LLM
2. **Intelligent Composition**: LLM can combine features into workflows
3. **Autonomous Proposals**: LLM suggests new features based on user needs
4. **Structured Customization**: Users customize the tool through natural language
5. **Continuous Evolution**: Feature database grows as users add capabilities

### Key Principles

- **Feature Types Are First-Class**: Engineering, software, UI, and analysis features are equally important
- **Location-Aware**: Features know where their code lives and how to use it
- **Metadata-Rich**: Each feature has enough context for the LLM to understand and use it
- **Composable**: Features can be combined into higher-level workflows
- **Extensible**: New feature types can be added without breaking the system

---

## Feature Categorization System

### Primary Feature Dimensions

Features are organized along **three dimensions**:

#### Dimension 1: Domain (WHAT it does)

- **Engineering**: Physics-based operations (stress, thermal, modal, etc.)
- **Software**: Core algorithms and infrastructure (optimization, hooks, path resolution)
- **UI**: User-facing components (dashboard, reports, visualization)
- **Analysis**: Post-processing and decision support (sensitivity, Pareto, surrogate quality)

#### Dimension 2: Lifecycle Stage (WHEN it runs)

- **Pre-Mesh**: Before meshing (geometry operations)
- **Pre-Solve**: Before FEA solve (parameter updates, logging)
- **Solve**: During FEA execution (solver control)
- **Post-Solve**: After solve, before extraction (file validation)
- **Post-Extraction**: After result extraction (logging, analysis)
- **Post-Optimization**: After optimization completes (reporting, visualization)

#### Dimension 3: Abstraction Level (HOW it's used)

- **Primitive**: Low-level functions (extract_stress, update_expression)
- **Composite**: Mid-level workflows (RSS_metric, weighted_objective)
- **Workflow**: High-level operations (run_optimization, generate_report)

### Feature Type Classification

```
┌─────────────────────────────────────────────────────────────┐
│                      FEATURE UNIVERSE                       │
└─────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
   ENGINEERING            SOFTWARE                 UI
        │                     │                     │
    ┌───┴───┐            ┌────┴────┐          ┌─────┴─────┐
    │       │            │         │          │           │
Extractors Metrics  Optimization Hooks    Dashboard    Reports
    │       │            │         │          │           │
 Stress    RSS       Optuna   Pre-Solve    Widgets      HTML
 Thermal   SCF       TPE      Post-Solve   Controls     PDF
 Modal     FOS       Sampler  Post-Extract Charts       Markdown
```

---

## Feature Registry Structure

### JSON Schema

```json
{
  "feature_registry": {
    "version": "0.2.0",
    "last_updated": "2025-01-16",
    "categories": {
      "engineering": { ... },
      "software": { ... },
      "ui": { ... },
      "analysis": { ...
      }
    }
  }
}
```

### Feature Entry Schema

Each feature has:

```json
{
  "feature_id": "unique_identifier",
  "name": "Human-Readable Name",
  "description": "What this feature does (for LLM understanding)",
  "category": "engineering|software|ui|analysis",
  "subcategory": "extractors|metrics|optimization|hooks|...",
  "lifecycle_stage": "pre_solve|post_solve|post_extraction|...",
  "abstraction_level": "primitive|composite|workflow",
  "implementation": {
    "file_path": "relative/path/to/implementation.py",
    "function_name": "function_or_class_name",
    "entry_point": "how to invoke this feature"
  },
  "interface": {
    "inputs": [
      {
        "name": "parameter_name",
        "type": "str|int|float|dict|list",
        "required": true,
        "description": "What this parameter does",
        "units": "mm|MPa|Hz|none",
        "example": "example_value"
      }
    ],
    "outputs": [
      {
        "name": "output_name",
        "type": "float|dict|list",
        "description": "What this output represents",
        "units": "mm|MPa|Hz|none"
      }
    ]
  },
  "dependencies": {
    "features": ["feature_id_1", "feature_id_2"],
    "libraries": ["optuna", "pyNastran"],
    "nx_version": "2412"
  },
  "usage_examples": [
    {
      "description": "Example scenario",
      "code": "example_code_snippet",
      "natural_language": "How user would request this"
    }
  ],
  "composition_hints": {
    "combines_with": ["feature_id_3", "feature_id_4"],
    "typical_workflows": ["workflow_name_1"],
    "prerequisites": ["feature that must run before this"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "stable|experimental|deprecated",
    "tested": true,
    "documentation_url": "docs/features/feature_name.md"
  }
}
```

---

## LLM Instruction Format

### How LLM Uses the Registry

The feature registry serves as a **structured instruction manual** for the LLM, used in four phases:
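As a concrete sketch of this instruction-manual role, the registry can be loaded and each entry rendered into a compact, prompt-ready tool description. The file path and the rendering format below are illustrative assumptions, not part of the schema:

```python
import json
from pathlib import Path

def load_registry(path: Path) -> dict:
    """Read feature_registry.json and return the top-level mapping."""
    return json.loads(path.read_text(encoding="utf-8"))["feature_registry"]

def describe_feature(entry: dict) -> str:
    """Render one schema-style entry as a one-line tool description."""
    inputs = ", ".join(p["name"] for p in entry["interface"]["inputs"])
    return f'{entry["feature_id"]}({inputs}): {entry["description"]}'

# Hypothetical usage:
# registry = load_registry(Path("optimization_engine/feature_registry.json"))
```

Descriptions of this shape are what the phases below consume: discovery reads them, composition and execution follow the structured fields behind them.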
#### 1. Discovery Phase

```
User: "I want to minimize stress on my bracket"

LLM reads registry:
→ Finds category="engineering", subcategory="extractors"
→ Discovers "stress_extractor" feature
→ Reads: "Extracts von Mises stress from OP2 files"
→ Checks composition_hints: combines_with=["optimization_runner"]

LLM response: "I'll use the stress_extractor feature to minimize stress.
This requires an OP2 file from NX solve."
```

#### 2. Composition Phase

```
User: "Add a custom RSS metric combining stress and displacement"

LLM reads registry:
→ Finds abstraction_level="composite" examples
→ Discovers "rss_metric" template feature
→ Reads interface: inputs=[stress_value, displacement_value]
→ Checks composition_hints: combines_with=["stress_extractor", "displacement_extractor"]

LLM generates new composite feature following the pattern
```

#### 3. Proposal Phase

```
User: "What features could help me analyze fatigue life?"

LLM reads registry:
→ Searches category="engineering", subcategory="extractors"
→ Finds: stress_extractor, displacement_extractor (exist)
→ Doesn't find: fatigue_extractor (missing)
→ Reads composition_hints for similar features

LLM proposes: "I can create a fatigue_life_extractor that:
1. Extracts stress history from OP2
2. Applies rainflow counting algorithm
3. Uses S-N curve to estimate fatigue life

This would be similar to stress_extractor but with time-series analysis.
Should I implement it?"
```
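The discovery and proposal phases reduce to simple queries over registry entries: filter by category for discovery, and take a set difference over registered feature IDs to find proposal candidates. A minimal sketch, assuming entries follow the schema above (the feature IDs are illustrative):

```python
def find_features(entries: list, category: str, subcategory: str = None) -> list:
    """Filter feature entries by category and optional subcategory (discovery)."""
    return [
        e for e in entries
        if e["category"] == category
        and (subcategory is None or e["subcategory"] == subcategory)
    ]

def missing_features(entries: list, wanted_ids: set) -> set:
    """Return requested feature IDs absent from the registry (proposal candidates)."""
    return wanted_ids - {e["feature_id"] for e in entries}
```

Anything returned by `missing_features` is a gap the LLM can offer to fill, as in the fatigue example above.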
#### 4. Execution Phase

```
User: "Run the optimization"

LLM reads registry:
→ Finds abstraction_level="workflow", feature_id="run_optimization"
→ Reads implementation.entry_point
→ Checks dependencies: ["optuna", "nx_solver", "stress_extractor"]
→ Reads lifecycle_stage to understand execution order

LLM executes: python optimization_engine/runner.py
```

### Natural Language Mapping

Each feature includes `natural_language` examples showing how users might request it:

```json
"usage_examples": [
  {
    "natural_language": [
      "minimize stress",
      "reduce von Mises stress",
      "find lowest stress configuration",
      "optimize for minimum stress"
    ],
    "maps_to": {
      "feature": "stress_extractor",
      "objective": "minimize",
      "metric": "max_von_mises"
    }
  }
]
```

This enables the LLM to understand user intent and select the correct features.

---

## Feature Documentation Strategy

### Multi-Location Documentation

Features are documented in **three places**, each serving a different purpose:

#### 1. Feature Registry (feature_registry.json)

**Purpose**: LLM instruction and discovery
**Location**: `optimization_engine/feature_registry.json`
**Content**:
- Structured metadata
- Interface definitions
- Composition hints
- Usage examples

**Example**:
```json
{
  "feature_id": "stress_extractor",
  "name": "Stress Extractor",
  "description": "Extracts von Mises stress from OP2 files",
  "category": "engineering",
  "subcategory": "extractors"
}
```

#### 2. Code Implementation (*.py files)

**Purpose**: Actual functionality
**Location**: Codebase (e.g., `optimization_engine/result_extractors/extractors.py`)
**Content**:
- Python code with docstrings
- Type hints
- Implementation details

**Example**:
```python
def extract_stress_from_op2(op2_file: Path) -> dict:
    """
    Extracts von Mises stress from OP2 file.

    Args:
        op2_file: Path to OP2 file

    Returns:
        dict with max_von_mises, min_von_mises, avg_von_mises
    """
    # Implementation...
```
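Docstrings like the one above can also be read programmatically, which is one way an orchestration layer could surface a function's interface without going through the registry. A sketch using the standard `inspect` module (the extractor here is a stub standing in for the real implementation):

```python
import inspect
from pathlib import Path

def extract_stress_from_op2(op2_file: Path) -> dict:
    """Extracts von Mises stress from OP2 file."""
    ...  # stub standing in for the real extractor

def interface_summary(func) -> str:
    """Summarize a function as 'name(signature): first docstring line'."""
    doc = (inspect.getdoc(func) or "").splitlines()
    first_line = doc[0] if doc else ""
    return f"{func.__name__}{inspect.signature(func)}: {first_line}"
```

This keeps the code layer self-describing: the registry holds the curated metadata, while `inspect` recovers the live signature and docstring.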
#### 3. Feature Documentation (docs/features/*.md)

**Purpose**: Human-readable guides and tutorials
**Location**: `docs/features/`
**Content**:
- Detailed explanations
- Extended examples
- Best practices
- Troubleshooting

**Example**: `docs/features/stress_extractor.md`

```markdown
# Stress Extractor

## Overview
Extracts von Mises stress from NX Nastran OP2 files.

## When to Use
- Structural optimization where stress is the objective
- Constraint checking (yield stress limits)
- Multi-objective with stress as one objective

## Example Workflows
[detailed examples...]
```

### Documentation Flow

```
User Request
     ↓
LLM reads feature_registry.json (discovers feature)
     ↓
LLM reads code docstrings (understands interface)
     ↓
LLM reads docs/features/*.md (if complex usage needed)
     ↓
LLM composes workflow using features
```

---

## Dynamic Tool Building

### How LLM Builds New Features

The registry enables **autonomous feature creation** through templates and patterns:

#### Step 1: Pattern Recognition

```
User: "I need thermal stress extraction"

LLM:
1. Reads existing feature: stress_extractor
2. Identifies pattern: OP2 parsing → result extraction → return dict
3. Finds similar features: displacement_extractor
4. Recognizes template: engineering.extractors
```

#### Step 2: Feature Generation

```
LLM generates new feature following the pattern:

{
  "feature_id": "thermal_stress_extractor",
  "name": "Thermal Stress Extractor",
  "description": "Extracts thermal stress from OP2 files (steady-state heat transfer analysis)",
  "category": "engineering",
  "subcategory": "extractors",
  "lifecycle_stage": "post_extraction",
  "abstraction_level": "primitive",
  "implementation": {
    "file_path": "optimization_engine/result_extractors/thermal_extractors.py",
    "function_name": "extract_thermal_stress_from_op2",
    "entry_point": "from optimization_engine.result_extractors.thermal_extractors import extract_thermal_stress_from_op2"
  },
  # ...
  # rest of schema
}
```

#### Step 3: Code Generation

```python
# LLM writes implementation following the stress_extractor pattern
def extract_thermal_stress_from_op2(op2_file: Path) -> dict:
    """
    Extracts thermal stress from OP2 file.

    Args:
        op2_file: Path to OP2 file from thermal analysis

    Returns:
        dict with max_thermal_stress, temperature_at_max_stress
    """
    from pyNastran.op2.op2 import OP2

    op2 = OP2()
    op2.read_op2(op2_file)

    # Extract thermal stress (element type depends on analysis)
    thermal_stress = op2.thermal_stress_data
    return {
        'max_thermal_stress': thermal_stress.max(),
        'temperature_at_max_stress': None,  # ...
    }
```

#### Step 4: Registration

```
LLM adds to feature_registry.json
LLM creates docs/features/thermal_stress_extractor.md
LLM updates CHANGELOG.md with new feature
LLM runs tests to validate implementation
```

### Feature Composition Examples

#### Example 1: RSS Metric (Composite Feature)

```
User: "Create RSS metric combining stress and displacement"

LLM composes from primitives:
stress_extractor + displacement_extractor → rss_metric

Generated feature:
{
  "feature_id": "rss_stress_displacement",
  "abstraction_level": "composite",
  "dependencies": {
    "features": ["stress_extractor", "displacement_extractor"]
  },
  "composition_hints": {
    "composed_from": ["stress_extractor", "displacement_extractor"],
    "composition_type": "root_sum_square"
  }
}
```

#### Example 2: Complete Workflow

```
User: "Run bracket optimization minimizing stress"

LLM composes workflow from features:
1. study_manager (create study folder)
2. nx_updater (update wall_thickness parameter)
3. nx_solver (run FEA)
4. stress_extractor (extract results)
5. optimization_runner (Optuna TPE loop)
6. report_generator (create HTML report)

Each step uses a feature from the registry, with proper sequencing
based on lifecycle_stage metadata.
```

---

## Examples

### Example 1: Engineering Feature (Stress Extractor)

```json
{
  "feature_id": "stress_extractor",
  "name": "Stress Extractor",
  "description": "Extracts von Mises stress from NX Nastran OP2 files",
  "category": "engineering",
  "subcategory": "extractors",
  "lifecycle_stage": "post_extraction",
  "abstraction_level": "primitive",
  "implementation": {
    "file_path": "optimization_engine/result_extractors/extractors.py",
    "function_name": "extract_stress_from_op2",
    "entry_point": "from optimization_engine.result_extractors.extractors import extract_stress_from_op2"
  },
  "interface": {
    "inputs": [
      {
        "name": "op2_file",
        "type": "Path",
        "required": true,
        "description": "Path to OP2 file from NX solve",
        "example": "bracket_sim1-solution_1.op2"
      }
    ],
    "outputs": [
      {
        "name": "max_von_mises",
        "type": "float",
        "description": "Maximum von Mises stress across all elements",
        "units": "MPa"
      },
      {
        "name": "element_id_at_max",
        "type": "int",
        "description": "Element ID where max stress occurs"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": ["pyNastran"],
    "nx_version": "2412"
  },
  "usage_examples": [
    {
      "description": "Minimize stress in bracket optimization",
      "code": "result = extract_stress_from_op2(Path('bracket.op2'))\nmax_stress = result['max_von_mises']",
      "natural_language": [
        "minimize stress",
        "reduce von Mises stress",
        "find lowest stress configuration"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["displacement_extractor", "mass_extractor"],
    "typical_workflows": ["structural_optimization", "stress_minimization"],
    "prerequisites": ["nx_solver"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-10",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/stress_extractor.md"
  }
}
```

### Example 2: Software Feature (Hook Manager)

```json
{
  "feature_id": "hook_manager",
  "name": "Hook Manager",
  "description": "Manages plugin lifecycle hooks for optimization workflow",
  "category": "software",
  "subcategory": "infrastructure",
  "lifecycle_stage": "all",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "optimization_engine/plugins/hook_manager.py",
    "function_name": "HookManager",
    "entry_point": "from optimization_engine.plugins.hook_manager import HookManager"
  },
  "interface": {
    "inputs": [
      {
        "name": "hook_type",
        "type": "str",
        "required": true,
        "description": "Lifecycle point: pre_solve, post_solve, post_extraction",
        "example": "pre_solve"
      },
      {
        "name": "context",
        "type": "dict",
        "required": true,
        "description": "Context data passed to hooks (trial_number, design_variables, etc.)"
      }
    ],
    "outputs": [
      {
        "name": "execution_history",
        "type": "list",
        "description": "List of hooks executed with timestamps and success status"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": [],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Execute pre-solve hooks before FEA",
      "code": "hook_manager.execute_hooks('pre_solve', context={'trial': 1})",
      "natural_language": [
        "run pre-solve plugins",
        "execute hooks before solving"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["detailed_logger", "optimization_logger"],
    "typical_workflows": ["optimization_runner"],
    "prerequisites": []
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/hook_manager.md"
  }
}
```

### Example 3: UI Feature (Dashboard Widget)

```json
{
  "feature_id": "optimization_progress_chart",
  "name": "Optimization Progress Chart",
  "description": "Real-time chart showing optimization convergence",
  "category": "ui",
  "subcategory": "dashboard_widgets",
  "lifecycle_stage": "post_optimization",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "dashboard/frontend/components/ProgressChart.js",
    "function_name": "OptimizationProgressChart",
    "entry_point": "new OptimizationProgressChart(containerId)"
  },
  "interface": {
    "inputs": [
      {
        "name": "trial_data",
        "type": "list[dict]",
        "required": true,
        "description": "List of trial results with objective values",
        "example": "[{trial: 1, value: 45.3}, {trial: 2, value: 42.1}]"
      }
    ],
    "outputs": [
      {
        "name": "chart_element",
        "type": "HTMLElement",
        "description": "Rendered chart DOM element"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": ["Chart.js"],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Display optimization progress in dashboard",
      "code": "chart = new OptimizationProgressChart('chart-container')\nchart.update(trial_data)",
      "natural_language": [
        "show optimization progress",
        "display convergence chart",
        "visualize trial results"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["trial_history_table", "best_parameters_display"],
    "typical_workflows": ["dashboard_view", "result_monitoring"],
    "prerequisites": ["optimization_runner"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-10",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/dashboard_widgets.md"
  }
}
```

### Example 4: Analysis Feature (Surrogate Quality Checker)

```json
{
  "feature_id": "surrogate_quality_checker",
  "name": "Surrogate Quality Checker",
  "description": "Evaluates surrogate model quality using R², CV score, and confidence intervals",
  "category": "analysis",
  "subcategory": "decision_support",
  "lifecycle_stage": "post_optimization",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "optimization_engine/analysis/surrogate_quality.py",
    "function_name": "check_surrogate_quality",
    "entry_point": "from optimization_engine.analysis.surrogate_quality import check_surrogate_quality"
  },
  "interface": {
    "inputs": [
      {
        "name": "trial_data",
        "type": "list[dict]",
        "required": true,
        "description": "Trial history with design variables and objectives"
      },
      {
        "name": "min_r_squared",
        "type": "float",
        "required": false,
        "description": "Minimum acceptable R² threshold",
        "example": "0.9"
      }
    ],
    "outputs": [
      {
        "name": "r_squared",
        "type": "float",
        "description": "Coefficient of determination",
        "units": "none"
      },
      {
        "name": "cv_score",
        "type": "float",
        "description": "Cross-validation score",
        "units": "none"
      },
      {
        "name": "quality_verdict",
        "type": "str",
        "description": "EXCELLENT|GOOD|POOR based on metrics"
      }
    ]
  },
  "dependencies": {
    "features": ["optimization_runner"],
    "libraries": ["sklearn", "numpy"],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Check if surrogate is reliable for predictions",
      "code": "quality = check_surrogate_quality(trial_data)\nif quality['r_squared'] > 0.9:\n    print('Surrogate is reliable')",
      "natural_language": [
        "check surrogate quality",
        "is surrogate reliable",
        "can I trust the surrogate model"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["sensitivity_analysis", "pareto_front_analyzer"],
    "typical_workflows": ["post_optimization_analysis", "decision_support"],
    "prerequisites": ["optimization_runner"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "experimental",
    "tested": false,
    "documentation_url": "docs/features/surrogate_quality_checker.md"
  }
}
```

---

## Implementation Plan

### Phase 2 Week 1: Foundation

#### Day 1-2: Create Initial Registry

- [ ] Create `optimization_engine/feature_registry.json`
- [ ] Document 15-20 existing features across all categories
- [ ] Add engineering features (stress_extractor, displacement_extractor)
- [ ] Add software features (hook_manager, optimization_runner, nx_solver)
- [ ] Add UI features (dashboard widgets)

#### Day 3-4: LLM Skill Setup

- [ ] Create `.claude/skills/atomizer.md`
- [ ] Define how LLM should read and use feature_registry.json
- [ ] Add feature discovery examples
- [ ] Add feature composition examples
- [ ] Test LLM's ability to navigate registry

#### Day 5: Documentation

- [ ] Create `docs/features/` directory
- [ ] Write feature guides for key features
- [ ] Link registry entries to documentation
- [ ] Update DEVELOPMENT.md with registry usage

### Phase 2 Week 2: LLM Integration

#### Natural Language Parser

- [ ] Intent classification using registry metadata
- [ ] Entity extraction for design variables, objectives
- [ ] Feature selection based on user request
- [ ] Workflow composition from features

### Future Phases: Feature Expansion

#### Phase 3: Code Generation

- [ ] Template features for common patterns
- [ ] Validation rules for generated code
- [ ] Auto-registration of new features

#### Phase 4-7: Continuous Evolution

- [ ] User-contributed features
- [ ] Pattern learning from usage
- [ ] Best practices extraction
- [ ] Self-documentation updates

---

## Benefits of This Architecture

### For Users

- **Natural language control**: "minimize stress" → LLM selects stress_extractor
- **Intelligent suggestions**: LLM proposes features based on context
- **No configuration files**: LLM generates config from conversation

### For Developers

- **Clear structure**: Features organized by domain, lifecycle, abstraction
- **Easy extension**: Add new features following templates
- **Self-documenting**: Registry serves as API documentation

### For LLM

- **Comprehensive context**: All capabilities in one place
- **Composition guidance**: Knows how features combine
- **Natural language mapping**: Understands user intent
- **Pattern recognition**: Can generate new features from templates

---

## Next Steps

1. **Create initial feature_registry.json** with 15-20 existing features
2. **Test LLM navigation** with Claude skill
3. **Validate registry structure** with real user requests
4. **Iterate on metadata** based on LLM's needs
5. **Build out documentation** in docs/features/

---

**Maintained by**: Antoine Polvé (antoine@atomaste.com)
**Repository**: [GitHub - Atomizer](https://github.com/yourusername/Atomizer)