# Feature Registry Architecture
> Comprehensive guide to Atomizer's LLM-instructed feature database system
**Last Updated**: 2025-01-16
**Status**: Phase 2 - Design Document
---
## Table of Contents
1. [Vision and Goals](#vision-and-goals)
2. [Feature Categorization System](#feature-categorization-system)
3. [Feature Registry Structure](#feature-registry-structure)
4. [LLM Instruction Format](#llm-instruction-format)
5. [Feature Documentation Strategy](#feature-documentation-strategy)
6. [Dynamic Tool Building](#dynamic-tool-building)
7. [Examples](#examples)
8. [Implementation Plan](#implementation-plan)
---
## Vision and Goals
### Core Philosophy
Atomizer's feature registry is more than a catalog: it is an **LLM instruction system** that enables:
1. **Self-Documentation**: Features describe themselves to the LLM
2. **Intelligent Composition**: LLM can combine features into workflows
3. **Autonomous Proposals**: LLM suggests new features based on user needs
4. **Structured Customization**: Users customize the tool through natural language
5. **Continuous Evolution**: Feature database grows as users add capabilities
### Key Principles
- **Feature Types Are First-Class**: Engineering, software, UI, and analysis features are equally important
- **Location-Aware**: Features know where their code lives and how to use it
- **Metadata-Rich**: Each feature has enough context for LLM to understand and use it
- **Composable**: Features can be combined into higher-level workflows
- **Extensible**: New feature types can be added without breaking the system
---
## Feature Categorization System
### Primary Feature Dimensions
Features are organized along **three dimensions**:
#### Dimension 1: Domain (WHAT it does)
- **Engineering**: Physics-based operations (stress, thermal, modal, etc.)
- **Software**: Core algorithms and infrastructure (optimization, hooks, path resolution)
- **UI**: User-facing components (dashboard, reports, visualization)
- **Analysis**: Post-processing and decision support (sensitivity, Pareto, surrogate quality)
#### Dimension 2: Lifecycle Stage (WHEN it runs)
- **Pre-Mesh**: Before meshing (geometry operations)
- **Pre-Solve**: Before FEA solve (parameter updates, logging)
- **Solve**: During FEA execution (solver control)
- **Post-Solve**: After solve, before extraction (file validation)
- **Post-Extraction**: After result extraction (logging, analysis)
- **Post-Optimization**: After optimization completes (reporting, visualization)
#### Dimension 3: Abstraction Level (HOW it's used)
- **Primitive**: Low-level functions (extract_stress, update_expression)
- **Composite**: Mid-level workflows (RSS_metric, weighted_objective)
- **Workflow**: High-level operations (run_optimization, generate_report)
### Feature Type Classification
```
┌─────────────────────────────────────────────────────────────┐
│                      FEATURE UNIVERSE                       │
└─────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
   ENGINEERING             SOFTWARE                UI
        │                     │                     │
    ┌───┴───┐            ┌────┴────┐          ┌─────┴─────┐
    │       │            │         │          │           │
Extractors Metrics  Optimization Hooks    Dashboard    Reports
    │       │            │         │          │           │
  Stress    RSS        Optuna  Pre-Solve    Widgets      HTML
  Thermal   SCF         TPE    Post-Solve   Controls     PDF
  Modal     FOS       Sampler  Post-Extract Charts      Markdown
```
---
## Feature Registry Structure
### JSON Schema
```json
{
  "feature_registry": {
    "version": "0.2.0",
    "last_updated": "2025-01-16",
    "categories": {
      "engineering": { ... },
      "software": { ... },
      "ui": { ... },
      "analysis": { ... }
    }
  }
}
```
### Feature Entry Schema
Each feature has:
```json
{
  "feature_id": "unique_identifier",
  "name": "Human-Readable Name",
  "description": "What this feature does (for LLM understanding)",
  "category": "engineering|software|ui|analysis",
  "subcategory": "extractors|metrics|optimization|hooks|...",
  "lifecycle_stage": "pre_solve|post_solve|post_extraction|...",
  "abstraction_level": "primitive|composite|workflow",
  "implementation": {
    "file_path": "relative/path/to/implementation.py",
    "function_name": "function_or_class_name",
    "entry_point": "how to invoke this feature"
  },
  "interface": {
    "inputs": [
      {
        "name": "parameter_name",
        "type": "str|int|float|dict|list",
        "required": true,
        "description": "What this parameter does",
        "units": "mm|MPa|Hz|none",
        "example": "example_value"
      }
    ],
    "outputs": [
      {
        "name": "output_name",
        "type": "float|dict|list",
        "description": "What this output represents",
        "units": "mm|MPa|Hz|none"
      }
    ]
  },
  "dependencies": {
    "features": ["feature_id_1", "feature_id_2"],
    "libraries": ["optuna", "pyNastran"],
    "nx_version": "2412"
  },
  "usage_examples": [
    {
      "description": "Example scenario",
      "code": "example_code_snippet",
      "natural_language": "How user would request this"
    }
  ],
  "composition_hints": {
    "combines_with": ["feature_id_3", "feature_id_4"],
    "typical_workflows": ["workflow_name_1"],
    "prerequisites": ["feature that must run before this"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "stable|experimental|deprecated",
    "tested": true,
    "documentation_url": "docs/features/feature_name.md"
  }
}
```
---
## LLM Instruction Format
### How LLM Uses the Registry
The feature registry serves as a **structured instruction manual** for the LLM:
#### 1. Discovery Phase
```
User: "I want to minimize stress on my bracket"
LLM reads registry:
→ Finds category="engineering", subcategory="extractors"
→ Discovers "stress_extractor" feature
→ Reads: "Extracts von Mises stress from OP2 files"
→ Checks composition_hints: combines_with=["optimization_runner"]
LLM response: "I'll use the stress_extractor feature to minimize stress.
This requires an OP2 file from NX solve."
```
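The discovery step above amounts to filtering registry entries by their metadata. A minimal sketch, assuming the entries have been flattened into a list (the toy `features` data below is illustrative):

```python
def discover(features: list[dict], **criteria: str) -> list[dict]:
    """Return the features whose metadata matches every given criterion."""
    return [f for f in features
            if all(f.get(key) == value for key, value in criteria.items())]

# Two toy entries standing in for the real registry contents.
features = [
    {"feature_id": "stress_extractor", "category": "engineering",
     "subcategory": "extractors",
     "composition_hints": {"combines_with": ["optimization_runner"]}},
    {"feature_id": "hook_manager", "category": "software",
     "subcategory": "infrastructure", "composition_hints": {}},
]

matches = discover(features, category="engineering", subcategory="extractors")
```

After filtering, the LLM reads `composition_hints.combines_with` on each match to plan the next step, exactly as in the trace above.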
#### 2. Composition Phase
```
User: "Add a custom RSS metric combining stress and displacement"
LLM reads registry:
→ Finds abstraction_level="composite" examples
→ Discovers "rss_metric" template feature
→ Reads interface: inputs=[stress_value, displacement_value]
→ Checks composition_hints: combines_with=["stress_extractor", "displacement_extractor"]
LLM generates new composite feature following the pattern
```
#### 3. Proposal Phase
```
User: "What features could help me analyze fatigue life?"
LLM reads registry:
→ Searches category="engineering", subcategory="extractors"
→ Finds: stress_extractor, displacement_extractor (exist)
→ Doesn't find: fatigue_extractor (missing)
→ Reads composition_hints for similar features
LLM proposes: "I can create a fatigue_life_extractor that:
1. Extracts stress history from OP2
2. Applies rainflow counting algorithm
3. Uses S-N curve to estimate fatigue life
This would be similar to stress_extractor but with
time-series analysis. Should I implement it?"
```
#### 4. Execution Phase
```
User: "Run the optimization"
LLM reads registry:
→ Finds abstraction_level="workflow", feature_id="run_optimization"
→ Reads implementation.entry_point
→ Checks dependencies: ["optuna", "nx_solver", "stress_extractor"]
→ Reads lifecycle_stage to understand execution order
LLM executes: python optimization_engine/runner.py
```
### Natural Language Mapping
Each feature includes `natural_language` examples showing how users might request it:
```json
"usage_examples": [
  {
    "natural_language": [
      "minimize stress",
      "reduce von Mises stress",
      "find lowest stress configuration",
      "optimize for minimum stress"
    ],
    "maps_to": {
      "feature": "stress_extractor",
      "objective": "minimize",
      "metric": "max_von_mises"
    }
  }
]
```
This mapping lets the LLM understand user intent and select the correct features.
---
## Feature Documentation Strategy
### Multi-Location Documentation
Features are documented in **three places**, each serving different purposes:
#### 1. Feature Registry (feature_registry.json)
**Purpose**: LLM instruction and discovery
**Location**: `optimization_engine/feature_registry.json`
**Content**:
- Structured metadata
- Interface definitions
- Composition hints
- Usage examples
**Example**:
```json
{
  "feature_id": "stress_extractor",
  "name": "Stress Extractor",
  "description": "Extracts von Mises stress from OP2 files",
  "category": "engineering",
  "subcategory": "extractors"
}
```
#### 2. Code Implementation (*.py files)
**Purpose**: Actual functionality
**Location**: Codebase (e.g., `optimization_engine/result_extractors/extractors.py`)
**Content**:
- Python code with docstrings
- Type hints
- Implementation details
**Example**:
```python
def extract_stress_from_op2(op2_file: Path) -> dict:
    """
    Extracts von Mises stress from OP2 file.

    Args:
        op2_file: Path to OP2 file

    Returns:
        dict with max_von_mises, min_von_mises, avg_von_mises
    """
    # Implementation...
```
#### 3. Feature Documentation (docs/features/*.md)
**Purpose**: Human-readable guides and tutorials
**Location**: `docs/features/`
**Content**:
- Detailed explanations
- Extended examples
- Best practices
- Troubleshooting
**Example**: `docs/features/stress_extractor.md`
```markdown
# Stress Extractor
## Overview
Extracts von Mises stress from NX Nastran OP2 files.
## When to Use
- Structural optimization where stress is the objective
- Constraint checking (yield stress limits)
- Multi-objective with stress as one objective
## Example Workflows
[detailed examples...]
```
### Documentation Flow
```
User Request
    ↓
LLM reads feature_registry.json (discovers feature)
    ↓
LLM reads code docstrings (understands interface)
    ↓
LLM reads docs/features/*.md (if complex usage needed)
    ↓
LLM composes workflow using features
```
---
## Dynamic Tool Building
### How LLM Builds New Features
The registry enables **autonomous feature creation** through templates and patterns:
#### Step 1: Pattern Recognition
```
User: "I need thermal stress extraction"
LLM:
1. Reads existing feature: stress_extractor
2. Identifies pattern: OP2 parsing → result extraction → return dict
3. Finds similar features: displacement_extractor
4. Recognizes template: engineering.extractors
```
#### Step 2: Feature Generation
```
LLM generates new feature following pattern:
{
  "feature_id": "thermal_stress_extractor",
  "name": "Thermal Stress Extractor",
  "description": "Extracts thermal stress from OP2 files (steady-state heat transfer analysis)",
  "category": "engineering",
  "subcategory": "extractors",
  "lifecycle_stage": "post_extraction",
  "abstraction_level": "primitive",
  "implementation": {
    "file_path": "optimization_engine/result_extractors/thermal_extractors.py",
    "function_name": "extract_thermal_stress_from_op2",
    "entry_point": "from optimization_engine.result_extractors.thermal_extractors import extract_thermal_stress_from_op2"
  },
  # ... rest of schema
}
```
#### Step 3: Code Generation
```python
# LLM writes implementation following stress_extractor pattern
def extract_thermal_stress_from_op2(op2_file: Path) -> dict:
    """
    Extracts thermal stress from OP2 file.

    Args:
        op2_file: Path to OP2 file from thermal analysis

    Returns:
        dict with max_thermal_stress, temperature_at_max_stress
    """
    from pyNastran.op2.op2 import OP2
    op2 = OP2()
    op2.read_op2(op2_file)
    # Extract thermal stress (element type depends on analysis)
    thermal_stress = op2.thermal_stress_data
    return {
        'max_thermal_stress': thermal_stress.max(),
        'temperature_at_max_stress': ...,  # elided in this sketch
    }
```
#### Step 4: Registration
```
LLM adds to feature_registry.json
LLM creates docs/features/thermal_stress_extractor.md
LLM updates CHANGELOG.md with new feature
LLM runs tests to validate implementation
```
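The registration step above can be sketched as a small helper that inserts the generated entry under its category/subcategory and rejects duplicate IDs. The `register_feature` name and the registry file layout below are assumptions matching the schema earlier in this document, not Atomizer's actual implementation:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def register_feature(registry_path: Path, entry: dict) -> None:
    """Insert a new entry under its category/subcategory, refusing duplicates."""
    data = json.loads(registry_path.read_text())
    categories = data["feature_registry"]["categories"]
    slot = categories.setdefault(entry["category"], {}) \
                     .setdefault(entry["subcategory"], {})
    if entry["feature_id"] in slot:
        raise ValueError(f"duplicate feature_id: {entry['feature_id']}")
    slot[entry["feature_id"]] = entry
    registry_path.write_text(json.dumps(data, indent=2))

# Demo against a throwaway registry file.
with TemporaryDirectory() as tmp:
    path = Path(tmp) / "feature_registry.json"
    path.write_text(json.dumps(
        {"feature_registry": {"version": "0.2.0", "categories": {}}}))
    register_feature(path, {
        "feature_id": "thermal_stress_extractor",
        "category": "engineering",
        "subcategory": "extractors",
    })
    registered = json.loads(path.read_text())
```

Guarding against duplicate IDs keeps the registry a reliable source of truth even when features are added autonomously.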
### Feature Composition Examples
#### Example 1: RSS Metric (Composite Feature)
```
User: "Create RSS metric combining stress and displacement"
LLM composes from primitives:
stress_extractor + displacement_extractor → rss_metric
Generated feature:
{
  "feature_id": "rss_stress_displacement",
  "abstraction_level": "composite",
  "dependencies": {
    "features": ["stress_extractor", "displacement_extractor"]
  },
  "composition_hints": {
    "composed_from": ["stress_extractor", "displacement_extractor"],
    "composition_type": "root_sum_square"
  }
}
```
#### Example 2: Complete Workflow
```
User: "Run bracket optimization minimizing stress"
LLM composes workflow from features:
1. study_manager (create study folder)
2. nx_updater (update wall_thickness parameter)
3. nx_solver (run FEA)
4. stress_extractor (extract results)
5. optimization_runner (Optuna TPE loop)
6. report_generator (create HTML report)
Each step uses a feature from registry with proper sequencing
based on lifecycle_stage metadata.
```
---
## Examples
### Example 1: Engineering Feature (Stress Extractor)
```json
{
  "feature_id": "stress_extractor",
  "name": "Stress Extractor",
  "description": "Extracts von Mises stress from NX Nastran OP2 files",
  "category": "engineering",
  "subcategory": "extractors",
  "lifecycle_stage": "post_extraction",
  "abstraction_level": "primitive",
  "implementation": {
    "file_path": "optimization_engine/result_extractors/extractors.py",
    "function_name": "extract_stress_from_op2",
    "entry_point": "from optimization_engine.result_extractors.extractors import extract_stress_from_op2"
  },
  "interface": {
    "inputs": [
      {
        "name": "op2_file",
        "type": "Path",
        "required": true,
        "description": "Path to OP2 file from NX solve",
        "example": "bracket_sim1-solution_1.op2"
      }
    ],
    "outputs": [
      {
        "name": "max_von_mises",
        "type": "float",
        "description": "Maximum von Mises stress across all elements",
        "units": "MPa"
      },
      {
        "name": "element_id_at_max",
        "type": "int",
        "description": "Element ID where max stress occurs"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": ["pyNastran"],
    "nx_version": "2412"
  },
  "usage_examples": [
    {
      "description": "Minimize stress in bracket optimization",
      "code": "result = extract_stress_from_op2(Path('bracket.op2'))\nmax_stress = result['max_von_mises']",
      "natural_language": [
        "minimize stress",
        "reduce von Mises stress",
        "find lowest stress configuration"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["displacement_extractor", "mass_extractor"],
    "typical_workflows": ["structural_optimization", "stress_minimization"],
    "prerequisites": ["nx_solver"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-10",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/stress_extractor.md"
  }
}
```
### Example 2: Software Feature (Hook Manager)
```json
{
  "feature_id": "hook_manager",
  "name": "Hook Manager",
  "description": "Manages plugin lifecycle hooks for optimization workflow",
  "category": "software",
  "subcategory": "infrastructure",
  "lifecycle_stage": "all",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "optimization_engine/plugins/hook_manager.py",
    "function_name": "HookManager",
    "entry_point": "from optimization_engine.plugins.hook_manager import HookManager"
  },
  "interface": {
    "inputs": [
      {
        "name": "hook_type",
        "type": "str",
        "required": true,
        "description": "Lifecycle point: pre_solve, post_solve, post_extraction",
        "example": "pre_solve"
      },
      {
        "name": "context",
        "type": "dict",
        "required": true,
        "description": "Context data passed to hooks (trial_number, design_variables, etc.)"
      }
    ],
    "outputs": [
      {
        "name": "execution_history",
        "type": "list",
        "description": "List of hooks executed with timestamps and success status"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": [],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Execute pre-solve hooks before FEA",
      "code": "hook_manager.execute_hooks('pre_solve', context={'trial': 1})",
      "natural_language": [
        "run pre-solve plugins",
        "execute hooks before solving"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["detailed_logger", "optimization_logger"],
    "typical_workflows": ["optimization_runner"],
    "prerequisites": []
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/hook_manager.md"
  }
}
```
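To make the `hook_manager` interface above concrete, here is an illustrative sketch of such a class: callables are registered per lifecycle point and each run is recorded in `execution_history`. This is an assumption matching the registry entry's interface, not the actual `optimization_engine/plugins/hook_manager.py` code:

```python
import time

class HookManager:
    """Sketch: registers callables per lifecycle point, records every run."""

    def __init__(self) -> None:
        self._hooks: dict[str, list] = {}
        self.execution_history: list[dict] = []

    def register(self, hook_type: str, fn) -> None:
        self._hooks.setdefault(hook_type, []).append(fn)

    def execute_hooks(self, hook_type: str, context: dict) -> list[dict]:
        for fn in self._hooks.get(hook_type, []):
            success = True
            try:
                fn(context)
            except Exception:
                success = False  # a failing hook is recorded, not fatal
            self.execution_history.append({
                "hook_type": hook_type,
                "hook": fn.__name__,
                "timestamp": time.time(),
                "success": success,
            })
        return self.execution_history

def log_trial(context: dict) -> None:
    context["logged"] = True

manager = HookManager()
manager.register("pre_solve", log_trial)
history = manager.execute_hooks("pre_solve", context={"trial": 1})
```

Swallowing hook exceptions while recording `success` keeps one misbehaving plugin from aborting an entire optimization run.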
### Example 3: UI Feature (Dashboard Widget)
```json
{
  "feature_id": "optimization_progress_chart",
  "name": "Optimization Progress Chart",
  "description": "Real-time chart showing optimization convergence",
  "category": "ui",
  "subcategory": "dashboard_widgets",
  "lifecycle_stage": "post_optimization",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "dashboard/frontend/components/ProgressChart.js",
    "function_name": "OptimizationProgressChart",
    "entry_point": "new OptimizationProgressChart(containerId)"
  },
  "interface": {
    "inputs": [
      {
        "name": "trial_data",
        "type": "list[dict]",
        "required": true,
        "description": "List of trial results with objective values",
        "example": "[{trial: 1, value: 45.3}, {trial: 2, value: 42.1}]"
      }
    ],
    "outputs": [
      {
        "name": "chart_element",
        "type": "HTMLElement",
        "description": "Rendered chart DOM element"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": ["Chart.js"],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Display optimization progress in dashboard",
      "code": "chart = new OptimizationProgressChart('chart-container')\nchart.update(trial_data)",
      "natural_language": [
        "show optimization progress",
        "display convergence chart",
        "visualize trial results"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["trial_history_table", "best_parameters_display"],
    "typical_workflows": ["dashboard_view", "result_monitoring"],
    "prerequisites": ["optimization_runner"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-10",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/dashboard_widgets.md"
  }
}
```
### Example 4: Analysis Feature (Surrogate Quality Checker)
```json
{
  "feature_id": "surrogate_quality_checker",
  "name": "Surrogate Quality Checker",
  "description": "Evaluates surrogate model quality using R², CV score, and confidence intervals",
  "category": "analysis",
  "subcategory": "decision_support",
  "lifecycle_stage": "post_optimization",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "optimization_engine/analysis/surrogate_quality.py",
    "function_name": "check_surrogate_quality",
    "entry_point": "from optimization_engine.analysis.surrogate_quality import check_surrogate_quality"
  },
  "interface": {
    "inputs": [
      {
        "name": "trial_data",
        "type": "list[dict]",
        "required": true,
        "description": "Trial history with design variables and objectives"
      },
      {
        "name": "min_r_squared",
        "type": "float",
        "required": false,
        "description": "Minimum acceptable R² threshold",
        "example": "0.9"
      }
    ],
    "outputs": [
      {
        "name": "r_squared",
        "type": "float",
        "description": "Coefficient of determination",
        "units": "none"
      },
      {
        "name": "cv_score",
        "type": "float",
        "description": "Cross-validation score",
        "units": "none"
      },
      {
        "name": "quality_verdict",
        "type": "str",
        "description": "EXCELLENT|GOOD|POOR based on metrics"
      }
    ]
  },
  "dependencies": {
    "features": ["optimization_runner"],
    "libraries": ["sklearn", "numpy"],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Check if surrogate is reliable for predictions",
      "code": "quality = check_surrogate_quality(trial_data)\nif quality['r_squared'] > 0.9:\n    print('Surrogate is reliable')",
      "natural_language": [
        "check surrogate quality",
        "is surrogate reliable",
        "can I trust the surrogate model"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["sensitivity_analysis", "pareto_front_analyzer"],
    "typical_workflows": ["post_optimization_analysis", "decision_support"],
    "prerequisites": ["optimization_runner"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "experimental",
    "tested": false,
    "documentation_url": "docs/features/surrogate_quality_checker.md"
  }
}
```
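As a sketch of the quality check itself: R² is straightforward to compute from trial history, and a verdict can be derived by thresholding. The helper names and the GOOD/POOR cutoffs below are illustrative assumptions; the real `check_surrogate_quality` would also compute a cross-validation score via sklearn:

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination, computed without external libraries."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def quality_verdict(r2: float, min_r_squared: float = 0.9) -> str:
    # Thresholds are illustrative, not the tool's actual cutoffs.
    if r2 >= min_r_squared:
        return "EXCELLENT"
    if r2 >= 0.7:
        return "GOOD"
    return "POOR"

# Observed objective values vs. surrogate predictions for the same trials.
r2 = r_squared([10.0, 20.0, 30.0], [11.0, 19.0, 31.0])  # → 0.985
```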
---
## Implementation Plan
### Phase 2 Week 1: Foundation
#### Day 1-2: Create Initial Registry
- [ ] Create `optimization_engine/feature_registry.json`
- [ ] Document 15-20 existing features across all categories
- [ ] Add engineering features (stress_extractor, displacement_extractor)
- [ ] Add software features (hook_manager, optimization_runner, nx_solver)
- [ ] Add UI features (dashboard widgets)
#### Day 3-4: LLM Skill Setup
- [ ] Create `.claude/skills/atomizer.md`
- [ ] Define how LLM should read and use feature_registry.json
- [ ] Add feature discovery examples
- [ ] Add feature composition examples
- [ ] Test LLM's ability to navigate registry
#### Day 5: Documentation
- [ ] Create `docs/features/` directory
- [ ] Write feature guides for key features
- [ ] Link registry entries to documentation
- [ ] Update DEVELOPMENT.md with registry usage
### Phase 2 Week 2: LLM Integration
#### Natural Language Parser
- [ ] Intent classification using registry metadata
- [ ] Entity extraction for design variables, objectives
- [ ] Feature selection based on user request
- [ ] Workflow composition from features
### Future Phases: Feature Expansion
#### Phase 3: Code Generation
- [ ] Template features for common patterns
- [ ] Validation rules for generated code
- [ ] Auto-registration of new features
#### Phase 4-7: Continuous Evolution
- [ ] User-contributed features
- [ ] Pattern learning from usage
- [ ] Best practices extraction
- [ ] Self-documentation updates
---
## Benefits of This Architecture
### For Users
- **Natural language control**: "minimize stress" → LLM selects stress_extractor
- **Intelligent suggestions**: LLM proposes features based on context
- **No configuration files**: LLM generates config from conversation
### For Developers
- **Clear structure**: Features organized by domain, lifecycle, abstraction
- **Easy extension**: Add new features following templates
- **Self-documenting**: Registry serves as API documentation
### For LLM
- **Comprehensive context**: All capabilities in one place
- **Composition guidance**: Knows how features combine
- **Natural language mapping**: Understands user intent
- **Pattern recognition**: Can generate new features from templates
---
## Next Steps
1. **Create initial feature_registry.json** with 15-20 existing features
2. **Test LLM navigation** with Claude skill
3. **Validate registry structure** with real user requests
4. **Iterate on metadata** based on LLM's needs
5. **Build out documentation** in docs/features/
---
**Maintained by**: Antoine Polvé (antoine@atomaste.com)
**Repository**: [GitHub - Atomizer](https://github.com/yourusername/Atomizer)