# Feature Registry Architecture

> Comprehensive guide to Atomizer's LLM-instructed feature database system

**Last Updated**: 2025-01-16
**Status**: Phase 2 - Design Document

---

## Table of Contents

1. [Vision and Goals](#vision-and-goals)
2. [Feature Categorization System](#feature-categorization-system)
3. [Feature Registry Structure](#feature-registry-structure)
4. [LLM Instruction Format](#llm-instruction-format)
5. [Feature Documentation Strategy](#feature-documentation-strategy)
6. [Dynamic Tool Building](#dynamic-tool-building)
7. [Examples](#examples)
8. [Implementation Plan](#implementation-plan)

---

## Vision and Goals

### Core Philosophy

Atomizer's feature registry is not just a catalog; it is an **LLM instruction system** that enables:

1. **Self-Documentation**: Features describe themselves to the LLM
2. **Intelligent Composition**: The LLM can combine features into workflows
3. **Autonomous Proposals**: The LLM suggests new features based on user needs
4. **Structured Customization**: Users customize the tool through natural language
5. **Continuous Evolution**: The feature database grows as users add capabilities

### Key Principles

- **Feature Types Are First-Class**: Engineering, software, UI, and analysis features are equally important
- **Location-Aware**: Features know where their code lives and how to use it
- **Metadata-Rich**: Each feature carries enough context for the LLM to understand and use it
- **Composable**: Features can be combined into higher-level workflows
- **Extensible**: New feature types can be added without breaking the system

---
## Feature Categorization System

### Primary Feature Dimensions

Features are organized along **three dimensions**:

#### Dimension 1: Domain (WHAT it does)
- **Engineering**: Physics-based operations (stress, thermal, modal, etc.)
- **Software**: Core algorithms and infrastructure (optimization, hooks, path resolution)
- **UI**: User-facing components (dashboard, reports, visualization)
- **Analysis**: Post-processing and decision support (sensitivity, Pareto, surrogate quality)

#### Dimension 2: Lifecycle Stage (WHEN it runs)
- **Pre-Mesh**: Before meshing (geometry operations)
- **Pre-Solve**: Before the FEA solve (parameter updates, logging)
- **Solve**: During FEA execution (solver control)
- **Post-Solve**: After the solve, before extraction (file validation)
- **Post-Extraction**: After result extraction (logging, analysis)
- **Post-Optimization**: After optimization completes (reporting, visualization)

#### Dimension 3: Abstraction Level (HOW it's used)
- **Primitive**: Low-level functions (extract_stress, update_expression)
- **Composite**: Mid-level workflows (RSS_metric, weighted_objective)
- **Workflow**: High-level operations (run_optimization, generate_report)

### Feature Type Classification

```
┌─────────────────────────────────────────────────────────────┐
│                      FEATURE UNIVERSE                       │
└─────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
  ENGINEERING              SOFTWARE                 UI
        │                     │                     │
    ┌───┴────┐           ┌────┴─────┐         ┌─────┴─────┐
    │        │           │          │         │           │
Extractors Metrics  Optimization  Hooks    Dashboard   Reports
    │        │           │          │         │           │
 Stress     RSS       Optuna   Pre-Solve   Widgets      HTML
 Thermal    SCF       TPE      Post-Solve  Controls     PDF
 Modal      FOS       Sampler  Post-Extract Charts     Markdown
```

---

## Feature Registry Structure

### JSON Schema

```json
{
  "feature_registry": {
    "version": "0.2.0",
    "last_updated": "2025-01-16",
    "categories": {
      "engineering": { ... },
      "software": { ... },
      "ui": { ... },
      "analysis": { ... }
    }
  }
}
```

### Feature Entry Schema

Each feature has:

```json
{
  "feature_id": "unique_identifier",
  "name": "Human-Readable Name",
  "description": "What this feature does (for LLM understanding)",
  "category": "engineering|software|ui|analysis",
  "subcategory": "extractors|metrics|optimization|hooks|...",
  "lifecycle_stage": "pre_solve|post_solve|post_extraction|...",
  "abstraction_level": "primitive|composite|workflow",
  "implementation": {
    "file_path": "relative/path/to/implementation.py",
    "function_name": "function_or_class_name",
    "entry_point": "how to invoke this feature"
  },
  "interface": {
    "inputs": [
      {
        "name": "parameter_name",
        "type": "str|int|float|dict|list",
        "required": true,
        "description": "What this parameter does",
        "units": "mm|MPa|Hz|none",
        "example": "example_value"
      }
    ],
    "outputs": [
      {
        "name": "output_name",
        "type": "float|dict|list",
        "description": "What this output represents",
        "units": "mm|MPa|Hz|none"
      }
    ]
  },
  "dependencies": {
    "features": ["feature_id_1", "feature_id_2"],
    "libraries": ["optuna", "pyNastran"],
    "nx_version": "2412"
  },
  "usage_examples": [
    {
      "description": "Example scenario",
      "code": "example_code_snippet",
      "natural_language": "How user would request this"
    }
  ],
  "composition_hints": {
    "combines_with": ["feature_id_3", "feature_id_4"],
    "typical_workflows": ["workflow_name_1"],
    "prerequisites": ["feature that must run before this"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "stable|experimental|deprecated",
    "tested": true,
    "documentation_url": "docs/features/feature_name.md"
  }
}
```
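
As a sketch of how tooling might consume entries in this schema, the loader below reads the registry and indexes feature IDs by category and subcategory. It assumes each category object maps feature IDs to entries — the `{ ... }` placeholders above leave that open, so treat the shape as illustrative:

```python
import json
from pathlib import Path


def load_registry(path="optimization_engine/feature_registry.json"):
    """Load feature_registry.json and index feature IDs by (category, subcategory)."""
    registry = json.loads(Path(path).read_text(encoding="utf-8"))
    index = {}
    categories = registry["feature_registry"]["categories"]
    for category, entries in categories.items():
        for feature_id, feature in entries.items():
            key = (category, feature.get("subcategory"))
            index.setdefault(key, []).append(feature_id)
    return registry, index
```

With this index, a lookup like `index[("engineering", "extractors")]` yields every extractor the LLM can draw on during discovery.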

---

## LLM Instruction Format

### How the LLM Uses the Registry

The feature registry serves as a **structured instruction manual** for the LLM:

#### 1. Discovery Phase
```
User: "I want to minimize stress on my bracket"

LLM reads registry:
  → Finds category="engineering", subcategory="extractors"
  → Discovers "stress_extractor" feature
  → Reads: "Extracts von Mises stress from OP2 files"
  → Checks composition_hints: combines_with=["optimization_runner"]

LLM response: "I'll use the stress_extractor feature to minimize stress.
               This requires an OP2 file from NX solve."
```

#### 2. Composition Phase
```
User: "Add a custom RSS metric combining stress and displacement"

LLM reads registry:
  → Finds abstraction_level="composite" examples
  → Discovers "rss_metric" template feature
  → Reads interface: inputs=[stress_value, displacement_value]
  → Checks composition_hints: combines_with=["stress_extractor", "displacement_extractor"]

LLM generates a new composite feature following the pattern
```

#### 3. Proposal Phase
```
User: "What features could help me analyze fatigue life?"

LLM reads registry:
  → Searches category="engineering", subcategory="extractors"
  → Finds: stress_extractor, displacement_extractor (exist)
  → Doesn't find: fatigue_extractor (missing)
  → Reads composition_hints for similar features

LLM proposes: "I can create a fatigue_life_extractor that:
  1. Extracts stress history from OP2
  2. Applies rainflow counting algorithm
  3. Uses S-N curve to estimate fatigue life

  This would be similar to stress_extractor but with
  time-series analysis. Should I implement it?"
```

#### 4. Execution Phase
```
User: "Run the optimization"

LLM reads registry:
  → Finds abstraction_level="workflow", feature_id="run_optimization"
  → Reads implementation.entry_point
  → Checks dependencies: ["optuna", "nx_solver", "stress_extractor"]
  → Reads lifecycle_stage to understand execution order

LLM executes: python optimization_engine/runner.py
```
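
The execution step hinges on `implementation.entry_point`. One minimal way to turn a registry entry into a callable is to derive a dotted module path from `file_path` and import it dynamically — a sketch, not the project's actual loader, and it assumes `file_path` is importable relative to the working directory:

```python
import importlib


def resolve_entry_point(feature):
    """Import the callable named in a feature's implementation block.

    Assumes file_path maps onto an importable dotted module path (illustrative).
    """
    impl = feature["implementation"]
    module_path = impl["file_path"].removesuffix(".py").replace("/", ".")
    module = importlib.import_module(module_path)
    return getattr(module, impl["function_name"])
```

For example, an entry with `"file_path": "os/path.py"` and `"function_name": "join"` would resolve to the standard library's `os.path.join`.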

### Natural Language Mapping

Each feature includes `natural_language` examples showing how users might request it:

```json
"usage_examples": [
  {
    "natural_language": [
      "minimize stress",
      "reduce von Mises stress",
      "find lowest stress configuration",
      "optimize for minimum stress"
    ],
    "maps_to": {
      "feature": "stress_extractor",
      "objective": "minimize",
      "metric": "max_von_mises"
    }
  }
]
```
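
A first-pass mapping from a user request to a feature can be as simple as scoring word overlap against these `natural_language` phrases. The LLM does this far more robustly, but a deterministic fallback might look like the sketch below (function name and scoring scheme are illustrative):

```python
def match_intent(user_request, features):
    """Return the feature_id whose natural_language examples best overlap the request."""
    words = set(user_request.lower().split())
    best_id, best_score = None, 0
    for feature in features:
        for example in feature.get("usage_examples", []):
            for phrase in example.get("natural_language", []):
                score = len(words & set(phrase.lower().split()))
                if score > best_score:
                    best_id, best_score = feature["feature_id"], score
    return best_id
```

Given the stress-extractor entry above, a request like "please minimize stress" overlaps "minimize stress" on two words and wins over unrelated features.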

This enables the LLM to understand user intent and select the correct features.

---

## Feature Documentation Strategy

### Multi-Location Documentation

Features are documented in **three places**, each serving a different purpose:

#### 1. Feature Registry (feature_registry.json)
**Purpose**: LLM instruction and discovery
**Location**: `optimization_engine/feature_registry.json`
**Content**:
- Structured metadata
- Interface definitions
- Composition hints
- Usage examples

**Example**:
```json
{
  "feature_id": "stress_extractor",
  "name": "Stress Extractor",
  "description": "Extracts von Mises stress from OP2 files",
  "category": "engineering",
  "subcategory": "extractors"
}
```

#### 2. Code Implementation (*.py files)
**Purpose**: Actual functionality
**Location**: Codebase (e.g., `optimization_engine/result_extractors/extractors.py`)
**Content**:
- Python code with docstrings
- Type hints
- Implementation details

**Example**:
```python
def extract_stress_from_op2(op2_file: Path) -> dict:
    """
    Extract von Mises stress from an OP2 file.

    Args:
        op2_file: Path to the OP2 file

    Returns:
        dict with max_von_mises, min_von_mises, avg_von_mises
    """
    # Implementation...
```

#### 3. Feature Documentation (docs/features/*.md)
**Purpose**: Human-readable guides and tutorials
**Location**: `docs/features/`
**Content**:
- Detailed explanations
- Extended examples
- Best practices
- Troubleshooting

**Example**: `docs/features/stress_extractor.md`
```markdown
# Stress Extractor

## Overview
Extracts von Mises stress from NX Nastran OP2 files.

## When to Use
- Structural optimization where stress is the objective
- Constraint checking (yield stress limits)
- Multi-objective optimization with stress as one objective

## Example Workflows
[detailed examples...]
```

### Documentation Flow

```
User Request
    ↓
LLM reads feature_registry.json (discovers the feature)
    ↓
LLM reads code docstrings (understands the interface)
    ↓
LLM reads docs/features/*.md (if complex usage is needed)
    ↓
LLM composes a workflow using the features
```

---

## Dynamic Tool Building

### How the LLM Builds New Features

The registry enables **autonomous feature creation** through templates and patterns:

#### Step 1: Pattern Recognition
```
User: "I need thermal stress extraction"

LLM:
1. Reads existing feature: stress_extractor
2. Identifies pattern: OP2 parsing → result extraction → return dict
3. Finds similar features: displacement_extractor
4. Recognizes template: engineering.extractors
```

#### Step 2: Feature Generation
```
LLM generates a new feature following the pattern:
{
  "feature_id": "thermal_stress_extractor",
  "name": "Thermal Stress Extractor",
  "description": "Extracts thermal stress from OP2 files (steady-state heat transfer analysis)",
  "category": "engineering",
  "subcategory": "extractors",
  "lifecycle_stage": "post_extraction",
  "abstraction_level": "primitive",
  "implementation": {
    "file_path": "optimization_engine/result_extractors/thermal_extractors.py",
    "function_name": "extract_thermal_stress_from_op2",
    "entry_point": "from optimization_engine.result_extractors.thermal_extractors import extract_thermal_stress_from_op2"
  },
  # ... rest of schema
}
```

#### Step 3: Code Generation
```python
# LLM writes an implementation following the stress_extractor pattern
def extract_thermal_stress_from_op2(op2_file: Path) -> dict:
    """
    Extract thermal stress from an OP2 file.

    Args:
        op2_file: Path to the OP2 file from a thermal analysis

    Returns:
        dict with max_thermal_stress, temperature_at_max_stress
    """
    from pyNastran.op2.op2 import OP2

    op2 = OP2()
    op2.read_op2(op2_file)

    # Extract thermal stress (element type depends on the analysis)
    thermal_stress = op2.thermal_stress_data

    return {
        'max_thermal_stress': thermal_stress.max(),
        'temperature_at_max_stress': None,  # TODO: look up temperature at the critical element
    }
```

#### Step 4: Registration
```
LLM adds the entry to feature_registry.json
LLM creates docs/features/thermal_stress_extractor.md
LLM updates CHANGELOG.md with the new feature
LLM runs tests to validate the implementation
```
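
The registration step itself reduces to a JSON read-modify-write. A minimal sketch, assuming each category object maps feature IDs to entries (the registry internals are only shown as `{ ... }` above):

```python
import json
from pathlib import Path


def register_feature(registry_path, feature):
    """Add a generated feature entry under its category in feature_registry.json."""
    path = Path(registry_path)
    registry = json.loads(path.read_text(encoding="utf-8"))
    categories = registry["feature_registry"]["categories"]
    # Create the category on first use, then key the entry by its feature_id
    categories.setdefault(feature["category"], {})[feature["feature_id"]] = feature
    path.write_text(json.dumps(registry, indent=2, ensure_ascii=False), encoding="utf-8")
```

A real implementation would also validate the entry against the schema and bump `last_updated` before writing.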

### Feature Composition Examples

#### Example 1: RSS Metric (Composite Feature)
```
User: "Create RSS metric combining stress and displacement"

LLM composes from primitives:
  stress_extractor + displacement_extractor → rss_metric

Generated feature:
{
  "feature_id": "rss_stress_displacement",
  "abstraction_level": "composite",
  "dependencies": {
    "features": ["stress_extractor", "displacement_extractor"]
  },
  "composition_hints": {
    "composed_from": ["stress_extractor", "displacement_extractor"],
    "composition_type": "root_sum_square"
  }
}
```
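
The `root_sum_square` composition itself is a one-liner once the two primitives have produced values. A sketch, where the normalization references are illustrative rather than part of the generated feature:

```python
import math


def rss_metric(stress_value, displacement_value, stress_ref=1.0, disp_ref=1.0):
    """Root-sum-square of normalized stress and displacement values."""
    return math.sqrt((stress_value / stress_ref) ** 2 + (displacement_value / disp_ref) ** 2)
```

For instance, `rss_metric(3.0, 4.0)` returns `5.0` with unit references.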

#### Example 2: Complete Workflow
```
User: "Run bracket optimization minimizing stress"

LLM composes a workflow from features:
1. study_manager (create study folder)
2. nx_updater (update wall_thickness parameter)
3. nx_solver (run FEA)
4. stress_extractor (extract results)
5. optimization_runner (Optuna TPE loop)
6. report_generator (create HTML report)

Each step uses a feature from the registry, sequenced
according to its lifecycle_stage metadata.
```
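
Sequencing by `lifecycle_stage` can be sketched with a fixed stage order; the stage names follow Dimension 2 above, and the ordering function is illustrative:

```python
LIFECYCLE_ORDER = [
    "pre_mesh", "pre_solve", "solve",
    "post_solve", "post_extraction", "post_optimization",
]


def order_by_lifecycle(features):
    """Sort selected features so a composed workflow runs its stages in order."""
    return sorted(features, key=lambda f: LIFECYCLE_ORDER.index(f["lifecycle_stage"]))
```

With this, a pre-solve updater always lands before a post-extraction extractor, regardless of the order in which the LLM selected them.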

---

## Examples

### Example 1: Engineering Feature (Stress Extractor)

```json
{
  "feature_id": "stress_extractor",
  "name": "Stress Extractor",
  "description": "Extracts von Mises stress from NX Nastran OP2 files",
  "category": "engineering",
  "subcategory": "extractors",
  "lifecycle_stage": "post_extraction",
  "abstraction_level": "primitive",
  "implementation": {
    "file_path": "optimization_engine/result_extractors/extractors.py",
    "function_name": "extract_stress_from_op2",
    "entry_point": "from optimization_engine.result_extractors.extractors import extract_stress_from_op2"
  },
  "interface": {
    "inputs": [
      {
        "name": "op2_file",
        "type": "Path",
        "required": true,
        "description": "Path to OP2 file from NX solve",
        "example": "bracket_sim1-solution_1.op2"
      }
    ],
    "outputs": [
      {
        "name": "max_von_mises",
        "type": "float",
        "description": "Maximum von Mises stress across all elements",
        "units": "MPa"
      },
      {
        "name": "element_id_at_max",
        "type": "int",
        "description": "Element ID where the maximum stress occurs"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": ["pyNastran"],
    "nx_version": "2412"
  },
  "usage_examples": [
    {
      "description": "Minimize stress in bracket optimization",
      "code": "result = extract_stress_from_op2(Path('bracket.op2'))\nmax_stress = result['max_von_mises']",
      "natural_language": [
        "minimize stress",
        "reduce von Mises stress",
        "find lowest stress configuration"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["displacement_extractor", "mass_extractor"],
    "typical_workflows": ["structural_optimization", "stress_minimization"],
    "prerequisites": ["nx_solver"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-10",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/stress_extractor.md"
  }
}
```

### Example 2: Software Feature (Hook Manager)

```json
{
  "feature_id": "hook_manager",
  "name": "Hook Manager",
  "description": "Manages plugin lifecycle hooks for the optimization workflow",
  "category": "software",
  "subcategory": "infrastructure",
  "lifecycle_stage": "all",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "optimization_engine/plugins/hook_manager.py",
    "function_name": "HookManager",
    "entry_point": "from optimization_engine.plugins.hook_manager import HookManager"
  },
  "interface": {
    "inputs": [
      {
        "name": "hook_type",
        "type": "str",
        "required": true,
        "description": "Lifecycle point: pre_solve, post_solve, post_extraction",
        "example": "pre_solve"
      },
      {
        "name": "context",
        "type": "dict",
        "required": true,
        "description": "Context data passed to hooks (trial_number, design_variables, etc.)"
      }
    ],
    "outputs": [
      {
        "name": "execution_history",
        "type": "list",
        "description": "List of hooks executed with timestamps and success status"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": [],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Execute pre-solve hooks before FEA",
      "code": "hook_manager.execute_hooks('pre_solve', context={'trial': 1})",
      "natural_language": [
        "run pre-solve plugins",
        "execute hooks before solving"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["detailed_logger", "optimization_logger"],
    "typical_workflows": ["optimization_runner"],
    "prerequisites": []
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/hook_manager.md"
  }
}
```
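
For intuition, a `HookManager` matching this interface could be sketched as below. The real `optimization_engine/plugins/hook_manager.py` is not shown here, so the class internals are assumptions:

```python
from datetime import datetime, timezone


class HookManager:
    """Minimal lifecycle hook dispatcher (illustrative, not the project's implementation)."""

    def __init__(self):
        self._hooks = {}              # hook_type -> list of registered callables
        self.execution_history = []   # records with timestamp and success status

    def register(self, hook_type, func):
        self._hooks.setdefault(hook_type, []).append(func)

    def execute_hooks(self, hook_type, context):
        """Run all hooks registered for hook_type, recording each outcome."""
        for func in self._hooks.get(hook_type, []):
            success = True
            try:
                func(context)
            except Exception:
                success = False  # a failing hook is logged, not fatal
            self.execution_history.append({
                "hook": getattr(func, "__name__", repr(func)),
                "hook_type": hook_type,
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "success": success,
            })
        return self.execution_history
```

The swallow-and-record error policy keeps one misbehaving plugin from aborting an optimization run; a stricter manager might re-raise instead.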

### Example 3: UI Feature (Dashboard Widget)

```json
{
  "feature_id": "optimization_progress_chart",
  "name": "Optimization Progress Chart",
  "description": "Real-time chart showing optimization convergence",
  "category": "ui",
  "subcategory": "dashboard_widgets",
  "lifecycle_stage": "post_optimization",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "dashboard/frontend/components/ProgressChart.js",
    "function_name": "OptimizationProgressChart",
    "entry_point": "new OptimizationProgressChart(containerId)"
  },
  "interface": {
    "inputs": [
      {
        "name": "trial_data",
        "type": "list[dict]",
        "required": true,
        "description": "List of trial results with objective values",
        "example": "[{trial: 1, value: 45.3}, {trial: 2, value: 42.1}]"
      }
    ],
    "outputs": [
      {
        "name": "chart_element",
        "type": "HTMLElement",
        "description": "Rendered chart DOM element"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": ["Chart.js"],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Display optimization progress in the dashboard",
      "code": "chart = new OptimizationProgressChart('chart-container')\nchart.update(trial_data)",
      "natural_language": [
        "show optimization progress",
        "display convergence chart",
        "visualize trial results"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["trial_history_table", "best_parameters_display"],
    "typical_workflows": ["dashboard_view", "result_monitoring"],
    "prerequisites": ["optimization_runner"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-10",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/dashboard_widgets.md"
  }
}
```

### Example 4: Analysis Feature (Surrogate Quality Checker)

```json
{
  "feature_id": "surrogate_quality_checker",
  "name": "Surrogate Quality Checker",
  "description": "Evaluates surrogate model quality using R², CV score, and confidence intervals",
  "category": "analysis",
  "subcategory": "decision_support",
  "lifecycle_stage": "post_optimization",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "optimization_engine/analysis/surrogate_quality.py",
    "function_name": "check_surrogate_quality",
    "entry_point": "from optimization_engine.analysis.surrogate_quality import check_surrogate_quality"
  },
  "interface": {
    "inputs": [
      {
        "name": "trial_data",
        "type": "list[dict]",
        "required": true,
        "description": "Trial history with design variables and objectives"
      },
      {
        "name": "min_r_squared",
        "type": "float",
        "required": false,
        "description": "Minimum acceptable R² threshold",
        "example": "0.9"
      }
    ],
    "outputs": [
      {
        "name": "r_squared",
        "type": "float",
        "description": "Coefficient of determination",
        "units": "none"
      },
      {
        "name": "cv_score",
        "type": "float",
        "description": "Cross-validation score",
        "units": "none"
      },
      {
        "name": "quality_verdict",
        "type": "str",
        "description": "EXCELLENT|GOOD|POOR based on metrics"
      }
    ]
  },
  "dependencies": {
    "features": ["optimization_runner"],
    "libraries": ["sklearn", "numpy"],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Check if the surrogate is reliable for predictions",
      "code": "quality = check_surrogate_quality(trial_data)\nif quality['r_squared'] > 0.9:\n    print('Surrogate is reliable')",
      "natural_language": [
        "check surrogate quality",
        "is surrogate reliable",
        "can I trust the surrogate model"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["sensitivity_analysis", "pareto_front_analyzer"],
    "typical_workflows": ["post_optimization_analysis", "decision_support"],
    "prerequisites": ["optimization_runner"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "experimental",
    "tested": false,
    "documentation_url": "docs/features/surrogate_quality_checker.md"
  }
}
```
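
The `r_squared` and `quality_verdict` outputs can be reproduced without sklearn for intuition. The verdict thresholds below are illustrative, not the checker's actual cutoffs:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def quality_verdict(r2, min_r_squared=0.9):
    """Map an R² value to the EXCELLENT|GOOD|POOR verdict (thresholds illustrative)."""
    if r2 >= 0.95:
        return "EXCELLENT"
    if r2 >= min_r_squared:
        return "GOOD"
    return "POOR"
```

A perfect surrogate (predictions equal to observations) gives R² of 1.0 and an EXCELLENT verdict; anything below `min_r_squared` is flagged POOR.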

---

## Implementation Plan

### Phase 2 Week 1: Foundation

#### Day 1-2: Create Initial Registry
- [ ] Create `optimization_engine/feature_registry.json`
- [ ] Document 15-20 existing features across all categories
- [ ] Add engineering features (stress_extractor, displacement_extractor)
- [ ] Add software features (hook_manager, optimization_runner, nx_solver)
- [ ] Add UI features (dashboard widgets)
#### Day 3-4: LLM Skill Setup
- [ ] Create `.claude/skills/atomizer.md`
- [ ] Define how the LLM should read and use feature_registry.json
- [ ] Add feature discovery examples
- [ ] Add feature composition examples
- [ ] Test the LLM's ability to navigate the registry
#### Day 5: Documentation
- [ ] Create `docs/features/` directory
- [ ] Write feature guides for key features
- [ ] Link registry entries to documentation
- [ ] Update DEVELOPMENT.md with registry usage
### Phase 2 Week 2: LLM Integration

#### Natural Language Parser
- [ ] Intent classification using registry metadata
- [ ] Entity extraction for design variables and objectives
- [ ] Feature selection based on the user request
- [ ] Workflow composition from features
### Future Phases: Feature Expansion

#### Phase 3: Code Generation
- [ ] Template features for common patterns
- [ ] Validation rules for generated code
- [ ] Auto-registration of new features

#### Phase 4-7: Continuous Evolution
- [ ] User-contributed features
- [ ] Pattern learning from usage
- [ ] Best practices extraction
- [ ] Self-documentation updates

---
## Benefits of This Architecture

### For Users
- **Natural language control**: "minimize stress" → the LLM selects stress_extractor
- **Intelligent suggestions**: The LLM proposes features based on context
- **No configuration files**: The LLM generates config from conversation

### For Developers
- **Clear structure**: Features organized by domain, lifecycle, and abstraction
- **Easy extension**: Add new features by following templates
- **Self-documenting**: The registry doubles as API documentation

### For the LLM
- **Comprehensive context**: All capabilities in one place
- **Composition guidance**: Knows how features combine
- **Natural language mapping**: Understands user intent
- **Pattern recognition**: Can generate new features from templates

---
## Next Steps

1. **Create the initial feature_registry.json** with 15-20 existing features
2. **Test LLM navigation** with the Claude skill
3. **Validate the registry structure** with real user requests
4. **Iterate on metadata** based on the LLM's needs
5. **Build out documentation** in docs/features/

---
**Maintained by**: Antoine Polvé (antoine@atomaste.com)
**Repository**: [GitHub - Atomizer](https://github.com/yourusername/Atomizer)