
Feature Registry Architecture

Comprehensive guide to Atomizer's LLM-instructed feature database system

Last Updated: 2025-01-16
Status: Phase 2 - Design Document


Table of Contents

  1. Vision and Goals
  2. Feature Categorization System
  3. Feature Registry Structure
  4. LLM Instruction Format
  5. Feature Documentation Strategy
  6. Dynamic Tool Building
  7. Examples
  8. Implementation Plan

Vision and Goals

Core Philosophy

Atomizer's feature registry is not just a catalog - it's an LLM instruction system that enables:

  1. Self-Documentation: Features describe themselves to the LLM
  2. Intelligent Composition: LLM can combine features into workflows
  3. Autonomous Proposals: LLM suggests new features based on user needs
  4. Structured Customization: Users customize the tool through natural language
  5. Continuous Evolution: Feature database grows as users add capabilities

Key Principles

  • Feature Types Are First-Class: Engineering, software, UI, and analysis features are equally important
  • Location-Aware: Features know where their code lives and how to use it
  • Metadata-Rich: Each feature has enough context for LLM to understand and use it
  • Composable: Features can be combined into higher-level workflows
  • Extensible: New feature types can be added without breaking the system

Feature Categorization System

Primary Feature Dimensions

Features are organized along three dimensions:

Dimension 1: Domain (WHAT it does)

  • Engineering: Physics-based operations (stress, thermal, modal, etc.)
  • Software: Core algorithms and infrastructure (optimization, hooks, path resolution)
  • UI: User-facing components (dashboard, reports, visualization)
  • Analysis: Post-processing and decision support (sensitivity, Pareto, surrogate quality)

Dimension 2: Lifecycle Stage (WHEN it runs)

  • Pre-Mesh: Before meshing (geometry operations)
  • Pre-Solve: Before FEA solve (parameter updates, logging)
  • Solve: During FEA execution (solver control)
  • Post-Solve: After solve, before extraction (file validation)
  • Post-Extraction: After result extraction (logging, analysis)
  • Post-Optimization: After optimization completes (reporting, visualization)

Dimension 3: Abstraction Level (HOW it's used)

  • Primitive: Low-level functions (extract_stress, update_expression)
  • Composite: Mid-level workflows (RSS_metric, weighted_objective)
  • Workflow: High-level operations (run_optimization, generate_report)

Feature Type Classification

┌─────────────────────────────────────────────────────────────┐
│                     FEATURE UNIVERSE                        │
└─────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
   ENGINEERING            SOFTWARE                UI
        │                     │                     │
    ┌───┴───┐           ┌────┴────┐          ┌─────┴─────┐
    │       │           │         │          │           │
Extractors  Metrics  Optimization Hooks  Dashboard  Reports
    │       │           │         │          │           │
  Stress   RSS        Optuna   Pre-Solve  Widgets    HTML
  Thermal  SCF         TPE     Post-Solve Controls   PDF
  Modal    FOS       Sampler  Post-Extract Charts   Markdown

Feature Registry Structure

JSON Schema

{
  "feature_registry": {
    "version": "0.2.0",
    "last_updated": "2025-01-16",
    "categories": {
      "engineering": { ... },
      "software": { ... },
      "ui": { ... },
      "analysis": { ... }
    }
  }
}
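
A consumer of this schema might load and enumerate it as follows; a minimal sketch, assuming each category bucket maps feature IDs to entries (the schema above elides the bucket contents as `{ ... }`):

```python
import json

# Minimal in-memory registry following the schema above; the category buckets
# are assumed to map feature_id -> feature entry (illustrative values).
REGISTRY_JSON = """
{
  "feature_registry": {
    "version": "0.2.0",
    "last_updated": "2025-01-16",
    "categories": {
      "engineering": {"stress_extractor": {"name": "Stress Extractor"}},
      "software": {},
      "ui": {},
      "analysis": {}
    }
  }
}
"""

def list_features(registry_text: str) -> dict:
    """Return {category: [feature_id, ...]} for discovery."""
    categories = json.loads(registry_text)["feature_registry"]["categories"]
    return {cat: sorted(bucket) for cat, bucket in categories.items()}

features_by_category = list_features(REGISTRY_JSON)
```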

Feature Entry Schema

Each feature has:

{
  "feature_id": "unique_identifier",
  "name": "Human-Readable Name",
  "description": "What this feature does (for LLM understanding)",
  "category": "engineering|software|ui|analysis",
  "subcategory": "extractors|metrics|optimization|hooks|...",
  "lifecycle_stage": "pre_solve|post_solve|post_extraction|...",
  "abstraction_level": "primitive|composite|workflow",
  "implementation": {
    "file_path": "relative/path/to/implementation.py",
    "function_name": "function_or_class_name",
    "entry_point": "how to invoke this feature"
  },
  "interface": {
    "inputs": [
      {
        "name": "parameter_name",
        "type": "str|int|float|dict|list",
        "required": true,
        "description": "What this parameter does",
        "units": "mm|MPa|Hz|none",
        "example": "example_value"
      }
    ],
    "outputs": [
      {
        "name": "output_name",
        "type": "float|dict|list",
        "description": "What this output represents",
        "units": "mm|MPa|Hz|none"
      }
    ]
  },
  "dependencies": {
    "features": ["feature_id_1", "feature_id_2"],
    "libraries": ["optuna", "pyNastran"],
    "nx_version": "2412"
  },
  "usage_examples": [
    {
      "description": "Example scenario",
      "code": "example_code_snippet",
      "natural_language": "How user would request this"
    }
  ],
  "composition_hints": {
    "combines_with": ["feature_id_3", "feature_id_4"],
    "typical_workflows": ["workflow_name_1"],
    "prerequisites": ["feature that must run before this"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "stable|experimental|deprecated",
    "tested": true,
    "documentation_url": "docs/features/feature_name.md"
  }
}
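
A lightweight validator could check the keys the LLM depends on before trusting an entry; a sketch, with the required-key set chosen illustratively from the schema above:

```python
# Keys the LLM relies on for discovery and invocation (illustrative subset).
REQUIRED_KEYS = {"feature_id", "name", "description", "category",
                 "lifecycle_stage", "abstraction_level", "implementation"}
VALID_CATEGORIES = {"engineering", "software", "ui", "analysis"}

def validate_feature(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry passes."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - entry.keys())]
    if entry.get("category") not in VALID_CATEGORIES:
        problems.append(f"unknown category: {entry.get('category')!r}")
    impl = entry.get("implementation", {})
    for k in ("file_path", "function_name", "entry_point"):
        if k not in impl:
            problems.append(f"implementation missing: {k}")
    return problems

entry = {
    "feature_id": "stress_extractor",
    "name": "Stress Extractor",
    "description": "Extracts von Mises stress from OP2 files",
    "category": "engineering",
    "lifecycle_stage": "post_extraction",
    "abstraction_level": "primitive",
    "implementation": {
        "file_path": "optimization_engine/result_extractors/extractors.py",
        "function_name": "extract_stress_from_op2",
        "entry_point": "from optimization_engine.result_extractors.extractors import extract_stress_from_op2",
    },
}
problems = validate_feature(entry)
```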

LLM Instruction Format

How LLM Uses the Registry

The feature registry serves as a structured instruction manual for the LLM:

1. Discovery Phase

User: "I want to minimize stress on my bracket"

LLM reads registry:
  → Finds category="engineering", subcategory="extractors"
  → Discovers "stress_extractor" feature
  → Reads: "Extracts von Mises stress from OP2 files"
  → Checks composition_hints: combines_with=["optimization_runner"]

LLM response: "I'll use the stress_extractor feature to minimize stress.
               This requires an OP2 file from NX solve."

2. Composition Phase

User: "Add a custom RSS metric combining stress and displacement"

LLM reads registry:
  → Finds abstraction_level="composite" examples
  → Discovers "rss_metric" template feature
  → Reads interface: inputs=[stress_value, displacement_value]
  → Checks composition_hints: combines_with=["stress_extractor", "displacement_extractor"]

LLM generates new composite feature following the pattern

3. Proposal Phase

User: "What features could help me analyze fatigue life?"

LLM reads registry:
  → Searches category="engineering", subcategory="extractors"
  → Finds: stress_extractor, displacement_extractor (exist)
  → Doesn't find: fatigue_extractor (missing)
  → Reads composition_hints for similar features

LLM proposes: "I can create a fatigue_life_extractor that:
               1. Extracts stress history from OP2
               2. Applies rainflow counting algorithm
               3. Uses S-N curve to estimate fatigue life

               This would be similar to stress_extractor but with
               time-series analysis. Should I implement it?"

4. Execution Phase

User: "Run the optimization"

LLM reads registry:
  → Finds abstraction_level="workflow", feature_id="run_optimization"
  → Reads implementation.entry_point
  → Checks dependencies: ["optuna", "nx_solver", "stress_extractor"]
  → Reads lifecycle_stage to understand execution order

LLM executes: python optimization_engine/runner.py

Natural Language Mapping

Each feature includes natural_language examples showing how users might request it:

"usage_examples": [
  {
    "natural_language": [
      "minimize stress",
      "reduce von Mises stress",
      "find lowest stress configuration",
      "optimize for minimum stress"
    ],
    "maps_to": {
      "feature": "stress_extractor",
      "objective": "minimize",
      "metric": "max_von_mises"
    }
  }
]

This enables the LLM to understand user intent and select the correct features.
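
A naive sketch of this mapping, using substring matching against the natural_language phrases (a real implementation would rely on the LLM itself rather than string matching):

```python
# Natural-language mapping table, mirroring the usage_examples entry above.
NL_MAP = [
    {
        "phrases": ["minimize stress", "reduce von mises stress",
                    "find lowest stress configuration", "optimize for minimum stress"],
        "maps_to": {"feature": "stress_extractor", "objective": "minimize",
                    "metric": "max_von_mises"},
    },
]

def match_intent(request: str):
    """Return the first mapping whose phrase appears in the request."""
    text = request.lower()
    for entry in NL_MAP:
        if any(phrase in text for phrase in entry["phrases"]):
            return entry["maps_to"]
    return None

intent = match_intent("Please minimize stress on my bracket")
```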


Feature Documentation Strategy

Multi-Location Documentation

Features are documented in three places, each serving different purposes:

1. Feature Registry (feature_registry.json)

Purpose: LLM instruction and discovery
Location: optimization_engine/feature_registry.json
Content:

  • Structured metadata
  • Interface definitions
  • Composition hints
  • Usage examples

Example:

{
  "feature_id": "stress_extractor",
  "name": "Stress Extractor",
  "description": "Extracts von Mises stress from OP2 files",
  "category": "engineering",
  "subcategory": "extractors"
}

2. Code Implementation (*.py files)

Purpose: Actual functionality
Location: Codebase (e.g., optimization_engine/result_extractors/extractors.py)
Content:

  • Python code with docstrings
  • Type hints
  • Implementation details

Example:

def extract_stress_from_op2(op2_file: Path) -> dict:
    """
    Extracts von Mises stress from OP2 file.

    Args:
        op2_file: Path to OP2 file

    Returns:
        dict with max_von_mises, min_von_mises, avg_von_mises
    """
    # Implementation...

3. Feature Documentation (docs/features/*.md)

Purpose: Human-readable guides and tutorials
Location: docs/features/
Content:

  • Detailed explanations
  • Extended examples
  • Best practices
  • Troubleshooting

Example: docs/features/stress_extractor.md

# Stress Extractor

## Overview
Extracts von Mises stress from NX Nastran OP2 files.

## When to Use
- Structural optimization where stress is the objective
- Constraint checking (yield stress limits)
- Multi-objective with stress as one objective

## Example Workflows
[detailed examples...]

Documentation Flow

User Request
     ↓
LLM reads feature_registry.json (discovers feature)
     ↓
LLM reads code docstrings (understands interface)
     ↓
LLM reads docs/features/*.md (if complex usage needed)
     ↓
LLM composes workflow using features
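
The "reads code docstrings" step of this flow can be mechanized with inspect; a sketch (the module-path derivation from file_path is an assumption, and textwrap.dedent stands in for a feature function):

```python
import importlib
import inspect

def feature_docstring(implementation: dict) -> str:
    """Fetch the docstring of a feature's implementation for the LLM to read."""
    module_path = implementation["file_path"].removesuffix(".py").replace("/", ".")
    fn = getattr(importlib.import_module(module_path), implementation["function_name"])
    return inspect.getdoc(fn) or ""

# Demo: a stdlib function stands in for a feature implementation.
doc = feature_docstring({"file_path": "textwrap.py", "function_name": "dedent"})
```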

Dynamic Tool Building

How LLM Builds New Features

The registry enables autonomous feature creation through templates and patterns:

Step 1: Pattern Recognition

User: "I need thermal stress extraction"

LLM:
1. Reads existing feature: stress_extractor
2. Identifies pattern: OP2 parsing → result extraction → return dict
3. Finds similar features: displacement_extractor
4. Recognizes template: engineering.extractors

Step 2: Feature Generation

LLM generates new feature following pattern:
{
  "feature_id": "thermal_stress_extractor",
  "name": "Thermal Stress Extractor",
  "description": "Extracts thermal stress from OP2 files (steady-state heat transfer analysis)",
  "category": "engineering",
  "subcategory": "extractors",
  "lifecycle_stage": "post_extraction",
  "abstraction_level": "primitive",
  "implementation": {
    "file_path": "optimization_engine/result_extractors/thermal_extractors.py",
    "function_name": "extract_thermal_stress_from_op2",
    "entry_point": "from optimization_engine.result_extractors.thermal_extractors import extract_thermal_stress_from_op2"
  },
  # ... rest of schema
}

Step 3: Code Generation

# LLM writes implementation following stress_extractor pattern
def extract_thermal_stress_from_op2(op2_file: Path) -> dict:
    """
    Extracts thermal stress from OP2 file.

    Args:
        op2_file: Path to OP2 file from thermal analysis

    Returns:
        dict with max_thermal_stress, temperature_at_max_stress
    """
    from pyNastran.op2.op2 import OP2

    op2 = OP2()
    op2.read_op2(op2_file)

    # Extract thermal stress (element type depends on analysis)
    thermal_stress = op2.thermal_stress_data

    return {
        'max_thermal_stress': thermal_stress.max(),
        'temperature_at_max_stress': None,  # ...
    }

Step 4: Registration

LLM adds to feature_registry.json
LLM creates docs/features/thermal_stress_extractor.md
LLM updates CHANGELOG.md with new feature
LLM runs tests to validate implementation
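
Step 4's first action, adding the entry to feature_registry.json, amounts to inserting it into the right category bucket; a sketch, assuming the buckets are keyed by feature_id:

```python
def register_feature(registry: dict, entry: dict) -> dict:
    """Insert a generated feature entry into its category bucket (in place)."""
    categories = registry["feature_registry"]["categories"]
    categories.setdefault(entry["category"], {})[entry["feature_id"]] = entry
    return registry

registry = {"feature_registry": {"version": "0.2.0", "categories": {"engineering": {}}}}
register_feature(registry, {
    "feature_id": "thermal_stress_extractor",
    "category": "engineering",
    "name": "Thermal Stress Extractor",
})
```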

Feature Composition Examples

Example 1: RSS Metric (Composite Feature)

User: "Create RSS metric combining stress and displacement"

LLM composes from primitives:
  stress_extractor + displacement_extractor → rss_metric

Generated feature:
{
  "feature_id": "rss_stress_displacement",
  "abstraction_level": "composite",
  "dependencies": {
    "features": ["stress_extractor", "displacement_extractor"]
  },
  "composition_hints": {
    "composed_from": ["stress_extractor", "displacement_extractor"],
    "composition_type": "root_sum_square"
  }
}
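
The root_sum_square composition named above reduces to a one-line formula; a sketch, where the scale parameters are hypothetical additions for normalizing mixed units:

```python
import math

def rss_metric(stress_value: float, displacement_value: float,
               stress_scale: float = 1.0, disp_scale: float = 1.0) -> float:
    """Root-sum-square of scaled stress and displacement (composition_type above).
    The scale parameters are hypothetical, for normalizing mixed units."""
    return math.sqrt((stress_value / stress_scale) ** 2
                     + (displacement_value / disp_scale) ** 2)

value = rss_metric(3.0, 4.0)  # → 5.0
```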

Example 2: Complete Workflow

User: "Run bracket optimization minimizing stress"

LLM composes workflow from features:
  1. study_manager (create study folder)
  2. nx_updater (update wall_thickness parameter)
  3. nx_solver (run FEA)
  4. stress_extractor (extract results)
  5. optimization_runner (Optuna TPE loop)
  6. report_generator (create HTML report)

Each step uses a feature from the registry, with its position in the
sequence derived from lifecycle_stage metadata.
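
The sequencing rule (order steps by lifecycle_stage) can be sketched directly:

```python
# Ordinal positions for the lifecycle stages defined earlier in this document.
STAGE_ORDER = ["pre_mesh", "pre_solve", "solve", "post_solve",
               "post_extraction", "post_optimization"]

def sequence_workflow(features: list) -> list:
    """Sort feature entries into execution order by lifecycle_stage."""
    return sorted(features, key=lambda f: STAGE_ORDER.index(f["lifecycle_stage"]))

steps = sequence_workflow([
    {"feature_id": "report_generator", "lifecycle_stage": "post_optimization"},
    {"feature_id": "nx_updater", "lifecycle_stage": "pre_solve"},
    {"feature_id": "stress_extractor", "lifecycle_stage": "post_extraction"},
])
order = [f["feature_id"] for f in steps]
```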

Examples

Example 1: Engineering Feature (Stress Extractor)

{
  "feature_id": "stress_extractor",
  "name": "Stress Extractor",
  "description": "Extracts von Mises stress from NX Nastran OP2 files",
  "category": "engineering",
  "subcategory": "extractors",
  "lifecycle_stage": "post_extraction",
  "abstraction_level": "primitive",
  "implementation": {
    "file_path": "optimization_engine/result_extractors/extractors.py",
    "function_name": "extract_stress_from_op2",
    "entry_point": "from optimization_engine.result_extractors.extractors import extract_stress_from_op2"
  },
  "interface": {
    "inputs": [
      {
        "name": "op2_file",
        "type": "Path",
        "required": true,
        "description": "Path to OP2 file from NX solve",
        "example": "bracket_sim1-solution_1.op2"
      }
    ],
    "outputs": [
      {
        "name": "max_von_mises",
        "type": "float",
        "description": "Maximum von Mises stress across all elements",
        "units": "MPa"
      },
      {
        "name": "element_id_at_max",
        "type": "int",
        "description": "Element ID where max stress occurs"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": ["pyNastran"],
    "nx_version": "2412"
  },
  "usage_examples": [
    {
      "description": "Minimize stress in bracket optimization",
      "code": "result = extract_stress_from_op2(Path('bracket.op2'))\nmax_stress = result['max_von_mises']",
      "natural_language": [
        "minimize stress",
        "reduce von Mises stress",
        "find lowest stress configuration"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["displacement_extractor", "mass_extractor"],
    "typical_workflows": ["structural_optimization", "stress_minimization"],
    "prerequisites": ["nx_solver"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-10",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/stress_extractor.md"
  }
}

Example 2: Software Feature (Hook Manager)

{
  "feature_id": "hook_manager",
  "name": "Hook Manager",
  "description": "Manages plugin lifecycle hooks for optimization workflow",
  "category": "software",
  "subcategory": "infrastructure",
  "lifecycle_stage": "all",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "optimization_engine/plugins/hook_manager.py",
    "function_name": "HookManager",
    "entry_point": "from optimization_engine.plugins.hook_manager import HookManager"
  },
  "interface": {
    "inputs": [
      {
        "name": "hook_type",
        "type": "str",
        "required": true,
        "description": "Lifecycle point: pre_solve, post_solve, post_extraction",
        "example": "pre_solve"
      },
      {
        "name": "context",
        "type": "dict",
        "required": true,
        "description": "Context data passed to hooks (trial_number, design_variables, etc.)"
      }
    ],
    "outputs": [
      {
        "name": "execution_history",
        "type": "list",
        "description": "List of hooks executed with timestamps and success status"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": [],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Execute pre-solve hooks before FEA",
      "code": "hook_manager.execute_hooks('pre_solve', context={'trial': 1})",
      "natural_language": [
        "run pre-solve plugins",
        "execute hooks before solving"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["detailed_logger", "optimization_logger"],
    "typical_workflows": ["optimization_runner"],
    "prerequisites": []
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/hook_manager.md"
  }
}

Example 3: UI Feature (Dashboard Widget)

{
  "feature_id": "optimization_progress_chart",
  "name": "Optimization Progress Chart",
  "description": "Real-time chart showing optimization convergence",
  "category": "ui",
  "subcategory": "dashboard_widgets",
  "lifecycle_stage": "post_optimization",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "dashboard/frontend/components/ProgressChart.js",
    "function_name": "OptimizationProgressChart",
    "entry_point": "new OptimizationProgressChart(containerId)"
  },
  "interface": {
    "inputs": [
      {
        "name": "trial_data",
        "type": "list[dict]",
        "required": true,
        "description": "List of trial results with objective values",
        "example": "[{trial: 1, value: 45.3}, {trial: 2, value: 42.1}]"
      }
    ],
    "outputs": [
      {
        "name": "chart_element",
        "type": "HTMLElement",
        "description": "Rendered chart DOM element"
      }
    ]
  },
  "dependencies": {
    "features": [],
    "libraries": ["Chart.js"],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Display optimization progress in dashboard",
      "code": "chart = new OptimizationProgressChart('chart-container')\nchart.update(trial_data)",
      "natural_language": [
        "show optimization progress",
        "display convergence chart",
        "visualize trial results"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["trial_history_table", "best_parameters_display"],
    "typical_workflows": ["dashboard_view", "result_monitoring"],
    "prerequisites": ["optimization_runner"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-10",
    "status": "stable",
    "tested": true,
    "documentation_url": "docs/features/dashboard_widgets.md"
  }
}

Example 4: Analysis Feature (Surrogate Quality Checker)

{
  "feature_id": "surrogate_quality_checker",
  "name": "Surrogate Quality Checker",
  "description": "Evaluates surrogate model quality using R², CV score, and confidence intervals",
  "category": "analysis",
  "subcategory": "decision_support",
  "lifecycle_stage": "post_optimization",
  "abstraction_level": "composite",
  "implementation": {
    "file_path": "optimization_engine/analysis/surrogate_quality.py",
    "function_name": "check_surrogate_quality",
    "entry_point": "from optimization_engine.analysis.surrogate_quality import check_surrogate_quality"
  },
  "interface": {
    "inputs": [
      {
        "name": "trial_data",
        "type": "list[dict]",
        "required": true,
        "description": "Trial history with design variables and objectives"
      },
      {
        "name": "min_r_squared",
        "type": "float",
        "required": false,
        "description": "Minimum acceptable R² threshold",
        "example": "0.9"
      }
    ],
    "outputs": [
      {
        "name": "r_squared",
        "type": "float",
        "description": "Coefficient of determination",
        "units": "none"
      },
      {
        "name": "cv_score",
        "type": "float",
        "description": "Cross-validation score",
        "units": "none"
      },
      {
        "name": "quality_verdict",
        "type": "str",
        "description": "EXCELLENT|GOOD|POOR based on metrics"
      }
    ]
  },
  "dependencies": {
    "features": ["optimization_runner"],
    "libraries": ["sklearn", "numpy"],
    "nx_version": null
  },
  "usage_examples": [
    {
      "description": "Check if surrogate is reliable for predictions",
      "code": "quality = check_surrogate_quality(trial_data)\nif quality['r_squared'] > 0.9:\n    print('Surrogate is reliable')",
      "natural_language": [
        "check surrogate quality",
        "is surrogate reliable",
        "can I trust the surrogate model"
      ]
    }
  ],
  "composition_hints": {
    "combines_with": ["sensitivity_analysis", "pareto_front_analyzer"],
    "typical_workflows": ["post_optimization_analysis", "decision_support"],
    "prerequisites": ["optimization_runner"]
  },
  "metadata": {
    "author": "Antoine Polvé",
    "created": "2025-01-16",
    "status": "experimental",
    "tested": false,
    "documentation_url": "docs/features/surrogate_quality_checker.md"
  }
}
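
The R² metric this checker reports can be computed directly from its definition; a pure-Python sketch, where the EXCELLENT/GOOD/POOR thresholds are illustrative assumptions:

```python
def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination, from its textbook definition."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def quality_verdict(r2: float, min_r_squared: float = 0.9) -> str:
    """Map R² onto the verdict strings; thresholds are illustrative assumptions."""
    if r2 >= 0.95:
        return "EXCELLENT"
    return "GOOD" if r2 >= min_r_squared else "POOR"

verdict = quality_verdict(r_squared([1, 2, 3, 4], [1.0, 2.1, 2.9, 4.0]))
```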

Implementation Plan

Phase 2 Week 1: Foundation

Day 1-2: Create Initial Registry

  • Create optimization_engine/feature_registry.json
  • Document 15-20 existing features across all categories
  • Add engineering features (stress_extractor, displacement_extractor)
  • Add software features (hook_manager, optimization_runner, nx_solver)
  • Add UI features (dashboard widgets)

Day 3-4: LLM Skill Setup

  • Create .claude/skills/atomizer.md
  • Define how LLM should read and use feature_registry.json
  • Add feature discovery examples
  • Add feature composition examples
  • Test LLM's ability to navigate registry

Day 5: Documentation

  • Create docs/features/ directory
  • Write feature guides for key features
  • Link registry entries to documentation
  • Update DEVELOPMENT.md with registry usage

Phase 2 Week 2: LLM Integration

Natural Language Parser

  • Intent classification using registry metadata
  • Entity extraction for design variables, objectives
  • Feature selection based on user request
  • Workflow composition from features

Future Phases: Feature Expansion

Phase 3: Code Generation

  • Template features for common patterns
  • Validation rules for generated code
  • Auto-registration of new features

Phase 4-7: Continuous Evolution

  • User-contributed features
  • Pattern learning from usage
  • Best practices extraction
  • Self-documentation updates

Benefits of This Architecture

For Users

  • Natural language control: "minimize stress" → LLM selects stress_extractor
  • Intelligent suggestions: LLM proposes features based on context
  • No configuration files: LLM generates config from conversation

For Developers

  • Clear structure: Features organized by domain, lifecycle, abstraction
  • Easy extension: Add new features following templates
  • Self-documenting: Registry serves as API documentation

For LLM

  • Comprehensive context: All capabilities in one place
  • Composition guidance: Knows how features combine
  • Natural language mapping: Understands user intent
  • Pattern recognition: Can generate new features from templates

Next Steps

  1. Create initial feature_registry.json with 15-20 existing features
  2. Test LLM navigation with Claude skill
  3. Validate registry structure with real user requests
  4. Iterate on metadata based on LLM's needs
  5. Build out documentation in docs/features/

Maintained by: Antoine Polvé (antoine@atomaste.com)
Repository: GitHub - Atomizer