refactor: Implement centralized extractor library to eliminate code duplication

MAJOR ARCHITECTURE REFACTOR - Clean Study Folders

Problem Identified by User:
"My study folder is a mess, why? I want some order and real structure to develop an insanely good engineering software that evolve with time."

- Every substudy was generating duplicate extractor code
- Study folders polluted with reusable library code (generated_extractors/, generated_hooks/)
- No code reuse across studies
- Not production-grade architecture

Solution - Centralized Library System:
Implemented smart library with signature-based deduplication:
- Core extractors in optimization_engine/extractors/
- Studies only store metadata (extractors_manifest.json)
- Clean separation: studies = data, core = code

Architecture:

BEFORE (BAD):
  studies/my_study/
    generated_extractors/        ❌ Code pollution!
      extract_displacement.py
      extract_von_mises_stress.py
    generated_hooks/             ❌ Code pollution!
    llm_workflow_config.json
    results.json

AFTER (GOOD):
  optimization_engine/extractors/   ✓ Core library
    extract_displacement.py
    extract_stress.py
    catalog.json
  studies/my_study/
    extractors_manifest.json     ✓ Just references!
    llm_workflow_config.json     ✓ Config
    optimization_results.json    ✓ Results

New Components:
1. ExtractorLibrary (extractor_library.py)
   - Signature-based deduplication
   - Centralized catalog (catalog.json)
   - Study manifest generation
   - Reusability across all studies
2. Updated ExtractorOrchestrator
   - Uses core library instead of per-study generation
   - Creates manifest instead of copying code
   - Backward compatible (legacy mode available)
3. Updated LLMOptimizationRunner
   - Removed generated_extractors/ directory creation
   - Removed generated_hooks/ directory creation
   - Uses core library exclusively
4. Updated Tests
   - Verifies extractors_manifest.json exists
   - Checks for clean study folder structure
   - All 18/18 checks pass

Results:
Study folders NOW ONLY contain:
✓ extractors_manifest.json - references to core library
✓ llm_workflow_config.json - study configuration
✓ optimization_results.json - optimization results
✓ optimization_history.json - trial history
✓ .db file - Optuna database

Core library contains:
✓ extract_displacement.py - reusable across ALL studies
✓ extract_von_mises_stress.py - reusable across ALL studies
✓ extract_mass.py - reusable across ALL studies
✓ catalog.json - tracks all extractors with signatures

Benefits:
- Clean, professional study folder structure
- Code reuse eliminates duplication
- Library grows over time, studies stay clean
- Production-grade architecture
- "Insanely good engineering software that evolves with time"

Testing:
E2E test passes with clean folder structure
- No generated_extractors/ pollution
- Manifest correctly references library
- Core library populated with reusable extractors
- Study folder professional and minimal

Documentation:
- Added comprehensive architecture doc (docs/ARCHITECTURE_REFACTOR_NOV17.md)
- Includes migration guide
- Documents future work (hooks library, versioning, CLI tools)

Next Steps:
- Apply same architecture to hooks library
- Add auto-generated documentation for library
- Implement versioning for reproducibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 09:00:10 -05:00
parent 2eb73c5d25
commit 0e73226a59
5 changed files with 577 additions and 42 deletions
--- a/docs/ARCHITECTURE_REFACTOR_NOV17.md
+++ b/docs/ARCHITECTURE_REFACTOR_NOV17.md
@@ -0,0 +1,284 @@
 # Architecture Refactor: Centralized Library System
 **Date**: November 17, 2025
 **Phase**: 3.2 Architecture Cleanup
 **Author**: Claude Code (with Antoine's direction)
 ## Problem Statement
 You identified a critical architectural flaw:
 > "ok, now, quick thing, why do very basic hooks get recreated and stored in the substudies? those should be just core accessed hooked right? is it only because its a test?
 >
 > What I need in studies is the config, files, setup, report, results etc not core hooks, those should go in atomizer hooks library with their doc etc no? I mean, applied only info = studies, and reusable and core functions = atomizer foundation.
 >
 > My study folder is a mess, why? I want some order and real structure to develop an insanely good engineering software that evolve with time."
 ### Old Architecture (BAD):
 ```
 studies/
  simple_beam_optimization/
    2_substudies/
      test_e2e_3trials_XXX/
        generated_extractors/       ❌ Code pollution!
          extract_displacement.py
          extract_von_mises_stress.py
          extract_mass.py
        generated_hooks/             ❌ Code pollution!
          custom_hook.py
        llm_workflow_config.json
        optimization_results.json
 ```
 **Problems**:
 - Every substudy duplicates extractor code
 - Study folders polluted with reusable code
 - No code reuse across studies
 - Cluttered layout that is not production-grade engineering software
 ### New Architecture (GOOD):
 ```
 optimization_engine/
  extractors/                ✓ Core reusable library
    extract_displacement.py
     extract_von_mises_stress.py
    extract_mass.py
    catalog.json             ✓ Tracks all extractors
  hooks/                     ✓ Core reusable library
    (future implementation)
 studies/
  simple_beam_optimization/
    2_substudies/
      my_optimization/
        extractors_manifest.json  ✓ Just references!
        llm_workflow_config.json  ✓ Study config
        optimization_results.json ✓ Results
        optimization_history.json ✓ History
 ```
 **Benefits**:
 - ✅ Clean study folders (config, results, and a manifest; no code)
 - ✅ Reusable core libraries
 - ✅ Deduplication (same extractor = single file)
 - ✅ Production-grade architecture
 - ✅ Evolves with time (library grows, studies stay clean)
 ## Implementation
 ### 1. Extractor Library Manager (`extractor_library.py`)
 New smart library system with:
 - **Signature-based deduplication**: Two requests with the same action, domain, and params resolve to a single library file
 - **Catalog tracking**: `catalog.json` tracks all library extractors
 - **Study manifests**: Studies just reference which extractors they used
 ```python
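 class ExtractorLibrary:
    def get_or_create(self, llm_feature, extractor_code):
        """Add to library or reuse existing."""
        signature = self._compute_signature(llm_feature)
        entry = self.catalog.get(signature)
        if entry and (self.library_dir / entry['filename']).exists():
            # Reuse existing!
            return self.library_dir / entry['filename']
        # Add new to library: write the code, then record it in the catalog
        filename = f"{llm_feature.get('action', 'unknown_action')}.py"
        extractor_file = self.library_dir / filename
        extractor_file.write_text(extractor_code)
        # The full entry also stores name, action, domain, description, and params
        self.catalog[signature] = {'filename': filename, 'signature': signature}
        self._save_catalog()
        return extractor_file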
 ```
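The signature itself can be illustrated in isolation. The sketch below mirrors `_compute_signature` from `extractor_library.py` as a standalone function; the three example features are made up for illustration. It shows that key order and the `description` field do not affect the signature, while a change in `params` does.

```python
import hashlib
import json
from typing import Any, Dict


def compute_signature(llm_feature: Dict[str, Any]) -> str:
    """Hash the action, domain, and params of a feature into a short ID."""
    signature_data = {
        "action": llm_feature.get("action", ""),
        "domain": llm_feature.get("domain", ""),
        "params": llm_feature.get("params", {}),
    }
    # sort_keys=True makes the JSON text independent of dict key order
    signature_str = json.dumps(signature_data, sort_keys=True)
    return hashlib.sha256(signature_str.encode()).hexdigest()[:16]


# Illustrative features (not taken from a real study)
feature_a = {
    "action": "extract_displacement",
    "domain": "result_extraction",
    "description": "Extract displacement from OP2 file",
    "params": {"result_type": "displacement", "metric": "max"},
}
# Same request with reordered keys and a different description
feature_b = {
    "domain": "result_extraction",
    "action": "extract_displacement",
    "description": "Max displacement",
    "params": {"metric": "max", "result_type": "displacement"},
}
# Same action, different metric
feature_c = {**feature_a, "params": {"result_type": "displacement", "metric": "min"}}

sig_a = compute_signature(feature_a)
sig_b = compute_signature(feature_b)
sig_c = compute_signature(feature_c)
print(sig_a == sig_b)  # True: description and key order are ignored
print(sig_a == sig_c)  # False: params differ
```

Because `description` is left out of the hash, rewording a feature does not create a second library entry.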
 ### 2. Updated Components
 **ExtractorOrchestrator** (`extractor_orchestrator.py`):
 - Now uses `ExtractorLibrary` instead of per-study generation
 - Creates `extractors_manifest.json` instead of copying code
 - Backward compatible (legacy mode available)
 **LLMOptimizationRunner** (`llm_optimization_runner.py`):
 - Removed per-study `generated_extractors/` directory creation
 - Removed per-study `generated_hooks/` directory creation
 - Uses core library exclusively
 **Test Suite** (`test_phase_3_2_e2e.py`):
 - Updated to check for `extractors_manifest.json` instead of `generated_extractors/`
 - Verifies clean study folder structure
 ## Results
 ### Before Refactor:
 ```
 test_e2e_3trials_XXX/
 ├── generated_extractors/          ❌ 3 Python files
 │   ├── extract_displacement.py
 │   ├── extract_von_mises_stress.py
 │   └── extract_mass.py
 ├── generated_hooks/                ❌ Hook files
 ├── llm_workflow_config.json
 └── optimization_results.json
 ```
 ### After Refactor:
 ```
 test_e2e_3trials_XXX/
 ├── extractors_manifest.json       ✅ Just references!
 ├── llm_workflow_config.json        ✅ Study config
 ├── optimization_results.json       ✅ Results
 └── optimization_history.json       ✅ History
 optimization_engine/extractors/     ✅ Core library
 ├── extract_displacement.py
 ├── extract_von_mises_stress.py
 ├── extract_mass.py
 └── catalog.json
 ```
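As a rough illustration of what ends up in the study folder, the sketch below writes a manifest with the same fields as `create_study_manifest` in `extractor_library.py`. The signature value is a placeholder, and `find_code_pollution` is a hypothetical helper (not part of the commit) that lists any `.py` files left in a study folder.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def create_study_manifest(extractors_used: List[str], output_dir: Path) -> Path:
    """Record which library extractors a study used (no code is copied)."""
    manifest = {
        "extractors_used": extractors_used,
        "extractor_library": "optimization_engine/extractors/",
        "note": "Extractors are stored in the core library, not in this study folder",
    }
    manifest_file = output_dir / "extractors_manifest.json"
    manifest_file.write_text(json.dumps(manifest, indent=2))
    return manifest_file


def find_code_pollution(study_dir: Path) -> List[str]:
    """Hypothetical check: relative paths of any .py files inside a study."""
    return sorted(str(p.relative_to(study_dir)) for p in study_dir.rglob("*.py"))


with tempfile.TemporaryDirectory() as tmp:
    study_dir = Path(tmp)
    # Placeholder signature for illustration only
    manifest_path = create_study_manifest(["0123456789abcdef"], study_dir)
    manifest = json.loads(manifest_path.read_text())
    pollution = find_code_pollution(study_dir)

print(manifest["extractors_used"])
print(pollution)
```

A check along these lines could complement the manifest-exists assertion in the E2E test, although the commit itself only verifies that the manifest file is present.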
 ## Testing
 E2E test now passes with clean folder structure:
 - ✅ `extractors_manifest.json` created
 - ✅ Core library populated with 3 extractors
 - ✅ NO `generated_extractors/` pollution
 - ✅ Study folder clean and professional
 Test output:
 ```
 Verifying outputs...
  [OK] Output directory created
  [OK] History file created
  [OK] Results file created
  [OK] Extractors manifest (references core library)
 Checks passed: 18/18
 [SUCCESS] END-TO-END TEST PASSED!
 ```
 ## Migration Guide
 ### For Future Studies:
 **What changed**:
 - Extractors are now in `optimization_engine/extractors/` (core library)
 - Study folders contain `extractors_manifest.json` in place of extractor code
 **No action required**:
 - System automatically uses new architecture
 - Backward compatible (legacy mode available with `use_core_library=False`)
 ### For Developers:
 **To add new extractors**:
 1. LLM generates extractor code
 2. `ExtractorLibrary.get_or_create()` checks whether the feature's signature is already in the catalog
 3. If new: adds to `optimization_engine/extractors/`
 4. If exists: reuses existing file
 5. Study gets manifest reference, not copy of code
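The steps above can be sketched end to end with a simplified stand-in for the library. `MiniExtractorLibrary` is an illustrative name, not the class from the commit, and it runs against a temporary directory so it does not touch `optimization_engine/extractors/`.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class MiniExtractorLibrary:
    """Simplified stand-in for ExtractorLibrary (illustration only)."""

    def __init__(self, library_dir: Path):
        self.library_dir = library_dir
        self.library_dir.mkdir(parents=True, exist_ok=True)
        self.catalog_file = library_dir / "catalog.json"
        self.catalog: Dict[str, Any] = (
            json.loads(self.catalog_file.read_text())
            if self.catalog_file.exists() else {}
        )

    def _compute_signature(self, feature: Dict[str, Any]) -> str:
        data = {
            "action": feature.get("action", ""),
            "domain": feature.get("domain", ""),
            "params": feature.get("params", {}),
        }
        text = json.dumps(data, sort_keys=True)
        return hashlib.sha256(text.encode()).hexdigest()[:16]

    def get_or_create(self, feature: Dict[str, Any], code: str) -> Path:
        signature = self._compute_signature(feature)  # step 2: look up
        entry = self.catalog.get(signature)
        if entry and (self.library_dir / entry["filename"]).exists():
            return self.library_dir / entry["filename"]  # step 4: reuse
        # Step 3: write the code into the library and record it
        filename = f"{feature.get('action', 'unknown_action')}.py"
        path = self.library_dir / filename
        path.write_text(code)
        self.catalog[signature] = {"name": feature.get("action"), "filename": filename}
        self.catalog_file.write_text(json.dumps(self.catalog, indent=2))
        return path


feature = {
    "action": "extract_displacement",
    "domain": "result_extraction",
    "params": {"result_type": "displacement", "metric": "max"},
}

with tempfile.TemporaryDirectory() as tmp:
    library = MiniExtractorLibrary(Path(tmp) / "extractors")
    first = library.get_or_create(feature, "def extract_displacement(op2_file):\n    pass\n")
    # Second request with the same signature: the new code is ignored
    second = library.get_or_create(feature, "# different code, same signature\n")
    reused = first == second
    stored_code = first.read_text()
    n_entries = len(library.catalog)

print(reused, n_entries)
```

One property worth noting: the filename is derived from the action alone, so two features that share an action but differ in `params` get different signatures yet map to the same file. The sketch keeps that behavior as-is rather than resolving it.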
 **To view library**:
 ```python
 from optimization_engine.extractor_library import ExtractorLibrary
 library = ExtractorLibrary()
 print(library.get_library_summary())
 ```
 ## Next Steps (Future Work)
 1. **Hook Library System**: Implement same architecture for hooks
   - Currently: Per-study hook generation is disabled; requested custom hooks are only logged
   - Future: `optimization_engine/hooks/` library like extractors
 2. **Library Documentation**: Auto-generate docs for each extractor
   - Extract docstrings from library extractors
   - Create browsable documentation
 3. **Versioning**: Track extractor versions for reproducibility
   - Tag extractors with creation date/version
   - Allow studies to pin specific versions
 4. **CLI Tool**: View and manage library
   - `python -m optimization_engine.extractors list`
   - `python -m optimization_engine.extractors info <signature>`
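The proposed CLI does not exist yet. As one possible shape for it, the sketch below implements `list` and `info <signature>` over a `catalog.json` file using `argparse`. The function name `run_cli`, the output format, and the sample catalog entry are all assumptions for illustration.

```python
import argparse
import json
import tempfile
from pathlib import Path
from typing import List


def run_cli(argv: List[str], catalog_file: Path) -> str:
    """Hypothetical `list` / `info <signature>` commands over catalog.json."""
    parser = argparse.ArgumentParser(prog="optimization_engine.extractors")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list", help="List all extractors in the library")
    info = sub.add_parser("info", help="Show metadata for one extractor")
    info.add_argument("signature")
    args = parser.parse_args(argv)

    catalog = json.loads(catalog_file.read_text()) if catalog_file.exists() else {}
    if args.command == "list":
        return "\n".join(f"{sig}  {entry['name']}" for sig, entry in catalog.items())
    entry = catalog.get(args.signature)
    if entry is None:
        return f"No extractor with signature {args.signature}"
    return json.dumps(entry, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    catalog_file = Path(tmp) / "catalog.json"
    # Sample catalog with a placeholder signature
    catalog_file.write_text(json.dumps({
        "0123456789abcdef": {
            "name": "extract_displacement",
            "filename": "extract_displacement.py",
        },
    }))
    listing = run_cli(["list"], catalog_file)
    missing = run_cli(["info", "ffffffffffffffff"], catalog_file)

print(listing)
print(missing)
```

Returning strings instead of printing inside `run_cli` keeps the commands easy to test; a real entry point would presumably wrap this in a `__main__.py`.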
 ## Files Modified
 1. **New Files**:
   - `optimization_engine/extractor_library.py` - Core library manager
   - `optimization_engine/extractors/__init__.py` - Package init
   - `optimization_engine/extractors/catalog.json` - Library catalog
   - `docs/ARCHITECTURE_REFACTOR_NOV17.md` - This document
 2. **Modified Files**:
   - `optimization_engine/extractor_orchestrator.py` - Use library instead of per-study
   - `optimization_engine/llm_optimization_runner.py` - Remove per-study directories
   - `tests/test_phase_3_2_e2e.py` - Check for manifest instead of directories
 ## Commit Message
 ```
 refactor: Implement centralized extractor library to eliminate code duplication
 MAJOR ARCHITECTURE REFACTOR - Clean Study Folders
 Problem:
 - Every substudy was generating duplicate extractor code
 - Study folders polluted with reusable library code
 - No code reuse across studies
 - Not production-grade architecture
 Solution:
 Implemented centralized library system:
 - Core extractors in optimization_engine/extractors/
 - Signature-based deduplication
 - Studies only store metadata (extractors_manifest.json)
 - Clean separation: studies = data, core = code
 Changes:
 1. Created ExtractorLibrary with smart deduplication
 2. Updated ExtractorOrchestrator to use core library
 3. Updated LLMOptimizationRunner to stop creating per-study directories
 4. Updated tests to verify clean study folder structure
 Results:
 BEFORE: study folder with generated_extractors/ directory (code pollution)
 AFTER: study folder with extractors_manifest.json (just references)
 Core library: optimization_engine/extractors/
 - extract_displacement.py
 - extract_von_mises_stress.py
 - extract_mass.py
 - catalog.json (tracks all extractors)
 Study folders NOW ONLY contain:
 - extractors_manifest.json (references to core library)
 - llm_workflow_config.json (study configuration)
 - optimization_results.json (results)
 - optimization_history.json (trial history)
 Production-grade architecture for "insanely good engineering software that evolves with time"
 🤖 Generated with [Claude Code](https://claude.com/claude-code)
 Co-Authored-By: Claude <noreply@anthropic.com>
 ```
 ## Summary for Morning
 **What was done**:
 1. ✅ Created centralized extractor library system
 2. ✅ Eliminated per-study code duplication
 3. ✅ Clean study folder architecture
 4. ✅ E2E tests pass with new structure
 5. ✅ Comprehensive documentation
 **What you'll see**:
 - Studies now only contain metadata (no code!)
 - Core library in `optimization_engine/extractors/`
 - Professional, production-grade architecture
 **Ready for**:
 - Continue Phase 3.2 development
 - Same approach for hooks library (next iteration)
 - Building "insanely good engineering software"
 Have a good night! ✨
--- a/optimization_engine/extractor_library.py
+++ b/optimization_engine/extractor_library.py
@@ -0,0 +1,233 @@
 """
 Extractor Library Manager - Phase 3.2 Architecture Refactor
 Manages a centralized library of reusable extractors to prevent code duplication
 and keep study folders clean.
 Architecture Principles:
 1. Reusable extractors stored in optimization_engine/extractors/
 2. Study folders only contain metadata (which extractors were used)
 3. First-time generation adds to library with documentation
 4. Subsequent requests reuse existing library code
 Author: Antoine Letarte
 Date: 2025-11-17
 Phase: 3.2 Architecture Refactor
 """
 import json
 import hashlib
 from pathlib import Path
 from typing import Dict, Any, List, Optional
 import logging
 logger = logging.getLogger(__name__)
 class ExtractorLibrary:
    """
    Centralized library of reusable FEA result extractors.
    Prevents code duplication by maintaining a core library of extractors
    that can be reused across all optimization studies.
    """
    def __init__(self, library_dir: Optional[Path] = None):
        """
        Initialize extractor library.
        Args:
            library_dir: Directory for core extractor library
                        (default: optimization_engine/extractors/)
        """
        if library_dir is None:
            library_dir = Path(__file__).parent / "extractors"
        self.library_dir = Path(library_dir)
        self.library_dir.mkdir(parents=True, exist_ok=True)
        # Create __init__.py for Python package
        init_file = self.library_dir / "__init__.py"
        if not init_file.exists():
            init_file.write_text('"""Core extractor library for Atomizer."""\n')
        # Library catalog - tracks all available extractors
        self.catalog_file = self.library_dir / "catalog.json"
        self.catalog = self._load_catalog()
        logger.info(f"Extractor library initialized: {self.library_dir}")
        logger.info(f"Library contains {len(self.catalog)} extractors")
    def _load_catalog(self) -> Dict[str, Any]:
        """Load extractor catalog from disk."""
        if self.catalog_file.exists():
            with open(self.catalog_file) as f:
                return json.load(f)
        return {}
    def _save_catalog(self):
        """Save extractor catalog to disk."""
        with open(self.catalog_file, 'w') as f:
            json.dump(self.catalog, f, indent=2)
    def _compute_signature(self, llm_feature: Dict[str, Any]) -> str:
        """
        Compute unique signature for an extractor based on its functionality.
        Two extractors are considered identical if they have the same:
        - Action (e.g., extract_displacement)
        - Domain (e.g., result_extraction)
        - Key parameters (e.g., result_type, metric)
        """
        # Normalize the feature specification
        signature_data = {
            'action': llm_feature.get('action', ''),
            'domain': llm_feature.get('domain', ''),
            'params': llm_feature.get('params', {})
        }
        # Create deterministic hash
        signature_str = json.dumps(signature_data, sort_keys=True)
        return hashlib.sha256(signature_str.encode()).hexdigest()[:16]
    def get_or_create(self, llm_feature: Dict[str, Any], extractor_code: str) -> Path:
        """
        Get existing extractor from library or add new one.
        Args:
            llm_feature: LLM feature specification (action, domain, params)
            extractor_code: Generated Python code for the extractor
        Returns:
            Path to extractor module in core library
        """
        # Compute signature to check if extractor already exists
        signature = self._compute_signature(llm_feature)
        # Check if extractor already exists in library
        if signature in self.catalog:
            extractor_info = self.catalog[signature]
            extractor_file = self.library_dir / extractor_info['filename']
            if extractor_file.exists():
                logger.info(f"Reusing existing extractor: {extractor_info['name']}")
                return extractor_file
        # Create new extractor in library
        action = llm_feature.get('action', 'unknown_action')
        filename = f"{action}.py"
        extractor_file = self.library_dir / filename
        # Write extractor code to library
        extractor_file.write_text(extractor_code)
        # Add to catalog
        self.catalog[signature] = {
            'name': action,
            'filename': filename,
            'action': llm_feature.get('action'),
            'domain': llm_feature.get('domain'),
            'description': llm_feature.get('description', ''),
            'params': llm_feature.get('params', {}),
            'signature': signature
        }
        self._save_catalog()
        logger.info(f"Added new extractor to library: {action}")
        return extractor_file
    def get_extractor_metadata(self, signature: str) -> Optional[Dict[str, Any]]:
        """Get metadata for an extractor by its signature."""
        return self.catalog.get(signature)
    def list_extractors(self) -> List[Dict[str, Any]]:
        """List all extractors in the library."""
        return list(self.catalog.values())
    def get_library_summary(self) -> str:
        """Generate human-readable summary of library contents."""
        lines = []
        lines.append("=" * 80)
        lines.append("ATOMIZER EXTRACTOR LIBRARY")
        lines.append("=" * 80)
        lines.append("")
        lines.append(f"Location: {self.library_dir}")
        lines.append(f"Total extractors: {len(self.catalog)}")
        lines.append("")
        if self.catalog:
            lines.append("Available Extractors:")
            lines.append("-" * 80)
            for signature, info in self.catalog.items():
                lines.append(f"\n{info['name']}")
                lines.append(f"  Domain: {info['domain']}")
                lines.append(f"  Description: {info['description']}")
                lines.append(f"  File: {info['filename']}")
                lines.append(f"  Signature: {signature}")
        else:
            lines.append("Library is empty. Extractors will be added on first use.")
        lines.append("")
        lines.append("=" * 80)
        return "\n".join(lines)
 def create_study_manifest(extractors_used: List[str], output_dir: Path):
    """
    Create a manifest file documenting which extractors were used in a study.
    This replaces the old approach of copying extractor code into study folders.
    Now we just record which library extractors were used.
    Args:
        extractors_used: List of extractor signatures used in this study
        output_dir: Study output directory
    """
    manifest = {
        'extractors_used': extractors_used,
        'extractor_library': 'optimization_engine/extractors/',
        'note': 'Extractors are stored in the core library, not in this study folder'
    }
    manifest_file = output_dir / "extractors_manifest.json"
    with open(manifest_file, 'w') as f:
        json.dump(manifest, f, indent=2)
    logger.info(f"Study manifest created: {manifest_file}")
 if __name__ == '__main__':
    # Smoke test for the extractor library system.
    # Initialize library
    library = ExtractorLibrary()
    # Print summary
    print(library.get_library_summary())
    # Test adding an extractor
    test_feature = {
        'action': 'extract_displacement',
        'domain': 'result_extraction',
        'description': 'Extract displacement from OP2 file',
        'params': {'result_type': 'displacement', 'metric': 'max'}
    }
    test_code = '''"""Extract displacement from OP2 file."""
 def extract_displacement(op2_file):
    # Implementation here
    pass
 '''
    extractor_path = library.get_or_create(test_feature, test_code)
    print(f"\nExtractor created/retrieved: {extractor_path}")
    # Try to add it again - should reuse existing
    extractor_path2 = library.get_or_create(test_feature, test_code)
    print(f"Second call (should reuse): {extractor_path2}")
    # Verify they're the same
    assert extractor_path == extractor_path2, "Should reuse existing extractor!"
    print("\n[SUCCESS] Extractor deduplication working correctly!")
--- a/optimization_engine/extractor_orchestrator.py
+++ b/optimization_engine/extractor_orchestrator.py
@@ -22,6 +22,7 @@ import logging
 from dataclasses import dataclass
 from optimization_engine.pynastran_research_agent import PyNastranResearchAgent, ExtractionPattern
 from optimization_engine.extractor_library import ExtractorLibrary, create_study_manifest
 logger = logging.getLogger(__name__)
@@ -46,14 +47,18 @@ class ExtractorOrchestrator:
    def __init__(self,
                 extractors_dir: Optional[Path] = None,
-                 knowledge_base_path: Optional[Path] = None):
+                 knowledge_base_path: Optional[Path] = None,
                 use_core_library: bool = True):
        """
        Initialize the orchestrator.
        Args:
-            extractors_dir: Directory to save generated extractors
+            extractors_dir: Directory to save study manifest (not extractor code!)
            knowledge_base_path: Path to pyNastran pattern knowledge base
            use_core_library: Use centralized library (True) or per-study generation (False, legacy)
        """
        self.use_core_library = use_core_library
        if extractors_dir is None:
            extractors_dir = Path(__file__).parent / "result_extractors" / "generated"
@@ -63,10 +68,19 @@ class ExtractorOrchestrator:
        # Initialize Phase 3 research agent
        self.research_agent = PyNastranResearchAgent(knowledge_base_path)
        # Initialize centralized library (NEW ARCHITECTURE)
        if use_core_library:
            self.library = ExtractorLibrary()
            logger.info(f"Using centralized extractor library: {self.library.library_dir}")
        else:
            self.library = None
            logger.warning("Using legacy per-study extractor generation (not recommended)")
        # Registry of generated extractors for this session
        self.extractors: Dict[str, GeneratedExtractor] = {}
        self.extractor_signatures: List[str] = []  # Track which library extractors were used
-        logger.info(f"ExtractorOrchestrator initialized with extractors_dir: {self.extractors_dir}")
+        logger.info("ExtractorOrchestrator initialized")
    def process_llm_workflow(self, llm_output: Dict[str, Any]) -> List[GeneratedExtractor]:
        """
@@ -114,6 +128,11 @@ class ExtractorOrchestrator:
                    logger.error(f"Failed to generate extractor for {feature.get('action')}: {e}")
                    # Continue with other features
        # NEW ARCHITECTURE: Create study manifest (not copy code)
        if self.use_core_library and self.library and self.extractor_signatures:
            create_study_manifest(self.extractor_signatures, self.extractors_dir)
            logger.info("Study manifest created - extractors referenced from core library")
        logger.info(f"Generated {len(generated_extractors)} extractors")
        return generated_extractors
@@ -147,12 +166,22 @@ class ExtractorOrchestrator:
        logger.info(f"Generating extractor code using pattern: {pattern.name}")
        extractor_code = self.research_agent.generate_extractor_code(research_request)
-        # Create filename from action
+        # NEW ARCHITECTURE: Use centralized library
        if self.use_core_library and self.library:
            # Add to/retrieve from core library (deduplication happens here)
            file_path = self.library.get_or_create(feature, extractor_code)
            # Track signature for study manifest
            signature = self.library._compute_signature(feature)
            self.extractor_signatures.append(signature)
            logger.info(f"Extractor available in core library: {file_path}")
        else:
            # LEGACY: Save to per-study directory
            filename = self._action_to_filename(action)
            file_path = self.extractors_dir / filename
-        # Save extractor to file
-        logger.info(f"Saving extractor to: {file_path}")
+            logger.info(f"Saving extractor to study directory (legacy): {file_path}")
            with open(file_path, 'w') as f:
                f.write(extractor_code)
--- a/optimization_engine/llm_optimization_runner.py
+++ b/optimization_engine/llm_optimization_runner.py
@@ -96,15 +96,17 @@ class LLMOptimizationRunner:
        """Initialize all automation components from LLM workflow."""
        logger.info("Initializing automation components...")
-        # Phase 3.1: Extractor Orchestrator
+        # Phase 3.1: Extractor Orchestrator (NEW ARCHITECTURE)
        logger.info("  - Phase 3.1: Extractor Orchestrator")
        # NEW: Pass output_dir only for manifest, extractors go to core library
        self.orchestrator = ExtractorOrchestrator(
-            extractors_dir=self.output_dir / "generated_extractors"
+            extractors_dir=self.output_dir,  # Only for manifest file
            use_core_library=True  # Enable centralized library
        )
-        # Generate extractors from LLM workflow
+        # Generate extractors from LLM workflow (stored in core library now)
        self.extractors = self.orchestrator.process_llm_workflow(self.llm_workflow)
-        logger.info(f"    Generated {len(self.extractors)} extractor(s)")
+        logger.info(f"    {len(self.extractors)} extractor(s) available from core library")
        # Phase 2.8: Inline Code Generator
        logger.info("  - Phase 2.8: Inline Code Generator")
@@ -117,43 +119,30 @@ class LLMOptimizationRunner:
        logger.info(f"    Generated {len(self.inline_code)} inline calculation(s)")
-        # Phase 2.9: Hook Generator
+        # Phase 2.9: Hook Generator (TODO: Should also use centralized library in future)
        logger.info("  - Phase 2.9: Hook Generator")
        self.hook_generator = HookGenerator()
-        # Generate lifecycle hooks from post_processing_hooks
-        hook_dir = self.output_dir / "generated_hooks"
-        hook_dir.mkdir(exist_ok=True)
-        for hook_spec in self.llm_workflow.get('post_processing_hooks', []):
-            hook_content = self.hook_generator.generate_lifecycle_hook(
-                hook_spec,
-                hook_point='post_calculation'
-            )
-            # Save hook
-            hook_name = hook_spec.get('action', 'custom_hook')
-            hook_file = hook_dir / f"{hook_name}.py"
-            with open(hook_file, 'w') as f:
-                f.write(hook_content)
-            logger.info(f"    Generated hook: {hook_name}")
+        # For now, hooks are not generated per-study unless they're truly custom
+        # Most hooks should be in the core library (optimization_engine/hooks/)
+        post_processing_hooks = self.llm_workflow.get('post_processing_hooks', [])
+        if post_processing_hooks:
+            logger.info(f"    Note: {len(post_processing_hooks)} custom hooks requested")
+            logger.info("    Future: These should also use centralized library")
+            # TODO: Implement hook library system similar to extractors
        # Phase 1: Hook Manager
        logger.info("  - Phase 1: Hook Manager")
        self.hook_manager = HookManager()
-        # Load generated hooks
-        if hook_dir.exists():
-            self.hook_manager.load_plugins_from_directory(hook_dir)
+        # Load system hooks from core library
        # Load system hooks
        system_hooks_dir = Path(__file__).parent / 'plugins'
        if system_hooks_dir.exists():
            self.hook_manager.load_plugins_from_directory(system_hooks_dir)
        summary = self.hook_manager.get_summary()
-        logger.info(f"    Loaded {summary['enabled_hooks']} hook(s)")
+        logger.info(f"    Loaded {summary['enabled_hooks']} hook(s) from core library")
        logger.info("Automation components initialized successfully!")
--- a/tests/test_phase_3_2_e2e.py
+++ b/tests/test_phase_3_2_e2e.py
@@ -186,13 +186,13 @@ def test_e2e_llm_mode_with_api_key():
        print(f"  [FAIL] Results file not found: {results_file}")
        checks.append(False)
-    # 4. Generated extractors directory
+    # 4. Extractors manifest (NEW ARCHITECTURE - references core library)
-    extractors_dir = output_dir / "generated_extractors"
+    manifest_file = output_dir / "extractors_manifest.json"
-    if extractors_dir.exists():
+    if manifest_file.exists():
-        print(f"  [OK] Generated extractors directory: {extractors_dir.name}")
+        print(f"  [OK] Extractors manifest: {manifest_file.name} (references core library)")
        checks.append(True)
    else:
-        print(f"  [FAIL] Generated extractors not found: {extractors_dir}")
+        print(f"  [FAIL] Extractors manifest not found: {manifest_file}")
        checks.append(False)
    # 5. Audit trail (if implemented)