Atomizer/tests/test_phase_3_2_e2e.py
Anto01 0e73226a59 refactor: Implement centralized extractor library to eliminate code duplication
MAJOR ARCHITECTURE REFACTOR - Clean Study Folders

Problem Identified by User:
"My study folder is a mess, why? I want some order and real structure to develop
an insanely good engineering software that evolves with time."

- Every substudy was generating duplicate extractor code
- Study folders polluted with reusable library code (generated_extractors/, generated_hooks/)
- No code reuse across studies
- Not production-grade architecture

Solution - Centralized Library System:
Implemented a central extractor library with signature-based deduplication:
- Core extractors in optimization_engine/extractors/
- Studies only store metadata (extractors_manifest.json)
- Clean separation: studies = data, core = code
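
A minimal sketch of how signature-based deduplication could work, not the
actual ExtractorLibrary code. The names extractor_signature and
ExtractorCatalog, the spec fields, and the catalog.json layout are all
assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def extractor_signature(spec: dict) -> str:
    """Hash a canonical JSON form of the spec (hypothetical scheme).

    Sorting keys makes the signature independent of dict ordering.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class ExtractorCatalog:
    """Illustrative stand-in for a catalog.json-backed core library."""

    def __init__(self, library_dir: Path):
        self.library_dir = library_dir
        self.catalog_path = library_dir / "catalog.json"
        self.library_dir.mkdir(parents=True, exist_ok=True)
        if self.catalog_path.exists():
            self.entries = json.loads(self.catalog_path.read_text())
        else:
            self.entries = {}

    def get_or_add(self, spec: dict, name: str, code: str) -> str:
        """Store the extractor once; later requests with the same spec reuse it."""
        sig = extractor_signature(spec)
        if sig not in self.entries:
            (self.library_dir / f"{name}.py").write_text(code)
            self.entries[sig] = {"name": name, "file": f"{name}.py"}
            self.catalog_path.write_text(json.dumps(self.entries, indent=2))
        return sig
```

Keying the catalog on a hash of the request, rather than on the file name,
means two studies that ask for the same extraction resolve to the same
library file even if the request fields arrive in a different order.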

Architecture:

BEFORE (BAD):
  studies/my_study/
    generated_extractors/            Code pollution!
      extract_displacement.py
      extract_von_mises_stress.py
    generated_hooks/                 Code pollution!
    llm_workflow_config.json
    results.json

AFTER (GOOD):
  optimization_engine/extractors/   ✓ Core library
    extract_displacement.py
    extract_von_mises_stress.py
    catalog.json

  studies/my_study/
    extractors_manifest.json        ✓ Just references!
    llm_workflow_config.json        ✓ Config
    optimization_results.json       ✓ Results
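
To illustrate the "just references" idea, here is a sketch of writing and
resolving a study manifest. The manifest schema (a single "extractors" list
of signatures), the catalog entry shape, and the function names
write_manifest and resolve_manifest are hypothetical; the real
extractors_manifest.json format may differ.

```python
import json
from pathlib import Path

MANIFEST_NAME = "extractors_manifest.json"


def write_manifest(study_dir: Path, signatures: list[str]) -> Path:
    """Record which library extractors a study uses; no code is copied."""
    study_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = study_dir / MANIFEST_NAME
    payload = {"extractors": sorted(set(signatures))}
    manifest_path.write_text(json.dumps(payload, indent=2))
    return manifest_path


def resolve_manifest(study_dir: Path, library_dir: Path) -> list[Path]:
    """Map each signature in the manifest to its file in the core library.

    Raises KeyError if the manifest names a signature the catalog lacks.
    """
    manifest = json.loads((study_dir / MANIFEST_NAME).read_text())
    catalog = json.loads((library_dir / "catalog.json").read_text())
    return [library_dir / catalog[sig]["file"] for sig in manifest["extractors"]]
```

With this shape, the study folder holds only a small JSON file, and the
code it depends on is looked up in optimization_engine/extractors/ at run
time.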

New Components:

1. ExtractorLibrary (extractor_library.py)
   - Signature-based deduplication
   - Centralized catalog (catalog.json)
   - Study manifest generation
   - Reusability across all studies

2. Updated ExtractorOrchestrator
   - Uses core library instead of per-study generation
   - Creates manifest instead of copying code
   - Backward compatible (legacy mode available)

3. Updated LLMOptimizationRunner
   - Removed generated_extractors/ directory creation
   - Removed generated_hooks/ directory creation
   - Uses core library exclusively

4. Updated Tests
   - Verifies extractors_manifest.json exists
   - Checks for clean study folder structure
   - All 18 checks pass
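
The folder-structure checks could look roughly like the sketch below. It is
not the code in test_phase_3_2_e2e.py; the function name
check_clean_study_folder and the exact rule set are assumptions based on the
directory and file names listed in this message.

```python
from pathlib import Path

# Directories that should no longer appear inside a study folder.
FORBIDDEN_DIRS = {"generated_extractors", "generated_hooks"}
# Files a study is expected to contain after the refactor.
REQUIRED_FILES = {"extractors_manifest.json"}


def check_clean_study_folder(study_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the folder is clean."""
    problems = []
    for name in sorted(FORBIDDEN_DIRS):
        if (study_dir / name).exists():
            problems.append(f"unexpected directory: {name}/")
    for name in sorted(REQUIRED_FILES):
        if not (study_dir / name).is_file():
            problems.append(f"missing file: {name}")
    # Top-level Python files would mean code leaked back into the study.
    for path in sorted(study_dir.glob("*.py")):
        problems.append(f"stray code file: {path.name}")
    return problems
```

Returning the list of problems instead of a bare boolean lets a test report
every violation at once, for example with
assert check_clean_study_folder(study_dir) == [].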

Results:

Study folders NOW ONLY contain:
✓ extractors_manifest.json - references to core library
✓ llm_workflow_config.json - study configuration
✓ optimization_results.json - optimization results
✓ optimization_history.json - trial history
✓ .db file - Optuna database

Core library contains:
✓ extract_displacement.py - reusable across ALL studies
✓ extract_von_mises_stress.py - reusable across ALL studies
✓ extract_mass.py - reusable across ALL studies
✓ catalog.json - tracks all extractors with signatures

Benefits:
- Clean, professional study folder structure
- Code reuse eliminates duplication
- Library grows over time, studies stay clean
- Production-grade architecture
- "Insanely good engineering software that evolves with time"

Testing:
E2E test passes with clean folder structure:
- No generated_extractors/ pollution
- Manifest correctly references library
- Core library populated with reusable extractors
- Study folder professional and minimal

Documentation:
- Added comprehensive architecture doc (docs/ARCHITECTURE_REFACTOR_NOV17.md)
- Includes migration guide
- Documents future work (hooks library, versioning, CLI tools)

Next Steps:
- Apply same architecture to hooks library
- Add auto-generated documentation for library
- Implement versioning for reproducibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 09:00:10 -05:00
