MAJOR ARCHITECTURE REFACTOR - Clean Study Folders
Problem Identified by User:
"My study folder is a mess, why? I want some order and real structure to develop
an insanely good engineering software that evolve with time."
- Every substudy was generating duplicate extractor code
- Study folders polluted with reusable library code (generated_extractors/, generated_hooks/)
- No code reuse across studies
- Not production-grade architecture
Solution - Centralized Library System:
Implemented smart library with signature-based deduplication:
- Core extractors in optimization_engine/extractors/
- Studies only store metadata (extractors_manifest.json)
- Clean separation: studies = data, core = code
Architecture:
BEFORE (BAD):
  studies/my_study/
    generated_extractors/            ❌ Code pollution!
      extract_displacement.py
      extract_von_mises_stress.py
    generated_hooks/                 ❌ Code pollution!
    llm_workflow_config.json
    results.json
AFTER (GOOD):
  optimization_engine/extractors/    ✓ Core library
    extract_displacement.py
    extract_stress.py
    catalog.json
  studies/my_study/
    extractors_manifest.json         ✓ Just references!
    llm_workflow_config.json         ✓ Config
    optimization_results.json        ✓ Results
New Components:
1. ExtractorLibrary (extractor_library.py)
- Signature-based deduplication
- Centralized catalog (catalog.json)
- Study manifest generation
- Reusability across all studies
2. Updated ExtractorOrchestrator
- Uses core library instead of per-study generation
- Creates manifest instead of copying code
- Backward compatible (legacy mode available)
3. Updated LLMOptimizationRunner
- Removed generated_extractors/ directory creation
- Removed generated_hooks/ directory creation
- Uses core library exclusively
4. Updated Tests
- Verifies extractors_manifest.json exists
- Checks for clean study folder structure
- All 18/18 checks pass
Results:
Study folders NOW ONLY contain:
✓ extractors_manifest.json - references to core library
✓ llm_workflow_config.json - study configuration
✓ optimization_results.json - optimization results
✓ optimization_history.json - trial history
✓ .db file - Optuna database
Core library contains:
✓ extract_displacement.py - reusable across ALL studies
✓ extract_von_mises_stress.py - reusable across ALL studies
✓ extract_mass.py - reusable across ALL studies
✓ catalog.json - tracks all extractors with signatures
Benefits:
- Clean, professional study folder structure
- Code reuse eliminates duplication
- Library grows over time, studies stay clean
- Production-grade architecture
- "Insanely good engineering software that evolves with time"
Testing:
E2E test passes with clean folder structure
- No generated_extractors/ pollution
- Manifest correctly references library
- Core library populated with reusable extractors
- Study folder professional and minimal
Documentation:
- Added comprehensive architecture doc (docs/ARCHITECTURE_REFACTOR_NOV17.md)
- Includes migration guide
- Documents future work (hooks library, versioning, CLI tools)
Next Steps:
- Apply same architecture to hooks library
- Add auto-generated documentation for library
- Implement versioning for reproducibility
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Architecture Refactor: Centralized Library System
Date: November 17, 2025
Phase: 3.2 Architecture Cleanup
Author: Claude Code (with Antoine's direction)
Problem Statement
You identified a critical architectural flaw:
"ok, now, quick thing, why do very basic hooks get recreated and stored in the substudies? those should be just core accessed hooked right? is it only because its a test?
What I need in studies is the config, files, setup, report, results etc not core hooks, those should go in atomizer hooks library with their doc etc no? I mean, applied only info = studies, and reusdable and core functions = atomizer foundation.
My study folder is a mess, why? I want some order and real structure to develop an insanely good engineering software that evolve with time."
Old Architecture (BAD):
studies/
  simple_beam_optimization/
    2_substudies/
      test_e2e_3trials_XXX/
        generated_extractors/        ❌ Code pollution!
          extract_displacement.py
          extract_von_mises_stress.py
          extract_mass.py
        generated_hooks/             ❌ Code pollution!
          custom_hook.py
        llm_workflow_config.json
        optimization_results.json
Problems:
- Every substudy duplicates extractor code
- Study folders polluted with reusable code
- No code reuse across studies
- Mess! Not production-grade engineering software
New Architecture (GOOD):
optimization_engine/
  extractors/                        ✓ Core reusable library
    extract_displacement.py
    extract_stress.py
    extract_mass.py
    catalog.json                     ✓ Tracks all extractors
  hooks/                             ✓ Core reusable library
    (future implementation)
studies/
  simple_beam_optimization/
    2_substudies/
      my_optimization/
        extractors_manifest.json     ✓ Just references!
        llm_workflow_config.json     ✓ Study config
        optimization_results.json    ✓ Results
        optimization_history.json    ✓ History
Benefits:
- ✅ Clean study folders (only metadata)
- ✅ Reusable core libraries
- ✅ Deduplication (same extractor = single file)
- ✅ Production-grade architecture
- ✅ Evolves with time (library grows, studies stay clean)
Implementation
1. Extractor Library Manager (extractor_library.py)
New smart library system with:
- Signature-based deduplication: Two extractors with same functionality = one file
- Catalog tracking: catalog.json tracks all library extractors
- Study manifests: Studies just reference which extractors they used
class ExtractorLibrary:
    def get_or_create(self, llm_feature, extractor_code):
        """Add to library or reuse existing."""
        signature = self._compute_signature(llm_feature)
        if signature in self.catalog:
            # Reuse existing!
            return self.library_dir / self.catalog[signature]['filename']
        else:
            # Add new to library
            self.catalog[signature] = {...}
            return extractor_file
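The excerpt above does not show how _compute_signature works. One plausible approach, offered only as a sketch, is to hash a canonical JSON form of the feature description so that two equivalent requests map to the same catalog key. The feature fields ("action", "domain"), the 16-character truncation, and the InMemoryLibrary class are assumptions made for illustration and are not taken from extractor_library.py.

```python
import hashlib
import json


def compute_signature(llm_feature: dict) -> str:
    """Return a stable hash for a feature description (hypothetical scheme)."""
    # Sorting keys makes the signature independent of dict ordering.
    canonical = json.dumps(llm_feature, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class InMemoryLibrary:
    """Toy stand-in for ExtractorLibrary that keeps the catalog in a dict."""

    def __init__(self):
        self.catalog = {}

    def get_or_create(self, llm_feature: dict, filename: str) -> str:
        signature = compute_signature(llm_feature)
        # setdefault stores the filename only the first time a signature is seen,
        # so later requests with the same signature reuse the original file.
        return self.catalog.setdefault(signature, filename)


library = InMemoryLibrary()
first = library.get_or_create(
    {"action": "extract_displacement", "domain": "result_extraction"},
    "extract_displacement.py",
)
second = library.get_or_create(
    {"domain": "result_extraction", "action": "extract_displacement"},
    "extract_displacement_copy.py",
)
print(first == second, len(library.catalog))
```

Both calls describe the same feature with keys in a different order, so the second call reuses the first file and the catalog keeps a single entry.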
2. Updated Components
ExtractorOrchestrator (extractor_orchestrator.py):
- Now uses ExtractorLibrary instead of per-study generation
- Creates extractors_manifest.json instead of copying code
- Backward compatible (legacy mode available)
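To make "manifest instead of copying code" concrete, the sketch below writes a small extractors_manifest.json that only names files in the core library. The manifest schema (an "extractors" list with "filename" and "library_path" keys) and the write_manifest helper are assumptions for illustration; the real manifest format may differ.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(study_dir: Path, extractor_files: list[Path]) -> Path:
    """Write extractors_manifest.json listing library files (assumed schema)."""
    manifest = {
        "extractors": [
            # Only references are stored; no extractor source code is copied.
            {"filename": f.name, "library_path": f.as_posix()}
            for f in extractor_files
        ]
    }
    manifest_path = study_dir / "extractors_manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path


with tempfile.TemporaryDirectory() as tmp:
    study = Path(tmp)  # temporary stand-in for a study folder
    library_dir = Path("optimization_engine/extractors")
    path = write_manifest(
        study,
        [library_dir / "extract_displacement.py", library_dir / "extract_mass.py"],
    )
    loaded = json.loads(path.read_text(encoding="utf-8"))
    names = [entry["filename"] for entry in loaded["extractors"]]
```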
LLMOptimizationRunner (llm_optimization_runner.py):
- Removed per-study generated_extractors/ directory creation
- Removed per-study generated_hooks/ directory creation
- Uses core library exclusively
Test Suite (test_phase_3_2_e2e.py):
- Updated to check for extractors_manifest.json instead of generated_extractors/
- Verifies clean study folder structure
Results
Before Refactor:
test_e2e_3trials_XXX/
├── generated_extractors/ ❌ 3 Python files
│ ├── extract_displacement.py
│ ├── extract_von_mises_stress.py
│ └── extract_mass.py
├── generated_hooks/ ❌ Hook files
├── llm_workflow_config.json
└── optimization_results.json
After Refactor:
test_e2e_3trials_XXX/
├── extractors_manifest.json ✅ Just references!
├── llm_workflow_config.json ✅ Study config
├── optimization_results.json ✅ Results
└── optimization_history.json ✅ History
optimization_engine/extractors/ ✅ Core library
├── extract_displacement.py
├── extract_von_mises_stress.py
├── extract_mass.py
└── catalog.json
Testing
E2E test now passes with clean folder structure:
- ✅ extractors_manifest.json created
- ✅ Core library populated with 3 extractors
- ✅ NO generated_extractors/ pollution
- ✅ Study folder clean and professional
Test output:
Verifying outputs...
[OK] Output directory created
[OK] History file created
[OK] Results file created
[OK] Extractors manifest (references core library)
Checks passed: 18/18
[SUCCESS] END-TO-END TEST PASSED!
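The clean-folder checks could be expressed as a small helper like the one below. This is a sketch under assumptions rather than the code in test_phase_3_2_e2e.py: the function name find_code_pollution and the rule that any .py file in a study folder counts as pollution are illustrative choices.

```python
import tempfile
from pathlib import Path

# Legacy per-study directories that should no longer be created.
FORBIDDEN_DIRS = ("generated_extractors", "generated_hooks")


def find_code_pollution(study_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the folder is clean."""
    problems = []
    for name in FORBIDDEN_DIRS:
        if (study_dir / name).is_dir():
            problems.append(f"legacy directory present: {name}/")
    # Assumed rule: studies hold data and metadata only, so no Python files.
    for py_file in sorted(study_dir.rglob("*.py")):
        relative = py_file.relative_to(study_dir).as_posix()
        problems.append(f"python file in study folder: {relative}")
    if not (study_dir / "extractors_manifest.json").is_file():
        problems.append("missing extractors_manifest.json")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    clean = Path(tmp) / "clean_study"
    clean.mkdir()
    (clean / "extractors_manifest.json").write_text("{}", encoding="utf-8")
    clean_problems = find_code_pollution(clean)

    dirty = Path(tmp) / "dirty_study"
    (dirty / "generated_extractors").mkdir(parents=True)
    (dirty / "generated_extractors" / "extract_mass.py").write_text("", encoding="utf-8")
    dirty_problems = find_code_pollution(dirty)
```

Returning a list of messages instead of a boolean keeps the failure output readable when several checks fail at once.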
Migration Guide
For Future Studies:
What changed:
- Extractors are now in optimization_engine/extractors/ (core library)
- Study folders only contain extractors_manifest.json (not code)
No action required:
- System automatically uses new architecture
- Backward compatible (legacy mode available with use_core_library=False)
For Developers:
To add new extractors:
- LLM generates extractor code
- ExtractorLibrary.get_or_create() checks if already exists
  - If new: adds to optimization_engine/extractors/
  - If exists: reuses existing file
- Study gets manifest reference, not copy of code
To view library:
from optimization_engine.extractor_library import ExtractorLibrary
library = ExtractorLibrary()
print(library.get_library_summary())
Next Steps (Future Work)
- Hook Library System: Implement same architecture for hooks
  - Currently: Hooks still use legacy per-study generation
  - Future: optimization_engine/hooks/ library like extractors
- Library Documentation: Auto-generate docs for each extractor
  - Extract docstrings from library extractors
  - Create browsable documentation
- Versioning: Track extractor versions for reproducibility
  - Tag extractors with creation date/version
  - Allow studies to pin specific versions
- CLI Tool: View and manage library
  - python -m optimization_engine.extractors list
  - python -m optimization_engine.extractors info <signature>
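The proposed CLI does not exist yet. As a rough sketch of what the list and info subcommands could look like, the example below reads catalog.json with argparse. It assumes the catalog maps each signature to an entry with a 'filename' key (as in the get_or_create excerpt); the --catalog flag, the run helper, and the output format are hypothetical.

```python
import argparse
import json
import tempfile
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="optimization_engine.extractors")
    # Hypothetical flag so the catalog location can be overridden.
    parser.add_argument(
        "--catalog",
        type=Path,
        default=Path("optimization_engine/extractors/catalog.json"),
    )
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list", help="list all extractors in the catalog")
    info = sub.add_parser("info", help="show one catalog entry")
    info.add_argument("signature")
    return parser


def run(argv: list[str]) -> list[str]:
    """Return the output lines for a command instead of printing them."""
    args = build_parser().parse_args(argv)
    catalog = json.loads(args.catalog.read_text(encoding="utf-8"))
    if args.command == "list":
        return [f"{sig}  {entry['filename']}" for sig, entry in sorted(catalog.items())]
    entry = catalog.get(args.signature)
    if entry is None:
        return [f"unknown signature: {args.signature}"]
    return [json.dumps(entry, indent=2, sort_keys=True)]


with tempfile.TemporaryDirectory() as tmp:
    catalog_path = Path(tmp) / "catalog.json"
    catalog_path.write_text(
        json.dumps({"abc123": {"filename": "extract_mass.py"}}), encoding="utf-8"
    )
    listing = run(["--catalog", str(catalog_path), "list"])
    missing = run(["--catalog", str(catalog_path), "info", "nope"])
```

Returning lines from run rather than printing them keeps the command logic easy to exercise from the E2E tests.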
Files Modified
- New Files:
  - optimization_engine/extractor_library.py - Core library manager
  - optimization_engine/extractors/__init__.py - Package init
  - optimization_engine/extractors/catalog.json - Library catalog
  - docs/ARCHITECTURE_REFACTOR_NOV17.md - This document
- Modified Files:
  - optimization_engine/extractor_orchestrator.py - Use library instead of per-study
  - optimization_engine/llm_optimization_runner.py - Remove per-study directories
  - tests/test_phase_3_2_e2e.py - Check for manifest instead of directories
Commit Message
refactor: Implement centralized extractor library to eliminate code duplication
MAJOR ARCHITECTURE REFACTOR - Clean Study Folders
Problem:
- Every substudy was generating duplicate extractor code
- Study folders polluted with reusable library code
- No code reuse across studies
- Not production-grade architecture
Solution:
Implemented centralized library system:
- Core extractors in optimization_engine/extractors/
- Signature-based deduplication
- Studies only store metadata (extractors_manifest.json)
- Clean separation: studies = data, core = code
Changes:
1. Created ExtractorLibrary with smart deduplication
2. Updated ExtractorOrchestrator to use core library
3. Updated LLMOptimizationRunner to stop creating per-study directories
4. Updated tests to verify clean study folder structure
Results:
BEFORE: study folder with generated_extractors/ directory (code pollution)
AFTER: study folder with extractors_manifest.json (just references)
Core library: optimization_engine/extractors/
- extract_displacement.py
- extract_von_mises_stress.py
- extract_mass.py
- catalog.json (tracks all extractors)
Study folders NOW ONLY contain:
- extractors_manifest.json (references to core library)
- llm_workflow_config.json (study configuration)
- optimization_results.json (results)
- optimization_history.json (trial history)
Production-grade architecture for "insanely good engineering software that evolves with time"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Summary for Morning
What was done:
- ✅ Created centralized extractor library system
- ✅ Eliminated per-study code duplication
- ✅ Clean study folder architecture
- ✅ E2E tests pass with new structure
- ✅ Comprehensive documentation
What you'll see:
- Studies now only contain metadata (no code!)
- Core library in optimization_engine/extractors/
- Professional, production-grade architecture
Ready for:
- Continue Phase 3.2 development
- Same approach for hooks library (next iteration)
- Building "insanely good engineering software"
Have a good night! ✨