MAJOR ARCHITECTURE REFACTOR - Clean Study Folders
Problem Identified by User:
"My study folder is a mess, why? I want some order and real structure to develop
an insanely good engineering software that evolve with time."
- Every substudy was generating duplicate extractor code
- Study folders polluted with reusable library code (generated_extractors/, generated_hooks/)
- No code reuse across studies
- Not production-grade architecture
Solution - Centralized Library System:
Implemented smart library with signature-based deduplication:
- Core extractors in optimization_engine/extractors/
- Studies only store metadata (extractors_manifest.json)
- Clean separation: studies = data, core = code
Architecture:
  BEFORE (BAD):
    studies/my_study/
      generated_extractors/           ❌ Code pollution!
        extract_displacement.py
        extract_von_mises_stress.py
      generated_hooks/                ❌ Code pollution!
      llm_workflow_config.json
      results.json

  AFTER (GOOD):
    optimization_engine/extractors/   ✓ Core library
      extract_displacement.py
      extract_stress.py
      catalog.json

    studies/my_study/
      extractors_manifest.json        ✓ Just references!
      llm_workflow_config.json        ✓ Config
      optimization_results.json       ✓ Results
New Components:
1. ExtractorLibrary (extractor_library.py)
   - Signature-based deduplication
   - Centralized catalog (catalog.json)
   - Study manifest generation
   - Reusability across all studies
2. Updated ExtractorOrchestrator
   - Uses core library instead of per-study generation
   - Creates manifest instead of copying code
   - Backward compatible (legacy mode available)
3. Updated LLMOptimizationRunner
   - Removed generated_extractors/ directory creation
   - Removed generated_hooks/ directory creation
   - Uses core library exclusively
4. Updated Tests
   - Verifies extractors_manifest.json exists
   - Checks for clean study folder structure
   - All 18/18 checks pass
Results:
Study folders NOW ONLY contain:
✓ extractors_manifest.json - references to core library
✓ llm_workflow_config.json - study configuration
✓ optimization_results.json - optimization results
✓ optimization_history.json - trial history
✓ .db file - Optuna database
Core library contains:
✓ extract_displacement.py - reusable across ALL studies
✓ extract_von_mises_stress.py - reusable across ALL studies
✓ extract_mass.py - reusable across ALL studies
✓ catalog.json - tracks all extractors with signatures
Benefits:
- Clean, professional study folder structure
- Code reuse eliminates duplication
- Library grows over time, studies stay clean
- Production-grade architecture
- "Insanely good engineering software that evolves with time"
Testing:
E2E test passes with clean folder structure
- No generated_extractors/ pollution
- Manifest correctly references library
- Core library populated with reusable extractors
- Study folder professional and minimal
Documentation:
- Added comprehensive architecture doc (docs/ARCHITECTURE_REFACTOR_NOV17.md)
- Includes migration guide
- Documents future work (hooks library, versioning, CLI tools)
Next Steps:
- Apply same architecture to hooks library
- Add auto-generated documentation for library
- Implement versioning for reproducibility
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
# Architecture Refactor: Centralized Library System

**Date**: November 17, 2025
**Phase**: 3.2 Architecture Cleanup
**Author**: Claude Code (with Antoine's direction)

## Problem Statement

You identified a critical architectural flaw:

> "ok, now, quick thing, why do very basic hooks get recreated and stored in the substudies? those should be just core accessed hooked right? is it only because its a test?
>
> What I need in studies is the config, files, setup, report, results etc not core hooks, those should go in atomizer hooks library with their doc etc no? I mean, applied only info = studies, and reusable and core functions = atomizer foundation.
>
> My study folder is a mess, why? I want some order and real structure to develop an insanely good engineering software that evolve with time."

### Old Architecture (BAD):
```
studies/
  simple_beam_optimization/
    2_substudies/
      test_e2e_3trials_XXX/
        generated_extractors/           ❌ Code pollution!
          extract_displacement.py
          extract_von_mises_stress.py
          extract_mass.py
        generated_hooks/                ❌ Code pollution!
          custom_hook.py
        llm_workflow_config.json
        optimization_results.json
```

**Problems**:
- Every substudy duplicates extractor code
- Study folders polluted with reusable code
- No code reuse across studies
- Mess! Not production-grade engineering software

### New Architecture (GOOD):
```
optimization_engine/
  extractors/                           ✓ Core reusable library
    extract_displacement.py
    extract_stress.py
    extract_mass.py
    catalog.json                        ✓ Tracks all extractors

  hooks/                                ✓ Core reusable library
    (future implementation)

studies/
  simple_beam_optimization/
    2_substudies/
      my_optimization/
        extractors_manifest.json        ✓ Just references!
        llm_workflow_config.json        ✓ Study config
        optimization_results.json       ✓ Results
        optimization_history.json       ✓ History
```

**Benefits**:
- ✅ Clean study folders (only metadata)
- ✅ Reusable core libraries
- ✅ Deduplication (same extractor = single file)
- ✅ Production-grade architecture
- ✅ Evolves with time (library grows, studies stay clean)

## Implementation

### 1. Extractor Library Manager (`extractor_library.py`)

New smart library system with:
- **Signature-based deduplication**: Two extractors with same functionality = one file
- **Catalog tracking**: `catalog.json` tracks all library extractors
- **Study manifests**: Studies just reference which extractors they used

```python
class ExtractorLibrary:
    def get_or_create(self, llm_feature, extractor_code):
        """Add to library or reuse existing."""
        signature = self._compute_signature(llm_feature)

        if signature in self.catalog:
            # Reuse existing!
            return self.library_dir / self.catalog[signature]['filename']
        else:
            # Add new to library
            self.catalog[signature] = {...}
            return extractor_file
```

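
The excerpt above does not show `_compute_signature`. One plausible way to get a signature that ignores wording and key order is to hash a canonical JSON form of the fields that define the extractor's behaviour. The sketch below assumes this approach; the field names `action`, `domain` and `params` are hypothetical and may not match the real `llm_feature` structure.

```python
import hashlib
import json


def compute_signature(llm_feature: dict) -> str:
    """Return a short, stable hash of the fields that define an extractor."""
    # Hypothetical field names: only these keys are assumed to affect the code.
    relevant = {key: llm_feature.get(key) for key in ("action", "domain", "params")}
    # sort_keys makes the JSON text independent of dict insertion order.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Same functionality, different description and key order: same signature.
sig_a = compute_signature({
    "action": "extract_displacement",
    "domain": "result_extraction",
    "params": {"result_type": "displacement", "metric": "max"},
    "description": "max displacement",
})
sig_b = compute_signature({
    "description": "peak displacement of the beam",
    "params": {"metric": "max", "result_type": "displacement"},
    "domain": "result_extraction",
    "action": "extract_displacement",
})
# Different functionality: different signature.
sig_c = compute_signature({
    "action": "extract_mass",
    "domain": "result_extraction",
    "params": {},
})
print(sig_a == sig_b, sig_a == sig_c)
```

Which fields go into the hash is the real design decision: too few and distinct extractors collapse into one file, too many and trivially different requests stop deduplicating.
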
### 2. Updated Components

**ExtractorOrchestrator** (`extractor_orchestrator.py`):
- Now uses `ExtractorLibrary` instead of per-study generation
- Creates `extractors_manifest.json` instead of copying code
- Backward compatible (legacy mode available)

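
To make "manifest instead of copying code" concrete, here is a minimal sketch of what writing `extractors_manifest.json` could look like. The manifest schema (`library_dir`, `extractors`, `signature`, `filename`) and the helper name `write_manifest` are assumptions for illustration, not the orchestrator's actual format.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(study_dir: Path, library_dir: Path, used: dict) -> Path:
    """Write extractors_manifest.json listing the library extractors a study used.

    `used` maps a signature to the extractor's filename in the core library.
    No extractor code is copied into the study folder.
    """
    manifest = {
        "library_dir": library_dir.as_posix(),
        "extractors": [
            {"signature": sig, "filename": name} for sig, name in sorted(used.items())
        ],
    }
    study_dir.mkdir(parents=True, exist_ok=True)
    path = study_dir / "extractors_manifest.json"
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return path


# Demo in a temporary directory with made-up signatures.
study_dir = Path(tempfile.mkdtemp()) / "my_optimization"
manifest_path = write_manifest(
    study_dir,
    Path("optimization_engine/extractors"),
    {"sig_b": "extract_mass.py", "sig_a": "extract_displacement.py"},
)
print(manifest_path.read_text(encoding="utf-8"))
```
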
**LLMOptimizationRunner** (`llm_optimization_runner.py`):
- Removed per-study `generated_extractors/` directory creation
- Removed per-study `generated_hooks/` directory creation
- Uses core library exclusively

**Test Suite** (`test_phase_3_2_e2e.py`):
- Updated to check for `extractors_manifest.json` instead of `generated_extractors/`
- Verifies clean study folder structure

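
The "clean study folder" check can be expressed as a small function. This is a sketch of the idea only, not the code in `test_phase_3_2_e2e.py`; the helper name `check_clean_study_folder` and the rule that no `.py` file may appear anywhere in a study folder are assumptions.

```python
import tempfile
from pathlib import Path

# Directories that the refactor removed from study folders.
FORBIDDEN_DIRS = ("generated_extractors", "generated_hooks")


def check_clean_study_folder(study_dir: Path) -> list:
    """Return a list of problems; an empty list means the folder is clean."""
    problems = []
    if not (study_dir / "extractors_manifest.json").is_file():
        problems.append("missing extractors_manifest.json")
    for name in FORBIDDEN_DIRS:
        if (study_dir / name).exists():
            problems.append(f"unexpected directory: {name}/")
    # Assumption: studies hold data only, so any Python file is a violation.
    stray = sorted(p.relative_to(study_dir).as_posix() for p in study_dir.rglob("*.py"))
    problems.extend(f"unexpected code file: {p}" for p in stray)
    return problems


root = Path(tempfile.mkdtemp())

# A study laid out the new way.
clean = root / "clean_study"
clean.mkdir()
(clean / "extractors_manifest.json").write_text("{}", encoding="utf-8")
(clean / "optimization_results.json").write_text("{}", encoding="utf-8")

# A study laid out the old way.
polluted = root / "polluted_study"
(polluted / "generated_extractors").mkdir(parents=True)
(polluted / "generated_extractors" / "extract_mass.py").write_text("", encoding="utf-8")

clean_problems = check_clean_study_folder(clean)
polluted_problems = check_clean_study_folder(polluted)
print(clean_problems)
print(polluted_problems)
```

Returning a list of problems rather than a boolean keeps a failing test message specific about what leaked into the study folder.
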
## Results

### Before Refactor:
```
test_e2e_3trials_XXX/
├── generated_extractors/           ❌ 3 Python files
│   ├── extract_displacement.py
│   ├── extract_von_mises_stress.py
│   └── extract_mass.py
├── generated_hooks/                ❌ Hook files
├── llm_workflow_config.json
└── optimization_results.json
```

### After Refactor:
```
test_e2e_3trials_XXX/
├── extractors_manifest.json        ✅ Just references!
├── llm_workflow_config.json        ✅ Study config
├── optimization_results.json       ✅ Results
└── optimization_history.json       ✅ History

optimization_engine/extractors/     ✅ Core library
├── extract_displacement.py
├── extract_von_mises_stress.py
├── extract_mass.py
└── catalog.json
```

## Testing

E2E test now passes with clean folder structure:
- ✅ `extractors_manifest.json` created
- ✅ Core library populated with 3 extractors
- ✅ NO `generated_extractors/` pollution
- ✅ Study folder clean and professional

Test output:
```
Verifying outputs...
  [OK] Output directory created
  [OK] History file created
  [OK] Results file created
  [OK] Extractors manifest (references core library)

Checks passed: 18/18
[SUCCESS] END-TO-END TEST PASSED!
```

## Migration Guide

### For Future Studies:

**What changed**:
- Extractors are now in `optimization_engine/extractors/` (core library)
- Study folders only contain `extractors_manifest.json` (not code)

**No action required**:
- System automatically uses new architecture
- Backward compatible (legacy mode available with `use_core_library=False`)

### For Developers:

**To add new extractors**:
1. LLM generates extractor code
2. `ExtractorLibrary.get_or_create()` checks if already exists
3. If new: adds to `optimization_engine/extractors/`
4. If exists: reuses existing file
5. Study gets manifest reference, not copy of code

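
The steps above can be traced end to end with a toy stand-in for the library. `MiniExtractorLibrary` is a hypothetical class written for this illustration, not the real `ExtractorLibrary`; the `action` key, the file naming rule and the catalog entry layout beyond `filename` are assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class MiniExtractorLibrary:
    """Toy stand-in for ExtractorLibrary; not the project's implementation."""

    def __init__(self, library_dir: Path):
        self.library_dir = library_dir
        self.library_dir.mkdir(parents=True, exist_ok=True)
        self.catalog_path = library_dir / "catalog.json"
        # Load the existing catalog so deduplication survives across runs.
        if self.catalog_path.exists():
            self.catalog = json.loads(self.catalog_path.read_text(encoding="utf-8"))
        else:
            self.catalog = {}

    def _compute_signature(self, llm_feature: dict) -> str:
        # Assumption: the whole feature dict identifies the extractor.
        canonical = json.dumps(llm_feature, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

    def get_or_create(self, llm_feature: dict, extractor_code: str) -> Path:
        signature = self._compute_signature(llm_feature)  # step 2
        if signature in self.catalog:
            # Step 4: already in the library, reuse the existing file.
            return self.library_dir / self.catalog[signature]["filename"]
        # Step 3: new extractor, store the code once and record it.
        filename = f"{llm_feature['action']}.py"  # hypothetical naming rule
        extractor_file = self.library_dir / filename
        extractor_file.write_text(extractor_code, encoding="utf-8")
        self.catalog[signature] = {"filename": filename}
        self.catalog_path.write_text(json.dumps(self.catalog, indent=2), encoding="utf-8")
        return extractor_file


root = Path(tempfile.mkdtemp())
library = MiniExtractorLibrary(root / "optimization_engine" / "extractors")
feature = {"action": "extract_mass", "params": {"unit": "kg"}}

# Step 1 is simulated with hand-written code strings.
first = library.get_or_create(feature, "def extract_mass(path):\n    return 0.0\n")
second = library.get_or_create(feature, "# regenerated code is ignored\n")
print(first == second, sorted(p.name for p in library.library_dir.iterdir()))
```
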
**To view library**:
```python
from optimization_engine.extractor_library import ExtractorLibrary

library = ExtractorLibrary()
print(library.get_library_summary())
```

## Next Steps (Future Work)

1. **Hook Library System**: Implement same architecture for hooks
   - Currently: Hooks still use legacy per-study generation
   - Future: `optimization_engine/hooks/` library like extractors

2. **Library Documentation**: Auto-generate docs for each extractor
   - Extract docstrings from library extractors
   - Create browsable documentation

3. **Versioning**: Track extractor versions for reproducibility
   - Tag extractors with creation date/version
   - Allow studies to pin specific versions

4. **CLI Tool**: View and manage library
   - `python -m optimization_engine.extractors list`
   - `python -m optimization_engine.extractors info <signature>`

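
Since the CLI does not exist yet, the following is only a sketch of how the two proposed commands might be wired up with `argparse`. The `--catalog` option, the `run` helper and the catalog layout (signature mapped to an entry with a `filename` key) are assumptions.

```python
import argparse
import json
import tempfile
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m optimization_engine.extractors")
    # Hypothetical option so the sketch can point at any catalog file.
    parser.add_argument(
        "--catalog",
        type=Path,
        default=Path("optimization_engine/extractors/catalog.json"),
    )
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list", help="list all extractors in the catalog")
    info = sub.add_parser("info", help="show one catalog entry")
    info.add_argument("signature")
    return parser


def run(argv: list) -> str:
    """Parse argv, read the catalog and return the text the command would print."""
    args = build_parser().parse_args(argv)
    catalog = json.loads(args.catalog.read_text(encoding="utf-8"))
    if args.command == "list":
        return "\n".join(
            f"{sig}  {entry['filename']}" for sig, entry in sorted(catalog.items())
        )
    if args.signature not in catalog:
        raise SystemExit(f"unknown signature: {args.signature}")
    return json.dumps(catalog[args.signature], indent=2)


# Demo against a throwaway catalog with made-up signatures.
catalog_path = Path(tempfile.mkdtemp()) / "catalog.json"
catalog_path.write_text(
    json.dumps({
        "def456": {"filename": "extract_displacement.py"},
        "abc123": {"filename": "extract_mass.py"},
    }),
    encoding="utf-8",
)
listing = run(["--catalog", str(catalog_path), "list"])
details = run(["--catalog", str(catalog_path), "info", "abc123"])
print(listing)
print(details)
```

Returning a string from `run` instead of printing inside it keeps the command logic easy to test without capturing stdout.
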
## Files Modified

1. **New Files**:
   - `optimization_engine/extractor_library.py` - Core library manager
   - `optimization_engine/extractors/__init__.py` - Package init
   - `optimization_engine/extractors/catalog.json` - Library catalog
   - `docs/ARCHITECTURE_REFACTOR_NOV17.md` - This document

2. **Modified Files**:
   - `optimization_engine/extractor_orchestrator.py` - Use library instead of per-study
   - `optimization_engine/llm_optimization_runner.py` - Remove per-study directories
   - `tests/test_phase_3_2_e2e.py` - Check for manifest instead of directories

## Commit Message

```
refactor: Implement centralized extractor library to eliminate code duplication

MAJOR ARCHITECTURE REFACTOR - Clean Study Folders

Problem:
- Every substudy was generating duplicate extractor code
- Study folders polluted with reusable library code
- No code reuse across studies
- Not production-grade architecture

Solution:
Implemented centralized library system:
- Core extractors in optimization_engine/extractors/
- Signature-based deduplication
- Studies only store metadata (extractors_manifest.json)
- Clean separation: studies = data, core = code

Changes:
1. Created ExtractorLibrary with smart deduplication
2. Updated ExtractorOrchestrator to use core library
3. Updated LLMOptimizationRunner to stop creating per-study directories
4. Updated tests to verify clean study folder structure

Results:
BEFORE: study folder with generated_extractors/ directory (code pollution)
AFTER: study folder with extractors_manifest.json (just references)

Core library: optimization_engine/extractors/
- extract_displacement.py
- extract_von_mises_stress.py
- extract_mass.py
- catalog.json (tracks all extractors)

Study folders NOW ONLY contain:
- extractors_manifest.json (references to core library)
- llm_workflow_config.json (study configuration)
- optimization_results.json (results)
- optimization_history.json (trial history)

Production-grade architecture for "insanely good engineering software that evolves with time"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
```

## Summary for Morning

**What was done**:
1. ✅ Created centralized extractor library system
2. ✅ Eliminated per-study code duplication
3. ✅ Clean study folder architecture
4. ✅ E2E tests pass with new structure
5. ✅ Comprehensive documentation

**What you'll see**:
- Studies now only contain metadata (no code!)
- Core library in `optimization_engine/extractors/`
- Professional, production-grade architecture

**Ready for**:
- Continue Phase 3.2 development
- Same approach for hooks library (next iteration)
- Building "insanely good engineering software"

Have a good night! ✨