refactor: Implement centralized extractor library to eliminate code duplication

MAJOR ARCHITECTURE REFACTOR - Clean Study Folders
Problem Identified by User:
"My study folder is a mess, why? I want some order and real structure to develop
an insanely good engineering software that evolve with time."
- Every substudy was generating duplicate extractor code
- Study folders polluted with reusable library code (generated_extractors/, generated_hooks/)
- No code reuse across studies
- Not production-grade architecture
Solution - Centralized Library System:
Implemented a shared extractor library with signature-based deduplication:
- Core extractors in optimization_engine/extractors/
- Studies only store metadata (extractors_manifest.json)
- Clean separation: studies = data, core = code
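As a rough illustration of signature-based deduplication (not the actual ExtractorLibrary code), the sketch below hashes the identifying fields of a feature and writes an extractor file only when that hash is missing from catalog.json. The helper names, the choice of fields (action, domain, params), and the catalog layout are all assumptions made for this example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def compute_signature(feature: dict) -> str:
    """Return a short, stable hash of the fields that identify an extractor."""
    # Assumption: action, domain and params are enough to tell extractors apart.
    key = {name: feature.get(name) for name in ("action", "domain", "params")}
    canonical = json.dumps(key, sort_keys=True)  # key order must not matter
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def get_or_create(library_dir: Path, feature: dict, code: str) -> Path:
    """Return the library file for this feature, writing it only the first time."""
    library_dir.mkdir(parents=True, exist_ok=True)
    catalog_path = library_dir / "catalog.json"
    catalog = json.loads(catalog_path.read_text()) if catalog_path.exists() else {}

    signature = compute_signature(feature)
    if signature in catalog:
        # Already known: reuse the stored file and ignore the new code.
        return library_dir / catalog[signature]["file"]

    filename = f"{feature['action']}.py"
    if (library_dir / filename).exists():
        # Same action, different parameters: keep both files side by side.
        filename = f"{feature['action']}_{signature}.py"
    (library_dir / filename).write_text(code)

    catalog[signature] = {"file": filename, "action": feature["action"]}
    catalog_path.write_text(json.dumps(catalog, indent=2))
    return library_dir / filename


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lib = Path(tmp) / "extractors"
        feature = {"action": "extract_displacement", "domain": "result_extraction"}
        first = get_or_create(lib, feature, "def extract_displacement(): ...\n")
        second = get_or_create(lib, dict(feature), "# regenerated, ignored\n")
        print(first == second)
```

Hashing a canonical JSON form keeps the signature independent of key order, so two studies that request the same extraction resolve to one library file.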
Architecture:
BEFORE (BAD):
  studies/my_study/
    generated_extractors/            ❌ Code pollution!
      extract_displacement.py
      extract_von_mises_stress.py
    generated_hooks/                 ❌ Code pollution!
    llm_workflow_config.json
    results.json

AFTER (GOOD):
  optimization_engine/extractors/    ✓ Core library
    extract_displacement.py
    extract_stress.py
    catalog.json

  studies/my_study/
    extractors_manifest.json         ✓ Just references!
    llm_workflow_config.json         ✓ Config
    optimization_results.json        ✓ Results
New Components:
1. ExtractorLibrary (extractor_library.py)
   - Signature-based deduplication
   - Centralized catalog (catalog.json)
   - Study manifest generation
   - Reusability across all studies
2. Updated ExtractorOrchestrator
   - Uses core library instead of per-study generation
   - Creates manifest instead of copying code
   - Backward compatible (legacy mode available via use_core_library=False)
3. Updated LLMOptimizationRunner
   - Removed generated_extractors/ directory creation
   - Removed generated_hooks/ directory creation
   - Uses core library exclusively
4. Updated Tests
   - Verify extractors_manifest.json exists
   - Check for clean study folder structure
   - All 18 checks pass
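To make the "manifest instead of copied code" idea concrete, here is a minimal sketch of how a study manifest could be written. It is not the real create_study_manifest; the function name write_study_manifest, the JSON field names, and the default library path are assumptions for illustration only.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_study_manifest(signatures, study_dir,
                         library_dir="optimization_engine/extractors"):
    """Write extractors_manifest.json listing which library extractors a study uses."""
    study_dir = Path(study_dir)
    study_dir.mkdir(parents=True, exist_ok=True)

    manifest = {
        # Hypothetical layout: where the code lives and which entries are used.
        "library": str(library_dir),
        "created_at": datetime.now(timezone.utc).isoformat(),
        # dict.fromkeys drops repeated signatures but keeps first-seen order.
        "extractors": list(dict.fromkeys(signatures)),
    }

    path = study_dir / "extractors_manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manifest_path = write_study_manifest(["a1b2", "c3d4", "a1b2"], Path(tmp) / "my_study")
        print(manifest_path.read_text())
```

The study folder then holds only references (signatures), while the extractor code stays in the core library.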
Results:
Study folders NOW ONLY contain:
✓ extractors_manifest.json - references to core library
✓ llm_workflow_config.json - study configuration
✓ optimization_results.json - optimization results
✓ optimization_history.json - trial history
✓ .db file - Optuna database
Core library contains:
✓ extract_displacement.py - reusable across ALL studies
✓ extract_von_mises_stress.py - reusable across ALL studies
✓ extract_mass.py - reusable across ALL studies
✓ catalog.json - tracks all extractors with signatures
Benefits:
- Clean, professional study folder structure
- Code reuse eliminates duplication
- Library grows over time, studies stay clean
- Production-grade architecture
- "Insanely good engineering software that evolves with time"
Testing:
E2E test passes with clean folder structure:
- No generated_extractors/ pollution
- Manifest correctly references library
- Core library populated with reusable extractors
- Study folder is professional and minimal
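The folder checks listed above could be expressed roughly as follows. This is a sketch under assumptions, not the project's E2E test: check_study_folder and the two constant tuples are hypothetical names, and the rules are inferred from this commit message.

```python
import tempfile
from pathlib import Path

# Assumed rules, taken from the description in this commit message.
FORBIDDEN_DIRS = ("generated_extractors", "generated_hooks")
REQUIRED_FILES = ("extractors_manifest.json",)


def check_study_folder(study_dir):
    """Return a list of problems; an empty list means the study folder is clean."""
    study_dir = Path(study_dir)
    problems = []

    for name in FORBIDDEN_DIRS:
        if (study_dir / name).exists():
            problems.append(f"unexpected directory: {name}/")

    for name in REQUIRED_FILES:
        if not (study_dir / name).is_file():
            problems.append(f"missing file: {name}")

    # Any .py file means library code leaked into the study folder.
    for py_file in sorted(study_dir.rglob("*.py")):
        problems.append(f"code file in study folder: {py_file.relative_to(study_dir)}")

    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        study = Path(tmp)
        (study / "extractors_manifest.json").write_text("{}")
        print(check_study_folder(study))
```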
Documentation:
- Added comprehensive architecture doc (docs/ARCHITECTURE_REFACTOR_NOV17.md)
- Includes migration guide
- Documents future work (hooks library, versioning, CLI tools)
Next Steps:
- Apply same architecture to hooks library
- Add auto-generated documentation for library
- Implement versioning for reproducibility
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
@@ -22,6 +22,7 @@ import logging
 from dataclasses import dataclass
 
 from optimization_engine.pynastran_research_agent import PyNastranResearchAgent, ExtractionPattern
+from optimization_engine.extractor_library import ExtractorLibrary, create_study_manifest
 
 logger = logging.getLogger(__name__)
 
@@ -46,14 +47,18 @@ class ExtractorOrchestrator:
 
     def __init__(self,
                  extractors_dir: Optional[Path] = None,
-                 knowledge_base_path: Optional[Path] = None):
+                 knowledge_base_path: Optional[Path] = None,
+                 use_core_library: bool = True):
         """
         Initialize the orchestrator.
 
         Args:
-            extractors_dir: Directory to save generated extractors
+            extractors_dir: Directory to save study manifest (not extractor code!)
             knowledge_base_path: Path to pyNastran pattern knowledge base
+            use_core_library: Use centralized library (True) or per-study generation (False, legacy)
         """
+        self.use_core_library = use_core_library
+
         if extractors_dir is None:
             extractors_dir = Path(__file__).parent / "result_extractors" / "generated"
 
@@ -63,10 +68,19 @@ class ExtractorOrchestrator:
         # Initialize Phase 3 research agent
         self.research_agent = PyNastranResearchAgent(knowledge_base_path)
 
+        # Initialize centralized library (NEW ARCHITECTURE)
+        if use_core_library:
+            self.library = ExtractorLibrary()
+            logger.info(f"Using centralized extractor library: {self.library.library_dir}")
+        else:
+            self.library = None
+            logger.warning("Using legacy per-study extractor generation (not recommended)")
+
         # Registry of generated extractors for this session
         self.extractors: Dict[str, GeneratedExtractor] = {}
+        self.extractor_signatures: List[str] = [] # Track which library extractors were used
 
-        logger.info(f"ExtractorOrchestrator initialized with extractors_dir: {self.extractors_dir}")
+        logger.info(f"ExtractorOrchestrator initialized")
 
     def process_llm_workflow(self, llm_output: Dict[str, Any]) -> List[GeneratedExtractor]:
         """
@@ -114,6 +128,11 @@ class ExtractorOrchestrator:
                 logger.error(f"Failed to generate extractor for {feature.get('action')}: {e}")
                 # Continue with other features
 
+        # NEW ARCHITECTURE: Create study manifest (not copy code)
+        if self.use_core_library and self.library and self.extractor_signatures:
+            create_study_manifest(self.extractor_signatures, self.extractors_dir)
+            logger.info("Study manifest created - extractors referenced from core library")
+
         logger.info(f"Generated {len(generated_extractors)} extractors")
         return generated_extractors
 
@@ -147,14 +166,24 @@ class ExtractorOrchestrator:
         logger.info(f"Generating extractor code using pattern: {pattern.name}")
         extractor_code = self.research_agent.generate_extractor_code(research_request)
 
-        # Create filename from action
-        filename = self._action_to_filename(action)
-        file_path = self.extractors_dir / filename
+        # NEW ARCHITECTURE: Use centralized library
+        if self.use_core_library and self.library:
+            # Add to/retrieve from core library (deduplication happens here)
+            file_path = self.library.get_or_create(feature, extractor_code)
 
-        # Save extractor to file
-        logger.info(f"Saving extractor to: {file_path}")
-        with open(file_path, 'w') as f:
-            f.write(extractor_code)
+            # Track signature for study manifest
+            signature = self.library._compute_signature(feature)
+            self.extractor_signatures.append(signature)
 
+            logger.info(f"Extractor available in core library: {file_path}")
+        else:
+            # LEGACY: Save to per-study directory
+            filename = self._action_to_filename(action)
+            file_path = self.extractors_dir / filename
+
+            logger.info(f"Saving extractor to study directory (legacy): {file_path}")
+            with open(file_path, 'w') as f:
+                f.write(extractor_code)
+
         # Extract function name from generated code
         function_name = self._extract_function_name(extractor_code)