docs: Major documentation overhaul - restructure folders, update tagline, add Getting Started guide

- Restructure docs/ folder (remove numeric prefixes):
  - 04_USER_GUIDES -> guides/
  - 05_API_REFERENCE -> api/
  - 06_PHYSICS -> physics/
  - 07_DEVELOPMENT -> development/
  - 08_ARCHIVE -> archive/
  - 09_DIAGRAMS -> diagrams/

- Replace tagline 'Talk, don't click' with 'LLM-driven optimization framework' in 9 files

- Create comprehensive docs/GETTING_STARTED.md:
  - Prerequisites and quick setup
  - Project structure overview
  - First study tutorial (Claude or manual)
  - Dashboard usage guide
  - Neural acceleration introduction

- Rewrite docs/00_INDEX.md with correct paths and modern structure

- Archive obsolete files:
  - 01_PROTOCOLS.md -> archive/historical/01_PROTOCOLS_legacy.md
  - 03_GETTING_STARTED.md -> archive/historical/
  - ATOMIZER_PODCAST_BRIEFING.md -> archive/marketing/

- Update timestamps to 2026-01-20 across all key files

- Update .gitignore to exclude docs/generated/

- Version bump: ATOMIZER_CONTEXT v1.8 -> v2.0
2026-01-20 10:03:45 -05:00
parent 37f73cc2be
commit ea437d360e
103 changed files with 8980 additions and 327 deletions

@@ -0,0 +1,284 @@
# Architecture Refactor: Centralized Library System
**Date**: November 17, 2025
**Phase**: 3.2 Architecture Cleanup
**Author**: Claude Code (with Antoine's direction)
## Problem Statement
You identified a critical architectural flaw:
> "ok, now, quick thing, why do very basic hooks get recreated and stored in the substudies? those should be just core accessed hooked right? is it only because its a test?
>
> What I need in studies is the config, files, setup, report, results etc not core hooks, those should go in atomizer hooks library with their doc etc no? I mean, applied only info = studies, and reusdable and core functions = atomizer foundation.
>
> My study folder is a mess, why? I want some order and real structure to develop an insanely good engineering software that evolve with time."
### Old Architecture (BAD):
```
studies/
  simple_beam_optimization/
    2_substudies/
      test_e2e_3trials_XXX/
        generated_extractors/        ❌ Code pollution!
          extract_displacement.py
          extract_von_mises_stress.py
          extract_mass.py
        generated_hooks/             ❌ Code pollution!
          custom_hook.py
        llm_workflow_config.json
        optimization_results.json
```
**Problems**:
- Every substudy duplicates extractor code
- Study folders polluted with reusable code
- No code reuse across studies
- Disorganized layout, not production-grade engineering software
### New Architecture (GOOD):
```
optimization_engine/
  extractors/                        ✓ Core reusable library
    extract_displacement.py
    extract_von_mises_stress.py
    extract_mass.py
    catalog.json                     ✓ Tracks all extractors
  hooks/                             ✓ Core reusable library
    (future implementation)
studies/
  simple_beam_optimization/
    2_substudies/
      my_optimization/
        extractors_manifest.json     ✓ Just references!
        llm_workflow_config.json     ✓ Study config
        optimization_results.json    ✓ Results
        optimization_history.json    ✓ History
```
**Benefits**:
- ✅ Clean study folders (only metadata)
- ✅ Reusable core libraries
- ✅ Deduplication (same extractor = single file)
- ✅ Production-grade architecture
- ✅ Evolves with time (library grows, studies stay clean)
## Implementation
### 1. Extractor Library Manager (`extractor_library.py`)
A new library system with:
- **Signature-based deduplication**: two extractors with the same functionality share one file
- **Catalog tracking**: `catalog.json` tracks all library extractors
- **Study manifests**: Studies just reference which extractors they used
```python
class ExtractorLibrary:
    def get_or_create(self, llm_feature, extractor_code):
        """Add to library or reuse existing."""
        signature = self._compute_signature(llm_feature)
        if signature in self.catalog:
            # Reuse existing!
            return self.library_dir / self.catalog[signature]['filename']
        else:
            # Add new to library (catalog entry fields elided in this excerpt)
            self.catalog[signature] = {...}
            # extractor_file: path of the newly written library file
            return extractor_file
```
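To make the deduplication idea concrete, here is a minimal, self-contained sketch. It is not the project's `ExtractorLibrary`: the class name `MiniExtractorLibrary`, the `action`/`params` fields used for the signature, and the catalog entry layout are all assumptions chosen for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def compute_signature(llm_feature: dict) -> str:
    """Hash the fields assumed to define what an extractor does."""
    # sort_keys makes the hash independent of dict ordering.
    canonical = json.dumps(
        {"action": llm_feature.get("action"), "params": llm_feature.get("params", {})},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class MiniExtractorLibrary:
    """Illustrative stand-in for ExtractorLibrary (hypothetical, simplified)."""

    def __init__(self, library_dir: Path):
        self.library_dir = Path(library_dir)
        self.library_dir.mkdir(parents=True, exist_ok=True)
        self.catalog_path = self.library_dir / "catalog.json"
        self.catalog = (
            json.loads(self.catalog_path.read_text())
            if self.catalog_path.exists()
            else {}
        )

    def get_or_create(self, llm_feature: dict, extractor_code: str) -> Path:
        signature = compute_signature(llm_feature)
        if signature in self.catalog:
            # Same signature: reuse the file already in the library.
            return self.library_dir / self.catalog[signature]["filename"]
        filename = f"extract_{llm_feature['action']}.py"
        if (self.library_dir / filename).exists():
            # Different signature, same action: keep both files apart.
            filename = f"extract_{llm_feature['action']}_{signature}.py"
        extractor_file = self.library_dir / filename
        extractor_file.write_text(extractor_code)
        self.catalog[signature] = {"filename": filename}
        self.catalog_path.write_text(json.dumps(self.catalog, indent=2))
        return extractor_file
```

Calling `get_or_create` twice with features that differ only in key order returns the same path and leaves the first file untouched, which is the behavior the catalog is meant to guarantee.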
### 2. Updated Components
**ExtractorOrchestrator** (`extractor_orchestrator.py`):
- Now uses `ExtractorLibrary` instead of per-study generation
- Creates `extractors_manifest.json` instead of copying code
- Backward compatible (legacy mode available)
**LLMOptimizationRunner** (`llm_optimization_runner.py`):
- Removed per-study `generated_extractors/` directory creation
- Removed per-study `generated_hooks/` directory creation
- Uses core library exclusively
**Test Suite** (`test_phase_3_2_e2e.py`):
- Updated to check for `extractors_manifest.json` instead of `generated_extractors/`
- Verifies clean study folder structure
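As a rough illustration of "manifest instead of copied code", the sketch below writes an `extractors_manifest.json` that only points at library files. The function name `write_extractors_manifest` and the manifest fields (`library_path`, `extractors`, `signature`, `filename`) are assumptions; the actual schema is not shown in this document.

```python
import json
import tempfile
from pathlib import Path


def write_extractors_manifest(study_dir: Path, library_dir: Path, used: dict) -> Path:
    """Record which library extractors a study used, without copying any code.

    `used` maps a signature to a filename in the library (hypothetical schema).
    """
    manifest = {
        "library_path": str(library_dir),
        "extractors": [
            {"signature": sig, "filename": name} for sig, name in sorted(used.items())
        ],
    }
    manifest_path = Path(study_dir) / "extractors_manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```

The study folder then holds a few lines of JSON per extractor rather than a copy of each Python file.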
## Results
### Before Refactor:
```
test_e2e_3trials_XXX/
├── generated_extractors/ ❌ 3 Python files
│ ├── extract_displacement.py
│ ├── extract_von_mises_stress.py
│ └── extract_mass.py
├── generated_hooks/ ❌ Hook files
├── llm_workflow_config.json
└── optimization_results.json
```
### After Refactor:
```
test_e2e_3trials_XXX/
├── extractors_manifest.json ✅ Just references!
├── llm_workflow_config.json ✅ Study config
├── optimization_results.json ✅ Results
└── optimization_history.json ✅ History
optimization_engine/extractors/ ✅ Core library
├── extract_displacement.py
├── extract_von_mises_stress.py
├── extract_mass.py
└── catalog.json
```
## Testing
E2E test now passes with clean folder structure:
- ✅ `extractors_manifest.json` created
- ✅ Core library populated with 3 extractors
- ✅ NO `generated_extractors/` pollution
- ✅ Study folder clean and professional
Test output:
```
Verifying outputs...
[OK] Output directory created
[OK] History file created
[OK] Results file created
[OK] Extractors manifest (references core library)
Checks passed: 18/18
[SUCCESS] END-TO-END TEST PASSED!
```
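The kind of check the E2E test performs on the study folder could look like the sketch below. This is an assumed, simplified helper (`check_clean_study_folder` is a hypothetical name), not the code in `test_phase_3_2_e2e.py`.

```python
import tempfile
from pathlib import Path

# Directory names the old architecture created inside each substudy.
LEGACY_DIRS = ("generated_extractors", "generated_hooks")


def check_clean_study_folder(study_dir: Path) -> list:
    """Return a list of problems; an empty list means the folder is clean."""
    study_dir = Path(study_dir)
    problems = []
    if not (study_dir / "extractors_manifest.json").is_file():
        problems.append("missing extractors_manifest.json")
    for name in LEGACY_DIRS:
        if (study_dir / name).exists():
            problems.append(f"legacy directory present: {name}/")
    # Under the new layout no Python source should live in a study folder.
    for py_file in sorted(study_dir.rglob("*.py")):
        relative = py_file.relative_to(study_dir).as_posix()
        problems.append(f"code file in study folder: {relative}")
    return problems
```

A test can then assert that the returned list is empty for a freshly generated substudy.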
## Migration Guide
### For Future Studies:
**What changed**:
- Extractors are now in `optimization_engine/extractors/` (core library)
- Study folders contain `extractors_manifest.json` instead of extractor code
**No action required**:
- System automatically uses new architecture
- Backward compatible (legacy mode available with `use_core_library=False`)
### For Developers:
**To add new extractors**:
1. LLM generates extractor code
2. `ExtractorLibrary.get_or_create()` checks whether it already exists
3. If new: adds to `optimization_engine/extractors/`
4. If exists: reuses existing file
5. Study gets manifest reference, not copy of code
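Because a study keeps only references, something has to turn those references back into callable code at run time. The sketch below shows one way to do that with the standard `importlib` machinery. The manifest layout (`library_path` plus a list of `filename` entries) and the function name `load_extractors_from_manifest` are assumptions for illustration, not the project's actual loader.

```python
import importlib.util
import json
import tempfile
from pathlib import Path


def load_extractors_from_manifest(manifest_path: Path) -> dict:
    """Import each extractor module referenced by a study manifest.

    Assumes the hypothetical manifest layout
    {"library_path": ..., "extractors": [{"filename": ...}, ...]}.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    library_dir = Path(manifest["library_path"])
    modules = {}
    for entry in manifest["extractors"]:
        module_file = library_dir / entry["filename"]
        spec = importlib.util.spec_from_file_location(module_file.stem, module_file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the extractor file once
        modules[module_file.stem] = module
    return modules
```

The returned dictionary is keyed by module name (for example `extract_mass`), so a runner can look up an extractor without knowing where the library lives on disk.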
**To view library**:
```python
from optimization_engine.extractor_library import ExtractorLibrary
library = ExtractorLibrary()
print(library.get_library_summary())
```
## Next Steps (Future Work)
1. **Hook Library System**: Implement same architecture for hooks
- Currently: Hooks still use legacy per-study generation
- Future: `optimization_engine/hooks/` library like extractors
2. **Library Documentation**: Auto-generate docs for each extractor
- Extract docstrings from library extractors
- Create browsable documentation
3. **Versioning**: Track extractor versions for reproducibility
- Tag extractors with creation date/version
- Allow studies to pin specific versions
4. **CLI Tool**: View and manage library
- `python -m optimization_engine.extractors list`
- `python -m optimization_engine.extractors info <signature>`
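A possible shape for the proposed CLI, sketched with `argparse`. Nothing here exists yet: the `--catalog` option, the `build_parser` and `run` names, and the output format are assumptions. The only detail taken from this document is that catalog entries are keyed by signature and carry a `filename` field.

```python
import argparse
import json
import tempfile
from pathlib import Path

# Default location taken from the library layout described above.
DEFAULT_CATALOG = Path("optimization_engine/extractors/catalog.json")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="optimization_engine.extractors")
    parser.add_argument("--catalog", type=Path, default=DEFAULT_CATALOG)
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list", help="List all extractors in the catalog")
    info = sub.add_parser("info", help="Show one catalog entry")
    info.add_argument("signature")
    return parser


def run(argv: list) -> str:
    """Parse arguments and return the text the CLI would print."""
    args = build_parser().parse_args(argv)
    catalog = json.loads(args.catalog.read_text())
    if args.command == "list":
        return "\n".join(
            f"{sig}  {entry['filename']}" for sig, entry in sorted(catalog.items())
        )
    entry = catalog.get(args.signature)
    if entry is None:
        return f"unknown signature: {args.signature}"
    return json.dumps(entry, indent=2)
```

A `__main__.py` in the package could then call `print(run(sys.argv[1:]))` to support the `python -m` invocations listed above.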
## Files Modified
1. **New Files**:
- `optimization_engine/extractor_library.py` - Core library manager
- `optimization_engine/extractors/__init__.py` - Package init
- `optimization_engine/extractors/catalog.json` - Library catalog
- `docs/ARCHITECTURE_REFACTOR_NOV17.md` - This document
2. **Modified Files**:
- `optimization_engine/extractor_orchestrator.py` - Use library instead of per-study
- `optimization_engine/llm_optimization_runner.py` - Remove per-study directories
- `tests/test_phase_3_2_e2e.py` - Check for manifest instead of directories
## Commit Message
```
refactor: Implement centralized extractor library to eliminate code duplication

MAJOR ARCHITECTURE REFACTOR - Clean Study Folders

Problem:
- Every substudy was generating duplicate extractor code
- Study folders polluted with reusable library code
- No code reuse across studies
- Not production-grade architecture

Solution:
Implemented centralized library system:
- Core extractors in optimization_engine/extractors/
- Signature-based deduplication
- Studies only store metadata (extractors_manifest.json)
- Clean separation: studies = data, core = code

Changes:
1. Created ExtractorLibrary with smart deduplication
2. Updated ExtractorOrchestrator to use core library
3. Updated LLMOptimizationRunner to stop creating per-study directories
4. Updated tests to verify clean study folder structure

Results:
BEFORE: study folder with generated_extractors/ directory (code pollution)
AFTER: study folder with extractors_manifest.json (just references)

Core library: optimization_engine/extractors/
- extract_displacement.py
- extract_von_mises_stress.py
- extract_mass.py
- catalog.json (tracks all extractors)

Study folders NOW ONLY contain:
- extractors_manifest.json (references to core library)
- llm_workflow_config.json (study configuration)
- optimization_results.json (results)
- optimization_history.json (trial history)

Production-grade architecture for "insanely good engineering software that evolves with time"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
```
## Summary for Morning
**What was done**:
1. ✅ Created centralized extractor library system
2. ✅ Eliminated per-study code duplication
3. ✅ Clean study folder architecture
4. ✅ E2E tests pass with new structure
5. ✅ Comprehensive documentation
**What you'll see**:
- Studies now only contain metadata (no code!)
- Core library in `optimization_engine/extractors/`
- Professional, production-grade architecture
**Ready for**:
- Continue Phase 3.2 development
- Same approach for hooks library (next iteration)
- Building "insanely good engineering software"
Have a good night! ✨