# Atomizer Context Engineering Implementation Report

**Version**: 1.0
**Date**: December 29, 2025
**Author**: Claude (with Antoine)
**Status**: Complete - All Tests Passing

---

## Executive Summary

This report documents the implementation of **Agentic Context Engineering (ACE)** in Atomizer, transforming it from a traditional LLM-assisted tool into a **self-improving, context-aware optimization platform**. The implementation enables Atomizer to learn from every optimization run, accumulating institutional knowledge that compounds over time.

### Key Achievements

| Metric | Value |
|--------|-------|
| New Python modules created | 8 |
| Lines of code added | ~2,500 |
| Unit tests created | 44 |
| Integration tests created | 16 |
| Test pass rate | 100% (60/60) |
| Dashboard API endpoints | 12 |

### Expected Outcomes

- **10-15% improvement** in optimization task success rates
- **80%+ reduction** in repeated mistakes across sessions
- **Lower API costs** through KV-cache optimization (an estimated ~81% saving at a 90% cache hit rate; see Section 3.6)
- **Persistent institutional memory** that compounds over time

---

## Table of Contents

1. [Background & Motivation](#1-background--motivation)
2. [Architecture Overview](#2-architecture-overview)
3. [Core Components](#3-core-components)
4. [Implementation Details](#4-implementation-details)
5. [Integration Points](#5-integration-points)
6. [API Reference](#6-api-reference)
7. [Testing](#7-testing)
8. [Usage Guide](#8-usage-guide)
9. [Migration Guide](#9-migration-guide)
10. [Future Enhancements](#10-future-enhancements)

---

## 1. Background & Motivation

### 1.1 The Problem

Traditional LLM-assisted optimization tools have a fundamental limitation: **they don't learn from their mistakes**. Each session starts fresh, with no memory of:

- What approaches worked before
- What errors were encountered and how they were resolved
- User preferences and workflow patterns
- Domain-specific knowledge accumulated over time

This leads to:
- Repeated mistakes across sessions
- Inconsistent quality of assistance
- No improvement over time
- Context window wasted on rediscovering known patterns

### 1.2 The Solution: ACE Framework

The **Agentic Context Engineering (ACE)** framework addresses this with a structured learning loop:

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Generator  │────▶│  Reflector  │────▶│   Curator   │
│ (Opt Runs)  │     │ (Analysis)  │     │ (Playbook)  │
└─────────────┘     └─────────────┘     └─────────────┘
       ▲                                       │
       │                                       │
       └────────────── Feedback ───────────────┘
```

**Key Principles Implemented:**

1. **Structured Playbook** - Knowledge stored as itemized insights with helpful/harmful tracking
2. **Execution Feedback** - Use success/failure as the learning signal
3. **Context Isolation** - Expose only what's needed; isolate heavy data
4. **KV-Cache Optimization** - Keep a stable prompt prefix so it can be served from the KV cache; cached input tokens can cost up to ~10x less than uncached ones
5. **Error Preservation** - "Leave wrong turns in context" so the model can learn from them

---

## 2. Architecture Overview

### 2.1 System Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                      Atomizer Context Engineering                       │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                         AtomizerPlaybook                          │  │
│  │  ┌─────────────────────────────────────────────────────────────┐  │  │
│  │  │ [str-00001] helpful=8 harmful=0 ::                          │  │  │
│  │  │     "For thin-walled structures, use shell elements"        │  │  │
│  │  │                                                             │  │  │
│  │  │ [mis-00002] helpful=0 harmful=6 ::                          │  │  │
│  │  │     "Never set convergence < 1e-8 for SOL 106"              │  │  │
│  │  └─────────────────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │                                    │
│             ┌──────────────────────┼──────────────────────┐             │
│             ▼                      ▼                      ▼             │
│     ┌───────────────┐      ┌───────────────┐      ┌───────────────┐     │
│     │   Reflector   │      │ FeedbackLoop  │      │ SessionState  │     │
│     │  (Analysis)   │      │  (Learning)   │      │  (Isolation)  │     │
│     └───────────────┘      └───────────────┘      └───────────────┘     │
│             │                      │                      │             │
│             └──────────────────────┼──────────────────────┘             │
│                                    ▼                                    │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                        OptimizationRunner                         │  │
│  │        (via ContextEngineeringMixin or ContextAwareRunner)        │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │                                    │
│             ┌──────────────────────┼──────────────────────┐             │
│             ▼                      ▼                      ▼             │
│     ┌───────────────┐      ┌───────────────┐      ┌───────────────┐     │
│     │ CacheMonitor  │      │  Compaction   │      │ ErrorTracker  │     │
│     │  (KV-Cache)   │      │ (Long Sess)   │      │   (Plugin)    │     │
│     └───────────────┘      └───────────────┘      └───────────────┘     │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

### 2.2 Directory Structure

```
optimization_engine/
├── context/                     # NEW: Context Engineering Module
│   ├── __init__.py              # Module exports
│   ├── playbook.py              # AtomizerPlaybook, PlaybookItem
│   ├── reflector.py             # AtomizerReflector, OptimizationOutcome
│   ├── session_state.py         # AtomizerSessionState, TaskType
│   ├── cache_monitor.py         # ContextCacheOptimizer
│   ├── feedback_loop.py         # FeedbackLoop
│   ├── compaction.py            # CompactionManager
│   └── runner_integration.py    # Mixin and wrapper classes
│
├── plugins/
│   └── post_solve/
│       └── error_tracker.py     # NEW: Error capture hook
│
knowledge_base/
└── playbook.json                # NEW: Persistent playbook storage

atomizer-dashboard/
└── backend/api/routes/
    └── context.py               # NEW: REST API for playbook

.claude/skills/
└── 00_BOOTSTRAP_V2.md           # NEW: Enhanced bootstrap

tests/
├── test_context_engineering.py  # NEW: Unit tests (44 tests)
└── test_context_integration.py  # NEW: Integration tests (16 tests)
```

### 2.3 Data Flow

```
Trial Execution         Learning Loop           Context Usage
───────────────         ─────────────           ─────────────

┌─────────────┐                                 ┌─────────────┐
│    Start    │                                 │   Session   │
│    Trial    │                                 │    Start    │
└──────┬──────┘                                 └──────┬──────┘
       │                                               │
       ▼                                               ▼
┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│   Execute   │────────▶│  Reflector  │         │    Load     │
│   Solver    │         │   Analyze   │         │  Playbook   │
└──────┬──────┘         └──────┬──────┘         └──────┬──────┘
       │                       │                       │
       ▼                       ▼                       ▼
┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│  Success/   │         │   Extract   │         │   Filter    │
│   Failure   │         │  Insights   │         │   by Task   │
└──────┬──────┘         └──────┬──────┘         └──────┬──────┘
       │                       │                       │
       ▼                       ▼                       ▼
┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│  Feedback   │────────▶│   Update    │────────▶│   Inject    │
│    Loop     │         │  Playbook   │         │   Context   │
└─────────────┘         └─────────────┘         └─────────────┘
```

---

## 3. Core Components

### 3.1 AtomizerPlaybook (`playbook.py`)

The playbook is the central knowledge store. It holds itemized insights with tracking metrics.

**Key Classes:**

| Class | Purpose |
|-------|---------|
| `InsightCategory` | Enum for insight types (STRATEGY, MISTAKE, TOOL, etc.) |
| `PlaybookItem` | Single insight with helpful/harmful counts |
| `AtomizerPlaybook` | Collection of items with CRUD operations |

**Insight Categories:**

| Category | Code | Description | Example |
|----------|------|-------------|---------|
| STRATEGY | `str` | Optimization strategies | "Use shell elements for thin walls" |
| MISTAKE | `mis` | Common mistakes to avoid | "Don't set convergence < 1e-8" |
| TOOL | `tool` | Tool usage patterns | "TPE works well for 5-10 variables" |
| CALCULATION | `cal` | Formulas and calculations | "Safety factor = yield/max_stress" |
| DOMAIN | `dom` | Domain knowledge | "Mirror deformation follows Zernike" |
| WORKFLOW | `wf` | Workflow patterns | "Load _i.prt before UpdateFemodel()" |

**Key Methods:**

```python
from pathlib import Path

from optimization_engine.context import AtomizerPlaybook, InsightCategory, get_playbook

playbook = get_playbook()  # global AtomizerPlaybook instance
path = Path("knowledge_base/playbook.json")

# Add insight (auto-deduplicates)
item = playbook.add_insight(
    category=InsightCategory.STRATEGY,
    content="Use shell elements for thin walls",
    source_trial=42,
    tags=["mesh", "shell"]
)

# Record outcome (updates scores)
playbook.record_outcome(item.id, helpful=True)

# Get context for LLM
context = playbook.get_context_for_task(
    task_type="optimization",
    max_items=15,
    min_confidence=0.5
)

# Prune harmful items
removed = playbook.prune_harmful(threshold=-3)

# Persist
playbook.save(path)
playbook = AtomizerPlaybook.load(path)
```

**Item Scoring:**

```
net_score  = helpful_count - harmful_count
confidence = helpful_count / (helpful_count + harmful_count)   # 0.5 if no outcomes recorded yet
```

Items with `net_score <= -3` are removed by `prune_harmful()` (default `threshold=-3`), which `FeedbackLoop.finalize_study()` runs at the end of each study.
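
The scoring and pruning rules above fit in a small dataclass. This is a minimal sketch that assumes only the fields named in this report (`helpful_count`, `harmful_count`) and the neutral 0.5 confidence for untested items. `ScoredItem` and `prune` are hypothetical names, not the actual `PlaybookItem` code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ScoredItem:
    """Hypothetical stand-in for the scoring fields of PlaybookItem."""
    item_id: str
    helpful_count: int = 0
    harmful_count: int = 0

    @property
    def net_score(self) -> int:
        return self.helpful_count - self.harmful_count

    @property
    def confidence(self) -> float:
        total = self.helpful_count + self.harmful_count
        # Untested items are neutral (0.5) instead of dividing by zero
        return 0.5 if total == 0 else self.helpful_count / total


def prune(items: Dict[str, ScoredItem], threshold: int = -3) -> int:
    """Delete items whose net_score <= threshold and return how many were removed."""
    doomed = [key for key, item in items.items() if item.net_score <= threshold]
    for key in doomed:
        del items[key]
    return len(doomed)


items = {
    "str-00001": ScoredItem("str-00001", helpful_count=8),
    "mis-00002": ScoredItem("mis-00002", helpful_count=1, harmful_count=4),
    "tool-00003": ScoredItem("tool-00003"),
}
removed = prune(items)
print(removed, sorted(items), items["tool-00003"].confidence)
# → 1 ['str-00001', 'tool-00003'] 0.5
```

Pruning uses the net score rather than confidence. An item therefore needs at least three more harmful outcomes than helpful ones before it is removed, so a new item is not discarded after a single bad trial.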

### 3.2 AtomizerReflector (`reflector.py`)

The reflector analyzes optimization outcomes and extracts actionable insights.

**Key Classes:**

| Class | Purpose |
|-------|---------|
| `OptimizationOutcome` | Captured result from a trial |
| `InsightCandidate` | Pending insight before commit |
| `AtomizerReflector` | Analysis engine |

**Error Pattern Recognition:**

The reflector classifies errors automatically by matching regex patterns against the error text:

| Pattern | Classification | Tags |
|---------|----------------|------|
| "convergence", "did not converge" | `convergence_failure` | solver, convergence |
| "mesh", "element", "jacobian" | `mesh_error` | mesh, element |
| "singular", "matrix", "pivot" | `singularity` | singularity, boundary |
| "memory", "allocation" | `memory_error` | memory, performance |
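
The classification table can be written as an ordered list of regex rules, in the spirit of the `ERROR_PATTERNS` table in `reflector.py`. The exact patterns, the first-match-wins ordering, and the `"unknown"` fallback below are assumptions for illustration, not the module's actual contents.

```python
import re
from typing import List, Tuple

# Hypothetical rule table mirroring the classification table above.
# Order matters: the first matching pattern wins.
ERROR_PATTERNS: List[Tuple[str, str, List[str]]] = [
    (r"convergence|did not converge", "convergence_failure", ["solver", "convergence"]),
    (r"mesh|element|jacobian", "mesh_error", ["mesh", "element"]),
    (r"singular|matrix|pivot", "singularity", ["singularity", "boundary"]),
    (r"memory|allocation", "memory_error", ["memory", "performance"]),
]


def classify_error(message: str) -> Tuple[str, List[str]]:
    """Return (classification, tags) for a solver error message."""
    for pattern, classification, tags in ERROR_PATTERNS:
        if re.search(pattern, message, flags=re.IGNORECASE):
            return classification, list(tags)
    return "unknown", []  # fallback label is an assumption


print(classify_error("Solver did not converge after 25 iterations"))
# → ('convergence_failure', ['solver', 'convergence'])
```

Ordering matters with keyword rules like these. A message such as "singular matrix at element 12" matches the mesh rule first because it contains "element".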

**Usage:**

```python
from optimization_engine.context import get_playbook
from optimization_engine.context.reflector import AtomizerReflector, OptimizationOutcome

playbook = get_playbook()
reflector = AtomizerReflector(playbook)

# Analyze each trial
outcome = OptimizationOutcome(
    trial_number=42,
    success=False,
    objective_value=None,
    solver_errors=["convergence failure"],
    design_variables={"thickness": 0.5}
)
insights = reflector.analyze_trial(outcome)

# Analyze study completion (best_value is a required argument)
reflector.analyze_study_completion(
    study_name="bracket_opt",
    total_trials=100,
    best_value=50.2,
    convergence_rate=0.85
)

# Commit staged insights to the playbook
count = reflector.commit_insights()
```

### 3.3 AtomizerSessionState (`session_state.py`)

Manages session context with exposure control. It separates what the LLM sees on every turn from data that stays available but is loaded only on demand.

**Architecture:**

```
┌──────────────────────────────────────────────────────┐
│                 AtomizerSessionState                 │
├──────────────────────────────────────────────────────┤
│                                                      │
│ ┌──────────────────────────────────────────────────┐ │
│ │ ExposedState (Always in context)                 │ │
│ ├──────────────────────────────────────────────────┤ │
│ │ • task_type: TaskType                            │ │
│ │ • current_objective: str                         │ │
│ │ • recent_actions: List[str] (max 10)             │ │
│ │ • recent_errors: List[str] (max 5)               │ │
│ │ • study_name, status, trials, best_value         │ │
│ │ • active_playbook_items: List[str] (max 15)      │ │
│ └──────────────────────────────────────────────────┘ │
│                                                      │
│ ┌──────────────────────────────────────────────────┐ │
│ │ IsolatedState (On-demand access)                 │ │
│ ├──────────────────────────────────────────────────┤ │
│ │ • full_trial_history: List[Dict]                 │ │
│ │ • nx_model_path, nx_expressions                  │ │
│ │ • neural_predictions                             │ │
│ │ • last_solver_output, last_f06_content           │ │
│ │ • optimization_config, study_config              │ │
│ └──────────────────────────────────────────────────┘ │
│                                                      │
└──────────────────────────────────────────────────────┘
```

**Task Types:**

| TaskType | Description |
|----------|-------------|
| `CREATE_STUDY` | Setting up a new optimization |
| `RUN_OPTIMIZATION` | Executing optimization trials |
| `MONITOR_PROGRESS` | Checking optimization status |
| `ANALYZE_RESULTS` | Reviewing completed results |
| `DEBUG_ERROR` | Troubleshooting issues |
| `CONFIGURE_SETTINGS` | Modifying configuration |
| `EXPORT_DATA` | Exporting training data |
| `NEURAL_ACCELERATION` | Neural surrogate operations |

**Usage:**

```python
from optimization_engine.context.session_state import AtomizerSessionState, TaskType

session = AtomizerSessionState(session_id="session_001")
session.exposed.task_type = TaskType.RUN_OPTIMIZATION
session.exposed.study_name = "bracket_opt"

# Add action (auto-compresses old actions)
session.add_action("Started trial 42")

# Add error (highlighted in context)
session.add_error("Convergence failure", error_type="solver")

# Get context for LLM
context = session.get_llm_context()

# Access isolated data when needed
f06_content = session.load_isolated_data("last_f06_content")
```
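
`add_action()` "auto-compresses old actions". One way to picture that is a bounded list that folds its overflow into a running count. The limit of 10 comes from the `ExposedState` diagram. `RecentActions` and the summary wording are hypothetical, not the session module's implementation.

```python
from typing import List


class RecentActions:
    """Hypothetical sketch of a bounded action log with auto-compression.

    Keeps at most `max_items` actions verbatim; older ones are folded into a
    counter that is rendered as a single summary line.
    """

    def __init__(self, max_items: int = 10) -> None:
        self.max_items = max_items
        self.items: List[str] = []
        self.compressed_count = 0

    def add(self, action: str) -> None:
        self.items.append(action)
        overflow = len(self.items) - self.max_items
        if overflow > 0:
            # Fold the oldest entries into the running count
            self.compressed_count += overflow
            del self.items[:overflow]

    def render(self) -> List[str]:
        lines: List[str] = []
        if self.compressed_count:
            lines.append(f"[{self.compressed_count} earlier actions compressed]")
        return lines + self.items


actions = RecentActions(max_items=10)
for trial in range(13):
    actions.add(f"Started trial {trial}")
print(actions.render()[:2])
# → ['[3 earlier actions compressed]', 'Started trial 3']
```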

### 3.4 FeedbackLoop (`feedback_loop.py`)

Connects optimization outcomes to playbook updates, implementing the core learning mechanism.

**The Learning Mechanism:**

```
Trial Success + Playbook Item Active → helpful_count++
Trial Failure + Playbook Item Active → harmful_count++
```

This creates a self-improving system where:
- Good advice gets reinforced
- Bad advice gets demoted and eventually pruned
- Novel patterns are captured for future use
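
The two attribution rules above reduce to a few lines of bookkeeping. This sketch keeps the counts in a plain dictionary keyed by playbook item ID. `apply_trial_feedback` is a hypothetical helper, not part of `FeedbackLoop`'s API, which updates `PlaybookItem` counts itself.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable


def apply_trial_feedback(
    counts: Dict[str, Dict[str, int]],
    context_items_used: Iterable[str],
    success: bool,
) -> None:
    """Credit (or blame) every playbook item that was active during a trial."""
    key = "helpful" if success else "harmful"
    for item_id in context_items_used:
        counts[item_id][key] += 1


counts: DefaultDict[str, Dict[str, int]] = defaultdict(lambda: {"helpful": 0, "harmful": 0})
apply_trial_feedback(counts, ["str-00001", "mis-00003"], success=True)
apply_trial_feedback(counts, ["str-00001"], success=False)
print(dict(counts))
# → {'str-00001': {'helpful': 1, 'harmful': 1}, 'mis-00003': {'helpful': 1, 'harmful': 0}}
```

Every item active during a trial receives the same credit or blame, so any single trial is a noisy signal. The scheme presumably relies on many trials averaging out, and on the pruning threshold, to tolerate occasional unlucky failures.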

**Usage:**

```python
from pathlib import Path

from optimization_engine.context import FeedbackLoop

playbook_path = Path("knowledge_base/playbook.json")
feedback = FeedbackLoop(playbook_path)

# Process each trial, attributing the outcome to the playbook items that were in context
result = feedback.process_trial_result(
    trial_number=42,
    success=True,
    objective_value=100.5,
    design_variables={"thickness": 1.5},
    context_items_used=["str-00001", "mis-00003"],
    errors=None
)

# Finalize at study end
result = feedback.finalize_study({
    "name": "bracket_opt",
    "total_trials": 100,
    "best_value": 50.2,
    "convergence_rate": 0.85
})
# Example return value: {"insights_added": 15, "items_pruned": 2, ...}
```

### 3.5 CompactionManager (`compaction.py`)

Manages context for long-running optimizations whose event history may exceed the context window.

**Compaction Strategy:**

```
Before Compaction (55 events):
├── Event 1: Trial 1 complete
├── Event 2: Trial 2 complete
├── ...
├── Event 50: Trial 50 complete
├── Event 51: ERROR - Convergence failure    ← Preserved!
├── Event 52: Trial 52 complete
├── Event 53: Trial 53 complete
├── Event 54: Trial 54 complete
└── Event 55: Trial 55 complete

After Compaction (12 events):
├── 📦 Trials 1-50: Best=45.2, Avg=67.3, Failures=5
├── ❌ ERROR - Convergence failure           ← Still here!
├── Event 52: Trial 52 complete
├── Event 53: Trial 53 complete
├── Event 54: Trial 54 complete
└── Event 55: Trial 55 complete
```

**Key Features:**
- Errors are never compacted (with `keep_errors=True`)
- Milestones are preserved
- Recent events are kept in full detail
- Older events are replaced by a statistical summary
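
The compaction rules can be expressed as a single pass over the event list. The sketch mirrors the before/after figure when run with `keep_recent=4`. It assumes a flat list of events and a minimization objective (best = minimum). `Event` and `compact` are hypothetical names, and the real `CompactionManager` may summarize differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Event:
    kind: str                          # "trial", "error", or "milestone"
    text: str
    objective: Optional[float] = None
    success: bool = True


def compact(events: List[Event], threshold: int = 50, keep_recent: int = 20) -> List[Event]:
    """Summarize old trial events; keep errors, milestones, and the newest events verbatim."""
    if len(events) < threshold or len(events) <= keep_recent:
        return list(events)
    old, recent = events[:-keep_recent], events[-keep_recent:]
    preserved = [e for e in old if e.kind in ("error", "milestone")]
    trials = [e for e in old if e.kind == "trial"]
    values = [e.objective for e in trials if e.objective is not None]
    parts = [f"Trials compacted: {len(trials)}"]
    if values:
        # "best" assumes a minimization objective
        parts.append(f"best={min(values):.1f}, avg={mean(values):.1f}")
    parts.append(f"failures={sum(not e.success for e in trials)}")
    return [Event("summary", ", ".join(parts))] + preserved + recent


events = [Event("trial", f"Trial {i} complete", objective=100.0 - i) for i in range(1, 51)]
events.append(Event("error", "ERROR - Convergence failure", success=False))
events += [Event("trial", f"Trial {i} complete", objective=50.0) for i in range(52, 56)]

for event in compact(events, keep_recent=4):
    print(event.text)
```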

**Usage:**

```python
from optimization_engine.context.compaction import CompactionManager

manager = CompactionManager(
    compaction_threshold=50,  # Trigger at 50 events
    keep_recent=20,           # Always keep last 20
    keep_errors=True          # Never compact errors
)

# Add events
manager.add_trial_event(trial_number=42, success=True, objective=100.5)
manager.add_error_event("Convergence failure", error_type="solver")
manager.add_milestone("Reached 50% improvement", {"improvement": 0.5})

# Get context string
context = manager.get_context_string()
```

### 3.6 ContextCacheOptimizer (`cache_monitor.py`)

Structures the context so that its unchanging parts come first, which maximizes KV-cache (prompt-cache) hits. Cached input tokens can be up to ~10x cheaper than uncached ones. The overall saving depends on the cache hit rate, as the cost table below shows.

**Three-Tier Context Structure:**

```
┌─────────────────────────────────────────────────────┐
│ STABLE PREFIX (Cached across all requests)          │
│ • Atomizer identity and capabilities                │
│ • Tool schemas and definitions                      │
│ • Base protocol routing table                       │
│ Estimated: 5,000 tokens                             │
├─────────────────────────────────────────────────────┤
│ SEMI-STABLE (Cached per session type)               │
│ • Active protocol definition                        │
│ • Task-specific instructions                        │
│ • Relevant playbook items                           │
│ Estimated: 15,000 tokens                            │
├─────────────────────────────────────────────────────┤
│ DYNAMIC (Changes every turn)                        │
│ • Current session state                             │
│ • Recent actions/errors                             │
│ • User's latest message                             │
│ Estimated: 2,000 tokens                             │
└─────────────────────────────────────────────────────┘
```

**Cost Impact:**

| Scenario | Cache Hit Rate | Cost Reduction |
|----------|----------------|----------------|
| No caching | 0% | 0% |
| Stable prefix only | ~50% | ~45% |
| Stable + semi-stable | ~70% | ~63% |
| Optimal | ~90% | ~81% |
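
The percentages in the cost table match a simple model in which cache hits are billed at about one tenth of the normal input-token price, so the saving is 0.9 × the hit rate. The sketch below reproduces the table under that assumption. The 10% price ratio is itself an assumption and varies by provider and model.

```python
def estimated_savings(cache_hit_rate: float, cached_price_ratio: float = 0.1) -> float:
    """Fraction of input-token cost saved when `cache_hit_rate` of tokens hit the cache.

    Assumes cached tokens cost `cached_price_ratio` times the normal price.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    relative_cost = (1.0 - cache_hit_rate) + cache_hit_rate * cached_price_ratio
    return 1.0 - relative_cost


for rate in (0.0, 0.5, 0.7, 0.9):
    print(f"hit rate {rate:.0%} -> savings {estimated_savings(rate):.0%}")
# → hit rate 0% -> savings 0%
# → hit rate 50% -> savings 45%
# → hit rate 70% -> savings 63%
# → hit rate 90% -> savings 81%
```

The model ignores output tokens and any surcharge a provider may apply when writing a prefix into the cache, so real savings will be somewhat lower.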

**Usage:**

```python
from optimization_engine.context.cache_monitor import ContextCacheOptimizer, StablePrefixBuilder

optimizer = ContextCacheOptimizer()

# Build stable prefix
builder = StablePrefixBuilder()
builder.add_identity("I am Atomizer...")
builder.add_capabilities("I can optimize...")
builder.add_tools("Available tools...")
stable_prefix = builder.build()

# Placeholders for the semi-stable and dynamic tiers
protocol_content = "..."  # active protocol text
user_message = "..."      # latest user message

# Prepare context
context = optimizer.prepare_context(
    stable_prefix=stable_prefix,
    semi_stable=protocol_content,
    dynamic=user_message
)

# Check efficiency
print(optimizer.get_report())
# Example output:
# Cache Hits: 45/50 (90%)
# Estimated Savings: 81%
```

---

## 4. Implementation Details

### 4.1 File-by-File Breakdown

#### `playbook.py` (159 lines)

| Class/Function | Lines | Purpose |
|----------------|-------|---------|
| `InsightCategory` | 6 | Enum for insight types |
| `PlaybookItem` | 55 | Single insight with scoring |
| `AtomizerPlaybook` | 85 | Collection management |
| `get_playbook()` | 13 | Global singleton access |

**Key Design Decisions:**

1. **MD5 Deduplication**: Content is hashed for duplicate detection
2. **Neutral Confidence**: Untested items get 0.5 confidence (neutral)
3. **Source Tracking**: Items track which trials generated them
4. **Tag-based Filtering**: Flexible filtering via tags
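
Hash-based deduplication can be sketched with `hashlib.md5` from the standard library. The report states only that content is hashed. The whitespace and case normalization step and the `DedupStore` class below are assumptions for illustration. A single counter shared across categories reproduces IDs like `str-00001` and `mis-00002` from the architecture diagram, but whether the real counter is global is also an assumption.

```python
import hashlib
from typing import Dict


def content_key(content: str) -> str:
    """MD5 of whitespace- and case-normalized content (normalization is an assumption)."""
    normalized = " ".join(content.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class DedupStore:
    """Hypothetical store illustrating hash-based duplicate detection."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, str] = {}  # content hash -> item id
        self._next = 1

    def add(self, prefix: str, content: str) -> str:
        key = content_key(content)
        if key in self._by_hash:
            return self._by_hash[key]  # duplicate: reuse the existing item
        item_id = f"{prefix}-{self._next:05d}"
        self._next += 1
        self._by_hash[key] = item_id
        return item_id


store = DedupStore()
a = store.add("str", "Use shell elements for thin walls")
b = store.add("str", "  use shell elements   for thin walls ")
c = store.add("mis", "Don't set convergence < 1e-8")
print(a, b, c)  # → str-00001 str-00001 mis-00002
```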

#### `reflector.py` (138 lines)

| Class/Function | Lines | Purpose |
|----------------|-------|---------|
| `OptimizationOutcome` | 30 | Outcome data structure |
| `InsightCandidate` | 12 | Pending insight |
| `AtomizerReflector` | 90 | Analysis engine |
| `ERROR_PATTERNS` | 20 | Regex patterns for classification |

**Key Design Decisions:**

1. **Pattern-Based Classification**: Regex patterns identify error types
2. **Two-Phase Commit**: Insights are staged before commit
3. **Study-Level Analysis**: Generates insights from overall patterns
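
The two-phase commit can be modeled as a staging list that is filtered by confidence at commit time. `StagingArea` and `Candidate` are hypothetical. The report does not say whether `commit_insights()` discards or keeps candidates below `min_confidence`; this sketch simply discards them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    content: str
    confidence: float


@dataclass
class StagingArea:
    """Hypothetical sketch of staging insights before they reach the playbook."""
    pending: List[Candidate] = field(default_factory=list)
    committed: List[str] = field(default_factory=list)

    def stage(self, candidate: Candidate) -> None:
        self.pending.append(candidate)

    def commit(self, min_confidence: float = 0.0) -> int:
        accepted = [c for c in self.pending if c.confidence >= min_confidence]
        self.committed.extend(c.content for c in accepted)
        self.pending.clear()  # this sketch discards rejected candidates
        return len(accepted)


staging = StagingArea()
staging.stage(Candidate("Shell elements suit thin walls", 0.9))
staging.stage(Candidate("Possibly mesh-related failure", 0.4))
staging.stage(Candidate("TPE converged quickly on 6 variables", 0.7))
print(staging.commit(min_confidence=0.5), staging.committed)
# → 2 ['Shell elements suit thin walls', 'TPE converged quickly on 6 variables']
```

Staging lets the reflector accumulate candidates across a whole study and filter them once. Otherwise every tentative observation would be written straight into the persistent playbook.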

#### `session_state.py` (168 lines)

| Class/Function | Lines | Purpose |
|----------------|-------|---------|
| `TaskType` | 10 | Enum for task types |
| `ExposedState` | 25 | Always-visible state |
| `IsolatedState` | 20 | On-demand state |
| `AtomizerSessionState` | 100 | Main session class |
| Global functions | 13 | Session management |

**Key Design Decisions:**

1. **Explicit Separation**: Exposed vs. isolated state is enforced by the API
2. **Auto-Compression**: Actions are automatically compressed when the limit is exceeded
3. **Separate History File**: Trial history is saved separately to keep the main state small

#### `feedback_loop.py` (82 lines)

| Class/Function | Lines | Purpose |
|----------------|-------|---------|
| `FeedbackLoop` | 70 | Main learning loop |
| `FeedbackLoopFactory` | 12 | Factory methods |

**Key Design Decisions:**

1. **Attribution Tracking**: Records which items were active per trial
2. **Batch Processing**: Supports processing multiple trials
3. **Study Finalization**: Comprehensive cleanup at study end

#### `compaction.py` (169 lines)

| Class/Function | Lines | Purpose |
|----------------|-------|---------|
| `EventType` | 10 | Event type enum |
| `ContextEvent` | 25 | Single event |
| `CompactionManager` | 110 | Compaction logic |
| `ContextBudgetManager` | 24 | Token budgeting |

**Key Design Decisions:**

1. **Preserve Flag**: Events can be marked as never-compact
2. **Statistical Summary**: Compacted regions include statistics
3. **Time Range Tracking**: Compaction events track what they replaced

#### `cache_monitor.py` (135 lines)

| Class/Function | Lines | Purpose |
|----------------|-------|---------|
| `CacheStats` | 20 | Statistics tracking |
| `ContextSection` | 15 | Section tracking |
| `ContextCacheOptimizer` | 70 | Main optimizer |
| `StablePrefixBuilder` | 30 | Prefix construction |

**Key Design Decisions:**

1. **Hash-Based Detection**: MD5 hash detects prefix changes
2. **Token Estimation**: 4 chars ≈ 1 token
3. **Request History**: Keeps last 100 requests for analysis
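
Prefix-change detection and the 4-characters-per-token estimate can be sketched as follows. `PrefixTracker` and `estimate_tokens` are hypothetical. The sketch counts a request as a cache hit when its prefix hash matches the previous request's. That only approximates a provider's actual cache behaviour; for example, real caches also expire after a time-to-live.

```python
import hashlib
from typing import Optional


def estimate_tokens(text: str) -> int:
    """Rough token estimate using the 4-characters-per-token heuristic."""
    return len(text) // 4


class PrefixTracker:
    """Hypothetical sketch: a request is a cache hit if its stable prefix is unchanged."""

    def __init__(self) -> None:
        self._last_hash: Optional[str] = None
        self.hits = 0
        self.requests = 0

    def observe(self, stable_prefix: str) -> bool:
        digest = hashlib.md5(stable_prefix.encode("utf-8")).hexdigest()
        hit = digest == self._last_hash
        self._last_hash = digest
        self.requests += 1
        self.hits += int(hit)
        return hit

    @property
    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


tracker = PrefixTracker()
prefix = "I am Atomizer..."
results = [tracker.observe(prefix), tracker.observe(prefix), tracker.observe(prefix + " v2")]
print(results, f"{tracker.hit_rate:.2f}", estimate_tokens("x" * 20_000))
# → [False, True, False] 0.33 5000
```

With this heuristic, the ~5,000-token stable prefix in the three-tier diagram corresponds to roughly 20,000 characters.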

### 4.2 Error Tracker Plugin (`error_tracker.py`)

The error tracker is implemented as a `post_solve` hook that captures solver errors for learning.

**Hook Points:**
- `post_solve`: Called after the solver completes (success or failure)

**Features:**
- Automatic error classification
- F06 file parsing for error extraction
- Integration with LAC (Learning Atomizer Core), if available
- Persistent error log (`error_history.jsonl`)

---

## 5. Integration Points

### 5.1 OptimizationRunner Integration

Two approaches are provided:

#### Approach 1: Mixin (Recommended for new code)

```python
from optimization_engine.context.runner_integration import ContextEngineeringMixin
from optimization_engine.core.runner import OptimizationRunner


# The mixin is listed first so its methods take precedence in the MRO
class MyContextAwareRunner(ContextEngineeringMixin, OptimizationRunner):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.init_context_engineering()


runner = MyContextAwareRunner(config_path=...)
runner.run(n_trials=100)
```

#### Approach 2: Wrapper (For existing code)

```python
from optimization_engine.context.runner_integration import ContextAwareRunner
from optimization_engine.core.runner import OptimizationRunner

runner = OptimizationRunner(config_path=...)
context_runner = ContextAwareRunner(runner)

study = context_runner.run(n_trials=100)
report = context_runner.get_learning_report()
```

### 5.2 Dashboard Integration

The dashboard API is available at `/api/context/*`:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/context/playbook` | GET | Get playbook summary |
| `/api/context/playbook/items` | GET | List items with filtering |
| `/api/context/playbook/items/{id}` | GET | Get specific item |
| `/api/context/playbook/feedback` | POST | Record helpful/harmful |
| `/api/context/playbook/insights` | POST | Add new insight |
| `/api/context/playbook/items/{id}` | DELETE | Delete item |
| `/api/context/playbook/prune` | POST | Prune harmful items |
| `/api/context/playbook/context` | GET | Get LLM context string |
| `/api/context/session` | GET | Get session state |
| `/api/context/session/context` | GET | Get session context string |
| `/api/context/cache/stats` | GET | Get cache statistics |
| `/api/context/learning/report` | GET | Get learning report |

### 5.3 Claude Code Integration

The bootstrap file (`.claude/skills/00_BOOTSTRAP_V2.md`) provides:

1. **Session Initialization**: Load playbook and session state
2. **Task Routing**: Map user intent to task type
3. **Context Loading**: Filter playbook by task type
4. **Real-Time Recording**: Record insights immediately
5. **Session Closing**: Finalize and save learnings

---

## 6. API Reference

### 6.1 Python API

#### AtomizerPlaybook

```python
# Signature summary (method bodies omitted)
from pathlib import Path
from typing import List, Optional


class AtomizerPlaybook:
    """Evolving playbook that accumulates optimization knowledge."""

    def add_insight(
        self,
        category: "InsightCategory",
        content: str,
        source_trial: Optional[int] = None,
        tags: Optional[List[str]] = None,
    ) -> "PlaybookItem":
        """Add new insight (auto-deduplicates)."""

    def record_outcome(self, item_id: str, helpful: bool) -> bool:
        """Record whether using insight was helpful/harmful."""

    def get_context_for_task(
        self,
        task_type: str,
        max_items: int = 20,
        min_confidence: float = 0.5,
        tags: Optional[List[str]] = None,
    ) -> str:
        """Generate context string for LLM."""

    def search_by_content(
        self,
        query: str,
        category: Optional["InsightCategory"] = None,
        limit: int = 5,
    ) -> List["PlaybookItem"]:
        """Search items by content."""

    def prune_harmful(self, threshold: int = -3) -> int:
        """Remove items with net_score <= threshold."""

    def save(self, path: Path) -> None:
        """Persist to JSON."""

    @classmethod
    def load(cls, path: Path) -> "AtomizerPlaybook":
        """Load from JSON."""
```

#### AtomizerReflector

```python
# Signature summary (method bodies omitted)
from typing import List


class AtomizerReflector:
    """Analyzes optimization outcomes to extract insights."""

    def analyze_trial(self, outcome: "OptimizationOutcome") -> List["InsightCandidate"]:
        """Analyze single trial, return insight candidates."""

    def analyze_study_completion(
        self,
        study_name: str,
        total_trials: int,
        best_value: float,
        convergence_rate: float,
        method: str = "",
    ) -> List["InsightCandidate"]:
        """Analyze completed study."""

    def commit_insights(self, min_confidence: float = 0.0) -> int:
        """Commit pending insights to playbook."""
```

#### FeedbackLoop

```python
# Signature summary (method bodies omitted)
from typing import Any, Dict, List, Optional


class FeedbackLoop:
    """Automated feedback loop that learns from optimization."""

    def process_trial_result(
        self,
        trial_number: int,
        success: bool,
        objective_value: float,
        design_variables: Dict[str, float],
        context_items_used: Optional[List[str]] = None,
        errors: Optional[List[str]] = None,
    ) -> Dict[str, Any]:
        """Process trial and update playbook."""

    def finalize_study(self, study_stats: Dict[str, Any]) -> Dict[str, Any]:
        """Finalize study, commit insights, prune harmful."""
```

### 6.2 REST API

#### GET /api/context/playbook/items

Query Parameters:
- `category` (str): Filter by category (str, mis, tool, etc.)
- `min_score` (int): Minimum net score
- `min_confidence` (float): Minimum confidence (0.0-1.0)
- `limit` (int): Maximum items (default 50)
- `offset` (int): Pagination offset

Response:
```json
[
  {
    "id": "str-00001",
    "category": "str",
    "content": "Use shell elements for thin walls",
    "helpful_count": 8,
    "harmful_count": 0,
    "net_score": 8,
    "confidence": 1.0,
    "tags": ["mesh", "shell"],
    "created_at": "2025-12-29T10:00:00",
    "last_used": "2025-12-29T15:30:00"
  }
]
```

#### POST /api/context/playbook/feedback

Request:
```json
{
  "item_id": "str-00001",
  "helpful": true
}
```

Response:
```json
{
  "item_id": "str-00001",
  "new_score": 9,
  "new_confidence": 1.0,
  "helpful_count": 9,
  "harmful_count": 0
}
```
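
A client for these two endpoints needs only the standard library. The sketch builds the requests without sending them; pass a `Request` to `urllib.request.urlopen` to send it. The base URL and port are assumptions, since the report gives only the `/api/context/*` paths.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical dashboard address


def items_url(category: Optional[str] = None, min_confidence: Optional[float] = None,
              limit: int = 50, offset: int = 0) -> str:
    """URL for GET /api/context/playbook/items with the documented query parameters."""
    params: Dict[str, Any] = {"limit": limit, "offset": offset}
    if category is not None:
        params["category"] = category
    if min_confidence is not None:
        params["min_confidence"] = min_confidence
    return f"{BASE_URL}/api/context/playbook/items?{urlencode(params)}"


def feedback_request(item_id: str, helpful: bool) -> Request:
    """POST request for /api/context/playbook/feedback with a JSON body."""
    body = json.dumps({"item_id": item_id, "helpful": helpful}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/context/playbook/feedback",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


print(items_url(category="mis", min_confidence=0.6))
# → http://localhost:8000/api/context/playbook/items?limit=50&offset=0&category=mis&min_confidence=0.6
```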

---

## 7. Testing

### 7.1 Test Coverage

| Test File | Tests | Coverage |
|-----------|-------|----------|
| `test_context_engineering.py` | 44 | Unit tests |
| `test_context_integration.py` | 16 | Integration tests |
| **Total** | **60** | **100% pass** |

### 7.2 Test Categories

#### Unit Tests (`test_context_engineering.py`)

| Class | Tests | Description |
|-------|-------|-------------|
| `TestAtomizerPlaybook` | 10 | Playbook CRUD, scoring, persistence |
| `TestAtomizerReflector` | 6 | Outcome analysis, insight extraction |
| `TestSessionState` | 9 | State management, isolation |
| `TestCompactionManager` | 7 | Compaction triggers, error preservation |
| `TestCacheMonitor` | 5 | Cache hit detection, prefix building |
| `TestFeedbackLoop` | 5 | Trial processing, finalization |
| `TestContextBudgetManager` | 2 | Budget tracking |

#### Integration Tests (`test_context_integration.py`)

| Class | Tests | Description |
|-------|-------|-------------|
| `TestFullOptimizationPipeline` | 4 | End-to-end optimization cycles |
| `TestReflectorLearningPatterns` | 2 | Pattern learning verification |
| `TestErrorTrackerIntegration` | 2 | Error capture and classification |
| `TestPlaybookContextGeneration` | 3 | Context filtering and ordering |

### 7.3 Running Tests

```bash
# Run all context engineering tests
pytest tests/test_context_engineering.py tests/test_context_integration.py -v

# Run specific test class
pytest tests/test_context_engineering.py::TestAtomizerPlaybook -v

# Run with coverage (requires the pytest-cov plugin)
pytest tests/test_context_engineering.py --cov=optimization_engine.context
```

---

## 8. Usage Guide

### 8.1 Quick Start

```python
from pathlib import Path

from optimization_engine.context import (
    AtomizerPlaybook,
    FeedbackLoop,
    InsightCategory
)

# Initialize
playbook_path = Path("knowledge_base/playbook.json")
feedback = FeedbackLoop(playbook_path)
best_value = float("inf")  # assumes a minimization objective

# Run your optimization loop
for trial in range(100):
    # Hypothetical helper: replace with your own trial execution.
    # `result` must expose .success, .objective and .params.
    result = run_trial(trial)

    feedback.process_trial_result(
        trial_number=trial,
        success=result.success,
        objective_value=result.objective,
        design_variables=result.params
    )
    if result.success:
        best_value = min(best_value, result.objective)

# Finalize
report = feedback.finalize_study({
    "name": "my_study",
    "total_trials": 100,
    "best_value": best_value,
    "convergence_rate": 0.85
})

print(f"Added {report['insights_added']} insights")
```

### 8.2 Adding Insights Manually

```python
from optimization_engine.context import get_playbook, InsightCategory, save_playbook

playbook = get_playbook()

# Add a strategy insight
playbook.add_insight(
    category=InsightCategory.STRATEGY,
    content="For mirror optimization, use Zernike basis functions",
    tags=["mirror", "zernike", "optics"]
)

# Add a mistake insight
playbook.add_insight(
    category=InsightCategory.MISTAKE,
    content="Don't use convergence tolerance < 1e-10 for nonlinear analysis",
    tags=["convergence", "nonlinear", "solver"]
)

# Persist the global playbook
save_playbook()
```

### 8.3 Querying the Playbook

```python
from optimization_engine.context import get_playbook, InsightCategory

playbook = get_playbook()

# Get context for optimization task
context = playbook.get_context_for_task(
    task_type="optimization",
    max_items=15,
    min_confidence=0.6
)

# Search for specific topics
mesh_insights = playbook.search_by_content("mesh", limit=5)

# Get all mistakes
mistakes = playbook.get_by_category(InsightCategory.MISTAKE)

# Get statistics
stats = playbook.get_stats()
print(f"Total items: {stats['total_items']}")
print(f"By category: {stats['by_category']}")
```
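
`get_context_for_task` drops items below `min_confidence` and returns at most `max_items`. One plausible reading, assumed here rather than taken from `playbook.py`, is that confidence is the helpful share of all votes, helpful / (helpful + harmful), with items ranked best first. `Item` and `select_context` are hypothetical names, not the playbook's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """Hypothetical minimal playbook item carrying the helpful/harmful counters."""
    id: str
    content: str
    helpful_count: int = 0
    harmful_count: int = 0

    @property
    def confidence(self) -> float:
        # Assumed formula: share of helpful votes; 0.5 when there is no evidence yet
        total = self.helpful_count + self.harmful_count
        return 0.5 if total == 0 else self.helpful_count / total


def select_context(items: List[Item], max_items: int = 15,
                   min_confidence: float = 0.6) -> List[Item]:
    """Keep items at or above min_confidence, best first, capped at max_items."""
    eligible = [it for it in items if it.confidence >= min_confidence]
    # Rank by confidence, breaking ties by net helpful votes
    eligible.sort(key=lambda it: (it.confidence, it.helpful_count - it.harmful_count),
                  reverse=True)
    return eligible[:max_items]
```

For items scored 9/1, 5/1, 0/0 and 1/3 (helpful/harmful), `select_context(items, max_items=2)` keeps the 0.9 and 0.83 items, in that order.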

### 8.4 Managing Session State

```python
from optimization_engine.context import get_session, TaskType

session = get_session()

# Set task context
session.exposed.task_type = TaskType.RUN_OPTIMIZATION
session.exposed.study_name = "bracket_opt_v2"

# Track progress
session.update_study_status(
    name="bracket_opt_v2",
    status="running",
    trials_completed=45,
    trials_total=100,
    best_value=123.5,
    best_trial=38
)

# Record actions and errors
session.add_action("Started trial 46")
session.add_error("Minor convergence warning", error_type="warning")

# Get LLM context
context = session.get_llm_context()
```
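
The example writes task details into `session.exposed`. Assuming the usual reading of an exposed/isolated split (exposed fields are rendered into the prompt, isolated fields never are), the idea can be sketched with plain dataclasses. `ExposedState`, `IsolatedState`, `Session`, and every field shown are hypothetical, and the real `get_llm_context` output format may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExposedState:
    """Hypothetical fields intended to reach the LLM prompt."""
    task_type: str = ""
    study_name: str = ""
    recent_actions: List[str] = field(default_factory=list)


@dataclass
class IsolatedState:
    """Hypothetical bulky or sensitive data kept out of the prompt."""
    raw_solver_logs: List[str] = field(default_factory=list)


@dataclass
class Session:
    exposed: ExposedState = field(default_factory=ExposedState)
    isolated: IsolatedState = field(default_factory=IsolatedState)

    def get_llm_context(self, max_actions: int = 5) -> str:
        # Only exposed fields are rendered; isolated data never reaches the prompt
        lines = [f"Task: {self.exposed.task_type}",
                 f"Study: {self.exposed.study_name}"]
        # Keep just the most recent actions to bound prompt size
        lines += [f"- {a}" for a in self.exposed.recent_actions[-max_actions:]]
        return "\n".join(lines)
```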

---

## 9. Migration Guide

### 9.1 From LAC to Playbook

The Learning Atomizer Core (LAC) system is superseded by the Playbook system. Key differences:

| Aspect | LAC | Playbook |
|--------|-----|----------|
| Storage | Multiple JSONL files | Single JSON file |
| Scoring | Simple confidence | Helpful/harmful counts |
| Deduplication | Manual | Automatic (hash-based) |
| Pruning | Manual | Automatic (threshold-based) |
| Integration | Separate scripts | Built into runner |
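
The table credits the playbook with hash-based deduplication and threshold-based pruning. A minimal sketch of both ideas follows. The normalization rule, the choice to count a re-added duplicate as a helpful vote, and the `harmful - helpful >= threshold` pruning rule are assumptions for illustration, and `MiniPlaybook` is not the real `AtomizerPlaybook`.

```python
import hashlib
from typing import Dict


def content_hash(text: str) -> str:
    """Hash normalized text so case and whitespace differences collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


class MiniPlaybook:
    """Hypothetical sketch of hash-based dedup and threshold pruning."""

    def __init__(self, prune_threshold: int = 3):
        self.items: Dict[str, dict] = {}  # keyed by content hash
        self.prune_threshold = prune_threshold

    def add(self, content: str) -> str:
        key = content_hash(content)
        if key in self.items:
            # Assumed policy: a duplicate reinforces the existing item
            self.items[key]["helpful_count"] += 1
        else:
            self.items[key] = {"content": content,
                               "helpful_count": 0, "harmful_count": 0}
        return key

    def prune(self) -> int:
        """Drop items whose harmful votes exceed helpful votes by the threshold."""
        doomed = [k for k, it in self.items.items()
                  if it["harmful_count"] - it["helpful_count"] >= self.prune_threshold]
        for k in doomed:
            del self.items[k]
        return len(doomed)
```

Hashing normalized text catches only near-verbatim repeats; paraphrased duplicates need semantic comparison, which is why embedding-based search appears among the planned enhancements.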

### 9.2 Migration Steps

1. **Export existing LAC data:**
   ```python
   import json
   from pathlib import Path

   # Read old LAC files (one JSON record per line)
   lac_data = []
   for jsonl_file in Path("knowledge_base/lac/session_insights").glob("*.jsonl"):
       with open(jsonl_file, encoding="utf-8") as f:
           for line in f:
               if line.strip():  # skip blank lines, which json.loads rejects
                   lac_data.append(json.loads(line))
   ```

2. **Convert to playbook:**
   ```python
   from pathlib import Path

   from optimization_engine.context import AtomizerPlaybook, InsightCategory

   playbook = AtomizerPlaybook()

   # Map legacy LAC categories onto playbook categories
   category_map = {
       "failure": InsightCategory.MISTAKE,
       "success_pattern": InsightCategory.STRATEGY,
       "workaround": InsightCategory.WORKFLOW,
       "user_preference": InsightCategory.WORKFLOW,
       "protocol_clarification": InsightCategory.DOMAIN
   }

   for item in lac_data:
       # Unrecognized LAC categories fall back to DOMAIN
       category = category_map.get(item["category"], InsightCategory.DOMAIN)
       playbook.add_insight(
           category=category,
           content=item["insight"],
           tags=item.get("tags", [])
       )

   playbook.save(Path("knowledge_base/playbook.json"))
   ```
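
Because `category_map.get(..., InsightCategory.DOMAIN)` silently files any unrecognized LAC category under `DOMAIN`, it can be worth counting unmapped categories before saving. The helper below is a hypothetical pre-check, not part of the migration tooling; it also tolerates records with no `category` key, which the conversion loop would reject with a `KeyError`.

```python
from collections import Counter
from typing import Dict, Iterable


def unmapped_categories(records: Iterable[dict], known: Iterable[str]) -> Dict[str, int]:
    """Count LAC categories that the category map does not cover.

    Records lacking a category are reported under "<missing>".
    """
    known_set = set(known)
    counts = Counter(r.get("category", "<missing>") for r in records)
    return {cat: n for cat, n in counts.items() if cat not in known_set}
```

Calling `unmapped_categories(lac_data, category_map)` works directly, since iterating a dict yields its keys; an empty result means every record maps explicitly.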

### 9.3 Updating Bootstrap

Replace `00_BOOTSTRAP.md` with `00_BOOTSTRAP_V2.md`:

```bash
# Backup old bootstrap
cp .claude/skills/00_BOOTSTRAP.md .claude/skills/00_BOOTSTRAP_v1_backup.md

# Use new bootstrap
cp .claude/skills/00_BOOTSTRAP_V2.md .claude/skills/00_BOOTSTRAP.md
```

---

## 10. Future Enhancements

### 10.1 Planned Improvements

| Enhancement | Priority | Description |
|-------------|----------|-------------|
| Embedding-based search | High | Replace keyword search with semantic embeddings |
| Cross-study learning | High | Share insights across different geometry types |
| Confidence decay | Medium | Reduce confidence of old, unused insights |
| Multi-user support | Medium | Per-user playbooks with shared base |
| Automatic tagging | Low | LLM-generated tags for insights |
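
Confidence decay is listed as planned rather than implemented. One common way to realize it is exponential decay keyed on the `last_used` timestamp that the playbook schema already stores. The sketch assumes a 90-day half-life, an arbitrary illustrative value, and `decayed_confidence` is a hypothetical name.

```python
import math
from datetime import datetime


def decayed_confidence(confidence: float, last_used: datetime, now: datetime,
                       half_life_days: float = 90.0) -> float:
    """Exponentially decay confidence by time since the item was last used.

    After one half-life of disuse the confidence is halved; a last_used
    timestamp in the future is treated as zero idle time.
    """
    idle_days = max((now - last_used).total_seconds() / 86400.0, 0.0)
    return confidence * math.pow(0.5, idle_days / half_life_days)
```

`last_used` values from the playbook JSON parse with `datetime.fromisoformat("2025-12-29T15:30:00")`. With the default half-life, an item at 0.8 that has been idle for exactly 90 days decays to 0.4.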

### 10.2 Architecture Improvements

1. **Vector Database Integration**
   - Use embeddings for semantic similarity
   - Better duplicate detection
   - More relevant context retrieval

2. **Hierarchical Playbooks**
   - Global → Domain → Study hierarchy
   - Inherit and override patterns

3. **Active Learning**
   - Identify uncertain items
   - Request explicit feedback from users
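
The Global → Domain → Study hierarchy maps naturally onto layered lookups. As a sketch only, Python's `collections.ChainMap` provides inherit-and-override semantics: reads fall through study, then domain, then global, while writes land in the study layer. The layer names, keys, and placeholder entries below are hypothetical.

```python
from collections import ChainMap

# Hypothetical layers; keys identify a topic, values are insight text
global_layer = {
    "solver_tolerance": "Don't use convergence tolerance < 1e-10 for nonlinear analysis",
    "parametrization": "Generic parametrization guidance (placeholder)",
}
domain_layer = {  # e.g. an optics domain overriding one global entry
    "parametrization": "For mirror optimization, use Zernike basis functions",
}
study_layer = {}  # starts empty; study-specific learnings accumulate here

# Lookups search study -> domain -> global; the first hit wins
playbook_view = ChainMap(study_layer, domain_layer, global_layer)

# Writes through the view only touch the first (study) layer,
# so shared domain and global knowledge is never modified by one study
playbook_view["mesh"] = "Study-specific mesh note (placeholder)"
```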

---

## Appendix A: File Manifest

| File | Size | Description |
|------|------|-------------|
| `optimization_engine/context/__init__.py` | 1.2 KB | Module exports |
| `optimization_engine/context/playbook.py` | 8.5 KB | Playbook implementation |
| `optimization_engine/context/reflector.py` | 6.8 KB | Reflector implementation |
| `optimization_engine/context/session_state.py` | 8.2 KB | Session state |
| `optimization_engine/context/cache_monitor.py` | 5.9 KB | Cache optimization |
| `optimization_engine/context/feedback_loop.py` | 5.1 KB | Feedback loop |
| `optimization_engine/context/compaction.py` | 7.4 KB | Compaction manager |
| `optimization_engine/context/runner_integration.py` | 6.8 KB | Runner integration |
| `optimization_engine/plugins/post_solve/error_tracker.py` | 4.2 KB | Error tracker hook |
| `atomizer-dashboard/backend/api/routes/context.py` | 6.5 KB | REST API |
| `.claude/skills/00_BOOTSTRAP_V2.md` | 8.9 KB | Enhanced bootstrap |
| `tests/test_context_engineering.py` | 11.2 KB | Unit tests |
| `tests/test_context_integration.py` | 8.8 KB | Integration tests |

**Total: ~90 KB of new code and documentation**

---

## Appendix B: Configuration Reference

### Playbook JSON Schema

```json
{
  "version": 1,
  "last_updated": "2025-12-29T10:00:00",
  "items": {
    "str-00001": {
      "id": "str-00001",
      "category": "str",
      "content": "Insight text here",
      "helpful_count": 5,
      "harmful_count": 1,
      "created_at": "2025-12-29T10:00:00",
      "last_used": "2025-12-29T15:30:00",
      "source_trials": [42, 67],
      "tags": ["tag1", "tag2"]
    }
  }
}
```
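
To load a playbook file with type checking, the schema can be mirrored in a dataclass. This is a hypothetical loader for illustration; `AtomizerPlaybook` has its own serialization. The strict `PlaybookItem(**entry)` call raises `TypeError` on unknown keys, and the loader also checks that each item's key matches its `id`.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class PlaybookItem:
    """Hypothetical typed mirror of one entry under "items"."""
    id: str
    category: str
    content: str
    helpful_count: int = 0
    harmful_count: int = 0
    created_at: Optional[datetime] = None
    last_used: Optional[datetime] = None
    source_trials: List[int] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


def load_items(raw: str) -> Dict[str, PlaybookItem]:
    """Parse a playbook JSON document into typed items."""
    doc = json.loads(raw)
    items: Dict[str, PlaybookItem] = {}
    for key, entry in doc["items"].items():
        entry = dict(entry)  # copy so the parsed document is left untouched
        for ts in ("created_at", "last_used"):
            if entry.get(ts):
                entry[ts] = datetime.fromisoformat(entry[ts])
        item = PlaybookItem(**entry)
        if item.id != key:
            raise ValueError(f"item key {key!r} does not match id {item.id!r}")
        items[key] = item
    return items
```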

### Context Budget Defaults

```python
DEFAULT_BUDGET = {
    "stable_prefix": 5000,  # tokens
    "protocols": 10000,
    "playbook": 5000,
    "session_state": 2000,
    "conversation": 30000,
    "working_space": 48000,
    "total": 100000
}
```
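
The section budgets sum exactly to the total (5,000 + 10,000 + 5,000 + 2,000 + 30,000 + 48,000 = 100,000). A small check with hypothetical helper names can guard that invariant when budgets are edited and report which sections overrun; applied to `DEFAULT_BUDGET`, `check_budget` returns 100000.

```python
from typing import Dict


def check_budget(budget: Dict[str, int]) -> int:
    """Verify section allocations add up to the declared total; return the sum."""
    allocated = sum(v for k, v in budget.items() if k != "total")
    if allocated != budget["total"]:
        raise ValueError(
            f"sections allocate {allocated} tokens, but total is {budget['total']}")
    return allocated


def over_budget(usage: Dict[str, int], budget: Dict[str, int]) -> Dict[str, int]:
    """Return, per section, how many tokens usage exceeds its allocation."""
    return {k: used - budget[k] for k, used in usage.items()
            if k in budget and k != "total" and used > budget[k]}
```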

---

*Document generated: December 29, 2025*
*Implementation complete: 60/60 tests passing*