feat: Implement ACE Context Engineering framework (SYS_17)

Complete implementation of the Agentic Context Engineering (ACE) framework.

Core modules (optimization_engine/context/):
- playbook.py: AtomizerPlaybook with helpful/harmful scoring
- reflector.py: AtomizerReflector for insight extraction
- session_state.py: Context isolation (exposed/isolated state)
- feedback_loop.py: Automated learning from trial results
- compaction.py: Long-session context management
- cache_monitor.py: KV-cache optimization tracking
- runner_integration.py: OptimizationRunner integration

Dashboard integration:
- context.py: 12 REST API endpoints for playbook management

Tests:
- test_context_engineering.py: 44 unit tests
- test_context_integration.py: 16 integration tests

Documentation:
- CONTEXT_ENGINEERING_REPORT.md: Comprehensive implementation report
- CONTEXT_ENGINEERING_API.md: Complete API reference
- SYS_17_CONTEXT_ENGINEERING.md: System protocol
- Updated cheatsheet with SYS_17 quick reference
- Enhanced bootstrap (00_BOOTSTRAP_V2.md)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -172,7 +172,7 @@ studies/{geometry_type}/{study_name}/
 │ SYS_10: IMSO (single-obj)       SYS_11: Multi-objective          │
 │ SYS_12: Extractors              SYS_13: Dashboard                │
 │ SYS_14: Neural Accel            SYS_15: Method Selector          │
-│ SYS_16: Study Insights                                           │
+│ SYS_16: Study Insights          SYS_17: Context Engineering      │
 └─────────────────────────────────────────────────────────────────┘
                                 ▼
 ┌─────────────────────────────────────────────────────────────────┐
425  .claude/skills/00_BOOTSTRAP_V2.md  Normal file
@@ -0,0 +1,425 @@
---
skill_id: SKILL_000
version: 3.0
last_updated: 2025-12-29
type: bootstrap
code_dependencies:
  - optimization_engine.context.playbook
  - optimization_engine.context.session_state
  - optimization_engine.context.feedback_loop
requires_skills: []
---
# Atomizer LLM Bootstrap v3.0 - Context-Aware Sessions

**Version**: 3.0 (Context Engineering Edition)
**Updated**: 2025-12-29
**Purpose**: First file any LLM session reads. Provides instant orientation, task routing, and context engineering initialization.

---

## Quick Orientation (30 Seconds)

**Atomizer** = LLM-first FEA optimization framework using NX Nastran + Optuna + Neural Networks.

**Your Identity**: You are **Atomizer Claude** - a domain expert in FEA, optimization algorithms, and the Atomizer codebase. Not a generic assistant.

**Core Philosophy**: "Talk, don't click." Users describe what they want; you configure and execute.

**NEW in v3.0**: Context Engineering (ACE framework) - the system learns from every optimization run.

---

## Session Startup Checklist

On **every new session**, complete these steps:
```
┌─────────────────────────────────────────────────────────────────────┐
│ SESSION STARTUP (v3.0)                                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│ STEP 1: Initialize Context Engineering                              │
│   □ Load playbook from knowledge_base/playbook.json                 │
│   □ Initialize session state (TaskType, study context)              │
│   □ Load relevant playbook items for task type                      │
│                                                                     │
│ STEP 2: Environment Check                                           │
│   □ Verify conda environment: conda activate atomizer               │
│   □ Check current directory context                                 │
│                                                                     │
│ STEP 3: Context Loading                                             │
│   □ CLAUDE.md loaded (system instructions)                          │
│   □ This file (00_BOOTSTRAP_V2.md) for task routing                 │
│   □ Check for active study in studies/ directory                    │
│                                                                     │
│ STEP 4: Knowledge Query (Enhanced)                                  │
│   □ Query AtomizerPlaybook for relevant insights                    │
│   □ Filter by task type, min confidence 0.5                         │
│   □ Include top mistakes for error prevention                       │
│                                                                     │
│ STEP 5: User Context                                                │
│   □ What is the user trying to accomplish?                          │
│   □ Is there an active study context?                               │
│   □ What privilege level? (default: user)                           │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
### Context Engineering Initialization

```python
# On session start, initialize context engineering
from pathlib import Path

from optimization_engine.context import (
    AtomizerPlaybook,
    AtomizerSessionState,
    InsightCategory,
    TaskType,
    get_session
)

# Load playbook
playbook = AtomizerPlaybook.load(Path("knowledge_base/playbook.json"))

# Initialize session
session = get_session()
session.exposed.task_type = TaskType.CREATE_STUDY  # Update based on user intent

# Get relevant knowledge
playbook_context = playbook.get_context_for_task(
    task_type="optimization",
    max_items=15,
    min_confidence=0.5
)

# Always include recent mistakes for error prevention
mistakes = playbook.get_by_category(InsightCategory.MISTAKE, min_score=-2)
```
---

## Task Classification Tree

When a user request arrives, classify it and update session state:

```
User Request
│
├─► CREATE something?
│     ├─ "new study", "set up", "create", "optimize this"
│     ├─ session.exposed.task_type = TaskType.CREATE_STUDY
│     └─► Load: OP_01_CREATE_STUDY.md + core/study-creation-core.md
│
├─► RUN something?
│     ├─ "start", "run", "execute", "begin optimization"
│     ├─ session.exposed.task_type = TaskType.RUN_OPTIMIZATION
│     └─► Load: OP_02_RUN_OPTIMIZATION.md
│
├─► CHECK status?
│     ├─ "status", "progress", "how many trials", "what's happening"
│     ├─ session.exposed.task_type = TaskType.MONITOR_PROGRESS
│     └─► Load: OP_03_MONITOR_PROGRESS.md
│
├─► ANALYZE results?
│     ├─ "results", "best design", "compare", "pareto"
│     ├─ session.exposed.task_type = TaskType.ANALYZE_RESULTS
│     └─► Load: OP_04_ANALYZE_RESULTS.md
│
├─► DEBUG/FIX error?
│     ├─ "error", "failed", "not working", "crashed"
│     ├─ session.exposed.task_type = TaskType.DEBUG_ERROR
│     └─► Load: OP_06_TROUBLESHOOT.md + playbook[MISTAKE]
│
├─► MANAGE disk space?
│     ├─ "disk", "space", "cleanup", "archive", "storage"
│     └─► Load: OP_07_DISK_OPTIMIZATION.md
│
├─► CONFIGURE settings?
│     ├─ "change", "modify", "settings", "parameters"
│     ├─ session.exposed.task_type = TaskType.CONFIGURE_SETTINGS
│     └─► Load relevant SYS_* protocol
│
├─► NEURAL acceleration?
│     ├─ "neural", "surrogate", "turbo", "GNN"
│     ├─ session.exposed.task_type = TaskType.NEURAL_ACCELERATION
│     └─► Load: SYS_14_NEURAL_ACCELERATION.md
│
└─► EXTEND functionality?
      ├─ "add extractor", "new hook", "create protocol"
      └─► Check privilege, then load EXT_* protocol
```
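The branching above can be sketched as a simple keyword matcher. This is a minimal illustration: the keyword lists come from the tree above, but the `TaskType` enum here is a local stand-in for the one in `optimization_engine.context`, and the real routing may be more sophisticated.

```python
from enum import Enum
from typing import Optional

class TaskType(Enum):
    # Local stand-in for optimization_engine.context.TaskType
    CREATE_STUDY = "create_study"
    RUN_OPTIMIZATION = "run_optimization"
    MONITOR_PROGRESS = "monitor_progress"
    ANALYZE_RESULTS = "analyze_results"
    DEBUG_ERROR = "debug_error"

# First matching bucket wins, following the order of the tree above
KEYWORDS = [
    (TaskType.CREATE_STUDY, ["new study", "set up", "create", "optimize this"]),
    (TaskType.RUN_OPTIMIZATION, ["start", "run", "execute", "begin optimization"]),
    (TaskType.MONITOR_PROGRESS, ["status", "progress", "how many trials"]),
    (TaskType.ANALYZE_RESULTS, ["results", "best design", "compare", "pareto"]),
    (TaskType.DEBUG_ERROR, ["error", "failed", "not working", "crashed"]),
]

def classify(request: str) -> Optional[TaskType]:
    """Return the first TaskType whose keywords appear in the request."""
    text = request.lower()
    for task_type, words in KEYWORDS:
        if any(w in text for w in words):
            return task_type
    return None
```

Once classified, the result would be written to `session.exposed.task_type` and the corresponding protocol file loaded.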

---

## Protocol Routing Table (With Context Loading)

| User Intent | Keywords | Protocol | Skill to Load | Playbook Filter |
|-------------|----------|----------|---------------|-----------------|
| Create study | "new", "set up", "create" | OP_01 | study-creation-core.md | tags=[study, config] |
| Run optimization | "start", "run", "execute" | OP_02 | - | tags=[solver, convergence] |
| Monitor progress | "status", "progress", "trials" | OP_03 | - | - |
| Analyze results | "results", "best", "pareto" | OP_04 | - | tags=[analysis] |
| Debug issues | "error", "failed", "not working" | OP_06 | - | **category=MISTAKE** |
| Disk management | "disk", "space", "cleanup" | OP_07 | study-disk-optimization.md | - |
| Neural surrogates | "neural", "surrogate", "turbo" | SYS_14 | neural-acceleration.md | tags=[neural, surrogate] |
---

## Playbook Integration Pattern

### Loading Playbook Context

```python
from optimization_engine.context import (
    AtomizerPlaybook, AtomizerSessionState, InsightCategory, TaskType
)

def load_context_for_task(task_type: TaskType, session: AtomizerSessionState):
    """Load full context including playbook for LLM consumption."""
    context_parts = []

    # 1. Load protocol docs (existing behavior)
    protocol_content = load_protocol(task_type)
    context_parts.append(protocol_content)

    # 2. Load session state (exposed only)
    context_parts.append(session.get_llm_context())

    # 3. Load relevant playbook items
    playbook = AtomizerPlaybook.load(PLAYBOOK_PATH)
    playbook_context = playbook.get_context_for_task(
        task_type=task_type.value,
        max_items=15,
        min_confidence=0.6
    )
    context_parts.append(playbook_context)

    # 4. Add error-specific items if debugging
    if task_type == TaskType.DEBUG_ERROR:
        mistakes = playbook.get_by_category(InsightCategory.MISTAKE)
        for item in mistakes[:5]:
            context_parts.append(item.to_context_string())

    return "\n\n---\n\n".join(context_parts)
```
### Real-Time Recording

**CRITICAL**: Record insights IMMEDIATELY when they occur. Do not wait until session end.

```python
# On discovering a workaround
playbook.add_insight(
    category=InsightCategory.WORKFLOW,
    content="For mesh update issues, load _i.prt file before UpdateFemodel()",
    tags=["mesh", "nx", "update"]
)
playbook.save(PLAYBOOK_PATH)

# On trial failure
playbook.add_insight(
    category=InsightCategory.MISTAKE,
    content="Convergence failure with tolerance < 1e-8 on large meshes",
    source_trial=trial_number,
    tags=["convergence", "solver"]
)
playbook.save(PLAYBOOK_PATH)
```
---

## Error Handling Protocol (Enhanced)

When ANY error occurs:

1. **Preserve the error** - Add to session state
2. **Check playbook** - Look for matching mistake patterns
3. **Learn from it** - If novel error, add to playbook
4. **Show to user** - Include error context in response

```python
# On error
session.add_error(f"{error_type}: {error_message}", error_type=error_type)

# Check playbook for similar errors
similar = playbook.search_by_content(error_message, category=InsightCategory.MISTAKE)
if similar:
    print(f"Known issue: {similar[0].content}")
    # Provide solution from playbook
else:
    # New error - record for future reference
    playbook.add_insight(
        category=InsightCategory.MISTAKE,
        content=f"{error_type}: {error_message[:200]}",
        tags=["error", error_type]
    )
```
---

## Context Budget Management

Total context budget: ~100K tokens

Allocation:
- **Stable prefix**: 5K tokens (cached across requests)
- **Protocols**: 10K tokens
- **Playbook items**: 5K tokens
- **Session state**: 2K tokens
- **Conversation history**: 30K tokens
- **Working space**: 48K tokens

If approaching limit:
1. Trigger compaction of old events
2. Reduce playbook items to top 5
3. Summarize conversation history
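A minimal sketch of that policy, assuming a rough 4-characters-per-token estimate. The section names and thresholds mirror the allocation above, but the function and constant names are illustrative, not the actual CompactionManager API.

```python
# Token budget per context section, mirroring the allocation above.
BUDGET = {
    "stable_prefix": 5_000,
    "protocols": 10_000,
    "playbook": 5_000,
    "session_state": 2_000,
    "history": 30_000,
    "working_space": 48_000,
}
TOTAL_BUDGET = sum(BUDGET.values())  # ~100K tokens

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return len(text) // 4

def needs_compaction(sections: dict) -> bool:
    """True when total usage approaches the overall budget."""
    used = sum(estimate_tokens(text) for text in sections.values())
    return used > 0.9 * TOTAL_BUDGET
```

When `needs_compaction` fires, the three mitigation steps above would be applied in order until usage drops back under the threshold.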

---

## Execution Framework (AVERVS)

For ANY task, follow this pattern:

```
1. ANNOUNCE → State what you're about to do
2. VALIDATE → Check prerequisites are met
3. EXECUTE  → Perform the action
4. RECORD   → Record outcome to playbook (NEW!)
5. VERIFY   → Confirm success
6. REPORT   → Summarize what was done
7. SUGGEST  → Offer logical next steps
```
### Recording After Execution

```python
# After successful execution
playbook.add_insight(
    category=InsightCategory.STRATEGY,
    content=f"Approach worked: {brief_description}",
    tags=relevant_tags
)

# After failure
playbook.add_insight(
    category=InsightCategory.MISTAKE,
    content=f"Failed approach: {brief_description}. Reason: {reason}",
    tags=relevant_tags
)

# Always save after recording
playbook.save(PLAYBOOK_PATH)
```
---

## Session Closing Checklist (Enhanced)

Before ending a session, complete:

```
┌─────────────────────────────────────────────────────────────────────┐
│ SESSION CLOSING (v3.0)                                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│ 1. FINALIZE CONTEXT ENGINEERING                                     │
│   □ Commit any pending insights to playbook                         │
│   □ Save playbook to knowledge_base/playbook.json                   │
│   □ Export learning report if optimization completed                │
│                                                                     │
│ 2. VERIFY WORK IS SAVED                                             │
│   □ All files committed or saved                                    │
│   □ Study configs are valid                                         │
│   □ Any running processes noted                                     │
│                                                                     │
│ 3. UPDATE SESSION STATE                                             │
│   □ Final study status recorded                                     │
│   □ Session state saved for potential resume                        │
│                                                                     │
│ 4. SUMMARIZE FOR USER                                               │
│   □ What was accomplished                                           │
│   □ What the system learned (new playbook items)                    │
│   □ Current state of any studies                                    │
│   □ Recommended next steps                                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
### Finalization Code

```python
# At session end
from optimization_engine.context import FeedbackLoop, save_playbook

# If optimization was run, finalize learning
if optimization_completed:
    feedback = FeedbackLoop(playbook_path)
    result = feedback.finalize_study({
        "name": study_name,
        "total_trials": n_trials,
        "best_value": best_value,
        "convergence_rate": success_rate
    })
    print(f"Learning finalized: {result['insights_added']} insights added")

# Always save playbook
save_playbook()
```
---

## Context Engineering Components Reference

| Component | Purpose | Location |
|-----------|---------|----------|
| **AtomizerPlaybook** | Knowledge store with helpful/harmful tracking | `optimization_engine/context/playbook.py` |
| **AtomizerReflector** | Analyzes outcomes, extracts insights | `optimization_engine/context/reflector.py` |
| **AtomizerSessionState** | Context isolation (exposed/isolated) | `optimization_engine/context/session_state.py` |
| **FeedbackLoop** | Connects outcomes to playbook updates | `optimization_engine/context/feedback_loop.py` |
| **CompactionManager** | Handles long sessions | `optimization_engine/context/compaction.py` |
| **ContextCacheOptimizer** | KV-cache optimization | `optimization_engine/context/cache_monitor.py` |

---
## Quick Paths

### "I just want to run an optimization"
1. Initialize session state as RUN_OPTIMIZATION
2. Load playbook items for [solver, convergence]
3. Load OP_02_RUN_OPTIMIZATION.md
4. After run, finalize feedback loop

### "Something broke"
1. Initialize session state as DEBUG_ERROR
2. Load ALL mistake items from playbook
3. Load OP_06_TROUBLESHOOT.md
4. Record any new errors discovered

### "What did my optimization find?"
1. Initialize session state as ANALYZE_RESULTS
2. Load OP_04_ANALYZE_RESULTS.md
3. Query the study database
4. Generate report

---
## Key Constraints (Always Apply)

1. **Python Environment**: Always use `conda activate atomizer`
2. **Never modify master files**: Copy NX files to study working directory first
3. **Code reuse**: Check `optimization_engine/extractors/` before writing new extraction code
4. **Validation**: Always validate config before running optimization
5. **Record immediately**: Don't wait until session end to record insights
6. **Save playbook**: After every insight, save the playbook

---
## Migration from v2.0

If upgrading from BOOTSTRAP v2.0:

1. The LAC system is now superseded by AtomizerPlaybook
2. Session insights are now structured PlaybookItems
3. Helpful/harmful tracking replaces simple confidence scores
4. Context is now explicitly exposed vs isolated

The old LAC files in `knowledge_base/lac/` are still readable, but new insights should use the playbook system.

---

*Atomizer v3.0: Where engineers talk, AI optimizes, and the system learns.*
@@ -34,6 +34,7 @@ requires_skills:
 | Add custom physics extractor | EXT_01 | Create in `optimization_engine/extractors/` |
 | Add lifecycle hook | EXT_02 | Create in `optimization_engine/plugins/` |
 | Generate physics insight | SYS_16 | `python -m optimization_engine.insights generate <study>` |
+| **Manage knowledge/playbook** | **SYS_17** | `from optimization_engine.context import AtomizerPlaybook` |

---
@@ -366,6 +367,7 @@ Without it, `UpdateFemodel()` runs but the mesh doesn't change!
 | 14 | Neural | Surrogate model acceleration |
 | 15 | Method Selector | Recommends optimization strategy |
 | 16 | Study Insights | Physics visualizations (Zernike, stress, modal) |
+| 17 | Context Engineering | ACE framework - self-improving knowledge system |

---
@@ -549,3 +551,106 @@ convert_custom_to_optuna(db_path, study_name)
- Trial numbers **NEVER reset** across study lifetime
- Surrogate predictions (5K per batch) are NOT logged as trials
- Only FEA-validated results become trials

---

## Context Engineering Quick Reference (SYS_17)

The ACE (Agentic Context Engineering) framework enables self-improving optimization through structured knowledge capture.

### Core Components

| Component | Purpose | Key Function |
|-----------|---------|--------------|
| **AtomizerPlaybook** | Structured knowledge store | `playbook.add_insight()`, `playbook.get_context_for_task()` |
| **AtomizerReflector** | Extracts insights from outcomes | `reflector.analyze_outcome()` |
| **AtomizerSessionState** | Context isolation (exposed/isolated) | `session.get_llm_context()` |
| **FeedbackLoop** | Automated learning | `feedback.process_trial_result()` |
| **CompactionManager** | Long-session handling | `compactor.maybe_compact()` |
| **CacheMonitor** | KV-cache optimization | `optimizer.track_completion()` |

### Python API Quick Reference
```python
from pathlib import Path

from optimization_engine.context import (
    AtomizerPlaybook, AtomizerReflector, get_session,
    InsightCategory, TaskType, FeedbackLoop
)

# Load playbook
playbook = AtomizerPlaybook.load(Path("knowledge_base/playbook.json"))

# Add an insight
playbook.add_insight(
    category=InsightCategory.STRATEGY,  # str, mis, tool, cal, dom, wf
    content="CMA-ES converges faster on smooth mirror surfaces",
    tags=["mirror", "sampler", "convergence"]
)
playbook.save(Path("knowledge_base/playbook.json"))

# Get context for LLM
context = playbook.get_context_for_task(
    task_type="optimization",
    max_items=15,
    min_confidence=0.5
)

# Record feedback
playbook.record_outcome(item_id="str_001", helpful=True)

# Session state
session = get_session()
session.exposed.task_type = TaskType.RUN_OPTIMIZATION
session.add_action("Started optimization run")
llm_context = session.get_llm_context()

# Feedback loop (automated learning)
feedback = FeedbackLoop(playbook_path)
feedback.process_trial_result(
    trial_number=42,
    params={'thickness': 10.5},
    objectives={'mass': 5.2},
    is_feasible=True
)
```
### Insight Categories

| Category | Code | Use For |
|----------|------|---------|
| Strategy | `str` | Optimization approaches that work |
| Mistake | `mis` | Common errors to avoid |
| Tool | `tool` | Tool usage patterns |
| Calculation | `cal` | Formulas and calculations |
| Domain | `dom` | FEA/NX domain knowledge |
| Workflow | `wf` | Process patterns |
### Playbook Item Format

```
[str_001] helpful=5 harmful=0 :: CMA-ES converges faster on smooth surfaces
```

- `net_score = helpful - harmful`
- `confidence = helpful / (helpful + harmful)`
- Items with `net_score < -3` are pruned
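These rules can be expressed directly. The sketch below is consistent with the formulas above, but the real PlaybookItem carries more fields (category, tags, timestamps), so treat the class name and shape here as illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ScoredItem:
    # Illustrative item with helpful/harmful scoring, per the rules above.
    id: str
    content: str
    helpful: int = 0
    harmful: int = 0

    @property
    def net_score(self) -> int:
        return self.helpful - self.harmful

    @property
    def confidence(self) -> float:
        total = self.helpful + self.harmful
        return self.helpful / total if total else 0.0

def prune(items: list) -> list:
    """Keep only items at or above the -3 net-score pruning threshold."""
    return [i for i in items if i.net_score >= -3]
```

An item with 5 helpful and 0 harmful votes scores net 5 at confidence 1.0; one voted harmful four times scores net -4 and would be pruned.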

### REST API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/context/playbook` | GET | Playbook summary stats |
| `/api/context/playbook/items` | GET | List items with filters |
| `/api/context/playbook/feedback` | POST | Record helpful/harmful |
| `/api/context/playbook/insights` | POST | Add new insight |
| `/api/context/playbook/prune` | POST | Remove harmful items |
| `/api/context/session` | GET | Current session state |
| `/api/context/learning/report` | GET | Comprehensive learning report |
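As an example, the feedback endpoint in the table takes a JSON body with an item id and a helpful flag. The sketch below only builds the request; actually sending it (e.g. with `requests.post(url, json=payload)`) assumes the dashboard backend is running on port 5000.

```python
import json

# Hypothetical feedback call for POST /api/context/playbook/feedback
url = "http://localhost:5000/api/context/playbook/feedback"
payload = {"item_id": "str_001", "helpful": True}

# Serialized body as it would go over the wire
body = json.dumps(payload)
print(url)
print(body)
```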

### Dashboard URL

| Service | URL | Purpose |
|---------|-----|---------|
| Context API | `http://localhost:5000/api/context` | Playbook management |

**Full documentation**: `docs/protocols/system/SYS_17_CONTEXT_ENGINEERING.md`
@@ -12,7 +12,7 @@ import sys
 # Add parent directory to path to import optimization_engine
 sys.path.append(str(Path(__file__).parent.parent.parent.parent))

-from api.routes import optimization, claude, terminal, insights
+from api.routes import optimization, claude, terminal, insights, context
 from api.websocket import optimization_stream

 # Create FastAPI app
@@ -37,6 +37,7 @@ app.include_router(optimization_stream.router, prefix="/api/ws", tags=["websocke
 app.include_router(claude.router, prefix="/api/claude", tags=["claude"])
 app.include_router(terminal.router, prefix="/api/terminal", tags=["terminal"])
 app.include_router(insights.router, prefix="/api/insights", tags=["insights"])
+app.include_router(context.router, prefix="/api/context", tags=["context"])

 @app.get("/")
 async def root():
450  atomizer-dashboard/backend/api/routes/context.py  Normal file
@@ -0,0 +1,450 @@
"""
Context Engineering API Routes

Provides endpoints for:
- Viewing playbook contents
- Managing session state
- Recording feedback on playbook items
- Triggering compaction
- Monitoring cache efficiency
- Exporting learning reports

Part of the ACE (Agentic Context Engineering) implementation for Atomizer.
"""

from fastapi import APIRouter, HTTPException, Query
from pathlib import Path
from typing import Optional, List
from pydantic import BaseModel
from datetime import datetime
import sys

# Add parent paths for imports
sys.path.append(str(Path(__file__).parent.parent.parent.parent.parent))

router = APIRouter()
# Paths
ATOMIZER_ROOT = Path(__file__).parents[4]
PLAYBOOK_PATH = ATOMIZER_ROOT / "knowledge_base" / "playbook.json"


# Pydantic models for request/response
class PlaybookItemResponse(BaseModel):
    id: str
    category: str
    content: str
    helpful_count: int
    harmful_count: int
    net_score: int
    confidence: float
    tags: List[str]
    created_at: str
    last_used: Optional[str]


class PlaybookSummary(BaseModel):
    total_items: int
    by_category: dict
    version: int
    last_updated: str
    avg_score: float
    top_score: int
    lowest_score: int


class FeedbackRequest(BaseModel):
    item_id: str
    helpful: bool


class InsightRequest(BaseModel):
    category: str
    content: str
    tags: Optional[List[str]] = None
    source_trial: Optional[int] = None


class SessionStateResponse(BaseModel):
    session_id: str
    task_type: Optional[str]
    study_name: Optional[str]
    study_status: str
    trials_completed: int
    trials_total: int
    best_value: Optional[float]
    recent_actions: List[str]
    recent_errors: List[str]
# Helper function to get playbook
def get_playbook():
    """Load playbook, handling import errors gracefully."""
    try:
        from optimization_engine.context.playbook import AtomizerPlaybook
        return AtomizerPlaybook.load(PLAYBOOK_PATH)
    except ImportError as e:
        raise HTTPException(
            status_code=500,
            detail=f"Context engineering module not available: {str(e)}"
        )
# Playbook endpoints
@router.get("/playbook", response_model=PlaybookSummary)
async def get_playbook_summary():
    """Get playbook summary statistics."""
    playbook = get_playbook()
    stats = playbook.get_stats()

    return PlaybookSummary(
        total_items=stats["total_items"],
        by_category=stats["by_category"],
        version=stats["version"],
        last_updated=stats["last_updated"],
        avg_score=stats["avg_score"],
        top_score=stats["max_score"],
        lowest_score=stats["min_score"]
    )
@router.get("/playbook/items", response_model=List[PlaybookItemResponse])
async def get_playbook_items(
    category: Optional[str] = Query(None, description="Filter by category (str, mis, tool, etc.)"),
    min_score: int = Query(0, description="Minimum net score"),
    min_confidence: float = Query(0.0, description="Minimum confidence (0.0-1.0)"),
    limit: int = Query(50, description="Maximum items to return"),
    offset: int = Query(0, description="Pagination offset")
):
    """
    Get playbook items with optional filtering.

    Categories:
    - str: Strategy
    - mis: Mistake
    - tool: Tool usage
    - cal: Calculation
    - dom: Domain knowledge
    - wf: Workflow
    """
    playbook = get_playbook()

    items = list(playbook.items.values())

    # Filter by category
    if category:
        try:
            from optimization_engine.context.playbook import InsightCategory
            cat = InsightCategory(category)
            items = [i for i in items if i.category == cat]
        except ValueError:
            raise HTTPException(400, f"Invalid category: {category}. Valid: str, mis, tool, cal, dom, wf")

    # Filter by score
    items = [i for i in items if i.net_score >= min_score]

    # Filter by confidence
    items = [i for i in items if i.confidence >= min_confidence]

    # Sort by score
    items.sort(key=lambda x: x.net_score, reverse=True)

    # Paginate
    items = items[offset:offset + limit]

    return [
        PlaybookItemResponse(
            id=item.id,
            category=item.category.value,
            content=item.content,
            helpful_count=item.helpful_count,
            harmful_count=item.harmful_count,
            net_score=item.net_score,
            confidence=item.confidence,
            tags=item.tags,
            created_at=item.created_at,
            last_used=item.last_used
        )
        for item in items
    ]
@router.get("/playbook/items/{item_id}", response_model=PlaybookItemResponse)
async def get_playbook_item(item_id: str):
    """Get a specific playbook item by ID."""
    playbook = get_playbook()

    if item_id not in playbook.items:
        raise HTTPException(404, f"Item not found: {item_id}")

    item = playbook.items[item_id]

    return PlaybookItemResponse(
        id=item.id,
        category=item.category.value,
        content=item.content,
        helpful_count=item.helpful_count,
        harmful_count=item.harmful_count,
        net_score=item.net_score,
        confidence=item.confidence,
        tags=item.tags,
        created_at=item.created_at,
        last_used=item.last_used
    )
@router.post("/playbook/feedback")
async def record_feedback(request: FeedbackRequest):
    """
    Record feedback on a playbook item.

    This is how the system learns:
    - helpful=true increases the item's score
    - helpful=false decreases the item's score
    """
    playbook = get_playbook()

    if request.item_id not in playbook.items:
        raise HTTPException(404, f"Item not found: {request.item_id}")

    playbook.record_outcome(request.item_id, helpful=request.helpful)
    playbook.save(PLAYBOOK_PATH)

    item = playbook.items[request.item_id]

    return {
        "item_id": request.item_id,
        "new_score": item.net_score,
        "new_confidence": item.confidence,
        "helpful_count": item.helpful_count,
        "harmful_count": item.harmful_count
    }
@router.post("/playbook/insights")
async def add_insight(request: InsightRequest):
    """
    Add a new insight to the playbook.

    Categories:
    - str: Strategy - Optimization strategies that work
    - mis: Mistake - Common mistakes to avoid
    - tool: Tool - Tool usage patterns
    - cal: Calculation - Formulas and calculations
    - dom: Domain - Domain-specific knowledge (FEA, NX)
    - wf: Workflow - Workflow patterns
    """
    try:
        from optimization_engine.context.playbook import InsightCategory
    except ImportError as e:
        raise HTTPException(500, f"Context module not available: {e}")

    # Validate category
    try:
        category = InsightCategory(request.category)
    except ValueError:
        raise HTTPException(400, f"Invalid category: {request.category}")

    playbook = get_playbook()

    item = playbook.add_insight(
        category=category,
        content=request.content,
        source_trial=request.source_trial,
        tags=request.tags
    )

    playbook.save(PLAYBOOK_PATH)

    return {
        "item_id": item.id,
        "category": item.category.value,
        "content": item.content,
        "message": "Insight added successfully"
    }
@router.delete("/playbook/items/{item_id}")
async def delete_playbook_item(item_id: str):
    """Delete a playbook item."""
    playbook = get_playbook()

    if item_id not in playbook.items:
        raise HTTPException(404, f"Item not found: {item_id}")

    content = playbook.items[item_id].content[:50]
    del playbook.items[item_id]
    playbook.save(PLAYBOOK_PATH)

    return {
        "deleted": item_id,
        "content_preview": content
    }
@router.post("/playbook/prune")
|
||||
async def prune_playbook(threshold: int = Query(-3, description="Net score threshold for pruning")):
|
||||
"""
|
||||
Prune harmful items from the playbook.
|
||||
|
||||
Items with net_score <= threshold will be removed.
|
||||
"""
|
||||
playbook = get_playbook()
|
||||
|
||||
removed_count = playbook.prune_harmful(threshold=threshold)
|
||||
playbook.save(PLAYBOOK_PATH)
|
||||
|
||||
return {
|
||||
"items_pruned": removed_count,
|
||||
"threshold_used": threshold,
|
||||
"remaining_items": len(playbook.items)
|
||||
}
|
||||
|
||||
|
||||
@router.get("/playbook/context")
|
||||
async def get_playbook_context(
|
||||
task_type: str = Query("optimization", description="Task type for context filtering"),
|
||||
max_items: int = Query(15, description="Maximum items to include"),
|
||||
min_confidence: float = Query(0.5, description="Minimum confidence threshold")
|
||||
):
|
||||
"""
|
||||
Get playbook context string formatted for LLM consumption.
|
||||
|
||||
This is what gets injected into the LLM context window.
|
||||
"""
|
||||
playbook = get_playbook()
|
||||
|
||||
context = playbook.get_context_for_task(
|
||||
task_type=task_type,
|
||||
max_items=max_items,
|
||||
min_confidence=min_confidence
|
||||
)
|
||||
|
||||
return {
|
||||
"context": context,
|
||||
"items_included": min(max_items, len(playbook.items)),
|
||||
"task_type": task_type
|
||||
}
|
||||
|
||||
|
||||
# Session state endpoints
|
||||
@router.get("/session", response_model=SessionStateResponse)
|
||||
async def get_session_state():
|
||||
"""Get current session state."""
|
||||
try:
|
||||
from optimization_engine.context.session_state import get_session
|
||||
session = get_session()
|
||||
|
||||
return SessionStateResponse(
|
||||
session_id=session.session_id,
|
||||
task_type=session.exposed.task_type.value if session.exposed.task_type else None,
|
||||
study_name=session.exposed.study_name,
|
||||
study_status=session.exposed.study_status,
|
||||
trials_completed=session.exposed.trials_completed,
|
||||
trials_total=session.exposed.trials_total,
|
||||
best_value=session.exposed.best_value,
|
||||
recent_actions=session.exposed.recent_actions[-10:],
|
||||
recent_errors=session.exposed.recent_errors[-5:]
|
||||
)
|
||||
except ImportError:
|
||||
raise HTTPException(500, "Session state module not available")
|
||||
|
||||
|
||||
@router.get("/session/context")
|
||||
async def get_session_context():
|
||||
"""Get session context string for LLM consumption."""
|
||||
try:
|
||||
from optimization_engine.context.session_state import get_session
|
||||
session = get_session()
|
||||
|
||||
return {
|
||||
"context": session.get_llm_context(),
|
||||
"session_id": session.session_id,
|
||||
"last_updated": session.last_updated
|
||||
}
|
||||
except ImportError:
|
||||
raise HTTPException(500, "Session state module not available")
|
||||
|
||||
|
||||
# Cache monitoring endpoints
|
||||
@router.get("/cache/stats")
|
||||
async def get_cache_stats():
|
||||
"""Get KV-cache efficiency statistics."""
|
||||
try:
|
||||
from optimization_engine.context.cache_monitor import get_cache_optimizer
|
||||
optimizer = get_cache_optimizer()
|
||||
|
||||
return {
|
||||
"stats": optimizer.get_stats_dict(),
|
||||
"report": optimizer.get_report()
|
||||
}
|
||||
except ImportError:
|
||||
return {
|
||||
"message": "Cache monitoring not active",
|
||||
"stats": None
|
||||
}
|
||||
|
||||
|
||||
# Learning report endpoints
|
||||
@router.get("/learning/report")
|
||||
async def get_learning_report():
|
||||
"""Get a comprehensive learning report."""
|
||||
playbook = get_playbook()
|
||||
stats = playbook.get_stats()
|
||||
|
||||
# Get top and worst performers
|
||||
items = list(playbook.items.values())
|
||||
items.sort(key=lambda x: x.net_score, reverse=True)
|
||||
|
||||
top_performers = [
|
||||
{"id": i.id, "content": i.content[:100], "score": i.net_score}
|
||||
for i in items[:10]
|
||||
]
|
||||
|
||||
items.sort(key=lambda x: x.net_score)
|
||||
worst_performers = [
|
||||
{"id": i.id, "content": i.content[:100], "score": i.net_score}
|
||||
for i in items[:5] if i.net_score < 0
|
||||
]
|
||||
|
||||
return {
|
||||
"generated_at": datetime.now().isoformat(),
|
||||
"playbook_stats": stats,
|
||||
"top_performers": top_performers,
|
||||
"worst_performers": worst_performers,
|
||||
"recommendations": _generate_recommendations(playbook)
|
||||
}
|
||||
|
||||
|
||||
def _generate_recommendations(playbook) -> List[str]:
|
||||
"""Generate recommendations based on playbook state."""
|
||||
recommendations = []
|
||||
|
||||
# Check for harmful items
|
||||
harmful = [i for i in playbook.items.values() if i.net_score < -3]
|
||||
if harmful:
|
||||
recommendations.append(
|
||||
f"Consider pruning {len(harmful)} harmful items (net_score < -3)"
|
||||
)
|
||||
|
||||
# Check for untested items
|
||||
untested = [
|
||||
i for i in playbook.items.values()
|
||||
if i.helpful_count + i.harmful_count == 0
|
||||
]
|
||||
if len(untested) > 10:
|
||||
recommendations.append(
|
||||
f"{len(untested)} items have no feedback - consider testing them"
|
||||
)
|
||||
|
||||
# Check category balance
|
||||
stats = playbook.get_stats()
|
||||
if stats["by_category"].get("MISTAKE", 0) < 5:
|
||||
recommendations.append(
|
||||
"Low mistake count - actively record errors when they occur"
|
||||
)
|
||||
|
||||
if not recommendations:
|
||||
recommendations.append("Playbook is in good health!")
|
||||
|
||||
return recommendations
|
||||
1172
docs/CONTEXT_ENGINEERING_REPORT.md
Normal file
File diff suppressed because it is too large
948
docs/api/CONTEXT_ENGINEERING_API.md
Normal file
@@ -0,0 +1,948 @@
# Context Engineering API Reference

**Version**: 1.0
**Updated**: 2025-12-29
**Module**: `optimization_engine.context`

This document provides complete API documentation for the Agentic Context Engineering (ACE) framework.

---

## Table of Contents

1. [Module Overview](#module-overview)
2. [Core Classes](#core-classes)
   - [AtomizerPlaybook](#atomizerplaybook)
   - [PlaybookItem](#playbookitem)
   - [InsightCategory](#insightcategory)
3. [Session Management](#session-management)
   - [AtomizerSessionState](#atomizersessionstate)
   - [ExposedState](#exposedstate)
   - [IsolatedState](#isolatedstate)
   - [TaskType](#tasktype)
4. [Analysis & Learning](#analysis--learning)
   - [AtomizerReflector](#atomizerreflector)
   - [FeedbackLoop](#feedbackloop)
5. [Optimization](#optimization)
   - [CompactionManager](#compactionmanager)
   - [ContextCacheOptimizer](#contextcacheoptimizer)
6. [Integration](#integration)
   - [ContextEngineeringMixin](#contextengineeringmixin)
   - [ContextAwareRunner](#contextawarerunner)
7. [REST API](#rest-api)

---

## Module Overview

### Import Patterns

```python
# Full import
from optimization_engine.context import (
    # Core playbook
    AtomizerPlaybook,
    PlaybookItem,
    InsightCategory,

    # Session management
    AtomizerSessionState,
    ExposedState,
    IsolatedState,
    TaskType,
    get_session,

    # Analysis
    AtomizerReflector,
    OptimizationOutcome,
    InsightCandidate,

    # Learning
    FeedbackLoop,
    FeedbackLoopFactory,

    # Optimization
    CompactionManager,
    ContextEvent,
    EventType,
    ContextBudgetManager,
    ContextCacheOptimizer,
    CacheStats,
    StablePrefixBuilder,

    # Integration
    ContextEngineeringMixin,
    ContextAwareRunner,
)

# Convenience imports
from optimization_engine.context import AtomizerPlaybook, get_session
```

---

## Core Classes

### AtomizerPlaybook

The central knowledge store for persistent learning across sessions.

#### Constructor

```python
AtomizerPlaybook(
    items: Dict[str, PlaybookItem] = None,
    version: int = 1,
    created_at: str = None,
    last_updated: str = None
)
```

#### Class Methods

##### `load(path: Path) -> AtomizerPlaybook`
Load playbook from JSON file.

```python
playbook = AtomizerPlaybook.load(Path("knowledge_base/playbook.json"))
```

**Parameters:**
- `path`: Path to JSON file

**Returns:** AtomizerPlaybook instance

**Note:** Does not raise FileNotFoundError; if the file doesn't exist, a new empty playbook is created.

---

#### Instance Methods

##### `save(path: Path) -> None`
Save playbook to JSON file.

```python
playbook.save(Path("knowledge_base/playbook.json"))
```

---

##### `add_insight(category, content, source_trial=None, tags=None) -> PlaybookItem`
Add a new insight to the playbook.

```python
item = playbook.add_insight(
    category=InsightCategory.STRATEGY,
    content="CMA-ES converges faster on smooth surfaces",
    source_trial=42,
    tags=["sampler", "convergence", "mirror"]
)
```

**Parameters:**
- `category` (InsightCategory): Category of the insight
- `content` (str): The insight content
- `source_trial` (int, optional): Trial number that generated this insight
- `tags` (List[str], optional): Tags for filtering

**Returns:** The created PlaybookItem

---

##### `record_outcome(item_id: str, helpful: bool) -> None`
Record whether an insight was helpful or harmful.

```python
playbook.record_outcome("str_001", helpful=True)
playbook.record_outcome("mis_003", helpful=False)
```

**Parameters:**
- `item_id` (str): ID of the playbook item
- `helpful` (bool): True if helpful, False if harmful

---

##### `get_context_for_task(task_type, max_items=15, min_confidence=0.5) -> str`
Get formatted context string for LLM consumption.

```python
context = playbook.get_context_for_task(
    task_type="optimization",
    max_items=15,
    min_confidence=0.5
)
```

**Parameters:**
- `task_type` (str): Type of task for filtering
- `max_items` (int): Maximum items to include
- `min_confidence` (float): Minimum confidence threshold (0.0-1.0)

**Returns:** Formatted string suitable for LLM context

---

##### `get_by_category(category, min_score=0) -> List[PlaybookItem]`
Get items filtered by category.

```python
mistakes = playbook.get_by_category(InsightCategory.MISTAKE, min_score=-2)
```

**Parameters:**
- `category` (InsightCategory): Category to filter by
- `min_score` (int): Minimum net score

**Returns:** List of matching PlaybookItems

---

##### `get_stats() -> Dict`
Get playbook statistics.

```python
stats = playbook.get_stats()
# Returns:
# {
#     "total_items": 45,
#     "by_category": {"STRATEGY": 12, "MISTAKE": 8, ...},
#     "version": 3,
#     "last_updated": "2025-12-29T10:30:00",
#     "avg_score": 2.4,
#     "max_score": 15,
#     "min_score": -3
# }
```

---

##### `prune_harmful(threshold=-3) -> int`
Remove items with net score below threshold.

```python
removed_count = playbook.prune_harmful(threshold=-3)
```

**Parameters:**
- `threshold` (int): Items with net_score <= threshold are removed

**Returns:** Number of items removed

---

### PlaybookItem

Dataclass representing a single playbook entry.

```python
@dataclass
class PlaybookItem:
    id: str                          # e.g., "str_001", "mis_003"
    category: InsightCategory        # Category enum
    content: str                     # The insight text
    helpful_count: int = 0           # Times marked helpful
    harmful_count: int = 0           # Times marked harmful
    tags: List[str] = field(default_factory=list)
    source_trial: Optional[int] = None
    created_at: str = ""             # ISO timestamp
    last_used: Optional[str] = None  # ISO timestamp
```

#### Properties

```python
item.net_score   # helpful_count - harmful_count
item.confidence  # helpful / (helpful + harmful), or 0.5 if no feedback
```
#### Methods

```python
# Convert to context string for LLM
context_str = item.to_context_string()
# "[str_001] helpful=5 harmful=0 :: CMA-ES converges faster..."
```

---

### InsightCategory

Enum for categorizing insights.

```python
class InsightCategory(Enum):
    STRATEGY = "str"     # Optimization strategies that work
    CALCULATION = "cal"  # Formulas and calculations
    MISTAKE = "mis"      # Common mistakes to avoid
    TOOL = "tool"        # Tool usage patterns
    DOMAIN = "dom"       # Domain-specific knowledge (FEA, NX)
    WORKFLOW = "wf"      # Workflow patterns
```

**Usage:**
```python
# Create with enum
category = InsightCategory.STRATEGY

# Create from string
category = InsightCategory("str")

# Get string value
value = InsightCategory.STRATEGY.value  # "str"
```

---

## Session Management

### AtomizerSessionState

Manages session context with exposed/isolated separation.

#### Constructor

```python
session = AtomizerSessionState(
    session_id: str = None  # Auto-generated UUID if not provided
)
```

#### Attributes

```python
session.session_id    # Unique session identifier
session.exposed       # ExposedState - always in LLM context
session.isolated      # IsolatedState - on-demand access only
session.last_updated  # ISO timestamp of last update
```

#### Methods

##### `get_llm_context() -> str`
Get exposed state formatted for LLM context.

```python
context = session.get_llm_context()
# Returns formatted string with task type, study info, progress, etc.
```

---

##### `add_action(action: str) -> None`
Record an action (keeps last 20).

```python
session.add_action("Started optimization with TPE sampler")
```

---

##### `add_error(error: str, error_type: str = None) -> None`
Record an error (keeps last 10).

```python
session.add_error("NX solver timeout after 600s", error_type="solver")
```

---

##### `to_dict() / from_dict(data) -> AtomizerSessionState`
Serialize/deserialize session state.

```python
# Save
data = session.to_dict()

# Restore
session = AtomizerSessionState.from_dict(data)
```

---

### ExposedState

State that's always included in LLM context.

```python
@dataclass
class ExposedState:
    task_type: Optional[TaskType] = None
    study_name: Optional[str] = None
    study_status: str = "idle"
    trials_completed: int = 0
    trials_total: int = 0
    best_value: Optional[float] = None
    recent_actions: List[str] = field(default_factory=list)  # Last 20
    recent_errors: List[str] = field(default_factory=list)   # Last 10
```

---

### IsolatedState

State available on-demand but not in default context.

```python
@dataclass
class IsolatedState:
    full_trial_history: List[Dict] = field(default_factory=list)
    detailed_errors: List[Dict] = field(default_factory=list)
    performance_metrics: Dict = field(default_factory=dict)
    debug_info: Dict = field(default_factory=dict)
```

---

### TaskType

Enum for session task classification.

```python
class TaskType(Enum):
    CREATE_STUDY = "create_study"
    RUN_OPTIMIZATION = "run_optimization"
    MONITOR_PROGRESS = "monitor_progress"
    ANALYZE_RESULTS = "analyze_results"
    DEBUG_ERROR = "debug_error"
    CONFIGURE_SETTINGS = "configure_settings"
    NEURAL_ACCELERATION = "neural_acceleration"
```

---

### get_session()

Get or create the global session instance.

```python
from optimization_engine.context import get_session

session = get_session()
session.exposed.task_type = TaskType.RUN_OPTIMIZATION
```

---

## Analysis & Learning

### AtomizerReflector

Analyzes optimization outcomes and extracts insights.

#### Constructor

```python
reflector = AtomizerReflector(playbook: AtomizerPlaybook)
```

#### Methods

##### `analyze_outcome(outcome: OptimizationOutcome) -> List[InsightCandidate]`
Analyze an optimization outcome for insights.

```python
outcome = OptimizationOutcome(
    study_name="bracket_v3",
    trial_number=42,
    params={'thickness': 10.5},
    objectives={'mass': 5.2},
    constraints_satisfied=True,
    error_message=None,
    solve_time=45.2
)

insights = reflector.analyze_outcome(outcome)
for insight in insights:
    print(f"{insight.category}: {insight.content}")
```

---

##### `extract_error_insights(error_message: str) -> List[InsightCandidate]`
Extract insights from error messages.

```python
insights = reflector.extract_error_insights("Solution did not converge within tolerance")
# Returns insights about convergence failures
```

---

### OptimizationOutcome

Dataclass for optimization trial outcomes.

```python
@dataclass
class OptimizationOutcome:
    study_name: str
    trial_number: int
    params: Dict[str, Any]
    objectives: Dict[str, float]
    constraints_satisfied: bool
    error_message: Optional[str] = None
    solve_time: Optional[float] = None
```

---

### FeedbackLoop

Automated learning from optimization execution.

#### Constructor

```python
feedback = FeedbackLoop(playbook_path: Path)
```

#### Methods

##### `process_trial_result(trial_number, params, objectives, is_feasible, error=None)`
Process a trial result for learning opportunities.

```python
feedback.process_trial_result(
    trial_number=42,
    params={'thickness': 10.5, 'width': 25.0},
    objectives={'mass': 5.2, 'stress': 180.0},
    is_feasible=True,
    error=None
)
```

---

##### `finalize_study(study_summary: Dict) -> Dict`
Finalize learning at end of optimization study.

```python
result = feedback.finalize_study({
    "name": "bracket_v3",
    "total_trials": 100,
    "best_value": 4.8,
    "convergence_rate": 0.95
})
# Returns: {"insights_added": 3, "patterns_identified": ["fast_convergence"]}
```

---

## Optimization

### CompactionManager

Handles context compaction for long-running sessions.

#### Constructor

```python
compactor = CompactionManager(
    max_events: int = 100,
    preserve_errors: bool = True,
    preserve_milestones: bool = True
)
```

#### Methods

##### `add_event(event: ContextEvent) -> None`
Add an event to the session history.

```python
from datetime import datetime

from optimization_engine.context import ContextEvent, EventType

event = ContextEvent(
    event_type=EventType.TRIAL_COMPLETE,
    content="Trial 42 completed: mass=5.2kg",
    timestamp=datetime.now().isoformat(),
    is_error=False,
    is_milestone=False
)
compactor.add_event(event)
```

---

##### `maybe_compact() -> Optional[str]`
Compact events if over threshold.

```python
summary = compactor.maybe_compact()
if summary:
    print(f"Compacted: {summary}")
```

---

##### `get_context() -> str`
Get current context string.

```python
context = compactor.get_context()
```
---

### ContextCacheOptimizer

Monitors and optimizes KV-cache efficiency.

#### Constructor

```python
optimizer = ContextCacheOptimizer()
```

#### Methods

##### `track_request(prefix_tokens: int, total_tokens: int)`
Track a request for cache analysis.

```python
optimizer.track_request(prefix_tokens=5000, total_tokens=15000)
```

---

##### `track_completion(success: bool, response_tokens: int)`
Track completion for performance analysis.

```python
optimizer.track_completion(success=True, response_tokens=500)
```

---

##### `get_stats_dict() -> Dict`
Get cache statistics.

```python
stats = optimizer.get_stats_dict()
# Returns:
# {
#     "total_requests": 150,
#     "cache_hits": 120,
#     "cache_hit_rate": 0.8,
#     "avg_prefix_ratio": 0.33,
#     ...
# }
```
---

##### `get_report() -> str`
Get human-readable report.

```python
report = optimizer.get_report()
print(report)
```

---

## Integration

### ContextEngineeringMixin

Mixin class for adding context engineering to optimization runners.

```python
class ContextEngineeringMixin:
    def init_context_engineering(self, playbook_path: Path):
        """Initialize context engineering components."""

    def record_trial_outcome(self, trial_number, params, objectives,
                             is_feasible, error=None):
        """Record trial outcome for learning."""

    def get_context_for_llm(self) -> str:
        """Get combined context for LLM consumption."""

    def finalize_context_engineering(self, study_summary: Dict):
        """Finalize learning at study completion."""
```

---

### ContextAwareRunner

Pre-built runner with context engineering enabled.

```python
from optimization_engine.context import ContextAwareRunner

runner = ContextAwareRunner(
    config=config_dict,
    playbook_path=Path("knowledge_base/playbook.json")
)

# Run optimization with automatic learning
runner.run()
```

---

## REST API

The Context Engineering module exposes REST endpoints via FastAPI.

### Base URL
```
http://localhost:5000/api/context
```

### Endpoints

#### GET `/playbook`
Get playbook summary statistics.

**Response:**
```json
{
    "total_items": 45,
    "by_category": {"STRATEGY": 12, "MISTAKE": 8},
    "version": 3,
    "last_updated": "2025-12-29T10:30:00",
    "avg_score": 2.4,
    "top_score": 15,
    "lowest_score": -3
}
```

---

#### GET `/playbook/items`
List playbook items with optional filters.

**Query Parameters:**
- `category` (str): Filter by category (str, mis, tool, cal, dom, wf)
- `min_score` (int): Minimum net score (default: 0)
- `min_confidence` (float): Minimum confidence (default: 0.0)
- `limit` (int): Max items (default: 50)
- `offset` (int): Pagination offset (default: 0)

**Response:**
```json
[
    {
        "id": "str_001",
        "category": "str",
        "content": "CMA-ES converges faster on smooth surfaces",
        "helpful_count": 5,
        "harmful_count": 0,
        "net_score": 5,
        "confidence": 1.0,
        "tags": ["sampler", "convergence"],
        "created_at": "2025-12-29T10:00:00",
        "last_used": "2025-12-29T10:30:00"
    }
]
```

---

#### GET `/playbook/items/{item_id}`
Get a specific playbook item.

**Response:** Single PlaybookItemResponse object

---

#### POST `/playbook/feedback`
Record feedback on a playbook item.

**Request Body:**
```json
{
    "item_id": "str_001",
    "helpful": true
}
```

**Response:**
```json
{
    "item_id": "str_001",
    "new_score": 6,
    "new_confidence": 1.0,
    "helpful_count": 6,
    "harmful_count": 0
}
```

---

#### POST `/playbook/insights`
Add a new insight.

**Request Body:**
```json
{
    "category": "str",
    "content": "New insight content",
    "tags": ["tag1", "tag2"],
    "source_trial": 42
}
```

**Response:**
```json
{
    "item_id": "str_015",
    "category": "str",
    "content": "New insight content",
    "message": "Insight added successfully"
}
```

---

#### DELETE `/playbook/items/{item_id}`
Delete a playbook item.

**Response:**
```json
{
    "deleted": "str_001",
    "content_preview": "CMA-ES converges faster..."
}
```

---

#### POST `/playbook/prune`
Remove harmful items.

**Query Parameters:**
- `threshold` (int): Net score threshold (default: -3)

**Response:**
```json
{
    "items_pruned": 3,
    "threshold_used": -3,
    "remaining_items": 42
}
```

---

#### GET `/playbook/context`
Get playbook context for LLM consumption.

**Query Parameters:**
- `task_type` (str): Task type (default: "optimization")
- `max_items` (int): Maximum items (default: 15)
- `min_confidence` (float): Minimum confidence (default: 0.5)

**Response:**
```json
{
    "context": "## Atomizer Knowledge Base\n...",
    "items_included": 15,
    "task_type": "optimization"
}
```

---

#### GET `/session`
Get current session state.

**Response:**
```json
{
    "session_id": "abc123",
    "task_type": "run_optimization",
    "study_name": "bracket_v3",
    "study_status": "running",
    "trials_completed": 42,
    "trials_total": 100,
    "best_value": 5.2,
    "recent_actions": ["Started optimization", "Trial 42 complete"],
    "recent_errors": []
}
```

---

#### GET `/session/context`
Get session context for LLM consumption.

**Response:**
```json
{
    "context": "## Current Session\nTask: run_optimization\n...",
    "session_id": "abc123",
    "last_updated": "2025-12-29T10:30:00"
}
```

---

#### GET `/cache/stats`
Get KV-cache statistics.

**Response:**
```json
{
    "stats": {
        "total_requests": 150,
        "cache_hits": 120,
        "cache_hit_rate": 0.8
    },
    "report": "Cache Performance Report\n..."
}
```

---

#### GET `/learning/report`
Get comprehensive learning report.

**Response:**
```json
{
    "generated_at": "2025-12-29T10:30:00",
    "playbook_stats": {...},
    "top_performers": [
        {"id": "str_001", "content": "...", "score": 15}
    ],
    "worst_performers": [
        {"id": "mis_003", "content": "...", "score": -2}
    ],
    "recommendations": [
        "Consider pruning 3 harmful items (net_score < -3)"
    ]
}
```

---

## Error Handling

All API endpoints return appropriate HTTP status codes:

| Code | Meaning |
|------|---------|
| 200  | Success |
| 400  | Bad request (invalid parameters) |
| 404  | Not found (item doesn't exist) |
| 500  | Server error (module not available) |

Error response format:
```json
{
    "detail": "Error description"
}
```
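A client can fold the table above into a single checking helper. The sketch below is illustrative: the status codes and reason strings come from this document, but `check_response` and `ContextAPIError` are hypothetical names, and the helper is written against a bare status code so it works with any HTTP client.

```python
class ContextAPIError(Exception):
    """Raised when a Context Engineering endpoint returns a non-200 status."""


def check_response(status_code: int, detail: str = "") -> None:
    """Map the documented status codes onto a descriptive exception."""
    if status_code == 200:
        return
    reasons = {
        400: "Bad request (invalid parameters)",
        404: "Not found (item doesn't exist)",
        500: "Server error (module not available)",
    }
    reason = reasons.get(status_code, "Unexpected status")
    raise ContextAPIError(f"{status_code} {reason}: {detail}")


check_response(200)  # success: no-op
try:
    check_response(404, "Item not found: str_999")
except ContextAPIError as e:
    print(e)  # 404 Not found (item doesn't exist): Item not found: str_999
```

In practice the `detail` argument would be filled from the `"detail"` field of the JSON error body shown above.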
---

## See Also

- [Context Engineering Report](../CONTEXT_ENGINEERING_REPORT.md) - Full implementation report
- [SYS_17 Protocol](../protocols/system/SYS_17_CONTEXT_ENGINEERING.md) - System protocol
- [Cheatsheet](../../.claude/skills/01_CHEATSHEET.md) - Quick reference
307
docs/protocols/system/SYS_17_CONTEXT_ENGINEERING.md
Normal file
@@ -0,0 +1,307 @@
---
protocol_id: SYS_17
version: 1.0
last_updated: 2025-12-29
status: active
owner: system
code_dependencies:
  - optimization_engine.context.*
requires_protocols: []
---

# SYS_17: Context Engineering System

## Overview

The Context Engineering System implements the **Agentic Context Engineering (ACE)** framework, enabling Atomizer to learn from every optimization run and accumulate institutional knowledge over time.

## When to Load This Protocol

Load SYS_17 when:
- User asks about "learning", "playbook", or "context engineering"
- Debugging why certain knowledge isn't being applied
- Configuring context behavior
- Analyzing what the system has learned

## Core Concepts

### The ACE Framework

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Generator  │────▶│  Reflector  │────▶│   Curator   │
│ (Opt Runs)  │     │ (Analysis)  │     │ (Playbook)  │
└─────────────┘     └─────────────┘     └─────────────┘
       ▲                                       │
       └────────────── Feedback ───────────────┘
```

1. **Generator**: OptimizationRunner produces trial outcomes
2. **Reflector**: Analyzes outcomes, extracts patterns
3. **Curator**: Playbook stores and manages insights
4. **Feedback**: Success/failure updates insight scores
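The four stages above can be wired together in a few lines. This is a deliberately tiny sketch, with the three roles reduced to plain functions and an in-memory dict standing in for the playbook; the real components are `AtomizerPlaybook`, `AtomizerReflector`, and `FeedbackLoop`.

```python
playbook = {}  # Curator's store: id -> {"content", "helpful", "harmful"}


def generate():
    # Generator: one trial outcome per call (hard-coded for illustration)
    return {"trial": 1, "error": "mesh too coarse", "feasible": False}


def reflect(outcome):
    # Reflector: turn an outcome into an insight candidate, or None
    if outcome["error"]:
        return ("mis-00001", f"Avoid: {outcome['error']}")
    return None


def curate(insight):
    # Curator: store a new insight with zeroed counters
    item_id, content = insight
    playbook.setdefault(item_id,
                        {"content": content, "helpful": 0, "harmful": 0})


def feedback(item_id, helpful):
    # Feedback: success/failure updates the insight's score
    playbook[item_id]["helpful" if helpful else "harmful"] += 1


insight = reflect(generate())
curate(insight)
feedback("mis-00001", helpful=True)
print(playbook["mis-00001"]["helpful"])  # 1
```

The loop closes through the feedback step: helpful/harmful counts accumulated here are exactly what the Curator later uses to rank, filter, and prune insights.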

### Playbook Item Structure

```
[str-00001] helpful=8 harmful=0 :: "Use shell elements for thin walls"
     │         │         │                       │
     │         │         │                       └── Insight content
     │         │         └── Times advice led to failure
     │         └── Times advice led to success
     └── Unique ID (category-number)
```
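
As a sketch, an item with this structure might look like the dataclass below. The `net_score` property is an assumption inferred from the Best Practices section later in this protocol (which filters on `item.net_score < 0`); treat the exact field set as illustrative:

```python
from dataclasses import dataclass


@dataclass
class PlaybookItem:
    """Illustrative sketch of one playbook entry (field names assumed)."""
    id: str           # e.g. "str-00001" (category-number)
    content: str      # the insight text
    helpful: int = 0  # times this advice led to success
    harmful: int = 0  # times this advice led to failure

    @property
    def net_score(self) -> int:
        # Items with a negative net score are pruning candidates.
        return self.helpful - self.harmful


item = PlaybookItem(id="str-00001",
                    content="Use shell elements for thin walls",
                    helpful=8, harmful=0)
```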

### Categories

| Code | Name | Description | Example |
|------|------|-------------|---------|
| `str` | STRATEGY | Optimization approaches | "Start with TPE, switch to CMA-ES" |
| `mis` | MISTAKE | Things to avoid | "Don't use coarse mesh for stress" |
| `tool` | TOOL | Tool usage tips | "Use GP sampler for few-shot" |
| `cal` | CALCULATION | Formulas | "Safety factor = yield/max_stress" |
| `dom` | DOMAIN | Domain knowledge | "Zernike coefficients for mirrors" |
| `wf` | WORKFLOW | Workflow patterns | "Load _i.prt before UpdateFemodel()" |

## Key Components

### 1. AtomizerPlaybook

Location: `optimization_engine/context/playbook.py`

The central knowledge store. Handles:
- Adding insights (with auto-deduplication)
- Recording helpful/harmful outcomes
- Generating filtered context for the LLM
- Pruning consistently harmful items
- Persistence (JSON)

**Quick Usage:**
```python
from optimization_engine.context import get_playbook, save_playbook, InsightCategory

playbook = get_playbook()
playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements for thin walls")
playbook.record_outcome("str-00001", helpful=True)
save_playbook()
```

### 2. AtomizerReflector

Location: `optimization_engine/context/reflector.py`

Analyzes optimization outcomes to extract insights:
- Classifies errors (convergence, mesh, singularity, etc.)
- Extracts success patterns
- Generates study-level insights

**Quick Usage:**
```python
from optimization_engine.context import AtomizerReflector, OptimizationOutcome

reflector = AtomizerReflector(playbook)
outcome = OptimizationOutcome(trial_number=42, success=True, ...)
insights = reflector.analyze_trial(outcome)
reflector.commit_insights()
```

### 3. FeedbackLoop

Location: `optimization_engine/context/feedback_loop.py`

Automated learning loop that:
- Processes trial results
- Updates playbook scores based on outcomes
- Tracks which items were active per trial
- Finalizes learning at study end

**Quick Usage:**
```python
from optimization_engine.context import FeedbackLoop

feedback = FeedbackLoop(playbook_path)
feedback.process_trial_result(trial_number=42, success=True, ...)
feedback.finalize_study({"name": "study", "total_trials": 100, ...})
```

### 4. SessionState

Location: `optimization_engine/context/session_state.py`

Manages context isolation:
- **Exposed**: Always in LLM context (task type, recent actions, errors)
- **Isolated**: On-demand access (full history, NX paths, F06 content)

**Quick Usage:**
```python
from optimization_engine.context import get_session, TaskType

session = get_session()
session.exposed.task_type = TaskType.RUN_OPTIMIZATION
session.add_action("Started trial 42")
context = session.get_llm_context()
```
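
The exposed/isolated split can be sketched as two dataclasses where every action lands in the isolated log but only a bounded tail stays exposed. Field names and the `keep_recent` bound are illustrative assumptions, not the real `session_state.py` API:

```python
# Sketch of context isolation: "exposed" always enters the LLM context,
# "isolated" is only pulled in when a tool explicitly asks for it.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExposedState:
    task_type: str = ""
    recent_actions: List[str] = field(default_factory=list)
    last_error: str = ""


@dataclass
class IsolatedState:
    full_history: List[str] = field(default_factory=list)
    large_artifacts: Dict[str, str] = field(default_factory=dict)  # e.g. F06 content


@dataclass
class SessionSketch:
    exposed: ExposedState = field(default_factory=ExposedState)
    isolated: IsolatedState = field(default_factory=IsolatedState)

    def add_action(self, action: str, keep_recent: int = 5) -> None:
        # Everything goes to the isolated log; only the tail stays exposed.
        self.isolated.full_history.append(action)
        self.exposed.recent_actions = self.isolated.full_history[-keep_recent:]

    def get_llm_context(self) -> str:
        return (f"task={self.exposed.task_type}; "
                f"recent={self.exposed.recent_actions}; "
                f"last_error={self.exposed.last_error or 'none'}")


session = SessionSketch()
session.exposed.task_type = "RUN_OPTIMIZATION"
for i in range(8):
    session.add_action(f"Started trial {i}")
```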

### 5. CompactionManager

Location: `optimization_engine/context/compaction.py`

Handles long sessions:
- Triggers compaction at threshold (default 50 events)
- Summarizes old events into statistics
- Preserves errors and milestones
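
The strategy above (summarize old events, keep recent detail, never drop errors) reduces to a short function. A simplified illustration of the behavior, not the `CompactionManager` implementation itself:

```python
# Simplified compaction sketch: when the event log exceeds a threshold,
# old non-error events collapse into one summary entry, while errors and
# the most recent events survive in full detail.

def compact(events, threshold=50, keep_recent=20):
    """events: list of (kind, text) tuples, kind in {"trial", "error"}."""
    if len(events) <= threshold:
        return events
    old, recent = events[:-keep_recent], events[-keep_recent:]
    preserved = [e for e in old if e[0] == "error"]  # never compact errors
    n_compacted = len(old) - len(preserved)
    summary = ("compaction", f"{n_compacted} events summarized")
    return [summary] + preserved + recent


log = [("trial", f"trial {i}") for i in range(60)]
log.insert(10, ("error", "solver diverged"))
compacted = compact(log)
```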

### 6. CacheOptimizer

Location: `optimization_engine/context/cache_monitor.py`

Optimizes for KV-cache:
- Three-tier context structure (stable/semi-stable/dynamic)
- Tracks cache hit rate
- Estimates cost savings
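
The core idea is that a request counts as a cache hit when the stable prefix is byte-identical to the previous one, which can be detected by hashing it. A stripped-down sketch of that mechanism (the real class also tracks token estimates and history):

```python
# Sketch of KV-cache-aware context assembly: a request is a cache hit when
# the stable prefix matches the previous request's prefix (tracked by hash).
import hashlib

_last_hash = None
hits = misses = 0


def prepare(stable_prefix, semi_stable, dynamic):
    global _last_hash, hits, misses
    h = hashlib.md5(stable_prefix.encode()).hexdigest()
    if h == _last_hash:
        hits += 1      # prefix unchanged -> cached tokens reused
    else:
        misses += 1    # prefix changed -> cache invalidated
    _last_hash = h
    return "\n---\n".join([stable_prefix, semi_stable, dynamic])


for turn in range(3):
    prepare("IDENTITY+TOOLS", "protocols v1", f"user turn {turn}")
```

Only the dynamic section changes per turn, so after the first (cold) request every subsequent one is a hit.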

## Integration with OptimizationRunner

### Option 1: Mixin

```python
from optimization_engine.context.runner_integration import ContextEngineeringMixin

class MyRunner(ContextEngineeringMixin, OptimizationRunner):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.init_context_engineering()
```

### Option 2: Wrapper

```python
from optimization_engine.context.runner_integration import ContextAwareRunner

runner = OptimizationRunner(config_path=...)
context_runner = ContextAwareRunner(runner)
context_runner.run(n_trials=100)
```

## Dashboard API

Base URL: `/api/context`

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/playbook` | GET | Playbook summary |
| `/playbook/items` | GET | List items (with filters) |
| `/playbook/items/{id}` | GET | Get specific item |
| `/playbook/feedback` | POST | Record helpful/harmful |
| `/playbook/insights` | POST | Add new insight |
| `/playbook/prune` | POST | Prune harmful items |
| `/playbook/context` | GET | Get LLM context string |
| `/session` | GET | Session state |
| `/learning/report` | GET | Learning report |

## Best Practices

### 1. Record Immediately

Don't wait until session end:
```python
# RIGHT: Record immediately
playbook.add_insight(InsightCategory.MISTAKE, "Convergence failed with X")
playbook.save(path)

# WRONG: Wait until end
# (User might close session, learning lost)
```

### 2. Be Specific

```python
# GOOD: Specific and actionable
"For bracket optimization with >5 variables, TPE outperforms random search"

# BAD: Vague
"TPE is good"
```

### 3. Include Context

```python
playbook.add_insight(
    InsightCategory.STRATEGY,
    "Shell elements reduce solve time by 40% for thickness < 2mm",
    tags=["mesh", "shell", "performance"]
)
```

### 4. Review Harmful Items

Periodically check items with negative scores:
```python
harmful = [i for i in playbook.items.values() if i.net_score < 0]
for item in harmful:
    print(f"{item.id}: {item.content[:50]}... (score={item.net_score})")
```

## Troubleshooting

### Playbook Not Updating

1. Check playbook path:
   ```python
   print(playbook_path)  # Should be knowledge_base/playbook.json
   ```

2. Verify save is called:
   ```python
   playbook.save(path)  # Must be explicit
   ```

### Insights Not Appearing in Context

1. Check confidence threshold:
   ```python
   # Default is 0.5 - new items start at 0.5
   context = playbook.get_context_for_task("opt", min_confidence=0.3)
   ```

2. Check if items exist:
   ```python
   print(f"Total items: {len(playbook.items)}")
   ```

### Learning Not Working

1. Verify FeedbackLoop is finalized:
   ```python
   feedback.finalize_study(...)  # MUST be called
   ```

2. Check context_items_used parameter:
   ```python
   # Items must be explicitly tracked
   feedback.process_trial_result(
       ...,
       context_items_used=list(playbook.items.keys())[:10]
   )
   ```

## Files Reference

| File | Purpose |
|------|---------|
| `optimization_engine/context/__init__.py` | Module exports |
| `optimization_engine/context/playbook.py` | Knowledge store |
| `optimization_engine/context/reflector.py` | Outcome analysis |
| `optimization_engine/context/session_state.py` | Context isolation |
| `optimization_engine/context/feedback_loop.py` | Learning loop |
| `optimization_engine/context/compaction.py` | Long session management |
| `optimization_engine/context/cache_monitor.py` | KV-cache optimization |
| `optimization_engine/context/runner_integration.py` | Runner integration |
| `knowledge_base/playbook.json` | Persistent storage |

## See Also

- `docs/CONTEXT_ENGINEERING_REPORT.md` - Full implementation report
- `.claude/skills/00_BOOTSTRAP_V2.md` - Enhanced bootstrap
- `tests/test_context_engineering.py` - Unit tests
- `tests/test_context_integration.py` - Integration tests
123	optimization_engine/context/__init__.py	Normal file
@@ -0,0 +1,123 @@
"""
Atomizer Context Engineering Module

Implements state-of-the-art context engineering for LLM-powered optimization.
Based on the ACE (Agentic Context Engineering) framework.

Components:
- Playbook: Structured knowledge store with helpful/harmful tracking
- Reflector: Analyzes optimization outcomes to extract insights
- SessionState: Context isolation with exposed/isolated separation
- CacheMonitor: KV-cache optimization for cost reduction
- FeedbackLoop: Automated learning from execution
- Compaction: Long-running session context management

Usage:
    from optimization_engine.context import (
        AtomizerPlaybook,
        AtomizerReflector,
        AtomizerSessionState,
        FeedbackLoop,
        CompactionManager
    )

    # Load or create playbook
    playbook = AtomizerPlaybook.load(path)

    # Create feedback loop for learning
    feedback = FeedbackLoop(playbook_path)

    # Process trial results
    feedback.process_trial_result(...)

    # Finalize and commit learning
    feedback.finalize_study(stats)
"""

from .playbook import (
    AtomizerPlaybook,
    PlaybookItem,
    InsightCategory,
    get_playbook,
    save_playbook,
)

from .reflector import (
    AtomizerReflector,
    OptimizationOutcome,
    InsightCandidate,
    ReflectorFactory,
)

from .session_state import (
    AtomizerSessionState,
    ExposedState,
    IsolatedState,
    TaskType,
    get_session,
    set_session,
    clear_session,
)

from .cache_monitor import (
    ContextCacheOptimizer,
    CacheStats,
    ContextSection,
    StablePrefixBuilder,
    get_cache_optimizer,
)

from .feedback_loop import (
    FeedbackLoop,
    FeedbackLoopFactory,
)

from .compaction import (
    CompactionManager,
    ContextEvent,
    EventType,
    ContextBudgetManager,
)

__all__ = [
    # Playbook
    "AtomizerPlaybook",
    "PlaybookItem",
    "InsightCategory",
    "get_playbook",
    "save_playbook",
    # Reflector
    "AtomizerReflector",
    "OptimizationOutcome",
    "InsightCandidate",
    "ReflectorFactory",
    # Session State
    "AtomizerSessionState",
    "ExposedState",
    "IsolatedState",
    "TaskType",
    "get_session",
    "set_session",
    "clear_session",
    # Cache Monitor
    "ContextCacheOptimizer",
    "CacheStats",
    "ContextSection",
    "StablePrefixBuilder",
    "get_cache_optimizer",
    # Feedback Loop
    "FeedbackLoop",
    "FeedbackLoopFactory",
    # Compaction
    "CompactionManager",
    "ContextEvent",
    "EventType",
    "ContextBudgetManager",
]

__version__ = "1.0.0"
390	optimization_engine/context/cache_monitor.py	Normal file
@@ -0,0 +1,390 @@
"""
Atomizer Cache Monitor - KV-Cache Optimization

Part of the ACE (Agentic Context Engineering) implementation for Atomizer.

Monitors and optimizes KV-cache hit rates for cost reduction.
Based on the principle that cached tokens cost ~10x less than uncached.

The cache monitor tracks:
- Stable prefix length (should stay constant for cache hits)
- Cache hit rate across requests
- Estimated cost savings

Structure for KV-cache optimization:
1. STABLE PREFIX - Never changes (identity, tools, routing)
2. SEMI-STABLE - Changes per session type (protocols, playbook)
3. DYNAMIC - Changes every turn (state, user message)
"""

from dataclasses import dataclass, field
from typing import Optional, List, Dict, Any
from datetime import datetime
import hashlib
import json
from pathlib import Path


@dataclass
class CacheStats:
    """Statistics for cache efficiency tracking."""
    total_requests: int = 0
    cache_hits: int = 0
    cache_misses: int = 0
    prefix_length_chars: int = 0
    prefix_length_tokens: int = 0  # Estimated

    @property
    def hit_rate(self) -> float:
        """Calculate cache hit rate (0.0-1.0)."""
        if self.total_requests == 0:
            return 0.0
        return self.cache_hits / self.total_requests

    @property
    def estimated_savings_percent(self) -> float:
        """
        Estimate cost savings from cache hits.

        Based on ~10x cost difference between cached/uncached tokens.
        """
        if self.total_requests == 0:
            return 0.0
        # Cached tokens cost ~10% of uncached,
        # so savings = hit_rate * 90%
        return self.hit_rate * 90.0

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary."""
        return {
            "total_requests": self.total_requests,
            "cache_hits": self.cache_hits,
            "cache_misses": self.cache_misses,
            "hit_rate": self.hit_rate,
            "prefix_length_chars": self.prefix_length_chars,
            "prefix_length_tokens": self.prefix_length_tokens,
            "estimated_savings_percent": self.estimated_savings_percent
        }


@dataclass
class ContextSection:
    """A section of context with stability classification."""
    name: str
    content: str
    stability: str  # "stable", "semi_stable", "dynamic"
    last_hash: str = ""

    def compute_hash(self) -> str:
        """Compute content hash for change detection."""
        return hashlib.md5(self.content.encode()).hexdigest()

    def has_changed(self) -> bool:
        """Check if content has changed since last hash."""
        current_hash = self.compute_hash()
        changed = current_hash != self.last_hash
        self.last_hash = current_hash
        return changed


class ContextCacheOptimizer:
    """
    Tracks and optimizes context for cache efficiency.

    Implements the three-tier context structure:
    1. Stable prefix (cached across all requests)
    2. Semi-stable section (cached per session type)
    3. Dynamic section (changes every turn)

    Usage:
        optimizer = ContextCacheOptimizer()

        # Build context with cache optimization
        context = optimizer.prepare_context(
            stable_prefix=identity_and_tools,
            semi_stable=protocols_and_playbook,
            dynamic=state_and_message
        )

        # Check efficiency
        print(optimizer.get_report())
    """

    # Approximate characters per token for estimation
    CHARS_PER_TOKEN = 4

    def __init__(self):
        self.stats = CacheStats()
        self._sections: Dict[str, ContextSection] = {}
        self._last_stable_hash: Optional[str] = None
        self._last_semi_stable_hash: Optional[str] = None
        self._request_history: List[Dict[str, Any]] = []

    def prepare_context(
        self,
        stable_prefix: str,
        semi_stable: str,
        dynamic: str
    ) -> str:
        """
        Assemble context optimized for caching.

        Tracks whether prefix changed (cache miss).

        Args:
            stable_prefix: Content that never changes (tools, identity)
            semi_stable: Content that changes per session type
            dynamic: Content that changes every turn

        Returns:
            Assembled context string with clear section boundaries
        """
        # Hash the stable prefix
        stable_hash = hashlib.md5(stable_prefix.encode()).hexdigest()

        self.stats.total_requests += 1

        # Check for cache hit (stable prefix unchanged).
        # Compute this BEFORE updating _last_stable_hash, otherwise the
        # comparison below would always be True.
        cache_hit = stable_hash == self._last_stable_hash
        if cache_hit:
            self.stats.cache_hits += 1
        else:
            self.stats.cache_misses += 1

        self._last_stable_hash = stable_hash
        self.stats.prefix_length_chars = len(stable_prefix)
        self.stats.prefix_length_tokens = len(stable_prefix) // self.CHARS_PER_TOKEN

        # Record request for history
        self._request_history.append({
            "timestamp": datetime.now().isoformat(),
            "cache_hit": cache_hit,
            "stable_length": len(stable_prefix),
            "semi_stable_length": len(semi_stable),
            "dynamic_length": len(dynamic)
        })

        # Keep history bounded
        if len(self._request_history) > 100:
            self._request_history = self._request_history[-100:]

        # Assemble with clear boundaries
        # Using markdown horizontal rules as section separators
        return f"""{stable_prefix}

---

{semi_stable}

---

{dynamic}"""

    def register_section(
        self,
        name: str,
        content: str,
        stability: str = "dynamic"
    ) -> None:
        """
        Register a context section for change tracking.

        Args:
            name: Section identifier
            content: Section content
            stability: One of "stable", "semi_stable", "dynamic"
        """
        section = ContextSection(
            name=name,
            content=content,
            stability=stability
        )
        section.last_hash = section.compute_hash()
        self._sections[name] = section

    def check_section_changes(self) -> Dict[str, bool]:
        """
        Check which sections have changed.

        Returns:
            Dictionary mapping section names to change status
        """
        changes = {}
        for name, section in self._sections.items():
            changes[name] = section.has_changed()
        return changes

    def get_stable_sections(self) -> List[str]:
        """Get names of sections marked as stable."""
        return [
            name for name, section in self._sections.items()
            if section.stability == "stable"
        ]

    def get_report(self) -> str:
        """Generate human-readable cache efficiency report."""
        return f"""
Cache Efficiency Report
=======================
Requests: {self.stats.total_requests}
Cache Hits: {self.stats.cache_hits}
Cache Misses: {self.stats.cache_misses}
Hit Rate: {self.stats.hit_rate:.1%}

Stable Prefix:
- Characters: {self.stats.prefix_length_chars:,}
- Estimated Tokens: {self.stats.prefix_length_tokens:,}

Cost Impact:
- Estimated Savings: {self.stats.estimated_savings_percent:.0f}%
- (Based on 10x cost difference for cached tokens)

Recommendations:
{self._get_recommendations()}
"""

    def _get_recommendations(self) -> str:
        """Generate optimization recommendations."""
        recommendations = []

        if self.stats.hit_rate < 0.5 and self.stats.total_requests > 5:
            recommendations.append(
                "- Low cache hit rate: Check if stable prefix is actually stable"
            )

        if self.stats.prefix_length_tokens > 5000:
            recommendations.append(
                "- Large stable prefix: Consider moving less-stable content to semi-stable"
            )

        if self.stats.prefix_length_tokens < 1000:
            recommendations.append(
                "- Small stable prefix: Consider moving more content to stable section"
            )

        if not recommendations:
            recommendations.append("- Cache performance looks good!")

        return "\n".join(recommendations)

    def get_stats_dict(self) -> Dict[str, Any]:
        """Get statistics as dictionary."""
        return self.stats.to_dict()

    def reset_stats(self) -> None:
        """Reset all statistics."""
        self.stats = CacheStats()
        self._request_history = []

    def save_stats(self, path: Path) -> None:
        """Save statistics to JSON file."""
        data = {
            "stats": self.stats.to_dict(),
            "request_history": self._request_history[-50:],  # Last 50
            "sections": {
                name: {
                    "stability": s.stability,
                    "content_length": len(s.content)
                }
                for name, s in self._sections.items()
            }
        }
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2)

    @classmethod
    def load_stats(cls, path: Path) -> "ContextCacheOptimizer":
        """Load statistics from JSON file."""
        optimizer = cls()

        if not path.exists():
            return optimizer

        with open(path, encoding='utf-8') as f:
            data = json.load(f)

        stats = data.get("stats", {})
        optimizer.stats.total_requests = stats.get("total_requests", 0)
        optimizer.stats.cache_hits = stats.get("cache_hits", 0)
        optimizer.stats.cache_misses = stats.get("cache_misses", 0)
        optimizer.stats.prefix_length_chars = stats.get("prefix_length_chars", 0)
        optimizer.stats.prefix_length_tokens = stats.get("prefix_length_tokens", 0)

        optimizer._request_history = data.get("request_history", [])

        return optimizer


class StablePrefixBuilder:
    """
    Helper for building stable prefix content.

    Ensures consistent ordering and formatting of stable content
    to maximize cache hits.
    """

    def __init__(self):
        self._sections: List[tuple] = []  # (order, name, content)

    def add_section(self, name: str, content: str, order: int = 50) -> "StablePrefixBuilder":
        """
        Add a section to the stable prefix.

        Args:
            name: Section name (for documentation)
            content: Section content
            order: Sort order (lower = earlier)

        Returns:
            Self for chaining
        """
        self._sections.append((order, name, content))
        return self

    def add_identity(self, identity: str) -> "StablePrefixBuilder":
        """Add identity section (order 10)."""
        return self.add_section("identity", identity, order=10)

    def add_capabilities(self, capabilities: str) -> "StablePrefixBuilder":
        """Add capabilities section (order 20)."""
        return self.add_section("capabilities", capabilities, order=20)

    def add_tools(self, tools: str) -> "StablePrefixBuilder":
        """Add tools section (order 30)."""
        return self.add_section("tools", tools, order=30)

    def add_routing(self, routing: str) -> "StablePrefixBuilder":
        """Add routing section (order 40)."""
        return self.add_section("routing", routing, order=40)

    def build(self) -> str:
        """
        Build the stable prefix string.

        Sections are sorted by order to ensure consistency.

        Returns:
            Assembled stable prefix
        """
        # Sort by order
        sorted_sections = sorted(self._sections, key=lambda x: x[0])

        lines = []
        for _, name, content in sorted_sections:
            lines.append(f"<!-- {name} -->")
            lines.append(content.strip())
            lines.append("")

        return "\n".join(lines)


# Global cache optimizer instance
_global_optimizer: Optional[ContextCacheOptimizer] = None


def get_cache_optimizer() -> ContextCacheOptimizer:
    """Get the global cache optimizer instance."""
    global _global_optimizer
    if _global_optimizer is None:
        _global_optimizer = ContextCacheOptimizer()
    return _global_optimizer
520	optimization_engine/context/compaction.py	Normal file
@@ -0,0 +1,520 @@
"""
Atomizer Context Compaction - Long-Running Session Management

Part of the ACE (Agentic Context Engineering) implementation for Atomizer.

Based on Google ADK's compaction architecture:
- Trigger compaction when threshold reached
- Summarize older events
- Preserve recent detail
- Never compact error events

This module handles context management for long-running optimizations
that may exceed context window limits.
"""

from typing import List, Dict, Any, Optional
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class EventType(Enum):
    """Types of events in optimization context."""
    TRIAL_START = "trial_start"
    TRIAL_COMPLETE = "trial_complete"
    TRIAL_FAILED = "trial_failed"
    ERROR = "error"
    WARNING = "warning"
    MILESTONE = "milestone"
    COMPACTION = "compaction"
    STUDY_START = "study_start"
    STUDY_END = "study_end"
    CONFIG_CHANGE = "config_change"


@dataclass
class ContextEvent:
    """
    Single event in optimization context.

    Events are the atomic units of context history.
    They can be compacted (summarized) or preserved based on importance.
    """
    timestamp: datetime
    event_type: EventType
    summary: str
    details: Dict[str, Any] = field(default_factory=dict)
    compacted: bool = False
    preserve: bool = False  # If True, never compact this event

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary."""
        return {
            "timestamp": self.timestamp.isoformat(),
            "event_type": self.event_type.value,
            "summary": self.summary,
            "details": self.details,
            "compacted": self.compacted,
            "preserve": self.preserve
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ContextEvent":
        """Create from dictionary."""
        return cls(
            timestamp=datetime.fromisoformat(data["timestamp"]),
            event_type=EventType(data["event_type"]),
            summary=data["summary"],
            details=data.get("details", {}),
            compacted=data.get("compacted", False),
            preserve=data.get("preserve", False)
        )


class CompactionManager:
    """
    Manages context compaction for long optimization sessions.

    Strategy:
    - Keep last N events in full detail
    - Summarize older events into milestone markers
    - Preserve error events (never compact errors)
    - Track statistics for optimization insights

    Usage:
        manager = CompactionManager(compaction_threshold=50, keep_recent=20)

        # Add events as they occur
        manager.add_event(ContextEvent(
            timestamp=datetime.now(),
            event_type=EventType.TRIAL_COMPLETE,
            summary="Trial 42 complete: obj=100.5",
            details={"trial_number": 42, "objective": 100.5}
        ))

        # Get context string for LLM
        context = manager.get_context_string()

        # Check if compaction occurred
        print(f"Compactions: {manager.compaction_count}")
    """

    def __init__(
        self,
        compaction_threshold: int = 50,
        keep_recent: int = 20,
        keep_errors: bool = True
    ):
        """
        Initialize compaction manager.

        Args:
            compaction_threshold: Trigger compaction when events exceed this
            keep_recent: Number of recent events to always keep in detail
            keep_errors: Whether to preserve all error events
        """
        self.events: List[ContextEvent] = []
        self.compaction_threshold = compaction_threshold
        self.keep_recent = keep_recent
        self.keep_errors = keep_errors
        self.compaction_count = 0

        # Statistics for compacted regions
        self._compaction_stats: List[Dict[str, Any]] = []

    def add_event(self, event: ContextEvent) -> bool:
        """
        Add event and trigger compaction if needed.

        Args:
            event: The event to add

        Returns:
            True if compaction was triggered
        """
        # Mark errors as preserved
        if event.event_type == EventType.ERROR and self.keep_errors:
            event.preserve = True

        self.events.append(event)

        # Check if compaction needed
        if len(self.events) > self.compaction_threshold:
            self._compact()
            return True

        return False

    def add_trial_event(
        self,
        trial_number: int,
        success: bool,
        objective: Optional[float] = None,
        duration: Optional[float] = None
    ) -> None:
        """
        Convenience method to add a trial completion event.

        Args:
            trial_number: Trial number
            success: Whether trial succeeded
            objective: Objective value (if successful)
            duration: Trial duration in seconds
        """
        event_type = EventType.TRIAL_COMPLETE if success else EventType.TRIAL_FAILED

        summary_parts = [f"Trial {trial_number}"]
        if success and objective is not None:
            summary_parts.append(f"obj={objective:.4g}")
        elif not success:
            summary_parts.append("FAILED")
        if duration is not None:
            summary_parts.append(f"{duration:.1f}s")

        self.add_event(ContextEvent(
            timestamp=datetime.now(),
            event_type=event_type,
            summary=" | ".join(summary_parts),
            details={
                "trial_number": trial_number,
                "success": success,
                "objective": objective,
                "duration": duration
            }
        ))

    def add_error_event(self, error_message: str, error_type: str = "") -> None:
        """
        Add an error event (always preserved).

        Args:
            error_message: Error description
            error_type: Optional error classification
        """
        summary = f"[{error_type}] {error_message}" if error_type else error_message

        self.add_event(ContextEvent(
            timestamp=datetime.now(),
            event_type=EventType.ERROR,
            summary=summary,
            details={"error_type": error_type, "message": error_message},
            preserve=True
        ))

    def add_milestone(self, description: str, details: Optional[Dict[str, Any]] = None) -> None:
        """
        Add a milestone event (preserved).

        Args:
            description: Milestone description
            details: Optional additional details
        """
        self.add_event(ContextEvent(
            timestamp=datetime.now(),
            event_type=EventType.MILESTONE,
            summary=description,
            details=details or {},
            preserve=True
        ))

    def _compact(self) -> None:
        """
        Compact older events into summaries.

        Preserves:
        - All error events (if keep_errors=True)
        - Events marked with preserve=True
        - Last `keep_recent` events
        - Milestone summaries of compacted regions
        """
        if len(self.events) <= self.keep_recent:
            return

        # Split into old and recent
        old_events = self.events[:-self.keep_recent]
        recent_events = self.events[-self.keep_recent:]

        # Separate preserved from compactable
        preserved_events = [e for e in old_events if e.preserve]
        compactable_events = [e for e in old_events if not e.preserve]

        # Summarize compactable events
        if compactable_events:
            summary = self._create_summary(compactable_events)

            compaction_event = ContextEvent(
                timestamp=compactable_events[0].timestamp,
                event_type=EventType.COMPACTION,
                summary=summary,
                details={
                    "events_compacted": len(compactable_events),
                    "compaction_number": self.compaction_count,
                    "time_range": {
                        "start": compactable_events[0].timestamp.isoformat(),
                        "end": compactable_events[-1].timestamp.isoformat()
                    }
                },
                compacted=True
            )

            self.compaction_count += 1

            # Store compaction statistics
            self._compaction_stats.append({
                "compaction_number": self.compaction_count,
                "events_compacted": len(compactable_events),
                "summary": summary
            })

            # Rebuild events list
            self.events = [compaction_event] + preserved_events + recent_events
        else:
            self.events = preserved_events + recent_events

    def _create_summary(self, events: List[ContextEvent]) -> str:
        """
        Create summary of compacted events.

        Args:
            events: List of events to summarize

        Returns:
            Summary string
        """
        # Collect trial statistics
        trial_events = [
            e for e in events
            if e.event_type in (EventType.TRIAL_COMPLETE, EventType.TRIAL_FAILED)
|
||||
]
|
||||
|
||||
if not trial_events:
|
||||
return f"[{len(events)} events compacted]"
|
||||
|
||||
# Extract trial statistics
|
||||
trial_numbers = []
|
||||
objectives = []
|
||||
failures = 0
|
||||
|
||||
for e in trial_events:
|
||||
if "trial_number" in e.details:
|
||||
trial_numbers.append(e.details["trial_number"])
|
||||
if "objective" in e.details and e.details["objective"] is not None:
|
||||
objectives.append(e.details["objective"])
|
||||
if e.event_type == EventType.TRIAL_FAILED:
|
||||
failures += 1
|
||||
|
||||
if trial_numbers and objectives:
|
||||
return (
|
||||
f"Trials {min(trial_numbers)}-{max(trial_numbers)}: "
|
||||
f"Best={min(objectives):.4g}, "
|
||||
f"Avg={sum(objectives)/len(objectives):.4g}, "
|
||||
f"Failures={failures}"
|
||||
)
|
||||
elif trial_numbers:
|
||||
return f"Trials {min(trial_numbers)}-{max(trial_numbers)} ({failures} failures)"
|
||||
else:
|
||||
return f"[{len(events)} events compacted]"
|
||||
|
||||
def get_context_string(self, include_timestamps: bool = False) -> str:
|
||||
"""
|
||||
Generate context string from events.
|
||||
|
||||
Args:
|
||||
include_timestamps: Whether to include timestamps
|
||||
|
||||
Returns:
|
||||
Formatted context string for LLM
|
||||
"""
|
||||
lines = ["## Optimization History", ""]
|
||||
|
||||
for event in self.events:
|
||||
timestamp = ""
|
||||
if include_timestamps:
|
||||
timestamp = f"[{event.timestamp.strftime('%H:%M:%S')}] "
|
||||
|
||||
if event.compacted:
|
||||
lines.append(f"📦 {timestamp}{event.summary}")
|
||||
elif event.event_type == EventType.ERROR:
|
||||
lines.append(f"❌ {timestamp}{event.summary}")
|
||||
elif event.event_type == EventType.WARNING:
|
||||
lines.append(f"⚠️ {timestamp}{event.summary}")
|
||||
elif event.event_type == EventType.MILESTONE:
|
||||
lines.append(f"🎯 {timestamp}{event.summary}")
|
||||
elif event.event_type == EventType.TRIAL_FAILED:
|
||||
lines.append(f"✗ {timestamp}{event.summary}")
|
||||
elif event.event_type == EventType.TRIAL_COMPLETE:
|
||||
lines.append(f"✓ {timestamp}{event.summary}")
|
||||
else:
|
||||
lines.append(f"- {timestamp}{event.summary}")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
def get_stats(self) -> Dict[str, Any]:
|
||||
"""Get compaction statistics."""
|
||||
event_counts = {}
|
||||
for event in self.events:
|
||||
etype = event.event_type.value
|
||||
event_counts[etype] = event_counts.get(etype, 0) + 1
|
||||
|
||||
return {
|
||||
"total_events": len(self.events),
|
||||
"compaction_count": self.compaction_count,
|
||||
"events_by_type": event_counts,
|
||||
"error_events": event_counts.get("error", 0),
|
||||
"compacted_events": len([e for e in self.events if e.compacted]),
|
||||
"preserved_events": len([e for e in self.events if e.preserve]),
|
||||
"compaction_history": self._compaction_stats[-5:] # Last 5
|
||||
}
|
||||
|
||||
def get_recent_events(self, n: int = 10) -> List[ContextEvent]:
|
||||
"""Get the n most recent events."""
|
||||
return self.events[-n:]
|
||||
|
||||
def get_errors(self) -> List[ContextEvent]:
|
||||
"""Get all error events."""
|
||||
return [e for e in self.events if e.event_type == EventType.ERROR]
|
||||
|
||||
def clear(self) -> None:
|
||||
"""Clear all events and reset state."""
|
||||
self.events = []
|
||||
self.compaction_count = 0
|
||||
self._compaction_stats = []
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
"""Convert to dictionary for serialization."""
|
||||
return {
|
||||
"events": [e.to_dict() for e in self.events],
|
||||
"compaction_threshold": self.compaction_threshold,
|
||||
"keep_recent": self.keep_recent,
|
||||
"keep_errors": self.keep_errors,
|
||||
"compaction_count": self.compaction_count,
|
||||
"compaction_stats": self._compaction_stats
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> "CompactionManager":
|
||||
"""Create from dictionary."""
|
||||
manager = cls(
|
||||
compaction_threshold=data.get("compaction_threshold", 50),
|
||||
keep_recent=data.get("keep_recent", 20),
|
||||
keep_errors=data.get("keep_errors", True)
|
||||
)
|
||||
manager.events = [ContextEvent.from_dict(e) for e in data.get("events", [])]
|
||||
manager.compaction_count = data.get("compaction_count", 0)
|
||||
manager._compaction_stats = data.get("compaction_stats", [])
|
||||
return manager
|
||||
|
||||
|
||||
class ContextBudgetManager:
|
||||
"""
|
||||
Manages overall context budget across sessions.
|
||||
|
||||
Tracks:
|
||||
- Token estimates for each context section
|
||||
- Recommendations for context reduction
|
||||
- Budget allocation warnings
|
||||
"""
|
||||
|
||||
# Approximate tokens per character
|
||||
CHARS_PER_TOKEN = 4
|
||||
|
||||
# Default budget allocation (tokens)
|
||||
DEFAULT_BUDGET = {
|
||||
"stable_prefix": 5000,
|
||||
"protocols": 10000,
|
||||
"playbook": 5000,
|
||||
"session_state": 2000,
|
||||
"conversation": 30000,
|
||||
"working_space": 48000,
|
||||
"total": 100000
|
||||
}
|
||||
|
||||
def __init__(self, budget: Optional[Dict[str, int]] = None):
|
||||
"""
|
||||
Initialize budget manager.
|
||||
|
||||
Args:
|
||||
budget: Custom budget allocation (uses defaults if not provided)
|
||||
"""
|
||||
self.budget = budget or self.DEFAULT_BUDGET.copy()
|
||||
self._current_usage: Dict[str, int] = {k: 0 for k in self.budget.keys()}
|
||||
|
||||
def estimate_tokens(self, text: str) -> int:
|
||||
"""Estimate token count for text."""
|
||||
return len(text) // self.CHARS_PER_TOKEN
|
||||
|
||||
def update_usage(self, section: str, text: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Update usage for a section.
|
||||
|
||||
Args:
|
||||
section: Budget section name
|
||||
text: Content of the section
|
||||
|
||||
Returns:
|
||||
Usage status with warnings if over budget
|
||||
"""
|
||||
tokens = self.estimate_tokens(text)
|
||||
self._current_usage[section] = tokens
|
||||
|
||||
result = {
|
||||
"section": section,
|
||||
"tokens": tokens,
|
||||
"budget": self.budget.get(section, 0),
|
||||
"over_budget": tokens > self.budget.get(section, float('inf'))
|
||||
}
|
||||
|
||||
if result["over_budget"]:
|
||||
result["warning"] = f"{section} exceeds budget by {tokens - self.budget[section]} tokens"
|
||||
|
||||
return result
|
||||
|
||||
def get_total_usage(self) -> int:
|
||||
"""Get total token usage across all sections."""
|
||||
return sum(self._current_usage.values())
|
||||
|
||||
def get_status(self) -> Dict[str, Any]:
|
||||
"""Get overall budget status."""
|
||||
total_used = self.get_total_usage()
|
||||
total_budget = self.budget.get("total", 100000)
|
||||
|
||||
return {
|
||||
"total_used": total_used,
|
||||
"total_budget": total_budget,
|
||||
"utilization": total_used / total_budget,
|
||||
"by_section": {
|
||||
section: {
|
||||
"used": self._current_usage.get(section, 0),
|
||||
"budget": self.budget.get(section, 0),
|
||||
"utilization": (
|
||||
self._current_usage.get(section, 0) / self.budget.get(section, 1)
|
||||
if self.budget.get(section, 0) > 0 else 0
|
||||
)
|
||||
}
|
||||
for section in self.budget.keys()
|
||||
if section != "total"
|
||||
},
|
||||
"recommendations": self._get_recommendations()
|
||||
}
|
||||
|
||||
def _get_recommendations(self) -> List[str]:
|
||||
"""Generate budget recommendations."""
|
||||
recommendations = []
|
||||
total_used = self.get_total_usage()
|
||||
total_budget = self.budget.get("total", 100000)
|
||||
|
||||
if total_used > total_budget * 0.9:
|
||||
recommendations.append("Context usage > 90%. Consider triggering compaction.")
|
||||
|
||||
for section, used in self._current_usage.items():
|
||||
budget = self.budget.get(section, 0)
|
||||
if budget > 0 and used > budget:
|
||||
recommendations.append(
|
||||
f"{section}: {used - budget} tokens over budget. Reduce content."
|
||||
)
|
||||
|
||||
if not recommendations:
|
||||
recommendations.append("Budget healthy.")
|
||||
|
||||
return recommendations
|
||||
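The budget manager above reduces to a fixed characters-per-token ratio (`CHARS_PER_TOKEN = 4`) plus a per-section over-budget check. A minimal standalone sketch of that heuristic (the function names here are illustrative, not part of the module):

```python
# Standalone sketch of the 4-chars-per-token budget heuristic.
# Names (estimate_tokens, check_budget) are hypothetical, for illustration only.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (integer division)."""
    return len(text) // CHARS_PER_TOKEN

def check_budget(section_text: str, budget_tokens: int) -> dict:
    """Flag a context section whose estimated tokens exceed its budget."""
    tokens = estimate_tokens(section_text)
    return {"tokens": tokens, "over_budget": tokens > budget_tokens}

status = check_budget("x" * 8000, budget_tokens=1500)
# 8000 chars / 4 ≈ 2000 tokens, so this section is over a 1500-token budget
```

The 4:1 ratio is only a coarse English-text approximation; a real tokenizer would give tighter numbers.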
378
optimization_engine/context/feedback_loop.py
Normal file
@@ -0,0 +1,378 @@
"""
Atomizer Feedback Loop - Automated Learning from Execution

Part of the ACE (Agentic Context Engineering) implementation for Atomizer.

Connects optimization outcomes to playbook updates using the principle:
"Leverage natural execution feedback as the learning signal"

The feedback loop:
1. Observes trial outcomes (success/failure)
2. Tracks which playbook items were active during each trial
3. Updates helpful/harmful counts based on outcomes
4. Commits new insights from the reflector

This implements true self-improvement: the system gets better
at optimization over time by learning from its own execution.
"""

from typing import Dict, Any, List, Optional
from pathlib import Path
from datetime import datetime
import json

from .playbook import AtomizerPlaybook, InsightCategory
from .reflector import AtomizerReflector, OptimizationOutcome


class FeedbackLoop:
    """
    Automated feedback loop that learns from optimization runs.

    Key insight from ACE: Use execution feedback (success/failure)
    as the learning signal, not labeled data.

    Usage:
        feedback = FeedbackLoop(playbook_path)

        # After each trial
        feedback.process_trial_result(
            trial_number=42,
            success=True,
            objective_value=100.5,
            design_variables={"thickness": 1.5},
            context_items_used=["str-00001", "mis-00003"]
        )

        # After study completion
        result = feedback.finalize_study(study_stats)
        print(f"Added {result['insights_added']} insights")
    """

    def __init__(self, playbook_path: Path):
        """
        Initialize feedback loop with playbook path.

        Args:
            playbook_path: Path to the playbook JSON file
        """
        self.playbook_path = playbook_path
        self.playbook = AtomizerPlaybook.load(playbook_path)
        self.reflector = AtomizerReflector(self.playbook)

        # Track items used per trial for attribution
        self._trial_item_usage: Dict[int, List[str]] = {}

        # Track outcomes for batch analysis
        self._outcomes: List[OptimizationOutcome] = []

        # Statistics
        self._total_trials_processed = 0
        self._successful_trials = 0
        self._failed_trials = 0

    def process_trial_result(
        self,
        trial_number: int,
        success: bool,
        objective_value: float,
        design_variables: Dict[str, float],
        context_items_used: Optional[List[str]] = None,
        errors: Optional[List[str]] = None,
        extractor_used: str = "",
        duration_seconds: float = 0.0
    ) -> Dict[str, Any]:
        """
        Process a trial result and update playbook accordingly.

        This is the core learning mechanism:
        - If trial succeeded with certain playbook items -> increase helpful count
        - If trial failed with certain playbook items -> increase harmful count

        Args:
            trial_number: Trial number
            success: Whether the trial succeeded
            objective_value: Objective function value (0 if failed)
            design_variables: Design variable values used
            context_items_used: List of playbook item IDs in context
            errors: List of error messages (if any)
            extractor_used: Name of extractor used
            duration_seconds: Trial duration

        Returns:
            Dictionary with processing results
        """
        context_items_used = context_items_used or []
        errors = errors or []

        # Update statistics
        self._total_trials_processed += 1
        if success:
            self._successful_trials += 1
        else:
            self._failed_trials += 1

        # Track item usage for this trial
        self._trial_item_usage[trial_number] = context_items_used

        # Update playbook item scores based on outcome
        items_updated = 0
        for item_id in context_items_used:
            if self.playbook.record_outcome(item_id, helpful=success):
                items_updated += 1

        # Create outcome for reflection
        outcome = OptimizationOutcome(
            trial_number=trial_number,
            success=success,
            objective_value=objective_value if success else None,
            constraint_violations=[],
            solver_errors=errors,
            design_variables=design_variables,
            extractor_used=extractor_used,
            duration_seconds=duration_seconds
        )

        # Store outcome
        self._outcomes.append(outcome)

        # Reflect on outcome
        insights = self.reflector.analyze_trial(outcome)

        return {
            "trial_number": trial_number,
            "success": success,
            "items_updated": items_updated,
            "insights_extracted": len(insights)
        }

    def record_error(
        self,
        trial_number: int,
        error_type: str,
        error_message: str,
        context_items_used: Optional[List[str]] = None
    ) -> None:
        """
        Record an error for a trial.

        Separate from process_trial_result for cases where
        we want to record errors without full trial data.

        Args:
            trial_number: Trial number
            error_type: Classification of error
            error_message: Error details
            context_items_used: Playbook items that were active
        """
        context_items_used = context_items_used or []

        # Mark items as harmful
        for item_id in context_items_used:
            self.playbook.record_outcome(item_id, helpful=False)

        # Create insight about the error
        self.reflector.pending_insights.append({
            "category": InsightCategory.MISTAKE,
            "content": f"{error_type}: {error_message[:200]}",
            "helpful": False,
            "trial": trial_number
        })

    def finalize_study(
        self,
        study_stats: Dict[str, Any],
        save_playbook: bool = True
    ) -> Dict[str, Any]:
        """
        Called when study completes. Commits insights and prunes playbook.

        Args:
            study_stats: Dictionary with study statistics:
                - name: Study name
                - total_trials: Total trials run
                - best_value: Best objective achieved
                - convergence_rate: Success rate (0.0-1.0)
                - method: Optimization method used
            save_playbook: Whether to save playbook to disk

        Returns:
            Dictionary with finalization results
        """
        # Analyze study-level patterns
        study_insights = self.reflector.analyze_study_completion(
            study_name=study_stats.get("name", "unknown"),
            total_trials=study_stats.get("total_trials", 0),
            best_value=study_stats.get("best_value", 0),
            convergence_rate=study_stats.get("convergence_rate", 0),
            method=study_stats.get("method", "")
        )

        # Commit all pending insights
        insights_added = self.reflector.commit_insights()

        # Prune consistently harmful items
        items_pruned = self.playbook.prune_harmful(threshold=-3)

        # Save updated playbook
        if save_playbook:
            self.playbook.save(self.playbook_path)

        return {
            "insights_added": insights_added,
            "items_pruned": items_pruned,
            "playbook_size": len(self.playbook.items),
            "playbook_version": self.playbook.version,
            "total_trials_processed": self._total_trials_processed,
            "successful_trials": self._successful_trials,
            "failed_trials": self._failed_trials,
            "success_rate": (
                self._successful_trials / self._total_trials_processed
                if self._total_trials_processed > 0 else 0
            )
        }

    def get_item_performance(self) -> Dict[str, Dict[str, Any]]:
        """
        Get performance metrics for all playbook items.

        Returns:
            Dictionary mapping item IDs to performance stats
        """
        performance = {}
        for item_id, item in self.playbook.items.items():
            trials_used_in = [
                trial for trial, items in self._trial_item_usage.items()
                if item_id in items
            ]
            performance[item_id] = {
                "helpful_count": item.helpful_count,
                "harmful_count": item.harmful_count,
                "net_score": item.net_score,
                "confidence": item.confidence,
                "trials_used_in": len(trials_used_in),
                "category": item.category.value,
                "content_preview": item.content[:100]
            }
        return performance

    def get_top_performers(self, n: int = 10) -> List[Dict[str, Any]]:
        """
        Get the top performing playbook items.

        Args:
            n: Number of top items to return

        Returns:
            List of item performance dictionaries
        """
        performance = self.get_item_performance()
        sorted_items = sorted(
            performance.items(),
            key=lambda x: x[1]["net_score"],
            reverse=True
        )
        return [
            {"id": item_id, **stats}
            for item_id, stats in sorted_items[:n]
        ]

    def get_worst_performers(self, n: int = 10) -> List[Dict[str, Any]]:
        """
        Get the worst performing playbook items.

        Args:
            n: Number of worst items to return

        Returns:
            List of item performance dictionaries
        """
        performance = self.get_item_performance()
        sorted_items = sorted(
            performance.items(),
            key=lambda x: x[1]["net_score"]
        )
        return [
            {"id": item_id, **stats}
            for item_id, stats in sorted_items[:n]
        ]

    def get_statistics(self) -> Dict[str, Any]:
        """Get feedback loop statistics."""
        return {
            "total_trials_processed": self._total_trials_processed,
            "successful_trials": self._successful_trials,
            "failed_trials": self._failed_trials,
            "success_rate": (
                self._successful_trials / self._total_trials_processed
                if self._total_trials_processed > 0 else 0
            ),
            "playbook_items": len(self.playbook.items),
            "pending_insights": self.reflector.get_pending_count(),
            "outcomes_recorded": len(self._outcomes)
        }

    def export_learning_report(self, path: Path) -> None:
        """
        Export a detailed learning report.

        Args:
            path: Path to save the report
        """
        report = {
            "generated_at": datetime.now().isoformat(),
            "statistics": self.get_statistics(),
            "top_performers": self.get_top_performers(20),
            "worst_performers": self.get_worst_performers(10),
            "playbook_stats": self.playbook.get_stats(),
            "outcomes_summary": {
                "total": len(self._outcomes),
                "by_success": {
                    "success": len([o for o in self._outcomes if o.success]),
                    "failure": len([o for o in self._outcomes if not o.success])
                }
            }
        }

        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, 'w', encoding='utf-8') as f:
            json.dump(report, f, indent=2)

    def reset(self) -> None:
        """Reset the feedback loop state (keeps playbook)."""
        self._trial_item_usage = {}
        self._outcomes = []
        self._total_trials_processed = 0
        self._successful_trials = 0
        self._failed_trials = 0
        self.reflector = AtomizerReflector(self.playbook)


class FeedbackLoopFactory:
    """Factory for creating feedback loops."""

    @staticmethod
    def create_for_study(study_dir: Path) -> FeedbackLoop:
        """
        Create a feedback loop for a specific study.

        Args:
            study_dir: Path to study directory

        Returns:
            Configured FeedbackLoop
        """
        playbook_path = study_dir / "3_results" / "playbook.json"
        return FeedbackLoop(playbook_path)

    @staticmethod
    def create_global() -> FeedbackLoop:
        """
        Create a feedback loop using the global playbook.

        Returns:
            FeedbackLoop using global playbook path
        """
        playbook_path = Path(__file__).parents[2] / "knowledge_base" / "playbook.json"
        return FeedbackLoop(playbook_path)
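The credit-assignment step in `process_trial_result` can be sketched standalone: each trial's outcome increments a helpful or harmful counter for every playbook item that was in context. This sketch uses hypothetical names (`attribute_outcomes`) and plain tuples rather than the module's classes:

```python
# Standalone sketch of helpful/harmful attribution across trials.
# attribute_outcomes is a hypothetical name, not the module's API.
from collections import defaultdict

def attribute_outcomes(trials):
    """trials: list of (success, [item_ids]) -> {item_id: (helpful, harmful)}."""
    scores = defaultdict(lambda: [0, 0])  # [helpful_count, harmful_count]
    for success, item_ids in trials:
        for item_id in item_ids:
            scores[item_id][0 if success else 1] += 1
    return {k: tuple(v) for k, v in scores.items()}

counts = attribute_outcomes([
    (True, ["str-00001", "dom-00002"]),   # success: both items credited
    (False, ["str-00001"]),               # failure: str-00001 penalized
    (True, ["str-00001"]),
])
# counts["str-00001"] == (2, 1), i.e. net score +1
```

Because every item in context shares the trial's fate, noisy attribution averages out only over many trials; the module mitigates this by pruning items whose net score drops below a threshold.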
432
optimization_engine/context/playbook.py
Normal file
@@ -0,0 +1,432 @@
"""
Atomizer Playbook - Structured Knowledge Store

Part of the ACE (Agentic Context Engineering) implementation for Atomizer.
Based on ACE framework principles:
- Incremental delta updates (never rewrite wholesale)
- Helpful/harmful tracking for each insight
- Semantic deduplication
- Category-based organization

This module provides the core data structures for accumulating optimization
knowledge across sessions.
"""

from dataclasses import dataclass, field
from typing import List, Dict, Optional, Any
from enum import Enum
import json
from pathlib import Path
from datetime import datetime
import hashlib


class InsightCategory(Enum):
    """Categories for playbook insights."""
    STRATEGY = "str"      # Optimization strategies
    CALCULATION = "cal"   # Formulas and calculations
    MISTAKE = "mis"       # Common mistakes to avoid
    TOOL = "tool"         # Tool usage patterns
    DOMAIN = "dom"        # Domain-specific knowledge (FEA, NX)
    WORKFLOW = "wf"       # Workflow patterns


@dataclass
class PlaybookItem:
    """
    Single insight in the playbook with helpful/harmful tracking.

    Each item accumulates feedback over time:
    - helpful_count: Times this insight led to success
    - harmful_count: Times this insight led to failure
    - net_score: helpful - harmful (used for ranking)
    - confidence: helpful / (helpful + harmful)
    """
    id: str
    category: InsightCategory
    content: str
    helpful_count: int = 0
    harmful_count: int = 0
    created_at: str = field(default_factory=lambda: datetime.now().isoformat())
    last_used: Optional[str] = None
    source_trials: List[int] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    @property
    def net_score(self) -> int:
        """Net helpfulness score (helpful - harmful)."""
        return self.helpful_count - self.harmful_count

    @property
    def confidence(self) -> float:
        """Confidence score (0.0-1.0) based on outcome ratio."""
        total = self.helpful_count + self.harmful_count
        if total == 0:
            return 0.5  # Neutral confidence for untested items
        return self.helpful_count / total

    def to_context_string(self) -> str:
        """Format for injection into LLM context."""
        return f"[{self.id}] helpful={self.helpful_count} harmful={self.harmful_count} :: {self.content}"

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary for serialization."""
        return {
            "id": self.id,
            "category": self.category.value,
            "content": self.content,
            "helpful_count": self.helpful_count,
            "harmful_count": self.harmful_count,
            "created_at": self.created_at,
            "last_used": self.last_used,
            "source_trials": self.source_trials,
            "tags": self.tags
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PlaybookItem":
        """Create from dictionary."""
        return cls(
            id=data["id"],
            category=InsightCategory(data["category"]),
            content=data["content"],
            helpful_count=data.get("helpful_count", 0),
            harmful_count=data.get("harmful_count", 0),
            created_at=data.get("created_at", ""),
            last_used=data.get("last_used"),
            source_trials=data.get("source_trials", []),
            tags=data.get("tags", [])
        )


@dataclass
class AtomizerPlaybook:
    """
    Evolving playbook that accumulates optimization knowledge.

    Based on ACE framework principles:
    - Incremental delta updates (never rewrite wholesale)
    - Helpful/harmful tracking for each insight
    - Semantic deduplication
    - Category-based organization

    Usage:
        playbook = AtomizerPlaybook.load(path)
        item = playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements for thin walls")
        playbook.record_outcome(item.id, helpful=True)
        playbook.save(path)
    """
    items: Dict[str, PlaybookItem] = field(default_factory=dict)
    version: int = 1
    last_updated: str = field(default_factory=lambda: datetime.now().isoformat())

    def _generate_id(self, category: InsightCategory) -> str:
        """Generate unique ID for new item."""
        existing = [k for k in self.items.keys() if k.startswith(category.value)]
        next_num = len(existing) + 1
        return f"{category.value}-{next_num:05d}"

    def _content_hash(self, content: str) -> str:
        """Generate hash for content deduplication."""
        normalized = content.lower().strip()
        return hashlib.md5(normalized.encode()).hexdigest()[:12]

    def add_insight(
        self,
        category: InsightCategory,
        content: str,
        source_trial: Optional[int] = None,
        tags: Optional[List[str]] = None
    ) -> PlaybookItem:
        """
        Add new insight with delta update (ACE principle).

        Checks for semantic duplicates before adding.
        If duplicate found, increments helpful_count instead.

        Args:
            category: Type of insight
            content: The insight text
            source_trial: Trial number that generated this insight
            tags: Optional tags for filtering

        Returns:
            The created or updated PlaybookItem
        """
        content_hash = self._content_hash(content)

        # Check for near-duplicates
        for item in self.items.values():
            existing_hash = self._content_hash(item.content)
            if content_hash == existing_hash:
                # Update existing instead of adding duplicate
                item.helpful_count += 1
                if source_trial and source_trial not in item.source_trials:
                    item.source_trials.append(source_trial)
                if tags:
                    item.tags = list(set(item.tags + tags))
                self.last_updated = datetime.now().isoformat()
                return item

        # Create new item
        item_id = self._generate_id(category)
        item = PlaybookItem(
            id=item_id,
            category=category,
            content=content,
            source_trials=[source_trial] if source_trial else [],
            tags=tags or []
        )
        self.items[item_id] = item
        self.last_updated = datetime.now().isoformat()
        self.version += 1
        return item

    def record_outcome(self, item_id: str, helpful: bool) -> bool:
        """
        Record whether using this insight was helpful or harmful.

        Args:
            item_id: The playbook item ID
            helpful: True if outcome was positive, False if negative

        Returns:
            True if item was found and updated, False otherwise
        """
        if item_id not in self.items:
            return False

        if helpful:
            self.items[item_id].helpful_count += 1
        else:
            self.items[item_id].harmful_count += 1
        self.items[item_id].last_used = datetime.now().isoformat()
        self.last_updated = datetime.now().isoformat()
        return True

    def get_context_for_task(
        self,
        task_type: str,
        max_items: int = 20,
        min_confidence: float = 0.5,
        tags: Optional[List[str]] = None
    ) -> str:
        """
        Generate context string for LLM consumption.

        Filters by relevance and confidence, sorted by net score.

        Args:
            task_type: Type of task (for filtering)
            max_items: Maximum items to include
            min_confidence: Minimum confidence threshold
            tags: Optional tags to filter by

        Returns:
            Formatted context string for LLM
        """
        relevant_items = [
            item for item in self.items.values()
            if item.confidence >= min_confidence
        ]

        # Filter by tags if provided
        if tags:
            relevant_items = [
                item for item in relevant_items
                if any(tag in item.tags for tag in tags)
            ]

        # Sort by net score (most helpful first)
        relevant_items.sort(key=lambda x: x.net_score, reverse=True)

        # Group by category
        sections: Dict[str, List[str]] = {}
        for item in relevant_items[:max_items]:
            cat_name = item.category.name
            if cat_name not in sections:
                sections[cat_name] = []
            sections[cat_name].append(item.to_context_string())

        # Build context string
        lines = ["## Atomizer Knowledge Playbook", ""]
        for cat_name, items in sections.items():
            lines.append(f"### {cat_name}")
            lines.extend(items)
            lines.append("")

        return "\n".join(lines)

    def search_by_content(
        self,
        query: str,
        category: Optional[InsightCategory] = None,
        limit: int = 5
    ) -> List[PlaybookItem]:
        """
        Search playbook items by content similarity.

        Simple keyword matching - could be enhanced with embeddings.

        Args:
            query: Search query
            category: Optional category filter
            limit: Maximum results

        Returns:
            List of matching items sorted by relevance
        """
        query_lower = query.lower()
        query_words = set(query_lower.split())

        scored_items = []
        for item in self.items.values():
            if category and item.category != category:
                continue

            content_lower = item.content.lower()
            content_words = set(content_lower.split())

            # Simple word overlap scoring
            overlap = len(query_words & content_words)
            if overlap > 0 or query_lower in content_lower:
                score = overlap + (1 if query_lower in content_lower else 0)
                scored_items.append((score, item))

        scored_items.sort(key=lambda x: (-x[0], -x[1].net_score))
        return [item for _, item in scored_items[:limit]]

    def get_by_category(
        self,
        category: InsightCategory,
        min_score: int = 0
    ) -> List[PlaybookItem]:
        """Get all items in a category with minimum net score."""
        return [
            item for item in self.items.values()
            if item.category == category and item.net_score >= min_score
        ]

    def prune_harmful(self, threshold: int = -3) -> int:
        """
        Remove items that have proven consistently harmful.

        Args:
            threshold: Net score threshold (items at or below are removed)

        Returns:
            Number of items removed
        """
        to_remove = [
            item_id for item_id, item in self.items.items()
            if item.net_score <= threshold
        ]
        for item_id in to_remove:
            del self.items[item_id]

        if to_remove:
            self.last_updated = datetime.now().isoformat()
            self.version += 1

        return len(to_remove)

    def get_stats(self) -> Dict[str, Any]:
        """Get playbook statistics."""
        by_category = {}
        for item in self.items.values():
            cat = item.category.name
            if cat not in by_category:
                by_category[cat] = 0
            by_category[cat] += 1

        scores = [item.net_score for item in self.items.values()]
|
||||
|
||||
return {
|
||||
"total_items": len(self.items),
|
||||
"by_category": by_category,
|
||||
"version": self.version,
|
||||
"last_updated": self.last_updated,
|
||||
"avg_score": sum(scores) / len(scores) if scores else 0,
|
||||
"max_score": max(scores) if scores else 0,
|
||||
"min_score": min(scores) if scores else 0
|
||||
}
|
||||
|
||||
def save(self, path: Path) -> None:
|
||||
"""
|
||||
Persist playbook to JSON.
|
||||
|
||||
Args:
|
||||
path: File path to save to
|
||||
"""
|
||||
data = {
|
||||
"version": self.version,
|
||||
"last_updated": self.last_updated,
|
||||
"items": {k: v.to_dict() for k, v in self.items.items()}
|
||||
}
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with open(path, 'w', encoding='utf-8') as f:
|
||||
json.dump(data, f, indent=2)
|
||||
|
||||
@classmethod
|
||||
def load(cls, path: Path) -> "AtomizerPlaybook":
|
||||
"""
|
||||
Load playbook from JSON.
|
||||
|
||||
Args:
|
||||
path: File path to load from
|
||||
|
||||
Returns:
|
||||
Loaded playbook (or new empty playbook if file doesn't exist)
|
||||
"""
|
||||
if not path.exists():
|
||||
return cls()
|
||||
|
||||
with open(path, encoding='utf-8') as f:
|
||||
data = json.load(f)
|
||||
|
||||
playbook = cls(
|
||||
version=data.get("version", 1),
|
||||
last_updated=data.get("last_updated", datetime.now().isoformat())
|
||||
)
|
||||
|
||||
for item_data in data.get("items", {}).values():
|
||||
item = PlaybookItem.from_dict(item_data)
|
||||
playbook.items[item.id] = item
|
||||
|
||||
return playbook
|
||||
|
||||
|
||||
# Convenience function for global playbook access
|
||||
_global_playbook: Optional[AtomizerPlaybook] = None
|
||||
_global_playbook_path: Optional[Path] = None
|
||||
|
||||
|
||||
def get_playbook(path: Optional[Path] = None) -> AtomizerPlaybook:
|
||||
"""
|
||||
Get the global playbook instance.
|
||||
|
||||
Args:
|
||||
path: Optional path to load from (uses default if not provided)
|
||||
|
||||
Returns:
|
||||
The global AtomizerPlaybook instance
|
||||
"""
|
||||
global _global_playbook, _global_playbook_path
|
||||
|
||||
if path is None:
|
||||
# Default path
|
||||
path = Path(__file__).parents[2] / "knowledge_base" / "playbook.json"
|
||||
|
||||
if _global_playbook is None or _global_playbook_path != path:
|
||||
_global_playbook = AtomizerPlaybook.load(path)
|
||||
_global_playbook_path = path
|
||||
|
||||
return _global_playbook
|
||||
|
||||
|
||||
def save_playbook() -> None:
|
||||
"""Save the global playbook to its path."""
|
||||
global _global_playbook, _global_playbook_path
|
||||
|
||||
if _global_playbook is not None and _global_playbook_path is not None:
|
||||
_global_playbook.save(_global_playbook_path)
|
||||
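The helpful/harmful bookkeeping behind `net_score` and `prune_harmful` can be sketched standalone. This is an illustrative reduction, not the committed `PlaybookItem` class: the `Item` dataclass and the assumed `helpful_count`/`harmful_count` counters are hypothetical stand-ins for whatever fields back `net_score` in `playbook.py`.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Item:
    # Hypothetical stand-in for PlaybookItem's scoring fields
    content: str
    helpful_count: int = 0
    harmful_count: int = 0

    @property
    def net_score(self) -> int:
        # Net score: votes for minus votes against
        return self.helpful_count - self.harmful_count

def prune_harmful(items: Dict[str, Item], threshold: int = -3) -> int:
    """Drop items whose net score has fallen to the threshold or below."""
    doomed = [k for k, v in items.items() if v.net_score <= threshold]
    for k in doomed:
        del items[k]
    return len(doomed)

items = {
    "a": Item("coarse mesh first pass works well", helpful_count=5, harmful_count=1),
    "b": Item("skip constraint validation", helpful_count=0, harmful_count=4),
}
removed = prune_harmful(items)  # "b" has net score -4, at or below -3
```

After the call, `removed` is 1 and only the helpful item survives, mirroring the default `threshold=-3` policy above.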
467  optimization_engine/context/reflector.py  Normal file
@@ -0,0 +1,467 @@
"""
Atomizer Reflector - Optimization Outcome Analysis

Part of the ACE (Agentic Context Engineering) implementation for Atomizer.

The Reflector analyzes optimization outcomes to extract actionable insights:
- Examines successful and failed trials
- Extracts patterns that led to success/failure
- Formats insights for Curator (Playbook) integration

This implements the "Reflector" role from the ACE framework's
Generator -> Reflector -> Curator pipeline.
"""

from typing import Dict, Any, List, Optional
from dataclasses import dataclass, field
from pathlib import Path
from datetime import datetime
import re

from .playbook import AtomizerPlaybook, InsightCategory, PlaybookItem


@dataclass
class OptimizationOutcome:
    """
    Captured outcome from an optimization trial.

    Contains all information needed to analyze what happened
    and extract insights for the playbook.
    """
    trial_number: int
    success: bool
    objective_value: Optional[float]
    constraint_violations: List[str] = field(default_factory=list)
    solver_errors: List[str] = field(default_factory=list)
    design_variables: Dict[str, float] = field(default_factory=dict)
    extractor_used: str = ""
    duration_seconds: float = 0.0
    notes: str = ""
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    # Optional metadata
    solver_type: str = ""
    mesh_info: Dict[str, Any] = field(default_factory=dict)
    convergence_info: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary for serialization."""
        return {
            "trial_number": self.trial_number,
            "success": self.success,
            "objective_value": self.objective_value,
            "constraint_violations": self.constraint_violations,
            "solver_errors": self.solver_errors,
            "design_variables": self.design_variables,
            "extractor_used": self.extractor_used,
            "duration_seconds": self.duration_seconds,
            "notes": self.notes,
            "timestamp": self.timestamp,
            "solver_type": self.solver_type,
            "mesh_info": self.mesh_info,
            "convergence_info": self.convergence_info
        }


@dataclass
class InsightCandidate:
    """
    A candidate insight extracted from trial analysis.

    Not yet committed to the playbook - pending review/aggregation.
    """
    category: InsightCategory
    content: str
    helpful: bool
    trial_number: Optional[int] = None
    confidence: float = 0.5
    tags: List[str] = field(default_factory=list)


class AtomizerReflector:
    """
    Analyzes optimization outcomes and extracts actionable insights.

    Implements the Reflector role from the ACE framework:
    - Examines successful and failed trials
    - Extracts patterns that led to success/failure
    - Formats insights for Curator integration

    Usage:
        playbook = AtomizerPlaybook.load(path)
        reflector = AtomizerReflector(playbook)

        # After each trial
        reflector.analyze_trial(outcome)

        # After study completion
        reflector.analyze_study_completion(stats)

        # Commit insights to playbook
        count = reflector.commit_insights()
        playbook.save(path)
    """

    # Error pattern matchers for insight extraction
    ERROR_PATTERNS = {
        "convergence": [
            r"convergence",
            r"did not converge",
            r"iteration limit",
            r"max iterations"
        ],
        "mesh": [
            r"mesh",
            r"element",
            r"distorted",
            r"jacobian",
            r"negative volume"
        ],
        "singularity": [
            r"singular",
            r"matrix",
            r"ill-conditioned",
            r"pivot"
        ],
        "memory": [
            r"memory",
            r"allocation",
            r"out of memory",
            r"insufficient"
        ],
        "license": [
            r"license",
            r"checkout",
            r"unavailable"
        ],
        "boundary": [
            r"boundary",
            r"constraint",
            r"spc",
            r"load"
        ]
    }

    def __init__(self, playbook: AtomizerPlaybook):
        """
        Initialize reflector with target playbook.

        Args:
            playbook: The playbook to add insights to
        """
        self.playbook = playbook
        self.pending_insights: List[InsightCandidate] = []
        self.analyzed_trials: List[int] = []

    def analyze_trial(self, outcome: OptimizationOutcome) -> List[InsightCandidate]:
        """
        Analyze a single trial outcome and extract insights.

        Returns a list of insight candidates (not yet added to the playbook).

        Args:
            outcome: The trial outcome to analyze

        Returns:
            List of extracted insight candidates
        """
        insights = []
        self.analyzed_trials.append(outcome.trial_number)

        # Analyze solver errors
        for error in outcome.solver_errors:
            error_insights = self._analyze_error(error, outcome)
            insights.extend(error_insights)

        # Analyze constraint violations
        for violation in outcome.constraint_violations:
            insights.append(InsightCandidate(
                category=InsightCategory.MISTAKE,
                content=f"Constraint violation: {violation}",
                helpful=False,
                trial_number=outcome.trial_number,
                tags=["constraint", "violation"]
            ))

        # Analyze successful patterns
        if outcome.success and outcome.objective_value is not None:
            success_insights = self._analyze_success(outcome)
            insights.extend(success_insights)

        # Analyze duration (performance insights)
        if outcome.duration_seconds > 0:
            perf_insights = self._analyze_performance(outcome)
            insights.extend(perf_insights)

        self.pending_insights.extend(insights)
        return insights

    def _analyze_error(
        self,
        error: str,
        outcome: OptimizationOutcome
    ) -> List[InsightCandidate]:
        """Analyze a solver error and extract relevant insights."""
        insights = []
        error_lower = error.lower()

        # Classify error type
        error_type = "unknown"
        for etype, patterns in self.ERROR_PATTERNS.items():
            if any(re.search(p, error_lower) for p in patterns):
                error_type = etype
                break

        # Generate insight based on error type
        if error_type == "convergence":
            config_summary = self._summarize_config(outcome)
            insights.append(InsightCandidate(
                category=InsightCategory.MISTAKE,
                content=f"Convergence failure with {config_summary}. Consider relaxing solver tolerances or reviewing mesh quality.",
                helpful=False,
                trial_number=outcome.trial_number,
                confidence=0.7,
                tags=["convergence", "solver", error_type]
            ))

        elif error_type == "mesh":
            insights.append(InsightCandidate(
                category=InsightCategory.MISTAKE,
                content=f"Mesh-related error: {error[:100]}. Review element quality and mesh density.",
                helpful=False,
                trial_number=outcome.trial_number,
                confidence=0.8,
                tags=["mesh", "element", error_type]
            ))

        elif error_type == "singularity":
            insights.append(InsightCandidate(
                category=InsightCategory.MISTAKE,
                content="Matrix singularity detected. Check boundary conditions and constraints for rigid body modes.",
                helpful=False,
                trial_number=outcome.trial_number,
                confidence=0.9,
                tags=["singularity", "boundary", error_type]
            ))

        elif error_type == "memory":
            insights.append(InsightCandidate(
                category=InsightCategory.TOOL,
                content="Memory allocation failure. Consider reducing mesh density or using an out-of-core solver.",
                helpful=False,
                trial_number=outcome.trial_number,
                confidence=0.8,
                tags=["memory", "performance", error_type]
            ))

        else:
            # Generic error insight
            insights.append(InsightCandidate(
                category=InsightCategory.MISTAKE,
                content=f"Solver error: {error[:150]}",
                helpful=False,
                trial_number=outcome.trial_number,
                confidence=0.5,
                tags=["error", error_type]
            ))

        return insights

    def _analyze_success(self, outcome: OptimizationOutcome) -> List[InsightCandidate]:
        """Analyze successful trial and extract winning patterns."""
        insights = []

        # Record successful design variable ranges
        design_summary = self._summarize_design(outcome)
        insights.append(InsightCandidate(
            category=InsightCategory.STRATEGY,
            content=f"Successful design: {design_summary}",
            helpful=True,
            trial_number=outcome.trial_number,
            confidence=0.6,
            tags=["success", "design"]
        ))

        # Record extractor performance if fast
        if outcome.duration_seconds > 0 and outcome.duration_seconds < 60:
            insights.append(InsightCandidate(
                category=InsightCategory.TOOL,
                content=f"Fast solve ({outcome.duration_seconds:.1f}s) using {outcome.extractor_used}",
                helpful=True,
                trial_number=outcome.trial_number,
                confidence=0.5,
                tags=["performance", "extractor"]
            ))

        return insights

    def _analyze_performance(self, outcome: OptimizationOutcome) -> List[InsightCandidate]:
        """Analyze performance characteristics."""
        insights = []

        # Flag very slow trials
        if outcome.duration_seconds > 300:  # > 5 minutes
            insights.append(InsightCandidate(
                category=InsightCategory.TOOL,
                content=f"Slow trial ({outcome.duration_seconds/60:.1f} min). Consider mesh refinement or solver settings.",
                helpful=False,
                trial_number=outcome.trial_number,
                confidence=0.6,
                tags=["performance", "slow"]
            ))

        return insights

    def analyze_study_completion(
        self,
        study_name: str,
        total_trials: int,
        best_value: float,
        convergence_rate: float,
        method: str = ""
    ) -> List[InsightCandidate]:
        """
        Analyze a completed study and extract high-level insights.

        Args:
            study_name: Name of the completed study
            total_trials: Total number of trials run
            best_value: Best objective value achieved
            convergence_rate: Fraction of trials that succeeded (0.0-1.0)
            method: Optimization method used

        Returns:
            List of study-level insight candidates
        """
        insights = []

        if convergence_rate > 0.9:
            insights.append(InsightCandidate(
                category=InsightCategory.STRATEGY,
                content=f"Study '{study_name}' achieved {convergence_rate:.0%} success rate - configuration is robust for similar problems.",
                helpful=True,
                confidence=0.8,
                tags=["study", "robust", "high_success"]
            ))
        elif convergence_rate < 0.5:
            insights.append(InsightCandidate(
                category=InsightCategory.MISTAKE,
                content=f"Study '{study_name}' had only {convergence_rate:.0%} success rate - review mesh quality and solver settings.",
                helpful=False,
                confidence=0.8,
                tags=["study", "low_success", "needs_review"]
            ))

        # Method-specific insights
        if method and total_trials > 20:
            if convergence_rate > 0.8:
                insights.append(InsightCandidate(
                    category=InsightCategory.STRATEGY,
                    content=f"{method} performed well on '{study_name}' ({convergence_rate:.0%} success, {total_trials} trials).",
                    helpful=True,
                    confidence=0.7,
                    tags=["method", method.lower(), "performance"]
                ))

        self.pending_insights.extend(insights)
        return insights

    def commit_insights(self, min_confidence: float = 0.0) -> int:
        """
        Commit pending insights to the playbook (Curator handoff).

        Aggregates similar insights and adds them to the playbook with
        appropriate helpful/harmful counts.

        Args:
            min_confidence: Minimum confidence threshold to commit

        Returns:
            Number of insights added to the playbook
        """
        count = 0

        for insight in self.pending_insights:
            if insight.confidence < min_confidence:
                continue

            item = self.playbook.add_insight(
                category=insight.category,
                content=insight.content,
                source_trial=insight.trial_number,
                tags=insight.tags
            )

            # Record initial outcome based on insight nature
            if not insight.helpful:
                self.playbook.record_outcome(item.id, helpful=False)

            count += 1

        self.pending_insights = []
        return count

    def get_pending_count(self) -> int:
        """Get number of pending insights."""
        return len(self.pending_insights)

    def clear_pending(self) -> None:
        """Clear pending insights without committing."""
        self.pending_insights = []

    def _summarize_config(self, outcome: OptimizationOutcome) -> str:
        """Create brief config summary for error context."""
        parts = []
        if outcome.extractor_used:
            parts.append(f"extractor={outcome.extractor_used}")
        parts.append(f"vars={len(outcome.design_variables)}")
        if outcome.solver_type:
            parts.append(f"solver={outcome.solver_type}")
        return ", ".join(parts)

    def _summarize_design(self, outcome: OptimizationOutcome) -> str:
        """Create brief design summary."""
        parts = []
        if outcome.objective_value is not None:
            parts.append(f"obj={outcome.objective_value:.4g}")

        # Include up to 3 design variables
        var_items = list(outcome.design_variables.items())[:3]
        for k, v in var_items:
            parts.append(f"{k}={v:.3g}")

        if len(outcome.design_variables) > 3:
            parts.append(f"(+{len(outcome.design_variables)-3} more)")

        return ", ".join(parts)


class ReflectorFactory:
    """Factory for creating reflectors with different configurations."""

    @staticmethod
    def create_for_study(study_dir: Path) -> AtomizerReflector:
        """
        Create a reflector for a specific study.

        Args:
            study_dir: Path to the study directory

        Returns:
            Configured AtomizerReflector
        """
        playbook_path = study_dir / "3_results" / "playbook.json"
        playbook = AtomizerPlaybook.load(playbook_path)
        return AtomizerReflector(playbook)

    @staticmethod
    def create_global() -> AtomizerReflector:
        """
        Create a reflector using the global playbook.

        Returns:
            AtomizerReflector using the global playbook
        """
        from .playbook import get_playbook
        return AtomizerReflector(get_playbook())
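The first-match-wins classification used by `_analyze_error` can be exercised standalone. This is a minimal sketch with a reduced pattern table for illustration, not the full `ERROR_PATTERNS` dict committed above; it relies only on insertion-ordered dicts and `re.search`.

```python
import re

# Reduced illustration of the ERROR_PATTERNS lookup: the first category
# whose pattern matches the lowercased message wins, else "unknown".
ERROR_PATTERNS = {
    "convergence": [r"did not converge", r"iteration limit"],
    "mesh": [r"mesh", r"jacobian", r"negative volume"],
    "memory": [r"out of memory", r"allocation"],
}

def classify(error: str) -> str:
    text = error.lower()
    for etype, patterns in ERROR_PATTERNS.items():
        if any(re.search(p, text) for p in patterns):
            return etype
    return "unknown"

print(classify("Solver aborted: negative volume in element 42"))  # mesh
print(classify("ERROR: Out of memory during factorization"))      # memory
print(classify("license checkout failed"))                        # unknown
```

Because categories are tried in dict order, a message matching several categories is attributed to the first one listed, which is why broad patterns like `r"matrix"` sit in later entries of the real table.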
531  optimization_engine/context/runner_integration.py  Normal file
@@ -0,0 +1,531 @@
"""
Context Engineering Integration for OptimizationRunner

Provides integration between the context engineering system and the
OptimizationRunner without modifying the core runner code.

Two approaches are provided:
1. ContextEngineeringMixin - mix into an OptimizationRunner subclass
2. ContextAwareRunner - wrapper that adds context engineering

Usage:
    # Approach 1: Mixin
    class MyRunner(ContextEngineeringMixin, OptimizationRunner):
        pass

    # Approach 2: Wrapper
    runner = OptimizationRunner(...)
    context_runner = ContextAwareRunner(runner, playbook_path)
    context_runner.run(...)
"""

from typing import Dict, Any, Optional, List, Callable
from pathlib import Path
from datetime import datetime
import time

from .playbook import AtomizerPlaybook, get_playbook
from .reflector import AtomizerReflector, OptimizationOutcome
from .feedback_loop import FeedbackLoop
from .compaction import CompactionManager, EventType
from .session_state import AtomizerSessionState, TaskType, get_session


class ContextEngineeringMixin:
    """
    Mixin class that adds context engineering to OptimizationRunner.

    Provides:
    - Automatic playbook loading/saving
    - Trial outcome reflection
    - Learning from successes/failures
    - Session state tracking

    Usage:
        class MyContextAwareRunner(ContextEngineeringMixin, OptimizationRunner):
            def __init__(self, *args, **kwargs):
                super().__init__(*args, **kwargs)
                self.init_context_engineering()
    """

    def init_context_engineering(
        self,
        playbook_path: Optional[Path] = None,
        enable_compaction: bool = True,
        compaction_threshold: int = 50
    ) -> None:
        """
        Initialize context engineering components.

        Call this in your subclass __init__ after super().__init__().

        Args:
            playbook_path: Path to playbook JSON (default: output_dir/playbook.json)
            enable_compaction: Whether to enable context compaction
            compaction_threshold: Number of events before compaction
        """
        # Determine playbook path
        if playbook_path is None:
            playbook_path = getattr(self, 'output_dir', Path('.')) / 'playbook.json'

        self._playbook_path = Path(playbook_path)
        self._playbook = AtomizerPlaybook.load(self._playbook_path)
        self._reflector = AtomizerReflector(self._playbook)
        self._feedback_loop = FeedbackLoop(self._playbook_path)

        # Initialize compaction if enabled
        self._enable_compaction = enable_compaction
        if enable_compaction:
            self._compaction_manager = CompactionManager(
                compaction_threshold=compaction_threshold,
                keep_recent=20,
                keep_errors=True
            )
        else:
            self._compaction_manager = None

        # Session state
        self._session = get_session()
        self._session.exposed.task_type = TaskType.RUN_OPTIMIZATION

        # Track active playbook items for feedback attribution
        self._active_playbook_items: List[str] = []

        # Statistics
        self._context_stats = {
            "trials_processed": 0,
            "insights_generated": 0,
            "errors_captured": 0
        }

    def get_relevant_playbook_items(self, max_items: int = 15) -> List[str]:
        """
        Get relevant playbook items for the current optimization context.

        Returns:
            List of playbook item context strings
        """
        context = self._playbook.get_context_for_task(
            task_type="optimization",
            max_items=max_items,
            min_confidence=0.5
        )

        # Extract item IDs for feedback tracking
        self._active_playbook_items = [
            item.id for item in self._playbook.items.values()
        ][:max_items]

        return context.split('\n')

    def record_trial_start(self, trial_number: int, design_vars: Dict[str, float]) -> None:
        """
        Record the start of a trial for context tracking.

        Args:
            trial_number: Trial number
            design_vars: Design variable values
        """
        if self._compaction_manager:
            # Construct a ContextEvent directly (the previous
            # events.__class__(...) call resolved to list, not the event type)
            from .compaction import ContextEvent
            self._compaction_manager.add_event(ContextEvent(
                timestamp=datetime.now(),
                event_type=EventType.TRIAL_START,
                summary=f"Trial {trial_number} started",
                details={"trial_number": trial_number, "design_vars": design_vars}
            ))

        self._session.add_action(f"Started trial {trial_number}")
    def record_trial_outcome(
        self,
        trial_number: int,
        success: bool,
        objective_value: Optional[float],
        design_vars: Dict[str, float],
        errors: Optional[List[str]] = None,
        duration_seconds: float = 0.0
    ) -> Dict[str, Any]:
        """
        Record the outcome of a trial for learning.

        Args:
            trial_number: Trial number
            success: Whether the trial succeeded
            objective_value: Objective value (None if failed)
            design_vars: Design variable values
            errors: List of error messages
            duration_seconds: Trial duration

        Returns:
            Dictionary with processing results
        """
        errors = errors or []

        # Update compaction manager
        if self._compaction_manager:
            self._compaction_manager.add_trial_event(
                trial_number=trial_number,
                success=success,
                objective=objective_value,
                duration=duration_seconds
            )

        # Create outcome for reflection
        outcome = OptimizationOutcome(
            trial_number=trial_number,
            success=success,
            objective_value=objective_value,
            constraint_violations=[],
            solver_errors=errors,
            design_variables=design_vars,
            extractor_used=getattr(self, '_current_extractor', ''),
            duration_seconds=duration_seconds
        )

        # Analyze and generate insights
        insights = self._reflector.analyze_trial(outcome)

        # Process through feedback loop
        result = self._feedback_loop.process_trial_result(
            trial_number=trial_number,
            success=success,
            objective_value=objective_value or 0.0,
            design_variables=design_vars,
            context_items_used=self._active_playbook_items,
            errors=errors
        )

        # Update statistics
        self._context_stats["trials_processed"] += 1
        self._context_stats["insights_generated"] += len(insights)

        # Update session state
        if success:
            self._session.add_action(
                f"Trial {trial_number} succeeded: obj={objective_value:.4g}"
            )
        else:
            error_summary = errors[0][:50] if errors else "unknown"
            self._session.add_error(f"Trial {trial_number}: {error_summary}")
            self._context_stats["errors_captured"] += 1

        return {
            "insights_extracted": len(insights),
            "playbook_items_updated": result.get("items_updated", 0)
        }

    def record_error(self, error_message: str, error_type: str = "") -> None:
        """
        Record an error for learning (outside trial context).

        Args:
            error_message: Error description
            error_type: Error classification
        """
        if self._compaction_manager:
            self._compaction_manager.add_error_event(error_message, error_type)

        self._session.add_error(error_message, error_type)
        self._context_stats["errors_captured"] += 1

    def finalize_context_engineering(self, study_stats: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """
        Finalize context engineering at the end of an optimization.

        Commits insights and saves the playbook.

        Args:
            study_stats: Optional study statistics for analysis

        Returns:
            Dictionary with finalization results
        """
        if study_stats is None:
            study_stats = {
                "name": getattr(self, 'study', {}).get('study_name', 'unknown'),
                "total_trials": self._context_stats["trials_processed"],
                "best_value": getattr(self, 'best_value', 0),
                "convergence_rate": 0.8  # Would need actual calculation
            }

        # Finalize feedback loop
        result = self._feedback_loop.finalize_study(study_stats)

        # Save playbook
        self._playbook.save(self._playbook_path)

        # Add compaction stats
        if self._compaction_manager:
            result["compaction_stats"] = self._compaction_manager.get_stats()

        result["context_stats"] = self._context_stats

        return result

    def get_context_string(self) -> str:
        """
        Get the full context string for LLM consumption.

        Returns:
            Formatted context string
        """
        parts = []

        # Session state
        parts.append(self._session.get_llm_context())

        # Playbook items
        playbook_context = self._playbook.get_context_for_task(
            task_type="optimization",
            max_items=15
        )
        if playbook_context:
            parts.append(playbook_context)

        # Compaction history
        if self._compaction_manager:
            parts.append(self._compaction_manager.get_context_string())

        return "\n\n---\n\n".join(parts)


class ContextAwareRunner:
    """
    Wrapper that adds context engineering to any OptimizationRunner.

    This approach doesn't require subclassing - it wraps an existing
    runner instance and intercepts relevant calls.

    Usage:
        runner = OptimizationRunner(...)
        context_runner = ContextAwareRunner(runner)

        # Use context_runner.run() instead of runner.run()
        study = context_runner.run(n_trials=50)

        # Get learning report
        report = context_runner.get_learning_report()
    """

    def __init__(
        self,
        runner,
        playbook_path: Optional[Path] = None,
        enable_compaction: bool = True
    ):
        """
        Initialize context-aware wrapper.

        Args:
            runner: OptimizationRunner instance to wrap
            playbook_path: Path to playbook (default: runner's output_dir)
            enable_compaction: Whether to enable context compaction
        """
        self._runner = runner

        # Determine playbook path
        if playbook_path is None:
            playbook_path = runner.output_dir / 'playbook.json'

        self._playbook_path = Path(playbook_path)
        self._playbook = AtomizerPlaybook.load(self._playbook_path)
        self._reflector = AtomizerReflector(self._playbook)
        self._feedback_loop = FeedbackLoop(self._playbook_path)

        # Compaction
        self._enable_compaction = enable_compaction
        if enable_compaction:
            self._compaction = CompactionManager(
                compaction_threshold=50,
                keep_recent=20
            )
        else:
            self._compaction = None

        # Session
        self._session = get_session()
        self._session.exposed.task_type = TaskType.RUN_OPTIMIZATION

        # Statistics
        self._stats = {
            "trials_observed": 0,
            "successful_trials": 0,
            "failed_trials": 0,
            "insights_generated": 0
        }

        # Hook into the runner's objective function
        self._original_objective = runner._objective_function
        runner._objective_function = self._wrapped_objective

    def _wrapped_objective(self, trial) -> float:
        """
        Wrapped objective function that captures outcomes.
        """
        start_time = time.time()
        trial_number = trial.number

        # Record trial start
        if self._compaction:
            from .compaction import ContextEvent
            self._compaction.add_event(ContextEvent(
                timestamp=datetime.now(),
                event_type=EventType.TRIAL_START,
                summary=f"Trial {trial_number} starting"
            ))

        try:
            # Run the original objective
            result = self._original_objective(trial)

            # Record success
            duration = time.time() - start_time
            self._record_success(trial_number, result, trial.params, duration)

            return result

        except Exception as e:
            # Record failure
            duration = time.time() - start_time
            self._record_failure(trial_number, str(e), trial.params, duration)
            raise

    def _record_success(
        self,
        trial_number: int,
        objective_value: float,
        params: Dict[str, Any],
        duration: float
    ) -> None:
        """Record a successful trial."""
        self._stats["trials_observed"] += 1
        self._stats["successful_trials"] += 1

        if self._compaction:
            self._compaction.add_trial_event(
                trial_number=trial_number,
                success=True,
                objective=objective_value,
                duration=duration
            )

        # Process through feedback loop
        self._feedback_loop.process_trial_result(
            trial_number=trial_number,
            success=True,
            objective_value=objective_value,
            design_variables=dict(params),
            context_items_used=list(self._playbook.items.keys())[:10]
        )

        # Update session
        self._session.add_action(f"Trial {trial_number}: obj={objective_value:.4g}")

    def _record_failure(
        self,
        trial_number: int,
|
||||
error: str,
|
||||
params: Dict[str, Any],
|
||||
duration: float
|
||||
) -> None:
|
||||
"""Record failed trial."""
|
||||
self._stats["trials_observed"] += 1
|
||||
self._stats["failed_trials"] += 1
|
||||
|
||||
if self._compaction:
|
||||
self._compaction.add_trial_event(
|
||||
trial_number=trial_number,
|
||||
success=False,
|
||||
duration=duration
|
||||
)
|
||||
self._compaction.add_error_event(error, "trial_failure")
|
||||
|
||||
# Process through feedback loop
|
||||
self._feedback_loop.process_trial_result(
|
||||
trial_number=trial_number,
|
||||
success=False,
|
||||
objective_value=0.0,
|
||||
design_variables=dict(params),
|
||||
errors=[error]
|
||||
)
|
||||
|
||||
# Update session
|
||||
self._session.add_error(f"Trial {trial_number}: {error[:100]}")
|
||||
|
||||
def run(self, *args, **kwargs):
|
||||
"""
|
||||
Run optimization with context engineering.
|
||||
|
||||
Passes through to wrapped runner.run() with context tracking.
|
||||
"""
|
||||
# Update session state
|
||||
study_name = kwargs.get('study_name', 'unknown')
|
||||
self._session.exposed.study_name = study_name
|
||||
self._session.exposed.study_status = "running"
|
||||
|
||||
try:
|
||||
# Run optimization
|
||||
result = self._runner.run(*args, **kwargs)
|
||||
|
||||
# Finalize context engineering
|
||||
self._finalize(study_name)
|
||||
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
self._session.add_error(f"Study failed: {str(e)}")
|
||||
raise
|
||||
|
||||
def _finalize(self, study_name: str) -> None:
|
||||
"""Finalize context engineering after optimization."""
|
||||
total_trials = self._stats["trials_observed"]
|
||||
success_rate = (
|
||||
self._stats["successful_trials"] / total_trials
|
||||
if total_trials > 0 else 0
|
||||
)
|
||||
|
||||
# Finalize feedback loop
|
||||
result = self._feedback_loop.finalize_study({
|
||||
"name": study_name,
|
||||
"total_trials": total_trials,
|
||||
"best_value": getattr(self._runner, 'best_value', 0),
|
||||
"convergence_rate": success_rate
|
||||
})
|
||||
|
||||
self._stats["insights_generated"] = result.get("insights_added", 0)
|
||||
|
||||
# Update session
|
||||
self._session.exposed.study_status = "completed"
|
||||
self._session.exposed.trials_completed = total_trials
|
||||
|
||||
def get_learning_report(self) -> Dict[str, Any]:
|
||||
"""Get report on what the system learned."""
|
||||
return {
|
||||
"statistics": self._stats,
|
||||
"playbook_size": len(self._playbook.items),
|
||||
"playbook_stats": self._playbook.get_stats(),
|
||||
"feedback_stats": self._feedback_loop.get_statistics(),
|
||||
"top_insights": self._feedback_loop.get_top_performers(10),
|
||||
"compaction_stats": (
|
||||
self._compaction.get_stats() if self._compaction else None
|
||||
)
|
||||
}
|
||||
|
||||
def get_context(self) -> str:
|
||||
"""Get current context string for LLM."""
|
||||
parts = [self._session.get_llm_context()]
|
||||
|
||||
if self._compaction:
|
||||
parts.append(self._compaction.get_context_string())
|
||||
|
||||
playbook_context = self._playbook.get_context_for_task("optimization")
|
||||
if playbook_context:
|
||||
parts.append(playbook_context)
|
||||
|
||||
return "\n\n---\n\n".join(parts)
|
||||
|
||||
def __getattr__(self, name):
|
||||
"""Delegate unknown attributes to wrapped runner."""
|
||||
return getattr(self._runner, name)
|
||||
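The interception pattern above — swap the runner's objective for an instrumented closure that tallies outcomes and re-raises on failure — can be sketched standalone. All names below (`ObjectiveRecorder`, the lambda objective) are illustrative, not part of the Atomizer API:

```python
import time

class ObjectiveRecorder:
    """Illustrative stand-in for ContextAwareRunner's objective interception."""

    def __init__(self, objective):
        self._objective = objective
        self.stats = {"trials_observed": 0, "successful_trials": 0, "failed_trials": 0}

    def __call__(self, params):
        start = time.time()
        self.stats["trials_observed"] += 1
        try:
            result = self._objective(params)      # run the wrapped objective
            self.stats["successful_trials"] += 1  # bookkeeping on success
            return result
        except Exception:
            self.stats["failed_trials"] += 1      # record the failure, then re-raise
            raise
        finally:
            self.last_duration = time.time() - start

recorder = ObjectiveRecorder(lambda p: p["x"] ** 2)
print(recorder({"x": 3.0}))   # 9.0
try:
    recorder({"x": "bad"})    # str ** int raises TypeError
except TypeError:
    pass
print(recorder.stats)
```

Because the wrapper re-raises, the study framework still sees the failure; the recorder only observes it — the same design choice `_wrapped_objective` makes.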
463  optimization_engine/context/session_state.py  Normal file
@@ -0,0 +1,463 @@
"""
Atomizer Session State - Context Isolation Management

Part of the ACE (Agentic Context Engineering) implementation for Atomizer.

Implements the "Write-Select-Compress-Isolate" pattern:
- Exposed fields are sent to LLM at every turn
- Isolated fields are accessed selectively when needed
- Automatic compression of old data

This ensures efficient context usage while maintaining
access to full historical data when needed.
"""

from typing import Dict, List, Optional, Any
from datetime import datetime
from enum import Enum
from dataclasses import dataclass, field
import json
from pathlib import Path


class TaskType(Enum):
    """Types of tasks Claude can perform in Atomizer."""
    CREATE_STUDY = "create_study"
    RUN_OPTIMIZATION = "run_optimization"
    MONITOR_PROGRESS = "monitor_progress"
    ANALYZE_RESULTS = "analyze_results"
    DEBUG_ERROR = "debug_error"
    CONFIGURE_SETTINGS = "configure_settings"
    EXPORT_DATA = "export_data"
    NEURAL_ACCELERATION = "neural_acceleration"


@dataclass
class ExposedState:
    """
    State exposed to LLM at every turn.

    Keep this minimal - only what's needed for immediate context.
    Everything here counts against token budget every turn.
    """

    # Current task context
    task_type: Optional[TaskType] = None
    current_objective: str = ""

    # Recent history (compressed)
    recent_actions: List[str] = field(default_factory=list)
    recent_errors: List[str] = field(default_factory=list)

    # Active study summary
    study_name: Optional[str] = None
    study_status: str = "unknown"
    trials_completed: int = 0
    trials_total: int = 0
    best_value: Optional[float] = None
    best_trial: Optional[int] = None

    # Playbook excerpt (most relevant items)
    active_playbook_items: List[str] = field(default_factory=list)

    # Constraints for context size
    MAX_ACTIONS: int = 10
    MAX_ERRORS: int = 5
    MAX_PLAYBOOK_ITEMS: int = 15


@dataclass
class IsolatedState:
    """
    State isolated from LLM - accessed selectively.

    This data is NOT included in every context window.
    Load specific fields when explicitly needed.
    """

    # Full optimization history (can be large)
    full_trial_history: List[Dict[str, Any]] = field(default_factory=list)

    # NX session state (heavy, complex)
    nx_model_path: Optional[str] = None
    nx_expressions: Dict[str, Any] = field(default_factory=dict)
    nx_sim_path: Optional[str] = None

    # Neural network cache
    neural_predictions: Dict[str, float] = field(default_factory=dict)
    surrogate_model_path: Optional[str] = None

    # Full playbook (loaded on demand)
    full_playbook_path: Optional[str] = None

    # Debug information
    last_solver_output: str = ""
    last_f06_content: str = ""
    last_solver_returncode: Optional[int] = None

    # Configuration snapshots
    optimization_config: Dict[str, Any] = field(default_factory=dict)
    study_config: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AtomizerSessionState:
    """
    Complete session state with exposure control.

    The exposed state is automatically injected into every LLM context.
    The isolated state is accessed only when explicitly needed.

    Usage:
        session = AtomizerSessionState(session_id="session_001")
        session.exposed.task_type = TaskType.CREATE_STUDY
        session.add_action("Created study directory")

        # Get context for LLM
        context = session.get_llm_context()

        # Access isolated data when needed
        f06 = session.load_isolated_data("last_f06_content")
    """

    session_id: str
    created_at: str = field(default_factory=lambda: datetime.now().isoformat())
    last_updated: str = field(default_factory=lambda: datetime.now().isoformat())

    exposed: ExposedState = field(default_factory=ExposedState)
    isolated: IsolatedState = field(default_factory=IsolatedState)

    def get_llm_context(self) -> str:
        """
        Generate context string for LLM consumption.

        Only includes exposed state - isolated state requires
        explicit access via load_isolated_data().

        Returns:
            Formatted markdown context string
        """
        lines = [
            "## Current Session State",
            "",
            f"**Task**: {self.exposed.task_type.value if self.exposed.task_type else 'Not set'}",
            f"**Objective**: {self.exposed.current_objective or 'None specified'}",
            "",
        ]

        # Study context
        if self.exposed.study_name:
            progress = ""
            if self.exposed.trials_total > 0:
                pct = (self.exposed.trials_completed / self.exposed.trials_total) * 100
                progress = f" ({pct:.0f}%)"

            lines.extend([
                f"### Active Study: {self.exposed.study_name}",
                f"- Status: {self.exposed.study_status}",
                f"- Trials: {self.exposed.trials_completed}/{self.exposed.trials_total}{progress}",
            ])

            if self.exposed.best_value is not None:
                lines.append(f"- Best: {self.exposed.best_value:.6g} (trial #{self.exposed.best_trial})")
            lines.append("")

        # Recent actions
        if self.exposed.recent_actions:
            lines.append("### Recent Actions")
            for action in self.exposed.recent_actions[-5:]:
                lines.append(f"- {action}")
            lines.append("")

        # Recent errors (highlight these)
        if self.exposed.recent_errors:
            lines.append("### Recent Errors (address these)")
            for error in self.exposed.recent_errors:
                lines.append(f"- {error}")
            lines.append("")

        # Relevant playbook items
        if self.exposed.active_playbook_items:
            lines.append("### Relevant Knowledge")
            for item in self.exposed.active_playbook_items:
                lines.append(f"- {item}")
            lines.append("")

        return "\n".join(lines)

    def add_action(self, action: str) -> None:
        """
        Record an action (auto-compresses old actions).

        Args:
            action: Description of the action taken
        """
        timestamp = datetime.now().strftime("%H:%M:%S")
        self.exposed.recent_actions.append(f"[{timestamp}] {action}")

        # Compress if over limit
        if len(self.exposed.recent_actions) > self.exposed.MAX_ACTIONS:
            # Keep first, summarize middle, keep last 5
            first = self.exposed.recent_actions[0]
            last_five = self.exposed.recent_actions[-5:]
            middle_count = len(self.exposed.recent_actions) - 6

            self.exposed.recent_actions = (
                [first] +
                [f"... ({middle_count} earlier actions)"] +
                last_five
            )

        self.last_updated = datetime.now().isoformat()

    def add_error(self, error: str, error_type: str = "") -> None:
        """
        Record an error for LLM attention.

        Errors are preserved more aggressively than actions
        because they need to be addressed.

        Args:
            error: Error message
            error_type: Optional error classification
        """
        prefix = f"[{error_type}] " if error_type else ""
        self.exposed.recent_errors.append(f"{prefix}{error}")

        # Keep most recent errors
        self.exposed.recent_errors = self.exposed.recent_errors[-self.exposed.MAX_ERRORS:]
        self.last_updated = datetime.now().isoformat()

    def clear_errors(self) -> None:
        """Clear all recorded errors (after they're addressed)."""
        self.exposed.recent_errors = []
        self.last_updated = datetime.now().isoformat()

    def update_study_status(
        self,
        name: str,
        status: str,
        trials_completed: int,
        trials_total: int,
        best_value: Optional[float] = None,
        best_trial: Optional[int] = None
    ) -> None:
        """
        Update the study status in exposed state.

        Args:
            name: Study name
            status: Current status (running, completed, failed, etc.)
            trials_completed: Number of completed trials
            trials_total: Total planned trials
            best_value: Best objective value found
            best_trial: Trial number with best value
        """
        self.exposed.study_name = name
        self.exposed.study_status = status
        self.exposed.trials_completed = trials_completed
        self.exposed.trials_total = trials_total
        self.exposed.best_value = best_value
        self.exposed.best_trial = best_trial
        self.last_updated = datetime.now().isoformat()

    def set_playbook_items(self, items: List[str]) -> None:
        """
        Set the active playbook items for context.

        Args:
            items: List of playbook item context strings
        """
        self.exposed.active_playbook_items = items[:self.exposed.MAX_PLAYBOOK_ITEMS]
        self.last_updated = datetime.now().isoformat()

    def load_isolated_data(self, key: str) -> Any:
        """
        Explicitly load isolated data when needed.

        Use this when you need access to heavy data that
        shouldn't be in every context window.

        Args:
            key: Attribute name in IsolatedState

        Returns:
            The isolated data value, or None if not found
        """
        return getattr(self.isolated, key, None)

    def set_isolated_data(self, key: str, value: Any) -> None:
        """
        Set isolated data.

        Args:
            key: Attribute name in IsolatedState
            value: Value to set
        """
        if hasattr(self.isolated, key):
            setattr(self.isolated, key, value)
            self.last_updated = datetime.now().isoformat()

    def add_trial_to_history(self, trial_data: Dict[str, Any]) -> None:
        """
        Add a trial to the full history (isolated state).

        Args:
            trial_data: Dictionary with trial information
        """
        trial_data["recorded_at"] = datetime.now().isoformat()
        self.isolated.full_trial_history.append(trial_data)
        self.last_updated = datetime.now().isoformat()

    def get_trial_history_summary(self, last_n: int = 10) -> List[Dict[str, Any]]:
        """
        Get summary of recent trials from isolated history.

        Args:
            last_n: Number of recent trials to return

        Returns:
            List of trial summary dictionaries
        """
        return self.isolated.full_trial_history[-last_n:]

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary for serialization."""
        return {
            "session_id": self.session_id,
            "created_at": self.created_at,
            "last_updated": self.last_updated,
            "exposed": {
                "task_type": self.exposed.task_type.value if self.exposed.task_type else None,
                "current_objective": self.exposed.current_objective,
                "recent_actions": self.exposed.recent_actions,
                "recent_errors": self.exposed.recent_errors,
                "study_name": self.exposed.study_name,
                "study_status": self.exposed.study_status,
                "trials_completed": self.exposed.trials_completed,
                "trials_total": self.exposed.trials_total,
                "best_value": self.exposed.best_value,
                "best_trial": self.exposed.best_trial,
                "active_playbook_items": self.exposed.active_playbook_items
            },
            "isolated": {
                "nx_model_path": self.isolated.nx_model_path,
                "nx_sim_path": self.isolated.nx_sim_path,
                "surrogate_model_path": self.isolated.surrogate_model_path,
                "full_playbook_path": self.isolated.full_playbook_path,
                "trial_history_count": len(self.isolated.full_trial_history)
            }
        }

    def save(self, path: Path) -> None:
        """
        Save session state to JSON.

        Note: Full trial history is saved to a separate file
        to keep the main state file manageable.

        Args:
            path: Path to save state file
        """
        path.parent.mkdir(parents=True, exist_ok=True)

        # Save main state
        with open(path, 'w', encoding='utf-8') as f:
            json.dump(self.to_dict(), f, indent=2)

        # Save trial history separately if large
        if len(self.isolated.full_trial_history) > 0:
            history_path = path.with_suffix('.history.json')
            with open(history_path, 'w', encoding='utf-8') as f:
                json.dump(self.isolated.full_trial_history, f, indent=2)

    @classmethod
    def load(cls, path: Path) -> "AtomizerSessionState":
        """
        Load session state from JSON.

        Args:
            path: Path to state file

        Returns:
            Loaded session state (or new state if file doesn't exist)
        """
        if not path.exists():
            return cls(session_id=f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}")

        with open(path, encoding='utf-8') as f:
            data = json.load(f)

        state = cls(
            session_id=data.get("session_id", "unknown"),
            created_at=data.get("created_at", datetime.now().isoformat()),
            last_updated=data.get("last_updated", datetime.now().isoformat())
        )

        # Load exposed state
        exposed = data.get("exposed", {})
        if exposed.get("task_type"):
            state.exposed.task_type = TaskType(exposed["task_type"])
        state.exposed.current_objective = exposed.get("current_objective", "")
        state.exposed.recent_actions = exposed.get("recent_actions", [])
        state.exposed.recent_errors = exposed.get("recent_errors", [])
        state.exposed.study_name = exposed.get("study_name")
        state.exposed.study_status = exposed.get("study_status", "unknown")
        state.exposed.trials_completed = exposed.get("trials_completed", 0)
        state.exposed.trials_total = exposed.get("trials_total", 0)
        state.exposed.best_value = exposed.get("best_value")
        state.exposed.best_trial = exposed.get("best_trial")
        state.exposed.active_playbook_items = exposed.get("active_playbook_items", [])

        # Load isolated state metadata
        isolated = data.get("isolated", {})
        state.isolated.nx_model_path = isolated.get("nx_model_path")
        state.isolated.nx_sim_path = isolated.get("nx_sim_path")
        state.isolated.surrogate_model_path = isolated.get("surrogate_model_path")
        state.isolated.full_playbook_path = isolated.get("full_playbook_path")

        # Load trial history from separate file if exists
        history_path = path.with_suffix('.history.json')
        if history_path.exists():
            with open(history_path, encoding='utf-8') as f:
                state.isolated.full_trial_history = json.load(f)

        return state


# Convenience functions for session management
_active_session: Optional[AtomizerSessionState] = None


def get_session() -> AtomizerSessionState:
    """
    Get the active session state.

    Creates a new session if none exists.

    Returns:
        The active AtomizerSessionState
    """
    global _active_session
    if _active_session is None:
        _active_session = AtomizerSessionState(
            session_id=f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
        )
    return _active_session


def set_session(session: AtomizerSessionState) -> None:
    """
    Set the active session.

    Args:
        session: Session state to make active
    """
    global _active_session
    _active_session = session


def clear_session() -> None:
    """Clear the active session."""
    global _active_session
    _active_session = None
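The compression scheme in `add_action` — keep the first entry, summarize the middle, keep the last five once `MAX_ACTIONS` is exceeded — can be exercised in isolation. A minimal sketch of just that logic (not the dataclass itself):

```python
from datetime import datetime

MAX_ACTIONS = 10  # same budget as ExposedState.MAX_ACTIONS

def add_action(actions, action):
    """Append a timestamped action; compress when over the budget."""
    timestamp = datetime.now().strftime("%H:%M:%S")
    actions.append(f"[{timestamp}] {action}")
    if len(actions) > MAX_ACTIONS:
        # Keep first, summarize middle, keep last 5
        first, last_five = actions[0], actions[-5:]
        middle_count = len(actions) - 6
        actions[:] = [first, f"... ({middle_count} earlier actions)"] + last_five

acts = []
for i in range(13):
    add_action(acts, f"step {i}")
print(len(acts), acts[1])
```

After 13 appends the list compresses once at the 11th entry (down to 7: first + summary + last five) and then grows again, so it ends at 9 entries with the summary marker at index 1 — the list stays bounded no matter how long the session runs.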
268  optimization_engine/plugins/post_solve/error_tracker.py  Normal file
@@ -0,0 +1,268 @@
"""
Error Tracker Hook - Context Engineering Integration

Preserves solver errors and failures in context for learning.
Based on Manus insight: "leave the wrong turns in the context"

This hook:
1. Captures solver errors and failures
2. Classifies error types for playbook categorization
3. Extracts relevant F06 content for analysis
4. Records errors to session state and LAC

Hook Point: post_solve
Priority: 100 (run early to capture before cleanup)
"""

from pathlib import Path
from datetime import datetime
from typing import Dict, Any, Optional
import json
import re


def classify_error(error_msg: str) -> str:
    """
    Classify error type for playbook categorization.

    Args:
        error_msg: Error message text

    Returns:
        Error classification string
    """
    error_lower = error_msg.lower()

    # Check patterns in priority order
    if any(x in error_lower for x in ['convergence', 'did not converge', 'diverge']):
        return "convergence_failure"
    elif any(x in error_lower for x in ['mesh', 'element', 'distorted', 'jacobian']):
        return "mesh_error"
    elif any(x in error_lower for x in ['singular', 'matrix', 'pivot', 'ill-conditioned']):
        return "singularity"
    elif any(x in error_lower for x in ['memory', 'allocation', 'out of memory']):
        return "memory_error"
    elif any(x in error_lower for x in ['license', 'checkout']):
        return "license_error"
    elif any(x in error_lower for x in ['boundary', 'constraint', 'spc', 'rigid body']):
        return "boundary_condition_error"
    elif any(x in error_lower for x in ['timeout', 'time limit']):
        return "timeout_error"
    elif any(x in error_lower for x in ['file', 'not found', 'missing']):
        return "file_error"
    else:
        return "unknown_error"


def extract_f06_error(f06_path: Optional[str], max_chars: int = 500) -> str:
    """
    Extract error section from F06 file.

    Args:
        f06_path: Path to F06 file
        max_chars: Maximum characters to extract

    Returns:
        Error section content or empty string
    """
    if not f06_path:
        return ""

    path = Path(f06_path)
    if not path.exists():
        return ""

    try:
        with open(path, 'r', encoding='utf-8', errors='ignore') as f:
            content = f.read()

        # Look for error indicators
        error_markers = [
            "*** USER FATAL",
            "*** SYSTEM FATAL",
            "*** USER WARNING",
            "*** SYSTEM WARNING",
            "FATAL ERROR",
            "ERROR MESSAGE"
        ]

        for marker in error_markers:
            if marker in content:
                idx = content.index(marker)
                # Extract surrounding context
                start = max(0, idx - 100)
                end = min(len(content), idx + max_chars)
                return content[start:end].strip()

        # If no explicit error marker, check for convergence messages
        convergence_patterns = [
            r"CONVERGENCE NOT ACHIEVED",
            r"SOLUTION DID NOT CONVERGE",
            r"DIVERGENCE DETECTED"
        ]

        for pattern in convergence_patterns:
            match = re.search(pattern, content, re.IGNORECASE)
            if match:
                idx = match.start()
                start = max(0, idx - 50)
                end = min(len(content), idx + max_chars)
                return content[start:end].strip()

        return ""

    except Exception as e:
        return f"Error reading F06: {str(e)}"


def find_f06_file(working_dir: str, sim_file: str = "") -> Optional[Path]:
    """
    Find the F06 file in the working directory.

    Args:
        working_dir: Working directory path
        sim_file: Simulation file name (for naming pattern)

    Returns:
        Path to F06 file or None
    """
    work_path = Path(working_dir)

    # Try common patterns
    patterns = [
        "*.f06",
        "*-solution*.f06",
        "*_sim*.f06"
    ]

    for pattern in patterns:
        matches = list(work_path.glob(pattern))
        if matches:
            # Return most recently modified
            return max(matches, key=lambda p: p.stat().st_mtime)

    return None


def track_error(context: Dict[str, Any]) -> Dict[str, Any]:
    """
    Hook that preserves errors for context learning.

    Called at post_solve after solver completes.
    Captures error information regardless of success/failure
    to enable learning from both outcomes.

    Args:
        context: Hook context with trial information

    Returns:
        Dictionary with error tracking results
    """
    trial_number = context.get('trial_number', -1)
    working_dir = context.get('working_dir', '.')
    output_dir = context.get('output_dir', working_dir)
    solver_returncode = context.get('solver_returncode', 0)

    # Determine if this is an error case
    # (solver returncode non-zero, or explicit error flag)
    is_error = (
        solver_returncode != 0 or
        context.get('error', False) or
        context.get('solver_failed', False)
    )

    if not is_error:
        # No error to track, but still record success for learning
        return {"error_tracked": False, "trial_success": True}

    # Find and extract F06 error info
    f06_path = context.get('f06_path')
    if not f06_path:
        f06_file = find_f06_file(working_dir, context.get('sim_file', ''))
        if f06_file:
            f06_path = str(f06_file)

    f06_snippet = extract_f06_error(f06_path)

    # Get error message from context or F06
    error_message = context.get('error_message', '')
    if not error_message and f06_snippet:
        # Extract first line of F06 error as message
        lines = f06_snippet.strip().split('\n')
        error_message = lines[0][:200] if lines else "Unknown solver error"

    # Classify error
    error_type = classify_error(error_message or f06_snippet)

    # Build error record
    error_info = {
        "trial": trial_number,
        "timestamp": datetime.now().isoformat(),
        "solver_returncode": solver_returncode,
        "error_type": error_type,
        "error_message": error_message,
        "f06_snippet": f06_snippet[:1000] if f06_snippet else "",
        "design_variables": context.get('design_variables', {}),
        "working_dir": working_dir
    }

    # Save to error log (append mode - accumulate errors)
    error_log_path = Path(output_dir) / "error_history.jsonl"
    try:
        error_log_path.parent.mkdir(parents=True, exist_ok=True)
        with open(error_log_path, 'a', encoding='utf-8') as f:
            f.write(json.dumps(error_info) + "\n")
    except Exception as e:
        print(f"Warning: Could not write error log: {e}")

    # Try to update session state if context engineering is active
    try:
        from optimization_engine.context.session_state import get_session
        session = get_session()
        session.add_error(
            f"Trial {trial_number}: {error_type} - {error_message[:100]}",
            error_type=error_type
        )
    except ImportError:
        pass  # Context module not available

    # Try to record to LAC if available
    try:
        from knowledge_base.lac import get_lac
        lac = get_lac()
        lac.record_insight(
            category="failure",
            context=f"Trial {trial_number} solver error",
            insight=f"{error_type}: {error_message[:200]}",
            confidence=0.7,
            tags=["solver", error_type, "automatic"]
        )
    except ImportError:
        pass  # LAC not available

    return {
        "error_tracked": True,
        "error_type": error_type,
        "error_message": error_message[:200],
        "f06_extracted": bool(f06_snippet)
    }


# Hook registration metadata
HOOK_CONFIG = {
    "name": "error_tracker",
    "hook_point": "post_solve",
    "priority": 100,  # Run early to capture before cleanup
    "enabled": True,
    "description": "Preserves solver errors for context learning"
}


# Make the function discoverable by hook manager
def get_hook():
    """Return the hook function for registration."""
    return track_error


# For direct plugin discovery
__all__ = ['track_error', 'HOOK_CONFIG', 'get_hook']
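The keyword classifier in this hook is a pure function, so its priority ordering is easy to check in isolation. A trimmed reproduction (first three branches plus the fallback, copied from `classify_error`):

```python
def classify_error(error_msg: str) -> str:
    """First branches of the hook's classifier; patterns checked in priority order."""
    error_lower = error_msg.lower()
    if any(x in error_lower for x in ['convergence', 'did not converge', 'diverge']):
        return "convergence_failure"
    if any(x in error_lower for x in ['mesh', 'element', 'distorted', 'jacobian']):
        return "mesh_error"
    if any(x in error_lower for x in ['singular', 'matrix', 'pivot', 'ill-conditioned']):
        return "singularity"
    return "unknown_error"

print(classify_error("*** USER FATAL: SOLUTION DID NOT CONVERGE"))     # convergence_failure
print(classify_error("Element 42 has excessive Jacobian distortion"))  # mesh_error
print(classify_error("MATRIX IS SINGULAR AT PIVOT 7"))                 # singularity
```

Because branches are checked in priority order, a message mentioning both convergence and mesh keywords classifies as `convergence_failure` — worth keeping in mind when extending the keyword lists.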
739  tests/test_context_engineering.py  Normal file
@@ -0,0 +1,739 @@
"""
Test suite for context engineering components.

Tests the ACE (Agentic Context Engineering) implementation:
- Playbook: Knowledge store with helpful/harmful tracking
- Reflector: Outcome analysis and insight extraction
- SessionState: Context isolation
- Compaction: Long-running session management
- FeedbackLoop: Automated learning
"""

import pytest
from datetime import datetime

from optimization_engine.context.playbook import (
    AtomizerPlaybook,
    PlaybookItem,
    InsightCategory
)
from optimization_engine.context.reflector import (
    AtomizerReflector,
    OptimizationOutcome
)
from optimization_engine.context.session_state import (
    AtomizerSessionState,
    TaskType,
    ExposedState,
    IsolatedState
)
from optimization_engine.context.compaction import (
    CompactionManager,
    ContextEvent,
    EventType,
    ContextBudgetManager
)
from optimization_engine.context.cache_monitor import (
    ContextCacheOptimizer,
    CacheStats,
    StablePrefixBuilder
)
from optimization_engine.context.feedback_loop import (
    FeedbackLoop
)


class TestAtomizerPlaybook:
    """Tests for the playbook system."""

    def test_create_empty_playbook(self):
        """Test creating an empty playbook."""
        playbook = AtomizerPlaybook()
        assert len(playbook.items) == 0
        assert playbook.version == 1

    def test_add_insight(self):
        """Test adding insights to playbook."""
        playbook = AtomizerPlaybook()

        item = playbook.add_insight(
            category=InsightCategory.STRATEGY,
            content="Use shell elements for thin walls",
            source_trial=1
        )

        assert item.id == "str-00001"
        assert item.helpful_count == 0
        assert item.harmful_count == 0
        assert item.category == InsightCategory.STRATEGY
        assert len(playbook.items) == 1
        assert 1 in item.source_trials

    def test_add_multiple_categories(self):
        """Test adding insights across different categories."""
        playbook = AtomizerPlaybook()

        playbook.add_insight(InsightCategory.STRATEGY, "Strategy 1")
        playbook.add_insight(InsightCategory.MISTAKE, "Mistake 1")
        playbook.add_insight(InsightCategory.TOOL, "Tool tip 1")
        playbook.add_insight(InsightCategory.STRATEGY, "Strategy 2")

        assert len(playbook.items) == 4
        assert "str-00001" in playbook.items
        assert "str-00002" in playbook.items
        assert "mis-00001" in playbook.items
        assert "tool-00001" in playbook.items

    def test_deduplication(self):
        """Test that duplicate insights are merged."""
        playbook = AtomizerPlaybook()

        item1 = playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements")
        item2 = playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements")

        # Should merge into one item
        assert len(playbook.items) == 1
        # Helpful count incremented on duplicate
        assert item2.helpful_count == 1
        assert item1 is item2  # Same object

    def test_outcome_tracking(self):
        """Test helpful/harmful tracking."""
        playbook = AtomizerPlaybook()
        item = playbook.add_insight(InsightCategory.STRATEGY, "Test insight")

        playbook.record_outcome(item.id, helpful=True)
        playbook.record_outcome(item.id, helpful=True)
        playbook.record_outcome(item.id, helpful=False)

        assert item.helpful_count == 2
        assert item.harmful_count == 1
        assert item.net_score == 1
        assert item.confidence == 2/3

    def test_confidence_calculation(self):
        """Test confidence score calculation."""
        playbook = AtomizerPlaybook()
        item = playbook.add_insight(InsightCategory.STRATEGY, "Test")

        # Initial confidence is 0.5 (neutral)
        assert item.confidence == 0.5

        # After positive feedback
        playbook.record_outcome(item.id, helpful=True)
        assert item.confidence == 1.0

        # After mixed feedback
        playbook.record_outcome(item.id, helpful=False)
        assert item.confidence == 0.5

    def test_persistence(self, tmp_path):
        """Test save/load cycle."""
        playbook = AtomizerPlaybook()
        playbook.add_insight(InsightCategory.MISTAKE, "Don't do this", tags=["test"])
        playbook.add_insight(InsightCategory.STRATEGY, "Do this instead")

        # Record some outcomes
        playbook.record_outcome("mis-00001", helpful=False)
        playbook.record_outcome("str-00001", helpful=True)

        save_path = tmp_path / "playbook.json"
        playbook.save(save_path)

        # Load and verify
        loaded = AtomizerPlaybook.load(save_path)
        assert len(loaded.items) == 2
        assert "mis-00001" in loaded.items
        assert loaded.items["mis-00001"].harmful_count == 1
        assert loaded.items["str-00001"].helpful_count == 1
        assert "test" in loaded.items["mis-00001"].tags

    def test_pruning(self):
        """Test harmful item pruning."""
        playbook = AtomizerPlaybook()
        item = playbook.add_insight(InsightCategory.STRATEGY, "Bad advice")

        # Record many harmful outcomes
        for _ in range(5):
            playbook.record_outcome(item.id, helpful=False)

        assert item.net_score == -5

        # Prune with threshold -3
        removed = playbook.prune_harmful(threshold=-3)

        assert removed == 1
        assert len(playbook.items) == 0

    def test_search_by_content(self):
        """Test content search functionality."""
        playbook = AtomizerPlaybook()
        playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements for thin walls")
        playbook.add_insight(InsightCategory.STRATEGY, "Solid elements for thick parts")
        playbook.add_insight(InsightCategory.MISTAKE, "Don't use coarse mesh")

        results = playbook.search_by_content("shell elements")
        assert len(results) >= 1
        assert "shell" in results[0].content.lower()

    def test_get_context_for_task(self):
        """Test context string generation."""
        playbook = AtomizerPlaybook()
        playbook.add_insight(InsightCategory.STRATEGY, "Strategy 1")
        playbook.add_insight(InsightCategory.MISTAKE, "Mistake 1")

        # Make strategy have higher score
        playbook.record_outcome("str-00001", helpful=True)
        playbook.record_outcome("str-00001", helpful=True)

        context = playbook.get_context_for_task("optimization")

        assert "Playbook" in context
        assert "str-00001" in context
        assert "helpful=2" in context


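The helpful/harmful arithmetic these playbook tests assert on (net score, confidence defaulting to 0.5 with no feedback) can be modeled in a few lines. This is an illustrative sketch, not the shipped `AtomizerPlaybook`:

```python
from dataclasses import dataclass


@dataclass
class ScoredItem:
    """Minimal model of the helpful/harmful bookkeeping the tests check."""
    helpful_count: int = 0
    harmful_count: int = 0

    @property
    def net_score(self) -> int:
        # Positive when an insight has helped more often than it has hurt.
        return self.helpful_count - self.harmful_count

    @property
    def confidence(self) -> float:
        # Neutral 0.5 before any feedback, else fraction of helpful outcomes.
        total = self.helpful_count + self.harmful_count
        return 0.5 if total == 0 else self.helpful_count / total
```

With two helpful and one harmful outcome this reproduces the values in `test_outcome_tracking`: a net score of 1 and a confidence of 2/3.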
class TestAtomizerReflector:
    """Tests for the reflector component."""

    def test_create_reflector(self):
        """Test creating a reflector."""
        playbook = AtomizerPlaybook()
        reflector = AtomizerReflector(playbook)

        assert reflector.playbook is playbook
        assert len(reflector.pending_insights) == 0

    def test_analyze_successful_trial(self):
        """Test analysis of successful trial."""
        playbook = AtomizerPlaybook()
        reflector = AtomizerReflector(playbook)

        outcome = OptimizationOutcome(
            trial_number=1,
            success=True,
            objective_value=100.0,
            constraint_violations=[],
            solver_errors=[],
            design_variables={"thickness": 1.0, "width": 5.0},
            extractor_used="mass_extractor",
            duration_seconds=60
        )

        insights = reflector.analyze_trial(outcome)

        # Should extract success pattern
        assert len(insights) >= 1
        assert any(i.helpful for i in insights)
        assert 1 in reflector.analyzed_trials

    def test_analyze_failed_trial(self):
        """Test analysis of failed trial."""
        playbook = AtomizerPlaybook()
        reflector = AtomizerReflector(playbook)

        outcome = OptimizationOutcome(
            trial_number=1,
            success=False,
            objective_value=None,
            constraint_violations=["stress > 250 MPa"],
            solver_errors=["convergence failure at iteration 50"],
            design_variables={"thickness": 0.5},
            extractor_used="stress_extractor",
            duration_seconds=120
        )

        insights = reflector.analyze_trial(outcome)

        # Should extract failure patterns
        assert len(insights) >= 2  # At least error + constraint
        assert any(i.category == InsightCategory.MISTAKE for i in insights)
        assert not any(i.helpful for i in insights if i.category == InsightCategory.MISTAKE)

    def test_analyze_mesh_error(self):
        """Test analysis of mesh-related error."""
        playbook = AtomizerPlaybook()
        reflector = AtomizerReflector(playbook)

        outcome = OptimizationOutcome(
            trial_number=5,
            success=False,
            objective_value=None,
            constraint_violations=[],
            solver_errors=["Element distortion: negative jacobian detected"],
            design_variables={},
            extractor_used="",
            duration_seconds=30
        )

        insights = reflector.analyze_trial(outcome)

        # Should identify mesh error
        assert any("mesh" in str(i.tags).lower() for i in insights)

    def test_commit_insights(self):
        """Test committing insights to playbook."""
        playbook = AtomizerPlaybook()
        reflector = AtomizerReflector(playbook)

        outcome = OptimizationOutcome(
            trial_number=1,
            success=True,
            objective_value=100.0,
            constraint_violations=[],
            solver_errors=[],
            design_variables={"thickness": 1.0},
            extractor_used="mass_extractor",
            duration_seconds=60
        )

        reflector.analyze_trial(outcome)
        count = reflector.commit_insights()

        assert count > 0
        assert len(playbook.items) > 0
        assert len(reflector.pending_insights) == 0  # Cleared after commit

    def test_analyze_study_completion(self):
        """Test study-level analysis."""
        playbook = AtomizerPlaybook()
        reflector = AtomizerReflector(playbook)

        # High success rate study
        insights = reflector.analyze_study_completion(
            study_name="test_study",
            total_trials=100,
            best_value=50.0,
            convergence_rate=0.95,
            method="TPE"
        )

        assert len(insights) >= 1
        assert any("robust" in i.content.lower() for i in insights)


class TestSessionState:
    """Tests for session state management."""

    def test_create_session(self):
        """Test creating a session."""
        session = AtomizerSessionState(session_id="test_session")

        assert session.session_id == "test_session"
        assert session.exposed.task_type is None
        assert len(session.exposed.recent_actions) == 0

    def test_set_task_type(self):
        """Test setting task type."""
        session = AtomizerSessionState(session_id="test")
        session.exposed.task_type = TaskType.CREATE_STUDY

        assert session.exposed.task_type == TaskType.CREATE_STUDY

    def test_add_action(self):
        """Test adding actions."""
        session = AtomizerSessionState(session_id="test")

        session.add_action("Created study directory")
        session.add_action("Configured optimization")

        assert len(session.exposed.recent_actions) == 2
        assert "Created study" in session.exposed.recent_actions[0]

    def test_action_compression(self):
        """Test automatic action compression."""
        session = AtomizerSessionState(session_id="test")

        # Add more actions than the limit
        for i in range(15):
            session.add_action(f"Action {i}")

        # Should be compressed
        assert len(session.exposed.recent_actions) <= 12
        assert any("earlier actions" in a.lower() for a in session.exposed.recent_actions)

    def test_add_error(self):
        """Test adding errors."""
        session = AtomizerSessionState(session_id="test")

        session.add_error("Solver failed", error_type="convergence")
        session.add_error("Mesh error")

        assert len(session.exposed.recent_errors) == 2
        assert "[convergence]" in session.exposed.recent_errors[0]

    def test_update_study_status(self):
        """Test updating study status."""
        session = AtomizerSessionState(session_id="test")

        session.update_study_status(
            name="bracket_opt",
            status="running",
            trials_completed=25,
            trials_total=100,
            best_value=0.5,
            best_trial=20
        )

        assert session.exposed.study_name == "bracket_opt"
        assert session.exposed.trials_completed == 25
        assert session.exposed.best_value == 0.5

    def test_llm_context_generation(self):
        """Test LLM context string generation."""
        session = AtomizerSessionState(session_id="test")
        session.exposed.task_type = TaskType.RUN_OPTIMIZATION
        session.exposed.study_name = "test_study"
        session.exposed.trials_completed = 50
        session.exposed.trials_total = 100
        session.exposed.best_value = 0.5

        context = session.get_llm_context()

        assert "test_study" in context
        assert "50" in context
        assert "0.5" in context
        assert "run_optimization" in context

    def test_isolated_state_access(self):
        """Test accessing isolated state."""
        session = AtomizerSessionState(session_id="test")
        session.isolated.nx_model_path = "/path/to/model.prt"

        # Should not appear in LLM context
        context = session.get_llm_context()
        assert "/path/to/model.prt" not in context

        # But accessible via explicit load
        path = session.load_isolated_data("nx_model_path")
        assert path == "/path/to/model.prt"

    def test_persistence(self, tmp_path):
        """Test save/load cycle."""
        session = AtomizerSessionState(session_id="test_persist")
        session.exposed.task_type = TaskType.ANALYZE_RESULTS
        session.exposed.study_name = "persist_study"
        session.add_action("Test action")

        save_path = tmp_path / "session.json"
        session.save(save_path)

        loaded = AtomizerSessionState.load(save_path)

        assert loaded.session_id == "test_persist"
        assert loaded.exposed.task_type == TaskType.ANALYZE_RESULTS
        assert loaded.exposed.study_name == "persist_study"


class TestCompactionManager:
    """Tests for context compaction."""

    def test_create_manager(self):
        """Test creating compaction manager."""
        manager = CompactionManager(compaction_threshold=10, keep_recent=5)

        assert manager.compaction_threshold == 10
        assert manager.keep_recent == 5
        assert len(manager.events) == 0

    def test_add_events(self):
        """Test adding events."""
        manager = CompactionManager(compaction_threshold=50)

        manager.add_trial_event(trial_number=1, success=True, objective=100.0)
        manager.add_trial_event(trial_number=2, success=False)

        assert len(manager.events) == 2

    def test_compaction_trigger(self):
        """Test that compaction triggers at threshold."""
        manager = CompactionManager(compaction_threshold=10, keep_recent=5)

        for i in range(15):
            manager.add_event(ContextEvent(
                timestamp=datetime.now(),
                event_type=EventType.TRIAL_COMPLETE,
                summary=f"Trial {i} complete",
                details={"trial_number": i, "objective": i * 0.1}
            ))

        assert manager.compaction_count > 0
        assert len(manager.events) <= 10

    def test_error_preservation(self):
        """Test that errors are never compacted."""
        manager = CompactionManager(compaction_threshold=10, keep_recent=3)

        # Add error early
        manager.add_error_event("Critical solver failure", "solver_error")

        # Add many regular events
        for i in range(20):
            manager.add_trial_event(trial_number=i, success=True, objective=i)

        # Error should still be present
        errors = [e for e in manager.events if e.event_type == EventType.ERROR]
        assert len(errors) == 1
        assert "Critical solver failure" in errors[0].summary

    def test_milestone_preservation(self):
        """Test that milestones are preserved."""
        manager = CompactionManager(compaction_threshold=10, keep_recent=3)

        manager.add_milestone("Optimization started", {"method": "TPE"})

        for i in range(20):
            manager.add_trial_event(trial_number=i, success=True)

        # Milestone should be preserved
        milestones = [e for e in manager.events if e.event_type == EventType.MILESTONE]
        assert len(milestones) == 1

    def test_context_string_generation(self):
        """Test context string generation."""
        manager = CompactionManager()

        manager.add_trial_event(trial_number=1, success=True, objective=100.0)
        manager.add_error_event("Test error")

        context = manager.get_context_string()

        assert "Optimization History" in context
        assert "Trial 1" in context
        assert "Test error" in context

    def test_get_stats(self):
        """Test statistics generation."""
        manager = CompactionManager(compaction_threshold=10, keep_recent=5)

        for i in range(15):
            manager.add_trial_event(trial_number=i, success=i % 2 == 0)

        stats = manager.get_stats()

        assert stats["total_events"] <= 15
        assert stats["compaction_count"] > 0


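The compaction behavior checked above — summarize old events, never drop errors or milestones, keep the recent tail — can be sketched independently of the real `CompactionManager`. The dict-based event shape here is an assumption for illustration:

```python
# Event types that must survive compaction verbatim (assumed labels).
PROTECTED = {"error", "milestone"}


def compact(events, keep_recent=5):
    """Summarize old non-protected events, keep protected ones verbatim."""
    if len(events) <= keep_recent:
        return events
    old, recent = events[:-keep_recent], events[-keep_recent:]
    # Errors and milestones in the old region are carried over unchanged.
    kept = [e for e in old if e["type"] in PROTECTED]
    dropped = len(old) - len(kept)
    summary = {"type": "summary", "text": f"{dropped} earlier events compacted"}
    return kept + [summary] + recent
```

Running this over twenty events containing one early error keeps the error and the five most recent events, replacing the rest with a single summary entry.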
class TestCacheMonitor:
    """Tests for cache monitoring."""

    def test_create_optimizer(self):
        """Test creating cache optimizer."""
        optimizer = ContextCacheOptimizer()

        assert optimizer.stats.total_requests == 0
        assert optimizer.stats.cache_hits == 0

    def test_prepare_context(self):
        """Test context preparation."""
        optimizer = ContextCacheOptimizer()

        context = optimizer.prepare_context(
            stable_prefix="Stable content",
            semi_stable="Session content",
            dynamic="User message"
        )

        assert "Stable content" in context
        assert "Session content" in context
        assert "User message" in context
        assert optimizer.stats.total_requests == 1

    def test_cache_hit_detection(self):
        """Test cache hit detection."""
        optimizer = ContextCacheOptimizer()

        # First request
        optimizer.prepare_context("Stable", "Semi", "Dynamic 1")

        # Second request with same stable prefix
        optimizer.prepare_context("Stable", "Semi", "Dynamic 2")

        assert optimizer.stats.total_requests == 2
        assert optimizer.stats.cache_hits == 1

    def test_cache_miss_detection(self):
        """Test cache miss detection."""
        optimizer = ContextCacheOptimizer()

        optimizer.prepare_context("Stable 1", "Semi", "Dynamic")
        optimizer.prepare_context("Stable 2", "Semi", "Dynamic")  # Different prefix

        assert optimizer.stats.cache_hits == 0
        assert optimizer.stats.cache_misses == 2

    def test_stable_prefix_builder(self):
        """Test stable prefix builder."""
        builder = StablePrefixBuilder()

        builder.add_identity("I am Atomizer")
        builder.add_capabilities("I can optimize")
        builder.add_tools("Tool definitions here")

        prefix = builder.build()

        assert "I am Atomizer" in prefix
        assert "I can optimize" in prefix
        # Identity should come before capabilities (order 10 < 20)
        assert prefix.index("Atomizer") < prefix.index("optimize")


class TestFeedbackLoop:
    """Tests for the feedback loop."""

    def test_create_feedback_loop(self, tmp_path):
        """Test creating feedback loop."""
        playbook_path = tmp_path / "playbook.json"
        feedback = FeedbackLoop(playbook_path)

        assert feedback.playbook is not None
        assert feedback._total_trials_processed == 0

    def test_process_successful_trial(self, tmp_path):
        """Test processing successful trial."""
        playbook_path = tmp_path / "playbook.json"
        feedback = FeedbackLoop(playbook_path)

        result = feedback.process_trial_result(
            trial_number=1,
            success=True,
            objective_value=100.0,
            design_variables={"thickness": 1.0}
        )

        assert result["trial_number"] == 1
        assert result["success"] is True
        assert feedback._total_trials_processed == 1
        assert feedback._successful_trials == 1

    def test_process_failed_trial(self, tmp_path):
        """Test processing failed trial."""
        playbook_path = tmp_path / "playbook.json"
        feedback = FeedbackLoop(playbook_path)

        result = feedback.process_trial_result(
            trial_number=1,
            success=False,
            objective_value=0.0,
            design_variables={"thickness": 0.5},
            errors=["Convergence failure"]
        )

        assert result["success"] is False
        assert feedback._failed_trials == 1

    def test_finalize_study(self, tmp_path):
        """Test study finalization."""
        playbook_path = tmp_path / "playbook.json"
        feedback = FeedbackLoop(playbook_path)

        # Process some trials
        for i in range(10):
            feedback.process_trial_result(
                trial_number=i,
                success=i % 3 != 0,
                objective_value=100 - i if i % 3 != 0 else 0,
                design_variables={"x": i * 0.1}
            )

        # Finalize
        result = feedback.finalize_study({
            "name": "test_study",
            "total_trials": 10,
            "best_value": 91,
            "convergence_rate": 0.7
        })

        assert result["insights_added"] > 0
        assert result["playbook_size"] > 0
        assert playbook_path.exists()  # Should be saved

    def test_playbook_item_attribution(self, tmp_path):
        """Test that playbook items get updated based on outcomes."""
        playbook_path = tmp_path / "playbook.json"

        # Pre-populate playbook
        playbook = AtomizerPlaybook()
        item = playbook.add_insight(InsightCategory.STRATEGY, "Test strategy")
        playbook.save(playbook_path)

        # Create feedback loop and process trials with this item active
        feedback = FeedbackLoop(playbook_path)

        feedback.process_trial_result(
            trial_number=1,
            success=True,
            objective_value=100.0,
            design_variables={},
            context_items_used=[item.id]
        )

        feedback.process_trial_result(
            trial_number=2,
            success=True,
            objective_value=95.0,
            design_variables={},
            context_items_used=[item.id]
        )

        # Item should have positive feedback
        assert feedback.playbook.items[item.id].helpful_count == 2


class TestContextBudgetManager:
    """Tests for context budget management."""

    def test_create_manager(self):
        """Test creating budget manager."""
        manager = ContextBudgetManager()

        assert manager.budget["total"] == 100000
        assert "stable_prefix" in manager.budget

    def test_estimate_tokens(self):
        """Test token estimation."""
        manager = ContextBudgetManager()

        tokens = manager.estimate_tokens("Hello world")  # 11 chars
        assert tokens == 2  # 11 / 4 = 2.75 -> 2

    def test_update_usage(self):
        """Test usage tracking."""
        manager = ContextBudgetManager()

        result = manager.update_usage("stable_prefix", "x" * 20000)  # 5000 tokens

        assert result["section"] == "stable_prefix"
        assert result["tokens"] == 5000
        assert result["over_budget"] is False

    def test_over_budget_warning(self):
        """Test over-budget detection."""
        manager = ContextBudgetManager()

        # Exceed stable_prefix budget (5000 tokens = 20000 chars)
        result = manager.update_usage("stable_prefix", "x" * 40000)  # 10000 tokens

        assert result["over_budget"] is True
        assert "warning" in result

    def test_get_status(self):
        """Test overall status reporting."""
        manager = ContextBudgetManager()

        manager.update_usage("stable_prefix", "x" * 10000)
        manager.update_usage("protocols", "x" * 20000)

        status = manager.get_status()

        assert "total_used" in status
        assert "utilization" in status
        assert "recommendations" in status


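The four-characters-per-token heuristic these budget tests rely on is easy to sketch standalone. The function names here mirror the tested API but are illustrative, not the shipped `ContextBudgetManager`:

```python
def estimate_tokens(text: str) -> int:
    """Rough token count: ~4 characters per token, truncated to an integer."""
    return len(text) // 4


def check_budget(section_budgets: dict, section: str, text: str) -> dict:
    """Hypothetical budget check mirroring the behavior asserted above."""
    tokens = estimate_tokens(text)
    budget = section_budgets.get(section, 0)
    return {"section": section, "tokens": tokens, "over_budget": tokens > budget}
```

With a 5000-token `stable_prefix` budget, a 20000-character payload lands exactly on budget while a 40000-character payload trips the over-budget flag, matching the test expectations.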
if __name__ == "__main__":
    pytest.main([__file__, "-v"])
463
tests/test_context_integration.py
Normal file
@@ -0,0 +1,463 @@
"""
Integration test for full context engineering pipeline.

Tests the complete ACE (Agentic Context Engineering) workflow:
1. Starting fresh session
2. Running optimization with successes and failures
3. Verifying playbook learns from outcomes
4. Validating persistence across sessions
5. Testing context compaction under load
"""

import pytest
import random

from optimization_engine.context.playbook import AtomizerPlaybook, InsightCategory
from optimization_engine.context.reflector import AtomizerReflector, OptimizationOutcome
from optimization_engine.context.session_state import AtomizerSessionState, TaskType
from optimization_engine.context.feedback_loop import FeedbackLoop
from optimization_engine.context.compaction import CompactionManager, EventType
from optimization_engine.context.cache_monitor import ContextCacheOptimizer, StablePrefixBuilder


class TestFullOptimizationPipeline:
|
||||
"""End-to-end test of optimization with context engineering."""
|
||||
|
||||
def test_complete_optimization_cycle(self, tmp_path):
|
||||
"""
|
||||
Simulates a complete optimization run:
|
||||
1. Initialize context engineering
|
||||
2. Process multiple trials (mix of success/failure)
|
||||
3. Finalize and commit learning
|
||||
4. Verify playbook has learned
|
||||
"""
|
||||
playbook_path = tmp_path / "playbook.json"
|
||||
|
||||
# Initialize feedback loop
|
||||
feedback = FeedbackLoop(playbook_path)
|
||||
|
||||
# Simulate study with mixed results
|
||||
trial_results = []
|
||||
for i in range(20):
|
||||
success = random.random() > 0.3 # 70% success rate
|
||||
obj_value = 100 - i * 2 + random.uniform(-5, 5) if success else None
|
||||
|
||||
result = feedback.process_trial_result(
|
||||
trial_number=i,
|
||||
success=success,
|
||||
objective_value=obj_value if success else 0.0,
|
||||
design_variables={
|
||||
"thickness": 0.5 + i * 0.1,
|
||||
"width": 10 + i * 0.5
|
||||
},
|
||||
context_items_used=[],
|
||||
errors=["convergence failure"] if not success else None
|
||||
)
|
||||
|
||||
trial_results.append({
|
||||
"trial": i,
|
||||
"success": success,
|
||||
"insights": result.get("insights_extracted", 0)
|
||||
})
|
||||
|
||||
# Finalize study
|
||||
successful = sum(1 for r in trial_results if r["success"])
|
||||
final_result = feedback.finalize_study({
|
||||
"name": "integration_test_study",
|
||||
"total_trials": 20,
|
||||
"best_value": min(
|
||||
r.get("objective_value", float('inf'))
|
||||
for r in trial_results if r["success"]
|
||||
) if successful > 0 else 0,
|
||||
"convergence_rate": successful / 20
|
||||
})
|
||||
|
||||
# Verify learning occurred
|
||||
assert final_result["insights_added"] > 0
|
||||
assert final_result["playbook_size"] > 0
|
||||
assert playbook_path.exists()
|
||||
|
||||
# Load and verify playbook content
|
||||
playbook = AtomizerPlaybook.load(playbook_path)
|
||||
|
||||
# Should have some mistake insights from failures
|
||||
mistakes = [
|
||||
item for item in playbook.items.values()
|
||||
if item.category == InsightCategory.MISTAKE
|
||||
]
|
||||
assert len(mistakes) > 0
|
||||
|
||||
def test_learning_persistence_across_sessions(self, tmp_path):
|
||||
"""
|
||||
Test that learning persists across multiple "sessions".
|
||||
"""
|
||||
playbook_path = tmp_path / "playbook.json"
|
||||
|
||||
# Session 1: Generate initial learning
|
||||
feedback1 = FeedbackLoop(playbook_path)
|
||||
for i in range(10):
|
||||
feedback1.process_trial_result(
|
||||
trial_number=i,
|
||||
success=True,
|
||||
objective_value=100 - i,
|
||||
design_variables={"x": i}
|
||||
)
|
||||
feedback1.finalize_study({
|
||||
"name": "session1",
|
||||
"total_trials": 10,
|
||||
"best_value": 91,
|
||||
"convergence_rate": 1.0
|
||||
})
|
||||
|
||||
# Verify session 1 created insights
|
||||
pb1 = AtomizerPlaybook.load(playbook_path)
|
||||
session1_items = len(pb1.items)
|
||||
assert session1_items > 0
|
||||
|
||||
# Session 2: Continue learning
|
||||
feedback2 = FeedbackLoop(playbook_path)
|
||||
|
||||
# Should have loaded existing playbook
|
||||
assert len(feedback2.playbook.items) == session1_items
|
||||
|
||||
# Add more trials
|
||||
for i in range(10, 20):
|
||||
feedback2.process_trial_result(
|
||||
trial_number=i,
|
||||
success=i % 2 == 0,
|
||||
objective_value=100 - i if i % 2 == 0 else 0.0,
|
||||
design_variables={"x": i},
|
||||
errors=["test error"] if i % 2 != 0 else None
|
||||
)
|
||||
feedback2.finalize_study({
|
||||
"name": "session2",
|
||||
"total_trials": 10,
|
||||
"best_value": 80,
|
||||
"convergence_rate": 0.5
|
||||
})
|
||||
|
||||
# Verify combined learning
|
||||
pb2 = AtomizerPlaybook.load(playbook_path)
|
||||
assert len(pb2.items) >= session1_items # At least as many items
|
||||
|
||||
def test_playbook_pruning_over_time(self, tmp_path):
|
||||
"""
|
||||
Test that harmful insights get pruned.
|
||||
"""
|
||||
playbook_path = tmp_path / "playbook.json"
|
||||
|
||||
# Create playbook with a "bad" insight
|
||||
playbook = AtomizerPlaybook()
|
||||
bad_item = playbook.add_insight(
|
||||
InsightCategory.STRATEGY,
|
||||
"Use extremely coarse mesh" # Bad advice
|
||||
)
|
||||
|
||||
# Give it many harmful outcomes
|
||||
for _ in range(10):
|
||||
playbook.record_outcome(bad_item.id, helpful=False)
|
||||
|
||||
playbook.save(playbook_path)
|
||||
|
||||
# Create feedback loop and finalize
|
||||
feedback = FeedbackLoop(playbook_path)
|
||||
|
||||
# Process a few trials
|
||||
for i in range(5):
|
||||
feedback.process_trial_result(
|
||||
trial_number=i,
|
||||
success=True,
|
||||
objective_value=100,
|
||||
design_variables={}
|
||||
)
|
||||
|
||||
feedback.finalize_study({
|
||||
"name": "prune_test",
|
||||
"total_trials": 5,
|
||||
"best_value": 100,
|
||||
"convergence_rate": 1.0
|
||||
})
|
||||
|
||||
# Bad insight should be pruned (net_score -10 < threshold -3)
|
||||
final_playbook = AtomizerPlaybook.load(playbook_path)
|
||||
assert bad_item.id not in final_playbook.items
|
||||
|
||||
def test_context_compaction_under_load(self, tmp_path):
|
||||
"""
|
||||
Test that compaction works correctly under high trial volume.
|
||||
"""
|
||||
manager = CompactionManager(
|
||||
compaction_threshold=20,
|
||||
keep_recent=10,
|
||||
keep_errors=True
|
||||
)
|
||||
|
||||
# Simulate 100 trials
|
||||
errors_added = 0
|
||||
for i in range(100):
|
||||
success = i % 5 != 0
|
||||
|
||||
if success:
|
||||
manager.add_trial_event(
|
||||
trial_number=i,
|
||||
success=True,
|
||||
objective=100 - i * 0.5,
|
||||
duration=random.uniform(30, 120)
|
||||
)
|
||||
else:
|
||||
manager.add_trial_event(
|
||||
trial_number=i,
|
||||
success=False,
|
||||
duration=random.uniform(30, 120)
|
||||
)
|
||||
manager.add_error_event(
|
||||
f"Error in trial {i}",
|
||||
error_type="test_error"
|
||||
)
|
||||
errors_added += 1
|
||||
|
||||
# Should have compacted
|
||||
stats = manager.get_stats()
|
||||
assert stats["compaction_count"] > 0
|
||||
|
||||
# All errors should be preserved
|
||||
assert stats["error_events"] == errors_added
|
||||
|
||||
# Total events should be bounded
|
||||
assert stats["total_events"] < 100 # Compaction reduced count
|
||||
|
||||
# Context string should be reasonable length
|
||||
context = manager.get_context_string()
|
||||
assert len(context) < 50000 # Not too long
|
||||
|
||||
    def test_session_state_throughout_optimization(self, tmp_path):
        """
        Test session state tracking throughout an optimization.
        """
        session = AtomizerSessionState(session_id="integration_test")
        session.exposed.task_type = TaskType.RUN_OPTIMIZATION
        session.exposed.study_name = "state_test"

        # Simulate optimization progress
        for i in range(20):
            session.add_action(f"Processing trial {i}")

            if i % 5 == 0 and i > 0:
                session.update_study_status(
                    name="state_test",
                    status="running",
                    trials_completed=i,
                    trials_total=20,
                    best_value=100 - i,
                    best_trial=i
                )

            if i % 7 == 0:
                session.add_error(f"Minor issue at trial {i}")

        # Verify state
        assert session.exposed.trials_completed == 15  # Last update at i=15
        assert len(session.exposed.recent_errors) <= 5  # Bounded

        # Context should include key information
        context = session.get_llm_context()
        assert "state_test" in context
        assert "running" in context

    def test_cache_optimization_effectiveness(self):
        """
        Test that cache optimization actually works.
        """
        optimizer = ContextCacheOptimizer()

        # Build stable prefix (should be cached)
        builder = StablePrefixBuilder()
        builder.add_identity("I am Atomizer, an optimization assistant")
        builder.add_capabilities("I can run FEA optimizations")
        builder.add_tools("Available tools: NX, Nastran, Optuna")
        stable_prefix = builder.build()

        # Simulate 10 requests with same stable prefix
        for i in range(10):
            optimizer.prepare_context(
                stable_prefix=stable_prefix,
                semi_stable=f"Session info for request {i}",
                dynamic=f"User message {i}"
            )

        # Should have high cache hit rate
        assert optimizer.stats.hit_rate >= 0.9  # 9/10 hits
        assert optimizer.stats.estimated_savings_percent >= 80  # Good savings


class TestReflectorLearningPatterns:
    """Test that the reflector extracts useful patterns."""

    def test_convergence_pattern_learning(self, tmp_path):
        """Test learning from convergence failures."""
        playbook = AtomizerPlaybook()
        reflector = AtomizerReflector(playbook)

        # Simulate convergence failures
        for i in range(5):
            outcome = OptimizationOutcome(
                trial_number=i,
                success=False,
                objective_value=None,
                solver_errors=["Convergence failure at iteration 100"],
                design_variables={"x": i * 0.1},
                duration_seconds=300
            )
            reflector.analyze_trial(outcome)

        reflector.commit_insights()

        # Should have learned about convergence issues
        convergence_insights = [
            item for item in playbook.items.values()
            if "convergence" in item.content.lower()
        ]
        assert len(convergence_insights) > 0

    def test_success_pattern_learning(self, tmp_path):
        """Test learning from successful designs."""
        playbook = AtomizerPlaybook()
        reflector = AtomizerReflector(playbook)

        # Simulate successful designs with similar characteristics
        for i in range(5):
            outcome = OptimizationOutcome(
                trial_number=i,
                success=True,
                objective_value=50 + i,
                design_variables={
                    "thickness": 1.0 + i * 0.1,  # All around 1.0-1.5
                    "width": 10.0  # Consistent
                },
                duration_seconds=60
            )
            reflector.analyze_trial(outcome)

        reflector.commit_insights()

        # Should have learned success patterns
        success_insights = [
            item for item in playbook.items.values()
            if item.category == InsightCategory.STRATEGY
        ]
        assert len(success_insights) > 0


class TestErrorTrackerIntegration:
    """Test error tracker plugin integration."""

    def test_error_classification(self):
        """Test error classification function."""
        from optimization_engine.plugins.post_solve.error_tracker import classify_error

        assert classify_error("Convergence failure at iteration 50") == "convergence_failure"
        assert classify_error("Element distortion detected") == "mesh_error"
        assert classify_error("Matrix singularity") == "singularity"
        assert classify_error("Out of memory") == "memory_error"
        assert classify_error("License checkout failed") == "license_error"
        assert classify_error("Random unknown error") == "unknown_error"

    def test_error_tracking_hook(self, tmp_path):
        """Test the error tracking hook function."""
        from optimization_engine.plugins.post_solve.error_tracker import track_error

        context = {
            "trial_number": 5,
            "working_dir": str(tmp_path),
            "output_dir": str(tmp_path),
            "solver_returncode": 1,
            "error_message": "Convergence failure at iteration 100",
            "design_variables": {"x": 1.0, "y": 2.0}
        }

        result = track_error(context)

        assert result["error_tracked"] is True
        assert result["error_type"] == "convergence_failure"

        # Should have created error log
        error_log = tmp_path / "error_history.jsonl"
        assert error_log.exists()

        # Verify log content
        with open(error_log) as f:
            log_entry = json.loads(f.readline())

        assert log_entry["trial"] == 5
        assert log_entry["error_type"] == "convergence_failure"


class TestPlaybookContextGeneration:
    """Test context generation for different scenarios."""

    def test_context_for_optimization_task(self):
        """Test context generation for optimization."""
        playbook = AtomizerPlaybook()

        # Add various insights
        playbook.add_insight(InsightCategory.STRATEGY, "Start with coarse mesh")
        playbook.add_insight(InsightCategory.MISTAKE, "Avoid tiny elements")
        playbook.add_insight(InsightCategory.TOOL, "Use TPE for exploration")

        # Give them different scores
        playbook.record_outcome("str-00001", helpful=True)
        playbook.record_outcome("str-00001", helpful=True)

        context = playbook.get_context_for_task("optimization", max_items=10)

        assert "Playbook" in context
        assert "STRATEGY" in context
        assert "coarse mesh" in context

    def test_context_filtering_by_confidence(self):
        """Test that low-confidence items are filtered."""
        playbook = AtomizerPlaybook()

        # Add item with low confidence
        item = playbook.add_insight(InsightCategory.STRATEGY, "Questionable advice")
        playbook.record_outcome(item.id, helpful=True)
        playbook.record_outcome(item.id, helpful=False)
        playbook.record_outcome(item.id, helpful=False)
        playbook.record_outcome(item.id, helpful=False)
        # confidence = 1/4 = 0.25

        # High min_confidence should exclude it
        context = playbook.get_context_for_task(
            "optimization",
            min_confidence=0.5
        )

        assert "Questionable advice" not in context

    def test_context_ordering_by_score(self):
        """Test that items are ordered by net score."""
        playbook = AtomizerPlaybook()

        # Add items with different scores
        low = playbook.add_insight(InsightCategory.STRATEGY, "Low score advice")
        high = playbook.add_insight(InsightCategory.STRATEGY, "High score advice")

        # Give the high item a better score
        for _ in range(5):
            playbook.record_outcome(high.id, helpful=True)
        playbook.record_outcome(low.id, helpful=True)

        context = playbook.get_context_for_task("optimization")

        # High score should appear first
        high_pos = context.find("High score")
        low_pos = context.find("Low score")
        assert high_pos < low_pos


if __name__ == "__main__":
    pytest.main([__file__, "-v"])