feat: Implement ACE Context Engineering framework (SYS_17)

Complete implementation of Agentic Context Engineering (ACE) framework:

Core modules (optimization_engine/context/):
- playbook.py: AtomizerPlaybook with helpful/harmful scoring
- reflector.py: AtomizerReflector for insight extraction
- session_state.py: Context isolation (exposed/isolated state)
- feedback_loop.py: Automated learning from trial results
- compaction.py: Long-session context management
- cache_monitor.py: KV-cache optimization tracking
- runner_integration.py: OptimizationRunner integration

Dashboard integration:
- context.py: 12 REST API endpoints for playbook management

Tests:
- test_context_engineering.py: 44 unit tests
- test_context_integration.py: 16 integration tests

Documentation:
- CONTEXT_ENGINEERING_REPORT.md: Comprehensive implementation report
- CONTEXT_ENGINEERING_API.md: Complete API reference
- SYS_17_CONTEXT_ENGINEERING.md: System protocol
- Updated cheatsheet with SYS_17 quick reference
- Enhanced bootstrap (00_BOOTSTRAP_V2.md)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Commit 773f8ff8af (parent 0110d80401), 2025-12-29 20:21:20 -05:00
19 changed files with 8184 additions and 2 deletions

@@ -172,7 +172,7 @@ studies/{geometry_type}/{study_name}/
│ SYS_10: IMSO (single-obj)     SYS_11: Multi-objective           │
│ SYS_12: Extractors            SYS_13: Dashboard                 │
│ SYS_14: Neural Accel          SYS_15: Method Selector           │
│ SYS_16: Study Insights        SYS_17: Context Engineering       │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐


@@ -0,0 +1,425 @@
---
skill_id: SKILL_000
version: 3.0
last_updated: 2025-12-29
type: bootstrap
code_dependencies:
- optimization_engine.context.playbook
- optimization_engine.context.session_state
- optimization_engine.context.feedback_loop
requires_skills: []
---
# Atomizer LLM Bootstrap v3.0 - Context-Aware Sessions
**Version**: 3.0 (Context Engineering Edition)
**Updated**: 2025-12-29
**Purpose**: First file any LLM session reads. Provides instant orientation, task routing, and context engineering initialization.
---
## Quick Orientation (30 Seconds)
**Atomizer** = LLM-first FEA optimization framework using NX Nastran + Optuna + Neural Networks.
**Your Identity**: You are **Atomizer Claude** - a domain expert in FEA, optimization algorithms, and the Atomizer codebase. Not a generic assistant.
**Core Philosophy**: "Talk, don't click." Users describe what they want; you configure and execute.
**NEW in v3.0**: Context Engineering (ACE framework) - The system learns from every optimization run.
---
## Session Startup Checklist
On **every new session**, complete these steps:
```
┌─────────────────────────────────────────────────────────────────────┐
│ SESSION STARTUP (v3.0) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: Initialize Context Engineering │
│ □ Load playbook from knowledge_base/playbook.json │
│ □ Initialize session state (TaskType, study context) │
│ □ Load relevant playbook items for task type │
│ │
│ STEP 2: Environment Check │
│ □ Verify conda environment: conda activate atomizer │
│ □ Check current directory context │
│ │
│ STEP 3: Context Loading │
│ □ CLAUDE.md loaded (system instructions) │
│ □ This file (00_BOOTSTRAP_V2.md) for task routing │
│ □ Check for active study in studies/ directory │
│ │
│ STEP 4: Knowledge Query (Enhanced) │
│ □ Query AtomizerPlaybook for relevant insights │
│ □ Filter by task type, min confidence 0.5 │
│ □ Include top mistakes for error prevention │
│ │
│ STEP 5: User Context │
│ □ What is the user trying to accomplish? │
│ □ Is there an active study context? │
│ □ What privilege level? (default: user) │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
### Context Engineering Initialization
```python
# On session start, initialize context engineering
from pathlib import Path

from optimization_engine.context import (
    AtomizerPlaybook,
    AtomizerSessionState,
    InsightCategory,
    TaskType,
    get_session
)
# Load playbook
playbook = AtomizerPlaybook.load(Path("knowledge_base/playbook.json"))
# Initialize session
session = get_session()
session.exposed.task_type = TaskType.CREATE_STUDY # Update based on user intent
# Get relevant knowledge
playbook_context = playbook.get_context_for_task(
task_type="optimization",
max_items=15,
min_confidence=0.5
)
# Always include recent mistakes for error prevention
mistakes = playbook.get_by_category(InsightCategory.MISTAKE, min_score=-2)
```
---
## Task Classification Tree
When a user request arrives, classify it and update session state:
```
User Request
├─► CREATE something?
│ ├─ "new study", "set up", "create", "optimize this"
│ ├─ session.exposed.task_type = TaskType.CREATE_STUDY
│ └─► Load: OP_01_CREATE_STUDY.md + core/study-creation-core.md
├─► RUN something?
│ ├─ "start", "run", "execute", "begin optimization"
│ ├─ session.exposed.task_type = TaskType.RUN_OPTIMIZATION
│ └─► Load: OP_02_RUN_OPTIMIZATION.md
├─► CHECK status?
│ ├─ "status", "progress", "how many trials", "what's happening"
│ ├─ session.exposed.task_type = TaskType.MONITOR_PROGRESS
│ └─► Load: OP_03_MONITOR_PROGRESS.md
├─► ANALYZE results?
│ ├─ "results", "best design", "compare", "pareto"
│ ├─ session.exposed.task_type = TaskType.ANALYZE_RESULTS
│ └─► Load: OP_04_ANALYZE_RESULTS.md
├─► DEBUG/FIX error?
│ ├─ "error", "failed", "not working", "crashed"
│ ├─ session.exposed.task_type = TaskType.DEBUG_ERROR
│ └─► Load: OP_06_TROUBLESHOOT.md + playbook[MISTAKE]
├─► MANAGE disk space?
│ ├─ "disk", "space", "cleanup", "archive", "storage"
│ └─► Load: OP_07_DISK_OPTIMIZATION.md
├─► CONFIGURE settings?
│ ├─ "change", "modify", "settings", "parameters"
│ ├─ session.exposed.task_type = TaskType.CONFIGURE_SETTINGS
│ └─► Load relevant SYS_* protocol
├─► NEURAL acceleration?
│ ├─ "neural", "surrogate", "turbo", "GNN"
│ ├─ session.exposed.task_type = TaskType.NEURAL_ACCELERATION
│ └─► Load: SYS_14_NEURAL_ACCELERATION.md
└─► EXTEND functionality?
├─ "add extractor", "new hook", "create protocol"
└─► Check privilege, then load EXT_* protocol
```
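The tree above amounts to keyword routing. A minimal sketch of that logic is below; the keyword lists and task-type strings mirror the tree but are illustrative, not the production `TaskType` mapping (first matching branch wins, in tree order):

```python
# Illustrative keyword router for the classification tree above.
# Keyword lists and task names are assumptions, not the real TaskType enum.
TASK_KEYWORDS = {
    "CREATE_STUDY": ["new study", "set up", "create", "optimize this"],
    "RUN_OPTIMIZATION": ["start", "run", "execute", "begin optimization"],
    "MONITOR_PROGRESS": ["status", "progress", "how many trials"],
    "ANALYZE_RESULTS": ["results", "best design", "compare", "pareto"],
    "DEBUG_ERROR": ["error", "failed", "not working", "crashed"],
}

def classify_request(text: str) -> str:
    """Return the first task type whose keywords appear in the request."""
    lowered = text.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(kw in lowered for kw in keywords):
            return task
    return "UNCLASSIFIED"
```

In practice the matched name would be mapped onto `session.exposed.task_type` before loading the corresponding protocol file.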
---
## Protocol Routing Table (With Context Loading)
| User Intent | Keywords | Protocol | Skill to Load | Playbook Filter |
|-------------|----------|----------|---------------|-----------------|
| Create study | "new", "set up", "create" | OP_01 | study-creation-core.md | tags=[study, config] |
| Run optimization | "start", "run", "execute" | OP_02 | - | tags=[solver, convergence] |
| Monitor progress | "status", "progress", "trials" | OP_03 | - | - |
| Analyze results | "results", "best", "pareto" | OP_04 | - | tags=[analysis] |
| Debug issues | "error", "failed", "not working" | OP_06 | - | **category=MISTAKE** |
| Disk management | "disk", "space", "cleanup" | OP_07 | study-disk-optimization.md | - |
| Neural surrogates | "neural", "surrogate", "turbo" | SYS_14 | neural-acceleration.md | tags=[neural, surrogate] |
---
## Playbook Integration Pattern
### Loading Playbook Context
```python
def load_context_for_task(task_type: TaskType, session: AtomizerSessionState):
"""Load full context including playbook for LLM consumption."""
context_parts = []
# 1. Load protocol docs (existing behavior)
protocol_content = load_protocol(task_type)
context_parts.append(protocol_content)
# 2. Load session state (exposed only)
context_parts.append(session.get_llm_context())
# 3. Load relevant playbook items
playbook = AtomizerPlaybook.load(PLAYBOOK_PATH)
playbook_context = playbook.get_context_for_task(
task_type=task_type.value,
max_items=15,
min_confidence=0.6
)
context_parts.append(playbook_context)
# 4. Add error-specific items if debugging
if task_type == TaskType.DEBUG_ERROR:
mistakes = playbook.get_by_category(InsightCategory.MISTAKE)
for item in mistakes[:5]:
context_parts.append(item.to_context_string())
return "\n\n---\n\n".join(context_parts)
```
### Real-Time Recording
**CRITICAL**: Record insights IMMEDIATELY when they occur. Do not wait until session end.
```python
# On discovering a workaround
playbook.add_insight(
category=InsightCategory.WORKFLOW,
content="For mesh update issues, load _i.prt file before UpdateFemodel()",
tags=["mesh", "nx", "update"]
)
playbook.save(PLAYBOOK_PATH)
# On trial failure
playbook.add_insight(
category=InsightCategory.MISTAKE,
content="Convergence failure with tolerance < 1e-8 on large meshes",
source_trial=trial_number,
tags=["convergence", "solver"]
)
playbook.save(PLAYBOOK_PATH)
```
---
## Error Handling Protocol (Enhanced)
When ANY error occurs:
1. **Preserve the error** - Add to session state
2. **Check playbook** - Look for matching mistake patterns
3. **Learn from it** - If novel error, add to playbook
4. **Show to user** - Include error context in response
```python
# On error
session.add_error(f"{error_type}: {error_message}", error_type=error_type)
# Check playbook for similar errors
similar = playbook.search_by_content(error_message, category=InsightCategory.MISTAKE)
if similar:
print(f"Known issue: {similar[0].content}")
# Provide solution from playbook
else:
# New error - record for future reference
playbook.add_insight(
category=InsightCategory.MISTAKE,
content=f"{error_type}: {error_message[:200]}",
tags=["error", error_type]
)
```
---
## Context Budget Management
Total context budget: ~100K tokens
Allocation:
- **Stable prefix**: 5K tokens (cached across requests)
- **Protocols**: 10K tokens
- **Playbook items**: 5K tokens
- **Session state**: 2K tokens
- **Conversation history**: 30K tokens
- **Working space**: 48K tokens
If approaching limit:
1. Trigger compaction of old events
2. Reduce playbook items to top 5
3. Summarize conversation history
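The allocation and the compaction trigger can be sketched as follows; the 90% margin is an assumption for illustration, not a documented threshold:

```python
# Token allocation from the budget table above; values are the documented
# targets, the trigger margin (90%) is an illustrative assumption.
BUDGET = {
    "stable_prefix": 5_000,
    "protocols": 10_000,
    "playbook_items": 5_000,
    "session_state": 2_000,
    "conversation_history": 30_000,
    "working_space": 48_000,
}
TOTAL_BUDGET = sum(BUDGET.values())  # ~100K tokens

def over_budget(used_tokens: int, limit: int = TOTAL_BUDGET,
                margin: float = 0.9) -> bool:
    """True once usage crosses the margin, signalling compaction."""
    return used_tokens >= limit * margin
```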
---
## Execution Framework (AVERVS)
For ANY task, follow this pattern:
```
1. ANNOUNCE → State what you're about to do
2. VALIDATE → Check prerequisites are met
3. EXECUTE → Perform the action
4. RECORD → Record outcome to playbook (NEW!)
5. VERIFY → Confirm success
6. REPORT → Summarize what was done
7. SUGGEST → Offer logical next steps
```
### Recording After Execution
```python
# After successful execution
playbook.add_insight(
category=InsightCategory.STRATEGY,
content=f"Approach worked: {brief_description}",
tags=relevant_tags
)
# After failure
playbook.add_insight(
category=InsightCategory.MISTAKE,
content=f"Failed approach: {brief_description}. Reason: {reason}",
tags=relevant_tags
)
# Always save after recording
playbook.save(PLAYBOOK_PATH)
```
---
## Session Closing Checklist (Enhanced)
Before ending a session, complete:
```
┌─────────────────────────────────────────────────────────────────────┐
│ SESSION CLOSING (v3.0) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 1. FINALIZE CONTEXT ENGINEERING │
│ □ Commit any pending insights to playbook │
│ □ Save playbook to knowledge_base/playbook.json │
│ □ Export learning report if optimization completed │
│ │
│ 2. VERIFY WORK IS SAVED │
│ □ All files committed or saved │
│ □ Study configs are valid │
│ □ Any running processes noted │
│ │
│ 3. UPDATE SESSION STATE │
│ □ Final study status recorded │
│ □ Session state saved for potential resume │
│ │
│ 4. SUMMARIZE FOR USER │
│ □ What was accomplished │
│ □ What the system learned (new playbook items) │
│ □ Current state of any studies │
│ □ Recommended next steps │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
### Finalization Code
```python
# At session end
from optimization_engine.context import FeedbackLoop, save_playbook
# If optimization was run, finalize learning
if optimization_completed:
feedback = FeedbackLoop(playbook_path)
result = feedback.finalize_study({
"name": study_name,
"total_trials": n_trials,
"best_value": best_value,
"convergence_rate": success_rate
})
print(f"Learning finalized: {result['insights_added']} insights added")
# Always save playbook
save_playbook()
```
---
## Context Engineering Components Reference
| Component | Purpose | Location |
|-----------|---------|----------|
| **AtomizerPlaybook** | Knowledge store with helpful/harmful tracking | `optimization_engine/context/playbook.py` |
| **AtomizerReflector** | Analyzes outcomes, extracts insights | `optimization_engine/context/reflector.py` |
| **AtomizerSessionState** | Context isolation (exposed/isolated) | `optimization_engine/context/session_state.py` |
| **FeedbackLoop** | Connects outcomes to playbook updates | `optimization_engine/context/feedback_loop.py` |
| **CompactionManager** | Handles long sessions | `optimization_engine/context/compaction.py` |
| **ContextCacheOptimizer** | KV-cache optimization | `optimization_engine/context/cache_monitor.py` |
---
## Quick Paths
### "I just want to run an optimization"
1. Initialize session state as RUN_OPTIMIZATION
2. Load playbook items for [solver, convergence]
3. Load OP_02_RUN_OPTIMIZATION.md
4. After run, finalize feedback loop
### "Something broke"
1. Initialize session state as DEBUG_ERROR
2. Load ALL mistake items from playbook
3. Load OP_06_TROUBLESHOOT.md
4. Record any new errors discovered
### "What did my optimization find?"
1. Initialize session state as ANALYZE_RESULTS
2. Load OP_04_ANALYZE_RESULTS.md
3. Query the study database
4. Generate report
---
## Key Constraints (Always Apply)
1. **Python Environment**: Always use `conda activate atomizer`
2. **Never modify master files**: Copy NX files to study working directory first
3. **Code reuse**: Check `optimization_engine/extractors/` before writing new extraction code
4. **Validation**: Always validate config before running optimization
5. **Record immediately**: Don't wait until session end to record insights
6. **Save playbook**: After every insight, save the playbook
---
## Migration from v2.0
If upgrading from BOOTSTRAP v2.0:
1. The LAC system is now superseded by AtomizerPlaybook
2. Session insights are now structured PlaybookItems
3. Helpful/harmful tracking replaces simple confidence scores
4. Context is now explicitly exposed vs isolated
The old LAC files in `knowledge_base/lac/` are still readable but new insights should use the playbook system.
---
*Atomizer v3.0: Where engineers talk, AI optimizes, and the system learns.*


@@ -34,6 +34,7 @@ requires_skills:
| Add custom physics extractor | EXT_01 | Create in `optimization_engine/extractors/` |
| Add lifecycle hook | EXT_02 | Create in `optimization_engine/plugins/` |
| Generate physics insight | SYS_16 | `python -m optimization_engine.insights generate <study>` |
| **Manage knowledge/playbook** | **SYS_17** | `from optimization_engine.context import AtomizerPlaybook` |

---

@@ -366,6 +367,7 @@ Without it, `UpdateFemodel()` runs but the mesh doesn't change!

| 14 | Neural | Surrogate model acceleration |
| 15 | Method Selector | Recommends optimization strategy |
| 16 | Study Insights | Physics visualizations (Zernike, stress, modal) |
| 17 | Context Engineering | ACE framework - self-improving knowledge system |

---

@@ -549,3 +551,106 @@ convert_custom_to_optuna(db_path, study_name)

- Trial numbers **NEVER reset** across study lifetime
- Surrogate predictions (5K per batch) are NOT logged as trials
- Only FEA-validated results become trials
---
## Context Engineering Quick Reference (SYS_17)
The ACE (Agentic Context Engineering) framework enables self-improving optimization through structured knowledge capture.
### Core Components
| Component | Purpose | Key Function |
|-----------|---------|--------------|
| **AtomizerPlaybook** | Structured knowledge store | `playbook.add_insight()`, `playbook.get_context_for_task()` |
| **AtomizerReflector** | Extracts insights from outcomes | `reflector.analyze_outcome()` |
| **AtomizerSessionState** | Context isolation (exposed/isolated) | `session.get_llm_context()` |
| **FeedbackLoop** | Automated learning | `feedback.process_trial_result()` |
| **CompactionManager** | Long-session handling | `compactor.maybe_compact()` |
| **CacheMonitor** | KV-cache optimization | `optimizer.track_completion()` |
### Python API Quick Reference
```python
from optimization_engine.context import (
AtomizerPlaybook, AtomizerReflector, get_session,
InsightCategory, TaskType, FeedbackLoop
)
# Load playbook
playbook = AtomizerPlaybook.load(Path("knowledge_base/playbook.json"))
# Add an insight
playbook.add_insight(
category=InsightCategory.STRATEGY, # str, mis, tool, cal, dom, wf
content="CMA-ES converges faster on smooth mirror surfaces",
tags=["mirror", "sampler", "convergence"]
)
playbook.save(Path("knowledge_base/playbook.json"))
# Get context for LLM
context = playbook.get_context_for_task(
task_type="optimization",
max_items=15,
min_confidence=0.5
)
# Record feedback
playbook.record_outcome(item_id="str_001", helpful=True)
# Session state
session = get_session()
session.exposed.task_type = TaskType.RUN_OPTIMIZATION
session.add_action("Started optimization run")
llm_context = session.get_llm_context()
# Feedback loop (automated learning)
feedback = FeedbackLoop(playbook_path)
feedback.process_trial_result(
trial_number=42,
params={'thickness': 10.5},
objectives={'mass': 5.2},
is_feasible=True
)
```
### Insight Categories
| Category | Code | Use For |
|----------|------|---------|
| Strategy | `str` | Optimization approaches that work |
| Mistake | `mis` | Common errors to avoid |
| Tool | `tool` | Tool usage patterns |
| Calculation | `cal` | Formulas and calculations |
| Domain | `dom` | FEA/NX domain knowledge |
| Workflow | `wf` | Process patterns |
### Playbook Item Format
```
[str_001] helpful=5 harmful=0 :: CMA-ES converges faster on smooth surfaces
```
- `net_score = helpful - harmful`
- `confidence = helpful / (helpful + harmful)`
- Items with `net_score < -3` are pruned
### REST API Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/context/playbook` | GET | Playbook summary stats |
| `/api/context/playbook/items` | GET | List items with filters |
| `/api/context/playbook/feedback` | POST | Record helpful/harmful |
| `/api/context/playbook/insights` | POST | Add new insight |
| `/api/context/playbook/prune` | POST | Remove harmful items |
| `/api/context/session` | GET | Current session state |
| `/api/context/learning/report` | GET | Comprehensive learning report |
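A hedged client sketch for the feedback endpoint, using only the request shape shown in this doc (the helper names are illustrative, and the call assumes the dashboard is running at `http://localhost:5000`):

```python
import json
from urllib import request

# Base URL from the Dashboard URL table; adjust if your port differs.
BASE = "http://localhost:5000/api/context"

def build_feedback_payload(item_id: str, helpful: bool) -> bytes:
    """JSON body for POST /api/context/playbook/feedback."""
    return json.dumps({"item_id": item_id, "helpful": helpful}).encode()

def record_feedback(item_id: str, helpful: bool) -> dict:
    """Send feedback (requires the dashboard server to be running)."""
    req = request.Request(
        f"{BASE}/playbook/feedback",
        data=build_feedback_payload(item_id, helpful),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```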
### Dashboard URL
| Service | URL | Purpose |
|---------|-----|---------|
| Context API | `http://localhost:5000/api/context` | Playbook management |
**Full documentation**: `docs/protocols/system/SYS_17_CONTEXT_ENGINEERING.md`


@@ -12,7 +12,7 @@ import sys
# Add parent directory to path to import optimization_engine
sys.path.append(str(Path(__file__).parent.parent.parent.parent))
from api.routes import optimization, claude, terminal, insights, context
from api.websocket import optimization_stream
# Create FastAPI app
@@ -37,6 +37,7 @@ app.include_router(optimization_stream.router, prefix="/api/ws", tags=["websocke
app.include_router(claude.router, prefix="/api/claude", tags=["claude"])
app.include_router(terminal.router, prefix="/api/terminal", tags=["terminal"])
app.include_router(insights.router, prefix="/api/insights", tags=["insights"])
app.include_router(context.router, prefix="/api/context", tags=["context"])
@app.get("/")
async def root():


@@ -0,0 +1,450 @@
"""
Context Engineering API Routes
Provides endpoints for:
- Viewing playbook contents
- Managing session state
- Recording feedback on playbook items
- Triggering compaction
- Monitoring cache efficiency
- Exporting learning reports
Part of the ACE (Agentic Context Engineering) implementation for Atomizer.
"""
from fastapi import APIRouter, HTTPException, Query
from pathlib import Path
from typing import Optional, List
from pydantic import BaseModel
from datetime import datetime
import sys
# Add parent paths for imports
sys.path.append(str(Path(__file__).parent.parent.parent.parent.parent))
router = APIRouter()
# Paths
ATOMIZER_ROOT = Path(__file__).parents[4]
PLAYBOOK_PATH = ATOMIZER_ROOT / "knowledge_base" / "playbook.json"
# Pydantic models for request/response
class PlaybookItemResponse(BaseModel):
id: str
category: str
content: str
helpful_count: int
harmful_count: int
net_score: int
confidence: float
tags: List[str]
created_at: str
last_used: Optional[str]
class PlaybookSummary(BaseModel):
total_items: int
by_category: dict
version: int
last_updated: str
avg_score: float
top_score: int
lowest_score: int
class FeedbackRequest(BaseModel):
item_id: str
helpful: bool
class InsightRequest(BaseModel):
category: str
content: str
tags: Optional[List[str]] = None
source_trial: Optional[int] = None
class SessionStateResponse(BaseModel):
session_id: str
task_type: Optional[str]
study_name: Optional[str]
study_status: str
trials_completed: int
trials_total: int
best_value: Optional[float]
recent_actions: List[str]
recent_errors: List[str]
# Helper function to get playbook
def get_playbook():
"""Load playbook, handling import errors gracefully."""
try:
from optimization_engine.context.playbook import AtomizerPlaybook
return AtomizerPlaybook.load(PLAYBOOK_PATH)
except ImportError as e:
raise HTTPException(
status_code=500,
detail=f"Context engineering module not available: {str(e)}"
)
# Playbook endpoints
@router.get("/playbook", response_model=PlaybookSummary)
async def get_playbook_summary():
"""Get playbook summary statistics."""
playbook = get_playbook()
stats = playbook.get_stats()
return PlaybookSummary(
total_items=stats["total_items"],
by_category=stats["by_category"],
version=stats["version"],
last_updated=stats["last_updated"],
avg_score=stats["avg_score"],
top_score=stats["max_score"],
lowest_score=stats["min_score"]
)
@router.get("/playbook/items", response_model=List[PlaybookItemResponse])
async def get_playbook_items(
category: Optional[str] = Query(None, description="Filter by category (str, mis, tool, etc.)"),
min_score: int = Query(0, description="Minimum net score"),
min_confidence: float = Query(0.0, description="Minimum confidence (0.0-1.0)"),
limit: int = Query(50, description="Maximum items to return"),
offset: int = Query(0, description="Pagination offset")
):
"""
Get playbook items with optional filtering.
Categories:
- str: Strategy
- mis: Mistake
- tool: Tool usage
- cal: Calculation
- dom: Domain knowledge
- wf: Workflow
"""
playbook = get_playbook()
items = list(playbook.items.values())
# Filter by category
if category:
try:
from optimization_engine.context.playbook import InsightCategory
cat = InsightCategory(category)
items = [i for i in items if i.category == cat]
except ValueError:
raise HTTPException(400, f"Invalid category: {category}. Valid: str, mis, tool, cal, dom, wf")
# Filter by score
items = [i for i in items if i.net_score >= min_score]
# Filter by confidence
items = [i for i in items if i.confidence >= min_confidence]
# Sort by score
items.sort(key=lambda x: x.net_score, reverse=True)
# Paginate
items = items[offset:offset + limit]
return [
PlaybookItemResponse(
id=item.id,
category=item.category.value,
content=item.content,
helpful_count=item.helpful_count,
harmful_count=item.harmful_count,
net_score=item.net_score,
confidence=item.confidence,
tags=item.tags,
created_at=item.created_at,
last_used=item.last_used
)
for item in items
]
@router.get("/playbook/items/{item_id}", response_model=PlaybookItemResponse)
async def get_playbook_item(item_id: str):
"""Get a specific playbook item by ID."""
playbook = get_playbook()
if item_id not in playbook.items:
raise HTTPException(404, f"Item not found: {item_id}")
item = playbook.items[item_id]
return PlaybookItemResponse(
id=item.id,
category=item.category.value,
content=item.content,
helpful_count=item.helpful_count,
harmful_count=item.harmful_count,
net_score=item.net_score,
confidence=item.confidence,
tags=item.tags,
created_at=item.created_at,
last_used=item.last_used
)
@router.post("/playbook/feedback")
async def record_feedback(request: FeedbackRequest):
"""
Record feedback on a playbook item.
This is how the system learns:
- helpful=true increases the item's score
- helpful=false decreases the item's score
"""
playbook = get_playbook()
if request.item_id not in playbook.items:
raise HTTPException(404, f"Item not found: {request.item_id}")
playbook.record_outcome(request.item_id, helpful=request.helpful)
playbook.save(PLAYBOOK_PATH)
item = playbook.items[request.item_id]
return {
"item_id": request.item_id,
"new_score": item.net_score,
"new_confidence": item.confidence,
"helpful_count": item.helpful_count,
"harmful_count": item.harmful_count
}
@router.post("/playbook/insights")
async def add_insight(request: InsightRequest):
"""
Add a new insight to the playbook.
Categories:
- str: Strategy - Optimization strategies that work
- mis: Mistake - Common mistakes to avoid
- tool: Tool - Tool usage patterns
- cal: Calculation - Formulas and calculations
- dom: Domain - Domain-specific knowledge (FEA, NX)
- wf: Workflow - Workflow patterns
"""
try:
from optimization_engine.context.playbook import InsightCategory
except ImportError as e:
raise HTTPException(500, f"Context module not available: {e}")
# Validate category
try:
category = InsightCategory(request.category)
except ValueError:
raise HTTPException(400, f"Invalid category: {request.category}")
playbook = get_playbook()
item = playbook.add_insight(
category=category,
content=request.content,
source_trial=request.source_trial,
tags=request.tags
)
playbook.save(PLAYBOOK_PATH)
return {
"item_id": item.id,
"category": item.category.value,
"content": item.content,
"message": "Insight added successfully"
}
@router.delete("/playbook/items/{item_id}")
async def delete_playbook_item(item_id: str):
"""Delete a playbook item."""
playbook = get_playbook()
if item_id not in playbook.items:
raise HTTPException(404, f"Item not found: {item_id}")
content = playbook.items[item_id].content[:50]
del playbook.items[item_id]
playbook.save(PLAYBOOK_PATH)
return {
"deleted": item_id,
"content_preview": content
}
@router.post("/playbook/prune")
async def prune_playbook(threshold: int = Query(-3, description="Net score threshold for pruning")):
"""
Prune harmful items from the playbook.
Items with net_score <= threshold will be removed.
"""
playbook = get_playbook()
removed_count = playbook.prune_harmful(threshold=threshold)
playbook.save(PLAYBOOK_PATH)
return {
"items_pruned": removed_count,
"threshold_used": threshold,
"remaining_items": len(playbook.items)
}
@router.get("/playbook/context")
async def get_playbook_context(
task_type: str = Query("optimization", description="Task type for context filtering"),
max_items: int = Query(15, description="Maximum items to include"),
min_confidence: float = Query(0.5, description="Minimum confidence threshold")
):
"""
Get playbook context string formatted for LLM consumption.
This is what gets injected into the LLM context window.
"""
playbook = get_playbook()
context = playbook.get_context_for_task(
task_type=task_type,
max_items=max_items,
min_confidence=min_confidence
)
return {
"context": context,
"items_included": min(max_items, len(playbook.items)),
"task_type": task_type
}
# Session state endpoints
@router.get("/session", response_model=SessionStateResponse)
async def get_session_state():
"""Get current session state."""
try:
from optimization_engine.context.session_state import get_session
session = get_session()
return SessionStateResponse(
session_id=session.session_id,
task_type=session.exposed.task_type.value if session.exposed.task_type else None,
study_name=session.exposed.study_name,
study_status=session.exposed.study_status,
trials_completed=session.exposed.trials_completed,
trials_total=session.exposed.trials_total,
best_value=session.exposed.best_value,
recent_actions=session.exposed.recent_actions[-10:],
recent_errors=session.exposed.recent_errors[-5:]
)
except ImportError:
raise HTTPException(500, "Session state module not available")
@router.get("/session/context")
async def get_session_context():
"""Get session context string for LLM consumption."""
try:
from optimization_engine.context.session_state import get_session
session = get_session()
return {
"context": session.get_llm_context(),
"session_id": session.session_id,
"last_updated": session.last_updated
}
except ImportError:
raise HTTPException(500, "Session state module not available")
# Cache monitoring endpoints
@router.get("/cache/stats")
async def get_cache_stats():
"""Get KV-cache efficiency statistics."""
try:
from optimization_engine.context.cache_monitor import get_cache_optimizer
optimizer = get_cache_optimizer()
return {
"stats": optimizer.get_stats_dict(),
"report": optimizer.get_report()
}
except ImportError:
return {
"message": "Cache monitoring not active",
"stats": None
}
# Learning report endpoints
@router.get("/learning/report")
async def get_learning_report():
"""Get a comprehensive learning report."""
playbook = get_playbook()
stats = playbook.get_stats()
# Get top and worst performers
items = list(playbook.items.values())
items.sort(key=lambda x: x.net_score, reverse=True)
top_performers = [
{"id": i.id, "content": i.content[:100], "score": i.net_score}
for i in items[:10]
]
items.sort(key=lambda x: x.net_score)
worst_performers = [
{"id": i.id, "content": i.content[:100], "score": i.net_score}
for i in items[:5] if i.net_score < 0
]
return {
"generated_at": datetime.now().isoformat(),
"playbook_stats": stats,
"top_performers": top_performers,
"worst_performers": worst_performers,
"recommendations": _generate_recommendations(playbook)
}
def _generate_recommendations(playbook) -> List[str]:
"""Generate recommendations based on playbook state."""
recommendations = []
# Check for harmful items
harmful = [i for i in playbook.items.values() if i.net_score < -3]
if harmful:
recommendations.append(
f"Consider pruning {len(harmful)} harmful items (net_score < -3)"
)
# Check for untested items
untested = [
i for i in playbook.items.values()
if i.helpful_count + i.harmful_count == 0
]
if len(untested) > 10:
recommendations.append(
f"{len(untested)} items have no feedback - consider testing them"
)
# Check category balance
stats = playbook.get_stats()
if stats["by_category"].get("MISTAKE", 0) < 5:
recommendations.append(
"Low mistake count - actively record errors when they occur"
)
if not recommendations:
recommendations.append("Playbook is in good health!")
return recommendations

File diff suppressed because it is too large.


@@ -0,0 +1,948 @@
# Context Engineering API Reference
**Version**: 1.0
**Updated**: 2025-12-29
**Module**: `optimization_engine.context`
This document provides complete API documentation for the Atomizer Context Engineering (ACE) framework.
---
## Table of Contents
1. [Module Overview](#module-overview)
2. [Core Classes](#core-classes)
- [AtomizerPlaybook](#atomizerplaybook)
- [PlaybookItem](#playbookitem)
- [InsightCategory](#insightcategory)
3. [Session Management](#session-management)
- [AtomizerSessionState](#atomizersessionstate)
- [ExposedState](#exposedstate)
- [IsolatedState](#isolatedstate)
- [TaskType](#tasktype)
4. [Analysis & Learning](#analysis--learning)
- [AtomizerReflector](#atomizerreflector)
- [FeedbackLoop](#feedbackloop)
5. [Optimization](#optimization)
- [CompactionManager](#compactionmanager)
- [ContextCacheOptimizer](#contextcacheoptimizer)
6. [Integration](#integration)
- [ContextEngineeringMixin](#contextengineeringmixin)
- [ContextAwareRunner](#contextawarerunner)
7. [REST API](#rest-api)
---
## Module Overview
### Import Patterns
```python
# Full import
from optimization_engine.context import (
# Core playbook
AtomizerPlaybook,
PlaybookItem,
InsightCategory,
# Session management
AtomizerSessionState,
ExposedState,
IsolatedState,
TaskType,
get_session,
# Analysis
AtomizerReflector,
OptimizationOutcome,
InsightCandidate,
# Learning
FeedbackLoop,
FeedbackLoopFactory,
# Optimization
CompactionManager,
ContextEvent,
EventType,
ContextBudgetManager,
ContextCacheOptimizer,
CacheStats,
StablePrefixBuilder,
# Integration
ContextEngineeringMixin,
ContextAwareRunner,
)
# Convenience imports
from optimization_engine.context import AtomizerPlaybook, get_session
```
---
## Core Classes
### AtomizerPlaybook
The central knowledge store for persistent learning across sessions.
#### Constructor
```python
AtomizerPlaybook(
items: Dict[str, PlaybookItem] = None,
version: int = 1,
created_at: str = None,
last_updated: str = None
)
```
#### Class Methods
##### `load(path: Path) -> AtomizerPlaybook`
Load playbook from JSON file.
```python
playbook = AtomizerPlaybook.load(Path("knowledge_base/playbook.json"))
```
**Parameters:**
- `path`: Path to JSON file
**Returns:** AtomizerPlaybook instance
**Note:** If the file doesn't exist, a new empty playbook is created instead of raising `FileNotFoundError`.
---
#### Instance Methods
##### `save(path: Path) -> None`
Save playbook to JSON file.
```python
playbook.save(Path("knowledge_base/playbook.json"))
```
---
##### `add_insight(category, content, source_trial=None, tags=None) -> PlaybookItem`
Add a new insight to the playbook.
```python
item = playbook.add_insight(
category=InsightCategory.STRATEGY,
content="CMA-ES converges faster on smooth surfaces",
source_trial=42,
tags=["sampler", "convergence", "mirror"]
)
```
**Parameters:**
- `category` (InsightCategory): Category of the insight
- `content` (str): The insight content
- `source_trial` (int, optional): Trial number that generated this insight
- `tags` (List[str], optional): Tags for filtering
**Returns:** The created PlaybookItem
---
##### `record_outcome(item_id: str, helpful: bool) -> None`
Record whether an insight was helpful or harmful.
```python
playbook.record_outcome("str_001", helpful=True)
playbook.record_outcome("mis_003", helpful=False)
```
**Parameters:**
- `item_id` (str): ID of the playbook item
- `helpful` (bool): True if helpful, False if harmful
---
##### `get_context_for_task(task_type, max_items=15, min_confidence=0.5) -> str`
Get formatted context string for LLM consumption.
```python
context = playbook.get_context_for_task(
task_type="optimization",
max_items=15,
min_confidence=0.5
)
```
**Parameters:**
- `task_type` (str): Type of task for filtering
- `max_items` (int): Maximum items to include
- `min_confidence` (float): Minimum confidence threshold (0.0-1.0)
**Returns:** Formatted string suitable for LLM context
---
##### `get_by_category(category, min_score=0) -> List[PlaybookItem]`
Get items filtered by category.
```python
mistakes = playbook.get_by_category(InsightCategory.MISTAKE, min_score=-2)
```
**Parameters:**
- `category` (InsightCategory): Category to filter by
- `min_score` (int): Minimum net score
**Returns:** List of matching PlaybookItems
---
##### `get_stats() -> Dict`
Get playbook statistics.
```python
stats = playbook.get_stats()
# Returns:
# {
# "total_items": 45,
# "by_category": {"STRATEGY": 12, "MISTAKE": 8, ...},
# "version": 3,
# "last_updated": "2025-12-29T10:30:00",
# "avg_score": 2.4,
# "max_score": 15,
# "min_score": -3
# }
```
---
##### `prune_harmful(threshold=-3) -> int`
Remove items with net score below threshold.
```python
removed_count = playbook.prune_harmful(threshold=-3)
```
**Parameters:**
- `threshold` (int): Items with net_score <= threshold are removed
**Returns:** Number of items removed
---
### PlaybookItem
Dataclass representing a single playbook entry.
```python
@dataclass
class PlaybookItem:
id: str # e.g., "str_001", "mis_003"
category: InsightCategory # Category enum
content: str # The insight text
helpful_count: int = 0 # Times marked helpful
harmful_count: int = 0 # Times marked harmful
tags: List[str] = field(default_factory=list)
source_trial: Optional[int] = None
created_at: str = "" # ISO timestamp
last_used: Optional[str] = None # ISO timestamp
```
#### Properties
```python
item.net_score # helpful_count - harmful_count
item.confidence # helpful / (helpful + harmful), or 0.5 if no feedback
```
#### Methods
```python
# Convert to context string for LLM
context_str = item.to_context_string()
# "[str_001] helpful=5 harmful=0 :: CMA-ES converges faster..."
```
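The scoring semantics can be sketched with a stripped-down stand-in; `ScoredItem` below is illustrative only (not the real `PlaybookItem`), but its two properties follow the formulas documented above:

```python
from dataclasses import dataclass


@dataclass
class ScoredItem:
    """Minimal mirror of the documented PlaybookItem scoring fields."""
    helpful_count: int = 0
    harmful_count: int = 0

    @property
    def net_score(self) -> int:
        # helpful_count - harmful_count
        return self.helpful_count - self.harmful_count

    @property
    def confidence(self) -> float:
        # helpful / (helpful + harmful), or 0.5 with no feedback yet
        total = self.helpful_count + self.harmful_count
        return self.helpful_count / total if total else 0.5


fresh = ScoredItem()                                  # no feedback yet
proven = ScoredItem(helpful_count=5)                  # consistently helpful
mixed = ScoredItem(helpful_count=2, harmful_count=3)  # net negative

print(fresh.confidence)  # 0.5
print(proven.net_score)  # 5
print(mixed.net_score)   # -1
```

Note that a fresh item sits at the default `min_confidence=0.5` boundary, which is why untested insights can still appear in context.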
---
### InsightCategory
Enum for categorizing insights.
```python
class InsightCategory(Enum):
STRATEGY = "str" # Optimization strategies that work
CALCULATION = "cal" # Formulas and calculations
MISTAKE = "mis" # Common mistakes to avoid
TOOL = "tool" # Tool usage patterns
DOMAIN = "dom" # Domain-specific knowledge (FEA, NX)
WORKFLOW = "wf" # Workflow patterns
```
**Usage:**
```python
# Create with enum
category = InsightCategory.STRATEGY
# Create from string
category = InsightCategory("str")
# Get string value
value = InsightCategory.STRATEGY.value # "str"
```
---
## Session Management
### AtomizerSessionState
Manages session context with exposed/isolated separation.
#### Constructor
```python
session = AtomizerSessionState(
session_id: str = None # Auto-generated UUID if not provided
)
```
#### Attributes
```python
session.session_id # Unique session identifier
session.exposed # ExposedState - always in LLM context
session.isolated # IsolatedState - on-demand access only
session.last_updated # ISO timestamp of last update
```
#### Methods
##### `get_llm_context() -> str`
Get exposed state formatted for LLM context.
```python
context = session.get_llm_context()
# Returns formatted string with task type, study info, progress, etc.
```
---
##### `add_action(action: str) -> None`
Record an action (keeps last 20).
```python
session.add_action("Started optimization with TPE sampler")
```
---
##### `add_error(error: str, error_type: str = None) -> None`
Record an error (keeps last 10).
```python
session.add_error("NX solver timeout after 600s", error_type="solver")
```
---
##### `to_dict() / from_dict(data) -> AtomizerSessionState`
Serialize/deserialize session state.
```python
# Save
data = session.to_dict()
# Restore
session = AtomizerSessionState.from_dict(data)
```
---
### ExposedState
State that's always included in LLM context.
```python
@dataclass
class ExposedState:
task_type: Optional[TaskType] = None
study_name: Optional[str] = None
study_status: str = "idle"
trials_completed: int = 0
trials_total: int = 0
best_value: Optional[float] = None
recent_actions: List[str] = field(default_factory=list) # Last 20
recent_errors: List[str] = field(default_factory=list) # Last 10
```
---
### IsolatedState
State available on-demand but not in default context.
```python
@dataclass
class IsolatedState:
full_trial_history: List[Dict] = field(default_factory=list)
detailed_errors: List[Dict] = field(default_factory=list)
performance_metrics: Dict = field(default_factory=dict)
debug_info: Dict = field(default_factory=dict)
```
---
### TaskType
Enum for session task classification.
```python
class TaskType(Enum):
CREATE_STUDY = "create_study"
RUN_OPTIMIZATION = "run_optimization"
MONITOR_PROGRESS = "monitor_progress"
ANALYZE_RESULTS = "analyze_results"
DEBUG_ERROR = "debug_error"
CONFIGURE_SETTINGS = "configure_settings"
NEURAL_ACCELERATION = "neural_acceleration"
```
---
### get_session()
Get or create the global session instance.
```python
from optimization_engine.context import get_session
session = get_session()
session.exposed.task_type = TaskType.RUN_OPTIMIZATION
```
---
## Analysis & Learning
### AtomizerReflector
Analyzes optimization outcomes and extracts insights.
#### Constructor
```python
reflector = AtomizerReflector(playbook: AtomizerPlaybook)
```
#### Methods
##### `analyze_outcome(outcome: OptimizationOutcome) -> List[InsightCandidate]`
Analyze an optimization outcome for insights.
```python
outcome = OptimizationOutcome(
study_name="bracket_v3",
trial_number=42,
params={'thickness': 10.5},
objectives={'mass': 5.2},
constraints_satisfied=True,
error_message=None,
solve_time=45.2
)
insights = reflector.analyze_outcome(outcome)
for insight in insights:
print(f"{insight.category}: {insight.content}")
```
---
##### `extract_error_insights(error_message: str) -> List[InsightCandidate]`
Extract insights from error messages.
```python
insights = reflector.extract_error_insights("Solution did not converge within tolerance")
# Returns insights about convergence failures
```
---
### OptimizationOutcome
Dataclass for optimization trial outcomes.
```python
@dataclass
class OptimizationOutcome:
study_name: str
trial_number: int
params: Dict[str, Any]
objectives: Dict[str, float]
constraints_satisfied: bool
error_message: Optional[str] = None
solve_time: Optional[float] = None
```
---
### FeedbackLoop
Automated learning from optimization execution.
#### Constructor
```python
feedback = FeedbackLoop(playbook_path: Path)
```
#### Methods
##### `process_trial_result(trial_number, params, objectives, is_feasible, error=None)`
Process a trial result for learning opportunities.
```python
feedback.process_trial_result(
trial_number=42,
params={'thickness': 10.5, 'width': 25.0},
objectives={'mass': 5.2, 'stress': 180.0},
is_feasible=True,
error=None
)
```
---
##### `finalize_study(study_summary: Dict) -> Dict`
Finalize learning at end of optimization study.
```python
result = feedback.finalize_study({
"name": "bracket_v3",
"total_trials": 100,
"best_value": 4.8,
"convergence_rate": 0.95
})
# Returns: {"insights_added": 3, "patterns_identified": ["fast_convergence"]}
```
---
## Optimization
### CompactionManager
Handles context compaction for long-running sessions.
#### Constructor
```python
compactor = CompactionManager(
max_events: int = 100,
preserve_errors: bool = True,
preserve_milestones: bool = True
)
```
#### Methods
##### `add_event(event: ContextEvent) -> None`
Add an event to the session history.
```python
from optimization_engine.context import ContextEvent, EventType
event = ContextEvent(
event_type=EventType.TRIAL_COMPLETE,
content="Trial 42 completed: mass=5.2kg",
timestamp=datetime.now().isoformat(),
is_error=False,
is_milestone=False
)
compactor.add_event(event)
```
---
##### `maybe_compact() -> Optional[str]`
Compact events if over threshold.
```python
summary = compactor.maybe_compact()
if summary:
print(f"Compacted: {summary}")
```
---
##### `get_context() -> str`
Get current context string.
```python
context = compactor.get_context()
```
---
### ContextCacheOptimizer
Monitors and optimizes KV-cache efficiency.
#### Constructor
```python
optimizer = ContextCacheOptimizer()
```
#### Methods
##### `track_request(prefix_tokens: int, total_tokens: int)`
Track a request for cache analysis.
```python
optimizer.track_request(prefix_tokens=5000, total_tokens=15000)
```
---
##### `track_completion(success: bool, response_tokens: int)`
Track completion for performance analysis.
```python
optimizer.track_completion(success=True, response_tokens=500)
```
---
##### `get_stats_dict() -> Dict`
Get cache statistics.
```python
stats = optimizer.get_stats_dict()
# Returns:
# {
# "total_requests": 150,
# "cache_hits": 120,
# "cache_hit_rate": 0.8,
# "avg_prefix_ratio": 0.33,
# ...
# }
```
---
##### `get_report() -> str`
Get human-readable report.
```python
report = optimizer.get_report()
print(report)
```
---
## Integration
### ContextEngineeringMixin
Mixin class for adding context engineering to optimization runners.
```python
class ContextEngineeringMixin:
def init_context_engineering(self, playbook_path: Path):
"""Initialize context engineering components."""
def record_trial_outcome(self, trial_number, params, objectives,
is_feasible, error=None):
"""Record trial outcome for learning."""
def get_context_for_llm(self) -> str:
"""Get combined context for LLM consumption."""
def finalize_context_engineering(self, study_summary: Dict):
"""Finalize learning at study completion."""
```
---
### ContextAwareRunner
Pre-built runner with context engineering enabled.
```python
from optimization_engine.context import ContextAwareRunner
runner = ContextAwareRunner(
config=config_dict,
playbook_path=Path("knowledge_base/playbook.json")
)
# Run optimization with automatic learning
runner.run()
```
---
## REST API
The Context Engineering module exposes REST endpoints via FastAPI.
### Base URL
```
http://localhost:5000/api/context
```
### Endpoints
#### GET `/playbook`
Get playbook summary statistics.
**Response:**
```json
{
"total_items": 45,
"by_category": {"STRATEGY": 12, "MISTAKE": 8},
"version": 3,
"last_updated": "2025-12-29T10:30:00",
"avg_score": 2.4,
"top_score": 15,
"lowest_score": -3
}
```
---
#### GET `/playbook/items`
List playbook items with optional filters.
**Query Parameters:**
- `category` (str): Filter by category (str, mis, tool, cal, dom, wf)
- `min_score` (int): Minimum net score (default: 0)
- `min_confidence` (float): Minimum confidence (default: 0.0)
- `limit` (int): Max items (default: 50)
- `offset` (int): Pagination offset (default: 0)
**Response:**
```json
[
{
"id": "str_001",
"category": "str",
"content": "CMA-ES converges faster on smooth surfaces",
"helpful_count": 5,
"harmful_count": 0,
"net_score": 5,
"confidence": 1.0,
"tags": ["sampler", "convergence"],
"created_at": "2025-12-29T10:00:00",
"last_used": "2025-12-29T10:30:00"
}
]
```
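As a sketch, a client might assemble the query URL for this endpoint as follows; the `items_url` helper is hypothetical, and only the base URL and parameter names come from this reference:

```python
from urllib.parse import urlencode

BASE = "http://localhost:5000/api/context"  # base URL documented above


def items_url(category=None, min_score=0, min_confidence=0.0,
              limit=50, offset=0):
    """Build the query URL for GET /playbook/items (defaults per the docs)."""
    params = {"min_score": min_score, "min_confidence": min_confidence,
              "limit": limit, "offset": offset}
    if category:  # optional filter: str, mis, tool, cal, dom, wf
        params["category"] = category
    return f"{BASE}/playbook/items?{urlencode(params)}"


print(items_url(category="mis", min_score=-5, limit=10))
```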
---
#### GET `/playbook/items/{item_id}`
Get a specific playbook item.
**Response:** Single PlaybookItemResponse object
---
#### POST `/playbook/feedback`
Record feedback on a playbook item.
**Request Body:**
```json
{
"item_id": "str_001",
"helpful": true
}
```
**Response:**
```json
{
"item_id": "str_001",
"new_score": 6,
"new_confidence": 1.0,
"helpful_count": 6,
"harmful_count": 0
}
```
---
#### POST `/playbook/insights`
Add a new insight.
**Request Body:**
```json
{
"category": "str",
"content": "New insight content",
"tags": ["tag1", "tag2"],
"source_trial": 42
}
```
**Response:**
```json
{
"item_id": "str_015",
"category": "str",
"content": "New insight content",
"message": "Insight added successfully"
}
```
---
#### DELETE `/playbook/items/{item_id}`
Delete a playbook item.
**Response:**
```json
{
"deleted": "str_001",
"content_preview": "CMA-ES converges faster..."
}
```
---
#### POST `/playbook/prune`
Remove harmful items.
**Query Parameters:**
- `threshold` (int): Net score threshold (default: -3)
**Response:**
```json
{
"items_pruned": 3,
"threshold_used": -3,
"remaining_items": 42
}
```
---
#### GET `/playbook/context`
Get playbook context for LLM consumption.
**Query Parameters:**
- `task_type` (str): Task type (default: "optimization")
- `max_items` (int): Maximum items (default: 15)
- `min_confidence` (float): Minimum confidence (default: 0.5)
**Response:**
```json
{
"context": "## Atomizer Knowledge Base\n...",
"items_included": 15,
"task_type": "optimization"
}
```
---
#### GET `/session`
Get current session state.
**Response:**
```json
{
"session_id": "abc123",
"task_type": "run_optimization",
"study_name": "bracket_v3",
"study_status": "running",
"trials_completed": 42,
"trials_total": 100,
"best_value": 5.2,
"recent_actions": ["Started optimization", "Trial 42 complete"],
"recent_errors": []
}
```
---
#### GET `/session/context`
Get session context for LLM consumption.
**Response:**
```json
{
"context": "## Current Session\nTask: run_optimization\n...",
"session_id": "abc123",
"last_updated": "2025-12-29T10:30:00"
}
```
---
#### GET `/cache/stats`
Get KV-cache statistics.
**Response:**
```json
{
"stats": {
"total_requests": 150,
"cache_hits": 120,
"cache_hit_rate": 0.8
},
"report": "Cache Performance Report\n..."
}
```
---
#### GET `/learning/report`
Get comprehensive learning report.
**Response:**
```json
{
"generated_at": "2025-12-29T10:30:00",
"playbook_stats": {...},
"top_performers": [
{"id": "str_001", "content": "...", "score": 15}
],
"worst_performers": [
{"id": "mis_003", "content": "...", "score": -2}
],
"recommendations": [
"Consider pruning 3 harmful items (net_score < -3)"
]
}
```
---
## Error Handling
All API endpoints return appropriate HTTP status codes:
| Code | Meaning |
|------|---------|
| 200 | Success |
| 400 | Bad request (invalid parameters) |
| 404 | Not found (item doesn't exist) |
| 500 | Server error (module not available) |
Error response format:
```json
{
"detail": "Error description"
}
```
---
## See Also
- [Context Engineering Report](../CONTEXT_ENGINEERING_REPORT.md) - Full implementation report
- [SYS_17 Protocol](../protocols/system/SYS_17_CONTEXT_ENGINEERING.md) - System protocol
- [Cheatsheet](../../.claude/skills/01_CHEATSHEET.md) - Quick reference


@@ -0,0 +1,307 @@
---
protocol_id: SYS_17
version: 1.0
last_updated: 2025-12-29
status: active
owner: system
code_dependencies:
- optimization_engine.context.*
requires_protocols: []
---
# SYS_17: Context Engineering System
## Overview
The Context Engineering System implements the **Agentic Context Engineering (ACE)** framework, enabling Atomizer to learn from every optimization run and accumulate institutional knowledge over time.
## When to Load This Protocol
Load SYS_17 when:
- User asks about "learning", "playbook", or "context engineering"
- Debugging why certain knowledge isn't being applied
- Configuring context behavior
- Analyzing what the system has learned
## Core Concepts
### The ACE Framework
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Generator  │────▶│  Reflector  │────▶│   Curator   │
│ (Opt Runs)  │     │ (Analysis)  │     │ (Playbook)  │
└─────────────┘     └─────────────┘     └─────────────┘
       ▲                                       │
       └───────────── Feedback ────────────────┘
```
1. **Generator**: OptimizationRunner produces trial outcomes
2. **Reflector**: Analyzes outcomes, extracts patterns
3. **Curator**: Playbook stores and manages insights
4. **Feedback**: Success/failure updates insight scores
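The four roles above can be sketched as a toy simulation; the dict-based `playbook` and the `curate`/`feedback` helpers here are illustrative stand-ins, not the real `AtomizerPlaybook`/`FeedbackLoop` API:

```python
# Toy Generator -> Reflector -> Curator -> Feedback loop.
playbook = {}  # id -> {"content": str, "helpful": int, "harmful": int}


def curate(item_id, content):
    """Curator: store an insight (idempotent)."""
    playbook.setdefault(item_id, {"content": content, "helpful": 0, "harmful": 0})


def feedback(item_id, helpful):
    """Feedback: success/failure updates insight scores."""
    key = "helpful" if helpful else "harmful"
    playbook[item_id][key] += 1


# Generator: trial outcomes; Reflector: turn failures into MISTAKE insights
trials = [(1, True, None), (2, False, "mesh too coarse"), (3, True, None)]
for trial, success, error in trials:
    if error:
        curate(f"mis-{trial:05d}", f"Avoid: {error}")

# Feedback: later trials confirm the insight was useful
feedback("mis-00002", helpful=True)
feedback("mis-00002", helpful=True)

score = playbook["mis-00002"]["helpful"] - playbook["mis-00002"]["harmful"]
print(score)  # 2
```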
### Playbook Item Structure
```
[str-00001] helpful=8 harmful=0 :: "Use shell elements for thin walls"
    │           │         │              │
    │           │         │              └── Insight content
    │           │         └── Times advice led to failure
    │           └── Times advice led to success
    └── Unique ID (category-number)
```
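As a sketch, the line format above can be produced and parsed with a pair of hypothetical helpers (`format_item` and `parse_item` are not part of the module; only the layout is taken from the diagram):

```python
import re


def format_item(item_id, helpful, harmful, content):
    """Render a playbook line in the format shown above."""
    return f'[{item_id}] helpful={helpful} harmful={harmful} :: "{content}"'


def parse_item(line):
    """Inverse of format_item; returns (id, helpful, harmful, content)."""
    m = re.match(r'\[(.+?)\] helpful=(\d+) harmful=(\d+) :: "(.*)"', line)
    return m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)


line = format_item("str-00001", 8, 0, "Use shell elements for thin walls")
print(line)
print(parse_item(line))
```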
### Categories
| Code | Name | Description | Example |
|------|------|-------------|---------|
| `str` | STRATEGY | Optimization approaches | "Start with TPE, switch to CMA-ES" |
| `mis` | MISTAKE | Things to avoid | "Don't use coarse mesh for stress" |
| `tool` | TOOL | Tool usage tips | "Use GP sampler for few-shot" |
| `cal` | CALCULATION | Formulas | "Safety factor = yield/max_stress" |
| `dom` | DOMAIN | Domain knowledge | "Zernike coefficients for mirrors" |
| `wf` | WORKFLOW | Workflow patterns | "Load _i.prt before UpdateFemodel()" |
## Key Components
### 1. AtomizerPlaybook
Location: `optimization_engine/context/playbook.py`
The central knowledge store. Handles:
- Adding insights (with auto-deduplication)
- Recording helpful/harmful outcomes
- Generating filtered context for LLM
- Pruning consistently harmful items
- Persistence (JSON)
**Quick Usage:**
```python
from optimization_engine.context import get_playbook, save_playbook, InsightCategory
playbook = get_playbook()
playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements for thin walls")
playbook.record_outcome("str-00001", helpful=True)
save_playbook()
```
### 2. AtomizerReflector
Location: `optimization_engine/context/reflector.py`
Analyzes optimization outcomes to extract insights:
- Classifies errors (convergence, mesh, singularity, etc.)
- Extracts success patterns
- Generates study-level insights
**Quick Usage:**
```python
from optimization_engine.context import AtomizerReflector, OptimizationOutcome
reflector = AtomizerReflector(playbook)
outcome = OptimizationOutcome(trial_number=42, success=True, ...)
insights = reflector.analyze_trial(outcome)
reflector.commit_insights()
```
### 3. FeedbackLoop
Location: `optimization_engine/context/feedback_loop.py`
Automated learning loop that:
- Processes trial results
- Updates playbook scores based on outcomes
- Tracks which items were active per trial
- Finalizes learning at study end
**Quick Usage:**
```python
from optimization_engine.context import FeedbackLoop
feedback = FeedbackLoop(playbook_path)
feedback.process_trial_result(trial_number=42, success=True, ...)
feedback.finalize_study({"name": "study", "total_trials": 100, ...})
```
### 4. SessionState
Location: `optimization_engine/context/session_state.py`
Manages context isolation:
- **Exposed**: Always in LLM context (task type, recent actions, errors)
- **Isolated**: On-demand access (full history, NX paths, F06 content)
**Quick Usage:**
```python
from optimization_engine.context import get_session, TaskType
session = get_session()
session.exposed.task_type = TaskType.RUN_OPTIMIZATION
session.add_action("Started trial 42")
context = session.get_llm_context()
```
### 5. CompactionManager
Location: `optimization_engine/context/compaction.py`
Handles long sessions:
- Triggers compaction at threshold (default 50 events)
- Summarizes old events into statistics
- Preserves errors and milestones
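A minimal sketch of this policy, assuming a simplified tuple layout of `(text, is_error, is_milestone)` per event (the real `CompactionManager` uses `ContextEvent` objects):

```python
def compact(events, threshold=50):
    """Summarize routine events past a threshold, keeping errors/milestones.

    events: list of (text, is_error, is_milestone) tuples.
    """
    if len(events) <= threshold:
        return events
    kept = [e for e in events if e[1] or e[2]]  # preserve errors/milestones
    dropped = len(events) - len(kept)
    summary = (f"[compacted] {dropped} routine events summarized", False, True)
    return [summary] + kept


events = [(f"trial {i} ok", False, False) for i in range(60)]
events.append(("solver diverged", True, False))
out = compact(events, threshold=50)
print(len(out))  # summary + 1 preserved error = 2
```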
### 6. CacheOptimizer
Location: `optimization_engine/context/cache_monitor.py`
Optimizes for KV-cache:
- Three-tier context structure (stable/semi-stable/dynamic)
- Tracks cache hit rate
- Estimates cost savings
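The stable-prefix hit check can be illustrated standalone; `PrefixTracker` below mirrors the hashing logic in `cache_monitor.py` but is a simplified sketch, not the actual `ContextCacheOptimizer`:

```python
import hashlib


class PrefixTracker:
    """A request counts as a cache hit when the stable prefix hashes
    the same as on the previous request."""

    def __init__(self):
        self.last_hash = None
        self.hits = 0
        self.misses = 0

    def track(self, stable_prefix: str) -> bool:
        h = hashlib.md5(stable_prefix.encode()).hexdigest()
        hit = h == self.last_hash  # compare before updating
        self.last_hash = h
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        return hit


tracker = PrefixTracker()
tracker.track("IDENTITY+TOOLS")    # first request: miss
tracker.track("IDENTITY+TOOLS")    # unchanged prefix: hit
tracker.track("IDENTITY+TOOLS+X")  # edited prefix: miss
print(tracker.hits, tracker.misses)  # 1 2
```

This is why appending dynamic content after the stable prefix (rather than editing the prefix) is the key to a high hit rate.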
## Integration with OptimizationRunner
### Option 1: Mixin
```python
from optimization_engine.context.runner_integration import ContextEngineeringMixin
class MyRunner(ContextEngineeringMixin, OptimizationRunner):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.init_context_engineering()
```
### Option 2: Wrapper
```python
from optimization_engine.context.runner_integration import ContextAwareRunner
runner = OptimizationRunner(config_path=...)
context_runner = ContextAwareRunner(runner)
context_runner.run(n_trials=100)
```
## Dashboard API
Base URL: `/api/context`
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/playbook` | GET | Playbook summary |
| `/playbook/items` | GET | List items (with filters) |
| `/playbook/items/{id}` | GET | Get specific item |
| `/playbook/feedback` | POST | Record helpful/harmful |
| `/playbook/insights` | POST | Add new insight |
| `/playbook/prune` | POST | Prune harmful items |
| `/playbook/context` | GET | Get LLM context string |
| `/session` | GET | Session state |
| `/learning/report` | GET | Learning report |
## Best Practices
### 1. Record Immediately
Don't wait until session end:
```python
# RIGHT: Record immediately
playbook.add_insight(InsightCategory.MISTAKE, "Convergence failed with X")
playbook.save(path)
# WRONG: Wait until end
# (User might close session, learning lost)
```
### 2. Be Specific
```python
# GOOD: Specific and actionable
"For bracket optimization with >5 variables, TPE outperforms random search"
# BAD: Vague
"TPE is good"
```
### 3. Include Context
```python
playbook.add_insight(
InsightCategory.STRATEGY,
"Shell elements reduce solve time by 40% for thickness < 2mm",
tags=["mesh", "shell", "performance"]
)
```
### 4. Review Harmful Items
Periodically check items with negative scores:
```python
harmful = [i for i in playbook.items.values() if i.net_score < 0]
for item in harmful:
print(f"{item.id}: {item.content[:50]}... (score={item.net_score})")
```
## Troubleshooting
### Playbook Not Updating
1. Check playbook path:
```python
print(playbook_path) # Should be knowledge_base/playbook.json
```
2. Verify save is called:
```python
playbook.save(path) # Must be explicit
```
### Insights Not Appearing in Context
1. Check confidence threshold:
```python
# Default is 0.5 - new items start at 0.5
context = playbook.get_context_for_task("opt", min_confidence=0.3)
```
2. Check if items exist:
```python
print(f"Total items: {len(playbook.items)}")
```
### Learning Not Working
1. Verify FeedbackLoop is finalized:
```python
feedback.finalize_study(...) # MUST be called
```
2. Check context_items_used parameter:
```python
# Items must be explicitly tracked
feedback.process_trial_result(
...,
context_items_used=list(playbook.items.keys())[:10]
)
```
## Files Reference
| File | Purpose |
|------|---------|
| `optimization_engine/context/__init__.py` | Module exports |
| `optimization_engine/context/playbook.py` | Knowledge store |
| `optimization_engine/context/reflector.py` | Outcome analysis |
| `optimization_engine/context/session_state.py` | Context isolation |
| `optimization_engine/context/feedback_loop.py` | Learning loop |
| `optimization_engine/context/compaction.py` | Long session management |
| `optimization_engine/context/cache_monitor.py` | KV-cache optimization |
| `optimization_engine/context/runner_integration.py` | Runner integration |
| `knowledge_base/playbook.json` | Persistent storage |
## See Also
- `docs/CONTEXT_ENGINEERING_REPORT.md` - Full implementation report
- `.claude/skills/00_BOOTSTRAP_V2.md` - Enhanced bootstrap
- `tests/test_context_engineering.py` - Unit tests
- `tests/test_context_integration.py` - Integration tests


@@ -0,0 +1,123 @@
"""
Atomizer Context Engineering Module
Implements state-of-the-art context engineering for LLM-powered optimization.
Based on the ACE (Agentic Context Engineering) framework.
Components:
- Playbook: Structured knowledge store with helpful/harmful tracking
- Reflector: Analyzes optimization outcomes to extract insights
- SessionState: Context isolation with exposed/isolated separation
- CacheMonitor: KV-cache optimization for cost reduction
- FeedbackLoop: Automated learning from execution
- Compaction: Long-running session context management
Usage:
from optimization_engine.context import (
AtomizerPlaybook,
AtomizerReflector,
AtomizerSessionState,
FeedbackLoop,
CompactionManager
)
# Load or create playbook
playbook = AtomizerPlaybook.load(path)
# Create feedback loop for learning
feedback = FeedbackLoop(playbook_path)
# Process trial results
feedback.process_trial_result(...)
# Finalize and commit learning
feedback.finalize_study(stats)
"""
from .playbook import (
AtomizerPlaybook,
PlaybookItem,
InsightCategory,
get_playbook,
save_playbook,
)
from .reflector import (
AtomizerReflector,
OptimizationOutcome,
InsightCandidate,
ReflectorFactory,
)
from .session_state import (
AtomizerSessionState,
ExposedState,
IsolatedState,
TaskType,
get_session,
set_session,
clear_session,
)
from .cache_monitor import (
ContextCacheOptimizer,
CacheStats,
ContextSection,
StablePrefixBuilder,
get_cache_optimizer,
)
from .feedback_loop import (
FeedbackLoop,
FeedbackLoopFactory,
)
from .compaction import (
CompactionManager,
ContextEvent,
EventType,
ContextBudgetManager,
)
__all__ = [
# Playbook
"AtomizerPlaybook",
"PlaybookItem",
"InsightCategory",
"get_playbook",
"save_playbook",
# Reflector
"AtomizerReflector",
"OptimizationOutcome",
"InsightCandidate",
"ReflectorFactory",
# Session State
"AtomizerSessionState",
"ExposedState",
"IsolatedState",
"TaskType",
"get_session",
"set_session",
"clear_session",
# Cache Monitor
"ContextCacheOptimizer",
"CacheStats",
"ContextSection",
"StablePrefixBuilder",
"get_cache_optimizer",
# Feedback Loop
"FeedbackLoop",
"FeedbackLoopFactory",
# Compaction
"CompactionManager",
"ContextEvent",
"EventType",
"ContextBudgetManager",
]
__version__ = "1.0.0"


@@ -0,0 +1,390 @@
"""
Atomizer Cache Monitor - KV-Cache Optimization
Part of the ACE (Agentic Context Engineering) implementation for Atomizer.
Monitors and optimizes KV-cache hit rates for cost reduction.
Based on the principle that cached tokens cost roughly one-tenth as much as uncached ones.
The cache monitor tracks:
- Stable prefix length (should stay constant for cache hits)
- Cache hit rate across requests
- Estimated cost savings
Structure for KV-cache optimization:
1. STABLE PREFIX - Never changes (identity, tools, routing)
2. SEMI-STABLE - Changes per session type (protocols, playbook)
3. DYNAMIC - Changes every turn (state, user message)
"""
from dataclasses import dataclass, field
from typing import Optional, List, Dict, Any
from datetime import datetime
import hashlib
import json
from pathlib import Path
@dataclass
class CacheStats:
    """Statistics for cache efficiency tracking."""
    total_requests: int = 0
    cache_hits: int = 0
    cache_misses: int = 0
    prefix_length_chars: int = 0
    prefix_length_tokens: int = 0  # Estimated

    @property
    def hit_rate(self) -> float:
        """Calculate cache hit rate (0.0-1.0)."""
        if self.total_requests == 0:
            return 0.0
        return self.cache_hits / self.total_requests

    @property
    def estimated_savings_percent(self) -> float:
        """
        Estimate cost savings from cache hits.

        Based on ~10x cost difference between cached/uncached tokens.
        """
        if self.total_requests == 0:
            return 0.0
        # Cached tokens cost ~10% of uncached,
        # so savings = hit_rate * 90%
        return self.hit_rate * 90.0

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary."""
        return {
            "total_requests": self.total_requests,
            "cache_hits": self.cache_hits,
            "cache_misses": self.cache_misses,
            "hit_rate": self.hit_rate,
            "prefix_length_chars": self.prefix_length_chars,
            "prefix_length_tokens": self.prefix_length_tokens,
            "estimated_savings_percent": self.estimated_savings_percent
        }
@dataclass
class ContextSection:
    """A section of context with stability classification."""
    name: str
    content: str
    stability: str  # "stable", "semi_stable", "dynamic"
    last_hash: str = ""

    def compute_hash(self) -> str:
        """Compute content hash for change detection."""
        return hashlib.md5(self.content.encode()).hexdigest()

    def has_changed(self) -> bool:
        """Check if content has changed since last hash."""
        current_hash = self.compute_hash()
        changed = current_hash != self.last_hash
        self.last_hash = current_hash
        return changed
class ContextCacheOptimizer:
    """
    Tracks and optimizes context for cache efficiency.

    Implements the three-tier context structure:
    1. Stable prefix (cached across all requests)
    2. Semi-stable section (cached per session type)
    3. Dynamic section (changes every turn)

    Usage:
        optimizer = ContextCacheOptimizer()

        # Build context with cache optimization
        context = optimizer.prepare_context(
            stable_prefix=identity_and_tools,
            semi_stable=protocols_and_playbook,
            dynamic=state_and_message
        )

        # Check efficiency
        print(optimizer.get_report())
    """

    # Approximate characters per token for estimation
    CHARS_PER_TOKEN = 4

    def __init__(self):
        self.stats = CacheStats()
        self._sections: Dict[str, ContextSection] = {}
        self._last_stable_hash: Optional[str] = None
        self._last_semi_stable_hash: Optional[str] = None
        self._request_history: List[Dict[str, Any]] = []
    def prepare_context(
        self,
        stable_prefix: str,
        semi_stable: str,
        dynamic: str
    ) -> str:
        """
        Assemble context optimized for caching.

        Tracks whether the stable prefix changed (cache miss).

        Args:
            stable_prefix: Content that never changes (tools, identity)
            semi_stable: Content that changes per session type
            dynamic: Content that changes every turn

        Returns:
            Assembled context string with clear section boundaries
        """
        # Hash the stable prefix
        stable_hash = hashlib.md5(stable_prefix.encode()).hexdigest()
        self.stats.total_requests += 1

        # Check for cache hit BEFORE updating the stored hash, so the
        # value recorded in the history below is accurate
        cache_hit = stable_hash == self._last_stable_hash
        if cache_hit:
            self.stats.cache_hits += 1
        else:
            self.stats.cache_misses += 1
            self._last_stable_hash = stable_hash
            self.stats.prefix_length_chars = len(stable_prefix)
            self.stats.prefix_length_tokens = len(stable_prefix) // self.CHARS_PER_TOKEN

        # Record request for history
        self._request_history.append({
            "timestamp": datetime.now().isoformat(),
            "cache_hit": cache_hit,
            "stable_length": len(stable_prefix),
            "semi_stable_length": len(semi_stable),
            "dynamic_length": len(dynamic)
        })

        # Keep history bounded
        if len(self._request_history) > 100:
            self._request_history = self._request_history[-100:]

        # Assemble with clear boundaries,
        # using markdown horizontal rules as section separators
        return f"""{stable_prefix}
---
{semi_stable}
---
{dynamic}"""
def register_section(
self,
name: str,
content: str,
stability: str = "dynamic"
) -> None:
"""
Register a context section for change tracking.
Args:
name: Section identifier
content: Section content
stability: One of "stable", "semi_stable", "dynamic"
"""
section = ContextSection(
name=name,
content=content,
stability=stability
)
section.last_hash = section.compute_hash()
self._sections[name] = section
def check_section_changes(self) -> Dict[str, bool]:
"""
Check which sections have changed.
Returns:
Dictionary mapping section names to change status
"""
changes = {}
for name, section in self._sections.items():
changes[name] = section.has_changed()
return changes
def get_stable_sections(self) -> List[str]:
"""Get names of sections marked as stable."""
return [
name for name, section in self._sections.items()
if section.stability == "stable"
]
def get_report(self) -> str:
"""Generate human-readable cache efficiency report."""
return f"""
Cache Efficiency Report
=======================
Requests: {self.stats.total_requests}
Cache Hits: {self.stats.cache_hits}
Cache Misses: {self.stats.cache_misses}
Hit Rate: {self.stats.hit_rate:.1%}
Stable Prefix:
- Characters: {self.stats.prefix_length_chars:,}
- Estimated Tokens: {self.stats.prefix_length_tokens:,}
Cost Impact:
- Estimated Savings: {self.stats.estimated_savings_percent:.0f}%
- (Based on 10x cost difference for cached tokens)
Recommendations:
{self._get_recommendations()}
"""
def _get_recommendations(self) -> str:
"""Generate optimization recommendations."""
recommendations = []
if self.stats.hit_rate < 0.5 and self.stats.total_requests > 5:
recommendations.append(
"- Low cache hit rate: Check if stable prefix is actually stable"
)
if self.stats.prefix_length_tokens > 5000:
recommendations.append(
"- Large stable prefix: Consider moving less-stable content to semi-stable"
)
if self.stats.prefix_length_tokens < 1000:
recommendations.append(
"- Small stable prefix: Consider moving more content to stable section"
)
if not recommendations:
recommendations.append("- Cache performance looks good!")
return "\n".join(recommendations)
def get_stats_dict(self) -> Dict[str, Any]:
"""Get statistics as dictionary."""
return self.stats.to_dict()
def reset_stats(self) -> None:
"""Reset all statistics."""
self.stats = CacheStats()
self._request_history = []
def save_stats(self, path: Path) -> None:
"""Save statistics to JSON file."""
data = {
"stats": self.stats.to_dict(),
"request_history": self._request_history[-50:], # Last 50
"sections": {
name: {
"stability": s.stability,
"content_length": len(s.content)
}
for name, s in self._sections.items()
}
}
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2)
@classmethod
def load_stats(cls, path: Path) -> "ContextCacheOptimizer":
"""Load statistics from JSON file."""
optimizer = cls()
if not path.exists():
return optimizer
with open(path, encoding='utf-8') as f:
data = json.load(f)
stats = data.get("stats", {})
optimizer.stats.total_requests = stats.get("total_requests", 0)
optimizer.stats.cache_hits = stats.get("cache_hits", 0)
optimizer.stats.cache_misses = stats.get("cache_misses", 0)
optimizer.stats.prefix_length_chars = stats.get("prefix_length_chars", 0)
optimizer.stats.prefix_length_tokens = stats.get("prefix_length_tokens", 0)
optimizer._request_history = data.get("request_history", [])
return optimizer
class StablePrefixBuilder:
"""
Helper for building stable prefix content.
Ensures consistent ordering and formatting of stable content
to maximize cache hits.
"""
def __init__(self):
self._sections: List[tuple] = [] # (order, name, content)
def add_section(self, name: str, content: str, order: int = 50) -> "StablePrefixBuilder":
"""
Add a section to the stable prefix.
Args:
name: Section name (for documentation)
content: Section content
order: Sort order (lower = earlier)
Returns:
Self for chaining
"""
self._sections.append((order, name, content))
return self
def add_identity(self, identity: str) -> "StablePrefixBuilder":
"""Add identity section (order 10)."""
return self.add_section("identity", identity, order=10)
def add_capabilities(self, capabilities: str) -> "StablePrefixBuilder":
"""Add capabilities section (order 20)."""
return self.add_section("capabilities", capabilities, order=20)
def add_tools(self, tools: str) -> "StablePrefixBuilder":
"""Add tools section (order 30)."""
return self.add_section("tools", tools, order=30)
def add_routing(self, routing: str) -> "StablePrefixBuilder":
"""Add routing section (order 40)."""
return self.add_section("routing", routing, order=40)
def build(self) -> str:
"""
Build the stable prefix string.
Sections are sorted by order to ensure consistency.
Returns:
Assembled stable prefix
"""
# Sort by order
sorted_sections = sorted(self._sections, key=lambda x: x[0])
lines = []
for _, name, content in sorted_sections:
lines.append(f"<!-- {name} -->")
lines.append(content.strip())
lines.append("")
return "\n".join(lines)
# Global cache optimizer instance
_global_optimizer: Optional[ContextCacheOptimizer] = None
def get_cache_optimizer() -> ContextCacheOptimizer:
"""Get the global cache optimizer instance."""
global _global_optimizer
if _global_optimizer is None:
_global_optimizer = ContextCacheOptimizer()
return _global_optimizer
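For reference, the hit/miss bookkeeping in `prepare_context` can be sketched standalone. This is a minimal illustration, not the module's API: the `PrefixTracker` name is hypothetical. A request counts as a cache hit only when the stable prefix hashes identically to the previous request's prefix, which is why byte-stable ordering of the prefix matters.

```python
import hashlib

# Minimal sketch of stable-prefix hit/miss tracking (hypothetical class).
class PrefixTracker:
    def __init__(self):
        self.hits = 0
        self.misses = 0
        self._last_hash = None

    def observe(self, stable_prefix: str) -> bool:
        # Compare against the previous hash BEFORE updating it.
        h = hashlib.md5(stable_prefix.encode()).hexdigest()
        hit = h == self._last_hash
        if hit:
            self.hits += 1
        else:
            self.misses += 1
            self._last_hash = h
        return hit

tracker = PrefixTracker()
results = [tracker.observe(p) for p in ["TOOLS", "TOOLS", "TOOLS v2", "TOOLS v2"]]
print(results)                        # [False, True, False, True]
print(tracker.hits, tracker.misses)   # 2 2
```

Any edit to the stable prefix, even a single character, breaks the run of hits, which is the behavior the optimizer's hit-rate recommendation is designed to surface.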


@@ -0,0 +1,520 @@
"""
Atomizer Context Compaction - Long-Running Session Management
Part of the ACE (Agentic Context Engineering) implementation for Atomizer.
Based on Google ADK's compaction architecture:
- Trigger compaction when threshold reached
- Summarize older events
- Preserve recent detail
- Never compact error events
This module handles context management for long-running optimizations
that may exceed context window limits.
"""
from typing import List, Dict, Any, Optional
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
class EventType(Enum):
"""Types of events in optimization context."""
TRIAL_START = "trial_start"
TRIAL_COMPLETE = "trial_complete"
TRIAL_FAILED = "trial_failed"
ERROR = "error"
WARNING = "warning"
MILESTONE = "milestone"
COMPACTION = "compaction"
STUDY_START = "study_start"
STUDY_END = "study_end"
CONFIG_CHANGE = "config_change"
@dataclass
class ContextEvent:
"""
Single event in optimization context.
Events are the atomic units of context history.
They can be compacted (summarized) or preserved based on importance.
"""
timestamp: datetime
event_type: EventType
summary: str
details: Dict[str, Any] = field(default_factory=dict)
compacted: bool = False
preserve: bool = False # If True, never compact this event
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary."""
return {
"timestamp": self.timestamp.isoformat(),
"event_type": self.event_type.value,
"summary": self.summary,
"details": self.details,
"compacted": self.compacted,
"preserve": self.preserve
}
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "ContextEvent":
"""Create from dictionary."""
return cls(
timestamp=datetime.fromisoformat(data["timestamp"]),
event_type=EventType(data["event_type"]),
summary=data["summary"],
details=data.get("details", {}),
compacted=data.get("compacted", False),
preserve=data.get("preserve", False)
)
class CompactionManager:
"""
Manages context compaction for long optimization sessions.
Strategy:
- Keep last N events in full detail
- Summarize older events into milestone markers
- Preserve error events (never compact errors)
- Track statistics for optimization insights
Usage:
manager = CompactionManager(compaction_threshold=50, keep_recent=20)
# Add events as they occur
manager.add_event(ContextEvent(
timestamp=datetime.now(),
event_type=EventType.TRIAL_COMPLETE,
summary="Trial 42 complete: obj=100.5",
details={"trial_number": 42, "objective": 100.5}
))
# Get context string for LLM
context = manager.get_context_string()
# Check if compaction occurred
print(f"Compactions: {manager.compaction_count}")
"""
def __init__(
self,
compaction_threshold: int = 50,
keep_recent: int = 20,
keep_errors: bool = True
):
"""
Initialize compaction manager.
Args:
compaction_threshold: Trigger compaction when events exceed this
keep_recent: Number of recent events to always keep in detail
keep_errors: Whether to preserve all error events
"""
self.events: List[ContextEvent] = []
self.compaction_threshold = compaction_threshold
self.keep_recent = keep_recent
self.keep_errors = keep_errors
self.compaction_count = 0
# Statistics for compacted regions
self._compaction_stats: List[Dict[str, Any]] = []
def add_event(self, event: ContextEvent) -> bool:
"""
Add event and trigger compaction if needed.
Args:
event: The event to add
Returns:
True if compaction was triggered
"""
# Mark errors as preserved
if event.event_type == EventType.ERROR and self.keep_errors:
event.preserve = True
self.events.append(event)
# Check if compaction needed
if len(self.events) > self.compaction_threshold:
self._compact()
return True
return False
def add_trial_event(
self,
trial_number: int,
success: bool,
objective: Optional[float] = None,
duration: Optional[float] = None
) -> None:
"""
Convenience method to add a trial completion event.
Args:
trial_number: Trial number
success: Whether trial succeeded
objective: Objective value (if successful)
duration: Trial duration in seconds
"""
event_type = EventType.TRIAL_COMPLETE if success else EventType.TRIAL_FAILED
summary_parts = [f"Trial {trial_number}"]
if success and objective is not None:
summary_parts.append(f"obj={objective:.4g}")
elif not success:
summary_parts.append("FAILED")
if duration is not None:
summary_parts.append(f"{duration:.1f}s")
self.add_event(ContextEvent(
timestamp=datetime.now(),
event_type=event_type,
summary=" | ".join(summary_parts),
details={
"trial_number": trial_number,
"success": success,
"objective": objective,
"duration": duration
}
))
def add_error_event(self, error_message: str, error_type: str = "") -> None:
"""
Add an error event (always preserved).
Args:
error_message: Error description
error_type: Optional error classification
"""
summary = f"[{error_type}] {error_message}" if error_type else error_message
self.add_event(ContextEvent(
timestamp=datetime.now(),
event_type=EventType.ERROR,
summary=summary,
details={"error_type": error_type, "message": error_message},
preserve=True
))
def add_milestone(self, description: str, details: Optional[Dict[str, Any]] = None) -> None:
"""
Add a milestone event (preserved).
Args:
description: Milestone description
details: Optional additional details
"""
self.add_event(ContextEvent(
timestamp=datetime.now(),
event_type=EventType.MILESTONE,
summary=description,
details=details or {},
preserve=True
))
def _compact(self) -> None:
"""
Compact older events into summaries.
Preserves:
- All error events (if keep_errors=True)
- Events marked with preserve=True
- Last `keep_recent` events
- Milestone summaries of compacted regions
"""
if len(self.events) <= self.keep_recent:
return
# Split into old and recent
old_events = self.events[:-self.keep_recent]
recent_events = self.events[-self.keep_recent:]
# Separate preserved from compactable
preserved_events = [e for e in old_events if e.preserve]
compactable_events = [e for e in old_events if not e.preserve]
# Summarize compactable events
if compactable_events:
summary = self._create_summary(compactable_events)
self.compaction_count += 1
compaction_event = ContextEvent(
timestamp=compactable_events[0].timestamp,
event_type=EventType.COMPACTION,
summary=summary,
details={
"events_compacted": len(compactable_events),
"compaction_number": self.compaction_count,
"time_range": {
"start": compactable_events[0].timestamp.isoformat(),
"end": compactable_events[-1].timestamp.isoformat()
}
},
compacted=True
)
# Store compaction statistics (same number as recorded in the event details)
self._compaction_stats.append({
"compaction_number": self.compaction_count,
"events_compacted": len(compactable_events),
"summary": summary
})
# Rebuild events list
self.events = [compaction_event] + preserved_events + recent_events
else:
self.events = preserved_events + recent_events
def _create_summary(self, events: List[ContextEvent]) -> str:
"""
Create summary of compacted events.
Args:
events: List of events to summarize
Returns:
Summary string
"""
# Collect trial statistics
trial_events = [
e for e in events
if e.event_type in (EventType.TRIAL_COMPLETE, EventType.TRIAL_FAILED)
]
if not trial_events:
return f"[{len(events)} events compacted]"
# Extract trial statistics
trial_numbers = []
objectives = []
failures = 0
for e in trial_events:
if "trial_number" in e.details:
trial_numbers.append(e.details["trial_number"])
if "objective" in e.details and e.details["objective"] is not None:
objectives.append(e.details["objective"])
if e.event_type == EventType.TRIAL_FAILED:
failures += 1
if trial_numbers and objectives:
return (
f"Trials {min(trial_numbers)}-{max(trial_numbers)}: "
f"Best={min(objectives):.4g}, "
f"Avg={sum(objectives)/len(objectives):.4g}, "
f"Failures={failures}"
)
elif trial_numbers:
return f"Trials {min(trial_numbers)}-{max(trial_numbers)} ({failures} failures)"
else:
return f"[{len(events)} events compacted]"
def get_context_string(self, include_timestamps: bool = False) -> str:
"""
Generate context string from events.
Args:
include_timestamps: Whether to include timestamps
Returns:
Formatted context string for LLM
"""
lines = ["## Optimization History", ""]
for event in self.events:
timestamp = ""
if include_timestamps:
timestamp = f"[{event.timestamp.strftime('%H:%M:%S')}] "
if event.compacted:
lines.append(f"📦 {timestamp}{event.summary}")
elif event.event_type == EventType.ERROR:
lines.append(f"❌ {timestamp}{event.summary}")
elif event.event_type == EventType.WARNING:
lines.append(f"⚠️ {timestamp}{event.summary}")
elif event.event_type == EventType.MILESTONE:
lines.append(f"🎯 {timestamp}{event.summary}")
elif event.event_type == EventType.TRIAL_FAILED:
lines.append(f"✗ {timestamp}{event.summary}")
elif event.event_type == EventType.TRIAL_COMPLETE:
lines.append(f"✓ {timestamp}{event.summary}")
else:
lines.append(f"- {timestamp}{event.summary}")
return "\n".join(lines)
def get_stats(self) -> Dict[str, Any]:
"""Get compaction statistics."""
event_counts = {}
for event in self.events:
etype = event.event_type.value
event_counts[etype] = event_counts.get(etype, 0) + 1
return {
"total_events": len(self.events),
"compaction_count": self.compaction_count,
"events_by_type": event_counts,
"error_events": event_counts.get("error", 0),
"compacted_events": len([e for e in self.events if e.compacted]),
"preserved_events": len([e for e in self.events if e.preserve]),
"compaction_history": self._compaction_stats[-5:] # Last 5
}
def get_recent_events(self, n: int = 10) -> List[ContextEvent]:
"""Get the n most recent events."""
return self.events[-n:]
def get_errors(self) -> List[ContextEvent]:
"""Get all error events."""
return [e for e in self.events if e.event_type == EventType.ERROR]
def clear(self) -> None:
"""Clear all events and reset state."""
self.events = []
self.compaction_count = 0
self._compaction_stats = []
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary for serialization."""
return {
"events": [e.to_dict() for e in self.events],
"compaction_threshold": self.compaction_threshold,
"keep_recent": self.keep_recent,
"keep_errors": self.keep_errors,
"compaction_count": self.compaction_count,
"compaction_stats": self._compaction_stats
}
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "CompactionManager":
"""Create from dictionary."""
manager = cls(
compaction_threshold=data.get("compaction_threshold", 50),
keep_recent=data.get("keep_recent", 20),
keep_errors=data.get("keep_errors", True)
)
manager.events = [ContextEvent.from_dict(e) for e in data.get("events", [])]
manager.compaction_count = data.get("compaction_count", 0)
manager._compaction_stats = data.get("compaction_stats", [])
return manager
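The compaction policy above (keep the last `keep_recent` events verbatim, keep preserved events such as errors, fold everything else into a single summary marker) can be sketched in a few lines. This is a simplified standalone illustration, assuming events are `(kind, text, preserve)` tuples rather than full `ContextEvent` objects:

```python
# Minimal sketch of the compaction policy (simplified event tuples).
def compact(events, keep_recent=3):
    # events: list of (kind, text, preserve) tuples
    if len(events) <= keep_recent:
        return events
    old, recent = events[:-keep_recent], events[-keep_recent:]
    preserved = [e for e in old if e[2]]       # never fold errors/milestones
    folded = [e for e in old if not e[2]]
    summary = [("compaction", f"[{len(folded)} events compacted]", True)] if folded else []
    return summary + preserved + recent

events = [
    ("trial", "t1 ok", False),
    ("error", "solver crash", True),
    ("trial", "t2 ok", False),
    ("trial", "t3 ok", False),
    ("trial", "t4 ok", False),
    ("trial", "t5 ok", False),
]
out = compact(events, keep_recent=3)
print([e[1] for e in out])
# ['[2 events compacted]', 'solver crash', 't3 ok', 't4 ok', 't5 ok']
```

Note that the error survives compaction even though it is older than the recent window, which mirrors the `keep_errors` guarantee in `CompactionManager`.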
class ContextBudgetManager:
"""
Manages overall context budget across sessions.
Tracks:
- Token estimates for each context section
- Recommendations for context reduction
- Budget allocation warnings
"""
# Approximate characters per token
CHARS_PER_TOKEN = 4
# Default budget allocation (tokens)
DEFAULT_BUDGET = {
"stable_prefix": 5000,
"protocols": 10000,
"playbook": 5000,
"session_state": 2000,
"conversation": 30000,
"working_space": 48000,
"total": 100000
}
def __init__(self, budget: Optional[Dict[str, int]] = None):
"""
Initialize budget manager.
Args:
budget: Custom budget allocation (uses defaults if not provided)
"""
self.budget = budget or self.DEFAULT_BUDGET.copy()
self._current_usage: Dict[str, int] = {k: 0 for k in self.budget.keys()}
def estimate_tokens(self, text: str) -> int:
"""Estimate token count for text."""
return len(text) // self.CHARS_PER_TOKEN
def update_usage(self, section: str, text: str) -> Dict[str, Any]:
"""
Update usage for a section.
Args:
section: Budget section name
text: Content of the section
Returns:
Usage status with warnings if over budget
"""
tokens = self.estimate_tokens(text)
self._current_usage[section] = tokens
result = {
"section": section,
"tokens": tokens,
"budget": self.budget.get(section, 0),
"over_budget": tokens > self.budget.get(section, float('inf'))
}
if result["over_budget"]:
result["warning"] = f"{section} exceeds budget by {tokens - self.budget[section]} tokens"
return result
def get_total_usage(self) -> int:
"""Get total token usage across all sections."""
return sum(self._current_usage.values())
def get_status(self) -> Dict[str, Any]:
"""Get overall budget status."""
total_used = self.get_total_usage()
total_budget = self.budget.get("total", 100000)
return {
"total_used": total_used,
"total_budget": total_budget,
"utilization": total_used / total_budget,
"by_section": {
section: {
"used": self._current_usage.get(section, 0),
"budget": self.budget.get(section, 0),
"utilization": (
self._current_usage.get(section, 0) / self.budget.get(section, 1)
if self.budget.get(section, 0) > 0 else 0
)
}
for section in self.budget.keys()
if section != "total"
},
"recommendations": self._get_recommendations()
}
def _get_recommendations(self) -> List[str]:
"""Generate budget recommendations."""
recommendations = []
total_used = self.get_total_usage()
total_budget = self.budget.get("total", 100000)
if total_used > total_budget * 0.9:
recommendations.append("Context usage > 90%. Consider triggering compaction.")
for section, used in self._current_usage.items():
budget = self.budget.get(section, 0)
if budget > 0 and used > budget:
recommendations.append(
f"{section}: {used - budget} tokens over budget. Reduce content."
)
if not recommendations:
recommendations.append("Budget healthy.")
return recommendations
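The per-section budget check reduces to a simple rule: estimate tokens at roughly 4 characters per token and flag any section whose estimate exceeds its allocation. A minimal sketch, with a made-up two-section budget:

```python
# Minimal sketch of the token-budget check (hypothetical budget values).
CHARS_PER_TOKEN = 4
budget = {"stable_prefix": 50, "dynamic": 10}

def check(section, text):
    tokens = len(text) // CHARS_PER_TOKEN
    allowed = budget.get(section, 0)
    return {"tokens": tokens, "over_budget": tokens > allowed}

print(check("dynamic", "x" * 80))        # {'tokens': 20, 'over_budget': True}
print(check("stable_prefix", "y" * 80))  # {'tokens': 20, 'over_budget': False}
```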


@@ -0,0 +1,378 @@
"""
Atomizer Feedback Loop - Automated Learning from Execution
Part of the ACE (Agentic Context Engineering) implementation for Atomizer.
Connects optimization outcomes to playbook updates using the principle:
"Leverage natural execution feedback as the learning signal"
The feedback loop:
1. Observes trial outcomes (success/failure)
2. Tracks which playbook items were active during each trial
3. Updates helpful/harmful counts based on outcomes
4. Commits new insights from the reflector
This implements true self-improvement: the system gets better
at optimization over time by learning from its own execution.
"""
from typing import Dict, Any, List, Optional
from pathlib import Path
from datetime import datetime
import json
from .playbook import AtomizerPlaybook, InsightCategory
from .reflector import AtomizerReflector, OptimizationOutcome
class FeedbackLoop:
"""
Automated feedback loop that learns from optimization runs.
Key insight from ACE: Use execution feedback (success/failure)
as the learning signal, not labeled data.
Usage:
feedback = FeedbackLoop(playbook_path)
# After each trial
feedback.process_trial_result(
trial_number=42,
success=True,
objective_value=100.5,
design_variables={"thickness": 1.5},
context_items_used=["str-00001", "mis-00003"]
)
# After study completion
result = feedback.finalize_study(study_stats)
print(f"Added {result['insights_added']} insights")
"""
def __init__(self, playbook_path: Path):
"""
Initialize feedback loop with playbook path.
Args:
playbook_path: Path to the playbook JSON file
"""
self.playbook_path = playbook_path
self.playbook = AtomizerPlaybook.load(playbook_path)
self.reflector = AtomizerReflector(self.playbook)
# Track items used per trial for attribution
self._trial_item_usage: Dict[int, List[str]] = {}
# Track outcomes for batch analysis
self._outcomes: List[OptimizationOutcome] = []
# Statistics
self._total_trials_processed = 0
self._successful_trials = 0
self._failed_trials = 0
def process_trial_result(
self,
trial_number: int,
success: bool,
objective_value: float,
design_variables: Dict[str, float],
context_items_used: Optional[List[str]] = None,
errors: Optional[List[str]] = None,
extractor_used: str = "",
duration_seconds: float = 0.0
) -> Dict[str, Any]:
"""
Process a trial result and update playbook accordingly.
This is the core learning mechanism:
- If trial succeeded with certain playbook items -> increase helpful count
- If trial failed with certain playbook items -> increase harmful count
Args:
trial_number: Trial number
success: Whether the trial succeeded
objective_value: Objective function value (0 if failed)
design_variables: Design variable values used
context_items_used: List of playbook item IDs in context
errors: List of error messages (if any)
extractor_used: Name of extractor used
duration_seconds: Trial duration
Returns:
Dictionary with processing results
"""
context_items_used = context_items_used or []
errors = errors or []
# Update statistics
self._total_trials_processed += 1
if success:
self._successful_trials += 1
else:
self._failed_trials += 1
# Track item usage for this trial
self._trial_item_usage[trial_number] = context_items_used
# Update playbook item scores based on outcome
items_updated = 0
for item_id in context_items_used:
if self.playbook.record_outcome(item_id, helpful=success):
items_updated += 1
# Create outcome for reflection
outcome = OptimizationOutcome(
trial_number=trial_number,
success=success,
objective_value=objective_value if success else None,
constraint_violations=[],
solver_errors=errors,
design_variables=design_variables,
extractor_used=extractor_used,
duration_seconds=duration_seconds
)
# Store outcome
self._outcomes.append(outcome)
# Reflect on outcome
insights = self.reflector.analyze_trial(outcome)
return {
"trial_number": trial_number,
"success": success,
"items_updated": items_updated,
"insights_extracted": len(insights)
}
def record_error(
self,
trial_number: int,
error_type: str,
error_message: str,
context_items_used: Optional[List[str]] = None
) -> None:
"""
Record an error for a trial.
Separate from process_trial_result for cases where
we want to record errors without full trial data.
Args:
trial_number: Trial number
error_type: Classification of error
error_message: Error details
context_items_used: Playbook items that were active
"""
context_items_used = context_items_used or []
# Mark items as harmful
for item_id in context_items_used:
self.playbook.record_outcome(item_id, helpful=False)
# Create insight about the error
self.reflector.pending_insights.append({
"category": InsightCategory.MISTAKE,
"content": f"{error_type}: {error_message[:200]}",
"helpful": False,
"trial": trial_number
})
def finalize_study(
self,
study_stats: Dict[str, Any],
save_playbook: bool = True
) -> Dict[str, Any]:
"""
Called when study completes. Commits insights and prunes playbook.
Args:
study_stats: Dictionary with study statistics:
- name: Study name
- total_trials: Total trials run
- best_value: Best objective achieved
- convergence_rate: Success rate (0.0-1.0)
- method: Optimization method used
save_playbook: Whether to save playbook to disk
Returns:
Dictionary with finalization results
"""
# Analyze study-level patterns
study_insights = self.reflector.analyze_study_completion(
study_name=study_stats.get("name", "unknown"),
total_trials=study_stats.get("total_trials", 0),
best_value=study_stats.get("best_value", 0),
convergence_rate=study_stats.get("convergence_rate", 0),
method=study_stats.get("method", "")
)
# Commit all pending insights
insights_added = self.reflector.commit_insights()
# Prune consistently harmful items
items_pruned = self.playbook.prune_harmful(threshold=-3)
# Save updated playbook
if save_playbook:
self.playbook.save(self.playbook_path)
return {
"insights_added": insights_added,
"items_pruned": items_pruned,
"playbook_size": len(self.playbook.items),
"playbook_version": self.playbook.version,
"total_trials_processed": self._total_trials_processed,
"successful_trials": self._successful_trials,
"failed_trials": self._failed_trials,
"success_rate": (
self._successful_trials / self._total_trials_processed
if self._total_trials_processed > 0 else 0
)
}
def get_item_performance(self) -> Dict[str, Dict[str, Any]]:
"""
Get performance metrics for all playbook items.
Returns:
Dictionary mapping item IDs to performance stats
"""
performance = {}
for item_id, item in self.playbook.items.items():
trials_used_in = [
trial for trial, items in self._trial_item_usage.items()
if item_id in items
]
performance[item_id] = {
"helpful_count": item.helpful_count,
"harmful_count": item.harmful_count,
"net_score": item.net_score,
"confidence": item.confidence,
"trials_used_in": len(trials_used_in),
"category": item.category.value,
"content_preview": item.content[:100]
}
return performance
def get_top_performers(self, n: int = 10) -> List[Dict[str, Any]]:
"""
Get the top performing playbook items.
Args:
n: Number of top items to return
Returns:
List of item performance dictionaries
"""
performance = self.get_item_performance()
sorted_items = sorted(
performance.items(),
key=lambda x: x[1]["net_score"],
reverse=True
)
return [
{"id": item_id, **stats}
for item_id, stats in sorted_items[:n]
]
def get_worst_performers(self, n: int = 10) -> List[Dict[str, Any]]:
"""
Get the worst performing playbook items.
Args:
n: Number of worst items to return
Returns:
List of item performance dictionaries
"""
performance = self.get_item_performance()
sorted_items = sorted(
performance.items(),
key=lambda x: x[1]["net_score"]
)
return [
{"id": item_id, **stats}
for item_id, stats in sorted_items[:n]
]
def get_statistics(self) -> Dict[str, Any]:
"""Get feedback loop statistics."""
return {
"total_trials_processed": self._total_trials_processed,
"successful_trials": self._successful_trials,
"failed_trials": self._failed_trials,
"success_rate": (
self._successful_trials / self._total_trials_processed
if self._total_trials_processed > 0 else 0
),
"playbook_items": len(self.playbook.items),
"pending_insights": self.reflector.get_pending_count(),
"outcomes_recorded": len(self._outcomes)
}
def export_learning_report(self, path: Path) -> None:
"""
Export a detailed learning report.
Args:
path: Path to save the report
"""
report = {
"generated_at": datetime.now().isoformat(),
"statistics": self.get_statistics(),
"top_performers": self.get_top_performers(20),
"worst_performers": self.get_worst_performers(10),
"playbook_stats": self.playbook.get_stats(),
"outcomes_summary": {
"total": len(self._outcomes),
"by_success": {
"success": len([o for o in self._outcomes if o.success]),
"failure": len([o for o in self._outcomes if not o.success])
}
}
}
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, 'w', encoding='utf-8') as f:
json.dump(report, f, indent=2)
def reset(self) -> None:
"""Reset the feedback loop state (keeps playbook)."""
self._trial_item_usage = {}
self._outcomes = []
self._total_trials_processed = 0
self._successful_trials = 0
self._failed_trials = 0
self.reflector = AtomizerReflector(self.playbook)
class FeedbackLoopFactory:
"""Factory for creating feedback loops."""
@staticmethod
def create_for_study(study_dir: Path) -> FeedbackLoop:
"""
Create a feedback loop for a specific study.
Args:
study_dir: Path to study directory
Returns:
Configured FeedbackLoop
"""
playbook_path = study_dir / "3_results" / "playbook.json"
return FeedbackLoop(playbook_path)
@staticmethod
def create_global() -> FeedbackLoop:
"""
Create a feedback loop using the global playbook.
Returns:
FeedbackLoop using global playbook path
"""
playbook_path = Path(__file__).parents[2] / "knowledge_base" / "playbook.json"
return FeedbackLoop(playbook_path)
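The credit-assignment step at the heart of `process_trial_result` can be sketched standalone: every playbook item that was in context for a trial gets its helpful count bumped on success, or its harmful count on failure. The item IDs and plain-dict score store below are illustrative only:

```python
# Minimal sketch of helpful/harmful attribution (hypothetical item IDs).
scores = {"str-00001": {"helpful": 0, "harmful": 0},
          "mis-00003": {"helpful": 0, "harmful": 0}}

def record_trial(success, items_in_context):
    key = "helpful" if success else "harmful"
    for item_id in items_in_context:
        if item_id in scores:
            scores[item_id][key] += 1

record_trial(True, ["str-00001", "mis-00003"])
record_trial(True, ["str-00001"])
record_trial(False, ["mis-00003"])
print(scores)
# {'str-00001': {'helpful': 2, 'harmful': 0}, 'mis-00003': {'helpful': 1, 'harmful': 1}}
```

Because the signal is the trial outcome itself, no labeled data is needed; over many trials, consistently harmful items drift negative and are pruned in `finalize_study`.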


@@ -0,0 +1,432 @@
"""
Atomizer Playbook - Structured Knowledge Store
Part of the ACE (Agentic Context Engineering) implementation for Atomizer.
Based on ACE framework principles:
- Incremental delta updates (never rewrite wholesale)
- Helpful/harmful tracking for each insight
- Semantic deduplication
- Category-based organization
This module provides the core data structures for accumulating optimization
knowledge across sessions.
"""
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Any
from enum import Enum
import json
from pathlib import Path
from datetime import datetime
import hashlib
class InsightCategory(Enum):
"""Categories for playbook insights."""
STRATEGY = "str" # Optimization strategies
CALCULATION = "cal" # Formulas and calculations
MISTAKE = "mis" # Common mistakes to avoid
TOOL = "tool" # Tool usage patterns
DOMAIN = "dom" # Domain-specific knowledge (FEA, NX)
WORKFLOW = "wf" # Workflow patterns
@dataclass
class PlaybookItem:
"""
Single insight in the playbook with helpful/harmful tracking.
Each item accumulates feedback over time:
- helpful_count: Times this insight led to success
- harmful_count: Times this insight led to failure
- net_score: helpful - harmful (used for ranking)
- confidence: helpful / (helpful + harmful)
"""
id: str
category: InsightCategory
content: str
helpful_count: int = 0
harmful_count: int = 0
created_at: str = field(default_factory=lambda: datetime.now().isoformat())
last_used: Optional[str] = None
source_trials: List[int] = field(default_factory=list)
tags: List[str] = field(default_factory=list)
@property
def net_score(self) -> int:
"""Net helpfulness score (helpful - harmful)."""
return self.helpful_count - self.harmful_count
@property
def confidence(self) -> float:
"""Confidence score (0.0-1.0) based on outcome ratio."""
total = self.helpful_count + self.harmful_count
if total == 0:
return 0.5 # Neutral confidence for untested items
return self.helpful_count / total
def to_context_string(self) -> str:
"""Format for injection into LLM context."""
return f"[{self.id}] helpful={self.helpful_count} harmful={self.harmful_count} :: {self.content}"
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary for serialization."""
return {
"id": self.id,
"category": self.category.value,
"content": self.content,
"helpful_count": self.helpful_count,
"harmful_count": self.harmful_count,
"created_at": self.created_at,
"last_used": self.last_used,
"source_trials": self.source_trials,
"tags": self.tags
}
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "PlaybookItem":
"""Create from dictionary."""
return cls(
id=data["id"],
category=InsightCategory(data["category"]),
content=data["content"],
helpful_count=data.get("helpful_count", 0),
harmful_count=data.get("harmful_count", 0),
created_at=data.get("created_at", ""),
last_used=data.get("last_used"),
source_trials=data.get("source_trials", []),
tags=data.get("tags", [])
)
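The scoring arithmetic above reduces to two small formulas; a standalone sketch (replicating the properties, not importing the module):

```python
# Standalone sketch of PlaybookItem's scoring formulas.
def net_score(helpful: int, harmful: int) -> int:
    # Ranking signal: successes minus failures.
    return helpful - harmful

def confidence(helpful: int, harmful: int) -> float:
    # Outcome ratio in [0, 1]; untested items sit at a neutral 0.5.
    total = helpful + harmful
    return 0.5 if total == 0 else helpful / total

print(net_score(3, 1), confidence(3, 1), confidence(0, 0))  # → 2 0.75 0.5
```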
@dataclass
class AtomizerPlaybook:
"""
Evolving playbook that accumulates optimization knowledge.
Based on ACE framework principles:
- Incremental delta updates (never rewrite wholesale)
- Helpful/harmful tracking for each insight
- Duplicate detection via normalized content hashing
- Category-based organization
Usage:
playbook = AtomizerPlaybook.load(path)
item = playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements for thin walls")
playbook.record_outcome(item.id, helpful=True)
playbook.save(path)
"""
items: Dict[str, PlaybookItem] = field(default_factory=dict)
version: int = 1
last_updated: str = field(default_factory=lambda: datetime.now().isoformat())
def _generate_id(self, category: InsightCategory) -> str:
"""Generate unique ID for new item (numbering survives pruning)."""
prefix = f"{category.value}-"
nums = [int(k[len(prefix):]) for k in self.items if k.startswith(prefix) and k[len(prefix):].isdigit()]
return f"{prefix}{max(nums, default=0) + 1:05d}"
def _content_hash(self, content: str) -> str:
"""Generate hash for content deduplication."""
normalized = content.lower().strip()
return hashlib.md5(normalized.encode()).hexdigest()[:12]
def add_insight(
self,
category: InsightCategory,
content: str,
source_trial: Optional[int] = None,
tags: Optional[List[str]] = None
) -> PlaybookItem:
"""
Add new insight with delta update (ACE principle).
Checks for semantic duplicates before adding.
If duplicate found, increments helpful_count instead.
Args:
category: Type of insight
content: The insight text
source_trial: Trial number that generated this insight
tags: Optional tags for filtering
Returns:
The created or updated PlaybookItem
"""
content_hash = self._content_hash(content)
# Check for near-duplicates
for item in self.items.values():
existing_hash = self._content_hash(item.content)
if content_hash == existing_hash:
# Update existing instead of adding duplicate
item.helpful_count += 1
if source_trial is not None and source_trial not in item.source_trials:
item.source_trials.append(source_trial)
if tags:
item.tags = list(set(item.tags + tags))
self.last_updated = datetime.now().isoformat()
return item
# Create new item
item_id = self._generate_id(category)
item = PlaybookItem(
id=item_id,
category=category,
content=content,
source_trials=[source_trial] if source_trial is not None else [],
tags=tags or []
)
self.items[item_id] = item
self.last_updated = datetime.now().isoformat()
self.version += 1
return item
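The duplicate check keys on a normalized hash, so case and whitespace variants collapse to one item; a hashlib-only sketch of the same normalization:

```python
import hashlib

def content_hash(content: str) -> str:
    # Lowercase + strip, then a 12-char md5 prefix, as in _content_hash.
    return hashlib.md5(content.lower().strip().encode()).hexdigest()[:12]

# Variants collide, so add_insight bumps helpful_count on the existing
# item instead of creating a duplicate entry.
assert content_hash("Use shell elements") == content_hash("  use SHELL elements ")
assert content_hash("Use shell elements") != content_hash("Use solid elements")
```

Note this catches exact matches after normalization; a paraphrased insight still creates a new item.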
def record_outcome(self, item_id: str, helpful: bool) -> bool:
"""
Record whether using this insight was helpful or harmful.
Args:
item_id: The playbook item ID
helpful: True if outcome was positive, False if negative
Returns:
True if item was found and updated, False otherwise
"""
if item_id not in self.items:
return False
if helpful:
self.items[item_id].helpful_count += 1
else:
self.items[item_id].harmful_count += 1
self.items[item_id].last_used = datetime.now().isoformat()
self.last_updated = datetime.now().isoformat()
return True
def get_context_for_task(
self,
task_type: str,
max_items: int = 20,
min_confidence: float = 0.5,
tags: Optional[List[str]] = None
) -> str:
"""
Generate context string for LLM consumption.
Filters by relevance and confidence, sorted by net score.
Args:
task_type: Type of task (for filtering)
max_items: Maximum items to include
min_confidence: Minimum confidence threshold
tags: Optional tags to filter by
Returns:
Formatted context string for LLM
"""
relevant_items = [
item for item in self.items.values()
if item.confidence >= min_confidence
]
# Filter by tags if provided
if tags:
relevant_items = [
item for item in relevant_items
if any(tag in item.tags for tag in tags)
]
# Sort by net score (most helpful first)
relevant_items.sort(key=lambda x: x.net_score, reverse=True)
# Group by category
sections: Dict[str, List[str]] = {}
for item in relevant_items[:max_items]:
cat_name = item.category.name
if cat_name not in sections:
sections[cat_name] = []
sections[cat_name].append(item.to_context_string())
# Build context string
lines = ["## Atomizer Knowledge Playbook", ""]
for cat_name, items in sections.items():
lines.append(f"### {cat_name}")
lines.extend(items)
lines.append("")
return "\n".join(lines)
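The select-and-rank step can be sketched with plain dicts standing in for items (illustrative data, not the real dataclass):

```python
# Filter by confidence, rank by net score, cap at max_items — as above.
items = [
    {"id": "wf-00001", "net": 3, "confidence": 0.8},
    {"id": "wf-00002", "net": 5, "confidence": 0.9},
    {"id": "wf-00003", "net": 1, "confidence": 0.3},  # below min_confidence
]
selected = sorted(
    (i for i in items if i["confidence"] >= 0.5),
    key=lambda x: x["net"], reverse=True,
)[:2]
print([i["id"] for i in selected])  # → ['wf-00002', 'wf-00001']
```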
def search_by_content(
self,
query: str,
category: Optional[InsightCategory] = None,
limit: int = 5
) -> List[PlaybookItem]:
"""
Search playbook items by content similarity.
Simple keyword matching - could be enhanced with embeddings.
Args:
query: Search query
category: Optional category filter
limit: Maximum results
Returns:
List of matching items sorted by relevance
"""
query_lower = query.lower()
query_words = set(query_lower.split())
scored_items = []
for item in self.items.values():
if category and item.category != category:
continue
content_lower = item.content.lower()
content_words = set(content_lower.split())
# Simple word overlap scoring
overlap = len(query_words & content_words)
if overlap > 0 or query_lower in content_lower:
score = overlap + (1 if query_lower in content_lower else 0)
scored_items.append((score, item))
scored_items.sort(key=lambda x: (-x[0], -x[1].net_score))
return [item for _, item in scored_items[:limit]]
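The relevance score is word overlap plus a substring bonus; isolated as a function with sample strings:

```python
def overlap_score(query: str, content: str) -> int:
    # Word overlap plus a bonus when the query is a verbatim substring,
    # mirroring the scoring inside search_by_content.
    q, c = query.lower(), content.lower()
    score = len(set(q.split()) & set(c.split()))
    return score + (1 if q in c else 0)

print(overlap_score("shell elements", "Use shell elements for thin walls"))  # → 3
print(overlap_score("mesh", "Refine the mesh near fillets"))                 # → 2
```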
def get_by_category(
self,
category: InsightCategory,
min_score: int = 0
) -> List[PlaybookItem]:
"""Get all items in a category with minimum net score."""
return [
item for item in self.items.values()
if item.category == category and item.net_score >= min_score
]
def prune_harmful(self, threshold: int = -3) -> int:
"""
Remove items that have proven consistently harmful.
Args:
threshold: Net score threshold (items at or below are removed)
Returns:
Number of items removed
"""
to_remove = [
item_id for item_id, item in self.items.items()
if item.net_score <= threshold
]
for item_id in to_remove:
del self.items[item_id]
if to_remove:
self.last_updated = datetime.now().isoformat()
self.version += 1
return len(to_remove)
def get_stats(self) -> Dict[str, Any]:
"""Get playbook statistics."""
by_category = {}
for item in self.items.values():
cat = item.category.name
if cat not in by_category:
by_category[cat] = 0
by_category[cat] += 1
scores = [item.net_score for item in self.items.values()]
return {
"total_items": len(self.items),
"by_category": by_category,
"version": self.version,
"last_updated": self.last_updated,
"avg_score": sum(scores) / len(scores) if scores else 0,
"max_score": max(scores) if scores else 0,
"min_score": min(scores) if scores else 0
}
def save(self, path: Path) -> None:
"""
Persist playbook to JSON.
Args:
path: File path to save to
"""
data = {
"version": self.version,
"last_updated": self.last_updated,
"items": {k: v.to_dict() for k, v in self.items.items()}
}
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2)
@classmethod
def load(cls, path: Path) -> "AtomizerPlaybook":
"""
Load playbook from JSON.
Args:
path: File path to load from
Returns:
Loaded playbook (or new empty playbook if file doesn't exist)
"""
if not path.exists():
return cls()
with open(path, encoding='utf-8') as f:
data = json.load(f)
playbook = cls(
version=data.get("version", 1),
last_updated=data.get("last_updated", datetime.now().isoformat())
)
for item_data in data.get("items", {}).values():
item = PlaybookItem.from_dict(item_data)
playbook.items[item.id] = item
return playbook
# Convenience function for global playbook access
_global_playbook: Optional[AtomizerPlaybook] = None
_global_playbook_path: Optional[Path] = None
def get_playbook(path: Optional[Path] = None) -> AtomizerPlaybook:
"""
Get the global playbook instance.
Args:
path: Optional path to load from (uses default if not provided)
Returns:
The global AtomizerPlaybook instance
"""
global _global_playbook, _global_playbook_path
if path is None:
# Default path
path = Path(__file__).parents[2] / "knowledge_base" / "playbook.json"
if _global_playbook is None or _global_playbook_path != path:
_global_playbook = AtomizerPlaybook.load(path)
_global_playbook_path = path
return _global_playbook
def save_playbook() -> None:
"""Save the global playbook to its path."""
global _global_playbook, _global_playbook_path
if _global_playbook is not None and _global_playbook_path is not None:
_global_playbook.save(_global_playbook_path)
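The persisted JSON shape can be exercised end to end with the standard library alone (hypothetical item content; "wf" is the WORKFLOW category value shown earlier):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical playbook file matching the shape written by save().
data = {
    "version": 2,
    "last_updated": "2025-01-01T00:00:00",
    "items": {
        "wf-00001": {
            "id": "wf-00001", "category": "wf",
            "content": "Batch solver runs to amortize license checkout",
            "helpful_count": 2, "harmful_count": 0,
            "created_at": "2025-01-01T00:00:00", "last_used": None,
            "source_trials": [3], "tags": ["workflow"],
        }
    },
}
path = Path(tempfile.mkdtemp()) / "playbook.json"
path.write_text(json.dumps(data, indent=2), encoding="utf-8")
loaded = json.loads(path.read_text(encoding="utf-8"))
assert loaded["items"]["wf-00001"]["helpful_count"] == 2
```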


@@ -0,0 +1,467 @@
"""
Atomizer Reflector - Optimization Outcome Analysis
Part of the ACE (Agentic Context Engineering) implementation for Atomizer.
The Reflector analyzes optimization outcomes to extract actionable insights:
- Examines successful and failed trials
- Extracts patterns that led to success/failure
- Formats insights for Curator (Playbook) integration
This implements the "Reflector" role from the ACE framework's
Generator -> Reflector -> Curator pipeline.
"""
from typing import Dict, Any, List, Optional
from dataclasses import dataclass, field
from pathlib import Path
from datetime import datetime
import re
from .playbook import AtomizerPlaybook, InsightCategory, PlaybookItem
@dataclass
class OptimizationOutcome:
"""
Captured outcome from an optimization trial.
Contains all information needed to analyze what happened
and extract insights for the playbook.
"""
trial_number: int
success: bool
objective_value: Optional[float]
constraint_violations: List[str] = field(default_factory=list)
solver_errors: List[str] = field(default_factory=list)
design_variables: Dict[str, float] = field(default_factory=dict)
extractor_used: str = ""
duration_seconds: float = 0.0
notes: str = ""
timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
# Optional metadata
solver_type: str = ""
mesh_info: Dict[str, Any] = field(default_factory=dict)
convergence_info: Dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary for serialization."""
return {
"trial_number": self.trial_number,
"success": self.success,
"objective_value": self.objective_value,
"constraint_violations": self.constraint_violations,
"solver_errors": self.solver_errors,
"design_variables": self.design_variables,
"extractor_used": self.extractor_used,
"duration_seconds": self.duration_seconds,
"notes": self.notes,
"timestamp": self.timestamp,
"solver_type": self.solver_type,
"mesh_info": self.mesh_info,
"convergence_info": self.convergence_info
}
@dataclass
class InsightCandidate:
"""
A candidate insight extracted from trial analysis.
Not yet committed to playbook - pending review/aggregation.
"""
category: InsightCategory
content: str
helpful: bool
trial_number: Optional[int] = None
confidence: float = 0.5
tags: List[str] = field(default_factory=list)
class AtomizerReflector:
"""
Analyzes optimization outcomes and extracts actionable insights.
Implements the Reflector role from ACE framework:
- Examines successful and failed trials
- Extracts patterns that led to success/failure
- Formats insights for Curator integration
Usage:
playbook = AtomizerPlaybook.load(path)
reflector = AtomizerReflector(playbook)
# After each trial
reflector.analyze_trial(outcome)
# After study completion
reflector.analyze_study_completion(stats)
# Commit insights to playbook
count = reflector.commit_insights()
playbook.save(path)
"""
# Error pattern matchers for insight extraction
ERROR_PATTERNS = {
"convergence": [
r"convergence",
r"did not converge",
r"iteration limit",
r"max iterations"
],
"mesh": [
r"mesh",
r"element",
r"distorted",
r"jacobian",
r"negative volume"
],
"singularity": [
r"singular",
r"matrix",
r"ill-conditioned",
r"pivot"
],
"memory": [
r"memory",
r"allocation",
r"out of memory",
r"insufficient"
],
"license": [
r"license",
r"checkout",
r"unavailable"
],
"boundary": [
r"boundary",
r"constraint",
r"spc",
r"load"
]
}
def __init__(self, playbook: AtomizerPlaybook):
"""
Initialize reflector with target playbook.
Args:
playbook: The playbook to add insights to
"""
self.playbook = playbook
self.pending_insights: List[InsightCandidate] = []
self.analyzed_trials: List[int] = []
def analyze_trial(self, outcome: OptimizationOutcome) -> List[InsightCandidate]:
"""
Analyze a single trial outcome and extract insights.
Returns list of insight candidates (not yet added to playbook).
Args:
outcome: The trial outcome to analyze
Returns:
List of extracted insight candidates
"""
insights = []
self.analyzed_trials.append(outcome.trial_number)
# Analyze solver errors
for error in outcome.solver_errors:
error_insights = self._analyze_error(error, outcome)
insights.extend(error_insights)
# Analyze constraint violations
for violation in outcome.constraint_violations:
insights.append(InsightCandidate(
category=InsightCategory.MISTAKE,
content=f"Constraint violation: {violation}",
helpful=False,
trial_number=outcome.trial_number,
tags=["constraint", "violation"]
))
# Analyze successful patterns
if outcome.success and outcome.objective_value is not None:
success_insights = self._analyze_success(outcome)
insights.extend(success_insights)
# Analyze duration (performance insights)
if outcome.duration_seconds > 0:
perf_insights = self._analyze_performance(outcome)
insights.extend(perf_insights)
self.pending_insights.extend(insights)
return insights
def _analyze_error(
self,
error: str,
outcome: OptimizationOutcome
) -> List[InsightCandidate]:
"""Analyze a solver error and extract relevant insights."""
insights = []
error_lower = error.lower()
# Classify error type
error_type = "unknown"
for etype, patterns in self.ERROR_PATTERNS.items():
if any(re.search(p, error_lower) for p in patterns):
error_type = etype
break
# Generate insight based on error type
if error_type == "convergence":
config_summary = self._summarize_config(outcome)
insights.append(InsightCandidate(
category=InsightCategory.MISTAKE,
content=f"Convergence failure with {config_summary}. Consider relaxing solver tolerances or reviewing mesh quality.",
helpful=False,
trial_number=outcome.trial_number,
confidence=0.7,
tags=["convergence", "solver", error_type]
))
elif error_type == "mesh":
insights.append(InsightCandidate(
category=InsightCategory.MISTAKE,
content=f"Mesh-related error: {error[:100]}. Review element quality and mesh density.",
helpful=False,
trial_number=outcome.trial_number,
confidence=0.8,
tags=["mesh", "element", error_type]
))
elif error_type == "singularity":
insights.append(InsightCandidate(
category=InsightCategory.MISTAKE,
content=f"Matrix singularity detected. Check boundary conditions and constraints for rigid body modes.",
helpful=False,
trial_number=outcome.trial_number,
confidence=0.9,
tags=["singularity", "boundary", error_type]
))
elif error_type == "memory":
insights.append(InsightCandidate(
category=InsightCategory.TOOL,
content=f"Memory allocation failure. Consider reducing mesh density or using out-of-core solver.",
helpful=False,
trial_number=outcome.trial_number,
confidence=0.8,
tags=["memory", "performance", error_type]
))
else:
# Generic error insight
insights.append(InsightCandidate(
category=InsightCategory.MISTAKE,
content=f"Solver error: {error[:150]}",
helpful=False,
trial_number=outcome.trial_number,
confidence=0.5,
tags=["error", error_type]
))
return insights
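The first-match classification can be exercised standalone with a trimmed pattern table (illustrative error strings):

```python
import re

# Trimmed version of ERROR_PATTERNS; first matching category wins.
ERROR_PATTERNS = {
    "convergence": [r"convergence", r"did not converge"],
    "singularity": [r"singular", r"pivot"],
}

def classify(error: str) -> str:
    low = error.lower()
    for etype, patterns in ERROR_PATTERNS.items():
        if any(re.search(p, low) for p in patterns):
            return etype
    return "unknown"

print(classify("Solution did NOT converge after 200 iterations"))  # → convergence
print(classify("Zero pivot encountered in factorization"))         # → singularity
print(classify("Disk quota exceeded"))                             # → unknown
```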
def _analyze_success(self, outcome: OptimizationOutcome) -> List[InsightCandidate]:
"""Analyze successful trial and extract winning patterns."""
insights = []
# Record successful design variable ranges
design_summary = self._summarize_design(outcome)
insights.append(InsightCandidate(
category=InsightCategory.STRATEGY,
content=f"Successful design: {design_summary}",
helpful=True,
trial_number=outcome.trial_number,
confidence=0.6,
tags=["success", "design"]
))
# Record extractor performance if fast
if 0 < outcome.duration_seconds < 60:
insights.append(InsightCandidate(
category=InsightCategory.TOOL,
content=f"Fast solve ({outcome.duration_seconds:.1f}s) using {outcome.extractor_used}",
helpful=True,
trial_number=outcome.trial_number,
confidence=0.5,
tags=["performance", "extractor"]
))
return insights
def _analyze_performance(self, outcome: OptimizationOutcome) -> List[InsightCandidate]:
"""Analyze performance characteristics."""
insights = []
# Flag very slow trials
if outcome.duration_seconds > 300: # > 5 minutes
insights.append(InsightCandidate(
category=InsightCategory.TOOL,
content=f"Slow trial ({outcome.duration_seconds/60:.1f} min). Consider mesh refinement or solver settings.",
helpful=False,
trial_number=outcome.trial_number,
confidence=0.6,
tags=["performance", "slow"]
))
return insights
def analyze_study_completion(
self,
study_name: str,
total_trials: int,
best_value: float,
convergence_rate: float,
method: str = ""
) -> List[InsightCandidate]:
"""
Analyze completed study and extract high-level insights.
Args:
study_name: Name of the completed study
total_trials: Total number of trials run
best_value: Best objective value achieved
convergence_rate: Fraction of trials that succeeded (0.0-1.0)
method: Optimization method used
Returns:
List of study-level insight candidates
"""
insights = []
if convergence_rate > 0.9:
insights.append(InsightCandidate(
category=InsightCategory.STRATEGY,
content=f"Study '{study_name}' achieved {convergence_rate:.0%} success rate - configuration is robust for similar problems.",
helpful=True,
confidence=0.8,
tags=["study", "robust", "high_success"]
))
elif convergence_rate < 0.5:
insights.append(InsightCandidate(
category=InsightCategory.MISTAKE,
content=f"Study '{study_name}' had only {convergence_rate:.0%} success rate - review mesh quality and solver settings.",
helpful=False,
confidence=0.8,
tags=["study", "low_success", "needs_review"]
))
# Method-specific insights
if method and total_trials > 20:
if convergence_rate > 0.8:
insights.append(InsightCandidate(
category=InsightCategory.STRATEGY,
content=f"{method} performed well on '{study_name}' ({convergence_rate:.0%} success, {total_trials} trials).",
helpful=True,
confidence=0.7,
tags=["method", method.lower(), "performance"]
))
self.pending_insights.extend(insights)
return insights
def commit_insights(self, min_confidence: float = 0.0) -> int:
"""
Commit pending insights to playbook (Curator handoff).
Aggregates similar insights and adds to playbook with
appropriate helpful/harmful counts.
Args:
min_confidence: Minimum confidence threshold to commit
Returns:
Number of insights added to playbook
"""
count = 0
for insight in self.pending_insights:
if insight.confidence < min_confidence:
continue
item = self.playbook.add_insight(
category=insight.category,
content=insight.content,
source_trial=insight.trial_number,
tags=insight.tags
)
# Record initial outcome based on insight nature
if not insight.helpful:
self.playbook.record_outcome(item.id, helpful=False)
count += 1
self.pending_insights = []
return count
def get_pending_count(self) -> int:
"""Get number of pending insights."""
return len(self.pending_insights)
def clear_pending(self) -> None:
"""Clear pending insights without committing."""
self.pending_insights = []
def _summarize_config(self, outcome: OptimizationOutcome) -> str:
"""Create brief config summary for error context."""
parts = []
if outcome.extractor_used:
parts.append(f"extractor={outcome.extractor_used}")
parts.append(f"vars={len(outcome.design_variables)}")
if outcome.solver_type:
parts.append(f"solver={outcome.solver_type}")
return ", ".join(parts)
def _summarize_design(self, outcome: OptimizationOutcome) -> str:
"""Create brief design summary."""
parts = []
if outcome.objective_value is not None:
parts.append(f"obj={outcome.objective_value:.4g}")
# Include up to 3 design variables
var_items = list(outcome.design_variables.items())[:3]
for k, v in var_items:
parts.append(f"{k}={v:.3g}")
if len(outcome.design_variables) > 3:
parts.append(f"(+{len(outcome.design_variables)-3} more)")
return ", ".join(parts)
class ReflectorFactory:
"""Factory for creating reflectors with different configurations."""
@staticmethod
def create_for_study(study_dir: Path) -> AtomizerReflector:
"""
Create a reflector for a specific study.
Args:
study_dir: Path to the study directory
Returns:
Configured AtomizerReflector
"""
playbook_path = study_dir / "3_results" / "playbook.json"
playbook = AtomizerPlaybook.load(playbook_path)
return AtomizerReflector(playbook)
@staticmethod
def create_global() -> AtomizerReflector:
"""
Create a reflector using the global playbook.
Returns:
AtomizerReflector using global playbook
"""
from .playbook import get_playbook
return AtomizerReflector(get_playbook())


@@ -0,0 +1,531 @@
"""
Context Engineering Integration for OptimizationRunner
Provides integration between the context engineering system and the
OptimizationRunner without modifying the core runner code.
Two approaches are provided:
1. ContextEngineeringMixin - Mix into OptimizationRunner subclass
2. ContextAwareRunner - Wrapper that adds context engineering
Usage:
# Approach 1: Mixin
class MyRunner(ContextEngineeringMixin, OptimizationRunner):
pass
# Approach 2: Wrapper
runner = OptimizationRunner(...)
context_runner = ContextAwareRunner(runner, playbook_path)
context_runner.run(...)
"""
from typing import Dict, Any, Optional, List, Callable
from pathlib import Path
from datetime import datetime
import time
from .playbook import AtomizerPlaybook, get_playbook
from .reflector import AtomizerReflector, OptimizationOutcome
from .feedback_loop import FeedbackLoop
from .compaction import CompactionManager, EventType
from .session_state import AtomizerSessionState, TaskType, get_session
class ContextEngineeringMixin:
"""
Mixin class to add context engineering to OptimizationRunner.
Provides:
- Automatic playbook loading/saving
- Trial outcome reflection
- Learning from successes/failures
- Session state tracking
Usage:
class MyContextAwareRunner(ContextEngineeringMixin, OptimizationRunner):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.init_context_engineering()
"""
def init_context_engineering(
self,
playbook_path: Optional[Path] = None,
enable_compaction: bool = True,
compaction_threshold: int = 50
) -> None:
"""
Initialize context engineering components.
Call this in your subclass __init__ after super().__init__().
Args:
playbook_path: Path to playbook JSON (default: output_dir/playbook.json)
enable_compaction: Whether to enable context compaction
compaction_threshold: Number of events before compaction
"""
# Determine playbook path
if playbook_path is None:
playbook_path = getattr(self, 'output_dir', Path('.')) / 'playbook.json'
self._playbook_path = Path(playbook_path)
self._playbook = AtomizerPlaybook.load(self._playbook_path)
self._reflector = AtomizerReflector(self._playbook)
self._feedback_loop = FeedbackLoop(self._playbook_path)
# Initialize compaction if enabled
self._enable_compaction = enable_compaction
if enable_compaction:
self._compaction_manager = CompactionManager(
compaction_threshold=compaction_threshold,
keep_recent=20,
keep_errors=True
)
else:
self._compaction_manager = None
# Session state
self._session = get_session()
self._session.exposed.task_type = TaskType.RUN_OPTIMIZATION
# Track active playbook items for feedback attribution
self._active_playbook_items: List[str] = []
# Statistics
self._context_stats = {
"trials_processed": 0,
"insights_generated": 0,
"errors_captured": 0
}
def get_relevant_playbook_items(self, max_items: int = 15) -> List[str]:
"""
Get relevant playbook items for current optimization context.
Returns:
List of playbook item context strings
"""
context = self._playbook.get_context_for_task(
task_type="optimization",
max_items=max_items,
min_confidence=0.5
)
# Track the item IDs actually selected (same filter/sort as the context above)
selected = sorted(
(i for i in self._playbook.items.values() if i.confidence >= 0.5),
key=lambda x: x.net_score, reverse=True
)[:max_items]
self._active_playbook_items = [i.id for i in selected]
return context.split('\n')
def record_trial_start(self, trial_number: int, design_vars: Dict[str, float]) -> None:
"""
Record the start of a trial for context tracking.
Args:
trial_number: Trial number
design_vars: Design variable values
"""
if self._compaction_manager:
from .compaction import ContextEvent
self._compaction_manager.add_event(ContextEvent(
timestamp=datetime.now(),
event_type=EventType.TRIAL_START,
summary=f"Trial {trial_number} started",
details={"trial_number": trial_number, "design_vars": design_vars}
))
self._session.add_action(f"Started trial {trial_number}")
def record_trial_outcome(
self,
trial_number: int,
success: bool,
objective_value: Optional[float],
design_vars: Dict[str, float],
errors: Optional[List[str]] = None,
duration_seconds: float = 0.0
) -> Dict[str, Any]:
"""
Record the outcome of a trial for learning.
Args:
trial_number: Trial number
success: Whether trial succeeded
objective_value: Objective value (None if failed)
design_vars: Design variable values
errors: List of error messages
duration_seconds: Trial duration
Returns:
Dictionary with processing results
"""
errors = errors or []
# Update compaction manager
if self._compaction_manager:
self._compaction_manager.add_trial_event(
trial_number=trial_number,
success=success,
objective=objective_value,
duration=duration_seconds
)
# Create outcome for reflection
outcome = OptimizationOutcome(
trial_number=trial_number,
success=success,
objective_value=objective_value,
constraint_violations=[],
solver_errors=errors,
design_variables=design_vars,
extractor_used=getattr(self, '_current_extractor', ''),
duration_seconds=duration_seconds
)
# Analyze and generate insights
insights = self._reflector.analyze_trial(outcome)
# Process through feedback loop
result = self._feedback_loop.process_trial_result(
trial_number=trial_number,
success=success,
objective_value=objective_value or 0.0,
design_variables=design_vars,
context_items_used=self._active_playbook_items,
errors=errors
)
# Update statistics
self._context_stats["trials_processed"] += 1
self._context_stats["insights_generated"] += len(insights)
# Update session state
if success:
obj_str = f"{objective_value:.4g}" if objective_value is not None else "n/a"
self._session.add_action(f"Trial {trial_number} succeeded: obj={obj_str}")
else:
error_summary = errors[0][:50] if errors else "unknown"
self._session.add_error(f"Trial {trial_number}: {error_summary}")
self._context_stats["errors_captured"] += 1
return {
"insights_extracted": len(insights),
"playbook_items_updated": result.get("items_updated", 0)
}
def record_error(self, error_message: str, error_type: str = "") -> None:
"""
Record an error for learning (outside trial context).
Args:
error_message: Error description
error_type: Error classification
"""
if self._compaction_manager:
self._compaction_manager.add_error_event(error_message, error_type)
self._session.add_error(error_message, error_type)
self._context_stats["errors_captured"] += 1
def finalize_context_engineering(self, study_stats: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
"""
Finalize context engineering at end of optimization.
Commits insights and saves playbook.
Args:
study_stats: Optional study statistics for analysis
Returns:
Dictionary with finalization results
"""
if study_stats is None:
study_stats = {
"name": getattr(self, 'study', {}).get('study_name', 'unknown'),
"total_trials": self._context_stats["trials_processed"],
"best_value": getattr(self, 'best_value', 0),
"convergence_rate": 0.8 # Would need actual calculation
}
# Finalize feedback loop
result = self._feedback_loop.finalize_study(study_stats)
# Save playbook
self._playbook.save(self._playbook_path)
# Add compaction stats
if self._compaction_manager:
result["compaction_stats"] = self._compaction_manager.get_stats()
result["context_stats"] = self._context_stats
return result
def get_context_string(self) -> str:
"""
Get full context string for LLM consumption.
Returns:
Formatted context string
"""
parts = []
# Session state
parts.append(self._session.get_llm_context())
# Playbook items
playbook_context = self._playbook.get_context_for_task(
task_type="optimization",
max_items=15
)
if playbook_context:
parts.append(playbook_context)
# Compaction history
if self._compaction_manager:
parts.append(self._compaction_manager.get_context_string())
return "\n\n---\n\n".join(parts)
class ContextAwareRunner:
"""
Wrapper that adds context engineering to any OptimizationRunner.
This approach doesn't require subclassing - it wraps an existing
runner instance and intercepts relevant calls.
Usage:
runner = OptimizationRunner(...)
context_runner = ContextAwareRunner(runner)
# Use context_runner.run() instead of runner.run()
study = context_runner.run(n_trials=50)
# Get learning report
report = context_runner.get_learning_report()
"""
def __init__(
self,
runner,
playbook_path: Optional[Path] = None,
enable_compaction: bool = True
):
"""
Initialize context-aware wrapper.
Args:
runner: OptimizationRunner instance to wrap
playbook_path: Path to playbook (default: runner's output_dir)
enable_compaction: Whether to enable context compaction
"""
self._runner = runner
# Determine playbook path
if playbook_path is None:
playbook_path = runner.output_dir / 'playbook.json'
self._playbook_path = Path(playbook_path)
self._playbook = AtomizerPlaybook.load(self._playbook_path)
self._reflector = AtomizerReflector(self._playbook)
self._feedback_loop = FeedbackLoop(self._playbook_path)
# Compaction
self._enable_compaction = enable_compaction
if enable_compaction:
self._compaction = CompactionManager(
compaction_threshold=50,
keep_recent=20
)
else:
self._compaction = None
# Session
self._session = get_session()
self._session.exposed.task_type = TaskType.RUN_OPTIMIZATION
# Statistics
self._stats = {
"trials_observed": 0,
"successful_trials": 0,
"failed_trials": 0,
"insights_generated": 0
}
# Hook into runner's objective function
self._original_objective = runner._objective_function
runner._objective_function = self._wrapped_objective
def _wrapped_objective(self, trial) -> float:
"""
Wrapped objective function that captures outcomes.
"""
start_time = time.time()
trial_number = trial.number
# Record trial start
if self._compaction:
from .compaction import ContextEvent
self._compaction.add_event(ContextEvent(
timestamp=datetime.now(),
event_type=EventType.TRIAL_START,
summary=f"Trial {trial_number} starting"
))
try:
# Run original objective
result = self._original_objective(trial)
# Record success
duration = time.time() - start_time
self._record_success(trial_number, result, trial.params, duration)
return result
except Exception as e:
# Record failure
duration = time.time() - start_time
self._record_failure(trial_number, str(e), trial.params, duration)
raise
def _record_success(
self,
trial_number: int,
objective_value: float,
params: Dict[str, Any],
duration: float
) -> None:
"""Record successful trial."""
self._stats["trials_observed"] += 1
self._stats["successful_trials"] += 1
if self._compaction:
self._compaction.add_trial_event(
trial_number=trial_number,
success=True,
objective=objective_value,
duration=duration
)
# Process through feedback loop
self._feedback_loop.process_trial_result(
trial_number=trial_number,
success=True,
objective_value=objective_value,
design_variables=dict(params),
context_items_used=list(self._playbook.items.keys())[:10]
)
# Update session
self._session.add_action(f"Trial {trial_number}: obj={objective_value:.4g}")
def _record_failure(
self,
trial_number: int,
error: str,
params: Dict[str, Any],
duration: float
) -> None:
"""Record failed trial."""
self._stats["trials_observed"] += 1
self._stats["failed_trials"] += 1
if self._compaction:
self._compaction.add_trial_event(
trial_number=trial_number,
success=False,
duration=duration
)
self._compaction.add_error_event(error, "trial_failure")
# Process through feedback loop
self._feedback_loop.process_trial_result(
trial_number=trial_number,
success=False,
objective_value=0.0,
design_variables=dict(params),
errors=[error]
)
# Update session
self._session.add_error(f"Trial {trial_number}: {error[:100]}")
def run(self, *args, **kwargs):
"""
Run optimization with context engineering.
Passes through to wrapped runner.run() with context tracking.
"""
# Update session state
study_name = kwargs.get('study_name', 'unknown')
self._session.exposed.study_name = study_name
self._session.exposed.study_status = "running"
try:
# Run optimization
result = self._runner.run(*args, **kwargs)
# Finalize context engineering
self._finalize(study_name)
return result
except Exception as e:
self._session.add_error(f"Study failed: {str(e)}")
raise
def _finalize(self, study_name: str) -> None:
"""Finalize context engineering after optimization."""
total_trials = self._stats["trials_observed"]
success_rate = (
self._stats["successful_trials"] / total_trials
if total_trials > 0 else 0
)
# Finalize feedback loop
result = self._feedback_loop.finalize_study({
"name": study_name,
"total_trials": total_trials,
"best_value": getattr(self._runner, 'best_value', 0),
"convergence_rate": success_rate
})
self._stats["insights_generated"] = result.get("insights_added", 0)
# Update session
self._session.exposed.study_status = "completed"
self._session.exposed.trials_completed = total_trials
def get_learning_report(self) -> Dict[str, Any]:
"""Get report on what the system learned."""
return {
"statistics": self._stats,
"playbook_size": len(self._playbook.items),
"playbook_stats": self._playbook.get_stats(),
"feedback_stats": self._feedback_loop.get_statistics(),
"top_insights": self._feedback_loop.get_top_performers(10),
"compaction_stats": (
self._compaction.get_stats() if self._compaction else None
)
}
def get_context(self) -> str:
"""Get current context string for LLM."""
parts = [self._session.get_llm_context()]
if self._compaction:
parts.append(self._compaction.get_context_string())
playbook_context = self._playbook.get_context_for_task("optimization")
if playbook_context:
parts.append(playbook_context)
return "\n\n---\n\n".join(parts)
def __getattr__(self, name):
"""Delegate unknown attributes to wrapped runner."""
return getattr(self._runner, name)
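The wrapper above relies on `__getattr__` delegation so it can be dropped in wherever the bare runner was used. A minimal standalone sketch of that pattern (names `DummyRunner` and `ContextWrapper` are illustrative, not from this codebase):

```python
class DummyRunner:
    """Stand-in for the real OptimizationRunner."""
    def __init__(self):
        self.best_value = 42.0

    def run(self):
        return "done"


class ContextWrapper:
    def __init__(self, runner):
        self._runner = runner
        self._stats = {"trials_observed": 0}

    def run(self, *args, **kwargs):
        # Intercepted method: add tracking around the real call.
        result = self._runner.run(*args, **kwargs)
        self._stats["trials_observed"] += 1
        return result

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails, so
        # _runner/_stats resolve locally and everything else delegates.
        return getattr(self._runner, name)


wrapper = ContextWrapper(DummyRunner())
print(wrapper.run())        # intercepted -> "done"
print(wrapper.best_value)   # delegated   -> 42.0
```

Because `__getattr__` fires only after normal lookup fails, the wrapper's own attributes never shadow the runner's by accident.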

View File

@@ -0,0 +1,463 @@
"""
Atomizer Session State - Context Isolation Management
Part of the ACE (Agentic Context Engineering) implementation for Atomizer.
Implements the "Write-Select-Compress-Isolate" pattern:
- Exposed fields are sent to LLM at every turn
- Isolated fields are accessed selectively when needed
- Automatic compression of old data
This ensures efficient context usage while maintaining
access to full historical data when needed.
"""
from typing import Dict, List, Optional, Any
from datetime import datetime
from enum import Enum
from dataclasses import dataclass, field
import json
from pathlib import Path
class TaskType(Enum):
"""Types of tasks Claude can perform in Atomizer."""
CREATE_STUDY = "create_study"
RUN_OPTIMIZATION = "run_optimization"
MONITOR_PROGRESS = "monitor_progress"
ANALYZE_RESULTS = "analyze_results"
DEBUG_ERROR = "debug_error"
CONFIGURE_SETTINGS = "configure_settings"
EXPORT_DATA = "export_data"
NEURAL_ACCELERATION = "neural_acceleration"
@dataclass
class ExposedState:
"""
State exposed to LLM at every turn.
Keep this minimal - only what's needed for immediate context.
Everything here counts against token budget every turn.
"""
# Current task context
task_type: Optional[TaskType] = None
current_objective: str = ""
# Recent history (compressed)
recent_actions: List[str] = field(default_factory=list)
recent_errors: List[str] = field(default_factory=list)
# Active study summary
study_name: Optional[str] = None
study_status: str = "unknown"
trials_completed: int = 0
trials_total: int = 0
best_value: Optional[float] = None
best_trial: Optional[int] = None
# Playbook excerpt (most relevant items)
active_playbook_items: List[str] = field(default_factory=list)
    # Constraints for context size.
    # Left unannotated on purpose: annotated class attributes in a
    # @dataclass become instance fields; plain attributes stay constants.
    MAX_ACTIONS = 10
    MAX_ERRORS = 5
    MAX_PLAYBOOK_ITEMS = 15
@dataclass
class IsolatedState:
"""
State isolated from LLM - accessed selectively.
This data is NOT included in every context window.
Load specific fields when explicitly needed.
"""
# Full optimization history (can be large)
full_trial_history: List[Dict[str, Any]] = field(default_factory=list)
# NX session state (heavy, complex)
nx_model_path: Optional[str] = None
nx_expressions: Dict[str, Any] = field(default_factory=dict)
nx_sim_path: Optional[str] = None
# Neural network cache
neural_predictions: Dict[str, float] = field(default_factory=dict)
surrogate_model_path: Optional[str] = None
# Full playbook (loaded on demand)
full_playbook_path: Optional[str] = None
# Debug information
last_solver_output: str = ""
last_f06_content: str = ""
last_solver_returncode: Optional[int] = None
# Configuration snapshots
optimization_config: Dict[str, Any] = field(default_factory=dict)
study_config: Dict[str, Any] = field(default_factory=dict)
@dataclass
class AtomizerSessionState:
"""
Complete session state with exposure control.
The exposed state is automatically injected into every LLM context.
The isolated state is accessed only when explicitly needed.
Usage:
session = AtomizerSessionState(session_id="session_001")
session.exposed.task_type = TaskType.CREATE_STUDY
session.add_action("Created study directory")
# Get context for LLM
context = session.get_llm_context()
# Access isolated data when needed
f06 = session.load_isolated_data("last_f06_content")
"""
session_id: str
created_at: str = field(default_factory=lambda: datetime.now().isoformat())
last_updated: str = field(default_factory=lambda: datetime.now().isoformat())
exposed: ExposedState = field(default_factory=ExposedState)
isolated: IsolatedState = field(default_factory=IsolatedState)
def get_llm_context(self) -> str:
"""
Generate context string for LLM consumption.
Only includes exposed state - isolated state requires
explicit access via load_isolated_data().
Returns:
Formatted markdown context string
"""
lines = [
"## Current Session State",
"",
f"**Task**: {self.exposed.task_type.value if self.exposed.task_type else 'Not set'}",
f"**Objective**: {self.exposed.current_objective or 'None specified'}",
"",
]
# Study context
if self.exposed.study_name:
progress = ""
if self.exposed.trials_total > 0:
pct = (self.exposed.trials_completed / self.exposed.trials_total) * 100
progress = f" ({pct:.0f}%)"
lines.extend([
f"### Active Study: {self.exposed.study_name}",
f"- Status: {self.exposed.study_status}",
f"- Trials: {self.exposed.trials_completed}/{self.exposed.trials_total}{progress}",
])
if self.exposed.best_value is not None:
lines.append(f"- Best: {self.exposed.best_value:.6g} (trial #{self.exposed.best_trial})")
lines.append("")
# Recent actions
if self.exposed.recent_actions:
lines.append("### Recent Actions")
for action in self.exposed.recent_actions[-5:]:
lines.append(f"- {action}")
lines.append("")
# Recent errors (highlight these)
if self.exposed.recent_errors:
lines.append("### Recent Errors (address these)")
for error in self.exposed.recent_errors:
lines.append(f"- {error}")
lines.append("")
# Relevant playbook items
if self.exposed.active_playbook_items:
lines.append("### Relevant Knowledge")
for item in self.exposed.active_playbook_items:
lines.append(f"- {item}")
lines.append("")
return "\n".join(lines)
def add_action(self, action: str) -> None:
"""
Record an action (auto-compresses old actions).
Args:
action: Description of the action taken
"""
timestamp = datetime.now().strftime("%H:%M:%S")
self.exposed.recent_actions.append(f"[{timestamp}] {action}")
# Compress if over limit
if len(self.exposed.recent_actions) > self.exposed.MAX_ACTIONS:
# Keep first, summarize middle, keep last 5
first = self.exposed.recent_actions[0]
last_five = self.exposed.recent_actions[-5:]
middle_count = len(self.exposed.recent_actions) - 6
self.exposed.recent_actions = (
[first] +
[f"... ({middle_count} earlier actions)"] +
last_five
)
self.last_updated = datetime.now().isoformat()
def add_error(self, error: str, error_type: str = "") -> None:
"""
Record an error for LLM attention.
Errors are preserved more aggressively than actions
because they need to be addressed.
Args:
error: Error message
error_type: Optional error classification
"""
prefix = f"[{error_type}] " if error_type else ""
self.exposed.recent_errors.append(f"{prefix}{error}")
# Keep most recent errors
self.exposed.recent_errors = self.exposed.recent_errors[-self.exposed.MAX_ERRORS:]
self.last_updated = datetime.now().isoformat()
def clear_errors(self) -> None:
"""Clear all recorded errors (after they're addressed)."""
self.exposed.recent_errors = []
self.last_updated = datetime.now().isoformat()
def update_study_status(
self,
name: str,
status: str,
trials_completed: int,
trials_total: int,
best_value: Optional[float] = None,
best_trial: Optional[int] = None
) -> None:
"""
Update the study status in exposed state.
Args:
name: Study name
status: Current status (running, completed, failed, etc.)
trials_completed: Number of completed trials
trials_total: Total planned trials
best_value: Best objective value found
best_trial: Trial number with best value
"""
self.exposed.study_name = name
self.exposed.study_status = status
self.exposed.trials_completed = trials_completed
self.exposed.trials_total = trials_total
self.exposed.best_value = best_value
self.exposed.best_trial = best_trial
self.last_updated = datetime.now().isoformat()
def set_playbook_items(self, items: List[str]) -> None:
"""
Set the active playbook items for context.
Args:
items: List of playbook item context strings
"""
self.exposed.active_playbook_items = items[:self.exposed.MAX_PLAYBOOK_ITEMS]
self.last_updated = datetime.now().isoformat()
def load_isolated_data(self, key: str) -> Any:
"""
Explicitly load isolated data when needed.
Use this when you need access to heavy data that
shouldn't be in every context window.
Args:
key: Attribute name in IsolatedState
Returns:
The isolated data value, or None if not found
"""
return getattr(self.isolated, key, None)
def set_isolated_data(self, key: str, value: Any) -> None:
"""
Set isolated data.
Args:
key: Attribute name in IsolatedState
value: Value to set
"""
if hasattr(self.isolated, key):
setattr(self.isolated, key, value)
self.last_updated = datetime.now().isoformat()
def add_trial_to_history(self, trial_data: Dict[str, Any]) -> None:
"""
Add a trial to the full history (isolated state).
Args:
trial_data: Dictionary with trial information
"""
trial_data["recorded_at"] = datetime.now().isoformat()
self.isolated.full_trial_history.append(trial_data)
self.last_updated = datetime.now().isoformat()
def get_trial_history_summary(self, last_n: int = 10) -> List[Dict[str, Any]]:
"""
Get summary of recent trials from isolated history.
Args:
last_n: Number of recent trials to return
Returns:
List of trial summary dictionaries
"""
return self.isolated.full_trial_history[-last_n:]
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary for serialization."""
return {
"session_id": self.session_id,
"created_at": self.created_at,
"last_updated": self.last_updated,
"exposed": {
"task_type": self.exposed.task_type.value if self.exposed.task_type else None,
"current_objective": self.exposed.current_objective,
"recent_actions": self.exposed.recent_actions,
"recent_errors": self.exposed.recent_errors,
"study_name": self.exposed.study_name,
"study_status": self.exposed.study_status,
"trials_completed": self.exposed.trials_completed,
"trials_total": self.exposed.trials_total,
"best_value": self.exposed.best_value,
"best_trial": self.exposed.best_trial,
"active_playbook_items": self.exposed.active_playbook_items
},
"isolated": {
"nx_model_path": self.isolated.nx_model_path,
"nx_sim_path": self.isolated.nx_sim_path,
"surrogate_model_path": self.isolated.surrogate_model_path,
"full_playbook_path": self.isolated.full_playbook_path,
"trial_history_count": len(self.isolated.full_trial_history)
}
}
def save(self, path: Path) -> None:
"""
Save session state to JSON.
Note: Full trial history is saved to a separate file
to keep the main state file manageable.
Args:
path: Path to save state file
"""
path.parent.mkdir(parents=True, exist_ok=True)
# Save main state
with open(path, 'w', encoding='utf-8') as f:
json.dump(self.to_dict(), f, indent=2)
# Save trial history separately if large
if len(self.isolated.full_trial_history) > 0:
history_path = path.with_suffix('.history.json')
with open(history_path, 'w', encoding='utf-8') as f:
json.dump(self.isolated.full_trial_history, f, indent=2)
@classmethod
def load(cls, path: Path) -> "AtomizerSessionState":
"""
Load session state from JSON.
Args:
path: Path to state file
Returns:
Loaded session state (or new state if file doesn't exist)
"""
if not path.exists():
return cls(session_id=f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}")
with open(path, encoding='utf-8') as f:
data = json.load(f)
state = cls(
session_id=data.get("session_id", "unknown"),
created_at=data.get("created_at", datetime.now().isoformat()),
last_updated=data.get("last_updated", datetime.now().isoformat())
)
# Load exposed state
exposed = data.get("exposed", {})
if exposed.get("task_type"):
state.exposed.task_type = TaskType(exposed["task_type"])
state.exposed.current_objective = exposed.get("current_objective", "")
state.exposed.recent_actions = exposed.get("recent_actions", [])
state.exposed.recent_errors = exposed.get("recent_errors", [])
state.exposed.study_name = exposed.get("study_name")
state.exposed.study_status = exposed.get("study_status", "unknown")
state.exposed.trials_completed = exposed.get("trials_completed", 0)
state.exposed.trials_total = exposed.get("trials_total", 0)
state.exposed.best_value = exposed.get("best_value")
state.exposed.best_trial = exposed.get("best_trial")
state.exposed.active_playbook_items = exposed.get("active_playbook_items", [])
# Load isolated state metadata
isolated = data.get("isolated", {})
state.isolated.nx_model_path = isolated.get("nx_model_path")
state.isolated.nx_sim_path = isolated.get("nx_sim_path")
state.isolated.surrogate_model_path = isolated.get("surrogate_model_path")
state.isolated.full_playbook_path = isolated.get("full_playbook_path")
# Load trial history from separate file if exists
history_path = path.with_suffix('.history.json')
if history_path.exists():
with open(history_path, encoding='utf-8') as f:
state.isolated.full_trial_history = json.load(f)
return state
# Convenience functions for session management
_active_session: Optional[AtomizerSessionState] = None
def get_session() -> AtomizerSessionState:
"""
Get the active session state.
Creates a new session if none exists.
Returns:
The active AtomizerSessionState
"""
global _active_session
if _active_session is None:
_active_session = AtomizerSessionState(
session_id=f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
)
return _active_session
def set_session(session: AtomizerSessionState) -> None:
"""
Set the active session.
Args:
session: Session state to make active
"""
global _active_session
_active_session = session
def clear_session() -> None:
"""Clear the active session."""
global _active_session
_active_session = None
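The compression rule in `add_action` (keep the first entry, summarize the middle, keep the last five once the list exceeds `MAX_ACTIONS`) can be sketched as a standalone function, using the same constant value as `ExposedState`:

```python
MAX_ACTIONS = 10  # same limit as ExposedState

def compress_actions(actions):
    """Keep first entry, summarize the middle, keep the last five."""
    if len(actions) <= MAX_ACTIONS:
        return list(actions)
    first = actions[0]
    last_five = actions[-5:]
    # Everything between the first entry and the last five is folded away.
    middle_count = len(actions) - 6
    return [first, f"... ({middle_count} earlier actions)"] + last_five

history = [f"Action {i}" for i in range(15)]
compressed = compress_actions(history)
print(compressed[0])    # Action 0
print(compressed[1])    # ... (9 earlier actions)
print(len(compressed))  # 7
```

The summary line preserves a count rather than content, so the LLM still sees that work happened without paying tokens for the details.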

View File

@@ -0,0 +1,268 @@
"""
Error Tracker Hook - Context Engineering Integration
Preserves solver errors and failures in context for learning.
Based on Manus insight: "leave the wrong turns in the context"
This hook:
1. Captures solver errors and failures
2. Classifies error types for playbook categorization
3. Extracts relevant F06 content for analysis
4. Records errors to session state and LAC
Hook Point: post_solve
Priority: 100 (run early to capture before cleanup)
"""
from pathlib import Path
from datetime import datetime
from typing import Dict, Any, Optional
import json
import re
def classify_error(error_msg: str) -> str:
"""
Classify error type for playbook categorization.
Args:
error_msg: Error message text
Returns:
Error classification string
"""
error_lower = error_msg.lower()
# Check patterns in priority order
if any(x in error_lower for x in ['convergence', 'did not converge', 'diverge']):
return "convergence_failure"
elif any(x in error_lower for x in ['mesh', 'element', 'distorted', 'jacobian']):
return "mesh_error"
elif any(x in error_lower for x in ['singular', 'matrix', 'pivot', 'ill-conditioned']):
return "singularity"
elif any(x in error_lower for x in ['memory', 'allocation', 'out of memory']):
return "memory_error"
elif any(x in error_lower for x in ['license', 'checkout']):
return "license_error"
elif any(x in error_lower for x in ['boundary', 'constraint', 'spc', 'rigid body']):
return "boundary_condition_error"
elif any(x in error_lower for x in ['timeout', 'time limit']):
return "timeout_error"
elif any(x in error_lower for x in ['file', 'not found', 'missing']):
return "file_error"
else:
return "unknown_error"
def extract_f06_error(f06_path: Optional[str], max_chars: int = 500) -> str:
"""
Extract error section from F06 file.
Args:
f06_path: Path to F06 file
max_chars: Maximum characters to extract
Returns:
Error section content or empty string
"""
if not f06_path:
return ""
path = Path(f06_path)
if not path.exists():
return ""
try:
with open(path, 'r', encoding='utf-8', errors='ignore') as f:
content = f.read()
# Look for error indicators
error_markers = [
"*** USER FATAL",
"*** SYSTEM FATAL",
"*** USER WARNING",
"*** SYSTEM WARNING",
"FATAL ERROR",
"ERROR MESSAGE"
]
for marker in error_markers:
if marker in content:
idx = content.index(marker)
# Extract surrounding context
start = max(0, idx - 100)
end = min(len(content), idx + max_chars)
return content[start:end].strip()
# If no explicit error marker, check for convergence messages
convergence_patterns = [
r"CONVERGENCE NOT ACHIEVED",
r"SOLUTION DID NOT CONVERGE",
r"DIVERGENCE DETECTED"
]
for pattern in convergence_patterns:
match = re.search(pattern, content, re.IGNORECASE)
if match:
idx = match.start()
start = max(0, idx - 50)
end = min(len(content), idx + max_chars)
return content[start:end].strip()
return ""
except Exception as e:
return f"Error reading F06: {str(e)}"
def find_f06_file(working_dir: str, sim_file: str = "") -> Optional[Path]:
"""
Find the F06 file in the working directory.
Args:
working_dir: Working directory path
        sim_file: Simulation file name (reserved for naming-pattern matching; not currently used)
Returns:
Path to F06 file or None
"""
work_path = Path(working_dir)
# Try common patterns
patterns = [
"*.f06",
"*-solution*.f06",
"*_sim*.f06"
]
for pattern in patterns:
matches = list(work_path.glob(pattern))
if matches:
# Return most recently modified
return max(matches, key=lambda p: p.stat().st_mtime)
return None
def track_error(context: Dict[str, Any]) -> Dict[str, Any]:
"""
Hook that preserves errors for context learning.
Called at post_solve after solver completes.
Captures error information regardless of success/failure
to enable learning from both outcomes.
Args:
context: Hook context with trial information
Returns:
Dictionary with error tracking results
"""
trial_number = context.get('trial_number', -1)
working_dir = context.get('working_dir', '.')
output_dir = context.get('output_dir', working_dir)
solver_returncode = context.get('solver_returncode', 0)
# Determine if this is an error case
# (solver returncode non-zero, or explicit error flag)
is_error = (
solver_returncode != 0 or
context.get('error', False) or
context.get('solver_failed', False)
)
if not is_error:
        # No error to track; report success in the result so callers can still learn from it
return {"error_tracked": False, "trial_success": True}
# Find and extract F06 error info
f06_path = context.get('f06_path')
if not f06_path:
f06_file = find_f06_file(working_dir, context.get('sim_file', ''))
if f06_file:
f06_path = str(f06_file)
f06_snippet = extract_f06_error(f06_path)
# Get error message from context or F06
error_message = context.get('error_message', '')
if not error_message and f06_snippet:
# Extract first line of F06 error as message
lines = f06_snippet.strip().split('\n')
error_message = lines[0][:200] if lines else "Unknown solver error"
# Classify error
error_type = classify_error(error_message or f06_snippet)
# Build error record
error_info = {
"trial": trial_number,
"timestamp": datetime.now().isoformat(),
"solver_returncode": solver_returncode,
"error_type": error_type,
"error_message": error_message,
"f06_snippet": f06_snippet[:1000] if f06_snippet else "",
"design_variables": context.get('design_variables', {}),
"working_dir": working_dir
}
# Save to error log (append mode - accumulate errors)
error_log_path = Path(output_dir) / "error_history.jsonl"
try:
error_log_path.parent.mkdir(parents=True, exist_ok=True)
with open(error_log_path, 'a', encoding='utf-8') as f:
f.write(json.dumps(error_info) + "\n")
except Exception as e:
print(f"Warning: Could not write error log: {e}")
# Try to update session state if context engineering is active
try:
from optimization_engine.context.session_state import get_session
session = get_session()
session.add_error(
f"Trial {trial_number}: {error_type} - {error_message[:100]}",
error_type=error_type
)
except ImportError:
pass # Context module not available
# Try to record to LAC if available
try:
from knowledge_base.lac import get_lac
lac = get_lac()
lac.record_insight(
category="failure",
context=f"Trial {trial_number} solver error",
insight=f"{error_type}: {error_message[:200]}",
confidence=0.7,
tags=["solver", error_type, "automatic"]
)
except ImportError:
pass # LAC not available
return {
"error_tracked": True,
"error_type": error_type,
"error_message": error_message[:200],
"f06_extracted": bool(f06_snippet)
}
# Hook registration metadata
HOOK_CONFIG = {
"name": "error_tracker",
"hook_point": "post_solve",
"priority": 100, # Run early to capture before cleanup
"enabled": True,
"description": "Preserves solver errors for context learning"
}
# Make the function discoverable by hook manager
def get_hook():
"""Return the hook function for registration."""
return track_error
# For direct plugin discovery
__all__ = ['track_error', 'HOOK_CONFIG', 'get_hook']
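`classify_error` matches keyword groups in priority order, so a message containing both mesh and convergence keywords is classified by the earlier rule. A trimmed two-rule illustration of that ordering (the real function covers eight categories):

```python
def classify(msg):
    """Priority-ordered keyword classification, as in classify_error."""
    m = msg.lower()
    # Convergence is checked first, so it wins on mixed messages.
    if any(x in m for x in ('convergence', 'did not converge', 'diverge')):
        return "convergence_failure"
    if any(x in m for x in ('mesh', 'element', 'distorted', 'jacobian')):
        return "mesh_error"
    return "unknown_error"

print(classify("*** USER FATAL: SOLUTION DID NOT CONVERGE"))  # convergence_failure
print(classify("Element distortion: negative jacobian"))      # mesh_error
print(classify("mesh diverged"))                              # convergence_failure
```

The last case shows why the ordering matters: "mesh diverged" hits both keyword groups, and the convergence rule takes precedence because it runs first.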

View File

@@ -0,0 +1,739 @@
"""
Test suite for context engineering components.
Tests the ACE (Agentic Context Engineering) implementation:
- Playbook: Knowledge store with helpful/harmful tracking
- Reflector: Outcome analysis and insight extraction
- SessionState: Context isolation
- Compaction: Long-running session management
- FeedbackLoop: Automated learning
"""
import pytest
from pathlib import Path
import tempfile
import json
from datetime import datetime
from optimization_engine.context.playbook import (
AtomizerPlaybook,
PlaybookItem,
InsightCategory
)
from optimization_engine.context.reflector import (
AtomizerReflector,
OptimizationOutcome
)
from optimization_engine.context.session_state import (
AtomizerSessionState,
TaskType,
ExposedState,
IsolatedState
)
from optimization_engine.context.compaction import (
CompactionManager,
ContextEvent,
EventType,
ContextBudgetManager
)
from optimization_engine.context.cache_monitor import (
ContextCacheOptimizer,
CacheStats,
StablePrefixBuilder
)
from optimization_engine.context.feedback_loop import (
FeedbackLoop
)
class TestAtomizerPlaybook:
"""Tests for the playbook system."""
def test_create_empty_playbook(self):
"""Test creating an empty playbook."""
playbook = AtomizerPlaybook()
assert len(playbook.items) == 0
assert playbook.version == 1
def test_add_insight(self):
"""Test adding insights to playbook."""
playbook = AtomizerPlaybook()
item = playbook.add_insight(
category=InsightCategory.STRATEGY,
content="Use shell elements for thin walls",
source_trial=1
)
assert item.id == "str-00001"
assert item.helpful_count == 0
assert item.harmful_count == 0
assert item.category == InsightCategory.STRATEGY
assert len(playbook.items) == 1
assert 1 in item.source_trials
def test_add_multiple_categories(self):
"""Test adding insights across different categories."""
playbook = AtomizerPlaybook()
playbook.add_insight(InsightCategory.STRATEGY, "Strategy 1")
playbook.add_insight(InsightCategory.MISTAKE, "Mistake 1")
playbook.add_insight(InsightCategory.TOOL, "Tool tip 1")
playbook.add_insight(InsightCategory.STRATEGY, "Strategy 2")
assert len(playbook.items) == 4
assert "str-00001" in playbook.items
assert "str-00002" in playbook.items
assert "mis-00001" in playbook.items
assert "tool-00001" in playbook.items
def test_deduplication(self):
"""Test that duplicate insights are merged."""
playbook = AtomizerPlaybook()
item1 = playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements")
item2 = playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements")
# Should merge into one item
assert len(playbook.items) == 1
# Helpful count incremented on duplicate
assert item2.helpful_count == 1
assert item1 is item2 # Same object
def test_outcome_tracking(self):
"""Test helpful/harmful tracking."""
playbook = AtomizerPlaybook()
item = playbook.add_insight(InsightCategory.STRATEGY, "Test insight")
playbook.record_outcome(item.id, helpful=True)
playbook.record_outcome(item.id, helpful=True)
playbook.record_outcome(item.id, helpful=False)
assert item.helpful_count == 2
assert item.harmful_count == 1
assert item.net_score == 1
assert item.confidence == 2/3
def test_confidence_calculation(self):
"""Test confidence score calculation."""
playbook = AtomizerPlaybook()
item = playbook.add_insight(InsightCategory.STRATEGY, "Test")
# Initial confidence is 0.5 (neutral)
assert item.confidence == 0.5
# After positive feedback
playbook.record_outcome(item.id, helpful=True)
assert item.confidence == 1.0
# After mixed feedback
playbook.record_outcome(item.id, helpful=False)
assert item.confidence == 0.5
def test_persistence(self, tmp_path):
"""Test save/load cycle."""
playbook = AtomizerPlaybook()
playbook.add_insight(InsightCategory.MISTAKE, "Don't do this", tags=["test"])
playbook.add_insight(InsightCategory.STRATEGY, "Do this instead")
# Record some outcomes
playbook.record_outcome("mis-00001", helpful=False)
playbook.record_outcome("str-00001", helpful=True)
save_path = tmp_path / "playbook.json"
playbook.save(save_path)
# Load and verify
loaded = AtomizerPlaybook.load(save_path)
assert len(loaded.items) == 2
assert "mis-00001" in loaded.items
assert loaded.items["mis-00001"].harmful_count == 1
assert loaded.items["str-00001"].helpful_count == 1
assert "test" in loaded.items["mis-00001"].tags
def test_pruning(self):
"""Test harmful item pruning."""
playbook = AtomizerPlaybook()
item = playbook.add_insight(InsightCategory.STRATEGY, "Bad advice")
# Record many harmful outcomes
for _ in range(5):
playbook.record_outcome(item.id, helpful=False)
assert item.net_score == -5
# Prune with threshold -3
removed = playbook.prune_harmful(threshold=-3)
assert removed == 1
assert len(playbook.items) == 0
def test_search_by_content(self):
"""Test content search functionality."""
playbook = AtomizerPlaybook()
playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements for thin walls")
playbook.add_insight(InsightCategory.STRATEGY, "Solid elements for thick parts")
playbook.add_insight(InsightCategory.MISTAKE, "Don't use coarse mesh")
results = playbook.search_by_content("shell elements")
assert len(results) >= 1
assert "shell" in results[0].content.lower()
def test_get_context_for_task(self):
"""Test context string generation."""
playbook = AtomizerPlaybook()
playbook.add_insight(InsightCategory.STRATEGY, "Strategy 1")
playbook.add_insight(InsightCategory.MISTAKE, "Mistake 1")
# Make strategy have higher score
playbook.record_outcome("str-00001", helpful=True)
playbook.record_outcome("str-00001", helpful=True)
context = playbook.get_context_for_task("optimization")
assert "Playbook" in context
assert "str-00001" in context
assert "helpful=2" in context
class TestAtomizerReflector:
"""Tests for the reflector component."""
def test_create_reflector(self):
"""Test creating a reflector."""
playbook = AtomizerPlaybook()
reflector = AtomizerReflector(playbook)
assert reflector.playbook is playbook
assert len(reflector.pending_insights) == 0
def test_analyze_successful_trial(self):
"""Test analysis of successful trial."""
playbook = AtomizerPlaybook()
reflector = AtomizerReflector(playbook)
outcome = OptimizationOutcome(
trial_number=1,
success=True,
objective_value=100.0,
constraint_violations=[],
solver_errors=[],
design_variables={"thickness": 1.0, "width": 5.0},
extractor_used="mass_extractor",
duration_seconds=60
)
insights = reflector.analyze_trial(outcome)
# Should extract success pattern
assert len(insights) >= 1
assert any(i.helpful for i in insights)
assert 1 in reflector.analyzed_trials
def test_analyze_failed_trial(self):
"""Test analysis of failed trial."""
playbook = AtomizerPlaybook()
reflector = AtomizerReflector(playbook)
outcome = OptimizationOutcome(
trial_number=1,
success=False,
objective_value=None,
constraint_violations=["stress > 250 MPa"],
solver_errors=["convergence failure at iteration 50"],
design_variables={"thickness": 0.5},
extractor_used="stress_extractor",
duration_seconds=120
)
insights = reflector.analyze_trial(outcome)
# Should extract failure patterns
assert len(insights) >= 2 # At least error + constraint
assert any(i.category == InsightCategory.MISTAKE for i in insights)
assert not any(i.helpful for i in insights if i.category == InsightCategory.MISTAKE)
def test_analyze_mesh_error(self):
"""Test analysis of mesh-related error."""
playbook = AtomizerPlaybook()
reflector = AtomizerReflector(playbook)
outcome = OptimizationOutcome(
trial_number=5,
success=False,
objective_value=None,
constraint_violations=[],
solver_errors=["Element distortion: negative jacobian detected"],
design_variables={},
extractor_used="",
duration_seconds=30
)
insights = reflector.analyze_trial(outcome)
# Should identify mesh error
assert any("mesh" in str(i.tags).lower() for i in insights)
def test_commit_insights(self):
"""Test committing insights to playbook."""
playbook = AtomizerPlaybook()
reflector = AtomizerReflector(playbook)
outcome = OptimizationOutcome(
trial_number=1,
success=True,
objective_value=100.0,
constraint_violations=[],
solver_errors=[],
design_variables={"thickness": 1.0},
extractor_used="mass_extractor",
duration_seconds=60
)
reflector.analyze_trial(outcome)
count = reflector.commit_insights()
assert count > 0
assert len(playbook.items) > 0
assert len(reflector.pending_insights) == 0 # Cleared after commit
def test_analyze_study_completion(self):
"""Test study-level analysis."""
playbook = AtomizerPlaybook()
reflector = AtomizerReflector(playbook)
# High success rate study
insights = reflector.analyze_study_completion(
study_name="test_study",
total_trials=100,
best_value=50.0,
convergence_rate=0.95,
method="TPE"
)
assert len(insights) >= 1
assert any("robust" in i.content.lower() for i in insights)
class TestSessionState:
"""Tests for session state management."""
def test_create_session(self):
"""Test creating a session."""
session = AtomizerSessionState(session_id="test_session")
assert session.session_id == "test_session"
assert session.exposed.task_type is None
assert len(session.exposed.recent_actions) == 0
def test_set_task_type(self):
"""Test setting task type."""
session = AtomizerSessionState(session_id="test")
session.exposed.task_type = TaskType.CREATE_STUDY
assert session.exposed.task_type == TaskType.CREATE_STUDY
def test_add_action(self):
"""Test adding actions."""
session = AtomizerSessionState(session_id="test")
session.add_action("Created study directory")
session.add_action("Configured optimization")
assert len(session.exposed.recent_actions) == 2
assert "Created study" in session.exposed.recent_actions[0]
def test_action_compression(self):
"""Test automatic action compression."""
session = AtomizerSessionState(session_id="test")
# Add more actions than the limit
for i in range(15):
session.add_action(f"Action {i}")
# Should be compressed
assert len(session.exposed.recent_actions) <= 12
assert any("earlier actions" in a.lower() for a in session.exposed.recent_actions)
def test_add_error(self):
"""Test adding errors."""
session = AtomizerSessionState(session_id="test")
session.add_error("Solver failed", error_type="convergence")
session.add_error("Mesh error")
assert len(session.exposed.recent_errors) == 2
assert "[convergence]" in session.exposed.recent_errors[0]
def test_update_study_status(self):
"""Test updating study status."""
session = AtomizerSessionState(session_id="test")
session.update_study_status(
name="bracket_opt",
status="running",
trials_completed=25,
trials_total=100,
best_value=0.5,
best_trial=20
)
assert session.exposed.study_name == "bracket_opt"
assert session.exposed.trials_completed == 25
assert session.exposed.best_value == 0.5
def test_llm_context_generation(self):
"""Test LLM context string generation."""
session = AtomizerSessionState(session_id="test")
session.exposed.task_type = TaskType.RUN_OPTIMIZATION
session.exposed.study_name = "test_study"
session.exposed.trials_completed = 50
session.exposed.trials_total = 100
session.exposed.best_value = 0.5
context = session.get_llm_context()
assert "test_study" in context
assert "50" in context
assert "0.5" in context
assert "run_optimization" in context
def test_isolated_state_access(self):
"""Test accessing isolated state."""
session = AtomizerSessionState(session_id="test")
session.isolated.nx_model_path = "/path/to/model.prt"
# Should not appear in LLM context
context = session.get_llm_context()
assert "/path/to/model.prt" not in context
# But accessible via explicit load
path = session.load_isolated_data("nx_model_path")
assert path == "/path/to/model.prt"
def test_persistence(self, tmp_path):
"""Test save/load cycle."""
session = AtomizerSessionState(session_id="test_persist")
session.exposed.task_type = TaskType.ANALYZE_RESULTS
session.exposed.study_name = "persist_study"
session.add_action("Test action")
save_path = tmp_path / "session.json"
session.save(save_path)
loaded = AtomizerSessionState.load(save_path)
assert loaded.session_id == "test_persist"
assert loaded.exposed.task_type == TaskType.ANALYZE_RESULTS
assert loaded.exposed.study_name == "persist_study"
class TestCompactionManager:
"""Tests for context compaction."""
def test_create_manager(self):
"""Test creating compaction manager."""
manager = CompactionManager(compaction_threshold=10, keep_recent=5)
assert manager.compaction_threshold == 10
assert manager.keep_recent == 5
assert len(manager.events) == 0
def test_add_events(self):
"""Test adding events."""
manager = CompactionManager(compaction_threshold=50)
manager.add_trial_event(trial_number=1, success=True, objective=100.0)
manager.add_trial_event(trial_number=2, success=False)
assert len(manager.events) == 2
def test_compaction_trigger(self):
"""Test that compaction triggers at threshold."""
manager = CompactionManager(compaction_threshold=10, keep_recent=5)
for i in range(15):
manager.add_event(ContextEvent(
timestamp=datetime.now(),
event_type=EventType.TRIAL_COMPLETE,
summary=f"Trial {i} complete",
details={"trial_number": i, "objective": i * 0.1}
))
assert manager.compaction_count > 0
assert len(manager.events) <= 10
def test_error_preservation(self):
"""Test that errors are never compacted."""
manager = CompactionManager(compaction_threshold=10, keep_recent=3)
# Add error early
manager.add_error_event("Critical solver failure", "solver_error")
# Add many regular events
for i in range(20):
manager.add_trial_event(trial_number=i, success=True, objective=i)
# Error should still be present
errors = [e for e in manager.events if e.event_type == EventType.ERROR]
assert len(errors) == 1
assert "Critical solver failure" in errors[0].summary
def test_milestone_preservation(self):
"""Test that milestones are preserved."""
manager = CompactionManager(compaction_threshold=10, keep_recent=3)
manager.add_milestone("Optimization started", {"method": "TPE"})
for i in range(20):
manager.add_trial_event(trial_number=i, success=True)
# Milestone should be preserved
milestones = [e for e in manager.events if e.event_type == EventType.MILESTONE]
assert len(milestones) == 1
def test_context_string_generation(self):
"""Test context string generation."""
manager = CompactionManager()
manager.add_trial_event(trial_number=1, success=True, objective=100.0)
manager.add_error_event("Test error")
context = manager.get_context_string()
assert "Optimization History" in context
assert "Trial 1" in context
assert "Test error" in context
def test_get_stats(self):
"""Test statistics generation."""
manager = CompactionManager(compaction_threshold=10, keep_recent=5)
for i in range(15):
manager.add_trial_event(trial_number=i, success=i % 2 == 0)
stats = manager.get_stats()
assert stats["total_events"] <= 15
assert stats["compaction_count"] > 0
class TestCacheMonitor:
"""Tests for cache monitoring."""
def test_create_optimizer(self):
"""Test creating cache optimizer."""
optimizer = ContextCacheOptimizer()
assert optimizer.stats.total_requests == 0
assert optimizer.stats.cache_hits == 0
def test_prepare_context(self):
"""Test context preparation."""
optimizer = ContextCacheOptimizer()
context = optimizer.prepare_context(
stable_prefix="Stable content",
semi_stable="Session content",
dynamic="User message"
)
assert "Stable content" in context
assert "Session content" in context
assert "User message" in context
assert optimizer.stats.total_requests == 1
def test_cache_hit_detection(self):
"""Test cache hit detection."""
optimizer = ContextCacheOptimizer()
# First request
optimizer.prepare_context("Stable", "Semi", "Dynamic 1")
# Second request with same stable prefix
optimizer.prepare_context("Stable", "Semi", "Dynamic 2")
assert optimizer.stats.total_requests == 2
assert optimizer.stats.cache_hits == 1
def test_cache_miss_detection(self):
"""Test cache miss detection."""
optimizer = ContextCacheOptimizer()
optimizer.prepare_context("Stable 1", "Semi", "Dynamic")
optimizer.prepare_context("Stable 2", "Semi", "Dynamic") # Different prefix
assert optimizer.stats.cache_hits == 0
assert optimizer.stats.cache_misses == 2
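The hit/miss logic exercised by the two tests above can be sketched as a stand-in: a request is a hit exactly when the stable prefix hashes the same as the previous request. This is an illustration, not the actual `ContextCacheOptimizer` implementation:

```python
import hashlib

# Illustrative stand-in for stable-prefix cache-hit detection.
class PrefixCacheStats:
    def __init__(self):
        self.total_requests = 0
        self.cache_hits = 0
        self.cache_misses = 0
        self._last_hash = None

    def record(self, stable_prefix: str) -> bool:
        digest = hashlib.sha256(stable_prefix.encode()).hexdigest()
        self.total_requests += 1
        hit = digest == self._last_hash
        if hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1
        self._last_hash = digest
        return hit
```

Two requests with the same prefix produce one hit; two different prefixes produce two misses, matching the assertions above.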
def test_stable_prefix_builder(self):
"""Test stable prefix builder."""
builder = StablePrefixBuilder()
builder.add_identity("I am Atomizer")
builder.add_capabilities("I can optimize")
builder.add_tools("Tool definitions here")
prefix = builder.build()
assert "I am Atomizer" in prefix
assert "I can optimize" in prefix
# Identity should come before capabilities (order 10 < 20)
assert prefix.index("Atomizer") < prefix.index("optimize")
class TestFeedbackLoop:
"""Tests for the feedback loop."""
def test_create_feedback_loop(self, tmp_path):
"""Test creating feedback loop."""
playbook_path = tmp_path / "playbook.json"
feedback = FeedbackLoop(playbook_path)
assert feedback.playbook is not None
assert feedback._total_trials_processed == 0
def test_process_successful_trial(self, tmp_path):
"""Test processing successful trial."""
playbook_path = tmp_path / "playbook.json"
feedback = FeedbackLoop(playbook_path)
result = feedback.process_trial_result(
trial_number=1,
success=True,
objective_value=100.0,
design_variables={"thickness": 1.0}
)
assert result["trial_number"] == 1
assert result["success"] is True
assert feedback._total_trials_processed == 1
assert feedback._successful_trials == 1
def test_process_failed_trial(self, tmp_path):
"""Test processing failed trial."""
playbook_path = tmp_path / "playbook.json"
feedback = FeedbackLoop(playbook_path)
result = feedback.process_trial_result(
trial_number=1,
success=False,
objective_value=0.0,
design_variables={"thickness": 0.5},
errors=["Convergence failure"]
)
assert result["success"] is False
assert feedback._failed_trials == 1
def test_finalize_study(self, tmp_path):
"""Test study finalization."""
playbook_path = tmp_path / "playbook.json"
feedback = FeedbackLoop(playbook_path)
# Process some trials
for i in range(10):
feedback.process_trial_result(
trial_number=i,
success=i % 3 != 0,
objective_value=100 - i if i % 3 != 0 else 0,
design_variables={"x": i * 0.1}
)
# Finalize
result = feedback.finalize_study({
"name": "test_study",
"total_trials": 10,
"best_value": 91,
"convergence_rate": 0.7
})
assert result["insights_added"] > 0
assert result["playbook_size"] > 0
assert playbook_path.exists() # Should be saved
def test_playbook_item_attribution(self, tmp_path):
"""Test that playbook items get updated based on outcomes."""
playbook_path = tmp_path / "playbook.json"
# Pre-populate playbook
playbook = AtomizerPlaybook()
item = playbook.add_insight(InsightCategory.STRATEGY, "Test strategy")
playbook.save(playbook_path)
# Create feedback loop and process trials with this item active
feedback = FeedbackLoop(playbook_path)
feedback.process_trial_result(
trial_number=1,
success=True,
objective_value=100.0,
design_variables={},
context_items_used=[item.id]
)
feedback.process_trial_result(
trial_number=2,
success=True,
objective_value=95.0,
design_variables={},
context_items_used=[item.id]
)
# Item should have positive feedback
assert feedback.playbook.items[item.id].helpful_count == 2
class TestContextBudgetManager:
"""Tests for context budget management."""
def test_create_manager(self):
"""Test creating budget manager."""
manager = ContextBudgetManager()
assert manager.budget["total"] == 100000
assert "stable_prefix" in manager.budget
def test_estimate_tokens(self):
"""Test token estimation."""
manager = ContextBudgetManager()
tokens = manager.estimate_tokens("Hello world") # 11 chars
assert tokens == 2 # 11 / 4 = 2.75 -> 2
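The heuristic this test relies on is roughly four characters per token with integer division, which is why 11 characters estimate to 2 tokens. A minimal sketch of that rule (not the actual `ContextBudgetManager` code):

```python
# Illustrative ~4-chars-per-token estimator; integer division floors,
# so 11 chars -> 2 tokens and 20000 chars -> 5000 tokens.
def estimate_tokens(text: str) -> int:
    return len(text) // 4
```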
def test_update_usage(self):
"""Test usage tracking."""
manager = ContextBudgetManager()
result = manager.update_usage("stable_prefix", "x" * 20000) # 5000 tokens
assert result["section"] == "stable_prefix"
assert result["tokens"] == 5000
assert result["over_budget"] is False
def test_over_budget_warning(self):
"""Test over-budget detection."""
manager = ContextBudgetManager()
# Exceed stable_prefix budget (5000 tokens = 20000 chars)
result = manager.update_usage("stable_prefix", "x" * 40000) # 10000 tokens
assert result["over_budget"] is True
assert "warning" in result
def test_get_status(self):
"""Test overall status reporting."""
manager = ContextBudgetManager()
manager.update_usage("stable_prefix", "x" * 10000)
manager.update_usage("protocols", "x" * 20000)
status = manager.get_status()
assert "total_used" in status
assert "utilization" in status
assert "recommendations" in status
if __name__ == "__main__":
pytest.main([__file__, "-v"])

# ============================================================
# test_context_integration.py
# ============================================================
"""
Integration test for full context engineering pipeline.
Tests the complete ACE (Agentic Context Engineering) workflow:
1. Starting fresh session
2. Running optimization with successes and failures
3. Verifying playbook learns from outcomes
4. Validating persistence across sessions
5. Testing context compaction under load
"""
import pytest
import json
import random
from optimization_engine.context.playbook import AtomizerPlaybook, InsightCategory
from optimization_engine.context.reflector import AtomizerReflector, OptimizationOutcome
from optimization_engine.context.session_state import AtomizerSessionState, TaskType
from optimization_engine.context.feedback_loop import FeedbackLoop
from optimization_engine.context.compaction import CompactionManager, EventType
from optimization_engine.context.cache_monitor import ContextCacheOptimizer, StablePrefixBuilder
class TestFullOptimizationPipeline:
"""End-to-end test of optimization with context engineering."""
def test_complete_optimization_cycle(self, tmp_path):
"""
Simulates a complete optimization run:
1. Initialize context engineering
2. Process multiple trials (mix of success/failure)
3. Finalize and commit learning
4. Verify playbook has learned
"""
playbook_path = tmp_path / "playbook.json"
# Initialize feedback loop
feedback = FeedbackLoop(playbook_path)
# Simulate study with mixed results
trial_results = []
random.seed(42)  # seed so the success/failure mix (and the failure-driven assertions below) is deterministic
for i in range(20):
success = random.random() > 0.3  # 70% success rate
obj_value = 100 - i * 2 + random.uniform(-5, 5) if success else None
result = feedback.process_trial_result(
trial_number=i,
success=success,
objective_value=obj_value if success else 0.0,
design_variables={
"thickness": 0.5 + i * 0.1,
"width": 10 + i * 0.5
},
context_items_used=[],
errors=["convergence failure"] if not success else None
)
trial_results.append({
"trial": i,
"success": success,
"objective_value": obj_value if success else None,
"insights": result.get("insights_extracted", 0)
})
# Finalize study
successful = sum(1 for r in trial_results if r["success"])
final_result = feedback.finalize_study({
"name": "integration_test_study",
"total_trials": 20,
"best_value": min(
r["objective_value"]
for r in trial_results if r["success"]
) if successful > 0 else 0,
"convergence_rate": successful / 20
})
# Verify learning occurred
assert final_result["insights_added"] > 0
assert final_result["playbook_size"] > 0
assert playbook_path.exists()
# Load and verify playbook content
playbook = AtomizerPlaybook.load(playbook_path)
# Should have some mistake insights from failures
mistakes = [
item for item in playbook.items.values()
if item.category == InsightCategory.MISTAKE
]
assert len(mistakes) > 0
def test_learning_persistence_across_sessions(self, tmp_path):
"""
Test that learning persists across multiple "sessions".
"""
playbook_path = tmp_path / "playbook.json"
# Session 1: Generate initial learning
feedback1 = FeedbackLoop(playbook_path)
for i in range(10):
feedback1.process_trial_result(
trial_number=i,
success=True,
objective_value=100 - i,
design_variables={"x": i}
)
feedback1.finalize_study({
"name": "session1",
"total_trials": 10,
"best_value": 91,
"convergence_rate": 1.0
})
# Verify session 1 created insights
pb1 = AtomizerPlaybook.load(playbook_path)
session1_items = len(pb1.items)
assert session1_items > 0
# Session 2: Continue learning
feedback2 = FeedbackLoop(playbook_path)
# Should have loaded existing playbook
assert len(feedback2.playbook.items) == session1_items
# Add more trials
for i in range(10, 20):
feedback2.process_trial_result(
trial_number=i,
success=i % 2 == 0,
objective_value=100 - i if i % 2 == 0 else 0.0,
design_variables={"x": i},
errors=["test error"] if i % 2 != 0 else None
)
feedback2.finalize_study({
"name": "session2",
"total_trials": 10,
"best_value": 80,
"convergence_rate": 0.5
})
# Verify combined learning
pb2 = AtomizerPlaybook.load(playbook_path)
assert len(pb2.items) >= session1_items # At least as many items
def test_playbook_pruning_over_time(self, tmp_path):
"""
Test that harmful insights get pruned.
"""
playbook_path = tmp_path / "playbook.json"
# Create playbook with a "bad" insight
playbook = AtomizerPlaybook()
bad_item = playbook.add_insight(
InsightCategory.STRATEGY,
"Use extremely coarse mesh" # Bad advice
)
# Give it many harmful outcomes
for _ in range(10):
playbook.record_outcome(bad_item.id, helpful=False)
playbook.save(playbook_path)
# Create feedback loop and finalize
feedback = FeedbackLoop(playbook_path)
# Process a few trials
for i in range(5):
feedback.process_trial_result(
trial_number=i,
success=True,
objective_value=100,
design_variables={}
)
feedback.finalize_study({
"name": "prune_test",
"total_trials": 5,
"best_value": 100,
"convergence_rate": 1.0
})
# Bad insight should be pruned (net_score -10 < threshold -3)
final_playbook = AtomizerPlaybook.load(playbook_path)
assert bad_item.id not in final_playbook.items
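The pruning rule this test exercises (net score below -3 drops the item, per the comment above) can be sketched as a stand-in; the dict shape here is hypothetical and not the real `AtomizerPlaybook` storage:

```python
# Illustrative net-score pruning: an insight whose helpful - harmful
# score falls below `threshold` is dropped from the playbook.
def prune_items(items, threshold=-3):
    return {
        item_id: item
        for item_id, item in items.items()
        if item["helpful"] - item["harmful"] >= threshold
    }
```

An item with 10 harmful outcomes and no helpful ones has net score -10 and is pruned, mirroring the test.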
def test_context_compaction_under_load(self, tmp_path):
"""
Test that compaction works correctly under high trial volume.
"""
manager = CompactionManager(
compaction_threshold=20,
keep_recent=10,
keep_errors=True
)
# Simulate 100 trials
errors_added = 0
for i in range(100):
success = i % 5 != 0
if success:
manager.add_trial_event(
trial_number=i,
success=True,
objective=100 - i * 0.5,
duration=random.uniform(30, 120)
)
else:
manager.add_trial_event(
trial_number=i,
success=False,
duration=random.uniform(30, 120)
)
manager.add_error_event(
f"Error in trial {i}",
error_type="test_error"
)
errors_added += 1
# Should have compacted
stats = manager.get_stats()
assert stats["compaction_count"] > 0
# All errors should be preserved
assert stats["error_events"] == errors_added
# Total events should be bounded
assert stats["total_events"] < 100 # Compaction reduced count
# Context string should be reasonable length
context = manager.get_context_string()
assert len(context) < 50000 # Not too long
def test_session_state_throughout_optimization(self, tmp_path):
"""
Test session state tracking throughout an optimization.
"""
session = AtomizerSessionState(session_id="integration_test")
session.exposed.task_type = TaskType.RUN_OPTIMIZATION
session.exposed.study_name = "state_test"
# Simulate optimization progress
for i in range(20):
session.add_action(f"Processing trial {i}")
if i % 5 == 0 and i > 0:
session.update_study_status(
name="state_test",
status="running",
trials_completed=i,
trials_total=20,
best_value=100 - i,
best_trial=i
)
if i % 7 == 0:
session.add_error(f"Minor issue at trial {i}")
# Verify state
assert session.exposed.trials_completed == 15 # Last update at i=15
assert len(session.exposed.recent_errors) <= 5 # Bounded
# Context should include key information
context = session.get_llm_context()
assert "state_test" in context
assert "running" in context
def test_cache_optimization_effectiveness(self):
"""
Test that cache optimization actually works.
"""
optimizer = ContextCacheOptimizer()
# Build stable prefix (should be cached)
builder = StablePrefixBuilder()
builder.add_identity("I am Atomizer, an optimization assistant")
builder.add_capabilities("I can run FEA optimizations")
builder.add_tools("Available tools: NX, Nastran, Optuna")
stable_prefix = builder.build()
# Simulate 10 requests with same stable prefix
for i in range(10):
optimizer.prepare_context(
stable_prefix=stable_prefix,
semi_stable=f"Session info for request {i}",
dynamic=f"User message {i}"
)
# Should have high cache hit rate
assert optimizer.stats.hit_rate >= 0.9 # 9/10 hits
assert optimizer.stats.estimated_savings_percent >= 80 # Good savings
class TestReflectorLearningPatterns:
"""Test that the reflector extracts useful patterns."""
def test_convergence_pattern_learning(self, tmp_path):
"""Test learning from convergence failures."""
playbook = AtomizerPlaybook()
reflector = AtomizerReflector(playbook)
# Simulate convergence failures
for i in range(5):
outcome = OptimizationOutcome(
trial_number=i,
success=False,
objective_value=None,
solver_errors=["Convergence failure at iteration 100"],
design_variables={"x": i * 0.1},
duration_seconds=300
)
reflector.analyze_trial(outcome)
reflector.commit_insights()
# Should have learned about convergence issues
convergence_insights = [
item for item in playbook.items.values()
if "convergence" in item.content.lower()
]
assert len(convergence_insights) > 0
def test_success_pattern_learning(self, tmp_path):
"""Test learning from successful designs."""
playbook = AtomizerPlaybook()
reflector = AtomizerReflector(playbook)
# Simulate successful designs with similar characteristics
for i in range(5):
outcome = OptimizationOutcome(
trial_number=i,
success=True,
objective_value=50 + i,
design_variables={
"thickness": 1.0 + i * 0.1, # All around 1.0-1.5
"width": 10.0 # Consistent
},
duration_seconds=60
)
reflector.analyze_trial(outcome)
reflector.commit_insights()
# Should have learned success patterns
success_insights = [
item for item in playbook.items.values()
if item.category == InsightCategory.STRATEGY
]
assert len(success_insights) > 0
class TestErrorTrackerIntegration:
"""Test error tracker plugin integration."""
def test_error_classification(self):
"""Test error classification function."""
from optimization_engine.plugins.post_solve.error_tracker import classify_error
assert classify_error("Convergence failure at iteration 50") == "convergence_failure"
assert classify_error("Element distortion detected") == "mesh_error"
assert classify_error("Matrix singularity") == "singularity"
assert classify_error("Out of memory") == "memory_error"
assert classify_error("License checkout failed") == "license_error"
assert classify_error("Random unknown error") == "unknown_error"
def test_error_tracking_hook(self, tmp_path):
"""Test the error tracking hook function."""
from optimization_engine.plugins.post_solve.error_tracker import track_error
context = {
"trial_number": 5,
"working_dir": str(tmp_path),
"output_dir": str(tmp_path),
"solver_returncode": 1,
"error_message": "Convergence failure at iteration 100",
"design_variables": {"x": 1.0, "y": 2.0}
}
result = track_error(context)
assert result["error_tracked"] is True
assert result["error_type"] == "convergence_failure"
# Should have created error log
error_log = tmp_path / "error_history.jsonl"
assert error_log.exists()
# Verify log content
with open(error_log) as f:
log_entry = json.loads(f.readline())
assert log_entry["trial"] == 5
assert log_entry["error_type"] == "convergence_failure"
class TestPlaybookContextGeneration:
"""Test context generation for different scenarios."""
def test_context_for_optimization_task(self):
"""Test context generation for optimization."""
playbook = AtomizerPlaybook()
# Add various insights (capture the strategy item so its ID is not hardcoded)
strategy = playbook.add_insight(InsightCategory.STRATEGY, "Start with coarse mesh")
playbook.add_insight(InsightCategory.MISTAKE, "Avoid tiny elements")
playbook.add_insight(InsightCategory.TOOL, "Use TPE for exploration")
# Give them different scores
playbook.record_outcome(strategy.id, helpful=True)
playbook.record_outcome(strategy.id, helpful=True)
context = playbook.get_context_for_task("optimization", max_items=10)
assert "Playbook" in context
assert "STRATEGY" in context
assert "coarse mesh" in context
def test_context_filtering_by_confidence(self):
"""Test that low-confidence items are filtered."""
playbook = AtomizerPlaybook()
# Add item with low confidence
item = playbook.add_insight(InsightCategory.STRATEGY, "Questionable advice")
playbook.record_outcome(item.id, helpful=True)
playbook.record_outcome(item.id, helpful=False)
playbook.record_outcome(item.id, helpful=False)
playbook.record_outcome(item.id, helpful=False)
# confidence = 1/4 = 0.25
# High min_confidence should exclude it
context = playbook.get_context_for_task(
"optimization",
min_confidence=0.5
)
assert "Questionable advice" not in context
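The confidence filter this test exercises can be sketched as a stand-in: confidence is the fraction of helpful outcomes (1 helpful of 4 total gives 0.25, as the comment above notes), and items below `min_confidence` are excluded. The function names and the `(helpful, harmful)` tuple shape are illustrative assumptions:

```python
# Illustrative confidence computation and filter, mirroring the
# 1/4 = 0.25 case filtered out by min_confidence=0.5 above.
def confidence(helpful: int, harmful: int) -> float:
    total = helpful + harmful
    return helpful / total if total else 0.0

def filter_by_confidence(items, min_confidence=0.5):
    return [
        content for content, (h, x) in items.items()
        if confidence(h, x) >= min_confidence
    ]
```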
def test_context_ordering_by_score(self):
"""Test that items are ordered by net score."""
playbook = AtomizerPlaybook()
# Add items with different scores
low = playbook.add_insight(InsightCategory.STRATEGY, "Low score advice")
high = playbook.add_insight(InsightCategory.STRATEGY, "High score advice")
# Give high item better score
for _ in range(5):
playbook.record_outcome(high.id, helpful=True)
playbook.record_outcome(low.id, helpful=True)
context = playbook.get_context_for_task("optimization")
# High score should appear first (guard against find() returning -1 for a missing item)
high_pos = context.find("High score")
low_pos = context.find("Low score")
assert high_pos != -1 and low_pos != -1
assert high_pos < low_pos
if __name__ == "__main__":
pytest.main([__file__, "-v"])