
# Atomizer Context Engineering Implementation Plan
## Claude Code Enhancement Strategy Using State-of-the-Art Context Engineering
**Version**: 1.1
**Date**: December 2025
**Updated**: December 28, 2025
**Author**: Antoine (with Claude)
**Purpose**: Transform Atomizer's LLM integration using cutting-edge context engineering patterns
**Prerequisite**: OPTIMIZATION_ENGINE_MIGRATION_PLAN.md (must complete BEFORE this plan)
---
## Prerequisite: Complete Migration First
**IMPORTANT**: This plan assumes the optimization_engine reorganization has been completed.
Before starting Context Engineering:
1. Complete all phases in `.claude/skills/modules/OPTIMIZATION_ENGINE_MIGRATION_PLAN.md`
2. Verify `optimization_engine/core/runner.py` exists (not `optimization_engine/runner.py`)
3. All imports use new paths (e.g., `from optimization_engine.core.runner import ...`)
4. Test suite passes with new structure
If migration is NOT complete, go do it first. Context Engineering builds on top of the reorganized structure.
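The path check in steps 2-3 can be scripted as a preflight. A hedged sketch: `migration_complete` is a hypothetical helper, and `optimization_engine.core.runner` is the plan's assumed post-migration path.

```python
import importlib.util


def migration_complete(module_path: str = "optimization_engine.core.runner") -> bool:
    """True if the post-migration module path resolves to an importable module."""
    try:
        return importlib.util.find_spec(module_path) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. optimization_engine) is missing entirely
        return False
```

A `False` result means the migration plan still needs to be executed before starting Phase 1.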
---
## Executive Summary
This plan transforms Atomizer from a traditional LLM-assisted tool into a **self-improving, context-aware optimization platform** by implementing state-of-the-art context engineering techniques. The core innovation is treating the Learning Atomizer Core (LAC) as an **evolving playbook** that accumulates institutional knowledge through structured generation, reflection, and curation cycles.
**Expected Outcomes**:
- 10-15% improvement in optimization task success rates
- 80%+ reduction in repeated mistakes across sessions
- Substantial cost reduction (up to ~10x on cached tokens) through KV-cache optimization
- True institutional memory that compounds over time
---
## Part 1: Architecture Mapping
### Current Atomizer Architecture → ACE Framework Alignment
| Atomizer Component | ACE Role | Enhancement |
|-------------------|----------|-------------|
| Optimization Runner | **Generator** | Produces optimization trajectories with success/failure signals |
| Post-run Analysis | **Reflector** | Extracts insights from optimization outcomes |
| Learning Atomizer Core (LAC) | **Curator** | Integrates insights into persistent playbook |
| Protocol Operating System (POS) | **Context Loader** | Selects relevant context per task type |
| Claude Code Sessions | **Agent** | Executes tasks using curated context |
### New Component: AtomizerPlaybook
```
┌─────────────────────────────────────────────────────────────────┐
│                     AtomizerPlaybook System                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐            │
│  │  Generator  │──▶│  Reflector  │──▶│   Curator   │            │
│  │ (Opt Runs)  │   │ (Analysis)  │   │(LAC Update) │            │
│  └─────────────┘   └─────────────┘   └─────────────┘            │
│         │                 │                 │                   │
│         ▼                 ▼                 ▼                   │
│  ┌──────────────────────────────────────────────────────┐       │
│  │              Structured Playbook Store               │       │
│  ├──────────────────────────────────────────────────────┤       │
│  │  [str-00001] helpful=8 harmful=0 ::                  │       │
│  │    "For thin-walled structures, start with shell     │       │
│  │     elements before trying solid mesh"               │       │
│  │                                                      │       │
│  │  [cal-00002] helpful=12 harmful=1 ::                 │       │
│  │    "Safety factor = yield_stress / max_von_mises"    │       │
│  │                                                      │       │
│  │  [mis-00003] helpful=0 harmful=6 ::                  │       │
│  │    "Never set convergence < 1e-8 for SOL 106"        │       │
│  └──────────────────────────────────────────────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
---
## Part 2: Implementation Phases
### Phase 1: Structured Playbook System (Week 1-2)
**Goal**: Convert LAC from unstructured memory to ACE-style itemized playbook
#### 1.1 Create Playbook Data Structure
**File**: `optimization_engine/context/playbook.py`
```python
from dataclasses import dataclass, field
from typing import List, Dict, Optional
from enum import Enum
import json
from pathlib import Path
from datetime import datetime
import hashlib


class InsightCategory(Enum):
    STRATEGY = "str"     # Optimization strategies
    CALCULATION = "cal"  # Formulas and calculations
    MISTAKE = "mis"      # Common mistakes to avoid
    TOOL = "tool"        # Tool usage patterns
    DOMAIN = "dom"       # Domain-specific knowledge (FEA, NX)
    WORKFLOW = "wf"      # Workflow patterns


@dataclass
class PlaybookItem:
    """Single insight in the playbook with helpful/harmful tracking."""
    id: str
    category: InsightCategory
    content: str
    helpful_count: int = 0
    harmful_count: int = 0
    created_at: str = field(default_factory=lambda: datetime.now().isoformat())
    last_used: Optional[str] = None
    source_trials: List[int] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    @property
    def net_score(self) -> int:
        return self.helpful_count - self.harmful_count

    @property
    def confidence(self) -> float:
        total = self.helpful_count + self.harmful_count
        if total == 0:
            return 0.5
        return self.helpful_count / total

    def to_context_string(self) -> str:
        """Format for injection into LLM context."""
        return f"[{self.id}] helpful={self.helpful_count} harmful={self.harmful_count} :: {self.content}"


@dataclass
class AtomizerPlaybook:
    """
    Evolving playbook that accumulates optimization knowledge.

    Based on ACE framework principles:
    - Incremental delta updates (never rewrite wholesale)
    - Helpful/harmful tracking for each insight
    - Semantic deduplication
    - Category-based organization
    """
    items: Dict[str, PlaybookItem] = field(default_factory=dict)
    version: int = 1
    last_updated: str = field(default_factory=lambda: datetime.now().isoformat())

    def _generate_id(self, category: InsightCategory) -> str:
        """Generate unique ID (highest existing number + 1, so IDs stay unique after pruning)."""
        existing = [
            int(key.rsplit("-", 1)[1])
            for key in self.items
            if key.startswith(f"{category.value}-")
        ]
        next_num = max(existing, default=0) + 1
        return f"{category.value}-{next_num:05d}"

    def add_insight(
        self,
        category: InsightCategory,
        content: str,
        source_trial: Optional[int] = None,
        tags: Optional[List[str]] = None
    ) -> PlaybookItem:
        """
        Add new insight with delta update (ACE principle).
        Checks for semantic duplicates before adding.
        """
        # Check for near-duplicates (simple implementation: case-insensitive match via hash)
        content_hash = hashlib.md5(content.lower().encode()).hexdigest()[:8]
        for item in self.items.values():
            existing_hash = hashlib.md5(item.content.lower().encode()).hexdigest()[:8]
            if content_hash == existing_hash:
                # Update existing instead of adding duplicate
                item.helpful_count += 1
                if source_trial is not None:  # trial numbers start at 0, so test against None
                    item.source_trials.append(source_trial)
                return item
        # Create new item
        item_id = self._generate_id(category)
        item = PlaybookItem(
            id=item_id,
            category=category,
            content=content,
            source_trials=[source_trial] if source_trial is not None else [],
            tags=tags or []
        )
        self.items[item_id] = item
        self.last_updated = datetime.now().isoformat()
        self.version += 1
        return item

    def record_outcome(self, item_id: str, helpful: bool):
        """Record whether using this insight was helpful or harmful."""
        if item_id in self.items:
            if helpful:
                self.items[item_id].helpful_count += 1
            else:
                self.items[item_id].harmful_count += 1
            self.items[item_id].last_used = datetime.now().isoformat()

    def get_context_for_task(
        self,
        task_type: str,
        max_items: int = 20,
        min_confidence: float = 0.5
    ) -> str:
        """
        Generate context string for LLM consumption.
        Filters by confidence, sorted by net score.
        (`task_type` is reserved for tag-based relevance filtering.)
        """
        relevant_items = [
            item for item in self.items.values()
            if item.confidence >= min_confidence
        ]
        # Sort by net score (most helpful first)
        relevant_items.sort(key=lambda x: x.net_score, reverse=True)
        # Group by category
        sections = {}
        for item in relevant_items[:max_items]:
            cat_name = item.category.name
            if cat_name not in sections:
                sections[cat_name] = []
            sections[cat_name].append(item.to_context_string())
        # Build context string
        lines = ["## Atomizer Knowledge Playbook\n"]
        for cat_name, items in sections.items():
            lines.append(f"### {cat_name}")
            lines.extend(items)
            lines.append("")
        return "\n".join(lines)

    def prune_harmful(self, threshold: int = -3):
        """Remove items that have proven consistently harmful."""
        to_remove = [
            item_id for item_id, item in self.items.items()
            if item.net_score <= threshold
        ]
        for item_id in to_remove:
            del self.items[item_id]

    def save(self, path: Path):
        """Persist playbook to JSON."""
        data = {
            "version": self.version,
            "last_updated": self.last_updated,
            "items": {
                k: {
                    "id": v.id,
                    "category": v.category.value,
                    "content": v.content,
                    "helpful_count": v.helpful_count,
                    "harmful_count": v.harmful_count,
                    "created_at": v.created_at,
                    "last_used": v.last_used,
                    "source_trials": v.source_trials,
                    "tags": v.tags
                }
                for k, v in self.items.items()
            }
        }
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, 'w') as f:
            json.dump(data, f, indent=2)

    @classmethod
    def load(cls, path: Path) -> "AtomizerPlaybook":
        """Load playbook from JSON."""
        if not path.exists():
            return cls()
        with open(path) as f:
            data = json.load(f)
        playbook = cls(
            version=data.get("version", 1),
            last_updated=data.get("last_updated", datetime.now().isoformat())
        )
        for item_data in data.get("items", {}).values():
            item = PlaybookItem(
                id=item_data["id"],
                category=InsightCategory(item_data["category"]),
                content=item_data["content"],
                helpful_count=item_data.get("helpful_count", 0),
                harmful_count=item_data.get("harmful_count", 0),
                created_at=item_data.get("created_at", ""),
                last_used=item_data.get("last_used"),
                source_trials=item_data.get("source_trials", []),
                tags=item_data.get("tags", [])
            )
            playbook.items[item.id] = item
        return playbook
```
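The helpful/harmful counters are the entire selection signal: `net_score` ranks items and `confidence` gates them. A minimal standalone sketch of that arithmetic, applied to the example items from the diagram in Part 1:

```python
def net_score(helpful: int, harmful: int) -> int:
    """Ranking signal: most-corroborated items surface first."""
    return helpful - harmful


def confidence(helpful: int, harmful: int) -> float:
    """Gating signal: unproven items start neutral at 0.5."""
    total = helpful + harmful
    return 0.5 if total == 0 else helpful / total


print(net_score(12, 1))   # cal-00002: strongly positive, always selected
print(confidence(0, 6))   # mis-00003: 0.0, filtered out and a prune candidate
```

With the default `min_confidence=0.5`, an item must be at least break-even before it re-enters context, and `prune_harmful(threshold=-3)` removes it once failures outnumber successes by three.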
#### 1.2 Create Reflector Component
**File**: `optimization_engine/context/reflector.py`
```python
"""
Reflector: Analyzes optimization outcomes to extract insights.
Part of the ACE (Agentic Context Engineering) implementation for Atomizer.
"""
from typing import Dict, Any, List, Optional
from dataclasses import dataclass

from .playbook import AtomizerPlaybook, InsightCategory


@dataclass
class OptimizationOutcome:
    """Captured outcome from an optimization run."""
    trial_number: int
    success: bool
    objective_value: Optional[float]
    constraint_violations: List[str]
    solver_errors: List[str]
    design_variables: Dict[str, float]
    extractor_used: str
    duration_seconds: float
    notes: str = ""


class AtomizerReflector:
    """
    Analyzes optimization outcomes and extracts actionable insights.

    Implements the Reflector role from ACE framework:
    - Examines successful and failed trials
    - Extracts patterns that led to success/failure
    - Formats insights for Curator integration
    """

    def __init__(self, playbook: AtomizerPlaybook):
        self.playbook = playbook
        self.pending_insights: List[Dict[str, Any]] = []

    def analyze_trial(self, outcome: OptimizationOutcome) -> List[Dict[str, Any]]:
        """
        Analyze a single trial outcome and extract insights.
        Returns list of insight candidates (not yet added to playbook).
        """
        insights = []
        # Analyze solver errors
        for error in outcome.solver_errors:
            if "convergence" in error.lower():
                insights.append({
                    "category": InsightCategory.MISTAKE,
                    "content": f"Convergence failure with config: {self._summarize_config(outcome)}",
                    "helpful": False,
                    "trial": outcome.trial_number
                })
            elif "mesh" in error.lower():
                insights.append({
                    "category": InsightCategory.MISTAKE,
                    "content": f"Mesh-related error: {error[:100]}",
                    "helpful": False,
                    "trial": outcome.trial_number
                })
        # Analyze successful patterns
        if outcome.success and outcome.objective_value is not None:
            # Record successful design variable ranges
            insights.append({
                "category": InsightCategory.STRATEGY,
                "content": f"Successful design: {self._summarize_design(outcome)}",
                "helpful": True,
                "trial": outcome.trial_number
            })
        # Analyze constraint violations
        for violation in outcome.constraint_violations:
            insights.append({
                "category": InsightCategory.MISTAKE,
                "content": f"Constraint violation: {violation}",
                "helpful": False,
                "trial": outcome.trial_number
            })
        self.pending_insights.extend(insights)
        return insights

    def analyze_study_completion(
        self,
        study_name: str,
        total_trials: int,
        best_value: float,
        convergence_rate: float
    ) -> List[Dict[str, Any]]:
        """
        Analyze completed study and extract high-level insights.
        """
        insights = []
        if convergence_rate > 0.9:
            insights.append({
                "category": InsightCategory.STRATEGY,
                "content": f"Study '{study_name}' achieved {convergence_rate:.0%} convergence - configuration is robust",
                "helpful": True,
                "trial": None
            })
        elif convergence_rate < 0.5:
            insights.append({
                "category": InsightCategory.MISTAKE,
                "content": f"Study '{study_name}' had only {convergence_rate:.0%} convergence - review mesh and solver settings",
                "helpful": False,
                "trial": None
            })
        # Queue study-level insights so commit_insights() picks them up
        self.pending_insights.extend(insights)
        return insights

    def commit_insights(self) -> int:
        """
        Commit pending insights to playbook (Curator handoff).
        Returns number of insights added.
        """
        count = 0
        for insight in self.pending_insights:
            item = self.playbook.add_insight(
                category=insight["category"],
                content=insight["content"],
                source_trial=insight.get("trial")
            )
            if not insight.get("helpful", True):
                self.playbook.record_outcome(item.id, helpful=False)
            count += 1
        self.pending_insights = []
        return count

    def _summarize_config(self, outcome: OptimizationOutcome) -> str:
        """Create brief config summary."""
        return f"extractor={outcome.extractor_used}, vars={len(outcome.design_variables)}"

    def _summarize_design(self, outcome: OptimizationOutcome) -> str:
        """Create brief design summary."""
        vars_summary = ", ".join(
            f"{k}={v:.3g}" for k, v in list(outcome.design_variables.items())[:3]
        )
        return f"obj={outcome.objective_value:.4g}, {vars_summary}"
```
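The error-keyword routing in `analyze_trial` is a small string-to-dict transform. A self-contained sketch (`extract_error_insights` is a hypothetical name, with the category code inlined as the string `"mis"`) of what two solver errors become:

```python
def extract_error_insights(trial_number: int, solver_errors: list) -> list:
    """Sketch of analyze_trial's error branch: solver errors -> mistake candidates."""
    insights = []
    for error in solver_errors:
        lowered = error.lower()
        if "convergence" in lowered:
            content = "Convergence failure"
        elif "mesh" in lowered:
            content = f"Mesh-related error: {error[:100]}"
        else:
            continue  # other errors are left for classify_error-style handling
        insights.append({"category": "mis", "content": content,
                         "helpful": False, "trial": trial_number})
    return insights


candidates = extract_error_insights(7, ["CONVERGENCE NOT ACHIEVED",
                                        "Distorted mesh in region 3"])
# -> two mistake candidates, both tagged trial=7 and helpful=False
```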
#### 1.3 Integration with OptimizationRunner
**File**: `optimization_engine/core/runner.py` (modifications - POST-MIGRATION PATH)
```python
# Add to imports
from optimization_engine.context.playbook import AtomizerPlaybook
from optimization_engine.context.reflector import AtomizerReflector, OptimizationOutcome


class OptimizationRunner:
    def __init__(self, ...):
        # ... existing init ...

        # Initialize context engineering components
        self.playbook = AtomizerPlaybook.load(
            self.output_dir / "playbook.json"
        )
        self.reflector = AtomizerReflector(self.playbook)

    def _objective(self, trial: optuna.Trial) -> float:
        # ... existing trial logic ...

        # After trial completion, capture outcome for reflection
        outcome = OptimizationOutcome(
            trial_number=trial.number,
            success=not failed,
            objective_value=objective_value if not failed else None,
            constraint_violations=constraint_violations,
            solver_errors=solver_errors,
            design_variables=design_vars,
            extractor_used=self.config.get("extractor", "unknown"),
            duration_seconds=trial_duration
        )
        self.reflector.analyze_trial(outcome)
        return objective_value

    def run(self, n_trials: int) -> Dict[str, Any]:
        # ... existing run logic ...

        # After study completion
        self.reflector.analyze_study_completion(
            study_name=self.study.study_name,
            total_trials=len(self.study.trials),
            best_value=self.study.best_value,
            convergence_rate=successful_trials / total_trials
        )
        # Commit insights and save playbook
        insights_added = self.reflector.commit_insights()
        self.playbook.save(self.output_dir / "playbook.json")
        print(f"Added {insights_added} insights to playbook")
```
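One detail worth pinning down in the `run` hook above: `convergence_rate` divides by the trial count, which is zero if a study aborts before any trial finishes. A small guarded helper (name assumed) avoids the edge case:

```python
def convergence_rate(successful_trials: int, total_trials: int) -> float:
    """Fraction of trials that converged; 0.0 for an empty study."""
    return successful_trials / total_trials if total_trials else 0.0
```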
---
### Phase 2: Context Isolation & Loading (Week 2-3)
**Goal**: Implement Write-Select-Compress-Isolate pattern for POS
#### 2.1 Session State Schema
**File**: `optimization_engine/context/session_state.py`
```python
"""
Session state management with context isolation.

Implements the "Isolate" pattern from context engineering:
- Exposed fields are sent to LLM
- Isolated fields are accessed selectively
"""
from pydantic import BaseModel, Field
from typing import Dict, List, Optional, Any
from datetime import datetime
from enum import Enum


class TaskType(Enum):
    CREATE_STUDY = "create_study"
    RUN_OPTIMIZATION = "run_optimization"
    MONITOR_PROGRESS = "monitor_progress"
    ANALYZE_RESULTS = "analyze_results"
    DEBUG_ERROR = "debug_error"
    CONFIGURE_SETTINGS = "configure_settings"


class ExposedState(BaseModel):
    """State exposed to LLM at every turn."""
    # Current task context
    task_type: Optional[TaskType] = None
    current_objective: str = ""
    # Recent history (compressed)
    recent_actions: List[str] = Field(default_factory=list, max_items=10)
    recent_errors: List[str] = Field(default_factory=list, max_items=5)
    # Active study summary
    study_name: Optional[str] = None
    study_status: str = "unknown"
    trials_completed: int = 0
    best_value: Optional[float] = None
    # Playbook excerpt (most relevant items)
    active_playbook_items: List[str] = Field(default_factory=list, max_items=15)


class IsolatedState(BaseModel):
    """State isolated from LLM - accessed selectively."""
    # Full optimization history (can be large)
    full_trial_history: List[Dict[str, Any]] = Field(default_factory=list)
    # NX session state (heavy, complex)
    nx_model_path: Optional[str] = None
    nx_expressions: Dict[str, Any] = Field(default_factory=dict)
    # Neural network cache
    neural_predictions: Dict[str, float] = Field(default_factory=dict)
    # Full playbook (loaded on demand)
    full_playbook_path: Optional[str] = None
    # Debug information
    last_solver_output: str = ""
    last_f06_content: str = ""


class AtomizerSessionState(BaseModel):
    """
    Complete session state with exposure control.

    The exposed state is automatically injected into every LLM context.
    The isolated state is accessed only when explicitly needed.
    """
    session_id: str
    created_at: datetime = Field(default_factory=datetime.now)
    last_updated: datetime = Field(default_factory=datetime.now)
    exposed: ExposedState = Field(default_factory=ExposedState)
    isolated: IsolatedState = Field(default_factory=IsolatedState)

    def get_llm_context(self) -> str:
        """Generate context string for LLM consumption."""
        lines = [
            "## Current Session State",
            "",
            f"**Task**: {self.exposed.task_type.value if self.exposed.task_type else 'Not set'}",
            f"**Objective**: {self.exposed.current_objective}",
            "",
        ]
        if self.exposed.study_name:
            lines.extend([
                f"### Active Study: {self.exposed.study_name}",
                f"- Status: {self.exposed.study_status}",
                f"- Trials: {self.exposed.trials_completed}",
                f"- Best: {self.exposed.best_value}",
                "",
            ])
        if self.exposed.recent_actions:
            lines.append("### Recent Actions")
            for action in self.exposed.recent_actions[-5:]:
                lines.append(f"- {action}")
            lines.append("")
        if self.exposed.recent_errors:
            lines.append("### Recent Errors (address these)")
            for error in self.exposed.recent_errors:
                lines.append(f"- ⚠️ {error}")
            lines.append("")
        if self.exposed.active_playbook_items:
            lines.append("### Relevant Knowledge")
            for item in self.exposed.active_playbook_items:
                lines.append(f"- {item}")
            lines.append("")
        return "\n".join(lines)

    def add_action(self, action: str):
        """Record an action (auto-compresses old actions)."""
        self.exposed.recent_actions.append(action)
        if len(self.exposed.recent_actions) > 10:
            # Compress: keep first, last 5, summarize middle
            self.exposed.recent_actions = (
                [self.exposed.recent_actions[0]] +
                ["... (earlier actions summarized)"] +
                self.exposed.recent_actions[-5:]
            )
        self.last_updated = datetime.now()

    def add_error(self, error: str):
        """Record an error for LLM attention."""
        self.exposed.recent_errors.append(error)
        self.exposed.recent_errors = self.exposed.recent_errors[-5:]
        self.last_updated = datetime.now()

    def load_isolated_data(self, key: str) -> Any:
        """Explicitly load isolated data when needed."""
        return getattr(self.isolated, key, None)
```
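The trim rule in `add_action` is easy to check in isolation; this standalone mirror (`compress_actions` is a hypothetical name) shows twelve actions collapsing to seven entries:

```python
def compress_actions(actions: list, limit: int = 10, keep_tail: int = 5) -> list:
    """Mirror of add_action's compression: first entry + marker + last keep_tail."""
    if len(actions) <= limit:
        return list(actions)
    return [actions[0], "... (earlier actions summarized)"] + list(actions[-keep_tail:])


history = [f"action {i}" for i in range(12)]
compressed = compress_actions(history)
# -> ['action 0', '... (earlier actions summarized)', 'action 7', ..., 'action 11']
```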
#### 2.2 Context Loader Enhancement
**File**: `.claude/skills/02_CONTEXT_LOADER.md` (updated)
```markdown
# Context Loader - Enhanced with Playbook Integration
## Loading Rules by Task Type
### CREATE_STUDY
**Always Load**:
- `core/study-creation-core.md`
- `SYS_12_EXTRACTOR_LIBRARY.md`
- **Playbook**: Filter by tags=['study_creation', 'design_variables']
**Load If**:
- `modules/zernike-optimization.md`: if "telescope" or "mirror" in query
- `modules/neural-acceleration.md`: if trials > 50
### RUN_OPTIMIZATION
**Always Load**:
- `OP_02_RUN_OPTIMIZATION.md`
- **Playbook**: Filter by tags=['solver', 'convergence', 'mesh']
**Load If**:
- Recent errors exist → Include mistake items from playbook
### DEBUG_ERROR
**Always Load**:
- `OP_06_TROUBLESHOOT.md`
- **Playbook**: Filter by category=MISTAKE, min_confidence=0.3
- Session state recent_errors
**Load If**:
- "convergence" in error → Load solver-specific playbook items
- "mesh" in error → Load mesh-specific playbook items
## Playbook Integration Pattern
```python
# In context loader
def load_context_for_task(task_type: TaskType, session: AtomizerSessionState):
    context_parts = []
    # 1. Load protocol docs (existing behavior)
    context_parts.append(load_protocol(task_type))
    # 2. Load session state (exposed only)
    context_parts.append(session.get_llm_context())
    # 3. Load relevant playbook items
    playbook = AtomizerPlaybook.load(PLAYBOOK_PATH)
    playbook_context = playbook.get_context_for_task(
        task_type=task_type.value,
        max_items=15,
        min_confidence=0.6
    )
    context_parts.append(playbook_context)
    # 4. Add error-specific items if debugging
    if task_type == TaskType.DEBUG_ERROR:
        for error in session.exposed.recent_errors:
            relevant = playbook.search_by_content(error, category=InsightCategory.MISTAKE)
            context_parts.extend([item.to_context_string() for item in relevant])
    return "\n\n---\n\n".join(context_parts)
```
```
---
### Phase 3: KV-Cache Optimization (Week 3)
**Goal**: Maximize cache hits for 10x cost reduction
#### 3.1 Stable Prefix Architecture
**File**: `.claude/skills/SYSTEM_PROMPT_TEMPLATE.md`
```markdown
# Atomizer System Prompt Template
## Structure for KV-Cache Optimization
The system prompt is structured to maximize KV-cache hits:
```
[SECTION 1: STABLE - Never changes]
- Atomizer identity and capabilities
- Core principles (LLM-driven optimization)
- Tool schemas and definitions
- Base protocol routing table
[SECTION 2: SEMI-STABLE - Changes per session type]
- Active protocol definition
- Task-specific instructions
- Relevant playbook items (top 10 by score)
[SECTION 3: DYNAMIC - Changes every turn]
- Current session state
- Recent actions/errors
- User's latest message
```
## Implementation
### Stable Prefix (Cache This)
```
You are assisting with **Atomizer**, an LLM-first FEA optimization framework.
## Core Capabilities
- Natural language → optimization configuration
- NX Nastran integration via journals
- Multi-strategy optimization (TPE, CMA-ES, NSGA-II)
- Real-time progress monitoring
- Neural acceleration (600-1000x speedup)
## Principles
1. LLM-driven - users describe goals in plain language
2. Never modify master models - work on copies
3. Always validate before running
4. Document everything
## Available Tools
[... tool schemas - NEVER CHANGE ORDER ...]
## Protocol Routing
| Intent | Protocol | Priority |
|--------|----------|----------|
| Create study | OP_01 | 1 |
| Run optimization | OP_02 | 1 |
| Monitor progress | OP_03 | 2 |
| Analyze results | OP_04 | 2 |
| Debug errors | OP_06 | 1 |
```
### Semi-Stable Section (Per Session)
```
## Active Task: {task_type}
### Loaded Protocol: {protocol_name}
{protocol_content}
### Relevant Knowledge (from {playbook_version})
{playbook_items}
```
### Dynamic Section (Per Turn)
```
## Current State
{session_state.get_llm_context()}
## User Message
{user_message}
```
```
#### 3.2 Cache Monitoring
**File**: `optimization_engine/context/cache_monitor.py`
```python
"""
Monitor and optimize KV-cache hit rates.
"""
from dataclasses import dataclass
from typing import Optional
import hashlib


@dataclass
class CacheStats:
    total_requests: int = 0
    cache_hits: int = 0
    prefix_length: int = 0

    @property
    def hit_rate(self) -> float:
        if self.total_requests == 0:
            return 0.0
        return self.cache_hits / self.total_requests


class ContextCacheOptimizer:
    """
    Tracks and optimizes context for cache efficiency.
    """

    def __init__(self):
        self.stats = CacheStats()
        self._last_prefix_hash: Optional[str] = None

    def prepare_context(
        self,
        stable_prefix: str,
        semi_stable: str,
        dynamic: str
    ) -> str:
        """
        Assemble context optimized for caching.
        Tracks whether prefix changed (cache miss).
        """
        # Hash the stable prefix
        prefix_hash = hashlib.md5(stable_prefix.encode()).hexdigest()
        self.stats.total_requests += 1
        if prefix_hash == self._last_prefix_hash:
            self.stats.cache_hits += 1
        self._last_prefix_hash = prefix_hash
        self.stats.prefix_length = len(stable_prefix)
        # Assemble with clear boundaries
        return f"""{stable_prefix}
---
{semi_stable}
---
{dynamic}"""

    def get_report(self) -> str:
        """Generate cache efficiency report."""
        return f"""
Cache Statistics:
- Requests: {self.stats.total_requests}
- Cache Hits: {self.stats.cache_hits}
- Hit Rate: {self.stats.hit_rate:.1%}
- Stable Prefix: {self.stats.prefix_length} chars

Estimated Cost Savings: {self.stats.hit_rate * 90:.0f}%
(Based on 10x cost difference between cached/uncached tokens)
"""
```
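The bookkeeping in `prepare_context` amounts to hashing the stable prefix and comparing against the previous request. A quick simulation over an assumed request sequence shows the accounting:

```python
import hashlib


def count_hits(prefixes: list) -> int:
    """Count requests whose stable-prefix hash matched the previous request's."""
    hits, last = 0, None
    for prefix in prefixes:
        digest = hashlib.md5(prefix.encode()).hexdigest()
        if digest == last:
            hits += 1
        last = digest
    return hits


# Four requests on one stable prefix, one edit, then two more requests:
requests = ["STABLE v1"] * 4 + ["STABLE v2"] * 3
hits = count_hits(requests)  # 5 of 7: every request after an unchanged prefix
```

Even a one-character edit to the stable section invalidates the prefix for that request, which is why the template insists tool schemas never change order.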
---
### Phase 4: Error Recovery & Learning (Week 4)
**Goal**: Leave wrong turns in context, learn from failures
#### 4.1 Error Preservation Hook
**File**: `optimization_engine/plugins/post_solve/error_tracker.py`
```python
"""
Error Tracker Hook
Preserves solver errors and failures in context for learning.
Based on Manus insight: "leave the wrong turns in the context"
"""
from pathlib import Path
from datetime import datetime
import json
def track_error(context: dict) -> dict:
"""
Hook that preserves errors for context learning.
Called at post_solve when solver fails.
"""
trial_number = context.get('trial_number', -1)
output_dir = Path(context.get('output_dir', '.'))
# Capture error details
error_info = {
"trial": trial_number,
"timestamp": datetime.now().isoformat(),
"solver_returncode": context.get('solver_returncode'),
"error_type": classify_error(context),
"design_variables": context.get('design_variables', {}),
"error_message": context.get('error_message', ''),
"f06_snippet": extract_f06_error(context.get('f06_path'))
}
# Append to error log (never overwrite - accumulate)
error_log_path = output_dir / "error_history.jsonl"
with open(error_log_path, 'a') as f:
f.write(json.dumps(error_info) + "\n")
# Update session state for LLM context
if 'session_state' in context:
context['session_state'].add_error(
f"Trial {trial_number}: {error_info['error_type']} - {error_info['error_message'][:100]}"
)
return {"error_tracked": True, "error_type": error_info['error_type']}
def classify_error(context: dict) -> str:
"""Classify error type for playbook categorization."""
error_msg = context.get('error_message', '').lower()
if 'convergence' in error_msg:
return "convergence_failure"
elif 'mesh' in error_msg or 'element' in error_msg:
return "mesh_error"
elif 'singular' in error_msg or 'matrix' in error_msg:
return "singularity"
elif 'memory' in error_msg or 'allocation' in error_msg:
return "memory_error"
elif 'license' in error_msg:
return "license_error"
else:
return "unknown_error"
def extract_f06_error(f06_path: str) -> str:
"""Extract error section from F06 file."""
if not f06_path or not Path(f06_path).exists():
return ""
try:
with open(f06_path) as f:
content = f.read()
# Look for error indicators
error_markers = ["*** USER FATAL", "*** SYSTEM FATAL", "*** USER WARNING"]
for marker in error_markers:
if marker in content:
idx = content.index(marker)
return content[idx:idx+500]
return ""
except Exception:
return ""
# Hook registration
HOOK_CONFIG = {
"name": "error_tracker",
"hook_point": "post_solve",
"priority": 100, # Run early to capture before cleanup
"enabled": True,
"description": "Preserves solver errors for context learning"
}
```
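The append-only JSONL log above ("never overwrite - accumulate") round-trips as below; the temp directory is a stand-in for the real `output_dir`:

```python
import json
import tempfile
from pathlib import Path


def append_error(log_path: Path, record: dict) -> None:
    """Append one record as a JSON line; earlier lines are never rewritten."""
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")


def read_errors(log_path: Path) -> list:
    """Load the full accumulated error history, in write order."""
    with open(log_path) as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "error_history.jsonl"
    append_error(log, {"trial": 1, "error_type": "mesh_error"})
    append_error(log, {"trial": 2, "error_type": "convergence_failure"})
    history = read_errors(log)  # both records survive, in order
```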
#### 4.2 Feedback Loop Integration
**File**: `optimization_engine/context/feedback_loop.py`
```python
"""
Feedback Loop: Connects optimization outcomes to playbook updates.
Implements ACE's "leverage natural execution feedback" principle.
"""
from typing import Dict, Any, List, Optional
from pathlib import Path

from .playbook import AtomizerPlaybook
from .reflector import AtomizerReflector, OptimizationOutcome


class FeedbackLoop:
    """
    Automated feedback loop that learns from optimization runs.

    Key insight from ACE: Use execution feedback (success/failure)
    as the learning signal, not labeled data.
    """

    def __init__(self, playbook_path: Path):
        self.playbook = AtomizerPlaybook.load(playbook_path)
        self.reflector = AtomizerReflector(self.playbook)
        self.playbook_path = playbook_path

    def process_trial_result(
        self,
        trial_number: int,
        success: bool,
        objective_value: float,
        design_variables: Dict[str, float],
        context_items_used: List[str],  # Which playbook items were in context
        errors: Optional[List[str]] = None
    ):
        """
        Process a trial result and update playbook accordingly.

        This is the core learning mechanism:
        - If trial succeeded with certain playbook items → increase helpful count
        - If trial failed with certain playbook items → increase harmful count
        """
        # Update playbook item scores based on outcome
        for item_id in context_items_used:
            self.playbook.record_outcome(item_id, helpful=success)
        # Create outcome for reflection
        outcome = OptimizationOutcome(
            trial_number=trial_number,
            success=success,
            objective_value=objective_value if success else None,
            constraint_violations=[],
            solver_errors=errors or [],
            design_variables=design_variables,
            extractor_used="",
            duration_seconds=0
        )
        # Reflect on outcome
        self.reflector.analyze_trial(outcome)

    def finalize_study(self, study_stats: Dict[str, Any]):
        """
        Called when study completes. Commits insights and prunes playbook.
        """
        # Analyze study-level patterns
        self.reflector.analyze_study_completion(
            study_name=study_stats.get("name", "unknown"),
            total_trials=study_stats.get("total_trials", 0),
            best_value=study_stats.get("best_value", 0),
            convergence_rate=study_stats.get("convergence_rate", 0)
        )
        # Commit all pending insights
        insights_added = self.reflector.commit_insights()
        # Prune consistently harmful items
        self.playbook.prune_harmful(threshold=-3)
        # Save updated playbook
        self.playbook.save(self.playbook_path)
        return {
            "insights_added": insights_added,
            "playbook_size": len(self.playbook.items),
            "playbook_version": self.playbook.version
        }
```
---
### Phase 5: Context Compaction (Week 4-5)
**Goal**: Handle long-running optimization sessions without context overflow
#### 5.1 Compaction Manager
**File**: `optimization_engine/context/compaction.py`
```python
"""
Context Compaction for Long-Running Optimizations

Based on Google ADK's compaction architecture:
- Trigger compaction when threshold reached
- Summarize older events
- Preserve recent detail
"""
from typing import List, Dict, Any
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ContextEvent:
    """Single event in optimization context."""
    timestamp: datetime
    event_type: str  # trial_start, trial_complete, error, milestone
    summary: str
    details: Dict[str, Any] = field(default_factory=dict)
    compacted: bool = False


class CompactionManager:
    """
    Manages context compaction for long optimization sessions.

    Strategy:
    - Keep last N events in full detail
    - Summarize older events into milestone markers
    - Preserve error events (never compact errors)
    """

    def __init__(
        self,
        compaction_threshold: int = 50,
        keep_recent: int = 20,
        keep_errors: bool = True
    ):
        self.events: List[ContextEvent] = []
        self.compaction_threshold = compaction_threshold
        self.keep_recent = keep_recent
        self.keep_errors = keep_errors
        self.compaction_count = 0

    def add_event(self, event: ContextEvent):
        """Add event and trigger compaction if needed."""
        self.events.append(event)
        if len(self.events) > self.compaction_threshold:
            self._compact()

    def _compact(self):
        """
        Compact older events into summaries.

        Preserves:
        - All error events
        - Last `keep_recent` events
        - Milestone summaries of compacted regions
        """
        if len(self.events) <= self.keep_recent:
            return
        # Split into old and recent
        old_events = self.events[:-self.keep_recent]
        recent_events = self.events[-self.keep_recent:]
        # Separate errors from old events
        error_events = [e for e in old_events if e.event_type == "error"]
        non_error_events = [e for e in old_events if e.event_type != "error"]
        # Summarize non-error old events
        if non_error_events:
            summary = self._create_summary(non_error_events)
            compaction_event = ContextEvent(
                timestamp=non_error_events[0].timestamp,
                event_type="compaction",
                summary=summary,
                details={
                    "events_compacted": len(non_error_events),
                    "compaction_number": self.compaction_count
                },
                compacted=True
            )
            self.compaction_count += 1
            # Rebuild events list
            self.events = [compaction_event] + error_events + recent_events
        else:
            self.events = error_events + recent_events

    def _create_summary(self, events: List[ContextEvent]) -> str:
        """Create summary of compacted events."""
        trial_events = [e for e in events if "trial" in e.event_type]
        if not trial_events:
            return f"[{len(events)} events compacted]"
        # Extract trial statistics
        trial_numbers = []
        objectives = []
        for e in trial_events:
            if "trial_number" in e.details:
                trial_numbers.append(e.details["trial_number"])
            if "objective" in e.details:
                objectives.append(e.details["objective"])
        if trial_numbers and objectives:
            return (
                f"Trials {min(trial_numbers)}-{max(trial_numbers)}: "
                f"Best={min(objectives):.4g}, "
                f"Avg={sum(objectives)/len(objectives):.4g}"
            )
        elif trial_numbers:
            return f"Trials {min(trial_numbers)}-{max(trial_numbers)} completed"
        else:
            return f"[{len(events)} events compacted]"
def get_context_string(self) -> str:
"""Generate context string from events."""
lines = ["## Optimization History", ""]
for event in self.events:
if event.compacted:
lines.append(f"📦 {event.summary}")
elif event.event_type == "error":
lines.append(f"⚠️ {event.summary}")
else:
lines.append(f"- {event.summary}")
return "\n".join(lines)
def get_stats(self) -> Dict[str, Any]:
"""Get compaction statistics."""
return {
"total_events": len(self.events),
"compaction_count": self.compaction_count,
"error_events": len([e for e in self.events if e.event_type == "error"]),
"compacted_events": len([e for e in self.events if e.compacted])
}
```
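For a compacted region spanning many trials, `_create_summary` collapses everything into a single milestone line. A standalone sketch of the same formatting, using synthetic trial data for illustration:

```python
# Thirty synthetic trial records with a decreasing objective (minimization).
trials = [{"trial_number": n, "objective": 100 - 2 * n} for n in range(1, 31)]

numbers = [t["trial_number"] for t in trials]
objectives = [t["objective"] for t in trials]

# Same format string as CompactionManager._create_summary above.
summary = (
    f"Trials {min(numbers)}-{max(numbers)}: "
    f"Best={min(objectives):.4g}, "
    f"Avg={sum(objectives)/len(objectives):.4g}"
)
print(summary)  # Trials 1-30: Best=40, Avg=69
```

Thirty detailed events thus shrink to one line in the context string, while the best and average objectives of the compacted region remain visible.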
---
## Part 3: Claude Code Session Integration
### 3.1 Enhanced Bootstrap for Claude Code
**File**: `.claude/skills/00_BOOTSTRAP_V2.md`
```markdown
# Atomizer Bootstrap v2.0 - Context-Aware Sessions
## Session Initialization
On session start, perform these steps:
### Step 1: Load Playbook
```bash
# Check for existing playbook
cat optimization_engine/context/playbook.json 2>/dev/null | head -20
```
If playbook exists, extract top insights:
- Filter by task type (inferred from user's first message)
- Include top 10 by net_score
- Always include recent mistakes (last 5)
### Step 2: Initialize Session State
```python
from optimization_engine.context.session_state import AtomizerSessionState, TaskType
session = AtomizerSessionState(session_id="current")
session.exposed.task_type = TaskType.CREATE_STUDY # Update based on intent
```
### Step 3: Load Task-Specific Context
Based on detected task type, load protocols per `02_CONTEXT_LOADER.md`
### Step 4: Inject Playbook Items
Add relevant playbook items to `session.exposed.active_playbook_items`
---
## Error Handling Protocol
When ANY error occurs:
1. **Preserve the error** - Add to session state
2. **Check playbook** - Look for matching mistake patterns
3. **Learn from it** - If novel error, queue for playbook addition
4. **Show to user** - Include error context in response
```python
# On error
session.add_error(f"{error_type}: {error_message}")
# Check playbook for similar errors
similar = playbook.search_by_content(error_message, category=InsightCategory.MISTAKE)
if similar:
print(f"Known issue: {similar[0].content}")
else:
reflector.queue_insight(InsightCategory.MISTAKE, error_message)
```
---
## Context Budget Management
Total context budget: ~100K tokens
Allocation:
- **Stable prefix**: 5K tokens (cached)
- **Protocols**: 10K tokens
- **Playbook items**: 5K tokens
- **Session state**: 2K tokens
- **Conversation history**: 30K tokens
- **Working space**: 48K tokens
If approaching limit:
1. Trigger compaction of old events
2. Reduce playbook items to top 5
3. Summarize conversation history
```
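Steps 1 and 4 above select which playbook items enter the session: the top items by `net_score` for the detected task type, plus recent mistakes. A standalone sketch of that selection rule; the dict items and the `select_bootstrap_items` helper are illustrative, not part of the playbook API:

```python
# Illustrative playbook items; category "mis" marks mistakes.
items = [
    {"id": "str-00001", "category": "str", "net_score": 5, "content": "Use shell elements"},
    {"id": "str-00002", "category": "str", "net_score": 2, "content": "Warm-start sampler"},
    {"id": "mis-00001", "category": "mis", "net_score": 1, "content": "Avoid zero thickness"},
    {"id": "mis-00002", "category": "mis", "net_score": 0, "content": "Check solver license"},
]

def select_bootstrap_items(items, top_n=10, recent_mistakes=5):
    """Top-N by net_score, then append recent mistakes not already selected."""
    by_score = sorted(items, key=lambda i: i["net_score"], reverse=True)[:top_n]
    chosen_ids = {i["id"] for i in by_score}
    mistakes = [i for i in items if i["category"] == "mis" and i["id"] not in chosen_ids]
    return by_score + mistakes[-recent_mistakes:]

selected = select_bootstrap_items(items, top_n=2)
print([i["id"] for i in selected])  # ['str-00001', 'str-00002', 'mis-00001', 'mis-00002']
```

Mistakes are appended even when their score would not place them in the top N, so recently learned failure patterns always reach the session context.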
### 3.2 Dashboard Integration
**File**: `atomizer-dashboard/backend/api/routes/context.py`
```python
"""
Context Engineering API Routes
Provides endpoints for:
- Viewing playbook contents
- Managing session state
- Triggering compaction
- Monitoring cache efficiency
"""
from fastapi import APIRouter, HTTPException
from pathlib import Path
from typing import Optional
router = APIRouter(prefix="/context", tags=["context"])
ATOMIZER_ROOT = Path(__file__).parents[4]
PLAYBOOK_PATH = ATOMIZER_ROOT / "optimization_engine" / "context" / "playbook.json"
@router.get("/playbook")
async def get_playbook(
category: Optional[str] = None,
min_score: int = 0,
limit: int = 50
):
"""Get playbook items with optional filtering."""
from optimization_engine.context.playbook import AtomizerPlaybook, InsightCategory
playbook = AtomizerPlaybook.load(PLAYBOOK_PATH)
items = list(playbook.items.values())
# Filter by category
if category:
try:
cat = InsightCategory(category)
items = [i for i in items if i.category == cat]
except ValueError:
raise HTTPException(400, f"Invalid category: {category}")
# Filter by score
items = [i for i in items if i.net_score >= min_score]
# Sort by score
items.sort(key=lambda x: x.net_score, reverse=True)
return {
"total": len(playbook.items),
"filtered": len(items),
"items": [
{
"id": i.id,
"category": i.category.value,
"content": i.content,
"helpful": i.helpful_count,
"harmful": i.harmful_count,
"score": i.net_score,
"confidence": i.confidence
}
for i in items[:limit]
]
}
@router.post("/playbook/feedback")
async def record_feedback(item_id: str, helpful: bool):
"""Record feedback on a playbook item."""
from optimization_engine.context.playbook import AtomizerPlaybook
playbook = AtomizerPlaybook.load(PLAYBOOK_PATH)
if item_id not in playbook.items:
raise HTTPException(404, f"Item not found: {item_id}")
playbook.record_outcome(item_id, helpful=helpful)
playbook.save(PLAYBOOK_PATH)
item = playbook.items[item_id]
return {
"id": item_id,
"new_score": item.net_score,
"confidence": item.confidence
}
@router.get("/session/{session_id}")
async def get_session_state(session_id: str):
"""Get current session state."""
# Implementation depends on session storage
pass
@router.get("/cache/stats")
async def get_cache_stats():
"""Get KV-cache efficiency statistics."""
from optimization_engine.context.cache_monitor import ContextCacheOptimizer
# Would need to access singleton cache optimizer
return {
"message": "Cache stats endpoint - implement with actual cache monitor"
}
```
---
## Part 4: Testing & Validation
### 4.1 Test Suite
**File**: `tests/test_context_engineering.py`
```python
"""
Test suite for context engineering components.
"""
import pytest
from datetime import datetime
from optimization_engine.context.playbook import (
AtomizerPlaybook,
PlaybookItem,
InsightCategory
)
from optimization_engine.context.reflector import (
AtomizerReflector,
OptimizationOutcome
)
from optimization_engine.context.session_state import (
AtomizerSessionState,
TaskType
)
from optimization_engine.context.compaction import (
CompactionManager,
ContextEvent
)
class TestAtomizerPlaybook:
"""Tests for the playbook system."""
def test_add_insight(self):
"""Test adding insights to playbook."""
playbook = AtomizerPlaybook()
item = playbook.add_insight(
category=InsightCategory.STRATEGY,
content="Use shell elements for thin walls",
source_trial=1
)
assert item.id == "str-00001"
assert item.helpful_count == 0
assert item.harmful_count == 0
assert len(playbook.items) == 1
def test_deduplication(self):
"""Test that duplicate insights are merged."""
playbook = AtomizerPlaybook()
playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements")
playbook.add_insight(InsightCategory.STRATEGY, "Use shell elements")
assert len(playbook.items) == 1
assert playbook.items["str-00001"].helpful_count == 1
def test_outcome_tracking(self):
"""Test helpful/harmful tracking."""
playbook = AtomizerPlaybook()
item = playbook.add_insight(InsightCategory.STRATEGY, "Test insight")
playbook.record_outcome(item.id, helpful=True)
playbook.record_outcome(item.id, helpful=True)
playbook.record_outcome(item.id, helpful=False)
assert item.helpful_count == 2
assert item.harmful_count == 1
assert item.net_score == 1
def test_persistence(self, tmp_path):
"""Test save/load cycle."""
playbook = AtomizerPlaybook()
playbook.add_insight(InsightCategory.MISTAKE, "Don't do this")
save_path = tmp_path / "playbook.json"
playbook.save(save_path)
loaded = AtomizerPlaybook.load(save_path)
assert len(loaded.items) == 1
assert "mis-00001" in loaded.items
def test_pruning(self):
"""Test harmful item pruning."""
playbook = AtomizerPlaybook()
item = playbook.add_insight(InsightCategory.STRATEGY, "Bad advice")
# Record many harmful outcomes
for _ in range(5):
playbook.record_outcome(item.id, helpful=False)
playbook.prune_harmful(threshold=-3)
assert len(playbook.items) == 0
class TestAtomizerReflector:
"""Tests for the reflector component."""
def test_analyze_failed_trial(self):
"""Test analysis of failed trial."""
playbook = AtomizerPlaybook()
reflector = AtomizerReflector(playbook)
outcome = OptimizationOutcome(
trial_number=1,
success=False,
objective_value=None,
constraint_violations=["stress > 250 MPa"],
solver_errors=["convergence failure at iteration 50"],
design_variables={"thickness": 0.5},
extractor_used="stress_extractor",
duration_seconds=120
)
insights = reflector.analyze_trial(outcome)
assert len(insights) >= 2 # At least error + constraint
assert any(i["category"] == InsightCategory.MISTAKE for i in insights)
def test_commit_insights(self):
"""Test committing insights to playbook."""
playbook = AtomizerPlaybook()
reflector = AtomizerReflector(playbook)
outcome = OptimizationOutcome(
trial_number=1,
success=True,
objective_value=100.0,
constraint_violations=[],
solver_errors=[],
design_variables={"thickness": 1.0},
extractor_used="mass_extractor",
duration_seconds=60
)
reflector.analyze_trial(outcome)
count = reflector.commit_insights()
assert count > 0
assert len(playbook.items) > 0
class TestSessionState:
"""Tests for session state management."""
def test_exposed_state_context(self):
"""Test LLM context generation."""
session = AtomizerSessionState(session_id="test")
session.exposed.task_type = TaskType.CREATE_STUDY
session.exposed.study_name = "bracket_opt"
session.exposed.trials_completed = 25
session.exposed.best_value = 0.5
context = session.get_llm_context()
assert "bracket_opt" in context
assert "25" in context
assert "0.5" in context
def test_action_compression(self):
"""Test automatic action compression."""
session = AtomizerSessionState(session_id="test")
for i in range(15):
session.add_action(f"Action {i}")
# Should be compressed
assert len(session.exposed.recent_actions) <= 12
assert "summarized" in session.exposed.recent_actions[1].lower()
class TestCompactionManager:
"""Tests for context compaction."""
def test_compaction_trigger(self):
"""Test that compaction triggers at threshold."""
manager = CompactionManager(compaction_threshold=10, keep_recent=5)
for i in range(15):
manager.add_event(ContextEvent(
timestamp=datetime.now(),
event_type="trial_complete",
summary=f"Trial {i} complete",
details={"trial_number": i, "objective": i * 0.1}
))
assert manager.compaction_count > 0
assert len(manager.events) <= 10
def test_error_preservation(self):
"""Test that errors are never compacted."""
manager = CompactionManager(compaction_threshold=10, keep_recent=3)
# Add error early
manager.add_event(ContextEvent(
timestamp=datetime.now(),
event_type="error",
summary="Critical solver failure"
))
# Add many regular events
for i in range(20):
manager.add_event(ContextEvent(
timestamp=datetime.now(),
event_type="trial_complete",
summary=f"Trial {i}"
))
# Error should still be present
errors = [e for e in manager.events if e.event_type == "error"]
assert len(errors) == 1
```
### 4.2 Integration Test
**File**: `tests/test_context_integration.py`
```python
"""
Integration test for full context engineering pipeline.
"""
import pytest
from pathlib import Path
import tempfile
def test_full_optimization_with_context_engineering():
"""
End-to-end test of optimization with context engineering.
Simulates:
1. Starting fresh session
2. Running optimization with failures
3. Verifying playbook learns from failures
4. Running second optimization
5. Verifying improved performance
"""
from optimization_engine.context.playbook import AtomizerPlaybook
from optimization_engine.context.feedback_loop import FeedbackLoop
with tempfile.TemporaryDirectory() as tmp_dir:
playbook_path = Path(tmp_dir) / "playbook.json"
# Initialize feedback loop
feedback = FeedbackLoop(playbook_path)
# Simulate first study with failures
for i in range(10):
success = i % 3 != 0 # Every 3rd trial fails
feedback.process_trial_result(
trial_number=i,
success=success,
objective_value=100 - i if success else 0,
design_variables={"thickness": 0.5 + i * 0.1},
context_items_used=[],
errors=["convergence failure"] if not success else []
)
# Finalize and check learning
result = feedback.finalize_study({
"name": "test_study",
"total_trials": 10,
"best_value": 92,  # best successful objective: trial 8 gives 100 - 8
"convergence_rate": 0.6  # 6 of the 10 simulated trials succeed
})
assert result["insights_added"] > 0
# Load playbook and verify content
playbook = AtomizerPlaybook.load(playbook_path)
# Should have learned about convergence failures
mistakes = [
item for item in playbook.items.values()
if item.category.value == "mis"
]
assert len(mistakes) > 0
```
---
## Part 5: Rollout Plan
### Week 1-2: Foundation
- [ ] Implement `AtomizerPlaybook` class
- [ ] Implement `AtomizerReflector` class
- [ ] Add playbook persistence (JSON)
- [ ] Write unit tests
- [ ] Integrate with existing LAC concepts
### Week 3: Context Management
- [ ] Implement `AtomizerSessionState`
- [ ] Update `02_CONTEXT_LOADER.md` with playbook integration
- [ ] Create stable prefix template
- [ ] Implement cache monitoring
### Week 4: Learning Loop
- [ ] Implement `FeedbackLoop`
- [ ] Create error tracker hook
- [ ] Implement compaction manager
- [ ] Integration testing
### Week 5: Claude Code Integration
- [ ] Add `.claude/skills/00_BOOTSTRAP_V2.md` (context-aware bootstrap, Part 3.1)
- [ ] Add dashboard API routes
- [ ] Create playbook visualization component
- [ ] End-to-end testing with real optimizations
### Week 6: Polish & Documentation
- [ ] Performance benchmarking
- [ ] Cost analysis (cache hit rates)
- [ ] Documentation updates
- [ ] Team training materials
---
## Success Metrics
| Metric | Baseline | Target | Measurement |
|--------|----------|--------|-------------|
| Task success rate | ~70% | 80-85% | Track via feedback loop |
| Repeated mistakes | N/A | <20% recurrence | Playbook harmful counts |
| Cache hit rate | 0% | >70% | Cache monitor stats |
| Cost per session | $X | 0.3X | API billing analysis |
| Playbook growth | 0 | 100+ items/month | Playbook stats |
---
## References
1. **ACE Framework**: Zhang et al., "Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models", arXiv:2510.04618, Oct 2025
2. **Manus Blog**: "Context Engineering for AI Agents: Lessons from Building Manus"
3. **Anthropic**: "Effective context engineering for AI agents"
4. **LangChain**: "Context Engineering for Agents"
5. **Google ADK**: "Architecting efficient context-aware multi-agent framework"
---
*Document generated: December 2025*
*For Claude Code implementation sessions*