Files
ATOCore/src/atocore/context/builder.py

425 lines
14 KiB
Python
Raw Normal View History

"""Context pack assembly: retrieve, rank, budget, format.
Trust precedence (per Master Plan):
1. Trusted Project State always included first, highest authority
2. Identity + Preference memories included next
3. Retrieved chunks ranked, deduplicated, budget-constrained
"""
import time
from dataclasses import dataclass, field
from pathlib import Path
import atocore.config as _config
from atocore.context.project_state import format_project_state, get_state
from atocore.memory.service import get_memories_for_context
from atocore.observability.logger import get_logger
fix(P1+P2): canonicalize project names at every trust boundary Three findings from codex's review of the previous P1+P2 fix. The earlier commit (f2372ef) only fixed alias resolution at the context builder. Codex correctly pointed out that the same fragmentation applies at every other place a project name crosses a boundary — project_state writes/reads, interaction capture/listing/filtering, memory create/queries, and reinforcement's downstream queries. Plus a real bug in the interaction `since` filter where the storage format and the documented ISO format don't compare cleanly. The fix is one helper used at every boundary instead of duplicating the resolution inline. New helper: src/atocore/projects/registry.py::resolve_project_name --------------------------------------------------------------- - Single canonicalization boundary for project names - Returns the canonical project_id when the input matches any registered id or alias - Returns the input unchanged for empty/None and for unregistered names (preserves backwards compat with hand-curated state that predates the registry) - Documented as the contract that every read/write at the trust boundary should pass through P1 — Trusted Project State endpoints ------------------------------------ src/atocore/context/project_state.py: set_state, get_state, and invalidate_state now all canonicalize project_name through resolve_project_name BEFORE looking up or creating the project row. Before this fix: - POST /project/state with project="p05" called ensure_project("p05") which created a separate row in the projects table - The state row was attached to that alias project_id - Later context builds canonicalized "p05" -> "p05-interferometer" via the builder fix from f2372ef and never found the state - Result: trusted state silently fragmented across alias rows After this fix: - The alias is resolved to the canonical id at every entry point - Two captures (one via "p05", one via "p05-interferometer") write to the same row - get_state via either alias or the canonical id finds the same row Fixes the highest-priority gap codex flagged because Trusted Project State is supposed to be the most dependable layer in the AtoCore trust hierarchy. P2.a — Interaction capture project canonicalization ---------------------------------------------------- src/atocore/interactions/service.py: record_interaction now canonicalizes project before storing, so interaction.project is always the canonical id regardless of what the client passed. Downstream effects: - reinforce_from_interaction queries memories by interaction.project -> previously missed memories stored under canonical id -> now consistent because interaction.project IS the canonical id - the extractor stamps candidates with interaction.project -> previously created candidates in alias buckets -> now creates candidates in the canonical bucket - list_interactions(project=alias) was already broken, now fixed by canonicalizing the filter input on the read side too Memory service applied the same fix: - src/atocore/memory/service.py: create_memory and get_memories both canonicalize project through resolve_project_name - This keeps stored memory.project consistent with the reinforcement query path P2.b — Interaction `since` filter format normalization ------------------------------------------------------ src/atocore/interactions/service.py: new _normalize_since helper. The bug: - created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by convention) so it sorts lexically and compares cleanly with the SQLite CURRENT_TIMESTAMP default - The `since` parameter was documented as ISO 8601 but compared as a raw string against the storage format - The lexically-greater 'T' separator means an ISO timestamp like '2026-04-07T12:00:00Z' is GREATER than the storage form '2026-04-07 12:00:00' for the same instant - Result: a client passing ISO `since` got an empty result for any row from the same day, even though those rows existed and were technically "after" the cutoff in real-world time The fix: - _normalize_since accepts ISO 8601 with T, optional Z suffix, optional fractional seconds, optional +HH:MM offsets - Uses datetime.fromisoformat for parsing (Python 3.11+) - Converts to UTC and reformats as the storage format before the SQL comparison - The bare storage format still works (backwards compat path is a regex match that returns the input unchanged) - Unparseable input is returned as-is so the comparison degrades gracefully (rows just don't match) instead of raising and breaking the listing endpoint builder.py refactor ------------------- The previous P1 fix had inline canonicalization. Now it uses the shared helper for consistency: - import changed from get_registered_project to resolve_project_name - the inline lookup is replaced with a single helper call - the comment block now points at representation-authority.md for the canonicalization contract New shared test fixture: tests/conftest.py::project_registry ------------------------------------------------------------ - Standardizes the registry-setup pattern that was duplicated across test_context_builder.py, test_project_state.py, test_interactions.py, and test_reinforcement.py - Returns a callable that takes (project_id, [aliases]) tuples and writes them into a temp registry file with the env var pointed at it and config.settings reloaded - Used by all 12 new regression tests in this commit Tests (12 new, all green on first run) -------------------------------------- test_project_state.py: - test_set_state_canonicalizes_alias: write via alias, read via every alias and the canonical id, verify same row id - test_get_state_canonicalizes_alias_after_canonical_write - test_invalidate_state_canonicalizes_alias - test_unregistered_project_state_still_works (backwards compat) test_interactions.py: - test_record_interaction_canonicalizes_project - test_list_interactions_canonicalizes_project_filter - test_list_interactions_since_accepts_iso_with_t_separator - test_list_interactions_since_accepts_z_suffix - test_list_interactions_since_accepts_offset - test_list_interactions_since_storage_format_still_works test_reinforcement.py: - test_reinforcement_works_when_capture_uses_alias (end-to-end: capture under alias, seed memory under canonical, verify reinforcement matches) - test_get_memories_filter_by_alias Full suite: 174 passing (was 162), 1 warning. The +12 is the new regression tests, no existing tests regressed. What's still NOT canonicalized (and why) ---------------------------------------- - _rank_chunks's secondary substring boost in builder.py — the retriever already does the right thing via its own _project_match_boost which calls get_registered_project. The redundant secondary boost still uses the raw hint but it's a multiplicative factor on top of correct retrieval, not a filter, so it can't drop relevant chunks. Tracked as a future cleanup but not a P1. - update_memory's project field (you can't change a memory's project after creation in the API anyway). - The retriever's project_hint parameter on direct /query calls — same reasoning as the builder boost, plus the retriever's own get_registered_project call already handles aliases there.
2026-04-07 08:29:33 -04:00
from atocore.projects.registry import resolve_project_name
from atocore.retrieval.retriever import ChunkResult, retrieve
log = get_logger("context_builder")
SYSTEM_PREFIX = (
"You have access to the following personal context from the user's knowledge base.\n"
"Use it to inform your answer. If the context is not relevant, ignore it.\n"
"Do not mention the context system unless asked.\n"
"When project state is provided, treat it as the most authoritative source."
)
# Budget allocation (per Master Plan section 9):
# identity: 5%, preferences: 5%, project state: 20%, retrieval: 60%+
PROJECT_STATE_BUDGET_RATIO = 0.20
MEMORY_BUDGET_RATIO = 0.05 # identity + preference; lowered from 0.10 to avoid squeezing project memories and chunks
# Project-scoped memories (project/knowledge/episodic) are the outlet
# for the Phase 9 reflection loop on the retrieval side. Budget sits
# between identity/preference and retrieved chunks so a reinforced
# memory can actually reach the model.
PROJECT_MEMORY_BUDGET_RATIO = 0.25
PROJECT_MEMORY_TYPES = ["project", "knowledge", "episodic"]
# Last built context pack for debug inspection
_last_context_pack: "ContextPack | None" = None
@dataclass
class ContextChunk:
content: str
source_file: str
heading_path: str
score: float
char_count: int
@dataclass
class ContextPack:
chunks_used: list[ContextChunk] = field(default_factory=list)
project_state_text: str = ""
project_state_chars: int = 0
memory_text: str = ""
memory_chars: int = 0
project_memory_text: str = ""
project_memory_chars: int = 0
total_chars: int = 0
budget: int = 0
budget_remaining: int = 0
formatted_context: str = ""
full_prompt: str = ""
query: str = ""
project_hint: str = ""
duration_ms: int = 0
def build_context(
user_prompt: str,
project_hint: str | None = None,
budget: int | None = None,
) -> ContextPack:
"""Build a context pack for a user prompt.
Trust precedence applied:
1. Project state is injected first (highest trust)
2. Identity + preference memories (second trust level)
3. Retrieved chunks fill the remaining budget
"""
global _last_context_pack
start = time.time()
budget = _config.settings.context_budget if budget is None else max(budget, 0)
# 1. Get Trusted Project State (highest precedence)
project_state_text = ""
project_state_chars = 0
project_state_budget = min(
budget,
max(0, int(budget * PROJECT_STATE_BUDGET_RATIO)),
)
fix(P1+P2): canonicalize project names at every trust boundary Three findings from codex's review of the previous P1+P2 fix. The earlier commit (f2372ef) only fixed alias resolution at the context builder. Codex correctly pointed out that the same fragmentation applies at every other place a project name crosses a boundary — project_state writes/reads, interaction capture/listing/filtering, memory create/queries, and reinforcement's downstream queries. Plus a real bug in the interaction `since` filter where the storage format and the documented ISO format don't compare cleanly. The fix is one helper used at every boundary instead of duplicating the resolution inline. New helper: src/atocore/projects/registry.py::resolve_project_name --------------------------------------------------------------- - Single canonicalization boundary for project names - Returns the canonical project_id when the input matches any registered id or alias - Returns the input unchanged for empty/None and for unregistered names (preserves backwards compat with hand-curated state that predates the registry) - Documented as the contract that every read/write at the trust boundary should pass through P1 — Trusted Project State endpoints ------------------------------------ src/atocore/context/project_state.py: set_state, get_state, and invalidate_state now all canonicalize project_name through resolve_project_name BEFORE looking up or creating the project row. Before this fix: - POST /project/state with project="p05" called ensure_project("p05") which created a separate row in the projects table - The state row was attached to that alias project_id - Later context builds canonicalized "p05" -> "p05-interferometer" via the builder fix from f2372ef and never found the state - Result: trusted state silently fragmented across alias rows After this fix: - The alias is resolved to the canonical id at every entry point - Two captures (one via "p05", one via "p05-interferometer") write to the same row - get_state via either alias or the canonical id finds the same row Fixes the highest-priority gap codex flagged because Trusted Project State is supposed to be the most dependable layer in the AtoCore trust hierarchy. P2.a — Interaction capture project canonicalization ---------------------------------------------------- src/atocore/interactions/service.py: record_interaction now canonicalizes project before storing, so interaction.project is always the canonical id regardless of what the client passed. Downstream effects: - reinforce_from_interaction queries memories by interaction.project -> previously missed memories stored under canonical id -> now consistent because interaction.project IS the canonical id - the extractor stamps candidates with interaction.project -> previously created candidates in alias buckets -> now creates candidates in the canonical bucket - list_interactions(project=alias) was already broken, now fixed by canonicalizing the filter input on the read side too Memory service applied the same fix: - src/atocore/memory/service.py: create_memory and get_memories both canonicalize project through resolve_project_name - This keeps stored memory.project consistent with the reinforcement query path P2.b — Interaction `since` filter format normalization ------------------------------------------------------ src/atocore/interactions/service.py: new _normalize_since helper. The bug: - created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by convention) so it sorts lexically and compares cleanly with the SQLite CURRENT_TIMESTAMP default - The `since` parameter was documented as ISO 8601 but compared as a raw string against the storage format - The lexically-greater 'T' separator means an ISO timestamp like '2026-04-07T12:00:00Z' is GREATER than the storage form '2026-04-07 12:00:00' for the same instant - Result: a client passing ISO `since` got an empty result for any row from the same day, even though those rows existed and were technically "after" the cutoff in real-world time The fix: - _normalize_since accepts ISO 8601 with T, optional Z suffix, optional fractional seconds, optional +HH:MM offsets - Uses datetime.fromisoformat for parsing (Python 3.11+) - Converts to UTC and reformats as the storage format before the SQL comparison - The bare storage format still works (backwards compat path is a regex match that returns the input unchanged) - Unparseable input is returned as-is so the comparison degrades gracefully (rows just don't match) instead of raising and breaking the listing endpoint builder.py refactor ------------------- The previous P1 fix had inline canonicalization. Now it uses the shared helper for consistency: - import changed from get_registered_project to resolve_project_name - the inline lookup is replaced with a single helper call - the comment block now points at representation-authority.md for the canonicalization contract New shared test fixture: tests/conftest.py::project_registry ------------------------------------------------------------ - Standardizes the registry-setup pattern that was duplicated across test_context_builder.py, test_project_state.py, test_interactions.py, and test_reinforcement.py - Returns a callable that takes (project_id, [aliases]) tuples and writes them into a temp registry file with the env var pointed at it and config.settings reloaded - Used by all 12 new regression tests in this commit Tests (12 new, all green on first run) -------------------------------------- test_project_state.py: - test_set_state_canonicalizes_alias: write via alias, read via every alias and the canonical id, verify same row id - test_get_state_canonicalizes_alias_after_canonical_write - test_invalidate_state_canonicalizes_alias - test_unregistered_project_state_still_works (backwards compat) test_interactions.py: - test_record_interaction_canonicalizes_project - test_list_interactions_canonicalizes_project_filter - test_list_interactions_since_accepts_iso_with_t_separator - test_list_interactions_since_accepts_z_suffix - test_list_interactions_since_accepts_offset - test_list_interactions_since_storage_format_still_works test_reinforcement.py: - test_reinforcement_works_when_capture_uses_alias (end-to-end: capture under alias, seed memory under canonical, verify reinforcement matches) - test_get_memories_filter_by_alias Full suite: 174 passing (was 162), 1 warning. The +12 is the new regression tests, no existing tests regressed. What's still NOT canonicalized (and why) ---------------------------------------- - _rank_chunks's secondary substring boost in builder.py — the retriever already does the right thing via its own _project_match_boost which calls get_registered_project. The redundant secondary boost still uses the raw hint but it's a multiplicative factor on top of correct retrieval, not a filter, so it can't drop relevant chunks. Tracked as a future cleanup but not a P1. - update_memory's project field (you can't change a memory's project after creation in the API anyway). - The retriever's project_hint parameter on direct /query calls — same reasoning as the builder boost, plus the retriever's own get_registered_project call already handles aliases there.
2026-04-07 08:29:33 -04:00
# Canonicalize the project hint through the registry so callers
# can pass an alias (`p05`, `gigabit`) and still find trusted
# state stored under the canonical project id. The same helper
# is used everywhere a project name crosses a trust boundary
# (project_state, memories, interactions). When the registry has
# no entry the helper returns the input unchanged so hand-curated
# state that predates the registry still works.
canonical_project = resolve_project_name(project_hint) if project_hint else ""
if canonical_project:
fix(P1+P2): alias-aware project state lookup + slash command corpus fallback Two regression fixes from codex's review of the slash command refactor commit (78d4e97). Both findings are real and now have covered tests. P1 — server-side alias resolution for project_state lookup ---------------------------------------------------------- The bug: - /context/build forwarded the caller's project hint verbatim to get_state(project_hint), which does an exact-name lookup against the projects table (case-insensitive but no alias resolution) - the project registry's alias matching was only used by the client's auto-context path and the retriever's project-match boost, never by the server's project_state lookup - consequence: /atocore-context "... p05" would silently miss trusted project state stored under the canonical id "p05-interferometer", weakening project-hinted retrieval to the point that an explicit alias hint was *worse* than no hint The fix in src/atocore/context/builder.py: - import get_registered_project from the projects registry - before calling get_state(project_hint), resolve the hint through get_registered_project; if a registry record exists, use the canonical project_id for the state lookup - if no registry record exists, fall back to the raw hint so a hand-curated project_state entry that predates the registry still works (backwards compat with pre-registry deployments) The retriever already does its own alias expansion via get_registered_project for the project-match boost, so the retriever side was never broken — only the project_state lookup in the builder. The fix is scoped to that one call site. Tests added in tests/test_context_builder.py: - test_alias_hint_resolves_through_registry: stands up a fresh registry, sets state under "p05-interferometer", then verifies build_context with project_hint="p05" finds the state, AND with project_hint="interferometer" (the second alias) finds it too, AND with the canonical id finds it. Covers all three resolution paths. - test_unknown_hint_falls_back_to_raw_lookup: empty registry, set state under an unregistered project name, verify the build_context call with that name as the hint still finds the state. Locks in the backwards-compat behavior. P2 — slash command no-hint fallback to corpus-wide context build ---------------------------------------------------------------- The bug: - the slash command's no-hint path called auto-context, which returns {"status": "no_project_match"} when project detection fails and does NOT fall back to a plain context-build - the slash command's own help text told the user "call without a hint to use the corpus-wide context build" — which was a lie because the wrapper no longer did that - consequence: generic prompts like "what changed in AtoCore backup policy?" or any cross-project question got a useless no_project_match envelope instead of a context pack The fix in .claude/commands/atocore-context.md: - the no-hint path now does the 2-step fallback dance: 1. try `auto-context "<prompt>"` for project detection 2. if the response contains "no_project_match", fall back to `context-build "<prompt>"` (no project arg) - both branches return a real context pack, fail-open envelope is preserved for genuine network errors - the underlying client surface is unchanged (no new flags, no new subcommands) — the fallback is per-frontend logic in the slash command, leaving auto-context's existing semantics intact for OpenClaw and any other caller that depends on the no_project_match envelope as a "do nothing" signal While I was here, also tightened the slash command's argument parsing to delegate alias-knowledge to the registry instead of embedding a hardcoded list: - old version had a literal list of "atocore", "p04", "p05", "p06" and their aliases that needed manual maintenance every time a project was added - new version takes the last token of $ARGUMENTS and asks the client's `detect-project` subcommand whether it's a known alias; if matched, it's the explicit hint, if not it's part of the prompt - this delegates registry knowledge to the registry, where it belongs Unrelated improvement noted but NOT fixed in this commit: - _rank_chunks in builder.py also has a naive substring boost that uses the original hint without alias expansion. The retriever already does the right thing, so this secondary boost is redundant. Tracked as a future cleanup but not in scope for the P1/P2 fix; codex's findings are about project_state lookup, not about the secondary chunk boost. Full suite: 162 passing (was 160), 1 warning. The +2 is the two new P1 regression tests.
2026-04-07 07:47:03 -04:00
state_entries = get_state(canonical_project)
if state_entries:
project_state_text = format_project_state(state_entries)
project_state_text, project_state_chars = _truncate_text_block(
project_state_text,
project_state_budget or budget,
)
# 2. Get identity + preference memories (second precedence)
memory_budget = min(int(budget * MEMORY_BUDGET_RATIO), max(budget - project_state_chars, 0))
memory_text, memory_chars = get_memories_for_context(
memory_types=["identity", "preference"],
budget=memory_budget,
query=user_prompt,
)
# 2b. Get project-scoped memories (third precedence). Only
# populated when a canonical project is in scope — cross-project
# memory bleed would rot the pack. Active-only filtering is
# handled by the shared min_confidence=0.5 gate inside
# get_memories_for_context.
project_memory_text = ""
project_memory_chars = 0
if canonical_project:
project_memory_budget = min(
int(budget * PROJECT_MEMORY_BUDGET_RATIO),
max(budget - project_state_chars - memory_chars, 0),
)
project_memory_text, project_memory_chars = get_memories_for_context(
memory_types=PROJECT_MEMORY_TYPES,
project=canonical_project,
budget=project_memory_budget,
header="--- Project Memories ---",
footer="--- End Project Memories ---",
query=user_prompt,
)
# 3. Calculate remaining budget for retrieval
retrieval_budget = budget - project_state_chars - memory_chars - project_memory_chars
# 4. Retrieve candidates
candidates = (
retrieve(
user_prompt,
top_k=_config.settings.context_top_k,
project_hint=project_hint,
)
if retrieval_budget > 0
else []
)
# 5. Score and rank
scored = _rank_chunks(candidates, project_hint)
# 6. Select within remaining budget
selected = _select_within_budget(scored, max(retrieval_budget, 0))
# 7. Format full context
formatted = _format_full_context(
project_state_text, memory_text, project_memory_text, selected
)
if len(formatted) > budget:
formatted, selected = _trim_context_to_budget(
project_state_text,
memory_text,
project_memory_text,
selected,
budget,
)
# 8. Build full prompt
full_prompt = f"{SYSTEM_PREFIX}\n\n{formatted}\n\n{user_prompt}"
project_state_chars = len(project_state_text)
memory_chars = len(memory_text)
project_memory_chars = len(project_memory_text)
retrieval_chars = sum(c.char_count for c in selected)
total_chars = len(formatted)
duration_ms = int((time.time() - start) * 1000)
pack = ContextPack(
chunks_used=selected,
project_state_text=project_state_text,
project_state_chars=project_state_chars,
memory_text=memory_text,
memory_chars=memory_chars,
project_memory_text=project_memory_text,
project_memory_chars=project_memory_chars,
total_chars=total_chars,
budget=budget,
budget_remaining=budget - total_chars,
formatted_context=formatted,
full_prompt=full_prompt,
query=user_prompt,
project_hint=project_hint or "",
duration_ms=duration_ms,
)
_last_context_pack = pack
log.info(
"context_built",
chunks_used=len(selected),
project_state_chars=project_state_chars,
memory_chars=memory_chars,
project_memory_chars=project_memory_chars,
retrieval_chars=retrieval_chars,
total_chars=total_chars,
budget_remaining=budget - total_chars,
duration_ms=duration_ms,
)
log.debug("context_pack_detail", pack=_pack_to_dict(pack))
return pack
def get_last_context_pack() -> ContextPack | None:
"""Return the last built context pack for debug inspection."""
return _last_context_pack
def _rank_chunks(
candidates: list[ChunkResult],
project_hint: str | None,
) -> list[tuple[float, ChunkResult]]:
"""Rank candidates with boosting for project match."""
scored = []
seen_content: set[str] = set()
for chunk in candidates:
# Deduplicate by content prefix (first 200 chars)
content_key = chunk.content[:200]
if content_key in seen_content:
continue
seen_content.add(content_key)
# Base score from similarity
final_score = chunk.score
# Project boost
if project_hint:
tags_str = chunk.tags.lower() if chunk.tags else ""
source_str = chunk.source_file.lower()
title_str = chunk.title.lower() if chunk.title else ""
hint_lower = project_hint.lower()
if hint_lower in tags_str or hint_lower in source_str or hint_lower in title_str:
final_score *= 1.3
scored.append((final_score, chunk))
# Sort by score descending
scored.sort(key=lambda x: x[0], reverse=True)
return scored
def _select_within_budget(
scored: list[tuple[float, ChunkResult]],
budget: int,
) -> list[ContextChunk]:
"""Select top chunks that fit within the character budget."""
selected = []
used = 0
for score, chunk in scored:
chunk_len = len(chunk.content)
if used + chunk_len > budget:
continue
selected.append(
ContextChunk(
content=chunk.content,
source_file=_shorten_path(chunk.source_file),
heading_path=chunk.heading_path,
score=score,
char_count=chunk_len,
)
)
used += chunk_len
return selected
def _format_full_context(
project_state_text: str,
memory_text: str,
project_memory_text: str,
chunks: list[ContextChunk],
) -> str:
"""Format project state + memories + retrieved chunks into full context block."""
parts = []
# 1. Project state first (highest trust)
if project_state_text:
parts.append(project_state_text)
parts.append("")
# 2. Identity + preference memories (second trust level)
if memory_text:
parts.append(memory_text)
parts.append("")
# 3. Project-scoped memories (third trust level)
if project_memory_text:
parts.append(project_memory_text)
parts.append("")
# 4. Retrieved chunks (lowest trust)
if chunks:
parts.append("--- AtoCore Retrieved Context ---")
if project_state_text:
parts.append("If retrieved context conflicts with Trusted Project State above, trust the Trusted Project State.")
for chunk in chunks:
parts.append(
f"[Source: {chunk.source_file} | Section: {chunk.heading_path} | Score: {chunk.score:.2f}]"
)
parts.append(chunk.content)
parts.append("")
parts.append("--- End Context ---")
elif not project_state_text and not memory_text and not project_memory_text:
parts.append("--- AtoCore Context ---\nNo relevant context found.\n--- End Context ---")
return "\n".join(parts)
def _shorten_path(path: str) -> str:
"""Shorten an absolute path to a relative-like display."""
p = Path(path)
parts = p.parts
if len(parts) > 3:
return str(Path(*parts[-3:]))
return str(p)
def _pack_to_dict(pack: ContextPack) -> dict:
"""Convert a context pack to a JSON-serializable dict."""
return {
"query": pack.query,
"project_hint": pack.project_hint,
"project_state_chars": pack.project_state_chars,
"memory_chars": pack.memory_chars,
"project_memory_chars": pack.project_memory_chars,
"chunks_used": len(pack.chunks_used),
"total_chars": pack.total_chars,
"budget": pack.budget,
"budget_remaining": pack.budget_remaining,
"duration_ms": pack.duration_ms,
"has_project_state": bool(pack.project_state_text),
"has_memories": bool(pack.memory_text),
"has_project_memories": bool(pack.project_memory_text),
"chunks": [
{
"source_file": c.source_file,
"heading_path": c.heading_path,
"score": c.score,
"char_count": c.char_count,
"content_preview": c.content[:100],
}
for c in pack.chunks_used
],
}
def _truncate_text_block(text: str, budget: int) -> tuple[str, int]:
"""Trim a formatted text block so trusted tiers cannot exceed the total budget."""
if budget <= 0 or not text:
return "", 0
if len(text) <= budget:
return text, len(text)
if budget <= 3:
trimmed = text[:budget]
else:
trimmed = f"{text[: budget - 3].rstrip()}..."
return trimmed, len(trimmed)
def _trim_context_to_budget(
project_state_text: str,
memory_text: str,
project_memory_text: str,
chunks: list[ContextChunk],
budget: int,
) -> tuple[str, list[ContextChunk]]:
"""Trim retrieval → project memories → identity/preference → project state."""
kept_chunks = list(chunks)
formatted = _format_full_context(
project_state_text, memory_text, project_memory_text, kept_chunks
)
while len(formatted) > budget and kept_chunks:
kept_chunks.pop()
formatted = _format_full_context(
project_state_text, memory_text, project_memory_text, kept_chunks
)
if len(formatted) <= budget:
return formatted, kept_chunks
# Drop project memories next (they were the most recently added
# tier and carry less trust than identity/preference).
project_memory_text, _ = _truncate_text_block(
project_memory_text,
max(budget - len(project_state_text) - len(memory_text), 0),
)
formatted = _format_full_context(
project_state_text, memory_text, project_memory_text, kept_chunks
)
if len(formatted) <= budget:
return formatted, kept_chunks
memory_text, _ = _truncate_text_block(memory_text, max(budget - len(project_state_text), 0))
formatted = _format_full_context(
project_state_text, memory_text, project_memory_text, kept_chunks
)
if len(formatted) <= budget:
return formatted, kept_chunks
project_state_text, _ = _truncate_text_block(project_state_text, budget)
formatted = _format_full_context(project_state_text, "", "", [])
if len(formatted) > budget:
formatted, _ = _truncate_text_block(formatted, budget)
return formatted, []