Files
ATOCore/src/atocore/context/project_state.py

255 lines
8.2 KiB
Python
Raw Normal View History

"""Trusted Project State — the highest-priority context source.
Per the Master Plan trust precedence:
1. Trusted Project State (this module)
2. AtoDrive artifacts
3. Recent validated memory
4. AtoVault summaries
5. PKM chunks
6. Historical / low-confidence
Project state is manually curated or explicitly confirmed facts about a project.
It always wins over retrieval-based context when there's a conflict.
"""
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from atocore.models.database import get_connection
from atocore.observability.logger import get_logger
fix(P1+P2): canonicalize project names at every trust boundary Three findings from codex's review of the previous P1+P2 fix. The earlier commit (f2372ef) only fixed alias resolution at the context builder. Codex correctly pointed out that the same fragmentation applies at every other place a project name crosses a boundary — project_state writes/reads, interaction capture/listing/filtering, memory create/queries, and reinforcement's downstream queries. Plus a real bug in the interaction `since` filter where the storage format and the documented ISO format don't compare cleanly. The fix is one helper used at every boundary instead of duplicating the resolution inline. New helper: src/atocore/projects/registry.py::resolve_project_name --------------------------------------------------------------- - Single canonicalization boundary for project names - Returns the canonical project_id when the input matches any registered id or alias - Returns the input unchanged for empty/None and for unregistered names (preserves backwards compat with hand-curated state that predates the registry) - Documented as the contract that every read/write at the trust boundary should pass through P1 — Trusted Project State endpoints ------------------------------------ src/atocore/context/project_state.py: set_state, get_state, and invalidate_state now all canonicalize project_name through resolve_project_name BEFORE looking up or creating the project row. Before this fix: - POST /project/state with project="p05" called ensure_project("p05") which created a separate row in the projects table - The state row was attached to that alias project_id - Later context builds canonicalized "p05" -> "p05-interferometer" via the builder fix from f2372ef and never found the state - Result: trusted state silently fragmented across alias rows After this fix: - The alias is resolved to the canonical id at every entry point - Two captures (one via "p05", one via "p05-interferometer") write to the same row - get_state via either alias or the canonical id finds the same row Fixes the highest-priority gap codex flagged because Trusted Project State is supposed to be the most dependable layer in the AtoCore trust hierarchy. P2.a — Interaction capture project canonicalization ---------------------------------------------------- src/atocore/interactions/service.py: record_interaction now canonicalizes project before storing, so interaction.project is always the canonical id regardless of what the client passed. Downstream effects: - reinforce_from_interaction queries memories by interaction.project -> previously missed memories stored under canonical id -> now consistent because interaction.project IS the canonical id - the extractor stamps candidates with interaction.project -> previously created candidates in alias buckets -> now creates candidates in the canonical bucket - list_interactions(project=alias) was already broken, now fixed by canonicalizing the filter input on the read side too Memory service applied the same fix: - src/atocore/memory/service.py: create_memory and get_memories both canonicalize project through resolve_project_name - This keeps stored memory.project consistent with the reinforcement query path P2.b — Interaction `since` filter format normalization ------------------------------------------------------ src/atocore/interactions/service.py: new _normalize_since helper. The bug: - created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by convention) so it sorts lexically and compares cleanly with the SQLite CURRENT_TIMESTAMP default - The `since` parameter was documented as ISO 8601 but compared as a raw string against the storage format - The lexically-greater 'T' separator means an ISO timestamp like '2026-04-07T12:00:00Z' is GREATER than the storage form '2026-04-07 12:00:00' for the same instant - Result: a client passing ISO `since` got an empty result for any row from the same day, even though those rows existed and were technically "after" the cutoff in real-world time The fix: - _normalize_since accepts ISO 8601 with T, optional Z suffix, optional fractional seconds, optional +HH:MM offsets - Uses datetime.fromisoformat for parsing (Python 3.11+) - Converts to UTC and reformats as the storage format before the SQL comparison - The bare storage format still works (backwards compat path is a regex match that returns the input unchanged) - Unparseable input is returned as-is so the comparison degrades gracefully (rows just don't match) instead of raising and breaking the listing endpoint builder.py refactor ------------------- The previous P1 fix had inline canonicalization. Now it uses the shared helper for consistency: - import changed from get_registered_project to resolve_project_name - the inline lookup is replaced with a single helper call - the comment block now points at representation-authority.md for the canonicalization contract New shared test fixture: tests/conftest.py::project_registry ------------------------------------------------------------ - Standardizes the registry-setup pattern that was duplicated across test_context_builder.py, test_project_state.py, test_interactions.py, and test_reinforcement.py - Returns a callable that takes (project_id, [aliases]) tuples and writes them into a temp registry file with the env var pointed at it and config.settings reloaded - Used by all 12 new regression tests in this commit Tests (12 new, all green on first run) -------------------------------------- test_project_state.py: - test_set_state_canonicalizes_alias: write via alias, read via every alias and the canonical id, verify same row id - test_get_state_canonicalizes_alias_after_canonical_write - test_invalidate_state_canonicalizes_alias - test_unregistered_project_state_still_works (backwards compat) test_interactions.py: - test_record_interaction_canonicalizes_project - test_list_interactions_canonicalizes_project_filter - test_list_interactions_since_accepts_iso_with_t_separator - test_list_interactions_since_accepts_z_suffix - test_list_interactions_since_accepts_offset - test_list_interactions_since_storage_format_still_works test_reinforcement.py: - test_reinforcement_works_when_capture_uses_alias (end-to-end: capture under alias, seed memory under canonical, verify reinforcement matches) - test_get_memories_filter_by_alias Full suite: 174 passing (was 162), 1 warning. The +12 is the new regression tests, no existing tests regressed. What's still NOT canonicalized (and why) ---------------------------------------- - _rank_chunks's secondary substring boost in builder.py — the retriever already does the right thing via its own _project_match_boost which calls get_registered_project. The redundant secondary boost still uses the raw hint but it's a multiplicative factor on top of correct retrieval, not a filter, so it can't drop relevant chunks. Tracked as a future cleanup but not a P1. - update_memory's project field (you can't change a memory's project after creation in the API anyway). - The retriever's project_hint parameter on direct /query calls — same reasoning as the builder boost, plus the retriever's own get_registered_project call already handles aliases there.
2026-04-07 08:29:33 -04:00
from atocore.projects.registry import resolve_project_name
log = get_logger("project_state")
# DB schema extension for project state
PROJECT_STATE_SCHEMA = """
CREATE TABLE IF NOT EXISTS project_state (
id TEXT PRIMARY KEY,
project_id TEXT NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
category TEXT NOT NULL,
key TEXT NOT NULL,
value TEXT NOT NULL,
source TEXT DEFAULT '',
confidence REAL DEFAULT 1.0,
status TEXT DEFAULT 'active',
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
UNIQUE(project_id, category, key)
);
CREATE INDEX IF NOT EXISTS idx_project_state_project ON project_state(project_id);
CREATE INDEX IF NOT EXISTS idx_project_state_category ON project_state(category);
CREATE INDEX IF NOT EXISTS idx_project_state_status ON project_state(status);
"""
# Valid categories for project state entries
CATEGORIES = [
"status", # current project status, phase, blockers
"decision", # confirmed design/engineering decisions
"requirement", # key requirements and constraints
"contact", # key people, vendors, stakeholders
"milestone", # dates, deadlines, deliverables
"fact", # verified technical facts
"config", # project configuration, parameters
]
@dataclass
class ProjectStateEntry:
id: str
project_id: str
category: str
key: str
value: str
source: str = ""
confidence: float = 1.0
status: str = "active"
created_at: str = ""
updated_at: str = ""
def init_project_state_schema() -> None:
"""Create the project_state table if it doesn't exist."""
with get_connection() as conn:
conn.executescript(PROJECT_STATE_SCHEMA)
log.info("project_state_schema_initialized")
def ensure_project(name: str, description: str = "") -> str:
"""Get or create a project by name. Returns project_id."""
with get_connection() as conn:
row = conn.execute(
"SELECT id FROM projects WHERE lower(name) = lower(?)", (name,)
).fetchone()
if row:
return row["id"]
project_id = str(uuid.uuid4())
conn.execute(
"INSERT INTO projects (id, name, description) VALUES (?, ?, ?)",
(project_id, name, description),
)
log.info("project_created", name=name, project_id=project_id)
return project_id
def set_state(
project_name: str,
category: str,
key: str,
value: str,
source: str = "",
confidence: float = 1.0,
) -> ProjectStateEntry:
fix(P1+P2): canonicalize project names at every trust boundary Three findings from codex's review of the previous P1+P2 fix. The earlier commit (f2372ef) only fixed alias resolution at the context builder. Codex correctly pointed out that the same fragmentation applies at every other place a project name crosses a boundary — project_state writes/reads, interaction capture/listing/filtering, memory create/queries, and reinforcement's downstream queries. Plus a real bug in the interaction `since` filter where the storage format and the documented ISO format don't compare cleanly. The fix is one helper used at every boundary instead of duplicating the resolution inline. New helper: src/atocore/projects/registry.py::resolve_project_name --------------------------------------------------------------- - Single canonicalization boundary for project names - Returns the canonical project_id when the input matches any registered id or alias - Returns the input unchanged for empty/None and for unregistered names (preserves backwards compat with hand-curated state that predates the registry) - Documented as the contract that every read/write at the trust boundary should pass through P1 — Trusted Project State endpoints ------------------------------------ src/atocore/context/project_state.py: set_state, get_state, and invalidate_state now all canonicalize project_name through resolve_project_name BEFORE looking up or creating the project row. Before this fix: - POST /project/state with project="p05" called ensure_project("p05") which created a separate row in the projects table - The state row was attached to that alias project_id - Later context builds canonicalized "p05" -> "p05-interferometer" via the builder fix from f2372ef and never found the state - Result: trusted state silently fragmented across alias rows After this fix: - The alias is resolved to the canonical id at every entry point - Two captures (one via "p05", one via "p05-interferometer") write to the same row - get_state via either alias or the canonical id finds the same row Fixes the highest-priority gap codex flagged because Trusted Project State is supposed to be the most dependable layer in the AtoCore trust hierarchy. P2.a — Interaction capture project canonicalization ---------------------------------------------------- src/atocore/interactions/service.py: record_interaction now canonicalizes project before storing, so interaction.project is always the canonical id regardless of what the client passed. Downstream effects: - reinforce_from_interaction queries memories by interaction.project -> previously missed memories stored under canonical id -> now consistent because interaction.project IS the canonical id - the extractor stamps candidates with interaction.project -> previously created candidates in alias buckets -> now creates candidates in the canonical bucket - list_interactions(project=alias) was already broken, now fixed by canonicalizing the filter input on the read side too Memory service applied the same fix: - src/atocore/memory/service.py: create_memory and get_memories both canonicalize project through resolve_project_name - This keeps stored memory.project consistent with the reinforcement query path P2.b — Interaction `since` filter format normalization ------------------------------------------------------ src/atocore/interactions/service.py: new _normalize_since helper. The bug: - created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by convention) so it sorts lexically and compares cleanly with the SQLite CURRENT_TIMESTAMP default - The `since` parameter was documented as ISO 8601 but compared as a raw string against the storage format - The lexically-greater 'T' separator means an ISO timestamp like '2026-04-07T12:00:00Z' is GREATER than the storage form '2026-04-07 12:00:00' for the same instant - Result: a client passing ISO `since` got an empty result for any row from the same day, even though those rows existed and were technically "after" the cutoff in real-world time The fix: - _normalize_since accepts ISO 8601 with T, optional Z suffix, optional fractional seconds, optional +HH:MM offsets - Uses datetime.fromisoformat for parsing (Python 3.11+) - Converts to UTC and reformats as the storage format before the SQL comparison - The bare storage format still works (backwards compat path is a regex match that returns the input unchanged) - Unparseable input is returned as-is so the comparison degrades gracefully (rows just don't match) instead of raising and breaking the listing endpoint builder.py refactor ------------------- The previous P1 fix had inline canonicalization. Now it uses the shared helper for consistency: - import changed from get_registered_project to resolve_project_name - the inline lookup is replaced with a single helper call - the comment block now points at representation-authority.md for the canonicalization contract New shared test fixture: tests/conftest.py::project_registry ------------------------------------------------------------ - Standardizes the registry-setup pattern that was duplicated across test_context_builder.py, test_project_state.py, test_interactions.py, and test_reinforcement.py - Returns a callable that takes (project_id, [aliases]) tuples and writes them into a temp registry file with the env var pointed at it and config.settings reloaded - Used by all 12 new regression tests in this commit Tests (12 new, all green on first run) -------------------------------------- test_project_state.py: - test_set_state_canonicalizes_alias: write via alias, read via every alias and the canonical id, verify same row id - test_get_state_canonicalizes_alias_after_canonical_write - test_invalidate_state_canonicalizes_alias - test_unregistered_project_state_still_works (backwards compat) test_interactions.py: - test_record_interaction_canonicalizes_project - test_list_interactions_canonicalizes_project_filter - test_list_interactions_since_accepts_iso_with_t_separator - test_list_interactions_since_accepts_z_suffix - test_list_interactions_since_accepts_offset - test_list_interactions_since_storage_format_still_works test_reinforcement.py: - test_reinforcement_works_when_capture_uses_alias (end-to-end: capture under alias, seed memory under canonical, verify reinforcement matches) - test_get_memories_filter_by_alias Full suite: 174 passing (was 162), 1 warning. The +12 is the new regression tests, no existing tests regressed. What's still NOT canonicalized (and why) ---------------------------------------- - _rank_chunks's secondary substring boost in builder.py — the retriever already does the right thing via its own _project_match_boost which calls get_registered_project. The redundant secondary boost still uses the raw hint but it's a multiplicative factor on top of correct retrieval, not a filter, so it can't drop relevant chunks. Tracked as a future cleanup but not a P1. - update_memory's project field (you can't change a memory's project after creation in the API anyway). - The retriever's project_hint parameter on direct /query calls — same reasoning as the builder boost, plus the retriever's own get_registered_project call already handles aliases there.
2026-04-07 08:29:33 -04:00
"""Set or update a project state entry. Upsert semantics.
The ``project_name`` is canonicalized through the registry so a
caller passing an alias (``p05``) ends up writing into the same
row as the canonical id (``p05-interferometer``). Without this
step, alias and canonical names would create two parallel
project rows and fragmented state.
"""
if category not in CATEGORIES:
raise ValueError(f"Invalid category '{category}'. Must be one of: {CATEGORIES}")
_validate_confidence(confidence)
fix(P1+P2): canonicalize project names at every trust boundary Three findings from codex's review of the previous P1+P2 fix. The earlier commit (f2372ef) only fixed alias resolution at the context builder. Codex correctly pointed out that the same fragmentation applies at every other place a project name crosses a boundary — project_state writes/reads, interaction capture/listing/filtering, memory create/queries, and reinforcement's downstream queries. Plus a real bug in the interaction `since` filter where the storage format and the documented ISO format don't compare cleanly. The fix is one helper used at every boundary instead of duplicating the resolution inline. New helper: src/atocore/projects/registry.py::resolve_project_name --------------------------------------------------------------- - Single canonicalization boundary for project names - Returns the canonical project_id when the input matches any registered id or alias - Returns the input unchanged for empty/None and for unregistered names (preserves backwards compat with hand-curated state that predates the registry) - Documented as the contract that every read/write at the trust boundary should pass through P1 — Trusted Project State endpoints ------------------------------------ src/atocore/context/project_state.py: set_state, get_state, and invalidate_state now all canonicalize project_name through resolve_project_name BEFORE looking up or creating the project row. Before this fix: - POST /project/state with project="p05" called ensure_project("p05") which created a separate row in the projects table - The state row was attached to that alias project_id - Later context builds canonicalized "p05" -> "p05-interferometer" via the builder fix from f2372ef and never found the state - Result: trusted state silently fragmented across alias rows After this fix: - The alias is resolved to the canonical id at every entry point - Two captures (one via "p05", one via "p05-interferometer") write to the same row - get_state via either alias or the canonical id finds the same row Fixes the highest-priority gap codex flagged because Trusted Project State is supposed to be the most dependable layer in the AtoCore trust hierarchy. P2.a — Interaction capture project canonicalization ---------------------------------------------------- src/atocore/interactions/service.py: record_interaction now canonicalizes project before storing, so interaction.project is always the canonical id regardless of what the client passed. Downstream effects: - reinforce_from_interaction queries memories by interaction.project -> previously missed memories stored under canonical id -> now consistent because interaction.project IS the canonical id - the extractor stamps candidates with interaction.project -> previously created candidates in alias buckets -> now creates candidates in the canonical bucket - list_interactions(project=alias) was already broken, now fixed by canonicalizing the filter input on the read side too Memory service applied the same fix: - src/atocore/memory/service.py: create_memory and get_memories both canonicalize project through resolve_project_name - This keeps stored memory.project consistent with the reinforcement query path P2.b — Interaction `since` filter format normalization ------------------------------------------------------ src/atocore/interactions/service.py: new _normalize_since helper. The bug: - created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by convention) so it sorts lexically and compares cleanly with the SQLite CURRENT_TIMESTAMP default - The `since` parameter was documented as ISO 8601 but compared as a raw string against the storage format - The lexically-greater 'T' separator means an ISO timestamp like '2026-04-07T12:00:00Z' is GREATER than the storage form '2026-04-07 12:00:00' for the same instant - Result: a client passing ISO `since` got an empty result for any row from the same day, even though those rows existed and were technically "after" the cutoff in real-world time The fix: - _normalize_since accepts ISO 8601 with T, optional Z suffix, optional fractional seconds, optional +HH:MM offsets - Uses datetime.fromisoformat for parsing (Python 3.11+) - Converts to UTC and reformats as the storage format before the SQL comparison - The bare storage format still works (backwards compat path is a regex match that returns the input unchanged) - Unparseable input is returned as-is so the comparison degrades gracefully (rows just don't match) instead of raising and breaking the listing endpoint builder.py refactor ------------------- The previous P1 fix had inline canonicalization. Now it uses the shared helper for consistency: - import changed from get_registered_project to resolve_project_name - the inline lookup is replaced with a single helper call - the comment block now points at representation-authority.md for the canonicalization contract New shared test fixture: tests/conftest.py::project_registry ------------------------------------------------------------ - Standardizes the registry-setup pattern that was duplicated across test_context_builder.py, test_project_state.py, test_interactions.py, and test_reinforcement.py - Returns a callable that takes (project_id, [aliases]) tuples and writes them into a temp registry file with the env var pointed at it and config.settings reloaded - Used by all 12 new regression tests in this commit Tests (12 new, all green on first run) -------------------------------------- test_project_state.py: - test_set_state_canonicalizes_alias: write via alias, read via every alias and the canonical id, verify same row id - test_get_state_canonicalizes_alias_after_canonical_write - test_invalidate_state_canonicalizes_alias - test_unregistered_project_state_still_works (backwards compat) test_interactions.py: - test_record_interaction_canonicalizes_project - test_list_interactions_canonicalizes_project_filter - test_list_interactions_since_accepts_iso_with_t_separator - test_list_interactions_since_accepts_z_suffix - test_list_interactions_since_accepts_offset - test_list_interactions_since_storage_format_still_works test_reinforcement.py: - test_reinforcement_works_when_capture_uses_alias (end-to-end: capture under alias, seed memory under canonical, verify reinforcement matches) - test_get_memories_filter_by_alias Full suite: 174 passing (was 162), 1 warning. The +12 is the new regression tests, no existing tests regressed. What's still NOT canonicalized (and why) ---------------------------------------- - _rank_chunks's secondary substring boost in builder.py — the retriever already does the right thing via its own _project_match_boost which calls get_registered_project. The redundant secondary boost still uses the raw hint but it's a multiplicative factor on top of correct retrieval, not a filter, so it can't drop relevant chunks. Tracked as a future cleanup but not a P1. - update_memory's project field (you can't change a memory's project after creation in the API anyway). - The retriever's project_hint parameter on direct /query calls — same reasoning as the builder boost, plus the retriever's own get_registered_project call already handles aliases there.
2026-04-07 08:29:33 -04:00
project_name = resolve_project_name(project_name)
project_id = ensure_project(project_name)
entry_id = str(uuid.uuid4())
now = datetime.now(timezone.utc).isoformat()
with get_connection() as conn:
# Check if entry exists
existing = conn.execute(
"SELECT id FROM project_state WHERE project_id = ? AND category = ? AND key = ?",
(project_id, category, key),
).fetchone()
if existing:
entry_id = existing["id"]
conn.execute(
"UPDATE project_state SET value = ?, source = ?, confidence = ?, "
"status = 'active', updated_at = CURRENT_TIMESTAMP "
"WHERE id = ?",
(value, source, confidence, entry_id),
)
log.info("project_state_updated", project=project_name, category=category, key=key)
else:
conn.execute(
"INSERT INTO project_state (id, project_id, category, key, value, source, confidence) "
"VALUES (?, ?, ?, ?, ?, ?, ?)",
(entry_id, project_id, category, key, value, source, confidence),
)
log.info("project_state_created", project=project_name, category=category, key=key)
return ProjectStateEntry(
id=entry_id,
project_id=project_id,
category=category,
key=key,
value=value,
source=source,
confidence=confidence,
status="active",
created_at=now,
updated_at=now,
)
def get_state(
project_name: str,
category: str | None = None,
active_only: bool = True,
) -> list[ProjectStateEntry]:
fix(P1+P2): canonicalize project names at every trust boundary Three findings from codex's review of the previous P1+P2 fix. The earlier commit (f2372ef) only fixed alias resolution at the context builder. Codex correctly pointed out that the same fragmentation applies at every other place a project name crosses a boundary — project_state writes/reads, interaction capture/listing/filtering, memory create/queries, and reinforcement's downstream queries. Plus a real bug in the interaction `since` filter where the storage format and the documented ISO format don't compare cleanly. The fix is one helper used at every boundary instead of duplicating the resolution inline. New helper: src/atocore/projects/registry.py::resolve_project_name --------------------------------------------------------------- - Single canonicalization boundary for project names - Returns the canonical project_id when the input matches any registered id or alias - Returns the input unchanged for empty/None and for unregistered names (preserves backwards compat with hand-curated state that predates the registry) - Documented as the contract that every read/write at the trust boundary should pass through P1 — Trusted Project State endpoints ------------------------------------ src/atocore/context/project_state.py: set_state, get_state, and invalidate_state now all canonicalize project_name through resolve_project_name BEFORE looking up or creating the project row. Before this fix: - POST /project/state with project="p05" called ensure_project("p05") which created a separate row in the projects table - The state row was attached to that alias project_id - Later context builds canonicalized "p05" -> "p05-interferometer" via the builder fix from f2372ef and never found the state - Result: trusted state silently fragmented across alias rows After this fix: - The alias is resolved to the canonical id at every entry point - Two captures (one via "p05", one via "p05-interferometer") write to the same row - get_state via either alias or the canonical id finds the same row Fixes the highest-priority gap codex flagged because Trusted Project State is supposed to be the most dependable layer in the AtoCore trust hierarchy. P2.a — Interaction capture project canonicalization ---------------------------------------------------- src/atocore/interactions/service.py: record_interaction now canonicalizes project before storing, so interaction.project is always the canonical id regardless of what the client passed. Downstream effects: - reinforce_from_interaction queries memories by interaction.project -> previously missed memories stored under canonical id -> now consistent because interaction.project IS the canonical id - the extractor stamps candidates with interaction.project -> previously created candidates in alias buckets -> now creates candidates in the canonical bucket - list_interactions(project=alias) was already broken, now fixed by canonicalizing the filter input on the read side too Memory service applied the same fix: - src/atocore/memory/service.py: create_memory and get_memories both canonicalize project through resolve_project_name - This keeps stored memory.project consistent with the reinforcement query path P2.b — Interaction `since` filter format normalization ------------------------------------------------------ src/atocore/interactions/service.py: new _normalize_since helper. The bug: - created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by convention) so it sorts lexically and compares cleanly with the SQLite CURRENT_TIMESTAMP default - The `since` parameter was documented as ISO 8601 but compared as a raw string against the storage format - The lexically-greater 'T' separator means an ISO timestamp like '2026-04-07T12:00:00Z' is GREATER than the storage form '2026-04-07 12:00:00' for the same instant - Result: a client passing ISO `since` got an empty result for any row from the same day, even though those rows existed and were technically "after" the cutoff in real-world time The fix: - _normalize_since accepts ISO 8601 with T, optional Z suffix, optional fractional seconds, optional +HH:MM offsets - Uses datetime.fromisoformat for parsing (Python 3.11+) - Converts to UTC and reformats as the storage format before the SQL comparison - The bare storage format still works (backwards compat path is a regex match that returns the input unchanged) - Unparseable input is returned as-is so the comparison degrades gracefully (rows just don't match) instead of raising and breaking the listing endpoint builder.py refactor ------------------- The previous P1 fix had inline canonicalization. Now it uses the shared helper for consistency: - import changed from get_registered_project to resolve_project_name - the inline lookup is replaced with a single helper call - the comment block now points at representation-authority.md for the canonicalization contract New shared test fixture: tests/conftest.py::project_registry ------------------------------------------------------------ - Standardizes the registry-setup pattern that was duplicated across test_context_builder.py, test_project_state.py, test_interactions.py, and test_reinforcement.py - Returns a callable that takes (project_id, [aliases]) tuples and writes them into a temp registry file with the env var pointed at it and config.settings reloaded - Used by all 12 new regression tests in this commit Tests (12 new, all green on first run) -------------------------------------- test_project_state.py: - test_set_state_canonicalizes_alias: write via alias, read via every alias and the canonical id, verify same row id - test_get_state_canonicalizes_alias_after_canonical_write - test_invalidate_state_canonicalizes_alias - test_unregistered_project_state_still_works (backwards compat) test_interactions.py: - test_record_interaction_canonicalizes_project - test_list_interactions_canonicalizes_project_filter - test_list_interactions_since_accepts_iso_with_t_separator - test_list_interactions_since_accepts_z_suffix - test_list_interactions_since_accepts_offset - test_list_interactions_since_storage_format_still_works test_reinforcement.py: - test_reinforcement_works_when_capture_uses_alias (end-to-end: capture under alias, seed memory under canonical, verify reinforcement matches) - test_get_memories_filter_by_alias Full suite: 174 passing (was 162), 1 warning. The +12 is the new regression tests, no existing tests regressed. What's still NOT canonicalized (and why) ---------------------------------------- - _rank_chunks's secondary substring boost in builder.py — the retriever already does the right thing via its own _project_match_boost which calls get_registered_project. The redundant secondary boost still uses the raw hint but it's a multiplicative factor on top of correct retrieval, not a filter, so it can't drop relevant chunks. Tracked as a future cleanup but not a P1. - update_memory's project field (you can't change a memory's project after creation in the API anyway). - The retriever's project_hint parameter on direct /query calls — same reasoning as the builder boost, plus the retriever's own get_registered_project call already handles aliases there.
2026-04-07 08:29:33 -04:00
"""Get project state entries, optionally filtered by category.
The lookup is canonicalized through the registry so an alias hint
finds the same rows as the canonical id.
"""
project_name = resolve_project_name(project_name)
with get_connection() as conn:
project = conn.execute(
"SELECT id FROM projects WHERE lower(name) = lower(?)", (project_name,)
).fetchone()
if not project:
return []
query = "SELECT * FROM project_state WHERE project_id = ?"
params: list = [project["id"]]
if category:
query += " AND category = ?"
params.append(category)
if active_only:
query += " AND status = 'active'"
query += " ORDER BY category, key"
rows = conn.execute(query, params).fetchall()
return [
ProjectStateEntry(
id=r["id"],
project_id=r["project_id"],
category=r["category"],
key=r["key"],
value=r["value"],
source=r["source"],
confidence=r["confidence"],
status=r["status"],
created_at=r["created_at"],
updated_at=r["updated_at"],
)
for r in rows
]
def invalidate_state(project_name: str, category: str, key: str) -> bool:
fix(P1+P2): canonicalize project names at every trust boundary Three findings from codex's review of the previous P1+P2 fix. The earlier commit (f2372ef) only fixed alias resolution at the context builder. Codex correctly pointed out that the same fragmentation applies at every other place a project name crosses a boundary — project_state writes/reads, interaction capture/listing/filtering, memory create/queries, and reinforcement's downstream queries. Plus a real bug in the interaction `since` filter where the storage format and the documented ISO format don't compare cleanly. The fix is one helper used at every boundary instead of duplicating the resolution inline. New helper: src/atocore/projects/registry.py::resolve_project_name --------------------------------------------------------------- - Single canonicalization boundary for project names - Returns the canonical project_id when the input matches any registered id or alias - Returns the input unchanged for empty/None and for unregistered names (preserves backwards compat with hand-curated state that predates the registry) - Documented as the contract that every read/write at the trust boundary should pass through P1 — Trusted Project State endpoints ------------------------------------ src/atocore/context/project_state.py: set_state, get_state, and invalidate_state now all canonicalize project_name through resolve_project_name BEFORE looking up or creating the project row. Before this fix: - POST /project/state with project="p05" called ensure_project("p05") which created a separate row in the projects table - The state row was attached to that alias project_id - Later context builds canonicalized "p05" -> "p05-interferometer" via the builder fix from f2372ef and never found the state - Result: trusted state silently fragmented across alias rows After this fix: - The alias is resolved to the canonical id at every entry point - Two captures (one via "p05", one via "p05-interferometer") write to the same row - get_state via either alias or the canonical id finds the same row Fixes the highest-priority gap codex flagged because Trusted Project State is supposed to be the most dependable layer in the AtoCore trust hierarchy. P2.a — Interaction capture project canonicalization ---------------------------------------------------- src/atocore/interactions/service.py: record_interaction now canonicalizes project before storing, so interaction.project is always the canonical id regardless of what the client passed. Downstream effects: - reinforce_from_interaction queries memories by interaction.project -> previously missed memories stored under canonical id -> now consistent because interaction.project IS the canonical id - the extractor stamps candidates with interaction.project -> previously created candidates in alias buckets -> now creates candidates in the canonical bucket - list_interactions(project=alias) was already broken, now fixed by canonicalizing the filter input on the read side too Memory service applied the same fix: - src/atocore/memory/service.py: create_memory and get_memories both canonicalize project through resolve_project_name - This keeps stored memory.project consistent with the reinforcement query path P2.b — Interaction `since` filter format normalization ------------------------------------------------------ src/atocore/interactions/service.py: new _normalize_since helper. The bug: - created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by convention) so it sorts lexically and compares cleanly with the SQLite CURRENT_TIMESTAMP default - The `since` parameter was documented as ISO 8601 but compared as a raw string against the storage format - The lexically-greater 'T' separator means an ISO timestamp like '2026-04-07T12:00:00Z' is GREATER than the storage form '2026-04-07 12:00:00' for the same instant - Result: a client passing ISO `since` got an empty result for any row from the same day, even though those rows existed and were technically "after" the cutoff in real-world time The fix: - _normalize_since accepts ISO 8601 with T, optional Z suffix, optional fractional seconds, optional +HH:MM offsets - Uses datetime.fromisoformat for parsing (Python 3.11+) - Converts to UTC and reformats as the storage format before the SQL comparison - The bare storage format still works (backwards compat path is a regex match that returns the input unchanged) - Unparseable input is returned as-is so the comparison degrades gracefully (rows just don't match) instead of raising and breaking the listing endpoint builder.py refactor ------------------- The previous P1 fix had inline canonicalization. Now it uses the shared helper for consistency: - import changed from get_registered_project to resolve_project_name - the inline lookup is replaced with a single helper call - the comment block now points at representation-authority.md for the canonicalization contract New shared test fixture: tests/conftest.py::project_registry ------------------------------------------------------------ - Standardizes the registry-setup pattern that was duplicated across test_context_builder.py, test_project_state.py, test_interactions.py, and test_reinforcement.py - Returns a callable that takes (project_id, [aliases]) tuples and writes them into a temp registry file with the env var pointed at it and config.settings reloaded - Used by all 12 new regression tests in this commit Tests (12 new, all green on first run) -------------------------------------- test_project_state.py: - test_set_state_canonicalizes_alias: write via alias, read via every alias and the canonical id, verify same row id - test_get_state_canonicalizes_alias_after_canonical_write - test_invalidate_state_canonicalizes_alias - test_unregistered_project_state_still_works (backwards compat) test_interactions.py: - test_record_interaction_canonicalizes_project - test_list_interactions_canonicalizes_project_filter - test_list_interactions_since_accepts_iso_with_t_separator - test_list_interactions_since_accepts_z_suffix - test_list_interactions_since_accepts_offset - test_list_interactions_since_storage_format_still_works test_reinforcement.py: - test_reinforcement_works_when_capture_uses_alias (end-to-end: capture under alias, seed memory under canonical, verify reinforcement matches) - test_get_memories_filter_by_alias Full suite: 174 passing (was 162), 1 warning. The +12 is the new regression tests, no existing tests regressed. What's still NOT canonicalized (and why) ---------------------------------------- - _rank_chunks's secondary substring boost in builder.py — the retriever already does the right thing via its own _project_match_boost which calls get_registered_project. The redundant secondary boost still uses the raw hint but it's a multiplicative factor on top of correct retrieval, not a filter, so it can't drop relevant chunks. Tracked as a future cleanup but not a P1. - update_memory's project field (you can't change a memory's project after creation in the API anyway). - The retriever's project_hint parameter on direct /query calls — same reasoning as the builder boost, plus the retriever's own get_registered_project call already handles aliases there.
2026-04-07 08:29:33 -04:00
"""Mark a project state entry as superseded.
The lookup is canonicalized through the registry so an alias is
treated as the canonical project for the invalidation lookup.
"""
project_name = resolve_project_name(project_name)
with get_connection() as conn:
project = conn.execute(
"SELECT id FROM projects WHERE lower(name) = lower(?)", (project_name,)
).fetchone()
if not project:
return False
result = conn.execute(
"UPDATE project_state SET status = 'superseded', updated_at = CURRENT_TIMESTAMP "
"WHERE project_id = ? AND category = ? AND key = ? AND status = 'active'",
(project["id"], category, key),
)
if result.rowcount > 0:
log.info("project_state_invalidated", project=project_name, category=category, key=key)
return True
return False
def format_project_state(entries: list[ProjectStateEntry]) -> str:
"""Format project state entries for context injection."""
if not entries:
return ""
lines = ["--- Trusted Project State ---"]
current_category = ""
for entry in entries:
if entry.category != current_category:
current_category = entry.category
lines.append(f"\n[{current_category.upper()}]")
lines.append(f" {entry.key}: {entry.value}")
if entry.source:
lines.append(f" (source: {entry.source})")
lines.append("\n--- End Project State ---")
return "\n".join(lines)
def _validate_confidence(confidence: float) -> None:
if not 0.0 <= confidence <= 1.0:
raise ValueError("Confidence must be between 0.0 and 1.0")