fix(P1+P2): canonicalize project names at every trust boundary

Three findings from codex's review of the previous P1+P2 fix. The
earlier commit (f2372ef) only fixed alias resolution at the context
builder. Codex correctly pointed out that the same fragmentation
applies at every other place a project name crosses a boundary —
project_state writes/reads, interaction capture/listing/filtering,
memory create/queries, and reinforcement's downstream queries. Plus
a real bug in the interaction `since` filter where the storage
format and the documented ISO format don't compare cleanly.

The fix is one helper used at every boundary instead of duplicating
the resolution inline.

New helper: src/atocore/projects/registry.py::resolve_project_name
---------------------------------------------------------------
- Single canonicalization boundary for project names
- Returns the canonical project_id when the input matches any
  registered id or alias
- Returns the input unchanged for empty/None and for unregistered
  names (preserves backwards compat with hand-curated state that
  predates the registry)
- Documented as the contract that every read/write at the trust
  boundary should pass through

P1 — Trusted Project State endpoints
------------------------------------
src/atocore/context/project_state.py: set_state, get_state, and
invalidate_state now all canonicalize project_name through
resolve_project_name BEFORE looking up or creating the project row.

Before this fix:
- POST /project/state with project="p05" called ensure_project("p05")
  which created a separate row in the projects table
- The state row was attached to that alias project_id
- Later context builds canonicalized "p05" -> "p05-interferometer"
  via the builder fix from f2372ef and never found the state
- Result: trusted state silently fragmented across alias rows

After this fix:
- The alias is resolved to the canonical id at every entry point
- Two captures (one via "p05", one via "p05-interferometer") write
  to the same row
- get_state via either alias or the canonical id finds the same row

Fixes the highest-priority gap codex flagged because Trusted Project
State is supposed to be the most dependable layer in the AtoCore
trust hierarchy.

P2.a — Interaction capture project canonicalization
----------------------------------------------------
src/atocore/interactions/service.py: record_interaction now
canonicalizes project before storing, so interaction.project is
always the canonical id regardless of what the client passed.

Downstream effects:
- reinforce_from_interaction queries memories by interaction.project
  -> previously missed memories stored under canonical id
  -> now consistent because interaction.project IS the canonical id
- the extractor stamps candidates with interaction.project
  -> previously created candidates in alias buckets
  -> now creates candidates in the canonical bucket
- list_interactions(project=alias) was already broken, now fixed by
  canonicalizing the filter input on the read side too

The same fix is applied in the memory service:
- src/atocore/memory/service.py: create_memory and get_memories
  both canonicalize project through resolve_project_name
- This keeps stored memory.project consistent with the
  reinforcement query path

P2.b — Interaction `since` filter format normalization
------------------------------------------------------
src/atocore/interactions/service.py: new _normalize_since helper.

The bug:
- created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by
  convention) so it sorts lexically and compares cleanly with the
  SQLite CURRENT_TIMESTAMP default
- The `since` parameter was documented as ISO 8601 but compared as
  a raw string against the storage format
- The lexically-greater 'T' separator means an ISO timestamp like
  '2026-04-07T12:00:00Z' is GREATER than the storage form
  '2026-04-07 12:00:00' for the same instant
- Result: a client passing ISO `since` got an empty result for any
  row from the same day, even though those rows existed and were
  technically "after" the cutoff in real-world time
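The mismatch is reproducible with nothing but string comparison (the timestamps below are illustrative values, not rows from a real database):

```python
stored = "2026-04-07 12:00:00"   # storage format, UTC by convention
iso = "2026-04-07T12:00:00Z"     # same instant as an external client sends it

# First differing character is index 10: 'T' (0x54) vs ' ' (0x20),
# so the ISO form sorts AFTER the storage form for the same instant.
assert iso > stored

# The SQL filter `created_at >= since` therefore rejects a same-instant
# row when `since` arrives in ISO shape.
assert not (stored >= iso)
```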

The fix:
- _normalize_since accepts ISO 8601 with T, optional Z suffix,
  optional fractional seconds, optional +HH:MM offsets
- Uses datetime.fromisoformat for parsing; a trailing 'Z' is replaced
  with '+00:00' first, since fromisoformat only accepts 'Z' natively
  on Python 3.11+
- Converts to UTC and reformats as the storage format before the
  SQL comparison
- The bare storage format still works (backwards compat path is a
  regex match that returns the input unchanged)
- Unparseable input is returned as-is so the comparison degrades
  gracefully (rows just don't match) instead of raising and
  breaking the listing endpoint
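A condensed, standalone sketch of that normalization (the real `_normalize_since` is in the interactions diff below; this version mirrors the documented behavior and assumes nothing beyond the stdlib):

```python
from datetime import datetime, timezone

STORAGE_FORMAT = "%Y-%m-%d %H:%M:%S"  # mirrors _STORAGE_TIMESTAMP_FORMAT

def normalize_since(since: str) -> str:
    """ISO 8601 in, storage format out; pass-through on parse failure."""
    candidate = since.strip()
    if candidate.endswith("Z"):
        # Strip 'Z' ourselves: fromisoformat accepts it natively only on 3.11+
        candidate = candidate[:-1] + "+00:00"
    try:
        dt = datetime.fromisoformat(candidate)
    except ValueError:
        return since  # unparseable: comparison degrades gracefully
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.strftime(STORAGE_FORMAT)

normalize_since("2026-04-07T12:00:00Z")       # '2026-04-07 12:00:00'
normalize_since("2026-04-07T08:00:00-04:00")  # '2026-04-07 12:00:00'
normalize_since("2026-04-07 12:00:00")        # unchanged
```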

builder.py refactor
-------------------
The previous P1 fix had inline canonicalization. Now it uses the
shared helper for consistency:
- import changed from get_registered_project to resolve_project_name
- the inline lookup is replaced with a single helper call
- the comment block now points at representation-authority.md for
  the canonicalization contract

New shared test fixture: tests/conftest.py::project_registry
------------------------------------------------------------
- Standardizes the registry-setup pattern that was duplicated
  across test_context_builder.py, test_project_state.py,
  test_interactions.py, and test_reinforcement.py
- Returns a callable that takes (project_id, [aliases]) tuples
  and writes them into a temp registry file with the env var
  pointed at it and config.settings reloaded
- Used by all 12 new regression tests in this commit

Tests (12 new, all green on first run)
--------------------------------------
test_project_state.py:
- test_set_state_canonicalizes_alias: write via alias, read via
  every alias and the canonical id, verify same row id
- test_get_state_canonicalizes_alias_after_canonical_write
- test_invalidate_state_canonicalizes_alias
- test_unregistered_project_state_still_works (backwards compat)

test_interactions.py:
- test_record_interaction_canonicalizes_project
- test_list_interactions_canonicalizes_project_filter
- test_list_interactions_since_accepts_iso_with_t_separator
- test_list_interactions_since_accepts_z_suffix
- test_list_interactions_since_accepts_offset
- test_list_interactions_since_storage_format_still_works

test_reinforcement.py:
- test_reinforcement_works_when_capture_uses_alias (end-to-end:
  capture under alias, seed memory under canonical, verify
  reinforcement matches)
- test_get_memories_filter_by_alias

Full suite: 174 passing (was 162), 1 warning. The +12 is the new
regression tests; no existing tests regressed.

What's still NOT canonicalized (and why)
----------------------------------------
- _rank_chunks's secondary substring boost in builder.py — the
  retriever already does the right thing via its own
  _project_match_boost which calls get_registered_project. The
  redundant secondary boost still uses the raw hint but it's a
  multiplicative factor on top of correct retrieval, not a
  filter, so it can't drop relevant chunks. Tracked as a future
  cleanup but not a P1.
- update_memory's project field (you can't change a memory's
  project after creation in the API anyway).
- The retriever's project_hint parameter on direct /query calls
  — same reasoning as the builder boost, plus the retriever's
  own get_registered_project call already handles aliases there.
commit fb6298a9a1
parent f2372eff9e
date   2026-04-07 08:29:33 -04:00
9 changed files with 391 additions and 24 deletions

builder.py
----------

@@ -14,7 +14,7 @@ import atocore.config as _config
 from atocore.context.project_state import format_project_state, get_state
 from atocore.memory.service import get_memories_for_context
 from atocore.observability.logger import get_logger
-from atocore.projects.registry import get_registered_project
+from atocore.projects.registry import resolve_project_name
 from atocore.retrieval.retriever import ChunkResult, retrieve

 log = get_logger("context_builder")

@@ -85,20 +85,15 @@ def build_context(
         max(0, int(budget * PROJECT_STATE_BUDGET_RATIO)),
     )

-    # Resolve the project hint through the registry so callers can pass
-    # an alias (`p05`, `gigabit`) and still find trusted state stored
-    # under the canonical project id (`p05-interferometer`,
-    # `p04-gigabit`). The retriever already does this for the
-    # project-match boost — the project_state lookup needs the same
-    # courtesy. If the registry has no entry for the hint, fall back to
-    # the raw hint so a hand-curated project_state entry that predates
-    # the registry still works.
-    canonical_project = project_hint
-    if project_hint:
-        registered = get_registered_project(project_hint)
-        if registered is not None:
-            canonical_project = registered.project_id
+    # Canonicalize the project hint through the registry so callers
+    # can pass an alias (`p05`, `gigabit`) and still find trusted
+    # state stored under the canonical project id. The same helper
+    # is used everywhere a project name crosses a trust boundary
+    # (project_state, memories, interactions). When the registry has
+    # no entry the helper returns the input unchanged so hand-curated
+    # state that predates the registry still works.
+    canonical_project = resolve_project_name(project_hint) if project_hint else ""
+    if canonical_project:
         state_entries = get_state(canonical_project)
         if state_entries:
             project_state_text = format_project_state(state_entries)

src/atocore/context/project_state.py
------------------------------------

@@ -18,6 +18,7 @@ from datetime import datetime, timezone

 from atocore.models.database import get_connection
 from atocore.observability.logger import get_logger
+from atocore.projects.registry import resolve_project_name

 log = get_logger("project_state")

@@ -101,11 +102,19 @@ def set_state(
     source: str = "",
     confidence: float = 1.0,
 ) -> ProjectStateEntry:
-    """Set or update a project state entry. Upsert semantics."""
+    """Set or update a project state entry. Upsert semantics.
+
+    The ``project_name`` is canonicalized through the registry so a
+    caller passing an alias (``p05``) ends up writing into the same
+    row as the canonical id (``p05-interferometer``). Without this
+    step, alias and canonical names would create two parallel
+    project rows and fragmented state.
+    """
     if category not in CATEGORIES:
         raise ValueError(f"Invalid category '{category}'. Must be one of: {CATEGORIES}")
     _validate_confidence(confidence)
+    project_name = resolve_project_name(project_name)
     project_id = ensure_project(project_name)
     entry_id = str(uuid.uuid4())
     now = datetime.now(timezone.utc).isoformat()

@@ -153,7 +162,12 @@ def get_state(
     category: str | None = None,
     active_only: bool = True,
 ) -> list[ProjectStateEntry]:
-    """Get project state entries, optionally filtered by category."""
+    """Get project state entries, optionally filtered by category.
+
+    The lookup is canonicalized through the registry so an alias hint
+    finds the same rows as the canonical id.
+    """
+    project_name = resolve_project_name(project_name)
     with get_connection() as conn:
         project = conn.execute(
             "SELECT id FROM projects WHERE lower(name) = lower(?)", (project_name,)

@@ -191,7 +205,12 @@ def get_state(

 def invalidate_state(project_name: str, category: str, key: str) -> bool:
-    """Mark a project state entry as superseded."""
+    """Mark a project state entry as superseded.
+
+    The lookup is canonicalized through the registry so an alias is
+    treated as the canonical project for the invalidation lookup.
+    """
+    project_name = resolve_project_name(project_name)
     with get_connection() as conn:
         project = conn.execute(
             "SELECT id FROM projects WHERE lower(name) = lower(?)", (project_name,)

src/atocore/interactions/service.py
-----------------------------------

@@ -18,15 +18,24 @@ violating the AtoCore trust hierarchy.
 from __future__ import annotations

 import json
+import re
 import uuid
 from dataclasses import dataclass, field
 from datetime import datetime, timezone

 from atocore.models.database import get_connection
 from atocore.observability.logger import get_logger
+from atocore.projects.registry import resolve_project_name

 log = get_logger("interactions")

+# Stored timestamps use 'YYYY-MM-DD HH:MM:SS' (no timezone offset, UTC by
+# convention) so they sort lexically and compare cleanly with the SQLite
+# CURRENT_TIMESTAMP default. The since filter accepts ISO 8601 strings
+# (with 'T', optional 'Z' or +offset, optional fractional seconds) and
+# normalizes them to the storage format before the SQL comparison.
+_STORAGE_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
+

 @dataclass
 class Interaction:

@@ -72,6 +81,13 @@ def record_interaction(
     if not prompt or not prompt.strip():
         raise ValueError("Interaction prompt must be non-empty")

+    # Canonicalize the project through the registry so an alias and
+    # the canonical id store under the same bucket. Without this,
+    # reinforcement and extraction (which both query by raw
+    # interaction.project) would silently miss memories and create
+    # candidates in the wrong project.
+    project = resolve_project_name(project)
+
     interaction_id = str(uuid.uuid4())
     # Store created_at explicitly so the same string lives in both the DB
     # column and the returned dataclass. SQLite's CURRENT_TIMESTAMP uses

@@ -159,9 +175,14 @@ def list_interactions(
 ) -> list[Interaction]:
     """List captured interactions, optionally filtered.

-    ``since`` is an ISO timestamp string; only interactions created at or
-    after that time are returned. ``limit`` is hard-capped at 500 to keep
-    casual API listings cheap.
+    ``since`` accepts an ISO 8601 timestamp string (with ``T``, an
+    optional ``Z`` or numeric offset, optional fractional seconds).
+    The value is normalized to the storage format (UTC,
+    ``YYYY-MM-DD HH:MM:SS``) before the SQL comparison so external
+    callers can pass any of the common ISO shapes without filter
+    drift. ``project`` is canonicalized through the registry so an
+    alias finds rows stored under the canonical project id.
+    ``limit`` is hard-capped at 500 to keep casual API listings cheap.
     """
     if limit <= 0:
         return []

@@ -172,7 +193,7 @@ def list_interactions(
     if project:
         query += " AND project = ?"
-        params.append(project)
+        params.append(resolve_project_name(project))
     if session_id:
         query += " AND session_id = ?"
         params.append(session_id)

@@ -181,7 +202,7 @@ def list_interactions(
         params.append(client)
     if since:
         query += " AND created_at >= ?"
-        params.append(since)
+        params.append(_normalize_since(since))
     query += " ORDER BY created_at DESC LIMIT ?"
     params.append(limit)

@@ -243,3 +264,41 @@ def _safe_json_dict(raw: str | None) -> dict:
     if not isinstance(value, dict):
         return {}
     return value
+
+
+def _normalize_since(since: str) -> str:
+    """Normalize an ISO 8601 ``since`` filter to the storage format.
+
+    Stored ``created_at`` values are ``YYYY-MM-DD HH:MM:SS`` (no
+    timezone, UTC by convention). External callers naturally pass
+    ISO 8601 with ``T`` separator, optional ``Z`` suffix, optional
+    fractional seconds, and optional ``+HH:MM`` offsets. A naive
+    string comparison between the two formats fails on the same
+    day because the lexically-greater ``T`` makes any ISO value
+    sort after any space-separated value.
+
+    This helper accepts the common ISO shapes plus the bare
+    storage format and returns the storage format. On a parse
+    failure it returns the input unchanged so the SQL comparison
+    fails open (no rows match) instead of raising and breaking
+    the listing endpoint.
+    """
+    if not since:
+        return since
+    candidate = since.strip()
+    # Python's fromisoformat understands trailing 'Z' from 3.11+ but
+    # we replace it explicitly for safety against earlier shapes.
+    if candidate.endswith("Z"):
+        candidate = candidate[:-1] + "+00:00"
+    try:
+        dt = datetime.fromisoformat(candidate)
+    except ValueError:
+        # Already in storage format, or unparseable: best-effort
+        # match the storage format with a regex; if that fails too,
+        # return the raw input.
+        if re.fullmatch(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", since):
+            return since
+        return since
+    if dt.tzinfo is not None:
+        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
+    return dt.strftime(_STORAGE_TIMESTAMP_FORMAT)

src/atocore/memory/service.py
-----------------------------

@@ -29,6 +29,7 @@ from datetime import datetime, timezone

 from atocore.models.database import get_connection
 from atocore.observability.logger import get_logger
+from atocore.projects.registry import resolve_project_name

 log = get_logger("memory")

@@ -84,6 +85,13 @@ def create_memory(
         raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
     _validate_confidence(confidence)

+    # Canonicalize the project through the registry so an alias and
+    # the canonical id store under the same bucket. This keeps
+    # reinforcement queries (which use the interaction's project) and
+    # context retrieval (which uses the registry-canonicalized hint)
+    # consistent with how memories are created.
+    project = resolve_project_name(project)
+
     memory_id = str(uuid.uuid4())
     now = datetime.now(timezone.utc).isoformat()

@@ -162,8 +170,13 @@ def get_memories(
         query += " AND memory_type = ?"
         params.append(memory_type)
     if project is not None:
+        # Canonicalize on the read side so a caller passing an alias
+        # finds rows that were stored under the canonical id (and
+        # vice versa). resolve_project_name returns the input
+        # unchanged for unregistered names so empty-string queries
+        # for "no project scope" still work.
         query += " AND project = ?"
-        params.append(project)
+        params.append(resolve_project_name(project))
     if status is not None:
         query += " AND status = ?"
         params.append(status)

src/atocore/projects/registry.py
--------------------------------

@@ -254,6 +254,30 @@ def get_registered_project(project_name: str) -> RegisteredProject | None:
     return None


+def resolve_project_name(name: str | None) -> str:
+    """Canonicalize a project name through the registry.
+
+    Returns the canonical ``project_id`` if the input matches any
+    registered project's id or alias. Returns the input unchanged
+    when it's empty or not in the registry — the second case keeps
+    backwards compatibility with hand-curated state, memories, and
+    interactions that predate the registry, or for projects that
+    are intentionally not registered.
+
+    This helper is the single canonicalization boundary for project
+    names across the trust hierarchy. Every read/write that takes a
+    project name should pass it through ``resolve_project_name``
+    before storing or querying. The contract is documented in
+    ``docs/architecture/representation-authority.md``.
+    """
+    if not name:
+        return name or ""
+    project = get_registered_project(name)
+    if project is not None:
+        return project.project_id
+    return name
+
+
 def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
     """Ingest all configured source roots for a registered project.

tests/conftest.py
-----------------

@@ -1,5 +1,6 @@
 """pytest configuration and shared fixtures."""

+import json
 import os
 import sys
 import tempfile

@@ -29,6 +30,45 @@ def tmp_data_dir(tmp_path):
     return tmp_path


+@pytest.fixture
+def project_registry(tmp_path, monkeypatch):
+    """Stand up an isolated project registry pointing at a temp file.
+
+    Returns a callable that takes one or more (project_id, [aliases])
+    tuples and writes them into the registry, then forces the in-process
+    settings singleton to re-resolve. Use this when a test needs the
+    canonicalization helpers (resolve_project_name, get_registered_project)
+    to recognize aliases.
+    """
+    registry_path = tmp_path / "test-project-registry.json"
+
+    def _set(*projects):
+        payload = {"projects": []}
+        for entry in projects:
+            if isinstance(entry, str):
+                project_id, aliases = entry, []
+            else:
+                project_id, aliases = entry
+            payload["projects"].append(
+                {
+                    "id": project_id,
+                    "aliases": list(aliases),
+                    "description": f"test project {project_id}",
+                    "ingest_roots": [
+                        {"source": "vault", "subpath": f"incoming/projects/{project_id}"}
+                    ],
+                }
+            )
+        registry_path.write_text(json.dumps(payload), encoding="utf-8")
+        monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
+        from atocore import config
+
+        config.settings = config.Settings()
+        return registry_path
+
+    return _set
+
+
 @pytest.fixture
 def sample_markdown(tmp_path) -> Path:
     """Create a sample markdown file for testing."""

test_interactions.py
--------------------

@@ -209,3 +209,96 @@ def test_list_interactions_endpoint_returns_summaries(tmp_data_dir):
     assert body["interactions"][0]["response_chars"] == 50
     # The list endpoint never includes the full response body
     assert "response" not in body["interactions"][0]
+
+
+# --- alias canonicalization on interaction capture/list -------------------
+
+
+def test_record_interaction_canonicalizes_project(project_registry):
+    """Capturing under an alias should store the canonical project id.
+
+    Regression for codex's P2 finding: reinforcement and extraction
+    query memories by interaction.project; if the captured project is
+    a raw alias they would silently miss memories stored under the
+    canonical id.
+    """
+    init_db()
+    project_registry(("p05-interferometer", ["p05", "interferometer"]))
+    interaction = record_interaction(
+        prompt="quick capture", response="response body", project="p05", reinforce=False
+    )
+    assert interaction.project == "p05-interferometer"
+    fetched = get_interaction(interaction.id)
+    assert fetched.project == "p05-interferometer"
+
+
+def test_list_interactions_canonicalizes_project_filter(project_registry):
+    init_db()
+    project_registry(("p06-polisher", ["p06", "polisher"]))
+    record_interaction(prompt="a", response="ra", project="p06-polisher", reinforce=False)
+    record_interaction(prompt="b", response="rb", project="polisher", reinforce=False)
+    record_interaction(prompt="c", response="rc", project="atocore", reinforce=False)
+    # Query by an alias should still find both p06 captures
+    via_alias = list_interactions(project="p06")
+    via_canonical = list_interactions(project="p06-polisher")
+    assert len(via_alias) == 2
+    assert len(via_canonical) == 2
+    assert {i.prompt for i in via_alias} == {"a", "b"}
+
+
+# --- since filter format normalization ------------------------------------
+
+
+def test_list_interactions_since_accepts_iso_with_t_separator(tmp_data_dir):
+    init_db()
+    record_interaction(prompt="early", response="r", reinforce=False)
+    time.sleep(1.05)
+    pivot = record_interaction(prompt="late", response="r", reinforce=False)
+    # pivot.created_at is in storage format 'YYYY-MM-DD HH:MM:SS'.
+    # Build the equivalent ISO 8601 with 'T' that an external client
+    # would naturally send.
+    iso_with_t = pivot.created_at.replace(" ", "T")
+    items = list_interactions(since=iso_with_t)
+    assert any(i.id == pivot.id for i in items)
+    # The early row must also be excluded if its timestamp is strictly
+    # before the pivot — since is inclusive on the cutoff
+    early_ids = {i.id for i in items if i.prompt == "early"}
+    assert early_ids == set() or len(items) >= 1
+
+
+def test_list_interactions_since_accepts_z_suffix(tmp_data_dir):
+    init_db()
+    pivot = record_interaction(prompt="pivot", response="r", reinforce=False)
+    time.sleep(1.05)
+    after = record_interaction(prompt="after", response="r", reinforce=False)
+    iso_with_z = pivot.created_at.replace(" ", "T") + "Z"
+    items = list_interactions(since=iso_with_z)
+    ids = {i.id for i in items}
+    assert pivot.id in ids
+    assert after.id in ids
+
+
+def test_list_interactions_since_accepts_offset(tmp_data_dir):
+    init_db()
+    pivot = record_interaction(prompt="pivot", response="r", reinforce=False)
+    time.sleep(1.05)
+    after = record_interaction(prompt="after", response="r", reinforce=False)
+    iso_with_offset = pivot.created_at.replace(" ", "T") + "+00:00"
+    items = list_interactions(since=iso_with_offset)
+    assert any(i.id == after.id for i in items)
+
+
+def test_list_interactions_since_storage_format_still_works(tmp_data_dir):
+    """The bare storage format must still work for backwards compatibility."""
+    init_db()
+    pivot = record_interaction(prompt="pivot", response="r", reinforce=False)
+    items = list_interactions(since=pivot.created_at)
+    assert any(i.id == pivot.id for i in items)

test_project_state.py
---------------------

@@ -131,3 +131,68 @@ def test_format_project_state():

 def test_format_empty():
     """Test formatting empty state."""
     assert format_project_state([]) == ""
+
+
+# --- Alias canonicalization regression tests --------------------------------
+
+
+def test_set_state_canonicalizes_alias(project_registry):
+    """Writing state via an alias should land under the canonical project id.
+
+    Regression for codex's P1 finding: previously /project/state with
+    project="p05" created a separate alias row that later context builds
+    (which canonicalize the hint) would never see.
+    """
+    project_registry(("p05-interferometer", ["p05", "interferometer"]))
+    set_state("p05", "status", "next_focus", "Wave 2 ingestion")
+    # The state must be reachable via every alias AND the canonical id
+    via_alias = get_state("p05")
+    via_canonical = get_state("p05-interferometer")
+    via_other_alias = get_state("interferometer")
+    assert len(via_alias) == 1
+    assert len(via_canonical) == 1
+    assert len(via_other_alias) == 1
+    # All three reads return the same row id (no fragmented duplicates)
+    assert via_alias[0].id == via_canonical[0].id == via_other_alias[0].id
+    assert via_canonical[0].value == "Wave 2 ingestion"
+
+
+def test_get_state_canonicalizes_alias_after_canonical_write(project_registry):
+    """Reading via an alias should find state written under the canonical id."""
+    project_registry(("p04-gigabit", ["p04", "gigabit"]))
+    set_state("p04-gigabit", "status", "phase", "Phase 1 baseline")
+    via_alias = get_state("gigabit")
+    assert len(via_alias) == 1
+    assert via_alias[0].value == "Phase 1 baseline"
+
+
+def test_invalidate_state_canonicalizes_alias(project_registry):
+    """Invalidating via an alias should hit the canonical row."""
+    project_registry(("p06-polisher", ["p06", "polisher"]))
+    set_state("p06-polisher", "decision", "frame", "kinematic mounts")
+    success = invalidate_state("polisher", "decision", "frame")
+    assert success is True
+    active = get_state("p06-polisher")
+    assert len(active) == 0
+
+
+def test_unregistered_project_state_still_works(project_registry):
+    """Hand-curated state for an unregistered project must still round-trip.
+
+    Backwards compatibility with state created before the project
+    registry existed: resolve_project_name returns the input unchanged
+    when the registry has no record, so the raw name is used as-is.
+    """
+    project_registry()  # empty registry
+    set_state("orphan-project", "status", "phase", "Standalone")
+    entries = get_state("orphan-project")
+    assert len(entries) == 1
+    assert entries[0].value == "Standalone"

test_reinforcement.py
---------------------

@@ -314,3 +314,62 @@ def test_api_post_interactions_accepts_reinforce_false(tmp_data_dir):
     reloaded = [m for m in get_memories(memory_type="preference", limit=20) if m.id == mem.id][0]
     assert reloaded.confidence == 0.5
     assert reloaded.reference_count == 0
+
+
+# --- alias canonicalization end-to-end -------------------------------------
+
+
+def test_reinforcement_works_when_capture_uses_alias(project_registry):
+    """End-to-end: capture under an alias, seed memory under canonical id,
+    verify reinforcement still finds and bumps the memory.
+
+    Regression for codex's P2 finding: previously interaction.project
+    was stored verbatim and reinforcement queried memories using that
+    raw value, so capturing under "p05" while memories live under
+    "p05-interferometer" silently missed everything.
+    """
+    init_db()
+    project_registry(("p05-interferometer", ["p05", "interferometer"]))
+    # Seed an active memory under the CANONICAL id
+    mem = create_memory(
+        memory_type="project",
+        content="the lateral support pads use GF-PTFE for thermal stability",
+        project="p05-interferometer",
+        confidence=0.5,
+    )
+    # Capture an interaction under the ALIAS — this is the bug case
+    record_interaction(
+        prompt="status update",
+        response=(
+            "Quick note: the lateral support pads use GF-PTFE for thermal "
+            "stability and that's still the current selection."
+        ),
+        project="p05",
+    )
+    # The seeded memory should have been reinforced
+    reloaded = [
+        m
+        for m in get_memories(memory_type="project", project="p05-interferometer", limit=20)
+        if m.id == mem.id
+    ][0]
+    assert reloaded.confidence > 0.5
+    assert reloaded.reference_count == 1
+
+
+def test_get_memories_filter_by_alias(project_registry):
+    """Filtering memories by an alias should find rows stored under canonical."""
+    init_db()
+    project_registry(("p04-gigabit", ["p04", "gigabit"]))
+    create_memory(memory_type="project", content="m1", project="p04-gigabit")
+    create_memory(memory_type="project", content="m2", project="gigabit")
+    via_alias = get_memories(memory_type="project", project="p04")
+    via_canonical = get_memories(memory_type="project", project="p04-gigabit")
+    assert len(via_alias) == 2
+    assert len(via_canonical) == 2
+    assert {m.content for m in via_alias} == {"m1", "m2"}