docs+test: clarify legacy alias compatibility gap, add gap regression test

Codex caught a real documentation accuracy bug in the previous canonicalization doc commit (f521aab). The doc claimed that rows written under aliases before fb6298a "still work via the unregistered-name fallback path" — that is wrong for REGISTERED aliases, which is exactly the case that matters. The unregistered-name fallback only saves you when the project was never in the registry: a row stored under "orphan-project" is read back via "orphan-project", both pass through resolve_project_name unchanged, and the strings line up. For a registered alias like "p05", the helper rewrites the read key to "p05-interferometer" but does NOT rewrite the storage key, so the legacy row becomes silently invisible. This commit corrects the doc and locks the gap behavior in with a regression test, so the issue cannot be lost again. docs/architecture/project-identity-canonicalization.md ------------------------------------------------------ - Removed the misleading claim from the "What this rule does NOT cover" section. Replaced with a pointer to the new gap section and an explicit statement that the migration is required before engineering V1 ships. - New "Compatibility gap: legacy alias-keyed rows" section between "Why this is the trust hierarchy in action" and "The rule for new entry points". This is the natural insertion point because the gap is exactly the trust hierarchy failing for legacy data. The section covers: * a worked T0/T1 timeline showing the exact failure mode * what is at risk on the live Dalidou DB, ranked by trust tier: projects table (shadow rows), project_state (highest risk because Layer 3 is most-authoritative), memories, interactions * inspection SQL queries for measuring the actual blast radius on the live DB before running any migration * the spec for the migration script: walk projects, find shadow rows, merge dependent state via the conflict model when there are collisions, dry-run mode, idempotent * explicit statement that this is required pre-V1 because V1 will add new project-keyed tables and the killer correctness queries from engineering-query-catalog.md would report wrong results against any project that has shadow rows - "Open follow-ups" item 1 promoted from "tracked optional" to "REQUIRED before engineering V1 ships, NOT optional" with a more honest cost estimate (~150 LOC migration + ~50 LOC tests + supervised live run, not the previous optimistic ~30 LOC) - TL;DR rewritten to mention the gap explicitly and re-order the open follow-ups so the migration is the top priority tests/test_project_state.py --------------------------- - New test_legacy_alias_keyed_state_is_invisible_until_migrated - Inserts a "p05" project row + a project_state row pointing at it via raw SQL (bypassing set_state which now canonicalizes), simulating a pre-fix legacy row - Verifies the canonicalized get_state path can NOT see the row via either the alias or the canonical id — this is the bug - Verifies the row is still in the database (just unreachable), so the migration script has something to find - The docstring explicitly says: "When the legacy alias migration script lands, this test must be inverted." Future readers will know exactly when and how to update it. Full suite: 175 passing (was 174), 1 warning. The +1 is the new gap regression test. What this commit does NOT do ---------------------------- - The migration script itself is NOT in this commit. Codex's finding was a doc accuracy issue, and the right scope is fix the doc + lock the gap behavior in. Writing the migration is the next concrete step but is bigger (~200 LOC + dry-run mode + collision handling via the conflict model + supervised run on the live Dalidou DB), warrants its own commit, and probably warrants a "draft + review the dry-run output before applying" workflow rather than a single shot. - Existing tests are unchanged. The new test stands alone as a documented gap; the 12 canonicalization tests from fb6298a still pass without modification.
2026-04-07 20:14:19 -04:00
parent f521aab97b
commit 1953e559f9
2 changed files with 216 additions and 17 deletions
--- a/tests/test_project_state.py
+++ b/tests/test_project_state.py
@@ -196,3 +196,74 @@ def test_unregistered_project_state_still_works(project_registry):
    entries = get_state("orphan-project")
    assert len(entries) == 1
    assert entries[0].value == "Standalone"
+
+
+def test_legacy_alias_keyed_state_is_invisible_until_migrated(project_registry):
+    """Documents the compatibility gap from project-identity-canonicalization.md.
+
+    Rows that were written under a registered alias BEFORE the
+    canonicalization landed in fb6298a are stored in the projects
+    table under the alias name (not the canonical id). Every read
+    path now canonicalizes to the canonical id, so those legacy
+    rows become invisible.
+
+    This test simulates the legacy state by inserting a shadow
+    project row and a state row that points at it via raw SQL,
+    bypassing set_state() which now canonicalizes. Then it
+    verifies the canonicalized get_state() does NOT find the
+    legacy row.
+
+    When the legacy alias migration script lands (see the open
+    follow-ups in docs/architecture/project-identity-canonicalization.md),
+    this test must be inverted: after running the migration the
+    legacy state should be reachable via the canonical project,
+    not invisible. The migration is required before engineering
+    V1 ships.
+    """
+    import uuid
+
+    from atocore.models.database import get_connection
+
+    project_registry(("p05-interferometer", ["p05", "interferometer"]))
+
+    # Simulate a pre-fix legacy row by writing directly under the
+    # alias name. This is what the OLD set_state would have done
+    # before fb6298a added canonicalization.
+    legacy_project_id = str(uuid.uuid4())
+    legacy_state_id = str(uuid.uuid4())
+    with get_connection() as conn:
+        conn.execute(
+            "INSERT INTO projects (id, name, description) VALUES (?, ?, ?)",
+            (legacy_project_id, "p05", "shadow row created before canonicalization"),
+        )
+        conn.execute(
+            "INSERT INTO project_state "
+            "(id, project_id, category, key, value, source, confidence) "
+            "VALUES (?, ?, ?, ?, ?, ?, ?)",
+            (
+                legacy_state_id,
+                legacy_project_id,
+                "status",
+                "legacy_focus",
+                "Wave 1 ingestion",
+                "pre-canonicalization",
+                1.0,
+            ),
+        )
+
+    # The canonicalized read path looks under "p05-interferometer"
+    # and cannot see the legacy row. THIS IS THE GAP.
+    via_alias = get_state("p05")
+    via_canonical = get_state("p05-interferometer")
+    assert all(entry.value != "Wave 1 ingestion" for entry in via_alias)
+    assert all(entry.value != "Wave 1 ingestion" for entry in via_canonical)
+
+    # The legacy row is still in the database — it's just unreachable
+    # from the canonicalized read path. The migration script (open
+    # follow-up) is what closes the gap.
+    with get_connection() as conn:
+        row = conn.execute(
+            "SELECT value FROM project_state WHERE id = ?", (legacy_state_id,)
+        ).fetchone()
+    assert row is not None
+    assert row["value"] == "Wave 1 ingestion"