feat(ops): legacy alias migration script with dry-run/apply modes
Closes the compatibility gap documented in
docs/architecture/project-identity-canonicalization.md. Before fb6298a,
writes to project_state, memories, and interactions stored the raw
project name. After fb6298a, every service-layer entry point
canonicalizes through the registry, which silently made pre-fix
alias-keyed rows unreachable from the new read path. Now there's a
migration tool to find and fix them.

This commit is the tool and its tests. The tool is NOT run against the
live Dalidou DB in this commit; that's a separate supervised manual
step after reviewing the dry-run output.

scripts/migrate_legacy_aliases.py
---------------------------------

Standalone offline migration tool. Dry-run by default, --apply explicit.

What it inspects:

- projects: rows whose name is a registered alias and differs from the
  canonical project_id (shadow rows)
- project_state: rows whose project_id points at a shadow; the plan
  rekeys them to the canonical row's id. (category, key) collisions
  against the canonical block the apply step until a human resolves
  them
- memories: rows whose project column is a registered alias. Plain
  string rekey. Dedup collisions (after rekey, same (memory_type,
  content, project, status)) are handled by the existing memory
  supersession model: the newer row stays active, the older becomes
  superseded, with updated_at as the tiebreaker
- interactions: rows whose project column is a registered alias. Plain
  string rekey, no collision handling

What it does NOT do:

- Never touches rows that are already canonical
- Never auto-resolves project_state collisions (refuses until the
  human picks a winner via POST /project/state)
- Never creates data; only rekeys or supersedes
- Never runs outside a single SQLite transaction; any failure rolls
  back the entire migration

Safety rails:

- Dry-run is the default. --apply is explicit.
- Apply on an empty plan refuses unless --allow-empty (prevents
  accidental runs that look meaningful but did nothing)
- Apply refuses on any project_state collision
- Apply refuses on integrity errors (e.g. two case-variant rows both
  matching the canonical lookup)
- Writes a JSON report to data/migrations/ on every run (dry-run and
  apply alike) for audit
- Idempotent: running twice produces the same final state as running
  once. The second run finds zero shadow rows and exits clean.

CLI flags:

  --registry PATH    override ATOCORE_PROJECT_REGISTRY_PATH
  --db PATH          override the AtoCore SQLite DB path
  --apply            actually mutate (default is dry-run)
  --allow-empty      permit --apply on an empty plan
  --report-dir PATH  where to write the JSON report
  --json             emit the plan as JSON instead of human prose

A smoke test against the Phase 9 validation DB produces the expected
"Nothing to migrate. The database is clean." output, with 4 known
canonical projects and 0 shadows.

tests/test_migrate_legacy_aliases.py
------------------------------------

19 new tests, all green:

Plan-building:

- test_dry_run_on_empty_registry_reports_empty_plan
- test_dry_run_on_clean_registered_db_reports_empty_plan
- test_dry_run_finds_shadow_project
- test_dry_run_plans_state_rekey_without_collisions
- test_dry_run_detects_state_collision
- test_dry_run_plans_memory_rekey_and_supersession
- test_dry_run_plans_interaction_rekey

Apply:

- test_apply_refuses_on_state_collision
- test_apply_migrates_clean_shadow_end_to_end (verifies get_state can
  see the state via BOTH the alias AND the canonical after migration)
- test_apply_drops_shadow_state_duplicate_without_collision (same
  (category, key, value) on both sides: mark the shadow superseded,
  don't hit the UNIQUE constraint)
- test_apply_migrates_memories
- test_apply_migrates_interactions
- test_apply_is_idempotent
- test_apply_refuses_with_integrity_errors (uses case-variant
  canonical rows to work around the projects.name UNIQUE constraint;
  verifies the case-insensitive duplicate detection works)

Reporting:

- test_plan_to_json_dict_is_serializable
- test_write_report_creates_file
- test_render_plan_text_on_empty_plan
- test_render_plan_text_on_collision

End-to-end gap closure (the most important test):

- test_legacy_alias_gap_is_closed_after_migration
  - Seeds the exact same scenario as
    test_legacy_alias_keyed_state_is_invisible_until_migrated in
    test_project_state.py (which documents the pre-migration gap)
  - Confirms the row is invisible before migration
  - Runs the migration
  - Verifies the row is reachable via BOTH the canonical id AND the
    alias afterward
  - This test and the pre-migration gap test together lock in "before
    migration: invisible, after migration: reachable" as the
    documented invariant

Full suite: 194 passing (was 175), 1 warning. The +19 is the new
migration test file.

Next concrete steps after this commit
-------------------------------------

- Run the dry-run against the live Dalidou DB to find out the actual
  blast radius. The script is the inspection SQL, codified.
- Review the dry-run output together
- If clean (zero shadows), no apply is needed; close the doc gap as
  "verified nothing to migrate on this deployment"
- If there are shadows, resolve any collisions via POST /project/state,
  then run --apply under supervision
- After apply, test_legacy_alias_keyed_state_is_invisible_until_migrated
  still passes (it simulates the gap directly, so it's independent of
  the live DB state), and the gap-closed companion test continues to
  guard forward
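The alias-to-canonical mapping that drives the whole plan can be sketched in isolation. This is a self-contained rendition of the script's build_alias_map logic operating on an already-parsed payload (the registry contents shown here are hypothetical):

```python
def build_alias_map(payload: dict) -> dict[str, str]:
    """Map lowercased alias-or-id -> canonical id (sketch of the script's logic)."""
    mapping: dict[str, str] = {}
    for entry in payload.get("projects", []):
        canonical = str(entry.get("id", "")).strip()
        if not canonical:
            continue
        # The canonical id maps to itself so callers can tell
        # "already canonical" apart from "is an alias".
        mapping[canonical.lower()] = canonical
        for alias in entry.get("aliases", []):
            alias_str = str(alias).strip()
            if alias_str:
                mapping[alias_str.lower()] = canonical
    return mapping

# hypothetical registry payload
registry = {"projects": [{"id": "Dalidou", "aliases": ["dalidou-app", "DALIDOU-WEB"]}]}
amap = build_alias_map(registry)
assert amap["dalidou"] == "Dalidou"       # canonical maps to itself
assert amap["dalidou-app"] == "Dalidou"   # alias maps to canonical
assert amap["dalidou-web"] == "Dalidou"   # lookup is case-insensitive
```

Lowercasing every key is what makes the shadow detection case-insensitive end to end.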
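The memory dedup rule (newer row stays active, older becomes superseded, updated_at as tiebreaker) reduces to a lexicographic comparison of ISO-8601 timestamp strings; a minimal sketch with hypothetical rows, mirroring the script's comparison:

```python
def pick_loser(shadow: dict, canonical: dict) -> dict:
    """Return the duplicate row that should be marked superseded.

    The row with the newer updated_at stays active; on a tie the
    canonical row wins, so the shadow is superseded.
    """
    shadow_updated = shadow.get("updated_at") or ""
    canonical_updated = canonical.get("updated_at") or ""
    return canonical if shadow_updated > canonical_updated else shadow

shadow = {"id": "a", "updated_at": "2025-01-02T00:00:00Z"}
canonical = {"id": "b", "updated_at": "2025-03-04T00:00:00Z"}
assert pick_loser(shadow, canonical)["id"] == "a"  # canonical is newer; shadow loses
```

String comparison is sound here only because the timestamps share one fixed-width ISO-8601 format, which is why no datetime parsing is needed.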
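The all-or-nothing transaction rail can also be shown in isolation; a minimal sketch using a throwaway in-memory table (not the script's real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage BEGIN/COMMIT explicitly
conn.execute("CREATE TABLE projects (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO projects VALUES ('1', 'dalidou-app')")

try:
    conn.execute("BEGIN")
    conn.execute("UPDATE projects SET name = 'Dalidou' WHERE id = '1'")
    conn.execute("INSERT INTO projects VALUES ('1', 'dupe')")  # PK violation raises
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")  # any failure undoes the whole migration

# the earlier UPDATE was rolled back along with the failed INSERT
assert conn.execute("SELECT name FROM projects").fetchone()[0] == "dalidou-app"
```

This is the property the commit relies on for safety: a partial migration can never be observed, because every mutation lands in one transaction or none do.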
scripts/migrate_legacy_aliases.py | 886 (new file)
@@ -0,0 +1,886 @@
"""Migrate legacy alias-keyed rows to canonical project ids.

Standalone, offline migration tool that closes the compatibility gap
described in ``docs/architecture/project-identity-canonicalization.md``.
Before ``fb6298a`` landed, writes to ``/project/state``, ``/memory``,
and ``/interactions`` stored the project name verbatim. After
``fb6298a`` every service-layer entry point canonicalizes through the
project registry, which silently makes pre-fix alias-keyed rows
unreachable from the new read path.

This script finds those shadow rows and rekeys them to the canonical
project id.

Usage
-----

Dry-run (default, safe, idempotent):

    python scripts/migrate_legacy_aliases.py

Override the registry or DB location:

    python scripts/migrate_legacy_aliases.py \
        --registry /srv/storage/atocore/config/project-registry.json \
        --db /srv/storage/atocore/data/db/atocore.db

Apply after reviewing dry-run output:

    python scripts/migrate_legacy_aliases.py --apply

Emit the plan as JSON instead of human prose:

    python scripts/migrate_legacy_aliases.py --json

Behavior
--------

Dry-run always succeeds. It walks:

1. ``projects`` — any row whose ``name`` (lowercased) matches a
   registered alias and differs from the canonical project id. These
   are "shadow" rows.
2. ``project_state`` — any row whose ``project_id`` points at a
   shadow projects row. The plan rekeys these to the canonical
   projects id. A collision (shadow and canonical both have state
   under the same ``(category, key)``) blocks the apply step.
3. ``memories`` — any row whose ``project`` column is a registered
   alias. The plan rekeys these to the canonical project id.
   Duplicate-after-rekey cases are handled by marking the older row
   ``superseded`` and keeping the newer row active.
4. ``interactions`` — any row whose ``project`` column is a
   registered alias. Plain string rekey, no collision handling.

Apply runs the plan inside a single SQLite transaction. Any error
rolls back the whole migration. A JSON report is written under
``data/migrations/`` (respects ``ATOCORE_DATA_DIR``).

Safety rails
------------

- Dry-run is the default; ``--apply`` is explicit and not the default.
- Apply on an empty plan is a no-op that refuses unless
  ``--allow-empty`` is passed (prevents accidental runs on clean DBs
  that look like they did something).
- Apply refuses if the plan has any project_state collisions. The
  operator must resolve the collision via the normal
  ``POST /project/state`` path (or by manually editing the shadow
  row) before running ``--apply``.
- The script never touches rows that are already canonical. Running
  ``--apply`` twice produces the same final state as running it
  once (idempotent).
- A pre-apply integrity check refuses to run if the DB has more than
  one row for the canonical id of any alias — a situation the
  migration wasn't designed to handle.
"""

from __future__ import annotations

import argparse
import json
import os
import sqlite3
import sys
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path

_REPO_ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(_REPO_ROOT / "src"))


# ---------------------------------------------------------------------------
# Data classes describing the plan
# ---------------------------------------------------------------------------


@dataclass
class ShadowProject:
    """A projects row whose name is a registered alias."""

    shadow_row_id: str
    shadow_name: str
    canonical_project_id: str  # the id field from the registry (the name we want)


@dataclass
class StateRekeyPlan:
    shadow_row_id: str
    canonical_row_id: str  # resolved at plan time; may be created if missing
    canonical_project_id: str
    rows_to_rekey: list[dict] = field(default_factory=list)
    collisions: list[dict] = field(default_factory=list)


@dataclass
class MemoryRekeyPlan:
    alias: str
    canonical_project_id: str
    rows_to_rekey: list[dict] = field(default_factory=list)
    to_supersede: list[dict] = field(default_factory=list)


@dataclass
class InteractionRekeyPlan:
    alias: str
    canonical_project_id: str
    rows_to_rekey: list[dict] = field(default_factory=list)


@dataclass
class MigrationPlan:
    alias_map: dict[str, str] = field(default_factory=dict)  # lowercased alias -> canonical id
    shadow_projects: list[ShadowProject] = field(default_factory=list)
    state_plans: list[StateRekeyPlan] = field(default_factory=list)
    memory_plans: list[MemoryRekeyPlan] = field(default_factory=list)
    interaction_plans: list[InteractionRekeyPlan] = field(default_factory=list)
    integrity_errors: list[str] = field(default_factory=list)

    @property
    def has_collisions(self) -> bool:
        return any(p.collisions for p in self.state_plans)

    @property
    def is_empty(self) -> bool:
        return (
            not self.shadow_projects
            and not any(p.rows_to_rekey for p in self.memory_plans)
            and not any(p.rows_to_rekey for p in self.interaction_plans)
        )

    def counts(self) -> dict:
        return {
            "shadow_projects": len(self.shadow_projects),
            "state_rekey_rows": sum(len(p.rows_to_rekey) for p in self.state_plans),
            "state_collisions": sum(len(p.collisions) for p in self.state_plans),
            "memory_rekey_rows": sum(len(p.rows_to_rekey) for p in self.memory_plans),
            "memory_supersede_rows": sum(len(p.to_supersede) for p in self.memory_plans),
            "interaction_rekey_rows": sum(len(p.rows_to_rekey) for p in self.interaction_plans),
        }


# ---------------------------------------------------------------------------
# Plan building
# ---------------------------------------------------------------------------


def build_alias_map(registry_path: Path) -> dict[str, str]:
    """Load the registry and return {lowercased alias_or_id: canonical_id}.

    The canonical id itself is also in the map so later code can
    distinguish "already canonical" from "is an alias".
    """
    if not registry_path.exists():
        return {}
    payload = json.loads(registry_path.read_text(encoding="utf-8"))
    mapping: dict[str, str] = {}
    for entry in payload.get("projects", []):
        canonical = str(entry.get("id", "")).strip()
        if not canonical:
            continue
        mapping[canonical.lower()] = canonical
        for alias in entry.get("aliases", []):
            alias_str = str(alias).strip()
            if not alias_str:
                continue
            mapping[alias_str.lower()] = canonical
    return mapping


def _dict_row(row: sqlite3.Row) -> dict:
    return {k: row[k] for k in row.keys()}


def find_shadow_projects(
    conn: sqlite3.Connection, alias_map: dict[str, str]
) -> list[ShadowProject]:
    """Find rows in the projects table whose name is a registered
    alias that differs from the canonical project id."""
    shadows: list[ShadowProject] = []
    rows = conn.execute("SELECT id, name FROM projects").fetchall()
    for row in rows:
        name = (row["name"] or "").strip()
        if not name:
            continue
        canonical = alias_map.get(name.lower())
        if canonical is None:
            continue  # not in registry; leave it alone (unregistered-name fallback)
        if name == canonical:
            continue  # already canonical; nothing to migrate
        shadows.append(
            ShadowProject(
                shadow_row_id=row["id"],
                shadow_name=name,
                canonical_project_id=canonical,
            )
        )
    return shadows


def _find_or_plan_canonical_row(
    conn: sqlite3.Connection, canonical_project_id: str
) -> str | None:
    """Return the existing projects.id row whose name matches the
    canonical project id, or None if no such row exists yet.

    Case-insensitive lookup matches ``ensure_project``'s semantics.
    """
    row = conn.execute(
        "SELECT id FROM projects WHERE lower(name) = lower(?)",
        (canonical_project_id,),
    ).fetchone()
    return row["id"] if row else None


def check_canonical_integrity(
    conn: sqlite3.Connection, alias_map: dict[str, str]
) -> list[str]:
    """Refuse to run if any canonical project has more than one row.

    This shouldn't happen under normal operation, but if it does the
    migration can't proceed because we don't know which canonical row
    the shadow dependents should be rekeyed to.
    """
    errors: list[str] = []
    seen_canonicals: set[str] = set()
    for canonical in alias_map.values():
        if canonical in seen_canonicals:
            continue
        seen_canonicals.add(canonical)
        count_row = conn.execute(
            "SELECT COUNT(*) AS c FROM projects WHERE lower(name) = lower(?)",
            (canonical,),
        ).fetchone()
        if count_row and count_row["c"] > 1:
            errors.append(
                f"canonical project '{canonical}' has {count_row['c']} rows in "
                f"the projects table; migration refuses to run until this is "
                f"resolved manually"
            )
    return errors


def plan_state_migration(
    conn: sqlite3.Connection,
    shadows: list[ShadowProject],
) -> list[StateRekeyPlan]:
    plans: list[StateRekeyPlan] = []
    for shadow in shadows:
        canonical_row_id = _find_or_plan_canonical_row(
            conn, shadow.canonical_project_id
        )
        # If the canonical row doesn't exist yet we'll create it at
        # apply time; planning can use a placeholder.
        planned_canonical_row_id = canonical_row_id or "<will-create>"

        plan = StateRekeyPlan(
            shadow_row_id=shadow.shadow_row_id,
            canonical_row_id=planned_canonical_row_id,
            canonical_project_id=shadow.canonical_project_id,
        )

        # All state rows currently attached to the shadow
        state_rows = conn.execute(
            "SELECT * FROM project_state WHERE project_id = ? AND status = 'active'",
            (shadow.shadow_row_id,),
        ).fetchall()

        for raw in state_rows:
            state = _dict_row(raw)
            if canonical_row_id is None:
                # No canonical row exists yet; nothing to collide with.
                # The rekey is clean.
                plan.rows_to_rekey.append(state)
                continue

            # Check for a collision: an active state row on the
            # canonical project with the same (category, key) whose
            # value differs from the shadow's.
            collision_row = conn.execute(
                "SELECT * FROM project_state "
                "WHERE project_id = ? AND category = ? AND key = ? AND status = 'active'",
                (canonical_row_id, state["category"], state["key"]),
            ).fetchone()

            if collision_row is None:
                plan.rows_to_rekey.append(state)
                continue

            if collision_row["value"] == state["value"]:
                # Same value — trivially deduplicable. Rekeying would
                # hit the UNIQUE constraint, so plan to mark the shadow
                # row superseded at apply time instead of UPDATEing it.
                state["_duplicate_of"] = _dict_row(collision_row)
                plan.rows_to_rekey.append(state)
                continue

            plan.collisions.append(
                {
                    "shadow": state,
                    "canonical": _dict_row(collision_row),
                }
            )

        plans.append(plan)
    return plans


def plan_memory_migration(
    conn: sqlite3.Connection, alias_map: dict[str, str]
) -> list[MemoryRekeyPlan]:
    plans: list[MemoryRekeyPlan] = []

    # Group aliases by canonical to build one plan per canonical.
    aliases_by_canonical: dict[str, list[str]] = {}
    for alias, canonical in alias_map.items():
        if alias == canonical.lower():
            continue  # canonical string is not an alias worth rekeying
        aliases_by_canonical.setdefault(canonical, []).append(alias)

    for canonical, aliases in aliases_by_canonical.items():
        for alias_key in aliases:
            # memories store the string (mixed case allowed); find any
            # row whose project column (lowercased) matches the alias
            raw_rows = conn.execute(
                "SELECT * FROM memories WHERE lower(project) = lower(?)",
                (alias_key,),
            ).fetchall()
            if not raw_rows:
                continue

            plan = MemoryRekeyPlan(alias=alias_key, canonical_project_id=canonical)
            for raw in raw_rows:
                mem = _dict_row(raw)
                # Check for dedup collision against the canonical bucket
                dup = conn.execute(
                    "SELECT * FROM memories WHERE memory_type = ? AND content = ? "
                    "AND lower(project) = lower(?) AND status = ? AND id != ?",
                    (
                        mem["memory_type"],
                        mem["content"],
                        canonical,
                        mem["status"],
                        mem["id"],
                    ),
                ).fetchone()
                if dup is None:
                    plan.rows_to_rekey.append(mem)
                    continue
                dup_dict = _dict_row(dup)
                # Newer row wins; older is superseded. updated_at is
                # the tiebreaker because created_at is the storage
                # insert time and updated_at reflects last mutation.
                shadow_updated = mem.get("updated_at") or ""
                canonical_updated = dup_dict.get("updated_at") or ""
                if shadow_updated > canonical_updated:
                    # Shadow is newer: canonical becomes superseded,
                    # shadow rekeys into active.
                    plan.rows_to_rekey.append(mem)
                    plan.to_supersede.append(dup_dict)
                else:
                    # Canonical is newer (or equal): shadow becomes
                    # superseded in place (we still rekey it so its
                    # project column is canonical, but its status flips
                    # to superseded).
                    mem["_action"] = "supersede"
                    plan.to_supersede.append(mem)
            plans.append(plan)
    return plans


def plan_interaction_migration(
    conn: sqlite3.Connection, alias_map: dict[str, str]
) -> list[InteractionRekeyPlan]:
    plans: list[InteractionRekeyPlan] = []
    aliases_by_canonical: dict[str, list[str]] = {}
    for alias, canonical in alias_map.items():
        if alias == canonical.lower():
            continue
        aliases_by_canonical.setdefault(canonical, []).append(alias)

    for canonical, aliases in aliases_by_canonical.items():
        for alias_key in aliases:
            raw_rows = conn.execute(
                "SELECT * FROM interactions WHERE lower(project) = lower(?)",
                (alias_key,),
            ).fetchall()
            if not raw_rows:
                continue
            plan = InteractionRekeyPlan(alias=alias_key, canonical_project_id=canonical)
            for raw in raw_rows:
                plan.rows_to_rekey.append(_dict_row(raw))
            plans.append(plan)
    return plans


def build_plan(
    conn: sqlite3.Connection, registry_path: Path
) -> MigrationPlan:
    alias_map = build_alias_map(registry_path)
    plan = MigrationPlan(alias_map=alias_map)
    if not alias_map:
        return plan

    plan.integrity_errors = check_canonical_integrity(conn, alias_map)
    if plan.integrity_errors:
        return plan

    plan.shadow_projects = find_shadow_projects(conn, alias_map)
    plan.state_plans = plan_state_migration(conn, plan.shadow_projects)
    plan.memory_plans = plan_memory_migration(conn, alias_map)
    plan.interaction_plans = plan_interaction_migration(conn, alias_map)
    return plan


# ---------------------------------------------------------------------------
# Apply
# ---------------------------------------------------------------------------


class MigrationRefused(RuntimeError):
    """Raised when the apply step refuses to run due to a safety rail."""


def apply_plan(conn: sqlite3.Connection, plan: MigrationPlan) -> dict:
    """Execute the plan inside a single SQLite transaction.

    Returns a summary dict. Raises ``MigrationRefused`` if the plan
    has integrity errors or state collisions. An empty plan is not
    refused here; callers that want to refuse applying an empty plan
    must check ``plan.is_empty`` themselves.
    """
    if plan.integrity_errors:
        raise MigrationRefused(
            "migration refuses to run due to integrity errors: "
            + "; ".join(plan.integrity_errors)
        )
    if plan.has_collisions:
        raise MigrationRefused(
            "migration refuses to run; the plan has "
            f"{sum(len(p.collisions) for p in plan.state_plans)} "
            "project_state collision(s) that require manual resolution"
        )

    summary: dict = {
        "shadow_projects_deleted": 0,
        "state_rows_rekeyed": 0,
        "state_rows_merged_as_duplicate": 0,
        "memory_rows_rekeyed": 0,
        "memory_rows_superseded": 0,
        "interaction_rows_rekeyed": 0,
        "canonical_rows_created": 0,
    }

    try:
        conn.execute("BEGIN")

        # --- Projects + project_state ----------------------------------
        for sp_plan in plan.state_plans:
            canonical_row_id = sp_plan.canonical_row_id
            if canonical_row_id == "<will-create>":
                # Create the canonical projects row now
                canonical_row_id = str(uuid.uuid4())
                conn.execute(
                    "INSERT INTO projects (id, name, description) "
                    "VALUES (?, ?, ?)",
                    (
                        canonical_row_id,
                        sp_plan.canonical_project_id,
                        "created by legacy alias migration",
                    ),
                )
                summary["canonical_rows_created"] += 1

            for state in sp_plan.rows_to_rekey:
                if "_duplicate_of" in state:
                    # Same (category, key, value) already on canonical;
                    # mark the shadow state row as superseded rather
                    # than hitting the UNIQUE constraint.
                    conn.execute(
                        "UPDATE project_state SET status = 'superseded', "
                        "updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                        (state["id"],),
                    )
                    summary["state_rows_merged_as_duplicate"] += 1
                    continue
                conn.execute(
                    "UPDATE project_state SET project_id = ?, "
                    "updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                    (canonical_row_id, state["id"]),
                )
                summary["state_rows_rekeyed"] += 1

        # Delete shadow projects rows after their state dependents
        # have moved. We can do this unconditionally because project_state
        # has ON DELETE CASCADE on project_id, and all our dependents
        # have already been rekeyed or superseded above.
        for sp_plan in plan.state_plans:
            conn.execute(
                "DELETE FROM projects WHERE id = ?", (sp_plan.shadow_row_id,)
            )
            summary["shadow_projects_deleted"] += 1

        # Shadow projects that had no state rows attached are covered by
        # the loop above too: plan_state_migration produces a plan entry
        # for every shadow, even an empty one.

        # --- Memories --------------------------------------------------
        for mem_plan in plan.memory_plans:
            for mem in mem_plan.rows_to_rekey:
                conn.execute(
                    "UPDATE memories SET project = ?, "
                    "updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                    (mem_plan.canonical_project_id, mem["id"]),
                )
                summary["memory_rows_rekeyed"] += 1
            for mem in mem_plan.to_supersede:
                # Supersede the older/duplicate row AND rekey its
                # project to canonical so the historical audit trail
                # reflects the canonical project.
                conn.execute(
                    "UPDATE memories SET status = 'superseded', "
                    "project = ?, updated_at = CURRENT_TIMESTAMP "
                    "WHERE id = ?",
                    (mem_plan.canonical_project_id, mem["id"]),
                )
                summary["memory_rows_superseded"] += 1

        # --- Interactions ----------------------------------------------
        for ix_plan in plan.interaction_plans:
            for ix in ix_plan.rows_to_rekey:
                conn.execute(
                    "UPDATE interactions SET project = ? WHERE id = ?",
                    (ix_plan.canonical_project_id, ix["id"]),
                )
                summary["interaction_rows_rekeyed"] += 1

        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

    return summary


# ---------------------------------------------------------------------------
# Reporting
# ---------------------------------------------------------------------------


def render_plan_text(plan: MigrationPlan) -> str:
    lines: list[str] = []
    lines.append("=" * 72)
    lines.append("Legacy alias migration — plan")
    lines.append("=" * 72)

    if plan.integrity_errors:
        lines.append("")
        lines.append("INTEGRITY ERRORS — migration cannot run:")
        for err in plan.integrity_errors:
            lines.append(f"  - {err}")
        return "\n".join(lines)

    lines.append("")
    if not plan.alias_map:
        lines.append("No registered aliases; nothing to plan.")
        return "\n".join(lines)

    canonicals = sorted(set(plan.alias_map.values()))
    lines.append(f"Known canonical projects ({len(canonicals)}):")
    for canonical in canonicals:
        aliases = [
            alias
            for alias, canon in plan.alias_map.items()
            if canon == canonical and alias != canonical.lower()
        ]
        lines.append(f"  - {canonical} (aliases: {', '.join(aliases) or 'none'})")

    lines.append("")
    counts = plan.counts()
    lines.append("Counts:")
    lines.append(f"  shadow projects   : {counts['shadow_projects']}")
    lines.append(f"  state rekey rows  : {counts['state_rekey_rows']}")
    lines.append(f"  state collisions  : {counts['state_collisions']}")
    lines.append(f"  memory rekey rows : {counts['memory_rekey_rows']}")
    lines.append(f"  memory supersede  : {counts['memory_supersede_rows']}")
    lines.append(f"  interaction rekey : {counts['interaction_rekey_rows']}")

    if plan.is_empty:
        lines.append("")
        lines.append("Nothing to migrate. The database is clean.")
        return "\n".join(lines)

    if plan.shadow_projects:
        lines.append("")
        lines.append("Shadow projects (alias-named rows that will be deleted):")
        for shadow in plan.shadow_projects:
            lines.append(
                f"  - '{shadow.shadow_name}' (row {shadow.shadow_row_id[:8]}) "
                f"-> '{shadow.canonical_project_id}'"
            )

    if plan.has_collisions:
        lines.append("")
        lines.append("*** STATE COLLISIONS — apply will REFUSE until resolved: ***")
        for sp_plan in plan.state_plans:
            for collision in sp_plan.collisions:
                shadow = collision["shadow"]
                canonical = collision["canonical"]
                lines.append(
                    f"  - [{shadow['category']}/{shadow['key']}] "
                    f"shadow value={shadow['value']!r} vs "
                    f"canonical value={canonical['value']!r}"
                )
        lines.append("")
        lines.append(
            "Resolution: use POST /project/state to pick the winning value, "
            "then re-run --apply."
        )
        return "\n".join(lines)

    lines.append("")
    lines.append("Apply plan (if --apply is given):")
    for sp_plan in plan.state_plans:
        if sp_plan.rows_to_rekey:
            lines.append(
                f"  - rekey {len(sp_plan.rows_to_rekey)} project_state row(s) "
                f"from shadow {sp_plan.shadow_row_id[:8]} -> '{sp_plan.canonical_project_id}'"
            )
    for mem_plan in plan.memory_plans:
        if mem_plan.rows_to_rekey:
            lines.append(
                f"  - rekey {len(mem_plan.rows_to_rekey)} memory row(s) from "
                f"project='{mem_plan.alias}' -> '{mem_plan.canonical_project_id}'"
            )
        if mem_plan.to_supersede:
            lines.append(
                f"  - supersede {len(mem_plan.to_supersede)} duplicate "
                f"memory row(s) under '{mem_plan.canonical_project_id}'"
            )
    for ix_plan in plan.interaction_plans:
        if ix_plan.rows_to_rekey:
            lines.append(
                f"  - rekey {len(ix_plan.rows_to_rekey)} interaction row(s) "
                f"from project='{ix_plan.alias}' -> '{ix_plan.canonical_project_id}'"
            )

    return "\n".join(lines)


def plan_to_json_dict(plan: MigrationPlan) -> dict:
    return {
        "alias_map": plan.alias_map,
        "integrity_errors": plan.integrity_errors,
        "counts": plan.counts(),
        "is_empty": plan.is_empty,
        "has_collisions": plan.has_collisions,
        "shadow_projects": [asdict(sp) for sp in plan.shadow_projects],
        "state_plans": [
|
||||||
|
{
|
||||||
|
"shadow_row_id": sp.shadow_row_id,
|
||||||
|
"canonical_row_id": sp.canonical_row_id,
|
||||||
|
"canonical_project_id": sp.canonical_project_id,
|
||||||
|
"rows_to_rekey": sp.rows_to_rekey,
|
||||||
|
"collisions": sp.collisions,
|
||||||
|
}
|
||||||
|
for sp in plan.state_plans
|
||||||
|
],
|
||||||
|
"memory_plans": [
|
||||||
|
{
|
||||||
|
"alias": mp.alias,
|
||||||
|
"canonical_project_id": mp.canonical_project_id,
|
||||||
|
"rows_to_rekey": mp.rows_to_rekey,
|
||||||
|
"to_supersede": mp.to_supersede,
|
||||||
|
}
|
||||||
|
for mp in plan.memory_plans
|
||||||
|
],
|
||||||
|
"interaction_plans": [
|
||||||
|
{
|
||||||
|
"alias": ip.alias,
|
||||||
|
"canonical_project_id": ip.canonical_project_id,
|
||||||
|
"rows_to_rekey": ip.rows_to_rekey,
|
||||||
|
}
|
||||||
|
for ip in plan.interaction_plans
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def write_report(
|
||||||
|
plan: MigrationPlan,
|
||||||
|
summary: dict | None,
|
||||||
|
db_path: Path,
|
||||||
|
registry_path: Path,
|
||||||
|
mode: str,
|
||||||
|
report_dir: Path,
|
||||||
|
) -> Path:
|
||||||
|
report_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
|
||||||
|
report_path = report_dir / f"legacy-aliases-{stamp}.json"
|
||||||
|
payload = {
|
||||||
|
"created_at": datetime.now(timezone.utc).isoformat(),
|
||||||
|
"mode": mode,
|
||||||
|
"db_path": str(db_path),
|
||||||
|
"registry_path": str(registry_path),
|
||||||
|
"plan": plan_to_json_dict(plan),
|
||||||
|
"apply_summary": summary,
|
||||||
|
}
|
||||||
|
report_path.write_text(
|
||||||
|
json.dumps(payload, indent=2, default=str, ensure_ascii=True) + "\n",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
return report_path
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Entry point
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def _open_db(db_path: Path) -> sqlite3.Connection:
|
||||||
|
conn = sqlite3.connect(str(db_path))
|
||||||
|
conn.row_factory = sqlite3.Row
|
||||||
|
conn.execute("PRAGMA foreign_keys = ON")
|
||||||
|
return conn
|
||||||
|
|
||||||
|
|
||||||
|
def run(
|
||||||
|
db_path: Path,
|
||||||
|
registry_path: Path,
|
||||||
|
apply: bool,
|
||||||
|
allow_empty: bool,
|
||||||
|
report_dir: Path,
|
||||||
|
out_json: bool,
|
||||||
|
) -> int:
|
||||||
|
# Ensure the schema exists (init_db handles migrations too). We
|
||||||
|
# do this by importing the app-side init path and pointing it at
|
||||||
|
# the caller's db_path via ATOCORE_DATA_DIR + ATOCORE_DB_DIR.
|
||||||
|
os.environ["ATOCORE_DB_DIR"] = str(db_path.parent)
|
||||||
|
os.environ["ATOCORE_PROJECT_REGISTRY_PATH"] = str(registry_path)
|
||||||
|
import atocore.config as _config
|
||||||
|
|
||||||
|
_config.settings = _config.Settings()
|
||||||
|
from atocore.context.project_state import init_project_state_schema
|
||||||
|
from atocore.models.database import init_db
|
||||||
|
|
||||||
|
init_db()
|
||||||
|
init_project_state_schema()
|
||||||
|
|
||||||
|
conn = _open_db(db_path)
|
||||||
|
try:
|
||||||
|
plan = build_plan(conn, registry_path)
|
||||||
|
|
||||||
|
if out_json:
|
||||||
|
print(json.dumps(plan_to_json_dict(plan), indent=2, default=str))
|
||||||
|
else:
|
||||||
|
print(render_plan_text(plan))
|
||||||
|
|
||||||
|
if not apply:
|
||||||
|
# Dry-run always writes a report for audit
|
||||||
|
write_report(
|
||||||
|
plan,
|
||||||
|
summary=None,
|
||||||
|
db_path=db_path,
|
||||||
|
registry_path=registry_path,
|
||||||
|
mode="dry-run",
|
||||||
|
report_dir=report_dir,
|
||||||
|
)
|
||||||
|
return 0
|
||||||
|
|
||||||
|
# --apply path
|
||||||
|
if plan.integrity_errors:
|
||||||
|
print("\nRefused: integrity errors prevent apply.", file=sys.stderr)
|
||||||
|
return 2
|
||||||
|
if plan.has_collisions:
|
||||||
|
print(
|
||||||
|
"\nRefused: project_state collisions prevent apply.",
|
||||||
|
file=sys.stderr,
|
||||||
|
)
|
||||||
|
return 2
|
||||||
|
if plan.is_empty and not allow_empty:
|
||||||
|
print(
|
||||||
|
"\nRefused: plan is empty. Pass --allow-empty to confirm.",
|
||||||
|
file=sys.stderr,
|
||||||
|
)
|
||||||
|
return 3
|
||||||
|
|
||||||
|
summary = apply_plan(conn, plan)
|
||||||
|
print("\nApply complete:")
|
||||||
|
for k, v in summary.items():
|
||||||
|
print(f" {k}: {v}")
|
||||||
|
|
||||||
|
report_path = write_report(
|
||||||
|
plan,
|
||||||
|
summary=summary,
|
||||||
|
db_path=db_path,
|
||||||
|
registry_path=registry_path,
|
||||||
|
mode="apply",
|
||||||
|
report_dir=report_dir,
|
||||||
|
)
|
||||||
|
print(f"\nReport: {report_path}")
|
||||||
|
return 0
|
||||||
|
finally:
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> int:
|
||||||
|
parser = argparse.ArgumentParser(description=__doc__.split("\n\n", 1)[0])
|
||||||
|
parser.add_argument(
|
||||||
|
"--registry",
|
||||||
|
type=Path,
|
||||||
|
default=None,
|
||||||
|
help="override ATOCORE_PROJECT_REGISTRY_PATH for this run",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--db",
|
||||||
|
type=Path,
|
||||||
|
default=None,
|
||||||
|
help="override the AtoCore SQLite DB path",
|
||||||
|
)
|
||||||
|
parser.add_argument("--apply", action="store_true", help="actually mutate the DB")
|
||||||
|
parser.add_argument(
|
||||||
|
"--allow-empty",
|
||||||
|
action="store_true",
|
||||||
|
help="don't refuse when --apply is called with an empty plan",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--report-dir",
|
||||||
|
type=Path,
|
||||||
|
default=None,
|
||||||
|
help="where to write the JSON report (default: data/migrations/)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--json",
|
||||||
|
action="store_true",
|
||||||
|
help="emit the plan as JSON instead of human prose",
|
||||||
|
)
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
# Resolve the registry and db paths. If the caller didn't override
|
||||||
|
# them, fall back to what the app's config resolves today.
|
||||||
|
import atocore.config as _config
|
||||||
|
|
||||||
|
_config.settings = _config.Settings()
|
||||||
|
|
||||||
|
registry_path = args.registry or _config.settings.resolved_project_registry_path
|
||||||
|
db_path = args.db or _config.settings.db_path
|
||||||
|
report_dir = (
|
||||||
|
args.report_dir or _config.settings.resolved_data_dir / "migrations"
|
||||||
|
)
|
||||||
|
|
||||||
|
return run(
|
||||||
|
db_path=Path(db_path),
|
||||||
|
registry_path=Path(registry_path),
|
||||||
|
apply=args.apply,
|
||||||
|
allow_empty=args.allow_empty,
|
||||||
|
report_dir=Path(report_dir),
|
||||||
|
out_json=args.json,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
raise SystemExit(main())
|
||||||
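The commit message claims that `--apply` runs inside a single SQLite transaction and rolls back entirely on any failure. A minimal standalone sketch of that all-or-nothing property, using a stand-in table rather than the script's actual schema:

```python
import sqlite3

# Stand-in schema for illustration; not the real AtoCore tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, project TEXT)")
conn.execute("INSERT INTO memories VALUES ('m1', 'p05')")
conn.commit()

try:
    with conn:  # one transaction; rolls back automatically on any exception
        conn.execute("UPDATE memories SET project = 'p05-interferometer'")
        raise RuntimeError("simulated mid-migration failure")
except RuntimeError:
    pass

# The partial rekey was rolled back; the row is untouched.
row = conn.execute("SELECT project FROM memories WHERE id = 'm1'").fetchone()
assert row[0] == "p05"
```

Using the connection as a context manager is what gives the sketch its atomicity: `conn.__exit__` commits on success and rolls back when an exception escapes the block.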
tests/test_migrate_legacy_aliases.py (new file, 615 lines)
@@ -0,0 +1,615 @@
"""Tests for scripts/migrate_legacy_aliases.py.

The migration script closes the compatibility gap documented in
docs/architecture/project-identity-canonicalization.md. These tests
cover:

- empty/clean database behavior
- shadow projects detection
- state rekey without collisions
- state collision detection + apply refusal
- memory rekey + supersession of duplicates
- interaction rekey
- end-to-end apply on a realistic shadow
- idempotency (running twice produces the same final state)
- report artifact is written
- the pre-fix regression gap is actually closed after migration
"""

from __future__ import annotations

import json
import sqlite3
import sys
import uuid
from pathlib import Path

import pytest

from atocore.context.project_state import (
    get_state,
    init_project_state_schema,
)
from atocore.models.database import init_db

# Make scripts/ importable
_REPO_ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(_REPO_ROOT / "scripts"))

import migrate_legacy_aliases as mig  # noqa: E402


# ---------------------------------------------------------------------------
# Helpers that seed "legacy" rows the way they would have looked before fb6298a
# ---------------------------------------------------------------------------


def _open_db_connection():
    """Open a direct SQLite connection to the test data dir's DB."""
    import atocore.config as config

    conn = sqlite3.connect(str(config.settings.db_path))
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def _seed_shadow_project(
    conn: sqlite3.Connection, shadow_name: str
) -> str:
    """Insert a projects row keyed under an alias, like the old set_state would have."""
    project_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO projects (id, name, description) VALUES (?, ?, ?)",
        (project_id, shadow_name, f"shadow row for {shadow_name}"),
    )
    conn.commit()
    return project_id


def _seed_state_row(
    conn: sqlite3.Connection,
    project_id: str,
    category: str,
    key: str,
    value: str,
) -> str:
    row_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO project_state "
        "(id, project_id, category, key, value, source, confidence) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (row_id, project_id, category, key, value, "legacy-test", 1.0),
    )
    conn.commit()
    return row_id


def _seed_memory_row(
    conn: sqlite3.Connection,
    memory_type: str,
    content: str,
    project: str,
    status: str = "active",
) -> str:
    row_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO memories "
        "(id, memory_type, content, project, source_chunk_id, confidence, status) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (row_id, memory_type, content, project, None, 1.0, status),
    )
    conn.commit()
    return row_id


def _seed_interaction_row(
    conn: sqlite3.Connection, prompt: str, project: str
) -> str:
    row_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO interactions "
        "(id, prompt, context_pack, response_summary, response, "
        " memories_used, chunks_used, client, session_id, project, created_at) "
        "VALUES (?, ?, '{}', '', '', '[]', '[]', 'legacy-test', '', ?, '2026-04-01 12:00:00')",
        (row_id, prompt, project),
    )
    conn.commit()
    return row_id


# ---------------------------------------------------------------------------
# plan-building tests
# ---------------------------------------------------------------------------


@pytest.fixture(autouse=True)
def _setup(tmp_data_dir):
    init_db()
    init_project_state_schema()


def test_dry_run_on_empty_registry_reports_empty_plan(tmp_data_dir):
    """Empty registry -> empty alias map -> empty plan."""
    registry_path = tmp_data_dir / "empty-registry.json"
    registry_path.write_text('{"projects": []}', encoding="utf-8")

    conn = _open_db_connection()
    try:
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    assert plan.alias_map == {}
    assert plan.is_empty
    assert not plan.has_collisions
    assert plan.counts() == {
        "shadow_projects": 0,
        "state_rekey_rows": 0,
        "state_collisions": 0,
        "memory_rekey_rows": 0,
        "memory_supersede_rows": 0,
        "interaction_rekey_rows": 0,
    }


def test_dry_run_on_clean_registered_db_reports_empty_plan(project_registry):
    """A registry with projects but no legacy rows -> empty plan."""
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    assert plan.alias_map != {}
    assert plan.is_empty


def test_dry_run_finds_shadow_project(project_registry):
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        _seed_shadow_project(conn, "p05")
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    assert len(plan.shadow_projects) == 1
    assert plan.shadow_projects[0].shadow_name == "p05"
    assert plan.shadow_projects[0].canonical_project_id == "p05-interferometer"


def test_dry_run_plans_state_rekey_without_collisions(project_registry):
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        shadow_id = _seed_shadow_project(conn, "p05")
        _seed_state_row(conn, shadow_id, "status", "next_focus", "Wave 1 ingestion")
        _seed_state_row(conn, shadow_id, "decision", "lateral_support", "GF-PTFE")
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    assert len(plan.state_plans) == 1
    sp = plan.state_plans[0]
    assert len(sp.rows_to_rekey) == 2
    assert sp.collisions == []
    assert not plan.has_collisions


def test_dry_run_detects_state_collision(project_registry):
    """Shadow and canonical both have state under the same (category, key) with different values."""
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        shadow_id = _seed_shadow_project(conn, "p05")
        canonical_id = _seed_shadow_project(conn, "p05-interferometer")
        _seed_state_row(conn, shadow_id, "status", "next_focus", "Wave 1")
        _seed_state_row(
            conn, canonical_id, "status", "next_focus", "Wave 2"
        )
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    assert plan.has_collisions
    collision = plan.state_plans[0].collisions[0]
    assert collision["shadow"]["value"] == "Wave 1"
    assert collision["canonical"]["value"] == "Wave 2"


def test_dry_run_plans_memory_rekey_and_supersession(project_registry):
    registry_path = project_registry(
        ("p04-gigabit", ["p04", "gigabit"])
    )

    conn = _open_db_connection()
    try:
        # A clean memory under the alias that will just be rekeyed
        _seed_memory_row(conn, "project", "clean rekey memory", "p04")
        # A memory that collides with an existing canonical memory
        _seed_memory_row(conn, "project", "duplicate content", "p04")
        _seed_memory_row(conn, "project", "duplicate content", "p04-gigabit")
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    # There's exactly one memory plan (one alias matched)
    assert len(plan.memory_plans) == 1
    mp = plan.memory_plans[0]
    # Two rows are candidates for rekey or supersession — one clean,
    # one duplicate. The duplicate is handled via to_supersede; the
    # other via rows_to_rekey.
    total_affected = len(mp.rows_to_rekey) + len(mp.to_supersede)
    assert total_affected == 2
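The supersession rule that test relies on (duplicates grouped by the tuple from the commit message, newest `updated_at` stays active, older rows get superseded) can be sketched in isolation. `split_active_and_superseded` is a hypothetical helper for illustration, not a function from the script:

```python
from datetime import datetime

def split_active_and_superseded(rows):
    """Group rows by (memory_type, content, project, status); in each
    group the newest updated_at stays active, the rest are superseded."""
    groups = {}
    for row in rows:
        key = (row["memory_type"], row["content"], row["project"], row["status"])
        groups.setdefault(key, []).append(row)
    keep, supersede = [], []
    for bucket in groups.values():
        bucket.sort(key=lambda r: r["updated_at"], reverse=True)
        keep.append(bucket[0])        # newest duplicate stays active
        supersede.extend(bucket[1:])  # older duplicates get superseded
    return keep, supersede

rows = [
    {"memory_type": "project", "content": "dup", "project": "p04-gigabit",
     "status": "active", "updated_at": datetime(2026, 1, 1)},
    {"memory_type": "project", "content": "dup", "project": "p04-gigabit",
     "status": "active", "updated_at": datetime(2026, 3, 1)},
]
keep, supersede = split_active_and_superseded(rows)
assert keep[0]["updated_at"] == datetime(2026, 3, 1)
assert len(supersede) == 1
```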
def test_dry_run_plans_interaction_rekey(project_registry):
    registry_path = project_registry(
        ("p06-polisher", ["p06", "polisher"])
    )

    conn = _open_db_connection()
    try:
        _seed_interaction_row(conn, "quick capture under alias", "polisher")
        _seed_interaction_row(conn, "another alias-keyed row", "p06")
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    total = sum(len(p.rows_to_rekey) for p in plan.interaction_plans)
    assert total == 2


# ---------------------------------------------------------------------------
# apply tests
# ---------------------------------------------------------------------------


def test_apply_refuses_on_state_collision(project_registry):
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        shadow_id = _seed_shadow_project(conn, "p05")
        canonical_id = _seed_shadow_project(conn, "p05-interferometer")
        _seed_state_row(conn, shadow_id, "status", "next_focus", "Wave 1")
        _seed_state_row(conn, canonical_id, "status", "next_focus", "Wave 2")

        plan = mig.build_plan(conn, registry_path)
        assert plan.has_collisions

        with pytest.raises(mig.MigrationRefused):
            mig.apply_plan(conn, plan)
    finally:
        conn.close()


def test_apply_migrates_clean_shadow_end_to_end(project_registry):
    """The happy path: one shadow project with clean state rows, rekey into a
    freshly-created canonical row, verify reachability via get_state."""
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        shadow_id = _seed_shadow_project(conn, "p05")
        _seed_state_row(
            conn, shadow_id, "status", "next_focus", "Wave 1 ingestion"
        )
        _seed_state_row(
            conn, shadow_id, "decision", "lateral_support", "GF-PTFE"
        )

        plan = mig.build_plan(conn, registry_path)
        assert not plan.has_collisions
        summary = mig.apply_plan(conn, plan)
    finally:
        conn.close()

    assert summary["state_rows_rekeyed"] == 2
    assert summary["shadow_projects_deleted"] == 1
    assert summary["canonical_rows_created"] == 1

    # The regression gap is now closed: the service layer can see
    # the state under the canonical id via either the alias OR the
    # canonical.
    via_alias = get_state("p05")
    via_canonical = get_state("p05-interferometer")
    assert len(via_alias) == 2
    assert len(via_canonical) == 2
    values = {entry.value for entry in via_canonical}
    assert values == {"Wave 1 ingestion", "GF-PTFE"}


def test_apply_drops_shadow_state_duplicate_without_collision(project_registry):
    """Shadow and canonical both have the same (category, key, value) — the
    shadow gets marked superseded rather than hitting the UNIQUE constraint."""
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        shadow_id = _seed_shadow_project(conn, "p05")
        canonical_id = _seed_shadow_project(conn, "p05-interferometer")
        _seed_state_row(
            conn, shadow_id, "status", "next_focus", "Wave 1 ingestion"
        )
        _seed_state_row(
            conn, canonical_id, "status", "next_focus", "Wave 1 ingestion"
        )

        plan = mig.build_plan(conn, registry_path)
        assert not plan.has_collisions
        summary = mig.apply_plan(conn, plan)
    finally:
        conn.close()

    assert summary["state_rows_merged_as_duplicate"] == 1

    via_canonical = get_state("p05-interferometer")
    # Exactly one active row survives
    assert len(via_canonical) == 1
    assert via_canonical[0].value == "Wave 1 ingestion"


def test_apply_migrates_memories(project_registry):
    registry_path = project_registry(
        ("p04-gigabit", ["p04", "gigabit"])
    )

    conn = _open_db_connection()
    try:
        _seed_memory_row(conn, "project", "lateral support uses GF-PTFE", "p04")
        _seed_memory_row(conn, "preference", "I prefer descriptive commits", "gigabit")
        plan = mig.build_plan(conn, registry_path)
        summary = mig.apply_plan(conn, plan)
    finally:
        conn.close()

    assert summary["memory_rows_rekeyed"] == 2

    # Both memories should now read as living under the canonical id
    from atocore.memory.service import get_memories

    rows = get_memories(project="p04-gigabit", limit=50)
    contents = {m.content for m in rows}
    assert "lateral support uses GF-PTFE" in contents
    assert "I prefer descriptive commits" in contents


def test_apply_migrates_interactions(project_registry):
    registry_path = project_registry(
        ("p06-polisher", ["p06", "polisher"])
    )

    conn = _open_db_connection()
    try:
        _seed_interaction_row(conn, "alias-keyed 1", "polisher")
        _seed_interaction_row(conn, "alias-keyed 2", "p06")
        plan = mig.build_plan(conn, registry_path)
        summary = mig.apply_plan(conn, plan)
    finally:
        conn.close()

    assert summary["interaction_rows_rekeyed"] == 2

    from atocore.interactions.service import list_interactions

    rows = list_interactions(project="p06-polisher", limit=50)
    prompts = {i.prompt for i in rows}
    assert prompts == {"alias-keyed 1", "alias-keyed 2"}


def test_apply_is_idempotent(project_registry):
    """Running apply twice produces the same final state as running it once."""
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        shadow_id = _seed_shadow_project(conn, "p05")
        _seed_state_row(conn, shadow_id, "status", "next_focus", "Wave 1")
        _seed_memory_row(conn, "project", "m1", "p05")
        _seed_interaction_row(conn, "i1", "p05")

        # first apply
        plan_a = mig.build_plan(conn, registry_path)
        summary_a = mig.apply_plan(conn, plan_a)

        # second apply: plan should be empty
        plan_b = mig.build_plan(conn, registry_path)
        assert plan_b.is_empty

        # forcing a second apply on the empty plan via the function
        # directly should also succeed as a no-op (caller normally
        # has to pass --allow-empty through the CLI, but apply_plan
        # itself doesn't enforce that — the refusal is in run())
        summary_b = mig.apply_plan(conn, plan_b)
    finally:
        conn.close()

    assert summary_a["state_rows_rekeyed"] == 1
    assert summary_a["memory_rows_rekeyed"] == 1
    assert summary_a["interaction_rows_rekeyed"] == 1
    assert summary_b["state_rows_rekeyed"] == 0
    assert summary_b["memory_rows_rekeyed"] == 0
    assert summary_b["interaction_rows_rekeyed"] == 0


def test_apply_refuses_with_integrity_errors(project_registry):
    """If the projects table has two case-variant rows for the canonical id, refuse.

    The projects.name column has a case-sensitive UNIQUE constraint,
    so exact duplicates can't exist. But case-variant rows
    ``p05-interferometer`` and ``P05-Interferometer`` can both
    survive the UNIQUE constraint while both matching the
    case-insensitive ``lower(name) = lower(?)`` lookup that the
    migration uses to find the canonical row. That ambiguity
    (which canonical row should dependents rekey into?) is exactly
    the integrity failure the migration is guarding against.
    """
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        _seed_shadow_project(conn, "p05-interferometer")
        _seed_shadow_project(conn, "P05-Interferometer")
        plan = mig.build_plan(conn, registry_path)
        assert plan.integrity_errors
        with pytest.raises(mig.MigrationRefused):
            mig.apply_plan(conn, plan)
    finally:
        conn.close()
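The ambiguity that integrity check guards against reproduces with nothing but stdlib sqlite3; a sketch with a minimal stand-in schema, not the real projects table:

```python
import sqlite3

# SQLite's default collation is BINARY (case-sensitive), so a UNIQUE
# constraint on name lets two case-variant spellings coexist; yet both
# match a case-insensitive lower(name) lookup, which is exactly the
# ambiguous-canonical-row situation the migration refuses on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id TEXT PRIMARY KEY, name TEXT UNIQUE)")
conn.execute("INSERT INTO projects VALUES ('a', 'p05-interferometer')")
conn.execute("INSERT INTO projects VALUES ('b', 'P05-Interferometer')")  # passes UNIQUE

matches = conn.execute(
    "SELECT id FROM projects WHERE lower(name) = lower(?)",
    ("p05-interferometer",),
).fetchall()
assert len(matches) == 2  # two candidate canonical rows: ambiguous
```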
# ---------------------------------------------------------------------------
# reporting tests
# ---------------------------------------------------------------------------


def test_plan_to_json_dict_is_serializable(project_registry):
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        shadow_id = _seed_shadow_project(conn, "p05")
        _seed_state_row(conn, shadow_id, "status", "next_focus", "Wave 1")
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    payload = mig.plan_to_json_dict(plan)
    # Must be JSON-serializable
    json_str = json.dumps(payload, default=str)
    assert "p05-interferometer" in json_str
    assert payload["counts"]["state_rekey_rows"] == 1


def test_write_report_creates_file(tmp_path, project_registry):
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    report_dir = tmp_path / "reports"
    report_path = mig.write_report(
        plan,
        summary=None,
        db_path=Path("/tmp/fake.db"),
        registry_path=registry_path,
        mode="dry-run",
        report_dir=report_dir,
    )
    assert report_path.exists()
    payload = json.loads(report_path.read_text(encoding="utf-8"))
    assert payload["mode"] == "dry-run"
    assert "plan" in payload


def test_render_plan_text_on_empty_plan(project_registry):
    registry_path = project_registry()  # empty
    conn = _open_db_connection()
    try:
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    text = mig.render_plan_text(plan)
    assert "nothing to plan" in text.lower()


def test_render_plan_text_on_collision(project_registry):
    registry_path = project_registry(
        ("p05-interferometer", ["p05"])
    )

    conn = _open_db_connection()
    try:
        shadow_id = _seed_shadow_project(conn, "p05")
        canonical_id = _seed_shadow_project(conn, "p05-interferometer")
        _seed_state_row(conn, shadow_id, "status", "phase", "A")
        _seed_state_row(conn, canonical_id, "status", "phase", "B")
        plan = mig.build_plan(conn, registry_path)
    finally:
        conn.close()

    text = mig.render_plan_text(plan)
    assert "COLLISION" in text.upper()
    assert "refuse" in text.lower()


# ---------------------------------------------------------------------------
# gap-closed companion test — the flip side of
# test_legacy_alias_keyed_state_is_invisible_until_migrated in
# test_project_state.py. After running this migration, the legacy row
# IS reachable via the canonical id.
# ---------------------------------------------------------------------------


def test_legacy_alias_gap_is_closed_after_migration(project_registry):
    """End-to-end regression test for the canonicalization gap.

    Simulates the exact scenario from
    test_legacy_alias_keyed_state_is_invisible_until_migrated in
    test_project_state.py — a shadow projects row with a state row
    pointing at it. Runs the migration. Verifies the state is now
    reachable via the canonical id.
    """
    registry_path = project_registry(
        ("p05-interferometer", ["p05", "interferometer"])
    )

    conn = _open_db_connection()
    try:
        shadow_id = _seed_shadow_project(conn, "p05")
        _seed_state_row(
            conn, shadow_id, "status", "legacy_focus", "Wave 1 ingestion"
        )

        # Before migration: the legacy row is invisible to get_state
        # (this is the documented gap, covered in test_project_state.py)
        assert all(
            entry.value != "Wave 1 ingestion" for entry in get_state("p05")
        )
        assert all(
            entry.value != "Wave 1 ingestion"
            for entry in get_state("p05-interferometer")
        )

        # Run the migration
        plan = mig.build_plan(conn, registry_path)
        mig.apply_plan(conn, plan)
    finally:
        conn.close()

    # After migration: the row is reachable via canonical AND alias
    via_canonical = get_state("p05-interferometer")
    via_alias = get_state("p05")
    assert any(e.value == "Wave 1 ingestion" for e in via_canonical)
    assert any(e.value == "Wave 1 ingestion" for e in via_alias)