Closes the compatibility gap documented in
docs/architecture/project-identity-canonicalization.md. Before fb6298a,
writes to project_state, memories, and interactions stored the raw
project name. After fb6298a, every service-layer entry point
canonicalizes through the registry, which silently made pre-fix
alias-keyed rows unreachable from the new read path. Now there's
a migration tool to find and fix them.
This commit contains the tool and its tests. The tool is NOT run against
the live Dalidou DB in this commit; that is a separate, supervised
manual step performed after reviewing the dry-run output.
scripts/migrate_legacy_aliases.py
---------------------------------
Standalone offline migration tool. Dry-run default, --apply explicit.
What it inspects:
- projects: rows whose name is a registered alias and differs from
the canonical project_id (shadow rows)
- project_state: rows whose project_id points at a shadow; the plan
rekeys them to the canonical row's id. (category, key) collisions
against the canonical row block the apply step until a human resolves
them
- memories: rows whose project column is a registered alias. Plain
string rekey. Dedup collisions (rows that, after the rekey, share the
same (memory_type, content, project, status)) are handled by the
existing memory supersession model: the newer row stays active and the
older one becomes superseded, with updated_at as the tiebreaker
- interactions: rows whose project column is a registered alias.
Plain string rekey, no collision handling
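To make the inspection rules above concrete, here is a minimal pure-Python sketch of shadow detection and the memory rekey-plus-supersession step. All names (`Memory`, `find_shadow_projects`, `plan_memory_migration`, the `alias_to_canonical` dict) are hypothetical and are not the script's actual API. It assumes updated_at is ISO-8601 text and ranks collision groups by updated_at, falling back to row id; the real tool's exact ordering may differ. Note that in this sketch an already-canonical row can end up superseded if it is the older side of a collision.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Memory:
    id: int
    memory_type: str
    content: str
    project: str
    status: str
    updated_at: str  # assumed ISO-8601 text, so string order matches time order


def find_shadow_projects(project_names, alias_to_canonical):
    """Project names that are registered aliases differing from their canonical id."""
    return [
        name for name in project_names
        if name in alias_to_canonical and alias_to_canonical[name] != name
    ]


def plan_memory_migration(memories, alias_to_canonical):
    """Rekey alias-keyed memories, then supersede dedup collisions (hypothetical)."""
    # Plain string rekey: only rows whose project is a registered alias change.
    rekeyed = [
        replace(m, project=alias_to_canonical[m.project])
        if m.project in alias_to_canonical else m
        for m in memories
    ]
    # Group by the dedup key as it looks *after* the rekey.
    groups = {}
    for m in rekeyed:
        groups.setdefault((m.memory_type, m.content, m.project, m.status), []).append(m)
    # Newest row in each group keeps its status; older rows become superseded.
    result = []
    for rows in groups.values():
        rows.sort(key=lambda m: (m.updated_at, m.id), reverse=True)
        result.append(rows[0])
        result.extend(replace(m, status="superseded") for m in rows[1:])
    return sorted(result, key=lambda m: m.id)
```

Grouping on the post-rekey key is the point: two rows that looked distinct before the migration (one under the alias, one under the canonical name) only collide once the alias row has been rewritten.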
What it does NOT do:
- Never touches rows that are already canonical
- Never auto-resolves project_state collisions (refuses until the
human picks a winner via POST /project/state)
- Never creates data; only rekeys or supersedes
- Never runs outside a single SQLite transaction; any failure rolls
back the entire migration
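The single-transaction guarantee can be illustrated with Python's standard `sqlite3` module: with the module's default transaction handling, a connection used as a context manager commits when the block exits normally and rolls back if any statement raises. The function name and the two-table scope below are assumptions for illustration; the real tool also handles projects and project_state inside the same transaction.

```python
import sqlite3


def apply_string_rekeys(conn: sqlite3.Connection, alias_to_canonical: dict) -> None:
    """Rekey alias-keyed memories and interactions atomically (hypothetical helper).

    `with conn:` commits on success and rolls back on any exception, so a
    failure partway through leaves the database exactly as it was.
    """
    with conn:
        for alias, canonical in alias_to_canonical.items():
            conn.execute(
                "UPDATE memories SET project = ? WHERE project = ?", (canonical, alias)
            )
            conn.execute(
                "UPDATE interactions SET project = ? WHERE project = ?", (canonical, alias)
            )
```

Note that the context manager does not close the connection, and it only rolls back as expected if the connection is not in autocommit mode (for example, `isolation_level=None` would defeat it).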
Safety rails:
- Dry-run is default. --apply is explicit.
- --apply on an empty plan refuses unless --allow-empty is also passed
(prevents accidental runs that look meaningful but did nothing)
- Apply refuses on any project_state collision
- Apply refuses on integrity errors (e.g. two case-variant rows
both matching the canonical lookup)
- Writes a JSON report to data/migrations/ on every run (dry-run
and apply alike) for audit
- Idempotent: running twice produces the same final state as
running once. The second run finds zero shadow rows and exits
clean.
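The refusal rules and the audit report could be expressed as a guard plus a writer along these lines. This is a sketch under assumptions: the plan is modeled as a dict with hypothetical `is_empty`, `state_collisions`, and `integrity_errors` keys, and the report filename pattern is invented for illustration; only the data/migrations/ location comes from the description above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class MigrationRefused(RuntimeError):
    """Raised when --apply must not proceed (hypothetical exception type)."""


def check_apply_allowed(plan: dict, allow_empty: bool = False) -> None:
    # Each rail is checked independently; any one of them blocks --apply.
    if plan["integrity_errors"]:
        raise MigrationRefused(f"integrity errors: {plan['integrity_errors']}")
    if plan["state_collisions"]:
        raise MigrationRefused(
            "project_state collisions; resolve them via POST /project/state first"
        )
    if plan["is_empty"] and not allow_empty:
        raise MigrationRefused("empty plan; pass --allow-empty to apply anyway")


def write_report(plan: dict, report_dir, mode: str) -> Path:
    # Written on every run (dry-run and apply alike) so each run leaves an audit trail.
    report_dir = Path(report_dir)
    report_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = report_dir / f"legacy-aliases-{mode}-{stamp}.json"
    path.write_text(json.dumps({"mode": mode, "plan": plan}, indent=2, sort_keys=True))
    return path
```

Idempotence falls out of the plan itself: a second run sees no alias-keyed rows, builds an empty plan, and the empty-plan rail stops a pointless --apply.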
CLI flags:
--registry PATH override ATOCORE_PROJECT_REGISTRY_PATH
--db PATH override the AtoCore SQLite DB path
--apply actually mutate (default is dry-run)
--allow-empty permit --apply on an empty plan
--report-dir PATH where to write the JSON report
--json emit the plan as JSON instead of human prose
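The flag set above maps directly onto `argparse`. The following is a sketch, not the script's actual parser: the help strings are taken from the list, while the defaults are assumptions (the registry path falls back to the ATOCORE_PROJECT_REGISTRY_PATH environment variable, `--db` defaults to `None` so the tool's own DB path resolution applies, and `--report-dir` defaults to data/migrations).

```python
import argparse
import os
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Find and rekey legacy alias-keyed rows. Dry-run unless --apply."
    )
    # Assumed default: read the env var; argparse converts a string default via type=.
    parser.add_argument("--registry", type=Path,
                        default=os.environ.get("ATOCORE_PROJECT_REGISTRY_PATH"),
                        help="override ATOCORE_PROJECT_REGISTRY_PATH")
    parser.add_argument("--db", type=Path, default=None,
                        help="override the AtoCore SQLite DB path")
    parser.add_argument("--apply", action="store_true",
                        help="actually mutate (default is dry-run)")
    parser.add_argument("--allow-empty", action="store_true",
                        help="permit --apply on an empty plan")
    parser.add_argument("--report-dir", type=Path, default=Path("data/migrations"),
                        help="where to write the JSON report")
    parser.add_argument("--json", action="store_true",
                        help="emit the plan as JSON instead of human prose")
    return parser
```

Using `store_true` for --apply keeps dry-run as the zero-argument behavior; nothing mutates unless the operator types the flag.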
Smoke test against the Phase 9 validation DB produces the expected
"Nothing to migrate. The database is clean." output with 4 known
canonical projects and 0 shadows.
tests/test_migrate_legacy_aliases.py
------------------------------------
19 new tests, all green:
Plan-building:
- test_dry_run_on_empty_registry_reports_empty_plan
- test_dry_run_on_clean_registered_db_reports_empty_plan
- test_dry_run_finds_shadow_project
- test_dry_run_plans_state_rekey_without_collisions
- test_dry_run_detects_state_collision
- test_dry_run_plans_memory_rekey_and_supersession
- test_dry_run_plans_interaction_rekey
Apply:
- test_apply_refuses_on_state_collision
- test_apply_migrates_clean_shadow_end_to_end (verifies get_state
can see the state via BOTH the alias AND the canonical after
migration)
- test_apply_drops_shadow_state_duplicate_without_collision
(same (category, key, value) on both sides: marks the shadow row
superseded instead of hitting the UNIQUE constraint)
- test_apply_migrates_memories
- test_apply_migrates_interactions
- test_apply_is_idempotent
- test_apply_refuses_with_integrity_errors (uses case-variant
canonical rows to work around projects.name UNIQUE constraint;
verifies the case-insensitive duplicate detection works)
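The project_state cases exercised above (clean rekey, identical duplicate, real collision) amount to a three-way split of each shadow row against the canonical project's rows. A minimal sketch, assuming a UNIQUE constraint on (project_id, category, key) and that `canonical_rows` holds only the canonical project's live rows; `StateRow` and `classify_shadow_state` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateRow:
    id: int
    project_id: int
    category: str
    key: str
    value: str


def classify_shadow_state(shadow_rows, canonical_rows):
    """Split shadow project_state rows into rekey / supersede / collision buckets."""
    canonical = {(r.category, r.key): r for r in canonical_rows}
    plan = {"rekey": [], "supersede": [], "collision": []}
    for row in shadow_rows:
        existing = canonical.get((row.category, row.key))
        if existing is None:
            plan["rekey"].append(row)      # no (category, key) clash: safe to rekey
        elif existing.value == row.value:
            plan["supersede"].append(row)  # identical duplicate: supersede, never rekey
        else:
            plan["collision"].append(row)  # differing values: a human must pick a winner
    return plan
```

Any non-empty `collision` bucket is what makes --apply refuse; the `supersede` bucket is how an identical duplicate avoids the UNIQUE violation a blind rekey would cause.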
Reporting:
- test_plan_to_json_dict_is_serializable
- test_write_report_creates_file
- test_render_plan_text_on_empty_plan
- test_render_plan_text_on_collision
End-to-end gap closure (the most important test):
- test_legacy_alias_gap_is_closed_after_migration
- Seeds the exact same scenario as
test_legacy_alias_keyed_state_is_invisible_until_migrated
in test_project_state.py (which documents the pre-migration
gap)
- Confirms the row is invisible before migration
- Runs the migration
- Verifies the row is reachable via BOTH the canonical id AND
the alias afterward
- This test and the pre-migration gap test together lock in
"before migration: invisible, after migration: reachable"
as the documented invariant
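The invariant can be reproduced in a few lines with an in-memory SQLite database. This sketch deliberately simplifies: project_state is flattened to a string `project` column (the real schema keys it by project_id), and `ALIASES`, `canonicalize`, `get_state`, and `migrate` are hypothetical stand-ins for the registry, the post-fb6298a read path, and the migration.

```python
import sqlite3

ALIASES = {"alias-a": "canon-a"}  # hypothetical registry: alias -> canonical


def canonicalize(name: str) -> str:
    # Post-fb6298a behavior: every lookup goes through the registry first.
    return ALIASES.get(name, name)


def get_state(conn: sqlite3.Connection, project: str) -> dict:
    # Simplified read path: canonicalize, then query by the canonical key only.
    rows = conn.execute(
        "SELECT key, value FROM state WHERE project = ?", (canonicalize(project),)
    ).fetchall()
    return dict(rows)


def migrate(conn: sqlite3.Connection) -> None:
    # Rekey alias-keyed rows to their canonical name in one transaction.
    with conn:
        for alias, canonical in ALIASES.items():
            conn.execute("UPDATE state SET project = ? WHERE project = ?", (canonical, alias))
```

Before the migration the legacy row is keyed by the alias, but the read path always asks for the canonical name, so neither spelling finds it; after the rekey both spellings resolve to the same canonical key.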
Full suite: 194 passing (was 175), 1 warning. The +19 is the new
migration test file.
Next concrete step after this commit
------------------------------------
- Run the dry-run against the live Dalidou DB to find out the
actual blast radius. The script is the inspection SQL, codified.
- Review the dry-run output together
- If clean (zero shadows), no apply needed; close the doc gap as
"verified nothing to migrate on this deployment"
- If there are shadows, resolve any collisions via
POST /project/state, then run --apply under supervision
- After apply, test_legacy_alias_keyed_state_is_invisible_until_migrated
still passes (it simulates the gap directly, so it is independent of
the live DB state), and the gap-closed companion test continues to
guard against regressions