Closes the compatibility gap documented in
docs/architecture/project-identity-canonicalization.md. Before fb6298a,
writes to project_state, memories, and interactions stored the raw
project name. After fb6298a every service-layer entry point
canonicalizes through the registry, which silently made pre-fix
alias-keyed rows unreachable from the new read path. Now there's
a migration tool to find and fix them.
This commit is the tool and its tests. The tool is NOT run against
the live Dalidou DB in this commit — that's a separate supervised
manual step after reviewing the dry-run output.
scripts/migrate_legacy_aliases.py
---------------------------------
Standalone offline migration tool. Dry-run default, --apply explicit.
What it inspects:
- projects: rows whose name is a registered alias and differs from
the canonical project_id (shadow rows)
- project_state: rows whose project_id points at a shadow; the plan
rekeys them to the canonical row's id. (category, key) collisions
against the canonical row block the apply step until a human
resolves them
- memories: rows whose project column is a registered alias. Plain
string rekey. Dedup collisions (after rekey, same
(memory_type, content, project, status)) are handled by the
existing memory supersession model: newer row stays active, older
becomes superseded with updated_at as tiebreaker
- interactions: rows whose project column is a registered alias.
Plain string rekey, no collision handling
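The memory rekey and supersession rule described above can be sketched roughly as follows. This is an illustration, not the script's actual code: the function name plan_memory_rekey, the dict row shape, and the alias_to_canonical mapping are assumptions. Only the dedup key (memory_type, content, project, status) and the updated_at tiebreaker come from the description above.

```python
from collections import defaultdict


def plan_memory_rekey(rows, alias_to_canonical):
    """Return (rekeys, superseded_ids) for a list of memory rows.

    rows: dicts with id, memory_type, content, project, status, updated_at
    alias_to_canonical: hypothetical mapping of alias -> canonical project id
    """
    rekeys = {}  # row id -> canonical project the row will be rekeyed to
    groups = defaultdict(list)
    for row in rows:
        project = alias_to_canonical.get(row["project"], row["project"])
        if project != row["project"]:
            rekeys[row["id"]] = project
        # Group rows by the dedup key as it will look AFTER the rekey.
        key = (row["memory_type"], row["content"], project, row["status"])
        groups[key].append(row)

    superseded = []
    for key, members in groups.items():
        # Only active duplicates that involve at least one rekeyed row matter;
        # groups made purely of canonical rows are left alone.
        if key[3] != "active" or len(members) < 2:
            continue
        if not any(m["id"] in rekeys for m in members):
            continue
        # Newest row stays active, every older row is planned as superseded.
        members.sort(key=lambda r: r["updated_at"], reverse=True)
        superseded.extend(m["id"] for m in members[1:])
    return rekeys, sorted(superseded)
```

The sketch only builds a plan, which mirrors the dry-run default: nothing is written until a separate apply step consumes the result.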
What it does NOT do:
- Never touches rows that are already canonical
- Never auto-resolves project_state collisions (refuses until the
human picks a winner via POST /project/state)
- Never creates data; only rekeys or supersedes
- Never runs outside a single SQLite transaction; any failure rolls
back the entire migration
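A minimal sketch of the single-transaction guarantee, assuming the standard-library sqlite3 module and its connection context manager. The function name apply_rekeys is hypothetical; the memories and interactions tables and their project column are the ones named above, but the real script's statements may differ.

```python
import sqlite3


def apply_rekeys(conn, alias, canonical):
    """Rekey memories and interactions from alias to canonical atomically.

    Returns True on commit, False if anything failed and was rolled back.
    """
    try:
        # "with conn" commits if the block succeeds and rolls back the
        # whole transaction if any statement raises.
        with conn:
            conn.execute(
                "UPDATE memories SET project = ? WHERE project = ?",
                (canonical, alias),
            )
            conn.execute(
                "UPDATE interactions SET project = ? WHERE project = ?",
                (canonical, alias),
            )
    except sqlite3.Error:
        return False
    return True
```

If the second UPDATE fails, the first one is undone as well, so a partial migration never becomes visible.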
Safety rails:
- Dry-run is default. --apply is explicit.
- Apply on an empty plan refuses unless --allow-empty is passed
(prevents accidental runs that look meaningful but did nothing)
- Apply refuses on any project_state collision
- Apply refuses on integrity errors (e.g. two case-variant rows
both matching the canonical lookup)
- Writes a JSON report to data/migrations/ on every run (dry-run
and apply alike) for audit
- Idempotent: running twice produces the same final state as
running once. The second run finds zero shadow rows and exits
clean.
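The refusal rules above amount to a small gate in front of the apply step. The following is a sketch under assumptions: the Plan dataclass, its field names, and refusal_reason are hypothetical stand-ins, and the order in which the real tool checks these conditions is not stated here.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical plan shape; the real tool's plan object may differ."""
    rekeys: list = field(default_factory=list)
    state_collisions: list = field(default_factory=list)
    integrity_errors: list = field(default_factory=list)


def refusal_reason(plan, apply, allow_empty=False):
    """Return why an apply must refuse, or None if the run may proceed."""
    if not apply:
        return None  # dry-run never mutates, so there is nothing to refuse
    if plan.integrity_errors:
        return "integrity errors"
    if plan.state_collisions:
        return "project_state collisions"
    if not plan.rekeys and not allow_empty:
        return "empty plan (pass --allow-empty to proceed)"
    return None
```

Returning a reason string rather than raising keeps the gate easy to surface in both the human-readable output and the JSON report.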
CLI flags:
--registry PATH override ATOCORE_PROJECT_REGISTRY_PATH
--db PATH override the AtoCore SQLite DB path
--apply actually mutate (default is dry-run)
--allow-empty permit --apply on an empty plan
--report-dir PATH where to write the JSON report
--json emit the plan as JSON instead of human prose
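For orientation, the flag surface above could be declared with argparse roughly like this. Whether the script actually uses argparse is an assumption, and build_parser is a hypothetical name. The flag names and their meanings are taken from the list above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="migrate_legacy_aliases.py")
    parser.add_argument("--registry", metavar="PATH",
                        help="override ATOCORE_PROJECT_REGISTRY_PATH")
    parser.add_argument("--db", metavar="PATH",
                        help="override the AtoCore SQLite DB path")
    # store_true flags default to False, which makes dry-run the default.
    parser.add_argument("--apply", action="store_true",
                        help="actually mutate (default is dry-run)")
    parser.add_argument("--allow-empty", action="store_true",
                        help="permit --apply on an empty plan")
    parser.add_argument("--report-dir", metavar="PATH",
                        help="where to write the JSON report")
    parser.add_argument("--json", action="store_true",
                        help="emit the plan as JSON instead of human prose")
    return parser
```

With no arguments the parsed namespace has apply=False, so a bare invocation can only ever inspect.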
Smoke test against the Phase 9 validation DB produces the expected
"Nothing to migrate. The database is clean." output with 4 known
canonical projects and 0 shadows.
tests/test_migrate_legacy_aliases.py
------------------------------------
19 new tests, all green:
Plan-building:
- test_dry_run_on_empty_registry_reports_empty_plan
- test_dry_run_on_clean_registered_db_reports_empty_plan
- test_dry_run_finds_shadow_project
- test_dry_run_plans_state_rekey_without_collisions
- test_dry_run_detects_state_collision
- test_dry_run_plans_memory_rekey_and_supersession
- test_dry_run_plans_interaction_rekey
Apply:
- test_apply_refuses_on_state_collision
- test_apply_migrates_clean_shadow_end_to_end (verifies get_state
can see the state via BOTH the alias AND the canonical after
migration)
- test_apply_drops_shadow_state_duplicate_without_collision
(same (category, key, value) on both sides - mark shadow
superseded, don't hit the UNIQUE constraint)
- test_apply_migrates_memories
- test_apply_migrates_interactions
- test_apply_is_idempotent
- test_apply_refuses_with_integrity_errors (uses case-variant
canonical rows to work around projects.name UNIQUE constraint;
verifies the case-insensitive duplicate detection works)
Reporting:
- test_plan_to_json_dict_is_serializable
- test_write_report_creates_file
- test_render_plan_text_on_empty_plan
- test_render_plan_text_on_collision
End-to-end gap closure (the most important test):
- test_legacy_alias_gap_is_closed_after_migration
- Seeds the exact same scenario as
test_legacy_alias_keyed_state_is_invisible_until_migrated
in test_project_state.py (which documents the pre-migration
gap)
- Confirms the row is invisible before migration
- Runs the migration
- Verifies the row is reachable via BOTH the canonical id AND
the alias afterward
- This test and the pre-migration gap test together lock in
"before migration: invisible, after migration: reachable"
as the documented invariant
Full suite: 194 passing (was 175), 1 warning. The +19 is the new
migration test file.
Next concrete step after this commit
------------------------------------
- Run the dry-run against the live Dalidou DB to find out the
actual blast radius. The script is the inspection SQL, codified.
- Review the dry-run output together
- If clean (zero shadows), no apply needed; close the doc gap as
"verified nothing to migrate on this deployment"
- If there are shadows, resolve any collisions via
POST /project/state, then run --apply under supervision
- After apply, the test_legacy_alias_keyed_state_is_invisible_until_migrated
test still passes (it simulates the gap directly, so it's
independent of the live DB state) and the gap-closed companion
test continues to guard forward
Integrate codex/port-atocore-ops-client (operator client + operations
playbook) so the dalidou-storage-foundation branch can fast-forward
into main.
# Conflicts:
# README.md
Session 1 of the four-session plan. Empirically exercises the Phase 9
loop (capture -> reinforce -> extract) for the first time and lands
three small hygiene fixes.
Validation script + report
--------------------------
scripts/phase9_first_real_use.py — reproducible script that:
- sets up an isolated SQLite + Chroma store under
data/validation/phase9-first-use (gitignored)
- seeds 3 active memories
- runs 8 sample interactions through capture + reinforce + extract
- prints what each step produced and reinforcement state at the end
- supports --json output for downstream tooling
docs/phase9-first-real-use.md — narrative report of the run with:
- extraction results table (8/8 expectations met exactly)
- the empirical finding that REINFORCEMENT MATCHED ZERO seeds
despite sample 5 clearly echoing the rebase preference memory
- root cause analysis: the substring matcher is too brittle for
natural paraphrases (e.g. "prefers" vs "I prefer", "history"
vs "the history")
- recommended fix: replace substring matcher with a token-overlap
matcher (>=70% of memory tokens present in response, with
light stemming and a small stop list)
- explicit note that the fix is queued as a follow-up commit, not
bundled into the report — keeps the audit trail clean
Key extraction results from the run:
- all 7 heading/sentence rules fired correctly
- 0 false positives on the prose-only sample (the most important
sanity check)
- long content preserved without truncation
- dedup correctly kept three distinct cues from one interaction
- project scoping flowed cleanly through the pipeline
Hygiene 1: FastAPI lifespan migration (src/atocore/main.py)
- Replaced @app.on_event("startup") with the modern @asynccontextmanager
lifespan handler
- Same setup work (setup_logging, ensure_runtime_dirs, init_db,
init_project_state_schema, startup_ready log)
- Removes the two on_event deprecation warnings from every test run
- Test suite now shows 1 warning instead of 3
Hygiene 2: EXTRACTOR_VERSION constant (src/atocore/memory/extractor.py)
- Added EXTRACTOR_VERSION = "0.1.0" with a versioned change log comment
- MemoryCandidate dataclass carries extractor_version on every candidate
- POST /interactions/{id}/extract response now includes extractor_version
on both the top level (current run) and on each candidate
- Implements the versioning requirement called out in
docs/architecture/promotion-rules.md so old candidates can be
identified and re-evaluated when the rule set evolves
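The versioning shape described above looks roughly like the following. This is a sketch, not the contents of extractor.py: the memory_type and content fields, extract_response, and stale_candidates are assumptions. Only EXTRACTOR_VERSION = "0.1.0", the MemoryCandidate name, and the extractor_version field (top level and per candidate) come from this commit.

```python
from dataclasses import asdict, dataclass

EXTRACTOR_VERSION = "0.1.0"


@dataclass
class MemoryCandidate:
    # Field names other than extractor_version are assumed for illustration.
    memory_type: str
    content: str
    extractor_version: str = EXTRACTOR_VERSION


def extract_response(candidates):
    """Shape of an extract response: version on the run and on each candidate."""
    return {
        "extractor_version": EXTRACTOR_VERSION,
        "candidates": [asdict(c) for c in candidates],
    }


def stale_candidates(candidates, current=EXTRACTOR_VERSION):
    """Candidates produced by a different rule set, due for re-evaluation."""
    return [c for c in candidates if c.extractor_version != current]
```

Stamping each candidate, rather than only the response, is what lets stored candidates be told apart after the rule set changes.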
Hygiene 3: ~/.git-credentials cleanup (out-of-tree, not committed)
- Removed the dead OAUTH_USER:<jwt> line for dalidou:3000 that was
being silently rewritten by the system credential manager on every
push attempt
- Configured credential.http://dalidou:3000.helper with the empty-string
sentinel pattern so the URL-specific helper chain is exactly
["", store] instead of inheriting the system-level "manager" helper
that ships with Git for Windows
- Same fix for the 100.80.199.40 (Tailscale) entry
- Verified end to end: a fresh push using only the cleaned credentials
file (no embedded URL) authenticates as Antoine and lands cleanly
Full suite: 160 passing (no change from previous), 1 warning
(was 3) thanks to the lifespan migration.