ATOCore/docs/architecture/project-identity-canonicalization.md

docs(arch): project-identity-canonicalization contract Codifies the helper-at-every-service-boundary rule that fb6298a implemented across the eight current callsites. The contract is intentionally simple but easy to forget, so it lives in its own doc that the engineering layer V1 implementation sprint can read before adding new project-keyed entity surfaces. docs/architecture/project-identity-canonicalization.md ------------------------------------------------------ - The contract: every read/write that takes a project name MUST call resolve_project_name() before the value crosses a service boundary; canonicalization happens once, at the first statement after input validation, never later - The helper API: resolve_project_name(name) returns the canonical project_id for registered names, the input unchanged for empty or unregistered names (the second case is the backwards-compat path for hand-curated state predating the registry) - Full table of the 8 current callsites: builder.build_context, project_state.set_state/get_state/invalidate_state, interactions.record_interaction/list_interactions, memory.create_memory/get_memories - Where the helper is intentionally NOT called and why: legacy ensure_project lookup, retriever's own _project_match_boost (which already calls get_registered_project), _rank_chunks secondary substring boost (multiplicative not filter, can't drop relevant chunks), update_memory (no project field update), unregistered names (the rule applied to a name with no record) - Why this is the trust hierarchy in action: Layer 3 trusted state has to be findable to win the trust battle; an un-canonicalized lookup silently makes Layer 3 invisible and the system falls through to lower-trust retrieved chunks with no signal to the human - The 4-step rule for new entry points: identify project-keyed reads/writes, place the call as the first statement after validation, add a regression test using the project_registry fixture, verify None/empty paths - How the 
project_registry fixture works with a copy-pasteable example - What the rule does NOT cover: alias creation (registry's own write path), registry hot-reloading (no in-process cache by design), cross-project dedup (collision detection at registration), time-bounded canonicalization (canonical id is stable forever), legacy data migration (open follow-up) - Engineering layer V1 implications: every new service entry point in the entities/relationships/conflicts/mirror modules must apply the helper at the first statement after validation; treated as code review failure if missing - Open follow-ups: legacy data migration script (~30 LOC), registry file caching when projects scale beyond ~50, case sensitivity audit when entity-side storage lands, _rank_chunks cleanup, documentation discoverability (intentional redundancy between this doc, the helper docstring, and per-callsite comments) - Quick reference card: copy-pasteable template for new service functions master-plan-status.md updated ----------------------------- - New doc added to the engineering-layer planning sprint listing - Marked as required reading before V1 implementation begins - Note that V1 must apply the contract at every new service-layer entry point Pure doc work, no code changes. Full suite stays at 174 passing because no source changed.
2026-04-07 19:32:31 -04:00
# Project Identity Canonicalization
## Why this document exists
AtoCore identifies projects by name in many places: trusted state
rows, memories, captured interactions, query/context API parameters,
extractor candidates, future engineering entities. Without an
explicit rule, every callsite would have to remember to canonicalize
project names through the registry, and the recent Codex review
caught exactly the bug class that follows when one of them forgets.
The fix landed in `fb6298a` and works correctly today. This document
exists to make the rule **explicit and discoverable** so the
engineering layer V1 implementation, future entity write paths, and
any new agent integration don't reintroduce the same fragmentation
when nobody is looking.
## The contract
> **Every read/write that takes a project name MUST canonicalize it
> through `resolve_project_name()` before the value crosses a service
> boundary.**
The boundary is wherever a project name becomes a database row, a
query filter, an attribute on a stored object, or a key for any
lookup. The canonicalization happens **once**, at that boundary,
before the underlying storage primitive is called.
Symbolically:
```
HTTP layer (raw user input)
        ↓
service entry point
        ↓
project_name = resolve_project_name(project_name)  ← ONLY canonical from this point
        ↓
storage / queries / further service calls
```
The rule is intentionally simple. There's no per-call exception,
no "trust me, the caller already canonicalized it" shortcut, no
opt-out flag. Every service-layer entry point applies the helper
the moment it receives a project name from outside the service.
## The helper
```python
# src/atocore/projects/registry.py
def resolve_project_name(name: str | None) -> str:
    """Canonicalize a project name through the registry.

    Returns the canonical project_id if the input matches any
    registered project's id or alias. Returns the input unchanged
    when it's empty or not in the registry; the second case keeps
    backwards compatibility with hand-curated state, memories, and
    interactions that predate the registry, or for projects that
    are intentionally not registered.
    """
    if not name:
        return name or ""
    project = get_registered_project(name)
    if project is not None:
        return project.project_id
    return name
```
Three behaviors worth keeping in mind:
1. **Empty / None input → empty string output.** Callers don't have
to pre-check; passing `""` or `None` to a query filter still
works as "no project scope".
2. **Registered alias → canonical project_id.** The helper delegates
the case-insensitive lookup to `get_registered_project` and returns
the matched project's `project_id` (e.g. `"p05" → "p05-interferometer"`).
3. **Unregistered name → input unchanged.** This is the
backwards-compatibility path. Hand-curated state, memories, or
interactions created under a name that isn't in the registry
keep working. The retrieval is then "best effort" — the raw
string is used as the SQL key, which still finds the row that
was stored under the same raw string. This path exists so the
engineering layer V1 doesn't have to also be a data migration.
## Where the helper is currently called
As of `fb6298a`, the helper is invoked at exactly these eight
service-layer entry points:
| Module | Function | What gets canonicalized |
|---|---|---|
| `src/atocore/context/builder.py` | `build_context` | the `project_hint` parameter, before the trusted state lookup |
| `src/atocore/context/project_state.py` | `set_state` | `project_name`, before `ensure_project()` |
| `src/atocore/context/project_state.py` | `get_state` | `project_name`, before the SQL lookup |
| `src/atocore/context/project_state.py` | `invalidate_state` | `project_name`, before the SQL lookup |
| `src/atocore/interactions/service.py` | `record_interaction` | `project`, before insert |
| `src/atocore/interactions/service.py` | `list_interactions` | `project` filter parameter, before WHERE clause |
| `src/atocore/memory/service.py` | `create_memory` | `project`, before insert |
| `src/atocore/memory/service.py` | `get_memories` | `project` filter parameter, before WHERE clause |
Every one of those is the **first** thing the function does after
input validation. There is no path through any of those eight
functions where a project name reaches storage without passing
through `resolve_project_name`.
## Where the helper is NOT called (and why that's correct)
These places intentionally do not canonicalize:
1. **`update_memory`'s project field.** The API does not allow
changing a memory's project after creation, so there's no
project to canonicalize. The function only updates `content`,
`confidence`, and `status`.
2. **The retriever's `_project_match_boost` substring matcher.** It
already calls `get_registered_project` internally to expand the
hint into the candidate set (canonical id + all aliases + last
path segments). It accepts the raw hint by design.
3. **`_rank_chunks`'s secondary substring boost in
`builder.py`.** Still uses the raw hint. This is a multiplicative
factor on top of correct retrieval, not a filter, so it cannot
drop relevant chunks. Tracked as a future cleanup but not
critical.
4. **Direct SQL queries for the projects table itself** (e.g.
`ensure_project`'s lookup). These are intentional case-insensitive
raw lookups against the column the canonical id is stored in.
`set_state` already canonicalized before reaching `ensure_project`,
so the value passed is the canonical id by definition.
5. **Hand-authored project names that aren't in the registry.**
The helper returns those unchanged. This is the backwards-compat
path mentioned above; it is *not* a violation of the rule, it's
the rule applied to a name with no registry record.
## Why this is the trust hierarchy in action
The whole point of AtoCore is the trust hierarchy from the operating
model:
1. Trusted Project State (Layer 3) is the most authoritative layer
2. Memories (active) are second
3. Source chunks (raw retrieved content) are last
If a caller passes the alias `p05` and Layer 3 was written under
`p05-interferometer`, and the lookup fails to find the canonical
row, **the trust hierarchy collapses**. The most-authoritative
layer is silently invisible to the caller. The system would still
return *something* — namely, lower-trust retrieved chunks — and the
human would never know they got a degraded answer.
The canonicalization helper is what makes the trust hierarchy
**dependable**. Layer 3 is supposed to win every time. To win it
has to be findable. To be findable, the lookup key has to match
how the row was stored. And the only way to guarantee that match
across every entry point is to canonicalize at every boundary.
docs+test: clarify legacy alias compatibility gap, add gap regression test Codex caught a real documentation accuracy bug in the previous canonicalization doc commit (f521aab). The doc claimed that rows written under aliases before fb6298a "still work via the unregistered-name fallback path" — that is wrong for REGISTERED aliases, which is exactly the case that matters. The unregistered-name fallback only saves you when the project was never in the registry: a row stored under "orphan-project" is read back via "orphan-project", both pass through resolve_project_name unchanged, and the strings line up. For a registered alias like "p05", the helper rewrites the read key to "p05-interferometer" but does NOT rewrite the storage key, so the legacy row becomes silently invisible. This commit corrects the doc and locks the gap behavior in with a regression test, so the issue cannot be lost again. docs/architecture/project-identity-canonicalization.md ------------------------------------------------------ - Removed the misleading claim from the "What this rule does NOT cover" section. Replaced with a pointer to the new gap section and an explicit statement that the migration is required before engineering V1 ships. - New "Compatibility gap: legacy alias-keyed rows" section between "Why this is the trust hierarchy in action" and "The rule for new entry points". This is the natural insertion point because the gap is exactly the trust hierarchy failing for legacy data. 
The section covers: * a worked T0/T1 timeline showing the exact failure mode * what is at risk on the live Dalidou DB, ranked by trust tier: projects table (shadow rows), project_state (highest risk because Layer 3 is most-authoritative), memories, interactions * inspection SQL queries for measuring the actual blast radius on the live DB before running any migration * the spec for the migration script: walk projects, find shadow rows, merge dependent state via the conflict model when there are collisions, dry-run mode, idempotent * explicit statement that this is required pre-V1 because V1 will add new project-keyed tables and the killer correctness queries from engineering-query-catalog.md would report wrong results against any project that has shadow rows - "Open follow-ups" item 1 promoted from "tracked optional" to "REQUIRED before engineering V1 ships, NOT optional" with a more honest cost estimate (~150 LOC migration + ~50 LOC tests + supervised live run, not the previous optimistic ~30 LOC) - TL;DR rewritten to mention the gap explicitly and re-order the open follow-ups so the migration is the top priority tests/test_project_state.py --------------------------- - New test_legacy_alias_keyed_state_is_invisible_until_migrated - Inserts a "p05" project row + a project_state row pointing at it via raw SQL (bypassing set_state which now canonicalizes), simulating a pre-fix legacy row - Verifies the canonicalized get_state path can NOT see the row via either the alias or the canonical id — this is the bug - Verifies the row is still in the database (just unreachable), so the migration script has something to find - The docstring explicitly says: "When the legacy alias migration script lands, this test must be inverted." Future readers will know exactly when and how to update it. Full suite: 175 passing (was 174), 1 warning. The +1 is the new gap regression test. 
What this commit does NOT do ---------------------------- - The migration script itself is NOT in this commit. Codex's finding was a doc accuracy issue, and the right scope is fix the doc + lock the gap behavior in. Writing the migration is the next concrete step but is bigger (~200 LOC + dry-run mode + collision handling via the conflict model + supervised run on the live Dalidou DB), warrants its own commit, and probably warrants a "draft + review the dry-run output before applying" workflow rather than a single shot. - Existing tests are unchanged. The new test stands alone as a documented gap; the 12 canonicalization tests from fb6298a still pass without modification.
2026-04-07 20:14:19 -04:00
## Compatibility gap: legacy alias-keyed rows
The canonicalization rule fixes new writes going forward, but it
does NOT fix rows that were already written under a registered
alias before `fb6298a` landed. Those rows have a real, concrete
gap that must be closed by a one-time migration before the
engineering layer V1 ships.
The exact failure mode:
```
time T0 (before fb6298a):
    POST /project/state {project: "p05", ...}
      -> set_state("p05", ...)               # no canonicalization
      -> ensure_project("p05")               # creates a "p05" row
      -> writes state with project_id pointing at the "p05" row

time T1 (after fb6298a):
    POST /project/state {project: "p05", ...}   (or any read)
      -> set_state("p05", ...)
      -> resolve_project_name("p05") -> "p05-interferometer"
      -> ensure_project("p05-interferometer")   # creates a SECOND row
      -> writes new state under the canonical row
      -> the T0 state is still in the "p05" row, INVISIBLE to every
         canonicalized read
```
The unregistered-name fallback path saves you when the project was
never in the registry: a row stored under `"orphan-project"` is read
back via `"orphan-project"`, both pass through `resolve_project_name`
unchanged, and the strings line up. **It does not save you when the
name is a registered alias** — the helper rewrites the read key but
not the storage key, and the legacy row becomes invisible.
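
The timeline can be reproduced against an in-memory SQLite database. The sketch below is a simplified simulation under stated assumptions: the two-table schema, the `value` column, the `ALIASES` dict, and the `canonicalize` switch are all hypothetical stand-ins for the real service code, chosen only to make the T0/T1 difference visible.

```python
import sqlite3

ALIASES = {"p05": "p05-interferometer"}  # hypothetical registry contents


def resolve(name: str) -> str:
    """Stand-in for resolve_project_name."""
    return ALIASES.get(name, name)


def ensure_project(db: sqlite3.Connection, name: str) -> int:
    row = db.execute("SELECT id FROM projects WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    return db.execute("INSERT INTO projects (name) VALUES (?)", (name,)).lastrowid


def set_state(db, project, category, key, value, *, canonicalize):
    # canonicalize=False simulates the pre-fb6298a write path.
    if canonicalize:
        project = resolve(project)
    project_id = ensure_project(db, project)
    db.execute(
        "INSERT OR REPLACE INTO project_state (project_id, category, key, value) "
        "VALUES (?, ?, ?, ?)",
        (project_id, category, key, value),
    )


def get_state(db, project):
    # Reads always canonicalize, as they do after the fix.
    project = resolve(project)
    return db.execute(
        "SELECT ps.key, ps.value FROM project_state ps "
        "JOIN projects p ON ps.project_id = p.id "
        "WHERE p.name = ? ORDER BY ps.key",
        (project,),
    ).fetchall()


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE project_state (
    project_id INTEGER REFERENCES projects(id),
    category TEXT, key TEXT, value TEXT,
    UNIQUE (project_id, category, key)
);
""")

set_state(db, "p05", "status", "phase", "legacy value", canonicalize=False)  # T0
set_state(db, "p05", "decision", "vendor", "new value", canonicalize=True)   # T1

visible = get_state(db, "p05")  # only the T1 row is reachable
stored = db.execute("SELECT COUNT(*) FROM project_state").fetchone()[0]
project_names = db.execute("SELECT name FROM projects ORDER BY name").fetchall()
```

Both state rows are still stored, but only the one written after the fix is visible through the canonicalized read, and the `projects` table now holds a shadow `p05` row next to the canonical one.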
What is at risk on the live Dalidou DB:
1. **`projects` table**: any rows whose `name` column matches a
registered alias (one row for each alias that was written to
before the fix landed). These shadow the canonical project row
and silently fragment the projects namespace.
2. **`project_state` table**: any rows whose `project_id` points
at one of those shadow project rows. **This is the highest-risk
case** because it directly defeats the trust hierarchy: Layer 3
trusted state becomes invisible to every canonicalized lookup.
3. **`memories` table**: any rows whose `project` column is a
registered alias. Reinforcement and extraction queries will
miss them.
4. **`interactions` table**: any rows whose `project` column is a
registered alias. Listing and downstream reflection will miss
them.
How to find out the actual blast radius on the live Dalidou DB:
```sql
-- inspect the projects table for alias-shadow rows
SELECT id, name FROM projects;
-- count alias-keyed memories per known alias
SELECT project, COUNT(*) FROM memories
WHERE project IN ('p04','p05','p06','gigabit','interferometer','polisher','ato core')
GROUP BY project;
-- count alias-keyed interactions
SELECT project, COUNT(*) FROM interactions
WHERE project IN ('p04','p05','p06','gigabit','interferometer','polisher','ato core')
GROUP BY project;
-- count alias-shadowed project_state rows by project name
SELECT p.name, COUNT(*) FROM project_state ps
JOIN projects p ON ps.project_id = p.id
WHERE p.name IN ('p04','p05','p06','gigabit','interferometer','polisher','ato core')
GROUP BY p.name;
```
The migration that closes the gap has to:
1. For each registered project, find all `projects` rows whose
name matches one of the project's aliases AND is not the
canonical id itself. These are the "shadow" rows.
2. For each shadow row, MERGE its dependent state into the
   canonical project's row:
   - rekey `project_state.project_id` from shadow → canonical
   - if the merge would create a `(project_id, category, key)`
     collision (a state row already exists under the canonical
     id with the same category+key), the migration must surface
     the conflict via the existing conflict model and pause
     until the human resolves it
   - delete the now-empty shadow `projects` row
3. For `memories` and `interactions`, the fix is simpler because
the alias appears as a string column (not a foreign key):
`UPDATE memories SET project = canonical WHERE project = alias`,
then same for interactions.
4. The migration must run in dry-run mode first, printing the
exact rows it would touch and the canonical destinations they
would be merged into.
5. The migration must be idempotent — running it twice produces
the same final state as running it once.
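
The steps above can be sketched in a few dozen lines. This is an illustration of the spec's shape, not the migration script (which does not exist yet). Every name in it is hypothetical: `migrate_aliases`, the `registry` dict shape, and the simplified schema. It departs from the spec in four places: it matches names exactly rather than case-insensitively, it returns planned actions as counts instead of printing rows, it only records collisions instead of routing them through the conflict model, and it renames a shadow row when no canonical row exists (an assumption the spec does not cover).

```python
import sqlite3


def _project_id(db, name):
    row = db.execute("SELECT id FROM projects WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


def migrate_aliases(db, registry, *, dry_run=True):
    """Rekey alias-keyed rows to the canonical id.

    `registry` maps canonical id -> aliases (hypothetical shape).
    Returns (actions, collisions); with dry_run=True nothing is written.
    """
    actions, collisions = [], []
    for canonical, aliases in registry.items():
        for alias in (a for a in aliases if a != canonical):
            # Step 3: string-keyed tables only need a plain rekey.
            for table in ("memories", "interactions"):
                count = db.execute(
                    f"SELECT COUNT(*) FROM {table} WHERE project = ?", (alias,)
                ).fetchone()[0]
                if count:
                    actions.append((table, alias, canonical, count))
                    if not dry_run:
                        db.execute(
                            f"UPDATE {table} SET project = ? WHERE project = ?",
                            (canonical, alias),
                        )
            # Steps 1-2: the shadow `projects` row and its dependent state.
            shadow_id = _project_id(db, alias)
            if shadow_id is None:
                continue  # nothing to do, which is what makes reruns idempotent
            target_id = _project_id(db, canonical)
            if target_id is None:
                # Assumption: with no canonical row yet, renaming the shadow
                # row is enough because its state rows keep the same id.
                actions.append(("projects", alias, canonical, 1))
                if not dry_run:
                    db.execute(
                        "UPDATE projects SET name = ? WHERE id = ?",
                        (canonical, shadow_id),
                    )
                continue
            clashes = db.execute(
                "SELECT s.category, s.key FROM project_state s "
                "JOIN project_state c ON c.category = s.category AND c.key = s.key "
                "WHERE s.project_id = ? AND c.project_id = ?",
                (shadow_id, target_id),
            ).fetchall()
            if clashes:
                # The real script must surface these via the conflict model.
                collisions.append((alias, canonical, clashes))
                continue
            count = db.execute(
                "SELECT COUNT(*) FROM project_state WHERE project_id = ?",
                (shadow_id,),
            ).fetchone()[0]
            actions.append(("project_state", alias, canonical, count))
            if not dry_run:
                db.execute(
                    "UPDATE project_state SET project_id = ? WHERE project_id = ?",
                    (target_id, shadow_id),
                )
                db.execute("DELETE FROM projects WHERE id = ?", (shadow_id,))
    if not dry_run:
        db.commit()
    return actions, collisions


# Tiny in-memory database to exercise the sketch.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE project_state (project_id INTEGER, category TEXT, key TEXT,
                            value TEXT, UNIQUE (project_id, category, key));
CREATE TABLE memories (project TEXT, content TEXT);
CREATE TABLE interactions (project TEXT, prompt TEXT);
INSERT INTO projects (id, name) VALUES (1, 'p05'), (2, 'p05-interferometer');
INSERT INTO project_state VALUES (1, 'status', 'phase', 'legacy');
INSERT INTO project_state VALUES (2, 'decision', 'vendor', 'current');
INSERT INTO memories VALUES ('p05', 'm1'), ('p05', 'm2');
""")
registry = {"p05-interferometer": ["p05", "interferometer"]}

planned, _ = migrate_aliases(db, registry, dry_run=True)   # writes nothing
applied, conflicts = migrate_aliases(db, registry, dry_run=False)
rerun, _ = migrate_aliases(db, registry, dry_run=False)    # second run finds no work
```

Reviewing `planned` before the real run corresponds to the dry-run requirement, and the empty `rerun` result is the idempotency property from step 5.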
This work is **required before the engineering layer V1 ships**
because V1 will add new `entities`, `relationships`, `conflicts`,
and `mirror_regeneration_failures` tables that all key on the
canonical project id. Any leaked alias-keyed rows in the existing
tables would show up in V1 reads as silently missing data, and
the killer-correctness queries from `engineering-query-catalog.md`
(orphan requirements, decisions on flagged assumptions,
unsupported claims) would report wrong results against any project
that has shadow rows.
The migration script does NOT exist yet. The open follow-ups
section below tracks it as the next concrete step.
## The rule for new entry points
When you add a new service-layer function that takes a project name,
follow this checklist:
1. **Does the function read or write a row keyed by project?** If
yes, you must call `resolve_project_name`. If no (e.g. it only
takes `project` as a label for logging), you may skip the
canonicalization but you should add a comment explaining why.
2. **Where does the canonicalization go?** As the first statement
after input validation. Not later, not "before storage", not
"in the helper that does the actual write". As the first
statement, so any subsequent service call inside the function
sees the canonical value.
3. **Add a regression test that uses an alias.** Use the
`project_registry` fixture from `tests/conftest.py` to set up
a temp registry with at least one project + aliases, then
verify the new function works when called with the alias and
when called with the canonical id.
4. **If the function can be called with `None` or empty string,
verify that path too.** The helper handles it correctly but
the function-under-test might not.
## How the `project_registry` test fixture works
`tests/conftest.py::project_registry` returns a callable that
takes one or more `(project_id, [aliases])` tuples (or just a bare
`project_id` string), writes them into a temp registry file,
points `ATOCORE_PROJECT_REGISTRY_PATH` at it, and reloads
`config.settings`. Use it like:
```python
def test_my_new_thing_canonicalizes(project_registry):
    project_registry(("p05-interferometer", ["p05", "interferometer"]))
    # ... call your service function with "p05" ...
    # ... assert it works the same as if you'd passed "p05-interferometer" ...
```
The fixture is reused by all 12 alias-canonicalization regression
tests added in `fb6298a`. Following the same pattern for new
features is the cheapest way to keep the contract intact.
## What this rule does NOT cover
1. **Alias creation / management.** This document is about reading
and writing project-keyed data. Adding new projects or new
aliases is the registry's own write path
(`POST /projects/register`, `PUT /projects/{name}`), which
already enforces collision detection and atomic file writes.
2. **Registry hot-reloading.** The helper calls
`load_project_registry()` on every invocation, which reads the
JSON file each time. There is no in-process cache. If the
registry file changes, the next call sees the new contents.
Performance is fine for the current registry size but if it
becomes a bottleneck, add a versioned cache here, not at every
call site.
3. **Cross-project deduplication.** If two different projects in
the registry happen to share an alias, the registry's collision
detection blocks the second one at registration time, so this
case can't arise in practice. The helper does not handle it
defensively.
4. **Time-bounded canonicalization.** A project's canonical id is
stable. Aliases can be added or removed via
`PUT /projects/{name}`, but the canonical `id` field never
changes after registration. So a row written today under the
canonical id will always remain findable under that id, even
if the alias set evolves.
5. **Migration of legacy data.** If the live Dalidou DB has rows
that were written under aliases before the canonicalization
landed (e.g. a `memories` row with `project = "p05"` from
before `fb6298a`), those rows are **NOT** automatically
reachable from the canonicalized read path. The
unregistered-name fallback only helps for project names that
were never registered at all; it does **NOT** help for names
that are registered as aliases. See the "Compatibility gap"
section above for the exact failure mode and the migration
path that has to run before the engineering layer V1 ships.
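
Item 2 above mentions a versioned cache as a possible future optimization. One way that could look, purely as a sketch: key the parsed JSON on the file's modification time and size, so the file is only re-read when it changes. `load_registry_cached` and `_cache` are hypothetical names, the registry's JSON layout is treated as opaque, and a same-size rewrite within the filesystem's timestamp granularity would be missed by this simple version.

```python
import json
import os

# path -> ((mtime_ns, size), parsed JSON)
_cache: dict[str, tuple[tuple[int, int], object]] = {}


def load_registry_cached(path: str) -> object:
    """Return the parsed registry JSON, re-reading only when the file changes."""
    stat = os.stat(path)
    version = (stat.st_mtime_ns, stat.st_size)
    cached = _cache.get(path)
    if cached is not None and cached[0] == version:
        return cached[1]
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    _cache[path] = (version, data)
    return data
```

Keeping the cache inside the loader, rather than at each call site, preserves the property that every caller sees registry edits on its next call.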
## What this enables for the engineering layer V1
When the engineering layer ships per `engineering-v1-acceptance.md`,
it adds at least these new project-keyed surfaces:
- `entities` table with a `project_id` column
- `relationships` table that joins entities, indirectly project-keyed
- `conflicts` table with a `project` column
- `mirror_regeneration_failures` table with a `project` column
- new endpoints: `POST /entities/...`, `POST /ingest/kb-cad/export`,
`POST /ingest/kb-fem/export`, `GET /mirror/{project}/...`,
`GET /conflicts?project=...`
**Every one of those write/read paths needs to call
`resolve_project_name` at its service-layer entry point**, following
the same pattern as the eight existing call sites listed above. The
implementation sprint should:
1. Apply the helper at each new service entry point as the first
statement after input validation
2. Add a regression test using the `project_registry` fixture that
exercises an alias against each new entry point
3. Treat any new service function that takes a project name without
calling `resolve_project_name` as a code review failure
The pattern is simple enough to follow without thinking, which is
exactly the property we want for a contract that has to hold
across many independent additions.
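
Steps 1 and 2 might look like the following for one of the new entity surfaces. This is a minimal, self-contained sketch and not the real implementation: the `_ALIASES` map, the stand-in `resolve_project_name`, the `create_entity` / `list_entities` functions, and the in-memory `_ENTITIES` store are all hypothetical. In the real code base the helper would be imported from `atocore.projects.registry`, and the test would rely on the `project_registry` fixture rather than a hard-coded alias map.

```python
# Hypothetical stand-in for the registry: alias -> canonical id.
_ALIASES = {"p05": "p05-interferometer"}

# Hypothetical in-memory stand-in for the `entities` table.
_ENTITIES = []


def resolve_project_name(name):
    """Stand-in helper: canonical id for registered aliases, input
    unchanged for empty, canonical, or unregistered names."""
    if not name:
        return name
    return _ALIASES.get(name.lower(), name)


def create_entity(project_name, label):
    # Validate inputs first
    if not project_name:
        raise ValueError("project_name is required")
    # Canonicalize as the first statement after validation
    project_name = resolve_project_name(project_name)
    row = {"project_id": project_name, "label": label}
    _ENTITIES.append(row)
    return row


def list_entities(project_name):
    if not project_name:
        raise ValueError("project_name is required")
    project_name = resolve_project_name(project_name)
    return [r for r in _ENTITIES if r["project_id"] == project_name]


def test_alias_and_canonical_id_share_storage():
    _ENTITIES.clear()
    create_entity("p05", "mirror mount")                  # written via alias
    create_entity("p05-interferometer", "beam splitter")  # written via canonical id
    labels = [r["label"] for r in list_entities("p05")]
    assert labels == ["mirror mount", "beam splitter"]
    # Reading through either name returns the same rows.
    assert list_entities("p05") == list_entities("p05-interferometer")
```

The test writes through both the alias and the canonical id, then reads through both, so it fails if either the write path or the read path skips the helper.
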
## Open follow-ups
These are things the canonicalization story still has open. None
are blockers, but they're the rough edges to be aware of.
docs+test: clarify legacy alias compatibility gap, add gap regression test Codex caught a real documentation accuracy bug in the previous canonicalization doc commit (f521aab). The doc claimed that rows written under aliases before fb6298a "still work via the unregistered-name fallback path" — that is wrong for REGISTERED aliases, which is exactly the case that matters. The unregistered-name fallback only saves you when the project was never in the registry: a row stored under "orphan-project" is read back via "orphan-project", both pass through resolve_project_name unchanged, and the strings line up. For a registered alias like "p05", the helper rewrites the read key to "p05-interferometer" but does NOT rewrite the storage key, so the legacy row becomes silently invisible. This commit corrects the doc and locks the gap behavior in with a regression test, so the issue cannot be lost again. docs/architecture/project-identity-canonicalization.md ------------------------------------------------------ - Removed the misleading claim from the "What this rule does NOT cover" section. Replaced with a pointer to the new gap section and an explicit statement that the migration is required before engineering V1 ships. - New "Compatibility gap: legacy alias-keyed rows" section between "Why this is the trust hierarchy in action" and "The rule for new entry points". This is the natural insertion point because the gap is exactly the trust hierarchy failing for legacy data. 
The section covers: * a worked T0/T1 timeline showing the exact failure mode * what is at risk on the live Dalidou DB, ranked by trust tier: projects table (shadow rows), project_state (highest risk because Layer 3 is most-authoritative), memories, interactions * inspection SQL queries for measuring the actual blast radius on the live DB before running any migration * the spec for the migration script: walk projects, find shadow rows, merge dependent state via the conflict model when there are collisions, dry-run mode, idempotent * explicit statement that this is required pre-V1 because V1 will add new project-keyed tables and the killer correctness queries from engineering-query-catalog.md would report wrong results against any project that has shadow rows - "Open follow-ups" item 1 promoted from "tracked optional" to "REQUIRED before engineering V1 ships, NOT optional" with a more honest cost estimate (~150 LOC migration + ~50 LOC tests + supervised live run, not the previous optimistic ~30 LOC) - TL;DR rewritten to mention the gap explicitly and re-order the open follow-ups so the migration is the top priority tests/test_project_state.py --------------------------- - New test_legacy_alias_keyed_state_is_invisible_until_migrated - Inserts a "p05" project row + a project_state row pointing at it via raw SQL (bypassing set_state which now canonicalizes), simulating a pre-fix legacy row - Verifies the canonicalized get_state path can NOT see the row via either the alias or the canonical id — this is the bug - Verifies the row is still in the database (just unreachable), so the migration script has something to find - The docstring explicitly says: "When the legacy alias migration script lands, this test must be inverted." Future readers will know exactly when and how to update it. Full suite: 175 passing (was 174), 1 warning. The +1 is the new gap regression test. 
What this commit does NOT do ---------------------------- - The migration script itself is NOT in this commit. Codex's finding was a doc accuracy issue, and the right scope is fix the doc + lock the gap behavior in. Writing the migration is the next concrete step but is bigger (~200 LOC + dry-run mode + collision handling via the conflict model + supervised run on the live Dalidou DB), warrants its own commit, and probably warrants a "draft + review the dry-run output before applying" workflow rather than a single shot. - Existing tests are unchanged. The new test stands alone as a documented gap; the 12 canonicalization tests from fb6298a still pass without modification.
2026-04-07 20:14:19 -04:00
1. **Legacy alias data migration — REQUIRED before engineering V1
ships, NOT optional.** If the live Dalidou DB has any rows
written under aliases before `fb6298a` landed, they are
silently invisible to the canonicalized read path (see the
"Compatibility gap" section above for the exact failure mode).
This is a real correctness issue, not a theoretical one: any
trusted state, memory, or interaction stored under `p05`,
`gigabit`, `polisher`, etc. before the fix landed is currently
unreachable from any service-layer query. The migration script
has to walk `projects`, `project_state`, `memories`, and
`interactions`, merge shadow rows into their canonical
counterparts (with conflict-model handling for any collisions),
and run in dry-run mode first. Estimated cost: ~150 LOC for
the migration script + ~50 LOC of tests + a one-time supervised
run on the live Dalidou DB. **This migration is the next
concrete pre-V1 step.**
2. **Registry file caching.** `load_project_registry()` reads the
JSON file on every `resolve_project_name` call. With ~5
projects this is fine; with 50+ it would warrant a versioned
cache (cache key = file mtime + size). Defer until measured.
3. **Case sensitivity audit.** The helper uses
`get_registered_project`, which lowercases names for comparison,
while the stored canonical id keeps its original casing. No bug
is known today (the full test suite passes), but this is worth
re-confirming when the engineering layer adds entity-side storage.
4. **`_rank_chunks`'s secondary substring boost.** Mentioned
earlier; still uses the raw hint. Replace it with the same
helper-driven approach the retriever uses, OR delete it as
redundant once we confirm the retriever's primary boost is
sufficient.
5. **Documentation discoverability.** This doc lives under
`docs/architecture/`. The contract is also restated in the
docstring of `resolve_project_name` and referenced from each
call site's comment. That redundancy is intentional — the
contract is too easy to forget to live in only one place.
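
The versioned cache suggested in item 2 could be sketched as follows. This is an illustration under stated assumptions, not the project's code: `load_registry_cached` and `_CACHE` are hypothetical names, and the registry is assumed to be a plain JSON file. The cache key is the file's modification time plus its size, so an edited file is re-read on the next call and the existing hot-reload behavior is preserved in the common case.

```python
import json
import os
from typing import Any

# Hypothetical module-level cache: path -> ((mtime_ns, size), parsed data).
_CACHE: dict[str, tuple[tuple[int, int], Any]] = {}


def load_registry_cached(path: str) -> Any:
    """Return the parsed registry JSON, re-reading the file only when
    its (mtime, size) version key has changed since the last call."""
    stat = os.stat(path)
    version = (stat.st_mtime_ns, stat.st_size)

    cached = _CACHE.get(path)
    if cached is not None and cached[0] == version:
        return cached[1]  # file unchanged: reuse the parsed data

    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    _CACHE[path] = (version, data)
    return data
```

One caveat with this key: an edit that keeps the file size identical and lands within the filesystem's timestamp resolution would not be detected, which is part of why the follow-up says to defer until measured.
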
## Quick reference card
Copy-pasteable for new service functions:
```python
from atocore.projects.registry import resolve_project_name


def my_new_service_entry_point(
    project_name: str,
    other_args: ...,
) -> ...:
    # Validate inputs first
    if not project_name:
        raise ValueError("project_name is required")

    # Canonicalize through the registry as the first thing after
    # validation. Every subsequent operation in this function uses
    # the canonical id, so storage and queries are guaranteed
    # consistent across alias and canonical-id callers.
    project_name = resolve_project_name(project_name)

    # ... rest of the function ...
```
## TL;DR
- One helper, one rule: `resolve_project_name` at every service-layer
entry point that takes a project name
- Currently called in 8 places across builder, project_state,
interactions, and memory; all 8 listed in this doc
- Backwards-compat path returns **unregistered** names unchanged
(e.g. `"orphan-project"`); this does NOT cover **registered
alias** names that were used as storage keys before `fb6298a`
- **Real compatibility gap**: any row whose `project` column is a
registered alias from before the canonicalization landed is
silently invisible to the new read path. A one-time migration
is required before engineering V1 ships. See the "Compatibility
gap" section.
- The trust hierarchy depends on this helper being applied
everywhere — Layer 3 trusted state has to be findable for it to
win the trust battle
- Use the `project_registry` test fixture to add regression tests
for any new service function that takes a project name
- The engineering layer V1 implementation must follow the same
pattern at every new service entry point
- Open follow-ups (in priority order): **legacy alias data
migration (required pre-V1)**, redundant substring boost
cleanup, registry caching when projects scale