25 Commits

58c744fd2f feat: post-backup validation + retention cleanup (Tasks B & C)
- create_runtime_backup() now auto-validates its output and includes
  validated/validation_errors fields in returned metadata
- New cleanup_old_backups() with retention policy: 7 daily, 4 weekly
  (Sundays), 6 monthly (1st of month), dry-run by default
- CLI `cleanup` subcommand added to backup module
- 9 new tests (2 validation + 7 retention), 259 total passing
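
A minimal sketch of how the retention tiers could pick victims, assuming a
compact UTC stamp format and a "newest N per tier" reading of the policy
(illustrative only, not the cleanup_old_backups() internals):

    from datetime import datetime

    STAMP_FORMAT = "%Y%m%dT%H%M%SZ"   # assumed stamp layout, e.g. 20260411T094646Z

    def select_backups_to_delete(stamps, *, daily=7, weekly=4, monthly=6):
        """Return the stamps NOT protected by any retention tier."""
        dated = sorted(((datetime.strptime(s, STAMP_FORMAT), s) for s in stamps),
                       reverse=True)                       # newest first
        keep = {s for _, s in dated[:daily]}               # 7 most recent backups
        sundays = [s for d, s in dated if d.weekday() == 6]
        keep.update(sundays[:weekly])                      # 4 most recent Sundays
        firsts = [s for d, s in dated if d.day == 1]
        keep.update(firsts[:monthly])                      # 6 most recent 1st-of-month
        return [s for _, s in dated if s not in keep]      # deleted only outside dry-run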

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 09:46:46 -04:00
a34a7a995f fix: token-overlap matcher for reinforcement (Phase 9B)
Replace the substring-based _memory_matches() with a token-overlap
matcher that tokenizes both memory content and response, applies
lightweight stemming (trailing s/ed/ing) and stop-word removal, then
checks whether >= 70% of the memory's tokens appear in the response.
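
A minimal sketch of that shape, assuming illustrative helper names, a tiny
stop-word list, and naive suffix stripping (the real _memory_matches() may
differ in detail):

    import re

    STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}

    def _stem(token):
        for suffix in ("ing", "ed", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[:-len(suffix)]
        return token

    def _tokens(text):
        words = re.findall(r"[a-z0-9]+", text.lower())
        return {_stem(w) for w in words if w not in STOP_WORDS}

    def memory_matches(memory_content, response, threshold=0.70):
        memory_tokens = _tokens(memory_content)
        if not memory_tokens:
            return False
        overlap = len(memory_tokens & _tokens(response))
        return overlap / len(memory_tokens) >= threshold

    # "prefers" stems to "prefer", so a paraphrased response still matches.
    memory_matches("Antoine prefers concise answers",
                   "He prefers a concise answer")          # True (3/4 tokens overlap)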

This fixes the paraphrase blindness that prevented reinforcement from
ever firing on natural responses ("prefers" vs "prefer", "because
history" vs "because the history").

7 new tests (26 total reinforcement tests, all passing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 09:40:05 -04:00
92fc250b54 fix: use correct hook field name last_assistant_message
The Claude Code Stop hook sends `last_assistant_message`, not
`assistant_message`. This was causing response_chars=0 on all
captured interactions. Also removes the temporary debug log block.
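
The corrected read, in isolation (the surrounding hook code is not shown;
the helper name is illustrative):

    def response_from_stop_payload(payload: dict) -> str:
        # The Stop hook payload carries `last_assistant_message`; reading the
        # old `assistant_message` key always returned "" (response_chars=0).
        return payload.get("last_assistant_message", "")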

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 09:17:21 -04:00
2d911909f8 feat: auto-capture Claude Code sessions via Stop hook
Add deploy/hooks/capture_stop.py — a Claude Code Stop hook that reads
the transcript JSONL, extracts the last user prompt, and POSTs to the
AtoCore /interactions endpoint in conservative mode (reinforce=false).

Conservative mode means: capture only, no automatic reinforcement or
extraction into the review queue. Kill switch: ATOCORE_CAPTURE_DISABLED=1.
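
Roughly, the hook's shape (a hedged sketch: the transcript record fields,
endpoint URL, and request body keys are assumptions drawn from the
description above, not a copy of capture_stop.py):

    import json, os, urllib.request

    def capture_last_exchange(transcript_path, base_url="http://127.0.0.1:8100"):
        if os.environ.get("ATOCORE_CAPTURE_DISABLED") == "1":      # kill switch
            return
        last_user = ""
        with open(transcript_path, encoding="utf-8") as fh:
            for line in fh:                                        # transcript is JSONL
                record = json.loads(line)
                if record.get("role") == "user":
                    last_user = record.get("content", "")
        body = json.dumps({"prompt": last_user,
                           "reinforce": False,                     # conservative mode
                           "client": "capture-stop-hook"}).encode()
        req = urllib.request.Request(f"{base_url}/interactions", data=body,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=5)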

Also: note the build_sha cosmetic issue after restore in the runbook, and
update the project status docs to reflect the drill pass and auto-capture
wiring.

17 new tests (243 total, all passing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 09:00:42 -04:00
1a8fdf4225 fix: chroma restore bind-mount bug + consolidate docs
Two fixes from the 2026-04-09 first real restore drill on Dalidou,
plus the long-overdue doc consolidation I should have done when I
added the drill runbook instead of creating a duplicate.

## Chroma restore bind-mount bug (drill finding)

src/atocore/ops/backup.py: restore_runtime_backup() used to call
shutil.rmtree(dst_chroma) before copying the snapshot back. In the
Dockerized Dalidou deployment the chroma dir is a bind-mounted
volume — you can't unlink a mount point, rmtree raises
  OSError [Errno 16] Device or resource busy
and the restore silently fails to touch Chroma. This bit the first
real drill; the operator worked around it with --no-chroma plus a
manual cp -a.

Fix: clear the destination's CONTENTS (iterdir + rmtree/unlink per
child) and use copytree(dirs_exist_ok=True) so the mount point
itself is never touched. Equivalent semantics, bind-mount-safe.
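
In code, the bind-mount-safe pattern looks roughly like this (illustrative
helper; the real logic lives inside restore_runtime_backup):

    import shutil
    from pathlib import Path

    def replace_dir_contents(src: Path, dst: Path) -> None:
        dst.mkdir(parents=True, exist_ok=True)
        for child in dst.iterdir():                   # clear CONTENTS only...
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
        # ...so dst itself (possibly a bind-mounted volume) is never unlinked.
        shutil.copytree(src, dst, dirs_exist_ok=True)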

Regression test:
tests/test_backup.py::test_restore_chroma_does_not_unlink_destination_directory
captures Path.stat().st_ino of the dest dir before and after
restore and asserts they match. That's the same invariant a
bind-mounted chroma dir enforces — if the inode changed, the
mount would have failed. 11/11 backup tests now pass.
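
The invariant the test asserts, stripped of the backup/restore scaffolding
(a sketch, not the test body):

    def assert_directory_survives(path, action):
        inode_before = path.stat().st_ino
        action()                                      # the restore under test
        assert path.stat().st_ino == inode_before, \
            "destination dir was unlinked and recreated"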

## Doc consolidation

docs/backup-restore-drill.md existed as a duplicate of the
authoritative docs/backup-restore-procedure.md. When I added the
drill runbook in commit 3362080 I wrote it from scratch instead of
updating the existing procedure — bad doc hygiene on a project
that's literally about being a context engine.

- Deleted docs/backup-restore-drill.md
- Folded its contents into docs/backup-restore-procedure.md:
  - Replaced the manual sudo cp restore sequence with the new
    `python -m atocore.ops.backup restore <STAMP>
    --confirm-service-stopped` CLI
  - Added the one-shot docker compose run pattern for running
    restore inside a container that reuses the live volume mounts
  - Documented the --no-pre-snapshot / --no-chroma / --chroma flags
  - New "Chroma restore and bind-mounted volumes" subsection
    explaining the bug and the regression test that protects the fix
  - New "Restore drill" subsection with three levels (unit tests,
    module round-trip, live Dalidou drill) and the cadence list
  - Failure-mode table gained four entries: restored_integrity_ok,
    Device-or-resource-busy, drill marker still present,
    chroma_snapshot_missing
  - "Open follow-ups" struck the restore_runtime_backup item (done)
    and added a "Done (historical)" note referencing 2026-04-09
  - Quickstart cheat sheet now has a full drill one-liner using
    memory_type=episodic (the 2026-04-09 drill found the runbook's
    memory_type=note was invalid — the valid set is identity,
    preference, project, episodic, knowledge, adaptation)

## Status doc sync

Long overdue — I've been landing code without updating the
project's narrative state docs.

docs/current-state.md:
- "Reliability Baseline" now reflects: restore_runtime_backup is
  real with CLI, pre-restore safety snapshot, WAL cleanup,
  integrity check; live drill on 2026-04-09 surfaced and fixed
  Chroma bind-mount bug; deploy provenance via /health build_sha;
  deploy.sh self-update re-exec guard
- "Immediate Next Focus" reshuffled: drill re-run (priority 1) and
  auto-capture (priority 2) are now ahead of retrieval quality work,
  reflecting the updated unblock sequence

docs/next-steps.md:
- New item 1: re-run the drill with chroma working end-to-end
- New item 2: auto-capture conservative mode (Stop hook)
- Old item 7 rewritten as item 9 listing what's DONE
  (create/list/validate/restore, admin/backup endpoint with
  include_chroma, /health provenance, self-update guard,
  procedure doc with failure modes) and what's still pending
  (retention cleanup, off-Dalidou target, auto-validation)

## Test count

226 passing (was 225); the +1 is the new inode-stability regression test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 09:13:21 -04:00
336208004c ops: add restore_runtime_backup + drill runbook
Close the backup side of the loop: we had create/list/validate but
no restore, and no documented drill. A backup you've never restored
is not a backup. This lands the missing restore surface and the
procedure to exercise it before enabling any write-path automation
(auto-capture, automated ingestion, reinforcement sweeps).

Code — src/atocore/ops/backup.py:

- restore_runtime_backup(stamp, *, include_chroma, pre_restore_snapshot,
  confirm_service_stopped) performs:
  1. validate_backup() gate — refuse on any error
  2. pre-restore safety snapshot of current state (reversibility anchor)
  3. PRAGMA wal_checkpoint(TRUNCATE) on target db (flush + release
     OS handles; Windows needs this after conn.backup() reads)
  4. unlink stale -wal/-shm sidecars (tolerant to Windows lock races)
  5. shutil.copy2 snapshot db over target
  6. restore registry if snapshot captured one
  7. restore Chroma tree if snapshot captured one and include_chroma
     resolves to true (defaults to whether backup has Chroma)
  8. PRAGMA integrity_check on restored db, report result
- Refuses without confirm_service_stopped=True to prevent hot-restore
  into a running service (would corrupt SQLite state)
- Rewrote main() as argparse with 4 subcommands: create, list,
  validate, restore. `python -m atocore.ops.backup restore STAMP
  --confirm-service-stopped` is the drill CLI entry point, run via
  `docker compose run --rm --entrypoint python atocore` so it reuses
  the live service's volume mounts
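
A condensed sketch of the SQLite half of that sequence (steps 3-5 and 8);
paths and the function name are illustrative, not the actual backup.py code:

    import shutil, sqlite3
    from pathlib import Path

    def restore_sqlite(snapshot_db: Path, target_db: Path) -> str:
        conn = sqlite3.connect(target_db)
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")   # flush WAL, release handles
        conn.close()
        for suffix in ("-wal", "-shm"):                   # drop stale sidecars
            target_db.with_name(target_db.name + suffix).unlink(missing_ok=True)
        shutil.copy2(snapshot_db, target_db)              # snapshot over target
        conn = sqlite3.connect(target_db)
        (result,) = conn.execute("PRAGMA integrity_check").fetchone()
        conn.close()
        return result                                     # "ok" on a healthy restore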

Tests — tests/test_backup.py (6 new):

- test_restore_refuses_without_confirm_service_stopped
- test_restore_raises_on_invalid_backup
- test_restore_round_trip_reverses_post_backup_mutations
  (canonical drill flow: seed -> backup -> mutate -> restore ->
   mutation gone + baseline survived + pre-restore snapshot has
   the mutation captured as rollback anchor)
- test_restore_round_trip_with_chroma
- test_restore_skips_pre_snapshot_when_requested
- test_restore_cleans_stale_wal_sidecars (asserts stale byte
  markers do not survive, not file existence, since PRAGMA
  integrity_check may legitimately recreate -wal)

Docs — docs/backup-restore-drill.md (new):

- What gets backed up (hot sqlite, cold chroma, registry JSON,
  metadata.json) and what doesn't (.env, source content)
- What restore does, step by step, and why confirm_service_stopped
  is a hard gate
- 8-step drill procedure: capture -> baseline -> mutate -> stop ->
  restore -> start -> verify marker gone -> optional cleanup
- Correct endpoint bodies verified against routes.py:
    POST /admin/backup with JSON body {"include_chroma": true}
    POST /memory with memory_type/content/project/confidence
    GET /memory?project=drill to list drill markers
    POST /query with {"prompt": ..., "top_k": ...} (not "query")
- Failure modes: integrity_check fail, container won't start,
  marker still present after restore, with remediation for each
- When to run: before new write-path automation, after backup.py
  or schema changes, after infra bumps, monthly as standing check

225/225 tests passing (219 existing + 6 new restore).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:17:48 -04:00
03822389a1 deploy: self-update re-exec guard in deploy.sh
When deploy.sh itself changes in the commit being pulled, the bash
process is still running the OLD script from memory — git reset --hard
updated the file on disk but the in-memory instructions are stale.
This bit the 2026-04-09 Dalidou deploy: the old pre-build-sha Step 2
ran against fresh source, so the container started with
ATOCORE_BUILD_SHA="unknown" instead of the real commit. Manual
re-run fixed it, but the class of bug will re-emerge every time
deploy.sh itself changes.

Fix (Step 1.5):
- After git reset --hard, sha1 the running script ($0) and the
  on-disk copy at $APP_DIR/deploy/dalidou/deploy.sh
- If they differ, export ATOCORE_DEPLOY_REEXECED=1 and exec into
  the fresh copy so Step 2 onward runs under the new script
- The sentinel env var prevents recursion
- Skipped in dry-run mode, when $0 isn't readable, or when the
  on-disk script doesn't exist yet
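
The same guard pattern, sketched in Python for illustration (the real
implementation is the bash Step 1.5 in deploy.sh):

    import hashlib, os, sys

    def reexec_if_script_changed(on_disk_copy: str) -> None:
        if os.environ.get("ATOCORE_DEPLOY_REEXECED") == "1":     # sentinel: no recursion
            return
        if not os.access(sys.argv[0], os.R_OK) or not os.path.isfile(on_disk_copy):
            return                                               # nothing to compare
        def digest(path):
            with open(path, "rb") as fh:
                return hashlib.sha1(fh.read()).hexdigest()
        if digest(sys.argv[0]) != digest(on_disk_copy):
            os.environ["ATOCORE_DEPLOY_REEXECED"] = "1"
            os.execv(sys.executable, [sys.executable, on_disk_copy, *sys.argv[1:]])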

Docs (docs/dalidou-deployment.md):
- New "The deploy.sh self-update race" troubleshooting section
  explaining the root cause, the Step 1.5 mechanism, what the log
  output looks like, and how to opt out

Verified syntax and dry-run. 219/219 tests still passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:08:41 -04:00
be4099486c deploy: add build_sha visibility for precise drift detection
Make /health report the precise git SHA the container was built from,
so 'is the live service current?' can be answered without ambiguity.
0.2.0 was too coarse to trust as a 'live is current' signal — many
commits share the same __version__.

Three layers:

1. /health endpoint (src/atocore/api/routes.py)
   - Reads ATOCORE_BUILD_SHA, ATOCORE_BUILD_TIME, ATOCORE_BUILD_BRANCH
     from environment, defaults to 'unknown'
   - Reports them alongside existing code_version field

2. docker-compose.yml
   - Forwards the three env vars from the host into the container
   - Defaults to 'unknown' so direct `docker compose up` runs (without
     deploy.sh) cleanly signal missing build provenance

3. deploy.sh
   - Step 2 captures git SHA + UTC timestamp + branch and exports them
     as env vars before `docker compose up -d --build`
   - Step 6 reads /health post-deploy and compares the reported
     build_sha against the freshly-built one. Mismatch exits non-zero
     (exit code 6) with a remediation hint covering cached image,
     env propagation, and concurrent restart cases
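
The layer 1 read might look roughly like this (a hedged sketch, not the
actual routes.py handler):

    import os
    from fastapi import APIRouter

    router = APIRouter()

    @router.get("/health")
    def health() -> dict:
        return {
            "status": "ok",
            "code_version": "0.2.0",   # sourced from __version__ in the real code
            "build_sha": os.environ.get("ATOCORE_BUILD_SHA", "unknown"),
            "build_time": os.environ.get("ATOCORE_BUILD_TIME", "unknown"),
            "build_branch": os.environ.get("ATOCORE_BUILD_BRANCH", "unknown"),
        }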

Tests (tests/test_api_storage.py):
- test_health_endpoint_reports_code_version_from_module
- test_health_endpoint_reports_build_metadata_from_env
- test_health_endpoint_reports_unknown_when_build_env_unset

Docs (docs/dalidou-deployment.md):
- Three-level drift detection table (code_version coarse,
  build_sha precise, build_time/branch forensic)
- Canonical drift check script using LIVE_SHA vs EXPECTED_SHA
- Note that running deploy.sh is itself the simplest drift check

219/219 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:25:32 -04:00
2c0b214137 deploy.sh: add permission pre-flight check with clean remediation
Dalidou Claude's second re-deploy (commit b492f5f) reported one
remaining friction point: the app dir was root-owned from the
previous manual-workaround deploy (when ALTER TABLE was run as
root to work around the schema init bug), so deploy.sh's git
fetch/reset hit a permission wall. They worked around it with
a one-shot docker run chown, but the script itself produced
cryptic git errors before that, so the fix wasn't obvious until
after the fact.

This commit adds a permission pre-flight check that runs BEFORE
any git operations and exits cleanly with an explicit remediation
message instead of letting git produce half-state on partial
failure.

The check:
1. Reads the current owner of the app dir via `stat -c '%U:%G'`
2. Reports the current user via `id -un` / `id -u:id -g`
3. Attempts to create a throwaway marker file in the app dir
4. If the marker write fails, prints three distinct remediation
   commands covering the common environments:
     a. sudo chown -R 1000:1000 $APP_DIR (if passwordless sudo)
     b. sudo bash $0 (if running deploy.sh itself as root works)
     c. docker run --rm -v $APP_DIR:/app alpine chown -R ...
        (what Dalidou Claude actually did on 2026-04-08)
5. Exits with code 5 so CI / automation can distinguish "no
   permission" from other deploy failures

Dry-run mode skips the check (nothing is mutated in dry-run).

A brief WARNING is also printed early if the app dir exists but
doesn't appear writable, before the fatal check — this gives
operators a heads-up even in the happy-path case.

Syntax check: bash -n passes.
Full suite: 216 passing (unchanged; no code changes to the app).

What this commit does NOT do
----------------------------
- Does NOT automatically fix permissions. chown needs root and
  we don't want deploy.sh to escalate silently. The operator
  runs one of the three remediation commands manually.
- Does NOT check permissions on nested files (like .git/config)
  individually. The marker-file test on the app dir root is the
  cheapest proxy that catches the common case (root-owned dir
  tree after a previous sudo-based operation).
- Does NOT change behavior on first-time deploys where the app
  dir doesn't exist yet. The check is gated on `-d $APP_DIR`.
2026-04-08 19:55:50 -04:00
b492f5f7b0 fix: schema init ordering, deploy.sh default, client BASE_URL docs
Three issues Dalidou Claude surfaced during the first real deploy
of commit e877e5b to the live service (report from 2026-04-08).
Bug 1 was the critical one — a schema init ordering bug that would
have bitten every future upgrade from a pre-Phase-9 schema — and
the other two were usability traps around hostname resolution.

Bug 1 (CRITICAL): schema init ordering
--------------------------------------
src/atocore/models/database.py

SCHEMA_SQL contained CREATE INDEX statements that referenced
columns added later by _apply_migrations():

    CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project);
    CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project);
    CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id);

On a FRESH install, CREATE TABLE IF NOT EXISTS creates the tables
with the Phase 9 shape (columns present), so the CREATE INDEX runs
cleanly and _apply_migrations is effectively a no-op.

On an UPGRADE from a pre-Phase-9 schema, CREATE TABLE IF NOT EXISTS
is a no-op (the tables already exist in the old shape), the columns
are NOT added yet, and the CREATE INDEX fails with
"OperationalError: no such column: project" before
_apply_migrations gets a chance to add the columns.

Dalidou Claude hit this exactly when redeploying from 0.1.0 to
0.2.0 — had to manually ALTER TABLE to add the Phase 9 columns
before the container could start.

The fix is to remove the Phase 9-column indexes from SCHEMA_SQL.
They already exist in _apply_migrations() AFTER the corresponding
ALTER TABLE, so they still get created on both fresh and upgrade
paths — just after the columns exist, not before.
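
The ordering invariant, reduced to a sketch (illustrative; not the full
_apply_migrations):

    import sqlite3

    def apply_migrations(conn: sqlite3.Connection) -> None:
        cols = {row[1] for row in conn.execute("PRAGMA table_info(memories)")}
        if "project" not in cols:
            conn.execute("ALTER TABLE memories ADD COLUMN project TEXT")
        # Only after the column is guaranteed to exist, on fresh and upgrade
        # paths alike, does the index get created.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project)")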

Indexes still in SCHEMA_SQL (all safe — reference columns that
have existed since the first release):
- idx_chunks_document on source_chunks(document_id)
- idx_memories_type on memories(memory_type)
- idx_memories_status on memories(status)
- idx_interactions_project on interactions(project_id)

Indexes moved to _apply_migrations (already there — just no longer
duplicated in SCHEMA_SQL):
- idx_memories_project on memories(project)
- idx_interactions_project_name on interactions(project)
- idx_interactions_session on interactions(session_id)
- idx_interactions_created_at on interactions(created_at)

Regression test: tests/test_database.py
---------------------------------------
New test_init_db_upgrades_pre_phase9_schema_without_failing:

- Seeds the DB with the exact pre-Phase-9 shape (no project /
  last_referenced_at / reference_count on memories; no project /
  client / session_id / response / memories_used / chunks_used on
  interactions)
- Calls init_db() — which used to raise OperationalError before
  the fix
- Verifies all Phase 9 columns are present after the call
- Verifies the migration indexes exist

Before the fix this test would have failed with
"OperationalError: no such column: project" on the init_db call.
After the fix it passes. This locks the invariant "init_db is
safe on any legacy schema shape" so the bug can't silently come
back.

Full suite: 216 passing (was 215), 1 warning. The +1 is the new
regression test.

Bug 3 (usability): deploy.sh DNS default
----------------------------------------
deploy/dalidou/deploy.sh

ATOCORE_GIT_REMOTE defaulted to http://dalidou:3000/Antoine/ATOCore.git
which requires the "dalidou" hostname to resolve. On the Dalidou
host itself it didn't (no /etc/hosts entry for localhost alias),
so deploy.sh had to be run with the IP as a manual workaround.

Fix: default ATOCORE_GIT_REMOTE to http://127.0.0.1:3000/Antoine/ATOCore.git.
Loopback always works on the host running the script. Callers
from a remote host (e.g. running deploy.sh from a laptop against
the Dalidou LAN) set ATOCORE_GIT_REMOTE explicitly. The script
header's Environment Variables section documents this with an
explicit reference to the 2026-04-08 Dalidou deploy report so the
rationale isn't lost.

docs/dalidou-deployment.md gets a new "Troubleshooting hostname
resolution" subsection and a new example invocation showing how
to deploy from a remote host with an explicit ATOCORE_GIT_REMOTE
override.

Bug 2 (usability): atocore_client.py ATOCORE_BASE_URL documentation
-------------------------------------------------------------------
scripts/atocore_client.py

Same class of issue as bug 3. BASE_URL defaults to
http://dalidou:8100 which resolves fine from a remote caller
(laptop, T420/OpenClaw over Tailscale) but NOT from the Dalidou
host itself or from inside the atocore container. Dalidou Claude
saw the CLI return
{"status": "unavailable", "fail_open": true}
while direct curl to http://127.0.0.1:8100 worked.

The fix here is NOT to change the default (remote callers are
the common case and would break) but to DOCUMENT the override
clearly so the next operator knows what's happening:

- The script module docstring grew a new "Environment variables"
  section covering ATOCORE_BASE_URL, ATOCORE_TIMEOUT_SECONDS,
  ATOCORE_REFRESH_TIMEOUT_SECONDS, and ATOCORE_FAIL_OPEN, with
  the explicit override example for on-host/in-container use
- It calls out the exact symptom (fail-open envelope when the
  base URL doesn't resolve) so the diagnosis is obvious from
  the error alone
- docs/dalidou-deployment.md troubleshooting section mirrors
  this guidance so there's one place to look regardless of
  whether the operator starts with the client help or the
  deploy doc

What this commit does NOT do
----------------------------
- Does NOT change the default ATOCORE_BASE_URL. Doing that would
  break the T420 OpenClaw helper and every remote caller who
  currently relies on the hostname. Documentation is the right
  fix for this case.
- Does NOT fix /etc/hosts on Dalidou. That's a host-level
  configuration issue that the user can fix if they prefer
  having the hostname resolve; the deploy.sh fix makes it
  unnecessary regardless.
- Does NOT re-run the validation on Dalidou. The next step is
  for the live service to pull this commit via deploy.sh (which
  should now work without the IP workaround) and re-run the
  Phase 9 loop test to confirm nothing regressed.
2026-04-08 19:02:57 -04:00
e877e5b8ff deploy: version-visible /health + deploy.sh + update runbook
Dalidou Claude's validation run against the live service exposed a
structural gap: the deployment at /srv/storage/atocore/app has no
git connection, the running container was built from pre-Phase-9
source, and /health hardcoded 'version: 0.1.0' so drift is
invisible. Weeks of work have been shipping to Gitea but never
reaching the live service.

This commit fixes both the drift-invisibility problem and the
absence of an update workflow, so the next deploy to Dalidou can
go live cleanly and future drifts surface immediately.

Layer 1: deployment drift is now visible via /health
----------------------------------------------------
- src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0
  and documented as the source of truth for the deployed code
  version, with a history block explaining when each bump happens
  (API surface change, schema change, user-visible behavior change)
- src/atocore/main.py: FastAPI constructor now uses __version__
  instead of the hardcoded '0.1.0' string, so the OpenAPI docs
  reflect the actual code version
- src/atocore/api/routes.py: /health now reads from __version__
  dynamically. Both the existing 'version' field and a new
  'code_version' field report the same value for backwards compat.
  A new docstring explains that comparing this to the main
  branch's __version__ is the fastest way to detect drift.
- pyproject.toml: version bumped to 0.2.0 to stay in sync

The comparison is now:
  curl /health -> "code_version": "0.2.0"
  grep __version__ src/atocore/__init__.py -> "0.2.0"
If those differ, the deployment is stale. Concrete, unambiguous.

Layer 2: deploy.sh as the canonical update path
-----------------------------------------------
New file: deploy/dalidou/deploy.sh

One-shot bash script that handles both the first-time deploy
(where /srv/storage/atocore/app may not be a git repo yet) and
the ongoing update case. Steps:

1. If app dir is not a git checkout, back it up as
   <dir>.pre-git-<utc-stamp> and re-clone from Gitea.
   If it IS a checkout, fetch + reset --hard origin/<branch>.
2. Report the deployable commit SHA
3. Check that deploy/dalidou/.env exists (hard fail if missing
   with a clear message pointing at .env.example)
4. docker compose up -d --build — rebuilds the image from
   current source, restarts the container
5. Poll /health for up to 30 seconds; on failure, print the
   last 50 lines of container logs and exit non-zero
6. Parse /health.code_version and compare to the __version__
   in the freshly-pulled source. If they differ, exit non-zero
   with a message suggesting docker compose down && up
7. On success, report commit + code_version + "health: ok"

Configurable via env vars:
- ATOCORE_APP_DIR (default /srv/storage/atocore/app)
- ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git)
- ATOCORE_BRANCH (default main)
- ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health)
- ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode

Explicit non-goals documented in the script header:
- does not manage secrets (.env is the caller's responsibility)
- does not take a pre-deploy backup (call /admin/backup first
  if you want one)
- does not roll back on failure (redeploy a known-good commit
  to recover)
- does not touch the DB directly — schema migrations run at
  service startup via the lifespan handler, and all existing
  _apply_migrations ALTERs are idempotent ADD COLUMN operations

Layer 3: updated docs/dalidou-deployment.md
-------------------------------------------
- First-time deployment steps now explicitly say "git clone", not
  "place the repository", so future first-time deploys don't end
  up as static snapshots again
- New "Updating a running deployment" section covering deploy.sh
  usage with all three modes (normal / branch override / dry-run)
- New "Deployment drift detection" section with the one-liner
  comparison between /health code_version and the repo's
  __version__
- New "Schema migrations on redeploy" section enumerating the
  exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0
  upgrade, confirming they are additive-only and safe, and
  recommending a backup via /admin/backup before any redeploy

Full suite: 215 passing, 1 warning. No test was hardcoded to the
old version string, so the version bump was safe without test
changes.

What this commit does NOT do
----------------------------
- Does NOT execute the deploy on the live Dalidou instance. That
  requires Dalidou access and is the next step. A ready-to-paste
  prompt for Dalidou Claude will be provided separately.
- Does NOT add CI/CD, webhook-based auto-deploy, or reverse
  proxy. Those remain in the 'deferred' section of the
  deployment doc.
- Does NOT change the Dockerfile. The existing 'COPY source at
  build time' pattern is what deploy.sh relies on — rebuilding
  the image picks up new code.
- Does NOT modify the database schema. The Phase 9 migrations
  that Dalidou's DB needs will be applied automatically on next
  service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
fad30d5461 feat(client): Phase 9 reflection loop surface in shared operator CLI
Codex's sequence step 3: finish the Phase 9 operator surface in the
shared client. The previous client version (0.1.0) covered stable
operations (project lifecycle, retrieval, context build, trusted
state, audit-query) but explicitly deferred capture/extract/queue/
promote/reject pending "exercised workflow". That deferral ran
into a bootstrap problem: real Claude Code sessions can't exercise
the Phase 9 loop without a usable client surface to drive it. This
commit ships the 8 missing subcommands so the next step (real
validation on Dalidou) is unblocked.

Bumps CLIENT_VERSION from 0.1.0 to 0.2.0 per the semver rules in
llm-client-integration.md (new subcommands = minor bump).

New subcommands in scripts/atocore_client.py
--------------------------------------------
| Subcommand            | Endpoint                                  |
|-----------------------|-------------------------------------------|
| capture               | POST /interactions                        |
| extract               | POST /interactions/{id}/extract           |
| reinforce-interaction | POST /interactions/{id}/reinforce         |
| list-interactions     | GET  /interactions                        |
| get-interaction       | GET  /interactions/{id}                   |
| queue                 | GET  /memory?status=candidate             |
| promote               | POST /memory/{id}/promote                 |
| reject                | POST /memory/{id}/reject                  |

Each follows the existing client style: positional arguments with
empty-string defaults for optional filters, truthy-string arguments
for booleans (matching the existing refresh-project pattern), JSON
output via print_json(), fail-open behavior inherited from
request().

capture accepts prompt + response + project + client + session_id +
reinforce as positionals, defaulting the client field to
"atocore-client" when omitted so every capture from the shared
client is identifiable in the interactions audit trail.

extract defaults to preview mode (persist=false). Pass "true" as
the second positional to create candidate memories.

list-interactions and queue build URL query strings with
url-encoded values and always include the limit, matching how the
existing context-build subcommand handles its parameters.

Security fix: ID-field URL encoding
-----------------------------------
The initial draft used urllib.parse.quote() with the default safe
set, which does NOT encode "/" because it's a reserved path
character. That's a security footgun on ID fields: passing
"promote mem/evil/action" would build /memory/mem/evil/action/promote
and hit a completely different endpoint than intended.

Fixed by passing safe="" to urllib.parse.quote() on every ID field
(interaction_id and memory_id). The tests cover this explicitly via
test_extract_url_encodes_interaction_id and test_promote_url_encodes_memory_id,
both of which would have failed with the default behavior.

Project names keep the default quote behavior because a project
name with a slash would already be broken elsewhere in the system
(ingest root resolution, file paths, etc).
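
The difference in isolation (illustrative value; not the client's actual
URL-building helper):

    from urllib.parse import quote

    memory_id = "mem/evil/action"
    quote(memory_id)             # 'mem/evil/action'     -> /memory/mem/evil/action/promote
    quote(memory_id, safe="")    # 'mem%2Fevil%2Faction' -> /memory/mem%2Fevil%2Faction/promote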

tests/test_atocore_client.py (new, 18 tests, all green)
-------------------------------------------------------
A dedicated test file for the shared client that mocks the
request() helper and verifies each subcommand:
- calls the correct HTTP method and path
- builds the correct JSON body (or query string)
- passes the right subset of CLI arguments through
- URL-encodes ID fields so path traversal isn't possible

Tests are structured as unit tests (not integration tests) because
the API surface on the server side already has its own route tests
in test_api_storage.py and the Phase 9 specific files. These tests
are the wiring contract between CLI args and HTTP calls.

Test file highlights:
- capture: default values, custom client, reinforce=false
- extract: preview by default, persist=true opt-in, URL encoding
- reinforce-interaction: correct path construction
- list-interactions: no filters, single filter, full filter set
  (including ISO 8601 since parameter with T separator and Z)
- get-interaction: fetch by id
- queue: always filters status=candidate, accepts memory_type
  and project, coerces limit to int
- promote / reject: correct path + URL encoding
- test_phase9_full_loop_via_client_shape: end-to-end sequence
  that drives capture -> extract preview -> extract persist ->
  queue list -> promote -> reject through the shared client and
  verifies the exact sequence of HTTP calls that would be made

These tests run in ~0.2s because they mock request() — no DB, no
Chroma, no HTTP. The fast feedback loop matters because the
client surface is what every agent integration eventually depends
on.

docs/architecture/llm-client-integration.md updates
---------------------------------------------------
- New "Phase 9 reflection loop (shipped after migration safety
  work)" section under "What's in scope for the shared client
  today" with the full 8-subcommand table and a note explaining
  the bootstrap-problem rationale
- Removed the "Memory review queue and reflection loop" section
  from "What's intentionally NOT in scope today"; backup admin
  and engineering-entity commands remain the only deferred
  families
- Renumbered the deferred-commands list (was 3 items, now 2)
- Open follow-ups updated: memory-review-subcommand item replaced
  with "real-usage validation of the Phase 9 loop" as the next
  concrete dependency
- TL;DR updated to list the reflection-loop subcommands
- Versioning note records the v0.1.0 -> v0.2.0 bump with the
  subcommands included

Full suite: 215 passing (was 197), 1 warning. The +18 is
tests/test_atocore_client.py. Runtime unchanged because the new
tests don't touch the DB.

What this commit does NOT do
----------------------------
- Does NOT change the server-side endpoints. All 8 subcommands
  call existing API routes that were shipped in Phase 9 Commits
  A/B/C. This is purely a client-side wiring commit.
- Does NOT run the reflection loop against the live Dalidou
  instance. That's the next concrete step and is explicitly
  called out in the open-follow-ups section of the updated doc.
- Does NOT modify the Claude Code slash command. It still pulls
  context only; the capture/extract/queue/promote companion
  commands (e.g. /atocore-record-response) are deferred until the
  capture workflow has been exercised in real use at least once.
- Does NOT refactor the OpenClaw helper. That's a cross-repo
  change and remains a queued follow-up, now unblocked by the
  shared client having the reflection-loop subcommands.
2026-04-08 16:09:42 -04:00
261277fd51 fix(migration): preserve superseded/invalid shadow state during rekey
Codex caught a real data-loss bug in the legacy alias migration
shipped in 7e60f5a. plan_state_migration filtered state rows to
status='active' only, then apply_plan deleted the shadow projects
row at the end. Because project_state.project_id has
ON DELETE CASCADE, any superseded or invalid state rows still
attached to the shadow project got silently cascade-deleted —
exactly the audit loss a cleanup migration must not cause.

This commit fixes the bug and adds regression tests that lock in
the invariant "shadow state of every status is accounted for".

Root cause
----------
scripts/migrate_legacy_aliases.py::plan_state_migration was:

    "SELECT * FROM project_state WHERE project_id = ? AND status = 'active'"

which only found live rows. Any historical row (status in
'superseded' or 'invalid') was invisible to the plan, so the apply
step had nothing to rekey for it. Then the shadow project row was
deleted at the end, cascade-deleting every unplanned row.

The fix
-------
plan_state_migration now selects ALL state rows attached to the
shadow project regardless of status, and handles every row per a
per-status decision table:

| Shadow status | Canonical at same triple? | Values     | Action                         |
|---------------|---------------------------|------------|--------------------------------|
| any           | no                        | —          | clean rekey                    |
| any           | yes                       | same       | shadow superseded in place     |
| active        | yes, active               | different  | COLLISION, apply refuses       |
| active        | yes, inactive             | different  | shadow wins, canonical deleted |
| inactive      | yes, any                  | different  | historical drop (logged)       |

Four changes in the script:

1. SELECT drops the status filter so the plan walks every row.
2. New StateRekeyPlan.historical_drops list captures the shadow
   rows that lose to a canonical row at the same triple because the
   shadow is already inactive. These are the only unavoidable data
   losses, and they happen because the UNIQUE(project_id, category,
   key) constraint on project_state doesn't allow two rows per
   triple regardless of status.
3. New apply action 'replace_inactive_canonical' for the
   shadow-active-vs-canonical-inactive case. At apply time the
   canonical inactive row is DELETEd first (SQLite's default
   immediate constraint checking) and then the shadow is UPDATEd
   into its place in two separate statements. Adds a new
   state_rows_replaced_inactive_canonical counter.
4. New apply counter state_rows_historical_dropped for audit
   transparency. The rows themselves are still cascade-deleted
   when the shadow project row is dropped, but they're counted
   and reported.

render_plan_text and plan_to_json_dict updated in five places:

- counts() gains state_historical_drops
- render_plan_text prints a 'historical drops' section with each
  shadow-canonical pair and their statuses when there are any, so
  the operator sees the audit loss BEFORE running --apply
- The new section explicitly tells the operator: "if any of these
  values are worth keeping as separate audit records, manually copy
  them out before running --apply"
- plan_to_json_dict carries historical_drops into the JSON report
- The state counts table in the human report now shows both
  'state collisions (block)' and 'state historical drops' as
  separate lines so the operator can distinguish
  "apply will refuse" from "apply will drop historical rows"

Regression tests (3 new, all green)
-----------------------------------
tests/test_migrate_legacy_aliases.py:

- test_apply_preserves_superseded_shadow_state_when_no_collision:
  the direct regression for the codex finding. Seeds a shadow with
  a superseded state row on a triple the canonical doesn't have,
  runs the migration, verifies via raw SQL that the row is now
  attached to the canonical projects row and still has status
  'superseded'. This is the test that would have failed before
  the fix.
- test_apply_drops_shadow_inactive_row_when_canonical_holds_same_triple:
  covers the unavoidable data-loss case. Seeds shadow superseded
  + canonical active at the same triple with different values,
  verifies plan.counts() reports one historical_drop, runs apply,
  verifies the canonical value is preserved and the shadow value
  is gone.
- test_apply_replaces_inactive_canonical_with_active_shadow:
  covers the cross-contamination case where shadow has live value
  and canonical has a stale invalid row. Shadow wins by deleting
  canonical and rekeying in its place. Verifies the counter and
  the final state.

Plus _seed_state_row now accepts a status kwarg so the seeding
helper can create superseded/invalid rows directly.

test_dry_run_on_empty_registry_reports_empty_plan was updated to
include the new state_historical_drops key in the expected counts
dict (all zero for an empty plan, so the test shape is the same).

Full suite: 197 passing (was 194), 1 warning. The +3 is the three
new regression tests.

What this commit does NOT do
----------------------------
- Does NOT try to preserve historical shadow rows that collide
  with a canonical row at the same triple. That would require a
  schema change (adding (id) to the UNIQUE key, or a separate
  history table) and isn't in scope for a cleanup migration.
  The operator sees these as explicit 'historical drops' in the
  plan output and can copy them out manually if any are worth
  preserving.
- Does NOT change any behavior for rows that were already
  reachable from the canonicalized read path. The fix only
  affects legacy rows whose project_id points at a shadow row.
- Does NOT re-verify the earlier happy-path tests beyond the full
  suite confirming them still green.
2026-04-08 15:52:44 -04:00
7e60f5a0e6 feat(ops): legacy alias migration script with dry-run/apply modes
Closes the compatibility gap documented in
docs/architecture/project-identity-canonicalization.md. Before fb6298a,
writes to project_state, memories, and interactions stored the raw
project name. After fb6298a every service-layer entry point
canonicalizes through the registry, which silently made pre-fix
alias-keyed rows unreachable from the new read path. Now there's
a migration tool to find and fix them.

This commit is the tool and its tests. The tool is NOT run against
the live Dalidou DB in this commit — that's a separate supervised
manual step after reviewing the dry-run output.

scripts/migrate_legacy_aliases.py
---------------------------------
Standalone offline migration tool. Dry-run default, --apply explicit.

What it inspects:
- projects: rows whose name is a registered alias and differs from
  the canonical project_id (shadow rows)
- project_state: rows whose project_id points at a shadow; plan
  rekeys them to the canonical row's id. (category, key) collisions
  against the canonical block the apply step until a human resolves them
- memories: rows whose project column is a registered alias. Plain
  string rekey. Dedup collisions (after rekey, same
  (memory_type, content, project, status)) are handled by the
  existing memory supersession model: newer row stays active, older
  becomes superseded with updated_at as tiebreaker
- interactions: rows whose project column is a registered alias.
  Plain string rekey, no collision handling

What it does NOT do:
- Never touches rows that are already canonical
- Never auto-resolves project_state collisions (refuses until the
  human picks a winner via POST /project/state)
- Never creates data; only rekeys or supersedes
- Never runs outside a single SQLite transaction; any failure rolls
  back the entire migration

Safety rails:
- Dry-run is default. --apply is explicit.
- Apply on empty plan refuses unless --allow-empty (prevents
  accidental runs that look meaningful but did nothing)
- Apply refuses on any project_state collision
- Apply refuses on integrity errors (e.g. two case-variant rows
  both matching the canonical lookup)
- Writes a JSON report to data/migrations/ on every run (dry-run
  and apply alike) for audit
- Idempotent: running twice produces the same final state as
  running once. The second run finds zero shadow rows and exits
  clean.
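
The shape of those rails in code, as a hedged sketch (the plan object and
its methods are illustrative, not the script's actual apply path):

    import sqlite3

    def apply_plan(db_path, plan, *, apply=False, allow_empty=False):
        if plan.is_empty() and not allow_empty:
            raise SystemExit("Refusing: empty plan (pass --allow-empty to override).")
        if plan.has_collisions():
            raise SystemExit("Refusing: unresolved project_state collisions.")
        if not apply:
            return                                  # dry-run: report only, mutate nothing
        conn = sqlite3.connect(db_path)
        try:
            with conn:                              # one transaction; any failure rolls back
                for statement, params in plan.sql_statements():
                    conn.execute(statement, params)
        finally:
            conn.close()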

CLI flags:
  --registry PATH     override ATOCORE_PROJECT_REGISTRY_PATH
  --db PATH           override the AtoCore SQLite DB path
  --apply             actually mutate (default is dry-run)
  --allow-empty       permit --apply on an empty plan
  --report-dir PATH   where to write the JSON report
  --json              emit the plan as JSON instead of human prose

Smoke test against the Phase 9 validation DB produces the expected
"Nothing to migrate. The database is clean." output with 4 known
canonical projects and 0 shadows.

tests/test_migrate_legacy_aliases.py
------------------------------------
19 new tests, all green:

Plan-building:
- test_dry_run_on_empty_registry_reports_empty_plan
- test_dry_run_on_clean_registered_db_reports_empty_plan
- test_dry_run_finds_shadow_project
- test_dry_run_plans_state_rekey_without_collisions
- test_dry_run_detects_state_collision
- test_dry_run_plans_memory_rekey_and_supersession
- test_dry_run_plans_interaction_rekey

Apply:
- test_apply_refuses_on_state_collision
- test_apply_migrates_clean_shadow_end_to_end (verifies get_state
  can see the state via BOTH the alias AND the canonical after
  migration)
- test_apply_drops_shadow_state_duplicate_without_collision
  (same (category, key, value) on both sides - mark shadow
  superseded, don't hit the UNIQUE constraint)
- test_apply_migrates_memories
- test_apply_migrates_interactions
- test_apply_is_idempotent
- test_apply_refuses_with_integrity_errors (uses case-variant
  canonical rows to work around projects.name UNIQUE constraint;
  verifies the case-insensitive duplicate detection works)

Reporting:
- test_plan_to_json_dict_is_serializable
- test_write_report_creates_file
- test_render_plan_text_on_empty_plan
- test_render_plan_text_on_collision

End-to-end gap closure (the most important test):
- test_legacy_alias_gap_is_closed_after_migration
  - Seeds the exact same scenario as
    test_legacy_alias_keyed_state_is_invisible_until_migrated
    in test_project_state.py (which documents the pre-migration
    gap)
  - Confirms the row is invisible before migration
  - Runs the migration
  - Verifies the row is reachable via BOTH the canonical id AND
    the alias afterward
  - This test and the pre-migration gap test together lock in
    "before migration: invisible, after migration: reachable"
    as the documented invariant

Full suite: 194 passing (was 175), 1 warning. The +19 is the new
migration test file.

Next concrete step after this commit
------------------------------------
- Run the dry-run against the live Dalidou DB to find out the
  actual blast radius. The script is the inspection SQL, codified.
- Review the dry-run output together
- If clean (zero shadows), no apply needed; close the doc gap as
  "verified nothing to migrate on this deployment"
- If there are shadows, resolve any collisions via
  POST /project/state, then run --apply under supervision
- After apply, the test_legacy_alias_keyed_state_is_invisible_until_migrated
  test still passes (it simulates the gap directly, so it's
  independent of the live DB state) and the gap-closed companion
  test continues to guard forward
2026-04-08 15:08:16 -04:00
1953e559f9 docs+test: clarify legacy alias compatibility gap, add gap regression test
Codex caught a real documentation accuracy bug in the previous
canonicalization doc commit (f521aab). The doc claimed that rows
written under aliases before fb6298a "still work via the
unregistered-name fallback path" — that is wrong for REGISTERED
aliases, which is exactly the case that matters.

The unregistered-name fallback only saves you when the project was
never in the registry: a row stored under "orphan-project" is read
back via "orphan-project", both pass through resolve_project_name
unchanged, and the strings line up. For a registered alias like
"p05", the helper rewrites the read key to "p05-interferometer"
but does NOT rewrite the storage key, so the legacy row becomes
silently invisible.

This commit corrects the doc and locks the gap behavior in with
a regression test, so the issue cannot be lost again.

docs/architecture/project-identity-canonicalization.md
------------------------------------------------------
- Removed the misleading claim from the "What this rule does NOT
  cover" section. Replaced with a pointer to the new gap section
  and an explicit statement that the migration is required before
  engineering V1 ships.
- New "Compatibility gap: legacy alias-keyed rows" section between
  "Why this is the trust hierarchy in action" and "The rule for
  new entry points". This is the natural insertion point because
  the gap is exactly the trust hierarchy failing for legacy data.
  The section covers:
  * a worked T0/T1 timeline showing the exact failure mode
  * what is at risk on the live Dalidou DB, ranked by trust tier:
    projects table (shadow rows), project_state (highest risk
    because Layer 3 is most-authoritative), memories, interactions
  * inspection SQL queries for measuring the actual blast radius
    on the live DB before running any migration
  * the spec for the migration script: walk projects, find shadow
    rows, merge dependent state via the conflict model when there
    are collisions, dry-run mode, idempotent
  * explicit statement that this is required pre-V1 because V1
    will add new project-keyed tables and the killer correctness
    queries from engineering-query-catalog.md would report wrong
    results against any project that has shadow rows
- "Open follow-ups" item 1 promoted from "tracked optional" to
  "REQUIRED before engineering V1 ships, NOT optional" with a
  more honest cost estimate (~150 LOC migration + ~50 LOC tests
  + supervised live run, not the previous optimistic ~30 LOC)
- TL;DR rewritten to mention the gap explicitly and re-order
  the open follow-ups so the migration is the top priority

tests/test_project_state.py
---------------------------
- New test_legacy_alias_keyed_state_is_invisible_until_migrated
- Inserts a "p05" project row + a project_state row pointing at
  it via raw SQL (bypassing set_state which now canonicalizes),
  simulating a pre-fix legacy row
- Verifies the canonicalized get_state path can NOT see the row
  via either the alias or the canonical id — this is the bug
- Verifies the row is still in the database (just unreachable),
  so the migration script has something to find
- The docstring explicitly says: "When the legacy alias migration
  script lands, this test must be inverted." Future readers will
  know exactly when and how to update it.

Full suite: 175 passing (was 174), 1 warning. The +1 is the new
gap regression test.

What this commit does NOT do
----------------------------
- The migration script itself is NOT in this commit. Codex's
  finding was a doc accuracy issue, and the right scope is fix
  the doc + lock the gap behavior in. Writing the migration is
  the next concrete step but is bigger (~200 LOC + dry-run mode
  + collision handling via the conflict model + supervised run
  on the live Dalidou DB), warrants its own commit, and probably
  warrants a "draft + review the dry-run output before applying"
  workflow rather than a single shot.
- Existing tests are unchanged. The new test stands alone as a
  documented gap; the 12 canonicalization tests from fb6298a
  still pass without modification.
2026-04-07 20:14:19 -04:00
f521aab97b docs(arch): project-identity-canonicalization contract
Codifies the helper-at-every-service-boundary rule that fb6298a
implemented across the eight current callsites. The contract is
intentionally simple but easy to forget, so it lives in its own
doc that the engineering layer V1 implementation sprint can read
before adding new project-keyed entity surfaces.

docs/architecture/project-identity-canonicalization.md
------------------------------------------------------
- The contract: every read/write that takes a project name MUST
  call resolve_project_name() before the value crosses a service
  boundary; canonicalization happens once, at the first statement
  after input validation, never later
- The helper API: resolve_project_name(name) returns the canonical
  project_id for registered names, the input unchanged for empty
  or unregistered names (the second case is the backwards-compat
  path for hand-curated state predating the registry)
- Full table of the 8 current callsites: builder.build_context,
  project_state.set_state/get_state/invalidate_state,
  interactions.record_interaction/list_interactions,
  memory.create_memory/get_memories
- Where the helper is intentionally NOT called and why: legacy
  ensure_project lookup, retriever's own _project_match_boost
  (which already calls get_registered_project), _rank_chunks
  secondary substring boost (multiplicative not filter, can't
  drop relevant chunks), update_memory (no project field update),
  unregistered names (the rule applied to a name with no record)
- Why this is the trust hierarchy in action: Layer 3 trusted
  state has to be findable to win the trust battle; an
  un-canonicalized lookup silently makes Layer 3 invisible and
  the system falls through to lower-trust retrieved chunks with
  no signal to the human
- The 4-step rule for new entry points: identify project-keyed
  reads/writes, place the call as the first statement after
  validation, add a regression test using the project_registry
  fixture, verify None/empty paths
- How the project_registry fixture works with a copy-pasteable
  example
- What the rule does NOT cover: alias creation (registry's own
  write path), registry hot-reloading (no in-process cache by
  design), cross-project dedup (collision detection at
  registration), time-bounded canonicalization (canonical id is
  stable forever), legacy data migration (open follow-up)
- Engineering layer V1 implications: every new service entry
  point in the entities/relationships/conflicts/mirror modules
  must apply the helper at the first statement after validation;
  treated as code review failure if missing
- Open follow-ups: legacy data migration script (~30 LOC),
  registry file caching when projects scale beyond ~50, case
  sensitivity audit when entity-side storage lands, _rank_chunks
  cleanup, documentation discoverability (intentional redundancy
  between this doc, the helper docstring, and per-callsite comments)
- Quick reference card: copy-pasteable template for new service
  functions

master-plan-status.md updated
-----------------------------
- New doc added to the engineering-layer planning sprint listing
- Marked as required reading before V1 implementation begins
- Note that V1 must apply the contract at every new service-layer
  entry point

Pure doc work, no code changes. Full suite stays at 174 passing
because no source changed.
2026-04-07 19:32:31 -04:00
fb6298a9a1 fix(P1+P2): canonicalize project names at every trust boundary
Three findings from codex's review of the previous P1+P2 fix. The
earlier commit (f2372ef) only fixed alias resolution at the context
builder. Codex correctly pointed out that the same fragmentation
applies at every other place a project name crosses a boundary —
project_state writes/reads, interaction capture/listing/filtering,
memory create/queries, and reinforcement's downstream queries. Plus
a real bug in the interaction `since` filter where the storage
format and the documented ISO format don't compare cleanly.

The fix is one helper used at every boundary instead of duplicating
the resolution inline.

New helper: src/atocore/projects/registry.py::resolve_project_name
---------------------------------------------------------------
- Single canonicalization boundary for project names
- Returns the canonical project_id when the input matches any
  registered id or alias
- Returns the input unchanged for empty/None and for unregistered
  names (preserves backwards compat with hand-curated state that
  predates the registry)
- Documented as the contract that every read/write at the trust
  boundary should pass through
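
Its contract, as a hedged sketch (the real helper takes only the name and
reads the registry from config; the explicit registry argument and the
case-insensitive match here are illustrative assumptions):

    def resolve_project_name(name, registry):
        """registry: canonical project_id -> list of aliases (illustrative shape)."""
        if not name:
            return name                              # empty/None passes through
        lowered = name.lower()
        for canonical, aliases in registry.items():
            if lowered == canonical.lower() or lowered in (a.lower() for a in aliases):
                return canonical                     # registered id or alias -> canonical id
        return name                                  # unregistered: backwards-compat passthrough

    resolve_project_name("p05", {"p05-interferometer": ["p05"]})
    # -> "p05-interferometer"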

P1 — Trusted Project State endpoints
------------------------------------
src/atocore/context/project_state.py: set_state, get_state, and
invalidate_state now all canonicalize project_name through
resolve_project_name BEFORE looking up or creating the project row.

Before this fix:
- POST /project/state with project="p05" called ensure_project("p05")
  which created a separate row in the projects table
- The state row was attached to that alias project_id
- Later context builds canonicalized "p05" -> "p05-interferometer"
  via the builder fix from f2372ef and never found the state
- Result: trusted state silently fragmented across alias rows

After this fix:
- The alias is resolved to the canonical id at every entry point
- Two captures (one via "p05", one via "p05-interferometer") write
  to the same row
- get_state via either alias or the canonical id finds the same row

Fixes the highest-priority gap codex flagged because Trusted Project
State is supposed to be the most dependable layer in the AtoCore
trust hierarchy.

P2.a — Interaction capture project canonicalization
----------------------------------------------------
src/atocore/interactions/service.py: record_interaction now
canonicalizes project before storing, so interaction.project is
always the canonical id regardless of what the client passed.

Downstream effects:
- reinforce_from_interaction queries memories by interaction.project
  -> previously missed memories stored under canonical id
  -> now consistent because interaction.project IS the canonical id
- the extractor stamps candidates with interaction.project
  -> previously created candidates in alias buckets
  -> now creates candidates in the canonical bucket
- list_interactions(project=alias) was already broken, now fixed by
  canonicalizing the filter input on the read side too

Memory service applied the same fix:
- src/atocore/memory/service.py: create_memory and get_memories
  both canonicalize project through resolve_project_name
- This keeps stored memory.project consistent with the
  reinforcement query path

P2.b — Interaction `since` filter format normalization
------------------------------------------------------
src/atocore/interactions/service.py: new _normalize_since helper.

The bug:
- created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by
  convention) so it sorts lexically and compares cleanly with the
  SQLite CURRENT_TIMESTAMP default
- The `since` parameter was documented as ISO 8601 but compared as
  a raw string against the storage format
- The lexically-greater 'T' separator means an ISO timestamp like
  '2026-04-07T12:00:00Z' is GREATER than the storage form
  '2026-04-07 12:00:00' for the same instant
- Result: a client passing ISO `since` got an empty result for any
  row from the same day, even though those rows existed and were
  technically "after" the cutoff in real-world time

The fix:
- _normalize_since accepts ISO 8601 with T, optional Z suffix,
  optional fractional seconds, optional +HH:MM offsets
- Uses datetime.fromisoformat for parsing (Python 3.11+)
- Converts to UTC and reformats as the storage format before the
  SQL comparison
- The bare storage format still works (backwards compat path is a
  regex match that returns the input unchanged)
- Unparseable input is returned as-is so the comparison degrades
  gracefully (rows just don't match) instead of raising and
  breaking the listing endpoint
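
A minimal sketch of that normalization (the real _normalize_since may
differ in detail):

    import re
    from datetime import datetime, timezone

    STORAGE_FORMAT = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")

    def normalize_since(since: str) -> str:
        if STORAGE_FORMAT.match(since):
            return since                             # already in storage form
        try:
            parsed = datetime.fromisoformat(since.replace("Z", "+00:00"))
        except ValueError:
            return since                             # unparseable: degrade gracefully
        if parsed.tzinfo is not None:
            parsed = parsed.astimezone(timezone.utc)
        return parsed.strftime("%Y-%m-%d %H:%M:%S")  # matches the storage layout

    normalize_since("2026-04-07T12:00:00Z")          # -> "2026-04-07 12:00:00"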

builder.py refactor
-------------------
The previous P1 fix had inline canonicalization. Now it uses the
shared helper for consistency:
- import changed from get_registered_project to resolve_project_name
- the inline lookup is replaced with a single helper call
- the comment block now points at representation-authority.md for
  the canonicalization contract

New shared test fixture: tests/conftest.py::project_registry
------------------------------------------------------------
- Standardizes the registry-setup pattern that was duplicated
  across test_context_builder.py, test_project_state.py,
  test_interactions.py, and test_reinforcement.py
- Returns a callable that takes (project_id, [aliases]) tuples
  and writes them into a temp registry file with the env var
  pointed at it and config.settings reloaded
- Used by all 12 new regression tests in this commit
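
Rough shape of that fixture, sketched under assumptions (the registry
file format and env var name are guesses; the real fixture also
reloads config.settings after pointing the env var at the file):

```python
import json
import pytest

@pytest.fixture
def project_registry(tmp_path, monkeypatch):
    """Write a temp registry file and point the service at it."""
    def _register(*projects: tuple[str, list[str]]) -> None:
        registry = {
            pid: {"project_id": pid, "aliases": aliases}
            for pid, aliases in projects
        }
        path = tmp_path / "projects.json"
        path.write_text(json.dumps(registry), encoding="utf-8")
        # Env var name is an assumption for this sketch.
        monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY", str(path))
    return _register
```

A test then calls project_registry(("p05-interferometer",
["p05", "interferometer"])) before exercising the alias paths.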

Tests (12 new, all green on first run)
--------------------------------------
test_project_state.py:
- test_set_state_canonicalizes_alias: write via alias, read via
  every alias and the canonical id, verify same row id
- test_get_state_canonicalizes_alias_after_canonical_write
- test_invalidate_state_canonicalizes_alias
- test_unregistered_project_state_still_works (backwards compat)

test_interactions.py:
- test_record_interaction_canonicalizes_project
- test_list_interactions_canonicalizes_project_filter
- test_list_interactions_since_accepts_iso_with_t_separator
- test_list_interactions_since_accepts_z_suffix
- test_list_interactions_since_accepts_offset
- test_list_interactions_since_storage_format_still_works

test_reinforcement.py:
- test_reinforcement_works_when_capture_uses_alias (end-to-end:
  capture under alias, seed memory under canonical, verify
  reinforcement matches)
- test_get_memories_filter_by_alias

Full suite: 174 passing (was 162), 1 warning. The +12 is the
new regression tests; no existing tests regressed.

What's still NOT canonicalized (and why)
----------------------------------------
- _rank_chunks's secondary substring boost in builder.py — the
  retriever already does the right thing via its own
  _project_match_boost which calls get_registered_project. The
  redundant secondary boost still uses the raw hint but it's a
  multiplicative factor on top of correct retrieval, not a
  filter, so it can't drop relevant chunks. Tracked as a future
  cleanup but not a P1.
- update_memory's project field (you can't change a memory's
  project after creation in the API anyway).
- The retriever's project_hint parameter on direct /query calls
  — same reasoning as the builder boost, plus the retriever's
  own get_registered_project call already handles aliases there.
2026-04-07 08:29:33 -04:00
f2372eff9e fix(P1+P2): alias-aware project state lookup + slash command corpus fallback
Two regression fixes from codex's review of the slash command
refactor commit (78d4e97). Both findings are real and now have
covered tests.

P1 — server-side alias resolution for project_state lookup
----------------------------------------------------------
The bug:
- /context/build forwarded the caller's project hint verbatim to
  get_state(project_hint), which does an exact-name lookup against
  the projects table (case-insensitive but no alias resolution)
- the project registry's alias matching was only used by the
  client's auto-context path and the retriever's project-match
  boost, never by the server's project_state lookup
- consequence: /atocore-context "... p05" would silently miss
  trusted project state stored under the canonical id
  "p05-interferometer", weakening project-hinted retrieval to
  the point that an explicit alias hint was *worse* than no hint

The fix in src/atocore/context/builder.py:
- import get_registered_project from the projects registry
- before calling get_state(project_hint), resolve the hint
  through get_registered_project; if a registry record exists,
  use the canonical project_id for the state lookup
- if no registry record exists, fall back to the raw hint so a
  hand-curated project_state entry that predates the registry
  still works (backwards compat with pre-registry deployments)

The retriever already does its own alias expansion via
get_registered_project for the project-match boost, so the
retriever side was never broken — only the project_state lookup
in the builder. The fix is scoped to that one call site.

Tests added in tests/test_context_builder.py:
- test_alias_hint_resolves_through_registry: stands up a fresh
  registry, sets state under "p05-interferometer", then verifies
  build_context with project_hint="p05" finds the state, AND
  with project_hint="interferometer" (the second alias) finds it
  too, AND with the canonical id finds it. Covers all three
  resolution paths.
- test_unknown_hint_falls_back_to_raw_lookup: empty registry,
  set state under an unregistered project name, verify the
  build_context call with that name as the hint still finds the
  state. Locks in the backwards-compat behavior.

P2 — slash command no-hint fallback to corpus-wide context build
----------------------------------------------------------------
The bug:
- the slash command's no-hint path called auto-context, which
  returns {"status": "no_project_match"} when project detection
  fails and does NOT fall back to a plain context-build
- the slash command's own help text told the user "call without
  a hint to use the corpus-wide context build" — which was a lie
  because the wrapper no longer did that
- consequence: generic prompts like "what changed in AtoCore
  backup policy?" or any cross-project question got a useless
  no_project_match envelope instead of a context pack

The fix in .claude/commands/atocore-context.md:
- the no-hint path now does the 2-step fallback dance:
    1. try `auto-context "<prompt>"` for project detection
    2. if the response contains "no_project_match", fall back to
       `context-build "<prompt>"` (no project arg)
- both branches return a real context pack, fail-open envelope
  is preserved for genuine network errors
- the underlying client surface is unchanged (no new flags, no
  new subcommands) — the fallback is per-frontend logic in the
  slash command, leaving auto-context's existing semantics
  intact for OpenClaw and any other caller that depends on the
  no_project_match envelope as a "do nothing" signal

While I was here, also tightened the slash command's argument
parsing to delegate alias-knowledge to the registry instead of
embedding a hardcoded list:
- old version had a literal list of "atocore", "p04", "p05",
  "p06" and their aliases that needed manual maintenance every
  time a project was added
- new version takes the last token of $ARGUMENTS and asks the
  client's `detect-project` subcommand whether it's a known
  alias; if matched, it's the explicit hint, if not it's part
  of the prompt
- this delegates registry knowledge to the registry, where it
  belongs

Unrelated improvement noted but NOT fixed in this commit:
- _rank_chunks in builder.py also has a naive substring boost
  that uses the original hint without alias expansion. The
  retriever already does the right thing, so this secondary
  boost is redundant. Tracked as a future cleanup but not in
  scope for the P1/P2 fix; codex's findings are about
  project_state lookup, not about the secondary chunk boost.

Full suite: 162 passing (was 160), 1 warning. The +2 is the two
new P1 regression tests.
2026-04-07 07:47:03 -04:00
78d4e979e5 refactor slash command onto shared client + llm-client-integration doc
Codex's review caught that the Claude Code slash command shipped in
Session 2 was a parallel reimplementation of routing logic the
existing scripts/atocore_client.py already had. That client was
introduced via the codex/port-atocore-ops-client merge and is
already a comprehensive operator client (auto-context,
detect-project, refresh-project, project-state, audit-query, etc.).
The slash command should have been a thin wrapper from the start.

This commit fixes the shape without expanding scope.

.claude/commands/atocore-context.md
-----------------------------------
Rewritten as a thin Claude Code-specific frontend that shells out
to the shared client:

- explicit project hint -> calls `python scripts/atocore_client.py
  context-build "<prompt>" "<project>"`
- no explicit hint -> calls `python scripts/atocore_client.py
  auto-context "<prompt>"` which runs the client's detect-project
  routing first and falls through to context-build with the match

Inherits the client's stable behaviour for free:
- ATOCORE_BASE_URL env var (default http://dalidou:8100)
- fail-open on network errors via ATOCORE_FAIL_OPEN
- consistent JSON output shape
- the same project alias matching the OpenClaw helper uses

Removes the speculative `--capture` capture path that was in the
original draft. Capture/extract/queue/promote/reject are
intentionally NOT in the shared client yet (memory-review
workflow not exercised in real use), so the slash command can't
expose them either.

docs/architecture/llm-client-integration.md
-------------------------------------------
New planning doc that defines the layering rule for AtoCore's
relationship with LLM client contexts:

Three layers:
1. AtoCore HTTP API (universal, src/atocore/api/routes.py)
2. Shared operator client (scripts/atocore_client.py) — the
   canonical Python backbone for stable AtoCore operations
3. Per-agent thin frontends (Claude Code slash command,
   OpenClaw helper, future Codex skill, future MCP server)
   that shell out to the shared client

Three non-negotiable rules:
- every per-agent frontend is a thin wrapper (translate the
  agent's command format and render the JSON; nothing else)
- the shared client never duplicates the API (it composes
  endpoints; new logic goes in the API first)
- the shared client only exposes stable operations (subcommands
  land only after the API has been exercised in a real workflow)

Doc covers:
- the full table of subcommands currently in scope (project
  lifecycle, ingestion, project-state, retrieval, context build,
  audit-query, debug-context, health/stats)
- the three deferred families with rationale: memory review
  queue (workflow not exercised), backup admin (fail-open
  default would hide errors), engineering layer entities (V1
  not yet implemented)
- the integration recipe for new agent platforms
- explicit acknowledgement that the OpenClaw helper currently
  duplicates routing logic and that the refactor to the shared
  client is a queued cross-repo follow-up
- how the layering connects to phase 8 (OpenClaw) and phase 11
  (multi-model)
- versioning and stability rules for the shared client surface
- open follow-ups: OpenClaw refactor, memory-review subcommands
  when ready, optional backup admin subcommands, engineering
  entity subcommands during V1 implementation

master-plan-status.md updated
-----------------------------
- New "LLM Client Integration" subsection that points to the
  layering doc and explicitly notes the deferral of memory-review
  and engineering-entity subcommands
- Frames the layering as sitting between phase 8 and phase 11

Scope is intentionally narrow per codex's framing: promote the
existing client to canonical status, refactor the slash command
to use it, document the layering. No new client subcommands
added in this commit. The OpenClaw helper refactor is a
separate cross-repo follow-up. Memory-review and engineering-
entity work stay deferred.

Full suite: 160 passing, no behavior changes.
2026-04-07 07:22:54 -04:00
d6ce6128cf docs(arch): human-mirror-rules + engineering-v1-acceptance, sprint complete
Session 4 of the four-session plan. Final two engineering planning
docs, plus master-plan-status.md updated to reflect that the
engineering layer planning sprint is now complete.

docs/architecture/human-mirror-rules.md
---------------------------------------
The Layer 3 derived markdown view spec:

- The non-negotiable rule: the Mirror is read-only from the
  human's perspective; edits go to the canonical home and the
  Mirror picks them up on regeneration
- 3 V1 template families: Project Overview, Decision Log,
  Subsystem Detail
- Explicit V1 exclusions: per-component pages, per-decision
  pages, cross-project rollups, time-series pages, diff pages,
  conflict queue render, per-memory pages
- Mirror files live in /srv/storage/atocore/data/mirror/ NOT in
  the source vault (sources stay read-only per the operating
  model)
- 3 regeneration triggers: explicit POST, debounced async on
  entity write, daily scheduled refresh
- "Do not edit" header banner with checksum so unchanged inputs
  skip work
- Conflicts and project_state overrides surface inline so the
  trust hierarchy is visible in the human reading experience
- Templates checked in under templates/mirror/, edited via PR
- Deterministic output is a V1 requirement so future Mirror
  diffing works without rework
- Open questions for V1: debounce window, scheduler integration,
  template testing approach, directory listing endpoint, empty
  state rendering

docs/architecture/engineering-v1-acceptance.md
----------------------------------------------
The measurable done definition:

- Single-sentence definition: V1 is done when every v1-required
  query in engineering-query-catalog.md returns a correct result
  for one chosen test project, the Human Mirror renders a
  coherent overview, and a real KB-CAD or KB-FEM export round-
  trips through ingest -> review queue -> active entity without
  violating any conflict or trust invariant
- 23 acceptance criteria across 4 categories:
  * Functional (8): entity store, all 20 v1-required queries,
    tool ingest endpoints, candidate review queue, conflict
    detection, Human Mirror, memory-to-entity graduation,
    complete provenance chain
  * Quality (6): existing tests pass, V1 has its own coverage,
    conflict invariants enforced, trust hierarchy enforced,
    Mirror reproducible via golden file, killer correctness
    queries pass against representative data
  * Operational (5): safe migration, backup/restore drill,
    performance bounds, no new manual ops burden, Phase 9 not
    regressed
  * Documentation (4): per-entity-type spec docs, KB schema docs,
    V1 release notes, master-plan-status updated
- Explicit negative list of things V1 does NOT need to do:
  no LLM extractor, no auto-promotion, no write-back, no
  multi-user, no real-time UI, no cross-project rollups,
  no time-travel, no nightly conflict sweep, no incremental
  Chroma, no retention cleanup, no encryption, no off-Dalidou
  backup target
- Recommended implementation order: F-1 -> F-8 in sequence,
  with the graduation flow (F-7) saved for last as the most
  cross-cutting change
- Anticipated friction points called out in advance:
  graduation cross-cuts memory module, Mirror determinism trap,
  conflict detector subtle correctness, provenance backfill
  for graduated entities

master-plan-status.md updated
-----------------------------
- Engineering Layer Planning Sprint section now marked complete
  with all 8 architecture docs listed
- Note that the next concrete step is the V1 implementation
  sprint following engineering-v1-acceptance.md as its checklist

Pure doc work. No code, no schema, no behavior changes.

After this commit, the engineering planning sprint is fully done
(8/8 docs) and Phase 9 is fully complete (Commits A/B/C all
shipped, validated, and pushed). AtoCore is ready for either
the engineering V1 implementation sprint OR a pause for real-
world Phase 9 usage, depending on which the user prefers next.
2026-04-07 06:55:43 -04:00
368adf2ebc docs(arch): tool-handoff-boundaries + representation-authority
Session 3 of the four-session plan. Two more engineering planning
docs that lock in the most contentious architectural decisions
before V1 implementation begins.

docs/architecture/tool-handoff-boundaries.md
--------------------------------------------
Locks in the V1 read/write relationship with external tools:

- AtoCore is a one-way mirror in V1. External tools push,
  AtoCore reads, AtoCore never writes back.
- Per-tool stance table covering KB-CAD, KB-FEM, NX, PKM, Gitea
  repos, OpenClaw, AtoDrive, PLM/vendor systems
- Two new ingest endpoints proposed for V1:
  POST /ingest/kb-cad/export and POST /ingest/kb-fem/export
- Sketch JSON shapes for both exports (intentionally minimal,
  to be refined in dedicated schema docs during implementation)
- Drift handling: KB-CAD changes a value -> creates an entity
  candidate -> existing active becomes a conflict member ->
  human resolves via the conflict model
- Hard-line invariants V1 will not cross: no write to external
  tools, no live polling, no silent merging, no schema fan-out,
  no external-tool-specific logic in entity types
- Why not bidirectional: schema drift, conflict semantics, trust
  hierarchy, velocity, reversibility
- V2+ deferred items: selective write-back annotations, light
  polling, direct NX integration, cost/vendor/PLM connections
- Open questions for the implementation sprint: schema location,
  who runs the exporter, full-vs-incremental, exporter auth

docs/architecture/representation-authority.md
---------------------------------------------
The canonical-home matrix that says where each kind of fact
actually lives:

- Six representation layers identified: PKM, KB project,
  Gitea repos, AtoCore memories, AtoCore entities, AtoCore
  project_state
- The hard rule: every fact kind has exactly one canonical
  home; other layers may hold derived copies but never disagree
- Comprehensive matrix covering 22 fact kinds (CAD geometry,
  CAD-side structure, FEM mesh, FEM results, code, repo docs,
  PKM prose, identity, preference, episodic, decision,
  requirement, constraint, validation claim, material,
  parameter, project status, ADRs, runbooks, backup metadata,
  interactions)
- Cross-layer supremacy rule: project_state > tool-of-origin >
  entities > active memories > source chunks
- Three worked examples showing how the rules apply:
  * "what material does the lateral support pad use?" (KB-CAD
    canonical, project_state override possible)
  * "did we decide to merge the bind mounts?" (Gitea + memory
    both canonical for different aspects)
  * "what's p05's current next focus?" (project_state always
    wins for current state queries)
- Concrete consequences for V1 implementation: Material and
  Parameter are mostly KB-CAD shadows; Decisions / Requirements /
  Constraints / ValidationClaims are AtoCore-canonical; PKM is
  never authoritative; project_state is the override layer;
  the conflict model is the enforcement mechanism
- Out of scope for V1: facts about other people, vendor/cost
  facts, time-bounded facts, cross-project shared facts
- Open questions for V1: how the reviewer sees canonical home
  in the UI, whether entities need an explicit canonical_home
  field, how project_state overrides surface in query results

This is pure doc work. No code, no schema, no behavior changes.
After this commit the engineering planning sprint is 6 of 8 docs
done — only human-mirror-rules and engineering-v1-acceptance
remain.
2026-04-07 06:50:56 -04:00
a637017900 slash command for daily AtoCore use + backup-restore procedure
Session 2 of the four-session plan. Lands two operational pieces:
the Claude Code slash command that makes AtoCore reachable from
inside any Claude Code session, and the full backup/restore
procedure doc that turns the backup endpoint code into a real
operational drill.

Slash command (.claude/commands/atocore-context.md)
---------------------------------------------------
- Project-level slash command following the standard frontmatter
  format (description + argument-hint)
- Parses the user prompt and an optional trailing project id, with
  case-insensitive matching against the registered project ids
  (atocore, p04-gigabit, p05-interferometer, p06-polisher and
  their aliases)
- Calls POST /context/build on the live AtoCore service, defaulting
  to http://dalidou:8100 (overridable via ATOCORE_API_BASE env var)
- Renders the formatted context pack inline so the user can see
  exactly what AtoCore would feed an LLM, plus a stats banner and a
  per-chunk source list
- Includes graceful failure handling for network errors, 4xx, 5xx,
  and the empty-result case
- Defines a future capture path that POSTs to /interactions for the
  Phase 9 reflection loop. The current command leaves capture as
  manual / opt-in pending a clean post-turn hook design

.gitignore changes
------------------
- Replaced wholesale .claude/ ignore with .claude/* + exceptions
  for .claude/commands/ so project slash commands can be tracked
- Other .claude/* paths (worktrees, settings, local state) remain
  ignored

Backup-restore procedure (docs/backup-restore-procedure.md)
-----------------------------------------------------------
- Defines what gets backed up (SQLite + registry always, Chroma
  optional under ingestion lock) and what doesn't (sources, code,
  logs, cache, tmp)
- Documents the snapshot directory layout and the timestamp format
- Three trigger paths in priority order:
  - via POST /admin/backup with {include_chroma: true|false}
    (a minimal client sketch follows this list)
  - via the standalone src/atocore/ops/backup.py module
  - via cold filesystem copy with brief downtime as last resort
- Listing and validation procedure with the /admin/backup and
  /admin/backup/{stamp}/validate endpoints
- Full step-by-step restore procedure with mandatory pre-flight
  safety snapshot, ownership/permission requirements, and the
  post-restore verification checks
- Rollback path using the pre-restore safety copy
- Retention policy (last 7 daily / 4 weekly / 6 monthly) and
  explicit acknowledgment that the cleanup job is not yet
  implemented
- Drill schedule: quarterly full restore drill, post-migration
  drill, post-incident validation
- Common failure mode table with diagnoses
- Quickstart cheat sheet at the end for daily reference
- Open follow-ups: cleanup script, off-Dalidou target,
  encryption, automatic post-backup validation, incremental
  Chroma snapshots
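
A minimal client-side sketch of that first trigger path, using only
the endpoint and payload field named above (the response shape is
assumed):

```python
import json
import urllib.request

def trigger_backup(base_url: str = "http://dalidou:8100",
                   include_chroma: bool = False) -> dict:
    body = json.dumps({"include_chroma": include_chroma}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/admin/backup",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))  # snapshot metadata
```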

The procedure has not yet been exercised against the live Dalidou
instance — that is the next step the user runs themselves once
the slash command is in place.
2026-04-07 06:46:50 -04:00
d0ff8b5738 Merge origin/main into codex/dalidou-storage-foundation
Integrate codex/port-atocore-ops-client (operator client + operations
playbook) so the dalidou-storage-foundation branch can fast-forward
into main.

# Conflicts:
#	README.md
2026-04-07 06:20:19 -04:00
ac14f8d6a4 Merge branch 'codex/port-atocore-ops-client' 2026-04-06 20:05:56 -04:00
ceb129c7d1 Add operator client and operations playbook 2026-04-06 19:59:09 -04:00
43 changed files with 8551 additions and 78 deletions

.claude/commands/atocore-context.md (new file)

@@ -0,0 +1,159 @@
---
description: Pull a context pack from the live AtoCore service for the current prompt
argument-hint: <prompt text> [project-id]
---
You are about to enrich a user prompt with context from the live
AtoCore service. This is the daily-use entry point for AtoCore from
inside Claude Code.
The work happens via the **shared AtoCore operator client** at
`scripts/atocore_client.py`. That client is the canonical Python
backbone for stable AtoCore operations and is meant to be reused by
every LLM client (OpenClaw helper, future Codex skill, etc.) — see
`docs/architecture/llm-client-integration.md` for the layering. This
slash command is a thin Claude Code-specific frontend on top of it.
## Step 1 — parse the arguments
The user invoked `/atocore-context` with:
```
$ARGUMENTS
```
You need to figure out two things:
1. The **prompt text** — what AtoCore will retrieve context for
2. An **optional project hint** — used to scope retrieval to a
specific project's trusted state and corpus
The user may have passed a project id or alias as the **last
whitespace-separated token**. Don't maintain a hardcoded list of
known aliases — let the shared client decide. Use this rule:
- Take the last token of `$ARGUMENTS`. Call it `MAYBE_HINT`.
- Run `python scripts/atocore_client.py detect-project "$MAYBE_HINT"`
to ask the registry whether it's a known project id or alias.
This call is cheap (it just hits `/projects` and does a regex
match) and inherits the client's fail-open behavior.
- If the response has a non-null `matched_project`, the last
token was an explicit project hint. `PROMPT_TEXT` is everything
except the last token; `PROJECT_HINT` is the matched canonical
project id.
- Otherwise the last token is just part of the prompt.
`PROMPT_TEXT` is the full `$ARGUMENTS`; `PROJECT_HINT` is empty.
This delegates the alias-knowledge to the registry instead of
embedding a stale list in this markdown file. When you add a new
project to the registry, the slash command picks it up
automatically with no edits here.
## Step 2 — call the shared client for the context pack
The server resolves project hints through the registry before
looking up trusted state, so you can pass either the canonical id
or any alias to `context-build` and the trusted state lookup will
work either way. (Regression test:
`tests/test_context_builder.py::test_alias_hint_resolves_through_registry`.)
**If `PROJECT_HINT` is non-empty**, call `context-build` directly
with that hint:
```bash
python scripts/atocore_client.py context-build \
"$PROMPT_TEXT" \
"$PROJECT_HINT"
```
**If `PROJECT_HINT` is empty**, do the 2-step fallback dance so the
user always gets a context pack regardless of whether the prompt
implies a project:
```bash
# Try project auto-detection first.
RESULT=$(python scripts/atocore_client.py auto-context "$PROMPT_TEXT")
# If auto-context could not detect a project it returns a small
# {"status": "no_project_match", ...} envelope. In that case fall
# back to a corpus-wide context build with no project hint, which
# is the right behaviour for cross-project or generic prompts like
# "what changed in AtoCore backup policy this week?"
if echo "$RESULT" | grep -q '"no_project_match"'; then
RESULT=$(python scripts/atocore_client.py context-build "$PROMPT_TEXT")
fi
echo "$RESULT"
```
This is the fix for the P2 finding from codex's review: previously
the slash command sent every no-hint prompt through `auto-context`
and returned `no_project_match` to the user with no context, even
though the underlying client's `context-build` subcommand has
always supported corpus-wide context builds.
In both branches the response is the JSON payload from
`/context/build` (or, in the rare case where even the corpus-wide
build fails, a `{"status": "unavailable"}` envelope from the
client's fail-open layer).
## Step 3 — present the context pack to the user
The successful response contains at least:
- `formatted_context` — the assembled context block AtoCore would
feed an LLM
- `chunks_used`, `total_chars`, `budget`, `budget_remaining`,
`duration_ms`
- `chunks` — array of source documents that contributed, each with
`source_file`, `heading_path`, `score`
Render in this order:
1. A one-line stats banner: `chunks=N, chars=X/budget, duration=Yms`
2. The `formatted_context` block verbatim inside a fenced text code
block so the user can read what AtoCore would feed an LLM
3. The `chunks` array as a small bullet list with `source_file`,
`heading_path`, and `score` per chunk
Two special cases:
- **`{"status": "unavailable"}`** (fail-open from the client)
→ Tell the user: "AtoCore is unreachable at `$ATOCORE_BASE_URL`.
Check `python scripts/atocore_client.py health` for diagnostics."
- **Empty `chunks_used: 0` with no project state and no memories**
→ Tell the user: "AtoCore returned no context for this prompt —
either the corpus does not have relevant information or the
project hint is wrong. Try a different hint or a longer prompt."
## Step 4 — what about capturing the interaction
Capture (Phase 9 Commit A) and the rest of the reflection loop
(reinforcement, extraction, review queue) are intentionally NOT
exposed by the shared client yet. The contracts are stable but the
workflow ergonomics are not, so the daily-use slash command stays
focused on context retrieval until those review flows have been
exercised in real use. See `docs/architecture/llm-client-integration.md`
for the deferral rationale.
When capture is added to the shared client, this slash command will
gain a follow-up `/atocore-record-response` companion command that
posts the LLM's response back to the same interaction. That work is
queued.
## Notes for the assistant
- DO NOT bypass the shared client by calling curl yourself. The
client is the contract between AtoCore and every LLM frontend; if
you find a missing capability, the right fix is to extend the
client, not to work around it.
- DO NOT maintain a hardcoded list of project aliases in this
file. Use `detect-project` to ask the registry — that's the
whole point of having a registry.
- DO NOT silently change `ATOCORE_BASE_URL`. If the env var points
at the wrong instance, surface the error so the user can fix it.
- DO NOT hide the formatted context pack from the user. Showing
what AtoCore would feed an LLM is the whole point.
- The output goes into the user's working context as background;
they may follow up with their actual question, and the AtoCore
context pack acts as informal injected knowledge.

.gitignore (vendored, 4 lines changed)

@@ -10,4 +10,6 @@ htmlcov/
.coverage
venv/
.venv/
.claude/
.claude/*
!.claude/commands/
!.claude/commands/**

README.md

@@ -24,6 +24,10 @@ curl -X POST http://localhost:8100/context/build \
# CLI ingestion
python scripts/ingest_folder.py --path /path/to/notes
# Live operator client
python scripts/atocore_client.py health
python scripts/atocore_client.py audit-query "gigabit" 5
```
## API Endpoints
@@ -66,10 +70,19 @@ pip install -e ".[dev]"
pytest
```
## Operations
- `scripts/atocore_client.py` provides a live API client for project refresh, project-state inspection, and retrieval-quality audits.
- `docs/operations.md` captures the current operational priority order: retrieval quality, Wave 2 trusted-operational ingestion, AtoDrive scoping, and restore validation.
## Architecture Notes
Implementation-facing architecture notes live under `docs/architecture/`.
Current additions:
- `docs/architecture/engineering-knowledge-hybrid-architecture.md`
- `docs/architecture/engineering-ontology-v1.md`
- `docs/architecture/engineering-knowledge-hybrid-architecture.md` — 5-layer hybrid model
- `docs/architecture/engineering-ontology-v1.md` — V1 object and relationship inventory
- `docs/architecture/engineering-query-catalog.md` — 20 v1-required queries
- `docs/architecture/memory-vs-entities.md` — canonical home split
- `docs/architecture/promotion-rules.md` — Layer 0 to Layer 2 pipeline
- `docs/architecture/conflict-model.md` — contradictory facts detection and resolution

deploy/dalidou/deploy.sh (new file, 349 lines)

@@ -0,0 +1,349 @@
#!/usr/bin/env bash
#
# deploy/dalidou/deploy.sh
# -------------------------
# One-shot deploy script for updating the running AtoCore container
# on Dalidou from the current Gitea main branch.
#
# The script is idempotent and safe to re-run. It handles both the
# first-time deploy (where /srv/storage/atocore/app may not yet be
# a git checkout) and the ongoing update case (where it is).
#
# Usage
# -----
#
# # Normal update from main (most common)
# bash deploy/dalidou/deploy.sh
#
# # Deploy a specific branch or tag
# ATOCORE_BRANCH=codex/some-feature bash deploy/dalidou/deploy.sh
#
# # Dry-run: show what would happen without touching anything
# ATOCORE_DEPLOY_DRY_RUN=1 bash deploy/dalidou/deploy.sh
#
# Environment variables
# ---------------------
#
# ATOCORE_APP_DIR default /srv/storage/atocore/app
# ATOCORE_GIT_REMOTE default http://127.0.0.1:3000/Antoine/ATOCore.git
# This is the local Dalidou gitea, reached
# via loopback. Override only when running
# the deploy from a remote host. The default
# is loopback (not the hostname "dalidou")
# because the hostname doesn't reliably
# resolve on the host itself — Dalidou
# Claude's first deploy had to work around
# exactly this.
# ATOCORE_BRANCH default main
# ATOCORE_DEPLOY_DRY_RUN if set to 1, report only, no mutations
# ATOCORE_HEALTH_URL default http://127.0.0.1:8100/health
#
# Safety rails
# ------------
#
# - If the app dir exists but is NOT a git repo, the script renames
# it to <dir>.pre-git-<timestamp> before re-cloning, so you never
# lose the pre-existing snapshot to a git clobber.
# - If the health check fails after restart, the script exits
# non-zero and prints the container logs tail for diagnosis.
# - Dry-run mode is the default recommendation for the first deploy
# on a new environment: it shows the planned git operations and
# the compose command without actually running them.
#
# What this script does NOT do
# ----------------------------
#
# - Does not manage secrets / .env files. The caller is responsible
# for placing deploy/dalidou/.env before running.
# - Does not run a backup before deploying. Run the backup endpoint
# first if you want a pre-deploy snapshot.
# - Does not roll back on health-check failure. If deploy fails,
# the previous container is already stopped; you need to redeploy
# a known-good commit to recover.
# - Does not touch the database. The Phase 9 schema migrations in
# src/atocore/models/database.py::_apply_migrations are idempotent
# ALTER TABLE ADD COLUMN calls that run at service startup via the
# lifespan handler. Stale pre-Phase-9 schema is upgraded in place.
set -euo pipefail
APP_DIR="${ATOCORE_APP_DIR:-/srv/storage/atocore/app}"
GIT_REMOTE="${ATOCORE_GIT_REMOTE:-http://127.0.0.1:3000/Antoine/ATOCore.git}"
BRANCH="${ATOCORE_BRANCH:-main}"
HEALTH_URL="${ATOCORE_HEALTH_URL:-http://127.0.0.1:8100/health}"
DRY_RUN="${ATOCORE_DEPLOY_DRY_RUN:-0}"
COMPOSE_DIR="$APP_DIR/deploy/dalidou"
log() { printf '==> %s\n' "$*"; }
run() {
if [ "$DRY_RUN" = "1" ]; then
printf ' [dry-run] %s\n' "$*"
else
eval "$@"
fi
}
log "AtoCore deploy starting"
log " app dir: $APP_DIR"
log " git remote: $GIT_REMOTE"
log " branch: $BRANCH"
log " health url: $HEALTH_URL"
log " dry run: $DRY_RUN"
# ---------------------------------------------------------------------
# Step 0: pre-flight permission check
# ---------------------------------------------------------------------
#
# If $APP_DIR exists but the current user cannot write to it (because
# a previous manual deploy left it root-owned, for example), the git
# fetch / reset in step 1 will fail with cryptic errors. Detect this
# up front and give the operator a clean remediation command instead
# of letting git produce half-state on partial failure. This was the
# exact workaround the 2026-04-08 Dalidou redeploy needed — pre-
# existing root ownership from the pre-phase9 manual schema fix.
if [ -d "$APP_DIR" ] && [ "$DRY_RUN" != "1" ]; then
if [ ! -w "$APP_DIR" ] || [ ! -r "$APP_DIR/.git" ] 2>/dev/null; then
log "WARNING: app dir exists but may not be writable by current user"
fi
current_owner="$(stat -c '%U:%G' "$APP_DIR" 2>/dev/null || echo unknown)"
current_user="$(id -un 2>/dev/null || echo unknown)"
current_uid_gid="$(id -u 2>/dev/null):$(id -g 2>/dev/null)"
log "Step 0: permission check"
log " app dir owner: $current_owner"
log " current user: $current_user ($current_uid_gid)"
# Try to write a tiny marker file. If it fails, surface a clean
# remediation message and exit before git produces confusing
# half-state.
marker="$APP_DIR/.deploy-permission-check"
if ! ( : > "$marker" ) 2>/dev/null; then
log "FATAL: cannot write to $APP_DIR as $current_user"
log ""
log "The app dir is owned by $current_owner and the current user"
log "doesn't have write permission. This usually happens after a"
log "manual workaround deploy that ran as root."
log ""
log "Remediation (pick the one that matches your setup):"
log ""
log " # If you have passwordless sudo and gitea runs as UID 1000:"
log " sudo chown -R 1000:1000 $APP_DIR"
log ""
log " # If you're running deploy.sh itself as root:"
log " sudo bash $0"
log ""
log " # If neither works, do it via a throwaway container:"
log " docker run --rm -v $APP_DIR:/app alpine \\"
log " chown -R 1000:1000 /app"
log ""
log "Then re-run deploy.sh."
exit 5
fi
rm -f "$marker" 2>/dev/null || true
fi
# ---------------------------------------------------------------------
# Step 1: make sure $APP_DIR is a proper git checkout of the branch
# ---------------------------------------------------------------------
if [ -d "$APP_DIR/.git" ]; then
log "Step 1: app dir is already a git checkout; fetching latest"
run "cd '$APP_DIR' && git fetch origin '$BRANCH'"
run "cd '$APP_DIR' && git reset --hard 'origin/$BRANCH'"
else
log "Step 1: app dir is NOT a git checkout; converting"
if [ -d "$APP_DIR" ]; then
BACKUP="${APP_DIR}.pre-git-$(date -u +%Y%m%dT%H%M%SZ)"
log " backing up existing snapshot to $BACKUP"
run "mv '$APP_DIR' '$BACKUP'"
fi
log " cloning $GIT_REMOTE -> $APP_DIR (branch: $BRANCH)"
run "git clone --branch '$BRANCH' '$GIT_REMOTE' '$APP_DIR'"
fi
# ---------------------------------------------------------------------
# Step 1.5: self-update re-exec guard
# ---------------------------------------------------------------------
#
# When deploy.sh itself changes in the commit we just pulled, the bash
# process running this script is still executing the OLD deploy.sh
# from memory — git reset --hard updated the file on disk but our
# in-memory instructions are stale. That's exactly how the first
# 2026-04-09 Dalidou deploy silently wrote "unknown" build_sha: old
# Step 2 logic ran against fresh source. Detect the mismatch and
# re-exec into the fresh copy so every post-update run exercises the
# new script.
#
# Guard rails:
# - Only runs when $APP_DIR exists, holds a git checkout, and a
# deploy.sh exists there (i.e. after Step 1 succeeded).
# - Uses a sentinel env var ATOCORE_DEPLOY_REEXECED=1 to make sure
# we only re-exec once, never recurse.
# - Skipped in dry-run mode (no mutation).
# - Skipped if $0 isn't a readable file (bash -c pipe inputs, etc.).
if [ "$DRY_RUN" != "1" ] \
&& [ -z "${ATOCORE_DEPLOY_REEXECED:-}" ] \
&& [ -r "$0" ] \
&& [ -f "$APP_DIR/deploy/dalidou/deploy.sh" ]; then
ON_DISK_HASH="$(sha1sum "$APP_DIR/deploy/dalidou/deploy.sh" 2>/dev/null | awk '{print $1}')"
RUNNING_HASH="$(sha1sum "$0" 2>/dev/null | awk '{print $1}')"
if [ -n "$ON_DISK_HASH" ] \
&& [ -n "$RUNNING_HASH" ] \
&& [ "$ON_DISK_HASH" != "$RUNNING_HASH" ]; then
log "Step 1.5: deploy.sh changed in the pulled commit; re-exec'ing"
log " running script hash: $RUNNING_HASH"
log " on-disk script hash: $ON_DISK_HASH"
log " re-exec -> $APP_DIR/deploy/dalidou/deploy.sh"
export ATOCORE_DEPLOY_REEXECED=1
exec bash "$APP_DIR/deploy/dalidou/deploy.sh" "$@"
fi
fi
# ---------------------------------------------------------------------
# Step 2: capture build provenance to pass to the container
# ---------------------------------------------------------------------
#
# We compute the full SHA, the short SHA, the UTC build timestamp,
# and the source branch. These get exported as env vars before
# `docker compose up -d --build` so the running container can read
# them at startup and report them via /health. The post-deploy
# verification step (Step 6) reads /health and compares the
# reported SHA against this value to detect any silent drift.
log "Step 2: capturing build provenance"
if [ "$DRY_RUN" != "1" ] && [ -d "$APP_DIR/.git" ]; then
DEPLOYING_SHA_FULL="$(cd "$APP_DIR" && git rev-parse HEAD)"
DEPLOYING_SHA="$(echo "$DEPLOYING_SHA_FULL" | cut -c1-7)"
DEPLOYING_TIME="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
DEPLOYING_BRANCH="$BRANCH"
log " commit: $DEPLOYING_SHA ($DEPLOYING_SHA_FULL)"
log " built at: $DEPLOYING_TIME"
log " branch: $DEPLOYING_BRANCH"
( cd "$APP_DIR" && git log --oneline -1 ) | sed 's/^/ /'
export ATOCORE_BUILD_SHA="$DEPLOYING_SHA_FULL"
export ATOCORE_BUILD_TIME="$DEPLOYING_TIME"
export ATOCORE_BUILD_BRANCH="$DEPLOYING_BRANCH"
else
log " [dry-run] would read git log from $APP_DIR"
DEPLOYING_SHA="dry-run"
DEPLOYING_SHA_FULL="dry-run"
fi
# ---------------------------------------------------------------------
# Step 3: preserve the .env file (it's not in git)
# ---------------------------------------------------------------------
ENV_FILE="$COMPOSE_DIR/.env"
if [ "$DRY_RUN" != "1" ] && [ ! -f "$ENV_FILE" ]; then
log "Step 3: WARNING — $ENV_FILE does not exist"
log " the compose workflow needs this file to map mount points"
log " copy deploy/dalidou/.env.example to $ENV_FILE and edit it"
log " before re-running this script"
exit 2
fi
# ---------------------------------------------------------------------
# Step 4: rebuild and restart the container
# ---------------------------------------------------------------------
log "Step 4: rebuilding and restarting the atocore container"
run "cd '$COMPOSE_DIR' && docker compose up -d --build"
if [ "$DRY_RUN" = "1" ]; then
log "dry-run complete — no mutations performed"
exit 0
fi
# ---------------------------------------------------------------------
# Step 5: wait for the service to come up and pass the health check
# ---------------------------------------------------------------------
log "Step 5: waiting for /health to respond"
for i in 1 2 3 4 5 6 7 8 9 10; do
if curl -fsS "$HEALTH_URL" > /tmp/atocore-health.json 2>/dev/null; then
log " service is responding"
break
fi
log " not ready yet ($i/10); waiting 3s"
sleep 3
done
if ! curl -fsS "$HEALTH_URL" > /tmp/atocore-health.json 2>/dev/null; then
log "FATAL: service did not come up within 30 seconds"
log " container logs (last 50 lines):"
cd "$COMPOSE_DIR" && docker compose logs --tail=50 atocore || true
exit 3
fi
# ---------------------------------------------------------------------
# Step 6: verify the deployed build matches what we just shipped
# ---------------------------------------------------------------------
#
# Two layers of comparison:
#
# - code_version: matches src/atocore/__init__.py::__version__.
# Coarse: any commit between version bumps reports the same value.
# - build_sha: full git SHA the container was built from. Set as
# an env var by Step 2 above and read by /health from
# ATOCORE_BUILD_SHA. This is the precise drift signal — if the
# live build_sha doesn't match $DEPLOYING_SHA_FULL, the build
# didn't pick up the new source.
log "Step 6: verifying deployed build"
log " /health response:"
if command -v jq >/dev/null 2>&1; then
jq . < /tmp/atocore-health.json | sed 's/^/ /'
REPORTED_VERSION="$(jq -r '.code_version // .version' < /tmp/atocore-health.json)"
REPORTED_SHA="$(jq -r '.build_sha // "unknown"' < /tmp/atocore-health.json)"
REPORTED_BUILD_TIME="$(jq -r '.build_time // "unknown"' < /tmp/atocore-health.json)"
else
cat /tmp/atocore-health.json | sed 's/^/ /'
echo
REPORTED_VERSION="$(grep -o '"code_version":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
if [ -z "$REPORTED_VERSION" ]; then
REPORTED_VERSION="$(grep -o '"version":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
fi
REPORTED_SHA="$(grep -o '"build_sha":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
REPORTED_SHA="${REPORTED_SHA:-unknown}"
REPORTED_BUILD_TIME="$(grep -o '"build_time":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
REPORTED_BUILD_TIME="${REPORTED_BUILD_TIME:-unknown}"
fi
EXPECTED_VERSION="$(grep -oE "__version__ = \"[^\"]+\"" "$APP_DIR/src/atocore/__init__.py" | head -1 | cut -d'"' -f2)"
log " Layer 1 — coarse version:"
log " expected code_version: $EXPECTED_VERSION (from src/atocore/__init__.py)"
log " reported code_version: $REPORTED_VERSION (from live /health)"
if [ "$REPORTED_VERSION" != "$EXPECTED_VERSION" ]; then
log "FATAL: code_version mismatch"
log " the container may not have picked up the new image"
log " try: docker compose down && docker compose up -d --build"
exit 4
fi
log " Layer 2 — precise build SHA:"
log " expected build_sha: $DEPLOYING_SHA_FULL (from this deploy.sh run)"
log " reported build_sha: $REPORTED_SHA (from live /health)"
log " reported build_time: $REPORTED_BUILD_TIME"
if [ "$REPORTED_SHA" != "$DEPLOYING_SHA_FULL" ]; then
log "FATAL: build_sha mismatch"
log " the live container is reporting a different commit than"
log " the one this deploy.sh run just shipped. Possible causes:"
log " - the container is using a cached image instead of the"
log " freshly-built one (try: docker compose build --no-cache)"
log " - the env vars didn't propagate (check that"
log " deploy/dalidou/docker-compose.yml has the environment"
log " section with ATOCORE_BUILD_SHA)"
log " - another process restarted the container between the"
log " build and the health check"
exit 6
fi
log "Deploy complete."
log " commit: $DEPLOYING_SHA ($DEPLOYING_SHA_FULL)"
log " code_version: $REPORTED_VERSION"
log " build_sha: $REPORTED_SHA"
log " build_time: $REPORTED_BUILD_TIME"
log " health: ok"

deploy/dalidou/docker-compose.yml

@@ -9,6 +9,15 @@ services:
- "${ATOCORE_PORT:-8100}:8100"
env_file:
- .env
environment:
# Build provenance — set by deploy/dalidou/deploy.sh on each
# rebuild so /health can report exactly which commit is live.
# Defaults to 'unknown' for direct `docker compose up` runs that
# bypass deploy.sh; in that case the operator should run
# deploy.sh instead so the deployed SHA is recorded.
ATOCORE_BUILD_SHA: "${ATOCORE_BUILD_SHA:-unknown}"
ATOCORE_BUILD_TIME: "${ATOCORE_BUILD_TIME:-unknown}"
ATOCORE_BUILD_BRANCH: "${ATOCORE_BUILD_BRANCH:-unknown}"
volumes:
- ${ATOCORE_DB_DIR}:${ATOCORE_DB_DIR}
- ${ATOCORE_CHROMA_DIR}:${ATOCORE_CHROMA_DIR}

capture_stop.py (new file)

@@ -0,0 +1,188 @@
#!/usr/bin/env python3
"""Claude Code Stop hook: capture interaction to AtoCore.
Reads the Stop hook JSON from stdin, extracts the last user prompt
from the transcript JSONL, and POSTs to the AtoCore /interactions
endpoint in conservative mode (reinforce=false, no extraction).
Fail-open: always exits 0, logs errors to stderr only.
Environment variables:
ATOCORE_URL Base URL of the AtoCore instance (default: http://dalidou:8100)
ATOCORE_CAPTURE_DISABLED Set to "1" to disable capture (kill switch)
Usage in ~/.claude/settings.json:
"Stop": [{
"matcher": "",
"hooks": [{
"type": "command",
"command": "python /path/to/capture_stop.py",
"timeout": 15
}]
}]
"""
from __future__ import annotations
import json
import os
import sys
import urllib.error
import urllib.request
ATOCORE_URL = os.environ.get("ATOCORE_URL", "http://dalidou:8100")
TIMEOUT_SECONDS = 10
# Minimum prompt length to bother capturing. Single-word acks,
# slash commands, and empty lines aren't useful interactions.
MIN_PROMPT_LENGTH = 15
# Maximum response length to capture. Truncate very long assistant
# responses to keep the interactions table manageable.
MAX_RESPONSE_LENGTH = 50_000
def main() -> None:
    """Entry point. Always exits 0."""
    try:
        _capture()
    except Exception as exc:
        print(f"capture_stop: {exc}", file=sys.stderr)


def _capture() -> None:
    if os.environ.get("ATOCORE_CAPTURE_DISABLED") == "1":
        return
    raw = sys.stdin.read()
    if not raw.strip():
        return
    hook_data = json.loads(raw)
    session_id = hook_data.get("session_id", "")
    assistant_message = hook_data.get("last_assistant_message", "")
    transcript_path = hook_data.get("transcript_path", "")
    cwd = hook_data.get("cwd", "")
    prompt = _extract_last_user_prompt(transcript_path)
    if not prompt or len(prompt.strip()) < MIN_PROMPT_LENGTH:
        return
    response = assistant_message or ""
    if len(response) > MAX_RESPONSE_LENGTH:
        response = response[:MAX_RESPONSE_LENGTH] + "\n\n[truncated]"
    project = _infer_project(cwd)
    payload = {
        "prompt": prompt,
        "response": response,
        "client": "claude-code",
        "session_id": session_id,
        "project": project,
        "reinforce": False,
    }
    body = json.dumps(payload, ensure_ascii=True).encode("utf-8")
    req = urllib.request.Request(
        f"{ATOCORE_URL}/interactions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    resp = urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS)
    result = json.loads(resp.read().decode("utf-8"))
    print(
        f"capture_stop: recorded interaction {result.get('id', '?')} "
        f"(project={project or 'none'}, prompt_chars={len(prompt)}, "
        f"response_chars={len(response)})",
        file=sys.stderr,
    )


def _extract_last_user_prompt(transcript_path: str) -> str:
    """Read the JSONL transcript and return the last real user prompt.

    Skips meta messages (isMeta=True) and system/command messages
    (content starting with '<').
    """
    if not transcript_path:
        return ""
    # Normalize path for the current OS
    path = os.path.normpath(transcript_path)
    if not os.path.isfile(path):
        return ""
    last_prompt = ""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if entry.get("type") != "user":
                    continue
                if entry.get("isMeta", False):
                    continue
                msg = entry.get("message", {})
                if not isinstance(msg, dict):
                    continue
                content = msg.get("content", "")
                if isinstance(content, str):
                    text = content.strip()
                elif isinstance(content, list):
                    # Content blocks: extract text blocks
                    parts = []
                    for block in content:
                        if isinstance(block, str):
                            parts.append(block)
                        elif isinstance(block, dict) and block.get("type") == "text":
                            parts.append(block.get("text", ""))
                    text = "\n".join(parts).strip()
                else:
                    continue
                # Skip system/command XML and very short messages
                if text.startswith("<") or len(text) < MIN_PROMPT_LENGTH:
                    continue
                last_prompt = text
    except OSError:
        pass
    return last_prompt


# Project inference from working directory.
# Maps known repo paths to AtoCore project IDs. The user can extend
# this table or replace it with a registry lookup later.
_PROJECT_PATH_MAP: dict[str, str] = {
    # Add mappings as needed, e.g.:
    # "C:\\Users\\antoi\\gigabit": "p04-gigabit",
    # "C:\\Users\\antoi\\interferometer": "p05-interferometer",
}


def _infer_project(cwd: str) -> str:
    """Try to map the working directory to an AtoCore project."""
    if not cwd:
        return ""
    norm = os.path.normpath(cwd).lower()
    for path_prefix, project_id in _PROJECT_PATH_MAP.items():
        if norm.startswith(os.path.normpath(path_prefix).lower()):
            return project_id
    return ""


if __name__ == "__main__":
    main()

docs/architecture/engineering-v1-acceptance.md (new file)

@@ -0,0 +1,434 @@
# Engineering Layer V1 Acceptance Criteria
## Why this document exists
The engineering layer planning sprint produced 7 architecture
docs. None of them on their own says "you're done with V1, ship
it". This document does. It translates the planning into
measurable, falsifiable acceptance criteria so the implementation
sprint can know unambiguously when V1 is complete.
The acceptance criteria are organized into four categories:
1. **Functional** — what the system must be able to do
2. **Quality** — how well it must do it
3. **Operational** — what running it must look like
4. **Documentation** — what must be written down
V1 is "done" only when **every criterion in this document is met
against at least one of the three active projects** (`p04-gigabit`,
`p05-interferometer`, `p06-polisher`). The choice of which
project is the test bed is up to the implementer, but the same
project must satisfy all functional criteria.
## The single-sentence definition
> AtoCore Engineering Layer V1 is done when, against one chosen
> active project, every v1-required query in
> `engineering-query-catalog.md` returns a correct result, the
> Human Mirror renders a coherent project overview, and a real
> KB-CAD or KB-FEM export round-trips through the ingest →
> review queue → active entity flow without violating any
> conflict or trust invariant.
Everything below is the operational form of that sentence.
## Category 1 — Functional acceptance
### F-1: Entity store implemented per the V1 ontology
- The 12 V1 entity types from `engineering-ontology-v1.md` exist
in the database with the schema described there
- The 4 relationship families (Structural, Intent, Validation,
Provenance) are implemented as edges with the relationship
types listed in the catalog
- Every entity has the shared header fields:
`id, type, name, project_id, status, confidence, source_refs,
created_at, updated_at, extractor_version, canonical_home`
- The status lifecycle matches the memory layer:
`candidate → active → superseded | invalid`
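As a rough illustration only (types and defaults are assumptions, not
the actual V1 schema), the shared header could be sketched as:
```python
from dataclasses import dataclass, field

@dataclass
class EntityHeader:
    id: str
    type: str                  # one of the 12 V1 entity types
    name: str
    project_id: str
    status: str                # candidate -> active -> superseded | invalid
    confidence: float
    source_refs: list[str] = field(default_factory=list)
    created_at: str = ""
    updated_at: str = ""
    extractor_version: str = ""
    canonical_home: str = ""
```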
### F-2: All v1-required queries return correct results
For the chosen test project, every query Q-001 through Q-020 in
`engineering-query-catalog.md` must:
- be implemented as an API endpoint with the shape specified in
the catalog
- return the expected result shape against real data
- include the provenance chain when the catalog requires it
- handle the empty case (no matches) gracefully — empty array,
not 500
The "killer correctness queries" — Q-006 (orphan requirements),
Q-009 (decisions on flagged assumptions), Q-011 (unsupported
validation claims) — are non-negotiable. If any of those three
returns wrong results, V1 is not done.
### F-3: Tool ingest endpoints are live
Both endpoints from `tool-handoff-boundaries.md` are implemented:
- `POST /ingest/kb-cad/export` accepts the documented JSON
shape, validates it, and produces entity candidates
- `POST /ingest/kb-fem/export` ditto
- Both refuse exports with invalid schemas (4xx with a clear
error)
- Both return a summary of created/dropped/failed counts
- Both never auto-promote anything; everything lands as
`status="candidate"`
- Both carry source identifiers (exporter name, exporter version,
source artifact id) into the candidate's provenance fields
A real KB-CAD export — even a hand-crafted one if the actual
exporter doesn't exist yet — must round-trip through the endpoint
and produce reviewable candidates for the test project.
### F-4: Candidate review queue works end to end
Per `promotion-rules.md`:
- `GET /entities?status=candidate` lists the queue
- `POST /entities/{id}/promote` moves candidate → active
- `POST /entities/{id}/reject` moves candidate → invalid
- The same shapes work for memories (already shipped in Phase 9 C)
- The reviewer can edit a candidate's content via
`PUT /entities/{id}` before promoting
- Every promote/reject is logged with timestamp and reason
### F-5: Conflict detection fires
Per `conflict-model.md`:
- The synchronous detector runs at every active write
(create, promote, project_state set, KB import)
- A test must demonstrate that pushing a contradictory KB-CAD
export creates a `conflicts` row with both members linked
- The reviewer can resolve the conflict via
`POST /conflicts/{id}/resolve` with one of the supported
actions (supersede_others, no_action, dismiss)
- Resolution updates the underlying entities according to the
chosen action
### F-6: Human Mirror renders for the test project
Per `human-mirror-rules.md`:
- `GET /mirror/{project}/overview` returns rendered markdown
- `GET /mirror/{project}/decisions` returns rendered markdown
- `GET /mirror/{project}/subsystems/{subsystem}` returns
rendered markdown for at least one subsystem
- `POST /mirror/{project}/regenerate` triggers regeneration on
demand
- Generated files appear under `/srv/storage/atocore/data/mirror/`
with the "do not edit" header banner
- Disputed markers appear inline when conflicts exist
- Project-state overrides display with the `(curated)` annotation
- Output is deterministic (the same inputs produce the same
bytes, suitable for diffing)
### F-7: Memory-to-entity graduation works for at least one type
Per `memory-vs-entities.md`:
- `POST /memory/{id}/graduate` exists
- Graduating a memory of type `adaptation` produces a Decision
entity candidate with the memory's content as a starting point
- The original memory row is kept and moves to `status="graduated"`
(a new status added by the engineering layer migration)
- The graduated memory has a forward pointer to the entity
candidate's id
- Promoting the entity candidate does NOT delete the original
memory
- The same graduation flow works for `project` → Requirement
and `knowledge` → Fact entity types (test the path; doesn't
have to be exhaustive)
### F-8: Provenance chain is complete
For every active entity in the test project, the following must
be true:
- It links back to at least one source via `source_refs` (which
is one or more of: source_chunk_id, source_interaction_id,
source_artifact_id from KB import)
- The provenance chain can be walked from the entity to the
underlying raw text (source_chunks) or external artifact
- Q-017 (the evidence query) returns at least one row for every
active entity
If any active entity has no provenance, it's a bug — provenance
is mandatory at write time per the promotion rules.
## Category 2 — Quality acceptance
### Q-1: All existing tests still pass
The full pre-V1 test suite (currently 160 tests) must still
pass. The V1 implementation may add new tests but cannot regress
any existing test.
### Q-2: V1 has its own test coverage
For each of F-1 through F-8 above, at least one automated test
exists that:
- exercises the happy path
- covers at least one error path
- runs in CI in under 10 seconds (no real network, no real LLM)
The full V1 test suite should be under 30 seconds total runtime
to keep the development loop fast.
### Q-3: Conflict invariants are enforced by tests
Specific tests must demonstrate:
- Two contradictory KB exports produce a conflict (not silent
overwrite)
- A reviewer can't accidentally promote both members of an open
conflict to active without resolving the conflict first
- The "flag, never block" rule holds — writes still succeed
even when they create a conflict
### Q-4: Trust hierarchy is enforced by tests
Specific tests must demonstrate:
- Entity candidates can never appear in context packs
- Reinforcement only touches active memories (already covered
by Phase 9 Commit B tests, but the same property must hold
for entities once they exist)
- Nothing automatically writes to project_state ever
- Candidates can never satisfy Q-005 (only active entities count)
### Q-5: The Human Mirror is reproducible
A golden-file test exists for at least one Mirror page. Updating
the golden file is a normal part of template work (single
command, well-documented). The test fails if the renderer
produces different bytes for the same input, catching
non-determinism.
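A minimal sketch of the shape such a test could take — the renderer import, the seeding fixture, and the `--update-golden` flag are illustrative assumptions, not the shipped test:
```python
# tests/test_mirror_golden.py — hypothetical golden-file test sketch
from pathlib import Path

# Assumed import: the renderer module is specced in human-mirror-rules.md
# but not implemented yet.
from atocore.mirror.renderer import render_project_overview

GOLDEN = Path(__file__).parent / "golden" / "p05_overview.md"


def test_project_overview_matches_golden(seeded_test_project):
    # Rendering twice from the same state catches non-determinism
    # (unsorted dict iteration, wall-clock timestamps) directly.
    first = render_project_overview("p05-interferometer")
    second = render_project_overview("p05-interferometer")
    assert first == second

    # Byte-for-byte comparison against the checked-in golden file.
    # Updating the golden is a single documented command, e.g.
    #   pytest tests/test_mirror_golden.py --update-golden
    assert first == GOLDEN.read_text()
```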
### Q-6: Killer correctness queries pass against real-ish data
The test bed for Q-006, Q-009, Q-011 is not synthetic. The
implementation must seed the test project with at least:
- One Requirement that has a satisfying Component (Q-006 should
not flag it)
- One Requirement with no satisfying Component (Q-006 must flag it)
- One Decision based on an Assumption flagged as `needs_review`
(Q-009 must flag the Decision)
- One ValidationClaim with at least one supporting Result
(Q-011 should not flag it)
- One ValidationClaim with no supporting Result (Q-011 must flag it)
These five seed cases run as a single integration test that
exercises the killer correctness queries against actual
representative data.
## Category 3 — Operational acceptance
### O-1: Migration is safe and reversible
The V1 schema migration (adding the `entities`, `relationships`,
`conflicts`, `conflict_members` tables, plus `mirror_regeneration_failures`)
must:
- run cleanly against a production-shape database
- be implemented via the same `_apply_migrations` pattern as
Phase 9 (additive only, idempotent, safe to run twice)
- be tested by spinning up a fresh DB AND running against a
copy of the live Dalidou DB taken from a backup
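A sketch of the additive, idempotent shape this implies — the column lists are illustrative; the real DDL follows the entity model docs:
```python
# Hypothetical V1 migration in the existing _apply_migrations style:
# additive-only DDL, safe to run any number of times.
import sqlite3

V1_MIGRATIONS = [
    """
    CREATE TABLE IF NOT EXISTS entities (
        id TEXT PRIMARY KEY,
        project_id TEXT NOT NULL,
        entity_type TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'candidate',
        content TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS relationships (
        id TEXT PRIMARY KEY,
        source_entity_id TEXT NOT NULL,
        target_entity_id TEXT NOT NULL,
        relationship_type TEXT NOT NULL
    )
    """,
    # ... conflicts, conflict_members, mirror_regeneration_failures ...
]


def apply_v1_migrations(conn: sqlite3.Connection) -> None:
    """Idempotent: every statement is CREATE TABLE IF NOT EXISTS."""
    for statement in V1_MIGRATIONS:
        conn.execute(statement)
    conn.commit()
```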
### O-2: Backup and restore still work
The backup endpoint must include the new tables. A restore drill
on the test project must:
- successfully back up the V1 entity state via
`POST /admin/backup`
- successfully validate the snapshot
- successfully restore from the snapshot per
`docs/backup-restore-procedure.md`
- pass post-restore verification including a Q-001 query against
the test project
The drill must be performed once before V1 is declared done.
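For reference, the backup and validation steps of the drill could be driven like this — the base URL and response field names are assumptions; the authoritative procedure is `docs/backup-restore-procedure.md`:
```python
# Hypothetical drill helper using the documented admin endpoints.
import requests

BASE = "http://localhost:8000"   # assumed AtoCore base URL

# 1. Take a backup including the Chroma snapshot.
backup = requests.post(f"{BASE}/admin/backup", json={"include_chroma": True}).json()
stamp = backup["stamp"]          # field name assumed

# 2. Validate the snapshot before restoring from it.
validation = requests.get(f"{BASE}/admin/backup/{stamp}/validate").json()
assert validation.get("valid"), validation

# 3. Restore + post-restore Q-001 verification follow the procedure doc.
```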
### O-3: Performance bounds
These are starting bounds; tune later if real usage shows
problems:
- Single-entity write (`POST /entities/...`): under 100ms p99
on the production Dalidou hardware
- Single Q-001 / Q-005 / Q-008 query: under 500ms p99 against
a project with up to 1000 entities
- Mirror regeneration of one project overview: under 5 seconds
for a project with up to 1000 entities
- Conflict detector at write time: adds no more than 50ms p99
to a write that doesn't actually produce a conflict
These bounds are not tested by automated benchmarks in V1 (that
would be over-engineering). They are sanity-checked by the
developer running the operations against the test project.
### O-4: No new manual ops burden
V1 should not introduce any new "you have to remember to run X
every day" requirement. Specifically:
- Mirror regeneration is automatic (debounced async + daily
refresh), no manual cron entry needed
- Conflict detection is automatic at write time, no manual sweep
needed in V1 (the nightly sweep is V2)
- Backup retention cleanup is **still** an open follow-up from
the operational baseline; V1 does not block on it
### O-5: No regressions in Phase 9 reflection loop
The capture, reinforcement, and extraction loop from Phase 9
A/B/C must continue to work end to end with the engineering
layer in place. Specifically:
- Memories whose types are NOT in the engineering layer
(identity, preference, episodic) keep working exactly as
before
- Memories whose types ARE in the engineering layer (project,
knowledge, adaptation) can still be created by hand or by
extraction; the deprecation rule from `memory-vs-entities.md`
("no new writes after V1 ships") is implemented as a
configurable warning, not a hard block, so existing
workflows aren't disrupted
## Category 4 — Documentation acceptance
### D-1: Per-entity-type spec docs
Each of the 12 V1 entity types has a short spec doc under
`docs/architecture/entities/` covering:
- the entity's purpose
- its required and optional fields
- its lifecycle quirks (if any beyond the standard
candidate/active/superseded/invalid)
- which queries it appears in (cross-reference to the catalog)
- which relationship types reference it
These docs can be terse — a page each, mostly bullet lists.
Their purpose is to make the entity model legible to a future
maintainer, not to be reference manuals.
### D-2: KB-CAD and KB-FEM export schema docs
`docs/architecture/kb-cad-export-schema.md` and
`docs/architecture/kb-fem-export-schema.md` are written and
match the implemented validators.
### D-3: V1 release notes
A `docs/v1-release-notes.md` summarizes:
- What V1 added (entities, relationships, conflicts, mirror,
ingest endpoints)
- What V1 deferred (auto-promotion, BOM/cost/manufacturing
entities, NX direct integration, cross-project rollups)
- The migration story for existing memories (graduation flow)
- Known limitations and the V2 roadmap pointers
### D-4: master-plan-status.md and current-state.md updated
Both top-level status docs reflect V1's completion:
- Phase 6 (AtoDrive) and the engineering layer are explicitly
marked as separate tracks
- The engineering planning sprint section is marked complete
- Phase 9 stays at "baseline complete" (V1 doesn't change Phase 9)
- The engineering layer V1 is added as its own line item
## What V1 explicitly does NOT need to do
To prevent scope creep, here is the negative list. None of the
following are V1 acceptance criteria:
- **No LLM extractor.** The Phase 9 C rule-based extractor is
the entity extractor for V1 too, just with new rules added for
entity types.
- **No auto-promotion of candidates.** Per `promotion-rules.md`.
- **No write-back to KB-CAD or KB-FEM.** Per
`tool-handoff-boundaries.md`.
- **No multi-user / per-reviewer auth.** Single-user assumed.
- **No real-time UI.** API + Mirror markdown is the V1 surface.
A web UI is V2+.
- **No cross-project rollups.** Per `human-mirror-rules.md`.
- **No time-travel queries** (Q-015 stays v1-stretch).
- **No nightly conflict sweep.** Synchronous detection only in V1.
- **No incremental Chroma snapshots.** The current full-copy
approach in `backup-restore-procedure.md` is fine for V1.
- **No retention cleanup script.** Still an open follow-up.
- **No backup encryption.** Still an open follow-up.
- **No off-Dalidou backup target.** Still an open follow-up.
## How to use this document during implementation
When the implementation sprint begins:
1. Read this doc once, top to bottom
2. Pick the test project (probably p05-interferometer because
the optical/structural domain has the cleanest entity model)
3. For each section, write the test or the implementation, in
roughly the order: F-1 → F-2 → F-3 → F-4 → F-5 → F-6 → F-7 → F-8
4. Each acceptance criterion's test should be written **before
or alongside** the implementation, not after
5. Run the full test suite at every commit
6. When every box is checked, write D-3 (release notes), update
D-4 (status docs), and call V1 done
The implementation sprint should not touch anything outside the
scope listed here. If a desire arises to add something not in
this doc, that's a V2 conversation, not a V1 expansion.
## Anticipated friction points
These are the things I expect will be hard during implementation:
1. **The graduation flow (F-7)** is the most cross-cutting
change because it touches the existing memory module.
Worth doing it last so the memory module is stable for
all the V1 entity work first.
2. **The Mirror's deterministic-output requirement (Q-5)** will
bite if the implementer iterates over Python dicts without
sorting. Plan to use `sorted()` literally everywhere.
3. **Conflict detection (F-5)** has subtle correctness traps:
the slot key extraction must be stable, the dedup-of-existing-conflicts
logic must be right, and the synchronous detector must not
slow writes meaningfully (Q-3 / O-3 cover this, but watch).
4. **Provenance backfill** for entities that come from the
existing memory layer via graduation (F-7) is the trickiest
part: the original memory may not have had a strict
`source_chunk_id`, in which case the graduated entity also
doesn't have one. The implementation needs an "orphan
provenance" allowance for graduated entities, with a
warning surfaced in the Mirror.
These aren't blockers, just the parts of the V1 spec I'd
attack with extra care.
## TL;DR
- Engineering V1 is done when every box in this doc is checked
against one chosen active project
- Functional: 8 criteria covering entities, queries, ingest,
review queue, conflicts, mirror, graduation, provenance
- Quality: 6 criteria covering tests, golden files, killer
correctness, trust enforcement
- Operational: 5 criteria covering migration safety, backup
drill, performance bounds, no new manual ops, Phase 9 not
regressed
- Documentation: 4 criteria covering entity specs, KB schema
docs, release notes, top-level status updates
- Negative list: a clear set of things V1 deliberately does
NOT need to do, to prevent scope creep
- The implementation sprint follows this doc as a checklist

View File

@@ -0,0 +1,384 @@
# Human Mirror Rules (Layer 3 → derived markdown views)
## Why this document exists
The engineering layer V1 stores facts as typed entities and
relationships in a SQL database. That representation is excellent
for queries, conflict detection, and automated reasoning, but
it's terrible for the human reading experience. People want to
read prose, not crawl JSON.
The Human Mirror is the layer that turns the typed entity store
into human-readable markdown pages. It's strictly a derived view —
nothing in the Human Mirror is canonical, every page is regenerated
from current entity state on demand.
This document defines:
- what the Human Mirror generates
- when it regenerates
- how the human edits things they see in the Mirror
- how the canonical-vs-derived rule is enforced (so editing the
derived markdown can't silently corrupt the entity store)
## The non-negotiable rule
> **The Human Mirror is read-only from the human's perspective.**
>
> If the human wants to change a fact they see in the Mirror, they
> change it in the canonical home (per `representation-authority.md`),
> NOT in the Mirror page. The next regeneration picks up the change.
This rule is what makes the whole derived-view approach safe. If
the human is allowed to edit Mirror pages directly, the
canonical-vs-derived split breaks and the Mirror becomes a second
source of truth that disagrees with the entity store.
The technical enforcement is that every Mirror page carries a
header banner that says "this file is generated from AtoCore
entity state, do not edit", and the file is regenerated from the
entity store on every change to its underlying entities. Manual
edits will be silently overwritten on the next regeneration.
## What the Mirror generates in V1
Three template families, each producing one or more pages per
project:
### 1. Project Overview
One page per registered project. Renders:
- Project header (id, aliases, description)
- Subsystem tree (from Q-001 / Q-004 in the query catalog)
- Active Decisions affecting this project (Q-008, ordered by date)
- Open Requirements with coverage status (Q-005, Q-006)
- Open ValidationClaims with support status (Q-010, Q-011)
- Currently flagged conflicts (from the conflict model)
- Recent changes (Q-013) — last 14 days
This is the most important Mirror page. It's the page someone
opens when they want to know "what's the state of this project
right now". It deliberately mirrors what `current-state.md` does
for AtoCore itself but generated entirely from typed state.
### 2. Decision Log
One page per project. Renders:
- All active Decisions in chronological order (newest first)
- Each Decision shows: id, what was decided, when, the affected
Subsystem/Component, the supporting evidence (Q-014, Q-017)
- Superseded Decisions appear as collapsed "history" entries
with a forward link to whatever superseded them
- Conflicting Decisions get a "⚠ disputed" marker
This is the human-readable form of the engineering query catalog's
Q-014 query.
### 3. Subsystem Detail
One page per Subsystem (so a few per project). Renders:
- Subsystem header
- Components contained in this subsystem (Q-001)
- Interfaces this subsystem has (Q-003)
- Constraints applying to it (Q-007)
- Decisions affecting it (Q-008)
- Validation status: which Requirements are satisfied,
which are open (Q-005, Q-006)
- Change history within this subsystem (Q-013 scoped)
Subsystem detail pages are what someone reads when they're
working on a specific part of the system and want everything
relevant in one place.
## What the Mirror does NOT generate in V1
Intentionally excluded so the V1 implementation stays scoped:
- **Per-component detail pages.** Components are listed in
Subsystem pages but don't get their own pages. Reduces page
count from hundreds to dozens.
- **Per-Decision detail pages.** Decisions appear inline in
Project Overview and Decision Log; their full text plus
evidence chain is shown there, not on a separate page.
- **Cross-project rollup pages.** No "all projects at a glance"
page in V1. Each project is its own report.
- **Time-series / historical pages.** The Mirror is always
"current state". History is accessible via Decision Log and
superseded chains, but no "what was true on date X" page exists
in V1 (Q-015 is v1-stretch in the query catalog for the same
reason).
- **Diff pages between two timestamps.** Same reasoning.
- **Render of the conflict queue itself.** Conflicts appear
inline in the relevant Mirror pages with the "⚠ disputed"
marker and a link to `/conflicts/{id}`, but there's no
Mirror page that lists all conflicts. Use `GET /conflicts`.
- **Per-memory pages.** Memories are not engineering entities;
they appear in context packs and the review queue, not in the
Human Mirror.
## Where Mirror pages live
Two options were considered. The chosen V1 path is option B:
**Option A — write Mirror pages back into the source vault.**
Generate `/srv/storage/atocore/sources/vault/mirror/p05/overview.md`
so the human reads them in their normal Obsidian / markdown
viewer. **Rejected** because writing into the source vault
violates the "sources are read-only" rule from
`tool-handoff-boundaries.md` and the operating model.
**Option B (chosen) — write Mirror pages into a dedicated AtoCore
output dir, served via the API.** Generate under
`/srv/storage/atocore/data/mirror/p05/overview.md`. The human
reads them via:
- the API endpoints `GET /mirror/{project}/overview`,
`GET /mirror/{project}/decisions`,
`GET /mirror/{project}/subsystems/{subsystem}` (all return
rendered markdown as text/markdown)
- a future "Mirror viewer" in the Claude Code slash command
`/atocore-mirror <project>` that fetches the rendered markdown
and displays it inline
- direct file access on Dalidou for power users:
`cat /srv/storage/atocore/data/mirror/p05/overview.md`
The dedicated dir keeps the Mirror clearly separated from the
canonical sources and makes regeneration safe (it's just a
directory wipe + write).
## When the Mirror regenerates
Three triggers, in order from cheapest to most expensive:
### 1. On explicit human request
```
POST /mirror/{project}/regenerate
```
Returns the timestamp of the regeneration and the list of files
written. This is the path the human takes when they've just
curated something into project_state and want to see the Mirror
reflect it immediately.
### 2. On entity write (debounced, async, per project)
When any entity in a project changes status (candidate → active,
active → superseded), a regeneration of that project's Mirror is
queued. The queue is debounced — multiple writes within a 30-second
window only trigger one regeneration. This keeps the Mirror
"close to current" without generating a Mirror update on every
single API call.
The implementation is a simple dict of "next regeneration time"
per project, checked by a background task. No cron, no message
queue, no Celery. Just a `dict[str, datetime]` and a thread.
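A minimal sketch of that mechanism — the module layout and function names are illustrative, and the `regenerate` callable stands in for the real Mirror writer:
```python
# Hypothetical debounced regeneration queue: one dict and one thread.
import threading
import time
from datetime import datetime, timedelta, timezone

DEBOUNCE = timedelta(seconds=30)

_next_run: dict[str, datetime] = {}   # project -> earliest allowed regeneration
_lock = threading.Lock()


def schedule_regeneration(project: str) -> None:
    """Called from entity write paths; cheap, never blocks the write."""
    with _lock:
        _next_run[project] = datetime.now(timezone.utc) + DEBOUNCE


def _worker(regenerate) -> None:
    """Background loop: regenerate a project once its debounce window expires."""
    while True:
        now = datetime.now(timezone.utc)
        with _lock:
            due = [p for p, t in _next_run.items() if t <= now]
            for project in due:
                del _next_run[project]
        for project in due:
            regenerate(project)   # writes that project's Mirror files
        time.sleep(1)


def start_worker(regenerate) -> threading.Thread:
    thread = threading.Thread(target=_worker, args=(regenerate,), daemon=True)
    thread.start()
    return thread
```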
### 3. On scheduled refresh (daily)
Once per day at a quiet hour, every project's Mirror regenerates
unconditionally. This catches any state drift from manual
project_state edits that bypassed the entity write hooks, and
provides a baseline guarantee that the Mirror is at most 24
hours stale.
The schedule runs from the same machinery as the future backup
retention job, so we get one cron-equivalent system to maintain
instead of two.
## What if regeneration fails
The Mirror has to be resilient. If regeneration fails for a
project (e.g. a query catalog query crashes or a template fails
to render), the existing Mirror files are **not** deleted. The
existing files stay in place (showing the last successful state)
and a regeneration error is recorded in:
- the API response if the trigger was explicit
- a log entry at warning level for the async path
- a `mirror_regeneration_failures` table for the daily refresh
This means the human can always read the Mirror, even if the
last 5 minutes of changes haven't made it in yet. Stale is
better than blank.
## How the human curates "around" the Mirror
The Mirror reflects the current entity state. If the human
doesn't like what they see, the right edits go into one of:
| What you want to change | Where you change it |
|---|---|
| A Decision's text | `PUT /entities/Decision/{id}` (or `PUT /memory/{id}` if it's still memory-layer) |
| A Decision's status (active → superseded) | `POST /entities/Decision/{id}/supersede` (V1 entity API) |
| Whether a Component "satisfies" a Requirement | edit the relationship directly via the entity API (V1) |
| The current trusted next focus shown on the Project Overview | `POST /project/state` with `category=status, key=next_focus` |
| A typo in a generated heading or label | edit the **template**, not the rendered file. Templates live in `templates/mirror/` (V1 implementation) |
| Source of a fact ("this came from KB-CAD on day X") | not editable by hand — it's automatically populated from provenance |
The rule is consistent: edit the canonical home, regenerate (or
let the auto-trigger fire), see the change reflected in the
Mirror.
## Templates
The Mirror uses Jinja2-style templates checked into the repo
under `templates/mirror/`. Each template is a markdown file with
placeholders that the renderer fills from query catalog results.
Template list for V1:
- `templates/mirror/project-overview.md.j2`
- `templates/mirror/decision-log.md.j2`
- `templates/mirror/subsystem-detail.md.j2`
Editing a template is a code change, reviewed via normal git PRs.
The templates are deliberately small and readable so the human
can tweak the output format without touching renderer code.
The renderer is a thin module:
```python
# src/atocore/mirror/renderer.py (V1, not yet implemented)
def render_project_overview(project: str) -> str:
    """Generate the project overview markdown for one project."""
    facts = collect_project_overview_facts(project)
    template = load_template("project-overview.md.j2")
    return template.render(**facts)
```
## The "do not edit" header
Every generated Mirror file starts with a fixed banner:
```markdown
<!--
This file is generated by AtoCore from current entity state.
DO NOT EDIT — manual changes will be silently overwritten on
the next regeneration.
Edit the canonical home instead. See:
https://docs.atocore.../representation-authority.md
Regenerated: 2026-04-07T12:34:56Z
Source entities: <commit-like checksum of input data>
-->
```
The checksum at the end lets the renderer skip work when nothing
relevant has changed since the last regeneration. If the inputs
match the previous run's checksum, the existing file is left
untouched.
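One way such a checksum could be computed — a sketch; the function name and the exact serialization are assumptions, not the shipped renderer:
```python
# Hypothetical input-checksum helper for skipping unnecessary regenerations.
import hashlib
import json


def input_checksum(facts: dict) -> str:
    """Stable digest of the facts a Mirror page is rendered from.

    sort_keys=True makes the serialization deterministic, so the same
    entity state always produces the same checksum regardless of dict
    insertion order.
    """
    payload = json.dumps(facts, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```
If the digest matches the one recorded in the existing file's banner, the renderer can skip the write entirely.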
## Conflicts in the Mirror
Per the conflict model, any open conflict on a fact that appears
in the Mirror gets a visible disputed marker:
```markdown
- Lateral support material: **GF-PTFE** ⚠ disputed
- The KB-CAD import on 2026-04-07 reported PEEK; conflict #c-039.
```
In the rendered markdown the disputed marker is a relative link to
the conflict detail in the API (or the bare conflict id for direct
lookup). The reviewer follows the
link, resolves the conflict via `POST /conflicts/{id}/resolve`,
and on the next regeneration the marker disappears.
## Project-state overrides in the Mirror
When a Mirror page would show a value derived from entities, but
project_state has an override on the same key, **the Mirror shows
the project_state value** with a small annotation noting the
override:
```markdown
- Next focus: **Wave 2 trusted-operational ingestion** (curated)
```
The `(curated)` annotation tells the reader "this is from the
trusted-state Layer 3, not from extracted entities". This makes
the trust hierarchy visible in the human reading experience.
## The "Mirror diff" workflow (post-V1, but designed for)
A common workflow after V1 ships will be:
1. Reviewer has curated some new entities
2. They want to see "what changed in the Mirror as a result"
3. They want to share that diff with someone else as evidence
To support this, the Mirror generator writes its output
deterministically (sorted iteration, stable timestamp formatting)
so a `git diff` between two regenerated states is meaningful.
V1 doesn't add an explicit "diff between two Mirror snapshots"
endpoint — that's deferred. But the deterministic-output
property is a V1 requirement so future diffing works without
re-renderer-design work.
## What the Mirror enables
With the Mirror in place:
- **OpenClaw can read project state in human form.** The
read-only AtoCore helper skill on the T420 already calls
`/context/build`; in V1 it gains the option to call
`/mirror/{project}/overview` to get a fully-rendered markdown
page instead of just retrieved chunks. This is much faster
than crawling individual entities for general questions.
- **The human gets a daily-readable artifact.** Every morning,
Antoine can `cat /srv/storage/atocore/data/mirror/p05/overview.md`
and see the current state of p05 in his preferred reading
format. No API calls, no JSON parsing.
- **Cross-collaborator sharing.** If you ever want to send
someone a project overview without giving them AtoCore access,
the Mirror file is a self-contained markdown document they can
read in any markdown viewer.
- **Claude Code integration.** A future
`/atocore-mirror <project>` slash command renders the Mirror
inline, complementing the existing `/atocore-context` command
with a human-readable view of "what does AtoCore think about
this project right now".
## Open questions for V1 implementation
1. **What's the regeneration debounce window?** 30 seconds is the
starting value but should be tuned with real usage.
2. **Does the daily refresh need a separate trigger mechanism, or
is it just a long-period entry in the same in-process scheduler
that handles the debounced async refreshes?** Probably the
latter — keep it simple.
3. **How are templates tested?** Likely a small set of fixture
project states + golden output files, with a single test that
asserts `render(fixture) == golden`. Updating golden files is
a normal part of template work.
4. **Are Mirror pages discoverable via a directory listing
endpoint?** `GET /mirror/{project}` returns the list of
available pages for that project. Probably yes; cheap to add.
5. **How does the Mirror handle a project that has zero entities
yet?** Render an empty-state page that says "no curated facts
yet — add some via /memory or /entities/Decision". Better than
a blank file.
## TL;DR
- The Human Mirror generates 3 template families per project
(Overview, Decision Log, Subsystem Detail) from current entity
state
- It's strictly read-only from the human's perspective; edits go
to the canonical home and the Mirror picks them up on
regeneration
- Three regeneration triggers: explicit POST, debounced
async-on-write, daily scheduled refresh
- Mirror files live in `/srv/storage/atocore/data/mirror/`
(NOT in the source vault — sources stay read-only)
- Conflicts and project_state overrides are visible inline in
the rendered markdown so the trust hierarchy shows through
- Templates are checked into the repo and edited via PR; the
rendered files are derived and never canonical
- Deterministic output is a V1 requirement so future diffing
works without rework

View File

@@ -0,0 +1,333 @@
# LLM Client Integration (the layering)
## Why this document exists
AtoCore must be reachable from many different LLM client contexts:
- **OpenClaw** on the T420 (already integrated via the read-only
helper skill at `/home/papa/clawd/skills/atocore-context/`)
- **Claude Code** on the laptop (via the slash command shipped in
this repo at `.claude/commands/atocore-context.md`)
- **Codex** sessions (future)
- **Direct API consumers** — scripts, Python code, ad-hoc curl
- **The eventual MCP server** when it's worth building
Without an explicit layering rule, every new client tends to
reimplement the same routing logic (project detection, context
build, retrieval audit, project-state inspection) in slightly
different ways. That is exactly what almost happened in the first
draft of the Claude Code slash command, which started as a curl +
jq script that duplicated capabilities the existing operator client
already had.
This document defines the layering so future clients don't repeat
that mistake.
## The layering
Three layers, top to bottom:
```
+----------------------------------------------------+
| Per-agent thin frontends |
| |
| - Claude Code slash command |
| (.claude/commands/atocore-context.md) |
| - OpenClaw helper skill |
| (/home/papa/clawd/skills/atocore-context/) |
| - Codex skill (future) |
| - MCP server (future) |
+----------------------------------------------------+
|
| shells out to / imports
v
+----------------------------------------------------+
| Shared operator client |
| scripts/atocore_client.py |
| |
| - subcommands for stable AtoCore operations |
| - fail-open on network errors |
| - consistent JSON output across all subcommands |
| - environment-driven configuration |
| (ATOCORE_BASE_URL, ATOCORE_TIMEOUT_SECONDS, |
| ATOCORE_REFRESH_TIMEOUT_SECONDS, |
| ATOCORE_FAIL_OPEN) |
+----------------------------------------------------+
|
| HTTP
v
+----------------------------------------------------+
| AtoCore HTTP API |
| src/atocore/api/routes.py |
| |
| - the universal interface to AtoCore |
| - everything else above is glue |
+----------------------------------------------------+
```
## The non-negotiable rules
These rules are what make the layering work.
### Rule 1 — every per-agent frontend is a thin wrapper
A per-agent frontend exists to do exactly two things:
1. **Translate the agent platform's command/skill format** into an
invocation of the shared client (or a small sequence of them)
2. **Render the JSON response** into whatever shape the agent
platform wants (markdown for Claude Code, plaintext for
OpenClaw, MCP tool result for an MCP server, etc.)
Everything else — talking to AtoCore, project detection, retrieval
audit, fail-open behavior, configuration — is the **shared
client's** job.
If a per-agent frontend grows logic beyond the two responsibilities
above, that logic is in the wrong place. It belongs in the shared
client where every other frontend gets to use it.
### Rule 2 — the shared client never duplicates the API
The shared client is allowed to **compose** API calls (e.g.
`auto-context` calls `detect-project` then `context-build`), but
it never reimplements API logic. If a useful operation can't be
expressed via the existing API endpoints, the right fix is to
extend the API, not to embed the logic in the client.
This rule keeps the API as the single source of truth for what
AtoCore can do.
### Rule 3 — the shared client only exposes stable operations
A subcommand only makes it into the shared client when:
- the API endpoint behind it has been exercised by at least one
real workflow
- the request and response shapes are unlikely to change
- the operation is one that more than one frontend will plausibly
want
This rule keeps the client surface stable so frontends don't have
to chase changes. New endpoints land in the API first, get
exercised in real use, and only then get a client subcommand.
## What's in scope for the shared client today
The currently shipped scope (per `scripts/atocore_client.py`):
### Stable operations (shipped since the client was introduced)
| Subcommand | Purpose | API endpoint(s) |
|---|---|---|
| `health` | service status, mount + source readiness | `GET /health` |
| `sources` | enabled source roots and their existence | `GET /sources` |
| `stats` | document/chunk/vector counts | `GET /stats` |
| `projects` | registered projects | `GET /projects` |
| `project-template` | starter shape for a new project | `GET /projects/template` |
| `propose-project` | preview a registration | `POST /projects/proposal` |
| `register-project` | persist a registration | `POST /projects/register` |
| `update-project` | update an existing registration | `PUT /projects/{name}` |
| `refresh-project` | re-ingest a project's roots | `POST /projects/{name}/refresh` |
| `project-state` | list trusted state for a project | `GET /project/state/{name}` |
| `project-state-set` | curate trusted state | `POST /project/state` |
| `project-state-invalidate` | supersede trusted state | `DELETE /project/state` |
| `query` | raw retrieval | `POST /query` |
| `context-build` | full context pack | `POST /context/build` |
| `auto-context` | detect-project then context-build | composes `/projects` + `/context/build` |
| `detect-project` | match a prompt to a registered project | composes `/projects` + local regex |
| `audit-query` | retrieval-quality audit with classification | composes `/query` + local labelling |
| `debug-context` | last context pack inspection | `GET /debug/context` |
| `ingest-sources` | ingest configured source dirs | `POST /ingest/sources` |
### Phase 9 reflection loop (shipped after migration safety work)
These were explicitly deferred in earlier versions of this doc
pending "exercised workflow". The constraint was real — premature
API freeze would have made it harder to iterate on the ergonomics —
but the deferral ran into a bootstrap problem: you can't exercise
the workflow in real Claude Code sessions without a usable client
surface to drive it from. The fix is to ship a minimal Phase 9
surface now and treat it as stable-but-refinable: adding new
optional parameters is fine, renaming subcommands is not.
| Subcommand | Purpose | API endpoint(s) |
|---|---|---|
| `capture` | record one interaction round-trip | `POST /interactions` |
| `extract` | run the rule-based extractor (preview or persist) | `POST /interactions/{id}/extract` |
| `reinforce-interaction` | backfill reinforcement on an existing interaction | `POST /interactions/{id}/reinforce` |
| `list-interactions` | paginated list with filters | `GET /interactions` |
| `get-interaction` | fetch one interaction by id | `GET /interactions/{id}` |
| `queue` | list the candidate review queue | `GET /memory?status=candidate` |
| `promote` | move a candidate memory to active | `POST /memory/{id}/promote` |
| `reject` | mark a candidate memory invalid | `POST /memory/{id}/reject` |
All 8 Phase 9 subcommands have test coverage in
`tests/test_atocore_client.py` via mocked `request()`, including
an end-to-end test that drives the full capture → extract → queue
→ promote/reject cycle through the client.
### Coverage summary
That covers everything in the "stable operations" set AND the
full Phase 9 reflection loop: project lifecycle, ingestion,
project-state curation, retrieval, context build,
retrieval-quality audit, health and stats inspection, interaction
capture, candidate extraction, candidate review queue.
## What's intentionally NOT in scope today
Two families of operations remain deferred:
### 1. Backup and restore admin operations
Phase 9 Commit B shipped these endpoints:
- `POST /admin/backup` (with `include_chroma`)
- `GET /admin/backup` (list)
- `GET /admin/backup/{stamp}/validate`
The backup endpoints are stable, but the documented operational
procedure (`docs/backup-restore-procedure.md`) intentionally uses
direct curl rather than the shared client. The reason is that
backup operations are *administrative* and benefit from being
explicit about which instance they're targeting, with no
fail-open behavior. The shared client's fail-open default would
hide a real backup failure.
If we later decide to add backup commands to the shared client,
they would set `ATOCORE_FAIL_OPEN=false` for the duration of the
call so the operator gets a real error on failure rather than a
silent fail-open envelope.
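If that happens, the wrapper could be as small as this sketch — the env var is the real one from the client's configuration; the helper itself is hypothetical:
```python
# Hypothetical admin wrapper: run a shared-client subcommand with fail-open off.
import os
import subprocess


def run_admin_subcommand(*args: str) -> subprocess.CompletedProcess:
    env = dict(os.environ, ATOCORE_FAIL_OPEN="false")
    return subprocess.run(
        ["python", "scripts/atocore_client.py", *args],
        env=env,
        check=True,   # a backup failure should surface as a real error
    )
```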
### 2. Engineering layer entity operations
The engineering layer is in planning, not implementation. When
V1 ships per `engineering-v1-acceptance.md`, the shared client
will gain entity, relationship, conflict, and Mirror commands.
None of those exist as stable contracts yet, so they are not in
the shared client today.
## How a new agent platform integrates
When a new LLM client needs AtoCore (e.g. Codex, ChatGPT custom
GPT, a Cursor extension), the integration recipe is:
1. **Don't reimplement.** Don't write a new HTTP client. Use the
shared client.
2. **Write a thin frontend** that translates the platform's
command/skill format into a shell call to
`python scripts/atocore_client.py <subcommand> <args...>`.
3. **Render the JSON response** in the platform's preferred shape.
4. **Inherit fail-open and env-var behavior** from the shared
client. Don't override unless the platform explicitly needs
to (e.g. an admin tool that wants to see real errors).
5. **If a needed capability is missing**, propose adding it to
the shared client. If the underlying API endpoint also
doesn't exist, propose adding it to the API first. Don't
add the logic to your frontend.
The Claude Code slash command in this repo is a worked example:
~50 lines of markdown that does argument parsing, calls the
shared client, and renders the result. It contains zero AtoCore
business logic of its own.
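For a new platform, the whole frontend can stay about that small. A hedged sketch of the shell-out pattern — the `auto-context` argument handling and the rendering step are illustrative:
```python
# Hypothetical thin frontend: shell out to the shared client, render the JSON.
import json
import subprocess


def atocore(subcommand: str, *args: str) -> dict:
    """Run one shared-client subcommand and return its parsed JSON envelope."""
    result = subprocess.run(
        ["python", "scripts/atocore_client.py", subcommand, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def handle_prompt(prompt: str) -> str:
    # The shared client does project detection + context build; this
    # frontend only renders the result in the platform's preferred shape.
    pack = atocore("auto-context", prompt)
    return "## AtoCore context\n\n" + json.dumps(pack, indent=2)
```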
## How OpenClaw fits
OpenClaw's helper skill at `/home/papa/clawd/skills/atocore-context/`
on the T420 currently has its own implementation of `auto-context`,
`detect-project`, and the project lifecycle commands. It predates
this layering doc.
The right long-term shape is to **refactor the OpenClaw helper to
shell out to the shared client** instead of duplicating the
routing logic. This isn't urgent because:
- OpenClaw's helper works today and is in active use
- The duplication is on the OpenClaw side; AtoCore itself is not
affected
- The shared client and the OpenClaw helper are in different
repos (AtoCore vs OpenClaw clawd), so the refactor is a
cross-repo coordination
The refactor is queued as a follow-up. Until then, **the OpenClaw
helper and the Claude Code slash command are parallel
implementations** of the same idea. The shared client is the
canonical backbone going forward; new clients should follow the
new pattern even though the existing OpenClaw helper still has
its own.
## How this connects to the master plan
| Layer | Phase home | Status |
|---|---|---|
| AtoCore HTTP API | Phases 0/0.5/1/2/3/5/7/9 | shipped |
| Shared operator client (`scripts/atocore_client.py`) | implicitly Phase 8 (OpenClaw integration) infrastructure | shipped via codex/port-atocore-ops-client merge |
| OpenClaw helper skill (T420) | Phase 8 — partial | shipped (own implementation, refactor queued) |
| Claude Code slash command (this repo) | precursor to Phase 11 (multi-model) | shipped (refactored to use the shared client) |
| Codex skill | Phase 11 | future |
| MCP server | Phase 11 | future |
| Web UI / dashboard | Phase 11+ | future |
The shared client is the **substrate Phase 11 will build on**.
Every new client added in Phase 11 should be a thin frontend on
the shared client, not a fresh reimplementation.
## Versioning and stability
The shared client's subcommand surface is **stable**. Adding new
subcommands is non-breaking. Changing or removing existing
subcommands is breaking and would require a coordinated update
of every frontend that depends on them.
The current shared client has no explicit version constant; the
implicit contract is "the subcommands and JSON shapes documented
in this file". When the client surface meaningfully changes,
add a `CLIENT_VERSION = "x.y.z"` constant to
`scripts/atocore_client.py` and bump it per semver:
- patch: bug fixes, no surface change
- minor: new subcommands or new optional fields
- major: removed subcommands, renamed fields, changed defaults
## Open follow-ups
1. **Refactor the OpenClaw helper** to shell out to the shared
client. Cross-repo coordination, not blocking anything in
AtoCore itself. With the Phase 9 subcommands now in the shared
client, the OpenClaw refactor can reuse all the reflection-loop
work instead of duplicating it.
2. **Real-usage validation of the Phase 9 loop**, now that the
client surface exists. First capture → extract → review cycle
against the live Dalidou instance, likely via the Claude Code
slash command flow. Findings feed back into subcommand
refinement (new optional flags are fine, renames require a
semver bump).
3. **Add backup admin subcommands** if and when we decide the
shared client should be the canonical backup operator
interface (with fail-open disabled for admin commands).
4. **Add engineering-layer entity subcommands** as part of the
engineering V1 implementation sprint, per
`engineering-v1-acceptance.md`.
5. **Tag a `CLIENT_VERSION` constant** the next time the shared
client surface meaningfully changes. Today's surface with the
Phase 9 loop added is the v0.2.0 baseline (v0.1.0 was the
stable-ops-only version).
## TL;DR
- AtoCore HTTP API is the universal interface
- `scripts/atocore_client.py` is the canonical shared Python
backbone for stable AtoCore operations
- Per-agent frontends (Claude Code slash command, OpenClaw
helper, future Codex skill, future MCP server) are thin
wrappers that shell out to the shared client
- The shared client today covers project lifecycle, ingestion,
retrieval, context build, project-state, retrieval audit, AND
the full Phase 9 reflection loop (capture / extract /
reinforce / list / queue / promote / reject)
- Backup admin and engineering-entity commands remain deferred
- The OpenClaw helper is currently a parallel implementation and
the refactor to the shared client is a queued follow-up
- New LLM clients should never reimplement HTTP calls — they
follow the shell-out pattern documented here

View File

@@ -0,0 +1,462 @@
# Project Identity Canonicalization
## Why this document exists
AtoCore identifies projects by name in many places: trusted state
rows, memories, captured interactions, query/context API parameters,
extractor candidates, future engineering entities. Without an
explicit rule, every callsite would have to remember to canonicalize
project names through the registry — and the recent codex review
caught exactly the bug class that follows when one of them forgets.
The fix landed in `fb6298a` and works correctly today. This document
exists to make the rule **explicit and discoverable** so the
engineering layer V1 implementation, future entity write paths, and
any new agent integration don't reintroduce the same fragmentation
when nobody is looking.
## The contract
> **Every read/write that takes a project name MUST canonicalize it
> through `resolve_project_name()` before the value crosses a service
> boundary.**
The boundary is wherever a project name becomes a database row, a
query filter, an attribute on a stored object, or a key for any
lookup. The canonicalization happens **once**, at that boundary,
before the underlying storage primitive is called.
Symbolically:
```
HTTP layer (raw user input)
    ↓
service entry point
    ↓
project_name = resolve_project_name(project_name)   ← ONLY canonical from this point
    ↓
storage / queries / further service calls
```
The rule is intentionally simple. There's no per-call exception,
no "trust me, the caller already canonicalized it" shortcut, no
opt-out flag. Every service-layer entry point applies the helper
the moment it receives a project name from outside the service.
## The helper
```python
# src/atocore/projects/registry.py
def resolve_project_name(name: str | None) -> str:
    """Canonicalize a project name through the registry.

    Returns the canonical project_id if the input matches any
    registered project's id or alias. Returns the input unchanged
    when it's empty or not in the registry — the second case keeps
    backwards compatibility with hand-curated state, memories, and
    interactions that predate the registry, or for projects that
    are intentionally not registered.
    """
    if not name:
        return name or ""
    project = get_registered_project(name)
    if project is not None:
        return project.project_id
    return name
```
Three behaviors worth keeping in mind:
1. **Empty / None input → empty string output.** Callers don't have
to pre-check; passing `""` or `None` to a query filter still
works as "no project scope".
2. **Registered alias → canonical project_id.** The helper does the
case-insensitive lookup and returns the project's `id` field
(e.g. `"p05" → "p05-interferometer"`).
3. **Unregistered name → input unchanged.** This is the
backwards-compatibility path. Hand-curated state, memories, or
interactions created under a name that isn't in the registry
keep working. The retrieval is then "best effort" — the raw
string is used as the SQL key, which still finds the row that
was stored under the same raw string. This path exists so the
engineering layer V1 doesn't have to also be a data migration.
## Where the helper is currently called
As of `fb6298a`, the helper is invoked at exactly these eight
service-layer entry points:
| Module | Function | What gets canonicalized |
|---|---|---|
| `src/atocore/context/builder.py` | `build_context` | the `project_hint` parameter, before the trusted state lookup |
| `src/atocore/context/project_state.py` | `set_state` | `project_name`, before `ensure_project()` |
| `src/atocore/context/project_state.py` | `get_state` | `project_name`, before the SQL lookup |
| `src/atocore/context/project_state.py` | `invalidate_state` | `project_name`, before the SQL lookup |
| `src/atocore/interactions/service.py` | `record_interaction` | `project`, before insert |
| `src/atocore/interactions/service.py` | `list_interactions` | `project` filter parameter, before WHERE clause |
| `src/atocore/memory/service.py` | `create_memory` | `project`, before insert |
| `src/atocore/memory/service.py` | `get_memories` | `project` filter parameter, before WHERE clause |
Every one of those is the **first** thing the function does after
input validation. There is no path through any of those eight
functions where a project name reaches storage without passing
through `resolve_project_name`.
## Where the helper is NOT called (and why that's correct)
These places intentionally do not canonicalize:
1. **`update_memory`'s project field.** The API does not allow
changing a memory's project after creation, so there's no
project to canonicalize. The function only updates `content`,
`confidence`, and `status`.
2. **The retriever's `_project_match_boost` substring matcher.** It
already calls `get_registered_project` internally to expand the
hint into the candidate set (canonical id + all aliases + last
path segments). It accepts the raw hint by design.
3. **`_rank_chunks`'s secondary substring boost in
`builder.py`.** Still uses the raw hint. This is a multiplicative
factor on top of correct retrieval, not a filter, so it cannot
drop relevant chunks. Tracked as a future cleanup but not
critical.
4. **Direct SQL queries for the projects table itself** (e.g.
`ensure_project`'s lookup). These are intentional case-insensitive
raw lookups against the column the canonical id is stored in.
`set_state` already canonicalized before reaching `ensure_project`,
so the value passed is the canonical id by definition.
5. **Hand-authored project names that aren't in the registry.**
The helper returns those unchanged. This is the backwards-compat
path mentioned above; it is *not* a violation of the rule, it's
the rule applied to a name with no registry record.
## Why this is the trust hierarchy in action
The whole point of AtoCore is the trust hierarchy from the operating
model:
1. Trusted Project State (Layer 3) is the most authoritative layer
2. Memories (active) are second
3. Source chunks (raw retrieved content) are last
If a caller passes the alias `p05` and Layer 3 was written under
`p05-interferometer`, and the lookup fails to find the canonical
row, **the trust hierarchy collapses**. The most-authoritative
layer is silently invisible to the caller. The system would still
return *something* — namely, lower-trust retrieved chunks — and the
human would never know they got a degraded answer.
The canonicalization helper is what makes the trust hierarchy
**dependable**. Layer 3 is supposed to win every time. To win it
has to be findable. To be findable, the lookup key has to match
how the row was stored. And the only way to guarantee that match
across every entry point is to canonicalize at every boundary.
## Compatibility gap: legacy alias-keyed rows
The canonicalization rule fixes new writes going forward, but it
does NOT fix rows that were already written under a registered
alias before `fb6298a` landed. Those rows have a real, concrete
gap that must be closed by a one-time migration before the
engineering layer V1 ships.
The exact failure mode:
```
time T0 (before fb6298a):
    POST /project/state {project: "p05", ...}
        -> set_state("p05", ...)                # no canonicalization
        -> ensure_project("p05")                # creates a "p05" row
        -> writes state with project_id pointing at the "p05" row

time T1 (after fb6298a):
    POST /project/state {project: "p05", ...}   (or any read)
        -> set_state("p05", ...)
        -> resolve_project_name("p05") -> "p05-interferometer"
        -> ensure_project("p05-interferometer") # creates a SECOND row
        -> writes new state under the canonical row
        -> the T0 state is still in the "p05" row, INVISIBLE to every
           canonicalized read
```
The unregistered-name fallback path saves you when the project was
never in the registry: a row stored under `"orphan-project"` is read
back via `"orphan-project"`, both pass through `resolve_project_name`
unchanged, and the strings line up. **It does not save you when the
name is a registered alias** — the helper rewrites the read key but
not the storage key, and the legacy row becomes invisible.
What is at risk on the live Dalidou DB:
1. **`projects` table**: any rows whose `name` column matches a
registered alias (one row per alias actually written under
before the fix landed). These shadow the canonical project row
and silently fragment the projects namespace.
2. **`project_state` table**: any rows whose `project_id` points
at one of those shadow project rows. **This is the highest-risk
case** because it directly defeats the trust hierarchy: Layer 3
trusted state becomes invisible to every canonicalized lookup.
3. **`memories` table**: any rows whose `project` column is a
registered alias. Reinforcement and extraction queries will
miss them.
4. **`interactions` table**: any rows whose `project` column is a
registered alias. Listing and downstream reflection will miss
them.
How to find out the actual blast radius on the live Dalidou DB:
```sql
-- inspect the projects table for alias-shadow rows
SELECT id, name FROM projects;
-- count alias-keyed memories per known alias
SELECT project, COUNT(*) FROM memories
WHERE project IN ('p04','p05','p06','gigabit','interferometer','polisher','ato core')
GROUP BY project;
-- count alias-keyed interactions
SELECT project, COUNT(*) FROM interactions
WHERE project IN ('p04','p05','p06','gigabit','interferometer','polisher','ato core')
GROUP BY project;
-- count alias-shadowed project_state rows by project name
SELECT p.name, COUNT(*) FROM project_state ps
JOIN projects p ON ps.project_id = p.id
WHERE p.name IN ('p04','p05','p06','gigabit','interferometer','polisher','ato core')
GROUP BY p.name;
```
The migration that closes the gap has to:
1. For each registered project, find all `projects` rows whose
name matches one of the project's aliases AND is not the
canonical id itself. These are the "shadow" rows.
2. For each shadow row, MERGE its dependent state into the
canonical project's row:
- rekey `project_state.project_id` from shadow → canonical
- if the merge would create a `(project_id, category, key)`
collision (a state row already exists under the canonical
id with the same category+key), the migration must surface
the conflict via the existing conflict model and pause
until the human resolves it
- delete the now-empty shadow `projects` row
3. For `memories` and `interactions`, the fix is simpler because
the alias appears as a string column (not a foreign key):
`UPDATE memories SET project = canonical WHERE project = alias`,
then same for interactions.
4. The migration must run in dry-run mode first, printing the
exact rows it would touch and the canonical destinations they
would be merged into.
5. The migration must be idempotent — running it twice produces
the same final state as running it once.
This work is **required before the engineering layer V1 ships**
because V1 will add new `entities`, `relationships`, `conflicts`,
and `mirror_regeneration_failures` tables that all key on the
canonical project id. Any leaked alias-keyed rows in the existing
tables would show up in V1 reads as silently missing data, and
the killer-correctness queries from `engineering-query-catalog.md`
(orphan requirements, decisions on flagged assumptions,
unsupported claims) would report wrong results against any project
that has shadow rows.
The migration script does NOT exist yet. The open follow-ups
section below tracks it as the next concrete step.
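A minimal sketch of the dry-run half for the string-keyed tables — registry contents are hard-coded here purely for illustration, and the `projects`/`project_state` merge is the harder half that needs the conflict-model handling from step 2:
```python
# Hypothetical dry-run for alias-keyed memories/interactions rows.
import sqlite3

# (canonical id -> aliases); would come from the project registry in reality.
REGISTERED = {
    "p05-interferometer": ["p05", "interferometer"],
}


def dry_run(conn: sqlite3.Connection) -> None:
    """Print every alias-keyed row the real migration would rewrite."""
    for canonical, aliases in REGISTERED.items():
        for alias in aliases:
            for table in ("memories", "interactions"):
                (count,) = conn.execute(
                    f"SELECT COUNT(*) FROM {table} WHERE project = ?", (alias,)
                ).fetchone()
                if count:
                    print(f"{table}: {count} rows under alias {alias!r} "
                          f"-> would become {canonical!r}")
```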
## The rule for new entry points
When you add a new service-layer function that takes a project name,
follow this checklist:
1. **Does the function read or write a row keyed by project?** If
yes, you must call `resolve_project_name`. If no (e.g. it only
takes `project` as a label for logging), you may skip the
canonicalization but you should add a comment explaining why.
2. **Where does the canonicalization go?** As the first statement
after input validation. Not later, not "before storage", not
"in the helper that does the actual write". As the first
statement, so any subsequent service call inside the function
sees the canonical value.
3. **Add a regression test that uses an alias.** Use the
`project_registry` fixture from `tests/conftest.py` to set up
a temp registry with at least one project + aliases, then
verify the new function works when called with the alias and
when called with the canonical id.
4. **If the function can be called with `None` or empty string,
verify that path too.** The helper handles it correctly but
the function-under-test might not.
## How the `project_registry` test fixture works
`tests/conftest.py::project_registry` returns a callable that
takes one or more `(project_id, [aliases])` tuples (or just a bare
`project_id` string), writes them into a temp registry file,
points `ATOCORE_PROJECT_REGISTRY_PATH` at it, and reloads
`config.settings`. Use it like:
```python
def test_my_new_thing_canonicalizes(project_registry):
    project_registry(("p05-interferometer", ["p05", "interferometer"]))
    # ... call your service function with "p05" ...
    # ... assert it works the same as if you'd passed "p05-interferometer" ...
```
The fixture is reused by all 12 alias-canonicalization regression
tests added in `fb6298a`. Following the same pattern for new
features is the cheapest way to keep the contract intact.
## What this rule does NOT cover
1. **Alias creation / management.** This document is about reading
and writing project-keyed data. Adding new projects or new
aliases is the registry's own write path
(`POST /projects/register`, `PUT /projects/{name}`), which
already enforces collision detection and atomic file writes.
2. **Registry hot-reloading.** The helper calls
`load_project_registry()` on every invocation, which reads the
JSON file each time. There is no in-process cache. If the
registry file changes, the next call sees the new contents.
Performance is fine for the current registry size but if it
becomes a bottleneck, add a versioned cache here, not at every
call site.
3. **Cross-project deduplication.** If two different projects in
the registry happen to share an alias, the registry's collision
detection blocks the second one at registration time, so this
case can't arise in practice. The helper does not handle it
defensively.
4. **Time-bounded canonicalization.** A project's canonical id is
stable. Aliases can be added or removed via
`PUT /projects/{name}`, but the canonical `id` field never
changes after registration. So a row written today under the
canonical id will always remain findable under that id, even
if the alias set evolves.
5. **Migration of legacy data.** If the live Dalidou DB has rows
that were written under aliases before the canonicalization
landed (e.g. a `memories` row with `project = "p05"` from
before `fb6298a`), those rows are **NOT** automatically
reachable from the canonicalized read path. The unregistered-
name fallback only helps for project names that were never
registered at all; it does **NOT** help for names that are
registered as aliases. See the "Compatibility gap" section
above for the exact failure mode and the migration path that
has to run before the engineering layer V1 ships.
## What this enables for the engineering layer V1
When the engineering layer ships per `engineering-v1-acceptance.md`,
it adds at least these new project-keyed surfaces:
- `entities` table with a `project_id` column
- `relationships` table that joins entities, indirectly project-keyed
- `conflicts` table with a `project` column
- `mirror_regeneration_failures` table with a `project` column
- new endpoints: `POST /entities/...`, `POST /ingest/kb-cad/export`,
`POST /ingest/kb-fem/export`, `GET /mirror/{project}/...`,
`GET /conflicts?project=...`
**Every one of those write/read paths needs to call
`resolve_project_name` at its service-layer entry point**, following
the same pattern as the eight existing call sites listed above. The
implementation sprint should:
1. Apply the helper at each new service entry point as the first
statement after input validation
2. Add a regression test using the `project_registry` fixture that
exercises an alias against each new entry point
3. Treat any new service function that takes a project name without
calling `resolve_project_name` as a code review failure
The pattern is simple enough to follow without thinking, which is
exactly the property we want for a contract that has to hold
across many independent additions.
## Open follow-ups
These are things the canonicalization story still has open. None
are blockers, but they're the rough edges to be aware of.
1. **Legacy alias data migration — REQUIRED before engineering V1
ships, NOT optional.** If the live Dalidou DB has any rows
written under aliases before `fb6298a` landed, they are
silently invisible to the canonicalized read path (see the
"Compatibility gap" section above for the exact failure mode).
This is a real correctness issue, not a theoretical one: any
trusted state, memory, or interaction stored under `p05`,
`gigabit`, `polisher`, etc. before the fix landed is currently
unreachable from any service-layer query. The migration script
has to walk `projects`, `project_state`, `memories`, and
`interactions`, merge shadow rows into their canonical
counterparts (with conflict-model handling for any collisions),
and run in dry-run mode first. Estimated cost: ~150 LOC for
the migration script + ~50 LOC of tests + a one-time supervised
run on the live Dalidou DB. **This migration is the next
concrete pre-V1 step.**
2. **Registry file caching.** `load_project_registry()` reads the
JSON file on every `resolve_project_name` call. With ~5
projects this is fine; with 50+ it would warrant a versioned
cache (cache key = file mtime + size). Defer until measured.
3. **Case sensitivity audit.** The helper uses
`get_registered_project` which lowercases for comparison. The
stored canonical id keeps its original casing. No known bug
today (the existing tests pass), but worth re-confirming when the
engineering layer adds entity-side storage.
4. **The secondary substring boost in `_rank_chunks`.** Mentioned
earlier; it still uses the raw hint. Replace it with the same
helper-driven approach the retriever uses, OR delete it as
redundant once we confirm the retriever's primary boost is
sufficient.
5. **Documentation discoverability.** This doc lives under
`docs/architecture/`. The contract is also restated in the
docstring of `resolve_project_name` and referenced from each
call site's comment. That redundancy is intentional — the
contract is too easy to forget to live in only one place.
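As a companion to follow-up 1, here is a minimal sketch of what the
dry-run pass of the legacy-alias migration could look like. The table
list, the `rowid`-based reporting, and the assumption that each table
exposes a `project` column are simplifications for illustration; the
real script also needs the merge pass with conflict-model handling,
which this sketch only identifies work for.
```python
import sqlite3

from atocore.projects.registry import resolve_project_name

# Tables whose `project` column may still hold pre-fb6298a alias values.
# The `projects` table itself needs its own shadow-row merge and is not
# covered by this sketch.
TABLES_WITH_PROJECT = ["project_state", "memories", "interactions"]


def find_shadow_rows(db_path: str) -> dict[str, list[tuple[int, str, str]]]:
    """Dry-run pass: report rows stored under a registered alias.

    Returns {table: [(rowid, stored_name, canonical_id), ...]} without
    modifying anything; the merge pass is a separate, supervised step.
    """
    report: dict[str, list[tuple[int, str, str]]] = {}
    conn = sqlite3.connect(db_path)
    try:
        for table in TABLES_WITH_PROJECT:
            rows = conn.execute(
                f"SELECT rowid, project FROM {table} WHERE project != ''"
            ).fetchall()
            shadows = []
            for rowid, stored in rows:
                canonical = resolve_project_name(stored)
                if canonical != stored:
                    # Stored under an alias: invisible to the canonicalized
                    # read path until merged under the canonical id.
                    shadows.append((rowid, stored, canonical))
            if shadows:
                report[table] = shadows
    finally:
        conn.close()
    return report
```
Running this read-only pass against a copy of the live Dalidou DB is
the cheapest way to find out whether the compatibility gap is
theoretical or real before writing the merge logic.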
## Quick reference card
Copy-pasteable for new service functions:
```python
from atocore.projects.registry import resolve_project_name


def my_new_service_entry_point(
    project_name: str,
    other_args: ...,
) -> ...:
    # Validate inputs first
    if not project_name:
        raise ValueError("project_name is required")

    # Canonicalize through the registry as the first thing after
    # validation. Every subsequent operation in this function uses
    # the canonical id, so storage and queries are guaranteed
    # consistent across alias and canonical-id callers.
    project_name = resolve_project_name(project_name)

    # ... rest of the function ...
```
## TL;DR
- One helper, one rule: `resolve_project_name` at every service-layer
entry point that takes a project name
- Currently called in 8 places across builder, project_state,
interactions, and memory; all 8 listed in this doc
- Backwards-compat path returns **unregistered** names unchanged
(e.g. `"orphan-project"`); this does NOT cover **registered
alias** names that were used as storage keys before `fb6298a`
- **Real compatibility gap**: any row whose `project` column is a
registered alias from before the canonicalization landed is
silently invisible to the new read path. A one-time migration
is required before engineering V1 ships. See the "Compatibility
gap" section.
- The trust hierarchy depends on this helper being applied
everywhere — Layer 3 trusted state has to be findable for it to
win the trust battle
- Use the `project_registry` test fixture to add regression tests
for any new service function that takes a project name
- The engineering layer V1 implementation must follow the same
pattern at every new service entry point
- Open follow-ups (in priority order): **legacy alias data
migration (required pre-V1)**, redundant substring boost
cleanup, registry caching when projects scale

View File

@@ -0,0 +1,273 @@
# Representation Authority (canonical home matrix)
## Why this document exists
The same fact about an engineering project can show up in many
places: a markdown note in the PKM, a structured field in KB-CAD,
a commit message in a Gitea repo, an active memory in AtoCore, an
entity in the engineering layer, a row in trusted project state.
**Without an explicit rule about which representation is
authoritative for which kind of fact, the system will accumulate
contradictions and the human will lose trust in all of them.**
This document is the canonical-home matrix. Every kind of fact
that AtoCore handles has exactly one authoritative representation,
and every other place that holds a copy of that fact is, by
definition, a derived view that may be stale.
## The representations in scope
Six places where facts can live in this ecosystem:
| Layer | What it is | Who edits it | How it's structured |
|---|---|---|---|
| **PKM** | Antoine's Obsidian-style markdown vault under `/srv/storage/atocore/sources/vault/` | Antoine, by hand | unstructured markdown with optional frontmatter |
| **KB project** | the engineering Knowledge Base (KB-CAD / KB-FEM repos and any companion docs) | Antoine, semi-structured | per-tool typed records |
| **Gitea repos** | source code repos under `dalidou:3000/Antoine/*` (Fullum-Interferometer, polisher-sim, ATOCore itself, ...) | Antoine via git commits | code, READMEs, repo-specific markdown |
| **AtoCore memories** | rows in the `memories` table | hand-authored or extracted from interactions | typed (identity / preference / project / episodic / knowledge / adaptation) |
| **AtoCore entities** | rows in the `entities` table (V1, not yet built) | imported from KB exports or extracted from interactions | typed entities + relationships per the V1 ontology |
| **AtoCore project state** | rows in the `project_state` table (Layer 3, trusted) | hand-curated only, never automatic | category + key + value |
## The canonical home rule
> For each kind of fact, exactly one of the six representations is
> the authoritative source. The other five may hold derived
> copies, but they are not allowed to disagree with the
> authoritative one. When they disagree, the disagreement is a
> conflict and surfaces via the conflict model.
The matrix below assigns the authoritative representation per fact
kind. It is the practical answer to the question "where does this
fact actually live?" for daily decisions.
## The canonical-home matrix
| Fact kind | Canonical home | Why | How it gets into AtoCore |
|---|---|---|---|
| **CAD geometry** (the actual model) | NX (or successor CAD tool) | the only place that can render and validate it | not in AtoCore at all in V1 |
| **CAD-side structure** (subsystem tree, component list, materials, parameters) | KB-CAD | KB-CAD is the structured wrapper around NX | KB-CAD export → `/ingest/kb-cad/export` → entities |
| **FEM mesh & solver settings** | KB-FEM (wrapping the FEM tool) | only the solver representation can run | not in AtoCore at all in V1 |
| **FEM results & validation outcomes** | KB-FEM | KB-FEM owns the outcome records | KB-FEM export → `/ingest/kb-fem/export` → entities |
| **Source code** | Gitea repos | repos are version-controlled and reviewable | indirectly via repo markdown ingestion (Phase 1) |
| **Repo-level documentation** (READMEs, design docs in the repo) | Gitea repos | lives next to the code it documents | ingested as source chunks; never hand-edited in AtoCore |
| **Project-level prose notes** (decisions in long-form, journal-style entries, working notes) | PKM | the place Antoine actually writes when thinking | ingested as source chunks; the extractor proposes candidates from these for the review queue |
| **Identity** ("the user is a mechanical engineer running AtoCore") | AtoCore memories (`identity` type) | nowhere else holds personal identity | hand-authored via `POST /memory` or extracted from interactions |
| **Preference** ("prefers small reviewable diffs", "uses SI units") | AtoCore memories (`preference` type) | nowhere else holds personal preferences | hand-authored or extracted |
| **Episodic** ("on April 6 we debugged the EXDEV bug") | AtoCore memories (`episodic` type) | nowhere else has time-bound personal recall | extracted from captured interactions |
| **Decision** (a structured engineering decision) | AtoCore **entities** (Decision) once the engineering layer ships; AtoCore memories (`adaptation`) until then | needs structured supersession, audit trail, and link to affected components | extracted from PKM or interactions; promoted via review queue |
| **Requirement** | AtoCore **entities** (Requirement) | needs structured satisfaction tracking | extracted from PKM, KB-CAD, or interactions |
| **Constraint** | AtoCore **entities** (Constraint) | needs structured link to the entity it constrains | extracted from PKM, KB-CAD, or interactions |
| **Validation claim** | AtoCore **entities** (ValidationClaim) | needs structured link to supporting Result | extracted from KB-FEM exports or interactions |
| **Material** | KB-CAD if the material is on a real component; AtoCore entity (Material) if it's a project-wide material decision not yet attached to geometry | structured properties live in KB-CAD's material database | KB-CAD export, or hand-authored as a Material entity |
| **Parameter** | KB-CAD or KB-FEM depending on whether it's a geometry or solver parameter; AtoCore entity (Parameter) if it's a higher-level project parameter not in either tool | structured numeric values with units live in their tool of origin | KB export, or hand-authored |
| **Project status / current focus / next milestone** | AtoCore **project_state** (Layer 3) | the trust hierarchy says trusted state is the highest authority for "what is the current state of the project" | hand-curated via `POST /project/state` |
| **Architectural decision records (ADRs)** | depends on form: long-form ADR markdown lives in the repo; the structured fact about which ADR was selected lives in the AtoCore Decision entity | both representations are useful for different audiences | repo ingestion provides the prose; the entity is created by extraction or hand-authored |
| **Operational runbooks** | repo (next to the code they describe) | lives with the system it operates | not promoted into AtoCore entities — runbooks are reference material, not facts |
| **Backup metadata** (snapshot timestamps, integrity status) | the backup-metadata.json files under `/srv/storage/atocore/backups/` | each snapshot is its own self-describing record | not in AtoCore's database; queried via the `/admin/backup` endpoints |
| **Conversation history with AtoCore (interactions)** | AtoCore `interactions` table | nowhere else has the prompt + context pack + response triple | written by capture (Phase 9 Commit A) |
## The supremacy rule for cross-layer facts
When the same fact has copies in multiple representations and they
disagree, the trust hierarchy applies in this order:
1. **AtoCore project_state** (Layer 3) is highest authority for any
"current state of the project" question. This is why it requires
manual curation and never gets touched by automatic processes.
2. **The tool-of-origin canonical home** is highest authority for
facts that are tool-managed: KB-CAD wins over AtoCore entities
for CAD-side structure facts; KB-FEM wins for FEM result facts.
3. **AtoCore entities** are highest authority for facts that are
AtoCore-managed: Decisions, Requirements, Constraints,
ValidationClaims (when the supporting Results are still loose).
4. **Active AtoCore memories** are highest authority for personal
facts (identity, preference, episodic).
5. **Source chunks (PKM, repos, ingested docs)** are lowest
authority — they are the raw substrate from which higher layers
are extracted, but they may be stale, contradictory among
themselves, or out of date.
This is the same hierarchy enforced by `conflict-model.md`. This
document just makes it explicit per fact kind.
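As a toy illustration of that ordering (not AtoCore code; the layer
labels are just names for this sketch), resolving a disagreement is
nothing more than taking the candidate with the lowest rank:
```python
# Rank per representation layer; lower rank = higher authority.
SUPREMACY_RANK = {
    "project_state": 0,   # Layer 3 trusted state
    "tool_of_origin": 1,  # KB-CAD / KB-FEM for tool-managed facts
    "entities": 2,        # AtoCore-managed structured facts
    "memories": 3,        # active personal memories
    "source_chunks": 4,   # PKM / repos / ingested docs
}


def pick_winner(candidates: list[tuple[str, str]]) -> tuple[str, str]:
    """candidates = [(layer, value), ...]; return the authoritative pair."""
    return min(candidates, key=lambda pair: SUPREMACY_RANK[pair[0]])
```
Example 1 below walks through exactly this resolution for the lateral
support pad material.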
## Examples
### Example 1 — "what material does the lateral support pad use?"
Possible representations:
- KB-CAD has the field `component.lateral-support-pad.material = "GF-PTFE"`
- A PKM note from last month says "considering PEEK for the
lateral support, GF-PTFE was the previous choice"
- An AtoCore Material entity says `GF-PTFE`
- An AtoCore project_state entry says `p05 / decision /
lateral_support_material = GF-PTFE`
Which one wins for the question "what's the current material"?
- **project_state wins** if the query is "what is the current
trusted answer for p05's lateral support material" (Layer 3)
- **KB-CAD wins** if project_state has not been curated for this
field yet, because KB-CAD is the canonical home for CAD-side
structure
- **The Material entity** is a derived view from KB-CAD; if it
disagrees with KB-CAD, the entity is wrong and a conflict is
surfaced
- **The PKM note** is historical context, not authoritative for
"current"
### Example 2 — "did we decide to merge the bind mounts?"
Possible representations:
- A working session interaction is captured in the `interactions`
table with the response containing `## Decision: merge the two
bind mounts into one`
- The Phase 9 Commit C extractor produced a candidate adaptation
memory from that decision
- A reviewer promoted the candidate to active
- The AtoCore source repo has the actual code change in commit
`d0ff8b5` and the docker-compose.yml is in its post-merge form
Which one wins for "is this decision real and current"?
- **The Gitea repo** wins for "is this decision implemented" —
the docker-compose.yml is the canonical home for the actual
bind mount configuration
- **The active adaptation memory** wins for "did we decide this"
— that's exactly what the Commit C lifecycle is for
- **The interaction record** is the audit trail — it's
authoritative for "when did this conversation happen and what
did the LLM say", but not for "is this decision current"
- **The source chunks** from PKM are not relevant here because no
PKM note about this decision exists yet (and that's fine —
decisions don't have to live in PKM if they live in the repo
and the AtoCore memory)
### Example 3 — "what's p05's current next focus?"
Possible representations:
- The PKM has a `current-status.md` note updated last week
- AtoCore project_state has `p05 / status / next_focus = "wave 2 ingestion"`
- A captured interaction from yesterday discussed the next focus
at length
Which one wins?
- **project_state wins**, full stop. The trust hierarchy says
Layer 3 is canonical for current state. This is exactly the
reason project_state exists.
- The PKM note is historical context.
- The interaction is conversation history.
- If project_state and the PKM disagree, the human updates one or
the other to bring them in line — usually by re-curating
project_state if the conversation revealed a real change.
## What this means for the engineering layer V1 implementation
Several concrete consequences fall out of the matrix:
1. **The Material and Parameter entity types are mostly KB-CAD
shadows in V1.** They exist in AtoCore so other entities
(Decisions, Requirements) can reference them with structured
links, but their authoritative values come from KB-CAD imports.
If KB-CAD doesn't know about a material, the AtoCore entity is
the canonical home only because nothing else is.
2. **Decisions / Requirements / Constraints / ValidationClaims
are AtoCore-canonical.** These don't have a natural home in
KB-CAD or KB-FEM. They live in AtoCore as first-class entities
with full lifecycle and supersession.
3. **The PKM is never authoritative.** It is the substrate for
extraction. The reviewer promotes things out of it; they don't
point at PKM notes as the "current truth".
4. **project_state is the override layer.** Whenever the human
wants to declare "the current truth is X regardless of what
the entities and memories and KB exports say", they curate
into project_state. Layer 3 is intentionally small and
intentionally manual.
5. **The conflict model is the enforcement mechanism.** When two
representations disagree on a fact whose canonical home rule
should pick a winner, the conflict surfaces via the
`/conflicts` endpoint and the reviewer resolves it. The
matrix in this document tells the reviewer who is supposed
to win in each scenario; they're not making the decision blind.
## What the matrix does NOT define
1. **Facts about people other than the user.** No "team member"
entity, no per-collaborator preferences. AtoCore is
single-user in V1.
2. **Facts about AtoCore itself as a project.** Those are project
memories and project_state entries under `project=atocore`,
same lifecycle as any other project's facts.
3. **Vendor / supplier / cost facts.** Out of V1 scope.
4. **Time-bounded facts** (a value that was true between two
dates and may not be true now). The current matrix treats all
active facts as currently-true and uses supersession to
represent change. Temporal facts are a V2 concern.
5. **Cross-project shared facts** (a Material that is reused across
p04, p05, and p06). Currently each project has its own copy.
Cross-project deduplication is also a V2 concern.
## The "single canonical home" invariant in practice
The hard rule that every fact has exactly one canonical home is
the load-bearing invariant of this matrix. To enforce it
operationally:
- **Extraction never duplicates.** When the extractor scans an
interaction or a source chunk and proposes a candidate, the
candidate is dropped if it duplicates an already-active record
in the canonical home (the existing extractor implementation
already does this for memories; the entity extractor will
follow the same pattern).
- **Imports never duplicate.** When KB-CAD pushes the same
Component twice with the same value, the second push is
recognized as identical and updates the `last_imported_at`
timestamp without creating a new entity.
- **Imports surface drift as conflict.** When KB-CAD pushes the
same Component with a different value, that's a conflict per
the conflict model — never a silent overwrite.
- **Hand-curation into project_state always wins.** A
project_state entry can disagree with an entity or a KB
export; the project_state entry is correct by fiat (Layer 3
trust), and the reviewer is responsible for bringing the lower
layers in line if appropriate.
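A minimal sketch of the import-side check the first three rules imply,
using in-memory stand-ins for the entities, review queue, and conflicts
tables (the record shape and field names are assumptions for
illustration, not the V1 schema):
```python
# In-memory stand-ins so the sketch runs on its own; the real code reads
# and writes the entities, review-queue, and conflicts tables instead.
ACTIVE: dict[str, dict] = {}                 # external id -> active entity
CANDIDATES: list[dict] = []                  # pending review-queue candidates
CONFLICTS: list[tuple[dict, dict]] = []      # (active, candidate) pairs


def ingest_exported_record(record: dict) -> str:
    """Apply the no-duplicate / drift-as-conflict rules to one export
    record and report what happened."""
    existing = ACTIVE.get(record["id"])
    if existing is None:
        CANDIDATES.append(record)        # new fact: review queue, not auto-active
        return "new_candidate"
    if existing["value"] == record["value"]:
        existing["last_imported_at"] = record["exported_at"]   # identical re-push
        return "unchanged"
    CANDIDATES.append(record)            # drift: keep both versions
    CONFLICTS.append((existing, record)) # reviewer resolves via /conflicts
    return "conflict"
```
The key property is the last branch: drift always produces a second
record plus an open conflict, never an in-place update of the active
entity.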
## Open questions for V1 implementation
1. **How does the reviewer see the canonical home for a fact in
the UI?** Probably by including the fact's authoritative
layer in the entity / memory detail view: "this Material is
currently mirrored from KB-CAD; the canonical home is KB-CAD".
2. **Who owns running the KB-CAD / KB-FEM exporter?** The
`tool-handoff-boundaries.md` doc lists this as an open
question; same answer applies here.
3. **Do we need an explicit `canonical_home` field on entity
rows?** A field that records "this entity is canonical here"
vs "this entity is a mirror of <external system>". Probably
yes; deferred to the entity schema spec.
4. **How are project_state overrides surfaced in the engineering
layer query results?** When a query (e.g. Q-001 "what does
this subsystem contain?") would return entity rows, the result
should also flag any project_state entries that contradict the
entities — letting the reviewer see the override at query
time, not just in the conflict queue.
## TL;DR
- Six representation layers: PKM, KB project, repos, AtoCore
memories, AtoCore entities, AtoCore project_state
- Every fact kind has exactly one canonical home
- The trust hierarchy resolves cross-layer conflicts:
project_state > tool-of-origin (KB-CAD/KB-FEM) > entities >
active memories > source chunks
- Decisions / Requirements / Constraints / ValidationClaims are
AtoCore-canonical (no other system has a natural home for them)
- Materials / Parameters / CAD-side structure are KB-CAD-canonical
- FEM results / validation outcomes are KB-FEM-canonical
- project_state is the human override layer, top of the
hierarchy, manually curated only
- Conflicts surface via `/conflicts` and the reviewer applies the
matrix to pick a winner

View File

@@ -0,0 +1,339 @@
# Tool Hand-off Boundaries (KB-CAD / KB-FEM and friends)
## Why this document exists
The engineering layer V1 will accumulate typed entities about
projects, subsystems, components, materials, requirements,
constraints, decisions, parameters, analysis models, results, and
validation claims. Many of those concepts also live in real
external tools — CAD systems, FEM solvers, BOM managers, PLM
databases, vendor portals.
The first big design decision before writing any entity-layer code
is: **what is AtoCore's read/write relationship with each of those
external tools?**
The wrong answer in either direction is expensive:
- Too read-only: AtoCore becomes a stale shadow of the tools and
loses the trust battle the moment a value drifts.
- Too bidirectional: AtoCore takes on responsibilities it can't
reliably honor (live sync, conflict resolution against external
schemas, write-back validation), and the project never ships.
This document picks a position for V1.
## The position
> **AtoCore is a one-way mirror in V1.** External tools push
> structured exports into AtoCore. AtoCore never pushes back.
That position has three corollaries:
1. **External tools remain the source of truth for everything they
already manage.** A CAD model is canonical for geometry; a FEM
project is canonical for meshes and solver settings; KB-CAD is
canonical for whatever KB-CAD already calls canonical.
2. **AtoCore is the source of truth for the *AtoCore-shaped*
record** of those facts: the Decision that selected the geometry,
the Requirement the geometry satisfies, the ValidationClaim the
FEM result supports. AtoCore does not duplicate the external
tool's primary representation; it stores the structured *facts
about* it.
3. **The boundary is enforced by absence.** No write endpoint in
AtoCore ever generates a `.prt`, a `.fem`, an export to a PLM
schema, or a vendor purchase order. If we find ourselves wanting
to add such an endpoint in V1, we should stop and reconsider
the V1 scope.
## Why one-way and not bidirectional
Bidirectional sync between independent systems is one of the
hardest problems in engineering software. The honest reasons we
are not attempting it in V1:
1. **Schema drift.** External tools evolve their schemas
independently. A bidirectional sync would have to track every
schema version of every external tool we touch. That is a
permanent maintenance tax.
2. **Conflict semantics.** When AtoCore and an external tool
disagree on the same field, "who wins" is a per-tool, per-field
decision. There is no general rule. Bidirectional sync would
require us to specify that decision exhaustively.
3. **Trust hierarchy.** AtoCore's whole point is the trust
hierarchy: trusted project state > entities > memories. If we
let entities push values back into the external tools, we
silently elevate AtoCore's confidence to "high enough to write
to a CAD model", which it almost never deserves.
4. **Velocity.** A bidirectional engineering layer is a
multi-year project. A one-way mirror is a months project. The
value-to-effort ratio favors one-way for V1 by an enormous
margin.
5. **Reversibility.** We can always add bidirectional sync later
on a per-tool basis once V1 has shown itself to be useful. We
cannot easily walk back a half-finished bidirectional sync that
has already corrupted data in someone's CAD model.
## Per-tool stance for V1
| External tool | V1 stance | What AtoCore reads in | What AtoCore writes back |
|---|---|---|---|
| **KB-CAD** (Antoine's CAD knowledge base) | one-way mirror | structured exports of subsystems, components, materials, parameters via a documented JSON or CSV shape | nothing |
| **KB-FEM** (Antoine's FEM knowledge base) | one-way mirror | structured exports of analysis models, results, validation claims | nothing |
| **NX / Siemens NX** (the CAD tool itself) | not connected in V1 | nothing direct — only what KB-CAD exports about NX projects | nothing |
| **PKM (Obsidian / markdown vault)** | already connected via the ingestion pipeline (Phase 1) | full markdown/text corpus per the ingestion-waves doc | nothing |
| **Gitea repos** | already connected via the ingestion pipeline | repo markdown/text per project | nothing |
| **OpenClaw** (the LLM agent) | already connected via the read-only helper skill on the T420 | nothing — OpenClaw reads from AtoCore | nothing — OpenClaw does not write into AtoCore |
| **AtoDrive** (operational truth layer, future) | future: bidirectional with AtoDrive itself, but AtoDrive is internal to AtoCore so this isn't an external tool boundary | n/a in V1 | n/a in V1 |
| **PLM / vendor portals / cost systems** | not in V1 scope | nothing | nothing |
## What "one-way mirror" actually looks like in code
AtoCore exposes an ingestion endpoint per external tool that
accepts a structured export and turns it into entity candidates.
The endpoint is read-side from AtoCore's perspective (it reads
from a file or HTTP body), even though the external tool is the
one initiating the call.
Proposed V1 ingestion endpoints:
```
POST /ingest/kb-cad/export body: KB-CAD export JSON
POST /ingest/kb-fem/export body: KB-FEM export JSON
```
Each endpoint:
1. Validates the export against the documented schema
2. Maps each export record to an entity candidate (status="candidate")
3. Carries the export's source identifier into the candidate's
provenance fields (source_artifact_id, exporter_version, etc.)
4. Returns a summary: how many candidates were created, how many
were dropped as duplicates, how many failed schema validation
5. Does NOT auto-promote anything
The KB-CAD and KB-FEM teams (which is to say, future-you) own the
exporter scripts that produce these JSON bodies. Those scripts
live in the KB-CAD / KB-FEM repos respectively, not in AtoCore.
## The export schemas (sketch, not final)
These are starting shapes, intentionally minimal. The schemas
will be refined in `kb-cad-export-schema.md` and
`kb-fem-export-schema.md` once the V1 ontology lands.
### KB-CAD export shape (starting sketch)
```json
{
  "exporter": "kb-cad",
  "exporter_version": "1.0.0",
  "exported_at": "2026-04-07T12:00:00Z",
  "project": "p05-interferometer",
  "subsystems": [
    {
      "id": "subsystem.optical-frame",
      "name": "Optical frame",
      "parent": null,
      "components": [
        {
          "id": "component.lateral-support-pad",
          "name": "Lateral support pad",
          "material": "GF-PTFE",
          "parameters": {
            "thickness_mm": 3.0,
            "preload_n": 12.0
          },
          "source_artifact": "kb-cad://p05/subsystems/optical-frame#lateral-support"
        }
      ]
    }
  ]
}
```
### KB-FEM export shape (starting sketch)
```json
{
  "exporter": "kb-fem",
  "exporter_version": "1.0.0",
  "exported_at": "2026-04-07T12:00:00Z",
  "project": "p05-interferometer",
  "analysis_models": [
    {
      "id": "model.optical-frame-modal",
      "name": "Optical frame modal analysis v3",
      "subsystem": "subsystem.optical-frame",
      "results": [
        {
          "id": "result.first-mode-frequency",
          "name": "First-mode frequency",
          "value": 187.4,
          "unit": "Hz",
          "supports_validation_claim": "claim.frame-rigidity-min-150hz",
          "source_artifact": "kb-fem://p05/models/optical-frame-modal#first-mode"
        }
      ]
    }
  ]
}
```
These shapes will evolve. The point of including them now is to
make the one-way mirror concrete: it is a small, well-defined
JSON shape, not "AtoCore reaches into KB-CAD's database".
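To make the endpoint contract from "What 'one-way mirror' actually
looks like in code" concrete against the KB-CAD shape above, here is a
helper-free sketch of the mapping step. Field names follow the example
export; the real handler adds full schema validation, duplicate
dropping against the entities table, and persistence:
```python
def ingest_kb_cad_export(export: dict) -> dict:
    """Sketch of POST /ingest/kb-cad/export: validate, map each component
    to an entity candidate with provenance carried through, return a
    summary, never auto-promote anything."""
    summary = {"candidates_created": 0, "failed_validation": 0}
    candidates: list[dict] = []

    for subsystem in export.get("subsystems", []):
        for component in subsystem.get("components", []):
            # Spot-check validation only; the real endpoint validates the
            # whole body against the documented schema.
            if "id" not in component:
                summary["failed_validation"] += 1
                continue
            candidates.append({
                "entity_type": "Component",
                "status": "candidate",            # never auto-promoted
                "project": export.get("project"),
                "external_id": component["id"],
                "name": component.get("name"),
                "material": component.get("material"),
                "parameters": component.get("parameters", {}),
                # Provenance from the export travels with the candidate.
                "source_artifact_id": component.get("source_artifact"),
                "exporter": export.get("exporter"),
                "exporter_version": export.get("exporter_version"),
            })
            summary["candidates_created"] += 1

    return {"summary": summary, "candidates": candidates}
```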
## What AtoCore is allowed to do with the imported records
After ingestion, the imported records become entity candidates
in AtoCore's own table. From that point forward they follow the
exact same lifecycle as any other candidate:
- they sit at status="candidate" until a human reviews them
- the reviewer promotes them to status="active" or rejects them
- the active entities are queryable via the engineering query
catalog (Q-001 through Q-020)
- the active entities can be referenced from Decisions, Requirements,
ValidationClaims, etc. via the V1 relationship types
The imported records are never automatically pushed into trusted
project state, never modified in place after import (they are
superseded by re-imports, not edited), and never written back to
the external tool.
## What happens when KB-CAD changes a value AtoCore already has
This is the canonical "drift" scenario. The flow:
1. KB-CAD exports a fresh JSON. Component `component.lateral-support-pad`
now has `material: "PEEK"` instead of `material: "GF-PTFE"`.
2. AtoCore's ingestion endpoint sees the same `id` and a different
value.
3. The ingestion endpoint creates a new entity candidate with the
new value, **does NOT delete or modify the existing active
entity**, and creates a `conflicts` row linking the two members
(per the conflict model doc).
4. The reviewer sees an open conflict on the next visit to
`/conflicts`.
5. The reviewer either:
- **promotes the new value** (the active is superseded, the
candidate becomes the new active, the audit trail keeps both)
- **rejects the new value** (the candidate is invalidated, the
active stays — useful when the export was wrong)
- **dismisses the conflict** (declares them not actually about
the same thing, both stay active)
The reviewer never touches KB-CAD from AtoCore. If the resolution
implies a change in KB-CAD itself, the reviewer makes that change
in KB-CAD, then re-exports.
## What about NX directly?
NX (Siemens NX) is the underlying CAD tool that KB-CAD wraps.
**NX is not connected to AtoCore in V1.** Any facts about NX
projects flow through KB-CAD as the structured intermediate. This
gives us:
- **One schema to maintain.** AtoCore only has to understand the
KB-CAD export shape, not the NX API.
- **One ownership boundary.** KB-CAD owns the question of "what's
in NX". AtoCore owns the question of "what's in the typed
knowledge base".
- **Future flexibility.** When NX is replaced or upgraded, only
KB-CAD has to adapt; AtoCore doesn't notice.
The same logic applies to FEM solvers (Nastran, Abaqus, ANSYS):
KB-FEM is the structured intermediate, AtoCore never talks to the
solver directly.
## The hard-line invariants
These are the things V1 will not do, regardless of how convenient
they might seem:
1. **No write to external tools.** No POST/PUT/PATCH to any
external API, no file generation that gets written into a
CAD/FEM project tree, no email/chat sends.
2. **No live polling.** AtoCore does not poll KB-CAD or KB-FEM on
a schedule. Imports are explicit pushes from the external tool
into AtoCore's ingestion endpoint.
3. **No silent merging.** Every value drift surfaces as a
conflict for the reviewer (per the conflict model doc).
4. **No schema fan-out.** AtoCore does not store every field that
KB-CAD knows about. Only fields that map to one of the V1
entity types make it into AtoCore. Everything else is dropped
at the import boundary.
5. **No external-tool-specific logic in entity types.** A
`Component` in AtoCore is the same shape regardless of whether
it came from KB-CAD, KB-FEM, the PKM, or a hand-curated
project state entry. The source is recorded in provenance,
not in the entity shape.
## What this enables
With the one-way mirror locked in, V1 implementation can focus on:
- The entity table and its lifecycle
- The two `/ingest/kb-cad/export` and `/ingest/kb-fem/export`
endpoints with their JSON validators
- The candidate review queue extension (already designed in
`promotion-rules.md`)
- The conflict model (already designed in `conflict-model.md`)
- The query catalog implementation (already designed in
`engineering-query-catalog.md`)
None of those are unbounded. Each is a finite, well-defined
implementation task. The one-way mirror is the choice that makes
V1 finishable.
## What V2 might consider (deferred)
After V1 has been live and demonstrably useful for a quarter or
two, the questions that become reasonable to revisit:
1. **Selective write-back to KB-CAD for low-risk fields.** For
example, AtoCore could push back a "Decision id linked to this
component" annotation that KB-CAD then displays without it
being canonical there. Read-only annotations from AtoCore's
perspective, advisory metadata from KB-CAD's perspective.
2. **Live polling for very small payloads.** A daily poll of
"what subsystem ids exist in KB-CAD now" so AtoCore can flag
subsystems that disappeared from KB-CAD without an explicit
AtoCore invalidation.
3. **Direct NX integration** if the KB-CAD layer becomes a
bottleneck — but only if the friction is real, not theoretical.
4. **Cost / vendor / PLM connections** for projects where the
procurement cycle is part of the active engineering work.
None of these are V1 work; they are listed only so that the V1
design can intentionally leave room for them later.
## Open questions for the V1 implementation sprint
1. **Where do the export schemas live?** Probably in
`docs/architecture/kb-cad-export-schema.md` and
`docs/architecture/kb-fem-export-schema.md`, drafted during
the implementation sprint.
2. **Who runs the exporter?** A scheduled job on the KB-CAD /
KB-FEM hosts, triggered by the human after a meaningful
change, or both?
3. **Is the export incremental or full?** Full is simpler but
more expensive. Incremental needs delta semantics. V1 starts
with full and revisits when full becomes too slow.
4. **How is the exporter authenticated to AtoCore?** Probably
the existing PAT model (one PAT per exporter, scoped to
`write:engineering-import` once that scope exists). Worth a
quick auth design pass before the endpoints exist.
## TL;DR
- AtoCore is a one-way mirror in V1: external tools push,
AtoCore reads, AtoCore never writes back
- Two import endpoints for V1: KB-CAD and KB-FEM, each with a
documented JSON export shape
- Drift surfaces as conflicts in the existing conflict model
- No NX, no FEM solvers, no PLM, no vendor portals, no
cost/BOM systems in V1
- Bidirectional sync is reserved for V2+ on a per-tool basis,
only after V1 demonstrates value

View File

@@ -0,0 +1,442 @@
# AtoCore Backup and Restore Procedure
## Scope
This document defines the operational procedure for backing up and
restoring AtoCore's machine state on the Dalidou deployment. It is
the practical companion to `docs/backup-strategy.md` (which defines
the strategy) and `src/atocore/ops/backup.py` (which implements the
mechanics).
The intent is that this procedure can be followed by anyone with
SSH access to Dalidou and the AtoCore admin endpoints.
## What gets backed up
A `create_runtime_backup` snapshot contains, in order of importance:
| Artifact | Source path on Dalidou | Backup destination | Always included |
|---|---|---|---|
| SQLite database | `/srv/storage/atocore/data/db/atocore.db` | `<backup_root>/db/atocore.db` | yes |
| Project registry JSON | `/srv/storage/atocore/config/project-registry.json` | `<backup_root>/config/project-registry.json` | yes (if file exists) |
| Backup metadata | (generated) | `<backup_root>/backup-metadata.json` | yes |
| Chroma vector store | `/srv/storage/atocore/data/chroma/` | `<backup_root>/chroma/` | only when `include_chroma=true` |
The SQLite snapshot uses the online `conn.backup()` API and is safe
to take while the database is in use. The Chroma snapshot is a cold
directory copy and is **only safe when no ingestion is running**;
the API endpoint enforces this by acquiring the ingestion lock for
the duration of the copy.
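For reference, the stdlib call that online snapshot relies on looks
like this; a minimal sketch, not the actual `create_runtime_backup`
implementation:
```python
import sqlite3


def snapshot_sqlite(src_path: str, dst_path: str) -> None:
    """Online SQLite snapshot via the stdlib backup API: copies the
    database page by page and tolerates concurrent readers and writers
    on the source connection."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```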
What is **not** in the backup:
- Source documents under `/srv/storage/atocore/sources/vault/` and
`/srv/storage/atocore/sources/drive/`. These are read-only
inputs and live in the user's PKM/Drive, which is backed up
separately by their own systems.
- Application code. The container image is the source of truth for
code; recovery means rebuilding the image, not restoring code from
a backup.
- Logs under `/srv/storage/atocore/logs/`.
- Embeddings cache under `/srv/storage/atocore/data/cache/`.
- Temp files under `/srv/storage/atocore/data/tmp/`.
## Backup root layout
Each backup snapshot lives in its own timestamped directory:
```
/srv/storage/atocore/backups/snapshots/
├── 20260407T060000Z/
│   ├── backup-metadata.json
│   ├── db/
│   │   └── atocore.db
│   ├── config/
│   │   └── project-registry.json
│   └── chroma/              # only if include_chroma=true
│       └── ...
├── 20260408T060000Z/
│   └── ...
└── ...
```
The timestamp is UTC, format `YYYYMMDDTHHMMSSZ`.
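Generating a stamp in that format, for scripts that need to reproduce it:
```python
from datetime import datetime, timezone

stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")  # e.g. "20260407T060000Z"
```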
## Triggering a backup
### Option A — via the admin endpoint (preferred)
```bash
# DB + registry only (fast, safe at any time)
curl -fsS -X POST http://dalidou:8100/admin/backup \
-H "Content-Type: application/json" \
-d '{"include_chroma": false}'
# DB + registry + Chroma (acquires ingestion lock)
curl -fsS -X POST http://dalidou:8100/admin/backup \
-H "Content-Type: application/json" \
-d '{"include_chroma": true}'
```
The response is the backup metadata JSON. Save the `backup_root`
field — that's the directory the snapshot was written to.
### Option B — via the standalone script (when the API is down)
```bash
docker exec atocore python -m atocore.ops.backup
```
This runs `create_runtime_backup()` directly, without going through
the API or the ingestion lock. Use it only when the AtoCore service
itself is unhealthy and you can't hit the admin endpoint.
### Option C — manual file copy (last resort)
If both the API and the standalone script are unusable:
```bash
sudo systemctl stop atocore # or: docker compose stop atocore
sudo cp /srv/storage/atocore/data/db/atocore.db \
/srv/storage/atocore/backups/manual-$(date -u +%Y%m%dT%H%M%SZ).db
sudo cp /srv/storage/atocore/config/project-registry.json \
/srv/storage/atocore/backups/manual-$(date -u +%Y%m%dT%H%M%SZ).registry.json
sudo systemctl start atocore
```
This is a cold backup and requires brief downtime.
## Listing backups
```bash
curl -fsS http://dalidou:8100/admin/backup
```
Returns the configured `backup_dir` and a list of all snapshots
under it, with their full metadata if available.
Or, on the host directly:
```bash
ls -la /srv/storage/atocore/backups/snapshots/
```
## Validating a backup
Before relying on a backup for restore, validate it:
```bash
curl -fsS http://dalidou:8100/admin/backup/20260407T060000Z/validate
```
The validator:
- confirms the snapshot directory exists
- opens the SQLite snapshot and runs `PRAGMA integrity_check`
- parses the registry JSON
- confirms the Chroma directory exists (if it was included)
A valid backup returns `"valid": true` and an empty `errors` array.
A failing validation returns `"valid": false` with one or more
specific error strings (e.g. `db_integrity_check_failed`,
`registry_invalid_json`, `chroma_snapshot_missing`).
**Validate every backup at creation time.** A backup that has never
been validated is not actually a backup — it's just a hopeful copy
of bytes.
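The checks above amount to something like the following sketch. It is
not the actual `atocore.ops.backup` validator; in particular,
`had_chroma` stands in for however the real code learns whether the
snapshot was taken with `include_chroma=true`:
```python
import json
import sqlite3
from pathlib import Path


def validate_snapshot(backup_root: Path, had_chroma: bool) -> dict:
    """Sketch of the documented checks: SQLite integrity, registry JSON
    parse, Chroma directory presence."""
    errors: list[str] = []

    conn = sqlite3.connect(backup_root / "db" / "atocore.db")
    try:
        if conn.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            errors.append("db_integrity_check_failed")
    finally:
        conn.close()

    registry = backup_root / "config" / "project-registry.json"
    if registry.exists():
        try:
            json.loads(registry.read_text())
        except json.JSONDecodeError:
            errors.append("registry_invalid_json")

    if had_chroma and not (backup_root / "chroma").is_dir():
        errors.append("chroma_snapshot_missing")

    return {"valid": not errors, "errors": errors}
```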
## Restore procedure
Since 2026-04-09 the restore is implemented as a proper module
function plus CLI entry point: `restore_runtime_backup()` in
`src/atocore/ops/backup.py`, invoked as
`python -m atocore.ops.backup restore <STAMP> --confirm-service-stopped`.
It automatically takes a pre-restore safety snapshot (your rollback
anchor), handles SQLite WAL/SHM cleanly, restores the registry, and
runs `PRAGMA integrity_check` on the restored db. This replaces the
earlier manual `sudo cp` sequence.
The function refuses to run without `--confirm-service-stopped`.
This is deliberate: hot-restoring into a running service corrupts
SQLite state.
### Pre-flight (always)
1. Identify which snapshot you want to restore. List available
snapshots and pick by timestamp:
```bash
curl -fsS http://127.0.0.1:8100/admin/backup | jq '.backups[].stamp'
```
2. Validate it. Refuse to restore an invalid backup:
```bash
STAMP=20260409T060000Z
curl -fsS http://127.0.0.1:8100/admin/backup/$STAMP/validate | jq .
```
3. **Stop AtoCore.** SQLite cannot be hot-restored under a running
process and Chroma will not pick up new files until the process
restarts.
```bash
cd /srv/storage/atocore/app/deploy/dalidou
docker compose down
docker compose ps # atocore should be Exited/gone
```
### Run the restore
Use a one-shot container that reuses the live service's volume
mounts so every path (`db_path`, `chroma_path`, backup dir) resolves
to the same place the main service would see:
```bash
cd /srv/storage/atocore/app/deploy/dalidou
docker compose run --rm --entrypoint python atocore \
-m atocore.ops.backup restore \
$STAMP \
--confirm-service-stopped
```
Output is a JSON document. The critical fields:
- `pre_restore_snapshot`: stamp of the safety snapshot of live
state taken right before the restore. **Write this down.** If
the restore was the wrong call, this is how you roll it back.
- `db_restored`: should be `true`
- `registry_restored`: `true` if the backup captured a registry
- `chroma_restored`: `true` if the backup captured a chroma tree
and include_chroma resolved to true (default)
- `restored_integrity_ok`: **must be `true`** — if this is false,
STOP and do not start the service; investigate the integrity
error first. The restored file is still on disk but untrusted.
### Controlling the restore
The CLI supports a few flags for finer control:
- `--no-pre-snapshot` skips the pre-restore safety snapshot. Use
this only when you know you have another rollback path.
- `--no-chroma` restores only SQLite + registry, leaving the
current Chroma dir alone. Useful if Chroma is consistent but
SQLite needs a rollback.
- `--chroma` forces Chroma restoration even if the metadata
doesn't clearly indicate the snapshot has it (rare).
### Chroma restore and bind-mounted volumes
The Chroma dir on Dalidou is a bind-mounted Docker volume. The
restore cannot `rmtree` the destination (you can't unlink a mount
point — it raises `OSError [Errno 16] Device or resource busy`),
so the function clears the dir's CONTENTS and uses
`copytree(dirs_exist_ok=True)` to copy the snapshot back in. The
regression test `test_restore_chroma_does_not_unlink_destination_directory`
in `tests/test_backup.py` captures the destination inode before
and after restore and asserts it's stable — the same invariant
that protects the bind mount.
This was discovered during the first real Dalidou restore drill
on 2026-04-09. If you see a new restore failure with
`Device or resource busy`, something has regressed this fix.
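The shape of that fix, as a sketch rather than the exact `backup.py`
code:
```python
import shutil
from pathlib import Path


def restore_dir_contents(snapshot: Path, dest: Path) -> None:
    """Bind-mount-safe directory restore: empty the destination's
    contents without ever unlinking the destination itself, then copy
    the snapshot back in place."""
    for child in dest.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()
    # dirs_exist_ok=True copies into the existing (still-mounted)
    # directory instead of recreating it, so the mount point's inode
    # never changes.
    shutil.copytree(snapshot, dest, dirs_exist_ok=True)
```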
### Restart AtoCore
```bash
cd /srv/storage/atocore/app/deploy/dalidou
docker compose up -d
# Wait for /health to come up
for i in 1 2 3 4 5 6 7 8 9 10; do
curl -fsS http://127.0.0.1:8100/health \
&& break || { echo "not ready ($i/10)"; sleep 3; }
done
```
**Note on build_sha after restore:** The one-shot `docker compose run`
container does not carry the build provenance env vars that `deploy.sh`
exports at deploy time. After a restore, `/health` will report
`build_sha: "unknown"` until you re-run `deploy.sh` or manually
re-deploy. This is cosmetic — the data is correctly restored — but if
you need `build_sha` to be accurate, run a redeploy after the restore:
```bash
cd /srv/storage/atocore/app
bash deploy/dalidou/deploy.sh
```
### Post-restore verification
```bash
# 1. Service is healthy
curl -fsS http://127.0.0.1:8100/health | jq .
# 2. Stats look right
curl -fsS http://127.0.0.1:8100/stats | jq .
# 3. Project registry loads
curl -fsS http://127.0.0.1:8100/projects | jq '.projects | length'
# 4. A known-good context query returns non-empty results
curl -fsS -X POST http://127.0.0.1:8100/context/build \
-H "Content-Type: application/json" \
-d '{"prompt": "what is p05 about", "project": "p05-interferometer"}' | jq '.chunks_used'
```
If any of these are wrong, the restore is bad. Roll back using the
pre-restore safety snapshot whose stamp you recorded from the
restore output. The rollback is the same procedure — stop the
service and restore that stamp:
```bash
docker compose down
docker compose run --rm --entrypoint python atocore \
-m atocore.ops.backup restore \
$PRE_RESTORE_SNAPSHOT_STAMP \
--confirm-service-stopped \
--no-pre-snapshot
docker compose up -d
```
(`--no-pre-snapshot` because the rollback itself doesn't need one;
you already have the original snapshot as a fallback if everything
goes sideways.)
### Restore drill
The restore is exercised at three levels:
1. **Unit tests.** `tests/test_backup.py` has seven restore tests
(refuse-without-confirm, invalid backup, full round-trip,
Chroma round-trip, inode-stability regression, WAL sidecar
cleanup, skip-pre-snapshot). These run in CI on every commit.
2. **Module-level round-trip.**
`test_restore_round_trip_reverses_post_backup_mutations` is
the canonical drill in code form: seed baseline, snapshot,
mutate, restore, assert mutation reversed + baseline survived
+ pre-restore snapshot captured the mutation.
3. **Live drill on Dalidou.** Periodically run the full procedure
against the real service with a disposable drill-marker
memory (created via `POST /memory` with `memory_type=episodic`
and `project=drill`), following the sequence above and then
verifying the marker is gone afterward via
`GET /memory?project=drill`. The first such drill on
2026-04-09 surfaced the bind-mount bug; future runs
primarily exist to verify the fix stays fixed.
Run the live drill:
- **Before** enabling any new write-path automation (auto-capture,
automated ingestion, reinforcement sweeps).
- **After** any change to `src/atocore/ops/backup.py` or to
schema migrations in `src/atocore/models/database.py`.
- **After** a Dalidou OS upgrade or docker version bump.
- **At least once per quarter** as a standing operational check.
- **After any incident** that touched the storage layer.
Record each drill run (stamp, pre-restore snapshot stamp, pass/fail,
any surprises) somewhere durable — a line in the project journal
or a git commit message is enough. A drill you ran once and never
again is barely better than a drill you never ran.
## Retention policy
- **Last 7 daily backups**: kept verbatim
- **Last 4 weekly backups** (Sunday): kept verbatim
- **Last 6 monthly backups** (1st of month): kept verbatim
- **Anything older**: deleted
The retention job is **not yet implemented** and is tracked as a
follow-up. Until then, the snapshots directory grows monotonically.
A simple cron-based cleanup script is the next step:
```cron
0 4 * * * /srv/storage/atocore/scripts/cleanup-old-backups.sh
```
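When the cleanup script gets written, the selection logic implied by
the policy above is small; a sketch, treating each snapshot stamp as
its own daily entry:
```python
from datetime import datetime


def stamps_to_keep(stamps: list[str]) -> set[str]:
    """Keep the last 7 daily, the last 4 Sunday, and the last 6
    first-of-month snapshots; everything else is eligible for deletion.
    Stamps are the UTC YYYYMMDDTHHMMSSZ directory names."""
    parsed = sorted(
        (datetime.strptime(s, "%Y%m%dT%H%M%SZ"), s) for s in stamps
    )
    keep: set[str] = set()
    keep.update(s for _, s in parsed[-7:])                  # daily
    sundays = [s for d, s in parsed if d.weekday() == 6]
    keep.update(sundays[-4:])                               # weekly (Sundays)
    firsts = [s for d, s in parsed if d.day == 1]
    keep.update(firsts[-6:])                                # monthly (1st)
    return keep
```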
## Common failure modes and what to do about them
| Symptom | Likely cause | Action |
|---|---|---|
| `db_integrity_check_failed` on validation | SQLite snapshot copied while a write was in progress, or disk corruption | Take a fresh backup and validate again. If it fails twice, suspect the underlying disk. |
| `registry_invalid_json` | Registry was being edited at backup time | Take a fresh backup. The registry is small so this is cheap. |
| Restore: `restored_integrity_ok: false` | Source snapshot was itself corrupt (validation should have caught it — file a bug) or copy was interrupted mid-write | Do NOT start the service. Validate the snapshot directly with `python -m atocore.ops.backup validate <STAMP>`, try a different older snapshot, or roll back to the pre-restore safety snapshot. |
| Restore: `OSError [Errno 16] Device or resource busy` on Chroma | Old code tried to `rmtree` the Chroma mount point. Fixed on 2026-04-09; the fix is guarded by `test_restore_chroma_does_not_unlink_destination_directory` | Ensure you're running a build from 2026-04-09 or later; if you need to work around an older build, use `--no-chroma` and restore Chroma contents manually. |
| `chroma_snapshot_missing` after a restore | Snapshot was DB-only | Either rebuild via fresh ingestion or restore an older snapshot that includes Chroma. |
| Service won't start after restore | Permissions wrong on the restored files | Re-run `chown 1000:1000` (or whatever the gitea/atocore container user is) on the data dir. |
| `/stats` returns 0 documents after restore | The SQL store was restored but the source paths in `source_documents` don't match the current Dalidou paths | This means the backup came from a different deployment. Don't trust this restore — it's pulling from the wrong layout. |
| Drill marker still present after restore | Wrong stamp, service still writing during `docker compose down`, or the restore JSON didn't report `db_restored: true` | Roll back via the pre-restore safety snapshot and retry with the correct source snapshot. |
## Open follow-ups (not yet implemented)
Tracked separately in `docs/next-steps.md` — the list below is the
backup-specific subset.
1. **Retention cleanup script**: see the cron entry above. The
snapshots directory grows monotonically until this exists.
2. **Off-Dalidou backup target**: currently snapshots live on the
same disk as the live data. A real disaster-recovery story
needs at least one snapshot on a different physical machine.
The simplest first step is a periodic `rsync` to the user's
laptop or to another server.
3. **Backup encryption**: snapshots contain raw SQLite and JSON.
Consider age/gpg encryption if backups will be shipped off-site.
4. **Automatic post-backup validation**: today the validator must
be invoked manually. The `create_runtime_backup` function
should call `validate_backup` on its own output and refuse to
declare success if validation fails.
5. **Chroma backup is currently full directory copy** every time.
For large vector stores this gets expensive. A future
improvement would be incremental snapshots via filesystem-level
snapshotting (LVM, btrfs, ZFS).
**Done** (kept for historical reference):
- ~~Implement `restore_runtime_backup()` as a proper module
function so the restore isn't a manual `sudo cp` dance~~ —
landed 2026-04-09 in commit 3362080, followed by the
Chroma bind-mount fix from the first real drill.
## Quickstart cheat sheet
```bash
# Daily backup (DB + registry only — fast)
curl -fsS -X POST http://127.0.0.1:8100/admin/backup \
-H "Content-Type: application/json" -d '{}'
# Weekly backup (DB + registry + Chroma — slower, holds ingestion lock)
curl -fsS -X POST http://127.0.0.1:8100/admin/backup \
-H "Content-Type: application/json" -d '{"include_chroma": true}'
# List backups
curl -fsS http://127.0.0.1:8100/admin/backup | jq '.backups[].stamp'
# Validate the most recent backup
LATEST=$(curl -fsS http://127.0.0.1:8100/admin/backup | jq -r '.backups[-1].stamp')
curl -fsS http://127.0.0.1:8100/admin/backup/$LATEST/validate | jq .
# Full restore (service must be stopped first)
cd /srv/storage/atocore/app/deploy/dalidou
docker compose down
docker compose run --rm --entrypoint python atocore \
-m atocore.ops.backup restore $STAMP --confirm-service-stopped
docker compose up -d
# Live drill: exercise the full create -> mutate -> restore flow
# against the running service. The marker memory uses
# memory_type=episodic (valid types: identity, preference, project,
# episodic, knowledge, adaptation) and project=drill so it's easy
# to find via GET /memory?project=drill before and after.
#
# See the "Restore drill" section above for the full sequence.
STAMP=$(curl -fsS -X POST http://127.0.0.1:8100/admin/backup \
-H 'Content-Type: application/json' \
-d '{"include_chroma": true}' | jq -r '.backup_root' | awk -F/ '{print $NF}')
curl -fsS -X POST http://127.0.0.1:8100/memory \
-H 'Content-Type: application/json' \
-d '{"memory_type":"episodic","content":"DRILL-MARKER","project":"drill","confidence":1.0}'
cd /srv/storage/atocore/app/deploy/dalidou
docker compose down
docker compose run --rm --entrypoint python atocore \
-m atocore.ops.backup restore $STAMP --confirm-service-stopped
docker compose up -d
# Marker should be gone:
curl -fsS 'http://127.0.0.1:8100/memory?project=drill' | jq .
```

View File

@@ -200,10 +200,30 @@ The runtime has now been hardened in a few practical ways:
- SQLite connections use a configurable busy timeout
- SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
- project registry writes are atomic file replacements rather than in-place rewrites
- a full runtime backup and restore path now exists and has been exercised on
live Dalidou:
- SQLite (hot online backup via `conn.backup()`)
- project registry (file copy)
- Chroma vector store (cold directory copy under `exclusive_ingestion()`)
- backup metadata
- `restore_runtime_backup()` with CLI entry point
(`python -m atocore.ops.backup restore <STAMP>
--confirm-service-stopped`), pre-restore safety snapshot for
rollback, WAL/SHM sidecar cleanup, `PRAGMA integrity_check`
on the restored file
- the first live drill on 2026-04-09 surfaced and fixed a Chroma
restore bug on Docker bind-mounted volumes (`shutil.rmtree`
on a mount point); a regression test now asserts the
destination inode is stable across restore
- deploy provenance is visible end-to-end:
- `/health` reports `build_sha`, `build_time`, `build_branch`
from env vars wired by `deploy.sh`
- `deploy.sh` Step 6 verifies the live `build_sha` matches the
just-built commit (exit code 6 on drift) so "live is current?"
can be answered precisely, not just by `__version__`
- `deploy.sh` Step 1.5 detects that the script itself changed
in the pulled commit and re-execs into the fresh copy, so
the deploy never silently runs the old script against new source
This does not eliminate every concurrency edge, but it materially improves the
current operational baseline.
@@ -224,15 +244,23 @@ This separation is healthy:
## Immediate Next Focus
1. ~~Re-run the full backup/restore drill~~ — DONE 2026-04-11,
full pass (db, registry, chroma, integrity all true)
2. ~~Turn on auto-capture of Claude Code sessions in conservative
mode~~ — DONE 2026-04-11, Stop hook wired via
`deploy/hooks/capture_stop.py` → `POST /interactions`
with `reinforce=false`; kill switch via
`ATOCORE_CAPTURE_DISABLED=1`
3. Run a short real-use pilot with auto-capture on, verify
interactions are landing in Dalidou, review quality
4. Use the new T420-side organic routing layer in real OpenClaw workflows
5. Tighten retrieval quality for the now fully ingested active project corpora
6. Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
7. Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
8. Expand the remaining boring operations baseline:
- retention policy cleanup script
- off-Dalidou backup target (rsync or similar)
9. Only later consider write-back, reflection, or deeper autonomous behaviors
See also:

View File

@@ -50,26 +50,205 @@ starting from:
deploy/dalidou/.env.example
```
## First-time deployment steps
1. Place the repository under `/srv/storage/atocore/app` — ideally as a
proper git clone so future updates can be pulled, not as a static
snapshot:
```bash
sudo git clone http://dalidou:3000/Antoine/ATOCore.git \
/srv/storage/atocore/app
```
2. Create the canonical directories listed above.
3. Copy `deploy/dalidou/.env.example` to `deploy/dalidou/.env`.
4. Adjust the source paths if your AtoVault/AtoDrive mirrors live elsewhere.
5. Run:
```bash
cd /srv/storage/atocore/app/deploy/dalidou
docker compose up -d --build
```
6. Validate:
```bash
curl http://127.0.0.1:8100/health
curl http://127.0.0.1:8100/sources
```
## Updating a running deployment
**Use `deploy/dalidou/deploy.sh` for every code update.** It is the
one-shot sync script that:
- fetches latest main from Gitea into `/srv/storage/atocore/app`
- (if the app dir is not a git checkout) backs it up as
`<dir>.pre-git-<timestamp>` and re-clones
- rebuilds the container image
- restarts the container
- waits for `/health` to respond
- compares the reported `code_version` against the
`__version__` in the freshly-pulled source, and exits non-zero
if they don't match (deployment drift detection)
```bash
# Normal update from main
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
# Deploy a specific branch or tag
ATOCORE_BRANCH=codex/some-feature \
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
# Dry-run: show what would happen without touching anything
ATOCORE_DEPLOY_DRY_RUN=1 \
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
# Deploy from a remote host (e.g. the laptop) using the Tailscale
# or LAN address instead of loopback
ATOCORE_GIT_REMOTE=http://192.168.86.50:3000/Antoine/ATOCore.git \
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
```
The script is idempotent and safe to re-run. It never touches the
database directly — schema migrations are applied automatically at
service startup by the lifespan handler in `src/atocore/main.py`
which calls `init_db()` (which in turn runs the ALTER TABLE
statements in `_apply_migrations`).
### Troubleshooting hostname resolution
`deploy.sh` defaults `ATOCORE_GIT_REMOTE` to
`http://127.0.0.1:3000/Antoine/ATOCore.git` (loopback) because the
hostname "dalidou" doesn't reliably resolve on the host itself —
the first real Dalidou deploy hit exactly this on 2026-04-08. If
you need to override (e.g. running deploy.sh from a laptop against
the Dalidou LAN), set `ATOCORE_GIT_REMOTE` explicitly.
The same applies to `scripts/atocore_client.py`: its default
`ATOCORE_BASE_URL` is `http://dalidou:8100` for remote callers, but
when running the client on Dalidou itself (or inside the container
via `docker exec`), override to loopback:
```bash
ATOCORE_BASE_URL=http://127.0.0.1:8100 \
python scripts/atocore_client.py health
```
If you see `{"status": "unavailable", "fail_open": true}` from the
client, the first thing to check is whether the base URL resolves
from where you're running the client.
### The deploy.sh self-update race
When `deploy.sh` itself changes in the commit being pulled, the
first run after the update is still executing the *old* script from
the bash process's in-memory copy. `git reset --hard` updates the
file on disk, but the running bash has already loaded the
instructions. On 2026-04-09 this silently shipped an "unknown"
`build_sha` because the old Step 2 (which predated env-var export)
ran against fresh source.
`deploy.sh` now detects this: Step 1.5 compares the sha1 of `$0`
(the running script) against the sha1 of
`$APP_DIR/deploy/dalidou/deploy.sh` (the on-disk copy) after the
git reset. If they differ, it sets `ATOCORE_DEPLOY_REEXECED=1` and
`exec`s the fresh copy so the rest of the deploy runs under the new
script. The sentinel env var prevents infinite recursion.
You'll see this in the logs as:
```text
==> Step 1.5: deploy.sh changed in the pulled commit; re-exec'ing
==> running script hash: <old>
==> on-disk script hash: <new>
==> re-exec -> /srv/storage/atocore/app/deploy/dalidou/deploy.sh
```
To opt out (debugging, for example), pre-set
`ATOCORE_DEPLOY_REEXECED=1` before invoking `deploy.sh` and the
self-update guard will be skipped.
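The guard amounts to a hash comparison plus a re-exec under a sentinel variable. Purely for illustration (the real guard is a few lines of bash in `deploy.sh`, not Python), the same idea looks like this; the helper names are hypothetical:
```python
# Illustrative sketch of the Step 1.5 self-update guard; the real
# implementation is bash inside deploy/dalidou/deploy.sh.
import hashlib
import os
import sys


def _sha1(path: str) -> str:
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read()).hexdigest()


def maybe_reexec(on_disk_copy: str) -> None:
    # Sentinel env var prevents infinite recursion.
    if os.environ.get("ATOCORE_DEPLOY_REEXECED") == "1":
        return
    if _sha1(sys.argv[0]) != _sha1(on_disk_copy):
        os.environ["ATOCORE_DEPLOY_REEXECED"] = "1"
        # Replace the running process with the freshly pulled copy so
        # the rest of the deploy executes the new instructions.
        os.execv(sys.executable, [sys.executable, on_disk_copy, *sys.argv[1:]])
```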
### Deployment drift detection
`/health` reports drift signals at three increasing levels of
precision:
| Field | Source | Precision | When to use |
|---|---|---|---|
| `version` / `code_version` | `atocore.__version__` (manual bump) | coarse — same value across many commits | quick smoke check that the right *release* is running |
| `build_sha` | `ATOCORE_BUILD_SHA` env var, set by `deploy.sh` per build | precise — changes per commit | the canonical drift signal |
| `build_time` / `build_branch` | same env var path | per-build | forensics when multiple branches in flight |
The **precise** check (run from the laptop or any host that can both curl the live service and reach a clone of the repo; adjust the repo path in the snippet below to wherever your clone lives):
```bash
# What's actually running on Dalidou
LIVE_SHA=$(curl -fsS http://dalidou:8100/health | grep -o '"build_sha":"[^"]*"' | cut -d'"' -f4)
# What the deployed branch tip should be
EXPECTED_SHA=$(cd /srv/storage/atocore/app && git rev-parse HEAD)
# Compare
if [ "$LIVE_SHA" = "$EXPECTED_SHA" ]; then
echo "live is current at $LIVE_SHA"
else
echo "DRIFT: live $LIVE_SHA vs expected $EXPECTED_SHA"
echo "run deploy.sh to sync"
fi
```
The `deploy.sh` script does exactly this comparison automatically
in its post-deploy verification step (Step 6) and exits non-zero
on mismatch. So the **simplest drift check** is just to run
`deploy.sh` — if there's nothing to deploy, it succeeds quickly;
if the live service is stale, it deploys and verifies.
If `/health` reports `build_sha: "unknown"`, the running container
was started without `deploy.sh` (probably via `docker compose up`
directly), and the build provenance was never recorded. Re-run
via `deploy.sh` to fix.
The coarse `code_version` check is still useful as a quick visual
sanity check — bumping `__version__` from `0.2.0` to `0.3.0`
signals a meaningful release boundary even if the precise
`build_sha` is what tools should compare against:
```bash
# Quick sanity check (coarse)
curl -s http://127.0.0.1:8100/health | grep -o '"code_version":"[^"]*"'
grep '__version__' /srv/storage/atocore/app/src/atocore/__init__.py
```
### Schema migrations on redeploy
When updating from an older `__version__`, the first startup after
the redeploy runs the idempotent ALTER TABLE migrations in
`_apply_migrations`. For a pre-0.2.0 → 0.2.0 upgrade the migrations
add these columns to existing tables (all with safe defaults so no
data is touched):
- `memories.project TEXT DEFAULT ''`
- `memories.last_referenced_at DATETIME`
- `memories.reference_count INTEGER DEFAULT 0`
- `interactions.response TEXT DEFAULT ''`
- `interactions.memories_used TEXT DEFAULT '[]'`
- `interactions.chunks_used TEXT DEFAULT '[]'`
- `interactions.client TEXT DEFAULT ''`
- `interactions.session_id TEXT DEFAULT ''`
- `interactions.project TEXT DEFAULT ''`
Plus new indexes on the new columns. No row data is modified. The
migration is safe to run against a database that already has the
columns — the `_column_exists` check makes each ALTER a no-op in
that case.
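The shape of that idempotent pattern, sketched (the `_column_exists` signature and wiring here are illustrative; the real code lives in `_apply_migrations`):
```python
# Illustrative sketch of the column-add pattern used by _apply_migrations.
import sqlite3


def _column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


def apply_example_migration(conn: sqlite3.Connection) -> None:
    # Safe to run repeatedly: the existence check makes the ALTER a
    # no-op on an already-migrated database, and the default means no
    # existing row is rewritten.
    if not _column_exists(conn, "memories", "project"):
        conn.execute("ALTER TABLE memories ADD COLUMN project TEXT DEFAULT ''")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project)"
    )
```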
Back up the database before any redeploy (via `POST /admin/backup`)
if you want a pre-upgrade snapshot. The migration is additive and
reversible by restoring the snapshot.
## Deferred
- backup automation


@@ -44,8 +44,9 @@ read-only additive mode.
### Engineering Layer Planning Sprint
The engineering layer is intentionally in planning, not implementation.
The architecture docs below are the current state of that planning:
**Status: complete.** All 8 architecture docs are drafted. The
engineering layer is now ready for V1 implementation against the
active project set.
- [engineering-query-catalog.md](architecture/engineering-query-catalog.md) —
the 20 v1-required queries the engineering layer must answer
@@ -55,17 +56,44 @@ The architecture docs below are the current state of that planning:
Layer 0 → Layer 2 pipeline, triggers, review queue mechanics
- [conflict-model.md](architecture/conflict-model.md) —
detection, representation, and resolution of contradictory facts
- [tool-handoff-boundaries.md](architecture/tool-handoff-boundaries.md) —
KB-CAD / KB-FEM one-way mirror stance, ingest endpoints, drift handling
- [representation-authority.md](architecture/representation-authority.md) —
canonical home matrix across PKM / KB / repos / AtoCore for 22 fact kinds
- [human-mirror-rules.md](architecture/human-mirror-rules.md) —
templates, regeneration triggers, edit flow, "do not edit" enforcement
- [engineering-v1-acceptance.md](architecture/engineering-v1-acceptance.md) —
measurable done definition with 23 acceptance criteria
- [engineering-knowledge-hybrid-architecture.md](architecture/engineering-knowledge-hybrid-architecture.md) —
the 5-layer model (from the previous planning wave)
- [engineering-ontology-v1.md](architecture/engineering-ontology-v1.md) —
the initial V1 object and relationship inventory (previous wave)
- [project-identity-canonicalization.md](architecture/project-identity-canonicalization.md) —
the helper-at-every-service-boundary contract that keeps the
trust hierarchy dependable across alias and canonical-id callers;
required reading before adding new project-keyed entity surfaces
in the V1 implementation sprint
Still to draft before engineering-layer implementation begins:
The next concrete next step is the V1 implementation sprint, which
should follow engineering-v1-acceptance.md as its checklist, and
must apply the project-identity-canonicalization contract at every
new service-layer entry point.
- tool-handoff-boundaries.md (KB-CAD / KB-FEM read vs write)
- human-mirror-rules.md (templates, triggers, edit flow)
- representation-authority.md (PKM / KB / repo / AtoCore canonical home matrix)
- engineering-v1-acceptance.md (done definition)
### LLM Client Integration
A separate but related architectural concern: how AtoCore is reachable
from many different LLM client contexts (OpenClaw, Claude Code, future
Codex skills, future MCP server). The layering rule is documented in:
- [llm-client-integration.md](architecture/llm-client-integration.md) —
three-layer shape: HTTP API → shared operator client
(`scripts/atocore_client.py`) → per-agent thin frontends; the
shared client is the canonical backbone every new client should
shell out to instead of reimplementing HTTP calls
This sits implicitly between Phase 8 (OpenClaw) and Phase 11
(multi-model). Memory-review and engineering-entity commands are
deferred from the shared client until their workflows are exercised.
## What Is Real Today


@@ -20,45 +20,65 @@ This working list should be read alongside:
## Immediate Next Steps
1. Use the T420 `atocore-context` skill and the new organic routing layer in
1. ~~Re-run the backup/restore drill~~ — DONE 2026-04-11, full pass
2. ~~Turn on auto-capture of Claude Code sessions~~ — DONE 2026-04-11,
Stop hook via `deploy/hooks/capture_stop.py` → `POST /interactions`
with `reinforce=false`; kill switch: `ATOCORE_CAPTURE_DISABLED=1`
2a. Run a short real-use pilot with auto-capture on
- verify interactions are landing in Dalidou
- check prompt/response quality and truncation
- confirm fail-open: no user-visible impact when Dalidou is down
3. Use the T420 `atocore-context` skill and the new organic routing layer in
real OpenClaw workflows
- confirm `auto-context` feels natural
- confirm project inference is good enough in practice
- confirm the fail-open behavior remains acceptable in practice
2. Review retrieval quality after the first real project ingestion batch
4. Review retrieval quality after the first real project ingestion batch
- check whether the top hits are useful
- check whether trusted project state remains dominant
- reduce cross-project competition and prompt ambiguity where needed
- use `debug-context` to inspect the exact last AtoCore supplement
3. Treat the active-project full markdown/text wave as complete
5. Treat the active-project full markdown/text wave as complete
- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`
4. Define a cleaner source refresh model
6. Define a cleaner source refresh model
- make the difference between source truth, staged inputs, and machine store
explicit
- move toward a project source registry and refresh workflow
- foundation now exists via project registry + per-project refresh API
- registration policy + template + proposal + approved registration are now
the normal path for new projects
5. Move to Wave 2 trusted-operational ingestion
7. Move to Wave 2 trusted-operational ingestion
- curated dashboards
- decision logs
- milestone/current-status views
- operational truth, not just raw project notes
6. Integrate the new engineering architecture docs into active planning, not immediate schema code
8. Integrate the new engineering architecture docs into active planning, not immediate schema code
- keep `docs/architecture/engineering-knowledge-hybrid-architecture.md` as the target layer model
- keep `docs/architecture/engineering-ontology-v1.md` as the V1 structured-domain target
- do not start entity/relationship persistence until the ingestion, retrieval, registry, and backup baseline feels boring and stable
7. Define backup and export procedures for Dalidou
- exercise the new SQLite + registry snapshot path on Dalidou
- Chroma backup or rebuild policy
- retention and restore validation
- admin backup endpoint now supports `include_chroma` cold snapshot
under the ingestion lock and `validate` confirms each snapshot is
openable; remaining work is the operational retention policy
8. Keep deeper automatic runtime integration modest until the organic read-only
model has proven value
9. Finish the boring operations baseline around backup
- retention policy cleanup script (snapshots dir grows
monotonically today)
- off-Dalidou backup target (at minimum an rsync to laptop or
another host so a single-disk failure isn't terminal)
- automatic post-backup validation (have `create_runtime_backup`
call `validate_backup` on its own output and refuse to
declare success if validation fails)
- DONE in commits be40994 / 0382238 / 3362080 / this one:
- `create_runtime_backup` + `list_runtime_backups` +
`validate_backup` + `restore_runtime_backup` with CLI
- `POST /admin/backup` with `include_chroma=true` under
the ingestion lock
- `/health` build_sha / build_time / build_branch provenance
- `deploy.sh` self-update re-exec guard + build_sha drift
verification
- live drill procedure in `docs/backup-restore-procedure.md`
with failure-mode table and the memory_type=episodic
marker pattern from the 2026-04-09 drill
10. Keep deeper automatic runtime integration modest until the organic read-only
model has proven value
## Trusted State Status

docs/operations.md (new file)

@@ -0,0 +1,96 @@
# AtoCore Operations
Current operating order for improving AtoCore:
1. Retrieval-quality pass
2. Wave 2 trusted-operational ingestion
3. AtoDrive clarification
4. Restore and ops validation
## Retrieval-Quality Pass
Current live behavior:
- broad prompts like `gigabit` and `polisher` can surface archive/history noise
- meaningful project prompts perform much better
- ranking quality now matters more than raw corpus growth
Use the operator client to audit retrieval:
```bash
python scripts/atocore_client.py audit-query "gigabit" 5
python scripts/atocore_client.py audit-query "polisher" 5
python scripts/atocore_client.py audit-query "mirror frame stiffness requirements and selected architecture" 5 p04-gigabit
python scripts/atocore_client.py audit-query "interferometer error budget and vendor selection constraints" 5 p05-interferometer
python scripts/atocore_client.py audit-query "polisher system map shared contracts and calibration workflow" 5 p06-polisher
```
What to improve:
- reduce `_archive`, `pre-cleanup`, `pre-migration`, and `History` prominence
- prefer current-status, decision, requirement, architecture-freeze, and milestone docs
- prefer trusted project-state when it expresses current truth
- avoid letting broad single-word prompts drift into stale chunks
## Wave 2 Trusted-Operational Ingestion
Do not ingest the whole PKM vault next.
Prioritize, for each active project:
- current status
- current decisions
- requirements baseline
- architecture freeze / current baseline
- milestone plan
- next actions
Useful commands:
```bash
python scripts/atocore_client.py project-state p04-gigabit
python scripts/atocore_client.py project-state p05-interferometer
python scripts/atocore_client.py project-state p06-polisher
python scripts/atocore_client.py refresh-project p04-gigabit
python scripts/atocore_client.py refresh-project p05-interferometer
python scripts/atocore_client.py refresh-project p06-polisher
```
## AtoDrive Clarification
Treat AtoDrive as a curated trusted-operational source, not a generic dump.
Good candidates:
- current dashboards
- approved baselines
- architecture freezes
- decision logs
- milestone and next-step views
Avoid by default:
- duplicated exports
- stale snapshots
- generic archives
- exploratory notes that are not designated current truth
## Restore and Ops Validation
Backups are not enough until restore has been tested.
Validate:
- SQLite metadata restore
- Chroma restore or rebuild
- project registry restore
- project refresh after recovery
- retrieval audit before and after recovery
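A minimal sketch of the SQLite + registry + Chroma restore step using the module API (the `python -m atocore.ops.backup restore` CLI wraps the same call; the AtoCore container must be stopped first, and the stamp below is a placeholder):
```python
# Restore a named snapshot; run only with the service stopped.
from atocore.ops.backup import restore_runtime_backup, validate_backup

stamp = "20260409T010203Z"  # placeholder: pick a real stamp from `list`
assert validate_backup(stamp).get("valid"), "refuse to restore a bad snapshot"
result = restore_runtime_backup(
    stamp,
    include_chroma=None,           # defer to what the snapshot captured
    pre_restore_snapshot=True,     # keep the restore itself reversible
    confirm_service_stopped=True,  # explicit acknowledgment
)
print(result["restored_integrity_ok"])
```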
Baseline capture:
```bash
python scripts/atocore_client.py health
python scripts/atocore_client.py stats
python scripts/atocore_client.py projects
```


@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "atocore"
version = "0.1.0"
version = "0.2.0"
description = "Personal context engine for LLM interactions"
requires-python = ">=3.11"
dependencies = [

scripts/atocore_client.py (new file)

@@ -0,0 +1,483 @@
"""Operator-facing API client for live AtoCore instances.
This script is intentionally external to the app runtime. It is for admins
and operators who want a convenient way to inspect live project state,
refresh projects, audit retrieval quality, manage trusted project-state
entries, and drive the Phase 9 reflection loop (capture, extract, queue,
promote, reject).
Environment variables
---------------------
ATOCORE_BASE_URL
Base URL of the AtoCore service (default: ``http://dalidou:8100``).
When running ON the Dalidou host itself or INSIDE the Dalidou
container, override this with loopback or the real IP::
ATOCORE_BASE_URL=http://127.0.0.1:8100 \\
python scripts/atocore_client.py health
The default hostname "dalidou" is meant for cases where the
caller is a remote machine (laptop, T420/OpenClaw, etc.) with
"dalidou" in its /etc/hosts or resolvable via Tailscale. It does
NOT reliably resolve on the host itself or inside the container,
and when it fails the client returns
``{"status": "unavailable", "fail_open": true}`` — the right
diagnosis when that happens is to set ATOCORE_BASE_URL explicitly
to 127.0.0.1:8100 and retry.
ATOCORE_TIMEOUT_SECONDS
Request timeout for most operations (default: 30).
ATOCORE_REFRESH_TIMEOUT_SECONDS
Longer timeout for project refresh operations which can be slow
(default: 1800).
ATOCORE_FAIL_OPEN
When "true" (default), network errors return a small fail-open
envelope instead of raising. Set to "false" for admin operations
where you need the real error.
"""
from __future__ import annotations
import argparse
import json
import os
import re
import sys
import urllib.error
import urllib.parse
import urllib.request
from typing import Any
BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100").rstrip("/")
TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30"))
REFRESH_TIMEOUT = int(os.environ.get("ATOCORE_REFRESH_TIMEOUT_SECONDS", "1800"))
FAIL_OPEN = os.environ.get("ATOCORE_FAIL_OPEN", "true").lower() == "true"
# Bumped when the subcommand surface or JSON output shapes meaningfully
# change. See docs/architecture/llm-client-integration.md for the
# semver rules. History:
# 0.1.0 initial stable-ops-only client
# 0.2.0 Phase 9 reflection loop added: capture, extract,
# reinforce-interaction, list-interactions, get-interaction,
# queue, promote, reject
CLIENT_VERSION = "0.2.0"
def print_json(payload: Any) -> None:
print(json.dumps(payload, ensure_ascii=True, indent=2))
def fail_open_payload() -> dict[str, Any]:
return {"status": "unavailable", "source": "atocore", "fail_open": True}
def request(
method: str,
path: str,
data: dict[str, Any] | None = None,
timeout: int | None = None,
) -> Any:
url = f"{BASE_URL}{path}"
headers = {"Content-Type": "application/json"} if data is not None else {}
payload = json.dumps(data).encode("utf-8") if data is not None else None
req = urllib.request.Request(url, data=payload, headers=headers, method=method)
try:
with urllib.request.urlopen(req, timeout=timeout or TIMEOUT) as response:
body = response.read().decode("utf-8")
except urllib.error.HTTPError as exc:
body = exc.read().decode("utf-8")
if body:
print(body)
raise SystemExit(22) from exc
except (urllib.error.URLError, TimeoutError, OSError):
if FAIL_OPEN:
print_json(fail_open_payload())
raise SystemExit(0)
raise
if not body.strip():
return {}
return json.loads(body)
def parse_aliases(aliases_csv: str) -> list[str]:
return [alias.strip() for alias in aliases_csv.split(",") if alias.strip()]
def detect_project(prompt: str) -> dict[str, Any]:
payload = request("GET", "/projects")
prompt_lower = prompt.lower()
best_project = None
best_alias = None
best_score = -1
for project in payload.get("projects", []):
candidates = [project.get("id", ""), *project.get("aliases", [])]
for candidate in candidates:
candidate = (candidate or "").strip()
if not candidate:
continue
pattern = rf"(?<![a-z0-9]){re.escape(candidate.lower())}(?![a-z0-9])"
matched = re.search(pattern, prompt_lower) is not None
if not matched and candidate.lower() not in prompt_lower:
continue
score = len(candidate)
if score > best_score:
best_project = project.get("id")
best_alias = candidate
best_score = score
return {"matched_project": best_project, "matched_alias": best_alias}
def classify_result(result: dict[str, Any]) -> dict[str, Any]:
source_file = (result.get("source_file") or "").lower()
heading = (result.get("heading_path") or "").lower()
title = (result.get("title") or "").lower()
text = " ".join([source_file, heading, title])
labels: list[str] = []
if any(token in text for token in ["_archive", "/archive", "archive/", "pre-cleanup", "pre-migration", "history"]):
labels.append("archive_or_history")
if any(token in text for token in ["status", "dashboard", "current-state", "current state", "next-steps", "next steps"]):
labels.append("current_status")
if any(token in text for token in ["decision", "adr", "tradeoff", "selected architecture", "selection"]):
labels.append("decision")
if any(token in text for token in ["requirement", "spec", "constraints", "baseline", "cdr", "sow"]):
labels.append("requirements")
if any(token in text for token in ["roadmap", "milestone", "plan", "workflow", "calibration", "contract"]):
labels.append("execution_plan")
if not labels:
labels.append("reference")
return {
"score": result.get("score"),
"title": result.get("title"),
"heading_path": result.get("heading_path"),
"source_file": result.get("source_file"),
"labels": labels,
"is_noise_risk": "archive_or_history" in labels,
}
def audit_query(prompt: str, top_k: int, project: str | None) -> dict[str, Any]:
response = request(
"POST",
"/query",
{"prompt": prompt, "top_k": top_k, "project": project or None},
)
classifications = [classify_result(result) for result in response.get("results", [])]
broad_prompt = len(prompt.split()) <= 2
noise_hits = sum(1 for item in classifications if item["is_noise_risk"])
current_hits = sum(1 for item in classifications if "current_status" in item["labels"])
decision_hits = sum(1 for item in classifications if "decision" in item["labels"])
requirements_hits = sum(1 for item in classifications if "requirements" in item["labels"])
recommendations: list[str] = []
if broad_prompt:
recommendations.append("Prompt is broad; prefer a project-specific question with intent, artifact type, or constraint language.")
if noise_hits:
recommendations.append("Archive/history noise is present; prefer current-status, decision, requirements, and baseline docs in the next ingestion/ranking pass.")
if current_hits == 0:
recommendations.append("No current-status docs surfaced in the top results; Wave 2 should ingest or strengthen trusted operational truth.")
if decision_hits == 0:
recommendations.append("No decision docs surfaced in the top results; add or freeze decision logs for the active project.")
if requirements_hits == 0:
recommendations.append("No requirements/baseline docs surfaced in the top results; prioritize baseline and architecture-freeze material.")
if not recommendations:
recommendations.append("Ranking looks healthy for this prompt.")
return {
"prompt": prompt,
"project": project,
"top_k": top_k,
"broad_prompt": broad_prompt,
"noise_hits": noise_hits,
"current_status_hits": current_hits,
"decision_hits": decision_hits,
"requirements_hits": requirements_hits,
"results": classifications,
"recommendations": recommendations,
}
def project_payload(
project_id: str,
aliases_csv: str,
source: str,
subpath: str,
description: str,
label: str,
) -> dict[str, Any]:
return {
"project_id": project_id,
"aliases": parse_aliases(aliases_csv),
"description": description,
"ingest_roots": [{"source": source, "subpath": subpath, "label": label}],
}
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(description="AtoCore live API client")
sub = parser.add_subparsers(dest="command", required=True)
for name in ["health", "sources", "stats", "projects", "project-template", "debug-context", "ingest-sources"]:
sub.add_parser(name)
p = sub.add_parser("detect-project")
p.add_argument("prompt")
p = sub.add_parser("auto-context")
p.add_argument("prompt")
p.add_argument("budget", nargs="?", type=int, default=3000)
p.add_argument("project", nargs="?", default="")
for name in ["propose-project", "register-project"]:
p = sub.add_parser(name)
p.add_argument("project_id")
p.add_argument("aliases_csv")
p.add_argument("source")
p.add_argument("subpath")
p.add_argument("description", nargs="?", default="")
p.add_argument("label", nargs="?", default="")
p = sub.add_parser("update-project")
p.add_argument("project")
p.add_argument("description")
p.add_argument("aliases_csv", nargs="?", default="")
p = sub.add_parser("refresh-project")
p.add_argument("project")
p.add_argument("purge_deleted", nargs="?", default="false")
p = sub.add_parser("project-state")
p.add_argument("project")
p.add_argument("category", nargs="?", default="")
p = sub.add_parser("project-state-set")
p.add_argument("project")
p.add_argument("category")
p.add_argument("key")
p.add_argument("value")
p.add_argument("source", nargs="?", default="")
p.add_argument("confidence", nargs="?", type=float, default=1.0)
p = sub.add_parser("project-state-invalidate")
p.add_argument("project")
p.add_argument("category")
p.add_argument("key")
p = sub.add_parser("query")
p.add_argument("prompt")
p.add_argument("top_k", nargs="?", type=int, default=5)
p.add_argument("project", nargs="?", default="")
p = sub.add_parser("context-build")
p.add_argument("prompt")
p.add_argument("project", nargs="?", default="")
p.add_argument("budget", nargs="?", type=int, default=3000)
p = sub.add_parser("audit-query")
p.add_argument("prompt")
p.add_argument("top_k", nargs="?", type=int, default=5)
p.add_argument("project", nargs="?", default="")
# --- Phase 9 reflection loop surface --------------------------------
#
# capture: record one interaction (prompt + response + context used).
# Mirrors POST /interactions. response is positional so shell
# callers can pass it via $(cat file.txt) or heredoc. project,
# client, and session_id are optional positionals with empty
# defaults, matching the existing script's style.
p = sub.add_parser("capture")
p.add_argument("prompt")
p.add_argument("response", nargs="?", default="")
p.add_argument("project", nargs="?", default="")
p.add_argument("client", nargs="?", default="")
p.add_argument("session_id", nargs="?", default="")
p.add_argument("reinforce", nargs="?", default="true")
# extract: run the Phase 9 C rule-based extractor against an
# already-captured interaction. persist='true' writes the
# candidates as status='candidate' memories; default is
# preview-only.
p = sub.add_parser("extract")
p.add_argument("interaction_id")
p.add_argument("persist", nargs="?", default="false")
# reinforce: backfill reinforcement on an already-captured interaction.
p = sub.add_parser("reinforce-interaction")
p.add_argument("interaction_id")
# list-interactions: paginated listing with filters.
p = sub.add_parser("list-interactions")
p.add_argument("project", nargs="?", default="")
p.add_argument("session_id", nargs="?", default="")
p.add_argument("client", nargs="?", default="")
p.add_argument("since", nargs="?", default="")
p.add_argument("limit", nargs="?", type=int, default=50)
# get-interaction: fetch one by id
p = sub.add_parser("get-interaction")
p.add_argument("interaction_id")
# queue: list the candidate review queue
p = sub.add_parser("queue")
p.add_argument("memory_type", nargs="?", default="")
p.add_argument("project", nargs="?", default="")
p.add_argument("limit", nargs="?", type=int, default=50)
# promote: candidate -> active
p = sub.add_parser("promote")
p.add_argument("memory_id")
# reject: candidate -> invalid
p = sub.add_parser("reject")
p.add_argument("memory_id")
return parser
def main() -> int:
args = build_parser().parse_args()
cmd = args.command
if cmd == "health":
print_json(request("GET", "/health"))
elif cmd == "sources":
print_json(request("GET", "/sources"))
elif cmd == "stats":
print_json(request("GET", "/stats"))
elif cmd == "projects":
print_json(request("GET", "/projects"))
elif cmd == "project-template":
print_json(request("GET", "/projects/template"))
elif cmd == "debug-context":
print_json(request("GET", "/debug/context"))
elif cmd == "ingest-sources":
print_json(request("POST", "/ingest/sources", {}))
elif cmd == "detect-project":
print_json(detect_project(args.prompt))
elif cmd == "auto-context":
project = args.project or detect_project(args.prompt).get("matched_project") or ""
if not project:
print_json({"status": "no_project_match", "source": "atocore", "mode": "auto-context"})
else:
print_json(request("POST", "/context/build", {"prompt": args.prompt, "project": project, "budget": args.budget}))
elif cmd in {"propose-project", "register-project"}:
path = "/projects/proposal" if cmd == "propose-project" else "/projects/register"
print_json(request("POST", path, project_payload(args.project_id, args.aliases_csv, args.source, args.subpath, args.description, args.label)))
elif cmd == "update-project":
payload: dict[str, Any] = {"description": args.description}
if args.aliases_csv.strip():
payload["aliases"] = parse_aliases(args.aliases_csv)
print_json(request("PUT", f"/projects/{urllib.parse.quote(args.project)}", payload))
elif cmd == "refresh-project":
purge_deleted = args.purge_deleted.lower() in {"1", "true", "yes", "y"}
path = f"/projects/{urllib.parse.quote(args.project)}/refresh?purge_deleted={str(purge_deleted).lower()}"
print_json(request("POST", path, {}, timeout=REFRESH_TIMEOUT))
elif cmd == "project-state":
suffix = f"?category={urllib.parse.quote(args.category)}" if args.category else ""
print_json(request("GET", f"/project/state/{urllib.parse.quote(args.project)}{suffix}"))
elif cmd == "project-state-set":
print_json(request("POST", "/project/state", {
"project": args.project,
"category": args.category,
"key": args.key,
"value": args.value,
"source": args.source,
"confidence": args.confidence,
}))
elif cmd == "project-state-invalidate":
print_json(request("DELETE", "/project/state", {"project": args.project, "category": args.category, "key": args.key}))
elif cmd == "query":
print_json(request("POST", "/query", {"prompt": args.prompt, "top_k": args.top_k, "project": args.project or None}))
elif cmd == "context-build":
print_json(request("POST", "/context/build", {"prompt": args.prompt, "project": args.project or None, "budget": args.budget}))
elif cmd == "audit-query":
print_json(audit_query(args.prompt, args.top_k, args.project or None))
# --- Phase 9 reflection loop surface ------------------------------
elif cmd == "capture":
body: dict[str, Any] = {
"prompt": args.prompt,
"response": args.response,
"project": args.project,
"client": args.client or "atocore-client",
"session_id": args.session_id,
"reinforce": args.reinforce.lower() in {"1", "true", "yes", "y"},
}
print_json(request("POST", "/interactions", body))
elif cmd == "extract":
persist = args.persist.lower() in {"1", "true", "yes", "y"}
print_json(
request(
"POST",
f"/interactions/{urllib.parse.quote(args.interaction_id, safe='')}/extract",
{"persist": persist},
)
)
elif cmd == "reinforce-interaction":
print_json(
request(
"POST",
f"/interactions/{urllib.parse.quote(args.interaction_id, safe='')}/reinforce",
{},
)
)
elif cmd == "list-interactions":
query_parts: list[str] = []
if args.project:
query_parts.append(f"project={urllib.parse.quote(args.project)}")
if args.session_id:
query_parts.append(f"session_id={urllib.parse.quote(args.session_id)}")
if args.client:
query_parts.append(f"client={urllib.parse.quote(args.client)}")
if args.since:
query_parts.append(f"since={urllib.parse.quote(args.since)}")
query_parts.append(f"limit={int(args.limit)}")
query = "?" + "&".join(query_parts)
print_json(request("GET", f"/interactions{query}"))
elif cmd == "get-interaction":
print_json(
request(
"GET",
f"/interactions/{urllib.parse.quote(args.interaction_id, safe='')}",
)
)
elif cmd == "queue":
query_parts = ["status=candidate"]
if args.memory_type:
query_parts.append(f"memory_type={urllib.parse.quote(args.memory_type)}")
if args.project:
query_parts.append(f"project={urllib.parse.quote(args.project)}")
query_parts.append(f"limit={int(args.limit)}")
query = "?" + "&".join(query_parts)
print_json(request("GET", f"/memory{query}"))
elif cmd == "promote":
print_json(
request(
"POST",
f"/memory/{urllib.parse.quote(args.memory_id, safe='')}/promote",
{},
)
)
elif cmd == "reject":
print_json(
request(
"POST",
f"/memory/{urllib.parse.quote(args.memory_id, safe='')}/reject",
{},
)
)
else:
return 1
return 0
if __name__ == "__main__":
raise SystemExit(main())

File diff suppressed because it is too large.

@@ -1,3 +1,15 @@
"""AtoCore — Personal Context Engine."""
__version__ = "0.1.0"
# Bumped when a commit meaningfully changes the API surface, schema, or
# user-visible behavior. The /health endpoint reports this value so
# deployment drift is immediately visible: if the running service's
# /health reports an older version than the main branch's __version__,
# the deployment is stale and needs a redeploy (see
# docs/dalidou-deployment.md and deploy/dalidou/deploy.sh).
#
# History:
# 0.1.0 Phase 0/0.5/1/2/3/5/7 baseline
# 0.2.0 Phase 9 reflection loop (capture/reinforce/extract + review
# queue), shared client v0.2.0, project identity
# canonicalization at every service-layer entry point
__version__ = "0.2.0"


@@ -742,12 +742,45 @@ def api_validate_backup(stamp: str) -> dict:
@router.get("/health")
def api_health() -> dict:
"""Health check."""
"""Health check.
Three layers of version reporting, in increasing precision:
- ``version`` / ``code_version``: ``atocore.__version__`` (e.g.
"0.2.0"). Bumped manually on commits that change the API
surface, schema, or user-visible behavior. Coarse — any
number of commits can land between bumps without changing
this value.
- ``build_sha``: full git SHA of the commit the running
container was built from. Set by ``deploy/dalidou/deploy.sh``
via the ``ATOCORE_BUILD_SHA`` env var on every rebuild.
Reports ``"unknown"`` for builds that bypass deploy.sh
(direct ``docker compose up`` etc.). This is the precise
drift signal: if the live ``build_sha`` doesn't match the
tip of the deployed branch on Gitea, the service is stale
regardless of what ``code_version`` says.
- ``build_time`` / ``build_branch``: when and from which branch
the live container was built. Useful for forensics when
multiple branches are in flight or when build_sha is
ambiguous (e.g. a force-push to the same SHA).
The deploy.sh post-deploy verification step compares the live
``build_sha`` to the SHA it just set, and exits non-zero on
mismatch.
"""
import os
from atocore import __version__
store = get_vector_store()
source_status = get_source_status()
return {
"status": "ok",
"version": "0.1.0",
"version": __version__,
"code_version": __version__,
"build_sha": os.environ.get("ATOCORE_BUILD_SHA", "unknown"),
"build_time": os.environ.get("ATOCORE_BUILD_TIME", "unknown"),
"build_branch": os.environ.get("ATOCORE_BUILD_BRANCH", "unknown"),
"vectors_count": store.count,
"env": _config.settings.env,
"machine_paths": {


@@ -14,6 +14,7 @@ import atocore.config as _config
from atocore.context.project_state import format_project_state, get_state
from atocore.memory.service import get_memories_for_context
from atocore.observability.logger import get_logger
from atocore.projects.registry import resolve_project_name
from atocore.retrieval.retriever import ChunkResult, retrieve
log = get_logger("context_builder")
@@ -84,8 +85,16 @@ def build_context(
max(0, int(budget * PROJECT_STATE_BUDGET_RATIO)),
)
if project_hint:
state_entries = get_state(project_hint)
# Canonicalize the project hint through the registry so callers
# can pass an alias (`p05`, `gigabit`) and still find trusted
# state stored under the canonical project id. The same helper
# is used everywhere a project name crosses a trust boundary
# (project_state, memories, interactions). When the registry has
# no entry the helper returns the input unchanged so hand-curated
# state that predates the registry still works.
canonical_project = resolve_project_name(project_hint) if project_hint else ""
if canonical_project:
state_entries = get_state(canonical_project)
if state_entries:
project_state_text = format_project_state(state_entries)
project_state_text, project_state_chars = _truncate_text_block(


@@ -18,6 +18,7 @@ from datetime import datetime, timezone
from atocore.models.database import get_connection
from atocore.observability.logger import get_logger
from atocore.projects.registry import resolve_project_name
log = get_logger("project_state")
@@ -101,11 +102,19 @@ def set_state(
source: str = "",
confidence: float = 1.0,
) -> ProjectStateEntry:
"""Set or update a project state entry. Upsert semantics."""
"""Set or update a project state entry. Upsert semantics.
The ``project_name`` is canonicalized through the registry so a
caller passing an alias (``p05``) ends up writing into the same
row as the canonical id (``p05-interferometer``). Without this
step, alias and canonical names would create two parallel
project rows and fragmented state.
"""
if category not in CATEGORIES:
raise ValueError(f"Invalid category '{category}'. Must be one of: {CATEGORIES}")
_validate_confidence(confidence)
project_name = resolve_project_name(project_name)
project_id = ensure_project(project_name)
entry_id = str(uuid.uuid4())
now = datetime.now(timezone.utc).isoformat()
@@ -153,7 +162,12 @@ def get_state(
category: str | None = None,
active_only: bool = True,
) -> list[ProjectStateEntry]:
"""Get project state entries, optionally filtered by category."""
"""Get project state entries, optionally filtered by category.
The lookup is canonicalized through the registry so an alias hint
finds the same rows as the canonical id.
"""
project_name = resolve_project_name(project_name)
with get_connection() as conn:
project = conn.execute(
"SELECT id FROM projects WHERE lower(name) = lower(?)", (project_name,)
@@ -191,7 +205,12 @@ def get_state(
def invalidate_state(project_name: str, category: str, key: str) -> bool:
"""Mark a project state entry as superseded."""
"""Mark a project state entry as superseded.
The lookup is canonicalized through the registry so an alias is
treated as the canonical project for the invalidation lookup.
"""
project_name = resolve_project_name(project_name)
with get_connection() as conn:
project = conn.execute(
"SELECT id FROM projects WHERE lower(name) = lower(?)", (project_name,)


@@ -18,15 +18,24 @@ violating the AtoCore trust hierarchy.
from __future__ import annotations
import json
import re
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from atocore.models.database import get_connection
from atocore.observability.logger import get_logger
from atocore.projects.registry import resolve_project_name
log = get_logger("interactions")
# Stored timestamps use 'YYYY-MM-DD HH:MM:SS' (no timezone offset, UTC by
# convention) so they sort lexically and compare cleanly with the SQLite
# CURRENT_TIMESTAMP default. The since filter accepts ISO 8601 strings
# (with 'T', optional 'Z' or +offset, optional fractional seconds) and
# normalizes them to the storage format before the SQL comparison.
_STORAGE_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
@dataclass
class Interaction:
@@ -72,6 +81,13 @@ def record_interaction(
if not prompt or not prompt.strip():
raise ValueError("Interaction prompt must be non-empty")
# Canonicalize the project through the registry so an alias and
# the canonical id store under the same bucket. Without this,
# reinforcement and extraction (which both query by raw
# interaction.project) would silently miss memories and create
# candidates in the wrong project.
project = resolve_project_name(project)
interaction_id = str(uuid.uuid4())
# Store created_at explicitly so the same string lives in both the DB
# column and the returned dataclass. SQLite's CURRENT_TIMESTAMP uses
@@ -159,9 +175,14 @@ def list_interactions(
) -> list[Interaction]:
"""List captured interactions, optionally filtered.
``since`` is an ISO timestamp string; only interactions created at or
after that time are returned. ``limit`` is hard-capped at 500 to keep
casual API listings cheap.
``since`` accepts an ISO 8601 timestamp string (with ``T``, an
optional ``Z`` or numeric offset, optional fractional seconds).
The value is normalized to the storage format (UTC,
``YYYY-MM-DD HH:MM:SS``) before the SQL comparison so external
callers can pass any of the common ISO shapes without filter
drift. ``project`` is canonicalized through the registry so an
alias finds rows stored under the canonical project id.
``limit`` is hard-capped at 500 to keep casual API listings cheap.
"""
if limit <= 0:
return []
@@ -172,7 +193,7 @@ def list_interactions(
if project:
query += " AND project = ?"
params.append(project)
params.append(resolve_project_name(project))
if session_id:
query += " AND session_id = ?"
params.append(session_id)
@@ -181,7 +202,7 @@ def list_interactions(
params.append(client)
if since:
query += " AND created_at >= ?"
params.append(since)
params.append(_normalize_since(since))
query += " ORDER BY created_at DESC LIMIT ?"
params.append(limit)
@@ -243,3 +264,41 @@ def _safe_json_dict(raw: str | None) -> dict:
if not isinstance(value, dict):
return {}
return value
def _normalize_since(since: str) -> str:
"""Normalize an ISO 8601 ``since`` filter to the storage format.
Stored ``created_at`` values are ``YYYY-MM-DD HH:MM:SS`` (no
timezone, UTC by convention). External callers naturally pass
ISO 8601 with ``T`` separator, optional ``Z`` suffix, optional
fractional seconds, and optional ``+HH:MM`` offsets. A naive
string comparison between the two formats fails on the same
day because the lexically-greater ``T`` makes any ISO value
sort after any space-separated value.
This helper accepts the common ISO shapes plus the bare
storage format and returns the storage format. On a parse
failure it returns the input unchanged so the SQL comparison
fails open (no rows match) instead of raising and breaking
the listing endpoint.
"""
if not since:
return since
candidate = since.strip()
# Python's fromisoformat understands trailing 'Z' from 3.11+ but
# we replace it explicitly for safety against earlier shapes.
if candidate.endswith("Z"):
candidate = candidate[:-1] + "+00:00"
try:
dt = datetime.fromisoformat(candidate)
except ValueError:
# Already in storage format, or unparseable: best-effort
# match the storage format with a regex; if that fails too,
# return the raw input.
if re.fullmatch(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", since):
return since
return since
if dt.tzinfo is not None:
dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
return dt.strftime(_STORAGE_TIMESTAMP_FORMAT)


@@ -4,6 +4,7 @@ from contextlib import asynccontextmanager
from fastapi import FastAPI
from atocore import __version__
from atocore.api.routes import router
import atocore.config as _config
from atocore.context.project_state import init_project_state_schema
@@ -43,7 +44,7 @@ async def lifespan(app: FastAPI):
app = FastAPI(
title="AtoCore",
description="Personal Context Engine for LLM interactions",
version="0.1.0",
version=__version__,
lifespan=lifespan,
)


@@ -8,10 +8,11 @@ given memory, without ever promoting anything new into trusted state.
Design notes
------------
- Matching is intentionally simple and explainable:
* normalize both sides (lowercase, collapse whitespace)
* require the normalized memory content (or its first 80 chars) to
appear as a substring in the normalized response
- Matching uses token-overlap: tokenize both sides (lowercase, stem,
drop stop words), then check whether >= 70 % of the memory's content
tokens appear in the response token set. This handles natural
paraphrases (e.g. "prefers" vs "prefer", "because history" vs
"because the history") that substring matching missed.
- Candidates and invalidated memories are NEVER considered — reinforcement
must not revive history.
- Reinforcement is capped at 1.0 and monotonically non-decreasing.
@@ -43,9 +44,12 @@ log = get_logger("reinforcement")
# memories like "prefers Python".
_MIN_MEMORY_CONTENT_LENGTH = 12
# When a memory's content is very long, match on its leading window only
# to avoid punishing small paraphrases further into the body.
_MATCH_WINDOW_CHARS = 80
# Token-overlap matching constants.
_STOP_WORDS: frozenset[str] = frozenset({
"the", "a", "an", "and", "or", "of", "to", "is", "was",
"that", "this", "with", "for", "from", "into",
})
_MATCH_THRESHOLD = 0.70
DEFAULT_CONFIDENCE_DELTA = 0.02
@@ -144,12 +148,58 @@ def _normalize(text: str) -> str:
return collapsed.strip()
def _stem(word: str) -> str:
"""Aggressive suffix-folding so inflected forms collapse.
Handles trailing ``ing``, ``ed``, and ``s`` — good enough for
reinforcement matching without pulling in nltk/snowball.
"""
# Order matters: try longest suffix first.
if word.endswith("ing") and len(word) >= 6:
return word[:-3]
if word.endswith("ed") and len(word) > 4:
stem = word[:-2]
# "preferred" → "preferr" → "prefer" (doubled consonant before -ed)
if len(stem) >= 3 and stem[-1] == stem[-2]:
stem = stem[:-1]
return stem
if word.endswith("s") and len(word) > 3:
return word[:-1]
return word
def _tokenize(text: str) -> set[str]:
"""Split normalized text into a stemmed token set.
Strips punctuation, drops words shorter than 3 chars and stop words.
"""
tokens: set[str] = set()
for raw in text.split():
# Strip leading/trailing punctuation (commas, periods, quotes, etc.)
word = raw.strip(".,;:!?\"'()[]{}-/")
if len(word) < 3:
continue
if word in _STOP_WORDS:
continue
tokens.add(_stem(word))
return tokens
def _memory_matches(memory_content: str, normalized_response: str) -> bool:
"""Return True if the memory content appears in the response."""
"""Return True if enough of the memory's tokens appear in the response.
Uses token-overlap: tokenize both sides (lowercase, stem, drop stop
words), then check whether >= 70 % of the memory's content tokens
appear in the response token set.
"""
if not memory_content:
return False
normalized_memory = _normalize(memory_content)
if len(normalized_memory) < _MIN_MEMORY_CONTENT_LENGTH:
return False
window = normalized_memory[:_MATCH_WINDOW_CHARS]
return window in normalized_response
memory_tokens = _tokenize(normalized_memory)
if not memory_tokens:
return False
response_tokens = _tokenize(normalized_response)
overlap = memory_tokens & response_tokens
return len(overlap) / len(memory_tokens) >= _MATCH_THRESHOLD


@@ -29,6 +29,7 @@ from datetime import datetime, timezone
from atocore.models.database import get_connection
from atocore.observability.logger import get_logger
from atocore.projects.registry import resolve_project_name
log = get_logger("memory")
@@ -84,6 +85,13 @@ def create_memory(
raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
_validate_confidence(confidence)
# Canonicalize the project through the registry so an alias and
# the canonical id store under the same bucket. This keeps
# reinforcement queries (which use the interaction's project) and
# context retrieval (which uses the registry-canonicalized hint)
# consistent with how memories are created.
project = resolve_project_name(project)
memory_id = str(uuid.uuid4())
now = datetime.now(timezone.utc).isoformat()
@@ -162,8 +170,13 @@ def get_memories(
query += " AND memory_type = ?"
params.append(memory_type)
if project is not None:
# Canonicalize on the read side so a caller passing an alias
# finds rows that were stored under the canonical id (and
# vice versa). resolve_project_name returns the input
# unchanged for unregistered names so empty-string queries
# for "no project scope" still work.
query += " AND project = ?"
params.append(project)
params.append(resolve_project_name(project))
if status is not None:
query += " AND status = ?"
params.append(status)


@@ -71,14 +71,18 @@ CREATE TABLE IF NOT EXISTS interactions (
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Indexes that reference columns guaranteed to exist since the first
-- release ship here. Indexes that reference columns added by later
-- migrations (memories.project, interactions.project,
-- interactions.session_id) are created inside _apply_migrations AFTER
-- the corresponding ALTER TABLE, NOT here. Creating them here would
-- fail on upgrade from a pre-migration schema because CREATE TABLE
-- IF NOT EXISTS is a no-op on an existing table, so the new columns
-- wouldn't be added before the CREATE INDEX runs.
CREATE INDEX IF NOT EXISTS idx_chunks_document ON source_chunks(document_id);
CREATE INDEX IF NOT EXISTS idx_memories_type ON memories(memory_type);
CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project);
CREATE INDEX IF NOT EXISTS idx_memories_status ON memories(status);
CREATE INDEX IF NOT EXISTS idx_interactions_project ON interactions(project_id);
CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project);
CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id);
CREATE INDEX IF NOT EXISTS idx_interactions_created_at ON interactions(created_at);
"""


@@ -103,12 +103,27 @@ def create_runtime_backup(
encoding="utf-8",
)
# Automatic post-backup validation. Failures log a warning but do
# not raise — the backup files are still on disk and may be useful.
validation = validate_backup(stamp)
validated = validation.get("valid", False)
validation_errors = validation.get("errors", [])
if not validated:
log.warning(
"post_backup_validation_failed",
backup_root=str(backup_root),
errors=validation_errors,
)
metadata["validated"] = validated
metadata["validation_errors"] = validation_errors
log.info(
"runtime_backup_created",
backup_root=str(backup_root),
db_snapshot=str(db_snapshot_path),
chroma_included=include_chroma,
chroma_bytes=chroma_bytes_copied,
validated=validated,
)
return metadata
@@ -216,6 +231,286 @@ def validate_backup(stamp: str) -> dict:
return result
def restore_runtime_backup(
stamp: str,
*,
include_chroma: bool | None = None,
pre_restore_snapshot: bool = True,
confirm_service_stopped: bool = False,
) -> dict:
"""Restore a previously captured runtime backup.
CRITICAL: the AtoCore service MUST be stopped before calling this.
Overwriting a live SQLite database corrupts state and can break
the running container's open connections. The caller must pass
``confirm_service_stopped=True`` as an explicit acknowledgment —
otherwise this function refuses to run.
The restore procedure:
1. Validate the backup via ``validate_backup``; refuse on any error.
2. (default) Create a pre-restore safety snapshot of the CURRENT
state so the restore itself is reversible. The snapshot stamp
is returned in the result for the operator to record.
3. Remove stale SQLite WAL/SHM sidecar files next to the target db
before copying — the snapshot is a self-contained main-file
image from ``conn.backup()``, and leftover WAL/SHM from the old
live db would desync against the restored main file.
4. Copy the snapshot db over the target db path.
5. Restore the project registry file if the snapshot captured one.
6. Restore the Chroma directory if ``include_chroma`` resolves to
true. When ``include_chroma is None`` the function defers to
whether the snapshot captured Chroma (the common case).
7. Run ``PRAGMA integrity_check`` on the restored db and report
the result.
Returns a dict describing what was restored. On refused restore
(service still running, validation failed) raises ``RuntimeError``.
"""
if not confirm_service_stopped:
raise RuntimeError(
"restore_runtime_backup refuses to run without "
"confirm_service_stopped=True — stop the AtoCore container "
"first (e.g. `docker compose down` from deploy/dalidou) "
"before calling this function"
)
validation = validate_backup(stamp)
if not validation.get("valid"):
raise RuntimeError(
f"backup {stamp} failed validation: {validation.get('errors')}"
)
metadata = validation.get("metadata") or {}
pre_snapshot_stamp: str | None = None
if pre_restore_snapshot:
pre = create_runtime_backup(include_chroma=False)
pre_snapshot_stamp = Path(pre["backup_root"]).name
target_db = _config.settings.db_path
source_db = Path(metadata.get("db_snapshot_path", ""))
if not source_db.exists():
raise RuntimeError(
f"db snapshot not found at {source_db} — backup "
f"metadata may be stale"
)
# Force sqlite to flush any lingering WAL into the main file and
# release OS-level file handles on -wal/-shm before we swap the
# main file. Passing through conn.backup() in the pre-restore
# snapshot can leave sidecars momentarily locked on Windows;
# an explicit checkpoint(TRUNCATE) is the reliable way to flush
# and release. Best-effort: if the target db can't be opened
# (missing, corrupt), fall through and trust the copy step.
if target_db.exists():
try:
with sqlite3.connect(str(target_db)) as checkpoint_conn:
checkpoint_conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
except sqlite3.DatabaseError as exc:
log.warning(
"restore_pre_checkpoint_failed",
target_db=str(target_db),
error=str(exc),
)
# Remove stale WAL/SHM sidecars from the old live db so SQLite
# can't read inconsistent state on next open. Tolerant to
# Windows file-lock races — the subsequent copy replaces the
# main file anyway, and the integrity check afterward is the
# actual correctness signal.
wal_path = target_db.with_name(target_db.name + "-wal")
shm_path = target_db.with_name(target_db.name + "-shm")
for stale in (wal_path, shm_path):
if stale.exists():
try:
stale.unlink()
except OSError as exc:
log.warning(
"restore_sidecar_unlink_failed",
path=str(stale),
error=str(exc),
)
target_db.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(source_db, target_db)
registry_restored = False
registry_snapshot_path = metadata.get("registry_snapshot_path", "")
if registry_snapshot_path:
src_reg = Path(registry_snapshot_path)
if src_reg.exists():
dst_reg = _config.settings.resolved_project_registry_path
dst_reg.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(src_reg, dst_reg)
registry_restored = True
chroma_snapshot_path = metadata.get("chroma_snapshot_path", "")
if include_chroma is None:
include_chroma = bool(chroma_snapshot_path)
chroma_restored = False
if include_chroma and chroma_snapshot_path:
src_chroma = Path(chroma_snapshot_path)
if src_chroma.exists() and src_chroma.is_dir():
dst_chroma = _config.settings.chroma_path
# Do NOT rmtree the destination itself: in a Dockerized
# deployment the chroma dir is a bind-mounted volume, and
# unlinking a mount point raises
# OSError [Errno 16] Device or resource busy.
# Instead, clear the directory's CONTENTS and copytree into
# it with dirs_exist_ok=True. This is equivalent to an
# rmtree+copytree for restore purposes but stays inside the
# mount boundary. Discovered during the first real restore
# drill on Dalidou (2026-04-09).
dst_chroma.mkdir(parents=True, exist_ok=True)
for item in dst_chroma.iterdir():
if item.is_dir() and not item.is_symlink():
shutil.rmtree(item)
else:
item.unlink()
shutil.copytree(src_chroma, dst_chroma, dirs_exist_ok=True)
chroma_restored = True
restored_integrity_ok = False
integrity_error: str | None = None
try:
with sqlite3.connect(str(target_db)) as conn:
row = conn.execute("PRAGMA integrity_check").fetchone()
restored_integrity_ok = bool(row and row[0] == "ok")
if not restored_integrity_ok:
integrity_error = row[0] if row else "no_row"
except sqlite3.DatabaseError as exc:
integrity_error = f"db_open_failed: {exc}"
result: dict = {
"stamp": stamp,
"pre_restore_snapshot": pre_snapshot_stamp,
"target_db": str(target_db),
"db_restored": True,
"registry_restored": registry_restored,
"chroma_restored": chroma_restored,
"restored_integrity_ok": restored_integrity_ok,
}
if integrity_error:
result["integrity_error"] = integrity_error
log.info(
"runtime_backup_restored",
stamp=stamp,
pre_restore_snapshot=pre_snapshot_stamp,
registry_restored=registry_restored,
chroma_restored=chroma_restored,
integrity_ok=restored_integrity_ok,
)
return result
def cleanup_old_backups(*, confirm: bool = False) -> dict:
"""Apply retention policy and remove old snapshots.
Retention keeps:
- Last 7 daily snapshots (most recent per calendar day)
- Last 4 weekly snapshots (most recent on each Sunday)
- Last 6 monthly snapshots (most recent on the 1st of each month)
All other snapshots are candidates for deletion. Runs as dry-run by
default; pass ``confirm=True`` to actually delete.
Returns a dict with kept/deleted counts and any errors.
"""
snapshots_root = _config.settings.resolved_backup_dir / "snapshots"
if not snapshots_root.exists() or not snapshots_root.is_dir():
return {"kept": 0, "deleted": 0, "would_delete": 0, "dry_run": not confirm, "errors": []}
# Parse all stamp directories into (datetime, dir_path) pairs.
stamps: list[tuple[datetime, Path]] = []
unparseable: list[str] = []
for entry in sorted(snapshots_root.iterdir()):
if not entry.is_dir():
continue
try:
dt = datetime.strptime(entry.name, "%Y%m%dT%H%M%SZ").replace(tzinfo=UTC)
stamps.append((dt, entry))
except ValueError:
unparseable.append(entry.name)
if not stamps:
return {
"kept": 0, "deleted": 0, "would_delete": 0,
"dry_run": not confirm, "errors": [],
"unparseable": unparseable,
}
# Sort newest first so "most recent per bucket" is a simple first-seen.
stamps.sort(key=lambda t: t[0], reverse=True)
keep_set: set[Path] = set()
# Last 7 daily: most recent snapshot per calendar day.
seen_days: set[str] = set()
for dt, path in stamps:
day_key = dt.strftime("%Y-%m-%d")
if day_key not in seen_days:
seen_days.add(day_key)
keep_set.add(path)
if len(seen_days) >= 7:
break
# Last 4 weekly: most recent snapshot that falls on a Sunday.
seen_weeks: set[str] = set()
for dt, path in stamps:
if dt.weekday() == 6: # Sunday
week_key = dt.strftime("%Y-W%W")
if week_key not in seen_weeks:
seen_weeks.add(week_key)
keep_set.add(path)
if len(seen_weeks) >= 4:
break
# Last 6 monthly: most recent snapshot on the 1st of a month.
seen_months: set[str] = set()
for dt, path in stamps:
if dt.day == 1:
month_key = dt.strftime("%Y-%m")
if month_key not in seen_months:
seen_months.add(month_key)
keep_set.add(path)
if len(seen_months) >= 6:
break
to_delete = [path for _, path in stamps if path not in keep_set]
errors: list[str] = []
deleted_count = 0
if confirm:
for path in to_delete:
try:
shutil.rmtree(path)
deleted_count += 1
except OSError as exc:
errors.append(f"{path.name}: {exc}")
result: dict = {
"kept": len(keep_set),
"dry_run": not confirm,
"errors": errors,
}
if confirm:
result["deleted"] = deleted_count
else:
result["would_delete"] = len(to_delete)
if unparseable:
result["unparseable"] = unparseable
log.info(
"cleanup_old_backups",
kept=len(keep_set),
deleted=deleted_count if confirm else 0,
would_delete=len(to_delete) if not confirm else 0,
dry_run=not confirm,
)
return result
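# Minimal usage sketch for the retention cleanup (dry-run first, then delete
# only once the report looks right):
#
#     from atocore.ops.backup import cleanup_old_backups
#     report = cleanup_old_backups()              # dry-run: nothing deleted
#     print(report["kept"], report["would_delete"])
#     if not report["errors"]:
#         cleanup_old_backups(confirm=True)       # actually delete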
def _backup_sqlite_db(source_path: Path, dest_path: Path) -> None:
source_conn = sqlite3.connect(str(source_path))
dest_conn = sqlite3.connect(str(dest_path))
@@ -242,7 +537,98 @@ def _copy_directory_tree(source: Path, dest: Path) -> tuple[int, int]:
def main() -> None:
"""CLI entry point for the backup module.
Supports five subcommands:
- ``create`` run ``create_runtime_backup`` (default if none given)
- ``list`` list all runtime backup snapshots
- ``validate`` validate a specific snapshot by stamp
- ``cleanup`` apply the retention policy (dry-run unless ``--confirm``)
- ``restore`` restore a specific snapshot by stamp
The restore subcommand is the one used by the backup/restore drill
and MUST be run only when the AtoCore service is stopped. It takes
``--confirm-service-stopped`` as an explicit acknowledgment.
"""
import argparse
parser = argparse.ArgumentParser(
prog="python -m atocore.ops.backup",
description="AtoCore runtime backup create/list/validate/restore",
)
sub = parser.add_subparsers(dest="command")
p_create = sub.add_parser("create", help="create a new runtime backup")
p_create.add_argument(
"--chroma",
action="store_true",
help="also snapshot the Chroma vector store (cold copy)",
)
sub.add_parser("list", help="list runtime backup snapshots")
p_validate = sub.add_parser("validate", help="validate a snapshot by stamp")
p_validate.add_argument("stamp", help="snapshot stamp (e.g. 20260409T010203Z)")
p_cleanup = sub.add_parser("cleanup", help="remove old snapshots per retention policy")
p_cleanup.add_argument(
"--confirm",
action="store_true",
help="actually delete (default is dry-run)",
)
p_restore = sub.add_parser(
"restore",
help="restore a snapshot by stamp (service must be stopped)",
)
p_restore.add_argument("stamp", help="snapshot stamp to restore")
p_restore.add_argument(
"--confirm-service-stopped",
action="store_true",
help="explicit acknowledgment that the AtoCore container is stopped",
)
p_restore.add_argument(
"--no-pre-snapshot",
action="store_true",
help="skip the pre-restore safety snapshot of current state",
)
chroma_group = p_restore.add_mutually_exclusive_group()
chroma_group.add_argument(
"--chroma",
dest="include_chroma",
action="store_true",
default=None,
help="force-restore the Chroma snapshot",
)
chroma_group.add_argument(
"--no-chroma",
dest="include_chroma",
action="store_false",
help="skip the Chroma snapshot even if it was captured",
)
args = parser.parse_args()
command = args.command or "create"
if command == "create":
include_chroma = getattr(args, "chroma", False)
result = create_runtime_backup(include_chroma=include_chroma)
elif command == "list":
result = {"backups": list_runtime_backups()}
elif command == "validate":
result = validate_backup(args.stamp)
elif command == "cleanup":
result = cleanup_old_backups(confirm=getattr(args, "confirm", False))
elif command == "restore":
result = restore_runtime_backup(
args.stamp,
include_chroma=args.include_chroma,
pre_restore_snapshot=not args.no_pre_snapshot,
confirm_service_stopped=args.confirm_service_stopped,
)
else: # pragma: no cover — argparse guards this
parser.error(f"unknown command: {command}")
print(json.dumps(result, indent=2, ensure_ascii=True))
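# Typical invocations (a sketch; the stamps are placeholders):
#     python -m atocore.ops.backup create --chroma
#     python -m atocore.ops.backup list
#     python -m atocore.ops.backup validate 20260409T010203Z
#     python -m atocore.ops.backup cleanup             # dry-run
#     python -m atocore.ops.backup cleanup --confirm   # delete per retention policy
#     python -m atocore.ops.backup restore 20260409T010203Z --confirm-service-stopped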

View File

@@ -254,6 +254,30 @@ def get_registered_project(project_name: str) -> RegisteredProject | None:
return None
def resolve_project_name(name: str | None) -> str:
"""Canonicalize a project name through the registry.
Returns the canonical ``project_id`` if the input matches any
registered project's id or alias. Returns the input unchanged
when it's empty or not in the registry; the latter case preserves
backwards compatibility with hand-curated state, memories, and
interactions that predate the registry, and with projects that
are intentionally not registered.
This helper is the single canonicalization boundary for project
names across the trust hierarchy. Every read/write that takes a
project name should pass it through ``resolve_project_name``
before storing or querying. The contract is documented in
``docs/architecture/representation-authority.md``.
"""
if not name:
return name or ""
project = get_registered_project(name)
if project is not None:
return project.project_id
return name
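# Example of the canonicalization contract (a sketch; the alias mapping
# shown is hypothetical and would come from the project registry file):
#
#     resolve_project_name("p05")                  # -> "p05-interferometer"
#     resolve_project_name("p05-interferometer")   # -> "p05-interferometer"
#     resolve_project_name("not-registered")       # -> "not-registered" (unchanged)
#     resolve_project_name("")                     # -> ""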
def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
"""Ingest all configured source roots for a registered project.

View File

@@ -1,5 +1,6 @@
"""pytest configuration and shared fixtures."""
import json
import os
import sys
import tempfile
@@ -29,6 +30,45 @@ def tmp_data_dir(tmp_path):
return tmp_path
@pytest.fixture
def project_registry(tmp_path, monkeypatch):
"""Stand up an isolated project registry pointing at a temp file.
Returns a callable that takes one or more (project_id, [aliases])
tuples and writes them into the registry, then forces the in-process
settings singleton to re-resolve. Use this when a test needs the
canonicalization helpers (resolve_project_name, get_registered_project)
to recognize aliases.
"""
registry_path = tmp_path / "test-project-registry.json"
def _set(*projects):
payload = {"projects": []}
for entry in projects:
if isinstance(entry, str):
project_id, aliases = entry, []
else:
project_id, aliases = entry
payload["projects"].append(
{
"id": project_id,
"aliases": list(aliases),
"description": f"test project {project_id}",
"ingest_roots": [
{"source": "vault", "subpath": f"incoming/projects/{project_id}"}
],
}
)
registry_path.write_text(json.dumps(payload), encoding="utf-8")
monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
from atocore import config
config.settings = config.Settings()
return registry_path
return _set
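# Sketch of how a test would use this fixture (the alias mapping is an
# example and the resolve_project_name import path is assumed):
#
#     def test_alias_is_canonicalized(project_registry):
#         project_registry(("p05-interferometer", ["p05", "interferometer"]))
#         assert resolve_project_name("p05") == "p05-interferometer"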
@pytest.fixture
def sample_markdown(tmp_path) -> Path:
"""Create a sample markdown file for testing."""

View File

@@ -50,6 +50,65 @@ def test_health_endpoint_exposes_machine_paths_and_source_readiness(tmp_data_dir
assert "run_dir" in body["machine_paths"]
def test_health_endpoint_reports_code_version_from_module(tmp_data_dir):
"""The /health response must include code_version reflecting
atocore.__version__, so deployment drift detection works."""
from atocore import __version__
client = TestClient(app)
response = client.get("/health")
assert response.status_code == 200
body = response.json()
assert body["version"] == __version__
assert body["code_version"] == __version__
def test_health_endpoint_reports_build_metadata_from_env(tmp_data_dir, monkeypatch):
"""The /health response must include build_sha, build_time, and
build_branch from the ATOCORE_BUILD_* env vars, so deploy.sh can
detect precise drift via SHA comparison instead of relying on
the coarse code_version field.
Regression test for the codex finding from 2026-04-08:
code_version 0.2.0 is too coarse to trust as a 'live is current'
signal because it only changes on manual bumps. The build_sha
field changes per commit and is set by deploy.sh.
"""
monkeypatch.setenv("ATOCORE_BUILD_SHA", "abc1234567890fedcba0987654321")
monkeypatch.setenv("ATOCORE_BUILD_TIME", "2026-04-09T01:23:45Z")
monkeypatch.setenv("ATOCORE_BUILD_BRANCH", "main")
client = TestClient(app)
response = client.get("/health")
assert response.status_code == 200
body = response.json()
assert body["build_sha"] == "abc1234567890fedcba0987654321"
assert body["build_time"] == "2026-04-09T01:23:45Z"
assert body["build_branch"] == "main"
def test_health_endpoint_reports_unknown_when_build_env_unset(tmp_data_dir, monkeypatch):
"""When deploy.sh hasn't set the build env vars (e.g. someone
ran `docker compose up` directly), /health reports 'unknown'
for all three build fields. This is a clear signal to the
operator that the deploy provenance is missing and they should
re-run via deploy.sh."""
monkeypatch.delenv("ATOCORE_BUILD_SHA", raising=False)
monkeypatch.delenv("ATOCORE_BUILD_TIME", raising=False)
monkeypatch.delenv("ATOCORE_BUILD_BRANCH", raising=False)
client = TestClient(app)
response = client.get("/health")
assert response.status_code == 200
body = response.json()
assert body["build_sha"] == "unknown"
assert body["build_time"] == "unknown"
assert body["build_branch"] == "unknown"
def test_projects_endpoint_reports_registered_projects(tmp_data_dir, monkeypatch):
vault_dir = tmp_data_dir / "vault-source"
drive_dir = tmp_data_dir / "drive-source"

View File

@@ -0,0 +1,313 @@
"""Tests for scripts/atocore_client.py — the shared operator CLI.
Specifically covers the Phase 9 reflection-loop subcommands added
after codex's sequence-step-3 review: ``capture``, ``extract``,
``reinforce-interaction``, ``list-interactions``, ``get-interaction``,
``queue``, ``promote``, ``reject``.
The tests mock the client's ``request()`` helper and verify each
subcommand:
- calls the correct HTTP method and path
- builds the correct JSON body (or the correct query string)
- passes the right subset of CLI arguments through
This is the same "wiring test" shape used by tests/test_api_storage.py:
we don't exercise the live HTTP stack; we verify the client builds
the request correctly. The server side is already covered by its
own route tests.
"""
from __future__ import annotations
import json
import sys
from pathlib import Path
import pytest
# Make scripts/ importable
_REPO_ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(_REPO_ROOT / "scripts"))
import atocore_client as client # noqa: E402
# ---------------------------------------------------------------------------
# Request capture helper
# ---------------------------------------------------------------------------
class _RequestCapture:
"""Drop-in replacement for client.request() that records calls."""
def __init__(self, response: dict | None = None):
self.calls: list[dict] = []
self._response = response if response is not None else {"ok": True}
def __call__(self, method, path, data=None, timeout=None):
self.calls.append(
{"method": method, "path": path, "data": data, "timeout": timeout}
)
return self._response
@pytest.fixture
def capture_requests(monkeypatch):
"""Replace client.request with a recording stub and return it."""
stub = _RequestCapture()
monkeypatch.setattr(client, "request", stub)
return stub
def _run_client(monkeypatch, argv: list[str]) -> int:
"""Simulate a CLI invocation with the given argv."""
monkeypatch.setattr(sys, "argv", ["atocore_client.py", *argv])
return client.main()
# ---------------------------------------------------------------------------
# capture
# ---------------------------------------------------------------------------
def test_capture_posts_to_interactions_endpoint(capture_requests, monkeypatch):
_run_client(
monkeypatch,
[
"capture",
"what is p05's current focus",
"The current focus is wave 2 operational ingestion.",
"p05-interferometer",
"claude-code-test",
"session-abc",
],
)
assert len(capture_requests.calls) == 1
call = capture_requests.calls[0]
assert call["method"] == "POST"
assert call["path"] == "/interactions"
body = call["data"]
assert body["prompt"] == "what is p05's current focus"
assert body["response"].startswith("The current focus")
assert body["project"] == "p05-interferometer"
assert body["client"] == "claude-code-test"
assert body["session_id"] == "session-abc"
assert body["reinforce"] is True # default
def test_capture_sets_default_client_when_omitted(capture_requests, monkeypatch):
_run_client(
monkeypatch,
["capture", "hi", "hello"],
)
call = capture_requests.calls[0]
assert call["data"]["client"] == "atocore-client"
assert call["data"]["project"] == ""
assert call["data"]["reinforce"] is True
def test_capture_accepts_reinforce_false(capture_requests, monkeypatch):
_run_client(
monkeypatch,
["capture", "prompt", "response", "p05", "claude", "sess", "false"],
)
call = capture_requests.calls[0]
assert call["data"]["reinforce"] is False
# ---------------------------------------------------------------------------
# extract
# ---------------------------------------------------------------------------
def test_extract_default_is_preview(capture_requests, monkeypatch):
_run_client(monkeypatch, ["extract", "abc-123"])
call = capture_requests.calls[0]
assert call["method"] == "POST"
assert call["path"] == "/interactions/abc-123/extract"
assert call["data"] == {"persist": False}
def test_extract_persist_true(capture_requests, monkeypatch):
_run_client(monkeypatch, ["extract", "abc-123", "true"])
call = capture_requests.calls[0]
assert call["data"] == {"persist": True}
def test_extract_url_encodes_interaction_id(capture_requests, monkeypatch):
_run_client(monkeypatch, ["extract", "abc/def"])
call = capture_requests.calls[0]
assert call["path"] == "/interactions/abc%2Fdef/extract"
# ---------------------------------------------------------------------------
# reinforce-interaction
# ---------------------------------------------------------------------------
def test_reinforce_interaction_posts_to_correct_path(capture_requests, monkeypatch):
_run_client(monkeypatch, ["reinforce-interaction", "int-xyz"])
call = capture_requests.calls[0]
assert call["method"] == "POST"
assert call["path"] == "/interactions/int-xyz/reinforce"
assert call["data"] == {}
# ---------------------------------------------------------------------------
# list-interactions
# ---------------------------------------------------------------------------
def test_list_interactions_no_filters(capture_requests, monkeypatch):
_run_client(monkeypatch, ["list-interactions"])
call = capture_requests.calls[0]
assert call["method"] == "GET"
assert call["path"] == "/interactions?limit=50"
def test_list_interactions_with_project_filter(capture_requests, monkeypatch):
_run_client(monkeypatch, ["list-interactions", "p05-interferometer"])
call = capture_requests.calls[0]
assert "project=p05-interferometer" in call["path"]
assert "limit=50" in call["path"]
def test_list_interactions_full_filter_set(capture_requests, monkeypatch):
_run_client(
monkeypatch,
[
"list-interactions",
"p05",
"sess-1",
"claude-code",
"2026-04-07T00:00:00Z",
"20",
],
)
call = capture_requests.calls[0]
path = call["path"]
assert "project=p05" in path
assert "session_id=sess-1" in path
assert "client=claude-code" in path
# The since value is URL-encoded, so its ':' characters are escaped in the path
assert "since=2026-04-07" in path
assert "limit=20" in path
# ---------------------------------------------------------------------------
# get-interaction
# ---------------------------------------------------------------------------
def test_get_interaction_fetches_by_id(capture_requests, monkeypatch):
_run_client(monkeypatch, ["get-interaction", "int-42"])
call = capture_requests.calls[0]
assert call["method"] == "GET"
assert call["path"] == "/interactions/int-42"
# ---------------------------------------------------------------------------
# queue
# ---------------------------------------------------------------------------
def test_queue_always_filters_by_candidate_status(capture_requests, monkeypatch):
_run_client(monkeypatch, ["queue"])
call = capture_requests.calls[0]
assert call["method"] == "GET"
assert call["path"].startswith("/memory?")
assert "status=candidate" in call["path"]
assert "limit=50" in call["path"]
def test_queue_with_memory_type_and_project(capture_requests, monkeypatch):
_run_client(monkeypatch, ["queue", "adaptation", "p05-interferometer", "10"])
call = capture_requests.calls[0]
path = call["path"]
assert "status=candidate" in path
assert "memory_type=adaptation" in path
assert "project=p05-interferometer" in path
assert "limit=10" in path
def test_queue_limit_coercion(capture_requests, monkeypatch):
"""limit is typed as int by argparse so string '25' becomes 25."""
_run_client(monkeypatch, ["queue", "", "", "25"])
call = capture_requests.calls[0]
assert "limit=25" in call["path"]
# ---------------------------------------------------------------------------
# promote / reject
# ---------------------------------------------------------------------------
def test_promote_posts_to_memory_promote_path(capture_requests, monkeypatch):
_run_client(monkeypatch, ["promote", "mem-abc"])
call = capture_requests.calls[0]
assert call["method"] == "POST"
assert call["path"] == "/memory/mem-abc/promote"
assert call["data"] == {}
def test_reject_posts_to_memory_reject_path(capture_requests, monkeypatch):
_run_client(monkeypatch, ["reject", "mem-xyz"])
call = capture_requests.calls[0]
assert call["method"] == "POST"
assert call["path"] == "/memory/mem-xyz/reject"
assert call["data"] == {}
def test_promote_url_encodes_memory_id(capture_requests, monkeypatch):
_run_client(monkeypatch, ["promote", "mem/with/slashes"])
call = capture_requests.calls[0]
assert "mem%2Fwith%2Fslashes" in call["path"]
# ---------------------------------------------------------------------------
# end-to-end: ensure the Phase 9 loop can be driven entirely through
# the client
# ---------------------------------------------------------------------------
def test_phase9_full_loop_via_client_shape(capture_requests, monkeypatch):
"""Simulate the full capture -> extract -> queue -> promote cycle.
This doesn't exercise real HTTP — each call is intercepted by
the mock request. But it proves every step of the Phase 9 loop
is reachable through the shared client, which is the whole point
of the codex-step-3 work.
"""
# Step 1: capture
_run_client(
monkeypatch,
[
"capture",
"what about GF-PTFE for lateral support",
"## Decision: use GF-PTFE pads for thermal stability",
"p05-interferometer",
],
)
# Step 2: extract candidates (preview)
_run_client(monkeypatch, ["extract", "fake-interaction-id"])
# Step 3: extract and persist
_run_client(monkeypatch, ["extract", "fake-interaction-id", "true"])
# Step 4: list the review queue
_run_client(monkeypatch, ["queue"])
# Step 5: promote a candidate
_run_client(monkeypatch, ["promote", "fake-memory-id"])
# Step 6: reject another
_run_client(monkeypatch, ["reject", "fake-memory-id-2"])
methods_and_paths = [
(c["method"], c["path"]) for c in capture_requests.calls
]
assert methods_and_paths == [
("POST", "/interactions"),
("POST", "/interactions/fake-interaction-id/extract"),
("POST", "/interactions/fake-interaction-id/extract"),
("GET", "/memory?status=candidate&limit=50"),
("POST", "/memory/fake-memory-id/promote"),
("POST", "/memory/fake-memory-id-2/reject"),
]
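# The same loop driven from a terminal with the shared client (a sketch;
# the script path and the ids are placeholders):
#     python scripts/atocore_client.py capture "prompt text" "response text" p05-interferometer
#     python scripts/atocore_client.py extract <interaction-id> true
#     python scripts/atocore_client.py queue
#     python scripts/atocore_client.py promote <memory-id>
#     python scripts/atocore_client.py reject <memory-id>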

View File

@@ -1,14 +1,18 @@
"""Tests for runtime backup creation."""
"""Tests for runtime backup creation, restore, and retention cleanup."""
import json
import sqlite3
from datetime import UTC, datetime, timedelta
import pytest
import atocore.config as config
from atocore.models.database import init_db
from atocore.ops.backup import (
cleanup_old_backups,
create_runtime_backup,
list_runtime_backups,
restore_runtime_backup,
validate_backup,
)
@@ -156,3 +160,531 @@ def test_create_runtime_backup_handles_missing_registry(tmp_path, monkeypatch):
config.settings = original_settings
assert result["registry_snapshot_path"] == ""
def test_restore_refuses_without_confirm_service_stopped(tmp_path, monkeypatch):
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
create_runtime_backup(datetime(2026, 4, 9, 10, 0, 0, tzinfo=UTC))
with pytest.raises(RuntimeError, match="confirm_service_stopped"):
restore_runtime_backup("20260409T100000Z")
finally:
config.settings = original_settings
def test_restore_raises_on_invalid_backup(tmp_path, monkeypatch):
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
with pytest.raises(RuntimeError, match="failed validation"):
restore_runtime_backup(
"20250101T000000Z", confirm_service_stopped=True
)
finally:
config.settings = original_settings
def test_restore_round_trip_reverses_post_backup_mutations(tmp_path, monkeypatch):
"""Canonical drill: snapshot -> mutate -> restore -> mutation gone."""
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
registry_path = tmp_path / "config" / "project-registry.json"
registry_path.parent.mkdir(parents=True)
registry_path.write_text(
'{"projects":[{"id":"p01-example","aliases":[],'
'"ingest_roots":[{"source":"vault","subpath":"incoming/projects/p01-example"}]}]}\n',
encoding="utf-8",
)
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
# 1. Seed baseline state that should SURVIVE the restore.
with sqlite3.connect(str(config.settings.db_path)) as conn:
conn.execute(
"INSERT INTO projects (id, name) VALUES (?, ?)",
("p01", "Baseline Project"),
)
conn.commit()
# 2. Create the backup we're going to restore to.
create_runtime_backup(datetime(2026, 4, 9, 11, 0, 0, tzinfo=UTC))
stamp = "20260409T110000Z"
# 3. Mutate live state AFTER the backup — this is what the
# restore should reverse.
with sqlite3.connect(str(config.settings.db_path)) as conn:
conn.execute(
"INSERT INTO projects (id, name) VALUES (?, ?)",
("p99", "Post Backup Mutation"),
)
conn.commit()
# Confirm the mutation is present before restore.
with sqlite3.connect(str(config.settings.db_path)) as conn:
row = conn.execute(
"SELECT name FROM projects WHERE id = ?", ("p99",)
).fetchone()
assert row is not None and row[0] == "Post Backup Mutation"
# 4. Restore — the drill procedure. Explicit confirm_service_stopped.
result = restore_runtime_backup(
stamp, confirm_service_stopped=True
)
# 5. Verify restore report
assert result["stamp"] == stamp
assert result["db_restored"] is True
assert result["registry_restored"] is True
assert result["restored_integrity_ok"] is True
assert result["pre_restore_snapshot"] is not None
# 6. Verify live state reflects the restore: baseline survived,
# post-backup mutation is gone.
with sqlite3.connect(str(config.settings.db_path)) as conn:
baseline = conn.execute(
"SELECT name FROM projects WHERE id = ?", ("p01",)
).fetchone()
mutation = conn.execute(
"SELECT name FROM projects WHERE id = ?", ("p99",)
).fetchone()
assert baseline is not None and baseline[0] == "Baseline Project"
assert mutation is None
# 7. Pre-restore safety snapshot DOES contain the mutation —
# it captured current state before overwriting. This is the
# reversibility guarantee: the operator can restore back to
# it if the restore itself was a mistake.
pre_stamp = result["pre_restore_snapshot"]
pre_validation = validate_backup(pre_stamp)
assert pre_validation["valid"] is True
pre_db_path = pre_validation["metadata"]["db_snapshot_path"]
with sqlite3.connect(pre_db_path) as conn:
pre_mutation = conn.execute(
"SELECT name FROM projects WHERE id = ?", ("p99",)
).fetchone()
assert pre_mutation is not None and pre_mutation[0] == "Post Backup Mutation"
finally:
config.settings = original_settings
def test_restore_round_trip_with_chroma(tmp_path, monkeypatch):
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
# Seed baseline chroma state that should survive restore.
chroma_dir = config.settings.chroma_path
(chroma_dir / "coll-a").mkdir(parents=True, exist_ok=True)
(chroma_dir / "coll-a" / "baseline.bin").write_bytes(b"baseline")
create_runtime_backup(
datetime(2026, 4, 9, 12, 0, 0, tzinfo=UTC), include_chroma=True
)
stamp = "20260409T120000Z"
# Mutate chroma after backup: add a file + remove baseline.
(chroma_dir / "coll-a" / "post_backup.bin").write_bytes(b"post")
(chroma_dir / "coll-a" / "baseline.bin").unlink()
result = restore_runtime_backup(
stamp, confirm_service_stopped=True
)
assert result["chroma_restored"] is True
assert (chroma_dir / "coll-a" / "baseline.bin").exists()
assert not (chroma_dir / "coll-a" / "post_backup.bin").exists()
finally:
config.settings = original_settings
def test_restore_chroma_does_not_unlink_destination_directory(tmp_path, monkeypatch):
"""Regression: restore must not rmtree the chroma dir itself.
In a Dockerized deployment the chroma dir is a bind-mounted
volume. Calling shutil.rmtree on a mount point raises
``OSError [Errno 16] Device or resource busy``, which broke the
first real Dalidou drill on 2026-04-09. The fix clears the
directory's CONTENTS and copytree(dirs_exist_ok=True) into it,
keeping the directory inode (and any bind mount) intact.
This test captures the inode of the destination directory before
and after restore and asserts they match — that's what a
bind-mounted chroma dir would also see.
"""
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
chroma_dir = config.settings.chroma_path
(chroma_dir / "coll-a").mkdir(parents=True, exist_ok=True)
(chroma_dir / "coll-a" / "baseline.bin").write_bytes(b"baseline")
create_runtime_backup(
datetime(2026, 4, 9, 15, 0, 0, tzinfo=UTC), include_chroma=True
)
# Capture the destination directory's stat signature before restore.
chroma_stat_before = chroma_dir.stat()
# Add a file post-backup so restore has work to do.
(chroma_dir / "coll-a" / "post_backup.bin").write_bytes(b"post")
restore_runtime_backup(
"20260409T150000Z", confirm_service_stopped=True
)
# Directory still exists (would have failed on mount point) and
# its st_ino matches — the mount itself wasn't unlinked.
assert chroma_dir.exists()
chroma_stat_after = chroma_dir.stat()
assert chroma_stat_before.st_ino == chroma_stat_after.st_ino, (
"chroma directory inode changed — restore recreated the "
"directory instead of clearing its contents; this would "
"fail on a Docker bind-mounted volume"
)
# And the contents did actually get restored.
assert (chroma_dir / "coll-a" / "baseline.bin").exists()
assert not (chroma_dir / "coll-a" / "post_backup.bin").exists()
finally:
config.settings = original_settings
def test_restore_skips_pre_snapshot_when_requested(tmp_path, monkeypatch):
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
create_runtime_backup(datetime(2026, 4, 9, 13, 0, 0, tzinfo=UTC))
before_count = len(list_runtime_backups())
result = restore_runtime_backup(
"20260409T130000Z",
confirm_service_stopped=True,
pre_restore_snapshot=False,
)
after_count = len(list_runtime_backups())
assert result["pre_restore_snapshot"] is None
assert after_count == before_count
finally:
config.settings = original_settings
def test_create_backup_includes_validation_fields(tmp_path, monkeypatch):
"""Task B: create_runtime_backup auto-validates and reports result."""
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
result = create_runtime_backup(datetime(2026, 4, 11, 10, 0, 0, tzinfo=UTC))
finally:
config.settings = original_settings
assert "validated" in result
assert "validation_errors" in result
assert result["validated"] is True
assert result["validation_errors"] == []
def test_create_backup_validation_failure_does_not_raise(tmp_path, monkeypatch):
"""Task B: if post-backup validation fails, backup still returns metadata."""
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
def _broken_validate(stamp):
return {"valid": False, "errors": ["db_missing", "metadata_missing"]}
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
monkeypatch.setattr("atocore.ops.backup.validate_backup", _broken_validate)
result = create_runtime_backup(datetime(2026, 4, 11, 11, 0, 0, tzinfo=UTC))
finally:
config.settings = original_settings
# Should NOT have raised — backup still returned metadata
assert result["validated"] is False
assert result["validation_errors"] == ["db_missing", "metadata_missing"]
# Core backup fields still present
assert "db_snapshot_path" in result
assert "created_at" in result
def test_restore_cleans_stale_wal_sidecars(tmp_path, monkeypatch):
"""Stale WAL/SHM sidecars must not carry bytes past the restore.
Note: after restore runs, PRAGMA integrity_check reopens the
restored db which may legitimately recreate a fresh -wal. So we
assert that the STALE byte marker no longer appears in either
sidecar, not that the files are absent.
"""
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
create_runtime_backup(datetime(2026, 4, 9, 14, 0, 0, tzinfo=UTC))
# Write fake stale WAL/SHM next to the live db with an
# unmistakable marker.
target_db = config.settings.db_path
wal = target_db.with_name(target_db.name + "-wal")
shm = target_db.with_name(target_db.name + "-shm")
stale_marker = b"STALE-SIDECAR-MARKER-DO-NOT-SURVIVE"
wal.write_bytes(stale_marker)
shm.write_bytes(stale_marker)
assert wal.exists() and shm.exists()
restore_runtime_backup(
"20260409T140000Z", confirm_service_stopped=True
)
# The restored db must pass integrity check (tested elsewhere);
# here we just confirm that no file next to it still contains
# the stale marker from the old live process.
for sidecar in (wal, shm):
if sidecar.exists():
assert stale_marker not in sidecar.read_bytes(), (
f"{sidecar.name} still carries stale marker"
)
finally:
config.settings = original_settings
# ---------------------------------------------------------------------------
# Task C: Backup retention cleanup
# ---------------------------------------------------------------------------
def _setup_cleanup_env(tmp_path, monkeypatch):
"""Helper: configure env, init db, return snapshots_root."""
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
original = config.settings
config.settings = config.Settings()
init_db()
snapshots_root = config.settings.resolved_backup_dir / "snapshots"
snapshots_root.mkdir(parents=True, exist_ok=True)
return original, snapshots_root
def _seed_snapshots(snapshots_root, dates):
"""Create minimal valid snapshot dirs for the given datetimes."""
for dt in dates:
stamp = dt.strftime("%Y%m%dT%H%M%SZ")
snap_dir = snapshots_root / stamp
db_dir = snap_dir / "db"
db_dir.mkdir(parents=True, exist_ok=True)
db_path = db_dir / "atocore.db"
conn = sqlite3.connect(str(db_path))
conn.execute("CREATE TABLE IF NOT EXISTS _marker (id INTEGER)")
conn.close()
metadata = {
"created_at": dt.isoformat(),
"backup_root": str(snap_dir),
"db_snapshot_path": str(db_path),
"db_size_bytes": db_path.stat().st_size,
"registry_snapshot_path": "",
"chroma_snapshot_path": "",
"chroma_snapshot_bytes": 0,
"chroma_snapshot_files": 0,
"chroma_snapshot_included": False,
"vector_store_note": "",
}
(snap_dir / "backup-metadata.json").write_text(
json.dumps(metadata, indent=2) + "\n", encoding="utf-8"
)
def test_cleanup_empty_dir(tmp_path, monkeypatch):
original, _ = _setup_cleanup_env(tmp_path, monkeypatch)
try:
result = cleanup_old_backups()
assert result["kept"] == 0
assert result["would_delete"] == 0
assert result["dry_run"] is True
finally:
config.settings = original
def test_cleanup_dry_run_identifies_old_snapshots(tmp_path, monkeypatch):
original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
try:
# 10 daily snapshots Apr 2-11 (avoiding Apr 1 which is monthly).
base = datetime(2026, 4, 2, 12, 0, 0, tzinfo=UTC)
dates = [base + timedelta(days=i) for i in range(10)]
_seed_snapshots(snapshots_root, dates)
result = cleanup_old_backups()
assert result["dry_run"] is True
# The newest 7 calendar days are Apr 11, 10, 9, 8, 7, 6, 5: all kept as daily.
# Apr 5 is also a Sunday, but it is already in the daily keep set.
# Apr 2, 3, and 4 are neither Sundays nor the 1st of a month, so all
# three are deletion candidates.
assert result["kept"] == 7
assert result["would_delete"] == 3
assert len(list(snapshots_root.iterdir())) == 10
finally:
config.settings = original
def test_cleanup_confirm_deletes(tmp_path, monkeypatch):
original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
try:
base = datetime(2026, 4, 2, 12, 0, 0, tzinfo=UTC)
dates = [base + timedelta(days=i) for i in range(10)]
_seed_snapshots(snapshots_root, dates)
result = cleanup_old_backups(confirm=True)
assert result["dry_run"] is False
assert result["deleted"] == 3
assert result["kept"] == 7
assert len(list(snapshots_root.iterdir())) == 7
finally:
config.settings = original
def test_cleanup_keeps_last_7_daily(tmp_path, monkeypatch):
"""Exactly 7 snapshots on different days → all kept."""
original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
try:
base = datetime(2026, 4, 5, 12, 0, 0, tzinfo=UTC)
dates = [base + timedelta(days=i) for i in range(7)]
_seed_snapshots(snapshots_root, dates)
result = cleanup_old_backups()
assert result["kept"] == 7
assert result["would_delete"] == 0
finally:
config.settings = original
def test_cleanup_keeps_sunday_weekly(tmp_path, monkeypatch):
"""Snapshots on Sundays outside the 7-day window are kept as weekly."""
original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
try:
# 7 daily snapshots covering Apr 5-11
base = datetime(2026, 4, 5, 12, 0, 0, tzinfo=UTC)
daily = [base + timedelta(days=i) for i in range(7)]
# 2 older Sunday snapshots
sun1 = datetime(2026, 3, 29, 12, 0, 0, tzinfo=UTC) # Sunday
sun2 = datetime(2026, 3, 22, 12, 0, 0, tzinfo=UTC) # Sunday
# A non-Sunday old snapshot that should be deleted
wed = datetime(2026, 3, 25, 12, 0, 0, tzinfo=UTC) # Wednesday
_seed_snapshots(snapshots_root, daily + [sun1, sun2, wed])
result = cleanup_old_backups()
# 7 daily + 2 Sunday weekly = 9 kept, 1 Wednesday deleted
assert result["kept"] == 9
assert result["would_delete"] == 1
finally:
config.settings = original
def test_cleanup_keeps_monthly_first(tmp_path, monkeypatch):
"""Snapshots on the 1st of a month outside daily+weekly are kept as monthly."""
original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
try:
# 7 daily in April 2026
base = datetime(2026, 4, 5, 12, 0, 0, tzinfo=UTC)
daily = [base + timedelta(days=i) for i in range(7)]
# Old monthly 1st snapshots
m1 = datetime(2026, 1, 1, 12, 0, 0, tzinfo=UTC)
m2 = datetime(2025, 12, 1, 12, 0, 0, tzinfo=UTC)
# Old non-1st, non-Sunday snapshot — should be deleted
old = datetime(2026, 1, 15, 12, 0, 0, tzinfo=UTC)
_seed_snapshots(snapshots_root, daily + [m1, m2, old])
result = cleanup_old_backups()
# 7 daily + 2 monthly = 9 kept, 1 deleted
assert result["kept"] == 9
assert result["would_delete"] == 1
finally:
config.settings = original
def test_cleanup_unparseable_stamp_skipped(tmp_path, monkeypatch):
"""Directories with unparseable names are ignored, not deleted."""
original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
try:
base = datetime(2026, 4, 5, 12, 0, 0, tzinfo=UTC)
_seed_snapshots(snapshots_root, [base])
bad_dir = snapshots_root / "not-a-timestamp"
bad_dir.mkdir()
result = cleanup_old_backups(confirm=True)
assert result.get("unparseable") == ["not-a-timestamp"]
assert bad_dir.exists()
assert result["kept"] == 1
finally:
config.settings = original

tests/test_capture_stop.py (new file, 249 lines)
View File

@@ -0,0 +1,249 @@
"""Tests for deploy/hooks/capture_stop.py — Claude Code Stop hook."""
from __future__ import annotations
import json
import os
import sys
import tempfile
import textwrap
from io import StringIO
from pathlib import Path
from unittest import mock
import pytest
# The hook script lives outside of the normal package tree, so import
# it by manipulating sys.path.
_HOOK_DIR = str(Path(__file__).resolve().parent.parent / "deploy" / "hooks")
if _HOOK_DIR not in sys.path:
sys.path.insert(0, _HOOK_DIR)
import capture_stop # noqa: E402
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _write_transcript(tmp: Path, entries: list[dict]) -> str:
"""Write a JSONL transcript and return the path."""
path = tmp / "transcript.jsonl"
with open(path, "w", encoding="utf-8") as f:
for entry in entries:
f.write(json.dumps(entry, ensure_ascii=False) + "\n")
return str(path)
def _user_entry(content: str, *, is_meta: bool = False) -> dict:
return {
"type": "user",
"isMeta": is_meta,
"message": {"role": "user", "content": content},
}
def _assistant_entry() -> dict:
return {
"type": "assistant",
"message": {
"role": "assistant",
"content": [{"type": "text", "text": "Sure, here's the answer."}],
},
}
def _system_entry() -> dict:
return {"type": "system", "message": {"role": "system", "content": "system init"}}
# ---------------------------------------------------------------------------
# _extract_last_user_prompt
# ---------------------------------------------------------------------------
class TestExtractLastUserPrompt:
def test_returns_last_real_prompt(self, tmp_path):
path = _write_transcript(tmp_path, [
_user_entry("First prompt that is long enough to capture"),
_assistant_entry(),
_user_entry("Second prompt that should be the one we capture"),
_assistant_entry(),
])
result = capture_stop._extract_last_user_prompt(path)
assert result == "Second prompt that should be the one we capture"
def test_skips_meta_messages(self, tmp_path):
path = _write_transcript(tmp_path, [
_user_entry("Real prompt that is definitely long enough"),
_user_entry("<local-command>some system stuff</local-command>"),
_user_entry("Meta message that looks real enough", is_meta=True),
])
result = capture_stop._extract_last_user_prompt(path)
assert result == "Real prompt that is definitely long enough"
def test_skips_xml_content(self, tmp_path):
path = _write_transcript(tmp_path, [
_user_entry("Actual prompt from a real human user"),
_user_entry("<command-name>/help</command-name>"),
])
result = capture_stop._extract_last_user_prompt(path)
assert result == "Actual prompt from a real human user"
def test_skips_short_messages(self, tmp_path):
path = _write_transcript(tmp_path, [
_user_entry("This prompt is long enough to be captured"),
_user_entry("yes"), # too short
])
result = capture_stop._extract_last_user_prompt(path)
assert result == "This prompt is long enough to be captured"
def test_handles_content_blocks(self, tmp_path):
entry = {
"type": "user",
"message": {
"role": "user",
"content": [
{"type": "text", "text": "First paragraph of the prompt."},
{"type": "text", "text": "Second paragraph continues here."},
],
},
}
path = _write_transcript(tmp_path, [entry])
result = capture_stop._extract_last_user_prompt(path)
assert "First paragraph" in result
assert "Second paragraph" in result
def test_empty_transcript(self, tmp_path):
path = _write_transcript(tmp_path, [])
result = capture_stop._extract_last_user_prompt(path)
assert result == ""
def test_missing_file(self):
result = capture_stop._extract_last_user_prompt("/nonexistent/path.jsonl")
assert result == ""
def test_empty_path(self):
result = capture_stop._extract_last_user_prompt("")
assert result == ""
# ---------------------------------------------------------------------------
# _infer_project
# ---------------------------------------------------------------------------
class TestInferProject:
def test_empty_cwd(self):
assert capture_stop._infer_project("") == ""
def test_unknown_path(self):
assert capture_stop._infer_project("C:\\Users\\antoi\\random") == ""
def test_mapped_path(self):
with mock.patch.dict(capture_stop._PROJECT_PATH_MAP, {
"C:\\Users\\antoi\\gigabit": "p04-gigabit",
}):
result = capture_stop._infer_project("C:\\Users\\antoi\\gigabit\\src")
assert result == "p04-gigabit"
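# One plausible shape for the cwd -> project mapping patched above (a sketch
# only; the real _PROJECT_PATH_MAP and matching rule live in capture_stop.py
# and may differ):
#
#     _PROJECT_PATH_MAP = {
#         "C:\\Users\\antoi\\gigabit": "p04-gigabit",
#     }
#     def _infer_project(cwd: str) -> str:
#         for prefix, project in _PROJECT_PATH_MAP.items():
#             if cwd.startswith(prefix):
#                 return project
#         return ""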
# ---------------------------------------------------------------------------
# _capture (integration-style, mocking HTTP)
# ---------------------------------------------------------------------------
class TestCapture:
def _hook_input(self, *, transcript_path: str = "", **overrides) -> str:
data = {
"session_id": "test-session-123",
"transcript_path": transcript_path,
"cwd": "C:\\Users\\antoi\\ATOCore",
"permission_mode": "default",
"hook_event_name": "Stop",
"last_assistant_message": "Here is the answer to your question about the code.",
"turn_number": 3,
}
data.update(overrides)
return json.dumps(data)
@mock.patch("capture_stop.urllib.request.urlopen")
def test_posts_to_atocore(self, mock_urlopen, tmp_path):
transcript = _write_transcript(tmp_path, [
_user_entry("Please explain how the backup system works in detail"),
_assistant_entry(),
])
mock_resp = mock.MagicMock()
mock_resp.read.return_value = json.dumps({"id": "int-001", "status": "recorded"}).encode()
mock_urlopen.return_value = mock_resp
with mock.patch("sys.stdin", StringIO(self._hook_input(transcript_path=transcript))):
capture_stop._capture()
mock_urlopen.assert_called_once()
req = mock_urlopen.call_args[0][0]
body = json.loads(req.data.decode())
assert body["prompt"] == "Please explain how the backup system works in detail"
assert body["client"] == "claude-code"
assert body["session_id"] == "test-session-123"
assert body["reinforce"] is False
@mock.patch("capture_stop.urllib.request.urlopen")
def test_skips_when_disabled(self, mock_urlopen, tmp_path):
transcript = _write_transcript(tmp_path, [
_user_entry("A prompt that would normally be captured"),
])
with mock.patch.dict(os.environ, {"ATOCORE_CAPTURE_DISABLED": "1"}):
with mock.patch("sys.stdin", StringIO(self._hook_input(transcript_path=transcript))):
capture_stop._capture()
mock_urlopen.assert_not_called()
@mock.patch("capture_stop.urllib.request.urlopen")
def test_skips_short_prompt(self, mock_urlopen, tmp_path):
transcript = _write_transcript(tmp_path, [
_user_entry("yes"),
])
with mock.patch("sys.stdin", StringIO(self._hook_input(transcript_path=transcript))):
capture_stop._capture()
mock_urlopen.assert_not_called()
@mock.patch("capture_stop.urllib.request.urlopen")
def test_truncates_long_response(self, mock_urlopen, tmp_path):
transcript = _write_transcript(tmp_path, [
_user_entry("Tell me everything about the entire codebase architecture"),
])
long_response = "x" * 60_000
mock_resp = mock.MagicMock()
mock_resp.read.return_value = json.dumps({"id": "int-002"}).encode()
mock_urlopen.return_value = mock_resp
with mock.patch("sys.stdin", StringIO(
self._hook_input(transcript_path=transcript, last_assistant_message=long_response)
)):
capture_stop._capture()
req = mock_urlopen.call_args[0][0]
body = json.loads(req.data.decode())
assert len(body["response"]) <= capture_stop.MAX_RESPONSE_LENGTH + 20
assert body["response"].endswith("[truncated]")
def test_main_never_raises(self):
"""main() must always exit 0, even on garbage input."""
with mock.patch("sys.stdin", StringIO("not json at all")):
# Should not raise
capture_stop.main()
@mock.patch("capture_stop.urllib.request.urlopen")
def test_uses_atocore_url_env(self, mock_urlopen, tmp_path):
transcript = _write_transcript(tmp_path, [
_user_entry("Please help me with this particular problem in the code"),
])
mock_resp = mock.MagicMock()
mock_resp.read.return_value = json.dumps({"id": "int-003"}).encode()
mock_urlopen.return_value = mock_resp
with mock.patch.dict(os.environ, {"ATOCORE_URL": "http://localhost:9999"}):
# ATOCORE_URL is read at module import, so the constant is patched as well
with mock.patch.object(capture_stop, "ATOCORE_URL", "http://localhost:9999"):
with mock.patch("sys.stdin", StringIO(self._hook_input(transcript_path=transcript))):
capture_stop._capture()
req = mock_urlopen.call_args[0][0]
assert req.full_url == "http://localhost:9999/interactions"

View File

@@ -1,5 +1,8 @@
"""Tests for the context builder."""
import json
import atocore.config as config
from atocore.context.builder import build_context, get_last_context_pack
from atocore.context.project_state import init_project_state_schema, set_state
from atocore.ingestion.pipeline import ingest_file
@@ -162,3 +165,89 @@ def test_no_project_state_without_hint(tmp_data_dir, sample_markdown):
pack = build_context("What is AtoCore?")
assert pack.project_state_chars == 0
assert "--- Trusted Project State ---" not in pack.formatted_context
def test_alias_hint_resolves_through_registry(tmp_data_dir, sample_markdown, monkeypatch):
"""An alias hint like 'p05' should find project state stored under 'p05-interferometer'.
This is the regression test for the P1 finding from codex's review:
/context/build was previously doing an exact-name lookup that
silently dropped trusted project state when the caller passed an
alias instead of the canonical project id.
"""
init_db()
init_project_state_schema()
ingest_file(sample_markdown)
# Stand up a minimal project registry that knows the aliases.
# The registry lives in a JSON file pointed to by
# ATOCORE_PROJECT_REGISTRY_PATH; the dataclass-driven loader picks
# it up on every call (no in-process cache to invalidate).
registry_path = tmp_data_dir / "project-registry.json"
registry_path.write_text(
json.dumps(
{
"projects": [
{
"id": "p05-interferometer",
"aliases": ["p05", "interferometer"],
"description": "P05 alias-resolution regression test",
"ingest_roots": [
{"source": "vault", "subpath": "incoming/projects/p05"}
],
}
]
}
),
encoding="utf-8",
)
monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
config.settings = config.Settings()
# Trusted state is stored under the canonical id (the way the
# /project/state endpoint always writes it).
set_state(
"p05-interferometer",
"status",
"next_focus",
"Wave 2 trusted-operational ingestion",
)
# The bug: pack with alias hint used to silently miss the state.
pack_with_alias = build_context("status?", project_hint="p05", budget=2000)
assert "Wave 2 trusted-operational ingestion" in pack_with_alias.formatted_context
assert pack_with_alias.project_state_chars > 0
# The canonical id should still work the same way.
pack_with_canonical = build_context(
"status?", project_hint="p05-interferometer", budget=2000
)
assert "Wave 2 trusted-operational ingestion" in pack_with_canonical.formatted_context
# A second alias should also resolve.
pack_with_other_alias = build_context(
"status?", project_hint="interferometer", budget=2000
)
assert "Wave 2 trusted-operational ingestion" in pack_with_other_alias.formatted_context
def test_unknown_hint_falls_back_to_raw_lookup(tmp_data_dir, sample_markdown, monkeypatch):
"""A hint that isn't in the registry should still try the raw name.
This preserves backwards compatibility with hand-curated
project_state entries that predate the project registry.
"""
init_db()
init_project_state_schema()
ingest_file(sample_markdown)
# Empty registry — the hint won't resolve through it.
registry_path = tmp_data_dir / "project-registry.json"
registry_path.write_text('{"projects": []}', encoding="utf-8")
monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
config.settings = config.Settings()
set_state("orphan-project", "status", "phase", "Solo run")
pack = build_context("status?", project_hint="orphan-project", budget=2000)
assert "Solo run" in pack.formatted_context

View File

@@ -47,3 +47,138 @@ def test_get_connection_uses_configured_timeout_value(tmp_path, monkeypatch):
assert calls
assert calls[0] == 2.5
def test_init_db_upgrades_pre_phase9_schema_without_failing(tmp_path, monkeypatch):
"""Regression test for the schema init ordering bug caught during
the first real Dalidou deploy (report from 2026-04-08).
Before the fix, SCHEMA_SQL contained CREATE INDEX statements that
referenced columns (memories.project, interactions.project,
interactions.session_id) added by _apply_migrations later in
init_db. On a fresh install this worked because CREATE TABLE
created the tables with the new columns before the CREATE INDEX
ran, but on UPGRADE from a pre-Phase-9 schema the CREATE TABLE
IF NOT EXISTS was a no-op and the CREATE INDEX hit
OperationalError: no such column.
This test seeds the tables with the OLD pre-Phase-9 shape then
calls init_db() and verifies that:
- init_db does not raise
- The new columns were added via _apply_migrations
- The new indexes exist
If the bug is reintroduced by moving a CREATE INDEX for a
migration column back into SCHEMA_SQL, this test will fail
with OperationalError before reaching the assertions.
"""
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
original_settings = config.settings
try:
config.settings = config.Settings()
# Step 1: create the data dir and open a direct connection
config.ensure_runtime_dirs()
db_path = config.settings.db_path
# Step 2: seed the DB with the old pre-Phase-9 shape. No
# project/last_referenced_at/reference_count on memories; no
# project/client/session_id/response/memories_used/chunks_used
# on interactions. We also need the prerequisite tables
# (projects, source_documents, source_chunks) because the
# memories table has an FK to source_chunks.
with sqlite3.connect(str(db_path)) as conn:
conn.executescript(
"""
CREATE TABLE source_documents (
id TEXT PRIMARY KEY,
file_path TEXT UNIQUE NOT NULL,
file_hash TEXT NOT NULL,
title TEXT,
doc_type TEXT DEFAULT 'markdown',
tags TEXT DEFAULT '[]',
ingested_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE source_chunks (
id TEXT PRIMARY KEY,
document_id TEXT NOT NULL REFERENCES source_documents(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
heading_path TEXT DEFAULT '',
char_count INTEGER NOT NULL,
metadata TEXT DEFAULT '{}',
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE memories (
id TEXT PRIMARY KEY,
memory_type TEXT NOT NULL,
content TEXT NOT NULL,
source_chunk_id TEXT REFERENCES source_chunks(id),
confidence REAL DEFAULT 1.0,
status TEXT DEFAULT 'active',
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE projects (
id TEXT PRIMARY KEY,
name TEXT UNIQUE NOT NULL,
description TEXT DEFAULT '',
status TEXT DEFAULT 'active',
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE interactions (
id TEXT PRIMARY KEY,
prompt TEXT NOT NULL,
context_pack TEXT DEFAULT '{}',
response_summary TEXT DEFAULT '',
project_id TEXT REFERENCES projects(id),
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
"""
)
conn.commit()
# Step 3: call init_db — this used to raise on the upgrade
# path. After the fix it should succeed.
init_db()
# Step 4: verify the migrations ran — Phase 9 columns present
with sqlite3.connect(str(db_path)) as conn:
conn.row_factory = sqlite3.Row
memories_cols = {
row["name"] for row in conn.execute("PRAGMA table_info(memories)")
}
interactions_cols = {
row["name"]
for row in conn.execute("PRAGMA table_info(interactions)")
}
assert "project" in memories_cols
assert "last_referenced_at" in memories_cols
assert "reference_count" in memories_cols
assert "project" in interactions_cols
assert "client" in interactions_cols
assert "session_id" in interactions_cols
assert "response" in interactions_cols
assert "memories_used" in interactions_cols
assert "chunks_used" in interactions_cols
# Step 5: verify the indexes on migration columns exist
index_rows = conn.execute(
"SELECT name FROM sqlite_master WHERE type='index' AND tbl_name IN ('memories','interactions')"
).fetchall()
index_names = {row["name"] for row in index_rows}
assert "idx_memories_project" in index_names
assert "idx_interactions_project_name" in index_names
assert "idx_interactions_session" in index_names
finally:
config.settings = original_settings
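# A stripped-down illustration of the ordering bug this test guards against
# (a sketch; the table and column names are simplified):
#
#     import sqlite3
#     conn = sqlite3.connect(":memory:")
#     conn.execute("CREATE TABLE IF NOT EXISTS memories (id TEXT PRIMARY KEY)")
#     # On upgrade the table already exists without `project`, so a CREATE INDEX
#     # on that column in the base schema raises OperationalError: no such column.
#     conn.execute("ALTER TABLE memories ADD COLUMN project TEXT")   # migration
#     conn.execute("CREATE INDEX idx_memories_project ON memories(project)")  # index after migration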

View File

@@ -209,3 +209,96 @@ def test_list_interactions_endpoint_returns_summaries(tmp_data_dir):
assert body["interactions"][0]["response_chars"] == 50
# The list endpoint never includes the full response body
assert "response" not in body["interactions"][0]
# --- alias canonicalization on interaction capture/list -------------------
def test_record_interaction_canonicalizes_project(project_registry):
"""Capturing under an alias should store the canonical project id.
Regression for codex's P2 finding: reinforcement and extraction
query memories by interaction.project; if the captured project is
a raw alias they would silently miss memories stored under the
canonical id.
"""
init_db()
project_registry(("p05-interferometer", ["p05", "interferometer"]))
interaction = record_interaction(
prompt="quick capture", response="response body", project="p05", reinforce=False
)
assert interaction.project == "p05-interferometer"
fetched = get_interaction(interaction.id)
assert fetched.project == "p05-interferometer"
def test_list_interactions_canonicalizes_project_filter(project_registry):
init_db()
project_registry(("p06-polisher", ["p06", "polisher"]))
record_interaction(prompt="a", response="ra", project="p06-polisher", reinforce=False)
record_interaction(prompt="b", response="rb", project="polisher", reinforce=False)
record_interaction(prompt="c", response="rc", project="atocore", reinforce=False)
# Query by an alias should still find both p06 captures
via_alias = list_interactions(project="p06")
via_canonical = list_interactions(project="p06-polisher")
assert len(via_alias) == 2
assert len(via_canonical) == 2
assert {i.prompt for i in via_alias} == {"a", "b"}
# --- since filter format normalization ------------------------------------
def test_list_interactions_since_accepts_iso_with_t_separator(tmp_data_dir):
init_db()
record_interaction(prompt="early", response="r", reinforce=False)
time.sleep(1.05)
pivot = record_interaction(prompt="late", response="r", reinforce=False)
# pivot.created_at is in storage format 'YYYY-MM-DD HH:MM:SS'.
# Build the equivalent ISO 8601 with 'T' that an external client
# would naturally send.
iso_with_t = pivot.created_at.replace(" ", "T")
items = list_interactions(since=iso_with_t)
assert any(i.id == pivot.id for i in items)
# The early row must also be excluded if its timestamp is strictly
# before the pivot — since is inclusive on the cutoff
early_ids = {i.id for i in items if i.prompt == "early"}
assert early_ids == set() or len(items) >= 1
def test_list_interactions_since_accepts_z_suffix(tmp_data_dir):
init_db()
pivot = record_interaction(prompt="pivot", response="r", reinforce=False)
time.sleep(1.05)
after = record_interaction(prompt="after", response="r", reinforce=False)
iso_with_z = pivot.created_at.replace(" ", "T") + "Z"
items = list_interactions(since=iso_with_z)
ids = {i.id for i in items}
assert pivot.id in ids
assert after.id in ids
def test_list_interactions_since_accepts_offset(tmp_data_dir):
init_db()
pivot = record_interaction(prompt="pivot", response="r", reinforce=False)
time.sleep(1.05)
after = record_interaction(prompt="after", response="r", reinforce=False)
iso_with_offset = pivot.created_at.replace(" ", "T") + "+00:00"
items = list_interactions(since=iso_with_offset)
assert any(i.id == after.id for i in items)
def test_list_interactions_since_storage_format_still_works(tmp_data_dir):
"""The bare storage format must still work for backwards compatibility."""
init_db()
pivot = record_interaction(prompt="pivot", response="r", reinforce=False)
items = list_interactions(since=pivot.created_at)
assert any(i.id == pivot.id for i in items)
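# --- illustrative sketch (assumed, not the shipped implementation) ----------
# The four tests above pin the accepted `since` formats. One possible shape
# for the normalization inside list_interactions(); the helper name and the
# naive-UTC assumption are hypothetical:
from datetime import datetime
def _normalize_since_sketch(since: str) -> str:
    """Fold ISO 8601 variants into the storage format 'YYYY-MM-DD HH:MM:SS'."""
    value = since.strip()
    if value.endswith("Z"):
        # Trailing Zulu marker becomes an explicit UTC offset so fromisoformat
        # can parse it on all supported Python versions.
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value.replace(" ", "T"))
    # Drop any offset; timestamps are assumed to be stored naive (UTC) in the DB.
    return parsed.replace(tzinfo=None).strftime("%Y-%m-%d %H:%M:%S")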

View File

@@ -0,0 +1,802 @@
"""Tests for scripts/migrate_legacy_aliases.py.
The migration script closes the compatibility gap documented in
docs/architecture/project-identity-canonicalization.md. These tests
cover:
- empty/clean database behavior
- shadow projects detection
- state rekey without collisions
- state collision detection + apply refusal
- memory rekey + supersession of duplicates
- interaction rekey
- end-to-end apply on a realistic shadow
- idempotency (running twice produces the same final state)
- report artifact is written
- the pre-fix regression gap is actually closed after migration
"""
from __future__ import annotations
import json
import sqlite3
import sys
import uuid
from pathlib import Path
import pytest
from atocore.context.project_state import (
get_state,
init_project_state_schema,
)
from atocore.models.database import init_db
# Make scripts/ importable
_REPO_ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(_REPO_ROOT / "scripts"))
import migrate_legacy_aliases as mig # noqa: E402
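# --- surface of migrate_legacy_aliases that the tests below rely on ---------
# Orientation only; the attribute names come from the assertions below, but the
# exact dataclass layout of the script is inferred, not copied:
#
#   plan = mig.build_plan(conn, registry_path)
#   plan.alias_map           # {alias -> canonical project id}
#   plan.shadow_projects     # entries with .shadow_name / .canonical_project_id
#   plan.state_plans         # each with .rows_to_rekey and .collisions
#   plan.memory_plans        # each with .rows_to_rekey and .to_supersede
#   plan.interaction_plans   # each with .rows_to_rekey
#   plan.integrity_errors    # non-empty means apply_plan() refuses
#   plan.is_empty / plan.has_collisions / plan.counts()
#
#   summary = mig.apply_plan(conn, plan)   # raises mig.MigrationRefused on
#                                          # collisions or integrity errors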
# ---------------------------------------------------------------------------
# Helpers that seed "legacy" rows the way they would have looked before fb6298a
# ---------------------------------------------------------------------------
def _open_db_connection():
"""Open a direct SQLite connection to the test data dir's DB."""
import atocore.config as config
conn = sqlite3.connect(str(config.settings.db_path))
conn.row_factory = sqlite3.Row
conn.execute("PRAGMA foreign_keys = ON")
return conn
def _seed_shadow_project(
conn: sqlite3.Connection, shadow_name: str
) -> str:
"""Insert a projects row keyed under an alias, like the old set_state would have."""
project_id = str(uuid.uuid4())
conn.execute(
"INSERT INTO projects (id, name, description) VALUES (?, ?, ?)",
(project_id, shadow_name, f"shadow row for {shadow_name}"),
)
conn.commit()
return project_id
def _seed_state_row(
conn: sqlite3.Connection,
project_id: str,
category: str,
key: str,
value: str,
status: str = "active",
) -> str:
row_id = str(uuid.uuid4())
conn.execute(
"INSERT INTO project_state "
"(id, project_id, category, key, value, source, confidence, status) "
"VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
(row_id, project_id, category, key, value, "legacy-test", 1.0, status),
)
conn.commit()
return row_id
def _seed_memory_row(
conn: sqlite3.Connection,
memory_type: str,
content: str,
project: str,
status: str = "active",
) -> str:
row_id = str(uuid.uuid4())
conn.execute(
"INSERT INTO memories "
"(id, memory_type, content, project, source_chunk_id, confidence, status) "
"VALUES (?, ?, ?, ?, ?, ?, ?)",
(row_id, memory_type, content, project, None, 1.0, status),
)
conn.commit()
return row_id
def _seed_interaction_row(
conn: sqlite3.Connection, prompt: str, project: str
) -> str:
row_id = str(uuid.uuid4())
conn.execute(
"INSERT INTO interactions "
"(id, prompt, context_pack, response_summary, response, "
" memories_used, chunks_used, client, session_id, project, created_at) "
"VALUES (?, ?, '{}', '', '', '[]', '[]', 'legacy-test', '', ?, '2026-04-01 12:00:00')",
(row_id, prompt, project),
)
conn.commit()
return row_id
# ---------------------------------------------------------------------------
# plan-building tests
# ---------------------------------------------------------------------------
@pytest.fixture(autouse=True)
def _setup(tmp_data_dir):
init_db()
init_project_state_schema()
def test_dry_run_on_empty_registry_reports_empty_plan(tmp_data_dir):
"""Empty registry -> empty alias map -> empty plan."""
registry_path = tmp_data_dir / "empty-registry.json"
registry_path.write_text('{"projects": []}', encoding="utf-8")
conn = _open_db_connection()
try:
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
assert plan.alias_map == {}
assert plan.is_empty
assert not plan.has_collisions
assert plan.counts() == {
"shadow_projects": 0,
"state_rekey_rows": 0,
"state_collisions": 0,
"state_historical_drops": 0,
"memory_rekey_rows": 0,
"memory_supersede_rows": 0,
"interaction_rekey_rows": 0,
}
def test_dry_run_on_clean_registered_db_reports_empty_plan(project_registry):
"""A registry with projects but no legacy rows -> empty plan."""
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
assert plan.alias_map != {}
assert plan.is_empty
def test_dry_run_finds_shadow_project(project_registry):
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
_seed_shadow_project(conn, "p05")
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
assert len(plan.shadow_projects) == 1
assert plan.shadow_projects[0].shadow_name == "p05"
assert plan.shadow_projects[0].canonical_project_id == "p05-interferometer"
def test_dry_run_plans_state_rekey_without_collisions(project_registry):
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
_seed_state_row(conn, shadow_id, "status", "next_focus", "Wave 1 ingestion")
_seed_state_row(conn, shadow_id, "decision", "lateral_support", "GF-PTFE")
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
assert len(plan.state_plans) == 1
sp = plan.state_plans[0]
assert len(sp.rows_to_rekey) == 2
assert sp.collisions == []
assert not plan.has_collisions
def test_dry_run_detects_state_collision(project_registry):
"""Shadow and canonical both have state under the same (category, key) with different values."""
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
canonical_id = _seed_shadow_project(conn, "p05-interferometer")
_seed_state_row(conn, shadow_id, "status", "next_focus", "Wave 1")
_seed_state_row(
conn, canonical_id, "status", "next_focus", "Wave 2"
)
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
assert plan.has_collisions
collision = plan.state_plans[0].collisions[0]
assert collision["shadow"]["value"] == "Wave 1"
assert collision["canonical"]["value"] == "Wave 2"
def test_dry_run_plans_memory_rekey_and_supersession(project_registry):
registry_path = project_registry(
("p04-gigabit", ["p04", "gigabit"])
)
conn = _open_db_connection()
try:
# A clean memory under the alias that will just be rekeyed
_seed_memory_row(conn, "project", "clean rekey memory", "p04")
# A memory that collides with an existing canonical memory
_seed_memory_row(conn, "project", "duplicate content", "p04")
_seed_memory_row(conn, "project", "duplicate content", "p04-gigabit")
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
# There's exactly one memory plan (one alias matched)
assert len(plan.memory_plans) == 1
mp = plan.memory_plans[0]
# Two rows are candidates for rekey or supersession — one clean,
# one duplicate. The duplicate is handled via to_supersede; the
# other via rows_to_rekey.
total_affected = len(mp.rows_to_rekey) + len(mp.to_supersede)
assert total_affected == 2
def test_dry_run_plans_interaction_rekey(project_registry):
registry_path = project_registry(
("p06-polisher", ["p06", "polisher"])
)
conn = _open_db_connection()
try:
_seed_interaction_row(conn, "quick capture under alias", "polisher")
_seed_interaction_row(conn, "another alias-keyed row", "p06")
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
total = sum(len(p.rows_to_rekey) for p in plan.interaction_plans)
assert total == 2
# ---------------------------------------------------------------------------
# apply tests
# ---------------------------------------------------------------------------
def test_apply_refuses_on_state_collision(project_registry):
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
canonical_id = _seed_shadow_project(conn, "p05-interferometer")
_seed_state_row(conn, shadow_id, "status", "next_focus", "Wave 1")
_seed_state_row(conn, canonical_id, "status", "next_focus", "Wave 2")
plan = mig.build_plan(conn, registry_path)
assert plan.has_collisions
with pytest.raises(mig.MigrationRefused):
mig.apply_plan(conn, plan)
finally:
conn.close()
def test_apply_migrates_clean_shadow_end_to_end(project_registry):
"""The happy path: one shadow project with clean state rows, rekey into a freshly-created canonical row, verify reachability via get_state."""
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
_seed_state_row(
conn, shadow_id, "status", "next_focus", "Wave 1 ingestion"
)
_seed_state_row(
conn, shadow_id, "decision", "lateral_support", "GF-PTFE"
)
plan = mig.build_plan(conn, registry_path)
assert not plan.has_collisions
summary = mig.apply_plan(conn, plan)
finally:
conn.close()
assert summary["state_rows_rekeyed"] == 2
assert summary["shadow_projects_deleted"] == 1
assert summary["canonical_rows_created"] == 1
# The regression gap is now closed: the service layer can see
# the state under the canonical id via either the alias OR the
# canonical.
via_alias = get_state("p05")
via_canonical = get_state("p05-interferometer")
assert len(via_alias) == 2
assert len(via_canonical) == 2
values = {entry.value for entry in via_canonical}
assert values == {"Wave 1 ingestion", "GF-PTFE"}
def test_apply_drops_shadow_state_duplicate_without_collision(project_registry):
"""Shadow and canonical both have the same (category, key, value) — shadow gets marked superseded rather than hitting the UNIQUE constraint."""
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
canonical_id = _seed_shadow_project(conn, "p05-interferometer")
_seed_state_row(
conn, shadow_id, "status", "next_focus", "Wave 1 ingestion"
)
_seed_state_row(
conn, canonical_id, "status", "next_focus", "Wave 1 ingestion"
)
plan = mig.build_plan(conn, registry_path)
assert not plan.has_collisions
summary = mig.apply_plan(conn, plan)
finally:
conn.close()
assert summary["state_rows_merged_as_duplicate"] == 1
via_canonical = get_state("p05-interferometer")
# Exactly one active row survives
assert len(via_canonical) == 1
assert via_canonical[0].value == "Wave 1 ingestion"
def test_apply_preserves_superseded_shadow_state_when_no_collision(project_registry):
"""Regression test for the codex-flagged data-loss bug.
Before the fix, plan_state_migration only selected status='active'
rows. Any superseded or invalid row on the shadow project was
invisible to the plan and got silently cascade-deleted when the
shadow projects row was dropped at the end of apply. That's
exactly the kind of audit loss a cleanup migration must not cause.
This test seeds a shadow project with a superseded state row on
a triple the canonical project doesn't have, runs the migration,
and verifies the row survived and is now attached to the
canonical project (still with status='superseded').
"""
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
# Superseded row on a triple the canonical won't have
_seed_state_row(
conn,
shadow_id,
"status",
"historical_phase",
"Phase 0 legacy",
status="superseded",
)
plan = mig.build_plan(conn, registry_path)
assert not plan.has_collisions
summary = mig.apply_plan(conn, plan)
finally:
conn.close()
# The superseded row should have been rekeyed, not dropped
assert summary["state_rows_rekeyed"] == 1
assert summary["state_rows_historical_dropped"] == 0
# Verify via raw SQL that the row is now attached to the canonical
# projects row and still has status='superseded'
conn = _open_db_connection()
try:
row = conn.execute(
"SELECT ps.status, ps.value, p.name "
"FROM project_state ps JOIN projects p ON ps.project_id = p.id "
"WHERE ps.category = ? AND ps.key = ?",
("status", "historical_phase"),
).fetchone()
finally:
conn.close()
assert row is not None, "superseded shadow row was lost during migration"
assert row["status"] == "superseded"
assert row["value"] == "Phase 0 legacy"
assert row["name"] == "p05-interferometer"
def test_apply_drops_shadow_inactive_row_when_canonical_holds_same_triple(project_registry):
"""Shadow is inactive (superseded) and collides with an active canonical row.
The canonical wins by definition of the UPSERT schema. The shadow
row is recorded as a historical_drop in the plan so the operator
sees the audit loss, and the apply cascade-deletes it via the
shadow projects row. This is the unavoidable data-loss case
documented in the migration module docstring.
"""
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
canonical_id = _seed_shadow_project(conn, "p05-interferometer")
# Shadow has a superseded value on a triple where the canonical
# has a different active value. Can't preserve both: UNIQUE
# allows only one row per triple.
_seed_state_row(
conn,
shadow_id,
"status",
"next_focus",
"Old wave 1",
status="superseded",
)
_seed_state_row(
conn,
canonical_id,
"status",
"next_focus",
"Wave 2 trusted-operational",
status="active",
)
plan = mig.build_plan(conn, registry_path)
assert not plan.has_collisions # not an active-vs-active collision
assert plan.counts()["state_historical_drops"] == 1
summary = mig.apply_plan(conn, plan)
finally:
conn.close()
assert summary["state_rows_historical_dropped"] == 1
# The canonical's active row survives unchanged
via_canonical = get_state("p05-interferometer")
active_next_focus = [
e
for e in via_canonical
if e.category == "status" and e.key == "next_focus"
]
assert len(active_next_focus) == 1
assert active_next_focus[0].value == "Wave 2 trusted-operational"
def test_apply_replaces_inactive_canonical_with_active_shadow(project_registry):
"""Shadow is active, canonical has an inactive row at the same triple.
The shadow wins: canonical inactive row is deleted, shadow is
rekeyed into canonical's project_id. This covers the
cross-contamination case where the old alias path was used for
the live value while the canonical path had a stale row.
"""
registry_path = project_registry(
("p06-polisher", ["p06", "polisher"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p06")
canonical_id = _seed_shadow_project(conn, "p06-polisher")
# Canonical has a stale invalid row; shadow has the live value.
_seed_state_row(
conn,
canonical_id,
"decision",
"frame",
"Old frame (no longer current)",
status="invalid",
)
_seed_state_row(
conn,
shadow_id,
"decision",
"frame",
"kinematic mount frame",
status="active",
)
plan = mig.build_plan(conn, registry_path)
assert not plan.has_collisions
assert plan.counts()["state_historical_drops"] == 0
summary = mig.apply_plan(conn, plan)
finally:
conn.close()
assert summary["state_rows_replaced_inactive_canonical"] == 1
# The active shadow value now lives on the canonical row
via_canonical = get_state("p06-polisher")
frame_entries = [
e for e in via_canonical if e.category == "decision" and e.key == "frame"
]
assert len(frame_entries) == 1
assert frame_entries[0].value == "kinematic mount frame"
# Confirm via raw SQL that the previously-inactive canonical row
# no longer exists
conn = _open_db_connection()
try:
stale = conn.execute(
"SELECT COUNT(*) AS c FROM project_state WHERE value = ?",
("Old frame (no longer current)",),
).fetchone()
finally:
conn.close()
assert stale["c"] == 0
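# --- summary of the shadow-vs-canonical state resolution exercised above ----
#   shadow row at (category, key)   canonical row at same triple   outcome asserted
#   active                          none                           rekeyed onto canonical project
#   superseded                      none                           rekeyed, status preserved
#   active, same value              active, same value             shadow merged as duplicate (marked superseded)
#   superseded, different value     active                         canonical wins; shadow counted as historical drop
#   active                          inactive (superseded/invalid)  shadow wins; stale canonical row deleted
#   active, different value         active                         collision, apply_plan refuses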
def test_apply_migrates_memories(project_registry):
registry_path = project_registry(
("p04-gigabit", ["p04", "gigabit"])
)
conn = _open_db_connection()
try:
_seed_memory_row(conn, "project", "lateral support uses GF-PTFE", "p04")
_seed_memory_row(conn, "preference", "I prefer descriptive commits", "gigabit")
plan = mig.build_plan(conn, registry_path)
summary = mig.apply_plan(conn, plan)
finally:
conn.close()
assert summary["memory_rows_rekeyed"] == 2
# Both memories should now read as living under the canonical id
from atocore.memory.service import get_memories
rows = get_memories(project="p04-gigabit", limit=50)
contents = {m.content for m in rows}
assert "lateral support uses GF-PTFE" in contents
assert "I prefer descriptive commits" in contents
def test_apply_migrates_interactions(project_registry):
registry_path = project_registry(
("p06-polisher", ["p06", "polisher"])
)
conn = _open_db_connection()
try:
_seed_interaction_row(conn, "alias-keyed 1", "polisher")
_seed_interaction_row(conn, "alias-keyed 2", "p06")
plan = mig.build_plan(conn, registry_path)
summary = mig.apply_plan(conn, plan)
finally:
conn.close()
assert summary["interaction_rows_rekeyed"] == 2
from atocore.interactions.service import list_interactions
rows = list_interactions(project="p06-polisher", limit=50)
prompts = {i.prompt for i in rows}
assert prompts == {"alias-keyed 1", "alias-keyed 2"}
def test_apply_is_idempotent(project_registry):
"""Running apply twice produces the same final state as running it once."""
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
_seed_state_row(conn, shadow_id, "status", "next_focus", "Wave 1")
_seed_memory_row(conn, "project", "m1", "p05")
_seed_interaction_row(conn, "i1", "p05")
# first apply
plan_a = mig.build_plan(conn, registry_path)
summary_a = mig.apply_plan(conn, plan_a)
# second apply: plan should be empty
plan_b = mig.build_plan(conn, registry_path)
assert plan_b.is_empty
# forcing a second apply on the empty plan via the function
# directly should also succeed as a no-op (caller normally
# has to pass --allow-empty through the CLI, but apply_plan
# itself doesn't enforce that — the refusal is in run())
summary_b = mig.apply_plan(conn, plan_b)
finally:
conn.close()
assert summary_a["state_rows_rekeyed"] == 1
assert summary_a["memory_rows_rekeyed"] == 1
assert summary_a["interaction_rows_rekeyed"] == 1
assert summary_b["state_rows_rekeyed"] == 0
assert summary_b["memory_rows_rekeyed"] == 0
assert summary_b["interaction_rows_rekeyed"] == 0
def test_apply_refuses_with_integrity_errors(project_registry):
"""If the projects table has two case-variant rows for the canonical id, refuse.
The projects.name column has a case-sensitive UNIQUE constraint,
so exact duplicates can't exist. But case-variant rows
``p05-interferometer`` and ``P05-Interferometer`` can both
survive the UNIQUE constraint while both matching the
case-insensitive ``lower(name) = lower(?)`` lookup that the
migration uses to find the canonical row. That ambiguity
(which canonical row should dependents rekey into?) is exactly
the integrity failure the migration is guarding against.
"""
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
_seed_shadow_project(conn, "p05-interferometer")
_seed_shadow_project(conn, "P05-Interferometer")
plan = mig.build_plan(conn, registry_path)
assert plan.integrity_errors
with pytest.raises(mig.MigrationRefused):
mig.apply_plan(conn, plan)
finally:
conn.close()
# ---------------------------------------------------------------------------
# reporting tests
# ---------------------------------------------------------------------------
def test_plan_to_json_dict_is_serializable(project_registry):
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
_seed_state_row(conn, shadow_id, "status", "next_focus", "Wave 1")
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
payload = mig.plan_to_json_dict(plan)
# Must be JSON-serializable
json_str = json.dumps(payload, default=str)
assert "p05-interferometer" in json_str
assert payload["counts"]["state_rekey_rows"] == 1
def test_write_report_creates_file(tmp_path, project_registry):
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
report_dir = tmp_path / "reports"
report_path = mig.write_report(
plan,
summary=None,
db_path=Path("/tmp/fake.db"),
registry_path=registry_path,
mode="dry-run",
report_dir=report_dir,
)
assert report_path.exists()
payload = json.loads(report_path.read_text(encoding="utf-8"))
assert payload["mode"] == "dry-run"
assert "plan" in payload
def test_render_plan_text_on_empty_plan(project_registry):
registry_path = project_registry() # empty
conn = _open_db_connection()
try:
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
text = mig.render_plan_text(plan)
assert "nothing to plan" in text.lower()
def test_render_plan_text_on_collision(project_registry):
registry_path = project_registry(
("p05-interferometer", ["p05"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
canonical_id = _seed_shadow_project(conn, "p05-interferometer")
_seed_state_row(conn, shadow_id, "status", "phase", "A")
_seed_state_row(conn, canonical_id, "status", "phase", "B")
plan = mig.build_plan(conn, registry_path)
finally:
conn.close()
text = mig.render_plan_text(plan)
assert "COLLISION" in text.upper()
assert "REFUSE" in text.upper() or "refuse" in text.lower()
# ---------------------------------------------------------------------------
# gap-closed companion test — the flip side of
# test_legacy_alias_keyed_state_is_invisible_until_migrated in
# test_project_state.py. After running this migration, the legacy row
# IS reachable via the canonical id.
# ---------------------------------------------------------------------------
def test_legacy_alias_gap_is_closed_after_migration(project_registry):
"""End-to-end regression test for the canonicalization gap.
Simulates the exact scenario from
test_legacy_alias_keyed_state_is_invisible_until_migrated in
test_project_state.py — a shadow projects row with a state row
pointing at it. Runs the migration. Verifies the state is now
reachable via the canonical id.
"""
registry_path = project_registry(
("p05-interferometer", ["p05", "interferometer"])
)
conn = _open_db_connection()
try:
shadow_id = _seed_shadow_project(conn, "p05")
_seed_state_row(
conn, shadow_id, "status", "legacy_focus", "Wave 1 ingestion"
)
# Before migration: the legacy row is invisible to get_state
# (this is the documented gap, covered in test_project_state.py)
assert all(
entry.value != "Wave 1 ingestion" for entry in get_state("p05")
)
assert all(
entry.value != "Wave 1 ingestion"
for entry in get_state("p05-interferometer")
)
# Run the migration
plan = mig.build_plan(conn, registry_path)
mig.apply_plan(conn, plan)
finally:
conn.close()
# After migration: the row is reachable via canonical AND alias
via_canonical = get_state("p05-interferometer")
via_alias = get_state("p05")
assert any(e.value == "Wave 1 ingestion" for e in via_canonical)
assert any(e.value == "Wave 1 ingestion" for e in via_alias)

View File

@@ -131,3 +131,139 @@ def test_format_project_state():
def test_format_empty():
"""Test formatting empty state."""
assert format_project_state([]) == ""
# --- Alias canonicalization regression tests --------------------------------
def test_set_state_canonicalizes_alias(project_registry):
"""Writing state via an alias should land under the canonical project id.
Regression for codex's P1 finding: previously /project/state with
project="p05" created a separate alias row that later context builds
(which canonicalize the hint) would never see.
"""
project_registry(("p05-interferometer", ["p05", "interferometer"]))
set_state("p05", "status", "next_focus", "Wave 2 ingestion")
# The state must be reachable via every alias AND the canonical id
via_alias = get_state("p05")
via_canonical = get_state("p05-interferometer")
via_other_alias = get_state("interferometer")
assert len(via_alias) == 1
assert len(via_canonical) == 1
assert len(via_other_alias) == 1
# All three reads return the same row id (no fragmented duplicates)
assert via_alias[0].id == via_canonical[0].id == via_other_alias[0].id
assert via_canonical[0].value == "Wave 2 ingestion"
def test_get_state_canonicalizes_alias_after_canonical_write(project_registry):
"""Reading via an alias should find state written under the canonical id."""
project_registry(("p04-gigabit", ["p04", "gigabit"]))
set_state("p04-gigabit", "status", "phase", "Phase 1 baseline")
via_alias = get_state("gigabit")
assert len(via_alias) == 1
assert via_alias[0].value == "Phase 1 baseline"
def test_invalidate_state_canonicalizes_alias(project_registry):
"""Invalidating via an alias should hit the canonical row."""
project_registry(("p06-polisher", ["p06", "polisher"]))
set_state("p06-polisher", "decision", "frame", "kinematic mounts")
success = invalidate_state("polisher", "decision", "frame")
assert success is True
active = get_state("p06-polisher")
assert len(active) == 0
def test_unregistered_project_state_still_works(project_registry):
"""Hand-curated state for an unregistered project must still round-trip.
Backwards compatibility with state created before the project
registry existed: resolve_project_name returns the input unchanged
when the registry has no record, so the raw name is used as-is.
"""
project_registry() # empty registry
set_state("orphan-project", "status", "phase", "Standalone")
entries = get_state("orphan-project")
assert len(entries) == 1
assert entries[0].value == "Standalone"
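# --- illustrative sketch (assumed shape of the canonicalization helper) -----
# The tests above only require that resolve_project_name maps any registered
# alias (or the canonical id itself) to the canonical project id and passes
# unregistered names through unchanged. The case-insensitive lookup here is an
# assumption drawn from the lower(name) matching used by the migration tests:
def _resolve_sketch(name: str, alias_map: dict[str, str]) -> str:
    return alias_map.get(name.lower(), name)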
def test_legacy_alias_keyed_state_is_invisible_until_migrated(project_registry):
"""Documents the compatibility gap from project-identity-canonicalization.md.
Rows that were written under a registered alias BEFORE the
canonicalization landed in fb6298a are stored in the projects
table under the alias name (not the canonical id). Every read
path now canonicalizes to the canonical id, so those legacy
rows become invisible.
This test simulates the legacy state by inserting a shadow
project row and a state row that points at it via raw SQL,
bypassing set_state() which now canonicalizes. Then it
verifies the canonicalized get_state() does NOT find the
legacy row.
When the legacy alias migration script lands (see the open
follow-ups in docs/architecture/project-identity-canonicalization.md),
this test must be inverted: after running the migration the
legacy state should be reachable via the canonical project,
not invisible. The migration is required before engineering
V1 ships.
"""
import uuid
from atocore.models.database import get_connection
project_registry(("p05-interferometer", ["p05", "interferometer"]))
# Simulate a pre-fix legacy row by writing directly under the
# alias name. This is what the OLD set_state would have done
# before fb6298a added canonicalization.
legacy_project_id = str(uuid.uuid4())
legacy_state_id = str(uuid.uuid4())
with get_connection() as conn:
conn.execute(
"INSERT INTO projects (id, name, description) VALUES (?, ?, ?)",
(legacy_project_id, "p05", "shadow row created before canonicalization"),
)
conn.execute(
"INSERT INTO project_state "
"(id, project_id, category, key, value, source, confidence) "
"VALUES (?, ?, ?, ?, ?, ?, ?)",
(
legacy_state_id,
legacy_project_id,
"status",
"legacy_focus",
"Wave 1 ingestion",
"pre-canonicalization",
1.0,
),
)
# The canonicalized read path looks under "p05-interferometer"
# and cannot see the legacy row. THIS IS THE GAP.
via_alias = get_state("p05")
via_canonical = get_state("p05-interferometer")
assert all(entry.value != "Wave 1 ingestion" for entry in via_alias)
assert all(entry.value != "Wave 1 ingestion" for entry in via_canonical)
# The legacy row is still in the database — it's just unreachable
# from the canonicalized read path. The migration script (open
# follow-up) is what closes the gap.
with get_connection() as conn:
row = conn.execute(
"SELECT value FROM project_state WHERE id = ?", (legacy_state_id,)
).fetchone()
assert row is not None
assert row["value"] == "Wave 1 ingestion"

View File

@@ -6,6 +6,8 @@ from atocore.interactions.service import record_interaction
from atocore.main import app
from atocore.memory.reinforcement import (
DEFAULT_CONFIDENCE_DELTA,
_stem,
_tokenize,
reinforce_from_interaction,
)
from atocore.memory.service import (
@@ -314,3 +316,177 @@ def test_api_post_interactions_accepts_reinforce_false(tmp_data_dir):
reloaded = [m for m in get_memories(memory_type="preference", limit=20) if m.id == mem.id][0]
assert reloaded.confidence == 0.5
assert reloaded.reference_count == 0
# --- alias canonicalization end-to-end -------------------------------------
def test_reinforcement_works_when_capture_uses_alias(project_registry):
"""End-to-end: capture under an alias, seed memory under canonical id,
verify reinforcement still finds and bumps the memory.
Regression for codex's P2 finding: previously interaction.project
was stored verbatim and reinforcement queried memories using that
raw value, so capturing under "p05" while memories live under
"p05-interferometer" silently missed everything.
"""
init_db()
project_registry(("p05-interferometer", ["p05", "interferometer"]))
# Seed an active memory under the CANONICAL id
mem = create_memory(
memory_type="project",
content="the lateral support pads use GF-PTFE for thermal stability",
project="p05-interferometer",
confidence=0.5,
)
# Capture an interaction under the ALIAS — this is the bug case
record_interaction(
prompt="status update",
response=(
"Quick note: the lateral support pads use GF-PTFE for thermal "
"stability and that's still the current selection."
),
project="p05",
)
# The seeded memory should have been reinforced
reloaded = [
m
for m in get_memories(memory_type="project", project="p05-interferometer", limit=20)
if m.id == mem.id
][0]
assert reloaded.confidence > 0.5
assert reloaded.reference_count == 1
def test_get_memories_filter_by_alias(project_registry):
"""Filtering memories by an alias should find rows stored under canonical."""
init_db()
project_registry(("p04-gigabit", ["p04", "gigabit"]))
create_memory(memory_type="project", content="m1", project="p04-gigabit")
create_memory(memory_type="project", content="m2", project="gigabit")
via_alias = get_memories(memory_type="project", project="p04")
via_canonical = get_memories(memory_type="project", project="p04-gigabit")
assert len(via_alias) == 2
assert len(via_canonical) == 2
assert {m.content for m in via_alias} == {"m1", "m2"}
# --- token-overlap matcher: unit tests -------------------------------------
def test_stem_folds_s_ed_ing():
assert _stem("prefers") == "prefer"
assert _stem("preferred") == "prefer"
assert _stem("services") == "service"
assert _stem("processing") == "process"
# Short words must not be over-stripped
assert _stem("red") == "red" # 3 chars, don't strip "ed"
assert _stem("bus") == "bus" # 3 chars, don't strip "s"
assert _stem("sing") == "sing" # 4 chars, don't strip "ing"
assert _stem("being") == "being" # 5 chars, "ing" strip leaves "be" (2) — too short
def test_tokenize_removes_stop_words():
tokens = _tokenize("the quick brown fox jumps over the lazy dog")
assert "the" not in tokens
assert "quick" in tokens
assert "brown" in tokens
assert "fox" in tokens
assert "dog" in tokens
# "over" has len 4, not a stop word → kept (stemmed: "over")
assert "over" in tokens
# --- token-overlap matcher: paraphrase matching ----------------------------
def test_reinforce_matches_paraphrase_prefers_vs_prefer(tmp_data_dir):
"""The canonical rebase case from phase9-first-real-use.md."""
init_db()
mem = create_memory(
memory_type="preference",
content="prefers rebase-based workflows because history stays linear",
confidence=0.5,
)
interaction = _make_interaction(
response=(
"I prefer rebase-based workflows because the history stays "
"linear and reviewers have an easier time."
),
)
results = reinforce_from_interaction(interaction)
assert any(r.memory_id == mem.id for r in results)
def test_reinforce_matches_paraphrase_with_articles_and_ed(tmp_data_dir):
init_db()
mem = create_memory(
memory_type="preference",
content="preferred structured logging across all backend services",
confidence=0.5,
)
interaction = _make_interaction(
response=(
"I set up structured logging across all the backend services, "
"which the team prefers for consistency."
),
)
results = reinforce_from_interaction(interaction)
assert any(r.memory_id == mem.id for r in results)
def test_reinforce_rejects_low_overlap(tmp_data_dir):
init_db()
mem = create_memory(
memory_type="preference",
content="always uses Python for data processing scripts",
confidence=0.5,
)
interaction = _make_interaction(
response=(
"The CI pipeline runs on Node.js and deploys to Kubernetes "
"using Helm charts."
),
)
results = reinforce_from_interaction(interaction)
assert all(r.memory_id != mem.id for r in results)
def test_reinforce_matches_at_70_percent_threshold(tmp_data_dir):
"""Exactly 7 of 10 content tokens present → should match."""
init_db()
# After stop-word removal and stemming, this has 10 tokens:
# alpha, bravo, charlie, delta, echo, foxtrot, golf, hotel, india, juliet
mem = create_memory(
memory_type="preference",
content="alpha bravo charlie delta echo foxtrot golf hotel india juliet",
confidence=0.5,
)
# Echo 7 of 10 tokens (70%) plus some noise
interaction = _make_interaction(
response="alpha bravo charlie delta echo foxtrot golf noise words here",
)
results = reinforce_from_interaction(interaction)
assert any(r.memory_id == mem.id for r in results)
def test_reinforce_rejects_below_70_percent(tmp_data_dir):
"""Only 6 of 10 content tokens present (60%) → should NOT match."""
init_db()
mem = create_memory(
memory_type="preference",
content="alpha bravo charlie delta echo foxtrot golf hotel india juliet",
confidence=0.5,
)
# Echo 6 of 10 tokens (60%) plus noise
interaction = _make_interaction(
response="alpha bravo charlie delta echo foxtrot noise words here only",
)
results = reinforce_from_interaction(interaction)
assert all(r.memory_id != mem.id for r in results)
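# --- illustrative sketch (assumed, not the shipped matcher) -----------------
# The two threshold tests above pin the behavior described in the Phase 9B
# commit: a memory matches when at least 70% of its stemmed, stop-word-filtered
# tokens also appear in the response. A minimal version of that check, taking
# already-tokenized sets:
def _overlap_matches_sketch(
    memory_tokens: set[str], response_tokens: set[str], threshold: float = 0.70
) -> bool:
    if not memory_tokens:
        return False
    return len(memory_tokens & response_tokens) / len(memory_tokens) >= threshold
# 7 of 10 tokens present gives 0.7 >= 0.7 (match); 6 of 10 gives 0.6 (no match),
# which is exactly what the two tests above assert.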