Commit Graph

68 Commits

Author SHA1 Message Date
1161645415 fix: raise project-memory budget ratio to 0.25
At 0.15 the effective per-call allowance (450 - 55 wrapper) was 395
chars, which is just under the length of a real paragraph-length
project memory (~400 chars). Verified on live p04 probe: band was
still absent after the flat-budget fix because the first memory
entry was one character too long for the budget.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:51:04 -04:00
5913da53c5 fix: flat-budget walk in get_memories_for_context
The per-type slicing (available // len(memory_types)) starved
paragraph-length memories: with 3 types and a 450-char budget,
each type got ~131 chars while real project memories are 300-500
chars each — every entry was skipped and the new Project Memories
band never appeared in the live pack.

Switch to a flat budget pool walked type-by-type in order. Short
identity/preference memories still get first pick when the budget
is tight, but long project memories can now compete for space.

Caught on the first post-deploy probe: 2 active p04 memories
existed but none landed in formatted_context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:43:41 -04:00
8ea53f4003 feat: fold project-scoped memories into context pack
The retrieval-quality review on 2026-04-11 found that active
project/knowledge/episodic memories never reached the pack: only
Trusted Project State and identity/preference memories were being
assembled. Reinforcement bumped confidence on memories that had
no retrieval outlet, so the reflection loop was half-open.

This change adds a third memory tier between identity/preference
and retrieved chunks:

- PROJECT_MEMORY_BUDGET_RATIO = 0.15
- Memory types: project, knowledge, episodic
- Only populated when a canonical project is in scope — without
  a project hint, project memories stay out (cross-project bleed
  would rot the signal)
- Rendered under a dedicated "--- Project Memories ---" header
  so the LLM can distinguish it from the identity/preference band
- Trim order in _trim_context_to_budget: retrieval → project
  memories → identity/preference → project state (most recently
  added tier drops first when budget is tight)

get_memories_for_context gains header/footer kwargs so the two
memory blocks can be distinguished in a single pack without a
second helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:35:40 -04:00
9366ba7879 feat: length-aware reinforcement + batch triage CLI + off-host backup
- Reinforcement matcher now handles paragraph-length memories via a
  dual-mode threshold: short memories keep the 70% overlap rule,
  long memories (>15 stems) require 12 absolute overlaps AND 35%
  fraction so organic paraphrase can still reinforce. Diagnosis:
  every active memory stayed at reference_count=0 because 40-token
  project summaries never hit 70% overlap on real responses.
- scripts/atocore_client.py gains batch-extract (fan out
  /interactions/{id}/extract over recent interactions) and triage
  (interactive promote/reject walker for the candidate queue),
  matching the Phase 9 reflection-loop review flow without pulling
  extraction into the capture hot path.
- deploy/dalidou/cron-backup.sh adds an optional off-host rsync step
  gated on ATOCORE_BACKUP_RSYNC, fail-open when the target is offline
  so a laptop being off at 03:00 UTC never reds the local backup.
- docs/next-steps.md records the retrieval-quality sweep: project
  state surfaces, chunks are on-topic but broad, active memories
  never reach the pack (reflection loop has no retrieval outlet yet).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:20:03 -04:00
c5bad996a7 feat: enable reinforcement on live capture
The Stop hook now sends reinforce=true so the token-overlap matcher
runs on every captured interaction. Memory confidence will accumulate
signal from organic Claude Code use.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 10:58:56 -04:00
0b1742770a feat: cleanup endpoint, auto-extraction on capture, daily cron script
- POST /admin/backup/cleanup — retention cleanup via API (dry-run by default)
- record_interaction() accepts extract=True to auto-extract candidate
  memories from response text using the Phase 9C rule-based extractor
- POST /interactions accepts extract field to enable extraction on capture
- deploy/dalidou/cron-backup.sh — daily backup + cleanup for cron

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 10:28:32 -04:00
2829d5ec1c Merge hardening sprint: reinforcement matcher + backup ops
- Task A: token-overlap reinforcement matcher (fixes broken substring matching)
- Task B: automatic post-backup validation
- Task C: backup retention cleanup with CLI subcommand

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 10:02:35 -04:00
58c744fd2f feat: post-backup validation + retention cleanup (Tasks B & C)
- create_runtime_backup() now auto-validates its output and includes
  validated/validation_errors fields in returned metadata
- New cleanup_old_backups() with retention policy: 7 daily, 4 weekly
  (Sundays), 6 monthly (1st of month), dry-run by default
- CLI `cleanup` subcommand added to backup module
- 9 new tests (2 validation + 7 retention), 259 total passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 09:46:46 -04:00
a34a7a995f fix: token-overlap matcher for reinforcement (Phase 9B)
Replace the substring-based _memory_matches() with a token-overlap
matcher that tokenizes both memory content and response, applies
lightweight stemming (trailing s/ed/ing) and stop-word removal, then
checks whether >= 70% of the memory's tokens appear in the response.

This fixes the paraphrase blindness that prevented reinforcement from
ever firing on natural responses ("prefers" vs "prefer", "because
history" vs "because the history").

7 new tests (26 total reinforcement tests, all passing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 09:40:05 -04:00
92fc250b54 fix: use correct hook field name last_assistant_message
The Claude Code Stop hook sends `last_assistant_message`, not
`assistant_message`. This was causing response_chars=0 on all
captured interactions. Also removes the temporary debug log block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 09:17:21 -04:00
2d911909f8 feat: auto-capture Claude Code sessions via Stop hook
Add deploy/hooks/capture_stop.py — a Claude Code Stop hook that reads
the transcript JSONL, extracts the last user prompt, and POSTs to the
AtoCore /interactions endpoint in conservative mode (reinforce=false).

Conservative mode means: capture only, no automatic reinforcement or
extraction into the review queue. Kill switch: ATOCORE_CAPTURE_DISABLED=1.

Also: note build_sha cosmetic issue after restore in runbook, update
project status docs to reflect drill pass and auto-capture wiring.

17 new tests (243 total, all passing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 09:00:42 -04:00
1a8fdf4225 fix: chroma restore bind-mount bug + consolidate docs
Two fixes from the 2026-04-09 first real restore drill on Dalidou,
plus the long-overdue doc consolidation I should have done when I
added the drill runbook instead of creating a duplicate.

## Chroma restore bind-mount bug (drill finding)

src/atocore/ops/backup.py: restore_runtime_backup() used to call
shutil.rmtree(dst_chroma) before copying the snapshot back. In the
Dockerized Dalidou deployment the chroma dir is a bind-mounted
volume — you can't unlink a mount point, rmtree raises
  OSError [Errno 16] Device or resource busy
and the restore silently fails to touch Chroma. This bit the first
real drill; the operator worked around it with --no-chroma plus a
manual cp -a.

Fix: clear the destination's CONTENTS (iterdir + rmtree/unlink per
child) and use copytree(dirs_exist_ok=True) so the mount point
itself is never touched. Equivalent semantics, bind-mount-safe.

Regression test:
tests/test_backup.py::test_restore_chroma_does_not_unlink_destination_directory
captures Path.stat().st_ino of the dest dir before and after
restore and asserts they match. That's the same invariant a
bind-mounted chroma dir enforces — if the inode changed, the
mount would have failed. 11/11 backup tests now pass.

## Doc consolidation

docs/backup-restore-drill.md existed as a duplicate of the
authoritative docs/backup-restore-procedure.md. When I added the
drill runbook in commit 3362080 I wrote it from scratch instead of
updating the existing procedure — bad doc hygiene on a project
that's literally about being a context engine.

- Deleted docs/backup-restore-drill.md
- Folded its contents into docs/backup-restore-procedure.md:
  - Replaced the manual sudo cp restore sequence with the new
    `python -m atocore.ops.backup restore <STAMP>
    --confirm-service-stopped` CLI
  - Added the one-shot docker compose run pattern for running
    restore inside a container that reuses the live volume mounts
  - Documented the --no-pre-snapshot / --no-chroma / --chroma flags
  - New "Chroma restore and bind-mounted volumes" subsection
    explaining the bug and the regression test that protects the fix
  - New "Restore drill" subsection with three levels (unit tests,
    module round-trip, live Dalidou drill) and the cadence list
  - Failure-mode table gained four entries: restored_integrity_ok,
    Device-or-resource-busy, drill marker still present,
    chroma_snapshot_missing
  - "Open follow-ups" struck the restore_runtime_backup item (done)
    and added a "Done (historical)" note referencing 2026-04-09
  - Quickstart cheat sheet now has a full drill one-liner using
    memory_type=episodic (the 2026-04-09 drill found the runbook's
    memory_type=note was invalid — the valid set is identity,
    preference, project, episodic, knowledge, adaptation)

## Status doc sync

Long overdue — I've been landing code without updating the
project's narrative state docs.

docs/current-state.md:
- "Reliability Baseline" now reflects: restore_runtime_backup is
  real with CLI, pre-restore safety snapshot, WAL cleanup,
  integrity check; live drill on 2026-04-09 surfaced and fixed
  Chroma bind-mount bug; deploy provenance via /health build_sha;
  deploy.sh self-update re-exec guard
- "Immediate Next Focus" reshuffled: drill re-run (priority 1) and
  auto-capture (priority 2) are now ahead of retrieval quality work,
  reflecting the updated unblock sequence

docs/next-steps.md:
- New item 1: re-run the drill with chroma working end-to-end
- New item 2: auto-capture conservative mode (Stop hook)
- Old item 7 rewritten as item 9 listing what's DONE
  (create/list/validate/restore, admin/backup endpoint with
  include_chroma, /health provenance, self-update guard,
  procedure doc with failure modes) and what's still pending
  (retention cleanup, off-Dalidou target, auto-validation)

## Test count

226 passing (was 225 + 1 new inode-stability regression test).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 09:13:21 -04:00
336208004c ops: add restore_runtime_backup + drill runbook
Close the backup side of the loop: we had create/list/validate but
no restore, and no documented drill. A backup you've never restored
is not a backup. This lands the missing restore surface and the
procedure to exercise it before enabling any write-path automation
(auto-capture, automated ingestion, reinforcement sweeps).

Code — src/atocore/ops/backup.py:

- restore_runtime_backup(stamp, *, include_chroma, pre_restore_snapshot,
  confirm_service_stopped) performs:
  1. validate_backup() gate — refuse on any error
  2. pre-restore safety snapshot of current state (reversibility anchor)
  3. PRAGMA wal_checkpoint(TRUNCATE) on target db (flush + release
     OS handles; Windows needs this after conn.backup() reads)
  4. unlink stale -wal/-shm sidecars (tolerant to Windows lock races)
  5. shutil.copy2 snapshot db over target
  6. restore registry if snapshot captured one
  7. restore Chroma tree if snapshot captured one and include_chroma
     resolves to true (defaults to whether backup has Chroma)
  8. PRAGMA integrity_check on restored db, report result
- Refuses without confirm_service_stopped=True to prevent hot-restore
  into a running service (would corrupt SQLite state)
- Rewrote main() as argparse with 4 subcommands: create, list,
  validate, restore. `python -m atocore.ops.backup restore STAMP
  --confirm-service-stopped` is the drill CLI entry point, run via
  `docker compose run --rm --entrypoint python atocore` so it reuses
  the live service's volume mounts

Tests — tests/test_backup.py (6 new):

- test_restore_refuses_without_confirm_service_stopped
- test_restore_raises_on_invalid_backup
- test_restore_round_trip_reverses_post_backup_mutations
  (canonical drill flow: seed -> backup -> mutate -> restore ->
   mutation gone + baseline survived + pre-restore snapshot has
   the mutation captured as rollback anchor)
- test_restore_round_trip_with_chroma
- test_restore_skips_pre_snapshot_when_requested
- test_restore_cleans_stale_wal_sidecars (asserts stale byte
  markers do not survive, not file existence, since PRAGMA
  integrity_check may legitimately recreate -wal)

Docs — docs/backup-restore-drill.md (new):

- What gets backed up (hot sqlite, cold chroma, registry JSON,
  metadata.json) and what doesn't (.env, source content)
- What restore does, step by step, and why confirm_service_stopped
  is a hard gate
- 8-step drill procedure: capture -> baseline -> mutate -> stop ->
  restore -> start -> verify marker gone -> optional cleanup
- Correct endpoint bodies verified against routes.py:
    POST /admin/backup with JSON body {"include_chroma": true}
    POST /memory with memory_type/content/project/confidence
    GET /memory?project=drill to list drill markers
    POST /query with {"prompt": ..., "top_k": ...} (not "query")
- Failure modes: integrity_check fail, container won't start,
  marker still present after restore, with remediation for each
- When to run: before new write-path automation, after backup.py
  or schema changes, after infra bumps, monthly as standing check

225/225 tests passing (219 existing + 6 new restore).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:17:48 -04:00
03822389a1 deploy: self-update re-exec guard in deploy.sh
When deploy.sh itself changes in the commit being pulled, the bash
process is still running the OLD script from memory — git reset --hard
updated the file on disk but the in-memory instructions are stale.
This bit the 2026-04-09 Dalidou deploy: the old pre-build-sha Step 2
ran against fresh source, so the container started with
ATOCORE_BUILD_SHA="unknown" instead of the real commit. Manual
re-run fixed it, but the class of bug will re-emerge every time
deploy.sh itself changes.

Fix (Step 1.5):
- After git reset --hard, sha1 the running script ($0) and the
  on-disk copy at $APP_DIR/deploy/dalidou/deploy.sh
- If they differ, export ATOCORE_DEPLOY_REEXECED=1 and exec into
  the fresh copy so Step 2 onward runs under the new script
- The sentinel env var prevents recursion
- Skipped in dry-run mode, when $0 isn't readable, or when the
  on-disk script doesn't exist yet

Docs (docs/dalidou-deployment.md):
- New "The deploy.sh self-update race" troubleshooting section
  explaining the root cause, the Step 1.5 mechanism, what the log
  output looks like, and how to opt out

Verified syntax and dry-run. 219/219 tests still passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:08:41 -04:00
be4099486c deploy: add build_sha visibility for precise drift detection
Make /health report the precise git SHA the container was built from,
so 'is the live service current?' can be answered without ambiguity.
0.2.0 was too coarse to trust as a 'live is current' signal — many
commits share the same __version__.

Three layers:

1. /health endpoint (src/atocore/api/routes.py)
   - Reads ATOCORE_BUILD_SHA, ATOCORE_BUILD_TIME, ATOCORE_BUILD_BRANCH
     from environment, defaults to 'unknown'
   - Reports them alongside existing code_version field

2. docker-compose.yml
   - Forwards the three env vars from the host into the container
   - Defaults to 'unknown' so direct `docker compose up` runs (without
     deploy.sh) cleanly signal missing build provenance

3. deploy.sh
   - Step 2 captures git SHA + UTC timestamp + branch and exports them
     as env vars before `docker compose up -d --build`
   - Step 6 reads /health post-deploy and compares the reported
     build_sha against the freshly-built one. Mismatch exits non-zero
     (exit code 6) with a remediation hint covering cached image,
     env propagation, and concurrent restart cases

Tests (tests/test_api_storage.py):
- test_health_endpoint_reports_code_version_from_module
- test_health_endpoint_reports_build_metadata_from_env
- test_health_endpoint_reports_unknown_when_build_env_unset

Docs (docs/dalidou-deployment.md):
- Three-level drift detection table (code_version coarse,
  build_sha precise, build_time/branch forensic)
- Canonical drift check script using LIVE_SHA vs EXPECTED_SHA
- Note that running deploy.sh is itself the simplest drift check

219/219 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:25:32 -04:00
2c0b214137 deploy.sh: add permission pre-flight check with clean remediation
Dalidou Claude's second re-deploy (commit b492f5f) reported one
remaining friction point: the app dir was root-owned from the
previous manual-workaround deploy (when ALTER TABLE was run as
root to work around the schema init bug), so deploy.sh's git
fetch/reset hit a permission wall. They worked around it with
a one-shot docker run chown, but the script itself produced
cryptic git errors before that, so the fix wasn't obvious until
after the fact.

This commit adds a permission pre-flight check that runs BEFORE
any git operations and exits cleanly with an explicit remediation
message instead of letting git produce half-state on partial
failure.

The check:
1. Reads the current owner of the app dir via `stat -c '%U:%G'`
2. Reports the current user via `id -un` / `id -u:id -g`
3. Attempts to create a throwaway marker file in the app dir
4. If the marker write fails, prints three distinct remediation
   commands covering the common environments:
     a. sudo chown -R 1000:1000 $APP_DIR (if passwordless sudo)
     b. sudo bash $0 (if running deploy.sh itself as root works)
     c. docker run --rm -v $APP_DIR:/app alpine chown -R ...
        (what Dalidou Claude actually did on 2026-04-08)
5. Exits with code 5 so CI / automation can distinguish "no
   permission" from other deploy failures

Dry-run mode skips the check (nothing is mutated in dry-run).

A brief WARNING is also printed early if the app dir exists but
doesn't appear writable, before the fatal check — this gives
operators a heads-up even in the happy-path case.

Syntax check: bash -n passes.
Full suite: 216 passing (unchanged; no code changes to the app).

What this commit does NOT do
----------------------------
- Does NOT automatically fix permissions. chown needs root and
  we don't want deploy.sh to escalate silently. The operator
  runs one of the three remediation commands manually.
- Does NOT check permissions on nested files (like .git/config)
  individually. The marker-file test on the app dir root is the
  cheapest proxy that catches the common case (root-owned dir
  tree after a previous sudo-based operation).
- Does NOT change behavior on first-time deploys where the app
  dir doesn't exist yet. The check is gated on `-d $APP_DIR`.
2026-04-08 19:55:50 -04:00
b492f5f7b0 fix: schema init ordering, deploy.sh default, client BASE_URL docs
Three issues Dalidou Claude surfaced during the first real deploy
of commit e877e5b to the live service (report from 2026-04-08).
Bug 1 was the critical one — a schema init ordering bug that would
have bitten every future upgrade from a pre-Phase-9 schema — and
the other two were usability traps around hostname resolution.

Bug 1 (CRITICAL): schema init ordering
--------------------------------------
src/atocore/models/database.py

SCHEMA_SQL contained CREATE INDEX statements that referenced
columns added later by _apply_migrations():

    CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project);
    CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project);
    CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id);

On a FRESH install, CREATE TABLE IF NOT EXISTS creates the tables
with the Phase 9 shape (columns present), so the CREATE INDEX runs
cleanly and _apply_migrations is effectively a no-op.

On an UPGRADE from a pre-Phase-9 schema, CREATE TABLE IF NOT EXISTS
is a no-op (the tables already exist in the old shape), the columns
are NOT added yet, and the CREATE INDEX fails with
"OperationalError: no such column: project" before
_apply_migrations gets a chance to add the columns.

Dalidou Claude hit this exactly when redeploying from 0.1.0 to
0.2.0 — had to manually ALTER TABLE to add the Phase 9 columns
before the container could start.

The fix is to remove the Phase 9-column indexes from SCHEMA_SQL.
They already exist in _apply_migrations() AFTER the corresponding
ALTER TABLE, so they still get created on both fresh and upgrade
paths — just after the columns exist, not before.

Indexes still in SCHEMA_SQL (all safe — reference columns that
have existed since the first release):
- idx_chunks_document on source_chunks(document_id)
- idx_memories_type on memories(memory_type)
- idx_memories_status on memories(status)
- idx_interactions_project on interactions(project_id)

Indexes moved to _apply_migrations (already there — just no longer
duplicated in SCHEMA_SQL):
- idx_memories_project on memories(project)
- idx_interactions_project_name on interactions(project)
- idx_interactions_session on interactions(session_id)
- idx_interactions_created_at on interactions(created_at)

Regression test: tests/test_database.py
---------------------------------------
New test_init_db_upgrades_pre_phase9_schema_without_failing:

- Seeds the DB with the exact pre-Phase-9 shape (no project /
  last_referenced_at / reference_count on memories; no project /
  client / session_id / response / memories_used / chunks_used on
  interactions)
- Calls init_db() — which used to raise OperationalError before
  the fix
- Verifies all Phase 9 columns are present after the call
- Verifies the migration indexes exist

Before the fix this test would have failed with
"OperationalError: no such column: project" on the init_db call.
After the fix it passes. This locks the invariant "init_db is
safe on any legacy schema shape" so the bug can't silently come
back.

Full suite: 216 passing (was 215), 1 warning. The +1 is the new
regression test.

Bug 3 (usability): deploy.sh DNS default
----------------------------------------
deploy/dalidou/deploy.sh

ATOCORE_GIT_REMOTE defaulted to http://dalidou:3000/Antoine/ATOCore.git
which requires the "dalidou" hostname to resolve. On the Dalidou
host itself it didn't (no /etc/hosts entry for localhost alias),
so deploy.sh had to be run with the IP as a manual workaround.

Fix: default ATOCORE_GIT_REMOTE to http://127.0.0.1:3000/Antoine/ATOCore.git.
Loopback always works on the host running the script. Callers
from a remote host (e.g. running deploy.sh from a laptop against
the Dalidou LAN) set ATOCORE_GIT_REMOTE explicitly. The script
header's Environment Variables section documents this with an
explicit reference to the 2026-04-08 Dalidou deploy report so the
rationale isn't lost.

docs/dalidou-deployment.md gets a new "Troubleshooting hostname
resolution" subsection and a new example invocation showing how
to deploy from a remote host with an explicit ATOCORE_GIT_REMOTE
override.

Bug 2 (usability): atocore_client.py ATOCORE_BASE_URL documentation
-------------------------------------------------------------------
scripts/atocore_client.py

Same class of issue as bug 3. BASE_URL defaults to
http://dalidou:8100 which resolves fine from a remote caller
(laptop, T420/OpenClaw over Tailscale) but NOT from the Dalidou
host itself or from inside the atocore container. Dalidou Claude
saw the CLI return
{"status": "unavailable", "fail_open": true}
while direct curl to http://127.0.0.1:8100 worked.

The fix here is NOT to change the default (remote callers are
the common case and would break) but to DOCUMENT the override
clearly so the next operator knows what's happening:

- The script module docstring grew a new "Environment variables"
  section covering ATOCORE_BASE_URL, ATOCORE_TIMEOUT_SECONDS,
  ATOCORE_REFRESH_TIMEOUT_SECONDS, and ATOCORE_FAIL_OPEN, with
  the explicit override example for on-host/in-container use
- It calls out the exact symptom (fail-open envelope when the
  base URL doesn't resolve) so the diagnosis is obvious from
  the error alone
- docs/dalidou-deployment.md troubleshooting section mirrors
  this guidance so there's one place to look regardless of
  whether the operator starts with the client help or the
  deploy doc

What this commit does NOT do
----------------------------
- Does NOT change the default ATOCORE_BASE_URL. Doing that would
  break the T420 OpenClaw helper and every remote caller who
  currently relies on the hostname. Documentation is the right
  fix for this case.
- Does NOT fix /etc/hosts on Dalidou. That's a host-level
  configuration issue that the user can fix if they prefer
  having the hostname resolve; the deploy.sh fix makes it
  unnecessary regardless.
- Does NOT re-run the validation on Dalidou. The next step is
  for the live service to pull this commit via deploy.sh (which
  should now work without the IP workaround) and re-run the
  Phase 9 loop test to confirm nothing regressed.
2026-04-08 19:02:57 -04:00
e877e5b8ff deploy: version-visible /health + deploy.sh + update runbook
Dalidou Claude's validation run against the live service exposed a
structural gap: the deployment at /srv/storage/atocore/app has no
git connection, the running container was built from pre-Phase-9
source, and /health hardcoded 'version: 0.1.0' so drift is
invisible. Weeks of work have been shipping to Gitea but never
reaching the live service.

This commit fixes both the drift-invisibility problem and the
absence of an update workflow, so the next deploy to Dalidou can
go live cleanly and future drifts surface immediately.

Layer 1: deployment drift is now visible via /health
----------------------------------------------------
- src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0
  and documented as the source of truth for the deployed code
  version, with a history block explaining when each bump happens
  (API surface change, schema change, user-visible behavior change)
- src/atocore/main.py: FastAPI constructor now uses __version__
  instead of the hardcoded '0.1.0' string, so the OpenAPI docs
  reflect the actual code version
- src/atocore/api/routes.py: /health now reads from __version__
  dynamically. Both the existing 'version' field and a new
  'code_version' field report the same value for backwards compat.
  A new docstring explains that comparing this to the main
  branch's __version__ is the fastest way to detect drift.
- pyproject.toml: version bumped to 0.2.0 to stay in sync

The comparison is now:
  curl /health -> "code_version": "0.2.0"
  grep __version__ src/atocore/__init__.py -> "0.2.0"
If those differ, the deployment is stale. Concrete, unambiguous.

Layer 2: deploy.sh as the canonical update path
-----------------------------------------------
New file: deploy/dalidou/deploy.sh

One-shot bash script that handles both the first-time deploy
(where /srv/storage/atocore/app may not be a git repo yet) and
the ongoing update case. Steps:

1. If app dir is not a git checkout, back it up as
   <dir>.pre-git-<utc-stamp> and re-clone from Gitea.
   If it IS a checkout, fetch + reset --hard origin/<branch>.
2. Report the deployable commit SHA
3. Check that deploy/dalidou/.env exists (hard fail if missing
   with a clear message pointing at .env.example)
4. docker compose up -d --build — rebuilds the image from
   current source, restarts the container
5. Poll /health for up to 30 seconds; on failure, print the
   last 50 lines of container logs and exit non-zero
6. Parse /health.code_version and compare to the __version__
   in the freshly-pulled source. If they differ, exit non-zero
   with a message suggesting docker compose down && up
7. On success, report commit + code_version + "health: ok"

Configurable via env vars:
- ATOCORE_APP_DIR (default /srv/storage/atocore/app)
- ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git)
- ATOCORE_BRANCH (default main)
- ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health)
- ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode

Explicit non-goals documented in the script header:
- does not manage secrets (.env is the caller's responsibility)
- does not take a pre-deploy backup (call /admin/backup first
  if you want one)
- does not roll back on failure (redeploy a known-good commit
  to recover)
- does not touch the DB directly — schema migrations run at
  service startup via the lifespan handler, and all existing
  _apply_migrations ALTERs are idempotent ADD COLUMN operations

Layer 3: updated docs/dalidou-deployment.md
-------------------------------------------
- First-time deployment steps now explicitly say "git clone", not
  "place the repository", so future first-time deploys don't end
  up as static snapshots again
- New "Updating a running deployment" section covering deploy.sh
  usage with all three modes (normal / branch override / dry-run)
- New "Deployment drift detection" section with the one-liner
  comparison between /health code_version and the repo's
  __version__
- New "Schema migrations on redeploy" section enumerating the
  exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0
  upgrade, confirming they are additive-only and safe, and
  recommending a backup via /admin/backup before any redeploy

Full suite: 215 passing, 1 warning. No test was hardcoded to the
old version string, so the version bump was safe without test
changes.

What this commit does NOT do
----------------------------
- Does NOT execute the deploy on the live Dalidou instance. That
  requires Dalidou access and is the next step. A ready-to-paste
  prompt for Dalidou Claude will be provided separately.
- Does NOT add CI/CD, webhook-based auto-deploy, or reverse
  proxy. Those remain in the 'deferred' section of the
  deployment doc.
- Does NOT change the Dockerfile. The existing 'COPY source at
  build time' pattern is what deploy.sh relies on — rebuilding
  the image picks up new code.
- Does NOT modify the database schema. The Phase 9 migrations
  that Dalidou's DB needs will be applied automatically on next
  service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
fad30d5461 feat(client): Phase 9 reflection loop surface in shared operator CLI
Codex's sequence step 3: finish the Phase 9 operator surface in the
shared client. The previous client version (0.1.0) covered stable
operations (project lifecycle, retrieval, context build, trusted
state, audit-query) but explicitly deferred capture/extract/queue/
promote/reject pending "exercised workflow". That deferral ran
into a bootstrap problem: real Claude Code sessions can't exercise
the Phase 9 loop without a usable client surface to drive it. This
commit ships the 8 missing subcommands so the next step (real
validation on Dalidou) is unblocked.

Bumps CLIENT_VERSION from 0.1.0 to 0.2.0 per the semver rules in
llm-client-integration.md (new subcommands = minor bump).

New subcommands in scripts/atocore_client.py
--------------------------------------------
| Subcommand            | Endpoint                                  |
|-----------------------|-------------------------------------------|
| capture               | POST /interactions                        |
| extract               | POST /interactions/{id}/extract           |
| reinforce-interaction | POST /interactions/{id}/reinforce         |
| list-interactions     | GET  /interactions                        |
| get-interaction       | GET  /interactions/{id}                   |
| queue                 | GET  /memory?status=candidate             |
| promote               | POST /memory/{id}/promote                 |
| reject                | POST /memory/{id}/reject                  |

Each follows the existing client style: positional arguments with
empty-string defaults for optional filters, truthy-string arguments
for booleans (matching the existing refresh-project pattern), JSON
output via print_json(), fail-open behavior inherited from
request().

capture accepts prompt + response + project + client + session_id +
reinforce as positionals, defaulting the client field to
"atocore-client" when omitted so every capture from the shared
client is identifiable in the interactions audit trail.

extract defaults to preview mode (persist=false). Pass "true" as
the second positional to create candidate memories.

list-interactions and queue build URL query strings with
url-encoded values and always include the limit, matching how the
existing context-build subcommand handles its parameters.

Security fix: ID-field URL encoding
-----------------------------------
The initial draft used urllib.parse.quote() with the default safe
set, which does NOT encode "/" because it's a reserved path
character. That's a security footgun on ID fields: passing
"promote mem/evil/action" would build /memory/mem/evil/action/promote
and hit a completely different endpoint than intended.

Fixed by passing safe="" to urllib.parse.quote() on every ID field
(interaction_id and memory_id). The tests cover this explicitly via
test_extract_url_encodes_interaction_id and test_promote_url_encodes_memory_id,
both of which would have failed with the default behavior.

Project names keep the default quote behavior because a project
name with a slash would already be broken elsewhere in the system
(ingest root resolution, file paths, etc).

tests/test_atocore_client.py (new, 18 tests, all green)
-------------------------------------------------------
A dedicated test file for the shared client that mocks the
request() helper and verifies each subcommand:
- calls the correct HTTP method and path
- builds the correct JSON body (or query string)
- passes the right subset of CLI arguments through
- URL-encodes ID fields so path traversal isn't possible

Tests are structured as unit tests (not integration tests) because
the API surface on the server side already has its own route tests
in test_api_storage.py and the Phase 9 specific files. These tests
are the wiring contract between CLI args and HTTP calls.

Test file highlights:
- capture: default values, custom client, reinforce=false
- extract: preview by default, persist=true opt-in, URL encoding
- reinforce-interaction: correct path construction
- list-interactions: no filters, single filter, full filter set
  (including ISO 8601 since parameter with T separator and Z)
- get-interaction: fetch by id
- queue: always filters status=candidate, accepts memory_type
  and project, coerces limit to int
- promote / reject: correct path + URL encoding
- test_phase9_full_loop_via_client_shape: end-to-end sequence
  that drives capture -> extract preview -> extract persist ->
  queue list -> promote -> reject through the shared client and
  verifies the exact sequence of HTTP calls that would be made

These tests run in ~0.2s because they mock request() — no DB, no
Chroma, no HTTP. The fast feedback loop matters because the
client surface is what every agent integration eventually depends
on.

docs/architecture/llm-client-integration.md updates
---------------------------------------------------
- New "Phase 9 reflection loop (shipped after migration safety
  work)" section under "What's in scope for the shared client
  today" with the full 8-subcommand table and a note explaining
  the bootstrap-problem rationale
- Removed the "Memory review queue and reflection loop" section
  from "What's intentionally NOT in scope today"; backup admin
  and engineering-entity commands remain the only deferred
  families
- Renumbered the deferred-commands list (was 3 items, now 2)
- Open follow-ups updated: memory-review-subcommand item replaced
  with "real-usage validation of the Phase 9 loop" as the next
  concrete dependency
- TL;DR updated to list the reflection-loop subcommands
- Versioning note records the v0.1.0 -> v0.2.0 bump with the
  subcommands included

Full suite: 215 passing (was 197), 1 warning. The +18 is
tests/test_atocore_client.py. Runtime unchanged because the new
tests don't touch the DB.

What this commit does NOT do
----------------------------
- Does NOT change the server-side endpoints. All 8 subcommands
  call existing API routes that were shipped in Phase 9 Commits
  A/B/C. This is purely a client-side wiring commit.
- Does NOT run the reflection loop against the live Dalidou
  instance. That's the next concrete step and is explicitly
  called out in the open-follow-ups section of the updated doc.
- Does NOT modify the Claude Code slash command. It still pulls
  context only; the capture/extract/queue/promote companion
  commands (e.g. /atocore-record-response) are deferred until the
  capture workflow has been exercised in real use at least once.
- Does NOT refactor the OpenClaw helper. That's a cross-repo
  change and remains a queued follow-up, now unblocked by the
  shared client having the reflection-loop subcommands.
2026-04-08 16:09:42 -04:00
261277fd51 fix(migration): preserve superseded/invalid shadow state during rekey
Codex caught a real data-loss bug in the legacy alias migration
shipped in 7e60f5a. plan_state_migration filtered state rows to
status='active' only, then apply_plan deleted the shadow projects
row at the end. Because project_state.project_id has
ON DELETE CASCADE, any superseded or invalid state rows still
attached to the shadow project got silently cascade-deleted —
exactly the audit loss a cleanup migration must not cause.

This commit fixes the bug and adds regression tests that lock in
the invariant "shadow state of every status is accounted for".

Root cause
----------
scripts/migrate_legacy_aliases.py::plan_state_migration was:

    "SELECT * FROM project_state WHERE project_id = ? AND status = 'active'"

which only found live rows. Any historical row (status in
'superseded' or 'invalid') was invisible to the plan, so the apply
step had nothing to rekey for it. Then the shadow project row was
deleted at the end, cascade-deleting every unplanned row.

The fix
-------
plan_state_migration now selects ALL state rows attached to the
shadow project regardless of status, and handles every row per a
per-status decision table:

| Shadow status | Canonical at same triple? | Values     | Action                         |
|---------------|---------------------------|------------|--------------------------------|
| any           | no                        | —          | clean rekey                    |
| any           | yes                       | same       | shadow superseded in place     |
| active        | yes, active               | different  | COLLISION, apply refuses       |
| active        | yes, inactive             | different  | shadow wins, canonical deleted |
| inactive      | yes, any                  | different  | historical drop (logged)       |

Four changes in the script:

1. SELECT drops the status filter so the plan walks every row.
2. New StateRekeyPlan.historical_drops list captures the shadow
   rows that lose to a canonical row at the same triple because the
   shadow is already inactive. These are the only unavoidable data
   losses, and they happen because the UNIQUE(project_id, category,
   key) constraint on project_state doesn't allow two rows per
   triple regardless of status.
3. New apply action 'replace_inactive_canonical' for the
   shadow-active-vs-canonical-inactive case. At apply time the
   canonical inactive row is DELETEd first (SQLite's default
   immediate constraint checking) and then the shadow is UPDATEd
   into its place in two separate statements. Adds a new
   state_rows_replaced_inactive_canonical counter.
4. New apply counter state_rows_historical_dropped for audit
   transparency. The rows themselves are still cascade-deleted
   when the shadow project row is dropped, but they're counted
   and reported.

Five places render_plan_text and plan_to_json_dict updated:

- counts() gains state_historical_drops
- render_plan_text prints a 'historical drops' section with each
  shadow-canonical pair and their statuses when there are any, so
  the operator sees the audit loss BEFORE running --apply
- The new section explicitly tells the operator: "if any of these
  values are worth keeping as separate audit records, manually copy
  them out before running --apply"
- plan_to_json_dict carries historical_drops into the JSON report
- The state counts table in the human report now shows both
  'state collisions (block)' and 'state historical drops' as
  separate lines so the operator can distinguish
  "apply will refuse" from "apply will drop historical rows"

Regression tests (3 new, all green)
-----------------------------------
tests/test_migrate_legacy_aliases.py:

- test_apply_preserves_superseded_shadow_state_when_no_collision:
  the direct regression for the codex finding. Seeds a shadow with
  a superseded state row on a triple the canonical doesn't have,
  runs the migration, verifies via raw SQL that the row is now
  attached to the canonical projects row and still has status
  'superseded'. This is the test that would have failed before
  the fix.
- test_apply_drops_shadow_inactive_row_when_canonical_holds_same_triple:
  covers the unavoidable data-loss case. Seeds shadow superseded
  + canonical active at the same triple with different values,
  verifies plan.counts() reports one historical_drop, runs apply,
  verifies the canonical value is preserved and the shadow value
  is gone.
- test_apply_replaces_inactive_canonical_with_active_shadow:
  covers the cross-contamination case where shadow has live value
  and canonical has a stale invalid row. Shadow wins by deleting
  canonical and rekeying in its place. Verifies the counter and
  the final state.

Plus _seed_state_row now accepts a status kwarg so the seeding
helper can create superseded/invalid rows directly.

test_dry_run_on_empty_registry_reports_empty_plan was updated to
include the new state_historical_drops key in the expected counts
dict (all zero for an empty plan, so the test shape is the same).

Full suite: 197 passing (was 194), 1 warning. The +3 is the three
new regression tests.

What this commit does NOT do
----------------------------
- Does NOT try to preserve historical shadow rows that collide
  with a canonical row at the same triple. That would require a
  schema change (adding (id) to the UNIQUE key, or a separate
  history table) and isn't in scope for a cleanup migration.
  The operator sees these as explicit 'historical drops' in the
  plan output and can copy them out manually if any are worth
  preserving.
- Does NOT change any behavior for rows that were already
  reachable from the canonicalized read path. The fix only
  affects legacy rows whose project_id points at a shadow row.
- Does NOT re-verify the earlier happy-path tests beyond the full
  suite confirming them still green.
2026-04-08 15:52:44 -04:00
7e60f5a0e6 feat(ops): legacy alias migration script with dry-run/apply modes
Closes the compatibility gap documented in
docs/architecture/project-identity-canonicalization.md. Before fb6298a,
writes to project_state, memories, and interactions stored the raw
project name. After fb6298a every service-layer entry point
canonicalizes through the registry, which silently made pre-fix
alias-keyed rows unreachable from the new read path. Now there's
a migration tool to find and fix them.

This commit is the tool and its tests. The tool is NOT run against
the live Dalidou DB in this commit — that's a separate supervised
manual step after reviewing the dry-run output.

scripts/migrate_legacy_aliases.py
---------------------------------
Standalone offline migration tool. Dry-run default, --apply explicit.

What it inspects:
- projects: rows whose name is a registered alias and differs from
  the canonical project_id (shadow rows)
- project_state: rows whose project_id points at a shadow; plan
  rekeys them to the canonical row's id. (category, key) collisions
  against the canonical block the apply step until a human resolves
- memories: rows whose project column is a registered alias. Plain
  string rekey. Dedup collisions (after rekey, same
  (memory_type, content, project, status)) are handled by the
  existing memory supersession model: newer row stays active, older
  becomes superseded with updated_at as tiebreaker
- interactions: rows whose project column is a registered alias.
  Plain string rekey, no collision handling

What it does NOT do:
- Never touches rows that are already canonical
- Never auto-resolves project_state collisions (refuses until the
  human picks a winner via POST /project/state)
- Never creates data; only rekeys or supersedes
- Never runs outside a single SQLite transaction; any failure rolls
  back the entire migration

Safety rails:
- Dry-run is default. --apply is explicit.
- Apply on empty plan refuses unless --allow-empty (prevents
  accidental runs that look meaningful but did nothing)
- Apply refuses on any project_state collision
- Apply refuses on integrity errors (e.g. two case-variant rows
  both matching the canonical lookup)
- Writes a JSON report to data/migrations/ on every run (dry-run
  and apply alike) for audit
- Idempotent: running twice produces the same final state as
  running once. The second run finds zero shadow rows and exits
  clean.

CLI flags:
  --registry PATH     override ATOCORE_PROJECT_REGISTRY_PATH
  --db PATH           override the AtoCore SQLite DB path
  --apply             actually mutate (default is dry-run)
  --allow-empty       permit --apply on an empty plan
  --report-dir PATH   where to write the JSON report
  --json              emit the plan as JSON instead of human prose

Smoke test against the Phase 9 validation DB produces the expected
"Nothing to migrate. The database is clean." output with 4 known
canonical projects and 0 shadows.

tests/test_migrate_legacy_aliases.py
------------------------------------
19 new tests, all green:

Plan-building:
- test_dry_run_on_empty_registry_reports_empty_plan
- test_dry_run_on_clean_registered_db_reports_empty_plan
- test_dry_run_finds_shadow_project
- test_dry_run_plans_state_rekey_without_collisions
- test_dry_run_detects_state_collision
- test_dry_run_plans_memory_rekey_and_supersession
- test_dry_run_plans_interaction_rekey

Apply:
- test_apply_refuses_on_state_collision
- test_apply_migrates_clean_shadow_end_to_end (verifies get_state
  can see the state via BOTH the alias AND the canonical after
  migration)
- test_apply_drops_shadow_state_duplicate_without_collision
  (same (category, key, value) on both sides - mark shadow
  superseded, don't hit the UNIQUE constraint)
- test_apply_migrates_memories
- test_apply_migrates_interactions
- test_apply_is_idempotent
- test_apply_refuses_with_integrity_errors (uses case-variant
  canonical rows to work around projects.name UNIQUE constraint;
  verifies the case-insensitive duplicate detection works)

Reporting:
- test_plan_to_json_dict_is_serializable
- test_write_report_creates_file
- test_render_plan_text_on_empty_plan
- test_render_plan_text_on_collision

End-to-end gap closure (the most important test):
- test_legacy_alias_gap_is_closed_after_migration
  - Seeds the exact same scenario as
    test_legacy_alias_keyed_state_is_invisible_until_migrated
    in test_project_state.py (which documents the pre-migration
    gap)
  - Confirms the row is invisible before migration
  - Runs the migration
  - Verifies the row is reachable via BOTH the canonical id AND
    the alias afterward
  - This test and the pre-migration gap test together lock in
    "before migration: invisible, after migration: reachable"
    as the documented invariant

Full suite: 194 passing (was 175), 1 warning. The +19 is the new
migration test file.

Next concrete step after this commit
------------------------------------
- Run the dry-run against the live Dalidou DB to find out the
  actual blast radius. The script is the inspection SQL, codified.
- Review the dry-run output together
- If clean (zero shadows), no apply needed; close the doc gap as
  "verified nothing to migrate on this deployment"
- If there are shadows, resolve any collisions via
  POST /project/state, then run --apply under supervision
- After apply, the test_legacy_alias_keyed_state_is_invisible_until_migrated
  test still passes (it simulates the gap directly, so it's
  independent of the live DB state) and the gap-closed companion
  test continues to guard forward
2026-04-08 15:08:16 -04:00
1953e559f9 docs+test: clarify legacy alias compatibility gap, add gap regression test
Codex caught a real documentation accuracy bug in the previous
canonicalization doc commit (f521aab). The doc claimed that rows
written under aliases before fb6298a "still work via the
unregistered-name fallback path" — that is wrong for REGISTERED
aliases, which is exactly the case that matters.

The unregistered-name fallback only saves you when the project was
never in the registry: a row stored under "orphan-project" is read
back via "orphan-project", both pass through resolve_project_name
unchanged, and the strings line up. For a registered alias like
"p05", the helper rewrites the read key to "p05-interferometer"
but does NOT rewrite the storage key, so the legacy row becomes
silently invisible.

This commit corrects the doc and locks the gap behavior in with
a regression test, so the issue cannot be lost again.

docs/architecture/project-identity-canonicalization.md
------------------------------------------------------
- Removed the misleading claim from the "What this rule does NOT
  cover" section. Replaced with a pointer to the new gap section
  and an explicit statement that the migration is required before
  engineering V1 ships.
- New "Compatibility gap: legacy alias-keyed rows" section between
  "Why this is the trust hierarchy in action" and "The rule for
  new entry points". This is the natural insertion point because
  the gap is exactly the trust hierarchy failing for legacy data.
  The section covers:
  * a worked T0/T1 timeline showing the exact failure mode
  * what is at risk on the live Dalidou DB, ranked by trust tier:
    projects table (shadow rows), project_state (highest risk
    because Layer 3 is most-authoritative), memories, interactions
  * inspection SQL queries for measuring the actual blast radius
    on the live DB before running any migration
  * the spec for the migration script: walk projects, find shadow
    rows, merge dependent state via the conflict model when there
    are collisions, dry-run mode, idempotent
  * explicit statement that this is required pre-V1 because V1
    will add new project-keyed tables and the killer correctness
    queries from engineering-query-catalog.md would report wrong
    results against any project that has shadow rows
- "Open follow-ups" item 1 promoted from "tracked optional" to
  "REQUIRED before engineering V1 ships, NOT optional" with a
  more honest cost estimate (~150 LOC migration + ~50 LOC tests
  + supervised live run, not the previous optimistic ~30 LOC)
- TL;DR rewritten to mention the gap explicitly and re-order
  the open follow-ups so the migration is the top priority

tests/test_project_state.py
---------------------------
- New test_legacy_alias_keyed_state_is_invisible_until_migrated
- Inserts a "p05" project row + a project_state row pointing at
  it via raw SQL (bypassing set_state which now canonicalizes),
  simulating a pre-fix legacy row
- Verifies the canonicalized get_state path can NOT see the row
  via either the alias or the canonical id — this is the bug
- Verifies the row is still in the database (just unreachable),
  so the migration script has something to find
- The docstring explicitly says: "When the legacy alias migration
  script lands, this test must be inverted." Future readers will
  know exactly when and how to update it.

Full suite: 175 passing (was 174), 1 warning. The +1 is the new
gap regression test.

What this commit does NOT do
----------------------------
- The migration script itself is NOT in this commit. Codex's
  finding was a doc accuracy issue, and the right scope is fix
  the doc + lock the gap behavior in. Writing the migration is
  the next concrete step but is bigger (~200 LOC + dry-run mode
  + collision handling via the conflict model + supervised run
  on the live Dalidou DB), warrants its own commit, and probably
  warrants a "draft + review the dry-run output before applying"
  workflow rather than a single shot.
- Existing tests are unchanged. The new test stands alone as a
  documented gap; the 12 canonicalization tests from fb6298a
  still pass without modification.
2026-04-07 20:14:19 -04:00
f521aab97b docs(arch): project-identity-canonicalization contract
Codifies the helper-at-every-service-boundary rule that fb6298a
implemented across the eight current callsites. The contract is
intentionally simple but easy to forget, so it lives in its own
doc that the engineering layer V1 implementation sprint can read
before adding new project-keyed entity surfaces.

docs/architecture/project-identity-canonicalization.md
------------------------------------------------------
- The contract: every read/write that takes a project name MUST
  call resolve_project_name() before the value crosses a service
  boundary; canonicalization happens once, at the first statement
  after input validation, never later
- The helper API: resolve_project_name(name) returns the canonical
  project_id for registered names, the input unchanged for empty
  or unregistered names (the second case is the backwards-compat
  path for hand-curated state predating the registry)
- Full table of the 8 current callsites: builder.build_context,
  project_state.set_state/get_state/invalidate_state,
  interactions.record_interaction/list_interactions,
  memory.create_memory/get_memories
- Where the helper is intentionally NOT called and why: legacy
  ensure_project lookup, retriever's own _project_match_boost
  (which already calls get_registered_project), _rank_chunks
  secondary substring boost (multiplicative not filter, can't
  drop relevant chunks), update_memory (no project field update),
  unregistered names (the rule applied to a name with no record)
- Why this is the trust hierarchy in action: Layer 3 trusted
  state has to be findable to win the trust battle; an
  un-canonicalized lookup silently makes Layer 3 invisible and
  the system falls through to lower-trust retrieved chunks with
  no signal to the human
- The 4-step rule for new entry points: identify project-keyed
  reads/writes, place the call as the first statement after
  validation, add a regression test using the project_registry
  fixture, verify None/empty paths
- How the project_registry fixture works with a copy-pasteable
  example
- What the rule does NOT cover: alias creation (registry's own
  write path), registry hot-reloading (no in-process cache by
  design), cross-project dedup (collision detection at
  registration), time-bounded canonicalization (canonical id is
  stable forever), legacy data migration (open follow-up)
- Engineering layer V1 implications: every new service entry
  point in the entities/relationships/conflicts/mirror modules
  must apply the helper at the first statement after validation;
  treated as code review failure if missing
- Open follow-ups: legacy data migration script (~30 LOC),
  registry file caching when projects scale beyond ~50, case
  sensitivity audit when entity-side storage lands, _rank_chunks
  cleanup, documentation discoverability (intentional redundancy
  between this doc, the helper docstring, and per-callsite comments)
- Quick reference card: copy-pasteable template for new service
  functions

master-plan-status.md updated
-----------------------------
- New doc added to the engineering-layer planning sprint listing
- Marked as required reading before V1 implementation begins
- Note that V1 must apply the contract at every new service-layer
  entry point

Pure doc work, no code changes. Full suite stays at 174 passing
because no source changed.
2026-04-07 19:32:31 -04:00
fb6298a9a1 fix(P1+P2): canonicalize project names at every trust boundary
Three findings from codex's review of the previous P1+P2 fix. The
earlier commit (f2372ef) only fixed alias resolution at the context
builder. Codex correctly pointed out that the same fragmentation
applies at every other place a project name crosses a boundary —
project_state writes/reads, interaction capture/listing/filtering,
memory create/queries, and reinforcement's downstream queries. Plus
a real bug in the interaction `since` filter where the storage
format and the documented ISO format don't compare cleanly.

The fix is one helper used at every boundary instead of duplicating
the resolution inline.

New helper: src/atocore/projects/registry.py::resolve_project_name
---------------------------------------------------------------
- Single canonicalization boundary for project names
- Returns the canonical project_id when the input matches any
  registered id or alias
- Returns the input unchanged for empty/None and for unregistered
  names (preserves backwards compat with hand-curated state that
  predates the registry)
- Documented as the contract that every read/write at the trust
  boundary should pass through

P1 — Trusted Project State endpoints
------------------------------------
src/atocore/context/project_state.py: set_state, get_state, and
invalidate_state now all canonicalize project_name through
resolve_project_name BEFORE looking up or creating the project row.

Before this fix:
- POST /project/state with project="p05" called ensure_project("p05")
  which created a separate row in the projects table
- The state row was attached to that alias project_id
- Later context builds canonicalized "p05" -> "p05-interferometer"
  via the builder fix from f2372ef and never found the state
- Result: trusted state silently fragmented across alias rows

After this fix:
- The alias is resolved to the canonical id at every entry point
- Two captures (one via "p05", one via "p05-interferometer") write
  to the same row
- get_state via either alias or the canonical id finds the same row

Fixes the highest-priority gap codex flagged because Trusted Project
State is supposed to be the most dependable layer in the AtoCore
trust hierarchy.

P2.a — Interaction capture project canonicalization
----------------------------------------------------
src/atocore/interactions/service.py: record_interaction now
canonicalizes project before storing, so interaction.project is
always the canonical id regardless of what the client passed.

Downstream effects:
- reinforce_from_interaction queries memories by interaction.project
  -> previously missed memories stored under canonical id
  -> now consistent because interaction.project IS the canonical id
- the extractor stamps candidates with interaction.project
  -> previously created candidates in alias buckets
  -> now creates candidates in the canonical bucket
- list_interactions(project=alias) was already broken, now fixed by
  canonicalizing the filter input on the read side too

Memory service applied the same fix:
- src/atocore/memory/service.py: create_memory and get_memories
  both canonicalize project through resolve_project_name
- This keeps stored memory.project consistent with the
  reinforcement query path

P2.b — Interaction `since` filter format normalization
------------------------------------------------------
src/atocore/interactions/service.py: new _normalize_since helper.

The bug:
- created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by
  convention) so it sorts lexically and compares cleanly with the
  SQLite CURRENT_TIMESTAMP default
- The `since` parameter was documented as ISO 8601 but compared as
  a raw string against the storage format
- The lexically-greater 'T' separator means an ISO timestamp like
  '2026-04-07T12:00:00Z' is GREATER than the storage form
  '2026-04-07 12:00:00' for the same instant
- Result: a client passing ISO `since` got an empty result for any
  row from the same day, even though those rows existed and were
  technically "after" the cutoff in real-world time

The fix:
- _normalize_since accepts ISO 8601 with T, optional Z suffix,
  optional fractional seconds, optional +HH:MM offsets
- Uses datetime.fromisoformat for parsing (Python 3.11+)
- Converts to UTC and reformats as the storage format before the
  SQL comparison
- The bare storage format still works (backwards compat path is a
  regex match that returns the input unchanged)
- Unparseable input is returned as-is so the comparison degrades
  gracefully (rows just don't match) instead of raising and
  breaking the listing endpoint

builder.py refactor
-------------------
The previous P1 fix had inline canonicalization. Now it uses the
shared helper for consistency:
- import changed from get_registered_project to resolve_project_name
- the inline lookup is replaced with a single helper call
- the comment block now points at representation-authority.md for
  the canonicalization contract

New shared test fixture: tests/conftest.py::project_registry
------------------------------------------------------------
- Standardizes the registry-setup pattern that was duplicated
  across test_context_builder.py, test_project_state.py,
  test_interactions.py, and test_reinforcement.py
- Returns a callable that takes (project_id, [aliases]) tuples
  and writes them into a temp registry file with the env var
  pointed at it and config.settings reloaded
- Used by all 12 new regression tests in this commit

Tests (12 new, all green on first run)
--------------------------------------
test_project_state.py:
- test_set_state_canonicalizes_alias: write via alias, read via
  every alias and the canonical id, verify same row id
- test_get_state_canonicalizes_alias_after_canonical_write
- test_invalidate_state_canonicalizes_alias
- test_unregistered_project_state_still_works (backwards compat)

test_interactions.py:
- test_record_interaction_canonicalizes_project
- test_list_interactions_canonicalizes_project_filter
- test_list_interactions_since_accepts_iso_with_t_separator
- test_list_interactions_since_accepts_z_suffix
- test_list_interactions_since_accepts_offset
- test_list_interactions_since_storage_format_still_works

test_reinforcement.py:
- test_reinforcement_works_when_capture_uses_alias (end-to-end:
  capture under alias, seed memory under canonical, verify
  reinforcement matches)
- test_get_memories_filter_by_alias

Full suite: 174 passing (was 162), 1 warning. The +12 is the
new regression tests, no existing tests regressed.

What's still NOT canonicalized (and why)
----------------------------------------
- _rank_chunks's secondary substring boost in builder.py — the
  retriever already does the right thing via its own
  _project_match_boost which calls get_registered_project. The
  redundant secondary boost still uses the raw hint but it's a
  multiplicative factor on top of correct retrieval, not a
  filter, so it can't drop relevant chunks. Tracked as a future
  cleanup but not a P1.
- update_memory's project field (you can't change a memory's
  project after creation in the API anyway).
- The retriever's project_hint parameter on direct /query calls
  — same reasoning as the builder boost, plus the retriever's
  own get_registered_project call already handles aliases there.
2026-04-07 08:29:33 -04:00
f2372eff9e fix(P1+P2): alias-aware project state lookup + slash command corpus fallback
Two regression fixes from codex's review of the slash command
refactor commit (78d4e97). Both findings are real and now have
covered tests.

P1 — server-side alias resolution for project_state lookup
----------------------------------------------------------
The bug:
- /context/build forwarded the caller's project hint verbatim to
  get_state(project_hint), which does an exact-name lookup against
  the projects table (case-insensitive but no alias resolution)
- the project registry's alias matching was only used by the
  client's auto-context path and the retriever's project-match
  boost, never by the server's project_state lookup
- consequence: /atocore-context "... p05" would silently miss
  trusted project state stored under the canonical id
  "p05-interferometer", weakening project-hinted retrieval to
  the point that an explicit alias hint was *worse* than no hint

The fix in src/atocore/context/builder.py:
- import get_registered_project from the projects registry
- before calling get_state(project_hint), resolve the hint
  through get_registered_project; if a registry record exists,
  use the canonical project_id for the state lookup
- if no registry record exists, fall back to the raw hint so a
  hand-curated project_state entry that predates the registry
  still works (backwards compat with pre-registry deployments)

The retriever already does its own alias expansion via
get_registered_project for the project-match boost, so the
retriever side was never broken — only the project_state lookup
in the builder. The fix is scoped to that one call site.

Tests added in tests/test_context_builder.py:
- test_alias_hint_resolves_through_registry: stands up a fresh
  registry, sets state under "p05-interferometer", then verifies
  build_context with project_hint="p05" finds the state, AND
  with project_hint="interferometer" (the second alias) finds it
  too, AND with the canonical id finds it. Covers all three
  resolution paths.
- test_unknown_hint_falls_back_to_raw_lookup: empty registry,
  set state under an unregistered project name, verify the
  build_context call with that name as the hint still finds the
  state. Locks in the backwards-compat behavior.

P2 — slash command no-hint fallback to corpus-wide context build
----------------------------------------------------------------
The bug:
- the slash command's no-hint path called auto-context, which
  returns {"status": "no_project_match"} when project detection
  fails and does NOT fall back to a plain context-build
- the slash command's own help text told the user "call without
  a hint to use the corpus-wide context build" — which was a lie
  because the wrapper no longer did that
- consequence: generic prompts like "what changed in AtoCore
  backup policy?" or any cross-project question got a useless
  no_project_match envelope instead of a context pack

The fix in .claude/commands/atocore-context.md:
- the no-hint path now does the 2-step fallback dance:
    1. try `auto-context "<prompt>"` for project detection
    2. if the response contains "no_project_match", fall back to
       `context-build "<prompt>"` (no project arg)
- both branches return a real context pack, fail-open envelope
  is preserved for genuine network errors
- the underlying client surface is unchanged (no new flags, no
  new subcommands) — the fallback is per-frontend logic in the
  slash command, leaving auto-context's existing semantics
  intact for OpenClaw and any other caller that depends on the
  no_project_match envelope as a "do nothing" signal

While I was here, also tightened the slash command's argument
parsing to delegate alias-knowledge to the registry instead of
embedding a hardcoded list:
- old version had a literal list of "atocore", "p04", "p05",
  "p06" and their aliases that needed manual maintenance every
  time a project was added
- new version takes the last token of $ARGUMENTS and asks the
  client's `detect-project` subcommand whether it's a known
  alias; if matched, it's the explicit hint, if not it's part
  of the prompt
- this delegates registry knowledge to the registry, where it
  belongs

Unrelated improvement noted but NOT fixed in this commit:
- _rank_chunks in builder.py also has a naive substring boost
  that uses the original hint without alias expansion. The
  retriever already does the right thing, so this secondary
  boost is redundant. Tracked as a future cleanup but not in
  scope for the P1/P2 fix; codex's findings are about
  project_state lookup, not about the secondary chunk boost.

Full suite: 162 passing (was 160), 1 warning. The +2 is the two
new P1 regression tests.
2026-04-07 07:47:03 -04:00
78d4e979e5 refactor slash command onto shared client + llm-client-integration doc
Codex's review caught that the Claude Code slash command shipped in
Session 2 was a parallel reimplementation of routing logic the
existing scripts/atocore_client.py already had. That client was
introduced via the codex/port-atocore-ops-client merge and is
already a comprehensive operator client (auto-context,
detect-project, refresh-project, project-state, audit-query, etc.).
The slash command should have been a thin wrapper from the start.

This commit fixes the shape without expanding scope.

.claude/commands/atocore-context.md
-----------------------------------
Rewritten as a thin Claude Code-specific frontend that shells out
to the shared client:

- explicit project hint -> calls `python scripts/atocore_client.py
  context-build "<prompt>" "<project>"`
- no explicit hint -> calls `python scripts/atocore_client.py
  auto-context "<prompt>"` which runs the client's detect-project
  routing first and falls through to context-build with the match

Inherits the client's stable behaviour for free:
- ATOCORE_BASE_URL env var (default http://dalidou:8100)
- fail-open on network errors via ATOCORE_FAIL_OPEN
- consistent JSON output shape
- the same project alias matching the OpenClaw helper uses

Removes the speculative `--capture` capture path that was in the
original draft. Capture/extract/queue/promote/reject are
intentionally NOT in the shared client yet (memory-review
workflow not exercised in real use), so the slash command can't
expose them either.

docs/architecture/llm-client-integration.md
-------------------------------------------
New planning doc that defines the layering rule for AtoCore's
relationship with LLM client contexts:

Three layers:
1. AtoCore HTTP API (universal, src/atocore/api/routes.py)
2. Shared operator client (scripts/atocore_client.py) — the
   canonical Python backbone for stable AtoCore operations
3. Per-agent thin frontends (Claude Code slash command,
   OpenClaw helper, future Codex skill, future MCP server)
   that shell out to the shared client

Three non-negotiable rules:
- every per-agent frontend is a thin wrapper (translate the
  agent's command format and render the JSON; nothing else)
- the shared client never duplicates the API (it composes
  endpoints; new logic goes in the API first)
- the shared client only exposes stable operations (subcommands
  land only after the API has been exercised in a real workflow)

Doc covers:
- the full table of subcommands currently in scope (project
  lifecycle, ingestion, project-state, retrieval, context build,
  audit-query, debug-context, health/stats)
- the three deferred families with rationale: memory review
  queue (workflow not exercised), backup admin (fail-open
  default would hide errors), engineering layer entities (V1
  not yet implemented)
- the integration recipe for new agent platforms
- explicit acknowledgement that the OpenClaw helper currently
  duplicates routing logic and that the refactor to the shared
  client is a queued cross-repo follow-up
- how the layering connects to phase 8 (OpenClaw) and phase 11
  (multi-model)
- versioning and stability rules for the shared client surface
- open follow-ups: OpenClaw refactor, memory-review subcommands
  when ready, optional backup admin subcommands, engineering
  entity subcommands during V1 implementation

master-plan-status.md updated
-----------------------------
- New "LLM Client Integration" subsection that points to the
  layering doc and explicitly notes the deferral of memory-review
  and engineering-entity subcommands
- Frames the layering as sitting between phase 8 and phase 11

Scope is intentionally narrow per codex's framing: promote the
existing client to canonical status, refactor the slash command
to use it, document the layering. No new client subcommands
added in this commit. The OpenClaw helper refactor is a
separate cross-repo follow-up. Memory-review and engineering-
entity work stay deferred.

Full suite: 160 passing, no behavior changes.
2026-04-07 07:22:54 -04:00
d6ce6128cf docs(arch): human-mirror-rules + engineering-v1-acceptance, sprint complete
Session 4 of the four-session plan. Final two engineering planning
docs, plus master-plan-status.md updated to reflect that the
engineering layer planning sprint is now complete.

docs/architecture/human-mirror-rules.md
---------------------------------------
The Layer 3 derived markdown view spec:

- The non-negotiable rule: the Mirror is read-only from the
  human's perspective; edits go to the canonical home and the
  Mirror picks them up on regeneration
- 3 V1 template families: Project Overview, Decision Log,
  Subsystem Detail
- Explicit V1 exclusions: per-component pages, per-decision
  pages, cross-project rollups, time-series pages, diff pages,
  conflict queue render, per-memory pages
- Mirror files live in /srv/storage/atocore/data/mirror/ NOT in
  the source vault (sources stay read-only per the operating
  model)
- 3 regeneration triggers: explicit POST, debounced async on
  entity write, daily scheduled refresh
- "Do not edit" header banner with checksum so unchanged inputs
  skip work
- Conflicts and project_state overrides surface inline so the
  trust hierarchy is visible in the human reading experience
- Templates checked in under templates/mirror/, edited via PR
- Deterministic output is a V1 requirement so future Mirror
  diffing works without rework
- Open questions for V1: debounce window, scheduler integration,
  template testing approach, directory listing endpoint, empty
  state rendering

docs/architecture/engineering-v1-acceptance.md
----------------------------------------------
The measurable done definition:

- Single-sentence definition: V1 is done when every v1-required
  query in engineering-query-catalog.md returns a correct result
  for one chosen test project, the Human Mirror renders a
  coherent overview, and a real KB-CAD or KB-FEM export round-
  trips through ingest -> review queue -> active entity without
  violating any conflict or trust invariant
- 23 acceptance criteria across 4 categories:
  * Functional (8): entity store, all 20 v1-required queries,
    tool ingest endpoints, candidate review queue, conflict
    detection, Human Mirror, memory-to-entity graduation,
    complete provenance chain
  * Quality (6): existing tests pass, V1 has its own coverage,
    conflict invariants enforced, trust hierarchy enforced,
    Mirror reproducible via golden file, killer correctness
    queries pass against representative data
  * Operational (5): safe migration, backup/restore drill,
    performance bounds, no new manual ops burden, Phase 9 not
    regressed
  * Documentation (4): per-entity-type spec docs, KB schema docs,
    V1 release notes, master-plan-status updated
- Explicit negative list of things V1 does NOT need to do:
  no LLM extractor, no auto-promotion, no write-back, no
  multi-user, no real-time UI, no cross-project rollups,
  no time-travel, no nightly conflict sweep, no incremental
  Chroma, no retention cleanup, no encryption, no off-Dalidou
  backup target
- Recommended implementation order: F-1 -> F-8 in sequence,
  with the graduation flow (F-7) saved for last as the most
  cross-cutting change
- Anticipated friction points called out in advance:
  graduation cross-cuts memory module, Mirror determinism trap,
  conflict detector subtle correctness, provenance backfill
  for graduated entities

master-plan-status.md updated
-----------------------------
- Engineering Layer Planning Sprint section now marked complete
  with all 8 architecture docs listed
- Note that the next concrete step is the V1 implementation
  sprint following engineering-v1-acceptance.md as its checklist

Pure doc work. No code, no schema, no behavior changes.

After this commit, the engineering planning sprint is fully done
(8/8 docs) and Phase 9 is fully complete (Commits A/B/C all
shipped, validated, and pushed). AtoCore is ready for either
the engineering V1 implementation sprint OR a pause for real-
world Phase 9 usage, depending on which the user prefers next.
2026-04-07 06:55:43 -04:00
368adf2ebc docs(arch): tool-handoff-boundaries + representation-authority
Session 3 of the four-session plan. Two more engineering planning
docs that lock in the most contentious architectural decisions
before V1 implementation begins.

docs/architecture/tool-handoff-boundaries.md
--------------------------------------------
Locks in the V1 read/write relationship with external tools:

- AtoCore is a one-way mirror in V1. External tools push,
  AtoCore reads, AtoCore never writes back.
- Per-tool stance table covering KB-CAD, KB-FEM, NX, PKM, Gitea
  repos, OpenClaw, AtoDrive, PLM/vendor systems
- Two new ingest endpoints proposed for V1:
  POST /ingest/kb-cad/export and POST /ingest/kb-fem/export
- Sketch JSON shapes for both exports (intentionally minimal,
  to be refined in dedicated schema docs during implementation)
- Drift handling: KB-CAD changes a value -> creates an entity
  candidate -> existing active becomes a conflict member ->
  human resolves via the conflict model
- Hard-line invariants V1 will not cross: no write to external
  tools, no live polling, no silent merging, no schema fan-out,
  no external-tool-specific logic in entity types
- Why not bidirectional: schema drift, conflict semantics, trust
  hierarchy, velocity, reversibility
- V2+ deferred items: selective write-back annotations, light
  polling, direct NX integration, cost/vendor/PLM connections
- Open questions for the implementation sprint: schema location,
  who runs the exporter, full-vs-incremental, exporter auth

docs/architecture/representation-authority.md
---------------------------------------------
The canonical-home matrix that says where each kind of fact
actually lives:

- Six representation layers identified: PKM, KB project,
  Gitea repos, AtoCore memories, AtoCore entities, AtoCore
  project_state
- The hard rule: every fact kind has exactly one canonical
  home; other layers may hold derived copies but never disagree
- Comprehensive matrix covering 22 fact kinds (CAD geometry,
  CAD-side structure, FEM mesh, FEM results, code, repo docs,
  PKM prose, identity, preference, episodic, decision,
  requirement, constraint, validation claim, material,
  parameter, project status, ADRs, runbooks, backup metadata,
  interactions)
- Cross-layer supremacy rule: project_state > tool-of-origin >
  entities > active memories > source chunks
- Three worked examples showing how the rules apply:
  * "what material does the lateral support pad use?" (KB-CAD
    canonical, project_state override possible)
  * "did we decide to merge the bind mounts?" (Gitea + memory
    both canonical for different aspects)
  * "what's p05's current next focus?" (project_state always
    wins for current state queries)
- Concrete consequences for V1 implementation: Material and
  Parameter are mostly KB-CAD shadows; Decisions / Requirements /
  Constraints / ValidationClaims are AtoCore-canonical; PKM is
  never authoritative; project_state is the override layer;
  the conflict model is the enforcement mechanism
- Out of scope for V1: facts about other people, vendor/cost
  facts, time-bounded facts, cross-project shared facts
- Open questions for V1: how the reviewer sees canonical home
  in the UI, whether entities need an explicit canonical_home
  field, how project_state overrides surface in query results

This is pure doc work. No code, no schema, no behavior changes.
After this commit the engineering planning sprint is 6 of 8 docs
done — only human-mirror-rules and engineering-v1-acceptance
remain.
2026-04-07 06:50:56 -04:00
a637017900 slash command for daily AtoCore use + backup-restore procedure
Session 2 of the four-session plan. Lands two operational pieces:
the Claude Code slash command that makes AtoCore reachable from
inside any Claude Code session, and the full backup/restore
procedure doc that turns the backup endpoint code into a real
operational drill.

Slash command (.claude/commands/atocore-context.md)
---------------------------------------------------
- Project-level slash command following the standard frontmatter
  format (description + argument-hint)
- Parses the user prompt and an optional trailing project id, with
  case-insensitive matching against the registered project ids
  (atocore, p04-gigabit, p05-interferometer, p06-polisher and
  their aliases)
- Calls POST /context/build on the live AtoCore service, defaulting
  to http://dalidou:8100 (overridable via ATOCORE_API_BASE env var)
- Renders the formatted context pack inline so the user can see
  exactly what AtoCore would feed an LLM, plus a stats banner and a
  per-chunk source list
- Includes graceful failure handling for network errors, 4xx, 5xx,
  and the empty-result case
- Defines a future capture path that POSTs to /interactions for the
  Phase 9 reflection loop. The current command leaves capture as
  manual / opt-in pending a clean post-turn hook design

.gitignore changes
------------------
- Replaced wholesale .claude/ ignore with .claude/* + exceptions
  for .claude/commands/ so project slash commands can be tracked
- Other .claude/* paths (worktrees, settings, local state) remain
  ignored

Backup-restore procedure (docs/backup-restore-procedure.md)
-----------------------------------------------------------
- Defines what gets backed up (SQLite + registry always, Chroma
  optional under ingestion lock) and what doesn't (sources, code,
  logs, cache, tmp)
- Documents the snapshot directory layout and the timestamp format
- Three trigger paths in priority order:
  - via POST /admin/backup with {include_chroma: true|false}
  - via the standalone src/atocore/ops/backup.py module
  - via cold filesystem copy with brief downtime as last resort
- Listing and validation procedure with the /admin/backup and
  /admin/backup/{stamp}/validate endpoints
- Full step-by-step restore procedure with mandatory pre-flight
  safety snapshot, ownership/permission requirements, and the
  post-restore verification checks
- Rollback path using the pre-restore safety copy
- Retention policy (last 7 daily / 4 weekly / 6 monthly) and
  explicit acknowledgment that the cleanup job is not yet
  implemented
- Drill schedule: quarterly full restore drill, post-migration
  drill, post-incident validation
- Common failure mode table with diagnoses
- Quickstart cheat sheet at the end for daily reference
- Open follow-ups: cleanup script, off-Dalidou target,
  encryption, automatic post-backup validation, incremental
  Chroma snapshots

The procedure has not yet been exercised against the live Dalidou
instance — that is the next step the user runs themselves once
the slash command is in place.
2026-04-07 06:46:50 -04:00
d0ff8b5738 Merge origin/main into codex/dalidou-storage-foundation
Integrate codex/port-atocore-ops-client (operator client + operations
playbook) so the dalidou-storage-foundation branch can fast-forward
into main.

# Conflicts:
#	README.md
2026-04-07 06:20:19 -04:00
b9da5b6d84 phase9 first-real-use validation + small hygiene wins
Session 1 of the four-session plan. Empirically exercises the Phase 9
loop (capture -> reinforce -> extract) for the first time and lands
three small hygiene fixes.

Validation script + report
--------------------------
scripts/phase9_first_real_use.py — reproducible script that:
  - sets up an isolated SQLite + Chroma store under
    data/validation/phase9-first-use (gitignored)
  - seeds 3 active memories
  - runs 8 sample interactions through capture + reinforce + extract
  - prints what each step produced and reinforcement state at the end
  - supports --json output for downstream tooling

docs/phase9-first-real-use.md — narrative report of the run with:
  - extraction results table (8/8 expectations met exactly)
  - the empirical finding that REINFORCEMENT MATCHED ZERO seeds
    despite sample 5 clearly echoing the rebase preference memory
  - root cause analysis: the substring matcher is too brittle for
    natural paraphrases (e.g. "prefers" vs "I prefer", "history"
    vs "the history")
  - recommended fix: replace substring matcher with a token-overlap
    matcher (>=70% of memory tokens present in response, with
    light stemming and a small stop list)
  - explicit note that the fix is queued as a follow-up commit, not
    bundled into the report — keeps the audit trail clean

Key extraction results from the run:
  - all 7 heading/sentence rules fired correctly
  - 0 false positives on the prose-only sample (the most important
    sanity check)
  - long content preserved without truncation
  - dedup correctly kept three distinct cues from one interaction
  - project scoping flowed cleanly through the pipeline

Hygiene 1: FastAPI lifespan migration (src/atocore/main.py)
- Replaced @app.on_event("startup") with the modern @asynccontextmanager
  lifespan handler
- Same setup work (setup_logging, ensure_runtime_dirs, init_db,
  init_project_state_schema, startup_ready log)
- Removes the two on_event deprecation warnings from every test run
- Test suite now shows 1 warning instead of 3

Hygiene 2: EXTRACTOR_VERSION constant (src/atocore/memory/extractor.py)
- Added EXTRACTOR_VERSION = "0.1.0" with a versioned change log comment
- MemoryCandidate dataclass carries extractor_version on every candidate
- POST /interactions/{id}/extract response now includes extractor_version
  on both the top level (current run) and on each candidate
- Implements the versioning requirement called out in
  docs/architecture/promotion-rules.md so old candidates can be
  identified and re-evaluated when the rule set evolves

Hygiene 3: ~/.git-credentials cleanup (out-of-tree, not committed)
- Removed the dead OAUTH_USER:<jwt> line for dalidou:3000 that was
  being silently rewritten by the system credential manager on every
  push attempt
- Configured credential.http://dalidou:3000.helper with the empty-string
  sentinel pattern so the URL-specific helper chain is exactly
  ["", store] instead of inheriting the system-level "manager" helper
  that ships with Git for Windows
- Same fix for the 100.80.199.40 (Tailscale) entry
- Verified end to end: a fresh push using only the cleaned credentials
  file (no embedded URL) authenticates as Antoine and lands cleanly

Full suite: 160 passing (no change from previous), 1 warning
(was 3) thanks to the lifespan migration.
2026-04-07 06:16:35 -04:00
bd291ff874 test: empty commit to verify clean credential helper round-trip 2026-04-07 06:13:45 -04:00
480f13a6df docs(arch): memory-vs-entities, promotion-rules, conflict-model
Three planning docs that answer the architectural questions the
engineering query catalog raised. Together with the catalog they
form roughly half of the pre-implementation planning sprint.

docs/architecture/memory-vs-entities.md
---------------------------------------
Resolves the central question blocking every other engineering
layer doc: is a Decision a memory or an entity?

Key decisions:
- memories stay the canonical home for identity, preference, and
  episodic facts
- entities become the canonical home for project, knowledge, and
  adaptation facts once the engineering layer V1 ships
- no concept lives in both layers at full fidelity; one canonical
  home per concept
- a "graduation" flow lets active memories upgrade into entities
  (memory stays as a frozen historical pointer, never deleted)
- one shared candidate review queue across both layers
- context builder budget gains a 15% slot for engineering entities,
  slotted between identity/preference memories and retrieved chunks
- the Phase 9 memory extractor's structural cues (decision heading,
  constraint heading, requirement heading) are explicitly an
  intentional temporary overlap, cleanly migrated via graduation
  when the entity extractor ships

docs/architecture/promotion-rules.md
------------------------------------
Defines the full Layer 0 → Layer 2 pipeline:

- four layers: L0 raw source, L1 memory candidate/active, L2 entity
  candidate/active, L3 trusted project state
- three extraction triggers: on interaction capture (existing),
  on ingestion wave (new, batched per wave), on explicit request
- per-rule prior confidence tuned at write time by structural
  signal (echoes the retriever's high/low signal hints) and
  freshness bonus
- batch cap of 50 candidates per pass to protect the reviewer
- full provenance requirements: every candidate carries rule id,
  source_chunk_id, source_interaction_id, and extractor_version
- reversibility matrix for every promotion step
- explicit no-auto-promotion-in-V1 stance with the schema designed
  so auto-promotion policies can be added later without migration
- the hard invariant: nothing ever moves into L3 automatically
- ingestion-wave extraction produces a report artifact under
  data/extraction-reports/<wave-id>/

docs/architecture/conflict-model.md
-----------------------------------
Defines how AtoCore handles contradictory facts without violating
the "bad memory is worse than no memory" rule.

- conflict = two or more active rows claiming the same slot with
  incompatible values
- per-type "slot key" tuples for both memory and entity types
- cross-layer conflict detection respects the trust hierarchy:
  trusted project state > active entities > active memories
- new conflicts and conflict_members tables (schema proposal)
- detection at two latencies: synchronous at write time,
  asynchronous nightly sweep
- "flag, never block" rule: writes always succeed, conflicts are
  surfaced via /conflicts, /health open_conflicts_count, per-row
  response bodies, and the Human Mirror's disputed marker
- resolution is always human: promote-winner + supersede-others,
  or dismiss-as-not-a-real-conflict, both with audit trail
- explicitly out of scope for V1: cross-project conflicts,
  temporal-overlap conflicts, tolerance-aware numeric comparisons

Also updates:
- master-plan-status.md: Phase 9 moved from "started" to "baseline
  complete" now that Commits A, B, C are all landed
- master-plan-status.md: adds a "Engineering Layer Planning Sprint"
  section listing the doc wave so far and the remaining docs
  (tool-handoff-boundaries, human-mirror-rules,
  representation-authority, engineering-v1-acceptance)
- current-state.md: Phase 9 moved from "not started" to "baseline
  complete" with the A/B/C annotation

This is pure doc work. No code changes, no schema changes, no
behavior changes. Per the working rule in master-plan-status.md:
the architecture docs shape decisions, they do not force premature
schema work.
2026-04-06 21:30:35 -04:00
53147d326c feat(phase9-C): rule-based candidate extractor and review queue
Phase 9 Commit C. Closes the capture loop: Commit A records what
AtoCore fed the LLM and what came back, Commit B bumps confidence on
active memories the response actually references, and this commit
turns structured cues in the response into candidate memories for a
human review queue.

Nothing extracted here is ever automatically promoted into trusted
state. Every candidate sits at status="candidate" until a human (or
later, a confident automatic policy) calls /memory/{id}/promote or
/memory/{id}/reject. This keeps the "bad memory is worse than no
memory" invariant from the operating model intact.

New module: src/atocore/memory/extractor.py
- MemoryCandidate dataclass (type, content, rule, source_span,
  project, confidence, source_interaction_id)
- extract_candidates_from_interaction(interaction): runs a fixed set
  of regex rules over the response + response_summary and returns
  a list of candidates

V0 rule set (deliberately narrow to keep false positives low):
- decision_heading     ## Decision: / ## Decision - / ## Decision —
                       -> adaptation candidate
- constraint_heading   ## Constraint: ...      -> project candidate
- requirement_heading  ## Requirement: ...     -> project candidate
- fact_heading         ## Fact: ...            -> knowledge candidate
- preference_sentence  "I prefer X" / "the user prefers X"
                       -> preference candidate
- decided_to_sentence  "decided to X"          -> adaptation candidate
- requirement_sentence "the requirement is X"  -> project candidate

Extractor post-processing:
- clean_value: collapse whitespace, strip trailing punctuation
- min content length 8 chars, max 280 (keeps candidates reviewable)
- dedupe by (memory_type, normalized value, rule)
- drop candidates whose content already matches an active memory of
  the same type+project so the queue doesn't ask humans to re-curate
  things they already promoted

Memory service (extends Commit B candidate-status foundation):
- promote_memory(id): candidate -> active (404 if not a candidate)
- reject_candidate_memory(id): candidate -> invalid
- both are no-ops if the target isn't currently a candidate so the
  API can surface 404 without the caller needing to pre-check

API endpoints (new):
- POST /interactions/{id}/extract             run extractor, preview-only
  body: {"persist": false}                    (default) returns candidates
        {"persist": true}                     creates candidate memories
- POST /memory/{id}/promote                   candidate -> active
- POST /memory/{id}/reject                    candidate -> invalid
- GET  /memory?status=candidate               list review queue explicitly
      (existing endpoint now accepts status= override)
- GET  /memory now also returns reference_count and last_referenced_at
  per memory so the Commit B reinforcement signal is visible to clients

Trust model unchanged:
- candidates NEVER appear in context packs (get_memories_for_context
  still filters to active via the active_only default)
- candidates NEVER get reinforced by the Commit B loop (reinforcement
  refuses non-active memories)
- trusted project state is untouched end-to-end

Tests (25 new, all green):
- heading pattern: decision, constraint, requirement, fact
- separator variants :, -, em-dash
- sentence patterns: preference, decided_to, requirement
- rejects too-short matches
- dedupes identical matches
- strips trailing punctuation
- carries project and source_interaction_id onto candidates
- drops candidates that duplicate an existing active memory
- returns empty for prose without structural cues
- candidate and active coexist in the memory table
- promote_memory moves candidate -> active
- promote on non-candidate returns False
- reject_candidate_memory moves candidate -> invalid
- reject on non-candidate returns False
- get_memories(status="candidate") returns just the queue
- POST /interactions/{id}/extract preview-only path
- POST /interactions/{id}/extract persist=true path
- POST /interactions/{id}/extract 404 for missing interaction
- POST /memory/{id}/promote success + 404 on non-candidate
- POST /memory/{id}/reject 404 on missing
- GET /memory?status=candidate surfaces the queue
- GET /memory?status=<invalid> returns 400

Full suite: 160 passing (was 135).

What Phase 9 looks like end to end after this commit
----------------------------------------------------
prompt
  -> context pack assembled
    -> LLM response
      -> POST /interactions (capture)
         -> automatic Commit B reinforcement (active memories only)
         -> [optional] POST /interactions/{id}/extract
            -> Commit C extractor proposes candidates
               -> human reviews via GET /memory?status=candidate
                  -> POST /memory/{id}/promote  (candidate -> active)
                  OR POST /memory/{id}/reject   (candidate -> invalid)

Not in this commit (deferred on purpose):
- Decay of unused memories (we keep reference_count and
  last_referenced_at so a later decay job has the signal it needs)
- LLM-based extractor as an alternative to the regex rules
- Automatic promotion of high-confidence candidates
- Candidate-to-entity upgrade path (needs the engineering layer
  memory-vs-entities decision, planned in a coming architecture doc)
2026-04-06 21:24:17 -04:00
2704997256 feat(phase9-B): reinforce active memories from captured interactions
Phase 9 Commit B from the agreed plan. With Commit A capturing what
AtoCore fed to the LLM and what came back, this commit closes the
weakest part of the loop: when a memory is actually referenced in a
response, its confidence should drift up, and stale memories that
nobody ever mentions should stay where they are.

This is reinforcement only — nothing is promoted into trusted state
and no candidates are created. Extraction is Commit C.

Schema (additive migration):
- memories.last_referenced_at DATETIME      (null by default)
- memories.reference_count    INTEGER DEFAULT 0
- idx_memories_last_referenced on last_referenced_at
- memories.status now accepts the new "candidate" value so Commit C
  has the status slot to land on. Existing active/superseded/invalid
  rows are untouched.

New module: src/atocore/memory/reinforcement.py
- reinforce_from_interaction(interaction): scans the interaction's
  response + response_summary for echoes of active memories and
  bumps confidence / reference_count for each match
- matching is intentionally simple and explainable:
  * normalize both sides (lowercase, collapse whitespace)
  * require >= 12 chars of memory content to match
  * compare the leading 80-char window of each memory
- the candidate pool is project-scoped memories for the interaction's
  project + global identity + preference memories, deduplicated
- candidates and invalidated memories are NEVER reinforced; only
  active memories move

Memory service changes:
- MEMORY_STATUSES = ["candidate", "active", "superseded", "invalid"]
- create_memory(status="candidate"|"active"|...) with per-status
  duplicate scoping so a candidate and an active with identical text
  can legitimately coexist during review
- get_memories(status=...) explicit override of the legacy active_only
  flag; callers can now list the review queue cleanly
- update_memory accepts any valid status including "candidate"
- reinforce_memory(id, delta): low-level primitive that bumps
  confidence (capped at 1.0), increments reference_count, and sets
  last_referenced_at. Only active memories; returns (applied, old, new)
- promote_memory / reject_candidate_memory helpers prepping Commit C

Interactions service:
- record_interaction(reinforce=True) runs reinforce_from_interaction
  automatically when the interaction has response content. reinforcement
  errors are logged but never raised back to the caller so capture
  itself is never blocked by a flaky downstream.
- circular import between interactions service and memory.reinforcement
  avoided by lazy import inside the function

API:
- POST /interactions now accepts a reinforce bool field (default true)
- POST /interactions/{id}/reinforce runs reinforcement on an existing
  captured interaction — useful for backfilling or for retrying after
  a transient error in the automatic pass
- response lists which memory ids were reinforced with
  old / new confidence for audit

Tests (17 new, all green):
- reinforce_memory bumps, caps at 1.0, accumulates reference_count
- reinforce_memory rejects candidates and missing ids
- reinforce_memory rejects negative delta
- reinforce_from_interaction matches active memory
- reinforce_from_interaction ignores candidates and inactive
- reinforce_from_interaction requires minimum content length
- reinforce_from_interaction handles empty response cleanly
- reinforce_from_interaction normalizes casing and whitespace
- reinforce_from_interaction deduplicates across memory buckets
- record_interaction auto-reinforces by default
- record_interaction reinforce=False skips the pass
- record_interaction handles empty response
- POST /interactions/{id}/reinforce runs against stored interaction
- POST /interactions/{id}/reinforce returns 404 for missing id
- POST /interactions accepts reinforce=false

Full suite: 135 passing (was 118).

Trust model unchanged:
- reinforcement only moves confidence within the existing active set
- the candidate lifecycle is declared but only Commit C will actually
  create candidate memories
- trusted project state is never touched by reinforcement

Next: Commit C adds the rule-based extractor that produces candidate
memories from captured interactions plus the promote/reject review
queue endpoints.
2026-04-06 21:18:38 -04:00
ac14f8d6a4 Merge branch 'codex/port-atocore-ops-client' 2026-04-06 20:05:56 -04:00
ceb129c7d1 Add operator client and operations playbook 2026-04-06 19:59:09 -04:00
2e449a4c33 docs(arch): engineering query catalog as the V1 driving target
First doc in the engineering-layer planning sprint. The premise of
this document is the inverse of the existing ontology doc: instead of
listing objects and seeing what they could do, we list the questions
we need to answer and let those drive what objects and relationships
must exist.

The rule established here:

> If a typed object or relationship does not serve at least one query
> in this catalog, it is not in V1.

Contents:

- 20 v1-required queries grouped into 5 tiers:
  - structure (Q-001..Q-004)
  - intent (Q-005..Q-009)
  - validation (Q-010..Q-012)
  - change/time (Q-013..Q-014)
  - cross-cutting (Q-016..Q-020)
- 3 v1-stretch queries (Q-021..Q-023)
- 4 v2 deferred queries (Q-024..Q-027) so V1 does not paint us into
  a corner

Each entry has: id, question, invocation, expected result shape,
required objects, required relationships, provenance requirement,
and tier.

Three queries are flagged as the "killer correctness" queries:
- Q-006 orphan requirements (engineering equivalent of untested code)
- Q-009 decisions based on flagged assumptions (catches fragile design)
- Q-011 validation claims with no supporting result (catches
  unevidenced claims)

The catalog ends with the implied implementation order for V1, the
list of object families intentionally deferred (BOM, manufacturing,
software, electrical, test correlation), and the open questions this
catalog raises for the next planning docs:

- when do orphan/unsupported queries flag (insert time vs query time)?
- when an Assumption flips, are dependent Decisions auto-flagged?
- does AtoCore block conflicts or always save-and-flag?
- is EVIDENCED_BY mandatory at insert?
- when does the Human Mirror regenerate?

These are the questions the next planning docs (memory-vs-entities,
conflict-model, promotion-rules) should answer before any engineering
layer code is written.

This is doc work only. No code, no schema, no behavior change.
Per the working rule in master-plan-status.md: the architecture docs
shape decisions, they do not force premature schema work.
2026-04-06 19:33:44 -04:00
ea3fed3d44 feat(phase9-A): interaction capture loop foundation
Phase 9 Commit A from the agreed plan: turn AtoCore from a stateless
context enhancer into a system that records what it actually fed to an
LLM and what came back. This is the audit trail Reflection (Commit B)
and Extraction (Commit C) will be layered on top of.

The interactions table existed in the schema since the original PoC
but nothing wrote to it. This change makes it real:

Schema migration (additive only):
- response                full LLM response (caller decides how much)
- memories_used           JSON list of memory ids in the context pack
- chunks_used             JSON list of chunk ids in the context pack
- client                  identifier of the calling system
                          (openclaw, claude-code, manual, ...)
- session_id              groups multi-turn conversations
- project                 project name (mirrors the memory module pattern,
                          no FK so capture stays cheap)
- indexes on session_id, project, created_at

The created_at column is now written explicitly with a SQLite-compatible
'YYYY-MM-DD HH:MM:SS' format so the same string lives in the DB and the
returned dataclass. Without this the `since` filter on list_interactions
would silently fail because CURRENT_TIMESTAMP and isoformat use different
shapes that do not compare cleanly as strings.

New module src/atocore/interactions/:
- Interaction dataclass
- record_interaction()    persists one round-trip (prompt required;
                          everything else optional). Refuses empty prompts.
- list_interactions()     filters by project / session_id / client / since,
                          newest-first, hard-capped at 500
- get_interaction()       fetch by id, full response + context pack

API endpoints:
- POST   /interactions                capture one interaction
- GET    /interactions                list with summaries (no full response)
- GET    /interactions/{id}           full record incl. response + pack

Trust model:
- Capture is read-only with respect to memories, project state, and
  source chunks. Nothing here promotes anything into trusted state.
- The audit trail becomes the dataset Commit B (reinforcement) and
  Commit C (extraction + review queue) will operate on.

Tests (13 new, all green):
- service: persist + roundtrip every field
- service: minimum-fields path (prompt only)
- service: empty / whitespace prompt rejected
- service: get by id returns None for missing
- service: filter by project, session, client
- service: ordering newest-first with limit
- service: since filter inclusive on cutoff (the bug the timestamp
  fix above caught)
- service: limit=0 returns empty
- API: POST records and round-trips through GET /interactions/{id}
- API: empty prompt returns 400
- API: missing id returns 404
- API: list filter returns summaries (not full response bodies)

Full suite: 118 passing (was 105).

master-plan-status.md updated to move Phase 9 from "not started" to
"started" with the explicit note that Commit A is in and Commits B/C
remain.
2026-04-06 19:31:43 -04:00
c9b9eede25 feat: tunable ranking, refresh status, chroma backup + admin endpoints
Three small improvements that move the operational baseline forward
without changing the existing trust model.

1. Tunable retrieval ranking weights
   - rank_project_match_boost, rank_query_token_step,
     rank_query_token_cap, rank_path_high_signal_boost,
     rank_path_low_signal_penalty are now Settings fields
   - all overridable via ATOCORE_* env vars
   - retriever no longer hard-codes 2.0 / 1.18 / 0.72 / 0.08 / 1.32
   - lets ranking be tuned per environment as Wave 1 is exercised
     without code changes

2. /projects/{name}/refresh status
   - refresh_registered_project now returns an overall status field
     ("ingested", "partial", "nothing_to_ingest") plus roots_ingested
     and roots_skipped counters
   - ProjectRefreshResponse advertises the new fields so callers can
     rely on them
   - covers the case where every configured root is missing on disk

3. Chroma cold snapshot + admin backup endpoints
   - create_runtime_backup now accepts include_chroma and writes a
     cold directory copy of the chroma persistence path
   - new list_runtime_backups() and validate_backup() helpers
   - new endpoints:
     - POST /admin/backup            create snapshot (optional chroma)
     - GET  /admin/backup            list snapshots
     - GET  /admin/backup/{stamp}/validate  structural validation
   - chroma snapshots are taken under exclusive_ingestion() so a refresh
     or ingest cannot race with the cold copy
   - backup metadata records what was actually included and how big

Tests:
- 8 new tests covering tunable weights, refresh status branches
  (ingested / partial / nothing_to_ingest), chroma snapshot, list,
  validate, and the API endpoints (including the lock-acquisition path)
- existing fake refresh stubs in test_api_storage.py updated for the
  expanded ProjectRefreshResponse model
- full suite: 105 passing (was 97)

next-steps doc updated to reflect that the chroma snapshot + restore
validation gap from current-state.md is now closed in code; only the
operational retention policy remains.
2026-04-06 18:42:19 -04:00
14ab7c8e9f fix: pass project_hint into retrieve and add path-signal ranking
Two changes that belong together:

1. builder.build_context() now passes project_hint into retrieve(),
   so the project-aware boost actually fires for the retrieval pipeline
   driven by /context/build. Before this, only direct /query callers
   benefited from the registered-project boost.

2. retriever now applies two more ranking signals on every chunk:
   - _query_match_boost: boosts chunks whose source/title/heading
     echo high-signal query tokens (stop list filters out generic
     words like "the", "project", "system")
   - _path_signal_boost: down-weights archival noise (_archive,
     _history, pre-cleanup, reviews) by 0.72 and up-weights current
     high-signal docs (status, decision, requirements, charter,
     system-map, error-budget, ...) by 1.18

Tests:
- test_context_builder_passes_project_hint_to_retrieval verifies
  the wiring fix
- test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths
  verifies the new ranking helpers prefer current docs over archive

This addresses the cross-project competition and archive bleed
called out in current-state.md after the Wave 1 ingestion.
2026-04-06 18:37:07 -04:00
bdb42dba05 Expand active project wave and serialize refreshes 2026-04-06 14:58:14 -04:00
46a5d5887a Update plan status for organic routing 2026-04-06 14:06:54 -04:00
9943338846 Document organic OpenClaw routing layer 2026-04-06 14:04:49 -04:00
26bfa94c65 Add project-aware boost to raw query 2026-04-06 13:32:33 -04:00
4aa2b696a9 Document next-phase execution plan 2026-04-06 13:10:11 -04:00
af01dd3e70 Add engineering architecture docs 2026-04-06 12:45:28 -04:00
8f74cab0e6 Sync live project registry descriptions 2026-04-06 12:36:15 -04:00
06aa931273 Add project registry update flow 2026-04-06 12:31:24 -04:00
c9757e313a Harden runtime and add backup foundation 2026-04-06 10:15:00 -04:00