Commit Graph

3 Commits

Author SHA1 Message Date
2d911909f8 feat: auto-capture Claude Code sessions via Stop hook
Add deploy/hooks/capture_stop.py — a Claude Code Stop hook that reads
the transcript JSONL, extracts the last user prompt, and POSTs to the
AtoCore /interactions endpoint in conservative mode (reinforce=false).

Conservative mode means: capture only, no automatic reinforcement or
extraction into the review queue. Kill switch: ATOCORE_CAPTURE_DISABLED=1.

Also: note build_sha cosmetic issue after restore in runbook, update
project status docs to reflect drill pass and auto-capture wiring.

17 new tests (243 total, all passing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 09:00:42 -04:00
1a8fdf4225 fix: chroma restore bind-mount bug + consolidate docs
Two fixes from the 2026-04-09 first real restore drill on Dalidou,
plus the long-overdue doc consolidation I should have done when I
added the drill runbook instead of creating a duplicate.

## Chroma restore bind-mount bug (drill finding)

src/atocore/ops/backup.py: restore_runtime_backup() used to call
shutil.rmtree(dst_chroma) before copying the snapshot back. In the
Dockerized Dalidou deployment the chroma dir is a bind-mounted
volume — you can't unlink a mount point, rmtree raises
  OSError [Errno 16] Device or resource busy
and the restore silently fails to touch Chroma. This bit the first
real drill; the operator worked around it with --no-chroma plus a
manual cp -a.

Fix: clear the destination's CONTENTS (iterdir + rmtree/unlink per
child) and use copytree(dirs_exist_ok=True) so the mount point
itself is never touched. Equivalent semantics, bind-mount-safe.

Regression test:
tests/test_backup.py::test_restore_chroma_does_not_unlink_destination_directory
captures Path.stat().st_ino of the dest dir before and after
restore and asserts they match. That's the same invariant a
bind-mounted chroma dir enforces — if the inode changed, the
mount would have failed. 11/11 backup tests now pass.

## Doc consolidation

docs/backup-restore-drill.md existed as a duplicate of the
authoritative docs/backup-restore-procedure.md. When I added the
drill runbook in commit 3362080 I wrote it from scratch instead of
updating the existing procedure — bad doc hygiene on a project
that's literally about being a context engine.

- Deleted docs/backup-restore-drill.md
- Folded its contents into docs/backup-restore-procedure.md:
  - Replaced the manual sudo cp restore sequence with the new
    `python -m atocore.ops.backup restore <STAMP>
    --confirm-service-stopped` CLI
  - Added the one-shot docker compose run pattern for running
    restore inside a container that reuses the live volume mounts
  - Documented the --no-pre-snapshot / --no-chroma / --chroma flags
  - New "Chroma restore and bind-mounted volumes" subsection
    explaining the bug and the regression test that protects the fix
  - New "Restore drill" subsection with three levels (unit tests,
    module round-trip, live Dalidou drill) and the cadence list
  - Failure-mode table gained four entries: restored_integrity_ok,
    Device-or-resource-busy, drill marker still present,
    chroma_snapshot_missing
  - "Open follow-ups" struck the restore_runtime_backup item (done)
    and added a "Done (historical)" note referencing 2026-04-09
  - Quickstart cheat sheet now has a full drill one-liner using
    memory_type=episodic (the 2026-04-09 drill found the runbook's
    memory_type=note was invalid — the valid set is identity,
    preference, project, episodic, knowledge, adaptation)

## Status doc sync

Long overdue — I've been landing code without updating the
project's narrative state docs.

docs/current-state.md:
- "Reliability Baseline" now reflects: restore_runtime_backup is
  real with CLI, pre-restore safety snapshot, WAL cleanup,
  integrity check; live drill on 2026-04-09 surfaced and fixed
  Chroma bind-mount bug; deploy provenance via /health build_sha;
  deploy.sh self-update re-exec guard
- "Immediate Next Focus" reshuffled: drill re-run (priority 1) and
  auto-capture (priority 2) are now ahead of retrieval quality work,
  reflecting the updated unblock sequence

docs/next-steps.md:
- New item 1: re-run the drill with chroma working end-to-end
- New item 2: auto-capture conservative mode (Stop hook)
- Old item 7 rewritten as item 9 listing what's DONE
  (create/list/validate/restore, admin/backup endpoint with
  include_chroma, /health provenance, self-update guard,
  procedure doc with failure modes) and what's still pending
  (retention cleanup, off-Dalidou target, auto-validation)

## Test count

226 passing (was 225 + 1 new inode-stability regression test).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 09:13:21 -04:00
a637017900 slash command for daily AtoCore use + backup-restore procedure
Session 2 of the four-session plan. Lands two operational pieces:
the Claude Code slash command that makes AtoCore reachable from
inside any Claude Code session, and the full backup/restore
procedure doc that turns the backup endpoint code into a real
operational drill.

Slash command (.claude/commands/atocore-context.md)
---------------------------------------------------
- Project-level slash command following the standard frontmatter
  format (description + argument-hint)
- Parses the user prompt and an optional trailing project id, with
  case-insensitive matching against the registered project ids
  (atocore, p04-gigabit, p05-interferometer, p06-polisher and
  their aliases)
- Calls POST /context/build on the live AtoCore service, defaulting
  to http://dalidou:8100 (overridable via ATOCORE_API_BASE env var)
- Renders the formatted context pack inline so the user can see
  exactly what AtoCore would feed an LLM, plus a stats banner and a
  per-chunk source list
- Includes graceful failure handling for network errors, 4xx, 5xx,
  and the empty-result case
- Defines a future capture path that POSTs to /interactions for the
  Phase 9 reflection loop. The current command leaves capture as
  manual / opt-in pending a clean post-turn hook design

.gitignore changes
------------------
- Replaced wholesale .claude/ ignore with .claude/* + exceptions
  for .claude/commands/ so project slash commands can be tracked
- Other .claude/* paths (worktrees, settings, local state) remain
  ignored

Backup-restore procedure (docs/backup-restore-procedure.md)
-----------------------------------------------------------
- Defines what gets backed up (SQLite + registry always, Chroma
  optional under ingestion lock) and what doesn't (sources, code,
  logs, cache, tmp)
- Documents the snapshot directory layout and the timestamp format
- Three trigger paths in priority order:
  - via POST /admin/backup with {include_chroma: true|false}
  - via the standalone src/atocore/ops/backup.py module
  - via cold filesystem copy with brief downtime as last resort
- Listing and validation procedure with the /admin/backup and
  /admin/backup/{stamp}/validate endpoints
- Full step-by-step restore procedure with mandatory pre-flight
  safety snapshot, ownership/permission requirements, and the
  post-restore verification checks
- Rollback path using the pre-restore safety copy
- Retention policy (last 7 daily / 4 weekly / 6 monthly) and
  explicit acknowledgment that the cleanup job is not yet
  implemented
- Drill schedule: quarterly full restore drill, post-migration
  drill, post-incident validation
- Common failure mode table with diagnoses
- Quickstart cheat sheet at the end for daily reference
- Open follow-ups: cleanup script, off-Dalidou target,
  encryption, automatic post-backup validation, incremental
  Chroma snapshots

The procedure has not yet been exercised against the live Dalidou
instance — that is the next step the user runs themselves once
the slash command is in place.
2026-04-07 06:46:50 -04:00