Close the backup side of the loop: we had create/list/validate but
no restore, and no documented drill. A backup you've never restored
is not a backup. This lands the missing restore surface and the
procedure to exercise it before enabling any write-path automation
(auto-capture, automated ingestion, reinforcement sweeps).
Code — src/atocore/ops/backup.py:
- restore_runtime_backup(stamp, *, include_chroma, pre_restore_snapshot,
confirm_service_stopped) performs:
1. validate_backup() gate — refuse on any error
2. pre-restore safety snapshot of current state (reversibility anchor)
3. PRAGMA wal_checkpoint(TRUNCATE) on target db (flush + release
OS handles; Windows needs this after conn.backup() reads)
4. unlink stale -wal/-shm sidecars (tolerant to Windows lock races)
5. shutil.copy2 snapshot db over target
6. restore registry if snapshot captured one
7. restore Chroma tree if snapshot captured one and include_chroma
resolves to true (defaults to whether backup has Chroma)
8. PRAGMA integrity_check on restored db, report result
- Refuses without confirm_service_stopped=True to prevent hot-restore
into a running service (would corrupt SQLite state)
- Rewrote main() as argparse with 4 subcommands: create, list,
validate, restore. `python -m atocore.ops.backup restore STAMP
--confirm-service-stopped` is the drill CLI entry point, run via
`docker compose run --rm --entrypoint python atocore` so it reuses
the live service's volume mounts
Tests — tests/test_backup.py (6 new):
- test_restore_refuses_without_confirm_service_stopped
- test_restore_raises_on_invalid_backup
- test_restore_round_trip_reverses_post_backup_mutations
(canonical drill flow: seed -> backup -> mutate -> restore ->
mutation gone + baseline survived + pre-restore snapshot has
the mutation captured as rollback anchor)
- test_restore_round_trip_with_chroma
- test_restore_skips_pre_snapshot_when_requested
- test_restore_cleans_stale_wal_sidecars (asserts stale byte
markers do not survive, not file existence, since PRAGMA
integrity_check may legitimately recreate -wal)
Docs — docs/backup-restore-drill.md (new):
- What gets backed up (hot sqlite, cold chroma, registry JSON,
metadata.json) and what doesn't (.env, source content)
- What restore does, step by step, and why confirm_service_stopped
is a hard gate
- 8-step drill procedure: capture -> baseline -> mutate -> stop ->
restore -> start -> verify marker gone -> optional cleanup
- Correct endpoint bodies verified against routes.py:
POST /admin/backup with JSON body {"include_chroma": true}
POST /memory with memory_type/content/project/confidence
GET /memory?project=drill to list drill markers
POST /query with {"prompt": ..., "top_k": ...} (not "query")
- Failure modes: integrity_check fail, container won't start,
marker still present after restore, with remediation for each
- When to run: before new write-path automation, after backup.py
or schema changes, after infra bumps, monthly as standing check
225/225 tests passing (219 existing + 6 new restore).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>