# AtoCore Next Steps

## Current Position

AtoCore now has:

- canonical runtime and machine storage on Dalidou
- separated source and machine-data boundaries
- initial self-knowledge ingested into the live instance
- trusted project-state entries for AtoCore itself
- a first read-only OpenClaw integration path on the T420
- a first real active-project corpus batch for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

This working list should be read alongside:

- [master-plan-status.md](C:/Users/antoi/ATOCore/docs/master-plan-status.md)

## Immediate Next Steps

1. ~~Re-run the backup/restore drill~~ — DONE 2026-04-11, full pass

2. ~~Turn on auto-capture of Claude Code sessions~~ — DONE 2026-04-11:
   Stop hook via `deploy/hooks/capture_stop.py` → `POST /interactions` with
   `reinforce=false`; kill switch: `ATOCORE_CAPTURE_DISABLED=1` (the capture
   pattern is sketched after item 2a)

2a. Run a short real-use pilot with auto-capture on:
    - verify interactions are landing in Dalidou
    - check prompt/response quality and truncation
    - confirm fail-open: no user-visible impact when Dalidou is down
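
For orientation, a minimal sketch of the fail-open capture pattern item 2
describes. The real hook is `deploy/hooks/capture_stop.py`; the `ATOCORE_URL`
variable, the payload fields other than `reinforce`, the timeout, and the
stdin event shape here are assumptions, not the shipped contract.

```python
# Minimal fail-open capture sketch (assumptions flagged inline).
import json
import os
import sys
import urllib.request

ATOCORE_URL = os.environ.get("ATOCORE_URL", "http://dalidou:8000")  # assumed base URL


def capture(prompt: str, response: str) -> None:
    # Kill switch from item 2: ATOCORE_CAPTURE_DISABLED=1 skips capture entirely.
    if os.environ.get("ATOCORE_CAPTURE_DISABLED") == "1":
        return
    payload = json.dumps({
        "prompt": prompt,          # field names beyond reinforce are assumptions
        "response": response,
        "reinforce": False,        # conservative mode: record, don't reinforce
    }).encode()
    req = urllib.request.Request(
        f"{ATOCORE_URL}/interactions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # Short timeout so a down Dalidou can't stall the session.
        urllib.request.urlopen(req, timeout=2)
    except Exception:
        # Fail-open: capture failures must never be user-visible.
        pass


if __name__ == "__main__":
    event = json.load(sys.stdin)  # Stop-hook event shape is an assumption
    capture(event.get("prompt", ""), event.get("response", ""))
```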

3. Use the T420 `atocore-context` skill and the new organic routing layer in
   real OpenClaw workflows
   - confirm `auto-context` feels natural
   - confirm project inference is good enough in practice
   - confirm the fail-open behavior remains acceptable in practice

4. Review retrieval quality after the first real project ingestion batch
   (a spot-check sketch follows this list)
   - check whether the top hits are useful
   - check whether trusted project state remains dominant
   - reduce cross-project competition and prompt ambiguity where needed
   - use `debug-context` to inspect the exact last AtoCore supplement
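
A disposable spot-check makes this review repeatable. The sketch below posts a
few project-hinted prompts and prints the top hits so the dominance of trusted
project state is visible at a glance; the `/context/build` route, request
fields, and response shape are assumptions for illustration, not the
documented API.

```python
# Hypothetical retrieval spot-check against an assumed /context/build route.
import json
import urllib.request

ATOCORE_URL = "http://dalidou:8000"  # assumed base URL

PROMPTS = [
    ("p04-gigabit", "What is the current frame interface truth?"),
    ("p05-interferometer", "What is the selected vendor path?"),
    ("p06-polisher", "What is the current calibration procedure?"),
]

for project, prompt in PROMPTS:
    payload = json.dumps({"prompt": prompt, "project": project}).encode()
    req = urllib.request.Request(
        f"{ATOCORE_URL}/context/build",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        result = json.load(resp)
    print(f"== {project}: {prompt}")
    # Assumed response shape: a ranked list of hits with score/type/source.
    for hit in result.get("hits", [])[:5]:
        print(f"  {hit.get('score', 0.0):.3f}  {hit.get('memory_type')}  {hit.get('source')}")
```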

5. Treat the active-project full markdown/text wave as complete
   - `p04-gigabit`
   - `p05-interferometer`
   - `p06-polisher`

6. Define a cleaner source refresh model (a register-then-refresh sketch
   follows this list)
   - make the difference between source truth, staged inputs, and the machine
     store explicit
   - move toward a project source registry and refresh workflow
   - foundation now exists via the project registry + per-project refresh API
   - registration policy + template + proposal + approved registration are now
     the normal path for new projects
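
For reference, a hedged sketch of what the register-then-refresh flow could
look like from a client's perspective. The `/projects` and
`/projects/<id>/refresh` routes, the field names, and the `p07-example`
project are hypothetical placeholders, not the shipped registry API.

```python
# Hypothetical register-then-refresh flow; routes and fields are illustrative.
import json
import urllib.request

ATOCORE_URL = "http://dalidou:8000"  # assumed base URL


def _post(path: str, body: dict) -> dict:
    req = urllib.request.Request(
        f"{ATOCORE_URL}{path}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# 1. Register the project once: declare source truth and staged inputs
#    explicitly, so the machine store is never the source of record.
registration = _post("/projects", {
    "project_id": "p07-example",                       # hypothetical project
    "source_roots": ["C:/Users/antoi/pkm/p07-example"],  # assumed layout
    "staged_inputs": ["staging/p07-example"],
})

# 2. Refresh on demand: re-ingest only this project's staged inputs.
refresh = _post(f"/projects/{registration['project_id']}/refresh", {})
print(refresh)
```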

7. Move to Wave 2 trusted-operational ingestion
   - curated dashboards
   - decision logs
   - milestone/current-status views
   - operational truth, not just raw project notes

8. Integrate the new engineering architecture docs into active planning, not
   immediate schema code
   - keep `docs/architecture/engineering-knowledge-hybrid-architecture.md` as
     the target layer model
   - keep `docs/architecture/engineering-ontology-v1.md` as the V1
     structured-domain target
   - do not start entity/relationship persistence until the ingestion,
     retrieval, registry, and backup baseline feels boring and stable

9. Finish the boring operations baseline around backup (an auto-validation
   sketch follows this list)
   - retention-policy cleanup script (the snapshots dir grows monotonically
     today)
   - off-Dalidou backup target (at minimum an rsync to the laptop or another
     host, so a single-disk failure isn't terminal)
   - automatic post-backup validation (have `create_runtime_backup` call
     `validate_backup` on its own output and refuse to declare success if
     validation fails)
   - DONE in commits be40994 / 0382238 / 3362080 / this one:
     - `create_runtime_backup` + `list_runtime_backups` + `validate_backup` +
       `restore_runtime_backup` with CLI
     - `POST /admin/backup` with `include_chroma=true` under the ingestion lock
     - `/health` build_sha / build_time / build_branch provenance
     - `deploy.sh` self-update re-exec guard + build_sha drift verification
     - live drill procedure in `docs/backup-restore-procedure.md` with the
       failure-mode table and the memory_type=episodic marker pattern from the
       2026-04-09 drill
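
A minimal sketch of the auto-validation wrapper the third bullet asks for,
assuming `create_runtime_backup` returns the snapshot path and
`validate_backup` returns a truthy result on success; the real signatures in
`src/atocore/ops/backup.py` may differ.

```python
from pathlib import Path

# Assumed import path and signatures; the real module is src/atocore/ops/backup.py.
from atocore.ops.backup import create_runtime_backup, validate_backup


class BackupValidationError(RuntimeError):
    """A freshly created snapshot failed its own validation."""


def create_validated_backup(include_chroma: bool = True) -> Path:
    # Assumption: create_runtime_backup accepts include_chroma, mirroring the
    # /admin/backup flag, and returns the snapshot path.
    snapshot = create_runtime_backup(include_chroma=include_chroma)
    if not validate_backup(snapshot):
        # Refuse to declare success: an unvalidated backup is not a backup.
        raise BackupValidationError(f"snapshot failed validation: {snapshot}")
    return snapshot
```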

10. Keep deeper automatic runtime integration modest until the organic
    read-only model has proven value

## Trusted State Status

The first conservative trusted-state promotion pass is now complete for:

- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`

Each project now has a small set of stable entries covering:

- summary
- architecture or boundary decision
- key constraints
- current next focus

This materially improves `context/build` quality for project-hinted prompts.

## Recommended Near-Term Project Work

The active-project full markdown/text wave is now in. The near-term work is:

1. strengthen retrieval quality
2. promote or refine trusted operational truth where the broad corpus is now
   too noisy
3. keep trusted project state concise and high-confidence
4. widen only through named ingestion waves

## Recommended Next Wave Inputs

Wave 2 should emphasize trusted operational truth, not bulk historical notes.

P04:

- current status dashboard
- current selected design path
- current frame interface truth
- current next-step milestone view

P05:

- selected vendor path
- current error-budget baseline
- current architecture freeze or open decisions
- current procurement / next-action view

P06:

- current system map
- current shared contracts baseline
- current calibration procedure truth
- current July / proving roadmap view

## Deferred On Purpose

- automatic write-back from OpenClaw into AtoCore
- automatic memory promotion
- reflection loop integration
- replacing OpenClaw's own memory system
- syncing the live machine DB between machines

## Success Criteria For The Next Batch

The next batch is successful if:

- OpenClaw can use AtoCore naturally when context is needed
- OpenClaw can infer registered projects and call AtoCore organically for
  project-knowledge questions
- the active-project full corpus wave can be inspected and used concretely
  through `auto-context`, `context-build`, and `debug-context`
- OpenClaw can also register a new project cleanly before refreshing it
- existing project registrations can be refined safely before refresh when the
  staged source set evolves
- AtoCore answers correctly for the active project set
- retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
- trusted project state remains concise and high-confidence
- project ingestion remains controlled rather than noisy
- the canonical Dalidou instance stays stable

## Long-Run Goal

The long-run target is:

- continue working normally inside PKM project stacks and Gitea repos
- let OpenClaw keep its own memory and runtime behavior
- let AtoCore supplement LLM work with stronger trusted context, retrieval,
  and context assembly

That means AtoCore should behave like a durable external context engine and
machine-memory layer, not a replacement for normal repo work or OpenClaw
memory.