# AtoCore Current State

## Status Summary

AtoCore is no longer just a proof of concept. The local engine exists, the
correctness pass is complete, Dalidou now hosts the canonical runtime and
machine-storage location, and the T420/OpenClaw side has a safe read-only path
for consuming AtoCore. The live corpus now reaches beyond self-knowledge: it
includes a first curated ingestion batch for the active projects.

## Phase Assessment

- completed
  - Phase 0
  - Phase 0.5
  - Phase 1
- baseline complete
  - Phase 2
  - Phase 3
  - Phase 5
  - Phase 7
  - Phase 9 (Commits A/B/C: capture, reinforcement, extractor + review queue)
- partial
  - Phase 4
  - Phase 8
- not started
  - Phase 6
  - Phase 10
  - Phase 11
  - Phase 12
  - Phase 13

## What Exists Today

- ingestion pipeline
- parser and chunker
- SQLite-backed memory and project state
- vector retrieval
- context builder
- API routes for query, context, health, and source status
- project registry and per-project refresh foundation
- project registration lifecycle:
  - template
  - proposal preview
  - approved registration
  - safe update of existing project registrations
  - refresh
- implementation-facing architecture notes for:
  - engineering knowledge hybrid architecture
  - engineering ontology v1
- env-driven storage and deployment paths
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
- T420/OpenClaw read-only AtoCore helper skill
- full active-project markdown/text corpus wave for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
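
The env-driven storage and deployment paths above can be sketched as a small
resolver. This is a hypothetical illustration: the variable names
(`ATOCORE_DATA_DIR`, `ATOCORE_DB_PATH`, `ATOCORE_CHROMA_DIR`) are assumptions,
not AtoCore's actual configuration keys; only the default Dalidou paths come
from this document.

```python
import os
from pathlib import Path

def resolve_storage_paths(env=os.environ):
    """Resolve machine-storage paths from env vars, falling back to the
    canonical Dalidou layout. Variable names here are illustrative."""
    data_dir = Path(env.get("ATOCORE_DATA_DIR", "/srv/storage/atocore/data"))
    return {
        "db": Path(env.get("ATOCORE_DB_PATH", data_dir / "db" / "atocore.db")),
        "chroma": Path(env.get("ATOCORE_CHROMA_DIR", data_dir / "chroma")),
    }

# A local dev environment only needs to override the root:
paths = resolve_storage_paths({"ATOCORE_DATA_DIR": "/tmp/atocore-data"})
print(paths["db"])  # /tmp/atocore-data/db/atocore.db
```

The point of the pattern is that code never hardcodes Dalidou paths; the same
build runs locally or in the container purely by changing environment values.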

## What Is True On Dalidou

- deployed repo location:
  - `/srv/storage/atocore/app`
- canonical machine DB location:
  - `/srv/storage/atocore/data/db/atocore.db`
- canonical vector store location:
  - `/srv/storage/atocore/data/chroma`
- source input locations:
  - `/srv/storage/atocore/sources/vault`
  - `/srv/storage/atocore/sources/drive`

The service and storage foundation are live on Dalidou. The machine-data host
is real and canonical.

The project registry is now also persisted in a canonical mounted config path
on Dalidou:

- `/srv/storage/atocore/config/project-registry.json`

The content corpus is now partially populated. The Dalidou instance already
contains:

- AtoCore ecosystem and hosting docs
- current-state and OpenClaw integration docs
- Master Plan V3
- Build Spec V1
- trusted project-state entries for `atocore`
- full staged project markdown/text corpora for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
- curated repo-context docs for:
  - `p05`: `Fullum-Interferometer`
  - `p06`: `polisher-sim`
- trusted project-state entries for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

Current live stats after the full active-project wave are now far beyond the
initial seed stage:

- more than `1,100` source documents
- more than `20,000` chunks
- a matching vector count

The broader long-term corpus is not yet fully populated. Wider project and
vault ingestion remains a deliberate next step rather than something already
completed, but the corpus is now meaningfully seeded beyond AtoCore's own docs.

For human-readable quality review, the current staged project markdown corpus
is primarily visible under:

- `/srv/storage/atocore/sources/vault/incoming/projects`

This staged area is now useful for review because it contains the markdown/text
project docs that were actually ingested for the full active-project wave.

It is important to read this staged area correctly:

- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory,
  trusted project state, and context-builder outputs

## What Is True On The T420

- SSH access is working
- the OpenClaw workspace has been inspected at `/home/papa/clawd`
- OpenClaw's own memory system remains unchanged
- a read-only AtoCore integration skill exists in the workspace:
  - `/home/papa/clawd/skills/atocore-context/`
- the T420 can successfully reach Dalidou AtoCore over network/Tailscale
- fail-open behavior has been verified for the helper path
- OpenClaw can now seed AtoCore in two distinct ways:
  - project-scoped memory entries
  - staged document ingestion into the retrieval corpus
- the helper now supports the practical registered-project lifecycle:
  - `projects`
  - `project-template`
  - `propose-project`
  - `register-project`
  - `update-project`
  - `refresh-project`
- the helper now also supports the first organic routing layer:
  - `detect-project "<prompt>"`
  - `auto-context "<prompt>" [budget] [project]`
- OpenClaw can now default to AtoCore for project-knowledge questions without
  requiring explicit helper commands from the human every time
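
The fail-open behavior noted above can be sketched as follows. This is a
minimal illustration, not the skill's real implementation: the base URL,
timeout, and request payload shape are assumptions; only the `context/build`
route name appears elsewhere in this document.

```python
import json
import urllib.error
import urllib.request

def fetch_context(prompt, base_url="http://dalidou:8000", timeout=3.0):
    """Ask AtoCore for context; return None if it is unreachable.

    Fail open: a down or slow AtoCore must never block the conversation,
    so every failure mode degrades to 'no extra context'."""
    req = urllib.request.Request(
        f"{base_url}/context/build",
        data=json.dumps({"query": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, TimeoutError, json.JSONDecodeError):
        return None  # fail open: caller proceeds without AtoCore context

# With nothing listening, this degrades to None instead of raising:
print(fetch_context("status?", base_url="http://127.0.0.1:9", timeout=1.0))
```

The design choice is that the helper adds context when it can and disappears
when it cannot, which is what keeps OpenClaw's baseline behavior undegraded.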

## What Exists In Memory vs Corpus

These remain separate, and that is intentional.

In `/memory`:

- project-scoped curated memories now exist for:
  - `p04-gigabit`: 5 memories
  - `p05-interferometer`: 6 memories
  - `p06-polisher`: 8 memories

These are curated summaries and extracted stable project signals.

In `source_documents` / the retrieval corpus:

- full project markdown/text corpora are now present for the active project set
- retrieval is no longer limited to AtoCore self-knowledge
- the current corpus is broad enough that ranking quality matters more than
  corpus presence alone
- underspecified prompts can still pull in historical or archive material, so
  project-aware routing and better ranking remain important

The source refresh model now has a concrete foundation in code:

- a project registry file defines known project ids, aliases, and ingest roots
- the API can list registered projects
- the API can return a registration template
- the API can preview a registration without mutating state
- the API can persist an approved registration
- the API can update an existing registered project without changing its
  canonical id
- the API can refresh one registered project at a time

This lifecycle is now coherent end to end for normal use.
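
A minimal sketch of what a registry entry and alias lookup could look like,
assuming a plausible schema: the field names (`id`, `aliases`, `ingest_roots`)
and the example ingest root are illustrative guesses, not the real registry
format.

```python
# Hypothetical registry shape; real field names may differ.
registry = {
    "projects": [
        {
            "id": "p05-interferometer",
            "aliases": ["p05", "interferometer"],
            "ingest_roots": ["/srv/storage/atocore/sources/vault/incoming/projects"],
        }
    ]
}

def find_project(reg, name):
    """Resolve a canonical id or an alias to its registry entry."""
    for entry in reg["projects"]:
        if name == entry["id"] or name in entry["aliases"]:
            return entry
    return None

print(find_project(registry, "p05")["id"])  # p05-interferometer
```

Keeping aliases in the registry is what lets helper commands and routing accept
short names while the canonical project id stays immutable across updates.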

The first live update passes on existing registered projects have now been
verified against `p04-gigabit` and `p05-interferometer`:

- the registration description can be updated safely
- the canonical project id remains unchanged
- refresh still behaves cleanly after the update
- `context/build` still returns useful project-specific context afterward

## Reliability Baseline

The runtime has now been hardened in a few practical ways:

- SQLite connections use a configurable busy timeout
- SQLite uses WAL mode to reduce transient lock pain under normal concurrent
  use
- project registry writes are atomic file replacements rather than in-place
  rewrites

- a full runtime backup and restore path now exists and has been exercised on
  live Dalidou:
  - SQLite (hot online backup via `conn.backup()`)
  - project registry (file copy)
  - Chroma vector store (cold directory copy under `exclusive_ingestion()`)
  - backup metadata

- `restore_runtime_backup()` with a CLI entry point
  (`python -m atocore.ops.backup restore <STAMP> --confirm-service-stopped`),
  a pre-restore safety snapshot for rollback, WAL/SHM sidecar cleanup, and
  `PRAGMA integrity_check` on the restored file
- the first live drill on 2026-04-09 surfaced and fixed a Chroma restore bug on
  Docker bind-mounted volumes (`shutil.rmtree` on a mount point); a regression
  test now asserts the destination inode is stable across restore
- deploy provenance is visible end-to-end:
  - `/health` reports `build_sha`, `build_time`, and `build_branch` from env
    vars wired by `deploy.sh`
  - `deploy.sh` Step 6 verifies the live `build_sha` matches the just-built
    commit (exit code 6 on drift), so "is live current?" can be answered
    precisely, not just via `__version__`
  - `deploy.sh` Step 1.5 detects that the script itself changed in the pulled
    commit and re-execs into the fresh copy, so the deploy never silently runs
    the old script against new source

This does not eliminate every concurrency edge, but it materially improves the
current operational baseline.
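
The SQLite hardening and hot-backup bullets above follow a standard `sqlite3`
pattern, sketched here against a throwaway database. The table name and
timeout value are illustrative; the busy timeout, WAL mode, `conn.backup()`,
and `PRAGMA integrity_check` steps mirror what the document describes.

```python
import sqlite3
import tempfile
from pathlib import Path

def connect(db_path, busy_timeout_ms=5000):
    """Open SQLite with a busy timeout and WAL journaling enabled."""
    conn = sqlite3.connect(db_path, timeout=busy_timeout_ms / 1000)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn

def hot_backup_roundtrip():
    """Create a tiny DB, take an online backup, integrity-check the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        live = connect(Path(tmp) / "atocore.db")
        live.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, body TEXT)")
        live.execute("INSERT INTO memories (body) VALUES ('seed fact')")
        live.commit()

        # Hot online backup via Connection.backup(): no service stop needed.
        snapshot = sqlite3.connect(Path(tmp) / "backup.db")
        live.backup(snapshot)

        # The same integrity check the restore path runs on a restored file.
        ok = snapshot.execute("PRAGMA integrity_check").fetchone()[0]
        rows = snapshot.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
        snapshot.close()
        live.close()
        return ok, rows

print(hot_backup_roundtrip())  # ('ok', 1)
```

Because `backup()` operates through SQLite's online backup API, readers and
writers on the live connection are not locked out for the whole copy, which is
what makes a "hot" backup safe alongside normal traffic.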

In `Trusted Project State`:

- each active seeded project now has a conservative trusted-state set
- promoted facts cover:
  - summary
  - core architecture or boundary decision
  - key constraints
  - next focus

This separation is healthy:

- memory stores distilled project facts
- the corpus stores the underlying retrievable documents

## Immediate Next Focus

1. ~~Re-run the full backup/restore drill~~ — DONE 2026-04-11, full pass (db,
   registry, chroma, and integrity all true)
2. ~~Turn on auto-capture of Claude Code sessions in conservative mode~~ —
   DONE 2026-04-11, Stop hook wired via `deploy/hooks/capture_stop.py` →
   `POST /interactions` with `reinforce=false`; kill switch via
   `ATOCORE_CAPTURE_DISABLED=1`
3. Run a short real-use pilot with auto-capture on, verify interactions are
   landing in Dalidou, and review quality
4. Use the new T420-side organic routing layer in real OpenClaw workflows
5. Tighten retrieval quality for the now fully ingested active project corpora
6. Move to Wave 2 trusted-operational ingestion instead of blindly widening
   the raw corpus further
7. Keep the new engineering-knowledge architecture docs as implementation
   guidance while avoiding premature schema work
8. Expand the remaining boring operations baseline:
   - retention policy cleanup script
   - off-Dalidou backup target (rsync or similar)
9. Only later consider write-back, reflection, or deeper autonomous behaviors
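
Item 2's conservative-mode capture can be sketched as below. Only
`POST /interactions`, `reinforce=false`, and the `ATOCORE_CAPTURE_DISABLED`
kill switch come from this document; the base URL and payload field names are
assumptions, not the real hook's schema.

```python
import json
import os
import urllib.request

def should_capture(env=os.environ):
    """Kill switch: setting ATOCORE_CAPTURE_DISABLED=1 disables capture."""
    return env.get("ATOCORE_CAPTURE_DISABLED") != "1"

def build_payload(session_text, project=None):
    """Conservative mode: capture the interaction, never reinforce."""
    return {
        "content": session_text,   # field names are illustrative
        "project": project,
        "reinforce": False,
    }

def capture(session_text, base_url="http://dalidou:8000", env=os.environ):
    """Send one captured session to AtoCore, honoring the kill switch."""
    if not should_capture(env):
        return None  # kill switch engaged, drop silently
    req = urllib.request.Request(
        f"{base_url}/interactions",
        data=json.dumps(build_payload(session_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req, timeout=5)
```

Keeping `reinforce` off means the hook only records raw interactions; nothing
it captures can alter memory weights until a human-reviewed path does so.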

See also:

- [ingestion-waves.md](C:/Users/antoi/ATOCore/docs/ingestion-waves.md)
- [master-plan-status.md](C:/Users/antoi/ATOCore/docs/master-plan-status.md)

## Guiding Constraints

- bad memory is worse than no memory
- trusted project state must remain highest priority
- human-readable sources and machine storage stay separate
- OpenClaw integration must not degrade OpenClaw baseline behavior