AtoCore Current State
Status Summary
AtoCore is no longer just a proof of concept. The local engine exists, the correctness pass is complete, Dalidou now hosts the canonical runtime and machine-storage location, and the T420/OpenClaw side now has a safe read-only path to consume AtoCore. The live corpus is no longer just self-knowledge: it now includes a first curated ingestion batch for the active projects.
Phase Assessment
- completed:
  - Phase 0
  - Phase 0.5
  - Phase 1 (baseline complete)
  - Phase 2
  - Phase 3
  - Phase 5
  - Phase 7
  - Phase 9 (Commits A/B/C: capture, reinforcement, extractor + review queue)
- partial:
  - Phase 4
  - Phase 8
- not started:
  - Phase 6
  - Phase 10
  - Phase 11
  - Phase 12
  - Phase 13
What Exists Today
- ingestion pipeline
- parser and chunker
- SQLite-backed memory and project state
- vector retrieval
- context builder
- API routes for query, context, health, and source status
- project registry and per-project refresh foundation
- project registration lifecycle:
- template
- proposal preview
- approved registration
- safe update of existing project registrations
- refresh
- implementation-facing architecture notes for:
- engineering knowledge hybrid architecture
- engineering ontology v1
- env-driven storage and deployment paths
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
- T420/OpenClaw read-only AtoCore helper skill
- full active-project markdown/text corpus wave for:
  p04-gigabit, p05-interferometer, p06-polisher
What Is True On Dalidou
- deployed repo location:
/srv/storage/atocore/app
- canonical machine DB location:
/srv/storage/atocore/data/db/atocore.db
- canonical vector store location:
/srv/storage/atocore/data/chroma
- source input locations:
  /srv/storage/atocore/sources/vault
  /srv/storage/atocore/sources/drive
The service and storage foundation are live on Dalidou.
The machine-data host is real and canonical.
The project registry is now also persisted in a canonical mounted config path on Dalidou:
/srv/storage/atocore/config/project-registry.json
The content corpus is partially populated now.
The Dalidou instance already contains:
- AtoCore ecosystem and hosting docs
- current-state and OpenClaw integration docs
- Master Plan V3
- Build Spec V1
- trusted project-state entries for atocore
- full staged project markdown/text corpora for:
  p04-gigabit, p05-interferometer, p06-polisher
- curated repo-context docs for:
  p05: Fullum-Interferometer, p06: polisher-sim
- trusted project-state entries for:
  p04-gigabit, p05-interferometer, p06-polisher
Current live stats after the full active-project wave are now far beyond the initial seed stage:
- more than 1,100 source documents
- more than 20,000 chunks
- a matching vector count
The broader long-term corpus is not yet fully populated. Wider project and vault ingestion remains a deliberate next step rather than something already completed, but the corpus is now meaningfully seeded beyond AtoCore's own docs.
For human-readable quality review, the current staged project markdown corpus is primarily visible under:
/srv/storage/atocore/sources/vault/incoming/projects
This staged area is now useful for review because it contains the markdown/text project docs that were actually ingested for the full active-project wave.
It is important to read this staged area correctly:
- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory, trusted project state, and context-builder outputs
What Is True On The T420
- SSH access is working
- OpenClaw workspace inspected at /home/papa/clawd
- OpenClaw's own memory system remains unchanged
- a read-only AtoCore integration skill exists in the workspace:
/home/papa/clawd/skills/atocore-context/
- the T420 can successfully reach Dalidou AtoCore over network/Tailscale
- fail-open behavior has been verified for the helper path
- OpenClaw can now seed AtoCore in two distinct ways:
- project-scoped memory entries
- staged document ingestion into the retrieval corpus
- the helper now supports the practical registered-project lifecycle:
- projects
- project-template
- propose-project
- register-project
- update-project
- refresh-project
- the helper now also supports the first organic routing layer:
- `detect-project "<prompt>"`
- `auto-context "<prompt>" [budget] [project]`
- OpenClaw can now default to AtoCore for project-knowledge questions without requiring explicit helper commands from the human every time
What Exists In Memory vs Corpus
These remain separate and that is intentional.
In /memory:
- project-scoped curated memories now exist for:
- p04-gigabit: 5 memories
- p05-interferometer: 6 memories
- p06-polisher: 8 memories
These are curated summaries and extracted stable project signals.
In source_documents / retrieval corpus:
- full project markdown/text corpora are now present for the active project set
- retrieval is no longer limited to AtoCore self-knowledge only
- the current corpus is broad enough that ranking quality matters more than corpus presence alone
- underspecified prompts can still pull in historical or archive material, so project-aware routing and better ranking remain important
The source refresh model now has a concrete foundation in code:
- a project registry file defines known project ids, aliases, and ingest roots
- the API can list registered projects
- the API can return a registration template
- the API can preview a registration without mutating state
- the API can persist an approved registration
- the API can update an existing registered project without changing its canonical id
- the API can refresh one registered project at a time
This lifecycle is now coherent end to end for normal use.
The first live update passes on existing registered projects have now been
verified against p04-gigabit and p05-interferometer:
- the registration description can be updated safely
- the canonical project id remains unchanged
- refresh still behaves cleanly after the update
- `context/build` still returns useful project-specific context afterward
Reliability Baseline
The runtime has now been hardened in a few practical ways:
- SQLite connections use a configurable busy timeout
- SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
- project registry writes are atomic file replacements rather than in-place rewrites
- a full runtime backup and restore path now exists and has been exercised on live Dalidou:
  - SQLite (hot online backup via `conn.backup()`)
  - project registry (file copy)
  - Chroma vector store (cold directory copy under `exclusive_ingestion()`)
  - backup metadata
  - `restore_runtime_backup()` with CLI entry point (`python -m atocore.ops.backup restore <STAMP> --confirm-service-stopped`), pre-restore safety snapshot for rollback, WAL/SHM sidecar cleanup, `PRAGMA integrity_check` on the restored file
  - the first live drill on 2026-04-09 surfaced and fixed a Chroma restore bug on Docker bind-mounted volumes (`shutil.rmtree` on a mount point); a regression test now asserts the destination inode is stable across restore
- deploy provenance is visible end-to-end:
  - `/health` reports `build_sha`, `build_time`, `build_branch` from env vars wired by `deploy.sh`
  - `deploy.sh` Step 6 verifies the live `build_sha` matches the just-built commit (exit code 6 on drift), so "is live current?" can be answered precisely, not just by `__version__`
  - `deploy.sh` Step 1.5 detects that the script itself changed in the pulled commit and re-execs into the fresh copy, so the deploy never silently runs the old script against new source
This does not eliminate every concurrency edge, but it materially improves the current operational baseline.
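The hardening measures above can be illustrated with a short sketch: a configurable busy timeout plus WAL mode, an atomic registry write, and a hot online SQLite backup. The function names (`open_db`, `save_registry`, `hot_backup`) are hypothetical; only `conn.backup()` and the pragmas are standard `sqlite3` API, and the real AtoCore code is not reproduced here.

```python
import json
import os
import sqlite3
import tempfile

def open_db(path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    # Configurable busy timeout plus WAL journal mode to reduce
    # transient lock errors under normal concurrent use.
    conn = sqlite3.connect(path, timeout=busy_timeout_ms / 1000)
    conn.execute(f"PRAGMA busy_timeout = {busy_timeout_ms}")
    conn.execute("PRAGMA journal_mode = WAL")
    return conn

def save_registry(path: str, registry: dict) -> None:
    # Atomic file replacement: write a temp file in the same directory,
    # then os.replace() it so readers never observe a partial file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(registry, f, indent=2)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise

def hot_backup(conn: sqlite3.Connection, dest_path: str) -> None:
    # Online SQLite backup; safe while the source connection is live.
    dest = sqlite3.connect(dest_path)
    try:
        conn.backup(dest)
    finally:
        dest.close()
```

The `os.replace()` step is what makes the registry write crash-safe: on POSIX filesystems the rename is atomic, so a killed process leaves either the old file or the new one, never a torn write.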
In Trusted Project State:
- each active seeded project now has a conservative trusted-state set
- promoted facts cover:
- summary
- core architecture or boundary decision
- key constraints
- next focus
This separation is healthy:
- memory stores distilled project facts
- corpus stores the underlying retrievable documents
Immediate Next Focus
- Re-run the full backup/restore drill — DONE 2026-04-11, full pass (db, registry, chroma, integrity all true)
- Turn on auto-capture of Claude Code sessions in conservative mode — DONE 2026-04-11, Stop hook wired via `deploy/hooks/capture_stop.py` → `POST /interactions` with `reinforce=false`; kill switch via `ATOCORE_CAPTURE_DISABLED=1`
- Run a short real-use pilot with auto-capture on, verify interactions are landing in Dalidou, review quality
- Use the new T420-side organic routing layer in real OpenClaw workflows
- Tighten retrieval quality for the now fully ingested active project corpora
- Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
- Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
- Expand the remaining boring operations baseline:
- retention policy cleanup script
- off-Dalidou backup target (rsync or similar)
- Only later consider write-back, reflection, or deeper autonomous behaviors
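The conservative-mode capture hook from the auto-capture item above can be sketched as follows. The transcript record shape (`{"role": ..., "content": ...}` JSONL lines), the `ATOCORE_URL` variable, and the `/interactions` payload fields are assumptions inferred from this document, not a copy of `deploy/hooks/capture_stop.py`; real Claude Code transcripts may be shaped differently.

```python
import json
import os
import urllib.request

# Assumed env var naming the Dalidou AtoCore base URL (illustrative default).
ATOCORE_URL = os.environ.get("ATOCORE_URL", "http://dalidou:8000")

def last_user_prompt(transcript_lines):
    """Scan transcript JSONL lines and return the text of the last
    user-role message, or None. Skips blank and malformed lines."""
    prompt = None
    for line in transcript_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if record.get("role") == "user":
            prompt = record.get("content")
    return prompt

def capture(transcript_lines) -> bool:
    # Kill switch: do nothing at all when capture is disabled.
    if os.environ.get("ATOCORE_CAPTURE_DISABLED") == "1":
        return False
    prompt = last_user_prompt(transcript_lines)
    if prompt is None:
        return False
    # Conservative mode: capture only, no reinforcement or extraction
    # into the review queue.
    payload = json.dumps({"prompt": prompt, "reinforce": False}).encode()
    req = urllib.request.Request(
        f"{ATOCORE_URL}/interactions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)  # a fail-open caller would wrap this
    return True
```

In keeping with the fail-open behavior verified on the T420 side, a real hook would swallow network errors so a down AtoCore never blocks the session from ending.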
Guiding Constraints
- bad memory is worse than no memory
- trusted project state must remain highest priority
- human-readable sources and machine storage stay separate
- OpenClaw integration must not degrade OpenClaw baseline behavior