# AtoCore Current State

## Status Summary

AtoCore is no longer just a proof of concept. The local engine exists, the correctness pass is complete, Dalidou now hosts the canonical runtime and machine-storage location, and the T420/OpenClaw side now has a safe read-only path to consume AtoCore.

The live corpus is no longer just self-knowledge: it now includes a first curated ingestion batch for the active projects.

## Phase Assessment

- completed
  - Phase 0
  - Phase 0.5
  - Phase 1
- baseline complete
  - Phase 2
  - Phase 3
  - Phase 5
  - Phase 7
- partial
  - Phase 4
  - Phase 8
- not started
  - Phase 6
  - Phase 9
  - Phase 10
  - Phase 11
  - Phase 12
  - Phase 13

## What Exists Today

- ingestion pipeline
- parser and chunker
- SQLite-backed memory and project state
- vector retrieval
- context builder
- API routes for query, context, health, and source status
- project registry and per-project refresh foundation
- project registration lifecycle:
  - template
  - proposal preview
  - approved registration
  - safe update of existing project registrations
  - refresh
- implementation-facing architecture notes for:
  - engineering knowledge hybrid architecture
  - engineering ontology v1
- env-driven storage and deployment paths
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
- T420/OpenClaw read-only AtoCore helper skill
- full active-project markdown/text corpus wave for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

## What Is True On Dalidou

- deployed repo location:
  - `/srv/storage/atocore/app`
- canonical machine DB location:
  - `/srv/storage/atocore/data/db/atocore.db`
- canonical vector store location:
  - `/srv/storage/atocore/data/chroma`
- source input locations:
  - `/srv/storage/atocore/sources/vault`
  - `/srv/storage/atocore/sources/drive`

The service and storage foundation are live on Dalidou. The machine-data host is real and canonical.

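As a rough illustration of what "env-driven storage paths" can look like, the sketch below derives the machine-storage locations listed above from a single root. The environment variable name `ATOCORE_STORAGE_ROOT` and the `StoragePaths` type are assumptions made for this example, not names taken from the AtoCore codebase; only the default paths come from this document.

```python
import os
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Mapping, Optional


@dataclass(frozen=True)
class StoragePaths:
    """Hypothetical container for the storage locations named in this document."""
    db: PurePosixPath
    chroma: PurePosixPath
    vault: PurePosixPath
    drive: PurePosixPath


def resolve_storage_paths(env: Optional[Mapping[str, str]] = None) -> StoragePaths:
    """Derive all storage paths from one root, overridable via the environment."""
    env = os.environ if env is None else env
    # ATOCORE_STORAGE_ROOT is an assumed variable name used only for illustration.
    root = PurePosixPath(env.get("ATOCORE_STORAGE_ROOT", "/srv/storage/atocore"))
    return StoragePaths(
        db=root / "data" / "db" / "atocore.db",
        chroma=root / "data" / "chroma",
        vault=root / "sources" / "vault",
        drive=root / "sources" / "drive",
    )


if __name__ == "__main__":
    # With no override, the defaults match the Dalidou layout listed above.
    print(resolve_storage_paths({}).db)
```

Deriving every path from one root keeps a local development layout and the Dalidou layout structurally identical, so only the root needs to change between hosts.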
The project registry is now also persisted in a canonical mounted config path on Dalidou:

- `/srv/storage/atocore/config/project-registry.json`

The content corpus is now partially populated.

The Dalidou instance already contains:

- AtoCore ecosystem and hosting docs
- current-state and OpenClaw integration docs
- Master Plan V3
- Build Spec V1
- trusted project-state entries for `atocore`
- full staged project markdown/text corpora for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
- curated repo-context docs for:
  - `p05`: `Fullum-Interferometer`
  - `p06`: `polisher-sim`
- trusted project-state entries for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

Live stats after the full active-project wave are far beyond the initial seed stage:

- more than `1,100` source documents
- more than `20,000` chunks
- matching vector count

The broader long-term corpus is not yet fully populated. Wider project and vault ingestion remains a deliberate next step rather than something already completed, but the corpus is now meaningfully seeded beyond AtoCore's own docs.

For human-readable quality review, the current staged project markdown corpus is primarily visible under:

- `/srv/storage/atocore/sources/vault/incoming/projects`

This staged area is useful for review because it contains the markdown/text project docs that were actually ingested for the full active-project wave.

It is important to read this staged area correctly:

- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory, trusted project state, and context-builder outputs

## What Is True On The T420

- SSH access is working
- OpenClaw workspace inspected at `/home/papa/clawd`
- OpenClaw's own memory system remains unchanged
- a read-only AtoCore integration skill exists in the workspace:
  - `/home/papa/clawd/skills/atocore-context/`
- the T420 can successfully reach Dalidou AtoCore over network/Tailscale
- fail-open behavior has been verified for the helper path
- OpenClaw can now seed AtoCore in two distinct ways:
  - project-scoped memory entries
  - staged document ingestion into the retrieval corpus
- the helper now supports the practical registered-project lifecycle:
  - projects
  - project-template
  - propose-project
  - register-project
  - update-project
  - refresh-project
- the helper now also supports the first organic routing layer:
  - `detect-project ""`
  - `auto-context "" [budget] [project]`
- OpenClaw can now default to AtoCore for project-knowledge questions without requiring explicit helper commands from the human every time

## What Exists In Memory vs Corpus

These remain separate and that is intentional.

In `/memory`:

- project-scoped curated memories now exist for:
  - `p04-gigabit`: 5 memories
  - `p05-interferometer`: 6 memories
  - `p06-polisher`: 8 memories

These are curated summaries and extracted stable project signals.

In `source_documents` / retrieval corpus:

- full project markdown/text corpora are now present for the active project set
- retrieval is no longer limited to AtoCore self-knowledge only
- the current corpus is broad enough that ranking quality matters more than corpus presence alone
- underspecified prompts can still pull in historical or archive material, so project-aware routing and better ranking remain important

The source refresh model now has a concrete foundation in code:

- a project registry file defines known project ids, aliases, and ingest roots
- the API can list registered projects
- the API can return a registration template
- the API can preview a registration without mutating state
- the API can persist an approved registration
- the API can update an existing registered project without changing its canonical id
- the API can refresh one registered project at a time

This lifecycle is now coherent end to end for normal use.

The first live update passes on existing registered projects have now been verified against `p04-gigabit` and `p05-interferometer`:

- the registration description can be updated safely
- the canonical project id remains unchanged
- refresh still behaves cleanly after the update
- `context/build` still returns useful project-specific context afterward

## Reliability Baseline

The runtime has now been hardened in a few practical ways:

- SQLite connections use a configurable busy timeout
- SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
- project registry writes are atomic file replacements rather than in-place rewrites
- a first runtime backup path now exists for:
  - SQLite
  - project registry
  - backup metadata

This does not eliminate every concurrency edge, but it materially improves the current operational baseline.

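The first three hardening measures above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not AtoCore's actual implementation: the function names `open_connection` and `write_json_atomically` and the 5-second default timeout are hypothetical.

```python
import json
import os
import sqlite3
import tempfile


def open_connection(db_path: str, busy_timeout_s: float = 5.0) -> sqlite3.Connection:
    """Open SQLite with a configurable busy timeout and WAL journaling."""
    # `timeout` is how long sqlite3 waits on a locked database before raising.
    conn = sqlite3.connect(db_path, timeout=busy_timeout_s)
    # WAL lets readers proceed while a single writer is active.
    conn.execute("PRAGMA journal_mode = WAL")
    return conn


def write_json_atomically(path: str, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes reach disk first
        # os.replace swaps the file in one step, so readers see old or new, never a partial file.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The temp file is created in the same directory as the target because a rename is only atomic within one filesystem. An in-place rewrite, by contrast, can leave a truncated registry behind if the process dies mid-write.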
In `Trusted Project State`:

- each active seeded project now has a conservative trusted-state set
- promoted facts cover:
  - summary
  - core architecture or boundary decision
  - key constraints
  - next focus

This separation is healthy:

- memory stores distilled project facts
- corpus stores the underlying retrievable documents

## Immediate Next Focus

1. Use the new T420-side organic routing layer in real OpenClaw workflows
2. Tighten retrieval quality for the now fully ingested active project corpora
3. Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
4. Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
5. Expand the boring operations baseline:
   - restore validation
   - Chroma rebuild / backup policy
   - retention
6. Only later consider write-back, reflection, or deeper autonomous behaviors

See also:

- [ingestion-waves.md](C:/Users/antoi/ATOCore/docs/ingestion-waves.md)
- [master-plan-status.md](C:/Users/antoi/ATOCore/docs/master-plan-status.md)

## Guiding Constraints

- bad memory is worse than no memory
- trusted project state must remain highest priority
- human-readable sources and machine storage stay separate
- OpenClaw integration must not degrade OpenClaw baseline behavior
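
The last constraint is what the fail-open helper behavior protects. A minimal sketch of the idea follows, assuming a hypothetical `fetch` callable that stands in for whatever the helper uses to reach AtoCore; the function name and signature are illustrative, not the helper's real interface.

```python
from typing import Callable


def build_context_fail_open(
    fetch: Callable[[str], str], prompt: str, fallback: str = ""
) -> str:
    """Return AtoCore context for `prompt`, or `fallback` if the lookup fails."""
    try:
        return fetch(prompt)
    except Exception:
        # Any helper failure (timeout, refused connection, bad payload) is
        # swallowed so the caller proceeds exactly as it would without AtoCore.
        return fallback


if __name__ == "__main__":
    def unreachable(prompt: str) -> str:
        raise ConnectionError("AtoCore not reachable")

    # The caller gets an empty context instead of an exception.
    print(repr(build_context_fail_open(unreachable, "p05 status")))
```

Returning an empty context rather than raising means an AtoCore outage costs OpenClaw some project knowledge but never blocks a response.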