AtoCore Current State

Status Summary

AtoCore has moved past proof of concept. The local engine exists, the correctness pass is complete, Dalidou now hosts the canonical runtime and machine-storage location, and the T420/OpenClaw side has a safe read-only path to consume AtoCore. The live corpus also reaches beyond self-knowledge: it now includes a first curated ingestion batch for the active projects.

Phase Assessment

  • completed
    • Phase 0
    • Phase 0.5
    • Phase 1
  • baseline complete
    • Phase 2
    • Phase 3
    • Phase 5
    • Phase 7
  • partial
    • Phase 4
    • Phase 8
  • not started
    • Phase 6
    • Phase 9
    • Phase 10
    • Phase 11
    • Phase 12
    • Phase 13

What Exists Today

  • ingestion pipeline
  • parser and chunker
  • SQLite-backed memory and project state
  • vector retrieval
  • context builder
  • API routes for query, context, health, and source status
  • project registry and per-project refresh foundation
  • project registration lifecycle:
    • template
    • proposal preview
    • approved registration
    • safe update of existing project registrations
    • refresh
  • implementation-facing architecture notes for:
    • engineering knowledge hybrid architecture
    • engineering ontology v1
  • env-driven storage and deployment paths
  • Dalidou Docker deployment foundation
  • initial AtoCore self-knowledge corpus ingested on Dalidou
  • T420/OpenClaw read-only AtoCore helper skill
  • first curated active-project corpus batch for:
    • p04-gigabit
    • p05-interferometer
    • p06-polisher

What Is True On Dalidou

  • deployed repo location:
    • /srv/storage/atocore/app
  • canonical machine DB location:
    • /srv/storage/atocore/data/db/atocore.db
  • canonical vector store location:
    • /srv/storage/atocore/data/chroma
  • source input locations:
    • /srv/storage/atocore/sources/vault
    • /srv/storage/atocore/sources/drive

The service and storage foundation are live on Dalidou, and Dalidou is the real, canonical machine-data host.

The project registry is also persisted at a canonical mounted config path on Dalidou:

  • /srv/storage/atocore/config/project-registry.json
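
Because these locations are described as env-driven, one way to resolve them is a single function that derives every path from a storage root and lets individual variables override it. This is a sketch under assumptions: the variable names (`ATOCORE_STORAGE_ROOT`, `ATOCORE_DB_PATH`, and so on) and the `StoragePaths`/`resolve_paths` names are hypothetical; only the default layout is taken from the Dalidou paths listed here.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping

# Default root taken from the Dalidou layout; the env variable names are hypothetical.
DEFAULT_ROOT = "/srv/storage/atocore"


@dataclass(frozen=True)
class StoragePaths:
    app_dir: Path
    db_path: Path
    chroma_dir: Path
    vault_dir: Path
    drive_dir: Path
    registry_path: Path


def _pick(env: Mapping[str, str], key: str, default: Path) -> Path:
    # An explicitly set, non-empty variable overrides the path derived from the root.
    value = env.get(key)
    return Path(value) if value else default


def resolve_paths(env: Mapping[str, str] = os.environ) -> StoragePaths:
    root = Path(env.get("ATOCORE_STORAGE_ROOT") or DEFAULT_ROOT)
    return StoragePaths(
        app_dir=_pick(env, "ATOCORE_APP_DIR", root / "app"),
        db_path=_pick(env, "ATOCORE_DB_PATH", root / "data" / "db" / "atocore.db"),
        chroma_dir=_pick(env, "ATOCORE_CHROMA_DIR", root / "data" / "chroma"),
        vault_dir=_pick(env, "ATOCORE_VAULT_DIR", root / "sources" / "vault"),
        drive_dir=_pick(env, "ATOCORE_DRIVE_DIR", root / "sources" / "drive"),
        registry_path=_pick(env, "ATOCORE_REGISTRY_PATH", root / "config" / "project-registry.json"),
    )
```

Deriving everything from one root keeps the app, backups, and refresh jobs pointed at the same canonical DB unless a path is overridden on purpose.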

The content corpus is now partially populated.

The Dalidou instance already contains:

  • AtoCore ecosystem and hosting docs
  • current-state and OpenClaw integration docs
  • Master Plan V3
  • Build Spec V1
  • trusted project-state entries for atocore
  • curated staged project docs for:
    • p04-gigabit
    • p05-interferometer
    • p06-polisher
  • curated repo-context docs for:
    • p05: Fullum-Interferometer
    • p06: polisher-sim
  • trusted project-state entries for:
    • p04-gigabit
    • p05-interferometer
    • p06-polisher

Current live stats after the latest documentation sync and active-project ingest passes:

  • source_documents: 36
  • source_chunks: 568
  • vectors: 568
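
Equal chunk and vector counts suggest that each chunk has one embedding, which makes the pair a cheap invariant to check after every ingest pass. A minimal sketch, assuming the SQLite tables are literally named `source_documents` and `source_chunks` (taken from the stat labels, not confirmed) and that the vector count is supplied by the vector store; `corpus_stats` is a hypothetical name.

```python
import sqlite3


def corpus_stats(conn: sqlite3.Connection, vector_count: int) -> dict:
    # Table names are assumed to match the stat labels; adjust to the real schema.
    documents = conn.execute("SELECT COUNT(*) FROM source_documents").fetchone()[0]
    chunks = conn.execute("SELECT COUNT(*) FROM source_chunks").fetchone()[0]
    return {
        "source_documents": documents,
        "source_chunks": chunks,
        "vectors": vector_count,
        # A mismatch suggests chunks without embeddings or orphaned vectors.
        "chunks_match_vectors": chunks == vector_count,
    }
```

With Chroma, the vector count could come from `Collection.count()` on the collection AtoCore uses; that collection's name is not stated in this document.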

The broader long-term corpus is not yet fully populated. Wider project and vault ingestion remains a deliberate next step, but the corpus is now meaningfully seeded beyond AtoCore's own docs.

For human-readable quality review, the current staged project markdown corpus is primarily visible under:

  • /srv/storage/atocore/sources/vault/incoming/projects

This staged area is now useful for review because it contains the curated project docs that were actually ingested for the first active-project batch.

It is important to read this staged area correctly:

  • it is a readable ingestion input layer
  • it is not the final machine-memory representation itself
  • seeing familiar PKM-style notes there is expected
  • the machine-processed intelligence lives in the DB, chunks, vectors, memory, trusted project state, and context-builder outputs

What Is True On The T420

  • SSH access is working
  • OpenClaw workspace inspected at /home/papa/clawd
  • OpenClaw's own memory system remains unchanged
  • a read-only AtoCore integration skill exists in the workspace:
    • /home/papa/clawd/skills/atocore-context/
  • the T420 can successfully reach Dalidou AtoCore over network/Tailscale
  • fail-open behavior has been verified for the helper path
  • OpenClaw can now seed AtoCore in two distinct ways:
    • project-scoped memory entries
    • staged document ingestion into the retrieval corpus
  • the helper now supports the practical registered-project lifecycle:
    • projects
    • project-template
    • propose-project
    • register-project
    • update-project
    • refresh-project
  • the helper now also supports the first organic routing layer:
    • detect-project "<prompt>"
    • auto-context "<prompt>" [budget] [project]
  • OpenClaw can now default to AtoCore for project-knowledge questions without requiring explicit helper commands from the human every time
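
Fail-open means a missing or unhealthy AtoCore must never block OpenClaw. A minimal sketch of that pattern, assuming a JSON `POST` to a `/context/build` endpoint (the path echoes the `context/build` route mentioned later in this document, but the base URL, payload fields, and response shape are hypothetical): every failure collapses to an empty context, so the caller falls back to its baseline behavior.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import Optional

# Hypothetical base URL; the real host and port are not given in this document.
ATOCORE_URL = "http://dalidou:8100"


def fetch_context(prompt: str, project: Optional[str] = None, budget: int = 3000,
                  base_url: str = ATOCORE_URL, timeout: float = 3.0) -> str:
    payload = json.dumps({"prompt": prompt, "project": project, "budget": budget}).encode("utf-8")
    try:
        # Built inside the try: a malformed URL raises ValueError here.
        request = urllib.request.Request(
            f"{base_url}/context/build",
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Fail open: unreachable host, timeout, HTTP error, or bad JSON all mean
        # "no AtoCore context", never an exception surfaced to OpenClaw.
        return ""
    context = body.get("context", "") if isinstance(body, dict) else ""
    return context if isinstance(context, str) else ""
```

The timeout matters as much as the exception handling: it bounds how long a slow or unreachable Dalidou can delay an OpenClaw turn.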

What Exists In Memory vs Corpus

These remain separate, and that is intentional.

In /memory:

  • project-scoped curated memories now exist for:
    • p04-gigabit: 5 memories
    • p05-interferometer: 6 memories
    • p06-polisher: 8 memories

These are curated summaries and extracted stable project signals.

In source_documents / retrieval corpus:

  • real project documents are now present for the same active project set
  • retrieval is no longer limited to AtoCore self-knowledge
  • the current corpus is still selective rather than exhaustive
  • that selectivity is intentional at this stage

The source refresh model now has a concrete foundation in code:

  • a project registry file defines known project ids, aliases, and ingest roots
  • the API can list registered projects
  • the API can return a registration template
  • the API can preview a registration without mutating state
  • the API can persist an approved registration
  • the API can update an existing registered project without changing its canonical id
  • the API can refresh one registered project at a time

This lifecycle is now coherent end to end for normal use.
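
The registry's ids and aliases are also what make `detect-project` possible. The sketch assumes a hypothetical JSON shape for `project-registry.json` and a simple matching rule (whole-word match on id or alias, longest match wins); the helper's actual detection logic is not documented here, and `load_registry`/`detect_project` are illustrative names.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Project:
    id: str
    aliases: list[str] = field(default_factory=list)
    ingest_roots: list[str] = field(default_factory=list)


def load_registry(text: str) -> dict[str, Project]:
    """Parse a hypothetical {"projects": [...]} registry into id -> Project."""
    projects: dict[str, Project] = {}
    for entry in json.loads(text).get("projects", []):
        project = Project(
            id=entry["id"],
            aliases=list(entry.get("aliases", [])),
            ingest_roots=list(entry.get("ingest_roots", [])),
        )
        if project.id in projects:
            raise ValueError(f"duplicate project id: {project.id}")
        projects[project.id] = project
    return projects


def detect_project(prompt: str, registry: dict[str, Project]) -> Optional[str]:
    """Return the id whose id or alias appears as a whole word; longest name wins."""
    lowered = prompt.lower()
    best_id, best_len = None, 0
    for project in registry.values():
        for name in (project.id, *project.aliases):
            # Treat '-' as part of a word so "p04" does not match inside "p04-gigabit".
            pattern = rf"(?<![\w-]){re.escape(name.lower())}(?![\w-])"
            if len(name) > best_len and re.search(pattern, lowered):
                best_id, best_len = project.id, len(name)
    return best_id
```

Returning `None` on no match leaves the caller free to skip AtoCore entirely, which fits the fail-open posture of the T420 helper.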

The first live update passes on existing registered projects have now been verified against p04-gigabit and p05-interferometer:

  • the registration description can be updated safely
  • the canonical project id remains unchanged
  • refresh still behaves cleanly after the update
  • context/build still returns useful project-specific context afterward
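
A safe update has two invariants: the canonical id never changes, and the registry file is never left half-written (the reliability notes describe registry writes as atomic file replacements). A sketch of both, assuming a hypothetical `{"projects": [...]}` layout; `update_project` and `write_registry_atomic` are illustrative names, not AtoCore's API.

```python
import copy
import json
import os
import tempfile
from pathlib import Path

IMMUTABLE_FIELDS = {"id"}


def update_project(registry: dict, project_id: str, changes: dict) -> dict:
    """Return an updated copy of the registry; the canonical id is never rewritten."""
    forbidden = changes.keys() & IMMUTABLE_FIELDS
    if forbidden:
        raise ValueError(f"cannot change {sorted(forbidden)} of a registered project")
    updated = copy.deepcopy(registry)
    for entry in updated.get("projects", []):
        if entry.get("id") == project_id:
            entry.update(changes)
            return updated
    raise KeyError(f"unknown project id: {project_id}")


def write_registry_atomic(path, registry: dict) -> None:
    path = Path(path)
    # Temp file in the same directory, so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(registry, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # readers see the old file or the new one, never a partial
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)
        raise
```

Returning a copy rather than mutating in place means a rejected or failed update leaves the in-memory registry exactly as it was.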

Reliability Baseline

The runtime has now been hardened in a few practical ways:

  • SQLite connections use a configurable busy timeout
  • SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
  • project registry writes are atomic file replacements rather than in-place rewrites
  • a first runtime backup path now exists for:
    • SQLite
    • project registry
    • backup metadata

This does not eliminate every concurrency edge, but it materially improves the current operational baseline.
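
The busy timeout, WAL mode, and backup path can be sketched with Python's standard `sqlite3` module. This is illustrative, not AtoCore's code: the function names, the 5-second default, and the snapshot layout (a timestamped directory holding the DB copy, the registry, and a `backup-metadata.json`) are assumptions.

```python
import json
import shutil
import sqlite3
import time
from pathlib import Path


def connect(db_path, busy_timeout_s: float = 5.0) -> sqlite3.Connection:
    # `timeout` installs SQLite's busy handler: a blocked statement keeps retrying
    # for up to this many seconds before raising "database is locked".
    conn = sqlite3.connect(db_path, timeout=busy_timeout_s)
    # WAL lets readers continue while one writer commits; the setting persists in the file.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def backup_runtime(conn: sqlite3.Connection, registry_path, backup_root) -> Path:
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    target = Path(backup_root) / stamp
    target.mkdir(parents=True)  # raises instead of overwriting an existing snapshot
    dest = sqlite3.connect(target / "atocore.db")
    try:
        # SQLite's online backup API produces a consistent copy while the DB stays in use.
        conn.backup(dest)
    finally:
        dest.close()
    shutil.copy2(registry_path, target / Path(registry_path).name)
    metadata = {"created_at": stamp, "files": sorted(p.name for p in target.iterdir())}
    (target / "backup-metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target
```

Using the backup API instead of copying the `.db` file matters under WAL: recent commits may still live in the `-wal` file, and a plain file copy could miss them.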

In Trusted Project State:

  • each active seeded project now has a conservative trusted-state set
  • promoted facts cover:
    • summary
    • core architecture or boundary decision
    • key constraints
    • next focus

This separation is healthy:

  • memory stores distilled project facts
  • corpus stores the underlying retrievable documents
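
The Guiding Constraints put trusted project state first. One way a context builder could honor that under a size budget is to rank items by source before relevance, so trusted state claims the budget before memories and retrieved chunks. The source labels, the character-based budget, and `build_context` are assumptions for illustration; the unit of the `auto-context` budget is not specified in this document.

```python
from dataclasses import dataclass

# Hypothetical source labels; a lower value is included first.
PRIORITY = {"trusted_state": 0, "memory": 1, "chunk": 2}


@dataclass(frozen=True)
class ContextItem:
    source: str
    text: str
    score: float = 0.0  # retrieval relevance; higher is better


def build_context(items: list[ContextItem], budget_chars: int) -> str:
    ordered = sorted(items, key=lambda item: (PRIORITY[item.source], -item.score))
    lines: list[str] = []
    used = 0
    for item in ordered:
        line = f"[{item.source}] {item.text}"
        cost = len(line) + (1 if lines else 0)  # +1 for the joining newline
        if used + cost > budget_chars:
            continue  # skip what does not fit; a shorter later item may still fit
        lines.append(line)
        used += cost
    return "\n".join(lines)
```

Because items are considered strictly in priority order, a large chunk can never crowd out trusted state; at worst it is the chunk that gets dropped.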

Immediate Next Focus

  1. Use the new T420-side organic routing layer in real OpenClaw workflows
  2. Keep tightening retrieval quality for the newly seeded active projects
  3. Define the first broader AtoVault/AtoDrive ingestion batches
  4. Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
  5. Expand the boring operations baseline:
    • restore validation
    • Chroma rebuild / backup policy
    • retention
  6. Only later consider write-back, reflection, or deeper autonomous behaviors
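
Restore validation is still listed as future work. One minimal form it could take is to open a restored copy, run SQLite's `PRAGMA integrity_check`, and confirm the expected tables exist. The table names are assumed from the live-stats labels, and `validate_restore` is a hypothetical name.

```python
import sqlite3
from pathlib import Path

# Assumed table names, taken from the live-stats labels in this document.
REQUIRED_TABLES = ("source_documents", "source_chunks")


def validate_restore(db_path, required_tables=REQUIRED_TABLES) -> list[str]:
    """Return a list of problems; an empty list means the copy looks usable."""
    path = Path(db_path)
    if not path.is_file():
        # Checked first because sqlite3.connect would silently create an empty file.
        return [f"missing database file: {path}"]
    conn = sqlite3.connect(path)
    try:
        result = conn.execute("PRAGMA integrity_check").fetchone()[0]
        tables = {row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    except sqlite3.DatabaseError as exc:
        return [f"unreadable database: {exc}"]
    finally:
        conn.close()
    problems = [] if result == "ok" else [f"integrity_check: {result}"]
    problems.extend(f"missing table: {name}" for name in required_tables if name not in tables)
    return problems
```

A check like this only proves the SQLite side is sound; a matching Chroma rebuild or backup policy would still be needed before a restore is fully trustworthy.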

See also:

Guiding Constraints

  • bad memory is worse than no memory
  • trusted project state must remain highest priority
  • human-readable sources and machine storage stay separate
  • OpenClaw integration must not degrade OpenClaw baseline behavior