
# AtoCore Current State

## Status Summary

AtoCore is no longer just a proof of concept. The local engine exists, the
correctness pass is complete, Dalidou hosts the canonical runtime and
machine-storage location, and the T420/OpenClaw side has a safe read-only
path to consume AtoCore. The live corpus also extends beyond self-knowledge:
it now includes a first curated ingestion batch for the active projects.

## Phase Assessment

- completed
  - Phase 0
  - Phase 0.5
  - Phase 1
- baseline complete
  - Phase 2
  - Phase 3
  - Phase 5
  - Phase 7
- partial
  - Phase 4
- not started
  - Phase 6
  - Phase 8
  - Phase 9
  - Phase 10
  - Phase 11
  - Phase 12
  - Phase 13

## What Exists Today

- ingestion pipeline
- parser and chunker
- SQLite-backed memory and project state
- vector retrieval
- context builder
- API routes for query, context, health, and source status
- env-driven storage and deployment paths
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
- T420/OpenClaw read-only AtoCore helper skill
- first curated active-project corpus batch for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

## What Is True On Dalidou

- deployed repo location:
  - `/srv/storage/atocore/app`
- canonical machine DB location:
  - `/srv/storage/atocore/data/db/atocore.db`
- canonical vector store location:
  - `/srv/storage/atocore/data/chroma`
- source input locations:
  - `/srv/storage/atocore/sources/vault`
  - `/srv/storage/atocore/sources/drive`
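
As an illustration of how env-driven paths could resolve to the locations
above, here is a minimal sketch. The `ATOCORE_*` variable names, the
`StoragePaths` class, and `load_storage_paths` are hypothetical and are not
taken from the AtoCore code; only the default paths come from this document.

```python
import os
from dataclasses import dataclass
from pathlib import Path

# Default root matches the Dalidou layout listed above.
# The ATOCORE_* environment variable names are assumptions for illustration.
DEFAULT_ROOT = "/srv/storage/atocore"


@dataclass(frozen=True)
class StoragePaths:
    db_path: Path
    chroma_dir: Path
    vault_dir: Path
    drive_dir: Path


def load_storage_paths(env=None) -> StoragePaths:
    """Resolve storage paths from an environment mapping, with Dalidou defaults."""
    env = os.environ if env is None else env
    root = Path(env.get("ATOCORE_ROOT", DEFAULT_ROOT))
    return StoragePaths(
        db_path=Path(env.get("ATOCORE_DB_PATH", root / "data" / "db" / "atocore.db")),
        chroma_dir=Path(env.get("ATOCORE_CHROMA_DIR", root / "data" / "chroma")),
        vault_dir=Path(env.get("ATOCORE_VAULT_DIR", root / "sources" / "vault")),
        drive_dir=Path(env.get("ATOCORE_DRIVE_DIR", root / "sources" / "drive")),
    )
```

Passing a plain dict instead of `os.environ` makes it easy to check a
non-Dalidou layout (for example a local dev checkout) without touching the
real environment.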

The service and storage foundation are live on Dalidou, and Dalidou is the
canonical machine-data host.

The content corpus is partially populated.

The Dalidou instance already contains:

- AtoCore ecosystem and hosting docs
- current-state and OpenClaw integration docs
- Master Plan V3
- Build Spec V1
- trusted project-state entries for `atocore`
- curated staged project docs for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
- curated repo-context docs for:
  - `p05`: `Fullum-Interferometer`
  - `p06`: `polisher-sim`
- trusted project-state entries for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

Current live stats after the latest documentation sync and active-project ingest
passes:

- `source_documents`: 34
- `source_chunks`: 550
- `vectors`: 550
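
A quick sanity check on these numbers is that every chunk should have exactly
one vector. The sketch below assumes the stat names `source_documents` and
`source_chunks` are also SQLite table names, which this document does not
confirm. The vector count is passed in as a plain integer (for example from
the vector store's own count call), and `corpus_stats` is a hypothetical
helper.

```python
import sqlite3


def corpus_stats(conn: sqlite3.Connection, vector_count: int) -> dict:
    """Count documents and chunks, and flag chunk/vector drift.

    Assumes tables named source_documents and source_chunks exist;
    that schema is an assumption, not confirmed by this document.
    """
    docs = conn.execute("SELECT COUNT(*) FROM source_documents").fetchone()[0]
    chunks = conn.execute("SELECT COUNT(*) FROM source_chunks").fetchone()[0]
    return {
        "source_documents": docs,
        "source_chunks": chunks,
        "vectors": vector_count,
        # One vector per chunk is the expected steady state.
        "in_sync": chunks == vector_count,
    }
```

With the live numbers above, 550 chunks and 550 vectors would report
`in_sync` as true.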

The broader long-term corpus is not yet fully populated. Wider project and
vault ingestion remains a deliberate next step rather than something already
completed, but the corpus is now meaningfully seeded beyond AtoCore's own docs.

For human-readable quality review, the current staged project markdown corpus is
primarily visible under:

- `/srv/storage/atocore/sources/vault/incoming/projects`

This staged area is now useful for review because it contains the curated
project docs that were actually ingested for the first active-project batch.

It is important to read this staged area correctly:

- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory,
  trusted project state, and context-builder outputs

## What Is True On The T420

- SSH access is working
- OpenClaw workspace inspected at `/home/papa/clawd`
- OpenClaw's own memory system remains unchanged
- a read-only AtoCore integration skill exists in the workspace:
  - `/home/papa/clawd/skills/atocore-context/`
- the T420 can successfully reach Dalidou AtoCore over network/Tailscale
- fail-open behavior has been verified for the helper path
- OpenClaw can now seed AtoCore in two distinct ways:
  - project-scoped memory entries
  - staged document ingestion into the retrieval corpus
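
The fail-open behavior noted above can be sketched roughly as follows: if
AtoCore is unreachable or returns something unexpected, the helper yields an
empty context and OpenClaw carries on as if AtoCore were absent. This is not
the actual skill code. The URL, the `/context` payload shape, and the function
names are all hypothetical.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical endpoint; the real AtoCore host, port, and route may differ.
ATOCORE_CONTEXT_URL = "http://dalidou.example:8000/context"


def fetch_context_http(prompt: str, timeout: float = 2.0) -> str:
    """POST the prompt to AtoCore and return the context string (assumed shape)."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        ATOCORE_CONTEXT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8")).get("context", "")


def context_or_empty(prompt: str, fetch: Callable[[str], str] = fetch_context_http) -> str:
    """Fail open: any error from the fetcher results in an empty context."""
    try:
        return fetch(prompt)
    except Exception:
        # Deliberately broad: timeouts, DNS failures, and bad JSON must all
        # leave the caller's baseline behavior untouched.
        return ""
```

The fetcher is injectable so the fail-open path can be exercised without a
network connection.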

## What Exists In Memory vs Corpus

Memory, the retrieval corpus, and trusted project state remain separate, and
that is intentional.

In `/memory`:

- project-scoped curated memories now exist for:
  - `p04-gigabit`: 5 memories
  - `p05-interferometer`: 6 memories
  - `p06-polisher`: 8 memories

These are curated summaries and extracted stable project signals.

In `source_documents` / retrieval corpus:

- real project documents are now present for the same active project set
- retrieval is no longer limited to AtoCore self-knowledge
- the current corpus is still selective rather than exhaustive
- that selectivity is intentional at this stage

In `Trusted Project State`:

- each active seeded project now has a conservative trusted-state set
- promoted facts cover:
  - summary
  - core architecture or boundary decision
  - key constraints
  - next focus

This separation is healthy:

- memory stores distilled project facts
- corpus stores the underlying retrievable documents

## Immediate Next Focus

1. Use the new T420-side AtoCore skill in real OpenClaw workflows
2. Tighten retrieval quality for the newly seeded active projects
3. Define the first broader AtoVault/AtoDrive ingestion batches
4. Add backup/export strategy for Dalidou machine state
5. Only later consider deeper automatic OpenClaw integration or write-back

## Guiding Constraints

- bad memory is worse than no memory
- trusted project state must remain highest priority
- human-readable sources and machine storage stay separate
- OpenClaw integration must not degrade OpenClaw baseline behavior
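
One way to read the "trusted project state must remain highest priority"
constraint is as an ordering rule when context is assembled under a size
budget. The sketch below is an assumption about how that could work, not a
description of the actual context builder. The tier order (trusted state, then
memories, then retrieved chunks), the character budget, and the
`assemble_context` name are all hypothetical.

```python
from typing import List, Sequence


def assemble_context(
    trusted_state: Sequence[str],
    memories: Sequence[str],
    chunks: Sequence[str],
    budget_chars: int = 2000,
) -> List[str]:
    """Fill the budget tier by tier so lower tiers never displace higher ones."""
    selected: List[str] = []
    used = 0
    for tier in (trusted_state, memories, chunks):
        for item in tier:
            if used + len(item) > budget_chars:
                # Stop entirely rather than letting a smaller, lower-priority
                # item slip in ahead of a higher-priority one that did not fit.
                return selected
            selected.append(item)
            used += len(item)
    return selected
```

Stopping at the first item that does not fit is a conservative choice in the
spirit of "bad memory is worse than no memory": it prefers a shorter context
over one whose ordering no longer reflects priority.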