# AtoCore Current State

## Status Summary

AtoCore is no longer just a proof of concept. The local engine exists, the
correctness pass is complete, Dalidou now hosts the canonical runtime and
machine-storage location, and the T420/OpenClaw side now has a safe read-only
path to consume AtoCore. The live corpus is no longer just self-knowledge: it
now includes the full markdown/text corpus wave for the active projects.

## Phase Assessment

- completed
  - Phase 0
  - Phase 0.5
  - Phase 1
- baseline complete
  - Phase 2
  - Phase 3
  - Phase 5
  - Phase 7
  - Phase 9 (Commits A/B/C: capture, reinforcement, extractor + review queue)
- partial
  - Phase 4
  - Phase 8
- not started
  - Phase 6
  - Phase 10
  - Phase 11
  - Phase 12
  - Phase 13

## What Exists Today

- ingestion pipeline
- parser and chunker
- SQLite-backed memory and project state
- vector retrieval
- context builder
- API routes for query, context, health, and source status
- project registry and per-project refresh foundation
- project registration lifecycle:
  - template
  - proposal preview
  - approved registration
  - safe update of existing project registrations
  - refresh
- implementation-facing architecture notes for:
  - engineering knowledge hybrid architecture
  - engineering ontology v1
- env-driven storage and deployment paths
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
- T420/OpenClaw read-only AtoCore helper skill
- full active-project markdown/text corpus wave for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

## What Is True On Dalidou

- deployed repo location:
  - `/srv/storage/atocore/app`
- canonical machine DB location:
  - `/srv/storage/atocore/data/db/atocore.db`
- canonical vector store location:
  - `/srv/storage/atocore/data/chroma`
- source input locations:
  - `/srv/storage/atocore/sources/vault`
  - `/srv/storage/atocore/sources/drive`

The service and storage foundation are live on Dalidou.

The machine-data host is real and canonical.

The project registry is now also persisted in a canonical mounted config path on
Dalidou:

- `/srv/storage/atocore/config/project-registry.json`
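
As a minimal sketch of how env-driven paths could resolve to the Dalidou layout
listed above: the environment variable name `ATOCORE_STORAGE_ROOT` and the
`StoragePaths` helper are hypothetical and chosen only for illustration. Only
the directory layout itself comes from this document.

```python
import os
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Mapping, Optional

# Default root taken from the Dalidou layout in this document.
DEFAULT_ROOT = "/srv/storage/atocore"


@dataclass(frozen=True)
class StoragePaths:
    """Storage locations derived from one root (hypothetical helper)."""

    root: PurePosixPath

    @property
    def db(self) -> PurePosixPath:
        return self.root / "data" / "db" / "atocore.db"

    @property
    def chroma(self) -> PurePosixPath:
        return self.root / "data" / "chroma"

    @property
    def registry(self) -> PurePosixPath:
        return self.root / "config" / "project-registry.json"


def load_paths(env: Optional[Mapping[str, str]] = None) -> StoragePaths:
    """Read the root from the environment, falling back to the Dalidou default."""
    env = os.environ if env is None else env
    # ATOCORE_STORAGE_ROOT is an assumed variable name, not a documented one.
    return StoragePaths(PurePosixPath(env.get("ATOCORE_STORAGE_ROOT", DEFAULT_ROOT)))
```

Deriving every location from a single root keeps a local development checkout
and the Dalidou deployment on the same code path, with only the root differing.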

The content corpus is partially populated now.

The Dalidou instance already contains:

- AtoCore ecosystem and hosting docs
- current-state and OpenClaw integration docs
- Master Plan V3
- Build Spec V1
- trusted project-state entries for `atocore`
- full staged project markdown/text corpora for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
- curated repo-context docs for:
  - `p05`: `Fullum-Interferometer`
  - `p06`: `polisher-sim`
- trusted project-state entries for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

Current live stats after the full active-project wave are now far beyond the
initial seed stage:

- more than `1,100` source documents
- more than `20,000` chunks
- matching vector count

The broader long-term corpus is not fully populated yet. Wider project and
vault ingestion remains a deliberate next step rather than completed work, but
the corpus is now meaningfully seeded beyond AtoCore's own docs.

For human-readable quality review, the current staged project markdown corpus is
primarily visible under:

- `/srv/storage/atocore/sources/vault/incoming/projects`

This staged area is now useful for review because it contains the markdown/text
project docs that were actually ingested for the full active-project wave.

It is important to read this staged area correctly:

- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory,
  trusted project state, and context-builder outputs

## What Is True On The T420

- SSH access is working
- OpenClaw workspace inspected at `/home/papa/clawd`
- OpenClaw's own memory system remains unchanged
- a read-only AtoCore integration skill exists in the workspace:
  - `/home/papa/clawd/skills/atocore-context/`
- the T420 can successfully reach Dalidou AtoCore over network/Tailscale
- fail-open behavior has been verified for the helper path
- OpenClaw can now seed AtoCore in two distinct ways:
  - project-scoped memory entries
  - staged document ingestion into the retrieval corpus
- the helper now supports the practical registered-project lifecycle:
  - projects
  - project-template
  - propose-project
  - register-project
  - update-project
  - refresh-project
- the helper now also supports the first organic routing layer:
  - `detect-project "<prompt>"`
  - `auto-context "<prompt>" [budget] [project]`
- OpenClaw can now default to AtoCore for project-knowledge questions without
  requiring explicit helper commands from the human every time
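
The routing idea behind `detect-project` can be illustrated with a small
sketch. This document does not describe the helper's actual matching logic, so
the token-based approach below is an assumption, and the alias table is made up
for the example (only the project ids are real).

```python
import re
from typing import Optional

# Hypothetical alias table: ids come from this document, aliases are invented.
ALIASES = {
    "p04-gigabit": ["p04", "gigabit"],
    "p05-interferometer": ["p05", "interferometer"],
    "p06-polisher": ["p06", "polisher"],
}


def detect_project(prompt: str) -> Optional[str]:
    """Return the first project whose id or alias appears as a whole token."""
    # Hyphens are kept inside tokens so full ids like "p06-polisher" match.
    tokens = set(re.findall(r"[a-z0-9][a-z0-9-]*", prompt.lower()))
    for project_id, aliases in ALIASES.items():
        if project_id in tokens or tokens.intersection(aliases):
            return project_id
    # No match: the caller can fall back to unscoped retrieval.
    return None
```

Matching on whole tokens rather than substrings avoids routing a prompt to a
project just because an alias happens to occur inside a longer word.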

## What Exists In Memory vs Corpus

These remain separate and that is intentional.

In `/memory`:

- project-scoped curated memories now exist for:
  - `p04-gigabit`: 5 memories
  - `p05-interferometer`: 6 memories
  - `p06-polisher`: 8 memories

These are curated summaries and extracted stable project signals.

In `source_documents` / retrieval corpus:

- full project markdown/text corpora are now present for the active project set
- retrieval is no longer limited to AtoCore self-knowledge only
- the current corpus is broad enough that ranking quality matters more than
  corpus presence alone
- underspecified prompts can still pull in historical or archive material, so
  project-aware routing and better ranking remain important

The source refresh model now has a concrete foundation in code:

- a project registry file defines known project ids, aliases, and ingest roots
- the API can list registered projects
- the API can return a registration template
- the API can preview a registration without mutating state
- the API can persist an approved registration
- the API can update an existing registered project without changing its canonical id
- the API can refresh one registered project at a time

This lifecycle is now coherent end to end for normal use.

The first live update passes on existing registered projects have now been
verified against `p04-gigabit` and `p05-interferometer`:

- the registration description can be updated safely
- the canonical project id remains unchanged
- refresh still behaves cleanly after the update
- `context/build` still returns useful project-specific context afterward
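
The preview-then-persist behavior described above can be sketched as two small
functions. This is an illustration only: the entry field names (`id`,
`description`, `aliases`, `ingest_roots`) and both function names are
assumptions, not the actual registry schema or API.

```python
from copy import deepcopy

# Fields that an update may never change (assumed field name).
IMMUTABLE_FIELDS = {"id"}


def preview_update(registry: dict, project_id: str, changes: dict) -> dict:
    """Return the entry as it would look after the update, mutating nothing."""
    if project_id not in registry:
        raise KeyError(f"unknown project: {project_id}")
    blocked = IMMUTABLE_FIELDS.intersection(changes)
    if blocked:
        raise ValueError(f"immutable fields: {sorted(blocked)}")
    updated = deepcopy(registry[project_id])
    updated.update(changes)
    return updated


def apply_update(registry: dict, project_id: str, changes: dict) -> dict:
    """Return a new registry with the approved update applied."""
    new_registry = deepcopy(registry)
    new_registry[project_id] = preview_update(registry, project_id, changes)
    return new_registry
```

Because `apply_update` reuses `preview_update`, whatever the preview shows is
exactly what gets persisted, and the canonical id cannot drift.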

## Reliability Baseline

The runtime has now been hardened in a few practical ways:

- SQLite connections use a configurable busy timeout
- SQLite uses WAL mode to reduce transient lock contention under normal concurrent use
- project registry writes are atomic file replacements rather than in-place rewrites
- a first runtime backup path now exists for:
  - SQLite
  - project registry
  - backup metadata

This does not eliminate every concurrency edge case, but it materially improves
the current operational baseline.
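
For readers unfamiliar with these techniques, the following sketch shows what a
busy timeout, WAL mode, and an atomic file replacement look like using only the
Python standard library. The function names and the default timeout value are
assumptions for illustration and are not taken from the AtoCore code.

```python
import json
import os
import sqlite3
import tempfile


def open_db(path: str, busy_timeout_s: float = 5.0) -> sqlite3.Connection:
    """Open SQLite with a busy timeout and WAL journaling."""
    # `timeout` is how long a connection waits on a locked database
    # before raising an error.
    conn = sqlite3.connect(path, timeout=busy_timeout_s)
    # WAL lets readers proceed while a single writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def atomic_write_json(path: str, payload: dict) -> None:
    """Write JSON to a temp file, then swap it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in a single rename.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern a reader of the registry file sees either the old version or
the new one, never a half-written file.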

In `Trusted Project State`:

- each active seeded project now has a conservative trusted-state set
- promoted facts cover:
  - summary
  - core architecture or boundary decision
  - key constraints
  - next focus

This separation is healthy:

- memory stores distilled project facts
- corpus stores the underlying retrievable documents

## Immediate Next Focus

1. Use the new T420-side organic routing layer in real OpenClaw workflows
2. Tighten retrieval quality for the now fully ingested active project corpora
3. Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
4. Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
5. Expand the boring operations baseline:
   - restore validation
   - Chroma rebuild / backup policy
   - retention
6. Only later consider write-back, reflection, or deeper autonomous behaviors
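
The "restore validation" item above could start from something as small as the
following sketch: copy the database with SQLite's online backup API, then run an
integrity check on the copy. The function names are hypothetical, and this is
not a description of the existing backup path.

```python
import sqlite3


def backup_sqlite(source_path: str, backup_path: str) -> None:
    """Copy a database using SQLite's online backup API."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        # Connection.backup copies a consistent snapshot of the source.
        source.backup(target)
    finally:
        target.close()
        source.close()


def validate_backup(backup_path: str) -> bool:
    """Return True when SQLite reports the backup file as structurally sound."""
    conn = sqlite3.connect(backup_path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    # A healthy database yields exactly one row containing "ok".
    return rows == [("ok",)]
```

An integrity check only proves the file is a well-formed database. A fuller
validation would also compare row counts or run a known query against the copy.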

See also:

- [ingestion-waves.md](ingestion-waves.md)
- [master-plan-status.md](master-plan-status.md)

## Guiding Constraints

- bad memory is worse than no memory
- trusted project state must remain highest priority
- human-readable sources and machine storage stay separate
- OpenClaw integration must not degrade OpenClaw baseline behavior
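
The last constraint is what the fail-open behavior of the T420 helper protects.
A minimal sketch of the pattern, assuming a JSON POST to `context/build`: the
request payload shape, the function name, and the timeout are hypothetical, and
only the route name appears in this document.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def fetch_context(base_url: str, prompt: str, timeout_s: float = 2.0) -> Optional[dict]:
    """Return AtoCore context, or None if the service cannot be used."""
    # The {"prompt": ...} body is an assumed request shape.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/context/build",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Fail open: network errors, timeouts, and malformed responses
        # all degrade to "no extra context" instead of raising.
        return None
```

A caller treats `None` as "proceed without AtoCore", so an unreachable Dalidou
host costs at most the timeout and never blocks the OpenClaw workflow.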