Clarify source staging and refresh model

This commit is contained in:
2026-04-06 07:53:18 -04:00
parent 82c7535d15
commit 0f95415530
5 changed files with 192 additions and 6 deletions


@@ -49,6 +49,31 @@ separate.
The machine database is derived operational state, not the primary
human-readable source of truth.
## Source Snapshot Vs Machine Store
The human-readable files visible under `sources/vault` or `sources/drive` are
not the final "smart storage" format of AtoCore.
They are source snapshots made visible to the canonical Dalidou instance so
AtoCore can ingest them.
The actual machine-processed state lives in:
- `source_documents`
- `source_chunks`
- vector embeddings and indexes
- project memories
- trusted project state
- context-builder output
This means the staged markdown can still look very similar to the original PKM
or repo docs. That is normal.
The intelligence does not come from rewriting everything into a new markdown
vault. It comes from ingesting selected source material into the machine store
and then using that store for retrieval, trust-aware context assembly, and
memory.
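The machine-store layers listed above can be pictured as a small relational core plus a vector index. The schema below is a minimal sketch only; the table and column names mirror the names used in this document but are assumptions, not the real AtoCore schema.

```python
import sqlite3

# Illustrative sketch of the machine-store shape, not AtoCore's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_documents (
    id INTEGER PRIMARY KEY,
    source_path TEXT NOT NULL,   -- staged file this record was ingested from
    project TEXT NOT NULL,       -- e.g. 'p05-interferometer'
    ingested_at TEXT NOT NULL
);
CREATE TABLE source_chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES source_documents(id),
    chunk_index INTEGER NOT NULL,
    text TEXT NOT NULL           -- embeddings live in the vector index, not here
);
""")
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
print(tables)  # → ['source_chunks', 'source_documents']
```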
## Canonical Hosting Model
Dalidou is the canonical host for the AtoCore service and machine database.
@@ -86,6 +111,14 @@ file replication of the live machine store.
- OpenClaw must continue to work if AtoCore is unavailable
- write-back from OpenClaw into AtoCore is deferred until later phases
Current staging behavior:
- selected project docs may be copied into a readable staging area on Dalidou
- AtoCore ingests from that staging area into the machine store
- the staging area is not itself the durable intelligence layer
- changes to the original PKM or repo source do not propagate automatically
until a refresh or re-ingest happens
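The last two bullets can be made concrete with a staleness check: a staged copy goes out of date the moment the original changes, and nothing notices until a refresh pass. The helper below is purely illustrative; no such command exists in AtoCore, and the `*.md` filter and directory layout are assumptions.

```python
from pathlib import Path

def stale_staged_files(source_root: Path, staging_root: Path) -> list[Path]:
    """Report staged copies that are missing or older than their originals.

    Hypothetical helper: it only illustrates why edits at the source
    do not propagate to the staging area automatically.
    """
    stale = []
    for src in source_root.rglob("*.md"):
        staged = staging_root / src.relative_to(source_root)
        if not staged.exists() or staged.stat().st_mtime < src.stat().st_mtime:
            stale.append(staged)
    return stale
```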
## Intended Daily Operating Model
The target workflow is:


@@ -84,11 +84,12 @@ The Dalidou instance already contains:
- `p05-interferometer`
- `p06-polisher`
Current live stats after the latest documentation sync and active-project ingest
passes:
- `source_documents`: 34
- `source_chunks`: 550
- `vectors`: 550
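A quick sanity check on stats like these is that the chunk count should match the vector count, since each chunk gets one embedding. The sketch below assumes the table names used in this document; the vector side lives in the vector index (e.g. Chroma) and is not counted here.

```python
import sqlite3

def corpus_stats(conn: sqlite3.Connection) -> dict[str, int]:
    """Count documents and chunks in the machine store.

    Table names are assumptions mirroring this document, not the
    verified AtoCore schema. The chunk count should equal the vector
    count held in the vector index.
    """
    docs = conn.execute("SELECT COUNT(*) FROM source_documents").fetchone()[0]
    chunks = conn.execute("SELECT COUNT(*) FROM source_chunks").fetchone()[0]
    return {"source_documents": docs, "source_chunks": chunks}
```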
The broader long-term corpus is not yet fully populated. Wider project and
vault ingestion remains a deliberate next step rather than something already
@@ -102,6 +103,14 @@ primarily visible under:
This staged area is now useful for review because it contains the curated
project docs that were actually ingested for the first active-project batch.
It is important to read this staged area correctly:
- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory,
trusted project state, and context-builder outputs
## What Is True On The T420
- SSH access is working
@@ -132,6 +141,8 @@ In `source_documents` / retrieval corpus:
- real project documents are now present for the same active project set
- retrieval is no longer limited to AtoCore self-knowledge only
- the current corpus is still selective rather than exhaustive
- that selectivity is intentional at this stage
In `Trusted Project State`:


@@ -26,10 +26,14 @@ AtoCore now has:
3. Continue controlled project ingestion only where the current corpus is still
thin
- a few additional anchor docs per active project
4. Define a cleaner source refresh model
- make the difference between source truth, staged inputs, and machine store
explicit
- move toward a project source registry and refresh workflow
5. Define backup and export procedures for Dalidou
- SQLite snapshot/backup strategy
- Chroma backup or rebuild policy
6. Keep deeper automatic runtime integration deferred until the read-only model
has proven value
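For the SQLite snapshot item in step 5, SQLite's online backup API is the natural fit because it produces a consistent copy even while the database is being read. A minimal sketch, with illustrative paths:

```python
import sqlite3

def snapshot_sqlite(db_path: str, backup_path: str) -> None:
    """Take a consistent snapshot of a live SQLite database.

    Uses SQLite's online backup API via sqlite3.Connection.backup,
    so it is safe while AtoCore is serving reads. Paths are examples.
    """
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    with dst:
        src.backup(dst)  # copies all pages of the main database
    dst.close()
    src.close()
```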
## Trusted State Status


@@ -91,3 +91,52 @@ Think of AtoCore as the durable external context hard drive for LLM work:
- no need to replace the normal project workflow
That is the architecture target.
## Why The Staged Markdown Exists
The staged markdown on Dalidou is a source-input layer, not the end product of
the system.
In the current deployment model:
1. selected PKM, AtoDrive, or repo docs are copied or mirrored into a Dalidou
source path
2. AtoCore ingests them
3. the machine store keeps the processed representation
4. retrieval and context building operate on that machine store
So if the staged docs look very similar to your original PKM notes, that is
expected. They are source material, not the compiled context layer itself.
## What Happens When A Source Changes
If you edit a PKM note or repo doc at the original source, AtoCore does not
pick up the change automatically yet.
The current model is refresh-based:
1. update the human-authoritative source
2. refresh or re-stage the relevant project source set on Dalidou
3. run ingestion again
4. let AtoCore update the machine representation
This is still an intermediate workflow. The long-run target is a cleaner source
registry and refresh model so that commands like `refresh p05-interferometer`
become natural and reliable.
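Step 2 of the refresh workflow, re-staging a project's source set, can be sketched as a simple mirror pass. The function name, `*.md` filter, and layout are all assumptions; the real `refresh <project>` command does not exist yet.

```python
import shutil
from pathlib import Path

def refresh_project(project: str, source_root: Path,
                    staging_root: Path) -> list[Path]:
    """Re-stage a project's source set so the next ingest pass sees it.

    Hypothetical helper covering only the re-staging step; ingestion
    itself would still run afterwards.
    """
    staged = []
    src_dir = source_root / project
    dst_dir = staging_root / project
    for src in src_dir.rglob("*.md"):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # overwrite any stale staged copy
        staged.append(dst)
    return staged
```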
## Current Scope Of Ingestion
The current project corpus is intentionally selective, not exhaustive.
For active projects, the goal right now is to ingest:
- high-value anchor docs
- strong meeting notes with real decisions
- architecture and constraints docs
- selected repo context that explains the system shape
The goal is not to dump the entire PKM or whole repo tree into AtoCore on the
first pass.
So if a project only has some curated notes and not the full project universe in
the staged area yet, that is normal for the current phase.


@@ -0,0 +1,89 @@
# AtoCore Source Refresh Model
## Purpose
This document explains how human-authored project material should flow into the
Dalidou-hosted AtoCore machine store.
It exists to make one distinction explicit:
- source markdown is not the same thing as the machine-memory layer
- source refresh is how changes in PKM or repos become visible to AtoCore
## Current Model
Today, the flow is:
1. human-authoritative project material exists in PKM, AtoDrive, and repos
2. selected high-value files are staged into Dalidou source paths
3. AtoCore ingests those source files
4. AtoCore stores the processed representation in:
- document records
- chunks
- vectors
- project memory
- trusted project state
5. retrieval and context assembly use the machine store, not the staged folder
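Step 3 of the flow, turning a staged document into chunks, can be sketched with a simple paragraph-packing splitter. AtoCore's real chunking strategy and parameters are not documented here; this is only an illustration of the document-to-chunks step.

```python
def chunk_document(text: str, max_chars: int = 400) -> list[str]:
    """Split one staged document into ingestion chunks.

    Illustrative only: packs paragraphs up to a size limit; a paragraph
    longer than the limit becomes its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```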
## Why This Feels Redundant
The staged source files can look almost identical to the original PKM notes or
repo docs because they are still source material.
That is expected.
The staged source area exists because the canonical AtoCore instance on Dalidou
needs a server-visible path to ingest from.
## What Happens When A Project Source Changes
If you edit a note in PKM or a doc in a repo:
- the original source changes immediately
- the staged Dalidou copy does not change automatically
- the AtoCore machine store also does not change automatically
To refresh AtoCore:
1. select the updated project source set
2. copy or mirror the new version into the Dalidou source area
3. run ingestion again
4. verify that retrieval and context reflect the new material
## Current Intentional Limits
The current active-project ingestion strategy is selective.
That means:
- not every note from a project is staged
- not every repo file is staged
- the goal is to start with high-value anchor docs
- broader ingestion comes later if needed
This is why the staged source area for a project may look partial or uneven at
this stage.
## Long-Run Target
The long-run workflow should become much more natural:
- each project has a registered source map
- PKM root
- AtoDrive root
- repo root
- preferred docs
- excluded noisy paths
- a command like `refresh p06-polisher` resolves the right sources
- AtoCore refreshes the machine representation cleanly
- OpenClaw consumes the improved context over API
## Healthy Mental Model
Use this distinction:
- PKM / AtoDrive / repos = human-authoritative sources
- staged Dalidou markdown = server-visible ingestion inputs
- AtoCore DB/vector state = compiled machine context layer
That separation is intentional and healthy.