Clarify source staging and refresh model

This commit is contained in:
2026-04-06 07:53:18 -04:00
parent 82c7535d15
commit 0f95415530
5 changed files with 192 additions and 6 deletions


@@ -49,6 +49,31 @@ separate.
The machine database is derived operational state, not the primary
human-readable source of truth.
## Source Snapshot Vs Machine Store
The human-readable files visible under `sources/vault` or `sources/drive` are
not the final "smart storage" format of AtoCore.
They are source snapshots made visible to the canonical Dalidou instance so
AtoCore can ingest them.
The actual machine-processed state lives in:
- `source_documents`
- `source_chunks`
- vector embeddings and indexes
- project memories
- trusted project state
- context-builder output
This means the staged markdown can still look very similar to the original PKM
or repo docs. That is normal.
The intelligence does not come from rewriting everything into a new markdown
vault. It comes from ingesting selected source material into the machine store
and then using that store for retrieval, trust-aware context assembly, and
memory.
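The machine-store layers listed above can be pictured as a small relational core plus a vector index. The schema below is a minimal sketch only; the table and column names mirror the names used in this document but are assumptions, not the real AtoCore schema.

```python
import sqlite3

# Illustrative sketch of the machine-store shape, not AtoCore's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_documents (
    id INTEGER PRIMARY KEY,
    source_path TEXT NOT NULL,   -- staged file this record was ingested from
    project TEXT NOT NULL,       -- e.g. 'p05-interferometer'
    ingested_at TEXT NOT NULL
);
CREATE TABLE source_chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES source_documents(id),
    chunk_index INTEGER NOT NULL,
    text TEXT NOT NULL           -- embeddings live in the vector index, not here
);
""")
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
print(tables)  # → ['source_chunks', 'source_documents']
```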
## Canonical Hosting Model
Dalidou is the canonical host for the AtoCore service and machine database.
@@ -86,6 +111,14 @@ file replication of the live machine store.
- OpenClaw must continue to work if AtoCore is unavailable
- write-back from OpenClaw into AtoCore is deferred until later phases
Current staging behavior:
- selected project docs may be copied into a readable staging area on Dalidou
- AtoCore ingests from that staging area into the machine store
- the staging area is not itself the durable intelligence layer
- changes to the original PKM or repo source do not propagate automatically
until a refresh or re-ingest happens
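The last two bullets can be made concrete with a staleness check: a staged copy goes out of date the moment the original changes, and nothing notices until a refresh pass. The helper below is purely illustrative; no such command exists in AtoCore, and the `*.md` filter and directory layout are assumptions.

```python
from pathlib import Path

def stale_staged_files(source_root: Path, staging_root: Path) -> list[Path]:
    """Report staged copies that are missing or older than their originals.

    Hypothetical helper: it only illustrates why edits at the source
    do not propagate to the staging area automatically.
    """
    stale = []
    for src in source_root.rglob("*.md"):
        staged = staging_root / src.relative_to(source_root)
        if not staged.exists() or staged.stat().st_mtime < src.stat().st_mtime:
            stale.append(staged)
    return stale
```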
## Intended Daily Operating Model
The target workflow is:


@@ -84,11 +84,12 @@ The Dalidou instance already contains:
- `p05-interferometer`
- `p06-polisher`
Current live stats after the latest documentation sync and active-project ingest
passes:
- `source_documents`: 34
- `source_chunks`: 550
- `vectors`: 550
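A quick sanity check on stats like these is that the chunk count should match the vector count, since each chunk gets one embedding. The sketch below assumes the table names used in this document; the vector side lives in the vector index (e.g. Chroma) and is not counted here.

```python
import sqlite3

def corpus_stats(conn: sqlite3.Connection) -> dict[str, int]:
    """Count documents and chunks in the machine store.

    Table names are assumptions mirroring this document, not the
    verified AtoCore schema. The chunk count should equal the vector
    count held in the vector index.
    """
    docs = conn.execute("SELECT COUNT(*) FROM source_documents").fetchone()[0]
    chunks = conn.execute("SELECT COUNT(*) FROM source_chunks").fetchone()[0]
    return {"source_documents": docs, "source_chunks": chunks}
```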
The broader long-term corpus is not yet fully populated. Wider project and
vault ingestion remains a deliberate next step rather than something already
@@ -102,6 +103,14 @@ primarily visible under:
This staged area is now useful for review because it contains the curated
project docs that were actually ingested for the first active-project batch.
It is important to read this staged area correctly:
- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory,
trusted project state, and context-builder outputs
## What Is True On The T420
- SSH access is working
@@ -132,6 +141,8 @@ In `source_documents` / retrieval corpus:
- real project documents are now present for the same active project set
- retrieval is no longer limited to AtoCore self-knowledge only
- the current corpus is still selective rather than exhaustive
- that selectivity is intentional at this stage
In `Trusted Project State`:


@@ -26,10 +26,14 @@ AtoCore now has:
3. Continue controlled project ingestion only where the current corpus is still
thin
- a few additional anchor docs per active project
4. Define a cleaner source refresh model
- make the difference between source truth, staged inputs, and machine store
explicit
- move toward a project source registry and refresh workflow
5. Define backup and export procedures for Dalidou
- SQLite snapshot/backup strategy
- Chroma backup or rebuild policy
6. Keep deeper automatic runtime integration deferred until the read-only model
has proven value
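For the SQLite snapshot item in step 5, SQLite's online backup API is the natural fit because it produces a consistent copy even while the database is being read. A minimal sketch, with illustrative paths:

```python
import sqlite3

def snapshot_sqlite(db_path: str, backup_path: str) -> None:
    """Take a consistent snapshot of a live SQLite database.

    Uses SQLite's online backup API via sqlite3.Connection.backup,
    so it is safe while AtoCore is serving reads. Paths are examples.
    """
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    with dst:
        src.backup(dst)  # copies all pages of the main database
    dst.close()
    src.close()
```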
## Trusted State Status


@@ -91,3 +91,52 @@ Think of AtoCore as the durable external context hard drive for LLM work:
- no need to replace the normal project workflow
That is the architecture target.
## Why The Staged Markdown Exists
The staged markdown on Dalidou is a source-input layer, not the end product of
the system.
In the current deployment model:
1. selected PKM, AtoDrive, or repo docs are copied or mirrored into a Dalidou
source path
2. AtoCore ingests them
3. the machine store keeps the processed representation
4. retrieval and context building operate on that machine store
So if the staged docs look very similar to your original PKM notes, that is
expected. They are source material, not the compiled context layer itself.
## What Happens When A Source Changes
If you edit a PKM note or repo doc at the original source, AtoCore does not
pick up the change automatically yet.
The current model is refresh-based:
1. update the human-authoritative source
2. refresh or re-stage the relevant project source set on Dalidou
3. run ingestion again
4. let AtoCore update the machine representation
This is still an intermediate workflow. The long-run target is a cleaner source
registry and refresh model so that commands like `refresh p05-interferometer`
become natural and reliable.
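Step 2 of the refresh workflow, re-staging a project's source set, can be sketched as a simple mirror pass. The function name, `*.md` filter, and layout are all assumptions; the real `refresh <project>` command does not exist yet.

```python
import shutil
from pathlib import Path

def refresh_project(project: str, source_root: Path,
                    staging_root: Path) -> list[Path]:
    """Re-stage a project's source set so the next ingest pass sees it.

    Hypothetical helper covering only the re-staging step; ingestion
    itself would still run afterwards.
    """
    staged = []
    src_dir = source_root / project
    dst_dir = staging_root / project
    for src in src_dir.rglob("*.md"):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # overwrite any stale staged copy
        staged.append(dst)
    return staged
```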
## Current Scope Of Ingestion
The current project corpus is intentionally selective, not exhaustive.
For active projects, the goal right now is to ingest:
- high-value anchor docs
- strong meeting notes with real decisions
- architecture and constraints docs
- selected repo context that explains the system shape
The goal is not to dump the entire PKM or whole repo tree into AtoCore on the
first pass.
So if a project only has some curated notes and not the full project universe in
the staged area yet, that is normal for the current phase.


@@ -0,0 +1,89 @@
# AtoCore Source Refresh Model
## Purpose
This document explains how human-authored project material should flow into the
Dalidou-hosted AtoCore machine store.
It exists to make one distinction explicit:
- source markdown is not the same thing as the machine-memory layer
- source refresh is how changes in PKM or repos become visible to AtoCore
## Current Model
Today, the flow is:
1. human-authoritative project material exists in PKM, AtoDrive, and repos
2. selected high-value files are staged into Dalidou source paths
3. AtoCore ingests those source files
4. AtoCore stores the processed representation in:
- document records
- chunks
- vectors
- project memory
- trusted project state
5. retrieval and context assembly use the machine store, not the staged folder
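Step 3 of the flow, turning a staged document into chunks, can be sketched with a simple paragraph-packing splitter. AtoCore's real chunking strategy and parameters are not documented here; this is only an illustration of the document-to-chunks step.

```python
def chunk_document(text: str, max_chars: int = 400) -> list[str]:
    """Split one staged document into ingestion chunks.

    Illustrative only: packs paragraphs up to a size limit; a paragraph
    longer than the limit becomes its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```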
## Why This Feels Redundant
The staged source files can look almost identical to the original PKM notes or
repo docs because they are still source material.
That is expected.
The staged source area exists because the canonical AtoCore instance on Dalidou
needs a server-visible path to ingest from.
## What Happens When A Project Source Changes
If you edit a note in PKM or a doc in a repo:
- the original source changes immediately
- the staged Dalidou copy does not change automatically
- the AtoCore machine store also does not change automatically
To refresh AtoCore:
1. select the updated project source set
2. copy or mirror the new version into the Dalidou source area
3. run ingestion again
4. verify that retrieval and context reflect the new material
## Current Intentional Limits
The current active-project ingestion strategy is selective.
That means:
- not every note from a project is staged
- not every repo file is staged
- the goal is to start with high-value anchor docs
- broader ingestion comes later if needed
This is why the staged source area for a project may look partial or uneven at
this stage.
## Long-Run Target
The long-run workflow should become much more natural:
- each project has a registered source map
- PKM root
- AtoDrive root
- repo root
- preferred docs
- excluded noisy paths
- a command like `refresh p06-polisher` resolves the right sources
- AtoCore refreshes the machine representation cleanly
- OpenClaw consumes the improved context over API
## Healthy Mental Model
Use this distinction:
- PKM / AtoDrive / repos = human-authoritative sources
- staged Dalidou markdown = server-visible ingestion inputs
- AtoCore DB/vector state = compiled machine context layer
That separation is intentional and healthy.