Clarify source staging and refresh model

2026-04-06 07:53:18 -04:00
parent 82c7535d15
commit 0f95415530
5 changed files with 192 additions and 6 deletions
--- a/docs/atocore-ecosystem-and-hosting.md
+++ b/docs/atocore-ecosystem-and-hosting.md
@@ -49,6 +49,31 @@ separate.
 The machine database is derived operational state, not the primary
 human-readable source of truth.
 ## Source Snapshot Vs Machine Store
 The human-readable files visible under `sources/vault` or `sources/drive` are
 not the final "smart storage" format of AtoCore.
 They are source snapshots made visible to the canonical Dalidou instance so
 AtoCore can ingest them.
 The actual machine-processed state lives in:
 - `source_documents`
 - `source_chunks`
 - vector embeddings and indexes
 - project memories
 - trusted project state
 - context-builder output
 This means the staged markdown can still look very similar to the original PKM
 or repo docs. That is normal.
 The intelligence does not come from rewriting everything into a new markdown
 vault. It comes from ingesting selected source material into the machine store
 and then using that store for retrieval, trust-aware context assembly, and
 memory.
 ## Canonical Hosting Model
 Dalidou is the canonical host for the AtoCore service and machine database.
@@ -86,6 +111,14 @@ file replication of the live machine store.
 - OpenClaw must continue to work if AtoCore is unavailable
 - write-back from OpenClaw into AtoCore is deferred until later phases
 Current staging behavior:
 - selected project docs may be copied into a readable staging area on Dalidou
 - AtoCore ingests from that staging area into the machine store
 - the staging area is not itself the durable intelligence layer
 - changes to the original PKM or repo source do not propagate automatically
  until a refresh or re-ingest happens
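Because changes do not propagate automatically, it helps to be able to see which staged files have drifted from their originals. The following is a minimal sketch, not part of AtoCore: the function names and the assumption that the staged tree mirrors the source tree's relative paths are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_staged_files(source_root: Path, staged_root: Path) -> list[str]:
    """List staged markdown files whose original has changed or vanished.

    Only files already present in the staging area are checked, because
    staging is selective: an unstaged original is not considered stale.
    Assumes (hypothetically) that staged paths mirror source-relative paths.
    """
    stale = []
    for staged in sorted(staged_root.rglob("*.md")):
        rel = staged.relative_to(staged_root)
        original = source_root / rel
        if not original.is_file() or file_digest(original) != file_digest(staged):
            stale.append(rel.as_posix())
    return stale
```

A non-empty result would indicate that a refresh or re-ingest is due for those files.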
 ## Intended Daily Operating Model
 The target workflow is:
--- a/docs/current-state.md
+++ b/docs/current-state.md
@@ -84,11 +84,12 @@ The Dalidou instance already contains:
  - `p05-interferometer`
  - `p06-polisher`
-Current live stats after the first active-project ingest pass:
+Current live stats after the latest documentation sync and active-project ingest
 passes:
-- `source_documents`: 33
+- `source_documents`: 34
-- `source_chunks`: 535
+- `source_chunks`: 550
-- `vectors`: 535
+- `vectors`: 550
 The broader long-term corpus is not fully populated yet. Wider project and
 vault ingestion remains a deliberate next step rather than something already
@@ -102,6 +103,14 @@ primarily visible under:
 This staged area is now useful for review because it contains the curated
 project docs that were actually ingested for the first active-project batch.
 It is important to read this staged area correctly:
 - it is a readable ingestion input layer
 - it is not the final machine-memory representation itself
 - seeing familiar PKM-style notes there is expected
 - the machine-processed intelligence lives in the DB, chunks, vectors, memory,
  trusted project state, and context-builder outputs
 ## What Is True On The T420
 - SSH access is working
@@ -132,6 +141,8 @@ In `source_documents` / retrieval corpus:
 - real project documents are now present for the same active project set
 - retrieval is no longer limited to AtoCore self-knowledge only
 - the current corpus is still selective rather than exhaustive
 - that selectivity is intentional at this stage
 In `Trusted Project State`:
--- a/docs/next-steps.md
+++ b/docs/next-steps.md
@@ -26,10 +26,14 @@ AtoCore now has:
 3. Continue controlled project ingestion only where the current corpus is still
   thin
   - a few additional anchor docs per active project
-4. Define backup and export procedures for Dalidou
+4. Define a cleaner source refresh model
   - make the difference between source truth, staged inputs, and machine store
     explicit
   - move toward a project source registry and refresh workflow
 5. Define backup and export procedures for Dalidou
   - SQLite snapshot/backup strategy
   - Chroma backup or rebuild policy
-5. Keep deeper automatic runtime integration deferred until the read-only model
+6. Keep deeper automatic runtime integration deferred until the read-only model
   has proven value
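For the SQLite part of step 5, one possible starting point is Python's built-in online backup API. This is only a sketch under assumptions: the function name and the database paths are hypothetical, and the actual backup procedure for Dalidou is still to be defined.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(db_path: Path, snapshot_path: Path) -> None:
    """Copy a SQLite database to snapshot_path using the online backup API."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(snapshot_path)
    try:
        # Connection.backup produces a consistent copy even if the
        # source database is in use by another connection.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

The Chroma side is not covered here; whether to back it up or rebuild it from the SQLite state is the policy question listed above.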
 ## Trusted State Status
--- a/docs/operating-model.md
+++ b/docs/operating-model.md
@@ -91,3 +91,52 @@ Think of AtoCore as the durable external context hard drive for LLM work:
 - no need to replace the normal project workflow
 That is the architecture target.
 ## Why The Staged Markdown Exists
 The staged markdown on Dalidou is a source-input layer, not the end product of
 the system.
 In the current deployment model:
 1. selected PKM, AtoDrive, or repo docs are copied or mirrored into a Dalidou
   source path
 2. AtoCore ingests them
 3. the machine store keeps the processed representation
 4. retrieval and context building operate on that machine store
 So if the staged docs look very similar to your original PKM notes, that is
 expected. They are source material, not the compiled context layer itself.
 ## What Happens When A Source Changes
 If you edit a PKM note or repo doc at the original source, AtoCore does not
 yet pick up the change automatically.
 The current model is refresh-based:
 1. update the human-authoritative source
 2. refresh or re-stage the relevant project source set on Dalidou
 3. run ingestion again
 4. let AtoCore update the machine representation
 This is still an intermediate workflow. The long-run target is a cleaner source
 registry and refresh model so that commands like `refresh p05-interferometer`
 become natural and reliable.
 ## Current Scope Of Ingestion
 The current project corpus is intentionally selective, not exhaustive.
 For active projects, the goal right now is to ingest:
 - high-value anchor docs
 - strong meeting notes with real decisions
 - architecture and constraints docs
 - selected repo context that explains the system shape
 The goal is not to dump the entire PKM or whole repo tree into AtoCore on the
 first pass.
 So if a project only has some curated notes and not the full project universe in
 the staged area yet, that is normal for the current phase.
--- a/docs/source-refresh-model.md
+++ b/docs/source-refresh-model.md
@@ -0,0 +1,89 @@
 # AtoCore Source Refresh Model
 ## Purpose
 This document explains how human-authored project material should flow into the
 Dalidou-hosted AtoCore machine store.
 It exists to make one distinction explicit:
 - source markdown is not the same thing as the machine-memory layer
 - source refresh is how changes in PKM or repos become visible to AtoCore
 ## Current Model
 Today, the flow is:
 1. human-authoritative project material exists in PKM, AtoDrive, and repos
 2. selected high-value files are staged into Dalidou source paths
 3. AtoCore ingests those source files
 4. AtoCore stores the processed representation in:
   - document records
   - chunks
   - vectors
   - project memory
   - trusted project state
 5. retrieval and context assembly use the machine store, not the staged folder
 ## Why This Feels Redundant
 The staged source files can look almost identical to the original PKM notes or
 repo docs because they are still source material.
 That is expected.
 The staged source area exists because the canonical AtoCore instance on Dalidou
 needs a server-visible path to ingest from.
 ## What Happens When A Project Source Changes
 If you edit a note in PKM or a doc in a repo:
 - the original source changes immediately
 - the staged Dalidou copy does not change automatically
 - the AtoCore machine store also does not change automatically
 To refresh AtoCore:
 1. select the updated project source set
 2. copy or mirror the new version into the Dalidou source area
 3. run ingestion again
 4. verify that retrieval and context reflect the new material
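The refresh steps above can be sketched roughly as follows. This is an illustration under assumptions, not AtoCore's actual tooling: `refresh_project` and its parameters are hypothetical, and `ingest` is a stand-in for whatever the real ingestion entry point is.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable


def refresh_project(
    source_root: Path,
    staged_root: Path,
    selected: Iterable[str],
    ingest: Callable[[Path], None],
) -> list[str]:
    """Re-stage the selected files and re-run ingestion on each one.

    `selected` holds paths relative to source_root (step 1). `ingest` is a
    hypothetical callback standing in for the real ingestion entry point.
    """
    refreshed = []
    for rel in selected:
        target = staged_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_root / rel, target)  # step 2: copy or mirror
        ingest(target)                           # step 3: run ingestion again
        refreshed.append(rel)
    # Step 4 is left to the caller: verify retrieval for the returned files.
    return refreshed
```

Returning the list of refreshed paths keeps the verification step explicit rather than assuming ingestion succeeded.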
 ## Current Intentional Limits
 The current active-project ingestion strategy is selective.
 That means:
 - not every note from a project is staged
 - not every repo file is staged
 - the goal is to start with high-value anchor docs
 - broader ingestion comes later if needed
 This is why the staged source area for a project may look partial or uneven at
 this stage.
 ## Long-Run Target
 The long-run workflow should become much more natural:
 - each project has a registered source map
  - PKM root
  - AtoDrive root
  - repo root
  - preferred docs
  - excluded noisy paths
 - a command like `refresh p06-polisher` resolves the right sources
 - AtoCore refreshes the machine representation cleanly
 - OpenClaw consumes the improved context over API
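A registered source map of this kind might look something like the sketch below. All names here (`SourceMap`, its fields, and `resolve`) are hypothetical, and the use of glob patterns for preferred docs and excluded paths is an assumption rather than a decided design.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from pathlib import Path


@dataclass
class SourceMap:
    """Hypothetical registry entry describing one project's sources."""

    project: str
    roots: dict[str, Path]  # e.g. {"pkm": ..., "atodrive": ..., "repo": ...}
    preferred: list[str] = field(default_factory=list)  # glob patterns to include
    excluded: list[str] = field(default_factory=list)   # glob patterns to skip

    def resolve(self) -> list[Path]:
        """Return files under the roots that match a preferred pattern
        and do not match any excluded pattern."""
        found = []
        for root in self.roots.values():
            for path in sorted(root.rglob("*")):
                if not path.is_file():
                    continue
                rel = path.relative_to(root).as_posix()
                if any(fnmatch(rel, pat) for pat in self.excluded):
                    continue
                if any(fnmatch(rel, pat) for pat in self.preferred):
                    found.append(path)
        return found
```

With something like this in place, a `refresh p06-polisher` command would only need to look up the project's entry and stage whatever `resolve()` returns.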
 ## Healthy Mental Model
 Use this distinction:
 - PKM / AtoDrive / repos = human-authoritative sources
 - staged Dalidou markdown = server-visible ingestion inputs
 - AtoCore DB/vector state = compiled machine context layer
 That separation is intentional and healthy.