Clarify source staging and refresh model
@@ -49,6 +49,31 @@ separate.
The machine database is derived operational state, not the primary
human-readable source of truth.

## Source Snapshot Vs Machine Store

The human-readable files visible under `sources/vault` or `sources/drive` are
not the final "smart storage" format of AtoCore.

They are source snapshots made visible to the canonical Dalidou instance so
AtoCore can ingest them.

The actual machine-processed state lives in:

- `source_documents`
- `source_chunks`
- vector embeddings and indexes
- project memories
- trusted project state
- context-builder output

This means the staged markdown can still look very similar to the original PKM
or repo docs. That is normal.

The intelligence does not come from rewriting everything into a new markdown
vault. It comes from ingesting selected source material into the machine store
and then using that store for retrieval, trust-aware context assembly, and
memory.

## Canonical Hosting Model

Dalidou is the canonical host for the AtoCore service and machine database.
@@ -86,6 +111,14 @@ file replication of the live machine store.
- OpenClaw must continue to work if AtoCore is unavailable
- write-back from OpenClaw into AtoCore is deferred until later phases

Current staging behavior:

- selected project docs may be copied into a readable staging area on Dalidou
- AtoCore ingests from that staging area into the machine store
- the staging area is not itself the durable intelligence layer
- changes to the original PKM or repo source do not propagate automatically
  until a refresh or re-ingest happens
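
The staging behavior listed above can be sketched as a plain file copy that preserves relative paths. This is a minimal illustration only: `stage_files` and its parameters are hypothetical names, not part of AtoCore, and the real staging step may work differently (for example by mirroring rather than copying).

```python
import shutil
from pathlib import Path


def stage_files(source_root: Path, staging_root: Path, relative_paths: list[str]) -> list[Path]:
    """Copy selected files into a staging area, preserving their relative paths.

    Hypothetical helper: the names and layout are assumptions for illustration.
    """
    staged = []
    for rel in relative_paths:
        src = source_root / rel
        dst = staging_root / rel
        # Create intermediate folders so the staged tree mirrors the source tree.
        dst.parent.mkdir(parents=True, exist_ok=True)
        # copy2 copies file contents and also tries to preserve timestamps.
        shutil.copy2(src, dst)
        staged.append(dst)
    return staged
```

Because the result is an independent copy, a later edit to the original file leaves the staged file untouched until the copy is run again, which matches the last bullet in the list.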

## Intended Daily Operating Model

The target workflow is:
@@ -84,11 +84,12 @@ The Dalidou instance already contains:
 - `p05-interferometer`
 - `p06-polisher`

-Current live stats after the first active-project ingest pass:
+Current live stats after the latest documentation sync and active-project ingest
+passes:

-- `source_documents`: 33
-- `source_chunks`: 535
-- `vectors`: 535
+- `source_documents`: 34
+- `source_chunks`: 550
+- `vectors`: 550

 The broader long-term corpus is still not fully populated yet. Wider project and
 vault ingestion remains a deliberate next step rather than something already
@@ -102,6 +103,14 @@ primarily visible under:
This staged area is now useful for review because it contains the curated
project docs that were actually ingested for the first active-project batch.

It is important to read this staged area correctly:

- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory,
  trusted project state, and context-builder outputs

## What Is True On The T420

- SSH access is working
@@ -132,6 +141,8 @@ In `source_documents` / retrieval corpus:
- real project documents are now present for the same active project set
- retrieval is no longer limited to AtoCore self-knowledge only
- the current corpus is still selective rather than exhaustive
- that selectivity is intentional at this stage

In `Trusted Project State`:
@@ -26,10 +26,14 @@ AtoCore now has:
 3. Continue controlled project ingestion only where the current corpus is still
    thin
    - a few additional anchor docs per active project
-4. Define backup and export procedures for Dalidou
+4. Define a cleaner source refresh model
+   - make the difference between source truth, staged inputs, and machine store
+     explicit
+   - move toward a project source registry and refresh workflow
+5. Define backup and export procedures for Dalidou
    - SQLite snapshot/backup strategy
    - Chroma backup or rebuild policy
-5. Keep deeper automatic runtime integration deferred until the read-only model
+6. Keep deeper automatic runtime integration deferred until the read-only model
    has proven value
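
For the "SQLite snapshot/backup strategy" item in the list above, one possible starting point is the online backup API in Python's standard `sqlite3` module. This is a sketch under assumptions: `snapshot_sqlite` is a hypothetical helper, and the actual database location on Dalidou is not stated in these docs.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(db_path: Path, snapshot_path: Path) -> Path:
    """Write a consistent copy of a SQLite database to snapshot_path.

    Hypothetical helper; the paths are supplied by the caller.
    """
    snapshot_path.parent.mkdir(parents=True, exist_ok=True)
    src = sqlite3.connect(str(db_path))
    dst = sqlite3.connect(str(snapshot_path))
    try:
        # Connection.backup copies the database page by page and is safe to
        # run while other connections are using the source database.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return snapshot_path
```

A snapshot taken this way covers only the SQLite side. The vector store would still need its own backup or a rebuild from the stored chunks, as the Chroma item in the list notes.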

 ## Trusted State Status
@@ -91,3 +91,52 @@ Think of AtoCore as the durable external context hard drive for LLM work:
- no need to replace the normal project workflow

That is the architecture target.

## Why The Staged Markdown Exists

The staged markdown on Dalidou is a source-input layer, not the end product of
the system.

In the current deployment model:

1. selected PKM, AtoDrive, or repo docs are copied or mirrored into a Dalidou
   source path
2. AtoCore ingests them
3. the machine store keeps the processed representation
4. retrieval and context building operate on that machine store

So if the staged docs look very similar to your original PKM notes, that is
expected. They are source material, not the compiled context layer itself.

## What Happens When A Source Changes

If you edit a PKM note or repo doc at the original source, AtoCore does not
magically know yet.

The current model is refresh-based:

1. update the human-authoritative source
2. refresh or re-stage the relevant project source set on Dalidou
3. run ingestion again
4. let AtoCore update the machine representation

This is still an intermediate workflow. The long-run target is a cleaner source
registry and refresh model so that commands like `refresh p05-interferometer`
become natural and reliable.
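
Since nothing propagates automatically, it helps to know which staged files have drifted from their sources before re-staging. The sketch below compares SHA-256 digests of each staged file against the file at the same relative path under the source root. `find_stale` is a hypothetical helper, and the assumption that staged files keep the same relative paths as their sources is mine, not something the docs state.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_stale(source_root: Path, staging_root: Path) -> list[str]:
    """List staged files (as relative paths) whose source is missing or differs.

    Assumes staged files sit at the same relative path as their source.
    """
    stale = []
    for staged in sorted(staging_root.rglob("*")):
        if not staged.is_file():
            continue
        rel = staged.relative_to(staging_root)
        source = source_root / rel
        # A missing source or a different digest both mean the staged copy
        # no longer reflects the human-authoritative version.
        if not source.is_file() or file_digest(source) != file_digest(staged):
            stale.append(rel.as_posix())
    return stale
```

An empty result means the staged set still matches its sources, so steps 2 to 4 of the refresh can be skipped for that project.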

## Current Scope Of Ingestion

The current project corpus is intentionally selective, not exhaustive.

For active projects, the goal right now is to ingest:

- high-value anchor docs
- strong meeting notes with real decisions
- architecture and constraints docs
- selected repo context that explains the system shape

The goal is not to dump the entire PKM or whole repo tree into AtoCore on the
first pass.

So if a project only has some curated notes and not the full project universe in
the staged area yet, that is normal for the current phase.
89
docs/source-refresh-model.md
Normal file
@@ -0,0 +1,89 @@
# AtoCore Source Refresh Model

## Purpose

This document explains how human-authored project material should flow into the
Dalidou-hosted AtoCore machine store.

It exists to make one distinction explicit:

- source markdown is not the same thing as the machine-memory layer
- source refresh is how changes in PKM or repos become visible to AtoCore

## Current Model

Today, the flow is:

1. human-authoritative project material exists in PKM, AtoDrive, and repos
2. selected high-value files are staged into Dalidou source paths
3. AtoCore ingests those source files
4. AtoCore stores the processed representation in:
   - document records
   - chunks
   - vectors
   - project memory
   - trusted project state
5. retrieval and context assembly use the machine store, not the staged folder
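
To make the difference between a staged file and its processed representation concrete, here is a toy version of the "chunks" step from the flow above. It is not AtoCore's chunker: `SourceChunk`, `chunk_markdown`, and the paragraph-grouping rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceChunk:
    """Hypothetical chunk record; the field names are illustrative only."""
    document_path: str
    index: int
    text: str


def chunk_markdown(document_path: str, text: str, max_chars: int = 800) -> list[SourceChunk]:
    """Group blank-line-separated paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[SourceChunk] = []
    current: list[str] = []
    size = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Close the current chunk when adding this paragraph would exceed the limit.
        if current and size + len(para) > max_chars:
            chunks.append(SourceChunk(document_path, len(chunks), "\n\n".join(current)))
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        chunks.append(SourceChunk(document_path, len(chunks), "\n\n".join(current)))
    return chunks
```

Even in this simplified form, the output is a set of indexed records derived from the file rather than the file itself, which is why editing the staged markdown alone changes nothing until ingestion runs again.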

## Why This Feels Redundant

The staged source files can look almost identical to the original PKM notes or
repo docs because they are still source material.

That is expected.

The staged source area exists because the canonical AtoCore instance on Dalidou
needs a server-visible path to ingest from.

## What Happens When A Project Source Changes

If you edit a note in PKM or a doc in a repo:

- the original source changes immediately
- the staged Dalidou copy does not change automatically
- the AtoCore machine store also does not change automatically

To refresh AtoCore:

1. select the updated project source set
2. copy or mirror the new version into the Dalidou source area
3. run ingestion again
4. verify that retrieval and context reflect the new material
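
The four refresh steps above can be written as one small function. Ingestion and verification are passed in as callables because this document does not describe AtoCore's ingestion or retrieval interfaces. `refresh_project`, `ingest`, and `verify` are hypothetical names, so treat this as a sketch of the sequence rather than a real entry point.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable


def refresh_project(
    source_root: Path,
    staging_root: Path,
    selected: Iterable[str],
    ingest: Callable[[Path], None],
    verify: Callable[[], bool],
) -> bool:
    """Run the manual refresh sequence once and report whether verification passed.

    `ingest` and `verify` are placeholders for whatever AtoCore actually exposes.
    """
    # Steps 1 and 2: copy the selected source set into the staging area.
    for rel in selected:
        dst = staging_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_root / rel, dst)
    # Step 3: run ingestion against the refreshed staging area.
    ingest(staging_root)
    # Step 4: confirm that retrieval and context reflect the new material.
    return verify()
```

Keeping verification as an explicit return value makes a failed refresh visible to the caller, so a stale machine store does not go unnoticed.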

## Current Intentional Limits

The current active-project ingestion strategy is selective.

That means:

- not every note from a project is staged
- not every repo file is staged
- the goal is to start with high-value anchor docs
- broader ingestion comes later if needed

This is why the staged source area for a project may look partial or uneven at
this stage.

## Long-Run Target

The long-run workflow should become much more natural:

- each project has a registered source map
  - PKM root
  - AtoDrive root
  - repo root
  - preferred docs
  - excluded noisy paths
- a command like `refresh p06-polisher` resolves the right sources
- AtoCore refreshes the machine representation cleanly
- OpenClaw consumes the improved context over API
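
One way to picture the registered source map from the list above is a small record per project with its roots, preferred patterns, and exclusions. No such registry exists yet according to this document, so `ProjectSourceMap`, `REGISTRY`, `resolve_sources`, and the glob-style patterns are all hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from pathlib import Path


@dataclass
class ProjectSourceMap:
    """Hypothetical registry entry describing where a project's sources live."""
    project: str
    roots: dict[str, Path]  # e.g. keys "pkm", "atodrive", "repo"
    preferred: list[str] = field(default_factory=lambda: ["*.md"])
    excluded: list[str] = field(default_factory=list)

    def resolve(self) -> list[Path]:
        """Return files under each root that match `preferred` and not `excluded`."""
        selected = []
        for root in self.roots.values():
            for path in sorted(root.rglob("*")):
                if not path.is_file():
                    continue
                rel = path.relative_to(root).as_posix()
                # Exclusions win over preferences so noisy paths stay out.
                if any(fnmatch(rel, pat) for pat in self.excluded):
                    continue
                if any(fnmatch(rel, pat) for pat in self.preferred):
                    selected.append(path)
        return selected


# Hypothetical in-memory registry keyed by project id.
REGISTRY: dict[str, ProjectSourceMap] = {}


def resolve_sources(project: str) -> list[Path]:
    """Look up a project's source map and resolve the files to stage."""
    return REGISTRY[project].resolve()
```

With an entry registered under `p06-polisher`, a command like `refresh p06-polisher` could call `resolve_sources("p06-polisher")` to get the file set, then stage and ingest it.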

## Healthy Mental Model

Use this distinction:

- PKM / AtoDrive / repos = human-authoritative sources
- staged Dalidou markdown = server-visible ingestion inputs
- AtoCore DB/vector state = compiled machine context layer

That separation is intentional and healthy.