diff --git a/docs/atocore-ecosystem-and-hosting.md b/docs/atocore-ecosystem-and-hosting.md
index 29531d7..2276399 100644
--- a/docs/atocore-ecosystem-and-hosting.md
+++ b/docs/atocore-ecosystem-and-hosting.md
@@ -49,6 +49,31 @@ separate. The machine database is derived
 operational state, not the primary
 human-readable source of truth.
 
+## Source Snapshot Vs Machine Store
+
+The human-readable files visible under `sources/vault` or `sources/drive` are
+not the final "smart storage" format of AtoCore.
+
+They are source snapshots made visible to the canonical Dalidou instance so
+AtoCore can ingest them.
+
+The actual machine-processed state lives in:
+
+- `source_documents`
+- `source_chunks`
+- vector embeddings and indexes
+- project memories
+- trusted project state
+- context-builder output
+
+This means the staged markdown can still look very similar to the original PKM
+or repo docs. That is normal.
+
+The intelligence does not come from rewriting everything into a new markdown
+vault. It comes from ingesting selected source material into the machine store
+and then using that store for retrieval, trust-aware context assembly, and
+memory.
+
 ## Canonical Hosting Model
 
 Dalidou is the canonical host for the AtoCore service and machine database.
@@ -86,6 +111,14 @@ file replication of the live machine store.
 - OpenClaw must continue to work if AtoCore is unavailable
 - write-back from OpenClaw into AtoCore is deferred until later phases
 
+Current staging behavior:
+
+- selected project docs may be copied into a readable staging area on Dalidou
+- AtoCore ingests from that staging area into the machine store
+- the staging area is not itself the durable intelligence layer
+- changes to the original PKM or repo source do not propagate automatically
+  until a refresh or re-ingest happens
+
 ## Intended Daily Operating Model
 
 The target workflow is:
diff --git a/docs/current-state.md b/docs/current-state.md
index 3db5cff..a859d1c 100644
--- a/docs/current-state.md
+++ b/docs/current-state.md
@@ -84,11 +84,12 @@ The Dalidou instance already contains:
 - `p05-interferometer`
 - `p06-polisher`
 
-Current live stats after the first active-project ingest pass:
+Current live stats after the latest documentation sync and active-project ingest
+passes:
 
-- `source_documents`: 33
-- `source_chunks`: 535
-- `vectors`: 535
+- `source_documents`: 34
+- `source_chunks`: 550
+- `vectors`: 550
 
 The broader long-term corpus is still not fully populated yet. Wider project and
 vault ingestion remains a deliberate next step rather than something already
@@ -102,6 +103,14 @@ primarily visible under:
 
 This staged area is now useful for review because it contains the curated
 project docs that were actually ingested for the first active-project batch.
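For review purposes, the live stats above can be read straight out of the
machine store rather than counted by hand. A minimal sketch, assuming a SQLite
store with the `source_documents` and `source_chunks` tables named in these
docs (the real AtoCore schema is not specified here):

```python
import sqlite3

def corpus_stats(conn: sqlite3.Connection) -> dict:
    """Count rows in the document and chunk tables of the machine store.

    Table names follow these docs; the actual AtoCore schema may differ.
    """
    stats = {}
    for table in ("source_documents", "source_chunks"):
        stats[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return stats

# Demo against a throwaway in-memory store, not the real Dalidou DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_documents (id INTEGER PRIMARY KEY, path TEXT)")
conn.execute("CREATE TABLE source_chunks (id INTEGER PRIMARY KEY, doc_id INTEGER, text TEXT)")
conn.execute("INSERT INTO source_documents (path) VALUES ('p05/overview.md')")
conn.executemany(
    "INSERT INTO source_chunks (doc_id, text) VALUES (?, ?)",
    [(1, "chunk a"), (1, "chunk b")],
)
print(corpus_stats(conn))
```

Against the real store, the same two queries are what would confirm numbers
like 34 documents and 550 chunks after an ingest pass.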
+It is important to read this staged area correctly:
+
+- it is a readable ingestion input layer
+- it is not the final machine-memory representation itself
+- seeing familiar PKM-style notes there is expected
+- the machine-processed intelligence lives in the DB, chunks, vectors, memory,
+  trusted project state, and context-builder outputs
+
 ## What Is True On The T420
 
 - SSH access is working
@@ -132,6 +141,8 @@ In `source_documents` / retrieval corpus:
 
 - real project documents are now present for the same active project set
 - retrieval is no longer limited to AtoCore self-knowledge only
+- the current corpus is still selective rather than exhaustive
+- that selectivity is intentional at this stage
 
 In `Trusted Project State`:
 
diff --git a/docs/next-steps.md b/docs/next-steps.md
index 25d78fb..0ca7b72 100644
--- a/docs/next-steps.md
+++ b/docs/next-steps.md
@@ -26,10 +26,14 @@ AtoCore now has:
 3. Continue controlled project ingestion only where the current corpus is still
    thin
    - a few additional anchor docs per active project
-4. Define backup and export procedures for Dalidou
+4. Define a cleaner source refresh model
+   - make the difference between source truth, staged inputs, and machine store
+     explicit
+   - move toward a project source registry and refresh workflow
+5. Define backup and export procedures for Dalidou
    - SQLite snapshot/backup strategy
    - Chroma backup or rebuild policy
-5. Keep deeper automatic runtime integration deferred until the read-only model
+6. Keep deeper automatic runtime integration deferred until the read-only model
    has proven value
 
 ## Trusted State Status
diff --git a/docs/operating-model.md b/docs/operating-model.md
index 16ece2e..25b043d 100644
--- a/docs/operating-model.md
+++ b/docs/operating-model.md
@@ -91,3 +91,52 @@ Think of AtoCore as the durable external context hard drive for LLM work:
 - no need to replace the normal project workflow
 
 That is the architecture target.
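The SQLite snapshot/backup item in the next steps can be sketched with
Python's standard online backup API, which copies a live database safely even
while the service keeps it open. The paths below are placeholders, not
Dalidou's real layout:

```python
import os
import sqlite3
import tempfile

def snapshot_sqlite(src_path: str, dest_path: str) -> None:
    """Snapshot a live SQLite database using the online backup API,
    which is safe to run while another process has the DB open."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        src.close()
        dest.close()

# Demo with a throwaway database; real paths on Dalidou would differ.
tmp = tempfile.mkdtemp()
live_path = os.path.join(tmp, "atocore.db")
snap_path = os.path.join(tmp, "atocore.snapshot.db")

conn = sqlite3.connect(live_path)
conn.execute("CREATE TABLE source_documents (id INTEGER PRIMARY KEY, path TEXT)")
conn.commit()
conn.close()

snapshot_sqlite(live_path, snap_path)

# Verify the snapshot carries the same schema.
check = sqlite3.connect(snap_path)
tables = [row[0] for row in check.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
check.close()
print(tables)
```

The Chroma side would need its own backup or rebuild policy; vectors can
always be regenerated from the SQLite chunks if re-embedding is acceptable.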
+ +## Why The Staged Markdown Exists + +The staged markdown on Dalidou is a source-input layer, not the end product of +the system. + +In the current deployment model: + +1. selected PKM, AtoDrive, or repo docs are copied or mirrored into a Dalidou + source path +2. AtoCore ingests them +3. the machine store keeps the processed representation +4. retrieval and context building operate on that machine store + +So if the staged docs look very similar to your original PKM notes, that is +expected. They are source material, not the compiled context layer itself. + +## What Happens When A Source Changes + +If you edit a PKM note or repo doc at the original source, AtoCore does not +magically know yet. + +The current model is refresh-based: + +1. update the human-authoritative source +2. refresh or re-stage the relevant project source set on Dalidou +3. run ingestion again +4. let AtoCore update the machine representation + +This is still an intermediate workflow. The long-run target is a cleaner source +registry and refresh model so that commands like `refresh p05-interferometer` +become natural and reliable. + +## Current Scope Of Ingestion + +The current project corpus is intentionally selective, not exhaustive. + +For active projects, the goal right now is to ingest: + +- high-value anchor docs +- strong meeting notes with real decisions +- architecture and constraints docs +- selected repo context that explains the system shape + +The goal is not to dump the entire PKM or whole repo tree into AtoCore on the +first pass. + +So if a project only has some curated notes and not the full project universe in +the staged area yet, that is normal for the current phase. 
diff --git a/docs/source-refresh-model.md b/docs/source-refresh-model.md
new file mode 100644
index 0000000..c7a4895
--- /dev/null
+++ b/docs/source-refresh-model.md
@@ -0,0 +1,89 @@
+# AtoCore Source Refresh Model
+
+## Purpose
+
+This document explains how human-authored project material should flow into the
+Dalidou-hosted AtoCore machine store.
+
+It exists to make one distinction explicit:
+
+- source markdown is not the same thing as the machine-memory layer
+- source refresh is how changes in PKM or repos become visible to AtoCore
+
+## Current Model
+
+Today, the flow is:
+
+1. human-authoritative project material exists in PKM, AtoDrive, and repos
+2. selected high-value files are staged into Dalidou source paths
+3. AtoCore ingests those source files
+4. AtoCore stores the processed representation in:
+   - document records
+   - chunks
+   - vectors
+   - project memory
+   - trusted project state
+5. retrieval and context assembly use the machine store, not the staged folder
+
+## Why This Feels Redundant
+
+The staged source files can look almost identical to the original PKM notes or
+repo docs because they are still source material.
+
+That is expected.
+
+The staged source area exists because the canonical AtoCore instance on Dalidou
+needs a server-visible path to ingest from.
+
+## What Happens When A Project Source Changes
+
+If you edit a note in PKM or a doc in a repo:
+
+- the original source changes immediately
+- the staged Dalidou copy does not change automatically
+- the AtoCore machine store also does not change automatically
+
+To refresh AtoCore:
+
+1. select the updated project source set
+2. copy or mirror the new version into the Dalidou source area
+3. run ingestion again
+4. verify that retrieval and context reflect the new material
+
+## Current Intentional Limits
+
+The current active-project ingestion strategy is selective.
+
+That means:
+
+- not every note from a project is staged
+- not every repo file is staged
+- the goal is to start with high-value anchor docs
+- broader ingestion comes later if needed
+
+This is why the staged source area for a project may look partial or uneven at
+this stage.
+
+## Long-Run Target
+
+The long-run workflow should become much more natural:
+
+- each project has a registered source map
+  - PKM root
+  - AtoDrive root
+  - repo root
+  - preferred docs
+  - excluded noisy paths
+- a command like `refresh p06-polisher` resolves the right sources
+- AtoCore refreshes the machine representation cleanly
+- OpenClaw consumes the improved context over API
+
+## Healthy Mental Model
+
+Use this distinction:
+
+- PKM / AtoDrive / repos = human-authoritative sources
+- staged Dalidou markdown = server-visible ingestion inputs
+- AtoCore DB/vector state = compiled machine context layer
+
+That separation is intentional and healthy.
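The registered source map could start as nothing more than a small lookup
structure. In this sketch the project id is real, but the field names and
paths are assumptions for illustration, not a defined AtoCore schema:

```python
# Hypothetical registry shape; field names and paths are illustrative.
SOURCE_REGISTRY = {
    "p06-polisher": {
        "pkm_root": "sources/vault/p06-polisher",
        "repo_root": "sources/repos/p06-polisher",
        "preferred_docs": ["architecture.md", "constraints.md"],
        "excluded_paths": ["build/", "node_modules/"],
    },
}

def resolve_sources(project_id: str) -> dict:
    """Resolve the registered source map for a project, the way a
    `refresh <project>` command would before re-staging and re-ingesting."""
    if project_id not in SOURCE_REGISTRY:
        raise KeyError(f"no registered source map for {project_id!r}")
    return SOURCE_REGISTRY[project_id]

print(resolve_sources("p06-polisher")["pkm_root"])
```

A registry like this is what would let the refresh command stay one word per
project while the staging and ingestion details evolve underneath it.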