Clarify operating model and project corpus state

commit 8a94da4bf4
parent 5069d5b1b6
2026-04-06 07:25:33 -04:00
4 changed files with 220 additions and 20 deletions


@@ -86,6 +86,24 @@ file replication of the live machine store.
- OpenClaw must continue to work if AtoCore is unavailable
- write-back from OpenClaw into AtoCore is deferred until later phases
## Intended Daily Operating Model
The target workflow is:
- the human continues to work primarily in PKM project notes, Git/Gitea repos,
Discord, and normal OpenClaw sessions
- OpenClaw keeps its own runtime behavior and memory system
- AtoCore acts as the durable external context layer that compiles trusted
project state, retrieval, and long-lived machine-readable context
- AtoCore improves prompt quality and robustness without replacing direct repo
work, direct file reads, or OpenClaw's own memory
In other words:
- PKM and repos remain the human-authoritative project sources
- OpenClaw remains the active operating environment
- AtoCore remains the compiled context engine and machine-memory host
## Current Status
As of the current implementation pass:
@@ -95,6 +113,8 @@ As of the current implementation pass:
- the service is running from Dalidou
- the T420/OpenClaw machine can reach AtoCore over network
- a first read-only OpenClaw-side helper exists
- the live corpus now includes initial AtoCore self-knowledge and a first
curated batch for active projects
- the long-term content corpus still needs broader project and vault ingestion
This means the platform is hosted on Dalidou now, the first cross-machine


@@ -5,7 +5,8 @@
AtoCore is no longer just a proof of concept. The local engine exists, the
correctness pass is complete, Dalidou now hosts the canonical runtime and
machine-storage location, and the T420/OpenClaw side now has a safe read-only
path to consume AtoCore. The live corpus is no longer just self-knowledge: it
now includes a first curated ingestion batch for the active projects.
## Phase Assessment
@@ -41,6 +42,10 @@ path to consume AtoCore.
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
- T420/OpenClaw read-only AtoCore helper skill
- first curated active-project corpus batch for:
- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`
## What Is True On Dalidou
@@ -67,10 +72,23 @@ The Dalidou instance already contains:
- Master Plan V3
- Build Spec V1
- trusted project-state entries for `atocore`
- curated staged project docs for:
- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`
- curated repo-context docs for:
- `p05`: `Fullum-Interferometer`
- `p06`: `polisher-sim`
Current live stats after the first active-project ingest pass:
- `source_documents`: 32
- `source_chunks`: 523
- `vectors`: 523
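These numbers imply a one-to-one relationship between chunks and embedding vectors. A minimal sanity check for that invariant might look like the following sketch; the stat keys mirror the list above, but the function name is an illustrative assumption, not part of AtoCore:

```python
def check_corpus_consistency(stats: dict[str, int]) -> None:
    """Sanity-check live corpus stats: every source chunk must have exactly one vector."""
    assert stats["source_chunks"] == stats["vectors"], "chunk/vector count mismatch"


# Stats reported after the first active-project ingest pass:
check_corpus_consistency({"source_documents": 32, "source_chunks": 523, "vectors": 523})
```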
The broader long-term corpus is still not fully populated yet. Wider project and
vault ingestion remains a deliberate next step rather than something already
completed, but the corpus is now meaningfully seeded beyond AtoCore's own docs.
## What Is True On The T420
@@ -81,14 +99,41 @@ completed.
- `/home/papa/clawd/skills/atocore-context/`
- the T420 can successfully reach Dalidou AtoCore over network/Tailscale
- fail-open behavior has been verified for the helper path
- OpenClaw can now seed AtoCore in two distinct ways:
- project-scoped memory entries
- staged document ingestion into the retrieval corpus
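The verified fail-open behavior of the read-only helper path can be sketched roughly as follows. The endpoint path, payload shape, hostname, and function name are illustrative assumptions, not the actual skill's API:

```python
import json
import urllib.request


def query_atocore(query: str, base_url: str = "http://dalidou:8000") -> list[dict]:
    """Fetch retrieval context from AtoCore; fail open (return nothing) on any error.

    The /retrieve endpoint and payload shape are illustrative assumptions.
    """
    payload = json.dumps({"query": query}).encode()
    req = urllib.request.Request(
        f"{base_url}/retrieve",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=3) as resp:
            return json.loads(resp.read()).get("results", [])
    except Exception:
        # Fail open: OpenClaw keeps working even when AtoCore is unreachable.
        return []
```

Fail-open here means a network failure degrades to an empty context block rather than blocking the OpenClaw session, which matches the constraint that OpenClaw must continue to work if AtoCore is unavailable.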
## What Exists In Memory vs Corpus
These remain separate and that is intentional.
In `/memory`:
- project-scoped curated memories now exist for:
- `p04-gigabit`: 5 memories
- `p05-interferometer`: 6 memories
- `p06-polisher`: 8 memories
These are curated summaries and extracted stable project signals.
In `source_documents` / retrieval corpus:
- real project documents are now present for the same active project set
- retrieval is no longer limited to AtoCore self-knowledge only
This separation is healthy:
- memory stores distilled project facts
- corpus stores the underlying retrievable documents
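The two stores' distinct shapes could be modeled as in this minimal sketch; the type and field names are illustrative assumptions about the data model, not AtoCore's actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """Distilled, project-scoped fact: small, curated, directly promptable."""
    project: str  # e.g. "p05-interferometer"
    fact: str     # a stable project signal, not a raw document


@dataclass
class SourceDocument:
    """Underlying retrievable document: chunked and embedded for search."""
    project: str
    path: str
    chunks: list[str] = field(default_factory=list)  # each chunk gets one vector
```

Keeping the types separate reflects the separation above: memory answers "what is currently true", while the corpus answers "where is the underlying evidence".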
## Immediate Next Focus
1. Use the new T420-side AtoCore skill in real OpenClaw workflows
2. Tighten retrieval quality for the newly seeded active projects
3. Promote only the most stable active-project facts into trusted project state
4. Define the first broader AtoVault/AtoDrive ingestion batches
5. Add backup/export strategy for Dalidou machine state
6. Only later consider deeper automatic OpenClaw integration or write-back
## Guiding Constraints


@@ -9,38 +9,67 @@ AtoCore now has:
- initial self-knowledge ingested into the live instance
- trusted project-state entries for AtoCore itself
- a first read-only OpenClaw integration path on the T420
- a first real active-project corpus batch for:
- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`
## Immediate Next Steps
1. Use the T420 `atocore-context` skill in real OpenClaw workflows
- confirm the ergonomics are good
- confirm the fail-open behavior remains acceptable in practice
2. Review retrieval quality after the first real project ingestion batch
- check whether the top hits are useful
- check whether trusted project state remains dominant
- reduce cross-project competition and prompt ambiguity where needed
3. Promote a small number of stable active-project facts into trusted project
state
- active architecture
- current selected path
- key constraints
- current next step
4. Continue controlled project ingestion only where the current corpus is still
thin
- a few additional anchor docs per active project
5. Define backup and export procedures for Dalidou
- SQLite snapshot/backup strategy
- Chroma backup or rebuild policy
6. Keep deeper automatic runtime integration deferred until the read-only model
has proven value
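The SQLite snapshot part of the backup step could use Python's built-in online-backup API, which copies a live database consistently even while the service holds it open; the paths shown are illustrative assumptions, not the real Dalidou layout, and the Chroma side would still need its own copy or rebuild-from-SQLite policy:

```python
import sqlite3


def snapshot_sqlite(db_path: str, backup_path: str) -> None:
    """Take a consistent snapshot of a live SQLite database.

    sqlite3's backup API copies safely under concurrent writers,
    unlike a plain file copy of the .db file.
    """
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


# Illustrative invocation (paths are assumptions):
# snapshot_sqlite("/srv/atocore/atocore.db", "/srv/backups/atocore-snapshot.db")
```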
## Recommended Near-Term Project Work
The first curated batch is already in.
The near-term work is now:
1. strengthen retrieval quality
2. promote the most stable facts into trusted project state
3. only then add a few more anchor docs where still needed
## Recommended Additional Anchor Docs
P04:
- 1 to 2 more strong study summaries
- 1 to 2 more meeting notes with actual decisions
P05:
- a couple more architecture docs
- selected vendor-response notes
- possibly one or two NX/WAVE consumer docs
P06:
- more explicit interface/schema docs if needed
- selected operations or UI docs
- a distilled non-empty operational context doc to replace an empty `_context.md`
## Deferred On Purpose
@@ -56,5 +85,18 @@ The next batch is successful if:
- OpenClaw can use AtoCore naturally when context is needed
- AtoCore answers correctly for the active project set
- retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
- project ingestion remains controlled rather than noisy
- the canonical Dalidou instance stays stable
## Long-Run Goal
The long-run target is:
- continue working normally inside PKM project stacks and Gitea repos
- let OpenClaw keep its own memory and runtime behavior
- let AtoCore supplement LLM work with stronger trusted context, retrieval, and
context assembly
That means AtoCore should behave like a durable external context engine and
machine-memory layer, not a replacement for normal repo work or OpenClaw memory.

docs/operating-model.md Normal file

@@ -0,0 +1,93 @@
# AtoCore Operating Model
## Purpose
This document makes the intended day-to-day operating model explicit.
The goal is not to replace how work already happens. The goal is to make that
existing workflow stronger by adding a durable context engine.
## Core Idea
Normal work continues in:
- PKM project notes
- Gitea repositories
- Discord and OpenClaw workflows
OpenClaw keeps:
- its own memory
- its own runtime and orchestration behavior
- its own workspace and direct file/repo tooling
AtoCore adds:
- trusted project state
- retrievable cross-source context
- durable machine memory
- context assembly that improves prompt quality and robustness
## Layer Responsibilities
- PKM and repos
- human-authoritative project sources
- where knowledge is created, edited, reviewed, and maintained
- OpenClaw
- active operating environment
- orchestration, direct repo work, messaging, agent workflows, local memory
- AtoCore
- compiled context engine
- durable machine-memory host
- retrieval and context assembly layer
## Why This Architecture Works
Each layer has different strengths and weaknesses.
- PKM and repos are rich but noisy and manual to search
- OpenClaw memory is useful but session-shaped and not the whole project record
- raw LLM repo work is powerful but can miss trusted broader context
- AtoCore can compile context across sources and provide a better prompt input
The result should be:
- stronger prompts
- more robust outputs
- less manual reconstruction
- better continuity across sessions and models
## What AtoCore Should Not Replace
AtoCore should not replace:
- normal file reads
- direct repo search
- direct PKM work
- OpenClaw's own memory
- OpenClaw's runtime and tool behavior
It should supplement those systems.
## What Healthy Usage Looks Like
When working on a project:
1. OpenClaw still uses local workspace/repo context
2. OpenClaw still uses its own memory
3. AtoCore adds:
- trusted current project state
- retrieved project documents
- cross-source project context
- context assembly for more robust model prompts
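The context-assembly contribution described above could be sketched as follows; the function name, section labels, and retrieval cap are illustrative assumptions about how AtoCore-supplied context might be merged into a prompt:

```python
def assemble_context(trusted_state: str, retrieved_docs: list[str], limit: int = 3) -> str:
    """Combine trusted project state with retrieved documents into one prompt block.

    Trusted state leads so it stays dominant over raw retrieval hits;
    retrieval is capped so the prompt stays focused.
    """
    sections = ["## Trusted Project State", trusted_state, "## Retrieved Context"]
    sections.extend(retrieved_docs[:limit])
    return "\n\n".join(sections)
```

Ordering trusted state first is one simple way to honor the earlier goal that trusted project state remain dominant over raw retrieval hits.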
## Practical Rule
Think of AtoCore as the durable external context hard drive for LLM work:
- fast machine-readable context
- persistent project understanding
- stronger prompt inputs
- no need to replace the normal project workflow
That is the architecture target.