Clarify operating model and project corpus state

Update current state and next steps docs
2026-04-06 07:25:33 -04:00 · 2026-04-05 19:12:45 -04:00
5 changed files with 329 additions and 13 deletions
--- a/docs/atocore-ecosystem-and-hosting.md
+++ b/docs/atocore-ecosystem-and-hosting.md
@@ -86,6 +86,24 @@ file replication of the live machine store.
 - OpenClaw must continue to work if AtoCore is unavailable
 - write-back from OpenClaw into AtoCore is deferred until later phases
 ## Intended Daily Operating Model
 The target workflow is:
 - the human continues to work primarily in PKM project notes, Git/Gitea repos,
  Discord, and normal OpenClaw sessions
 - OpenClaw keeps its own runtime behavior and memory system
 - AtoCore acts as the durable external context layer that compiles trusted
  project state, retrieval, and long-lived machine-readable context
 - AtoCore improves prompt quality and robustness without replacing direct repo
  work, direct file reads, or OpenClaw's own memory
 In other words:
 - PKM and repos remain the human-authoritative project sources
 - OpenClaw remains the active operating environment
 - AtoCore remains the compiled context engine and machine-memory host
 ## Current Status
 As of the current implementation pass:
@@ -93,8 +111,12 @@ As of the current implementation pass:
 - the AtoCore runtime is deployed on Dalidou
 - the canonical machine-data layout exists on Dalidou
 - the service is running from Dalidou
- the long-term content corpus still needs to be populated into the live
-  Dalidou instance
+- the T420/OpenClaw machine can reach AtoCore over the network
+- a first read-only OpenClaw-side helper exists
 - the live corpus now includes initial AtoCore self-knowledge and a first
  curated batch for active projects
 - the long-term content corpus still needs broader project and vault ingestion
-This means the platform is hosted on Dalidou now, while the live content corpus
-is only partially initialized and not yet fully ingested.
+This means the platform is hosted on Dalidou now, the first cross-machine
+integration path exists, and the live content corpus is partially populated but
 not yet fully ingested.
--- a/docs/current-state.md
+++ b/docs/current-state.md
@@ -3,8 +3,10 @@
 ## Status Summary
 AtoCore is no longer just a proof of concept. The local engine exists, the
-correctness pass is complete, and Dalidou now hosts the canonical runtime and
-machine-storage location.
+correctness pass is complete, Dalidou now hosts the canonical runtime and
+machine-storage location, and the T420/OpenClaw side now has a safe read-only
 path to consume AtoCore. The live corpus is no longer just self-knowledge: it
 now includes a first curated ingestion batch for the active projects.
 ## Phase Assessment
@@ -38,6 +40,12 @@ machine-storage location.
 - API routes for query, context, health, and source status
 - env-driven storage and deployment paths
 - Dalidou Docker deployment foundation
 - initial AtoCore self-knowledge corpus ingested on Dalidou
 - T420/OpenClaw read-only AtoCore helper skill
 - first curated active-project corpus batch for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
 ## What Is True On Dalidou
@@ -55,16 +63,77 @@ The service and storage foundation are live on Dalidou.
 The machine-data host is real and canonical.
-The content corpus is not fully populated yet. A fresh or near-fresh live DB is
-running there until the ingestion pipeline loads the ecosystem docs and project
-content.
+The content corpus is partially populated now.
+
+The Dalidou instance already contains:
 - AtoCore ecosystem and hosting docs
 - current-state and OpenClaw integration docs
 - Master Plan V3
 - Build Spec V1
 - trusted project-state entries for `atocore`
 - curated staged project docs for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
 - curated repo-context docs for:
  - `p05`: `Fullum-Interferometer`
  - `p06`: `polisher-sim`
 Current live stats after the first active-project ingest pass:
 - `source_documents`: 32
 - `source_chunks`: 523
 - `vectors`: 523
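The stats above show the chunk and vector counts matching (523 each). A minimal sketch of a sanity check on those numbers follows. It assumes one vector is stored per chunk, which the matching counts suggest but the docs do not state, and `check_corpus_stats` is a hypothetical helper name, not part of AtoCore.

```python
from typing import List, Mapping

REQUIRED_KEYS = ("source_documents", "source_chunks", "vectors")


def check_corpus_stats(stats: Mapping[str, int]) -> List[str]:
    """Return human-readable problems; an empty list means the stats look consistent."""
    problems = [f"missing stat: {key}" for key in REQUIRED_KEYS if key not in stats]
    if problems:
        return problems
    # Assumption: every chunk is embedded exactly once, so the counts should match.
    if stats["source_chunks"] != stats["vectors"]:
        problems.append(
            f"chunk/vector mismatch: {stats['source_chunks']} chunks "
            f"vs {stats['vectors']} vectors"
        )
    return problems


# Values copied from the stats listed above.
live_stats = {"source_documents": 32, "source_chunks": 523, "vectors": 523}
print(check_corpus_stats(live_stats))
```

A mismatch would most likely point to an interrupted ingest or an embedding step that was skipped, which is worth ruling out before judging retrieval quality.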
 The broader long-term corpus is not yet fully populated. Wider project and
 vault ingestion remains a deliberate next step, but the corpus is now
 meaningfully seeded beyond AtoCore's own docs.
 ## What Is True On The T420
 - SSH access is working
 - OpenClaw workspace inspected at `/home/papa/clawd`
 - OpenClaw's own memory system remains unchanged
 - a read-only AtoCore integration skill exists in the workspace:
  - `/home/papa/clawd/skills/atocore-context/`
 - the T420 can successfully reach Dalidou AtoCore over the network/Tailscale
 - fail-open behavior has been verified for the helper path
 - OpenClaw can now seed AtoCore in two distinct ways:
  - project-scoped memory entries
  - staged document ingestion into the retrieval corpus
 ## What Exists In Memory vs Corpus
 These remain separate, and that is intentional.
 In `/memory`:
 - project-scoped curated memories now exist for:
  - `p04-gigabit`: 5 memories
  - `p05-interferometer`: 6 memories
  - `p06-polisher`: 8 memories
 These are curated summaries and extracted stable project signals.
 In `source_documents` / retrieval corpus:
 - real project documents are now present for the same active project set
 - retrieval is no longer limited to AtoCore self-knowledge only
 This separation is healthy:
 - memory stores distilled project facts
 - corpus stores the underlying retrievable documents
 ## Immediate Next Focus
-1. Ingest AtoCore ecosystem and planning docs into the Dalidou instance
+1. Use the new T420-side AtoCore skill in real OpenClaw workflows
-2. Define the OpenClaw integration contract clearly
+2. Tighten retrieval quality for the newly seeded active projects
-3. Wire OpenClaw to consume AtoCore read-only over network
+3. Promote only the most stable active-project facts into trusted project state
-4. Ingest selected project content in a controlled way
+4. Define the first broader AtoVault/AtoDrive ingestion batches
 5. Add backup/export strategy for Dalidou machine state
 6. Only later consider deeper automatic OpenClaw integration or write-back
 ## Guiding Constraints
--- a/docs/next-steps.md
+++ b/docs/next-steps.md
@@ -0,0 +1,102 @@
 # AtoCore Next Steps
 ## Current Position
 AtoCore now has:
 - canonical runtime and machine storage on Dalidou
 - separated source and machine-data boundaries
 - initial self-knowledge ingested into the live instance
 - trusted project-state entries for AtoCore itself
 - a first read-only OpenClaw integration path on the T420
 - a first real active-project corpus batch for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
 ## Immediate Next Steps
 1. Use the T420 `atocore-context` skill in real OpenClaw workflows
   - confirm the ergonomics are good
   - confirm the fail-open behavior remains acceptable in practice
 2. Review retrieval quality after the first real project ingestion batch
   - check whether the top hits are useful
   - check whether trusted project state remains dominant
   - reduce cross-project competition and prompt ambiguity where needed
 3. Promote a small number of stable active-project facts into trusted project
   state
   - active architecture
   - current selected path
   - key constraints
   - current next step
 4. Continue controlled project ingestion only where the current corpus is still
   thin
   - a few additional anchor docs per active project
 5. Define backup and export procedures for Dalidou
   - SQLite snapshot/backup strategy
   - Chroma backup or rebuild policy
 6. Keep deeper automatic runtime integration deferred until the read-only model
   has proven value
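For step 5 above, one possible starting point for the SQLite snapshot is Python's built-in online backup API. This is only a sketch: the function name and any paths passed to it are hypothetical, and it does not cover the Chroma store, which would still need its own backup or rebuild policy.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(live_db: Path, snapshot: Path) -> None:
    """Copy a SQLite database to `snapshot` using the online backup API.

    The backup API produces a consistent copy even if the source
    database is being written to while the copy runs.
    """
    src = sqlite3.connect(live_db)
    dst = sqlite3.connect(snapshot)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

Calling `snapshot_sqlite(Path("live.db"), Path("snapshot.db"))` (placeholder file names) writes a standalone copy that can be moved off the host. Copying the raw database file while the service is running is riskier, because a copy taken mid-write can be inconsistent.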
 ## Recommended Near-Term Project Work
 The first curated batch is already in.
 The near-term work is now:
 1. strengthen retrieval quality
 2. promote the most stable facts into trusted project state
 3. only then add a few more anchor docs where still needed
 ## Recommended Additional Anchor Docs
 1. `p04-gigabit`
 2. `p05-interferometer`
 3. `p06-polisher`
 P04:
 - 1 to 2 more strong study summaries
 - 1 to 2 more meeting notes with actual decisions
 P05:
 - a couple more architecture docs
 - selected vendor-response notes
 - possibly one or two NX/WAVE consumer docs
 P06:
 - more explicit interface/schema docs if needed
 - selected operations or UI docs
 - a distilled non-empty operational context doc to replace an empty `_context.md`
 ## Deferred On Purpose
 - automatic write-back from OpenClaw into AtoCore
 - automatic memory promotion
 - reflection loop integration
 - replacing OpenClaw's own memory system
 - syncing the live machine DB between machines
 ## Success Criteria For The Next Batch
 The next batch is successful if:
 - OpenClaw can use AtoCore naturally when context is needed
 - AtoCore answers correctly for the active project set
 - retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
 - project ingestion remains controlled rather than noisy
 - the canonical Dalidou instance stays stable
 ## Long-Run Goal
 The long-run target is:
 - continue working normally inside PKM project stacks and Gitea repos
 - let OpenClaw keep its own memory and runtime behavior
 - let AtoCore supplement LLM work with stronger trusted context, retrieval, and
  context assembly
 That means AtoCore should behave like a durable external context engine and
 machine-memory layer, not a replacement for normal repo work or OpenClaw memory.
--- a/docs/openclaw-integration-contract.md
+++ b/docs/openclaw-integration-contract.md
@@ -8,6 +8,23 @@ AtoCore.
 The goal is to let OpenClaw consume AtoCore as an external context service
 without degrading OpenClaw's existing baseline behavior.
 ## Current Implemented State
 The first safe integration foundation now exists on the T420 workspace:
 - OpenClaw's own memory system is unchanged
 - a local read-only helper skill exists at:
  - `/home/papa/clawd/skills/atocore-context/`
 - the helper currently talks to the canonical Dalidou instance
 - the helper has verified:
  - `health`
  - `project-state`
  - `query`
  - fail-open fallback when AtoCore is unavailable
 This means the network and workflow foundation is working, even though deeper
 automatic integration into OpenClaw runtime behavior is still deferred.
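The fail-open fallback mentioned above can be illustrated with a short sketch. It is not the actual helper: the function name, the timeout default, and the assumption that AtoCore returns JSON over HTTP are all illustrative. The point is only that any failure to reach or parse AtoCore yields `None`, so the caller carries on without the extra context.

```python
import http.client
import json
import urllib.request
from typing import Any, Optional


def fetch_json_fail_open(url: str, timeout: float = 2.0) -> Optional[Any]:
    """Return parsed JSON from `url`, or None if anything goes wrong.

    Fail-open: an unreachable host, a timeout, an HTTP error, or a
    malformed response never raises; the caller just gets None.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are OSError subclasses;
        # JSON and Unicode decoding errors are ValueError subclasses.
        return None
```

A caller would treat `None` as "no AtoCore context available" and continue with OpenClaw's own memory and workspace context.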
 ## Integration Principles
 - OpenClaw remains the runtime and orchestration layer
@@ -50,6 +67,18 @@ OpenClaw should treat these as the initial contract:
 Additional project-state inspection can be added if needed, but the first
 integration should stay small and resilient.
 ## Current Helper Surface
 The current helper script exposes:
 - `health`
 - `sources`
 - `stats`
 - `project-state <project>`
 - `query <prompt> [top_k]`
 - `context-build <prompt> [project] [budget]`
 - `ingest-sources`
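The command surface listed above can be modelled with `argparse` subcommands, as sketched below. The real helper's implementation language and argument handling are not shown in this document, so the program name `atocore-helper`, the use of Python, and the `None` defaults for optional arguments are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the helper commands listed above."""
    parser = argparse.ArgumentParser(prog="atocore-helper")  # hypothetical name
    sub = parser.add_subparsers(dest="command", required=True)

    # Commands that take no arguments.
    for name in ("health", "sources", "stats", "ingest-sources"):
        sub.add_parser(name)

    # project-state <project>
    state = sub.add_parser("project-state")
    state.add_argument("project")

    # query <prompt> [top_k]
    query = sub.add_parser("query")
    query.add_argument("prompt")
    query.add_argument("top_k", nargs="?", type=int)

    # context-build <prompt> [project] [budget]
    ctx = sub.add_parser("context-build")
    ctx.add_argument("prompt")
    ctx.add_argument("project", nargs="?")
    ctx.add_argument("budget", nargs="?", type=int)
    return parser
```

For example, `build_parser().parse_args(["query", "polisher interface", "5"])` yields `command="query"` and `top_k=5`, while omitting the trailing argument leaves `top_k` as `None` for the caller to default.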
 ## Failure Behavior
 OpenClaw must treat AtoCore as additive.
@@ -86,6 +115,7 @@ Recommended first behavior:
 ## Deferred Work
 - deeper automatic runtime wiring inside OpenClaw itself
 - memory promotion rules
 - identity and preference write flows
 - reflection loop
--- a/docs/operating-model.md
+++ b/docs/operating-model.md
@@ -0,0 +1,93 @@
 # AtoCore Operating Model
 ## Purpose
 This document makes the intended day-to-day operating model explicit.
 The goal is not to replace how work already happens. The goal is to make that
 existing workflow stronger by adding a durable context engine.
 ## Core Idea
 Normal work continues in:
 - PKM project notes
 - Gitea repositories
 - Discord and OpenClaw workflows
 OpenClaw keeps:
 - its own memory
 - its own runtime and orchestration behavior
 - its own workspace and direct file/repo tooling
 AtoCore adds:
 - trusted project state
 - retrievable cross-source context
 - durable machine memory
 - context assembly that improves prompt quality and robustness
 ## Layer Responsibilities
 - PKM and repos
  - human-authoritative project sources
  - where knowledge is created, edited, reviewed, and maintained
 - OpenClaw
  - active operating environment
  - orchestration, direct repo work, messaging, agent workflows, local memory
 - AtoCore
  - compiled context engine
  - durable machine-memory host
  - retrieval and context assembly layer
 ## Why This Architecture Works
 Each layer has different strengths and weaknesses.
 - PKM and repos are rich but noisy and manual to search
 - OpenClaw memory is useful but session-shaped and not the whole project record
 - raw LLM repo work is powerful but can miss trusted broader context
 - AtoCore can compile context across sources and provide a better prompt input
 The result should be:
 - stronger prompts
 - more robust outputs
 - less manual reconstruction
 - better continuity across sessions and models
 ## What AtoCore Should Not Replace
 AtoCore should not replace:
 - normal file reads
 - direct repo search
 - direct PKM work
 - OpenClaw's own memory
 - OpenClaw's runtime and tool behavior
 It should supplement those systems.
 ## What Healthy Usage Looks Like
 When working on a project:
 1. OpenClaw still uses local workspace/repo context
 2. OpenClaw still uses its own memory
 3. AtoCore adds:
   - trusted current project state
   - retrieved project documents
   - cross-source project context
   - context assembly for more robust model prompts
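The layering in step 3 can be sketched as a small budgeted assembly routine: trusted project state is added first, retrieved documents fill whatever room is left. This is an illustration only. `assemble_context` is a hypothetical name, and measuring the budget in characters is an assumption, since this document does not say how AtoCore measures context budgets.

```python
from typing import List, Sequence


def assemble_context(
    trusted_state: Sequence[str],
    retrieved: Sequence[str],
    budget_chars: int,
) -> str:
    """Join context items in priority order without exceeding the budget.

    Trusted project state comes first, so it is never crowded out by
    retrieved documents. Assembly stops at the first item that does not fit.
    """
    parts: List[str] = []
    used = 0
    for section in (trusted_state, retrieved):
        for item in section:
            cost = len(item) + 1  # +1 for the joining newline
            if used + cost > budget_chars:
                return "\n".join(parts)
            parts.append(item)
            used += cost
    return "\n".join(parts)
```

With a budget of 20 characters, `assemble_context(["state: A"], ["doc one", "doc two"], 20)` keeps the trusted entry and the first retrieved item and drops the second.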
 ## Practical Rule
 Think of AtoCore as the durable external context hard drive for LLM work:
 - fast machine-readable context
 - persistent project understanding
 - stronger prompt inputs
 - no need to replace the normal project workflow
 That is the architecture target.
Author	SHA1	Message	Date
Anto01	8a94da4bf4	Clarify operating model and project corpus state	2026-04-06 07:25:33 -04:00
Anto01	5069d5b1b6	Update current state and next steps docs	2026-04-05 19:12:45 -04:00