Expand active project wave and serialize refreshes

2026-04-06 14:58:14 -04:00
parent 46a5d5887a
commit bdb42dba05
8 changed files with 243 additions and 45 deletions
--- a/docs/current-state.md
+++ b/docs/current-state.md
@@ -52,7 +52,7 @@ now includes a first curated ingestion batch for the active projects.
 - Dalidou Docker deployment foundation
 - initial AtoCore self-knowledge corpus ingested on Dalidou
 - T420/OpenClaw read-only AtoCore helper skill
- first curated active-project corpus batch for:
+- full active-project markdown/text corpus wave for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
@@ -87,7 +87,7 @@ The Dalidou instance already contains:
 - Master Plan V3
 - Build Spec V1
 - trusted project-state entries for `atocore`
- curated staged project docs for:
+- full staged project markdown/text corpora for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
@@ -99,12 +99,12 @@ The Dalidou instance already contains:
  - `p05-interferometer`
  - `p06-polisher`

-Current live stats after the latest documentation sync and active-project ingest
-passes:
+Current live stats after the full active-project wave are now far beyond the
+initial seed stage:

- `source_documents`: 36
- `source_chunks`: 568
- `vectors`: 568
+- more than `1,100` source documents
+- more than `20,000` chunks
+- a vector count that matches the chunk count

 The broader long-term corpus is still not fully populated yet. Wider project and
 vault ingestion remains a deliberate next step rather than something already
@@ -115,8 +115,8 @@ primarily visible under:

 - `/srv/storage/atocore/sources/vault/incoming/projects`

-This staged area is now useful for review because it contains the curated
-project docs that were actually ingested for the first active-project batch.
+This staged area is now useful for review because it contains the markdown/text
+project docs that were actually ingested for the full active-project wave.

 It is important to read this staged area correctly:

@@ -166,10 +166,12 @@ These are curated summaries and extracted stable project signals.

 In `source_documents` / retrieval corpus:

- real project documents are now present for the same active project set
+- full project markdown/text corpora are now present for the active project set
 - retrieval is no longer limited to AtoCore self-knowledge only
- the current corpus is still selective rather than exhaustive
- that selectivity is intentional at this stage
+- the current corpus is broad enough that ranking quality matters more than
+  corpus presence alone
+- underspecified prompts can still pull in historical or archive material, so
+  project-aware routing and better ranking remain important

 The source refresh model now has a concrete foundation in code:

@@ -223,8 +225,8 @@ This separation is healthy:
 ## Immediate Next Focus

 1. Use the new T420-side organic routing layer in real OpenClaw workflows
-2. Keep tightening retrieval quality for the newly seeded active projects
-3. Define the first broader AtoVault/AtoDrive ingestion batches
+2. Tighten retrieval quality for the now fully ingested active project corpora
+3. Move to Wave 2 trusted-operational ingestion instead of blindly widening the raw corpus further
 4. Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
 5. Expand the boring operations baseline:
   - restore validation
@@ -234,6 +236,7 @@ This separation is healthy:

 See also:

+- [ingestion-waves.md](C:/Users/antoi/ATOCore/docs/ingestion-waves.md)
 - [master-plan-status.md](C:/Users/antoi/ATOCore/docs/master-plan-status.md)

 ## Guiding Constraints