Expand active project wave and serialize refreshes

2026-04-06 14:58:14 -04:00
parent 46a5d5887a
commit bdb42dba05
8 changed files with 243 additions and 45 deletions

@@ -52,7 +52,7 @@ now includes a first curated ingestion batch for the active projects.
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
- T420/OpenClaw read-only AtoCore helper skill
-- first curated active-project corpus batch for:
+- full active-project markdown/text corpus wave for:
- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`
@@ -87,7 +87,7 @@ The Dalidou instance already contains:
- Master Plan V3
- Build Spec V1
- trusted project-state entries for `atocore`
-- curated staged project docs for:
+- full staged project markdown/text corpora for:
- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`
@@ -99,12 +99,12 @@ The Dalidou instance already contains:
- `p05-interferometer`
- `p06-polisher`
-Current live stats after the latest documentation sync and active-project ingest
-passes:
+Current live stats after the full active-project wave are now far beyond the
+initial seed stage:
-- `source_documents`: 36
-- `source_chunks`: 568
-- `vectors`: 568
+- more than `1,100` source documents
+- more than `20,000` chunks
+- matching vector count
The broader long-term corpus is still not fully populated yet. Wider project and
vault ingestion remains a deliberate next step rather than something already
@@ -115,8 +115,8 @@ primarily visible under:
- `/srv/storage/atocore/sources/vault/incoming/projects`
-This staged area is now useful for review because it contains the curated
-project docs that were actually ingested for the first active-project batch.
+This staged area is now useful for review because it contains the markdown/text
+project docs that were actually ingested for the full active-project wave.
It is important to read this staged area correctly:
@@ -166,10 +166,12 @@ These are curated summaries and extracted stable project signals.
In `source_documents` / retrieval corpus:
-- real project documents are now present for the same active project set
+- full project markdown/text corpora are now present for the active project set
- retrieval is no longer limited to AtoCore self-knowledge only
-- the current corpus is still selective rather than exhaustive
-- that selectivity is intentional at this stage
+- the current corpus is broad enough that ranking quality matters more than
+corpus presence alone
+- underspecified prompts can still pull in historical or archive material, so
+project-aware routing and better ranking remain important
The source refresh model now has a concrete foundation in code:
@@ -223,8 +225,8 @@ This separation is healthy:
## Immediate Next Focus
1. Use the new T420-side organic routing layer in real OpenClaw workflows
-2. Keep tightening retrieval quality for the newly seeded active projects
-3. Define the first broader AtoVault/AtoDrive ingestion batches
+2. Tighten retrieval quality for the now fully ingested active project corpora
+3. Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
4. Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
5. Expand the boring operations baseline:
- restore validation
@@ -234,6 +236,7 @@ This separation is healthy:
See also:
- [ingestion-waves.md](C:/Users/antoi/ATOCore/docs/ingestion-waves.md)
- [master-plan-status.md](C:/Users/antoi/ATOCore/docs/master-plan-status.md)
## Guiding Constraints