AtoCore Ingestion Waves
Purpose
This document tracks how the corpus should grow without losing signal quality.
The rule is:
- ingest in waves
- validate retrieval after each wave
- only then widen the source scope
Wave 1 - Active Project Full Markdown Corpus
Status: complete
Projects:
- p04-gigabit
- p05-interferometer
- p06-polisher
What was ingested:
- the full markdown/text PKM stacks for the three active projects
- selected staged operational docs already under the Dalidou source roots
- selected repo markdown/text context for:
  - Fullum-Interferometer
  - polisher-sim
  - Polisher-Toolhead (when markdown exists)
What was intentionally excluded:
- binaries
- images
- PDFs
- generated outputs unless they were plain text reports
- dependency folders
- hidden runtime junk
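The exclusion list above could be expressed as a path filter. This is a minimal sketch, not AtoCore's actual ingestion code: the function name `should_ingest`, the suffix allow-list, and the dependency folder names are all assumptions chosen for illustration. It uses an allow-list of text suffixes, so binaries, images, and PDFs are rejected without being enumerated, and a generated output passes only when it is a plain text file.

```python
from pathlib import PurePosixPath

# Assumed values: the document names categories, not exact patterns.
TEXT_SUFFIXES = {".md", ".markdown", ".txt"}
DEPENDENCY_DIRS = {"node_modules", "venv", "__pycache__", "site-packages"}


def should_ingest(relative_path: str) -> bool:
    """Return True if a source-relative path looks like ingestible markdown/text."""
    path = PurePosixPath(relative_path)
    # Hidden runtime junk: any path component that starts with a dot.
    if any(part.startswith(".") for part in path.parts):
        return False
    # Dependency folders anywhere along the path.
    if any(part in DEPENDENCY_DIRS for part in path.parts):
        return False
    # Allow-list by suffix; this rejects binaries, images, and PDFs.
    return path.suffix.lower() in TEXT_SUFFIXES
```

An allow-list is the more conservative choice here: a new binary format that nobody anticipated is skipped by default rather than ingested by accident.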
Practical result:
- AtoCore moved from a curated-seed corpus to a real active-project corpus
- the live corpus now contains well over one thousand source documents and over twenty thousand chunks
- project-specific context building is materially stronger than before
Main lesson from Wave 1:
- full project ingestion is valuable
- but broad historical/archive material can dilute retrieval for underspecified prompts
- context quality now depends more strongly on good project hints and better ranking than on corpus size alone
Wave 2 - Trusted Operational Layer Expansion
Status: next
Goal:
- expand AtoDrive-style operational truth for the active projects
Candidate inputs:
- current status dashboards
- decision logs
- milestone tracking
- curated requirements baselines
- explicit next-step plans
Why this matters:
- this raises the quality of the high-trust layer instead of only widening general retrieval
Wave 3 - Broader Active Engineering References
Status: planned
Goal:
- ingest reusable engineering references that support the active project set without dumping the entire vault
Candidate inputs:
- interferometry reference notes directly tied to p05
- polishing physics references directly tied to p06
- mirror and structural reference material directly tied to p04
Rule:
- only bring in references with a clear connection to active work
Wave 4 - Wider PKM Population
Status: deferred
Goal:
- widen beyond the active projects while preserving retrieval quality
Preconditions:
- stronger ranking
- better project-aware routing
- stable operational restore path
- clearer promotion rules for trusted state
Validation After Each Wave
After every ingestion wave, verify:
- stats
- project-specific query
- project-specific context-build
- debug-context
- whether trusted project state still dominates when it should
- whether cross-project bleed is getting worse or better
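One way to make the cross-project bleed check measurable is to compute, for a project-specific query, the fraction of retrieved chunks that belong to some other project, and compare that number across waves. The sketch below is illustrative only: `RetrievedChunk`, its fields, and the `tolerance` default are hypothetical and do not describe AtoCore's real response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    """Hypothetical shape of one retrieval hit; field names are assumptions."""
    project: str
    score: float


def bleed_ratio(hits: list[RetrievedChunk], expected_project: str) -> float:
    """Fraction of hits that belong to a project other than the expected one."""
    if not hits:
        return 0.0
    foreign = sum(1 for hit in hits if hit.project != expected_project)
    return foreign / len(hits)


def bleed_trend(before: float, after: float, tolerance: float = 0.05) -> str:
    """Classify the change in bleed between two waves.

    The tolerance is an assumed noise band, not a measured threshold.
    """
    if after > before + tolerance:
        return "worse"
    if after < before - tolerance:
        return "better"
    return "stable"
```

Running the same fixed set of project-specific queries before and after a wave keeps the two ratios comparable.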
Working Rule
The next wave should only happen when the current wave is:
- ingested
- inspected
- retrieval-tested
- operationally stable
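The working rule can be read as a simple gate over four conditions. The following is a minimal sketch under that reading. `WaveStatus` and its methods are hypothetical names introduced here, and how each flag gets set (manual review, scripted checks) is left open.

```python
from dataclasses import dataclass


@dataclass
class WaveStatus:
    """Hypothetical checklist record for one ingestion wave."""
    name: str
    ingested: bool = False
    inspected: bool = False
    retrieval_tested: bool = False
    operationally_stable: bool = False

    def blockers(self) -> list[str]:
        """Return the unmet conditions, in the order the rule lists them."""
        checks = {
            "ingested": self.ingested,
            "inspected": self.inspected,
            "retrieval-tested": self.retrieval_tested,
            "operationally stable": self.operationally_stable,
        }
        return [label for label, done in checks.items() if not done]

    def ready_for_next_wave(self) -> bool:
        """True only when every condition in the working rule is met."""
        return not self.blockers()
```

Reporting the blockers, and not only a yes/no answer, shows what is still holding the next wave back.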