# AtoCore Ingestion Waves

## Purpose

This document tracks how the corpus should grow without losing signal quality.

The rule is:

- ingest in waves
- validate retrieval after each wave
- only then widen the source scope

## Wave 1 - Active Project Full Markdown Corpus

Status: complete

Projects:

- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`

What was ingested:

- the full markdown/text PKM stacks for the three active projects
- selected staged operational docs already under the Dalidou source roots
- selected repo markdown/text context for:
  - `Fullum-Interferometer`
  - `polisher-sim`
  - `Polisher-Toolhead` (when markdown exists)

What was intentionally excluded:

- binaries
- images
- PDFs
- generated outputs unless they were plain text reports
- dependency folders
- hidden runtime junk

Practical result:

- AtoCore moved from a curated-seed corpus to a real active-project corpus
- the live corpus now contains well over one thousand source documents and over twenty thousand chunks
- project-specific context building is materially stronger than before

Main lesson from Wave 1:

- full project ingestion is valuable
- but broad historical/archive material can dilute retrieval for underspecified prompts
- context quality now depends more strongly on good project hints and better ranking than on corpus size alone

## Wave 2 - Trusted Operational Layer Expansion

Status: next

Goal:

- expand `AtoDrive`-style operational truth for the active projects

Candidate inputs:

- current status dashboards
- decision logs
- milestone tracking
- curated requirements baselines
- explicit next-step plans

Why this matters:

- this raises the quality of the high-trust layer instead of only widening general retrieval

## Wave 3 - Broader Active Engineering References

Status: planned

Goal:

- ingest reusable engineering references that support the active project set without dumping the entire vault

Candidate inputs:

- interferometry reference notes directly tied to `p05`
- polishing physics references directly tied to `p06`
- mirror and structural reference material directly tied to `p04`

Rule:

- only bring in references with a clear connection to active work

## Wave 4 - Wider PKM Population

Status: deferred

Goal:

- widen beyond the active projects while preserving retrieval quality

Preconditions:

- stronger ranking
- better project-aware routing
- stable operational restore path
- clearer promotion rules for trusted state

## Validation After Each Wave

After every ingestion wave, verify:

- `stats`
- project-specific `query`
- project-specific `context-build`
- `debug-context`
- whether trusted project state still dominates when it should
- whether cross-project bleed is getting worse or better

## Working Rule

The next wave should only happen when the current wave is:

- ingested
- inspected
- retrieval-tested
- operationally stable
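
The "cross-project bleed" check and the working rule can be made concrete with a small script. This is a minimal sketch, assuming each chunk returned by a project-specific `query` carries the project label of its source document. The `RetrievedChunk` shape and the `cross_project_bleed`, `bleed_trend`, and `ready_for_next_wave` functions are hypothetical illustrations, not part of AtoCore's API. Bleed is measured here as the fraction of retrieved chunks that belong to a project other than the one the query targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    """One chunk returned by a project-specific query (hypothetical shape)."""
    chunk_id: str
    project: str  # assumed: project label of the chunk's source document


def cross_project_bleed(chunks: list[RetrievedChunk], target_project: str) -> float:
    """Fraction of retrieved chunks that come from a project other than the target.

    0.0 means every chunk is on-project; 1.0 means none are.
    An empty result is treated as zero bleed.
    """
    if not chunks:
        return 0.0
    off_project = sum(1 for c in chunks if c.project != target_project)
    return off_project / len(chunks)


def bleed_trend(before: float, after: float, tolerance: float = 0.0) -> str:
    """Compare bleed for the same query measured before and after a wave."""
    if after > before + tolerance:
        return "worse"
    if after < before - tolerance:
        return "better"
    return "unchanged"


# Gates taken from the working rule; all must pass before widening scope.
WAVE_GATES = ("ingested", "inspected", "retrieval-tested", "operationally stable")


def ready_for_next_wave(checks: dict[str, bool]) -> bool:
    """Return True only when every gate of the current wave is marked as passed."""
    return all(checks.get(gate, False) for gate in WAVE_GATES)
```

For example, a `p05-interferometer` query that returns two on-project chunks, one `p06-polisher` chunk, and one `p04-gigabit` chunk has a bleed of 0.5; if the same query scored 0.25 before the wave, `bleed_trend` reports `"worse"`. Missing gates default to failing, so an incomplete checklist never unlocks the next wave.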