# AtoCore Ingestion Waves
## Purpose
This document tracks how the corpus should grow without losing signal quality.
The rule is:
- ingest in waves
- validate retrieval after each wave
- only then widen the source scope
## Wave 1 - Active Project Full Markdown Corpus
Status: complete
Projects:
- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`
What was ingested:
- the full markdown/text PKM stacks for the three active projects
- selected staged operational docs already under the Dalidou source roots
- selected repo markdown/text context for:
  - `Fullum-Interferometer`
  - `polisher-sim`
  - `Polisher-Toolhead` (when markdown exists)
What was intentionally excluded:
- binaries
- images
- PDFs
- generated outputs unless they were plain text reports
- dependency folders
- hidden runtime junk
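The exclusion rules above can be sketched as a small path filter. This is only an illustration: the function name `is_ingestable`, the suffix allow-list, and the dependency folder names are assumptions, not the rules AtoCore actually applies.

```python
from pathlib import PurePosixPath

# Hypothetical lists for illustration; the real ingestion rules may differ.
TEXT_SUFFIXES = {".md", ".markdown", ".txt"}
DEPENDENCY_DIRS = {"node_modules", "venv", "__pycache__", "site-packages"}


def is_ingestable(relative_path: str) -> bool:
    """Return True if a path looks like plain markdown/text worth ingesting."""
    path = PurePosixPath(relative_path)
    parts = path.parts
    # Skip hidden files and folders (names that start with a dot).
    if any(part.startswith(".") for part in parts):
        return False
    # Skip anything that lives under a dependency folder.
    if any(part in DEPENDENCY_DIRS for part in parts[:-1]):
        return False
    # A suffix allow-list keeps binaries, images, and PDFs out.
    return path.suffix.lower() in TEXT_SUFFIXES
```

An allow-list on suffixes is used here rather than a deny-list, so an unknown file type is skipped by default instead of ingested by accident.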
Practical result:
- AtoCore moved from a curated-seed corpus to a real active-project corpus
- the live corpus now contains well over one thousand source documents and over
  twenty thousand chunks
- project-specific context building is materially stronger than before
Main lesson from Wave 1:
- full project ingestion is valuable
- but broad historical/archive material can dilute retrieval for underspecified
  prompts
- context quality now depends more strongly on good project hints and better
  ranking than on corpus size alone
## Wave 2 - Trusted Operational Layer Expansion
Status: next
Goal:
- expand `AtoDrive`-style operational truth for the active projects
Candidate inputs:
- current status dashboards
- decision logs
- milestone tracking
- curated requirements baselines
- explicit next-step plans
Why this matters:
- this raises the quality of the high-trust layer instead of only widening
  general retrieval
## Wave 3 - Broader Active Engineering References
Status: planned
Goal:
- ingest reusable engineering references that support the active project set
  without dumping the entire vault
Candidate inputs:
- interferometry reference notes directly tied to `p05`
- polishing physics references directly tied to `p06`
- mirror and structural reference material directly tied to `p04`
Rule:
- only bring in references with a clear connection to active work
## Wave 4 - Wider PKM Population
Status: deferred
Goal:
- widen beyond the active projects while preserving retrieval quality
Preconditions:
- stronger ranking
- better project-aware routing
- stable operational restore path
- clearer promotion rules for trusted state
## Validation After Each Wave
After every ingestion wave, verify:
- `stats`
- project-specific `query`
- project-specific `context-build`
- `debug-context`
- whether trusted project state still dominates when it should
- whether cross-project bleed is getting worse or better
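One way to make the cross-project bleed check comparable between waves is to reduce it to a single number. The sketch below assumes that each retrieved chunk can be labelled with its source project; the function name `cross_project_bleed` and the sample labels are hypothetical and are not output from the AtoCore tooling.

```python
from collections import Counter


def cross_project_bleed(chunk_projects: list[str], expected_project: str) -> float:
    """Fraction of retrieved chunks that come from a project other than the expected one."""
    if not chunk_projects:
        return 0.0
    counts = Counter(chunk_projects)
    off_target = len(chunk_projects) - counts[expected_project]
    return off_target / len(chunk_projects)


# Hypothetical project labels of the top chunks for the same p05 query,
# recorded before and after an ingestion wave.
before = ["p05-interferometer", "p05-interferometer", "p06-polisher", "p05-interferometer"]
after = ["p05-interferometer", "p06-polisher", "p04-gigabit", "p05-interferometer"]

bleed_before = cross_project_bleed(before, "p05-interferometer")  # 0.25
bleed_after = cross_project_bleed(after, "p05-interferometer")    # 0.5
print(f"bleed before={bleed_before:.2f} after={bleed_after:.2f}")
```

Running the same fixed set of project-specific queries after each wave and comparing these fractions shows whether bleed is getting worse or better.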
## Working Rule
The next wave should only happen when the current wave is:
- ingested
- inspected
- retrieval-tested
- operationally stable
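The working rule amounts to a gate with four conditions. A minimal sketch, assuming the state of a wave is tracked by hand as four booleans; the `WaveStatus` class and its field names are illustrative and not part of AtoCore:

```python
from dataclasses import dataclass


@dataclass
class WaveStatus:
    """Hypothetical record of where an ingestion wave stands."""
    name: str
    ingested: bool = False
    inspected: bool = False
    retrieval_tested: bool = False
    operationally_stable: bool = False

    def blockers(self) -> list[str]:
        """List the conditions from the working rule that are not yet met."""
        checks = {
            "ingested": self.ingested,
            "inspected": self.inspected,
            "retrieval-tested": self.retrieval_tested,
            "operationally stable": self.operationally_stable,
        }
        return [label for label, ok in checks.items() if not ok]

    def ready_for_next_wave(self) -> bool:
        """The next wave may start only when nothing is blocking."""
        return not self.blockers()


wave = WaveStatus("Wave 2", ingested=True, inspected=True)
print(wave.ready_for_next_wave())  # False
print(wave.blockers())             # ['retrieval-tested', 'operationally stable']
```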