AtoCore Ingestion Waves

Purpose

This document tracks how the corpus should grow without losing signal quality.

The rule is:

  • ingest in waves
  • validate retrieval after each wave
  • only then widen the source scope

Wave 1 - Active Project Full Markdown Corpus

Status: complete

Projects:

  • p04-gigabit
  • p05-interferometer
  • p06-polisher

What was ingested:

  • the full markdown/text PKM stacks for the three active projects
  • selected staged operational docs already under the Dalidou source roots
  • selected repo markdown/text context for:
    • Fullum-Interferometer
    • polisher-sim
    • Polisher-Toolhead (when markdown exists)

What was intentionally excluded:

  • binaries
  • images
  • PDFs
  • generated outputs unless they were plain text reports
  • dependency folders
  • hidden runtime junk
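
The exclusion rules above could be expressed as a small path filter. This is only a minimal sketch: the function name `is_ingestable`, the suffix allowlist, and the dependency folder names are assumptions for illustration, not AtoCore's actual ingestion code.

```python
from pathlib import PurePosixPath

# Hypothetical rule sets; the real AtoCore filters are not shown in this document.
TEXT_SUFFIXES = {".md", ".markdown", ".txt"}
DEPENDENCY_DIRS = {"node_modules", "venv", "__pycache__", "site-packages"}


def is_ingestable(relative_path: str) -> bool:
    """Return True if a source path passes the Wave 1 style exclusions."""
    path = PurePosixPath(relative_path)

    # Hidden runtime junk: any file or folder whose name starts with a dot.
    if any(part.startswith(".") for part in path.parts):
        return False

    # Dependency folders are skipped wherever they appear in the path.
    if any(part in DEPENDENCY_DIRS for part in path.parts):
        return False

    # An allowlist of text suffixes drops binaries, images, and PDFs,
    # while plain text reports among generated outputs still pass.
    return path.suffix.lower() in TEXT_SUFFIXES
```

An allowlist is used here instead of a blocklist so that unknown binary formats are excluded by default.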

Practical result:

  • AtoCore moved from a curated-seed corpus to a real active-project corpus
  • the live corpus now contains well over one thousand source documents and over twenty thousand chunks
  • project-specific context building is materially stronger than before

Main lesson from Wave 1:

  • full project ingestion is valuable
  • broad historical/archive material can still dilute retrieval for underspecified prompts
  • context quality now depends more on good project hints and better ranking than on corpus size alone

Wave 2 - Trusted Operational Layer Expansion

Status: next

Goal:

  • expand AtoDrive-style operational truth for the active projects

Candidate inputs:

  • current status dashboards
  • decision logs
  • milestone tracking
  • curated requirements baselines
  • explicit next-step plans

Why this matters:

  • this raises the quality of the high-trust layer instead of only widening general retrieval

Wave 3 - Broader Active Engineering References

Status: planned

Goal:

  • ingest reusable engineering references that support the active project set without dumping the entire vault

Candidate inputs:

  • interferometry reference notes directly tied to p05
  • polishing physics references directly tied to p06
  • mirror and structural reference material directly tied to p04

Rule:

  • only bring in references with a clear connection to active work

Wave 4 - Wider PKM Population

Status: deferred

Goal:

  • widen beyond the active projects while preserving retrieval quality

Preconditions:

  • stronger ranking
  • better project-aware routing
  • stable operational restore path
  • clearer promotion rules for trusted state

Validation After Each Wave

After every ingestion wave, verify:

  • stats
  • project-specific query
  • project-specific context-build
  • debug-context
  • whether trusted project state still dominates when it should
  • whether cross-project bleed is getting worse or better
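
One way to make the cross-project bleed check comparable between waves is to reduce it to a number. The sketch below assumes each retrieved chunk carries a project label; the function name `cross_project_bleed` and that labeling are hypothetical and not taken from AtoCore itself.

```python
from typing import Iterable


def cross_project_bleed(expected_project: str,
                        retrieved_projects: Iterable[str]) -> float:
    """Fraction of retrieved chunks that belong to a different project.

    0.0 means every chunk came from the expected project;
    1.0 means none of them did.
    """
    projects = list(retrieved_projects)
    if not projects:
        # Nothing retrieved, so nothing bled in from other projects.
        return 0.0
    foreign = sum(1 for project in projects if project != expected_project)
    return foreign / len(projects)
```

Running the same project-specific queries before and after a wave and comparing the two values shows whether bleed is getting worse or better.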

Working Rule

The next wave should only happen when the current wave is:

  • ingested
  • inspected
  • retrieval-tested
  • operationally stable
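
The working rule can be read as a simple gate over four flags. The following is a minimal sketch under that reading; `WaveChecklist`, `missing_steps`, and `can_start_next_wave` are illustrative names, not part of any existing AtoCore tooling.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WaveChecklist:
    """The four conditions from the working rule, all unmet by default."""
    ingested: bool = False
    inspected: bool = False
    retrieval_tested: bool = False
    operationally_stable: bool = False


def missing_steps(checklist: WaveChecklist) -> list:
    """Names of the conditions that are not yet satisfied."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def can_start_next_wave(checklist: WaveChecklist) -> bool:
    """The next wave is allowed only when no condition is missing."""
    return not missing_steps(checklist)
```

For example, a wave that has been ingested and inspected but not yet retrieval-tested would report `retrieval_tested` and `operationally_stable` as missing and would block the next wave.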