Expand active project wave and serialize refreshes
This commit is contained in:
@@ -52,7 +52,7 @@ now includes a first curated ingestion batch for the active projects.
|
||||
- Dalidou Docker deployment foundation
|
||||
- initial AtoCore self-knowledge corpus ingested on Dalidou
|
||||
- T420/OpenClaw read-only AtoCore helper skill
|
||||
- first curated active-project corpus batch for:
|
||||
- full active-project markdown/text corpus wave for:
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
@@ -87,7 +87,7 @@ The Dalidou instance already contains:
|
||||
- Master Plan V3
|
||||
- Build Spec V1
|
||||
- trusted project-state entries for `atocore`
|
||||
- curated staged project docs for:
|
||||
- full staged project markdown/text corpora for:
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
@@ -99,12 +99,12 @@ The Dalidou instance already contains:
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
|
||||
Current live stats after the latest documentation sync and active-project ingest
|
||||
passes:
|
||||
Current live stats after the full active-project wave are now far beyond the
|
||||
initial seed stage:
|
||||
|
||||
- `source_documents`: 36
|
||||
- `source_chunks`: 568
|
||||
- `vectors`: 568
|
||||
- more than `1,100` source documents
|
||||
- more than `20,000` chunks
|
||||
- matching vector count
|
||||
|
||||
The broader long-term corpus is still not fully populated yet. Wider project and
|
||||
vault ingestion remains a deliberate next step rather than something already
|
||||
@@ -115,8 +115,8 @@ primarily visible under:
|
||||
|
||||
- `/srv/storage/atocore/sources/vault/incoming/projects`
|
||||
|
||||
This staged area is now useful for review because it contains the curated
|
||||
project docs that were actually ingested for the first active-project batch.
|
||||
This staged area is now useful for review because it contains the markdown/text
|
||||
project docs that were actually ingested for the full active-project wave.
|
||||
|
||||
It is important to read this staged area correctly:
|
||||
|
||||
@@ -166,10 +166,12 @@ These are curated summaries and extracted stable project signals.
|
||||
|
||||
In `source_documents` / retrieval corpus:
|
||||
|
||||
- real project documents are now present for the same active project set
|
||||
- full project markdown/text corpora are now present for the active project set
|
||||
- retrieval is no longer limited to AtoCore self-knowledge only
|
||||
- the current corpus is still selective rather than exhaustive
|
||||
- that selectivity is intentional at this stage
|
||||
- the current corpus is broad enough that ranking quality matters more than
|
||||
corpus presence alone
|
||||
- underspecified prompts can still pull in historical or archive material, so
|
||||
project-aware routing and better ranking remain important
|
||||
|
||||
The source refresh model now has a concrete foundation in code:
|
||||
|
||||
@@ -223,8 +225,8 @@ This separation is healthy:
|
||||
## Immediate Next Focus
|
||||
|
||||
1. Use the new T420-side organic routing layer in real OpenClaw workflows
|
||||
2. Keep tightening retrieval quality for the newly seeded active projects
|
||||
3. Define the first broader AtoVault/AtoDrive ingestion batches
|
||||
2. Tighten retrieval quality for the now fully ingested active project corpora
|
||||
3. Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
|
||||
4. Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
|
||||
5. Expand the boring operations baseline:
|
||||
- restore validation
|
||||
@@ -234,6 +236,7 @@ This separation is healthy:
|
||||
|
||||
See also:
|
||||
|
||||
- [ingestion-waves.md](C:/Users/antoi/ATOCore/docs/ingestion-waves.md)
|
||||
- [master-plan-status.md](C:/Users/antoi/ATOCore/docs/master-plan-status.md)
|
||||
|
||||
## Guiding Constraints
|
||||
|
||||
129
docs/ingestion-waves.md
Normal file
129
docs/ingestion-waves.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# AtoCore Ingestion Waves
|
||||
|
||||
## Purpose
|
||||
|
||||
This document tracks how the corpus should grow without losing signal quality.
|
||||
|
||||
The rule is:
|
||||
|
||||
- ingest in waves
|
||||
- validate retrieval after each wave
|
||||
- only then widen the source scope
|
||||
|
||||
## Wave 1 - Active Project Full Markdown Corpus
|
||||
|
||||
Status: complete
|
||||
|
||||
Projects:
|
||||
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
|
||||
What was ingested:
|
||||
|
||||
- the full markdown/text PKM stacks for the three active projects
|
||||
- selected staged operational docs already under the Dalidou source roots
|
||||
- selected repo markdown/text context for:
|
||||
- `Fullum-Interferometer`
|
||||
- `polisher-sim`
|
||||
- `Polisher-Toolhead` (when markdown exists)
|
||||
|
||||
What was intentionally excluded:
|
||||
|
||||
- binaries
|
||||
- images
|
||||
- PDFs
|
||||
- generated outputs unless they were plain text reports
|
||||
- dependency folders
|
||||
- hidden runtime junk
|
||||
|
||||
Practical result:
|
||||
|
||||
- AtoCore moved from a curated-seed corpus to a real active-project corpus
|
||||
- the live corpus now contains well over one thousand source documents and over
|
||||
twenty thousand chunks
|
||||
- project-specific context building is materially stronger than before
|
||||
|
||||
Main lesson from Wave 1:
|
||||
|
||||
- full project ingestion is valuable
|
||||
- but broad historical/archive material can dilute retrieval for underspecified
|
||||
prompts
|
||||
- context quality now depends more strongly on good project hints and better
|
||||
ranking than on corpus size alone
|
||||
|
||||
## Wave 2 - Trusted Operational Layer Expansion
|
||||
|
||||
Status: next
|
||||
|
||||
Goal:
|
||||
|
||||
- expand `AtoDrive`-style operational truth for the active projects
|
||||
|
||||
Candidate inputs:
|
||||
|
||||
- current status dashboards
|
||||
- decision logs
|
||||
- milestone tracking
|
||||
- curated requirements baselines
|
||||
- explicit next-step plans
|
||||
|
||||
Why this matters:
|
||||
|
||||
- this raises the quality of the high-trust layer instead of only widening
|
||||
general retrieval
|
||||
|
||||
## Wave 3 - Broader Active Engineering References
|
||||
|
||||
Status: planned
|
||||
|
||||
Goal:
|
||||
|
||||
- ingest reusable engineering references that support the active project set
|
||||
without dumping the entire vault
|
||||
|
||||
Candidate inputs:
|
||||
|
||||
- interferometry reference notes directly tied to `p05`
|
||||
- polishing physics references directly tied to `p06`
|
||||
- mirror and structural reference material directly tied to `p04`
|
||||
|
||||
Rule:
|
||||
|
||||
- only bring in references with a clear connection to active work
|
||||
|
||||
## Wave 4 - Wider PKM Population
|
||||
|
||||
Status: deferred
|
||||
|
||||
Goal:
|
||||
|
||||
- widen beyond the active projects while preserving retrieval quality
|
||||
|
||||
Preconditions:
|
||||
|
||||
- stronger ranking
|
||||
- better project-aware routing
|
||||
- stable operational restore path
|
||||
- clearer promotion rules for trusted state
|
||||
|
||||
## Validation After Each Wave
|
||||
|
||||
After every ingestion wave, verify:
|
||||
|
||||
- `stats`
|
||||
- project-specific `query`
|
||||
- project-specific `context-build`
|
||||
- `debug-context`
|
||||
- whether trusted project state still dominates when it should
|
||||
- whether cross-project bleed is getting worse or better
|
||||
|
||||
## Working Rule
|
||||
|
||||
The next wave should only happen when the current wave is:
|
||||
|
||||
- ingested
|
||||
- inspected
|
||||
- retrieval-tested
|
||||
- operationally stable
|
||||
@@ -29,9 +29,11 @@ This working list should be read alongside:
|
||||
- check whether the top hits are useful
|
||||
- check whether trusted project state remains dominant
|
||||
- reduce cross-project competition and prompt ambiguity where needed
|
||||
3. Continue controlled project ingestion only where the current corpus is still
|
||||
thin
|
||||
- a few additional anchor docs per active project
|
||||
- use `debug-context` to inspect the exact last AtoCore supplement
|
||||
3. Treat the active-project full markdown/text wave as complete
|
||||
- `p04-gigabit`
|
||||
- `p05-interferometer`
|
||||
- `p06-polisher`
|
||||
4. Define a cleaner source refresh model
|
||||
- make the difference between source truth, staged inputs, and machine store
|
||||
explicit
|
||||
@@ -39,15 +41,20 @@ This working list should be read alongside:
|
||||
- foundation now exists via project registry + per-project refresh API
|
||||
- registration policy + template + proposal + approved registration are now
|
||||
the normal path for new projects
|
||||
5. Integrate the new engineering architecture docs into active planning, not immediate schema code
|
||||
5. Move to Wave 2 trusted-operational ingestion
|
||||
- curated dashboards
|
||||
- decision logs
|
||||
- milestone/current-status views
|
||||
- operational truth, not just raw project notes
|
||||
6. Integrate the new engineering architecture docs into active planning, not immediate schema code
|
||||
- keep `docs/architecture/engineering-knowledge-hybrid-architecture.md` as the target layer model
|
||||
- keep `docs/architecture/engineering-ontology-v1.md` as the V1 structured-domain target
|
||||
- do not start entity/relationship persistence until the ingestion, retrieval, registry, and backup baseline feels boring and stable
|
||||
6. Define backup and export procedures for Dalidou
|
||||
7. Define backup and export procedures for Dalidou
|
||||
- exercise the new SQLite + registry snapshot path on Dalidou
|
||||
- Chroma backup or rebuild policy
|
||||
- retention and restore validation
|
||||
7. Keep deeper automatic runtime integration modest until the organic read-only
|
||||
8. Keep deeper automatic runtime integration modest until the organic read-only
|
||||
model has proven value
|
||||
|
||||
## Trusted State Status
|
||||
@@ -69,36 +76,39 @@ This materially improves `context/build` quality for project-hinted prompts.
|
||||
|
||||
## Recommended Near-Term Project Work
|
||||
|
||||
The first curated batch is already in.
|
||||
The active-project full markdown/text wave is now in.
|
||||
|
||||
The near-term work is now:
|
||||
|
||||
1. strengthen retrieval quality
|
||||
2. add a few more anchor docs only where retrieval is still weak
|
||||
2. promote or refine trusted operational truth where the broad corpus is now too noisy
|
||||
3. keep trusted project state concise and high-confidence
|
||||
4. widen only through named ingestion waves
|
||||
|
||||
## Recommended Additional Anchor Docs
|
||||
## Recommended Next Wave Inputs
|
||||
|
||||
1. `p04-gigabit`
|
||||
2. `p05-interferometer`
|
||||
3. `p06-polisher`
|
||||
Wave 2 should emphasize trusted operational truth, not bulk historical notes.
|
||||
|
||||
P04:
|
||||
|
||||
- 1 to 2 more strong study summaries
|
||||
- 1 to 2 more meeting notes with actual decisions
|
||||
- current status dashboard
|
||||
- current selected design path
|
||||
- current frame interface truth
|
||||
- current next-step milestone view
|
||||
|
||||
P05:
|
||||
|
||||
- a couple more architecture docs
|
||||
- selected vendor-response notes
|
||||
- possibly one or two NX/WAVE consumer docs
|
||||
- selected vendor path
|
||||
- current error-budget baseline
|
||||
- current architecture freeze or open decisions
|
||||
- current procurement / next-action view
|
||||
|
||||
P06:
|
||||
|
||||
- more explicit interface/schema docs if needed
|
||||
- selected operations or UI docs
|
||||
- a distilled non-empty operational context doc to replace an empty `_context.md`
|
||||
- current system map
|
||||
- current shared contracts baseline
|
||||
- current calibration procedure truth
|
||||
- current July / proving roadmap view
|
||||
|
||||
## Deferred On Purpose
|
||||
|
||||
@@ -115,6 +125,8 @@ The next batch is successful if:
|
||||
- OpenClaw can use AtoCore naturally when context is needed
|
||||
- OpenClaw can infer registered projects and call AtoCore organically for
|
||||
project-knowledge questions
|
||||
- the active-project full corpus wave can be inspected and used concretely
|
||||
through `auto-context`, `context-build`, and `debug-context`
|
||||
- OpenClaw can also register a new project cleanly before refreshing it
|
||||
- existing project registrations can be refined safely before refresh when the
|
||||
staged source set evolves
|
||||
|
||||
@@ -82,6 +82,7 @@ The current helper script exposes:
|
||||
- `project-template`
|
||||
- `detect-project <prompt>`
|
||||
- `auto-context <prompt> [budget] [project]`
|
||||
- `debug-context`
|
||||
- `propose-project ...`
|
||||
- `register-project ...`
|
||||
- `update-project ...`
|
||||
@@ -125,6 +126,8 @@ Recommended first behavior:
|
||||
1. OpenClaw receives a user request
|
||||
2. If the prompt looks like project knowledge, OpenClaw should try:
|
||||
- `auto-context "<prompt>" 3000`
|
||||
- optionally `debug-context` immediately after if a human wants to inspect
|
||||
the exact AtoCore supplement
|
||||
3. If the prompt is clearly asking for trusted current truth, OpenClaw should
|
||||
prefer:
|
||||
- `project-state <project>`
|
||||
|
||||
Reference in New Issue
Block a user