# AtoCore Knowledge Architecture

## The Problem

Engineering work produces two kinds of knowledge simultaneously:

1. **Applied knowledge** — specific to the project being worked on ("the p04 support pad layout is driven by CTE gradient analysis")
2. **Domain knowledge** — generalizable insight earned through that work ("Zerodur CTE gradient dominates WFE at fast focal ratios")

A system that only stores applied knowledge loses the general insight. A system that mixes them pollutes project context with cross-project noise. AtoCore needs both — separated, but both growing organically from the same conversations.

## The Quality Bar

**AtoCore stores earned insight, not information.** The test: "Would a competent engineer need experience to know this, or could they find it in 30 seconds?"

| Store | Don't store |
|-------|-------------|
| "Preston removal model breaks down below 5N because the contact assumption fails" | "Preston's equation relates removal rate to pressure and velocity" |
| "m=1 (coma) is NOT correctable by force modulation (score 0.09)" | "Zernike polynomials describe wavefront aberrations" |
| "At F/1.2, CTE gradient costs ~3nm WFE and drives pad placement" | "Zerodur CTE is 0.05 ppm/K" |
| "Quilting limit for 16-inch tool is 234N" | "Quilting is a mid-spatial-frequency artifact in polishing" |

The bar is enforced in the LLM extraction system prompt (`src/atocore/memory/extractor_llm.py`) and the auto-triage prompt (`scripts/auto_triage.py`). Both explicitly list examples of what qualifies and what doesn't.

## Architecture

### Five-tier context assembly

When AtoCore builds a context pack for any LLM query, it assembles five tiers in strict trust order:

```
Tier 1: Trusted Project State      [project-specific, highest trust]
        Curated key-value entries from the project state API.
        Example: "decision/vendor_path: Twyman-Green preferred, 4D technical lead but cost-challenged"

Tier 2: Identity / Preferences     [global, always included]
        Who the user is and how they work.
        Example: "Antoine Letarte, mechanical/optical engineer at Atomaste" /
                 "No API keys — uses OAuth exclusively"

Tier 3: Project Memories           [project-specific]
        Reinforced memories from the reflection loop, scoped to the queried project.
        Example: "Firmware interface contract is invariant: controller-job.v1 in, run-log.v1 out"

Tier 4: Domain Knowledge           [cross-project]
        Earned engineering insight with project="" and a domain tag.
        Surfaces in ALL project packs when query-relevant.
        Example: "[materials] Zerodur CTE gradient dominates WFE at fast focal ratios — costs ~3nm at F/1.2"

Tier 5: Retrieved Chunks           [project-boosted, lowest trust]
        Vector-similarity search over the ingested document corpus.
        Project-hinted but not filtered — cross-project docs can appear at lower rank.
```

### Budget allocation (at default 3000 chars)

| Tier | Budget ratio | Approx chars | Entries |
|------|-------------|--------------|---------|
| Project State | 20% | 600 | all curated entries |
| Identity/Preferences | 5% | 150 | 1 memory |
| Project Memories | 25% | 750 | 2-3 memories |
| Domain Knowledge | 10% | 300 | 1-2 memories |
| Retrieved Chunks | 40% | 1200 | 2-4 chunks |

Trim order when budget is tight: chunks first, then domain knowledge, then project memories, then identity, then project state last.
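The ratio table and trim order above can be sketched in a few lines of Python. The tier names and trim sequence come from the table; the function names and data shapes are illustrative, not AtoCore's actual API.

```python
# Tier ratios from the budget table (default pack = 3000 chars).
TIER_RATIOS = {
    "project_state": 0.20,
    "identity": 0.05,
    "project_memories": 0.25,
    "domain_knowledge": 0.10,
    "retrieved_chunks": 0.40,
}

# Lowest-trust tiers are sacrificed first when the pack runs over budget.
TRIM_ORDER = [
    "retrieved_chunks",
    "domain_knowledge",
    "project_memories",
    "identity",
    "project_state",
]

def allocate(total_chars: int = 3000) -> dict[str, int]:
    """Split the character budget across tiers by fixed ratio."""
    return {tier: int(total_chars * ratio) for tier, ratio in TIER_RATIOS.items()}

def trim(used: dict[str, int], budget: int) -> dict[str, int]:
    """Drop whole tiers in trust order until the pack fits the budget."""
    used = dict(used)
    for tier in TRIM_ORDER:
        if sum(used.values()) <= budget:
            break
        used[tier] = 0
    return used
```

A real implementation would trim entries within a tier rather than zeroing the whole tier, but the ordering is the point: retrieved chunks go first, curated project state goes last.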
### Knowledge domains

The LLM extractor tags domain knowledge with one of these domains:

| Domain | What qualifies |
|--------|----------------|
| `physics` | Optical physics, wave propagation, diffraction, thermal effects |
| `materials` | Material properties in context, CTE behavior, stress limits |
| `optics` | Lens/mirror design, aberration analysis, metrology techniques |
| `mechanics` | Structural FEA insights, support system design, kinematics |
| `manufacturing` | Polishing, grinding, machining, process control |
| `metrology` | Measurement systems, interferometry, calibration techniques |
| `controls` | PID tuning, force control, servo systems, real-time constraints |
| `software` | Architecture patterns, testing strategies, deployment insights |
| `math` | Numerical methods, optimization, statistical analysis |
| `finance` | Cost modeling, procurement strategy, budget optimization |

New domains can be added by updating the system prompt in `extractor_llm.py` and `batch_llm_extract_live.py`.

### How domain knowledge is stored

Domain tags are embedded as a prefix in the memory content:

```
memory_type: knowledge
project: ""          ← empty = cross-project
content: "[materials] Zerodur CTE gradient dominates WFE at F/1.2"
```

The `[domain]` prefix is a lightweight encoding that avoids a schema migration. The context builder's query-relevance ranking matches on domain terms naturally (a query about "materials" or "CTE" will rank a `[materials]` memory higher). A future migration can parse the prefix into a proper `domain` column.

## How knowledge flows

### Capture → Extract → Triage → Surface

```
1. CAPTURE
   Claude Code (Stop hook) or OpenClaw (plugin)
   → POST /interactions with reinforce=true
   → Interaction stored on Dalidou

2. EXTRACT (nightly cron, 03:00 UTC)
   batch_llm_extract_live.py runs claude -p sonnet
   → For each interaction, the LLM decides:
     - Is this project-specific?      → candidate with project=X
     - Is this generalizable insight?
       → candidate with domain=Y, project=""
     - Is it both?                    → TWO candidates emitted
     - Is it common knowledge?        → skip (quality bar)
   → Candidates persisted as status=candidate

3. TRIAGE (nightly, immediately after extraction)
   auto_triage.py runs claude -p sonnet
   → Each candidate classified: promote / reject / needs_human
   → Auto-promote at confidence ≥ 0.8 + no duplicate
   → Auto-reject stale snapshots, duplicates, common knowledge
   → Only needs_human reaches the operator

4. SURFACE (every context/build query)
   → Project-specific memories appear in Tier 3
   → Domain knowledge appears in Tier 4 (regardless of project)
   → Both are query-ranked by overlap-density
```

### Example: knowledge earned on p04 surfaces on p06

Working on p04-gigabit, you discover that Zerodur CTE gradient is the dominant WFE contributor at fast focal ratios. The extraction produces:

```json
[
  {"type": "project",
   "content": "CTE gradient analysis drove the M1 support pad layout — 2nd largest WFE contributor after gravity",
   "project": "p04-gigabit", "domain": "", "confidence": 0.6},
  {"type": "knowledge",
   "content": "Zerodur CTE gradient dominates WFE contribution at fast focal ratios (F/1.2 = ~3nm)",
   "project": "", "domain": "materials", "confidence": 0.6}
]
```

Two weeks later, working on p06-polisher (which also uses Zerodur):

```
Query:   "thermal effects on polishing accuracy"
Project: p06-polisher

Tier 3 (Project Memories):
  [project] Calibration loop adjusts Preston kp from surface measurements...

Tier 4 (Domain Knowledge):
  [materials] Zerodur CTE gradient dominates WFE contribution at fast focal ratios
  — THIS CAME FROM P04 WORK
```

The insight crosses over without any manual curation.

## Future directions

### Personal knowledge branch

The same architecture supports personal domains (health, finance, personal) by adding new domain tags and a trust boundary so Atomaste project data never leaks into personal packs.
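One way such a trust boundary could look at pack-assembly time, as a sketch: the `sphere` label and these names are hypothetical, not AtoCore's schema. The point is that the filter is a hard gate applied before ranking, so a personal pack can never rank (and thus leak) an Atomaste work memory, or vice versa.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    content: str
    domain: str   # e.g. "materials", "nutrition"
    sphere: str   # "work" or "personal" — hypothetical trust label

def admissible(memories: list[Memory], pack_sphere: str) -> list[Memory]:
    """Hard filter: only memories from the pack's own sphere survive.

    Applied before query-relevance ranking, so cross-sphere content is
    never even a candidate for the context pack."""
    return [m for m in memories if m.sphere == pack_sphere]
```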
The tagging system itself is domain-agnostic — it doesn't care whether the domain is "optics" or "nutrition".

### Multi-model extraction

Different models can specialize: sonnet for extraction, opus or Gemini for triage review. Independent validation reduces correlated blind spots on what qualifies as "earned insight" vs "common knowledge."

### Reinforcement-based domain promotion

A domain-knowledge memory that gets reinforced across multiple projects (its content echoed in p04, p05, and p06 responses) accumulates confidence faster than a project-specific memory. High-confidence domain memories could auto-promote to a "verified knowledge" tier above regular domain knowledge.
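A minimal sketch of what reinforcement-weighted promotion could look like. The diminishing-returns bonus per project and the promotion threshold are invented for illustration; only the idea, that breadth of reinforcement across projects counts for more than repetition within one, comes from the text above.

```python
def reinforced_confidence(base: float, reinforcing_projects: set[str]) -> float:
    """Each distinct project that echoes the insight adds a diminishing bonus.

    The 0.25 step is an illustrative constant; confidence asymptotically
    approaches but never exceeds 1.0."""
    conf = base
    for _ in reinforcing_projects:
        conf += (1.0 - conf) * 0.25
    return conf

def is_verified(base: float, reinforcing_projects: set[str],
                threshold: float = 0.95) -> bool:
    """Promote to a hypothetical 'verified knowledge' tier past the threshold."""
    return reinforced_confidence(base, reinforcing_projects) >= threshold
```

With a base confidence of 0.6 (the extractor's default in the example above), three reinforcing projects lift a memory to roughly 0.83: meaningfully higher trust, but still short of a strict verification threshold, which matches the intent that promotion should require sustained cross-project confirmation.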