AtoCore Knowledge Architecture
The Problem
Engineering work produces two kinds of knowledge simultaneously:
- Applied knowledge — specific to the project being worked on ("the p04 support pad layout is driven by CTE gradient analysis")
- Domain knowledge — generalizable insight earned through that work ("Zerodur CTE gradient dominates WFE at fast focal ratios")
A system that only stores applied knowledge loses the general insight. A system that mixes them pollutes project context with cross-project noise. AtoCore needs both — separated, but both growing organically from the same conversations.
The Quality Bar
AtoCore stores earned insight, not information.
The test: "Would a competent engineer need experience to know this, or could they find it in 30 seconds?"
| Store | Don't store |
|---|---|
| "Preston removal model breaks down below 5N because the contact assumption fails" | "Preston's equation relates removal rate to pressure and velocity" |
| "m=1 (coma) is NOT correctable by force modulation (score 0.09)" | "Zernike polynomials describe wavefront aberrations" |
| "At F/1.2, CTE gradient costs ~3nm WFE and drives pad placement" | "Zerodur CTE is 0.05 ppm/K" |
| "Quilting limit for 16-inch tool is 234N" | "Quilting is a mid-spatial-frequency artifact in polishing" |
The bar is enforced in the LLM extraction system prompt
(src/atocore/memory/extractor_llm.py) and the auto-triage prompt
(scripts/auto_triage.py). Both explicitly list examples of what
qualifies and what doesn't.
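As a sketch, such a rubric might be wired into an extraction prompt like this. The constant and function names are illustrative only; the actual prompt text lives in the files above.

```python
# Illustrative sketch -- the real rubric lives in the system prompts of
# src/atocore/memory/extractor_llm.py and scripts/auto_triage.py.

QUALITY_BAR = """\
Store only EARNED insight. Test: would a competent engineer need
experience to know this, or could they find it in 30 seconds?

STORE: "Preston removal model breaks down below 5N because the
        contact assumption fails"
SKIP:  "Preston's equation relates removal rate to pressure and velocity"
"""

def build_extraction_prompt(interaction_text: str) -> str:
    """Prepend the quality bar so the LLM applies it to every candidate."""
    return f"{QUALITY_BAR}\nInteraction:\n{interaction_text}"
```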
Architecture
Five-tier context assembly
When AtoCore builds a context pack for any LLM query, it assembles five tiers in strict trust order:
Tier 1: Trusted Project State [project-specific, highest trust]
Curated key-value entries from the project state API.
Example: "decision/vendor_path: Twyman-Green preferred, 4D
technical lead but cost-challenged"
Tier 2: Identity / Preferences [global, always included]
Who the user is and how they work.
Example: "Antoine Letarte, mechanical/optical engineer at
Atomaste" / "No API keys — uses OAuth exclusively"
Tier 3: Project Memories [project-specific]
Reinforced memories from the reflection loop, scoped to the
queried project. Example: "Firmware interface contract is
invariant: controller-job.v1 in, run-log.v1 out"
Tier 4: Domain Knowledge [cross-project]
Earned engineering insight with project="" and a domain tag.
Surfaces in ALL project packs when query-relevant.
Example: "[materials] Zerodur CTE gradient dominates WFE at
fast focal ratios — costs ~3nm at F/1.2"
Tier 5: Retrieved Chunks [project-boosted, lowest trust]
Vector-similarity search over the ingested document corpus.
Project-hinted but not filtered — cross-project docs can
appear at lower rank.
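The strict trust ordering above can be sketched as follows. The `Tier` dataclass and `assemble_pack` function are hypothetical illustrations of the assembly order, not AtoCore's actual API:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    trust: int          # 1 = highest trust (Project State), 5 = lowest (Chunks)
    entries: list[str]

def assemble_pack(tiers: list[Tier]) -> str:
    """Concatenate non-empty tiers in strict trust order, Tier 1 first."""
    sections = []
    for tier in sorted(tiers, key=lambda t: t.trust):
        if tier.entries:
            sections.append(f"## {tier.name}\n" + "\n".join(tier.entries))
    return "\n\n".join(sections)
```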
Budget allocation (at default 3000 chars)
| Tier | Budget ratio | Approx chars | Entries |
|---|---|---|---|
| Project State | 20% | 600 | all curated entries |
| Identity/Preferences | 5% | 150 | 1 memory |
| Project Memories | 25% | 750 | 2-3 memories |
| Domain Knowledge | 10% | 300 | 1-2 memories |
| Retrieved Chunks | 40% | 1200 | 2-4 chunks |
Trim order when budget is tight: chunks first, then domain knowledge, then project memories, then identity, then project state last.
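A minimal sketch of the budget and trim logic above, with ratios taken from the table; the function names are illustrative, not AtoCore's real implementation:

```python
# Ratios from the budget table (default pack = 3000 chars).
BUDGET_RATIOS = {
    "project_state": 0.20,
    "identity": 0.05,
    "project_memories": 0.25,
    "domain_knowledge": 0.10,
    "chunks": 0.40,
}

# Trim order when over budget: lowest-trust content goes first,
# project state last.
TRIM_ORDER = ["chunks", "domain_knowledge", "project_memories",
              "identity", "project_state"]

def allocate(total_chars: int = 3000) -> dict[str, int]:
    """Split the character budget across tiers by ratio."""
    return {tier: int(total_chars * r) for tier, r in BUDGET_RATIOS.items()}

def trim(sections: dict[str, str], limit: int) -> dict[str, str]:
    """Drop whole sections in TRIM_ORDER until the pack fits."""
    out = dict(sections)
    for tier in TRIM_ORDER:
        if sum(len(v) for v in out.values()) <= limit:
            break
        out.pop(tier, None)
    return out
```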
Knowledge domains
The LLM extractor tags domain knowledge with one of these domains:
| Domain | What qualifies |
|---|---|
| physics | Optical physics, wave propagation, diffraction, thermal effects |
| materials | Material properties in context, CTE behavior, stress limits |
| optics | Lens/mirror design, aberration analysis, metrology techniques |
| mechanics | Structural FEA insights, support system design, kinematics |
| manufacturing | Polishing, grinding, machining, process control |
| metrology | Measurement systems, interferometry, calibration techniques |
| controls | PID tuning, force control, servo systems, real-time constraints |
| software | Architecture patterns, testing strategies, deployment insights |
| math | Numerical methods, optimization, statistical analysis |
| finance | Cost modeling, procurement strategy, budget optimization |
New domains can be added by updating the system prompt in
extractor_llm.py and batch_llm_extract_live.py.
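For illustration, a central registry of the ten domains might look like this (hypothetical: in the current design the list exists only inside the two prompts, not as shared code):

```python
# Hypothetical central registry of valid domain tags.
DOMAINS = {
    "physics", "materials", "optics", "mechanics", "manufacturing",
    "metrology", "controls", "software", "math", "finance",
}

def validate_domain(tag: str) -> bool:
    """Reject domain tags the extractor was never taught."""
    return tag in DOMAINS
```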
How domain knowledge is stored
Domain tags are embedded as a prefix in the memory content:
memory_type: knowledge
project: "" ← empty = cross-project
content: "[materials] Zerodur CTE gradient dominates WFE at F/1.2"
The [domain] prefix is a lightweight encoding that avoids a schema
migration. The context builder's query-relevance ranking matches on
domain terms naturally (a query about "materials" or "CTE" will rank
a [materials] memory higher). A future migration can parse the
prefix into a proper domain column.
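A sketch of parsing the prefix back out, assuming a simple regex; this is illustrative, not the context builder's actual code:

```python
import re

# Matches a leading "[domain]" tag followed by the bare content.
_DOMAIN_RE = re.compile(r"^\[(\w+)\]\s*(.*)$", re.DOTALL)

def split_domain(content: str) -> tuple[str, str]:
    """Return (domain, bare_content); domain is '' when no prefix exists."""
    m = _DOMAIN_RE.match(content)
    if m:
        return m.group(1), m.group(2)
    return "", content
```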
How knowledge flows
Capture → Extract → Triage → Surface
1. CAPTURE
Claude Code (Stop hook) or OpenClaw (plugin)
→ POST /interactions with reinforce=true
→ Interaction stored on Dalidou
2. EXTRACT (nightly cron, 03:00 UTC)
batch_llm_extract_live.py runs claude -p sonnet
→ For each interaction, the LLM decides:
- Is this project-specific? → candidate with project=X
- Is this generalizable insight? → candidate with domain=Y, project=""
- Is it both? → TWO candidates emitted
- Is it common knowledge? → skip (quality bar)
→ Candidates persisted as status=candidate
3. TRIAGE (nightly, immediately after extraction)
auto_triage.py runs claude -p sonnet
→ Each candidate classified: promote / reject / needs_human
→ Auto-promote at confidence ≥ 0.8 + no duplicate
→ Auto-reject stale snapshots, duplicates, common knowledge
→ Only needs_human reaches the operator
4. SURFACE (every context/build query)
→ Project-specific memories appear in Tier 3
→ Domain knowledge appears in Tier 4 (regardless of project)
→ Both are query-ranked by overlap-density
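The overlap-density ranking in step 4 can be sketched as a plain term-overlap score. This is an assumption about the metric's shape, not AtoCore's exact formula:

```python
def overlap_density(query: str, text: str) -> float:
    """Fraction of query terms that also appear in the memory text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & t_terms) / len(q_terms)

def rank(query: str, memories: list[str]) -> list[str]:
    """Order memories by descending overlap with the query."""
    return sorted(memories, key=lambda m: overlap_density(query, m),
                  reverse=True)
```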
Example: knowledge earned on p04 surfaces on p06
Working on p04-gigabit, you discover that Zerodur CTE gradient is the dominant WFE contributor at fast focal ratios. The extraction produces:
[
  {
    "type": "project",
    "content": "CTE gradient analysis drove the M1 support pad layout — 2nd largest WFE contributor after gravity",
    "project": "p04-gigabit",
    "domain": "",
    "confidence": 0.6
  },
  {
    "type": "knowledge",
    "content": "Zerodur CTE gradient dominates WFE contribution at fast focal ratios (F/1.2 = ~3nm)",
    "project": "",
    "domain": "materials",
    "confidence": 0.6
  }
]
Two weeks later, working on p06-polisher (which also uses Zerodur):
Query: "thermal effects on polishing accuracy"
Project: p06-polisher
Tier 3 (Project Memories):
[project] Calibration loop adjusts Preston kp from surface measurements...
Tier 4 (Domain Knowledge):
[materials] Zerodur CTE gradient dominates WFE contribution at fast
focal ratios — THIS CAME FROM P04 WORK
The insight crosses over without any manual curation.
Future directions
Personal knowledge branch
The same architecture supports personal domains (health, finance, personal) by adding new domain tags plus a trust boundary, so Atomaste project data never leaks into personal packs. The tagging mechanism itself is domain-agnostic: it doesn't care whether the domain is "optics" or "nutrition".
Multi-model extraction
Different models can specialize: sonnet for extraction, opus or Gemini for triage review. Independent validation reduces correlated blind spots on what qualifies as "earned insight" vs "common knowledge."
Reinforcement-based domain promotion
A domain-knowledge memory that gets reinforced across multiple projects (its content echoed in p04, p05, and p06 responses) accumulates confidence faster than a project-specific memory. High-confidence domain memories could auto-promote to a "verified knowledge" tier above regular domain knowledge.
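As a sketch of the idea (the threshold and project count below are illustrative, not implemented):

```python
VERIFIED_THRESHOLD = 0.9  # illustrative cutoff, not a real AtoCore setting

def maybe_verify(confidence: float, projects_reinforced: set[str]) -> bool:
    """A domain memory echoed in 2+ projects with high confidence could
    graduate to a hypothetical 'verified knowledge' tier."""
    return confidence >= VERIFIED_THRESHOLD and len(projects_reinforced) >= 2
```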