ATOCore/docs/architecture/representation-authority.md


2026-04-07 06:50:56 -04:00
# Representation Authority (canonical home matrix)
## Why this document exists
The same fact about an engineering project can show up in many
places: a markdown note in the PKM, a structured field in KB-CAD,
a commit message in a Gitea repo, an active memory in AtoCore, an
entity in the engineering layer, a row in trusted project state.
**Without an explicit rule about which representation is
authoritative for which kind of fact, the system will accumulate
contradictions and the human will lose trust in all of them.**
This document is the canonical-home matrix. Every kind of fact
that AtoCore handles has exactly one authoritative representation,
and every other place that holds a copy of that fact is, by
definition, a derived view that may be stale.
## The representations in scope
Six places where facts can live in this ecosystem:
| Layer | What it is | Who edits it | How it's structured |
|---|---|---|---|
| **PKM** | Antoine's Obsidian-style markdown vault under `/srv/storage/atocore/sources/vault/` | Antoine, by hand | unstructured markdown with optional frontmatter |
| **KB project** | the engineering Knowledge Base (KB-CAD / KB-FEM repos and any companion docs) | Antoine, semi-structured | per-tool typed records |
| **Gitea repos** | source code repos under `dalidou:3000/Antoine/*` (Fullum-Interferometer, polisher-sim, ATOCore itself, ...) | Antoine via git commits | code, READMEs, repo-specific markdown |
| **AtoCore memories** | rows in the `memories` table | hand-authored or extracted from interactions | typed (identity / preference / project / episodic / knowledge / adaptation) |
| **AtoCore entities** | rows in the `entities` table (V1, not yet built) | imported from KB exports or extracted from interactions | typed entities + relationships per the V1 ontology |
| **AtoCore project state** | rows in the `project_state` table (Layer 3, trusted) | hand-curated only, never automatic | category + key + value |
## The canonical home rule
> For each kind of fact, exactly one representation is the
> authoritative source. Every other layer may hold derived
> copies, but those copies are not allowed to disagree with the
> authoritative one. When they disagree, the disagreement is a
> conflict and surfaces via the conflict model.
The matrix below assigns the authoritative representation per fact
kind. It is the practical answer to the question "where does this
fact actually live?" for daily decisions.
## The canonical-home matrix
| Fact kind | Canonical home | Why | How it gets into AtoCore |
|---|---|---|---|
| **CAD geometry** (the actual model) | NX (or successor CAD tool) | the only place that can render and validate it | not in AtoCore at all in V1 |
| **CAD-side structure** (subsystem tree, component list, materials, parameters) | KB-CAD | KB-CAD is the structured wrapper around NX | KB-CAD export → `/ingest/kb-cad/export` → entities |
| **FEM mesh & solver settings** | KB-FEM (wrapping the FEM tool) | only the solver representation can run | not in AtoCore at all in V1 |
| **FEM results & validation outcomes** | KB-FEM | KB-FEM owns the outcome records | KB-FEM export → `/ingest/kb-fem/export` → entities |
| **Source code** | Gitea repos | repos are version-controlled and reviewable | indirectly via repo markdown ingestion (Phase 1) |
| **Repo-level documentation** (READMEs, design docs in the repo) | Gitea repos | lives next to the code it documents | ingested as source chunks; never hand-edited in AtoCore |
| **Project-level prose notes** (decisions in long-form, journal-style entries, working notes) | PKM | the place Antoine actually writes when thinking | ingested as source chunks; the extractor proposes candidates from these for the review queue |
| **Identity** ("the user is a mechanical engineer running AtoCore") | AtoCore memories (`identity` type) | nowhere else holds personal identity | hand-authored via `POST /memory` or extracted from interactions |
| **Preference** ("prefers small reviewable diffs", "uses SI units") | AtoCore memories (`preference` type) | nowhere else holds personal preferences | hand-authored or extracted |
| **Episodic** ("on April 6 we debugged the EXDEV bug") | AtoCore memories (`episodic` type) | nowhere else has time-bound personal recall | extracted from captured interactions |
| **Decision** (a structured engineering decision) | AtoCore **entities** (Decision) once the engineering layer ships; AtoCore memories (`adaptation`) until then | needs structured supersession, audit trail, and link to affected components | extracted from PKM or interactions; promoted via review queue |
| **Requirement** | AtoCore **entities** (Requirement) | needs structured satisfaction tracking | extracted from PKM, KB-CAD, or interactions |
| **Constraint** | AtoCore **entities** (Constraint) | needs structured link to the entity it constrains | extracted from PKM, KB-CAD, or interactions |
| **Validation claim** | AtoCore **entities** (ValidationClaim) | needs structured link to supporting Result | extracted from KB-FEM exports or interactions |
| **Material** | KB-CAD if the material is on a real component; AtoCore entity (Material) if it's a project-wide material decision not yet attached to geometry | structured properties live in KB-CAD's material database | KB-CAD export, or hand-authored as a Material entity |
| **Parameter** | KB-CAD or KB-FEM depending on whether it's a geometry or solver parameter; AtoCore entity (Parameter) if it's a higher-level project parameter not in either tool | structured numeric values with units live in their tool of origin | KB export, or hand-authored |
| **Project status / current focus / next milestone** | AtoCore **project_state** (Layer 3) | the trust hierarchy says trusted state is the highest authority for "what is the current state of the project" | hand-curated via `POST /project/state` |
| **Architectural decision records (ADRs)** | depends on form: long-form ADR markdown lives in the repo; the structured fact about which ADR was selected lives in the AtoCore Decision entity | both representations are useful for different audiences | repo ingestion provides the prose; the entity is created by extraction or hand-authored |
| **Operational runbooks** | repo (next to the code they describe) | lives with the system it operates | not promoted into AtoCore entities — runbooks are reference material, not facts |
| **Backup metadata** (snapshot timestamps, integrity status) | the backup-metadata.json files under `/srv/storage/atocore/backups/` | each snapshot is its own self-describing record | not in AtoCore's database; queried via the `/admin/backup` endpoints |
| **Conversation history with AtoCore (interactions)** | AtoCore `interactions` table | nowhere else has the prompt + context pack + response triple | written by capture (Phase 9 Commit A) |
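
As a minimal sketch of how the matrix could be consulted in code, the lookup below encodes a subset of the rows that have a single unconditional home. The names `Layer`, `CANONICAL_HOME`, `canonical_home`, and `is_derived_copy` are hypothetical and are not part of any existing AtoCore module; rows with conditional homes (Material, Parameter, Decision, ADRs) are left out on purpose.

```python
from enum import Enum


class Layer(Enum):
    """The six representation layers (hypothetical identifiers)."""
    PKM = "pkm"
    KB_PROJECT = "kb_project"
    GITEA_REPO = "gitea_repo"
    MEMORY = "atocore_memory"
    ENTITY = "atocore_entity"
    PROJECT_STATE = "atocore_project_state"


# Subset of the matrix: only fact kinds with one unconditional home.
CANONICAL_HOME = {
    "cad_side_structure": Layer.KB_PROJECT,
    "fem_results": Layer.KB_PROJECT,
    "source_code": Layer.GITEA_REPO,
    "repo_documentation": Layer.GITEA_REPO,
    "project_prose_notes": Layer.PKM,
    "identity": Layer.MEMORY,
    "preference": Layer.MEMORY,
    "episodic": Layer.MEMORY,
    "requirement": Layer.ENTITY,
    "constraint": Layer.ENTITY,
    "validation_claim": Layer.ENTITY,
    "project_status": Layer.PROJECT_STATE,
}


def canonical_home(fact_kind: str) -> Layer:
    """Return the authoritative layer for a fact kind, or raise KeyError."""
    if fact_kind not in CANONICAL_HOME:
        raise KeyError(f"no canonical home registered for {fact_kind!r}")
    return CANONICAL_HOME[fact_kind]


def is_derived_copy(fact_kind: str, layer: Layer) -> bool:
    """A copy held anywhere other than the canonical home is a derived view."""
    return canonical_home(fact_kind) is not layer
```

Raising on an unknown fact kind, rather than returning a default layer, keeps a missing matrix row visible instead of silently assigning authority.
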
## The supremacy rule for cross-layer facts
When the same fact has copies in multiple representations and they
disagree, the trust hierarchy applies in this order:
1. **AtoCore project_state** (Layer 3) is highest authority for any
"current state of the project" question. This is why it requires
manual curation and never gets touched by automatic processes.
2. **The tool-of-origin canonical home** is highest authority for
facts that are tool-managed: KB-CAD wins over AtoCore entities
for CAD-side structure facts; KB-FEM wins for FEM result facts.
3. **AtoCore entities** are highest authority for facts that are
AtoCore-managed: Decisions, Requirements, Constraints,
ValidationClaims (when the supporting Results are still loose).
4. **Active AtoCore memories** are highest authority for personal
facts (identity, preference, episodic).
5. **Source chunks (PKM, repos, ingested docs)** are lowest
authority. They are the raw substrate from which higher layers
are extracted, but they may be stale or contradict one
another.
This is the same hierarchy enforced by `conflict-model.md`. This
document just makes it explicit per fact kind.
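
The ordering above can be sketched as a simple rank comparison. This is an illustrative simplification, assuming each copy of a fact has already been tagged with its tier; the tier names, the `Copy` type, and both functions are hypothetical, and the real rules in `conflict-model.md` may differ.

```python
from dataclasses import dataclass

# Highest authority first, mirroring the five tiers listed above.
TRUST_ORDER = [
    "project_state",
    "tool_of_origin",
    "entity",
    "active_memory",
    "source_chunk",
]


@dataclass(frozen=True)
class Copy:
    """One representation's copy of a fact (hypothetical shape)."""
    tier: str
    value: str


def pick_winner(copies: list[Copy]) -> Copy:
    """Return the copy held by the most trusted tier."""
    if not copies:
        raise ValueError("no copies to choose from")
    return min(copies, key=lambda c: TRUST_ORDER.index(c.tier))


def has_conflict(copies: list[Copy]) -> bool:
    """Copies conflict when they do not all carry the same value."""
    return len({c.value for c in copies}) > 1
```

Note that `pick_winner` only says who should win; per the canonical home rule, a disagreement is still surfaced as a conflict for the reviewer rather than resolved silently.
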
## Examples
### Example 1 — "what material does the lateral support pad use?"
Possible representations:
- KB-CAD has the field `component.lateral-support-pad.material = "GF-PTFE"`
- A PKM note from last month says "considering PEEK for the
lateral support, GF-PTFE was the previous choice"
- An AtoCore Material entity says `GF-PTFE`
- An AtoCore project_state entry says `p05 / decision /
lateral_support_material = GF-PTFE`
Which one wins for the question "what's the current material"?
- **project_state wins** if the query is "what is the current
trusted answer for p05's lateral support material" (Layer 3)
- **KB-CAD wins** if project_state has not been curated for this
field yet, because KB-CAD is the canonical home for CAD-side
structure
- **The Material entity** is a derived view from KB-CAD; if it
disagrees with KB-CAD, the entity is wrong and a conflict is
surfaced
- **The PKM note** is historical context, not authoritative for
"current"
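
The resolution in this example can be sketched as a small function. It is a sketch under stated assumptions: each layer's value is passed in as an optional string, the function name and the return shape are hypothetical, and the PKM note is deliberately not an input because it is historical context only.

```python
from typing import Optional


def resolve_material(
    project_state: Optional[str],
    kb_cad: Optional[str],
    entity: Optional[str],
) -> tuple[Optional[str], Optional[str], list[tuple[str, str]]]:
    """Return (value, winning_layer, conflicts) for a component material."""
    conflicts: list[tuple[str, str]] = []

    # The Material entity is a derived view of KB-CAD: a mismatch means
    # the entity is wrong and a conflict must be surfaced.
    if entity is not None and kb_cad is not None and entity != kb_cad:
        conflicts.append(("entity", "kb_cad"))

    # Layer 3 wins whenever it has been curated for this field.
    if project_state is not None:
        return project_state, "project_state", conflicts
    # Otherwise KB-CAD is the canonical home for CAD-side structure.
    if kb_cad is not None:
        return kb_cad, "kb_cad", conflicts
    # If KB-CAD does not know the material, the entity is all there is.
    if entity is not None:
        return entity, "entity", conflicts
    return None, None, conflicts
```

With the values from this example, `resolve_material("GF-PTFE", "GF-PTFE", "GF-PTFE")` picks project_state and reports no conflicts.
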
### Example 2 — "did we decide to merge the bind mounts?"
Possible representations:
- A working session interaction is captured in the `interactions`
table with the response containing `## Decision: merge the two
bind mounts into one`
- The Phase 9 Commit C extractor produced a candidate adaptation
memory from that decision
- A reviewer promoted the candidate to active
- The AtoCore source repo has the actual code change in commit
`d0ff8b5` and the docker-compose.yml is in its post-merge form
Which one wins for "is this decision real and current"?
- **The Gitea repo** wins for "is this decision implemented" —
the docker-compose.yml is the canonical home for the actual
bind mount configuration
- **The active adaptation memory** wins for "did we decide this"
— that's exactly what the Commit C lifecycle is for
- **The interaction record** is the audit trail — it's
authoritative for "when did this conversation happen and what
did the LLM say", but not for "is this decision current"
- **The source chunks** from PKM are not relevant here because no
PKM note about this decision exists yet (and that's fine —
decisions don't have to live in PKM if they live in the repo
and the AtoCore memory)
### Example 3 — "what's p05's current next focus?"
Possible representations:
- The PKM has a `current-status.md` note updated last week
- AtoCore project_state has `p05 / status / next_focus = "wave 2 ingestion"`
- A captured interaction from yesterday discussed the next focus
at length
Which one wins?
- **project_state wins**, full stop. The trust hierarchy says
Layer 3 is canonical for current state. This is exactly the
reason project_state exists.
- The PKM note is historical context.
- The interaction is conversation history.
- If project_state and the PKM disagree, the human updates one or
the other to bring them in line — usually by re-curating
project_state if the conversation revealed a real change.
## What this means for the engineering layer V1 implementation
Several concrete consequences fall out of the matrix:
1. **The Material and Parameter entity types are mostly KB-CAD
shadows in V1.** They exist in AtoCore so other entities
(Decisions, Requirements) can reference them with structured
links, but their authoritative values come from KB-CAD imports.
If KB-CAD doesn't know about a material, the AtoCore entity is
the canonical home by default, because no other layer holds it.
2. **Decisions / Requirements / Constraints / ValidationClaims
are AtoCore-canonical.** These don't have a natural home in
KB-CAD or KB-FEM. They live in AtoCore as first-class entities
with full lifecycle and supersession.
3. **The PKM is never authoritative for structured facts.** It is
the canonical home only for its own prose notes; for everything
else it is the substrate for extraction. The reviewer promotes
facts out of it and does not point at PKM notes as the
"current truth".
4. **project_state is the override layer.** Whenever the human
wants to declare "the current truth is X regardless of what
the entities and memories and KB exports say", they curate
into project_state. Layer 3 is intentionally small and
intentionally manual.
5. **The conflict model is the enforcement mechanism.** When two
representations disagree on a fact whose canonical home rule
should pick a winner, the conflict surfaces via the
`/conflicts` endpoint and the reviewer resolves it. The
matrix in this document tells the reviewer who is supposed
to win in each scenario; they're not making the decision blind.
## What the matrix does NOT define
1. **Facts about people other than the user.** No "team member"
entity, no per-collaborator preferences. AtoCore is
single-user in V1.
2. **Facts about AtoCore itself as a project.** Those are project
memories and project_state entries under `project=atocore`,
same lifecycle as any other project's facts.
3. **Vendor / supplier / cost facts.** Out of V1 scope.
4. **Time-bounded facts** (a value that was true between two
dates and may not be true now). The current matrix treats all
active facts as currently-true and uses supersession to
represent change. Temporal facts are a V2 concern.
5. **Cross-project shared facts** (a Material that is reused across
p04, p05, and p06). Currently each project has its own copy.
Cross-project deduplication is also a V2 concern.
## The "single canonical home" invariant in practice
The hard rule that every fact has exactly one canonical home is
the load-bearing invariant of this matrix. To enforce it
operationally:
- **Extraction never duplicates.** When the extractor scans an
interaction or a source chunk and proposes a candidate, the
candidate is dropped if it duplicates an already-active record
in the canonical home (the existing extractor implementation
already does this for memories; the entity extractor will
follow the same pattern).
- **Imports never duplicate.** When KB-CAD pushes the same
Component twice with the same value, the second push is
recognized as identical and updates the `last_imported_at`
timestamp without creating a new entity.
- **Imports surface drift as conflict.** When KB-CAD pushes the
same Component with a different value, that's a conflict per
the conflict model — never a silent overwrite.
- **Hand-curation into project_state always wins.** A
project_state entry can disagree with an entity or a KB
export; the project_state entry is correct by fiat (Layer 3
trust), and the reviewer is responsible for bringing the lower
layers in line if appropriate.
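
The two import rules above can be sketched with an in-memory store. Everything here is hypothetical (`EntityRow`, `Store`, `import_record`, and the returned status strings); the sketch only illustrates that an identical push refreshes `last_imported_at` while a differing push is queued as a conflict and never overwrites the active value.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class EntityRow:
    key: str
    value: str
    last_imported_at: datetime


@dataclass
class Store:
    """In-memory stand-in for the entities table and the conflict queue."""
    entities: dict = field(default_factory=dict)
    conflicts: list = field(default_factory=list)


def import_record(store: Store, key: str, value: str,
                  now: Optional[datetime] = None) -> str:
    """Apply one pushed record; return 'created', 'refreshed' or 'conflict'."""
    now = now or datetime.now(timezone.utc)
    existing = store.entities.get(key)

    if existing is None:
        store.entities[key] = EntityRow(key, value, now)
        return "created"

    if existing.value == value:
        # Identical push: no new entity, only the timestamp moves.
        existing.last_imported_at = now
        return "refreshed"

    # Drift: keep the active value and queue a conflict for the reviewer.
    store.conflicts.append((key, existing.value, value))
    return "conflict"
```
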
## Open questions for V1 implementation
1. **How does the reviewer see the canonical home for a fact in
the UI?** Probably by including the fact's authoritative
layer in the entity / memory detail view: "this Material is
currently mirrored from KB-CAD; the canonical home is KB-CAD".
2. **Who owns running the KB-CAD / KB-FEM exporter?** The
`tool-handoff-boundaries.md` doc lists this as an open
question; whatever is decided there applies here too.
3. **Do we need an explicit `canonical_home` field on entity
rows?** A field that records "this entity is canonical here"
vs "this entity is a mirror of <external system>". Probably
yes; deferred to the entity schema spec.
4. **How are project_state overrides surfaced in the engineering
layer query results?** When a query (e.g. Q-001 "what does
this subsystem contain?") would return entity rows, the result
should also flag any project_state entries that contradict the
entities — letting the reviewer see the override at query
time, not just in the conflict queue.
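
One possible answer to question 4 is sketched below, assuming entity rows are plain dicts with `key` and `value` fields and that project_state for the project is available as a key-to-value mapping. The function name and the two added fields are hypothetical; nothing in this shape has been decided.

```python
def annotate_with_overrides(entity_rows: list[dict],
                            project_state: dict) -> list[dict]:
    """Copy each row and flag project_state entries that contradict it."""
    annotated = []
    for row in entity_rows:
        override = project_state.get(row["key"])
        flagged = dict(row)  # do not mutate the caller's rows
        flagged["project_state_value"] = override
        flagged["overridden_by_project_state"] = (
            override is not None and override != row["value"]
        )
        annotated.append(flagged)
    return annotated
```

A row whose key has no project_state entry, or whose project_state value matches, comes back with the flag set to `False`, so the reviewer only sees a marker where Layer 3 actually disagrees.
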
## TL;DR
- Six representation layers: PKM, KB project, repos, AtoCore
memories, AtoCore entities, AtoCore project_state
- Every fact kind has exactly one canonical home
- The trust hierarchy resolves cross-layer conflicts:
project_state > tool-of-origin (KB-CAD/KB-FEM) > entities >
active memories > source chunks
- Decisions / Requirements / Constraints / ValidationClaims are
AtoCore-canonical (no other system has a natural home for them)
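- CAD-side structure is KB-CAD-canonical; Materials and Parameters
follow their tool of origin (KB-CAD or KB-FEM), with an AtoCore
entity as the home only when neither tool holds them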
- FEM results / validation outcomes are KB-FEM-canonical
- project_state is the human override layer, top of the
hierarchy, manually curated only
- Conflicts surface via `/conflicts` and the reviewer applies the
matrix to pick a winner