docs(arch): memory-vs-entities, promotion-rules, conflict-model

Three planning docs that answer the architectural questions the
engineering query catalog raised. Together with the catalog they
form roughly half of the pre-implementation planning sprint.

docs/architecture/memory-vs-entities.md
---------------------------------------

Resolves the central question blocking every other engineering
layer doc: is a Decision a memory or an entity? Key decisions:

- memories stay the canonical home for identity, preference, and
  episodic facts
- entities become the canonical home for project, knowledge, and
  adaptation facts once the engineering layer V1 ships
- no concept lives in both layers at full fidelity; one canonical
  home per concept
- a "graduation" flow lets active memories upgrade into entities
  (memory stays as a frozen historical pointer, never deleted)
- one shared candidate review queue across both layers
- context builder budget gains a 15% slot for engineering entities,
  slotted between identity/preference memories and retrieved chunks
- the Phase 9 memory extractor's structural cues (decision heading,
  constraint heading, requirement heading) are explicitly an
  intentional temporary overlap, cleanly migrated via graduation
  when the entity extractor ships

docs/architecture/promotion-rules.md
------------------------------------

Defines the full Layer 0 → Layer 2 pipeline:

- four layers: L0 raw source, L1 memory candidate/active, L2 entity
  candidate/active, L3 trusted project state
- three extraction triggers: on interaction capture (existing), on
  ingestion wave (new, batched per wave), on explicit request
- per-rule prior confidence tuned at write time by structural signal
  (echoes the retriever's high/low signal hints) and freshness bonus
- batch cap of 50 candidates per pass to protect the reviewer
- full provenance requirements: every candidate carries rule id,
  source_chunk_id, source_interaction_id, and extractor_version
- reversibility matrix for every promotion step
- explicit no-auto-promotion-in-V1 stance with the schema designed
  so auto-promotion policies can be added later without migration
- the hard invariant: nothing ever moves into L3 automatically
- ingestion-wave extraction produces a report artifact under
  data/extraction-reports/<wave-id>/

docs/architecture/conflict-model.md
-----------------------------------

Defines how AtoCore handles contradictory facts without violating
the "bad memory is worse than no memory" rule.

- conflict = two or more active rows claiming the same slot with
  incompatible values
- per-type "slot key" tuples for both memory and entity types
- cross-layer conflict detection respects the trust hierarchy:
  trusted project state > active entities > active memories
- new conflicts and conflict_members tables (schema proposal)
- detection at two latencies: synchronous at write time,
  asynchronous nightly sweep
- "flag, never block" rule: writes always succeed, conflicts are
  surfaced via /conflicts, /health open_conflicts_count, per-row
  response bodies, and the Human Mirror's disputed marker
- resolution is always human: promote-winner + supersede-others, or
  dismiss-as-not-a-real-conflict, both with audit trail
- explicitly out of scope for V1: cross-project conflicts,
  temporal-overlap conflicts, tolerance-aware numeric comparisons

Also updates:

- master-plan-status.md: Phase 9 moved from "started" to "baseline
  complete" now that Commits A, B, C are all landed
- master-plan-status.md: adds an "Engineering Layer Planning Sprint"
  section listing the doc wave so far and the remaining docs
  (tool-handoff-boundaries, human-mirror-rules,
  representation-authority, engineering-v1-acceptance)
- current-state.md: Phase 9 moved from "not started" to "baseline
  complete" with the A/B/C annotation

This is pure doc work. No code changes, no schema changes, no
behavior changes. Per the working rule in master-plan-status.md:
the architecture docs shape decisions, they do not force premature
schema work.
2026-04-06 21:30:35 -04:00
parent 53147d326c
commit 480f13a6df
5 changed files with 1013 additions and 4 deletions
--- a/docs/architecture/memory-vs-entities.md
+++ b/docs/architecture/memory-vs-entities.md
@@ -0,0 +1,309 @@
+# Memory vs Entities (Engineering Layer V1 boundary)
+
+## Why this document exists
+
+The engineering layer introduces a new representation — typed
+entities with explicit relationships — alongside AtoCore's existing
+memory system and its six memory types. The question that blocks
+every other engineering-layer planning doc is:
+
+> When we extract a fact from an interaction or a document, does it
+> become a memory, an entity, or both? And if both, which one is
+> canonical?
+
+Without an answer, the rest of the engineering layer cannot be
+designed. This document is the answer.
+
+## The short version
+
+- **Memories stay.** They are still the canonical home for
+  *unstructured, attributed, personal, natural-language* facts.
+- **Entities are new.** They are the canonical home for *structured,
+  typed, relational, engineering-domain* facts.
+- **No concept lives in both at full fidelity.** Every concept has
+  exactly one canonical home. The other layer may hold a pointer or
+  a rendered view, never a second source of truth.
+- **The two layers share one review queue.** Candidates from
+  extraction flow into the same `status=candidate` lifecycle
+  regardless of whether they are memory-bound or entity-bound.
+- **Memories can "graduate" into entities** when enough structure has
+  accumulated, but the upgrade is an explicit, logged promotion, not
+  a silent rewrite.
+
+## The split per memory type
+
+The six memory types from the current Phase 2 implementation each
+map to exactly one outcome in V1:
+
+| Memory type   | V1 destination                | Rationale                                                                                                   |
+|---------------|-------------------------------|-------------------------------------------------------------------------------------------------------------|
+| identity      | **memory only**               | Always about the human user. No engineering domain structure. Never gets entity-shaped.                     |
+| preference    | **memory only**               | Always about the human user's working style. Same reasoning.                                                |
+| episodic      | **memory only**               | "What happened in this conversation / this day." Attribution and time are the point, not typed structure.  |
+| knowledge     | **entity when possible**, memory otherwise | If the knowledge maps to a typed engineering object (material property, constant, tolerance), it becomes a Fact entity with provenance. If it's loose general knowledge, stays a memory. |
+| project       | **entity**                    | Anything that belonged in the "project" memory type is really a Requirement, Constraint, Decision, Subsystem attribute, etc. It belongs in the engineering layer once entities exist. |
+| adaptation    | **entity (Decision)**         | "We decided to X" is literally a Decision entity in the ontology. This is the clearest migration.            |
+
+**Practical consequence:** when the engineering layer V1 ships, the
+`project` and `adaptation` memory types are deprecated as a canonical
+home for new facts, and `knowledge` remains a memory type only for
+loose facts that have no typed shape. Existing rows are not deleted:
+they are backfilled as entities through the promotion-rules flow
+(see `promotion-rules.md`), and the old memory rows become frozen
+references pointing at their graduated entity.
+
+The `identity`, `preference`, and `episodic` memory types continue
+to exist exactly as they do today and do not interact with the
+engineering layer at all.
+
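As a rough illustration of the table above, the per-type routing could be sketched as a small lookup. The names `Destination`, `route_extraction`, `is_structured`, and `engineering_v1_live` are hypothetical and do not come from the AtoCore codebase; only the six memory type names and their V1 destinations are taken from this document.

```python
from enum import Enum


class Destination(Enum):
    MEMORY = "memory"
    ENTITY = "entity"


# Types that always stay memories (about the human user or the conversation).
MEMORY_ONLY = {"identity", "preference", "episodic"}
# Types whose canonical home moves to the entity layer once V1 ships.
ENTITY_BOUND = {"project", "adaptation"}


def route_extraction(memory_type: str, *, is_structured: bool = False,
                     engineering_v1_live: bool = True) -> Destination:
    """Return the canonical home for a newly extracted fact (hypothetical helper)."""
    if memory_type in MEMORY_ONLY:
        return Destination.MEMORY
    if memory_type not in ENTITY_BOUND and memory_type != "knowledge":
        raise ValueError(f"unknown memory type: {memory_type!r}")
    if not engineering_v1_live:
        # Before V1 ships, every type is still written as a memory.
        return Destination.MEMORY
    if memory_type in ENTITY_BOUND:
        return Destination.ENTITY
    # Structured knowledge becomes an entity; loose knowledge stays a memory.
    return Destination.ENTITY if is_structured else Destination.MEMORY
```

Under these assumptions, `route_extraction("knowledge")` keeps a loose fact in the memory layer, while `route_extraction("knowledge", is_structured=True)` sends it to the entity layer. How "structured" is decided is left open here.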
+## What "canonical home" actually means
+
+A concept's canonical home is the single place where:
+
+- its *current active value* is stored
+- its *status lifecycle* is managed (candidate/active/superseded/invalid)
+- its *confidence* is tracked
+- its *provenance chain* is rooted
+- edits, supersessions, and invalidations are applied
+- conflict resolution is arbitrated
+
+Everything else is a derived view of that canonical row.
+
+If a `Decision` entity is the canonical home for "we switched to
+GF-PTFE pads", then:
+
+- there is no `adaptation` memory row with the same content; the
+  extractor creates a `Decision` candidate directly
+- the context builder, when asked to include relevant state, reaches
+  into the entity store via the engineering layer, not the memory
+  store
+- if the user wants to see "recent decisions" they hit the entity
+  API, never the memory API
+- if they want to invalidate the decision, they do so via the entity
+  API
+
+The memory API remains the canonical home for `identity`,
+`preference`, and `episodic` — same rules, just a different set of
+types.
+
+## Why not a unified table with a `kind` column?
+
+It would be simpler to implement. It is rejected for three reasons:
+
+1. **Different query shapes.** Memories are queried by type, project,
+   confidence, recency. Entities are queried by type, relationships,
+   graph traversal, coverage gaps ("orphan requirements"). Cramming
+   both into one table forces the schema to be the union of both
+   worlds and makes each query slower.
+
+2. **Different lifecycles.** Memories have a simple four-state
+   lifecycle (candidate/active/superseded/invalid). Entities have
+   the same four states *plus* per-relationship supersession,
+   per-field versioning for the killer correctness queries, and
+   structured conflict flagging. The unified table would have to
+   carry all entity apparatus for every memory row.
+
+3. **Different provenance semantics.** A preference memory is
+   provenanced by "the user told me" — one author, one time.
+   An entity like a `Requirement` is provenanced by "this source
+   chunk + this source document + these supporting Results" — a
+   graph. The tables want to be different because their provenance
+   models are different.
+
+So: two tables, one review queue, one promotion flow, one trust
+hierarchy.
+
+## The shared review queue
+
+Both the memory extractor (Phase 9 Commit C, already shipped) and
+the future entity extractor write into the same conceptual queue:
+everything lands at `status=candidate` in its own table, and the
+human reviewer sees a unified list. The reviewer UI (future work)
+shows candidates of all kinds side by side, grouped by source
+interaction / source document, with the rule that fired.
+
+From the data side this means:
+
+- the memories table gets a `candidate` status (**already done in
+  Phase 9 Commit B/C**)
+- the future entities table will get the same `candidate` status
+- both tables get the same `promote` / `reject` API shape: one verb
+  per candidate, with an audit log entry
+
+Implementation note: the API routes should evolve from
+`POST /memory/{id}/promote` to `POST /candidates/{id}/promote` once
+both tables exist, so the reviewer tooling can treat them
+uniformly. The current memory-only route stays in place for
+backward compatibility, served as an alias of the unified route.
+
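A minimal in-memory sketch of the shared queue idea, assuming two separate tables and one promote/reject shape. `Row`, `ReviewQueue`, and the dict-backed tables are hypothetical stand-ins, not the AtoCore schema; mapping a rejected candidate to `invalid` is also an assumption, since this document does not name the post-reject status.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    id: str
    kind: str    # "memory" or "entity": which table the row lives in
    source: str  # source interaction or source document id
    rule: str    # extraction rule that fired
    status: str = "candidate"


@dataclass
class ReviewQueue:
    memories: dict = field(default_factory=dict)
    entities: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _table(self, kind: str) -> dict:
        return self.memories if kind == "memory" else self.entities

    def add(self, row: Row) -> None:
        self._table(row.kind)[row.id] = row

    def list_candidates(self) -> dict:
        """Unified view: candidates from both tables, grouped by source."""
        grouped: dict = {}
        for table in (self.memories, self.entities):
            for row in table.values():
                if row.status == "candidate":
                    grouped.setdefault(row.source, []).append(row)
        return grouped

    def _resolve(self, kind: str, row_id: str, status: str, verb: str) -> Row:
        row = self._table(kind)[row_id]
        if row.status != "candidate":
            raise ValueError(f"{kind} {row_id} is not a candidate")
        row.status = status
        # One audit log entry per verb applied to a candidate.
        self.audit_log.append({"verb": verb, "kind": kind, "id": row_id})
        return row

    def promote(self, kind: str, row_id: str) -> Row:
        return self._resolve(kind, row_id, "active", "promote")

    def reject(self, kind: str, row_id: str) -> Row:
        # Assumption: a rejected candidate ends up as "invalid".
        return self._resolve(kind, row_id, "invalid", "reject")
```

The point of the sketch is that the reviewer-facing verbs do not care which table a candidate lives in; only the internal `_table` lookup does.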
+## Memory-to-entity graduation
+
+Even though the split is clean on paper, real usage will reveal
+memories that deserve to be entities but started as plain text.
+Four signals are good candidates for proposing graduation:
+
+1. **Reference count crosses a threshold.** A memory that has been
+   reinforced 5+ times across multiple interactions is a strong
+   signal that it deserves structure.
+
+2. **Memory content matches a known entity template.** If a
+   `knowledge` memory's content matches the shape "X = value [unit]"
+   it can be proposed as a `Fact` or `Parameter` entity.
+
+3. **A user explicitly asks for promotion.** `POST /memory/{id}/graduate`
+   is the simplest explicit path — it returns a proposal for an
+   entity structured from the memory's content, which the user can
+   accept or reject.
+
+4. **Extraction pass proposes an entity that happens to match an
+   existing memory.** The entity extractor, when scanning a new
+   interaction, sees the same content already exists as a memory
+   and proposes graduation as part of its candidate output.
+
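Signals 1 and 2 are mechanical enough to sketch. The regular expression below is one possible reading of the "X = value [unit]" shape, and `graduation_signals` / `parse_parameter` are hypothetical helper names; the threshold of 5 is the "5+ times" figure from signal 1.

```python
import re

# Hypothetical template for the "X = value [unit]" shape; the unit is optional.
PARAM_TEMPLATE = re.compile(
    r"^\s*(?P<name>[^=]+?)\s*=\s*"
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
    r"\s*(?P<unit>\S.*?)?\s*$"
)

REINFORCEMENT_THRESHOLD = 5  # "reinforced 5+ times" (signal 1)


def graduation_signals(content: str, reinforcement_count: int) -> list:
    """Return the names of the graduation signals that fire for a memory."""
    signals = []
    if reinforcement_count >= REINFORCEMENT_THRESHOLD:
        signals.append("reference_count")
    if PARAM_TEMPLATE.match(content):
        signals.append("entity_template")
    return signals


def parse_parameter(content: str):
    """Split template-shaped content into name, value, and unit, or return None."""
    m = PARAM_TEMPLATE.match(content)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "value": float(m.group("value")),
        "unit": m.group("unit"),  # None when no unit is present
    }
```

With a made-up memory such as `"pad thickness = 3.5 mm"`, `parse_parameter` yields a name, a numeric value, and a unit that a `Fact` or `Parameter` proposal could be built from; free-text content like "we switched to GF-PTFE pads" does not match and returns `None`.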
+The graduation flow is:
+
+```
+memory row (active, confidence C)
+      |
+      | propose_graduation()
+      v
+entity candidate row (candidate, confidence C)
+      +
+memory row gets status="graduated" and a forward pointer to the
+entity candidate
+      |
+      | human promotes the candidate entity
+      v
+entity row (active)
+      +
+memory row stays "graduated" permanently (historical record)
+```
+
+The memory is never deleted. It becomes a frozen historical
+pointer to the entity it became. This keeps the audit trail intact
+and lets the Human Mirror show "this decision started life as a
+memory on April 2, was graduated to an entity on April 15, now has
+2 supporting ValidationClaims".
+
+The `graduated` status is a new memory status that gets added when
+the graduation flow is implemented. Only the three graduating types
+(knowledge/project/adaptation) can ever reach it; the three
+non-graduating types (identity/preference/episodic) never do. For
+now (Phase 9), the graduating types stay in their current
+memory-only state until the engineering layer ships.
+
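The graduation flow in this section can be sketched as two state transitions. The `Memory` and `Entity` dataclasses, their field names (`graduated_to`, `source_memory_id`), and `promote_entity` are assumptions made for illustration; only `propose_graduation()`, the `graduated` status, and the carried-over confidence come from the diagram above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    id: str
    content: str
    confidence: float
    status: str = "active"
    graduated_to: Optional[str] = None  # forward pointer to the entity row


@dataclass
class Entity:
    id: str
    content: str
    confidence: float
    status: str = "candidate"
    source_memory_id: Optional[str] = None  # back pointer for the audit trail


def propose_graduation(memory: Memory, entity_id: str) -> Entity:
    """Create an entity candidate from an active memory and freeze the memory."""
    if memory.status != "active":
        raise ValueError("only active memories can graduate")
    candidate = Entity(
        id=entity_id,
        content=memory.content,
        confidence=memory.confidence,  # confidence C carries over unchanged
        source_memory_id=memory.id,
    )
    memory.status = "graduated"  # the memory row is kept, never deleted
    memory.graduated_to = candidate.id
    return candidate


def promote_entity(candidate: Entity) -> Entity:
    """Human review step: the candidate becomes the canonical active row."""
    if candidate.status != "candidate":
        raise ValueError("only candidates can be promoted")
    candidate.status = "active"
    return candidate
```

The sketch leaves open what happens to the memory row if the reviewer rejects the entity candidate, since this document does not specify it.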
+## Context pack assembly after the split
+
+The context builder today (`src/atocore/context/builder.py`) pulls:
+
+1. Trusted Project State
+2. Identity + Preference memories
+3. Retrieved chunks
+
+After the split, it pulls:
+
+1. Trusted Project State (unchanged)
+2. **Identity + Preference memories** (unchanged — these stay memories)
+3. **Engineering-layer facts relevant to the prompt**, queried through
+   the entity API (new)
+4. Retrieved chunks (unchanged, lowest trust)
+
+Note the ordering: identity/preference memories stay above entities,
+because personal style information is always more trusted than
+extracted engineering facts. Entities sit below the personal layer
+but above raw retrieval, because they have structured provenance
+that raw chunks lack.
+
+The budget allocation gains a new slot:
+
+- trusted project state: 20% (unchanged, highest trust)
+- identity memories: 5% (unchanged)
+- preference memories: 5% (unchanged)
+- **engineering entities: 15%** (new — pulls only V1-required
+  objects relevant to the prompt)
+- retrieval: 55% (reduced from 70% to make room)
+
+These are starting numbers. After the engineering layer ships and
+real usage tunes retrieval quality, these will be revisited.
+
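The budget arithmetic above can be checked with a few lines. The percentages are the ones listed in this section (with the pre-split retrieval share taken from "reduced from 70%"); `allocate` and the slot key names are hypothetical, and the budget unit (tokens or characters) is left unspecified.

```python
# Percent shares of the context budget, before and after the split.
BUDGET_TODAY = {
    "trusted_project_state": 20,
    "identity_memories": 5,
    "preference_memories": 5,
    "retrieval": 70,
}
BUDGET_AFTER_SPLIT = {
    "trusted_project_state": 20,
    "identity_memories": 5,
    "preference_memories": 5,
    "engineering_entities": 15,  # new slot
    "retrieval": 55,             # reduced from 70 to make room
}


def allocate(total_budget: int, shares: dict) -> dict:
    """Split a budget by integer percentage shares (hypothetical helper)."""
    if sum(shares.values()) != 100:
        raise ValueError("budget shares must sum to 100%")
    return {slot: total_budget * pct // 100 for slot, pct in shares.items()}
```

Both tables sum to 100%. For an illustrative budget of 4000 units, `allocate(4000, BUDGET_AFTER_SPLIT)` gives engineering entities 600 units and retrieval 2200, down from 2800 under `BUDGET_TODAY`.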
+## What the shipped memory types still mean after the split
+
+| Memory type | Still accepts new writes? | V1 destination for new extractions |
+|-------------|---------------------------|------------------------------------|
+| identity    | **yes**                   | memory (no change)                 |
+| preference  | **yes**                   | memory (no change)                 |
+| episodic    | **yes**                   | memory (no change)                 |
+| knowledge   | yes, but only for loose facts | entity (Fact / Parameter) for structured things; memory is a fallback |
+| project     | **no new writes after engineering V1 ships** | entity (Requirement / Constraint / Subsystem attribute) |
+| adaptation  | **no new writes after engineering V1 ships** | entity (Decision) |
+
+"No new writes" means the `create_memory` path will refuse to
+create new `project` or `adaptation` memories once the engineering
+layer V1 ships. Existing rows stay queryable and reinforceable but
+new facts of those kinds must become entities. This keeps the
+canonical-home rule clean going forward.
+
+The deprecation is deferred: it does not happen until the engineering
+layer V1 is demonstrably working against the active project set. Until
+then, the existing memory types continue to accept writes so the
+Phase 9 loop can be exercised without waiting on the engineering
+layer.
+
+## Consequences for Phase 9 (what we just built)
+
+The capture loop, reinforcement, and extractor we shipped today
+are *memory-facing*. They produce memory candidates, reinforce
+memory confidence, and respect the memory status lifecycle. None
+of that changes.
+
+When the engineering layer V1 ships, the extractor in
+`src/atocore/memory/extractor.py` gets a sibling in
+`src/atocore/entities/extractor.py` that uses the same
+interaction-scanning approach but produces entity candidates
+instead. The `POST /interactions/{id}/extract` endpoint either:
+
+- runs both extractors and returns a combined result, or
+- gains a `?target=memory|entities|both` query parameter
+
+and the decision between those two shapes can wait until the
+entity extractor actually exists.
+
+Until the entity layer is real, the memory extractor also has to
+cover some things that will eventually move to entities (decisions,
+constraints, requirements). **That overlap is temporary and
+intentional.** Rather than leave those cues unextracted for months
+while the entity layer is being built, the memory extractor
+surfaces them as memory candidates. Later, a migration pass will
+propose graduation on every active memory created by
+`decision_heading`, `constraint_heading`, and `requirement_heading`
+rules once the entity types exist to receive them.
+
+So: **no rework in Phase 9, no wasted extraction, clean handoff
+once the entity layer lands**.
+
+## Open questions this document does NOT answer
+
+These are deliberately deferred to other planning docs:
+
+1. **When exactly does extraction fire?** (answered by
+   `promotion-rules.md`)
+2. **How are conflicts between a memory and an entity handled
+   during graduation?** (answered by `conflict-model.md`)
+3. **Does the context builder traverse the entity graph for
+   relationship-rich queries, or does it only surface direct facts?**
+   (answered by the context-builder spec in a future
+   `engineering-context-integration.md` doc)
+4. **What is the exact API shape of the unified candidate review
+   queue?** (answered by a future `review-queue-api.md` doc when
+   the entity extractor exists and both tables need one UI)
+
+## TL;DR
+
+- memories = user-facing unstructured facts, still own identity/preference/episodic
+- entities = engineering-facing typed facts, own project/adaptation and structured knowledge
+- one canonical home per concept, never both
+- one shared candidate-review queue, same promote/reject shape
+- graduated memories stay as frozen historical pointers
+- Phase 9 stays memory-only and ships today; entity V1 follows the
+  remaining architecture docs in this planning sprint
+- no rework required when the entity layer lands; the current memory
+  extractor's structural cues get migrated forward via explicit
+  graduation