# Memory vs Entities (Engineering Layer V1 boundary)

## Why this document exists

The engineering layer introduces a new representation, typed entities with explicit relationships, alongside AtoCore's existing memory system and its six memory types. The question that blocks every other engineering-layer planning doc is:

> When we extract a fact from an interaction or a document, does it
> become a memory, an entity, or both? And if both, which one is
> canonical?

Without an answer, the rest of the engineering layer cannot be designed. This document is the answer.

## The short version

- **Memories stay.** They are still the canonical home for *unstructured, attributed, personal, natural-language* facts.
- **Entities are new.** They are the canonical home for *structured, typed, relational, engineering-domain* facts.
- **No concept lives in both at full fidelity.** Every concept has exactly one canonical home. The other layer may hold a pointer or a rendered view, never a second source of truth.
- **The two layers share one review queue.** Candidates from extraction flow into the same `status=candidate` lifecycle regardless of whether they are memory-bound or entity-bound.
- **Memories can "graduate" into entities** when enough structure has accumulated, but the upgrade is an explicit, logged promotion, not a silent rewrite.

## The split per memory type

The six memory types from the current Phase 2 implementation each map to exactly one outcome in V1:

| Memory type | V1 destination | Rationale |
|-------------|----------------|-----------|
| identity | **memory only** | Always about the human user. No engineering domain structure. Never gets entity-shaped. |
| preference | **memory only** | Always about the human user's working style. Same reasoning. |
| episodic | **memory only** | "What happened in this conversation / this day." Attribution and time are the point, not typed structure. |
| knowledge | **entity when possible**, memory otherwise | If the knowledge maps to a typed engineering object (material property, constant, tolerance), it becomes a Fact entity with provenance. If it's loose general knowledge, it stays a memory. |
| project | **entity** | Anything that belonged in the "project" memory type is really a Requirement, Constraint, Decision, Subsystem attribute, etc. It belongs in the engineering layer once entities exist. |
| adaptation | **entity (Decision)** | "We decided to X" is literally a Decision entity in the ontology. This is the clearest migration. |

**Practical consequence:** when the engineering layer V1 ships, the `project` and `adaptation` memory types are deprecated as a canonical home for new facts, and `knowledge` remains a canonical home only for loose facts that do not map to a typed engineering object. Existing rows are not deleted. Those being migrated are backfilled as entities through the promotion-rules flow (see `promotion-rules.md`), and the old memory rows become frozen references pointing at their graduated entity.

The `identity`, `preference`, and `episodic` memory types continue to exist exactly as they do today and do not interact with the engineering layer at all.

## What "canonical home" actually means

A concept's canonical home is the single place where:

- its *current active value* is stored
- its *status lifecycle* is managed (active/superseded/invalid)
- its *confidence* is tracked
- its *provenance chain* is rooted
- edits, supersessions, and invalidations are applied
- conflict resolution is arbitrated

Everything else is a derived view of that canonical row.

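
The per-type split and the one-canonical-home rule can be sketched as a small routing function. This is a minimal illustration, not AtoCore code: `Home`, `canonical_home`, and the `maps_to_typed_object` flag are hypothetical names, and the sketch assumes the extractor can already tell whether a `knowledge` fact maps to a typed engineering object.

```python
from enum import Enum


class Home(Enum):
    """The two possible canonical homes for a concept (hypothetical names)."""
    MEMORY = "memory"
    ENTITY = "entity"


# Types whose V1 destination does not depend on the content of the fact.
_FIXED_HOME = {
    "identity": Home.MEMORY,
    "preference": Home.MEMORY,
    "episodic": Home.MEMORY,
    "project": Home.ENTITY,
    "adaptation": Home.ENTITY,
}


def canonical_home(memory_type: str, maps_to_typed_object: bool = False) -> Home:
    """Return the single canonical home for a newly extracted fact."""
    if memory_type == "knowledge":
        # "Entity when possible, memory otherwise."
        return Home.ENTITY if maps_to_typed_object else Home.MEMORY
    try:
        return _FIXED_HOME[memory_type]
    except KeyError:
        raise ValueError(f"unknown memory type: {memory_type!r}") from None
```

Because the function returns exactly one `Home`, a fact can never be routed to both layers, which is the property the canonical-home rule asks for.
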
If a `Decision` entity is the canonical home for "we switched to GF-PTFE pads", then:

- there is no `adaptation` memory row with the same content; the extractor creates a `Decision` candidate directly
- the context builder, when asked to include relevant state, reaches into the entity store via the engineering layer, not the memory store
- if the user wants to see "recent decisions" they hit the entity API, never the memory API
- if they want to invalidate the decision, they do so via the entity API

The memory API remains the canonical home for `identity`, `preference`, and `episodic`: same rules, just a different set of types.

## Why not a unified table with a `kind` column?

It would be simpler to implement. It is rejected for three reasons:

1. **Different query shapes.** Memories are queried by type, project, confidence, recency. Entities are queried by type, relationships, graph traversal, coverage gaps ("orphan requirements"). Cramming both into one table forces the schema to be the union of both worlds and makes each query slower.
2. **Different lifecycles.** Memories have a simple four-state lifecycle (candidate/active/superseded/invalid). Entities have the same four states *plus* per-relationship supersession, per-field versioning for the killer correctness queries, and structured conflict flagging. The unified table would have to carry all entity apparatus for every memory row.
3. **Different provenance semantics.** A preference memory is provenanced by "the user told me": one author, one time. An entity like a `Requirement` is provenanced by "this source chunk + this source document + these supporting Results", which is a graph. The tables want to be different because their provenance models are different.

So: two tables, one review queue, one promotion flow, one trust hierarchy.

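
To make "two tables, one review queue" concrete, here is a minimal sketch under stated assumptions. The row classes and their fields are hypothetical stand-ins for the real schemas (the entity table does not exist yet); the only point is that the two row shapes differ while the `candidate` status, and therefore the reviewer's view, is shared.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class MemoryRow:
    """Hypothetical memory row: flat text plus the simple status lifecycle."""
    id: str
    memory_type: str
    content: str
    status: str = "candidate"
    confidence: float = 0.5


@dataclass
class EntityRow:
    """Hypothetical entity row: typed fields plus relationships to other entities."""
    id: str
    entity_type: str
    fields: Dict[str, Any]
    status: str = "candidate"
    confidence: float = 0.5
    # (relation, target_entity_id) pairs; memory rows have no equivalent.
    relationships: List[Tuple[str, str]] = field(default_factory=list)


def review_queue(
    memories: List[MemoryRow], entities: List[EntityRow]
) -> List[Tuple[str, str]]:
    """Unified reviewer view: every candidate from either table, tagged by kind."""
    queue = [("memory", m.id) for m in memories if m.status == "candidate"]
    queue += [("entity", e.id) for e in entities if e.status == "candidate"]
    return queue
```

Each table keeps its own schema and query shape; the queue is only a view computed over both, so no row has to carry the other layer's apparatus.
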
## The shared review queue

Both the memory extractor (Phase 9 Commit C, already shipped) and the future entity extractor write into the same conceptual queue: everything lands at `status=candidate` in its own table, and the human reviewer sees a unified list. The reviewer UI (future work) shows candidates of all kinds side by side, grouped by source interaction / source document, with the rule that fired.

From the data side this means:

- the memories table gets a `candidate` status (**already done in Phase 9 Commit B/C**)
- the future entities table will get the same `candidate` status
- both tables get the same `promote` / `reject` API shape: one verb per candidate, with an audit log entry

Implementation note: the API routes should evolve from `POST /memory/{id}/promote` to `POST /candidates/{id}/promote` once both tables exist, so the reviewer tooling can treat them uniformly. The current memory-only route stays in place for backward compatibility and is aliased by the unified route.

## Memory-to-entity graduation

Even though the split is clean on paper, real usage will reveal memories that deserve to be entities but started as plain text. Four signals are good candidates for proposing graduation:

1. **Reference count crosses a threshold.** A memory that has been reinforced 5+ times across multiple interactions is a strong signal that it deserves structure.
2. **Memory content matches a known entity template.** If a `knowledge` memory's content matches the shape "X = value [unit]" it can be proposed as a `Fact` or `Parameter` entity.
3. **A user explicitly asks for promotion.** `POST /memory/{id}/graduate` is the simplest explicit path: it returns a proposal for an entity structured from the memory's content, which the user can accept or reject.
4. **Extraction pass proposes an entity that happens to match an existing memory.** The entity extractor, when scanning a new interaction, sees that the same content already exists as a memory and proposes graduation as part of its candidate output.

The graduation flow is:

```
memory row (active, confidence C)
      |
      | propose_graduation()
      v
entity candidate row (candidate, confidence C)
  + memory row gets status="graduated" and a forward pointer to the
    entity candidate
      |
      | human promotes the candidate entity
      v
entity row (active)
  + memory row stays "graduated" permanently (historical record)
```

The memory is never deleted. It becomes a frozen historical pointer to the entity it became. This keeps the audit trail intact and lets the Human Mirror show "this decision started life as a memory on April 2, was graduated to an entity on April 15, now has 2 supporting ValidationClaims".

The `graduated` status is a new memory status that will be added when the graduation flow is implemented. It only ever applies to the three graduating types (`knowledge`, `project`, `adaptation`); the three non-graduating types (`identity`, `preference`, `episodic`) never receive it. For now (Phase 9), the graduating types stay in their current memory-only state until the engineering layer ships.

## Context pack assembly after the split

The context builder today (`src/atocore/context/builder.py`) pulls:

1. Trusted Project State
2. Identity + Preference memories
3. Retrieved chunks

After the split, it pulls:

1. Trusted Project State (unchanged)
2. **Identity + Preference memories** (unchanged; these stay memories)
3. **Engineering-layer facts relevant to the prompt**, queried through the entity API (new)
4. Retrieved chunks (unchanged, lowest trust)

Note the ordering: identity/preference memories stay above entities, because personal style information is always more trusted than extracted engineering facts. Entities sit below the personal layer but above raw retrieval, because they have structured provenance that raw chunks lack.

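
The post-split pull order can be sketched as a fixed trust-ordered list that the builder walks from top to bottom. This is a hedged illustration only: the tier names and `assemble_context` are hypothetical and are not taken from `builder.py`.

```python
from typing import Dict, List

# Pull order after the split, highest trust first. The tier names are
# hypothetical labels for this sketch, not identifiers from builder.py.
TRUST_ORDER = [
    "trusted_project_state",
    "identity_preference_memories",
    "engineering_entities",
    "retrieved_chunks",
]


def assemble_context(sections: Dict[str, List[str]]) -> List[str]:
    """Concatenate the available sections in trust order, skipping missing tiers."""
    pack: List[str] = []
    for tier in TRUST_ORDER:
        pack.extend(sections.get(tier, []))
    return pack
```

Keeping the order in one list means the new entity tier is a single insertion between the personal layer and raw retrieval, with no change to how the other tiers are pulled.
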
The budget allocation gains a new slot:

- trusted project state: 20% (unchanged, highest trust)
- identity memories: 5% (unchanged)
- preference memories: 5% (unchanged)
- **engineering entities: 15%** (new; pulls only V1-required objects relevant to the prompt)
- retrieval: 55% (reduced from 70% to make room)

These are starting numbers. After the engineering layer ships and real usage tunes retrieval quality, these will be revisited.

## What the shipped memory types still mean after the split

| Memory type | Still accepts new writes? | V1 destination for new extractions |
|-------------|---------------------------|------------------------------------|
| identity | **yes** | memory (no change) |
| preference | **yes** | memory (no change) |
| episodic | **yes** | memory (no change) |
| knowledge | yes, but only for loose facts | entity (Fact / Parameter) for structured things; memory is a fallback |
| project | **no new writes after engineering V1 ships** | entity (Requirement / Constraint / Subsystem attribute) |
| adaptation | **no new writes after engineering V1 ships** | entity (Decision) |

"No new writes" means the `create_memory` path will refuse to create new `project` or `adaptation` memories once the engineering layer V1 ships. Existing rows stay queryable and reinforceable but new facts of those kinds must become entities. This keeps the canonical-home rule clean going forward.

The deprecation is deferred: it does not happen until the engineering layer V1 is demonstrably working against the active project set. Until then, the existing memory types continue to accept writes so the Phase 9 loop can be exercised without waiting on the engineering layer.

## Consequences for Phase 9 (what we just built)

The capture loop, reinforcement, and extractor we shipped today are *memory-facing*. They produce memory candidates, reinforce memory confidence, and respect the memory status lifecycle. None of that changes.

When the engineering layer V1 ships, the extractor in `src/atocore/memory/extractor.py` gets a sibling in `src/atocore/entities/extractor.py` that uses the same interaction-scanning approach but produces entity candidates instead. The `POST /interactions/{id}/extract` endpoint either:

- runs both extractors and returns a combined result, or
- gains a `?target=memory|entities|both` query parameter

The decision between those two shapes can wait until the entity extractor actually exists.

Until the entity layer is real, the memory extractor also has to cover some things that will eventually move to entities (decisions, constraints, requirements). **That overlap is temporary and intentional.** Rather than leave those cues unextracted for months while the entity layer is being built, the memory extractor surfaces them as memory candidates. Later, a migration pass will propose graduation on every active memory created by the `decision_heading`, `constraint_heading`, and `requirement_heading` rules once the entity types exist to receive them.

So: **no rework in Phase 9, no wasted extraction, clean handoff once the entity layer lands**.

## Open questions this document does NOT answer

These are deliberately deferred to later planning docs:

1. **When exactly does extraction fire?** (answered by `promotion-rules.md`)
2. **How are conflicts between a memory and an entity handled during graduation?** (answered by `conflict-model.md`)
3. **Does the context builder traverse the entity graph for relationship-rich queries, or does it only surface direct facts?** (answered by the context-builder spec in a future `engineering-context-integration.md` doc)
4. **What is the exact API shape of the unified candidate review queue?** (answered by a future `review-queue-api.md` doc when the entity extractor exists and both tables need one UI)

## TL;DR

- memories = user-facing unstructured facts, still own identity/preference/episodic (plus loose knowledge as a fallback)
- entities = engineering-facing typed facts, own project/adaptation and structured knowledge
- one canonical home per concept, never both
- one shared candidate-review queue, same promote/reject shape
- graduated memories stay as frozen historical pointers
- Phase 9 stays memory-only and ships today; entity V1 follows the remaining architecture docs in this planning sprint
- no rework required when the entity layer lands; the current memory extractor's structural cues get migrated forward via explicit graduation