# Memory vs Entities (Engineering Layer V1 boundary)

## Why this document exists

The engineering layer introduces a new representation — typed
entities with explicit relationships — alongside AtoCore's existing
memory system and its six memory types. The question that blocks
every other engineering-layer planning doc is:

> When we extract a fact from an interaction or a document, does it
> become a memory, an entity, or both? And if both, which one is
> canonical?

Without an answer, the rest of the engineering layer cannot be
designed. This document is the answer.

## The short version

- **Memories stay.** They are still the canonical home for
  *unstructured, attributed, personal, natural-language* facts.
- **Entities are new.** They are the canonical home for *structured,
  typed, relational, engineering-domain* facts.
- **No concept lives in both at full fidelity.** Every concept has
  exactly one canonical home. The other layer may hold a pointer or
  a rendered view, never a second source of truth.
- **The two layers share one review queue.** Candidates from
  extraction flow into the same `status=candidate` lifecycle
  regardless of whether they are memory-bound or entity-bound.
- **Memories can "graduate" into entities** when enough structure has
  accumulated, but the upgrade is an explicit, logged promotion, not
  a silent rewrite.

## The split per memory type

The six memory types from the current Phase 2 implementation each
map to exactly one outcome in V1:

| Memory type   | V1 destination                | Rationale                                                                                                   |
|---------------|-------------------------------|-------------------------------------------------------------------------------------------------------------|
| identity      | **memory only**               | Always about the human user. No engineering domain structure. Never gets entity-shaped.                     |
| preference    | **memory only**               | Always about the human user's working style. Same reasoning.                                                |
| episodic      | **memory only**               | "What happened in this conversation / this day." Attribution and time are the point, not typed structure.  |
| knowledge     | **entity when possible**, memory otherwise | If the knowledge maps to a typed engineering object (material property, constant, tolerance), it becomes a Fact entity with provenance. If it is loose general knowledge, it stays a memory. |
| project       | **entity**                    | Anything that belonged in the "project" memory type is really a Requirement, Constraint, Decision, Subsystem attribute, etc. It belongs in the engineering layer once entities exist. |
| adaptation    | **entity (Decision)**         | "We decided to X" is literally a Decision entity in the ontology. This is the clearest migration.            |

**Practical consequence:** when the engineering layer V1 ships, the
`project` and `adaptation` memory types are deprecated as a canonical
home for new facts, and `knowledge` is narrowed to loose facts that
fit no typed engineering object. Existing rows are not deleted:
they are backfilled as entities through the promotion-rules flow
(see `promotion-rules.md`), and the old memory rows become frozen
references pointing at their graduated entity.

The `identity`, `preference`, and `episodic` memory types continue
to exist exactly as they do today and do not interact with the
engineering layer at all.
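
The routing in the table above can be sketched as a small function. This is only an illustration: `Destination`, `route_extraction`, and the `maps_to_typed_object` flag are hypothetical names, not part of the shipped code, and how the extractor decides that a `knowledge` fact fits a typed object is left open here.

```python
from enum import Enum


class Destination(Enum):
    MEMORY = "memory"
    ENTITY = "entity"


# Memory types that never leave the memory layer.
MEMORY_ONLY = {"identity", "preference", "episodic"}
# Memory types whose new facts always become entities in V1.
ENTITY_ONLY = {"project", "adaptation"}


def route_extraction(memory_type: str, maps_to_typed_object: bool = False) -> Destination:
    """Pick the canonical home for a newly extracted fact.

    `maps_to_typed_object` is a hypothetical flag the extractor would set
    when a `knowledge` fact fits a typed engineering object.
    """
    if memory_type in MEMORY_ONLY:
        return Destination.MEMORY
    if memory_type in ENTITY_ONLY:
        return Destination.ENTITY
    if memory_type == "knowledge":
        return Destination.ENTITY if maps_to_typed_object else Destination.MEMORY
    raise ValueError(f"unknown memory type: {memory_type!r}")
```

Each input yields exactly one destination, which is the "one canonical home" rule expressed at the routing step.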

## What "canonical home" actually means

A concept's canonical home is the single place where:

- its *current active value* is stored
- its *status lifecycle* is managed (active/superseded/invalid)
- its *confidence* is tracked
- its *provenance chain* is rooted
- edits, supersessions, and invalidations are applied
- conflict resolution is arbitrated

Everything else is a derived view of that canonical row.

If a `Decision` entity is the canonical home for "we switched to
GF-PTFE pads", then:

- there is no `adaptation` memory row with the same content; the
  extractor creates a `Decision` candidate directly
- the context builder, when asked to include relevant state, reaches
  into the entity store via the engineering layer, not the memory
  store
- if the user wants to see "recent decisions" they hit the entity
  API, never the memory API
- if they want to invalidate the decision, they do so via the entity
  API

The memory API remains the canonical home for `identity`,
`preference`, and `episodic` — same rules, just a different set of
types.

## Why not a unified table with a `kind` column?

It would be simpler to implement. It is rejected for three reasons:

1. **Different query shapes.** Memories are queried by type, project,
   confidence, recency. Entities are queried by type, relationships,
   graph traversal, coverage gaps ("orphan requirements"). Cramming
   both into one table forces the schema to be the union of both
   worlds and makes each query slower.

2. **Different lifecycles.** Memories have a simple four-state
   lifecycle (candidate/active/superseded/invalid). Entities have
   the same four states *plus* per-relationship supersession,
   per-field versioning for the killer correctness queries, and
   structured conflict flagging. The unified table would have to
   carry all entity apparatus for every memory row.

3. **Different provenance semantics.** A preference memory is
   provenanced by "the user told me" — one author, one time.
   An entity like a `Requirement` is provenanced by "this source
   chunk + this source document + these supporting Results" — a
   graph. The tables want to be different because their provenance
   models are different.

So: two tables, one review queue, one promotion flow, one trust
hierarchy.

## The shared review queue

Both the memory extractor (Phase 9 Commit C, already shipped) and
the future entity extractor write into the same conceptual queue:
everything lands at `status=candidate` in its own table, and the
human reviewer sees a unified list. The reviewer UI (future work)
shows candidates of all kinds side by side, grouped by source
interaction / source document, with the rule that fired.

From the data side this means:

- the memories table gets a `candidate` status (**already done in
  Phase 9 Commit B/C**)
- the future entities table will get the same `candidate` status
- both tables get the same `promote` / `reject` API shape: one verb
  per candidate, with an audit log entry

Implementation note: the API routes should evolve from
`POST /memory/{id}/promote` to `POST /candidates/{id}/promote` once
both tables exist, so the reviewer tooling can treat them
uniformly. The current memory-only route stays in place for
backward compatibility, as an alias of the unified route.
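
A minimal in-memory sketch of the shared queue follows, to show what "one verb per candidate, with an audit log entry" could look like across two tables. Everything here is hypothetical (`Candidate`, `ReviewQueue`, the audit entry fields). It assumes candidate ids are unique across both tables and that a rejected candidate moves to `invalid`; this document settles neither point.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Candidate:
    id: str
    kind: str  # "memory" or "entity": which table the row lives in
    content: str
    status: str = "candidate"


@dataclass
class ReviewQueue:
    memories: Dict[str, Candidate] = field(default_factory=dict)
    entities: Dict[str, Candidate] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    def add(self, candidate: Candidate) -> None:
        # Each candidate lands in its own table; the queue is only a view.
        table = self.memories if candidate.kind == "memory" else self.entities
        table[candidate.id] = candidate

    def pending(self) -> List[Candidate]:
        # Unified list the reviewer sees: candidates from both tables.
        rows = [*self.memories.values(), *self.entities.values()]
        return [row for row in rows if row.status == "candidate"]

    def _decide(self, candidate_id: str, verb: str, new_status: str) -> Candidate:
        row = self.memories.get(candidate_id) or self.entities.get(candidate_id)
        if row is None or row.status != "candidate":
            raise LookupError(f"no pending candidate with id {candidate_id!r}")
        row.status = new_status
        self.audit_log.append({
            "id": candidate_id,
            "kind": row.kind,
            "verb": verb,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return row

    def promote(self, candidate_id: str) -> Candidate:
        return self._decide(candidate_id, "promote", "active")

    def reject(self, candidate_id: str) -> Candidate:
        # Assumption: rejection maps to the existing "invalid" status.
        return self._decide(candidate_id, "reject", "invalid")
```

The reviewer-facing verbs do not need to know which table a row lives in; only `add` and the lookup do.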

## Memory-to-entity graduation

Even though the split is clean on paper, real usage will reveal
memories that deserve to be entities but started as plain text.
Four signals are good candidates for proposing graduation:

1. **Reference count crosses a threshold.** A memory that has been
   reinforced 5+ times across multiple interactions is a strong
   signal that it deserves structure.

2. **Memory content matches a known entity template.** If a
   `knowledge` memory's content matches the shape "X = value [unit]"
   it can be proposed as a `Fact` or `Parameter` entity.

3. **A user explicitly asks for promotion.** `POST /memory/{id}/graduate`
   is the simplest explicit path — it returns a proposal for an
   entity structured from the memory's content, which the user can
   accept or reject.

4. **Extraction pass proposes an entity that happens to match an
   existing memory.** The entity extractor, when scanning a new
   interaction, sees the same content already exists as a memory
   and proposes graduation as part of its candidate output.
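
Signals 1 and 2 are mechanical enough to sketch. Signals 3 and 4 are triggered by the user and the entity extractor, so they are left out. The regular expression is one possible reading of the "X = value [unit]" shape, and the function and return values are hypothetical names for illustration.

```python
import re
from typing import Optional

REINFORCEMENT_THRESHOLD = 5  # "5+ times" from signal 1

# Hypothetical pattern for the "X = value [unit]" shape from signal 2:
# a name, an equals sign, a number, and an optional trailing unit.
FACT_TEMPLATE = re.compile(
    r"^\s*(?P<name>[^=]+?)\s*=\s*"
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
    r"\s*(?P<unit>\S.*?)?\s*$"
)

NON_GRADUATING = {"identity", "preference", "episodic"}


def graduation_signal(memory_type: str, content: str, reference_count: int) -> Optional[str]:
    """Return the name of the first signal that fires, or None."""
    if memory_type in NON_GRADUATING:
        return None  # these types never become entities
    if reference_count >= REINFORCEMENT_THRESHOLD:
        return "reference_count"
    if memory_type == "knowledge" and FACT_TEMPLATE.match(content):
        return "entity_template"
    return None
```

A firing signal only proposes graduation; the proposal still goes through the candidate review queue.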

The graduation flow is:

```
memory row (active, confidence C)
      |
      | propose_graduation()
      v
entity candidate row (candidate, confidence C)
      +
memory row gets status="graduated" and a forward pointer to the
entity candidate
      |
      | human promotes the candidate entity
      v
entity row (active)
      +
memory row stays "graduated" permanently (historical record)
```

The memory is never deleted. It becomes a frozen historical
pointer to the entity it became. This keeps the audit trail intact
and lets the Human Mirror show "this decision started life as a
memory on April 2, was graduated to an entity on April 15, now has
2 supporting ValidationClaims".
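
The flow above can be sketched as two small state transitions. `propose_graduation` is the name used in the diagram; the row classes, field names (`graduated_to`, `source_memory_id`), and `promote_entity` are assumptions made for this sketch, not a schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryRow:
    id: str
    content: str
    confidence: float
    status: str = "active"
    graduated_to: Optional[str] = None  # forward pointer to the entity row


@dataclass
class EntityRow:
    id: str
    content: str
    confidence: float
    status: str = "candidate"
    source_memory_id: Optional[str] = None  # provenance back to the memory


def propose_graduation(memory: MemoryRow, entity_id: str) -> EntityRow:
    """Create an entity candidate from an active memory and freeze the memory."""
    if memory.status != "active":
        raise ValueError("only active memories can graduate")
    entity = EntityRow(
        id=entity_id,
        content=memory.content,
        confidence=memory.confidence,  # the candidate inherits confidence C
        source_memory_id=memory.id,
    )
    memory.status = "graduated"
    memory.graduated_to = entity.id
    return entity


def promote_entity(entity: EntityRow) -> EntityRow:
    """Human promotion step: candidate -> active. The memory is untouched."""
    if entity.status != "candidate":
        raise ValueError("only candidates can be promoted")
    entity.status = "active"
    return entity
```

Note that nothing in either function deletes the memory row, which is what keeps the audit trail intact.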

The `graduated` status is a new memory status that gets added when
the graduation flow is implemented. The three non-graduating types
(identity/preference/episodic) will never use it. For now (Phase 9),
the three graduating types (knowledge/project/adaptation) stay in
their current memory-only state until the engineering layer ships.

## Context pack assembly after the split

The context builder today (`src/atocore/context/builder.py`) pulls:

1. Trusted Project State
2. Identity + Preference memories
3. Retrieved chunks

After the split, it pulls:

1. Trusted Project State (unchanged)
2. **Identity + Preference memories** (unchanged — these stay memories)
3. **Engineering-layer facts relevant to the prompt**, queried through
   the entity API (new)
4. Retrieved chunks (unchanged, lowest trust)

Note the ordering: identity/preference memories stay above entities,
because personal style information is always more trusted than
extracted engineering facts. Entities sit below the personal layer
but above raw retrieval, because they have structured provenance
that raw chunks lack.

The budget allocation gains a new slot:

- trusted project state: 20% (unchanged, highest trust)
- identity memories: 5% (unchanged)
- preference memories: 5% (unchanged)
- **engineering entities: 15%** (new — pulls only V1-required
  objects relevant to the prompt)
- retrieval: 55% (reduced from 70% to make room)

These are starting numbers. They will be revisited once the
engineering layer ships and real usage shows how well retrieval
performs under the smaller budget.
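
As a worked example of the split, the sketch below divides a total budget across the five slots using the percentages listed above. The slot keys and `allocate_budget` are hypothetical names, and sending the rounding remainder to retrieval is an arbitrary choice for this illustration.

```python
from typing import Dict

# Percentages from the list above; they must sum to 100.
BUDGET_PERCENT = {
    "trusted_project_state": 20,
    "identity_memories": 5,
    "preference_memories": 5,
    "engineering_entities": 15,
    "retrieval": 55,
}


def allocate_budget(total: int) -> Dict[str, int]:
    """Split `total` budget units across the context slots."""
    if sum(BUDGET_PERCENT.values()) != 100:
        raise ValueError("budget percentages must sum to 100")
    allocation = {slot: total * pct // 100 for slot, pct in BUDGET_PERCENT.items()}
    # Integer division can leave a few units over; give them to retrieval.
    allocation["retrieval"] += total - sum(allocation.values())
    return allocation
```

With a total of 4000 units this gives 800 for trusted project state, 200 each for identity and preference memories, 600 for engineering entities, and 2200 for retrieval.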

## What the shipped memory types still mean after the split

| Memory type | Still accepts new writes? | V1 destination for new extractions |
|-------------|---------------------------|------------------------------------|
| identity    | **yes**                   | memory (no change)                 |
| preference  | **yes**                   | memory (no change)                 |
| episodic    | **yes**                   | memory (no change)                 |
| knowledge   | yes, but only for loose facts | entity (Fact / Parameter) for structured things; memory is a fallback |
| project     | **no new writes after engineering V1 ships** | entity (Requirement / Constraint / Subsystem attribute) |
| adaptation  | **no new writes after engineering V1 ships** | entity (Decision) |

"No new writes" means the `create_memory` path will refuse to
create new `project` or `adaptation` memories once the engineering
layer V1 ships. Existing rows stay queryable and reinforceable but
new facts of those kinds must become entities. This keeps the
canonical-home rule clean going forward.

The deprecation is deferred: it does not happen until the engineering
layer V1 is demonstrably working against the active project set. Until
then, the existing memory types continue to accept writes so the
Phase 9 loop can be exercised without waiting on the engineering
layer.
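
The deferred refusal could be sketched as a guard that the `create_memory` path calls before writing. The `engineering_v1_active` flag, the exception class, and `check_memory_write` are hypothetical; how the real system decides that V1 is "demonstrably working" is not specified here.

```python
VALID_MEMORY_TYPES = {
    "identity", "preference", "episodic",
    "knowledge", "project", "adaptation",
}
# Types that stop accepting new writes once engineering V1 ships.
DEPRECATED_AFTER_V1 = {"project", "adaptation"}


class DeprecatedMemoryTypeError(ValueError):
    """Raised when a new fact must become an entity instead of a memory."""


def check_memory_write(memory_type: str, engineering_v1_active: bool) -> None:
    """Raise if a new memory of this type may not be created."""
    if memory_type not in VALID_MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {memory_type!r}")
    if engineering_v1_active and memory_type in DEPRECATED_AFTER_V1:
        raise DeprecatedMemoryTypeError(
            f"{memory_type!r} memories are closed to new writes; create an entity instead"
        )
```

While the flag is off, every type passes the guard, which matches the Phase 9 behaviour described above.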

## Consequences for Phase 9 (what we just built)

The capture loop, reinforcement, and extractor we shipped today
are *memory-facing*. They produce memory candidates, reinforce
memory confidence, and respect the memory status lifecycle. None
of that changes.

When the engineering layer V1 ships, the extractor in
`src/atocore/memory/extractor.py` gets a sibling in
`src/atocore/entities/extractor.py` that uses the same
interaction-scanning approach but produces entity candidates
instead. The `POST /interactions/{id}/extract` endpoint either:

- runs both extractors and returns a combined result, or
- gains a `?target=memory|entities|both` query parameter

and the decision between those two shapes can wait until the
entity extractor actually exists.
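
Both shapes can be covered by one dispatcher in which `target` defaults to `both`, as in the sketch below. The extractor signature (text in, list of candidate dicts out) and `run_extraction` are assumptions for illustration; the real extractors may take richer inputs.

```python
from typing import Callable, Dict, List

# Hypothetical extractor signature: interaction text -> candidate dicts.
Extractor = Callable[[str], List[dict]]

VALID_TARGETS = {"memory", "entities", "both"}


def run_extraction(
    text: str,
    memory_extractor: Extractor,
    entity_extractor: Extractor,
    target: str = "both",
) -> Dict[str, List[dict]]:
    """Run one or both extractors and return a combined result."""
    if target not in VALID_TARGETS:
        raise ValueError(f"target must be one of {sorted(VALID_TARGETS)}")
    result: Dict[str, List[dict]] = {"memory": [], "entities": []}
    if target in ("memory", "both"):
        result["memory"] = memory_extractor(text)
    if target in ("entities", "both"):
        result["entities"] = entity_extractor(text)
    return result
```

Keeping the response keyed by destination means clients written against the combined result keep working if the query parameter is added later.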

Until the entity layer is real, the memory extractor also has to
cover some things that will eventually move to entities (decisions,
constraints, requirements). **That overlap is temporary and
intentional.** Rather than leave those cues unextracted for months
while the entity layer is being built, the memory extractor
surfaces them as memory candidates. Later, a migration pass will
propose graduation on every active memory created by
`decision_heading`, `constraint_heading`, and `requirement_heading`
rules once the entity types exist to receive them.
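
The selection step of that migration pass might look like the sketch below. It assumes each memory row records the name of the extractor rule that produced it under a `rule` key, which is a guess about the schema, not a documented field.

```python
from typing import Dict, List

# Rule names taken from the paragraph above.
GRADUATING_RULES = {"decision_heading", "constraint_heading", "requirement_heading"}


def memories_to_graduate(memories: List[Dict]) -> List[Dict]:
    """Select active memories created by a structural-cue rule."""
    return [
        m for m in memories
        if m.get("status") == "active" and m.get("rule") in GRADUATING_RULES
    ]
```

Each selected row would then go through the normal graduation proposal, not a bulk rewrite.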

So: **no rework in Phase 9, no wasted extraction, clean handoff
once the entity layer lands**.

## Open questions this document does NOT answer

These are deliberately deferred to later planning docs:

1. **When exactly does extraction fire?** (answered by
   `promotion-rules.md`)
2. **How are conflicts between a memory and an entity handled
   during graduation?** (answered by `conflict-model.md`)
3. **Does the context builder traverse the entity graph for
   relationship-rich queries, or does it only surface direct facts?**
   (answered by the context-builder spec in a future
   `engineering-context-integration.md` doc)
4. **What is the exact API shape of the unified candidate review
   queue?** (answered by a future `review-queue-api.md` doc when
   the entity extractor exists and both tables need one UI)

## TL;DR

- memories = user-facing unstructured facts, still own identity/preference/episodic
- entities = engineering-facing typed facts, own project/adaptation and structured knowledge (loose knowledge stays a memory)
- one canonical home per concept, never both
- one shared candidate-review queue, same promote/reject shape
- graduated memories stay as frozen historical pointers
- Phase 9 stays memory-only and ships today; entity V1 follows the
  remaining architecture docs in this planning sprint
- no rework required when the entity layer lands; the current memory
  extractor's structural cues get migrated forward via explicit
  graduation