# Memory vs Entities (Engineering Layer V1 boundary)

## Why this document exists

The engineering layer introduces a new representation (typed
entities with explicit relationships) alongside AtoCore's existing
memory system and its six memory types. The question that blocks
every other engineering-layer planning doc is:

> When we extract a fact from an interaction or a document, does it
> become a memory, an entity, or both? And if both, which one is
> canonical?

Without an answer, the rest of the engineering layer cannot be
designed. This document is the answer.

## The short version

- **Memories stay.** They are still the canonical home for
  *unstructured, attributed, personal, natural-language* facts.
- **Entities are new.** They are the canonical home for *structured,
  typed, relational, engineering-domain* facts.
- **No concept lives in both at full fidelity.** Every concept has
  exactly one canonical home. The other layer may hold a pointer or
  a rendered view, never a second source of truth.
- **The two layers share one review queue.** Candidates from
  extraction flow into the same `status=candidate` lifecycle
  regardless of whether they are memory-bound or entity-bound.
- **Memories can "graduate" into entities** when enough structure has
  accumulated, but the upgrade is an explicit, logged promotion, not
  a silent rewrite.

## The split per memory type

The six memory types from the current Phase 2 implementation each
map to exactly one outcome in V1:

| Memory type | V1 destination | Rationale |
|-------------|----------------|-----------|
| identity | **memory only** | Always about the human user. No engineering domain structure. Never gets entity-shaped. |
| preference | **memory only** | Always about the human user's working style. Same reasoning. |
| episodic | **memory only** | "What happened in this conversation / this day." Attribution and time are the point, not typed structure. |
| knowledge | **entity when possible**, memory otherwise | If the knowledge maps to a typed engineering object (material property, constant, tolerance), it becomes a Fact entity with provenance. If it's loose general knowledge, it stays a memory. |
| project | **entity** | Anything that belonged in the "project" memory type is really a Requirement, Constraint, Decision, Subsystem attribute, etc. It belongs in the engineering layer once entities exist. |
| adaptation | **entity (Decision)** | "We decided to X" is literally a Decision entity in the ontology. This is the clearest migration. |

**Practical consequence:** when the engineering layer V1 ships, the
`project`, `knowledge`, and `adaptation` memory types are deprecated
as a canonical home for new structured facts (`knowledge` keeps
accepting loose general facts, per the table above). Existing rows
are not deleted: they are backfilled as entities through the
promotion-rules flow (see `promotion-rules.md`), and the old memory
rows become frozen references pointing at their graduated entity.

The `identity`, `preference`, and `episodic` memory types continue
to exist exactly as they do today and do not interact with the
engineering layer at all.

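The routing in the table above can be sketched as a small lookup. This is a minimal illustration under stated assumptions, not AtoCore code: `Destination`, `v1_destination`, and the `maps_to_typed_object` flag are hypothetical names, and the flag stands in for whatever check eventually decides that a `knowledge` fact fits a typed engineering object.

```python
from enum import Enum


class Destination(Enum):
    MEMORY = "memory"
    ENTITY = "entity"


# Types that always stay memories in V1 (from the table above).
MEMORY_ONLY = frozenset({"identity", "preference", "episodic"})
# Types whose new facts always become entities in V1.
ENTITY_ONLY = frozenset({"project", "adaptation"})


def v1_destination(memory_type: str, maps_to_typed_object: bool = False) -> Destination:
    """Return the single canonical home for a newly extracted fact.

    `maps_to_typed_object` is a hypothetical flag: it only matters for
    `knowledge`, which is the one type that can go either way.
    """
    if memory_type in MEMORY_ONLY:
        return Destination.MEMORY
    if memory_type in ENTITY_ONLY:
        return Destination.ENTITY
    if memory_type == "knowledge":
        return Destination.ENTITY if maps_to_typed_object else Destination.MEMORY
    raise ValueError(f"unknown memory type: {memory_type!r}")
```

The function returns exactly one destination per call, which mirrors the rule that no concept has two canonical homes.
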
## What "canonical home" actually means

A concept's canonical home is the single place where:

- its *current active value* is stored
- its *status lifecycle* is managed (candidate/active/superseded/invalid)
- its *confidence* is tracked
- its *provenance chain* is rooted
- edits, supersessions, and invalidations are applied
- conflict resolution is arbitrated

Everything else is a derived view of that canonical row.

If a `Decision` entity is the canonical home for "we switched to
GF-PTFE pads", then:

- there is no `adaptation` memory row with the same content; the
  extractor creates a `Decision` candidate directly
- the context builder, when asked to include relevant state, reaches
  into the entity store via the engineering layer, not the memory
  store
- if the user wants to see "recent decisions" they hit the entity
  API, never the memory API
- if they want to invalidate the decision, they do so via the entity
  API

The memory API remains the canonical home for `identity`,
`preference`, and `episodic`: same rules, just a different set of
types.

## Why not a unified table with a `kind` column?

It would be simpler to implement. It is rejected for three reasons:

1. **Different query shapes.** Memories are queried by type, project,
   confidence, recency. Entities are queried by type, relationships,
   graph traversal, coverage gaps ("orphan requirements"). Cramming
   both into one table forces the schema to be the union of both
   worlds and makes each query slower.

2. **Different lifecycles.** Memories have a simple four-state
   lifecycle (candidate/active/superseded/invalid). Entities have
   the same four states *plus* per-relationship supersession,
   per-field versioning for the killer correctness queries, and
   structured conflict flagging. The unified table would have to
   carry all entity apparatus for every memory row.

3. **Different provenance semantics.** A preference memory's
   provenance is "the user told me": one author, one time.
   The provenance of an entity like a `Requirement` is "this source
   chunk + this source document + these supporting Results", which
   is a graph. The tables want to be different because their
   provenance models are different.

So: two tables, one review queue, one promotion flow, one trust
hierarchy.

## The shared review queue

Both the memory extractor (Phase 9 Commit C, already shipped) and
the future entity extractor write into the same conceptual queue:
everything lands at `status=candidate` in its own table, and the
human reviewer sees a unified list. The reviewer UI (future work)
shows candidates of all kinds side by side, grouped by source
interaction / source document, with the rule that fired.

From the data side this means:

- the memories table gets a `candidate` status (**already done in
  Phase 9 Commit B/C**)
- the future entities table will get the same `candidate` status
- both tables get the same `promote` / `reject` API shape: one verb
  per candidate, with an audit log entry

Implementation note: the API routes should evolve from
`POST /memory/{id}/promote` to `POST /candidates/{id}/promote` once
both tables exist, so the reviewer tooling can treat them
uniformly. The current memory-only route stays in place for
backward compatibility and is aliased by the unified route.

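As a rough sketch of "one verb per candidate, with an audit log entry", the in-memory model below treats memory and entity candidates uniformly. Everything here is illustrative: `Candidate`, `ReviewQueue`, the `reviewer` argument, and the audit-entry fields are hypothetical names, and mapping a rejected candidate to `invalid` is an assumption, since the status a rejection leads to is not specified here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Candidate:
    id: str
    table: str                 # "memories" or "entities": each kind keeps its own table
    status: str = "candidate"


@dataclass
class ReviewQueue:
    candidates: dict = field(default_factory=dict)   # candidate id -> Candidate
    audit_log: list = field(default_factory=list)    # one dict per review action

    def add(self, candidate: Candidate) -> None:
        self.candidates[candidate.id] = candidate

    def pending(self) -> list:
        """Unified list of everything still awaiting review, across both tables."""
        return [c for c in self.candidates.values() if c.status == "candidate"]

    def _resolve(self, candidate_id: str, verb: str, new_status: str, reviewer: str) -> Candidate:
        candidate = self.candidates[candidate_id]
        if candidate.status != "candidate":
            raise ValueError(f"{candidate_id} is not awaiting review")
        candidate.status = new_status
        # Same audit shape regardless of which table the candidate lives in.
        self.audit_log.append({
            "candidate_id": candidate_id,
            "table": candidate.table,
            "verb": verb,
            "reviewer": reviewer,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return candidate

    def promote(self, candidate_id: str, reviewer: str) -> Candidate:
        return self._resolve(candidate_id, "promote", "active", reviewer)

    def reject(self, candidate_id: str, reviewer: str) -> Candidate:
        # Assumption: rejection maps to "invalid".
        return self._resolve(candidate_id, "reject", "invalid", reviewer)
```

A unified `POST /candidates/{id}/promote` route could then dispatch on `Candidate.table` while reviewer tooling only ever sees `promote` and `reject`.
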
## Memory-to-entity graduation

Even though the split is clean on paper, real usage will reveal
memories that deserve to be entities but started as plain text.
Four signals are good triggers for proposing graduation:

1. **Reference count crosses a threshold.** A memory that has been
   reinforced 5+ times across multiple interactions is a strong
   signal that it deserves structure.

2. **Memory content matches a known entity template.** If a
   `knowledge` memory's content matches the shape "X = value [unit]"
   it can be proposed as a `Fact` or `Parameter` entity.

3. **A user explicitly asks for promotion.** `POST /memory/{id}/graduate`
   is the simplest explicit path: it returns a proposal for an
   entity structured from the memory's content, which the user can
   accept or reject.

4. **Extraction pass proposes an entity that happens to match an
   existing memory.** The entity extractor, when scanning a new
   interaction, sees the same content already exists as a memory
   and proposes graduation as part of its candidate output.

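Signal 2 lends itself to a simple pattern check. The sketch below is one possible reading of the "X = value [unit]" shape, assuming a numeric value and an optional trailing unit; the regular expression, `match_fact_template`, and the returned keys are all hypothetical and would need tuning against real memory content.

```python
import re
from typing import Optional

# "X = value [unit]": a name, an equals sign, a number, and an optional unit.
FACT_TEMPLATE = re.compile(
    r"^\s*(?P<name>[^=]+?)\s*=\s*"                         # X =
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"      # numeric value
    r"(?:\s*(?P<unit>\S.*?))?\s*$"                         # optional unit
)


def match_fact_template(content: str) -> Optional[dict]:
    """Return the parsed parts if `content` looks like "X = value [unit]", else None."""
    m = FACT_TEMPLATE.match(content)
    if m is None:
        return None
    return {
        "name": m.group("name").strip(),
        "value": float(m.group("value")),
        "unit": m.group("unit"),   # None when no unit is present
    }
```

A match only justifies *proposing* a `Fact` or `Parameter` candidate; the proposal still goes through the normal review queue.
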
The graduation flow is:

```
memory row (active, confidence C)
        |
        | propose_graduation()
        v
entity candidate row (candidate, confidence C)
        +
memory row gets status="graduated" and a forward pointer to the
  entity candidate
        |
        | human promotes the candidate entity
        v
entity row (active)
        +
memory row stays "graduated" permanently (historical record)
```

The memory is never deleted. It becomes a frozen historical
pointer to the entity it became. This keeps the audit trail intact
and lets the Human Mirror show "this decision started life as a
memory on April 2, was graduated to an entity on April 15, now has
2 supporting ValidationClaims".

The `graduated` status is a new memory status that gets added when
the graduation flow is implemented. The three non-graduating types
(identity/preference/episodic) will never use it. For now (Phase 9),
the three graduating types stay in their current memory-only state
until the engineering layer ships.

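The two transitions in the flow above can be sketched with plain dataclasses. This is a minimal model under stated assumptions: `MemoryRow`, `EntityRow`, `promote_entity`, and the `graduated_to` / `graduated_from` pointer fields are hypothetical names, and what happens to the memory if the entity candidate is rejected is left out because it is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryRow:
    id: str
    content: str
    confidence: float
    status: str = "active"
    graduated_to: Optional[str] = None     # forward pointer to the entity (hypothetical field)


@dataclass
class EntityRow:
    id: str
    content: str
    confidence: float
    status: str = "candidate"
    graduated_from: Optional[str] = None   # back pointer for provenance (hypothetical field)


def propose_graduation(memory: MemoryRow, entity_id: str) -> EntityRow:
    """Create an entity candidate from an active memory and freeze the memory."""
    if memory.status != "active":
        raise ValueError("only active memories can be proposed for graduation")
    entity = EntityRow(
        id=entity_id,
        content=memory.content,
        confidence=memory.confidence,      # the candidate inherits confidence C
        graduated_from=memory.id,
    )
    memory.status = "graduated"            # the memory is kept, never deleted
    memory.graduated_to = entity.id
    return entity


def promote_entity(entity: EntityRow) -> None:
    """Human review step: the candidate becomes the active canonical row."""
    if entity.status != "candidate":
        raise ValueError("only candidates can be promoted")
    entity.status = "active"
```

Note that promotion touches only the entity; the memory row stays `graduated` as the historical record.
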
## Context pack assembly after the split

The context builder today (`src/atocore/context/builder.py`) pulls:

1. Trusted Project State
2. Identity + Preference memories
3. Retrieved chunks

After the split, it pulls:

1. Trusted Project State (unchanged)
2. **Identity + Preference memories** (unchanged; these stay memories)
3. **Engineering-layer facts relevant to the prompt**, queried through
   the entity API (new)
4. Retrieved chunks (unchanged, lowest trust)

Note the ordering: identity/preference memories stay above entities,
because personal style information is always more trusted than
extracted engineering facts. Entities sit below the personal layer
but above raw retrieval, because they have structured provenance
that raw chunks lack.

The budget allocation gains a new slot:

- trusted project state: 20% (unchanged, highest trust)
- identity memories: 5% (unchanged)
- preference memories: 5% (unchanged)
- **engineering entities: 15%** (new; pulls only V1-required
  objects relevant to the prompt)
- retrieval: 55% (reduced from 70% to make room)

These are starting numbers. After the engineering layer ships and
real usage tunes retrieval quality, these will be revisited.

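To make the percentages concrete, here is a small worked allocation. It is only a sketch: `allocate_budget`, the slot names, and the choice to hand any rounding remainder to the retrieval slot are assumptions, and the budget unit (tokens, characters) is whatever the context builder already uses.

```python
# Starting shares from the list above, as integer percentages (they sum to 100).
BUDGET_SHARES = {
    "trusted_project_state": 20,
    "identity_memories": 5,
    "preference_memories": 5,
    "engineering_entities": 15,
    "retrieval": 55,
}


def allocate_budget(total: int) -> dict:
    """Split `total` budget units across the slots by their percentage share."""
    if sum(BUDGET_SHARES.values()) != 100:
        raise ValueError("budget shares must sum to 100")
    allocation = {slot: total * pct // 100 for slot, pct in BUDGET_SHARES.items()}
    # Assumption: integer rounding leftovers go to retrieval, the largest slot.
    allocation["retrieval"] += total - sum(allocation.values())
    return allocation
```

With a budget of 8000 units this gives 1600 for trusted project state, 400 each for identity and preference memories, 1200 for engineering entities, and 4400 for retrieval.
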
## What the shipped memory types still mean after the split

| Memory type | Still accepts new writes? | V1 destination for new extractions |
|-------------|---------------------------|------------------------------------|
| identity | **yes** | memory (no change) |
| preference | **yes** | memory (no change) |
| episodic | **yes** | memory (no change) |
| knowledge | yes, but only for loose facts | entity (Fact / Parameter) for structured things; memory is a fallback |
| project | **no new writes after engineering V1 ships** | entity (Requirement / Constraint / Subsystem attribute) |
| adaptation | **no new writes after engineering V1 ships** | entity (Decision) |

"No new writes" means the `create_memory` path will refuse to
create new `project` or `adaptation` memories once the engineering
layer V1 ships. Existing rows stay queryable and reinforceable but
new facts of those kinds must become entities. This keeps the
canonical-home rule clean going forward.

The deprecation is deferred: it does not happen until the engineering
layer V1 is demonstrably working against the active project set. Until
then, the existing memory types continue to accept writes so the
Phase 9 loop can be exercised without waiting on the engineering
layer.

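The deferred "no new writes" rule could be enforced with a guard like the one below. It is a sketch only: `check_memory_write`, `MemoryTypeClosedError`, and the `engineering_v1_active` flag are hypothetical, and how the real `create_memory` path learns that V1 is live is not decided here.

```python
# Types that stop accepting new memory rows once engineering V1 ships.
CLOSED_AFTER_V1 = frozenset({"project", "adaptation"})


class MemoryTypeClosedError(ValueError):
    """Raised when a new memory is requested for a type that has moved to entities."""


def check_memory_write(memory_type: str, engineering_v1_active: bool) -> None:
    """Raise if a new memory of `memory_type` must become an entity instead.

    `engineering_v1_active` is a hypothetical switch: while it is False the
    deprecation is deferred and every existing type still accepts writes.
    """
    if engineering_v1_active and memory_type in CLOSED_AFTER_V1:
        raise MemoryTypeClosedError(
            f"new {memory_type!r} facts must be created as entities"
        )
```

Because the guard only covers creation, existing `project` and `adaptation` rows remain queryable and reinforceable.
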
## Consequences for Phase 9 (what we just built)

The capture loop, reinforcement, and extractor we shipped today
are *memory-facing*. They produce memory candidates, reinforce
memory confidence, and respect the memory status lifecycle. None
of that changes.

When the engineering layer V1 ships, the extractor in
`src/atocore/memory/extractor.py` gets a sibling in
`src/atocore/entities/extractor.py` that uses the same
interaction-scanning approach but produces entity candidates
instead. The `POST /interactions/{id}/extract` endpoint then either:

- runs both extractors and returns a combined result, or
- gains a `?target=memory|entities|both` query parameter

The decision between those two shapes can wait until the
entity extractor actually exists.

Until the entity layer is real, the memory extractor also has to
cover some things that will eventually move to entities (decisions,
constraints, requirements). **That overlap is temporary and
intentional.** Rather than leave those cues unextracted for months
while the entity layer is being built, the memory extractor
surfaces them as memory candidates. Later, a migration pass will
propose graduation on every active memory created by the
`decision_heading`, `constraint_heading`, and `requirement_heading`
rules once the entity types exist to receive them.

So: **no rework in Phase 9, no wasted extraction, clean handoff
once the entity layer lands**.

## Open questions this document does NOT answer

These are deliberately deferred to later planning docs:

1. **When exactly does extraction fire?** (answered by
   `promotion-rules.md`)
2. **How are conflicts between a memory and an entity handled
   during graduation?** (answered by `conflict-model.md`)
3. **Does the context builder traverse the entity graph for
   relationship-rich queries, or does it only surface direct facts?**
   (answered by the context-builder spec in a future
   `engineering-context-integration.md` doc)
4. **What is the exact API shape of the unified candidate review
   queue?** (answered by a future `review-queue-api.md` doc when
   the entity extractor exists and both tables need one UI)

## TL;DR

- memories = user-facing unstructured facts, still own identity/preference/episodic
- entities = engineering-facing typed facts, own project/adaptation plus
  structured knowledge (loose knowledge stays a memory)
- one canonical home per concept, never both
- one shared candidate-review queue, same promote/reject shape
- graduated memories stay as frozen historical pointers
- Phase 9 stays memory-only and ships today; entity V1 follows the
  remaining architecture docs in this planning sprint
- no rework required when the entity layer lands; the current memory
  extractor's structural cues get migrated forward via explicit
  graduation