
Anto01 480f13a6df docs(arch): memory-vs-entities, promotion-rules, conflict-model

Three planning docs that answer the architectural questions the
engineering query catalog raised. Together with the catalog they
form roughly half of the pre-implementation planning sprint.

docs/architecture/memory-vs-entities.md
---------------------------------------
Resolves the central question blocking every other engineering
layer doc: is a Decision a memory or an entity?

Key decisions:
- memories stay the canonical home for identity, preference, and
  episodic facts
- entities become the canonical home for project, knowledge, and
  adaptation facts once the engineering layer V1 ships
- no concept lives in both layers at full fidelity; one canonical
  home per concept
- a "graduation" flow lets active memories upgrade into entities
  (memory stays as a frozen historical pointer, never deleted)
- one shared candidate review queue across both layers
- context builder budget gains a 15% slot for engineering entities,
  slotted between identity/preference memories and retrieved chunks
- the Phase 9 memory extractor's structural cues (decision heading,
  constraint heading, requirement heading) are documented as an
  intentional, temporary overlap, to be migrated via graduation
  when the entity extractor ships

docs/architecture/promotion-rules.md
------------------------------------
Defines the full Layer 0 → Layer 2 pipeline:

- four layers: L0 raw source, L1 memory candidate/active, L2 entity
  candidate/active, L3 trusted project state
- three extraction triggers: on interaction capture (existing),
  on ingestion wave (new, batched per wave), on explicit request
- per-rule prior confidence tuned at write time by structural
  signal (echoes the retriever's high/low signal hints) and
  freshness bonus
- batch cap of 50 candidates per pass to protect the reviewer
- full provenance requirements: every candidate carries rule id,
  source_chunk_id, source_interaction_id, and extractor_version
- reversibility matrix for every promotion step
- explicit no-auto-promotion-in-V1 stance with the schema designed
  so auto-promotion policies can be added later without migration
- the hard invariant: nothing ever moves into L3 automatically
- ingestion-wave extraction produces a report artifact under
  data/extraction-reports/<wave-id>/

docs/architecture/conflict-model.md
-----------------------------------
Defines how AtoCore handles contradictory facts without violating
the "bad memory is worse than no memory" rule.

- conflict = two or more active rows claiming the same slot with
  incompatible values
- per-type "slot key" tuples for both memory and entity types
- cross-layer conflict detection respects the trust hierarchy:
  trusted project state > active entities > active memories
- new conflicts and conflict_members tables (schema proposal)
- detection at two latencies: synchronous at write time,
  asynchronous nightly sweep
- "flag, never block" rule: writes always succeed, conflicts are
  surfaced via /conflicts, /health open_conflicts_count, per-row
  response bodies, and the Human Mirror's disputed marker
- resolution is always human: promote-winner + supersede-others,
  or dismiss-as-not-a-real-conflict, both with audit trail
- explicitly out of scope for V1: cross-project conflicts,
  temporal-overlap conflicts, tolerance-aware numeric comparisons

Also updates:
- master-plan-status.md: Phase 9 moved from "started" to "baseline
  complete" now that Commits A, B, C are all landed
- master-plan-status.md: adds an "Engineering Layer Planning Sprint"
  section listing the doc wave so far and the remaining docs
  (tool-handoff-boundaries, human-mirror-rules,
  representation-authority, engineering-v1-acceptance)
- current-state.md: Phase 9 moved from "not started" to "baseline
  complete" with the A/B/C annotation

This is pure doc work. No code changes, no schema changes, no
behavior changes. Per the working rule in master-plan-status.md:
the architecture docs shape decisions, they do not force premature
schema work.

2026-04-06 21:30:35 -04:00


Memory vs Entities (Engineering Layer V1 boundary)

Why this document exists

The engineering layer introduces a new representation — typed entities with explicit relationships — alongside AtoCore's existing memory system and its six memory types. The question that blocks every other engineering-layer planning doc is:

When we extract a fact from an interaction or a document, does it become a memory, an entity, or both? And if both, which one is canonical?

Without an answer, the rest of the engineering layer cannot be designed. This document is the answer.

The short version

Memories stay. They are still the canonical home for unstructured, attributed, personal, natural-language facts.
Entities are new. They are the canonical home for structured, typed, relational, engineering-domain facts.
No concept lives in both at full fidelity. Every concept has exactly one canonical home. The other layer may hold a pointer or a rendered view, never a second source of truth.
The two layers share one review queue. Candidates from extraction flow into the same status=candidate lifecycle regardless of whether they are memory-bound or entity-bound.
Memories can "graduate" into entities when enough structure has accumulated, but the upgrade is an explicit, logged promotion, not a silent rewrite.

The split per memory type

The six memory types from the current Phase 2 implementation each map to exactly one outcome in V1:

| Memory type | V1 destination | Rationale |
|---|---|---|
| identity | memory only | Always about the human user. No engineering domain structure. Never gets entity-shaped. |
| preference | memory only | Always about the human user's working style. Same reasoning. |
| episodic | memory only | "What happened in this conversation / this day." Attribution and time are the point, not typed structure. |
| knowledge | entity when possible, memory otherwise | If the knowledge maps to a typed engineering object (material property, constant, tolerance), it becomes a Fact entity with provenance. If it's loose general knowledge, stays a memory. |
| project | entity | Anything that belonged in the "project" memory type is really a Requirement, Constraint, Decision, Subsystem attribute, etc. It belongs in the engineering layer once entities exist. |
| adaptation | entity (Decision) | "We decided to X" is literally a Decision entity in the ontology. This is the clearest migration. |

Practical consequence: when the engineering layer V1 ships, the project, knowledge, and adaptation memory types are deprecated as a canonical home for new facts. Existing rows are not deleted — they are backfilled as entities through the promotion-rules flow (see promotion-rules.md), and the old memory rows become frozen references pointing at their graduated entity.

The identity, preference, and episodic memory types continue to exist exactly as they do today and do not interact with the engineering layer at all.
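The routing in the table above can be sketched as a small lookup. This is a minimal illustration, not AtoCore code: the `Destination` enum, the `v1_destination` function, and the `maps_to_typed_object` flag are hypothetical names, and the sketch assumes some upstream check has already decided whether a knowledge fact maps to a typed engineering object.

```python
from enum import Enum


class Destination(Enum):
    MEMORY = "memory"
    ENTITY = "entity"


# Memory types that never leave the memory layer.
MEMORY_ONLY = {"identity", "preference", "episodic"}
# Memory types whose new facts become entities once engineering V1 ships.
ENTITY_ONLY = {"project", "adaptation"}


def v1_destination(memory_type: str, maps_to_typed_object: bool = False) -> Destination:
    """Return the canonical home for a newly extracted fact (hypothetical helper)."""
    if memory_type in MEMORY_ONLY:
        return Destination.MEMORY
    if memory_type in ENTITY_ONLY:
        return Destination.ENTITY
    if memory_type == "knowledge":
        # Structured knowledge becomes an entity; loose knowledge stays a memory.
        return Destination.ENTITY if maps_to_typed_object else Destination.MEMORY
    raise ValueError(f"unknown memory type: {memory_type!r}")
```

The function returns exactly one destination per call, which mirrors the rule that every concept has a single canonical home.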

What "canonical home" actually means

A concept's canonical home is the single place where:

its current active value is stored
its status lifecycle is managed (candidate/active/superseded/invalid)
its confidence is tracked
its provenance chain is rooted
edits, supersessions, and invalidations are applied
conflict resolution is arbitrated

Everything else is a derived view of that canonical row.

If a Decision entity is the canonical home for "we switched to GF-PTFE pads", then:

there is no adaptation memory row with the same content; the extractor creates a Decision candidate directly
the context builder, when asked to include relevant state, reaches into the entity store via the engineering layer, not the memory store
if the user wants to see "recent decisions" they hit the entity API, never the memory API
if they want to invalidate the decision, they do so via the entity API

The memory API remains the canonical home for identity, preference, and episodic — same rules, just a different set of types.

Why not a unified table with a `kind` column?

It would be simpler to implement. It is rejected for three reasons:

Different query shapes. Memories are queried by type, project, confidence, recency. Entities are queried by type, relationships, graph traversal, coverage gaps ("orphan requirements"). Cramming both into one table forces the schema to be the union of both worlds and makes each query slower.
Different lifecycles. Memories have a simple four-state lifecycle (candidate/active/superseded/invalid). Entities have the same four states plus per-relationship supersession, per-field versioning for the killer correctness queries, and structured conflict flagging. The unified table would have to carry all entity apparatus for every memory row.
Different provenance semantics. A preference memory is provenanced by "the user told me" — one author, one time. An entity like a Requirement is provenanced by "this source chunk + this source document + these supporting Results" — a graph. The tables want to be different because their provenance models are different.

So: two tables, one review queue, one promotion flow, one trust hierarchy.

The shared review queue

Both the memory extractor (Phase 9 Commit C, already shipped) and the future entity extractor write into the same conceptual queue: everything lands at status=candidate in its own table, and the human reviewer sees a unified list. The reviewer UI (future work) shows candidates of all kinds side by side, grouped by source interaction / source document, with the rule that fired.

From the data side this means:

the memories table gets a candidate status (already done in Phase 9 Commit B/C)
the future entities table will get the same candidate status
both tables get the same promote / reject API shape: one verb per candidate, with an audit log entry

Implementation note: the API routes should evolve from POST /memory/{id}/promote to POST /candidates/{id}/promote once both tables exist, so the reviewer tooling can treat them uniformly. The current memory-only route stays in place for backward compatibility and is aliased by the unified route.
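One way to picture "two tables, one promote / reject shape" is an in-memory sketch like the following. Everything here is illustrative: `ReviewQueue`, `Row`, and the audit entry fields are hypothetical, candidate ids are assumed to be unique across both tables, and mapping a rejected candidate to the `invalid` status is an assumption this document does not confirm.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Row:
    id: str
    status: str = "candidate"


@dataclass
class ReviewQueue:
    # Two separate stores, one per layer (stand-ins for the two tables).
    memories: dict = field(default_factory=dict)
    entities: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _find(self, candidate_id: str):
        # Assumption: ids do not collide between the two tables.
        for layer, table in (("memory", self.memories), ("entity", self.entities)):
            if candidate_id in table:
                return layer, table[candidate_id]
        raise KeyError(candidate_id)

    def _decide(self, candidate_id: str, verb: str, new_status: str) -> Row:
        layer, row = self._find(candidate_id)
        if row.status != "candidate":
            raise ValueError(f"{candidate_id} is {row.status!r}, not a candidate")
        row.status = new_status
        # One audit entry per verb, regardless of which layer owns the row.
        self.audit_log.append({
            "candidate_id": candidate_id,
            "layer": layer,
            "verb": verb,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return row

    def promote(self, candidate_id: str) -> Row:
        return self._decide(candidate_id, "promote", "active")

    def reject(self, candidate_id: str) -> Row:
        # Assumption: a rejected candidate ends up "invalid".
        return self._decide(candidate_id, "reject", "invalid")
```

The reviewer-facing verbs never need to know which table a candidate lives in; only the lookup does.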

Memory-to-entity graduation

Even though the split is clean on paper, real usage will reveal memories that deserve to be entities but started as plain text. Four signals are good candidates for proposing graduation:

Reference count crosses a threshold. A memory that has been reinforced 5+ times across multiple interactions is a strong signal that it deserves structure.
Memory content matches a known entity template. If a knowledge memory's content matches the shape "X = value [unit]" it can be proposed as a Fact or Parameter entity.
A user explicitly asks for promotion. POST /memory/{id}/graduate is the simplest explicit path — it returns a proposal for an entity structured from the memory's content, which the user can accept or reject.
Extraction pass proposes an entity that happens to match an existing memory. The entity extractor, when scanning a new interaction, sees the same content already exists as a memory and proposes graduation as part of its candidate output.
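The first two signals are mechanical enough to sketch. The snippet below is a rough illustration only: the regular expression is one hypothetical reading of the "X = value [unit]" shape (numeric value, optional unit), the threshold constant takes the "5+" figure from the text, and `graduation_signals` is an invented name. The third and fourth signals depend on user action and on the entity extractor, so they are not modelled here.

```python
import re

# Hypothetical pattern for "X = value [unit]"; the unit part is optional.
FACT_TEMPLATE = re.compile(
    r"^\s*(?P<name>[^=]+?)\s*=\s*(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>\S.*?)?\s*$"
)

# "Reinforced 5+ times" from the text above.
REINFORCEMENT_THRESHOLD = 5


def graduation_signals(content: str, reference_count: int) -> list[str]:
    """Return the names of the graduation signals that fire for one memory."""
    signals = []
    if reference_count >= REINFORCEMENT_THRESHOLD:
        signals.append("reference_count")
    if FACT_TEMPLATE.match(content):
        signals.append("entity_template")
    return signals
```

A memory that triggers any signal would only be proposed for graduation; the promotion itself stays a human decision.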

The graduation flow is:

memory row (active, confidence C)
      |
      | propose_graduation()
      v
entity candidate row (candidate, confidence C)
      +
memory row gets status="graduated" and a forward pointer to the
entity candidate
      |
      | human promotes the candidate entity
      v
entity row (active)
      +
memory row stays "graduated" permanently (historical record)

The memory is never deleted. It becomes a frozen historical pointer to the entity it became. This keeps the audit trail intact and lets the Human Mirror show "this decision started life as a memory on April 2, was graduated to an entity on April 15, now has 2 supporting ValidationClaims".
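The state transitions in the flow above can be sketched with two plain dataclasses. This is an assumption-laden illustration: the `Memory` and `Entity` classes, the `graduated_to` and `source_memory_id` fields, and `promote_entity` are hypothetical; only `propose_graduation` and the status names come from the diagram.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    id: str
    content: str
    confidence: float
    status: str = "active"
    graduated_to: Optional[str] = None  # forward pointer to the entity row


@dataclass
class Entity:
    id: str
    content: str
    confidence: float
    source_memory_id: str  # back pointer, keeps the provenance chain rooted
    status: str = "candidate"


def propose_graduation(memory: Memory, entity_id: str) -> Entity:
    """Create an entity candidate from an active memory and freeze the memory."""
    if memory.status != "active":
        raise ValueError("only active memories can be proposed for graduation")
    candidate = Entity(
        id=entity_id,
        content=memory.content,
        confidence=memory.confidence,  # the candidate inherits confidence C
        source_memory_id=memory.id,
    )
    # The memory row is kept, never deleted; it becomes a historical pointer.
    memory.status = "graduated"
    memory.graduated_to = candidate.id
    return candidate


def promote_entity(entity: Entity) -> Entity:
    """Human promotion step: candidate -> active. The memory is not touched."""
    if entity.status != "candidate":
        raise ValueError("only candidates can be promoted")
    entity.status = "active"
    return entity
```

Note that promotion changes only the entity row; the memory stays "graduated" throughout, which is what preserves the audit trail.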

The graduated status is a new memory status that will be added when the graduation flow is implemented. The three non-graduating types (identity/preference/episodic) never receive it. For now (Phase 9), the three graduating types (project/knowledge/adaptation) stay in their current memory-only state until the engineering layer ships.

Context pack assembly after the split

The context builder today (src/atocore/context/builder.py) pulls:

Trusted Project State
Identity + Preference memories
Retrieved chunks

After the split, it pulls:

Trusted Project State (unchanged)
Identity + Preference memories (unchanged — these stay memories)
Engineering-layer facts relevant to the prompt, queried through the entity API (new)
Retrieved chunks (unchanged, lowest trust)

Note the ordering: identity/preference memories stay above entities, because personal style information is always more trusted than extracted engineering facts. Entities sit below the personal layer but above raw retrieval, because they have structured provenance that raw chunks lack.

The budget allocation gains a new slot:

trusted project state: 20% (unchanged, highest trust)
identity memories: 5% (unchanged)
preference memories: 5% (unchanged)
engineering entities: 15% (new — pulls only V1-required objects relevant to the prompt)
retrieval: 55% (reduced from 70% to make room)

These are starting numbers. After the engineering layer ships and real usage tunes retrieval quality, these will be revisited.
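As a worked example of the allocation, the sketch below splits a total budget using the starting percentages listed above. The slot keys and the `allocate_budget` function are hypothetical, the budget unit is whatever the context builder already counts in, and giving the rounding remainder to retrieval is an assumption rather than documented behavior.

```python
# Starting shares from the list above, as integer percentages.
BUDGET_PERCENT = {
    "trusted_project_state": 20,
    "identity_memories": 5,
    "preference_memories": 5,
    "engineering_entities": 15,
    "retrieval": 55,
}
assert sum(BUDGET_PERCENT.values()) == 100


def allocate_budget(total: int) -> dict[str, int]:
    """Split a total context budget into per-slot budgets."""
    budgets = {slot: total * pct // 100 for slot, pct in BUDGET_PERCENT.items()}
    # Assumption: any remainder from integer division goes to retrieval.
    budgets["retrieval"] += total - sum(budgets.values())
    return budgets
```

For a total of 4000 units this gives 800 for trusted project state, 200 each for identity and preference memories, 600 for engineering entities, and 2200 for retrieval.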

What the shipped memory types still mean after the split

| Memory type | Still accepts new writes? | V1 destination for new extractions |
|---|---|---|
| identity | yes | memory (no change) |
| preference | yes | memory (no change) |
| episodic | yes | memory (no change) |
| knowledge | yes, but only for loose facts | entity (Fact / Parameter) for structured things; memory is a fallback |
| project | no new writes after engineering V1 ships | entity (Requirement / Constraint / Subsystem attribute) |
| adaptation | no new writes after engineering V1 ships | entity (Decision) |

"No new writes" means the create_memory path will refuse to create new project or adaptation memories once the engineering layer V1 ships. Existing rows stay queryable and reinforceable but new facts of those kinds must become entities. This keeps the canonical-home rule clean going forward.

The deprecation is deferred: it does not happen until the engineering layer V1 is demonstrably working against the active project set. Until then, the existing memory types continue to accept writes so the Phase 9 loop can be exercised without waiting on the engineering layer.
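The deferred "no new writes" rule could be expressed as a guard in front of the create path. The following is a sketch under assumptions: `check_memory_write`, `DeprecatedMemoryTypeError`, and the `engineering_v1_live` flag are hypothetical names, and how the real `create_memory` path would learn that V1 is live is not specified here.

```python
# Memory types that stop accepting new writes once engineering V1 ships.
DEPRECATED_AFTER_V1 = {"project", "adaptation"}


class DeprecatedMemoryTypeError(ValueError):
    """Raised when a new fact must become an entity instead of a memory."""


def check_memory_write(memory_type: str, engineering_v1_live: bool) -> None:
    """Refuse new project/adaptation memories, but only after V1 is live."""
    if engineering_v1_live and memory_type in DEPRECATED_AFTER_V1:
        raise DeprecatedMemoryTypeError(
            f"{memory_type!r} facts must be created as entities"
        )
```

While the flag is false, every memory type still accepts writes, which matches the deferral described above; knowledge memories remain writable in both states because loose facts keep the memory fallback.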

Consequences for Phase 9 (what we just built)

The capture loop, reinforcement, and extractor we shipped today are memory-facing. They produce memory candidates, reinforce memory confidence, and respect the memory status lifecycle. None of that changes.

When the engineering layer V1 ships, the extractor in src/atocore/memory/extractor.py gets a sibling in src/atocore/entities/extractor.py that uses the same interaction-scanning approach but produces entity candidates instead. The POST /interactions/{id}/extract endpoint either:

runs both extractors and returns a combined result, or
gains a ?target=memory|entities|both query parameter

and the decision between those two shapes can wait until the entity extractor actually exists.

Until the entity layer is real, the memory extractor also has to cover some things that will eventually move to entities (decisions, constraints, requirements). That overlap is temporary and intentional. Rather than leave those cues unextracted for months while the entity layer is being built, the memory extractor surfaces them as memory candidates. Later, a migration pass will propose graduation on every active memory created by decision_heading, constraint_heading, and requirement_heading rules once the entity types exist to receive them.
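The selection step of that migration pass might look like the filter below. It is only a sketch: the rule names come from the paragraph above, but the dictionary row shape and the `rule` and `status` keys are assumptions about how extractor provenance is stored.

```python
# Rule names taken from the text; the row layout is hypothetical.
GRADUATING_RULES = {"decision_heading", "constraint_heading", "requirement_heading"}


def memories_to_graduate(memories: list[dict]) -> list[dict]:
    """Pick active memories that were created by one of the structural-cue rules."""
    return [
        m for m in memories
        if m.get("status") == "active" and m.get("rule") in GRADUATING_RULES
    ]
```

Each selected memory would then be proposed for graduation, so the result is a batch of entity candidates for human review rather than an automatic rewrite.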

So: no rework in Phase 9, no wasted extraction, clean handoff once the entity layer lands.

Open questions this document does NOT answer

These are deliberately deferred to later planning docs:

When exactly does extraction fire? (answered by promotion-rules.md)
How are conflicts between a memory and an entity handled during graduation? (answered by conflict-model.md)
Does the context builder traverse the entity graph for relationship-rich queries, or does it only surface direct facts? (answered by the context-builder spec in a future engineering-context-integration.md doc)
What is the exact API shape of the unified candidate review queue? (answered by a future review-queue-api.md doc when the entity extractor exists and both tables need one UI)

TL;DR

memories = user-facing unstructured facts, still own identity/preference/episodic
entities = engineering-facing typed facts, own project/adaptation and structured knowledge (loose knowledge stays a memory)
one canonical home per concept, never both
one shared candidate-review queue, same promote/reject shape
graduated memories stay as frozen historical pointers
Phase 9 stays memory-only and ships today; entity V1 follows the remaining architecture docs in this planning sprint
no rework required when the entity layer lands; the current memory extractor's structural cues get migrated forward via explicit graduation
