Three planning docs that answer the architectural questions the engineering query catalog raised. Together with the catalog they form roughly half of the pre-implementation planning sprint.

docs/architecture/memory-vs-entities.md
---------------------------------------

Resolves the central question blocking every other engineering layer doc: is a Decision a memory or an entity? Key decisions:

- memories stay the canonical home for identity, preference, and episodic facts
- entities become the canonical home for project, knowledge, and adaptation facts once the engineering layer V1 ships
- no concept lives in both layers at full fidelity; one canonical home per concept
- a "graduation" flow lets active memories upgrade into entities (the memory stays as a frozen historical pointer, never deleted)
- one shared candidate review queue across both layers
- the context builder budget gains a 15% slot for engineering entities, slotted between identity/preference memories and retrieved chunks
- the Phase 9 memory extractor's structural cues (decision heading, constraint heading, requirement heading) are explicitly an intentional temporary overlap, cleanly migrated via graduation when the entity extractor ships

docs/architecture/promotion-rules.md
------------------------------------

Defines the full Layer 0 → Layer 2 pipeline:

- four layers: L0 raw source, L1 memory candidate/active, L2 entity candidate/active, L3 trusted project state
- three extraction triggers: on interaction capture (new, automatic), on ingestion wave (new, batched per wave), on explicit request (existing)
- per-rule prior confidence tuned at write time by structural signal (echoes the retriever's high/low signal hints) and a freshness bonus
- batch cap of 50 candidates per pass to protect the reviewer
- full provenance requirements: every candidate carries rule id, source_chunk_id, source_interaction_id, and extractor_version
- reversibility matrix for every promotion step
- explicit no-auto-promotion-in-V1 stance, with the schema designed so auto-promotion policies can be added later without migration
- the hard invariant: nothing ever moves into L3 automatically
- ingestion-wave extraction produces a report artifact under data/extraction-reports/<wave-id>/

docs/architecture/conflict-model.md
-----------------------------------

Defines how AtoCore handles contradictory facts without violating the "bad memory is worse than no memory" rule.

- a conflict = two or more active rows claiming the same slot with incompatible values
- per-type "slot key" tuples for both memory and entity types
- cross-layer conflict detection respects the trust hierarchy: trusted project state > active entities > active memories
- new conflicts and conflict_members tables (schema proposal)
- detection at two latencies: synchronous at write time, asynchronous nightly sweep
- "flag, never block" rule: writes always succeed; conflicts are surfaced via /conflicts, /health open_conflicts_count, per-row response bodies, and the Human Mirror's disputed marker
- resolution is always human: promote-winner + supersede-others, or dismiss-as-not-a-real-conflict, both with an audit trail
- explicitly out of scope for V1: cross-project conflicts, temporal-overlap conflicts, tolerance-aware numeric comparisons

Also updates:

- master-plan-status.md: Phase 9 moved from "started" to "baseline complete" now that Commits A, B, and C have all landed
- master-plan-status.md: adds an "Engineering Layer Planning Sprint" section listing the doc wave so far and the remaining docs (tool-handoff-boundaries, human-mirror-rules, representation-authority, engineering-v1-acceptance)
- current-state.md: Phase 9 moved from "not started" to "baseline complete" with the A/B/C annotation

This is pure doc work. No code changes, no schema changes, no behavior changes. Per the working rule in master-plan-status.md: the architecture docs shape decisions, they do not force premature schema work.

# Promotion Rules (Layer 0 → Layer 2 pipeline)

## Purpose

AtoCore ingests raw human-authored content (markdown, repo notes,
interaction transcripts) and eventually must turn some of it into
typed engineering entities that the V1 query catalog can answer.
The path from raw text to typed entity has to be:

- **explicit**: every step has a named operation, a trigger, and an
  audit log
- **reversible**: every promotion can be undone without data loss
- **conservative**: no automatic movement into trusted state; a human
  (or later, a very confident policy) always signs off
- **traceable**: every typed entity must carry a back-pointer to
  the raw source that produced it

This document defines that path.

## The four layers

Promotion is described in terms of four layers, all of which exist
simultaneously in the system once the engineering layer V1 ships:

| Layer | Name             | Canonical storage                | Trust   | Who writes             |
|-------|------------------|----------------------------------|---------|------------------------|
| L0    | Raw source       | source_documents + source_chunks | low     | ingestion pipeline     |
| L1    | Memory candidate | memories (status="candidate")    | low     | extractor              |
| L1'   | Active memory    | memories (status="active")       | med     | human promotion        |
| L2    | Entity candidate | entities (status="candidate")    | low     | extractor + graduation |
| L2'   | Active entity    | entities (status="active")       | high    | human promotion        |
| L3    | Trusted state    | project_state                    | highest | human curation         |

Layer 3 (trusted project state) is already implemented and stays
manually curated — automatic promotion into L3 is **never** allowed.

## The promotion graph

```
[L0] source chunks
  |
  | extraction (memory extractor, Phase 9 Commit C)
  v
[L1] memory candidate
  |
  | promote_memory()
  v
[L1'] active memory
  |
  | (optional) propose_graduation()
  v
[L2] entity candidate
  |
  | promote_entity()
  v
[L2'] active entity
  |
  | (manual curation, NEVER automatic)
  v
[L3] trusted project state
```

Short path (direct entity extraction, once the entity extractor
exists):

```
[L0] source chunks
  |
  | entity extractor
  v
[L2] entity candidate
  |
  | promote_entity()
  v
[L2'] active entity
```

A single fact can travel either path depending on what the
extractor saw. The graduation path exists for facts that started
life as memories before the entity layer existed, and for the
memory extractor's structural cues (decisions, constraints,
requirements) which are eventually entity-shaped.

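
Illustrative only (no such module exists yet): the graph above, and its one hard rule, can be sketched as an edge set. The operation names come from the diagram; everything else here is hypothetical:

```python
# Sketch of the promotion graph as an edge set. Note the deliberate
# absence of any edge targeting L3 (trusted project state).
PROMOTION_EDGES = {
    ("L0", "L1"): "extraction",
    ("L1", "L1'"): "promote_memory()",
    ("L1'", "L2"): "propose_graduation()",
    ("L0", "L2"): "entity extractor",
    ("L2", "L2'"): "promote_entity()",
}

def can_promote(src: str, dst: str) -> bool:
    """True if the pipeline defines a named operation for this step."""
    return (src, dst) in PROMOTION_EDGES

# The hard invariant: no automatic edge ever moves anything into L3.
assert all(dst != "L3" for _, dst in PROMOTION_EDGES)
```
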
## Triggers (when does extraction fire?)

Phase 9 already shipped one trigger: **on explicit API request**
(`POST /interactions/{id}/extract`). The V1 engineering layer adds
two more:

1. **On interaction capture (automatic)**
   - Same event that runs reinforcement today
   - Controlled by an `extract` boolean flag on the record request
     (default: `false` for the memory extractor, `true` once an
     engineering extractor exists and has been validated)
   - Output goes to the candidate queue; nothing auto-promotes

2. **On ingestion (batched, per wave)**
   - After a wave of markdown ingestion finishes, a batch extractor
     pass sweeps all newly-added source chunks and produces
     candidates from them
   - Batched per wave (not per chunk) to keep the review queue
     digestible and to let the reviewer see all candidates from a
     single ingestion in one place
   - Output: a report artifact plus a review queue entry per
     candidate

3. **On explicit human request (existing)**
   - `POST /interactions/{id}/extract` for a single interaction
   - Future: `POST /ingestion/wave/{id}/extract` for a whole wave
   - Future: `POST /memory/{id}/graduate` to propose graduation
     of one specific memory into an entity

Batch size rule: **extraction passes never write more than N
candidates per human review cycle, where N = 50 by default**. If
a pass produces more, it ranks by (rule confidence × content
length × novelty) and only writes the top N. The remaining
candidates are logged, not persisted. This protects the reviewer
from getting buried.

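
The cap-and-rank rule above can be sketched as follows. The scoring tuple is from the doc; the `confidence`, `content`, and `novelty` field names and the helper itself are hypothetical, not the real schema:

```python
BATCH_CAP = 50  # N = 50 by default

def cap_batch(candidates: list[dict], cap: int = BATCH_CAP) -> tuple[list[dict], list[dict]]:
    """Split a pass's output into (persisted, logged-only) under the cap."""
    ranked = sorted(
        candidates,
        # rank by rule confidence × content length × novelty
        key=lambda c: c["confidence"] * len(c["content"]) * c["novelty"],
        reverse=True,
    )
    # The tail is logged, never written to the candidate queue.
    return ranked[:cap], ranked[cap:]
```
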
## Confidence and ranking of candidates

Each extraction rule carries a *prior confidence* based on how
specific its pattern is:

| Rule class                                         | Prior    | Rationale |
|----------------------------------------------------|----------|-----------|
| Heading with explicit type (`## Decision:`)        | 0.7      | Very specific structural cue, intentional author marker |
| Typed list item (`- [Decision] ...`)               | 0.65     | Explicit but often embedded in looser prose |
| Sentence pattern (`I prefer X`)                    | 0.5      | Moderate structure, more false positives |
| Regex pattern matching a value+unit (`X = 4.8 kg`) | 0.6      | Structural but prone to coincidence |
| LLM-based (future)                                 | variable | Depends on the model's returned confidence |

The candidate's final confidence at write time is:

```
final = prior * structural_signal_multiplier * freshness_bonus
```

Where:

- `structural_signal_multiplier` is 1.1 if the source chunk path
  contains any of `_HIGH_SIGNAL_HINTS` from the retriever (status,
  decision, requirements, charter, ...) and 0.9 if it contains
  any of `_LOW_SIGNAL_HINTS` (`_archive`, `_history`, ...)
- `freshness_bonus` is 1.05 if the source chunk was updated in the
  last 30 days, else 1.0

This formula will be tuned later; the numbers are starting values.

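
A minimal sketch of the write-time computation. The hint values below are an illustrative subset — the real `_HIGH_SIGNAL_HINTS` / `_LOW_SIGNAL_HINTS` live in the retriever, and the function name is hypothetical:

```python
# Illustrative subsets — the real lists live in the retriever module.
_HIGH_SIGNAL_HINTS = {"status", "decision", "requirements", "charter"}
_LOW_SIGNAL_HINTS = {"_archive", "_history"}

def final_confidence(prior: float, chunk_path: str, days_since_update: int) -> float:
    """final = prior * structural_signal_multiplier * freshness_bonus"""
    if any(h in chunk_path for h in _HIGH_SIGNAL_HINTS):
        multiplier = 1.1
    elif any(h in chunk_path for h in _LOW_SIGNAL_HINTS):
        multiplier = 0.9
    else:
        multiplier = 1.0
    bonus = 1.05 if days_since_update <= 30 else 1.0
    return prior * multiplier * bonus
```

With the starting values, a `## Decision:` heading rule (prior 0.7) firing on a freshly updated high-signal chunk would land at 0.7 × 1.1 × 1.05 ≈ 0.81.
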
## Review queue mechanics

### Queue population

- Each candidate writes one row into its target table
  (memories or entities) with `status="candidate"`
- Each candidate carries: `rule`, `source_span`, `source_chunk_id`,
  `source_interaction_id`, `extractor_version`
- No two candidates ever share the same (type, normalized_content,
  project) — if a second extraction pass produces a duplicate, it
  is dropped before being written

### Queue surfacing

- `GET /memory?status=candidate` lists memory candidates
- `GET /entities?status=candidate` (future) lists entity candidates
- `GET /candidates` (future unified route) lists both

### Reviewer actions

For each candidate, exactly one of:

- **promote**: `POST /memory/{id}/promote` or
  `POST /entities/{id}/promote`
  - sets `status="active"`
  - preserves the audit trail (source_chunk_id, rule, source_span)
- **reject**: `POST /memory/{id}/reject` or
  `POST /entities/{id}/reject`
  - sets `status="invalid"`
  - preserves the audit trail so repeat extractions don't re-propose
- **edit-then-promote**: `PUT /memory/{id}` to adjust content, then
  `POST /memory/{id}/promote`
  - every edit is logged; the original content is preserved in a
    `previous_content_log` column (schema addition deferred to
    the first implementation sprint)
- **defer**: no action; the candidate stays in the queue indefinitely
  (future: add a `pending_since` staleness indicator to the UI)

### Reviewer authentication

In V1 the review queue is single-user by convention. There is no
per-reviewer authorization. Every promote/reject call is logged
with the same default identity. Multi-user review is a V2 concern.

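
As a sketch, the reviewer actions reduce to a small status transition table. The status values come from the doc; edit-then-promote is a `PUT` followed by the promote transition, and the helper itself is hypothetical:

```python
# Hypothetical sketch of the reviewer-action state machine.
# "defer" is a no-op: the candidate simply stays queued.
TRANSITIONS = {
    ("candidate", "promote"): "active",
    ("candidate", "reject"): "invalid",
    ("candidate", "defer"): "candidate",
}

def apply_review(status: str, action: str) -> str:
    """Return the new status, or raise if the action is not allowed."""
    if (status, action) not in TRANSITIONS:
        raise ValueError(f"{action!r} not allowed from status {status!r}")
    return TRANSITIONS[(status, action)]
```
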
## Auto-promotion policies (deferred, but designed for)

The current V1 stance is: **no auto-promotion, ever**. All
promotions require a human reviewer.

The schema and API are designed so that automatic policies can be
added later without schema changes. The anticipated policies:

1. **Reference-count threshold**
   - If a candidate accumulates N+ references across multiple
     interactions within M days AND the reviewer hasn't seen it yet
     (indicating the system sees it often but the human hasn't
     gotten to it), propose auto-promotion
   - Starting thresholds: N=5, M=7 days. Never auto-promote
     entity candidates that affect validation claims or decisions
     without explicit human review — those are too consequential.

2. **Confidence threshold**
   - If `final_confidence >= 0.85` AND the rule is a heading
     rule (not a sentence rule), eligible for auto-promotion

3. **Identity/preference lane**
   - Identity and preference memories extracted from an
     interaction where the user explicitly says "I am X" or
     "I prefer X" with a first-person subject and a high-signal
     verb could auto-promote. This is the safest lane because
     the user is the authoritative source for their own identity.

None of these run in V1. The APIs and data shape are designed so
they can be added as a separate policy module without disrupting
existing tests.

## Reversibility

Every promotion step must be undoable:

| Operation                 | How to undo |
|---------------------------|-------------------------------------------------------|
| memory candidate written  | delete the candidate row (low-risk, it was never in context) |
| memory candidate promoted | `PUT /memory/{id}` status=candidate (reverts to queue) |
| memory candidate rejected | `PUT /memory/{id}` status=candidate |
| memory graduated          | memory stays as a frozen pointer; delete the entity candidate to undo |
| entity candidate promoted | `PUT /entities/{id}` status=candidate |
| entity promoted to active | supersede with a new active, or `PUT` back to candidate |

The only irreversible operation is manual curation into L3
(trusted project state). That is by design — L3 is small, curated,
and human-authored end to end.

## Provenance (what every candidate must carry)

Every candidate row, memory or entity, MUST have:

- `source_chunk_id` — if extracted from ingested content, the chunk it came from
- `source_interaction_id` — if extracted from a captured interaction, the interaction it came from
- `rule` — the extractor rule id that fired
- `extractor_version` — a semver-ish string the extractor module carries
  so old candidates can be re-evaluated with a newer extractor

If both `source_chunk_id` and `source_interaction_id` are null, the
candidate was hand-authored (via `POST /memory` directly) and must
be flagged as such. Hand-authored candidates are allowed but
discouraged — the preference is to extract from real content, not
dictate candidates directly.

The active rows inherit all of these fields from their candidate
row at promotion time. They are never overwritten.

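
A sketch of the hand-authored check, assuming candidate rows arrive as plain dicts keyed by the field names above (the helper name is hypothetical):

```python
def classify_provenance(candidate: dict) -> str:
    """Classify a candidate row per the provenance rule above."""
    if (candidate.get("source_chunk_id") is None
            and candidate.get("source_interaction_id") is None):
        # Allowed but discouraged; must be flagged for the reviewer.
        return "hand_authored"
    return "extracted"
```
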
## Extractor versioning

The extractor is going to change — new rules added, old rules
refined, precision/recall tuned over time. The promotion flow
must survive extractor changes:

- every extractor module exposes an `EXTRACTOR_VERSION = "0.1.0"`
  constant
- every candidate row records this version
- when the extractor version changes, the change log explains
  what the new rules do
- old candidates are NOT automatically re-evaluated by the new
  extractor — that would lose the auditable history of why the
  old candidate was created
- future `POST /memory/{id}/re-extract` can optionally propose
  an updated candidate from the same source chunk with the new
  extractor, but it produces a *new* candidate alongside the old
  one, never a silent rewrite

## Ingestion-wave extraction semantics

When the batched extraction pass fires on an ingestion wave, it
produces a report artifact:

```
data/extraction-reports/<wave-id>/
├── report.json        # summary counts, rule distribution
├── candidates.ndjson  # one JSON line per persisted candidate
├── dropped.ndjson     # one JSON line per candidate dropped
│                      # (over batch cap, duplicate, below
│                      # min content length, etc.)
└── errors.log         # any rule-level errors
```

The report artifact lives under the configured `data_dir` and is
retained per the backup retention policy. The ingestion-waves doc
(`docs/ingestion-waves.md`) is updated to include an "extract"
step after each wave, with the expectation that the human
reviews the candidates before the next wave fires.

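
A sketch of the report writer under the layout above — only the directory structure comes from the doc; the function name and the exact `report.json` fields are assumptions (errors.log is omitted for brevity):

```python
import json
from pathlib import Path

def write_extraction_report(data_dir: Path, wave_id: str,
                            kept: list[dict], dropped: list[dict]) -> Path:
    """Write the per-wave report artifact under data_dir/extraction-reports/."""
    out = data_dir / "extraction-reports" / wave_id
    out.mkdir(parents=True, exist_ok=True)
    # Summary counts; field names here are illustrative.
    summary = {"persisted": len(kept), "dropped": len(dropped)}
    (out / "report.json").write_text(json.dumps(summary, indent=2))
    # One JSON line per candidate (ndjson).
    (out / "candidates.ndjson").write_text(
        "".join(json.dumps(c) + "\n" for c in kept))
    (out / "dropped.ndjson").write_text(
        "".join(json.dumps(c) + "\n" for c in dropped))
    return out
```
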
## Candidate-to-candidate deduplication across passes

Two extraction passes over the same chunk (or two different
chunks containing the same fact) should not produce two identical
candidate rows. The deduplication key is:

```
(memory_type_or_entity_type, normalized_content, project, status)
```

Normalization strips whitespace variants, lowercases, and drops
trailing punctuation (same rules as the extractor's `_clean_value`
function). If a second pass would produce a duplicate, it instead
increments a `re_extraction_count` column on the existing
candidate row and updates `last_re_extracted_at`. This gives the
reviewer a "saw this N times" signal without flooding the queue.

This column is a future schema addition — current candidates do
not track re-extraction. The promotion-rules implementation will
land the column as part of its first migration.

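
A sketch of the normalization and key construction, implementing the three stated rules; the authoritative rules live in the extractor's `_clean_value`, and the function names here are hypothetical:

```python
import re

def normalize(content: str) -> str:
    """Collapse whitespace variants, lowercase, drop trailing punctuation."""
    collapsed = re.sub(r"\s+", " ", content).strip().lower()
    return collapsed.rstrip(".,;:!?")

def dedup_key(row_type: str, content: str, project: str, status: str) -> tuple:
    """The deduplication key: (type, normalized_content, project, status)."""
    return (row_type, normalize(content), project, status)
```
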
## The "never auto-promote into trusted state" invariant

Regardless of what auto-promotion policies might exist between
L0 → L2', **nothing ever moves into L3 (trusted project state)
without explicit human action via `POST /project/state`**. This
is the one hard line in the promotion graph and it is enforced
by having no API endpoint that takes a candidate id and writes
to `project_state`.

## Summary

- Four layers: L0 raw, L1 memory candidate/active, L2 entity
  candidate/active, L3 trusted state
- Three triggers for extraction: on capture, on ingestion wave, on
  explicit request
- Per-rule prior confidence, tuned by structural signals at write time
- Shared candidate review queue, promote/reject/edit/defer actions
- No auto-promotion in V1 (but the schema allows it later)
- Every candidate carries full provenance and extractor version
- Every promotion step is reversible except L3 curation
- L3 is never touched automatically