# Phase 7: Memory Consolidation (the "Sleep Cycle")

**Status**: 7A in progress · 7B–7H scoped, deferred

**Design principle**: *"Like human memory while sleeping, but more robotic: never discard relevant details. Consolidate, update, supersede; don't delete."*
## Why

Phases 1–6 built capture, triage, graduation, and emerging-project detection. What they don't solve:

| # | Problem | Fix |
|---|---|---|
| 1 | Redundancy: "APM uses NX" said 5 different ways across 5 memories | **7A** Semantic dedup |
| 2 | Latent contradictions: "chose Zygo" and "switched from Zygo" both active | **7B** Pair contradiction detection |
| 3 | Tag drift: `firmware`, `fw`, `firmware-control` fragment retrieval | **7C** Tag canonicalization |
| 4 | Confidence staleness: a memory unreferenced for 6 months ranks as fresh | **7D** Confidence decay |
| 5 | No memory drill-down page | **7E** `/wiki/memories/{id}` |
| 6 | Domain knowledge siloed per project | **7F** `/wiki/domains/{tag}` |
| 7 | Prompt upgrades (llm-0.5 → 0.6) don't re-process old interactions | **7G** Re-extraction on version bump |
| 8 | Superseded memory vectors still in Chroma, polluting retrieval | **7H** Vector hygiene |

Collectively, the brain needs a nightly pass that looks at what it already knows and tidies up (dedup, resolve contradictions, canonicalize tags, decay stale facts) **without losing information**.
## Subphases

### 7A: Semantic dedup + consolidation *(this sprint)*

Compute embeddings for active memories and find pairs within the same `(project, memory_type)` bucket whose similarity is above a threshold (default 0.88). Cluster those pairs, have the LLM draft one unified memory per cluster, and have a human approve it in the triage UI. On approval, the sources become `superseded` and a new merged memory is created with the union of `source_refs`, the sum of `reference_count`, and the max of `confidence`. **Ships first** because redundancy compounds: every new memory potentially duplicates an old one.

The detailed spec lives in the working plan (`dapper-cooking-tower.md`) and across the files listed under "Files touched" below. Key decisions:

- LLM drafts, human approves; no silent auto-merge.
- Same `(project, memory_type)` bucket only. Cross-project merges are rare and risky, so they go to a separate flow in 7B.
- Recompute embeddings on every scan (~2 s for 335 memories). Persist them only if scan time becomes a problem.
- Cluster-based proposals (A~B~C → one merge), not pair-based.
- Memories with `status=superseded` are never deleted; they stay queryable with a status filter.

**Schema**: new table `memory_merge_candidates` with status `pending | approved | rejected`.

**Cron**: nightly at threshold 0.90 (tight); weekly (Sundays) at 0.85 (deeper cleanup).

**UI**: new "🔗 Merge Candidates" section in `/admin/triage`.

**Files touched in 7A**:

- `src/atocore/models/database.py`: migration
- `src/atocore/memory/similarity.py`: new, `compute_memory_similarity()`
- `src/atocore/memory/_dedup_prompt.py`: new, shared LLM prompt
- `src/atocore/memory/service.py`: `merge_memories()`
- `scripts/memory_dedup.py`: new, host-side detector (HTTP-only)
- `src/atocore/api/routes.py`: 5 new endpoints under `/admin/memory/`
- `src/atocore/engineering/triage_ui.py`: merge cards section
- `deploy/dalidou/batch-extract.sh`: Step B3
- `deploy/dalidou/dedup-watcher.sh`: new, UI-triggered scans
- `tests/test_memory_dedup.py`: ~10-15 new tests
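A minimal sketch of the 7A core, assuming embeddings are already available as plain float lists: bucket by `(project, memory_type)`, link every pair at or above the threshold, take connected components (union-find) so that A~B~C becomes one proposal, and apply the merge field rules. `Memory`, `find_clusters()`, and `merged_fields()` are hypothetical names for illustration, not the API of `similarity.py` or `service.py`; the merged content itself would come from the approved LLM draft.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Memory:
    # Hypothetical record holding only the fields the 7A rules touch.
    id: str
    project: str
    memory_type: str
    content: str
    embedding: list[float]
    confidence: float
    reference_count: int
    source_refs: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def _root(parent: list[int], i: int) -> int:
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
        i = parent[i]
    return i


def find_clusters(memories: list[Memory], threshold: float = 0.88) -> list[list[Memory]]:
    """Group memories transitively (A~B and B~C => {A, B, C}) within each bucket."""
    buckets: dict[tuple[str, str], list[Memory]] = defaultdict(list)
    for m in memories:
        buckets[(m.project, m.memory_type)].append(m)  # never compare across buckets

    clusters: list[list[Memory]] = []
    for bucket in buckets.values():
        parent = list(range(len(bucket)))
        for i in range(len(bucket)):
            for j in range(i + 1, len(bucket)):
                if cosine(bucket[i].embedding, bucket[j].embedding) >= threshold:
                    parent[_root(parent, i)] = _root(parent, j)  # union the two sets
        groups: dict[int, list[Memory]] = defaultdict(list)
        for i, m in enumerate(bucket):
            groups[_root(parent, i)].append(m)
        clusters.extend(g for g in groups.values() if len(g) > 1)  # singletons need no merge
    return clusters


def merged_fields(sources: list[Memory]) -> dict:
    """Field rules for the merged memory; its content comes from the approved LLM draft."""
    return {
        "source_refs": set().union(*(m.source_refs for m in sources)),
        "reference_count": sum(m.reference_count for m in sources),
        "confidence": max(m.confidence for m in sources),
    }
```

Taking connected components means two memories can share a cluster without being directly similar (A~B and B~C, but not A~C). That is what cluster-based proposals ask for, and it is one more reason every draft goes past a human. The pairwise loop is O(n²) per bucket, which stays cheap at the few-hundred-memory scale mentioned above.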
### 7B: Memory-to-memory contradiction detection

7B reuses the embedding-pair machinery from 7A but targets a *different* similarity band (0.70–0.88): memories that are semantically related but worded differently. The LLM classifies each pair as `duplicate | complementary | contradicts | supersedes-older`. Contradictions write a `memory_conflicts` row and surface a triage badge. Clear supersessions, where both tier 1 (Sonnet) and tier 2 (Opus) agree, auto-mark the older memory as `superseded`.
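One way to express the 7B routing, as a sketch: pairs are considered only inside the 0.70–0.88 band, and auto-supersession requires both tiers to return `supersedes-older`. The action names, sending tier disagreements to triage, and leaving `duplicate`/`complementary` pairs alone are all assumptions of this sketch, not part of the spec.

```python
from enum import Enum
from typing import Optional


class PairLabel(str, Enum):
    DUPLICATE = "duplicate"
    COMPLEMENTARY = "complementary"
    CONTRADICTS = "contradicts"
    SUPERSEDES_OLDER = "supersedes-older"


BAND_LOW, BAND_HIGH = 0.70, 0.88  # the 7B band; pairs at >= 0.88 belong to 7A


def in_7b_band(similarity: float) -> bool:
    """Related enough to compare, but below the 7A dedup threshold."""
    return BAND_LOW <= similarity < BAND_HIGH


def action_for(tier1: PairLabel, tier2: Optional[PairLabel] = None) -> str:
    """Map LLM verdicts to a hypothetical action name.

    tier2 is the second-tier verdict, or None if the pair was not escalated.
    """
    if tier1 is PairLabel.CONTRADICTS:
        return "record_conflict"  # memory_conflicts row + triage badge
    if tier1 is PairLabel.SUPERSEDES_OLDER:
        if tier2 is PairLabel.SUPERSEDES_OLDER:
            return "supersede_older"  # both tiers agree: mark the older memory superseded
        return "queue_for_triage"  # assumption: disagreement goes to a human
    return "no_action"  # duplicate / complementary: left alone in this sketch
```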
### 7C: Tag canonicalization

A weekly LLM pass over the `domain_tags` distribution proposes an `alias → canonical` map (e.g. `fw → firmware`). A human approves it in the UI (one click, the same pattern as emerging-project registration), and the approved map then bulk-rewrites `domain_tags` atomically across all memories.
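A sketch of applying an approved alias map, assuming a hypothetical SQLite table `memories(id, domain_tags)` that stores tags as a JSON list. The rewrite runs inside one transaction (`with conn:`), so it applies to every memory or to none, and an alias that collapses onto a tag the memory already has is de-duplicated.

```python
import json
import sqlite3


def canonicalize(tags: list[str], alias_map: dict[str, str]) -> list[str]:
    """Rewrite aliases to canonical tags, dropping duplicates but keeping order."""
    out: list[str] = []
    for tag in tags:
        canon = alias_map.get(tag, tag)
        if canon not in out:
            out.append(canon)
    return out


def apply_alias_map(conn: sqlite3.Connection, alias_map: dict[str, str]) -> int:
    """Rewrite domain_tags on every memory in one transaction; returns rows changed."""
    changed = 0
    with conn:  # commits on success, rolls back if anything raises
        rows = conn.execute("SELECT id, domain_tags FROM memories").fetchall()
        for mem_id, raw in rows:
            old = json.loads(raw)
            new = canonicalize(old, alias_map)
            if new != old:
                conn.execute(
                    "UPDATE memories SET domain_tags = ? WHERE id = ?",
                    (json.dumps(new), mem_id),
                )
                changed += 1
    return changed
```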
### 7D: Confidence decay

A lightweight daily job. For memories with `reference_count=0` and `last_referenced_at` older than 30 days, multiply confidence by 0.97 per day. At 0.97/day the half-life is about 23 days (ln 0.5 / ln 0.97 ≈ 22.8); a ~2-month (60-day) half-life would need a daily factor of about 0.9885. Reinforcement already bumps confidence back up. Below 0.3, the memory is auto-superseded with reason `decayed, no references`. The policy is reversible (tune the half-life) and non-destructive (decayed memories stay searchable with a status filter).
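The decay arithmetic as a small sketch. The constants come from the 7D text; `decay_step()` and its `idle_days` input are hypothetical, and a real job would derive idleness from `last_referenced_at` in the database.

```python
import math

DECAY_PER_DAY = 0.97   # daily multiplier from the 7D spec
GRACE_DAYS = 30        # only memories idle longer than this decay
SUPERSEDE_BELOW = 0.3  # auto-supersede threshold


def half_life_days(factor: float) -> float:
    """Days of repeated multiplication by `factor` needed to halve a value."""
    return math.log(0.5) / math.log(factor)


def daily_factor_for(half_life: float) -> float:
    """Inverse of half_life_days: the daily factor giving that half-life."""
    return 0.5 ** (1.0 / half_life)


def decay_step(confidence: float, reference_count: int, idle_days: int) -> tuple[float, bool]:
    """One run of the daily job: returns (new_confidence, should_supersede)."""
    if reference_count > 0 or idle_days <= GRACE_DAYS:
        return confidence, False  # referenced, or still inside the grace period
    decayed = confidence * DECAY_PER_DAY
    return decayed, decayed < SUPERSEDE_BELOW
```

With these constants, a never-referenced memory starting at 0.8 confidence falls below 0.3 on the 33rd daily run after the grace period ends.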
### 7E: Memory detail page `/wiki/memories/{id}`

The page shows the provenance chain (source_chunk → interaction → graduated_to_entity), the audit trail (Phase 4 already records the data), related memories (same project and tag, plus semantic neighbors), and a decay-trajectory plot (if 7D ships). Every memory surfaced anywhere in the wiki links to it.
### 7F: Cross-project domain view `/wiki/domains/{tag}`

One page per `domain_tag` lists every memory and graduated entity carrying that tag, grouped by project. "Optics across p04+p05+p06" becomes a real, navigable page, delivering the cross-project view the tag system was always meant to enable.
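A sketch of the grouping behind a `/wiki/domains/{tag}` page, using a hypothetical `TaggedItem` record that stands in for both memories and graduated entities. Projects are sorted so the page renders in a stable order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedItem:
    # Hypothetical record: a memory or a graduated entity.
    kind: str  # "memory" or "entity"
    project: str
    title: str
    domain_tags: tuple[str, ...]


def domain_view(items: list[TaggedItem], tag: str) -> dict[str, list[TaggedItem]]:
    """Everything carrying `tag`, grouped by project, projects in sorted order."""
    grouped: dict[str, list[TaggedItem]] = defaultdict(list)
    for item in items:
        if tag in item.domain_tags:
            grouped[item.project].append(item)
    return dict(sorted(grouped.items()))
```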
### 7G: Re-extraction on prompt upgrade

`batch_llm_extract_live.py --force-reextract --since DATE`. The dedupe key is `(interaction_id, extractor_version)`, so re-running the same extractor version on the same interaction doesn't create duplicates. Triggered manually when `LLM_EXTRACTOR_VERSION` bumps; not automatic, because it is destructive.
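A sketch of the `(interaction_id, extractor_version)` dedupe key as a SQLite primary key, assuming a hypothetical `extraction_runs` table and an `extract` callback. `INSERT OR IGNORE` makes the claim idempotent, and because the claim and the extraction share one transaction, a failing `extract()` rolls back its claim so a later run can retry.

```python
import sqlite3
from typing import Callable, Iterable

SCHEMA = """
CREATE TABLE IF NOT EXISTS extraction_runs (
    interaction_id    TEXT NOT NULL,
    extractor_version TEXT NOT NULL,
    PRIMARY KEY (interaction_id, extractor_version)
)
"""


def claim_run(conn: sqlite3.Connection, interaction_id: str, version: str) -> bool:
    """Record the (interaction, version) pair; False if that run already happened."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO extraction_runs (interaction_id, extractor_version) "
        "VALUES (?, ?)",
        (interaction_id, version),
    )
    return cur.rowcount == 1  # 0 when the primary key already existed


def force_reextract(
    conn: sqlite3.Connection,
    interaction_ids: Iterable[str],
    version: str,
    extract: Callable[[str], object],
) -> int:
    """Run `extract` on each interaction at most once per extractor version."""
    processed = 0
    for iid in interaction_ids:
        with conn:  # an exception from extract() rolls back this claim
            if not claim_run(conn, iid, version):
                continue
            extract(iid)
            processed += 1
    return processed
```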
### 7H: Vector store hygiene

Nightly: scan `source_chunks` and `memory_embeddings` (added in 7A V2) for rows with `status=superseded|invalid`, and delete the matching vectors from Chroma. The job fails open; the retrieval harness catches any real regression.
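A fail-open sketch of the 7H job. `VectorStore` is a hypothetical protocol whose `delete(ids=...)` call mirrors the `ids` keyword of chromadb's `Collection.delete`; the `(vector_id, status)` rows stand in for whatever `source_chunks` and `memory_embeddings` actually return, and the batch size is an arbitrary choice.

```python
import logging
from typing import Iterable, Protocol

log = logging.getLogger("vector_hygiene")

STALE_STATUSES = {"superseded", "invalid"}


class VectorStore(Protocol):
    def delete(self, ids: list[str]) -> None: ...


def stale_ids(rows: Iterable[tuple[str, str]]) -> list[str]:
    """rows are (vector_id, status) pairs; keep ids whose status marks them stale."""
    return [vid for vid, status in rows if status in STALE_STATUSES]


def vector_hygiene(store: VectorStore, rows: Iterable[tuple[str, str]], batch_size: int = 500) -> int:
    """Delete stale vectors in batches; returns how many deletes succeeded."""
    ids = stale_ids(rows)
    deleted = 0
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        try:
            store.delete(ids=batch)
            deleted += len(batch)
        except Exception:
            # Fail open: a missed delete only leaves a stale vector until the next run.
            log.exception("vector delete failed for %d ids", len(batch))
    return deleted
```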
## Verification & ship order

1. **7A**: ship, then observe for 1 week to validate that merge proposals are high-signal and the rejection rate is acceptable.
2. **7D**: decay is low-risk with high compounding value; ship second.
3. **7C**: clean up tag fragmentation before 7F depends on canonical tags.
4. **7E** + **7F**: UX surfaces; ship together once the data is clean.
5. **7B**: the contradiction flow (pairs are harder to classify than duplicates; wait for 7A data to tune the threshold).
6. **7G**: on demand; don't ship until we actually bump the extractor prompt.
7. **7H**: housekeeping; ship after 7A, 7B, and 7D have generated enough `superseded` rows to matter.
## Scope NOT in Phase 7

- Graduated memories (entity-descended) are **frozen** and exempt from dedup/decay. Entity consolidation is a separate phase (8+).
- Auto-merging without human approval (always human-in-the-loop in V1).
- Summarization / compression: a different problem (reducing the number of chunks per memory, not the number of memories).
- Forgetting policies: there's no user-facing "delete this" flow in Phase 7. Supersede + filter covers the need.