ATOCore/docs/PHASE-7-MEMORY-CONSOLIDATION.md
Anto01 028d4c3594 feat: Phase 7A — semantic memory dedup ("sleep cycle" V1)
New table memory_merge_candidates + service functions to cluster
near-duplicate active memories within (project, memory_type) buckets,
draft a unified content via LLM, and merge on human approval. Source
memories become superseded (never deleted); merged memory carries
union of tags, max of confidence, sum of reference_count.

- schema migration for memory_merge_candidates
- atocore.memory.similarity: cosine + transitive clustering
- atocore.memory._dedup_prompt: stdlib-only LLM prompt preserving every specific
- service: merge_memories / create_merge_candidate / get_merge_candidates / reject_merge_candidate
- scripts/memory_dedup.py: host-side detector (HTTP-only, idempotent)
- 5 API endpoints under /admin/memory/merge-candidates* + /admin/memory/dedup-scan
- triage UI: purple "🔗 Merge Candidates" section + "🔗 Scan for duplicates" bar
- batch-extract.sh Step B3 (0.90 daily, 0.85 Sundays)
- deploy/dalidou/dedup-watcher.sh for UI-triggered scans
- 21 new tests (374 → 395)
- docs/PHASE-7-MEMORY-CONSOLIDATION.md covering 7A-7H roadmap

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:30:49 -04:00

# Phase 7 — Memory Consolidation (the "Sleep Cycle")
**Status**: 7A in progress · 7B-H scoped, deferred
**Design principle**: *"Like human memory while sleeping, but more robotic — never discard relevant details. Consolidate, update, supersede — don't delete."*
## Why
Phases 1-6 built capture + triage + graduation + emerging-project detection. What they don't solve:
| # | Problem | Fix |
|---|---|---|
| 1 | Redundancy — "APM uses NX" said 5 different ways across 5 memories | **7A** Semantic dedup |
| 2 | Latent contradictions — "chose Zygo" + "switched from Zygo" both active | **7B** Pair contradiction detection |
| 3 | Tag drift — `firmware`, `fw`, `firmware-control` fragment retrieval | **7C** Tag canonicalization |
| 4 | Confidence staleness — 6-month unreferenced memory ranks as fresh | **7D** Confidence decay |
| 5 | No memory drill-down page | **7E** `/wiki/memories/{id}` |
| 6 | Domain knowledge siloed per project | **7F** `/wiki/domains/{tag}` |
| 7 | Prompt upgrades (llm-0.5 → 0.6) don't re-process old interactions | **7G** Re-extraction on version bump |
| 8 | Superseded memory vectors still in Chroma polluting retrieval | **7H** Vector hygiene |
Collectively: the brain needs a nightly pass that looks at what it already knows and tidies up — dedup, resolve contradictions, canonicalize tags, decay stale facts — **without losing information**.
## Subphases
### 7A — Semantic dedup + consolidation *(this sprint)*
Compute embeddings on active memories, find pairs within the same `(project, memory_type)` bucket whose similarity exceeds a threshold (default 0.88), cluster them, draft a unified memory via LLM, and have a human approve it in the triage UI. On approve: sources become `superseded`, and a new merged memory is created with the union of `source_refs`, the sum of `reference_count`, and the max of `confidence`. **Ships first** because redundancy compounds: every new memory potentially duplicates an old one.
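The merge rule can be sketched as follows. This is a minimal illustration, not the actual `merge_memories()` in `service.py`: the `Memory` dataclass and the `merge` helper are hypothetical, and only the fields named in the text are modelled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Memory:
    # Hypothetical shape; the real schema may carry more fields.
    id: str
    content: str
    confidence: float
    reference_count: int
    source_refs: tuple[str, ...] = ()
    status: str = "active"


def merge(sources: list[Memory], merged_id: str, drafted_content: str) -> tuple[Memory, list[Memory]]:
    """Return (merged memory, superseded copies of the sources). Nothing is deleted."""
    if len(sources) < 2:
        raise ValueError("a merge needs at least two source memories")
    # Union of source_refs: de-duplicated, first-seen order preserved.
    refs = tuple(dict.fromkeys(r for m in sources for r in m.source_refs))
    merged = Memory(
        id=merged_id,
        content=drafted_content,  # in the real flow this is the LLM draft a human approved
        confidence=max(m.confidence for m in sources),
        reference_count=sum(m.reference_count for m in sources),
        source_refs=refs,
    )
    # Sources are kept and only change status.
    superseded = [replace(m, status="superseded") for m in sources]
    return merged, superseded
```

Returning new objects rather than mutating the sources keeps the "supersede, never delete" rule easy to audit: the caller persists both the merged row and the status change.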
Detailed spec lives in the working plan (`dapper-cooking-tower.md`) and across the files listed under "Files touched" below. Key decisions:
- LLM drafts, human approves — no silent auto-merge.
- Same `(project, memory_type)` bucket only. Cross-project merges are rare + risky → separate flow in 7B.
- Recompute embeddings each scan (~2s / 335 memories). Persist only if scan time becomes a problem.
- Cluster-based proposals (A~B~C → one merge), not pair-based.
- `status=superseded` never deleted — still queryable with filter.
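
Cluster-based proposals mean that if A is close to B and B is close to C, all three land in one merge candidate even when A and C are not directly above the threshold. A minimal sketch of that idea using cosine similarity and union-find follows. The function names are hypothetical and need not match `compute_memory_similarity()` in `similarity.py`; the input is assumed to be the embeddings of a single `(project, memory_type)` bucket.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_bucket(embeddings: dict[str, list[float]], threshold: float = 0.88) -> list[list[str]]:
    """Group the memory ids of ONE bucket into transitive clusters of size >= 2."""
    parent = {mid: mid for mid in embeddings}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link every pair at or above the threshold; union-find makes links transitive.
    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = defaultdict(list)
    for mid in embeddings:
        groups[find(mid)].append(mid)
    # Singletons are not merge candidates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

The all-pairs loop is quadratic, which is consistent with recomputing on every scan only while buckets stay small.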
**Schema**: new table `memory_merge_candidates` (pending | approved | rejected).
**Cron**: nightly at threshold 0.90 (tight); weekly (Sundays) at 0.85 (deeper cleanup).
**UI**: new "🔗 Merge Candidates" section in `/admin/triage`.
**Files touched in 7A**:
- `src/atocore/models/database.py` — migration
- `src/atocore/memory/similarity.py` — new, `compute_memory_similarity()`
- `src/atocore/memory/_dedup_prompt.py` — new, shared LLM prompt
- `src/atocore/memory/service.py` — `merge_memories()`
- `scripts/memory_dedup.py` — new, host-side detector (HTTP-only)
- `src/atocore/api/routes.py` — 5 new endpoints under `/admin/memory/`
- `src/atocore/engineering/triage_ui.py` — merge cards section
- `deploy/dalidou/batch-extract.sh` — Step B3
- `deploy/dalidou/dedup-watcher.sh` — new, UI-triggered scans
- `tests/test_memory_dedup.py` — ~10-15 new tests
### 7B — Memory-to-memory contradiction detection
Same embedding-pair machinery as 7A but within a *different* band (similarity 0.70-0.88: semantically related but differently worded). LLM classifies each pair: `duplicate | complementary | contradicts | supersedes-older`. Contradictions write a `memory_conflicts` row + surface a triage badge. Clear supersessions (both tier 1 sonnet and tier 2 opus agree) auto-mark the older as `superseded`.
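How a pair might be routed between 7A and 7B, and how the two model verdicts could combine, is sketched below. The action names, the handling of the 0.88 boundary, and the `human-review` fallback on disagreement are assumptions; the text only specifies the band, the four labels, and the both-tiers-agree rule for auto-supersession.

```python
DEDUP_THRESHOLD = 0.88      # 7A default from the text
CONTRADICTION_FLOOR = 0.70  # lower edge of the 7B band

LABELS = {"duplicate", "complementary", "contradicts", "supersedes-older"}


def route_pair(similarity: float) -> str:
    """Pick the flow for a pair; the boundary at 0.88 is assigned to dedup (assumption)."""
    if similarity >= DEDUP_THRESHOLD:
        return "dedup"
    if similarity >= CONTRADICTION_FLOOR:
        return "contradiction-check"
    return "ignore"


def resolve(tier1_label: str, tier2_label: str) -> str:
    """Combine the tier 1 and tier 2 verdicts into an action (hypothetical names)."""
    for label in (tier1_label, tier2_label):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
    if tier1_label == tier2_label == "supersedes-older":
        return "auto-supersede-older"  # only when both tiers agree
    if "contradicts" in (tier1_label, tier2_label):
        return "write-conflict-row"    # memory_conflicts row + triage badge
    if tier1_label != tier2_label:
        return "human-review"          # assumption: disagreement is never auto-resolved
    return "no-action"
```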
### 7C — Tag canonicalization
Weekly LLM pass over `domain_tags` distribution, proposes `alias → canonical` map (e.g. `fw → firmware`). Human approves via UI (one-click pattern, same as emerging-project registration). Bulk-rewrites `domain_tags` atomically across all memories.
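Applying an approved alias map could look like the sketch below. The helper names and the in-memory dict of `memory_id -> domain_tags` are hypothetical; in the real system the atomic part would presumably be a single database transaction.

```python
def canonicalize(tags: list[str], alias_map: dict[str, str]) -> list[str]:
    """Rewrite aliases to canonical tags, dropping duplicates but keeping order."""
    seen: dict[str, None] = {}
    for tag in tags:
        seen.setdefault(alias_map.get(tag, tag), None)
    return list(seen)


def rewrite_all(memories: dict[str, list[str]], alias_map: dict[str, str]) -> dict[str, list[str]]:
    """Build the complete rewritten mapping first, so the caller can swap it in one step."""
    return {mid: canonicalize(tags, alias_map) for mid, tags in memories.items()}
```

Two aliases that map to the same canonical tag collapse to one entry, so a memory tagged `fw` and `firmware-control` ends up with a single `firmware` tag.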
### 7D — Confidence decay
Daily lightweight job. For memories with `reference_count=0` AND `last_referenced_at` older than 30 days: multiply confidence by 0.97/day (a half-life of about 23 days; a 2-month half-life would need roughly 0.9885/day). Reinforcement already bumps confidence. Below 0.3 → auto-supersede with reason `decayed, no references`. Reversible (tune half-life), non-destructive (still searchable with status filter).
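The decay arithmetic can be checked with a short sketch. The helper names are hypothetical; only the 0.97 daily factor and the 0.3 floor come from the text.

```python
import math

DAILY_FACTOR = 0.97    # from the text
SUPERSEDE_BELOW = 0.3  # from the text


def half_life_days(daily_factor: float) -> float:
    """Days for confidence to halve under a constant daily multiplier."""
    return math.log(0.5) / math.log(daily_factor)


def factor_for_half_life(days: float) -> float:
    """Daily multiplier that yields the requested half-life."""
    return 0.5 ** (1.0 / days)


def days_until_superseded(
    confidence: float,
    daily_factor: float = DAILY_FACTOR,
    floor: float = SUPERSEDE_BELOW,
) -> int:
    """Count daily decay steps until confidence drops below the floor."""
    days = 0
    while confidence >= floor:
        confidence *= daily_factor
        days += 1
    return days
```

Because the half-life is very sensitive to the daily factor near 1.0, it may be easier to configure the half-life in days and derive the factor with `factor_for_half_life()` than to tune the factor directly.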
### 7E — Memory detail page `/wiki/memories/{id}`
Provenance chain: source_chunk → interaction → graduated_to_entity. Audit trail (Phase 4 has the data). Related memories (same project + tag + semantic neighbors). Decay trajectory plot (if 7D ships). Link target from every memory surfaced anywhere in the wiki.
### 7F — Cross-project domain view `/wiki/domains/{tag}`
One page per `domain_tag` showing all memories + graduated entities with that tag, grouped by project. "Optics across p04+p05+p06" becomes a real navigable page. Answers the long-standing question the tag system was meant to enable.
### 7G — Re-extraction on prompt upgrade
`batch_llm_extract_live.py --force-reextract --since DATE`. Dedupe key: `(interaction_id, extractor_version)` — same run on same interaction doesn't double-create. Triggered manually when `LLM_EXTRACTOR_VERSION` bumps. Not automatic (destructive).
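The dedupe key can be illustrated with a small in-memory ledger. This is a sketch under assumptions: `ExtractionLedger` is a hypothetical name, and the real script would presumably persist the key in the database rather than in a set.

```python
class ExtractionLedger:
    """Tracks which (interaction_id, extractor_version) pairs were already processed."""

    def __init__(self) -> None:
        self._done: set[tuple[str, str]] = set()

    def should_run(self, interaction_id: str, extractor_version: str) -> bool:
        return (interaction_id, extractor_version) not in self._done

    def record(self, interaction_id: str, extractor_version: str) -> None:
        self._done.add((interaction_id, extractor_version))


def plan_reextraction(ledger: ExtractionLedger, interaction_ids: list[str], version: str) -> list[str]:
    """Return the interactions that still need a run under this extractor version."""
    return [i for i in interaction_ids if ledger.should_run(i, version)]
```

Re-running with the same version is a no-op for interactions already recorded, while a version bump makes every interaction eligible again.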
### 7H — Vector store hygiene
Nightly: scan `source_chunks` and `memory_embeddings` (added in 7A V2) for `status=superseded|invalid`. Delete matching vectors from Chroma. Fail-open — the retrieval harness catches any real regression.
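A sketch of the nightly purge follows. The `VectorCollection` protocol is a stand-in modelled on the `delete(ids=...)` call of a Chroma collection. The function name, the batch size, and the reading of "fail-open" as "a failed batch is skipped rather than aborting the job" are assumptions.

```python
from typing import Iterable, Protocol


class VectorCollection(Protocol):
    def delete(self, ids: list[str]) -> None: ...


STALE_STATUSES = {"superseded", "invalid"}


def purge_stale_vectors(
    rows: Iterable[tuple[str, str]],
    collection: VectorCollection,
    batch_size: int = 100,
) -> tuple[int, int]:
    """rows are (vector_id, status) pairs. Returns (deleted, failed) counts."""
    stale = [vid for vid, status in rows if status in STALE_STATUSES]
    deleted = failed = 0
    for start in range(0, len(stale), batch_size):
        batch = stale[start:start + batch_size]
        try:
            collection.delete(ids=batch)
            deleted += len(batch)
        except Exception:
            # Fail-open: count the failure and continue with the next batch.
            failed += len(batch)
    return deleted, failed
```

Returning the failure count lets the nightly job log how many vectors remain to be retried on the next run.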
## Verification & ship order
1. **7A** — ship + observe 1 week → validate merge proposals are high-signal, rejection rate acceptable
2. **7D** — decay is low-risk + high-compounding value; ship second
3. **7C** — clean up tag fragmentation before 7F depends on canonical tags
4. **7E** + **7F** — UX surfaces; ship together once data is clean
5. **7B** — contradictions flow (pairs harder than duplicates to classify; wait for 7A data to tune threshold)
6. **7G** — on-demand; no ship until we actually bump the extractor prompt
7. **7H** — housekeeping; after 7A + 7B + 7D have generated enough `superseded` rows to matter
## Scope NOT in Phase 7
- Graduated memories (entity-descended) are **frozen** — exempt from dedup/decay. Entity consolidation is a separate Phase (8+).
- Auto-merging without human approval (always human-in-the-loop in V1).
- Summarization / compression — a different problem (reducing the number of chunks per memory, not the number of memories).
- Forgetting policies — there's no user-facing "delete this" flow in Phase 7. Supersede + filter covers the need.