feat: Phase 7A — semantic memory dedup ("sleep cycle" V1)

New table memory_merge_candidates + service functions to cluster
near-duplicate active memories within (project, memory_type) buckets,
draft unified content via an LLM, and merge on human approval. Source
memories become superseded (never deleted); the merged memory carries
the union of tags, the max of confidence, and the sum of reference_count.
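
The merge rule described above could be sketched roughly as follows. This is an illustration only, not the code in this commit: the `Memory` dataclass and the `merged_fields` / `supersede` helpers are hypothetical names, and only the aggregation rules (union of tags, max of confidence, sum of reference_count, sources marked superseded) come from the message.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    # Hypothetical stand-in for a stored memory row.
    id: str
    tags: list = field(default_factory=list)
    confidence: float = 0.5
    reference_count: int = 0
    status: str = "active"


def merged_fields(sources):
    """Aggregate metadata for the merged memory from its source memories."""
    if len(sources) < 2:
        raise ValueError("a merge needs at least two source memories")
    return {
        # Union of tags, sorted so the result is deterministic.
        "tags": sorted({t for m in sources for t in m.tags}),
        # The merged memory keeps the highest confidence among its sources.
        "confidence": max(m.confidence for m in sources),
        # References to any source now count toward the merged memory.
        "reference_count": sum(m.reference_count for m in sources),
    }


def supersede(sources):
    """Mark sources as superseded instead of deleting them."""
    for m in sources:
        m.status = "superseded"
```

Keeping the sources as superseded rows rather than deleting them means a merge leaves a trail that can still be inspected afterwards.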

- schema migration for memory_merge_candidates
- atocore.memory.similarity: cosine + transitive clustering
- atocore.memory._dedup_prompt: stdlib-only LLM prompt preserving every specific
- service: merge_memories / create_merge_candidate / get_merge_candidates / reject_merge_candidate
- scripts/memory_dedup.py: host-side detector (HTTP-only, idempotent)
- 5 API endpoints under /admin/memory/merge-candidates* + /admin/memory/dedup-scan
- triage UI: purple "🔗 Merge Candidates" section + "🔗 Scan for duplicates" bar
- batch-extract.sh Step B3 (0.90 daily, 0.85 Sundays)
- deploy/dalidou/dedup-watcher.sh for UI-triggered scans
- 21 new tests (374 → 395)
- docs/PHASE-7-MEMORY-CONSOLIDATION.md covering 7A-7H roadmap
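
For readers unfamiliar with "cosine + transitive clustering", a minimal stdlib-only sketch is given here. It is an assumption about the general technique, not a copy of atocore.memory.similarity: the function names `cosine` and `cluster` are hypothetical, the vectors are toy embeddings, and union-find is just one way to get transitive grouping.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def cluster(vectors, threshold):
    """Group indices linked, directly or through a chain, by similarity >= threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    # Only clusters with at least two members are merge candidates.
    return [g for g in groups.values() if len(g) >= 2]


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.95, 0.3], [0.8, 0.6], [0.0, 1.0]]
    print(cluster(toy, 0.90))
```

In the toy data, items 0 and 2 are only 0.8 similar to each other, but both are above 0.90 against item 1, so transitive clustering places all three in one group. A looser threshold (such as the 0.85 used on Sundays) can only widen clusters, never split them.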

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:30:49 -04:00
parent 9f262a21b0
commit 028d4c3594
12 changed files with 1860 additions and 8 deletions


@@ -251,6 +251,42 @@ def _apply_migrations(conn: sqlite3.Connection) -> None:
"CREATE INDEX IF NOT EXISTS idx_interactions_created_at ON interactions(created_at)"
    )

    # Phase 7A (Memory Consolidation, "sleep cycle"): merge candidates.
    # When the dedup detector finds a cluster of semantically similar active
    # memories within the same (project, memory_type) bucket, it drafts
    # unified content via an LLM and writes a proposal here. The triage UI
    # surfaces these for human approval. On approve, source memories become
    # status=superseded and a new merged memory is created.
    # memory_ids is a JSON array (length >= 2) of the source memory ids.
    # proposed_* hold the LLM's draft; a human can edit before approve.
    # result_memory_id is filled on approve with the new merged memory's id.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS memory_merge_candidates (
            id TEXT PRIMARY KEY,
            status TEXT DEFAULT 'pending',
            memory_ids TEXT NOT NULL,
            similarity REAL,
            proposed_content TEXT,
            proposed_memory_type TEXT,
            proposed_project TEXT,
            proposed_tags TEXT DEFAULT '[]',
            proposed_confidence REAL,
            reason TEXT DEFAULT '',
            created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
            resolved_at DATETIME,
            resolved_by TEXT,
            result_memory_id TEXT
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_mmc_status ON memory_merge_candidates(status)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_mmc_created_at ON memory_merge_candidates(created_at)"
    )


def _column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()