Commit Graph

3 Commits

90001c1956 fix(7A): host-side memory_dedup.py must stay stdlib-only
Broke the dedup-watcher cron: when I wrote memory_dedup.py in session
7A, it imported atocore.memory.similarity, which transitively imports
sentence-transformers + pydantic_settings, and the host Python
intentionally doesn't have those installed. Every UI-triggered and cron
dedup scan since 7A deployed has been silently crashing with
ModuleNotFoundError (visible only in
/home/papa/atocore-logs/dedup-ondemand-*.log).

I even documented this architecture rule in atocore.memory._llm_prompt
('This module MUST stay stdlib-only') and then violated it one session
later. Shame.

Real fix — matches the extractor pattern:
- New endpoint POST /admin/memory/dedup-cluster on the server: takes
  {project, similarity_threshold, max_clusters}, runs the embedding +
  transitive clustering inside the container where
  sentence-transformers lives, and returns the resulting clusters.
- scripts/memory_dedup.py is now pure stdlib: pulls clusters via HTTP,
  LLM-drafts merges via the claude CLI, POSTs proposals back. No atocore
  imports beyond the stdlib-only _dedup_prompt shared module.
- Regression test pins the rule: test_memory_dedup_script_is_stdlib_only
  snapshots sys.modules before/after importing the script and asserts
  that no disallowed atocore modules were pulled in.
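A minimal sketch of what a stdlib-only host-side call to the new endpoint could look like, using only json and urllib.request. The function names and the base URL in the usage note are hypothetical; the payload keys are the three listed above, and the response is returned as decoded JSON without assuming the server's cluster shape.

```python
import json
import urllib.request


def build_dedup_cluster_request(base_url: str, project: str,
                                similarity_threshold: float,
                                max_clusters: int) -> urllib.request.Request:
    """Build the POST for /admin/memory/dedup-cluster using only the stdlib."""
    payload = {
        "project": project,
        "similarity_threshold": similarity_threshold,
        "max_clusters": max_clusters,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/admin/memory/dedup-cluster",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_clusters(base_url: str, project: str, similarity_threshold: float,
                   max_clusters: int, timeout: float = 60.0):
    """Send the request and return the decoded JSON body (shape is set by the server)."""
    req = build_dedup_cluster_request(base_url, project,
                                      similarity_threshold, max_clusters)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage (hypothetical address): `fetch_clusters("http://127.0.0.1:8100", "atocore", 0.9, 20)`. Splitting request construction from sending keeps the payload testable without a running server.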

Also: similarity.py + cluster_by_threshold stay server-side, still
covered by the same tests that used to live in the host tier-helper
section.
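For readers unfamiliar with transitive clustering, here is a sketch of the idea behind a threshold clusterer, written as plain Python over small vectors rather than sentence-transformers embeddings. The name `transitive_clusters` is hypothetical and its signature is not claimed to match `cluster_by_threshold`: any pair at or above the threshold is linked, and connected components (found with union-find) become clusters.

```python
import math
from itertools import combinations


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def transitive_clusters(vectors, threshold: float):
    """Group indices whose similarity chains link them; drop singletons."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]
```

Transitivity means A and C can share a cluster even when their direct similarity is below the threshold, as long as B links them.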

Tests 459 → 458 (net -1: the obsolete host-tier helper tests were
rewritten and consolidated, partly offset by +2 for the new stdlib-only
regression + endpoint shape tests).
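The sys.modules snapshot technique behind test_memory_dedup_script_is_stdlib_only can be sketched as below. This is not the actual test: the helper name, the allow-list, and the simulated loader are hypothetical. In a real test, `load` would import the script file itself (for example via `importlib.util.spec_from_file_location`); here it registers placeholder modules so the sketch runs without atocore installed.

```python
import sys
import types
from typing import Callable, Iterable


def disallowed_new_modules(load: Callable[[], object], package: str,
                           allowed: Iterable[str]) -> set[str]:
    """Snapshot sys.modules around load() and report new, non-allowed modules under package."""
    allowed_set = set(allowed)
    before = set(sys.modules)
    load()
    added = set(sys.modules) - before
    return {
        name for name in added
        if (name == package or name.startswith(package + "."))
        and name not in allowed_set
    }


# Hypothetical allow-list: package shells plus the stdlib-only prompt module.
ALLOWED = {"atocore", "atocore.memory", "atocore.memory._dedup_prompt"}
FAKE = ("atocore", "atocore.memory", "atocore.memory._dedup_prompt",
        "atocore.memory.similarity")


def simulate_broken_script_import() -> None:
    # Stand-in for the 7A script: registers empty placeholder modules,
    # including the heavy similarity module it should never have imported.
    for name in FAKE:
        sys.modules.setdefault(name, types.ModuleType(name))


for name in FAKE:
    sys.modules.pop(name, None)  # start from a clean slate
leaked = disallowed_new_modules(simulate_broken_script_import, "atocore", ALLOWED)
for name in FAKE:
    sys.modules.pop(name, None)  # remove the placeholders again
print(sorted(leaked))  # ['atocore.memory.similarity']
```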

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:18:00 -04:00
56d5df0ab4 feat: Phase 7A.1 — autonomous merge tiering (sonnet → opus → human)
Dedup detector now merges high-confidence duplicates silently instead
of piling every proposal into a human triage queue. Matches the 3-tier
escalation pattern that auto_triage already uses.

Tiering decision per cluster:
  TIER-1 auto-approve: sonnet confidence >= 0.8 AND min_pairwise_sim >= 0.92
                       AND all sources share project+type → auto-merge silently
                       (actor="auto-dedup-tier1" in audit log)
  TIER-2 escalation:   sonnet 0.5-0.8 conf OR sim 0.85-0.92 → opus second opinion.
                       Opus confirms with conf >= 0.8 → auto-merge (actor="auto-dedup-tier2").
                       Opus overrides (reject) → skip silently.
                       Opus low conf → human triage with opus's refined draft.
  HUMAN triage:        Only genuinely ambiguous clusters land in /admin/triage.
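The tiering rules above can be read as two small decision functions, sketched here with the default thresholds hard-coded. Function names and the "approve"/"reject" verdict strings are hypothetical. Two gaps in the rules are filled by labeled assumptions: clusters outside both escalation bands go to humans, and opus's confirmation bar reuses the 0.8 auto-approve confidence. The `auto_approve` flag models the --no-auto-approve kill switch.

```python
AUTO_APPROVE_CONF = 0.8
AUTO_APPROVE_SIM = 0.92
TIER2_MIN_CONF = 0.5
TIER2_MIN_SIM = 0.85


def first_pass_tier(sonnet_conf: float, min_pairwise_sim: float,
                    same_bucket: bool, auto_approve: bool = True) -> str:
    """Route a cluster after the sonnet draft: 'tier1', 'tier2', or 'human'."""
    if not auto_approve:
        return "human"  # kill switch: everything goes to the human queue
    if (sonnet_conf >= AUTO_APPROVE_CONF and min_pairwise_sim >= AUTO_APPROVE_SIM
            and same_bucket):
        return "tier1"
    conf_band = TIER2_MIN_CONF <= sonnet_conf < AUTO_APPROVE_CONF
    sim_band = TIER2_MIN_SIM <= min_pairwise_sim < AUTO_APPROVE_SIM
    if conf_band or sim_band:
        return "tier2"
    return "human"  # assumption: anything outside both bands is left to humans


def second_pass_outcome(opus_verdict: str, opus_conf: float) -> str:
    """Resolve a tier-2 escalation: 'merge', 'skip', or 'human'."""
    if opus_verdict == "reject":
        return "skip"  # opus overrides: drop silently
    if opus_conf >= AUTO_APPROVE_CONF:
        return "merge"  # assumption: same 0.8 bar as tier-1
    return "human"  # low-confidence opus draft goes to triage
```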

Env-tunable thresholds:
  ATOCORE_DEDUP_AUTO_APPROVE_CONF (0.8)
  ATOCORE_DEDUP_AUTO_APPROVE_SIM (0.92)
  ATOCORE_DEDUP_TIER2_MIN_CONF (0.5)
  ATOCORE_DEDUP_TIER2_MIN_SIM (0.85)
  ATOCORE_DEDUP_TIER2_MODEL (opus)
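One way to load these env-tunable thresholds, with the listed values as defaults, is a frozen dataclass. The class and method names are hypothetical; only the environment variable names and defaults come from the list above. The sanity check is an added assumption: a tier-2 floor above the auto-approve bar would leave an empty escalation band.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DedupThresholds:
    # Defaults mirror the values listed in the commit message.
    auto_approve_conf: float = 0.8
    auto_approve_sim: float = 0.92
    tier2_min_conf: float = 0.5
    tier2_min_sim: float = 0.85
    tier2_model: str = "opus"

    def __post_init__(self) -> None:
        # A tier-2 floor above the auto-approve bar would make the escalation band empty.
        if (self.tier2_min_conf > self.auto_approve_conf
                or self.tier2_min_sim > self.auto_approve_sim):
            raise ValueError("tier-2 minimums must not exceed auto-approve thresholds")

    @classmethod
    def from_env(cls, environ: Optional[Mapping[str, str]] = None) -> "DedupThresholds":
        env = os.environ if environ is None else environ
        d = cls()  # instance holding the defaults
        return cls(
            auto_approve_conf=float(env.get("ATOCORE_DEDUP_AUTO_APPROVE_CONF", d.auto_approve_conf)),
            auto_approve_sim=float(env.get("ATOCORE_DEDUP_AUTO_APPROVE_SIM", d.auto_approve_sim)),
            tier2_min_conf=float(env.get("ATOCORE_DEDUP_TIER2_MIN_CONF", d.tier2_min_conf)),
            tier2_min_sim=float(env.get("ATOCORE_DEDUP_TIER2_MIN_SIM", d.tier2_min_sim)),
            tier2_model=env.get("ATOCORE_DEDUP_TIER2_MODEL", d.tier2_model),
        )
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the loader easy to test.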

New flag --no-auto-approve for kill-switch testing (everything → human queue).

Tests: +6 (tier-2 prompt content, same_bucket edges, min_pairwise_similarity
on identical + transitive clusters). 395 → 401.
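The tier-1 gate uses the minimum pairwise similarity rather than the clustering threshold because transitive clusters can contain pairs that were never directly similar. A minimal sketch of that measure; the function names are hypothetical, and returning 1.0 for a single-member cluster is an assumption.

```python
import math
from itertools import combinations


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def min_pairwise_similarity(vectors) -> float:
    """Smallest cosine over all member pairs; 1.0 for fewer than two members (assumption)."""
    if len(vectors) < 2:
        return 1.0
    return min(cosine(a, b) for a, b in combinations(vectors, 2))
```

For a chain clustered at 0.98 from vectors at 0°, 10°, and 20°, the minimum is cos 20° ≈ 0.94, which falls below the 0.92 tier-1 bar only if the angles widen further. The gate therefore judges the loosest pair, not the strongest link.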

Rationale: user asked for autonomous behavior — "this needs to be intelligent,
I don't want to manually triage stuff". Matches the consolidation principle:
never discard details, but let the brain tidy up on its own for the easy cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:46:26 -04:00
028d4c3594 feat: Phase 7A — semantic memory dedup ("sleep cycle" V1)
New table memory_merge_candidates + service functions to cluster
near-duplicate active memories within (project, memory_type) buckets,
draft unified content via LLM, and merge on human approval. Source
memories become superseded (never deleted); the merged memory carries
the union of tags, the max confidence, and the summed reference_count.
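The merge rule can be sketched on an in-memory record, ignoring the database. The `Memory` fields and `merge_memory_fields` are hypothetical stand-ins, not the service's `merge_memories`. The LLM-drafted content is passed in, and the sources are returned as superseded copies rather than removed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Memory:
    id: str
    content: str
    tags: frozenset[str]
    confidence: float
    reference_count: int
    status: str = "active"


def merge_memory_fields(sources, merged_content: str, new_id: str):
    """Combine sources into one memory; return (merged, superseded_sources)."""
    if not sources:
        raise ValueError("need at least one source memory")
    merged = Memory(
        id=new_id,
        content=merged_content,  # LLM-drafted unified text
        tags=frozenset().union(*(m.tags for m in sources)),  # union of tags
        confidence=max(m.confidence for m in sources),  # max of confidence
        reference_count=sum(m.reference_count for m in sources),  # summed
    )
    # Never delete: sources are kept, only marked superseded.
    superseded = [replace(m, status="superseded") for m in sources]
    return merged, superseded
```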

- schema migration for memory_merge_candidates
- atocore.memory.similarity: cosine + transitive clustering
- atocore.memory._dedup_prompt: stdlib-only LLM prompt that preserves every specific detail
- service: merge_memories / create_merge_candidate / get_merge_candidates / reject_merge_candidate
- scripts/memory_dedup.py: host-side detector (HTTP-only, idempotent)
- 5 API endpoints under /admin/memory/merge-candidates* + /admin/memory/dedup-scan
- triage UI: purple "🔗 Merge Candidates" section + "🔗 Scan for duplicates" bar
- batch-extract.sh Step B3 (0.90 daily, 0.85 Sundays)
- deploy/dalidou/dedup-watcher.sh for UI-triggered scans
- 21 new tests (374 → 395)
- docs/PHASE-7-MEMORY-CONSOLIDATION.md covering 7A-7H roadmap
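The Step B3 schedule can be expressed as a one-line rule. This sketch is in Python rather than the shell of batch-extract.sh and assumes that 0.90/0.85 are the similarity thresholds passed to the dedup scan; the function name is hypothetical. `date.weekday()` numbers Monday as 0, so Sunday is 6.

```python
import datetime


def dedup_threshold_for(day: datetime.date, daily: float = 0.90,
                        sunday: float = 0.85) -> float:
    """Assumed similarity threshold for the scan: looser (catches more) on Sundays."""
    return sunday if day.weekday() == 6 else daily
```

A lower threshold admits more candidate clusters, so running it once a week keeps the wider sweep from flooding triage daily.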

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:30:49 -04:00