Compare commits: d456d3c86a...main (13 commits)

Commits: 83b4d78cb7, 9c91d778d9, 6e43cc7383, 877b97ec78, e840ef4be3, 56d5df0ab4, 028d4c3594, 9f262a21b0, 7863ab3825, 3ba49e92a9, 02055e8db3, cc68839306, 45196f352f
@@ -7,9 +7,9 @@

## Orientation

- **live_sha** (Dalidou `/health` build_sha): `775960c` (verified 2026-04-16 via /health, build_time 2026-04-16T17:59:30Z)
- **last_updated**: 2026-04-16 by Claude ("Make It Actually Useful" sprint — observability + Phase 10)
- **last_updated**: 2026-04-18 by Claude (Phase 7A — Memory Consolidation "sleep cycle" V1 on branch, not yet deployed)
- **main_tip**: `999788b`
- **test_count**: 303 (4 new Phase 10 tests)
- **test_count**: 395 (21 new Phase 7A dedup tests + accumulated Phase 5/6 tests since last ledger refresh)
- **harness**: `17/18 PASS` on live Dalidou (p04-constraints expects "Zerodur" — retrieval content gap, not regression)
- **vectors**: 33,253
- **active_memories**: 84 (31 project, 23 knowledge, 10 episodic, 8 adaptation, 7 preference, 5 identity)
@@ -160,6 +160,10 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha

## Session Log

- **2026-04-19 Claude** Shipped Phases 7A.1 (tiered auto-merge), 7C (tag canonicalization), 7D (confidence decay), 7I (OpenClaw context injection), UI refresh (memory/domain/activity pages + topnav), and closed the Claude Code retrieval asymmetry. Builds deployed: `028d4c3` → `56d5df0` → `e840ef4` → `877b97e` → `6e43cc7` → `9c91d77`. New capture-surface scope: Claude Code (Stop + UserPromptSubmit hooks, both installed and verified live) + OpenClaw (v0.2.0 plugin with capture + context injection, verified loaded on T420 gateway). `/wiki/capture` paste form removed from topnav; kept as labeled fallback. Anthropic API polling explicitly out of scope per user. Tests 414 → 459. `docs/capture-surfaces.md` documents the sanctioned scope.

- **2026-04-18 Claude** **Phase 7A — Memory Consolidation V1 ("sleep cycle") landed on branch.** New `docs/PHASE-7-MEMORY-CONSOLIDATION.md` covers all 8 subphases (7A dedup, 7B contradictions, 7C tag canon, 7D confidence decay, 7E memory detail, 7F domain view, 7G re-extract, 7H vector hygiene). 7A implementation: schema migration `memory_merge_candidates`, `atocore.memory.similarity` (cosine + transitive cluster), stdlib-only `atocore.memory._dedup_prompt` (LLM drafts unified content preserving all specifics), `merge_memories()` + `create_merge_candidate()` + `get_merge_candidates()` + `reject_merge_candidate()` in service.py, host-side `scripts/memory_dedup.py` (HTTP + claude -p, idempotent via sorted-id set), 5 new endpoints under `/admin/memory/merge-candidates*` + `/admin/memory/dedup-scan` + `/admin/memory/dedup-status`, purple-themed "🔗 Merge Candidates" section in /admin/triage with editable draft + approve/reject buttons, "🔗 Scan for duplicates" control bar with threshold slider, nightly Step B3 in batch-extract.sh (0.90 daily, 0.85 Sundays deep), `deploy/dalidou/dedup-watcher.sh` host watcher for UI-triggered scans (mirrors graduation-watcher pattern). 21 new tests (similarity, prompt parse, idempotency, merge happy path, override content/tags, audit rows, abort-if-source-tampered, reject leaves sources alone, schema). Tests 374 → 395. Not yet deployed; harness not re-run. Next: push + deploy, install `dedup-watcher.sh` in host cron, trigger first scan, review proposals in UI.

- **2026-04-16 Claude** `b687e7f..999788b` **"Make It Actually Useful" sprint.** Two-part session: ops fixes, then the consolidation sprint.

  **Part 1 — Ops fixes:** Deployed `b687e7f` (project inference from cwd). Fixed cron logging (was `/dev/null` — redirected to `~/atocore-logs/`). Fixed OpenClaw gateway crash-loop (`discord.replyToMode: "any"` invalid → `"all"`). Deployed `atocore-capture` plugin on T420 OpenClaw using `before_agent_start` + `llm_output` hooks — verified end-to-end: 38 `client=openclaw` interactions captured. Backfilled project tags on 179/181 unscoped interactions (165 atocore, 8 p06, 6 p04).
@@ -150,6 +150,66 @@ print(f'Pipeline summary persisted: {json.dumps(summary)}')
  log "WARN: pipeline summary persistence failed (non-blocking)"
}

# Step F2: Emerging-concepts detector (Phase 6 C.1)
log "Step F2: emerging-concepts detector"
python3 "$APP_DIR/scripts/detect_emerging.py" \
  --base-url "$ATOCORE_URL" \
  2>&1 || {
  log "WARN: emerging detector failed (non-blocking)"
}

# Step F3: Transient-to-durable extension (Phase 6 C.3)
log "Step F3: transient-to-durable extension"
curl -sSf -X POST "$ATOCORE_URL/admin/memory/extend-reinforced" \
  -H 'Content-Type: application/json' \
  2>&1 | tail -5 || {
  log "WARN: extend-reinforced failed (non-blocking)"
}

# Step F4: Confidence decay on unreferenced cold memories (Phase 7D)
# Daily: memories with reference_count=0 AND idle > 30 days → confidence × 0.97.
# Below 0.3 → auto-supersede with audit. Reversible via reinforcement.
log "Step F4: confidence decay"
curl -sSf -X POST "$ATOCORE_URL/admin/memory/decay-run" \
  -H 'Content-Type: application/json' \
  -d '{"idle_days_threshold": 30, "daily_decay_factor": 0.97, "supersede_confidence_floor": 0.30}' \
  2>&1 | tail -5 || {
  log "WARN: decay-run failed (non-blocking)"
}

# Step B3: Memory dedup scan (Phase 7A)
# Nightly at 0.90 (tight — only near-duplicates). Sundays run a deeper
# pass at 0.85 to catch semantically-similar-but-differently-worded memories.
if [[ "$(date -u +%u)" == "7" ]]; then
  DEDUP_THRESHOLD="0.85"
  DEDUP_BATCH="80"
  log "Step B3: memory dedup (Sunday deep pass, threshold $DEDUP_THRESHOLD)"
else
  DEDUP_THRESHOLD="0.90"
  DEDUP_BATCH="50"
  log "Step B3: memory dedup (daily, threshold $DEDUP_THRESHOLD)"
fi
python3 "$APP_DIR/scripts/memory_dedup.py" \
  --base-url "$ATOCORE_URL" \
  --similarity-threshold "$DEDUP_THRESHOLD" \
  --max-batch "$DEDUP_BATCH" \
  2>&1 || {
  log "WARN: memory dedup failed (non-blocking)"
}

# Step B4: Tag canonicalization (Phase 7C, weekly Sundays)
# Autonomous: LLM proposes alias→canonical maps, auto-applies confidence >= 0.8.
# Project tokens are protected (skipped on both sides). Borderline proposals
# land in /admin/tags/aliases for human review.
if [[ "$(date -u +%u)" == "7" ]]; then
  log "Step B4: tag canonicalization (Sunday)"
  python3 "$APP_DIR/scripts/canonicalize_tags.py" \
    --base-url "$ATOCORE_URL" \
    2>&1 || {
    log "WARN: tag canonicalization failed (non-blocking)"
  }
fi

# Step G: Integrity check (Phase 4 V1)
log "Step G: integrity check"
python3 "$APP_DIR/scripts/integrity_check.py" \
deploy/dalidou/dedup-watcher.sh (new file, 110 lines)
@@ -0,0 +1,110 @@
#!/usr/bin/env bash
#
# deploy/dalidou/dedup-watcher.sh
# -------------------------------
# Host-side watcher for on-demand memory dedup scans (Phase 7A).
#
# The /admin/triage page has a "🔗 Scan for duplicates" button that POSTs
# to /admin/memory/dedup-scan with {project, similarity_threshold, max_batch}.
# The container writes this to project_state (atocore/config/dedup_requested_at).
#
# This script runs on the Dalidou HOST (where claude CLI lives), polls
# for the flag, and runs memory_dedup.py when seen.
#
# Installed via cron every 2 minutes:
#   */2 * * * * /srv/storage/atocore/app/deploy/dalidou/dedup-watcher.sh \
#     >> /home/papa/atocore-logs/dedup-watcher.log 2>&1
#
# Mirrors deploy/dalidou/graduation-watcher.sh exactly.

set -euo pipefail

ATOCORE_URL="${ATOCORE_URL:-http://127.0.0.1:8100}"
APP_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
LOCK_FILE="/tmp/atocore-dedup.lock"
LOG_DIR="/home/papa/atocore-logs"
mkdir -p "$LOG_DIR"

TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
log() { printf '[%s] %s\n' "$TS" "$*"; }

# Fetch the flag via API
STATE_JSON=$(curl -sSf --max-time 5 "$ATOCORE_URL/project/state/atocore" 2>/dev/null || echo "{}")
REQUESTED=$(echo "$STATE_JSON" | python3 -c "
import sys, json
try:
    d = json.load(sys.stdin)
    for e in d.get('entries', d.get('state', [])):
        if e.get('category') == 'config' and e.get('key') == 'dedup_requested_at':
            print(e.get('value', ''))
            break
except Exception:
    pass
" 2>/dev/null || echo "")

if [[ -z "$REQUESTED" ]]; then
  exit 0
fi

PROJECT=$(echo "$REQUESTED" | python3 -c "import sys,json; print(json.loads(sys.stdin.read() or '{}').get('project',''))" 2>/dev/null || echo "")
THRESHOLD=$(echo "$REQUESTED" | python3 -c "import sys,json; print(json.loads(sys.stdin.read() or '{}').get('similarity_threshold',0.88))" 2>/dev/null || echo "0.88")
MAX_BATCH=$(echo "$REQUESTED" | python3 -c "import sys,json; print(json.loads(sys.stdin.read() or '{}').get('max_batch',50))" 2>/dev/null || echo "50")

# Acquire lock
exec 9>"$LOCK_FILE" || exit 0
if ! flock -n 9; then
  log "dedup already running, skipping"
  exit 0
fi

# Mark running
curl -sSf -X POST "$ATOCORE_URL/project/state" \
  -H 'Content-Type: application/json' \
  -d "{\"project\":\"atocore\",\"category\":\"status\",\"key\":\"dedup_running\",\"value\":\"1\",\"source\":\"dedup watcher\"}" \
  >/dev/null 2>&1 || true
curl -sSf -X POST "$ATOCORE_URL/project/state" \
  -H 'Content-Type: application/json' \
  -d "{\"project\":\"atocore\",\"category\":\"status\",\"key\":\"dedup_last_started_at\",\"value\":\"$TS\",\"source\":\"dedup watcher\"}" \
  >/dev/null 2>&1 || true

LOG_FILE="$LOG_DIR/dedup-ondemand-$(date -u +%Y%m%d-%H%M%S).log"
log "Starting dedup (project='$PROJECT' threshold=$THRESHOLD max_batch=$MAX_BATCH, log: $LOG_FILE)"

# Clear the flag BEFORE running so duplicate clicks queue at most one
curl -sSf -X DELETE "$ATOCORE_URL/project/state" \
  -H 'Content-Type: application/json' \
  -d "{\"project\":\"atocore\",\"category\":\"config\",\"key\":\"dedup_requested_at\"}" \
  >/dev/null 2>&1 || true

cd "$APP_DIR"
export PYTHONPATH="$APP_DIR/src:${PYTHONPATH:-}"
ARGS=(--base-url "$ATOCORE_URL" --similarity-threshold "$THRESHOLD" --max-batch "$MAX_BATCH")
if [[ -n "$PROJECT" ]]; then
  ARGS+=(--project "$PROJECT")
fi

if python3 scripts/memory_dedup.py "${ARGS[@]}" >> "$LOG_FILE" 2>&1; then
  RESULT=$(grep "^summary:" "$LOG_FILE" | tail -1 || tail -1 "$LOG_FILE")
  RESULT="${RESULT:-completed}"
  log "dedup finished: $RESULT"
else
  RESULT="ERROR — see $LOG_FILE"
  log "dedup FAILED"
fi

FINISH_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

curl -sSf -X POST "$ATOCORE_URL/project/state" \
  -H 'Content-Type: application/json' \
  -d "{\"project\":\"atocore\",\"category\":\"status\",\"key\":\"dedup_running\",\"value\":\"0\",\"source\":\"dedup watcher\"}" \
  >/dev/null 2>&1 || true
curl -sSf -X POST "$ATOCORE_URL/project/state" \
  -H 'Content-Type: application/json' \
  -d "{\"project\":\"atocore\",\"category\":\"status\",\"key\":\"dedup_last_finished_at\",\"value\":\"$FINISH_TS\",\"source\":\"dedup watcher\"}" \
  >/dev/null 2>&1 || true

SAFE_RESULT=$(printf '%s' "$RESULT" | python3 -c "import sys,json; print(json.dumps(sys.stdin.read())[1:-1])")
curl -sSf -X POST "$ATOCORE_URL/project/state" \
  -H 'Content-Type: application/json' \
  -d "{\"project\":\"atocore\",\"category\":\"status\",\"key\":\"dedup_last_result\",\"value\":\"$SAFE_RESULT\",\"source\":\"dedup watcher\"}" \
  >/dev/null 2>&1 || true
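The flag this watcher polls for is itself a small JSON object. A minimal sketch of the parse, with field names and defaults inferred from the three `python3 -c` one-liners in the script above (the example payload is illustrative, not a captured value):

```python
import json

# Illustrative flag value, shaped like what /admin/memory/dedup-scan stores
# under atocore/config/dedup_requested_at (assumption: fields mirror the POST body).
flag = '{"project": "atocore", "similarity_threshold": 0.9, "max_batch": 50}'

req = json.loads(flag or "{}")
project = req.get("project", "")                   # optional scope filter
threshold = req.get("similarity_threshold", 0.88)  # watcher default
max_batch = req.get("max_batch", 50)               # watcher default
```

An empty or missing field degrades to the same defaults the shell one-liners fall back to.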
deploy/dalidou/hourly-extract.sh (new file, 64 lines)
@@ -0,0 +1,64 @@
#!/usr/bin/env bash
#
# deploy/dalidou/hourly-extract.sh
# ---------------------------------
# Lightweight hourly extraction + triage so autonomous capture stays
# current (not a 24h-latency nightly-only affair).
#
# Does ONLY:
#   Step A: LLM extraction over recent interactions (last 2h window)
#   Step B: 3-tier auto-triage on the resulting candidates
#
# Skips the heavy nightly stuff (backup, rsync, OpenClaw import,
# synthesis, harness, integrity check, emerging detector). Those stay
# in cron-backup.sh at 03:00 UTC.
#
# Runs every hour via cron:
#   0 * * * * /srv/storage/atocore/app/deploy/dalidou/hourly-extract.sh \
#     >> /home/papa/atocore-logs/hourly-extract.log 2>&1
#
# Lock file prevents overlap if a previous run is still going (which
# can happen if claude CLI rate-limits and retries).

set -euo pipefail

ATOCORE_URL="${ATOCORE_URL:-http://127.0.0.1:8100}"
# 50 recent interactions is enough for an hour — typical usage is under 20/h.
LIMIT="${ATOCORE_HOURLY_EXTRACT_LIMIT:-50}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
APP_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
TIMESTAMP="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
LOCK_FILE="/tmp/atocore-hourly-extract.lock"

log() { printf '[%s] %s\n' "$TIMESTAMP" "$*"; }

# Acquire lock (non-blocking)
exec 9>"$LOCK_FILE" || exit 0
if ! flock -n 9; then
  log "hourly extract already running, skipping"
  exit 0
fi

export PYTHONPATH="$APP_DIR/src:${PYTHONPATH:-}"

log "=== hourly extract+triage starting ==="

# Step A — Extract candidates from recent interactions
log "Step A: LLM extraction (since last run)"
python3 "$APP_DIR/scripts/batch_llm_extract_live.py" \
  --base-url "$ATOCORE_URL" \
  --limit "$LIMIT" \
  2>&1 || {
  log "WARN: batch extraction failed (non-blocking)"
}

# Step B — 3-tier auto-triage (sonnet → opus → discard)
log "Step B: auto-triage (3-tier)"
python3 "$APP_DIR/scripts/auto_triage.py" \
  --base-url "$ATOCORE_URL" \
  --max-batches 3 \
  2>&1 || {
  log "WARN: auto-triage failed (non-blocking)"
}

log "=== hourly extract+triage complete ==="
deploy/hooks/inject_context.py (new file, 174 lines)
@@ -0,0 +1,174 @@
#!/usr/bin/env python3
"""Claude Code UserPromptSubmit hook: inject AtoCore context.

Mirrors the OpenClaw 7I pattern on the Claude Code side. Every user
prompt submitted to Claude Code is (a) sent to /context/build on the
AtoCore API, and (b) the returned context pack is prepended to the
prompt the LLM sees — so Claude Code answers grounded in what AtoCore
already knows, same as OpenClaw now does.

Contract per Claude Code hooks spec:
  stdin: JSON with `prompt`, `session_id`, `transcript_path`, `cwd`,
    `hook_event_name`, etc.
  stdout on success: JSON
    {"hookSpecificOutput":
      {"hookEventName": "UserPromptSubmit",
       "additionalContext": "<pack>"}}
  exit 0 always — fail open. An unreachable AtoCore must never block
  the user's prompt.

Environment variables:
  ATOCORE_URL               base URL (default http://dalidou:8100)
  ATOCORE_CONTEXT_DISABLED  set to "1" to disable injection
  ATOCORE_CONTEXT_BUDGET    max chars of injected pack (default 4000)
  ATOCORE_CONTEXT_TIMEOUT   HTTP timeout in seconds (default 5)

Usage in ~/.claude/settings.json:
  "UserPromptSubmit": [{
    "matcher": "",
    "hooks": [{
      "type": "command",
      "command": "python /path/to/inject_context.py",
      "timeout": 10
    }]
  }]
"""

from __future__ import annotations

import json
import os
import sys
import urllib.error
import urllib.request

ATOCORE_URL = os.environ.get("ATOCORE_URL", "http://dalidou:8100")
CONTEXT_TIMEOUT = float(os.environ.get("ATOCORE_CONTEXT_TIMEOUT", "5"))
CONTEXT_BUDGET = int(os.environ.get("ATOCORE_CONTEXT_BUDGET", "4000"))

# Don't spend an API call on trivial acks or slash commands.
MIN_PROMPT_LENGTH = 15


# Project inference table — kept in sync with capture_stop.py so both
# hooks agree on what project a Claude Code session belongs to.
_VAULT = "C:\\Users\\antoi\\antoine\\My Libraries\\Antoine Brain Extension"
_PROJECT_PATH_MAP: dict[str, str] = {
    f"{_VAULT}\\2-Projects\\P04-GigaBIT-M1": "p04-gigabit",
    f"{_VAULT}\\2-Projects\\P10-Interferometer": "p05-interferometer",
    f"{_VAULT}\\2-Projects\\P11-Polisher-Fullum": "p06-polisher",
    f"{_VAULT}\\2-Projects\\P08-ABB-Space-Mirror": "abb-space",
    f"{_VAULT}\\2-Projects\\I01-Atomizer": "atomizer-v2",
    f"{_VAULT}\\2-Projects\\I02-AtoCore": "atocore",
    "C:\\Users\\antoi\\ATOCore": "atocore",
    "C:\\Users\\antoi\\Polisher-Sim": "p06-polisher",
    "C:\\Users\\antoi\\Fullum-Interferometer": "p05-interferometer",
    "C:\\Users\\antoi\\Atomizer-V2": "atomizer-v2",
}


def _infer_project(cwd: str) -> str:
    if not cwd:
        return ""
    norm = os.path.normpath(cwd).lower()
    for path_prefix, project_id in _PROJECT_PATH_MAP.items():
        if norm.startswith(os.path.normpath(path_prefix).lower()):
            return project_id
    return ""


def _emit_empty() -> None:
    """Exit 0 with no additionalContext — equivalent to no-op."""
    sys.exit(0)


def _emit_context(pack: str) -> None:
    """Write the hook output JSON and exit 0."""
    out = {
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": pack,
        }
    }
    sys.stdout.write(json.dumps(out))
    sys.exit(0)


def main() -> None:
    if os.environ.get("ATOCORE_CONTEXT_DISABLED") == "1":
        _emit_empty()

    try:
        raw = sys.stdin.read()
        if not raw.strip():
            _emit_empty()
        hook_data = json.loads(raw)
    except Exception as exc:
        # Bad stdin → nothing to do
        print(f"inject_context: bad stdin: {exc}", file=sys.stderr)
        _emit_empty()

    prompt = (hook_data.get("prompt") or "").strip()
    cwd = hook_data.get("cwd", "")

    if len(prompt) < MIN_PROMPT_LENGTH:
        _emit_empty()

    # Skip meta / system prompts that start with '<' (XML tags etc.)
    if prompt.startswith("<"):
        _emit_empty()

    project = _infer_project(cwd)

    body = json.dumps({
        "prompt": prompt,
        "project": project,
        "char_budget": CONTEXT_BUDGET,
    }).encode("utf-8")

    req = urllib.request.Request(
        f"{ATOCORE_URL}/context/build",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

    try:
        resp = urllib.request.urlopen(req, timeout=CONTEXT_TIMEOUT)
        data = json.loads(resp.read().decode("utf-8"))
    except urllib.error.URLError as exc:
        # AtoCore unreachable — fail open
        print(f"inject_context: atocore unreachable: {exc}", file=sys.stderr)
        _emit_empty()
    except Exception as exc:
        print(f"inject_context: request failed: {exc}", file=sys.stderr)
        _emit_empty()

    pack = (data.get("formatted_context") or "").strip()
    if not pack:
        _emit_empty()

    # Safety truncate. /context/build respects the budget we sent, but
    # be defensive in case of a regression.
    if len(pack) > CONTEXT_BUDGET + 500:
        pack = pack[:CONTEXT_BUDGET] + "\n\n[context truncated]"

    # Wrap so the LLM knows this is injected grounding, not user text.
    wrapped = (
        "---\n"
        "AtoCore-injected context for this prompt "
        f"(project={project or '(none)'}):\n\n"
        f"{pack}\n"
        "---"
    )

    print(
        f"inject_context: injected {len(pack)} chars "
        f"(project={project or 'none'}, prompt_chars={len(prompt)})",
        file=sys.stderr,
    )
    _emit_context(wrapped)


if __name__ == "__main__":
    main()
docs/PHASE-7-MEMORY-CONSOLIDATION.md (new file, 96 lines)
@@ -0,0 +1,96 @@
# Phase 7 — Memory Consolidation (the "Sleep Cycle")

**Status**: 7A in progress · 7B–H scoped, deferred
**Design principle**: *"Like human memory while sleeping, but more robotic — never discard relevant details. Consolidate, update, supersede — don't delete."*

## Why

Phases 1–6 built capture + triage + graduation + emerging-project detection. What they don't solve:

| # | Problem | Fix |
|---|---|---|
| 1 | Redundancy — "APM uses NX" said 5 different ways across 5 memories | **7A** Semantic dedup |
| 2 | Latent contradictions — "chose Zygo" + "switched from Zygo" both active | **7B** Pair contradiction detection |
| 3 | Tag drift — `firmware`, `fw`, `firmware-control` fragment retrieval | **7C** Tag canonicalization |
| 4 | Confidence staleness — 6-month unreferenced memory ranks as fresh | **7D** Confidence decay |
| 5 | No memory drill-down page | **7E** `/wiki/memories/{id}` |
| 6 | Domain knowledge siloed per project | **7F** `/wiki/domains/{tag}` |
| 7 | Prompt upgrades (llm-0.5 → 0.6) don't re-process old interactions | **7G** Re-extraction on version bump |
| 8 | Superseded memory vectors still in Chroma polluting retrieval | **7H** Vector hygiene |

Collectively: the brain needs a nightly pass that looks at what it already knows and tidies up — dedup, resolve contradictions, canonicalize tags, decay stale facts — **without losing information**.

## Subphases

### 7A — Semantic dedup + consolidation *(this sprint)*

Compute embeddings on active memories, find pairs within the `(project, memory_type)` bucket above the similarity threshold (default 0.88), cluster, draft a unified memory via LLM, and have a human approve it in the triage UI. On approve: sources become `superseded`, and a new merged memory is created with the union of `source_refs`, the sum of `reference_count`, and the max of `confidence`. **Ships first** because redundancy compounds — every new memory potentially duplicates an old one.
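The approve-time field combination described above can be sketched in a few lines (function and field names are hypothetical; only the union/sum/max semantics come from the text):

```python
def combine_merged_fields(sources):
    """Fold source memories into the merged memory's fields:
    union of source_refs, sum of reference_count, max of confidence."""
    refs = set()
    ref_count = 0
    confidence = 0.0
    for mem in sources:
        refs |= set(mem["source_refs"])
        ref_count += mem["reference_count"]
        confidence = max(confidence, mem["confidence"])
    return {
        "source_refs": sorted(refs),
        "reference_count": ref_count,
        "confidence": confidence,
    }

merged = combine_merged_fields([
    {"source_refs": ["i1", "i2"], "reference_count": 3, "confidence": 0.8},
    {"source_refs": ["i2", "i3"], "reference_count": 1, "confidence": 0.6},
])
# merged: refs i1+i2+i3, reference_count 4, confidence 0.8
```

Taking the union rather than concatenating keeps provenance deduplicated when two sources cite the same interaction.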
The detailed spec lives in the working plan (`dapper-cooking-tower.md`) and across the files listed under "Files touched" below. Key decisions:

- LLM drafts, human approves — no silent auto-merge.
- Same `(project, memory_type)` bucket only. Cross-project merges are rare + risky → separate flow in 7B.
- Recompute embeddings each scan (~2s / 335 memories). Persist only if scan time becomes a problem.
- Cluster-based proposals (A~B~C → one merge), not pair-based.
- `status=superseded` never deleted — still queryable with filter.

**Schema**: new table `memory_merge_candidates` (pending | approved | rejected).
**Cron**: nightly at threshold 0.90 (tight); weekly (Sundays) at 0.85 (deeper cleanup).
**UI**: new "🔗 Merge Candidates" section in `/admin/triage`.

**Files touched in 7A**:

- `src/atocore/models/database.py` — migration
- `src/atocore/memory/similarity.py` — new, `compute_memory_similarity()`
- `src/atocore/memory/_dedup_prompt.py` — new, shared LLM prompt
- `src/atocore/memory/service.py` — `merge_memories()`
- `scripts/memory_dedup.py` — new, host-side detector (HTTP-only)
- `src/atocore/api/routes.py` — 5 new endpoints under `/admin/memory/`
- `src/atocore/engineering/triage_ui.py` — merge cards section
- `deploy/dalidou/batch-extract.sh` — Step B3
- `deploy/dalidou/dedup-watcher.sh` — new, UI-triggered scans
- `tests/test_memory_dedup.py` — ~10-15 new tests

### 7B — Memory-to-memory contradiction detection

Same embedding-pair machinery as 7A but within a *different* band (similarity 0.70–0.88 — semantically related but different wording). The LLM classifies each pair: `duplicate | complementary | contradicts | supersedes-older`. Contradictions write a `memory_conflicts` row + surface a triage badge. Clear supersessions (both tier 1 sonnet and tier 2 opus agree) auto-mark the older memory as `superseded`.
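The 7A/7B split is just a band on the same similarity score; a one-function sketch (bounds from the text, boundary handling assumed):

```python
def similarity_route(sim: float, dedup_threshold: float = 0.88, band_floor: float = 0.70) -> str:
    """Route a memory pair by cosine similarity: 7A dedup at or above the
    threshold, 7B contradiction screening in the band below it."""
    if sim >= dedup_threshold:
        return "7A-dedup"
    if sim >= band_floor:
        return "7B-classify"
    return "ignore"
```

Whether 0.88 itself belongs to 7A or 7B is not stated in the text; the sketch assigns it to 7A.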
### 7C — Tag canonicalization

Weekly LLM pass over the `domain_tags` distribution; proposes an `alias → canonical` map (e.g. `fw → firmware`). Human approves via UI (one-click pattern, same as emerging-project registration). Bulk-rewrites `domain_tags` atomically across all memories.

### 7D — Confidence decay

Daily lightweight job. For memories with `reference_count=0` AND `last_referenced_at` older than 30 days: multiply confidence by 0.97/day (~2-month half-life). Reinforcement already bumps confidence. Below 0.3 → auto-supersede with reason `decayed, no references`. Reversible (tune the half-life), non-destructive (still searchable with the status filter).
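As a sanity check on the numbers above (thresholds from the text; no scheduling logic): at 0.97 per day, confidence falls from 1.0 past the 0.3 supersede floor in about 40 decay days, so with the 30-day idle gate a never-referenced memory survives roughly 70 days in total.

```python
import math

DAILY_FACTOR = 0.97   # multiplicative decay per idle day
FLOOR = 0.30          # auto-supersede below this confidence
IDLE_GATE_DAYS = 30   # decay only starts after this much idleness

# Days of decay needed to drop from confidence 1.0 below the floor.
decay_days = math.ceil(math.log(FLOOR) / math.log(DAILY_FACTOR))

# Total lifetime of a never-referenced memory before auto-supersession.
lifetime_days = IDLE_GATE_DAYS + decay_days
```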
### 7E — Memory detail page `/wiki/memories/{id}`

Provenance chain: source_chunk → interaction → graduated_to_entity. Audit trail (Phase 4 has the data). Related memories (same project + tag + semantic neighbors). Decay trajectory plot (if 7D ships). Link target from every memory surfaced anywhere in the wiki.

### 7F — Cross-project domain view `/wiki/domains/{tag}`

One page per `domain_tag` showing all memories + graduated entities with that tag, grouped by project. "Optics across p04+p05+p06" becomes a real navigable page. Answers the long-standing question the tag system was meant to enable.

### 7G — Re-extraction on prompt upgrade

`batch_llm_extract_live.py --force-reextract --since DATE`. Dedupe key: `(interaction_id, extractor_version)` — same run on same interaction doesn't double-create. Triggered manually when `LLM_EXTRACTOR_VERSION` bumps. Not automatic (destructive).
### 7H — Vector store hygiene

Nightly: scan `source_chunks` and `memory_embeddings` (added in 7A V2) for `status=superseded|invalid`. Delete matching vectors from Chroma. Fail-open — the retrieval harness catches any real regression.

## Verification & ship order

1. **7A** — ship + observe 1 week → validate merge proposals are high-signal, rejection rate acceptable
2. **7D** — decay is low-risk + high-compounding value; ship second
3. **7C** — clean up tag fragmentation before 7F, which depends on canonical tags
4. **7E** + **7F** — UX surfaces; ship together once the data is clean
5. **7B** — contradictions flow (pairs are harder to classify than duplicates; wait for 7A data to tune the threshold)
6. **7G** — on-demand; no ship until we actually bump the extractor prompt
7. **7H** — housekeeping; after 7A + 7B + 7D have generated enough `superseded` rows to matter

## Scope NOT in Phase 7

- Graduated memories (entity-descended) are **frozen** — exempt from dedup/decay. Entity consolidation is a separate phase (8+).
- Auto-merging without human approval (always human-in-the-loop in V1).
- Summarization / compression — a different problem (reducing the number of chunks per memory, not the number of memories).
- Forgetting policies — there's no user-facing "delete this" flow in Phase 7. Supersede + filter covers the need.
docs/capture-surfaces.md (new file, 45 lines)
@@ -0,0 +1,45 @@
# AtoCore — sanctioned capture surfaces

**Scope statement**: AtoCore captures conversations from **two surfaces only**. Everything else is intentionally out of scope.

| Surface | Hooks | Status |
|---|---|---|
| **Claude Code** (local CLI) | `Stop` (capture) + `UserPromptSubmit` (context injection) | both installed |
| **OpenClaw** (agent framework on T420) | `before_agent_start` (context injection) + `llm_output` (capture) | both installed (v0.2.0 plugin, Phase 7I) |

Both surfaces are **symmetric** — push (capture) and pull (context injection on prompt submit) — so AtoCore learns from every turn AND every turn is grounded in what AtoCore already knows.

## Why these two?

- **Stable hook APIs.** Claude Code exposes `Stop` and `UserPromptSubmit` lifecycle hooks with documented JSON contracts. OpenClaw exposes `before_agent_start` and `llm_output`. Both run locally, where we control the process.
- **Passive from the user's perspective.** No paste, no manual capture command, no "remember this" prompt. You just use the tool and AtoCore absorbs everything durable.
- **Failure is graceful.** If AtoCore is down, hooks exit 0 with no output — the user's turn proceeds uninterrupted.

## Why not Claude Desktop / Claude.ai web / Claude mobile / ChatGPT / …?

- Claude Desktop has MCP but no `Stop`-equivalent hook for auto-capture; auto-capture would require system-prompt coercion ("call atocore_remember every turn"), which is fragile.
- Claude.ai web has no hook surface — it would need a browser extension (a real project, not shipped).
- The Claude mobile app has neither hooks nor MCP — nothing to wire into.
- ChatGPT etc. — same as above.

**Anthropic API log polling is explicitly prohibited.**

If you find yourself wanting to capture from one of these, the real answer is: use Claude Code or OpenClaw for the work that matters. Don't paste chat transcripts into AtoCore — that contradicts the whole design principle of passive capture.

A `/wiki/capture` fallback form still exists (the endpoint `/interactions` is public) but it is **not promoted in the UI** and is documented as a last-resort escape hatch. If you're reaching for it, something is wrong with your workflow, not with AtoCore.

## Hook files

- `deploy/hooks/capture_stop.py` — Claude Code Stop → POSTs `/interactions`
- `deploy/hooks/inject_context.py` — Claude Code UserPromptSubmit → POSTs `/context/build`, returns pack via `hookSpecificOutput.additionalContext`
- `openclaw-plugins/atocore-capture/index.js` — OpenClaw plugin v0.2.0: capture + context injection

Both Claude Code hooks share a `_infer_project` table mapping cwd to project slug. Keep them in sync when adding a new project path.

## Kill switches

- `ATOCORE_CAPTURE_DISABLED=1` → skip Stop capture
- `ATOCORE_CONTEXT_DISABLED=1` → skip UserPromptSubmit injection
- OpenClaw plugin config `injectContext: false` → skip context injection (capture still fires)

All three are documented in the respective hook/plugin files.
@@ -1,275 +1,49 @@
# AtoCore — Current State (2026-04-19)

## Status Summary

Live deploy: `877b97e` · Dalidou health: ok · Harness: 17/18.
## The numbers
| | count |
|---|---|
| Active memories | 266 (180 project, 31 preference, 24 knowledge, 17 adaptation, 11 episodic, 3 identity) |
| Candidates pending | **0** (autonomous triage drained the queue) |
| Interactions captured | 605 (250 claude-code, 351 openclaw) |
| Entities (typed graph) | 50 |
| Vectors in Chroma | 33K+ |
| Projects | 6 registered (p04, p05, p06, abb-space, atomizer-v2, atocore) + apm emerging (2 memories, below auto-register threshold) |
| Unique domain tags | 210 |
| Tests | 440 passing |
## Autonomous pipeline — what runs without me
| When | Job | Does |
|---|---|---|
| every hour | `hourly-extract.sh` | Pulls new interactions → LLM extraction → 3-tier auto-triage (sonnet → opus → discard/human). 0 pending candidates right now = autonomy is working. |
| every 2 min | `dedup-watcher.sh` | Services UI-triggered dedup scans |
| daily 03:00 UTC | Full nightly (`batch-extract.sh`) | Extract · triage · auto-promote reinforced · synthesis · harness · dedup (0.90) · emerging detector · transient→durable · **confidence decay (7D)** · integrity check · alerts |
| Sundays | +Weekly deep pass | Knowledge-base lint · dedup @ 0.85 · **tag canonicalization (7C)** |
Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs human**. That's the brain self-organizing.
## Phase 7 — Memory Consolidation status

| Subphase | What | Status |
|---|---|---|
| 7A | Semantic dedup + merge lifecycle | live |
| 7A.1 | Tiered auto-approve (sonnet ≥0.8 + sim ≥0.92 → merge; opus escalation; human only for ambiguous) | live |
| 7B | Memory-to-memory contradiction detection (0.70–0.88 band, classify duplicate/contradicts/supersedes) | deferred, needs 7A signal |
| 7C | Tag canonicalization (weekly; auto-apply ≥0.8 confidence; protects project tokens) | live (first run: 0 proposals — vocabulary is clean) |
| 7D | Confidence decay (0.97/day on idle unreferenced; auto-supersede below 0.3) | live (first run: 0 decayed — nothing idle+unreferenced yet) |
| 7E | `/wiki/memories/{id}` detail page | pending |
| 7F | `/wiki/domains/{tag}` cross-project view | pending (wants 7C + more usage first) |
| 7G | Re-extraction on prompt version bump | pending |
| 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |
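The 7D parameters (multiply by 0.97 per idle day, auto-supersede below 0.3) pin down exactly how long an unreferenced memory survives; a quick arithmetic check:

```python
import math

def days_to_supersede(confidence: float, rate: float = 0.97, floor: float = 0.3) -> int:
    """Days of idle decay at `rate` per day before `confidence` drops below `floor`."""
    if confidence <= floor:
        return 0
    return math.ceil(math.log(floor / confidence) / math.log(rate))

# e.g. a 0.6-confidence memory crosses 0.3 after 23 idle days; 0.8 after 33.
```

So the decay is gentle by design: anything referenced even once a month never decays out.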
## Known gaps (honest)

1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
2. **OpenClaw is capture-only, not context-grounded.** The plugin POSTs `/interactions` on `llm_output` but does NOT call `/context/build` on `before_agent_start`. OpenClaw's underlying agent runs blind. See Q2 below.
3. **Human interface (wiki) is thin and static.** 5 project cards + a "System" line. No dashboard for the autonomous activity. No per-memory detail page. See Q3/Q5.
4. **Harness 17/18** — the `p04-constraints` fixture wants "Zerodur" but retrieval surfaces related-not-exact terms. Content gap, not a retrieval regression.
5. **Two projects under-populated**: p05-interferometer (4 memories, 18 state) and atomizer-v2 (1 memory, 6 state). Batch re-extract with the new llm-0.6.0 prompt would help.
@@ -1,29 +1,40 @@
# AtoCore Capture + Context Plugin for OpenClaw

Two-way bridge between OpenClaw agents and AtoCore:

**Capture (since v1)**

- watches user-triggered assistant turns
- POSTs `prompt` + `response` to `POST /interactions`
- sets `client="openclaw"`, `reinforce=true`
- fails open on network or API errors

**Context injection (Phase 7I, v2+)**

- on `before_agent_start`, fetches a context pack from `POST /context/build`
- prepends the pack to the agent's prompt so whatever LLM runs underneath (sonnet, opus, codex, local model — whichever OpenClaw delegates to) answers grounded in what AtoCore already knows
- the original user prompt is still what gets captured later (no recursion)
- fails open: context unreachable → agent runs as before

## Config

Optional plugin config:

```json
{
  "baseUrl": "http://dalidou:8100",
  "minPromptLength": 15,
  "maxResponseLength": 50000,
  "injectContext": true,
  "contextCharBudget": 4000
}
```

- `baseUrl` — defaults to `ATOCORE_BASE_URL` env or `http://dalidou:8100`
- `injectContext` — set to `false` to disable the Phase 7I context injection and make this a pure one-way capture plugin again
- `contextCharBudget` — cap on injected context size. `/context/build` respects it too; this is a client-side safety net. Default 4000 chars (~1000 tokens).

## Notes

- Project detection is intentionally left empty — AtoCore's extraction pipeline handles unscoped interactions and infers the project from content.
- Extraction is **not** part of this plugin. Interactions are captured; batch extraction runs via cron on the AtoCore host.
- Context injection only fires for user-triggered turns (not heartbeats or system-only runs).
- Timeouts: context fetch is 5s (short so a slow AtoCore never blocks a user turn); capture post is 10s.
@@ -98,6 +98,14 @@ export default definePluginEntry({
      lastPrompt = null;
      return;
    }
    // Filter cron-initiated agent runs. OpenClaw's scheduled tasks fire
    // agent sessions with prompts that begin "[cron:<id> ...]". These are
    // automated polls (DXF email watcher, calendar reminders, etc.), not
    // real user turns — they're pure noise in the AtoCore capture stream.
    if (prompt.startsWith("[cron:")) {
      lastPrompt = null;
      return;
    }
    lastPrompt = { text: prompt, sessionKey: ctx?.sessionKey || "", ts: Date.now() };
    log.info("atocore-capture:prompt_buffered", { len: prompt.length });
  });
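The cron filter above is a simple prefix check; the same predicate, sketched in Python for clarity (the `[cron:` prefix is the contract the comment states):

```python
def is_user_turn(prompt: str) -> bool:
    # OpenClaw scheduled tasks prefix their prompts with "[cron:<id> ...]";
    # everything else counts as a real user turn worth capturing.
    return not prompt.startswith("[cron:")
```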
@@ -3,6 +3,11 @@ import { definePluginEntry } from "openclaw/plugin-sdk/core";
const DEFAULT_BASE_URL = process.env.ATOCORE_BASE_URL || "http://dalidou:8100";
const DEFAULT_MIN_PROMPT_LENGTH = 15;
const DEFAULT_MAX_RESPONSE_LENGTH = 50_000;
// Phase 7I — context injection: cap how much AtoCore context we stuff
// back into the prompt. The /context/build endpoint respects a budget
// parameter too, but we keep a client-side safety net.
const DEFAULT_CONTEXT_CHAR_BUDGET = 4_000;
const DEFAULT_INJECT_CONTEXT = true;

function trimText(value) {
  return typeof value === "string" ? value.trim() : "";
@@ -41,6 +46,37 @@ async function postInteraction(baseUrl, payload, logger) {
  }
}

// Phase 7I — fetch a context pack for the incoming prompt so the agent
// answers grounded in what AtoCore already knows. Fail-open: if the
// request times out or errors, we just don't inject; the agent runs as
// before. Never block the user's turn on AtoCore availability.
async function fetchContextPack(baseUrl, prompt, project, charBudget, logger) {
  try {
    const res = await fetch(`${baseUrl.replace(/\/$/, "")}/context/build`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        prompt,
        project: project || "",
        char_budget: charBudget
      }),
      signal: AbortSignal.timeout(5_000)
    });
    if (!res.ok) {
      logger?.debug?.("atocore_context_fetch_failed", { status: res.status });
      return null;
    }
    const data = await res.json();
    const pack = trimText(data?.formatted_context || "");
    return pack || null;
  } catch (error) {
    logger?.debug?.("atocore_context_fetch_error", {
      error: error instanceof Error ? error.message : String(error)
    });
    return null;
  }
}

export default definePluginEntry({
  register(api) {
    const logger = api.logger;
@@ -55,6 +91,28 @@ export default definePluginEntry({
        pendingBySession.delete(ctx.sessionId);
        return;
      }

      // Phase 7I — inject AtoCore context into the agent's prompt so it
      // answers grounded in what the brain already knows. Config-gated
      // (injectContext: false disables). Fail-open.
      const baseUrl = trimText(config.baseUrl) || DEFAULT_BASE_URL;
      const injectContext = config.injectContext !== false && DEFAULT_INJECT_CONTEXT;
      const charBudget = Number(config.contextCharBudget || DEFAULT_CONTEXT_CHAR_BUDGET);
      if (injectContext && event && typeof event === "object") {
        const pack = await fetchContextPack(baseUrl, prompt, "", charBudget, logger);
        if (pack) {
          // Prepend to the event's prompt so the agent sees grounded info
          // before the user's question. OpenClaw's agent receives
          // event.prompt as its primary input; modifying it here grounds
          // whatever LLM the agent delegates to (sonnet, opus, codex,
          // local model — doesn't matter).
          event.prompt = `${pack}\n\n---\n\n${prompt}`;
          logger?.debug?.("atocore_context_injected", { chars: pack.length });
        }
      }

      // Record the ORIGINAL user prompt (not the injected version) so
      // captured interactions stay clean for later extraction.
      pendingBySession.set(ctx.sessionId, {
        prompt,
        sessionId: ctx.sessionId,
@@ -1,7 +1,7 @@
{
  "name": "@atomaste/atocore-openclaw-capture",
  "private": true,
  "version": "0.2.0",
  "type": "module",
  "description": "OpenClaw plugin: captures assistant turns to AtoCore interactions AND injects AtoCore context into agent prompts before they run (Phase 7I two-way bridge)"
}
@@ -37,6 +37,17 @@ import urllib.error
import urllib.parse
import urllib.request

# Force UTF-8 on stdio — MCP protocol expects UTF-8 but Windows Python
# defaults stdout to cp1252, which crashes on any non-ASCII char (emojis,
# ≥, →, etc.) in tool responses. This call is a no-op on Linux/macOS
# where UTF-8 is already the default.
try:
    sys.stdin.reconfigure(encoding="utf-8")
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stderr.reconfigure(encoding="utf-8")
except Exception:
    pass

# --- Configuration ---

ATOCORE_URL = os.environ.get("ATOCORE_URL", "http://dalidou:8100").rstrip("/")
@@ -232,6 +243,72 @@ def _tool_projects(args: dict) -> str:
    return "\n".join(lines)


def _tool_remember(args: dict) -> str:
    """Phase 6 Part B — universal capture from any Claude session.

    Wraps POST /memory to create a candidate memory tagged with
    source='mcp-remember'. The existing 3-tier triage is the quality
    gate: nothing becomes active until sonnet (+ opus if borderline)
    approves it. Returns the memory id so the caller can reference it
    in the same session.
    """
    content = (args.get("content") or "").strip()
    if not content:
        return "Error: 'content' is required."

    memory_type = (args.get("memory_type") or "knowledge").strip()
    valid_types = ["identity", "preference", "project", "episodic", "knowledge", "adaptation"]
    if memory_type not in valid_types:
        return f"Error: memory_type must be one of {valid_types}."

    project = (args.get("project") or "").strip()
    try:
        confidence = float(args.get("confidence") or 0.6)
    except (TypeError, ValueError):
        confidence = 0.6
    confidence = max(0.0, min(1.0, confidence))

    valid_until = (args.get("valid_until") or "").strip()
    tags = args.get("domain_tags") or []
    if not isinstance(tags, list):
        tags = []
    # Normalize tags: lowercase, dedupe, cap at 10
    clean_tags: list[str] = []
    for t in tags[:10]:
        if not isinstance(t, str):
            continue
        t = t.strip().lower()
        if t and t not in clean_tags:
            clean_tags.append(t)

    payload = {
        "memory_type": memory_type,
        "content": content,
        "project": project,
        "confidence": confidence,
        "status": "candidate",
    }
    if valid_until:
        payload["valid_until"] = valid_until
    if clean_tags:
        payload["domain_tags"] = clean_tags

    result, err = safe_call(http_post, "/memory", payload)
    if err:
        return f"AtoCore remember failed: {err}"

    mid = result.get("id", "?")
    scope = project if project else "(global)"
    tag_str = f" tags=[{', '.join(clean_tags)}]" if clean_tags else ""
    expires = f" valid_until={valid_until}" if valid_until else ""
    return (
        f"Remembered as candidate: id={mid}\n"
        f" type={memory_type} project={scope} confidence={confidence:.2f}{tag_str}{expires}\n"
        f"Will flow through the standard triage pipeline within 24h "
        f"(or on next auto-process button click at /admin/triage)."
    )


def _tool_health(args: dict) -> str:
    """Check AtoCore service health."""
    result, err = safe_call(http_get, "/health")
@@ -516,6 +593,58 @@ TOOLS = [
        },
        "handler": _tool_memory_create,
    },
    {
        "name": "atocore_remember",
        "description": (
            "Save a durable fact to AtoCore's memory layer from any conversation. "
            "Use when the user says 'remember this', 'save that for later', "
            "'don't lose this fact', or when you identify a decision/insight/"
            "preference worth persisting across future sessions. The fact "
            "goes through quality review before being consulted in future "
            "context packs (so durable facts get kept, noise gets rejected). "
            "Call multiple times if one conversation has multiple distinct "
            "facts worth remembering — one tool call per atomic fact. "
            "Prefer 'knowledge' type for cross-project engineering insights, "
            "'project' for facts specific to one project, 'preference' for "
            "user work-style notes, 'adaptation' for standing behavioral rules."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {
                "content": {
                    "type": "string",
                    "description": "The atomic fact to remember. Under 250 chars. Should stand alone without session context.",
                },
                "memory_type": {
                    "type": "string",
                    "enum": ["identity", "preference", "project", "episodic", "knowledge", "adaptation"],
                    "default": "knowledge",
                },
                "project": {
                    "type": "string",
                    "description": "Project id if scoped. Empty for cross-project. Unregistered names flagged by triage as 'emerging project' proposals.",
                },
                "confidence": {
                    "type": "number",
                    "minimum": 0,
                    "maximum": 1,
                    "default": 0.6,
                    "description": "0.5-0.7 typical. 0.8+ only for ratified/committed claims.",
                },
                "valid_until": {
                    "type": "string",
                    "description": "ISO date YYYY-MM-DD if time-bounded (e.g. current state, scheduled event, quote expiry). Empty for permanent facts.",
                },
                "domain_tags": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Lowercase topical tags (optics, thermal, firmware, procurement, etc.) for cross-project retrieval. 2-5 tags typical.",
                },
            },
            "required": ["content"],
        },
        "handler": _tool_remember,
    },
    {
        "name": "atocore_project_state",
        "description": (
scripts/canonicalize_tags.py — new file, 254 lines
@@ -0,0 +1,254 @@
#!/usr/bin/env python3
"""Phase 7C — tag canonicalization detector.

Weekly (or on-demand) LLM pass that:
1. Fetches the tag distribution across all active memories via HTTP
2. Asks claude-p to propose alias→canonical mappings
3. AUTO-APPLIES aliases with confidence >= AUTO_APPROVE_CONF (0.8)
4. Submits lower-confidence proposals as pending for human review

Autonomous by default — matches the Phase 7A.1 pattern. Set
--no-auto-approve to force every proposal into human review.

Host-side because claude CLI lives on Dalidou, not the container.
Reuses the PYTHONPATH=src pattern from scripts/memory_dedup.py.

Usage:
    python3 scripts/canonicalize_tags.py [--base-url URL] [--dry-run] [--no-auto-approve]
"""

from __future__ import annotations

import argparse
import json
import os
import shutil
import subprocess
import sys
import tempfile
import time
import urllib.error
import urllib.request

_SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
_SRC_DIR = os.path.abspath(os.path.join(_SCRIPT_DIR, "..", "src"))
if _SRC_DIR not in sys.path:
    sys.path.insert(0, _SRC_DIR)

from atocore.memory._tag_canon_prompt import (  # noqa: E402
    PROTECTED_PROJECT_TOKENS,
    SYSTEM_PROMPT,
    TAG_CANON_PROMPT_VERSION,
    build_user_message,
    normalize_alias_item,
    parse_canon_output,
)

DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://127.0.0.1:8100")
DEFAULT_MODEL = os.environ.get("ATOCORE_TAG_CANON_MODEL", "sonnet")
DEFAULT_TIMEOUT_S = float(os.environ.get("ATOCORE_TAG_CANON_TIMEOUT_S", "90"))

AUTO_APPROVE_CONF = float(os.environ.get("ATOCORE_TAG_CANON_AUTO_APPROVE_CONF", "0.8"))
MIN_ALIAS_COUNT = int(os.environ.get("ATOCORE_TAG_CANON_MIN_ALIAS_COUNT", "1"))

_sandbox_cwd = None


def get_sandbox_cwd() -> str:
    global _sandbox_cwd
    if _sandbox_cwd is None:
        _sandbox_cwd = tempfile.mkdtemp(prefix="ato-tagcanon-")
    return _sandbox_cwd


def api_get(base_url: str, path: str) -> dict:
    req = urllib.request.Request(f"{base_url}{path}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def api_post(base_url: str, path: str, body: dict | None = None) -> dict:
    data = json.dumps(body or {}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}{path}", method="POST",
        headers={"Content-Type": "application/json"}, data=data,
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def call_claude(user_message: str, model: str, timeout_s: float) -> tuple[str | None, str | None]:
    if not shutil.which("claude"):
        return None, "claude CLI not available"
    args = [
        "claude", "-p",
        "--model", model,
        "--append-system-prompt", SYSTEM_PROMPT,
        "--disable-slash-commands",
        user_message,
    ]
    last_error = ""
    for attempt in range(3):
        if attempt > 0:
            time.sleep(2 ** attempt)
        try:
            completed = subprocess.run(
                args, capture_output=True, text=True,
                timeout=timeout_s, cwd=get_sandbox_cwd(),
                encoding="utf-8", errors="replace",
            )
        except subprocess.TimeoutExpired:
            last_error = f"{model} timed out"
            continue
        except Exception as exc:
            last_error = f"subprocess error: {exc}"
            continue
        if completed.returncode == 0:
            return (completed.stdout or "").strip(), None
        stderr = (completed.stderr or "").strip()[:200]
        last_error = f"{model} exit {completed.returncode}: {stderr}"
    return None, last_error


def fetch_tag_distribution(base_url: str) -> dict[str, int]:
    """Count tag occurrences across active memories (client-side)."""
    try:
        result = api_get(base_url, "/memory?active_only=true&limit=2000")
    except Exception as e:
        print(f"ERROR: could not fetch memories: {e}", file=sys.stderr)
        return {}
    mems = result.get("memories", [])
    counts: dict[str, int] = {}
    for m in mems:
        tags = m.get("domain_tags") or []
        if isinstance(tags, str):
            try:
                tags = json.loads(tags)
            except Exception:
                tags = []
        if not isinstance(tags, list):
            continue
        for t in tags:
            if not isinstance(t, str):
                continue
            key = t.strip().lower()
            if key:
                counts[key] = counts.get(key, 0) + 1
    return counts


def main() -> None:
    parser = argparse.ArgumentParser(description="Phase 7C tag canonicalization detector")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
    parser.add_argument("--model", default=DEFAULT_MODEL)
|
||||
parser.add_argument("--timeout-s", type=float, default=DEFAULT_TIMEOUT_S)
|
||||
parser.add_argument("--no-auto-approve", action="store_true",
|
||||
help="Disable autonomous apply; all proposals → human queue")
|
||||
parser.add_argument("--dry-run", action="store_true",
|
||||
help="Print decisions without touching state")
|
||||
args = parser.parse_args()
|
||||
|
||||
base = args.base_url.rstrip("/")
|
||||
autonomous = not args.no_auto_approve
|
||||
|
||||
print(
|
||||
f"canonicalize_tags {TAG_CANON_PROMPT_VERSION} | model={args.model} | "
|
||||
f"autonomous={autonomous} | auto-approve conf>={AUTO_APPROVE_CONF}"
|
||||
)
|
||||
|
||||
dist = fetch_tag_distribution(base)
|
||||
print(f"tag distribution: {len(dist)} unique tags, "
|
||||
f"{sum(dist.values())} total references")
|
||||
if not dist:
|
||||
print("no tags found — nothing to canonicalize")
|
||||
return
|
||||
|
||||
user_msg = build_user_message(dist)
|
||||
raw, err = call_claude(user_msg, args.model, args.timeout_s)
|
||||
if err or raw is None:
|
||||
print(f"ERROR: LLM call failed: {err}", file=sys.stderr)
|
||||
return
|
||||
|
||||
aliases_raw = parse_canon_output(raw)
|
||||
print(f"LLM returned {len(aliases_raw)} raw alias proposals")
|
||||
|
||||
auto_applied = 0
|
||||
auto_skipped_missing_canonical = 0
|
||||
proposals_created = 0
|
||||
duplicates_skipped = 0
|
||||
|
||||
for item in aliases_raw:
|
||||
norm = normalize_alias_item(item)
|
||||
if norm is None:
|
||||
continue
|
||||
alias = norm["alias"]
|
||||
canonical = norm["canonical"]
|
||||
confidence = norm["confidence"]
|
||||
|
||||
alias_count = dist.get(alias, 0)
|
||||
canonical_count = dist.get(canonical, 0)
|
||||
|
||||
# Sanity: alias must actually exist in the current distribution
|
||||
if alias_count < MIN_ALIAS_COUNT:
|
||||
print(f" SKIP {alias!r} → {canonical!r}: alias not in distribution")
|
||||
continue
|
||||
if canonical_count == 0:
|
||||
auto_skipped_missing_canonical += 1
|
||||
print(f" SKIP {alias!r} → {canonical!r}: canonical missing from distribution")
|
||||
continue
|
||||
|
||||
label = f"{alias!r} ({alias_count}) → {canonical!r} ({canonical_count}) conf={confidence:.2f}"
|
||||
|
||||
auto_apply = autonomous and confidence >= AUTO_APPROVE_CONF
|
||||
if auto_apply:
|
||||
if args.dry_run:
|
||||
auto_applied += 1
|
||||
print(f" [dry-run] would auto-apply: {label}")
|
||||
continue
|
||||
try:
|
||||
result = api_post(base, "/admin/tags/aliases/apply", {
|
||||
"alias": alias, "canonical": canonical,
|
||||
"confidence": confidence, "reason": norm["reason"],
|
||||
"alias_count": alias_count, "canonical_count": canonical_count,
|
||||
"actor": "auto-tag-canon",
|
||||
})
|
||||
touched = result.get("memories_touched", 0)
|
||||
auto_applied += 1
|
||||
print(f" ✅ auto-applied: {label} ({touched} memories)")
|
||||
except Exception as e:
|
||||
print(f" ⚠️ auto-apply failed: {label} — {e}", file=sys.stderr)
|
||||
time.sleep(0.2)
|
||||
continue
|
||||
|
||||
# Lower confidence → human review
|
||||
if args.dry_run:
|
||||
proposals_created += 1
|
||||
print(f" [dry-run] would propose for review: {label}")
|
||||
continue
|
||||
try:
|
||||
result = api_post(base, "/admin/tags/aliases/propose", {
|
||||
"alias": alias, "canonical": canonical,
|
||||
"confidence": confidence, "reason": norm["reason"],
|
||||
"alias_count": alias_count, "canonical_count": canonical_count,
|
||||
})
|
||||
if result.get("proposal_id"):
|
||||
proposals_created += 1
|
||||
print(f" → pending proposal: {label}")
|
||||
else:
|
||||
duplicates_skipped += 1
|
||||
print(f" (duplicate pending proposal): {label}")
|
||||
except Exception as e:
|
||||
print(f" ⚠️ propose failed: {label} — {e}", file=sys.stderr)
|
||||
time.sleep(0.2)
|
||||
|
||||
print(
|
||||
f"\nsummary: proposals_seen={len(aliases_raw)} "
|
||||
f"auto_applied={auto_applied} "
|
||||
f"proposals_created={proposals_created} "
|
||||
f"duplicates_skipped={duplicates_skipped} "
|
||||
f"skipped_missing_canonical={auto_skipped_missing_canonical}"
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
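The script's apply-vs-propose gate reduces to one comparison against `AUTO_APPROVE_CONF`: high-confidence alias proposals are applied immediately, everything else lands in the human review queue. A minimal standalone sketch of that routing (the `route_alias` helper is hypothetical; the 0.8 default mirrors `ATOCORE_TAG_CANON_AUTO_APPROVE_CONF`):

```python
AUTO_APPROVE_CONF = 0.8  # mirrors the ATOCORE_TAG_CANON_AUTO_APPROVE_CONF default


def route_alias(confidence: float, autonomous: bool = True) -> str:
    """Hypothetical helper: decide where an alias proposal goes.

    With autonomy enabled, confident proposals are applied silently;
    everything else (including all proposals under --no-auto-approve)
    goes to the human review queue.
    """
    if autonomous and confidence >= AUTO_APPROVE_CONF:
        return "auto-apply"
    return "human-review"


print(route_alias(0.93))  # auto-apply
print(route_alias(0.55))  # human-review
```

Same decision the loop above makes per alias, without the HTTP side effects.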
scripts/detect_emerging.py (new file, 223 lines)
@@ -0,0 +1,223 @@
#!/usr/bin/env python3
"""Phase 6 C.1 — Emerging-concepts detector (HTTP-only).

Scans active + candidate memories via the HTTP API to surface:
1. Unregistered projects — project strings appearing on 3+ memories
   that aren't in the project registry. Surfaced for one-click
   registration.
2. Emerging categories — top 20 domain_tags by frequency, for
   "what themes are emerging in my work?" intelligence.
3. Reinforced transients — active memories with reference_count >= 5
   AND valid_until set. These "were temporary but are now durable"; a
   sibling endpoint (/admin/memory/extend-reinforced) actually
   performs the extension.

Writes results to project_state under atocore/proposals/* via the API.
Runs host-side (cron calls it), so it uses stdlib only — no atocore deps.

Usage:
    python3 scripts/detect_emerging.py [--base-url URL] [--dry-run]
"""

from __future__ import annotations

import argparse
import json
import os
import sys
import urllib.error
import urllib.request
from collections import Counter, defaultdict

PROJECT_MIN_MEMORIES = int(os.environ.get("ATOCORE_EMERGING_PROJECT_MIN", "3"))
PROJECT_ALERT_THRESHOLD = int(os.environ.get("ATOCORE_EMERGING_ALERT_THRESHOLD", "5"))
TOP_TAGS_LIMIT = int(os.environ.get("ATOCORE_EMERGING_TOP_TAGS", "20"))


def api_get(base_url: str, path: str, timeout: int = 30) -> dict:
    req = urllib.request.Request(f"{base_url}{path}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def api_post(base_url: str, path: str, body: dict, timeout: int = 10) -> dict:
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}{path}", method="POST",
        headers={"Content-Type": "application/json"}, data=data,
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_registered_project_names(base_url: str) -> set[str]:
    """Set of all registered project ids + aliases, lowercased."""
    try:
        result = api_get(base_url, "/projects")
    except Exception as e:
        print(f"WARN: could not load project registry: {e}", file=sys.stderr)
        return set()
    registered = set()
    for p in result.get("projects", []):
        pid = (p.get("project_id") or p.get("id") or p.get("name") or "").strip()
        if pid:
            registered.add(pid.lower())
        for alias in p.get("aliases", []) or []:
            if isinstance(alias, str) and alias.strip():
                registered.add(alias.strip().lower())
    return registered


def fetch_memories(base_url: str, status: str, limit: int = 500) -> list[dict]:
    try:
        params = f"limit={limit}"
        if status == "active":
            params += "&active_only=true"
        else:
            params += f"&status={status}"
        result = api_get(base_url, f"/memory?{params}")
        return result.get("memories", [])
    except Exception as e:
        print(f"WARN: could not fetch {status} memories: {e}", file=sys.stderr)
        return []


def fetch_previous_proposals(base_url: str) -> list[dict]:
    """Read last run's unregistered_projects to diff against this run."""
    try:
        result = api_get(base_url, "/project/state/atocore")
        entries = result.get("entries", result.get("state", []))
        for e in entries:
            if e.get("category") == "proposals" and e.get("key") == "unregistered_projects_prev":
                try:
                    return json.loads(e.get("value") or "[]")
                except Exception:
                    return []
    except Exception:
        pass
    return []


def set_state(base_url: str, category: str, key: str, value: str, source: str = "emerging detector") -> None:
    api_post(base_url, "/project/state", {
        "project": "atocore",
        "category": category,
        "key": key,
        "value": value,
        "source": source,
    })


def main() -> None:
    parser = argparse.ArgumentParser(description="Detect emerging projects + categories")
    parser.add_argument("--base-url", default=os.environ.get("ATOCORE_BASE_URL", "http://127.0.0.1:8100"))
    parser.add_argument("--dry-run", action="store_true", help="Report without writing to project state")
    args = parser.parse_args()

    base = args.base_url.rstrip("/")

    registered = fetch_registered_project_names(base)
    active = fetch_memories(base, "active")
    candidates = fetch_memories(base, "candidate")
    all_mems = active + candidates

    # --- Unregistered projects ---
    project_mems: dict[str, list] = defaultdict(list)
    for m in all_mems:
        proj = (m.get("project") or "").strip().lower()
        if not proj or proj in registered:
            continue
        project_mems[proj].append(m)

    unregistered = []
    for proj, mems in sorted(project_mems.items()):
        if len(mems) < PROJECT_MIN_MEMORIES:
            continue
        unregistered.append({
            "project": proj,
            "count": len(mems),
            "sample_memory_ids": [m.get("id") for m in mems[:3]],
            "sample_contents": [(m.get("content") or "")[:150] for m in mems[:3]],
        })

    # --- Emerging domain_tags (active only) ---
    tag_counter: Counter = Counter()
    for m in active:
        for t in (m.get("domain_tags") or []):
            if isinstance(t, str) and t.strip():
                tag_counter[t.strip().lower()] += 1
    emerging_tags = [{"tag": tag, "count": cnt} for tag, cnt in tag_counter.most_common(TOP_TAGS_LIMIT)]

    # --- Reinforced transients (active, high refs, has expiry) ---
    reinforced = []
    for m in active:
        ref_count = int(m.get("reference_count") or 0)
        vu = (m.get("valid_until") or "").strip()
        if ref_count >= 5 and vu:
            reinforced.append({
                "memory_id": m.get("id"),
                "reference_count": ref_count,
                "valid_until": vu,
                "content_preview": (m.get("content") or "")[:150],
                "project": m.get("project") or "",
            })

    result = {
        "unregistered_projects": unregistered,
        "emerging_categories": emerging_tags,
        "reinforced_transients": reinforced,
        "counts": {
            "active_memories": len(active),
            "candidate_memories": len(candidates),
            "unregistered_project_count": len(unregistered),
            "emerging_tag_count": len(emerging_tags),
            "reinforced_transient_count": len(reinforced),
        },
    }

    print(json.dumps(result, indent=2))

    if args.dry_run:
        return

    # --- Persist to project state via HTTP ---
    try:
        set_state(base, "proposals", "unregistered_projects", json.dumps(unregistered))
        set_state(base, "proposals", "emerging_categories", json.dumps(emerging_tags))
        set_state(base, "proposals", "reinforced_transients", json.dumps(reinforced))
    except Exception as e:
        print(f"WARN: failed to persist proposals: {e}", file=sys.stderr)

    # --- Alert on NEW projects crossing the threshold ---
    try:
        prev = fetch_previous_proposals(base)
        prev_names = {p.get("project") for p in prev if isinstance(p, dict)}
        newly_crossed = [
            p for p in unregistered
            if p["count"] >= PROJECT_ALERT_THRESHOLD
            and p["project"] not in prev_names
        ]
        if newly_crossed:
            names = ", ".join(p["project"] for p in newly_crossed)
            # Use existing alert mechanism via state (Phase 4 infra)
            try:
                set_state(base, "alert", "last_warning", json.dumps({
                    "title": f"Emerging project(s) detected: {names}",
                    "message": (
                        f"{len(newly_crossed)} unregistered project(s) crossed "
                        f"the {PROJECT_ALERT_THRESHOLD}-memory threshold. "
                        f"Review at /wiki or /admin/dashboard."
                    ),
                    "timestamp": "",
                }))
            except Exception:
                pass

        # Snapshot for next run's diff
        set_state(base, "proposals", "unregistered_projects_prev", json.dumps(unregistered))
    except Exception as e:
        print(f"WARN: alert/state write failed: {e}", file=sys.stderr)


if __name__ == "__main__":
    main()
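The alerting step only fires for projects that were absent from the previous run's snapshot, so re-runs don't re-alert on the same project. That set-difference can be sketched in isolation (the sample project names and counts below are made up):

```python
PROJECT_ALERT_THRESHOLD = 5  # mirrors the ATOCORE_EMERGING_ALERT_THRESHOLD default


def newly_crossed(current: list[dict], previous: list[dict]) -> list[dict]:
    """Projects at/over the threshold this run that were absent last run."""
    prev_names = {p.get("project") for p in previous if isinstance(p, dict)}
    return [
        p for p in current
        if p["count"] >= PROJECT_ALERT_THRESHOLD and p["project"] not in prev_names
    ]


prev = [{"project": "orrery", "count": 6}]
curr = [
    {"project": "orrery", "count": 7},      # already alerted last run → suppressed
    {"project": "heliograph", "count": 5},  # crosses the threshold for the first time
    {"project": "sundial", "count": 3},     # still below threshold
]
print([p["project"] for p in newly_crossed(curr, prev)])  # ['heliograph']
```

This is why the script snapshots `unregistered_projects_prev` at the end of every run: the snapshot is what suppresses the repeat alert.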
scripts/memory_dedup.py (new file, 463 lines)
@@ -0,0 +1,463 @@
#!/usr/bin/env python3
"""Phase 7A — semantic memory dedup detector.

Finds clusters of near-duplicate active memories and writes merge-
candidate proposals for human review in the triage UI.

Algorithm:
1. Fetch active memories via HTTP
2. Group by (project, memory_type) — cross-bucket merges are deferred
   to the Phase 7B contradiction flow
3. Within each group, embed contents via atocore.retrieval.embeddings
4. Greedily build transitive clusters at similarity >= threshold
5. For each cluster of size >= 2, ask claude -p to draft unified content
6. POST the proposal to /admin/memory/merge-candidates/create (the
   server dedupes by the sorted memory-id set, so re-runs don't
   double-create)

Host-side because the claude CLI lives on Dalidou, not in the container.
Reuses the same PYTHONPATH=src pattern as scripts/graduate_memories.py
for atocore imports (embeddings, similarity, prompt module).

Usage:
    python3 scripts/memory_dedup.py --base-url http://127.0.0.1:8100 \\
        --similarity-threshold 0.88 --max-batch 50

Threshold conventions (see Phase 7 doc):
    0.88  interactive / default — balanced precision/recall
    0.90  nightly cron — tight, only near-duplicates
    0.85  weekly cron — deeper cleanup
"""

from __future__ import annotations

import argparse
import json
import os
import shutil
import subprocess
import sys
import tempfile
import time
import urllib.error
import urllib.parse
import urllib.request
from collections import defaultdict
from typing import Any

# Make src/ importable — same pattern as graduate_memories.py
_SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
_SRC_DIR = os.path.abspath(os.path.join(_SCRIPT_DIR, "..", "src"))
if _SRC_DIR not in sys.path:
    sys.path.insert(0, _SRC_DIR)

from atocore.memory._dedup_prompt import (  # noqa: E402
    DEDUP_PROMPT_VERSION,
    SYSTEM_PROMPT,
    TIER2_SYSTEM_PROMPT,
    build_tier2_user_message,
    build_user_message,
    normalize_merge_verdict,
    parse_merge_verdict,
)
from atocore.memory.similarity import cluster_by_threshold  # noqa: E402

DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://127.0.0.1:8100")
DEFAULT_MODEL = os.environ.get("ATOCORE_DEDUP_MODEL", "sonnet")
DEFAULT_TIER2_MODEL = os.environ.get("ATOCORE_DEDUP_TIER2_MODEL", "opus")
DEFAULT_TIMEOUT_S = float(os.environ.get("ATOCORE_DEDUP_TIMEOUT_S", "60"))

# Phase 7A.1 — auto-merge tiering thresholds.
# TIER-1 auto-approve: if sonnet confidence >= this AND min pairwise
# similarity >= AUTO_APPROVE_SIM AND all sources share project+type → merge silently.
AUTO_APPROVE_CONF = float(os.environ.get("ATOCORE_DEDUP_AUTO_APPROVE_CONF", "0.8"))
AUTO_APPROVE_SIM = float(os.environ.get("ATOCORE_DEDUP_AUTO_APPROVE_SIM", "0.92"))
# TIER-2 escalation band: sonnet uncertain but pair is similar enough to be worth opus time.
TIER2_MIN_CONF = float(os.environ.get("ATOCORE_DEDUP_TIER2_MIN_CONF", "0.5"))
TIER2_MIN_SIM = float(os.environ.get("ATOCORE_DEDUP_TIER2_MIN_SIM", "0.85"))

_sandbox_cwd = None


def get_sandbox_cwd() -> str:
    global _sandbox_cwd
    if _sandbox_cwd is None:
        _sandbox_cwd = tempfile.mkdtemp(prefix="ato-dedup-")
    return _sandbox_cwd


def api_get(base_url: str, path: str) -> dict:
    req = urllib.request.Request(f"{base_url}{path}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def api_post(base_url: str, path: str, body: dict | None = None) -> dict:
    data = json.dumps(body or {}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}{path}", method="POST",
        headers={"Content-Type": "application/json"}, data=data,
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def call_claude(system_prompt: str, user_message: str, model: str, timeout_s: float) -> tuple[str | None, str | None]:
    """Shared CLI caller with retry + stderr capture (mirrors auto_triage)."""
    if not shutil.which("claude"):
        return None, "claude CLI not available"
    args = [
        "claude", "-p",
        "--model", model,
        "--append-system-prompt", system_prompt,
        "--disable-slash-commands",
        user_message,
    ]
    last_error = ""
    for attempt in range(3):
        if attempt > 0:
            time.sleep(2 ** attempt)
        try:
            completed = subprocess.run(
                args, capture_output=True, text=True,
                timeout=timeout_s, cwd=get_sandbox_cwd(),
                encoding="utf-8", errors="replace",
            )
        except subprocess.TimeoutExpired:
            last_error = f"{model} timed out"
            continue
        except Exception as exc:
            last_error = f"subprocess error: {exc}"
            continue
        if completed.returncode == 0:
            return (completed.stdout or "").strip(), None
        stderr = (completed.stderr or "").strip()[:200]
        last_error = f"{model} exit {completed.returncode}: {stderr}" if stderr else f"{model} exit {completed.returncode}"
    return None, last_error


def fetch_active_memories(base_url: str, project: str | None) -> list[dict]:
    # The /memory endpoint with active_only=true returns active memories.
    # Graduated memories are exempt from dedup — they're frozen pointers
    # to entities. Filter them out on the client side.
    params = "active_only=true&limit=2000"
    if project:
        params += f"&project={urllib.parse.quote(project)}"
    try:
        result = api_get(base_url, f"/memory?{params}")
    except Exception as e:
        print(f"ERROR: could not fetch memories: {e}", file=sys.stderr)
        return []
    mems = result.get("memories", [])
    return [m for m in mems if (m.get("status") or "active") == "active"]


def group_memories(mems: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Bucket by (project, memory_type). Empty project is its own bucket."""
    buckets: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for m in mems:
        key = ((m.get("project") or "").strip().lower(), (m.get("memory_type") or "").strip().lower())
        buckets[key].append(m)
    return buckets


def draft_merge(sources: list[dict], model: str, timeout_s: float) -> dict[str, Any] | None:
    """Tier-1 draft: cheap sonnet call proposes the unified content."""
    user_msg = build_user_message(sources)
    raw, err = call_claude(SYSTEM_PROMPT, user_msg, model, timeout_s)
    if err:
        print(f"  WARN: claude tier-1 failed: {err}", file=sys.stderr)
        return None
    parsed = parse_merge_verdict(raw or "")
    if parsed is None:
        print(f"  WARN: could not parse tier-1 verdict: {(raw or '')[:200]}", file=sys.stderr)
        return None
    return normalize_merge_verdict(parsed)


def tier2_review(
    sources: list[dict],
    tier1_verdict: dict[str, Any],
    model: str,
    timeout_s: float,
) -> dict[str, Any] | None:
    """Tier-2 second opinion: opus confirms or overrides the tier-1 draft."""
    user_msg = build_tier2_user_message(sources, tier1_verdict)
    raw, err = call_claude(TIER2_SYSTEM_PROMPT, user_msg, model, timeout_s)
    if err:
        print(f"  WARN: claude tier-2 failed: {err}", file=sys.stderr)
        return None
    parsed = parse_merge_verdict(raw or "")
    if parsed is None:
        print(f"  WARN: could not parse tier-2 verdict: {(raw or '')[:200]}", file=sys.stderr)
        return None
    return normalize_merge_verdict(parsed)


def min_pairwise_similarity(texts: list[str]) -> float:
    """Return the minimum pairwise cosine similarity across N texts.

    Used to sanity-check a transitive cluster: A~B and B~C don't
    guarantee A~C, so the auto-approve threshold must be met by the
    WEAKEST pair, not just the strongest. For N=2 this is simply the
    one pairwise similarity.
    """
    if len(texts) < 2:
        return 0.0
    # Reuse similarity_matrix rather than computing it ourselves
    from atocore.memory.similarity import similarity_matrix
    m = similarity_matrix(texts)
    min_sim = 1.0
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if m[i][j] < min_sim:
                min_sim = m[i][j]
    return min_sim


def submit_candidate(
    base_url: str,
    memory_ids: list[str],
    similarity: float,
    verdict: dict[str, Any],
    dry_run: bool,
) -> str | None:
    body = {
        "memory_ids": memory_ids,
        "similarity": similarity,
        "proposed_content": verdict["content"],
        "proposed_memory_type": verdict["memory_type"],
        "proposed_project": verdict["project"],
        "proposed_tags": verdict["domain_tags"],
        "proposed_confidence": verdict["confidence"],
        "reason": verdict["reason"],
    }
    if dry_run:
        print(f"  [dry-run] would POST: {json.dumps(body)[:200]}...")
        return "dry-run"
    try:
        result = api_post(base_url, "/admin/memory/merge-candidates/create", body)
        return result.get("candidate_id")
    except urllib.error.HTTPError as e:
        print(f"  ERROR: submit failed: {e.code} {e.read().decode()[:200]}", file=sys.stderr)
        return None
    except Exception as e:
        print(f"  ERROR: submit failed: {e}", file=sys.stderr)
        return None


def auto_approve(base_url: str, candidate_id: str, actor: str, dry_run: bool) -> str | None:
    """POST /admin/memory/merge-candidates/{id}/approve. Returns result_memory_id."""
    if dry_run:
        return "dry-run"
    try:
        result = api_post(
            base_url,
            f"/admin/memory/merge-candidates/{candidate_id}/approve",
            {"actor": actor},
        )
        return result.get("result_memory_id")
    except Exception as e:
        print(f"  ERROR: auto-approve failed: {e}", file=sys.stderr)
        return None


def same_bucket(sources: list[dict]) -> bool:
    """All sources share the same (project, memory_type)."""
    if not sources:
        return False
    proj = (sources[0].get("project") or "").strip().lower()
    mtype = (sources[0].get("memory_type") or "").strip().lower()
    for s in sources[1:]:
        if (s.get("project") or "").strip().lower() != proj:
            return False
        if (s.get("memory_type") or "").strip().lower() != mtype:
            return False
    return True


def main() -> None:
    parser = argparse.ArgumentParser(description="Phase 7A semantic dedup detector (tiered)")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
    parser.add_argument("--project", default="", help="Only scan this project (empty = all)")
    parser.add_argument("--similarity-threshold", type=float, default=0.88)
    parser.add_argument("--max-batch", type=int, default=50,
                        help="Max clusters to process per run")
    parser.add_argument("--model", default=DEFAULT_MODEL, help="Tier-1 model (default: sonnet)")
    parser.add_argument("--tier2-model", default=DEFAULT_TIER2_MODEL, help="Tier-2 model (default: opus)")
    parser.add_argument("--timeout-s", type=float, default=DEFAULT_TIMEOUT_S)
    parser.add_argument("--no-auto-approve", action="store_true",
                        help="Disable autonomous merging; all merges land in human triage queue")
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args()

    base = args.base_url.rstrip("/")
    autonomous = not args.no_auto_approve

    print(
        f"memory_dedup {DEDUP_PROMPT_VERSION} | threshold={args.similarity_threshold} | "
        f"tier1={args.model} tier2={args.tier2_model} | "
        f"autonomous={autonomous} | "
        f"auto-approve: conf>={AUTO_APPROVE_CONF} sim>={AUTO_APPROVE_SIM}"
    )
    mems = fetch_active_memories(base, args.project or None)
    print(f"fetched {len(mems)} active memories")
    if not mems:
        return

    buckets = group_memories(mems)
    print(f"grouped into {len(buckets)} (project, memory_type) buckets")

    clusters_found = 0
    auto_merged_tier1 = 0
    auto_merged_tier2 = 0
    human_candidates = 0
    tier1_rejections = 0
    tier2_overrides = 0  # opus disagreed with sonnet
    skipped_low_sim = 0
    skipped_existing = 0
    processed = 0

    for (proj, mtype), group in sorted(buckets.items()):
        if len(group) < 2:
            continue
        if processed >= args.max_batch:
            print(f"reached max-batch={args.max_batch}, stopping")
            break

        texts = [(m.get("content") or "") for m in group]
        clusters = cluster_by_threshold(texts, args.similarity_threshold)
        clusters = [c for c in clusters if len(c) >= 2]
        if not clusters:
            continue

        print(f"\n[{proj or '(global)'}/{mtype}] {len(group)} mems → {len(clusters)} cluster(s)")
        for cluster in clusters:
            if processed >= args.max_batch:
                break
            clusters_found += 1
            sources = [group[i] for i in cluster]
            ids = [s["id"] for s in sources]
            cluster_texts = [texts[i] for i in cluster]
            min_sim = min_pairwise_similarity(cluster_texts)
            print(f"  cluster of {len(cluster)} (min_sim={min_sim:.3f}): {[s['id'][:8] for s in sources]}")

            # Tier-1 draft
            tier1 = draft_merge(sources, args.model, args.timeout_s)
            processed += 1
            if tier1 is None:
                continue
            if tier1["action"] == "reject":
                tier1_rejections += 1
                print(f"  TIER-1 rejected: {tier1['reason'][:100]}")
                continue

            # --- Tiering decision ---
            bucket_ok = same_bucket(sources)
            tier1_ok = (
                tier1["confidence"] >= AUTO_APPROVE_CONF
                and min_sim >= AUTO_APPROVE_SIM
                and bucket_ok
            )

            if autonomous and tier1_ok:
                cid = submit_candidate(base, ids, min_sim, tier1, args.dry_run)
                if cid == "dry-run":
                    auto_merged_tier1 += 1
                    print("  [dry-run] would auto-merge (tier-1)")
                elif cid:
                    new_id = auto_approve(base, cid, actor="auto-dedup-tier1", dry_run=args.dry_run)
                    if new_id:
                        auto_merged_tier1 += 1
                        print(f"  ✅ auto-merged (tier-1) → {str(new_id)[:8]}")
                    else:
                        print(f"  ⚠️ tier-1 approve failed; candidate {cid[:8]} left pending")
                        human_candidates += 1
                else:
                    skipped_existing += 1
                time.sleep(0.3)
                continue

            # Not tier-1 auto-approve. Decide if it's worth tier-2 escalation.
            tier2_eligible = (
                autonomous
                and min_sim >= TIER2_MIN_SIM
                and tier1["confidence"] >= TIER2_MIN_CONF
                and bucket_ok
            )

            if tier2_eligible:
                print("  → escalating to tier-2 (opus)…")
                tier2 = tier2_review(sources, tier1, args.tier2_model, args.timeout_s)
                if tier2 is None:
                    # Opus errored. Fall back to human triage.
                    cid = submit_candidate(base, ids, min_sim, tier1, args.dry_run)
                    if cid and cid != "dry-run":
                        human_candidates += 1
                        print(f"  → candidate {cid[:8]} (tier-2 errored, human review)")
                    time.sleep(0.5)
                    continue

                if tier2["action"] == "reject":
                    tier2_overrides += 1
                    print(f"  ❌ TIER-2 override (reject): {tier2['reason'][:100]}")
                    time.sleep(0.5)
                    continue

                if tier2["confidence"] >= AUTO_APPROVE_CONF:
                    # Opus confirms. Auto-merge using opus's (possibly refined) content.
                    cid = submit_candidate(base, ids, min_sim, tier2, args.dry_run)
                    if cid == "dry-run":
                        auto_merged_tier2 += 1
                        print("  [dry-run] would auto-merge (tier-2)")
                    elif cid:
                        new_id = auto_approve(base, cid, actor="auto-dedup-tier2", dry_run=args.dry_run)
                        if new_id:
                            auto_merged_tier2 += 1
                            print(f"  ✅ auto-merged (tier-2) → {str(new_id)[:8]}")
                        else:
                            human_candidates += 1
                            print(f"  ⚠️ tier-2 approve failed; candidate {cid[:8]} left pending")
                    else:
                        skipped_existing += 1
                    time.sleep(0.5)
                    continue

                # Opus confirmed but low confidence → human review with opus's draft
                cid = submit_candidate(base, ids, min_sim, tier2, args.dry_run)
                if cid and cid != "dry-run":
                    human_candidates += 1
                    print(f"  → candidate {cid[:8]} (tier-2 low-confidence, human review)")
                time.sleep(0.5)
                continue

            # Below tier-2 eligibility (either non-autonomous mode, or
            # similarity too low / cross-bucket). Always human review.
            if min_sim < TIER2_MIN_SIM or not bucket_ok:
                skipped_low_sim += 1
                # Still create a human candidate so it's visible, but log why
                print(f"  → below auto-tier thresholds (min_sim={min_sim:.3f}, bucket_ok={bucket_ok})")

            cid = submit_candidate(base, ids, min_sim, tier1, args.dry_run)
            if cid == "dry-run":
                human_candidates += 1
            elif cid:
                human_candidates += 1
                print(f"  → candidate {cid[:8]} (human review)")
            else:
                skipped_existing += 1
            time.sleep(0.3)

    print(
        f"\nsummary: clusters_found={clusters_found} "
        f"auto_merged_tier1={auto_merged_tier1} "
        f"auto_merged_tier2={auto_merged_tier2} "
        f"human_candidates={human_candidates} "
        f"tier1_rejections={tier1_rejections} "
        f"tier2_overrides={tier2_overrides} "
        f"skipped_low_sim={skipped_low_sim} "
        f"skipped_existing={skipped_existing}"
    )


if __name__ == "__main__":
    main()
||||
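# The tier-2 escalation gate above can be read as a pure predicate. A minimal
# sketch: the constant values below are illustrative assumptions, not the real
# TIER2_MIN_SIM / TIER2_MIN_CONF defined elsewhere in scripts/memory_dedup.py.
TIER2_MIN_SIM = 0.90   # assumed value for illustration
TIER2_MIN_CONF = 0.55  # assumed value for illustration

def is_tier2_eligible(autonomous: bool, min_sim: float, tier1_conf: float, bucket_ok: bool) -> bool:
    # Only autonomous runs with high pairwise similarity, a confident tier-1
    # draft, and same-bucket sources pay for the more expensive tier-2 review.
    return (autonomous and min_sim >= TIER2_MIN_SIM
            and tier1_conf >= TIER2_MIN_CONF and bucket_ok)

assert is_tier2_eligible(True, 0.95, 0.80, True)
assert not is_tier2_eligible(True, 0.95, 0.80, False)  # cross-bucket never escalates
assert not is_tier2_eligible(False, 0.95, 0.80, True)  # non-autonomous mode never escalates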
@@ -33,8 +33,12 @@ from atocore.interactions.service import (
)
from atocore.engineering.mirror import generate_project_overview
from atocore.engineering.wiki import (
    render_activity,
    render_capture,
    render_domain,
    render_entity,
    render_homepage,
    render_memory_detail,
    render_project,
    render_search,
)
@@ -119,6 +123,33 @@ def wiki_search(q: str = "") -> HTMLResponse:
    return HTMLResponse(content=render_search(q))


@router.get("/wiki/capture", response_class=HTMLResponse)
def wiki_capture() -> HTMLResponse:
    """Phase 7I follow-up: paste mobile/desktop chats into AtoCore."""
    return HTMLResponse(content=render_capture())


@router.get("/wiki/memories/{memory_id}", response_class=HTMLResponse)
def wiki_memory(memory_id: str) -> HTMLResponse:
    """Phase 7E: memory detail with audit trail + neighbors."""
    html = render_memory_detail(memory_id)
    if html is None:
        raise HTTPException(status_code=404, detail="Memory not found")
    return HTMLResponse(content=html)


@router.get("/wiki/domains/{tag}", response_class=HTMLResponse)
def wiki_domain(tag: str) -> HTMLResponse:
    """Phase 7F: cross-project view for a domain tag."""
    return HTMLResponse(content=render_domain(tag))


@router.get("/wiki/activity", response_class=HTMLResponse)
def wiki_activity(hours: int = 48, limit: int = 100) -> HTMLResponse:
    """Autonomous-activity timeline feed."""
    return HTMLResponse(content=render_activity(hours=hours, limit=limit))


@router.get("/admin/triage", response_class=HTMLResponse)
def admin_triage(limit: int = 100) -> HTMLResponse:
    """Human triage UI for candidate memories.
@@ -369,6 +400,72 @@ def api_project_registration(req: ProjectRegistrationProposalRequest) -> dict:
        raise HTTPException(status_code=400, detail=str(e))


class RegisterEmergingRequest(BaseModel):
    project_id: str
    description: str = ""
    aliases: list[str] | None = None


@router.post("/admin/projects/register-emerging")
def api_register_emerging_project(req: RegisterEmergingRequest) -> dict:
    """Phase 6 C.2 — one-click register a detected emerging project.

    Fills in sensible defaults so the user doesn't have to think about
    paths: ingest_roots defaults to vault:incoming/projects/<project_id>/
    (will be empty until the user creates content there, which is fine).
    Delegates to the existing register_project() for validation + file
    write. Clears the project from the unregistered_projects proposal
    list so it stops appearing in the dashboard.
    """
    import json as _json

    pid = (req.project_id or "").strip().lower()
    if not pid:
        raise HTTPException(status_code=400, detail="project_id is required")

    aliases = req.aliases or []
    description = req.description or f"Emerging project registered from dashboard: {pid}"
    ingest_roots = [{
        "source": "vault",
        "subpath": f"incoming/projects/{pid}/",
        "label": pid,
    }]

    try:
        result = register_project(
            project_id=pid,
            aliases=aliases,
            description=description,
            ingest_roots=ingest_roots,
        )
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))

    # Clear from proposals so dashboard doesn't keep showing it
    try:
        from atocore.context.project_state import get_state, set_state
        for e in get_state("atocore"):
            if e.category == "proposals" and e.key == "unregistered_projects":
                try:
                    current = _json.loads(e.value)
                except Exception:
                    current = []
                filtered = [p for p in current if p.get("project") != pid]
                set_state(
                    project_name="atocore",
                    category="proposals",
                    key="unregistered_projects",
                    value=_json.dumps(filtered),
                    source="register-emerging",
                )
                break
    except Exception:
        pass  # non-fatal

    result["message"] = f"Project {pid!r} registered. Now has a wiki page, system map, and killer queries."
    return result


@router.put("/projects/{project_name}")
def api_project_update(project_name: str, req: ProjectUpdateRequest) -> dict:
    """Update an existing project registration."""
@@ -1190,6 +1287,25 @@ def api_dashboard() -> dict:
    except Exception:
        pass

    # Phase 6 C.2: emerging-concepts proposals from the detector
    proposals: dict = {}
    try:
        for entry in get_state("atocore"):
            if entry.category != "proposals":
                continue
            try:
                data = _json.loads(entry.value)
            except Exception:
                continue
            if entry.key == "unregistered_projects":
                proposals["unregistered_projects"] = data
            elif entry.key == "emerging_categories":
                proposals["emerging_categories"] = data
            elif entry.key == "reinforced_transients":
                proposals["reinforced_transients"] = data
    except Exception:
        pass

    # Project state counts — include all registered projects
    ps_counts = {}
    try:
@@ -1248,6 +1364,7 @@ def api_dashboard() -> dict:
        "integrity": integrity,
        "alerts": alerts,
        "recent_audit": recent_audit,
        "proposals": proposals,
    }


@@ -1431,6 +1548,334 @@ def api_graduation_status() -> dict:
    return out


# --- Phase 7C: tag canonicalization ---


class TagAliasProposalBody(BaseModel):
    alias: str
    canonical: str
    confidence: float = 0.0
    reason: str = ""
    alias_count: int = 0
    canonical_count: int = 0


class TagAliasApplyBody(BaseModel):
    alias: str
    canonical: str
    confidence: float = 0.9
    reason: str = ""
    alias_count: int = 0
    canonical_count: int = 0
    actor: str = "auto-tag-canon"


class TagAliasResolveBody(BaseModel):
    actor: str = "human-triage"


@router.get("/admin/tags/distribution")
def api_tag_distribution() -> dict:
    """Current tag distribution across active memories (for UI / debug)."""
    from atocore.memory.service import get_tag_distribution
    dist = get_tag_distribution()
    sorted_tags = sorted(dist.items(), key=lambda x: x[1], reverse=True)
    return {"total_references": sum(dist.values()), "unique_tags": len(dist),
            "tags": [{"tag": t, "count": c} for t, c in sorted_tags]}


@router.get("/admin/tags/aliases")
def api_list_tag_aliases(status: str = "pending", limit: int = 100) -> dict:
    """List tag alias proposals (default: pending for review)."""
    from atocore.memory.service import get_tag_alias_proposals
    rows = get_tag_alias_proposals(status=status, limit=limit)
    return {"proposals": rows, "count": len(rows)}


@router.post("/admin/tags/aliases/propose")
def api_propose_tag_alias(body: TagAliasProposalBody) -> dict:
    """Submit a low-confidence alias proposal for human review."""
    from atocore.memory.service import create_tag_alias_proposal
    pid = create_tag_alias_proposal(
        alias=body.alias, canonical=body.canonical,
        confidence=body.confidence, alias_count=body.alias_count,
        canonical_count=body.canonical_count, reason=body.reason,
    )
    if pid is None:
        return {"proposal_id": None, "duplicate": True}
    return {"proposal_id": pid, "duplicate": False}


@router.post("/admin/tags/aliases/apply")
def api_apply_tag_alias(body: TagAliasApplyBody) -> dict:
    """Apply an alias rewrite directly (used by the auto-approval path).

    Creates a tag_aliases row in status=approved with the apply result
    recorded, so autonomous merges land in the same audit surface as
    human approvals.
    """
    from datetime import datetime as _dt, timezone as _tz

    from atocore.memory.service import apply_tag_alias, create_tag_alias_proposal
    from atocore.models.database import get_connection

    # Record proposal + apply + mark approved in one flow
    pid = create_tag_alias_proposal(
        alias=body.alias, canonical=body.canonical,
        confidence=body.confidence, alias_count=body.alias_count,
        canonical_count=body.canonical_count, reason=body.reason,
    )
    if pid is None:
        # A pending proposal already exists — don't double-apply.
        raise HTTPException(status_code=409, detail="A pending proposal already exists for this (alias, canonical) pair — approve it via /admin/tags/aliases/{id}/approve")
    try:
        result = apply_tag_alias(body.alias, body.canonical, actor=body.actor)
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))

    now_str = _dt.now(_tz.utc).strftime("%Y-%m-%d %H:%M:%S")
    with get_connection() as conn:
        conn.execute(
            "UPDATE tag_aliases SET status = 'approved', resolved_at = ?, "
            "resolved_by = ?, applied_to_memories = ? WHERE id = ?",
            (now_str, body.actor, result["memories_touched"], pid),
        )
    return {
        "proposal_id": pid,
        "memories_touched": result["memories_touched"],
        "alias": body.alias, "canonical": body.canonical,
    }


@router.post("/admin/tags/aliases/{proposal_id}/approve")
def api_approve_tag_alias(proposal_id: str, body: TagAliasResolveBody) -> dict:
    """Human-in-the-loop approve for a pending proposal."""
    from atocore.memory.service import approve_tag_alias
    result = approve_tag_alias(proposal_id, actor=body.actor)
    if result is None:
        raise HTTPException(status_code=404, detail="Proposal not found or already resolved")
    return {"status": "approved", "proposal_id": proposal_id,
            "memories_touched": result["memories_touched"]}


@router.post("/admin/tags/aliases/{proposal_id}/reject")
def api_reject_tag_alias(proposal_id: str, body: TagAliasResolveBody) -> dict:
    """Human-in-the-loop reject for a pending proposal."""
    from atocore.memory.service import reject_tag_alias
    if not reject_tag_alias(proposal_id, actor=body.actor):
        raise HTTPException(status_code=404, detail="Proposal not found or already resolved")
    return {"status": "rejected", "proposal_id": proposal_id}


class DecayRunBody(BaseModel):
    idle_days_threshold: int = 30
    daily_decay_factor: float = 0.97
    supersede_confidence_floor: float = 0.30


@router.post("/admin/memory/decay-run")
def api_decay_run(body: DecayRunBody | None = None) -> dict:
    """Phase 7D — confidence decay on unreferenced memories.

    One-shot run (daily cron or on-demand). For active memories with
    reference_count=0 and idle for >30 days: multiply confidence by
    0.97 (~2-month half-life). Below 0.3 → auto-supersede with audit.

    Reversible: reinforcement bumps confidence back up. Non-destructive:
    superseded memories stay queryable with status filter.
    """
    from atocore.memory.service import decay_unreferenced_memories

    b = body or DecayRunBody()
    result = decay_unreferenced_memories(
        idle_days_threshold=b.idle_days_threshold,
        daily_decay_factor=b.daily_decay_factor,
        supersede_confidence_floor=b.supersede_confidence_floor,
    )
    return {
        "decayed_count": len(result["decayed"]),
        "superseded_count": len(result["superseded"]),
        "decayed": result["decayed"][:20],  # cap payload
        "superseded": result["superseded"][:20],
    }


@router.post("/admin/memory/extend-reinforced")
def api_extend_reinforced() -> dict:
    """Phase 6 C.3 — batch transient-to-durable extension.

    Scans active memories with valid_until in the next 30 days and
    reference_count >= 5. Extends expiry by 90 days, or clears it
    entirely (permanent) if reference_count >= 10. Writes audit rows.
    """
    from atocore.memory.service import extend_reinforced_valid_until
    extended = extend_reinforced_valid_until()
    return {"extended_count": len(extended), "extensions": extended}


# --- Phase 7A: memory dedup / merge-candidate lifecycle ---


class MergeCandidateCreateBody(BaseModel):
    memory_ids: list[str]
    similarity: float = 0.0
    proposed_content: str
    proposed_memory_type: str = "knowledge"
    proposed_project: str = ""
    proposed_tags: list[str] = []
    proposed_confidence: float = 0.6
    reason: str = ""


class MergeCandidateApproveBody(BaseModel):
    actor: str = "human-triage"
    content: str | None = None
    domain_tags: list[str] | None = None


class MergeCandidateRejectBody(BaseModel):
    actor: str = "human-triage"
    note: str = ""


class DedupScanRequestBody(BaseModel):
    project: str = ""
    similarity_threshold: float = 0.88
    max_batch: int = 50


@router.get("/admin/memory/merge-candidates")
def api_list_merge_candidates(status: str = "pending", limit: int = 100) -> dict:
    """Phase 7A: list merge-candidate proposals for triage UI."""
    from atocore.memory.service import get_merge_candidates
    cands = get_merge_candidates(status=status, limit=limit)
    return {"candidates": cands, "count": len(cands)}


@router.post("/admin/memory/merge-candidates/create")
def api_create_merge_candidate(body: MergeCandidateCreateBody) -> dict:
    """Phase 7A: host-side dedup detector submits a proposal here.

    Server-side idempotency: if a pending candidate already exists for
    the same sorted memory_id set, returns the existing id.
    """
    from atocore.memory.service import create_merge_candidate
    try:
        cid = create_merge_candidate(
            memory_ids=body.memory_ids,
            similarity=body.similarity,
            proposed_content=body.proposed_content,
            proposed_memory_type=body.proposed_memory_type,
            proposed_project=body.proposed_project,
            proposed_tags=body.proposed_tags,
            proposed_confidence=body.proposed_confidence,
            reason=body.reason,
        )
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    if cid is None:
        return {"candidate_id": None, "duplicate": True}
    return {"candidate_id": cid, "duplicate": False}


@router.post("/admin/memory/merge-candidates/{candidate_id}/approve")
def api_approve_merge_candidate(candidate_id: str, body: MergeCandidateApproveBody) -> dict:
    """Phase 7A: execute an approved merge. Sources → superseded; new
    merged memory created. UI can pass content/tag edits via body."""
    from atocore.memory.service import merge_memories
    new_id = merge_memories(
        candidate_id=candidate_id,
        actor=body.actor,
        override_content=body.content,
        override_tags=body.domain_tags,
    )
    if new_id is None:
        raise HTTPException(
            status_code=409,
            detail="Merge could not execute (candidate not pending, or source memory tampered)",
        )
    return {"status": "approved", "candidate_id": candidate_id, "result_memory_id": new_id}


@router.post("/admin/memory/merge-candidates/{candidate_id}/reject")
def api_reject_merge_candidate(candidate_id: str, body: MergeCandidateRejectBody) -> dict:
    """Phase 7A: dismiss a merge candidate. Sources stay untouched."""
    from atocore.memory.service import reject_merge_candidate
    ok = reject_merge_candidate(candidate_id, actor=body.actor, note=body.note)
    if not ok:
        raise HTTPException(status_code=404, detail="Candidate not found or already resolved")
    return {"status": "rejected", "candidate_id": candidate_id}


@router.post("/admin/memory/dedup-scan")
def api_request_dedup_scan(body: DedupScanRequestBody) -> dict:
    """Phase 7A: request a host-side dedup scan.

    Writes a flag in project_state with project + threshold + max_batch.
    A host cron watcher picks it up within ~2 min and runs
    scripts/memory_dedup.py. Mirrors /admin/graduation/request.
    """
    import json as _json
    from datetime import datetime as _dt, timezone as _tz
    from atocore.context.project_state import set_state

    now = _dt.now(_tz.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    payload = _json.dumps({
        "project": (body.project or "").strip(),
        "similarity_threshold": max(0.5, min(0.99, body.similarity_threshold)),
        "max_batch": max(1, min(body.max_batch, 200)),
        "requested_at": now,
    })
    set_state(
        project_name="atocore",
        category="config",
        key="dedup_requested_at",
        value=payload,
        source="admin ui",
    )
    return {
        "requested_at": now,
        "project": body.project,
        "similarity_threshold": body.similarity_threshold,
        "max_batch": body.max_batch,
        "note": "Host watcher picks up within ~2 min. Poll /admin/memory/dedup-status for progress.",
    }


@router.get("/admin/memory/dedup-status")
def api_dedup_status() -> dict:
    """Phase 7A: state of the dedup scan pipeline (UI polling)."""
    import json as _json
    from atocore.context.project_state import get_state
    out = {
        "requested": None,
        "last_started_at": None,
        "last_finished_at": None,
        "last_result": None,
        "is_running": False,
    }
    try:
        for e in get_state("atocore"):
            if e.category not in ("config", "status"):
                continue
            if e.key == "dedup_requested_at":
                try:
                    out["requested"] = _json.loads(e.value)
                except Exception:
                    out["requested"] = {"raw": e.value}
            elif e.key == "dedup_last_started_at":
                out["last_started_at"] = e.value
            elif e.key == "dedup_last_finished_at":
                out["last_finished_at"] = e.value
            elif e.key == "dedup_last_result":
                out["last_result"] = e.value
            elif e.key == "dedup_running":
                out["is_running"] = (e.value == "1")
    except Exception:
        pass
    return out


@router.get("/admin/graduation/stats")
def api_graduation_stats() -> dict:
    """Phase 5F graduation stats for dashboard."""
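# Quick sanity arithmetic on the decay parameters above (DecayRunBody defaults:
# factor 0.97 per run, floor 0.30). Applied once per daily run, 0.97 halves a
# confidence score in roughly 23 runs; the docstring's "~2-month half-life"
# presumably also counts the 30-day idle gate before decay starts.
# Illustrative only, not part of the codebase.
import math

daily_decay_factor = 0.97
half_life_runs = math.log(0.5) / math.log(daily_decay_factor)  # about 22.8 runs
runs_to_floor = math.log(0.30 / 0.60) / math.log(daily_decay_factor)
# 0.30/0.60 == 0.5, so a typical 0.6-confidence memory reaches the 0.30
# supersede floor in the same ~22.8 runs once decay kicks in.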
@@ -377,6 +377,177 @@ _ENTITY_TRIAGE_CSS = """
"""


# ---------------------------------------------------------------------
# Phase 7A — Merge candidates (semantic dedup)
# ---------------------------------------------------------------------

_MERGE_TRIAGE_CSS = """
<style>
.cand-merge { border-left: 3px solid #8b5cf6; }
.merge-type { background: #8b5cf6; color: white; padding: 0.1rem 0.5rem; border-radius: 3px; font-size: 0.75rem; }
.merge-sources { margin: 0.5rem 0 0.8rem 0; display: flex; flex-direction: column; gap: 0.35rem; }
.merge-source { background: var(--bg); border: 1px dashed var(--border); border-radius: 4px; padding: 0.4rem 0.6rem; font-size: 0.85rem; }
.merge-source-meta { font-family: monospace; font-size: 0.72rem; opacity: 0.7; margin-bottom: 0.2rem; }
.merge-arrow { text-align: center; font-size: 1.1rem; opacity: 0.5; margin: 0.3rem 0; }
.merge-proposed { background: var(--card); border: 1px solid #8b5cf6; border-radius: 4px; padding: 0.5rem; }
.btn-merge-approve { background: #8b5cf6; color: white; border-color: #8b5cf6; }
.btn-merge-approve:hover { background: #7c3aed; }
</style>
"""


def _render_merge_card(cand: dict) -> str:
    import json as _json
    cid = _escape(cand.get("id", ""))
    sim = cand.get("similarity") or 0.0
    sources = cand.get("sources") or []
    proposed_content = cand.get("proposed_content") or ""
    proposed_tags = cand.get("proposed_tags") or []
    proposed_project = cand.get("proposed_project") or ""
    reason = cand.get("reason") or ""

    src_html = "".join(
        f"""
        <div class="merge-source">
          <div class="merge-source-meta">
            {_escape(s.get('id','')[:8])} · [{_escape(s.get('memory_type',''))}]
            · {_escape(s.get('project','') or '(global)')}
            · conf {float(s.get('confidence',0)):.2f}
            · refs {int(s.get('reference_count',0))}
          </div>
          <div>{_escape((s.get('content') or '')[:300])}</div>
        </div>
        """
        for s in sources
    )
    tags_str = ", ".join(proposed_tags)
    return f"""
    <div class="cand cand-merge" id="mcand-{cid}" data-merge-id="{cid}">
      <div class="cand-head">
        <span class="cand-type merge-type">[merge · {len(sources)} sources]</span>
        <span class="cand-project">{_escape(proposed_project or '(global)')}</span>
        <span class="cand-meta">sim ≥ {sim:.2f}</span>
      </div>
      <div class="merge-sources">{src_html}</div>
      <div class="merge-arrow">↓ merged into ↓</div>
      <div class="merge-proposed">
        <textarea class="cand-content" id="mcontent-{cid}">{_escape(proposed_content)}</textarea>
        <div class="cand-meta-row">
          <label class="cand-field-label">Tags:
            <input type="text" class="cand-tags-input" id="mtags-{cid}" value="{_escape(tags_str)}" placeholder="tag1, tag2">
          </label>
        </div>
        {f'<div class="auto-triage-msg" style="margin-top:0.4rem;">💡 {_escape(reason)}</div>' if reason else ''}
      </div>
      <div class="cand-actions">
        <button class="btn-merge-approve" data-merge-id="{cid}" title="Approve merge">✅ Approve Merge</button>
        <button class="btn-reject" data-merge-id="{cid}" data-merge-reject="1" title="Keep separate">❌ Keep Separate</button>
      </div>
      <div class="cand-status" id="mstatus-{cid}"></div>
    </div>
    """


_MERGE_TRIAGE_SCRIPT = """
<script>
async function mergeApprove(id) {
  const st = document.getElementById('mstatus-' + id);
  st.textContent = 'Merging…';
  st.className = 'cand-status ok';
  const content = document.getElementById('mcontent-' + id).value;
  const tagsRaw = document.getElementById('mtags-' + id).value;
  const tags = tagsRaw.split(',').map(t => t.trim()).filter(Boolean);
  const r = await fetch('/admin/memory/merge-candidates/' + encodeURIComponent(id) + '/approve', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({actor: 'human-triage', content: content, domain_tags: tags}),
  });
  if (r.ok) {
    const data = await r.json();
    st.textContent = '✅ Merged → ' + (data.result_memory_id || '').slice(0, 8);
    setTimeout(() => {
      const card = document.getElementById('mcand-' + id);
      if (card) { card.style.opacity = '0'; setTimeout(() => card.remove(), 300); }
    }, 600);
  } else {
    const err = await r.text();
    st.textContent = '❌ ' + r.status + ': ' + err.slice(0, 120);
    st.className = 'cand-status err';
  }
}

async function mergeReject(id) {
  const st = document.getElementById('mstatus-' + id);
  st.textContent = 'Rejecting…';
  st.className = 'cand-status ok';
  const r = await fetch('/admin/memory/merge-candidates/' + encodeURIComponent(id) + '/reject', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({actor: 'human-triage'}),
  });
  if (r.ok) {
    st.textContent = '❌ Kept separate';
    setTimeout(() => {
      const card = document.getElementById('mcand-' + id);
      if (card) { card.style.opacity = '0'; setTimeout(() => card.remove(), 300); }
    }, 400);
  } else st.textContent = '❌ ' + r.status;
}

document.addEventListener('click', (e) => {
  const mid = e.target.dataset?.mergeId;
  if (!mid) return;
  if (e.target.classList.contains('btn-merge-approve')) mergeApprove(mid);
  else if (e.target.dataset?.mergeReject) mergeReject(mid);
});

async function requestDedupScan() {
  const btn = document.getElementById('dedup-btn');
  const status = document.getElementById('dedup-status');
  btn.disabled = true;
  btn.textContent = 'Queuing…';
  status.textContent = '';
  status.className = 'auto-triage-msg';
  const threshold = parseFloat(document.getElementById('dedup-threshold').value || '0.88');
  const r = await fetch('/admin/memory/dedup-scan', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({project: '', similarity_threshold: threshold, max_batch: 50}),
  });
  if (r.ok) {
    status.textContent = `✓ Queued dedup scan at threshold ${threshold}. Host watcher runs every 2 min; refresh in ~3 min to see merge candidates.`;
    status.className = 'auto-triage-msg ok';
  } else {
    status.textContent = '✗ ' + r.status;
    status.className = 'auto-triage-msg err';
  }
  setTimeout(() => {
    btn.disabled = false;
    btn.textContent = '🔗 Scan for duplicates';
  }, 2000);
}
</script>
"""


def _render_dedup_bar() -> str:
    return """
    <div class="auto-triage-bar">
      <button id="dedup-btn" onclick="requestDedupScan()" title="Run semantic dedup scan on Dalidou host">
        🔗 Scan for duplicates
      </button>
      <label class="cand-field-label" style="margin:0 0.5rem;">
        Threshold:
        <input id="dedup-threshold" type="number" min="0.70" max="0.99" step="0.01" value="0.88"
               style="width:70px; padding:0.25rem; background:var(--bg); color:var(--text); border:1px solid var(--border); border-radius:3px;">
      </label>
      <span id="dedup-status" class="auto-triage-msg">
        Finds semantically near-duplicate active memories and proposes LLM-drafted merges for review. Source memories become <code>superseded</code> on approve; nothing is deleted.
      </span>
    </div>
    """


def _render_graduation_bar() -> str:
    """The 'Graduate memories → entity candidates' control bar."""
    from atocore.projects.registry import load_project_registry
@@ -478,26 +649,51 @@ def render_triage_page(limit: int = 100) -> str:
    except Exception as e:
        entity_candidates = []

    total = len(mem_candidates) + len(entity_candidates)
    try:
        from atocore.memory.service import get_merge_candidates
        merge_candidates = get_merge_candidates(status="pending", limit=limit)
    except Exception:
        merge_candidates = []

    total = len(mem_candidates) + len(entity_candidates) + len(merge_candidates)
    graduation_bar = _render_graduation_bar()
    dedup_bar = _render_dedup_bar()

    if total == 0:
        body = _TRIAGE_CSS + _ENTITY_TRIAGE_CSS + f"""
        body = _TRIAGE_CSS + _ENTITY_TRIAGE_CSS + _MERGE_TRIAGE_CSS + f"""
        <div class="triage-header">
          <h1>Triage Queue</h1>
        </div>
        {graduation_bar}
        {dedup_bar}
        <div class="empty">
          <p>🎉 No candidates to review.</p>
          <p>The auto-triage pipeline keeps this queue empty unless something needs your judgment.</p>
          <p>Use the 🎓 Graduate memories button above to propose new entity candidates from existing memories.</p>
          <p>Use 🎓 Graduate memories to propose entity candidates, or 🔗 Scan for duplicates to find near-duplicate memories to merge.</p>
        </div>
        """ + _GRADUATION_SCRIPT
        """ + _GRADUATION_SCRIPT + _MERGE_TRIAGE_SCRIPT
        return render_html("Triage — AtoCore", body, breadcrumbs=[("Wiki", "/wiki"), ("Triage", "")])

    # Memory cards
    mem_cards = "".join(_render_candidate_card(c) for c in mem_candidates)

    # Merge cards (Phase 7A)
    merge_cards_html = ""
    if merge_candidates:
        merge_cards = "".join(_render_merge_card(c) for c in merge_candidates)
        merge_cards_html = f"""
        <div class="section-break">
          <h2>🔗 Merge Candidates ({len(merge_candidates)})</h2>
          <p class="auto-triage-msg">
            Semantically near-duplicate active memories. Approving merges the sources
            into the proposed unified memory; sources become <code>superseded</code>
            (not deleted — still queryable). You can edit the draft content and tags
            before approving.
          </p>
        </div>
        {merge_cards}
        """

    # Entity cards
    ent_cards_html = ""
    if entity_candidates:
@@ -513,11 +709,12 @@ def render_triage_page(limit: int = 100) -> str:
        {ent_cards}
        """

    body = _TRIAGE_CSS + _ENTITY_TRIAGE_CSS + f"""
    body = _TRIAGE_CSS + _ENTITY_TRIAGE_CSS + _MERGE_TRIAGE_CSS + f"""
    <div class="triage-header">
      <h1>Triage Queue</h1>
      <span class="count">
        <span id="cand-count">{len(mem_candidates)}</span> memory ·
        {len(merge_candidates)} merge ·
        {len(entity_candidates)} entity
      </span>
    </div>
@@ -536,10 +733,12 @@ def render_triage_page(limit: int = 100) -> str:
      </span>
    </div>
    {graduation_bar}
    {dedup_bar}
    <h2>📝 Memory Candidates ({len(mem_candidates)})</h2>
    {mem_cards}
    {merge_cards_html}
    {ent_cards_html}
    """ + _TRIAGE_SCRIPT + _ENTITY_TRIAGE_SCRIPT + _GRADUATION_SCRIPT
    """ + _TRIAGE_SCRIPT + _ENTITY_TRIAGE_SCRIPT + _GRADUATION_SCRIPT + _MERGE_TRIAGE_SCRIPT

    return render_html(
        "Triage — AtoCore",
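# The approve handler in _MERGE_TRIAGE_SCRIPT above parses the tags input
# client-side (tagsRaw.split(',').map(t => t.trim()).filter(Boolean)) before
# POSTing it as domain_tags. The equivalent transform in Python, useful when
# scripting against the approve endpoint; parse_tags is an illustrative
# helper, not part of the codebase.
def parse_tags(raw: str) -> list[str]:
    # Split on commas, strip whitespace, and drop empty fragments.
    return [t.strip() for t in raw.split(",") if t.strip()]

assert parse_tags(" infra, memory ,, dedup ") == ["infra", "memory", "dedup"]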
@@ -26,8 +26,25 @@ from atocore.memory.service import get_memories
from atocore.projects.registry import load_project_registry


def render_html(title: str, body_html: str, breadcrumbs: list[tuple[str, str]] | None = None) -> str:
    nav = ""
_TOP_NAV_LINKS = [
    ("🏠 Home", "/wiki"),
    ("📡 Activity", "/wiki/activity"),
    ("🔀 Triage", "/admin/triage"),
    ("📊 Dashboard", "/admin/dashboard"),
]


def _render_topnav(active_path: str = "") -> str:
    items = []
    for label, href in _TOP_NAV_LINKS:
        cls = "topnav-item active" if href == active_path else "topnav-item"
        items.append(f'<a href="{href}" class="{cls}">{label}</a>')
    return f'<nav class="topnav">{" ".join(items)}</nav>'


def render_html(title: str, body_html: str, breadcrumbs: list[tuple[str, str]] | None = None, active_path: str = "") -> str:
    topnav = _render_topnav(active_path)
    crumbs = ""
    if breadcrumbs:
        parts = []
        for label, href in breadcrumbs:
@@ -35,8 +52,9 @@ def render_html(title: str, body_html: str, breadcrumbs: list[tuple[str, str]] |
            parts.append(f'<a href="{href}">{label}</a>')
        else:
            parts.append(f"<span>{label}</span>")
    nav = f'<nav class="breadcrumbs">{" / ".join(parts)}</nav>'
    crumbs = f'<nav class="breadcrumbs">{" / ".join(parts)}</nav>'

    nav = topnav + crumbs
    return _TEMPLATE.replace("{{title}}", title).replace("{{nav}}", nav).replace("{{body}}", body_html)


@@ -100,6 +118,35 @@ def render_homepage() -> str:
    lines.append('<button type="submit">Search</button>')
    lines.append('</form>')

    # What's happening — autonomous activity snippet
    try:
        from atocore.memory.service import get_recent_audit
        recent = get_recent_audit(limit=30)
        by_action: dict[str, int] = {}
        by_actor: dict[str, int] = {}
        for a in recent:
            by_action[a["action"]] = by_action.get(a["action"], 0) + 1
            by_actor[a["actor"]] = by_actor.get(a["actor"], 0) + 1
        # Surface autonomous actors specifically
        auto_actors = {k: v for k, v in by_actor.items()
                       if k.startswith("auto-") or k == "confidence-decay"
                       or k == "phase10-auto-promote" or k == "transient-to-durable"}
        if recent:
            lines.append('<div class="activity-snippet">')
            lines.append('<h3>📡 What the brain is doing</h3>')
            top_actions = sorted(by_action.items(), key=lambda x: -x[1])[:6]
            lines.append('<div class="stat-row">' +
                         "".join(f'<span>{a}: {n}</span>' for a, n in top_actions) +
                         '</div>')
            if auto_actors:
                lines.append(f'<p style="font-size:0.9rem; margin:0.3rem 0;">Autonomous actors: ' +
                             " · ".join(f'<code>{k}</code> ({v})' for k, v in auto_actors.items()) +
                             '</p>')
            lines.append('<p style="font-size:0.85rem; margin:0;"><a href="/wiki/activity">Full timeline →</a></p>')
            lines.append('</div>')
    except Exception:
        pass

    for bucket_name, items in buckets.items():
        if not items:
            continue
@@ -116,6 +163,40 @@ def render_homepage() -> str:
        lines.append('</a>')
    lines.append('</div>')

    # Phase 6 C.2: Emerging projects section
    try:
        import json as _json
        emerging_projects = []
        state_entries = get_state("atocore")
        for e in state_entries:
            if e.category == "proposals" and e.key == "unregistered_projects":
                try:
                    emerging_projects = _json.loads(e.value)
                except Exception:
                    emerging_projects = []
                break
        if emerging_projects:
            lines.append('<h2>📋 Emerging</h2>')
            lines.append('<p class="emerging-intro">Projects that appear in memories but aren\'t yet registered. '
                         'One click to promote them to first-class projects.</p>')
            lines.append('<div class="emerging-grid">')
            for ep in emerging_projects[:10]:
                name = ep.get("project", "?")
                count = ep.get("count", 0)
                samples = ep.get("sample_contents", [])
                samples_html = "".join(f'<li>{s[:120]}</li>' for s in samples[:2])
                lines.append(
                    f'<div class="emerging-card">'
                    f'<h3>{name}</h3>'
                    f'<div class="emerging-count">{count} memories</div>'
                    f'<ul class="emerging-samples">{samples_html}</ul>'
                    f'<button class="btn-register-emerging" onclick="registerEmerging({name!r})">📌 Register as project</button>'
                    f'</div>'
                )
            lines.append('</div>')
    except Exception:
        pass

    # Quick stats
    all_entities = get_entities(limit=500)
    all_memories = get_memories(active_only=True, limit=500)
@@ -133,7 +214,7 @@ def render_homepage() -> str:

    lines.append(f'<p><a href="/admin/triage">Triage Queue</a> · <a href="/admin/dashboard">API Dashboard (JSON)</a> · <a href="/health">Health Check</a></p>')

    return render_html("AtoCore Wiki", "\n".join(lines))
    return render_html("AtoCore Wiki", "\n".join(lines), active_path="/wiki")


def render_project(project: str) -> str:
@@ -254,6 +335,381 @@ def render_search(query: str) -> str:
    )


# ---------------------------------------------------------------------
# /wiki/capture — DEPRECATED emergency paste-in form.
# Kept as an endpoint because POST /interactions is public anyway, but
# REMOVED from the topnav so it's not promoted as the capture path.
# The sanctioned surfaces are Claude Code (Stop + UserPromptSubmit
# hooks) and OpenClaw (capture plugin with 7I context injection).
# This form is explicitly a last resort for when someone has to feed
# in an external log and can't get the normal hooks to reach it.
# ---------------------------------------------------------------------


def render_capture() -> str:
    lines = ['<h1>📥 Manual capture (fallback only)</h1>']
    lines.append(
        '<div class="triage-warning"><strong>This is not the capture path.</strong> '
        'The sanctioned capture surfaces are Claude Code (Stop hook auto-captures every turn) '
        'and OpenClaw (plugin auto-captures + injects AtoCore context on every agent turn). '
        'This form exists only as a last resort for external logs you can\'t get into the normal pipeline.</div>'
    )
    lines.append(
        '<p>If you\'re reaching for this page because you had a chat somewhere AtoCore didn\'t see, '
        'fix the capture surface instead — don\'t paste. The deliberate scope is Claude Code + OpenClaw.</p>'
    )
    lines.append('<p class="meta">Your prompt + the assistant\'s response. Project is optional — '
                 'the extractor infers it from content.</p>')
    lines.append("""
<form id="capture-form" style="display:flex; flex-direction:column; gap:0.8rem; margin-top:1rem;">
<label><strong>Your prompt / question</strong>
<textarea id="cap-prompt" required rows="4"
style="width:100%; padding:0.6rem; background:var(--bg); color:var(--text); border:1px solid var(--border); border-radius:6px; font-family:inherit; font-size:0.95rem;"
placeholder="Paste what you asked…"></textarea>
</label>
<label><strong>Assistant response</strong>
<textarea id="cap-response" required rows="10"
style="width:100%; padding:0.6rem; background:var(--bg); color:var(--text); border:1px solid var(--border); border-radius:6px; font-family:inherit; font-size:0.95rem;"
placeholder="Paste the full assistant response…"></textarea>
</label>
<div style="display:flex; gap:0.5rem; align-items:center; flex-wrap:wrap;">
<label style="display:flex; gap:0.35rem; align-items:center;">Project (optional):
<input type="text" id="cap-project" placeholder="auto-detect"
style="padding:0.35rem 0.6rem; background:var(--bg); color:var(--text); border:1px solid var(--border); border-radius:4px; font-family:monospace; width:180px;">
</label>
<label style="display:flex; gap:0.35rem; align-items:center;">Source:
<select id="cap-source" style="padding:0.35rem; background:var(--bg); color:var(--text); border:1px solid var(--border); border-radius:4px;">
<option value="claude-desktop">Claude Desktop</option>
<option value="claude-web">Claude.ai web</option>
<option value="claude-mobile">Claude mobile</option>
<option value="chatgpt">ChatGPT</option>
<option value="other">Other</option>
</select>
</label>
</div>
<button type="submit"
style="padding:0.6rem 1.2rem; background:var(--accent); color:white; border:none; border-radius:6px; cursor:pointer; font-size:1rem; font-weight:600; align-self:flex-start;">
Save to AtoCore
</button>
</form>
<div id="cap-status" style="margin-top:1rem; font-size:0.9rem; min-height:1.5em;"></div>

<script>
document.getElementById('capture-form').addEventListener('submit', async (e) => {
  e.preventDefault();
  const prompt = document.getElementById('cap-prompt').value.trim();
  const response = document.getElementById('cap-response').value.trim();
  const project = document.getElementById('cap-project').value.trim();
  const source = document.getElementById('cap-source').value;
  const status = document.getElementById('cap-status');
  if (!prompt || !response) { status.textContent = 'Need both prompt and response.'; return; }
  status.textContent = 'Saving…';
  try {
    const r = await fetch('/interactions', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({
        prompt: prompt, response: response,
        client: source, project: project, reinforce: true
      })
    });
    if (r.ok) {
      const data = await r.json();
      status.innerHTML = '✅ Saved — interaction ' + (data.interaction_id || '?').slice(0,8) +
        '. Runs through extraction + triage within the hour.<br>' +
        '<a href="/interactions/' + (data.interaction_id || '') + '">view</a>';
      document.getElementById('capture-form').reset();
    } else {
      status.textContent = '❌ ' + r.status + ': ' + (await r.text()).slice(0, 200);
    }
  } catch (err) { status.textContent = '❌ ' + err.message; }
});
</script>
""")
    lines.append(
        '<h2>How this works</h2>'
        '<ul>'
        '<li><strong>Claude Code</strong> → auto-captured via Stop hook</li>'
        '<li><strong>OpenClaw</strong> → auto-captured + gets AtoCore context injected on prompt start (Phase 7I)</li>'
        '<li><strong>Anything else</strong> (Claude Desktop, mobile, web, ChatGPT) → paste here</li>'
        '</ul>'
        '<p>The extractor is aggressive about capturing signal — don\'t hand-filter. '
        'If the conversation had nothing durable, triage will auto-reject.</p>'
    )

    return render_html(
        "Capture — AtoCore",
        "\n".join(lines),
        breadcrumbs=[("Wiki", "/wiki"), ("Capture", "")],
        active_path="/wiki/capture",
    )

# ---------------------------------------------------------------------
# Phase 7E — /wiki/memories/{id}: memory detail page
# ---------------------------------------------------------------------


def render_memory_detail(memory_id: str) -> str | None:
    """Full view of a single memory: content, audit trail, source refs,
    neighbors, graduation status. Fills the drill-down gap the list
    views can't."""
    from atocore.memory.service import get_memory_audit
    from atocore.models.database import get_connection

    with get_connection() as conn:
        row = conn.execute("SELECT * FROM memories WHERE id = ?", (memory_id,)).fetchone()
    if row is None:
        return None

    import json as _json
    mem = dict(row)
    try:
        tags = _json.loads(mem.get("domain_tags") or "[]") or []
    except Exception:
        tags = []

    lines = [f'<h1>{mem["memory_type"]}: <span style="color:var(--text);">{mem["content"][:80]}</span></h1>']
    if len(mem["content"]) > 80:
        lines.append(f'<blockquote><p>{mem["content"]}</p></blockquote>')

    # Metadata row
    meta_items = [
        f'<span class="tag">{mem["status"]}</span>',
        f'<strong>{mem["memory_type"]}</strong>',
    ]
    if mem.get("project"):
        meta_items.append(f'<a href="/wiki/projects/{mem["project"]}">{mem["project"]}</a>')
    meta_items.append(f'confidence: <strong>{float(mem.get("confidence") or 0):.2f}</strong>')
    meta_items.append(f'refs: <strong>{int(mem.get("reference_count") or 0)}</strong>')
    if mem.get("valid_until"):
        meta_items.append(f'<span class="mem-expiry">valid until {str(mem["valid_until"])[:10]}</span>')
    lines.append(f'<p>{" · ".join(meta_items)}</p>')

    if tags:
        tag_links = " ".join(f'<a href="/wiki/domains/{t}" class="tag-badge">{t}</a>' for t in tags)
        lines.append(f'<p><span class="mem-tags">{tag_links}</span></p>')

    lines.append(f'<p class="meta">id: <code>{mem["id"]}</code> · created: {mem["created_at"]}'
                 f' · updated: {mem.get("updated_at", "?")}'
                 + (f' · last referenced: {mem["last_referenced_at"]}' if mem.get("last_referenced_at") else '')
                 + '</p>')

    # Graduation
    if mem.get("graduated_to_entity_id"):
        eid = mem["graduated_to_entity_id"]
        lines.append(
            f'<h2>🎓 Graduated</h2>'
            f'<p>This memory was promoted to a typed entity: '
            f'<a href="/wiki/entities/{eid}">{eid[:8]}</a></p>'
        )

    # Source chunk
    if mem.get("source_chunk_id"):
        lines.append(f'<h2>Source chunk</h2><p><code>{mem["source_chunk_id"]}</code></p>')

    # Audit trail
    audit = get_memory_audit(memory_id, limit=50)
    if audit:
        lines.append(f'<h2>Audit trail ({len(audit)} events)</h2><ul>')
        for a in audit:
            note = f' — {a["note"]}' if a.get("note") else ""
            lines.append(
                f'<li><code>{a["timestamp"]}</code> '
                f'<strong>{a["action"]}</strong> '
                f'<em>{a["actor"]}</em>{note}</li>'
            )
        lines.append('</ul>')

    # Neighbors by shared tag
    if tags:
        from atocore.memory.service import get_memories as _get_memories
        neighbors = []
        for t in tags[:3]:
            for other in _get_memories(active_only=True, limit=30):
                if other.id == memory_id:
                    continue
                if any(ot == t for ot in (other.domain_tags or [])):
                    neighbors.append(other)
        # Dedupe
        seen = set()
        uniq = []
        for n in neighbors:
            if n.id in seen:
                continue
            seen.add(n.id)
            uniq.append(n)
        if uniq:
            lines.append(f'<h2>Related (by tag)</h2><ul>')
            for n in uniq[:10]:
                lines.append(
                    f'<li><a href="/wiki/memories/{n.id}">[{n.memory_type}] '
                    f'{n.content[:120]}</a>'
                    + (f' <span class="tag">{n.project}</span>' if n.project else '')
                    + '</li>'
                )
            lines.append('</ul>')

    return render_html(
        f"Memory {memory_id[:8]}",
        "\n".join(lines),
        breadcrumbs=[("Wiki", "/wiki"), ("Memory", "")],
    )

# ---------------------------------------------------------------------
# Phase 7F — /wiki/domains/{tag}: cross-project domain view
# ---------------------------------------------------------------------


def render_domain(tag: str) -> str:
    """All memories + entities carrying a given domain_tag, grouped by project.
    Answers 'what does the brain know about optics, across all projects?'"""
    tag = (tag or "").strip().lower()
    if not tag:
        return render_html("Domain", "<p>No tag specified.</p>",
                           breadcrumbs=[("Wiki", "/wiki"), ("Domains", "")])

    all_mems = get_memories(active_only=True, limit=500)
    matching = [m for m in all_mems
                if any((t or "").lower() == tag for t in (m.domain_tags or []))]

    # Group by project
    by_project: dict[str, list] = {}
    for m in matching:
        by_project.setdefault(m.project or "(global)", []).append(m)

    lines = [f'<h1>Domain: <code>{tag}</code></h1>']
    lines.append(f'<p class="meta">{len(matching)} active memories across {len(by_project)} projects</p>')

    if not matching:
        lines.append(
            f'<p>No memories currently carry the tag <code>{tag}</code>.</p>'
            '<p>Domain tags are assigned by the extractor when it identifies '
            'the topical scope of a memory. They update over time.</p>'
        )
        return render_html(
            f"Domain: {tag}",
            "\n".join(lines),
            breadcrumbs=[("Wiki", "/wiki"), ("Domains", ""), (tag, "")],
        )

    # Sort projects by count descending, (global) last
    def sort_key(item: tuple[str, list]) -> tuple[int, int]:
        proj, mems = item
        return (1 if proj == "(global)" else 0, -len(mems))

    for proj, mems in sorted(by_project.items(), key=sort_key):
        proj_link = proj if proj == "(global)" else f'<a href="/wiki/projects/{proj}">{proj}</a>'
        lines.append(f'<h2>{proj_link} ({len(mems)})</h2><ul>')
        for m in mems:
            other_tags = [t for t in (m.domain_tags or []) if t != tag][:3]
            other_tags_html = ""
            if other_tags:
                other_tags_html = ' <span class="mem-tags">' + " ".join(
                    f'<a href="/wiki/domains/{t}" class="tag-badge">{t}</a>' for t in other_tags
                ) + '</span>'
            lines.append(
                f'<li><a href="/wiki/memories/{m.id}">[{m.memory_type}] '
                f'{m.content[:200]}</a>'
                f' <span class="meta">conf {m.confidence:.2f} · refs {m.reference_count}</span>'
                f'{other_tags_html}</li>'
            )
        lines.append('</ul>')

    # Entities with this tag (if any have tags — currently they might not)
    try:
        all_entities = get_entities(limit=500)
        ent_matching = []
        for e in all_entities:
            tags = e.properties.get("domain_tags") if e.properties else []
            if isinstance(tags, list) and tag in [str(t).lower() for t in tags]:
                ent_matching.append(e)
        if ent_matching:
            lines.append(f'<h2>🔧 Entities ({len(ent_matching)})</h2><ul>')
            for e in ent_matching:
                lines.append(
                    f'<li><a href="/wiki/entities/{e.id}">[{e.entity_type}] {e.name}</a>'
                    + (f' <span class="tag">{e.project}</span>' if e.project else '')
                    + '</li>'
                )
            lines.append('</ul>')
    except Exception:
        pass

    return render_html(
        f"Domain: {tag}",
        "\n".join(lines),
        breadcrumbs=[("Wiki", "/wiki"), ("Domains", ""), (tag, "")],
    )

# ---------------------------------------------------------------------
# /wiki/activity — autonomous-activity feed
# ---------------------------------------------------------------------


def render_activity(hours: int = 48, limit: int = 100) -> str:
    """Timeline of what the autonomous pipeline did recently. Answers
    'what has the brain been doing while I was away?'"""
    from atocore.memory.service import get_recent_audit

    audit = get_recent_audit(limit=limit)

    # Group events by category for summary
    by_action: dict[str, int] = {}
    by_actor: dict[str, int] = {}
    for a in audit:
        by_action[a["action"]] = by_action.get(a["action"], 0) + 1
        by_actor[a["actor"]] = by_actor.get(a["actor"], 0) + 1

    lines = [f'<h1>📡 Activity Feed</h1>']
    lines.append(f'<p class="meta">Last {len(audit)} events in the memory audit log</p>')

    # Summary chips
    if by_action or by_actor:
        lines.append('<h2>Summary</h2>')
        lines.append('<p><strong>By action:</strong> ' +
                     " · ".join(f'{k}: {v}' for k, v in sorted(by_action.items(), key=lambda x: -x[1])) +
                     '</p>')
        lines.append('<p><strong>By actor:</strong> ' +
                     " · ".join(f'<code>{k}</code>: {v}' for k, v in sorted(by_actor.items(), key=lambda x: -x[1])) +
                     '</p>')

    # Action-type color/emoji
    action_emoji = {
        "created": "➕", "promoted": "✅", "rejected": "❌", "invalidated": "🚫",
        "superseded": "🔀", "reinforced": "🔁", "updated": "✏️",
        "auto_promoted": "⚡", "created_via_merge": "🔗",
        "valid_until_extended": "⏳", "tag_canonicalized": "🏷️",
    }

    lines.append('<h2>Timeline</h2><ul>')
    for a in audit:
        emoji = action_emoji.get(a["action"], "•")
        preview = a.get("content_preview") or ""
        ts_short = a["timestamp"][:16] if a.get("timestamp") else "?"
        mid_short = (a.get("memory_id") or "")[:8]
        note = f' — <em>{a["note"]}</em>' if a.get("note") else ""
        lines.append(
            f'<li>{emoji} <code>{ts_short}</code> '
            f'<strong>{a["action"]}</strong> '
            f'<em>{a["actor"]}</em> '
            f'<a href="/wiki/memories/{a["memory_id"]}">{mid_short}</a>'
            f'{note}'
            + (f'<br><span style="opacity:0.6; font-size:0.85rem; margin-left:1.5rem;">{preview[:140]}</span>' if preview else '')
            + '</li>'
        )
    lines.append('</ul>')

    return render_html(
        "Activity — AtoCore",
        "\n".join(lines),
        breadcrumbs=[("Wiki", "/wiki"), ("Activity", "")],
        active_path="/wiki/activity",
    )

_TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head>
@@ -290,6 +746,17 @@ _TEMPLATE = """<!DOCTYPE html>
hr { border: none; border-top: 1px solid var(--border); margin: 2rem 0; }
.breadcrumbs { margin-bottom: 1.5rem; font-size: 0.85em; opacity: 0.7; }
.breadcrumbs a { opacity: 0.8; }
.topnav { display: flex; gap: 0.25rem; flex-wrap: wrap; margin-bottom: 1rem; padding-bottom: 0.8rem; border-bottom: 1px solid var(--border); }
.topnav-item { padding: 0.35rem 0.8rem; background: var(--card); border: 1px solid var(--border); border-radius: 6px; font-size: 0.88rem; color: var(--text); opacity: 0.75; text-decoration: none; }
.topnav-item:hover { opacity: 1; background: var(--hover); text-decoration: none; }
.topnav-item.active { background: var(--accent); color: white; border-color: var(--accent); opacity: 1; }
.topnav-item.active:hover { background: var(--accent); }
.activity-snippet { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; margin: 1rem 0; }
.activity-snippet h3 { color: var(--accent); margin-bottom: 0.4rem; }
.activity-snippet ul { margin: 0.3rem 0 0 1.2rem; font-size: 0.9rem; }
.activity-snippet li { margin-bottom: 0.2rem; }
.stat-row { display: flex; gap: 1rem; flex-wrap: wrap; font-size: 0.9rem; margin: 0.4rem 0; }
.stat-row span { padding: 0.1rem 0.4rem; background: var(--hover); border-radius: 4px; }
.meta { font-size: 0.8em; opacity: 0.5; margin-top: 0.5rem; }
.tag { background: var(--accent); color: var(--bg); padding: 0.1rem 0.4rem; border-radius: 3px; font-size: 0.75em; margin-left: 0.3rem; }
.search-box { display: flex; gap: 0.5rem; margin: 1.5rem 0; }
@@ -324,7 +791,41 @@ _TEMPLATE = """<!DOCTYPE html>
.tag-badge:hover { opacity: 0.85; text-decoration: none; }
.mem-expiry { font-size: 0.75rem; color: #d97706; font-style: italic; margin-left: 0.4rem; }
@media (prefers-color-scheme: dark) { .mem-expiry { color: #fbbf24; } }
/* Phase 6 C.2 — Emerging projects section */
.emerging-intro { font-size: 0.9rem; opacity: 0.75; margin-bottom: 0.8rem; }
.emerging-grid { display: grid; grid-template-columns: repeat(auto-fill, minmax(280px, 1fr)); gap: 1rem; margin-bottom: 1rem; }
.emerging-card { background: var(--card); border: 1px dashed var(--accent); border-radius: 8px; padding: 1rem; }
.emerging-card h3 { margin: 0 0 0.3rem 0; color: var(--accent); font-family: monospace; font-size: 1rem; }
.emerging-count { font-size: 0.8rem; opacity: 0.6; margin-bottom: 0.5rem; }
.emerging-samples { font-size: 0.85rem; margin: 0.5rem 0; padding-left: 1.2rem; opacity: 0.8; }
.emerging-samples li { margin-bottom: 0.25rem; }
.btn-register-emerging { width: 100%; padding: 0.45rem 0.9rem; background: var(--accent); color: white; border: 1px solid var(--accent); border-radius: 4px; cursor: pointer; font-size: 0.88rem; font-weight: 500; margin-top: 0.5rem; }
.btn-register-emerging:hover { opacity: 0.9; }
</style>
<script>
async function registerEmerging(projectId) {
  if (!confirm(`Register "${projectId}" as a first-class project?\n\nThis creates:\n• /wiki/projects/${projectId} page\n• System map + gaps + killer queries\n• Triage + graduation support\n\nIngest root defaults to vault:incoming/projects/${projectId}/`)) {
    return;
  }
  try {
    const r = await fetch('/admin/projects/register-emerging', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({project_id: projectId}),
    });
    if (r.ok) {
      const data = await r.json();
      alert(data.message || `Registered ${projectId}`);
      window.location.reload();
    } else {
      const err = await r.text();
      alert(`Registration failed: ${r.status}\n${err.substring(0, 300)}`);
    }
  } catch (e) {
    alert(`Network error: ${e.message}`);
  }
}
</script>
</head>
<body>
{{nav}}

200
src/atocore/memory/_dedup_prompt.py
Normal file
@@ -0,0 +1,200 @@
"""Shared LLM prompt + parser for memory dedup (Phase 7A).

Stdlib-only — must be importable from both the in-container service
layer (when a user clicks "scan for duplicates" in the UI) and the
host-side batch script (``scripts/memory_dedup.py``), which runs on
Dalidou where the container's Python deps are not available.

The prompt instructs the model to draft a UNIFIED memory that
preserves every specific detail from the sources. We never want a
merge to lose information — if two memories disagree on a number, the
merged content should surface both with context.
"""

from __future__ import annotations

import json
from typing import Any

DEDUP_PROMPT_VERSION = "dedup-0.1.0"
MAX_CONTENT_CHARS = 1000
MAX_SOURCES = 8  # cluster size cap — bigger clusters are suspicious

SYSTEM_PROMPT = """You consolidate near-duplicate memories for AtoCore, a personal context engine.

Given 2-8 memories that a semantic-similarity scan flagged as likely duplicates, draft a UNIFIED replacement that preserves every specific detail from every source.

CORE PRINCIPLE: information never gets lost. If the sources disagree on a number, date, vendor, or spec, surface BOTH with attribution (e.g., "quoted at $3.2k on 2026-03-01, revised to $3.8k on 2026-04-10"). If one source is more specific than another, keep the specificity. If they say the same thing differently, pick the clearer wording.

YOU MUST:
- Produce content under 500 characters that reads as a single coherent statement
- Keep all project/vendor/person/part names that appear in any source
- Keep all numbers, dates, and identifiers
- Keep the strongest claim wording ("ratified", "decided", "committed") if any source has it
- Propose domain_tags as a UNION of the sources' tags (lowercase, deduped, cap 6)
- Return valid_until = latest non-null valid_until across sources, or null if any source has null (permanent beats transient)

REFUSE TO MERGE (return action="reject") if:
- The memories are actually about DIFFERENT subjects that just share vocabulary (e.g., "p04 mirror" and "p05 mirror" — same project bucket means same project, but different components)
- One memory CONTRADICTS another and you cannot reconcile them — flag for contradiction review instead
- The sources span different time snapshots of a changing state that should stay as a timeline, not be collapsed

OUTPUT — raw JSON, no prose, no markdown fences:
{
  "action": "merge" | "reject",
  "content": "the unified memory content",
  "memory_type": "knowledge|project|preference|adaptation|episodic|identity",
  "project": "project-slug or empty",
  "domain_tags": ["tag1", "tag2"],
  "confidence": 0.5,
  "reason": "one sentence explaining the merge (or the rejection)"
}

On action=reject, still fill content with a short explanation and set confidence=0."""


TIER2_SYSTEM_PROMPT = """You are the second-opinion reviewer for AtoCore's memory-consolidation pipeline.

A tier-1 model (cheaper, faster) already drafted a unified memory from N near-duplicate source memories. Your job is to either CONFIRM the merge (refining the content if you see a clearer phrasing) or OVERRIDE with action="reject" if the tier-1 missed something important.

You must be STRICTER than tier-1. Specifically, REJECT if:
- The sources are about different subjects that share vocabulary (e.g., different components within the same project)
- The tier-1 draft dropped specifics that existed in the sources (numbers, dates, vendors, people, part IDs)
- One source contradicts another and the draft glossed over it
- The sources span a timeline of a changing state (should be preserved as a sequence, not collapsed)

If you CONFIRM, you may polish the content — but preserve every specific from every source.

Same output schema as tier-1:
{
  "action": "merge" | "reject",
  "content": "the unified memory content",
  "memory_type": "knowledge|project|preference|adaptation|episodic|identity",
  "project": "project-slug or empty",
  "domain_tags": ["tag1", "tag2"],
  "confidence": 0.5,
  "reason": "one sentence — what you confirmed or why you overrode"
}

Raw JSON only, no prose, no markdown fences."""


def build_tier2_user_message(sources: list[dict[str, Any]], tier1_verdict: dict[str, Any]) -> str:
    """Format tier-2 review payload: same sources + tier-1's draft."""
    base = build_user_message(sources)
    draft_summary = (
        f"\n\n--- TIER-1 DRAFT (for your review) ---\n"
        f"action: {tier1_verdict.get('action')}\n"
        f"confidence: {tier1_verdict.get('confidence', 0):.2f}\n"
        f"proposed content: {(tier1_verdict.get('content') or '')[:600]}\n"
        f"proposed memory_type: {tier1_verdict.get('memory_type', '')}\n"
        f"proposed project: {tier1_verdict.get('project', '')}\n"
        f"proposed tags: {tier1_verdict.get('domain_tags', [])}\n"
        f"tier-1 reason: {tier1_verdict.get('reason', '')[:300]}\n"
        f"---\n\n"
        f"Return your JSON verdict now. Confirm or override."
    )
    return base.replace("Return the JSON object now.", "").rstrip() + draft_summary


def build_user_message(sources: list[dict[str, Any]]) -> str:
    """Format N source memories for the model to consolidate.

    Each source dict should carry id, content, project, memory_type,
    domain_tags, confidence, valid_until, reference_count.
    """
    lines = [f"You have {len(sources)} source memories in the same (project, memory_type) bucket:\n"]
    for i, src in enumerate(sources[:MAX_SOURCES], start=1):
        tags = src.get("domain_tags") or []
        if isinstance(tags, str):
            try:
                tags = json.loads(tags)
            except Exception:
                tags = []
        lines.append(
            f"--- Source {i} (id={src.get('id','?')[:8]}, "
            f"refs={src.get('reference_count',0)}, "
            f"conf={src.get('confidence',0):.2f}, "
            f"valid_until={src.get('valid_until') or 'permanent'}) ---"
        )
        lines.append(f"project: {src.get('project','')}")
        lines.append(f"type: {src.get('memory_type','')}")
        lines.append(f"tags: {tags}")
        lines.append(f"content: {(src.get('content') or '')[:MAX_CONTENT_CHARS]}")
        lines.append("")
    lines.append("Return the JSON object now.")
    return "\n".join(lines)
|
||||
|
||||
|
||||
def parse_merge_verdict(raw_output: str) -> dict[str, Any] | None:
    """Strip markdown fences / leading prose and return the parsed JSON
    object. Returns None on parse failure."""
    text = (raw_output or "").strip()
    if text.startswith("```"):
        text = text.strip("`")
        nl = text.find("\n")
        if nl >= 0:
            text = text[nl + 1:]
        if text.endswith("```"):
            text = text[:-3]
        text = text.strip()

    if not text.lstrip().startswith("{"):
        start = text.find("{")
        end = text.rfind("}")
        if start >= 0 and end > start:
            text = text[start:end + 1]

    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    return parsed

def normalize_merge_verdict(verdict: dict[str, Any]) -> dict[str, Any] | None:
    """Validate + normalize a raw merge verdict. Returns None if the
    verdict is unusable (no content, unknown action)."""
    action = str(verdict.get("action") or "").strip().lower()
    if action not in ("merge", "reject"):
        return None

    content = str(verdict.get("content") or "").strip()
    if not content:
        return None

    memory_type = str(verdict.get("memory_type") or "knowledge").strip().lower()
    project = str(verdict.get("project") or "").strip()

    raw_tags = verdict.get("domain_tags") or []
    if isinstance(raw_tags, str):
        raw_tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
    if not isinstance(raw_tags, list):
        raw_tags = []
    tags: list[str] = []
    for t in raw_tags[:6]:
        if not isinstance(t, str):
            continue
        tt = t.strip().lower()
        if tt and tt not in tags:
            tags.append(tt)

    try:
        confidence = float(verdict.get("confidence", 0.5))
    except (TypeError, ValueError):
        confidence = 0.5
    confidence = max(0.0, min(1.0, confidence))

    reason = str(verdict.get("reason") or "").strip()[:500]

    return {
        "action": action,
        "content": content[:1000],
        "memory_type": memory_type,
        "project": project,
        "domain_tags": tags,
        "confidence": confidence,
        "reason": reason,
    }
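The fence-stripping path that `parse_merge_verdict` applies before `json.loads` can be exercised in isolation. A minimal standalone sketch (not the module itself; the fenced model reply below is a made-up example):

```python
import json

# Standalone sketch of the fence-stripping parse path: strip an optional
# ```json fence, drop the language-tag line, then json.loads the rest.
raw = "```json\n{\"action\": \"merge\", \"confidence\": 0.82}\n```"

text = raw.strip()
if text.startswith("```"):
    text = text.strip("`")       # drop backticks on both ends
    nl = text.find("\n")
    if nl >= 0:
        text = text[nl + 1:]     # drop the "json" language-tag line
text = text.strip()

verdict = json.loads(text)
print(verdict["action"], verdict["confidence"])  # merge 0.82
```

The parsed dict would then go through `normalize_merge_verdict` for validation and clamping.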
@@ -21,7 +21,7 @@ from __future__ import annotations

import json
from typing import Any

-LLM_EXTRACTOR_VERSION = "llm-0.5.0"
+LLM_EXTRACTOR_VERSION = "llm-0.6.0"  # bolder unknown-project tagging
MAX_RESPONSE_CHARS = 8000
MAX_PROMPT_CHARS = 2000
MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}
@@ -30,7 +30,24 @@ SYSTEM_PROMPT = """You extract memory candidates from LLM conversation turns for

AtoCore is the brain for Atomaste's engineering work. Known projects:
p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore,
-abb-space. Unknown project names — still tag them, the system auto-detects.
+abb-space.

+UNKNOWN PROJECT/TOOL DETECTION (important): when a memory is clearly
+about a named tool, product, project, or system that is NOT in the
+known list above, use a slugified version of that name as the project
+tag (e.g., "apm" for "Atomaste Part Manager", "foo-bar" for "Foo Bar
+System"). DO NOT default to a nearest registered match just because
+APM isn't listed — that's misattribution. The system's Living
+Taxonomy detector scans for these unregistered tags and surfaces them
+for one-click registration once they appear in ≥3 memories. Your job
+is to be honest about scope, not to squeeze everything into existing
+buckets.
+
+Exception: if the memory is about a registered project that merely
+uses or integrates with an unknown tool (e.g., "p04 parts are missing
+materials in APM"), tag with the registered project (p04-gigabit) and
+mention the tool in content. Only use an unknown tool as the project
+tag when the tool itself is the primary subject.
+
Your job is to emit SIGNALS that matter for future context. Be aggressive:
err on the side of capturing useful signal. Triage filters noise downstream.
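The slug convention the new prompt text describes ("Foo Bar System" → "foo-bar-system") can be sketched mechanically. This helper is hypothetical and not part of the extractor — the prompt asks the model itself to slugify:

```python
import re

# Hypothetical illustration of the slug convention the prompt describes.
def slugify(name: str) -> str:
    # lowercase, collapse non-alphanumeric runs to "-", trim the edges
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

print(slugify("Foo Bar System"))  # foo-bar-system
```

Note the "apm" example in the prompt is an acronym rather than a full slug, so the model's output is a judgment call, not this exact transform.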
src/atocore/memory/_tag_canon_prompt.py — new file, 158 lines
@@ -0,0 +1,158 @@
"""Shared LLM prompt + parser for tag canonicalization (Phase 7C).
|
||||
|
||||
Stdlib-only, importable from both the in-container service layer and the
|
||||
host-side batch script that shells out to ``claude -p``.
|
||||
|
||||
The prompt instructs the model to propose a map of domain_tag aliases
|
||||
to their canonical form. Confidence is key here — we AUTO-APPLY high-
|
||||
confidence aliases; low-confidence go to human review. Over-merging
|
||||
distinct concepts ("optics" vs "optical" — sometimes equivalent,
|
||||
sometimes not) destroys cross-cutting retrieval, so the model is
|
||||
instructed to err conservative.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from typing import Any
|
||||
|
||||
TAG_CANON_PROMPT_VERSION = "tagcanon-0.1.0"
|
||||
MAX_TAGS_IN_PROMPT = 100
|
||||
|
||||
SYSTEM_PROMPT = """You canonicalize domain tags for AtoCore's memory layer.
|
||||
|
||||
Input: a distribution of lowercase domain tags (keyword → usage count across active memories). Examples: "firmware: 23", "fw: 5", "firmware-control: 3", "optics: 18", "optical: 2".
|
||||
|
||||
Your job: identify aliases — distinct strings that refer to the SAME concept — and map them to a single canonical form. The canonical should be the clearest / most-used / most-descriptive variant.
|
||||
|
||||
STRICT RULES:
|
||||
|
||||
1. ONLY propose aliases that are UNAMBIGUOUSLY equivalent. Examples:
|
||||
- "fw" → "firmware" (abbreviation)
|
||||
- "firmware-control" → "firmware" (compound narrowing — only if usage context makes it clear the narrower one is never used to DISTINGUISH from firmware-in-general)
|
||||
- "py" → "python"
|
||||
- "ml" → "machine-learning"
|
||||
Do NOT merge:
|
||||
- "optics" vs "optical" — these CAN diverge ("optics" = subsystem/product domain; "optical" = adjective used in non-optics contexts)
|
||||
- "p04" vs "p04-gigabit" — project ids are their own namespace, never canonicalize
|
||||
- "thermal" vs "temperature" — related but distinct
|
||||
- Anything where you're not sure — skip it, human review will catch real aliases next week
|
||||
|
||||
2. Confidence scale:
|
||||
0.9+ obvious abbreviation, very high usage disparity, no plausible alternative meaning
|
||||
0.7-0.9 likely alias, one-word-diff or standard contraction
|
||||
0.5-0.7 plausible but requires context — low count on alias side
|
||||
<0.5 DO NOT PROPOSE — if you're under 0.5, skip the pair entirely
|
||||
AtoCore auto-applies aliases at confidence >= 0.8; anything below goes to human review.
|
||||
|
||||
3. The CANONICAL must actually appear in the input list (don't invent a new term).
|
||||
|
||||
4. Never propose `alias == canonical`. Never propose circular mappings.
|
||||
|
||||
5. Project tags (p04, p05, p06, abb-space, atomizer-v2, atocore, apm) are OFF LIMITS — they are project identifiers, not concepts. Leave them alone entirely.
|
||||
|
||||
OUTPUT — raw JSON, no prose, no markdown fences:
|
||||
{
|
||||
"aliases": [
|
||||
{"alias": "fw", "canonical": "firmware", "confidence": 0.95, "reason": "fw is a standard abbreviation of firmware; 5 uses vs 23"},
|
||||
{"alias": "ml", "canonical": "machine-learning", "confidence": 0.90, "reason": "ml is the universal abbreviation"}
|
||||
]
|
||||
}
|
||||
|
||||
Empty aliases list is fine if nothing in the distribution is a clear alias. Err conservative — one false merge can pollute retrieval for hundreds of memories."""
|
||||
|
||||
|
||||
def build_user_message(tag_distribution: dict[str, int]) -> str:
|
||||
"""Format the tag distribution for the model.
|
||||
|
||||
Limited to MAX_TAGS_IN_PROMPT entries, sorted by count descending
|
||||
so high-usage tags appear first (the LLM uses them as anchor points
|
||||
for canonical selection).
|
||||
"""
|
||||
if not tag_distribution:
|
||||
return "Empty tag distribution — return {\"aliases\": []}."
|
||||
|
||||
sorted_tags = sorted(tag_distribution.items(), key=lambda x: x[1], reverse=True)
|
||||
top = sorted_tags[:MAX_TAGS_IN_PROMPT]
|
||||
lines = [f"{tag}: {count}" for tag, count in top]
|
||||
return (
|
||||
f"Tag distribution across {sum(tag_distribution.values())} total tag references "
|
||||
f"(showing top {len(top)} of {len(tag_distribution)} unique tags):\n\n"
|
||||
+ "\n".join(lines)
|
||||
+ "\n\nReturn the JSON aliases map now. Only propose UNAMBIGUOUS equivalents."
|
||||
)
|
||||
|
||||
|
||||
def parse_canon_output(raw_output: str) -> list[dict[str, Any]]:
|
||||
"""Strip markdown fences / prose and return the parsed aliases list."""
|
||||
text = (raw_output or "").strip()
|
||||
if text.startswith("```"):
|
||||
text = text.strip("`")
|
||||
nl = text.find("\n")
|
||||
if nl >= 0:
|
||||
text = text[nl + 1:]
|
||||
if text.endswith("```"):
|
||||
text = text[:-3]
|
||||
text = text.strip()
|
||||
|
||||
if not text.lstrip().startswith("{"):
|
||||
start = text.find("{")
|
||||
end = text.rfind("}")
|
||||
if start >= 0 and end > start:
|
||||
text = text[start:end + 1]
|
||||
|
||||
try:
|
||||
parsed = json.loads(text)
|
||||
except json.JSONDecodeError:
|
||||
return []
|
||||
|
||||
if not isinstance(parsed, dict):
|
||||
return []
|
||||
aliases = parsed.get("aliases") or []
|
||||
if not isinstance(aliases, list):
|
||||
return []
|
||||
return [a for a in aliases if isinstance(a, dict)]
|
||||
|
||||
|
||||
# Project tokens that must never be canonicalized — they're project ids,
|
||||
# not concepts. Keep this list in sync with the registered projects.
|
||||
# Safe to be over-inclusive; extra entries just skip canonicalization.
|
||||
PROTECTED_PROJECT_TOKENS = frozenset({
|
||||
"p04", "p04-gigabit",
|
||||
"p05", "p05-interferometer",
|
||||
"p06", "p06-polisher",
|
||||
"p08", "abb-space",
|
||||
"atomizer", "atomizer-v2",
|
||||
"atocore", "apm",
|
||||
})
|
||||
|
||||
|
||||
def normalize_alias_item(item: dict[str, Any]) -> dict[str, Any] | None:
|
||||
"""Validate one raw alias proposal. Returns None if unusable.
|
||||
|
||||
Filters: non-strings, empty strings, identity mappings, protected
|
||||
project tokens on either side.
|
||||
"""
|
||||
alias = str(item.get("alias") or "").strip().lower()
|
||||
canonical = str(item.get("canonical") or "").strip().lower()
|
||||
if not alias or not canonical:
|
||||
return None
|
||||
if alias == canonical:
|
||||
return None
|
||||
if alias in PROTECTED_PROJECT_TOKENS or canonical in PROTECTED_PROJECT_TOKENS:
|
||||
return None
|
||||
|
||||
try:
|
||||
confidence = float(item.get("confidence", 0.0))
|
||||
except (TypeError, ValueError):
|
||||
confidence = 0.0
|
||||
confidence = max(0.0, min(1.0, confidence))
|
||||
|
||||
reason = str(item.get("reason") or "").strip()[:300]
|
||||
|
||||
return {
|
||||
"alias": alias,
|
||||
"canonical": canonical,
|
||||
"confidence": confidence,
|
||||
"reason": reason,
|
||||
}
|
||||
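The auto-apply split this file's docstring describes (confidence >= 0.8 applies immediately, everything else queues for review) can be sketched in a few lines. A condensed standalone sketch, not the module itself; the sample proposals are hypothetical:

```python
# Condensed sketch of the triage split: high-confidence alias proposals
# auto-apply, the rest go to human review. Sample proposals are made up.
AUTO_APPLY_THRESHOLD = 0.8

proposals = [
    {"alias": "fw", "canonical": "firmware", "confidence": 0.95},
    {"alias": "opt", "canonical": "optics", "confidence": 0.55},
]

auto = [p for p in proposals if p["confidence"] >= AUTO_APPLY_THRESHOLD]
review = [p for p in proposals if p["confidence"] < AUTO_APPLY_THRESHOLD]
print(len(auto), len(review))  # 1 1
```

In the real pipeline each proposal would first pass through `normalize_alias_item` before this split.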
@@ -604,6 +604,204 @@ def auto_promote_reinforced(
    return promoted


def extend_reinforced_valid_until(
    min_reference_count: int = 5,
    permanent_reference_count: int = 10,
    extension_days: int = 90,
    imminent_expiry_days: int = 30,
) -> list[dict]:
    """Phase 6 C.3 — transient-to-durable auto-extension.

    For active memories with valid_until within the next N days AND
    reference_count >= min_reference_count: extend valid_until by
    extension_days. If reference_count >= permanent_reference_count,
    clear valid_until entirely (becomes permanent).

    Matches the user's intuition: "something transient becomes important
    if you keep coming back to it". The system watches reinforcement
    signals and extends expiry so context packs keep seeing durable
    facts instead of letting them decay out.

    Returns a list of {memory_id, action, old, new} dicts for each
    memory touched.
    """
    from datetime import timedelta

    now = datetime.now(timezone.utc)
    horizon = (now + timedelta(days=imminent_expiry_days)).strftime("%Y-%m-%d")
    new_expiry = (now + timedelta(days=extension_days)).strftime("%Y-%m-%d")
    now_str = now.strftime("%Y-%m-%d %H:%M:%S")

    extended: list[dict] = []

    with get_connection() as conn:
        rows = conn.execute(
            "SELECT id, valid_until, reference_count FROM memories "
            "WHERE status = 'active' "
            "AND valid_until IS NOT NULL AND valid_until != '' "
            "AND substr(valid_until, 1, 10) <= ? "
            "AND COALESCE(reference_count, 0) >= ?",
            (horizon, min_reference_count),
        ).fetchall()

        for r in rows:
            mid = r["id"]
            old_vu = r["valid_until"]
            ref_count = int(r["reference_count"] or 0)

            if ref_count >= permanent_reference_count:
                # Permanent promotion
                conn.execute(
                    "UPDATE memories SET valid_until = NULL, updated_at = ? WHERE id = ?",
                    (now_str, mid),
                )
                extended.append({
                    "memory_id": mid, "action": "made_permanent",
                    "old_valid_until": old_vu, "new_valid_until": None,
                    "reference_count": ref_count,
                })
            else:
                # 90-day extension
                conn.execute(
                    "UPDATE memories SET valid_until = ?, updated_at = ? WHERE id = ?",
                    (new_expiry, now_str, mid),
                )
                extended.append({
                    "memory_id": mid, "action": "extended",
                    "old_valid_until": old_vu, "new_valid_until": new_expiry,
                    "reference_count": ref_count,
                })

    # Audit rows via the shared framework (fail-open)
    for ex in extended:
        try:
            _audit_memory(
                memory_id=ex["memory_id"],
                action="valid_until_extended",
                actor="transient-to-durable",
                before={"valid_until": ex["old_valid_until"]},
                after={"valid_until": ex["new_valid_until"]},
                note=f"reinforced {ex['reference_count']}x; {ex['action']}",
            )
        except Exception:
            pass

    if extended:
        log.info("reinforced_valid_until_extended", count=len(extended))
    return extended


def decay_unreferenced_memories(
    idle_days_threshold: int = 30,
    daily_decay_factor: float = 0.97,
    supersede_confidence_floor: float = 0.30,
    actor: str = "confidence-decay",
) -> dict[str, list]:
    """Phase 7D — daily confidence decay on cold memories.

    For every active, non-graduated memory with ``reference_count == 0``
    AND whose last activity (``last_referenced_at`` if set, else
    ``created_at``) is older than ``idle_days_threshold``: multiply
    confidence by ``daily_decay_factor`` (0.97/day halves confidence in
    ≈23 days of decay, i.e. roughly 2 months after last activity once
    the 30-day idle threshold is counted).

    If the decayed confidence falls below ``supersede_confidence_floor``,
    auto-supersede the memory with note "decayed, no references".
    Supersession is non-destructive — the row stays queryable via
    ``status='superseded'`` for audit.

    Reinforcement already bumps confidence back up, so a decayed memory
    that later gets referenced reverses its trajectory naturally.

    The job is idempotent-per-day: running it multiple times in one day
    decays extra, but the cron runs once/day so this stays on-policy.
    If a day's cron gets skipped, we under-decay (safe direction —
    memories age slower, not faster, than the policy).

    Returns {"decayed": [...], "superseded": [...]} with per-memory
    before/after snapshots for audit/observability.
    """
    from datetime import timedelta

    if not (0.0 < daily_decay_factor < 1.0):
        raise ValueError("daily_decay_factor must be between 0 and 1 (exclusive)")
    if not (0.0 <= supersede_confidence_floor <= 1.0):
        raise ValueError("supersede_confidence_floor must be in [0,1]")

    cutoff_dt = datetime.now(timezone.utc) - timedelta(days=idle_days_threshold)
    cutoff_str = cutoff_dt.strftime("%Y-%m-%d %H:%M:%S")
    now_str = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

    decayed: list[dict] = []
    superseded: list[dict] = []

    with get_connection() as conn:
        # COALESCE(last_referenced_at, created_at) is the effective "last
        # activity" — if a memory was never reinforced, we measure age
        # from creation. "IS NOT status graduated" is enforced to keep
        # graduated memories (which are frozen pointers to entities)
        # out of the decay pool.
        rows = conn.execute(
            "SELECT id, confidence, last_referenced_at, created_at "
            "FROM memories "
            "WHERE status = 'active' "
            "AND COALESCE(reference_count, 0) = 0 "
            "AND COALESCE(last_referenced_at, created_at) < ?",
            (cutoff_str,),
        ).fetchall()

        for r in rows:
            mid = r["id"]
            old_conf = float(r["confidence"])
            new_conf = max(0.0, old_conf * daily_decay_factor)

            if new_conf < supersede_confidence_floor:
                # Auto-supersede
                conn.execute(
                    "UPDATE memories SET status = 'superseded', "
                    "confidence = ?, updated_at = ? WHERE id = ?",
                    (new_conf, now_str, mid),
                )
                superseded.append({
                    "memory_id": mid,
                    "old_confidence": old_conf,
                    "new_confidence": new_conf,
                })
            else:
                conn.execute(
                    "UPDATE memories SET confidence = ?, updated_at = ? WHERE id = ?",
                    (new_conf, now_str, mid),
                )
                decayed.append({
                    "memory_id": mid,
                    "old_confidence": old_conf,
                    "new_confidence": new_conf,
                })

    # Audit rows outside the transaction. We skip per-decay audit because
    # it would be too chatty (potentially hundreds of rows/day for no
    # human value); supersessions ARE audited because those are
    # status-changing events humans may want to review.
    for entry in superseded:
        _audit_memory(
            memory_id=entry["memory_id"],
            action="superseded",
            actor=actor,
            before={"status": "active", "confidence": entry["old_confidence"]},
            after={"status": "superseded", "confidence": entry["new_confidence"]},
            note=f"decayed below floor {supersede_confidence_floor}, no references",
        )

    if decayed or superseded:
        log.info(
            "confidence_decay_run",
            decayed=len(decayed),
            superseded=len(superseded),
            idle_days_threshold=idle_days_threshold,
            daily_decay_factor=daily_decay_factor,
        )
    return {"decayed": decayed, "superseded": superseded}


def expire_stale_candidates(
    max_age_days: int = 14,
) -> list[str]:
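The decay arithmetic in `decay_unreferenced_memories` can be sanity-checked directly: a per-day factor f gives a confidence half-life of ln(0.5)/ln(f) days of decay. A standalone check (not project code):

```python
import math

# Half-life of exponential decay for a given per-day factor: the number
# of decay days until confidence halves. For the Phase 7D default of
# 0.97/day this is about 23 days, on top of the 30 idle days before
# decay starts.
def half_life_days(daily_factor: float) -> float:
    return math.log(0.5) / math.log(daily_factor)

print(round(half_life_days(0.97), 1))  # 22.8
```

This is also a quick way to tune `daily_decay_factor` toward a target half-life when changing the policy.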
@@ -838,3 +1036,545 @@ def _row_to_memory(row) -> Memory:
def _validate_confidence(confidence: float) -> None:
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("Confidence must be between 0.0 and 1.0")


# ---------------------------------------------------------------------
# Phase 7C — Tag canonicalization
# ---------------------------------------------------------------------


def get_tag_distribution(
    active_only: bool = True,
    min_count: int = 1,
) -> dict[str, int]:
    """Return {tag: occurrence_count} across memories for LLM input.

    Used by the canonicalization detector to spot alias clusters like
    {firmware: 23, fw: 5, firmware-control: 3}. Only counts memories
    in the requested status (active by default) so superseded/invalid
    rows don't bias the distribution.
    """
    import json as _json
    counts: dict[str, int] = {}
    query = "SELECT domain_tags FROM memories"
    if active_only:
        query += " WHERE status = 'active'"
    with get_connection() as conn:
        rows = conn.execute(query).fetchall()
    for r in rows:
        tags_raw = r["domain_tags"]
        try:
            tags = _json.loads(tags_raw) if tags_raw else []
        except Exception:
            tags = []
        if not isinstance(tags, list):
            continue
        for t in tags:
            if not isinstance(t, str):
                continue
            key = t.strip().lower()
            if key:
                counts[key] = counts.get(key, 0) + 1
    if min_count > 1:
        counts = {k: v for k, v in counts.items() if v >= min_count}
    return counts
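The counting loop in `get_tag_distribution` can be exercised standalone against fake rows. A minimal sketch (the JSON-array rows below are made up, standing in for the `domain_tags` column):

```python
import json
from collections import Counter

# Standalone sketch of the distribution count: domain_tags is stored as
# a JSON array per memory row; malformed/empty rows are skipped.
rows = ['["firmware", "optics"]', '["fw"]', '["firmware"]', None]

counts = Counter()
for raw in rows:
    try:
        tags = json.loads(raw) if raw else []
    except (TypeError, json.JSONDecodeError):
        tags = []
    for t in tags:
        if isinstance(t, str) and t.strip():
            counts[t.strip().lower()] += 1

print(dict(counts))  # {'firmware': 2, 'optics': 1, 'fw': 1}
```

This {firmware: 2, fw: 1} shape is exactly the alias-cluster signal the canonicalization prompt consumes.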
def apply_tag_alias(
    alias: str,
    canonical: str,
    actor: str = "tag-canon",
) -> dict:
    """Rewrite every active memory's domain_tags: alias → canonical.

    Atomic per-memory. Dedupes within each memory's tag list (so if a
    memory already has both alias AND canonical, we drop the alias and
    keep canonical without duplicating). Writes one audit row per
    touched memory with action="tag_canonicalized" so the full trail
    is recoverable.

    Returns {"memories_touched": int, "alias": ..., "canonical": ...}.
    """
    import json as _json
    alias = (alias or "").strip().lower()
    canonical = (canonical or "").strip().lower()
    if not alias or not canonical:
        raise ValueError("alias and canonical must be non-empty")
    if alias == canonical:
        raise ValueError("alias cannot equal canonical")

    touched: list[tuple[str, list[str], list[str]]] = []

    with get_connection() as conn:
        rows = conn.execute(
            "SELECT id, domain_tags FROM memories WHERE status = 'active'"
        ).fetchall()

        for r in rows:
            raw = r["domain_tags"]
            try:
                tags = _json.loads(raw) if raw else []
            except Exception:
                tags = []
            if not isinstance(tags, list):
                continue
            if alias not in tags:
                continue

            old_tags = [t for t in tags if isinstance(t, str)]
            new_tags: list[str] = []
            for t in old_tags:
                rewritten = canonical if t == alias else t
                if rewritten not in new_tags:
                    new_tags.append(rewritten)
            if new_tags == old_tags:
                continue
            conn.execute(
                "UPDATE memories SET domain_tags = ?, updated_at = CURRENT_TIMESTAMP "
                "WHERE id = ?",
                (_json.dumps(new_tags), r["id"]),
            )
            touched.append((r["id"], old_tags, new_tags))

    # Audit rows outside the transaction
    for mem_id, old_tags, new_tags in touched:
        _audit_memory(
            memory_id=mem_id,
            action="tag_canonicalized",
            actor=actor,
            before={"domain_tags": old_tags},
            after={"domain_tags": new_tags},
            note=f"{alias} → {canonical}",
        )

    if touched:
        log.info("tag_alias_applied", alias=alias, canonical=canonical, memories_touched=len(touched))
    return {
        "memories_touched": len(touched),
        "alias": alias,
        "canonical": canonical,
    }
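The per-memory rewrite at the heart of `apply_tag_alias` — replace alias with canonical while deduping when a memory already carries both — can be isolated and checked. A standalone sketch with a hypothetical tag list:

```python
# Standalone sketch of the rewrite+dedupe step: "fw" becomes "firmware",
# and since "firmware" is already present, the duplicate is dropped.
alias, canonical = "fw", "firmware"
old_tags = ["fw", "firmware", "optics"]

new_tags = []
for t in old_tags:
    rewritten = canonical if t == alias else t
    if rewritten not in new_tags:
        new_tags.append(rewritten)

print(new_tags)  # ['firmware', 'optics']
```

Order of first appearance is preserved, so canonical lands where the alias (or the existing canonical) first occurred.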
def create_tag_alias_proposal(
    alias: str,
    canonical: str,
    confidence: float,
    alias_count: int = 0,
    canonical_count: int = 0,
    reason: str = "",
) -> str | None:
    """Insert a tag_aliases row in status=pending.

    Idempotent: if a pending proposal for (alias, canonical) already
    exists, returns None.
    """
    import json as _json  # noqa: F401 — kept for parity with other helpers
    alias = (alias or "").strip().lower()
    canonical = (canonical or "").strip().lower()
    if not alias or not canonical or alias == canonical:
        return None
    confidence = max(0.0, min(1.0, float(confidence)))

    proposal_id = str(uuid.uuid4())
    with get_connection() as conn:
        existing = conn.execute(
            "SELECT id FROM tag_aliases WHERE alias = ? AND canonical = ? "
            "AND status = 'pending'",
            (alias, canonical),
        ).fetchone()
        if existing:
            return None

        conn.execute(
            "INSERT INTO tag_aliases (id, alias, canonical, status, confidence, "
            "alias_count, canonical_count, reason) "
            "VALUES (?, ?, ?, 'pending', ?, ?, ?, ?)",
            (proposal_id, alias, canonical, confidence,
             int(alias_count), int(canonical_count), reason[:500]),
        )
    log.info(
        "tag_alias_proposed",
        proposal_id=proposal_id,
        alias=alias,
        canonical=canonical,
        confidence=round(confidence, 3),
    )
    return proposal_id


def get_tag_alias_proposals(status: str = "pending", limit: int = 100) -> list[dict]:
    """List tag alias proposals."""
    with get_connection() as conn:
        rows = conn.execute(
            "SELECT * FROM tag_aliases WHERE status = ? "
            "ORDER BY confidence DESC, created_at DESC LIMIT ?",
            (status, limit),
        ).fetchall()
    return [dict(r) for r in rows]


def approve_tag_alias(
    proposal_id: str,
    actor: str = "human-triage",
) -> dict | None:
    """Apply the alias rewrite + mark the proposal approved.

    Returns the apply_tag_alias result dict, or None if the proposal
    is not found or already resolved.
    """
    now_str = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    with get_connection() as conn:
        row = conn.execute(
            "SELECT alias, canonical, status FROM tag_aliases WHERE id = ?",
            (proposal_id,),
        ).fetchone()
        if row is None or row["status"] != "pending":
            return None
        alias, canonical = row["alias"], row["canonical"]

    result = apply_tag_alias(alias, canonical, actor=actor)

    with get_connection() as conn:
        conn.execute(
            "UPDATE tag_aliases SET status = 'approved', resolved_at = ?, "
            "resolved_by = ?, applied_to_memories = ? WHERE id = ?",
            (now_str, actor, result["memories_touched"], proposal_id),
        )
    return result


def reject_tag_alias(proposal_id: str, actor: str = "human-triage") -> bool:
    """Mark a tag alias proposal as rejected without applying it."""
    now_str = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    with get_connection() as conn:
        result = conn.execute(
            "UPDATE tag_aliases SET status = 'rejected', resolved_at = ?, "
            "resolved_by = ? WHERE id = ? AND status = 'pending'",
            (now_str, actor, proposal_id),
        )
    return result.rowcount > 0
# ---------------------------------------------------------------------
# Phase 7A — Memory Consolidation: merge-candidate lifecycle
# ---------------------------------------------------------------------
#
# The detector (scripts/memory_dedup.py) writes proposals into
# memory_merge_candidates. The triage UI lists pending rows, a human
# reviews, and on approve we execute the merge here — never at detect
# time. This keeps the audit trail clean: every mutation is a human
# decision.


def create_merge_candidate(
    memory_ids: list[str],
    similarity: float,
    proposed_content: str,
    proposed_memory_type: str,
    proposed_project: str,
    proposed_tags: list[str] | None = None,
    proposed_confidence: float = 0.6,
    reason: str = "",
) -> str | None:
    """Insert a merge-candidate row. Returns the new row id, or None if
    a pending candidate already covers this exact set of memory ids
    (idempotent scan — re-running the detector doesn't double-create)."""
    import json as _json

    if not memory_ids or len(memory_ids) < 2:
        raise ValueError("merge candidate requires at least 2 memory_ids")

    memory_ids_sorted = sorted(set(memory_ids))
    memory_ids_json = _json.dumps(memory_ids_sorted)
    tags_json = _json.dumps(_normalize_tags(proposed_tags))
    candidate_id = str(uuid.uuid4())

    with get_connection() as conn:
        # Idempotency: same sorted-id set already pending? skip.
        existing = conn.execute(
            "SELECT id FROM memory_merge_candidates "
            "WHERE status = 'pending' AND memory_ids = ?",
            (memory_ids_json,),
        ).fetchone()
        if existing:
            return None

        conn.execute(
            "INSERT INTO memory_merge_candidates "
            "(id, status, memory_ids, similarity, proposed_content, "
            "proposed_memory_type, proposed_project, proposed_tags, "
            "proposed_confidence, reason) "
            "VALUES (?, 'pending', ?, ?, ?, ?, ?, ?, ?, ?)",
            (
                candidate_id, memory_ids_json, float(similarity or 0.0),
                (proposed_content or "")[:2000],
                (proposed_memory_type or "knowledge")[:50],
                (proposed_project or "")[:100],
                tags_json,
                max(0.0, min(1.0, float(proposed_confidence))),
                (reason or "")[:500],
            ),
        )
    log.info(
        "merge_candidate_created",
        candidate_id=candidate_id,
        memory_count=len(memory_ids_sorted),
        similarity=round(similarity, 4),
    )
    return candidate_id
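The idempotency check in `create_merge_candidate` hinges on serializing the sorted, deduped id set, so the detector can re-run with ids in any order and still hit the same pending row. A standalone sketch with hypothetical ids:

```python
import json

# Standalone sketch of the idempotency key: sorted, deduped id set
# serialized to JSON. Order and duplicates in the input don't matter.
def idempotency_key(memory_ids: list[str]) -> str:
    return json.dumps(sorted(set(memory_ids)))

k1 = idempotency_key(["b2", "a1", "b2"])
k2 = idempotency_key(["a1", "b2"])
print(k1 == k2)  # True
```

Because the key is a plain string, the `WHERE status = 'pending' AND memory_ids = ?` lookup is an exact match, with no JSON parsing on the SQL side.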
def get_merge_candidates(status: str = "pending", limit: int = 100) -> list[dict]:
    """List merge candidates with their source memories inlined."""
    import json as _json

    with get_connection() as conn:
        rows = conn.execute(
            "SELECT * FROM memory_merge_candidates "
            "WHERE status = ? ORDER BY created_at DESC LIMIT ?",
            (status, limit),
        ).fetchall()

        out = []
        for r in rows:
            try:
                mem_ids = _json.loads(r["memory_ids"] or "[]")
            except Exception:
                mem_ids = []
            try:
                tags = _json.loads(r["proposed_tags"] or "[]")
            except Exception:
                tags = []

            sources = []
            for mid in mem_ids:
                srow = conn.execute(
                    "SELECT id, memory_type, content, project, confidence, "
                    "status, reference_count, domain_tags, valid_until "
                    "FROM memories WHERE id = ?",
                    (mid,),
                ).fetchone()
                if srow:
                    try:
                        stags = _json.loads(srow["domain_tags"] or "[]")
                    except Exception:
                        stags = []
                    sources.append({
                        "id": srow["id"],
                        "memory_type": srow["memory_type"],
                        "content": srow["content"],
                        "project": srow["project"] or "",
                        "confidence": srow["confidence"],
                        "status": srow["status"],
                        "reference_count": int(srow["reference_count"] or 0),
                        "domain_tags": stags,
                        "valid_until": srow["valid_until"] or "",
                    })

            out.append({
                "id": r["id"],
                "status": r["status"],
                "memory_ids": mem_ids,
                "similarity": r["similarity"],
                "proposed_content": r["proposed_content"] or "",
                "proposed_memory_type": r["proposed_memory_type"] or "knowledge",
                "proposed_project": r["proposed_project"] or "",
                "proposed_tags": tags,
                "proposed_confidence": r["proposed_confidence"],
                "reason": r["reason"] or "",
                "created_at": r["created_at"],
                "resolved_at": r["resolved_at"],
                "resolved_by": r["resolved_by"],
                "result_memory_id": r["result_memory_id"],
                "sources": sources,
            })
        return out

def reject_merge_candidate(candidate_id: str, actor: str = "human-triage", note: str = "") -> bool:
    """Mark a merge candidate as rejected. Source memories stay untouched."""
    now_str = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    with get_connection() as conn:
        result = conn.execute(
            "UPDATE memory_merge_candidates "
            "SET status = 'rejected', resolved_at = ?, resolved_by = ? "
            "WHERE id = ? AND status = 'pending'",
            (now_str, actor, candidate_id),
        )
        if result.rowcount == 0:
            return False
    log.info("merge_candidate_rejected", candidate_id=candidate_id, actor=actor, note=note[:100])
    return True

def merge_memories(
    candidate_id: str,
    actor: str = "human-triage",
    override_content: str | None = None,
    override_tags: list[str] | None = None,
) -> str | None:
    """Execute an approved merge candidate.

    1. Validate all source memories are still status=active
    2. Create the new merged memory (status=active)
    3. Mark each source status=superseded with an audit row pointing at
       the new merged id
    4. Mark the candidate status=approved, record result_memory_id
    5. Write a consolidated audit row on the new memory

    Returns the new merged memory's id, or None if the candidate cannot
    be executed (already resolved, source tampered, etc.).

    ``override_content`` and ``override_tags`` let the UI pass the human's
    edits before clicking approve.
    """
    import json as _json

    now_str = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

    with get_connection() as conn:
        row = conn.execute(
            "SELECT * FROM memory_merge_candidates WHERE id = ?",
            (candidate_id,),
        ).fetchone()
        if row is None or row["status"] != "pending":
            log.warning("merge_candidate_not_pending", candidate_id=candidate_id)
            return None

        try:
            mem_ids = _json.loads(row["memory_ids"] or "[]")
        except Exception:
            mem_ids = []
        if not mem_ids or len(mem_ids) < 2:
            log.warning("merge_candidate_invalid_memory_ids", candidate_id=candidate_id)
            return None

        # Snapshot sources + validate all active
        source_rows = []
        for mid in mem_ids:
            srow = conn.execute(
                "SELECT * FROM memories WHERE id = ?", (mid,)
            ).fetchone()
            if srow is None or srow["status"] != "active":
                log.warning(
                    "merge_source_not_active",
                    candidate_id=candidate_id,
                    memory_id=mid,
                    actual_status=(srow["status"] if srow else "missing"),
                )
                return None
            source_rows.append(srow)

        # Build merged memory fields — prefer human overrides, then proposed
        content = (override_content or row["proposed_content"] or "").strip()
        if not content:
            log.warning("merge_candidate_empty_content", candidate_id=candidate_id)
            return None

        merged_type = (row["proposed_memory_type"] or source_rows[0]["memory_type"]).lower()
        if merged_type not in MEMORY_TYPES:
            merged_type = source_rows[0]["memory_type"]

        merged_project = row["proposed_project"] or source_rows[0]["project"] or ""
        merged_project = resolve_project_name(merged_project)

        # Tags: override wins, else proposed, else union of sources
        if override_tags is not None:
            merged_tags = _normalize_tags(override_tags)
        else:
            try:
                proposed_tags = _json.loads(row["proposed_tags"] or "[]")
            except Exception:
                proposed_tags = []
            if proposed_tags:
                merged_tags = _normalize_tags(proposed_tags)
            else:
                union: list[str] = []
                for srow in source_rows:
                    try:
                        stags = _json.loads(srow["domain_tags"] or "[]")
                    except Exception:
                        stags = []
                    for t in stags:
                        if isinstance(t, str) and t and t not in union:
                            union.append(t)
                merged_tags = union

        # confidence = max; reference_count = sum
        merged_confidence = max(float(s["confidence"]) for s in source_rows)
        total_refs = sum(int(s["reference_count"] or 0) for s in source_rows)

        # valid_until: if any source is permanent (None/empty), merged is permanent.
        # Otherwise take the latest (lexical compare on ISO dates works).
        merged_vu: str | None
        has_permanent = any(not (s["valid_until"] or "").strip() for s in source_rows)
        if has_permanent:
            merged_vu = None
        else:
            merged_vu = max((s["valid_until"] or "").strip() for s in source_rows) or None

        new_id = str(uuid.uuid4())
        tags_json = _json.dumps(merged_tags)

        conn.execute(
            "INSERT INTO memories (id, memory_type, content, project, "
            "source_chunk_id, confidence, status, domain_tags, valid_until, "
            "reference_count, last_referenced_at) "
            "VALUES (?, ?, ?, ?, NULL, ?, 'active', ?, ?, ?, ?)",
            (
                new_id, merged_type, content[:2000], merged_project,
                merged_confidence, tags_json, merged_vu, total_refs, now_str,
            ),
        )

        # Mark sources superseded
        for srow in source_rows:
            conn.execute(
                "UPDATE memories SET status = 'superseded', updated_at = ? "
                "WHERE id = ?",
                (now_str, srow["id"]),
            )

        # Mark candidate approved
        conn.execute(
            "UPDATE memory_merge_candidates SET status = 'approved', "
            "resolved_at = ?, resolved_by = ?, result_memory_id = ? WHERE id = ?",
            (now_str, actor, new_id, candidate_id),
        )

    # Audit rows (outside the transaction; fail-open via _audit_memory)
    _audit_memory(
        memory_id=new_id,
        action="created_via_merge",
        actor=actor,
        after={
            "memory_type": merged_type,
            "content": content,
            "project": merged_project,
            "confidence": merged_confidence,
            "domain_tags": merged_tags,
            "reference_count": total_refs,
            "merged_from": list(mem_ids),
            "merge_candidate_id": candidate_id,
        },
        note=f"merged {len(mem_ids)} sources via candidate {candidate_id[:8]}",
    )
    for srow in source_rows:
        _audit_memory(
            memory_id=srow["id"],
            action="superseded",
            actor=actor,
            before={"status": "active", "content": srow["content"]},
            after={"status": "superseded", "superseded_by": new_id},
            note=f"merged into {new_id}",
        )

    log.info(
        "merge_executed",
        candidate_id=candidate_id,
        result_memory_id=new_id,
        source_count=len(source_rows),
        actor=actor,
    )
    return new_id
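The field-combination rules in `merge_memories` (confidence = max of sources, reference_count = sum, valid_until = permanent if any source is permanent, else the latest expiry) can be sketched in isolation. This is a minimal illustration, assuming sources are plain dicts with the same keys as `memories` rows; `combine_merge_fields` is a hypothetical helper name, not a function in the codebase:

```python
# Sketch of merge_memories' field-combination rules, operating on
# plain dicts shaped like rows of the `memories` table.
def combine_merge_fields(sources: list[dict]) -> dict:
    confidence = max(float(s["confidence"]) for s in sources)
    reference_count = sum(int(s.get("reference_count") or 0) for s in sources)
    # Any permanent source (empty valid_until) makes the merge permanent;
    # otherwise keep the latest expiry (lexical compare works on ISO dates).
    vus = [(s.get("valid_until") or "").strip() for s in sources]
    valid_until = None if any(not v for v in vus) else max(vus)
    return {
        "confidence": confidence,
        "reference_count": reference_count,
        "valid_until": valid_until,
    }

merged = combine_merge_fields([
    {"confidence": 0.7, "reference_count": 2, "valid_until": "2026-05-01"},
    {"confidence": 0.9, "reference_count": 1, "valid_until": ""},
])
# confidence 0.9, reference_count 3, valid_until None (permanent wins)
```

Keeping the max confidence rather than the mean means a merge never weakens the strongest source; summing reference counts preserves the decay exemption the sources had earned.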
88
src/atocore/memory/similarity.py
Normal file
@@ -0,0 +1,88 @@
"""Phase 7A (Memory Consolidation): semantic similarity helpers.

Thin wrapper over ``atocore.retrieval.embeddings`` that exposes
pairwise + batch cosine similarity on normalized embeddings. Used by
the dedup detector to cluster near-duplicate active memories.

Embeddings from ``embed_texts()`` are already L2-normalized, so cosine
similarity reduces to a dot product — no extra normalization needed.
"""

from __future__ import annotations

from atocore.retrieval.embeddings import embed_texts


def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity on already-normalized vectors, clamped to [0, 1].

    The embeddings come from paraphrase-multilingual-MiniLM, which is
    unit-norm, and we never want negative values leaking into thresholds.
    """
    return max(0.0, min(1.0, _dot(a, b)))


def compute_memory_similarity(text_a: str, text_b: str) -> float:
    """Return cosine similarity of two memory contents in [0, 1].

    Convenience helper for one-off checks + tests. For batch work (the
    dedup detector), use ``embed_texts()`` directly and compute the
    similarity matrix yourself to avoid re-embedding shared texts.
    """
    if not text_a or not text_b:
        return 0.0
    vecs = embed_texts([text_a, text_b])
    return cosine(vecs[0], vecs[1])


def similarity_matrix(texts: list[str]) -> list[list[float]]:
    """N×N cosine similarity matrix. Diagonal is 1.0, symmetric."""
    if not texts:
        return []
    vecs = embed_texts(texts)
    n = len(vecs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0
        for j in range(i + 1, n):
            s = cosine(vecs[i], vecs[j])
            matrix[i][j] = s
            matrix[j][i] = s
    return matrix


def cluster_by_threshold(texts: list[str], threshold: float) -> list[list[int]]:
    """Greedy transitive clustering: if sim(i, j) >= threshold, merge.

    Returns a list of clusters, each a list of indices into ``texts``.
    Singletons are included. Used by the dedup detector to collapse
    A~B~C into one merge proposal rather than three pair proposals.
    """
    if not texts:
        return []
    matrix = similarity_matrix(texts)
    n = len(texts)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x: int, y: int) -> None:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] >= threshold:
                union(i, j)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
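The union-find clustering in `cluster_by_threshold` can be exercised without the embedding model. The sketch below swaps `embed_texts` for Jaccard token overlap — an illustrative stand-in, not the module's actual metric — to show how transitive merging by threshold works:

```python
# Self-contained sketch of threshold clustering with union-find,
# using Jaccard token overlap as a stand-in for embedding cosine.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cluster(texts: list[str], threshold: float) -> list[list[int]]:
    n = len(texts)
    parent = list(range(n))

    def find(x: int) -> int:
        # Path halving keeps the parent chain short
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(texts[i], texts[j]) >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

texts = ["a b c d", "a b c e", "x y z w"]
# overlap("a b c d", "a b c e") = 3/5 = 0.6 >= 0.5, so 0 and 1 cluster
clusters = cluster(texts, threshold=0.5)
```

Because union-find is transitive, a chain A~B, B~C lands all three in one group even if sim(A, C) falls below the threshold — which is exactly the "one merge proposal instead of three pair proposals" behavior the docstring describes.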
@@ -251,6 +251,69 @@ def _apply_migrations(conn: sqlite3.Connection) -> None:
        "CREATE INDEX IF NOT EXISTS idx_interactions_created_at ON interactions(created_at)"
    )

    # Phase 7A (Memory Consolidation — "sleep cycle"): merge candidates.
    # When the dedup detector finds a cluster of semantically similar active
    # memories within the same (project, memory_type) bucket, it drafts a
    # unified content via LLM and writes a proposal here. The triage UI
    # surfaces these for human approval. On approve, source memories become
    # status=superseded and a new merged memory is created.
    # memory_ids is a JSON array (length >= 2) of the source memory ids.
    # proposed_* hold the LLM's draft; a human can edit before approve.
    # result_memory_id is filled on approve with the new merged memory's id.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS memory_merge_candidates (
            id TEXT PRIMARY KEY,
            status TEXT DEFAULT 'pending',
            memory_ids TEXT NOT NULL,
            similarity REAL,
            proposed_content TEXT,
            proposed_memory_type TEXT,
            proposed_project TEXT,
            proposed_tags TEXT DEFAULT '[]',
            proposed_confidence REAL,
            reason TEXT DEFAULT '',
            created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
            resolved_at DATETIME,
            resolved_by TEXT,
            result_memory_id TEXT
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_mmc_status ON memory_merge_candidates(status)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_mmc_created_at ON memory_merge_candidates(created_at)"
    )

    # Phase 7C (Memory Consolidation — tag canonicalization): alias → canonical
    # map for domain_tags. A weekly LLM pass proposes rows here; high-confidence
    # ones auto-apply (rewrite domain_tags across all memories), low-confidence
    # ones stay pending for human approval. Immutable history: resolved rows
    # keep status=approved/rejected; the same alias can re-appear with a new
    # id if the tag reaches a different canonical later.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tag_aliases (
            id TEXT PRIMARY KEY,
            alias TEXT NOT NULL,
            canonical TEXT NOT NULL,
            status TEXT DEFAULT 'pending',
            confidence REAL DEFAULT 0.0,
            alias_count INTEGER DEFAULT 0,
            canonical_count INTEGER DEFAULT 0,
            reason TEXT DEFAULT '',
            applied_to_memories INTEGER DEFAULT 0,
            created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
            resolved_at DATETIME,
            resolved_by TEXT
        )
        """
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tag_aliases_status ON tag_aliases(status)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tag_aliases_alias ON tag_aliases(alias)")


def _column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
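The migration above is idempotent (`CREATE TABLE IF NOT EXISTS` / `CREATE INDEX IF NOT EXISTS`), so it can run on every startup. A minimal sketch of the resulting detector-to-triage flow against an in-memory SQLite database, with the table trimmed to the columns used here:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Trimmed-down version of the memory_merge_candidates migration
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS memory_merge_candidates (
        id TEXT PRIMARY KEY,
        status TEXT DEFAULT 'pending',
        memory_ids TEXT NOT NULL,
        similarity REAL,
        result_memory_id TEXT
    )
    """
)
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_mmc_status ON memory_merge_candidates(status)"
)

# The detector writes a proposal; memory_ids is a JSON array (length >= 2)
conn.execute(
    "INSERT INTO memory_merge_candidates (id, memory_ids, similarity) VALUES (?, ?, ?)",
    ("cand-1", json.dumps(["mem-a", "mem-b"]), 0.93),
)

# The triage UI lists pending candidates and decodes the id array
rows = conn.execute(
    "SELECT * FROM memory_merge_candidates WHERE status = 'pending'"
).fetchall()
pending_ids = [json.loads(r["memory_ids"]) for r in rows]
```

Storing the id list as a JSON TEXT column keeps the schema flat; the trade-off is that lookups by member memory id require decoding in Python rather than a SQL join.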
251
tests/test_confidence_decay.py
Normal file
@@ -0,0 +1,251 @@
"""Phase 7D — confidence decay tests.

Covers:
- idle unreferenced memories decay at the expected rate
- fresh / reinforced memories are untouched
- below floor → auto-supersede with audit
- graduated memories exempt
- reinforcement reverses decay (integration with Phase 9 Commit B)
"""

from __future__ import annotations

from datetime import datetime, timedelta, timezone

import pytest

from atocore.memory.service import (
    create_memory,
    decay_unreferenced_memories,
    get_memory_audit,
    reinforce_memory,
)
from atocore.models.database import get_connection, init_db


def _force_old(mem_id: str, days_ago: int) -> None:
    """Force last_referenced_at and created_at to N days in the past."""
    ts = (datetime.now(timezone.utc) - timedelta(days=days_ago)).strftime("%Y-%m-%d %H:%M:%S")
    with get_connection() as conn:
        conn.execute(
            "UPDATE memories SET last_referenced_at = ?, created_at = ? WHERE id = ?",
            (ts, ts, mem_id),
        )


def _set_confidence(mem_id: str, c: float) -> None:
    with get_connection() as conn:
        conn.execute("UPDATE memories SET confidence = ? WHERE id = ?", (c, mem_id))


def _set_reference_count(mem_id: str, n: int) -> None:
    with get_connection() as conn:
        conn.execute("UPDATE memories SET reference_count = ? WHERE id = ?", (n, mem_id))


def _get(mem_id: str) -> dict:
    with get_connection() as conn:
        row = conn.execute("SELECT * FROM memories WHERE id = ?", (mem_id,)).fetchone()
        return dict(row) if row else {}


def _set_status(mem_id: str, status: str) -> None:
    with get_connection() as conn:
        conn.execute("UPDATE memories SET status = ? WHERE id = ?", (status, mem_id))


# --- Basic decay mechanics ---


def test_decay_applies_to_idle_unreferenced(tmp_data_dir):
    init_db()
    m = create_memory("knowledge", "cold fact", confidence=0.8)
    _force_old(m.id, days_ago=60)
    _set_reference_count(m.id, 0)

    result = decay_unreferenced_memories()
    assert len(result["decayed"]) == 1
    assert result["decayed"][0]["memory_id"] == m.id

    row = _get(m.id)
    # 0.8 * 0.97 = 0.776
    assert row["confidence"] == pytest.approx(0.776)
    assert row["status"] == "active"  # still above floor


def test_decay_skips_fresh_memory(tmp_data_dir):
    """A memory created today shouldn't decay even if reference_count=0."""
    init_db()
    m = create_memory("knowledge", "just-created fact", confidence=0.8)
    # Don't force old — it's fresh
    result = decay_unreferenced_memories()
    assert not any(e["memory_id"] == m.id for e in result["decayed"])
    assert not any(e["memory_id"] == m.id for e in result["superseded"])

    row = _get(m.id)
    assert row["confidence"] == pytest.approx(0.8)


def test_decay_skips_reinforced_memory(tmp_data_dir):
    """Any reinforcement protects the memory from decay."""
    init_db()
    m = create_memory("knowledge", "referenced fact", confidence=0.8)
    _force_old(m.id, days_ago=90)
    _set_reference_count(m.id, 1)  # just one reference is enough

    result = decay_unreferenced_memories()
    assert not any(e["memory_id"] == m.id for e in result["decayed"])

    row = _get(m.id)
    assert row["confidence"] == pytest.approx(0.8)


# --- Auto-supersede at floor ---


def test_decay_supersedes_below_floor(tmp_data_dir):
    init_db()
    m = create_memory("knowledge", "very cold fact", confidence=0.31)
    _force_old(m.id, days_ago=60)
    _set_reference_count(m.id, 0)

    # 0.31 * 0.97 = 0.3007, which is still above the default floor 0.30.
    # Drop it a hair lower to cross the floor in one step.
    _set_confidence(m.id, 0.305)

    result = decay_unreferenced_memories(supersede_confidence_floor=0.30)
    # 0.305 * 0.97 = 0.29585 → below 0.30, supersede
    assert len(result["superseded"]) == 1
    assert result["superseded"][0]["memory_id"] == m.id

    row = _get(m.id)
    assert row["status"] == "superseded"
    assert row["confidence"] < 0.30


def test_supersede_writes_audit_row(tmp_data_dir):
    init_db()
    m = create_memory("knowledge", "will decay out", confidence=0.305)
    _force_old(m.id, days_ago=60)
    _set_reference_count(m.id, 0)

    decay_unreferenced_memories(supersede_confidence_floor=0.30)

    audit = get_memory_audit(m.id)
    actions = [a["action"] for a in audit]
    assert "superseded" in actions
    entry = next(a for a in audit if a["action"] == "superseded")
    assert entry["actor"] == "confidence-decay"
    assert "decayed below floor" in entry["note"]


# --- Exemptions ---


def test_decay_skips_graduated_memory(tmp_data_dir):
    """Graduated memories are frozen pointers to entities — never decay."""
    init_db()
    m = create_memory("knowledge", "graduated fact", confidence=0.8)
    _force_old(m.id, days_ago=90)
    _set_reference_count(m.id, 0)
    _set_status(m.id, "graduated")

    result = decay_unreferenced_memories()
    assert not any(e["memory_id"] == m.id for e in result["decayed"])

    row = _get(m.id)
    assert row["confidence"] == pytest.approx(0.8)  # unchanged


def test_decay_skips_superseded_memory(tmp_data_dir):
    """Already superseded memories don't decay further."""
    init_db()
    m = create_memory("knowledge", "old news", confidence=0.5)
    _force_old(m.id, days_ago=90)
    _set_reference_count(m.id, 0)
    _set_status(m.id, "superseded")

    result = decay_unreferenced_memories()
    assert not any(e["memory_id"] == m.id for e in result["decayed"])


# --- Reversibility ---


def test_reinforcement_reverses_decay(tmp_data_dir):
    """A memory that decayed then got reinforced comes back up."""
    init_db()
    m = create_memory("knowledge", "will come back", confidence=0.8)
    _force_old(m.id, days_ago=60)
    _set_reference_count(m.id, 0)

    decay_unreferenced_memories()
    # Now at 0.776
    reinforce_memory(m.id, confidence_delta=0.05)
    row = _get(m.id)
    assert row["confidence"] == pytest.approx(0.826)
    assert row["reference_count"] >= 1


def test_reinforced_memory_no_longer_decays(tmp_data_dir):
    """Once reinforce_memory bumps reference_count, decay skips it."""
    init_db()
    m = create_memory("knowledge", "protected", confidence=0.8)
    _force_old(m.id, days_ago=90)
    # Simulate reinforcement
    reinforce_memory(m.id)

    result = decay_unreferenced_memories()
    assert not any(e["memory_id"] == m.id for e in result["decayed"])


# --- Parameter validation ---


def test_decay_rejects_invalid_factor(tmp_data_dir):
    init_db()
    with pytest.raises(ValueError):
        decay_unreferenced_memories(daily_decay_factor=1.0)
    with pytest.raises(ValueError):
        decay_unreferenced_memories(daily_decay_factor=0.0)
    with pytest.raises(ValueError):
        decay_unreferenced_memories(daily_decay_factor=-0.5)


def test_decay_rejects_invalid_floor(tmp_data_dir):
    init_db()
    with pytest.raises(ValueError):
        decay_unreferenced_memories(supersede_confidence_floor=1.5)
    with pytest.raises(ValueError):
        decay_unreferenced_memories(supersede_confidence_floor=-0.1)


# --- Threshold tuning ---


def test_decay_threshold_tight_excludes_newer(tmp_data_dir):
    """With idle_days_threshold=90, a 60-day-old memory should NOT decay."""
    init_db()
    m = create_memory("knowledge", "60-day-old", confidence=0.8)
    _force_old(m.id, days_ago=60)
    _set_reference_count(m.id, 0)

    result = decay_unreferenced_memories(idle_days_threshold=90)
    assert not any(e["memory_id"] == m.id for e in result["decayed"])


# --- Idempotency-ish (multiple runs apply additional decay) ---


def test_decay_stacks_across_runs(tmp_data_dir):
    """Running decay twice (simulating two days) compounds the factor."""
    init_db()
    m = create_memory("knowledge", "aging fact", confidence=0.8)
    _force_old(m.id, days_ago=60)
    _set_reference_count(m.id, 0)

    decay_unreferenced_memories()
    decay_unreferenced_memories()
    row = _get(m.id)
    # 0.8 * 0.97 * 0.97 = 0.75272
    assert row["confidence"] == pytest.approx(0.75272, rel=1e-4)
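The arithmetic these tests pin down (0.8 × 0.97 = 0.776 after one run, 0.75272 after two, and 0.305 × 0.97 crossing a 0.30 floor) is plain exponential decay. A standalone sketch of that math; `decay_after_runs` is a hypothetical helper name, not the service function under test:

```python
# Exponential confidence decay as exercised by the tests above:
# each decay run multiplies confidence by the daily factor.
def decay_after_runs(confidence: float, daily_decay_factor: float, runs: int) -> float:
    # Mirror the parameter validation the tests expect from the service
    if not (0.0 < daily_decay_factor < 1.0):
        raise ValueError("daily_decay_factor must be in (0, 1)")
    return confidence * daily_decay_factor ** runs

one = decay_after_runs(0.8, 0.97, 1)    # 0.776
two = decay_after_runs(0.8, 0.97, 2)    # 0.75272
crossed = decay_after_runs(0.305, 0.97, 1) < 0.30  # True: falls below the floor
```

At a 0.97 daily factor an untouched memory loses about half its confidence in roughly 23 days (0.97^23 ≈ 0.50), which is the time scale the 60- and 90-day idle thresholds are tuned against.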
198
tests/test_inject_context_hook.py
Normal file
@@ -0,0 +1,198 @@
"""Tests for deploy/hooks/inject_context.py — Claude Code UserPromptSubmit hook.

These are process-level tests: we run the actual script with subprocess,
feed it stdin, and check the exit code + stdout shape. The hook must:
- always exit 0 (never block a user prompt)
- emit valid hookSpecificOutput JSON on success
- fail open (empty output) on network errors, bad stdin, kill-switch
- respect the short-prompt filter
"""

from __future__ import annotations

import json
import os
import subprocess
import sys
from pathlib import Path

import pytest

HOOK = Path(__file__).resolve().parent.parent / "deploy" / "hooks" / "inject_context.py"


def _run_hook(stdin_json: dict | str, env_overrides: dict | None = None, timeout: float = 10) -> tuple[int, str, str]:
    env = os.environ.copy()
    # Force kill switch off unless the test overrides
    env.pop("ATOCORE_CONTEXT_DISABLED", None)
    if env_overrides:
        env.update(env_overrides)
    stdin = stdin_json if isinstance(stdin_json, str) else json.dumps(stdin_json)
    proc = subprocess.run(
        [sys.executable, str(HOOK)],
        input=stdin, text=True,
        capture_output=True, timeout=timeout,
        env=env,
    )
    return proc.returncode, proc.stdout, proc.stderr


def test_hook_exit_0_on_success_or_failure():
    """Canonical contract: the hook never blocks a prompt. Even with a
    bogus URL we must exit 0 with empty stdout (fail-open)."""
    code, stdout, stderr = _run_hook(
        {
            "prompt": "What's the p04-gigabit current status?",
            "cwd": "/tmp",
            "session_id": "t",
            "hook_event_name": "UserPromptSubmit",
        },
        env_overrides={
            "ATOCORE_URL": "http://127.0.0.1:1",  # unreachable
            "ATOCORE_CONTEXT_TIMEOUT": "1",
        },
    )
    assert code == 0
    # stdout is empty (fail-open) — no hookSpecificOutput emitted
    assert stdout.strip() == ""
    assert "atocore unreachable" in stderr or "request failed" in stderr


def test_hook_kill_switch():
    code, stdout, stderr = _run_hook(
        {"prompt": "hello world is this a thing", "cwd": "", "session_id": "t"},
        env_overrides={"ATOCORE_CONTEXT_DISABLED": "1"},
    )
    assert code == 0
    assert stdout.strip() == ""


def test_hook_ignores_short_prompt():
    code, stdout, _ = _run_hook(
        {"prompt": "ok", "cwd": "", "session_id": "t"},
        env_overrides={"ATOCORE_URL": "http://127.0.0.1:1"},
    )
    assert code == 0
    # No network call attempted; empty output
    assert stdout.strip() == ""


def test_hook_ignores_xml_prompt():
    """System/meta prompts starting with '<' should be skipped."""
    code, stdout, _ = _run_hook(
        {"prompt": "<system>do something</system>", "cwd": "", "session_id": "t"},
        env_overrides={"ATOCORE_URL": "http://127.0.0.1:1"},
    )
    assert code == 0
    assert stdout.strip() == ""


def test_hook_handles_bad_stdin():
    code, stdout, stderr = _run_hook("not-json-at-all")
    assert code == 0
    assert stdout.strip() == ""
    assert "bad stdin" in stderr


def test_hook_handles_empty_stdin():
    code, stdout, _ = _run_hook("")
    assert code == 0
    assert stdout.strip() == ""


def test_hook_success_shape_with_mock_server(monkeypatch, tmp_path):
    """When the API returns a pack, the hook emits valid
    hookSpecificOutput JSON wrapping it."""
    # Start a tiny HTTP server on localhost that returns a fake pack
    import http.server
    import json as _json
    import threading

    pack = "Trusted State: foo=bar"

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_POST(self):  # noqa: N802
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            body = _json.dumps({"formatted_context": pack}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *a, **kw):
            pass

    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    port = server.server_address[1]
    t = threading.Thread(target=server.serve_forever, daemon=True)
    t.start()
    try:
        code, stdout, stderr = _run_hook(
            {
                "prompt": "What do we know about p04?",
                "cwd": "",
                "session_id": "t",
                "hook_event_name": "UserPromptSubmit",
            },
            env_overrides={
                "ATOCORE_URL": f"http://127.0.0.1:{port}",
                "ATOCORE_CONTEXT_TIMEOUT": "5",
            },
            timeout=15,
        )
    finally:
        server.shutdown()

    assert code == 0, stderr
    assert stdout.strip(), "expected JSON output with context"
    out = json.loads(stdout)
    hso = out.get("hookSpecificOutput", {})
    assert hso.get("hookEventName") == "UserPromptSubmit"
    assert pack in hso.get("additionalContext", "")
    assert "AtoCore-injected context" in hso.get("additionalContext", "")


def test_hook_project_inference_from_cwd(monkeypatch):
    """The hook should map a known cwd to a project slug and send it in
    the /context/build payload."""
    import http.server
    import json as _json
    import threading

    captured_body: dict = {}

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_POST(self):  # noqa: N802
            n = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(n)
            captured_body.update(_json.loads(body.decode()))
            out = _json.dumps({"formatted_context": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(out)))
            self.end_headers()
            self.wfile.write(out)

        def log_message(self, *a, **kw):
            pass

    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    port = server.server_address[1]
    t = threading.Thread(target=server.serve_forever, daemon=True)
    t.start()
    try:
        _run_hook(
            {
                "prompt": "Is this being tested properly",
                "cwd": "C:\\Users\\antoi\\ATOCore",
                "session_id": "t",
            },
            env_overrides={
                "ATOCORE_URL": f"http://127.0.0.1:{port}",
                "ATOCORE_CONTEXT_TIMEOUT": "5",
            },
        )
    finally:
        server.shutdown()

    # Hook should have inferred project="atocore" from the ATOCore cwd
    assert captured_body.get("project") == "atocore"
    assert captured_body.get("prompt", "").startswith("Is this being tested")
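The output contract these tests enforce (empty stdout to fail open, otherwise a `hookSpecificOutput` JSON object whose `additionalContext` carries the pack behind an "AtoCore-injected context" marker) can be sketched as a tiny helper. `build_hook_output` is a hypothetical name for illustration, not the actual function inside deploy/hooks/inject_context.py, and the exact prefix wording is an assumption beyond the substrings the tests check:

```python
import json

def build_hook_output(formatted_context: str) -> str:
    """Wrap a context pack in Claude Code's UserPromptSubmit hook JSON.

    Returns "" for an empty pack: printing nothing (and exiting 0) is
    how the hook fails open without blocking the user's prompt.
    """
    if not formatted_context.strip():
        return ""
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": "AtoCore-injected context:\n" + formatted_context,
        }
    })

out = build_hook_output("Trusted State: foo=bar")   # JSON string
empty = build_hook_output("")                        # "" (fail-open)
```

Keeping the failure path as "print nothing, exit 0" rather than a non-zero exit is what lets every error case in the tests above share a single assertion shape.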
496
tests/test_memory_dedup.py
Normal file
@@ -0,0 +1,496 @@
"""Phase 7A — memory consolidation tests.

Covers:
- similarity helpers (cosine bounds, matrix symmetry, clustering)
- _dedup_prompt parser / normalizer robustness
- create_merge_candidate idempotency
- get_merge_candidates inlines source memories
- merge_memories end-to-end happy path (sources → superseded,
  new merged memory active, audit rows, result_memory_id)
- reject_merge_candidate leaves sources untouched
"""

from __future__ import annotations

import pytest

from atocore.memory._dedup_prompt import (
    TIER2_SYSTEM_PROMPT,
    build_tier2_user_message,
    normalize_merge_verdict,
    parse_merge_verdict,
)
from atocore.memory.service import (
    create_memory,
    create_merge_candidate,
    get_memory_audit,
    get_merge_candidates,
    merge_memories,
    reject_merge_candidate,
)
from atocore.memory.similarity import (
    cluster_by_threshold,
    compute_memory_similarity,
    cosine,
    similarity_matrix,
)
from atocore.models.database import get_connection, init_db


# --- Similarity helpers ---


def test_cosine_bounds():
    assert cosine([1.0, 0.0], [1.0, 0.0]) == pytest.approx(1.0)
    assert cosine([1.0, 0.0], [0.0, 1.0]) == pytest.approx(0.0)
    # Negative dot product clamped to 0
    assert cosine([1.0, 0.0], [-1.0, 0.0]) == 0.0


def test_compute_memory_similarity_identical_high():
    s = compute_memory_similarity("the sky is blue", "the sky is blue")
    assert 0.99 <= s <= 1.0


def test_compute_memory_similarity_unrelated_low():
    s = compute_memory_similarity(
        "APM integrates with NX via a Python bridge",
        "the polisher firmware must use USB SSD not SD card",
    )
    assert 0.0 <= s < 0.7


def test_similarity_matrix_symmetric():
    texts = ["alpha beta gamma", "alpha beta gamma", "completely unrelated text"]
    m = similarity_matrix(texts)
    assert len(m) == 3 and all(len(r) == 3 for r in m)
    for i in range(3):
        assert m[i][i] == pytest.approx(1.0)
    for i in range(3):
        for j in range(3):
            assert m[i][j] == pytest.approx(m[j][i])


def test_cluster_by_threshold_transitive():
    # Two near-paraphrases plus one unrelated text: the paraphrases
    # should land in one cluster
    texts = [
        "Antoine prefers OAuth over API keys",
        "Antoine's preference is OAuth, not API keys",
        "the polisher firmware uses USB SSD storage",
    ]
    clusters = cluster_by_threshold(texts, threshold=0.7)
    # At least one cluster of size 2+ containing the paraphrases
    big = [c for c in clusters if len(c) >= 2]
    assert big, f"expected at least one multi-member cluster, got {clusters}"
    assert 0 in big[0] and 1 in big[0]


# --- Prompt parser robustness ---


def test_parse_merge_verdict_strips_fences():
    raw = "```json\n{\"action\":\"merge\",\"content\":\"x\"}\n```"
    parsed = parse_merge_verdict(raw)
    assert parsed == {"action": "merge", "content": "x"}


def test_parse_merge_verdict_handles_prose_prefix():
    raw = "Sure! Here's the result:\n{\"action\":\"reject\",\"content\":\"no\"}"
    parsed = parse_merge_verdict(raw)
    assert parsed is not None
    assert parsed["action"] == "reject"


def test_normalize_merge_verdict_fills_defaults():
    v = normalize_merge_verdict({
        "action": "merge",
        "content": "unified text",
    })
    assert v is not None
    assert v["memory_type"] == "knowledge"
    assert v["project"] == ""
    assert v["domain_tags"] == []
    assert v["confidence"] == 0.5


def test_normalize_merge_verdict_rejects_empty_content():
|
||||
assert normalize_merge_verdict({"action": "merge", "content": ""}) is None
|
||||
|
||||
|
||||
def test_normalize_merge_verdict_rejects_unknown_action():
|
||||
assert normalize_merge_verdict({"action": "?", "content": "x"}) is None
|
||||
|
||||
|
||||
# --- Tier-2 (Phase 7A.1) ---
|
||||
|
||||
|
||||
def test_tier2_prompt_is_stricter():
|
||||
# The tier-2 system prompt must explicitly instruct the model to be
|
||||
# stricter than tier-1 — that's the whole point of escalation.
|
||||
assert "STRICTER" in TIER2_SYSTEM_PROMPT
|
||||
assert "REJECT" in TIER2_SYSTEM_PROMPT
|
||||
|
||||
|
||||
def test_build_tier2_user_message_includes_tier1_draft():
|
||||
sources = [{
|
||||
"id": "abc12345", "content": "source text A",
|
||||
"memory_type": "knowledge", "project": "p04",
|
||||
"domain_tags": ["optics"], "confidence": 0.6,
|
||||
"valid_until": "", "reference_count": 2,
|
||||
}, {
|
||||
"id": "def67890", "content": "source text B",
|
||||
"memory_type": "knowledge", "project": "p04",
|
||||
"domain_tags": ["optics"], "confidence": 0.7,
|
||||
"valid_until": "", "reference_count": 1,
|
||||
}]
|
||||
tier1 = {
|
||||
"action": "merge",
|
||||
"content": "unified draft by tier1",
|
||||
"memory_type": "knowledge",
|
||||
"project": "p04",
|
||||
"domain_tags": ["optics"],
|
||||
"confidence": 0.65,
|
||||
"reason": "near-paraphrase",
|
||||
}
|
||||
msg = build_tier2_user_message(sources, tier1)
|
||||
assert "source text A" in msg
|
||||
assert "source text B" in msg
|
||||
assert "TIER-1 DRAFT" in msg
|
||||
assert "unified draft by tier1" in msg
|
||||
assert "near-paraphrase" in msg
|
||||
# Should end asking for a verdict
|
||||
assert "verdict" in msg.lower()
|
||||
|
||||
|
||||
# --- Tiering helpers (min_pairwise_similarity, same_bucket) ---
|
||||
|
||||
|
||||
def test_same_bucket_true_for_matching():
|
||||
import importlib.util
|
||||
spec = importlib.util.spec_from_file_location(
|
||||
"memory_dedup_for_test",
|
||||
"scripts/memory_dedup.py",
|
||||
)
|
||||
mod = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(mod)
|
||||
|
||||
sources = [
|
||||
{"memory_type": "knowledge", "project": "p04"},
|
||||
{"memory_type": "knowledge", "project": "p04"},
|
||||
]
|
||||
assert mod.same_bucket(sources) is True
|
||||
|
||||
|
||||
def test_same_bucket_false_for_mixed():
|
||||
import importlib.util
|
||||
spec = importlib.util.spec_from_file_location(
|
||||
"memory_dedup_for_test",
|
||||
"scripts/memory_dedup.py",
|
||||
)
|
||||
mod = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(mod)
|
||||
|
||||
# Different project
|
||||
assert mod.same_bucket([
|
||||
{"memory_type": "knowledge", "project": "p04"},
|
||||
{"memory_type": "knowledge", "project": "p05"},
|
||||
]) is False
|
||||
# Different memory_type
|
||||
assert mod.same_bucket([
|
||||
{"memory_type": "knowledge", "project": "p04"},
|
||||
{"memory_type": "project", "project": "p04"},
|
||||
]) is False
|
||||
|
||||
|
||||
def test_min_pairwise_similarity_identical_texts():
|
||||
import importlib.util
|
||||
spec = importlib.util.spec_from_file_location(
|
||||
"memory_dedup_for_test",
|
||||
"scripts/memory_dedup.py",
|
||||
)
|
||||
mod = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(mod)
|
||||
|
||||
# Three identical texts — min should be ~1.0
|
||||
ms = mod.min_pairwise_similarity(["hello world"] * 3)
|
||||
assert 0.99 <= ms <= 1.0
|
||||
|
||||
|
||||
def test_min_pairwise_similarity_mixed_cluster():
|
||||
"""Transitive cluster A~B~C with A and C actually quite different
|
||||
should expose a low min even though A~B and B~C are high."""
|
||||
import importlib.util
|
||||
spec = importlib.util.spec_from_file_location(
|
||||
"memory_dedup_for_test",
|
||||
"scripts/memory_dedup.py",
|
||||
)
|
||||
mod = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(mod)
|
||||
|
||||
ms = mod.min_pairwise_similarity([
|
||||
"Antoine prefers OAuth over API keys",
|
||||
"Antoine's OAuth preference",
|
||||
"USB SSD mandatory for polisher firmware",
|
||||
])
|
||||
assert ms < 0.6 # Third is unrelated; min is far below threshold
|
||||
|
||||
|
||||
# --- create_merge_candidate idempotency ---
|
||||
|
||||
|
||||
def test_create_merge_candidate_inserts_row(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory("knowledge", "APM uses NX for DXF conversion")
|
||||
m2 = create_memory("knowledge", "APM uses NX for DXF-to-STL")
|
||||
|
||||
cid = create_merge_candidate(
|
||||
memory_ids=[m1.id, m2.id],
|
||||
similarity=0.92,
|
||||
proposed_content="APM uses NX for DXF→STL conversion",
|
||||
proposed_memory_type="knowledge",
|
||||
proposed_project="",
|
||||
proposed_tags=["apm", "nx"],
|
||||
proposed_confidence=0.6,
|
||||
reason="near-paraphrase",
|
||||
)
|
||||
assert cid is not None
|
||||
|
||||
pending = get_merge_candidates(status="pending")
|
||||
assert len(pending) == 1
|
||||
assert pending[0]["id"] == cid
|
||||
assert pending[0]["similarity"] == pytest.approx(0.92)
|
||||
assert len(pending[0]["sources"]) == 2
|
||||
|
||||
|
||||
def test_create_merge_candidate_idempotent(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory("knowledge", "Fact A")
|
||||
m2 = create_memory("knowledge", "Fact A slightly reworded")
|
||||
|
||||
first = create_merge_candidate(
|
||||
memory_ids=[m1.id, m2.id],
|
||||
similarity=0.9,
|
||||
proposed_content="merged",
|
||||
proposed_memory_type="knowledge",
|
||||
proposed_project="",
|
||||
)
|
||||
# Same id set, different order → dedupe skips
|
||||
second = create_merge_candidate(
|
||||
memory_ids=[m2.id, m1.id],
|
||||
similarity=0.9,
|
||||
proposed_content="merged (again)",
|
||||
proposed_memory_type="knowledge",
|
||||
proposed_project="",
|
||||
)
|
||||
assert first is not None
|
||||
assert second is None
|
||||
|
||||
|
||||
def test_create_merge_candidate_requires_two_ids(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory("knowledge", "lonely")
|
||||
with pytest.raises(ValueError):
|
||||
create_merge_candidate(
|
||||
memory_ids=[m1.id],
|
||||
similarity=1.0,
|
||||
proposed_content="x",
|
||||
proposed_memory_type="knowledge",
|
||||
proposed_project="",
|
||||
)
|
||||
|
||||
|
||||
# --- merge_memories end-to-end ---
|
||||
|
||||
|
||||
def test_merge_memories_happy_path(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory(
|
||||
"knowledge", "APM uses NX for DXF conversion",
|
||||
project="apm", confidence=0.6, domain_tags=["apm", "nx"],
|
||||
)
|
||||
m2 = create_memory(
|
||||
"knowledge", "APM does DXF to STL via NX bridge",
|
||||
project="apm", confidence=0.8, domain_tags=["apm", "bridge"],
|
||||
)
|
||||
# Bump reference counts so sum is meaningful
|
||||
with get_connection() as conn:
|
||||
conn.execute("UPDATE memories SET reference_count = 3 WHERE id = ?", (m1.id,))
|
||||
conn.execute("UPDATE memories SET reference_count = 5 WHERE id = ?", (m2.id,))
|
||||
|
||||
cid = create_merge_candidate(
|
||||
memory_ids=[m1.id, m2.id],
|
||||
similarity=0.92,
|
||||
proposed_content="APM uses NX bridge for DXF→STL conversion",
|
||||
proposed_memory_type="knowledge",
|
||||
proposed_project="apm",
|
||||
proposed_tags=["apm", "nx", "bridge"],
|
||||
proposed_confidence=0.7,
|
||||
reason="duplicates",
|
||||
)
|
||||
new_id = merge_memories(candidate_id=cid, actor="human-triage")
|
||||
assert new_id is not None
|
||||
|
||||
# Sources superseded
|
||||
with get_connection() as conn:
|
||||
s1 = conn.execute("SELECT status FROM memories WHERE id = ?", (m1.id,)).fetchone()
|
||||
s2 = conn.execute("SELECT status FROM memories WHERE id = ?", (m2.id,)).fetchone()
|
||||
merged = conn.execute(
|
||||
"SELECT content, status, confidence, reference_count, project "
|
||||
"FROM memories WHERE id = ?", (new_id,)
|
||||
).fetchone()
|
||||
cand = conn.execute(
|
||||
"SELECT status, result_memory_id FROM memory_merge_candidates WHERE id = ?",
|
||||
(cid,),
|
||||
).fetchone()
|
||||
assert s1["status"] == "superseded"
|
||||
assert s2["status"] == "superseded"
|
||||
assert merged["status"] == "active"
|
||||
assert merged["project"] == "apm"
|
||||
# confidence = max of sources (0.8), not the proposed 0.7 (proposed is hint;
|
||||
# merge_memories picks max of actual source confidences — verify).
|
||||
assert merged["confidence"] == pytest.approx(0.8)
|
||||
# reference_count = sum (3 + 5 = 8)
|
||||
assert int(merged["reference_count"]) == 8
|
||||
assert cand["status"] == "approved"
|
||||
assert cand["result_memory_id"] == new_id
|
||||
|
||||
|
||||
def test_merge_memories_content_override(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory("knowledge", "draft A", project="p05-interferometer")
|
||||
m2 = create_memory("knowledge", "draft B", project="p05-interferometer")
|
||||
|
||||
cid = create_merge_candidate(
|
||||
memory_ids=[m1.id, m2.id],
|
||||
similarity=0.9,
|
||||
proposed_content="AI draft",
|
||||
proposed_memory_type="knowledge",
|
||||
proposed_project="p05-interferometer",
|
||||
)
|
||||
new_id = merge_memories(
|
||||
candidate_id=cid,
|
||||
actor="human-triage",
|
||||
override_content="human-edited final text",
|
||||
override_tags=["optics", "custom"],
|
||||
)
|
||||
assert new_id is not None
|
||||
with get_connection() as conn:
|
||||
row = conn.execute(
|
||||
"SELECT content, domain_tags FROM memories WHERE id = ?", (new_id,)
|
||||
).fetchone()
|
||||
assert row["content"] == "human-edited final text"
|
||||
# domain_tags JSON should contain the override
|
||||
assert "optics" in row["domain_tags"]
|
||||
assert "custom" in row["domain_tags"]
|
||||
|
||||
|
||||
def test_merge_memories_writes_audit(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory("knowledge", "alpha")
|
||||
m2 = create_memory("knowledge", "alpha variant")
|
||||
cid = create_merge_candidate(
|
||||
memory_ids=[m1.id, m2.id], similarity=0.9,
|
||||
proposed_content="alpha merged",
|
||||
proposed_memory_type="knowledge", proposed_project="",
|
||||
)
|
||||
new_id = merge_memories(candidate_id=cid)
|
||||
assert new_id
|
||||
|
||||
audit_new = get_memory_audit(new_id)
|
||||
actions_new = {a["action"] for a in audit_new}
|
||||
assert "created_via_merge" in actions_new
|
||||
|
||||
audit_m1 = get_memory_audit(m1.id)
|
||||
actions_m1 = {a["action"] for a in audit_m1}
|
||||
assert "superseded" in actions_m1
|
||||
|
||||
|
||||
def test_merge_memories_aborts_if_source_not_active(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory("knowledge", "one")
|
||||
m2 = create_memory("knowledge", "two")
|
||||
cid = create_merge_candidate(
|
||||
memory_ids=[m1.id, m2.id], similarity=0.9,
|
||||
proposed_content="merged",
|
||||
proposed_memory_type="knowledge", proposed_project="",
|
||||
)
|
||||
# Tamper: supersede one source before the merge runs
|
||||
with get_connection() as conn:
|
||||
conn.execute("UPDATE memories SET status = 'superseded' WHERE id = ?", (m1.id,))
|
||||
result = merge_memories(candidate_id=cid)
|
||||
assert result is None
|
||||
|
||||
# Candidate still pending
|
||||
pending = get_merge_candidates(status="pending")
|
||||
assert any(c["id"] == cid for c in pending)
|
||||
|
||||
|
||||
def test_merge_memories_rejects_already_resolved(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory("knowledge", "x")
|
||||
m2 = create_memory("knowledge", "y")
|
||||
cid = create_merge_candidate(
|
||||
memory_ids=[m1.id, m2.id], similarity=0.9,
|
||||
proposed_content="xy",
|
||||
proposed_memory_type="knowledge", proposed_project="",
|
||||
)
|
||||
first = merge_memories(candidate_id=cid)
|
||||
assert first is not None
|
||||
# second call — already approved, should return None
|
||||
second = merge_memories(candidate_id=cid)
|
||||
assert second is None
|
||||
|
||||
|
||||
# --- reject_merge_candidate ---
|
||||
|
||||
|
||||
def test_reject_merge_candidate_leaves_sources_untouched(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory("knowledge", "a")
|
||||
m2 = create_memory("knowledge", "b")
|
||||
cid = create_merge_candidate(
|
||||
memory_ids=[m1.id, m2.id], similarity=0.9,
|
||||
proposed_content="a+b",
|
||||
proposed_memory_type="knowledge", proposed_project="",
|
||||
)
|
||||
ok = reject_merge_candidate(cid, actor="human-triage", note="false positive")
|
||||
assert ok
|
||||
|
||||
# Sources still active
|
||||
with get_connection() as conn:
|
||||
s1 = conn.execute("SELECT status FROM memories WHERE id = ?", (m1.id,)).fetchone()
|
||||
s2 = conn.execute("SELECT status FROM memories WHERE id = ?", (m2.id,)).fetchone()
|
||||
cand = conn.execute(
|
||||
"SELECT status FROM memory_merge_candidates WHERE id = ?", (cid,)
|
||||
).fetchone()
|
||||
assert s1["status"] == "active"
|
||||
assert s2["status"] == "active"
|
||||
assert cand["status"] == "rejected"
|
||||
|
||||
|
||||
def test_reject_merge_candidate_idempotent(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory("knowledge", "p")
|
||||
m2 = create_memory("knowledge", "q")
|
||||
cid = create_merge_candidate(
|
||||
memory_ids=[m1.id, m2.id], similarity=0.9,
|
||||
proposed_content="pq",
|
||||
proposed_memory_type="knowledge", proposed_project="",
|
||||
)
|
||||
assert reject_merge_candidate(cid) is True
|
||||
# second reject — already rejected, returns False
|
||||
assert reject_merge_candidate(cid) is False
|
||||
|
||||
|
||||
# --- Schema sanity ---
|
||||
|
||||
|
||||
def test_merge_candidates_table_exists(tmp_data_dir):
|
||||
init_db()
|
||||
with get_connection() as conn:
|
||||
cols = [r["name"] for r in conn.execute("PRAGMA table_info(memory_merge_candidates)").fetchall()]
|
||||
expected = {"id", "status", "memory_ids", "similarity", "proposed_content",
|
||||
"proposed_memory_type", "proposed_project", "proposed_tags",
|
||||
"proposed_confidence", "reason", "created_at", "resolved_at",
|
||||
"resolved_by", "result_memory_id"}
|
||||
assert expected.issubset(set(cols))
|
||||
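The similarity tests above pin down two behaviors worth noting: `cosine` clamps negative dot products to zero, and `min_pairwise_similarity` reports the weakest link in a transitive cluster before auto-merge. A minimal standalone sketch consistent with those assertions (the actual `atocore.memory.similarity` implementation operates on embeddings and may differ):

```python
from itertools import combinations
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity clamped to [0, 1]: anti-correlated vectors
    count as dissimilar rather than negatively similar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return max(0.0, min(1.0, dot / (na * nb)))


def min_pairwise_similarity(vectors: list[list[float]]) -> float:
    """Lowest similarity over all pairs: the 'weakest link' that guards
    against merging a transitive A~B~C cluster where A and C diverge."""
    return min(cosine(a, b) for a, b in combinations(vectors, 2))
```

This is why `test_min_pairwise_similarity_mixed_cluster` passes: two high-similarity pairs cannot hide one unrelated member, because the minimum (not the mean) is what gets compared to the merge threshold.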
148
tests/test_phase6_living_taxonomy.py
Normal file
@@ -0,0 +1,148 @@
"""Phase 6 tests — Living Taxonomy: detector + transient-to-durable extension."""

from __future__ import annotations

from datetime import datetime, timedelta, timezone

import pytest

from atocore.memory.service import (
    create_memory,
    extend_reinforced_valid_until,
)
from atocore.models.database import get_connection, init_db


def _set_memory_fields(mem_id, reference_count=None, valid_until=None):
    """Helper to force memory state for tests."""
    with get_connection() as conn:
        fields, params = [], []
        if reference_count is not None:
            fields.append("reference_count = ?")
            params.append(reference_count)
        if valid_until is not None:
            fields.append("valid_until = ?")
            params.append(valid_until)
        params.append(mem_id)
        conn.execute(
            f"UPDATE memories SET {', '.join(fields)} WHERE id = ?",
            params,
        )


# --- Transient-to-durable extension (C.3) ---


def test_extend_extends_imminent_valid_until(tmp_data_dir):
    init_db()
    mem = create_memory("knowledge", "Reinforced content for extension")
    soon = (datetime.now(timezone.utc) + timedelta(days=7)).strftime("%Y-%m-%d")
    _set_memory_fields(mem.id, reference_count=6, valid_until=soon)

    result = extend_reinforced_valid_until()
    assert len(result) == 1
    assert result[0]["memory_id"] == mem.id
    assert result[0]["action"] == "extended"
    # New expiry should be ~90 days out
    new_date = datetime.strptime(result[0]["new_valid_until"], "%Y-%m-%d")
    days_out = (new_date - datetime.now(timezone.utc).replace(tzinfo=None)).days
    assert 85 <= days_out <= 92  # ~90 days, some slop for test timing


def test_extend_makes_permanent_at_high_reference_count(tmp_data_dir):
    init_db()
    mem = create_memory("knowledge", "Heavy-referenced content")
    soon = (datetime.now(timezone.utc) + timedelta(days=7)).strftime("%Y-%m-%d")
    _set_memory_fields(mem.id, reference_count=15, valid_until=soon)

    result = extend_reinforced_valid_until()
    assert len(result) == 1
    assert result[0]["action"] == "made_permanent"
    assert result[0]["new_valid_until"] is None

    # Verify the DB reflects the cleared expiry
    with get_connection() as conn:
        row = conn.execute(
            "SELECT valid_until FROM memories WHERE id = ?", (mem.id,)
        ).fetchone()
    assert row["valid_until"] is None


def test_extend_skips_not_expiring_soon(tmp_data_dir):
    init_db()
    mem = create_memory("knowledge", "Far-future expiry")
    far = (datetime.now(timezone.utc) + timedelta(days=365)).strftime("%Y-%m-%d")
    _set_memory_fields(mem.id, reference_count=6, valid_until=far)

    result = extend_reinforced_valid_until(imminent_expiry_days=30)
    assert result == []


def test_extend_skips_low_reference_count(tmp_data_dir):
    init_db()
    mem = create_memory("knowledge", "Not reinforced enough")
    soon = (datetime.now(timezone.utc) + timedelta(days=7)).strftime("%Y-%m-%d")
    _set_memory_fields(mem.id, reference_count=2, valid_until=soon)

    result = extend_reinforced_valid_until(min_reference_count=5)
    assert result == []


def test_extend_skips_permanent_memory(tmp_data_dir):
    """Memory with no valid_until is already permanent — shouldn't touch."""
    init_db()
    mem = create_memory("knowledge", "Already permanent")
    _set_memory_fields(mem.id, reference_count=20)
    # no valid_until

    result = extend_reinforced_valid_until()
    assert result == []


def test_extend_writes_audit_row(tmp_data_dir):
    init_db()
    mem = create_memory("knowledge", "Audited extension")
    soon = (datetime.now(timezone.utc) + timedelta(days=7)).strftime("%Y-%m-%d")
    _set_memory_fields(mem.id, reference_count=6, valid_until=soon)

    extend_reinforced_valid_until()

    from atocore.memory.service import get_memory_audit
    audit = get_memory_audit(mem.id)
    actions = [a["action"] for a in audit]
    assert "valid_until_extended" in actions
    entry = next(a for a in audit if a["action"] == "valid_until_extended")
    assert entry["actor"] == "transient-to-durable"


# --- Emerging detector (smoke tests — detector runs against live DB state
# so we test the shape of results rather than full integration here) ---


def test_detector_imports_cleanly():
    """Detector module must import without errors (it's called from nightly cron)."""
    import importlib.util
    import sys
    from pathlib import Path

    # Load the detector script as a module
    script = Path(__file__).resolve().parent.parent / "scripts" / "detect_emerging.py"
    assert script.exists()
    spec = importlib.util.spec_from_file_location("detect_emerging", script)
    mod = importlib.util.module_from_spec(spec)
    # Don't actually run main() — just verify it parses and defines expected names
    spec.loader.exec_module(mod)
    assert hasattr(mod, "main")
    assert hasattr(mod, "PROJECT_MIN_MEMORIES")
    assert hasattr(mod, "PROJECT_ALERT_THRESHOLD")


def test_detector_handles_empty_db(tmp_data_dir):
    """Detector should handle zero memories without crashing."""
    init_db()
    # Don't create any memories. Just verify the queries work via the service layer.
    from atocore.memory.service import get_memories
    active = get_memories(active_only=True, limit=500)
    candidates = get_memories(status="candidate", limit=500)
    assert active == []
    assert candidates == []
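Taken together, the extension tests encode a policy: a memory needs at least 5 references and an expiry inside the 30-day imminent window to qualify, qualifying memories get pushed out ~90 days, and heavily referenced ones are made permanent. A sketch of that decision logic under assumed defaults; `decide_extension` and the permanent-promotion cutoff of 10 are illustrative guesses (the tests only show 6 → extended and 15 → permanent), not the shipped `extend_reinforced_valid_until` internals:

```python
from datetime import datetime, timedelta, timezone


def decide_extension(reference_count, valid_until,
                     min_reference_count=5,
                     permanent_reference_count=10,  # assumed cutoff
                     imminent_expiry_days=30,
                     extension_days=90):
    """Return an action dict for one memory, or None to leave it alone."""
    if valid_until is None:
        return None  # already permanent
    if reference_count < min_reference_count:
        return None  # not reinforced enough
    expiry = datetime.strptime(valid_until, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    if expiry - datetime.now(timezone.utc) > timedelta(days=imminent_expiry_days):
        return None  # not expiring soon
    if reference_count >= permanent_reference_count:
        return {"action": "made_permanent", "new_valid_until": None}
    new = datetime.now(timezone.utc) + timedelta(days=extension_days)
    return {"action": "extended", "new_valid_until": new.strftime("%Y-%m-%d")}
```

Each None branch corresponds to one of the `result == []` tests above; the two action dicts mirror the shapes the happy-path tests assert on.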
296
tests/test_tag_canon.py
Normal file
@@ -0,0 +1,296 @@
|
||||
"""Phase 7C — tag canonicalization tests.
|
||||
|
||||
Covers:
|
||||
- prompt parser (fences, prose, empty)
|
||||
- normalizer (identity, protected tokens, empty)
|
||||
- get_tag_distribution counts across active memories
|
||||
- apply_tag_alias rewrites + dedupes + audits
|
||||
- create / approve / reject lifecycle
|
||||
- idempotency (dup proposals skipped)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import pytest
|
||||
|
||||
from atocore.memory._tag_canon_prompt import (
|
||||
PROTECTED_PROJECT_TOKENS,
|
||||
build_user_message,
|
||||
normalize_alias_item,
|
||||
parse_canon_output,
|
||||
)
|
||||
from atocore.memory.service import (
|
||||
apply_tag_alias,
|
||||
approve_tag_alias,
|
||||
create_memory,
|
||||
create_tag_alias_proposal,
|
||||
get_memory_audit,
|
||||
get_tag_alias_proposals,
|
||||
get_tag_distribution,
|
||||
reject_tag_alias,
|
||||
)
|
||||
from atocore.models.database import get_connection, init_db
|
||||
|
||||
|
||||
# --- Prompt parser ---
|
||||
|
||||
|
||||
def test_parse_canon_output_handles_fences():
|
||||
raw = "```json\n{\"aliases\": [{\"alias\": \"fw\", \"canonical\": \"firmware\", \"confidence\": 0.9}]}\n```"
|
||||
items = parse_canon_output(raw)
|
||||
assert len(items) == 1
|
||||
assert items[0]["alias"] == "fw"
|
||||
|
||||
|
||||
def test_parse_canon_output_handles_prose_prefix():
|
||||
raw = "Here you go:\n{\"aliases\": [{\"alias\": \"ml\", \"canonical\": \"machine-learning\", \"confidence\": 0.9}]}"
|
||||
items = parse_canon_output(raw)
|
||||
assert len(items) == 1
|
||||
|
||||
|
||||
def test_parse_canon_output_empty_list():
|
||||
assert parse_canon_output("{\"aliases\": []}") == []
|
||||
|
||||
|
||||
def test_parse_canon_output_malformed():
|
||||
assert parse_canon_output("not json at all") == []
|
||||
assert parse_canon_output("") == []
|
||||
|
||||
|
||||
# --- Normalizer ---
|
||||
|
||||
|
||||
def test_normalize_alias_strips_and_lowercases():
|
||||
n = normalize_alias_item({"alias": " FW ", "canonical": "Firmware", "confidence": 0.95, "reason": "abbrev"})
|
||||
assert n == {"alias": "fw", "canonical": "firmware", "confidence": 0.95, "reason": "abbrev"}
|
||||
|
||||
|
||||
def test_normalize_rejects_identity():
|
||||
assert normalize_alias_item({"alias": "foo", "canonical": "foo", "confidence": 0.9}) is None
|
||||
|
||||
|
||||
def test_normalize_rejects_empty():
|
||||
assert normalize_alias_item({"alias": "", "canonical": "foo", "confidence": 0.9}) is None
|
||||
assert normalize_alias_item({"alias": "foo", "canonical": "", "confidence": 0.9}) is None
|
||||
|
||||
|
||||
def test_normalize_protects_project_tokens():
|
||||
# Project ids must not be canonicalized — they're their own namespace
|
||||
assert "p04" in PROTECTED_PROJECT_TOKENS
|
||||
assert normalize_alias_item({"alias": "p04", "canonical": "p04-gigabit", "confidence": 1.0}) is None
|
||||
assert normalize_alias_item({"alias": "p04-gigabit", "canonical": "p04", "confidence": 1.0}) is None
|
||||
assert normalize_alias_item({"alias": "apm", "canonical": "part-manager", "confidence": 1.0}) is None
|
||||
|
||||
|
||||
def test_normalize_clamps_confidence():
|
||||
hi = normalize_alias_item({"alias": "a", "canonical": "b", "confidence": 2.5})
|
||||
assert hi["confidence"] == 1.0
|
||||
lo = normalize_alias_item({"alias": "a", "canonical": "b", "confidence": -0.5})
|
||||
assert lo["confidence"] == 0.0
|
||||
|
||||
|
||||
def test_normalize_handles_non_numeric_confidence():
|
||||
n = normalize_alias_item({"alias": "a", "canonical": "b", "confidence": "not a number"})
|
||||
assert n is not None and n["confidence"] == 0.0
|
||||
|
||||
|
||||
# --- build_user_message ---
|
||||
|
||||
|
||||
def test_build_user_message_includes_top_tags():
|
||||
dist = {"firmware": 23, "fw": 5, "optics": 18, "optical": 2}
|
||||
msg = build_user_message(dist)
|
||||
assert "firmware: 23" in msg
|
||||
assert "optics: 18" in msg
|
||||
assert "aliases" in msg.lower() or "JSON" in msg
|
||||
|
||||
|
||||
def test_build_user_message_empty():
|
||||
msg = build_user_message({})
|
||||
assert "Empty" in msg or "empty" in msg
|
||||
|
||||
|
||||
# --- get_tag_distribution ---
|
||||
|
||||
|
||||
def test_tag_distribution_counts_active_only(tmp_data_dir):
|
||||
init_db()
|
||||
create_memory("knowledge", "a", domain_tags=["firmware", "p06"])
|
||||
create_memory("knowledge", "b", domain_tags=["firmware"])
|
||||
create_memory("knowledge", "c", domain_tags=["optics"])
|
||||
|
||||
# Add an invalid memory — should NOT be counted
|
||||
m_invalid = create_memory("knowledge", "d", domain_tags=["firmware", "ignored"])
|
||||
with get_connection() as conn:
|
||||
conn.execute("UPDATE memories SET status = 'invalid' WHERE id = ?", (m_invalid.id,))
|
||||
|
||||
dist = get_tag_distribution()
|
||||
assert dist.get("firmware") == 2 # two active memories
|
||||
assert dist.get("optics") == 1
|
||||
assert dist.get("p06") == 1
|
||||
assert "ignored" not in dist
|
||||
|
||||
|
||||
def test_tag_distribution_min_count_filter(tmp_data_dir):
|
||||
init_db()
|
||||
create_memory("knowledge", "a", domain_tags=["firmware"])
|
||||
create_memory("knowledge", "b", domain_tags=["firmware"])
|
||||
create_memory("knowledge", "c", domain_tags=["once"])
|
||||
|
||||
dist = get_tag_distribution(min_count=2)
|
||||
assert "firmware" in dist
|
||||
assert "once" not in dist
|
||||
|
||||
|
||||
# --- apply_tag_alias ---
|
||||
|
||||
|
||||
def test_apply_tag_alias_rewrites_across_memories(tmp_data_dir):
|
||||
init_db()
|
||||
m1 = create_memory("knowledge", "a", domain_tags=["fw", "p06"])
|
||||
m2 = create_memory("knowledge", "b", domain_tags=["fw"])
|
||||
m3 = create_memory("knowledge", "c", domain_tags=["optics"]) # untouched
|
||||
|
||||
result = apply_tag_alias("fw", "firmware")
|
||||
assert result["memories_touched"] == 2
|
||||
|
||||
import json as _json
|
||||
with get_connection() as conn:
|
||||
r1 = conn.execute("SELECT domain_tags FROM memories WHERE id = ?", (m1.id,)).fetchone()
|
||||
r2 = conn.execute("SELECT domain_tags FROM memories WHERE id = ?", (m2.id,)).fetchone()
|
||||
r3 = conn.execute("SELECT domain_tags FROM memories WHERE id = ?", (m3.id,)).fetchone()
|
||||
assert "firmware" in _json.loads(r1["domain_tags"])
|
||||
assert "fw" not in _json.loads(r1["domain_tags"])
|
||||
assert "firmware" in _json.loads(r2["domain_tags"])
|
||||
assert _json.loads(r3["domain_tags"]) == ["optics"] # untouched
|
||||
|
||||
|
||||
def test_apply_tag_alias_dedupes_when_both_present(tmp_data_dir):
|
||||
"""Memory has both fw AND firmware → rewrite collapses to just firmware."""
|
||||
init_db()
|
||||
m = create_memory("knowledge", "dual-tagged", domain_tags=["fw", "firmware", "p06"])
|
||||
|
||||
result = apply_tag_alias("fw", "firmware")
|
||||
assert result["memories_touched"] == 1
|
||||
|
||||
import json as _json
|
||||
with get_connection() as conn:
|
||||
r = conn.execute("SELECT domain_tags FROM memories WHERE id = ?", (m.id,)).fetchone()
|
||||
tags = _json.loads(r["domain_tags"])
|
||||
assert tags.count("firmware") == 1
|
||||
assert "fw" not in tags
|
||||
assert "p06" in tags
|
||||
|
||||
|
||||
def test_apply_tag_alias_skips_memories_without_alias(tmp_data_dir):
|
||||
init_db()
|
||||
m = create_memory("knowledge", "no match", domain_tags=["optics", "p04"])
|
||||
result = apply_tag_alias("fw", "firmware")
|
||||
assert result["memories_touched"] == 0
|
||||
|
||||
|
||||
def test_apply_tag_alias_writes_audit(tmp_data_dir):
|
||||
init_db()
|
||||
m = create_memory("knowledge", "audited", domain_tags=["fw"])
|
||||
apply_tag_alias("fw", "firmware", actor="auto-tag-canon")
|
||||
|
||||
audit = get_memory_audit(m.id)
|
||||
actions = [a["action"] for a in audit]
|
||||
assert "tag_canonicalized" in actions
|
||||
entry = next(a for a in audit if a["action"] == "tag_canonicalized")
|
||||
assert entry["actor"] == "auto-tag-canon"
|
||||
assert "fw → firmware" in entry["note"]
|
||||
assert "fw" in entry["before"]["domain_tags"]
|
||||
assert "firmware" in entry["after"]["domain_tags"]
|
||||
|
||||
|
||||
def test_apply_tag_alias_rejects_identity(tmp_data_dir):
|
||||
init_db()
|
||||
with pytest.raises(ValueError):
|
||||
apply_tag_alias("foo", "foo")
|
||||
|
||||
|
||||
def test_apply_tag_alias_rejects_empty(tmp_data_dir):
|
||||
init_db()
|
||||
with pytest.raises(ValueError):
|
||||
apply_tag_alias("", "firmware")
|
||||
|
||||
|
||||
# --- Proposal lifecycle ---
|
||||
|
||||
|
||||
def test_create_proposal_inserts_pending(tmp_data_dir):
|
||||
init_db()
|
||||
pid = create_tag_alias_proposal("fw", "firmware", confidence=0.65,
|
||||
alias_count=5, canonical_count=23,
|
||||
reason="standard abbreviation")
|
||||
assert pid is not None
|
||||
|
||||
rows = get_tag_alias_proposals(status="pending")
|
||||
assert len(rows) == 1
|
||||
assert rows[0]["alias"] == "fw"
|
||||
assert rows[0]["confidence"] == pytest.approx(0.65)
|
||||
|
||||
|
||||
def test_create_proposal_idempotent(tmp_data_dir):
|
||||
init_db()
|
||||
first = create_tag_alias_proposal("fw", "firmware", confidence=0.6)
|
||||
second = create_tag_alias_proposal("fw", "firmware", confidence=0.7)
|
||||
assert first is not None
|
||||
assert second is None
|
||||
|
||||
|
||||
def test_approve_applies_rewrite(tmp_data_dir):
    init_db()
    m = create_memory("knowledge", "x", domain_tags=["fw"])
    pid = create_tag_alias_proposal("fw", "firmware", confidence=0.7)
    result = approve_tag_alias(pid, actor="human-triage")
    assert result is not None
    assert result["memories_touched"] == 1

    # Proposal now approved with applied_to_memories recorded
    rows = get_tag_alias_proposals(status="approved")
    assert len(rows) == 1
    assert rows[0]["applied_to_memories"] == 1

    # Memory actually rewritten
    import json as _json
    with get_connection() as conn:
        r = conn.execute("SELECT domain_tags FROM memories WHERE id = ?", (m.id,)).fetchone()
    assert "firmware" in _json.loads(r["domain_tags"])


def test_approve_already_resolved_returns_none(tmp_data_dir):
    init_db()
    pid = create_tag_alias_proposal("a", "b", confidence=0.6)
    approve_tag_alias(pid)
    assert approve_tag_alias(pid) is None  # second approve — no-op


def test_reject_leaves_memories_untouched(tmp_data_dir):
    init_db()
    m = create_memory("knowledge", "x", domain_tags=["fw"])
    pid = create_tag_alias_proposal("fw", "firmware", confidence=0.6)
    assert reject_tag_alias(pid)

    rows = get_tag_alias_proposals(status="rejected")
    assert len(rows) == 1

    # Memory still has the original tag
    import json as _json
    with get_connection() as conn:
        r = conn.execute("SELECT domain_tags FROM memories WHERE id = ?", (m.id,)).fetchone()
    assert "fw" in _json.loads(r["domain_tags"])


# --- Schema sanity ---


def test_tag_aliases_table_exists(tmp_data_dir):
    init_db()
    with get_connection() as conn:
        cols = [r["name"] for r in conn.execute("PRAGMA table_info(tag_aliases)").fetchall()]
    expected = {"id", "alias", "canonical", "status", "confidence",
                "alias_count", "canonical_count", "reason",
                "applied_to_memories", "created_at", "resolved_at", "resolved_by"}
    assert expected.issubset(set(cols))
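The approve path above rewrites stored tags in place: the audit entries assert `"fw"` in `before["domain_tags"]` and `"firmware"` in `after["domain_tags"]`. A minimal standalone sketch of that rewrite, a hypothetical helper rather than atocore's implementation, assuming alias→canonical substitution with order-preserving dedup:

```python
def canonicalize_tags(tags, alias, canonical):
    """Replace `alias` with `canonical`, dropping any duplicate that results."""
    out = []
    for tag in tags:
        tag = canonical if tag == alias else tag
        if tag not in out:
            out.append(tag)
    return out


assert canonicalize_tags(["fw", "apm"], "fw", "firmware") == ["firmware", "apm"]
# Merging into an already-present canonical tag must not duplicate it:
assert canonicalize_tags(["fw", "firmware"], "fw", "firmware") == ["firmware"]
```

The dedup step matters for the case where a memory already carries both the alias and the canonical tag; a naive substitution would leave `"firmware"` twice.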
tests/test_wiki_pages.py (new file, +163 lines)
@@ -0,0 +1,163 @@
"""Tests for the new wiki pages shipped in the UI refresh:
- /wiki/capture (7I follow-up)
- /wiki/memories/{id} (7E)
- /wiki/domains/{tag} (7F)
- /wiki/activity (activity feed)
- home refresh (topnav + activity snippet)
"""

from __future__ import annotations

import pytest

from atocore.engineering.wiki import (
    render_activity,
    render_capture,
    render_domain,
    render_homepage,
    render_memory_detail,
)
from atocore.engineering.service import init_engineering_schema
from atocore.memory.service import create_memory
from atocore.models.database import init_db


def _init_all():
    """Wiki pages read from both the memory and engineering schemas, so
    tests need both initialized (the engineering schema is a separate
    init_engineering_schema() call)."""
    init_db()
    init_engineering_schema()
def test_capture_page_renders_as_fallback(tmp_data_dir):
    _init_all()
    html = render_capture()
    # Page is reachable but now labeled as a fallback, not promoted
    assert "fallback only" in html
    assert "sanctioned capture surfaces are Claude Code" in html
    # Form inputs still exist for emergency use
    assert "cap-prompt" in html
    assert "cap-response" in html


def test_capture_not_in_topnav(tmp_data_dir):
    """The paste form should NOT appear in topnav — it's not the sanctioned path."""
    _init_all()
    html = render_homepage()
    assert "/wiki/capture" not in html
    assert "📥 Capture" not in html


def test_memory_detail_renders(tmp_data_dir):
    _init_all()
    m = create_memory(
        "knowledge", "APM uses NX bridge for DXF → STL",
        project="apm", confidence=0.7, domain_tags=["apm", "nx", "cad"],
    )
    html = render_memory_detail(m.id)
    assert html is not None
    assert "APM uses NX" in html
    assert "Audit trail" in html
    # Tag links go to domain pages
    assert '/wiki/domains/apm' in html
    assert '/wiki/domains/nx' in html
    # Project link present
    assert '/wiki/projects/apm' in html


def test_memory_detail_404(tmp_data_dir):
    _init_all()
    assert render_memory_detail("nonexistent-id") is None


def test_domain_page_lists_memories(tmp_data_dir):
    _init_all()
    create_memory("knowledge", "optics fact 1", project="p04-gigabit",
                  domain_tags=["optics"])
    create_memory("knowledge", "optics fact 2", project="p05-interferometer",
                  domain_tags=["optics", "metrology"])
    create_memory("knowledge", "other", project="p06-polisher",
                  domain_tags=["firmware"])

    html = render_domain("optics")
    assert "Domain: <code>optics</code>" in html
    assert "p04-gigabit" in html
    assert "p05-interferometer" in html
    assert "optics fact 1" in html
    assert "optics fact 2" in html
    # Unrelated memory should NOT appear
    assert "other" not in html or "firmware" not in html


def test_domain_page_empty(tmp_data_dir):
    _init_all()
    html = render_domain("definitely-not-a-tag")
    assert "No memories currently carry" in html


def test_domain_page_normalizes_tag(tmp_data_dir):
    _init_all()
    create_memory("knowledge", "x", domain_tags=["firmware"])
    # Case-insensitive
    assert "firmware" in render_domain("FIRMWARE")
    # Whitespace tolerant
    assert "firmware" in render_domain(" firmware ")
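`test_domain_page_normalizes_tag` pins down case-insensitive, whitespace-tolerant lookup. A plausible normalizer is sketched below as an assumption; atocore's actual helper is not shown in this diff:

```python
def normalize_tag(raw: str) -> str:
    # Lowercase and trim so "FIRMWARE" and " firmware " resolve to the same
    # domain page, matching the two assertions in the test above.
    return raw.strip().lower()


assert normalize_tag("FIRMWARE") == "firmware"
assert normalize_tag(" firmware ") == "firmware"
```

Applying the same normalizer at both write time (tag storage) and read time (domain-page lookup) is what makes the two code paths agree.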
def test_activity_feed_renders(tmp_data_dir):
    _init_all()
    m = create_memory("knowledge", "activity test")
    html = render_activity()
    assert "Activity Feed" in html
    # The newly-created memory should appear as a "created" event
    assert "created" in html
    # Short memory-id prefix appears in the feed
    assert m.id[:8] in html


def test_activity_feed_groups_by_action_and_actor(tmp_data_dir):
    _init_all()
    for i in range(3):
        create_memory("knowledge", f"m{i}", actor="test-actor")

    html = render_activity()
    # Summary row should show "created: 3" or similar
    assert "created" in html
    assert "test-actor" in html


def test_homepage_has_topnav_and_activity(tmp_data_dir):
    _init_all()
    create_memory("knowledge", "homepage test")
    html = render_homepage()
    # Topnav with expected items (Capture removed — it's not sanctioned capture)
    assert "🏠 Home" in html
    assert "📡 Activity" in html
    assert "/wiki/activity" in html
    assert "/wiki/capture" not in html
    # Activity snippet
    assert "What the brain is doing" in html


def test_memory_detail_shows_superseded_sources(tmp_data_dir):
    """After a merge, sources go to status=superseded. Detail page should
    still render them."""
    from atocore.memory.service import (
        create_merge_candidate, merge_memories,
    )
    _init_all()
    m1 = create_memory("knowledge", "alpha variant 1", project="test")
    m2 = create_memory("knowledge", "alpha variant 2", project="test")
    cid = create_merge_candidate(
        memory_ids=[m1.id, m2.id], similarity=0.9,
        proposed_content="alpha merged",
        proposed_memory_type="knowledge", proposed_project="test",
    )
    merge_memories(cid, actor="auto-dedup-tier1")

    # Source detail page should render and show the superseded status
    html1 = render_memory_detail(m1.id)
    assert html1 is not None
    assert "superseded" in html1
    assert "auto-dedup-tier1" in html1  # audit trail shows who merged