ATOCore/scripts/detect_emerging.py

feat: Phase 6 - Living Taxonomy + Universal Capture

Closes two real-use gaps:
1. "APM tool" gap: work done outside Claude Code (desktop, web, phone, other machine) was invisible to AtoCore.
2. Project discovery gap: manual JSON-file edits required to promote an emerging theme to a first-class project.

B - atocore_remember MCP tool (scripts/atocore_mcp.py):
- New MCP tool for universal capture from any MCP-aware client (Claude Desktop, Code, Cursor, Zed, Windsurf, etc.)
- Accepts content (required) + memory_type/project/confidence/valid_until/domain_tags (all optional with sensible defaults)
- Creates a candidate memory, goes through the existing 3-tier triage (no bypass; the quality gate catches noise)
- Detailed tool description guides Claude on when to invoke: "remember this", "save that for later", "don't lose this fact"
- Total tools exposed by MCP server: 14 → 15

C.1 Emerging-concepts detector (scripts/detect_emerging.py):
- Nightly scan of active + candidate memories for:
  * Unregistered project names with ≥3 memory occurrences
  * Top 20 domain_tags by frequency (emerging categories)
  * Active memories with reference_count ≥ 5 + valid_until set (reinforced transients, candidates for extension)
- Writes findings to atocore/proposals/* project state entries
- Emits "warning" alert via Phase 4 framework the FIRST time a new project crosses the 5-memory alert threshold (avoids spam)
- Configurable via env vars: ATOCORE_EMERGING_PROJECT_MIN (default 3), ATOCORE_EMERGING_ALERT_THRESHOLD (default 5), ATOCORE_EMERGING_TOP_TAGS (default 20)

C.2 Registration surface (src/atocore/api/routes.py + wiki.py):
- POST /admin/projects/register-emerging: one-click register with sensible defaults (ingest_roots auto-filled with vault:incoming/projects/<id>/ convention). Clears the proposal from the dashboard list on success.
- Dashboard /admin/dashboard: new "proposals" section with unregistered_projects + emerging_categories + reinforced_transients.
- Wiki homepage: "📋 Emerging" section rendering each unregistered project as a card with count + 2 sample memory previews + inline "📌 Register as project" button that calls the endpoint via fetch and reloads the page on success.

C.3 Transient-to-durable extension (src/atocore/memory/service.py + API + cron):
- New extend_reinforced_valid_until() function: scans active memories with valid_until in the next 30 days and reference_count ≥ 5. Extends expiry by 90 days. If reference_count ≥ 10, clears expiry entirely (makes permanent). Writes audit rows via the Phase 4 memory_audit framework with actor="transient-to-durable".
- POST /admin/memory/extend-reinforced: API wrapper for cron.
- Matches the user's intuition: "something transient becomes important if you keep coming back to it".

Nightly cron (deploy/dalidou/batch-extract.sh):
- Step F2: detect_emerging.py (after F pipeline summary)
- Step F3: /admin/memory/extend-reinforced (before integrity check)
- Both fail-open; errors don't break the pipeline.

Tests: 366 → 374 (+8 for Phase 6):
- 6 tests for extend_reinforced_valid_until covering: extension path, permanent path, skip far-future, skip low-refs, skip permanent memories, audit row write
- 2 smoke tests for the detector (imports cleanly, handles empty DB)
- MCP tool changes don't need new tests; the wrapper is pure passthrough

Design decisions documented in plan file:
- atocore_remember deliberately doesn't bypass triage (quality gate)
- Detector is passive (surfaces proposals) not active (auto-registers)
- Sensible ingest-root defaults ("vault:incoming/projects/<id>/") so registration is one-click with no file-path thinking
- Extension adds 90 days rather than clearing expiry (gradual permanence earned through sustained reinforcement)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:08:55 -04:00
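
The transient-to-durable rule in C.3 of the commit message can be sketched as a pure function. This is only an illustration of the stated thresholds, not the code in src/atocore/memory/service.py: the name `next_valid_until`, its parameters, and the constants are hypothetical. Two details are assumptions: that the 90 days are added to the current expiry rather than to today, and that already-expired memories are left untouched.

```python
from __future__ import annotations

from datetime import date, timedelta

# Thresholds taken from the commit message; the names are hypothetical.
WINDOW_DAYS = 30      # only memories expiring within this window are considered
EXTEND_DAYS = 90      # extension applied to reinforced memories
MIN_REFS = 5          # reference_count needed for an extension
PERMANENT_REFS = 10   # reference_count at which the expiry is cleared


def next_valid_until(valid_until: date | None, reference_count: int, today: date) -> date | None:
    """Return the new expiry for a memory; None means permanent."""
    if valid_until is None:
        return None  # already permanent, nothing to do
    if reference_count < MIN_REFS:
        return valid_until  # not reinforced enough
    # Assumption: skip both far-future and already-expired memories.
    if not (today <= valid_until <= today + timedelta(days=WINDOW_DAYS)):
        return valid_until
    if reference_count >= PERMANENT_REFS:
        return None  # clear the expiry entirely
    return valid_until + timedelta(days=EXTEND_DAYS)


today = date(2026, 4, 18)
print(next_valid_until(date(2026, 5, 1), 5, today))    # 2026-07-30
print(next_valid_until(date(2026, 5, 1), 10, today))   # None
print(next_valid_until(date(2026, 12, 1), 8, today))   # 2026-12-01
```

Keeping the rule free of database access makes the extension, permanent, far-future, and low-reference paths each checkable with a single call.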
#!/usr/bin/env python3
"""Phase 6 C.1 — Emerging-concepts detector.

Scans active + candidate memories to surface:

1. Unregistered projects: project strings appearing on 3+ memories
   that aren't in the project registry. Surfaced for one-click
   registration.
2. Emerging categories: top 20 domain_tags by frequency, for
   "what themes are emerging in my work?" intelligence.
3. Reinforced transients: active memories with reference_count >= 5
   AND valid_until set. These were created as temporary but have
   proven durable; candidates for valid_until extension (handled by
   a sibling script).

Writes results to project_state under atocore/proposals/*. Emits a
warning alert the FIRST time a project crosses the 5-memory threshold
(so the user gets notified without being spammed on every run).

Usage:
    python3 scripts/detect_emerging.py [--base-url URL] [--dry-run]
"""

from __future__ import annotations

import argparse
import json
import os
import sys
from collections import Counter, defaultdict

# src/ importable so we can reuse service helpers
_SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
_SRC_DIR = os.path.abspath(os.path.join(_SCRIPT_DIR, "..", "src"))
if _SRC_DIR not in sys.path:
    sys.path.insert(0, _SRC_DIR)

PROJECT_MIN_MEMORIES = int(os.environ.get("ATOCORE_EMERGING_PROJECT_MIN", "3"))
PROJECT_ALERT_THRESHOLD = int(os.environ.get("ATOCORE_EMERGING_ALERT_THRESHOLD", "5"))
TOP_TAGS_LIMIT = int(os.environ.get("ATOCORE_EMERGING_TOP_TAGS", "20"))


def main() -> None:
    parser = argparse.ArgumentParser(description="Detect emerging projects + categories")
    # NOTE: --base-url is accepted but not read anywhere below; the script
    # calls the service helpers in-process.
    parser.add_argument("--base-url", default=os.environ.get("ATOCORE_BASE_URL", "http://127.0.0.1:8100"))
    parser.add_argument("--dry-run", action="store_true", help="Report without writing to project state")
    args = parser.parse_args()

    from atocore.memory.service import get_memories
    from atocore.projects.registry import load_project_registry
    from atocore.context.project_state import set_state, get_state

    # Registered project ids (including aliases: a memory tagged 'p04' should
    # NOT be flagged as emerging since 'p04' is a registered alias for p04-gigabit)
    registered = set()
    for p in load_project_registry():
        registered.add(p.project_id.lower())
        for alias in p.aliases:
            registered.add(alias.lower())

    # Pull active + candidate memories (give ourselves a broad view)
    active = get_memories(active_only=True, limit=500)
    candidates = get_memories(status="candidate", limit=500)
    all_mems = list(active) + list(candidates)

    # --- Unregistered projects ---
    project_mems: dict[str, list] = defaultdict(list)
    for m in all_mems:
        proj = (m.project or "").strip().lower()
        if not proj or proj in registered:
            continue
        project_mems[proj].append(m)

    unregistered = []
    for proj, mems in sorted(project_mems.items()):
        if len(mems) < PROJECT_MIN_MEMORIES:
            continue
        unregistered.append({
            "project": proj,
            "count": len(mems),
            "sample_memory_ids": [m.id for m in mems[:3]],
            "sample_contents": [(m.content or "")[:150] for m in mems[:3]],
        })

    # --- Emerging domain_tags (only active memories; candidates might be noise) ---
    tag_counter = Counter()
    for m in active:
        for t in (m.domain_tags or []):
            if isinstance(t, str) and t.strip():
                tag_counter[t.strip().lower()] += 1

    emerging_tags = [
        {"tag": tag, "count": cnt}
        for tag, cnt in tag_counter.most_common(TOP_TAGS_LIMIT)
    ]

    # --- Reinforced transients ---
    reinforced = []
    for m in active:
        ref_count = getattr(m, "reference_count", 0) or 0
        vu = (getattr(m, "valid_until", "") or "").strip()
        if ref_count >= 5 and vu:
            reinforced.append({
                "memory_id": m.id,
                "reference_count": ref_count,
                "valid_until": vu,
                "content_preview": (m.content or "")[:150],
                "project": m.project or "",
            })

    # --- Output ---
    result = {
        "unregistered_projects": unregistered,
        "emerging_categories": emerging_tags,
        "reinforced_transients": reinforced,
        "counts": {
            "active_memories": len(active),
            "candidate_memories": len(candidates),
            "unregistered_project_count": len(unregistered),
            "emerging_tag_count": len(emerging_tags),
            "reinforced_transient_count": len(reinforced),
        },
    }
    print(json.dumps(result, indent=2))

    if args.dry_run:
        return

    # --- Persist to project state ---
    try:
        set_state(
            project_name="atocore",
            category="proposals",
            key="unregistered_projects",
            value=json.dumps(unregistered),
            source="emerging detector",
        )
        set_state(
            project_name="atocore",
            category="proposals",
            key="emerging_categories",
            value=json.dumps(emerging_tags),
            source="emerging detector",
        )
        set_state(
            project_name="atocore",
            category="proposals",
            key="reinforced_transients",
            value=json.dumps(reinforced),
            source="emerging detector",
        )
    except Exception as e:
        print(f"WARN: failed to persist to project state: {e}", file=sys.stderr)

    # --- Alert on NEW projects crossing alert threshold ---
    try:
        # Read previous run's projects to detect "new" ones
        prev_unregistered: list = []
        for entry in get_state("atocore"):
            if entry.category == "proposals" and entry.key == "unregistered_projects_prev":
                try:
                    prev_unregistered = json.loads(entry.value)
                except Exception:
                    pass
        # Only projects that were already at/above the alert threshold on the
        # previous run count as "seen". Otherwise a project first listed with
        # 3-4 memories would never alert when it later reaches the threshold.
        prev_names = {
            p.get("project")
            for p in prev_unregistered
            if isinstance(p, dict)
            and (p.get("count") or 0) >= PROJECT_ALERT_THRESHOLD
        }

        newly_crossed = [
            p for p in unregistered
            if p["count"] >= PROJECT_ALERT_THRESHOLD
            and p["project"] not in prev_names
        ]
        if newly_crossed:
            from atocore.observability.alerts import emit_alert
            names = ", ".join(p["project"] for p in newly_crossed)
            emit_alert(
                severity="warning",
                title=f"Emerging project(s) detected: {names}",
                message=(
                    f"{len(newly_crossed)} unregistered project(s) have crossed "
                    f"the {PROJECT_ALERT_THRESHOLD}-memory threshold and may "
                    f"warrant registration: {names}. Review at /wiki or "
                    f"/admin/dashboard."
                ),
                context={"projects": [p["project"] for p in newly_crossed]},
            )

        # Persist this run's list for next-run comparison
        set_state(
            project_name="atocore",
            category="proposals",
            key="unregistered_projects_prev",
            value=json.dumps(unregistered),
            source="emerging detector",
        )
    except Exception as e:
        print(f"WARN: alert/state write failed: {e}", file=sys.stderr)


if __name__ == "__main__":
    main()
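
The detection and alert-once behavior described in the module docstring can be exercised without a database. The sketch below is a minimal illustration using plain dicts in place of memory objects; `find_unregistered` and `newly_crossed` are hypothetical helper names, not functions exported by this script, and the sample project names are made up. It shows one way to get the "alert only the first time" behavior: treat a project as already seen only if it was at or above the threshold on the previous run.

```python
from collections import defaultdict


def find_unregistered(memories, registered, min_memories=3):
    """Group memories by normalized project name, skipping registered ones."""
    by_project = defaultdict(list)
    for m in memories:
        proj = (m.get("project") or "").strip().lower()
        if proj and proj not in registered:
            by_project[proj].append(m)
    return [
        {"project": proj, "count": len(mems)}
        for proj, mems in sorted(by_project.items())
        if len(mems) >= min_memories
    ]


def newly_crossed(current, previous, threshold=5):
    """Projects at/above the threshold now that were below it last run."""
    already = {p["project"] for p in previous if p.get("count", 0) >= threshold}
    return [
        p["project"]
        for p in current
        if p["count"] >= threshold and p["project"] not in already
    ]


# 'p04' is a registered alias, so its memories are never flagged.
registered = {"p04-gigabit", "p04"}
run1 = [{"project": "P04"}] * 4 + [{"project": "lab-notes"}] * 3
run2 = run1 + [{"project": "Lab-Notes "}] * 2  # normalized to 'lab-notes'

prev = find_unregistered(run1, registered)
curr = find_unregistered(run2, registered)
print(prev)                       # [{'project': 'lab-notes', 'count': 3}]
print(newly_crossed(curr, prev))  # ['lab-notes']
print(newly_crossed(curr, curr))  # []
```

The last two calls show the intended behavior: the project alerts on the run where it first reaches five memories and stays quiet on the following run.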