feat: Phase 4 V1 — Robustness Hardening
Adds the observability + safety layer that turns AtoCore from
"works until something silently breaks" into "every mutation is
traceable, drift is detected, failures raise alerts."
1. Audit log (memory_audit table):
- New table with id, memory_id, action, actor, before/after JSON,
note, timestamp; 3 indexes for memory_id/timestamp/action
- _audit_memory() helper called from every mutation:
create_memory, update_memory, promote_memory,
reject_candidate_memory, invalidate_memory, supersede_memory,
reinforce_memory, auto_promote_reinforced, expire_stale_candidates
- Action verb auto-selected: promoted/rejected/invalidated/
superseded/updated based on state transition
- "actor" threaded through: api-http, human-triage,
  phase10-auto-promote, candidate-expiry, reinforcement, etc.
- Fail-open: a failed audit write is logged but never breaks the mutation
- GET /memory/{id}/audit: full history for one memory
- GET /admin/audit/recent: last 50 mutations across the system
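For context, the table shape reduces to something like the sketch below, assuming SQLite. The column names follow the INSERT in _audit_memory(); the types, defaults, and index names are illustrative, not the actual migration.

```python
import sqlite3

# Sketch of the memory_audit table, assuming SQLite. Column names follow
# the INSERT in _audit_memory(); types, defaults, and index names are
# illustrative, not copied from the real migration.
DDL = """
CREATE TABLE IF NOT EXISTS memory_audit (
    id          TEXT PRIMARY KEY,
    memory_id   TEXT NOT NULL,
    action      TEXT NOT NULL,
    actor       TEXT NOT NULL DEFAULT 'api',
    before_json TEXT NOT NULL DEFAULT '{}',
    after_json  TEXT NOT NULL DEFAULT '{}',
    note        TEXT NOT NULL DEFAULT '',
    timestamp   TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_memory_audit_memory_id ON memory_audit(memory_id);
CREATE INDEX IF NOT EXISTS idx_memory_audit_timestamp ON memory_audit(timestamp);
CREATE INDEX IF NOT EXISTS idx_memory_audit_action    ON memory_audit(action);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```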
2. Alerts framework (src/atocore/observability/alerts.py):
- emit_alert(severity, title, message, context) fans out to:
- structlog logger (always)
- ~/atocore-logs/alerts.log append (configurable via
ATOCORE_ALERT_LOG)
- project_state atocore/alert/last_{severity} (dashboard surface)
- ATOCORE_ALERT_WEBHOOK POST if set (auto-detects Discord webhook
format for nice embeds; generic JSON otherwise)
- Every sink is fail-open: one failure doesn't prevent the others
- Pipeline alert step in nightly cron: harness < 85% → warning;
candidate queue > 200 → warning
3. Integrity checks (scripts/integrity_check.py):
- Nightly scan for drift:
- Memories → missing source_chunk_id references
- Duplicate active memories (same type+content+project)
- project_state → missing projects
- Orphaned source_chunks (no parent document)
- Results persisted to atocore/status/integrity_check_result
- Any finding emits a warning alert
- Added as Step G in deploy/dalidou/batch-extract.sh nightly cron
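As an illustration of the drift checks, the duplicate-active-memories scan reduces to a GROUP BY; the memories table and column names below are assumed from the rest of this commit, not taken from integrity_check.py.

```python
import sqlite3

def find_duplicate_active_memories(conn: sqlite3.Connection) -> list[tuple]:
    """One nightly drift check, sketched: active memories sharing
    memory_type + content + project (table/column names assumed)."""
    return conn.execute(
        "SELECT memory_type, content, project, COUNT(*) AS n "
        "FROM memories WHERE status = 'active' "
        "GROUP BY memory_type, content, project "
        "HAVING n > 1"
    ).fetchall()
```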
4. Dashboard surfaces it all:
- integrity (findings + details)
- alerts (last info/warning/critical per severity)
- recent_audit (last 10 mutations with actor + action + preview)
Tests: 308 → 317 (9 new):
- test_audit_create_logs_entry
- test_audit_promote_logs_entry
- test_audit_reject_logs_entry
- test_audit_update_captures_before_after
- test_audit_reinforce_logs_entry
- test_recent_audit_returns_cross_memory_entries
- test_emit_alert_writes_log_file
- test_emit_alert_invalid_severity_falls_back_to_info
- test_emit_alert_fails_open_on_log_write_error
Deferred: formal migration framework with rollback (current additive
pattern is fine for V1); memory detail wiki page with audit view
(quick follow-up).
To enable Discord alerts: set ATOCORE_ALERT_WEBHOOK to a Discord
webhook URL in Dalidou's environment. Default = log-only.
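The webhook sink's format detection can be sketched like this; the URL heuristic and embed colors are assumptions for illustration, not the exact implementation.

```python
def build_webhook_payload(url: str, severity: str, title: str, message: str) -> dict:
    """Shape the POST body for ATOCORE_ALERT_WEBHOOK: a Discord embed when
    the URL looks like a Discord webhook, generic JSON otherwise.
    Heuristic and colors are illustrative assumptions."""
    if "discord.com/api/webhooks" in url:
        colors = {"info": 0x3498DB, "warning": 0xF1C40F, "critical": 0xE74C3C}
        return {"embeds": [{
            "title": title,
            "description": message,
            "color": colors.get(severity, 0x95A5A6),
        }]}
    return {"severity": severity, "title": title, "message": message}
```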
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -67,6 +67,106 @@ class Memory:
     valid_until: str = ""  # ISO UTC; empty = permanent
 
 
+def _audit_memory(
+    memory_id: str,
+    action: str,
+    actor: str = "api",
+    before: dict | None = None,
+    after: dict | None = None,
+    note: str = "",
+) -> None:
+    """Append an entry to memory_audit.
+
+    Phase 4 Robustness V1. Every memory mutation flows through this
+    helper so we can answer "how did this memory get to its current
+    state?" and "when did we learn X?".
+
+    ``action`` is a short verb: created, updated, promoted, rejected,
+    superseded, invalidated, reinforced, auto_promoted, expired.
+    ``actor`` identifies the caller: api (default), auto-triage,
+    human-triage, host-cron, reinforcement, phase10-auto-promote,
+    etc. ``before`` / ``after`` are field snapshots (JSON-serialized).
+    Fail-open: a logging failure never breaks the mutation itself.
+    """
+    import json as _json
+
+    try:
+        with get_connection() as conn:
+            conn.execute(
+                "INSERT INTO memory_audit (id, memory_id, action, actor, "
+                "before_json, after_json, note) VALUES (?, ?, ?, ?, ?, ?, ?)",
+                (
+                    str(uuid.uuid4()),
+                    memory_id,
+                    action,
+                    actor or "api",
+                    _json.dumps(before or {}),
+                    _json.dumps(after or {}),
+                    (note or "")[:500],
+                ),
+            )
+    except Exception as e:
+        log.warning("memory_audit_failed", memory_id=memory_id, action=action, error=str(e))
+
+
+def get_memory_audit(memory_id: str, limit: int = 100) -> list[dict]:
+    """Fetch audit entries for a memory, newest first."""
+    import json as _json
+
+    with get_connection() as conn:
+        rows = conn.execute(
+            "SELECT id, memory_id, action, actor, before_json, after_json, note, timestamp "
+            "FROM memory_audit WHERE memory_id = ? ORDER BY timestamp DESC LIMIT ?",
+            (memory_id, limit),
+        ).fetchall()
+    out = []
+    for r in rows:
+        try:
+            before = _json.loads(r["before_json"] or "{}")
+        except Exception:
+            before = {}
+        try:
+            after = _json.loads(r["after_json"] or "{}")
+        except Exception:
+            after = {}
+        out.append({
+            "id": r["id"],
+            "memory_id": r["memory_id"],
+            "action": r["action"],
+            "actor": r["actor"] or "api",
+            "before": before,
+            "after": after,
+            "note": r["note"] or "",
+            "timestamp": r["timestamp"],
+        })
+    return out
+
+
+def get_recent_audit(limit: int = 50) -> list[dict]:
+    """Fetch recent memory_audit entries across all memories, newest first."""
+    import json as _json
+
+    with get_connection() as conn:
+        rows = conn.execute(
+            "SELECT id, memory_id, action, actor, before_json, after_json, note, timestamp "
+            "FROM memory_audit ORDER BY timestamp DESC LIMIT ?",
+            (limit,),
+        ).fetchall()
+    out = []
+    for r in rows:
+        try:
+            after = _json.loads(r["after_json"] or "{}")
+        except Exception:
+            after = {}
+        out.append({
+            "id": r["id"],
+            "memory_id": r["memory_id"],
+            "action": r["action"],
+            "actor": r["actor"] or "api",
+            "note": r["note"] or "",
+            "timestamp": r["timestamp"],
+            "content_preview": (after.get("content") or "")[:120],
+        })
+    return out
+
+
 def _normalize_tags(tags) -> list[str]:
     """Coerce a tags value (list, JSON string, None) to a clean lowercase list."""
     import json as _json
@@ -98,6 +198,7 @@ def create_memory(
     status: str = "active",
     domain_tags: list[str] | None = None,
     valid_until: str = "",
+    actor: str = "api",
 ) -> Memory:
     """Create a new memory entry.
 
@@ -160,6 +261,21 @@ def create_memory(
         valid_until=valid_until or "",
     )
 
+    _audit_memory(
+        memory_id=memory_id,
+        action="created",
+        actor=actor,
+        after={
+            "memory_type": memory_type,
+            "content": content,
+            "project": project,
+            "status": status,
+            "confidence": confidence,
+            "domain_tags": tags,
+            "valid_until": valid_until or "",
+        },
+    )
+
     return Memory(
         id=memory_id,
         memory_type=memory_type,
@@ -235,6 +351,8 @@ def update_memory(
     memory_type: str | None = None,
     domain_tags: list[str] | None = None,
     valid_until: str | None = None,
+    actor: str = "api",
+    note: str = "",
 ) -> bool:
     """Update an existing memory."""
     import json as _json
@@ -258,31 +376,48 @@ def update_memory(
     if duplicate:
         raise ValueError("Update would create a duplicate active memory")
 
+    # Capture before-state for audit
+    before_snapshot = {
+        "content": existing["content"],
+        "status": existing["status"],
+        "confidence": existing["confidence"],
+        "memory_type": existing["memory_type"],
+    }
+    after_snapshot = dict(before_snapshot)
+
     updates = []
     params: list = []
 
     if content is not None:
         updates.append("content = ?")
         params.append(content)
+        after_snapshot["content"] = content
     if confidence is not None:
         updates.append("confidence = ?")
         params.append(confidence)
+        after_snapshot["confidence"] = confidence
     if status is not None:
         if status not in MEMORY_STATUSES:
             raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
         updates.append("status = ?")
         params.append(status)
+        after_snapshot["status"] = status
     if memory_type is not None:
         if memory_type not in MEMORY_TYPES:
             raise ValueError(f"Invalid memory type '{memory_type}'. Must be one of: {MEMORY_TYPES}")
         updates.append("memory_type = ?")
         params.append(memory_type)
+        after_snapshot["memory_type"] = memory_type
     if domain_tags is not None:
+        norm_tags = _normalize_tags(domain_tags)
         updates.append("domain_tags = ?")
-        params.append(_json.dumps(_normalize_tags(domain_tags)))
+        params.append(_json.dumps(norm_tags))
+        after_snapshot["domain_tags"] = norm_tags
     if valid_until is not None:
+        vu = valid_until.strip() or None
         updates.append("valid_until = ?")
-        params.append(valid_until.strip() or None)
+        params.append(vu)
+        after_snapshot["valid_until"] = vu or ""
 
     if not updates:
         return False
@@ -297,21 +432,40 @@ def update_memory(
 
     if result.rowcount > 0:
         log.info("memory_updated", memory_id=memory_id)
+        # Action verb is driven by status change when applicable; otherwise "updated"
+        if status == "active" and before_snapshot["status"] == "candidate":
+            action = "promoted"
+        elif status == "invalid" and before_snapshot["status"] == "candidate":
+            action = "rejected"
+        elif status == "invalid":
+            action = "invalidated"
+        elif status == "superseded":
+            action = "superseded"
+        else:
+            action = "updated"
+        _audit_memory(
+            memory_id=memory_id,
+            action=action,
+            actor=actor,
+            before=before_snapshot,
+            after=after_snapshot,
+            note=note,
+        )
         return True
     return False
 
 
-def invalidate_memory(memory_id: str) -> bool:
+def invalidate_memory(memory_id: str, actor: str = "api") -> bool:
     """Mark a memory as invalid (error correction)."""
-    return update_memory(memory_id, status="invalid")
+    return update_memory(memory_id, status="invalid", actor=actor)
 
 
-def supersede_memory(memory_id: str) -> bool:
+def supersede_memory(memory_id: str, actor: str = "api") -> bool:
     """Mark a memory as superseded (replaced by newer info)."""
-    return update_memory(memory_id, status="superseded")
+    return update_memory(memory_id, status="superseded", actor=actor)
 
 
-def promote_memory(memory_id: str) -> bool:
+def promote_memory(memory_id: str, actor: str = "api", note: str = "") -> bool:
     """Promote a candidate memory to active (Phase 9 Commit C review queue).
 
     Returns False if the memory does not exist or is not currently a
@@ -326,10 +480,10 @@ def promote_memory(memory_id: str) -> bool:
         return False
     if row["status"] != "candidate":
         return False
-    return update_memory(memory_id, status="active")
+    return update_memory(memory_id, status="active", actor=actor, note=note)
 
 
-def reject_candidate_memory(memory_id: str) -> bool:
+def reject_candidate_memory(memory_id: str, actor: str = "api", note: str = "") -> bool:
     """Reject a candidate memory (Phase 9 Commit C).
 
     Sets the candidate's status to ``invalid`` so it drops out of the
@@ -344,7 +498,7 @@ def reject_candidate_memory(memory_id: str) -> bool:
         return False
     if row["status"] != "candidate":
         return False
-    return update_memory(memory_id, status="invalid")
+    return update_memory(memory_id, status="invalid", actor=actor, note=note)
 
 
 def reinforce_memory(
@@ -385,6 +539,17 @@ def reinforce_memory(
         old_confidence=round(old_confidence, 4),
         new_confidence=round(new_confidence, 4),
     )
+    # Reinforcement writes an audit row per bump. Reinforcement fires often
+    # (every captured interaction); this lets you trace which interactions
+    # kept which memories alive. Could become chatty but is invaluable for
+    # decay/cold-memory analysis. If it becomes an issue, throttle here.
+    _audit_memory(
+        memory_id=memory_id,
+        action="reinforced",
+        actor="reinforcement",
+        before={"confidence": old_confidence},
+        after={"confidence": new_confidence},
+    )
     return True, old_confidence, new_confidence
 
 
@@ -420,7 +585,11 @@ def auto_promote_reinforced(
 
     for row in rows:
         mid = row["id"]
-        ok = promote_memory(mid)
+        ok = promote_memory(
+            mid,
+            actor="phase10-auto-promote",
+            note=f"ref_count={row['reference_count']} confidence={row['confidence']:.2f}",
+        )
         if ok:
             promoted.append(mid)
             log.info(
@@ -459,7 +628,11 @@ def expire_stale_candidates(
 
     for row in rows:
         mid = row["id"]
-        ok = reject_candidate_memory(mid)
+        ok = reject_candidate_memory(
+            mid,
+            actor="candidate-expiry",
+            note=f"unreinforced for {max_age_days}+ days",
+        )
        if ok:
             expired.append(mid)
             log.info("memory_expired", memory_id=mid)