Daily job multiplies confidence by 0.97 (halving in ~23 days of
decay, roughly two months from last reference once the 30-day idle
grace is counted) for active memories with reference_count=0 AND
idle > 30 days. Below 0.3 → auto-supersede with audit. Reversible
via reinforcement (which already bumps confidence back up).
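A minimal sketch of the daily tick, assuming a dict-shaped memory
row (last_referenced_at and the other field names are illustrative,
not the actual service.py schema):

    from datetime import datetime, timedelta

    DECAY_FACTOR = 0.97   # per-day multiplier
    DECAY_FLOOR = 0.3     # below this, auto-supersede with an audit row
    IDLE_DAYS = 30        # grace period before decay starts

    def decay_tick(memory: dict, now: datetime) -> dict:
        """One daily decay pass over a single memory row."""
        if memory["status"] != "active" or memory["reference_count"] > 0:
            return memory                    # exempt: reinforced or non-active
        if now - memory["last_referenced_at"] < timedelta(days=IDLE_DAYS):
            return memory                    # still inside the idle grace
        memory["confidence"] *= DECAY_FACTOR
        if memory["confidence"] < DECAY_FLOOR:
            memory["status"] = "superseded"  # caller records the audit row
        return memory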
Rationale: stale memories currently rank equal to fresh ones in
retrieval. Without decay, the brain accumulates obsolete facts
that compete with fresh knowledge for context-pack slots. With
decay, memories earn their longevity via reference.
- decay_unreferenced_memories() in service.py (stdlib-only, no cron
infra needed)
- POST /admin/memory/decay-run endpoint
- Nightly Step F4 in batch-extract.sh
- Exempt: reinforced (refcount > 0), graduated, superseded, invalid
- Audit row per supersession ("decayed below floor, no references"),
actor="confidence-decay". Per-decay rows skipped (chatty, no
human value — status change is the meaningful signal).
- Configurable via env: ATOCORE_DECAY_* (exposed through endpoint body)
Tests: +13 (basic decay, reinforcement protection, supersede at floor,
audit trail, graduated/superseded exemption, reinforcement reversibility,
threshold tuning, parameter validation, cross-run stacking).
401 → 414.
Next in Phase 7: 7C tag canonicalization (weekly), then 7B contradiction
detection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dedup detector now merges high-confidence duplicates silently instead
of piling every proposal into a human triage queue. Matches the 3-tier
escalation pattern that auto_triage already uses.
Tiering decision per cluster:
TIER-1 auto-approve: sonnet confidence >= 0.8 AND min_pairwise_sim >= 0.92
AND all sources share project+type → auto-merge silently
(actor="auto-dedup-tier1" in audit log)
TIER-2 escalation: sonnet 0.5-0.8 conf OR sim 0.85-0.92 → opus second opinion.
Opus confirms with conf >= 0.8 → auto-merge (actor="auto-dedup-tier2").
Opus overrides (reject) → skip silently.
Opus low conf → human triage with opus's refined draft.
HUMAN triage: Only the genuinely ambiguous land in /admin/triage.
Env-tunable thresholds:
ATOCORE_DEDUP_AUTO_APPROVE_CONF (0.8)
ATOCORE_DEDUP_AUTO_APPROVE_SIM (0.92)
ATOCORE_DEDUP_TIER2_MIN_CONF (0.5)
ATOCORE_DEDUP_TIER2_MIN_SIM (0.85)
ATOCORE_DEDUP_TIER2_MODEL (opus)
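A sketch of the per-cluster routing under these thresholds (field
names, return labels, and the below-both-bands fallthrough are
assumptions, not the detector's real types):

    import os

    AUTO_CONF = float(os.getenv("ATOCORE_DEDUP_AUTO_APPROVE_CONF", "0.8"))
    AUTO_SIM = float(os.getenv("ATOCORE_DEDUP_AUTO_APPROVE_SIM", "0.92"))
    T2_CONF = float(os.getenv("ATOCORE_DEDUP_TIER2_MIN_CONF", "0.5"))
    T2_SIM = float(os.getenv("ATOCORE_DEDUP_TIER2_MIN_SIM", "0.85"))

    def route_cluster(conf: float, min_sim: float, same_bucket: bool) -> str:
        """Pick a tier for one dedup cluster."""
        if conf >= AUTO_CONF and min_sim >= AUTO_SIM and same_bucket:
            return "tier1-auto-merge"    # silent, actor="auto-dedup-tier1"
        if T2_CONF <= conf < AUTO_CONF or T2_SIM <= min_sim < AUTO_SIM:
            return "tier2-opus-review"   # opus: merge, skip, or refine
        return "human-triage"            # the genuinely ambiguous remainder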
New flag --no-auto-approve for kill-switch testing (everything → human queue).
Tests: +6 (tier-2 prompt content, same_bucket edges, min_pairwise_similarity
on identical + transitive clusters). 395 → 401.
Rationale: user asked for autonomous behavior — "this needs to be intelligent,
I don't want to manually triage stuff". Matches the consolidation principle:
never discard details, but let the brain tidy up on its own for the easy cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New table memory_merge_candidates + service functions to cluster
near-duplicate active memories within (project, memory_type) buckets,
draft unified content via an LLM, and merge on human approval. Source
memories become superseded (never deleted); merged memory carries
union of tags, max of confidence, sum of reference_count.
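A sketch of that field combination (field names assumed from this
description):

    def combine_fields(sources: list[dict], drafted: str) -> dict:
        """Merged memory keeps everything: union of tags, max confidence,
        summed reference counts; sources are superseded, never deleted."""
        return {
            "content": drafted,  # LLM-drafted unified text
            "tags": sorted({t for m in sources for t in m["tags"]}),
            "confidence": max(m["confidence"] for m in sources),
            "reference_count": sum(m["reference_count"] for m in sources),
        }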
- schema migration for memory_merge_candidates
- atocore.memory.similarity: cosine + transitive clustering
- atocore.memory._dedup_prompt: stdlib-only LLM prompt preserving every specific detail
- service: merge_memories / create_merge_candidate / get_merge_candidates / reject_merge_candidate
- scripts/memory_dedup.py: host-side detector (HTTP-only, idempotent)
- 5 API endpoints under /admin/memory/merge-candidates* + /admin/memory/dedup-scan
- triage UI: purple "🔗 Merge Candidates" section + "🔗 Scan for duplicates" bar
- batch-extract.sh Step B3 (0.90 daily, 0.85 Sundays)
- deploy/dalidou/dedup-watcher.sh for UI-triggered scans
- 21 new tests (374 → 395)
- docs/PHASE-7-MEMORY-CONSOLIDATION.md covering 7A-7H roadmap
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User observation: APM work was captured + extracted, but candidates got
tagged project=atocore or left blank instead of project=apm. Reason:
the prompt said 'Unknown project names — still tag them' but was too
terse; sonnet hedged toward registered matches rather than proposing
new slugs.
Fix: explicit guidance in the system prompt for when to propose an
unregistered project name vs when to stick with a registered one.
New instructions:
- When a memory is clearly ABOUT a named tool/product/project/system
not in the known list, use a slugified version as the project tag
('apm' for 'Atomaste Part Manager'). The Living Taxonomy detector
(Phase 6 C.1) scans these and surfaces for one-click registration
once ≥3 memories accumulate with that tag.
- Exception: if a memory merely USES an unknown tool but is about a
registered project ('p04 parts missing materials in APM'), tag with
the registered project and mention the tool in content.
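For illustration only, a mechanical slug helper; the model may
equally pick an acronym like 'apm', and no such helper ships in the
extractor:

    import re

    def slugify(name: str) -> str:
        """Lowercase, keep alphanumerics, join words with hyphens."""
        return "-".join(re.findall(r"[a-z0-9]+", name.lower()))

    assert slugify("Atomaste Part Manager") == "atomaste-part-manager"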
This closes the loop on the Phase 6 detector: extractor now produces
taggable data for it, detector surfaces, user registers with one click.
Version bump: llm-0.5.0 → llm-0.6.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes two real-use gaps:
1. "APM tool" gap: work done outside Claude Code (desktop, web, phone,
other machine) was invisible to AtoCore.
2. Project discovery gap: manual JSON-file edits required to promote
an emerging theme to a first-class project.
B — atocore_remember MCP tool (scripts/atocore_mcp.py):
- New MCP tool for universal capture from any MCP-aware client
(Claude Desktop, Code, Cursor, Zed, Windsurf, etc.)
- Accepts content (required) + memory_type/project/confidence/
valid_until/domain_tags (all optional with sensible defaults)
- Creates a candidate memory, goes through the existing 3-tier triage
(no bypass — the quality gate catches noise)
- Detailed tool description guides Claude on when to invoke: "remember
this", "save that for later", "don't lose this fact"
- Total tools exposed by MCP server: 14 → 15
C.1 Emerging-concepts detector (scripts/detect_emerging.py):
- Nightly scan of active + candidate memories for:
* Unregistered project names with ≥3 memory occurrences
* Top 20 domain_tags by frequency (emerging categories)
* Active memories with reference_count ≥ 5 + valid_until set
(reinforced transients — candidates for extension)
- Writes findings to atocore/proposals/* project state entries
- Emits "warning" alert via Phase 4 framework the FIRST time a new
project crosses the 5-memory alert threshold (avoids spam)
- Configurable via env vars: ATOCORE_EMERGING_PROJECT_MIN (default 3),
ATOCORE_EMERGING_ALERT_THRESHOLD (default 5), TOP_TAGS_LIMIT (20)
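The unregistered-project count behind the first scan bullet,
sketched (memory shape and registry lookup are assumptions):

    import os
    from collections import Counter

    PROJECT_MIN = int(os.getenv("ATOCORE_EMERGING_PROJECT_MIN", "3"))

    def emerging_projects(memories: list[dict], registered: set) -> dict:
        """Count project tags absent from the registry; anything seen
        PROJECT_MIN or more times becomes a proposal."""
        counts = Counter(
            m["project"] for m in memories
            if m["project"] and m["project"] not in registered
        )
        return {slug: n for slug, n in counts.items() if n >= PROJECT_MIN}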
C.2 Registration surface (src/atocore/api/routes.py + wiki.py):
- POST /admin/projects/register-emerging — one-click register with
sensible defaults (ingest_roots auto-filled with
vault:incoming/projects/<id>/ convention). Clears the proposal
from the dashboard list on success.
- Dashboard /admin/dashboard: new "proposals" section with
unregistered_projects + emerging_categories + reinforced_transients.
- Wiki homepage: "📋 Emerging" section rendering each unregistered
project as a card with count + 2 sample memory previews + inline
"📌 Register as project" button that calls the endpoint via fetch,
reloads the page on success.
C.3 Transient-to-durable extension
(src/atocore/memory/service.py + API + cron):
- New extend_reinforced_valid_until() function — scans active memories
with valid_until in the next 30 days and reference_count ≥ 5.
Extends expiry by 90 days. If reference_count ≥ 10, clears expiry
entirely (makes permanent). Writes audit rows via the Phase 4
memory_audit framework with actor="transient-to-durable".
- POST /admin/memory/extend-reinforced — API wrapper for cron.
- Matches the user's intuition: "something transient becomes important
if you keep coming back to it".
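The extension rule from the first bullet, sketched (date handling
simplified; the real function also writes the audit rows):

    from datetime import date, timedelta

    def new_expiry(valid_until: date | None, refs: int, today: date):
        """+90 days at refs >= 5; permanent (None) at refs >= 10; only
        memories expiring within 30 days qualify."""
        if valid_until is None or valid_until > today + timedelta(days=30):
            return valid_until            # permanent, or not yet near expiry
        if refs >= 10:
            return None                   # clear expiry: earned permanence
        if refs >= 5:
            return valid_until + timedelta(days=90)
        return valid_until                # under-reinforced: unchanged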
Nightly cron (deploy/dalidou/batch-extract.sh):
- Step F2: detect_emerging.py (after F pipeline summary)
- Step F3: /admin/memory/extend-reinforced (before integrity check)
- Both fail-open; errors don't break the pipeline.
Tests: 366 → 374 (+8 for Phase 6):
- 6 tests for extend_reinforced_valid_until covering:
extension path, permanent path, skip far-future, skip low-refs,
skip permanent memories, audit row write
- 2 smoke tests for the detector (imports cleanly, handles empty DB)
- MCP tool changes don't need new tests — the wrapper is pure passthrough
Design decisions documented in plan file:
- atocore_remember deliberately doesn't bypass triage (quality gate)
- Detector is passive (surfaces proposals) not active (auto-registers)
- Sensible ingest-root defaults ("vault:incoming/projects/<id>/")
so registration is one-click with no file-path thinking
- Extension adds 90 days rather than clearing expiry (gradual
permanence earned through sustained reinforcement)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NameError on /admin/graduation/request — routes.py doesn't import json
at module scope. Added local 'import json as _json' in both graduation
handlers matching the pattern used elsewhere in this file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes the graduation UX loop: no more SSH required to populate the
entity graph from memories. Click button → host watcher picks up
→ graduation runs → entity candidates appear in the same triage UI.
New API endpoints (src/atocore/api/routes.py):
- POST /admin/graduation/request: takes {project, limit}, writes flag
to project_state. Host watcher picks up within 2 min.
- GET /admin/graduation/status: returns requested/running/last_result
fields for UI polling.
Triage UI (src/atocore/engineering/triage_ui.py):
- Graduation bar with:
- 🎓 Graduate memories button
- Project selector populated from registry (or "all projects")
- Limit number input (default 30, max 200)
- Status message area
- Poll every 10s until is_running=false, then auto-reload the page to
show new entity candidates in the Entity section below
- Graduation bar appears on both populated and empty triage page
states so you can kick off graduation from either
Host watcher (deploy/dalidou/graduation-watcher.sh):
- Mirrors auto-triage-watcher.sh pattern: poll, lock, clear flag,
run, record result, unlock
- Parses {project, limit} JSON from the flag payload
- Runs graduate_memories.py with those args
- Records graduation_running/started/finished/last_result in project
state for the UI to display
- Lock file prevents concurrent runs
Install on host (one-time, via cron):
*/2 * * * * /srv/storage/atocore/app/deploy/dalidou/graduation-watcher.sh \
>> /home/papa/atocore-logs/graduation-watcher.log 2>&1
This completes the Phase 5 self-service loop: queue triage happens
autonomously via the 3-tier escalation (shipped in 3ca1972); entity
graph population happens autonomously via a button click. No shell
required for daily use.
Tests: 366 passing (no new tests — UI + shell are integration-level).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The population move + the safety net + the universal consumer hookup,
all shipped together. This is where the engineering graph becomes
genuinely useful against the real 262-memory corpus.
5F: Memory → Entity graduation (THE population move)
- src/atocore/engineering/_graduation_prompt.py: stdlib-only shared
prompt module mirroring _llm_prompt.py pattern (container + host
use same system prompt, no drift)
- scripts/graduate_memories.py: host-side batch driver that asks
claude-p "does this memory describe a typed entity?" and creates
entity candidates with source_refs pointing back to the memory
- promote_entity() now scans source_refs for memory:* prefix; if
found, flips source memory to status='graduated' with
graduated_to_entity_id forward pointer + writes memory_audit row
(sketched after this list)
- GET /admin/graduation/stats exposes graduation rate for dashboard
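A sketch of that promote-time hook (table and column names assumed
from the description; the audit write is elided):

    def graduate_sources(entity_id: str, source_refs: list, db) -> None:
        """On promote, flip memory:* sources to 'graduated' and leave a
        forward pointer back to the entity."""
        for ref in source_refs:
            if not ref.startswith("memory:"):
                continue                 # non-memory provenance, leave alone
            db.execute(
                "UPDATE memories SET status='graduated', "
                "graduated_to_entity_id=? WHERE id=? AND status='active'",
                (entity_id, ref.removeprefix("memory:")),
            )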
5G: Sync conflict detection on entity promote
- src/atocore/engineering/conflicts.py: detect_conflicts_for_entity()
runs on every active promote. V1 checks 3 slot kinds narrowly to
avoid false positives (the material check is sketched after this list):
* component.material (multiple USES_MATERIAL edges)
* component.part_of (multiple PART_OF edges)
* requirement.name (duplicate active Requirements in same project)
- Conflicts + members persist via the tables built in 5A
- Fires a "warning" alert via Phase 4 framework
- Deduplicates: same (slot_kind, slot_key) won't get a new row
- resolve_conflict(action="dismiss|supersede_others|no_action"):
supersede_others marks non-winner members as status='superseded'
- GET /admin/conflicts + POST /admin/conflicts/{id}/resolve
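The component.material check from above, sketched (edge row shape is
an assumption):

    from collections import defaultdict

    def material_conflicts(edges: list[dict]) -> list[dict]:
        """Flag components carrying more than one USES_MATERIAL edge."""
        mats = defaultdict(list)
        for e in edges:
            if e["relationship_type"] == "uses_material":
                mats[e["source_entity_id"]].append(e["target_entity_id"])
        return [
            {"slot_kind": "component.material", "slot_key": c, "members": ms}
            for c, ms in mats.items() if len(ms) > 1
        ]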
5H: MCP + context pack integration
- scripts/atocore_mcp.py: 7 new engineering tools exposed to every
MCP-aware client (Claude Desktop, Claude Code, Cursor, Zed):
* atocore_engineering_map (Q-001/004 system tree)
* atocore_engineering_gaps (Q-006/009/011 killer queries — THE
director's question surfaced as a built-in tool)
* atocore_engineering_requirements_for_component (Q-005)
* atocore_engineering_decisions (Q-008)
* atocore_engineering_changes (Q-013 — reads entity audit log)
* atocore_engineering_impact (Q-016 BFS downstream)
* atocore_engineering_evidence (Q-017 inbound provenance)
- MCP tools total: 14 (7 memory/state/health + 7 engineering)
- context/builder.py _build_engineering_context now appends a compact
gaps summary ("Gaps: N orphan reqs, M risky decisions, K unsupported
claims") so every project-scoped LLM call sees "what we're missing"
Tests: 341 → 356 (15 new):
- 5F: graduation prompt parses positive/negative decisions, rejects
unknown entity types, tolerates markdown fences; promote_entity
marks source memory graduated with forward pointer; entity without
memory refs promotes cleanly
- 5G: component.material + component.part_of + requirement.name
conflicts detected; clean component triggers nothing; dedup works;
supersede_others resolution marks losers; dismiss leaves both
active; end-to-end promote triggers detection
- 5H: graduation user message includes project + type + content
No regressions across the 341 prior tests. The MCP server now answers
"which p05 requirements aren't satisfied?" directly from any Claude
session — no user prompt engineering, no context hacks.
Next to kick off from user: run graduation script on Dalidou to
populate the graph from 262 existing memories:
ssh papa@dalidou 'cd /srv/storage/atocore/app && PYTHONPATH=src \
python3 scripts/graduate_memories.py --project p05-interferometer --limit 30 --dry-run'
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The graph becomes useful. Before this commit, entities sat in the DB
as data with no narrative. After: the director can ask "what am I
forgetting?" and get a structured answer in milliseconds.
New module (src/atocore/engineering/queries.py, 360 lines):
Structure queries (Q-001/004/005/008/013):
- system_map(project): full subsystem → component tree + orphans +
materials joined per component
- decisions_affecting(project, subsystem_id?): decisions linked via
AFFECTED_BY_DECISION, scoped to a subsystem or whole project
- requirements_for(component_id): Q-005 forward trace
- recent_changes(project, since, limit): Q-013 via memory_audit join
(reuses the Phase 4 audit infrastructure — entity_kind='entity')
The 3 killer queries (the real value):
- orphan_requirements(project): requirements with NO inbound SATISFIES
edge. "What do I claim the system must do that nothing actually
claims to handle?" Q-006.
- risky_decisions(project): decisions whose BASED_ON_ASSUMPTION edge
points to an assumption with status in ('superseded','invalid') OR
properties.flagged=True. Finds cascading risk from shaky premises. Q-009.
- unsupported_claims(project): ValidationClaim entities with no inbound
SUPPORTS edge — asserted but no Result to back them. Q-011.
- all_gaps(project): runs all three in one call for dashboards.
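Orphan detection reduces to a NOT EXISTS over the relationships
table; a sketch assuming the entities/relationships schema described
elsewhere in this log:

    import sqlite3

    ORPHAN_REQUIREMENTS_SQL = """
        SELECT e.id, e.name FROM entities e
        WHERE e.entity_type = 'requirement'
          AND e.project = ? AND e.status = 'active'
          AND NOT EXISTS (          -- Q-006: no inbound SATISFIES edge
              SELECT 1 FROM relationships r
              WHERE r.target_entity_id = e.id
                AND r.relationship_type = 'satisfies')
    """

    def orphan_requirements(conn: sqlite3.Connection, project: str) -> list:
        return conn.execute(ORPHAN_REQUIREMENTS_SQL, (project,)).fetchall()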
History + impact (Q-016/017):
- impact_analysis(entity_id, max_depth=3): BFS over outbound edges.
"What's downstream of this if I change it?"
- evidence_chain(entity_id): inbound SUPPORTS/EVIDENCED_BY/DESCRIBED_BY/
VALIDATED_BY/ANALYZED_BY. "How do I know this is true?"
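The BFS in miniature (adjacency-map shape assumed; the real query
walks the relationships table):

    from collections import deque

    def impact_analysis(start: str, outbound: dict, max_depth: int = 3):
        """Everything reachable over outbound edges within max_depth
        hops is downstream of a change to `start`."""
        seen, hits, queue = {start}, [], deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for nxt in outbound.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    hits.append((nxt, depth + 1))
                    queue.append((nxt, depth + 1))
        return hits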
API (src/atocore/api/routes.py) exposes 10 endpoints:
- GET /engineering/projects/{p}/systems
- GET /engineering/decisions?project=&subsystem=
- GET /engineering/components/{id}/requirements
- GET /engineering/changes?project=&since=&limit=
- GET /engineering/gaps/orphan-requirements?project=
- GET /engineering/gaps/risky-decisions?project=
- GET /engineering/gaps/unsupported-claims?project=
- GET /engineering/gaps?project= (combined)
- GET /engineering/impact?entity=&max_depth=
- GET /engineering/evidence?entity=
Mirror integration (src/atocore/engineering/mirror.py):
- New _gaps_section() renders at top of every project page
- If any gap non-empty: shows up to 10 per category with names + context
- Clean project: "✅ No gaps detected" — signals everything is traced
Triage UI (src/atocore/engineering/triage_ui.py):
- /admin/triage now shows BOTH memory candidates AND entity candidates
- Entity cards: name, type, project, confidence, source provenance,
Promote/Reject buttons, link to wiki entity page
- Entity promote/reject via fetch to /entities/{id}/promote|reject
- One triage UI for the whole pipeline — consistent muscle memory
Tests: 326 → 341 (15 new, all in test_engineering_queries.py):
- System map structure + orphan detection + material joins
- Killer queries: positive + negative cases (empty when clean)
- Decisions query: project-wide and subsystem-scoped
- Impact analysis walks outbound BFS
- Evidence chain walks inbound provenance
No regressions. All 10 daily queries from the plan are now live and
answering real questions against the graph.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
scripts/integrity_check.py now POSTs to /admin/integrity-check
instead of importing atocore directly. The actual scan lives in
the container where DB access + deps are available. Host-side
cron just triggers and logs the result.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the observability + safety layer that turns AtoCore from
"works until something silently breaks" into "every mutation is
traceable, drift is detected, failures raise alerts."
1. Audit log (memory_audit table):
- New table with id, memory_id, action, actor, before/after JSON,
note, timestamp; 3 indexes for memory_id/timestamp/action
- _audit_memory() helper called from every mutation:
create_memory, update_memory, promote_memory,
reject_candidate_memory, invalidate_memory, supersede_memory,
reinforce_memory, auto_promote_reinforced, expire_stale_candidates
- Action verb auto-selected: promoted/rejected/invalidated/
superseded/updated based on state transition
- "actor" threaded through: api-http, human-triage, phase10-auto-
promote, candidate-expiry, reinforcement, etc.
- Fail-open: audit write failure logs but never breaks the mutation
- GET /memory/{id}/audit: full history for one memory
- GET /admin/audit/recent: last 50 mutations across the system
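The fail-open write at the center of this list, sketched (column
names assumed):

    import json, logging

    log = logging.getLogger("atocore.audit")

    def _audit_memory(db, memory_id, action, actor, before, after, note=""):
        """Record one mutation; failure is logged, never raised, so the
        mutation that triggered the audit always completes."""
        try:
            db.execute(
                "INSERT INTO memory_audit (memory_id, action, actor, "
                "before_json, after_json, note) VALUES (?, ?, ?, ?, ?, ?)",
                (memory_id, action, actor,
                 json.dumps(before or {}), json.dumps(after or {}), note),
            )
        except Exception:
            log.exception("audit write failed: %s %s", memory_id, action)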
2. Alerts framework (src/atocore/observability/alerts.py):
- emit_alert(severity, title, message, context) fans out to:
- structlog logger (always)
- ~/atocore-logs/alerts.log append (configurable via
ATOCORE_ALERT_LOG)
- project_state atocore/alert/last_{severity} (dashboard surface)
- ATOCORE_ALERT_WEBHOOK POST if set (auto-detects Discord webhook
format for nice embeds; generic JSON otherwise)
- Every sink fails open: one failure doesn't prevent the others
- Pipeline alert step in nightly cron: harness < 85% → warning;
candidate queue > 200 → warning
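The fan-out shape, sketched: each sink gets its own try block so one
failure can't block the rest (the structlog, project_state, and
webhook sinks are reduced to stand-ins here):

    import os

    def emit_alert(severity: str, title: str, message: str) -> None:
        """Fan out to every sink; each sink fails open independently."""
        if severity not in ("info", "warning", "critical"):
            severity = "info"            # invalid severity falls back
        try:
            print(f"[{severity}] {title}: {message}")  # structlog stand-in
        except Exception:
            pass
        try:
            path = os.getenv("ATOCORE_ALERT_LOG",
                             os.path.expanduser("~/atocore-logs/alerts.log"))
            with open(path, "a", encoding="utf-8") as fh:
                fh.write(f"{severity}\t{title}\t{message}\n")
        except Exception:
            pass                         # this sink down, others already ran
        # project_state + ATOCORE_ALERT_WEBHOOK sinks follow the same shape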
3. Integrity checks (scripts/integrity_check.py):
- Nightly scan for drift:
- Memories → missing source_chunk_id references
- Duplicate active memories (same type+content+project)
- project_state → missing projects
- Orphaned source_chunks (no parent document)
- Results persisted to atocore/status/integrity_check_result
- Any finding emits a warning alert
- Added as Step G in deploy/dalidou/batch-extract.sh nightly cron
4. Dashboard surfaces it all:
- integrity (findings + details)
- alerts (last info/warning/critical per severity)
- recent_audit (last 10 mutations with actor + action + preview)
Tests: 308 → 317 (9 new):
- test_audit_create_logs_entry
- test_audit_promote_logs_entry
- test_audit_reject_logs_entry
- test_audit_update_captures_before_after
- test_audit_reinforce_logs_entry
- test_recent_audit_returns_cross_memory_entries
- test_emit_alert_writes_log_file
- test_emit_alert_invalid_severity_falls_back_to_info
- test_emit_alert_fails_open_on_log_write_error
Deferred: formal migration framework with rollback (current additive
pattern is fine for V1); memory detail wiki page with audit view
(quick follow-up).
To enable Discord alerts: set ATOCORE_ALERT_WEBHOOK to a Discord
webhook URL in Dalidou's environment. Default = log-only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds structural metadata that the LLM triage was already implicitly
reasoning about ("stale snapshot" → reject). Phase 3 captures that
reasoning as fields so it can DRIVE retrieval, not just rejection.
Schema (src/atocore/models/database.py):
- domain_tags TEXT DEFAULT '[]' JSON array of lowercase topic keywords
- valid_until DATETIME ISO date; null = permanent
- idx_memories_valid_until index for efficient expiry queries
Memory service (src/atocore/memory/service.py):
- Memory dataclass gains domain_tags + valid_until
- create_memory, update_memory accept/persist both
- _row_to_memory safely reads both (JSON-decode + null handling)
- _normalize_tags helper: lowercase, dedup, strip, cap at 10
- get_memories_for_context filters expired (valid_until < today UTC)
- _rank_memories_for_query adds tag-boost: memories whose domain_tags
appear as substrings in query text rank higher (tertiary key after
content-overlap density + absolute overlap, before confidence)
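Two of the helpers above in miniature (exact rules assumed from
their one-line descriptions):

    from datetime import date

    def _normalize_tags(tags: list[str]) -> list[str]:
        """Lowercase, strip, order-preserving dedup, cap at 10."""
        out: list[str] = []
        for t in tags:
            t = t.strip().lower()
            if t and t not in out:
                out.append(t)
        return out[:10]

    def is_expired(valid_until: str | None, today: date) -> bool:
        """null valid_until means permanent."""
        if valid_until is None:
            return False
        return date.fromisoformat(valid_until) < today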
LLM extractor (_llm_prompt.py → llm-0.5.0):
- SYSTEM_PROMPT documents domain_tags (2-5 keywords) + valid_until
(time-bounded facts get expiry dates; durable facts stay null)
- normalize_candidate_item parses both fields from model output with
graceful fallback for string/null/missing
LLM triage (scripts/auto_triage.py):
- TRIAGE_SYSTEM_PROMPT documents same two fields
- parse_verdict extracts them from verdict JSON
- On promote: PUT /memory/{id} with tags + valid_until BEFORE
POST /memory/{id}/promote, so active memories carry them
API (src/atocore/api/routes.py):
- MemoryCreateRequest: adds domain_tags, valid_until
- MemoryUpdateRequest: adds domain_tags, valid_until, memory_type
- GET /memory response exposes domain_tags + valid_until + created_at
Triage UI (src/atocore/engineering/triage_ui.py):
- Renders existing tags as colored badges
- Adds inline text field for tags (comma-separated) + date picker for
valid_until on every candidate card
- Save&Promote button persists edits via PUT then promotes
- Plain Promote (and Y shortcut) also saves tags/expiry if edited
Wiki (src/atocore/engineering/wiki.py):
- Search now matches memory content OR domain_tags
- Search results render tags as clickable badges linking to
/wiki/search?q=<tag> for cross-project navigation
- valid_until shown as amber "valid until YYYY-MM-DD" hint
Tests: 303 → 308 (5 new for Phase 3 behavior):
- test_create_memory_with_tags_and_valid_until
- test_create_memory_normalizes_tags
- test_update_memory_sets_tags_and_valid_until
- test_get_memories_for_context_excludes_expired
- test_context_builder_tag_boost_orders_results
Deferred (explicitly): temporal_scope enum, source_refs memory graph,
HDBSCAN clustering, memory detail wiki page, backfill of existing
actives. See docs/MASTER-BRAIN-PLAN.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds an "Auto-process queue" button to /admin/triage that lets the
user kick off a full LLM triage pass without SSH. Bridges the gap
between web UI (in container) and claude CLI (host-only).
Architecture:
- UI button POSTs to /admin/triage/request-drain
- Endpoint writes atocore/config/auto_triage_requested_at flag
- Host-side watcher cron (every 2 min) checks for the flag
- When found: clears flag, acquires lock, runs auto_triage.py,
records progress via atocore/status/* entries
- UI polls /admin/triage/drain-status every 10s to show progress,
auto-reloads when done
Safety:
- Lock file prevents concurrent runs on host
- Flag cleared before run so duplicate clicks queue at most one re-run
- Fail-open: watcher errors just log, don't break anything
- Status endpoint stays read-only
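The flag handshake, sketched as plain functions (the project_state
store API shown here is an assumption):

    from datetime import datetime, timezone

    FLAG = "atocore/config/auto_triage_requested_at"

    def request_drain(state) -> None:
        """UI side: set the flag; the host watcher polls every 2 min."""
        state.set(FLAG, datetime.now(timezone.utc).isoformat())

    def watcher_tick(state, run_auto_triage) -> None:
        """Host side: clear the flag BEFORE running, so duplicate clicks
        queue at most one re-run; the real script also takes a lock."""
        if state.get(FLAG) is None:
            return
        state.delete(FLAG)
        run_auto_triage()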
Installation on host (one-time):
*/2 * * * * /srv/storage/atocore/app/deploy/dalidou/auto-triage-watcher.sh \
>> /home/papa/atocore-logs/auto-triage-watcher.log 2>&1
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes human triage sustainable. Before: command-line-only review,
auto-triage stopped after 100 candidates/run. Now:
1. Web UI at /admin/triage
- Lists all pending candidates with inline promote/reject/edit
- Edit content in-place before promoting (PUT /memory/{id})
- Change type via dropdown
- Keyboard shortcuts: Y=promote, N=reject, E=edit, S=scroll-next
- Cards fade out after action, queue count updates live
- Zero JS framework — vanilla fetch + event delegation
2. auto_triage.py drains queue
- Loops up to 20 batches (default) of 100 candidates each
- Tracks seen IDs so needs_human items don't reprocess
- Exits cleanly when queue empty
- Nightly cron naturally drains everything
3. Dashboard + wiki surface triage queue
- Dashboard /admin/dashboard: new "triage" section with pending
count + /admin/triage URL + warning/notice severity levels
- Wiki homepage: prominent callout "N candidates awaiting triage —
review now" linking to /admin/triage, styled with triage-warning
(>50) or triage-notice (>20) CSS
Pattern: human intervenes only when AI can't decide. The UI makes
that intervention fast (20 candidates in 60 seconds). Nightly
auto-triage drains the queue so the human queue stays bounded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
R11: POST /admin/extract-batch with mode=llm now returns 503 when the
claude CLI is unavailable (was silently returning success with 0
candidates), with a message pointing at the host-side script. +2 tests.
R12: extracted SYSTEM_PROMPT + parse_llm_json_array +
normalize_candidate_item + build_user_message into stdlib-only
src/atocore/memory/_llm_prompt.py. Both the container extractor and
scripts/batch_llm_extract_live.py now import from it, eliminating the
prompt/parser drift risk.
Tests 297 -> 299.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-triage was rejecting 8 of 10 OpenClaw imports as 'session log'
or 'process rule belongs elsewhere'. But OpenClaw's SOUL.md, USER.md,
MEMORY.md and daily memory/*.md files are already curated — they ARE
the canonical continuity layer we want to absorb. Applying the
conservative LLM-conversation triage bar to them discards the signal
the importer was designed to capture.
Triage prompt gains a rule 4: when candidate content starts with
'From OpenClaw/', apply a much lower bar. Session events, project
updates, stakeholder notes, and decisions from daily memory files
should promote, not reject.
The ABB-Space Schott quote that DID promote was the lucky exception
— after this fix, the other 7 daily notes (CDR execution log,
Discord migration plan, isogrid research, etc.) will promote too.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extraction prompt rewritten for signal-aggressive mode. The old prompt
rewarded silence ("durable insight only, empty is correct") which
caused quiet failures — real project signal (Schott quotes arriving,
stakeholder events, blockers) was dropped as "not architectural enough".
New prompt explicitly lists what to emit:
1. Project activity (mentions with context — quote received, blocker,
action item)
2. Decisions and choices (architectural commitments, vendor selection)
3. Durable engineering insight (earned knowledge, generalizable)
4. Stakeholder and vendor events (emails sent, meetings scheduled)
5. Preferences and adaptations (how Antoine works)
Philosophy shift: "capture more signal, let triage filter noise"
replaces "extract only durable architectural facts". Auto-triage
already rejects noise well, so moving the filter downstream gives us
visibility into weak signals without polluting active memory.
Added 'episodic' to the candidate types list to support stakeholder
events that carry an inherent timestamp.
LLM_EXTRACTOR_VERSION bumped to llm-0.4.0.
Also: cron-backup.sh now runs POST /ingest/sources before extraction
so new PKM files flow in automatically. Fail-open, non-blocking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three additive upgrades borrowed from Karpathy's LLM Wiki pattern:
1. CONTRADICTION DETECTION: auto-triage now has a fourth verdict —
"contradicts". When a candidate conflicts with an existing memory
(not duplicates, genuine disagreement like "Option A selected"
vs "Option B selected"), the triage model flags it and leaves
it in the queue for human review instead of silently rejecting
or double-storing. Preserves source tension rather than
suppressing it.
2. WEEKLY LINT PASS: scripts/lint_knowledge_base.py checks for:
- Orphan memories (active but zero references after 14 days)
- Stale candidates (>7 days unreviewed)
- Unused entities (no relationships)
- Empty-state projects
- Unregistered projects auto-detected in memories
Runs Sundays via the cron. Outputs a report.
3. WEEKLY SYNTHESIS: scripts/synthesize_projects.py uses sonnet to
generate a 3-5 sentence "current state" paragraph per project
from state + memories + entities. Cached in project_state under
status/synthesis_cache. Wiki project pages now show this at the
top under "Current State (auto-synthesis)". Falls back to a
deterministic summary if no cache exists.
deploy/dalidou/batch-extract.sh: added Step C (synthesis) and
Step D (lint) gated to Sundays via date check.
All additive — nothing existing changes behavior. The database
remains the source of truth; these operations just produce better
synthesized views and catch rot.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Projects now appear under three buckets based on their state entries:
- Active Contracts
- Leads & Prospects
- Internal Tools & Infra
Each card shows the stage as a tag on the project title, the client
as an italic subtitle, and the project description. Empty buckets are
hidden, making it obvious at a glance what's contracted vs lead vs
internal.
Paired with stage/type/client state entries added to all 6 projects
so the grouping has data to work with.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three changes:
1. ABB-Space registered as a lead project with stage=lead in
Trusted Project State. Projects now have lifecycle awareness
(lead/proposition vs active contract vs completed).
2. Extraction no longer drops unregistered project tags. When the
LLM extractor sees a conversation about a project not in the
registry, it keeps the model's tag on the candidate instead of
falling back to empty. This enables auto-detection of new
projects/leads from organic conversations. The nightly pipeline
surfaces these candidates for triage, where the operator sees
"hey, there's a new project called X" and can decide whether
to register it.
3. Extraction prompt updated to tell the model: "If the conversation
discusses a project NOT in the known list, still tag it — the
system will auto-detect it." This removes the artificial ceiling
that prevented new project discovery.
Updated Case D test: unregistered + unscoped now keeps the model's
tag instead of dropping to empty.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full wiki interface at /wiki with:
- /wiki — Homepage with project cards, search box, system stats
- /wiki/projects/{name} — Project page with clickable entity links
- /wiki/entities/{id} — Entity detail with relationships as links
- /wiki/search?q=... — Search across entities and memories
Every entity name in a project page links to its detail page.
Entity detail pages show properties, relationships as clickable
links to related entities, and breadcrumb navigation back to the
project and wiki home.
Responsive, dark-mode, mobile-friendly. Card grid for projects.
Generated on-demand from the database — always current, no static
files, source of truth is the DB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Layer 3 of the AtoCore architecture. Generates a human-readable
project overview in markdown from structured data:
- Trusted Project State (by category)
- System Architecture (systems → subsystems → components with
material and interface links)
- Decisions (with affected entities)
- Requirements & Constraints
- Materials
- Vendors
- Active Memories (with confidence and reference counts)
The mirror is DERIVED — every line traces back to an entity, state
entry, or memory. The footer stamps the generation timestamp and
the "not canonical truth" disclaimer.
API: GET /projects/{project_name}/mirror returns {project, format,
content} where content is the full markdown page. Supports project
aliases via resolve_project_name.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a query matches a known engineering entity by name, the context
pack now includes a structured '--- Engineering Context ---' band
showing the entity's type, description, and its relationships to
other entities (subsystems, materials, requirements, decisions).
Six-tier context assembly:
1. Trusted Project State
2. Identity / Preferences
3. Project Memories
4. Domain Knowledge
5. Engineering Context (NEW)
6. Retrieved Chunks
The engineering band uses the same token-overlap scoring as memory
ranking: query tokens are matched against entity names + descriptions.
The top match gets its full relationship context included.
10% budget allocation. When over budget, the engineering band is
trimmed before domain knowledge (it's the lowest-priority structured
tier, since the same info may appear in chunks).
Example: query 'lateral support design' against p04-gigabit
surfaces the Lateral Support subsystem entity with its relationships
to GF-PTFE material, M1 Mirror Assembly parent system, and related
components.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Layer 2 of the AtoCore architecture. Adds typed engineering entities
with relationships on top of the flat memory/state/chunk substrate.
Schema:
- entities table: id, entity_type, name, project, description,
properties (JSON), status, confidence, source_refs, timestamps
- relationships table: source_entity_id, target_entity_id,
relationship_type, confidence, source_refs
15 entity types: project, system, subsystem, component, interface,
requirement, constraint, decision, material, parameter,
analysis_model, result, validation_claim, vendor, process
12 relationship types: contains, part_of, interfaces_with,
satisfies, constrained_by, affected_by_decision, analyzed_by,
validated_by, depends_on, uses_material, described_by, supersedes
Service layer: full CRUD + get_entity_with_context (returns an
entity with its relationships and all related entities in one call).
API endpoints:
- POST /entities — create entity
- GET /entities — list/filter by type, project, status, name
- GET /entities/{id} — entity + relationships + related entities
- POST /relationships — create relationship
Schema auto-initialized on app startup via init_engineering_schema().
7 tests covering entity CRUD, relationships, context traversal,
filtering, name search, and validation.
Test count: 290 -> 297.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The extraction system now produces two kinds of candidates from
the same conversation:
A. PROJECT-SPECIFIC: applied facts scoped to a named project
(unchanged behavior)
B. DOMAIN KNOWLEDGE: generalizable engineering insight earned
through project work, tagged with a domain (physics, materials,
optics, mechanics, manufacturing, metrology, controls, software,
math, finance) and stored with project="" so it surfaces across
all projects.
Critical quality bar enforced in the system prompt: "Would a
competent engineer need experience to know this, or could they
find it in 30 seconds on Google?" Textbook values, definitions,
and obvious facts are explicitly excluded. Only hard-won insight
qualifies — the kind that takes weeks of FEA or real machining
experience to discover.
Domain tags are embedded in the content as a prefix ("[physics]",
"[materials]") so they survive without a schema migration. A future
column can parse them out.
Context builder gains a new tier between project memories and
retrieved chunks:
Tier 1: Trusted Project State (project-specific)
Tier 2: Identity / Preferences (global)
Tier 3: Project Memories (project-specific)
Tier 4: Domain Knowledge (NEW) (cross-project, 10% budget)
Tier 5: Retrieved Chunks (project-boosted)
Trim order: chunks -> domain knowledge -> project memories ->
identity/preference -> project state.
Host-side extraction script updated with the same prompt and
domain-tag handling.
LLM_EXTRACTOR_VERSION bumped to llm-0.3.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wave 2 deeper ingestion:
- 6 new Trusted Project State entries from design-level docs:
p05: test rig architecture, CGH specification, procurement combos
p06: force control architecture, control channels, calibration loop
- Total state entries: ~23 (was ~17)
Observability:
- GET /admin/dashboard — one-shot system overview: memory counts
by type/project/status, reinforced count, project state entry
counts, recent interaction timestamp, extraction pipeline status.
Replaces the need to query 4+ endpoints to understand system state.
Harness: 17/18 (no regression from new state entries).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 identity memories (Antoine's role, projects, infrastructure) and
3 preference memories (no API keys, multi-model collab, action bias)
seeded on live Dalidou. These fill the identity/preference band
that was previously empty.
Lowered MEMORY_BUDGET_RATIO from 0.10 to 0.05 because the 10%
allocation squeezed project memories and retrieval chunks enough
to regress 4 harness fixtures. At 5% the band fits at most 1 short
memory — enough for the most relevant identity/preference fact
without starving the project-specific tiers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Batch 3, Days 1-3. The core R9 failure was Case F: when the model
returned a registered project DIFFERENT from the interaction's
known scope, the old code trusted the model because the project
was registered. A p06-polisher interaction could silently produce
a p04-gigabit candidate.
New rule (trust hierarchy):
1. Interaction scope always wins when set (cases A, C, E, F)
2. Model project used only for unscoped interactions AND only when
it resolves to a registered project (cases D, G)
3. Empty string when both are empty or unregistered (case B)
The rule is: interaction.project is the strongest signal because
it comes from the capture hook's project detection, which runs
before the LLM ever sees the content. The model's project guess
is only useful when the capture hook had no project context.
7 case tests (A-G) cover every combination of model/interaction
project state. Pre-existing tests updated for the new behavior.
Host-side script mirrors the same hierarchy using _known_projects
fetched from GET /projects at startup.
Test count: 286 -> 290 (+4 net, 7 new R9 cases, 3 old tests
consolidated).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
R7: ranking scorer now uses overlap-density (overlap_count /
memory_token_count) as primary key instead of raw overlap count.
A 5-token memory with 3 overlapping tokens (density 0.6) now beats
a 40-token overview memory with 3 overlapping tokens (density 0.075)
at the same absolute count. Secondary: absolute overlap. Tertiary:
confidence. Targeting p06-firmware-interface harness fixture.
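The R7 key change in miniature (tokenizer elided):

    def rank_key(overlap: int, memory_tokens: int, conf: float) -> tuple:
        """Descending sort key: density, then absolute overlap, then
        confidence."""
        density = overlap / memory_tokens if memory_tokens else 0.0
        return (density, overlap, conf)

    # the 5-token memory now beats the 40-token overview at overlap 3
    assert rank_key(3, 5, 0.5) > rank_key(3, 40, 0.96)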
R9: when the LLM extractor returns a project that differs from the
interaction's known project, it now checks the project registry.
If the model's project is a registered canonical ID, trust it. If
not (hallucinated name), fall back to the interaction's project.
Uses load_project_registry() for the check. The host-side script
mirrors this via an API call to GET /projects at startup.
Two new tests: test_parser_keeps_registered_model_project and
test_parser_rejects_hallucinated_project.
Test count: 280 -> 281.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dalidou runs Claude Code 2.0.60 which does not have this flag
(added in 2.1.x). Removed from both extractor_llm.py and the
host-side batch script. --append-system-prompt and
--disable-slash-commands are supported on 2.0.60.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Day 1 of the operational-reflection batch. Two changes:
1. POST /admin/extract-batch: batch extraction endpoint that fetches
recent interactions (since last run or explicit 'since' param),
runs the extractor (rule or LLM mode), and persists candidates
with status=candidate. Tracks last-run timestamp in project state
(atocore/status/last_extract_batch_run) so subsequent calls
auto-resume. This is the operational home for R1/R5 — makes the
LLM extractor an API operation, not just a script.
2. POST /interactions/{id}/extract now accepts mode: "rule" | "llm"
(default "rule" for backward compatibility). When "llm", it uses
extract_candidates_llm (claude -p sonnet, OAuth).
Both changes preserve the standing decision: extraction stays off
the capture hot path. The batch endpoint is invoked explicitly by
cron, manual curl, or CLI — never inline with POST /interactions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex R6: the LLM extractor accepted the model's project field
verbatim. When the model returned an empty string, memories that were
clearly p06 got promoted as project='', making them invisible to the p06
project-memory band and explaining the p06-offline-design harness
failure.
Fix: if model returns empty project but interaction.project is set,
inherit the interaction's project. Model-supplied project still takes
precedence when non-empty.
Two new tests lock the fallback and precedence behaviors.
R5 acknowledged (LLM extractor not yet wired into API — next task).
Test count: 278 -> 280. Harness re-run pending after deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Haiku was producing noisy candidates (31% accept rate on first
triage). Sonnet should give tighter extraction with fewer false
positives while still catching the same durable-fact patterns.
Override: ATOCORE_LLM_EXTRACTOR_MODEL=haiku to revert.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A 530-char program overview memory with confidence 0.96 was filling
the entire 25% project-memory budget at equal overlap score (3 tokens),
beating shorter query-relevant newly-promoted memories (confidence
0.5) on the confidence tiebreaker. The long memory legitimately
scored well, but its length starved every other memory from the band.
Fix: truncate each formatted entry to 250 chars with '...' so at
least 2-3 memories fit the ~700-char available budget. This doesn't
change ranking — the most relevant memory still goes first — but
it ensures the runner-up can also appear.
Harness fixture delta: Day 7 regression pass pending after deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the LLM-assisted extractor's in-scope / out-of-scope
categories derived from the first live triage pass (16 promoted,
35 rejected). Five in-scope classes, six explicit out-of-scope
classes, trust model summary, multi-model future direction.
Cleaned up stale follow-up items in next-steps.md: rule expansion
marked deprioritized, LLM extractor marked done, retrieval harness
marked done with expansion pending.
Fixed the stale timeout value in extractor_llm.py's docstring (45s -> 90s).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The API endpoint now passes the request's status field through to
create_memory() so external scripts can create candidate memories
directly without going through the extract endpoint. Default remains
'active' for backward compatibility.
persist_llm_candidates.py reads a saved extractor eval baseline
JSON (e.g. the Day 4 LLM run) and POSTs each candidate to Dalidou
with status=candidate. Safe to re-run — duplicate content returns
400 which the script counts as 'skipped'.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Second pass on the LLM-assisted extractor after Antoine's explicit
rule: no API key, ever. Refactored src/atocore/memory/extractor_llm.py
to shell out to the Claude Code 'claude -p' CLI via subprocess instead
of the anthropic SDK, so extraction reuses the user's existing Claude.ai
OAuth credentials and needs zero secret management.
Implementation:
- subprocess.run(["claude", "-p", "--model", "haiku",
"--append-system-prompt", <instructions>,
"--no-session-persistence", "--disable-slash-commands",
user_message], ...); a runnable sketch follows this list
- cwd is a cached tempfile.mkdtemp() so every invocation starts with
a clean context instead of auto-discovering CLAUDE.md / AGENTS.md /
DEV-LEDGER.md from the repo root. We cannot use --bare because it
forces API-key auth, which defeats the purpose; the temp-cwd trick
is the lightest way to keep OAuth auth while skipping project
context loading.
- Silent-failure contract unchanged: missing CLI, non-zero exit,
timeout, malformed JSON — all return [] and log an error. The
capture audit trail must not break on an optional side effect.
- Default timeout bumped from 20s to 90s: Haiku + Node.js startup
+ OAuth check is ~20-40s per call in practice, plus real responses
up to 8KB take longer. 45s hit 2 timeouts on the first live run.
- tests/test_extractor_llm.py refactored: the API-key / anthropic SDK
tests are replaced by subprocess-mocking tests covering missing
CLI, timeout, non-zero exit, and a happy-path stdout parse. 14
tests, all green.
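A condensed, runnable version of that call shape (prompt content
elided; the fence-tolerant parser is reduced to json.loads):

    import json, subprocess, tempfile
    from functools import lru_cache

    @lru_cache(maxsize=1)
    def _clean_cwd() -> str:
        # empty dir: no CLAUDE.md/AGENTS.md auto-discovery, OAuth intact
        return tempfile.mkdtemp(prefix="atocore-llm-")

    def run_extractor(system_prompt: str, user_message: str,
                      model: str = "haiku", timeout_s: int = 90) -> list:
        """Any failure returns [] instead of raising: the capture audit
        trail must not break on an optional side effect."""
        try:
            proc = subprocess.run(
                ["claude", "-p", "--model", model,
                 "--append-system-prompt", system_prompt,
                 "--no-session-persistence", "--disable-slash-commands",
                 user_message],
                capture_output=True, text=True,
                timeout=timeout_s, cwd=_clean_cwd(),
            )
            if proc.returncode != 0:
                return []
            return json.loads(proc.stdout)  # real parser also strips fences
        except (OSError, subprocess.TimeoutExpired, json.JSONDecodeError):
            return []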
scripts/extractor_eval.py:
- New --output <path> flag writes the JSON result directly to a file,
bypassing stdout/log interleaving (structlog sends INFO to stdout
via PrintLoggerFactory, so a naive '> out.json' pollutes the file).
- Forces UTF-8 on stdout so real LLM output with em-dashes / arrows /
CJK doesn't crash the human report on Windows cp1252 consoles.
First live baseline run against the 20-interaction labeled corpus
(scripts/eval_data/extractor_llm_baseline_2026-04-11.json):
mode=llm labeled=20 recall=1.0 precision=0.357 yield_rate=2.55
total_actual_candidates=51 total_expected_candidates=7
false_negative_interactions=0 false_positive_interactions=9
Recall 0% -> 100% vs rule baseline — every human-labeled positive is
caught. Precision reads low (0.357) but inspection shows the "false
positives" are real candidates the human labels under-counted. For
example interaction a6b0d279 was labeled at 2 expected candidates,
the model caught all 6 polisher architectural facts; interaction
52c8c0f3 was labeled at 1, the model caught all 5 infra commitments.
The labels are the bottleneck, not the model.
Day 4 gate against Codex's criteria:
- candidate yield: 255% vs ≥15-25% target
- FP rate tolerable for manual triage: 51 candidates reviewable in
~10 minutes via the triage CLI
- ≥2 real non-synthetic candidates worth review: 20+ obvious wins
(polisher architecture set, p05 infra set, DEV-LEDGER protocol set)
Gate cleared. LLM-assisted extraction is the path forward for
conversational captures. Rule-based extractor stays as-is for
structured-cue inputs and remains the default mode. The next step
(Day 5 stabilize / document) will wire LLM mode behind a flag in
the public extraction endpoint and document scope.
Test count: 276 -> 278 passing. No existing tests changed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Day 2 baseline showed 0% recall for the rule-based extractor across
5 distinct miss classes. Day 4 decision gate: prototype an
LLM-assisted mode behind a flag. Option A ratified by Antoine.
New module src/atocore/memory/extractor_llm.py:
- extract_candidates_llm(interaction) returns the same MemoryCandidate
dataclass the rule extractor produces, so both paths flow through
the existing triage / candidate pipeline unchanged.
- extract_candidates_llm_verbose() also returns the raw model output
and any error string, for eval and debugging.
- Uses Claude Haiku 4.5 by default; model overridable via
ATOCORE_LLM_EXTRACTOR_MODEL env. Timeout via
ATOCORE_LLM_EXTRACTOR_TIMEOUT_S (default 20s).
- Silent-failure contract: missing API key, unreachable model,
malformed JSON — all return [] and log an error. Never raises
into the caller. The capture audit trail must not break on an
optional side effect.
- Parser tolerates markdown fences, surrounding prose, invalid
memory types, clamps confidence to [0,1], drops empty content.
- System prompt explicitly tells the model to return [] for most
conversational turns (durable-fact bar, not "extract everything").
- Trust rules unchanged: candidates are never auto-promoted,
extraction stays off the capture hot path, human triages via the
existing CLI.
scripts/extractor_eval.py: new --mode {rule,llm} flag so the same
labeled corpus can be scored against both extractors. Default
remains rule so existing invocations are unchanged.
tests/test_extractor_llm.py: 12 new unit tests covering the parser
(empty array, malformed JSON, markdown fences, surrounding prose,
invalid types, empty content, confidence clamping, version tagging),
plus contract tests for missing API key, empty response, and a
mocked api_error path so failure modes never raise.
Test count: 264 -> 276 passing. No existing tests changed.
Next step: run `python scripts/extractor_eval.py --mode llm` against
the labeled set with ANTHROPIC_API_KEY in env, record the delta,
decide whether to wire LLM mode into the API endpoint and CLI or
keep it script-only for now.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hyphen- and slash-separated identifiers (polisher-control,
twyman-green, etc.) were single tokens in the reinforcement /
memory-ranking tokenizer, so queries had to match the exact
hyphenation to score. The harness caught this on p06-control-rule:
'polisher control design rule' scored 2 overlap on each of the
three polisher-*/design-rule memories and the tiebreaker picked
the wrong one.
Now hyphenated words contribute both the full form AND each
sub-token. Extracted _add_token helper to avoid duplicating the
stop-word / length gate at both insertion points.
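In miniature (stemming elided; the stop-word set is illustrative):

    import re

    STOP = {"the", "a", "of", "and", "to"}   # illustrative stop-word set

    def _add_token(tokens: set, tok: str) -> None:
        """One gate for the stop-word + length check, shared by both
        insertion points."""
        if len(tok) >= 3 and tok not in STOP:
            tokens.add(tok)

    def tokenize(text: str) -> set:
        tokens: set = set()
        for word in re.findall(r"[a-z0-9/-]+", text.lower()):
            _add_token(tokens, word)              # full hyphenated form
            for sub in re.split(r"[-/]", word):   # plus each sub-token
                _add_token(tokens, sub)
        return tokens

    assert {"polisher", "control"} <= tokenize("polisher-control")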
Reinforcement matcher tests still pass (28) — the new sub-tokens
only widen the match set, they never narrow it, so memories that
previously reinforced continue to reinforce.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per-type ranking was still starving later types: when a p05 query
matched a 'knowledge' memory best but 'project' came first in the
type order, the project-type candidates filled the budget before
the knowledge-type pool was even ranked.
Collect all candidates into a single pool, dedupe by id, then
rank the whole pool once against the query before walking the
flat budget. Python's stable sort preserves insertion order (which
still reflects the caller's memory_types order) as a natural
tiebreaker when scores are equal.
Regression surfaced by the retrieval eval harness:
p05-vendor-signal still missing 'Zygo' after 5aeeb1c — the vendor
memory was type=knowledge but never reached the ranker because
type=project consumed the budget first.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
get_memories_for_context now accepts an optional query string.
When provided, candidate memories are reranked by lexical overlap
with the query (stemmed token intersection, ties broken by
confidence) before the budget walk. Without a query the order is
unchanged — effectively "by confidence desc" as before — so
non-builder callers see no behaviour change.
The fetch limit is raised from 10 to 30 so there's a real pool to
rerank. Token overlap reuses _normalize/_tokenize from
reinforcement.py so ranking and reinforcement matching share the
same notion of distinctive terms.
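The rerank in miniature (any tokenizer returning a set of stems
works; reinforcement.py's is the one actually shared):

    def rerank(memories: list[dict], query: str, tokenize) -> list[dict]:
        """Overlap with the query first, confidence as tiebreak; no
        query means no change from the old confidence-desc order."""
        if not query:
            return memories
        q = tokenize(query)
        return sorted(
            memories,
            key=lambda m: (len(q & tokenize(m["content"])), m["confidence"]),
            reverse=True,
        )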
build_context passes the user_prompt through to both the identity/
preference and project-memory calls. The retrieval harness
regression the fix is targeting:
- p05-vendor-signal FAIL @ 1161645: "Zygo" missing from the pack
even though an active vendor memory contained it. Root cause:
higher-confidence p05 memories filled the 25% budget slice
before the vendor memory ever got a chance. Query-aware ordering
puts the vendor memory first when the query is about vendors.
New regression test test_project_memories_query_relevance_ordering
locks the behaviour in with two p05 memories and a tight budget.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
At 0.15 the effective per-call allowance (450 - 55 wrapper) was 395
chars, which is just under the length of a real paragraph-length
project memory (~400 chars). Verified on live p04 probe: band was
still absent after the flat-budget fix because the first memory
entry was one character too long for the budget.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The per-type slicing (available // len(memory_types)) starved
paragraph-length memories: with 3 types and a 450-char budget (395
after the 55-char wrapper), each type got ~131 chars while real
project memories are 300-500
chars each — every entry was skipped and the new Project Memories
band never appeared in the live pack.
Switch to a flat budget pool walked type-by-type in order. Short
identity/preference memories still get first pick when the budget
is tight, but long project memories can now compete for space.
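The flat walk, sketched (entries are pre-formatted strings; budget
is in characters):

    def pick_memories(pools: list[list[str]], budget: int) -> list[str]:
        """One shared budget walked type-by-type: short identity entries
        still get first pick, long project memories take the remainder."""
        picked, remaining = [], budget
        for pool in pools:                  # caller's memory_types order
            for entry in pool:
                if len(entry) <= remaining:
                    picked.append(entry)
                    remaining -= len(entry)
        return picked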
Caught on the first post-deploy probe: 2 active p04 memories
existed but none landed in formatted_context.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The retrieval-quality review on 2026-04-11 found that active
project/knowledge/episodic memories never reached the pack: only
Trusted Project State and identity/preference memories were being
assembled. Reinforcement bumped confidence on memories that had
no retrieval outlet, so the reflection loop was half-open.
This change adds a third memory tier between identity/preference
and retrieved chunks:
- PROJECT_MEMORY_BUDGET_RATIO = 0.15
- Memory types: project, knowledge, episodic
- Only populated when a canonical project is in scope — without
a project hint, project memories stay out (cross-project bleed
would rot the signal)
- Rendered under a dedicated "--- Project Memories ---" header
so the LLM can distinguish it from the identity/preference band
- Trim order in _trim_context_to_budget: retrieval → project
memories → identity/preference → project state (most recently
added tier drops first when budget is tight)
get_memories_for_context gains header/footer kwargs so the two
memory blocks can be distinguished in a single pack without a
second helper.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reinforcement matcher now handles paragraph-length memories via a
dual-mode threshold: short memories keep the 70% overlap rule,
long memories (>15 stems) require 12 absolute overlaps AND a 35%
fraction so organic paraphrase can still reinforce (sketched after
this list). Diagnosis:
every active memory stayed at reference_count=0 because 40-token
project summaries never hit 70% overlap on real responses.
- scripts/atocore_client.py gains batch-extract (fan out
/interactions/{id}/extract over recent interactions) and triage
(interactive promote/reject walker for the candidate queue),
matching the Phase 9 reflection-loop review flow without pulling
extraction into the capture hot path.
- deploy/dalidou/cron-backup.sh adds an optional off-host rsync step
gated on ATOCORE_BACKUP_RSYNC, fail-open when the target is offline
so a laptop being off at 03:00 UTC never reds the local backup.
- docs/next-steps.md records the retrieval-quality sweep: project
state surfaces, chunks are on-topic but broad, active memories
never reach the pack (reflection loop has no retrieval outlet yet).
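The dual-mode check referenced in the first item, sketched:

    def reinforces(overlap: int, memory_stems: int) -> bool:
        """Short memories keep the strict 70% rule; long ones accept 12
        absolute overlaps at a 35% fraction."""
        if memory_stems == 0:
            return False
        frac = overlap / memory_stems
        if memory_stems <= 15:
            return frac >= 0.70
        return overlap >= 12 and frac >= 0.35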
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- POST /admin/backup/cleanup — retention cleanup via API (dry-run by default)
- record_interaction() accepts extract=True to auto-extract candidate
memories from response text using the Phase 9C rule-based extractor
- POST /interactions accepts extract field to enable extraction on capture
- deploy/dalidou/cron-backup.sh — daily backup + cleanup for cron
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- create_runtime_backup() now auto-validates its output and includes
validated/validation_errors fields in returned metadata
- New cleanup_old_backups() with retention policy: 7 daily, 4 weekly
(Sundays), 6 monthly (1st of month), dry-run by default
- CLI `cleanup` subcommand added to backup module
- 9 new tests (2 validation + 7 retention), 259 total passing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>