POST /entities/{id}/promote now wraps the promote_entity call in
try/except ValueError → HTTPException(400). Previously the new
V1-0 provenance re-check raised ValueError that the route didn't
catch, so legacy no-provenance candidates promoted via the API
surfaced as 500 instead of 400.
Matches the existing ValueError → 400 handling on POST /entities
(api_create_entity at routes.py:1490).
Test: test_api_promote_returns_400_on_legacy_no_provenance inserts
a pre-V1-0 legacy candidate directly, POSTs to the promote route,
asserts 400 with the expected detail, asserts the row stays
candidate.
Test count: 547 -> 548. Full suite green in 72.91s.
Closes R14.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last P2 from Antoine's "daily-usable" sprint. Entities referenced via
[[Name]] in descriptions or mirror markdown now render as:
- live wikilink if the name matches an entity in the same project
- live cross-project link with "(in project X)" scope indicator if the
only match is in another project
- red italic redlink pointing at /wiki/new?name=... otherwise
Clicking a redlink opens a pre-filled "create this entity" form that
POSTs to /v1/entities and redirects to the new entity's page.
- engineering/wiki.py: _wikilink_transform + _resolve_wikilink,
applied in render_project (pre-markdown) and render_entity
(description body). render_new_entity_form for the create page.
CSS for .wikilink / .wikilink-cross / .redlink / .new-entity-form
- api/routes.py: GET /wiki/new?name&project
- tests/test_wikilinks.py: 12 tests including the spec regression
(A references [[B]] -> redlink; create B -> link becomes live)
- DEV-LEDGER.md: session log + test_count 521 -> 533
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Broke the dedup-watcher cron when I wrote memory_dedup.py in session
7A: imported atocore.memory.similarity, which transitively pulls
sentence-transformers + pydantic_settings onto host Python that
intentionally doesn't have them. Every UI-triggered + cron dedup scan
since 7A deployed was silently crashing with ModuleNotFoundError
(visible only in /home/papa/atocore-logs/dedup-ondemand-*.log).
I even documented this architecture rule in atocore.memory._llm_prompt
('This module MUST stay stdlib-only') then violated it one session
later. Shame.
Real fix — matches the extractor pattern:
- New endpoint POST /admin/memory/dedup-cluster on the server: takes
{project, similarity_threshold, max_clusters}, runs the embedding +
transitive-clustering inside the container where
sentence-transformers lives, returns cluster shape.
- scripts/memory_dedup.py now pure stdlib: pulls clusters via HTTP,
LLM-drafts merges via claude CLI, POSTs proposals back. No atocore
imports beyond the stdlib-only _dedup_prompt shared module.
- Regression test pins the rule: test_memory_dedup_script_is_stdlib_only
snapshots sys.modules before/after importing the script and asserts
no non-allowed atocore modules were pulled.
Also: similarity.py + cluster_by_threshold stay server-side, still
covered by the same tests that used to live in the host tier-helper
section.
Tests 459 → 458 (-1 via rewrite of obsolete host-tier helper tests,
+2 for the new stdlib-only regression + endpoint shape tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the asymmetry the user surfaced: before this, Claude Code
captured every turn (Stop hook) but retrieval only happened when
Claude chose to call atocore_context (opt-in MCP tool). OpenClaw had
both sides covered after 7I; Claude Code did not.
Now symmetric. Every Claude Code prompt is auto-sent to
/context/build and the returned pack is prepended via
hookSpecificOutput.additionalContext — same as what OpenClaw's
before_agent_start hook now does.
- deploy/hooks/inject_context.py — UserPromptSubmit hook. Fail-open
(always exit 0). Skips short/XML prompts. 5s timeout. Project
inference mirrors capture_stop.py cwd→slug table. Kill switch:
ATOCORE_CONTEXT_DISABLED=1.
- ~/.claude/settings.json registered the hook (local config, not
committed; copy-paste snippet in docs/capture-surfaces.md).
- Removed /wiki/capture from topnav. Endpoint still exists but the
page is now labeled "fallback only" with a warning banner. The
sanctioned surfaces are Claude Code + OpenClaw; manual paste is
explicitly not the design.
- docs/capture-surfaces.md — scope statement: two surfaces, nothing
else. Anthropic API polling explicitly prohibited.
Tests: +8 for inject_context.py (exit 0 on all failure modes, kill
switch, short prompt filter, XML filter, bad stdin, mock-server
success shape, project inference from cwd). Updated 2 wiki tests
for the topnav change. 450 → 459.
Verified live with real AtoCore: injected 2979 chars of atocore
project context on a cwd-matched prompt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes three gaps the user surfaced: (1) OpenClaw agents run blind
without AtoCore context, (2) mobile/desktop chats can't be captured
at all, (3) wiki UI hadn't kept up with backend capabilities.
Phase 7I — OpenClaw two-way bridge
- Plugin now calls /context/build on before_agent_start and prepends
the context pack to event.prompt, so whatever LLM runs underneath
(sonnet, opus, codex, local model) answers grounded in AtoCore
knowledge. Captured prompt stays the user's original text; fail-open
with a 5s timeout. Config-gated via injectContext flag.
- Plugin version 0.0.0 → 0.2.0; README rewritten.
UI refresh
- /wiki/capture — paste-to-ingest form for Claude Desktop / web / mobile
/ ChatGPT / other. Goes through normal /interactions pipeline with
client="claude-desktop|claude-web|claude-mobile|chatgpt|other".
Fixes the rotovap/mushroom-on-phone gap.
- /wiki/memories/{id} (Phase 7E) — full memory detail: content, status,
confidence, refs, valid_until, domain_tags (clickable to domain
pages), project link, source chunk, graduated-to-entity link, full
audit trail, related-by-tag neighbors.
- /wiki/domains/{tag} (Phase 7F) — cross-project view: all active
memories with the given tag grouped by project, sorted by count.
Case-insensitive, whitespace-tolerant. Also surfaces graduated
entities carrying the tag.
- /wiki/activity — autonomous-activity timeline feed. Summary chips
by action (created/promoted/merged/superseded/decayed/canonicalized)
and by actor (auto-dedup-tier1, auto-dedup-tier2, confidence-decay,
phase10-auto-promote, transient-to-durable, tag-canon, human-triage).
Answers "what has the brain been doing while I was away?"
- Home refresh: persistent topnav (Home · Activity · Capture · Triage
· Dashboard), "What the brain is doing" snippet above project cards
showing recent autonomous-actor counts, link to full activity.
Tests: +10 (capture page, memory detail + 404, domain cross-project +
empty + tag normalization, activity feed + groupings, home topnav,
superseded-source detail after merge). 440 → 450.
Known next: capture-browser extension for Claude.ai web (bigger
project, deferred); voice/mobile relay (adjacent).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LLM proposes alias→canonical mappings for domain_tags; confidence >= 0.8
auto-apply, below goes to human triage. Protects project identifiers
(p04, p05, p06, atocore, apm, etc.) from ever being canonicalized
since they're their own namespace, not concepts.
Problem solved: tag drift fragments retrieval. "fw" vs "firmware" vs
"firmware-control" all mean the same thing, but cross-cutting queries
that filter by tag only hit one variant. Weekly canonicalization pass
keeps the tag graph clean.
- Schema: tag_aliases table (pending | approved | rejected)
- atocore.memory._tag_canon_prompt (stdlib-only, protected project tokens)
- service: get_tag_distribution, apply_tag_alias (atomic per-memory,
dedupes if both alias + canonical present), create / approve / reject
proposal lifecycle, per-memory audit rows with action="tag_canonicalized"
- scripts/canonicalize_tags.py: host-side detector, autonomous by default,
--no-auto-approve kill switch
- 6 API endpoints under /admin/tags/* (distribution, list, propose,
apply, approve/{id}, reject/{id})
- Step B4 in batch-extract.sh (Sundays only — weekly cadence)
- 26 new tests (prompt parser, normalizer protections, distribution
counting, rewrite atomicity, dedup, audit, lifecycle). 414 → 440.
Design: aggressive protection of project tokens because a false
canonicalization (p04 → p04-gigabit, or vice versa) would scramble
cross-project filtering. Err toward preservation; the alias only
applies if the model is very confident AND both strings appear in
the current distribution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Daily job multiplies confidence by 0.97 (~2-month half-life) for
active memories with reference_count=0 AND idle > 30 days. Below
0.3 → auto-supersede with audit. Reversible via reinforcement
(which already bumps confidence back up).
Rationale: stale memories currently rank equal to fresh ones in
retrieval. Without decay, the brain accumulates obsolete facts
that compete with fresh knowledge for context-pack slots. With
decay, memories earn their longevity via reference.
- decay_unreferenced_memories() in service.py (stdlib-only, no cron
infra needed)
- POST /admin/memory/decay-run endpoint
- Nightly Step F4 in batch-extract.sh
- Exempt: reinforced (refcount > 0), graduated, superseded, invalid
- Audit row per supersession ("decayed below floor, no references"),
actor="confidence-decay". Per-decay rows skipped (chatty, no
human value — status change is the meaningful signal).
- Configurable via env: ATOCORE_DECAY_* (exposed through endpoint body)
Tests: +13 (basic decay, reinforcement protection, supersede at floor,
audit trail, graduated/superseded exemption, reinforcement reversibility,
threshold tuning, parameter validation, cross-run stacking).
401 → 414.
Next in Phase 7: 7C tag canonicalization (weekly), then 7B contradiction
detection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dedup detector now merges high-confidence duplicates silently instead
of piling every proposal into a human triage queue. Matches the 3-tier
escalation pattern that auto_triage already uses.
Tiering decision per cluster:
TIER-1 auto-approve: sonnet confidence >= 0.8 AND min_pairwise_sim >= 0.92
AND all sources share project+type → auto-merge silently
(actor="auto-dedup-tier1" in audit log)
TIER-2 escalation: sonnet 0.5-0.8 conf OR sim 0.85-0.92 → opus second opinion.
Opus confirms with conf >= 0.8 → auto-merge (actor="auto-dedup-tier2").
Opus overrides (reject) → skip silently.
Opus low conf → human triage with opus's refined draft.
HUMAN triage: Only the genuinely ambiguous land in /admin/triage.
Env-tunable thresholds:
ATOCORE_DEDUP_AUTO_APPROVE_CONF (0.8)
ATOCORE_DEDUP_AUTO_APPROVE_SIM (0.92)
ATOCORE_DEDUP_TIER2_MIN_CONF (0.5)
ATOCORE_DEDUP_TIER2_MIN_SIM (0.85)
ATOCORE_DEDUP_TIER2_MODEL (opus)
New flag --no-auto-approve for kill-switch testing (everything → human queue).
Tests: +6 (tier-2 prompt content, same_bucket edges, min_pairwise_similarity
on identical + transitive clusters). 395 → 401.
Rationale: user asked for autonomous behavior — "this needs to be intelligent,
I don't want to manually triage stuff". Matches the consolidation principle:
never discard details, but let the brain tidy up on its own for the easy cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New table memory_merge_candidates + service functions to cluster
near-duplicate active memories within (project, memory_type) buckets,
draft a unified content via LLM, and merge on human approval. Source
memories become superseded (never deleted); merged memory carries
union of tags, max of confidence, sum of reference_count.
- schema migration for memory_merge_candidates
- atocore.memory.similarity: cosine + transitive clustering
- atocore.memory._dedup_prompt: stdlib-only LLM prompt preserving every specific
- service: merge_memories / create_merge_candidate / get_merge_candidates / reject_merge_candidate
- scripts/memory_dedup.py: host-side detector (HTTP-only, idempotent)
- 5 API endpoints under /admin/memory/merge-candidates* + /admin/memory/dedup-scan
- triage UI: purple "🔗 Merge Candidates" section + "🔗 Scan for duplicates" bar
- batch-extract.sh Step B3 (0.90 daily, 0.85 Sundays)
- deploy/dalidou/dedup-watcher.sh for UI-triggered scans
- 21 new tests (374 → 395)
- docs/PHASE-7-MEMORY-CONSOLIDATION.md covering 7A-7H roadmap
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User observation: APM work was captured + extracted, but candidates got
tagged project=atocore or left blank instead of project=apm. Reason:
the prompt said 'Unknown project names — still tag them' but was too
terse; sonnet hedged toward registered matches rather than proposing
new slugs.
Fix: explicit guidance in the system prompt for when to propose an
unregistered project name vs when to stick with a registered one.
New instructions:
- When a memory is clearly ABOUT a named tool/product/project/system
not in the known list, use a slugified version as the project tag
('apm' for 'Atomaste Part Manager'). The Living Taxonomy detector
(Phase 6 C.1) scans these and surfaces for one-click registration
once ≥3 memories accumulate with that tag.
- Exception: if a memory merely USES an unknown tool but is about a
registered project ('p04 parts missing materials in APM'), tag with
the registered project and mention the tool in content.
This closes the loop on the Phase 6 detector: extractor now produces
taggable data for it, detector surfaces, user registers with one click.
Version bump: llm-0.5.0 → llm-0.6.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes two real-use gaps:
1. "APM tool" gap: work done outside Claude Code (desktop, web, phone,
other machine) was invisible to AtoCore.
2. Project discovery gap: manual JSON-file edits required to promote
an emerging theme to a first-class project.
B — atocore_remember MCP tool (scripts/atocore_mcp.py):
- New MCP tool for universal capture from any MCP-aware client
(Claude Desktop, Code, Cursor, Zed, Windsurf, etc.)
- Accepts content (required) + memory_type/project/confidence/
valid_until/domain_tags (all optional with sensible defaults)
- Creates a candidate memory, goes through the existing 3-tier triage
(no bypass — the quality gate catches noise)
- Detailed tool description guides Claude on when to invoke: "remember
this", "save that for later", "don't lose this fact"
- Total tools exposed by MCP server: 14 → 15
C.1 Emerging-concepts detector (scripts/detect_emerging.py):
- Nightly scan of active + candidate memories for:
* Unregistered project names with ≥3 memory occurrences
* Top 20 domain_tags by frequency (emerging categories)
* Active memories with reference_count ≥ 5 + valid_until set
(reinforced transients — candidates for extension)
- Writes findings to atocore/proposals/* project state entries
- Emits "warning" alert via Phase 4 framework the FIRST time a new
project crosses the 5-memory alert threshold (avoids spam)
- Configurable via env vars: ATOCORE_EMERGING_PROJECT_MIN (default 3),
ATOCORE_EMERGING_ALERT_THRESHOLD (default 5), TOP_TAGS_LIMIT (20)
C.2 Registration surface (src/atocore/api/routes.py + wiki.py):
- POST /admin/projects/register-emerging — one-click register with
sensible defaults (ingest_roots auto-filled with
vault:incoming/projects/<id>/ convention). Clears the proposal
from the dashboard list on success.
- Dashboard /admin/dashboard: new "proposals" section with
unregistered_projects + emerging_categories + reinforced_transients.
- Wiki homepage: "📋 Emerging" section rendering each unregistered
project as a card with count + 2 sample memory previews + inline
"📌 Register as project" button that calls the endpoint via fetch,
reloads the page on success.
C.3 Transient-to-durable extension
(src/atocore/memory/service.py + API + cron):
- New extend_reinforced_valid_until() function — scans active memories
with valid_until in the next 30 days and reference_count ≥ 5.
Extends expiry by 90 days. If reference_count ≥ 10, clears expiry
entirely (makes permanent). Writes audit rows via the Phase 4
memory_audit framework with actor="transient-to-durable".
- POST /admin/memory/extend-reinforced — API wrapper for cron.
- Matches the user's intuition: "something transient becomes important
if you keep coming back to it".
Nightly cron (deploy/dalidou/batch-extract.sh):
- Step F2: detect_emerging.py (after F pipeline summary)
- Step F3: /admin/memory/extend-reinforced (before integrity check)
- Both fail-open; errors don't break the pipeline.
Tests: 366 → 374 (+8 for Phase 6):
- 6 tests for extend_reinforced_valid_until covering:
extension path, permanent path, skip far-future, skip low-refs,
skip permanent memories, audit row write
- 2 smoke tests for the detector (imports cleanly, handles empty DB)
- MCP tool changes don't need new tests — the wrapper is pure passthrough
Design decisions documented in plan file:
- atocore_remember deliberately doesn't bypass triage (quality gate)
- Detector is passive (surfaces proposals) not active (auto-registers)
- Sensible ingest-root defaults ("vault:incoming/projects/<id>/")
so registration is one-click with no file-path thinking
- Extension adds 90 days rather than clearing expiry (gradual
permanence earned through sustained reinforcement)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NameError on /admin/graduation/request — routes.py doesn't import json
at module scope. Added local 'import json as _json' in both graduation
handlers matching the pattern used elsewhere in this file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes the graduation UX loop: no more SSH required to populate the
entity graph from memories. Click button → host watcher picks up
→ graduation runs → entity candidates appear in the same triage UI.
New API endpoints (src/atocore/api/routes.py):
- POST /admin/graduation/request: takes {project, limit}, writes flag
to project_state. Host watcher picks up within 2 min.
- GET /admin/graduation/status: returns requested/running/last_result
fields for UI polling.
Triage UI (src/atocore/engineering/triage_ui.py):
- Graduation bar with:
- 🎓 Graduate memories button
- Project selector populated from registry (or "all projects")
- Limit number input (default 30, max 200)
- Status message area
- Poll every 10s until is_running=false, then auto-reload the page to
show new entity candidates in the Entity section below
- Graduation bar appears on both populated and empty triage page
states so you can kick off graduation from either
Host watcher (deploy/dalidou/graduation-watcher.sh):
- Mirrors auto-triage-watcher.sh pattern: poll, lock, clear flag,
run, record result, unlock
- Parses {project, limit} JSON from the flag payload
- Runs graduate_memories.py with those args
- Records graduation_running/started/finished/last_result in project
state for the UI to display
- Lock file prevents concurrent runs
Install on host (one-time, via cron):
*/2 * * * * /srv/storage/atocore/app/deploy/dalidou/graduation-watcher.sh \
>> /home/papa/atocore-logs/graduation-watcher.log 2>&1
This completes the Phase 5 self-service loop: queue triage happens
autonomously via the 3-tier escalation (shipped in 3ca1972); entity
graph population happens autonomously via a button click. No shell
required for daily use.
Tests: 366 passing (no new tests — UI + shell are integration-level).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The population move + the safety net + the universal consumer hookup,
all shipped together. This is where the engineering graph becomes
genuinely useful against the real 262-memory corpus.
5F: Memory → Entity graduation (THE population move)
- src/atocore/engineering/_graduation_prompt.py: stdlib-only shared
prompt module mirroring _llm_prompt.py pattern (container + host
use same system prompt, no drift)
- scripts/graduate_memories.py: host-side batch driver that asks
claude-p "does this memory describe a typed entity?" and creates
entity candidates with source_refs pointing back to the memory
- promote_entity() now scans source_refs for memory:* prefix; if
found, flips source memory to status='graduated' with
graduated_to_entity_id forward pointer + writes memory_audit row
- GET /admin/graduation/stats exposes graduation rate for dashboard
5G: Sync conflict detection on entity promote
- src/atocore/engineering/conflicts.py: detect_conflicts_for_entity()
runs on every active promote. V1 checks 3 slot kinds narrowly to
avoid false positives:
* component.material (multiple USES_MATERIAL edges)
* component.part_of (multiple PART_OF edges)
* requirement.name (duplicate active Requirements in same project)
- Conflicts + members persist via the tables built in 5A
- Fires a "warning" alert via Phase 4 framework
- Deduplicates: same (slot_kind, slot_key) won't get a new row
- resolve_conflict(action="dismiss|supersede_others|no_action"):
supersede_others marks non-winner members as status='superseded'
- GET /admin/conflicts + POST /admin/conflicts/{id}/resolve
5H: MCP + context pack integration
- scripts/atocore_mcp.py: 7 new engineering tools exposed to every
MCP-aware client (Claude Desktop, Claude Code, Cursor, Zed):
* atocore_engineering_map (Q-001/004 system tree)
* atocore_engineering_gaps (Q-006/009/011 killer queries — THE
director's question surfaced as a built-in tool)
* atocore_engineering_requirements_for_component (Q-005)
* atocore_engineering_decisions (Q-008)
* atocore_engineering_changes (Q-013 — reads entity audit log)
* atocore_engineering_impact (Q-016 BFS downstream)
* atocore_engineering_evidence (Q-017 inbound provenance)
- MCP tools total: 14 (7 memory/state/health + 7 engineering)
- context/builder.py _build_engineering_context now appends a compact
gaps summary ("Gaps: N orphan reqs, M risky decisions, K unsupported
claims") so every project-scoped LLM call sees "what we're missing"
Tests: 341 → 356 (15 new):
- 5F: graduation prompt parses positive/negative decisions, rejects
unknown entity types, tolerates markdown fences; promote_entity
marks source memory graduated with forward pointer; entity without
memory refs promotes cleanly
- 5G: component.material + component.part_of + requirement.name
conflicts detected; clean component triggers nothing; dedup works;
supersede_others resolution marks losers; dismiss leaves both
active; end-to-end promote triggers detection
- 5H: graduation user message includes project + type + content
No regressions across the 341 prior tests. The MCP server now answers
"which p05 requirements aren't satisfied?" directly from any Claude
session — no user prompt engineering, no context hacks.
Next to kick off from user: run graduation script on Dalidou to
populate the graph from 262 existing memories:
ssh papa@dalidou 'cd /srv/storage/atocore/app && PYTHONPATH=src \
python3 scripts/graduate_memories.py --project p05-interferometer --limit 30 --dry-run'
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The graph becomes useful. Before this commit, entities sat in the DB
as data with no narrative. After: the director can ask "what am I
forgetting?" and get a structured answer in milliseconds.
New module (src/atocore/engineering/queries.py, 360 lines):
Structure queries (Q-001/004/005/008/013):
- system_map(project): full subsystem → component tree + orphans +
materials joined per component
- decisions_affecting(project, subsystem_id?): decisions linked via
AFFECTED_BY_DECISION, scoped to a subsystem or whole project
- requirements_for(component_id): Q-005 forward trace
- recent_changes(project, since, limit): Q-013 via memory_audit join
(reuses the Phase 4 audit infrastructure — entity_kind='entity')
The 3 killer queries (the real value):
- orphan_requirements(project): requirements with NO inbound SATISFIES
edge. "What do I claim the system must do that nothing actually
claims to handle?" Q-006.
- risky_decisions(project): decisions whose BASED_ON_ASSUMPTION edge
points to an assumption with status in ('superseded','invalid') OR
properties.flagged=True. Finds cascading risk from shaky premises. Q-009.
- unsupported_claims(project): ValidationClaim entities with no inbound
SUPPORTS edge — asserted but no Result to back them. Q-011.
- all_gaps(project): runs all three in one call for dashboards.
History + impact (Q-016/017):
- impact_analysis(entity_id, max_depth=3): BFS over outbound edges.
"What's downstream of this if I change it?"
- evidence_chain(entity_id): inbound SUPPORTS/EVIDENCED_BY/DESCRIBED_BY/
VALIDATED_BY/ANALYZED_BY. "How do I know this is true?"
API (src/atocore/api/routes.py) exposes 10 endpoints:
- GET /engineering/projects/{p}/systems
- GET /engineering/decisions?project=&subsystem=
- GET /engineering/components/{id}/requirements
- GET /engineering/changes?project=&since=&limit=
- GET /engineering/gaps/orphan-requirements?project=
- GET /engineering/gaps/risky-decisions?project=
- GET /engineering/gaps/unsupported-claims?project=
- GET /engineering/gaps?project= (combined)
- GET /engineering/impact?entity=&max_depth=
- GET /engineering/evidence?entity=
Mirror integration (src/atocore/engineering/mirror.py):
- New _gaps_section() renders at top of every project page
- If any gap non-empty: shows up-to-10 per category with names + context
- Clean project: "✅ No gaps detected" — signals everything is traced
Triage UI (src/atocore/engineering/triage_ui.py):
- /admin/triage now shows BOTH memory candidates AND entity candidates
- Entity cards: name, type, project, confidence, source provenance,
Promote/Reject buttons, link to wiki entity page
- Entity promote/reject via fetch to /entities/{id}/promote|reject
- One triage UI for the whole pipeline — consistent muscle memory
Tests: 326 → 341 (15 new, all in test_engineering_queries.py):
- System map structure + orphan detection + material joins
- Killer queries: positive + negative cases (empty when clean)
- Decisions query: project-wide and subsystem-scoped
- Impact analysis walks outbound BFS
- Evidence chain walks inbound provenance
No regressions. All 10 daily queries from the plan are now live and
answering real questions against the graph.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
scripts/integrity_check.py now POSTs to /admin/integrity-check
instead of importing atocore directly. The actual scan lives in
the container where DB access + deps are available. Host-side
cron just triggers and logs the result.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the observability + safety layer that turns AtoCore from
"works until something silently breaks" into "every mutation is
traceable, drift is detected, failures raise alerts."
1. Audit log (memory_audit table):
- New table with id, memory_id, action, actor, before/after JSON,
note, timestamp; 3 indexes for memory_id/timestamp/action
- _audit_memory() helper called from every mutation:
create_memory, update_memory, promote_memory,
reject_candidate_memory, invalidate_memory, supersede_memory,
reinforce_memory, auto_promote_reinforced, expire_stale_candidates
- Action verb auto-selected: promoted/rejected/invalidated/
superseded/updated based on state transition
- "actor" threaded through: api-http, human-triage, phase10-auto-
promote, candidate-expiry, reinforcement, etc.
- Fail-open: audit write failure logs but never breaks the mutation
- GET /memory/{id}/audit: full history for one memory
- GET /admin/audit/recent: last 50 mutations across the system
2. Alerts framework (src/atocore/observability/alerts.py):
- emit_alert(severity, title, message, context) fans out to:
- structlog logger (always)
- ~/atocore-logs/alerts.log append (configurable via
ATOCORE_ALERT_LOG)
- project_state atocore/alert/last_{severity} (dashboard surface)
- ATOCORE_ALERT_WEBHOOK POST if set (auto-detects Discord webhook
format for nice embeds; generic JSON otherwise)
- Every sink fail-open — one failure doesn't prevent the others
- Pipeline alert step in nightly cron: harness < 85% → warning;
candidate queue > 200 → warning
3. Integrity checks (scripts/integrity_check.py):
- Nightly scan for drift:
- Memories → missing source_chunk_id references
- Duplicate active memories (same type+content+project)
- project_state → missing projects
- Orphaned source_chunks (no parent document)
- Results persisted to atocore/status/integrity_check_result
- Any finding emits a warning alert
- Added as Step G in deploy/dalidou/batch-extract.sh nightly cron
4. Dashboard surfaces it all:
- integrity (findings + details)
- alerts (last info/warning/critical per severity)
- recent_audit (last 10 mutations with actor + action + preview)
Tests: 308 → 317 (9 new):
- test_audit_create_logs_entry
- test_audit_promote_logs_entry
- test_audit_reject_logs_entry
- test_audit_update_captures_before_after
- test_audit_reinforce_logs_entry
- test_recent_audit_returns_cross_memory_entries
- test_emit_alert_writes_log_file
- test_emit_alert_invalid_severity_falls_back_to_info
- test_emit_alert_fails_open_on_log_write_error
Deferred: formal migration framework with rollback (current additive
pattern is fine for V1); memory detail wiki page with audit view
(quick follow-up).
To enable Discord alerts: set ATOCORE_ALERT_WEBHOOK to a Discord
webhook URL in Dalidou's environment. Default = log-only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds structural metadata that the LLM triage was already implicitly
reasoning about ("stale snapshot" → reject). Phase 3 captures that
reasoning as fields so it can DRIVE retrieval, not just rejection.
Schema (src/atocore/models/database.py):
- domain_tags TEXT DEFAULT '[]' JSON array of lowercase topic keywords
- valid_until DATETIME ISO date; null = permanent
- idx_memories_valid_until index for efficient expiry queries
Memory service (src/atocore/memory/service.py):
- Memory dataclass gains domain_tags + valid_until
- create_memory, update_memory accept/persist both
- _row_to_memory safely reads both (JSON-decode + null handling)
- _normalize_tags helper: lowercase, dedup, strip, cap at 10
- get_memories_for_context filters expired (valid_until < today UTC)
- _rank_memories_for_query adds tag-boost: memories whose domain_tags
appear as substrings in query text rank higher (tertiary key after
content-overlap density + absolute overlap, before confidence)
LLM extractor (_llm_prompt.py → llm-0.5.0):
- SYSTEM_PROMPT documents domain_tags (2-5 keywords) + valid_until
(time-bounded facts get expiry dates; durable facts stay null)
- normalize_candidate_item parses both fields from model output with
graceful fallback for string/null/missing
LLM triage (scripts/auto_triage.py):
- TRIAGE_SYSTEM_PROMPT documents same two fields
- parse_verdict extracts them from verdict JSON
- On promote: PUT /memory/{id} with tags + valid_until BEFORE
POST /memory/{id}/promote, so active memories carry them
API (src/atocore/api/routes.py):
- MemoryCreateRequest: adds domain_tags, valid_until
- MemoryUpdateRequest: adds domain_tags, valid_until, memory_type
- GET /memory response exposes domain_tags + valid_until + created_at
Triage UI (src/atocore/engineering/triage_ui.py):
- Renders existing tags as colored badges
- Adds inline text field for tags (comma-separated) + date picker for
valid_until on every candidate card
- Save&Promote button persists edits via PUT then promotes
- Plain Promote (and Y shortcut) also saves tags/expiry if edited
Wiki (src/atocore/engineering/wiki.py):
- Search now matches memory content OR domain_tags
- Search results render tags as clickable badges linking to
/wiki/search?q=<tag> for cross-project navigation
- valid_until shown as amber "valid until YYYY-MM-DD" hint
Tests: 303 → 308 (5 new for Phase 3 behavior):
- test_create_memory_with_tags_and_valid_until
- test_create_memory_normalizes_tags
- test_update_memory_sets_tags_and_valid_until
- test_get_memories_for_context_excludes_expired
- test_context_builder_tag_boost_orders_results
Deferred (explicitly): temporal_scope enum, source_refs memory graph,
HDBSCAN clustering, memory detail wiki page, backfill of existing
actives. See docs/MASTER-BRAIN-PLAN.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds an "Auto-process queue" button to /admin/triage that lets the
user kick off a full LLM triage pass without SSH. Bridges the gap
between web UI (in container) and claude CLI (host-only).
Architecture:
- UI button POSTs to /admin/triage/request-drain
- Endpoint writes atocore/config/auto_triage_requested_at flag
- Host-side watcher cron (every 2 min) checks for the flag
- When found: clears flag, acquires lock, runs auto_triage.py,
records progress via atocore/status/* entries
- UI polls /admin/triage/drain-status every 10s to show progress,
auto-reloads when done
Safety:
- Lock file prevents concurrent runs on host
- Flag cleared before run so duplicate clicks queue at most one re-run
- Fail-open: watcher errors just log, don't break anything
- Status endpoint stays read-only
Installation on host (one-time):
*/2 * * * * /srv/storage/atocore/app/deploy/dalidou/auto-triage-watcher.sh \
>> /home/papa/atocore-logs/auto-triage-watcher.log 2>&1
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes human triage sustainable. Before: command-line-only review,
auto-triage stopped after 100 candidates/run. Now:
1. Web UI at /admin/triage
- Lists all pending candidates with inline promote/reject/edit
- Edit content in-place before promoting (PUT /memory/{id})
- Change type via dropdown
- Keyboard shortcuts: Y=promote, N=reject, E=edit, S=scroll-next
- Cards fade out after action, queue count updates live
- Zero JS framework — vanilla fetch + event delegation
2. auto_triage.py drains queue
- Loops up to 20 batches (default) of 100 candidates each
- Tracks seen IDs so needs_human items don't reprocess
- Exits cleanly when queue empty
- Nightly cron naturally drains everything
3. Dashboard + wiki surface triage queue
- Dashboard /admin/dashboard: new "triage" section with pending
count + /admin/triage URL + warning/notice severity levels
- Wiki homepage: prominent callout "N candidates awaiting triage —
review now" linking to /admin/triage, styled with triage-warning
(>50) or triage-notice (>20) CSS
Pattern: human intervenes only when AI can't decide. The UI makes
that intervention fast (20 candidates in 60 seconds). Nightly
auto-triage drains the queue so the human queue stays bounded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
R11: POST /admin/extract-batch with mode=llm now returns 503 when the
claude CLI is unavailable (was silently returning success with 0
candidates), with a message pointing at the host-side script. +2 tests.
R12: extracted SYSTEM_PROMPT + parse_llm_json_array +
normalize_candidate_item + build_user_message into stdlib-only
src/atocore/memory/_llm_prompt.py. Both the container extractor and
scripts/batch_llm_extract_live.py now import from it, eliminating the
prompt/parser drift risk.
Tests 297 -> 299.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-triage was rejecting 8 of 10 OpenClaw imports as 'session log'
or 'process rule belongs elsewhere'. But OpenClaw's SOUL.md, USER.md,
MEMORY.md and daily memory/*.md files are already curated — they ARE
the canonical continuity layer we want to absorb. Applying the
conservative LLM-conversation triage bar to them discards the signal
the importer was designed to capture.
Triage prompt now has a rule 4: when candidate content starts with
'From OpenClaw/' apply a much lower bar. Session events, project
updates, stakeholder notes, and decisions from daily memory files
should promote, not reject.
The ABB-Space Schott quote that DID promote was the lucky exception
— after this fix, the other 7 daily notes (CDR execution log,
Discord migration plan, isogrid research, etc.) will promote too.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extraction prompt rewritten for signal-aggressive mode. The old prompt
rewarded silence ("durable insight only, empty is correct") which
caused quiet failures — real project signal (Schott quotes arriving,
stakeholder events, blockers) was dropped as "not architectural enough".
New prompt explicitly lists what to emit:
1. Project activity (mentions with context — quote received, blocker,
action item)
2. Decisions and choices (architectural commitments, vendor selection)
3. Durable engineering insight (earned knowledge, generalizable)
4. Stakeholder and vendor events (emails sent, meetings scheduled)
5. Preferences and adaptations (how Antoine works)
Philosophy shift: "capture more signal, let triage filter noise"
replaces "extract only durable architectural facts". Auto-triage
already rejects noise well, so moving the filter downstream gives us
visibility into weak signals without polluting active memory.
Added 'episodic' to the candidate types list to support stakeholder
events with a timestamp feel.
LLM_EXTRACTOR_VERSION bumped to llm-0.4.0.
Also: cron-backup.sh now runs POST /ingest/sources before extraction
so new PKM files flow in automatically. Fail-open, non-blocking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three additive upgrades borrowed from Karpathy's LLM Wiki pattern:
1. CONTRADICTION DETECTION: auto-triage now has a fourth verdict —
"contradicts". When a candidate conflicts with an existing memory
(not duplicates, genuine disagreement like "Option A selected"
vs "Option B selected"), the triage model flags it and leaves
it in the queue for human review instead of silently rejecting
or double-storing. Preserves source tension rather than
suppressing it.
2. WEEKLY LINT PASS: scripts/lint_knowledge_base.py checks for:
- Orphan memories (active but zero references after 14 days)
- Stale candidates (>7 days unreviewed)
- Unused entities (no relationships)
- Empty-state projects
- Unregistered projects auto-detected in memories
Runs Sundays via the cron. Outputs a report.
3. WEEKLY SYNTHESIS: scripts/synthesize_projects.py uses sonnet to
generate a 3-5 sentence "current state" paragraph per project
from state + memories + entities. Cached in project_state under
status/synthesis_cache. Wiki project pages now show this at the
top under "Current State (auto-synthesis)". Falls back to a
deterministic summary if no cache exists.
deploy/dalidou/batch-extract.sh: added Step C (synthesis) and
Step D (lint) gated to Sundays via date check.
All additive — nothing existing changes behavior. The database
remains the source of truth; these operations just produce better
synthesized views and catch rot.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Projects now appear under three buckets based on their state entries:
- Active Contracts
- Leads & Prospects
- Internal Tools & Infra
Each card shows the stage as a tag on the project title, the client
as an italic subtitle, and the project description. Empty buckets
hide. Makes it obvious at a glance what's contracted vs lead vs
internal.
Paired with stage/type/client state entries added to all 6 projects
so the grouping has data to work with.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three changes:
1. ABB-Space registered as a lead project with stage=lead in
Trusted Project State. Projects now have lifecycle awareness
(lead/proposition vs active contract vs completed).
2. Extraction no longer drops unregistered project tags. When the
LLM extractor sees a conversation about a project not in the
registry, it keeps the model's tag on the candidate instead of
falling back to empty. This enables auto-detection of new
projects/leads from organic conversations. The nightly pipeline
surfaces these candidates for triage, where the operator sees
"hey, there's a new project called X" and can decide whether
to register it.
3. Extraction prompt updated to tell the model: "If the conversation
discusses a project NOT in the known list, still tag it — the
system will auto-detect it." This removes the artificial ceiling
that prevented new project discovery.
Updated Case D test: unregistered + unscoped now keeps the model's
tag instead of dropping to empty.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full wiki interface at /wiki with:
- /wiki — Homepage with project cards, search box, system stats
- /wiki/projects/{name} — Project page with clickable entity links
- /wiki/entities/{id} — Entity detail with relationships as links
- /wiki/search?q=... — Search across entities and memories
Every entity name in a project page links to its detail page.
Entity detail pages show properties, relationships as clickable
links to related entities, and breadcrumb navigation back to the
project and wiki home.
Responsive, dark-mode, mobile-friendly. Card grid for projects.
Generated on-demand from the database — always current, no static
files, source of truth is the DB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Layer 3 of the AtoCore architecture. Generates a human-readable
project overview in markdown from structured data:
- Trusted Project State (by category)
- System Architecture (systems → subsystems → components with
material and interface links)
- Decisions (with affected entities)
- Requirements & Constraints
- Materials
- Vendors
- Active Memories (with confidence and reference counts)
The mirror is DERIVED — every line traces back to an entity, state
entry, or memory. The footer stamps the generation timestamp and
the "not canonical truth" disclaimer.
API: GET /projects/{project_name}/mirror returns {project, format,
content} where content is the full markdown page. Supports project
aliases via resolve_project_name.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a query matches a known engineering entity by name, the context
pack now includes a structured '--- Engineering Context ---' band
showing the entity's type, description, and its relationships to
other entities (subsystems, materials, requirements, decisions).
Six-tier context assembly:
1. Trusted Project State
2. Identity / Preferences
3. Project Memories
4. Domain Knowledge
5. Engineering Context (NEW)
6. Retrieved Chunks
The engineering band uses the same token-overlap scoring as memory
ranking: query tokens are matched against entity names + descriptions.
The top match gets its full relationship context included.
10% budget allocation. Trims before domain knowledge (lowest
priority of the structured tiers since the same info may appear in
chunks).
Example: query 'lateral support design' against p04-gigabit
surfaces the Lateral Support subsystem entity with its relationships
to GF-PTFE material, M1 Mirror Assembly parent system, and related
components.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Layer 2 of the AtoCore architecture. Adds typed engineering entities
with relationships on top of the flat memory/state/chunk substrate.
Schema:
- entities table: id, entity_type, name, project, description,
properties (JSON), status, confidence, source_refs, timestamps
- relationships table: source_entity_id, target_entity_id,
relationship_type, confidence, source_refs
15 entity types: project, system, subsystem, component, interface,
requirement, constraint, decision, material, parameter,
analysis_model, result, validation_claim, vendor, process
12 relationship types: contains, part_of, interfaces_with,
satisfies, constrained_by, affected_by_decision, analyzed_by,
validated_by, depends_on, uses_material, described_by, supersedes
Service layer: full CRUD + get_entity_with_context (returns an
entity with its relationships and all related entities in one call).
API endpoints:
- POST /entities — create entity
- GET /entities — list/filter by type, project, status, name
- GET /entities/{id} — entity + relationships + related entities
- POST /relationships — create relationship
Schema auto-initialized on app startup via init_engineering_schema().
7 tests covering entity CRUD, relationships, context traversal,
filtering, name search, and validation.
Test count: 290 -> 297.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The extraction system now produces two kinds of candidates from
the same conversation:
A. PROJECT-SPECIFIC: applied facts scoped to a named project
(unchanged behavior)
B. DOMAIN KNOWLEDGE: generalizable engineering insight earned
through project work, tagged with a domain (physics, materials,
optics, mechanics, manufacturing, metrology, controls, software,
math, finance) and stored with project="" so it surfaces across
all projects.
Critical quality bar enforced in the system prompt: "Would a
competent engineer need experience to know this, or could they
find it in 30 seconds on Google?" Textbook values, definitions,
and obvious facts are explicitly excluded. Only hard-won insight
qualifies — the kind that takes weeks of FEA or real machining
experience to discover.
Domain tags are embedded in the content as a prefix ("[physics]",
"[materials]") so they survive without a schema migration. A future
column can parse them out.
Context builder gains a new tier between project memories and
retrieved chunks:
Tier 1: Trusted Project State (project-specific)
Tier 2: Identity / Preferences (global)
Tier 3: Project Memories (project-specific)
Tier 4: Domain Knowledge (NEW) (cross-project, 10% budget)
Tier 5: Retrieved Chunks (project-boosted)
Trim order: chunks -> domain knowledge -> project memories ->
identity/preference -> project state.
Host-side extraction script updated with the same prompt and
domain-tag handling.
LLM_EXTRACTOR_VERSION bumped to llm-0.3.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wave 2 deeper ingestion:
- 6 new Trusted Project State entries from design-level docs:
p05: test rig architecture, CGH specification, procurement combos
p06: force control architecture, control channels, calibration loop
- Total state entries: ~23 (was ~17)
Observability:
- GET /admin/dashboard — one-shot system overview: memory counts
by type/project/status, reinforced count, project state entry
counts, recent interaction timestamp, extraction pipeline status.
Replaces the need to query 4+ endpoints to understand system state.
Harness: 17/18 (no regression from new state entries).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 identity memories (Antoine's role, projects, infrastructure) and
3 preference memories (no API keys, multi-model collab, action bias)
seeded on live Dalidou. These fill the identity/preference band
that was previously empty.
Lowered MEMORY_BUDGET_RATIO from 0.10 to 0.05 because the 10%
allocation squeezed project memories and retrieval chunks enough
to regress 4 harness fixtures. At 5% the band fits at most 1 short
memory — enough for the most relevant identity/preference fact
without starving the project-specific tiers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Batch 3, Days 1-3. The core R9 failure was Case F: when the model
returned a registered project DIFFERENT from the interaction's
known scope, the old code trusted the model because the project
was registered. A p06-polisher interaction could silently produce
a p04-gigabit candidate.
New rule (trust hierarchy):
1. Interaction scope always wins when set (cases A, C, E, F)
2. Model project used only for unscoped interactions AND only when
it resolves to a registered project (cases D, G)
3. Empty string when both are empty or unregistered (case B)
The rule is: interaction.project is the strongest signal because
it comes from the capture hook's project detection, which runs
before the LLM ever sees the content. The model's project guess
is only useful when the capture hook had no project context.
7 case tests (A-G) cover every combination of model/interaction
project state. Pre-existing tests updated for the new behavior.
Host-side script mirrors the same hierarchy using _known_projects
fetched from GET /projects at startup.
Test count: 286 -> 290 (+4 net, 7 new R9 cases, 3 old tests
consolidated).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
R7: ranking scorer now uses overlap-density (overlap_count /
memory_token_count) as primary key instead of raw overlap count.
A 5-token memory with 3 overlapping tokens (density 0.6) now beats
a 40-token overview memory with 3 overlapping tokens (density 0.075)
at the same absolute count. Secondary: absolute overlap. Tertiary:
confidence. Targeting p06-firmware-interface harness fixture.
R9: when the LLM extractor returns a project that differs from the
interaction's known project, it now checks the project registry.
If the model's project is a registered canonical ID, trust it. If
not (hallucinated name), fall back to the interaction's project.
Uses load_project_registry() for the check. The host-side script
mirrors this via an API call to GET /projects at startup.
Two new tests: test_parser_keeps_registered_model_project and
test_parser_rejects_hallucinated_project.
Test count: 280 -> 281.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dalidou runs Claude Code 2.0.60 which does not have this flag
(added in 2.1.x). Removed from both extractor_llm.py and the
host-side batch script. --append-system-prompt and
--disable-slash-commands are supported on 2.0.60.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Day 1 of the operational-reflection batch. Two changes:
1. POST /admin/extract-batch: batch extraction endpoint that fetches
recent interactions (since last run or explicit 'since' param),
runs the extractor (rule or LLM mode), and persists candidates
with status=candidate. Tracks last-run timestamp in project state
(atocore/status/last_extract_batch_run) so subsequent calls
auto-resume. This is the operational home for R1/R5 — makes the
LLM extractor an API operation, not just a script.
2. POST /interactions/{id}/extract now accepts mode: "rule" | "llm"
(default "rule" for backward compatibility). When "llm", it uses
extract_candidates_llm (claude -p sonnet, OAuth).
Both changes preserve the standing decision: extraction stays off
the capture hot path. The batch endpoint is invoked explicitly by
cron, manual curl, or CLI — never inline with POST /interactions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex R6: the LLM extractor accepted the model's project field
verbatim. When the model returned empty string, clearly p06 memories
got promoted as project='', making them invisible to the p06
project-memory band and explaining the p06-offline-design harness
failure.
Fix: if model returns empty project but interaction.project is set,
inherit the interaction's project. Model-supplied project still takes
precedence when non-empty.
Two new tests lock the fallback and precedence behaviors.
R5 acknowledged (LLM extractor not yet wired into API — next task).
Test count: 278 -> 280. Harness re-run pending after deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Haiku was producing noisy candidates (31% accept rate on first
triage). Sonnet should give tighter extraction with fewer false
positives while still catching the same durable-fact patterns.
Override: ATOCORE_LLM_EXTRACTOR_MODEL=haiku to revert.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A 530-char program overview memory with confidence 0.96 was filling
the entire 25% project-memory budget at equal overlap score (3 tokens),
beating shorter query-relevant newly-promoted memories (confidence
0.5) on the confidence tiebreaker. The long memory legitimately
scored well, but its length starved every other memory from the band.
Fix: truncate each formatted entry to 250 chars with '...' so at
least 2-3 memories fit the ~700-char available budget. This doesn't
change ranking — the most relevant memory still goes first — but
it ensures the runner-up can also appear.
Harness fixture delta: Day 7 regression pass pending after deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the LLM-assisted extractor's in-scope / out-of-scope
categories derived from the first live triage pass (16 promoted,
35 rejected). Five in-scope classes, six explicit out-of-scope
classes, trust model summary, multi-model future direction.
Cleaned up stale follow-up items in next-steps.md: rule expansion
marked deprioritized, LLM extractor marked done, retrieval harness
marked done with expansion pending.
Fixed docstring timeout (45s -> 90s) in extractor_llm.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>