45196f352f3ee60de8e8e9e3a2081dc440d9ab04
46 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| 45196f352f |
fix: force UTF-8 on MCP stdio for Windows compatibility
Windows Python defaults stdout to cp1252. Any non-ASCII char in tool responses (emojis, ≥, →, etc.) crashes the MCP server with a UnicodeEncodeError. Explicitly reconfigure stdin/stdout/stderr to UTF-8 at startup. No-op on Linux/macOS. Noticed when Claude Code called atocore_search and atocore_memory_list and both crashed on ⏳ / ≥ characters that came back in the response. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 3ca19724a5 |
feat: 3-tier triage escalation + project validation + enriched context
Addresses the triage-quality problems the user observed:
- Candidates getting wrong project/product attribution
- Stale facts promoted as if still true
- "Hard to decide" items reaching human queue without real value
Solution: let sonnet handle the easy 80%, escalate borderline cases
to opus, auto-discard (or flag) what two models can't resolve.
Plus enrich the context the triage model sees so it can catch
misattribution, contradictions, and temporal drift earlier.
THE 3-TIER FLOW (scripts/auto_triage.py):
Tier 1: sonnet (fast, cheap)
- confidence >= 0.8 + clear verdict → PROMOTE or REJECT (done)
- otherwise → escalate to tier 2
Tier 2: opus (smarter, sees tier-1 verdict + reasoning)
- second opinion with explicit "sonnet said X, resolve the uncertainty"
- confidence >= 0.8 → PROMOTE or REJECT with note="[opus]"
- still uncertain → tier 3
Tier 3: configurable (default discard)
- ATOCORE_TRIAGE_TIER3=discard (default): auto-reject with
"two models couldn't decide" reason
- ATOCORE_TRIAGE_TIER3=human: leave in queue for /admin/triage
Configuration via env vars:
ATOCORE_TRIAGE_MODEL_TIER1 (default sonnet)
ATOCORE_TRIAGE_MODEL_TIER2 (default opus)
ATOCORE_TRIAGE_TIER3 (default discard)
ATOCORE_TRIAGE_ESCALATION_THRESHOLD (default 0.75)
ATOCORE_TRIAGE_TIER2_TIMEOUT_S (default 120 — opus is slower)
ENRICHED CONTEXT shown to the triage model (both tiers):
- List of registered project ids so misattribution is detectable
- Trusted project state entries (ground truth, higher trust than memories)
- Top 30 active memories for the claimed project (was 20)
- Tier 2 additionally sees tier 1's verdict + reason
PROJECT MISATTRIBUTION DETECTION:
- Triage prompt asks the model to output "suggested_project" when it
detects the claimed project is wrong but the content clearly belongs
to a registered one
- Main loop auto-applies the fix via PUT /memory/{id} (which canonicalizes
through the registry)
- Misattribution is the #1 pollution source — this catches it upstream
TEMPORAL AGGRESSIVENESS:
- Prompt upgraded: "be aggressive with valid_until for anything that
reads like 'current state' or 'this week'. When in doubt, 2-4 week
expiry rather than null."
- Stale facts decay automatically via Phase 3's expiry filter
CONFIDENCE GRADING (new in prompt):
- 0.9+: crystal clear durable fact or clear noise
- 0.75-0.9: confident but not cryptographic-certain
- 0.6-0.75: borderline — WILL escalate
- <0.6: genuinely ambiguous — human or discard
Tests: 356 → 366 (10 new, all in test_triage_escalation.py):
- High-confidence tier-1 promote/reject → no tier-2 call
- Low-confidence tier-1 → tier-2 escalates → decides
- needs_human always escalates regardless of confidence
- tier-2 uncertain → discard by default
- tier-2 uncertain → human when configured
- dry-run skips all API calls
- suggested_project flag surfaces + gets printed
- parse_verdict captures suggested_project
Runtime behavior unchanged for the clear cases (sonnet still handles
them). The 20-30% of candidates that currently land as needs_human
will now route through opus, and only the genuinely stuck get a human
(or discard) action.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| 3316ff99f9 |
feat: Phase 5F/5G/5H — graduation, conflicts, MCP engineering tools
The population move + the safety net + the universal consumer hookup,
all shipped together. This is where the engineering graph becomes
genuinely useful against the real 262-memory corpus.
5F: Memory → Entity graduation (THE population move)
- src/atocore/engineering/_graduation_prompt.py: stdlib-only shared
prompt module mirroring _llm_prompt.py pattern (container + host
use same system prompt, no drift)
- scripts/graduate_memories.py: host-side batch driver that asks
claude-p "does this memory describe a typed entity?" and creates
entity candidates with source_refs pointing back to the memory
- promote_entity() now scans source_refs for memory:* prefix; if
found, flips source memory to status='graduated' with
graduated_to_entity_id forward pointer + writes memory_audit row
- GET /admin/graduation/stats exposes graduation rate for dashboard
5G: Sync conflict detection on entity promote
- src/atocore/engineering/conflicts.py: detect_conflicts_for_entity()
runs on every active promote. V1 checks 3 slot kinds narrowly to
avoid false positives:
* component.material (multiple USES_MATERIAL edges)
* component.part_of (multiple PART_OF edges)
* requirement.name (duplicate active Requirements in same project)
- Conflicts + members persist via the tables built in 5A
- Fires a "warning" alert via Phase 4 framework
- Deduplicates: same (slot_kind, slot_key) won't get a new row
- resolve_conflict(action="dismiss|supersede_others|no_action"):
supersede_others marks non-winner members as status='superseded'
- GET /admin/conflicts + POST /admin/conflicts/{id}/resolve
5H: MCP + context pack integration
- scripts/atocore_mcp.py: 7 new engineering tools exposed to every
MCP-aware client (Claude Desktop, Claude Code, Cursor, Zed):
* atocore_engineering_map (Q-001/004 system tree)
* atocore_engineering_gaps (Q-006/009/011 killer queries — THE
director's question surfaced as a built-in tool)
* atocore_engineering_requirements_for_component (Q-005)
* atocore_engineering_decisions (Q-008)
* atocore_engineering_changes (Q-013 — reads entity audit log)
* atocore_engineering_impact (Q-016 BFS downstream)
* atocore_engineering_evidence (Q-017 inbound provenance)
- MCP tools total: 14 (7 memory/state/health + 7 engineering)
- context/builder.py _build_engineering_context now appends a compact
gaps summary ("Gaps: N orphan reqs, M risky decisions, K unsupported
claims") so every project-scoped LLM call sees "what we're missing"
Tests: 341 → 356 (15 new):
- 5F: graduation prompt parses positive/negative decisions, rejects
unknown entity types, tolerates markdown fences; promote_entity
marks source memory graduated with forward pointer; entity without
memory refs promotes cleanly
- 5G: component.material + component.part_of + requirement.name
conflicts detected; clean component triggers nothing; dedup works;
supersede_others resolution marks losers; dismiss leaves both
active; end-to-end promote triggers detection
- 5H: graduation user message includes project + type + content
No regressions across the 341 prior tests. The MCP server now answers
"which p05 requirements aren't satisfied?" directly from any Claude
session — no user prompt engineering, no context hacks.
Next to kick off from user: run graduation script on Dalidou to
populate the graph from 262 existing memories:
ssh papa@dalidou 'cd /srv/storage/atocore/app && PYTHONPATH=src \
python3 scripts/graduate_memories.py --project p05-interferometer --limit 30 --dry-run'
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| bb46e21c9b |
fix: integrity check runs in container (host lacks deps)
scripts/integrity_check.py now POSTs to /admin/integrity-check instead of importing atocore directly. The actual scan lives in the container where DB access + deps are available. Host-side cron just triggers and logs the result. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 88f2f7c4e1 |
feat: Phase 4 V1 — Robustness Hardening
Adds the observability + safety layer that turns AtoCore from
"works until something silently breaks" into "every mutation is
traceable, drift is detected, failures raise alerts."
1. Audit log (memory_audit table):
- New table with id, memory_id, action, actor, before/after JSON,
note, timestamp; 3 indexes for memory_id/timestamp/action
- _audit_memory() helper called from every mutation:
create_memory, update_memory, promote_memory,
reject_candidate_memory, invalidate_memory, supersede_memory,
reinforce_memory, auto_promote_reinforced, expire_stale_candidates
- Action verb auto-selected: promoted/rejected/invalidated/
superseded/updated based on state transition
- "actor" threaded through: api-http, human-triage, phase10-auto-
promote, candidate-expiry, reinforcement, etc.
- Fail-open: audit write failure logs but never breaks the mutation
- GET /memory/{id}/audit: full history for one memory
- GET /admin/audit/recent: last 50 mutations across the system
2. Alerts framework (src/atocore/observability/alerts.py):
- emit_alert(severity, title, message, context) fans out to:
- structlog logger (always)
- ~/atocore-logs/alerts.log append (configurable via
ATOCORE_ALERT_LOG)
- project_state atocore/alert/last_{severity} (dashboard surface)
- ATOCORE_ALERT_WEBHOOK POST if set (auto-detects Discord webhook
format for nice embeds; generic JSON otherwise)
- Every sink fail-open — one failure doesn't prevent the others
- Pipeline alert step in nightly cron: harness < 85% → warning;
candidate queue > 200 → warning
3. Integrity checks (scripts/integrity_check.py):
- Nightly scan for drift:
- Memories → missing source_chunk_id references
- Duplicate active memories (same type+content+project)
- project_state → missing projects
- Orphaned source_chunks (no parent document)
- Results persisted to atocore/status/integrity_check_result
- Any finding emits a warning alert
- Added as Step G in deploy/dalidou/batch-extract.sh nightly cron
4. Dashboard surfaces it all:
- integrity (findings + details)
- alerts (last info/warning/critical per severity)
- recent_audit (last 10 mutations with actor + action + preview)
Tests: 308 → 317 (9 new):
- test_audit_create_logs_entry
- test_audit_promote_logs_entry
- test_audit_reject_logs_entry
- test_audit_update_captures_before_after
- test_audit_reinforce_logs_entry
- test_recent_audit_returns_cross_memory_entries
- test_emit_alert_writes_log_file
- test_emit_alert_invalid_severity_falls_back_to_info
- test_emit_alert_fails_open_on_log_write_error
Deferred: formal migration framework with rollback (current additive
pattern is fine for V1); memory detail wiki page with audit view
(quick follow-up).
To enable Discord alerts: set ATOCORE_ALERT_WEBHOOK to a Discord
webhook URL in Dalidou's environment. Default = log-only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| bfa7dba4de |
feat: Phase 3 V1 — Auto-Organization (domain_tags + valid_until)
Adds structural metadata that the LLM triage was already implicitly
reasoning about ("stale snapshot" → reject). Phase 3 captures that
reasoning as fields so it can DRIVE retrieval, not just rejection.
Schema (src/atocore/models/database.py):
- domain_tags TEXT DEFAULT '[]' JSON array of lowercase topic keywords
- valid_until DATETIME ISO date; null = permanent
- idx_memories_valid_until index for efficient expiry queries
Memory service (src/atocore/memory/service.py):
- Memory dataclass gains domain_tags + valid_until
- create_memory, update_memory accept/persist both
- _row_to_memory safely reads both (JSON-decode + null handling)
- _normalize_tags helper: lowercase, dedup, strip, cap at 10
- get_memories_for_context filters expired (valid_until < today UTC)
- _rank_memories_for_query adds tag-boost: memories whose domain_tags
appear as substrings in query text rank higher (tertiary key after
content-overlap density + absolute overlap, before confidence)
LLM extractor (_llm_prompt.py → llm-0.5.0):
- SYSTEM_PROMPT documents domain_tags (2-5 keywords) + valid_until
(time-bounded facts get expiry dates; durable facts stay null)
- normalize_candidate_item parses both fields from model output with
graceful fallback for string/null/missing
LLM triage (scripts/auto_triage.py):
- TRIAGE_SYSTEM_PROMPT documents same two fields
- parse_verdict extracts them from verdict JSON
- On promote: PUT /memory/{id} with tags + valid_until BEFORE
POST /memory/{id}/promote, so active memories carry them
API (src/atocore/api/routes.py):
- MemoryCreateRequest: adds domain_tags, valid_until
- MemoryUpdateRequest: adds domain_tags, valid_until, memory_type
- GET /memory response exposes domain_tags + valid_until + created_at
Triage UI (src/atocore/engineering/triage_ui.py):
- Renders existing tags as colored badges
- Adds inline text field for tags (comma-separated) + date picker for
valid_until on every candidate card
- Save&Promote button persists edits via PUT then promotes
- Plain Promote (and Y shortcut) also saves tags/expiry if edited
Wiki (src/atocore/engineering/wiki.py):
- Search now matches memory content OR domain_tags
- Search results render tags as clickable badges linking to
/wiki/search?q=<tag> for cross-project navigation
- valid_until shown as amber "valid until YYYY-MM-DD" hint
Tests: 303 → 308 (5 new for Phase 3 behavior):
- test_create_memory_with_tags_and_valid_until
- test_create_memory_normalizes_tags
- test_update_memory_sets_tags_and_valid_until
- test_get_memories_for_context_excludes_expired
- test_context_builder_tag_boost_orders_results
Deferred (explicitly): temporal_scope enum, source_refs memory graph,
HDBSCAN clustering, memory detail wiki page, backfill of existing
actives. See docs/MASTER-BRAIN-PLAN.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| d8b370fd0a |
feat: /admin/triage web UI + auto-drain loop
Makes human triage sustainable. Before: command-line-only review,
auto-triage stopped after 100 candidates/run. Now:
1. Web UI at /admin/triage
- Lists all pending candidates with inline promote/reject/edit
- Edit content in-place before promoting (PUT /memory/{id})
- Change type via dropdown
- Keyboard shortcuts: Y=promote, N=reject, E=edit, S=scroll-next
- Cards fade out after action, queue count updates live
- Zero JS framework — vanilla fetch + event delegation
2. auto_triage.py drains queue
- Loops up to 20 batches (default) of 100 candidates each
- Tracks seen IDs so needs_human items don't reprocess
- Exits cleanly when queue empty
- Nightly cron naturally drains everything
3. Dashboard + wiki surface triage queue
- Dashboard /admin/dashboard: new "triage" section with pending
count + /admin/triage URL + warning/notice severity levels
- Wiki homepage: prominent callout "N candidates awaiting triage —
review now" linking to /admin/triage, styled with triage-warning
(>50) or triage-notice (>20) CSS
Pattern: human intervenes only when AI can't decide. The UI makes
that intervention fast (20 candidates in 60 seconds). Nightly
auto-triage drains the queue so the human queue stays bounded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| 86637f8eee |
feat: universal LLM consumption (Phase 1 complete)
Completes the Phase 1 master brain keystone: every LLM interaction
across the ecosystem now pulls context from AtoCore automatically.
Three adapters, one HTTP backend:
1. OpenClaw plugin pull (handler.js):
- Added before_prompt_build hook that calls /context/build and
injects the pack via prependContext
- Existing capture hooks (before_agent_start + llm_output)
unchanged
- 6s context timeout, fail-open on AtoCore unreachable
- Deployed to T420, gateway restarted, "7 plugins loaded"
2. atocore-proxy (scripts/atocore_proxy.py):
- Stdlib-only OpenAI-compatible HTTP middleware
- Drop-in layer for Codex, Ollama, LiteLLM, any OpenAI-compat client
- Intercepts /chat/completions: extracts query, pulls context,
injects as system message, forwards to upstream, captures back
- Fail-open: AtoCore down = passthrough without injection
- Configurable via env: UPSTREAM, PORT, CLIENT_LABEL, INJECT, CAPTURE
3. (from prior commit
|
|||
| c49363fccc |
feat: atocore-mcp server for universal LLM consumption (Phase 1)
Stdlib-only Python stdio MCP server that wraps the AtoCore HTTP API. Makes AtoCore available as built-in tools to every MCP-aware client (Claude Desktop, Claude Code, Cursor, Zed, Windsurf). 7 tools exposed: - atocore_context: full context pack (state + memories + chunks) - atocore_search: semantic retrieval with scores + sources - atocore_memory_list: filter active memories by project/type - atocore_memory_create: propose a candidate memory - atocore_project_state: query Trusted Project State by category - atocore_projects: list registered projects + aliases - atocore_health: service status check Design choices: - stdlib only (no mcp SDK dep) — AtoCore philosophy - Thin HTTP passthrough — zero business logic, zero drift risk - Fail-open: AtoCore unreachable returns graceful error, not crash - Protocol MCP 2024-11-05 compatible Registered in Claude Code: `claude mcp add atocore -- python ...` Verified: ✓ Connected, all 7 tools exposed, context/search/state return live data from Dalidou (sha=775960c8, vectors=33253). This is the keystone for master brain vision: every Claude session now has AtoCore available as built-in capability without the user or agent having to remember to invoke it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 33a6c61ca6 |
feat: daily backup to Windows main computer via pull-based scp
Third backup tier (after Dalidou local + T420 off-host): pull-based backup to the user's Windows main computer. - scripts/windows/atocore-backup-pull.ps1: PowerShell script using built-in OpenSSH scp. Fail-open: exits cleanly if Dalidou unreachable (e.g., laptop on the road). Pulls whole snapshots dir (~45MB, bounded by Dalidou's retention policy). - docs/windows-backup-setup.md: Task Scheduler setup (automated + manual). Runs daily 10:00 local, catches up missed days via StartWhenAvailable, retries 2x on failure. Verified: pulled 3 snapshots (45MB) to C:\Users\antoi\Documents\ATOCore_Backups\. Task "AtoCore Backup Pull" registered in Task Scheduler, State: Ready. Three independent backup tiers now: Dalidou local, T420 off-host, user Windows machine. Any two can fail without data loss. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 3011aa77da |
fix: retry + stderr capture + pacing in triage/extractor
Both scripts now: - Retry up to 3x with 2s/4s exponential backoff on transient failures (rate limits, capacity spikes) - Capture claude CLI stderr in the error message (200 char cap) instead of just the exit code — diagnostics actually useful now - Sleep 0.5s between calls to avoid bursting the backend Context: last batch run hit 100% failure in triage (every call exit 1) after 40% failure in extraction. claude CLI worked fine immediately after, so the failures were capacity/rate-limit transients. With retry + pacing these batches should complete cleanly now. 439 candidates are already in the queue waiting for triage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 775960c8c8 |
feat: "Make It Actually Useful" sprint — observability + Phase 10
Pipeline observability: - Retrieval harness runs nightly (Step E in batch-extract.sh) - Pipeline summary persisted to project state after each run (pipeline_last_run, pipeline_summary, retrieval_harness_result) - Dashboard enhanced: interaction total + by_client, pipeline health (last_run, hours_since, harness results, triage stats), dynamic project list from registry Phase 10 — reinforcement-based auto-promotion: - auto_promote_reinforced(): candidates with reference_count >= 3 and confidence >= 0.7 auto-graduate to active - expire_stale_candidates(): candidates unreinforced for 14+ days auto-rejected to prevent unbounded queue growth - Both wired into nightly cron (Step B2) - Batch script: scripts/auto_promote_reinforced.py (--dry-run support) Knowledge seeding: - scripts/seed_project_state.py: 26 curated Trusted Project State entries across p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space, atocore (decisions, requirements, facts, contacts, milestones) Tests: 299 → 303 (4 new Phase 10 tests) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 4d4d5f437a |
test(harness): fix p06-tailscale false positive, 18/18 PASS
The fixture's expect_absent: "GigaBIT" was catching legitimate semantic overlap, not retrieval bleed. The p06 ARCHITECTURE.md Overview describes the Polisher Suite as built for the GigaBIT M1 mirror — it is what the polisher is for, so the word appears correctly in p06 content. All retrieved sources for this prompt were genuinely p06/shared paths; zero actual p04 chunks leaked. Narrowed the assertion to expect_absent: "[Source: p04-gigabit/", which tests the real invariant (no p04 source chunks retrieved into p06 context) without the false positive. No retrieval/ranking code change. Fixture-only fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| c2e7064238 |
fix(extraction): R11 container 503 + R12 shared prompt module
R11: POST /admin/extract-batch with mode=llm now returns 503 when the claude CLI is unavailable (was silently returning success with 0 candidates), with a message pointing at the host-side script. +2 tests. R12: extracted SYSTEM_PROMPT + parse_llm_json_array + normalize_candidate_item + build_user_message into stdlib-only src/atocore/memory/_llm_prompt.py. Both the container extractor and scripts/batch_llm_extract_live.py now import from it, eliminating the prompt/parser drift risk. Tests 297 -> 299. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 58ea21df80 |
fix: triage prompt leniency for OpenClaw-curated imports (real this time)
Previous commit had the wrong message — the diff was the config persistence fix, not triage. This properly adds rule 4 to the triage prompt: when candidate content starts with 'From OpenClaw/', apply a much lower bar. OpenClaw's SOUL.md, USER.md, MEMORY.md, MODEL-ROUTING.md, and daily memory/*.md are already curated — promote unless clearly wrong or duplicate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 3db1dd99b5 |
fix: OpenClaw importer default path = /home/papa/clawd
The .openclaw/workspace-* dirs were empty templates. Antoine's real OpenClaw workspace is /home/papa/clawd with SOUL.md, USER.md, MEMORY.md, MODEL-ROUTING.md, IDENTITY.md, PROJECT_STATE.md and rich continuity subdirs (decisions/, lessons/, knowledge/, commitments/, preferences/, goals/, projects/, handoffs/, memory/). First real import: 10 candidates produced from 11 files scanned. MEMORY.md (36K chars) skipped as duplicate content; needs smarter section-level splitting in a follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 57b64523fb |
feat: OpenClaw state importer — one-way pull via SSH
scripts/import_openclaw_state.py reads the OpenClaw file continuity layer from clawdbot (T420) via SSH and imports candidate memories into AtoCore. Loose coupling: OpenClaw's internals don't need to change, AtoCore pulls from stable markdown files. Per codex's integration proposal (docs/openclaw-atocore-integration-proposal.md): Classification: - SOUL.md -> identity candidate - USER.md -> identity candidate - MODEL-ROUTING.md -> adaptation candidate (routing rules) - MEMORY.md -> memory candidate (long-term curated) - memory/YYYY-MM-DD.md -> episodic candidate (daily logs, last 7 days) - heartbeat-state.json -> skipped (ops metadata only, not canonical) Delta detection: SHA-256 hash per file stored in project_state under atocore/status/openclaw_import_hashes. Only changed files re-import. Hashes persist across runs so no wasted work. All imports land as status=candidate. Auto-triage filters. Nothing auto-promotes — the importer is a signal producer, the pipeline decides what graduates. Discord: deferred per codex's proposal — no durable local store in current OpenClaw snapshot. Revisit if OpenClaw exposes an export. Wired into cron-backup.sh as Step 3a (before vault refresh + extraction) so OpenClaw signals flow through the same pipeline. Gated on ATOCORE_OPENCLAW_IMPORT=true (default true). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 3f23ca1bc6 |
feat: signal-aggressive extraction + auto vault refresh in nightly cron
Extraction prompt rewritten for signal-aggressive mode. The old prompt
rewarded silence ("durable insight only, empty is correct") which
caused quiet failures — real project signal (Schott quotes arriving,
stakeholder events, blockers) was dropped as "not architectural enough".
New prompt explicitly lists what to emit:
1. Project activity (mentions with context — quote received, blocker,
action item)
2. Decisions and choices (architectural commitments, vendor selection)
3. Durable engineering insight (earned knowledge, generalizable)
4. Stakeholder and vendor events (emails sent, meetings scheduled)
5. Preferences and adaptations (how Antoine works)
Philosophy shift: "capture more signal, let triage filter noise"
replaces "extract only durable architectural facts". Auto-triage
already rejects noise well, so moving the filter downstream gives us
visibility into weak signals without polluting active memory.
Added 'episodic' to the candidate types list to support stakeholder
events with a timestamp feel.
LLM_EXTRACTOR_VERSION bumped to llm-0.4.0.
Also: cron-backup.sh now runs POST /ingest/sources before extraction
so new PKM files flow in automatically. Fail-open, non-blocking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| c1f5b3bdee |
feat: Karpathy-inspired upgrades — contradiction, lint, synthesis
Three additive upgrades borrowed from Karpathy's LLM Wiki pattern: 1. CONTRADICTION DETECTION: auto-triage now has a fourth verdict — "contradicts". When a candidate conflicts with an existing memory (not duplicates, genuine disagreement like "Option A selected" vs "Option B selected"), the triage model flags it and leaves it in the queue for human review instead of silently rejecting or double-storing. Preserves source tension rather than suppressing it. 2. WEEKLY LINT PASS: scripts/lint_knowledge_base.py checks for: - Orphan memories (active but zero references after 14 days) - Stale candidates (>7 days unreviewed) - Unused entities (no relationships) - Empty-state projects - Unregistered projects auto-detected in memories Runs Sundays via the cron. Outputs a report. 3. WEEKLY SYNTHESIS: scripts/synthesize_projects.py uses sonnet to generate a 3-5 sentence "current state" paragraph per project from state + memories + entities. Cached in project_state under status/synthesis_cache. Wiki project pages now show this at the top under "Current State (auto-synthesis)". Falls back to a deterministic summary if no cache exists. deploy/dalidou/batch-extract.sh: added Step C (synthesis) and Step D (lint) gated to Sundays via date check. All additive — nothing existing changes behavior. The database remains the source of truth; these operations just produce better synthesized views and catch rot. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| c57617f611 |
feat: auto-project-detection + project stages
Three changes: 1. ABB-Space registered as a lead project with stage=lead in Trusted Project State. Projects now have lifecycle awareness (lead/proposition vs active contract vs completed). 2. Extraction no longer drops unregistered project tags. When the LLM extractor sees a conversation about a project not in the registry, it keeps the model's tag on the candidate instead of falling back to empty. This enables auto-detection of new projects/leads from organic conversations. The nightly pipeline surfaces these candidates for triage, where the operator sees "hey, there's a new project called X" and can decide whether to register it. 3. Extraction prompt updated to tell the model: "If the conversation discusses a project NOT in the known list, still tag it — the system will auto-detect it." This removes the artificial ceiling that prevented new project discovery. Updated Case D test: unregistered + unscoped now keeps the model's tag instead of dropping to empty. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 3e0a357441 |
feat: bootstrap 35 engineering entities + relationships from project knowledge
Seeds the entity graph from existing project state, memories, and
vault docs across p04-gigabit (11 entities), p05-interferometer (10),
and p06-polisher (14). Covers systems, subsystems, components,
materials, decisions, requirements, constraints, vendors, and
parameters with structural and intent relationships.
Example: GET /entities/{M1 Mirror Assembly id} returns the full
context — 4 subsystems it contains, 2 requirements it's constrained
by, and the parent project — traversable in one API call.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| 9118f824fa |
feat: dual-layer knowledge extraction + domain knowledge band
The extraction system now produces two kinds of candidates from
the same conversation:
A. PROJECT-SPECIFIC: applied facts scoped to a named project
(unchanged behavior)
B. DOMAIN KNOWLEDGE: generalizable engineering insight earned
through project work, tagged with a domain (physics, materials,
optics, mechanics, manufacturing, metrology, controls, software,
math, finance) and stored with project="" so it surfaces across
all projects.
Critical quality bar enforced in the system prompt: "Would a
competent engineer need experience to know this, or could they
find it in 30 seconds on Google?" Textbook values, definitions,
and obvious facts are explicitly excluded. Only hard-won insight
qualifies — the kind that takes weeks of FEA or real machining
experience to discover.
Domain tags are embedded in the content as a prefix ("[physics]",
"[materials]") so they survive without a schema migration. A future
column can parse them out.
Context builder gains a new tier between project memories and
retrieved chunks:
Tier 1: Trusted Project State (project-specific)
Tier 2: Identity / Preferences (global)
Tier 3: Project Memories (project-specific)
Tier 4: Domain Knowledge (NEW) (cross-project, 10% budget)
Tier 5: Retrieved Chunks (project-boosted)
Trim order: chunks -> domain knowledge -> project memories ->
identity/preference -> project state.
Host-side extraction script updated with the same prompt and
domain-tag handling.
LLM_EXTRACTOR_VERSION bumped to llm-0.3.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| e5e9a9931e |
fix(R9): trust hierarchy for project attribution
Batch 3, Days 1-3. The core R9 failure was Case F: when the model returned a registered project DIFFERENT from the interaction's known scope, the old code trusted the model because the project was registered. A p06-polisher interaction could silently produce a p04-gigabit candidate. New rule (trust hierarchy): 1. Interaction scope always wins when set (cases A, C, E, F) 2. Model project used only for unscoped interactions AND only when it resolves to a registered project (cases D, G) 3. Empty string when both are empty or unregistered (case B) The rule is: interaction.project is the strongest signal because it comes from the capture hook's project detection, which runs before the LLM ever sees the content. The model's project guess is only useful when the capture hook had no project context. 7 case tests (A-G) cover every combination of model/interaction project state. Pre-existing tests updated for the new behavior. Host-side script mirrors the same hierarchy using _known_projects fetched from GET /projects at startup. Test count: 286 -> 290 (+4 net, 7 new R9 cases, 3 old tests consolidated). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 8951c624fe |
fix(R7/R9): overlap-density ranking + project trust-preservation
R7: ranking scorer now uses overlap-density (overlap_count / memory_token_count) as primary key instead of raw overlap count. A 5-token memory with 3 overlapping tokens (density 0.6) now beats a 40-token overview memory with 3 overlapping tokens (density 0.075) at the same absolute count. Secondary: absolute overlap. Tertiary: confidence. Targeting p06-firmware-interface harness fixture. R9: when the LLM extractor returns a project that differs from the interaction's known project, it now checks the project registry. If the model's project is a registered canonical ID, trust it. If not (hallucinated name), fall back to the interaction's project. Uses load_project_registry() for the check. The host-side script mirrors this via an API call to GET /projects at startup. Two new tests: test_parser_keeps_registered_model_project and test_parser_rejects_hallucinated_project. Test count: 280 -> 281. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 1a2ee5e07f |
feat: Day 3 — auto-triage via LLM second pass
scripts/auto_triage.py: fetches candidate memories, asks a triage model (claude -p, default sonnet) to classify each as promote / reject / needs_human, and executes the verdict via the API. Trust model: - Auto-promote: model says promote AND confidence >= 0.8 AND dedup-checked against existing active memories for the project - Auto-reject: model says reject - needs_human: everything else stays in queue for manual review The triage model receives both the candidate content AND a summary of existing active memories for the same project, so it can detect duplicates and near-duplicates. The system prompt explicitly lists the rejection categories learned from the first two manual triage passes (stale snapshots, impl details, planned-not-implemented, process rules that belong in ledger not memory). deploy/dalidou/batch-extract.sh now runs extraction (Step A) then auto-triage (Step B) in sequence. The nightly cron at 03:00 UTC will run the full pipeline: backup → cleanup → rsync → extract → triage. Only needs_human candidates reach the human. Supports --dry-run for preview without executing. Supports --model override for multi-model triage (e.g. opus for higher-quality review, or a future Gemini/Ollama backend). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| ac7f77d86d |
fix: remove --no-session-persistence (unsupported on claude 2.0.60)
Dalidou runs Claude Code 2.0.60 which does not have this flag (added in 2.1.x). Removed from both extractor_llm.py and the host-side batch script. --append-system-prompt and --disable-slash-commands are supported on 2.0.60. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| 719ff649a8 |
fix: fetch full interaction body per-id (list endpoint omits response)
GET /interactions returns response_chars but not the response body
to keep the listing lightweight. The batch extractor now lists ids
first, then fetches each interaction individually via
GET /interactions/{id} to get the full response for LLM extraction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| 8af8af90d0 |
fix: pure-stdlib host-side extraction script (no atocore imports)
The host Python on Dalidou lacks pydantic_settings and other container-only deps. Refactored batch_llm_extract_live.py to be a standalone HTTP client + subprocess wrapper using only stdlib. Duplicates the system prompt and JSON parser from extractor_llm.py rather than importing them — acceptable duplication since this is a deployment adapter, not a library. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| cd0fd390a8 |
fix: host-side LLM extraction (claude CLI not in container)
The claude CLI is installed on the Dalidou HOST but not inside
the Docker container. The /admin/extract-batch API endpoint with
mode=llm silently returned 0 candidates because
shutil.which('claude') was None inside the container.
Fix: extraction runs host-side via deploy/dalidou/batch-extract.sh
which calls scripts/batch_llm_extract_live.py with the host's
PYTHONPATH pointing at the repo's src/. The script:
- Fetches interactions from the API (GET /interactions?since=...)
- Runs extract_candidates_llm() locally (host has claude CLI)
- POSTs candidates back to the API (POST /memory, status=candidate)
- Tracks last-run timestamp via project state
The cron now calls the host-side script instead of the container
API endpoint for LLM mode. Rule-mode extraction in the container
still works via /admin/extract-batch.
The API endpoint retains the mode=llm option for environments
where claude IS inside the container (future Docker image with
claude CLI, or a different deployment model).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| 3921c5ffc7 |
test: Day 6 — retrieval harness expanded from 6 to 18 fixtures
Added 12 new fixtures across all three active projects:
- p04: 1 short/ambiguous case ('current status')
- p05: 1 CGH calibration case with cross-project bleed guard
- p06: 7 new fixtures targeting triage-promoted memories
(firmware interface, z-axis, cam encoder, telemetry rate,
offline design, USB SSD, Tailscale)
- Adversarial: cross-project-no-bleed (p04 query must not surface
p06 telemetry rate), no-project-hint (project memories must not
appear without a hint)
First run: 14/18 passing.
4 failures (p06-firmware-interface, p06-z-axis, p06-offline-design,
p06-tailscale) share the same root cause: long pre-existing p06
memories (530+ chars, confidence 0.9+) fill the 25% project-memory
budget before the query-relevant newly-promoted memories (shorter,
confidence 0.5) get a slot. Budget contention at equal overlap
score tiebroken by confidence. Day 7 ranking tweak target.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| 06792d862e |
feat: first live triage — 16 promoted, 35 rejected from LLM extraction
First end-to-end triage pass on 51 LLM-extracted candidates from
the Day 4 baseline run (extractor_llm via claude -p haiku against
a 20-interaction frozen snapshot).
Results:
- Promoted 16 memories (31% accept rate):
* p06-polisher: 9 (USB SSD, Tailscale, 10 Hz telemetry,
controller-job.v1 invariant, offline-first, z-axis engage/
retract, cam encoder read-only, spec separation)
* atocore: 7 (extraction off hot path, DEV-LEDGER adopted,
codex branching rule, Claude builds/Codex audits, alias
canonicalization, Stop hook capture, passive capture)
- Rejected 35 (stale roadmap items, duplicates with wrong project
tags, already-fixed P1 findings, process rules that live in
DEV-LEDGER/AGENTS.md not in memory, too-granular implementation
details, operational instructions)
Active memory count: 20 → 36. p06-polisher went from 2 to 16.
Candidate queue: 0.
The triage verdict is saved at
scripts/eval_data/triage_verdict_2026-04-12.json for audit.
persist_llm_candidates.py used to push candidates to Dalidou.
POST /memory now accepts a 'status' field (default 'active') so
external scripts can create candidate memories directly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| 3a7e8ccba4 |
feat: expose status field on POST /memory + persist_llm_candidates script
The API endpoint now passes the request's status field through to create_memory() so external scripts can create candidate memories directly without going through the extract endpoint. Default remains 'active' for backward compatibility. persist_llm_candidates.py reads a saved extractor eval baseline JSON (e.g. the Day 4 LLM run) and POSTs each candidate to Dalidou with status=candidate. Safe to re-run — duplicate content returns 400 which the script counts as 'skipped'. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
| a29b5e22f2 |
feat(eval-loop): Day 4 — LLM extractor via claude -p (OAuth, no API key)
Second pass on the LLM-assisted extractor after Antoine's explicit
rule: no API key, ever. Refactored src/atocore/memory/extractor_llm.py
to shell out to the Claude Code 'claude -p' CLI via subprocess instead
of the anthropic SDK, so extraction reuses the user's existing Claude.ai
OAuth credentials and needs zero secret management.
Implementation:
- subprocess.run(["claude", "-p", "--model", "haiku",
"--append-system-prompt", <instructions>,
"--no-session-persistence", "--disable-slash-commands",
user_message], ...)
- cwd is a cached tempfile.mkdtemp() so every invocation starts with
a clean context instead of auto-discovering CLAUDE.md / AGENTS.md /
DEV-LEDGER.md from the repo root. We cannot use --bare because it
forces API-key auth, which defeats the purpose; the temp-cwd trick
is the lightest way to keep OAuth auth while skipping project
context loading.
- Silent-failure contract unchanged: missing CLI, non-zero exit,
timeout, malformed JSON — all return [] and log an error. The
capture audit trail must not break on an optional side effect.
- Default timeout bumped from 20s to 90s: Haiku + Node.js startup
+ OAuth check is ~20-40s per call in practice, plus real responses
up to 8KB take longer. 45s hit 2 timeouts on the first live run.
- tests/test_extractor_llm.py refactored: the API-key / anthropic SDK
tests are replaced by subprocess-mocking tests covering missing
CLI, timeout, non-zero exit, and a happy-path stdout parse. 14
tests, all green.
scripts/extractor_eval.py:
- New --output <path> flag writes the JSON result directly to a file,
bypassing stdout/log interleaving (structlog sends INFO to stdout
via PrintLoggerFactory, so a naive '> out.json' pollutes the file).
- Forces UTF-8 on stdout so real LLM output with em-dashes / arrows /
CJK doesn't crash the human report on Windows cp1252 consoles.
First live baseline run against the 20-interaction labeled corpus
(scripts/eval_data/extractor_llm_baseline_2026-04-11.json):
mode=llm labeled=20 recall=1.0 precision=0.357 yield_rate=2.55
total_actual_candidates=51 total_expected_candidates=7
false_negative_interactions=0 false_positive_interactions=9
Recall 0% -> 100% vs rule baseline — every human-labeled positive is
caught. Precision reads low (0.357) but inspection shows the "false
positives" are real candidates the human labels under-counted. For
example interaction a6b0d279 was labeled at 2 expected candidates,
the model caught all 6 polisher architectural facts; interaction
52c8c0f3 was labeled at 1, the model caught all 5 infra commitments.
The labels are the bottleneck, not the model.
Day 4 gate against Codex's criteria:
- candidate yield: 255% vs ≥15-25% target
- FP rate tolerable for manual triage: 51 candidates reviewable in
~10 minutes via the triage CLI
- ≥2 real non-synthetic candidates worth review: 20+ obvious wins
(polisher architecture set, p05 infra set, DEV-LEDGER protocol set)
Gate cleared. LLM-assisted extraction is the path forward for
conversational captures. Rule-based extractor stays as-is for
structured-cue inputs and remains the default mode. The next step
(Day 5 stabilize / document) will wire LLM mode behind a flag in
the public extraction endpoint and document scope.
Test count: 276 -> 278 passing. No existing tests changed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| b309e7fd49 |
feat(eval-loop): Day 4 — LLM-assisted extractor path (additive, flagged)
Day 2 baseline showed 0% recall for the rule-based extractor across
5 distinct miss classes. Day 4 decision gate: prototype an
LLM-assisted mode behind a flag. Option A ratified by Antoine.
New module src/atocore/memory/extractor_llm.py:
- extract_candidates_llm(interaction) returns the same MemoryCandidate
dataclass the rule extractor produces, so both paths flow through
the existing triage / candidate pipeline unchanged.
- extract_candidates_llm_verbose() also returns the raw model output
and any error string, for eval and debugging.
- Uses Claude Haiku 4.5 by default; model overridable via
ATOCORE_LLM_EXTRACTOR_MODEL env. Timeout via
ATOCORE_LLM_EXTRACTOR_TIMEOUT_S (default 20s).
- Silent-failure contract: missing API key, unreachable model,
malformed JSON — all return [] and log an error. Never raises
into the caller. The capture audit trail must not break on an
optional side effect.
- Parser tolerates markdown fences, surrounding prose, invalid
memory types, clamps confidence to [0,1], drops empty content.
- System prompt explicitly tells the model to return [] for most
conversational turns (durable-fact bar, not "extract everything").
- Trust rules unchanged: candidates are never auto-promoted,
extraction stays off the capture hot path, human triages via the
existing CLI.
scripts/extractor_eval.py: new --mode {rule,llm} flag so the same
labeled corpus can be scored against both extractors. Default
remains rule so existing invocations are unchanged.
tests/test_extractor_llm.py: 12 new unit tests covering the parser
(empty array, malformed JSON, markdown fences, surrounding prose,
invalid types, empty content, confidence clamping, version tagging),
plus contract tests for missing API key, empty response, and a
mocked api_error path so failure modes never raise.
Test count: 264 -> 276 passing. No existing tests changed.
Next step: run `python scripts/extractor_eval.py --mode llm` against
the labeled set with ANTHROPIC_API_KEY in env, record the delta,
decide whether to wire LLM mode into the API endpoint and CLI or
keep it script-only for now.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| 7d8d599030 |
feat(eval-loop): Day 1+2 — labeled extractor corpus + baseline scorecard
Day 1 (labeled corpus):
- scripts/eval_data/interactions_snapshot_2026-04-11.json — frozen
snapshot of 64 real claude-code interactions pulled from live
Dalidou (test-client captures filtered out). This is the stable
corpus the whole mini-phase labels against, independent of future
captures.
- scripts/eval_data/extractor_labels_2026-04-11.json — 20 hand-labeled
interactions drawn by length-stratified random sample. Positives:
5/20 = ~25%, total expected candidates: 7. Plan deviation: Codex's
plan asked for 30 (10/10/10 buckets); the real corpus is heavily
skewed toward instructional/status content, so honest labeling of
20 already crosses the fail-early threshold of "at least 5 plausible
positives" without padding.
Day 2 (baseline measurement):
- scripts/extractor_eval.py — file-based eval runner that loads the
snapshot + labels, runs extract_candidates_from_interaction on each,
and reports yield / recall / precision / miss-class breakdown.
Returns exit 1 on any false positive or false negative.
Current rule extractor against the labeled set:
labeled=20 exact_match=15 positive_expected=5
yield=0.0 recall=0.0 precision=0.0
false_negatives=5 false_positives=0
miss_classes:
recommendation_prose
architectural_change_summary
spec_update_announcement
layered_recommendation
alignment_assertion
Interpretation: the rule-based extractor matches exactly zero of the
5 plausible positive interactions in the labeled set, and the misses
are spread across 5 distinct cue classes with no single dominant
pattern. This is the Day 4 hard-stop signal landing on Day 2 — a
single rule expansion cannot close a 5-way miss, and widening rules
blindly will collapse precision. The right move is to go straight to
the Day 4 decision gate and consider LLM-assisted extraction.
Escalating to DEV-LEDGER.md as R5 for human ratification before
continuing. Not skipping Day 3 silently.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| 30ee857d62 |
test: loosen p05-configuration fixture cross-project check
The fixture asserted 'GigaBIT M1' must not appear in a p05 pack,
but GigaBIT M1 is the mirror the interferometer measures, so its
name legitimately shows up in p05 source docs (CGH test setup
diagrams, AOM design input, etc.). Flagging it as bleed was false
positive.
Replace the assertion with genuinely p04-only material: the
'Option B' / 'conical back' architecture decision and a p06 tag,
neither of which has any reason to appear in a p05 configuration
answer.
Harness now passes 6/6 against live Dalidou at
|
|||
| 4da81c9e4e |
feat: retrieval eval harness + doc sync
scripts/retrieval_eval.py walks a fixture file of project-hinted
questions, runs each against POST /context/build, and scores the
returned formatted_context against per-fixture expect_present and
expect_absent substring checklists. Exit 0 on all-pass, 1 on any
miss. Human-readable by default, --json for automation.
First live run against Dalidou at SHA
|
|||
| 9366ba7879 |
feat: length-aware reinforcement + batch triage CLI + off-host backup
- Reinforcement matcher now handles paragraph-length memories via a
dual-mode threshold: short memories keep the 70% overlap rule,
long memories (>15 stems) require 12 absolute overlaps AND 35%
fraction so organic paraphrase can still reinforce. Diagnosis:
every active memory stayed at reference_count=0 because 40-token
project summaries never hit 70% overlap on real responses.
- scripts/atocore_client.py gains batch-extract (fan out
/interactions/{id}/extract over recent interactions) and triage
(interactive promote/reject walker for the candidate queue),
matching the Phase 9 reflection-loop review flow without pulling
extraction into the capture hot path.
- deploy/dalidou/cron-backup.sh adds an optional off-host rsync step
gated on ATOCORE_BACKUP_RSYNC, fail-open when the target is offline
so a laptop being off at 03:00 UTC never reds the local backup.
- docs/next-steps.md records the retrieval-quality sweep: project
state surfaces, chunks are on-topic but broad, active memories
never reach the pack (reflection loop has no retrieval outlet yet).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
| b492f5f7b0 |
fix: schema init ordering, deploy.sh default, client BASE_URL docs
Three issues Dalidou Claude surfaced during the first real deploy
of commit
|
|||
| fad30d5461 |
feat(client): Phase 9 reflection loop surface in shared operator CLI
Codex's sequence step 3: finish the Phase 9 operator surface in the
shared client. The previous client version (0.1.0) covered stable
operations (project lifecycle, retrieval, context build, trusted
state, audit-query) but explicitly deferred capture/extract/queue/
promote/reject pending "exercised workflow". That deferral ran
into a bootstrap problem: real Claude Code sessions can't exercise
the Phase 9 loop without a usable client surface to drive it. This
commit ships the 8 missing subcommands so the next step (real
validation on Dalidou) is unblocked.
Bumps CLIENT_VERSION from 0.1.0 to 0.2.0 per the semver rules in
llm-client-integration.md (new subcommands = minor bump).
New subcommands in scripts/atocore_client.py
--------------------------------------------
| Subcommand | Endpoint |
|-----------------------|-------------------------------------------|
| capture | POST /interactions |
| extract | POST /interactions/{id}/extract |
| reinforce-interaction | POST /interactions/{id}/reinforce |
| list-interactions | GET /interactions |
| get-interaction | GET /interactions/{id} |
| queue | GET /memory?status=candidate |
| promote | POST /memory/{id}/promote |
| reject | POST /memory/{id}/reject |
Each follows the existing client style: positional arguments with
empty-string defaults for optional filters, truthy-string arguments
for booleans (matching the existing refresh-project pattern), JSON
output via print_json(), fail-open behavior inherited from
request().
capture accepts prompt + response + project + client + session_id +
reinforce as positionals, defaulting the client field to
"atocore-client" when omitted so every capture from the shared
client is identifiable in the interactions audit trail.
extract defaults to preview mode (persist=false). Pass "true" as
the second positional to create candidate memories.
list-interactions and queue build URL query strings with
url-encoded values and always include the limit, matching how the
existing context-build subcommand handles its parameters.
Security fix: ID-field URL encoding
-----------------------------------
The initial draft used urllib.parse.quote() with the default safe
set, which does NOT encode "/" because it's a reserved path
character. That's a security footgun on ID fields: passing
"promote mem/evil/action" would build /memory/mem/evil/action/promote
and hit a completely different endpoint than intended.
Fixed by passing safe="" to urllib.parse.quote() on every ID field
(interaction_id and memory_id). The tests cover this explicitly via
test_extract_url_encodes_interaction_id and test_promote_url_encodes_memory_id,
both of which would have failed with the default behavior.
Project names keep the default quote behavior because a project
name with a slash would already be broken elsewhere in the system
(ingest root resolution, file paths, etc).
tests/test_atocore_client.py (new, 18 tests, all green)
-------------------------------------------------------
A dedicated test file for the shared client that mocks the
request() helper and verifies each subcommand:
- calls the correct HTTP method and path
- builds the correct JSON body (or query string)
- passes the right subset of CLI arguments through
- URL-encodes ID fields so path traversal isn't possible
Tests are structured as unit tests (not integration tests) because
the API surface on the server side already has its own route tests
in test_api_storage.py and the Phase 9 specific files. These tests
are the wiring contract between CLI args and HTTP calls.
Test file highlights:
- capture: default values, custom client, reinforce=false
- extract: preview by default, persist=true opt-in, URL encoding
- reinforce-interaction: correct path construction
- list-interactions: no filters, single filter, full filter set
(including ISO 8601 since parameter with T separator and Z)
- get-interaction: fetch by id
- queue: always filters status=candidate, accepts memory_type
and project, coerces limit to int
- promote / reject: correct path + URL encoding
- test_phase9_full_loop_via_client_shape: end-to-end sequence
that drives capture -> extract preview -> extract persist ->
queue list -> promote -> reject through the shared client and
verifies the exact sequence of HTTP calls that would be made
These tests run in ~0.2s because they mock request() — no DB, no
Chroma, no HTTP. The fast feedback loop matters because the
client surface is what every agent integration eventually depends
on.
docs/architecture/llm-client-integration.md updates
---------------------------------------------------
- New "Phase 9 reflection loop (shipped after migration safety
work)" section under "What's in scope for the shared client
today" with the full 8-subcommand table and a note explaining
the bootstrap-problem rationale
- Removed the "Memory review queue and reflection loop" section
from "What's intentionally NOT in scope today"; backup admin
and engineering-entity commands remain the only deferred
families
- Renumbered the deferred-commands list (was 3 items, now 2)
- Open follow-ups updated: memory-review-subcommand item replaced
with "real-usage validation of the Phase 9 loop" as the next
concrete dependency
- TL;DR updated to list the reflection-loop subcommands
- Versioning note records the v0.1.0 -> v0.2.0 bump with the
subcommands included
Full suite: 215 passing (was 197), 1 warning. The +18 is
tests/test_atocore_client.py. Runtime unchanged because the new
tests don't touch the DB.
What this commit does NOT do
----------------------------
- Does NOT change the server-side endpoints. All 8 subcommands
call existing API routes that were shipped in Phase 9 Commits
A/B/C. This is purely a client-side wiring commit.
- Does NOT run the reflection loop against the live Dalidou
instance. That's the next concrete step and is explicitly
called out in the open-follow-ups section of the updated doc.
- Does NOT modify the Claude Code slash command. It still pulls
context only; the capture/extract/queue/promote companion
commands (e.g. /atocore-record-response) are deferred until the
capture workflow has been exercised in real use at least once.
- Does NOT refactor the OpenClaw helper. That's a cross-repo
change and remains a queued follow-up, now unblocked by the
shared client having the reflection-loop subcommands.
|
|||
| 261277fd51 |
fix(migration): preserve superseded/invalid shadow state during rekey
Codex caught a real data-loss bug in the legacy alias migration
shipped in
|
|||
| 7e60f5a0e6 |
feat(ops): legacy alias migration script with dry-run/apply modes
Closes the compatibility gap documented in docs/architecture/project-identity-canonicalization.md. Before |
|||
| d0ff8b5738 |
Merge origin/main into codex/dalidou-storage-foundation
Integrate codex/port-atocore-ops-client (operator client + operations playbook) so the dalidou-storage-foundation branch can fast-forward into main. # Conflicts: # README.md |
|||
| b9da5b6d84 |
phase9 first-real-use validation + small hygiene wins
Session 1 of the four-session plan. Empirically exercises the Phase 9
loop (capture -> reinforce -> extract) for the first time and lands
three small hygiene fixes.
Validation script + report
--------------------------
scripts/phase9_first_real_use.py — reproducible script that:
- sets up an isolated SQLite + Chroma store under
data/validation/phase9-first-use (gitignored)
- seeds 3 active memories
- runs 8 sample interactions through capture + reinforce + extract
- prints what each step produced and reinforcement state at the end
- supports --json output for downstream tooling
docs/phase9-first-real-use.md — narrative report of the run with:
- extraction results table (8/8 expectations met exactly)
- the empirical finding that REINFORCEMENT MATCHED ZERO seeds
despite sample 5 clearly echoing the rebase preference memory
- root cause analysis: the substring matcher is too brittle for
natural paraphrases (e.g. "prefers" vs "I prefer", "history"
vs "the history")
- recommended fix: replace substring matcher with a token-overlap
matcher (>=70% of memory tokens present in response, with
light stemming and a small stop list)
- explicit note that the fix is queued as a follow-up commit, not
bundled into the report — keeps the audit trail clean
Key extraction results from the run:
- all 7 heading/sentence rules fired correctly
- 0 false positives on the prose-only sample (the most important
sanity check)
- long content preserved without truncation
- dedup correctly kept three distinct cues from one interaction
- project scoping flowed cleanly through the pipeline
Hygiene 1: FastAPI lifespan migration (src/atocore/main.py)
- Replaced @app.on_event("startup") with the modern @asynccontextmanager
lifespan handler
- Same setup work (setup_logging, ensure_runtime_dirs, init_db,
init_project_state_schema, startup_ready log)
- Removes the two on_event deprecation warnings from every test run
- Test suite now shows 1 warning instead of 3
Hygiene 2: EXTRACTOR_VERSION constant (src/atocore/memory/extractor.py)
- Added EXTRACTOR_VERSION = "0.1.0" with a versioned change log comment
- MemoryCandidate dataclass carries extractor_version on every candidate
- POST /interactions/{id}/extract response now includes extractor_version
on both the top level (current run) and on each candidate
- Implements the versioning requirement called out in
docs/architecture/promotion-rules.md so old candidates can be
identified and re-evaluated when the rule set evolves
Hygiene 3: ~/.git-credentials cleanup (out-of-tree, not committed)
- Removed the dead OAUTH_USER:<jwt> line for dalidou:3000 that was
being silently rewritten by the system credential manager on every
push attempt
- Configured credential.http://dalidou:3000.helper with the empty-string
sentinel pattern so the URL-specific helper chain is exactly
["", store] instead of inheriting the system-level "manager" helper
that ships with Git for Windows
- Same fix for the 100.80.199.40 (Tailscale) entry
- Verified end to end: a fresh push using only the cleaned credentials
file (no embedded URL) authenticates as Antoine and lands cleanly
Full suite: 160 passing (no change from previous), 1 warning
(was 3) thanks to the lifespan migration.
|
|||
| ceb129c7d1 | Add operator client and operations playbook | |||
| b4afbbb53a |
feat: implement AtoCore Phase 0 + Phase 0.5 (foundation + PoC)
Complete implementation of the personal context engine foundation: - FastAPI server with 5 endpoints (ingest, query, context/build, health, debug) - SQLite database with 5 tables (documents, chunks, memories, projects, interactions) - Heading-aware markdown chunker (800 char max, recursive splitting) - Multilingual embeddings via sentence-transformers (EN/FR) - ChromaDB vector store with cosine similarity retrieval - Context builder with project boosting, dedup, and budget enforcement - CLI scripts for batch ingestion and test prompt evaluation - 19 unit tests passing, 79% coverage - Validated on 482 real project files (8383 chunks, 0 errors) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |