ATOCore

Author	SHA1	Message	Date
Anto01	8c0f1ff6f3	fix: triage is lenient on OpenClaw-curated content Auto-triage was rejecting 8 of 10 OpenClaw imports as 'session log' or 'process rule belongs elsewhere'. But OpenClaw's SOUL.md, USER.md, MEMORY.md and daily memory/*.md files are already curated — they ARE the canonical continuity layer we want to absorb. Applying the conservative LLM-conversation triage bar to them discards the signal the importer was designed to capture. Triage prompt now has a rule 4: when candidate content starts with 'From OpenClaw/' apply a much lower bar. Session events, project updates, stakeholder notes, and decisions from daily memory files should promote, not reject. The ABB-Space Schott quote that DID promote was the lucky exception — after this fix, the other 7 daily notes (CDR execution log, Discord migration plan, isogrid research, etc.) will promote too. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:54:17 -04:00
Anto01	3db1dd99b5	fix: OpenClaw importer default path = /home/papa/clawd The .openclaw/workspace-* dirs were empty templates. Antoine's real OpenClaw workspace is /home/papa/clawd with SOUL.md, USER.md, MEMORY.md, MODEL-ROUTING.md, IDENTITY.md, PROJECT_STATE.md and rich continuity subdirs (decisions/, lessons/, knowledge/, commitments/, preferences/, goals/, projects/, handoffs/, memory/). First real import: 10 candidates produced from 11 files scanned. MEMORY.md (36K chars) skipped as duplicate content; needs smarter section-level splitting in a follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:41:49 -04:00
Anto01	57b64523fb	feat: OpenClaw state importer — one-way pull via SSH scripts/import_openclaw_state.py reads the OpenClaw file continuity layer from clawdbot (T420) via SSH and imports candidate memories into AtoCore. Loose coupling: OpenClaw's internals don't need to change, AtoCore pulls from stable markdown files. Per codex's integration proposal (docs/openclaw-atocore-integration-proposal.md): Classification: - SOUL.md -> identity candidate - USER.md -> identity candidate - MODEL-ROUTING.md -> adaptation candidate (routing rules) - MEMORY.md -> memory candidate (long-term curated) - memory/YYYY-MM-DD.md -> episodic candidate (daily logs, last 7 days) - heartbeat-state.json -> skipped (ops metadata only, not canonical) Delta detection: SHA-256 hash per file stored in project_state under atocore/status/openclaw_import_hashes. Only changed files re-import. Hashes persist across runs so no wasted work. All imports land as status=candidate. Auto-triage filters. Nothing auto-promotes — the importer is a signal producer, the pipeline decides what graduates. Discord: deferred per codex's proposal — no durable local store in current OpenClaw snapshot. Revisit if OpenClaw exposes an export. Wired into cron-backup.sh as Step 3a (before vault refresh + extraction) so OpenClaw signals flow through the same pipeline. Gated on ATOCORE_OPENCLAW_IMPORT=true (default true). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:39:27 -04:00
Anto01	a13ea3b9d1	docs: propose OpenClaw one-way pull integration	2026-04-14 10:34:15 -04:00
Anto01	3f23ca1bc6	feat: signal-aggressive extraction + auto vault refresh in nightly cron Extraction prompt rewritten for signal-aggressive mode. The old prompt rewarded silence ("durable insight only, empty is correct") which caused quiet failures — real project signal (Schott quotes arriving, stakeholder events, blockers) was dropped as "not architectural enough". New prompt explicitly lists what to emit: 1. Project activity (mentions with context — quote received, blocker, action item) 2. Decisions and choices (architectural commitments, vendor selection) 3. Durable engineering insight (earned knowledge, generalizable) 4. Stakeholder and vendor events (emails sent, meetings scheduled) 5. Preferences and adaptations (how Antoine works) Philosophy shift: "capture more signal, let triage filter noise" replaces "extract only durable architectural facts". Auto-triage already rejects noise well, so moving the filter downstream gives us visibility into weak signals without polluting active memory. Added 'episodic' to the candidate types list to support stakeholder events with a timestamp feel. LLM_EXTRACTOR_VERSION bumped to llm-0.4.0. Also: cron-backup.sh now runs POST /ingest/sources before extraction so new PKM files flow in automatically. Fail-open, non-blocking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:24:50 -04:00
Anto01	c1f5b3bdee	feat: Karpathy-inspired upgrades — contradiction, lint, synthesis Three additive upgrades borrowed from Karpathy's LLM Wiki pattern: 1. CONTRADICTION DETECTION: auto-triage now has a fourth verdict — "contradicts". When a candidate conflicts with an existing memory (not duplicates, genuine disagreement like "Option A selected" vs "Option B selected"), the triage model flags it and leaves it in the queue for human review instead of silently rejecting or double-storing. Preserves source tension rather than suppressing it. 2. WEEKLY LINT PASS: scripts/lint_knowledge_base.py checks for: - Orphan memories (active but zero references after 14 days) - Stale candidates (>7 days unreviewed) - Unused entities (no relationships) - Empty-state projects - Unregistered projects auto-detected in memories Runs Sundays via the cron. Outputs a report. 3. WEEKLY SYNTHESIS: scripts/synthesize_projects.py uses sonnet to generate a 3-5 sentence "current state" paragraph per project from state + memories + entities. Cached in project_state under status/synthesis_cache. Wiki project pages now show this at the top under "Current State (auto-synthesis)". Falls back to a deterministic summary if no cache exists. deploy/dalidou/batch-extract.sh: added Step C (synthesis) and Step D (lint) gated to Sundays via date check. All additive — nothing existing changes behavior. The database remains the source of truth; these operations just produce better synthesized views and catch rot. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 21:08:13 -04:00
Anto01	761c483474	feat: wiki homepage groups projects by stage Projects now appear under three buckets based on their state entries: - Active Contracts - Leads & Prospects - Internal Tools & Infra Each card shows the stage as a tag on the project title, the client as an italic subtitle, and the project description. Empty buckets hide. Makes it obvious at a glance what's contracted vs lead vs internal. Paired with stage/type/client state entries added to all 6 projects so the grouping has data to work with. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 18:47:44 -04:00
Anto01	c57617f611	feat: auto-project-detection + project stages Three changes: 1. ABB-Space registered as a lead project with stage=lead in Trusted Project State. Projects now have lifecycle awareness (lead/proposition vs active contract vs completed). 2. Extraction no longer drops unregistered project tags. When the LLM extractor sees a conversation about a project not in the registry, it keeps the model's tag on the candidate instead of falling back to empty. This enables auto-detection of new projects/leads from organic conversations. The nightly pipeline surfaces these candidates for triage, where the operator sees "hey, there's a new project called X" and can decide whether to register it. 3. Extraction prompt updated to tell the model: "If the conversation discusses a project NOT in the known list, still tag it — the system will auto-detect it." This removes the artificial ceiling that prevented new project discovery. Updated Case D test: unregistered + unscoped now keeps the model's tag instead of dropping to empty. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:16:04 -04:00
Anto01	3f18ba3b35	feat: AtoCore Wiki — navigable project knowledge browser Full wiki interface at /wiki with: - /wiki — Homepage with project cards, search box, system stats - /wiki/projects/{name} — Project page with clickable entity links - /wiki/entities/{id} — Entity detail with relationships as links - /wiki/search?q=... — Search across entities and memories Every entity name in a project page links to its detail page. Entity detail pages show properties, relationships as clickable links to related entities, and breadcrumb navigation back to the project and wiki home. Responsive, dark-mode, mobile-friendly. Card grid for projects. Generated on-demand from the database — always current, no static files, source of truth is the DB. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:09:12 -04:00
Anto01	8527c369ee	fix: add markdown to pyproject.toml (container pip install reads this, not requirements.txt) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 15:37:22 -04:00
Anto01	bd3dc50100	feat: HTML mirror pages — readable project dashboards in browser GET /projects/{name}/mirror.html serves a styled HTML page rendered from the mirror markdown. Clean typography, responsive, dark mode support, mobile-friendly. Open from phone or desktop: http://dalidou:8100/projects/p04-gigabit/mirror.html http://dalidou:8100/projects/p05-interferometer/mirror.html http://dalidou:8100/projects/p06-polisher/mirror.html Uses the markdown library for md→html conversion. Added to requirements.txt. The JSON endpoint (/mirror) still exists for programmatic access. Source of truth remains the AtoCore database. The HTML page is a derived view with a clear disclaimer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 15:31:03 -04:00
Anto01	700e3ca2c2	feat: Human Mirror — GET /projects/{name}/mirror Layer 3 of the AtoCore architecture. Generates a human-readable project overview in markdown from structured data: - Trusted Project State (by category) - System Architecture (systems → subsystems → components with material and interface links) - Decisions (with affected entities) - Requirements & Constraints - Materials - Vendors - Active Memories (with confidence and reference counts) The mirror is DERIVED — every line traces back to an entity, state entry, or memory. The footer stamps the generation timestamp and the "not canonical truth" disclaimer. API: GET /projects/{project_name}/mirror returns {project, format, content} where content is the full markdown page. Supports project aliases via resolve_project_name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 14:37:12 -04:00
Anto01	ccc49d3a8f	feat: engineering-aware context assembly When a query matches a known engineering entity by name, the context pack now includes a structured '--- Engineering Context ---' band showing the entity's type, description, and its relationships to other entities (subsystems, materials, requirements, decisions). Six-tier context assembly: 1. Trusted Project State 2. Identity / Preferences 3. Project Memories 4. Domain Knowledge 5. Engineering Context (NEW) 6. Retrieved Chunks The engineering band uses the same token-overlap scoring as memory ranking: query tokens are matched against entity names + descriptions. The top match gets its full relationship context included. 10% budget allocation. Trims before domain knowledge (lowest priority of the structured tiers since the same info may appear in chunks). Example: query 'lateral support design' against p04-gigabit surfaces the Lateral Support subsystem entity with its relationships to GF-PTFE material, M1 Mirror Assembly parent system, and related components. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:17:01 -04:00
Anto01	3e0a357441	feat: bootstrap 35 engineering entities + relationships from project knowledge Seeds the entity graph from existing project state, memories, and vault docs across p04-gigabit (11 entities), p05-interferometer (10), and p06-polisher (14). Covers systems, subsystems, components, materials, decisions, requirements, constraints, vendors, and parameters with structural and intent relationships. Example: GET /entities/{M1 Mirror Assembly id} returns the full context — 4 subsystems it contains, 2 requirements it's constrained by, and the parent project — traversable in one API call. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 09:57:53 -04:00
Anto01	dc20033a93	feat: Engineering Knowledge Layer V1 — entities + relationships Layer 2 of the AtoCore architecture. Adds typed engineering entities with relationships on top of the flat memory/state/chunk substrate. Schema: - entities table: id, entity_type, name, project, description, properties (JSON), status, confidence, source_refs, timestamps - relationships table: source_entity_id, target_entity_id, relationship_type, confidence, source_refs 15 entity types: project, system, subsystem, component, interface, requirement, constraint, decision, material, parameter, analysis_model, result, validation_claim, vendor, process 12 relationship types: contains, part_of, interfaces_with, satisfies, constrained_by, affected_by_decision, analyzed_by, validated_by, depends_on, uses_material, described_by, supersedes Service layer: full CRUD + get_entity_with_context (returns an entity with its relationships and all related entities in one call). API endpoints: - POST /entities — create entity - GET /entities — list/filter by type, project, status, name - GET /entities/{id} — entity + relationships + related entities - POST /relationships — create relationship Schema auto-initialized on app startup via init_engineering_schema(). 7 tests covering entity CRUD, relationships, context traversal, filtering, name search, and validation. Test count: 290 -> 297. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 09:50:58 -04:00
Anto01	b86181eb6c	docs: knowledge architecture — dual-layer model + domain knowledge Comprehensive architecture doc covering: - The problem (applied vs domain knowledge separation) - The quality bar (earned insight vs common knowledge, with examples) - Five-tier context assembly with budget allocation - Knowledge domains (10 domains: physics through finance) - Domain tag encoding (prefix in content, no schema migration) - Full flow: capture → extract → triage → surface - Cross-project example (p04 insight surfaces in p06 context) - Future directions: personal branch, multi-model, reinforcement Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 09:14:32 -04:00
Anto01	9118f824fa	feat: dual-layer knowledge extraction + domain knowledge band The extraction system now produces two kinds of candidates from the same conversation: A. PROJECT-SPECIFIC: applied facts scoped to a named project (unchanged behavior) B. DOMAIN KNOWLEDGE: generalizable engineering insight earned through project work, tagged with a domain (physics, materials, optics, mechanics, manufacturing, metrology, controls, software, math, finance) and stored with project="" so it surfaces across all projects. Critical quality bar enforced in the system prompt: "Would a competent engineer need experience to know this, or could they find it in 30 seconds on Google?" Textbook values, definitions, and obvious facts are explicitly excluded. Only hard-won insight qualifies — the kind that takes weeks of FEA or real machining experience to discover. Domain tags are embedded in the content as a prefix ("[physics]", "[materials]") so they survive without a schema migration. A future column can parse them out. Context builder gains a new tier between project memories and retrieved chunks: Tier 1: Trusted Project State (project-specific) Tier 2: Identity / Preferences (global) Tier 3: Project Memories (project-specific) Tier 4: Domain Knowledge (NEW) (cross-project, 10% budget) Tier 5: Retrieved Chunks (project-boosted) Trim order: chunks -> domain knowledge -> project memories -> identity/preference -> project state. Host-side extraction script updated with the same prompt and domain-tag handling. LLM_EXTRACTOR_VERSION bumped to llm-0.3.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 09:04:04 -04:00
Anto01	db89978871	docs: full session sync — master plan + ledger + atomizer-v2 ingested Master plan status updated to reflect current reality: - 5 registered projects (atomizer-v2 newly ingested, 33,253 vectors) - 47 active memories across all types - 61 project state entries - Nightly pipeline fully operational (both capture clients) - 7/14 phases baseline complete - "Now" section updated: observe/stabilize, multi-model triage, automated eval, atomizer state entries - "Next" section updated: write-back, AtoDrive, hardening - "Not Yet" items crossed off where applicable (reflection loop, auto-promotion, OpenClaw write-back) DEV-LEDGER orientation fully refreshed with current vectors, projects, pipeline state, and capture clients. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 20:32:47 -04:00
Anto01	4ac4e5cc44	Merge codex/openclaw-capture-plugin — OpenClaw capture integration Adds openclaw-plugins/atocore-capture/: a minimal OpenClaw plugin that mirrors Claude Code's Stop hook. Captures user-triggered assistant turns and POSTs to AtoCore /interactions with client=openclaw, reinforce=true, fail-open. Review verdict: functionally complete, one polish item (prompt includes wrapper context — not blocking, extraction pipeline handles noisy prompts). End-to-end verified on Dalidou with a real client=openclaw interaction. Both Claude Code and OpenClaw now feed AtoCore's reflection loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 18:34:47 -04:00
Anto01	a6ae6166a4	feat: add OpenClaw AtoCore capture plugin	2026-04-12 22:06:07 +00:00
Anto01	4f8bec7419	feat: deeper Wave 2 + observability dashboard Wave 2 deeper ingestion: - 6 new Trusted Project State entries from design-level docs: p05: test rig architecture, CGH specification, procurement combos p06: force control architecture, control channels, calibration loop - Total state entries: ~23 (was ~17) Observability: - GET /admin/dashboard — one-shot system overview: memory counts by type/project/status, reinforced count, project state entry counts, recent interaction timestamp, extraction pipeline status. Replaces the need to query 4+ endpoints to understand system state. Harness: 17/18 (no regression from new state entries). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:09:36 -04:00
Anto01	52380a233e	docs: Phase 4 baseline complete Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:56:24 -04:00
Anto01	8b77e83f0a	feat: Phase 4 — seed identity + preference memories, lower band to 5% 3 identity memories (Antoine's role, projects, infrastructure) and 3 preference memories (no API keys, multi-model collab, action bias) seeded on live Dalidou. These fill the identity/preference band that was previously empty. Lowered MEMORY_BUDGET_RATIO from 0.10 to 0.05 because the 10% allocation squeezed project memories and retrieval chunks enough to regress 4 harness fixtures. At 5% the band fits at most 1 short memory — enough for the most relevant identity/preference fact without starving the project-specific tiers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:48:56 -04:00
Anto01	dbb8f915e2	chore(ledger): Batch 3 close — R9 fixed, before/after documented Before: a model returning 'p04-gigabit' for a p06-polisher interaction would silently override the known scope because the project was registered. After: interaction.project always wins when set. Model project is only a fallback for unscoped captures. Not yet guaranteed: within-project semantic errors (model says the right project but wrong content). That's a content-quality concern, not a trust-hierarchy issue. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 15:38:19 -04:00
Anto01	e5e9a9931e	fix(R9): trust hierarchy for project attribution Batch 3, Days 1-3. The core R9 failure was Case F: when the model returned a registered project DIFFERENT from the interaction's known scope, the old code trusted the model because the project was registered. A p06-polisher interaction could silently produce a p04-gigabit candidate. New rule (trust hierarchy): 1. Interaction scope always wins when set (cases A, C, E, F) 2. Model project used only for unscoped interactions AND only when it resolves to a registered project (cases D, G) 3. Empty string when both are empty or unregistered (case B) The rule is: interaction.project is the strongest signal because it comes from the capture hook's project detection, which runs before the LLM ever sees the content. The model's project guess is only useful when the capture hook had no project context. 7 case tests (A-G) cover every combination of model/interaction project state. Pre-existing tests updated for the new behavior. Host-side script mirrors the same hierarchy using _known_projects fetched from GET /projects at startup. Test count: 286 -> 290 (+4 net, 7 new R9 cases, 3 old tests consolidated). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 15:37:29 -04:00
Anto01	144dbbd700	Merge codex/audit-batch2 — R7/R8 confirmed fixed, R9 stays open Codex verified R1/R5/R7/R8 fixed, harness 17/18, auto-triage dry-run works. R9 stays open: registered-but-wrong project from model can still override interaction scope. Fair — the registry check prevents hallucinated names but not misattribution between real projects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 15:28:00 -04:00
Anto01	7650c339a2	audit: verify batch2 claims and findings	2026-04-12 19:06:51 +00:00
Anto01	69c971708a	feat: Day 4+5 — R7/R9 fixes + integration tests (R8) Day 4: - R7 fixed: overlap-density ranking. p06-firmware-interface now passes (was the last memory-ranking failure). Harness 16/18→17/18. - R9 fixed: LLM extractor checks project registry before trusting model-supplied project. Hallucinated projects fall back to interaction's known scope. Registry lookup via load_project_registry(), matched by project_id. Host-side script mirrors this via GET /projects at startup. Day 5: - R8 addressed: 5 integration tests in test_extraction_pipeline.py covering the full LLM extract → persist as candidate → promote/ reject flow, project fallback, failure handling, and dedup behavior. Uses mocked subprocess to avoid real claude -p calls. Harness: 17/18 (only p06-tailscale remains — chunk bleed from source content, not a memory/ranking issue). Tests: 280 → 286 (+6). Batch complete. Before/after for this batch: R1: fixed (extraction pipeline operational on Dalidou) R5: fixed (batch endpoint + host-side script) R7: fixed (overlap-density ranking) R9: fixed (project trust-preservation via registry check) R8: addressed (5 integration tests) Harness: 16/18 → 17/18 Active memories: 36 → 41 Nightly pipeline: backup → cleanup → rsync → extract → auto-triage Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:44:02 -04:00
Anto01	8951c624fe	fix(R7/R9): overlap-density ranking + project trust-preservation R7: ranking scorer now uses overlap-density (overlap_count / memory_token_count) as primary key instead of raw overlap count. A 5-token memory with 3 overlapping tokens (density 0.6) now beats a 40-token overview memory with 3 overlapping tokens (density 0.075) at the same absolute count. Secondary: absolute overlap. Tertiary: confidence. Targeting p06-firmware-interface harness fixture. R9: when the LLM extractor returns a project that differs from the interaction's known project, it now checks the project registry. If the model's project is a registered canonical ID, trust it. If not (hallucinated name), fall back to the interaction's project. Uses load_project_registry() for the check. The host-side script mirrors this via an API call to GET /projects at startup. Two new tests: test_parser_keeps_registered_model_project and test_parser_rejects_hallucinated_project. Test count: 280 -> 281. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:34:33 -04:00
Anto01	1a2ee5e07f	feat: Day 3 — auto-triage via LLM second pass scripts/auto_triage.py: fetches candidate memories, asks a triage model (claude -p, default sonnet) to classify each as promote / reject / needs_human, and executes the verdict via the API. Trust model: - Auto-promote: model says promote AND confidence >= 0.8 AND dedup-checked against existing active memories for the project - Auto-reject: model says reject - needs_human: everything else stays in queue for manual review The triage model receives both the candidate content AND a summary of existing active memories for the same project, so it can detect duplicates and near-duplicates. The system prompt explicitly lists the rejection categories learned from the first two manual triage passes (stale snapshots, impl details, planned-not-implemented, process rules that belong in ledger not memory). deploy/dalidou/batch-extract.sh now runs extraction (Step A) then auto-triage (Step B) in sequence. The nightly cron at 03:00 UTC will run the full pipeline: backup → cleanup → rsync → extract → triage. Only needs_human candidates reach the human. Supports --dry-run for preview without executing. Supports --model override for multi-model triage (e.g. opus for higher-quality review, or a future Gemini/Ollama backend). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 12:30:57 -04:00
Anto01	9b149d4bfd	Merge codex/audit-2026-04-12-extraction — R1+R5 fixed, R11-R12 added Codex verified the host-side extraction pipeline works end-to-end on Dalidou (ran it manually, produced 13 additional candidates). R1 and R5 are now marked fixed. New findings: - R11: container mode=llm silently returns 0 candidates - R12: duplicated prompt/parser between host script and extractor Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 12:23:17 -04:00
Anto01	abc8af5f7e	audit: record extraction pipeline findings	2026-04-12 16:20:42 +00:00
Anto01	ac7f77d86d	fix: remove --no-session-persistence (unsupported on claude 2.0.60) Dalidou runs Claude Code 2.0.60 which does not have this flag (added in 2.1.x). Removed from both extractor_llm.py and the host-side batch script. --append-system-prompt and --disable-slash-commands are supported on 2.0.60. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:59:19 -04:00
Anto01	719ff649a8	fix: fetch full interaction body per-id (list endpoint omits response) GET /interactions returns response_chars but not the response body to keep the listing lightweight. The batch extractor now lists ids first, then fetches each interaction individually via GET /interactions/{id} to get the full response for LLM extraction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:58:00 -04:00
Anto01	8af8af90d0	fix: pure-stdlib host-side extraction script (no atocore imports) The host Python on Dalidou lacks pydantic_settings and other container-only deps. Refactored batch_llm_extract_live.py to be a standalone HTTP client + subprocess wrapper using only stdlib. Duplicates the system prompt and JSON parser from extractor_llm.py rather than importing them — acceptable duplication since this is a deployment adapter, not a library. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:57:18 -04:00
Anto01	cd0fd390a8	fix: host-side LLM extraction (claude CLI not in container) The claude CLI is installed on the Dalidou HOST but not inside the Docker container. The /admin/extract-batch API endpoint with mode=llm silently returned 0 candidates because shutil.which('claude') was None inside the container. Fix: extraction runs host-side via deploy/dalidou/batch-extract.sh which calls scripts/batch_llm_extract_live.py with the host's PYTHONPATH pointing at the repo's src/. The script: - Fetches interactions from the API (GET /interactions?since=...) - Runs extract_candidates_llm() locally (host has claude CLI) - POSTs candidates back to the API (POST /memory, status=candidate) - Tracks last-run timestamp via project state The cron now calls the host-side script instead of the container API endpoint for LLM mode. Rule-mode extraction in the container still works via /admin/extract-batch. The API endpoint retains the mode=llm option for environments where claude IS inside the container (future Docker image with claude CLI, or a different deployment model). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:55:22 -04:00
Anto01	c67bec095c	feat: nightly batch extraction in cron-backup.sh (Day 2) Step 4 added to the daily cron: POST /admin/extract-batch with mode=llm, persist=true, limit=50. Runs after backup + cleanup + rsync. Fail-open: extraction failure never blocks the backup. Gated on ATOCORE_EXTRACT_BATCH=true (defaults to true). The endpoint uses the last_extract_batch_run timestamp from project state to auto-resume, so the cron doesn't need to track state. curl --max-time 600 gives the LLM extractor up to 10 minutes for the batch (50 interactions × ~20s each worst case = ~17 min, but most will be no-ops if already extracted). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:51:13 -04:00
Anto01	bcb7675a0d	feat(R1/R5): POST /admin/extract-batch + LLM mode on single extract Day 1 of the operational-reflection batch. Two changes: 1. POST /admin/extract-batch: batch extraction endpoint that fetches recent interactions (since last run or explicit 'since' param), runs the extractor (rule or LLM mode), and persists candidates with status=candidate. Tracks last-run timestamp in project state (atocore/status/last_extract_batch_run) so subsequent calls auto-resume. This is the operational home for R1/R5 — makes the LLM extractor an API operation, not just a script. 2. POST /interactions/{id}/extract now accepts mode: "rule" | "llm" (default "rule" for backward compatibility). When "llm", it uses extract_candidates_llm (claude -p sonnet, OAuth). Both changes preserve the standing decision: extraction stays off the capture hot path. The batch endpoint is invoked explicitly by cron, manual curl, or CLI — never inline with POST /interactions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:45:42 -04:00
Anto01	54d84b52cb	Merge codex/audit-2026-04-12-final — R9-R10, state count corrections R9 (P2): model-supplied non-empty project can override correct interaction scope — edge case, acknowledged. R10 (P2): Phase 8 is baseline-complete, not primary-complete — correct characterization, already marked as Baseline Complete. Corrected Wave 2 state counts (p04=5, p05=6, p06=6). Confirmed live SHA drift (`39d73e9` vs `e2895b5`) — docs-only commits don't trigger redeploy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:05:01 -04:00
Anto01	b790e7eb30	audit: record final 2026-04-12 findings	2026-04-12 13:03:10 +00:00
Anto01	e2895b5d2b	feat: Phase 8 OpenClaw integration verified end-to-end Verified t420-openclaw/atocore.py against live Dalidou from both the development machine and the T420 (clawdbot @ 192.168.86.39): - health: returns 0.2.0 + build_sha + vector count - auto-context: project detection + context/build produces full packs with Trusted Project State, Project Memories band, and retrieved chunks (tested p05 vendor query and p06 firmware query) - fail-open: unreachable host returns {status: unavailable, fail_open: true} without crashing or blocking the session API surface coverage: atocore.py hits 15/33 endpoints (core retrieval + project state + context build). Memory management, interactions, and backup endpoints are correctly excluded — those belong to the operator client (scripts/atocore_client.py) per the read-only additive integration model. No code changes needed — the April 6 atocore.py already matches the current API surface. Wave 2 state entries and project-memory band changes are transparent to the client (they enrich formatted_context without requiring client-side updates). Cloned repo to T420 at /home/papa/ATOCore for future OpenClaw use. Updated master-plan-status.md: Phase 8 moved from Partial to Baseline Complete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 08:50:51 -04:00
Anto01	2b79680167	chore(ledger): Wave 2 ingestion + codex audit response session log Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 07:57:32 -04:00
Anto01	39d73e91b4	fix(R6): fall back to interaction.project when LLM returns empty Codex R6: the LLM extractor accepted the model's project field verbatim. When the model returned an empty string, memories that clearly belonged to p06 were promoted with project='', making them invisible to the p06 project-memory band and explaining the p06-offline-design harness failure. Fix: if the model returns an empty project but interaction.project is set, inherit the interaction's project. Model-supplied project still takes precedence when non-empty. Two new tests lock the fallback and precedence behaviors. R5 acknowledged (LLM extractor not yet wired into API; next task). Test count: 278 -> 280. Harness re-run pending after deploy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 07:37:14 -04:00
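The fallback rule described in the R6 fix above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper name `resolve_project` and its signature are hypothetical, and only the precedence rule (a non-empty model value wins, otherwise inherit the interaction's project) comes from the commit message.

```python
from typing import Optional


def resolve_project(model_project: Optional[str],
                    interaction_project: Optional[str]) -> str:
    """Hypothetical helper: choose the project for an extracted candidate.

    A non-empty model-supplied project takes precedence; an empty or
    missing one falls back to the interaction's project.
    """
    if model_project:
        # The model named a project explicitly, so it wins.
        return model_project
    # Empty string or None from the model: inherit the interaction scope.
    return interaction_project or ""


if __name__ == "__main__":
    print(resolve_project("", "p06"))
    print(resolve_project("p05", "p06"))
```

Keeping the model's value as the winner when it is non-empty matches the precedence stated in the commit, which is also the edge case later recorded as R9.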
Anto01	7ddf0e38ee	Merge codex/audit-2026-04-12 — R5-R8 findings Codex correctly identified: - R5 (P1): LLM extractor is script-only, not wired into the API - R6 (P1): LLM extractor drops interaction.project when model returns empty — caused the p06-offline-design harness failure - R7 (P2): lexical scorer ties on overlap count, broad memories win on confidence tiebreaker - R8 (P2): no integration test for the persist/triage flow Also corrected the harness-failure narrative: not all 3 are budget contention. One is a ranking tie, one is a project-scope miss, one is chunk bleed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 07:35:09 -04:00
Anto01	b0fde3ee60	config: default LLM extractor model haiku -> sonnet Haiku was producing noisy candidates (31% accept rate on first triage). Sonnet should give tighter extraction with fewer false positives while still catching the same durable-fact patterns. Override: ATOCORE_LLM_EXTRACTOR_MODEL=haiku to revert. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 07:31:34 -04:00
Anto01	89c7964237	audit: record 2026-04-12 review findings	2026-04-12 11:31:32 +00:00
Anto01	146f2e4a5e	chore: Day 8 - close mini-phase with before/after metrics Mini-phase complete. Before/after deltas (metric: before -> after): Rule extractor recall: 0% -> 0% (unchanged, deprioritized); LLM extractor recall: n/a -> 100% (new, claude -p haiku); LLM candidate yield: n/a -> 2.55/interaction; First triage accept rate: n/a -> 31% (16/51); Active memories: 20 -> 36 (+16); p06-polisher memories: 2 -> 16 (+14); atocore memories: 0 -> 5 (+5); Retrieval harness: 6/6 -> 15/18 (expanded to 18 fixtures); Test count: 264 -> 278 (+14). 3 remaining harness failures are budget-contention on the p06 memory band: the specific memory a fixture targets ranks 4th+ and the 25% budget only holds 2-3 entries. Not a ranking bug - the per-entry 250-char cap was the one justified tweak; a second budget change risks regressing other fixtures per Codex's Day 7 hard gate. Ledger updated: Orientation, Session Log, main_tip, harness line. Next on the roadmap (from DEV-LEDGER Active Plan / docs/next-steps): - Wave 2 trusted operational ingestion (p04/p05/p06 dashboards) - Finish OpenClaw integration (Phase 8) - Auto-triage (multi-model second pass to reduce human review) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 06:41:42 -04:00
Anto01	5c69f77b45	fix: cap per-entry memory length at 250 chars in context band A 530-char program overview memory with confidence 0.96 was filling the entire 25% project-memory budget at equal overlap score (3 tokens), beating shorter query-relevant newly-promoted memories (confidence 0.5) on the confidence tiebreaker. The long memory legitimately scored well, but its length starved every other memory from the band. Fix: truncate each formatted entry to 250 chars with '...' so at least 2-3 memories fit the ~700-char available budget. This doesn't change ranking — the most relevant memory still goes first — but it ensures the runner-up can also appear. Harness fixture delta: Day 7 regression pass pending after deploy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 06:34:27 -04:00
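The per-entry cap described in the commit above might look something like the sketch below. The 250-character limit, the '...' suffix, and the roughly 700-character budget come from the commit message; the function names `truncate_entry` and `fill_band` are hypothetical, and it is an assumption that the '...' counts toward the 250 characters and that filling stops at the first entry that does not fit.

```python
from typing import List

MAX_ENTRY_CHARS = 250  # per-entry cap from the commit message


def truncate_entry(text: str, limit: int = MAX_ENTRY_CHARS) -> str:
    """Cut an entry to at most `limit` chars, ending with '...' if cut."""
    if len(text) <= limit:
        return text
    # Assumption: the ellipsis is included in the limit.
    return text[: limit - 3] + "..."


def fill_band(ranked_entries: List[str], budget_chars: int = 700) -> List[str]:
    """Add entries in rank order until the character budget is used up."""
    band: List[str] = []
    used = 0
    for entry in ranked_entries:
        line = truncate_entry(entry)
        if used + len(line) > budget_chars:
            break  # ranking order is preserved; lower ranks are dropped
        band.append(line)
        used += len(line)
    return band


if __name__ == "__main__":
    entries = ["a" * 530, "b" * 300, "c" * 120]
    print([len(e) for e in fill_band(entries)])
```

Because truncation happens after ranking, the most relevant entry still comes first, but a single long entry can no longer consume the whole band.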
Anto01	3921c5ffc7	test: Day 6 — retrieval harness expanded from 6 to 18 fixtures Added 12 new fixtures across all three active projects: - p04: 1 short/ambiguous case ('current status') - p05: 1 CGH calibration case with cross-project bleed guard - p06: 7 new fixtures targeting triage-promoted memories (firmware interface, z-axis, cam encoder, telemetry rate, offline design, USB SSD, Tailscale) - Adversarial: cross-project-no-bleed (p04 query must not surface p06 telemetry rate), no-project-hint (project memories must not appear without a hint) First run: 14/18 passing. 4 failures (p06-firmware-interface, p06-z-axis, p06-offline-design, p06-tailscale) share the same root cause: long pre-existing p06 memories (530+ chars, confidence 0.9+) fill the 25% project-memory budget before the query-relevant newly-promoted memories (shorter, confidence 0.5) get a slot. Budget contention at equal overlap score tiebroken by confidence. Day 7 ranking tweak target. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 06:32:47 -04:00
Anto01	93f796207f	docs: Day 5 — extractor scope + stale follow-ups cleaned Documents the LLM-assisted extractor's in-scope / out-of-scope categories derived from the first live triage pass (16 promoted, 35 rejected). Five in-scope classes, six explicit out-of-scope classes, trust model summary, multi-model future direction. Cleaned up stale follow-up items in next-steps.md: rule expansion marked deprioritized, LLM extractor marked done, retrieval harness marked done with expansion pending. Fixed docstring timeout (45s -> 90s) in extractor_llm.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 06:24:25 -04:00

137 Commits