ATOCore

Author	SHA1	Message	Date
Anto01	a29b5e22f2	feat(eval-loop): Day 4 — LLM extractor via claude -p (OAuth, no API key) Second pass on the LLM-assisted extractor after Antoine's explicit rule: no API key, ever. Refactored src/atocore/memory/extractor_llm.py to shell out to the Claude Code 'claude -p' CLI via subprocess instead of the anthropic SDK, so extraction reuses the user's existing Claude.ai OAuth credentials and needs zero secret management. Implementation: - subprocess.run(["claude", "-p", "--model", "haiku", "--append-system-prompt", <instructions>, "--no-session-persistence", "--disable-slash-commands", user_message], ...) - cwd is a cached tempfile.mkdtemp() so every invocation starts with a clean context instead of auto-discovering CLAUDE.md / AGENTS.md / DEV-LEDGER.md from the repo root. We cannot use --bare because it forces API-key auth, which defeats the purpose; the temp-cwd trick is the lightest way to keep OAuth auth while skipping project context loading. - Silent-failure contract unchanged: missing CLI, non-zero exit, timeout, malformed JSON — all return [] and log an error. The capture audit trail must not break on an optional side effect. - Default timeout bumped from 20s to 90s: Haiku + Node.js startup + OAuth check is ~20-40s per call in practice, plus real responses up to 8KB take longer. 45s hit 2 timeouts on the first live run. - tests/test_extractor_llm.py refactored: the API-key / anthropic SDK tests are replaced by subprocess-mocking tests covering missing CLI, timeout, non-zero exit, and a happy-path stdout parse. 14 tests, all green. scripts/extractor_eval.py: - New --output <path> flag writes the JSON result directly to a file, bypassing stdout/log interleaving (structlog sends INFO to stdout via PrintLoggerFactory, so a naive '> out.json' pollutes the file). - Forces UTF-8 on stdout so real LLM output with em-dashes / arrows / CJK doesn't crash the human report on Windows cp1252 consoles. 
First live baseline run against the 20-interaction labeled corpus (scripts/eval_data/extractor_llm_baseline_2026-04-11.json): mode=llm labeled=20 recall=1.0 precision=0.357 yield_rate=2.55 total_actual_candidates=51 total_expected_candidates=7 false_negative_interactions=0 false_positive_interactions=9 Recall 0% -> 100% vs rule baseline — every human-labeled positive is caught. Precision reads low (0.357) but inspection shows the "false positives" are real candidates the human labels under-counted. For example interaction a6b0d279 was labeled at 2 expected candidates, the model caught all 6 polisher architectural facts; interaction 52c8c0f3 was labeled at 1, the model caught all 5 infra commitments. The labels are the bottleneck, not the model. Day 4 gate against Codex's criteria: - candidate yield: 255% vs ≥15-25% target - FP rate tolerable for manual triage: 51 candidates reviewable in ~10 minutes via the triage CLI - ≥2 real non-synthetic candidates worth review: 20+ obvious wins (polisher architecture set, p05 infra set, DEV-LEDGER protocol set) Gate cleared. LLM-assisted extraction is the path forward for conversational captures. Rule-based extractor stays as-is for structured-cue inputs and remains the default mode. The next step (Day 5 stabilize / document) will wire LLM mode behind a flag in the public extraction endpoint and document scope. Test count: 276 -> 278 passing. No existing tests changed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 17:45:24 -04:00
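Sketch for `a29b5e22f2` (not code from the repository): the silent-failure contract around the CLI subprocess could look roughly like the following. The function name `run_llm_cli` and its parameters are hypothetical; only the behaviour (missing CLI, non-zero exit, timeout, and malformed JSON all return `[]` and log an error) and the 90s default come from the commit message.

```python
import json
import logging
import subprocess

log = logging.getLogger("extractor_llm_sketch")


def run_llm_cli(argv, timeout_s=90.0, cwd=None):
    """Run an extractor command; return its parsed JSON list, or [] on any failure."""
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            cwd=cwd,  # the commit uses a cached temp dir here to skip project context
        )
    except OSError:
        # Covers a missing executable (FileNotFoundError) and similar launch errors.
        log.error("extractor CLI could not be started: %s", argv[0])
        return []
    except subprocess.TimeoutExpired:
        log.error("extractor CLI timed out after %ss", timeout_s)
        return []
    if proc.returncode != 0:
        log.error("extractor CLI exited with code %s", proc.returncode)
        return []
    try:
        parsed = json.loads(proc.stdout)
    except json.JSONDecodeError:
        log.error("extractor CLI returned malformed JSON")
        return []
    # Anything other than a JSON array is treated as "no candidates".
    return parsed if isinstance(parsed, list) else []
```

In the commit, `argv` would be the `["claude", "-p", ...]` list quoted above. Passing the command in as a parameter is a choice made here so the failure paths can be exercised with any executable.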
Anto01	b309e7fd49	feat(eval-loop): Day 4 — LLM-assisted extractor path (additive, flagged) Day 2 baseline showed 0% recall for the rule-based extractor across 5 distinct miss classes. Day 4 decision gate: prototype an LLM-assisted mode behind a flag. Option A ratified by Antoine. New module src/atocore/memory/extractor_llm.py: - extract_candidates_llm(interaction) returns the same MemoryCandidate dataclass the rule extractor produces, so both paths flow through the existing triage / candidate pipeline unchanged. - extract_candidates_llm_verbose() also returns the raw model output and any error string, for eval and debugging. - Uses Claude Haiku 4.5 by default; model overridable via ATOCORE_LLM_EXTRACTOR_MODEL env. Timeout via ATOCORE_LLM_EXTRACTOR_TIMEOUT_S (default 20s). - Silent-failure contract: missing API key, unreachable model, malformed JSON — all return [] and log an error. Never raises into the caller. The capture audit trail must not break on an optional side effect. - Parser tolerates markdown fences, surrounding prose, invalid memory types, clamps confidence to [0,1], drops empty content. - System prompt explicitly tells the model to return [] for most conversational turns (durable-fact bar, not "extract everything"). - Trust rules unchanged: candidates are never auto-promoted, extraction stays off the capture hot path, human triages via the existing CLI. scripts/extractor_eval.py: new --mode {rule,llm} flag so the same labeled corpus can be scored against both extractors. Default remains rule so existing invocations are unchanged. tests/test_extractor_llm.py: 12 new unit tests covering the parser (empty array, malformed JSON, markdown fences, surrounding prose, invalid types, empty content, confidence clamping, version tagging), plus contract tests for missing API key, empty response, and a mocked api_error path so failure modes never raise. Test count: 264 -> 276 passing. No existing tests changed. 
Next step: run `python scripts/extractor_eval.py --mode llm` against the labeled set with ANTHROPIC_API_KEY in env, record the delta, decide whether to wire LLM mode into the API endpoint and CLI or keep it script-only for now. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 15:18:30 -04:00
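Sketch for `b309e7fd49` (an illustration, not the module's actual parser): a tolerant parser in the spirit described above might be written as follows. The name `parse_candidates`, the JSON field names, the 0.5 fallback confidence, and the choice to drop (rather than coerce) invalid memory types are all assumptions; the valid type set is the one listed in commit `1a8fdf4225` further down this log.

```python
import json
import re

VALID_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}


def parse_candidates(raw):
    """Extract candidate dicts from raw model output; never raises on bad input."""
    # Grab the outermost [...] span, which skips markdown fences and surrounding prose.
    match = re.search(r"\[.*\]", raw or "", re.DOTALL)
    if not match:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    candidates = []
    for item in items:
        if not isinstance(item, dict):
            continue
        content = str(item.get("content") or "").strip()
        memory_type = item.get("memory_type")
        # Drop empty content and unknown types (assumed policy).
        if not content or not isinstance(memory_type, str) or memory_type not in VALID_TYPES:
            continue
        try:
            confidence = float(item.get("confidence", 0.5))
        except (TypeError, ValueError):
            confidence = 0.5
        candidates.append({
            "memory_type": memory_type,
            "content": content,
            "confidence": min(1.0, max(0.0, confidence)),  # clamp to [0, 1]
        })
    return candidates
```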
Anto01	5aeeb1cad1	feat: query-relevance ordering for memory selection get_memories_for_context now accepts an optional query string. When provided, candidate memories are reranked by lexical overlap with the query (stemmed token intersection, ties broken by confidence) before the budget walk. Without a query the order is unchanged — effectively "by confidence desc" as before — so non-builder callers see no behaviour change. The fetch limit is raised from 10 to 30 so there's a real pool to rerank. Token overlap reuses _normalize/_tokenize from reinforcement.py so ranking and reinforcement matching share the same notion of distinctive terms. build_context passes the user_prompt through to both the identity/ preference and project-memory calls. The retrieval harness regression the fix is targeting: - p05-vendor-signal FAIL @ `1161645`: "Zygo" missing from the pack even though an active vendor memory contained it. Root cause: higher-confidence p05 memories filled the 25% budget slice before the vendor memory ever got a chance. Query-aware ordering puts the vendor memory first when the query is about vendors. New regression test test_project_memories_query_relevance_ordering locks the behaviour in with two p05 memories and a tight budget. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 12:47:05 -04:00
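Sketch for `5aeeb1cad1` (hypothetical names, not the repository's code): query-aware ordering by stemmed token overlap, with ties broken by confidence, could be expressed like this. The stemmer below is a stand-in for the `_normalize`/`_tokenize` helpers the commit reuses, and the dict-shaped memories are an assumption.

```python
import re


def _stems(text):
    """Lowercase word tokens with a trailing s/ed/ing stripped."""
    stems = set()
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        for suffix in ("ing", "ed", "s"):
            # Keep at least three characters so short words are left alone.
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.add(tok)
    return stems


def rerank_by_query(memories, query):
    """Order memories by stem overlap with the query, ties broken by confidence."""
    if not query:
        return list(memories)  # no query: incoming order is preserved
    query_stems = _stems(query)
    return sorted(
        memories,
        key=lambda m: (len(query_stems & _stems(m["content"])), m["confidence"]),
        reverse=True,
    )
```

With this ordering a lower-confidence memory that shares terms with the query is walked first, so it gets a chance at the budget before higher-confidence but unrelated memories.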
Anto01	8ea53f4003	feat: fold project-scoped memories into context pack The retrieval-quality review on 2026-04-11 found that active project/knowledge/episodic memories never reached the pack: only Trusted Project State and identity/preference memories were being assembled. Reinforcement bumped confidence on memories that had no retrieval outlet, so the reflection loop was half-open. This change adds a third memory tier between identity/preference and retrieved chunks: - PROJECT_MEMORY_BUDGET_RATIO = 0.15 - Memory types: project, knowledge, episodic - Only populated when a canonical project is in scope — without a project hint, project memories stay out (cross-project bleed would rot the signal) - Rendered under a dedicated "--- Project Memories ---" header so the LLM can distinguish it from the identity/preference band - Trim order in _trim_context_to_budget: retrieval → project memories → identity/preference → project state (most recently added tier drops first when budget is tight) get_memories_for_context gains header/footer kwargs so the two memory blocks can be distinguished in a single pack without a second helper. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 11:35:40 -04:00
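Sketch for `8ea53f4003` (an assumption-heavy illustration): the trim order named above can be modelled as a fixed list walked until the pack fits. The section keys, the character-based budget, and the function name `trim_to_budget` are hypothetical; only the order (retrieval, then project memories, then identity/preference, then project state) is taken from the commit message.

```python
# Sections are trimmed in this order until the pack fits.
TRIM_ORDER = ["retrieval", "project_memories", "identity_preference", "project_state"]


def trim_to_budget(sections, budget_chars):
    """Shrink sections in TRIM_ORDER until their total length fits budget_chars."""
    trimmed = dict(sections)
    for name in TRIM_ORDER:
        overflow = sum(len(text) for text in trimmed.values()) - budget_chars
        if overflow <= 0:
            break
        text = trimmed.get(name, "")
        # Cut only as much as needed from this tier; empty it if that is not enough.
        trimmed[name] = text[: max(0, len(text) - overflow)]
    return trimmed
```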
Anto01	9366ba7879	feat: length-aware reinforcement + batch triage CLI + off-host backup - Reinforcement matcher now handles paragraph-length memories via a dual-mode threshold: short memories keep the 70% overlap rule, long memories (>15 stems) require 12 absolute overlaps AND 35% fraction so organic paraphrase can still reinforce. Diagnosis: every active memory stayed at reference_count=0 because 40-token project summaries never hit 70% overlap on real responses. - scripts/atocore_client.py gains batch-extract (fan out /interactions/{id}/extract over recent interactions) and triage (interactive promote/reject walker for the candidate queue), matching the Phase 9 reflection-loop review flow without pulling extraction into the capture hot path. - deploy/dalidou/cron-backup.sh adds an optional off-host rsync step gated on ATOCORE_BACKUP_RSYNC, fail-open when the target is offline so a laptop being off at 03:00 UTC never reds the local backup. - docs/next-steps.md records the retrieval-quality sweep: project state surfaces, chunks are on-topic but broad, active memories never reach the pack (reflection loop has no retrieval outlet yet). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 11:20:03 -04:00
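Sketch for `9366ba7879` (hypothetical function name, thresholds from the commit message): the dual-mode rule operates on already-stemmed token sets and switches on memory length.

```python
def memory_is_reinforced(memory_stems, response_stems):
    """Dual-mode overlap test on pre-stemmed token sets."""
    if not memory_stems:
        return False
    overlap = len(memory_stems & response_stems)
    fraction = overlap / len(memory_stems)
    if len(memory_stems) > 15:
        # Long memory: 12 absolute overlaps AND at least 35% of the memory's stems.
        return overlap >= 12 and fraction >= 0.35
    # Short memory: the original 70% overlap rule.
    return fraction >= 0.70
```

For a 40-stem project summary, 16 shared stems (40%) is enough under the long-memory rule, whereas the 70% rule would have required 28.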
Anto01	c5bad996a7	feat: enable reinforcement on live capture The Stop hook now sends reinforce=true so the token-overlap matcher runs on every captured interaction. Memory confidence will accumulate signal from organic Claude Code use. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 10:58:56 -04:00
Anto01	58c744fd2f	feat: post-backup validation + retention cleanup (Tasks B & C) - create_runtime_backup() now auto-validates its output and includes validated/validation_errors fields in returned metadata - New cleanup_old_backups() with retention policy: 7 daily, 4 weekly (Sundays), 6 monthly (1st of month), dry-run by default - CLI `cleanup` subcommand added to backup module - 9 new tests (2 validation + 7 retention), 259 total passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 09:46:46 -04:00
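Sketch for `58c744fd2f` (one plausible reading of the policy, not the actual `cleanup_old_backups()`): keep the 7 newest backups, the 4 newest Sunday backups, and the 6 newest first-of-month backups. The function name and the date-only granularity are assumptions.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, daily=7, weekly=4, monthly=6):
    """Return the set of backup dates a 7-daily / 4-weekly / 6-monthly policy retains."""
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set(newest_first[:daily])
    sundays = [d for d in newest_first if d.weekday() == 6]  # Monday is 0, Sunday is 6
    firsts = [d for d in newest_first if d.day == 1]
    keep.update(sundays[:weekly])
    keep.update(firsts[:monthly])
    return keep


# Usage: everything not in the keep set is a deletion candidate (dry-run style).
all_dates = [date(2026, 1, 1) + timedelta(days=i) for i in range(101)]
to_delete = sorted(set(all_dates) - backups_to_keep(all_dates))
```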
Anto01	a34a7a995f	fix: token-overlap matcher for reinforcement (Phase 9B) Replace the substring-based _memory_matches() with a token-overlap matcher that tokenizes both memory content and response, applies lightweight stemming (trailing s/ed/ing) and stop-word removal, then checks whether >= 70% of the memory's tokens appear in the response. This fixes the paraphrase blindness that prevented reinforcement from ever firing on natural responses ("prefers" vs "prefer", "because history" vs "because the history"). 7 new tests (26 total reinforcement tests, all passing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 09:40:05 -04:00
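Sketch for `a34a7a995f` (illustrative only): a token-overlap matcher with trailing s/ed/ing stemming and stop-word removal. The stop-word list and the three-character minimum stem length are assumptions; the real lists are not shown in this log.

```python
import re

# Hypothetical stop-word list for the sketch.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "is", "in", "for"}


def stem_tokens(text):
    """Tokenize, drop stop words, and strip a trailing s/ed/ing."""
    tokens = set()
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        if tok in STOP_WORDS:
            continue
        for suffix in ("ing", "ed", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        tokens.add(tok)
    return tokens


def memory_matches(memory, response, threshold=0.70):
    """True when at least `threshold` of the memory's stems appear in the response."""
    memory_stems = stem_tokens(memory)
    if not memory_stems:
        return False
    shared = memory_stems & stem_tokens(response)
    return len(shared) / len(memory_stems) >= threshold
```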
Anto01	92fc250b54	fix: use correct hook field name last_assistant_message The Claude Code Stop hook sends `last_assistant_message`, not `assistant_message`. This was causing response_chars=0 on all captured interactions. Also removes the temporary debug log block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 09:17:21 -04:00
Anto01	2d911909f8	feat: auto-capture Claude Code sessions via Stop hook Add deploy/hooks/capture_stop.py — a Claude Code Stop hook that reads the transcript JSONL, extracts the last user prompt, and POSTs to the AtoCore /interactions endpoint in conservative mode (reinforce=false). Conservative mode means: capture only, no automatic reinforcement or extraction into the review queue. Kill switch: ATOCORE_CAPTURE_DISABLED=1. Also: note build_sha cosmetic issue after restore in runbook, update project status docs to reflect drill pass and auto-capture wiring. 17 new tests (243 total, all passing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 09:00:42 -04:00
Anto01	1a8fdf4225	fix: chroma restore bind-mount bug + consolidate docs Two fixes from the 2026-04-09 first real restore drill on Dalidou, plus the long-overdue doc consolidation I should have done when I added the drill runbook instead of creating a duplicate. ## Chroma restore bind-mount bug (drill finding) src/atocore/ops/backup.py: restore_runtime_backup() used to call shutil.rmtree(dst_chroma) before copying the snapshot back. In the Dockerized Dalidou deployment the chroma dir is a bind-mounted volume — you can't unlink a mount point, rmtree raises OSError [Errno 16] Device or resource busy and the restore silently fails to touch Chroma. This bit the first real drill; the operator worked around it with --no-chroma plus a manual cp -a. Fix: clear the destination's CONTENTS (iterdir + rmtree/unlink per child) and use copytree(dirs_exist_ok=True) so the mount point itself is never touched. Equivalent semantics, bind-mount-safe. Regression test: tests/test_backup.py::test_restore_chroma_does_not_unlink_destination_directory captures Path.stat().st_ino of the dest dir before and after restore and asserts they match. That's the same invariant a bind-mounted chroma dir enforces — if the inode changed, the mount would have failed. 11/11 backup tests now pass. ## Doc consolidation docs/backup-restore-drill.md existed as a duplicate of the authoritative docs/backup-restore-procedure.md. When I added the drill runbook in commit `3362080` I wrote it from scratch instead of updating the existing procedure — bad doc hygiene on a project that's literally about being a context engine. 
- Deleted docs/backup-restore-drill.md - Folded its contents into docs/backup-restore-procedure.md: - Replaced the manual sudo cp restore sequence with the new `python -m atocore.ops.backup restore <STAMP> --confirm-service-stopped` CLI - Added the one-shot docker compose run pattern for running restore inside a container that reuses the live volume mounts - Documented the --no-pre-snapshot / --no-chroma / --chroma flags - New "Chroma restore and bind-mounted volumes" subsection explaining the bug and the regression test that protects the fix - New "Restore drill" subsection with three levels (unit tests, module round-trip, live Dalidou drill) and the cadence list - Failure-mode table gained four entries: restored_integrity_ok, Device-or-resource-busy, drill marker still present, chroma_snapshot_missing - "Open follow-ups" struck the restore_runtime_backup item (done) and added a "Done (historical)" note referencing 2026-04-09 - Quickstart cheat sheet now has a full drill one-liner using memory_type=episodic (the 2026-04-09 drill found the runbook's memory_type=note was invalid — the valid set is identity, preference, project, episodic, knowledge, adaptation) ## Status doc sync Long overdue — I've been landing code without updating the project's narrative state docs. 
docs/current-state.md: - "Reliability Baseline" now reflects: restore_runtime_backup is real with CLI, pre-restore safety snapshot, WAL cleanup, integrity check; live drill on 2026-04-09 surfaced and fixed Chroma bind-mount bug; deploy provenance via /health build_sha; deploy.sh self-update re-exec guard - "Immediate Next Focus" reshuffled: drill re-run (priority 1) and auto-capture (priority 2) are now ahead of retrieval quality work, reflecting the updated unblock sequence docs/next-steps.md: - New item 1: re-run the drill with chroma working end-to-end - New item 2: auto-capture conservative mode (Stop hook) - Old item 7 rewritten as item 9 listing what's DONE (create/list/validate/restore, admin/backup endpoint with include_chroma, /health provenance, self-update guard, procedure doc with failure modes) and what's still pending (retention cleanup, off-Dalidou target, auto-validation) ## Test count 226 passing (was 225 + 1 new inode-stability regression test). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 09:13:21 -04:00
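Sketch for `1a8fdf4225` (hypothetical helper name, same technique as the fix describes): clear the destination's contents child by child, then copy with `dirs_exist_ok=True`, so the destination directory itself is never removed or recreated.

```python
import shutil
from pathlib import Path


def replace_dir_contents(src, dst):
    """Make dst mirror src without unlinking or recreating dst itself."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for child in dst.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()
    # dirs_exist_ok (Python 3.8+) lets copytree write into the existing directory.
    shutil.copytree(src, dst, dirs_exist_ok=True)
```

Comparing `Path.stat().st_ino` of the destination before and after the call, as the regression test does, is a cheap stand-in for a real bind mount: an unchanged inode means the directory was never replaced.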
Anto01	336208004c	ops: add restore_runtime_backup + drill runbook Close the backup side of the loop: we had create/list/validate but no restore, and no documented drill. A backup you've never restored is not a backup. This lands the missing restore surface and the procedure to exercise it before enabling any write-path automation (auto-capture, automated ingestion, reinforcement sweeps). Code — src/atocore/ops/backup.py: - restore_runtime_backup(stamp, *, include_chroma, pre_restore_snapshot, confirm_service_stopped) performs: 1. validate_backup() gate — refuse on any error 2. pre-restore safety snapshot of current state (reversibility anchor) 3. PRAGMA wal_checkpoint(TRUNCATE) on target db (flush + release OS handles; Windows needs this after conn.backup() reads) 4. unlink stale -wal/-shm sidecars (tolerant to Windows lock races) 5. shutil.copy2 snapshot db over target 6. restore registry if snapshot captured one 7. restore Chroma tree if snapshot captured one and include_chroma resolves to true (defaults to whether backup has Chroma) 8. PRAGMA integrity_check on restored db, report result - Refuses without confirm_service_stopped=True to prevent hot-restore into a running service (would corrupt SQLite state) - Rewrote main() as argparse with 4 subcommands: create, list, validate, restore. 
`python -m atocore.ops.backup restore STAMP --confirm-service-stopped` is the drill CLI entry point, run via `docker compose run --rm --entrypoint python atocore` so it reuses the live service's volume mounts Tests — tests/test_backup.py (6 new): - test_restore_refuses_without_confirm_service_stopped - test_restore_raises_on_invalid_backup - test_restore_round_trip_reverses_post_backup_mutations (canonical drill flow: seed -> backup -> mutate -> restore -> mutation gone + baseline survived + pre-restore snapshot has the mutation captured as rollback anchor) - test_restore_round_trip_with_chroma - test_restore_skips_pre_snapshot_when_requested - test_restore_cleans_stale_wal_sidecars (asserts stale byte markers do not survive, not file existence, since PRAGMA integrity_check may legitimately recreate -wal) Docs — docs/backup-restore-drill.md (new): - What gets backed up (hot sqlite, cold chroma, registry JSON, metadata.json) and what doesn't (.env, source content) - What restore does, step by step, and why confirm_service_stopped is a hard gate - 8-step drill procedure: capture -> baseline -> mutate -> stop -> restore -> start -> verify marker gone -> optional cleanup - Correct endpoint bodies verified against routes.py: POST /admin/backup with JSON body {"include_chroma": true} POST /memory with memory_type/content/project/confidence GET /memory?project=drill to list drill markers POST /query with {"prompt": ..., "top_k": ...} (not "query") - Failure modes: integrity_check fail, container won't start, marker still present after restore, with remediation for each - When to run: before new write-path automation, after backup.py or schema changes, after infra bumps, monthly as standing check 225/225 tests passing (219 existing + 6 new restore). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 21:17:48 -04:00
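Sketch for `336208004c` (a reduced illustration of steps 3, 4, 5 and 8 only): the SQLite part of the restore can be approximated as checkpoint, sidecar cleanup, copy, integrity check. The function name `restore_sqlite` is hypothetical, and the validation gate, pre-restore snapshot, registry, and Chroma steps are deliberately left out.

```python
import shutil
import sqlite3
from pathlib import Path


def restore_sqlite(snapshot_db, target_db):
    """Copy snapshot_db over target_db and return the integrity_check result."""
    snapshot_db, target_db = Path(snapshot_db), Path(target_db)
    if target_db.exists():
        # Flush the WAL into the main file so no stale pages outlive the copy.
        conn = sqlite3.connect(target_db)
        try:
            conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        finally:
            conn.close()
    for suffix in ("-wal", "-shm"):
        sidecar = Path(str(target_db) + suffix)
        try:
            sidecar.unlink()
        except OSError:
            pass  # missing file or a lock race: tolerated, as the commit describes
    shutil.copy2(snapshot_db, target_db)
    conn = sqlite3.connect(target_db)
    try:
        # Returns "ok" when the restored database passes the check.
        return conn.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        conn.close()
```

This must only run with the service stopped, which is what the `--confirm-service-stopped` gate enforces in the real CLI.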
Anto01	be4099486c	deploy: add build_sha visibility for precise drift detection Make /health report the precise git SHA the container was built from, so 'is the live service current?' can be answered without ambiguity. 0.2.0 was too coarse to trust as a 'live is current' signal — many commits share the same __version__. Three layers: 1. /health endpoint (src/atocore/api/routes.py) - Reads ATOCORE_BUILD_SHA, ATOCORE_BUILD_TIME, ATOCORE_BUILD_BRANCH from environment, defaults to 'unknown' - Reports them alongside existing code_version field 2. docker-compose.yml - Forwards the three env vars from the host into the container - Defaults to 'unknown' so direct `docker compose up` runs (without deploy.sh) cleanly signal missing build provenance 3. deploy.sh - Step 2 captures git SHA + UTC timestamp + branch and exports them as env vars before `docker compose up -d --build` - Step 6 reads /health post-deploy and compares the reported build_sha against the freshly-built one. Mismatch exits non-zero (exit code 6) with a remediation hint covering cached image, env propagation, and concurrent restart cases Tests (tests/test_api_storage.py): - test_health_endpoint_reports_code_version_from_module - test_health_endpoint_reports_build_metadata_from_env - test_health_endpoint_reports_unknown_when_build_env_unset Docs (docs/dalidou-deployment.md): - Three-level drift detection table (code_version coarse, build_sha precise, build_time/branch forensic) - Canonical drift check script using LIVE_SHA vs EXPECTED_SHA - Note that running deploy.sh is itself the simplest drift check 219/219 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 20:25:32 -04:00
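Sketch for `be4099486c` (hypothetical helper names): reading the three build variables with an `unknown` default, plus the comparison the drift check performs. The environment variable names come from the commit message; the `build_time` and `build_branch` response keys and the decision to treat `unknown` as drift are assumptions.

```python
import os


def build_metadata(env=None):
    """Collect build provenance from the environment, defaulting to 'unknown'."""
    env = os.environ if env is None else env
    return {
        "build_sha": env.get("ATOCORE_BUILD_SHA", "unknown"),
        "build_time": env.get("ATOCORE_BUILD_TIME", "unknown"),
        "build_branch": env.get("ATOCORE_BUILD_BRANCH", "unknown"),
    }


def is_drifted(live_sha, expected_sha):
    """True when the live service cannot be confirmed to run expected_sha."""
    return live_sha == "unknown" or live_sha != expected_sha
```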
Anto01	b492f5f7b0	fix: schema init ordering, deploy.sh default, client BASE_URL docs Three issues Dalidou Claude surfaced during the first real deploy of commit `e877e5b` to the live service (report from 2026-04-08). Bug 1 was the critical one — a schema init ordering bug that would have bitten every future upgrade from a pre-Phase-9 schema — and the other two were usability traps around hostname resolution. Bug 1 (CRITICAL): schema init ordering -------------------------------------- src/atocore/models/database.py SCHEMA_SQL contained CREATE INDEX statements that referenced columns added later by _apply_migrations(): CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project); CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project); CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id); On a FRESH install, CREATE TABLE IF NOT EXISTS creates the tables with the Phase 9 shape (columns present), so the CREATE INDEX runs cleanly and _apply_migrations is effectively a no-op. On an UPGRADE from a pre-Phase-9 schema, CREATE TABLE IF NOT EXISTS is a no-op (the tables already exist in the old shape), the columns are NOT added yet, and the CREATE INDEX fails with "OperationalError: no such column: project" before _apply_migrations gets a chance to add the columns. Dalidou Claude hit this exactly when redeploying from 0.1.0 to 0.2.0 — had to manually ALTER TABLE to add the Phase 9 columns before the container could start. The fix is to remove the Phase 9-column indexes from SCHEMA_SQL. They already exist in _apply_migrations() AFTER the corresponding ALTER TABLE, so they still get created on both fresh and upgrade paths — just after the columns exist, not before. 
Indexes still in SCHEMA_SQL (all safe — reference columns that have existed since the first release): - idx_chunks_document on source_chunks(document_id) - idx_memories_type on memories(memory_type) - idx_memories_status on memories(status) - idx_interactions_project on interactions(project_id) Indexes moved to _apply_migrations (already there — just no longer duplicated in SCHEMA_SQL): - idx_memories_project on memories(project) - idx_interactions_project_name on interactions(project) - idx_interactions_session on interactions(session_id) - idx_interactions_created_at on interactions(created_at) Regression test: tests/test_database.py --------------------------------------- New test_init_db_upgrades_pre_phase9_schema_without_failing: - Seeds the DB with the exact pre-Phase-9 shape (no project / last_referenced_at / reference_count on memories; no project / client / session_id / response / memories_used / chunks_used on interactions) - Calls init_db() — which used to raise OperationalError before the fix - Verifies all Phase 9 columns are present after the call - Verifies the migration indexes exist Before the fix this test would have failed with "OperationalError: no such column: project" on the init_db call. After the fix it passes. This locks the invariant "init_db is safe on any legacy schema shape" so the bug can't silently come back. Full suite: 216 passing (was 215), 1 warning. The +1 is the new regression test. Bug 3 (usability): deploy.sh DNS default ---------------------------------------- deploy/dalidou/deploy.sh ATOCORE_GIT_REMOTE defaulted to http://dalidou:3000/Antoine/ATOCore.git which requires the "dalidou" hostname to resolve. On the Dalidou host itself it didn't (no /etc/hosts entry for localhost alias), so deploy.sh had to be run with the IP as a manual workaround. Fix: default ATOCORE_GIT_REMOTE to http://127.0.0.1:3000/Antoine/ATOCore.git. Loopback always works on the host running the script. Callers from a remote host (e.g. 
running deploy.sh from a laptop against the Dalidou LAN) set ATOCORE_GIT_REMOTE explicitly. The script header's Environment Variables section documents this with an explicit reference to the 2026-04-08 Dalidou deploy report so the rationale isn't lost. docs/dalidou-deployment.md gets a new "Troubleshooting hostname resolution" subsection and a new example invocation showing how to deploy from a remote host with an explicit ATOCORE_GIT_REMOTE override. Bug 2 (usability): atocore_client.py ATOCORE_BASE_URL documentation ------------------------------------------------------------------- scripts/atocore_client.py Same class of issue as bug 3. BASE_URL defaults to http://dalidou:8100 which resolves fine from a remote caller (laptop, T420/OpenClaw over Tailscale) but NOT from the Dalidou host itself or from inside the atocore container. Dalidou Claude saw the CLI return {"status": "unavailable", "fail_open": true} while direct curl to http://127.0.0.1:8100 worked. The fix here is NOT to change the default (remote callers are the common case and would break) but to DOCUMENT the override clearly so the next operator knows what's happening: - The script module docstring grew a new "Environment variables" section covering ATOCORE_BASE_URL, ATOCORE_TIMEOUT_SECONDS, ATOCORE_REFRESH_TIMEOUT_SECONDS, and ATOCORE_FAIL_OPEN, with the explicit override example for on-host/in-container use - It calls out the exact symptom (fail-open envelope when the base URL doesn't resolve) so the diagnosis is obvious from the error alone - docs/dalidou-deployment.md troubleshooting section mirrors this guidance so there's one place to look regardless of whether the operator starts with the client help or the deploy doc What this commit does NOT do ---------------------------- - Does NOT change the default ATOCORE_BASE_URL. Doing that would break the T420 OpenClaw helper and every remote caller who currently relies on the hostname. Documentation is the right fix for this case. 
- Does NOT fix /etc/hosts on Dalidou. That's a host-level configuration issue that the user can fix if they prefer having the hostname resolve; the deploy.sh fix makes it unnecessary regardless. - Does NOT re-run the validation on Dalidou. The next step is for the live service to pull this commit via deploy.sh (which should now work without the IP workaround) and re-run the Phase 9 loop test to confirm nothing regressed.	2026-04-08 19:02:57 -04:00
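Sketch for `b492f5f7b0` (a minimal reproduction with a cut-down, hypothetical table shape): on a legacy table, `CREATE TABLE IF NOT EXISTS` does nothing, so an index on a not-yet-added column fails until the migration has added the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Legacy shape: the table exists but has no "project" column yet.
conn.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, memory_type TEXT)")

# No-op on an existing table, even though the new definition has more columns.
conn.execute(
    "CREATE TABLE IF NOT EXISTS memories "
    "(id TEXT PRIMARY KEY, memory_type TEXT, project TEXT)"
)

# Creating the index before the migration fails on the upgrade path.
try:
    conn.execute("CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project)")
    index_failed_before_migration = False
except sqlite3.OperationalError:
    index_failed_before_migration = True

# Working order: add the missing column first, then create the index.
columns = {row[1] for row in conn.execute("PRAGMA table_info(memories)")}
if "project" not in columns:
    conn.execute("ALTER TABLE memories ADD COLUMN project TEXT DEFAULT ''")
conn.execute("CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project)")

index_names = {row[1] for row in conn.execute("PRAGMA index_list(memories)")}
```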
Anto01	fad30d5461	feat(client): Phase 9 reflection loop surface in shared operator CLI Codex's sequence step 3: finish the Phase 9 operator surface in the shared client. The previous client version (0.1.0) covered stable operations (project lifecycle, retrieval, context build, trusted state, audit-query) but explicitly deferred capture/extract/queue/promote/reject pending "exercised workflow". That deferral ran into a bootstrap problem: real Claude Code sessions can't exercise the Phase 9 loop without a usable client surface to drive it. This commit ships the 8 missing subcommands so the next step (real validation on Dalidou) is unblocked. Bumps CLIENT_VERSION from 0.1.0 to 0.2.0 per the semver rules in llm-client-integration.md (new subcommands = minor bump).
New subcommands in scripts/atocore_client.py
--------------------------------------------
| Subcommand            | Endpoint                        |
|-----------------------|---------------------------------|
| capture               | POST /interactions              |
| extract               | POST /interactions/{id}/extract |
| reinforce-interaction | POST /interactions/{id}/reinforce |
| list-interactions     | GET /interactions               |
| get-interaction       | GET /interactions/{id}          |
| queue                 | GET /memory?status=candidate    |
| promote               | POST /memory/{id}/promote       |
| reject                | POST /memory/{id}/reject        |
Each follows the existing client style: positional arguments with empty-string defaults for optional filters, truthy-string arguments for booleans (matching the existing refresh-project pattern), JSON output via print_json(), fail-open behavior inherited from request(). capture accepts prompt + response + project + client + session_id + reinforce as positionals, defaulting the client field to "atocore-client" when omitted so every capture from the shared client is identifiable in the interactions audit trail. extract defaults to preview mode (persist=false). Pass "true" as the second positional to create candidate memories. 
list-interactions and queue build URL query strings with url-encoded values and always include the limit, matching how the existing context-build subcommand handles its parameters. Security fix: ID-field URL encoding ----------------------------------- The initial draft used urllib.parse.quote() with the default safe set, which does NOT encode "/" because it's a reserved path character. That's a security footgun on ID fields: passing "promote mem/evil/action" would build /memory/mem/evil/action/promote and hit a completely different endpoint than intended. Fixed by passing safe="" to urllib.parse.quote() on every ID field (interaction_id and memory_id). The tests cover this explicitly via test_extract_url_encodes_interaction_id and test_promote_url_encodes_memory_id, both of which would have failed with the default behavior. Project names keep the default quote behavior because a project name with a slash would already be broken elsewhere in the system (ingest root resolution, file paths, etc). tests/test_atocore_client.py (new, 18 tests, all green) ------------------------------------------------------- A dedicated test file for the shared client that mocks the request() helper and verifies each subcommand: - calls the correct HTTP method and path - builds the correct JSON body (or query string) - passes the right subset of CLI arguments through - URL-encodes ID fields so path traversal isn't possible Tests are structured as unit tests (not integration tests) because the API surface on the server side already has its own route tests in test_api_storage.py and the Phase 9 specific files. These tests are the wiring contract between CLI args and HTTP calls. 
Test file highlights: - capture: default values, custom client, reinforce=false - extract: preview by default, persist=true opt-in, URL encoding - reinforce-interaction: correct path construction - list-interactions: no filters, single filter, full filter set (including ISO 8601 since parameter with T separator and Z) - get-interaction: fetch by id - queue: always filters status=candidate, accepts memory_type and project, coerces limit to int - promote / reject: correct path + URL encoding - test_phase9_full_loop_via_client_shape: end-to-end sequence that drives capture -> extract preview -> extract persist -> queue list -> promote -> reject through the shared client and verifies the exact sequence of HTTP calls that would be made These tests run in ~0.2s because they mock request() — no DB, no Chroma, no HTTP. The fast feedback loop matters because the client surface is what every agent integration eventually depends on. docs/architecture/llm-client-integration.md updates --------------------------------------------------- - New "Phase 9 reflection loop (shipped after migration safety work)" section under "What's in scope for the shared client today" with the full 8-subcommand table and a note explaining the bootstrap-problem rationale - Removed the "Memory review queue and reflection loop" section from "What's intentionally NOT in scope today"; backup admin and engineering-entity commands remain the only deferred families - Renumbered the deferred-commands list (was 3 items, now 2) - Open follow-ups updated: memory-review-subcommand item replaced with "real-usage validation of the Phase 9 loop" as the next concrete dependency - TL;DR updated to list the reflection-loop subcommands - Versioning note records the v0.1.0 -> v0.2.0 bump with the subcommands included Full suite: 215 passing (was 197), 1 warning. The +18 is tests/test_atocore_client.py. Runtime unchanged because the new tests don't touch the DB. 
What this commit does NOT do ---------------------------- - Does NOT change the server-side endpoints. All 8 subcommands call existing API routes that were shipped in Phase 9 Commits A/B/C. This is purely a client-side wiring commit. - Does NOT run the reflection loop against the live Dalidou instance. That's the next concrete step and is explicitly called out in the open-follow-ups section of the updated doc. - Does NOT modify the Claude Code slash command. It still pulls context only; the capture/extract/queue/promote companion commands (e.g. /atocore-record-response) are deferred until the capture workflow has been exercised in real use at least once. - Does NOT refactor the OpenClaw helper. That's a cross-repo change and remains a queued follow-up, now unblocked by the shared client having the reflection-loop subcommands.	2026-04-08 16:09:42 -04:00
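The ID-encoding fix described in this commit can be illustrated with a minimal sketch. The helper names `promote_path` and `promote_path_default` are hypothetical and are not taken from the client; only `urllib.parse.quote` and its `safe` parameter are real standard-library API.

```python
from urllib.parse import quote


def promote_path_default(memory_id: str) -> str:
    # Hypothetical helper: default safe="/" leaves slashes unencoded,
    # so the ID can spill into extra path segments.
    return f"/memory/{quote(memory_id)}/promote"


def promote_path(memory_id: str) -> str:
    # Hypothetical helper: safe="" percent-encodes "/" as well,
    # so the ID always stays a single path segment.
    return f"/memory/{quote(memory_id, safe='')}/promote"


if __name__ == "__main__":
    print(promote_path_default("mem/evil/action"))
    print(promote_path("mem/evil/action"))
```

With the default safe set the ID `mem/evil/action` produces a path that routes somewhere else; with `safe=""` the slashes become `%2F` and the route shape is preserved.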
Anto01	261277fd51	fix(migration): preserve superseded/invalid shadow state during rekey

Codex caught a real data-loss bug in the legacy alias migration shipped in `7e60f5a`. plan_state_migration filtered state rows to status='active' only, then apply_plan deleted the shadow projects row at the end. Because project_state.project_id has ON DELETE CASCADE, any superseded or invalid state rows still attached to the shadow project got silently cascade-deleted, which is exactly the audit loss a cleanup migration must not cause. This commit fixes the bug and adds regression tests that lock in the invariant "shadow state of every status is accounted for".

Root cause
----------
The query in scripts/migrate_legacy_aliases.py::plan_state_migration was:

"SELECT * FROM project_state WHERE project_id = ? AND status = 'active'"

which only found live rows. Any historical row (status in 'superseded' or 'invalid') was invisible to the plan, so the apply step had nothing to rekey for it. Then the shadow project row was deleted at the end, cascade-deleting every unplanned row.

The fix
-------
plan_state_migration now selects ALL state rows attached to the shadow project regardless of status, and handles every row per a per-status decision table:

| Shadow status | Canonical at same triple? | Values    | Action                         |
|---------------|---------------------------|-----------|--------------------------------|
| any           | no                        | n/a       | clean rekey                    |
| any           | yes                       | same      | shadow superseded in place     |
| active        | yes, active               | different | COLLISION, apply refuses       |
| active        | yes, inactive             | different | shadow wins, canonical deleted |
| inactive      | yes, any                  | different | historical drop (logged)       |

Four changes in the script:

1. SELECT drops the status filter so the plan walks every row.
2. New StateRekeyPlan.historical_drops list captures the shadow rows that lose to a canonical row at the same triple because the shadow is already inactive.
These are the only unavoidable data losses, and they happen because the UNIQUE(project_id, category, key) constraint on project_state doesn't allow two rows per triple regardless of status. 3. New apply action 'replace_inactive_canonical' for the shadow-active-vs-canonical-inactive case. At apply time the canonical inactive row is DELETEd first (SQLite's default immediate constraint checking) and then the shadow is UPDATEd into its place in two separate statements. Adds a new state_rows_replaced_inactive_canonical counter. 4. New apply counter state_rows_historical_dropped for audit transparency. The rows themselves are still cascade-deleted when the shadow project row is dropped, but they're counted and reported. Five places render_plan_text and plan_to_json_dict updated: - counts() gains state_historical_drops - render_plan_text prints a 'historical drops' section with each shadow-canonical pair and their statuses when there are any, so the operator sees the audit loss BEFORE running --apply - The new section explicitly tells the operator: "if any of these values are worth keeping as separate audit records, manually copy them out before running --apply" - plan_to_json_dict carries historical_drops into the JSON report - The state counts table in the human report now shows both 'state collisions (block)' and 'state historical drops' as separate lines so the operator can distinguish "apply will refuse" from "apply will drop historical rows" Regression tests (3 new, all green) ----------------------------------- tests/test_migrate_legacy_aliases.py: - test_apply_preserves_superseded_shadow_state_when_no_collision: the direct regression for the codex finding. Seeds a shadow with a superseded state row on a triple the canonical doesn't have, runs the migration, verifies via raw SQL that the row is now attached to the canonical projects row and still has status 'superseded'. This is the test that would have failed before the fix. 
- test_apply_drops_shadow_inactive_row_when_canonical_holds_same_triple: covers the unavoidable data-loss case. Seeds shadow superseded + canonical active at the same triple with different values, verifies plan.counts() reports one historical_drop, runs apply, verifies the canonical value is preserved and the shadow value is gone. - test_apply_replaces_inactive_canonical_with_active_shadow: covers the cross-contamination case where shadow has live value and canonical has a stale invalid row. Shadow wins by deleting canonical and rekeying in its place. Verifies the counter and the final state. Plus _seed_state_row now accepts a status kwarg so the seeding helper can create superseded/invalid rows directly. test_dry_run_on_empty_registry_reports_empty_plan was updated to include the new state_historical_drops key in the expected counts dict (all zero for an empty plan, so the test shape is the same). Full suite: 197 passing (was 194), 1 warning. The +3 is the three new regression tests. What this commit does NOT do ---------------------------- - Does NOT try to preserve historical shadow rows that collide with a canonical row at the same triple. That would require a schema change (adding (id) to the UNIQUE key, or a separate history table) and isn't in scope for a cleanup migration. The operator sees these as explicit 'historical drops' in the plan output and can copy them out manually if any are worth preserving. - Does NOT change any behavior for rows that were already reachable from the canonicalized read path. The fix only affects legacy rows whose project_id points at a shadow row. - Does NOT re-verify the earlier happy-path tests beyond the full suite confirming them still green.	2026-04-08 15:52:44 -04:00
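The per-status decision table from this commit can be read as a small pure function. The sketch below is an assumption about how such a decision could be coded, not the script's actual implementation: `decide_action` and most of the returned labels are hypothetical, and only `replace_inactive_canonical` is an action name taken from the commit message.

```python
from typing import Optional


def decide_action(shadow_status: str,
                  canonical_status: Optional[str],
                  same_value: bool) -> str:
    """Map one shadow state row to a migration action.

    canonical_status is None when the canonical project has no row at the
    same (category, key) triple. Labels other than
    'replace_inactive_canonical' are illustrative.
    """
    if canonical_status is None:
        return "clean_rekey"                 # any / no canonical row
    if same_value:
        return "supersede_shadow_in_place"   # any / yes / same value
    if shadow_status == "active":
        if canonical_status == "active":
            return "collision"               # apply refuses
        return "replace_inactive_canonical"  # shadow wins
    return "historical_drop"                 # inactive shadow loses, logged


if __name__ == "__main__":
    print(decide_action("superseded", None, False))
    print(decide_action("active", "invalid", False))
```

Keeping the decision separate from the SQL makes each row of the table checkable without a database.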
Anto01	7e60f5a0e6	feat(ops): legacy alias migration script with dry-run/apply modes Closes the compatibility gap documented in docs/architecture/project-identity-canonicalization.md. Before `fb6298a`, writes to project_state, memories, and interactions stored the raw project name. After `fb6298a` every service-layer entry point canonicalizes through the registry, which silently made pre-fix alias-keyed rows unreachable from the new read path. Now there's a migration tool to find and fix them. This commit is the tool and its tests. The tool is NOT run against the live Dalidou DB in this commit — that's a separate supervised manual step after reviewing the dry-run output. scripts/migrate_legacy_aliases.py --------------------------------- Standalone offline migration tool. Dry-run default, --apply explicit. What it inspects: - projects: rows whose name is a registered alias and differs from the canonical project_id (shadow rows) - project_state: rows whose project_id points at a shadow; plan rekeys them to the canonical row's id. (category, key) collisions against the canonical block the apply step until a human resolves - memories: rows whose project column is a registered alias. Plain string rekey. Dedup collisions (after rekey, same (memory_type, content, project, status)) are handled by the existing memory supersession model: newer row stays active, older becomes superseded with updated_at as tiebreaker - interactions: rows whose project column is a registered alias. Plain string rekey, no collision handling What it does NOT do: - Never touches rows that are already canonical - Never auto-resolves project_state collisions (refuses until the human picks a winner via POST /project/state) - Never creates data; only rekeys or supersedes - Never runs outside a single SQLite transaction; any failure rolls back the entire migration Safety rails: - Dry-run is default. --apply is explicit. 
- Apply on empty plan refuses unless --allow-empty (prevents accidental runs that look meaningful but did nothing) - Apply refuses on any project_state collision - Apply refuses on integrity errors (e.g. two case-variant rows both matching the canonical lookup) - Writes a JSON report to data/migrations/ on every run (dry-run and apply alike) for audit - Idempotent: running twice produces the same final state as running once. The second run finds zero shadow rows and exits clean. CLI flags: --registry PATH override ATOCORE_PROJECT_REGISTRY_PATH --db PATH override the AtoCore SQLite DB path --apply actually mutate (default is dry-run) --allow-empty permit --apply on an empty plan --report-dir PATH where to write the JSON report --json emit the plan as JSON instead of human prose Smoke test against the Phase 9 validation DB produces the expected "Nothing to migrate. The database is clean." output with 4 known canonical projects and 0 shadows. tests/test_migrate_legacy_aliases.py ------------------------------------ 19 new tests, all green: Plan-building: - test_dry_run_on_empty_registry_reports_empty_plan - test_dry_run_on_clean_registered_db_reports_empty_plan - test_dry_run_finds_shadow_project - test_dry_run_plans_state_rekey_without_collisions - test_dry_run_detects_state_collision - test_dry_run_plans_memory_rekey_and_supersession - test_dry_run_plans_interaction_rekey Apply: - test_apply_refuses_on_state_collision - test_apply_migrates_clean_shadow_end_to_end (verifies get_state can see the state via BOTH the alias AND the canonical after migration) - test_apply_drops_shadow_state_duplicate_without_collision (same (category, key, value) on both sides - mark shadow superseded, don't hit the UNIQUE constraint) - test_apply_migrates_memories - test_apply_migrates_interactions - test_apply_is_idempotent - test_apply_refuses_with_integrity_errors (uses case-variant canonical rows to work around projects.name UNIQUE constraint; verifies the case-insensitive duplicate 
detection works) Reporting: - test_plan_to_json_dict_is_serializable - test_write_report_creates_file - test_render_plan_text_on_empty_plan - test_render_plan_text_on_collision End-to-end gap closure (the most important test): - test_legacy_alias_gap_is_closed_after_migration - Seeds the exact same scenario as test_legacy_alias_keyed_state_is_invisible_until_migrated in test_project_state.py (which documents the pre-migration gap) - Confirms the row is invisible before migration - Runs the migration - Verifies the row is reachable via BOTH the canonical id AND the alias afterward - This test and the pre-migration gap test together lock in "before migration: invisible, after migration: reachable" as the documented invariant Full suite: 194 passing (was 175), 1 warning. The +19 is the new migration test file. Next concrete step after this commit ------------------------------------ - Run the dry-run against the live Dalidou DB to find out the actual blast radius. The script is the inspection SQL, codified. - Review the dry-run output together - If clean (zero shadows), no apply needed; close the doc gap as "verified nothing to migrate on this deployment" - If there are shadows, resolve any collisions via POST /project/state, then run --apply under supervision - After apply, the test_legacy_alias_keyed_state_is_invisible_until_migrated test still passes (it simulates the gap directly, so it's independent of the live DB state) and the gap-closed companion test continues to guard forward	2026-04-08 15:08:16 -04:00
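The safety rails listed in this commit (dry-run by default, explicit --apply, refusal on empty plans, collisions, and integrity errors) could be wired roughly as follows. This is a sketch under assumptions: the flag names and the data/migrations report location come from the commit message, while `build_parser`, `should_mutate`, the defaults, and the error strings are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the commit message; defaults are assumptions.
    p = argparse.ArgumentParser(prog="migrate_legacy_aliases.py")
    p.add_argument("--registry", metavar="PATH", default=None,
                   help="override ATOCORE_PROJECT_REGISTRY_PATH")
    p.add_argument("--db", metavar="PATH", default=None,
                   help="override the AtoCore SQLite DB path")
    p.add_argument("--apply", action="store_true",
                   help="actually mutate (default is dry-run)")
    p.add_argument("--allow-empty", action="store_true",
                   help="permit --apply on an empty plan")
    p.add_argument("--report-dir", metavar="PATH", default="data/migrations",
                   help="where to write the JSON report")
    p.add_argument("--json", action="store_true",
                   help="emit the plan as JSON instead of human prose")
    return p


def should_mutate(args: argparse.Namespace, plan_is_empty: bool,
                  has_collisions: bool, has_integrity_errors: bool) -> bool:
    """Return True only when every safety rail allows writing."""
    if not args.apply:
        return False  # dry-run: never touch the database
    if has_collisions or has_integrity_errors:
        raise SystemExit("apply refused: resolve collisions/integrity errors first")
    if plan_is_empty and not args.allow_empty:
        raise SystemExit("apply refused: empty plan (pass --allow-empty to override)")
    return True


if __name__ == "__main__":
    ns = build_parser().parse_args()
    print("would mutate:", should_mutate(ns, True, False, False))
```

Putting the refusal logic in one function keeps the dry-run path unable to reach any write statement.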
Anto01	1953e559f9	docs+test: clarify legacy alias compatibility gap, add gap regression test Codex caught a real documentation accuracy bug in the previous canonicalization doc commit (`f521aab`). The doc claimed that rows written under aliases before `fb6298a` "still work via the unregistered-name fallback path" — that is wrong for REGISTERED aliases, which is exactly the case that matters. The unregistered-name fallback only saves you when the project was never in the registry: a row stored under "orphan-project" is read back via "orphan-project", both pass through resolve_project_name unchanged, and the strings line up. For a registered alias like "p05", the helper rewrites the read key to "p05-interferometer" but does NOT rewrite the storage key, so the legacy row becomes silently invisible. This commit corrects the doc and locks the gap behavior in with a regression test, so the issue cannot be lost again. docs/architecture/project-identity-canonicalization.md ------------------------------------------------------ - Removed the misleading claim from the "What this rule does NOT cover" section. Replaced with a pointer to the new gap section and an explicit statement that the migration is required before engineering V1 ships. - New "Compatibility gap: legacy alias-keyed rows" section between "Why this is the trust hierarchy in action" and "The rule for new entry points". This is the natural insertion point because the gap is exactly the trust hierarchy failing for legacy data. 
The section covers: * a worked T0/T1 timeline showing the exact failure mode * what is at risk on the live Dalidou DB, ranked by trust tier: projects table (shadow rows), project_state (highest risk because Layer 3 is most-authoritative), memories, interactions * inspection SQL queries for measuring the actual blast radius on the live DB before running any migration * the spec for the migration script: walk projects, find shadow rows, merge dependent state via the conflict model when there are collisions, dry-run mode, idempotent * explicit statement that this is required pre-V1 because V1 will add new project-keyed tables and the killer correctness queries from engineering-query-catalog.md would report wrong results against any project that has shadow rows - "Open follow-ups" item 1 promoted from "tracked optional" to "REQUIRED before engineering V1 ships, NOT optional" with a more honest cost estimate (~150 LOC migration + ~50 LOC tests + supervised live run, not the previous optimistic ~30 LOC) - TL;DR rewritten to mention the gap explicitly and re-order the open follow-ups so the migration is the top priority tests/test_project_state.py --------------------------- - New test_legacy_alias_keyed_state_is_invisible_until_migrated - Inserts a "p05" project row + a project_state row pointing at it via raw SQL (bypassing set_state which now canonicalizes), simulating a pre-fix legacy row - Verifies the canonicalized get_state path can NOT see the row via either the alias or the canonical id — this is the bug - Verifies the row is still in the database (just unreachable), so the migration script has something to find - The docstring explicitly says: "When the legacy alias migration script lands, this test must be inverted." Future readers will know exactly when and how to update it. Full suite: 175 passing (was 174), 1 warning. The +1 is the new gap regression test. 
What this commit does NOT do ---------------------------- - The migration script itself is NOT in this commit. Codex's finding was a doc accuracy issue, and the right scope is fix the doc + lock the gap behavior in. Writing the migration is the next concrete step but is bigger (~200 LOC + dry-run mode + collision handling via the conflict model + supervised run on the live Dalidou DB), warrants its own commit, and probably warrants a "draft + review the dry-run output before applying" workflow rather than a single shot. - Existing tests are unchanged. The new test stands alone as a documented gap; the 12 canonicalization tests from `fb6298a` still pass without modification.	2026-04-07 20:14:19 -04:00
Anto01	fb6298a9a1	fix(P1+P2): canonicalize project names at every trust boundary Three findings from codex's review of the previous P1+P2 fix. The earlier commit (`f2372ef`) only fixed alias resolution at the context builder. Codex correctly pointed out that the same fragmentation applies at every other place a project name crosses a boundary — project_state writes/reads, interaction capture/listing/filtering, memory create/queries, and reinforcement's downstream queries. Plus a real bug in the interaction `since` filter where the storage format and the documented ISO format don't compare cleanly. The fix is one helper used at every boundary instead of duplicating the resolution inline. New helper: src/atocore/projects/registry.py::resolve_project_name --------------------------------------------------------------- - Single canonicalization boundary for project names - Returns the canonical project_id when the input matches any registered id or alias - Returns the input unchanged for empty/None and for unregistered names (preserves backwards compat with hand-curated state that predates the registry) - Documented as the contract that every read/write at the trust boundary should pass through P1 — Trusted Project State endpoints ------------------------------------ src/atocore/context/project_state.py: set_state, get_state, and invalidate_state now all canonicalize project_name through resolve_project_name BEFORE looking up or creating the project row. 
Before this fix: - POST /project/state with project="p05" called ensure_project("p05") which created a separate row in the projects table - The state row was attached to that alias project_id - Later context builds canonicalized "p05" -> "p05-interferometer" via the builder fix from `f2372ef` and never found the state - Result: trusted state silently fragmented across alias rows After this fix: - The alias is resolved to the canonical id at every entry point - Two captures (one via "p05", one via "p05-interferometer") write to the same row - get_state via either alias or the canonical id finds the same row Fixes the highest-priority gap codex flagged because Trusted Project State is supposed to be the most dependable layer in the AtoCore trust hierarchy. P2.a — Interaction capture project canonicalization ---------------------------------------------------- src/atocore/interactions/service.py: record_interaction now canonicalizes project before storing, so interaction.project is always the canonical id regardless of what the client passed. Downstream effects: - reinforce_from_interaction queries memories by interaction.project -> previously missed memories stored under canonical id -> now consistent because interaction.project IS the canonical id - the extractor stamps candidates with interaction.project -> previously created candidates in alias buckets -> now creates candidates in the canonical bucket - list_interactions(project=alias) was already broken, now fixed by canonicalizing the filter input on the read side too Memory service applied the same fix: - src/atocore/memory/service.py: create_memory and get_memories both canonicalize project through resolve_project_name - This keeps stored memory.project consistent with the reinforcement query path P2.b — Interaction `since` filter format normalization ------------------------------------------------------ src/atocore/interactions/service.py: new _normalize_since helper. 
The bug: - created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by convention) so it sorts lexically and compares cleanly with the SQLite CURRENT_TIMESTAMP default - The `since` parameter was documented as ISO 8601 but compared as a raw string against the storage format - The lexically-greater 'T' separator means an ISO timestamp like '2026-04-07T12:00:00Z' is GREATER than the storage form '2026-04-07 12:00:00' for the same instant - Result: a client passing ISO `since` got an empty result for any row from the same day, even though those rows existed and were technically "after" the cutoff in real-world time The fix: - _normalize_since accepts ISO 8601 with T, optional Z suffix, optional fractional seconds, optional +HH:MM offsets - Uses datetime.fromisoformat for parsing (Python 3.11+) - Converts to UTC and reformats as the storage format before the SQL comparison - The bare storage format still works (backwards compat path is a regex match that returns the input unchanged) - Unparseable input is returned as-is so the comparison degrades gracefully (rows just don't match) instead of raising and breaking the listing endpoint builder.py refactor ------------------- The previous P1 fix had inline canonicalization. 
Now it uses the shared helper for consistency: - import changed from get_registered_project to resolve_project_name - the inline lookup is replaced with a single helper call - the comment block now points at representation-authority.md for the canonicalization contract New shared test fixture: tests/conftest.py::project_registry ------------------------------------------------------------ - Standardizes the registry-setup pattern that was duplicated across test_context_builder.py, test_project_state.py, test_interactions.py, and test_reinforcement.py - Returns a callable that takes (project_id, [aliases]) tuples and writes them into a temp registry file with the env var pointed at it and config.settings reloaded - Used by all 12 new regression tests in this commit Tests (12 new, all green on first run) -------------------------------------- test_project_state.py: - test_set_state_canonicalizes_alias: write via alias, read via every alias and the canonical id, verify same row id - test_get_state_canonicalizes_alias_after_canonical_write - test_invalidate_state_canonicalizes_alias - test_unregistered_project_state_still_works (backwards compat) test_interactions.py: - test_record_interaction_canonicalizes_project - test_list_interactions_canonicalizes_project_filter - test_list_interactions_since_accepts_iso_with_t_separator - test_list_interactions_since_accepts_z_suffix - test_list_interactions_since_accepts_offset - test_list_interactions_since_storage_format_still_works test_reinforcement.py: - test_reinforcement_works_when_capture_uses_alias (end-to-end: capture under alias, seed memory under canonical, verify reinforcement matches) - test_get_memories_filter_by_alias Full suite: 174 passing (was 162), 1 warning. The +12 is the new regression tests, no existing tests regressed. 
What's still NOT canonicalized (and why) ---------------------------------------- - _rank_chunks's secondary substring boost in builder.py — the retriever already does the right thing via its own _project_match_boost which calls get_registered_project. The redundant secondary boost still uses the raw hint but it's a multiplicative factor on top of correct retrieval, not a filter, so it can't drop relevant chunks. Tracked as a future cleanup but not a P1. - update_memory's project field (you can't change a memory's project after creation in the API anyway). - The retriever's project_hint parameter on direct /query calls — same reasoning as the builder boost, plus the retriever's own get_registered_project call already handles aliases there.	2026-04-07 08:29:33 -04:00
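The `since` normalization described under P2.b can be sketched as below. This is not the project's `_normalize_since`; the function name `normalize_since`, the regex, and the choice to treat a naive ISO timestamp as UTC are assumptions. The sketch also rewrites a trailing `Z` to `+00:00` by hand so it does not depend on the Python 3.11+ `datetime.fromisoformat` behavior the commit relies on.

```python
import re
from datetime import datetime, timezone

# Storage format from the commit message: 'YYYY-MM-DD HH:MM:SS', UTC by convention.
_STORAGE_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")


def normalize_since(since: str) -> str:
    """Convert an ISO 8601 cutoff to the storage format; pass through otherwise."""
    if not since or _STORAGE_RE.match(since):
        return since  # already in storage format (backwards-compat path)
    text = (since[:-1] + "+00:00") if since.endswith("Z") else since
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return since  # unparseable: degrade gracefully, rows just won't match
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # 'T' (0x54) sorts after ' ' (0x20), which is the original bug:
    print("2026-04-07T12:00:00Z" > "2026-04-07 23:59:59")
    print(normalize_since("2026-04-07T08:00:00-04:00"))
```

After normalization both sides of the SQL comparison use the same lexically sortable format, so same-day rows are no longer excluded.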
Anto01	f2372eff9e	fix(P1+P2): alias-aware project state lookup + slash command corpus fallback Two regression fixes from codex's review of the slash command refactor commit (`78d4e97`). Both findings are real and now have covered tests. P1 — server-side alias resolution for project_state lookup ---------------------------------------------------------- The bug: - /context/build forwarded the caller's project hint verbatim to get_state(project_hint), which does an exact-name lookup against the projects table (case-insensitive but no alias resolution) - the project registry's alias matching was only used by the client's auto-context path and the retriever's project-match boost, never by the server's project_state lookup - consequence: /atocore-context "... p05" would silently miss trusted project state stored under the canonical id "p05-interferometer", weakening project-hinted retrieval to the point that an explicit alias hint was worse than no hint The fix in src/atocore/context/builder.py: - import get_registered_project from the projects registry - before calling get_state(project_hint), resolve the hint through get_registered_project; if a registry record exists, use the canonical project_id for the state lookup - if no registry record exists, fall back to the raw hint so a hand-curated project_state entry that predates the registry still works (backwards compat with pre-registry deployments) The retriever already does its own alias expansion via get_registered_project for the project-match boost, so the retriever side was never broken — only the project_state lookup in the builder. The fix is scoped to that one call site. Tests added in tests/test_context_builder.py: - test_alias_hint_resolves_through_registry: stands up a fresh registry, sets state under "p05-interferometer", then verifies build_context with project_hint="p05" finds the state, AND with project_hint="interferometer" (the second alias) finds it too, AND with the canonical id finds it. 
Covers all three resolution paths. - test_unknown_hint_falls_back_to_raw_lookup: empty registry, set state under an unregistered project name, verify the build_context call with that name as the hint still finds the state. Locks in the backwards-compat behavior. P2 — slash command no-hint fallback to corpus-wide context build ---------------------------------------------------------------- The bug: - the slash command's no-hint path called auto-context, which returns {"status": "no_project_match"} when project detection fails and does NOT fall back to a plain context-build - the slash command's own help text told the user "call without a hint to use the corpus-wide context build" — which was a lie because the wrapper no longer did that - consequence: generic prompts like "what changed in AtoCore backup policy?" or any cross-project question got a useless no_project_match envelope instead of a context pack The fix in .claude/commands/atocore-context.md: - the no-hint path now does the 2-step fallback dance: 1. try `auto-context "<prompt>"` for project detection 2. 
if the response contains "no_project_match", fall back to `context-build "<prompt>"` (no project arg) - both branches return a real context pack, fail-open envelope is preserved for genuine network errors - the underlying client surface is unchanged (no new flags, no new subcommands) — the fallback is per-frontend logic in the slash command, leaving auto-context's existing semantics intact for OpenClaw and any other caller that depends on the no_project_match envelope as a "do nothing" signal While I was here, also tightened the slash command's argument parsing to delegate alias-knowledge to the registry instead of embedding a hardcoded list: - old version had a literal list of "atocore", "p04", "p05", "p06" and their aliases that needed manual maintenance every time a project was added - new version takes the last token of $ARGUMENTS and asks the client's `detect-project` subcommand whether it's a known alias; if matched, it's the explicit hint, if not it's part of the prompt - this delegates registry knowledge to the registry, where it belongs Unrelated improvement noted but NOT fixed in this commit: - _rank_chunks in builder.py also has a naive substring boost that uses the original hint without alias expansion. The retriever already does the right thing, so this secondary boost is redundant. Tracked as a future cleanup but not in scope for the P1/P2 fix; codex's findings are about project_state lookup, not about the secondary chunk boost. Full suite: 162 passing (was 160), 1 warning. The +2 is the two new P1 regression tests.	2026-04-07 07:47:03 -04:00
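The P1 rule (resolve the hint through the registry, otherwise fall back to the raw hint) can be shown with an in-memory registry. This is an illustrative sketch only: `canonical_project` and the dict-shaped registry are hypothetical stand-ins for the real registry lookup, and case-insensitive matching is an assumption. The ids and aliases mirror the ones named in the tests above.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory registry: canonical project_id -> aliases.
REGISTRY: Dict[str, List[str]] = {
    "p05-interferometer": ["p05", "interferometer"],
}


def canonical_project(hint: Optional[str],
                      registry: Dict[str, List[str]]) -> Optional[str]:
    """Return the canonical id for a registered id or alias, else the raw hint."""
    if not hint:
        return hint  # empty/None passes through unchanged
    wanted = hint.strip().lower()
    for project_id, aliases in registry.items():
        if wanted == project_id.lower() or wanted in (a.lower() for a in aliases):
            return project_id
    return hint  # unregistered name: keep the raw hint (backwards compat)


if __name__ == "__main__":
    for name in ("p05", "interferometer", "p05-interferometer", "orphan-project"):
        print(name, "->", canonical_project(name, REGISTRY))
```

The fallback branch is what keeps hand-curated state under unregistered names reachable.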
Anto01	53147d326c	feat(phase9-C): rule-based candidate extractor and review queue Phase 9 Commit C. Closes the capture loop: Commit A records what AtoCore fed the LLM and what came back, Commit B bumps confidence on active memories the response actually references, and this commit turns structured cues in the response into candidate memories for a human review queue. Nothing extracted here is ever automatically promoted into trusted state. Every candidate sits at status="candidate" until a human (or later, a confident automatic policy) calls /memory/{id}/promote or /memory/{id}/reject. This keeps the "bad memory is worse than no memory" invariant from the operating model intact. New module: src/atocore/memory/extractor.py - MemoryCandidate dataclass (type, content, rule, source_span, project, confidence, source_interaction_id) - extract_candidates_from_interaction(interaction): runs a fixed set of regex rules over the response + response_summary and returns a list of candidates V0 rule set (deliberately narrow to keep false positives low): - decision_heading ## Decision: / ## Decision - / ## Decision — -> adaptation candidate - constraint_heading ## Constraint: ... -> project candidate - requirement_heading ## Requirement: ... -> project candidate - fact_heading ## Fact: ... 
-> knowledge candidate - preference_sentence "I prefer X" / "the user prefers X" -> preference candidate - decided_to_sentence "decided to X" -> adaptation candidate - requirement_sentence "the requirement is X" -> project candidate Extractor post-processing: - clean_value: collapse whitespace, strip trailing punctuation - min content length 8 chars, max 280 (keeps candidates reviewable) - dedupe by (memory_type, normalized value, rule) - drop candidates whose content already matches an active memory of the same type+project so the queue doesn't ask humans to re-curate things they already promoted Memory service (extends Commit B candidate-status foundation): - promote_memory(id): candidate -> active (404 if not a candidate) - reject_candidate_memory(id): candidate -> invalid - both are no-ops if the target isn't currently a candidate so the API can surface 404 without the caller needing to pre-check API endpoints (new): - POST /interactions/{id}/extract run extractor, preview-only body: {"persist": false} (default) returns candidates {"persist": true} creates candidate memories - POST /memory/{id}/promote candidate -> active - POST /memory/{id}/reject candidate -> invalid - GET /memory?status=candidate list review queue explicitly (existing endpoint now accepts status= override) - GET /memory now also returns reference_count and last_referenced_at per memory so the Commit B reinforcement signal is visible to clients Trust model unchanged: - candidates NEVER appear in context packs (get_memories_for_context still filters to active via the active_only default) - candidates NEVER get reinforced by the Commit B loop (reinforcement refuses non-active memories) - trusted project state is untouched end-to-end Tests (25 new, all green): - heading pattern: decision, constraint, requirement, fact - separator variants :, -, em-dash - sentence patterns: preference, decided_to, requirement - rejects too-short matches - dedupes identical matches - strips trailing punctuation - 
carries project and source_interaction_id onto candidates - drops candidates that duplicate an existing active memory - returns empty for prose without structural cues - candidate and active coexist in the memory table - promote_memory moves candidate -> active - promote on non-candidate returns False - reject_candidate_memory moves candidate -> invalid - reject on non-candidate returns False - get_memories(status="candidate") returns just the queue - POST /interactions/{id}/extract preview-only path - POST /interactions/{id}/extract persist=true path - POST /interactions/{id}/extract 404 for missing interaction - POST /memory/{id}/promote success + 404 on non-candidate - POST /memory/{id}/reject 404 on missing - GET /memory?status=candidate surfaces the queue - GET /memory?status=<invalid> returns 400 Full suite: 160 passing (was 135). What Phase 9 looks like end to end after this commit ---------------------------------------------------- prompt -> context pack assembled -> LLM response -> POST /interactions (capture) -> automatic Commit B reinforcement (active memories only) -> [optional] POST /interactions/{id}/extract -> Commit C extractor proposes candidates -> human reviews via GET /memory?status=candidate -> POST /memory/{id}/promote (candidate -> active) OR POST /memory/{id}/reject (candidate -> invalid) Not in this commit (deferred on purpose): - Decay of unused memories (we keep reference_count and last_referenced_at so a later decay job has the signal it needs) - LLM-based extractor as an alternative to the regex rules - Automatic promotion of high-confidence candidates - Candidate-to-entity upgrade path (needs the engineering layer memory-vs-entities decision, planned in a coming architecture doc)	2026-04-06 21:24:17 -04:00
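A reduced sketch of the heading rules and post-processing from this commit follows. It covers only the four heading rules, and it is not the module's code: `Candidate`, `extract_candidates`, and the exact regex are assumptions, and dropping (rather than truncating) content outside the 8 to 280 character range is also an assumption. The rule names, type mapping, and `clean_value` behavior follow the commit message.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    memory_type: str
    content: str
    rule: str


# (rule name, memory type, heading label), as listed in the V0 rule set.
_HEADING_RULES = [
    ("decision_heading", "adaptation", "Decision"),
    ("constraint_heading", "project", "Constraint"),
    ("requirement_heading", "project", "Requirement"),
    ("fact_heading", "knowledge", "Fact"),
]


def _pattern(label: str) -> "re.Pattern[str]":
    # Accepts ':', '-', or an em-dash (U+2014) as the separator.
    return re.compile(rf"^##[ \t]*{label}[ \t]*[:\-\u2014][ \t]*(.+)$",
                      re.MULTILINE | re.IGNORECASE)


def clean_value(raw: str) -> str:
    """Collapse whitespace and strip trailing punctuation."""
    return " ".join(raw.split()).rstrip(" .,;:!?")


def extract_candidates(text: str) -> List[Candidate]:
    seen = set()
    out: List[Candidate] = []
    for rule, memory_type, label in _HEADING_RULES:
        for match in _pattern(label).finditer(text):
            value = clean_value(match.group(1))
            if not (8 <= len(value) <= 280):
                continue  # assumption: out-of-range content is dropped
            key = (memory_type, value.lower(), rule)
            if key in seen:
                continue  # dedupe by (type, normalized value, rule)
            seen.add(key)
            out.append(Candidate(memory_type, value, rule))
    return out


if __name__ == "__main__":
    sample = "## Decision: Use SQLite for the registry.\n## Fact: short\n"
    for cand in extract_candidates(sample):
        print(cand)
```

The sentence rules and the check against existing active memories are left out here to keep the sketch short.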
Anto01	2704997256	feat(phase9-B): reinforce active memories from captured interactions Phase 9 Commit B from the agreed plan. With Commit A capturing what AtoCore fed to the LLM and what came back, this commit closes the weakest part of the loop: when a memory is actually referenced in a response, its confidence should drift up, and stale memories that nobody ever mentions should stay where they are. This is reinforcement only: nothing is promoted into trusted state and no candidates are created. Extraction is Commit C. Schema (additive migration): - memories.last_referenced_at DATETIME (null by default) - memories.reference_count INTEGER DEFAULT 0 - idx_memories_last_referenced on last_referenced_at - memories.status now accepts the new "candidate" value so Commit C has the status slot to land on. Existing active/superseded/invalid rows are untouched. New module: src/atocore/memory/reinforcement.py - reinforce_from_interaction(interaction): scans the interaction's response + response_summary for echoes of active memories and bumps confidence / reference_count for each match - matching is intentionally simple and explainable: * normalize both sides (lowercase, collapse whitespace) * require >= 12 chars of memory content to match * compare the leading 80-char window of each memory - the candidate pool is project-scoped memories for the interaction's project + global identity + preference memories, deduplicated - candidates and invalidated memories are NEVER reinforced; only active memories move Memory service changes: - MEMORY_STATUSES = ["candidate", "active", "superseded", "invalid"] - create_memory(status="candidate"|"active"|...) with per-status duplicate scoping so a candidate and an active with identical text can legitimately coexist during review - get_memories(status=...)
explicit override of the legacy active_only flag; callers can now list the review queue cleanly - update_memory accepts any valid status including "candidate" - reinforce_memory(id, delta): low-level primitive that bumps confidence (capped at 1.0), increments reference_count, and sets last_referenced_at. Applies only to active memories; returns (applied, old, new) - promote_memory / reject_candidate_memory helpers prepping Commit C Interactions service: - record_interaction(reinforce=True) runs reinforce_from_interaction automatically when the interaction has response content. Reinforcement errors are logged but never raised back to the caller, so capture itself is never blocked by a flaky downstream step. - circular import between interactions service and memory.reinforcement avoided by lazy import inside the function API: - POST /interactions now accepts a reinforce bool field (default true) - POST /interactions/{id}/reinforce runs reinforcement on an existing captured interaction, useful for backfilling or for retrying after a transient error in the automatic pass - response lists which memory ids were reinforced with old / new confidence for audit Tests (17 new, all green): - reinforce_memory bumps, caps at 1.0, accumulates reference_count - reinforce_memory rejects candidates and missing ids - reinforce_memory rejects negative delta - reinforce_from_interaction matches active memory - reinforce_from_interaction ignores candidates and inactive - reinforce_from_interaction requires minimum content length - reinforce_from_interaction handles empty response cleanly - reinforce_from_interaction normalizes casing and whitespace - reinforce_from_interaction deduplicates across memory buckets - record_interaction auto-reinforces by default - record_interaction reinforce=False skips the pass - record_interaction handles empty response - POST /interactions/{id}/reinforce runs against stored interaction - POST /interactions/{id}/reinforce returns 404 for missing id - POST /interactions
accepts reinforce=false Full suite: 135 passing (was 118). Trust model unchanged: - reinforcement only moves confidence within the existing active set - the candidate lifecycle is declared but only Commit C will actually create candidate memories - trusted project state is never touched by reinforcement Next: Commit C adds the rule-based extractor that produces candidate memories from captured interactions plus the promote/reject review queue endpoints.	2026-04-06 21:18:38 -04:00
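The matching rule in this commit (normalize both sides, require at least 12 characters of memory content, compare the leading 80-character window) and the confidence cap at 1.0 might look something like the sketch below. The helper names `normalize`, `memory_is_echoed`, and `bump_confidence` are hypothetical, substring containment is an assumed reading of "echoes", and raising `ValueError` on a negative delta is an assumption about how the rejection is surfaced.

```python
import re

MIN_MATCH_CHARS = 12   # from the commit message: require >= 12 chars of memory content
WINDOW_CHARS = 80      # from the commit message: leading 80-char window


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace, as the commit message describes."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def memory_is_echoed(memory_content: str, response_text: str) -> bool:
    """True when the leading window of the memory appears in the response."""
    window = normalize(memory_content)[:WINDOW_CHARS].strip()
    if len(window) < MIN_MATCH_CHARS:
        return False  # too short to be a meaningful match
    return window in normalize(response_text)


def bump_confidence(old: float, delta: float) -> float:
    """Raise confidence by delta, capped at 1.0 (negative deltas rejected)."""
    if delta < 0:
        raise ValueError("delta must be non-negative")  # assumed failure mode
    return min(1.0, old + delta)


echoed = memory_is_echoed(
    "Use  SQLite for the Interactions table",
    "We will use sqlite for the\ninteractions table going forward.",
)
too_short = memory_is_echoed("too short", "too short")
capped = bump_confidence(0.95, 0.1)
```

Here `echoed` is true despite the casing and whitespace differences, `too_short` is false because the memory has fewer than 12 characters, and `capped` stops at 1.0.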
Anto01	ea3fed3d44	feat(phase9-A): interaction capture loop foundation Phase 9 Commit A from the agreed plan: turn AtoCore from a stateless context enhancer into a system that records what it actually fed to an LLM and what came back. This is the audit trail Reflection (Commit B) and Extraction (Commit C) will be layered on top of. The interactions table existed in the schema since the original PoC but nothing wrote to it. This change makes it real: Schema migration (additive only): - response full LLM response (caller decides how much) - memories_used JSON list of memory ids in the context pack - chunks_used JSON list of chunk ids in the context pack - client identifier of the calling system (openclaw, claude-code, manual, ...) - session_id groups multi-turn conversations - project project name (mirrors the memory module pattern, no FK so capture stays cheap) - indexes on session_id, project, created_at The created_at column is now written explicitly with a SQLite-compatible 'YYYY-MM-DD HH:MM:SS' format so the same string lives in the DB and the returned dataclass. Without this the `since` filter on list_interactions would silently fail because CURRENT_TIMESTAMP and isoformat use different shapes that do not compare cleanly as strings. New module src/atocore/interactions/: - Interaction dataclass - record_interaction() persists one round-trip (prompt required; everything else optional). Refuses empty prompts. - list_interactions() filters by project / session_id / client / since, newest-first, hard-capped at 500 - get_interaction() fetch by id, full response + context pack API endpoints: - POST /interactions capture one interaction - GET /interactions list with summaries (no full response) - GET /interactions/{id} full record incl. response + pack Trust model: - Capture is read-only with respect to memories, project state, and source chunks. Nothing here promotes anything into trusted state. 
- The audit trail becomes the dataset Commit B (reinforcement) and Commit C (extraction + review queue) will operate on. Tests (13 new, all green): - service: persist + roundtrip every field - service: minimum-fields path (prompt only) - service: empty / whitespace prompt rejected - service: get by id returns None for missing - service: filter by project, session, client - service: ordering newest-first with limit - service: since filter inclusive on cutoff (the bug the timestamp fix above caught) - service: limit=0 returns empty - API: POST records and round-trips through GET /interactions/{id} - API: empty prompt returns 400 - API: missing id returns 404 - API: list filter returns summaries (not full response bodies) Full suite: 118 passing (was 105). master-plan-status.md updated to move Phase 9 from "not started" to "started" with the explicit note that Commit A is in and Commits B/C remain.	2026-04-06 19:31:43 -04:00
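The created_at fix in this commit comes down to how strings compare: a space sorts before the letter "T", so a 'YYYY-MM-DD HH:MM:SS' value and an isoformat value from the same day do not order by time. A small standalone illustration follows; the helper names `to_sqlite_timestamp` and `is_since` are hypothetical and the example does not touch the real `list_interactions` code.

```python
from datetime import datetime

SQLITE_FMT = "%Y-%m-%d %H:%M:%S"  # the shape the commit message says is now written


def to_sqlite_timestamp(moment: datetime) -> str:
    """Format a datetime the way SQLite's CURRENT_TIMESTAMP looks."""
    return moment.strftime(SQLITE_FMT)


def is_since(stored_created_at: str, cutoff_text: str) -> bool:
    """String comparison, as a SQL `created_at >= ?` on TEXT columns would do."""
    return stored_created_at >= cutoff_text


row_time = datetime(2026, 4, 6, 19, 0, 0)   # row written at 19:00
cutoff = datetime(2026, 4, 6, 18, 0, 0)     # caller asks for rows since 18:00

stored = to_sqlite_timestamp(row_time)       # '2026-04-06 19:00:00'

# Mixed shapes: ' ' sorts before 'T', so the later row looks older than the cutoff.
mixed_shapes = is_since(stored, cutoff.isoformat())          # False
# Same shape on both sides: the comparison matches chronological order.
same_shape = is_since(stored, to_sqlite_timestamp(cutoff))   # True
```

Writing one format everywhere, as the commit does, keeps the `since` filter a plain string comparison.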
Anto01	c9b9eede25	feat: tunable ranking, refresh status, chroma backup + admin endpoints Three small improvements that move the operational baseline forward without changing the existing trust model. 1. Tunable retrieval ranking weights - rank_project_match_boost, rank_query_token_step, rank_query_token_cap, rank_path_high_signal_boost, rank_path_low_signal_penalty are now Settings fields - all overridable via ATOCORE_* env vars - retriever no longer hard-codes 2.0 / 1.18 / 0.72 / 0.08 / 1.32 - lets ranking be tuned per environment as Wave 1 is exercised without code changes 2. /projects/{name}/refresh status - refresh_registered_project now returns an overall status field ("ingested", "partial", "nothing_to_ingest") plus roots_ingested and roots_skipped counters - ProjectRefreshResponse advertises the new fields so callers can rely on them - covers the case where every configured root is missing on disk 3. Chroma cold snapshot + admin backup endpoints - create_runtime_backup now accepts include_chroma and writes a cold directory copy of the chroma persistence path - new list_runtime_backups() and validate_backup() helpers - new endpoints: - POST /admin/backup create snapshot (optional chroma) - GET /admin/backup list snapshots - GET /admin/backup/{stamp}/validate structural validation - chroma snapshots are taken under exclusive_ingestion() so a refresh or ingest cannot race with the cold copy - backup metadata records what was actually included and how big Tests: - 8 new tests covering tunable weights, refresh status branches (ingested / partial / nothing_to_ingest), chroma snapshot, list, validate, and the API endpoints (including the lock-acquisition path) - existing fake refresh stubs in test_api_storage.py updated for the expanded ProjectRefreshResponse model - full suite: 105 passing (was 97) next-steps doc updated to reflect that the chroma snapshot + restore validation gap from current-state.md is now closed in code; only the operational retention policy 
remains.	2026-04-06 18:42:19 -04:00
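One plausible way to derive the overall refresh status from the two counters is sketched below. The status strings and counter names come from the commit message, but the mapping from counters to status is an assumption, and `summarize_refresh` is a hypothetical helper rather than the body of `refresh_registered_project`.

```python
def summarize_refresh(roots_ingested: int, roots_skipped: int) -> dict:
    """Map per-root counters onto an overall status (assumed rules)."""
    if roots_ingested == 0:
        # Covers the case where every configured root is missing on disk.
        status = "nothing_to_ingest"
    elif roots_skipped > 0:
        status = "partial"
    else:
        status = "ingested"
    return {
        "status": status,
        "roots_ingested": roots_ingested,
        "roots_skipped": roots_skipped,
    }


all_missing = summarize_refresh(roots_ingested=0, roots_skipped=3)
some_missing = summarize_refresh(roots_ingested=2, roots_skipped=1)
all_present = summarize_refresh(roots_ingested=3, roots_skipped=0)
```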
Anto01	14ab7c8e9f	fix: pass project_hint into retrieve and add path-signal ranking Two changes that belong together: 1. builder.build_context() now passes project_hint into retrieve(), so the project-aware boost actually fires for the retrieval pipeline driven by /context/build. Before this, only direct /query callers benefited from the registered-project boost. 2. retriever now applies two more ranking signals on every chunk: - _query_match_boost: boosts chunks whose source/title/heading echo high-signal query tokens (stop list filters out generic words like "the", "project", "system") - _path_signal_boost: down-weights archival noise (_archive, _history, pre-cleanup, reviews) by 0.72 and up-weights current high-signal docs (status, decision, requirements, charter, system-map, error-budget, ...) by 1.18 Tests: - test_context_builder_passes_project_hint_to_retrieval verifies the wiring fix - test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths verifies the new ranking helpers prefer current docs over archive This addresses the cross-project competition and archive bleed called out in current-state.md after the Wave 1 ingestion.	2026-04-06 18:37:07 -04:00
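The path-signal ranking in this commit can be pictured as a multiplier on each chunk's score. The sketch below reuses the 0.72 and 1.18 factors and the path hints named in the message; the substring matching, the way the two factors combine when both apply, and the `path_signal_boost` signature are assumptions, and the hint lists are only the ones spelled out above (the message elides the rest with "...").

```python
LOW_SIGNAL_HINTS = ("_archive", "_history", "pre-cleanup", "reviews")
HIGH_SIGNAL_HINTS = (
    "status", "decision", "requirements",
    "charter", "system-map", "error-budget",
)
LOW_SIGNAL_PENALTY = 0.72   # from the commit message
HIGH_SIGNAL_BOOST = 1.18    # from the commit message


def path_signal_boost(path: str) -> float:
    """Return a score multiplier based on hints found in the source path."""
    lowered = path.lower()
    multiplier = 1.0
    if any(hint in lowered for hint in LOW_SIGNAL_HINTS):
        multiplier *= LOW_SIGNAL_PENALTY
    if any(hint in lowered for hint in HIGH_SIGNAL_HINTS):
        multiplier *= HIGH_SIGNAL_BOOST
    return multiplier


# Two hypothetical chunks: the archived one has the higher raw similarity.
chunks = [
    ("_archive/old-plan.md", 0.60),
    ("docs/decision-log.md", 0.50),
]
ranked = sorted(
    chunks,
    key=lambda item: item[1] * path_signal_boost(item[0]),
    reverse=True,
)
```

After the multiplier is applied, the current decision doc outranks the archived file even though its raw score was lower, which is the behaviour the second test in this commit checks for.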
Anto01	bdb42dba05	Expand active project wave and serialize refreshes	2026-04-06 14:58:14 -04:00
Anto01	26bfa94c65	Add project-aware boost to raw query	2026-04-06 13:32:33 -04:00
Anto01	06aa931273	Add project registry update flow	2026-04-06 12:31:24 -04:00
Anto01	c9757e313a	Harden runtime and add backup foundation	2026-04-06 10:15:00 -04:00
Anto01	9715fe3143	Add project registration endpoint	2026-04-06 09:52:19 -04:00
Anto01	1f1e6b5749	Add project registration proposal preview	2026-04-06 09:11:11 -04:00
Anto01	827dcf2cd1	Add project registration policy and template	2026-04-06 08:46:37 -04:00
Anto01	8293099025	Add project registry refresh foundation	2026-04-06 08:02:13 -04:00
Anto01	6bfa1fcc37	Add Dalidou storage foundation and deployment prep	2026-04-05 18:33:52 -04:00
Anto01	b0889b3925	Stabilize core correctness and sync project plan state	2026-04-05 17:53:23 -04:00
Anto01	b48f0c95ab	feat: Phase 2 Memory Core — structured memory with context integration Memory Core implementation: - Memory service with 6 types: identity, preference, project, episodic, knowledge, adaptation - CRUD operations: create (with dedup), get (filtered), update, invalidate, supersede - Confidence scoring (0.0-1.0) and lifecycle management (active/superseded/invalid) - Memory API endpoints: POST/GET/PUT/DELETE /memory Context builder integration (trust precedence per Master Plan): 1. Trusted Project State (highest trust, 20% budget) 2. Identity + Preference memories (10% budget) 3. Retrieved chunks (remaining budget) Also fixed database.py to use dynamic settings reference for test isolation. 45/45 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 09:54:52 -04:00
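The trust-precedence budget in this commit (20% for Trusted Project State, 10% for identity and preference memories, the remainder for retrieved chunks) amounts to a simple split. The sketch below is illustrative: `split_budget` is a hypothetical helper, and the budget unit (characters or tokens) is not stated in the message, so it is left abstract.

```python
def split_budget(total_budget: int) -> dict:
    """Split a context budget by trust tier, highest trust first."""
    project_state = total_budget * 20 // 100   # Trusted Project State: 20%
    memories = total_budget * 10 // 100        # Identity + Preference: 10%
    chunks = total_budget - project_state - memories  # retrieved chunks: the rest
    return {
        "project_state": project_state,
        "memories": memories,
        "chunks": chunks,
    }


allocation = split_budget(3000)
```

Integer arithmetic keeps the three parts summing exactly to the total, with any rounding remainder going to the chunk tier.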
Anto01	531c560db7	feat: Phase 1 ingestion hardening + Phase 5 Trusted Project State Phase 1 - Ingestion hardening: - Encoding fallback (UTF-8/UTF-8-sig/Latin-1/CP1252) - Delete detection: purge DB/vector entries for removed files - Ingestion stats endpoint (GET /stats) Phase 5 - Trusted Project State: - project_state table with categories (status, decision, requirement, contact, milestone, fact, config) - CRUD API: POST/GET/DELETE /project/state - Upsert semantics, invalidation (supersede) support - Context builder integrates project state at highest trust precedence - Project state gets 20% budget allocation, appears first in context - Trust precedence: Project State > Retrieved Chunks (per Master Plan) 33/33 tests passing. Validated end-to-end with GigaBIT M1 project data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 09:41:59 -04:00
Anto01	b4afbbb53a	feat: implement AtoCore Phase 0 + Phase 0.5 (foundation + PoC) Complete implementation of the personal context engine foundation: - FastAPI server with 5 endpoints (ingest, query, context/build, health, debug) - SQLite database with 5 tables (documents, chunks, memories, projects, interactions) - Heading-aware markdown chunker (800 char max, recursive splitting) - Multilingual embeddings via sentence-transformers (EN/FR) - ChromaDB vector store with cosine similarity retrieval - Context builder with project boosting, dedup, and budget enforcement - CLI scripts for batch ingestion and test prompt evaluation - 19 unit tests passing, 79% coverage - Validated on 482 real project files (8383 chunks, 0 errors) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 09:21:27 -04:00

38 Commits