feat: Phase 7I + UI refresh (capture form, memory/domain/activity pages, topnav)

Closes three gaps the user surfaced: (1) OpenClaw agents run blind without AtoCore context, (2) mobile/desktop chats can't be captured at all, (3) wiki UI hadn't kept up with backend capabilities.

Phase 7I: OpenClaw two-way bridge
- Plugin now calls /context/build on before_agent_start and prepends the context pack to event.prompt, so whatever LLM runs underneath (sonnet, opus, codex, local model) answers grounded in AtoCore knowledge. Captured prompt stays the user's original text; fail-open with a 5s timeout. Config-gated via injectContext flag.
- Plugin version 0.0.0 → 0.2.0; README rewritten.

UI refresh
- /wiki/capture: paste-to-ingest form for Claude Desktop / web / mobile / ChatGPT / other. Goes through normal /interactions pipeline with client="claude-desktop|claude-web|claude-mobile|chatgpt|other". Fixes the rotovap/mushroom-on-phone gap.
- /wiki/memories/{id} (Phase 7E): full memory detail (content, status, confidence, refs, valid_until, domain_tags clickable to domain pages, project link, source chunk, graduated-to-entity link, full audit trail, related-by-tag neighbors).
- /wiki/domains/{tag} (Phase 7F): cross-project view of all active memories with the given tag, grouped by project, sorted by count. Case-insensitive, whitespace-tolerant. Also surfaces graduated entities carrying the tag.
- /wiki/activity: autonomous-activity timeline feed. Summary chips by action (created/promoted/merged/superseded/decayed/canonicalized) and by actor (auto-dedup-tier1, auto-dedup-tier2, confidence-decay, phase10-auto-promote, transient-to-durable, tag-canon, human-triage). Answers "what has the brain been doing while I was away?"
- Home refresh: persistent topnav (Home · Activity · Capture · Triage · Dashboard), "What the brain is doing" snippet above project cards showing recent autonomous-actor counts, link to full activity.

Tests: +10 (capture page, memory detail + 404, domain cross-project + empty + tag normalization, activity feed + groupings, home topnav, superseded-source detail after merge). 440 → 450.

Known next: capture-browser extension for Claude.ai web (bigger project, deferred); voice/mobile relay (adjacent).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
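
The Phase 7I behaviour described in the commit message (call /context/build before the agent starts, prepend the result, fail open after 5s, gate on a flag) can be sketched roughly as below. This is a Python illustration of the control flow only, not the plugin's actual code: the function names, the `captured_prompt` key, the `{"prompt": ...}` request body, and the `formatted_context` response field are all assumptions made for the example.

```python
import json
import urllib.request
from typing import Callable

TIMEOUT_S = 5.0  # the commit message states a 5s timeout


def fetch_context_pack(prompt: str, base_url: str, timeout_s: float = TIMEOUT_S) -> str:
    """POST the prompt to /context/build and return the context pack text.

    Assumption: the request body and the "formatted_context" response field
    are hypothetical; the real API shape is not shown in this commit.
    """
    req = urllib.request.Request(
        f"{base_url}/context/build",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp).get("formatted_context", "")


def before_agent_start(
    event: dict,
    fetch: Callable[[str], str],
    inject_context: bool = True,
) -> dict:
    """Prepend a context pack to event["prompt"] without ever blocking the agent."""
    original = event["prompt"]
    # Hypothetical key: what gets captured stays the user's original text.
    event["captured_prompt"] = original
    if not inject_context:
        return event  # config gate off: behave as before
    try:
        pack = fetch(original)
    except Exception:
        return event  # fail-open: any error or timeout leaves the prompt untouched
    if pack:
        event["prompt"] = f"{pack}\n\n{original}"
    return event
```

Passing the fetcher in as a callable keeps the hook testable without a live AtoCore instance; in real use it could be something like `lambda p: fetch_context_pack(p, base_url)`, with `base_url` pointing at the AtoCore service.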
2026-04-19 10:14:15 -04:00
parent 877b97ec78
commit 6e43cc7383
7 changed files with 737 additions and 282 deletions
--- a/docs/current-state.md
+++ b/docs/current-state.md
@@ -1,275 +1,49 @@
-# AtoCore Current State
+# AtoCore — Current State (2026-04-19)

-## Status Summary
+Live deploy: `877b97e` · Dalidou health: ok · Harness: 17/18.

-AtoCore is no longer just a proof of concept. The local engine exists, the
-correctness pass is complete, Dalidou now hosts the canonical runtime and
-machine-storage location, and the T420/OpenClaw side now has a safe read-only
-path to consume AtoCore. The live corpus is no longer just self-knowledge: it
-now includes a first curated ingestion batch for the active projects.
+## The numbers

-## Phase Assessment
+| | count |
+|---|---|
+| Active memories | 266 (180 project, 31 preference, 24 knowledge, 17 adaptation, 11 episodic, 3 identity) |
+| Candidates pending | **0** (autonomous triage drained the queue) |
+| Interactions captured | 605 (250 claude-code, 351 openclaw) |
+| Entities (typed graph) | 50 |
+| Vectors in Chroma | 33K+ |
+| Projects | 6 registered (p04, p05, p06, abb-space, atomizer-v2, atocore) + apm emerging (2 memories, below auto-register threshold) |
+| Unique domain tags | 210 |
+| Tests | 440 passing |

- completed
-  - Phase 0
-  - Phase 0.5
-  - Phase 1
- baseline complete
-  - Phase 2
-  - Phase 3
-  - Phase 5
-  - Phase 7
-  - Phase 9 (Commits A/B/C: capture, reinforcement, extractor + review queue)
- partial
-  - Phase 4
-  - Phase 8
- not started
-  - Phase 6
-  - Phase 10
-  - Phase 11
-  - Phase 12
-  - Phase 13
+## Autonomous pipeline — what runs without me

-## What Exists Today
+| When | Job | Does |
+|---|---|---|
+| every hour | `hourly-extract.sh` | Pulls new interactions → LLM extraction → 3-tier auto-triage (sonnet → opus → discard/human). 0 pending candidates right now = autonomy is working. |
+| every 2 min | `dedup-watcher.sh` | Services UI-triggered dedup scans |
+| daily 03:00 UTC | Full nightly (`batch-extract.sh`) | Extract · triage · auto-promote reinforced · synthesis · harness · dedup (0.90) · emerging detector · transient→durable · **confidence decay (7D)** · integrity check · alerts |
+| Sundays | +Weekly deep pass | Knowledge-base lint · dedup @ 0.85 · **tag canonicalization (7C)** |

- ingestion pipeline
- parser and chunker
- SQLite-backed memory and project state
- vector retrieval
- context builder
- API routes for query, context, health, and source status
- project registry and per-project refresh foundation
- project registration lifecycle:
-  - template
-  - proposal preview
-  - approved registration
-  - safe update of existing project registrations
-  - refresh
- implementation-facing architecture notes for:
-  - engineering knowledge hybrid architecture
-  - engineering ontology v1
- env-driven storage and deployment paths
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
- T420/OpenClaw read-only AtoCore helper skill
- full active-project markdown/text corpus wave for:
-  - `p04-gigabit`
-  - `p05-interferometer`
-  - `p06-polisher`
+Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs human**. That's the brain self-organizing.

-## What Is True On Dalidou
+## Phase 7 — Memory Consolidation status

- deployed repo location:
-  - `/srv/storage/atocore/app`
- canonical machine DB location:
-  - `/srv/storage/atocore/data/db/atocore.db`
- canonical vector store location:
-  - `/srv/storage/atocore/data/chroma`
- source input locations:
-  - `/srv/storage/atocore/sources/vault`
-  - `/srv/storage/atocore/sources/drive`
+| Subphase | What | Status |
+|---|---|---|
+| 7A | Semantic dedup + merge lifecycle | live |
+| 7A.1 | Tiered auto-approve (sonnet ≥0.8 + sim ≥0.92 → merge; opus escalation; human only for ambiguous) | live |
+| 7B | Memory-to-memory contradiction detection (0.70–0.88 band, classify duplicate/contradicts/supersedes) | deferred, needs 7A signal |
+| 7C | Tag canonicalization (weekly; auto-apply ≥0.8 confidence; protects project tokens) | live (first run: 0 proposals — vocabulary is clean) |
+| 7D | Confidence decay (0.97/day on idle unreferenced; auto-supersede below 0.3) | live (first run: 0 decayed — nothing idle+unreferenced yet) |
+| 7E | `/wiki/memories/{id}` detail page | pending |
+| 7F | `/wiki/domains/{tag}` cross-project view | pending (wants 7C + more usage first) |
+| 7G | Re-extraction on prompt version bump | pending |
+| 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |

-The service and storage foundation are live on Dalidou.
+## Known gaps (honest)

-The machine-data host is real and canonical.
-
-The project registry is now also persisted in a canonical mounted config path on
-Dalidou:
-
- `/srv/storage/atocore/config/project-registry.json`
-
-The content corpus is partially populated now.
-
-The Dalidou instance already contains:
-
- AtoCore ecosystem and hosting docs
- current-state and OpenClaw integration docs
- Master Plan V3
- Build Spec V1
- trusted project-state entries for `atocore`
- full staged project markdown/text corpora for:
-  - `p04-gigabit`
-  - `p05-interferometer`
-  - `p06-polisher`
- curated repo-context docs for:
-  - `p05`: `Fullum-Interferometer`
-  - `p06`: `polisher-sim`
- trusted project-state entries for:
-  - `p04-gigabit`
-  - `p05-interferometer`
-  - `p06-polisher`
-
-Current live stats after the full active-project wave are now far beyond the
-initial seed stage:
-
- more than `1,100` source documents
- more than `20,000` chunks
- matching vector count
-
-The broader long-term corpus is still not fully populated yet. Wider project and
-vault ingestion remains a deliberate next step rather than something already
-completed, but the corpus is now meaningfully seeded beyond AtoCore's own docs.
-
-For human-readable quality review, the current staged project markdown corpus is
-primarily visible under:
-
- `/srv/storage/atocore/sources/vault/incoming/projects`
-
-This staged area is now useful for review because it contains the markdown/text
-project docs that were actually ingested for the full active-project wave.
-
-It is important to read this staged area correctly:
-
- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory,
-  trusted project state, and context-builder outputs
-
-## What Is True On The T420
-
- SSH access is working
- OpenClaw workspace inspected at `/home/papa/clawd`
- OpenClaw's own memory system remains unchanged
- a read-only AtoCore integration skill exists in the workspace:
-  - `/home/papa/clawd/skills/atocore-context/`
- the T420 can successfully reach Dalidou AtoCore over network/Tailscale
- fail-open behavior has been verified for the helper path
- OpenClaw can now seed AtoCore in two distinct ways:
-  - project-scoped memory entries
-  - staged document ingestion into the retrieval corpus
- the helper now supports the practical registered-project lifecycle:
-  - projects
-  - project-template
-  - propose-project
-  - register-project
-  - update-project
-  - refresh-project
- the helper now also supports the first organic routing layer:
-  - `detect-project "<prompt>"`
-  - `auto-context "<prompt>" [budget] [project]`
- OpenClaw can now default to AtoCore for project-knowledge questions without
-  requiring explicit helper commands from the human every time
-
-## What Exists In Memory vs Corpus
-
-These remain separate and that is intentional.
-
-In `/memory`:
-
- project-scoped curated memories now exist for:
-  - `p04-gigabit`: 5 memories
-  - `p05-interferometer`: 6 memories
-  - `p06-polisher`: 8 memories
-
-These are curated summaries and extracted stable project signals.
-
-In `source_documents` / retrieval corpus:
-
- full project markdown/text corpora are now present for the active project set
- retrieval is no longer limited to AtoCore self-knowledge only
- the current corpus is broad enough that ranking quality matters more than
-  corpus presence alone
- underspecified prompts can still pull in historical or archive material, so
-  project-aware routing and better ranking remain important
-
-The source refresh model now has a concrete foundation in code:
-
- a project registry file defines known project ids, aliases, and ingest roots
- the API can list registered projects
- the API can return a registration template
- the API can preview a registration without mutating state
- the API can persist an approved registration
- the API can update an existing registered project without changing its canonical id
- the API can refresh one registered project at a time
-
-This lifecycle is now coherent end to end for normal use.
-
-The first live update passes on existing registered projects have now been
-verified against `p04-gigabit` and `p05-interferometer`:
-
- the registration description can be updated safely
- the canonical project id remains unchanged
- refresh still behaves cleanly after the update
- `context/build` still returns useful project-specific context afterward
-
-## Reliability Baseline
-
-The runtime has now been hardened in a few practical ways:
-
- SQLite connections use a configurable busy timeout
- SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
- project registry writes are atomic file replacements rather than in-place rewrites
- a full runtime backup and restore path now exists and has been exercised on
-  live Dalidou:
-  - SQLite (hot online backup via `conn.backup()`)
-  - project registry (file copy)
-  - Chroma vector store (cold directory copy under `exclusive_ingestion()`)
-  - backup metadata
-  - `restore_runtime_backup()` with CLI entry point
-    (`python -m atocore.ops.backup restore <STAMP>
-    --confirm-service-stopped`), pre-restore safety snapshot for
-    rollback, WAL/SHM sidecar cleanup, `PRAGMA integrity_check`
-    on the restored file
-  - the first live drill on 2026-04-09 surfaced and fixed a Chroma
-    restore bug on Docker bind-mounted volumes (`shutil.rmtree`
-    on a mount point); a regression test now asserts the
-    destination inode is stable across restore
- deploy provenance is visible end-to-end:
-  - `/health` reports `build_sha`, `build_time`, `build_branch`
-    from env vars wired by `deploy.sh`
-  - `deploy.sh` Step 6 verifies the live `build_sha` matches the
-    just-built commit (exit code 6 on drift) so "live is current?"
-    can be answered precisely, not just by `__version__`
-  - `deploy.sh` Step 1.5 detects that the script itself changed
-    in the pulled commit and re-execs into the fresh copy, so
-    the deploy never silently runs the old script against new source
-
-This does not eliminate every concurrency edge, but it materially improves the
-current operational baseline.
-
-In `Trusted Project State`:
-
- each active seeded project now has a conservative trusted-state set
- promoted facts cover:
-  - summary
-  - core architecture or boundary decision
-  - key constraints
-  - next focus
-
-This separation is healthy:
-
- memory stores distilled project facts
- corpus stores the underlying retrievable documents
-
-## Immediate Next Focus
-
-1. ~~Re-run the full backup/restore drill~~ — DONE 2026-04-11,
-   full pass (db, registry, chroma, integrity all true)
-2. ~~Turn on auto-capture of Claude Code sessions in conservative
-   mode~~ — DONE 2026-04-11, Stop hook wired via
-   `deploy/hooks/capture_stop.py` → `POST /interactions`
-   with `reinforce=false`; kill switch via
-   `ATOCORE_CAPTURE_DISABLED=1`
-3. Run a short real-use pilot with auto-capture on, verify
-   interactions are landing in Dalidou, review quality
-4. Use the new T420-side organic routing layer in real OpenClaw workflows
-4. Tighten retrieval quality for the now fully ingested active project corpora
-5. Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
-6. Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
-7. Expand the remaining boring operations baseline:
-   - retention policy cleanup script
-   - off-Dalidou backup target (rsync or similar)
-8. Only later consider write-back, reflection, or deeper autonomous behaviors
-
-See also:
-
- [ingestion-waves.md](C:/Users/antoi/ATOCore/docs/ingestion-waves.md)
- [master-plan-status.md](C:/Users/antoi/ATOCore/docs/master-plan-status.md)
-
-## Guiding Constraints
-
- bad memory is worse than no memory
- trusted project state must remain highest priority
- human-readable sources and machine storage stay separate
- OpenClaw integration must not degrade OpenClaw baseline behavior
+1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
+2. **OpenClaw is capture-only, not context-grounded.** The plugin POSTs `/interactions` on `llm_output` but does NOT call `/context/build` on `before_agent_start`. OpenClaw's underlying agent runs blind. See Q2 below.
+3. **Human interface (wiki) is thin and static.** 5 project cards + a "System" line. No dashboard for the autonomous activity. No per-memory detail page. See Q3/Q5.
+4. **Harness 17/18** — the `p04-constraints` fixture wants "Zerodur" but retrieval surfaces related-not-exact terms. Content gap, not a retrieval regression.
+5. **Two projects under-populated**: p05-interferometer (4 memories, 18 state) and atomizer-v2 (1 memory, 6 state). Batch re-extract with the new llm-0.6.0 prompt would help.
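
For a sense of scale on the 7D confidence-decay row in the Phase 7 table (0.97/day on idle unreferenced memories, auto-supersede below 0.3), the arithmetic can be worked through as follows. This is a sketch under assumptions: it treats the decay as a plain daily multiplication applied to a memory that stays idle and unreferenced the whole time, which may not match how the actual job schedules or resets decay. The function names are illustrative.

```python
DECAY_PER_DAY = 0.97    # from the 7D row: 0.97/day
SUPERSEDE_BELOW = 0.3   # from the 7D row: auto-supersede below 0.3


def decayed(confidence: float, idle_days: int) -> float:
    """Confidence after idle_days of uninterrupted decay (assumed multiplicative)."""
    return confidence * DECAY_PER_DAY ** idle_days


def days_until_superseded(confidence: float) -> int:
    """Count daily decay steps until confidence drops below the threshold."""
    days = 0
    while confidence >= SUPERSEDE_BELOW:
        confidence *= DECAY_PER_DAY
        days += 1
    return days


print(days_until_superseded(1.0))  # 40
print(days_until_superseded(0.5))  # 17
```

Under these assumptions a memory at full confidence would need about 40 consecutive idle days to be auto-superseded, and one at 0.5 about 17, which is consistent with the first run reporting 0 decayed.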