ATOCore/docs/current-state.md

# AtoCore - Current State (2026-04-25)

Update 2026-04-25: project-id chunk/vector metadata is deployed and backfilled.
Live Dalidou is on `a87d984`; `/health` is ok with 33,253 vectors and sources
ready. Live retrieval harness is 19/20 with 0 blocking failures and 1 known
content gap (`p04-constraints` missing `Zerodur` / `1.2`). Full local suite:
571 passed.

The project-id backfill was applied per populated project after a
Chroma-inclusive backup at
`/srv/storage/atocore/backups/snapshots/20260424T154358Z`. The immediate
post-apply dry-run reported 33,253 already tagged, 0 updates, 0 missing, and 0
malformed. A later repeat dry-run after the code-only ranking deploy was
aborted because the one-off container ran too long; the earlier post-apply
idempotency result remains the migration acceptance record.
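
The acceptance check above boils down to sorting every chunk into one of the four reported buckets and confirming that only "already tagged" is non-zero. A minimal sketch, assuming a hypothetical `classify_backfill` helper, a hypothetical `(chunk_id, metadata)` input shape, and one reading of what "missing" and "malformed" mean; the real migration script may define them differently:

```python
from collections import Counter

def classify_backfill(chunks, expected):
    """Dry-run: count what a backfill *would* do, without writing anything.

    chunks:   iterable of (chunk_id, metadata) pairs (hypothetical shape)
    expected: dict mapping chunk_id -> project id the chunk should carry
    """
    report = Counter(already_tagged=0, updates=0, missing=0, malformed=0)
    for chunk_id, meta in chunks:
        if not isinstance(meta, dict):
            report["malformed"] += 1        # metadata is not a readable mapping
        elif chunk_id not in expected:
            report["missing"] += 1          # no owning project can be resolved
        elif meta.get("project_id") == expected[chunk_id]:
            report["already_tagged"] += 1   # nothing to do
        else:
            report["updates"] += 1          # would be written on apply
    return dict(report)

def is_idempotent(report):
    """A re-run after apply should find nothing left to change."""
    return (report["updates"] == 0
            and report["missing"] == 0
            and report["malformed"] == 0)
```

Under these assumptions, the recorded post-apply result (33,253 already tagged, 0 updates, 0 missing, 0 malformed) is exactly the case where `is_idempotent` returns `True`.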

## V1-0 landed 2026-04-22

The Engineering V1 completion track has started. **V1-0 write-time invariants**
are merged and deployed: F-1 shared-header fields (`extractor_version`,
`canonical_home`, `hand_authored`) added to `entities`; F-8 provenance
enforcement at both `create_entity` and `promote_entity`; and an F-5 synchronous
conflict-detection hook on every active-entity write path (create, promote,
supersede) with Q-3 fail-open. The prod backfill ran cleanly: 31 legacy
active/superseded entities were flagged `hand_authored=1`, and the follow-up
dry-run returned 0 remaining rows. Test count: 533 → 547 (+14).

R14 is closed: `POST /entities/{id}/promote` now translates the new
caller-fixable V1-0 `ValueError` into HTTP 400.
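
The R14 fix follows a common pattern: a validation error the caller can fix is surfaced as a 400 instead of an unhandled server error. A framework-free sketch, assuming a stand-in `promote_entity`, a hypothetical `source_refs` field as the provenance signal, and a plain `(status, body)` return shape; none of these are taken from the AtoCore code:

```python
def promote_entity(entity_id, store):
    """Stand-in for the real function: raises ValueError on caller-fixable input."""
    entity = store[entity_id]
    # Hypothetical provenance rule: need source refs unless hand-authored.
    if not entity.get("source_refs") and not entity.get("hand_authored"):
        raise ValueError("provenance required: add source_refs or set hand_authored")
    entity["status"] = "active"
    return entity

def handle_promote(entity_id, store):
    """Route-handler sketch: translate ValueError into an HTTP 400 response."""
    try:
        entity = promote_entity(entity_id, store)
    except ValueError as exc:
        # Caller-fixable: report the reason rather than failing with a 500.
        return 400, {"error": str(exc)}
    return 200, entity
```

Only `ValueError` is caught here, so genuinely unexpected failures still propagate and remain visible as server errors.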

**Next in the V1 track:** V1-A (minimal query slice + Q-6 killer-correctness
integration). It is gated on the pipeline soak (~2026-04-26) and the 100+
active-memory density target. See `docs/plans/engineering-v1-completion-plan.md`
for the full 7-phase roadmap and `docs/plans/v1-resume-state.md` for the
"you are here" map.

---

## Snapshot from previous update (2026-04-19)

## The numbers

| Metric | Count |
|---|---|
| Active memories | 266 (180 project, 31 preference, 24 knowledge, 17 adaptation, 11 episodic, 3 identity) |
| Candidates pending | **0** (autonomous triage drained the queue) |
| Interactions captured | 605 (250 claude-code, 351 openclaw) |
| Entities (typed graph) | 50 |
| Vectors in Chroma | 33K+ |
| Projects | 6 registered (p04, p05, p06, abb-space, atomizer-v2, atocore) + apm emerging (2 memories, below auto-register threshold) |
| Unique domain tags | 210 |
| Tests | 440 passing |

## Autonomous pipeline — what runs without me

| When | Job | Does |
|---|---|---|
| every hour | `hourly-extract.sh` | Pulls new interactions → LLM extraction → 3-tier auto-triage (sonnet → opus → discard/human). 0 pending candidates right now = autonomy is working. |
| every 2 min | `dedup-watcher.sh` | Services UI-triggered dedup scans |
| daily 03:00 UTC | Full nightly (`batch-extract.sh`) | Extract · triage · auto-promote reinforced · synthesis · harness · dedup (0.90) · emerging detector · transient→durable · **confidence decay (7D)** · integrity check · alerts |
| Sundays | +Weekly deep pass | Knowledge-base lint · dedup @ 0.85 · **tag canonicalization (7C)** |

Last nightly run (2026-04-19 03:00 UTC): **31 promoted · 39 rejected · 0 needs human**. That's the brain self-organizing.
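
The 3-tier auto-triage in the hourly job can be read as an escalation cascade: each reviewer either decides or passes the candidate up, and only candidates nobody decides reach a human. A minimal sketch, assuming hypothetical reviewer callables that return `"promote"`, `"reject"`, or `"unsure"`; the actual model calls, prompts, and thresholds are not shown in this document:

```python
def triage(candidate, tiers):
    """Run a candidate through ordered reviewers; escalate while a tier is unsure.

    tiers: list of callables (e.g. a sonnet reviewer, then an opus reviewer),
           each returning "promote", "reject", or "unsure".
    """
    for review in tiers:
        verdict = review(candidate)
        if verdict in ("promote", "reject"):
            return verdict          # decided at this tier, stop escalating
    return "needs_human"            # no automated tier was confident

def summarize(candidates, tiers):
    """Tally outcomes the way the nightly summary line reports them."""
    counts = {"promote": 0, "reject": 0, "needs_human": 0}
    for candidate in candidates:
        counts[triage(candidate, tiers)] += 1
    return counts
```

In this reading, a nightly result with 0 "needs human" means every candidate was settled by one of the automated tiers.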

## Phase 7 — Memory Consolidation status

| Subphase | What | Status |
|---|---|---|
| 7A | Semantic dedup + merge lifecycle | live |
| 7A.1 | Tiered auto-approve (sonnet ≥0.8 + sim ≥0.92 → merge; opus escalation; human only for ambiguous) | live |
| 7B | Memory-to-memory contradiction detection (0.70–0.88 band, classify duplicate/contradicts/supersedes) | deferred, needs 7A signal |
| 7C | Tag canonicalization (weekly; auto-apply ≥0.8 confidence; protects project tokens) | live (first run: 0 proposals — vocabulary is clean) |
| 7D | Confidence decay (0.97/day on idle unreferenced; auto-supersede below 0.3) | live (first run: 0 decayed — nothing idle+unreferenced yet) |
| 7E | `/wiki/memories/{id}` detail page | pending |
| 7F | `/wiki/domains/{tag}` cross-project view | pending (wants 7C + more usage first) |
| 7G | Re-extraction on prompt version bump | pending |
| 7H | Chroma vector hygiene (delete vectors for superseded memories) | pending |
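
To get a feel for the 7D numbers, the decay rule can be worked through directly. This sketch assumes the 0.97 factor compounds once per idle, unreferenced day and that supersede triggers strictly below 0.3; the function names are illustrative, not AtoCore's:

```python
DECAY_PER_DAY = 0.97     # from the 7D row
SUPERSEDE_BELOW = 0.3    # from the 7D row

def decayed(confidence, idle_days):
    """Confidence after a run of consecutive idle, unreferenced days."""
    return confidence * DECAY_PER_DAY ** idle_days

def days_until_superseded(confidence):
    """Smallest number of idle days after which confidence falls below 0.3."""
    days = 0
    while confidence >= SUPERSEDE_BELOW:
        confidence *= DECAY_PER_DAY
        days += 1
    return days

print(days_until_superseded(1.0))   # 40
print(days_until_superseded(0.5))   # 17
```

Under these assumptions, a fully confident memory that is never referenced again survives about 40 days before auto-supersede, which is consistent with the first run finding nothing to decay.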

## Known gaps (honest, refreshed 2026-04-24)

1. **Capture surface is Claude-Code-and-OpenClaw only.** Conversations in Claude Desktop, Claude.ai web, phone, or any other LLM UI are NOT captured. Example: the rotovap/mushroom chat yesterday never reached AtoCore because no hook fired. See Q4 below.
2. **Project-scoped retrieval guard is deployed and passing.** Explicit `project_id` chunk/vector metadata is now present in SQLite and Chroma for the 33,253-vector corpus. Retrieval prefers exact metadata ownership and keeps path/tag matching as a legacy fallback.
3. **Human interface is useful but not yet the V1 Human Mirror.** Wiki/dashboard pages exist, but the spec routes, deterministic mirror files, disputed markers, and curated annotations remain V1-D work.
4. **Harness known issue:** `p04-constraints` wants "Zerodur" and "1.2"; live retrieval surfaces related constraints but not those exact strings. Treat as content/state gap until fixed.
5. **Formal docs lag the ledger during fast work.** Use `DEV-LEDGER.md` and `python scripts/live_status.py` for live truth, then copy verified claims into these docs.
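
The ownership rule in item 2 (exact metadata first, path/tag matching only as a legacy fallback) can be sketched as below. The `source_path` and `tags` metadata keys and the `legacy_prefixes` mapping are hypothetical names chosen for illustration; only `project_id` comes from this document:

```python
def owns_chunk(meta, project_id, legacy_prefixes=None):
    """Return True if a chunk should be visible to project_id.

    meta:            the chunk's metadata dict
    legacy_prefixes: optional {project_id: (path_prefix, ...)} mapping
                     (hypothetical) used only for untagged chunks
    """
    tagged = meta.get("project_id")
    if tagged:
        # Explicit metadata wins outright; legacy signals are ignored.
        return tagged == project_id
    # Legacy fallback: consulted only when no project_id is present.
    path = meta.get("source_path", "")
    prefixes = (legacy_prefixes or {}).get(project_id, ())
    if any(path.startswith(prefix) for prefix in prefixes):
        return True
    return project_id in meta.get("tags", ())
```

The point of checking `project_id` first is that a stray tag or a shared path can no longer pull a chunk into the wrong project once the chunk carries explicit ownership.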