Codex's third-round audit closed the remaining five open questions
with concrete file:line resolutions, patched inline in the plan:
- F-7 (P1): graduation stack is partially built — graduated_to_entity_id
at database.py:143-146, graduated memory status, promote preserves
original at service.py:354-356, tests at test_engineering_v1_phase5.py.
Gaps: no direct POST /memory/{id}/graduate route; the spec's
knowledge -> Fact mapping mismatches the ontology (no fact type).
Reconcile by mapping to parameter or similar. V1-E estimate:
2 days -> 3-4 days.
- Q-5 / V1-D (P2): renderer reads wall-clock in _footer at mirror.py:320.
Fix is injecting regenerated timestamp + checksum as renderer inputs,
sorting DB iteration, removing dict ordering deps. Render code must
not call wall-clock directly.
- project vs project_id (P3): doc note only, no storage rename.
- Total estimate: 17.5-19.5 focused days (calendar buffer on top).
- Release notes must NOT canonize "Minions" as a V2 name. Use neutral
"queued background processing / async workers" wording.
Sign-off from Codex: "with those edits, I'd sign off on the five
questions. The only non-architectural uncertainty left in the plan is
scheduling discipline against the current Now list; that does not
block V1-0 once the soak window and memory-density gate clear."
Plan frozen. V1-0 starts once the pipeline soak (~2026-04-26) and the
100-active-memory density gate both clear.
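
The Q-5 / V1-D fix above (inject the regenerated timestamp and checksum,
sort DB iteration, no wall-clock reads in render code) could look roughly
like the sketch below. RenderInputs, content_checksum, render_mirror and
the row shape are hypothetical names for illustration, not the actual
mirror.py API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderInputs:
    # Hypothetical container: the caller supplies both values, so the
    # renderer never reads the wall clock itself.
    regenerated_at: str
    checksum: str


def content_checksum(rows: list[dict]) -> str:
    # Hash rows in a stable order so the result does not depend on the
    # order the database happened to return them in.
    digest = hashlib.sha256()
    for row in sorted(rows, key=lambda r: r["id"]):
        digest.update(repr(sorted(row.items())).encode("utf-8"))
    return digest.hexdigest()


def render_mirror(rows: list[dict], inputs: RenderInputs) -> str:
    # Sort explicitly instead of relying on dict or query ordering.
    lines = [f"- {r['name']}" for r in sorted(rows, key=lambda r: r["id"])]
    lines.append(
        f"Regenerated {inputs.regenerated_at} (checksum {inputs.checksum[:12]})"
    )
    return "\n".join(lines)
```

With the timestamp passed in, two renders of the same rows are
byte-identical regardless of iteration order, which is what a
determinism test would assert.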
Co-Authored-By: Codex <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings folded in, all with exact file:line refs from Codex:
- F-1 downgraded from done to partial. Entity dataclass at
service.py:67 and entities table missing extractor_version and
canonical_home fields per engineering-v1-acceptance.md:45. V1-0
scope now adds both via additive migration + doc note that
project is the project_id per "fields equivalent to" wording.
- F-2 replaced guesses with ground truth per-query status:
9 of 20 v1-required queries done, 1 partial (Q-001 needs
subsystem-scoped variant), 10 missing. V1-A scope shrank to
Q-001 shape fix + Q-6 integration. V1-C closes the 8 net-new
queries; Q-020 deferred to V1-D (mirror).
- F-5 reframed. Generic conflicts + conflict_members schema
already present at database.py:190, no migration needed.
Divergence is detector body (per-type dispatch needs
generalization) + routes (/admin/conflicts/* needs
/conflicts/* alias). V1-F scope is detector + routes only.
Totals revised: 16.5-17.5 days, ~60 tests.
Three of Codex's eight open questions now resolved. Remaining:
F-7 graduation depth, mirror determinism, project naming,
velocity calibration, minions-as-V2 naming.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/decisions/2026-04-22-gbrain-plan-rejection.md: record of
gbrain-inspired "Phase 8 Minions + typed edges" plan rejection.
Three high findings from Codex verified against cited architecture
docs (ontology V1 predicate set, canonical entity contract,
master-plan-status Now list sequencing).
- docs/plans/engineering-v1-completion-plan.md: seven-phase plan
for finishing Engineering V1 against engineering-v1-acceptance.md.
V1-0 (write-time invariants: F-8 provenance + F-5 hooks + F-1
audit) as hard prerequisite per Codex first-round review. Per-
criterion gap audit against each F/Q/O/D acceptance item with
code:line references. Explicit collision points with the Now
list; schedule shifted ~4 weeks to avoid pipeline-soak window.
Awaiting Codex file-level audit.
- DEV-LEDGER.md: Recent Decisions + Session Log entries covering
both the rejection and the revised plan.
No code changes. Docs + ledger only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last P2 from Antoine's "daily-usable" sprint. Entities referenced via
[[Name]] in descriptions or mirror markdown now render as:
- live wikilink if the name matches an entity in the same project
- live cross-project link with "(in project X)" scope indicator if the
only match is in another project
- red italic redlink pointing at /wiki/new?name=... otherwise
Clicking a redlink opens a pre-filled "create this entity" form that
POSTs to /v1/entities and redirects to the new entity's page.
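
A minimal sketch of the three-way resolution described above. The
entities mapping shape and the /wiki/entities/{id} href are assumptions
made for illustration; the real _resolve_wikilink in engineering/wiki.py
may differ.

```python
import html
import re
from urllib.parse import quote

WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def resolve_wikilink(name, project, entities):
    """entities: hypothetical dict of project -> {entity name -> entity id}."""
    label = html.escape(name)
    same_project = entities.get(project, {})
    if name in same_project:
        # Live link: the name matches an entity in the same project.
        return f'<a class="wikilink" href="/wiki/entities/{same_project[name]}">{label}</a>'
    for other in sorted(entities):
        if other != project and name in entities[other]:
            # Cross-project link with a scope indicator.
            entity_id = entities[other][name]
            return (
                f'<a class="wikilink-cross" href="/wiki/entities/{entity_id}">'
                f"{label}</a> (in project {html.escape(other)})"
            )
    # No match anywhere: redlink to the pre-filled create form.
    return f'<a class="redlink" href="/wiki/new?name={quote(name)}">{label}</a>'


def wikilink_transform(text, project, entities):
    # Replace every [[Name]] occurrence before markdown rendering.
    return WIKILINK.sub(
        lambda m: resolve_wikilink(m.group(1).strip(), project, entities), text
    )
```

The spec regression maps onto this directly: with no entity B,
"[[B]]" renders as a redlink; once B exists in the same project, the
same text renders as a live link.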
- engineering/wiki.py: _wikilink_transform + _resolve_wikilink,
applied in render_project (pre-markdown) and render_entity
(description body). render_new_entity_form for the create page.
CSS for .wikilink / .wikilink-cross / .redlink / .new-entity-form
- api/routes.py: GET /wiki/new?name&project
- tests/test_wikilinks.py: 12 tests including the spec regression
(A references [[B]] -> redlink; create B -> link becomes live)
- DEV-LEDGER.md: session log + test_count 521 -> 533
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New table memory_merge_candidates + service functions to cluster
near-duplicate active memories within (project, memory_type) buckets,
draft a unified content via LLM, and merge on human approval. Source
memories become superseded (never deleted); merged memory carries
union of tags, max of confidence, sum of reference_count.
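
The merge rule above (union of tags, max of confidence, sum of
reference_count, sources superseded rather than deleted) can be
illustrated as follows. The Memory dataclass and sketch_merge are
hypothetical stand-ins, not the service's actual merge_memories
signature.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    # Hypothetical, trimmed-down record for illustration only.
    content: str
    tags: set = field(default_factory=set)
    confidence: float = 0.0
    reference_count: int = 0
    status: str = "active"


def sketch_merge(sources, unified_content):
    if not sources:
        raise ValueError("need at least one source memory")
    merged = Memory(
        content=unified_content,
        tags=set().union(*(m.tags for m in sources)),
        confidence=max(m.confidence for m in sources),
        reference_count=sum(m.reference_count for m in sources),
    )
    # Sources are kept for provenance; only their status changes.
    for m in sources:
        m.status = "superseded"
    return merged
```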
- schema migration for memory_merge_candidates
- atocore.memory.similarity: cosine + transitive clustering
- atocore.memory._dedup_prompt: stdlib-only LLM prompt preserving every specific
- service: merge_memories / create_merge_candidate / get_merge_candidates / reject_merge_candidate
- scripts/memory_dedup.py: host-side detector (HTTP-only, idempotent)
- 5 API endpoints under /admin/memory/merge-candidates* + /admin/memory/dedup-scan
- triage UI: purple "🔗 Merge Candidates" section + "🔗 Scan for duplicates" bar
- batch-extract.sh Step B3 (0.90 daily, 0.85 Sundays)
- deploy/dalidou/dedup-watcher.sh for UI-triggered scans
- 21 new tests (374 → 395)
- docs/PHASE-7-MEMORY-CONSOLIDATION.md covering 7A-7H roadmap
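
For reference, "cosine + transitive clustering" can be sketched with a
small union-find: any pair at or above the threshold is linked, and
links chain transitively. This is an illustrative stand-in that assumes
plain float vectors; it is not the code in atocore.memory.similarity.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.90):
    """Return index groups of size >= 2 linked by pairwise similarity."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        # Path-halving find.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]
```

Note the transitive effect: A and C can land in one cluster through B
even when the A-C similarity is below the threshold, which is one
reason the merge still waits for human approval.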
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updated DEV-LEDGER orientation with post-sprint state:
- live_sha 775960c, tests 303, harness 17/18 on live
- interactions 234 (192 claude-code + 38 openclaw)
- project_state_entries 110 across 6 projects
- nightly pipeline now includes auto-promote, harness, summary
Updated master-plan-status.md "What Is Real Today" to match
actual 2026-04-16 state. Phase 10 moved from "Next" to
operational. New "Now" priorities: observe pipeline, knowledge
density, multi-model triage, fix p04-constraints.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- openclaw-plugins/atocore-capture/handler.js: simplified version
using before_agent_start + llm_output hooks (survives gateway
restarts). The production copy lives on T420 at
/tmp/atocore-openclaw-capture-plugin/openclaw-plugins/atocore-capture/
- DEV-LEDGER: updated orientation (live_sha b687e7f, capture clients)
and session log for 2026-04-16
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The fixture's expect_absent: "GigaBIT" was catching legitimate
semantic overlap, not retrieval bleed. The p06 ARCHITECTURE.md
Overview describes the Polisher Suite as built for the GigaBIT M1
mirror — it is what the polisher is for, so the word appears
correctly in p06 content. All retrieved sources for this prompt
were genuinely p06/shared paths; zero actual p04 chunks leaked.
Narrowed the assertion to expect_absent: "[Source: p04-gigabit/",
which tests the real invariant (no p04 source chunks retrieved
into p06 context) without the false positive.
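
To make the difference concrete, here is a toy version of the check.
The context string and the p06-polisher/ARCHITECTURE.md path are made
up for illustration; only the two expect_absent values come from the
fixture.

```python
def passes_expect_absent(context: str, expect_absent: str) -> bool:
    # The fixture passes only if the forbidden substring never appears.
    return expect_absent not in context


# Hypothetical retrieved context: a genuine p06 chunk that legitimately
# mentions GigaBIT because the polisher is built for that mirror.
context = (
    "[Source: p06-polisher/ARCHITECTURE.md] "
    "The Polisher Suite is built for the GigaBIT M1 mirror."
)

old_result = passes_expect_absent(context, "GigaBIT")
new_result = passes_expect_absent(context, "[Source: p04-gigabit/")
```

Here old_result is False (a false positive on a legitimate mention)
while new_result is True, since no p04 source chunk is present.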
No retrieval/ranking code change. Fixture-only fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
R11: POST /admin/extract-batch with mode=llm now returns 503 when the
claude CLI is unavailable (was silently returning success with 0
candidates), with a message pointing at the host-side script. +2 tests.
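
The R11 behavior change, reduced to a framework-free sketch. The
function name, return shape, and message text are assumptions; the
real route lives in the API layer.

```python
import shutil


def extract_batch_sketch(mode, cli_available=None):
    """Return (status_code, body). Hypothetical stand-in for the route."""
    if cli_available is None:
        # Probe PATH for the claude CLI unless the caller overrides it.
        cli_available = shutil.which("claude") is not None
    if mode == "llm" and not cli_available:
        # Fail loudly instead of reporting success with 0 candidates.
        return 503, {
            "detail": "claude CLI unavailable here; use the host-side script"
        }
    # Extraction itself is out of scope for this sketch.
    return 200, {"candidates": []}
```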
R12: extracted SYSTEM_PROMPT + parse_llm_json_array +
normalize_candidate_item + build_user_message into stdlib-only
src/atocore/memory/_llm_prompt.py. Both the container extractor and
scripts/batch_llm_extract_live.py now import from it, eliminating the
prompt/parser drift risk.
Tests 297 -> 299.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reflects today's massive work: engineering layer + wiki + Karpathy
upgrades + OpenClaw importer + auto-detection. Active memories
47 -> 84. Ready for next session to pick up cold.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Before: a model returning 'p04-gigabit' for a p06-polisher
interaction would silently override the known scope because the
project was registered. After: interaction.project always wins
when set. Model project is only a fallback for unscoped captures.
Not yet guaranteed: within-project semantic errors (model says
the right project but wrong content). That's a content-quality
concern, not a trust-hierarchy issue.
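
The resulting trust hierarchy fits in a few lines. resolve_project is
a hypothetical helper name used only to illustrate the rule.

```python
def resolve_project(interaction_project, model_project):
    # A known interaction scope always wins when it is set.
    if interaction_project:
        return interaction_project
    # The model's project is only a fallback for unscoped captures.
    return model_project or ""
```

So resolve_project("p06-polisher", "p04-gigabit") yields
"p06-polisher", and the model's answer is used only when the
interaction carries no project.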
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex R6: the LLM extractor accepted the model's project field
verbatim. When the model returned empty string, clearly p06 memories
got promoted as project='', making them invisible to the p06
project-memory band and explaining the p06-offline-design harness
failure.
Fix: if model returns empty project but interaction.project is set,
inherit the interaction's project. Model-supplied project still takes
precedence when non-empty.
Two new tests lock the fallback and precedence behaviors.
R5 acknowledged (LLM extractor not yet wired into API — next task).
Test count: 278 -> 280. Harness re-run pending after deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mini-phase complete. Before/after deltas:
Metric                    Before  After
──────────────────────────────────────────────────────────────
Rule extractor recall     0%      0% (unchanged, deprioritized)
LLM extractor recall      n/a     100% (new, claude -p haiku)
LLM candidate yield       n/a     2.55/interaction
First triage accept rate  n/a     31% (16/51)
Active memories           20      36 (+16)
p06-polisher memories     2       16 (+14)
atocore memories          0       5 (+5)
Retrieval harness         6/6     15/18 (expanded to 18 fixtures)
Test count                264     278 (+14)
3 remaining harness failures are budget-contention on the p06 memory
band: the specific memory a fixture targets ranks 4th+ and the 25%
budget only holds 2-3 entries. Not a ranking bug — the per-entry
250-char cap was the one justified tweak; a second budget change
risks regressing other fixtures per Codex's Day 7 hard gate.
Ledger updated: Orientation, Session Log, main_tip, harness line.
Next on the roadmap (from DEV-LEDGER Active Plan / docs/next-steps):
- Wave 2 trusted operational ingestion (p04/p05/p06 dashboards)
- Finish OpenClaw integration (Phase 8)
- Auto-triage (multi-model second pass to reduce human review)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Day 2 extractor eval baseline on a 20-interaction labeled set shows
0% yield / 0% recall / 0% precision. The 5 false negatives span
5 distinct miss classes, matching the pattern Codex's Day 4 hard
gate was designed to catch but arriving two days early.
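
For readers checking the arithmetic, a sketch of how such a baseline
can be scored. It assumes yield means candidates per interaction and
that labels and extractions are comparable as sets; the eval script's
actual definitions may differ.

```python
def eval_metrics(expected, extracted, n_interactions):
    """expected / extracted: sets of candidate identifiers (assumed shape)."""
    true_pos = len(expected & extracted)
    false_pos = len(extracted - expected)
    false_neg = len(expected - extracted)
    # Guard the empty-denominator cases instead of dividing by zero.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return {
        "yield": len(extracted) / n_interactions,
        "recall": recall,
        "precision": precision,
    }
```

With 5 expected items, nothing extracted, and 20 interactions, all
three values come out as 0.0, matching the reported baseline.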
No extractor code change on main. Day 1+2 artifacts committed on
working branch 'claude/extractor-eval-loop' at 7d8d599. Day 4
decision (keep rule-expanding vs prototype LLM-assisted mode) is
escalated to Antoine for ratification before Day 3 work touches
any extractor.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ledger is the one-file source of truth for "what is currently
true" across Claude/Codex/human sessions:
- Orientation (live SHA, main tip, test count, harness state)
- Active Plan (currently Codex's 8-day extractor + harness plan
with hard gates and fail-early thresholds)
- Open Review Findings (P1/P2, status)
- Recent Decisions (bounded to last 20)
- Session Log (bounded to last 20)
- Working Rules (no parallel work, branching rule, P1 block)
Narrative docs under docs/ sometimes lag reality; the ledger does
not. Every session MUST read it at start and append a Session Log
line before ending.
AGENTS.md: added a new "Session protocol" section at the top that
points at the ledger. Applies to any agent (Claude, Codex, future).
CLAUDE.md (new, project-local): project instructions for Claude
Code in this repo. Points at DEV-LEDGER.md and AGENTS.md, spells
out the deploy workflow and the Claude/Codex working model.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>