Codex audit of cbf9e03 surfaced two P1 gaps + one P2 scope concern,
all verified with code-level probes. Patches below.
P1: promote_entity did not re-check F-8 at status flip.
Legacy candidates with source_refs='[]' and hand_authored=0 can
exist from before V1-0 enforcement. promote_entity now raises
ValueError before flipping status so no F-8 violation can slip
into the active store through the promote path. Row stays
candidate on rejection. Symmetric error shape with the create
side.
P1: supersede_entity was missing the F-5 hook.
Plan calls for synchronous conflict detection on every
active-entity write path. Supersede creates a `supersedes`
relationship rooted at the `superseded_by` entity, which can
produce a conflict the detector should catch. Added
detect_conflicts_for_entity(superseded_by) call with fail-open
per conflict-model.md:256.
P2: backfill script --invalidate-instead was too broad.
Query included both active AND superseded rows; invalidating
superseded rows collapses audit history that V1-0 remediation
never intended to touch. Now --invalidate-instead scopes to
status='active' only. Default hand_authored-flag mode stays
broad since it's additive/non-destructive. Help text made the
destructive posture explicit.
Four new regression tests in test_v1_0_write_invariants.py:
- test_promote_rejects_legacy_candidate_without_provenance
- test_promote_accepts_candidate_flagged_hand_authored
- test_supersede_runs_conflict_detection_on_new_active
- test_supersede_hook_fails_open
Test count: 543 -> 547 (+4). Full suite green in 81.07s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase V1-0 of the Engineering V1 Completion Plan. Establishes the
write-time invariants every later phase depends on so no later phase
can leak invalid state into the entity store.
F-1 shared-header fields per engineering-v1-acceptance.md:45:
- entities.extractor_version (default "", EXTRACTOR_VERSION="v1.0.0"
written by service.create_entity)
- entities.canonical_home (default "entity")
- entities.hand_authored (default 0, INTEGER boolean)
Idempotent ALTERs in both _apply_migrations (database.py) and
init_engineering_schema (service.py). CREATE TABLE also carries the
columns for fresh DBs. _row_to_entity tolerates old rows without
them so tests that predate V1-0 keep passing.
F-8 provenance enforcement per promotion-rules.md:243:
create_entity raises ValueError when source_refs is empty and
hand_authored is False. New kwargs hand_authored and
extractor_version threaded through the API (EntityCreateRequest)
and the /wiki/new form body (human wiki writes set hand_authored
true by definition). The non-negotiable invariant: every row either
carries provenance or is explicitly flagged as hand-authored.
F-5 synchronous conflict-detection hook on active create per
engineering-v1-acceptance.md:99:
create_entity(status="active") now runs detect_conflicts_for_entity
with fail-open per conflict-model.md:256. Detector errors log a
warning but never 4xx-block the write (Q-3 "flag, never block").
Doc note added to engineering-ontology-v1.md recording that `project`
IS the `project_id` per "fields equivalent to" wording. No storage
rename.
Backfill script scripts/v1_0_backfill_provenance.py reports and
optionally flags existing active entities that lack provenance.
Idempotent. Supports --dry-run and --invalidate-instead.
Tests: 10 new in test_v1_0_write_invariants.py covering F-1 fields,
F-8 raise + bypass, F-5 hook on active + no-hook on candidate, Q-3
fail-open, Q-4 partial scope_only=active excludes candidates.
Three pre-existing conflict tests adapted to read list_open_conflicts
rather than re-run the detector (which now dedups because the hook
already fired at create-time). One API test adds hand_authored=true
since its fixture has no source_refs.
conftest.py wraps create_entity so tests that don't pass source_refs
or hand_authored default to hand_authored=True (tests author their
own fixture data — reasonable default). Production paths (API route,
wiki form, graduation scripts) all pass explicit values and are
unaffected.
Test count: 533 -> 543 (+10). Full suite green in 77.86s.
Pending: Codex review on the branch before squash-merge to main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex's third-round audit closed the remaining five open questions
with concrete file:line resolutions, patched inline in the plan:
- F-7 (P1): graduation stack is partially built — graduated_to_entity_id
at database.py:143-146, graduated memory status, promote preserves
original at service.py:354-356, tests at test_engineering_v1_phase5.py.
Gaps: missing direct POST /memory/{id}/graduate route; spec's
knowledge -> Fact mismatches ontology (no fact type). Reconcile to
parameter or similar. V1-E 2 days -> 3-4 days.
- Q-5 / V1-D (P2): renderer reads wall-clock in _footer at mirror.py:320.
Fix is injecting regenerated timestamp + checksum as renderer inputs,
sorting DB iteration, removing dict ordering deps. Render code must
not call wall-clock directly.
- project vs project_id (P3): doc note only, no storage rename.
- Total estimate: 17.5-19.5 focused days (calendar buffer on top).
- Release notes must NOT canonize "Minions" as a V2 name. Use neutral
"queued background processing / async workers" wording.
Sign-off from Codex: "with those edits, I'd sign off on the five
questions. The only non-architectural uncertainty left in the plan is
scheduling discipline against the current Now list; that does not
block V1-0 once the soak window and memory-density gate clear."
Plan frozen. V1-0 starts after pipeline soak (~2026-04-26) and the
100-active-memory density gate clear.
Co-Authored-By: Codex <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings folded in, all with exact file:line refs from Codex:
- F-1 downgraded from done to partial. Entity dataclass at
service.py:67 and entities table missing extractor_version and
canonical_home fields per engineering-v1-acceptance.md:45. V1-0
scope now adds both via additive migration + doc note that
project is the project_id per "fields equivalent to" wording.
- F-2 replaced guesses with ground truth per-query status:
9 of 20 v1-required queries done, 1 partial (Q-001 needs
subsystem-scoped variant), 10 missing. V1-A scope shrank to
Q-001 shape fix + Q-6 integration. V1-C closes the 8 net-new
queries; Q-020 deferred to V1-D (mirror).
- F-5 reframed. Generic conflicts + conflict_members schema
already present at database.py:190, no migration needed.
Divergence is detector body (per-type dispatch needs
generalization) + routes (/admin/conflicts/* needs
/conflicts/* alias). V1-F scope is detector + routes only.
Totals revised: 16.5-17.5 days, ~60 tests.
Three of Codex's eight open questions now resolved. Remaining:
F-7 graduation depth, mirror determinism, project naming,
velocity calibration, minions-as-V2 naming.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/decisions/2026-04-22-gbrain-plan-rejection.md: record of
gbrain-inspired "Phase 8 Minions + typed edges" plan rejection.
Three high findings from Codex verified against cited architecture
docs (ontology V1 predicate set, canonical entity contract,
master-plan-status Now list sequencing).
- docs/plans/engineering-v1-completion-plan.md: seven-phase plan
for finishing Engineering V1 against engineering-v1-acceptance.md.
V1-0 (write-time invariants: F-8 provenance + F-5 hooks + F-1
audit) as hard prerequisite per Codex first-round review. Per-
criterion gap audit against each F/Q/O/D acceptance item with
code:line references. Explicit collision points with the Now
list; schedule shifted ~4 weeks to avoid pipeline-soak window.
Awaiting Codex file-level audit.
- DEV-LEDGER.md: Recent Decisions + Session Log entries covering
both the rejection and the revised plan.
No code changes. Docs + ledger only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last P2 from Antoine's "daily-usable" sprint. Entities referenced via
[[Name]] in descriptions or mirror markdown now render as:
- live wikilink if the name matches an entity in the same project
- live cross-project link with "(in project X)" scope indicator if the
only match is in another project
- red italic redlink pointing at /wiki/new?name=... otherwise
Clicking a redlink opens a pre-filled "create this entity" form that
POSTs to /v1/entities and redirects to the new entity's page.
- engineering/wiki.py: _wikilink_transform + _resolve_wikilink,
applied in render_project (pre-markdown) and render_entity
(description body). render_new_entity_form for the create page.
CSS for .wikilink / .wikilink-cross / .redlink / .new-entity-form
- api/routes.py: GET /wiki/new?name&project
- tests/test_wikilinks.py: 12 tests including the spec regression
(A references [[B]] -> redlink; create B -> link becomes live)
- DEV-LEDGER.md: session log + test_count 521 -> 533
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New table memory_merge_candidates + service functions to cluster
near-duplicate active memories within (project, memory_type) buckets,
draft a unified content via LLM, and merge on human approval. Source
memories become superseded (never deleted); merged memory carries
union of tags, max of confidence, sum of reference_count.
- schema migration for memory_merge_candidates
- atocore.memory.similarity: cosine + transitive clustering
- atocore.memory._dedup_prompt: stdlib-only LLM prompt preserving every specific
- service: merge_memories / create_merge_candidate / get_merge_candidates / reject_merge_candidate
- scripts/memory_dedup.py: host-side detector (HTTP-only, idempotent)
- 5 API endpoints under /admin/memory/merge-candidates* + /admin/memory/dedup-scan
- triage UI: purple "🔗 Merge Candidates" section + "🔗 Scan for duplicates" bar
- batch-extract.sh Step B3 (0.90 daily, 0.85 Sundays)
- deploy/dalidou/dedup-watcher.sh for UI-triggered scans
- 21 new tests (374 → 395)
- docs/PHASE-7-MEMORY-CONSOLIDATION.md covering 7A-7H roadmap
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updated DEV-LEDGER orientation with post-sprint state:
- live_sha 775960c, tests 303, harness 17/18 on live
- interactions 234 (192 claude-code + 38 openclaw)
- project_state_entries 110 across 6 projects
- nightly pipeline now includes auto-promote, harness, summary
Updated master-plan-status.md "What Is Real Today" to match
actual 2026-04-16 state. Phase 10 moved from "Next" to
operational. New "Now" priorities: observe pipeline, knowledge
density, multi-model triage, fix p04-constraints.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- openclaw-plugins/atocore-capture/handler.js: simplified version
using before_agent_start + llm_output hooks (survives gateway
restarts). The production copy lives on T420 at
/tmp/atocore-openclaw-capture-plugin/openclaw-plugins/atocore-capture/
- DEV-LEDGER: updated orientation (live_sha b687e7f, capture clients)
and session log for 2026-04-16
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The fixture's expect_absent: "GigaBIT" was catching legitimate
semantic overlap, not retrieval bleed. The p06 ARCHITECTURE.md
Overview describes the Polisher Suite as built for the GigaBIT M1
mirror — it is what the polisher is for, so the word appears
correctly in p06 content. All retrieved sources for this prompt
were genuinely p06/shared paths; zero actual p04 chunks leaked.
Narrowed the assertion to expect_absent: "[Source: p04-gigabit/",
which tests the real invariant (no p04 source chunks retrieved
into p06 context) without the false positive.
No retrieval/ranking code change. Fixture-only fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
R11: POST /admin/extract-batch with mode=llm now returns 503 when the
claude CLI is unavailable (was silently returning success with 0
candidates), with a message pointing at the host-side script. +2 tests.
R12: extracted SYSTEM_PROMPT + parse_llm_json_array +
normalize_candidate_item + build_user_message into stdlib-only
src/atocore/memory/_llm_prompt.py. Both the container extractor and
scripts/batch_llm_extract_live.py now import from it, eliminating the
prompt/parser drift risk.
Tests 297 -> 299.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reflects today's massive work: engineering layer + wiki + Karpathy
upgrades + OpenClaw importer + auto-detection. Active memories
47 -> 84. Ready for next session to pick up cold.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Before: a model returning 'p04-gigabit' for a p06-polisher
interaction would silently override the known scope because the
project was registered. After: interaction.project always wins
when set. Model project is only a fallback for unscoped captures.
Not yet guaranteed: within-project semantic errors (model says
the right project but wrong content). That's a content-quality
concern, not a trust-hierarchy issue.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex R6: the LLM extractor accepted the model's project field
verbatim. When the model returned empty string, clearly p06 memories
got promoted as project='', making them invisible to the p06
project-memory band and explaining the p06-offline-design harness
failure.
Fix: if model returns empty project but interaction.project is set,
inherit the interaction's project. Model-supplied project still takes
precedence when non-empty.
Two new tests lock the fallback and precedence behaviors.
R5 acknowledged (LLM extractor not yet wired into API — next task).
Test count: 278 -> 280. Harness re-run pending after deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mini-phase complete. Before/after deltas:
Metric Before After
─────────────────────────────────────────
Rule extractor recall 0% 0% (unchanged, deprioritized)
LLM extractor recall n/a 100% (new, claude -p haiku)
LLM candidate yield n/a 2.55/interaction
First triage accept rate n/a 31% (16/51)
Active memories 20 36 (+16)
p06-polisher memories 2 16 (+14)
atocore memories 0 5 (+5)
Retrieval harness 6/6 15/18 (expanded to 18 fixtures)
Test count 264 278 (+14)
3 remaining harness failures are budget-contention on the p06 memory
band: the specific memory a fixture targets ranks 4th+ and the 25%
budget only holds 2-3 entries. Not a ranking bug — the per-entry
250-char cap was the one justified tweak; a second budget change
risks regressing other fixtures per Codex's Day 7 hard gate.
Ledger updated: Orientation, Session Log, main_tip, harness line.
Next on the roadmap (from DEV-LEDGER Active Plan / docs/next-steps):
- Wave 2 trusted operational ingestion (p04/p05/p06 dashboards)
- Finish OpenClaw integration (Phase 8)
- Auto-triage (multi-model second pass to reduce human review)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Day 2 extractor eval baseline on a 20-interaction labeled set shows
0% yield / 0% recall / 0% precision. The 5 false negatives span
5 distinct miss classes, matching the pattern Codex's Day 4 hard
gate was designed to catch but arriving two days early.
No extractor code change on main. Day 1+2 artifacts committed on
working branch 'claude/extractor-eval-loop' at 7d8d599. Day 4
decision (keep rule-expanding vs prototype LLM-assisted mode) is
escalated to Antoine for ratification before Day 3 work touches
any extractor.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ledger is the one-file source of truth for "what is currently
true" across Claude/Codex/human sessions:
- Orientation (live SHA, main tip, test count, harness state)
- Active Plan (currently Codex's 8-day extractor + harness plan
with hard gates and fail-early thresholds)
- Open Review Findings (P1/P2, status)
- Recent Decisions (bounded to last 20)
- Session Log (bounded to last 20)
- Working Rules (no parallel work, branching rule, P1 block)
Narrative docs under docs/ sometimes lag reality; the ledger does
not. Every session MUST read it at start and append a Session Log
line before ending.
AGENTS.md: added a new "Session protocol" section at the top that
points at the ledger. Applies to any agent (Claude, Codex, future).
CLAUDE.md (new, project-local): project instructions for Claude
Code in this repo. Points at DEV-LEDGER.md and AGENTS.md, spells
out the deploy workflow and the Claude/Codex working model.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>