The fixture's expect_absent: "GigaBIT" was catching legitimate semantic overlap, not retrieval bleed. The p06 ARCHITECTURE.md Overview describes the Polisher Suite as built for the GigaBIT M1 mirror — it is what the polisher is for, so the word appears correctly in p06 content. All retrieved sources for this prompt were genuinely p06/shared paths; zero actual p04 chunks leaked. Narrowed the assertion to expect_absent: "[Source: p04-gigabit/", which tests the real invariant (no p04 source chunks retrieved into p06 context) without the false positive. No retrieval/ranking code change. Fixture-only fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
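Here is a minimal Python sketch of why the narrowed needle works better. It assumes the assembled context labels each chunk with a `[Source: <path>]` header. The new needle implies that format, but it is not shown here. The file paths and the `absent_ok` helper are hypothetical; they are not the harness's actual code.

```python
# Hypothetical p06 context: the ARCHITECTURE.md Overview chunk legitimately
# names the mirror the Polisher Suite is built for.
P06_CONTEXT = (
    "[Source: p06-polisher/ARCHITECTURE.md]\n"
    "The Polisher Suite is built for the GigaBIT M1 telescope mirror.\n"
)

# The same context plus a genuinely leaked p04 chunk.
LEAKED_CONTEXT = P06_CONTEXT + (
    "[Source: p04-gigabit/overview.md]\n"
    "GigaBIT M1 mirror specifications.\n"
)


def absent_ok(context: str, expect_absent: str) -> bool:
    """True when the fixture's expect_absent needle does not occur in the context."""
    return expect_absent not in context


# Word blacklist: fails on legitimate p06 content (the false positive).
assert not absent_ok(P06_CONTEXT, "GigaBIT")
# Source-path needle: passes on clean p06 context ...
assert absent_ok(P06_CONTEXT, "[Source: p04-gigabit/")
# ... and still catches a real p04 chunk leak.
assert not absent_ok(LEAKED_CONTEXT, "[Source: p04-gigabit/")
```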
AtoCore Dev Ledger
Shared operating memory between humans, Claude, and Codex. Every session MUST read this file at start and append a Session Log entry before ending. Section headers are stable; do not rename them. Trim Session Log and Recent Decisions to the last 20 entries at session end; older history lives in `git log` and `docs/`.
Orientation
- live_sha (Dalidou `/health` build_sha): `c2e7064` (verified 2026-04-15 via `/health`, build_time 2026-04-15T15:08:51Z)
- last_updated: 2026-04-15 by Claude (deploy caught up; R10/R13 closed)
- main_tip: `c2e7064` (plus one pending doc/ledger commit for this session)
- test_count: 299 collected via `pytest --collect-only -q` on a clean checkout, 2026-04-15 (reproduction recipe in Quick Commands)
- harness: 18/18 PASS
- vectors: 33,253
- active_memories: 84 (31 project, 23 knowledge, 10 episodic, 8 adaptation, 7 preference, 5 identity)
- candidate_memories: 2
- registered_projects: atocore, p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, abb-space (aliased p08)
- project_state_entries: 78 total (p04=9, p05=13, p06=13, atocore=43)
- entities: 35 (engineering knowledge graph, Layer 2)
- off_host_backup: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron, verified
- nightly_pipeline: backup → cleanup → rsync → OpenClaw import (NEW) → vault refresh (NEW) → extract → auto-triage → weekly synth/lint Sundays
- capture_clients: claude-code (Stop hook), openclaw (plugin + file importer)
- wiki: http://dalidou:8100/wiki (browse), /wiki/projects/{id}, /wiki/entities/{id}, /wiki/search
- dashboard: http://dalidou:8100/admin/dashboard
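The `nightly_pipeline` ordering can be read as a fixed step list with a Sunday-only tail. The sketch below is only an illustration. The step names paraphrase the Orientation line. The real pipeline is a cron-driven shell script (`deploy/dalidou/cron-backup.sh`, per the Session Log), not Python.

```python
import datetime

# Step order paraphrased from the Orientation `nightly_pipeline` entry.
NIGHTLY_STEPS = [
    "backup",
    "cleanup",
    "rsync",
    "openclaw_import",
    "vault_refresh",
    "extract",
    "auto_triage",
]
# Weekly passes that only run on Sundays.
WEEKLY_STEPS = ["synthesis", "lint"]


def steps_for(day: datetime.date) -> list[str]:
    """Ordered steps for one nightly run on `day`."""
    steps = list(NIGHTLY_STEPS)
    if day.weekday() == 6:  # date.weekday(): Monday == 0 ... Sunday == 6
        steps += WEEKLY_STEPS
    return steps
```

For example, `steps_for(datetime.date(2026, 4, 12))` falls on a Sunday, so it ends with `synthesis` and `lint`. A weekday run stops at `auto_triage`.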
Active Plan
Mini-phase: Extractor improvement (eval-driven) + retrieval harness expansion. Duration: 8 days, hard gates at each day boundary. Plan author: Codex (2026-04-11). Executor: Claude. Audit: Codex.
Preflight (before Day 1)
Stop if any of these fail:
- `git rev-parse HEAD` on `main` matches the expected branching tip
- Live `/health` on Dalidou reports the SHA you think is deployed
- `python scripts/retrieval_eval.py --json` still passes at the current baseline
- `batch-extract` over the known 42-capture slice reproduces the current low-yield baseline
- A frozen sample set exists for extractor labeling so the target does not move mid-phase
Success: baseline eval output saved, baseline extract output saved, working branch created from origin/main.
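The first two preflight checks compare a local commit SHA with the `build_sha` that `/health` reports. The sketch below assumes `/health` returns a JSON object with a top-level `build_sha` field. The field name comes from the Orientation section; the exact response shape is an assumption. It also assumes a short SHA should match as a prefix of a full one. The helper names are hypothetical.

```python
import json
import subprocess
import urllib.request


def local_head(ref: str = "main") -> str:
    """Full SHA of a local ref, via `git rev-parse`."""
    return subprocess.run(
        ["git", "rev-parse", ref], check=True, capture_output=True, text=True
    ).stdout.strip()


def live_build_sha(base_url: str = "http://localhost:8100") -> str:
    """build_sha from /health (assumed top-level JSON field)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
        return json.load(resp)["build_sha"]


def shas_match(expected: str, reported: str) -> bool:
    """Match short and full SHAs by prefix, case-insensitively."""
    a, b = expected.strip().lower(), reported.strip().lower()
    if not a or not b:
        return False
    return a.startswith(b) or b.startswith(a)
```

On Dalidou itself, `shas_match(local_head(), live_build_sha())` could gate the rest of the preflight. From another host, the `/health` call would go through `ssh`, as in Quick Commands.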
Day 1 - Labeled extractor eval set
Pick 30 real captures: 10 that should produce 0 candidates, 10 that should plausibly produce 1, 10 ambiguous/hard. Store as a stable artifact (interaction id, expected count, expected type, notes). Add a runner that scores extractor output against labels.
Success: 30 labeled interactions in a stable artifact, one-command precision/recall output. Fail-early: if labeling 30 takes more than a day because the concept is unclear, tighten the extraction target before touching code.
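Here is one hedged way to shape the Day 1 artifact and runner. The `Label` fields mirror the four listed above. Two parts are simplifying assumptions, not the actual `scripts/extractor_eval.py` behavior. First, matching is count-only and ignores `expected_type`. Second, "yield" is taken as extracted candidates per labeled interaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    interaction_id: str
    expected_count: int  # candidates a good extractor should produce (0 is valid)
    expected_type: str   # e.g. "decision"; empty when expected_count == 0
    notes: str = ""


def score(labels: list[Label], extracted: dict[str, int]) -> dict[str, float]:
    """Count-only scorecard: per interaction, matched = min(expected, extracted)."""
    tp = fp = fn = 0
    for label in labels:
        got = extracted.get(label.interaction_id, 0)
        matched = min(label.expected_count, got)
        tp += matched
        fp += got - matched
        fn += label.expected_count - matched
    n_extracted, n_expected = tp + fp, tp + fn
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "precision": tp / n_extracted if n_extracted else 0.0,
        "recall": tp / n_expected if n_expected else 0.0,
        "yield": n_extracted / len(labels) if labels else 0.0,
    }
```

When nothing is extracted, the sketch returns 0.0 precision. That matches how the 2026-04-11 baseline was reported (0% precision with zero extractions), though precision is strictly undefined in that case.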
Day 2 - Measure current extractor
Run the rule-based extractor on all 30. Record yield, TP, FP, FN. Bucket misses by class (conversational preference, decision summary, status/constraint, meta chatter).
Success: short scorecard with counts by miss type, top 2 miss classes obvious. Fail-early: if the labeled set shows fewer than 5 plausible positives total, the corpus is too weak - relabel before tuning.
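Once each miss has a hand-assigned class, the Day 2 bucketing and the fail-early check are simple counting. In this sketch, the class strings are free-form labels assigned during review. It also assumes any interaction with a nonzero expected count counts as a plausible positive.

```python
from collections import Counter


def top_miss_classes(misses: list[str], n: int = 2) -> list[tuple[str, int]]:
    """Most frequent miss classes; `misses` holds one hand-assigned class per false negative."""
    return Counter(misses).most_common(n)


def corpus_too_weak(expected_counts: list[int], minimum: int = 5) -> bool:
    """Fail-early check: fewer than `minimum` plausible positives means relabel before tuning."""
    plausible_positives = sum(1 for count in expected_counts if count > 0)
    return plausible_positives < minimum
```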
Day 3 - Smallest rule expansion for top miss class
Add 1-2 narrow, explainable rules for the worst miss class. Add unit tests from real paraphrase examples in the labeled set. Then rerun eval.
Success: recall up on the labeled set, false positives do not materially rise, new tests cover the new cue class. Fail-early: if one rule expansion raises FP above ~20% of extracted candidates, revert or narrow before adding more.
Day 4 - Decision gate: more rules or LLM-assisted prototype
If rule expansion reaches a meaningfully reviewable queue, keep going with rules. Otherwise prototype an LLM-assisted extraction mode behind a flag.
"Meaningfully reviewable queue":
- >= 15-25% candidate yield on the 30 labeled captures
- FP rate low enough that manual triage feels tolerable
- >= 2 real non-synthetic candidates worth review
Hard stop: if candidate yield is still under 10% after this point, stop rule tinkering and switch to architecture review (LLM-assisted OR narrower extraction scope).
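The Day 4 decision fits in a small function. Two readings here are assumptions. The 20% FP ceiling is borrowed from Hard Gates as a stand-in for "low enough that manual triage feels tolerable". The lenient end of the 15-25% band serves as the yield floor.

```python
STOP = "stop rule tinkering: architecture review"
KEEP_RULES = "keep going with rules"
TRY_LLM = "prototype LLM-assisted extraction behind a flag"


def day4_decision(
    yield_rate: float,
    fp_rate: float,
    real_candidates: int,
    min_yield: float = 0.15,
    max_fp_rate: float = 0.20,
) -> str:
    """Map Day 4 measurements on the labeled set to the plan's three outcomes."""
    if yield_rate < 0.10:
        return STOP  # hard stop from the plan
    reviewable = (
        yield_rate >= min_yield
        and fp_rate <= max_fp_rate
        and real_candidates >= 2
    )
    return KEEP_RULES if reviewable else TRY_LLM
```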
Day 5 - Stabilize and document
Add remaining focused rules or the flagged LLM-assisted path. Write down in-scope and out-of-scope utterance kinds.
Success: labeled eval green against target threshold, extractor scope explainable in <= 5 bullets.
Day 6 - Retrieval harness expansion (6 -> 15-20 fixtures)
Grow across p04/p05/p06. Include short ambiguous prompts, cross-project collision cases, expected project-state wins, expected project-memory wins, and 1-2 "should fail open / low confidence" cases.
Success: >= 15 fixtures, each active project has easy + medium + hard cases. Fail-early: if fixtures are mostly obvious wins, add harder adversarial cases before claiming coverage.
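A coverage check for the Day 6 success criterion might look like the following. The `Fixture` shape and the difficulty labels are hypothetical; the plan does not specify how the harness stores fixtures.

```python
from collections import defaultdict
from dataclasses import dataclass

REQUIRED_DIFFICULTIES = {"easy", "medium", "hard"}


@dataclass(frozen=True)
class Fixture:
    name: str
    project: str     # e.g. "p06-polisher" (hypothetical field)
    difficulty: str  # "easy" | "medium" | "hard" (hypothetical field)


def coverage_gaps(fixtures: list[Fixture], projects: list[str], minimum_total: int = 15) -> list[str]:
    """Human-readable reasons the fixture set does not yet meet the Day 6 bar."""
    gaps = []
    if len(fixtures) < minimum_total:
        gaps.append(f"only {len(fixtures)} fixtures, need >= {minimum_total}")
    seen: dict[str, set[str]] = defaultdict(set)
    for fixture in fixtures:
        seen[fixture.project].add(fixture.difficulty)
    for project in projects:
        missing = REQUIRED_DIFFICULTIES - seen[project]
        if missing:
            gaps.append(f"{project} missing {sorted(missing)}")
    return gaps
```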
Day 7 - Regression pass and calibration
Run harness on current code vs live Dalidou. Inspect failures (ranking, ingestion gap, project bleed, budget). Make at most ONE ranking/budget tweak if the harness clearly justifies it. Do not mix harness expansion and ranking changes in a single commit unless tightly coupled.
Success: harness still passes or improves after extractor work; any ranking tweak is justified by a concrete fixture delta. Fail-early: if > 20-25% of harness fixtures regress after extractor changes, separate concerns before merging.
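The Day 7 fail-early rule needs a regression count between two harness runs. The sketch assumes each run is summarized as a map from fixture name to pass/fail. It uses 20% as the threshold, the low end of the plan's 20-25% band. A fixture missing from the later run counts as a regression.

```python
def regressions(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Fixtures that passed in `before` but fail (or are missing) in `after`."""
    return sorted(
        name for name, passed in before.items() if passed and not after.get(name, False)
    )


def must_separate_concerns(
    before: dict[str, bool], after: dict[str, bool], threshold: float = 0.20
) -> bool:
    """True when more than `threshold` of the fixtures regressed."""
    if not before:
        return False
    return len(regressions(before, after)) / len(before) > threshold
```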
Day 8 - Merge and close
Clean commit sequence. Save before/after metrics (extractor scorecard, harness results). Update docs only with claims the metrics support.
Merge order: labeled corpus + runner -> extractor improvements + tests -> harness expansion -> any justified ranking tweak -> docs sync last.
Success: point to a before/after delta for both extraction and retrieval; docs do not overclaim.
Hard Gates (stop/rethink points)
- Extractor yield < 10% after 30 labeled interactions -> stop, reconsider rule-only extraction
- FP rate > 20% on labeled set -> narrow rules before adding more
- Harness expansion finds < 3 genuinely hard cases -> harness still too soft
- Ranking change improves one project but regresses another -> do not merge without explicit tradeoff note
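Once the metrics exist, the four hard gates can be checked mechanically. The plan states the rules but no data format. In this sketch, `project_deltas` (the per-project change in passing fixtures after a ranking tweak) and `tradeoff_note` are hypothetical inputs.

```python
from dataclasses import dataclass


@dataclass
class PhaseMetrics:
    extractor_yield: float          # candidates per labeled interaction
    fp_rate: float                  # false positives / extracted candidates
    hard_fixtures: int              # genuinely hard harness cases
    project_deltas: dict[str, int]  # per-project change in passing fixtures
    tradeoff_note: bool = False     # explicit tradeoff note written?


def tripped_gates(m: PhaseMetrics) -> list[str]:
    """Return a message for every hard gate the metrics trip."""
    tripped = []
    if m.extractor_yield < 0.10:
        tripped.append("yield < 10%: reconsider rule-only extraction")
    if m.fp_rate > 0.20:
        tripped.append("FP > 20%: narrow rules before adding more")
    if m.hard_fixtures < 3:
        tripped.append("< 3 hard cases: harness still too soft")
    improved = any(d > 0 for d in m.project_deltas.values())
    regressed = any(d < 0 for d in m.project_deltas.values())
    if improved and regressed and not m.tradeoff_note:
        tripped.append("cross-project regression: do not merge without tradeoff note")
    return tripped
```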
Branching
Use one branch, `codex/extractor-eval-loop`, for Days 1-5 and a second, `codex/retrieval-harness-expansion`, for Days 6-7. This keeps extraction and retrieval judgments auditable.
Review Protocol
- Codex records review findings in Open Review Findings.
- Claude must read Open Review Findings at session start before coding.
- Codex owns finding text. Claude may update operational fields only: `status`, `owner`, `resolved_by`.
- If Claude disagrees with a finding, do not rewrite it. Mark it `declined` and explain why in the Session Log.
- Any commit or session that addresses a finding should reference the finding id in the commit message or Session Log.
- `P1` findings block further commits in the affected area until they are at least acknowledged and explicitly tracked.
- Findings may be code-level, claim-level, or ops-level. If the implementation boundary changes, retarget the finding instead of silently closing it.
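The field-ownership rule is easy to check on a before/after pair of finding rows. The sketch assumes a row is a dict from column name to value, keyed by the table's column names. `claude_violations` is a hypothetical helper.

```python
# Operational fields Claude may change; everything else belongs to Codex.
CLAUDE_EDITABLE = frozenset({"status", "owner", "resolved_by"})


def claude_violations(old_row: dict[str, str], new_row: dict[str, str]) -> set[str]:
    """Columns a Claude edit changed that Claude is not allowed to touch."""
    changed = {
        column
        for column in old_row.keys() | new_row.keys()
        if old_row.get(column) != new_row.get(column)
    }
    return changed - CLAUDE_EDITABLE
```

This check does not cover the requirement that a `declined` status be explained in the Session Log.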
Open Review Findings
| id | finder | severity | file:line | summary | status | owner | opened_at | resolved_by |
|---|---|---|---|---|---|---|---|---|
| R1 | Codex | P1 | deploy/hooks/capture_stop.py:76-85 | Live Claude capture still omits extract, so "loop closed both sides" remains overstated in practice even though the API supports it | fixed | Claude | 2026-04-11 | c67bec0 |
| R2 | Codex | P1 | src/atocore/context/builder.py | Project memories excluded from pack | fixed | Claude | 2026-04-11 | 8ea53f4 |
| R3 | Claude | P2 | src/atocore/memory/extractor.py | Rule cues (## Decision:) never fire on conversational LLM text | declined | Claude | 2026-04-11 | see 2026-04-14 session log |
| R4 | Codex | P2 | DEV-LEDGER.md:11 | Orientation main_tip was stale versus HEAD / origin/main | fixed | Codex | 2026-04-11 | 81307ce |
| R5 | Codex | P1 | src/atocore/interactions/service.py:157-174 | The deployed extraction path still calls only the rule extractor; the new LLM extractor is eval/script-only, so Day 4 "gate cleared" is true as a benchmark result but not as an operational extraction path | fixed | Claude | 2026-04-12 | c67bec0 |
| R6 | Codex | P1 | src/atocore/memory/extractor_llm.py:258-276 | LLM extraction accepts model-supplied project verbatim with no fallback to interaction.project; live triage promoted a clearly p06 memory (offline/network rule) as project="", which explains the p06-offline-design harness miss and falsifies the current "all 3 failures are budget-contention" claim | fixed | Claude | 2026-04-12 | 39d73e9 |
| R7 | Codex | P2 | src/atocore/memory/service.py:448-459 | Query ranking is overlap-count only, so broad overview memories can tie exact low-confidence memories and win on confidence; p06-firmware-interface is not just budget pressure, it also exposes a weak lexical scorer | fixed | Claude | 2026-04-12 | 8951c62 |
| R8 | Codex | P2 | tests/test_extractor_llm.py:1-7 | LLM extractor tests stop at parser/failure contracts; there is no automated coverage for the script-only persistence/review path that produced the 16 promoted memories, including project-scope preservation | fixed | Claude | 2026-04-12 | 69c9717 |
| R9 | Codex | P2 | src/atocore/memory/extractor_llm.py:258-259 | The R6 fallback only repairs empty project output. A wrong non-empty model project still overrides the interaction's known scope, so project attribution is improved but not yet trust-preserving. | fixed | Claude | 2026-04-12 | e5e9a99 |
| R10 | Codex | P2 | docs/master-plan-status.md:31-33 | "Phase 8 - OpenClaw Integration" is fair as a baseline milestone, but not as a "primary" integration claim. t420-openclaw/atocore.py currently covers a narrow read-oriented subset (13 request shapes vs 32 API routes) plus fail-open health, while memory/interactions/admin write paths remain out of surface. | fixed | Claude | 2026-04-12 | (pending) |
| R11 | Codex | P2 | src/atocore/api/routes.py:773-845 | POST /admin/extract-batch still accepts mode="llm" inside the container and returns a successful 0-candidate result instead of surfacing that host-only LLM extraction is unavailable from this runtime. That is a misleading API contract for operators. | fixed | Claude | 2026-04-12 | (pending) |
| R12 | Codex | P2 | scripts/batch_llm_extract_live.py:39-190 | The host-side extractor duplicates the LLM system prompt and JSON parsing logic from src/atocore/memory/extractor_llm.py. It works today, but this is now a prompt/parser drift risk across the container and host implementations. | fixed | Claude | 2026-04-12 | (pending) |
| R13 | Codex | P2 | DEV-LEDGER.md:12 | The new 286 passing test-count claim is not reproducibly auditable from the current audit environments: neither Dalidou nor the clean worktree has pytest available. The claim may be true in Claude's dev shell, but it remains unverified in this audit. | fixed | Claude | 2026-04-12 | (pending) |
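Each finding is one pipe-delimited row, so open `P1` blockers can be listed straight from the table. The sketch assumes one row per line and no literal `|` inside cells. The `acknowledged` status value is hypothetical, since the table currently uses only `fixed` and `declined`. It is included because the protocol says acknowledged P1s no longer block.

```python
COLUMNS = ["id", "finder", "severity", "file:line", "summary",
           "status", "owner", "opened_at", "resolved_by"]
# Statuses that no longer block commits; "acknowledged" is an assumed value.
NON_BLOCKING = {"acknowledged", "fixed", "declined"}


def parse_findings(table: str) -> list[dict[str, str]]:
    """Parse the Open Review Findings markdown table into one dict per finding."""
    rows = []
    for raw in table.splitlines():
        line = raw.strip()
        if not line.startswith("|") or line.startswith("|---") or line.startswith("| id "):
            continue  # skip prose, the separator row, and the header row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != len(COLUMNS):
            raise ValueError(f"malformed findings row: {line!r}")
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def blocking_p1s(rows: list[dict[str, str]]) -> list[str]:
    """Ids of P1 findings that still block commits in their area."""
    return [r["id"] for r in rows
            if r["severity"] == "P1" and r["status"] not in NON_BLOCKING]
```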
Recent Decisions
- 2026-04-12 Day 4 gate cleared: LLM-assisted extraction via `claude -p` (OAuth, no API key) is the path forward. Rule extractor stays as default for structural cues. Proposed by: Claude. Ratified by: Antoine.
- 2026-04-12 First live triage: 16 promoted, 35 rejected from 51 LLM-extracted candidates. 31% accept rate. Active memory count 20->36. Executed by: Claude. Ratified by: Antoine.
- 2026-04-12 No API keys allowed in AtoCore: LLM-assisted features use OAuth via `claude -p` or equivalent CLI-authenticated paths. Proposed by: Antoine.
- 2026-04-12 Multi-model extraction direction: extraction/triage should be model-agnostic, with Codex/Gemini/Ollama as second-pass reviewers for robustness. Proposed by: Antoine.
- 2026-04-11 Adopt this ledger as shared operating memory between Claude and Codex. Proposed by: Antoine. Ratified by: Antoine.
- 2026-04-11 Accept Codex's 8-day mini-phase plan verbatim as Active Plan. Proposed by: Codex. Ratified by: Antoine.
- 2026-04-11 Review findings live in `DEV-LEDGER.md`, with Codex owning finding text and Claude updating status fields only. Proposed by: Codex. Ratified by: Antoine.
- 2026-04-11 Project memories land in the pack under `--- Project Memories ---` at 25% budget ratio, gated on canonical project hint. Proposed by: Claude.
- 2026-04-11 Extraction stays off the capture hot path. Batch / manual only. Proposed by: Antoine.
- 2026-04-11 4-step roadmap: extractor -> harness expansion -> Wave 2 ingestion -> OpenClaw finish. Steps 1+2 as one mini-phase. Ratified by: Antoine.
- 2026-04-11 Codex branches must fork from `main`, not be orphan commits. Proposed by: Claude. Agreed by: Codex.
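The 25% project-memory band can be sketched as a greedy fill, combined with the 250-character per-entry cap mentioned in the Session Log. The sketch makes several assumptions. The budget is measured in characters, though the real builder may count tokens. The canonical project hint is reduced to a truthiness check. Memories arrive already ranked. `project_memory_band` is a hypothetical name, not the `src/atocore/context/builder.py` API.

```python
from typing import Optional

PROJECT_MEMORY_RATIO = 0.25  # share of the pack budget for project memories
PER_ENTRY_CAP = 250          # per-entry character cap
HEADER = "--- Project Memories ---"


def project_memory_band(
    memories: list[str], total_budget_chars: int, project_hint: Optional[str]
) -> str:
    """Greedy fill of the project-memory band; empty when no project hint is present."""
    if not project_hint:
        return ""
    budget = int(total_budget_chars * PROJECT_MEMORY_RATIO)
    lines = [HEADER]
    used = len(HEADER) + 1  # +1 for the newline after each line
    for text in memories:  # assumed already ranked best-first
        if len(text) > PER_ENTRY_CAP:
            text = text[: PER_ENTRY_CAP - 3] + "..."
        cost = len(text) + 1
        if used + cost > budget:
            break
        lines.append(text)
        used += cost
    return "\n".join(lines) if len(lines) > 1 else ""
```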
Session Log
- 2026-04-15 Claude (pm) Closed the last harness failure honestly. p06-tailscale fixed: 18/18 PASS. Root cause: not a retrieval bug. The p06 `ARCHITECTURE.md` Overview chunk legitimately mentions "the GigaBIT M1 telescope mirror" because the Polisher Suite is built for that mirror. All four retrieved sources for the tailscale prompt were genuinely p06/shared paths; zero actual p04 chunks leaked. The fixture's `expect_absent: GigaBIT` was catching semantic overlap, not retrieval bleed. Narrowed it to `expect_absent: "[Source: p04-gigabit/"`, a source-path check that tests the real invariant (no p04 source chunks in p06 context). Other p06 fixtures still use the word-blacklist form. They pass today because their more specific prompts don't pull the ARCHITECTURE.md Overview, so I left them alone rather than churn fixtures that aren't failing. Did NOT change retrieval/ranking: no code change, fixture-only fix. Tests unchanged at 299.
- 2026-04-15 Claude Deploy + doc debt sweep. Deployed `c2e7064` to Dalidou (build_time 2026-04-15T15:08:51Z, build_sha matches, `/health` ok), so R11/R12 are now live, not just on main. R11 verified on live: `POST /admin/extract-batch {"mode":"llm"}` against http://127.0.0.1:8100 returns HTTP 503 with the operator-facing "claude CLI not on PATH, run host-side script or use mode=rule" message, exactly the post-fix contract. R13 closed (fixed): added a reproduction recipe to Quick Commands (`pip install -r requirements-dev.txt && pytest --collect-only -q && pytest -q`) and re-cited `test_count: 299` against a fresh local collection on 2026-04-15. The claim is now auditable from any clean checkout; Codex's audit worktree just needs `pip install -r requirements-dev.txt`. R10 closed (fixed): rewrote the `docs/master-plan-status.md` OpenClaw section to explicitly disclaim "primary integration". It now reports the current narrow surface: 14 client request shapes against ~44 server routes, predominantly read + `/project/state` + `/ingest/sources`, with memory/interactions/admin/entities/triage/extraction writes correctly out of scope. Open findings now: none blocking. Next natural move: the last harness failure, `p06-tailscale` (chunk bleed).
- 2026-04-14 Claude (pm) Closed R11+R12, declined R3. R11 (fixed): `POST /admin/extract-batch` with `mode="llm"` now returns 503 when the `claude` CLI is not on PATH, with a message pointing at the host-side script. Previously it silently returned a success-0 payload, masking host-vs-container truth. 2 new tests in `test_extraction_pipeline.py` cover the 503 path and the rule-mode-still-works path. R12 (fixed): extracted shared `SYSTEM_PROMPT` + `parse_llm_json_array` + `normalize_candidate_item` + `build_user_message` into stdlib-only `src/atocore/memory/_llm_prompt.py`. Both `src/atocore/memory/extractor_llm.py` (container) and `scripts/batch_llm_extract_live.py` (host) now import from it. The host script uses `sys.path` to reach the stdlib-only module without needing the full atocore package. Project-attribution policy stays path-specific (container uses registry-check; host defers to server). R3 (declined): rule cues not firing on conversational LLM text is now by design. The LLM extractor (llm-0.4.0) has been the production path for conversational content since the Day 4 gate (2026-04-12). Expanding rules to match conversational prose risks the FP blowup Day 2 already showed. Rule extractor stays narrow for structural PKM text. Tests 297 → 299. Live `/health` still `58ea21d`; this session's changes need deploy.
- 2026-04-14 Claude MAJOR session: Engineering knowledge layer V1 (Layer 2) built: entity + relationship tables, 15 types, 12 relationship kinds, 35 bootstrapped entities across p04/p05/p06. Human Mirror (Layer 3): `GET /projects/{name}/mirror.html` + navigable wiki at `/wiki` with search. Karpathy-inspired upgrades: contradiction detection in triage, weekly lint pass, weekly synthesis pass producing "current state" paragraphs at top of project pages. Auto-detection of new projects from extraction. Registry persistence fix (`ATOCORE_PROJECT_REGISTRY_DIR` env var). abb-space/p08 aliases added, atomizer-v2 ingested (568 docs, +12,472 vectors). Identity/preference seed (6 new), signal-aggressive extractor rewrite (llm-0.4.0), auto vault refresh in cron. OpenClaw one-way pull importer built per Codex proposal. It reads `/home/papa/clawd` SOUL.md, USER.md, MEMORY.md, MODEL-ROUTING.md, and `memory/*.md` via SSH, with hash-delta import; the pipeline triages the results. First import: 10 candidates → 10 promoted with lenient triage rule. Active memories 47→84. State entries 61→78. Tests 290→297. Dashboard at `/admin/dashboard`. Wiki at `/wiki`.
- 2026-04-12 Claude `4f8bec7..4ac4e5c` Session close. Merged OpenClaw capture plugin, ingested atomizer-v2 (568 docs, 12,472 new vectors → 33,253 total), seeded Phase 4 identity/preference memories (6 new, 47 total active), added deeper Wave 2 state entries (p05 +3, p06 +3), fixed R9 project trust hierarchy (7 case tests), built auto-triage pipeline and observability dashboard at `/admin/dashboard`. Updated `master-plan-status.md` and `DEV-LEDGER.md` to reflect full current state. 7/14 phases baseline complete. All P1s closed. Nightly pipeline runs unattended with both Claude Code and OpenClaw feeding the reflection loop.
- 2026-04-12 Codex (branch `codex/openclaw-capture-plugin`) added a minimal external OpenClaw plugin at `openclaw-plugins/atocore-capture/` that mirrors Claude Code capture semantics. User-triggered assistant turns are POSTed to AtoCore `/interactions` with `client="openclaw"` and `reinforce=true`, fail-open, no extraction in-path. For live verification, temporarily added the local plugin load path to OpenClaw config and restarted the gateway so the plugin can load. Branch truth is ready; end-to-end verification still needs one fresh post-restart OpenClaw user turn to confirm new `client=openclaw` interactions appear on Dalidou.
- 2026-04-12 Claude Batch 3 (R9 fix): `144dbbd..e5e9a99`. Trust hierarchy for project attribution: interaction scope always wins when set. The model project is used only for unscoped interactions, and only if it is a registered project. 7 case tests (A-G) cover every combination. Harness 17/18 (no regression). Tests 286->290. Before: a wrong registered project could silently override interaction scope. After: `interaction.project` is the strongest signal; model project is only a fallback for unscoped captures. Not yet guaranteed: nothing prevents the same project's model output from being semantically wrong within that project. R9 marked fixed.
- 2026-04-12 Codex (audit branch `codex/audit-batch2`) audited `69c9717..origin/main` against the current branch tip and live Dalidou. Verified: live build is `8951c62`, retrieval harness improved to 17/18 PASS, candidate queue is now empty, and active memories rose to 41. `python3 scripts/auto_triage.py --dry-run --base-url http://127.0.0.1:8100` runs cleanly on Dalidou but only exercised the empty-queue path. Updated R7 to fixed (`8951c62`) and R8 to fixed (`69c9717`). Kept R9 open because project trust-preservation still allows a wrong non-empty registered project from the model to override the interaction scope. Added R13 because the new `286 passing` claim could not be independently reproduced in this audit: `pytest` is absent on both Dalidou and the clean audit worktree. Also corrected stale Orientation fields (live SHA, main tip, harness, active/candidate memory counts).
- 2026-04-12 Codex (audit branch `codex/audit-2026-04-12-extraction`) audited `54d84b5..ac7f77d` with live Dalidou verification. Confirmed the host-side LLM extraction pipeline is operational: nightly cron points at `deploy/dalidou/cron-backup.sh`, Step 4 calls `deploy/dalidou/batch-extract.sh`, the batch script exists and is executable on Dalidou, and a manual host-side run produced candidates successfully. Updated R1 and R5 to fixed (`c67bec0`) because extraction now runs unattended off-container. Live state during audit: build `39d73e9`, active memories 36, candidate queue 29 (16 existing + 13 added by manual verification run), and `last_extract_batch_run` populated in AtoCore project state. Added R11-R12 for the misleading container `mode=llm` no-op and host/container prompt-parser duplication. Security note: CLI positional prompt/response text is visible in process args while `claude -p` runs. That is acceptable on a single-user home host, but worth remembering if Dalidou's trust boundary changes.
- 2026-04-12 Codex (audit branch `codex/audit-2026-04-12-final`) audited `c5bad99..e2895b5` against origin/main, live Dalidou, and the OpenClaw client script. Live state checked: build `39d73e9`, harness reproducible at 16/18 PASS, active memories 36, and `t420-openclaw/atocore.py health` fails open correctly with `fail_open=true`. Spot-checks of Wave 2 project-state entries matched their cited vault docs. Updated R5-R8 statuses to match reality (R6 fixed by `39d73e9`) and added R9-R10. Corrected Orientation `main_tip` to `e2895b5` because the ledger had drifted behind origin/main. Note: live Dalidou is still on `39d73e9`, so branch-truth and deploy-truth are not the same yet.
- 2026-04-12 Claude Wave 2 trusted operational ingestion + Codex audit response. Read 6 vault docs, created 8 new Trusted Project State entries (p04 +2, p05 +3, p06 +3). Fixed R6 (project fallback in LLM extractor) per Codex audit. Fixed mis-scoped p06 offline memory on live Dalidou. Merged `codex/audit-2026-04-12`. Switched default LLM model from haiku to sonnet. Harness 15/18 -> 16/18. Tests 278 -> 280. main_tip `146f2e4` -> `39d73e9`.
- 2026-04-12 Codex (audit branch `codex/audit-2026-04-12`) audited `c5bad99..146f2e4` against code, live Dalidou, and the 36 active memories. Confirmed: `claude -p` invocation is not shell-injection-prone (`subprocess.run(args)` with no shell), off-host backup wiring matches the ledger, and R1 remains unresolved in practice. Added R5-R8. Corrected Orientation `main_tip` (`146f2e4`, not `5c69f77`) and tightened the harness note. p06-firmware-interface is a ranking-tie issue. p06-offline-design comes from a project-scope miss in live triage. p06-tailscale is retrieved-chunk bleed rather than memory-band budget contention.
- 2026-04-12 Claude `06792d8..5c69f77` Day 5-8 close. Documented extractor scope (5 in-scope, 6 out-of-scope categories). Expanded harness from 6 to 18 fixtures (p04 +1, p05 +1, p06 +7, adversarial +2). Per-entry memory cap at 250 chars fixed 1 of 4 budget-contention failures. Final harness: 15/18 PASS. Mini-phase complete. Before/after: rule extractor 0% recall -> LLM 100%; harness 6/6 -> 15/18; active memories 20 -> 36.
- 2026-04-12 Claude `330ecfb..06792d8` (merged eval-loop branch + triage). Day 1-4 of the mini-phase completed in one session. Day 2 baseline: rule extractor 0% recall, 5 distinct miss classes. Day 4 gate cleared: LLM extractor (`claude -p` haiku, OAuth) hit 100% recall, 2.55 yield/interaction. Refactored from anthropic SDK to subprocess after "no API key" rule. First live triage: 51 candidates -> 16 promoted, 35 rejected. Active memories 20->36. p06-polisher went from 2 to 16 memories (firmware/telemetry architecture set). `POST /memory` now accepts status field. Test count 264->278.
- 2026-04-11 Claude `claude/extractor-eval-loop @ 7d8d599`: Day 1+2 of the mini-phase. Froze a 64-interaction snapshot (`scripts/eval_data/interactions_snapshot_2026-04-11.json`) and labeled 20 by length-stratified random sample (5 positive, 15 zero; 7 total expected candidates). Built `scripts/extractor_eval.py` as a file-based eval runner. Day 2 baseline: rule extractor hit 0% yield / 0% recall / 0% precision on the labeled set. It had 5 false negatives across 5 distinct miss classes (recommendation_prose, architectural_change_summary, spec_update_announcement, layered_recommendation, alignment_assertion). This is the Day 4 hard-stop signal arriving two days early: a single rule expansion cannot close a 5-way miss, and widening rules blindly will collapse precision. The Day 4 decision gate is escalated to Antoine for ratification before Day 3 touches any extractor code. No extractor code on main has changed.
- 2026-04-11 Codex (ledger audit) fixed stale `main_tip`, retargeted R1 from the API surface to the live Claude Stop hook, and formalized the review write protocol so Claude can consume findings without rewriting them.
- 2026-04-11 Claude `b3253f3..59331e5` (1 commit). Wired the DEV-LEDGER, added session protocol to AGENTS.md, created project-local CLAUDE.md, deleted stale `codex/port-atocore-ops-client` remote branch. No code changes, no redeploy needed.
- 2026-04-11 Claude `c5bad99..b3253f3` (11 commits + 1 merge). Length-aware reinforcement, project memories in pack, query-relevance memory ranking, hyphenated-identifier tokenizer, retrieval eval harness seeded, off-host backup wired end-to-end, docs synced, Codex integration-pass branch merged. Harness went 0->6/6 on live Dalidou.
- 2026-04-11 Codex (async review) identified 2 P1s against a stale checkout. R1 was fair (extraction not automated), R2 was outdated (project memories already landed on main). Delivered the 8-day execution plan now in Active Plan.
- 2026-04-06 Antoine created `codex/atocore-integration-pass` with the `t420-openclaw/` workspace (merged 2026-04-11).
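The R9 trust hierarchy from the 2026-04-12 Batch 3 entry comes down to a short precedence rule. In this hedged sketch, `resolve_project` and the plain set of registered ids are assumptions. The real code in `src/atocore/memory/extractor_llm.py` may also resolve aliases such as abb-space/p08.

```python
from typing import Optional


def resolve_project(
    interaction_project: str,
    model_project: Optional[str],
    registered: set[str],
) -> str:
    """Interaction scope wins; the model's guess only fills an unscoped, registered slot."""
    if interaction_project:
        return interaction_project  # known capture scope is the strongest signal
    if model_project and model_project in registered:
        return model_project  # fallback for unscoped captures only
    return ""  # stay unscoped rather than trust an unregistered guess
```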
Working Rules
- Claude builds; Codex audits. No parallel work on the same files.
- Codex branches fork from `main`: `git fetch origin && git checkout -b codex/<topic> origin/main`.
- P1 findings block further main commits until acknowledged in Open Review Findings.
- Every session appends at least one Session Log line and bumps Orientation.
- Trim Session Log and Recent Decisions to the last 20 at session end.
- Docs in `docs/` may overclaim stale status; the ledger is the one-file source of truth for "what is true right now."
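The trim rule can be applied mechanically when each entry is a single `- YYYY-MM-DD ...` line, as in Session Log and Recent Decisions. Here is a sketch under that assumption. `trim_entries` is hypothetical, and anything it drops is expected to survive in `git log`.

```python
import re

# An entry is a bullet that starts with an ISO date.
ENTRY = re.compile(r"^- (\d{4}-\d{2}-\d{2})\b")


def trim_entries(lines: list[str], keep: int = 20) -> list[str]:
    """Return the `keep` most recent dated entries, newest first."""
    entries = [line for line in lines if ENTRY.match(line)]
    # reverse=True keeps sort stability, so same-day entries retain their written order.
    entries.sort(key=lambda line: ENTRY.match(line).group(1), reverse=True)
    return entries[:keep]
```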
Quick Commands
# Check live state
ssh papa@dalidou "curl -s http://localhost:8100/health"
# Run the retrieval harness
python scripts/retrieval_eval.py # human-readable
python scripts/retrieval_eval.py --json # machine-readable
# Deploy a new main tip
git push origin main && ssh papa@dalidou "bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh"
# Reflection-loop ops
python scripts/atocore_client.py batch-extract '' '' 200 false # preview
python scripts/atocore_client.py batch-extract '' '' 200 true # persist
python scripts/atocore_client.py triage
# Reproduce the ledger's test_count claim from a clean checkout
pip install -r requirements-dev.txt
pytest --collect-only -q | tail -1 # -> "N tests collected"
pytest -q # -> "N passed"
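The ledger's `test_count` can be compared with a fresh collection programmatically by parsing the summary line of `pytest --collect-only -q`. The sketch assumes the usual quiet-mode summary of the form `N tests collected in X.XXs`. The helper names are hypothetical.

```python
import re
import subprocess

# Matches "299 tests collected", "1 test collected", "299 tests collected, 1 error".
COLLECTED = re.compile(r"(\d+) tests? collected")


def parse_collected(output: str) -> int:
    """Extract N from the last 'N tests collected' line of pytest output."""
    for line in reversed(output.strip().splitlines()):
        match = COLLECTED.search(line)
        if match:
            return int(match.group(1))
    raise ValueError("no 'N tests collected' line in pytest output")


def collected_count() -> int:
    """Run pytest's collection in quiet mode and return the collected-test count."""
    result = subprocess.run(
        ["pytest", "--collect-only", "-q"], capture_output=True, text=True, check=False
    )
    return parse_collected(result.stdout)
```

On a clean checkout with `requirements-dev.txt` installed, `collected_count()` should reproduce the Orientation figure.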