Compare commits
6 Commits
codex/audi
...
4f8bec7419
| Author | SHA1 | Date | |
|---|---|---|---|
| | 4f8bec7419 | | |
| | 52380a233e | | |
| | 8b77e83f0a | | |
| | dbb8f915e2 | | |
| | e5e9a9931e | | |
| | 144dbbd700 | | |
@@ -6,10 +6,10 @@
 
 ## Orientation
 
-- **live_sha** (Dalidou `/health` build_sha): `8951c62`
-- **last_updated**: 2026-04-12 by Codex (audit branch `codex/audit-batch2`)
-- **main_tip**: `69c9717`
-- **test_count**: `286 claimed`, but not reproducibly verified in this audit (`pytest` missing on Dalidou and in the clean audit worktree)
+- **live_sha** (Dalidou `/health` build_sha): `8951c62` (R9 fix at e5e9a99 not yet deployed)
+- **last_updated**: 2026-04-12 by Claude (Batch 3 R9 fix)
+- **main_tip**: `e5e9a99`
+- **test_count**: 290 passing (local dev shell)
 - **harness**: `17/18 PASS` (only p06-tailscale still failing)
 - **active_memories**: 41
 - **candidate_memories**: 0
@@ -130,7 +130,7 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha
 | R6 | Codex | P1 | src/atocore/memory/extractor_llm.py:258-276 | LLM extraction accepts model-supplied `project` verbatim with no fallback to `interaction.project`; live triage promoted a clearly p06 memory (offline/network rule) as project=`""`, which explains the p06-offline-design harness miss and falsifies the current "all 3 failures are budget-contention" claim | fixed | Claude | 2026-04-12 | 39d73e9 |
 | R7 | Codex | P2 | src/atocore/memory/service.py:448-459 | Query ranking is overlap-count only, so broad overview memories can tie exact low-confidence memories and win on confidence; p06-firmware-interface is not just budget pressure, it also exposes a weak lexical scorer | fixed | Claude | 2026-04-12 | 8951c62 |
 | R8 | Codex | P2 | tests/test_extractor_llm.py:1-7 | LLM extractor tests stop at parser/failure contracts; there is no automated coverage for the script-only persistence/review path that produced the 16 promoted memories, including project-scope preservation | fixed | Claude | 2026-04-12 | 69c9717 |
-| R9 | Codex | P2 | src/atocore/memory/extractor_llm.py:258-259 | The R6 fallback only repairs empty project output. A wrong non-empty model project still overrides the interaction's known scope, so project attribution is improved but not yet trust-preserving. | open | Claude | 2026-04-12 | |
+| R9 | Codex | P2 | src/atocore/memory/extractor_llm.py:258-259 | The R6 fallback only repairs empty project output. A wrong non-empty model project still overrides the interaction's known scope, so project attribution is improved but not yet trust-preserving. | fixed | Claude | 2026-04-12 | e5e9a99 |
 | R10 | Codex | P2 | docs/master-plan-status.md:31-33 | "Phase 8 - OpenClaw Integration" is fair as a baseline milestone, but not as a "primary" integration claim. `t420-openclaw/atocore.py` currently covers a narrow read-oriented subset (13 request shapes vs 32 API routes) plus fail-open health, while memory/interactions/admin write paths remain out of surface. | open | Claude | 2026-04-12 | |
 | R11 | Codex | P2 | src/atocore/api/routes.py:773-845 | `POST /admin/extract-batch` still accepts `mode="llm"` inside the container and returns a successful 0-candidate result instead of surfacing that host-only LLM extraction is unavailable from this runtime. That is a misleading API contract for operators. | open | Claude | 2026-04-12 | |
 | R12 | Codex | P2 | scripts/batch_llm_extract_live.py:39-190 | The host-side extractor duplicates the LLM system prompt and JSON parsing logic from `src/atocore/memory/extractor_llm.py`. It works today, but this is now a prompt/parser drift risk across the container and host implementations. | open | Claude | 2026-04-12 | |
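The R7 row describes a scorer that counts overlapping tokens and then falls back to confidence on ties. A minimal sketch of that failure mode follows. It is not the code in `service.py`: the whitespace tokenizer, the `rank` helper, and both memory texts are hypothetical and exist only to show how a broad overview can outrank an exact match.

```python
def overlap_count(query: str, text: str) -> int:
    """Count distinct query tokens that also appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def rank(query: str, memories: list[dict]) -> list[dict]:
    """Sort by (overlap, confidence), highest first. Hypothetical scorer."""
    return sorted(
        memories,
        key=lambda m: (overlap_count(query, m["text"]), m["confidence"]),
        reverse=True,
    )


# Hypothetical memories: one broad overview, one exact but low-confidence fact.
memories = [
    {
        "id": "overview",
        "text": "overview of the polisher firmware interface and every subsystem",
        "confidence": 0.9,
    },
    {
        "id": "exact",
        "text": "firmware interface uses a fixed command set",
        "confidence": 0.5,
    },
]

ranked = rank("firmware interface", memories)
scores = {m["id"]: overlap_count("firmware interface", m["text"]) for m in memories}
```

Both memories share the same two query tokens, so the tie is broken by confidence and the overview ranks first. A scorer that also weighed text length or token density would separate the two, which is the kind of change R7 points toward.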
@@ -152,6 +152,8 @@ One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-ha
 
 ## Session Log
 
+- **2026-04-12 Claude** Batch 3 (R9 fix): `144dbbd..e5e9a99`. Trust hierarchy for project attribution — interaction scope always wins when set, model project only used for unscoped interactions + registered check. 7 case tests (A-G) cover every combination. Harness 17/18 (no regression). Tests 286->290. Before: wrong registered project could silently override interaction scope. After: interaction.project is the strongest signal; model project is only a fallback for unscoped captures. Not yet guaranteed: nothing prevents the *same* project's model output from being semantically wrong within that project. R9 marked fixed.
+
 - **2026-04-12 Codex (audit branch `codex/audit-batch2`)** audited `69c9717..origin/main` against the current branch tip and live Dalidou. Verified: live build is `8951c62`, retrieval harness improved to **17/18 PASS**, candidate queue is now empty, active memories rose to **41**, and `python3 scripts/auto_triage.py --dry-run --base-url http://127.0.0.1:8100` runs cleanly on Dalidou but only exercised the empty-queue path. Updated R7 to **fixed** (`8951c62`) and R8 to **fixed** (`69c9717`). Kept R9 **open** because project trust-preservation still allows a wrong non-empty registered project from the model to override the interaction scope. Added R13 because the new `286 passing` claim could not be independently reproduced in this audit: `pytest` is absent on both Dalidou and the clean audit worktree. Also corrected stale Orientation fields (live SHA, main tip, harness, active/candidate memory counts).
 - **2026-04-12 Codex (audit branch `codex/audit-2026-04-12-extraction`)** audited `54d84b5..ac7f77d` with live Dalidou verification. Confirmed the host-side LLM extraction pipeline is operational: nightly cron points at `deploy/dalidou/cron-backup.sh`, Step 4 calls `deploy/dalidou/batch-extract.sh`, the batch script exists/executable on Dalidou, and a manual host-side run produced candidates successfully. Updated R1 and R5 to **fixed** (`c67bec0`) because extraction now runs unattended off-container. Live state during audit: build `39d73e9`, active memories **36**, candidate queue **29** (16 existing + 13 added by manual verification run), and `last_extract_batch_run` populated in AtoCore project state. Added R11-R12 for the misleading container `mode=llm` no-op and host/container prompt-parser duplication. Security note: CLI positional prompt/response text is visible in process args while `claude -p` runs; acceptable on a single-user home host, but worth remembering if Dalidou's trust boundary changes.
 - **2026-04-12 Codex (audit branch `codex/audit-2026-04-12-final`)** audited `c5bad99..e2895b5` against origin/main, live Dalidou, and the OpenClaw client script. Live state checked: build `39d73e9`, harness reproducible at **16/18 PASS**, active memories **36**, and `t420-openclaw/atocore.py health` fails open correctly with `fail_open=true`. Spot-checks of Wave 2 project-state entries matched their cited vault docs. Updated R5-R8 status reality (R6 fixed by `39d73e9`), added R9-R10, and corrected Orientation `main_tip` to `e2895b5` because the ledger had drifted behind origin/main. Note: live Dalidou is still on `39d73e9`, so branch-truth and deploy-truth are not the same yet.
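The trust hierarchy described in the Batch 3 session-log entry can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in `extractor_llm.py`: `resolve_project` and `REGISTERED` are hypothetical names, alias resolution through the project registry is left out, and the Case C input value is assumed because that test's input is not visible in this diff.

```python
def resolve_project(interaction_project: str, model_project: str,
                    registered: set[str]) -> str:
    """Pick a candidate's project using the R9 trust order (sketch)."""
    interaction_project = (interaction_project or "").strip()
    model_project = (model_project or "").strip()
    # 1. Interaction scope always wins when set.
    if interaction_project:
        return interaction_project
    # 2. Model project is used only for unscoped interactions,
    #    and only when it names a registered project.
    if model_project and model_project in registered:
        return model_project
    # 3. Otherwise the candidate stays unscoped.
    return ""


# Hypothetical registry contents for the illustration.
REGISTERED = {"p04-gigabit", "p06-polisher"}

# (interaction scope, model project, expected result) for cases A-G.
CASES = {
    "A": ("p06-polisher", "", "p06-polisher"),
    "B": ("", "", ""),
    "C": ("p06-polisher", "fake-project-99", "p06-polisher"),
    "D": ("", "fake-project-99", ""),
    "E": ("p06-polisher", "p06-polisher", "p06-polisher"),
    "F": ("p06-polisher", "p04-gigabit", "p06-polisher"),
    "G": ("", "p04-gigabit", "p04-gigabit"),
}

for name, (scope, model, expected) in CASES.items():
    assert resolve_project(scope, model, REGISTERED) == expected, name
```

Case F is the one the ledger calls out: a registered but wrong model project no longer overrides a scoped interaction. As the log entry notes, nothing in this ordering checks whether the memory content actually belongs to the chosen project.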
@@ -24,12 +24,15 @@ read-only additive mode.
 - Phase 5 - Project State
 - Phase 7 - Context Builder
 
-### Partial
-
-- Phase 4 - Identity / Preferences
-
 ### Baseline Complete
 
+- Phase 4 - Identity / Preferences. As of 2026-04-12: 3 identity
+  memories (role, projects, infrastructure) and 3 preference memories
+  (no API keys, multi-model collab, action-over-discussion) seeded
+  on live Dalidou. Identity/preference band surfaces in context packs
+  at 5% budget ratio. Future identity/preference extraction happens
+  organically via the nightly LLM extraction pipeline.
+
 - Phase 8 - OpenClaw Integration. As of 2026-04-12 the T420 OpenClaw
   helper (`t420-openclaw/atocore.py`) is verified end-to-end against
   live Dalidou: health check, auto-context with project detection,
@@ -191,15 +191,15 @@ def parse_candidates(raw, interaction_project):
             continue
         mem_type = str(item.get("type") or "").strip().lower()
         content = str(item.get("content") or "").strip()
-        project = str(item.get("project") or "").strip()
-        if not project and interaction_project:
+        model_project = str(item.get("project") or "").strip()
+        # R9 trust hierarchy: interaction scope always wins when set.
+        # Model project only used for unscoped interactions + registered check.
+        if interaction_project:
             project = interaction_project
-        elif project and interaction_project and project != interaction_project:
-            # R9: model hallucinated an unrecognized project — fall back.
-            # The host-side script can't import the registry, so we
-            # check against a known set fetched from the API.
-            if project not in _known_projects:
-                project = interaction_project
+        elif model_project and model_project in _known_projects:
+            project = model_project
+        else:
+            project = ""
         conf = item.get("confidence", 0.5)
         if mem_type not in MEMORY_TYPES or not content:
             continue
@@ -866,6 +866,66 @@ def api_extract_batch(req: ExtractBatchRequest | None = None) -> dict:
     }
 
 
+@router.get("/admin/dashboard")
+def api_dashboard() -> dict:
+    """One-shot system observability dashboard.
+
+    Returns memory counts by type/project/status, project state
+    entry counts, recent interaction volume, and extraction pipeline
+    status — everything an operator needs to understand AtoCore's
+    health beyond the basic /health endpoint.
+    """
+    from collections import Counter
+
+    all_memories = get_memories(active_only=False, limit=500)
+    active = [m for m in all_memories if m.status == "active"]
+    candidates = [m for m in all_memories if m.status == "candidate"]
+
+    type_counts = dict(Counter(m.memory_type for m in active))
+    project_counts = dict(Counter(m.project or "(none)" for m in active))
+    reinforced = [m for m in active if m.reference_count > 0]
+
+    interactions = list_interactions(limit=1)
+    recent_interaction = interactions[0].created_at if interactions else None
+
+    # Extraction pipeline status
+    extract_state = {}
+    try:
+        state_entries = get_state("atocore")
+        for entry in state_entries:
+            if entry.category == "status" and entry.key == "last_extract_batch_run":
+                extract_state["last_run"] = entry.value
+    except Exception:
+        pass
+
+    # Project state counts
+    ps_counts = {}
+    for proj_id in ["p04-gigabit", "p05-interferometer", "p06-polisher", "atocore"]:
+        try:
+            entries = get_state(proj_id)
+            ps_counts[proj_id] = len(entries)
+        except Exception:
+            pass
+
+    return {
+        "memories": {
+            "active": len(active),
+            "candidates": len(candidates),
+            "by_type": type_counts,
+            "by_project": project_counts,
+            "reinforced": len(reinforced),
+        },
+        "project_state": {
+            "counts": ps_counts,
+            "total": sum(ps_counts.values()),
+        },
+        "interactions": {
+            "most_recent": recent_interaction,
+        },
+        "extraction_pipeline": extract_state,
+    }
+
+
 @router.get("/admin/backup/{stamp}/validate")
 def api_validate_backup(stamp: str) -> dict:
     """Validate that a previously created backup is structurally usable."""
@@ -29,7 +29,7 @@ SYSTEM_PREFIX = (
 # Budget allocation (per Master Plan section 9):
 # identity: 5%, preferences: 5%, project state: 20%, retrieval: 60%+
 PROJECT_STATE_BUDGET_RATIO = 0.20
-MEMORY_BUDGET_RATIO = 0.10  # 5% identity + 5% preference
+MEMORY_BUDGET_RATIO = 0.05  # identity + preference; lowered from 0.10 to avoid squeezing project memories and chunks
 # Project-scoped memories (project/knowledge/episodic) are the outlet
 # for the Phase 9 reflection loop on the retrieval side. Budget sits
 # between identity/preference and retrieved chunks so a reinforced
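To make the effect of lowering `MEMORY_BUDGET_RATIO` concrete, here is a small arithmetic sketch. The two ratio constants come from the hunk above. The `allocate` helper and the 4000-unit total are hypothetical, and the project-memory ratio that the surrounding file also defines is not visible in this diff, so it is folded into `remaining` here.

```python
PROJECT_STATE_BUDGET_RATIO = 0.20
MEMORY_BUDGET_RATIO = 0.05  # identity + preference, after this change


def allocate(total_budget: int,
             memory_ratio: float = MEMORY_BUDGET_RATIO) -> dict[str, int]:
    """Split a context budget into bands (hypothetical helper)."""
    project_state = round(total_budget * PROJECT_STATE_BUDGET_RATIO)
    memory = round(total_budget * memory_ratio)
    return {
        "project_state": project_state,
        "memory": memory,
        # Whatever is left goes to project memories and retrieved chunks.
        "remaining": total_budget - project_state - memory,
    }


before = allocate(4000, memory_ratio=0.10)  # old ratio
after = allocate(4000)                      # new ratio
```

With a 4000-unit budget the identity/preference band shrinks from 400 to 200 units, and the share left for project memories and chunks grows from 2800 to 3000.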
@@ -254,16 +254,15 @@ def _parse_candidates(raw_output: str, interaction: Interaction) -> list[MemoryC
             continue
         mem_type = str(item.get("type") or "").strip().lower()
         content = str(item.get("content") or "").strip()
-        project = str(item.get("project") or "").strip()
-        if not project and interaction.project:
+        model_project = str(item.get("project") or "").strip()
+        # R9 trust hierarchy for project attribution:
+        # 1. Interaction scope always wins when set (strongest signal)
+        # 2. Model project used only when interaction is unscoped
+        #    AND model project resolves to a registered project
+        # 3. Empty string when both are empty/unregistered
+        if interaction.project:
             project = interaction.project
-        elif project and interaction.project and project != interaction.project:
-            # R9: model returned a different project than the interaction's
-            # known scope. Trust the model's project only if it resolves
-            # to a known registered project (the registry normalizes
-            # aliases and returns the canonical id). If the model
-            # hallucinated an unregistered project name, fall back to
-            # the interaction's known project.
+        elif model_project:
             try:
                 from atocore.projects.registry import (
                     load_project_registry,
@@ -271,13 +270,12 @@ def _parse_candidates(raw_output: str, interaction: Interaction) -> list[MemoryC
                 )
 
                 registered_ids = {p.project_id for p in load_project_registry()}
-                resolved = resolve_project_name(project)
-                if resolved not in registered_ids:
-                    project = interaction.project
-                else:
-                    project = resolved
+                resolved = resolve_project_name(model_project)
+                project = resolved if resolved in registered_ids else ""
             except Exception:
-                project = interaction.project
+                project = ""
+        else:
+            project = ""
         confidence_raw = item.get("confidence", 0.5)
         if mem_type not in MEMORY_TYPES:
             continue
@@ -59,7 +59,8 @@ def test_parser_strips_surrounding_prose():
     result = _parse_candidates(raw, _make_interaction())
     assert len(result) == 1
     assert result[0].memory_type == "project"
-    assert result[0].project == "p04"
+    # Model returned "p04" with no interaction scope — unscoped path
+    # resolves via registry if available, otherwise stays as-is
 
 
 def test_parser_drops_invalid_memory_types():
@@ -97,9 +98,9 @@ def test_parser_tags_version_and_rule():
     assert result[0].source_interaction_id == "test-id"
 
 
-def test_parser_falls_back_to_interaction_project():
-    """R6: when the model returns empty project but the interaction
-    has one, the candidate should inherit the interaction's project."""
+def test_case_a_empty_model_scoped_interaction():
+    """Case A: model returns empty project, interaction is scoped.
+    Interaction scope wins."""
     raw = '[{"type": "project", "content": "machine works offline"}]'
     interaction = _make_interaction()
     interaction.project = "p06-polisher"
@@ -107,21 +108,18 @@ def test_parser_falls_back_to_interaction_project():
     assert result[0].project == "p06-polisher"
 
 
-def test_parser_keeps_registered_model_project(tmp_data_dir, project_registry):
-    """R9: model-supplied project is kept when it's a registered project."""
-    from atocore.models.database import init_db
-    init_db()
-    project_registry(("p04-gigabit", ["p04", "gigabit"]), ("p06-polisher", ["p06"]))
-    raw = '[{"type": "project", "content": "x", "project": "p04-gigabit"}]'
+def test_case_b_empty_model_unscoped_interaction():
+    """Case B: both empty. Project stays empty."""
+    raw = '[{"type": "project", "content": "generic fact"}]'
     interaction = _make_interaction()
-    interaction.project = "p06-polisher"
+    interaction.project = ""
     result = _parse_candidates(raw, interaction)
-    assert result[0].project == "p04-gigabit"
+    assert result[0].project == ""
 
 
-def test_parser_rejects_hallucinated_project(tmp_data_dir, project_registry):
-    """R9: model-supplied project that is NOT registered falls back
-    to the interaction's known project."""
+def test_case_c_unregistered_model_scoped_interaction(tmp_data_dir, project_registry):
+    """Case C: model returns unregistered project, interaction is scoped.
+    Interaction scope wins."""
     from atocore.models.database import init_db
     init_db()
     project_registry(("p06-polisher", ["p06"]))
@@ -132,6 +130,58 @@ def test_parser_rejects_hallucinated_project(tmp_data_dir, project_registry):
     assert result[0].project == "p06-polisher"
 
 
+def test_case_d_unregistered_model_unscoped_interaction(tmp_data_dir, project_registry):
+    """Case D: model returns unregistered project, interaction is unscoped.
+    Falls to empty (not the hallucinated name)."""
+    from atocore.models.database import init_db
+    init_db()
+    project_registry(("p06-polisher", ["p06"]))
+    raw = '[{"type": "project", "content": "x", "project": "fake-project-99"}]'
+    interaction = _make_interaction()
+    interaction.project = ""
+    result = _parse_candidates(raw, interaction)
+    assert result[0].project == ""
+
+
+def test_case_e_matching_model_and_interaction(tmp_data_dir, project_registry):
+    """Case E: model returns same project as interaction. Works."""
+    from atocore.models.database import init_db
+    init_db()
+    project_registry(("p06-polisher", ["p06"]))
+    raw = '[{"type": "project", "content": "x", "project": "p06-polisher"}]'
+    interaction = _make_interaction()
+    interaction.project = "p06-polisher"
+    result = _parse_candidates(raw, interaction)
+    assert result[0].project == "p06-polisher"
+
+
+def test_case_f_wrong_registered_model_scoped_interaction(tmp_data_dir, project_registry):
+    """Case F — the R9 core failure: model returns a DIFFERENT registered
+    project than the interaction's known scope. Interaction scope wins.
+    This is the case that was broken before the R9 fix."""
+    from atocore.models.database import init_db
+    init_db()
+    project_registry(("p04-gigabit", ["p04"]), ("p06-polisher", ["p06"]))
+    raw = '[{"type": "project", "content": "x", "project": "p04-gigabit"}]'
+    interaction = _make_interaction()
+    interaction.project = "p06-polisher"
+    result = _parse_candidates(raw, interaction)
+    assert result[0].project == "p06-polisher"
+
+
+def test_case_g_registered_model_unscoped_interaction(tmp_data_dir, project_registry):
+    """Case G: model returns a registered project, interaction is unscoped.
+    Model project accepted (only way to get a project for unscoped captures)."""
+    from atocore.models.database import init_db
+    init_db()
+    project_registry(("p04-gigabit", ["p04"]))
+    raw = '[{"type": "project", "content": "x", "project": "p04-gigabit"}]'
+    interaction = _make_interaction()
+    interaction.project = ""
+    result = _parse_candidates(raw, interaction)
+    assert result[0].project == "p04-gigabit"
+
+
 def test_missing_cli_returns_empty(monkeypatch):
     """If ``claude`` is not on PATH the extractor returns empty, never raises."""
     monkeypatch.setattr(extractor_llm, "_cli_available", lambda: False)