diff --git a/docs/architecture/llm-client-integration.md b/docs/architecture/llm-client-integration.md index d8b53a0..2608cdd 100644 --- a/docs/architecture/llm-client-integration.md +++ b/docs/architecture/llm-client-integration.md @@ -116,6 +116,8 @@ exercised in real use, and only then get a client subcommand. The currently shipped scope (per `scripts/atocore_client.py`): +### Stable operations (shipped since the client was introduced) + | Subcommand | Purpose | API endpoint(s) | |---|---|---| | `health` | service status, mount + source readiness | `GET /health` | @@ -138,46 +140,46 @@ The currently shipped scope (per `scripts/atocore_client.py`): | `debug-context` | last context pack inspection | `GET /debug/context` | | `ingest-sources` | ingest configured source dirs | `POST /ingest/sources` | -That covers everything in the "stable operations" set today: -project lifecycle, ingestion, project-state curation, retrieval and -context build, retrieval-quality audit, health and stats inspection. +### Phase 9 reflection loop (shipped after migration safety work) + +These were explicitly deferred in earlier versions of this doc +pending "exercised workflow". The constraint was real — premature +API freeze would have made it harder to iterate on the ergonomics — +but the deferral ran into a bootstrap problem: you can't exercise +the workflow in real Claude Code sessions without a usable client +surface to drive it from. The fix is to ship a minimal Phase 9 +surface now and treat it as stable-but-refinable: adding new +optional parameters is fine, renaming subcommands is not. + +| Subcommand | Purpose | API endpoint(s) | +|---|---|---| +| `capture` | record one interaction round-trip | `POST /interactions` | +| `extract` | run the rule-based extractor (preview or persist) | `POST /interactions/{id}/extract` | +| `reinforce-interaction` | backfill reinforcement on an existing interaction | `POST /interactions/{id}/reinforce` | +| `list-interactions` | paginated list with filters | `GET /interactions` | +| `get-interaction` | fetch one interaction by id | `GET /interactions/{id}` | +| `queue` | list the candidate review queue | `GET /memory?status=candidate` | +| `promote` | move a candidate memory to active | `POST /memory/{id}/promote` | +| `reject` | mark a candidate memory invalid | `POST /memory/{id}/reject` | + +All 8 Phase 9 subcommands have test coverage in +`tests/test_atocore_client.py` via mocked `request()`, including +an end-to-end test that drives the full capture → extract → queue +→ promote/reject cycle through the client. + +### Coverage summary + +That covers everything in the "stable operations" set AND the +full Phase 9 reflection loop: project lifecycle, ingestion, +project-state curation, retrieval, context build, +retrieval-quality audit, health and stats inspection, interaction +capture, candidate extraction, candidate review queue. ## What's intentionally NOT in scope today -Three families of operations are explicitly **deferred** until -their workflows have been exercised in real use: +Two families of operations remain deferred: -### 1. Memory review queue and reflection loop - -Phase 9 Commit C shipped these endpoints: - -- `POST /interactions` (capture) -- `POST /interactions/{id}/reinforce` -- `POST /interactions/{id}/extract` -- `GET /memory?status=candidate` -- `POST /memory/{id}/promote` -- `POST /memory/{id}/reject` - -The contracts are stable, but the **workflow ergonomics** are not. -Until a real human has actually exercised the capture → extract → -review → promote/reject loop a few times and we know what feels -right, exposing those operations through the shared client would -prematurely freeze a UX that's still being designed. - -When the loop has been exercised in real use and we know what -the right subcommand shapes are, the shared client gains: - -- `capture [--project P] [--client C]` -- `extract [--persist]` -- `queue` (list candidate review queue) -- `promote ` -- `reject ` - -At that point the Claude Code slash command can grow a companion -`/atocore-record-response` command and the OpenClaw helper can be -extended with the same flow. - -### 2. Backup and restore admin operations +### 1. Backup and restore admin operations Phase 9 Commit B shipped these endpoints: @@ -198,7 +200,7 @@ they would set `ATOCORE_FAIL_OPEN=false` for the duration of the call so the operator gets a real error on failure rather than a silent fail-open envelope. -### 3. Engineering layer entity operations +### 2. Engineering layer entity operations The engineering layer is in planning, not implementation. When V1 ships per `engineering-v1-acceptance.md`, the shared client @@ -292,9 +294,15 @@ add a `CLIENT_VERSION = "x.y.z"` constant to 1. **Refactor the OpenClaw helper** to shell out to the shared client. Cross-repo coordination, not blocking anything in - AtoCore itself. -2. **Add memory-review subcommands** when the Phase 9 review - workflow has been exercised in real use. + AtoCore itself. With the Phase 9 subcommands now in the shared + client, the OpenClaw refactor can reuse all the reflection-loop + work instead of duplicating it. +2. **Real-usage validation of the Phase 9 loop**, now that the + client surface exists. First capture → extract → review cycle + against the live Dalidou instance, likely via the Claude Code + slash command flow. Findings feed back into subcommand + refinement (new optional flags are fine, renames require a + semver bump). 3. **Add backup admin subcommands** if and when we decide the shared client should be the canonical backup operator interface (with fail-open disabled for admin commands). @@ -302,8 +310,9 @@ add a `CLIENT_VERSION = "x.y.z"` constant to engineering V1 implementation sprint, per `engineering-v1-acceptance.md`. 5. **Tag a `CLIENT_VERSION` constant** the next time the shared - client surface meaningfully changes. Today's surface is the - v0.1.0 baseline. + client surface meaningfully changes. Today's surface with the + Phase 9 loop added is the v0.2.0 baseline (v0.1.0 was the + stable-ops-only version). ## TL;DR @@ -314,9 +323,10 @@ add a `CLIENT_VERSION = "x.y.z"` constant to helper, future Codex skill, future MCP server) are thin wrappers that shell out to the shared client - The shared client today covers project lifecycle, ingestion, - retrieval, context build, project-state, and retrieval audit -- Memory-review and engineering-entity commands are deferred - until their workflows are exercised + retrieval, context build, project-state, retrieval audit, AND + the full Phase 9 reflection loop (capture / extract / + reinforce / list / queue / promote / reject) +- Backup admin and engineering-entity commands remain deferred - The OpenClaw helper is currently a parallel implementation and the refactor to the shared client is a queued follow-up - New LLM clients should never reimplement HTTP calls — they diff --git a/scripts/atocore_client.py b/scripts/atocore_client.py index ccfcff9..1592b28 100644 --- a/scripts/atocore_client.py +++ b/scripts/atocore_client.py @@ -23,6 +23,15 @@ TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30")) REFRESH_TIMEOUT = int(os.environ.get("ATOCORE_REFRESH_TIMEOUT_SECONDS", "1800")) FAIL_OPEN = os.environ.get("ATOCORE_FAIL_OPEN", "true").lower() == "true" +# Bumped when the subcommand surface or JSON output shapes meaningfully +# change. See docs/architecture/llm-client-integration.md for the +# semver rules. History: +# 0.1.0 initial stable-ops-only client +# 0.2.0 Phase 9 reflection loop added: capture, extract, +# reinforce-interaction, list-interactions, get-interaction, +# queue, promote, reject +CLIENT_VERSION = "0.2.0" + def print_json(payload: Any) -> None: print(json.dumps(payload, ensure_ascii=True, indent=2)) @@ -243,6 +252,59 @@ def build_parser() -> argparse.ArgumentParser: p.add_argument("top_k", nargs="?", type=int, default=5) p.add_argument("project", nargs="?", default="") + # --- Phase 9 reflection loop surface -------------------------------- + # + # capture: record one interaction (prompt + response + context used). + # Mirrors POST /interactions. response is positional so shell + # callers can pass it via $(cat file.txt) or heredoc. project, + # client, and session_id are optional positionals with empty + # defaults, matching the existing script's style. + p = sub.add_parser("capture") + p.add_argument("prompt") + p.add_argument("response", nargs="?", default="") + p.add_argument("project", nargs="?", default="") + p.add_argument("client", nargs="?", default="") + p.add_argument("session_id", nargs="?", default="") + p.add_argument("reinforce", nargs="?", default="true") + + # extract: run the Phase 9 C rule-based extractor against an + # already-captured interaction. persist='true' writes the + # candidates as status='candidate' memories; default is + # preview-only. + p = sub.add_parser("extract") + p.add_argument("interaction_id") + p.add_argument("persist", nargs="?", default="false") + + # reinforce: backfill reinforcement on an already-captured interaction. + p = sub.add_parser("reinforce-interaction") + p.add_argument("interaction_id") + + # list-interactions: paginated listing with filters. + p = sub.add_parser("list-interactions") + p.add_argument("project", nargs="?", default="") + p.add_argument("session_id", nargs="?", default="") + p.add_argument("client", nargs="?", default="") + p.add_argument("since", nargs="?", default="") + p.add_argument("limit", nargs="?", type=int, default=50) + + # get-interaction: fetch one by id + p = sub.add_parser("get-interaction") + p.add_argument("interaction_id") + + # queue: list the candidate review queue + p = sub.add_parser("queue") + p.add_argument("memory_type", nargs="?", default="") + p.add_argument("project", nargs="?", default="") + p.add_argument("limit", nargs="?", type=int, default=50) + + # promote: candidate -> active + p = sub.add_parser("promote") + p.add_argument("memory_id") + + # reject: candidate -> invalid + p = sub.add_parser("reject") + p.add_argument("memory_id") + return parser @@ -304,6 +366,79 @@ def main() -> int: print_json(request("POST", "/context/build", {"prompt": args.prompt, "project": args.project or None, "budget": args.budget})) elif cmd == "audit-query": print_json(audit_query(args.prompt, args.top_k, args.project or None)) + # --- Phase 9 reflection loop surface ------------------------------ + elif cmd == "capture": + body: dict[str, Any] = { + "prompt": args.prompt, + "response": args.response, + "project": args.project, + "client": args.client or "atocore-client", + "session_id": args.session_id, + "reinforce": args.reinforce.lower() in {"1", "true", "yes", "y"}, + } + print_json(request("POST", "/interactions", body)) + elif cmd == "extract": + persist = args.persist.lower() in {"1", "true", "yes", "y"} + print_json( + request( + "POST", + f"/interactions/{urllib.parse.quote(args.interaction_id, safe='')}/extract", + {"persist": persist}, + ) + ) + elif cmd == "reinforce-interaction": + print_json( + request( + "POST", + f"/interactions/{urllib.parse.quote(args.interaction_id, safe='')}/reinforce", + {}, + ) + ) + elif cmd == "list-interactions": + query_parts: list[str] = [] + if args.project: + query_parts.append(f"project={urllib.parse.quote(args.project)}") + if args.session_id: + query_parts.append(f"session_id={urllib.parse.quote(args.session_id)}") + if args.client: + query_parts.append(f"client={urllib.parse.quote(args.client)}") + if args.since: + query_parts.append(f"since={urllib.parse.quote(args.since)}") + query_parts.append(f"limit={int(args.limit)}") + query = "?" + "&".join(query_parts) + print_json(request("GET", f"/interactions{query}")) + elif cmd == "get-interaction": + print_json( + request( + "GET", + f"/interactions/{urllib.parse.quote(args.interaction_id, safe='')}", + ) + ) + elif cmd == "queue": + query_parts = ["status=candidate"] + if args.memory_type: + query_parts.append(f"memory_type={urllib.parse.quote(args.memory_type)}") + if args.project: + query_parts.append(f"project={urllib.parse.quote(args.project)}") + query_parts.append(f"limit={int(args.limit)}") + query = "?" + "&".join(query_parts) + print_json(request("GET", f"/memory{query}")) + elif cmd == "promote": + print_json( + request( + "POST", + f"/memory/{urllib.parse.quote(args.memory_id, safe='')}/promote", + {}, + ) + ) + elif cmd == "reject": + print_json( + request( + "POST", + f"/memory/{urllib.parse.quote(args.memory_id, safe='')}/reject", + {}, + ) + ) else: return 1 return 0 diff --git a/tests/test_atocore_client.py b/tests/test_atocore_client.py new file mode 100644 index 0000000..79b4227 --- /dev/null +++ b/tests/test_atocore_client.py @@ -0,0 +1,313 @@ +"""Tests for scripts/atocore_client.py — the shared operator CLI. + +Specifically covers the Phase 9 reflection-loop subcommands added +after codex's sequence-step-3 review: ``capture``, ``extract``, +``reinforce-interaction``, ``list-interactions``, ``get-interaction``, +``queue``, ``promote``, ``reject``. + +The tests mock the client's ``request()`` helper and verify each +subcommand: + +- calls the correct HTTP method and path +- builds the correct JSON body (or the correct query string) +- passes the right subset of CLI arguments through + +This is the same "wiring test" shape used by tests/test_api_storage.py: +we don't exercise the live HTTP stack; we verify the client builds +the request correctly. The server side is already covered by its +own route tests. +""" + +from __future__ import annotations + +import json +import sys +from pathlib import Path + +import pytest + +# Make scripts/ importable +_REPO_ROOT = Path(__file__).resolve().parent.parent +sys.path.insert(0, str(_REPO_ROOT / "scripts")) + +import atocore_client as client # noqa: E402 + + +# --------------------------------------------------------------------------- +# Request capture helper +# --------------------------------------------------------------------------- + + +class _RequestCapture: + """Drop-in replacement for client.request() that records calls.""" + + def __init__(self, response: dict | None = None): + self.calls: list[dict] = [] + self._response = response if response is not None else {"ok": True} + + def __call__(self, method, path, data=None, timeout=None): + self.calls.append( + {"method": method, "path": path, "data": data, "timeout": timeout} + ) + return self._response + + +@pytest.fixture +def capture_requests(monkeypatch): + """Replace client.request with a recording stub and return it.""" + stub = _RequestCapture() + monkeypatch.setattr(client, "request", stub) + return stub + + +def _run_client(monkeypatch, argv: list[str]) -> int: + """Simulate a CLI invocation with the given argv.""" + monkeypatch.setattr(sys, "argv", ["atocore_client.py", *argv]) + return client.main() + + +# --------------------------------------------------------------------------- +# capture +# --------------------------------------------------------------------------- + + +def test_capture_posts_to_interactions_endpoint(capture_requests, monkeypatch): + _run_client( + monkeypatch, + [ + "capture", + "what is p05's current focus", + "The current focus is wave 2 operational ingestion.", + "p05-interferometer", + "claude-code-test", + "session-abc", + ], + ) + assert len(capture_requests.calls) == 1 + call = capture_requests.calls[0] + assert call["method"] == "POST" + assert call["path"] == "/interactions" + body = call["data"] + assert body["prompt"] == "what is p05's current focus" + assert body["response"].startswith("The current focus") + assert body["project"] == "p05-interferometer" + assert body["client"] == "claude-code-test" + assert body["session_id"] == "session-abc" + assert body["reinforce"] is True # default + + +def test_capture_sets_default_client_when_omitted(capture_requests, monkeypatch): + _run_client( + monkeypatch, + ["capture", "hi", "hello"], + ) + call = capture_requests.calls[0] + assert call["data"]["client"] == "atocore-client" + assert call["data"]["project"] == "" + assert call["data"]["reinforce"] is True + + +def test_capture_accepts_reinforce_false(capture_requests, monkeypatch): + _run_client( + monkeypatch, + ["capture", "prompt", "response", "p05", "claude", "sess", "false"], + ) + call = capture_requests.calls[0] + assert call["data"]["reinforce"] is False + + +# --------------------------------------------------------------------------- +# extract +# --------------------------------------------------------------------------- + + +def test_extract_default_is_preview(capture_requests, monkeypatch): + _run_client(monkeypatch, ["extract", "abc-123"]) + call = capture_requests.calls[0] + assert call["method"] == "POST" + assert call["path"] == "/interactions/abc-123/extract" + assert call["data"] == {"persist": False} + + +def test_extract_persist_true(capture_requests, monkeypatch): + _run_client(monkeypatch, ["extract", "abc-123", "true"]) + call = capture_requests.calls[0] + assert call["data"] == {"persist": True} + + +def test_extract_url_encodes_interaction_id(capture_requests, monkeypatch): + _run_client(monkeypatch, ["extract", "abc/def"]) + call = capture_requests.calls[0] + assert call["path"] == "/interactions/abc%2Fdef/extract" + + +# --------------------------------------------------------------------------- +# reinforce-interaction +# --------------------------------------------------------------------------- + + +def test_reinforce_interaction_posts_to_correct_path(capture_requests, monkeypatch): + _run_client(monkeypatch, ["reinforce-interaction", "int-xyz"]) + call = capture_requests.calls[0] + assert call["method"] == "POST" + assert call["path"] == "/interactions/int-xyz/reinforce" + assert call["data"] == {} + + +# --------------------------------------------------------------------------- +# list-interactions +# --------------------------------------------------------------------------- + + +def test_list_interactions_no_filters(capture_requests, monkeypatch): + _run_client(monkeypatch, ["list-interactions"]) + call = capture_requests.calls[0] + assert call["method"] == "GET" + assert call["path"] == "/interactions?limit=50" + + +def test_list_interactions_with_project_filter(capture_requests, monkeypatch): + _run_client(monkeypatch, ["list-interactions", "p05-interferometer"]) + call = capture_requests.calls[0] + assert "project=p05-interferometer" in call["path"] + assert "limit=50" in call["path"] + + +def test_list_interactions_full_filter_set(capture_requests, monkeypatch): + _run_client( + monkeypatch, + [ + "list-interactions", + "p05", + "sess-1", + "claude-code", + "2026-04-07T00:00:00Z", + "20", + ], + ) + call = capture_requests.calls[0] + path = call["path"] + assert "project=p05" in path + assert "session_id=sess-1" in path + assert "client=claude-code" in path + # Since is URL-encoded — the : and + chars get escaped + assert "since=2026-04-07" in path + assert "limit=20" in path + + +# --------------------------------------------------------------------------- +# get-interaction +# --------------------------------------------------------------------------- + + +def test_get_interaction_fetches_by_id(capture_requests, monkeypatch): + _run_client(monkeypatch, ["get-interaction", "int-42"]) + call = capture_requests.calls[0] + assert call["method"] == "GET" + assert call["path"] == "/interactions/int-42" + + +# --------------------------------------------------------------------------- +# queue +# --------------------------------------------------------------------------- + + +def test_queue_always_filters_by_candidate_status(capture_requests, monkeypatch): + _run_client(monkeypatch, ["queue"]) + call = capture_requests.calls[0] + assert call["method"] == "GET" + assert call["path"].startswith("/memory?") + assert "status=candidate" in call["path"] + assert "limit=50" in call["path"] + + +def test_queue_with_memory_type_and_project(capture_requests, monkeypatch): + _run_client(monkeypatch, ["queue", "adaptation", "p05-interferometer", "10"]) + call = capture_requests.calls[0] + path = call["path"] + assert "status=candidate" in path + assert "memory_type=adaptation" in path + assert "project=p05-interferometer" in path + assert "limit=10" in path + + +def test_queue_limit_coercion(capture_requests, monkeypatch): + """limit is typed as int by argparse so string '25' becomes 25.""" + _run_client(monkeypatch, ["queue", "", "", "25"]) + call = capture_requests.calls[0] + assert "limit=25" in call["path"] + + +# --------------------------------------------------------------------------- +# promote / reject +# --------------------------------------------------------------------------- + + +def test_promote_posts_to_memory_promote_path(capture_requests, monkeypatch): + _run_client(monkeypatch, ["promote", "mem-abc"]) + call = capture_requests.calls[0] + assert call["method"] == "POST" + assert call["path"] == "/memory/mem-abc/promote" + assert call["data"] == {} + + +def test_reject_posts_to_memory_reject_path(capture_requests, monkeypatch): + _run_client(monkeypatch, ["reject", "mem-xyz"]) + call = capture_requests.calls[0] + assert call["method"] == "POST" + assert call["path"] == "/memory/mem-xyz/reject" + assert call["data"] == {} + + +def test_promote_url_encodes_memory_id(capture_requests, monkeypatch): + _run_client(monkeypatch, ["promote", "mem/with/slashes"]) + call = capture_requests.calls[0] + assert "mem%2Fwith%2Fslashes" in call["path"] + + +# --------------------------------------------------------------------------- +# end-to-end: ensure the Phase 9 loop can be driven entirely through +# the client +# --------------------------------------------------------------------------- + + +def test_phase9_full_loop_via_client_shape(capture_requests, monkeypatch): + """Simulate the full capture -> extract -> queue -> promote cycle. + + This doesn't exercise real HTTP — each call is intercepted by + the mock request. But it proves every step of the Phase 9 loop + is reachable through the shared client, which is the whole point + of the codex-step-3 work. + """ + # Step 1: capture + _run_client( + monkeypatch, + [ + "capture", + "what about GF-PTFE for lateral support", + "## Decision: use GF-PTFE pads for thermal stability", + "p05-interferometer", + ], + ) + # Step 2: extract candidates (preview) + _run_client(monkeypatch, ["extract", "fake-interaction-id"]) + # Step 3: extract and persist + _run_client(monkeypatch, ["extract", "fake-interaction-id", "true"]) + # Step 4: list the review queue + _run_client(monkeypatch, ["queue"]) + # Step 5: promote a candidate + _run_client(monkeypatch, ["promote", "fake-memory-id"]) + # Step 6: reject another + _run_client(monkeypatch, ["reject", "fake-memory-id-2"]) + + methods_and_paths = [ + (c["method"], c["path"]) for c in capture_requests.calls + ] + assert methods_and_paths == [ + ("POST", "/interactions"), + ("POST", "/interactions/fake-interaction-id/extract"), + ("POST", "/interactions/fake-interaction-id/extract"), + ("GET", "/memory?status=candidate&limit=50"), + ("POST", "/memory/fake-memory-id/promote"), + ("POST", "/memory/fake-memory-id-2/reject"), + ]