Compare commits
82 Commits
codex/atoc
...
claude/ext
| SHA1 | Author | Date | |
|---|---|---|---|
| 3a7e8ccba4 | |||
| a29b5e22f2 | |||
| b309e7fd49 | |||
| 7d8d599030 | |||
| d9dc55f841 | |||
| 81307cec47 | |||
| 59331e522d | |||
| b3253f35ee | |||
| 30ee857d62 | |||
| 38f6e525af | |||
| 37331d53ef | |||
| 5aeeb1cad1 | |||
| 4da81c9e4e | |||
| 7bf83bf46a | |||
| 1161645415 | |||
| 5913da53c5 | |||
| 8ea53f4003 | |||
| 9366ba7879 | |||
| c5bad996a7 | |||
| 0b1742770a | |||
| 2829d5ec1c | |||
| 58c744fd2f | |||
| a34a7a995f | |||
| 92fc250b54 | |||
| 2d911909f8 | |||
| 1a8fdf4225 | |||
| 336208004c | |||
| 03822389a1 | |||
| be4099486c | |||
| 2c0b214137 | |||
| b492f5f7b0 | |||
| e877e5b8ff | |||
| fad30d5461 | |||
| 261277fd51 | |||
| 7e60f5a0e6 | |||
| 1953e559f9 | |||
| f521aab97b | |||
| fb6298a9a1 | |||
| f2372eff9e | |||
| 78d4e979e5 | |||
| d6ce6128cf | |||
| 368adf2ebc | |||
| a637017900 | |||
| d0ff8b5738 | |||
| b9da5b6d84 | |||
| bd291ff874 | |||
| 480f13a6df | |||
| 53147d326c | |||
| 2704997256 | |||
| ac14f8d6a4 | |||
| ceb129c7d1 | |||
| 2e449a4c33 | |||
| ea3fed3d44 | |||
| c9b9eede25 | |||
| 14ab7c8e9f | |||
| bdb42dba05 | |||
| 46a5d5887a | |||
| 9943338846 | |||
| 26bfa94c65 | |||
| 4aa2b696a9 | |||
| af01dd3e70 | |||
| 8f74cab0e6 | |||
| 06aa931273 | |||
| c9757e313a | |||
| 9715fe3143 | |||
| 1f1e6b5749 | |||
| 827dcf2cd1 | |||
| d8028f406e | |||
| 3b8d717bdf | |||
| 8293099025 | |||
| 0f95415530 | |||
| 82c7535d15 | |||
| 8a94da4bf4 | |||
| 5069d5b1b6 | |||
| 440fc1d9ba | |||
| 6bfa1fcc37 | |||
| b0889b3925 | |||
| b48f0c95ab | |||
| 531c560db7 | |||
| 6081462058 | |||
| b4afbbb53a | |||
| 32ce409a7b |
159
.claude/commands/atocore-context.md
Normal file
@@ -0,0 +1,159 @@
---
description: Pull a context pack from the live AtoCore service for the current prompt
argument-hint: <prompt text> [project-id]
---

You are about to enrich a user prompt with context from the live
AtoCore service. This is the daily-use entry point for AtoCore from
inside Claude Code.

The work happens via the **shared AtoCore operator client** at
`scripts/atocore_client.py`. That client is the canonical Python
backbone for stable AtoCore operations and is meant to be reused by
every LLM client (OpenClaw helper, future Codex skill, etc.) — see
`docs/architecture/llm-client-integration.md` for the layering. This
slash command is a thin Claude Code-specific frontend on top of it.

## Step 1 — parse the arguments

The user invoked `/atocore-context` with:

```
$ARGUMENTS
```

You need to figure out two things:

1. The **prompt text** — what AtoCore will retrieve context for
2. An **optional project hint** — used to scope retrieval to a
   specific project's trusted state and corpus

The user may have passed a project id or alias as the **last
whitespace-separated token**. Don't maintain a hardcoded list of
known aliases — let the shared client decide. Use this rule:

- Take the last token of `$ARGUMENTS`. Call it `MAYBE_HINT`.
- Run `python scripts/atocore_client.py detect-project "$MAYBE_HINT"`
  to ask the registry whether it's a known project id or alias.
  This call is cheap (it just hits `/projects` and does a regex
  match) and inherits the client's fail-open behavior.
- If the response has a non-null `matched_project`, the last
  token was an explicit project hint. `PROMPT_TEXT` is everything
  except the last token; `PROJECT_HINT` is the matched canonical
  project id.
- Otherwise the last token is just part of the prompt.
  `PROMPT_TEXT` is the full `$ARGUMENTS`; `PROJECT_HINT` is empty.

This delegates the alias knowledge to the registry instead of
embedding a stale list in this markdown file. When you add a new
project to the registry, the slash command picks it up
automatically with no edits here.

## Step 2 — call the shared client for the context pack

The server resolves project hints through the registry before
looking up trusted state, so you can pass either the canonical id
or any alias to `context-build` and the trusted state lookup will
work either way. (Regression test:
`tests/test_context_builder.py::test_alias_hint_resolves_through_registry`.)

**If `PROJECT_HINT` is non-empty**, call `context-build` directly
with that hint:

```bash
python scripts/atocore_client.py context-build \
  "$PROMPT_TEXT" \
  "$PROJECT_HINT"
```

**If `PROJECT_HINT` is empty**, do the 2-step fallback dance so the
user always gets a context pack regardless of whether the prompt
implies a project:

```bash
# Try project auto-detection first.
RESULT=$(python scripts/atocore_client.py auto-context "$PROMPT_TEXT")

# If auto-context could not detect a project it returns a small
# {"status": "no_project_match", ...} envelope. In that case fall
# back to a corpus-wide context build with no project hint, which
# is the right behaviour for cross-project or generic prompts like
# "what changed in AtoCore backup policy this week?"
if echo "$RESULT" | grep -q '"no_project_match"'; then
  RESULT=$(python scripts/atocore_client.py context-build "$PROMPT_TEXT")
fi

echo "$RESULT"
```

This is the fix for the P2 finding from Codex's review: previously
the slash command sent every no-hint prompt through `auto-context`
and returned `no_project_match` to the user with no context, even
though the underlying client's `context-build` subcommand has
always supported corpus-wide context builds.

In both branches the response is the JSON payload from
`/context/build` (or, in the rare case where even the corpus-wide
build fails, a `{"status": "unavailable"}` envelope from the
client's fail-open layer).

## Step 3 — present the context pack to the user

The successful response contains at least:

- `formatted_context` — the assembled context block AtoCore would
  feed an LLM
- `chunks_used`, `total_chars`, `budget`, `budget_remaining`,
  `duration_ms`
- `chunks` — array of source documents that contributed, each with
  `source_file`, `heading_path`, `score`

Render in this order:

1. A one-line stats banner: `chunks=N, chars=X/budget, duration=Yms`
2. The `formatted_context` block verbatim inside a fenced text code
   block so the user can read what AtoCore would feed an LLM
3. The `chunks` array as a small bullet list with `source_file`,
   `heading_path`, and `score` per chunk
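
The stats banner (item 1 above) is a plain string format over those fields. A minimal sketch with an invented sample payload:

```python
# Invented sample values; the field names match the response contract above.
pack = {"chunks_used": 4, "total_chars": 2400, "budget": 3000, "duration_ms": 118}

banner = (
    f"chunks={pack['chunks_used']}, "
    f"chars={pack['total_chars']}/{pack['budget']}, "
    f"duration={pack['duration_ms']}ms"
)
print(banner)  # chunks=4, chars=2400/3000, duration=118ms
```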

Two special cases:

- **`{"status": "unavailable"}`** (fail-open from the client)
  → Tell the user: "AtoCore is unreachable at `$ATOCORE_BASE_URL`.
  Check `python scripts/atocore_client.py health` for diagnostics."
- **Empty `chunks_used: 0` with no project state and no memories**
  → Tell the user: "AtoCore returned no context for this prompt —
  either the corpus does not have relevant information or the
  project hint is wrong. Try a different hint or a longer prompt."

## Step 4 — what about capturing the interaction

Capture (Phase 9 Commit A) and the rest of the reflection loop
(reinforcement, extraction, review queue) are intentionally NOT
exposed by the shared client yet. The contracts are stable but the
workflow ergonomics are not, so the daily-use slash command stays
focused on context retrieval until those review flows have been
exercised in real use. See `docs/architecture/llm-client-integration.md`
for the deferral rationale.

When capture is added to the shared client, this slash command will
gain a follow-up `/atocore-record-response` companion command that
posts the LLM's response back to the same interaction. That work is
queued.

## Notes for the assistant

- DO NOT bypass the shared client by calling curl yourself. The
  client is the contract between AtoCore and every LLM frontend; if
  you find a missing capability, the right fix is to extend the
  client, not to work around it.
- DO NOT maintain a hardcoded list of project aliases in this
  file. Use `detect-project` to ask the registry — that's the
  whole point of having a registry.
- DO NOT silently change `ATOCORE_BASE_URL`. If the env var points
  at the wrong instance, surface the error so the user can fix it.
- DO NOT hide the formatted context pack from the user. Showing
  what AtoCore would feed an LLM is the whole point.
- The output goes into the user's working context as background;
  they may follow up with their actual question, and the AtoCore
  context pack acts as informal injected knowledge.
9
.dockerignore
Normal file
@@ -0,0 +1,9 @@
.git
.pytest_cache
.coverage
.claude
data
__pycache__
*.pyc
tests
docs
24
.env.example
Normal file
@@ -0,0 +1,24 @@
ATOCORE_ENV=development
ATOCORE_DEBUG=false
ATOCORE_LOG_LEVEL=INFO
ATOCORE_DATA_DIR=./data
ATOCORE_DB_DIR=
ATOCORE_CHROMA_DIR=
ATOCORE_CACHE_DIR=
ATOCORE_TMP_DIR=
ATOCORE_VAULT_SOURCE_DIR=./sources/vault
ATOCORE_DRIVE_SOURCE_DIR=./sources/drive
ATOCORE_SOURCE_VAULT_ENABLED=true
ATOCORE_SOURCE_DRIVE_ENABLED=true
ATOCORE_LOG_DIR=./logs
ATOCORE_BACKUP_DIR=./backups
ATOCORE_RUN_DIR=./run
ATOCORE_PROJECT_REGISTRY_DIR=./config
ATOCORE_PROJECT_REGISTRY_PATH=./config/project-registry.json
ATOCORE_HOST=127.0.0.1
ATOCORE_PORT=8100
ATOCORE_DB_BUSY_TIMEOUT_MS=5000
ATOCORE_EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
ATOCORE_CHUNK_MAX_SIZE=800
ATOCORE_CHUNK_OVERLAP=100
ATOCORE_CONTEXT_BUDGET=3000
15
.gitignore
vendored
Normal file
@@ -0,0 +1,15 @@
data/
__pycache__/
*.pyc
.env
*.egg-info/
dist/
build/
.pytest_cache/
htmlcov/
.coverage
venv/
.venv/
.claude/*
!.claude/commands/
!.claude/commands/**
45
AGENTS.md
Normal file
@@ -0,0 +1,45 @@
# AGENTS.md

## Session protocol (read first, every session)

**Before doing anything else, read `DEV-LEDGER.md` at the repo root.** It is the one-file source of truth for "what is currently true" — live SHA, active plan, open review findings, recent decisions. The narrative docs under `docs/` may lag; the ledger does not.

**Before ending a session, append a Session Log line to `DEV-LEDGER.md`** with what you did and which commit range it covers, and bump the Orientation section if anything there changed.

This rule applies equally to Claude, Codex, and any future agent working in this repo.

## Project role
This repository is AtoCore, the runtime and machine-memory layer of the Ato ecosystem.

## Ecosystem definitions
- AtoCore = app/runtime/API/ingestion/retrieval/context builder/machine DB logic
- AtoMind = future intelligence layer for promotion, reflection, conflict handling, trust decisions
- AtoVault = human-readable memory source, intended for Obsidian
- AtoDrive = trusted operational project source, higher trust than general vault notes

## Storage principles
- Human-readable source layers and machine operational storage must remain separate
- AtoVault is not the live vector database location
- AtoDrive is not the live vector database location
- Machine operational storage includes SQLite, vector store, indexes, embeddings, and runtime metadata
- The machine DB is derived operational state, not the primary human source of truth

## Deployment principles
- Dalidou is the canonical host for AtoCore service and machine database
- OpenClaw on the T420 should consume AtoCore over API/network/Tailscale
- Do not design around Syncthing for the live SQLite/vector DB
- Prefer one canonical running service over multi-node live DB replication

## Coding guidance
- Keep path handling explicit and configurable via environment variables
- Do not hard-code machine-specific absolute paths
- Keep implementation small, testable, and reversible
- Preserve current working behavior unless a change is necessary
- Add or update tests when changing config, storage, or path logic

## Change policy
Before large refactors:
1. explain the architectural reason
2. propose the smallest safe batch
3. implement incrementally
4. summarize changed files and migration impact
30
CLAUDE.md
Normal file
@@ -0,0 +1,30 @@
# CLAUDE.md — project instructions for AtoCore

## Session protocol

Before doing anything else in this repo, read `DEV-LEDGER.md` at the repo root. It is the shared operating memory between Claude, Codex, and the human operator — live Dalidou SHA, active plan, open P1/P2 review findings, recent decisions, and session log. The narrative docs under `docs/` sometimes lag; the ledger does not.

Before ending a session, append a Session Log line to `DEV-LEDGER.md` covering:

- which commits you produced (sha range)
- what changed at a high level
- any harness / test count deltas
- anything you overclaimed and later corrected

Bump the **Orientation** section if `live_sha`, `main_tip`, `test_count`, or `harness` changed.

`AGENTS.md` at the repo root carries the broader project principles (storage separation, deployment model, coding guidance). Read it when you need the "why" behind a constraint.

## Deploy workflow

```bash
git push origin main && ssh papa@dalidou "bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh"
```

The deploy script self-verifies via `/health` build_sha — if it exits non-zero, do not assume the change is live.

## Working model

- Claude builds; Codex audits. No parallel work on the same files.
- P1 review findings block further `main` commits until acknowledged in the ledger's **Open Review Findings** table.
- Codex branches must fork from `origin/main` (no orphan commits that require `--allow-unrelated-histories`).
171
DEV-LEDGER.md
Normal file
@@ -0,0 +1,171 @@
# AtoCore Dev Ledger

> Shared operating memory between humans, Claude, and Codex.
> **Every session MUST read this file at start and append a Session Log entry before ending.**
> Section headers are stable - do not rename them. Trim Session Log and Recent Decisions to the last 20 entries at session end; older history lives in `git log` and `docs/`.

## Orientation

- **live_sha** (Dalidou `/health` build_sha): `38f6e52`
- **last_updated**: 2026-04-11 by Codex (review protocol formalized)
- **main_tip**: `81307ce`
- **test_count**: 264 passing
- **harness**: `6/6 PASS` (`python scripts/retrieval_eval.py` against live Dalidou)
- **off_host_backup**: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron env `ATOCORE_BACKUP_RSYNC`, verified

## Active Plan

**Mini-phase**: Extractor improvement (eval-driven) + retrieval harness expansion.
**Duration**: 8 days, hard gates at each day boundary.
**Plan author**: Codex (2026-04-11). **Executor**: Claude. **Audit**: Codex.

### Preflight (before Day 1)

Stop if any of these fail:

- `git rev-parse HEAD` on `main` matches the expected branching tip
- Live `/health` on Dalidou reports the SHA you think is deployed
- `python scripts/retrieval_eval.py --json` still passes at the current baseline
- `batch-extract` over the known 42-capture slice reproduces the current low-yield baseline
- A frozen sample set exists for extractor labeling so the target does not move mid-phase

Success: baseline eval output saved, baseline extract output saved, working branch created from `origin/main`.

### Day 1 - Labeled extractor eval set

Pick 30 real captures: 10 that should produce 0 candidates, 10 that should plausibly produce 1, 10 ambiguous/hard. Store as a stable artifact (interaction id, expected count, expected type, notes). Add a runner that scores extractor output against labels.

Success: 30 labeled interactions in a stable artifact, one-command precision/recall output.
Fail-early: if labeling 30 takes more than a day because the concept is unclear, tighten the extraction target before touching code.
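
The scoring core of such a runner is small. A hypothetical sketch (the label and output shapes are invented; the real Day 1 artifact format may differ):

```python
# Hypothetical shapes: both dicts map interaction id -> candidate count.
# TP/FP/FN are counted per candidate, not per interaction.
def score(expected: dict[str, int], extracted: dict[str, int]) -> tuple[float, float]:
    tp = sum(min(n, extracted.get(i, 0)) for i, n in expected.items())
    fp = sum(max(extracted.get(i, 0) - n, 0) for i, n in expected.items())
    fn = sum(max(n - extracted.get(i, 0), 0) for i, n in expected.items())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(score({"i1": 1, "i2": 0, "i3": 1}, {"i1": 1, "i2": 1}))  # (0.5, 0.5)
```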

### Day 2 - Measure current extractor

Run the rule-based extractor on all 30. Record yield, TP, FP, FN. Bucket misses by class (conversational preference, decision summary, status/constraint, meta chatter).

Success: short scorecard with counts by miss type, top 2 miss classes obvious.
Fail-early: if the labeled set shows fewer than 5 plausible positives total, the corpus is too weak - relabel before tuning.

### Day 3 - Smallest rule expansion for top miss class

Add 1-2 narrow, explainable rules for the worst miss class. Add unit tests from real paraphrase examples in the labeled set. Then rerun eval.

Success: recall up on the labeled set, false positives do not materially rise, new tests cover the new cue class.
Fail-early: if one rule expansion raises FP above ~20% of extracted candidates, revert or narrow before adding more.

### Day 4 - Decision gate: more rules or LLM-assisted prototype

If rule expansion reaches a **meaningfully reviewable queue**, keep going with rules. Otherwise prototype an LLM-assisted extraction mode behind a flag.

"Meaningfully reviewable queue":
- >= 15-25% candidate yield on the 30 labeled captures
- FP rate low enough that manual triage feels tolerable
- >= 2 real non-synthetic candidates worth review

Hard stop: if candidate yield is still under 10% after this point, stop rule tinkering and switch to architecture review (LLM-assisted OR narrower extraction scope).
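
As a hedged sketch, the gate reduces to three checks; the 15% floor and 20% FP ceiling are taken from the thresholds above, and "real candidates" is a manual count:

```python
def keep_expanding_rules(yield_pct: float, fp_pct: float, real_candidates: int) -> bool:
    """True: stay with rule expansion. False: prototype the flagged LLM-assisted mode."""
    return yield_pct >= 15.0 and fp_pct <= 20.0 and real_candidates >= 2

print(keep_expanding_rules(18.0, 12.0, 3))  # True
print(keep_expanding_rules(8.0, 12.0, 3))   # False: also under the 10% hard stop
```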

### Day 5 - Stabilize and document

Add remaining focused rules or the flagged LLM-assisted path. Write down in-scope and out-of-scope utterance kinds.

Success: labeled eval green against target threshold, extractor scope explainable in <= 5 bullets.

### Day 6 - Retrieval harness expansion (6 -> 15-20 fixtures)

Grow across p04/p05/p06. Include short ambiguous prompts, cross-project collision cases, expected project-state wins, expected project-memory wins, and 1-2 "should fail open / low confidence" cases.

Success: >= 15 fixtures, each active project has easy + medium + hard cases.
Fail-early: if fixtures are mostly obvious wins, add harder adversarial cases before claiming coverage.

### Day 7 - Regression pass and calibration

Run harness on current code vs live Dalidou. Inspect failures (ranking, ingestion gap, project bleed, budget). Make at most ONE ranking/budget tweak if the harness clearly justifies it. Do not mix harness expansion and ranking changes in a single commit unless tightly coupled.

Success: harness still passes or improves after extractor work; any ranking tweak is justified by a concrete fixture delta.
Fail-early: if > 20-25% of harness fixtures regress after extractor changes, separate concerns before merging.

### Day 8 - Merge and close

Clean commit sequence. Save before/after metrics (extractor scorecard, harness results). Update docs only with claims the metrics support.

Merge order: labeled corpus + runner -> extractor improvements + tests -> harness expansion -> any justified ranking tweak -> docs sync last.

Success: point to a before/after delta for both extraction and retrieval; docs do not overclaim.

### Hard Gates (stop/rethink points)

- Extractor yield < 10% after 30 labeled interactions -> stop, reconsider rule-only extraction
- FP rate > 20% on labeled set -> narrow rules before adding more
- Harness expansion finds < 3 genuinely hard cases -> harness still too soft
- Ranking change improves one project but regresses another -> do not merge without explicit tradeoff note

### Branching

One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-harness-expansion` for Day 6-7. Keeps extraction and retrieval judgments auditable.

## Review Protocol

- Codex records review findings in **Open Review Findings**.
- Claude must read **Open Review Findings** at session start before coding.
- Codex owns finding text. Claude may update operational fields only:
  - `status`
  - `owner`
  - `resolved_by`
- If Claude disagrees with a finding, do not rewrite it. Mark it `declined` and explain why in the **Session Log**.
- Any commit or session that addresses a finding should reference the finding id in the commit message or **Session Log**.
- `P1` findings block further commits in the affected area until they are at least acknowledged and explicitly tracked.
- Findings may be code-level, claim-level, or ops-level. If the implementation boundary changes, retarget the finding instead of silently closing it.

## Open Review Findings

| id | finder | severity | file:line | summary | status | owner | opened_at | resolved_by |
|-----|--------|----------|------------------------------------|-------------------------------------------------------------------------|--------------|--------|------------|-------------|
| R1 | Codex | P1 | deploy/hooks/capture_stop.py:76-85 | Live Claude capture still omits `extract`, so "loop closed both sides" remains overstated in practice even though the API supports it | acknowledged | Claude | 2026-04-11 | |
| R2 | Codex | P1 | src/atocore/context/builder.py | Project memories excluded from pack | fixed | Claude | 2026-04-11 | 8ea53f4 |
| R3 | Claude | P2 | src/atocore/memory/extractor.py | Rule cues (`## Decision:`) never fire on conversational LLM text | open | Claude | 2026-04-11 | |
| R4 | Codex | P2 | DEV-LEDGER.md:11 | Orientation `main_tip` was stale versus `HEAD` / `origin/main` | fixed | Codex | 2026-04-11 | 81307ce |

## Recent Decisions

- **2026-04-11** Adopt this ledger as shared operating memory between Claude and Codex. *Proposed by:* Antoine. *Ratified by:* Antoine.
- **2026-04-11** Accept Codex's 8-day mini-phase plan verbatim as Active Plan. *Proposed by:* Codex. *Ratified by:* Antoine.
- **2026-04-11** Review findings live in `DEV-LEDGER.md` with Codex owning finding text and Claude updating status fields only. *Proposed by:* Codex. *Ratified by:* Antoine.
- **2026-04-11** Project memories land in the pack under `--- Project Memories ---` at 25% budget ratio, gated on canonical project hint. *Proposed by:* Claude.
- **2026-04-11** Extraction stays off the capture hot path. Batch / manual only. *Proposed by:* Antoine.
- **2026-04-11** 4-step roadmap: extractor -> harness expansion -> Wave 2 ingestion -> OpenClaw finish. Steps 1+2 as one mini-phase. *Ratified by:* Antoine.
- **2026-04-11** Codex branches must fork from `main`, not be orphan commits. *Proposed by:* Claude. *Agreed by:* Codex.

## Session Log

- **2026-04-11 Codex (ledger audit)** fixed stale `main_tip`, retargeted R1 from the API surface to the live Claude Stop hook, and formalized the review write protocol so Claude can consume findings without rewriting them.
- **2026-04-11 Claude** `b3253f3..59331e5` (1 commit). Wired the DEV-LEDGER, added session protocol to AGENTS.md, created project-local CLAUDE.md, deleted stale `codex/port-atocore-ops-client` remote branch. No code changes, no redeploy needed.
- **2026-04-11 Claude** `c5bad99..b3253f3` (11 commits + 1 merge). Length-aware reinforcement, project memories in pack, query-relevance memory ranking, hyphenated-identifier tokenizer, retrieval eval harness seeded, off-host backup wired end-to-end, docs synced, codex integration-pass branch merged. Harness went 0->6/6 on live Dalidou.
- **2026-04-11 Codex (async review)** identified 2 P1s against a stale checkout. R1 was fair (extraction not automated), R2 was outdated (project memories already landed on main). Delivered the 8-day execution plan now in Active Plan.
- **2026-04-06 Antoine** created `codex/atocore-integration-pass` with the `t420-openclaw/` workspace (merged 2026-04-11).

## Working Rules

- Claude builds; Codex audits. No parallel work on the same files.
- Codex branches fork from `main`: `git fetch origin && git checkout -b codex/<topic> origin/main`.
- P1 findings block further main commits until acknowledged in Open Review Findings.
- Every session appends at least one Session Log line and bumps Orientation.
- Trim Session Log and Recent Decisions to the last 20 at session end.
- Docs in `docs/` may overclaim stale status; the ledger is the one-file source of truth for "what is true right now."

## Quick Commands

```bash
# Check live state
ssh papa@dalidou "curl -s http://localhost:8100/health"

# Run the retrieval harness
python scripts/retrieval_eval.py          # human-readable
python scripts/retrieval_eval.py --json   # machine-readable

# Deploy a new main tip
git push origin main && ssh papa@dalidou "bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh"

# Reflection-loop ops
python scripts/atocore_client.py batch-extract '' '' 200 false   # preview
python scripts/atocore_client.py batch-extract '' '' 200 true    # persist
python scripts/atocore_client.py triage
```
21
Dockerfile
Normal file
@@ -0,0 +1,21 @@
FROM python:3.12-slim

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

WORKDIR /app

RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential curl git \
    && rm -rf /var/lib/apt/lists/*

COPY pyproject.toml README.md requirements.txt requirements-dev.txt ./
COPY config ./config
COPY src ./src

RUN pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir .

EXPOSE 8100

CMD ["python", "-m", "uvicorn", "atocore.main:app", "--host", "0.0.0.0", "--port", "8100"]
88
README.md
Normal file
@@ -0,0 +1,88 @@
# AtoCore

Personal context engine that enriches LLM interactions with durable memory, structured context, and project knowledge.

## Quick Start

```bash
pip install -e .
uvicorn atocore.main:app --port 8100
```

## Usage

```bash
# Ingest markdown files
curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/notes"}'

# Build enriched context for a prompt
curl -X POST http://localhost:8100/context/build \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the project status?", "project": "myproject"}'

# CLI ingestion
python scripts/ingest_folder.py --path /path/to/notes

# Live operator client
python scripts/atocore_client.py health
python scripts/atocore_client.py audit-query "gigabit" 5
```

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | /ingest | Ingest markdown file or folder |
| POST | /query | Retrieve relevant chunks |
| POST | /context/build | Build full context pack |
| GET | /health | Health check |
| GET | /debug/context | Inspect last context pack |

## Architecture

```text
FastAPI (port 8100)
|- Ingestion: markdown -> parse -> chunk -> embed -> store
|- Retrieval: query -> embed -> vector search -> rank
|- Context Builder: retrieve -> boost -> budget -> format
|- SQLite (documents, chunks, memories, projects, interactions)
'- ChromaDB (vector embeddings)
```

## Configuration

Set via environment variables (prefix `ATOCORE_`):

| Variable | Default | Description |
|----------|---------|-------------|
| ATOCORE_DEBUG | false | Enable debug logging |
| ATOCORE_PORT | 8100 | Server port |
| ATOCORE_CHUNK_MAX_SIZE | 800 | Max chunk size (chars) |
| ATOCORE_CONTEXT_BUDGET | 3000 | Context pack budget (chars) |
| ATOCORE_EMBEDDING_MODEL | paraphrase-multilingual-MiniLM-L12-v2 | Embedding model |

## Testing

```bash
pip install -e ".[dev]"
pytest
```

## Operations

- `scripts/atocore_client.py` provides a live API client for project refresh, project-state inspection, and retrieval-quality audits.
- `docs/operations.md` captures the current operational priority order: retrieval quality, Wave 2 trusted-operational ingestion, AtoDrive scoping, and restore validation.

## Architecture Notes

Implementation-facing architecture notes live under `docs/architecture/`.

Current additions:
- `docs/architecture/engineering-knowledge-hybrid-architecture.md` — 5-layer hybrid model
- `docs/architecture/engineering-ontology-v1.md` — V1 object and relationship inventory
- `docs/architecture/engineering-query-catalog.md` — 20 v1-required queries
- `docs/architecture/memory-vs-entities.md` — canonical home split
- `docs/architecture/promotion-rules.md` — Layer 0 to Layer 2 pipeline
- `docs/architecture/conflict-model.md` — contradictory facts detection and resolution
21
config/project-registry.example.json
Normal file
@@ -0,0 +1,21 @@
{
  "projects": [
    {
      "id": "p07-example",
      "aliases": ["p07", "example-project"],
      "description": "Short description of the project and the staged source set.",
      "ingest_roots": [
        {
          "source": "vault",
          "subpath": "incoming/projects/p07-example",
          "label": "Primary staged project docs"
        },
        {
          "source": "drive",
          "subpath": "projects/p07-example",
          "label": "Trusted operational docs"
        }
      ]
    }
  ]
}
52  config/project-registry.json  Normal file
@@ -0,0 +1,52 @@
{
  "projects": [
    {
      "id": "atocore",
      "aliases": ["ato core"],
      "description": "AtoCore platform docs and trusted project materials.",
      "ingest_roots": [
        {
          "source": "drive",
          "subpath": "atocore",
          "label": "AtoCore drive docs"
        }
      ]
    },
    {
      "id": "p04-gigabit",
      "aliases": ["p04", "gigabit", "gigaBIT"],
      "description": "Active P04 GigaBIT mirror project corpus from PKM plus staged operational docs.",
      "ingest_roots": [
        {
          "source": "vault",
          "subpath": "incoming/projects/p04-gigabit",
          "label": "P04 staged project docs"
        }
      ]
    },
    {
      "id": "p05-interferometer",
      "aliases": ["p05", "interferometer"],
      "description": "Active P05 interferometer corpus from PKM plus selected repo context and vendor documentation.",
      "ingest_roots": [
        {
          "source": "vault",
          "subpath": "incoming/projects/p05-interferometer",
          "label": "P05 staged project docs"
        }
      ]
    },
    {
      "id": "p06-polisher",
      "aliases": ["p06", "polisher"],
      "description": "Active P06 polisher corpus from PKM, software-suite notes, and selected repo context.",
      "ingest_roots": [
        {
          "source": "vault",
          "subpath": "incoming/projects/p06-polisher",
          "label": "P06 staged project docs"
        }
      ]
    }
  ]
}
19  deploy/dalidou/.env.example  Normal file
@@ -0,0 +1,19 @@
ATOCORE_ENV=production
ATOCORE_DEBUG=false
ATOCORE_LOG_LEVEL=INFO
ATOCORE_HOST=0.0.0.0
ATOCORE_PORT=8100

ATOCORE_DATA_DIR=/srv/storage/atocore/data
ATOCORE_DB_DIR=/srv/storage/atocore/data/db
ATOCORE_CHROMA_DIR=/srv/storage/atocore/data/chroma
ATOCORE_CACHE_DIR=/srv/storage/atocore/data/cache
ATOCORE_TMP_DIR=/srv/storage/atocore/data/tmp
ATOCORE_LOG_DIR=/srv/storage/atocore/logs
ATOCORE_BACKUP_DIR=/srv/storage/atocore/backups
ATOCORE_RUN_DIR=/srv/storage/atocore/run

ATOCORE_VAULT_SOURCE_DIR=/srv/storage/atocore/sources/vault
ATOCORE_DRIVE_SOURCE_DIR=/srv/storage/atocore/sources/drive
ATOCORE_SOURCE_VAULT_ENABLED=true
ATOCORE_SOURCE_DRIVE_ENABLED=true
85  deploy/dalidou/cron-backup.sh  Executable file
@@ -0,0 +1,85 @@
#!/usr/bin/env bash
#
# deploy/dalidou/cron-backup.sh
# ------------------------------
# Daily backup + retention cleanup via the AtoCore API.
#
# Intended to run from cron on Dalidou:
#
#   # Daily at 03:00 UTC
#   0 3 * * * /srv/storage/atocore/app/deploy/dalidou/cron-backup.sh >> /var/log/atocore-backup.log 2>&1
#
# What it does:
#   1. Creates a runtime backup (db + registry, no chroma by default)
#   2. Runs retention cleanup with --confirm to delete old snapshots
#   3. Logs results to stdout (captured by cron into the log file)
#
# Fail-open: exits 0 even on API errors so cron doesn't send noise
# emails. Check /var/log/atocore-backup.log for diagnostics.
#
# Environment variables:
#   ATOCORE_URL            default http://127.0.0.1:8100
#   ATOCORE_BACKUP_CHROMA  default false (set to "true" for cold chroma copy)
#   ATOCORE_BACKUP_DIR     default /srv/storage/atocore/backups
#   ATOCORE_BACKUP_RSYNC   optional rsync destination for off-host copies
#                          (e.g. papa@laptop:/home/papa/atocore-backups/)
#                          When set, the local snapshots tree is rsynced to
#                          the destination after cleanup. Unset = skip.
#                          SSH key auth must already be configured from this
#                          host to the destination.

set -euo pipefail

ATOCORE_URL="${ATOCORE_URL:-http://127.0.0.1:8100}"
INCLUDE_CHROMA="${ATOCORE_BACKUP_CHROMA:-false}"
BACKUP_DIR="${ATOCORE_BACKUP_DIR:-/srv/storage/atocore/backups}"
RSYNC_TARGET="${ATOCORE_BACKUP_RSYNC:-}"
TIMESTAMP="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

log() { printf '[%s] %s\n' "$TIMESTAMP" "$*"; }

log "=== AtoCore daily backup starting ==="

# Step 1: Create backup
log "Step 1: creating backup (chroma=$INCLUDE_CHROMA)"
BACKUP_RESULT=$(curl -sf -X POST \
  -H "Content-Type: application/json" \
  -d "{\"include_chroma\": $INCLUDE_CHROMA}" \
  "$ATOCORE_URL/admin/backup" 2>&1) || {
  log "ERROR: backup creation failed: $BACKUP_RESULT"
  exit 0
}
log "Backup created: $BACKUP_RESULT"

# Step 2: Retention cleanup (confirm=true to actually delete)
log "Step 2: running retention cleanup"
CLEANUP_RESULT=$(curl -sf -X POST \
  -H "Content-Type: application/json" \
  -d '{"confirm": true}' \
  "$ATOCORE_URL/admin/backup/cleanup" 2>&1) || {
  log "ERROR: cleanup failed: $CLEANUP_RESULT"
  exit 0
}
log "Cleanup result: $CLEANUP_RESULT"

# Step 3: Off-host rsync (optional). Fail-open: log but don't abort
# the cron so a laptop being offline at 03:00 UTC never turns the
# local backup path red.
if [[ -n "$RSYNC_TARGET" ]]; then
  log "Step 3: rsyncing snapshots to $RSYNC_TARGET"
  if [[ ! -d "$BACKUP_DIR/snapshots" ]]; then
    log "WARN: $BACKUP_DIR/snapshots does not exist, skipping rsync"
  else
    RSYNC_OUTPUT=$(rsync -a --delete \
      -e "ssh -o ConnectTimeout=10 -o BatchMode=yes -o StrictHostKeyChecking=accept-new" \
      "$BACKUP_DIR/snapshots/" "$RSYNC_TARGET" 2>&1) && {
      log "Rsync complete"
    } || {
      log "WARN: rsync to $RSYNC_TARGET failed (offline or auth?): $RSYNC_OUTPUT"
    }
  fi
else
  log "Step 3: ATOCORE_BACKUP_RSYNC not set, skipping off-host copy"
fi

log "=== AtoCore daily backup complete ==="
349  deploy/dalidou/deploy.sh  Normal file
@@ -0,0 +1,349 @@
#!/usr/bin/env bash
#
# deploy/dalidou/deploy.sh
# -------------------------
# One-shot deploy script for updating the running AtoCore container
# on Dalidou from the current Gitea main branch.
#
# The script is idempotent and safe to re-run. It handles both the
# first-time deploy (where /srv/storage/atocore/app may not yet be
# a git checkout) and the ongoing update case (where it is).
#
# Usage
# -----
#
#   # Normal update from main (most common)
#   bash deploy/dalidou/deploy.sh
#
#   # Deploy a specific branch or tag
#   ATOCORE_BRANCH=codex/some-feature bash deploy/dalidou/deploy.sh
#
#   # Dry-run: show what would happen without touching anything
#   ATOCORE_DEPLOY_DRY_RUN=1 bash deploy/dalidou/deploy.sh
#
# Environment variables
# ---------------------
#
#   ATOCORE_APP_DIR         default /srv/storage/atocore/app
#   ATOCORE_GIT_REMOTE      default http://127.0.0.1:3000/Antoine/ATOCore.git
#                           This is the local Dalidou gitea, reached
#                           via loopback. Override only when running
#                           the deploy from a remote host. The default
#                           is loopback (not the hostname "dalidou")
#                           because the hostname doesn't reliably
#                           resolve on the host itself — Dalidou
#                           Claude's first deploy had to work around
#                           exactly this.
#   ATOCORE_BRANCH          default main
#   ATOCORE_DEPLOY_DRY_RUN  if set to 1, report only, no mutations
#   ATOCORE_HEALTH_URL      default http://127.0.0.1:8100/health
#
# Safety rails
# ------------
#
# - If the app dir exists but is NOT a git repo, the script renames
#   it to <dir>.pre-git-<timestamp> before re-cloning, so you never
#   lose the pre-existing snapshot to a git clobber.
# - If the health check fails after restart, the script exits
#   non-zero and prints the container logs tail for diagnosis.
# - Dry-run mode is the default recommendation for the first deploy
#   on a new environment: it shows the planned git operations and
#   the compose command without actually running them.
#
# What this script does NOT do
# ----------------------------
#
# - Does not manage secrets / .env files. The caller is responsible
#   for placing deploy/dalidou/.env before running.
# - Does not run a backup before deploying. Run the backup endpoint
#   first if you want a pre-deploy snapshot.
# - Does not roll back on health-check failure. If deploy fails,
#   the previous container is already stopped; you need to redeploy
#   a known-good commit to recover.
# - Does not touch the database. The Phase 9 schema migrations in
#   src/atocore/models/database.py::_apply_migrations are idempotent
#   ALTER TABLE ADD COLUMN calls that run at service startup via the
#   lifespan handler. Stale pre-Phase-9 schema is upgraded in place.

set -euo pipefail

APP_DIR="${ATOCORE_APP_DIR:-/srv/storage/atocore/app}"
GIT_REMOTE="${ATOCORE_GIT_REMOTE:-http://127.0.0.1:3000/Antoine/ATOCore.git}"
BRANCH="${ATOCORE_BRANCH:-main}"
HEALTH_URL="${ATOCORE_HEALTH_URL:-http://127.0.0.1:8100/health}"
DRY_RUN="${ATOCORE_DEPLOY_DRY_RUN:-0}"
COMPOSE_DIR="$APP_DIR/deploy/dalidou"

log() { printf '==> %s\n' "$*"; }
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '    [dry-run] %s\n' "$*"
  else
    eval "$@"
  fi
}

log "AtoCore deploy starting"
log "  app dir:    $APP_DIR"
log "  git remote: $GIT_REMOTE"
log "  branch:     $BRANCH"
log "  health url: $HEALTH_URL"
log "  dry run:    $DRY_RUN"

# ---------------------------------------------------------------------
# Step 0: pre-flight permission check
# ---------------------------------------------------------------------
#
# If $APP_DIR exists but the current user cannot write to it (because
# a previous manual deploy left it root-owned, for example), the git
# fetch / reset in step 1 will fail with cryptic errors. Detect this
# up front and give the operator a clean remediation command instead
# of letting git produce half-state on partial failure. This was the
# exact workaround the 2026-04-08 Dalidou redeploy needed — pre-
# existing root ownership from the pre-phase9 manual schema fix.

if [ -d "$APP_DIR" ] && [ "$DRY_RUN" != "1" ]; then
  if [ ! -w "$APP_DIR" ] || [ ! -r "$APP_DIR/.git" ] 2>/dev/null; then
    log "WARNING: app dir exists but may not be writable by current user"
  fi
  current_owner="$(stat -c '%U:%G' "$APP_DIR" 2>/dev/null || echo unknown)"
  current_user="$(id -un 2>/dev/null || echo unknown)"
  current_uid_gid="$(id -u 2>/dev/null):$(id -g 2>/dev/null)"
  log "Step 0: permission check"
  log "  app dir owner: $current_owner"
  log "  current user:  $current_user ($current_uid_gid)"
  # Try to write a tiny marker file. If it fails, surface a clean
  # remediation message and exit before git produces confusing
  # half-state.
  marker="$APP_DIR/.deploy-permission-check"
  if ! ( : > "$marker" ) 2>/dev/null; then
    log "FATAL: cannot write to $APP_DIR as $current_user"
    log ""
    log "The app dir is owned by $current_owner and the current user"
    log "doesn't have write permission. This usually happens after a"
    log "manual workaround deploy that ran as root."
    log ""
    log "Remediation (pick the one that matches your setup):"
    log ""
    log "  # If you have passwordless sudo and gitea runs as UID 1000:"
    log "  sudo chown -R 1000:1000 $APP_DIR"
    log ""
    log "  # If you're running deploy.sh itself as root:"
    log "  sudo bash $0"
    log ""
    log "  # If neither works, do it via a throwaway container:"
    log "  docker run --rm -v $APP_DIR:/app alpine \\"
    log "    chown -R 1000:1000 /app"
    log ""
    log "Then re-run deploy.sh."
    exit 5
  fi
  rm -f "$marker" 2>/dev/null || true
fi

# ---------------------------------------------------------------------
# Step 1: make sure $APP_DIR is a proper git checkout of the branch
# ---------------------------------------------------------------------

if [ -d "$APP_DIR/.git" ]; then
  log "Step 1: app dir is already a git checkout; fetching latest"
  run "cd '$APP_DIR' && git fetch origin '$BRANCH'"
  run "cd '$APP_DIR' && git reset --hard 'origin/$BRANCH'"
else
  log "Step 1: app dir is NOT a git checkout; converting"
  if [ -d "$APP_DIR" ]; then
    BACKUP="${APP_DIR}.pre-git-$(date -u +%Y%m%dT%H%M%SZ)"
    log "  backing up existing snapshot to $BACKUP"
    run "mv '$APP_DIR' '$BACKUP'"
  fi
  log "  cloning $GIT_REMOTE -> $APP_DIR (branch: $BRANCH)"
  run "git clone --branch '$BRANCH' '$GIT_REMOTE' '$APP_DIR'"
fi

# ---------------------------------------------------------------------
# Step 1.5: self-update re-exec guard
# ---------------------------------------------------------------------
#
# When deploy.sh itself changes in the commit we just pulled, the bash
# process running this script is still executing the OLD deploy.sh
# from memory — git reset --hard updated the file on disk but our
# in-memory instructions are stale. That's exactly how the first
# 2026-04-09 Dalidou deploy silently wrote "unknown" build_sha: old
# Step 2 logic ran against fresh source. Detect the mismatch and
# re-exec into the fresh copy so every post-update run exercises the
# new script.
#
# Guard rails:
# - Only runs when $APP_DIR exists, holds a git checkout, and a
#   deploy.sh exists there (i.e. after Step 1 succeeded).
# - Uses a sentinel env var ATOCORE_DEPLOY_REEXECED=1 to make sure
#   we only re-exec once, never recurse.
# - Skipped in dry-run mode (no mutation).
# - Skipped if $0 isn't a readable file (bash -c pipe inputs, etc.).

if [ "$DRY_RUN" != "1" ] \
  && [ -z "${ATOCORE_DEPLOY_REEXECED:-}" ] \
  && [ -r "$0" ] \
  && [ -f "$APP_DIR/deploy/dalidou/deploy.sh" ]; then
  ON_DISK_HASH="$(sha1sum "$APP_DIR/deploy/dalidou/deploy.sh" 2>/dev/null | awk '{print $1}')"
  RUNNING_HASH="$(sha1sum "$0" 2>/dev/null | awk '{print $1}')"
  if [ -n "$ON_DISK_HASH" ] \
    && [ -n "$RUNNING_HASH" ] \
    && [ "$ON_DISK_HASH" != "$RUNNING_HASH" ]; then
    log "Step 1.5: deploy.sh changed in the pulled commit; re-exec'ing"
    log "  running script hash: $RUNNING_HASH"
    log "  on-disk script hash: $ON_DISK_HASH"
    log "  re-exec -> $APP_DIR/deploy/dalidou/deploy.sh"
    export ATOCORE_DEPLOY_REEXECED=1
    exec bash "$APP_DIR/deploy/dalidou/deploy.sh" "$@"
  fi
fi

# ---------------------------------------------------------------------
# Step 2: capture build provenance to pass to the container
# ---------------------------------------------------------------------
#
# We compute the full SHA, the short SHA, the UTC build timestamp,
# and the source branch. These get exported as env vars before
# `docker compose up -d --build` so the running container can read
# them at startup and report them via /health. The post-deploy
# verification step (Step 6) reads /health and compares the
# reported SHA against this value to detect any silent drift.

log "Step 2: capturing build provenance"
if [ "$DRY_RUN" != "1" ] && [ -d "$APP_DIR/.git" ]; then
  DEPLOYING_SHA_FULL="$(cd "$APP_DIR" && git rev-parse HEAD)"
  DEPLOYING_SHA="$(echo "$DEPLOYING_SHA_FULL" | cut -c1-7)"
  DEPLOYING_TIME="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  DEPLOYING_BRANCH="$BRANCH"
  log "  commit:   $DEPLOYING_SHA ($DEPLOYING_SHA_FULL)"
  log "  built at: $DEPLOYING_TIME"
  log "  branch:   $DEPLOYING_BRANCH"
  ( cd "$APP_DIR" && git log --oneline -1 ) | sed 's/^/    /'
  export ATOCORE_BUILD_SHA="$DEPLOYING_SHA_FULL"
  export ATOCORE_BUILD_TIME="$DEPLOYING_TIME"
  export ATOCORE_BUILD_BRANCH="$DEPLOYING_BRANCH"
else
  log "  [dry-run] would read git log from $APP_DIR"
  DEPLOYING_SHA="dry-run"
  DEPLOYING_SHA_FULL="dry-run"
fi

# ---------------------------------------------------------------------
# Step 3: preserve the .env file (it's not in git)
# ---------------------------------------------------------------------

ENV_FILE="$COMPOSE_DIR/.env"
if [ "$DRY_RUN" != "1" ] && [ ! -f "$ENV_FILE" ]; then
  log "Step 3: WARNING — $ENV_FILE does not exist"
  log "  the compose workflow needs this file to map mount points"
  log "  copy deploy/dalidou/.env.example to $ENV_FILE and edit it"
  log "  before re-running this script"
  exit 2
fi

# ---------------------------------------------------------------------
# Step 4: rebuild and restart the container
# ---------------------------------------------------------------------

log "Step 4: rebuilding and restarting the atocore container"
run "cd '$COMPOSE_DIR' && docker compose up -d --build"

if [ "$DRY_RUN" = "1" ]; then
  log "dry-run complete — no mutations performed"
  exit 0
fi

# ---------------------------------------------------------------------
# Step 5: wait for the service to come up and pass the health check
# ---------------------------------------------------------------------

log "Step 5: waiting for /health to respond"
for i in 1 2 3 4 5 6 7 8 9 10; do
  if curl -fsS "$HEALTH_URL" > /tmp/atocore-health.json 2>/dev/null; then
    log "  service is responding"
    break
  fi
  log "  not ready yet ($i/10); waiting 3s"
  sleep 3
done

if ! curl -fsS "$HEALTH_URL" > /tmp/atocore-health.json 2>/dev/null; then
  log "FATAL: service did not come up within 30 seconds"
  log "  container logs (last 50 lines):"
  cd "$COMPOSE_DIR" && docker compose logs --tail=50 atocore || true
  exit 3
fi

# ---------------------------------------------------------------------
# Step 6: verify the deployed build matches what we just shipped
# ---------------------------------------------------------------------
#
# Two layers of comparison:
#
# - code_version: matches src/atocore/__init__.py::__version__.
#   Coarse: any commit between version bumps reports the same value.
# - build_sha: full git SHA the container was built from. Set as
#   an env var by Step 2 above and read by /health from
#   ATOCORE_BUILD_SHA. This is the precise drift signal — if the
#   live build_sha doesn't match $DEPLOYING_SHA_FULL, the build
#   didn't pick up the new source.

log "Step 6: verifying deployed build"
log "  /health response:"
if command -v jq >/dev/null 2>&1; then
  jq . < /tmp/atocore-health.json | sed 's/^/    /'
  REPORTED_VERSION="$(jq -r '.code_version // .version' < /tmp/atocore-health.json)"
  REPORTED_SHA="$(jq -r '.build_sha // "unknown"' < /tmp/atocore-health.json)"
  REPORTED_BUILD_TIME="$(jq -r '.build_time // "unknown"' < /tmp/atocore-health.json)"
else
  cat /tmp/atocore-health.json | sed 's/^/    /'
  echo
  REPORTED_VERSION="$(grep -o '"code_version":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
  if [ -z "$REPORTED_VERSION" ]; then
    REPORTED_VERSION="$(grep -o '"version":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
  fi
  REPORTED_SHA="$(grep -o '"build_sha":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
  REPORTED_SHA="${REPORTED_SHA:-unknown}"
  REPORTED_BUILD_TIME="$(grep -o '"build_time":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
  REPORTED_BUILD_TIME="${REPORTED_BUILD_TIME:-unknown}"
fi

EXPECTED_VERSION="$(grep -oE "__version__ = \"[^\"]+\"" "$APP_DIR/src/atocore/__init__.py" | head -1 | cut -d'"' -f2)"

log "  Layer 1 — coarse version:"
log "    expected code_version: $EXPECTED_VERSION (from src/atocore/__init__.py)"
log "    reported code_version: $REPORTED_VERSION (from live /health)"

if [ "$REPORTED_VERSION" != "$EXPECTED_VERSION" ]; then
  log "FATAL: code_version mismatch"
  log "  the container may not have picked up the new image"
  log "  try: docker compose down && docker compose up -d --build"
  exit 4
fi

log "  Layer 2 — precise build SHA:"
log "    expected build_sha:  $DEPLOYING_SHA_FULL (from this deploy.sh run)"
log "    reported build_sha:  $REPORTED_SHA (from live /health)"
log "    reported build_time: $REPORTED_BUILD_TIME"

if [ "$REPORTED_SHA" != "$DEPLOYING_SHA_FULL" ]; then
  log "FATAL: build_sha mismatch"
  log "  the live container is reporting a different commit than"
  log "  the one this deploy.sh run just shipped. Possible causes:"
  log "  - the container is using a cached image instead of the"
  log "    freshly-built one (try: docker compose build --no-cache)"
  log "  - the env vars didn't propagate (check that"
  log "    deploy/dalidou/docker-compose.yml has the environment"
  log "    section with ATOCORE_BUILD_SHA)"
  log "  - another process restarted the container between the"
  log "    build and the health check"
  exit 6
fi

log "Deploy complete."
log "  commit:       $DEPLOYING_SHA ($DEPLOYING_SHA_FULL)"
log "  code_version: $REPORTED_VERSION"
log "  build_sha:    $REPORTED_SHA"
log "  build_time:   $REPORTED_BUILD_TIME"
log "  health:       ok"
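Step 6's two-layer drift check can be expressed compactly against a parsed /health payload. A minimal sketch; `verify_build` is illustrative, not part of the repo:

```python
def verify_build(health: dict, expected_version: str, expected_sha: str) -> list[str]:
    """Return a list of drift problems; an empty list means the deploy is verified."""
    problems = []
    # Layer 1: coarse code_version (falls back to legacy "version" key).
    reported_version = health.get("code_version") or health.get("version")
    if reported_version != expected_version:
        problems.append(
            f"code_version mismatch: expected {expected_version}, got {reported_version}"
        )
    # Layer 2: precise build SHA reported by the live container.
    reported_sha = health.get("build_sha", "unknown")
    if reported_sha != expected_sha:
        problems.append(
            f"build_sha mismatch: expected {expected_sha}, got {reported_sha}"
        )
    return problems

health = {"code_version": "0.9.0", "build_sha": "abc1234def"}
print(verify_build(health, "0.9.0", "abc1234def"))  # → []
```

Returning a list of problems instead of a boolean keeps both failure layers visible at once, matching the script's verbose FATAL diagnostics.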
37  deploy/dalidou/docker-compose.yml  Normal file
@@ -0,0 +1,37 @@
services:
  atocore:
    build:
      context: ../../
      dockerfile: Dockerfile
    container_name: atocore
    restart: unless-stopped
    ports:
      - "${ATOCORE_PORT:-8100}:8100"
    env_file:
      - .env
    environment:
      # Build provenance — set by deploy/dalidou/deploy.sh on each
      # rebuild so /health can report exactly which commit is live.
      # Defaults to 'unknown' for direct `docker compose up` runs that
      # bypass deploy.sh; in that case the operator should run
      # deploy.sh instead so the deployed SHA is recorded.
      ATOCORE_BUILD_SHA: "${ATOCORE_BUILD_SHA:-unknown}"
      ATOCORE_BUILD_TIME: "${ATOCORE_BUILD_TIME:-unknown}"
      ATOCORE_BUILD_BRANCH: "${ATOCORE_BUILD_BRANCH:-unknown}"
    volumes:
      - ${ATOCORE_DB_DIR}:${ATOCORE_DB_DIR}
      - ${ATOCORE_CHROMA_DIR}:${ATOCORE_CHROMA_DIR}
      - ${ATOCORE_CACHE_DIR}:${ATOCORE_CACHE_DIR}
      - ${ATOCORE_TMP_DIR}:${ATOCORE_TMP_DIR}
      - ${ATOCORE_LOG_DIR}:${ATOCORE_LOG_DIR}
      - ${ATOCORE_BACKUP_DIR}:${ATOCORE_BACKUP_DIR}
      - ${ATOCORE_RUN_DIR}:${ATOCORE_RUN_DIR}
      - ${ATOCORE_PROJECT_REGISTRY_DIR}:${ATOCORE_PROJECT_REGISTRY_DIR}
      - ${ATOCORE_VAULT_SOURCE_DIR}:${ATOCORE_VAULT_SOURCE_DIR}:ro
      - ${ATOCORE_DRIVE_SOURCE_DIR}:${ATOCORE_DRIVE_SOURCE_DIR}:ro
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://127.0.0.1:8100/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 20s
188  deploy/hooks/capture_stop.py  Normal file
@@ -0,0 +1,188 @@
#!/usr/bin/env python3
"""Claude Code Stop hook: capture interaction to AtoCore.

Reads the Stop hook JSON from stdin, extracts the last user prompt
from the transcript JSONL, and POSTs to the AtoCore /interactions
endpoint with reinforcement enabled (no extraction).

Fail-open: always exits 0, logs errors to stderr only.

Environment variables:
    ATOCORE_URL               Base URL of the AtoCore instance (default: http://dalidou:8100)
    ATOCORE_CAPTURE_DISABLED  Set to "1" to disable capture (kill switch)

Usage in ~/.claude/settings.json:
    "Stop": [{
        "matcher": "",
        "hooks": [{
            "type": "command",
            "command": "python /path/to/capture_stop.py",
            "timeout": 15
        }]
    }]
"""

from __future__ import annotations

import json
import os
import sys
import urllib.error
import urllib.request

ATOCORE_URL = os.environ.get("ATOCORE_URL", "http://dalidou:8100")
TIMEOUT_SECONDS = 10

# Minimum prompt length to bother capturing. Single-word acks,
# slash commands, and empty lines aren't useful interactions.
MIN_PROMPT_LENGTH = 15

# Maximum response length to capture. Truncate very long assistant
# responses to keep the interactions table manageable.
MAX_RESPONSE_LENGTH = 50_000


def main() -> None:
    """Entry point. Always exits 0."""
    try:
        _capture()
    except Exception as exc:
        print(f"capture_stop: {exc}", file=sys.stderr)


def _capture() -> None:
    if os.environ.get("ATOCORE_CAPTURE_DISABLED") == "1":
        return

    raw = sys.stdin.read()
    if not raw.strip():
        return

    hook_data = json.loads(raw)

    session_id = hook_data.get("session_id", "")
    assistant_message = hook_data.get("last_assistant_message", "")
    transcript_path = hook_data.get("transcript_path", "")
    cwd = hook_data.get("cwd", "")

    prompt = _extract_last_user_prompt(transcript_path)
    if not prompt or len(prompt.strip()) < MIN_PROMPT_LENGTH:
        return

    response = assistant_message or ""
    if len(response) > MAX_RESPONSE_LENGTH:
        response = response[:MAX_RESPONSE_LENGTH] + "\n\n[truncated]"

    project = _infer_project(cwd)

    payload = {
        "prompt": prompt,
        "response": response,
        "client": "claude-code",
        "session_id": session_id,
        "project": project,
        "reinforce": True,
    }

    body = json.dumps(payload, ensure_ascii=True).encode("utf-8")
    req = urllib.request.Request(
        f"{ATOCORE_URL}/interactions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    resp = urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS)
    result = json.loads(resp.read().decode("utf-8"))
    print(
        f"capture_stop: recorded interaction {result.get('id', '?')} "
        f"(project={project or 'none'}, prompt_chars={len(prompt)}, "
        f"response_chars={len(response)})",
        file=sys.stderr,
    )


def _extract_last_user_prompt(transcript_path: str) -> str:
    """Read the JSONL transcript and return the last real user prompt.

    Skips meta messages (isMeta=True) and system/command messages
    (content starting with '<').
    """
    if not transcript_path:
        return ""

    # Normalize path for the current OS
    path = os.path.normpath(transcript_path)
    if not os.path.isfile(path):
        return ""

    last_prompt = ""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue

                if entry.get("type") != "user":
                    continue
                if entry.get("isMeta", False):
                    continue

                msg = entry.get("message", {})
                if not isinstance(msg, dict):
                    continue

                content = msg.get("content", "")

                if isinstance(content, str):
                    text = content.strip()
                elif isinstance(content, list):
                    # Content blocks: extract text blocks
                    parts = []
                    for block in content:
                        if isinstance(block, str):
                            parts.append(block)
                        elif isinstance(block, dict) and block.get("type") == "text":
                            parts.append(block.get("text", ""))
                    text = "\n".join(parts).strip()
                else:
                    continue

                # Skip system/command XML and very short messages
                if text.startswith("<") or len(text) < MIN_PROMPT_LENGTH:
                    continue

                last_prompt = text
    except OSError:
        pass

    return last_prompt


# Project inference from working directory.
# Maps known repo paths to AtoCore project IDs. The user can extend
# this table or replace it with a registry lookup later.
_PROJECT_PATH_MAP: dict[str, str] = {
    # Add mappings as needed, e.g.:
    # "C:\\Users\\antoi\\gigabit": "p04-gigabit",
    # "C:\\Users\\antoi\\interferometer": "p05-interferometer",
}


def _infer_project(cwd: str) -> str:
    """Try to map the working directory to an AtoCore project."""
    if not cwd:
        return ""
    norm = os.path.normpath(cwd).lower()
    for path_prefix, project_id in _PROJECT_PATH_MAP.items():
        if norm.startswith(os.path.normpath(path_prefix).lower()):
            return project_id
    return ""


if __name__ == "__main__":
    main()
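The transcript walk in `_extract_last_user_prompt` can be exercised against an in-memory JSONL. This condensed sketch mirrors the hook's filtering rules (string content only) rather than calling the script itself:

```python
import json

MIN_PROMPT_LENGTH = 15

def last_user_prompt(jsonl_lines: list[str]) -> str:
    """Condensed version of the hook's filter: last non-meta user text."""
    last = ""
    for line in jsonl_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Same skip rules as the hook: user-typed, non-meta entries only.
        if entry.get("type") != "user" or entry.get("isMeta", False):
            continue
        content = entry.get("message", {}).get("content", "")
        if not isinstance(content, str):
            continue
        text = content.strip()
        if text.startswith("<") or len(text) < MIN_PROMPT_LENGTH:
            continue
        last = text
    return last

lines = [
    json.dumps({"type": "user", "isMeta": True, "message": {"content": "meta note, skipped"}}),
    json.dumps({"type": "user", "message": {"content": "<command>status</command>"}}),
    json.dumps({"type": "user", "message": {"content": "please refactor the ingest pipeline"}}),
]
print(last_user_prompt(lines))  # → please refactor the ingest pipeline
```

Walking the whole file and keeping the last match is simpler than reading backwards, and transcripts are small enough for that to be fine.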
332  docs/architecture/conflict-model.md  Normal file
@@ -0,0 +1,332 @@
# Conflict Model (how AtoCore handles contradictory facts)

## Why this document exists

Any system that accumulates facts from multiple sources — interactions,
ingested documents, repo history, PKM notes — will eventually see
contradictory facts about the same thing. AtoCore's operating model
already has the hard rule:

> **Bad memory is worse than no memory.**

The practical consequence of that rule is: AtoCore must never
silently merge contradictory facts, never silently pick a winner,
and never silently discard evidence. Every conflict must be
surfaced to a human reviewer with full audit context.

This document defines what "conflict" means in AtoCore, how
conflicts are detected, how they are represented, how they are
surfaced, and how they are resolved.

## What counts as a conflict

A conflict exists when two or more facts in the system claim
incompatible values for the same conceptual slot. More precisely:

A conflict is a set of two or more **active** rows (across memories,
entities, project_state) such that:

1. They share the same **target identity** — same entity type and
   same semantic key
2. Their **claimed values** are incompatible
3. They are all in an **active** status (not superseded, not
   invalid, not candidate)

Examples that are conflicts:

- Two active `Decision` entities affecting the same `Subsystem`
  with contradictory values for the same decided field (e.g.
  lateral support material = GF-PTFE vs lateral support material = PEEK)
- An active `preference` memory "prefers rebase workflow" and an
  active `preference` memory "prefers merge-commit workflow"
- A `project_state` entry `p05 / decision / lateral_support_material = GF-PTFE`
  and an active `Decision` entity also claiming the lateral support
  material is PEEK (cross-layer conflict)

Examples that are NOT conflicts:

- Two active memories both saying "prefers small diffs" — same
  meaning, not contradictory
- An active memory saying X and a candidate memory saying Y —
  candidates are not active, so this is part of the review queue
  flow, not the conflict flow
- A superseded `Decision` saying X and an active `Decision` saying Y
  — supersession is a resolved history, not a conflict
- Two active `Requirement` entities each constraining the same
  component in different but compatible ways (e.g. one caps mass,
  one caps heat flux) — different fields, no contradiction
## Detection triggers
|
||||
|
||||
Conflict detection must fire at every write that could create a new
|
||||
active fact. That means the following hook points:
|
||||
|
||||
1. **`POST /memory` creating an active memory** (legacy path)
|
||||
2. **`POST /memory/{id}/promote`** (candidate → active)
|
||||
3. **`POST /entities` creating an active entity** (future)
|
||||
4. **`POST /entities/{id}/promote`** (candidate → active, future)
|
||||
5. **`POST /project/state`** (curating trusted state directly)
|
||||
6. **`POST /memory/{id}/graduate`** (memory → entity graduation,
|
||||
future — the resulting entity could conflict with something)
|
||||
|
||||
Extraction passes do NOT trigger conflict detection at candidate
|
||||
write time. Candidates are allowed to sit in the queue in an
|
||||
apparently-conflicting state; the reviewer will see them during
|
||||
promotion and decision-detection fires at that moment.
|
||||
|
||||
## Detection strategy per layer
|
||||
|
||||
### Memory layer
|
||||
|
||||
For identity / preference / episodic memories (the ones that stay
|
||||
in the memory layer):
|
||||
|
||||
- Matching key: `(memory_type, project, normalized_content_family)`
|
||||
- `normalized_content_family` is not a hash of the content — that
|
||||
would require exact equality — but a slot identifier extracted
|
||||
by a small per-type rule set:
|
||||
- identity: slot is "role" / "background" / "credentials"
|
||||
- preference: slot is the first content word after "prefers" / "uses" / "likes"
|
||||
normalized to a lowercase noun stem, OR the rule id that extracted it
|
||||
- episodic: no slot — episodic entries are intrinsically tied to
|
||||
a moment in time and rarely conflict
|
||||
|
||||
A conflict is flagged when two active memories share a
|
||||
`(memory_type, project, slot)` but have different content bodies.
|
||||
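A minimal sketch of that per-type slot extraction, with illustrative rules rather than the shipped rule set:

```python
import re
from typing import Optional

def extract_slot(memory_type: str, content: str) -> Optional[str]:
    """Illustrative slot extraction for memory-layer conflict keys.

    Returns None when the memory type has no slot (episodic).
    """
    if memory_type == "episodic":
        return None  # episodic entries are tied to a moment in time
    if memory_type == "preference":
        # First content word after "prefers" / "uses" / "likes",
        # lowercased as a crude stand-in for noun stemming.
        m = re.search(r"\b(?:prefers|uses|likes)\s+(\w+)", content, re.I)
        return m.group(1).lower() if m else None
    if memory_type == "identity":
        # Coarse category keyword found in the content.
        for slot in ("role", "background", "credentials"):
            if slot in content.lower():
                return slot
        return None
    return None

print(extract_slot("preference", "prefers rebase workflow"))    # rebase
print(extract_slot("identity", "Background: optics engineer"))  # background
print(extract_slot("episodic", "met vendor on Tuesday"))        # None
```

Two active memories with the same `(memory_type, project, slot)` but different content bodies would then be flagged.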
### Entity layer (V1)

For each V1 entity type, the conflict key is a short tuple that uniquely identifies the "slot" that entity is claiming:

| Entity type       | Conflict slot                                          |
|-------------------|--------------------------------------------------------|
| Project           | `(project_id)`                                         |
| Subsystem         | `(project_id, subsystem_name)`                         |
| Component         | `(project_id, subsystem_name, component_name)`         |
| Requirement       | `(project_id, requirement_key)`                        |
| Constraint        | `(project_id, constraint_target, constraint_kind)`     |
| Decision          | `(project_id, decision_target, decision_field)`        |
| Material          | `(project_id, component_id)`                           |
| Parameter         | `(project_id, parameter_scope, parameter_name)`        |
| AnalysisModel     | `(project_id, subsystem_id, model_name)`               |
| Result            | `(project_id, analysis_model_id, result_key)`          |
| ValidationClaim   | `(project_id, claim_key)`                              |
| Artifact          | no conflict detection — artifacts are additive         |

A conflict is two active entities with the same slot but different structural values. The exact "which fields count as structural" list is per-type and lives in the entity schema doc (not yet written — tracked as future `engineering-ontology-v1.md` updates).
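Deriving the slot from an entity is mechanical once the table above exists. A sketch, where the entity field names are assumptions since the entity schema is not yet written:

```python
from typing import Optional

# Subset of the slot table above; field names are illustrative.
SLOT_FIELDS = {
    "Subsystem":   ("project_id", "subsystem_name"),
    "Component":   ("project_id", "subsystem_name", "component_name"),
    "Requirement": ("project_id", "requirement_key"),
    "Decision":    ("project_id", "decision_target", "decision_field"),
}

def slot_key(entity: dict) -> Optional[tuple]:
    """Return the conflict slot for an entity, or None if the type
    has no conflict detection (e.g. Artifact, which is additive)."""
    fields = SLOT_FIELDS.get(entity["type"])
    if fields is None:
        return None
    return tuple(entity.get(f, "") for f in fields)

d1 = {"type": "Decision", "project_id": "p05",
      "decision_target": "lateral_support", "decision_field": "material"}
d2 = {"type": "Decision", "project_id": "p05",
      "decision_target": "lateral_support", "decision_field": "material"}
# Same slot: if the decided values differ, these two active
# Decisions would open a conflict.
print(slot_key(d1) == slot_key(d2))  # True
```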
### Cross-layer (memory vs entity vs trusted project state)

Trusted project state outranks active entities, which outrank active memories. This is the trust hierarchy from the operating model.

Cross-layer conflict detection works by a nightly job that walks the three layers and flags any slot that has entries in more than one layer with incompatible values:

- If trusted project state and an entity disagree: the entity is flagged; trusted state is assumed correct
- If an entity and a memory disagree: the memory is flagged; the entity is assumed correct
- If trusted state and a memory disagree: the memory is flagged; trusted state is assumed correct

In all three cases the lower-trust row gets a `conflicts_with` reference pointing at the higher-trust row but does NOT auto-move to superseded. The flag is an alert, not an action.
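The three rules collapse into one: flag everything below the highest-trust layer present. A minimal sketch, with invented row ids:

```python
# Trust tiers follow the operating model: 1=memory, 2=entity, 3=project_state.
TRUST = {"memory": 1, "entity": 2, "project_state": 3}

def flag_lower_trust(entries):
    """entries: list of (layer, row_id, value) tuples for one slot.
    Returns (winner_id, flagged_ids). Flagging is an alert only;
    nothing is auto-superseded."""
    top = max(entries, key=lambda e: TRUST[e[0]])
    values = {value for _, _, value in entries}
    if len(values) == 1:
        return top[1], []  # same value everywhere: no conflict
    flagged = [row_id for _, row_id, _ in entries if row_id != top[1]]
    return top[1], flagged

winner, flagged = flag_lower_trust([
    ("project_state", "ps-1", "GF-PTFE"),
    ("entity",        "e-9",  "PEEK"),
])
print(winner, flagged)  # ps-1 ['e-9']
```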
## Representation

Conflicts are represented as rows in a new `conflicts` table (V1 schema, not yet shipped):

```sql
CREATE TABLE conflicts (
    id TEXT PRIMARY KEY,
    detected_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    slot_kind TEXT NOT NULL,             -- "memory_slot" or "entity_slot" or "cross_layer"
    slot_key TEXT NOT NULL,              -- JSON-encoded tuple identifying the slot
    project TEXT DEFAULT '',
    status TEXT NOT NULL DEFAULT 'open', -- open | resolved | dismissed
    resolved_at DATETIME,
    resolution TEXT DEFAULT '',          -- free text from the reviewer
    -- links to conflicting rows live in conflict_members
    UNIQUE(slot_kind, slot_key, status)  -- ensures only one open conflict per slot
);

CREATE TABLE conflict_members (
    conflict_id TEXT NOT NULL REFERENCES conflicts(id) ON DELETE CASCADE,
    member_kind TEXT NOT NULL,           -- "memory" | "entity" | "project_state"
    member_id TEXT NOT NULL,
    member_layer_trust INTEGER NOT NULL, -- 1=memory, 2=entity, 3=project_state
    PRIMARY KEY (conflict_id, member_kind, member_id)
);
```
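One nuance worth noting: a plain `UNIQUE(slot_kind, slot_key, status)` also caps resolved and dismissed rows at one per slot, which may not be intended once the same slot conflicts more than once over a project's life. A partial unique index restricts the constraint to open conflicts only. A sketch against a trimmed-down version of the table, runnable in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE conflicts (
        id TEXT PRIMARY KEY,
        slot_kind TEXT NOT NULL,
        slot_key TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'open'
    )
""")
# Uniqueness applies only to rows with status='open'; any number of
# resolved or dismissed conflicts may accumulate for the same slot.
conn.execute("""
    CREATE UNIQUE INDEX one_open_conflict_per_slot
    ON conflicts(slot_kind, slot_key) WHERE status = 'open'
""")
conn.executemany(
    "INSERT INTO conflicts VALUES (?, ?, ?, ?)",
    [("c1", "entity_slot", '["p05","lateral"]', "resolved"),
     ("c2", "entity_slot", '["p05","lateral"]', "resolved"),  # allowed: not open
     ("c3", "entity_slot", '["p05","lateral"]', "open")],
)
try:
    conn.execute(
        "INSERT INTO conflicts VALUES ('c4', 'entity_slot', '[\"p05\",\"lateral\"]', 'open')"
    )
except sqlite3.IntegrityError:
    print("second open conflict rejected")
```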
Constraint rationale:

- `UNIQUE(slot_kind, slot_key, status)` where status='open' prevents duplicate "conflict already open for this slot" rows. At most one open conflict exists per slot at a time; new conflicting rows are added as members to the existing conflict, not as a new conflict.
- `conflict_members.member_layer_trust` is denormalized so the conflict resolution UI can sort conflicting rows by trust tier without re-querying.
- `status='dismissed'` exists separately from `resolved` because "the reviewer looked at this and declared it not a real conflict" is a valid distinct outcome (the two rows really do describe different things and the detector was overfitting).
## API shape

```
GET  /conflicts                            list open conflicts
GET  /conflicts?status=resolved            list resolved conflicts
GET  /conflicts?project=p05-interferometer scope by project
GET  /conflicts/{id}                       full detail including all members

POST /conflicts/{id}/resolve               mark resolved with notes
     body: {
       "resolution_notes": "...",
       "winner_member_id": "...",   # optional: if specified,
                                    # other members are auto-superseded
       "action": "supersede_others" # or "no_action" if reviewer
                                    # wants to resolve without touching rows
     }

POST /conflicts/{id}/dismiss               mark dismissed ("not a real conflict")
     body: {
       "reason": "..."
     }
```

Conflict detection must also surface in existing endpoints:

- `GET /memory/{id}` — response includes a `conflicts` array if the memory is a member of any open conflict
- `GET /entities/{type}/{id}` (future) — same
- `GET /health` — includes `open_conflicts_count` so the operator sees at a glance that review is pending

## Supersession as a conflict resolution tool

When the reviewer resolves a conflict with `action: "supersede_others"`, the winner stays active and every other member is flipped to status="superseded" with a `superseded_by` pointer to the winner. This is the normal path: "we used to think X, now we know Y, flag X as superseded so the audit trail keeps X visible but X no longer influences context".

The conflict resolution audit record links back to all superseded members, so the conflict history itself is queryable:

- "Show me every conflict that touched Subsystem X"
- "Show me every Decision that superseded another Decision because of a conflict"

These are entries in the V1 query catalog (see Q-014 decision history).

## Detection latency

Conflict detection runs at two latencies:

1. **Synchronous (at write time)** — every create/promote/update of an active row in a conflict-enabled type runs a synchronous same-layer detector. If a conflict is detected the write still succeeds but a row is inserted into `conflicts` and the API response includes a `conflict_id` field so the caller knows immediately.

2. **Asynchronous (nightly sweep)** — a scheduled job walks all three layers looking for cross-layer conflicts that slipped past write-time detection (e.g. a memory that was already active before an entity with the same slot was promoted). The sweep also looks for slot overlaps that the synchronous detector can't see because the slot key extraction rules have improved since the row was written.

Both paths write to the same `conflicts` table and both are surfaced in the same review queue.
## The "flag, never block" rule

Detection **never** blocks writes. The operating rule is:

- If the write is otherwise valid (schema, permissions, trust hierarchy), accept it
- Log the conflict
- Surface it to the reviewer
- Let the system keep functioning with the conflict in place

The alternative — blocking writes on conflict — would mean that one stale fact could prevent all future writes until manually resolved, which in practice makes the system unusable for normal work. The "flag, never block" rule keeps AtoCore responsive while still making conflicts impossible to ignore (the `/health` endpoint's `open_conflicts_count` makes them loud).

The one exception: writing to `project_state` (layer 3) when an open conflict already exists on that slot will return a warning in the response body. The write still happens, but the reviewer is explicitly told "you just wrote to a slot that has an open conflict". This is the highest-trust layer so we want extra friction there without actually blocking.

## Showing conflicts in the Human Mirror

When the Human Mirror template renders a project overview, any open conflict in that project shows as a **"⚠ disputed"** marker next to the affected field, with a link to the conflict detail. This makes conflicts visible to anyone reading the derived human-facing pages, not just to reviewers who think to check the `/conflicts` endpoint.

The Human Mirror render rules (not yet written — tracked as future `human-mirror-rules.md`) will specify exactly where and how the disputed marker appears.

## What this document does NOT solve

1. **Automatic conflict resolution.** No policy will ever automatically promote one conflict member over another. The trust hierarchy is an *alert ordering* for reviewers, not an auto-resolve rule. The human signs off on every resolution.

2. **Cross-project conflicts.** If p04 and p06 both have entities claiming conflicting things about a shared component, that is currently out of scope because the V1 slot keys all include `project_id`. Cross-project conflict detection is a future concern that needs its own slot key strategy.

3. **Temporal conflicts with partial overlap.** If a fact was true during a time window and another fact is true in a different time window, that is not a conflict — it's history. Representing time-bounded facts is deferred to a future temporal-entities doc.

4. **Probabilistic "soft" conflicts.** If two entities claim the same slot with slightly different values (e.g. "4.8 kg" vs "4.82 kg"), is that a conflict? For V1, yes — the string values are unequal so they're flagged. Tolerance-aware numeric comparisons are a V2 concern.

## TL;DR

- Conflicts = two or more active rows claiming the same slot with incompatible values
- Detection fires on every active write AND in a nightly sweep
- Conflicts are stored in a dedicated `conflicts` table with a `conflict_members` join
- Resolution is always human (promote-winner / supersede-others / dismiss-as-not-a-conflict)
- "Flag, never block" — writes always succeed, conflicts are surfaced via `/conflicts`, `/health`, per-entity responses, and the Human Mirror
- Trusted project state is the top of the trust hierarchy and is assumed correct in any cross-layer conflict until the reviewer says otherwise
205
docs/architecture/engineering-knowledge-hybrid-architecture.md
Normal file
@@ -0,0 +1,205 @@
# Engineering Knowledge Hybrid Architecture

## Purpose

This note defines how **AtoCore** can evolve into the machine foundation for a **living engineering project knowledge system** while remaining aligned with core AtoCore philosophy.

AtoCore remains:
- the trust engine
- the memory/context engine
- the retrieval/context assembly layer
- the runtime-facing augmentation layer

It does **not** become a generic wiki app or a PLM clone.

## Core Architectural Thesis

AtoCore should act as the **machine truth / context / memory substrate** for project knowledge systems.

That substrate can then support:
- engineering knowledge accumulation
- human-readable mirrors
- OpenClaw augmentation
- future engineering copilots
- project traceability across design, analysis, manufacturing, and operations

## Layer Model

### Layer 0 — Raw Artifact Layer
Examples:
- CAD exports
- FEM exports
- videos / transcripts
- screenshots
- PDFs
- source code
- spreadsheets
- reports
- test data

### Layer 1 — AtoCore Core Machine Layer
Canonical machine substrate.

Contains:
- source registry
- source chunks
- embeddings / vector retrieval
- structured memory
- trusted project state
- entity and relationship stores
- provenance and confidence metadata
- interactions / retrieval logs / context packs

### Layer 2 — Engineering Knowledge Layer
Domain-specific project model built on top of AtoCore.

Represents typed engineering objects such as:
- Project
- System
- Subsystem
- Component
- Interface
- Requirement
- Constraint
- Assumption
- Decision
- Material
- Parameter
- Equation
- Analysis Model
- Result
- Validation Claim
- Manufacturing Process
- Test
- Software Module
- Vendor
- Artifact

### Layer 3 — Human Mirror
Derived human-readable support surface.

Examples:
- project overview
- current state
- subsystem pages
- component pages
- decision log
- validation summary
- timeline
- open questions / risks

This layer is **derived** from structured state and approved synthesis. It is not canonical machine truth.

### Layer 4 — Runtime / Clients
Consumers such as:
- OpenClaw
- CLI tools
- dashboards
- future IDE integrations
- engineering copilots
- reporting systems
- Atomizer / optimization tooling

## Non-Negotiable Rule

**Human-readable pages are support artifacts. They are not the primary machine truth layer.**

Runtime trust order should remain:
1. trusted current project state
2. validated structured records
3. selected reviewed synthesis
4. retrieved source evidence
5. historical / low-confidence material

## Responsibilities

### AtoCore core owns
- memory CRUD
- trusted project state CRUD
- retrieval orchestration
- context assembly
- provenance
- confidence / status
- conflict flags
- runtime APIs

### Engineering Knowledge Layer owns
- engineering object taxonomy
- engineering relationships
- domain adapters
- project-specific interpretation logic
- design / analysis / manufacturing / operations linkage

### Human Mirror owns
- readability
- navigation
- overview pages
- subsystem summaries
- decision digests
- human inspection / audit comfort

## Update Model

New artifacts should not directly overwrite trusted state.

Recommended update flow:
1. ingest source
2. parse / chunk / register artifact
3. extract candidate objects / claims / relationships
4. compare against current trusted state
5. flag conflicts or supersessions
6. promote updates only under explicit rules
7. regenerate affected human-readable pages
8. log history and provenance
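The core of that flow (steps 4 to 6) can be sketched as a small function. Everything here is invented to show the shape, not a real AtoCore API; trusted state is a plain dict:

```python
def ingest(source_claims, trusted_state, promote):
    """source_claims: {slot: value} extracted from a new artifact (steps 1-3).
    Returns (promoted, conflicts) without ever silently overwriting state."""
    promoted, conflicts = [], []
    for slot, value in source_claims.items():         # step 4: compare
        current = trusted_state.get(slot)
        if current is not None and current != value:
            conflicts.append((slot, current, value))  # step 5: flag only
        elif current is None and promote(slot, value):
            trusted_state[slot] = value               # step 6: explicit rule
            promoted.append(slot)
    # steps 7-8 (regenerate pages, log provenance) would run here
    return promoted, conflicts

state = {"p05/lateral_support/material": "GF-PTFE"}
new = {"p05/lateral_support/material": "PEEK",  # contradicts trusted state
       "p05/pivot_pin/material": "440C"}        # new fact
promoted, conflicts = ingest(new, state, promote=lambda s, v: True)
print(promoted)   # only the new fact was promoted
print(conflicts)  # the contradiction was flagged, not merged
```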
## Integration with Existing Knowledge Base System

The existing engineering Knowledge Base project can be treated as the first major domain adapter.

Bridge targets include:
- KB-CAD component and architecture pages
- KB-FEM models / results / validation pages
- generation history
- images / transcripts / session captures

AtoCore should absorb the structured value of that system, not replace it with plain retrieval.

## Suggested First Implementation Scope

1. stabilize current AtoCore core behavior
2. define engineering ontology v1
3. add minimal entity / relationship support
4. create a Knowledge Base bridge for existing project structures
5. generate Human Mirror v1 pages:
   - overview
   - current state
   - decision log
   - subsystem summary
6. add engineering-aware context assembly for OpenClaw

## Why This Is Aligned With AtoCore Philosophy

This architecture preserves the original core ideas:
- owned memory layer
- owned context assembly
- machine-human separation
- provenance and trust clarity
- portability across runtimes
- robustness before sophistication

## Long-Range Outcome

AtoCore can become the substrate for a **knowledge twin** of an engineering project:
- structure
- intent
- rationale
- validation
- manufacturing impact
- operational behavior
- change history
- evidence traceability

That is significantly more powerful than any of:
- a generic wiki
- plain document RAG
- an assistant with only chat memory
250
docs/architecture/engineering-ontology-v1.md
Normal file
@@ -0,0 +1,250 @@
# Engineering Ontology V1

## Purpose

Define the first practical engineering ontology that can sit on top of AtoCore and represent a real engineering project as structured knowledge.

This ontology is intended to be:
- useful to machines
- inspectable by humans through derived views
- aligned with AtoCore trust / provenance rules
- expandable across mechanical, FEM, electrical, software, manufacturing, and operations

## Goal

Represent a project as a **system of objects and relationships**, not as a pile of notes.

The ontology should support queries such as:
- what is this subsystem?
- what requirements does this component satisfy?
- what result validates this claim?
- what changed recently?
- what interfaces are affected by a design change?
- what is active vs superseded?

## Object Families

### Project structure
- Project
- System
- Subsystem
- Assembly
- Component
- Interface

### Intent / design logic
- Requirement
- Constraint
- Assumption
- Decision
- Rationale
- Risk
- Issue
- Open Question
- Change Request

### Physical / technical definition
- Material
- Parameter
- Equation
- Configuration
- Geometry Artifact
- CAD Artifact
- Tolerance
- Operating Mode

### Analysis / validation
- Analysis Model
- Load Case
- Boundary Condition
- Solver Setup
- Result
- Validation Claim
- Test
- Correlation Record

### Manufacturing / delivery
- Manufacturing Process
- Vendor
- BOM Item
- Part Number
- Assembly Procedure
- Inspection Step
- Cost Driver

### Software / controls / electrical
- Software Module
- Control Function
- State Machine
- Signal
- Sensor
- Actuator
- Electrical Interface
- Firmware Artifact

### Evidence / provenance
- Source Document
- Transcript Segment
- Image / Screenshot
- Session
- Report
- External Reference
- Generated Summary

## Minimum Viable V1 Scope

Initial implementation should start with:
- Project
- Subsystem
- Component
- Requirement
- Constraint
- Decision
- Material
- Parameter
- Analysis Model
- Result
- Validation Claim
- Artifact

This is enough to represent meaningful project state without trying to model everything immediately.

## Core Relationship Types

### Structural
- `CONTAINS`
- `PART_OF`
- `INTERFACES_WITH`

### Intent / logic
- `SATISFIES`
- `CONSTRAINED_BY`
- `BASED_ON_ASSUMPTION`
- `AFFECTED_BY_DECISION`
- `SUPERSEDES`

### Validation
- `ANALYZED_BY`
- `VALIDATED_BY`
- `SUPPORTS`
- `CONFLICTS_WITH`
- `DEPENDS_ON`

### Artifact / provenance
- `DESCRIBED_BY`
- `UPDATED_BY_SESSION`
- `EVIDENCED_BY`
- `SUMMARIZED_IN`

## Example Statements

- `Subsystem:Lateral Support CONTAINS Component:Pivot Pin`
- `Component:Pivot Pin CONSTRAINED_BY Requirement:low lateral friction`
- `Subsystem:Lateral Support AFFECTED_BY_DECISION Decision:Use GF-PTFE pad`
- `Subsystem:Reference Frame ANALYZED_BY AnalysisModel:M1 static model`
- `Result:deflection case 03 SUPPORTS ValidationClaim:vertical stiffness acceptable`
- `Component:Reference Frame DESCRIBED_BY Artifact:NX assembly`
- `Component:Vertical Support UPDATED_BY_SESSION Session:gen-004`

## Shared Required Fields

Every major object should support fields equivalent to:
- `id`
- `type`
- `name`
- `project_id`
- `status`
- `confidence`
- `source_refs`
- `created_at`
- `updated_at`
- `notes` (optional)
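The shared field set maps naturally onto a base record type. A sketch; the field names mirror the list above, and the concrete types are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EngineeringObject:
    """Base record carrying the shared required fields."""
    id: str
    type: str
    name: str
    project_id: str
    status: str = "candidate"   # candidate | active | superseded | invalid | needs_review
    confidence: float = 0.0
    source_refs: List[str] = field(default_factory=list)
    created_at: str = ""
    updated_at: str = ""
    notes: str = ""             # optional free text

pin = EngineeringObject(
    id="cmp-001", type="Component", name="Pivot Pin",
    project_id="p05-interferometer", status="active",
    confidence=0.9, source_refs=["artifact:nx-assembly"],
)
print(pin.status)  # active
```

Typed objects (Component, Decision, ...) would extend this base with their own structural fields.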
## Suggested Status Lifecycle

For objects and claims:
- `candidate`
- `active`
- `superseded`
- `invalid`
- `needs_review`

## Trust Rules

1. An object may exist before it becomes trusted.
2. A generated markdown summary is not canonical truth by default.
3. If evidence conflicts, prefer:
   1. trusted current project state
   2. validated structured records
   3. reviewed derived synthesis
   4. raw evidence
   5. historical notes
4. Conflicts should be surfaced, not silently blended.

## Mapping to the Existing Knowledge Base System

### KB-CAD can map to
- System
- Subsystem
- Component
- Material
- Decision
- Constraint
- Artifact

### KB-FEM can map to
- Analysis Model
- Load Case
- Boundary Condition
- Result
- Validation Claim
- Correlation Record

### Session generations can map to
- Session
- Generated Summary
- object update history
- provenance events

## Human Mirror Possibilities

Once the ontology exists, AtoCore can generate pages such as:
- project overview
- subsystem page
- component page
- decision log
- validation summary
- requirement trace page

These should remain **derived representations** of structured state.

## Recommended V1 Deliverables

1. minimal typed object registry
2. minimal typed relationship registry
3. evidence-linking support
4. practical query support for:
   - component summary
   - subsystem current state
   - requirement coverage
   - result-to-claim mapping
   - decision history

## What Not To Do In V1

- do not model every engineering concept immediately
- do not build a giant graph with no practical queries
- do not collapse structured objects back into only markdown
- do not let generated prose outrank structured truth
- do not auto-promote trusted state too aggressively

## Summary

Ontology V1 should be:
- small enough to implement
- rich enough to be useful
- aligned with AtoCore trust philosophy
- capable of absorbing the existing engineering Knowledge Base work

The first goal is not to model everything.
The first goal is to represent enough of a real project that AtoCore can reason over structure, not just notes.
380
docs/architecture/engineering-query-catalog.md
Normal file
@@ -0,0 +1,380 @@
|
||||
# Engineering Query Catalog (V1 driving target)
|
||||
|
||||
## Purpose
|
||||
|
||||
This document is the **single most important driver** of the engineering
|
||||
layer V1 design. The ontology, the schema, the relationship types, and
|
||||
the human mirror templates should all be designed *to answer the queries
|
||||
in this catalog*. Anything in the ontology that does not serve at least
|
||||
one of these queries is overdesign for V1.
|
||||
|
||||
The rule is:
|
||||
|
||||
> If we cannot describe what question a typed object or relationship
|
||||
> lets us answer, that object or relationship is not in V1.
|
||||
|
||||
The catalog is also the **acceptance test** for the engineering layer.
|
||||
"V1 is done" means: AtoCore can answer at least the V1-required queries
|
||||
in this list against the active project set (`p04-gigabit`,
|
||||
`p05-interferometer`, `p06-polisher`).
|
||||
|
||||
## Structure of each entry
|
||||
|
||||
Each query is documented as:
|
||||
|
||||
- **id**: stable identifier (`Q-001`, `Q-002`, ...)
|
||||
- **question**: the natural-language question a human or LLM would ask
|
||||
- **example invocation**: how a client would call AtoCore to ask it
|
||||
- **expected result shape**: the structure of the answer (not real data)
|
||||
- **objects required**: which engineering objects must exist
|
||||
- **relationships required**: which relationships must exist
|
||||
- **provenance requirement**: what evidence must be linkable
|
||||
- **tier**: `v1-required` | `v1-stretch` | `v2`
|
||||
|
||||
## Tiering
|
||||
|
||||
- **v1-required** queries are the floor. The engineering layer cannot
|
||||
ship without all of them working.
|
||||
- **v1-stretch** queries should be doable with V1 objects but may need
|
||||
additional adapters.
|
||||
- **v2** queries are aspirational; they belong to a later wave of
|
||||
ontology work and are listed here only to make sure V1 does not
paint us into a corner.

## V1 minimum object set (recap)

For reference, the V1 ontology includes:

- Project, Subsystem, Component
- Requirement, Constraint, Decision
- Material, Parameter
- AnalysisModel, Result, ValidationClaim
- Artifact

And the four relationship families:

- Structural: `CONTAINS`, `PART_OF`, `INTERFACES_WITH`
- Intent: `SATISFIES`, `CONSTRAINED_BY`, `BASED_ON_ASSUMPTION`,
  `AFFECTED_BY_DECISION`, `SUPERSEDES`
- Validation: `ANALYZED_BY`, `VALIDATED_BY`, `SUPPORTS`,
  `CONFLICTS_WITH`, `DEPENDS_ON`
- Provenance: `DESCRIBED_BY`, `UPDATED_BY_SESSION`, `EVIDENCED_BY`,
  `SUMMARIZED_IN`

Every query below is annotated with which of these it depends on, so
that the V1 implementation order is unambiguous.

---

## Tier 1: Structure queries

### Q-001 — What does this subsystem contain?
- **question**: "What components and child subsystems make up
  Subsystem `<name>`?"
- **invocation**: `GET /entities/Subsystem/<id>?expand=contains`
- **expected**: `{ subsystem, contains: [{ id, type, name, status }] }`
- **objects**: Subsystem, Component
- **relationships**: `CONTAINS`
- **provenance**: each child must link back to at least one Artifact or
  source chunk via `DESCRIBED_BY` / `EVIDENCED_BY`
- **tier**: v1-required

### Q-002 — What is this component a part of?
- **question**: "Which subsystem(s) does Component `<name>` belong to?"
- **invocation**: `GET /entities/Component/<id>?expand=parents`
- **expected**: `{ component, part_of: [{ id, type, name, status }] }`
- **objects**: Component, Subsystem
- **relationships**: `PART_OF` (inverse of `CONTAINS`)
- **provenance**: same as Q-001
- **tier**: v1-required

### Q-003 — What interfaces does this subsystem have, and to what?
- **question**: "What does Subsystem `<name>` interface with, and on
  which interfaces?"
- **invocation**: `GET /entities/Subsystem/<id>/interfaces`
- **expected**: `[{ interface_id, peer: { id, type, name }, role }]`
- **objects**: Subsystem (Interface object deferred to v2)
- **relationships**: `INTERFACES_WITH`
- **tier**: v1-required (with simplified Interface = string label;
  full Interface object becomes v2)

### Q-004 — What is the system map for this project right now?
- **question**: "Give me the current structural tree of Project `<id>`."
- **invocation**: `GET /projects/<id>/system-map`
- **expected**: nested tree of `{ id, type, name, status, children: [] }`
- **objects**: Project, Subsystem, Component
- **relationships**: `CONTAINS`, `PART_OF`
- **tier**: v1-required
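
The nested-tree shape expected by Q-004 can be assembled with one recursive pass over the `CONTAINS` edges. A minimal sketch, assuming illustrative in-memory entity and edge shapes rather than the real V1 storage schema:

```python
# Sketch: assembling the Q-004 system map from flat CONTAINS edges.
# `entities` maps id -> record; `contains_edges` is (parent_id, child_id).

def build_system_map(root_id, entities, contains_edges):
    """Return a { id, type, name, status, children: [] } tree at root_id."""
    children_of = {}
    for parent_id, child_id in contains_edges:
        children_of.setdefault(parent_id, []).append(child_id)

    def node(eid):
        e = entities[eid]
        return {
            "id": eid,
            "type": e["type"],
            "name": e["name"],
            "status": e["status"],
            # sort children by name so output is deterministic
            "children": sorted(
                (node(c) for c in children_of.get(eid, [])),
                key=lambda n: n["name"],
            ),
        }

    return node(root_id)
```

Sorting children keeps the tree byte-stable, which matters later for the Mirror's determinism requirement.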

---

## Tier 2: Intent queries

### Q-005 — Which requirements does this component satisfy?
- **question**: "Which Requirements does Component `<name>` satisfy
  today?"
- **invocation**: `GET /entities/Component/<id>?expand=satisfies`
- **expected**: `[{ requirement_id, name, status, confidence }]`
- **objects**: Component, Requirement
- **relationships**: `SATISFIES`
- **provenance**: each `SATISFIES` edge must link to a Result or
  ValidationClaim that supports the satisfaction (or be flagged as
  `unverified`)
- **tier**: v1-required

### Q-006 — Which requirements are not satisfied by anything?
- **question**: "Show me orphan Requirements in Project `<id>` —
  requirements with no `SATISFIES` edge from any Component."
- **invocation**: `GET /projects/<id>/requirements?coverage=orphan`
- **expected**: `[{ requirement_id, name, status, last_updated }]`
- **objects**: Project, Requirement, Component
- **relationships**: absence of `SATISFIES`
- **tier**: v1-required (this is the killer correctness query — it's
  the engineering equivalent of "untested code")
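
The absence-of-edge pattern behind Q-006 maps directly onto a `NOT EXISTS` subquery. A minimal sqlite3 sketch; the `entities` / `relationships` table and column names are assumptions for illustration, not the real V1 schema:

```python
# Sketch: the Q-006 orphan-requirements query as SQL.
import sqlite3

ORPHAN_REQUIREMENTS_SQL = """
SELECT r.id, r.name
FROM entities r
WHERE r.type = 'Requirement'
  AND r.project_id = ?
  AND r.status = 'active'
  AND NOT EXISTS (
      SELECT 1 FROM relationships rel
      WHERE rel.rel_type = 'SATISFIES'
        AND rel.dst_id = r.id   -- edges point Component -> Requirement
  )
ORDER BY r.name
"""

def orphan_requirements(conn, project_id):
    """Active Requirements with no SATISFIES edge from any Component."""
    return conn.execute(ORPHAN_REQUIREMENTS_SQL, (project_id,)).fetchall()
```

The same `NOT EXISTS` shape also serves Q-011, with `ValidationClaim` and `SUPPORTS` substituted in.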

### Q-007 — What constrains this component?
- **question**: "What Constraints apply to Component `<name>`?"
- **invocation**: `GET /entities/Component/<id>?expand=constraints`
- **expected**: `[{ constraint_id, name, value, source_decision_id? }]`
- **objects**: Component, Constraint
- **relationships**: `CONSTRAINED_BY`
- **tier**: v1-required

### Q-008 — Which decisions affect this subsystem or component?
- **question**: "Show me every Decision that affects `<entity>`."
- **invocation**: `GET /entities/<type>/<id>?expand=decisions`
- **expected**: `[{ decision_id, name, status, made_at, supersedes? }]`
- **objects**: Decision, plus the affected entity
- **relationships**: `AFFECTED_BY_DECISION`, `SUPERSEDES`
- **tier**: v1-required

### Q-009 — Which decisions are based on assumptions that are now flagged?
- **question**: "Are any active Decisions in Project `<id>` based on an
  Assumption that has been marked invalid or needs_review?"
- **invocation**: `GET /projects/<id>/decisions?assumption_status=needs_review,invalid`
- **expected**: `[{ decision_id, assumption_id, assumption_status }]`
- **objects**: Decision, Assumption
- **relationships**: `BASED_ON_ASSUMPTION`
- **tier**: v1-required (this is the second killer correctness query —
  catches fragile design)
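
Q-009 is the join-shaped counterpart to the orphan query: walk `BASED_ON_ASSUMPTION` edges from active Decisions and filter on the assumption's flag. A sketch under the same assumed schema; folding the review state into the assumption's `status` column is a simplification for illustration:

```python
# Sketch: Q-009 as a join from Decision to flagged Assumption.
import sqlite3

FLAGGED_DECISIONS_SQL = """
SELECT d.id AS decision_id, a.id AS assumption_id, a.status
FROM entities d
JOIN relationships rel
  ON rel.src_id = d.id AND rel.rel_type = 'BASED_ON_ASSUMPTION'
JOIN entities a
  ON a.id = rel.dst_id
WHERE d.type = 'Decision'
  AND d.project_id = ?
  AND d.status = 'active'
  AND a.status IN ('needs_review', 'invalid')
ORDER BY d.id
"""

def flagged_decisions(conn, project_id):
    """Active Decisions resting on an Assumption that is no longer trusted."""
    return conn.execute(FLAGGED_DECISIONS_SQL, (project_id,)).fetchall()
```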

---

## Tier 3: Validation queries

### Q-010 — What result validates this claim?
- **question**: "Show me the Result(s) supporting ValidationClaim
  `<name>`."
- **invocation**: `GET /entities/ValidationClaim/<id>?expand=supports`
- **expected**: `[{ result_id, analysis_model_id, summary, confidence }]`
- **objects**: ValidationClaim, Result, AnalysisModel
- **relationships**: `SUPPORTS`, `ANALYZED_BY`
- **provenance**: every Result must link to its AnalysisModel and an
  Artifact via `DESCRIBED_BY`
- **tier**: v1-required

### Q-011 — Are there any active validation claims with no supporting result?
- **question**: "Which active ValidationClaims in Project `<id>` have
  no `SUPPORTS` edge from any Result?"
- **invocation**: `GET /projects/<id>/validation?coverage=unsupported`
- **expected**: `[{ claim_id, name, status, last_updated }]`
- **objects**: ValidationClaim, Result
- **relationships**: absence of `SUPPORTS`
- **tier**: v1-required (third killer correctness query — catches
  claims that are not yet evidenced)

### Q-012 — Are there conflicting results for the same claim?
- **question**: "Show me ValidationClaims where multiple Results
  disagree (one `SUPPORTS`, another `CONFLICTS_WITH`)."
- **invocation**: `GET /projects/<id>/validation?coverage=conflict`
- **expected**: `[{ claim_id, supporting_results, conflicting_results }]`
- **objects**: ValidationClaim, Result
- **relationships**: `SUPPORTS`, `CONFLICTS_WITH`
- **tier**: v1-required

---

## Tier 4: Change / time queries

### Q-013 — What changed in this project recently?
- **question**: "List entities in Project `<id>` whose `updated_at`
  is within the last `<window>`."
- **invocation**: `GET /projects/<id>/changes?since=<iso>`
- **expected**: `[{ id, type, name, status, updated_at, change_kind }]`
- **objects**: any
- **relationships**: any
- **tier**: v1-required

### Q-014 — What is the decision history for this subsystem?
- **question**: "Show me all Decisions affecting Subsystem `<id>` in
  chronological order, including superseded ones."
- **invocation**: `GET /entities/Subsystem/<id>/decision-log`
- **expected**: ordered list with supersession chain
- **objects**: Decision, Subsystem
- **relationships**: `AFFECTED_BY_DECISION`, `SUPERSEDES`
- **tier**: v1-required (this is what a human-readable decision log
  is generated from)

### Q-015 — What was the trusted state of this entity at time T?
- **question**: "Reconstruct the active fields of `<entity>` as of
  timestamp `<T>`."
- **invocation**: `GET /entities/<type>/<id>?as_of=<iso>`
- **expected**: the entity record as it would have been seen at T
- **objects**: any
- **relationships**: status lifecycle
- **tier**: v1-stretch (requires status history table — defer if
  baseline implementation runs long)

---

## Tier 5: Cross-cutting queries

### Q-016 — Which interfaces are affected by changing this component?
- **question**: "If Component `<name>` changes, which Interfaces and
  which peer subsystems are impacted?"
- **invocation**: `GET /entities/Component/<id>/impact`
- **expected**: `[{ interface_id, peer_id, peer_type, peer_name }]`
- **objects**: Component, Subsystem
- **relationships**: `PART_OF`, `INTERFACES_WITH`
- **tier**: v1-required (this is the change-impact-analysis query the
  whole engineering layer exists for)

### Q-017 — What evidence supports this fact?
- **question**: "Give me the source documents and chunks that support
  the current value of `<entity>.<field>`."
- **invocation**: `GET /entities/<type>/<id>/evidence?field=<field>`
- **expected**: `[{ source_file, chunk_id, heading_path, score }]`
- **objects**: any
- **relationships**: `EVIDENCED_BY`, `DESCRIBED_BY`
- **tier**: v1-required (without this the engineering layer cannot
  pass the AtoCore "trust + provenance" rule)

### Q-018 — What is active vs superseded for this concept?
- **question**: "Show me the current active record for `<key>` plus
  the chain of superseded versions."
- **invocation**: `GET /entities/<type>/<id>?include=superseded`
- **expected**: `{ active, superseded_chain: [...] }`
- **objects**: any
- **relationships**: `SUPERSEDES`
- **tier**: v1-required
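
Q-018's `{ active, superseded_chain }` shape comes out of two walks over the `SUPERSEDES` edges: forward to the active head, then backward to collect the chain. A sketch, assuming an illustrative `(newer_id, older_id)` edge-list format:

```python
# Sketch: walking the Q-018 supersession chain.
# SUPERSEDES points from the newer record to the one it replaces.

def supersession_chain(entity_id, supersedes_edges):
    """Return (active_id, [superseded ids, oldest first]) for entity_id's key."""
    replaces = dict(supersedes_edges)                 # new_id -> old_id
    replaced_by = {old: new for new, old in supersedes_edges}

    # walk forward to the active head
    head = entity_id
    while head in replaced_by:
        head = replaced_by[head]

    # walk backward from the head to collect the superseded chain
    chain = []
    cur = head
    while cur in replaces:
        cur = replaces[cur]
        chain.append(cur)
    chain.reverse()                                   # oldest first
    return head, chain
```

Starting from any member of the chain yields the same answer, which is what lets the endpoint accept a superseded `<id>` gracefully.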

### Q-019 — Which components depend on this material?
- **question**: "List every Component whose Material is `<material>`."
- **invocation**: `GET /entities/Material/<id>/components`
- **expected**: `[{ component_id, name, subsystem_id }]`
- **objects**: Component, Material
- **relationships**: derived from Component.material field, no edge
  needed
- **tier**: v1-required

### Q-020 — What does this project look like as a project overview?
- **question**: "Generate the human-readable Project Overview for
  Project `<id>` from current trusted state."
- **invocation**: `GET /projects/<id>/mirror/overview`
- **expected**: formatted markdown derived from active entities
- **objects**: Project, Subsystem, Component, Decision, Requirement,
  ValidationClaim
- **relationships**: structural + intent
- **tier**: v1-required (this is the Layer 3 Human Mirror entry
  point — the moment the engineering layer becomes useful to humans
  who do not want to call APIs)

---

## v1-stretch (nice to have)

### Q-021 — Which parameters drive this analysis result?
- **objects**: AnalysisModel, Parameter, Result
- **relationships**: `ANALYZED_BY`, plus a new `DRIVEN_BY` edge

### Q-022 — Which decisions cite which prior decisions?
- **objects**: Decision
- **relationships**: `BASED_ON_DECISION` (new)

### Q-023 — Cross-project comparison
- **question**: "Are any Materials shared between p04, p05, and p06,
  and are their Constraints consistent?"
- **objects**: Project, Material, Constraint

---

## v2 (deferred)

### Q-024 — Cost rollup
- requires BOM Item, Cost Driver, Vendor — out of V1 scope

### Q-025 — Manufacturing readiness
- requires Manufacturing Process, Inspection Step, Assembly Procedure
  — out of V1 scope

### Q-026 — Software / control state
- requires Software Module, State Machine, Sensor, Actuator — out
  of V1 scope

### Q-027 — Test correlation across analyses
- requires Test, Correlation Record — out of V1 scope

---

## What this catalog implies for V1 implementation order

The 19 v1-required queries above (Q-001 to Q-020, minus the
v1-stretch Q-015) tell us what to build first, in roughly this order:

1. **Structural** (Q-001 to Q-004): need Project, Subsystem, Component
   and `CONTAINS` / `PART_OF` / `INTERFACES_WITH` (with Interface as a
   simple string label, not its own entity).
2. **Intent core** (Q-005 to Q-008): need Requirement, Constraint,
   Decision and `SATISFIES` / `CONSTRAINED_BY` / `AFFECTED_BY_DECISION`.
3. **Killer correctness queries** (Q-006, Q-009, Q-011): need the
   absence-of-edge query patterns and the Assumption object.
4. **Validation** (Q-010 to Q-012): need AnalysisModel, Result,
   ValidationClaim and `SUPPORTS` / `ANALYZED_BY` / `CONFLICTS_WITH`.
5. **Change/time** (Q-013, Q-014): need a write log per entity (the
   existing `updated_at` plus a status history if Q-015 is in scope).
6. **Cross-cutting** (Q-016 to Q-019): impact analysis is mostly a
   graph traversal once the structural and intent edges exist.
7. **Provenance** (Q-017): the entity store must always link to
   chunks/artifacts via `EVIDENCED_BY` / `DESCRIBED_BY`. This is
   non-negotiable and should be enforced at insert time, not later.
8. **Human Mirror** (Q-020): the markdown generator is the *last*
   thing built, not the first. It is derived from everything above.

## What is intentionally left out of V1

- BOM, manufacturing, vendor, cost objects (entire family deferred)
- Software, control, electrical objects (entire family deferred)
- Test correlation objects (entire family deferred)
- Full Interface as its own entity (string label is enough for V1)
- Time-travel queries beyond `since=<iso>` (Q-015 is stretch)
- Multi-project rollups (Q-023 is stretch)

## Open questions this catalog raises

These are the design questions that need to be answered in the next
planning docs (memory-vs-entities, conflict-model, promotion-rules):

- **Q-006, Q-011 (orphan / unsupported queries)**: do orphans get
  flagged at insert time, computed at query time, or both?
- **Q-009 (assumption-driven decisions)**: when an Assumption flips
  to `needs_review`, are all dependent Decisions auto-flagged or do
  they only show up when this query is run?
- **Q-012 (conflicting results)**: does AtoCore *block* a conflict
  from being saved, or always save and flag? (The trust rule says
  flag, never block — but the implementation needs the explicit nod.)
- **Q-017 (evidence)**: is `EVIDENCED_BY` mandatory at insert? If yes,
  how do we backfill entities extracted from older interactions where
  the source link is fuzzy?
- **Q-020 (Project Overview mirror)**: when does it regenerate?
  On every entity write? On a schedule? On demand?

These are the questions the next architecture docs in the planning
sprint should resolve before any code is written.

## Working rule

> If a v1-required query in this catalog cannot be answered against
> at least one of `p04-gigabit`, `p05-interferometer`, or
> `p06-polisher`, the engineering layer is not done.

This catalog is the contract.

434
docs/architecture/engineering-v1-acceptance.md
Normal file
@@ -0,0 +1,434 @@
# Engineering Layer V1 Acceptance Criteria

## Why this document exists

The engineering layer planning sprint produced 7 architecture
docs. None of them on their own says "you're done with V1, ship
it". This document does. It translates the planning into
measurable, falsifiable acceptance criteria so the implementation
sprint can know unambiguously when V1 is complete.

The acceptance criteria are organized into four categories:

1. **Functional** — what the system must be able to do
2. **Quality** — how well it must do it
3. **Operational** — what running it must look like
4. **Documentation** — what must be written down

V1 is "done" only when **every criterion in this document is met
against at least one of the three active projects** (`p04-gigabit`,
`p05-interferometer`, `p06-polisher`). The choice of which
project is the test bed is up to the implementer, but the same
project must satisfy all functional criteria.

## The single-sentence definition

> AtoCore Engineering Layer V1 is done when, against one chosen
> active project, every v1-required query in
> `engineering-query-catalog.md` returns a correct result, the
> Human Mirror renders a coherent project overview, and a real
> KB-CAD or KB-FEM export round-trips through the ingest →
> review queue → active entity flow without violating any
> conflict or trust invariant.

Everything below is the operational form of that sentence.

## Category 1 — Functional acceptance

### F-1: Entity store implemented per the V1 ontology

- The 12 V1 entity types from `engineering-ontology-v1.md` exist
  in the database with the schema described there
- The 4 relationship families (Structural, Intent, Validation,
  Provenance) are implemented as edges with the relationship
  types listed in the catalog
- Every entity has the shared header fields:
  `id, type, name, project_id, status, confidence, source_refs,
  created_at, updated_at, extractor_version, canonical_home`
- The status lifecycle matches the memory layer:
  `candidate → active → superseded | invalid`
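
The shared header and lifecycle guard in F-1 can be sketched as a small dataclass plus a transition table. The field list matches the doc; the class itself and the guard function are illustrative assumptions, not the real storage model:

```python
# Sketch: F-1 shared entity header + status lifecycle enforcement.
from dataclasses import dataclass, field

# candidate → active → superseded | invalid (reject sends candidate → invalid)
ALLOWED_TRANSITIONS = {
    "candidate": {"active", "invalid"},
    "active": {"superseded", "invalid"},
    "superseded": set(),
    "invalid": set(),
}

@dataclass
class EntityHeader:
    id: str
    type: str
    name: str
    project_id: str
    status: str = "candidate"
    confidence: float = 0.0
    source_refs: list = field(default_factory=list)
    created_at: str = ""
    updated_at: str = ""
    extractor_version: str = ""
    canonical_home: str = ""

def transition(entity: EntityHeader, new_status: str) -> None:
    """Apply a lifecycle transition, rejecting anything off the lattice."""
    if new_status not in ALLOWED_TRANSITIONS[entity.status]:
        raise ValueError(f"illegal transition {entity.status} -> {new_status}")
    entity.status = new_status
```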

### F-2: All v1-required queries return correct results

For the chosen test project, every query Q-001 through Q-020
(except the v1-stretch Q-015) in `engineering-query-catalog.md` must:

- be implemented as an API endpoint with the shape specified in
  the catalog
- return the expected result shape against real data
- include the provenance chain when the catalog requires it
- handle the empty case (no matches) gracefully — empty array,
  not 500

The "killer correctness queries" — Q-006 (orphan requirements),
Q-009 (decisions on flagged assumptions), Q-011 (unsupported
validation claims) — are non-negotiable. If any of those three
returns wrong results, V1 is not done.

### F-3: Tool ingest endpoints are live

Both endpoints from `tool-handoff-boundaries.md` are implemented:

- `POST /ingest/kb-cad/export` accepts the documented JSON
  shape, validates it, and produces entity candidates
- `POST /ingest/kb-fem/export` ditto
- Both refuse exports with invalid schemas (4xx with a clear
  error)
- Both return a summary of created/dropped/failed counts
- Both never auto-promote anything; everything lands as
  `status="candidate"`
- Both carry source identifiers (exporter name, exporter version,
  source artifact id) into the candidate's provenance fields

A real KB-CAD export — even a hand-crafted one if the actual
exporter doesn't exist yet — must round-trip through the endpoint
and produce reviewable candidates for the test project.
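
The F-3 contract (validate, create candidates, stamp provenance, never promote) can be sketched as pure logic behind the endpoint. The export shape and field names here are illustrative assumptions; the real schema lives in `tool-handoff-boundaries.md`:

```python
# Sketch: the F-3 ingest contract, minus the HTTP framing.

REQUIRED_FIELDS = {"exporter", "exporter_version", "artifact_id", "entities"}

def ingest_export(payload: dict) -> dict:
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        # the endpoint would map this to a 4xx with a clear error
        raise ValueError(f"invalid export, missing fields: {sorted(missing)}")

    created, failed = [], []
    for raw in payload["entities"]:
        if not raw.get("name") or not raw.get("type"):
            failed.append(raw)
            continue
        created.append({
            **raw,
            "status": "candidate",              # never auto-promoted
            "source_refs": [{                   # provenance carried through
                "exporter": payload["exporter"],
                "exporter_version": payload["exporter_version"],
                "source_artifact_id": payload["artifact_id"],
            }],
        })
    return {"created": len(created), "failed": len(failed),
            "candidates": created}
```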

### F-4: Candidate review queue works end to end

Per `promotion-rules.md`:

- `GET /entities?status=candidate` lists the queue
- `POST /entities/{id}/promote` moves candidate → active
- `POST /entities/{id}/reject` moves candidate → invalid
- The same shapes work for memories (already shipped in Phase 9 C)
- The reviewer can edit a candidate's content via
  `PUT /entities/{id}` before promoting
- Every promote/reject is logged with timestamp and reason

### F-5: Conflict detection fires

Per `conflict-model.md`:

- The synchronous detector runs at every active write
  (create, promote, project_state set, KB import)
- A test must demonstrate that pushing a contradictory KB-CAD
  export creates a `conflicts` row with both members linked
- The reviewer can resolve the conflict via
  `POST /conflicts/{id}/resolve` with one of the supported
  actions (supersede_others, no_action, dismiss)
- Resolution updates the underlying entities according to the
  chosen action
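
The "flag, never block" behavior in F-5 can be sketched as a detector that returns conflict records without ever vetoing the write. The slot-key shape (project, type, name) is an illustrative assumption standing in for the real slot-key extraction in `conflict-model.md`:

```python
# Sketch: a synchronous slot-key conflict detector in the spirit of F-5.

def detect_conflicts(new_entity: dict, active_entities: list) -> list:
    """Return conflict records for the new write; the write itself proceeds."""
    slot = (new_entity["project_id"], new_entity["type"], new_entity["name"])
    conflicts = []
    for other in active_entities:
        other_slot = (other["project_id"], other["type"], other["name"])
        if other_slot == slot and other["value"] != new_entity["value"]:
            conflicts.append({
                "slot": slot,
                "members": [other["id"], new_entity["id"]],
            })
    return conflicts
```

Because the detector only reports, the caller always commits the write and then inserts any returned rows into `conflicts`.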

### F-6: Human Mirror renders for the test project

Per `human-mirror-rules.md`:

- `GET /mirror/{project}/overview` returns rendered markdown
- `GET /mirror/{project}/decisions` returns rendered markdown
- `GET /mirror/{project}/subsystems/{subsystem}` returns
  rendered markdown for at least one subsystem
- `POST /mirror/{project}/regenerate` triggers regeneration on
  demand
- Generated files appear under `/srv/storage/atocore/data/mirror/`
  with the "do not edit" header banner
- Disputed markers appear inline when conflicts exist
- Project-state overrides display with the `(curated)` annotation
- Output is deterministic (the same inputs produce the same
  bytes, suitable for diffing)

### F-7: Memory-to-entity graduation works for at least one type

Per `memory-vs-entities.md`:

- `POST /memory/{id}/graduate` exists
- Graduating a memory of type `adaptation` produces a Decision
  entity candidate with the memory's content as a starting point
- The original memory row stays at `status="graduated"` (a new
  status added by the engineering layer migration)
- The graduated memory has a forward pointer to the entity
  candidate's id
- Promoting the entity candidate does NOT delete the original
  memory
- The same graduation flow works for `project` → Requirement
  and `knowledge` → Fact entity types (test the path; doesn't
  have to be exhaustive)

### F-8: Provenance chain is complete

For every active entity in the test project, the following must
be true:

- It links back to at least one source via `source_refs` (which
  is one or more of: source_chunk_id, source_interaction_id,
  source_artifact_id from KB import)
- The provenance chain can be walked from the entity to the
  underlying raw text (source_chunks) or external artifact
- Q-017 (the evidence query) returns at least one row for every
  active entity

If any active entity has no provenance, it's a bug — provenance
is mandatory at write time per the promotion rules.

## Category 2 — Quality acceptance

### Q-1: All existing tests still pass

The full pre-V1 test suite (currently 160 tests) must still
pass. The V1 implementation may add new tests but cannot regress
any existing test.

### Q-2: V1 has its own test coverage

For each of F-1 through F-8 above, at least one automated test
exists that:

- exercises the happy path
- covers at least one error path
- runs in CI in under 10 seconds (no real network, no real LLM)

The full V1 test suite should be under 30 seconds total runtime
to keep the development loop fast.

### Q-3: Conflict invariants are enforced by tests

Specific tests must demonstrate:

- Two contradictory KB exports produce a conflict (not silent
  overwrite)
- A reviewer can't accidentally promote both members of an open
  conflict to active without resolving the conflict first
- The "flag, never block" rule holds — writes still succeed
  even when they create a conflict

### Q-4: Trust hierarchy is enforced by tests

Specific tests must demonstrate:

- Entity candidates can never appear in context packs
- Reinforcement only touches active memories (already covered
  by Phase 9 Commit B tests, but the same property must hold
  for entities once they exist)
- Nothing automatically writes to project_state ever
- Candidates can never satisfy Q-005 (only active entities count)

### Q-5: The Human Mirror is reproducible

A golden-file test exists for at least one Mirror page. Updating
the golden file is a normal part of template work (single
command, well-documented). The test fails if the renderer
produces different bytes for the same input, catching
non-determinism.

### Q-6: Killer correctness queries pass against real-ish data

The test bed for Q-006, Q-009, Q-011 is not synthetic. The
implementation must seed the test project with at least:

- One Requirement that has a satisfying Component (Q-006 should
  not flag it)
- One Requirement with no satisfying Component (Q-006 must flag it)
- One Decision based on an Assumption flagged as `needs_review`
  (Q-009 must flag the Decision)
- One ValidationClaim with at least one supporting Result
  (Q-011 should not flag it)
- One ValidationClaim with no supporting Result (Q-011 must flag it)

These five seed cases run as a single integration test that
exercises the killer correctness queries against actual
representative data.

## Category 3 — Operational acceptance

### O-1: Migration is safe and reversible

The V1 schema migration (adding the `entities`, `relationships`,
`conflicts`, `conflict_members` tables, plus `mirror_regeneration_failures`)
must:

- run cleanly against a production-shape database
- be implemented via the same `_apply_migrations` pattern as
  Phase 9 (additive only, idempotent, safe to run twice)
- be tested by spinning up a fresh DB AND running against a
  copy of the live Dalidou DB taken from a backup

### O-2: Backup and restore still work

The backup endpoint must include the new tables. A restore drill
on the test project must:

- successfully back up the V1 entity state via
  `POST /admin/backup`
- successfully validate the snapshot
- successfully restore from the snapshot per
  `docs/backup-restore-procedure.md`
- pass post-restore verification including a Q-001 query against
  the test project

The drill must be performed once before V1 is declared done.

### O-3: Performance bounds

These are starting bounds; tune later if real usage shows
problems:

- Single-entity write (`POST /entities/...`): under 100ms p99
  on the production Dalidou hardware
- Single Q-001 / Q-005 / Q-008 query: under 500ms p99 against
  a project with up to 1000 entities
- Mirror regeneration of one project overview: under 5 seconds
  for a project with up to 1000 entities
- Conflict detector at write time: adds no more than 50ms p99
  to a write that doesn't actually produce a conflict

These bounds are not tested by automated benchmarks in V1 (that
would be over-engineering). They are sanity-checked by the
developer running the operations against the test project.

### O-4: No new manual ops burden

V1 should not introduce any new "you have to remember to run X
every day" requirement. Specifically:

- Mirror regeneration is automatic (debounced async + daily
  refresh), no manual cron entry needed
- Conflict detection is automatic at write time, no manual sweep
  needed in V1 (the nightly sweep is V2)
- Backup retention cleanup is **still** an open follow-up from
  the operational baseline; V1 does not block on it

### O-5: No regressions in Phase 9 reflection loop

The capture, reinforcement, and extraction loop from Phase 9
A/B/C must continue to work end to end with the engineering
layer in place. Specifically:

- Memories whose types are NOT in the engineering layer
  (identity, preference, episodic) keep working exactly as
  before
- Memories whose types ARE in the engineering layer (project,
  knowledge, adaptation) can still be created by hand or by
  extraction; the deprecation rule from `memory-vs-entities.md`
  ("no new writes after V1 ships") is implemented as a
  configurable warning, not a hard block, so existing
  workflows aren't disrupted

## Category 4 — Documentation acceptance

### D-1: Per-entity-type spec docs

Each of the 12 V1 entity types has a short spec doc under
`docs/architecture/entities/` covering:

- the entity's purpose
- its required and optional fields
- its lifecycle quirks (if any beyond the standard
  candidate/active/superseded/invalid)
- which queries it appears in (cross-reference to the catalog)
- which relationship types reference it

These docs can be terse — a page each, mostly bullet lists.
Their purpose is to make the entity model legible to a future
maintainer, not to be reference manuals.

### D-2: KB-CAD and KB-FEM export schema docs

`docs/architecture/kb-cad-export-schema.md` and
`docs/architecture/kb-fem-export-schema.md` are written and
match the implemented validators.

### D-3: V1 release notes

A `docs/v1-release-notes.md` summarizes:

- What V1 added (entities, relationships, conflicts, mirror,
  ingest endpoints)
- What V1 deferred (auto-promotion, BOM/cost/manufacturing
  entities, NX direct integration, cross-project rollups)
- The migration story for existing memories (graduation flow)
- Known limitations and the V2 roadmap pointers

### D-4: master-plan-status.md and current-state.md updated

Both top-level status docs reflect V1's completion:

- Phase 6 (AtoDrive) and the engineering layer are explicitly
  marked as separate tracks
- The engineering planning sprint section is marked complete
- Phase 9 stays at "baseline complete" (V1 doesn't change Phase 9)
- The engineering layer V1 is added as its own line item

## What V1 explicitly does NOT need to do

To prevent scope creep, here is the negative list. None of the
following are V1 acceptance criteria:

- **No LLM extractor.** The Phase 9 C rule-based extractor is
  the entity extractor for V1 too, just with new rules added for
  entity types.
- **No auto-promotion of candidates.** Per `promotion-rules.md`.
- **No write-back to KB-CAD or KB-FEM.** Per
  `tool-handoff-boundaries.md`.
- **No multi-user / per-reviewer auth.** Single-user assumed.
- **No real-time UI.** API + Mirror markdown is the V1 surface.
  A web UI is V2+.
- **No cross-project rollups.** Per `human-mirror-rules.md`.
- **No time-travel queries** (Q-015 stays v1-stretch).
- **No nightly conflict sweep.** Synchronous detection only in V1.
- **No incremental Chroma snapshots.** The current full-copy
  approach in `backup-restore-procedure.md` is fine for V1.
- **No retention cleanup script.** Still an open follow-up.
- **No backup encryption.** Still an open follow-up.
- **No off-Dalidou backup target.** Still an open follow-up.

## How to use this document during implementation

When the implementation sprint begins:

1. Read this doc once, top to bottom
2. Pick the test project (probably p05-interferometer because
   the optical/structural domain has the cleanest entity model)
3. For each section, write the test or the implementation, in
   roughly the order: F-1 → F-2 → F-3 → F-4 → F-5 → F-6 → F-7 → F-8
4. Each acceptance criterion's test should be written **before
   or alongside** the implementation, not after
5. Run the full test suite at every commit
6. When every box is checked, write D-3 (release notes), update
   D-4 (status docs), and call V1 done

The implementation sprint should not touch anything outside the
scope listed here. If a desire arises to add something not in
this doc, that's a V2 conversation, not a V1 expansion.

## Anticipated friction points

These are the things I expect will be hard during implementation:

1. **The graduation flow (F-7)** is the most cross-cutting
   change because it touches the existing memory module.
   Worth doing it last so the memory module is stable for
   all the V1 entity work first.
2. **The Mirror's deterministic-output requirement (Q-5)** will
   bite if the implementer iterates over Python dicts without
   sorting. Plan to use `sorted()` literally everywhere.
3. **Conflict detection (F-5)** has subtle correctness traps:
   the slot key extraction must be stable, the dedup-of-existing-conflicts
   logic must be right, and the synchronous detector must not
   slow writes meaningfully (Q-3 / O-3 cover this, but watch).
4. **Provenance backfill** for entities that come from the
   existing memory layer via graduation (F-7) is the trickiest
   part: the original memory may not have had a strict
   `source_chunk_id`, in which case the graduated entity also
   doesn't have one. The implementation needs an "orphan
   provenance" allowance for graduated entities, with a
   warning surfaced in the Mirror.

These aren't blockers, just the parts of the V1 spec I'd
attack with extra care.
|
||||
|
||||
## TL;DR

- Engineering V1 is done when every box in this doc is checked
  against one chosen active project
- Functional: 8 criteria covering entities, queries, ingest,
  review queue, conflicts, mirror, graduation, provenance
- Quality: 6 criteria covering tests, golden files, killer
  correctness, trust enforcement
- Operational: 5 criteria covering migration safety, backup
  drill, performance bounds, no new manual ops, Phase 9 not
  regressed
- Documentation: 4 criteria covering entity specs, KB schema
  docs, release notes, top-level status updates
- Negative list: a clear set of things V1 deliberately does
  NOT need to do, to prevent scope creep
- The implementation sprint follows this doc as a checklist
384 docs/architecture/human-mirror-rules.md Normal file
@@ -0,0 +1,384 @@

# Human Mirror Rules (Layer 3 → derived markdown views)

## Why this document exists

The engineering layer V1 stores facts as typed entities and
relationships in a SQL database. That representation is excellent
for queries, conflict detection, and automated reasoning, but
it's terrible for the human reading experience. People want to
read prose, not crawl JSON.

The Human Mirror is the layer that turns the typed entity store
into human-readable markdown pages. It's strictly a derived view —
nothing in the Human Mirror is canonical; every page is regenerated
from current entity state on demand.

This document defines:

- what the Human Mirror generates
- when it regenerates
- how the human edits things they see in the Mirror
- how the canonical-vs-derived rule is enforced (so editing the
  derived markdown can't silently corrupt the entity store)

## The non-negotiable rule

> **The Human Mirror is read-only from the human's perspective.**
>
> If the human wants to change a fact they see in the Mirror, they
> change it in the canonical home (per `representation-authority.md`),
> NOT in the Mirror page. The next regeneration picks up the change.

This rule is what makes the whole derived-view approach safe. If
the human is allowed to edit Mirror pages directly, the
canonical-vs-derived split breaks and the Mirror becomes a second
source of truth that disagrees with the entity store.

The technical enforcement is that every Mirror page carries a
header banner that says "this file is generated from AtoCore
entity state, do not edit", and the file is regenerated from the
entity store on every change to its underlying entities. Manual
edits will be silently overwritten on the next regeneration.

## What the Mirror generates in V1

Three template families, each producing one or more pages per
project:

### 1. Project Overview

One page per registered project. Renders:

- Project header (id, aliases, description)
- Subsystem tree (from Q-001 / Q-004 in the query catalog)
- Active Decisions affecting this project (Q-008, ordered by date)
- Open Requirements with coverage status (Q-005, Q-006)
- Open ValidationClaims with support status (Q-010, Q-011)
- Currently flagged conflicts (from the conflict model)
- Recent changes (Q-013) — last 14 days

This is the most important Mirror page. It's the page someone
opens when they want to know "what's the state of this project
right now". It deliberately mirrors what `current-state.md` does
for AtoCore itself but generated entirely from typed state.

### 2. Decision Log

One page per project. Renders:

- All active Decisions in chronological order (newest first)
- Each Decision shows: id, what was decided, when, the affected
  Subsystem/Component, the supporting evidence (Q-014, Q-017)
- Superseded Decisions appear as collapsed "history" entries
  with a forward link to whatever superseded them
- Conflicting Decisions get a "⚠ disputed" marker

This is the human-readable form of the engineering query catalog's
Q-014 query.

### 3. Subsystem Detail

One page per Subsystem (so a few per project). Renders:

- Subsystem header
- Components contained in this subsystem (Q-001)
- Interfaces this subsystem has (Q-003)
- Constraints applying to it (Q-007)
- Decisions affecting it (Q-008)
- Validation status: which Requirements are satisfied,
  which are open (Q-005, Q-006)
- Change history within this subsystem (Q-013 scoped)

Subsystem detail pages are what someone reads when they're
working on a specific part of the system and want everything
relevant in one place.

## What the Mirror does NOT generate in V1

Intentionally excluded so the V1 implementation stays scoped:

- **Per-component detail pages.** Components are listed in
  Subsystem pages but don't get their own pages. Reduces page
  count from hundreds to dozens.
- **Per-Decision detail pages.** Decisions appear inline in
  Project Overview and Decision Log; their full text plus
  evidence chain is shown there, not on a separate page.
- **Cross-project rollup pages.** No "all projects at a glance"
  page in V1. Each project is its own report.
- **Time-series / historical pages.** The Mirror is always
  "current state". History is accessible via Decision Log and
  superseded chains, but no "what was true on date X" page exists
  in V1 (Q-015 is v1-stretch in the query catalog for the same
  reason).
- **Diff pages between two timestamps.** Same reasoning.
- **Render of the conflict queue itself.** Conflicts appear
  inline in the relevant Mirror pages with the "⚠ disputed"
  marker and a link to `/conflicts/{id}`, but there's no
  Mirror page that lists all conflicts. Use `GET /conflicts`.
- **Per-memory pages.** Memories are not engineering entities;
  they appear in context packs and the review queue, not in the
  Human Mirror.

## Where Mirror pages live

Two options were considered. The chosen V1 path is option B:

**Option A — write Mirror pages back into the source vault.**
Generate `/srv/storage/atocore/sources/vault/mirror/p05/overview.md`
so the human reads them in their normal Obsidian / markdown
viewer. **Rejected** because writing into the source vault
violates the "sources are read-only" rule from
`tool-handoff-boundaries.md` and the operating model.

**Option B (chosen) — write Mirror pages into a dedicated AtoCore
output dir, served via the API.** Generate under
`/srv/storage/atocore/data/mirror/p05/overview.md`. The human
reads them via:

- the API endpoints `GET /mirror/{project}/overview`,
  `GET /mirror/{project}/decisions`, and
  `GET /mirror/{project}/subsystems/{subsystem}` (all return
  rendered markdown as text/markdown)
- a future "Mirror viewer" in the Claude Code slash command
  `/atocore-mirror <project>` that fetches the rendered markdown
  and displays it inline
- direct file access on Dalidou for power users:
  `cat /srv/storage/atocore/data/mirror/p05/overview.md`

The dedicated dir keeps the Mirror clearly separated from the
canonical sources and makes regeneration safe (it's just a
directory wipe + write).

## When the Mirror regenerates

Three triggers, in order from cheapest to most expensive:

### 1. On explicit human request

```
POST /mirror/{project}/regenerate
```

Returns the timestamp of the regeneration and the list of files
written. This is the path the human takes when they've just
curated something into project_state and want to see the Mirror
reflect it immediately.

### 2. On entity write (debounced, async, per project)

When any entity in a project changes status (candidate → active,
active → superseded), a regeneration of that project's Mirror is
queued. The queue is debounced — multiple writes within a 30-second
window only trigger one regeneration. This keeps the Mirror
"close to current" without generating a Mirror update on every
single API call.

The implementation is a simple dict of "next regeneration time"
per project, checked by a background task. No cron, no message
queue, no Celery. Just a `dict[str, datetime]` and a thread.

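The debounce bookkeeping described above can be sketched as follows. This is a minimal illustration, not the shipped implementation; the class and method names are invented here:

```python
import threading
from datetime import datetime, timedelta, timezone

DEBOUNCE = timedelta(seconds=30)

class MirrorDebouncer:
    """Tracks the earliest time each project's Mirror may regenerate."""

    def __init__(self) -> None:
        self._due: dict[str, datetime] = {}   # project -> next regeneration time
        self._lock = threading.Lock()

    def mark_dirty(self, project: str, now: datetime) -> None:
        # Re-arm the timer: further writes inside the window push the
        # regeneration back, so a burst of writes triggers one rebuild.
        with self._lock:
            self._due[project] = now + DEBOUNCE

    def pop_due(self, now: datetime) -> list[str]:
        # Called periodically by the background thread; returns the
        # projects whose window has elapsed and forgets them.
        with self._lock:
            ready = sorted(p for p, t in self._due.items() if t <= now)
            for p in ready:
                del self._due[p]
            return ready
```

A background thread would call `pop_due(datetime.now(timezone.utc))` every few seconds and regenerate each returned project.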
### 3. On scheduled refresh (daily)

Once per day at a quiet hour, every project's Mirror regenerates
unconditionally. This catches any state drift from manual
project_state edits that bypassed the entity write hooks, and
provides a baseline guarantee that the Mirror is at most 24
hours stale.

The schedule runs from the same machinery as the future backup
retention job, so we get one cron-equivalent system to maintain
instead of two.

## What if regeneration fails

The Mirror has to be resilient. If regeneration fails for a
project (e.g. a query catalog query crashes, or a template fails
to render), the existing Mirror files are **not** deleted. The
existing files stay in place (showing the last successful state)
and a regeneration error is recorded in:

- the API response, if the trigger was explicit
- a log entry at warning level, for the async path
- a `mirror_regeneration_failures` table, for the daily refresh

This means the human can always read the Mirror, even if the
last 5 minutes of changes haven't made it in yet. Stale is
better than blank.

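One way to get this keep-the-last-good-output behavior is to render into a staging directory and only swap it in on success. A sketch under assumed names; the real module layout may differ:

```python
import shutil
import tempfile
from pathlib import Path

def regenerate_project(project: str, mirror_root: Path, render) -> bool:
    """Render all pages for one project; on any failure, leave the
    previously generated files untouched and report the failure."""
    staging = Path(tempfile.mkdtemp(prefix=f"mirror-{project}-"))
    try:
        for name, text in render(project):      # render() yields (filename, markdown)
            (staging / name).write_text(text, encoding="utf-8")
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        return False                            # caller logs / records the error
    target = mirror_root / project
    if target.exists():
        shutil.rmtree(target)                   # wipe only after rendering succeeded
    shutil.move(str(staging), str(target))
    return True
```

The key property is that the destructive step (the directory wipe) happens only after every page has rendered cleanly.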
## How the human curates "around" the Mirror

The Mirror reflects the current entity state. If the human
doesn't like what they see, the right edits go into one of:

| What you want to change | Where you change it |
|---|---|
| A Decision's text | `PUT /entities/Decision/{id}` (or `PUT /memory/{id}` if it's still memory-layer) |
| A Decision's status (active → superseded) | `POST /entities/Decision/{id}/supersede` (V1 entity API) |
| Whether a Component "satisfies" a Requirement | edit the relationship directly via the entity API (V1) |
| The current trusted next focus shown on the Project Overview | `POST /project/state` with `category=status, key=next_focus` |
| A typo in a generated heading or label | edit the **template**, not the rendered file. Templates live in `templates/mirror/` (V1 implementation) |
| Source of a fact ("this came from KB-CAD on day X") | not editable by hand — it's automatically populated from provenance |

The rule is consistent: edit the canonical home, regenerate (or
let the auto-trigger fire), see the change reflected in the
Mirror.

## Templates

The Mirror uses Jinja2-style templates checked into the repo
under `templates/mirror/`. Each template is a markdown file with
placeholders that the renderer fills from query catalog results.

Template list for V1:

- `templates/mirror/project-overview.md.j2`
- `templates/mirror/decision-log.md.j2`
- `templates/mirror/subsystem-detail.md.j2`

Editing a template is a code change, reviewed via normal git PRs.
The templates are deliberately small and readable so the human
can tweak the output format without touching renderer code.

The renderer is a thin module:

```python
# src/atocore/mirror/renderer.py (V1, not yet implemented)

def render_project_overview(project: str) -> str:
    """Generate the project overview markdown for one project."""
    facts = collect_project_overview_facts(project)
    template = load_template("project-overview.md.j2")
    return template.render(**facts)
```

## The "do not edit" header

Every generated Mirror file starts with a fixed banner:

```markdown
<!--
This file is generated by AtoCore from current entity state.
DO NOT EDIT — manual changes will be silently overwritten on
the next regeneration.
Edit the canonical home instead. See:
https://docs.atocore.../representation-authority.md
Regenerated: 2026-04-07T12:34:56Z
Source entities: <commit-like checksum of input data>
-->
```

The checksum at the end lets the renderer skip work when nothing
relevant has changed since the last regeneration. If the inputs
match the previous run's checksum, the existing file is left
untouched.

||||
## Conflicts in the Mirror
|
||||
|
||||
Per the conflict model, any open conflict on a fact that appears
|
||||
in the Mirror gets a visible disputed marker:
|
||||
|
||||
```markdown
|
||||
- Lateral support material: **GF-PTFE** ⚠ disputed
|
||||
- The KB-CAD import on 2026-04-07 reported PEEK; conflict #c-039.
|
||||
```
|
||||
|
||||
The disputed marker is a hyperlink (in renderer terms; the markdown
|
||||
output is a relative link) to the conflict detail page in the API
|
||||
or to the conflict id for direct lookup. The reviewer follows the
|
||||
link, resolves the conflict via `POST /conflicts/{id}/resolve`,
|
||||
and on the next regeneration the marker disappears.
|
||||
|
||||
## Project-state overrides in the Mirror

When a Mirror page would show a value derived from entities, but
project_state has an override on the same key, **the Mirror shows
the project_state value** with a small annotation noting the
override:

```markdown
- Next focus: **Wave 2 trusted-operational ingestion** (curated)
```

The `(curated)` annotation tells the reader "this is from the
trusted-state Layer 3, not from extracted entities". This makes
the trust hierarchy visible in the human reading experience.

## The "Mirror diff" workflow (post-V1, but designed for)

A common workflow after V1 ships will be:

1. Reviewer has curated some new entities
2. They want to see "what changed in the Mirror as a result"
3. They want to share that diff with someone else as evidence

To support this, the Mirror generator writes its output
deterministically (sorted iteration, stable timestamp formatting)
so a `git diff` between two regenerated states is meaningful.

V1 doesn't add an explicit "diff between two Mirror snapshots"
endpoint — that's deferred. But the deterministic-output
property is a V1 requirement so future diffing works without
redesigning the renderer.

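Deterministic output mostly comes down to two habits: iterate in sorted order, and format timestamps one fixed way. A minimal illustration with invented names:

```python
from datetime import datetime, timezone

def stable_ts(dt: datetime) -> str:
    # One fixed timestamp format everywhere: UTC, second precision, Z suffix.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def render_decisions(decisions: dict[str, dict]) -> str:
    # sorted() over the dict keys makes the output independent of
    # insertion order, so regenerating identical state yields an
    # identical file and `git diff` shows only real changes.
    lines = []
    for decision_id in sorted(decisions):
        d = decisions[decision_id]
        lines.append(f"- {decision_id}: {d['text']} ({stable_ts(d['when'])})")
    return "\n".join(lines) + "\n"
```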
## What the Mirror enables

With the Mirror in place:

- **OpenClaw can read project state in human form.** The
  read-only AtoCore helper skill on the T420 already calls
  `/context/build`; in V1 it gains the option to call
  `/mirror/{project}/overview` to get a fully-rendered markdown
  page instead of just retrieved chunks. This is much faster
  than crawling individual entities for general questions.
- **The human gets a daily-readable artifact.** Every morning,
  Antoine can `cat /srv/storage/atocore/data/mirror/p05/overview.md`
  and see the current state of p05 in his preferred reading
  format. No API calls, no JSON parsing.
- **Cross-collaborator sharing.** If you ever want to send
  someone a project overview without giving them AtoCore access,
  the Mirror file is a self-contained markdown document they can
  read in any markdown viewer.
- **Claude Code integration.** A future
  `/atocore-mirror <project>` slash command renders the Mirror
  inline, complementing the existing `/atocore-context` command
  with a human-readable view of "what does AtoCore think about
  this project right now".

## Open questions for V1 implementation

1. **What's the regeneration debounce window?** 30 seconds is the
   starting value but should be tuned with real usage.
2. **Does the daily refresh need a separate trigger mechanism, or
   is it just a long-period entry in the same in-process scheduler
   that handles the debounced async refreshes?** Probably the
   latter — keep it simple.
3. **How are templates tested?** Likely a small set of fixture
   project states + golden output files, with a single test that
   asserts `render(fixture) == golden`. Updating golden files is
   a normal part of template work.
4. **Are Mirror pages discoverable via a directory listing
   endpoint?** `GET /mirror/{project}` returns the list of
   available pages for that project. Probably yes; cheap to add.
5. **How does the Mirror handle a project that has zero entities
   yet?** Render an empty-state page that says "no curated facts
   yet — add some via /memory or /entities/Decision". Better than
   a blank file.

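The golden-file pattern from question 3 can be sketched like this. A generic illustration; the update flag and file names are assumptions, not the repo's actual test layout:

```python
import os
from pathlib import Path

def check_golden(rendered: str, golden_path: Path) -> None:
    """Compare rendered output against a checked-in golden file.
    Run with UPDATE_GOLDEN=1 to rewrite the golden after an
    intentional template change."""
    if os.environ.get("UPDATE_GOLDEN") == "1":
        golden_path.write_text(rendered, encoding="utf-8")
        return
    expected = golden_path.read_text(encoding="utf-8")
    assert rendered == expected, f"{golden_path.name} drifted from template output"
```

A single parametrized test over all fixtures then covers every template with one assertion shape.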
## TL;DR

- The Human Mirror generates 3 template families per project
  (Overview, Decision Log, Subsystem Detail) from current entity
  state
- It's strictly read-only from the human's perspective; edits go
  to the canonical home and the Mirror picks them up on
  regeneration
- Three regeneration triggers: explicit POST, debounced
  async-on-write, daily scheduled refresh
- Mirror files live in `/srv/storage/atocore/data/mirror/`
  (NOT in the source vault — sources stay read-only)
- Conflicts and project_state overrides are visible inline in
  the rendered markdown so the trust hierarchy shows through
- Templates are checked into the repo and edited via PR; the
  rendered files are derived and never canonical
- Deterministic output is a V1 requirement so future diffing
  works without rework
333 docs/architecture/llm-client-integration.md Normal file
@@ -0,0 +1,333 @@

# LLM Client Integration (the layering)

## Why this document exists

AtoCore must be reachable from many different LLM client contexts:

- **OpenClaw** on the T420 (already integrated via the read-only
  helper skill at `/home/papa/clawd/skills/atocore-context/`)
- **Claude Code** on the laptop (via the slash command shipped in
  this repo at `.claude/commands/atocore-context.md`)
- **Codex** sessions (future)
- **Direct API consumers** — scripts, Python code, ad-hoc curl
- **The eventual MCP server** when it's worth building

Without an explicit layering rule, every new client tends to
reimplement the same routing logic (project detection, context
build, retrieval audit, project-state inspection) in slightly
different ways. That is exactly what almost happened in the first
draft of the Claude Code slash command, which started as a curl +
jq script that duplicated capabilities the existing operator client
already had.

This document defines the layering so future clients don't repeat
that mistake.

## The layering

Three layers, top to bottom:

```
+----------------------------------------------------+
| Per-agent thin frontends                           |
|                                                    |
| - Claude Code slash command                        |
|   (.claude/commands/atocore-context.md)            |
| - OpenClaw helper skill                            |
|   (/home/papa/clawd/skills/atocore-context/)       |
| - Codex skill (future)                             |
| - MCP server (future)                              |
+----------------------------------------------------+
                          |
                          | shells out to / imports
                          v
+----------------------------------------------------+
| Shared operator client                             |
| scripts/atocore_client.py                          |
|                                                    |
| - subcommands for stable AtoCore operations        |
| - fail-open on network errors                      |
| - consistent JSON output across all subcommands    |
| - environment-driven configuration                 |
|   (ATOCORE_BASE_URL, ATOCORE_TIMEOUT_SECONDS,      |
|   ATOCORE_REFRESH_TIMEOUT_SECONDS,                 |
|   ATOCORE_FAIL_OPEN)                               |
+----------------------------------------------------+
                          |
                          | HTTP
                          v
+----------------------------------------------------+
| AtoCore HTTP API                                   |
| src/atocore/api/routes.py                          |
|                                                    |
| - the universal interface to AtoCore               |
| - everything else above is glue                    |
+----------------------------------------------------+
```

## The non-negotiable rules

These rules are what make the layering work.

### Rule 1 — every per-agent frontend is a thin wrapper

A per-agent frontend exists to do exactly two things:

1. **Translate the agent platform's command/skill format** into an
   invocation of the shared client (or a small sequence of them)
2. **Render the JSON response** into whatever shape the agent
   platform wants (markdown for Claude Code, plaintext for
   OpenClaw, MCP tool result for an MCP server, etc.)

Everything else — talking to AtoCore, project detection, retrieval
audit, fail-open behavior, configuration — is the **shared
client's** job.

If a per-agent frontend grows logic beyond the two responsibilities
above, that logic is in the wrong place. It belongs in the shared
client where every other frontend gets to use it.

### Rule 2 — the shared client never duplicates the API

The shared client is allowed to **compose** API calls (e.g.
`auto-context` calls `detect-project` then `context-build`), but
it never reimplements API logic. If a useful operation can't be
expressed via the existing API endpoints, the right fix is to
extend the API, not to embed the logic in the client.

This rule keeps the API as the single source of truth for what
AtoCore can do.

### Rule 3 — the shared client only exposes stable operations

A subcommand only makes it into the shared client when:

- the API endpoint behind it has been exercised by at least one
  real workflow
- the request and response shapes are unlikely to change
- the operation is one that more than one frontend will plausibly
  want

This rule keeps the client surface stable so frontends don't have
to chase changes. New endpoints land in the API first, get
exercised in real use, and only then get a client subcommand.

## What's in scope for the shared client today

The currently shipped scope (per `scripts/atocore_client.py`):

### Stable operations (shipped since the client was introduced)

| Subcommand | Purpose | API endpoint(s) |
|---|---|---|
| `health` | service status, mount + source readiness | `GET /health` |
| `sources` | enabled source roots and their existence | `GET /sources` |
| `stats` | document/chunk/vector counts | `GET /stats` |
| `projects` | registered projects | `GET /projects` |
| `project-template` | starter shape for a new project | `GET /projects/template` |
| `propose-project` | preview a registration | `POST /projects/proposal` |
| `register-project` | persist a registration | `POST /projects/register` |
| `update-project` | update an existing registration | `PUT /projects/{name}` |
| `refresh-project` | re-ingest a project's roots | `POST /projects/{name}/refresh` |
| `project-state` | list trusted state for a project | `GET /project/state/{name}` |
| `project-state-set` | curate trusted state | `POST /project/state` |
| `project-state-invalidate` | supersede trusted state | `DELETE /project/state` |
| `query` | raw retrieval | `POST /query` |
| `context-build` | full context pack | `POST /context/build` |
| `auto-context` | detect-project then context-build | composes `/projects` + `/context/build` |
| `detect-project` | match a prompt to a registered project | composes `/projects` + local regex |
| `audit-query` | retrieval-quality audit with classification | composes `/query` + local labelling |
| `debug-context` | last context pack inspection | `GET /debug/context` |
| `ingest-sources` | ingest configured source dirs | `POST /ingest/sources` |

### Phase 9 reflection loop (shipped after migration safety work)

These were explicitly deferred in earlier versions of this doc
pending "exercised workflow". The constraint was real — premature
API freeze would have made it harder to iterate on the ergonomics —
but the deferral ran into a bootstrap problem: you can't exercise
the workflow in real Claude Code sessions without a usable client
surface to drive it from. The fix is to ship a minimal Phase 9
surface now and treat it as stable-but-refinable: adding new
optional parameters is fine, renaming subcommands is not.

| Subcommand | Purpose | API endpoint(s) |
|---|---|---|
| `capture` | record one interaction round-trip | `POST /interactions` |
| `extract` | run the rule-based extractor (preview or persist) | `POST /interactions/{id}/extract` |
| `reinforce-interaction` | backfill reinforcement on an existing interaction | `POST /interactions/{id}/reinforce` |
| `list-interactions` | paginated list with filters | `GET /interactions` |
| `get-interaction` | fetch one interaction by id | `GET /interactions/{id}` |
| `queue` | list the candidate review queue | `GET /memory?status=candidate` |
| `promote` | move a candidate memory to active | `POST /memory/{id}/promote` |
| `reject` | mark a candidate memory invalid | `POST /memory/{id}/reject` |

All 8 Phase 9 subcommands have test coverage in
`tests/test_atocore_client.py` via mocked `request()`, including
an end-to-end test that drives the full capture → extract → queue
→ promote/reject cycle through the client.

### Coverage summary

That covers everything in the "stable operations" set AND the
full Phase 9 reflection loop: project lifecycle, ingestion,
project-state curation, retrieval, context build,
retrieval-quality audit, health and stats inspection, interaction
capture, candidate extraction, candidate review queue.

## What's intentionally NOT in scope today

Two families of operations remain deferred:

### 1. Backup and restore admin operations

Phase 9 Commit B shipped these endpoints:

- `POST /admin/backup` (with `include_chroma`)
- `GET /admin/backup` (list)
- `GET /admin/backup/{stamp}/validate`

The backup endpoints are stable, but the documented operational
procedure (`docs/backup-restore-procedure.md`) intentionally uses
direct curl rather than the shared client. The reason is that
backup operations are *administrative* and benefit from being
explicit about which instance they're targeting, with no
fail-open behavior. The shared client's fail-open default would
hide a real backup failure.

If we later decide to add backup commands to the shared client,
they would set `ATOCORE_FAIL_OPEN=false` for the duration of the
call so the operator gets a real error on failure rather than a
silent fail-open envelope.

### 2. Engineering layer entity operations

The engineering layer is in planning, not implementation. When
V1 ships per `engineering-v1-acceptance.md`, the shared client
will gain entity, relationship, conflict, and Mirror commands.
None of those exist as stable contracts yet, so they are not in
the shared client today.

## How a new agent platform integrates

When a new LLM client needs AtoCore (e.g. Codex, ChatGPT custom
GPT, a Cursor extension), the integration recipe is:

1. **Don't reimplement.** Don't write a new HTTP client. Use the
   shared client.
2. **Write a thin frontend** that translates the platform's
   command/skill format into a shell call to
   `python scripts/atocore_client.py <subcommand> <args...>`.
3. **Render the JSON response** in the platform's preferred shape.
4. **Inherit fail-open and env-var behavior** from the shared
   client. Don't override unless the platform explicitly needs
   to (e.g. an admin tool that wants to see real errors).
5. **If a needed capability is missing**, propose adding it to
   the shared client. If the underlying API endpoint also
   doesn't exist, propose adding it to the API first. Don't
   add the logic to your frontend.

The Claude Code slash command in this repo is a worked example:
~50 lines of markdown that does argument parsing, calls the
shared client, and renders the result. It contains zero AtoCore
business logic of its own.

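Step 3 of the recipe, rendering the client's JSON into the platform's shape, is where a thin frontend earns its keep. A minimal sketch; the envelope fields here are assumptions about the client's output, not a documented contract:

```python
def render_context_pack(envelope: dict) -> str:
    """Turn a context-pack JSON envelope into the markdown a chat
    platform would display. Degrades gracefully on a fail-open reply."""
    if not envelope.get("ok", True):
        return "_AtoCore unreachable; continuing without context._"
    lines = [f"**Project:** {envelope.get('project', 'unknown')}", ""]
    for chunk in envelope.get("chunks", []):
        lines.append(f"- `{chunk['source']}`: {chunk['text']}")
    return "\n".join(lines)
```

Note there is no HTTP and no routing here: the frontend's whole job is formatting.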
## How OpenClaw fits

OpenClaw's helper skill at `/home/papa/clawd/skills/atocore-context/`
on the T420 currently has its own implementation of `auto-context`,
`detect-project`, and the project lifecycle commands. It predates
this layering doc.

The right long-term shape is to **refactor the OpenClaw helper to
shell out to the shared client** instead of duplicating the
routing logic. This isn't urgent because:

- OpenClaw's helper works today and is in active use
- The duplication is on the OpenClaw side; AtoCore itself is not
  affected
- The shared client and the OpenClaw helper are in different
  repos (AtoCore vs OpenClaw clawd), so the refactor is a
  cross-repo coordination

The refactor is queued as a follow-up. Until then, **the OpenClaw
helper and the Claude Code slash command are parallel
implementations** of the same idea. The shared client is the
canonical backbone going forward; new clients should follow the
new pattern even though the existing OpenClaw helper still has
its own.

## How this connects to the master plan
|
||||
|
||||
| Layer | Phase home | Status |
|
||||
|---|---|---|
|
||||
| AtoCore HTTP API | Phases 0/0.5/1/2/3/5/7/9 | shipped |
|
||||
| Shared operator client (`scripts/atocore_client.py`) | implicitly Phase 8 (OpenClaw integration) infrastructure | shipped via codex/port-atocore-ops-client merge |
|
||||
| OpenClaw helper skill (T420) | Phase 8 — partial | shipped (own implementation, refactor queued) |
|
||||
| Claude Code slash command (this repo) | precursor to Phase 11 (multi-model) | shipped (refactored to use the shared client) |
|
||||
| Codex skill | Phase 11 | future |
|
||||
| MCP server | Phase 11 | future |
|
||||
| Web UI / dashboard | Phase 11+ | future |
|
||||
|
||||
The shared client is the **substrate Phase 11 will build on**.
|
||||
Every new client added in Phase 11 should be a thin frontend on
|
||||
the shared client, not a fresh reimplementation.
|
||||
|
||||
## Versioning and stability
|
||||
|
||||
The shared client's subcommand surface is **stable**. Adding new
|
||||
subcommands is non-breaking. Changing or removing existing
|
||||
subcommands is breaking and would require a coordinated update
|
||||
of every frontend that depends on them.
|
||||
|
||||
The current shared client has no explicit version constant; the
|
||||
implicit contract is "the subcommands and JSON shapes documented
|
||||
in this file". When the client surface meaningfully changes,
|
||||
add a `CLIENT_VERSION = "x.y.z"` constant to
|
||||
`scripts/atocore_client.py` and bump it per semver:
|
||||
|
||||
- patch: bug fixes, no surface change
|
||||
- minor: new subcommands or new optional fields
|
||||
- major: removed subcommands, renamed fields, changed defaults
|
||||
|
||||
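A frontend-side check of those rules, as a sketch. The `CLIENT_VERSION` constant is the one this section proposes; the exact policy (same major, and not older than the version the frontend was written against) is an assumption, not something this doc mandates:

```python
CLIENT_VERSION = "0.2.0"  # today's baseline per the follow-ups below

def parse_version(v):
    # "x.y.z" -> (x, y, z)
    return tuple(int(part) for part in v.split("."))

def is_compatible(required, actual=CLIENT_VERSION):
    # Same major means no removed/renamed surface; the running client
    # must also not be older than what the frontend targets.
    req, act = parse_version(required), parse_version(actual)
    return act[0] == req[0] and act >= req
```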
## Open follow-ups

1. **Refactor the OpenClaw helper** to shell out to the shared
   client. Cross-repo coordination, not blocking anything in
   AtoCore itself. With the Phase 9 subcommands now in the shared
   client, the OpenClaw refactor can reuse all the reflection-loop
   work instead of duplicating it.
2. **Real-usage validation of the Phase 9 loop**, now that the
   client surface exists. First capture → extract → review cycle
   against the live Dalidou instance, likely via the Claude Code
   slash command flow. Findings feed back into subcommand
   refinement (new optional flags are fine, renames require a
   semver bump).
3. **Add backup admin subcommands** if and when we decide the
   shared client should be the canonical backup operator
   interface (with fail-open disabled for admin commands).
4. **Add engineering-layer entity subcommands** as part of the
   engineering V1 implementation sprint, per
   `engineering-v1-acceptance.md`.
5. **Tag a `CLIENT_VERSION` constant** the next time the shared
   client surface meaningfully changes. Today's surface with the
   Phase 9 loop added is the v0.2.0 baseline (v0.1.0 was the
   stable-ops-only version).

## TL;DR

- AtoCore HTTP API is the universal interface
- `scripts/atocore_client.py` is the canonical shared Python
  backbone for stable AtoCore operations
- Per-agent frontends (Claude Code slash command, OpenClaw
  helper, future Codex skill, future MCP server) are thin
  wrappers that shell out to the shared client
- The shared client today covers project lifecycle, ingestion,
  retrieval, context build, project-state, retrieval audit, AND
  the full Phase 9 reflection loop (capture / extract /
  reinforce / list / queue / promote / reject)
- Backup admin and engineering-entity commands remain deferred
- The OpenClaw helper is currently a parallel implementation and
  the refactor to the shared client is a queued follow-up
- New LLM clients should never reimplement HTTP calls — they
  follow the shell-out pattern documented here
309
docs/architecture/memory-vs-entities.md
Normal file
@@ -0,0 +1,309 @@
# Memory vs Entities (Engineering Layer V1 boundary)

## Why this document exists

The engineering layer introduces a new representation — typed
entities with explicit relationships — alongside AtoCore's existing
memory system and its six memory types. The question that blocks
every other engineering-layer planning doc is:

> When we extract a fact from an interaction or a document, does it
> become a memory, an entity, or both? And if both, which one is
> canonical?

Without an answer, the rest of the engineering layer cannot be
designed. This document is the answer.

## The short version

- **Memories stay.** They are still the canonical home for
  *unstructured, attributed, personal, natural-language* facts.
- **Entities are new.** They are the canonical home for *structured,
  typed, relational, engineering-domain* facts.
- **No concept lives in both at full fidelity.** Every concept has
  exactly one canonical home. The other layer may hold a pointer or
  a rendered view, never a second source of truth.
- **The two layers share one review queue.** Candidates from
  extraction flow into the same `status=candidate` lifecycle
  regardless of whether they are memory-bound or entity-bound.
- **Memories can "graduate" into entities** when enough structure has
  accumulated, but the upgrade is an explicit, logged promotion, not
  a silent rewrite.

## The split per memory type

The six memory types from the current Phase 2 implementation each
map to exactly one outcome in V1:

| Memory type | V1 destination | Rationale |
|---|---|---|
| identity | **memory only** | Always about the human user. No engineering domain structure. Never gets entity-shaped. |
| preference | **memory only** | Always about the human user's working style. Same reasoning. |
| episodic | **memory only** | "What happened in this conversation / this day." Attribution and time are the point, not typed structure. |
| knowledge | **entity when possible**, memory otherwise | If the knowledge maps to a typed engineering object (material property, constant, tolerance), it becomes a Fact entity with provenance. If it's loose general knowledge, stays a memory. |
| project | **entity** | Anything that belonged in the "project" memory type is really a Requirement, Constraint, Decision, Subsystem attribute, etc. It belongs in the engineering layer once entities exist. |
| adaptation | **entity (Decision)** | "We decided to X" is literally a Decision entity in the ontology. This is the clearest migration. |

**Practical consequence:** when the engineering layer V1 ships, the
`project`, `knowledge`, and `adaptation` memory types are deprecated
as a canonical home for new facts. Existing rows are not deleted —
they are backfilled as entities through the promotion-rules flow
(see `promotion-rules.md`), and the old memory rows become frozen
references pointing at their graduated entity.

The `identity`, `preference`, and `episodic` memory types continue
to exist exactly as they do today and do not interact with the
engineering layer at all.

## What "canonical home" actually means

A concept's canonical home is the single place where:

- its *current active value* is stored
- its *status lifecycle* is managed (active/superseded/invalid)
- its *confidence* is tracked
- its *provenance chain* is rooted
- edits, supersessions, and invalidations are applied
- conflict resolution is arbitrated

Everything else is a derived view of that canonical row.

If a `Decision` entity is the canonical home for "we switched to
GF-PTFE pads", then:

- there is no `adaptation` memory row with the same content; the
  extractor creates a `Decision` candidate directly
- the context builder, when asked to include relevant state, reaches
  into the entity store via the engineering layer, not the memory
  store
- if the user wants to see "recent decisions" they hit the entity
  API, never the memory API
- if they want to invalidate the decision, they do so via the entity
  API

The memory API remains the canonical home for `identity`,
`preference`, and `episodic` — same rules, just a different set of
types.

## Why not a unified table with a `kind` column?

It would be simpler to implement. It is rejected for three reasons:

1. **Different query shapes.** Memories are queried by type, project,
   confidence, recency. Entities are queried by type, relationships,
   graph traversal, coverage gaps ("orphan requirements"). Cramming
   both into one table forces the schema to be the union of both
   worlds and makes each query slower.

2. **Different lifecycles.** Memories have a simple four-state
   lifecycle (candidate/active/superseded/invalid). Entities have
   the same four states *plus* per-relationship supersession,
   per-field versioning for the killer correctness queries, and
   structured conflict flagging. The unified table would have to
   carry all entity apparatus for every memory row.

3. **Different provenance semantics.** A preference memory is
   provenanced by "the user told me" — one author, one time.
   An entity like a `Requirement` is provenanced by "this source
   chunk + this source document + these supporting Results" — a
   graph. The tables want to be different because their provenance
   models are different.

So: two tables, one review queue, one promotion flow, one trust
hierarchy.

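A minimal illustrative pair of schemas, plus the single candidate queue over both. The column sets here are invented for illustration; the real schemas are not defined by this doc:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories (              -- flat rows, single-author provenance
    id INTEGER PRIMARY KEY,
    type TEXT, project TEXT, content TEXT, confidence REAL,
    status TEXT DEFAULT 'candidate'  -- candidate/active/superseded/invalid
);
CREATE TABLE entities (              -- typed rows, graph provenance
    id INTEGER PRIMARY KEY,
    type TEXT, project TEXT, payload TEXT, confidence REAL,
    status TEXT DEFAULT 'candidate'  -- same four states, plus entity apparatus
);
CREATE TABLE relationships (         -- graph traversal lives here, not on memories
    src INTEGER REFERENCES entities(id),
    dst INTEGER REFERENCES entities(id),
    kind TEXT
);
""")
conn.execute("INSERT INTO memories (type, project, content, confidence) "
             "VALUES ('episodic', 'p05', 'ran an alignment pass today', 0.6)")
conn.execute("INSERT INTO entities (type, project, payload, confidence) "
             "VALUES ('Decision', 'p05', 'switch to GF-PTFE pads', 0.8)")

# One shared review queue: a single view over both candidate sets.
queue = conn.execute(
    "SELECT 'memory' AS kind, id FROM memories WHERE status = 'candidate' "
    "UNION ALL SELECT 'entity', id FROM entities WHERE status = 'candidate'"
).fetchall()
```

The queue query is the point: the reviewer tooling unions the two candidate sets without the tables sharing a schema.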
## The shared review queue

Both the memory extractor (Phase 9 Commit C, already shipped) and
the future entity extractor write into the same conceptual queue:
everything lands at `status=candidate` in its own table, and the
human reviewer sees a unified list. The reviewer UI (future work)
shows candidates of all kinds side by side, grouped by source
interaction / source document, with the rule that fired.

From the data side this means:

- the memories table gets a `candidate` status (**already done in
  Phase 9 Commit B/C**)
- the future entities table will get the same `candidate` status
- both tables get the same `promote` / `reject` API shape: one verb
  per candidate, with an audit log entry

Implementation note: the API routes should evolve from
`POST /memory/{id}/promote` to `POST /candidates/{id}/promote` once
both tables exist, so the reviewer tooling can treat them
uniformly. The current memory-only route stays in place for
backward compatibility and is aliased by the unified route.

## Memory-to-entity graduation

Even though the split is clean on paper, real usage will reveal
memories that deserve to be entities but started as plain text.
Four signals are good candidates for proposing graduation:

1. **Reference count crosses a threshold.** A memory that has been
   reinforced 5+ times across multiple interactions is a strong
   signal that it deserves structure.

2. **Memory content matches a known entity template.** If a
   `knowledge` memory's content matches the shape "X = value [unit]"
   it can be proposed as a `Fact` or `Parameter` entity.

3. **A user explicitly asks for promotion.** `POST /memory/{id}/graduate`
   is the simplest explicit path — it returns a proposal for an
   entity structured from the memory's content, which the user can
   accept or reject.

4. **Extraction pass proposes an entity that happens to match an
   existing memory.** The entity extractor, when scanning a new
   interaction, sees the same content already exists as a memory
   and proposes graduation as part of its candidate output.

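Signal 2 can be approximated with a small template matcher. This is a sketch; the real extractor's rules are not specified here, and the proposed field names are illustrative:

```python
import re

# "X = value [unit]" -> candidate Parameter fields (unit is optional)
TEMPLATE = re.compile(
    r"^(?P<name>[A-Za-z][\w .-]*?)\s*=\s*"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*"
    r"(?:\[(?P<unit>[^\]]+)\])?$"
)

def propose_parameter(content):
    m = TEMPLATE.match(content.strip())
    if not m:
        return None  # loose general knowledge: stays a memory
    return {"type": "Parameter", **m.groupdict()}
```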
The graduation flow is:

```
memory row (active, confidence C)
    |
    | propose_graduation()
    v
entity candidate row (candidate, confidence C)
    +
memory row gets status="graduated" and a forward pointer to the
entity candidate
    |
    | human promotes the candidate entity
    v
entity row (active)
    +
memory row stays "graduated" permanently (historical record)
```

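The flow above as a code sketch. The function, store, and field names are hypothetical; the statuses and the forward pointer come from the diagram:

```python
class ListStore:
    # Minimal in-memory stand-in for the future entities table.
    def __init__(self):
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        return len(self.rows)  # 1-based candidate id

def propose_graduation(memory, entity_store, entity_type="Decision"):
    # Create the entity candidate, carrying confidence C over unchanged.
    candidate = {
        "type": entity_type,
        "payload": memory["content"],
        "confidence": memory["confidence"],
        "status": "candidate",
    }
    entity_id = entity_store.add(candidate)
    # The memory is never deleted: it freezes with a forward pointer.
    memory["status"] = "graduated"
    memory["graduated_to"] = entity_id
    return candidate
```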
The memory is never deleted. It becomes a frozen historical
pointer to the entity it became. This keeps the audit trail intact
and lets the Human Mirror show "this decision started life as a
memory on April 2, was graduated to an entity on April 15, now has
2 supporting ValidationClaims".

The `graduated` status is a new memory status that gets added when
the graduation flow is implemented. For now (Phase 9), only the
three non-graduating types (identity/preference/episodic) would
ever avoid it, and the three graduating types stay in their current
memory-only state until the engineering layer ships.

## Context pack assembly after the split

The context builder today (`src/atocore/context/builder.py`) pulls:

1. Trusted Project State
2. Identity + Preference memories
3. Retrieved chunks

After the split, it pulls:

1. Trusted Project State (unchanged)
2. **Identity + Preference memories** (unchanged — these stay memories)
3. **Engineering-layer facts relevant to the prompt**, queried through
   the entity API (new)
4. Retrieved chunks (unchanged, lowest trust)

Note the ordering: identity/preference memories stay above entities,
because personal style information is always more trusted than
extracted engineering facts. Entities sit below the personal layer
but above raw retrieval, because they have structured provenance
that raw chunks lack.

The budget allocation gains a new slot:

- trusted project state: 20% (unchanged, highest trust)
- identity memories: 5% (unchanged)
- preference memories: 5% (unchanged)
- **engineering entities: 15%** (new — pulls only V1-required
  objects relevant to the prompt)
- retrieval: 55% (reduced from 70% to make room)

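As arithmetic on a concrete token budget (the shares are the starting numbers listed above; the allocator itself is a sketch, not the real builder):

```python
BUDGET_SHARES = {
    "trusted_project_state": 0.20,
    "identity_memories": 0.05,
    "preference_memories": 0.05,
    "engineering_entities": 0.15,  # new slot
    "retrieval": 0.55,             # reduced from 0.70 to make room
}

def allocate(total_tokens):
    # Integer token budgets per slot; any rounding remainder goes to
    # retrieval, the lowest-trust (and most compressible) slot.
    alloc = {slot: round(total_tokens * share)
             for slot, share in BUDGET_SHARES.items()}
    alloc["retrieval"] += total_tokens - sum(alloc.values())
    return alloc
```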
These are starting numbers. After the engineering layer ships and
real usage tunes retrieval quality, these will be revisited.

## What the shipped memory types still mean after the split

| Memory type | Still accepts new writes? | V1 destination for new extractions |
|---|---|---|
| identity | **yes** | memory (no change) |
| preference | **yes** | memory (no change) |
| episodic | **yes** | memory (no change) |
| knowledge | yes, but only for loose facts | entity (Fact / Parameter) for structured things; memory is a fallback |
| project | **no new writes after engineering V1 ships** | entity (Requirement / Constraint / Subsystem attribute) |
| adaptation | **no new writes after engineering V1 ships** | entity (Decision) |

"No new writes" means the `create_memory` path will refuse to
create new `project` or `adaptation` memories once the engineering
layer V1 ships. Existing rows stay queryable and reinforceable but
new facts of those kinds must become entities. This keeps the
canonical-home rule clean going forward.

The deprecation is deferred: it does not happen until the engineering
layer V1 is demonstrably working against the active project set. Until
then, the existing memory types continue to accept writes so the
Phase 9 loop can be exercised without waiting on the engineering
layer.

## Consequences for Phase 9 (what we just built)

The capture loop, reinforcement, and extractor we shipped today
are *memory-facing*. They produce memory candidates, reinforce
memory confidence, and respect the memory status lifecycle. None
of that changes.

When the engineering layer V1 ships, the extractor in
`src/atocore/memory/extractor.py` gets a sibling in
`src/atocore/entities/extractor.py` that uses the same
interaction-scanning approach but produces entity candidates
instead. The `POST /interactions/{id}/extract` endpoint either:

- runs both extractors and returns a combined result, or
- gains a `?target=memory|entities|both` query parameter

and the decision between those two shapes can wait until the
entity extractor actually exists.

Until the entity layer is real, the memory extractor also has to
cover some things that will eventually move to entities (decisions,
constraints, requirements). **That overlap is temporary and
intentional.** Rather than leave those cues unextracted for months
while the entity layer is being built, the memory extractor
surfaces them as memory candidates. Later, a migration pass will
propose graduation on every active memory created by
`decision_heading`, `constraint_heading`, and `requirement_heading`
rules once the entity types exist to receive them.

So: **no rework in Phase 9, no wasted extraction, clean handoff
once the entity layer lands**.

## Open questions this document does NOT answer

These are deliberately deferred to later planning docs:

1. **When exactly does extraction fire?** (answered by
   `promotion-rules.md`)
2. **How are conflicts between a memory and an entity handled
   during graduation?** (answered by `conflict-model.md`)
3. **Does the context builder traverse the entity graph for
   relationship-rich queries, or does it only surface direct facts?**
   (answered by the context-builder spec in a future
   `engineering-context-integration.md` doc)
4. **What is the exact API shape of the unified candidate review
   queue?** (answered by a future `review-queue-api.md` doc when
   the entity extractor exists and both tables need one UI)

## TL;DR

- memories = user-facing unstructured facts, still own identity/preference/episodic
- entities = engineering-facing typed facts, own project/knowledge/adaptation
- one canonical home per concept, never both
- one shared candidate-review queue, same promote/reject shape
- graduated memories stay as frozen historical pointers
- Phase 9 stays memory-only and ships today; entity V1 follows the
  remaining architecture docs in this planning sprint
- no rework required when the entity layer lands; the current memory
  extractor's structural cues get migrated forward via explicit
  graduation
462
docs/architecture/project-identity-canonicalization.md
Normal file
@@ -0,0 +1,462 @@
# Project Identity Canonicalization

## Why this document exists

AtoCore identifies projects by name in many places: trusted state
rows, memories, captured interactions, query/context API parameters,
extractor candidates, future engineering entities. Without an
explicit rule, every callsite would have to remember to canonicalize
project names through the registry — and the recent codex review
caught exactly the bug class that follows when one of them forgets.

The fix landed in `fb6298a` and works correctly today. This document
exists to make the rule **explicit and discoverable** so the
engineering layer V1 implementation, future entity write paths, and
any new agent integration don't reintroduce the same fragmentation
when nobody is looking.

## The contract

> **Every read/write that takes a project name MUST canonicalize it
> through `resolve_project_name()` before the value crosses a service
> boundary.**

The boundary is wherever a project name becomes a database row, a
query filter, an attribute on a stored object, or a key for any
lookup. The canonicalization happens **once**, at that boundary,
before the underlying storage primitive is called.

Symbolically:

```
HTTP layer (raw user input)
    ↓
service entry point
    ↓
project_name = resolve_project_name(project_name)   ← canonical from this point on
    ↓
storage / queries / further service calls
```

The rule is intentionally simple. There's no per-call exception,
no "trust me, the caller already canonicalized it" shortcut, no
opt-out flag. Every service-layer entry point applies the helper
the moment it receives a project name from outside the service.

## The helper

```python
# src/atocore/projects/registry.py

def resolve_project_name(name: str | None) -> str:
    """Canonicalize a project name through the registry.

    Returns the canonical project_id if the input matches any
    registered project's id or alias. Returns the input unchanged
    when it's empty or not in the registry — the second case keeps
    backwards compatibility with hand-curated state, memories, and
    interactions that predate the registry, or for projects that
    are intentionally not registered.
    """
    if not name:
        return name or ""
    project = get_registered_project(name)
    if project is not None:
        return project.project_id
    return name
```

Three behaviors worth keeping in mind:

1. **Empty / None input → empty string output.** Callers don't have
   to pre-check; passing `""` or `None` to a query filter still
   works as "no project scope".
2. **Registered alias → canonical project_id.** The helper does the
   case-insensitive lookup and returns the project's `id` field
   (e.g. `"p05" → "p05-interferometer"`).
3. **Unregistered name → input unchanged.** This is the
   backwards-compatibility path. Hand-curated state, memories, or
   interactions created under a name that isn't in the registry
   keep working. The retrieval is then "best effort" — the raw
   string is used as the SQL key, which still finds the row that
   was stored under the same raw string. This path exists so the
   engineering layer V1 doesn't have to also be a data migration.

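The three behaviors, demonstrated against a stubbed registry. The stub stands in for the real `get_registered_project`; the alias set reuses the `p05` example from behavior 2:

```python
from types import SimpleNamespace

# Stub registry: one project, with case-insensitive alias lookup.
_REGISTRY = {"p05-interferometer": {"p05", "p05-interferometer", "interferometer"}}

def get_registered_project(name):
    needle = name.lower()
    for project_id, aliases in _REGISTRY.items():
        if needle in aliases:
            return SimpleNamespace(project_id=project_id)
    return None

def resolve_project_name(name):
    # Same logic as the helper above, run against the stub registry.
    if not name:
        return name or ""
    project = get_registered_project(name)
    if project is not None:
        return project.project_id
    return name
```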
## Where the helper is currently called

As of `fb6298a`, the helper is invoked at exactly these eight
service-layer entry points:

| Module | Function | What gets canonicalized |
|---|---|---|
| `src/atocore/context/builder.py` | `build_context` | the `project_hint` parameter, before the trusted state lookup |
| `src/atocore/context/project_state.py` | `set_state` | `project_name`, before `ensure_project()` |
| `src/atocore/context/project_state.py` | `get_state` | `project_name`, before the SQL lookup |
| `src/atocore/context/project_state.py` | `invalidate_state` | `project_name`, before the SQL lookup |
| `src/atocore/interactions/service.py` | `record_interaction` | `project`, before insert |
| `src/atocore/interactions/service.py` | `list_interactions` | `project` filter parameter, before WHERE clause |
| `src/atocore/memory/service.py` | `create_memory` | `project`, before insert |
| `src/atocore/memory/service.py` | `get_memories` | `project` filter parameter, before WHERE clause |

Every one of those is the **first** thing the function does after
input validation. There is no path through any of those eight
functions where a project name reaches storage without passing
through `resolve_project_name`.

## Where the helper is NOT called (and why that's correct)

These places intentionally do not canonicalize:

1. **`update_memory`'s project field.** The API does not allow
   changing a memory's project after creation, so there's no
   project to canonicalize. The function only updates `content`,
   `confidence`, and `status`.
2. **The retriever's `_project_match_boost` substring matcher.** It
   already calls `get_registered_project` internally to expand the
   hint into the candidate set (canonical id + all aliases + last
   path segments). It accepts the raw hint by design.
3. **`_rank_chunks`'s secondary substring boost in
   `builder.py`.** Still uses the raw hint. This is a multiplicative
   factor on top of correct retrieval, not a filter, so it cannot
   drop relevant chunks. Tracked as a future cleanup but not
   critical.
4. **Direct SQL queries for the projects table itself** (e.g.
   `ensure_project`'s lookup). These are intentional case-insensitive
   raw lookups against the column the canonical id is stored in.
   `set_state` already canonicalized before reaching `ensure_project`,
   so the value passed is the canonical id by definition.
5. **Hand-authored project names that aren't in the registry.**
   The helper returns those unchanged. This is the backwards-compat
   path mentioned above; it is *not* a violation of the rule, it's
   the rule applied to a name with no registry record.

## Why this is the trust hierarchy in action

The whole point of AtoCore is the trust hierarchy from the operating
model:

1. Trusted Project State (Layer 3) is the most authoritative layer
2. Memories (active) are second
3. Source chunks (raw retrieved content) are last

If a caller passes the alias `p05` and Layer 3 was written under
`p05-interferometer`, and the lookup fails to find the canonical
row, **the trust hierarchy collapses**. The most-authoritative
layer is silently invisible to the caller. The system would still
return *something* — namely, lower-trust retrieved chunks — and the
human would never know they got a degraded answer.

The canonicalization helper is what makes the trust hierarchy
**dependable**. Layer 3 is supposed to win every time. To win it
has to be findable. To be findable, the lookup key has to match
how the row was stored. And the only way to guarantee that match
across every entry point is to canonicalize at every boundary.

## Compatibility gap: legacy alias-keyed rows

The canonicalization rule fixes new writes going forward, but it
does NOT fix rows that were already written under a registered
alias before `fb6298a` landed. Those rows have a real, concrete
gap that must be closed by a one-time migration before the
engineering layer V1 ships.

The exact failure mode:

```
time T0 (before fb6298a):
  POST /project/state {project: "p05", ...}
    -> set_state("p05", ...)                  # no canonicalization
    -> ensure_project("p05")                  # creates a "p05" row
    -> writes state with project_id pointing at the "p05" row

time T1 (after fb6298a):
  POST /project/state {project: "p05", ...}   (or any read)
    -> set_state("p05", ...)
    -> resolve_project_name("p05") -> "p05-interferometer"
    -> ensure_project("p05-interferometer")   # creates a SECOND row
    -> writes new state under the canonical row
    -> the T0 state is still in the "p05" row, INVISIBLE to every
       canonicalized read
```

The unregistered-name fallback path saves you when the project was
|
||||
never in the registry: a row stored under `"orphan-project"` is read
|
||||
back via `"orphan-project"`, both pass through `resolve_project_name`
|
||||
unchanged, and the strings line up. **It does not save you when the
|
||||
name is a registered alias** — the helper rewrites the read key but
|
||||
not the storage key, and the legacy row becomes invisible.
|
||||
|
||||
What is at risk on the live Dalidou DB:
|
||||
|
||||
1. **`projects` table**: any rows whose `name` column matches a
|
||||
registered alias (one row per alias actually written under
|
||||
before the fix landed). These shadow the canonical project row
|
||||
and silently fragment the projects namespace.
|
||||
2. **`project_state` table**: any rows whose `project_id` points
|
||||
at one of those shadow project rows. **This is the highest-risk
|
||||
case** because it directly defeats the trust hierarchy: Layer 3
|
||||
trusted state becomes invisible to every canonicalized lookup.
|
||||
3. **`memories` table**: any rows whose `project` column is a
|
||||
registered alias. Reinforcement and extraction queries will
|
||||
miss them.
|
||||
4. **`interactions` table**: any rows whose `project` column is a
|
||||
registered alias. Listing and downstream reflection will miss
|
||||
them.
|
||||
|
||||
How to find out the actual blast radius on the live Dalidou DB:
|
||||
|
||||
```sql
|
||||
-- inspect the projects table for alias-shadow rows
|
||||
SELECT id, name FROM projects;
|
||||
|
||||
-- count alias-keyed memories per known alias
|
||||
SELECT project, COUNT(*) FROM memories
|
||||
WHERE project IN ('p04','p05','p06','gigabit','interferometer','polisher','ato core')
|
||||
GROUP BY project;
|
||||
|
||||
-- count alias-keyed interactions
|
||||
SELECT project, COUNT(*) FROM interactions
|
||||
WHERE project IN ('p04','p05','p06','gigabit','interferometer','polisher','ato core')
|
||||
GROUP BY project;
|
||||
|
||||
-- count alias-shadowed project_state rows by project name
|
||||
SELECT p.name, COUNT(*) FROM project_state ps
|
||||
JOIN projects p ON ps.project_id = p.id
|
||||
WHERE p.name IN ('p04','p05','p06','gigabit','interferometer','polisher','ato core');
|
||||
```
|
||||
|
||||
The migration that closes the gap has to:

1. For each registered project, find all `projects` rows whose
   name matches one of the project's aliases AND is not the
   canonical id itself. These are the "shadow" rows.
2. For each shadow row, MERGE its dependent state into the
   canonical project's row:
   - rekey `project_state.project_id` from shadow → canonical
   - if the merge would create a `(project_id, category, key)`
     collision (a state row already exists under the canonical
     id with the same category+key), the migration must surface
     the conflict via the existing conflict model and pause
     until the human resolves it
   - delete the now-empty shadow `projects` row
3. For `memories` and `interactions`, the fix is simpler because
   the alias appears as a string column (not a foreign key):
   `UPDATE memories SET project = canonical WHERE project = alias`,
   then the same for `interactions`.
4. The migration must run in dry-run mode first, printing the
   exact rows it would touch and the canonical destinations they
   would be merged into.
5. The migration must be idempotent — running it twice produces
   the same final state as running it once.
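The dry-run step can be sketched as a small read-only script. This is a
sketch only: it assumes `sqlite3` and the table/column names discussed
above, `REGISTRY` is a hypothetical stand-in for the real
`load_project_registry()`, and the conflict-model handling for
`(project_id, category, key)` collisions is elided.

```python
import sqlite3

# Hypothetical stand-in for load_project_registry(): canonical id -> aliases.
REGISTRY = {"p05-interferometer": ["p05", "interferometer"]}


def dry_run_migration(db_path: str) -> list[str]:
    """Report the rows the alias-shadow migration would touch, writing nothing."""
    report = []
    conn = sqlite3.connect(db_path)
    for canonical, aliases in REGISTRY.items():
        marks = ",".join("?" * len(aliases))
        # Step 1: shadow rows in `projects` whose name is a registered alias.
        shadows = conn.execute(
            f"SELECT id, name FROM projects WHERE name IN ({marks})", aliases
        ).fetchall()
        for shadow_id, shadow_name in shadows:
            n_state = conn.execute(
                "SELECT COUNT(*) FROM project_state WHERE project_id = ?",
                (shadow_id,),
            ).fetchone()[0]
            report.append(
                f"merge project {shadow_name!r} ({n_state} state rows) -> {canonical!r}"
            )
        # Step 3: string-keyed tables; the real run rekeys these with a plain UPDATE.
        for table in ("memories", "interactions"):
            n = conn.execute(
                f"SELECT COUNT(*) FROM {table} WHERE project IN ({marks})", aliases
            ).fetchone()[0]
            report.append(f"rekey {n} {table} rows -> {canonical!r}")
    conn.close()
    return report
```

Because the function only reads and returns a report, running it twice is
trivially safe; the actual rekeying run would reuse the same queries
inside a transaction.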
This work is **required before the engineering layer V1 ships**
because V1 will add new `entities`, `relationships`, `conflicts`,
and `mirror_regeneration_failures` tables that all key on the
canonical project id. Any leaked alias-keyed rows in the existing
tables would show up in V1 reads as silently missing data, and
the killer-correctness queries from `engineering-query-catalog.md`
(orphan requirements, decisions on flagged assumptions,
unsupported claims) would report wrong results against any project
that has shadow rows.

The migration script does NOT exist yet. The open follow-ups
section below tracks it as the next concrete step.
## The rule for new entry points

When you add a new service-layer function that takes a project name,
follow this checklist:

1. **Does the function read or write a row keyed by project?** If
   yes, you must call `resolve_project_name`. If no (e.g. it only
   takes `project` as a label for logging), you may skip the
   canonicalization but you should add a comment explaining why.
2. **Where does the canonicalization go?** As the first statement
   after input validation. Not later, not "before storage", not
   "in the helper that does the actual write". As the first
   statement, so any subsequent service call inside the function
   sees the canonical value.
3. **Add a regression test that uses an alias.** Use the
   `project_registry` fixture from `tests/conftest.py` to set up
   a temp registry with at least one project + aliases, then
   verify the new function works when called with the alias and
   when called with the canonical id.
4. **If the function can be called with `None` or empty string,
   verify that path too.** The helper handles it correctly but
   the function-under-test might not.
## How the `project_registry` test fixture works

`tests/conftest.py::project_registry` returns a callable that
takes one or more `(project_id, [aliases])` tuples (or just a bare
`project_id` string), writes them into a temp registry file,
points `ATOCORE_PROJECT_REGISTRY_PATH` at it, and reloads
`config.settings`. Use it like:

```python
def test_my_new_thing_canonicalizes(project_registry):
    project_registry(("p05-interferometer", ["p05", "interferometer"]))

    # ... call your service function with "p05" ...
    # ... assert it works the same as if you'd passed "p05-interferometer" ...
```

The fixture is reused by all 12 alias-canonicalization regression
tests added in `fb6298a`. Following the same pattern for new
features is the cheapest way to keep the contract intact.
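Fleshed out, such a regression test looks like the self-contained sketch
below. Everything here is a hypothetical stand-in: `record_interaction`
represents whatever new service function you added, and the inline
`_REGISTRY` dict plus the toy `resolve_project_name` mimic the real
helper's contract so the sketch runs on its own; the real suite would use
the `project_registry` fixture and the real helper instead.

```python
# Toy registry: canonical id -> aliases (stand-in for the real registry file).
_REGISTRY = {"p05-interferometer": ["p05", "interferometer"]}


def resolve_project_name(name: str) -> str:
    # Mirrors the real helper's contract: aliases and case-variants map to
    # the canonical id; unregistered names pass through unchanged.
    lowered = name.lower()
    for canonical, aliases in _REGISTRY.items():
        if lowered == canonical.lower() or lowered in aliases:
            return canonical
    return name


def record_interaction(project: str, content: str) -> dict:
    # Hypothetical function under test, following the entry-point rule:
    # canonicalize as the first statement after validation.
    if not project:
        raise ValueError("project is required")
    project = resolve_project_name(project)
    return {"project": project, "content": content}


def test_record_interaction_canonicalizes():
    via_alias = record_interaction("p05", "note")
    via_canonical = record_interaction("p05-interferometer", "note")
    via_orphan = record_interaction("orphan-project", "note")

    # Alias and canonical callers must land under the same canonical id.
    assert via_alias["project"] == "p05-interferometer"
    assert via_alias["project"] == via_canonical["project"]
    # Unregistered names pass through unchanged (backwards-compat path).
    assert via_orphan["project"] == "orphan-project"
```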
## What this rule does NOT cover

1. **Alias creation / management.** This document is about reading
   and writing project-keyed data. Adding new projects or new
   aliases is the registry's own write path
   (`POST /projects/register`, `PUT /projects/{name}`), which
   already enforces collision detection and atomic file writes.
2. **Registry hot-reloading.** The helper calls
   `load_project_registry()` on every invocation, which reads the
   JSON file each time. There is no in-process cache. If the
   registry file changes, the next call sees the new contents.
   Performance is fine for the current registry size, but if it
   becomes a bottleneck, add a versioned cache here, not at every
   call site.
3. **Cross-project deduplication.** If two different projects in
   the registry happen to share an alias, the registry's collision
   detection blocks the second one at registration time, so this
   case can't arise in practice. The helper does not handle it
   defensively.
4. **Time-bounded canonicalization.** A project's canonical id is
   stable. Aliases can be added or removed via
   `PUT /projects/{name}`, but the canonical `id` field never
   changes after registration. So a row written today under the
   canonical id will always remain findable under that id, even
   if the alias set evolves.
5. **Migration of legacy data.** If the live Dalidou DB has rows
   that were written under aliases before the canonicalization
   landed (e.g. a `memories` row with `project = "p05"` from
   before `fb6298a`), those rows are **NOT** automatically
   reachable from the canonicalized read path. The
   unregistered-name fallback only helps for project names that
   were never registered at all; it does **NOT** help for names
   that are registered as aliases. See the "Compatibility gap"
   section below for the exact failure mode and the migration
   path that has to run before the engineering layer V1 ships.
## What this enables for the engineering layer V1

When the engineering layer ships per `engineering-v1-acceptance.md`,
it adds at least these new project-keyed surfaces:

- `entities` table with a `project_id` column
- `relationships` table that joins entities, indirectly project-keyed
- `conflicts` table with a `project` column
- `mirror_regeneration_failures` table with a `project` column
- new endpoints: `POST /entities/...`, `POST /ingest/kb-cad/export`,
  `POST /ingest/kb-fem/export`, `GET /mirror/{project}/...`,
  `GET /conflicts?project=...`

**Every one of those write/read paths needs to call
`resolve_project_name` at its service-layer entry point**, following
the same pattern as the eight existing call sites listed above. The
implementation sprint should:

1. Apply the helper at each new service entry point as the first
   statement after input validation
2. Add a regression test using the `project_registry` fixture that
   exercises an alias against each new entry point
3. Treat any new service function that takes a project name without
   calling `resolve_project_name` as a code review failure

The pattern is simple enough to follow without thinking, which is
exactly the property we want for a contract that has to hold
across many independent additions.
## Open follow-ups

These are things the canonicalization story still has open. None
are blockers, but they're the rough edges to be aware of.

1. **Legacy alias data migration — REQUIRED before engineering V1
   ships, NOT optional.** If the live Dalidou DB has any rows
   written under aliases before `fb6298a` landed, they are
   silently invisible to the canonicalized read path (see the
   "Compatibility gap" section above for the exact failure mode).
   This is a real correctness issue, not a theoretical one: any
   trusted state, memory, or interaction stored under `p05`,
   `gigabit`, `polisher`, etc. before the fix landed is currently
   unreachable from any service-layer query. The migration script
   has to walk `projects`, `project_state`, `memories`, and
   `interactions`, merge shadow rows into their canonical
   counterparts (with conflict-model handling for any collisions),
   and run in dry-run mode first. Estimated cost: ~150 LOC for
   the migration script + ~50 LOC of tests + a one-time supervised
   run on the live Dalidou DB. **This migration is the next
   concrete pre-V1 step.**
2. **Registry file caching.** `load_project_registry()` reads the
   JSON file on every `resolve_project_name` call. With ~5
   projects this is fine; with 50+ it would warrant a versioned
   cache (cache key = file mtime + size). Defer until measured.
3. **Case sensitivity audit.** The helper uses
   `get_registered_project`, which lowercases for comparison. The
   stored canonical id keeps its original casing. No bug today
   because every test passes, but worth re-confirming when the
   engineering layer adds entity-side storage.
4. **`_rank_chunks`'s secondary substring boost.** Mentioned
   earlier; still uses the raw hint. Replace it with the same
   helper-driven approach the retriever uses, OR delete it as
   redundant once we confirm the retriever's primary boost is
   sufficient.
5. **Documentation discoverability.** This doc lives under
   `docs/architecture/`. The contract is also restated in the
   docstring of `resolve_project_name` and referenced from each
   call site's comment. That redundancy is intentional — the
   contract is too easy to forget to live in only one place.
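The versioned cache from follow-up 2 can be sketched in a few lines. This
is not implemented today; the function name `load_project_registry_cached`
and the module-level cache dict are illustrative, and the cache key is the
(mtime, size) pair the follow-up names, so any on-disk change invalidates
the cache on the next call and hot-reload semantics are preserved.

```python
import json
import os

# Module-level cache: "key" is the (mtime_ns, size) of the registry file
# at the last read; "registry" is the parsed JSON from that read.
_cache: dict = {"key": None, "registry": None}


def load_project_registry_cached(path: str) -> dict:
    st = os.stat(path)
    key = (st.st_mtime_ns, st.st_size)
    if _cache["key"] != key:
        # File changed (or first call): re-read and remember the new key.
        with open(path, "r", encoding="utf-8") as fh:
            _cache["registry"] = json.load(fh)
        _cache["key"] = key
    return _cache["registry"]
```

The stat call still touches the filesystem on every invocation, but it
avoids re-parsing the JSON, which is where the cost would concentrate at
50+ projects.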
## Quick reference card

Copy-pasteable for new service functions:

```python
from atocore.projects.registry import resolve_project_name


def my_new_service_entry_point(
    project_name: str,
    other_args: ...,
) -> ...:
    # Validate inputs first
    if not project_name:
        raise ValueError("project_name is required")

    # Canonicalize through the registry as the first thing after
    # validation. Every subsequent operation in this function uses
    # the canonical id, so storage and queries are guaranteed
    # consistent across alias and canonical-id callers.
    project_name = resolve_project_name(project_name)

    # ... rest of the function ...
```
## TL;DR

- One helper, one rule: `resolve_project_name` at every service-layer
  entry point that takes a project name
- Currently called in 8 places across builder, project_state,
  interactions, and memory; all 8 listed in this doc
- Backwards-compat path returns **unregistered** names unchanged
  (e.g. `"orphan-project"`); this does NOT cover **registered
  alias** names that were used as storage keys before `fb6298a`
- **Real compatibility gap**: any row whose `project` column is a
  registered alias from before the canonicalization landed is
  silently invisible to the new read path. A one-time migration
  is required before engineering V1 ships. See the "Compatibility
  gap" section.
- The trust hierarchy depends on this helper being applied
  everywhere — Layer 3 trusted state has to be findable for it to
  win the trust battle
- Use the `project_registry` test fixture to add regression tests
  for any new service function that takes a project name
- The engineering layer V1 implementation must follow the same
  pattern at every new service entry point
- Open follow-ups (in priority order): **legacy alias data
  migration (required pre-V1)**, redundant substring boost
  cleanup, registry caching when projects scale
docs/architecture/promotion-rules.md (new file, 343 lines)
@@ -0,0 +1,343 @@
# Promotion Rules (Layer 0 → Layer 2 pipeline)

## Purpose

AtoCore ingests raw human-authored content (markdown, repo notes,
interaction transcripts) and eventually must turn some of it into
typed engineering entities that the V1 query catalog can answer.
The path from raw text to typed entity has to be:

- **explicit**: every step has a named operation, a trigger, and an
  audit log
- **reversible**: every promotion can be undone without data loss
- **conservative**: no automatic movement into trusted state; a human
  (or later, a very confident policy) always signs off
- **traceable**: every typed entity must carry a back-pointer to
  the raw source that produced it

This document defines that path.
## The four layers

Promotion is described in terms of four layers, all of which exist
simultaneously in the system once the engineering layer V1 ships:

| Layer | Name | Canonical storage | Trust | Who writes |
|-------|-------------------|------------------------------------------|-------|------------|
| L0 | Raw source | source_documents + source_chunks | low | ingestion pipeline |
| L1 | Memory candidate | memories (status="candidate") | low | extractor |
| L1' | Active memory | memories (status="active") | med | human promotion |
| L2 | Entity candidate | entities (status="candidate") | low | extractor + graduation |
| L2' | Active entity | entities (status="active") | high | human promotion |
| L3 | Trusted state | project_state | highest | human curation |

Layer 3 (trusted project state) is already implemented and stays
manually curated — automatic promotion into L3 is **never** allowed.
## The promotion graph

```
[L0] source chunks
  |
  | extraction (memory extractor, Phase 9 Commit C)
  v
[L1] memory candidate
  |
  | promote_memory()
  v
[L1'] active memory
  |
  | (optional) propose_graduation()
  v
[L2] entity candidate
  |
  | promote_entity()
  v
[L2'] active entity
  |
  | (manual curation, NEVER automatic)
  v
[L3] trusted project state
```

Short path (direct entity extraction, once the entity extractor
exists):

```
[L0] source chunks
  |
  | entity extractor
  v
[L2] entity candidate
  |
  | promote_entity()
  v
[L2'] active entity
```

A single fact can travel either path depending on what the
extractor saw. The graduation path exists for facts that started
life as memories before the entity layer existed, and for the
memory extractor's structural cues (decisions, constraints,
requirements) which are eventually entity-shaped.
## Triggers (when does extraction fire?)

Phase 9 already shipped one trigger: **on explicit API request**
(`POST /interactions/{id}/extract`). The V1 engineering layer adds
two more:

1. **On interaction capture (automatic)**
   - Same event that runs reinforcement today
   - Controlled by an `extract` boolean flag on the record request
     (default: `false` for the memory extractor, `true` once an
     engineering extractor exists and has been validated)
   - Output goes to the candidate queue; nothing auto-promotes

2. **On ingestion (batched, per wave)**
   - After a wave of markdown ingestion finishes, a batch extractor
     pass sweeps all newly-added source chunks and produces
     candidates from them
   - Batched per wave (not per chunk) to keep the review queue
     digestible and to let the reviewer see all candidates from a
     single ingestion in one place
   - Output: a report artifact plus a review queue entry per
     candidate

3. **On explicit human request (existing)**
   - `POST /interactions/{id}/extract` for a single interaction
   - Future: `POST /ingestion/wave/{id}/extract` for a whole wave
   - Future: `POST /memory/{id}/graduate` to propose graduation
     of one specific memory into an entity

Batch size rule: **extraction passes never write more than N
candidates per human review cycle, where N = 50 by default**. If
a pass produces more, it ranks by (rule confidence × content
length × novelty) and only writes the top N. The remaining
candidates are logged, not persisted. This protects the reviewer
from getting buried.
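The cap-and-rank rule can be sketched as a pure function. The score is
the product named above (rule confidence × content length × novelty);
the length-saturation constant and the candidate field names are
illustrative, not taken from the codebase.

```python
def cap_candidates(
    candidates: list[dict], n: int = 50
) -> tuple[list[dict], list[dict]]:
    """Return (persisted, dropped) after ranking by confidence * length * novelty."""

    def score(c: dict) -> float:
        # Saturate the length factor so very long content can't dominate
        # the ranking on its own (200 chars is an illustrative knee point).
        length_factor = min(len(c["content"]) / 200.0, 1.0)
        return c["confidence"] * length_factor * c["novelty"]

    ranked = sorted(candidates, key=score, reverse=True)
    # The top N are persisted; the rest are logged by the caller, not written.
    return ranked[:n], ranked[n:]
```

Keeping the cap as a pure function over already-extracted candidates
means the extractor rules never need to know about the review-cycle
budget.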
## Confidence and ranking of candidates

Each extraction rule carries a *prior confidence* based on how
specific its pattern is:

| Rule class | Prior | Rationale |
|---------------------------|-------|-----------|
| Heading with explicit type (`## Decision:`) | 0.7 | Very specific structural cue, intentional author marker |
| Typed list item (`- [Decision] ...`) | 0.65 | Explicit but often embedded in looser prose |
| Sentence pattern (`I prefer X`) | 0.5 | Moderate structure, more false positives |
| Regex pattern matching a value+unit (`X = 4.8 kg`) | 0.6 | Structural but prone to coincidence |
| LLM-based (future) | variable | Depends on model's returned confidence |

The candidate's final confidence at write time is:

```
final = prior * structural_signal_multiplier * freshness_bonus
```

Where:

- `structural_signal_multiplier` is 1.1 if the source chunk path
  contains any of `_HIGH_SIGNAL_HINTS` from the retriever (status,
  decision, requirements, charter, ...) and 0.9 if it contains
  `_LOW_SIGNAL_HINTS` (`_archive`, `_history`, ...)
- `freshness_bonus` is 1.05 if the source chunk was updated in the
  last 30 days, else 1.0

This formula will be tuned later; the numbers are starting values.
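Transcribed directly into code, the write-time formula looks like the
sketch below. The hint tuples are abbreviated stand-ins for the
retriever's real `_HIGH_SIGNAL_HINTS` / `_LOW_SIGNAL_HINTS` lists, and
the function signature is illustrative.

```python
_HIGH_SIGNAL_HINTS = ("status", "decision", "requirements", "charter")
_LOW_SIGNAL_HINTS = ("_archive", "_history")


def final_confidence(prior: float, chunk_path: str, days_since_update: int) -> float:
    # Structural signal: high-signal path hints win over low-signal ones.
    multiplier = 1.0
    if any(h in chunk_path for h in _HIGH_SIGNAL_HINTS):
        multiplier = 1.1
    elif any(h in chunk_path for h in _LOW_SIGNAL_HINTS):
        multiplier = 0.9
    # Freshness: small bonus for chunks touched in the last 30 days.
    freshness = 1.05 if days_since_update <= 30 else 1.0
    return prior * multiplier * freshness
```

So a heading-rule candidate (prior 0.7) from a fresh
`decision-log.md` chunk writes at 0.7 × 1.1 × 1.05 ≈ 0.81.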
## Review queue mechanics

### Queue population

- Each candidate writes one row into its target table
  (memories or entities) with `status="candidate"`
- Each candidate carries: `rule`, `source_span`, `source_chunk_id`,
  `source_interaction_id`, `extractor_version`
- No two candidates ever share the same (type, normalized_content,
  project) — if a second extraction pass produces a duplicate, it
  is dropped before being written

### Queue surfacing

- `GET /memory?status=candidate` lists memory candidates
- `GET /entities?status=candidate` (future) lists entity candidates
- `GET /candidates` (future unified route) lists both

### Reviewer actions

For each candidate, exactly one of:

- **promote**: `POST /memory/{id}/promote` or
  `POST /entities/{id}/promote`
  - sets `status="active"`
  - preserves the audit trail (source_chunk_id, rule, source_span)
- **reject**: `POST /memory/{id}/reject` or
  `POST /entities/{id}/reject`
  - sets `status="invalid"`
  - preserves audit trail so repeat extractions don't re-propose
- **edit-then-promote**: `PUT /memory/{id}` to adjust content, then
  `POST /memory/{id}/promote`
  - every edit is logged, original content preserved in a
    `previous_content_log` column (schema addition deferred to
    the first implementation sprint)
- **defer**: no action; candidate stays in queue indefinitely
  (future: add a `pending_since` staleness indicator to the UI)

### Reviewer authentication

In V1 the review queue is single-user by convention. There is no
per-reviewer authorization. Every promote/reject call is logged
with the same default identity. Multi-user review is a V2 concern.
## Auto-promotion policies (deferred, but designed for)

The current V1 stance is: **no auto-promotion, ever**. All
promotions require a human reviewer.

The schema and API are designed so that automatic policies can be
added later without schema changes. The anticipated policies:

1. **Reference-count threshold**
   - If a candidate accumulates N+ references across multiple
     interactions within M days AND the reviewer hasn't seen it yet
     (indicating the system sees it often but the human hasn't
     gotten to it), propose auto-promote
   - Starting thresholds: N=5, M=7 days. Never auto-promote
     entity candidates that affect validation claims or decisions
     without explicit human review — those are too consequential.

2. **Confidence threshold**
   - If `final_confidence >= 0.85` AND the rule is a heading
     rule (not a sentence rule), eligible for auto-promotion

3. **Identity/preference lane**
   - Identity and preference memories extracted from an
     interaction where the user explicitly says "I am X" or
     "I prefer X" with a first-person subject and high-signal
     verb could auto-promote. This is the safest lane because
     the user is the authoritative source for their own identity.

None of these run in V1. The APIs and data shape are designed so
they can be added as a separate policy module without disrupting
existing tests.
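Policies 1 and 2 reduce to a pure eligibility predicate, sketched below.
Nothing like this runs in V1; the dataclass fields and the way
"within M days" is approximated as queue age are illustrative
interpretations, and the hard exclusion for decision/claim-affecting
candidates is kept as the overriding check.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    kind: str                       # "memory" or "entity"
    rule_class: str                 # e.g. "heading", "sentence"
    final_confidence: float
    reference_count: int            # references seen across interactions
    days_in_queue: int              # proxy for the M-day reference window
    touches_decision_or_claim: bool


def eligible_for_auto_promotion(c: Candidate, n: int = 5, m: int = 7) -> bool:
    if c.touches_decision_or_claim:
        return False  # too consequential: always requires human review
    # Policy 1: seen often, recently, and not yet reviewed.
    by_references = c.reference_count >= n and c.days_in_queue <= m
    # Policy 2: high confidence from a heading rule only.
    by_confidence = c.final_confidence >= 0.85 and c.rule_class == "heading"
    return by_references or by_confidence
```

Keeping the predicate pure (no DB access) is what lets it land later as
a separate policy module without touching the existing promote/reject
endpoints.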
## Reversibility

Every promotion step must be undoable:

| Operation | How to undo |
|---------------------------|-------------------------------------------------------|
| memory candidate written | delete the candidate row (low-risk, it was never in context) |
| memory candidate promoted | `PUT /memory/{id}` status=candidate (reverts to queue) |
| memory candidate rejected | `PUT /memory/{id}` status=candidate |
| memory graduated | memory stays as a frozen pointer; delete the entity candidate to undo |
| entity candidate promoted | `PUT /entities/{id}` status=candidate |
| entity promoted to active | supersede with a new active, or `PUT` back to candidate |

The only irreversible operation is manual curation into L3
(trusted project state). That is by design — L3 is small, curated,
and human-authored end to end.
## Provenance (what every candidate must carry)

Every candidate row, memory or entity, MUST have:

- `source_chunk_id` — if extracted from ingested content, the chunk it came from
- `source_interaction_id` — if extracted from a captured interaction, the interaction it came from
- `rule` — the extractor rule id that fired
- `extractor_version` — a semver-ish string the extractor module carries
  so old candidates can be re-evaluated with a newer extractor

If both `source_chunk_id` and `source_interaction_id` are null, the
candidate was hand-authored (via `POST /memory` directly) and must
be flagged as such. Hand-authored candidates are allowed but
discouraged — the preference is to extract from real content, not
dictate candidates directly.

The active rows inherit all of these fields from their candidate
row at promotion time. They are never overwritten.
## Extractor versioning

The extractor is going to change — new rules added, old rules
refined, precision/recall tuned over time. The promotion flow
must survive extractor changes:

- every extractor module exposes an `EXTRACTOR_VERSION = "0.1.0"`
  constant
- every candidate row records this version
- when the extractor version changes, the change log explains
  what the new rules do
- old candidates are NOT automatically re-evaluated by the new
  extractor — that would lose the auditable history of why the
  old candidate was created
- a future `POST /memory/{id}/re-extract` can optionally propose
  an updated candidate from the same source chunk with the new
  extractor, but it produces a *new* candidate alongside the old
  one, never a silent rewrite
## Ingestion-wave extraction semantics

When the batched extraction pass fires on an ingestion wave, it
produces a report artifact:

```
data/extraction-reports/<wave-id>/
├── report.json        # summary counts, rule distribution
├── candidates.ndjson  # one JSON line per persisted candidate
├── dropped.ndjson     # one JSON line per candidate dropped
│                      # (over batch cap, duplicate, below
│                      # min content length, etc.)
└── errors.log         # any rule-level errors
```

The report artifact lives under the configured `data_dir` and is
retained per the backup retention policy. The ingestion-waves doc
(`docs/ingestion-waves.md`) is updated to include an "extract"
step after each wave, with the expectation that the human
reviews the candidates before the next wave fires.
## Candidate-to-candidate deduplication across passes

Two extraction passes over the same chunk (or two different
chunks containing the same fact) should not produce two identical
candidate rows. The deduplication key is:

```
(memory_type_or_entity_type, normalized_content, project, status)
```

Normalization strips whitespace variants, lowercases, and drops
trailing punctuation (same rules as the extractor's `_clean_value`
function). If a second pass would produce a duplicate, it instead
increments a `re_extraction_count` column on the existing
candidate row and updates `last_re_extracted_at`. This gives the
reviewer a "saw this N times" signal without flooding the queue.

This column is a future schema addition — current candidates do
not track re-extraction. The promotion-rules implementation will
land the column as part of its first migration.
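The key and its normalization can be sketched in a few lines. The
normalization mirrors the rules stated above (whitespace variants,
lowercasing, trailing punctuation); the real implementation lives in
the extractor's `_clean_value`, so the function names here are
illustrative stand-ins.

```python
import re


def normalize(content: str) -> str:
    # Collapse all whitespace runs (spaces, newlines, tabs) to single spaces,
    # lowercase, and strip trailing punctuation.
    collapsed = re.sub(r"\s+", " ", content).strip()
    return collapsed.lower().rstrip(".!?,;:")


def dedup_key(kind: str, content: str, project: str, status: str) -> tuple:
    # (memory_type_or_entity_type, normalized_content, project, status)
    return (kind, normalize(content), project, status)
```

Two passes producing the same key then hit the same row, which is what
makes the `re_extraction_count` increment well-defined.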
## The "never auto-promote into trusted state" invariant

Regardless of what auto-promotion policies might exist between
L0 → L2', **nothing ever moves into L3 (trusted project state)
without explicit human action via `POST /project/state`**. This
is the one hard line in the promotion graph and it is enforced
by having no API endpoint that takes a candidate id and writes
to `project_state`.
## Summary

- Four layers: L0 raw, L1 memory candidate/active, L2 entity
  candidate/active, L3 trusted state
- Three triggers for extraction: on capture, on ingestion wave, on
  explicit request
- Per-rule prior confidence, tuned by structural signals at write time
- Shared candidate review queue, promote/reject/edit/defer actions
- No auto-promotion in V1 (but the schema allows it later)
- Every candidate carries full provenance and extractor version
- Every promotion step is reversible except L3 curation
- L3 is never touched automatically
docs/architecture/representation-authority.md (new file, 273 lines)
@@ -0,0 +1,273 @@
# Representation Authority (canonical home matrix)

## Why this document exists

The same fact about an engineering project can show up in many
places: a markdown note in the PKM, a structured field in KB-CAD,
a commit message in a Gitea repo, an active memory in AtoCore, an
entity in the engineering layer, a row in trusted project state.
**Without an explicit rule about which representation is
authoritative for which kind of fact, the system will accumulate
contradictions and the human will lose trust in all of them.**

This document is the canonical-home matrix. Every kind of fact
that AtoCore handles has exactly one authoritative representation,
and every other place that holds a copy of that fact is, by
definition, a derived view that may be stale.

## The representations in scope

Six places where facts can live in this ecosystem:
| Layer | What it is | Who edits it | How it's structured |
|---|---|---|---|
| **PKM** | Antoine's Obsidian-style markdown vault under `/srv/storage/atocore/sources/vault/` | Antoine, by hand | unstructured markdown with optional frontmatter |
| **KB project** | the engineering Knowledge Base (KB-CAD / KB-FEM repos and any companion docs) | Antoine, semi-structured | per-tool typed records |
| **Gitea repos** | source code repos under `dalidou:3000/Antoine/*` (Fullum-Interferometer, polisher-sim, ATOCore itself, ...) | Antoine via git commits | code, READMEs, repo-specific markdown |
| **AtoCore memories** | rows in the `memories` table | hand-authored or extracted from interactions | typed (identity / preference / project / episodic / knowledge / adaptation) |
| **AtoCore entities** | rows in the `entities` table (V1, not yet built) | imported from KB exports or extracted from interactions | typed entities + relationships per the V1 ontology |
| **AtoCore project state** | rows in the `project_state` table (Layer 3, trusted) | hand-curated only, never automatic | category + key + value |
## The canonical home rule

> For each kind of fact, exactly one of the six representations is
> the authoritative source. The other five may hold derived
> copies, but they are not allowed to disagree with the
> authoritative one. When they disagree, the disagreement is a
> conflict and surfaces via the conflict model.

The matrix below assigns the authoritative representation per fact
kind. It is the practical answer to the question "where does this
fact actually live?" for daily decisions.
|
||||
|
||||
| Fact kind | Canonical home | Why | How it gets into AtoCore |
|
||||
|---|---|---|---|
|
||||
| **CAD geometry** (the actual model) | NX (or successor CAD tool) | the only place that can render and validate it | not in AtoCore at all in V1 |
|
||||
| **CAD-side structure** (subsystem tree, component list, materials, parameters) | KB-CAD | KB-CAD is the structured wrapper around NX | KB-CAD export → `/ingest/kb-cad/export` → entities |
|
||||
| **FEM mesh & solver settings** | KB-FEM (wrapping the FEM tool) | only the solver representation can run | not in AtoCore at all in V1 |
|
||||
| **FEM results & validation outcomes** | KB-FEM | KB-FEM owns the outcome records | KB-FEM export → `/ingest/kb-fem/export` → entities |
|
||||
| **Source code** | Gitea repos | repos are version-controlled and reviewable | indirectly via repo markdown ingestion (Phase 1) |
|
||||
| **Repo-level documentation** (READMEs, design docs in the repo) | Gitea repos | lives next to the code it documents | ingested as source chunks; never hand-edited in AtoCore |
|
||||
| **Project-level prose notes** (decisions in long-form, journal-style entries, working notes) | PKM | the place Antoine actually writes when thinking | ingested as source chunks; the extractor proposes candidates from these for the review queue |
|
||||
| **Identity** ("the user is a mechanical engineer running AtoCore") | AtoCore memories (`identity` type) | nowhere else holds personal identity | hand-authored via `POST /memory` or extracted from interactions |
|
||||
| **Preference** ("prefers small reviewable diffs", "uses SI units") | AtoCore memories (`preference` type) | nowhere else holds personal preferences | hand-authored or extracted |
|
||||
| **Episodic** ("on April 6 we debugged the EXDEV bug") | AtoCore memories (`episodic` type) | nowhere else has time-bound personal recall | extracted from captured interactions |
|
||||
| **Decision** (a structured engineering decision) | AtoCore **entities** (Decision) once the engineering layer ships; AtoCore memories (`adaptation`) until then | needs structured supersession, audit trail, and link to affected components | extracted from PKM or interactions; promoted via review queue |
|
||||
| **Requirement** | AtoCore **entities** (Requirement) | needs structured satisfaction tracking | extracted from PKM, KB-CAD, or interactions |
|
||||
| **Constraint** | AtoCore **entities** (Constraint) | needs structured link to the entity it constrains | extracted from PKM, KB-CAD, or interactions |
|
||||
| **Validation claim** | AtoCore **entities** (ValidationClaim) | needs structured link to supporting Result | extracted from KB-FEM exports or interactions |
|
||||
| **Material** | KB-CAD if the material is on a real component; AtoCore entity (Material) if it's a project-wide material decision not yet attached to geometry | structured properties live in KB-CAD's material database | KB-CAD export, or hand-authored as a Material entity |
|
||||
| **Parameter** | KB-CAD or KB-FEM depending on whether it's a geometry or solver parameter; AtoCore entity (Parameter) if it's a higher-level project parameter not in either tool | structured numeric values with units live in their tool of origin | KB export, or hand-authored |
|
||||
| **Project status / current focus / next milestone** | AtoCore **project_state** (Layer 3) | the trust hierarchy says trusted state is the highest authority for "what is the current state of the project" | hand-curated via `POST /project/state` |
|
||||
| **Architectural decision records (ADRs)** | depends on form: long-form ADR markdown lives in the repo; the structured fact about which ADR was selected lives in the AtoCore Decision entity | both representations are useful for different audiences | repo ingestion provides the prose; the entity is created by extraction or hand-authored |
|
||||
| **Operational runbooks** | repo (next to the code they describe) | lives with the system it operates | not promoted into AtoCore entities — runbooks are reference material, not facts |
|
||||
| **Backup metadata** (snapshot timestamps, integrity status) | the backup-metadata.json files under `/srv/storage/atocore/backups/` | each snapshot is its own self-describing record | not in AtoCore's database; queried via the `/admin/backup` endpoints |
|
||||
| **Conversation history with AtoCore (interactions)** | AtoCore `interactions` table | nowhere else has the prompt + context pack + response triple | written by capture (Phase 9 Commit A) |
|
||||
|
||||
## The supremacy rule for cross-layer facts
|
||||
|
||||
When the same fact has copies in multiple representations and they
|
||||
disagree, the trust hierarchy applies in this order:
|
||||
|
||||
1. **AtoCore project_state** (Layer 3) is highest authority for any
|
||||
"current state of the project" question. This is why it requires
|
||||
manual curation and never gets touched by automatic processes.
|
||||
2. **The tool-of-origin canonical home** is highest authority for
|
||||
facts that are tool-managed: KB-CAD wins over AtoCore entities
|
||||
for CAD-side structure facts; KB-FEM wins for FEM result facts.
|
||||
3. **AtoCore entities** are highest authority for facts that are
|
||||
AtoCore-managed: Decisions, Requirements, Constraints,
|
||||
ValidationClaims (when the supporting Results are still loose).
|
||||
4. **Active AtoCore memories** are highest authority for personal
|
||||
facts (identity, preference, episodic).
|
||||
5. **Source chunks (PKM, repos, ingested docs)** are lowest
|
||||
authority — they are the raw substrate from which higher layers
|
||||
are extracted, but they may be stale, contradictory among
|
||||
themselves, or out of date.
|
||||
|
||||
This is the same hierarchy enforced by `conflict-model.md`. This
|
||||
document just makes it explicit per fact kind.
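The five-level hierarchy above can be sketched as a small resolver. This is a hypothetical illustration, not AtoCore's actual implementation; the layer names and the `(layer, value)` pair shape are assumptions made for the sketch.

```python
# Hedged sketch of the supremacy rule. Layer names and record
# shapes are illustrative, not AtoCore's real schema.
PRECEDENCE = [
    "project_state",    # Layer 3: human-curated override, highest
    "tool_of_origin",   # KB-CAD / KB-FEM canonical facts
    "entity",           # AtoCore-managed structured facts
    "memory",           # active personal memories
    "source_chunk",     # raw PKM / repo substrate, lowest
]

def resolve(representations):
    """Pick the winning (layer, value) pair per the trust hierarchy.

    Lower-layer values that disagree with the winner are returned
    as conflicts to surface, never silently discarded.
    """
    ranked = sorted(representations, key=lambda r: PRECEDENCE.index(r[0]))
    winner = ranked[0]
    conflicts = [r for r in ranked[1:] if r[1] != winner[1]]
    return winner, conflicts
```

The key design point the sketch captures: a disagreeing lower layer is returned as a conflict rather than dropped, matching the conflict-model behavior.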

## Examples

### Example 1 — "what material does the lateral support pad use?"

Possible representations:

- KB-CAD has the field `component.lateral-support-pad.material = "GF-PTFE"`
- A PKM note from last month says "considering PEEK for the
  lateral support, GF-PTFE was the previous choice"
- An AtoCore Material entity says `GF-PTFE`
- An AtoCore project_state entry says `p05 / decision / lateral_support_material = GF-PTFE`

Which one wins for the question "what's the current material?"

- **project_state wins** if the query is "what is the current
  trusted answer for p05's lateral support material" (Layer 3)
- **KB-CAD wins** if project_state has not been curated for this
  field yet, because KB-CAD is the canonical home for CAD-side
  structure
- **The Material entity** is a derived view from KB-CAD; if it
  disagrees with KB-CAD, the entity is wrong and a conflict is
  surfaced
- **The PKM note** is historical context, not authoritative for
  "current"

### Example 2 — "did we decide to merge the bind mounts?"

Possible representations:

- A working-session interaction is captured in the `interactions`
  table with the response containing `## Decision: merge the two
  bind mounts into one`
- The Phase 9 Commit C extractor produced a candidate adaptation
  memory from that decision
- A reviewer promoted the candidate to active
- The AtoCore source repo has the actual code change in commit
  `d0ff8b5`, and the docker-compose.yml is in its post-merge form

Which one wins for "is this decision real and current?"

- **The Gitea repo** wins for "is this decision implemented" —
  the docker-compose.yml is the canonical home for the actual
  bind mount configuration
- **The active adaptation memory** wins for "did we decide this"
  — that's exactly what the Commit C lifecycle is for
- **The interaction record** is the audit trail — it's
  authoritative for "when did this conversation happen and what
  did the LLM say", but not for "is this decision current"
- **The source chunks** from PKM are not relevant here because no
  PKM note about this decision exists yet (and that's fine —
  decisions don't have to live in PKM if they live in the repo
  and the AtoCore memory)

### Example 3 — "what's p05's current next focus?"

Possible representations:

- The PKM has a `current-status.md` note updated last week
- AtoCore project_state has `p05 / status / next_focus = "wave 2 ingestion"`
- A captured interaction from yesterday discussed the next focus
  at length

Which one wins?

- **project_state wins**, full stop. The trust hierarchy says
  Layer 3 is canonical for current state. This is exactly the
  reason project_state exists.
- The PKM note is historical context.
- The interaction is conversation history.
- If project_state and the PKM disagree, the human updates one or
  the other to bring them in line — usually by re-curating
  project_state if the conversation revealed a real change.
## What this means for the engineering layer V1 implementation

Several concrete consequences fall out of the matrix:

1. **The Material and Parameter entity types are mostly KB-CAD
   shadows in V1.** They exist in AtoCore so other entities
   (Decisions, Requirements) can reference them with structured
   links, but their authoritative values come from KB-CAD imports.
   If KB-CAD doesn't know about a material, the AtoCore entity is
   the canonical home only because nothing else is.
2. **Decisions / Requirements / Constraints / ValidationClaims
   are AtoCore-canonical.** These don't have a natural home in
   KB-CAD or KB-FEM. They live in AtoCore as first-class entities
   with full lifecycle and supersession.
3. **The PKM is never authoritative.** It is the substrate for
   extraction. The reviewer promotes things out of it; they don't
   point at PKM notes as the "current truth".
4. **project_state is the override layer.** Whenever the human
   wants to declare "the current truth is X regardless of what
   the entities and memories and KB exports say", they curate
   into project_state. Layer 3 is intentionally small and
   intentionally manual.
5. **The conflict model is the enforcement mechanism.** When two
   representations disagree on a fact whose canonical-home rule
   should pick a winner, the conflict surfaces via the
   `/conflicts` endpoint and the reviewer resolves it. The
   matrix in this document tells the reviewer who is supposed
   to win in each scenario; they're not making the decision blind.

## What the matrix does NOT define

1. **Facts about people other than the user.** No "team member"
   entity, no per-collaborator preferences. AtoCore is
   single-user in V1.
2. **Facts about AtoCore itself as a project.** Those are project
   memories and project_state entries under `project=atocore`,
   with the same lifecycle as any other project's facts.
3. **Vendor / supplier / cost facts.** Out of V1 scope.
4. **Time-bounded facts** (a value that was true between two
   dates and may not be true now). The current matrix treats all
   active facts as currently true and uses supersession to
   represent change. Temporal facts are a V2 concern.
5. **Cross-project shared facts** (a Material that is reused across
   p04, p05, and p06). Currently each project has its own copy.
   Cross-project deduplication is also a V2 concern.

## The "single canonical home" invariant in practice

The hard rule that every fact has exactly one canonical home is
the load-bearing invariant of this matrix. To enforce it
operationally:

- **Extraction never duplicates.** When the extractor scans an
  interaction or a source chunk and proposes a candidate, the
  candidate is dropped if it duplicates an already-active record
  in the canonical home (the existing extractor implementation
  already does this for memories; the entity extractor will
  follow the same pattern).
- **Imports never duplicate.** When KB-CAD pushes the same
  Component twice with the same value, the second push is
  recognized as identical and updates the `last_imported_at`
  timestamp without creating a new entity.
- **Imports surface drift as conflict.** When KB-CAD pushes the
  same Component with a different value, that's a conflict per
  the conflict model — never a silent overwrite.
- **Hand-curation into project_state always wins.** A
  project_state entry can disagree with an entity or a KB
  export; the project_state entry is correct by fiat (Layer 3
  trust), and the reviewer is responsible for bringing the lower
  layers in line if appropriate.

## Open questions for V1 implementation

1. **How does the reviewer see the canonical home for a fact in
   the UI?** Probably by including the fact's authoritative
   layer in the entity / memory detail view: "this Material is
   currently mirrored from KB-CAD; the canonical home is KB-CAD".
2. **Who owns running the KB-CAD / KB-FEM exporter?** The
   `tool-handoff-boundaries.md` doc lists this as an open
   question; the same answer applies here.
3. **Do we need an explicit `canonical_home` field on entity
   rows?** A field that records "this entity is canonical here"
   vs "this entity is a mirror of <external system>". Probably
   yes; deferred to the entity schema spec.
4. **How are project_state overrides surfaced in the engineering
   layer query results?** When a query (e.g. Q-001 "what does
   this subsystem contain?") would return entity rows, the result
   should also flag any project_state entries that contradict the
   entities — letting the reviewer see the override at query
   time, not just in the conflict queue.

## TL;DR

- Six representation layers: PKM, KB project, repos, AtoCore
  memories, AtoCore entities, AtoCore project_state
- Every fact kind has exactly one canonical home
- The trust hierarchy resolves cross-layer conflicts:
  project_state > tool-of-origin (KB-CAD/KB-FEM) > entities >
  active memories > source chunks
- Decisions / Requirements / Constraints / ValidationClaims are
  AtoCore-canonical (no other system has a natural home for them)
- Materials / Parameters / CAD-side structure are KB-CAD-canonical
- FEM results / validation outcomes are KB-FEM-canonical
- project_state is the human override layer, top of the
  hierarchy, manually curated only
- Conflicts surface via `/conflicts` and the reviewer applies the
  matrix to pick a winner

339
docs/architecture/tool-handoff-boundaries.md
Normal file
@@ -0,0 +1,339 @@

# Tool Hand-off Boundaries (KB-CAD / KB-FEM and friends)

## Why this document exists

The engineering layer V1 will accumulate typed entities about
projects, subsystems, components, materials, requirements,
constraints, decisions, parameters, analysis models, results, and
validation claims. Many of those concepts also live in real
external tools — CAD systems, FEM solvers, BOM managers, PLM
databases, vendor portals.

The first big design decision before writing any entity-layer code
is: **what is AtoCore's read/write relationship with each of those
external tools?**

The wrong answer in either direction is expensive:

- Too read-only: AtoCore becomes a stale shadow of the tools and
  loses the trust battle the moment a value drifts.
- Too bidirectional: AtoCore takes on responsibilities it can't
  reliably honor (live sync, conflict resolution against external
  schemas, write-back validation), and the project never ships.

This document picks a position for V1.

## The position

> **AtoCore is a one-way mirror in V1.** External tools push
> structured exports into AtoCore. AtoCore never pushes back.

That position has three corollaries:

1. **External tools remain the source of truth for everything they
   already manage.** A CAD model is canonical for geometry; a FEM
   project is canonical for meshes and solver settings; KB-CAD is
   canonical for whatever KB-CAD already calls canonical.
2. **AtoCore is the source of truth for the *AtoCore-shaped*
   record** of those facts: the Decision that selected the geometry,
   the Requirement the geometry satisfies, the ValidationClaim the
   FEM result supports. AtoCore does not duplicate the external
   tool's primary representation; it stores the structured *facts
   about* it.
3. **The boundary is enforced by absence.** No write endpoint in
   AtoCore ever generates a `.prt`, a `.fem`, an export to a PLM
   schema, or a vendor purchase order. If we find ourselves wanting
   to add such an endpoint in V1, we should stop and reconsider
   the V1 scope.

## Why one-way and not bidirectional

Bidirectional sync between independent systems is one of the
hardest problems in engineering software. The honest reasons we
are not attempting it in V1:

1. **Schema drift.** External tools evolve their schemas
   independently. A bidirectional sync would have to track every
   schema version of every external tool we touch. That is a
   permanent maintenance tax.
2. **Conflict semantics.** When AtoCore and an external tool
   disagree on the same field, "who wins" is a per-tool, per-field
   decision. There is no general rule. Bidirectional sync would
   require us to specify that decision exhaustively.
3. **Trust hierarchy.** AtoCore's whole point is the trust
   hierarchy: trusted project state > entities > memories. If we
   let entities push values back into the external tools, we
   silently elevate AtoCore's confidence to "high enough to write
   to a CAD model", which it almost never deserves.
4. **Velocity.** A bidirectional engineering layer is a
   multi-year project. A one-way mirror is a months project. The
   value-to-effort ratio favors one-way for V1 by an enormous
   margin.
5. **Reversibility.** We can always add bidirectional sync later
   on a per-tool basis once V1 has shown itself to be useful. We
   cannot easily walk back a half-finished bidirectional sync that
   has already corrupted data in someone's CAD model.

## Per-tool stance for V1

| External tool | V1 stance | What AtoCore reads in | What AtoCore writes back |
|---|---|---|---|
| **KB-CAD** (Antoine's CAD knowledge base) | one-way mirror | structured exports of subsystems, components, materials, parameters via a documented JSON or CSV shape | nothing |
| **KB-FEM** (Antoine's FEM knowledge base) | one-way mirror | structured exports of analysis models, results, validation claims | nothing |
| **NX / Siemens NX** (the CAD tool itself) | not connected in V1 | nothing direct — only what KB-CAD exports about NX projects | nothing |
| **PKM (Obsidian / markdown vault)** | already connected via the ingestion pipeline (Phase 1) | full markdown/text corpus per the ingestion-waves doc | nothing |
| **Gitea repos** | already connected via the ingestion pipeline | repo markdown/text per project | nothing |
| **OpenClaw** (the LLM agent) | already connected via the read-only helper skill on the T420 | nothing — OpenClaw reads from AtoCore | nothing — OpenClaw does not write into AtoCore |
| **AtoDrive** (operational truth layer, future) | future: bidirectional with AtoDrive itself, but AtoDrive is internal to AtoCore so this isn't an external tool boundary | n/a in V1 | n/a in V1 |
| **PLM / vendor portals / cost systems** | not in V1 scope | nothing | nothing |

## What "one-way mirror" actually looks like in code

AtoCore exposes an ingestion endpoint per external tool that
accepts a structured export and turns it into entity candidates.
The endpoint is read-side from AtoCore's perspective (it reads
from a file or HTTP body), even though the external tool is the
one initiating the call.

Proposed V1 ingestion endpoints:

```
POST /ingest/kb-cad/export    body: KB-CAD export JSON
POST /ingest/kb-fem/export    body: KB-FEM export JSON
```

Each endpoint:

1. Validates the export against the documented schema
2. Maps each export record to an entity candidate (status="candidate")
3. Carries the export's source identifier into the candidate's
   provenance fields (source_artifact_id, exporter_version, etc.)
4. Returns a summary: how many candidates were created, how many
   were dropped as duplicates, how many failed schema validation
5. Does NOT auto-promote anything
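The five steps above can be sketched as a plain function. This is a hedged sketch only: the candidate dict shape, the minimal validation rule, and the duplicate check are assumptions, not the real endpoint, and the real schema will come from the export-schema docs once they land.

```python
# Hedged sketch of the /ingest/kb-cad/export flow described above.
# Field names and candidate shape are illustrative assumptions.
def ingest_kb_cad_export(export, existing_ids):
    created, duplicates, failed = [], 0, 0
    for sub in export.get("subsystems", []):
        for comp in sub.get("components", []):
            # step 1: minimal schema validation (real one is stricter)
            if "id" not in comp or "source_artifact" not in comp:
                failed += 1
                continue
            # steps 2 and 5: duplicates are dropped, nothing auto-promotes
            if comp["id"] in existing_ids:
                duplicates += 1
                continue
            # step 3: map to an entity candidate, carrying provenance
            created.append({
                "status": "candidate",
                "entity_id": comp["id"],
                "material": comp.get("material"),
                "source_artifact_id": comp["source_artifact"],
                "exporter_version": export.get("exporter_version"),
            })
    # step 4: summary back to the exporter
    return {"created": len(created), "duplicates": duplicates,
            "failed": failed, "candidates": created}
```

Note that the function never mutates `existing_ids` or any active record; it only proposes candidates and reports a summary, which is the whole point of the one-way mirror.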

The KB-CAD and KB-FEM teams (which is to say, future-you) own the
exporter scripts that produce these JSON bodies. Those scripts
live in the KB-CAD / KB-FEM repos respectively, not in AtoCore.

## The export schemas (sketch, not final)

These are starting shapes, intentionally minimal. The schemas
will be refined in `kb-cad-export-schema.md` and
`kb-fem-export-schema.md` once the V1 ontology lands.

### KB-CAD export shape (starting sketch)

```json
{
  "exporter": "kb-cad",
  "exporter_version": "1.0.0",
  "exported_at": "2026-04-07T12:00:00Z",
  "project": "p05-interferometer",
  "subsystems": [
    {
      "id": "subsystem.optical-frame",
      "name": "Optical frame",
      "parent": null,
      "components": [
        {
          "id": "component.lateral-support-pad",
          "name": "Lateral support pad",
          "material": "GF-PTFE",
          "parameters": {
            "thickness_mm": 3.0,
            "preload_n": 12.0
          },
          "source_artifact": "kb-cad://p05/subsystems/optical-frame#lateral-support"
        }
      ]
    }
  ]
}
```

### KB-FEM export shape (starting sketch)

```json
{
  "exporter": "kb-fem",
  "exporter_version": "1.0.0",
  "exported_at": "2026-04-07T12:00:00Z",
  "project": "p05-interferometer",
  "analysis_models": [
    {
      "id": "model.optical-frame-modal",
      "name": "Optical frame modal analysis v3",
      "subsystem": "subsystem.optical-frame",
      "results": [
        {
          "id": "result.first-mode-frequency",
          "name": "First-mode frequency",
          "value": 187.4,
          "unit": "Hz",
          "supports_validation_claim": "claim.frame-rigidity-min-150hz",
          "source_artifact": "kb-fem://p05/models/optical-frame-modal#first-mode"
        }
      ]
    }
  ]
}
```

These shapes will evolve. The point of including them now is to
make the one-way mirror concrete: it is a small, well-defined
JSON shape, not "AtoCore reaches into KB-CAD's database".

## What AtoCore is allowed to do with the imported records

After ingestion, the imported records become entity candidates
in AtoCore's own table. From that point forward they follow the
exact same lifecycle as any other candidate:

- they sit at status="candidate" until a human reviews them
- the reviewer promotes them to status="active" or rejects them
- the active entities are queryable via the engineering query
  catalog (Q-001 through Q-020)
- the active entities can be referenced from Decisions, Requirements,
  ValidationClaims, etc. via the V1 relationship types

The imported records are never automatically pushed into trusted
project state, never modified in place after import (they are
superseded by re-imports, not edited), and never written back to
the external tool.

## What happens when KB-CAD changes a value AtoCore already has

This is the canonical "drift" scenario. The flow:

1. KB-CAD exports a fresh JSON. Component `component.lateral-support-pad`
   now has `material: "PEEK"` instead of `material: "GF-PTFE"`.
2. AtoCore's ingestion endpoint sees the same `id` and a different
   value.
3. The ingestion endpoint creates a new entity candidate with the
   new value, **does NOT delete or modify the existing active
   entity**, and creates a `conflicts` row linking the two members
   (per the conflict model doc).
4. The reviewer sees an open conflict on the next visit to
   `/conflicts`.
5. The reviewer either:
   - **promotes the new value** (the active is superseded, the
     candidate becomes the new active, the audit trail keeps both)
   - **rejects the new value** (the candidate is invalidated, the
     active stays — useful when the export was wrong)
   - **dismisses the conflict** (declares them not actually about
     the same thing, both stay active)

The reviewer never touches KB-CAD from AtoCore. If the resolution
implies a change in KB-CAD itself, the reviewer makes that change
in KB-CAD, then re-exports.
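Steps 2 and 3 of the drift flow can be sketched as follows. The record shapes are illustrative assumptions; the real conflict row will follow `conflict-model.md`.

```python
# Hedged sketch of drift handling: a re-import with the same id and
# a different value never edits the active entity in place.
def handle_reimport(active, incoming):
    """active: the current active entity row; incoming: the re-imported record."""
    if incoming["value"] == active["value"]:
        # identical re-push: only the last_imported_at timestamp refreshes
        return {"action": "touch"}
    candidate = {
        "entity_id": incoming["entity_id"],
        "value": incoming["value"],
        "status": "candidate",  # never auto-promoted
    }
    conflict = {
        "members": [active, candidate],
        "status": "open",       # surfaces via /conflicts for the reviewer
    }
    return {"action": "conflict", "candidate": candidate, "conflict": conflict}
```

The invariant the sketch enforces is that the active row is never mutated; resolution (promote, reject, dismiss) is left entirely to the reviewer.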

## What about NX directly?

NX (Siemens NX) is the underlying CAD tool that KB-CAD wraps.
**NX is not connected to AtoCore in V1.** Any facts about NX
projects flow through KB-CAD as the structured intermediate. This
gives us:

- **One schema to maintain.** AtoCore only has to understand the
  KB-CAD export shape, not the NX API.
- **One ownership boundary.** KB-CAD owns the question of "what's
  in NX". AtoCore owns the question of "what's in the typed
  knowledge base".
- **Future flexibility.** When NX is replaced or upgraded, only
  KB-CAD has to adapt; AtoCore doesn't notice.

The same logic applies to FEM solvers (Nastran, Abaqus, ANSYS):
KB-FEM is the structured intermediate; AtoCore never talks to the
solver directly.

## The hard-line invariants

These are the things V1 will not do, regardless of how convenient
they might seem:

1. **No writes to external tools.** No POST/PUT/PATCH to any
   external API, no file generation that gets written into a
   CAD/FEM project tree, no email/chat sends.
2. **No live polling.** AtoCore does not poll KB-CAD or KB-FEM on
   a schedule. Imports are explicit pushes from the external tool
   into AtoCore's ingestion endpoint.
3. **No silent merging.** Every value drift surfaces as a
   conflict for the reviewer (per the conflict model doc).
4. **No schema fan-out.** AtoCore does not store every field that
   KB-CAD knows about. Only fields that map to one of the V1
   entity types make it into AtoCore. Everything else is dropped
   at the import boundary.
5. **No external-tool-specific logic in entity types.** A
   `Component` in AtoCore is the same shape regardless of whether
   it came from KB-CAD, KB-FEM, the PKM, or a hand-curated
   project state entry. The source is recorded in provenance,
   not in the entity shape.

## What this enables

With the one-way mirror locked in, V1 implementation can focus on:

- The entity table and its lifecycle
- The two `/ingest/kb-cad/export` and `/ingest/kb-fem/export`
  endpoints with their JSON validators
- The candidate review queue extension (already designed in
  `promotion-rules.md`)
- The conflict model (already designed in `conflict-model.md`)
- The query catalog implementation (already designed in
  `engineering-query-catalog.md`)

None of those are unbounded. Each is a finite, well-defined
implementation task. The one-way mirror is the choice that makes
V1 finishable.

## What V2 might consider (deferred)

After V1 has been live and demonstrably useful for a quarter or
two, these questions become reasonable to revisit:

1. **Selective write-back to KB-CAD for low-risk fields.** For
   example, AtoCore could push back a "Decision id linked to this
   component" annotation that KB-CAD then displays without it
   being canonical there. Read-only annotations from AtoCore's
   perspective, advisory metadata from KB-CAD's perspective.
2. **Live polling for very small payloads.** A daily poll of
   "what subsystem ids exist in KB-CAD now" so AtoCore can flag
   subsystems that disappeared from KB-CAD without an explicit
   AtoCore invalidation.
3. **Direct NX integration** if the KB-CAD layer becomes a
   bottleneck — but only if the friction is real, not theoretical.
4. **Cost / vendor / PLM connections** for projects where the
   procurement cycle is part of the active engineering work.

None of these are V1 work; they are listed only so the V1
design intentionally leaves room for them later.

## Open questions for the V1 implementation sprint

1. **Where do the export schemas live?** Probably in
   `docs/architecture/kb-cad-export-schema.md` and
   `docs/architecture/kb-fem-export-schema.md`, drafted during
   the implementation sprint.
2. **Who runs the exporter?** A scheduled job on the KB-CAD /
   KB-FEM hosts, triggered by the human after a meaningful
   change, or both?
3. **Is the export incremental or full?** Full is simpler but
   more expensive. Incremental needs delta semantics. V1 starts
   with full and revisits when full becomes too slow.
4. **How is the exporter authenticated to AtoCore?** Probably
   the existing PAT model (one PAT per exporter, scoped to
   `write:engineering-import` once that scope exists). Worth a
   quick auth design pass before the endpoints exist.

## TL;DR

- AtoCore is a one-way mirror in V1: external tools push,
  AtoCore reads, AtoCore never writes back
- Two import endpoints for V1: KB-CAD and KB-FEM, each with a
  documented JSON export shape
- Drift surfaces as conflicts in the existing conflict model
- No NX, no FEM solvers, no PLM, no vendor portals, no
  cost/BOM systems in V1
- Bidirectional sync is reserved for V2+ on a per-tool basis,
  only after V1 demonstrates value

155
docs/atocore-ecosystem-and-hosting.md
Normal file
@@ -0,0 +1,155 @@

# AtoCore Ecosystem And Hosting

## Purpose

This document defines the intended boundaries between the Ato ecosystem layers
and the current hosting model.

## Ecosystem Roles

- `AtoCore`
  - runtime, ingestion, retrieval, memory, context builder, API
  - owns the machine-memory and context assembly system
- `AtoMind`
  - future intelligence layer
  - will own promotion, reflection, conflict handling, and trust decisions
- `AtoVault`
  - human-readable memory source
  - intended for Obsidian and manual inspection/editing
- `AtoDrive`
  - trusted operational project source
  - curated project truth with higher trust than general notes

## Trust Model

Current intended trust precedence:

1. Trusted Project State
2. AtoDrive artifacts
3. Recent validated memory
4. AtoVault summaries
5. PKM chunks
6. Historical or low-confidence material

## Storage Boundaries

Human-readable source layers and machine operational storage must remain
separate.

- `AtoVault` is a source layer, not the live vector database
- `AtoDrive` is a source layer, not the live vector database
- machine operational state includes:
  - SQLite database
  - vector store
  - indexes
  - embeddings
  - runtime metadata
  - cache and temp artifacts

The machine database is derived operational state, not the primary
human-readable source of truth.

## Source Snapshot Vs Machine Store

The human-readable files visible under `sources/vault` or `sources/drive` are
not the final "smart storage" format of AtoCore.

They are source snapshots made visible to the canonical Dalidou instance so
AtoCore can ingest them.

The actual machine-processed state lives in:

- `source_documents`
- `source_chunks`
- vector embeddings and indexes
- project memories
- trusted project state
- context-builder output

This means the staged markdown can still look very similar to the original PKM
or repo docs. That is normal.

The intelligence does not come from rewriting everything into a new markdown
vault. It comes from ingesting selected source material into the machine store
and then using that store for retrieval, trust-aware context assembly, and
memory.

## Canonical Hosting Model
|
||||
|
||||
Dalidou is the canonical host for the AtoCore service and machine database.
|
||||
|
||||
OpenClaw on the T420 should consume AtoCore over API and network, ideally over
|
||||
Tailscale or another trusted internal network path.
|
||||
|
||||
The live SQLite and vector store must not be treated as a multi-node synced
|
||||
filesystem. The architecture should prefer one canonical running service over
|
||||
file replication of the live machine store.
|
||||
|
||||
## Canonical Dalidou Layout
|
||||
|
||||
```text
|
||||
/srv/storage/atocore/
|
||||
app/ # deployed AtoCore repository
|
||||
data/ # canonical machine state
|
||||
db/
|
||||
chroma/
|
||||
cache/
|
||||
tmp/
|
||||
sources/ # human-readable source inputs
|
||||
vault/
|
||||
drive/
|
||||
logs/
|
||||
backups/
|
||||
run/
|
||||
```

## Operational Rules

- source directories are treated as read-only by the AtoCore runtime
- Dalidou holds the canonical machine DB
- OpenClaw should use AtoCore as an additive context service
- OpenClaw must continue to work if AtoCore is unavailable
- write-back from OpenClaw into AtoCore is deferred until later phases
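
The fail-open rule above is the key contract for the helper path. A minimal sketch of what it implies for a client, assuming the `/context/build` endpoint and a hypothetical base URL (this is not the shipped helper code):

```python
# Sketch of the fail-open rule: if AtoCore is unreachable, return nothing
# and let OpenClaw proceed without the extra context. The URL and timeout
# values here are illustrative assumptions.
import json
import urllib.error
import urllib.request

def fetch_context(prompt: str, base_url: str = "http://dalidou:8100") -> str:
    """Ask AtoCore for context; return "" if the service is unreachable."""
    req = urllib.request.Request(
        f"{base_url}/context/build",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=3) as resp:
            return resp.read().decode()
    except (urllib.error.URLError, TimeoutError, OSError):
        return ""  # fail open: OpenClaw continues without AtoCore context
```

The important design choice is that every failure mode collapses to an empty context rather than an exception that would break the OpenClaw session.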

Current staging behavior:

- selected project docs may be copied into a readable staging area on Dalidou
- AtoCore ingests from that staging area into the machine store
- the staging area is not itself the durable intelligence layer
- changes to the original PKM or repo source do not propagate automatically
  until a refresh or re-ingest happens

## Intended Daily Operating Model

The target workflow is:

- the human continues to work primarily in PKM project notes, Git/Gitea repos,
  Discord, and normal OpenClaw sessions
- OpenClaw keeps its own runtime behavior and memory system
- AtoCore acts as the durable external context layer that compiles trusted
  project state, retrieval, and long-lived machine-readable context
- AtoCore improves prompt quality and robustness without replacing direct repo
  work, direct file reads, or OpenClaw's own memory

In other words:

- PKM and repos remain the human-authoritative project sources
- OpenClaw remains the active operating environment
- AtoCore remains the compiled context engine and machine-memory host

## Current Status

As of the current implementation pass:

- the AtoCore runtime is deployed on Dalidou
- the canonical machine-data layout exists on Dalidou
- the service is running from Dalidou
- the T420/OpenClaw machine can reach AtoCore over network
- a first read-only OpenClaw-side helper exists
- the live corpus now includes initial AtoCore self-knowledge and a first
  curated batch for active projects
- the long-term content corpus still needs broader project and vault ingestion

This means the platform is hosted on Dalidou now, the first cross-machine
integration path exists, and the live content corpus is partially populated but
not yet fully ingested.
442
docs/backup-restore-procedure.md
Normal file
@@ -0,0 +1,442 @@
# AtoCore Backup and Restore Procedure

## Scope

This document defines the operational procedure for backing up and
restoring AtoCore's machine state on the Dalidou deployment. It is
the practical companion to `docs/backup-strategy.md` (which defines
the strategy) and `src/atocore/ops/backup.py` (which implements the
mechanics).

The intent is that this procedure can be followed by anyone with
SSH access to Dalidou and the AtoCore admin endpoints.

## What gets backed up

A `create_runtime_backup` snapshot contains, in order of importance:

| Artifact | Source path on Dalidou | Backup destination | Always included |
|---|---|---|---|
| SQLite database | `/srv/storage/atocore/data/db/atocore.db` | `<backup_root>/db/atocore.db` | yes |
| Project registry JSON | `/srv/storage/atocore/config/project-registry.json` | `<backup_root>/config/project-registry.json` | yes (if file exists) |
| Backup metadata | (generated) | `<backup_root>/backup-metadata.json` | yes |
| Chroma vector store | `/srv/storage/atocore/data/chroma/` | `<backup_root>/chroma/` | only when `include_chroma=true` |

The SQLite snapshot uses the online `conn.backup()` API and is safe
to take while the database is in use. The Chroma snapshot is a cold
directory copy and is **only safe when no ingestion is running**;
the API endpoint enforces this by acquiring the ingestion lock for
the duration of the copy.

What is **not** in the backup:

- Source documents under `/srv/storage/atocore/sources/vault/` and
  `/srv/storage/atocore/sources/drive/`. These are read-only
  inputs and live in the user's PKM/Drive, which are backed up
  separately by the user's own systems.
- Application code. The container image is the source of truth for
  code; recovery means rebuilding the image, not restoring code from
  a backup.
- Logs under `/srv/storage/atocore/logs/`.
- Embeddings cache under `/srv/storage/atocore/data/cache/`.
- Temp files under `/srv/storage/atocore/data/tmp/`.

## Backup root layout

Each backup snapshot lives in its own timestamped directory:

```
/srv/storage/atocore/backups/snapshots/
├── 20260407T060000Z/
│   ├── backup-metadata.json
│   ├── db/
│   │   └── atocore.db
│   ├── config/
│   │   └── project-registry.json
│   └── chroma/              # only if include_chroma=true
│       └── ...
├── 20260408T060000Z/
│   └── ...
└── ...
```

The timestamp is UTC, format `YYYYMMDDTHHMMSSZ`.
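
The stamp is simple enough to generate and parse with the standard library. A small sketch for reference (AtoCore's own implementation lives in `src/atocore/ops/backup.py` and may differ):

```python
# Sketch of the snapshot stamp format described above: UTC, YYYYMMDDTHHMMSSZ.
from datetime import datetime, timezone

STAMP_FORMAT = "%Y%m%dT%H%M%SZ"

def make_stamp(now: datetime) -> str:
    """Format an aware datetime as a snapshot directory stamp."""
    return now.astimezone(timezone.utc).strftime(STAMP_FORMAT)

def parse_stamp(stamp: str) -> datetime:
    """Recover the UTC datetime from a snapshot directory name."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
```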

## Triggering a backup

### Option A — via the admin endpoint (preferred)

```bash
# DB + registry only (fast, safe at any time)
curl -fsS -X POST http://dalidou:8100/admin/backup \
  -H "Content-Type: application/json" \
  -d '{"include_chroma": false}'

# DB + registry + Chroma (acquires ingestion lock)
curl -fsS -X POST http://dalidou:8100/admin/backup \
  -H "Content-Type: application/json" \
  -d '{"include_chroma": true}'
```

The response is the backup metadata JSON. Save the `backup_root`
field — that's the directory the snapshot was written to.

### Option B — via the standalone script (when the API is down)

```bash
docker exec atocore python -m atocore.ops.backup
```

This runs `create_runtime_backup()` directly, without going through
the API or the ingestion lock. Use it only when the AtoCore service
itself is unhealthy and you can't hit the admin endpoint.

### Option C — manual file copy (last resort)

If both the API and the standalone script are unusable:

```bash
sudo systemctl stop atocore    # or: docker compose stop atocore
sudo cp /srv/storage/atocore/data/db/atocore.db \
  /srv/storage/atocore/backups/manual-$(date -u +%Y%m%dT%H%M%SZ).db
sudo cp /srv/storage/atocore/config/project-registry.json \
  /srv/storage/atocore/backups/manual-$(date -u +%Y%m%dT%H%M%SZ).registry.json
sudo systemctl start atocore
```

This is a cold backup and requires brief downtime.

## Listing backups

```bash
curl -fsS http://dalidou:8100/admin/backup
```

Returns the configured `backup_dir` and a list of all snapshots
under it, with their full metadata if available.

Or, on the host directly:

```bash
ls -la /srv/storage/atocore/backups/snapshots/
```

## Validating a backup

Before relying on a backup for restore, validate it:

```bash
curl -fsS http://dalidou:8100/admin/backup/20260407T060000Z/validate
```

The validator:

- confirms the snapshot directory exists
- opens the SQLite snapshot and runs `PRAGMA integrity_check`
- parses the registry JSON
- confirms the Chroma directory exists (if it was included)

A valid backup returns `"valid": true` and an empty `errors` array.
A failing validation returns `"valid": false` with one or more
specific error strings (e.g. `db_integrity_check_failed`,
`registry_invalid_json`, `chroma_snapshot_missing`).

**Validate every backup at creation time.** A backup that has never
been validated is not actually a backup — it's just a hopeful copy
of bytes.

## Restore procedure

Since 2026-04-09 the restore is implemented as a proper module
function plus CLI entry point: `restore_runtime_backup()` in
`src/atocore/ops/backup.py`, invoked as
`python -m atocore.ops.backup restore <STAMP> --confirm-service-stopped`.
It automatically takes a pre-restore safety snapshot (your rollback
anchor), handles SQLite WAL/SHM cleanly, restores the registry, and
runs `PRAGMA integrity_check` on the restored db. This replaces the
earlier manual `sudo cp` sequence.

The function refuses to run without `--confirm-service-stopped`.
This is deliberate: hot-restoring into a running service corrupts
SQLite state.

### Pre-flight (always)

1. Identify which snapshot you want to restore. List available
   snapshots and pick by timestamp:
   ```bash
   curl -fsS http://127.0.0.1:8100/admin/backup | jq '.backups[].stamp'
   ```
2. Validate it. Refuse to restore an invalid backup:
   ```bash
   STAMP=20260409T060000Z
   curl -fsS http://127.0.0.1:8100/admin/backup/$STAMP/validate | jq .
   ```
3. **Stop AtoCore.** SQLite cannot be hot-restored under a running
   process and Chroma will not pick up new files until the process
   restarts.
   ```bash
   cd /srv/storage/atocore/app/deploy/dalidou
   docker compose down
   docker compose ps   # atocore should be Exited/gone
   ```

### Run the restore

Use a one-shot container that reuses the live service's volume
mounts so every path (`db_path`, `chroma_path`, backup dir) resolves
to the same place the main service would see:

```bash
cd /srv/storage/atocore/app/deploy/dalidou
docker compose run --rm --entrypoint python atocore \
  -m atocore.ops.backup restore \
  $STAMP \
  --confirm-service-stopped
```

Output is a JSON document. The critical fields:

- `pre_restore_snapshot`: stamp of the safety snapshot of live
  state taken right before the restore. **Write this down.** If
  the restore was the wrong call, this is how you roll it back.
- `db_restored`: should be `true`
- `registry_restored`: `true` if the backup captured a registry
- `chroma_restored`: `true` if the backup captured a chroma tree
  and `include_chroma` resolved to true (default)
- `restored_integrity_ok`: **must be `true`** — if this is false,
  STOP and do not start the service; investigate the integrity
  error first. The restored file is still on disk but untrusted.

### Controlling the restore

The CLI supports a few flags for finer control:

- `--no-pre-snapshot` skips the pre-restore safety snapshot. Use
  this only when you know you have another rollback path.
- `--no-chroma` restores only SQLite + registry, leaving the
  current Chroma dir alone. Useful if Chroma is consistent but
  SQLite needs a rollback.
- `--chroma` forces Chroma restoration even if the metadata
  doesn't clearly indicate the snapshot has it (rare).

### Chroma restore and bind-mounted volumes

The Chroma dir on Dalidou is a bind-mounted Docker volume. The
restore cannot `rmtree` the destination (you can't unlink a mount
point — it raises `OSError [Errno 16] Device or resource busy`),
so the function clears the dir's contents and uses
`copytree(dirs_exist_ok=True)` to copy the snapshot back in. The
regression test `test_restore_chroma_does_not_unlink_destination_directory`
in `tests/test_backup.py` captures the destination inode before
and after restore and asserts it's stable — the same invariant
that protects the bind mount.

This was discovered during the first real Dalidou restore drill
on 2026-04-09. If you see a new restore failure with
`Device or resource busy`, something has regressed this fix.

### Restart AtoCore

```bash
cd /srv/storage/atocore/app/deploy/dalidou
docker compose up -d
# Wait for /health to come up
for i in 1 2 3 4 5 6 7 8 9 10; do
  curl -fsS http://127.0.0.1:8100/health \
    && break || { echo "not ready ($i/10)"; sleep 3; }
done
```

**Note on build_sha after restore:** The one-shot `docker compose run`
container does not carry the build provenance env vars that `deploy.sh`
exports at deploy time. After a restore, `/health` will report
`build_sha: "unknown"` until you re-run `deploy.sh` or manually
re-deploy. This is cosmetic — the data is correctly restored — but if
you need `build_sha` to be accurate, run a redeploy after the restore:

```bash
cd /srv/storage/atocore/app
bash deploy/dalidou/deploy.sh
```

### Post-restore verification

```bash
# 1. Service is healthy
curl -fsS http://127.0.0.1:8100/health | jq .

# 2. Stats look right
curl -fsS http://127.0.0.1:8100/stats | jq .

# 3. Project registry loads
curl -fsS http://127.0.0.1:8100/projects | jq '.projects | length'

# 4. A known-good context query returns non-empty results
curl -fsS -X POST http://127.0.0.1:8100/context/build \
  -H "Content-Type: application/json" \
  -d '{"prompt": "what is p05 about", "project": "p05-interferometer"}' | jq '.chunks_used'
```

If any of these are wrong, the restore is bad. Roll back using the
pre-restore safety snapshot whose stamp you recorded from the
restore output. The rollback is the same procedure — stop the
service and restore that stamp:

```bash
docker compose down
docker compose run --rm --entrypoint python atocore \
  -m atocore.ops.backup restore \
  $PRE_RESTORE_SNAPSHOT_STAMP \
  --confirm-service-stopped \
  --no-pre-snapshot
docker compose up -d
```

(`--no-pre-snapshot` because the rollback itself doesn't need one;
you already have the original snapshot as a fallback if everything
goes sideways.)

### Restore drill

The restore is exercised at three levels:

1. **Unit tests.** `tests/test_backup.py` has seven restore tests
   (refuse-without-confirm, invalid backup, full round-trip,
   Chroma round-trip, inode-stability regression, WAL sidecar
   cleanup, skip-pre-snapshot). These run in CI on every commit.
2. **Module-level round-trip.**
   `test_restore_round_trip_reverses_post_backup_mutations` is
   the canonical drill in code form: seed baseline, snapshot,
   mutate, restore, then assert the mutation was reversed, the
   baseline survived, and the pre-restore snapshot captured the
   mutation.
3. **Live drill on Dalidou.** Periodically run the full procedure
   against the real service with a disposable drill-marker
   memory (created via `POST /memory` with `memory_type=episodic`
   and `project=drill`), following the sequence above and then
   verifying the marker is gone afterward via
   `GET /memory?project=drill`. The first such drill on
   2026-04-09 surfaced the bind-mount bug; future runs
   primarily exist to verify the fix stays fixed.

Run the live drill:

- **Before** enabling any new write-path automation (auto-capture,
  automated ingestion, reinforcement sweeps).
- **After** any change to `src/atocore/ops/backup.py` or to
  schema migrations in `src/atocore/models/database.py`.
- **After** a Dalidou OS upgrade or docker version bump.
- **At least once per quarter** as a standing operational check.
- **After any incident** that touched the storage layer.

Record each drill run (stamp, pre-restore snapshot stamp, pass/fail,
any surprises) somewhere durable — a line in the project journal
or a git commit message is enough. A drill you ran once and never
again is barely more than a drill you never ran.

## Retention policy

- **Last 7 daily backups**: kept verbatim
- **Last 4 weekly backups** (Sunday): kept verbatim
- **Last 6 monthly backups** (1st of month): kept verbatim
- **Anything older**: deleted

The retention job is **not yet implemented** and is tracked as a
follow-up. Until then, the snapshots directory grows monotonically.
A simple cron-based cleanup script is the next step:

```cron
0 4 * * * /srv/storage/atocore/scripts/cleanup-old-backups.sh
```
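
Until that cleanup script exists, the keep/delete decision it would make can be sketched as a pure function over snapshot dates. This is a hypothetical sketch of the policy above, assuming one snapshot per day; it is not code that ships with AtoCore:

```python
# Hypothetical retention sketch: last 7 dailies, last 4 Sundays,
# last 6 first-of-month snapshots. Not shipped AtoCore code.
from datetime import date

def dates_to_keep(snapshot_dates):
    """Return the subset of snapshot dates the retention policy would keep."""
    ordered = sorted(set(snapshot_dates), reverse=True)  # newest first
    keep = set(ordered[:7])                              # last 7 dailies
    sundays = [d for d in ordered if d.isoweekday() == 7]
    keep.update(sundays[:4])                             # last 4 weeklies
    firsts = [d for d in ordered if d.day == 1]
    keep.update(firsts[:6])                              # last 6 monthlies
    return keep
```

A real script would map snapshot directory stamps to dates, compute this keep-set, and delete every snapshot directory not in it.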

## Common failure modes and what to do about them

| Symptom | Likely cause | Action |
|---|---|---|
| `db_integrity_check_failed` on validation | SQLite snapshot copied while a write was in progress, or disk corruption | Take a fresh backup and validate again. If it fails twice, suspect the underlying disk. |
| `registry_invalid_json` | Registry was being edited at backup time | Take a fresh backup. The registry is small so this is cheap. |
| Restore: `restored_integrity_ok: false` | Source snapshot was itself corrupt (validation should have caught it — file a bug) or copy was interrupted mid-write | Do NOT start the service. Validate the snapshot directly with `python -m atocore.ops.backup validate <STAMP>`, try a different older snapshot, or roll back to the pre-restore safety snapshot. |
| Restore: `OSError [Errno 16] Device or resource busy` on Chroma | Old code tried to `rmtree` the Chroma mount point. Fixed on 2026-04-09; guarded by `test_restore_chroma_does_not_unlink_destination_directory` | Ensure you're running a build from 2026-04-09 or later; if you must work around an older build, use `--no-chroma` and restore Chroma contents manually. |
| `chroma_snapshot_missing` after a restore | Snapshot was DB-only | Either rebuild via fresh ingestion or restore an older snapshot that includes Chroma. |
| Service won't start after restore | Permissions wrong on the restored files | Re-run `chown` for the atocore container user (e.g. `1000:1000`) on the data dir. |
| `/stats` returns 0 documents after restore | The SQL store was restored but the source paths in `source_documents` don't match the current Dalidou paths | The backup came from a different deployment. Don't trust this restore — it's pulling from the wrong layout. |
| Drill marker still present after restore | Wrong stamp, service still writing during `docker compose down`, or the restore JSON didn't report `db_restored: true` | Roll back via the pre-restore safety snapshot and retry with the correct source snapshot. |

## Open follow-ups (not yet implemented)

Tracked separately in `docs/next-steps.md` — the list below is the
backup-specific subset.

1. **Retention cleanup script**: see the cron entry above. The
   snapshots directory grows monotonically until this exists.
2. **Off-Dalidou backup target**: currently snapshots live on the
   same disk as the live data. A real disaster-recovery story
   needs at least one snapshot on a different physical machine.
   The simplest first step is a periodic `rsync` to the user's
   laptop or to another server.
3. **Backup encryption**: snapshots contain raw SQLite and JSON.
   Consider age/gpg encryption if backups will be shipped off-site.
4. **Automatic post-backup validation**: today the validator must
   be invoked manually. The `create_runtime_backup` function
   should call `validate_backup` on its own output and refuse to
   declare success if validation fails.
5. **Chroma backup is currently a full directory copy** every time.
   For large vector stores this gets expensive. A future
   improvement would be incremental snapshots via filesystem-level
   snapshotting (LVM, btrfs, ZFS).

**Done** (kept for historical reference):

- ~~Implement `restore_runtime_backup()` as a proper module
  function so the restore isn't a manual `sudo cp` dance~~ —
  landed 2026-04-09 in commit 3362080, followed by the
  Chroma bind-mount fix from the first real drill.
## Quickstart cheat sheet

```bash
# Daily backup (DB + registry only — fast)
curl -fsS -X POST http://127.0.0.1:8100/admin/backup \
  -H "Content-Type: application/json" -d '{}'

# Weekly backup (DB + registry + Chroma — slower, holds ingestion lock)
curl -fsS -X POST http://127.0.0.1:8100/admin/backup \
  -H "Content-Type: application/json" -d '{"include_chroma": true}'

# List backups
curl -fsS http://127.0.0.1:8100/admin/backup | jq '.backups[].stamp'

# Validate the most recent backup
LATEST=$(curl -fsS http://127.0.0.1:8100/admin/backup | jq -r '.backups[-1].stamp')
curl -fsS http://127.0.0.1:8100/admin/backup/$LATEST/validate | jq .

# Full restore (service must be stopped first)
cd /srv/storage/atocore/app/deploy/dalidou
docker compose down
docker compose run --rm --entrypoint python atocore \
  -m atocore.ops.backup restore $STAMP --confirm-service-stopped
docker compose up -d

# Live drill: exercise the full create -> mutate -> restore flow
# against the running service. The marker memory uses
# memory_type=episodic (valid types: identity, preference, project,
# episodic, knowledge, adaptation) and project=drill so it's easy
# to find via GET /memory?project=drill before and after.
#
# See the "Restore drill" section above for the full sequence.
STAMP=$(curl -fsS -X POST http://127.0.0.1:8100/admin/backup \
  -H 'Content-Type: application/json' \
  -d '{"include_chroma": true}' | jq -r '.backup_root' | awk -F/ '{print $NF}')

curl -fsS -X POST http://127.0.0.1:8100/memory \
  -H 'Content-Type: application/json' \
  -d '{"memory_type":"episodic","content":"DRILL-MARKER","project":"drill","confidence":1.0}'

cd /srv/storage/atocore/app/deploy/dalidou
docker compose down
docker compose run --rm --entrypoint python atocore \
  -m atocore.ops.backup restore $STAMP --confirm-service-stopped
docker compose up -d

# Marker should be gone:
curl -fsS 'http://127.0.0.1:8100/memory?project=drill' | jq .
```
80
docs/backup-strategy.md
Normal file
@@ -0,0 +1,80 @@
# AtoCore Backup Strategy

## Purpose

This document describes the current backup baseline for the Dalidou-hosted
AtoCore machine store.

The immediate goal is not full disaster-proof automation yet. The goal is to
have one safe, repeatable way to snapshot the most important writable state.

## Current Backup Baseline

Today, the safest hot-backup target is:

- SQLite machine database
- project registry JSON
- backup metadata describing what was captured

This is now supported by:

- `python -m atocore.ops.backup`

## What The Script Captures

The backup command creates a timestamped snapshot under:

- `ATOCORE_BACKUP_DIR/snapshots/<timestamp>/`

It currently writes:

- `db/atocore.db`
  - created with SQLite's backup API
- `config/project-registry.json`
  - copied if it exists
- `backup-metadata.json`
  - timestamp, paths, and backup notes

## What It Does Not Yet Capture

The current script does not hot-backup Chroma.

That is intentional.

For now, Chroma should be treated as one of:

- rebuildable derived state
- or something that needs a deliberate cold snapshot/export workflow

Until that workflow exists, do not rely on ad hoc live file copies of the
vector store while the service is actively writing.

## Dalidou Use

On Dalidou, the canonical machine paths are:

- DB:
  - `/srv/storage/atocore/data/db/atocore.db`
- registry:
  - `/srv/storage/atocore/config/project-registry.json`
- backups:
  - `/srv/storage/atocore/backups`

So a normal backup run should happen on Dalidou itself, not from another
machine.

## Next Backup Improvements

1. decide Chroma policy clearly
   - rebuild vs cold snapshot vs export
2. add a simple scheduled backup routine on Dalidou
3. add retention policy for old snapshots
4. optionally add a restore validation check
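
Improvement 2 could start as small as a single cron entry on Dalidou. A hypothetical sketch, reusing the standalone entry point; the schedule, log path, and use of `docker exec` are assumptions, not shipped config:

```cron
# Hypothetical: daily 03:00 snapshot via the standalone module entry point.
0 3 * * * docker exec atocore python -m atocore.ops.backup >> /srv/storage/atocore/logs/backup-cron.log 2>&1
```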

## Healthy Rule

Do not design around syncing the live machine DB/vector store between machines.

Back up the canonical Dalidou state.
Restore from Dalidou state.
Keep OpenClaw as a client of AtoCore, not a storage peer.
275
docs/current-state.md
Normal file
@@ -0,0 +1,275 @@
# AtoCore Current State

## Status Summary

AtoCore is no longer just a proof of concept. The local engine exists, the
correctness pass is complete, Dalidou now hosts the canonical runtime and
machine-storage location, and the T420/OpenClaw side now has a safe read-only
path to consume AtoCore. The live corpus is no longer just self-knowledge: it
now includes a first curated ingestion batch for the active projects.

## Phase Assessment

- completed
  - Phase 0
  - Phase 0.5
  - Phase 1
    - baseline complete
  - Phase 2
  - Phase 3
  - Phase 5
  - Phase 7
  - Phase 9 (Commits A/B/C: capture, reinforcement, extractor + review queue)
- partial
  - Phase 4
  - Phase 8
- not started
  - Phase 6
  - Phase 10
  - Phase 11
  - Phase 12
  - Phase 13
## What Exists Today

- ingestion pipeline
  - parser and chunker
- SQLite-backed memory and project state
- vector retrieval
- context builder
- API routes for query, context, health, and source status
- project registry and per-project refresh foundation
- project registration lifecycle:
  - template
  - proposal preview
  - approved registration
  - safe update of existing project registrations
  - refresh
- implementation-facing architecture notes for:
  - engineering knowledge hybrid architecture
  - engineering ontology v1
- env-driven storage and deployment paths
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
- T420/OpenClaw read-only AtoCore helper skill
- full active-project markdown/text corpus wave for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
## What Is True On Dalidou

- deployed repo location:
  - `/srv/storage/atocore/app`
- canonical machine DB location:
  - `/srv/storage/atocore/data/db/atocore.db`
- canonical vector store location:
  - `/srv/storage/atocore/data/chroma`
- source input locations:
  - `/srv/storage/atocore/sources/vault`
  - `/srv/storage/atocore/sources/drive`

The service and storage foundation are live on Dalidou.

The machine-data host is real and canonical.

The project registry is now also persisted in a canonical mounted config path on
Dalidou:

- `/srv/storage/atocore/config/project-registry.json`

The content corpus is partially populated now.

The Dalidou instance already contains:

- AtoCore ecosystem and hosting docs
- current-state and OpenClaw integration docs
- Master Plan V3
- Build Spec V1
- trusted project-state entries for `atocore`
- full staged project markdown/text corpora for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
- curated repo-context docs for:
  - `p05`: `Fullum-Interferometer`
  - `p06`: `polisher-sim`
- trusted project-state entries for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

Current live stats after the full active-project wave are now far beyond the
initial seed stage:

- more than `1,100` source documents
- more than `20,000` chunks
- a matching vector count

The broader long-term corpus is still not fully populated. Wider project and
vault ingestion remains a deliberate next step rather than something already
completed, but the corpus is now meaningfully seeded beyond AtoCore's own docs.

For human-readable quality review, the current staged project markdown corpus is
primarily visible under:

- `/srv/storage/atocore/sources/vault/incoming/projects`

This staged area is now useful for review because it contains the markdown/text
project docs that were actually ingested for the full active-project wave.

It is important to read this staged area correctly:

- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory,
  trusted project state, and context-builder outputs
## What Is True On The T420

- SSH access is working
- OpenClaw workspace inspected at `/home/papa/clawd`
- OpenClaw's own memory system remains unchanged
- a read-only AtoCore integration skill exists in the workspace:
  - `/home/papa/clawd/skills/atocore-context/`
- the T420 can successfully reach Dalidou AtoCore over network/Tailscale
- fail-open behavior has been verified for the helper path
- OpenClaw can now seed AtoCore in two distinct ways:
  - project-scoped memory entries
  - staged document ingestion into the retrieval corpus
- the helper now supports the practical registered-project lifecycle:
  - projects
  - project-template
  - propose-project
  - register-project
  - update-project
  - refresh-project
- the helper now also supports the first organic routing layer:
  - `detect-project "<prompt>"`
  - `auto-context "<prompt>" [budget] [project]`
- OpenClaw can now default to AtoCore for project-knowledge questions without
  requiring explicit helper commands from the human every time
|
||||
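The fail-open contract behind the helper path can be sketched in a few lines. This is an illustrative stdlib sketch, not the actual skill code: the `fetch_context` function name and the request body shape are assumptions; the `/context/build` endpoint and the `{"status": "unavailable", "fail_open": true}` sentinel mirror what this document describes elsewhere.

```python
import json
import urllib.error
import urllib.request


def fetch_context(prompt: str, base_url: str = "http://dalidou:8100",
                  timeout: float = 3.0) -> dict:
    """Ask AtoCore for a context pack; on ANY failure, fail open.

    Fail-open means the caller gets a harmless sentinel instead of an
    exception, so OpenClaw keeps working when Dalidou is unreachable.
    """
    req = urllib.request.Request(
        f"{base_url}/context/build",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, TimeoutError, ValueError):
        # Dalidou down, DNS failure, timeout, bad JSON: all degrade softly.
        return {"status": "unavailable", "fail_open": True}


# An unreachable base URL degrades to the sentinel instead of raising.
result = fetch_context("what is p05 status?", base_url="http://127.0.0.1:9")
print(result)
```

The key design point is the broad `except`: any transport or decoding problem is treated the same way, because the helper must never make OpenClaw worse than it was without AtoCore.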
## What Exists In Memory vs Corpus

These remain separate, and that is intentional.

In `/memory`:

- project-scoped curated memories now exist for:
  - `p04-gigabit`: 5 memories
  - `p05-interferometer`: 6 memories
  - `p06-polisher`: 8 memories

These are curated summaries and extracted stable project signals.

In `source_documents` / the retrieval corpus:

- full project markdown/text corpora are now present for the active project set
- retrieval is no longer limited to AtoCore self-knowledge only
- the current corpus is broad enough that ranking quality matters more than
  corpus presence alone
- underspecified prompts can still pull in historical or archive material, so
  project-aware routing and better ranking remain important

The source refresh model now has a concrete foundation in code:

- a project registry file defines known project ids, aliases, and ingest roots
- the API can list registered projects
- the API can return a registration template
- the API can preview a registration without mutating state
- the API can persist an approved registration
- the API can update an existing registered project without changing its canonical id
- the API can refresh one registered project at a time

This lifecycle is now coherent end to end for normal use.

The first live update passes on existing registered projects have now been
verified against `p04-gigabit` and `p05-interferometer`:

- the registration description can be updated safely
- the canonical project id remains unchanged
- refresh still behaves cleanly after the update
- `context/build` still returns useful project-specific context afterward

## Reliability Baseline

The runtime has now been hardened in a few practical ways:

- SQLite connections use a configurable busy timeout
- SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
- project registry writes are atomic file replacements rather than in-place rewrites
- a full runtime backup and restore path now exists and has been exercised on
  live Dalidou:
  - SQLite (hot online backup via `conn.backup()`)
  - project registry (file copy)
  - Chroma vector store (cold directory copy under `exclusive_ingestion()`)
  - backup metadata
  - `restore_runtime_backup()` with a CLI entry point
    (`python -m atocore.ops.backup restore <STAMP> --confirm-service-stopped`),
    a pre-restore safety snapshot for rollback, WAL/SHM sidecar cleanup, and
    `PRAGMA integrity_check` on the restored file
  - the first live drill on 2026-04-09 surfaced and fixed a Chroma restore bug
    on Docker bind-mounted volumes (`shutil.rmtree` on a mount point); a
    regression test now asserts the destination inode is stable across restore
- deploy provenance is visible end to end:
  - `/health` reports `build_sha`, `build_time`, and `build_branch` from env
    vars wired by `deploy.sh`
  - `deploy.sh` Step 6 verifies the live `build_sha` matches the just-built
    commit (exit code 6 on drift), so "live is current?" can be answered
    precisely, not just by `__version__`
  - `deploy.sh` Step 1.5 detects that the script itself changed in the pulled
    commit and re-execs into the fresh copy, so the deploy never silently runs
    the old script against new source

This does not eliminate every concurrency edge, but it materially improves the
current operational baseline.

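The SQLite side of this baseline (busy timeout, WAL mode, and the hot online backup via `conn.backup()`) can be demonstrated with the standard library alone. This is an illustrative sketch of the technique, not the actual `atocore.ops.backup` code; table and file names here are made up for the demo.

```python
import sqlite3
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# Open with a busy timeout and switch to WAL, as the hardened runtime does.
src = sqlite3.connect(workdir / "atocore.db", timeout=5.0)
src.execute("PRAGMA journal_mode=WAL")
src.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, text TEXT)")
src.execute("INSERT INTO memories (text) VALUES ('trusted fact')")
src.commit()

# Hot online backup: consistent even while the source connection stays open.
dst = sqlite3.connect(workdir / "backup.db")
src.backup(dst)

# The copy holds the data and passes an integrity check.
rows = dst.execute("SELECT text FROM memories").fetchall()
check = dst.execute("PRAGMA integrity_check").fetchone()[0]
print(rows, check)
```

`Connection.backup()` copies pages incrementally under the hood, which is why it is safe against a live database where a plain file copy is not.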
In `Trusted Project State`:

- each active seeded project now has a conservative trusted-state set
- promoted facts cover:
  - summary
  - core architecture or boundary decision
  - key constraints
  - next focus

This separation is healthy:

- memory stores distilled project facts
- corpus stores the underlying retrievable documents

## Immediate Next Focus

1. ~~Re-run the full backup/restore drill~~ — DONE 2026-04-11, full pass
   (db, registry, chroma, integrity all true)
2. ~~Turn on auto-capture of Claude Code sessions in conservative mode~~ —
   DONE 2026-04-11, Stop hook wired via `deploy/hooks/capture_stop.py` →
   `POST /interactions` with `reinforce=false`; kill switch via
   `ATOCORE_CAPTURE_DISABLED=1`
3. Run a short real-use pilot with auto-capture on, verify interactions are
   landing in Dalidou, review quality
4. Use the new T420-side organic routing layer in real OpenClaw workflows
5. Tighten retrieval quality for the now fully ingested active project corpora
6. Move to Wave 2 trusted-operational ingestion instead of blindly widening the
   raw corpus further
7. Keep the new engineering-knowledge architecture docs as implementation
   guidance while avoiding premature schema work
8. Expand the remaining boring operations baseline:
   - retention policy cleanup script
   - off-Dalidou backup target (rsync or similar)
9. Only later consider write-back, reflection, or deeper autonomous behaviors

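The conservative capture shape behind item 2 (kill switch checked first, then a fail-open POST with `reinforce=false`) can be sketched as follows. This is an illustrative sketch, not the real `capture_stop.py`: the function name and payload field names are assumptions; the `ATOCORE_CAPTURE_DISABLED=1` kill switch and the `/interactions` endpoint come from the text above.

```python
import json
import os
import urllib.error
import urllib.request


def capture_interaction(prompt: str, response: str,
                        base_url: str = "http://dalidou:8100") -> str:
    # Kill switch: honor the opt-out before doing anything else.
    if os.environ.get("ATOCORE_CAPTURE_DISABLED") == "1":
        return "disabled"
    payload = json.dumps({
        "prompt": prompt,
        "response": response,
        "reinforce": False,   # conservative mode: record only, no reinforcement
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/interactions", data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=3.0):
            return "captured"
    except (urllib.error.URLError, TimeoutError):
        # Fail open: a down Dalidou must never break the session.
        return "skipped"


# With the kill switch set, nothing is sent at all.
os.environ["ATOCORE_CAPTURE_DISABLED"] = "1"
status = capture_interaction("demo prompt", "demo response")
print(status)
```

The ordering matters: the environment check precedes any network activity, so disabling capture has zero runtime cost and zero failure surface.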
See also:

- [ingestion-waves.md](ingestion-waves.md)
- [master-plan-status.md](master-plan-status.md)

## Guiding Constraints

- bad memory is worse than no memory
- trusted project state must remain highest priority
- human-readable sources and machine storage stay separate
- OpenClaw integration must not degrade OpenClaw baseline behavior

270
docs/dalidou-deployment.md
Normal file
@@ -0,0 +1,270 @@

# Dalidou Deployment

## Purpose

Deploy AtoCore on Dalidou as the canonical runtime and machine-memory host.

## Model

- Dalidou hosts the canonical AtoCore service.
- OpenClaw on the T420 consumes AtoCore over the network/Tailscale API.
- `sources/vault` and `sources/drive` are read-only inputs by convention.
- SQLite/Chroma machine state stays on Dalidou and is not treated as a sync peer.
- The app and machine-storage host can be live before the long-term content
  corpus is fully populated.

## Directory layout

```text
/srv/storage/atocore/
  app/        # deployed repo checkout
  data/
    db/
    chroma/
    cache/
    tmp/
  sources/
    vault/
    drive/
  logs/
  backups/
  run/
```

## Compose workflow

The compose definition lives in:

```text
deploy/dalidou/docker-compose.yml
```

The Dalidou environment file should be copied to:

```text
deploy/dalidou/.env
```

starting from:

```text
deploy/dalidou/.env.example
```

## First-time deployment steps

1. Place the repository under `/srv/storage/atocore/app` — ideally as a
   proper git clone so future updates can be pulled, not as a static
   snapshot:

   ```bash
   sudo git clone http://dalidou:3000/Antoine/ATOCore.git \
     /srv/storage/atocore/app
   ```

2. Create the canonical directories listed above.
3. Copy `deploy/dalidou/.env.example` to `deploy/dalidou/.env`.
4. Adjust the source paths if your AtoVault/AtoDrive mirrors live elsewhere.
5. Run:

   ```bash
   cd /srv/storage/atocore/app/deploy/dalidou
   docker compose up -d --build
   ```

6. Validate:

   ```bash
   curl http://127.0.0.1:8100/health
   curl http://127.0.0.1:8100/sources
   ```

## Updating a running deployment

**Use `deploy/dalidou/deploy.sh` for every code update.** It is the
one-shot sync script that:

- fetches the latest main from Gitea into `/srv/storage/atocore/app`
- (if the app dir is not a git checkout) backs it up as
  `<dir>.pre-git-<timestamp>` and re-clones
- rebuilds the container image
- restarts the container
- waits for `/health` to respond
- compares the reported `code_version` against the `__version__` in the
  freshly-pulled source, and exits non-zero if they don't match
  (deployment drift detection)

```bash
# Normal update from main
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Deploy a specific branch or tag
ATOCORE_BRANCH=codex/some-feature \
  bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Dry-run: show what would happen without touching anything
ATOCORE_DEPLOY_DRY_RUN=1 \
  bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Deploy from a remote host (e.g. the laptop) using the Tailscale
# or LAN address instead of loopback
ATOCORE_GIT_REMOTE=http://192.168.86.50:3000/Antoine/ATOCore.git \
  bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
```

The script is idempotent and safe to re-run. It never touches the
database directly — schema migrations are applied automatically at
service startup by the lifespan handler in `src/atocore/main.py`,
which calls `init_db()` (which in turn runs the ALTER TABLE
statements in `_apply_migrations`).

### Troubleshooting hostname resolution

`deploy.sh` defaults `ATOCORE_GIT_REMOTE` to
`http://127.0.0.1:3000/Antoine/ATOCore.git` (loopback) because the
hostname "dalidou" doesn't reliably resolve on the host itself —
the first real Dalidou deploy hit exactly this on 2026-04-08. If
you need to override (e.g. running `deploy.sh` from a laptop against
the Dalidou LAN), set `ATOCORE_GIT_REMOTE` explicitly.

The same applies to `scripts/atocore_client.py`: its default
`ATOCORE_BASE_URL` is `http://dalidou:8100` for remote callers, but
when running the client on Dalidou itself (or inside the container
via `docker exec`), override to loopback:

```bash
ATOCORE_BASE_URL=http://127.0.0.1:8100 \
  python scripts/atocore_client.py health
```

If you see `{"status": "unavailable", "fail_open": true}` from the
client, the first thing to check is whether the base URL resolves
from where you're running the client.

### The deploy.sh self-update race

When `deploy.sh` itself changes in the commit being pulled, the
first run after the update is still executing the *old* script from
the bash process's in-memory copy. `git reset --hard` updates the
file on disk, but the running bash has already loaded the
instructions. On 2026-04-09 this silently shipped an "unknown"
`build_sha` because the old Step 2 (which predated env-var export)
ran against fresh source.

`deploy.sh` now detects this: Step 1.5 compares the sha1 of `$0`
(the running script) against the sha1 of
`$APP_DIR/deploy/dalidou/deploy.sh` (the on-disk copy) after the
git reset. If they differ, it sets `ATOCORE_DEPLOY_REEXECED=1` and
`exec`s the fresh copy so the rest of the deploy runs under the new
script. The sentinel env var prevents infinite recursion.

You'll see this in the logs as:

```text
==> Step 1.5: deploy.sh changed in the pulled commit; re-exec'ing
==> running script hash: <old>
==> on-disk script hash: <new>
==> re-exec -> /srv/storage/atocore/app/deploy/dalidou/deploy.sh
```

To opt out (when debugging, for example), pre-set
`ATOCORE_DEPLOY_REEXECED=1` before invoking `deploy.sh` and the
self-update guard will be skipped.

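The guard logic reduces to: hash the running script, hash the on-disk copy, and re-exec with a sentinel when they differ. The real guard lives in bash inside `deploy.sh`; this Python analogue is illustrative only, with made-up file names, and it returns a boolean instead of actually exec'ing.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_sha1(path: Path) -> str:
    return hashlib.sha1(path.read_bytes()).hexdigest()


def would_reexec(running: Path, on_disk: Path) -> bool:
    """True when the guard would exec the on-disk copy of the script."""
    if os.environ.get("ATOCORE_DEPLOY_REEXECED") == "1":
        return False  # sentinel set: already re-exec'ed once, stop here
    if file_sha1(running) == file_sha1(on_disk):
        return False  # the pull did not change the script
    os.environ["ATOCORE_DEPLOY_REEXECED"] = "1"
    # The real bash guard execs the on-disk copy here, env carried over.
    return True


os.environ.pop("ATOCORE_DEPLOY_REEXECED", None)  # clean slate for the demo
tmp = Path(tempfile.mkdtemp())
old = tmp / "running.sh"
new = tmp / "on_disk.sh"
old.write_text("echo old step 2\n")
new.write_text("echo new step 2 exporting ATOCORE_BUILD_SHA\n")

first = would_reexec(old, new)    # hashes differ, no sentinel: re-exec
second = would_reexec(old, new)   # sentinel now set: no recursion
print(first, second)
```

The two-call demo shows why the sentinel is load-bearing: without it, a changed script would re-exec itself forever.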
### Deployment drift detection

`/health` reports drift signals at three increasing levels of
precision:

| Field | Source | Precision | When to use |
|---|---|---|---|
| `version` / `code_version` | `atocore.__version__` (manual bump) | coarse — same value across many commits | quick smoke check that the right *release* is running |
| `build_sha` | `ATOCORE_BUILD_SHA` env var, set by `deploy.sh` per build | precise — changes per commit | the canonical drift signal |
| `build_time` / `build_branch` | same env var path | per-build | forensics when multiple branches are in flight |

The **precise** check (run on the laptop or any host that can curl
the live service AND has the source repo at hand):

```bash
# What's actually running on Dalidou
LIVE_SHA=$(curl -fsS http://dalidou:8100/health | grep -o '"build_sha":"[^"]*"' | cut -d'"' -f4)

# What the deployed branch tip should be
EXPECTED_SHA=$(cd /srv/storage/atocore/app && git rev-parse HEAD)

# Compare
if [ "$LIVE_SHA" = "$EXPECTED_SHA" ]; then
  echo "live is current at $LIVE_SHA"
else
  echo "DRIFT: live $LIVE_SHA vs expected $EXPECTED_SHA"
  echo "run deploy.sh to sync"
fi
```

The `deploy.sh` script does exactly this comparison automatically
in its post-deploy verification step (Step 6) and exits non-zero
on mismatch. So the **simplest drift check** is just to run
`deploy.sh` — if there's nothing to deploy, it succeeds quickly;
if the live service is stale, it deploys and verifies.

If `/health` reports `build_sha: "unknown"`, the running container
was started without `deploy.sh` (probably via `docker compose up`
directly), and the build provenance was never recorded. Re-run
via `deploy.sh` to fix.

The coarse `code_version` check is still useful as a quick visual
sanity check — bumping `__version__` from `0.2.0` to `0.3.0`
signals a meaningful release boundary even if the precise
`build_sha` is what tools should compare against:

```bash
# Quick sanity check (coarse)
curl -s http://127.0.0.1:8100/health | grep -o '"code_version":"[^"]*"'
grep '__version__' /srv/storage/atocore/app/src/atocore/__init__.py
```

### Schema migrations on redeploy

When updating from an older `__version__`, the first startup after
the redeploy runs the idempotent ALTER TABLE migrations in
`_apply_migrations`. For a pre-0.2.0 → 0.2.0 upgrade, the migrations
add these columns to existing tables (all with safe defaults, so no
data is touched):

- `memories.project TEXT DEFAULT ''`
- `memories.last_referenced_at DATETIME`
- `memories.reference_count INTEGER DEFAULT 0`
- `interactions.response TEXT DEFAULT ''`
- `interactions.memories_used TEXT DEFAULT '[]'`
- `interactions.chunks_used TEXT DEFAULT '[]'`
- `interactions.client TEXT DEFAULT ''`
- `interactions.session_id TEXT DEFAULT ''`
- `interactions.project TEXT DEFAULT ''`

Plus new indexes on the new columns. No row data is modified. The
migration is safe to run against a database that already has the
columns — the `_column_exists` check makes each ALTER a no-op in
that case.

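The `_column_exists` pattern that makes these migrations idempotent can be sketched with stdlib `sqlite3`. This is an illustration of the technique, not the actual `_apply_migrations` code; the function names and the trimmed column list are for the demo only.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return column in cols


def apply_migrations(conn: sqlite3.Connection) -> None:
    # Each ALTER is guarded, so re-running against an already-migrated
    # database is a no-op and no row data is ever touched.
    wanted = [
        ("memories", "project", "TEXT DEFAULT ''"),
        ("memories", "reference_count", "INTEGER DEFAULT 0"),
    ]
    for table, column, decl in wanted:
        if not column_exists(conn, table, column):
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY)")
apply_migrations(conn)
apply_migrations(conn)  # second run: every ALTER is skipped, no error
ok = column_exists(conn, "memories", "project")
print(ok)
```

Running `apply_migrations` twice in the demo is the point: idempotence is what lets the lifespan handler run it unconditionally on every startup.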
Backup the database before any redeploy (via `POST /admin/backup`)
if you want a pre-upgrade snapshot. The migration is additive and
reversible by restoring the snapshot.

## Deferred

- backup automation
- restore/snapshot tooling
- reverse proxy / TLS exposure
- automated source ingestion job
- OpenClaw client wiring

## Current Reality Check

When this deployment is first brought up, the service may be healthy before the
real corpus has been ingested.

That means:

- AtoCore the system can already be hosted on Dalidou
- the canonical machine-data location can already be on Dalidou
- but the live knowledge/content corpus may still be empty or only partially
  loaded until source ingestion is run

61
docs/dalidou-storage-migration.md
Normal file
@@ -0,0 +1,61 @@

# Dalidou Storage Migration

## Goal

Establish Dalidou as the canonical AtoCore host while keeping human-readable
source layers separate from machine operational storage.

## Canonical layout

```text
/srv/atocore/
  app/          # git checkout of this repository
  data/         # machine operational state
    db/
      atocore.db
    chroma/
    cache/
    tmp/
  sources/
    vault/      # AtoVault input, read-only by convention
    drive/      # AtoDrive input, read-only by convention
  logs/
  backups/
  run/
  config/
    .env
```

## Environment variables

Suggested Dalidou values:

```bash
ATOCORE_ENV=production
ATOCORE_DATA_DIR=/srv/atocore/data
ATOCORE_DB_DIR=/srv/atocore/data/db
ATOCORE_CHROMA_DIR=/srv/atocore/data/chroma
ATOCORE_CACHE_DIR=/srv/atocore/data/cache
ATOCORE_TMP_DIR=/srv/atocore/data/tmp
ATOCORE_VAULT_SOURCE_DIR=/srv/atocore/sources/vault
ATOCORE_DRIVE_SOURCE_DIR=/srv/atocore/sources/drive
ATOCORE_LOG_DIR=/srv/atocore/logs
ATOCORE_BACKUP_DIR=/srv/atocore/backups
ATOCORE_RUN_DIR=/srv/atocore/run
```

## Migration notes

- Existing local installs remain backward-compatible.
- If `data/atocore.db` already exists, AtoCore continues using it.
- Fresh installs default to `data/db/atocore.db`.
- Source directories are inputs only; AtoCore should ingest from them but not
  treat them as writable runtime state.
- Avoid syncing live SQLite/Chroma state between Dalidou and other machines.
  Prefer one canonical running service and API access from OpenClaw.

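The backward-compatibility rule in the notes above (keep using a legacy `data/atocore.db` if one already exists, otherwise default fresh installs to `data/db/atocore.db`) reduces to a small path-resolution helper. This sketch is illustrative, not AtoCore's actual code; the function name is an assumption.

```python
import tempfile
from pathlib import Path


def resolve_db_path(data_dir: Path) -> Path:
    """Prefer the legacy flat location when it already exists."""
    legacy_path = data_dir / "atocore.db"
    if legacy_path.exists():
        return legacy_path                     # existing install: keep using it
    return data_dir / "db" / "atocore.db"      # fresh-install default


data_dir = Path(tempfile.mkdtemp())
fresh = resolve_db_path(data_dir)              # nothing on disk yet: new layout

(data_dir / "atocore.db").touch()              # simulate a legacy install
legacy = resolve_db_path(data_dir)             # same call now picks the old file
print(fresh, legacy)
```

Probing for the legacy file at startup, rather than recording a layout version anywhere, is what keeps existing installs working with zero migration steps.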
## Deferred work

- service manager wiring
- backup/snapshot procedures
- automated source registration jobs
- OpenClaw integration

129
docs/ingestion-waves.md
Normal file
@@ -0,0 +1,129 @@

# AtoCore Ingestion Waves

## Purpose

This document tracks how the corpus should grow without losing signal quality.

The rule is:

- ingest in waves
- validate retrieval after each wave
- only then widen the source scope

## Wave 1 - Active Project Full Markdown Corpus

Status: complete

Projects:

- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`

What was ingested:

- the full markdown/text PKM stacks for the three active projects
- selected staged operational docs already under the Dalidou source roots
- selected repo markdown/text context for:
  - `Fullum-Interferometer`
  - `polisher-sim`
  - `Polisher-Toolhead` (when markdown exists)

What was intentionally excluded:

- binaries
- images
- PDFs
- generated outputs, unless they were plain-text reports
- dependency folders
- hidden runtime junk

Practical result:

- AtoCore moved from a curated-seed corpus to a real active-project corpus
- the live corpus now contains well over one thousand source documents and over
  twenty thousand chunks
- project-specific context building is materially stronger than before

Main lesson from Wave 1:

- full project ingestion is valuable
- but broad historical/archive material can dilute retrieval for underspecified
  prompts
- context quality now depends more strongly on good project hints and better
  ranking than on corpus size alone

## Wave 2 - Trusted Operational Layer Expansion

Status: next

Goal:

- expand `AtoDrive`-style operational truth for the active projects

Candidate inputs:

- current status dashboards
- decision logs
- milestone tracking
- curated requirements baselines
- explicit next-step plans

Why this matters:

- this raises the quality of the high-trust layer instead of only widening
  general retrieval

## Wave 3 - Broader Active Engineering References

Status: planned

Goal:

- ingest reusable engineering references that support the active project set
  without dumping the entire vault

Candidate inputs:

- interferometry reference notes directly tied to `p05`
- polishing physics references directly tied to `p06`
- mirror and structural reference material directly tied to `p04`

Rule:

- only bring in references with a clear connection to active work

## Wave 4 - Wider PKM Population

Status: deferred

Goal:

- widen beyond the active projects while preserving retrieval quality

Preconditions:

- stronger ranking
- better project-aware routing
- a stable operational restore path
- clearer promotion rules for trusted state

## Validation After Each Wave

After every ingestion wave, verify:

- `stats`
- a project-specific `query`
- a project-specific `context-build`
- `debug-context`
- whether trusted project state still dominates when it should
- whether cross-project bleed is getting worse or better

## Working Rule

The next wave should only happen when the current wave is:

- ingested
- inspected
- retrieval-tested
- operationally stable

196
docs/master-plan-status.md
Normal file
@@ -0,0 +1,196 @@

# AtoCore Master Plan Status

## Current Position

AtoCore is currently between **Phase 7** and **Phase 8**.

The platform is no longer just a proof of concept. The local engine exists, the
core correctness pass is complete, Dalidou hosts the canonical runtime and
machine database, and OpenClaw on the T420 can consume AtoCore safely in
read-only additive mode.

## Phase Status

### Completed

- Phase 0 - Foundation
- Phase 0.5 - Proof of Concept
- Phase 1 - Ingestion

### Baseline Complete

- Phase 2 - Memory Core
- Phase 3 - Retrieval
- Phase 5 - Project State
- Phase 7 - Context Builder
- Phase 9 - Reflection (all three foundation commits landed:
  A capture, B reinforcement, C candidate extraction + review queue).
  As of 2026-04-11 the capture → reinforce half runs automatically on
  every Stop-hook capture (a length-aware token-overlap matcher handles
  paragraph-length memories), and project-scoped memories now reach
  the context pack via a dedicated `--- Project Memories ---` band
  between identity/preference and retrieved chunks. The extract half
  is still a manual / batch flow by design (`scripts/atocore_client.py
  batch-extract` + `triage`). The first live batch-extract run over 42
  captured interactions produced 1 candidate (the rule extractor is
  conservative and keys on structural cues like `## Decision:`
  headings that rarely appear in conversational LLM responses) —
  extractor tuning is a known follow-up.

### Partial

- Phase 4 - Identity / Preferences
- Phase 8 - OpenClaw Integration

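The length-aware token-overlap matching described for the reinforce half can be illustrated with a small Jaccard-style sketch. The thresholds, token cutoff, and normalization here are assumptions for the demo, not the shipped matcher; the idea shown is only that longer memories get a looser overlap threshold.

```python
def tokens(text: str) -> set[str]:
    return {w.lower().strip(".,:;!?") for w in text.split() if w}


def memory_matches(memory: str, response: str,
                   short_thresh: float = 0.8,
                   long_thresh: float = 0.5) -> bool:
    """Length-aware overlap: paragraph-length memories get a looser
    threshold, because exact phrasing rarely survives reformulation."""
    mem = tokens(memory)
    if not mem:
        return False
    overlap = len(mem & tokens(response)) / len(mem)
    threshold = long_thresh if len(mem) > 12 else short_thresh
    return overlap >= threshold


hit = memory_matches(
    "the canonical runtime lives on Dalidou",
    "Yes, the canonical AtoCore runtime lives on Dalidou now.",
)
miss = memory_matches(
    "completely unrelated fact about polishing",
    "hello",
)
print(hit, miss)
```

Normalizing the overlap by the memory's own token count (rather than the union) is what lets a short memory match inside a much longer response.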
### Not Yet Complete In The Intended Sense

- Phase 6 - AtoDrive
- Phase 10 - Write-back
- Phase 11 - Multi-model
- Phase 12 - Evaluation
- Phase 13 - Hardening

### Engineering Layer Planning Sprint

**Status: complete.** All 8 architecture docs are drafted. The
engineering layer is now ready for V1 implementation against the
active project set.

- [engineering-query-catalog.md](architecture/engineering-query-catalog.md) —
  the 20 v1-required queries the engineering layer must answer
- [memory-vs-entities.md](architecture/memory-vs-entities.md) —
  canonical home split between memory and entity tables
- [promotion-rules.md](architecture/promotion-rules.md) —
  Layer 0 → Layer 2 pipeline, triggers, review queue mechanics
- [conflict-model.md](architecture/conflict-model.md) —
  detection, representation, and resolution of contradictory facts
- [tool-handoff-boundaries.md](architecture/tool-handoff-boundaries.md) —
  KB-CAD / KB-FEM one-way mirror stance, ingest endpoints, drift handling
- [representation-authority.md](architecture/representation-authority.md) —
  canonical home matrix across PKM / KB / repos / AtoCore for 22 fact kinds
- [human-mirror-rules.md](architecture/human-mirror-rules.md) —
  templates, regeneration triggers, edit flow, "do not edit" enforcement
- [engineering-v1-acceptance.md](architecture/engineering-v1-acceptance.md) —
  measurable done definition with 23 acceptance criteria
- [engineering-knowledge-hybrid-architecture.md](architecture/engineering-knowledge-hybrid-architecture.md) —
  the 5-layer model (from the previous planning wave)
- [engineering-ontology-v1.md](architecture/engineering-ontology-v1.md) —
  the initial V1 object and relationship inventory (previous wave)
- [project-identity-canonicalization.md](architecture/project-identity-canonicalization.md) —
  the helper-at-every-service-boundary contract that keeps the
  trust hierarchy dependable across alias and canonical-id callers;
  required reading before adding new project-keyed entity surfaces
  in the V1 implementation sprint

The next concrete step is the V1 implementation sprint, which
should follow engineering-v1-acceptance.md as its checklist, and
must apply the project-identity-canonicalization contract at every
new service-layer entry point.

### LLM Client Integration

A separate but related architectural concern: how AtoCore is reachable
from many different LLM client contexts (OpenClaw, Claude Code, future
Codex skills, a future MCP server). The layering rule is documented in:

- [llm-client-integration.md](architecture/llm-client-integration.md) —
  three-layer shape: HTTP API → shared operator client
  (`scripts/atocore_client.py`) → per-agent thin frontends; the
  shared client is the canonical backbone every new client should
  shell out to instead of reimplementing HTTP calls

This sits implicitly between Phase 8 (OpenClaw) and Phase 11
(multi-model). Memory-review and engineering-entity commands are
deferred from the shared client until their workflows are exercised.

## What Is Real Today

- canonical AtoCore runtime on Dalidou
- canonical machine DB and vector store on Dalidou
- a project registry with:
  - template
  - proposal preview
  - register
  - update
  - refresh
- read-only additive OpenClaw helper on the T420
- seeded project corpus for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`
- conservative Trusted Project State for those active projects
- first operational backup foundation for SQLite + project registry
- implementation-facing architecture notes for future engineering knowledge work
- first organic routing layer in OpenClaw via:
  - `detect-project`
  - `auto-context`

## Now

These are the current practical priorities.

1. Finish practical OpenClaw integration
   - make the helper lifecycle feel natural in daily use
   - use the new organic routing layer for project-knowledge questions
   - confirm fail-open behavior remains acceptable
   - keep AtoCore clearly additive
2. Tighten retrieval quality
   - reduce cross-project competition
   - improve ranking on short or ambiguous prompts
   - add only a few anchor docs where retrieval is still weak
3. Continue controlled ingestion
   - deepen active projects selectively
   - avoid noisy bulk corpus growth
4. Strengthen operational boringness
   - backup and restore procedure
   - Chroma rebuild / backup policy
   - retention and restore validation

## Next

These are the next major layers after the current practical pass.

1. Clarify AtoDrive as a real operational truth layer
2. Mature identity / preferences handling
3. Improve observability for:
   - retrieval quality
   - context-pack inspection
   - comparison of behavior with and without AtoCore

## Later

These are the deliberate future expansions already supported by the architecture
direction, but not yet ready for immediate implementation.

1. Minimal engineering knowledge layer
   - driven by `docs/architecture/engineering-knowledge-hybrid-architecture.md`
   - guided by `docs/architecture/engineering-ontology-v1.md`
2. Minimal typed objects and relationships
3. Evidence-linking and provenance-rich structured records
4. Human mirror generation from structured state

## Not Yet

These remain intentionally deferred.

- automatic write-back from OpenClaw into AtoCore
- automatic memory promotion
- ~~reflection loop integration~~ — baseline now in (capture→reinforce
  automatic, extract batch/manual). Extractor tuning and scheduled batch
  extraction are still open.
- replacing OpenClaw's own memory system
- live machine-DB sync between machines
- full ontology / graph expansion before the current baseline is stable

## Working Rule

The next sensible implementation threshold for the engineering ontology work is:

- after the current ingestion, retrieval, registry, OpenClaw helper, organic
  routing, and backup baseline feel boring and dependable

Until then, the architecture docs should shape decisions, not force premature
schema work.

248 docs/next-steps.md Normal file
@@ -0,0 +1,248 @@
# AtoCore Next Steps

## Current Position

AtoCore now has:

- canonical runtime and machine storage on Dalidou
- separated source and machine-data boundaries
- initial self-knowledge ingested into the live instance
- trusted project-state entries for AtoCore itself
- a first read-only OpenClaw integration path on the T420
- a first real active-project corpus batch for:
  - `p04-gigabit`
  - `p05-interferometer`
  - `p06-polisher`

This working list should be read alongside:

- [master-plan-status.md](C:/Users/antoi/ATOCore/docs/master-plan-status.md)

## Immediate Next Steps

1. ~~Re-run the backup/restore drill~~ — DONE 2026-04-11, full pass
2. ~~Turn on auto-capture of Claude Code sessions~~ — DONE 2026-04-11, Stop hook via `deploy/hooks/capture_stop.py` → `POST /interactions` with `reinforce=false`; kill switch: `ATOCORE_CAPTURE_DISABLED=1`
2a. Run a short real-use pilot with auto-capture on
   - verify interactions are landing in Dalidou
   - check prompt/response quality and truncation
   - confirm fail-open: no user-visible impact when Dalidou is down
3. Use the T420 `atocore-context` skill and the new organic routing layer in real OpenClaw workflows
   - confirm `auto-context` feels natural
   - confirm project inference is good enough in practice
   - confirm the fail-open behavior remains acceptable in practice
4. Review retrieval quality after the first real project ingestion batch
   - check whether the top hits are useful
   - check whether trusted project state remains dominant
   - reduce cross-project competition and prompt ambiguity where needed
   - use `debug-context` to inspect the exact last AtoCore supplement
5. Treat the active-project full markdown/text wave as complete
   - `p04-gigabit`
   - `p05-interferometer`
   - `p06-polisher`
6. Define a cleaner source refresh model
   - make the difference between source truth, staged inputs, and machine store explicit
   - move toward a project source registry and refresh workflow
   - foundation now exists via project registry + per-project refresh API
   - registration policy + template + proposal + approved registration are now the normal path for new projects
7. Move to Wave 2 trusted-operational ingestion
   - curated dashboards
   - decision logs
   - milestone/current-status views
   - operational truth, not just raw project notes
8. Integrate the new engineering architecture docs into active planning, not immediate schema code
   - keep `docs/architecture/engineering-knowledge-hybrid-architecture.md` as the target layer model
   - keep `docs/architecture/engineering-ontology-v1.md` as the V1 structured-domain target
   - do not start entity/relationship persistence until the ingestion, retrieval, registry, and backup baseline feels boring and stable
9. Finish the boring operations baseline around backup
   - retention policy cleanup script (the snapshots dir grows monotonically today)
   - off-Dalidou backup target (at minimum an rsync to a laptop or another host so a single-disk failure isn't terminal)
   - automatic post-backup validation (have `create_runtime_backup` call `validate_backup` on its own output and refuse to declare success if validation fails)
   - DONE in commits be40994 / 0382238 / 3362080 / this one:
     - `create_runtime_backup` + `list_runtime_backups` + `validate_backup` + `restore_runtime_backup` with CLI
     - `POST /admin/backup` with `include_chroma=true` under the ingestion lock
     - `/health` build_sha / build_time / build_branch provenance
     - `deploy.sh` self-update re-exec guard + build_sha drift verification
     - live drill procedure in `docs/backup-restore-procedure.md` with a failure-mode table and the memory_type=episodic marker pattern from the 2026-04-09 drill
10. Keep deeper automatic runtime integration modest until the organic read-only model has proven value
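The item-2 capture hook shape above can be sketched roughly as follows. The endpoint (`POST /interactions`), the `reinforce=false` flag, and the `ATOCORE_CAPTURE_DISABLED=1` kill switch come from these notes; the payload field names, the base-URL variable, and the timeout are illustrative assumptions, not the real `capture_stop.py`.

```python
import json
import os
import urllib.request


def capture_interaction(prompt: str, response: str) -> bool:
    """POST one captured interaction; fail open on any error."""
    if os.environ.get("ATOCORE_CAPTURE_DISABLED") == "1":
        return False  # kill switch: capture is off, the session continues normally
    base_url = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8000")  # assumed variable
    body = json.dumps({
        "prompt": prompt,          # field names are assumptions
        "response": response,
        "reinforce": False,        # capture only; no confidence bump at this stage
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/interactions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False  # fail open: Dalidou being down must never block the user
```

The important property is the last branch: every failure mode collapses to "do nothing visible", which is what the pilot in step 2a is meant to confirm.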

## Trusted State Status

The first conservative trusted-state promotion pass is now complete for:

- `p04-gigabit`
- `p05-interferometer`
- `p06-polisher`

Each project now has a small set of stable entries covering:

- summary
- architecture or boundary decision
- key constraints
- current next focus

This materially improves `context/build` quality for project-hinted prompts.

## Recommended Near-Term Project Work

The active-project full markdown/text wave is now in.

The near-term work is now:

1. strengthen retrieval quality
2. promote or refine trusted operational truth where the broad corpus is now too noisy
3. keep trusted project state concise and high-confidence
4. widen only through named ingestion waves

## Recommended Next Wave Inputs

Wave 2 should emphasize trusted operational truth, not bulk historical notes.

P04:

- current status dashboard
- current selected design path
- current frame interface truth
- current next-step milestone view

P05:

- selected vendor path
- current error-budget baseline
- current architecture freeze or open decisions
- current procurement / next-action view

P06:

- current system map
- current shared contracts baseline
- current calibration procedure truth
- current July / proving roadmap view

## Deferred On Purpose

- automatic write-back from OpenClaw into AtoCore
- automatic memory promotion
- ~~reflection loop integration~~ — baseline now landed (2026-04-11): the Stop hook runs reinforce automatically, project memories are folded into the context pack, and batch-extract and triage CLIs exist. What remains deferred: scheduled/automatic batch extraction and extractor rule tuning (the rule-based extractor produced 1 candidate from 42 real captures — it needs new cues for conversational LLM content).
- replacing OpenClaw's own memory system
- syncing the live machine DB between machines

## Success Criteria For The Next Batch

The next batch is successful if:

- OpenClaw can use AtoCore naturally when context is needed
- OpenClaw can infer registered projects and call AtoCore organically for project-knowledge questions
- the active-project full corpus wave can be inspected and used concretely through `auto-context`, `context-build`, and `debug-context`
- OpenClaw can also register a new project cleanly before refreshing it
- existing project registrations can be refined safely before refresh when the staged source set evolves
- AtoCore answers correctly for the active project set
- retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
- trusted project state remains concise and high confidence
- project ingestion remains controlled rather than noisy
- the canonical Dalidou instance stays stable

## Retrieval Quality Review — 2026-04-11

First sweep with real project-hinted queries on Dalidou: used `POST /context/build` against p04, p05, and p06 with representative questions and inspected `formatted_context`.

Findings:

- **Trusted Project State is surfacing correctly.** The DECISION and REQUIREMENT categories appear at the top of the pack and include the expected key facts (e.g. p04 "Option B conical-back mirror architecture"). This is the strongest signal in the pack today.
- **Chunk retrieval is relevant on-topic but broad.** Top chunks for the p04 architecture query are the PDR intro, the CAD assembly overview, and the index — all on the right project, but none of them directly answer the "why was Option B chosen" question. The authoritative answer sits in Project State, not in the chunks.
- **Active memories are NOT reaching the pack.** The context builder surfaces Trusted Project State and retrieved chunks but does not include the 21 active project/knowledge memories. Reinforcement (Phase 9 Commit B) bumps memory confidence without the memory ever being read back into a prompt — the reflection loop has no outlet on the retrieval side. This is a design gap, not a bug: it needs a decision on whether memories should feed into context assembly, and if so at what trust level (below project_state, above chunks).
- **Cross-project bleed is low.** The p04 query did pull one p05 chunk (CGH_Design_Input_for_AOM) as the bottom hit, but the top four were all p04.

Proposed follow-ups (not yet scheduled):

1. ~~Decide whether memories should be folded into `formatted_context` and under what section header.~~ DONE 2026-04-11 (commits 8ea53f4, 5913da5, 1161645). A `--- Project Memories ---` band now sits between identity/preference and retrieved chunks, gated on a canonical project hint to prevent cross-project bleed. Budget ratio 0.25 (tuned empirically — paragraph memories are ~400 chars and the earlier 0.15 ratio starved the first entry by one char). Verified live: the p04 architecture query surfaces the Option B memory.
2. Re-run the same three queries after any builder change and compare `formatted_context` diffs — still open, and the natural entry point for the retrieval eval harness on the roadmap.
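The budget-ratio tuning in follow-up 1 can be illustrated with a toy model. The 0.15 → 0.25 ratios and the ~400-char memory size come from the note above; the whole-entry packing rule and the example budget value are assumptions for illustration only.

```python
def memories_that_fit(total_budget: int, ratio: float, memory_len: int = 400) -> int:
    """How many whole memories fit in the Project Memories band.

    A fixed share of the context budget is reserved for the band, and an
    entry is included only if it fits entirely (no truncated memories) —
    which is why a band one char too small starves the first entry.
    """
    band_chars = int(total_budget * ratio)
    return band_chars // memory_len


# With an illustrative 2660-char budget: 0.15 reserves a 399-char band
# (one char short of a 400-char memory), while 0.25 reserves 665 chars.
```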

## Reflection Loop Live Check — 2026-04-11

The first real run of `batch-extract` across 42 captured Claude Code interactions on Dalidou produced exactly **1 candidate**, and that candidate was a synthetic test capture from earlier in the session (rejected). Findings:

- The rule-based extractor in `src/atocore/memory/extractor.py` keys on explicit structural cues (decision headings like `## Decision: ...`, preference sentences, etc.). Real Claude Code responses are conversational and almost never contain those cues.
- This means the capture → extract half of the reflection loop is effectively inert against organic LLM sessions until either the rules are broadened (new cue families: "we chose X because...", "the selected approach is...", etc.) or an LLM-assisted extraction path is added alongside the rule-based one.
- Capture → reinforce is working correctly on live data (the length-aware matcher was verified on a live paraphrase of a p04 memory).

Follow-up candidates (not yet scheduled):

1. Extractor rule expansion — add conversational-form rules so real session text has a chance of surfacing candidates.
2. LLM-assisted extractor as a separate rule family, guarded by confidence and always landing in `status=candidate` (never active).
3. Retrieval eval harness — a diffable scorecard of `formatted_context` across a fixed question set per active project.

## Long-Run Goal

The long-run target is:

- continue working normally inside PKM project stacks and Gitea repos
- let OpenClaw keep its own memory and runtime behavior
- let AtoCore supplement LLM work with stronger trusted context, retrieval, and context assembly

That means AtoCore should behave like a durable external context engine and machine-memory layer, not a replacement for normal repo work or OpenClaw memory.
157 docs/openclaw-integration-contract.md Normal file
@@ -0,0 +1,157 @@
# OpenClaw Integration Contract

## Purpose

This document defines the first safe integration contract between OpenClaw and AtoCore.

The goal is to let OpenClaw consume AtoCore as an external context service without degrading OpenClaw's existing baseline behavior.

## Current Implemented State

The first safe integration foundation now exists on the T420 workspace:

- OpenClaw's own memory system is unchanged
- a local read-only helper skill exists at:
  - `/home/papa/clawd/skills/atocore-context/`
- the helper currently talks to the canonical Dalidou instance
- the helper has verified:
  - `health`
  - `project-state`
  - `query`
  - `detect-project`
  - `auto-context`
  - fail-open fallback when AtoCore is unavailable

This means the network and workflow foundation is working, and the first organic routing layer now exists, even though deeper autonomous integration into OpenClaw runtime behavior is still deferred.

## Integration Principles

- OpenClaw remains the runtime and orchestration layer
- AtoCore remains the context enrichment layer
- AtoCore is optional at runtime
- if AtoCore is unavailable, OpenClaw must continue operating normally
- initial integration is read-only
- OpenClaw should not automatically write memories, project state, or ingestion updates during the first integration batch

## First Safe Responsibilities

OpenClaw may use AtoCore for:

- health and readiness checks
- context building for contextual prompts
- retrieval/query support
- project-state lookup when a project is detected
- automatic project-context augmentation for project-knowledge questions

OpenClaw should not yet use AtoCore for:

- automatic memory write-back
- automatic reflection
- conflict resolution decisions
- replacing OpenClaw's own memory system

## First API Surface

OpenClaw should treat these as the initial contract:

- `GET /health`
  - check service readiness
- `GET /sources`
  - inspect source registration state
- `POST /context/build`
  - ask AtoCore for a budgeted context pack
- `POST /query`
  - use retrieval when useful

Additional project-state inspection can be added if needed, but the first integration should stay small and resilient.

## Current Helper Surface

The current helper script exposes:

- `health`
- `sources`
- `stats`
- `projects`
- `project-template`
- `detect-project <prompt>`
- `auto-context <prompt> [budget] [project]`
- `debug-context`
- `propose-project ...`
- `register-project ...`
- `update-project ...`
- `refresh-project <project>`
- `project-state <project>`
- `query <prompt> [top_k]`
- `context-build <prompt> [project] [budget]`
- `ingest-sources`

This means OpenClaw can now use the full practical registry lifecycle for known projects without dropping down to raw API calls.

## Failure Behavior

OpenClaw must treat AtoCore as additive.

If AtoCore times out, returns an error, or is unavailable:

- OpenClaw should continue with its own normal baseline behavior
- no hard dependency should block the user's run
- no partially written AtoCore state should be assumed

## Suggested OpenClaw Configuration

OpenClaw should eventually expose configuration like:

- `ATOCORE_ENABLED`
- `ATOCORE_BASE_URL`
- `ATOCORE_TIMEOUT_MS`
- `ATOCORE_FAIL_OPEN`

Recommended first behavior:

- enabled only when configured
- low timeout
- fail open by default
- no writeback enabled
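A minimal sketch of reading that configuration with the recommended defaults. The variable names are from the list above; the default timeout value and the `"1"`/`"0"` encoding are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AtoCoreConfig:
    enabled: bool
    base_url: str
    timeout_ms: int
    fail_open: bool


def load_atocore_config(env=os.environ) -> AtoCoreConfig:
    """Apply the recommended first behavior: enabled only when configured,
    low timeout, fail open by default (write paths are simply absent)."""
    base_url = env.get("ATOCORE_BASE_URL", "")
    return AtoCoreConfig(
        # "enabled only when configured": requires both the flag and a URL
        enabled=env.get("ATOCORE_ENABLED", "0") == "1" and bool(base_url),
        base_url=base_url,
        timeout_ms=int(env.get("ATOCORE_TIMEOUT_MS", "1500")),  # assumed default
        fail_open=env.get("ATOCORE_FAIL_OPEN", "1") == "1",
    )
```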

## Suggested Usage Pattern

1. OpenClaw receives a user request
2. If the prompt looks like project knowledge, OpenClaw should try:
   - `auto-context "<prompt>" 3000`
   - optionally `debug-context` immediately after if a human wants to inspect the exact AtoCore supplement
3. If the prompt is clearly asking for trusted current truth, OpenClaw should prefer:
   - `project-state <project>`
4. If the user explicitly asked for source refresh or ingestion, OpenClaw should use:
   - `refresh-project <id>`
5. If AtoCore returns usable context, OpenClaw includes it
6. If AtoCore fails, returns `no_project_match`, or is unavailable, OpenClaw proceeds normally
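Steps 2, 5, and 6 above can be sketched as one fail-open wrapper. The `auto-context` subcommand and the `no_project_match` sentinel are from this document; the helper invocation shape and the timeout are assumptions.

```python
import subprocess


def atocore_supplement(prompt: str, helper: str, budget: int = 3000) -> str:
    """Try `auto-context`; treat every failure mode as 'proceed normally'
    by returning an empty supplement."""
    try:
        out = subprocess.run(
            [helper, "auto-context", prompt, str(budget)],
            capture_output=True, text=True, timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""  # helper missing or hung: fail open
    if out.returncode != 0 or "no_project_match" in out.stdout:
        return ""  # error or no inferred project: fail open
    return out.stdout.strip()
```

The caller includes a non-empty result in its prompt and ignores an empty one, so AtoCore never becomes a hard dependency.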

## Deferred Work

- deeper automatic runtime wiring inside OpenClaw itself
- memory promotion rules
- identity and preference write flows
- reflection loop
- automatic ingestion requests from OpenClaw
- write-back policy
- conflict-resolution integration

## Precondition Before Wider Ingestion

Before bulk ingestion of projects or ecosystem notes:

- the AtoCore service should be reachable from the T420
- the OpenClaw failure fallback path should be confirmed
- the initial contract should be documented and stable
142 docs/operating-model.md Normal file
@@ -0,0 +1,142 @@
# AtoCore Operating Model

## Purpose

This document makes the intended day-to-day operating model explicit.

The goal is not to replace how work already happens. The goal is to make that existing workflow stronger by adding a durable context engine.

## Core Idea

Normal work continues in:

- PKM project notes
- Gitea repositories
- Discord and OpenClaw workflows

OpenClaw keeps:

- its own memory
- its own runtime and orchestration behavior
- its own workspace and direct file/repo tooling

AtoCore adds:

- trusted project state
- retrievable cross-source context
- durable machine memory
- context assembly that improves prompt quality and robustness

## Layer Responsibilities

- PKM and repos
  - human-authoritative project sources
  - where knowledge is created, edited, reviewed, and maintained
- OpenClaw
  - active operating environment
  - orchestration, direct repo work, messaging, agent workflows, local memory
- AtoCore
  - compiled context engine
  - durable machine-memory host
  - retrieval and context assembly layer

## Why This Architecture Works

Each layer has different strengths and weaknesses.

- PKM and repos are rich but noisy and manual to search
- OpenClaw memory is useful but session-shaped and not the whole project record
- raw LLM repo work is powerful but can miss trusted broader context
- AtoCore can compile context across sources and provide a better prompt input

The result should be:

- stronger prompts
- more robust outputs
- less manual reconstruction
- better continuity across sessions and models

## What AtoCore Should Not Replace

AtoCore should not replace:

- normal file reads
- direct repo search
- direct PKM work
- OpenClaw's own memory
- OpenClaw's runtime and tool behavior

It should supplement those systems.

## What Healthy Usage Looks Like

When working on a project:

1. OpenClaw still uses local workspace/repo context
2. OpenClaw still uses its own memory
3. AtoCore adds:
   - trusted current project state
   - retrieved project documents
   - cross-source project context
   - context assembly for more robust model prompts

## Practical Rule

Think of AtoCore as the durable external context hard drive for LLM work:

- fast machine-readable context
- persistent project understanding
- stronger prompt inputs
- no need to replace the normal project workflow

That is the architecture target.

## Why The Staged Markdown Exists

The staged markdown on Dalidou is a source-input layer, not the end product of the system.

In the current deployment model:

1. selected PKM, AtoDrive, or repo docs are copied or mirrored into a Dalidou source path
2. AtoCore ingests them
3. the machine store keeps the processed representation
4. retrieval and context building operate on that machine store

So if the staged docs look very similar to your original PKM notes, that is expected. They are source material, not the compiled context layer itself.

## What Happens When A Source Changes

If you edit a PKM note or repo doc at the original source, AtoCore does not magically know yet.

The current model is refresh-based:

1. update the human-authoritative source
2. refresh or re-stage the relevant project source set on Dalidou
3. run ingestion again
4. let AtoCore update the machine representation

This is still an intermediate workflow. The long-run target is a cleaner source registry and refresh model so that commands like `refresh p05-interferometer` become natural and reliable.
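Steps 3 and 4 of the refresh model can already be driven through the documented operator client. A hedged sketch (the client path and subcommands appear elsewhere in these docs; this wrapper itself is illustrative, not shipped code):

```python
import subprocess
import sys


def refresh_and_check(project_id: str, client: str = "scripts/atocore_client.py") -> bool:
    """After the staged source set has been updated (steps 1-2 are
    human/upstream), re-run ingestion for one project and read back its
    trusted state to confirm the machine store updated."""
    for subcommand in ("refresh-project", "project-state"):
        result = subprocess.run(
            [sys.executable, client, subcommand, project_id],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            return False  # surface the failure; do not assume a partial refresh
    return True
```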

## Current Scope Of Ingestion

The current project corpus is intentionally selective, not exhaustive.

For active projects, the goal right now is to ingest:

- high-value anchor docs
- strong meeting notes with real decisions
- architecture and constraints docs
- selected repo context that explains the system shape

The goal is not to dump the entire PKM or whole repo tree into AtoCore on the first pass.

So if a project only has some curated notes and not the full project universe in the staged area yet, that is normal for the current phase.
96 docs/operations.md Normal file
@@ -0,0 +1,96 @@
# AtoCore Operations

Current operating order for improving AtoCore:

1. Retrieval-quality pass
2. Wave 2 trusted-operational ingestion
3. AtoDrive clarification
4. Restore and ops validation

## Retrieval-Quality Pass

Current live behavior:

- broad prompts like `gigabit` and `polisher` can surface archive/history noise
- meaningful project prompts perform much better
- ranking quality now matters more than raw corpus growth

Use the operator client to audit retrieval:

```bash
python scripts/atocore_client.py audit-query "gigabit" 5
python scripts/atocore_client.py audit-query "polisher" 5
python scripts/atocore_client.py audit-query "mirror frame stiffness requirements and selected architecture" 5 p04-gigabit
python scripts/atocore_client.py audit-query "interferometer error budget and vendor selection constraints" 5 p05-interferometer
python scripts/atocore_client.py audit-query "polisher system map shared contracts and calibration workflow" 5 p06-polisher
```

What to improve:

- reduce `_archive`, `pre-cleanup`, `pre-migration`, and `History` prominence
- prefer current-status, decision, requirement, architecture-freeze, and milestone docs
- prefer trusted project-state when it expresses current truth
- avoid letting broad single-word prompts drift into stale chunks

## Wave 2 Trusted-Operational Ingestion

Do not ingest the whole PKM vault next.

Prioritize, for each active project:

- current status
- current decisions
- requirements baseline
- architecture freeze / current baseline
- milestone plan
- next actions

Useful commands:

```bash
python scripts/atocore_client.py project-state p04-gigabit
python scripts/atocore_client.py project-state p05-interferometer
python scripts/atocore_client.py project-state p06-polisher
python scripts/atocore_client.py refresh-project p04-gigabit
python scripts/atocore_client.py refresh-project p05-interferometer
python scripts/atocore_client.py refresh-project p06-polisher
```

## AtoDrive Clarification

Treat AtoDrive as a curated trusted-operational source, not a generic dump.

Good candidates:

- current dashboards
- approved baselines
- architecture freezes
- decision logs
- milestone and next-step views

Avoid by default:

- duplicated exports
- stale snapshots
- generic archives
- exploratory notes that are not designated current truth

## Restore and Ops Validation

Backups are not enough until restore has been tested.

Validate:

- SQLite metadata restore
- Chroma restore or rebuild
- project registry restore
- project refresh after recovery
- retrieval audit before and after recovery

Baseline capture:

```bash
python scripts/atocore_client.py health
python scripts/atocore_client.py stats
python scripts/atocore_client.py projects
```
321 docs/phase9-first-real-use.md Normal file
@@ -0,0 +1,321 @@
# Phase 9 First Real Use Report

## What this is

The first empirical exercise of the Phase 9 reflection loop after Commits A, B, and C all landed. The goal is to find out where the extractor and the reinforcement matcher actually behave well versus where their behavior drifts from the design intent.

The validation is reproducible. To re-run:

```bash
python scripts/phase9_first_real_use.py
```

This writes an isolated SQLite + Chroma store under `data/validation/phase9-first-use/` (gitignored), seeds three active memories, then runs eight sample interactions through the full capture → reinforce → extract pipeline.

## What we ran

Eight synthetic interactions, each paraphrased from a real working session about AtoCore itself or the active engineering projects:

| # | Label | Project | Expected |
|---|-------|---------|----------|
| 1 | exdev-mount-merge-decision | atocore | 1 decision_heading |
| 2 | ownership-was-the-real-fix | atocore | 1 fact_heading |
| 3 | memory-vs-entity-canonical-home | atocore | 1 decision_heading (long) |
| 4 | auto-promotion-deferred | atocore | 1 decision_heading |
| 5 | preference-rebase-workflow | atocore | 1 preference_sentence |
| 6 | constraint-from-doc-cite | p05-interferometer | 1 constraint_heading |
| 7 | prose-only-no-cues | atocore | 0 candidates |
| 8 | multiple-cues-in-one-interaction | p06-polisher | 3 distinct rules |

Plus 3 seed memories inserted before the run:

- `pref_rebase`: "prefers rebase-based workflows because history stays linear" (preference, 0.6)
- `pref_concise`: "writes commit messages focused on the why, not the what" (preference, 0.6)
- `identity_runs_atocore`: "mechanical engineer who runs AtoCore for context engineering" (identity, 0.9)

## What happened — extraction (the good news)

**Every extraction expectation was met exactly.** All eight samples produced the predicted candidate count and the predicted rule classifications:

| Sample | Expected | Got | Pass |
|--------|----------|-----|------|
| exdev-mount-merge-decision | 1 | 1 | ✅ |
| ownership-was-the-real-fix | 1 | 1 | ✅ |
| memory-vs-entity-canonical-home | 1 | 1 | ✅ |
| auto-promotion-deferred | 1 | 1 | ✅ |
| preference-rebase-workflow | 1 | 1 | ✅ |
| constraint-from-doc-cite | 1 | 1 | ✅ |
| prose-only-no-cues | **0** | **0** | ✅ |
| multiple-cues-in-one-interaction | 3 | 3 | ✅ |

**Total: 9 candidates from 8 interactions, 0 false positives, 0 misses on heading patterns or sentence patterns.**

The extractor's strictness is well-tuned for the kinds of structural cues we actually use. Things worth noting:

- **Sample 7 (`prose-only-no-cues`) produced zero candidates as designed.** This is the most important sanity check — it confirms the extractor won't fill the review queue with general prose when there's no structural intent.
- **Sample 3's long content was preserved without truncation.** The 280-char max wasn't hit, and the content kept its full meaning.
- **Sample 8 produced three distinct rules in one interaction** (decision_heading, constraint_heading, requirement_heading) without the dedup key collapsing them. The dedup key is `(memory_type, normalized_content, rule)` and the three differ on at least one axis, so they coexist as expected.
- **The prose around each heading was correctly ignored.** Sample 6 has a second sentence ("the error budget allocates 6 nm to the laser source...") that does NOT have a structural cue, and the extractor correctly didn't fire on it.
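The sample-8 dedup behavior can be sketched as follows. The key shape `(memory_type, normalized_content, rule)` is from this report; the normalization details (lowercase, collapsed whitespace) and the example strings are assumptions.

```python
import re


def dedup_key(memory_type: str, content: str, rule: str) -> tuple:
    """Candidate dedup key: two candidates collapse only when the type,
    the normalized content, AND the firing rule all match."""
    normalized = re.sub(r"\s+", " ", content.strip().lower())
    return (memory_type, normalized, rule)
```

Three candidates from one interaction that differ in rule (or type, or content) therefore coexist, while the same fact re-extracted by the same rule with cosmetic whitespace/case differences collapses to one.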
|
||||
|
||||
## What happened — reinforcement (the empirical finding)

**Reinforcement matched zero seeded memories across all 8 samples,
even when the response clearly echoed the seed.**

Sample 5's response was:

> *"I prefer rebase-based workflows because the history stays linear
> and reviewers have an easier time."*

The seeded `pref_rebase` memory was:

> *"prefers rebase-based workflows because history stays linear"*

A human reading both says these are the same fact. The reinforcement
matcher disagrees. After all 8 interactions:

```
pref_rebase:           confidence=0.6000 refs=0 last=-
pref_concise:          confidence=0.6000 refs=0 last=-
identity_runs_atocore: confidence=0.9000 refs=0 last=-
```

**Nothing moved.** This is the most important finding from this
validation pass.
### Why the matcher missed it

The current `_memory_matches` rule (in
`src/atocore/memory/reinforcement.py`) does a normalized substring
match: it lowercases both sides, collapses whitespace, then asks
"does the leading 80-char window of the memory content appear as a
substring in the response?"

For the rebase example:

- needle (normalized): `prefers rebase-based workflows because history stays linear`
- haystack (normalized): `i prefer rebase-based workflows because the history stays linear and reviewers have an easier time.`

The needle starts with `prefers` (with the trailing `s`), while the
haystack has `prefer` (without the `s`, because of the first-person
voice). And the needle has `because history stays linear`, while the
haystack has `because the history stays linear`. **Two small natural
paraphrases, and the substring match fails.**

This isn't a bug in the matcher's implementation — it's doing
exactly what it was specified to do. It's a design limitation: the
substring rule is too brittle for real prose, where the same fact
gets re-stated with different verb forms, articles, and word order.
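The brittleness is easy to reproduce in isolation. The following is a
minimal sketch of the normalized-substring rule as described above; the
function name and exact windowing of the real `_memory_matches` are
assumptions based on this report, not the actual implementation.

```python
import re

def _normalize(text: str) -> str:
    # Lowercase and collapse all whitespace runs to single spaces,
    # matching the normalization described in the report.
    return re.sub(r"\s+", " ", text.lower()).strip()

def memory_matches_substring(memory_content: str, response: str) -> bool:
    # The leading 80-char window of the memory content must appear
    # verbatim in the normalized response.
    needle = _normalize(memory_content)[:80]
    return bool(needle) and needle in _normalize(response)
```

Running it on the rebase pair above returns `False`, even though a human
reads both sentences as the same fact.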
### Severity

**Medium-high.** Reinforcement is the entire point of Commit B.
A reinforcement matcher that never fires on natural paraphrases
will leave seeded memories with stale confidence forever. The
reflection loop runs, but it doesn't actually reinforce anything.
That hollows out the value of having reinforcement at all.

It is not a critical bug because:

- Nothing breaks. The pipeline still runs cleanly.
- Reinforcement is supposed to be a *signal*, not the only path to
  high confidence — humans can still curate confidence directly.
- The candidate-extraction path (Commit C) is unaffected and works
  perfectly.

But it does need to be addressed before Phase 9 can be considered
operationally complete.
## Recommended fix (deferred to a follow-up commit)

Replace the substring matcher with a token-overlap matcher. The
specification:

1. Tokenize both memory content and response into lowercase words
   of length >= 3, dropping a small stop list (`the`, `a`, `an`,
   `and`, `or`, `of`, `to`, `is`, `was`, `that`, `this`, `with`,
   `for`, `from`, `into`).
2. Stem aggressively (or at minimum, fold trailing `s` and `ed`
   so `prefers`/`prefer`/`preferred` collapse to one token).
3. A match exists if **at least 70% of the memory's content
   tokens** appear in the response token set.
4. Memory content must still be at least `_MIN_MEMORY_CONTENT_LENGTH`
   characters to be considered.

This is more permissive than the substring rule but still tight
enough to avoid spurious matches on generic words. It would have
caught the rebase example because:

- memory tokens (after stop-list and stemming):
  `{prefer, rebase-bas, workflow, because, history, stay, linear}`
- response tokens:
  `{prefer, rebase-bas, workflow, because, history, stay, linear,
  reviewer, easi, time}`
- overlap: 7 / 7 memory tokens = 100% > 70% threshold → match
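The four-step specification above can be sketched directly. This is an
illustrative draft, not the follow-up commit's implementation; the
function name and the `_MIN_MEMORY_CONTENT_LENGTH` value are assumptions.

```python
import re

_STOP = {"the", "a", "an", "and", "or", "of", "to", "is", "was",
         "that", "this", "with", "for", "from", "into"}
_MIN_MEMORY_CONTENT_LENGTH = 10  # assumed value, not from the spec

def _stem(token: str) -> str:
    # Minimal folding per step 2: prefers/prefer/preferred -> prefer.
    for suffix in ("ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def _tokens(text: str) -> set[str]:
    # Step 1: lowercase words of length >= 3, minus the stop list.
    words = re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())
    return {_stem(w) for w in words if len(w) >= 3 and w not in _STOP}

def memory_matches(memory_content: str, response: str,
                   threshold: float = 0.7) -> bool:
    # Step 4: too-short memories are never considered.
    if len(memory_content) < _MIN_MEMORY_CONTENT_LENGTH:
        return False
    needle = _tokens(memory_content)
    if not needle:
        return False
    # Step 3: at least 70% of memory tokens must appear in the response.
    haystack = _tokens(response)
    return len(needle & haystack) / len(needle) >= threshold
```

On the rebase pair this returns `True` (all 7 memory tokens survive into
the response token set), while unrelated prose stays below the threshold.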
### Why not fix it in this report

Three reasons:

1. The validation report is supposed to be evidence, not a fix
   spec. A separate commit will introduce the new matcher with
   its own tests.
2. The token-overlap matcher needs its own design review for edge
   cases (very long memories, very short responses, technical
   abbreviations, code snippets in responses).
3. Mixing the report and the fix into one commit would muddle the
   audit trail. The report is the empirical evidence; the fix is
   the response.

The fix is queued as the next Phase 9 maintenance commit and is
flagged in the next-steps section below.
## Other observations

### Extraction is conservative on purpose, and that's working

Sample 7 is the most important data point in the whole run.
A natural prose response with no structural cues produced zero
candidates. **This is exactly the design intent** — the extractor
should be loud about explicit decisions/constraints/requirements
and quiet about everything else. If the extractor were too loose,
the review queue would fill up with low-value items and the human
would stop reviewing.

After this run I have measurably more confidence that the V0 rule
set is the right starting point. Future rules can be added one at
a time as we see specific patterns the extractor misses, instead of
guessing at what might be useful.

### Confidence on candidates

All extracted candidates landed at the default `confidence=0.5`,
which is what the extractor is currently hardcoded to do. The
`promotion-rules.md` doc proposes a per-rule prior with a
structural-signal multiplier and freshness bonus. None of that is
implemented yet. The validation didn't reveal any urgency around
this — humans review the candidates either way — but it confirms
that the priors-and-multipliers refinement is a reasonable next
step rather than a critical one.
### Multiple cues in one interaction

Sample 8 confirmed an important property: **three structural
cues in the same response do not collide in dedup**. The dedup
key is `(memory_type, normalized_content, rule)`, and since each
cue produced a distinct (type, content, rule) tuple, all three
landed cleanly.

This matters because real working sessions naturally bundle
multiple decisions/constraints/requirements into one summary.
The extractor handles those bundles correctly.
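The dedup behaviour can be illustrated with a toy version of the key.
The normalization here (lowercase plus whitespace collapse) and the
sample candidate contents are assumptions for illustration; only the
key shape `(memory_type, normalized_content, rule)` comes from the
report.

```python
def dedup_key(memory_type: str, content: str, rule: str) -> tuple[str, str, str]:
    # Assumed normalization: lowercase and collapse whitespace.
    normalized = " ".join(content.lower().split())
    return (memory_type, normalized, rule)

# Hypothetical candidates from one interaction: three distinct cues
# plus a re-extraction of the first with different casing/spacing.
candidates = [
    ("decision", "Use the fiber-coupled source", "decision_heading"),
    ("constraint", "Budget is fixed at 2 kEUR", "constraint_heading"),
    ("requirement", "Repeatability under 10 nm", "requirement_heading"),
    ("decision", "use the  fiber-coupled source", "decision_heading"),
]

seen = {dedup_key(*c) for c in candidates}
# The three distinct cues coexist; only the true duplicate collapses.
```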
### Project scoping

Each candidate carries the `project` from the source interaction
into its own `project` field. Sample 6 (p05) and sample 8 (p06)
both produced candidates with the right project. This is
non-obvious because the extractor module never explicitly looks
at project — it inherits from the interaction it's scanning. Worth
keeping in mind when the entity extractor is built: the same
pattern should apply.
## What this validates and what it doesn't

### Validates

- The Phase 9 Commit C extractor's rule set is well-tuned for
  hand-written structural cues
- The dedup logic does the right thing across multiple cues
- The "drop candidates that match an existing active memory" filter
  works (it would have been visible if any seeded memory had matched
  one of the heading texts — none did, but the code path is the
  same one that's covered in `tests/test_extractor.py`)
- The `prose-only-no-cues` no-fire case is solid
- Long content is preserved without truncation
- Project scoping flows through the pipeline

### Does NOT validate

- The reinforcement matcher (clearly, since it caught nothing)
- The behaviour against very long documents (each sample was
  under 700 chars; real interaction responses can be 10× that)
- The behaviour against responses that contain code blocks (the
  extractor's regex rules don't handle fenced code sections
  specially)
- Cross-interaction promotion-to-active flow (no candidate was
  promoted in this run; the lifecycle is covered by the unit tests
  but not by this empirical exercise)
- The behaviour at scale: 8 interactions is a one-shot. We need
  to see the queue after 50+ before judging reviewer ergonomics.
### Recommended next empirical exercises

1. **Real conversation capture**, using a slash command from a
   real Claude Code session against either a local or Dalidou
   AtoCore instance. The synthetic responses in this script are
   honest paraphrases, but they're still hand-curated.
2. **Bulk capture from existing PKM**, ingesting a few real
   project notes through the extractor as if they were
   interactions. This stresses the rules against documents that
   weren't written with the extractor in mind.
3. **Reinforcement matcher rerun** after the token-overlap
   matcher lands.

## Action items from this report

- [ ] **Fix reinforcement matcher** with the token-overlap rule
  described in the "Recommended fix" section above. Owner:
  next session. Severity: medium-high.
- [x] **Document the extractor's V0 strictness** as a working
  property, not a limitation. Sample 7 makes the case.
- [ ] **Build the slash command** so the next validation run
  can use real (not synthetic) interactions. Tracked in
  Session 2 of the current planning sprint.
- [ ] **Run a 50+ interaction batch** to evaluate reviewer
  ergonomics. Deferred until the slash command exists.
## Reproducibility

The script is deterministic in every field that matters.
Re-running it will produce equivalent results because:

- the data dir is wiped on every run
- the sample interactions are constants
- the memory uuid generation is non-deterministic, but the
  important fields (content, type, count, rule) are deterministic
- the `data/validation/phase9-first-use/` directory is gitignored,
  so no state leaks across runs

To reproduce this exact report:

```bash
python scripts/phase9_first_real_use.py
```

To get JSON output for downstream tooling:

```bash
python scripts/phase9_first_real_use.py --json
```
129 docs/project-registration-policy.md Normal file
@@ -0,0 +1,129 @@
# AtoCore Project Registration Policy

## Purpose

This document defines the normal path for adding a new project to AtoCore and
for safely updating an existing registration later.

The goal is to make `register + refresh` the standard workflow instead of
relying on long custom ingestion prompts every time.

## What Registration Means

Registering a project does not ingest it by itself.

Registration means:

- the project gets a canonical AtoCore id
- known aliases are recorded
- the staged source roots for that project are defined
- AtoCore and OpenClaw can later refresh that project consistently

Updating a project means:

- aliases can be corrected or expanded
- the short registry description can be improved
- ingest roots can be adjusted deliberately
- the canonical project id remains stable

## Required Fields

Each project registry entry must include:

- `id`
  - stable canonical project id
  - prefer lowercase kebab-case
  - examples:
    - `p04-gigabit`
    - `p05-interferometer`
    - `p06-polisher`
- `aliases`
  - short common names or abbreviations
  - examples:
    - `p05`
    - `interferometer`
- `description`
  - short explanation of what the registered source set represents
- `ingest_roots`
  - one or more staged roots under configured source layers
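Under these rules, a registry entry might look like the following. This
is a hypothetical sketch (the description and subpath are invented); the
authoritative shape comes from `GET /projects/template` and
`config/project-registry.example.json`:

```json
{
  "id": "p05-interferometer",
  "aliases": ["p05", "interferometer"],
  "description": "Curated anchor docs for the interferometer project",
  "ingest_roots": [
    {"source": "vault", "subpath": "projects/p05-interferometer", "label": "staged PKM"}
  ]
}
```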
## Allowed Source Roots

Current allowed `source` values are:

- `vault`
- `drive`

These map to the configured Dalidou source boundaries.

## Recommended Registration Rules

1. Prefer one canonical project id
2. Keep aliases short and practical
3. Start with the smallest useful staged roots
4. Prefer curated high-signal docs before broad corpora
5. Keep repo context selective at first
6. Avoid registering noisy or generated trees
7. Use `drive` for trusted operational material when available
8. Use `vault` for curated staged PKM and repo-doc snapshots

## Normal Workflow

For a new project:

1. stage the initial source docs on Dalidou
2. inspect the expected shape with:
   - `GET /projects/template`
   - or `atocore.sh project-template`
3. preview the entry without mutating state:
   - `POST /projects/proposal`
   - or `atocore.sh propose-project ...`
4. register the approved entry:
   - `POST /projects/register`
   - or `atocore.sh register-project ...`
5. verify the entry with:
   - `GET /projects`
   - or the T420 helper `atocore.sh projects`
6. refresh it with:
   - `POST /projects/{id}/refresh`
   - or `atocore.sh refresh-project <id>`
7. verify retrieval and context quality
8. only later promote stable facts into Trusted Project State
For an existing registered project:

1. inspect the current entry with:
   - `GET /projects`
   - or `atocore.sh projects`
2. update the registration if aliases, description, or roots need refinement:
   - `PUT /projects/{id}`
3. verify the updated entry
4. refresh the project again
5. verify retrieval and context quality did not regress

## What Not To Do

Do not:

- register giant noisy trees blindly
- treat registration as equivalent to trusted state
- dump the full PKM by default
- rely on aliases that collide across projects
- use the live machine DB as a source root

## Template

Use:

- [project-registry.example.json](C:/Users/antoi/ATOCore/config/project-registry.example.json)

And the API template endpoint:

- `GET /projects/template`

Other lifecycle endpoints:

- `POST /projects/proposal`
- `POST /projects/register`
- `PUT /projects/{id}`
- `POST /projects/{id}/refresh`
103 docs/source-refresh-model.md Normal file
@@ -0,0 +1,103 @@
# AtoCore Source Refresh Model

## Purpose

This document explains how human-authored project material should flow into the
Dalidou-hosted AtoCore machine store.

It exists to make one distinction explicit:

- source markdown is not the same thing as the machine-memory layer
- source refresh is how changes in PKM or repos become visible to AtoCore

## Current Model

Today, the flow is:

1. human-authoritative project material exists in PKM, AtoDrive, and repos
2. selected high-value files are staged into Dalidou source paths
3. AtoCore ingests those source files
4. AtoCore stores the processed representation in:
   - document records
   - chunks
   - vectors
   - project memory
   - trusted project state
5. retrieval and context assembly use the machine store, not the staged folder

## Why This Feels Redundant

The staged source files can look almost identical to the original PKM notes or
repo docs because they are still source material.

That is expected.

The staged source area exists because the canonical AtoCore instance on Dalidou
needs a server-visible path to ingest from.

## What Happens When A Project Source Changes

If you edit a note in PKM or a doc in a repo:

- the original source changes immediately
- the staged Dalidou copy does not change automatically
- the AtoCore machine store also does not change automatically

To refresh AtoCore:

1. select the updated project source set
2. copy or mirror the new version into the Dalidou source area
3. run ingestion again
4. verify that retrieval and context reflect the new material

## Current Intentional Limits

The current active-project ingestion strategy is selective.

That means:

- not every note from a project is staged
- not every repo file is staged
- the goal is to start with high-value anchor docs
- broader ingestion comes later if needed

This is why the staged source area for a project may look partial or uneven at
this stage.

## Long-Run Target

The long-run workflow should become much more natural:

- each project has a registered source map
  - PKM root
  - AtoDrive root
  - repo root
  - preferred docs
  - excluded noisy paths
- a command like `refresh p06-polisher` resolves the right sources
- AtoCore refreshes the machine representation cleanly
- OpenClaw consumes the improved context over API

## Current Foundation

The first concrete foundation for this now exists in AtoCore:

- a project registry file records known project ids, aliases, and ingest roots
- the API can list those registered projects
- the API can return a registration template for new projects
- the API can preview a proposed registration before writing it
- the API can persist an approved registration to the registry
- the API can refresh a single registered project from its configured roots

This is not full source automation yet, but it gives the refresh model a real
home in the system.

## Healthy Mental Model

Use this distinction:

- PKM / AtoDrive / repos = human-authoritative sources
- staged Dalidou markdown = server-visible ingestion inputs
- AtoCore DB/vector state = compiled machine context layer

That separation is intentional and healthy.
36 pyproject.toml Normal file
@@ -0,0 +1,36 @@
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "atocore"
version = "0.2.0"
description = "Personal context engine for LLM interactions"
requires-python = ">=3.11"
dependencies = [
    "fastapi>=0.110.0",
    "uvicorn[standard]>=0.27.0",
    "python-frontmatter>=1.1.0",
    "chromadb>=0.4.22",
    "sentence-transformers>=2.5.0",
    "pydantic>=2.6.0",
    "pydantic-settings>=2.1.0",
    "structlog>=24.1.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-cov>=4.1.0",
    "httpx>=0.27.0",
    "pyyaml>=6.0.0",
]

[tool.setuptools.packages.find]
where = ["src"]

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
addopts = "-v"
5 requirements-dev.txt Normal file
@@ -0,0 +1,5 @@
-r requirements.txt
pytest>=8.0.0
pytest-cov>=4.1.0
httpx>=0.27.0
pyyaml>=6.0.0
8 requirements.txt Normal file
@@ -0,0 +1,8 @@
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
python-frontmatter>=1.1.0
chromadb>=0.4.22
sentence-transformers>=2.5.0
pydantic>=2.6.0
pydantic-settings>=2.1.0
structlog>=24.1.0
630 scripts/atocore_client.py Normal file
@@ -0,0 +1,630 @@
"""Operator-facing API client for live AtoCore instances.

This script is intentionally external to the app runtime. It is for admins
and operators who want a convenient way to inspect live project state,
refresh projects, audit retrieval quality, manage trusted project-state
entries, and drive the Phase 9 reflection loop (capture, extract, queue,
promote, reject).

Environment variables
---------------------

ATOCORE_BASE_URL
    Base URL of the AtoCore service (default: ``http://dalidou:8100``).

    When running ON the Dalidou host itself or INSIDE the Dalidou
    container, override this with loopback or the real IP::

        ATOCORE_BASE_URL=http://127.0.0.1:8100 \\
        python scripts/atocore_client.py health

    The default hostname "dalidou" is meant for cases where the
    caller is a remote machine (laptop, T420/OpenClaw, etc.) with
    "dalidou" in its /etc/hosts or resolvable via Tailscale. It does
    NOT reliably resolve on the host itself or inside the container,
    and when it fails the client returns
    ``{"status": "unavailable", "fail_open": true}`` — the right
    diagnosis when that happens is to set ATOCORE_BASE_URL explicitly
    to 127.0.0.1:8100 and retry.

ATOCORE_TIMEOUT_SECONDS
    Request timeout for most operations (default: 30).

ATOCORE_REFRESH_TIMEOUT_SECONDS
    Longer timeout for project refresh operations, which can be slow
    (default: 1800).

ATOCORE_FAIL_OPEN
    When "true" (default), network errors return a small fail-open
    envelope instead of raising. Set to "false" for admin operations
    where you need the real error.
"""

from __future__ import annotations

import argparse
import json
import os
import re
import sys
import urllib.error
import urllib.parse
import urllib.request
from typing import Any


BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100").rstrip("/")
TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30"))
REFRESH_TIMEOUT = int(os.environ.get("ATOCORE_REFRESH_TIMEOUT_SECONDS", "1800"))
FAIL_OPEN = os.environ.get("ATOCORE_FAIL_OPEN", "true").lower() == "true"

# Bumped when the subcommand surface or JSON output shapes meaningfully
# change. See docs/architecture/llm-client-integration.md for the
# semver rules. History:
#   0.1.0  initial stable-ops-only client
#   0.2.0  Phase 9 reflection loop added: capture, extract,
#          reinforce-interaction, list-interactions, get-interaction,
#          queue, promote, reject
CLIENT_VERSION = "0.2.0"


def print_json(payload: Any) -> None:
    print(json.dumps(payload, ensure_ascii=True, indent=2))


def fail_open_payload() -> dict[str, Any]:
    return {"status": "unavailable", "source": "atocore", "fail_open": True}


def request(
    method: str,
    path: str,
    data: dict[str, Any] | None = None,
    timeout: int | None = None,
) -> Any:
    url = f"{BASE_URL}{path}"
    headers = {"Content-Type": "application/json"} if data is not None else {}
    payload = json.dumps(data).encode("utf-8") if data is not None else None
    req = urllib.request.Request(url, data=payload, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout or TIMEOUT) as response:
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        body = exc.read().decode("utf-8")
        if body:
            print(body)
        raise SystemExit(22) from exc
    except (urllib.error.URLError, TimeoutError, OSError):
        if FAIL_OPEN:
            print_json(fail_open_payload())
            raise SystemExit(0)
        raise

    if not body.strip():
        return {}
    return json.loads(body)


def parse_aliases(aliases_csv: str) -> list[str]:
    return [alias.strip() for alias in aliases_csv.split(",") if alias.strip()]


def detect_project(prompt: str) -> dict[str, Any]:
    payload = request("GET", "/projects")
    prompt_lower = prompt.lower()
    best_project = None
    best_alias = None
    best_score = -1

    for project in payload.get("projects", []):
        candidates = [project.get("id", ""), *project.get("aliases", [])]
        for candidate in candidates:
            candidate = (candidate or "").strip()
            if not candidate:
                continue
            pattern = rf"(?<![a-z0-9]){re.escape(candidate.lower())}(?![a-z0-9])"
            matched = re.search(pattern, prompt_lower) is not None
            if not matched and candidate.lower() not in prompt_lower:
                continue
            score = len(candidate)
            if score > best_score:
                best_project = project.get("id")
                best_alias = candidate
                best_score = score

    return {"matched_project": best_project, "matched_alias": best_alias}


def classify_result(result: dict[str, Any]) -> dict[str, Any]:
    source_file = (result.get("source_file") or "").lower()
    heading = (result.get("heading_path") or "").lower()
    title = (result.get("title") or "").lower()
    text = " ".join([source_file, heading, title])

    labels: list[str] = []
    if any(token in text for token in ["_archive", "/archive", "archive/", "pre-cleanup", "pre-migration", "history"]):
        labels.append("archive_or_history")
    if any(token in text for token in ["status", "dashboard", "current-state", "current state", "next-steps", "next steps"]):
        labels.append("current_status")
    if any(token in text for token in ["decision", "adr", "tradeoff", "selected architecture", "selection"]):
        labels.append("decision")
    if any(token in text for token in ["requirement", "spec", "constraints", "baseline", "cdr", "sow"]):
        labels.append("requirements")
    if any(token in text for token in ["roadmap", "milestone", "plan", "workflow", "calibration", "contract"]):
        labels.append("execution_plan")
    if not labels:
        labels.append("reference")

    return {
        "score": result.get("score"),
        "title": result.get("title"),
        "heading_path": result.get("heading_path"),
        "source_file": result.get("source_file"),
        "labels": labels,
        "is_noise_risk": "archive_or_history" in labels,
    }


def audit_query(prompt: str, top_k: int, project: str | None) -> dict[str, Any]:
    response = request(
        "POST",
        "/query",
        {"prompt": prompt, "top_k": top_k, "project": project or None},
    )
    classifications = [classify_result(result) for result in response.get("results", [])]
    broad_prompt = len(prompt.split()) <= 2
    noise_hits = sum(1 for item in classifications if item["is_noise_risk"])
    current_hits = sum(1 for item in classifications if "current_status" in item["labels"])
    decision_hits = sum(1 for item in classifications if "decision" in item["labels"])
    requirements_hits = sum(1 for item in classifications if "requirements" in item["labels"])

    recommendations: list[str] = []
    if broad_prompt:
        recommendations.append("Prompt is broad; prefer a project-specific question with intent, artifact type, or constraint language.")
    if noise_hits:
        recommendations.append("Archive/history noise is present; prefer current-status, decision, requirements, and baseline docs in the next ingestion/ranking pass.")
    if current_hits == 0:
        recommendations.append("No current-status docs surfaced in the top results; Wave 2 should ingest or strengthen trusted operational truth.")
    if decision_hits == 0:
        recommendations.append("No decision docs surfaced in the top results; add or freeze decision logs for the active project.")
    if requirements_hits == 0:
        recommendations.append("No requirements/baseline docs surfaced in the top results; prioritize baseline and architecture-freeze material.")
    if not recommendations:
        recommendations.append("Ranking looks healthy for this prompt.")

    return {
        "prompt": prompt,
        "project": project,
        "top_k": top_k,
        "broad_prompt": broad_prompt,
        "noise_hits": noise_hits,
        "current_status_hits": current_hits,
        "decision_hits": decision_hits,
        "requirements_hits": requirements_hits,
        "results": classifications,
        "recommendations": recommendations,
    }


def project_payload(
    project_id: str,
    aliases_csv: str,
    source: str,
    subpath: str,
    description: str,
    label: str,
) -> dict[str, Any]:
    return {
        "project_id": project_id,
        "aliases": parse_aliases(aliases_csv),
        "description": description,
        "ingest_roots": [{"source": source, "subpath": subpath, "label": label}],
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="AtoCore live API client")
    sub = parser.add_subparsers(dest="command", required=True)

    for name in ["health", "sources", "stats", "projects", "project-template", "debug-context", "ingest-sources"]:
        sub.add_parser(name)

    p = sub.add_parser("detect-project")
    p.add_argument("prompt")

    p = sub.add_parser("auto-context")
    p.add_argument("prompt")
    p.add_argument("budget", nargs="?", type=int, default=3000)
    p.add_argument("project", nargs="?", default="")

    for name in ["propose-project", "register-project"]:
        p = sub.add_parser(name)
        p.add_argument("project_id")
        p.add_argument("aliases_csv")
        p.add_argument("source")
        p.add_argument("subpath")
        p.add_argument("description", nargs="?", default="")
        p.add_argument("label", nargs="?", default="")

    p = sub.add_parser("update-project")
    p.add_argument("project")
    p.add_argument("description")
    p.add_argument("aliases_csv", nargs="?", default="")

    p = sub.add_parser("refresh-project")
    p.add_argument("project")
    p.add_argument("purge_deleted", nargs="?", default="false")

    p = sub.add_parser("project-state")
    p.add_argument("project")
    p.add_argument("category", nargs="?", default="")

    p = sub.add_parser("project-state-set")
    p.add_argument("project")
    p.add_argument("category")
    p.add_argument("key")
    p.add_argument("value")
    p.add_argument("source", nargs="?", default="")
    p.add_argument("confidence", nargs="?", type=float, default=1.0)

    p = sub.add_parser("project-state-invalidate")
    p.add_argument("project")
    p.add_argument("category")
    p.add_argument("key")

    p = sub.add_parser("query")
    p.add_argument("prompt")
    p.add_argument("top_k", nargs="?", type=int, default=5)
    p.add_argument("project", nargs="?", default="")

    p = sub.add_parser("context-build")
    p.add_argument("prompt")
    p.add_argument("project", nargs="?", default="")
    p.add_argument("budget", nargs="?", type=int, default=3000)

    p = sub.add_parser("audit-query")
    p.add_argument("prompt")
    p.add_argument("top_k", nargs="?", type=int, default=5)
    p.add_argument("project", nargs="?", default="")

    # --- Phase 9 reflection loop surface --------------------------------
    #
    # capture: record one interaction (prompt + response + context used).
    #   Mirrors POST /interactions. response is positional so shell
    #   callers can pass it via $(cat file.txt) or heredoc. project,
    #   client, and session_id are optional positionals with empty
    #   defaults, matching the existing script's style.
    p = sub.add_parser("capture")
    p.add_argument("prompt")
    p.add_argument("response", nargs="?", default="")
    p.add_argument("project", nargs="?", default="")
    p.add_argument("client", nargs="?", default="")
    p.add_argument("session_id", nargs="?", default="")
    p.add_argument("reinforce", nargs="?", default="true")

    # extract: run the Phase 9 C rule-based extractor against an
    #   already-captured interaction. persist='true' writes the
    #   candidates as status='candidate' memories; default is
    #   preview-only.
    p = sub.add_parser("extract")
    p.add_argument("interaction_id")
    p.add_argument("persist", nargs="?", default="false")

    # reinforce: backfill reinforcement on an already-captured interaction.
    p = sub.add_parser("reinforce-interaction")
    p.add_argument("interaction_id")

    # list-interactions: paginated listing with filters.
    p = sub.add_parser("list-interactions")
    p.add_argument("project", nargs="?", default="")
    p.add_argument("session_id", nargs="?", default="")
    p.add_argument("client", nargs="?", default="")
    p.add_argument("since", nargs="?", default="")
    p.add_argument("limit", nargs="?", type=int, default=50)

    # get-interaction: fetch one by id
    p = sub.add_parser("get-interaction")
    p.add_argument("interaction_id")

    # queue: list the candidate review queue
    p = sub.add_parser("queue")
    p.add_argument("memory_type", nargs="?", default="")
    p.add_argument("project", nargs="?", default="")
    p.add_argument("limit", nargs="?", type=int, default=50)
|
||||
|
||||
# promote: candidate -> active
|
||||
p = sub.add_parser("promote")
|
||||
p.add_argument("memory_id")
|
||||
|
||||
# reject: candidate -> invalid
|
||||
p = sub.add_parser("reject")
|
||||
p.add_argument("memory_id")
|
||||
|
||||
# batch-extract: fan out /interactions/{id}/extract?persist=true across
|
||||
# recent interactions. Idempotent — the extractor create_memory path
|
||||
# silently skips duplicates, so re-running is safe.
|
||||
p = sub.add_parser("batch-extract")
|
||||
p.add_argument("since", nargs="?", default="")
|
||||
p.add_argument("project", nargs="?", default="")
|
||||
p.add_argument("limit", nargs="?", type=int, default=100)
|
||||
p.add_argument("persist", nargs="?", default="true")
|
||||
|
||||
# triage: interactive candidate review loop. Fetches the queue, shows
|
||||
# each candidate, accepts p/r/s (promote / reject / skip) / q (quit).
|
||||
p = sub.add_parser("triage")
|
||||
p.add_argument("memory_type", nargs="?", default="")
|
||||
p.add_argument("project", nargs="?", default="")
|
||||
p.add_argument("limit", nargs="?", type=int, default=50)
|
||||
|
||||
return parser
|
||||
|
||||
|
||||
def main() -> int:
    args = build_parser().parse_args()
    cmd = args.command

    if cmd == "health":
        print_json(request("GET", "/health"))
    elif cmd == "sources":
        print_json(request("GET", "/sources"))
    elif cmd == "stats":
        print_json(request("GET", "/stats"))
    elif cmd == "projects":
        print_json(request("GET", "/projects"))
    elif cmd == "project-template":
        print_json(request("GET", "/projects/template"))
    elif cmd == "debug-context":
        print_json(request("GET", "/debug/context"))
    elif cmd == "ingest-sources":
        print_json(request("POST", "/ingest/sources", {}))
    elif cmd == "detect-project":
        print_json(detect_project(args.prompt))
    elif cmd == "auto-context":
        project = args.project or detect_project(args.prompt).get("matched_project") or ""
        if not project:
            print_json({"status": "no_project_match", "source": "atocore", "mode": "auto-context"})
        else:
            print_json(request("POST", "/context/build", {"prompt": args.prompt, "project": project, "budget": args.budget}))
    elif cmd in {"propose-project", "register-project"}:
        path = "/projects/proposal" if cmd == "propose-project" else "/projects/register"
        print_json(request("POST", path, project_payload(args.project_id, args.aliases_csv, args.source, args.subpath, args.description, args.label)))
    elif cmd == "update-project":
        payload: dict[str, Any] = {"description": args.description}
        if args.aliases_csv.strip():
            payload["aliases"] = parse_aliases(args.aliases_csv)
        print_json(request("PUT", f"/projects/{urllib.parse.quote(args.project)}", payload))
    elif cmd == "refresh-project":
        purge_deleted = args.purge_deleted.lower() in {"1", "true", "yes", "y"}
        path = f"/projects/{urllib.parse.quote(args.project)}/refresh?purge_deleted={str(purge_deleted).lower()}"
        print_json(request("POST", path, {}, timeout=REFRESH_TIMEOUT))
    elif cmd == "project-state":
        suffix = f"?category={urllib.parse.quote(args.category)}" if args.category else ""
        print_json(request("GET", f"/project/state/{urllib.parse.quote(args.project)}{suffix}"))
    elif cmd == "project-state-set":
        print_json(request("POST", "/project/state", {
            "project": args.project,
            "category": args.category,
            "key": args.key,
            "value": args.value,
            "source": args.source,
            "confidence": args.confidence,
        }))
    elif cmd == "project-state-invalidate":
        print_json(request("DELETE", "/project/state", {"project": args.project, "category": args.category, "key": args.key}))
    elif cmd == "query":
        print_json(request("POST", "/query", {"prompt": args.prompt, "top_k": args.top_k, "project": args.project or None}))
    elif cmd == "context-build":
        print_json(request("POST", "/context/build", {"prompt": args.prompt, "project": args.project or None, "budget": args.budget}))
    elif cmd == "audit-query":
        print_json(audit_query(args.prompt, args.top_k, args.project or None))
    # --- Phase 9 reflection loop surface ------------------------------
    elif cmd == "capture":
        body: dict[str, Any] = {
            "prompt": args.prompt,
            "response": args.response,
            "project": args.project,
            "client": args.client or "atocore-client",
            "session_id": args.session_id,
            "reinforce": args.reinforce.lower() in {"1", "true", "yes", "y"},
        }
        print_json(request("POST", "/interactions", body))
    elif cmd == "extract":
        persist = args.persist.lower() in {"1", "true", "yes", "y"}
        print_json(
            request(
                "POST",
                f"/interactions/{urllib.parse.quote(args.interaction_id, safe='')}/extract",
                {"persist": persist},
            )
        )
    elif cmd == "reinforce-interaction":
        print_json(
            request(
                "POST",
                f"/interactions/{urllib.parse.quote(args.interaction_id, safe='')}/reinforce",
                {},
            )
        )
    elif cmd == "list-interactions":
        query_parts: list[str] = []
        if args.project:
            query_parts.append(f"project={urllib.parse.quote(args.project)}")
        if args.session_id:
            query_parts.append(f"session_id={urllib.parse.quote(args.session_id)}")
        if args.client:
            query_parts.append(f"client={urllib.parse.quote(args.client)}")
        if args.since:
            query_parts.append(f"since={urllib.parse.quote(args.since)}")
        query_parts.append(f"limit={int(args.limit)}")
        query = "?" + "&".join(query_parts)
        print_json(request("GET", f"/interactions{query}"))
    elif cmd == "get-interaction":
        print_json(
            request(
                "GET",
                f"/interactions/{urllib.parse.quote(args.interaction_id, safe='')}",
            )
        )
    elif cmd == "queue":
        query_parts = ["status=candidate"]
        if args.memory_type:
            query_parts.append(f"memory_type={urllib.parse.quote(args.memory_type)}")
        if args.project:
            query_parts.append(f"project={urllib.parse.quote(args.project)}")
        query_parts.append(f"limit={int(args.limit)}")
        query = "?" + "&".join(query_parts)
        print_json(request("GET", f"/memory{query}"))
    elif cmd == "promote":
        print_json(
            request(
                "POST",
                f"/memory/{urllib.parse.quote(args.memory_id, safe='')}/promote",
                {},
            )
        )
    elif cmd == "reject":
        print_json(
            request(
                "POST",
                f"/memory/{urllib.parse.quote(args.memory_id, safe='')}/reject",
                {},
            )
        )
    elif cmd == "batch-extract":
        print_json(run_batch_extract(args.since, args.project, args.limit, args.persist))
    elif cmd == "triage":
        return run_triage(args.memory_type, args.project, args.limit)
    else:
        return 1
    return 0

def run_batch_extract(since: str, project: str, limit: int, persist_flag: str) -> dict:
    """Fetch recent interactions and run the extractor against each one.

    Returns an aggregated summary. Safe to re-run: the server-side
    persist path catches ValueError on duplicates and the endpoint
    reports per-interaction candidate counts either way.
    """
    persist = persist_flag.lower() in {"1", "true", "yes", "y"}
    query_parts: list[str] = []
    if project:
        query_parts.append(f"project={urllib.parse.quote(project)}")
    if since:
        query_parts.append(f"since={urllib.parse.quote(since)}")
    query_parts.append(f"limit={int(limit)}")
    query = "?" + "&".join(query_parts)

    listing = request("GET", f"/interactions{query}")
    interactions = listing.get("interactions", []) if isinstance(listing, dict) else []

    processed = 0
    total_candidates = 0
    total_persisted = 0
    errors: list[dict] = []
    per_interaction: list[dict] = []

    for item in interactions:
        iid = item.get("id") or ""
        if not iid:
            continue
        try:
            result = request(
                "POST",
                f"/interactions/{urllib.parse.quote(iid, safe='')}/extract",
                {"persist": persist},
            )
        except Exception as exc:  # pragma: no cover - network errors land here
            errors.append({"interaction_id": iid, "error": str(exc)})
            continue
        processed += 1
        count = int(result.get("candidate_count", 0) or 0)
        persisted_ids = result.get("persisted_ids") or []
        total_candidates += count
        total_persisted += len(persisted_ids)
        if count:
            per_interaction.append(
                {
                    "interaction_id": iid,
                    "candidate_count": count,
                    "persisted_count": len(persisted_ids),
                    "project": item.get("project") or "",
                }
            )

    return {
        "processed": processed,
        "total_candidates": total_candidates,
        "total_persisted": total_persisted,
        "persist": persist,
        "errors": errors,
        "interactions_with_candidates": per_interaction,
    }

def run_triage(memory_type: str, project: str, limit: int) -> int:
    """Interactive review of candidate memories.

    Loads the queue once, walks through entries, prompts for
    (p)romote / (r)eject / (s)kip / (q)uit. Stateless between runs —
    re-running picks up whatever is still status=candidate.
    """
    query_parts = ["status=candidate"]
    if memory_type:
        query_parts.append(f"memory_type={urllib.parse.quote(memory_type)}")
    if project:
        query_parts.append(f"project={urllib.parse.quote(project)}")
    query_parts.append(f"limit={int(limit)}")
    listing = request("GET", "/memory?" + "&".join(query_parts))
    memories = listing.get("memories", []) if isinstance(listing, dict) else []

    if not memories:
        print_json({"status": "empty_queue", "count": 0})
        return 0

    promoted = 0
    rejected = 0
    skipped = 0
    stopped_early = False

    print(f"Triage queue: {len(memories)} candidate(s)\n", file=sys.stderr)
    for idx, mem in enumerate(memories, 1):
        mid = mem.get("id", "")
        print(f"[{idx}/{len(memories)}] {mem.get('memory_type','?')} project={mem.get('project','')} conf={mem.get('confidence','?')}", file=sys.stderr)
        print(f" id: {mid}", file=sys.stderr)
        print(f" {mem.get('content','')}", file=sys.stderr)
        try:
            choice = input(" (p)romote / (r)eject / (s)kip / (q)uit > ").strip().lower()
        except EOFError:
            stopped_early = True
            break
        if choice in {"q", "quit"}:
            stopped_early = True
            break
        if choice in {"p", "promote"}:
            request("POST", f"/memory/{urllib.parse.quote(mid, safe='')}/promote", {})
            promoted += 1
            print(" -> promoted", file=sys.stderr)
        elif choice in {"r", "reject"}:
            request("POST", f"/memory/{urllib.parse.quote(mid, safe='')}/reject", {})
            rejected += 1
            print(" -> rejected", file=sys.stderr)
        else:
            skipped += 1
            print(" -> skipped", file=sys.stderr)

    print_json(
        {
            "reviewed": promoted + rejected + skipped,
            "promoted": promoted,
            "rejected": rejected,
            "skipped": skipped,
            "stopped_early": stopped_early,
            "remaining_in_queue": len(memories) - (promoted + rejected + skipped) - (1 if stopped_early else 0),
        }
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
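The new subcommands decode their boolean-ish positional arguments (`reinforce`, `persist`, `purge_deleted`) with the same case-insensitive truthy-set test. A minimal standalone sketch of that convention, assuming nothing beyond what the script shows (the `parse_flag` helper name is illustrative, not part of the repo):

```python
TRUTHY = {"1", "true", "yes", "y"}


def parse_flag(value: str) -> bool:
    # Mirrors the script's pattern, e.g.
    # args.persist.lower() in {"1", "true", "yes", "y"}
    return value.lower() in TRUTHY
```

Anything outside the set, including the empty-string defaults, reads as False, so omitted positionals fall back safely.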
145
scripts/eval_data/extractor_labels_2026-04-11.json
Normal file
@@ -0,0 +1,145 @@
{
  "version": "0.1",
  "frozen_at": "2026-04-11",
  "snapshot_file": "scripts/eval_data/interactions_snapshot_2026-04-11.json",
  "labeled_count": 20,
  "plan_deviation": "Codex's plan called for 30 labeled interactions (10 zero / 10 plausible / 10 ambiguous). Actual corpus is heavily skewed toward instructional/status content; after reading 20 drawn by length-stratified random sample, the honest positive rate is ~25% (5/20). Labeling more would mostly add zeros; the Day 2 measurement is not bottlenecked on sample size.",
  "positive_count": 5,
  "labels": [
    {
      "id": "ab239158-d6ac-4c51-b6e4-dd4ccea384a2",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Instructional deploy guidance. No durable claim."
    },
    {
      "id": "da153f2a-b20a-4dee-8c72-431ebb71f08c",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "'Deploy still in progress.' Pure status."
    },
    {
      "id": "7d8371ee-c6d3-4dfe-a7b0-2d091f075c15",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Git command walkthrough. No durable claim."
    },
    {
      "id": "14bf3f90-e318-466e-81ac-d35522741ba5",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Ledger status update. Transient fact, not a durable memory candidate."
    },
    {
      "id": "8f855235-c38d-4c27-9f2b-8530ebe1a2d8",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Short-term recommendation ('merge to main and deploy'), not a standing decision."
    },
    {
      "id": "04a96eb5-cd00-4e9f-9252-b2cc919000a4",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Dev server config table. Operational detail, not a memory."
    },
    {
      "id": "79d606ed-8981-454a-83af-c25226b1b65c",
      "expected_count": 1,
      "expected_type": "adaptation",
      "expected_project": "",
      "expected_snippet": "shared DEV-LEDGER as operating memory",
      "miss_class": "recommendation_prose",
      "notes": "A recommendation that later became a ratified decision. Rule extractor would need a 'simplest version that could work today' / 'I'd start with' cue class."
    },
    {
      "id": "a6b0d279-c564-4bce-a703-e476f4a148ad",
      "expected_count": 2,
      "expected_type": "project",
      "expected_project": "p06-polisher",
      "expected_snippet": "z_engaged bool; cam amplitude set mechanically and read by encoders",
      "miss_class": "architectural_change_summary",
      "notes": "Two durable architectural facts about the polisher machine (Z-axis is engage/retract, cam is read-only). Extractor would need to recognize 'A is now B' / 'X removed, Y added' patterns."
    },
    {
      "id": "4e00e398-2e89-4653-8ee5-3f65c7f4d2d3",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Clarification question to user."
    },
    {
      "id": "a6a7816a-7590-4616-84f4-49d9054c2a91",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Instructional response offering two next moves."
    },
    {
      "id": "03527502-316a-4a3e-989c-00719392c7d1",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Troubleshooting a paste failure. Ephemeral."
    },
    {
      "id": "1fff59fc-545f-42df-9dd1-a0e6dec1b7ee",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Agreement + follow-up question. No durable claim."
    },
    {
      "id": "eb65dc18-0030-4720-ace7-f55af9df719d",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Explanation of how the capture hook works. Instructional."
    },
    {
      "id": "52c8c0f3-32fb-4b48-9065-73c778a08417",
      "expected_count": 1,
      "expected_type": "project",
      "expected_project": "p06-polisher",
      "expected_snippet": "USB SSD mandatory on RPi; Tailscale for remote access",
      "miss_class": "spec_update_announcement",
      "notes": "Concrete architectural commitments just added to the polisher spec. Phrased as '§17.1 Local Storage - USB SSD mandatory, not SD card.' The '§' section markers could be a new cue."
    },
    {
      "id": "32d40414-15af-47ee-944b-2cceae9574b8",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Session recap. Historical summary, not a durable memory."
    },
    {
      "id": "b6d2cdfc-37fb-459a-96bd-caefb9beaab4",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Deployment prompt for Dalidou. Operational, not a memory."
    },
    {
      "id": "ee03d823-931b-4d4e-9258-88b4ed5eeb07",
      "expected_count": 2,
      "expected_type": "knowledge",
      "expected_project": "p06-polisher",
      "expected_snippet": "USB SSD is non-negotiable for local storage; Tailscale mesh for SSH/file transfer",
      "miss_class": "layered_recommendation",
      "notes": "Layered infra recommendation with 'non-negotiable' / 'strongly recommended' strength markers. The 'non-negotiable' token could be a new cue class."
    },
    {
      "id": "dd234d9f-0d1c-47e8-b01c-eebcb568c7e7",
      "expected_count": 1,
      "expected_type": "project",
      "expected_project": "p06-polisher",
      "expected_snippet": "interface contract is identical regardless of who generates the programs; machine is a standalone box",
      "miss_class": "alignment_assertion",
      "notes": "Architectural invariant assertion. '**Alignment verified**' / 'nothing changes for X' style. Likely too subtle for rule matching without LLM assistance."
    },
    {
      "id": "1f95891a-cf37-400e-9d68-4fad8e04dcbb",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Huge session handoff prompt. Informational only."
    },
    {
      "id": "5580950f-d010-4544-be4b-b3071271a698",
      "expected_count": 0,
      "miss_class": "n/a",
      "notes": "Ledger schema sketch. Structural design proposal, later ratified — but the same idea was already captured as a ratified decision in the recent decisions section, so not worth re-extracting from this conversational form."
    }
  ]
}
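The labels above record one `expected_count` per interaction; the companion baseline file pairs each with an `actual_count`. A hedged sketch of how interaction-level precision and recall like the baseline summary's could fall out of those two counts (5 true-positive against 9 false-positive interactions gives precision of about 0.357, recall 1.0); the `interaction_metrics` helper is illustrative, not part of the repo:

```python
def interaction_metrics(rows: list[dict]) -> tuple[float, float]:
    # Interaction-level scoring: an interaction counts as a hit if it
    # has any expected candidates, a prediction if it has any actual ones.
    tp = sum(1 for r in rows if r["expected_count"] > 0 and r["actual_count"] > 0)
    fp = sum(1 for r in rows if r["expected_count"] == 0 and r["actual_count"] > 0)
    fn = sum(1 for r in rows if r["expected_count"] > 0 and r["actual_count"] == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

With tp=5, fp=9, fn=0 this yields 5/14 ≈ 0.357 and 5/5 = 1.0, matching the summary block in the baseline file.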
518
scripts/eval_data/extractor_llm_baseline_2026-04-11.json
Normal file
@@ -0,0 +1,518 @@
{
  "summary": {
    "total": 20,
    "exact_match": 6,
    "positive_expected": 5,
    "total_expected_candidates": 7,
    "total_actual_candidates": 51,
    "yield_rate": 2.55,
    "recall": 1.0,
    "precision": 0.357,
    "false_positive_interactions": 9,
    "false_negative_interactions": 0,
    "miss_classes": {},
    "mode": "llm"
  },
  "results": [
    {
      "id": "ab239158-d6ac-4c51-b6e4-dd4ccea384a2",
      "expected_count": 0,
      "actual_count": 1,
      "ok": false,
      "miss_class": "n/a",
      "notes": "Instructional deploy guidance. No durable claim.",
      "actual_candidates": [
        {
          "memory_type": "knowledge",
          "content": "AtoCore deployments to dalidou use the script /srv/storage/atocore/app/deploy/dalidou/deploy.sh instead of manual docker commands",
          "project": "atocore",
          "rule": "llm_extraction"
        }
      ]
    },
    {
      "id": "da153f2a-b20a-4dee-8c72-431ebb71f08c",
      "expected_count": 0,
      "actual_count": 0,
      "ok": true,
      "miss_class": "n/a",
      "notes": "'Deploy still in progress.' Pure status.",
      "actual_candidates": []
    },
    {
      "id": "7d8371ee-c6d3-4dfe-a7b0-2d091f075c15",
      "expected_count": 0,
      "actual_count": 0,
      "ok": true,
      "miss_class": "n/a",
      "notes": "Git command walkthrough. No durable claim.",
      "actual_candidates": []
    },
    {
      "id": "14bf3f90-e318-466e-81ac-d35522741ba5",
      "expected_count": 0,
      "actual_count": 4,
      "ok": false,
      "miss_class": "n/a",
      "notes": "Ledger status update. Transient fact, not a durable memory candidate.",
      "actual_candidates": [
        {
          "memory_type": "project",
          "content": "Retrieval/extraction evaluation follows 8-day mini-phase plan with hard gates to prevent scope drift. Preflight checks must validate git SHAs, baselines, and fixture stability before coding.",
          "project": "",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Day 1: Create labeled extractor eval set from 30 captures (10 zero-candidate, 10 single-candidate, 10 ambiguous) with metadata; create scoring tool to measure precision/recall.",
          "project": "",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Day 2: Measure current extractor against labeled set, recording yield, true/false positives, and false negatives by pattern.",
          "project": "",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Session Log/Ledger system tracks work state across sessions so future sessions immediately know what is true and what is next; phases marked by git SHAs.",
          "project": "",
          "rule": "llm_extraction"
        }
      ]
    },
    {
      "id": "8f855235-c38d-4c27-9f2b-8530ebe1a2d8",
      "expected_count": 0,
      "actual_count": 0,
      "ok": true,
      "miss_class": "n/a",
      "notes": "Short-term recommendation ('merge to main and deploy'), not a standing decision.",
      "actual_candidates": []
    },
    {
      "id": "04a96eb5-cd00-4e9f-9252-b2cc919000a4",
      "expected_count": 0,
      "actual_count": 0,
      "ok": true,
      "miss_class": "n/a",
      "notes": "Dev server config table. Operational detail, not a memory.",
      "actual_candidates": []
    },
    {
      "id": "79d606ed-8981-454a-83af-c25226b1b65c",
      "expected_count": 1,
      "actual_count": 3,
      "ok": false,
      "miss_class": "recommendation_prose",
      "notes": "A recommendation that later became a ratified decision. Rule extractor would need a 'simplest version that could work today' / 'I'd start with' cue class.",
      "actual_candidates": [
        {
          "memory_type": "project",
          "content": "atocore uses multi-model coordination: Claude and codex share DEV-LEDGER.md (current state / active plan / P1+P2 findings / recent decisions / commit log) read at session start, appended at session end",
          "project": "atocore",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "atocore starts with manual-event-loop (/audit or /status prompts) using DEV-LEDGER.md before upgrading to automated git hooks/CI review",
          "project": "atocore",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "atocore development involves coordinating between Claude and codex models with shared plan/review strategy and counter-validation to improve system quality",
          "project": "atocore",
          "rule": "llm_extraction"
        }
      ]
    },
    {
      "id": "a6b0d279-c564-4bce-a703-e476f4a148ad",
      "expected_count": 2,
      "actual_count": 6,
      "ok": false,
      "miss_class": "architectural_change_summary",
      "notes": "Two durable architectural facts about the polisher machine (Z-axis is engage/retract, cam is read-only). Extractor would need to recognize 'A is now B' / 'X removed, Y added' patterns.",
      "actual_candidates": [
        {
          "memory_type": "project",
          "content": "Z-axis is a binary engage/retract mechanism (z_engaged bool), not continuous position control; confirmation timeout z_engage_timeout_s required.",
          "project": "p06-polisher",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Cam amplitude and offset are mechanically set by operator and read via encoders; no actuators control them, controller receives encoder telemetry only.",
          "project": "p06-polisher",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Cam parameters in controller are expected_cam_amplitude_deg and expected_cam_offset_deg (read-only reference for verification), not command setpoints.",
          "project": "p06-polisher",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Manual mode UI displays cam encoder readings (cam_amplitude_deg, cam_offset_deg) as read-only for operator verification of mechanical setting.",
          "project": "p06-polisher",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Manual session log records cam_setting measured at session start; run-log segment actual block includes cam_amplitude_deg_mean and cam_offset_deg_mean.",
          "project": "p06-polisher",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Machine capabilities now define z_type: engage_retract and cam_type: mechanical_with_encoder instead of actuator-driven setpoints.",
          "project": "p06-polisher",
          "rule": "llm_extraction"
        }
      ]
    },
    {
      "id": "4e00e398-2e89-4653-8ee5-3f65c7f4d2d3",
      "expected_count": 0,
      "actual_count": 0,
      "ok": true,
      "miss_class": "n/a",
      "notes": "Clarification question to user.",
      "actual_candidates": []
    },
    {
      "id": "a6a7816a-7590-4616-84f4-49d9054c2a91",
      "expected_count": 0,
      "actual_count": 3,
      "ok": false,
      "miss_class": "n/a",
      "notes": "Instructional response offering two next moves.",
      "actual_candidates": [
        {
          "memory_type": "knowledge",
          "content": "Codex is an audit agent; communicate with it via markdown prompts with numbered steps; it updates findings via commits to codex/* branches or direct messages.",
          "project": "",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "preference",
          "content": "Audit-first workflow recommended: have codex audit DEV-LEDGER.md and recent commits before execution; validates round-trip, catches errors early.",
          "project": "",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "knowledge",
          "content": "DEV-LEDGER.md at repo root is the shared coordination document with Orientation, Active Plan, and Open Review Findings sections.",
          "project": "",
          "rule": "llm_extraction"
        }
      ]
    },
    {
      "id": "03527502-316a-4a3e-989c-00719392c7d1",
      "expected_count": 0,
      "actual_count": 0,
      "ok": true,
      "miss_class": "n/a",
      "notes": "Troubleshooting a paste failure. Ephemeral.",
      "actual_candidates": []
    },
    {
      "id": "1fff59fc-545f-42df-9dd1-a0e6dec1b7ee",
      "expected_count": 0,
      "actual_count": 3,
      "ok": false,
      "miss_class": "n/a",
      "notes": "Agreement + follow-up question. No durable claim.",
      "actual_candidates": [
        {
          "memory_type": "project",
          "content": "Roadmap: Extractor improvement → Harness expansion → Wave 2 trusted operational ingestion → Finish OpenClaw integration (in that order)",
          "project": "atocore",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Phase 1 (Extractor): eval-driven loop—label captures, improve rules/add LLM mode, measure yield & FP, stop when queue reviewable (not coverage metrics)",
          "project": "atocore",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Phases 1 & 2 (Extractor + Harness) are a mini-phase; without harness, extractor improvements are blind edits",
          "project": "atocore",
          "rule": "llm_extraction"
        }
      ]
    },
    {
      "id": "eb65dc18-0030-4720-ace7-f55af9df719d",
      "expected_count": 0,
      "actual_count": 2,
      "ok": false,
      "miss_class": "n/a",
      "notes": "Explanation of how the capture hook works. Instructional.",
      "actual_candidates": [
        {
          "memory_type": "knowledge",
          "content": "Dalidou stores Claude Code interactions via a Stop hook that fires after each turn and POSTs to http://dalidou:8100/interactions with client=claude-code parameter",
          "project": "",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "adaptation",
          "content": "Interaction capture system is passive and automatic; no manual action required, interactions accumulate automatically during normal Claude Code usage",
          "project": "",
          "rule": "llm_extraction"
        }
      ]
    },
    {
      "id": "52c8c0f3-32fb-4b48-9065-73c778a08417",
      "expected_count": 1,
      "actual_count": 5,
      "ok": false,
      "miss_class": "spec_update_announcement",
      "notes": "Concrete architectural commitments just added to the polisher spec. Phrased as '§17.1 Local Storage - USB SSD mandatory, not SD card.' The '§' section markers could be a new cue.",
      "actual_candidates": [
        {
          "memory_type": "project",
          "content": "USB SSD mandatory for storage (not SD card); directory structure /data/runs/{id}/, /data/manual/{id}/; status.json for machine state",
          "project": "",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "RPi joins Tailscale mesh for remote access over SSH VPN; no public IP or port forwarding; fully offline operation",
          "project": "",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Data synchronization via rsync over Tailscale, failure-tolerant and non-blocking; USB stick as manual fallback",
          "project": "",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Machine design principle: works fully offline and independently; network connection is for remote access only",
          "project": "",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "No cloud, no real-time streaming, no remote control features in design scope",
          "project": "",
          "rule": "llm_extraction"
        }
      ]
    },
    {
      "id": "32d40414-15af-47ee-944b-2cceae9574b8",
      "expected_count": 0,
      "actual_count": 5,
      "ok": false,
      "miss_class": "n/a",
      "notes": "Session recap. Historical summary, not a durable memory.",
      "actual_candidates": [
        {
          "memory_type": "project",
          "content": "P1: Reflection loop integration incomplete—extraction remains manual (POST /interactions/{id}/extract), not auto-triggered with reinforcement. Live capture won't auto-populate candidate review queue.",
          "project": "atocore",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "P1: Project memories excluded from context injection; build_context() requests [\"identity\", \"preference\"] only. Reinforcement signal doesn't reach assembled context packs.",
          "project": "atocore",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Current batch-extract rules produce only 1 candidate from 42 real captures. Extractor needs conversational-cue detection or LLM-assisted path to improve yield.",
          "project": "atocore",
          "rule": "llm_extraction"
        },
        {
          "memory_type": "project",
          "content": "Next priority: extractor rule expansion (cheapest validation of reflection loop), then Wave 2 trusted operational ingestion (master-plan priority). Defer retrieval eval harness focus.
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "knowledge",
|
||||
"content": "Alias canonicalization fix (resolve_project_name() boundary) is consistently applied across project state, memories, interactions, and context lookup. Code review approved directionally.",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "b6d2cdfc-37fb-459a-96bd-caefb9beaab4",
|
||||
"expected_count": 0,
|
||||
"actual_count": 1,
|
||||
"ok": false,
|
||||
"miss_class": "n/a",
|
||||
"notes": "Deployment prompt for Dalidou. Operational, not a memory.",
|
||||
"actual_candidates": [
|
||||
{
|
||||
"memory_type": "preference",
|
||||
"content": "User prefers receiving standalone testing prompts they can paste into Claude Code on target deployments rather than having the assistant run tests directly.",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "ee03d823-931b-4d4e-9258-88b4ed5eeb07",
|
||||
"expected_count": 2,
|
||||
"actual_count": 5,
|
||||
"ok": false,
|
||||
"miss_class": "layered_recommendation",
|
||||
"notes": "Layered infra recommendation with 'non-negotiable' / 'strongly recommended' strength markers. The 'non-negotiable' token could be a new cue class.",
|
||||
"actual_candidates": [
|
||||
{
|
||||
"memory_type": "project",
|
||||
"content": "USB SSD on RPi is mandatory for polishing telemetry storage; must be independent of network for data integrity during runs.",
|
||||
"project": "p06-polisher",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "project",
|
||||
"content": "Use Tailscale mesh for RPi remote access to provide SSH, file transfer, and NAT traversal without port forwarding.",
|
||||
"project": "p06-polisher",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "project",
|
||||
"content": "Auto-sync telemetry data via rsync over Tailscale after runs complete; fire-and-forget pattern with automatic retry on network interruption.",
|
||||
"project": "p06-polisher",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "project",
|
||||
"content": "Real-time telemetry monitoring should target 10 Hz downsampling; full 100 Hz streaming over network is not necessary.",
|
||||
"project": "p06-polisher",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "knowledge",
|
||||
"content": "Polishing telemetry data rate is approximately 29 MB per hour (100 Hz × 20 channels × 4 bytes = 8 KB/s).",
|
||||
"project": "p06-polisher",
|
||||
"rule": "llm_extraction"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "dd234d9f-0d1c-47e8-b01c-eebcb568c7e7",
|
||||
"expected_count": 1,
|
||||
"actual_count": 3,
|
||||
"ok": false,
|
||||
"miss_class": "alignment_assertion",
|
||||
"notes": "Architectural invariant assertion. '**Alignment verified**' / 'nothing changes for X' style. Likely too subtle for rule matching without LLM assistance.",
|
||||
"actual_candidates": [
|
||||
{
|
||||
"memory_type": "project",
|
||||
"content": "Machine spec (shareable) + Atomaste spec (internal) separate concerns. Machine spec hides program generation as 'separate scope' to protect IP/business strategy.",
|
||||
"project": "p06-polisher",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "project",
|
||||
"content": "Firmware interface contract is invariant: controller-job.v1 input, run-log.v1 + telemetry output. No firmware changes needed regardless of program generation implementation.",
|
||||
"project": "p06-polisher",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "project",
|
||||
"content": "Atomaste sim spec documents forward/return paths, calibration model (Preston k), translation loss, and service/IP strategy—details hidden from shareable machine spec.",
|
||||
"project": "p06-polisher",
|
||||
"rule": "llm_extraction"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "1f95891a-cf37-400e-9d68-4fad8e04dcbb",
|
||||
"expected_count": 0,
|
||||
"actual_count": 4,
|
||||
"ok": false,
|
||||
"miss_class": "n/a",
|
||||
"notes": "Huge session handoff prompt. Informational only.",
|
||||
"actual_candidates": [
|
||||
{
|
||||
"memory_type": "knowledge",
|
||||
"content": "AtoCore is FastAPI (Python 3.12, SQLite + ChromaDB) on Dalidou home server (dalidou:8100), repo C:\\Users\\antoi\\ATOCore, data /srv/storage/atocore/, ingests Obsidian vault + Google Drive into vector memory system.",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "knowledge",
|
||||
"content": "Deploy AtoCore: git push origin main, then ssh papa@dalidou and run /srv/storage/atocore/app/deploy/dalidou/deploy.sh",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "adaptation",
|
||||
"content": "Do not add memory extraction to interaction capture hot path; keep extraction as separate batch/manual step. Reason: latency and queue noise before review rhythm is comfortable.",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "project",
|
||||
"content": "As of 2026-04-11, approved roadmap in order: observe reinforcement, batch extraction, candidate triage, off-Dalidou backup, retrieval quality review.",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "5580950f-d010-4544-be4b-b3071271a698",
|
||||
"expected_count": 0,
|
||||
"actual_count": 6,
|
||||
"ok": false,
|
||||
"miss_class": "n/a",
|
||||
"notes": "Ledger schema sketch. Structural design proposal, later ratified — but the same idea was already captured as a ratified decision in the recent decisions section, so not worth re-extracting from this conversational form.",
|
||||
"actual_candidates": [
|
||||
{
|
||||
"memory_type": "project",
|
||||
"content": "AtoCore adopts DEV-LEDGER.md as shared operating memory with stable headers; updated at session boundaries",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "adaptation",
|
||||
"content": "Codex branches for AtoCore fork from main (never orphan); use naming pattern codex/<topic>",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "adaptation",
|
||||
"content": "In AtoCore, Claude builds and Codex audits; never work in parallel on same files",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "adaptation",
|
||||
"content": "In AtoCore, P1-severity findings in DEV-LEDGER.md block further main commits until acknowledged",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "adaptation",
|
||||
"content": "Every AtoCore session appends to DEV-LEDGER.md Session Log and updates Orientation before ending",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
},
|
||||
{
|
||||
"memory_type": "project",
|
||||
"content": "AtoCore roadmap: (1) extractor improvement, (2) harness expansion, (3) Wave 2 ingestion, (4) OpenClaw finish; steps 1+2 are current mini-phase",
|
||||
"project": "atocore",
|
||||
"rule": "llm_extraction"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
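Results files in this shape (per-interaction `expected_count`, `actual_count`, `miss_class`) can be tallied without the harness. A minimal sketch, assuming only the JSON shape shown above; the inline document here is illustrative, not real corpus data:

```python
import json
from collections import Counter

# Illustrative two-entry document with the same shape as an eval results file.
doc = json.loads("""{"results": [
  {"id": "a", "ok": false, "expected_count": 1, "actual_count": 0, "miss_class": "alignment_assertion"},
  {"id": "b", "ok": true,  "expected_count": 0, "actual_count": 0, "miss_class": "n/a"}
]}""")

# A false negative is an interaction that expected candidates but produced none.
misses = Counter(
    r["miss_class"]
    for r in doc["results"]
    if r["expected_count"] > 0 and r["actual_count"] == 0
)
print(dict(misses))
```

This is the same false-negative definition the harness uses for its miss-class breakdown.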
1
scripts/eval_data/interactions_snapshot_2026-04-11.json
Normal file
File diff suppressed because one or more lines are too long
274
scripts/extractor_eval.py
Normal file
@@ -0,0 +1,274 @@
"""Extractor eval runner — scores the rule-based extractor against a
|
||||
labeled interaction corpus.
|
||||
|
||||
Pulls full interaction content from a frozen snapshot, runs each through
|
||||
``extract_candidates_from_interaction``, and compares the output to the
|
||||
expected counts from a labels file. Produces a per-label scorecard plus
|
||||
aggregate precision / recall / yield numbers.
|
||||
|
||||
This harness deliberately stays file-based: snapshot + labels + this
|
||||
runner. No Dalidou HTTP dependency once the snapshot is frozen, so the
|
||||
eval is reproducible run-to-run even as live captures drift.
|
||||
|
||||
Usage:
|
||||
|
||||
python scripts/extractor_eval.py # human report
|
||||
python scripts/extractor_eval.py --json # machine-readable
|
||||
python scripts/extractor_eval.py \\
|
||||
--snapshot scripts/eval_data/interactions_snapshot_2026-04-11.json \\
|
||||
--labels scripts/eval_data/extractor_labels_2026-04-11.json
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import io
|
||||
import json
|
||||
import sys
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
|
||||
# Force UTF-8 on stdout so real LLM output (arrows, em-dashes, CJK)
|
||||
# doesn't crash the human report on Windows cp1252 consoles.
|
||||
if hasattr(sys.stdout, "buffer"):
|
||||
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace", line_buffering=True)
|
||||
|
||||
# Make src/ importable without requiring an install.
|
||||
_REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
sys.path.insert(0, str(_REPO_ROOT / "src"))
|
||||
|
||||
from atocore.interactions.service import Interaction # noqa: E402
|
||||
from atocore.memory.extractor import extract_candidates_from_interaction # noqa: E402
|
||||
from atocore.memory.extractor_llm import extract_candidates_llm # noqa: E402
|
||||
|
||||
DEFAULT_SNAPSHOT = _REPO_ROOT / "scripts" / "eval_data" / "interactions_snapshot_2026-04-11.json"
|
||||
DEFAULT_LABELS = _REPO_ROOT / "scripts" / "eval_data" / "extractor_labels_2026-04-11.json"
|
||||
|
||||
|
||||
@dataclass
|
||||
class LabelResult:
|
||||
id: str
|
||||
expected_count: int
|
||||
actual_count: int
|
||||
ok: bool
|
||||
miss_class: str
|
||||
notes: str
|
||||
actual_candidates: list[dict] = field(default_factory=list)
|
||||
|
||||
|
||||
def load_snapshot(path: Path) -> dict[str, dict]:
|
||||
data = json.loads(path.read_text(encoding="utf-8"))
|
||||
return {item["id"]: item for item in data.get("interactions", [])}
|
||||
|
||||
|
||||
def load_labels(path: Path) -> dict:
|
||||
return json.loads(path.read_text(encoding="utf-8"))
|
||||
|
||||
|
||||
def interaction_from_snapshot(snap: dict) -> Interaction:
|
||||
return Interaction(
|
||||
id=snap["id"],
|
||||
prompt=snap.get("prompt", "") or "",
|
||||
response=snap.get("response", "") or "",
|
||||
response_summary="",
|
||||
project=snap.get("project", "") or "",
|
||||
client=snap.get("client", "") or "",
|
||||
session_id=snap.get("session_id", "") or "",
|
||||
created_at=snap.get("created_at", "") or "",
|
||||
)
|
||||
|
||||
|
||||
def score(snapshot: dict[str, dict], labels_doc: dict, mode: str = "rule") -> list[LabelResult]:
|
||||
results: list[LabelResult] = []
|
||||
for label in labels_doc["labels"]:
|
||||
iid = label["id"]
|
||||
snap = snapshot.get(iid)
|
||||
if snap is None:
|
||||
results.append(
|
||||
LabelResult(
|
||||
id=iid,
|
||||
expected_count=int(label.get("expected_count", 0)),
|
||||
actual_count=-1,
|
||||
ok=False,
|
||||
miss_class="not_in_snapshot",
|
||||
notes=label.get("notes", ""),
|
||||
)
|
||||
)
|
||||
continue
|
||||
interaction = interaction_from_snapshot(snap)
|
||||
if mode == "llm":
|
||||
candidates = extract_candidates_llm(interaction)
|
||||
else:
|
||||
candidates = extract_candidates_from_interaction(interaction)
|
||||
actual_count = len(candidates)
|
||||
expected_count = int(label.get("expected_count", 0))
|
||||
results.append(
|
||||
LabelResult(
|
||||
id=iid,
|
||||
expected_count=expected_count,
|
||||
actual_count=actual_count,
|
||||
ok=(actual_count == expected_count),
|
||||
miss_class=label.get("miss_class", "n/a"),
|
||||
notes=label.get("notes", ""),
|
||||
actual_candidates=[
|
||||
{
|
||||
"memory_type": c.memory_type,
|
||||
"content": c.content,
|
||||
"project": c.project,
|
||||
"rule": c.rule,
|
||||
}
|
||||
for c in candidates
|
||||
],
|
||||
)
|
||||
)
|
||||
return results
|
||||
|
||||
|
||||
def aggregate(results: list[LabelResult]) -> dict:
|
||||
total = len(results)
|
||||
exact_match = sum(1 for r in results if r.ok)
|
||||
true_positive = sum(1 for r in results if r.expected_count > 0 and r.actual_count > 0)
|
||||
false_positive_interactions = sum(
|
||||
1 for r in results if r.expected_count == 0 and r.actual_count > 0
|
||||
)
|
||||
false_negative_interactions = sum(
|
||||
1 for r in results if r.expected_count > 0 and r.actual_count == 0
|
||||
)
|
||||
positive_expected = sum(1 for r in results if r.expected_count > 0)
|
||||
total_expected_candidates = sum(r.expected_count for r in results)
|
||||
total_actual_candidates = sum(max(r.actual_count, 0) for r in results)
|
||||
yield_rate = total_actual_candidates / total if total else 0.0
|
||||
# Recall over interaction count that had at least one expected candidate:
|
||||
recall = true_positive / positive_expected if positive_expected else 0.0
|
||||
# Precision over interaction count that produced any candidate:
|
||||
precision_denom = true_positive + false_positive_interactions
|
||||
precision = true_positive / precision_denom if precision_denom else 0.0
|
||||
# Miss class breakdown
|
||||
miss_classes: dict[str, int] = {}
|
||||
for r in results:
|
||||
if r.expected_count > 0 and r.actual_count == 0:
|
||||
key = r.miss_class or "unlabeled"
|
||||
miss_classes[key] = miss_classes.get(key, 0) + 1
|
||||
return {
|
||||
"total": total,
|
||||
"exact_match": exact_match,
|
||||
"positive_expected": positive_expected,
|
||||
"total_expected_candidates": total_expected_candidates,
|
||||
"total_actual_candidates": total_actual_candidates,
|
||||
"yield_rate": round(yield_rate, 3),
|
||||
"recall": round(recall, 3),
|
||||
"precision": round(precision, 3),
|
||||
"false_positive_interactions": false_positive_interactions,
|
||||
"false_negative_interactions": false_negative_interactions,
|
||||
"miss_classes": miss_classes,
|
||||
}
|
||||
|
||||
|
||||
def print_human(results: list[LabelResult], summary: dict) -> None:
|
||||
print("=== Extractor eval ===")
|
||||
print(
|
||||
f"labeled={summary['total']} "
|
||||
f"exact_match={summary['exact_match']} "
|
||||
f"positive_expected={summary['positive_expected']}"
|
||||
)
|
||||
print(
|
||||
f"yield={summary['yield_rate']} "
|
||||
f"recall={summary['recall']} "
|
||||
f"precision={summary['precision']}"
|
||||
)
|
||||
print(
|
||||
f"false_positives={summary['false_positive_interactions']} "
|
||||
f"false_negatives={summary['false_negative_interactions']}"
|
||||
)
|
||||
print()
|
||||
print("miss class breakdown (FN):")
|
||||
if summary["miss_classes"]:
|
||||
for k, v in sorted(summary["miss_classes"].items(), key=lambda kv: -kv[1]):
|
||||
print(f" {v:3d} {k}")
|
||||
else:
|
||||
print(" (none)")
|
||||
print()
|
||||
print("per-interaction:")
|
||||
for r in results:
|
||||
marker = "OK " if r.ok else "MISS"
|
||||
iid_short = r.id[:8]
|
||||
print(f" {marker} {iid_short} expected={r.expected_count} actual={r.actual_count} class={r.miss_class}")
|
||||
if r.actual_candidates:
|
||||
for c in r.actual_candidates:
|
||||
preview = (c["content"] or "")[:80]
|
||||
print(f" [{c['memory_type']}] {preview}")
|
||||
|
||||
|
||||
def print_json(results: list[LabelResult], summary: dict) -> None:
|
||||
payload = {
|
||||
"summary": summary,
|
||||
"results": [
|
||||
{
|
||||
"id": r.id,
|
||||
"expected_count": r.expected_count,
|
||||
"actual_count": r.actual_count,
|
||||
"ok": r.ok,
|
||||
"miss_class": r.miss_class,
|
||||
"notes": r.notes,
|
||||
"actual_candidates": r.actual_candidates,
|
||||
}
|
||||
for r in results
|
||||
],
|
||||
}
|
||||
json.dump(payload, sys.stdout, indent=2)
|
||||
sys.stdout.write("\n")
|
||||
|
||||
|
||||
def main() -> int:
|
||||
parser = argparse.ArgumentParser(description="AtoCore extractor eval")
|
||||
parser.add_argument("--snapshot", type=Path, default=DEFAULT_SNAPSHOT)
|
||||
parser.add_argument("--labels", type=Path, default=DEFAULT_LABELS)
|
||||
parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
|
||||
parser.add_argument(
|
||||
"--output",
|
||||
type=Path,
|
||||
default=None,
|
||||
help="write JSON result to this file (bypasses log/stdout interleaving)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--mode",
|
||||
choices=["rule", "llm"],
|
||||
default="rule",
|
||||
help="which extractor to score (default: rule)",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
snapshot = load_snapshot(args.snapshot)
|
||||
labels = load_labels(args.labels)
|
||||
results = score(snapshot, labels, mode=args.mode)
|
||||
summary = aggregate(results)
|
||||
summary["mode"] = args.mode
|
||||
|
||||
if args.output is not None:
|
||||
payload = {
|
||||
"summary": summary,
|
||||
"results": [
|
||||
{
|
||||
"id": r.id,
|
||||
"expected_count": r.expected_count,
|
||||
"actual_count": r.actual_count,
|
||||
"ok": r.ok,
|
||||
"miss_class": r.miss_class,
|
||||
"notes": r.notes,
|
||||
"actual_candidates": r.actual_candidates,
|
||||
}
|
||||
for r in results
|
||||
],
|
||||
}
|
||||
args.output.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
|
||||
print(f"wrote {args.output} ({summary['mode']}: recall={summary['recall']} precision={summary['precision']})")
|
||||
elif args.json:
|
||||
print_json(results, summary)
|
||||
else:
|
||||
print_human(results, summary)
|
||||
|
||||
return 0 if summary["false_negative_interactions"] == 0 and summary["false_positive_interactions"] == 0 else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(main())
|
||||
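As a sanity check on the interaction-level metric definitions in `aggregate()`, the same precision/recall arithmetic can be reproduced on a toy scorecard. A minimal self-contained sketch; the numbers are made up and do not come from the real corpus:

```python
# Toy scorecard: (expected_count, actual_count) per labeled interaction,
# mirroring the fields aggregate() reads from each LabelResult.
toy = [(1, 2), (0, 3), (2, 0), (0, 0)]

# True positive: expected candidates AND produced at least one.
tp = sum(1 for e, a in toy if e > 0 and a > 0)
# False positive interaction: produced candidates where none were expected.
fp = sum(1 for e, a in toy if e == 0 and a > 0)
# False negative interaction: expected candidates but produced none.
fn = sum(1 for e, a in toy if e > 0 and a == 0)

positive_expected = sum(1 for e, _ in toy if e > 0)
recall = tp / positive_expected if positive_expected else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(recall, precision)  # 0.5 0.5
```

Note these are interaction-level rates, as the comments in `aggregate()` state: one over-producing interaction and one perfect zero-count interaction are counted the same way the harness counts them.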
54
scripts/ingest_folder.py
Normal file
@@ -0,0 +1,54 @@
"""CLI script to ingest a folder of markdown files."""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
# Add src to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
|
||||
|
||||
from atocore.ingestion.pipeline import ingest_folder
|
||||
from atocore.models.database import init_db
|
||||
from atocore.observability.logger import setup_logging
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="Ingest markdown files into AtoCore")
|
||||
parser.add_argument("--path", required=True, help="Path to folder with markdown files")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging()
|
||||
init_db()
|
||||
|
||||
folder = Path(args.path)
|
||||
if not folder.is_dir():
|
||||
print(f"Error: {folder} is not a directory")
|
||||
sys.exit(1)
|
||||
|
||||
results = ingest_folder(folder)
|
||||
|
||||
# Summary
|
||||
ingested = sum(1 for r in results if r["status"] == "ingested")
|
||||
skipped = sum(1 for r in results if r["status"] == "skipped")
|
||||
errors = sum(1 for r in results if r["status"] == "error")
|
||||
total_chunks = sum(r.get("chunks", 0) for r in results)
|
||||
|
||||
print(f"\n{'='*50}")
|
||||
print(f"Ingestion complete:")
|
||||
print(f" Files processed: {len(results)}")
|
||||
print(f" Ingested: {ingested}")
|
||||
print(f" Skipped (unchanged): {skipped}")
|
||||
print(f" Errors: {errors}")
|
||||
print(f" Total chunks created: {total_chunks}")
|
||||
print(f"{'='*50}")
|
||||
|
||||
if errors:
|
||||
print("\nErrors:")
|
||||
for r in results:
|
||||
if r["status"] == "error":
|
||||
print(f" {r['file']}: {r['error']}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
1018
scripts/migrate_legacy_aliases.py
Normal file
File diff suppressed because it is too large
89
scripts/persist_llm_candidates.py
Normal file
@@ -0,0 +1,89 @@
"""Persist LLM-extracted candidates from a baseline JSON to Dalidou.
|
||||
|
||||
One-shot script: reads a saved extractor eval output file, filters to
|
||||
candidates the LLM actually produced, and POSTs each to the Dalidou
|
||||
memory API with ``status=candidate``. Deduplicates against already-
|
||||
existing candidate content so the script is safe to re-run.
|
||||
|
||||
Usage:
|
||||
|
||||
python scripts/persist_llm_candidates.py \\
|
||||
scripts/eval_data/extractor_llm_baseline_2026-04-11.json
|
||||
|
||||
Then triage via:
|
||||
|
||||
python scripts/atocore_client.py triage
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
import urllib.error
|
||||
import urllib.parse
|
||||
import urllib.request
|
||||
|
||||
BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100")
|
||||
TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "10"))
|
||||
|
||||
|
||||
def post_json(path: str, body: dict) -> dict:
|
||||
data = json.dumps(body).encode("utf-8")
|
||||
req = urllib.request.Request(
|
||||
url=f"{BASE_URL}{path}",
|
||||
method="POST",
|
||||
headers={"Content-Type": "application/json"},
|
||||
data=data,
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=TIMEOUT) as resp:
|
||||
return json.loads(resp.read().decode("utf-8"))
|
||||
|
||||
|
||||
def main() -> int:
|
||||
if len(sys.argv) < 2:
|
||||
print(f"usage: {sys.argv[0]} <baseline_json>", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
data = json.loads(open(sys.argv[1], encoding="utf-8").read())
|
||||
results = data.get("results", [])
|
||||
|
||||
persisted = 0
|
||||
skipped = 0
|
||||
errors = 0
|
||||
|
||||
for r in results:
|
||||
for c in r.get("actual_candidates", []):
|
||||
content = (c.get("content") or "").strip()
|
||||
if not content:
|
||||
continue
|
||||
mem_type = c.get("memory_type", "knowledge")
|
||||
project = c.get("project", "")
|
||||
confidence = c.get("confidence", 0.5)
|
||||
|
||||
try:
|
||||
resp = post_json("/memory", {
|
||||
"memory_type": mem_type,
|
||||
"content": content,
|
||||
"project": project,
|
||||
"confidence": float(confidence),
|
||||
"status": "candidate",
|
||||
})
|
||||
persisted += 1
|
||||
print(f" + {resp.get('id','?')[:8]} [{mem_type}] {content[:80]}")
|
||||
except urllib.error.HTTPError as exc:
|
||||
if exc.code == 400:
|
||||
skipped += 1
|
||||
else:
|
||||
errors += 1
|
||||
print(f" ! error {exc.code}: {content[:60]}", file=sys.stderr)
|
||||
except Exception as exc:
|
||||
errors += 1
|
||||
print(f" ! {exc}: {content[:60]}", file=sys.stderr)
|
||||
|
||||
print(f"\npersisted={persisted} skipped={skipped} errors={errors}")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(main())
|
||||
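The candidate-selection step of this script (skip blank content, keep everything else) can be exercised offline before pointing it at a live Dalidou. A minimal sketch that mirrors that filter on a hypothetical baseline document; the inline data is illustrative only:

```python
# Hypothetical baseline document with the same shape the script reads.
baseline = {
    "results": [
        {"actual_candidates": [
            {"memory_type": "project", "content": "USB SSD mandatory", "project": "p06-polisher"},
            {"memory_type": "knowledge", "content": "   ", "project": ""},  # blank -> skipped
        ]},
        {"actual_candidates": []},  # no candidates -> nothing persisted
    ]
}


def select_candidates(doc: dict) -> list[dict]:
    """Mirror the script's filter: keep only candidates with non-empty content."""
    out = []
    for r in doc.get("results", []):
        for c in r.get("actual_candidates", []):
            content = (c.get("content") or "").strip()
            if content:
                out.append({**c, "content": content})
    return out


picked = select_candidates(baseline)
print(len(picked))  # 1
```

Server-side deduplication (the HTTP 400 path) is not reproduced here; this only checks what the script would attempt to POST.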
393
scripts/phase9_first_real_use.py
Normal file
@@ -0,0 +1,393 @@
"""Phase 9 first-real-use validation script.
|
||||
|
||||
Captures a small set of representative interactions drawn from a real
|
||||
working session, runs the full Phase 9 loop (capture -> reinforce ->
|
||||
extract) over them, and prints what each step produced. The intent is
|
||||
to generate empirical evidence about the extractor's behaviour against
|
||||
prose that wasn't written to make the test pass.
|
||||
|
||||
Usage:
|
||||
python scripts/phase9_first_real_use.py [--data-dir PATH]
|
||||
|
||||
The script writes a fresh isolated SQLite + Chroma store under the
|
||||
given data dir (default: ./data/validation/phase9-first-use). The
|
||||
data dir is gitignored so the script can be re-run cleanly.
|
||||
|
||||
Each interaction is printed with:
|
||||
- the captured interaction id
|
||||
- the reinforcement results (which seeded memories were echoed)
|
||||
- the extraction results (which candidates were proposed and why)
|
||||
- notes on what the extractor MISSED (manually annotated below)
|
||||
|
||||
The output is intentionally human-readable so the run can be saved as
|
||||
the body of docs/phase9-first-real-use.md.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import shutil
|
||||
import sys
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
|
||||
# Make the package importable when run directly from the repo root.
|
||||
_REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
sys.path.insert(0, str(_REPO_ROOT / "src"))
|
||||
|
||||
|
||||
@dataclass
|
||||
class SampleInteraction:
|
||||
label: str
|
||||
prompt: str
|
||||
response: str
|
||||
project: str
|
||||
notes_for_human: str
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Sample interactions, paraphrased from real working sessions on AtoCore
|
||||
# itself. Each one is a real-feeling exchange with at least one structural
|
||||
# cue the extractor should catch, plus prose the extractor should NOT
|
||||
# accidentally catch.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
SAMPLES: list[SampleInteraction] = [
|
||||
SampleInteraction(
|
||||
label="exdev-mount-merge-decision",
|
||||
project="atocore",
|
||||
prompt="The TMPDIR fix didn't work, the LOCAL_COPY_PATH change didn't work, what now?",
|
||||
response=(
|
||||
"We've been chasing this through three layers without success. "
|
||||
"Time to stop guessing and apply the simple fix Dalidou Claude "
|
||||
"proposed in the very first diagnosis.\n\n"
|
||||
"## Decision: merge the two bind mounts into one\n\n"
|
||||
"Both /srv/storage/repos and /srv/storage/gitea/data live on "
|
||||
"/dev/sdb1, so the move is a same-filesystem rename — instant, "
|
||||
"atomic, no data copy. After the move, the EXDEV class of bug "
|
||||
"becomes structurally impossible because there is only one "
|
||||
"mount inside the container."
|
||||
),
|
||||
notes_for_human=(
|
||||
"Should fire decision_heading rule. Should NOT fire requirement "
|
||||
"or constraint rules. The 'becomes structurally impossible' line "
|
||||
"is prose, not a candidate."
|
||||
),
|
||||
),
|
||||
SampleInteraction(
|
||||
label="ownership-was-the-real-fix",
|
||||
project="atocore",
|
||||
prompt="Still failing after the mount merge. What's actually wrong?",
|
||||
response=(
|
||||
"Same rejection even after the mounts were merged, which kills "
|
||||
"the EXDEV theory. The error 'unable to migrate objects to "
|
||||
"permanent storage' is git's catch-all for any failure in "
|
||||
"tmp_objdir_migrate, and the real errno is being thrown away.\n\n"
|
||||
"## Fact: the actual root cause was ownership, not cross-device link\n\n"
|
||||
"After mv on the host, the moved directory had perms git could "
|
||||
"navigate for some subdirs but not others. Recursive chown "
|
||||
"git:git plus chmod u+rwX,g+rwX fixed it on the first try."
|
||||
),
|
||||
notes_for_human=(
|
||||
"Should fire fact_heading rule, target type knowledge. The "
|
||||
"'unable to migrate' string is inside prose and should NOT "
|
||||
"trigger any rule."
|
||||
),
|
||||
),
|
||||
SampleInteraction(
|
||||
label="memory-vs-entity-canonical-home",
|
||||
project="atocore",
|
||||
prompt="Does a Decision live in memories or entities?",
|
||||
response=(
|
||||
"Both layers exist for different reasons. After thinking through "
|
||||
"the ontology question, I'm landing on a clean rule: one "
|
||||
"canonical home per concept, with explicit graduation when a "
|
||||
"memory eventually deserves structure.\n\n"
|
||||
"## Decision: project, knowledge, and adaptation memory types graduate to entities\n\n"
|
||||
"The identity, preference, and episodic types stay in the memory "
|
||||
"layer because they're personal and unstructured. The other "
|
||||
            "three become entities once the engineering layer V1 ships. The "
            "original memory rows are NEVER deleted — they become frozen "
            "historical pointers to whichever entity they graduated into."
        ),
        notes_for_human=(
            "Should fire decision_heading rule. Long content (over 100 "
            "chars) — verifies the max-length cap of 280 doesn't truncate."
        ),
    ),
    SampleInteraction(
        label="auto-promotion-deferred",
        project="atocore",
        prompt="Should the engineering layer auto-promote candidates?",
        response=(
            "The trust hierarchy is the whole point of AtoCore. If we let "
            "the system auto-promote, we trade away the 'bad memory is "
            "worse than no memory' invariant for a marginal speed gain.\n\n"
            "## Decision: no auto-promotion in V1, ever\n\n"
            "The schema and API are designed so policies can be added "
            "later without migration, but the V1 stance is that every "
            "promotion needs a human signoff. The hard line: nothing ever "
            "moves into trusted project state automatically, even with the "
            "engineering layer running."
        ),
        notes_for_human=(
            "Should fire decision_heading rule. Has 'invariant' jargon — "
            "verify the extractor doesn't get distracted."
        ),
    ),
    SampleInteraction(
        label="preference-rebase-workflow",
        project="atocore",
        prompt="Should we use rebase or merge commits for this branch?",
        response=(
            "I prefer rebase-based workflows because the history stays "
            "linear and reviewers have an easier time. For the dalidou "
            "storage-foundation branch we'll rebase before merging into "
            "main so the timeline reads as one coherent narrative."
        ),
        notes_for_human=(
            "Should fire preference_sentence rule. Note: the rule looks "
            "for 'I prefer' which is exactly the wording used."
        ),
    ),
    SampleInteraction(
        label="constraint-from-doc-cite",
        project="p05-interferometer",
        prompt="What's the budget situation for p05 right now?",
        response=(
            "Looking at the latest design notes, the constraint is fairly "
            "tight on the optical path side.\n\n"
            "### Constraint: total optical path stability must stay within 10 nm RMS at 1 kHz\n\n"
            "The error budget allocates 6 nm to the laser source and 4 nm "
            "to the mechanical mounts and frame combined."
        ),
        notes_for_human=(
            "Should fire constraint_heading rule with type=project. The "
            "second sentence ('error budget allocates 6 nm') is descriptive "
            "and should NOT fire any rule because it doesn't have a "
            "structural cue."
        ),
    ),
    SampleInteraction(
        label="prose-only-no-cues",
        project="atocore",
        prompt="What should I work on next?",
        response=(
            "Looking at the current state of AtoCore, the next natural "
            "step is to validate Phase 9 against real interactions before "
            "starting the engineering layer implementation. Empirical "
            "evidence is missing and the planning docs would benefit from "
            "real signal."
        ),
        notes_for_human=(
            "Should produce ZERO candidates. Pure recommendation prose, no "
            "structural cues. If this fires anything the extractor is too "
            "loose."
        ),
    ),
    SampleInteraction(
        label="multiple-cues-in-one-interaction",
        project="p06-polisher",
        prompt="Summarize today's polisher session",
        response=(
            "We worked through three things in the polisher session today.\n\n"
            "## Decision: defer the laser interlock redesign to after the July milestone\n\n"
            "## Constraint: the calibration routine must complete in under 90 seconds for production use\n\n"
            "## Requirement: the polisher must hold position to within 0.5 micron at 1 g loading\n\n"
            "Action items captured for the next sync."
        ),
        notes_for_human=(
            "Three rules should fire on the same interaction: "
            "decision_heading -> adaptation, constraint_heading -> project, "
            "requirement_heading -> project. Verify dedup doesn't merge them."
        ),
    ),
]


def setup_environment(data_dir: Path) -> None:
    """Configure AtoCore to use an isolated data directory for this run."""
    if data_dir.exists():
        shutil.rmtree(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    os.environ["ATOCORE_DATA_DIR"] = str(data_dir)
    os.environ.setdefault("ATOCORE_DEBUG", "true")
    # Reset cached settings so the new env vars take effect
    import atocore.config as config

    config.settings = config.Settings()
    import atocore.retrieval.vector_store as vs

    vs._store = None


def seed_memories() -> dict[str, str]:
    """Insert a small set of seed active memories so reinforcement has
    something to match against."""
    from atocore.memory.service import create_memory

    seeded: dict[str, str] = {}
    seeded["pref_rebase"] = create_memory(
        memory_type="preference",
        content="prefers rebase-based workflows because history stays linear",
        confidence=0.6,
    ).id
    seeded["pref_concise"] = create_memory(
        memory_type="preference",
        content="writes commit messages focused on the why, not the what",
        confidence=0.6,
    ).id
    seeded["identity_runs_atocore"] = create_memory(
        memory_type="identity",
        content="mechanical engineer who runs AtoCore for context engineering",
        confidence=0.9,
    ).id
    return seeded


def run_sample(sample: SampleInteraction) -> dict:
    """Capture one sample, run extraction, return a result dict."""
    from atocore.interactions.service import record_interaction
    from atocore.memory.extractor import extract_candidates_from_interaction

    interaction = record_interaction(
        prompt=sample.prompt,
        response=sample.response,
        project=sample.project,
        client="phase9-first-real-use",
        session_id="first-real-use",
        reinforce=True,
    )
    candidates = extract_candidates_from_interaction(interaction)

    return {
        "label": sample.label,
        "project": sample.project,
        "interaction_id": interaction.id,
        "expected_notes": sample.notes_for_human,
        "candidate_count": len(candidates),
        "candidates": [
            {
                "memory_type": c.memory_type,
                "rule": c.rule,
                "content": c.content,
                "source_span": c.source_span[:120],
            }
            for c in candidates
        ],
    }


def report_seed_memory_state(seeded_ids: dict[str, str]) -> dict:
    from atocore.memory.service import get_memories

    state = {}
    for label, mid in seeded_ids.items():
        rows = [m for m in get_memories(limit=200) if m.id == mid]
        if not rows:
            state[label] = None
            continue
        m = rows[0]
        state[label] = {
            "id": m.id,
            "memory_type": m.memory_type,
            "content_preview": m.content[:80],
            "confidence": round(m.confidence, 4),
            "reference_count": m.reference_count,
            "last_referenced_at": m.last_referenced_at,
        }
    return state


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data-dir",
        default=str(_REPO_ROOT / "data" / "validation" / "phase9-first-use"),
        help="Isolated data directory to use for this validation run",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        help="Emit machine-readable JSON instead of human prose",
    )
    args = parser.parse_args()

    data_dir = Path(args.data_dir).resolve()
    setup_environment(data_dir)

    from atocore.models.database import init_db
    from atocore.context.project_state import init_project_state_schema

    init_db()
    init_project_state_schema()

    seeded = seed_memories()
    sample_results = [run_sample(s) for s in SAMPLES]
    final_seed_state = report_seed_memory_state(seeded)

    if args.json:
        json.dump(
            {
                "data_dir": str(data_dir),
                "seeded_memories_initial": list(seeded.keys()),
                "samples": sample_results,
                "seed_memory_state_after_run": final_seed_state,
            },
            sys.stdout,
            indent=2,
            default=str,
        )
        return 0

    print("=" * 78)
    print("Phase 9 first-real-use validation run")
    print("=" * 78)
    print(f"Isolated data dir: {data_dir}")
    print()
    print("Seeded the memory store with 3 active memories:")
    for label, mid in seeded.items():
        print(f"  - {label} ({mid[:8]})")
    print()
    print("-" * 78)
    print(f"Running {len(SAMPLES)} sample interactions ...")
    print("-" * 78)

    for result in sample_results:
        print()
        print(f"## {result['label']} [project={result['project']}]")
        print(f"  interaction_id={result['interaction_id'][:8]}")
        print(f"  expected: {result['expected_notes']}")
        print(f"  candidates produced: {result['candidate_count']}")
        for i, cand in enumerate(result["candidates"], 1):
            print(
                f"    [{i}] type={cand['memory_type']:11s} "
                f"rule={cand['rule']:21s} "
                f"content={cand['content']!r}"
            )

    print()
    print("-" * 78)
    print("Reinforcement state on seeded memories AFTER all interactions:")
    print("-" * 78)
    for label, state in final_seed_state.items():
        if state is None:
            print(f"  {label}: <missing>")
            continue
        print(
            f"  {label}: confidence={state['confidence']:.4f} "
            f"refs={state['reference_count']} "
            f"last={state['last_referenced_at'] or '-'}"
        )

    print()
    print("=" * 78)
    print("Run complete. Data written to:", data_dir)
    print("=" * 78)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
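The samples above all hinge on structural cues such as `## Decision:` headings. The extractor's actual rules are not shown in this diff, so the regex below is an assumption, a minimal sketch of what a heading-cue matcher like `decision_heading` might look for:

```python
import re

# Hypothetical sketch: NOT the real AtoCore extractor rule, only an
# illustration of the "## Decision:" structural cue the samples rely on.
# Matches an h2/h3 markdown heading whose text starts with "Decision: ".
DECISION_HEADING = re.compile(r"^#{2,3} Decision: (?P<body>.+)$", re.MULTILINE)

response = (
    "Some prose first.\n\n"
    "## Decision: no auto-promotion in V1, ever\n\n"
    "More prose after."
)

match = DECISION_HEADING.search(response)
assert match is not None
print(match.group("body"))  # -> no auto-promotion in V1, ever
```

A cue-based matcher like this would deliberately ignore the "prose-only-no-cues" sample, which is exactly the behavior that fixture checks for.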
76
scripts/query_test.py
Normal file
@@ -0,0 +1,76 @@
"""CLI script to run test prompts and compare baseline vs enriched."""

import argparse
import sys
from pathlib import Path

import yaml

# Add src to path
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

from atocore.context.builder import build_context
from atocore.models.database import init_db
from atocore.observability.logger import setup_logging


def main():
    parser = argparse.ArgumentParser(description="Run test prompts against AtoCore")
    parser.add_argument(
        "--prompts",
        default=str(Path(__file__).parent.parent / "tests" / "test_prompts" / "prompts.yaml"),
        help="Path to prompts YAML file",
    )
    args = parser.parse_args()

    setup_logging()
    init_db()

    prompts_path = Path(args.prompts)
    if not prompts_path.exists():
        print(f"Error: {prompts_path} not found")
        sys.exit(1)

    with open(prompts_path) as f:
        data = yaml.safe_load(f)

    prompts = data.get("prompts", [])
    print(f"Running {len(prompts)} test prompts...\n")

    for p in prompts:
        prompt_id = p["id"]
        prompt_text = p["prompt"]
        project = p.get("project")
        expected = p.get("expected", "")

        print(f"{'='*60}")
        print(f"[{prompt_id}] {prompt_text}")
        print(f"Project: {project or 'none'}")
        print(f"Expected: {expected}")
        print("-" * 60)

        pack = build_context(
            user_prompt=prompt_text,
            project_hint=project,
        )

        print(f"Chunks retrieved: {len(pack.chunks_used)}")
        print(f"Total chars: {pack.total_chars} / {pack.budget}")
        print(f"Duration: {pack.duration_ms}ms")
        print()

        for i, chunk in enumerate(pack.chunks_used[:5]):
            print(f"  [{i+1}] Score: {chunk.score:.2f} | {chunk.source_file}")
            print(f"      Section: {chunk.heading_path}")
            print(f"      Preview: {chunk.content[:120]}...")
            print()

        print(f"Full prompt length: {len(pack.full_prompt)} chars")
        print()

    print(f"{'='*60}")
    print("Done. Review output above to assess retrieval quality.")


if __name__ == "__main__":
    main()
194
scripts/retrieval_eval.py
Normal file
@@ -0,0 +1,194 @@
"""Retrieval quality eval harness.

Runs a fixed set of project-hinted questions against
``POST /context/build`` on a live AtoCore instance and scores the
resulting ``formatted_context`` against per-question expectations.
The goal is a diffable scorecard that tells you, run-to-run,
whether a retrieval / builder / ingestion change moved the needle.

Design notes
------------
- Fixtures live in ``scripts/retrieval_eval_fixtures.json`` so new
  questions can be added without touching Python. Each fixture
  names the project, the prompt, and a checklist of substrings that
  MUST appear in ``formatted_context`` (``expect_present``) and
  substrings that MUST NOT appear (``expect_absent``). The absent
  list catches cross-project bleed and stale content.
- The checklist is deliberately substring-based (not regex, not
  embedding-similarity) so a failure is always a trivially
  reproducible "this string is not in that string". Richer scoring
  can come later once we know the harness is useful.
- The harness is external to the app runtime and talks to AtoCore
  over HTTP, so it works against dev, staging, or prod. It follows
  the same environment-variable contract as ``atocore_client.py``
  (``ATOCORE_BASE_URL``, ``ATOCORE_TIMEOUT_SECONDS``).
- Exit code 0 on all-pass, 1 on any fixture failure. Intended for
  manual runs today; a future cron / CI hook can consume the
  JSON output via ``--json``.

Usage
-----

    python scripts/retrieval_eval.py            # human-readable report
    python scripts/retrieval_eval.py --json     # machine-readable
    python scripts/retrieval_eval.py --fixtures path/to/custom.json
"""

from __future__ import annotations

import argparse
import json
import os
import sys
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass, field
from pathlib import Path

DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100")
DEFAULT_TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30"))
DEFAULT_BUDGET = 3000
DEFAULT_FIXTURES = Path(__file__).parent / "retrieval_eval_fixtures.json"


@dataclass
class Fixture:
    name: str
    project: str
    prompt: str
    budget: int = DEFAULT_BUDGET
    expect_present: list[str] = field(default_factory=list)
    expect_absent: list[str] = field(default_factory=list)
    notes: str = ""


@dataclass
class FixtureResult:
    fixture: Fixture
    ok: bool
    missing_present: list[str]
    unexpected_absent: list[str]
    total_chars: int
    error: str = ""


def load_fixtures(path: Path) -> list[Fixture]:
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError(f"{path} must contain a JSON array of fixtures")
    fixtures: list[Fixture] = []
    for i, raw in enumerate(data):
        if not isinstance(raw, dict):
            raise ValueError(f"fixture {i} is not an object")
        fixtures.append(
            Fixture(
                name=raw["name"],
                project=raw.get("project", ""),
                prompt=raw["prompt"],
                budget=int(raw.get("budget", DEFAULT_BUDGET)),
                expect_present=list(raw.get("expect_present", [])),
                expect_absent=list(raw.get("expect_absent", [])),
                notes=raw.get("notes", ""),
            )
        )
    return fixtures


def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
    payload = {
        "prompt": fixture.prompt,
        "project": fixture.project or None,
        "budget": fixture.budget,
    }
    req = urllib.request.Request(
        url=f"{base_url}/context/build",
        method="POST",
        headers={"Content-Type": "application/json"},
        data=json.dumps(payload).encode("utf-8"),
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except urllib.error.URLError as exc:
        return FixtureResult(
            fixture=fixture,
            ok=False,
            missing_present=list(fixture.expect_present),
            unexpected_absent=[],
            total_chars=0,
            error=f"http_error: {exc}",
        )

    formatted = body.get("formatted_context") or ""
    missing = [s for s in fixture.expect_present if s not in formatted]
    unexpected = [s for s in fixture.expect_absent if s in formatted]
    return FixtureResult(
        fixture=fixture,
        ok=not missing and not unexpected,
        missing_present=missing,
        unexpected_absent=unexpected,
        total_chars=len(formatted),
    )


def print_human_report(results: list[FixtureResult]) -> None:
    total = len(results)
    passed = sum(1 for r in results if r.ok)
    print(f"Retrieval eval: {passed}/{total} fixtures passed")
    print()
    for r in results:
        marker = "PASS" if r.ok else "FAIL"
        print(f"[{marker}] {r.fixture.name} project={r.fixture.project} chars={r.total_chars}")
        if r.error:
            print(f"    error: {r.error}")
        for miss in r.missing_present:
            print(f"    missing expected: {miss!r}")
        for bleed in r.unexpected_absent:
            print(f"    unexpected present: {bleed!r}")
        if r.fixture.notes and not r.ok:
            print(f"    notes: {r.fixture.notes}")


def print_json_report(results: list[FixtureResult]) -> None:
    payload = {
        "total": len(results),
        "passed": sum(1 for r in results if r.ok),
        "fixtures": [
            {
                "name": r.fixture.name,
                "project": r.fixture.project,
                "ok": r.ok,
                "total_chars": r.total_chars,
                "missing_present": r.missing_present,
                "unexpected_absent": r.unexpected_absent,
                "error": r.error,
            }
            for r in results
        ],
    }
    json.dump(payload, sys.stdout, indent=2)
    sys.stdout.write("\n")


def main() -> int:
    parser = argparse.ArgumentParser(description="AtoCore retrieval quality eval harness")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
    parser.add_argument("--timeout", type=int, default=DEFAULT_TIMEOUT)
    parser.add_argument("--fixtures", type=Path, default=DEFAULT_FIXTURES)
    parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    args = parser.parse_args()

    fixtures = load_fixtures(args.fixtures)
    results = [run_fixture(f, args.base_url, args.timeout) for f in fixtures]

    if args.json:
        print_json_report(results)
    else:
        print_human_report(results)

    return 0 if all(r.ok for r in results) else 1


if __name__ == "__main__":
    raise SystemExit(main())
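Stripped of the HTTP plumbing, the harness's pass/fail logic reduces to two list comprehensions over substrings. A minimal sketch with made-up sample data (the `formatted` text below is invented for illustration, not real AtoCore output):

```python
# Invented sample of what a formatted_context band might contain.
formatted = "--- Trusted Project State ---\nOption B with a conical back\n"

expect_present = ["--- Trusted Project State ---", "Option B"]
expect_absent = ["folded-beam"]

# A fixture passes only when every expected substring is present and
# no forbidden substring leaked in.
missing = [s for s in expect_present if s not in formatted]
unexpected = [s for s in expect_absent if s in formatted]
ok = not missing and not unexpected

print(ok)  # -> True
```

Because the check is plain `in` on strings, any failure can be reproduced in a REPL with two copy-pasted values, which is the diffable-scorecard property the docstring argues for.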
86
scripts/retrieval_eval_fixtures.json
Normal file
@@ -0,0 +1,86 @@
[
  {
    "name": "p04-architecture-decision",
    "project": "p04-gigabit",
    "prompt": "what mirror architecture was selected for GigaBIT M1 and why",
    "expect_present": [
      "--- Trusted Project State ---",
      "Option B",
      "conical",
      "--- Project Memories ---"
    ],
    "expect_absent": [
      "p06-polisher",
      "folded-beam"
    ],
    "notes": "Canonical p04 decision — should surface both Trusted Project State (selected_mirror_architecture) and the project-memory band with the Option B memory"
  },
  {
    "name": "p04-constraints",
    "project": "p04-gigabit",
    "prompt": "what are the key GigaBIT M1 program constraints",
    "expect_present": [
      "--- Trusted Project State ---",
      "Zerodur",
      "1.2"
    ],
    "expect_absent": [
      "polisher suite"
    ],
    "notes": "Key constraints are in Trusted Project State (key_constraints) and in the mission-framing memory"
  },
  {
    "name": "p05-configuration",
    "project": "p05-interferometer",
    "prompt": "what is the selected interferometer configuration",
    "expect_present": [
      "folded-beam",
      "CGH"
    ],
    "expect_absent": [
      "Option B",
      "conical back",
      "polisher suite"
    ],
    "notes": "P05 architecture memory covers folded-beam + CGH. GigaBIT M1 is the mirror under test and legitimately appears in p05 source docs (the interferometer measures it), so we only flag genuinely p04-only decisions like the mirror architecture choice."
  },
  {
    "name": "p05-vendor-signal",
    "project": "p05-interferometer",
    "prompt": "what is the current vendor signal for the interferometer procurement",
    "expect_present": [
      "4D",
      "Zygo"
    ],
    "expect_absent": [
      "polisher"
    ],
    "notes": "Vendor memory mentions 4D as strongest technical candidate and Zygo Verifire SV as value path"
  },
  {
    "name": "p06-suite-split",
    "project": "p06-polisher",
    "prompt": "how is the polisher software suite split across layers",
    "expect_present": [
      "polisher-sim",
      "polisher-post",
      "polisher-control"
    ],
    "expect_absent": [
      "GigaBIT"
    ],
    "notes": "The three-layer split is in multiple p06 memories; check all three names surface together"
  },
  {
    "name": "p06-control-rule",
    "project": "p06-polisher",
    "prompt": "what is the polisher control design rule",
    "expect_present": [
      "interlocks"
    ],
    "expect_absent": [
      "interferometer"
    ],
    "notes": "Control design rule memory mentions interlocks and state transitions"
  }
]
15
src/atocore/__init__.py
Normal file
@@ -0,0 +1,15 @@
"""AtoCore — Personal Context Engine."""

# Bumped when a commit meaningfully changes the API surface, schema, or
# user-visible behavior. The /health endpoint reports this value so
# deployment drift is immediately visible: if the running service's
# /health reports an older version than the main branch's __version__,
# the deployment is stale and needs a redeploy (see
# docs/dalidou-deployment.md and deploy/dalidou/deploy.sh).
#
# History:
#   0.1.0  Phase 0/0.5/1/2/3/5/7 baseline
#   0.2.0  Phase 9 reflection loop (capture/reinforce/extract + review
#          queue), shared client v0.2.0, project identity
#          canonicalization at every service-layer entry point
__version__ = "0.2.0"
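The drift check described in the comment above can be scripted as well as eyeballed. A minimal sketch, assuming the versions stay in plain `MAJOR.MINOR.PATCH` form (the `parse_version` helper below is hypothetical, not part of the module):

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn '0.2.0' into (0, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

main_branch = "0.2.0"  # __version__ on the main branch
running = "0.1.0"      # what a stale /health endpoint might report

# Tuple comparison is elementwise, so (0, 1, 0) < (0, 2, 0) holds.
stale = parse_version(running) < parse_version(main_branch)
print(stale)  # -> True
```

String comparison alone would misorder versions like "0.10.0" vs "0.9.0", which is why the sketch compares integer tuples.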
0
src/atocore/api/__init__.py
Normal file
847
src/atocore/api/routes.py
Normal file
@@ -0,0 +1,847 @@
"""FastAPI route definitions."""

from pathlib import Path

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

import atocore.config as _config
from atocore.context.builder import (
    build_context,
    get_last_context_pack,
    _pack_to_dict,
)
from atocore.context.project_state import (
    CATEGORIES,
    get_state,
    invalidate_state,
    set_state,
)
from atocore.ingestion.pipeline import (
    exclusive_ingestion,
    get_ingestion_stats,
    get_source_status,
    ingest_configured_sources,
    ingest_file,
    ingest_folder,
)
from atocore.interactions.service import (
    get_interaction,
    list_interactions,
    record_interaction,
)
from atocore.memory.extractor import (
    EXTRACTOR_VERSION,
    MemoryCandidate,
    extract_candidates_from_interaction,
)
from atocore.memory.reinforcement import reinforce_from_interaction
from atocore.memory.service import (
    MEMORY_STATUSES,
    MEMORY_TYPES,
    create_memory,
    get_memories,
    invalidate_memory,
    promote_memory,
    reject_candidate_memory,
    supersede_memory,
    update_memory,
)
from atocore.observability.logger import get_logger
from atocore.ops.backup import (
    cleanup_old_backups,
    create_runtime_backup,
    list_runtime_backups,
    validate_backup,
)
from atocore.projects.registry import (
    build_project_registration_proposal,
    get_project_registry_template,
    list_registered_projects,
    register_project,
    refresh_registered_project,
    update_project,
)
from atocore.retrieval.retriever import retrieve
from atocore.retrieval.vector_store import get_vector_store

router = APIRouter()
log = get_logger("api")


# --- Request/Response models ---


class IngestRequest(BaseModel):
    path: str


class IngestResponse(BaseModel):
    results: list[dict]


class IngestSourcesResponse(BaseModel):
    results: list[dict]


class ProjectRefreshResponse(BaseModel):
    project: str
    aliases: list[str]
    description: str
    purge_deleted: bool
    status: str
    roots_ingested: int
    roots_skipped: int
    roots: list[dict]


class ProjectRegistrationProposalRequest(BaseModel):
    project_id: str
    aliases: list[str] = []
    description: str = ""
    ingest_roots: list[dict]


class ProjectUpdateRequest(BaseModel):
    aliases: list[str] | None = None
    description: str | None = None
    ingest_roots: list[dict] | None = None


class QueryRequest(BaseModel):
    prompt: str
    top_k: int = 10
    filter_tags: list[str] | None = None
    project: str | None = None


class QueryResponse(BaseModel):
    results: list[dict]


class ContextBuildRequest(BaseModel):
    prompt: str
    project: str | None = None
    budget: int | None = None


class ContextBuildResponse(BaseModel):
    formatted_context: str
    full_prompt: str
    chunks_used: int
    total_chars: int
    budget: int
    budget_remaining: int
    duration_ms: int
    chunks: list[dict]


class MemoryCreateRequest(BaseModel):
    memory_type: str
    content: str
    project: str = ""
    confidence: float = 1.0
    status: str = "active"


class MemoryUpdateRequest(BaseModel):
    content: str | None = None
    confidence: float | None = None
    status: str | None = None


class ProjectStateSetRequest(BaseModel):
    project: str
    category: str
    key: str
    value: str
    source: str = ""
    confidence: float = 1.0


class ProjectStateGetRequest(BaseModel):
    project: str
    category: str | None = None


class ProjectStateInvalidateRequest(BaseModel):
    project: str
    category: str
    key: str


# --- Endpoints ---


@router.post("/ingest", response_model=IngestResponse)
def api_ingest(req: IngestRequest) -> IngestResponse:
    """Ingest a markdown file or folder."""
    target = Path(req.path)
    try:
        with exclusive_ingestion():
            if target.is_file():
                results = [ingest_file(target)]
            elif target.is_dir():
                results = ingest_folder(target)
            else:
                raise HTTPException(status_code=404, detail=f"Path not found: {req.path}")
    except HTTPException:
        raise
    except Exception as e:
        log.error("ingest_failed", path=req.path, error=str(e))
        raise HTTPException(status_code=500, detail=f"Ingestion failed: {e}")
    return IngestResponse(results=results)


@router.post("/ingest/sources", response_model=IngestSourcesResponse)
def api_ingest_sources() -> IngestSourcesResponse:
    """Ingest enabled configured source directories."""
    try:
        with exclusive_ingestion():
            results = ingest_configured_sources()
    except Exception as e:
        log.error("ingest_sources_failed", error=str(e))
        raise HTTPException(status_code=500, detail=f"Configured source ingestion failed: {e}")
    return IngestSourcesResponse(results=results)


@router.get("/projects")
def api_projects() -> dict:
    """Return registered projects and their resolved ingest roots."""
    return {
        "projects": list_registered_projects(),
        "registry_path": str(_config.settings.resolved_project_registry_path),
    }


@router.get("/projects/template")
def api_projects_template() -> dict:
    """Return a starter template for project registry entries."""
    return {
        "template": get_project_registry_template(),
        "registry_path": str(_config.settings.resolved_project_registry_path),
        "allowed_sources": ["vault", "drive"],
    }


@router.post("/projects/proposal")
def api_project_registration_proposal(req: ProjectRegistrationProposalRequest) -> dict:
    """Return a normalized project registration proposal without writing it."""
    try:
        return build_project_registration_proposal(
            project_id=req.project_id,
            aliases=req.aliases,
            description=req.description,
            ingest_roots=req.ingest_roots,
        )
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))


@router.post("/projects/register")
def api_project_registration(req: ProjectRegistrationProposalRequest) -> dict:
    """Persist a validated project registration to the registry file."""
    try:
        return register_project(
            project_id=req.project_id,
            aliases=req.aliases,
            description=req.description,
            ingest_roots=req.ingest_roots,
        )
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))


@router.put("/projects/{project_name}")
def api_project_update(project_name: str, req: ProjectUpdateRequest) -> dict:
    """Update an existing project registration."""
    try:
        return update_project(
            project_name=project_name,
            aliases=req.aliases,
            description=req.description,
            ingest_roots=req.ingest_roots,
        )
    except ValueError as e:
        detail = str(e)
        if detail.startswith("Unknown project"):
            raise HTTPException(status_code=404, detail=detail)
        raise HTTPException(status_code=400, detail=detail)


@router.post("/projects/{project_name}/refresh", response_model=ProjectRefreshResponse)
def api_refresh_project(project_name: str, purge_deleted: bool = False) -> ProjectRefreshResponse:
    """Refresh one registered project from its configured ingest roots."""
    try:
        with exclusive_ingestion():
            result = refresh_registered_project(project_name, purge_deleted=purge_deleted)
    except ValueError as e:
        raise HTTPException(status_code=404, detail=str(e))
    except Exception as e:
        log.error("project_refresh_failed", project=project_name, error=str(e))
        raise HTTPException(status_code=500, detail=f"Project refresh failed: {e}")
    return ProjectRefreshResponse(**result)


@router.post("/query", response_model=QueryResponse)
def api_query(req: QueryRequest) -> QueryResponse:
    """Retrieve relevant chunks for a prompt."""
    try:
        chunks = retrieve(
            req.prompt,
            top_k=req.top_k,
            filter_tags=req.filter_tags,
            project_hint=req.project,
        )
    except Exception as e:
        log.error("query_failed", prompt=req.prompt[:100], error=str(e))
        raise HTTPException(status_code=500, detail=f"Query failed: {e}")
    return QueryResponse(
        results=[
            {
                "chunk_id": c.chunk_id,
                "content": c.content,
                "score": c.score,
                "heading_path": c.heading_path,
                "source_file": c.source_file,
                "title": c.title,
            }
            for c in chunks
        ]
    )


@router.post("/context/build", response_model=ContextBuildResponse)
def api_build_context(req: ContextBuildRequest) -> ContextBuildResponse:
    """Build a full context pack for a prompt."""
    try:
        pack = build_context(
            user_prompt=req.prompt,
            project_hint=req.project,
            budget=req.budget,
        )
    except Exception as e:
        log.error("context_build_failed", prompt=req.prompt[:100], error=str(e))
        raise HTTPException(status_code=500, detail=f"Context build failed: {e}")
    pack_dict = _pack_to_dict(pack)
    return ContextBuildResponse(
        formatted_context=pack.formatted_context,
        full_prompt=pack.full_prompt,
        chunks_used=len(pack.chunks_used),
        total_chars=pack.total_chars,
        budget=pack.budget,
        budget_remaining=pack.budget_remaining,
        duration_ms=pack.duration_ms,
        chunks=pack_dict["chunks"],
    )


@router.post("/memory")
def api_create_memory(req: MemoryCreateRequest) -> dict:
    """Create a new memory entry."""
    try:
        mem = create_memory(
            memory_type=req.memory_type,
            content=req.content,
            project=req.project,
            confidence=req.confidence,
            status=req.status,
        )
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    return {"status": "ok", "id": mem.id, "memory_type": mem.memory_type}


@router.get("/memory")
def api_get_memories(
    memory_type: str | None = None,
    project: str | None = None,
    active_only: bool = True,
    min_confidence: float = 0.0,
    limit: int = 50,
    status: str | None = None,
) -> dict:
    """List memories, optionally filtered.

    When ``status`` is given explicitly it overrides ``active_only`` so
    the Phase 9 Commit C review queue can be listed via
    ``GET /memory?status=candidate``.
    """
    try:
        memories = get_memories(
            memory_type=memory_type,
            project=project,
            active_only=active_only,
            min_confidence=min_confidence,
            limit=limit,
            status=status,
        )
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    return {
        "memories": [
            {
                "id": m.id,
                "memory_type": m.memory_type,
||||
"memory_type": m.memory_type,
|
||||
"content": m.content,
|
||||
"project": m.project,
|
||||
"confidence": m.confidence,
|
||||
"status": m.status,
|
||||
"reference_count": m.reference_count,
|
||||
"last_referenced_at": m.last_referenced_at,
|
||||
"updated_at": m.updated_at,
|
||||
}
|
||||
for m in memories
|
||||
],
|
||||
"types": MEMORY_TYPES,
|
||||
"statuses": MEMORY_STATUSES,
|
||||
}
|
||||
|
||||
|
||||
@router.put("/memory/{memory_id}")
|
||||
def api_update_memory(memory_id: str, req: MemoryUpdateRequest) -> dict:
|
||||
"""Update an existing memory."""
|
||||
try:
|
||||
success = update_memory(
|
||||
memory_id=memory_id,
|
||||
content=req.content,
|
||||
confidence=req.confidence,
|
||||
status=req.status,
|
||||
)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
if not success:
|
||||
raise HTTPException(status_code=404, detail="Memory not found")
|
||||
return {"status": "updated", "id": memory_id}
|
||||
|
||||
|
||||
@router.delete("/memory/{memory_id}")
|
||||
def api_invalidate_memory(memory_id: str) -> dict:
|
||||
"""Invalidate a memory (error correction)."""
|
||||
success = invalidate_memory(memory_id)
|
||||
if not success:
|
||||
raise HTTPException(status_code=404, detail="Memory not found")
|
||||
return {"status": "invalidated", "id": memory_id}
|
||||
|
||||
|
||||
@router.post("/memory/{memory_id}/promote")
|
||||
def api_promote_memory(memory_id: str) -> dict:
|
||||
"""Promote a candidate memory to active (Phase 9 Commit C)."""
|
||||
try:
|
||||
success = promote_memory(memory_id)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
if not success:
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=f"Memory not found or not a candidate: {memory_id}",
|
||||
)
|
||||
return {"status": "promoted", "id": memory_id}
|
||||
|
||||
|
||||
@router.post("/memory/{memory_id}/reject")
|
||||
def api_reject_candidate_memory(memory_id: str) -> dict:
|
||||
"""Reject a candidate memory (Phase 9 Commit C review queue)."""
|
||||
success = reject_candidate_memory(memory_id)
|
||||
if not success:
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=f"Memory not found or not a candidate: {memory_id}",
|
||||
)
|
||||
return {"status": "rejected", "id": memory_id}
|
||||
|
||||
|
||||
@router.post("/project/state")
|
||||
def api_set_project_state(req: ProjectStateSetRequest) -> dict:
|
||||
"""Set or update a trusted project state entry."""
|
||||
try:
|
||||
entry = set_state(
|
||||
project_name=req.project,
|
||||
category=req.category,
|
||||
key=req.key,
|
||||
value=req.value,
|
||||
source=req.source,
|
||||
confidence=req.confidence,
|
||||
)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
except Exception as e:
|
||||
log.error("set_state_failed", error=str(e))
|
||||
raise HTTPException(status_code=500, detail=f"Failed to set state: {e}")
|
||||
return {"status": "ok", "id": entry.id, "category": entry.category, "key": entry.key}
|
||||
|
||||
|
||||
@router.get("/project/state/{project_name}")
|
||||
def api_get_project_state(project_name: str, category: str | None = None) -> dict:
|
||||
"""Get trusted project state entries."""
|
||||
entries = get_state(project_name, category=category)
|
||||
return {
|
||||
"project": project_name,
|
||||
"entries": [
|
||||
{
|
||||
"id": e.id,
|
||||
"category": e.category,
|
||||
"key": e.key,
|
||||
"value": e.value,
|
||||
"source": e.source,
|
||||
"confidence": e.confidence,
|
||||
"status": e.status,
|
||||
"updated_at": e.updated_at,
|
||||
}
|
||||
for e in entries
|
||||
],
|
||||
"categories": CATEGORIES,
|
||||
}
|
||||
|
||||
|
||||
@router.delete("/project/state")
|
||||
def api_invalidate_project_state(req: ProjectStateInvalidateRequest) -> dict:
|
||||
"""Invalidate (supersede) a project state entry."""
|
||||
success = invalidate_state(req.project, req.category, req.key)
|
||||
if not success:
|
||||
raise HTTPException(status_code=404, detail="State entry not found or already invalidated")
|
||||
return {"status": "invalidated", "project": req.project, "category": req.category, "key": req.key}
|
||||
|
||||
|
||||
class InteractionRecordRequest(BaseModel):
|
||||
prompt: str
|
||||
response: str = ""
|
||||
response_summary: str = ""
|
||||
project: str = ""
|
||||
client: str = ""
|
||||
session_id: str = ""
|
||||
memories_used: list[str] = []
|
||||
chunks_used: list[str] = []
|
||||
context_pack: dict | None = None
|
||||
reinforce: bool = True
|
||||
extract: bool = False
|
||||
|
||||
|
||||
@router.post("/interactions")
|
||||
def api_record_interaction(req: InteractionRecordRequest) -> dict:
|
||||
"""Capture one interaction (prompt + response + what was used).
|
||||
|
||||
This is the foundation of the AtoCore reflection loop. It records
|
||||
what the system fed to an LLM and what came back. If ``reinforce``
|
||||
is true (default) and there is response content, the Phase 9
|
||||
Commit B reinforcement pass runs automatically, bumping the
|
||||
confidence of any active memory echoed in the response. Nothing is
|
||||
ever promoted into trusted state automatically.
|
||||
"""
|
||||
try:
|
||||
interaction = record_interaction(
|
||||
prompt=req.prompt,
|
||||
response=req.response,
|
||||
response_summary=req.response_summary,
|
||||
project=req.project,
|
||||
client=req.client,
|
||||
session_id=req.session_id,
|
||||
memories_used=req.memories_used,
|
||||
chunks_used=req.chunks_used,
|
||||
context_pack=req.context_pack,
|
||||
reinforce=req.reinforce,
|
||||
extract=req.extract,
|
||||
)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
return {
|
||||
"status": "recorded",
|
||||
"id": interaction.id,
|
||||
"created_at": interaction.created_at,
|
||||
}
|
||||
|
||||
|
||||
@router.post("/interactions/{interaction_id}/reinforce")
|
||||
def api_reinforce_interaction(interaction_id: str) -> dict:
|
||||
"""Run the reinforcement pass on an already-captured interaction.
|
||||
|
||||
Useful for backfilling reinforcement over historical interactions,
|
||||
or for retrying after a transient failure in the automatic pass
|
||||
that runs inside ``POST /interactions``.
|
||||
"""
|
||||
interaction = get_interaction(interaction_id)
|
||||
if interaction is None:
|
||||
raise HTTPException(status_code=404, detail=f"Interaction not found: {interaction_id}")
|
||||
results = reinforce_from_interaction(interaction)
|
||||
return {
|
||||
"interaction_id": interaction_id,
|
||||
"reinforced_count": len(results),
|
||||
"reinforced": [
|
||||
{
|
||||
"memory_id": r.memory_id,
|
||||
"memory_type": r.memory_type,
|
||||
"old_confidence": round(r.old_confidence, 4),
|
||||
"new_confidence": round(r.new_confidence, 4),
|
||||
}
|
||||
for r in results
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
class InteractionExtractRequest(BaseModel):
|
||||
persist: bool = False
|
||||
|
||||
|
||||
@router.post("/interactions/{interaction_id}/extract")
|
||||
def api_extract_from_interaction(
|
||||
interaction_id: str,
|
||||
req: InteractionExtractRequest | None = None,
|
||||
) -> dict:
|
||||
"""Extract candidate memories from a captured interaction.
|
||||
|
||||
Phase 9 Commit C. The extractor is rule-based and deliberately
|
||||
conservative — it only surfaces candidates that matched an explicit
|
||||
structural cue (decision heading, preference sentence, etc.). By
|
||||
default the candidates are returned *without* being persisted so a
|
||||
caller can preview them before committing to a review queue. Pass
|
||||
``persist: true`` to immediately create candidate memories for
|
||||
each extraction result.
|
||||
"""
|
||||
interaction = get_interaction(interaction_id)
|
||||
if interaction is None:
|
||||
raise HTTPException(status_code=404, detail=f"Interaction not found: {interaction_id}")
|
||||
payload = req or InteractionExtractRequest()
|
||||
candidates: list[MemoryCandidate] = extract_candidates_from_interaction(interaction)
|
||||
|
||||
persisted_ids: list[str] = []
|
||||
if payload.persist:
|
||||
for candidate in candidates:
|
||||
try:
|
||||
mem = create_memory(
|
||||
memory_type=candidate.memory_type,
|
||||
content=candidate.content,
|
||||
project=candidate.project,
|
||||
confidence=candidate.confidence,
|
||||
status="candidate",
|
||||
)
|
||||
persisted_ids.append(mem.id)
|
||||
except ValueError as e:
|
||||
log.error(
|
||||
"extract_persist_failed",
|
||||
interaction_id=interaction_id,
|
||||
rule=candidate.rule,
|
||||
error=str(e),
|
||||
)
|
||||
|
||||
return {
|
||||
"interaction_id": interaction_id,
|
||||
"candidate_count": len(candidates),
|
||||
"persisted": payload.persist,
|
||||
"persisted_ids": persisted_ids,
|
||||
"extractor_version": EXTRACTOR_VERSION,
|
||||
"candidates": [
|
||||
{
|
||||
"memory_type": c.memory_type,
|
||||
"content": c.content,
|
||||
"project": c.project,
|
||||
"confidence": c.confidence,
|
||||
"rule": c.rule,
|
||||
"source_span": c.source_span,
|
||||
"extractor_version": c.extractor_version,
|
||||
}
|
||||
for c in candidates
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
@router.get("/interactions")
|
||||
def api_list_interactions(
|
||||
project: str | None = None,
|
||||
session_id: str | None = None,
|
||||
client: str | None = None,
|
||||
since: str | None = None,
|
||||
limit: int = 50,
|
||||
) -> dict:
|
||||
"""List captured interactions, optionally filtered by project, session,
|
||||
client, or creation time. Hard-capped at 500 entries per call."""
|
||||
interactions = list_interactions(
|
||||
project=project,
|
||||
session_id=session_id,
|
||||
client=client,
|
||||
since=since,
|
||||
limit=limit,
|
||||
)
|
||||
return {
|
||||
"count": len(interactions),
|
||||
"interactions": [
|
||||
{
|
||||
"id": i.id,
|
||||
"prompt": i.prompt,
|
||||
"response_summary": i.response_summary,
|
||||
"response_chars": len(i.response),
|
||||
"project": i.project,
|
||||
"client": i.client,
|
||||
"session_id": i.session_id,
|
||||
"memories_used": i.memories_used,
|
||||
"chunks_used": i.chunks_used,
|
||||
"created_at": i.created_at,
|
||||
}
|
||||
for i in interactions
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
@router.get("/interactions/{interaction_id}")
|
||||
def api_get_interaction(interaction_id: str) -> dict:
|
||||
"""Fetch a single interaction with the full response and context pack."""
|
||||
interaction = get_interaction(interaction_id)
|
||||
if interaction is None:
|
||||
raise HTTPException(status_code=404, detail=f"Interaction not found: {interaction_id}")
|
||||
return {
|
||||
"id": interaction.id,
|
||||
"prompt": interaction.prompt,
|
||||
"response": interaction.response,
|
||||
"response_summary": interaction.response_summary,
|
||||
"project": interaction.project,
|
||||
"client": interaction.client,
|
||||
"session_id": interaction.session_id,
|
||||
"memories_used": interaction.memories_used,
|
||||
"chunks_used": interaction.chunks_used,
|
||||
"context_pack": interaction.context_pack,
|
||||
"created_at": interaction.created_at,
|
||||
}
|
||||
|
||||
|
||||
class BackupCreateRequest(BaseModel):
|
||||
include_chroma: bool = False
|
||||
|
||||
|
||||
@router.post("/admin/backup")
|
||||
def api_create_backup(req: BackupCreateRequest | None = None) -> dict:
|
||||
"""Create a runtime backup snapshot.
|
||||
|
||||
When ``include_chroma`` is true the call holds the ingestion lock so a
|
||||
safe cold copy of the vector store can be taken without racing against
|
||||
refresh or ingest endpoints.
|
||||
"""
|
||||
payload = req or BackupCreateRequest()
|
||||
try:
|
||||
if payload.include_chroma:
|
||||
with exclusive_ingestion():
|
||||
metadata = create_runtime_backup(include_chroma=True)
|
||||
else:
|
||||
metadata = create_runtime_backup(include_chroma=False)
|
||||
except Exception as e:
|
||||
log.error("admin_backup_failed", error=str(e))
|
||||
raise HTTPException(status_code=500, detail=f"Backup failed: {e}")
|
||||
return metadata
|
||||
|
||||
|
||||
@router.get("/admin/backup")
|
||||
def api_list_backups() -> dict:
|
||||
"""List all runtime backups under the configured backup directory."""
|
||||
return {
|
||||
"backup_dir": str(_config.settings.resolved_backup_dir),
|
||||
"backups": list_runtime_backups(),
|
||||
}
|
||||
|
||||
|
||||
class BackupCleanupRequest(BaseModel):
|
||||
confirm: bool = False
|
||||
|
||||
|
||||
@router.post("/admin/backup/cleanup")
|
||||
def api_cleanup_backups(req: BackupCleanupRequest | None = None) -> dict:
|
||||
"""Apply retention policy to old backup snapshots.
|
||||
|
||||
Dry-run by default. Pass ``confirm: true`` to actually delete.
|
||||
Retention: last 7 daily, last 4 weekly (Sundays), last 6 monthly (1st).
|
||||
"""
|
||||
payload = req or BackupCleanupRequest()
|
||||
try:
|
||||
return cleanup_old_backups(confirm=payload.confirm)
|
||||
except Exception as e:
|
||||
log.error("admin_cleanup_failed", error=str(e))
|
||||
raise HTTPException(status_code=500, detail=f"Cleanup failed: {e}")
|
||||
|
||||
|
||||
@router.get("/admin/backup/{stamp}/validate")
|
||||
def api_validate_backup(stamp: str) -> dict:
|
||||
"""Validate that a previously created backup is structurally usable."""
|
||||
result = validate_backup(stamp)
|
||||
if not result.get("exists", False):
|
||||
raise HTTPException(status_code=404, detail=f"Backup not found: {stamp}")
|
||||
return result
|
||||
|
||||
|
||||
@router.get("/health")
|
||||
def api_health() -> dict:
|
||||
"""Health check.
|
||||
|
||||
Three layers of version reporting, in increasing precision:
|
||||
|
||||
- ``version`` / ``code_version``: ``atocore.__version__`` (e.g.
|
||||
"0.2.0"). Bumped manually on commits that change the API
|
||||
surface, schema, or user-visible behavior. Coarse — any
|
||||
number of commits can land between bumps without changing
|
||||
this value.
|
||||
- ``build_sha``: full git SHA of the commit the running
|
||||
container was built from. Set by ``deploy/dalidou/deploy.sh``
|
||||
via the ``ATOCORE_BUILD_SHA`` env var on every rebuild.
|
||||
Reports ``"unknown"`` for builds that bypass deploy.sh
|
||||
(direct ``docker compose up`` etc.). This is the precise
|
||||
drift signal: if the live ``build_sha`` doesn't match the
|
||||
tip of the deployed branch on Gitea, the service is stale
|
||||
regardless of what ``code_version`` says.
|
||||
- ``build_time`` / ``build_branch``: when and from which branch
|
||||
the live container was built. Useful for forensics when
|
||||
multiple branches are in flight or when build_sha is
|
||||
ambiguous (e.g. a force-push to the same SHA).
|
||||
|
||||
The deploy.sh post-deploy verification step compares the live
|
||||
``build_sha`` to the SHA it just set, and exits non-zero on
|
||||
mismatch.
|
||||
"""
|
||||
import os
|
||||
|
||||
from atocore import __version__
|
||||
|
||||
store = get_vector_store()
|
||||
source_status = get_source_status()
|
||||
return {
|
||||
"status": "ok",
|
||||
"version": __version__,
|
||||
"code_version": __version__,
|
||||
"build_sha": os.environ.get("ATOCORE_BUILD_SHA", "unknown"),
|
||||
"build_time": os.environ.get("ATOCORE_BUILD_TIME", "unknown"),
|
||||
"build_branch": os.environ.get("ATOCORE_BUILD_BRANCH", "unknown"),
|
||||
"vectors_count": store.count,
|
||||
"env": _config.settings.env,
|
||||
"machine_paths": {
|
||||
"db_path": str(_config.settings.db_path),
|
||||
"chroma_path": str(_config.settings.chroma_path),
|
||||
"log_dir": str(_config.settings.resolved_log_dir),
|
||||
"backup_dir": str(_config.settings.resolved_backup_dir),
|
||||
"run_dir": str(_config.settings.resolved_run_dir),
|
||||
},
|
||||
"sources_ready": all(
|
||||
(not source["enabled"]) or (source["exists"] and source["is_dir"])
|
||||
for source in source_status
|
||||
),
|
||||
"source_status": source_status,
|
||||
}
|
||||
|
||||
|
||||
@router.get("/sources")
|
||||
def api_sources() -> dict:
|
||||
"""Return configured ingestion source directories and readiness."""
|
||||
return {
|
||||
"sources": get_source_status(),
|
||||
"vault_enabled": _config.settings.source_vault_enabled,
|
||||
"drive_enabled": _config.settings.source_drive_enabled,
|
||||
}
|
||||
|
||||
|
||||
@router.get("/stats")
|
||||
def api_stats() -> dict:
|
||||
"""Ingestion statistics."""
|
||||
return get_ingestion_stats()
|
||||
|
||||
|
||||
@router.get("/debug/context")
|
||||
def api_debug_context() -> dict:
|
||||
"""Inspect the last assembled context pack."""
|
||||
pack = get_last_context_pack()
|
||||
if pack is None:
|
||||
return {"message": "No context pack built yet."}
|
||||
return _pack_to_dict(pack)
|
||||
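The ``/health`` docstring above calls ``build_sha`` the precise drift signal: the service is stale whenever the live SHA differs from the deployed branch tip, and builds that bypassed deploy.sh report ``"unknown"``. A minimal sketch of that check as a standalone helper (``is_stale`` is illustrative, not part of the module):

```python
def is_stale(health: dict, expected_sha: str) -> bool:
    """Return True when a /health payload indicates build drift."""
    # A build that bypassed deploy.sh reports "unknown"; treat it as
    # stale because drift cannot be ruled out.
    sha = health.get("build_sha", "unknown")
    return sha == "unknown" or sha != expected_sha
```

This mirrors what the deploy.sh post-deploy verification does against the SHA it just set.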
157
src/atocore/config.py
Normal file
@@ -0,0 +1,157 @@
"""AtoCore configuration via environment variables."""

from pathlib import Path

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    env: str = "development"
    debug: bool = False
    log_level: str = "INFO"
    data_dir: Path = Path("./data")
    db_dir: Path | None = None
    chroma_dir: Path | None = None
    cache_dir: Path | None = None
    tmp_dir: Path | None = None
    vault_source_dir: Path = Path("./sources/vault")
    drive_source_dir: Path = Path("./sources/drive")
    source_vault_enabled: bool = True
    source_drive_enabled: bool = True
    log_dir: Path = Path("./logs")
    backup_dir: Path = Path("./backups")
    run_dir: Path = Path("./run")
    project_registry_path: Path = Path("./config/project-registry.json")
    host: str = "127.0.0.1"
    port: int = 8100
    db_busy_timeout_ms: int = 5000

    # Embedding
    embedding_model: str = (
        "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    )

    # Chunking
    chunk_max_size: int = 800
    chunk_overlap: int = 100
    chunk_min_size: int = 50

    # Context
    context_budget: int = 3000
    context_top_k: int = 15

    # Retrieval ranking weights (tunable per environment).
    # All multipliers default to the values used since Wave 1; tighten or
    # loosen them via ATOCORE_* env vars without touching code.
    rank_project_match_boost: float = 2.0
    rank_query_token_step: float = 0.08
    rank_query_token_cap: float = 1.32
    rank_path_high_signal_boost: float = 1.18
    rank_path_low_signal_penalty: float = 0.72

    model_config = {"env_prefix": "ATOCORE_"}

    @property
    def db_path(self) -> Path:
        legacy_path = self.resolved_data_dir / "atocore.db"
        if self.db_dir is not None:
            return self.resolved_db_dir / "atocore.db"
        if legacy_path.exists():
            return legacy_path
        return self.resolved_db_dir / "atocore.db"

    @property
    def chroma_path(self) -> Path:
        return self._resolve_path(self.chroma_dir or (self.resolved_data_dir / "chroma"))

    @property
    def cache_path(self) -> Path:
        return self._resolve_path(self.cache_dir or (self.resolved_data_dir / "cache"))

    @property
    def tmp_path(self) -> Path:
        return self._resolve_path(self.tmp_dir or (self.resolved_data_dir / "tmp"))

    @property
    def resolved_data_dir(self) -> Path:
        return self._resolve_path(self.data_dir)

    @property
    def resolved_db_dir(self) -> Path:
        return self._resolve_path(self.db_dir or (self.resolved_data_dir / "db"))

    @property
    def resolved_vault_source_dir(self) -> Path:
        return self._resolve_path(self.vault_source_dir)

    @property
    def resolved_drive_source_dir(self) -> Path:
        return self._resolve_path(self.drive_source_dir)

    @property
    def resolved_log_dir(self) -> Path:
        return self._resolve_path(self.log_dir)

    @property
    def resolved_backup_dir(self) -> Path:
        return self._resolve_path(self.backup_dir)

    @property
    def resolved_run_dir(self) -> Path:
        if self.run_dir == Path("./run"):
            return self._resolve_path(self.resolved_data_dir.parent / "run")
        return self._resolve_path(self.run_dir)

    @property
    def resolved_project_registry_path(self) -> Path:
        return self._resolve_path(self.project_registry_path)

    @property
    def machine_dirs(self) -> list[Path]:
        return [
            self.db_path.parent,
            self.chroma_path,
            self.cache_path,
            self.tmp_path,
            self.resolved_log_dir,
            self.resolved_backup_dir,
            self.resolved_run_dir,
            self.resolved_project_registry_path.parent,
        ]

    @property
    def source_specs(self) -> list[dict[str, object]]:
        return [
            {
                "name": "vault",
                "enabled": self.source_vault_enabled,
                "path": self.resolved_vault_source_dir,
                "read_only": True,
            },
            {
                "name": "drive",
                "enabled": self.source_drive_enabled,
                "path": self.resolved_drive_source_dir,
                "read_only": True,
            },
        ]

    @property
    def source_dirs(self) -> list[Path]:
        return [spec["path"] for spec in self.source_specs if spec["enabled"]]

    def _resolve_path(self, path: Path) -> Path:
        return path.expanduser().resolve(strict=False)


settings = Settings()


def ensure_runtime_dirs() -> None:
    """Create writable runtime directories for machine state and logs.

    Source directories are intentionally excluded because they are treated as
    read-only ingestion inputs by convention.
    """
    for directory in settings.machine_dirs:
        directory.mkdir(parents=True, exist_ok=True)
0
src/atocore/context/__init__.py
Normal file
424
src/atocore/context/builder.py
Normal file
@@ -0,0 +1,424 @@
"""Context pack assembly: retrieve, rank, budget, format.

Trust precedence (per Master Plan):
1. Trusted Project State → always included first, highest authority
2. Identity + Preference memories → included next
3. Retrieved chunks → ranked, deduplicated, budget-constrained
"""

import time
from dataclasses import dataclass, field
from pathlib import Path

import atocore.config as _config
from atocore.context.project_state import format_project_state, get_state
from atocore.memory.service import get_memories_for_context
from atocore.observability.logger import get_logger
from atocore.projects.registry import resolve_project_name
from atocore.retrieval.retriever import ChunkResult, retrieve

log = get_logger("context_builder")

SYSTEM_PREFIX = (
    "You have access to the following personal context from the user's knowledge base.\n"
    "Use it to inform your answer. If the context is not relevant, ignore it.\n"
    "Do not mention the context system unless asked.\n"
    "When project state is provided, treat it as the most authoritative source."
)

# Budget allocation (per Master Plan section 9):
# identity: 5%, preferences: 5%, project state: 20%, retrieval: 60%+
PROJECT_STATE_BUDGET_RATIO = 0.20
MEMORY_BUDGET_RATIO = 0.10  # 5% identity + 5% preference
# Project-scoped memories (project/knowledge/episodic) are the outlet
# for the Phase 9 reflection loop on the retrieval side. Budget sits
# between identity/preference and retrieved chunks so a reinforced
# memory can actually reach the model.
PROJECT_MEMORY_BUDGET_RATIO = 0.25
PROJECT_MEMORY_TYPES = ["project", "knowledge", "episodic"]
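The ratios above carve one character budget into tiers. A simplified sketch of the split (``split_budget`` is illustrative only; the real ``build_context`` additionally clamps each tier with ``min()`` against what earlier tiers actually consumed):

```python
def split_budget(budget: int) -> dict[str, int]:
    # Simplified tiering: project state, then identity/preference
    # memories, then project-scoped memories, remainder to retrieval.
    state = int(budget * 0.20)    # PROJECT_STATE_BUDGET_RATIO
    memory = int(budget * 0.10)   # MEMORY_BUDGET_RATIO
    project = int(budget * 0.25)  # PROJECT_MEMORY_BUDGET_RATIO
    return {
        "project_state": state,
        "identity_preference": memory,
        "project_memories": project,
        "retrieval": budget - state - memory - project,
    }
```

With the default 3000-char budget, retrieval keeps roughly 45% plus anything an upper tier leaves unused.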
# Last built context pack for debug inspection
_last_context_pack: "ContextPack | None" = None


@dataclass
class ContextChunk:
    content: str
    source_file: str
    heading_path: str
    score: float
    char_count: int


@dataclass
class ContextPack:
    chunks_used: list[ContextChunk] = field(default_factory=list)
    project_state_text: str = ""
    project_state_chars: int = 0
    memory_text: str = ""
    memory_chars: int = 0
    project_memory_text: str = ""
    project_memory_chars: int = 0
    total_chars: int = 0
    budget: int = 0
    budget_remaining: int = 0
    formatted_context: str = ""
    full_prompt: str = ""
    query: str = ""
    project_hint: str = ""
    duration_ms: int = 0


def build_context(
    user_prompt: str,
    project_hint: str | None = None,
    budget: int | None = None,
) -> ContextPack:
    """Build a context pack for a user prompt.

    Trust precedence applied:
    1. Project state is injected first (highest trust)
    2. Identity + preference memories (second trust level)
    3. Retrieved chunks fill the remaining budget
    """
    global _last_context_pack
    start = time.time()
    budget = _config.settings.context_budget if budget is None else max(budget, 0)

    # 1. Get Trusted Project State (highest precedence)
    project_state_text = ""
    project_state_chars = 0
    project_state_budget = min(
        budget,
        max(0, int(budget * PROJECT_STATE_BUDGET_RATIO)),
    )

    # Canonicalize the project hint through the registry so callers
    # can pass an alias (`p05`, `gigabit`) and still find trusted
    # state stored under the canonical project id. The same helper
    # is used everywhere a project name crosses a trust boundary
    # (project_state, memories, interactions). When the registry has
    # no entry the helper returns the input unchanged so hand-curated
    # state that predates the registry still works.
    canonical_project = resolve_project_name(project_hint) if project_hint else ""
    if canonical_project:
        state_entries = get_state(canonical_project)
        if state_entries:
            project_state_text = format_project_state(state_entries)
            project_state_text, project_state_chars = _truncate_text_block(
                project_state_text,
                project_state_budget or budget,
            )

    # 2. Get identity + preference memories (second precedence)
    memory_budget = min(int(budget * MEMORY_BUDGET_RATIO), max(budget - project_state_chars, 0))
    memory_text, memory_chars = get_memories_for_context(
        memory_types=["identity", "preference"],
        budget=memory_budget,
        query=user_prompt,
    )

    # 2b. Get project-scoped memories (third precedence). Only
    # populated when a canonical project is in scope — cross-project
    # memory bleed would rot the pack. Active-only filtering is
    # handled by the shared min_confidence=0.5 gate inside
    # get_memories_for_context.
    project_memory_text = ""
    project_memory_chars = 0
    if canonical_project:
        project_memory_budget = min(
            int(budget * PROJECT_MEMORY_BUDGET_RATIO),
            max(budget - project_state_chars - memory_chars, 0),
        )
        project_memory_text, project_memory_chars = get_memories_for_context(
            memory_types=PROJECT_MEMORY_TYPES,
            project=canonical_project,
            budget=project_memory_budget,
            header="--- Project Memories ---",
            footer="--- End Project Memories ---",
            query=user_prompt,
        )

    # 3. Calculate remaining budget for retrieval
    retrieval_budget = budget - project_state_chars - memory_chars - project_memory_chars

    # 4. Retrieve candidates
    candidates = (
        retrieve(
            user_prompt,
            top_k=_config.settings.context_top_k,
            project_hint=project_hint,
        )
        if retrieval_budget > 0
        else []
    )

    # 5. Score and rank
    scored = _rank_chunks(candidates, project_hint)

    # 6. Select within remaining budget
    selected = _select_within_budget(scored, max(retrieval_budget, 0))

    # 7. Format full context
    formatted = _format_full_context(
        project_state_text, memory_text, project_memory_text, selected
    )
    if len(formatted) > budget:
        formatted, selected = _trim_context_to_budget(
            project_state_text,
            memory_text,
            project_memory_text,
            selected,
            budget,
        )

    # 8. Build full prompt
    full_prompt = f"{SYSTEM_PREFIX}\n\n{formatted}\n\n{user_prompt}"

    project_state_chars = len(project_state_text)
    memory_chars = len(memory_text)
    project_memory_chars = len(project_memory_text)
    retrieval_chars = sum(c.char_count for c in selected)
    total_chars = len(formatted)
    duration_ms = int((time.time() - start) * 1000)

    pack = ContextPack(
        chunks_used=selected,
        project_state_text=project_state_text,
        project_state_chars=project_state_chars,
        memory_text=memory_text,
        memory_chars=memory_chars,
        project_memory_text=project_memory_text,
        project_memory_chars=project_memory_chars,
        total_chars=total_chars,
        budget=budget,
        budget_remaining=budget - total_chars,
        formatted_context=formatted,
        full_prompt=full_prompt,
        query=user_prompt,
        project_hint=project_hint or "",
        duration_ms=duration_ms,
    )

    _last_context_pack = pack

    log.info(
        "context_built",
        chunks_used=len(selected),
        project_state_chars=project_state_chars,
        memory_chars=memory_chars,
        project_memory_chars=project_memory_chars,
        retrieval_chars=retrieval_chars,
        total_chars=total_chars,
        budget_remaining=budget - total_chars,
        duration_ms=duration_ms,
    )
    log.debug("context_pack_detail", pack=_pack_to_dict(pack))

    return pack


def get_last_context_pack() -> ContextPack | None:
    """Return the last built context pack for debug inspection."""
    return _last_context_pack


def _rank_chunks(
    candidates: list[ChunkResult],
    project_hint: str | None,
) -> list[tuple[float, ChunkResult]]:
    """Rank candidates with boosting for project match."""
    scored = []
    seen_content: set[str] = set()

    for chunk in candidates:
        # Deduplicate by content prefix (first 200 chars)
        content_key = chunk.content[:200]
        if content_key in seen_content:
            continue
        seen_content.add(content_key)

        # Base score from similarity
        final_score = chunk.score

        # Project boost
        if project_hint:
            tags_str = chunk.tags.lower() if chunk.tags else ""
            source_str = chunk.source_file.lower()
            title_str = chunk.title.lower() if chunk.title else ""
            hint_lower = project_hint.lower()

            if hint_lower in tags_str or hint_lower in source_str or hint_lower in title_str:
                final_score *= 1.3

        scored.append((final_score, chunk))

    # Sort by score descending
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored


def _select_within_budget(
    scored: list[tuple[float, ChunkResult]],
    budget: int,
) -> list[ContextChunk]:
    """Select top chunks that fit within the character budget."""
    selected = []
    used = 0

    for score, chunk in scored:
        chunk_len = len(chunk.content)
        if used + chunk_len > budget:
            continue
        selected.append(
            ContextChunk(
                content=chunk.content,
|
||||
source_file=_shorten_path(chunk.source_file),
|
||||
heading_path=chunk.heading_path,
|
||||
score=score,
|
||||
char_count=chunk_len,
|
||||
)
|
||||
)
|
||||
used += chunk_len
|
||||
|
||||
return selected
|
||||
|
||||
|
||||
def _format_full_context(
|
||||
project_state_text: str,
|
||||
memory_text: str,
|
||||
project_memory_text: str,
|
||||
chunks: list[ContextChunk],
|
||||
) -> str:
|
||||
"""Format project state + memories + retrieved chunks into full context block."""
|
||||
parts = []
|
||||
|
||||
# 1. Project state first (highest trust)
|
||||
if project_state_text:
|
||||
parts.append(project_state_text)
|
||||
parts.append("")
|
||||
|
||||
# 2. Identity + preference memories (second trust level)
|
||||
if memory_text:
|
||||
parts.append(memory_text)
|
||||
parts.append("")
|
||||
|
||||
# 3. Project-scoped memories (third trust level)
|
||||
if project_memory_text:
|
||||
parts.append(project_memory_text)
|
||||
parts.append("")
|
||||
|
||||
# 4. Retrieved chunks (lowest trust)
|
||||
if chunks:
|
||||
parts.append("--- AtoCore Retrieved Context ---")
|
||||
if project_state_text:
|
||||
parts.append("If retrieved context conflicts with Trusted Project State above, trust the Trusted Project State.")
|
||||
for chunk in chunks:
|
||||
parts.append(
|
||||
f"[Source: {chunk.source_file} | Section: {chunk.heading_path} | Score: {chunk.score:.2f}]"
|
||||
)
|
||||
parts.append(chunk.content)
|
||||
parts.append("")
|
||||
parts.append("--- End Context ---")
|
||||
elif not project_state_text and not memory_text and not project_memory_text:
|
||||
parts.append("--- AtoCore Context ---\nNo relevant context found.\n--- End Context ---")
|
||||
|
||||
return "\n".join(parts)
|
||||
|
||||
|
||||
def _shorten_path(path: str) -> str:
|
||||
"""Shorten an absolute path to a relative-like display."""
|
||||
p = Path(path)
|
||||
parts = p.parts
|
||||
if len(parts) > 3:
|
||||
return str(Path(*parts[-3:]))
|
||||
return str(p)
|
||||
|
||||
|
||||
def _pack_to_dict(pack: ContextPack) -> dict:
|
||||
"""Convert a context pack to a JSON-serializable dict."""
|
||||
return {
|
||||
"query": pack.query,
|
||||
"project_hint": pack.project_hint,
|
||||
"project_state_chars": pack.project_state_chars,
|
||||
"memory_chars": pack.memory_chars,
|
||||
"project_memory_chars": pack.project_memory_chars,
|
||||
"chunks_used": len(pack.chunks_used),
|
||||
"total_chars": pack.total_chars,
|
||||
"budget": pack.budget,
|
||||
"budget_remaining": pack.budget_remaining,
|
||||
"duration_ms": pack.duration_ms,
|
||||
"has_project_state": bool(pack.project_state_text),
|
||||
"has_memories": bool(pack.memory_text),
|
||||
"has_project_memories": bool(pack.project_memory_text),
|
||||
"chunks": [
|
||||
{
|
||||
"source_file": c.source_file,
|
||||
"heading_path": c.heading_path,
|
||||
"score": c.score,
|
||||
"char_count": c.char_count,
|
||||
"content_preview": c.content[:100],
|
||||
}
|
||||
for c in pack.chunks_used
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
def _truncate_text_block(text: str, budget: int) -> tuple[str, int]:
|
||||
"""Trim a formatted text block so trusted tiers cannot exceed the total budget."""
|
||||
if budget <= 0 or not text:
|
||||
return "", 0
|
||||
if len(text) <= budget:
|
||||
return text, len(text)
|
||||
if budget <= 3:
|
||||
trimmed = text[:budget]
|
||||
else:
|
||||
trimmed = f"{text[: budget - 3].rstrip()}..."
|
||||
return trimmed, len(trimmed)
|
||||
|
||||
|
||||
def _trim_context_to_budget(
|
||||
project_state_text: str,
|
||||
memory_text: str,
|
||||
project_memory_text: str,
|
||||
chunks: list[ContextChunk],
|
||||
budget: int,
|
||||
) -> tuple[str, list[ContextChunk]]:
|
||||
"""Trim retrieval → project memories → identity/preference → project state."""
|
||||
kept_chunks = list(chunks)
|
||||
formatted = _format_full_context(
|
||||
project_state_text, memory_text, project_memory_text, kept_chunks
|
||||
)
|
||||
while len(formatted) > budget and kept_chunks:
|
||||
kept_chunks.pop()
|
||||
formatted = _format_full_context(
|
||||
project_state_text, memory_text, project_memory_text, kept_chunks
|
||||
)
|
||||
|
||||
if len(formatted) <= budget:
|
||||
return formatted, kept_chunks
|
||||
|
||||
# Drop project memories next (they were the most recently added
|
||||
# tier and carry less trust than identity/preference).
|
||||
project_memory_text, _ = _truncate_text_block(
|
||||
project_memory_text,
|
||||
max(budget - len(project_state_text) - len(memory_text), 0),
|
||||
)
|
||||
formatted = _format_full_context(
|
||||
project_state_text, memory_text, project_memory_text, kept_chunks
|
||||
)
|
||||
if len(formatted) <= budget:
|
||||
return formatted, kept_chunks
|
||||
|
||||
memory_text, _ = _truncate_text_block(memory_text, max(budget - len(project_state_text), 0))
|
||||
formatted = _format_full_context(
|
||||
project_state_text, memory_text, project_memory_text, kept_chunks
|
||||
)
|
||||
if len(formatted) <= budget:
|
||||
return formatted, kept_chunks
|
||||
|
||||
project_state_text, _ = _truncate_text_block(project_state_text, budget)
|
||||
formatted = _format_full_context(project_state_text, "", "", [])
|
||||
if len(formatted) > budget:
|
||||
formatted, _ = _truncate_text_block(formatted, budget)
|
||||
return formatted, []
|
||||
254  src/atocore/context/project_state.py  Normal file
@@ -0,0 +1,254 @@
"""Trusted Project State — the highest-priority context source.

Per the Master Plan trust precedence:
1. Trusted Project State (this module)
2. AtoDrive artifacts
3. Recent validated memory
4. AtoVault summaries
5. PKM chunks
6. Historical / low-confidence

Project state is manually curated or explicitly confirmed facts about a project.
It always wins over retrieval-based context when there's a conflict.
"""

import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

from atocore.models.database import get_connection
from atocore.observability.logger import get_logger
from atocore.projects.registry import resolve_project_name

log = get_logger("project_state")

# DB schema extension for project state
PROJECT_STATE_SCHEMA = """
CREATE TABLE IF NOT EXISTS project_state (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    category TEXT NOT NULL,
    key TEXT NOT NULL,
    value TEXT NOT NULL,
    source TEXT DEFAULT '',
    confidence REAL DEFAULT 1.0,
    status TEXT DEFAULT 'active',
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(project_id, category, key)
);

CREATE INDEX IF NOT EXISTS idx_project_state_project ON project_state(project_id);
CREATE INDEX IF NOT EXISTS idx_project_state_category ON project_state(category);
CREATE INDEX IF NOT EXISTS idx_project_state_status ON project_state(status);
"""

# Valid categories for project state entries
CATEGORIES = [
    "status",       # current project status, phase, blockers
    "decision",     # confirmed design/engineering decisions
    "requirement",  # key requirements and constraints
    "contact",      # key people, vendors, stakeholders
    "milestone",    # dates, deadlines, deliverables
    "fact",         # verified technical facts
    "config",      # project configuration, parameters
]


@dataclass
class ProjectStateEntry:
    id: str
    project_id: str
    category: str
    key: str
    value: str
    source: str = ""
    confidence: float = 1.0
    status: str = "active"
    created_at: str = ""
    updated_at: str = ""


def init_project_state_schema() -> None:
    """Create the project_state table if it doesn't exist."""
    with get_connection() as conn:
        conn.executescript(PROJECT_STATE_SCHEMA)
    log.info("project_state_schema_initialized")


def ensure_project(name: str, description: str = "") -> str:
    """Get or create a project by name. Returns project_id."""
    with get_connection() as conn:
        row = conn.execute(
            "SELECT id FROM projects WHERE lower(name) = lower(?)", (name,)
        ).fetchone()
        if row:
            return row["id"]

        project_id = str(uuid.uuid4())
        conn.execute(
            "INSERT INTO projects (id, name, description) VALUES (?, ?, ?)",
            (project_id, name, description),
        )
        log.info("project_created", name=name, project_id=project_id)
        return project_id


def set_state(
    project_name: str,
    category: str,
    key: str,
    value: str,
    source: str = "",
    confidence: float = 1.0,
) -> ProjectStateEntry:
    """Set or update a project state entry. Upsert semantics.

    The ``project_name`` is canonicalized through the registry so a
    caller passing an alias (``p05``) ends up writing into the same
    row as the canonical id (``p05-interferometer``). Without this
    step, alias and canonical names would create two parallel
    project rows and fragmented state.
    """
    if category not in CATEGORIES:
        raise ValueError(f"Invalid category '{category}'. Must be one of: {CATEGORIES}")
    _validate_confidence(confidence)

    project_name = resolve_project_name(project_name)
    project_id = ensure_project(project_name)
    entry_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()

    with get_connection() as conn:
        # Check if entry exists
        existing = conn.execute(
            "SELECT id FROM project_state WHERE project_id = ? AND category = ? AND key = ?",
            (project_id, category, key),
        ).fetchone()

        if existing:
            entry_id = existing["id"]
            conn.execute(
                "UPDATE project_state SET value = ?, source = ?, confidence = ?, "
                "status = 'active', updated_at = CURRENT_TIMESTAMP "
                "WHERE id = ?",
                (value, source, confidence, entry_id),
            )
            log.info("project_state_updated", project=project_name, category=category, key=key)
        else:
            conn.execute(
                "INSERT INTO project_state (id, project_id, category, key, value, source, confidence) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                (entry_id, project_id, category, key, value, source, confidence),
            )
            log.info("project_state_created", project=project_name, category=category, key=key)

    return ProjectStateEntry(
        id=entry_id,
        project_id=project_id,
        category=category,
        key=key,
        value=value,
        source=source,
        confidence=confidence,
        status="active",
        created_at=now,
        updated_at=now,
    )


def get_state(
    project_name: str,
    category: str | None = None,
    active_only: bool = True,
) -> list[ProjectStateEntry]:
    """Get project state entries, optionally filtered by category.

    The lookup is canonicalized through the registry so an alias hint
    finds the same rows as the canonical id.
    """
    project_name = resolve_project_name(project_name)
    with get_connection() as conn:
        project = conn.execute(
            "SELECT id FROM projects WHERE lower(name) = lower(?)", (project_name,)
        ).fetchone()
        if not project:
            return []

        query = "SELECT * FROM project_state WHERE project_id = ?"
        params: list = [project["id"]]

        if category:
            query += " AND category = ?"
            params.append(category)
        if active_only:
            query += " AND status = 'active'"

        query += " ORDER BY category, key"
        rows = conn.execute(query, params).fetchall()

    return [
        ProjectStateEntry(
            id=r["id"],
            project_id=r["project_id"],
            category=r["category"],
            key=r["key"],
            value=r["value"],
            source=r["source"],
            confidence=r["confidence"],
            status=r["status"],
            created_at=r["created_at"],
            updated_at=r["updated_at"],
        )
        for r in rows
    ]


def invalidate_state(project_name: str, category: str, key: str) -> bool:
    """Mark a project state entry as superseded.

    The lookup is canonicalized through the registry so an alias is
    treated as the canonical project for the invalidation lookup.
    """
    project_name = resolve_project_name(project_name)
    with get_connection() as conn:
        project = conn.execute(
            "SELECT id FROM projects WHERE lower(name) = lower(?)", (project_name,)
        ).fetchone()
        if not project:
            return False

        result = conn.execute(
            "UPDATE project_state SET status = 'superseded', updated_at = CURRENT_TIMESTAMP "
            "WHERE project_id = ? AND category = ? AND key = ? AND status = 'active'",
            (project["id"], category, key),
        )
        if result.rowcount > 0:
            log.info("project_state_invalidated", project=project_name, category=category, key=key)
            return True
    return False


def format_project_state(entries: list[ProjectStateEntry]) -> str:
    """Format project state entries for context injection."""
    if not entries:
        return ""

    lines = ["--- Trusted Project State ---"]
    current_category = ""

    for entry in entries:
        if entry.category != current_category:
            current_category = entry.category
            lines.append(f"\n[{current_category.upper()}]")
        lines.append(f"  {entry.key}: {entry.value}")
        if entry.source:
            lines.append(f"    (source: {entry.source})")

    lines.append("\n--- End Project State ---")
    return "\n".join(lines)


def _validate_confidence(confidence: float) -> None:
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("Confidence must be between 0.0 and 1.0")
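To see what the context injector actually receives, the formatter above can be run standalone. A minimal sketch, with the dataclass reduced to the fields the formatter reads and an illustrative entry:

```python
from dataclasses import dataclass


@dataclass
class ProjectStateEntry:
    category: str
    key: str
    value: str
    source: str = ""


def format_project_state(entries: list[ProjectStateEntry]) -> str:
    """Group entries by category under [CATEGORY] headers."""
    if not entries:
        return ""
    lines = ["--- Trusted Project State ---"]
    current_category = ""
    for entry in entries:
        if entry.category != current_category:
            current_category = entry.category
            lines.append(f"\n[{current_category.upper()}]")
        lines.append(f"  {entry.key}: {entry.value}")
        if entry.source:
            lines.append(f"    (source: {entry.source})")
    lines.append("\n--- End Project State ---")
    return "\n".join(lines)


# Illustrative entry; in the real module entries come from SQLite.
block = format_project_state(
    [ProjectStateEntry("status", "phase", "integration testing")]
)
print(block)
```

Because `get_state` orders rows by `category, key`, the formatter only needs to emit a new `[CATEGORY]` header when the category changes.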
0  src/atocore/ingestion/__init__.py  Normal file

150  src/atocore/ingestion/chunker.py  Normal file
@@ -0,0 +1,150 @@
"""Heading-aware recursive markdown chunking."""

import re
from dataclasses import dataclass, field

import atocore.config as _config


@dataclass
class Chunk:
    content: str
    chunk_index: int
    heading_path: str
    char_count: int
    metadata: dict = field(default_factory=dict)


def chunk_markdown(
    body: str,
    base_metadata: dict | None = None,
    max_size: int | None = None,
    overlap: int | None = None,
    min_size: int | None = None,
) -> list[Chunk]:
    """Split markdown body into chunks using heading-aware strategy.

    1. Split on H2 boundaries
    2. If section > max_size, split on H3
    3. If still > max_size, split on paragraph breaks
    4. If still > max_size, hard split with overlap
    """
    max_size = max_size or _config.settings.chunk_max_size
    overlap = overlap or _config.settings.chunk_overlap
    min_size = min_size or _config.settings.chunk_min_size
    base_metadata = base_metadata or {}

    sections = _split_by_heading(body, level=2)
    raw_chunks: list[tuple[str, str]] = []  # (heading_path, content)

    for heading, content in sections:
        if len(content) <= max_size:
            raw_chunks.append((heading, content))
        else:
            # Try splitting on H3
            subsections = _split_by_heading(content, level=3)
            for sub_heading, sub_content in subsections:
                full_path = (
                    f"{heading} > {sub_heading}" if heading and sub_heading else heading or sub_heading
                )
                if len(sub_content) <= max_size:
                    raw_chunks.append((full_path, sub_content))
                else:
                    # Split on paragraphs
                    para_chunks = _split_by_paragraphs(
                        sub_content, max_size, overlap
                    )
                    for pc in para_chunks:
                        raw_chunks.append((full_path, pc))

    # Build final chunks, filtering out too-small ones
    chunks = []
    idx = 0
    for heading_path, content in raw_chunks:
        content = content.strip()
        if len(content) < min_size:
            continue
        chunks.append(
            Chunk(
                content=content,
                chunk_index=idx,
                heading_path=heading_path,
                char_count=len(content),
                metadata={**base_metadata},
            )
        )
        idx += 1

    return chunks


def _split_by_heading(text: str, level: int) -> list[tuple[str, str]]:
    """Split text by heading level. Returns (heading_text, section_content) pairs."""
    pattern = rf"^({'#' * level})\s+(.+)$"
    parts: list[tuple[str, str]] = []
    current_heading = ""
    current_lines: list[str] = []

    for line in text.split("\n"):
        match = re.match(pattern, line)
        if match:
            # Save previous section
            if current_lines:
                parts.append((current_heading, "\n".join(current_lines)))
            current_heading = match.group(2).strip()
            current_lines = []
        else:
            current_lines.append(line)

    # Save last section
    if current_lines:
        parts.append((current_heading, "\n".join(current_lines)))

    return parts


def _split_by_paragraphs(
    text: str, max_size: int, overlap: int
) -> list[str]:
    """Split text by paragraph breaks, then hard-split if needed."""
    paragraphs = re.split(r"\n\n+", text)
    chunks: list[str] = []
    current = ""

    for para in paragraphs:
        para = para.strip()
        if not para:
            continue

        if len(current) + len(para) + 2 <= max_size:
            current = f"{current}\n\n{para}" if current else para
        else:
            if current:
                chunks.append(current)
            # If single paragraph exceeds max, hard split
            if len(para) > max_size:
                chunks.extend(_hard_split(para, max_size, overlap))
            else:
                current = para
                continue
            current = ""

    if current:
        chunks.append(current)

    return chunks


def _hard_split(text: str, max_size: int, overlap: int) -> list[str]:
    """Hard split text at max_size with overlap."""
    # Prevent infinite loop: overlap must be less than max_size
    if overlap >= max_size:
        overlap = max_size // 4

    chunks = []
    start = 0
    while start < len(text):
        end = start + max_size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks
65  src/atocore/ingestion/parser.py  Normal file
@@ -0,0 +1,65 @@
"""Markdown file parsing with frontmatter extraction."""

import re
from dataclasses import dataclass, field
from pathlib import Path

import frontmatter


@dataclass
class ParsedDocument:
    file_path: str
    title: str
    body: str
    tags: list[str] = field(default_factory=list)
    frontmatter: dict = field(default_factory=dict)
    headings: list[tuple[int, str]] = field(default_factory=list)


def parse_markdown(file_path: Path, text: str | None = None) -> ParsedDocument:
    """Parse a markdown file, extracting frontmatter and structure."""
    raw_text = text if text is not None else file_path.read_text(encoding="utf-8")
    post = frontmatter.loads(raw_text)

    meta = dict(post.metadata) if post.metadata else {}
    body = post.content.strip()

    # Extract title: first H1, or filename
    title = _extract_title(body, file_path)

    # Extract tags from frontmatter
    tags = meta.get("tags", [])
    if isinstance(tags, str):
        tags = [t.strip() for t in tags.split(",") if t.strip()]
    tags = tags or []

    # Extract heading structure
    headings = _extract_headings(body)

    return ParsedDocument(
        file_path=str(file_path.resolve()),
        title=title,
        body=body,
        tags=tags,
        frontmatter=meta,
        headings=headings,
    )


def _extract_title(body: str, file_path: Path) -> str:
    """Get title from first H1 or fallback to filename."""
    match = re.search(r"^#\s+(.+)$", body, re.MULTILINE)
    if match:
        return match.group(1).strip()
    return file_path.stem.replace("_", " ").replace("-", " ").title()


def _extract_headings(body: str) -> list[tuple[int, str]]:
    """Extract all headings with their level."""
    headings = []
    for match in re.finditer(r"^(#{1,4})\s+(.+)$", body, re.MULTILINE):
        level = len(match.group(1))
        text = match.group(2).strip()
        headings.append((level, text))
    return headings
321  src/atocore/ingestion/pipeline.py  Normal file
@@ -0,0 +1,321 @@
"""Ingestion pipeline: parse → chunk → embed → store."""

import hashlib
import json
import threading
import time
import uuid
from contextlib import contextmanager
from pathlib import Path

import atocore.config as _config
from atocore.ingestion.chunker import chunk_markdown
from atocore.ingestion.parser import parse_markdown
from atocore.models.database import get_connection
from atocore.observability.logger import get_logger
from atocore.retrieval.vector_store import get_vector_store

log = get_logger("ingestion")

# Encodings to try when reading markdown files
_ENCODINGS = ["utf-8", "utf-8-sig", "latin-1", "cp1252"]
_INGESTION_LOCK = threading.Lock()


@contextmanager
def exclusive_ingestion():
    """Serialize long-running ingestion operations across API requests."""
    _INGESTION_LOCK.acquire()
    try:
        yield
    finally:
        _INGESTION_LOCK.release()


def ingest_file(file_path: Path) -> dict:
    """Ingest a single markdown file. Returns stats."""
    start = time.time()
    file_path = file_path.resolve()

    if not file_path.exists():
        raise FileNotFoundError(f"File not found: {file_path}")
    if file_path.suffix.lower() not in (".md", ".markdown"):
        raise ValueError(f"Not a markdown file: {file_path}")

    # Read with encoding fallback
    raw_content = _read_file_safe(file_path)
    file_hash = hashlib.sha256(raw_content.encode("utf-8")).hexdigest()

    # Check if already ingested and unchanged
    with get_connection() as conn:
        existing = conn.execute(
            "SELECT id, file_hash FROM source_documents WHERE file_path = ?",
            (str(file_path),),
        ).fetchone()

    if existing and existing["file_hash"] == file_hash:
        log.info("file_skipped_unchanged", file_path=str(file_path))
        return {"file": str(file_path), "status": "skipped", "reason": "unchanged"}

    # Parse
    parsed = parse_markdown(file_path, text=raw_content)

    # Chunk
    base_meta = {
        "source_file": str(file_path),
        "tags": parsed.tags,
        "title": parsed.title,
    }
    chunks = chunk_markdown(parsed.body, base_metadata=base_meta)

    # Store in DB and vector store
    doc_id = str(uuid.uuid4())
    vector_store = get_vector_store()
    old_chunk_ids: list[str] = []
    new_chunk_ids: list[str] = []

    try:
        with get_connection() as conn:
            # Remove old data if re-ingesting
            if existing:
                doc_id = existing["id"]
                old_chunk_ids = [
                    row["id"]
                    for row in conn.execute(
                        "SELECT id FROM source_chunks WHERE document_id = ?",
                        (doc_id,),
                    ).fetchall()
                ]
                conn.execute(
                    "DELETE FROM source_chunks WHERE document_id = ?", (doc_id,)
                )
                conn.execute(
                    "UPDATE source_documents SET file_hash = ?, title = ?, tags = ?, updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                    (file_hash, parsed.title, json.dumps(parsed.tags), doc_id),
                )
            else:
                conn.execute(
                    "INSERT INTO source_documents (id, file_path, file_hash, title, doc_type, tags) VALUES (?, ?, ?, ?, ?, ?)",
                    (doc_id, str(file_path), file_hash, parsed.title, "markdown", json.dumps(parsed.tags)),
                )

            if not chunks:
                log.warning("no_chunks_created", file_path=str(file_path))
            else:
                # Insert chunks
                chunk_contents = []
                chunk_metadatas = []

                for chunk in chunks:
                    chunk_id = str(uuid.uuid4())
                    new_chunk_ids.append(chunk_id)
                    chunk_contents.append(chunk.content)
                    chunk_metadatas.append({
                        "document_id": doc_id,
                        "heading_path": chunk.heading_path,
                        "source_file": str(file_path),
                        "tags": json.dumps(parsed.tags),
                        "title": parsed.title,
                    })

                    conn.execute(
                        "INSERT INTO source_chunks (id, document_id, chunk_index, content, heading_path, char_count, metadata) VALUES (?, ?, ?, ?, ?, ?, ?)",
                        (
                            chunk_id,
                            doc_id,
                            chunk.chunk_index,
                            chunk.content,
                            chunk.heading_path,
                            chunk.char_count,
                            json.dumps(chunk.metadata),
                        ),
                    )

                # Add new vectors before commit so DB can still roll back on failure.
                vector_store.add(new_chunk_ids, chunk_contents, chunk_metadatas)
    except Exception:
        if new_chunk_ids:
            vector_store.delete(new_chunk_ids)
        raise

    # Delete stale vectors only after the DB transaction committed.
    if old_chunk_ids:
        vector_store.delete(old_chunk_ids)

    duration_ms = int((time.time() - start) * 1000)
    if chunks:
        log.info(
            "file_ingested",
            file_path=str(file_path),
            chunks_created=len(chunks),
            duration_ms=duration_ms,
        )
    else:
        log.info(
            "file_ingested_empty",
            file_path=str(file_path),
            duration_ms=duration_ms,
        )

    return {
        "file": str(file_path),
        "status": "ingested" if chunks else "empty",
        "chunks": len(chunks),
        "duration_ms": duration_ms,
    }


def ingest_folder(folder_path: Path, purge_deleted: bool = True) -> list[dict]:
    """Ingest all markdown files in a folder recursively.

    Args:
        folder_path: Directory to scan for .md files.
        purge_deleted: If True, remove DB/vector entries for files
            that no longer exist on disk.
    """
    folder_path = folder_path.resolve()
    if not folder_path.is_dir():
        raise NotADirectoryError(f"Not a directory: {folder_path}")

    results = []
    md_files = sorted(
        list(folder_path.rglob("*.md")) + list(folder_path.rglob("*.markdown"))
    )
    current_paths = {str(f.resolve()) for f in md_files}
    log.info("ingestion_started", folder=str(folder_path), file_count=len(md_files))

    # Ingest new/changed files
    for md_file in md_files:
        try:
            result = ingest_file(md_file)
            results.append(result)
        except Exception as e:
            log.error("ingestion_error", file_path=str(md_file), error=str(e))
            results.append({"file": str(md_file), "status": "error", "error": str(e)})

    # Purge entries for deleted files
    if purge_deleted:
        deleted = _purge_deleted_files(folder_path, current_paths)
        if deleted:
            log.info("purged_deleted_files", count=deleted)
            results.append({"status": "purged", "deleted_count": deleted})

    return results


def get_source_status() -> list[dict]:
    """Describe configured source directories and their readiness."""
    sources = []
    for spec in _config.settings.source_specs:
        path = spec["path"]
        assert isinstance(path, Path)
        sources.append(
            {
                "name": spec["name"],
                "enabled": spec["enabled"],
                "path": str(path),
                "exists": path.exists(),
                "is_dir": path.is_dir(),
                "read_only": spec["read_only"],
            }
        )
    return sources


def ingest_configured_sources(purge_deleted: bool = False) -> list[dict]:
    """Ingest enabled source directories declared in config.

    Purge is disabled by default here because sources are intended to be
    read-only inputs and should not be treated as the primary writable state.
    """
    results = []
    for source in get_source_status():
        if not source["enabled"]:
            results.append({"source": source["name"], "status": "disabled", "path": source["path"]})
            continue
        if not source["exists"] or not source["is_dir"]:
            results.append({"source": source["name"], "status": "missing", "path": source["path"]})
            continue

        folder_results = ingest_folder(Path(source["path"]), purge_deleted=purge_deleted)
        results.append(
            {
                "source": source["name"],
                "status": "ingested",
                "path": source["path"],
                "results": folder_results,
            }
        )
    return results


def get_ingestion_stats() -> dict:
    """Return ingestion statistics."""
    with get_connection() as conn:
        docs = conn.execute("SELECT COUNT(*) as c FROM source_documents").fetchone()
        chunks = conn.execute("SELECT COUNT(*) as c FROM source_chunks").fetchone()
        recent = conn.execute(
            "SELECT file_path, title, ingested_at FROM source_documents "
            "ORDER BY updated_at DESC LIMIT 5"
        ).fetchall()

    vector_store = get_vector_store()
    return {
        "total_documents": docs["c"],
        "total_chunks": chunks["c"],
        "total_vectors": vector_store.count,
        "recent_documents": [
            {"file_path": r["file_path"], "title": r["title"], "ingested_at": r["ingested_at"]}
            for r in recent
        ],
    }


def _read_file_safe(file_path: Path) -> str:
    """Read a file with encoding fallback."""
    for encoding in _ENCODINGS:
        try:
            return file_path.read_text(encoding=encoding)
        except (UnicodeDecodeError, ValueError):
            continue
    # Last resort: read with errors replaced
    return file_path.read_text(encoding="utf-8", errors="replace")


def _purge_deleted_files(folder_path: Path, current_paths: set[str]) -> int:
    """Remove DB/vector entries for files under folder_path that no longer exist."""
    folder_str = str(folder_path)
    deleted_count = 0
    vector_store = get_vector_store()
    chunk_ids_to_delete: list[str] = []

    with get_connection() as conn:
        rows = conn.execute(
            "SELECT id, file_path FROM source_documents"
        ).fetchall()

        for row in rows:
            doc_path = Path(row["file_path"])
            try:
                doc_path.relative_to(folder_path)
            except ValueError:
                continue

            if row["file_path"] not in current_paths:
                doc_id = row["id"]
                chunk_ids_to_delete.extend(
                    r["id"]
                    for r in conn.execute(
                        "SELECT id FROM source_chunks WHERE document_id = ?",
                        (doc_id,),
                    ).fetchall()
                )
                conn.execute("DELETE FROM source_chunks WHERE document_id = ?", (doc_id,))
                conn.execute("DELETE FROM source_documents WHERE id = ?", (doc_id,))
|
||||
log.info("purged_deleted_file", file_path=row["file_path"])
|
||||
deleted_count += 1
|
||||
|
||||
if chunk_ids_to_delete:
|
||||
vector_store.delete(chunk_ids_to_delete)
|
||||
|
||||
return deleted_count
|
||||
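The per-row containment test in `_purge_deleted_files` (catching `ValueError` from `Path.relative_to`) can be sketched in isolation. This is a minimal standalone illustration, not the module's API; the paths are hypothetical:

```python
from pathlib import Path


def is_under(folder: Path, candidate: Path) -> bool:
    # Path.relative_to raises ValueError when candidate does not sit
    # under folder, which is how the purge pass decides whether a
    # stored document row belongs to the folder being purged.
    try:
        candidate.relative_to(folder)
        return True
    except ValueError:
        return False


folder = Path("/data/sources/notes")
inside = Path("/data/sources/notes/projects/plan.md")
outside = Path("/data/other/plan.md")
```

The check is purely lexical: it never touches the filesystem, so rows whose files were deleted can still be classified by path alone.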
src/atocore/interactions/__init__.py (new file, 27 lines)
@@ -0,0 +1,27 @@
"""Interactions: capture loop for AtoCore.

This module is the foundation for Phase 9 (Reflection) and Phase 10
(Write-back). It records what AtoCore fed to an LLM and what came back,
so that later phases can:

- reinforce active memories that the LLM actually relied on
- extract candidate memories / project state from real conversations
- inspect the audit trail of any answer the system helped produce

Nothing here automatically promotes information into trusted state.
The capture loop is intentionally read-only with respect to trust.
"""

from atocore.interactions.service import (
    Interaction,
    get_interaction,
    list_interactions,
    record_interaction,
)

__all__ = [
    "Interaction",
    "get_interaction",
    "list_interactions",
    "record_interaction",
]
src/atocore/interactions/service.py (new file, 329 lines)
@@ -0,0 +1,329 @@
"""Interaction capture service.

An *interaction* is one round-trip of:
- a user prompt
- the AtoCore context pack that was assembled for it
- the LLM response (full text or a summary, caller's choice)
- which memories and chunks were actually used in the pack
- a client identifier (e.g. ``openclaw``, ``claude-code``, ``manual``)
- an optional session identifier so multi-turn conversations can be
  reconstructed later

The capture is intentionally additive: it never modifies memories,
project state, or chunks. Reflection (Phase 9 Commit B/C) and
write-back (Phase 10) are layered on top of this audit trail without
violating the AtoCore trust hierarchy.
"""

from __future__ import annotations

import json
import re
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

from atocore.models.database import get_connection
from atocore.observability.logger import get_logger
from atocore.projects.registry import resolve_project_name

log = get_logger("interactions")

# Stored timestamps use 'YYYY-MM-DD HH:MM:SS' (no timezone offset, UTC by
# convention) so they sort lexically and compare cleanly with the SQLite
# CURRENT_TIMESTAMP default. The since filter accepts ISO 8601 strings
# (with 'T', optional 'Z' or +offset, optional fractional seconds) and
# normalizes them to the storage format before the SQL comparison.
_STORAGE_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class Interaction:
    id: str
    prompt: str
    response: str
    response_summary: str
    project: str
    client: str
    session_id: str
    memories_used: list[str] = field(default_factory=list)
    chunks_used: list[str] = field(default_factory=list)
    context_pack: dict = field(default_factory=dict)
    created_at: str = ""


def record_interaction(
    prompt: str,
    response: str = "",
    response_summary: str = "",
    project: str = "",
    client: str = "",
    session_id: str = "",
    memories_used: list[str] | None = None,
    chunks_used: list[str] | None = None,
    context_pack: dict | None = None,
    reinforce: bool = True,
    extract: bool = False,
) -> Interaction:
    """Persist a single interaction to the audit trail.

    The only required field is ``prompt`` so this can be called even when
    the caller is in the middle of a partial turn (for example to record
    that AtoCore was queried even before the LLM response is back).

    When ``reinforce`` is True (default) and the interaction has response
    content, the Phase 9 Commit B reinforcement pass runs automatically
    against the active memory set. This bumps the confidence of any
    memory whose content is echoed in the response. Set ``reinforce`` to
    False to capture the interaction without touching memory confidence,
    which is useful for backfill and for tests that want to isolate the
    audit trail from the reinforcement loop.
    """
    if not prompt or not prompt.strip():
        raise ValueError("Interaction prompt must be non-empty")

    # Canonicalize the project through the registry so an alias and
    # the canonical id store under the same bucket. Without this,
    # reinforcement and extraction (which both query by raw
    # interaction.project) would silently miss memories and create
    # candidates in the wrong project.
    project = resolve_project_name(project)

    interaction_id = str(uuid.uuid4())
    # Store created_at explicitly so the same string lives in both the DB
    # column and the returned dataclass. SQLite's CURRENT_TIMESTAMP uses
    # 'YYYY-MM-DD HH:MM:SS' which would not compare cleanly against ISO
    # timestamps with 'T' and tz offset, breaking the `since` filter on
    # list_interactions.
    now = datetime.now(timezone.utc).strftime(_STORAGE_TIMESTAMP_FORMAT)
    memories_used = list(memories_used or [])
    chunks_used = list(chunks_used or [])
    context_pack_payload = context_pack or {}

    with get_connection() as conn:
        conn.execute(
            """
            INSERT INTO interactions (
                id, prompt, context_pack, response_summary, response,
                memories_used, chunks_used, client, session_id, project,
                created_at
            ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            """,
            (
                interaction_id,
                prompt,
                json.dumps(context_pack_payload, ensure_ascii=True),
                response_summary,
                response,
                json.dumps(memories_used, ensure_ascii=True),
                json.dumps(chunks_used, ensure_ascii=True),
                client,
                session_id,
                project,
                now,
            ),
        )

    log.info(
        "interaction_recorded",
        interaction_id=interaction_id,
        project=project,
        client=client,
        session_id=session_id,
        memories_used=len(memories_used),
        chunks_used=len(chunks_used),
        response_chars=len(response),
    )

    interaction = Interaction(
        id=interaction_id,
        prompt=prompt,
        response=response,
        response_summary=response_summary,
        project=project,
        client=client,
        session_id=session_id,
        memories_used=memories_used,
        chunks_used=chunks_used,
        context_pack=context_pack_payload,
        created_at=now,
    )

    if reinforce and (response or response_summary):
        # Import inside the function to avoid a circular import between
        # the interactions service and the reinforcement module which
        # depends on it.
        try:
            from atocore.memory.reinforcement import reinforce_from_interaction

            reinforce_from_interaction(interaction)
        except Exception as exc:  # pragma: no cover - reinforcement must never block capture
            log.error(
                "reinforcement_failed_on_capture",
                interaction_id=interaction_id,
                error=str(exc),
            )

    if extract and (response or response_summary):
        try:
            from atocore.memory.extractor import extract_candidates_from_interaction
            from atocore.memory.service import create_memory

            candidates = extract_candidates_from_interaction(interaction)
            for candidate in candidates:
                try:
                    create_memory(
                        memory_type=candidate.memory_type,
                        content=candidate.content,
                        project=candidate.project,
                        confidence=candidate.confidence,
                        status="candidate",
                    )
                except ValueError:
                    pass  # duplicate or validation error: skip silently
        except Exception as exc:  # pragma: no cover - extraction must never block capture
            log.error(
                "extraction_failed_on_capture",
                interaction_id=interaction_id,
                error=str(exc),
            )

    return interaction


def list_interactions(
    project: str | None = None,
    session_id: str | None = None,
    client: str | None = None,
    since: str | None = None,
    limit: int = 50,
) -> list[Interaction]:
    """List captured interactions, optionally filtered.

    ``since`` accepts an ISO 8601 timestamp string (with ``T``, an
    optional ``Z`` or numeric offset, optional fractional seconds).
    The value is normalized to the storage format (UTC,
    ``YYYY-MM-DD HH:MM:SS``) before the SQL comparison so external
    callers can pass any of the common ISO shapes without filter
    drift. ``project`` is canonicalized through the registry so an
    alias finds rows stored under the canonical project id.
    ``limit`` is hard-capped at 500 to keep casual API listings cheap.
    """
    if limit <= 0:
        return []
    limit = min(limit, 500)

    query = "SELECT * FROM interactions WHERE 1=1"
    params: list = []

    if project:
        query += " AND project = ?"
        params.append(resolve_project_name(project))
    if session_id:
        query += " AND session_id = ?"
        params.append(session_id)
    if client:
        query += " AND client = ?"
        params.append(client)
    if since:
        query += " AND created_at >= ?"
        params.append(_normalize_since(since))

    query += " ORDER BY created_at DESC LIMIT ?"
    params.append(limit)

    with get_connection() as conn:
        rows = conn.execute(query, params).fetchall()

    return [_row_to_interaction(row) for row in rows]


def get_interaction(interaction_id: str) -> Interaction | None:
    """Fetch one interaction by id, or return None if it does not exist."""
    if not interaction_id:
        return None
    with get_connection() as conn:
        row = conn.execute(
            "SELECT * FROM interactions WHERE id = ?", (interaction_id,)
        ).fetchone()
    if row is None:
        return None
    return _row_to_interaction(row)


def _row_to_interaction(row) -> Interaction:
    return Interaction(
        id=row["id"],
        prompt=row["prompt"],
        response=row["response"] or "",
        response_summary=row["response_summary"] or "",
        project=row["project"] or "",
        client=row["client"] or "",
        session_id=row["session_id"] or "",
        memories_used=_safe_json_list(row["memories_used"]),
        chunks_used=_safe_json_list(row["chunks_used"]),
        context_pack=_safe_json_dict(row["context_pack"]),
        created_at=row["created_at"] or "",
    )


def _safe_json_list(raw: str | None) -> list[str]:
    if not raw:
        return []
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(value, list):
        return []
    return [str(item) for item in value]


def _safe_json_dict(raw: str | None) -> dict:
    if not raw:
        return {}
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(value, dict):
        return {}
    return value


def _normalize_since(since: str) -> str:
    """Normalize an ISO 8601 ``since`` filter to the storage format.

    Stored ``created_at`` values are ``YYYY-MM-DD HH:MM:SS`` (no
    timezone, UTC by convention). External callers naturally pass
    ISO 8601 with ``T`` separator, optional ``Z`` suffix, optional
    fractional seconds, and optional ``+HH:MM`` offsets. A naive
    string comparison between the two formats fails on the same
    day because the lexically-greater ``T`` makes any ISO value
    sort after any space-separated value.

    This helper accepts the common ISO shapes plus the bare
    storage format and returns the storage format. On a parse
    failure it returns the input unchanged so the SQL comparison
    fails open (no rows match) instead of raising and breaking
    the listing endpoint.
    """
    if not since:
        return since
    candidate = since.strip()
    # Python's fromisoformat understands trailing 'Z' from 3.11+ but
    # we replace it explicitly for safety against earlier shapes.
    if candidate.endswith("Z"):
        candidate = candidate[:-1] + "+00:00"
    try:
        dt = datetime.fromisoformat(candidate)
    except ValueError:
        # Unparseable (the bare storage format itself parses fine, since
        # fromisoformat accepts a space separator): return the raw input
        # unchanged so the SQL comparison fails open (no rows match)
        # instead of raising and breaking the listing endpoint.
        return since
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.strftime(_STORAGE_TIMESTAMP_FORMAT)
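The normalization contract described in the docstring can be exercised standalone. The helper below mirrors `_normalize_since` without the module's imports and logging; it is a sketch of the same logic, not the service's actual import surface:

```python
from datetime import datetime, timezone

_STORAGE_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def normalize_since(since: str) -> str:
    candidate = since.strip()
    # Accept a trailing 'Z' explicitly for pre-3.11 fromisoformat.
    if candidate.endswith("Z"):
        candidate = candidate[:-1] + "+00:00"
    try:
        dt = datetime.fromisoformat(candidate)
    except ValueError:
        # Fail open: the raw string goes into the SQL comparison as-is.
        return since
    if dt.tzinfo is not None:
        # Convert aware timestamps to naive UTC before formatting.
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.strftime(_STORAGE_TIMESTAMP_FORMAT)
```

A `Z`-suffixed ISO value, an offset-bearing value, and the bare storage format all collapse to the same `YYYY-MM-DD HH:MM:SS` shape, so lexical comparison against stored `created_at` values is consistent.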
src/atocore/main.py (new file, 62 lines)
@@ -0,0 +1,62 @@
"""AtoCore — FastAPI application entry point."""

from contextlib import asynccontextmanager

from fastapi import FastAPI

from atocore import __version__
from atocore.api.routes import router
import atocore.config as _config
from atocore.context.project_state import init_project_state_schema
from atocore.ingestion.pipeline import get_source_status
from atocore.models.database import init_db
from atocore.observability.logger import get_logger, setup_logging


log = get_logger("main")


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Run setup before the first request and teardown after shutdown.

    Replaces the deprecated ``@app.on_event("startup")`` hook with the
    modern ``lifespan`` context manager. Setup runs synchronously (the
    underlying calls are blocking I/O) so no await is needed; the
    function still must be async per the FastAPI contract.
    """
    setup_logging()
    _config.ensure_runtime_dirs()
    init_db()
    init_project_state_schema()
    log.info(
        "startup_ready",
        env=_config.settings.env,
        db_path=str(_config.settings.db_path),
        chroma_path=str(_config.settings.chroma_path),
        source_status=get_source_status(),
    )
    yield
    # No teardown work needed today; SQLite connections are short-lived
    # and the Chroma client cleans itself up on process exit.


app = FastAPI(
    title="AtoCore",
    description="Personal Context Engine for LLM interactions",
    version=__version__,
    lifespan=lifespan,
)

app.include_router(router)


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(
        "atocore.main:app",
        host=_config.settings.host,
        port=_config.settings.port,
        reload=True,
    )
src/atocore/memory/__init__.py (new file, empty)
src/atocore/memory/extractor.py (new file, 242 lines)
@@ -0,0 +1,242 @@
"""Rule-based candidate-memory extraction from captured interactions.

Phase 9 Commit C. This module reads an interaction's response text and
produces a list of *candidate* memories that a human can later review
and either promote to active or reject. Nothing extracted here is ever
automatically promoted into trusted state — the AtoCore trust rule is
that bad memory is worse than no memory, so the extractor is
conservative on purpose.

Design rules for V0
-------------------
1. Rule-based only. No LLM calls. The extractor should be fast, cheap,
   fully explainable, and produce the same output for the same input
   across runs.
2. Patterns match obvious, high-signal structures and are intentionally
   narrow. False positives are more harmful than false negatives because
   every candidate means review work for a human.
3. Every extracted candidate records which pattern fired and which text
   span it came from, so a reviewer can audit the extractor's reasoning.
4. Patterns should feel like idioms the user already writes in their
   PKM and interaction notes:
   * ``## Decision: ...`` and variants
   * ``## Constraint: ...`` and variants
   * ``I prefer <X>`` / ``the user prefers <X>``
   * ``decided to <X>``
   * ``<X> is a requirement`` / ``requirement: <X>``
5. Candidates are de-duplicated against already-active memories of the
   same type+project so review queues don't fill up with things the
   user has already curated.

The extractor produces ``MemoryCandidate`` objects. The caller decides
whether to persist them via ``create_memory(..., status="candidate")``.
Persistence is kept out of the extractor itself so it can be tested
without touching the database and so future extractors (LLM-based,
structural, ontology-driven) can be swapped in cleanly.
"""

from __future__ import annotations

import re
from dataclasses import dataclass

from atocore.interactions.service import Interaction
from atocore.memory.service import MEMORY_TYPES, get_memories
from atocore.observability.logger import get_logger

log = get_logger("extractor")


# Bumped whenever the rule set, regex shapes, or post-processing
# semantics change in a way that could affect candidate output. The
# promotion-rules doc requires every candidate to record the version
# of the extractor that produced it so old candidates can be re-evaluated
# (or kept as-is) when the rules evolve.
#
# History:
#   0.1.0 - initial Phase 9 Commit C rule set (Apr 6, 2026)
EXTRACTOR_VERSION = "0.1.0"


# Every candidate is attributed to the rule that fired so reviewers can
# audit why it was proposed.
@dataclass
class MemoryCandidate:
    memory_type: str
    content: str
    rule: str
    source_span: str
    project: str = ""
    confidence: float = 0.5  # default review-queue confidence
    source_interaction_id: str = ""
    extractor_version: str = EXTRACTOR_VERSION


# ---------------------------------------------------------------------------
# Pattern definitions
# ---------------------------------------------------------------------------
#
# Each pattern maps to:
#   - the memory type the candidate should land in
#   - a compiled regex over the response text
#   - a short human-readable rule id
#
# Regexes are intentionally anchored to obvious structural cues so random
# prose doesn't light them up. All are case-insensitive, and the heading
# rules are MULTILINE so they can match at the start of any line.

_RULES: list[tuple[str, str, re.Pattern]] = [
    (
        "decision_heading",
        "adaptation",
        re.compile(
            r"^[ \t]*#{1,6}[ \t]*decision[ \t]*[:\-\u2014][ \t]*(?P<value>.+?)$",
            re.IGNORECASE | re.MULTILINE,
        ),
    ),
    (
        "constraint_heading",
        "project",
        re.compile(
            r"^[ \t]*#{1,6}[ \t]*constraint[ \t]*[:\-\u2014][ \t]*(?P<value>.+?)$",
            re.IGNORECASE | re.MULTILINE,
        ),
    ),
    (
        "requirement_heading",
        "project",
        re.compile(
            r"^[ \t]*#{1,6}[ \t]*requirement[ \t]*[:\-\u2014][ \t]*(?P<value>.+?)$",
            re.IGNORECASE | re.MULTILINE,
        ),
    ),
    (
        "fact_heading",
        "knowledge",
        re.compile(
            r"^[ \t]*#{1,6}[ \t]*fact[ \t]*[:\-\u2014][ \t]*(?P<value>.+?)$",
            re.IGNORECASE | re.MULTILINE,
        ),
    ),
    (
        "preference_sentence",
        "preference",
        re.compile(
            r"(?:^|[\s\.])(?:I|the user)\s+prefer(?:s)?\s+(?P<value>[^\n\.\!]{6,200})",
            re.IGNORECASE,
        ),
    ),
    (
        "decided_to_sentence",
        "adaptation",
        re.compile(
            r"(?:^|[\s\.])(?:I|we|the user)\s+decided\s+to\s+(?P<value>[^\n\.\!]{6,200})",
            re.IGNORECASE,
        ),
    ),
    (
        "requirement_sentence",
        "project",
        re.compile(
            r"(?:^|[\s\.])(?:the[ \t]+)?requirement\s+(?:is|was)\s+(?P<value>[^\n\.\!]{6,200})",
            re.IGNORECASE,
        ),
    ),
]

# A minimum content length after trimming stops silly one-word candidates.
_MIN_CANDIDATE_LENGTH = 8
# A maximum content length keeps candidates reviewable at a glance.
_MAX_CANDIDATE_LENGTH = 280


def extract_candidates_from_interaction(
    interaction: Interaction,
) -> list[MemoryCandidate]:
    """Return a list of candidate memories for human review.

    The returned candidates are not persisted. The caller can iterate
    over the result and call ``create_memory(..., status="candidate")``
    for each one it wants to land.
    """
    text = _combined_response_text(interaction)
    if not text:
        return []

    raw_candidates: list[MemoryCandidate] = []
    seen_spans: set[tuple[str, str, str]] = set()  # (type, normalized_value, rule)

    for rule_id, memory_type, pattern in _RULES:
        for match in pattern.finditer(text):
            value = _clean_value(match.group("value"))
            if len(value) < _MIN_CANDIDATE_LENGTH or len(value) > _MAX_CANDIDATE_LENGTH:
                continue
            normalized = value.lower()
            dedup_key = (memory_type, normalized, rule_id)
            if dedup_key in seen_spans:
                continue
            seen_spans.add(dedup_key)
            raw_candidates.append(
                MemoryCandidate(
                    memory_type=memory_type,
                    content=value,
                    rule=rule_id,
                    source_span=match.group(0).strip(),
                    project=interaction.project or "",
                    confidence=0.5,
                    source_interaction_id=interaction.id,
                )
            )

    # Drop anything that duplicates an already-active memory of the
    # same type and project so reviewers aren't asked to re-curate
    # things they already promoted.
    filtered = [c for c in raw_candidates if not _matches_existing_active(c)]

    if filtered:
        log.info(
            "extraction_produced_candidates",
            interaction_id=interaction.id,
            candidate_count=len(filtered),
            dropped_as_duplicate=len(raw_candidates) - len(filtered),
        )
    return filtered


def _combined_response_text(interaction: Interaction) -> str:
    parts: list[str] = []
    if interaction.response:
        parts.append(interaction.response)
    if interaction.response_summary:
        parts.append(interaction.response_summary)
    return "\n".join(parts).strip()


def _clean_value(raw: str) -> str:
    """Trim whitespace, strip trailing punctuation, collapse inner spaces."""
    cleaned = re.sub(r"\s+", " ", raw).strip()
    # Trim trailing punctuation that commonly trails sentences but is not
    # part of the fact itself.
    cleaned = cleaned.rstrip(".;,!?\u2014-")
    return cleaned.strip()


def _matches_existing_active(candidate: MemoryCandidate) -> bool:
    """Return True if an identical active memory already exists."""
    if candidate.memory_type not in MEMORY_TYPES:
        return False
    try:
        existing = get_memories(
            memory_type=candidate.memory_type,
            project=candidate.project or None,
            active_only=True,
            limit=200,
        )
    except Exception as exc:  # pragma: no cover - defensive
        log.error("extractor_existing_lookup_failed", error=str(exc))
        return False
    needle = candidate.content.lower()
    for mem in existing:
        if mem.content.lower() == needle:
            return True
    return False
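To see why the heading rules are safe against ordinary prose, the `decision_heading` pattern (copied from `_RULES`) can be run on a small sample. The sample text is hypothetical:

```python
import re

# Mirrors the "decision_heading" rule from _RULES: a markdown heading
# prefix, the literal word "decision", a separator, then the captured value.
DECISION = re.compile(
    r"^[ \t]*#{1,6}[ \t]*decision[ \t]*[:\-\u2014][ \t]*(?P<value>.+?)$",
    re.IGNORECASE | re.MULTILINE,
)

text = (
    "Some prose mentioning a decision in passing.\n"
    "## Decision: use SQLite for the audit trail\n"
    "More unrelated prose."
)

match = DECISION.search(text)
value = match.group("value") if match else None
```

The first line contains the word "decision" but no heading cue, so it does not match; only the explicit `## Decision:` heading produces a candidate value.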
src/atocore/memory/extractor_llm.py (new file, 281 lines)
@@ -0,0 +1,281 @@
"""LLM-assisted candidate-memory extraction via the Claude Code CLI.

Day 4 of the 2026-04-11 mini-phase: the rule-based extractor hit 0%
recall against real conversational claude-code captures (Day 2 baseline
scorecard in ``scripts/eval_data/extractor_labels_2026-04-11.json``),
with false negatives spread across 5 distinct miss classes. A single
rule expansion cannot close that gap, so this module adds an optional
LLM-assisted mode that shells out to the ``claude -p`` (Claude Code
non-interactive) CLI with a focused extraction system prompt. That
path reuses the user's existing Claude.ai OAuth credentials — no API
key anywhere, per the 2026-04-11 decision.

Trust rules carried forward from the rule-based extractor:

- Candidates are NEVER auto-promoted. Caller persists with
  ``status="candidate"`` and a human reviews via the triage CLI.
- This path is additive. The rule-based extractor keeps working
  exactly as before; callers opt in by importing this module.
- Extraction stays off the capture hot path — this is batch / manual
  only, per the 2026-04-11 decision.
- Failure is silent. Missing CLI, non-zero exit, malformed JSON,
  timeout — all return an empty list and log an error. Never raises
  into the caller; the capture audit trail must not break on an
  optional side effect.

Configuration:

- Requires the ``claude`` CLI on PATH (``claude --version`` should work).
- ``ATOCORE_LLM_EXTRACTOR_MODEL`` overrides the model alias (default
  ``haiku``).
- ``ATOCORE_LLM_EXTRACTOR_TIMEOUT_S`` overrides the per-call timeout
  (default 90 seconds — the first invocation is slow because Node.js
  startup plus the OAuth check is non-trivial).

Implementation notes:

- We run ``claude -p`` with ``--model <alias>``,
  ``--append-system-prompt`` for the extraction instructions,
  ``--no-session-persistence`` so we don't pollute session history,
  and ``--disable-slash-commands`` so stray ``/foo`` in an extracted
  response never triggers something.
- The CLI is invoked from a temp working directory so it does not
  auto-discover ``CLAUDE.md`` / ``DEV-LEDGER.md`` / ``AGENTS.md``
  from the repo root. We want a bare extraction context, not the
  full project briefing. We can't use ``--bare`` because that
  forces API-key auth; the temp-cwd trick is the lightest way to
  keep OAuth auth while skipping project context loading.
"""

from __future__ import annotations

import json
import os
import shutil
import subprocess
import tempfile
from dataclasses import dataclass
from functools import lru_cache

from atocore.interactions.service import Interaction
from atocore.memory.extractor import MemoryCandidate
from atocore.memory.service import MEMORY_TYPES
from atocore.observability.logger import get_logger

log = get_logger("extractor_llm")

LLM_EXTRACTOR_VERSION = "llm-0.2.0"
DEFAULT_MODEL = os.environ.get("ATOCORE_LLM_EXTRACTOR_MODEL", "haiku")
DEFAULT_TIMEOUT_S = float(os.environ.get("ATOCORE_LLM_EXTRACTOR_TIMEOUT_S", "90"))
MAX_RESPONSE_CHARS = 8000
MAX_PROMPT_CHARS = 2000

_SYSTEM_PROMPT = """You extract durable memory candidates from LLM conversation turns for a personal context engine called AtoCore.

Your job is to read one user prompt plus the assistant's response and decide which durable facts, decisions, preferences, architectural rules, or project invariants should be remembered across future sessions.

Rules:

1. Only surface durable claims. Skip transient status ("deploy is still running"), instructional guidance ("here is how to run the command"), troubleshooting tactics, ephemeral recommendations ("merge this PR now"), and session recaps.
2. A candidate is durable when a reader coming back in two weeks would still need to know it. Architectural choices, named rules, ratified decisions, invariants, procurement commitments, and project-level constraints qualify. Conversational fillers and step-by-step instructions do not.
3. Each candidate must stand alone. Rewrite the claim in one sentence under 200 characters with enough context that a reader without the conversation understands it.
4. Each candidate must have a type from this closed set: project, knowledge, preference, adaptation.
5. If the conversation is clearly scoped to a project (p04-gigabit, p05-interferometer, p06-polisher, atocore), set ``project`` to that id. Otherwise leave ``project`` empty.
6. If the response makes no durable claim, return an empty list. It is correct and expected to return [] on most conversational turns.
7. Confidence should be 0.5 by default so human review workload is honest. Raise to 0.6 only when the response states the claim in an unambiguous, committed form (e.g. "the decision is X", "the selected approach is Y", "X is non-negotiable").
8. Output must be a raw JSON array and nothing else. No prose before or after. No markdown fences. No explanations.

Each array element has exactly this shape:

{"type": "project|knowledge|preference|adaptation", "content": "...", "project": "...", "confidence": 0.5}

Return [] when there is nothing to extract."""


@dataclass
class LLMExtractionResult:
    candidates: list[MemoryCandidate]
    raw_output: str
    error: str = ""


@lru_cache(maxsize=1)
def _sandbox_cwd() -> str:
    """Return a stable temp directory for ``claude -p`` invocations.

    We want the CLI to run from a directory that does NOT contain
    ``CLAUDE.md`` / ``DEV-LEDGER.md`` / ``AGENTS.md``, so every
    extraction call starts with a clean context instead of the full
    AtoCore project briefing. Cached so the directory persists for
    the lifetime of the process.
    """
    return tempfile.mkdtemp(prefix="ato-llm-extract-")


def _cli_available() -> bool:
    return shutil.which("claude") is not None


def extract_candidates_llm(
    interaction: Interaction,
    model: str | None = None,
    timeout_s: float | None = None,
) -> list[MemoryCandidate]:
    """Run the LLM-assisted extractor against one interaction.

    Returns a list of ``MemoryCandidate`` objects, empty on any
    failure path. The caller is responsible for persistence.
    """
    return extract_candidates_llm_verbose(
        interaction,
        model=model,
        timeout_s=timeout_s,
    ).candidates


def extract_candidates_llm_verbose(
    interaction: Interaction,
    model: str | None = None,
    timeout_s: float | None = None,
) -> LLMExtractionResult:
    """Like ``extract_candidates_llm`` but also returns the raw
    subprocess output and any error encountered, for eval / debugging.
|
||||
"""
|
||||
if not _cli_available():
|
||||
return LLMExtractionResult(
|
||||
candidates=[],
|
||||
raw_output="",
|
||||
error="claude_cli_missing",
|
||||
)
|
||||
|
||||
response_text = (interaction.response or "").strip()
|
||||
if not response_text:
|
||||
return LLMExtractionResult(candidates=[], raw_output="", error="empty_response")
|
||||
|
||||
prompt_excerpt = (interaction.prompt or "")[:MAX_PROMPT_CHARS]
|
||||
response_excerpt = response_text[:MAX_RESPONSE_CHARS]
|
||||
user_message = (
|
||||
f"PROJECT HINT (may be empty): {interaction.project or ''}\n\n"
|
||||
f"USER PROMPT:\n{prompt_excerpt}\n\n"
|
||||
f"ASSISTANT RESPONSE:\n{response_excerpt}\n\n"
|
||||
"Return the JSON array now."
|
||||
)
|
||||
|
||||
args = [
|
||||
"claude",
|
||||
"-p",
|
||||
"--model",
|
||||
model or DEFAULT_MODEL,
|
||||
"--append-system-prompt",
|
||||
_SYSTEM_PROMPT,
|
||||
"--no-session-persistence",
|
||||
"--disable-slash-commands",
|
||||
user_message,
|
||||
]
|
||||
|
||||
try:
|
||||
completed = subprocess.run(
|
||||
args,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout_s or DEFAULT_TIMEOUT_S,
|
||||
cwd=_sandbox_cwd(),
|
||||
encoding="utf-8",
|
||||
errors="replace",
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
log.error("llm_extractor_timeout", interaction_id=interaction.id)
|
||||
return LLMExtractionResult(candidates=[], raw_output="", error="timeout")
|
||||
except Exception as exc: # pragma: no cover - unexpected subprocess failure
|
||||
log.error("llm_extractor_subprocess_failed", error=str(exc))
|
||||
return LLMExtractionResult(candidates=[], raw_output="", error=f"subprocess_error: {exc}")
|
||||
|
||||
if completed.returncode != 0:
|
||||
log.error(
|
||||
"llm_extractor_nonzero_exit",
|
||||
interaction_id=interaction.id,
|
||||
returncode=completed.returncode,
|
||||
stderr_prefix=(completed.stderr or "")[:200],
|
||||
)
|
||||
return LLMExtractionResult(
|
||||
candidates=[],
|
||||
raw_output=completed.stdout or "",
|
||||
error=f"exit_{completed.returncode}",
|
||||
)
|
||||
|
||||
raw_output = (completed.stdout or "").strip()
|
||||
candidates = _parse_candidates(raw_output, interaction)
|
||||
log.info(
|
||||
"llm_extractor_done",
|
||||
interaction_id=interaction.id,
|
||||
candidate_count=len(candidates),
|
||||
model=model or DEFAULT_MODEL,
|
||||
)
|
||||
return LLMExtractionResult(candidates=candidates, raw_output=raw_output)
|
||||
|
||||
|
||||
def _parse_candidates(raw_output: str, interaction: Interaction) -> list[MemoryCandidate]:
|
||||
"""Parse the model's JSON output into MemoryCandidate objects.
|
||||
|
||||
Tolerates common model glitches: surrounding whitespace, stray
|
||||
markdown fences, leading/trailing prose. Silently drops malformed
|
||||
array elements rather than raising.
|
||||
"""
|
||||
text = raw_output.strip()
|
||||
if text.startswith("```"):
|
||||
text = text.strip("`")
|
||||
first_newline = text.find("\n")
|
||||
if first_newline >= 0:
|
||||
text = text[first_newline + 1 :]
|
||||
if text.endswith("```"):
|
||||
text = text[:-3]
|
||||
text = text.strip()
|
||||
|
||||
if not text or text == "[]":
|
||||
return []
|
||||
|
||||
if not text.lstrip().startswith("["):
|
||||
start = text.find("[")
|
||||
end = text.rfind("]")
|
||||
if start >= 0 and end > start:
|
||||
text = text[start : end + 1]
|
||||
|
||||
try:
|
||||
parsed = json.loads(text)
|
||||
except json.JSONDecodeError as exc:
|
||||
log.error("llm_extractor_parse_failed", error=str(exc), raw_prefix=raw_output[:120])
|
||||
return []
|
||||
|
||||
if not isinstance(parsed, list):
|
||||
return []
|
||||
|
||||
results: list[MemoryCandidate] = []
|
||||
for item in parsed:
|
||||
if not isinstance(item, dict):
|
||||
continue
|
||||
mem_type = str(item.get("type") or "").strip().lower()
|
||||
content = str(item.get("content") or "").strip()
|
||||
project = str(item.get("project") or "").strip()
|
||||
confidence_raw = item.get("confidence", 0.5)
|
||||
if mem_type not in MEMORY_TYPES:
|
||||
continue
|
||||
if not content:
|
||||
continue
|
||||
try:
|
||||
confidence = float(confidence_raw)
|
||||
except (TypeError, ValueError):
|
||||
confidence = 0.5
|
||||
confidence = max(0.0, min(1.0, confidence))
|
||||
results.append(
|
||||
MemoryCandidate(
|
||||
memory_type=mem_type,
|
||||
content=content[:1000],
|
||||
rule="llm_extraction",
|
||||
source_span=content[:200],
|
||||
project=project,
|
||||
confidence=confidence,
|
||||
source_interaction_id=interaction.id,
|
||||
extractor_version=LLM_EXTRACTOR_VERSION,
|
||||
)
|
||||
)
|
||||
return results
|
||||
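The salvage steps in `_parse_candidates` (fence stripping, then bracket slicing, then tolerant JSON decoding) can be exercised in isolation. The sketch below is a minimal standalone copy of those rules for illustration only; `salvage_json_array` is a hypothetical name, not part of the module, and it omits the per-element field validation the real extractor performs.

```python
import json

def salvage_json_array(raw: str) -> list:
    """Standalone sketch of the salvage rules in _parse_candidates."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop surrounding backticks, then a leading language-tag line.
        text = text.strip("`")
        first_newline = text.find("\n")
        if first_newline >= 0:
            text = text[first_newline + 1 :]
        text = text.strip()
    if not text.lstrip().startswith("["):
        # Carve the outermost [...] span out of any surrounding prose.
        start, end = text.find("["), text.rfind("]")
        if start >= 0 and end > start:
            text = text[start : end + 1]
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else []

fenced = '```json\n[{"type": "knowledge", "content": "x", "confidence": 0.5}]\n```'
prosey = 'Here you go: [{"type": "preference", "content": "y"}] hope that helps'
print(salvage_json_array(fenced)[0]["type"])   # knowledge
print(salvage_json_array(prosey)[0]["type"])   # preference
print(salvage_json_array("no array here"))     # []
```

The three cases cover the common glitches the docstring names: a fenced reply, a reply wrapped in prose, and a reply with no array at all.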
241
src/atocore/memory/reinforcement.py
Normal file
@@ -0,0 +1,241 @@
"""Reinforce active memories from captured interactions (Phase 9 Commit B).

When an interaction is captured with a non-empty response, this module
scans the response text against currently-active memories and bumps the
confidence of any memory whose content appears in the response. The
intent is to surface a weak signal that the LLM actually relied on a
given memory, without ever promoting anything new into trusted state.

Design notes
------------
- Matching uses token-overlap: tokenize both sides (lowercase, stem,
  drop stop words), then check whether >= 70 % of the memory's content
  tokens appear in the response token set. This handles natural
  paraphrases (e.g. "prefers" vs "prefer", "because history" vs
  "because the history") that substring matching missed.
- Candidates and invalidated memories are NEVER considered — reinforcement
  must not revive history.
- Reinforcement is capped at 1.0 and monotonically non-decreasing.
- The function is idempotent with respect to a single call but will
  accumulate confidence across multiple calls; that is intentional — if
  the same memory is mentioned in 10 separate conversations it is, by
  definition, more confidently useful.
"""

from __future__ import annotations

import re
from dataclasses import dataclass

from atocore.interactions.service import Interaction
from atocore.memory.service import (
    Memory,
    get_memories,
    reinforce_memory,
)
from atocore.observability.logger import get_logger

log = get_logger("reinforcement")

# Minimum memory content length to consider for matching. Too-short
# memories (e.g. "use SI") would otherwise fire on almost every response
# and generate noise. 12 characters is long enough to require real
# semantic content but short enough to match one-liner identity
# memories like "prefers Python".
_MIN_MEMORY_CONTENT_LENGTH = 12

# Token-overlap matching constants.
_STOP_WORDS: frozenset[str] = frozenset({
    "the", "a", "an", "and", "or", "of", "to", "is", "was",
    "that", "this", "with", "for", "from", "into",
})
_MATCH_THRESHOLD = 0.70

# Long memories can't realistically hit 70% overlap through organic
# paraphrase — a 40-token memory would need 28 stemmed tokens echoed
# verbatim. Above this token count the matcher switches to an absolute
# overlap floor plus a softer fraction floor so paragraph-length memories
# still reinforce when the response genuinely uses them.
_LONG_MEMORY_TOKEN_COUNT = 15
_LONG_MODE_MIN_OVERLAP = 12
_LONG_MODE_MIN_FRACTION = 0.35

DEFAULT_CONFIDENCE_DELTA = 0.02


@dataclass
class ReinforcementResult:
    memory_id: str
    memory_type: str
    old_confidence: float
    new_confidence: float


def reinforce_from_interaction(
    interaction: Interaction,
    confidence_delta: float = DEFAULT_CONFIDENCE_DELTA,
) -> list[ReinforcementResult]:
    """Scan an interaction's response for active-memory mentions.

    Returns the list of memories that were reinforced. An empty list is
    returned if the interaction has no response content, if no memories
    match, or if the interaction has no project scope and the global
    active set is empty.
    """
    response_text = _combined_response_text(interaction)
    if not response_text:
        return []

    normalized_response = _normalize(response_text)
    if not normalized_response:
        return []

    # Fetch the candidate pool of active memories. We cast a wide net
    # here: project-scoped memories for the interaction's project first,
    # plus identity and preference memories which are global by nature.
    candidate_pool: list[Memory] = []
    seen_ids: set[str] = set()

    def _add_batch(batch: list[Memory]) -> None:
        for mem in batch:
            if mem.id in seen_ids:
                continue
            seen_ids.add(mem.id)
            candidate_pool.append(mem)

    if interaction.project:
        _add_batch(get_memories(project=interaction.project, active_only=True, limit=200))
    _add_batch(get_memories(memory_type="identity", active_only=True, limit=50))
    _add_batch(get_memories(memory_type="preference", active_only=True, limit=50))

    reinforced: list[ReinforcementResult] = []
    for memory in candidate_pool:
        if not _memory_matches(memory.content, normalized_response):
            continue
        applied, old_conf, new_conf = reinforce_memory(
            memory.id, confidence_delta=confidence_delta
        )
        if not applied:
            continue
        reinforced.append(
            ReinforcementResult(
                memory_id=memory.id,
                memory_type=memory.memory_type,
                old_confidence=old_conf,
                new_confidence=new_conf,
            )
        )

    if reinforced:
        log.info(
            "reinforcement_applied",
            interaction_id=interaction.id,
            project=interaction.project,
            reinforced_count=len(reinforced),
        )
    return reinforced


def _combined_response_text(interaction: Interaction) -> str:
    """Pick the best available response text from an interaction."""
    parts: list[str] = []
    if interaction.response:
        parts.append(interaction.response)
    if interaction.response_summary:
        parts.append(interaction.response_summary)
    return "\n".join(parts).strip()


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace for substring matching."""
    if not text:
        return ""
    lowered = text.lower()
    # Collapse any run of whitespace (including newlines and tabs) to
    # a single space so multi-line responses match single-line memories.
    collapsed = re.sub(r"\s+", " ", lowered)
    return collapsed.strip()


def _stem(word: str) -> str:
    """Aggressive suffix-folding so inflected forms collapse.

    Handles trailing ``ing``, ``ed``, and ``s`` — good enough for
    reinforcement matching without pulling in nltk/snowball.
    """
    # Order matters: try longest suffix first.
    if word.endswith("ing") and len(word) >= 6:
        return word[:-3]
    if word.endswith("ed") and len(word) > 4:
        stem = word[:-2]
        # "preferred" → "preferr" → "prefer" (doubled consonant before -ed)
        if len(stem) >= 3 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def _tokenize(text: str) -> set[str]:
    """Split normalized text into a stemmed token set.

    Strips punctuation, drops words shorter than 3 chars and stop
    words. Hyphenated and slash-separated identifiers
    (``polisher-control``, ``twyman-green``, ``2-projects/interferometer``)
    produce both the full form AND each sub-token, so a query for
    "polisher control" can match a memory that wrote
    "polisher-control" without forcing callers to guess the exact
    hyphenation.
    """
    tokens: set[str] = set()
    for raw in text.split():
        word = raw.strip(".,;:!?\"'()[]{}-/")
        if not word:
            continue
        _add_token(tokens, word)
        # Also add sub-tokens split on internal '-' or '/' so
        # hyphenated identifiers match queries that don't hyphenate.
        if "-" in word or "/" in word:
            for sub in re.split(r"[-/]+", word):
                _add_token(tokens, sub)
    return tokens


def _add_token(tokens: set[str], word: str) -> None:
    if len(word) < 3:
        return
    if word in _STOP_WORDS:
        return
    tokens.add(_stem(word))


def _memory_matches(memory_content: str, normalized_response: str) -> bool:
    """Return True if enough of the memory's tokens appear in the response.

    Dual-mode token overlap:
    - Short memories (<= _LONG_MEMORY_TOKEN_COUNT stems): require
      >= 70 % of memory tokens echoed.
    - Long memories (paragraphs): require an absolute floor of
      _LONG_MODE_MIN_OVERLAP distinct stems echoed AND a softer
      fraction of _LONG_MODE_MIN_FRACTION, so organic paraphrase
      of a real project memory can reinforce without the response
      quoting the paragraph verbatim.
    """
    if not memory_content:
        return False
    normalized_memory = _normalize(memory_content)
    if len(normalized_memory) < _MIN_MEMORY_CONTENT_LENGTH:
        return False
    memory_tokens = _tokenize(normalized_memory)
    if not memory_tokens:
        return False
    response_tokens = _tokenize(normalized_response)
    overlap = memory_tokens & response_tokens
    fraction = len(overlap) / len(memory_tokens)
    if len(memory_tokens) <= _LONG_MEMORY_TOKEN_COUNT:
        return fraction >= _MATCH_THRESHOLD
    return (
        len(overlap) >= _LONG_MODE_MIN_OVERLAP
        and fraction >= _LONG_MODE_MIN_FRACTION
    )
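The stemmer and the 70 % overlap rule above are easy to sanity-check in isolation. This sketch copies the `_stem` suffix rules and a simplified tokenizer (no hyphen sub-tokens) so it runs without the `atocore` package; the memory/response strings are invented examples.

```python
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "was",
              "that", "this", "with", "for", "from", "into"}

def stem(word: str) -> str:
    # Same suffix-folding as _stem: -ing, -ed (with undoubling), -s.
    if word.endswith("ing") and len(word) >= 6:
        return word[:-3]
    if word.endswith("ed") and len(word) > 4:
        s = word[:-2]
        if len(s) >= 3 and s[-1] == s[-2]:
            s = s[:-1]          # "preferred" -> "preferr" -> "prefer"
        return s
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def tokenize(text: str) -> set[str]:
    # Simplified _tokenize: lowercase, strip punctuation, drop short
    # words and stop words, stem what remains.
    out: set[str] = set()
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()[]{}-/")
        if len(word) >= 3 and word not in STOP_WORDS:
            out.add(stem(word))
    return out

memory = "user prefers Python for scripting"
response = "Since the user preferred Python, the script was written in it."
mem_tokens = tokenize(memory)
resp_tokens = tokenize(response)
fraction = len(mem_tokens & resp_tokens) / len(mem_tokens)
print(stem("preferred"), stem("scripting"), fraction)  # prefer script 1.0
```

All four memory stems (`user`, `prefer`, `python`, `script`) survive the paraphrase, so the short-memory path fires at fraction 1.0 where a plain substring match on "prefers Python" would have missed.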
478
src/atocore/memory/service.py
Normal file
@@ -0,0 +1,478 @@
|
||||
"""Memory Core — structured memory management.
|
||||
|
||||
Memory types (per Master Plan):
|
||||
- identity: who the user is, role, background
|
||||
- preference: how they like to work, style, tools
|
||||
- project: project-specific knowledge and context
|
||||
- episodic: what happened, conversations, events
|
||||
- knowledge: verified facts, technical knowledge
|
||||
- adaptation: learned corrections, behavioral adjustments
|
||||
|
||||
Memories have:
|
||||
- confidence (0.0–1.0): how certain we are
|
||||
- status: lifecycle state, one of MEMORY_STATUSES
|
||||
* candidate: extracted from an interaction, awaiting human review
|
||||
(Phase 9 Commit C). Candidates are NEVER included in
|
||||
context packs.
|
||||
* active: promoted/curated, visible to retrieval and context
|
||||
* superseded: replaced by a newer entry
|
||||
* invalid: rejected / error-corrected
|
||||
- last_referenced_at / reference_count: reinforcement signal
|
||||
(Phase 9 Commit B). Bumped whenever a captured interaction's
|
||||
response content echoes this memory.
|
||||
- optional link to source chunk: traceability
|
||||
"""
|
||||
|
||||
import uuid
|
||||
from dataclasses import dataclass
|
||||
from datetime import datetime, timezone
|
||||
|
||||
from atocore.models.database import get_connection
|
||||
from atocore.observability.logger import get_logger
|
||||
from atocore.projects.registry import resolve_project_name
|
||||
|
||||
log = get_logger("memory")
|
||||
|
||||
MEMORY_TYPES = [
|
||||
"identity",
|
||||
"preference",
|
||||
"project",
|
||||
"episodic",
|
||||
"knowledge",
|
||||
"adaptation",
|
||||
]
|
||||
|
||||
MEMORY_STATUSES = [
|
||||
"candidate",
|
||||
"active",
|
||||
"superseded",
|
||||
"invalid",
|
||||
]
|
||||
|
||||
|
||||
@dataclass
|
||||
class Memory:
|
||||
id: str
|
||||
memory_type: str
|
||||
content: str
|
||||
project: str
|
||||
source_chunk_id: str
|
||||
confidence: float
|
||||
status: str
|
||||
created_at: str
|
||||
updated_at: str
|
||||
last_referenced_at: str = ""
|
||||
reference_count: int = 0
|
||||
|
||||
|
||||
def create_memory(
|
||||
memory_type: str,
|
||||
content: str,
|
||||
project: str = "",
|
||||
source_chunk_id: str = "",
|
||||
confidence: float = 1.0,
|
||||
status: str = "active",
|
||||
) -> Memory:
|
||||
"""Create a new memory entry.
|
||||
|
||||
``status`` defaults to ``active`` for backward compatibility. Pass
|
||||
``candidate`` when the memory is being proposed by the Phase 9 Commit C
|
||||
extractor and still needs human review before it can influence context.
|
||||
"""
|
||||
if memory_type not in MEMORY_TYPES:
|
||||
raise ValueError(f"Invalid memory type '{memory_type}'. Must be one of: {MEMORY_TYPES}")
|
||||
if status not in MEMORY_STATUSES:
|
||||
raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
|
||||
_validate_confidence(confidence)
|
||||
|
||||
# Canonicalize the project through the registry so an alias and
|
||||
# the canonical id store under the same bucket. This keeps
|
||||
# reinforcement queries (which use the interaction's project) and
|
||||
# context retrieval (which uses the registry-canonicalized hint)
|
||||
# consistent with how memories are created.
|
||||
project = resolve_project_name(project)
|
||||
|
||||
memory_id = str(uuid.uuid4())
|
||||
now = datetime.now(timezone.utc).isoformat()
|
||||
|
||||
# Check for duplicate content within the same type+project at the same status.
|
||||
# Scoping by status keeps active curation separate from the candidate
|
||||
# review queue: a candidate and an active memory with identical text can
|
||||
# legitimately coexist if the candidate is a fresh extraction of something
|
||||
# already curated.
|
||||
with get_connection() as conn:
|
||||
existing = conn.execute(
|
||||
"SELECT id FROM memories "
|
||||
"WHERE memory_type = ? AND content = ? AND project = ? AND status = ?",
|
||||
(memory_type, content, project, status),
|
||||
).fetchone()
|
||||
if existing:
|
||||
log.info(
|
||||
"memory_duplicate_skipped",
|
||||
memory_type=memory_type,
|
||||
status=status,
|
||||
content_preview=content[:80],
|
||||
)
|
||||
return _row_to_memory(
|
||||
conn.execute("SELECT * FROM memories WHERE id = ?", (existing["id"],)).fetchone()
|
||||
)
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO memories (id, memory_type, content, project, source_chunk_id, confidence, status) "
|
||||
"VALUES (?, ?, ?, ?, ?, ?, ?)",
|
||||
(memory_id, memory_type, content, project, source_chunk_id or None, confidence, status),
|
||||
)
|
||||
|
||||
log.info(
|
||||
"memory_created",
|
||||
memory_type=memory_type,
|
||||
status=status,
|
||||
content_preview=content[:80],
|
||||
)
|
||||
|
||||
return Memory(
|
||||
id=memory_id,
|
||||
memory_type=memory_type,
|
||||
content=content,
|
||||
project=project,
|
||||
source_chunk_id=source_chunk_id,
|
||||
confidence=confidence,
|
||||
status=status,
|
||||
created_at=now,
|
||||
updated_at=now,
|
||||
last_referenced_at="",
|
||||
reference_count=0,
|
||||
)
|
||||
|
||||
|
||||
def get_memories(
|
||||
memory_type: str | None = None,
|
||||
project: str | None = None,
|
||||
active_only: bool = True,
|
||||
min_confidence: float = 0.0,
|
||||
limit: int = 50,
|
||||
status: str | None = None,
|
||||
) -> list[Memory]:
|
||||
"""Retrieve memories, optionally filtered.
|
||||
|
||||
When ``status`` is provided explicitly, it takes precedence over
|
||||
``active_only`` so callers can list the candidate review queue via
|
||||
``get_memories(status='candidate')``. When ``status`` is omitted the
|
||||
legacy ``active_only`` behaviour still applies.
|
||||
"""
|
||||
if status is not None and status not in MEMORY_STATUSES:
|
||||
raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
|
||||
|
||||
query = "SELECT * FROM memories WHERE 1=1"
|
||||
params: list = []
|
||||
|
||||
if memory_type:
|
||||
query += " AND memory_type = ?"
|
||||
params.append(memory_type)
|
||||
if project is not None:
|
||||
# Canonicalize on the read side so a caller passing an alias
|
||||
# finds rows that were stored under the canonical id (and
|
||||
# vice versa). resolve_project_name returns the input
|
||||
# unchanged for unregistered names so empty-string queries
|
||||
# for "no project scope" still work.
|
||||
query += " AND project = ?"
|
||||
params.append(resolve_project_name(project))
|
||||
if status is not None:
|
||||
query += " AND status = ?"
|
||||
params.append(status)
|
||||
elif active_only:
|
||||
query += " AND status = 'active'"
|
||||
if min_confidence > 0:
|
||||
query += " AND confidence >= ?"
|
||||
params.append(min_confidence)
|
||||
|
||||
query += " ORDER BY confidence DESC, updated_at DESC LIMIT ?"
|
||||
params.append(limit)
|
||||
|
||||
with get_connection() as conn:
|
||||
rows = conn.execute(query, params).fetchall()
|
||||
|
||||
return [_row_to_memory(r) for r in rows]
|
||||
|
||||
|
||||
def update_memory(
|
||||
memory_id: str,
|
||||
content: str | None = None,
|
||||
confidence: float | None = None,
|
||||
status: str | None = None,
|
||||
) -> bool:
|
||||
"""Update an existing memory."""
|
||||
with get_connection() as conn:
|
||||
existing = conn.execute("SELECT * FROM memories WHERE id = ?", (memory_id,)).fetchone()
|
||||
if existing is None:
|
||||
return False
|
||||
|
||||
next_content = content if content is not None else existing["content"]
|
||||
next_status = status if status is not None else existing["status"]
|
||||
if confidence is not None:
|
||||
_validate_confidence(confidence)
|
||||
|
||||
if next_status == "active":
|
||||
duplicate = conn.execute(
|
||||
"SELECT id FROM memories "
|
||||
"WHERE memory_type = ? AND content = ? AND project = ? AND status = 'active' AND id != ?",
|
||||
(existing["memory_type"], next_content, existing["project"] or "", memory_id),
|
||||
).fetchone()
|
||||
if duplicate:
|
||||
raise ValueError("Update would create a duplicate active memory")
|
||||
|
||||
updates = []
|
||||
params: list = []
|
||||
|
||||
if content is not None:
|
||||
updates.append("content = ?")
|
||||
params.append(content)
|
||||
if confidence is not None:
|
||||
updates.append("confidence = ?")
|
||||
params.append(confidence)
|
||||
if status is not None:
|
||||
if status not in MEMORY_STATUSES:
|
||||
raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
|
||||
updates.append("status = ?")
|
||||
params.append(status)
|
||||
|
||||
if not updates:
|
||||
return False
|
||||
|
||||
updates.append("updated_at = CURRENT_TIMESTAMP")
|
||||
params.append(memory_id)
|
||||
|
||||
result = conn.execute(
|
||||
f"UPDATE memories SET {', '.join(updates)} WHERE id = ?",
|
||||
params,
|
||||
)
|
||||
|
||||
if result.rowcount > 0:
|
||||
log.info("memory_updated", memory_id=memory_id)
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def invalidate_memory(memory_id: str) -> bool:
|
||||
"""Mark a memory as invalid (error correction)."""
|
||||
return update_memory(memory_id, status="invalid")
|
||||
|
||||
|
||||
def supersede_memory(memory_id: str) -> bool:
|
||||
"""Mark a memory as superseded (replaced by newer info)."""
|
||||
return update_memory(memory_id, status="superseded")
|
||||
|
||||
|
||||
def promote_memory(memory_id: str) -> bool:
|
||||
"""Promote a candidate memory to active (Phase 9 Commit C review queue).
|
||||
|
||||
Returns False if the memory does not exist or is not currently a
|
||||
candidate. Raises ValueError only if the promotion would create a
|
||||
duplicate active memory (delegates to update_memory's existing check).
|
||||
"""
|
||||
with get_connection() as conn:
|
||||
row = conn.execute(
|
||||
"SELECT status FROM memories WHERE id = ?", (memory_id,)
|
||||
).fetchone()
|
||||
if row is None:
|
||||
return False
|
||||
if row["status"] != "candidate":
|
||||
return False
|
||||
return update_memory(memory_id, status="active")
|
||||
|
||||
|
||||
def reject_candidate_memory(memory_id: str) -> bool:
|
||||
"""Reject a candidate memory (Phase 9 Commit C).
|
||||
|
||||
Sets the candidate's status to ``invalid`` so it drops out of the
|
||||
review queue without polluting the active set. Returns False if the
|
||||
memory does not exist or is not currently a candidate.
|
||||
"""
|
||||
with get_connection() as conn:
|
||||
row = conn.execute(
|
||||
"SELECT status FROM memories WHERE id = ?", (memory_id,)
|
||||
).fetchone()
|
||||
if row is None:
|
||||
return False
|
||||
if row["status"] != "candidate":
|
||||
return False
|
||||
return update_memory(memory_id, status="invalid")
|
||||
|
||||
|
||||
def reinforce_memory(
|
||||
memory_id: str,
|
||||
confidence_delta: float = 0.02,
|
||||
) -> tuple[bool, float, float]:
|
||||
"""Bump a memory's confidence and reference count (Phase 9 Commit B).
|
||||
|
||||
Returns a 3-tuple ``(applied, old_confidence, new_confidence)``.
|
||||
``applied`` is False if the memory does not exist or is not in the
|
||||
``active`` state — reinforcement only touches live memories so the
|
||||
candidate queue and invalidated history are never silently revived.
|
||||
|
||||
Confidence is capped at 1.0. last_referenced_at is set to the current
|
||||
UTC time in SQLite-comparable format. reference_count is incremented
|
||||
by one per call (not per delta amount).
|
||||
"""
|
||||
if confidence_delta < 0:
|
||||
raise ValueError("confidence_delta must be non-negative for reinforcement")
|
||||
now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
|
||||
with get_connection() as conn:
|
||||
row = conn.execute(
|
||||
"SELECT confidence, status FROM memories WHERE id = ?", (memory_id,)
|
||||
).fetchone()
|
||||
if row is None or row["status"] != "active":
|
||||
return False, 0.0, 0.0
|
||||
old_confidence = float(row["confidence"])
|
||||
new_confidence = min(1.0, old_confidence + confidence_delta)
|
||||
conn.execute(
|
||||
"UPDATE memories SET confidence = ?, last_referenced_at = ?, "
|
||||
"reference_count = COALESCE(reference_count, 0) + 1 "
|
||||
"WHERE id = ?",
|
||||
(new_confidence, now, memory_id),
|
||||
)
|
||||
log.info(
|
||||
"memory_reinforced",
|
||||
memory_id=memory_id,
|
||||
old_confidence=round(old_confidence, 4),
|
||||
new_confidence=round(new_confidence, 4),
|
||||
)
|
||||
return True, old_confidence, new_confidence
|
||||
|
||||
|
||||
def get_memories_for_context(
|
||||
memory_types: list[str] | None = None,
|
||||
project: str | None = None,
|
||||
budget: int = 500,
|
||||
header: str = "--- AtoCore Memory ---",
|
||||
footer: str = "--- End Memory ---",
|
||||
query: str | None = None,
|
||||
) -> tuple[str, int]:
|
||||
"""Get formatted memories for context injection.
|
||||
|
||||
Returns (formatted_text, char_count).
|
||||
|
||||
Budget allocation per Master Plan section 9:
|
||||
identity: 5%, preference: 5%, rest from retrieval budget
|
||||
|
||||
The caller can override ``header`` / ``footer`` to distinguish
|
||||
multiple memory blocks in the same pack (e.g. identity/preference
|
||||
vs project/knowledge memories).
|
||||
|
||||
When ``query`` is provided, candidates within each memory type
|
||||
are ranked by lexical overlap against the query (stemmed token
|
||||
intersection, ties broken by confidence). Without a query,
|
||||
candidates fall through in the order ``get_memories`` returns
|
||||
them — which is effectively "by confidence desc".
|
||||
"""
|
||||
if memory_types is None:
|
||||
memory_types = ["identity", "preference"]
|
||||
|
||||
if budget <= 0:
|
||||
return "", 0
|
||||
wrapper_chars = len(header) + len(footer) + 2
|
||||
if budget <= wrapper_chars:
|
||||
return "", 0
|
||||
|
||||
available = budget - wrapper_chars
|
||||
selected_entries: list[str] = []
|
||||
used = 0
|
||||
|
||||
# Pre-tokenize the query once. ``_score_memory_for_query`` is a
|
||||
# free function below that reuses the reinforcement tokenizer so
|
||||
# lexical scoring here matches the reinforcement matcher.
|
||||
query_tokens: set[str] | None = None
|
||||
if query:
|
||||
from atocore.memory.reinforcement import _normalize, _tokenize
|
||||
|
||||
query_tokens = _tokenize(_normalize(query))
|
||||
if not query_tokens:
|
||||
query_tokens = None
|
||||
|
||||
# Collect ALL candidates across the requested types into one
|
||||
# pool, then rank globally before the budget walk. Ranking per
|
||||
# type and walking types in order would starve later types when
|
||||
# the first type's candidates filled the budget — even if a
|
||||
# later-type candidate matched the query perfectly. Type order
|
||||
# is preserved as a stable tiebreaker inside
|
||||
# ``_rank_memories_for_query`` via Python's stable sort.
|
||||
pool: list[Memory] = []
|
||||
seen_ids: set[str] = set()
|
||||
for mtype in memory_types:
|
||||
for mem in get_memories(
|
||||
memory_type=mtype,
|
||||
project=project,
|
||||
min_confidence=0.5,
|
||||
limit=30,
|
||||
):
|
||||
if mem.id in seen_ids:
|
||||
continue
|
||||
seen_ids.add(mem.id)
|
||||
pool.append(mem)
|
||||
|
||||
if query_tokens is not None:
|
||||
pool = _rank_memories_for_query(pool, query_tokens)
|
||||
|
||||
for mem in pool:
|
||||
entry = f"[{mem.memory_type}] {mem.content}"
|
||||
entry_len = len(entry) + 1
|
||||
if entry_len > available - used:
|
||||
continue
|
||||
selected_entries.append(entry)
|
||||
used += entry_len
|
||||
|
||||
if not selected_entries:
|
||||
return "", 0
|
||||
|
||||
lines = [header, *selected_entries, footer]
|
||||
text = "\n".join(lines)
|
||||
|
||||
log.info("memories_for_context", count=len(selected_entries), chars=len(text))
|
||||
return text, len(text)
|
||||
|
||||
|
||||
def _rank_memories_for_query(
    memories: list["Memory"],
    query_tokens: set[str],
) -> list["Memory"]:
    """Rerank a memory list by lexical overlap with a pre-tokenized query.

    Ordering key: (overlap_count DESC, confidence DESC). When a query
    shares no tokens with a memory, overlap is zero and confidence
    acts as the sole tiebreaker — which matches the pre-query
    behaviour and keeps no-query calls stable.
    """
    from atocore.memory.reinforcement import _normalize, _tokenize

    scored: list[tuple[int, float, Memory]] = []
    for mem in memories:
        mem_tokens = _tokenize(_normalize(mem.content))
        overlap = len(mem_tokens & query_tokens) if mem_tokens else 0
        scored.append((overlap, mem.confidence, mem))
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [mem for _, _, mem in scored]

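The ranking above relies on Python's sort being stable even with ``reverse=True``: equal (overlap, confidence) keys keep their original order, which is how type order survives as a tiebreaker. A self-contained sketch of the same key, using bare (token-set, confidence) pairs instead of ``Memory`` objects:

```python
def rank_by_overlap(
    items: list[tuple[set[str], float]],
    query: set[str],
) -> list[tuple[set[str], float]]:
    """Order items by (token overlap DESC, confidence DESC).

    Python's sort is stable under reverse=True, so items with equal
    keys retain their original relative order.
    """
    scored = [
        (len(tokens & query), conf, i)
        for i, (tokens, conf) in enumerate(items)
    ]
    # Sort only on (overlap, confidence); the index is carried, not compared.
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [items[i] for _, _, i in scored]
```

With no overlap anywhere, confidence alone decides, matching the documented pre-query behaviour.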
def _row_to_memory(row) -> Memory:
    """Convert a DB row to Memory dataclass."""
    keys = row.keys() if hasattr(row, "keys") else []
    last_ref = row["last_referenced_at"] if "last_referenced_at" in keys else None
    ref_count = row["reference_count"] if "reference_count" in keys else 0
    return Memory(
        id=row["id"],
        memory_type=row["memory_type"],
        content=row["content"],
        project=row["project"] or "",
        source_chunk_id=row["source_chunk_id"] or "",
        confidence=row["confidence"],
        status=row["status"],
        created_at=row["created_at"],
        updated_at=row["updated_at"],
        last_referenced_at=last_ref or "",
        reference_count=int(ref_count or 0),
    )


def _validate_confidence(confidence: float) -> None:
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("Confidence must be between 0.0 and 1.0")

src/atocore/models/__init__.py  (new file, +0)

src/atocore/models/database.py  (new file, +175)
@@ -0,0 +1,175 @@
"""SQLite database schema and connection management."""

import sqlite3
from contextlib import contextmanager
from pathlib import Path
from typing import Generator

import atocore.config as _config
from atocore.observability.logger import get_logger

log = get_logger("database")

SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS source_documents (
    id TEXT PRIMARY KEY,
    file_path TEXT UNIQUE NOT NULL,
    file_hash TEXT NOT NULL,
    title TEXT,
    doc_type TEXT DEFAULT 'markdown',
    tags TEXT DEFAULT '[]',
    ingested_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS source_chunks (
    id TEXT PRIMARY KEY,
    document_id TEXT NOT NULL REFERENCES source_documents(id) ON DELETE CASCADE,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    heading_path TEXT DEFAULT '',
    char_count INTEGER NOT NULL,
    metadata TEXT DEFAULT '{}',
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS memories (
    id TEXT PRIMARY KEY,
    memory_type TEXT NOT NULL,
    content TEXT NOT NULL,
    project TEXT DEFAULT '',
    source_chunk_id TEXT REFERENCES source_chunks(id),
    confidence REAL DEFAULT 1.0,
    status TEXT DEFAULT 'active',
    last_referenced_at DATETIME,
    reference_count INTEGER DEFAULT 0,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS projects (
    id TEXT PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    description TEXT DEFAULT '',
    status TEXT DEFAULT 'active',
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS interactions (
    id TEXT PRIMARY KEY,
    prompt TEXT NOT NULL,
    context_pack TEXT DEFAULT '{}',
    response_summary TEXT DEFAULT '',
    response TEXT DEFAULT '',
    memories_used TEXT DEFAULT '[]',
    chunks_used TEXT DEFAULT '[]',
    client TEXT DEFAULT '',
    session_id TEXT DEFAULT '',
    project TEXT DEFAULT '',
    project_id TEXT REFERENCES projects(id),
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Indexes that reference columns guaranteed to exist since the first
-- release ship here. Indexes that reference columns added by later
-- migrations (memories.project, interactions.project,
-- interactions.session_id) are created inside _apply_migrations AFTER
-- the corresponding ALTER TABLE, NOT here. Creating them here would
-- fail on upgrade from a pre-migration schema because CREATE TABLE
-- IF NOT EXISTS is a no-op on an existing table, so the new columns
-- wouldn't be added before the CREATE INDEX runs.
CREATE INDEX IF NOT EXISTS idx_chunks_document ON source_chunks(document_id);
CREATE INDEX IF NOT EXISTS idx_memories_type ON memories(memory_type);
CREATE INDEX IF NOT EXISTS idx_memories_status ON memories(status);
CREATE INDEX IF NOT EXISTS idx_interactions_project ON interactions(project_id);
"""


def _ensure_data_dir() -> None:
    _config.ensure_runtime_dirs()


def init_db() -> None:
    """Initialize the database with schema."""
    _ensure_data_dir()
    with get_connection() as conn:
        conn.executescript(SCHEMA_SQL)
        _apply_migrations(conn)
    log.info("database_initialized", path=str(_config.settings.db_path))

def _apply_migrations(conn: sqlite3.Connection) -> None:
    """Apply lightweight schema migrations for existing local databases."""
    if not _column_exists(conn, "memories", "project"):
        conn.execute("ALTER TABLE memories ADD COLUMN project TEXT DEFAULT ''")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project)")

    # Phase 9 Commit B: reinforcement columns.
    # last_referenced_at records when a memory was most recently referenced
    # in a captured interaction; reference_count is a monotonically
    # increasing counter bumped on every reference. Together they let
    # Reflection (Commit C) and decay (deferred) reason about which
    # memories are actually being used versus which have gone cold.
    if not _column_exists(conn, "memories", "last_referenced_at"):
        conn.execute("ALTER TABLE memories ADD COLUMN last_referenced_at DATETIME")
    if not _column_exists(conn, "memories", "reference_count"):
        conn.execute("ALTER TABLE memories ADD COLUMN reference_count INTEGER DEFAULT 0")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_memories_last_referenced ON memories(last_referenced_at)"
    )

    # Phase 9 Commit A: capture loop columns on the interactions table.
    # The original schema only carried prompt + project_id + a context_pack
    # JSON blob. To make interactions a real audit trail of what AtoCore fed
    # the LLM and what came back, we record the full response, the chunk
    # and memory ids that were actually used, plus client + session metadata.
    if not _column_exists(conn, "interactions", "response"):
        conn.execute("ALTER TABLE interactions ADD COLUMN response TEXT DEFAULT ''")
    if not _column_exists(conn, "interactions", "memories_used"):
        conn.execute("ALTER TABLE interactions ADD COLUMN memories_used TEXT DEFAULT '[]'")
    if not _column_exists(conn, "interactions", "chunks_used"):
        conn.execute("ALTER TABLE interactions ADD COLUMN chunks_used TEXT DEFAULT '[]'")
    if not _column_exists(conn, "interactions", "client"):
        conn.execute("ALTER TABLE interactions ADD COLUMN client TEXT DEFAULT ''")
    if not _column_exists(conn, "interactions", "session_id"):
        conn.execute("ALTER TABLE interactions ADD COLUMN session_id TEXT DEFAULT ''")
    if not _column_exists(conn, "interactions", "project"):
        conn.execute("ALTER TABLE interactions ADD COLUMN project TEXT DEFAULT ''")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_interactions_created_at ON interactions(created_at)"
    )


def _column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row["name"] == column for row in rows)

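The migration pattern above (probe with ``PRAGMA table_info``, then ``ALTER TABLE`` only when the column is absent) can be exercised in isolation. A minimal sketch with hypothetical helper names; note it reads the column name by positional index 1, whereas the real module uses ``row["name"]`` because it sets ``row_factory = sqlite3.Row``:

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info yields one row per column; index 1 is the name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def add_column_once(conn: sqlite3.Connection, table: str, column: str, ddl: str) -> bool:
    """Idempotent ALTER TABLE guard; returns True only when it migrated."""
    if column_exists(conn, table, column):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    return True
```

Running ``add_column_once`` twice is safe: the second call sees the column and becomes a no-op, which is exactly why ``_apply_migrations`` can run on every startup.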
@contextmanager
def get_connection() -> Generator[sqlite3.Connection, None, None]:
    """Get a database connection with row factory."""
    _ensure_data_dir()
    conn = sqlite3.connect(
        str(_config.settings.db_path),
        timeout=_config.settings.db_busy_timeout_ms / 1000,
    )
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(f"PRAGMA busy_timeout = {_config.settings.db_busy_timeout_ms}")
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA synchronous = NORMAL")
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
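The commit-on-success / rollback-on-exception shape of ``get_connection`` can be sketched without the AtoCore config plumbing (``tx`` is a hypothetical name for illustration):

```python
import sqlite3
from contextlib import contextmanager
from typing import Generator


@contextmanager
def tx(path: str) -> Generator[sqlite3.Connection, None, None]:
    """Connection as a unit of work: commit if the body succeeds, else roll back."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    try:
        yield conn
        conn.commit()   # reached only when the with-body raised nothing
    except Exception:
        conn.rollback()  # any error undoes the whole unit of work
        raise
    finally:
        conn.close()
```

An exception inside the ``with`` body rolls back everything executed in that block, so callers never see half-applied writes.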
src/atocore/observability/__init__.py  (new file, +0)

src/atocore/observability/logger.py  (new file, +43)
@@ -0,0 +1,43 @@
"""Structured logging for AtoCore."""

import logging

import atocore.config as _config
import structlog

_LOG_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def setup_logging() -> None:
    """Configure structlog with JSON output."""
    log_level = "DEBUG" if _config.settings.debug else "INFO"
    renderer = (
        structlog.dev.ConsoleRenderer()
        if _config.settings.debug
        else structlog.processors.JSONRenderer()
    )

    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.processors.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            renderer,
        ],
        wrapper_class=structlog.make_filtering_bound_logger(
            _LOG_LEVELS.get(log_level, logging.INFO)
        ),
        context_class=dict,
        logger_factory=structlog.PrintLoggerFactory(),
        cache_logger_on_first_use=True,
    )


def get_logger(name: str) -> structlog.BoundLogger:
    """Get a named logger."""
    return structlog.get_logger(name)
src/atocore/ops/__init__.py  (new file, +1)
@@ -0,0 +1 @@
"""Operational utilities for running AtoCore safely."""

src/atocore/ops/backup.py  (new file, +636)
@@ -0,0 +1,636 @@
"""Create safe runtime backups for the AtoCore machine store.

This module is intentionally conservative:

- The SQLite snapshot uses the online ``conn.backup()`` API and is safe to
  call while the database is in use.
- The project registry snapshot is a simple file copy of the canonical
  registry JSON.
- The Chroma snapshot is a *cold* directory copy. To stay safe it must be
  taken while no ingestion is running. The recommended pattern from the API
  layer is to acquire ``exclusive_ingestion()`` for the duration of the
  backup so refreshes and ingestions cannot run concurrently with the copy.

The backup metadata file records what was actually included so restore
tooling does not have to guess.
"""

from __future__ import annotations

import json
import shutil
import sqlite3
from datetime import datetime, UTC
from pathlib import Path

import atocore.config as _config
from atocore.models.database import init_db
from atocore.observability.logger import get_logger

log = get_logger("backup")

def create_runtime_backup(
    timestamp: datetime | None = None,
    include_chroma: bool = False,
) -> dict:
    """Create a hot SQLite backup plus registry/config metadata.

    When ``include_chroma`` is true the Chroma persistence directory is also
    snapshotted as a cold directory copy. The caller is responsible for
    ensuring no ingestion is running concurrently. The HTTP layer enforces
    this by holding ``exclusive_ingestion()`` around the call.
    """
    init_db()
    now = timestamp or datetime.now(UTC)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")

    backup_root = _config.settings.resolved_backup_dir / "snapshots" / stamp
    db_backup_dir = backup_root / "db"
    config_backup_dir = backup_root / "config"
    chroma_backup_dir = backup_root / "chroma"
    metadata_path = backup_root / "backup-metadata.json"

    db_backup_dir.mkdir(parents=True, exist_ok=True)
    config_backup_dir.mkdir(parents=True, exist_ok=True)

    db_snapshot_path = db_backup_dir / _config.settings.db_path.name
    _backup_sqlite_db(_config.settings.db_path, db_snapshot_path)

    registry_snapshot = None
    registry_path = _config.settings.resolved_project_registry_path
    if registry_path.exists():
        registry_snapshot = config_backup_dir / registry_path.name
        registry_snapshot.write_text(
            registry_path.read_text(encoding="utf-8"), encoding="utf-8"
        )

    chroma_snapshot_path = ""
    chroma_files_copied = 0
    chroma_bytes_copied = 0
    if include_chroma:
        source_chroma = _config.settings.chroma_path
        if source_chroma.exists() and source_chroma.is_dir():
            chroma_backup_dir.mkdir(parents=True, exist_ok=True)
            chroma_files_copied, chroma_bytes_copied = _copy_directory_tree(
                source_chroma, chroma_backup_dir
            )
            chroma_snapshot_path = str(chroma_backup_dir)
        else:
            log.info(
                "chroma_snapshot_skipped_missing",
                path=str(source_chroma),
            )

    metadata = {
        "created_at": now.isoformat(),
        "backup_root": str(backup_root),
        "db_snapshot_path": str(db_snapshot_path),
        "db_size_bytes": db_snapshot_path.stat().st_size,
        "registry_snapshot_path": str(registry_snapshot) if registry_snapshot else "",
        "chroma_snapshot_path": chroma_snapshot_path,
        "chroma_snapshot_bytes": chroma_bytes_copied,
        "chroma_snapshot_files": chroma_files_copied,
        "chroma_snapshot_included": include_chroma,
        "vector_store_note": (
            "Chroma snapshot included as cold directory copy."
            if include_chroma and chroma_snapshot_path
            else "Chroma hot backup is not included; rerun with include_chroma=True under exclusive_ingestion()."
        ),
    }
    metadata_path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=True) + "\n",
        encoding="utf-8",
    )

    # Automatic post-backup validation. Failures log a warning but do
    # not raise — the backup files are still on disk and may be useful.
    validation = validate_backup(stamp)
    validated = validation.get("valid", False)
    validation_errors = validation.get("errors", [])
    if not validated:
        log.warning(
            "post_backup_validation_failed",
            backup_root=str(backup_root),
            errors=validation_errors,
        )
    metadata["validated"] = validated
    metadata["validation_errors"] = validation_errors

    log.info(
        "runtime_backup_created",
        backup_root=str(backup_root),
        db_snapshot=str(db_snapshot_path),
        chroma_included=include_chroma,
        chroma_bytes=chroma_bytes_copied,
        validated=validated,
    )
    return metadata

def list_runtime_backups() -> list[dict]:
    """List all runtime backups under the configured backup directory."""
    snapshots_root = _config.settings.resolved_backup_dir / "snapshots"
    if not snapshots_root.exists() or not snapshots_root.is_dir():
        return []

    entries: list[dict] = []
    for snapshot_dir in sorted(snapshots_root.iterdir()):
        if not snapshot_dir.is_dir():
            continue
        metadata_path = snapshot_dir / "backup-metadata.json"
        entry: dict = {
            "stamp": snapshot_dir.name,
            "path": str(snapshot_dir),
            "has_metadata": metadata_path.exists(),
        }
        if metadata_path.exists():
            try:
                entry["metadata"] = json.loads(metadata_path.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                entry["metadata"] = None
                entry["metadata_error"] = "invalid_json"
        entries.append(entry)
    return entries

def validate_backup(stamp: str) -> dict:
    """Validate that a previously created backup is structurally usable.

    Checks:
    - the snapshot directory exists
    - the SQLite snapshot is openable and ``PRAGMA integrity_check`` returns ok
    - the registry snapshot, if recorded, parses as JSON
    - the chroma snapshot directory, if recorded, exists
    """
    snapshot_dir = _config.settings.resolved_backup_dir / "snapshots" / stamp
    result: dict = {
        "stamp": stamp,
        "path": str(snapshot_dir),
        "exists": snapshot_dir.exists(),
        "db_ok": False,
        "registry_ok": None,
        "chroma_ok": None,
        "errors": [],
    }
    if not snapshot_dir.exists():
        result["errors"].append("snapshot_directory_missing")
        return result

    metadata_path = snapshot_dir / "backup-metadata.json"
    if not metadata_path.exists():
        result["errors"].append("metadata_missing")
        return result

    try:
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        result["errors"].append(f"metadata_invalid_json: {exc}")
        return result
    result["metadata"] = metadata

    db_path = Path(metadata.get("db_snapshot_path", ""))
    if not db_path.exists():
        result["errors"].append("db_snapshot_missing")
    else:
        try:
            with sqlite3.connect(str(db_path)) as conn:
                row = conn.execute("PRAGMA integrity_check").fetchone()
                result["db_ok"] = bool(row and row[0] == "ok")
                if not result["db_ok"]:
                    result["errors"].append(
                        f"db_integrity_check_failed: {row[0] if row else 'no_row'}"
                    )
        except sqlite3.DatabaseError as exc:
            result["errors"].append(f"db_open_failed: {exc}")

    registry_snapshot_path = metadata.get("registry_snapshot_path", "")
    if registry_snapshot_path:
        registry_path = Path(registry_snapshot_path)
        if not registry_path.exists():
            result["registry_ok"] = False
            result["errors"].append("registry_snapshot_missing")
        else:
            try:
                json.loads(registry_path.read_text(encoding="utf-8"))
                result["registry_ok"] = True
            except json.JSONDecodeError as exc:
                result["registry_ok"] = False
                result["errors"].append(f"registry_invalid_json: {exc}")

    chroma_snapshot_path = metadata.get("chroma_snapshot_path", "")
    if chroma_snapshot_path:
        chroma_dir = Path(chroma_snapshot_path)
        if chroma_dir.exists() and chroma_dir.is_dir():
            result["chroma_ok"] = True
        else:
            result["chroma_ok"] = False
            result["errors"].append("chroma_snapshot_missing")

    result["valid"] = not result["errors"]
    return result

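The core of the db check above is ``PRAGMA integrity_check``. A minimal standalone sketch (``db_ok`` is a hypothetical name): it returns True only when SQLite reports a single "ok" row, and treats an unopenable file as invalid rather than raising.

```python
import sqlite3


def db_ok(path: str) -> bool:
    """True iff the file opens as SQLite and passes integrity_check."""
    try:
        conn = sqlite3.connect(path)
        try:
            # integrity_check returns one row, "ok", on a healthy database.
            row = conn.execute("PRAGMA integrity_check").fetchone()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # connect() is lazy; a non-database file only fails on first read.
        return False
    return bool(row and row[0] == "ok")
```

Note the lazy-open behaviour: ``sqlite3.connect`` on a garbage file succeeds, and the error only surfaces on the first query, which is why the ``try`` wraps the execute and not just the connect.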
def restore_runtime_backup(
    stamp: str,
    *,
    include_chroma: bool | None = None,
    pre_restore_snapshot: bool = True,
    confirm_service_stopped: bool = False,
) -> dict:
    """Restore a previously captured runtime backup.

    CRITICAL: the AtoCore service MUST be stopped before calling this.
    Overwriting a live SQLite database corrupts state and can break
    the running container's open connections. The caller must pass
    ``confirm_service_stopped=True`` as an explicit acknowledgment —
    otherwise this function refuses to run.

    The restore procedure:

    1. Validate the backup via ``validate_backup``; refuse on any error.
    2. (default) Create a pre-restore safety snapshot of the CURRENT
       state so the restore itself is reversible. The snapshot stamp
       is returned in the result for the operator to record.
    3. Remove stale SQLite WAL/SHM sidecar files next to the target db
       before copying — the snapshot is a self-contained main-file
       image from ``conn.backup()``, and leftover WAL/SHM from the old
       live db would desync against the restored main file.
    4. Copy the snapshot db over the target db path.
    5. Restore the project registry file if the snapshot captured one.
    6. Restore the Chroma directory if ``include_chroma`` resolves to
       true. When ``include_chroma is None`` the function defers to
       whether the snapshot captured Chroma (the common case).
    7. Run ``PRAGMA integrity_check`` on the restored db and report
       the result.

    Returns a dict describing what was restored. On refused restore
    (service still running, validation failed) raises ``RuntimeError``.
    """
    if not confirm_service_stopped:
        raise RuntimeError(
            "restore_runtime_backup refuses to run without "
            "confirm_service_stopped=True — stop the AtoCore container "
            "first (e.g. `docker compose down` from deploy/dalidou) "
            "before calling this function"
        )

    validation = validate_backup(stamp)
    if not validation.get("valid"):
        raise RuntimeError(
            f"backup {stamp} failed validation: {validation.get('errors')}"
        )
    metadata = validation.get("metadata") or {}

    pre_snapshot_stamp: str | None = None
    if pre_restore_snapshot:
        pre = create_runtime_backup(include_chroma=False)
        pre_snapshot_stamp = Path(pre["backup_root"]).name

    target_db = _config.settings.db_path
    source_db = Path(metadata.get("db_snapshot_path", ""))
    if not source_db.exists():
        raise RuntimeError(
            f"db snapshot not found at {source_db} — backup "
            f"metadata may be stale"
        )

    # Force sqlite to flush any lingering WAL into the main file and
    # release OS-level file handles on -wal/-shm before we swap the
    # main file. Passing through conn.backup() in the pre-restore
    # snapshot can leave sidecars momentarily locked on Windows;
    # an explicit checkpoint(TRUNCATE) is the reliable way to flush
    # and release. Best-effort: if the target db can't be opened
    # (missing, corrupt), fall through and trust the copy step.
    if target_db.exists():
        try:
            with sqlite3.connect(str(target_db)) as checkpoint_conn:
                checkpoint_conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        except sqlite3.DatabaseError as exc:
            log.warning(
                "restore_pre_checkpoint_failed",
                target_db=str(target_db),
                error=str(exc),
            )

    # Remove stale WAL/SHM sidecars from the old live db so SQLite
    # can't read inconsistent state on next open. Tolerant to
    # Windows file-lock races — the subsequent copy replaces the
    # main file anyway, and the integrity check afterward is the
    # actual correctness signal.
    wal_path = target_db.with_name(target_db.name + "-wal")
    shm_path = target_db.with_name(target_db.name + "-shm")
    for stale in (wal_path, shm_path):
        if stale.exists():
            try:
                stale.unlink()
            except OSError as exc:
                log.warning(
                    "restore_sidecar_unlink_failed",
                    path=str(stale),
                    error=str(exc),
                )

    target_db.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source_db, target_db)

    registry_restored = False
    registry_snapshot_path = metadata.get("registry_snapshot_path", "")
    if registry_snapshot_path:
        src_reg = Path(registry_snapshot_path)
        if src_reg.exists():
            dst_reg = _config.settings.resolved_project_registry_path
            dst_reg.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_reg, dst_reg)
            registry_restored = True

    chroma_snapshot_path = metadata.get("chroma_snapshot_path", "")
    if include_chroma is None:
        include_chroma = bool(chroma_snapshot_path)
    chroma_restored = False
    if include_chroma and chroma_snapshot_path:
        src_chroma = Path(chroma_snapshot_path)
        if src_chroma.exists() and src_chroma.is_dir():
            dst_chroma = _config.settings.chroma_path
            # Do NOT rmtree the destination itself: in a Dockerized
            # deployment the chroma dir is a bind-mounted volume, and
            # unlinking a mount point raises
            # OSError [Errno 16] Device or resource busy.
            # Instead, clear the directory's CONTENTS and copytree into
            # it with dirs_exist_ok=True. This is equivalent to an
            # rmtree+copytree for restore purposes but stays inside the
            # mount boundary. Discovered during the first real restore
            # drill on Dalidou (2026-04-09).
            dst_chroma.mkdir(parents=True, exist_ok=True)
            for item in dst_chroma.iterdir():
                if item.is_dir() and not item.is_symlink():
                    shutil.rmtree(item)
                else:
                    item.unlink()
            shutil.copytree(src_chroma, dst_chroma, dirs_exist_ok=True)
            chroma_restored = True

    restored_integrity_ok = False
    integrity_error: str | None = None
    try:
        with sqlite3.connect(str(target_db)) as conn:
            row = conn.execute("PRAGMA integrity_check").fetchone()
            restored_integrity_ok = bool(row and row[0] == "ok")
            if not restored_integrity_ok:
                integrity_error = row[0] if row else "no_row"
    except sqlite3.DatabaseError as exc:
        integrity_error = f"db_open_failed: {exc}"

    result: dict = {
        "stamp": stamp,
        "pre_restore_snapshot": pre_snapshot_stamp,
        "target_db": str(target_db),
        "db_restored": True,
        "registry_restored": registry_restored,
        "chroma_restored": chroma_restored,
        "restored_integrity_ok": restored_integrity_ok,
    }
    if integrity_error:
        result["integrity_error"] = integrity_error

    log.info(
        "runtime_backup_restored",
        stamp=stamp,
        pre_restore_snapshot=pre_snapshot_stamp,
        registry_restored=registry_restored,
        chroma_restored=chroma_restored,
        integrity_ok=restored_integrity_ok,
    )
    return result

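The checkpoint-then-clear-sidecars step of the restore can be exercised on its own. A minimal sketch (``checkpoint_and_clear_sidecars`` is a hypothetical name for this illustration):

```python
import sqlite3
from pathlib import Path


def checkpoint_and_clear_sidecars(db_path: Path) -> None:
    """Flush pending WAL frames into the main file, then drop stale sidecars.

    Mirrors the pre-copy step in restore_runtime_backup: the snapshot from
    conn.backup() is a self-contained main-file image, so leftover
    -wal/-shm files from the old live db must not survive the swap.
    """
    conn = sqlite3.connect(str(db_path))
    try:
        # TRUNCATE both checkpoints all frames and resets the WAL file.
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
    except sqlite3.DatabaseError:
        pass  # best-effort: a missing/corrupt target is replaced by the copy
    finally:
        conn.close()
    for suffix in ("-wal", "-shm"):
        sidecar = db_path.with_name(db_path.name + suffix)
        try:
            sidecar.unlink(missing_ok=True)
        except OSError:
            pass  # tolerate file-lock races; integrity_check is the real signal
```

After this runs, a fresh connection sees all committed data from the main file alone, with no sidecars on disk.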
def cleanup_old_backups(*, confirm: bool = False) -> dict:
    """Apply retention policy and remove old snapshots.

    Retention keeps:
    - Last 7 daily snapshots (most recent per calendar day)
    - Last 4 weekly snapshots (most recent on each Sunday)
    - Last 6 monthly snapshots (most recent on the 1st of each month)

    All other snapshots are candidates for deletion. Runs as dry-run by
    default; pass ``confirm=True`` to actually delete.

    Returns a dict with kept/deleted counts and any errors.
    """
    snapshots_root = _config.settings.resolved_backup_dir / "snapshots"
    if not snapshots_root.exists() or not snapshots_root.is_dir():
        return {"kept": 0, "deleted": 0, "would_delete": 0, "dry_run": not confirm, "errors": []}

    # Parse all stamp directories into (datetime, dir_path) pairs.
    stamps: list[tuple[datetime, Path]] = []
    unparseable: list[str] = []
    for entry in sorted(snapshots_root.iterdir()):
        if not entry.is_dir():
            continue
        try:
            dt = datetime.strptime(entry.name, "%Y%m%dT%H%M%SZ").replace(tzinfo=UTC)
            stamps.append((dt, entry))
        except ValueError:
            unparseable.append(entry.name)

    if not stamps:
        return {
            "kept": 0, "deleted": 0, "would_delete": 0,
            "dry_run": not confirm, "errors": [],
            "unparseable": unparseable,
        }

    # Sort newest first so "most recent per bucket" is a simple first-seen.
    stamps.sort(key=lambda t: t[0], reverse=True)

    keep_set: set[Path] = set()

    # Last 7 daily: most recent snapshot per calendar day.
    seen_days: set[str] = set()
    for dt, path in stamps:
        day_key = dt.strftime("%Y-%m-%d")
        if day_key not in seen_days:
            seen_days.add(day_key)
            keep_set.add(path)
            if len(seen_days) >= 7:
                break

    # Last 4 weekly: most recent snapshot that falls on a Sunday.
    seen_weeks: set[str] = set()
    for dt, path in stamps:
        if dt.weekday() == 6:  # Sunday
            week_key = dt.strftime("%Y-W%W")
            if week_key not in seen_weeks:
                seen_weeks.add(week_key)
                keep_set.add(path)
                if len(seen_weeks) >= 4:
                    break

    # Last 6 monthly: most recent snapshot on the 1st of a month.
    seen_months: set[str] = set()
    for dt, path in stamps:
        if dt.day == 1:
            month_key = dt.strftime("%Y-%m")
            if month_key not in seen_months:
                seen_months.add(month_key)
                keep_set.add(path)
                if len(seen_months) >= 6:
                    break

    to_delete = [path for _, path in stamps if path not in keep_set]

    errors: list[str] = []
    deleted_count = 0
    if confirm:
        for path in to_delete:
            try:
                shutil.rmtree(path)
                deleted_count += 1
            except OSError as exc:
                errors.append(f"{path.name}: {exc}")

    result: dict = {
        "kept": len(keep_set),
        "dry_run": not confirm,
        "errors": errors,
    }
    if confirm:
        result["deleted"] = deleted_count
    else:
        result["would_delete"] = len(to_delete)
    if unparseable:
        result["unparseable"] = unparseable

    log.info(
        "cleanup_old_backups",
        kept=len(keep_set),
        deleted=deleted_count if confirm else 0,
        would_delete=len(to_delete) if not confirm else 0,
        dry_run=not confirm,
    )
    return result

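Each retention bucket above follows the same shape: walk stamps newest-first and keep the first snapshot seen per bucket key, capped at N buckets. A minimal sketch of the daily bucket in isolation, using naive datetimes for brevity (``daily_keep`` is a hypothetical name):

```python
from datetime import datetime


def daily_keep(stamps: list[datetime], n: int = 7) -> set[datetime]:
    """Most recent snapshot per calendar day, newest-first, capped at n days."""
    seen_days: set[str] = set()
    keep: set[datetime] = set()
    for dt in sorted(stamps, reverse=True):
        day = dt.strftime("%Y-%m-%d")
        if day not in seen_days:
            # First stamp seen for this day is the newest one on that day.
            seen_days.add(day)
            keep.add(dt)
            if len(seen_days) >= n:
                break
    return keep
```

Two snapshots on the same day collapse to the later one; the weekly and monthly passes differ only in the bucket key and an extra filter (Sunday, 1st of month).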
def _backup_sqlite_db(source_path: Path, dest_path: Path) -> None:
    source_conn = sqlite3.connect(str(source_path))
    dest_conn = sqlite3.connect(str(dest_path))
    try:
        source_conn.backup(dest_conn)
    finally:
        dest_conn.close()
        source_conn.close()


def _copy_directory_tree(source: Path, dest: Path) -> tuple[int, int]:
    """Copy a directory tree and return (file_count, total_bytes)."""
    if dest.exists():
        shutil.rmtree(dest)
    shutil.copytree(source, dest)

    file_count = 0
    total_bytes = 0
    for path in dest.rglob("*"):
        if path.is_file():
            file_count += 1
            total_bytes += path.stat().st_size
    return file_count, total_bytes

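The "hot" property of ``_backup_sqlite_db`` comes from ``Connection.backup``, which copies the database page by page under SQLite's own locking. A self-contained sketch (``snapshot_sqlite`` is a hypothetical name) that takes a snapshot while another connection still holds the source open:

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(source: Path, dest: Path) -> None:
    """Hot-copy a SQLite database with the online backup API.

    Safe to run while other connections read and write the source;
    the copy is a consistent point-in-time image of the main file.
    """
    src_conn = sqlite3.connect(str(source))
    dst_conn = sqlite3.connect(str(dest))
    try:
        src_conn.backup(dst_conn)
    finally:
        dst_conn.close()
        src_conn.close()
```

Unlike ``shutil.copy2`` on a live database file, this cannot capture a torn write: SQLite restarts the copy if the source changes mid-backup.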
def main() -> None:
    """CLI entry point for the backup module.

    Supports five subcommands:

    - ``create``    run ``create_runtime_backup`` (default if none given)
    - ``list``      list all runtime backup snapshots
    - ``validate``  validate a specific snapshot by stamp
    - ``cleanup``   remove old snapshots per the retention policy
    - ``restore``   restore a specific snapshot by stamp

    The restore subcommand is the one used by the backup/restore drill
    and MUST be run only when the AtoCore service is stopped. It takes
    ``--confirm-service-stopped`` as an explicit acknowledgment.
    """
    import argparse

    parser = argparse.ArgumentParser(
        prog="python -m atocore.ops.backup",
        description="AtoCore runtime backup create/list/validate/restore",
    )
    sub = parser.add_subparsers(dest="command")

    p_create = sub.add_parser("create", help="create a new runtime backup")
    p_create.add_argument(
        "--chroma",
        action="store_true",
        help="also snapshot the Chroma vector store (cold copy)",
    )

    sub.add_parser("list", help="list runtime backup snapshots")

    p_validate = sub.add_parser("validate", help="validate a snapshot by stamp")
    p_validate.add_argument("stamp", help="snapshot stamp (e.g. 20260409T010203Z)")

    p_cleanup = sub.add_parser("cleanup", help="remove old snapshots per retention policy")
    p_cleanup.add_argument(
        "--confirm",
        action="store_true",
        help="actually delete (default is dry-run)",
    )

    p_restore = sub.add_parser(
        "restore",
        help="restore a snapshot by stamp (service must be stopped)",
    )
    p_restore.add_argument("stamp", help="snapshot stamp to restore")
    p_restore.add_argument(
        "--confirm-service-stopped",
        action="store_true",
        help="explicit acknowledgment that the AtoCore container is stopped",
    )
    p_restore.add_argument(
        "--no-pre-snapshot",
        action="store_true",
        help="skip the pre-restore safety snapshot of current state",
|
||||
)
|
||||
chroma_group = p_restore.add_mutually_exclusive_group()
|
||||
chroma_group.add_argument(
|
||||
"--chroma",
|
||||
dest="include_chroma",
|
||||
action="store_true",
|
||||
default=None,
|
||||
help="force-restore the Chroma snapshot",
|
||||
)
|
||||
chroma_group.add_argument(
|
||||
"--no-chroma",
|
||||
dest="include_chroma",
|
||||
action="store_false",
|
||||
help="skip the Chroma snapshot even if it was captured",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
command = args.command or "create"
|
||||
|
||||
if command == "create":
|
||||
include_chroma = getattr(args, "chroma", False)
|
||||
result = create_runtime_backup(include_chroma=include_chroma)
|
||||
elif command == "list":
|
||||
result = {"backups": list_runtime_backups()}
|
||||
elif command == "validate":
|
||||
result = validate_backup(args.stamp)
|
||||
elif command == "cleanup":
|
||||
result = cleanup_old_backups(confirm=getattr(args, "confirm", False))
|
||||
elif command == "restore":
|
||||
result = restore_runtime_backup(
|
||||
args.stamp,
|
||||
include_chroma=args.include_chroma,
|
||||
pre_restore_snapshot=not args.no_pre_snapshot,
|
||||
confirm_service_stopped=args.confirm_service_stopped,
|
||||
)
|
||||
else: # pragma: no cover — argparse guards this
|
||||
parser.error(f"unknown command: {command}")
|
||||
|
||||
print(json.dumps(result, indent=2, ensure_ascii=True))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
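The retention loop above keeps at most one snapshot per calendar month (the first-of-month stamp) for the six most recent distinct months. A stand-alone sketch of that selection rule; the function name and sample stamps are illustrative, not part of the module:

```python
from datetime import datetime


def keep_monthly(stamps, months=6):
    """Given (datetime, name) pairs sorted newest-first, return the names
    kept by the first-of-month retention rule sketched above."""
    seen, keep = set(), []
    for dt, name in stamps:
        if dt.day != 1:
            continue  # only first-of-month snapshots qualify
        key = dt.strftime("%Y-%m")
        if key in seen:
            continue  # one snapshot per month is enough
        seen.add(key)
        keep.append(name)
        if len(seen) >= months:
            break
    return keep


stamps = [
    (datetime(2026, 4, 1), "a"),
    (datetime(2026, 4, 1), "a2"),    # same month, skipped
    (datetime(2026, 3, 15), "mid"),  # not day 1, skipped
    (datetime(2026, 3, 1), "b"),
]
print(keep_monthly(stamps))  # ['a', 'b']
```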
src/atocore/projects/__init__.py (new file, 1 line)
@@ -0,0 +1 @@
"""Project registry and source refresh helpers."""

src/atocore/projects/registry.py (new file, 461 lines)
@@ -0,0 +1,461 @@
"""Registered project source metadata and refresh helpers."""

from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path

import atocore.config as _config
from atocore.ingestion.pipeline import ingest_folder


@dataclass(frozen=True)
class ProjectSourceRef:
    source: str
    subpath: str
    label: str = ""


@dataclass(frozen=True)
class RegisteredProject:
    project_id: str
    aliases: tuple[str, ...]
    description: str
    ingest_roots: tuple[ProjectSourceRef, ...]


def get_project_registry_template() -> dict:
    """Return a minimal template for registering a new project."""
    return {
        "projects": [
            {
                "id": "p07-example",
                "aliases": ["p07", "example-project"],
                "description": "Short description of the project and staged corpus.",
                "ingest_roots": [
                    {
                        "source": "vault",
                        "subpath": "incoming/projects/p07-example",
                        "label": "Primary staged project docs",
                    }
                ],
            }
        ]
    }


def build_project_registration_proposal(
    project_id: str,
    aliases: list[str] | tuple[str, ...] | None = None,
    description: str = "",
    ingest_roots: list[dict] | tuple[dict, ...] | None = None,
) -> dict:
    """Build a normalized project registration proposal without mutating state."""
    normalized_id = project_id.strip()
    if not normalized_id:
        raise ValueError("Project id must be non-empty")

    normalized_aliases = _normalize_aliases(aliases or [])
    normalized_roots = _normalize_ingest_roots(ingest_roots or [])
    if not normalized_roots:
        raise ValueError("At least one ingest root is required")

    collisions = _find_name_collisions(normalized_id, normalized_aliases)
    resolved_roots = []
    for root in normalized_roots:
        source_ref = ProjectSourceRef(
            source=root["source"],
            subpath=root["subpath"],
            label=root.get("label", ""),
        )
        resolved_path = _resolve_ingest_root(source_ref)
        resolved_roots.append(
            {
                **root,
                "path": str(resolved_path),
                "exists": resolved_path.exists(),
                "is_dir": resolved_path.is_dir(),
            }
        )

    return {
        "project": {
            "id": normalized_id,
            "aliases": normalized_aliases,
            "description": description.strip(),
            "ingest_roots": normalized_roots,
        },
        "resolved_ingest_roots": resolved_roots,
        "collisions": collisions,
        "registry_path": str(_config.settings.resolved_project_registry_path),
        "valid": not collisions,
    }


def register_project(
    project_id: str,
    aliases: list[str] | tuple[str, ...] | None = None,
    description: str = "",
    ingest_roots: list[dict] | tuple[dict, ...] | None = None,
) -> dict:
    """Persist a validated project registration to the registry file."""
    proposal = build_project_registration_proposal(
        project_id=project_id,
        aliases=aliases,
        description=description,
        ingest_roots=ingest_roots,
    )
    if not proposal["valid"]:
        collision_names = ", ".join(collision["name"] for collision in proposal["collisions"])
        raise ValueError(f"Project registration has collisions: {collision_names}")

    registry_path = _config.settings.resolved_project_registry_path
    payload = _load_registry_payload(registry_path)
    payload.setdefault("projects", []).append(proposal["project"])
    _write_registry_payload(registry_path, payload)

    return {
        **proposal,
        "status": "registered",
    }


def update_project(
    project_name: str,
    aliases: list[str] | tuple[str, ...] | None = None,
    description: str | None = None,
    ingest_roots: list[dict] | tuple[dict, ...] | None = None,
) -> dict:
    """Update an existing project registration in the registry file."""
    existing = get_registered_project(project_name)
    if existing is None:
        raise ValueError(f"Unknown project: {project_name}")

    final_aliases = _normalize_aliases(aliases) if aliases is not None else list(existing.aliases)
    final_description = description.strip() if description is not None else existing.description
    final_roots = (
        _normalize_ingest_roots(ingest_roots)
        if ingest_roots is not None
        else [asdict(root) for root in existing.ingest_roots]
    )
    if not final_roots:
        raise ValueError("At least one ingest root is required")

    collisions = _find_name_collisions(
        existing.project_id,
        final_aliases,
        exclude_project_id=existing.project_id,
    )
    if collisions:
        collision_names = ", ".join(collision["name"] for collision in collisions)
        raise ValueError(f"Project update has collisions: {collision_names}")

    updated_entry = {
        "id": existing.project_id,
        "aliases": final_aliases,
        "description": final_description,
        "ingest_roots": final_roots,
    }

    resolved_roots = []
    for root in final_roots:
        source_ref = ProjectSourceRef(
            source=root["source"],
            subpath=root["subpath"],
            label=root.get("label", ""),
        )
        resolved_path = _resolve_ingest_root(source_ref)
        resolved_roots.append(
            {
                **root,
                "path": str(resolved_path),
                "exists": resolved_path.exists(),
                "is_dir": resolved_path.is_dir(),
            }
        )

    registry_path = _config.settings.resolved_project_registry_path
    payload = _load_registry_payload(registry_path)
    payload["projects"] = [
        updated_entry if str(entry.get("id", "")).strip() == existing.project_id else entry
        for entry in payload.get("projects", [])
    ]
    _write_registry_payload(registry_path, payload)

    return {
        "project": updated_entry,
        "resolved_ingest_roots": resolved_roots,
        "collisions": [],
        "registry_path": str(registry_path),
        "valid": True,
        "status": "updated",
    }


def load_project_registry() -> list[RegisteredProject]:
    """Load project registry entries from JSON config."""
    registry_path = _config.settings.resolved_project_registry_path
    payload = _load_registry_payload(registry_path)
    entries = payload.get("projects", [])
    projects: list[RegisteredProject] = []

    for entry in entries:
        project_id = str(entry["id"]).strip()
        if not project_id:
            raise ValueError("Project registry entry is missing a non-empty id")
        aliases = tuple(
            alias.strip()
            for alias in entry.get("aliases", [])
            if isinstance(alias, str) and alias.strip()
        )
        description = str(entry.get("description", "")).strip()
        ingest_roots = tuple(
            ProjectSourceRef(
                source=str(root["source"]).strip(),
                subpath=str(root["subpath"]).strip(),
                label=str(root.get("label", "")).strip(),
            )
            for root in entry.get("ingest_roots", [])
            if str(root.get("source", "")).strip()
            and str(root.get("subpath", "")).strip()
        )
        if not ingest_roots:
            raise ValueError(f"Project registry entry '{project_id}' has no ingest_roots")
        projects.append(
            RegisteredProject(
                project_id=project_id,
                aliases=aliases,
                description=description,
                ingest_roots=ingest_roots,
            )
        )

    _validate_unique_project_names(projects)
    return projects


def list_registered_projects() -> list[dict]:
    """Return registry entries with resolved source readiness."""
    return [_project_to_dict(project) for project in load_project_registry()]


def get_registered_project(project_name: str) -> RegisteredProject | None:
    """Resolve a registry entry by id or alias."""
    needle = project_name.strip().lower()
    if not needle:
        return None

    for project in load_project_registry():
        candidates = {project.project_id.lower(), *(alias.lower() for alias in project.aliases)}
        if needle in candidates:
            return project
    return None


def resolve_project_name(name: str | None) -> str:
    """Canonicalize a project name through the registry.

    Returns the canonical ``project_id`` if the input matches any
    registered project's id or alias. Returns the input unchanged
    when it's empty or not in the registry — the second case keeps
    backwards compatibility with hand-curated state, memories, and
    interactions that predate the registry, or for projects that
    are intentionally not registered.

    This helper is the single canonicalization boundary for project
    names across the trust hierarchy. Every read/write that takes a
    project name should pass it through ``resolve_project_name``
    before storing or querying. The contract is documented in
    ``docs/architecture/representation-authority.md``.
    """
    if not name:
        return name or ""
    project = get_registered_project(name)
    if project is not None:
        return project.project_id
    return name


def refresh_registered_project(project_name: str, purge_deleted: bool = False) -> dict:
    """Ingest all configured source roots for a registered project.

    The returned dict carries an overall ``status`` so callers can tell at a
    glance whether the refresh was fully successful, partial, or did nothing
    at all because every configured root was missing or not a directory:

    - ``ingested``: every root was a real directory and was ingested
    - ``partial``: at least one root ingested and at least one was unusable
    - ``nothing_to_ingest``: no roots were usable
    """
    project = get_registered_project(project_name)
    if project is None:
        raise ValueError(f"Unknown project: {project_name}")

    roots = []
    ingested_count = 0
    skipped_count = 0
    for source_ref in project.ingest_roots:
        resolved = _resolve_ingest_root(source_ref)
        root_result = {
            "source": source_ref.source,
            "subpath": source_ref.subpath,
            "label": source_ref.label,
            "path": str(resolved),
        }
        if not resolved.exists():
            roots.append({**root_result, "status": "missing"})
            skipped_count += 1
            continue
        if not resolved.is_dir():
            roots.append({**root_result, "status": "not_directory"})
            skipped_count += 1
            continue

        roots.append(
            {
                **root_result,
                "status": "ingested",
                "results": ingest_folder(resolved, purge_deleted=purge_deleted),
            }
        )
        ingested_count += 1

    if ingested_count == 0:
        overall_status = "nothing_to_ingest"
    elif skipped_count == 0:
        overall_status = "ingested"
    else:
        overall_status = "partial"

    return {
        "project": project.project_id,
        "aliases": list(project.aliases),
        "description": project.description,
        "purge_deleted": purge_deleted,
        "status": overall_status,
        "roots_ingested": ingested_count,
        "roots_skipped": skipped_count,
        "roots": roots,
    }


def _normalize_aliases(aliases: list[str] | tuple[str, ...]) -> list[str]:
    deduped: list[str] = []
    seen: set[str] = set()
    for alias in aliases:
        candidate = alias.strip()
        if not candidate:
            continue
        key = candidate.lower()
        if key in seen:
            continue
        seen.add(key)
        deduped.append(candidate)
    return deduped


def _normalize_ingest_roots(ingest_roots: list[dict] | tuple[dict, ...]) -> list[dict]:
    normalized: list[dict] = []
    for root in ingest_roots:
        source = str(root.get("source", "")).strip()
        subpath = str(root.get("subpath", "")).strip()
        label = str(root.get("label", "")).strip()
        if not source or not subpath:
            continue
        if source not in {"vault", "drive"}:
            raise ValueError(f"Unsupported source root: {source}")
        normalized.append({"source": source, "subpath": subpath, "label": label})
    return normalized


def _project_to_dict(project: RegisteredProject) -> dict:
    return {
        "id": project.project_id,
        "aliases": list(project.aliases),
        "description": project.description,
        "ingest_roots": [
            {
                **asdict(source_ref),
                "path": str(_resolve_ingest_root(source_ref)),
                "exists": _resolve_ingest_root(source_ref).exists(),
                "is_dir": _resolve_ingest_root(source_ref).is_dir(),
            }
            for source_ref in project.ingest_roots
        ],
    }


def _resolve_ingest_root(source_ref: ProjectSourceRef) -> Path:
    base_map = {
        "vault": _config.settings.resolved_vault_source_dir,
        "drive": _config.settings.resolved_drive_source_dir,
    }
    try:
        base_dir = base_map[source_ref.source]
    except KeyError as exc:
        raise ValueError(f"Unsupported source root: {source_ref.source}") from exc

    return (base_dir / source_ref.subpath).resolve(strict=False)


def _validate_unique_project_names(projects: list[RegisteredProject]) -> None:
    seen: dict[str, str] = {}
    for project in projects:
        names = [project.project_id, *project.aliases]
        for name in names:
            key = name.lower()
            if key in seen and seen[key] != project.project_id:
                raise ValueError(
                    f"Project registry name collision: '{name}' is used by both "
                    f"'{seen[key]}' and '{project.project_id}'"
                )
            seen[key] = project.project_id


def _find_name_collisions(
    project_id: str,
    aliases: list[str],
    exclude_project_id: str | None = None,
) -> list[dict]:
    collisions: list[dict] = []
    existing = load_project_registry()
    requested_names = [project_id, *aliases]
    for requested in requested_names:
        requested_key = requested.lower()
        for project in existing:
            if exclude_project_id is not None and project.project_id == exclude_project_id:
                continue
            project_names = [project.project_id, *project.aliases]
            if requested_key in {name.lower() for name in project_names}:
                collisions.append(
                    {
                        "name": requested,
                        "existing_project": project.project_id,
                    }
                )
                break
    return collisions


def _load_registry_payload(registry_path: Path) -> dict:
    if not registry_path.exists():
        return {"projects": []}
    return json.loads(registry_path.read_text(encoding="utf-8"))


def _write_registry_payload(registry_path: Path, payload: dict) -> None:
    registry_path.parent.mkdir(parents=True, exist_ok=True)
    rendered = json.dumps(payload, indent=2, ensure_ascii=True) + "\n"
    with tempfile.NamedTemporaryFile(
        mode="w",
        encoding="utf-8",
        dir=registry_path.parent,
        prefix=f"{registry_path.stem}.",
        suffix=".tmp",
        delete=False,
    ) as tmp_file:
        tmp_file.write(rendered)
        temp_path = Path(tmp_file.name)
    temp_path.replace(registry_path)
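`_write_registry_payload` writes to a temporary file in the target directory and then renames it into place, so a crash mid-write can never leave readers a truncated registry. A minimal self-contained sketch of the same write-then-replace pattern (the helper name and paths here are illustrative):

```python
import json
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then atomically
    replace the target, mirroring the pattern used in the module above."""
    path.parent.mkdir(parents=True, exist_ok=True)
    rendered = json.dumps(payload, indent=2) + "\n"
    with tempfile.NamedTemporaryFile(
        mode="w", encoding="utf-8", dir=path.parent, suffix=".tmp", delete=False
    ) as tmp:
        tmp.write(rendered)
        tmp_path = Path(tmp.name)
    # Same-filesystem rename: readers see either the old or the new file.
    tmp_path.replace(path)


target = Path(tempfile.mkdtemp()) / "registry.json"
atomic_write_json(target, {"projects": []})
print(json.loads(target.read_text()))  # {'projects': []}
```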
src/atocore/retrieval/__init__.py (new file, 0 lines)

src/atocore/retrieval/embeddings.py (new file, 32 lines)
@@ -0,0 +1,32 @@
"""Embedding model management."""

import atocore.config as _config
from sentence_transformers import SentenceTransformer

from atocore.observability.logger import get_logger

log = get_logger("embeddings")

_model: SentenceTransformer | None = None


def get_model() -> SentenceTransformer:
    """Load and cache the embedding model."""
    global _model
    if _model is None:
        log.info("loading_embedding_model", model=_config.settings.embedding_model)
        _model = SentenceTransformer(_config.settings.embedding_model)
        log.info("embedding_model_loaded", model=_config.settings.embedding_model)
    return _model


def embed_texts(texts: list[str]) -> list[list[float]]:
    """Generate embeddings for a list of texts."""
    model = get_model()
    embeddings = model.encode(texts, show_progress_bar=False, normalize_embeddings=True)
    return embeddings.tolist()


def embed_query(query: str) -> list[float]:
    """Generate embedding for a single query."""
    return embed_texts([query])[0]
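Because `embed_texts` passes `normalize_embeddings=True`, every vector is unit-length, so the cosine similarity used downstream reduces to a plain dot product. A tiny illustration with stand-in 2-D vectors (not real model output):

```python
import math


def normalize(vec):
    """L2-normalize a vector, as the encoder does for embeddings."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


a = normalize([3.0, 4.0])  # -> [0.6, 0.8]
b = normalize([4.0, 3.0])  # -> [0.8, 0.6]

# For unit vectors, cosine similarity equals the dot product.
cosine = sum(x * y for x, y in zip(a, b))
print(round(cosine, 4))  # 0.96
```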
src/atocore/retrieval/retriever.py (new file, 236 lines)
@@ -0,0 +1,236 @@
"""Retrieval: query to ranked chunks."""

import re
import time
from dataclasses import dataclass

import atocore.config as _config
from atocore.models.database import get_connection
from atocore.observability.logger import get_logger
from atocore.projects.registry import get_registered_project
from atocore.retrieval.embeddings import embed_query
from atocore.retrieval.vector_store import get_vector_store

log = get_logger("retriever")

_STOP_TOKENS = {
    "about",
    "and",
    "current",
    "for",
    "from",
    "into",
    "like",
    "project",
    "shared",
    "system",
    "that",
    "the",
    "this",
    "what",
    "with",
}

_HIGH_SIGNAL_HINTS = (
    "status",
    "decision",
    "requirements",
    "requirement",
    "roadmap",
    "charter",
    "system-map",
    "system_map",
    "contracts",
    "schema",
    "architecture",
    "workflow",
    "error-budget",
    "comparison-matrix",
    "selection-decision",
)

_LOW_SIGNAL_HINTS = (
    "/_archive/",
    "\\_archive\\",
    "/archive/",
    "\\archive\\",
    "_history",
    "history",
    "pre-cleanup",
    "pre-migration",
    "reviews/",
)


@dataclass
class ChunkResult:
    chunk_id: str
    content: str
    score: float
    heading_path: str
    source_file: str
    tags: str
    title: str
    document_id: str


def retrieve(
    query: str,
    top_k: int | None = None,
    filter_tags: list[str] | None = None,
    project_hint: str | None = None,
) -> list[ChunkResult]:
    """Retrieve the most relevant chunks for a query."""
    top_k = top_k or _config.settings.context_top_k
    start = time.time()

    query_embedding = embed_query(query)
    store = get_vector_store()

    where = None
    if filter_tags:
        if len(filter_tags) == 1:
            where = {"tags": {"$contains": f'"{filter_tags[0]}"'}}
        else:
            where = {
                "$and": [
                    {"tags": {"$contains": f'"{tag}"'}}
                    for tag in filter_tags
                ]
            }

    results = store.query(
        query_embedding=query_embedding,
        top_k=top_k,
        where=where,
    )

    chunks = []
    if results and results["ids"] and results["ids"][0]:
        existing_ids = _existing_chunk_ids(results["ids"][0])
        for i, chunk_id in enumerate(results["ids"][0]):
            if chunk_id not in existing_ids:
                continue

            distance = results["distances"][0][i] if results["distances"] else 0
            score = 1.0 - distance
            meta = results["metadatas"][0][i] if results["metadatas"] else {}
            content = results["documents"][0][i] if results["documents"] else ""

            score *= _query_match_boost(query, meta)
            score *= _path_signal_boost(meta)
            if project_hint:
                score *= _project_match_boost(project_hint, meta)

            chunks.append(
                ChunkResult(
                    chunk_id=chunk_id,
                    content=content,
                    score=round(score, 4),
                    heading_path=meta.get("heading_path", ""),
                    source_file=meta.get("source_file", ""),
                    tags=meta.get("tags", "[]"),
                    title=meta.get("title", ""),
                    document_id=meta.get("document_id", ""),
                )
            )

    duration_ms = int((time.time() - start) * 1000)
    chunks.sort(key=lambda chunk: chunk.score, reverse=True)

    log.info(
        "retrieval_done",
        query=query[:100],
        top_k=top_k,
        results_count=len(chunks),
        duration_ms=duration_ms,
    )

    return chunks


def _project_match_boost(project_hint: str, metadata: dict) -> float:
    """Return a project-aware relevance multiplier for raw retrieval."""
    hint_lower = project_hint.strip().lower()
    if not hint_lower:
        return 1.0

    source_file = str(metadata.get("source_file", "")).lower()
    title = str(metadata.get("title", "")).lower()
    tags = str(metadata.get("tags", "")).lower()
    searchable = " ".join([source_file, title, tags])

    project = get_registered_project(project_hint)
    candidate_names = {hint_lower}
    if project is not None:
        candidate_names.add(project.project_id.lower())
        candidate_names.update(alias.lower() for alias in project.aliases)
        candidate_names.update(
            source_ref.subpath.replace("\\", "/").strip("/").split("/")[-1].lower()
            for source_ref in project.ingest_roots
            if source_ref.subpath.strip("/\\")
        )

    for candidate in candidate_names:
        if candidate and candidate in searchable:
            return _config.settings.rank_project_match_boost

    return 1.0


def _query_match_boost(query: str, metadata: dict) -> float:
    """Boost chunks whose path/title/headings echo the query's high-signal terms."""
    tokens = [
        token
        for token in re.findall(r"[a-z0-9][a-z0-9_-]{2,}", query.lower())
        if token not in _STOP_TOKENS
    ]
    if not tokens:
        return 1.0

    searchable = " ".join(
        [
            str(metadata.get("source_file", "")).lower(),
            str(metadata.get("title", "")).lower(),
            str(metadata.get("heading_path", "")).lower(),
        ]
    )
    matches = sum(1 for token in set(tokens) if token in searchable)
    if matches <= 0:
        return 1.0
    return min(
        1.0 + matches * _config.settings.rank_query_token_step,
        _config.settings.rank_query_token_cap,
    )


def _path_signal_boost(metadata: dict) -> float:
    """Prefer current high-signal docs and gently down-rank archival noise."""
    searchable = " ".join(
        [
            str(metadata.get("source_file", "")).lower(),
            str(metadata.get("title", "")).lower(),
            str(metadata.get("heading_path", "")).lower(),
        ]
    )

    multiplier = 1.0
    if any(hint in searchable for hint in _LOW_SIGNAL_HINTS):
        multiplier *= _config.settings.rank_path_low_signal_penalty
    if any(hint in searchable for hint in _HIGH_SIGNAL_HINTS):
        multiplier *= _config.settings.rank_path_high_signal_boost
    return multiplier


def _existing_chunk_ids(chunk_ids: list[str]) -> set[str]:
    """Filter out stale vector entries whose chunk rows no longer exist."""
    if not chunk_ids:
        return set()

    placeholders = ", ".join("?" for _ in chunk_ids)
    with get_connection() as conn:
        rows = conn.execute(
            f"SELECT id FROM source_chunks WHERE id IN ({placeholders})",
            chunk_ids,
        ).fetchall()
    return {row["id"] for row in rows}
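`_query_match_boost` tokenizes the query, drops stop tokens, and scales the score by a capped per-token overlap with the chunk's path, title, and headings. A self-contained sketch of that rule; the `step`/`cap` defaults and the abbreviated stop list here are stand-ins for the real `rank_query_token_*` settings:

```python
import re

STOP = {"the", "for", "with", "project"}  # abbreviated stand-in stop list


def query_match_boost(query, searchable, step=0.05, cap=1.25):
    """Sketch of the token-overlap boost: 1.0 plus `step` per matched
    query token, capped at `cap`. Values are illustrative defaults."""
    tokens = [
        t for t in re.findall(r"[a-z0-9][a-z0-9_-]{2,}", query.lower())
        if t not in STOP
    ]
    if not tokens:
        return 1.0
    matches = sum(1 for t in set(tokens) if t in searchable)
    if matches <= 0:
        return 1.0
    return min(1.0 + matches * step, cap)


# "for" is stopped; "roadmap", "status", "atocore" all match the path.
boost = query_match_boost("roadmap status for atocore", "docs/atocore/roadmap-status.md")
print(round(boost, 2))  # 1.15
```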
src/atocore/retrieval/vector_store.py (new file, 77 lines)
@@ -0,0 +1,77 @@
"""ChromaDB vector store wrapper."""

import chromadb

import atocore.config as _config
from atocore.observability.logger import get_logger
from atocore.retrieval.embeddings import embed_texts

log = get_logger("vector_store")

COLLECTION_NAME = "atocore_chunks"

_store: "VectorStore | None" = None


class VectorStore:
    """Wrapper around ChromaDB for chunk storage and retrieval."""

    def __init__(self) -> None:
        _config.settings.chroma_path.mkdir(parents=True, exist_ok=True)
        self._client = chromadb.PersistentClient(path=str(_config.settings.chroma_path))
        self._collection = self._client.get_or_create_collection(
            name=COLLECTION_NAME,
            metadata={"hnsw:space": "cosine"},
        )
        log.info("vector_store_initialized", path=str(_config.settings.chroma_path))

    def add(
        self,
        ids: list[str],
        documents: list[str],
        metadatas: list[dict],
    ) -> None:
        """Add chunks with embeddings to the store."""
        embeddings = embed_texts(documents)
        self._collection.add(
            ids=ids,
            embeddings=embeddings,
            documents=documents,
            metadatas=metadatas,
        )
        log.debug("vectors_added", count=len(ids))

    def query(
        self,
        query_embedding: list[float],
        top_k: int = 10,
        where: dict | None = None,
    ) -> dict:
        """Query the store for similar chunks."""
        kwargs: dict = {
            "query_embeddings": [query_embedding],
            "n_results": top_k,
            "include": ["documents", "metadatas", "distances"],
        }
        if where:
            kwargs["where"] = where

        return self._collection.query(**kwargs)

    def delete(self, ids: list[str]) -> None:
        """Delete chunks by IDs."""
        if ids:
            self._collection.delete(ids=ids)
            log.debug("vectors_deleted", count=len(ids))

    @property
    def count(self) -> int:
        return self._collection.count()


def get_vector_store() -> VectorStore:
    """Get or create the singleton vector store."""
    global _store
    if _store is None:
        _store = VectorStore()
    return _store
tests/__init__.py (new file, 0 lines)

tests/conftest.py (new file, 158 lines)
@@ -0,0 +1,158 @@
"""pytest configuration and shared fixtures."""

import json
import os
import sys
import tempfile
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "src"))

# Default test data directory — overridden per-test by fixtures
_default_test_dir = tempfile.mkdtemp(prefix="atocore_test_")
os.environ["ATOCORE_DATA_DIR"] = _default_test_dir
os.environ["ATOCORE_DEBUG"] = "true"


@pytest.fixture
def tmp_data_dir(tmp_path):
    """Provide a temporary data directory for tests."""
    os.environ["ATOCORE_DATA_DIR"] = str(tmp_path)
    # Reset singletons
    from atocore import config
    config.settings = config.Settings()

    import atocore.retrieval.vector_store as vs
    vs._store = None

    return tmp_path


@pytest.fixture
def project_registry(tmp_path, monkeypatch):
    """Stand up an isolated project registry pointing at a temp file.

    Returns a callable that takes one or more (project_id, [aliases])
    tuples and writes them into the registry, then forces the in-process
    settings singleton to re-resolve. Use this when a test needs the
    canonicalization helpers (resolve_project_name, get_registered_project)
    to recognize aliases.
    """
    registry_path = tmp_path / "test-project-registry.json"

    def _set(*projects):
        payload = {"projects": []}
        for entry in projects:
            if isinstance(entry, str):
                project_id, aliases = entry, []
            else:
                project_id, aliases = entry
            payload["projects"].append(
                {
                    "id": project_id,
                    "aliases": list(aliases),
                    "description": f"test project {project_id}",
                    "ingest_roots": [
                        {"source": "vault", "subpath": f"incoming/projects/{project_id}"}
                    ],
                }
            )
        registry_path.write_text(json.dumps(payload), encoding="utf-8")
        monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
        from atocore import config

        config.settings = config.Settings()
        return registry_path

    return _set


@pytest.fixture
def sample_markdown(tmp_path) -> Path:
    """Create a sample markdown file for testing."""
    md_file = tmp_path / "test_note.md"
    md_file.write_text(
        """---
tags:
  - atocore
  - architecture
date: 2026-04-05
---
# AtoCore Architecture

## Overview

AtoCore is a personal context engine that enriches LLM interactions
with durable memory, structured context, and project knowledge.

## Layers

The system has these layers:

1. Main PKM (human, messy, exploratory)
2. AtoVault (system mirror)
3. AtoDrive (trusted project truth)
4. Structured Memory (DB)
5. Semantic Retrieval (vector DB)

## Memory Types

AtoCore supports these memory types:
||||
- Identity
|
||||
- Preferences
|
||||
- Project Memory
|
||||
- Episodic Memory
|
||||
- Knowledge Objects
|
||||
- Adaptation Memory
|
||||
- Trusted Project State
|
||||
|
||||
## Trust Precedence
|
||||
|
||||
When sources conflict:
|
||||
|
||||
1. Trusted Project State wins
|
||||
2. AtoDrive overrides PKM
|
||||
3. Most recent confirmed wins
|
||||
4. Higher confidence wins
|
||||
5. Equal → flag conflict
|
||||
|
||||
No silent merging.
|
||||
""",
|
||||
encoding="utf-8",
|
||||
)
|
||||
return md_file
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_folder(tmp_path, sample_markdown) -> Path:
|
||||
"""Create a folder with multiple markdown files."""
|
||||
# Already has test_note.md from sample_markdown
|
||||
second = tmp_path / "second_note.md"
|
||||
second.write_text(
|
||||
"""---
|
||||
tags:
|
||||
- chunking
|
||||
---
|
||||
# Chunking Strategy
|
||||
|
||||
## Approach
|
||||
|
||||
Heading-aware recursive splitting:
|
||||
|
||||
1. Split on H2 boundaries first
|
||||
2. If section > 800 chars, split on H3
|
||||
3. If still > 800 chars, split on paragraphs
|
||||
4. Hard split at 800 chars with 100 char overlap
|
||||
|
||||
## Parameters
|
||||
|
||||
- max_chunk_size: 800 characters
|
||||
- overlap: 100 characters
|
||||
- min_chunk_size: 50 characters
|
||||
""",
|
||||
encoding="utf-8",
|
||||
)
|
||||
return tmp_path
|
||||
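The fixtures above lean on one non-obvious property: `Settings()` reads the environment at construction time, so changing an env var has no effect until the module-level `config.settings` is re-instantiated. A toy sketch of that re-resolution pattern (`Settings` here is a stand-in, not the real atocore class):

```python
# Env-var changes only take effect when a fresh Settings object is
# built — which is why the fixtures assign config.settings = Settings()
# after every monkeypatch.setenv call.
import os


class Settings:
    """Toy settings object that snapshots the environment on init."""

    def __init__(self):
        self.data_dir = os.environ.get("ATOCORE_DATA_DIR", "/default")


os.environ["ATOCORE_DATA_DIR"] = "/tmp/a"
settings = Settings()

os.environ["ATOCORE_DATA_DIR"] = "/tmp/b"
stale = settings.data_dir   # still "/tmp/a": the old snapshot

settings = Settings()       # re-resolve, as the fixtures do
fresh = settings.data_dir   # now "/tmp/b"
```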
636 tests/test_api_storage.py Normal file
@@ -0,0 +1,636 @@
"""Tests for storage-related API readiness endpoints."""

from contextlib import contextmanager

from fastapi.testclient import TestClient

import atocore.config as config
from atocore.main import app


def test_sources_endpoint_reports_configured_sources(tmp_data_dir, monkeypatch):
    vault_dir = tmp_data_dir / "vault-source"
    drive_dir = tmp_data_dir / "drive-source"
    vault_dir.mkdir()
    drive_dir.mkdir()

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    config.settings = config.Settings()

    client = TestClient(app)
    response = client.get("/sources")

    assert response.status_code == 200
    body = response.json()
    assert body["vault_enabled"] is True
    assert body["drive_enabled"] is True
    assert len(body["sources"]) == 2
    assert all(source["read_only"] for source in body["sources"])


def test_health_endpoint_exposes_machine_paths_and_source_readiness(tmp_data_dir, monkeypatch):
    vault_dir = tmp_data_dir / "vault-source"
    drive_dir = tmp_data_dir / "drive-source"
    vault_dir.mkdir()
    drive_dir.mkdir()

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    config.settings = config.Settings()

    client = TestClient(app)
    response = client.get("/health")

    assert response.status_code == 200
    body = response.json()
    assert body["status"] == "ok"
    assert body["sources_ready"] is True
    assert "db_path" in body["machine_paths"]
    assert "run_dir" in body["machine_paths"]


def test_health_endpoint_reports_code_version_from_module(tmp_data_dir):
    """The /health response must include code_version reflecting
    atocore.__version__, so deployment drift detection works."""
    from atocore import __version__

    client = TestClient(app)
    response = client.get("/health")

    assert response.status_code == 200
    body = response.json()
    assert body["version"] == __version__
    assert body["code_version"] == __version__


def test_health_endpoint_reports_build_metadata_from_env(tmp_data_dir, monkeypatch):
    """The /health response must include build_sha, build_time, and
    build_branch from the ATOCORE_BUILD_* env vars, so deploy.sh can
    detect precise drift via SHA comparison instead of relying on
    the coarse code_version field.

    Regression test for the codex finding from 2026-04-08:
    code_version 0.2.0 is too coarse to trust as a 'live is current'
    signal because it only changes on manual bumps. The build_sha
    field changes per commit and is set by deploy.sh.
    """
    monkeypatch.setenv("ATOCORE_BUILD_SHA", "abc1234567890fedcba0987654321")
    monkeypatch.setenv("ATOCORE_BUILD_TIME", "2026-04-09T01:23:45Z")
    monkeypatch.setenv("ATOCORE_BUILD_BRANCH", "main")

    client = TestClient(app)
    response = client.get("/health")

    assert response.status_code == 200
    body = response.json()
    assert body["build_sha"] == "abc1234567890fedcba0987654321"
    assert body["build_time"] == "2026-04-09T01:23:45Z"
    assert body["build_branch"] == "main"


def test_health_endpoint_reports_unknown_when_build_env_unset(tmp_data_dir, monkeypatch):
    """When deploy.sh hasn't set the build env vars (e.g. someone
    ran `docker compose up` directly), /health reports 'unknown'
    for all three build fields. This is a clear signal to the
    operator that the deploy provenance is missing and they should
    re-run via deploy.sh."""
    monkeypatch.delenv("ATOCORE_BUILD_SHA", raising=False)
    monkeypatch.delenv("ATOCORE_BUILD_TIME", raising=False)
    monkeypatch.delenv("ATOCORE_BUILD_BRANCH", raising=False)

    client = TestClient(app)
    response = client.get("/health")

    assert response.status_code == 200
    body = response.json()
    assert body["build_sha"] == "unknown"
    assert body["build_time"] == "unknown"
    assert body["build_branch"] == "unknown"


def test_projects_endpoint_reports_registered_projects(tmp_data_dir, monkeypatch):
    vault_dir = tmp_data_dir / "vault-source"
    drive_dir = tmp_data_dir / "drive-source"
    config_dir = tmp_data_dir / "config"
    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
    project_dir.mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()

    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(
        """
{
  "projects": [
    {
      "id": "p04-gigabit",
      "aliases": ["p04"],
      "description": "P04 docs",
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
      ]
    }
  ]
}
""".strip(),
        encoding="utf-8",
    )

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    client = TestClient(app)
    response = client.get("/projects")

    assert response.status_code == 200
    body = response.json()
    assert body["projects"][0]["id"] == "p04-gigabit"
    assert body["projects"][0]["ingest_roots"][0]["exists"] is True


def test_project_refresh_endpoint_uses_registered_roots(tmp_data_dir, monkeypatch):
    vault_dir = tmp_data_dir / "vault-source"
    drive_dir = tmp_data_dir / "drive-source"
    config_dir = tmp_data_dir / "config"
    project_dir = vault_dir / "incoming" / "projects" / "p05-interferometer"
    project_dir.mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()

    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(
        """
{
  "projects": [
    {
      "id": "p05-interferometer",
      "aliases": ["p05"],
      "description": "P05 docs",
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
      ]
    }
  ]
}
""".strip(),
        encoding="utf-8",
    )

    calls = []

    def fake_refresh_registered_project(project_name, purge_deleted=False):
        calls.append((project_name, purge_deleted))
        return {
            "project": "p05-interferometer",
            "aliases": ["p05"],
            "description": "P05 docs",
            "purge_deleted": purge_deleted,
            "status": "ingested",
            "roots_ingested": 1,
            "roots_skipped": 0,
            "roots": [
                {
                    "source": "vault",
                    "subpath": "incoming/projects/p05-interferometer",
                    "path": str(project_dir),
                    "status": "ingested",
                    "results": [],
                }
            ],
        }

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()
    monkeypatch.setattr("atocore.api.routes.refresh_registered_project", fake_refresh_registered_project)

    client = TestClient(app)
    response = client.post("/projects/p05/refresh")

    assert response.status_code == 200
    assert calls == [("p05", False)]
    assert response.json()["project"] == "p05-interferometer"


def test_project_refresh_endpoint_serializes_ingestion(tmp_data_dir, monkeypatch):
    config.settings = config.Settings()
    events = []

    @contextmanager
    def fake_lock():
        events.append("enter")
        try:
            yield
        finally:
            events.append("exit")

    def fake_refresh_registered_project(project_name, purge_deleted=False):
        events.append(("refresh", project_name, purge_deleted))
        return {
            "project": "p05-interferometer",
            "aliases": ["p05"],
            "description": "P05 docs",
            "purge_deleted": purge_deleted,
            "status": "nothing_to_ingest",
            "roots_ingested": 0,
            "roots_skipped": 0,
            "roots": [],
        }

    monkeypatch.setattr("atocore.api.routes.exclusive_ingestion", fake_lock)
    monkeypatch.setattr("atocore.api.routes.refresh_registered_project", fake_refresh_registered_project)

    client = TestClient(app)
    response = client.post("/projects/p05/refresh")

    assert response.status_code == 200
    assert events == ["enter", ("refresh", "p05", False), "exit"]


def test_projects_template_endpoint_returns_template(tmp_data_dir, monkeypatch):
    config.settings = config.Settings()

    client = TestClient(app)
    response = client.get("/projects/template")

    assert response.status_code == 200
    body = response.json()
    assert body["allowed_sources"] == ["vault", "drive"]
    assert body["template"]["projects"][0]["id"] == "p07-example"


def test_project_proposal_endpoint_returns_normalized_preview(tmp_data_dir, monkeypatch):
    vault_dir = tmp_data_dir / "vault-source"
    drive_dir = tmp_data_dir / "drive-source"
    config_dir = tmp_data_dir / "config"
    staged = vault_dir / "incoming" / "projects" / "p07-example"
    staged.mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()

    registry_path = config_dir / "project-registry.json"
    registry_path.write_text('{"projects": []}', encoding="utf-8")

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    client = TestClient(app)
    response = client.post(
        "/projects/proposal",
        json={
            "project_id": "p07-example",
            "aliases": ["p07", "example-project", "p07"],
            "description": "Example project",
            "ingest_roots": [
                {
                    "source": "vault",
                    "subpath": "incoming/projects/p07-example",
                    "label": "Primary docs",
                }
            ],
        },
    )

    assert response.status_code == 200
    body = response.json()
    assert body["project"]["aliases"] == ["p07", "example-project"]
    assert body["resolved_ingest_roots"][0]["exists"] is True
    assert body["valid"] is True


def test_project_register_endpoint_persists_entry(tmp_data_dir, monkeypatch):
    vault_dir = tmp_data_dir / "vault-source"
    drive_dir = tmp_data_dir / "drive-source"
    config_dir = tmp_data_dir / "config"
    staged = vault_dir / "incoming" / "projects" / "p07-example"
    staged.mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()

    registry_path = config_dir / "project-registry.json"
    registry_path.write_text('{"projects": []}', encoding="utf-8")

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    client = TestClient(app)
    response = client.post(
        "/projects/register",
        json={
            "project_id": "p07-example",
            "aliases": ["p07", "example-project"],
            "description": "Example project",
            "ingest_roots": [
                {
                    "source": "vault",
                    "subpath": "incoming/projects/p07-example",
                    "label": "Primary docs",
                }
            ],
        },
    )

    assert response.status_code == 200
    body = response.json()
    assert body["status"] == "registered"
    assert body["project"]["id"] == "p07-example"
    assert '"p07-example"' in registry_path.read_text(encoding="utf-8")


def test_project_register_endpoint_rejects_collisions(tmp_data_dir, monkeypatch):
    vault_dir = tmp_data_dir / "vault-source"
    drive_dir = tmp_data_dir / "drive-source"
    config_dir = tmp_data_dir / "config"
    vault_dir.mkdir()
    drive_dir.mkdir()
    config_dir.mkdir()

    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(
        """
{
  "projects": [
    {
      "id": "p05-interferometer",
      "aliases": ["p05", "interferometer"],
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
      ]
    }
  ]
}
""".strip(),
        encoding="utf-8",
    )

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    client = TestClient(app)
    response = client.post(
        "/projects/register",
        json={
            "project_id": "p07-example",
            "aliases": ["interferometer"],
            "ingest_roots": [
                {
                    "source": "vault",
                    "subpath": "incoming/projects/p07-example",
                }
            ],
        },
    )

    assert response.status_code == 400
    assert "collisions" in response.json()["detail"]


def test_project_update_endpoint_persists_changes(tmp_data_dir, monkeypatch):
    vault_dir = tmp_data_dir / "vault-source"
    drive_dir = tmp_data_dir / "drive-source"
    config_dir = tmp_data_dir / "config"
    project_dir = vault_dir / "incoming" / "projects" / "p04-gigabit"
    project_dir.mkdir(parents=True)
    drive_dir.mkdir()
    config_dir.mkdir()

    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(
        """
{
  "projects": [
    {
      "id": "p04-gigabit",
      "aliases": ["p04", "gigabit"],
      "description": "Old description",
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
      ]
    }
  ]
}
""".strip(),
        encoding="utf-8",
    )

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    client = TestClient(app)
    response = client.put(
        "/projects/p04",
        json={
            "aliases": ["p04", "gigabit", "gigabit-project"],
            "description": "Updated P04 docs",
        },
    )

    assert response.status_code == 200
    body = response.json()
    assert body["status"] == "updated"
    assert body["project"]["aliases"] == ["p04", "gigabit", "gigabit-project"]
    assert body["project"]["description"] == "Updated P04 docs"


def test_project_update_endpoint_rejects_collisions(tmp_data_dir, monkeypatch):
    vault_dir = tmp_data_dir / "vault-source"
    drive_dir = tmp_data_dir / "drive-source"
    config_dir = tmp_data_dir / "config"
    vault_dir.mkdir()
    drive_dir.mkdir()
    config_dir.mkdir()

    registry_path = config_dir / "project-registry.json"
    registry_path.write_text(
        """
{
  "projects": [
    {
      "id": "p04-gigabit",
      "aliases": ["p04", "gigabit"],
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p04-gigabit"}
      ]
    },
    {
      "id": "p05-interferometer",
      "aliases": ["p05", "interferometer"],
      "ingest_roots": [
        {"source": "vault", "subpath": "incoming/projects/p05-interferometer"}
      ]
    }
  ]
}
""".strip(),
        encoding="utf-8",
    )

    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(vault_dir))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(drive_dir))
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    client = TestClient(app)
    response = client.put(
        "/projects/p04",
        json={
            "aliases": ["p04", "interferometer"],
        },
    )

    assert response.status_code == 400
    assert "collisions" in response.json()["detail"]


def test_admin_backup_create_without_chroma(tmp_data_dir, monkeypatch):
    config.settings = config.Settings()
    captured = {}

    def fake_create_runtime_backup(timestamp=None, include_chroma=False):
        captured["include_chroma"] = include_chroma
        return {
            "created_at": "2026-04-06T23:00:00+00:00",
            "backup_root": "/tmp/fake",
            "db_snapshot_path": "/tmp/fake/db/atocore.db",
            "db_size_bytes": 0,
            "registry_snapshot_path": "",
            "chroma_snapshot_path": "",
            "chroma_snapshot_bytes": 0,
            "chroma_snapshot_files": 0,
            "chroma_snapshot_included": False,
            "vector_store_note": "skipped",
        }

    monkeypatch.setattr("atocore.api.routes.create_runtime_backup", fake_create_runtime_backup)

    client = TestClient(app)
    response = client.post("/admin/backup", json={})

    assert response.status_code == 200
    assert captured == {"include_chroma": False}
    body = response.json()
    assert body["chroma_snapshot_included"] is False


def test_admin_backup_create_with_chroma_holds_lock(tmp_data_dir, monkeypatch):
    config.settings = config.Settings()
    events = []

    @contextmanager
    def fake_lock():
        events.append("enter")
        try:
            yield
        finally:
            events.append("exit")

    def fake_create_runtime_backup(timestamp=None, include_chroma=False):
        events.append(("backup", include_chroma))
        return {
            "created_at": "2026-04-06T23:30:00+00:00",
            "backup_root": "/tmp/fake",
            "db_snapshot_path": "/tmp/fake/db/atocore.db",
            "db_size_bytes": 0,
            "registry_snapshot_path": "",
            "chroma_snapshot_path": "/tmp/fake/chroma",
            "chroma_snapshot_bytes": 4,
            "chroma_snapshot_files": 1,
            "chroma_snapshot_included": True,
            "vector_store_note": "included",
        }

    monkeypatch.setattr("atocore.api.routes.exclusive_ingestion", fake_lock)
    monkeypatch.setattr("atocore.api.routes.create_runtime_backup", fake_create_runtime_backup)

    client = TestClient(app)
    response = client.post("/admin/backup", json={"include_chroma": True})

    assert response.status_code == 200
    assert events == ["enter", ("backup", True), "exit"]
    assert response.json()["chroma_snapshot_included"] is True


def test_admin_backup_list_and_validate_endpoints(tmp_data_dir, monkeypatch):
    config.settings = config.Settings()

    def fake_list_runtime_backups():
        return [
            {
                "stamp": "20260406T220000Z",
                "path": "/tmp/fake/snapshots/20260406T220000Z",
                "has_metadata": True,
                "metadata": {"db_snapshot_path": "/tmp/fake/snapshots/20260406T220000Z/db/atocore.db"},
            }
        ]

    def fake_validate_backup(stamp):
        if stamp == "missing":
            return {
                "stamp": stamp,
                "path": f"/tmp/fake/snapshots/{stamp}",
                "exists": False,
                "errors": ["snapshot_directory_missing"],
            }
        return {
            "stamp": stamp,
            "path": f"/tmp/fake/snapshots/{stamp}",
            "exists": True,
            "db_ok": True,
            "registry_ok": True,
            "chroma_ok": None,
            "valid": True,
            "errors": [],
        }

    monkeypatch.setattr("atocore.api.routes.list_runtime_backups", fake_list_runtime_backups)
    monkeypatch.setattr("atocore.api.routes.validate_backup", fake_validate_backup)

    client = TestClient(app)

    listing = client.get("/admin/backup")
    assert listing.status_code == 200
    listing_body = listing.json()
    assert "backup_dir" in listing_body
    assert listing_body["backups"][0]["stamp"] == "20260406T220000Z"

    valid = client.get("/admin/backup/20260406T220000Z/validate")
    assert valid.status_code == 200
    assert valid.json()["valid"] is True

    missing = client.get("/admin/backup/missing/validate")
    assert missing.status_code == 404


def test_query_endpoint_accepts_project_hint(monkeypatch):
    def fake_retrieve(prompt, top_k=10, filter_tags=None, project_hint=None):
        assert prompt == "architecture"
        assert top_k == 3
        assert project_hint == "p04-gigabit"
        return []

    monkeypatch.setattr("atocore.api.routes.retrieve", fake_retrieve)

    client = TestClient(app)
    response = client.post(
        "/query",
        json={
            "prompt": "architecture",
            "top_k": 3,
            "project": "p04-gigabit",
        },
    )

    assert response.status_code == 200
    assert response.json()["results"] == []
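Several tests in this file assert lock ordering by recording events from a fake context manager and checking that the protected call lands strictly between "enter" and "exit". A minimal standalone sketch of that pattern, with toy names in place of the real `exclusive_ingestion` route machinery:

```python
# Event-recording context manager for asserting lock ordering,
# as in test_admin_backup_create_with_chroma_holds_lock.
from contextlib import contextmanager

events = []


@contextmanager
def fake_lock():
    # Record entry and exit so a test can assert the protected
    # work happened while the lock was held.
    events.append("enter")
    try:
        yield
    finally:
        events.append("exit")


def do_work():
    events.append("work")


def handler_under_test():
    # Stand-in for a route handler that must hold the lock.
    with fake_lock():
        do_work()


handler_under_test()
```

Because the `finally` block always runs, the "exit" event is recorded even when the protected call raises, so the same pattern can assert the lock is released on failure paths.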
313 tests/test_atocore_client.py Normal file
@@ -0,0 +1,313 @@
"""Tests for scripts/atocore_client.py — the shared operator CLI.

Specifically covers the Phase 9 reflection-loop subcommands added
after codex's sequence-step-3 review: ``capture``, ``extract``,
``reinforce-interaction``, ``list-interactions``, ``get-interaction``,
``queue``, ``promote``, ``reject``.

The tests mock the client's ``request()`` helper and verify each
subcommand:

- calls the correct HTTP method and path
- builds the correct JSON body (or the correct query string)
- passes the right subset of CLI arguments through

This is the same "wiring test" shape used by tests/test_api_storage.py:
we don't exercise the live HTTP stack; we verify the client builds
the request correctly. The server side is already covered by its
own route tests.
"""

from __future__ import annotations

import json
import sys
from pathlib import Path

import pytest

# Make scripts/ importable
_REPO_ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(_REPO_ROOT / "scripts"))

import atocore_client as client  # noqa: E402


# ---------------------------------------------------------------------------
# Request capture helper
# ---------------------------------------------------------------------------


class _RequestCapture:
    """Drop-in replacement for client.request() that records calls."""

    def __init__(self, response: dict | None = None):
        self.calls: list[dict] = []
        self._response = response if response is not None else {"ok": True}

    def __call__(self, method, path, data=None, timeout=None):
        self.calls.append(
            {"method": method, "path": path, "data": data, "timeout": timeout}
        )
        return self._response


@pytest.fixture
def capture_requests(monkeypatch):
    """Replace client.request with a recording stub and return it."""
    stub = _RequestCapture()
    monkeypatch.setattr(client, "request", stub)
    return stub


def _run_client(monkeypatch, argv: list[str]) -> int:
    """Simulate a CLI invocation with the given argv."""
    monkeypatch.setattr(sys, "argv", ["atocore_client.py", *argv])
    return client.main()


# ---------------------------------------------------------------------------
# capture
# ---------------------------------------------------------------------------


def test_capture_posts_to_interactions_endpoint(capture_requests, monkeypatch):
    _run_client(
        monkeypatch,
        [
            "capture",
            "what is p05's current focus",
            "The current focus is wave 2 operational ingestion.",
            "p05-interferometer",
            "claude-code-test",
            "session-abc",
        ],
    )
    assert len(capture_requests.calls) == 1
    call = capture_requests.calls[0]
    assert call["method"] == "POST"
    assert call["path"] == "/interactions"
    body = call["data"]
    assert body["prompt"] == "what is p05's current focus"
    assert body["response"].startswith("The current focus")
    assert body["project"] == "p05-interferometer"
    assert body["client"] == "claude-code-test"
    assert body["session_id"] == "session-abc"
    assert body["reinforce"] is True  # default


def test_capture_sets_default_client_when_omitted(capture_requests, monkeypatch):
    _run_client(
        monkeypatch,
        ["capture", "hi", "hello"],
    )
    call = capture_requests.calls[0]
    assert call["data"]["client"] == "atocore-client"
    assert call["data"]["project"] == ""
    assert call["data"]["reinforce"] is True


def test_capture_accepts_reinforce_false(capture_requests, monkeypatch):
    _run_client(
        monkeypatch,
        ["capture", "prompt", "response", "p05", "claude", "sess", "false"],
    )
    call = capture_requests.calls[0]
    assert call["data"]["reinforce"] is False


# ---------------------------------------------------------------------------
# extract
# ---------------------------------------------------------------------------


def test_extract_default_is_preview(capture_requests, monkeypatch):
    _run_client(monkeypatch, ["extract", "abc-123"])
    call = capture_requests.calls[0]
    assert call["method"] == "POST"
    assert call["path"] == "/interactions/abc-123/extract"
    assert call["data"] == {"persist": False}


def test_extract_persist_true(capture_requests, monkeypatch):
    _run_client(monkeypatch, ["extract", "abc-123", "true"])
    call = capture_requests.calls[0]
    assert call["data"] == {"persist": True}


def test_extract_url_encodes_interaction_id(capture_requests, monkeypatch):
    _run_client(monkeypatch, ["extract", "abc/def"])
    call = capture_requests.calls[0]
    assert call["path"] == "/interactions/abc%2Fdef/extract"


# ---------------------------------------------------------------------------
# reinforce-interaction
# ---------------------------------------------------------------------------


def test_reinforce_interaction_posts_to_correct_path(capture_requests, monkeypatch):
    _run_client(monkeypatch, ["reinforce-interaction", "int-xyz"])
    call = capture_requests.calls[0]
    assert call["method"] == "POST"
    assert call["path"] == "/interactions/int-xyz/reinforce"
    assert call["data"] == {}


# ---------------------------------------------------------------------------
# list-interactions
# ---------------------------------------------------------------------------


def test_list_interactions_no_filters(capture_requests, monkeypatch):
    _run_client(monkeypatch, ["list-interactions"])
    call = capture_requests.calls[0]
    assert call["method"] == "GET"
    assert call["path"] == "/interactions?limit=50"


def test_list_interactions_with_project_filter(capture_requests, monkeypatch):
    _run_client(monkeypatch, ["list-interactions", "p05-interferometer"])
    call = capture_requests.calls[0]
    assert "project=p05-interferometer" in call["path"]
    assert "limit=50" in call["path"]


def test_list_interactions_full_filter_set(capture_requests, monkeypatch):
    _run_client(
        monkeypatch,
        [
            "list-interactions",
            "p05",
            "sess-1",
            "claude-code",
            "2026-04-07T00:00:00Z",
            "20",
        ],
    )
    call = capture_requests.calls[0]
    path = call["path"]
    assert "project=p05" in path
    assert "session_id=sess-1" in path
    assert "client=claude-code" in path
    # since is URL-encoded — the : and + chars get escaped
    assert "since=2026-04-07" in path
    assert "limit=20" in path


# ---------------------------------------------------------------------------
# get-interaction
# ---------------------------------------------------------------------------


def test_get_interaction_fetches_by_id(capture_requests, monkeypatch):
    _run_client(monkeypatch, ["get-interaction", "int-42"])
    call = capture_requests.calls[0]
    assert call["method"] == "GET"
    assert call["path"] == "/interactions/int-42"


# ---------------------------------------------------------------------------
# queue
# ---------------------------------------------------------------------------


def test_queue_always_filters_by_candidate_status(capture_requests, monkeypatch):
    _run_client(monkeypatch, ["queue"])
    call = capture_requests.calls[0]
    assert call["method"] == "GET"
    assert call["path"].startswith("/memory?")
    assert "status=candidate" in call["path"]
    assert "limit=50" in call["path"]


def test_queue_with_memory_type_and_project(capture_requests, monkeypatch):
    _run_client(monkeypatch, ["queue", "adaptation", "p05-interferometer", "10"])
    call = capture_requests.calls[0]
    path = call["path"]
    assert "status=candidate" in path
|
||||
assert "memory_type=adaptation" in path
|
||||
assert "project=p05-interferometer" in path
|
||||
assert "limit=10" in path
|
||||
|
||||
|
||||
def test_queue_limit_coercion(capture_requests, monkeypatch):
|
||||
"""limit is typed as int by argparse so string '25' becomes 25."""
|
||||
_run_client(monkeypatch, ["queue", "", "", "25"])
|
||||
call = capture_requests.calls[0]
|
||||
assert "limit=25" in call["path"]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# promote / reject
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_promote_posts_to_memory_promote_path(capture_requests, monkeypatch):
|
||||
_run_client(monkeypatch, ["promote", "mem-abc"])
|
||||
call = capture_requests.calls[0]
|
||||
assert call["method"] == "POST"
|
||||
assert call["path"] == "/memory/mem-abc/promote"
|
||||
assert call["data"] == {}
|
||||
|
||||
|
||||
def test_reject_posts_to_memory_reject_path(capture_requests, monkeypatch):
|
||||
_run_client(monkeypatch, ["reject", "mem-xyz"])
|
||||
call = capture_requests.calls[0]
|
||||
assert call["method"] == "POST"
|
||||
assert call["path"] == "/memory/mem-xyz/reject"
|
||||
assert call["data"] == {}
|
||||
|
||||
|
||||
def test_promote_url_encodes_memory_id(capture_requests, monkeypatch):
|
||||
_run_client(monkeypatch, ["promote", "mem/with/slashes"])
|
||||
call = capture_requests.calls[0]
|
||||
assert "mem%2Fwith%2Fslashes" in call["path"]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# end-to-end: ensure the Phase 9 loop can be driven entirely through
|
||||
# the client
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_phase9_full_loop_via_client_shape(capture_requests, monkeypatch):
|
||||
"""Simulate the full capture -> extract -> queue -> promote cycle.
|
||||
|
||||
This doesn't exercise real HTTP — each call is intercepted by
|
||||
the mock request. But it proves every step of the Phase 9 loop
|
||||
is reachable through the shared client, which is the whole point
|
||||
of the codex-step-3 work.
|
||||
"""
|
||||
# Step 1: capture
|
||||
_run_client(
|
||||
monkeypatch,
|
||||
[
|
||||
"capture",
|
||||
"what about GF-PTFE for lateral support",
|
||||
"## Decision: use GF-PTFE pads for thermal stability",
|
||||
"p05-interferometer",
|
||||
],
|
||||
)
|
||||
# Step 2: extract candidates (preview)
|
||||
_run_client(monkeypatch, ["extract", "fake-interaction-id"])
|
||||
# Step 3: extract and persist
|
||||
_run_client(monkeypatch, ["extract", "fake-interaction-id", "true"])
|
||||
# Step 4: list the review queue
|
||||
_run_client(monkeypatch, ["queue"])
|
||||
# Step 5: promote a candidate
|
||||
_run_client(monkeypatch, ["promote", "fake-memory-id"])
|
||||
# Step 6: reject another
|
||||
_run_client(monkeypatch, ["reject", "fake-memory-id-2"])
|
||||
|
||||
methods_and_paths = [
|
||||
(c["method"], c["path"]) for c in capture_requests.calls
|
||||
]
|
||||
assert methods_and_paths == [
|
||||
("POST", "/interactions"),
|
||||
("POST", "/interactions/fake-interaction-id/extract"),
|
||||
("POST", "/interactions/fake-interaction-id/extract"),
|
||||
("GET", "/memory?status=candidate&limit=50"),
|
||||
("POST", "/memory/fake-memory-id/promote"),
|
||||
("POST", "/memory/fake-memory-id-2/reject"),
|
||||
]
|
||||
690
tests/test_backup.py
Normal file
@@ -0,0 +1,690 @@
"""Tests for runtime backup creation, restore, and retention cleanup."""

import json
import sqlite3
from datetime import UTC, datetime, timedelta

import pytest

import atocore.config as config
from atocore.models.database import init_db
from atocore.ops.backup import (
    cleanup_old_backups,
    create_runtime_backup,
    list_runtime_backups,
    restore_runtime_backup,
    validate_backup,
)


def test_create_runtime_backup_copies_db_and_registry(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    registry_path = tmp_path / "config" / "project-registry.json"
    registry_path.parent.mkdir(parents=True)
    registry_path.write_text(
        '{"projects":[{"id":"p01-example","aliases":[],'
        '"ingest_roots":[{"source":"vault","subpath":"incoming/projects/p01-example"}]}]}\n',
        encoding="utf-8",
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        with sqlite3.connect(str(config.settings.db_path)) as conn:
            conn.execute("INSERT INTO projects (id, name) VALUES (?, ?)", ("p01", "P01 Example"))
            conn.commit()

        result = create_runtime_backup(datetime(2026, 4, 6, 18, 0, 0, tzinfo=UTC))
    finally:
        config.settings = original_settings

    db_snapshot = tmp_path / "backups" / "snapshots" / "20260406T180000Z" / "db" / "atocore.db"
    registry_snapshot = (
        tmp_path / "backups" / "snapshots" / "20260406T180000Z" / "config" / "project-registry.json"
    )
    metadata_path = (
        tmp_path / "backups" / "snapshots" / "20260406T180000Z" / "backup-metadata.json"
    )

    assert result["db_snapshot_path"] == str(db_snapshot)
    assert db_snapshot.exists()
    assert registry_snapshot.exists()
    assert metadata_path.exists()

    with sqlite3.connect(str(db_snapshot)) as conn:
        row = conn.execute("SELECT name FROM projects WHERE id = ?", ("p01",)).fetchone()
        assert row[0] == "P01 Example"

    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    assert metadata["registry_snapshot_path"] == str(registry_snapshot)
def test_create_runtime_backup_includes_chroma_when_requested(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()

        # Create a fake chroma directory tree with a couple of files.
        chroma_dir = config.settings.chroma_path
        (chroma_dir / "collection-a").mkdir(parents=True, exist_ok=True)
        (chroma_dir / "collection-a" / "data.bin").write_bytes(b"\x00\x01\x02\x03")
        (chroma_dir / "metadata.json").write_text('{"ok":true}', encoding="utf-8")

        result = create_runtime_backup(
            datetime(2026, 4, 6, 20, 0, 0, tzinfo=UTC),
            include_chroma=True,
        )
    finally:
        config.settings = original_settings

    chroma_snapshot_root = (
        tmp_path / "backups" / "snapshots" / "20260406T200000Z" / "chroma"
    )
    assert result["chroma_snapshot_included"] is True
    assert result["chroma_snapshot_path"] == str(chroma_snapshot_root)
    assert result["chroma_snapshot_files"] >= 2
    assert result["chroma_snapshot_bytes"] > 0
    assert (chroma_snapshot_root / "collection-a" / "data.bin").exists()
    assert (chroma_snapshot_root / "metadata.json").exists()


def test_list_and_validate_runtime_backups(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        first = create_runtime_backup(datetime(2026, 4, 6, 21, 0, 0, tzinfo=UTC))
        second = create_runtime_backup(datetime(2026, 4, 6, 22, 0, 0, tzinfo=UTC))

        listing = list_runtime_backups()
        first_validation = validate_backup("20260406T210000Z")
        second_validation = validate_backup("20260406T220000Z")
        missing_validation = validate_backup("20260101T000000Z")
    finally:
        config.settings = original_settings

    assert len(listing) == 2
    assert {entry["stamp"] for entry in listing} == {
        "20260406T210000Z",
        "20260406T220000Z",
    }
    for entry in listing:
        assert entry["has_metadata"] is True
        assert entry["metadata"]["db_snapshot_path"]

    assert first_validation["valid"] is True
    assert first_validation["db_ok"] is True
    assert first_validation["errors"] == []

    assert second_validation["valid"] is True

    assert missing_validation["exists"] is False
    assert "snapshot_directory_missing" in missing_validation["errors"]

    # both metadata paths are reachable on disk
    assert json.loads(
        (tmp_path / "backups" / "snapshots" / "20260406T210000Z" / "backup-metadata.json")
        .read_text(encoding="utf-8")
    )["db_snapshot_path"] == first["db_snapshot_path"]
    assert second["db_snapshot_path"].endswith("atocore.db")
def test_create_runtime_backup_handles_missing_registry(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        result = create_runtime_backup(datetime(2026, 4, 6, 19, 0, 0, tzinfo=UTC))
    finally:
        config.settings = original_settings

    assert result["registry_snapshot_path"] == ""


def test_restore_refuses_without_confirm_service_stopped(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        create_runtime_backup(datetime(2026, 4, 9, 10, 0, 0, tzinfo=UTC))

        with pytest.raises(RuntimeError, match="confirm_service_stopped"):
            restore_runtime_backup("20260409T100000Z")
    finally:
        config.settings = original_settings


def test_restore_raises_on_invalid_backup(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        with pytest.raises(RuntimeError, match="failed validation"):
            restore_runtime_backup(
                "20250101T000000Z", confirm_service_stopped=True
            )
    finally:
        config.settings = original_settings
def test_restore_round_trip_reverses_post_backup_mutations(tmp_path, monkeypatch):
    """Canonical drill: snapshot -> mutate -> restore -> mutation gone."""
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    registry_path = tmp_path / "config" / "project-registry.json"
    registry_path.parent.mkdir(parents=True)
    registry_path.write_text(
        '{"projects":[{"id":"p01-example","aliases":[],'
        '"ingest_roots":[{"source":"vault","subpath":"incoming/projects/p01-example"}]}]}\n',
        encoding="utf-8",
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()

        # 1. Seed baseline state that should SURVIVE the restore.
        with sqlite3.connect(str(config.settings.db_path)) as conn:
            conn.execute(
                "INSERT INTO projects (id, name) VALUES (?, ?)",
                ("p01", "Baseline Project"),
            )
            conn.commit()

        # 2. Create the backup we're going to restore to.
        create_runtime_backup(datetime(2026, 4, 9, 11, 0, 0, tzinfo=UTC))
        stamp = "20260409T110000Z"

        # 3. Mutate live state AFTER the backup — this is what the
        # restore should reverse.
        with sqlite3.connect(str(config.settings.db_path)) as conn:
            conn.execute(
                "INSERT INTO projects (id, name) VALUES (?, ?)",
                ("p99", "Post Backup Mutation"),
            )
            conn.commit()

        # Confirm the mutation is present before restore.
        with sqlite3.connect(str(config.settings.db_path)) as conn:
            row = conn.execute(
                "SELECT name FROM projects WHERE id = ?", ("p99",)
            ).fetchone()
            assert row is not None and row[0] == "Post Backup Mutation"

        # 4. Restore — the drill procedure. Explicit confirm_service_stopped.
        result = restore_runtime_backup(
            stamp, confirm_service_stopped=True
        )

        # 5. Verify restore report
        assert result["stamp"] == stamp
        assert result["db_restored"] is True
        assert result["registry_restored"] is True
        assert result["restored_integrity_ok"] is True
        assert result["pre_restore_snapshot"] is not None

        # 6. Verify live state reflects the restore: baseline survived,
        # post-backup mutation is gone.
        with sqlite3.connect(str(config.settings.db_path)) as conn:
            baseline = conn.execute(
                "SELECT name FROM projects WHERE id = ?", ("p01",)
            ).fetchone()
            mutation = conn.execute(
                "SELECT name FROM projects WHERE id = ?", ("p99",)
            ).fetchone()
            assert baseline is not None and baseline[0] == "Baseline Project"
            assert mutation is None

        # 7. Pre-restore safety snapshot DOES contain the mutation —
        # it captured current state before overwriting. This is the
        # reversibility guarantee: the operator can restore back to
        # it if the restore itself was a mistake.
        pre_stamp = result["pre_restore_snapshot"]
        pre_validation = validate_backup(pre_stamp)
        assert pre_validation["valid"] is True
        pre_db_path = pre_validation["metadata"]["db_snapshot_path"]
        with sqlite3.connect(pre_db_path) as conn:
            pre_mutation = conn.execute(
                "SELECT name FROM projects WHERE id = ?", ("p99",)
            ).fetchone()
            assert pre_mutation is not None and pre_mutation[0] == "Post Backup Mutation"
    finally:
        config.settings = original_settings
def test_restore_round_trip_with_chroma(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()

        # Seed baseline chroma state that should survive restore.
        chroma_dir = config.settings.chroma_path
        (chroma_dir / "coll-a").mkdir(parents=True, exist_ok=True)
        (chroma_dir / "coll-a" / "baseline.bin").write_bytes(b"baseline")

        create_runtime_backup(
            datetime(2026, 4, 9, 12, 0, 0, tzinfo=UTC), include_chroma=True
        )
        stamp = "20260409T120000Z"

        # Mutate chroma after backup: add a file + remove baseline.
        (chroma_dir / "coll-a" / "post_backup.bin").write_bytes(b"post")
        (chroma_dir / "coll-a" / "baseline.bin").unlink()

        result = restore_runtime_backup(
            stamp, confirm_service_stopped=True
        )

        assert result["chroma_restored"] is True
        assert (chroma_dir / "coll-a" / "baseline.bin").exists()
        assert not (chroma_dir / "coll-a" / "post_backup.bin").exists()
    finally:
        config.settings = original_settings
def test_restore_chroma_does_not_unlink_destination_directory(tmp_path, monkeypatch):
    """Regression: restore must not rmtree the chroma dir itself.

    In a Dockerized deployment the chroma dir is a bind-mounted
    volume. Calling shutil.rmtree on a mount point raises
    ``OSError [Errno 16] Device or resource busy``, which broke the
    first real Dalidou drill on 2026-04-09. The fix clears the
    directory's CONTENTS and copytree(dirs_exist_ok=True) into it,
    keeping the directory inode (and any bind mount) intact.

    This test captures the inode of the destination directory before
    and after restore and asserts they match — that's what a
    bind-mounted chroma dir would also see.
    """
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()

        chroma_dir = config.settings.chroma_path
        (chroma_dir / "coll-a").mkdir(parents=True, exist_ok=True)
        (chroma_dir / "coll-a" / "baseline.bin").write_bytes(b"baseline")

        create_runtime_backup(
            datetime(2026, 4, 9, 15, 0, 0, tzinfo=UTC), include_chroma=True
        )

        # Capture the destination directory's stat signature before restore.
        chroma_stat_before = chroma_dir.stat()

        # Add a file post-backup so restore has work to do.
        (chroma_dir / "coll-a" / "post_backup.bin").write_bytes(b"post")

        restore_runtime_backup(
            "20260409T150000Z", confirm_service_stopped=True
        )

        # Directory still exists (would have failed on a mount point) and
        # its st_ino matches — the mount itself wasn't unlinked.
        assert chroma_dir.exists()
        chroma_stat_after = chroma_dir.stat()
        assert chroma_stat_before.st_ino == chroma_stat_after.st_ino, (
            "chroma directory inode changed — restore recreated the "
            "directory instead of clearing its contents; this would "
            "fail on a Docker bind-mounted volume"
        )
        # And the contents did actually get restored.
        assert (chroma_dir / "coll-a" / "baseline.bin").exists()
        assert not (chroma_dir / "coll-a" / "post_backup.bin").exists()
    finally:
        config.settings = original_settings
def test_restore_skips_pre_snapshot_when_requested(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        create_runtime_backup(datetime(2026, 4, 9, 13, 0, 0, tzinfo=UTC))

        before_count = len(list_runtime_backups())

        result = restore_runtime_backup(
            "20260409T130000Z",
            confirm_service_stopped=True,
            pre_restore_snapshot=False,
        )

        after_count = len(list_runtime_backups())
        assert result["pre_restore_snapshot"] is None
        assert after_count == before_count
    finally:
        config.settings = original_settings
def test_create_backup_includes_validation_fields(tmp_path, monkeypatch):
    """Task B: create_runtime_backup auto-validates and reports result."""
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        result = create_runtime_backup(datetime(2026, 4, 11, 10, 0, 0, tzinfo=UTC))
    finally:
        config.settings = original_settings

    assert "validated" in result
    assert "validation_errors" in result
    assert result["validated"] is True
    assert result["validation_errors"] == []


def test_create_backup_validation_failure_does_not_raise(tmp_path, monkeypatch):
    """Task B: if post-backup validation fails, backup still returns metadata."""
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    def _broken_validate(stamp):
        return {"valid": False, "errors": ["db_missing", "metadata_missing"]}

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        monkeypatch.setattr("atocore.ops.backup.validate_backup", _broken_validate)
        result = create_runtime_backup(datetime(2026, 4, 11, 11, 0, 0, tzinfo=UTC))
    finally:
        config.settings = original_settings

    # Should NOT have raised — backup still returned metadata
    assert result["validated"] is False
    assert result["validation_errors"] == ["db_missing", "metadata_missing"]
    # Core backup fields still present
    assert "db_snapshot_path" in result
    assert "created_at" in result
def test_restore_cleans_stale_wal_sidecars(tmp_path, monkeypatch):
    """Stale WAL/SHM sidecars must not carry bytes past the restore.

    Note: after restore runs, PRAGMA integrity_check reopens the
    restored db which may legitimately recreate a fresh -wal. So we
    assert that the STALE byte marker no longer appears in either
    sidecar, not that the files are absent.
    """
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        create_runtime_backup(datetime(2026, 4, 9, 14, 0, 0, tzinfo=UTC))

        # Write fake stale WAL/SHM next to the live db with an
        # unmistakable marker.
        target_db = config.settings.db_path
        wal = target_db.with_name(target_db.name + "-wal")
        shm = target_db.with_name(target_db.name + "-shm")
        stale_marker = b"STALE-SIDECAR-MARKER-DO-NOT-SURVIVE"
        wal.write_bytes(stale_marker)
        shm.write_bytes(stale_marker)
        assert wal.exists() and shm.exists()

        restore_runtime_backup(
            "20260409T140000Z", confirm_service_stopped=True
        )

        # The restored db must pass integrity check (tested elsewhere);
        # here we just confirm that no file next to it still contains
        # the stale marker from the old live process.
        for sidecar in (wal, shm):
            if sidecar.exists():
                assert stale_marker not in sidecar.read_bytes(), (
                    f"{sidecar.name} still carries stale marker"
                )
    finally:
        config.settings = original_settings
# ---------------------------------------------------------------------------
# Task C: Backup retention cleanup
# ---------------------------------------------------------------------------


def _setup_cleanup_env(tmp_path, monkeypatch):
    """Helper: configure env, init db, return (original settings, snapshots_root)."""
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )
    original = config.settings
    config.settings = config.Settings()
    init_db()
    snapshots_root = config.settings.resolved_backup_dir / "snapshots"
    snapshots_root.mkdir(parents=True, exist_ok=True)
    return original, snapshots_root


def _seed_snapshots(snapshots_root, dates):
    """Create minimal valid snapshot dirs for the given datetimes."""
    for dt in dates:
        stamp = dt.strftime("%Y%m%dT%H%M%SZ")
        snap_dir = snapshots_root / stamp
        db_dir = snap_dir / "db"
        db_dir.mkdir(parents=True, exist_ok=True)
        db_path = db_dir / "atocore.db"
        conn = sqlite3.connect(str(db_path))
        conn.execute("CREATE TABLE IF NOT EXISTS _marker (id INTEGER)")
        conn.close()
        metadata = {
            "created_at": dt.isoformat(),
            "backup_root": str(snap_dir),
            "db_snapshot_path": str(db_path),
            "db_size_bytes": db_path.stat().st_size,
            "registry_snapshot_path": "",
            "chroma_snapshot_path": "",
            "chroma_snapshot_bytes": 0,
            "chroma_snapshot_files": 0,
            "chroma_snapshot_included": False,
            "vector_store_note": "",
        }
        (snap_dir / "backup-metadata.json").write_text(
            json.dumps(metadata, indent=2) + "\n", encoding="utf-8"
        )
def test_cleanup_empty_dir(tmp_path, monkeypatch):
    original, _ = _setup_cleanup_env(tmp_path, monkeypatch)
    try:
        result = cleanup_old_backups()
        assert result["kept"] == 0
        assert result["would_delete"] == 0
        assert result["dry_run"] is True
    finally:
        config.settings = original


def test_cleanup_dry_run_identifies_old_snapshots(tmp_path, monkeypatch):
    original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
    try:
        # 10 daily snapshots Apr 2-11 (avoiding Apr 1, which is monthly).
        base = datetime(2026, 4, 2, 12, 0, 0, tzinfo=UTC)
        dates = [base + timedelta(days=i) for i in range(10)]
        _seed_snapshots(snapshots_root, dates)

        result = cleanup_old_backups()
        assert result["dry_run"] is True
        # The newest 7 days (Apr 5-11) are kept as daily. Apr 5 is a
        # Sunday, but it is already covered by the daily window. The
        # remaining Apr 2, 3, 4 are neither Sundays nor the 1st of a
        # month, so all 3 are flagged for deletion.
        assert result["kept"] == 7
        assert result["would_delete"] == 3
        assert len(list(snapshots_root.iterdir())) == 10
    finally:
        config.settings = original
def test_cleanup_confirm_deletes(tmp_path, monkeypatch):
    original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
    try:
        base = datetime(2026, 4, 2, 12, 0, 0, tzinfo=UTC)
        dates = [base + timedelta(days=i) for i in range(10)]
        _seed_snapshots(snapshots_root, dates)

        result = cleanup_old_backups(confirm=True)
        assert result["dry_run"] is False
        assert result["deleted"] == 3
        assert result["kept"] == 7
        assert len(list(snapshots_root.iterdir())) == 7
    finally:
        config.settings = original


def test_cleanup_keeps_last_7_daily(tmp_path, monkeypatch):
    """Exactly 7 snapshots on different days → all kept."""
    original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
    try:
        base = datetime(2026, 4, 5, 12, 0, 0, tzinfo=UTC)
        dates = [base + timedelta(days=i) for i in range(7)]
        _seed_snapshots(snapshots_root, dates)

        result = cleanup_old_backups()
        assert result["kept"] == 7
        assert result["would_delete"] == 0
    finally:
        config.settings = original


def test_cleanup_keeps_sunday_weekly(tmp_path, monkeypatch):
    """Snapshots on Sundays outside the 7-day window are kept as weekly."""
    original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
    try:
        # 7 daily snapshots covering Apr 5-11
        base = datetime(2026, 4, 5, 12, 0, 0, tzinfo=UTC)
        daily = [base + timedelta(days=i) for i in range(7)]

        # 2 older Sunday snapshots
        sun1 = datetime(2026, 3, 29, 12, 0, 0, tzinfo=UTC)  # Sunday
        sun2 = datetime(2026, 3, 22, 12, 0, 0, tzinfo=UTC)  # Sunday
        # A non-Sunday old snapshot that should be deleted
        wed = datetime(2026, 3, 25, 12, 0, 0, tzinfo=UTC)  # Wednesday

        _seed_snapshots(snapshots_root, daily + [sun1, sun2, wed])

        result = cleanup_old_backups()
        # 7 daily + 2 Sunday weekly = 9 kept, 1 Wednesday deleted
        assert result["kept"] == 9
        assert result["would_delete"] == 1
    finally:
        config.settings = original
def test_cleanup_keeps_monthly_first(tmp_path, monkeypatch):
|
||||
"""Snapshots on the 1st of a month outside daily+weekly are kept as monthly."""
|
||||
original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
|
||||
try:
|
||||
# 7 daily in April 2026
|
||||
base = datetime(2026, 4, 5, 12, 0, 0, tzinfo=UTC)
|
||||
daily = [base + timedelta(days=i) for i in range(7)]
|
||||
|
||||
# Old monthly 1st snapshots
|
||||
m1 = datetime(2026, 1, 1, 12, 0, 0, tzinfo=UTC)
|
||||
m2 = datetime(2025, 12, 1, 12, 0, 0, tzinfo=UTC)
|
||||
# Old non-1st, non-Sunday snapshot — should be deleted
|
||||
old = datetime(2026, 1, 15, 12, 0, 0, tzinfo=UTC)
|
||||
|
||||
_seed_snapshots(snapshots_root, daily + [m1, m2, old])
|
||||
|
||||
result = cleanup_old_backups()
|
||||
# 7 daily + 2 monthly = 9 kept, 1 deleted
|
||||
assert result["kept"] == 9
|
||||
assert result["would_delete"] == 1
|
||||
finally:
|
||||
config.settings = original
|
||||
|
||||
|
||||
def test_cleanup_unparseable_stamp_skipped(tmp_path, monkeypatch):
|
||||
"""Directories with unparseable names are ignored, not deleted."""
|
||||
original, snapshots_root = _setup_cleanup_env(tmp_path, monkeypatch)
|
||||
try:
|
||||
base = datetime(2026, 4, 5, 12, 0, 0, tzinfo=UTC)
|
||||
_seed_snapshots(snapshots_root, [base])
|
||||
|
||||
bad_dir = snapshots_root / "not-a-timestamp"
|
||||
bad_dir.mkdir()
|
||||
|
||||
result = cleanup_old_backups(confirm=True)
|
||||
assert result.get("unparseable") == ["not-a-timestamp"]
|
||||
assert bad_dir.exists()
|
||||
assert result["kept"] == 1
|
||||
finally:
|
||||
config.settings = original
|
||||
249
tests/test_capture_stop.py
Normal file
@@ -0,0 +1,249 @@
"""Tests for deploy/hooks/capture_stop.py — Claude Code Stop hook."""

from __future__ import annotations

import json
import os
import sys
import tempfile
import textwrap
from io import StringIO
from pathlib import Path
from unittest import mock

import pytest

# The hook script lives outside of the normal package tree, so import
# it by manipulating sys.path.
_HOOK_DIR = str(Path(__file__).resolve().parent.parent / "deploy" / "hooks")
if _HOOK_DIR not in sys.path:
    sys.path.insert(0, _HOOK_DIR)

import capture_stop  # noqa: E402


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def _write_transcript(tmp: Path, entries: list[dict]) -> str:
    """Write a JSONL transcript and return the path."""
    path = tmp / "transcript.jsonl"
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return str(path)


def _user_entry(content: str, *, is_meta: bool = False) -> dict:
    return {
        "type": "user",
        "isMeta": is_meta,
        "message": {"role": "user", "content": content},
    }


def _assistant_entry() -> dict:
    return {
        "type": "assistant",
        "message": {
            "role": "assistant",
            "content": [{"type": "text", "text": "Sure, here's the answer."}],
        },
    }


def _system_entry() -> dict:
    return {"type": "system", "message": {"role": "system", "content": "system init"}}


# ---------------------------------------------------------------------------
# _extract_last_user_prompt
# ---------------------------------------------------------------------------

class TestExtractLastUserPrompt:
    def test_returns_last_real_prompt(self, tmp_path):
        path = _write_transcript(tmp_path, [
            _user_entry("First prompt that is long enough to capture"),
            _assistant_entry(),
            _user_entry("Second prompt that should be the one we capture"),
            _assistant_entry(),
        ])
        result = capture_stop._extract_last_user_prompt(path)
        assert result == "Second prompt that should be the one we capture"

    def test_skips_meta_messages(self, tmp_path):
        path = _write_transcript(tmp_path, [
            _user_entry("Real prompt that is definitely long enough"),
            _user_entry("<local-command>some system stuff</local-command>"),
            _user_entry("Meta message that looks real enough", is_meta=True),
        ])
        result = capture_stop._extract_last_user_prompt(path)
        assert result == "Real prompt that is definitely long enough"

    def test_skips_xml_content(self, tmp_path):
        path = _write_transcript(tmp_path, [
            _user_entry("Actual prompt from a real human user"),
            _user_entry("<command-name>/help</command-name>"),
        ])
        result = capture_stop._extract_last_user_prompt(path)
        assert result == "Actual prompt from a real human user"

    def test_skips_short_messages(self, tmp_path):
        path = _write_transcript(tmp_path, [
            _user_entry("This prompt is long enough to be captured"),
            _user_entry("yes"),  # too short
        ])
        result = capture_stop._extract_last_user_prompt(path)
        assert result == "This prompt is long enough to be captured"

    def test_handles_content_blocks(self, tmp_path):
        entry = {
            "type": "user",
            "message": {
                "role": "user",
                "content": [
                    {"type": "text", "text": "First paragraph of the prompt."},
                    {"type": "text", "text": "Second paragraph continues here."},
                ],
            },
        }
        path = _write_transcript(tmp_path, [entry])
        result = capture_stop._extract_last_user_prompt(path)
        assert "First paragraph" in result
        assert "Second paragraph" in result

    def test_empty_transcript(self, tmp_path):
        path = _write_transcript(tmp_path, [])
        result = capture_stop._extract_last_user_prompt(path)
        assert result == ""

    def test_missing_file(self):
        result = capture_stop._extract_last_user_prompt("/nonexistent/path.jsonl")
        assert result == ""

    def test_empty_path(self):
        result = capture_stop._extract_last_user_prompt("")
        assert result == ""


# ---------------------------------------------------------------------------
# _infer_project
# ---------------------------------------------------------------------------

class TestInferProject:
    def test_empty_cwd(self):
        assert capture_stop._infer_project("") == ""

    def test_unknown_path(self):
        assert capture_stop._infer_project("C:\\Users\\antoi\\random") == ""

    def test_mapped_path(self):
        with mock.patch.dict(capture_stop._PROJECT_PATH_MAP, {
            "C:\\Users\\antoi\\gigabit": "p04-gigabit",
        }):
            result = capture_stop._infer_project("C:\\Users\\antoi\\gigabit\\src")
            assert result == "p04-gigabit"


# ---------------------------------------------------------------------------
# _capture (integration-style, mocking HTTP)
# ---------------------------------------------------------------------------

class TestCapture:
    def _hook_input(self, *, transcript_path: str = "", **overrides) -> str:
        data = {
            "session_id": "test-session-123",
            "transcript_path": transcript_path,
            "cwd": "C:\\Users\\antoi\\ATOCore",
            "permission_mode": "default",
            "hook_event_name": "Stop",
            "last_assistant_message": "Here is the answer to your question about the code.",
            "turn_number": 3,
        }
        data.update(overrides)
        return json.dumps(data)

    @mock.patch("capture_stop.urllib.request.urlopen")
    def test_posts_to_atocore(self, mock_urlopen, tmp_path):
        transcript = _write_transcript(tmp_path, [
            _user_entry("Please explain how the backup system works in detail"),
            _assistant_entry(),
        ])
        mock_resp = mock.MagicMock()
        mock_resp.read.return_value = json.dumps({"id": "int-001", "status": "recorded"}).encode()
        mock_urlopen.return_value = mock_resp

        with mock.patch("sys.stdin", StringIO(self._hook_input(transcript_path=transcript))):
            capture_stop._capture()

        mock_urlopen.assert_called_once()
        req = mock_urlopen.call_args[0][0]
        body = json.loads(req.data.decode())
        assert body["prompt"] == "Please explain how the backup system works in detail"
        assert body["client"] == "claude-code"
        assert body["session_id"] == "test-session-123"
        assert body["reinforce"] is True

    @mock.patch("capture_stop.urllib.request.urlopen")
    def test_skips_when_disabled(self, mock_urlopen, tmp_path):
        transcript = _write_transcript(tmp_path, [
            _user_entry("A prompt that would normally be captured"),
        ])
        with mock.patch.dict(os.environ, {"ATOCORE_CAPTURE_DISABLED": "1"}):
            with mock.patch("sys.stdin", StringIO(self._hook_input(transcript_path=transcript))):
                capture_stop._capture()
        mock_urlopen.assert_not_called()

    @mock.patch("capture_stop.urllib.request.urlopen")
    def test_skips_short_prompt(self, mock_urlopen, tmp_path):
        transcript = _write_transcript(tmp_path, [
            _user_entry("yes"),
        ])
        with mock.patch("sys.stdin", StringIO(self._hook_input(transcript_path=transcript))):
            capture_stop._capture()
        mock_urlopen.assert_not_called()

    @mock.patch("capture_stop.urllib.request.urlopen")
    def test_truncates_long_response(self, mock_urlopen, tmp_path):
        transcript = _write_transcript(tmp_path, [
            _user_entry("Tell me everything about the entire codebase architecture"),
        ])
        long_response = "x" * 60_000
        mock_resp = mock.MagicMock()
        mock_resp.read.return_value = json.dumps({"id": "int-002"}).encode()
        mock_urlopen.return_value = mock_resp

        with mock.patch("sys.stdin", StringIO(
            self._hook_input(transcript_path=transcript, last_assistant_message=long_response)
        )):
            capture_stop._capture()

        req = mock_urlopen.call_args[0][0]
        body = json.loads(req.data.decode())
        assert len(body["response"]) <= capture_stop.MAX_RESPONSE_LENGTH + 20
        assert body["response"].endswith("[truncated]")

    def test_main_never_raises(self):
        """main() must always exit 0, even on garbage input."""
        with mock.patch("sys.stdin", StringIO("not json at all")):
            # Should not raise
            capture_stop.main()

    @mock.patch("capture_stop.urllib.request.urlopen")
    def test_uses_atocore_url_env(self, mock_urlopen, tmp_path):
        transcript = _write_transcript(tmp_path, [
            _user_entry("Please help me with this particular problem in the code"),
        ])
        mock_resp = mock.MagicMock()
        mock_resp.read.return_value = json.dumps({"id": "int-003"}).encode()
        mock_urlopen.return_value = mock_resp

        with mock.patch.dict(os.environ, {"ATOCORE_URL": "http://localhost:9999"}):
            # Re-read the env var
            with mock.patch.object(capture_stop, "ATOCORE_URL", "http://localhost:9999"):
                with mock.patch("sys.stdin", StringIO(self._hook_input(transcript_path=transcript))):
                    capture_stop._capture()

        req = mock_urlopen.call_args[0][0]
        assert req.full_url == "http://localhost:9999/interactions"
73
tests/test_chunker.py
Normal file
@@ -0,0 +1,73 @@
"""Tests for the markdown chunker."""

from atocore.ingestion.chunker import chunk_markdown


def test_basic_chunking():
    """Test that markdown is split into chunks."""
    body = """## Section One

This is the first section with some content that is long enough to pass the minimum chunk size filter applied by the chunker.

## Section Two

This is the second section with different content that is also long enough to pass the minimum chunk size threshold.
"""
    chunks = chunk_markdown(body)
    assert len(chunks) >= 2
    assert all(c.char_count > 0 for c in chunks)
    assert all(c.chunk_index >= 0 for c in chunks)


def test_heading_path_preserved():
    """Test that heading paths are captured."""
    body = """## Architecture

### Layers

The system has multiple layers organized in a clear hierarchy for separation of concerns and maintainability.
"""
    chunks = chunk_markdown(body)
    assert len(chunks) >= 1
    # At least one chunk should have heading info
    has_heading = any(c.heading_path for c in chunks)
    assert has_heading


def test_small_chunks_filtered():
    """Test that very small chunks are discarded."""
    body = """## A

Hi

## B

This is a real section with enough content to pass the minimum size threshold.
"""
    chunks = chunk_markdown(body, min_size=50)
    # "Hi" should be filtered out
    for c in chunks:
        assert c.char_count >= 50


def test_large_section_split():
    """Test that large sections are split further."""
    large_content = "Word " * 200  # ~1000 chars
    body = f"## Big Section\n\n{large_content}"
    chunks = chunk_markdown(body, max_size=400)
    assert len(chunks) >= 2


def test_metadata_passed_through():
    """Test that base metadata is included in chunks."""
    body = "## Test\n\nSome content here that is long enough."
    meta = {"source_file": "/test/file.md", "tags": ["test"]}
    chunks = chunk_markdown(body, base_metadata=meta)
    if chunks:
        assert chunks[0].metadata.get("source_file") == "/test/file.md"


def test_empty_body():
    """Test chunking an empty body."""
    chunks = chunk_markdown("")
    assert chunks == []
89
tests/test_config.py
Normal file
@@ -0,0 +1,89 @@
"""Tests for configuration and canonical path boundaries."""

import os
from pathlib import Path

import atocore.config as config


def test_settings_resolve_canonical_directories(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(tmp_path / "vault-source"))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(tmp_path / "drive-source"))
    monkeypatch.setenv("ATOCORE_LOG_DIR", str(tmp_path / "logs"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    settings = config.Settings()

    assert settings.db_path == (tmp_path / "data" / "db" / "atocore.db").resolve()
    assert settings.chroma_path == (tmp_path / "data" / "chroma").resolve()
    assert settings.cache_path == (tmp_path / "data" / "cache").resolve()
    assert settings.tmp_path == (tmp_path / "data" / "tmp").resolve()
    assert settings.resolved_vault_source_dir == (tmp_path / "vault-source").resolve()
    assert settings.resolved_drive_source_dir == (tmp_path / "drive-source").resolve()
    assert settings.resolved_log_dir == (tmp_path / "logs").resolve()
    assert settings.resolved_backup_dir == (tmp_path / "backups").resolve()
    assert settings.resolved_run_dir == (tmp_path / "run").resolve()
    assert settings.resolved_project_registry_path == (
        tmp_path / "config" / "project-registry.json"
    ).resolve()


def test_settings_keep_legacy_db_path_when_present(tmp_path, monkeypatch):
    data_dir = tmp_path / "data"
    data_dir.mkdir()
    legacy_db = data_dir / "atocore.db"
    legacy_db.write_text("", encoding="utf-8")
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(data_dir))

    settings = config.Settings()

    assert settings.db_path == legacy_db.resolve()


def test_ranking_weights_are_tunable_via_env(monkeypatch):
    monkeypatch.setenv("ATOCORE_RANK_PROJECT_MATCH_BOOST", "3.5")
    monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_STEP", "0.12")
    monkeypatch.setenv("ATOCORE_RANK_QUERY_TOKEN_CAP", "1.5")
    monkeypatch.setenv("ATOCORE_RANK_PATH_HIGH_SIGNAL_BOOST", "1.25")
    monkeypatch.setenv("ATOCORE_RANK_PATH_LOW_SIGNAL_PENALTY", "0.5")

    settings = config.Settings()

    assert settings.rank_project_match_boost == 3.5
    assert settings.rank_query_token_step == 0.12
    assert settings.rank_query_token_cap == 1.5
    assert settings.rank_path_high_signal_boost == 1.25
    assert settings.rank_path_low_signal_penalty == 0.5


def test_ensure_runtime_dirs_creates_machine_dirs_only(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_VAULT_SOURCE_DIR", str(tmp_path / "vault-source"))
    monkeypatch.setenv("ATOCORE_DRIVE_SOURCE_DIR", str(tmp_path / "drive-source"))
    monkeypatch.setenv("ATOCORE_LOG_DIR", str(tmp_path / "logs"))
    monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
    monkeypatch.setenv(
        "ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
    )

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        config.ensure_runtime_dirs()

        assert config.settings.db_path.parent.exists()
        assert config.settings.chroma_path.exists()
        assert config.settings.cache_path.exists()
        assert config.settings.tmp_path.exists()
        assert config.settings.resolved_log_dir.exists()
        assert config.settings.resolved_backup_dir.exists()
        assert config.settings.resolved_run_dir.exists()
        assert config.settings.resolved_project_registry_path.parent.exists()
        assert not config.settings.resolved_vault_source_dir.exists()
        assert not config.settings.resolved_drive_source_dir.exists()
    finally:
        config.settings = original_settings
348
tests/test_context_builder.py
Normal file
@@ -0,0 +1,348 @@
"""Tests for the context builder."""

import json

import atocore.config as config
from atocore.context.builder import build_context, get_last_context_pack
from atocore.context.project_state import init_project_state_schema, set_state
from atocore.ingestion.pipeline import ingest_file
from atocore.models.database import init_db


def test_build_context_returns_pack(tmp_data_dir, sample_markdown):
    """Test that context builder returns a valid pack."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    pack = build_context("What is AtoCore?")
    assert pack.total_chars > 0
    assert len(pack.chunks_used) > 0
    assert pack.budget_remaining >= 0
    assert "--- End Context ---" in pack.formatted_context


def test_context_respects_budget(tmp_data_dir, sample_markdown):
    """Test that context builder respects character budget."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    pack = build_context("What is AtoCore?", budget=500)
    assert pack.total_chars <= 500
    assert len(pack.formatted_context) <= 500


def test_context_with_project_hint(tmp_data_dir, sample_markdown):
    """Test that project hint boosts relevant chunks."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    pack = build_context("What is the architecture?", project_hint="atocore")
    assert len(pack.chunks_used) > 0
    assert pack.total_chars > 0


def test_context_builder_passes_project_hint_to_retrieval(monkeypatch):
    init_db()
    init_project_state_schema()

    calls = []

    def fake_retrieve(query, top_k=None, filter_tags=None, project_hint=None):
        calls.append((query, project_hint))
        return []

    monkeypatch.setattr("atocore.context.builder.retrieve", fake_retrieve)

    build_context("architecture", project_hint="p05-interferometer", budget=300)

    assert calls == [("architecture", "p05-interferometer")]


def test_last_context_pack_stored(tmp_data_dir, sample_markdown):
    """Test that last context pack is stored for debug."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    build_context("test prompt")
    last = get_last_context_pack()
    assert last is not None
    assert last.query == "test prompt"


def test_full_prompt_structure(tmp_data_dir, sample_markdown):
    """Test that the full prompt has correct structure."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    pack = build_context("What are memory types?")
    assert "knowledge base" in pack.full_prompt.lower()
    assert "What are memory types?" in pack.full_prompt


def test_project_state_included_in_context(tmp_data_dir, sample_markdown):
    """Test that trusted project state is injected into context."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    # Set some project state
    set_state("atocore", "status", "phase", "Phase 0.5 complete")
    set_state("atocore", "decision", "database", "SQLite for structured data")

    pack = build_context("What is AtoCore?", project_hint="atocore")

    # Project state should appear in context
    assert "--- Trusted Project State ---" in pack.formatted_context
    assert "Phase 0.5 complete" in pack.formatted_context
    assert "SQLite for structured data" in pack.formatted_context
    assert pack.project_state_chars > 0


def test_trusted_state_precedence_is_restated_in_retrieved_context(tmp_data_dir, sample_markdown):
    """When trusted state and retrieval coexist, the context should restate precedence explicitly."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    set_state("atocore", "status", "phase", "Phase 2")
    pack = build_context("What is AtoCore?", project_hint="atocore")

    assert "If retrieved context conflicts with Trusted Project State above" in pack.formatted_context


def test_project_state_takes_priority_budget(tmp_data_dir, sample_markdown):
    """Test that project state is included even with tight budget."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    set_state("atocore", "status", "phase", "Phase 1 in progress")

    # Small budget — project state should still be included
    pack = build_context("status?", project_hint="atocore", budget=500)
    assert "Phase 1 in progress" in pack.formatted_context


def test_project_state_respects_total_budget(tmp_data_dir, sample_markdown):
    """Trusted state should still fit within the total context budget."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    set_state("atocore", "status", "notes", "x" * 400)
    set_state("atocore", "decision", "details", "y" * 400)

    pack = build_context("status?", project_hint="atocore", budget=120)
    assert pack.total_chars <= 120
    assert pack.budget_remaining >= 0
    assert len(pack.formatted_context) <= 120


def test_project_hint_matches_state_case_insensitively(tmp_data_dir, sample_markdown):
    """Project state lookup should not depend on exact casing."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    set_state("AtoCore", "status", "phase", "Phase 2")
    pack = build_context("status?", project_hint="atocore")
    assert "Phase 2" in pack.formatted_context


def test_no_project_state_without_hint(tmp_data_dir, sample_markdown):
    """Test that project state is not included without project hint."""
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    set_state("atocore", "status", "phase", "Phase 1")

    pack = build_context("What is AtoCore?")
    assert pack.project_state_chars == 0
    assert "--- Trusted Project State ---" not in pack.formatted_context


def test_alias_hint_resolves_through_registry(tmp_data_dir, sample_markdown, monkeypatch):
    """An alias hint like 'p05' should find project state stored under 'p05-interferometer'.

    This is the regression test for the P1 finding from codex's review:
    /context/build was previously doing an exact-name lookup that
    silently dropped trusted project state when the caller passed an
    alias instead of the canonical project id.
    """
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    # Stand up a minimal project registry that knows the aliases.
    # The registry lives in a JSON file pointed to by
    # ATOCORE_PROJECT_REGISTRY_PATH; the dataclass-driven loader picks
    # it up on every call (no in-process cache to invalidate).
    registry_path = tmp_data_dir / "project-registry.json"
    registry_path.write_text(
        json.dumps(
            {
                "projects": [
                    {
                        "id": "p05-interferometer",
                        "aliases": ["p05", "interferometer"],
                        "description": "P05 alias-resolution regression test",
                        "ingest_roots": [
                            {"source": "vault", "subpath": "incoming/projects/p05"}
                        ],
                    }
                ]
            }
        ),
        encoding="utf-8",
    )
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    # Trusted state is stored under the canonical id (the way the
    # /project/state endpoint always writes it).
    set_state(
        "p05-interferometer",
        "status",
        "next_focus",
        "Wave 2 trusted-operational ingestion",
    )

    # The bug: pack with alias hint used to silently miss the state.
    pack_with_alias = build_context("status?", project_hint="p05", budget=2000)
    assert "Wave 2 trusted-operational ingestion" in pack_with_alias.formatted_context
    assert pack_with_alias.project_state_chars > 0

    # The canonical id should still work the same way.
    pack_with_canonical = build_context(
        "status?", project_hint="p05-interferometer", budget=2000
    )
    assert "Wave 2 trusted-operational ingestion" in pack_with_canonical.formatted_context

    # A second alias should also resolve.
    pack_with_other_alias = build_context(
        "status?", project_hint="interferometer", budget=2000
    )
    assert "Wave 2 trusted-operational ingestion" in pack_with_other_alias.formatted_context


def test_unknown_hint_falls_back_to_raw_lookup(tmp_data_dir, sample_markdown, monkeypatch):
    """A hint that isn't in the registry should still try the raw name.

    This preserves backwards compatibility with hand-curated
    project_state entries that predate the project registry.
    """
    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    # Empty registry — the hint won't resolve through it.
    registry_path = tmp_data_dir / "project-registry.json"
    registry_path.write_text('{"projects": []}', encoding="utf-8")
    monkeypatch.setenv("ATOCORE_PROJECT_REGISTRY_PATH", str(registry_path))
    config.settings = config.Settings()

    set_state("orphan-project", "status", "phase", "Solo run")

    pack = build_context("status?", project_hint="orphan-project", budget=2000)
    assert "Solo run" in pack.formatted_context


def test_project_memories_included_in_pack(tmp_data_dir, sample_markdown):
    """Active project-scoped memories for the target project should
    land in a dedicated '--- Project Memories ---' band so the
    Phase 9 reflection loop has a retrieval outlet."""
    from atocore.memory.service import create_memory

    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    mem = create_memory(
        memory_type="project",
        content="the mirror architecture is Option B conical back for p04-gigabit",
        project="p04-gigabit",
        confidence=0.9,
    )
    # A sibling memory for a different project must NOT leak into the pack.
    create_memory(
        memory_type="project",
        content="polisher suite splits into sim, post, control, contracts",
        project="p06-polisher",
        confidence=0.9,
    )

    pack = build_context(
        "remind me about the mirror architecture",
        project_hint="p04-gigabit",
        budget=3000,
    )
    assert "--- Project Memories ---" in pack.formatted_context
    assert "Option B conical back" in pack.formatted_context
    assert "polisher suite splits" not in pack.formatted_context
    assert pack.project_memory_chars > 0
    assert mem.project == "p04-gigabit"


def test_project_memories_absent_without_project_hint(tmp_data_dir, sample_markdown):
    """Without a project hint, project memories stay out of the pack —
    cross-project bleed would rot the signal."""
    from atocore.memory.service import create_memory

    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    create_memory(
        memory_type="project",
        content="scoped project knowledge that should not leak globally",
        project="p04-gigabit",
        confidence=0.9,
    )

    pack = build_context("tell me something", budget=3000)
    assert "--- Project Memories ---" not in pack.formatted_context
    assert pack.project_memory_chars == 0


def test_project_memories_query_relevance_ordering(tmp_data_dir, sample_markdown):
    """When the budget only fits one memory, query-relevance ordering
    should pick the one the query is actually about — even if another
    memory has higher confidence.

    Regression for the 2026-04-11 p05-vendor-signal harness failure:
    memory selection was fixed-order by confidence, so a lower-ranked
    vendor memory got starved out of the budget when a query was
    specifically about vendors.
    """
    from atocore.memory.service import create_memory

    init_db()
    init_project_state_schema()
    ingest_file(sample_markdown)

    create_memory(
        memory_type="project",
        content="the folded-beam interferometer uses a CGH stage and fold mirror",
        project="p05-interferometer",
        confidence=0.97,
    )
    create_memory(
        memory_type="knowledge",
        content="vendor signal: Zygo Verifire SV is the strongest value path for the interferometer",
        project="p05-interferometer",
        confidence=0.85,
    )

    pack = build_context(
        "what is the current vendor signal for the interferometer",
        project_hint="p05-interferometer",
        budget=1200,  # tight enough that only one project memory fits
    )
    assert "Zygo Verifire SV" in pack.formatted_context
    assert pack.project_memory_chars > 0
184 tests/test_database.py Normal file
@@ -0,0 +1,184 @@
"""Tests for SQLite connection pragmas and runtime behavior."""

import sqlite3

import atocore.config as config
from atocore.models.database import get_connection, init_db


def test_get_connection_applies_busy_timeout_and_wal(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_DB_BUSY_TIMEOUT_MS", "7000")

    original_settings = config.settings
    try:
        config.settings = config.Settings()
        init_db()
        with get_connection() as conn:
            busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
            journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
            foreign_keys = conn.execute("PRAGMA foreign_keys").fetchone()[0]
    finally:
        config.settings = original_settings

    assert busy_timeout == 7000
    assert str(journal_mode).lower() == "wal"
    assert foreign_keys == 1


def test_get_connection_uses_configured_timeout_value(tmp_path, monkeypatch):
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("ATOCORE_DB_BUSY_TIMEOUT_MS", "2500")

    original_settings = config.settings
    original_connect = sqlite3.connect
    calls = []

    def fake_connect(*args, **kwargs):
        calls.append(kwargs.get("timeout"))
        return original_connect(*args, **kwargs)

    try:
        config.settings = config.Settings()
        monkeypatch.setattr("atocore.models.database.sqlite3.connect", fake_connect)
        init_db()
    finally:
        config.settings = original_settings

    assert calls
    assert calls[0] == 2.5


def test_init_db_upgrades_pre_phase9_schema_without_failing(tmp_path, monkeypatch):
    """Regression test for the schema init ordering bug caught during
    the first real Dalidou deploy (report from 2026-04-08).

    Before the fix, SCHEMA_SQL contained CREATE INDEX statements that
    referenced columns (memories.project, interactions.project,
    interactions.session_id) added by _apply_migrations later in
    init_db. On a fresh install this worked because CREATE TABLE
    created the tables with the new columns before the CREATE INDEX
    ran, but on UPGRADE from a pre-Phase-9 schema the CREATE TABLE
    IF NOT EXISTS was a no-op and the CREATE INDEX hit
    OperationalError: no such column.

    This test seeds the tables with the OLD pre-Phase-9 shape, then
    calls init_db() and verifies that:

    - init_db does not raise
    - the new columns were added via _apply_migrations
    - the new indexes exist

    If the bug is reintroduced by moving a CREATE INDEX for a
    migration column back into SCHEMA_SQL, this test will fail
    with OperationalError before reaching the assertions.
    """
    monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
    original_settings = config.settings
    try:
        config.settings = config.Settings()

        # Step 1: create the data dir and open a direct connection
        config.ensure_runtime_dirs()
        db_path = config.settings.db_path

        # Step 2: seed the DB with the old pre-Phase-9 shape. No
        # project/last_referenced_at/reference_count on memories; no
        # project/client/session_id/response/memories_used/chunks_used
        # on interactions. We also need the prerequisite tables
        # (projects, source_documents, source_chunks) because the
        # memories table has an FK to source_chunks.
        with sqlite3.connect(str(db_path)) as conn:
            conn.executescript(
                """
                CREATE TABLE source_documents (
                    id TEXT PRIMARY KEY,
                    file_path TEXT UNIQUE NOT NULL,
                    file_hash TEXT NOT NULL,
                    title TEXT,
                    doc_type TEXT DEFAULT 'markdown',
                    tags TEXT DEFAULT '[]',
                    ingested_at DATETIME DEFAULT CURRENT_TIMESTAMP,
                    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
                );

                CREATE TABLE source_chunks (
                    id TEXT PRIMARY KEY,
                    document_id TEXT NOT NULL REFERENCES source_documents(id) ON DELETE CASCADE,
                    chunk_index INTEGER NOT NULL,
                    content TEXT NOT NULL,
                    heading_path TEXT DEFAULT '',
                    char_count INTEGER NOT NULL,
                    metadata TEXT DEFAULT '{}',
                    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
                );

                CREATE TABLE memories (
                    id TEXT PRIMARY KEY,
                    memory_type TEXT NOT NULL,
                    content TEXT NOT NULL,
                    source_chunk_id TEXT REFERENCES source_chunks(id),
                    confidence REAL DEFAULT 1.0,
                    status TEXT DEFAULT 'active',
                    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
                    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
                );

                CREATE TABLE projects (
                    id TEXT PRIMARY KEY,
                    name TEXT UNIQUE NOT NULL,
                    description TEXT DEFAULT '',
                    status TEXT DEFAULT 'active',
                    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
                    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
                );

                CREATE TABLE interactions (
                    id TEXT PRIMARY KEY,
                    prompt TEXT NOT NULL,
                    context_pack TEXT DEFAULT '{}',
                    response_summary TEXT DEFAULT '',
                    project_id TEXT REFERENCES projects(id),
                    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
                );
                """
            )
            conn.commit()

        # Step 3: call init_db — this used to raise on the upgrade
        # path. After the fix it should succeed.
        init_db()

        # Step 4: verify the migrations ran — Phase 9 columns present
        with sqlite3.connect(str(db_path)) as conn:
            conn.row_factory = sqlite3.Row
            memories_cols = {
                row["name"] for row in conn.execute("PRAGMA table_info(memories)")
            }
            interactions_cols = {
                row["name"]
                for row in conn.execute("PRAGMA table_info(interactions)")
            }

            assert "project" in memories_cols
            assert "last_referenced_at" in memories_cols
            assert "reference_count" in memories_cols

            assert "project" in interactions_cols
            assert "client" in interactions_cols
            assert "session_id" in interactions_cols
            assert "response" in interactions_cols
            assert "memories_used" in interactions_cols
            assert "chunks_used" in interactions_cols

            # Step 5: verify the indexes on migration columns exist
            index_rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type='index' AND tbl_name IN ('memories','interactions')"
            ).fetchall()
            index_names = {row["name"] for row in index_rows}

            assert "idx_memories_project" in index_names
            assert "idx_interactions_project_name" in index_names
            assert "idx_interactions_session" in index_names
    finally:
        config.settings = original_settings
374 tests/test_extractor.py Normal file
@@ -0,0 +1,374 @@
"""Tests for Phase 9 Commit C rule-based candidate extractor."""

from fastapi.testclient import TestClient

from atocore.interactions.service import record_interaction
from atocore.main import app
from atocore.memory.extractor import (
    MemoryCandidate,
    extract_candidates_from_interaction,
)
from atocore.memory.service import (
    create_memory,
    get_memories,
    promote_memory,
    reject_candidate_memory,
)
from atocore.models.database import init_db


def _capture(**fields):
    return record_interaction(
        prompt=fields.get("prompt", "unused"),
        response=fields.get("response", ""),
        response_summary=fields.get("response_summary", ""),
        project=fields.get("project", ""),
        reinforce=False,
    )


# --- extractor: heading patterns ------------------------------------------


def test_extractor_finds_decision_heading(tmp_data_dir):
    init_db()
    interaction = _capture(
        response=(
            "We talked about the frame.\n\n"
            "## Decision: switch the lateral supports to GF-PTFE pads\n\n"
            "Rationale: thermal stability."
        ),
    )
    results = extract_candidates_from_interaction(interaction)
    assert len(results) == 1
    assert results[0].memory_type == "adaptation"
    assert "GF-PTFE" in results[0].content
    assert results[0].rule == "decision_heading"


def test_extractor_finds_constraint_and_requirement_headings(tmp_data_dir):
    init_db()
    interaction = _capture(
        response=(
            "### Constraint: total mass must stay under 4.8 kg\n"
            "## Requirement: survives 12g shock in any axis\n"
        ),
    )
    results = extract_candidates_from_interaction(interaction)
    rules = {r.rule for r in results}
    assert "constraint_heading" in rules
    assert "requirement_heading" in rules
    constraint = next(r for r in results if r.rule == "constraint_heading")
    requirement = next(r for r in results if r.rule == "requirement_heading")
    assert constraint.memory_type == "project"
    assert requirement.memory_type == "project"
    assert "4.8 kg" in constraint.content
    assert "12g" in requirement.content


def test_extractor_finds_fact_heading(tmp_data_dir):
    init_db()
    interaction = _capture(
        response="## Fact: the polisher sim uses floating-point deltas in microns\n",
    )
    results = extract_candidates_from_interaction(interaction)
    assert len(results) == 1
    assert results[0].memory_type == "knowledge"
    assert results[0].rule == "fact_heading"


def test_extractor_heading_separator_variants(tmp_data_dir):
    """Decision headings should match with `:`, `-`, or em-dash."""
    init_db()
    for sep in (":", "-", "\u2014"):
        interaction = _capture(
            response=f"## Decision {sep} adopt option B for the mount interface\n",
        )
        results = extract_candidates_from_interaction(interaction)
        assert len(results) == 1, f"sep={sep!r}"
        assert "option B" in results[0].content


# --- extractor: sentence patterns -----------------------------------------


def test_extractor_finds_preference_sentence(tmp_data_dir):
    init_db()
    interaction = _capture(
        response=(
            "I prefer rebase-based workflows because history stays linear "
            "and reviewers have an easier time."
        ),
    )
    results = extract_candidates_from_interaction(interaction)
    pref_matches = [r for r in results if r.rule == "preference_sentence"]
    assert len(pref_matches) == 1
    assert pref_matches[0].memory_type == "preference"
    assert "rebase" in pref_matches[0].content.lower()


def test_extractor_finds_decided_to_sentence(tmp_data_dir):
    init_db()
    interaction = _capture(
        response=(
            "After going through the options we decided to keep the legacy "
            "calibration routine for the July milestone."
        ),
    )
    results = extract_candidates_from_interaction(interaction)
    decision_matches = [r for r in results if r.rule == "decided_to_sentence"]
    assert len(decision_matches) == 1
    assert decision_matches[0].memory_type == "adaptation"
    assert "legacy calibration" in decision_matches[0].content.lower()


def test_extractor_finds_requirement_sentence(tmp_data_dir):
    init_db()
    interaction = _capture(
        response=(
            "One of the findings: the requirement is that the interferometer "
            "must resolve 50 picometer displacements at 1 kHz bandwidth."
        ),
    )
    results = extract_candidates_from_interaction(interaction)
    req_matches = [r for r in results if r.rule == "requirement_sentence"]
    assert len(req_matches) == 1
    assert req_matches[0].memory_type == "project"
    assert "picometer" in req_matches[0].content.lower()


# --- extractor: content rules ---------------------------------------------


def test_extractor_rejects_too_short_matches(tmp_data_dir):
    init_db()
    interaction = _capture(response="## Decision: yes\n")  # too short after clean
    results = extract_candidates_from_interaction(interaction)
    assert results == []


def test_extractor_deduplicates_identical_matches(tmp_data_dir):
    init_db()
    interaction = _capture(
        response=(
            "## Decision: use the modular frame variant for prototyping\n"
            "## Decision: use the modular frame variant for prototyping\n"
        ),
    )
    results = extract_candidates_from_interaction(interaction)
    assert len(results) == 1


def test_extractor_strips_trailing_punctuation(tmp_data_dir):
    init_db()
    interaction = _capture(
        response="## Decision: defer the laser redesign to Q3.\n",
    )
    results = extract_candidates_from_interaction(interaction)
    assert len(results) == 1
    assert results[0].content.endswith("Q3")


def test_extractor_includes_project_and_source_interaction_id(tmp_data_dir):
    init_db()
    interaction = _capture(
        project="p05-interferometer",
        response="## Decision: freeze the optical path for the prototype run\n",
    )
    results = extract_candidates_from_interaction(interaction)
    assert len(results) == 1
    assert results[0].project == "p05-interferometer"
    assert results[0].source_interaction_id == interaction.id


def test_extractor_drops_candidates_matching_existing_active(tmp_data_dir):
    init_db()
    # Seed an active memory that the extractor would otherwise re-propose
    create_memory(
        memory_type="preference",
        content="prefers small reviewable diffs",
    )
    interaction = _capture(
        response="Remember that I prefer small reviewable diffs because they merge faster.",
    )
    results = extract_candidates_from_interaction(interaction)
    # The only candidate would have been the preference, now dropped
    assert not any(r.content.lower() == "small reviewable diffs" for r in results)


def test_extractor_returns_empty_for_no_patterns(tmp_data_dir):
    init_db()
    interaction = _capture(response="Nothing structural here, just prose.")
    results = extract_candidates_from_interaction(interaction)
    assert results == []


# --- service: candidate lifecycle -----------------------------------------


def test_candidate_and_active_can_coexist(tmp_data_dir):
    init_db()
    active = create_memory(
        memory_type="preference",
        content="logs every config change to the change log",
        status="active",
    )
    candidate = create_memory(
        memory_type="preference",
        content="logs every config change to the change log",
        status="candidate",
    )
    # The two are distinct rows because status is part of the dedup key
    assert active.id != candidate.id


def test_promote_memory_moves_candidate_to_active(tmp_data_dir):
    init_db()
    candidate = create_memory(
        memory_type="adaptation",
        content="moved the staging scripts into deploy/staging",
        status="candidate",
    )
    ok = promote_memory(candidate.id)
    assert ok is True

    active_list = get_memories(memory_type="adaptation", status="active")
    assert any(m.id == candidate.id for m in active_list)


def test_promote_memory_on_non_candidate_returns_false(tmp_data_dir):
    init_db()
    active = create_memory(
        memory_type="adaptation",
        content="already active adaptation entry",
        status="active",
    )
    assert promote_memory(active.id) is False


def test_reject_candidate_moves_it_to_invalid(tmp_data_dir):
    init_db()
    candidate = create_memory(
        memory_type="knowledge",
        content="the calibration uses barometric pressure compensation",
        status="candidate",
    )
    ok = reject_candidate_memory(candidate.id)
    assert ok is True

    invalid_list = get_memories(memory_type="knowledge", status="invalid")
    assert any(m.id == candidate.id for m in invalid_list)


def test_reject_on_non_candidate_returns_false(tmp_data_dir):
    init_db()
    active = create_memory(memory_type="preference", content="always uses structured logging")
    assert reject_candidate_memory(active.id) is False


def test_get_memories_filters_by_candidate_status(tmp_data_dir):
    init_db()
    create_memory(memory_type="preference", content="active one", status="active")
    create_memory(memory_type="preference", content="candidate one", status="candidate")
    create_memory(memory_type="preference", content="another candidate", status="candidate")
    candidates = get_memories(status="candidate", memory_type="preference")
    assert len(candidates) == 2
    assert all(c.status == "candidate" for c in candidates)


# --- API: extract / promote / reject / list -------------------------------


def test_api_extract_interaction_without_persist(tmp_data_dir):
    init_db()
    interaction = record_interaction(
        prompt="review",
        response="## Decision: flip the default budget to 4000 for p05\n",
        reinforce=False,
    )
    client = TestClient(app)
    response = client.post(f"/interactions/{interaction.id}/extract", json={})
    assert response.status_code == 200
    body = response.json()
    assert body["candidate_count"] == 1
    assert body["persisted"] is False
    assert body["persisted_ids"] == []
    # The candidate should NOT have been written to the memory table
    queue = get_memories(status="candidate")
    assert queue == []


def test_api_extract_interaction_with_persist(tmp_data_dir):
    init_db()
    interaction = record_interaction(
        prompt="review",
        response=(
            "## Decision: pin the embedding model to v2.3 for Wave 2\n"
            "## Constraint: context budget must stay under 4000 chars\n"
        ),
        reinforce=False,
    )
    client = TestClient(app)
    response = client.post(
        f"/interactions/{interaction.id}/extract", json={"persist": True}
    )
    assert response.status_code == 200
    body = response.json()
    assert body["candidate_count"] == 2
    assert body["persisted"] is True
    assert len(body["persisted_ids"]) == 2

    queue = get_memories(status="candidate", limit=50)
    assert len(queue) == 2


def test_api_extract_returns_404_for_missing_interaction(tmp_data_dir):
    init_db()
    client = TestClient(app)
    response = client.post("/interactions/nope/extract", json={})
    assert response.status_code == 404


def test_api_promote_and_reject_endpoints(tmp_data_dir):
    init_db()
    candidate = create_memory(
        memory_type="adaptation",
        content="restructured the ingestion pipeline into layered stages",
        status="candidate",
    )
    client = TestClient(app)

    promote_response = client.post(f"/memory/{candidate.id}/promote")
    assert promote_response.status_code == 200
    assert promote_response.json()["status"] == "promoted"

    # Promoting it again should 404 because it's no longer a candidate
    second_promote = client.post(f"/memory/{candidate.id}/promote")
    assert second_promote.status_code == 404

    reject_response = client.post("/memory/does-not-exist/reject")
    assert reject_response.status_code == 404


def test_api_get_memory_candidate_status_filter(tmp_data_dir):
    init_db()
    create_memory(memory_type="preference", content="prefers explicit types", status="active")
    create_memory(
        memory_type="preference",
        content="prefers pull requests sized by diff lines not files",
        status="candidate",
    )
    client = TestClient(app)
    response = client.get("/memory", params={"status": "candidate"})
    assert response.status_code == 200
    body = response.json()
    assert "candidate" in body["statuses"]
    assert len(body["memories"]) == 1
    assert body["memories"][0]["status"] == "candidate"


def test_api_get_memory_invalid_status_returns_400(tmp_data_dir):
    init_db()
    client = TestClient(app)
    response = client.get("/memory", params={"status": "not-a-status"})
    assert response.status_code == 400
Some files were not shown because too many files have changed in this diff.