Compare commits
19 Commits
codex/dali
...
330ecfb6a6
| SHA1 |
|---|
| 330ecfb6a6 |
| d9dc55f841 |
| 81307cec47 |
| 59331e522d |
| b3253f35ee |
| 30ee857d62 |
| 38f6e525af |
| 37331d53ef |
| 5aeeb1cad1 |
| 4da81c9e4e |
| 7bf83bf46a |
| 1161645415 |
| 5913da53c5 |
| 8ea53f4003 |
| 9366ba7879 |
| c5bad996a7 |
| 0b1742770a |
| 2829d5ec1c |
| f49637b5cc |
@@ -1,5 +1,13 @@
# AGENTS.md

## Session protocol (read first, every session)

**Before doing anything else, read `DEV-LEDGER.md` at the repo root.** It is the one-file source of truth for "what is currently true": live SHA, active plan, open review findings, recent decisions. The narrative docs under `docs/` may lag; the ledger does not.

**Before ending a session, append a Session Log line to `DEV-LEDGER.md`** with what you did and which commit range it covers, and bump the Orientation section if anything there changed.

This rule applies equally to Claude, Codex, and any future agent working in this repo.

## Project role
This repository is AtoCore, the runtime and machine-memory layer of the Ato ecosystem.

30
CLAUDE.md
Normal file
@@ -0,0 +1,30 @@
# CLAUDE.md - project instructions for AtoCore

## Session protocol

Before doing anything else in this repo, read `DEV-LEDGER.md` at the repo root. It is the shared operating memory between Claude, Codex, and the human operator: live Dalidou SHA, active plan, open P1/P2 review findings, recent decisions, and session log. The narrative docs under `docs/` sometimes lag; the ledger does not.

Before ending a session, append a Session Log line to `DEV-LEDGER.md` covering:

- which commits you produced (sha range)
- what changed at a high level
- any harness / test count deltas
- anything you overclaimed and later corrected

Bump the **Orientation** section if `live_sha`, `main_tip`, `test_count`, or `harness` changed.

`AGENTS.md` at the repo root carries the broader project principles (storage separation, deployment model, coding guidance). Read it when you need the "why" behind a constraint.

## Deploy workflow

```bash
git push origin main && ssh papa@dalidou "bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh"
```

The deploy script self-verifies via `/health` build_sha; if it exits non-zero, do not assume the change is live.

## Working model

- Claude builds; Codex audits. No parallel work on the same files.
- P1 review findings block further `main` commits until acknowledged in the ledger's **Open Review Findings** table.
- Codex branches must fork from `origin/main` (no orphan commits that require `--allow-unrelated-histories`).
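Review note: the deploy workflow above relies on comparing the pushed commit with the `build_sha` that `/health` reports. A minimal sketch of that comparison follows, assuming `/health` returns JSON with a top-level `build_sha` key; the helper name `deploy_is_live` and the prefix-matching rule are illustrative assumptions, not the logic of the real `deploy.sh`. Prefix matching is used because the ledger records short SHAs such as `38f6e52` while the commit list shows longer ones.

```python
def deploy_is_live(expected_sha: str, health: dict) -> bool:
    """Return True when the SHA reported by /health matches the expected commit.

    `health` is the parsed JSON body of /health. The `build_sha` key is an
    assumption based on the ledger wording, not a confirmed schema.
    """
    reported = str(health.get("build_sha", "")).strip().lower()
    expected = expected_sha.strip().lower()
    # Refuse to match on very short or missing SHAs: too easy to collide.
    if len(reported) < 7 or len(expected) < 7:
        return False
    # Accept either side being an abbreviation of the other.
    return expected.startswith(reported) or reported.startswith(expected)
```

In practice the dictionary would come from parsing the output of the `curl -s http://localhost:8100/health` command listed under Quick Commands.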
172
DEV-LEDGER.md
Normal file
@@ -0,0 +1,172 @@
# AtoCore Dev Ledger

> Shared operating memory between humans, Claude, and Codex.
> **Every session MUST read this file at start and append a Session Log entry before ending.**
> Section headers are stable - do not rename them. Trim Session Log and Recent Decisions to the last 20 entries at session end; older history lives in `git log` and `docs/`.

## Orientation

- **live_sha** (Dalidou `/health` build_sha): `38f6e52`
- **last_updated**: 2026-04-11 by Claude (Day 1+2 eval on working branch, Day 4 gate escalated)
- **main_tip**: `d9dc55f` (unchanged; Day 1+2 artifacts live on `claude/extractor-eval-loop @ 7d8d599`)
- **test_count**: 264 passing
- **harness**: `6/6 PASS` (`python scripts/retrieval_eval.py` against live Dalidou)
- **off_host_backup**: `papa@192.168.86.39:/home/papa/atocore-backups/` via cron env `ATOCORE_BACKUP_RSYNC`, verified

## Active Plan

**Mini-phase**: Extractor improvement (eval-driven) + retrieval harness expansion.
**Duration**: 8 days, hard gates at each day boundary.
**Plan author**: Codex (2026-04-11). **Executor**: Claude. **Audit**: Codex.

### Preflight (before Day 1)

Stop if any of these fail:

- `git rev-parse HEAD` on `main` matches the expected branching tip
- Live `/health` on Dalidou reports the SHA you think is deployed
- `python scripts/retrieval_eval.py --json` still passes at the current baseline
- `batch-extract` over the known 42-capture slice reproduces the current low-yield baseline
- A frozen sample set exists for extractor labeling so the target does not move mid-phase

Success: baseline eval output saved, baseline extract output saved, working branch created from `origin/main`.

### Day 1 - Labeled extractor eval set

Pick 30 real captures: 10 that should produce 0 candidates, 10 that should plausibly produce 1, 10 ambiguous/hard. Store as a stable artifact (interaction id, expected count, expected type, notes). Add a runner that scores extractor output against labels.

Success: 30 labeled interactions in a stable artifact, one-command precision/recall output.
Fail-early: if labeling 30 takes more than a day because the concept is unclear, tighten the extraction target before touching code.

### Day 2 - Measure current extractor

Run the rule-based extractor on all 30. Record yield, TP, FP, FN. Bucket misses by class (conversational preference, decision summary, status/constraint, meta chatter).

Success: short scorecard with counts by miss type, top 2 miss classes obvious.
Fail-early: if the labeled set shows fewer than 5 plausible positives total, the corpus is too weak - relabel before tuning.

### Day 3 - Smallest rule expansion for top miss class

Add 1-2 narrow, explainable rules for the worst miss class. Add unit tests from real paraphrase examples in the labeled set. Then rerun eval.

Success: recall up on the labeled set, false positives do not materially rise, new tests cover the new cue class.
Fail-early: if one rule expansion raises FP above ~20% of extracted candidates, revert or narrow before adding more.

### Day 4 - Decision gate: more rules or LLM-assisted prototype

If rule expansion reaches a **meaningfully reviewable queue**, keep going with rules. Otherwise prototype an LLM-assisted extraction mode behind a flag.

"Meaningfully reviewable queue":
- >= 15-25% candidate yield on the 30 labeled captures
- FP rate low enough that manual triage feels tolerable
- >= 2 real non-synthetic candidates worth review

Hard stop: if candidate yield is still under 10% after this point, stop rule tinkering and switch to architecture review (LLM-assisted OR narrower extraction scope).

### Day 5 - Stabilize and document

Add remaining focused rules or the flagged LLM-assisted path. Write down in-scope and out-of-scope utterance kinds.

Success: labeled eval green against target threshold, extractor scope explainable in <= 5 bullets.

### Day 6 - Retrieval harness expansion (6 -> 15-20 fixtures)

Grow across p04/p05/p06. Include short ambiguous prompts, cross-project collision cases, expected project-state wins, expected project-memory wins, and 1-2 "should fail open / low confidence" cases.

Success: >= 15 fixtures, each active project has easy + medium + hard cases.
Fail-early: if fixtures are mostly obvious wins, add harder adversarial cases before claiming coverage.

### Day 7 - Regression pass and calibration

Run harness on current code vs live Dalidou. Inspect failures (ranking, ingestion gap, project bleed, budget). Make at most ONE ranking/budget tweak if the harness clearly justifies it. Do not mix harness expansion and ranking changes in a single commit unless tightly coupled.

Success: harness still passes or improves after extractor work; any ranking tweak is justified by a concrete fixture delta.
Fail-early: if > 20-25% of harness fixtures regress after extractor changes, separate concerns before merging.

### Day 8 - Merge and close

Clean commit sequence. Save before/after metrics (extractor scorecard, harness results). Update docs only with claims the metrics support.

Merge order: labeled corpus + runner -> extractor improvements + tests -> harness expansion -> any justified ranking tweak -> docs sync last.

Success: point to a before/after delta for both extraction and retrieval; docs do not overclaim.

### Hard Gates (stop/rethink points)

- Extractor yield < 10% after 30 labeled interactions -> stop, reconsider rule-only extraction
- FP rate > 20% on labeled set -> narrow rules before adding more
- Harness expansion finds < 3 genuinely hard cases -> harness still too soft
- Ranking change improves one project but regresses another -> do not merge without explicit tradeoff note

### Branching

One branch `codex/extractor-eval-loop` for Day 1-5, a second `codex/retrieval-harness-expansion` for Day 6-7. Keeps extraction and retrieval judgments auditable.

## Review Protocol

- Codex records review findings in **Open Review Findings**.
- Claude must read **Open Review Findings** at session start before coding.
- Codex owns finding text. Claude may update operational fields only:
  - `status`
  - `owner`
  - `resolved_by`
- If Claude disagrees with a finding, do not rewrite it. Mark it `declined` and explain why in the **Session Log**.
- Any commit or session that addresses a finding should reference the finding id in the commit message or **Session Log**.
- `P1` findings block further commits in the affected area until they are at least acknowledged and explicitly tracked.
- Findings may be code-level, claim-level, or ops-level. If the implementation boundary changes, retarget the finding instead of silently closing it.

## Open Review Findings

| id | finder | severity | file:line | summary | status | owner | opened_at | resolved_by |
|----|--------|----------|-----------|---------|--------|-------|-----------|-------------|
| R1 | Codex | P1 | deploy/hooks/capture_stop.py:76-85 | Live Claude capture still omits `extract`, so "loop closed both sides" remains overstated in practice even though the API supports it | acknowledged | Claude | 2026-04-11 | |
| R2 | Codex | P1 | src/atocore/context/builder.py | Project memories excluded from pack | fixed | Claude | 2026-04-11 | 8ea53f4 |
| R3 | Claude | P2 | src/atocore/memory/extractor.py | Rule cues (`## Decision:`) never fire on conversational LLM text | open | Claude | 2026-04-11 | |
| R4 | Codex | P2 | DEV-LEDGER.md:11 | Orientation `main_tip` was stale versus `HEAD` / `origin/main` | fixed | Codex | 2026-04-11 | 81307ce |

## Recent Decisions

- **2026-04-11** Adopt this ledger as shared operating memory between Claude and Codex. *Proposed by:* Antoine. *Ratified by:* Antoine.
- **2026-04-11** Accept Codex's 8-day mini-phase plan verbatim as Active Plan. *Proposed by:* Codex. *Ratified by:* Antoine.
- **2026-04-11** Review findings live in `DEV-LEDGER.md` with Codex owning finding text and Claude updating status fields only. *Proposed by:* Codex. *Ratified by:* Antoine.
- **2026-04-11** Project memories land in the pack under `--- Project Memories ---` at 25% budget ratio, gated on canonical project hint. *Proposed by:* Claude.
- **2026-04-11** Extraction stays off the capture hot path. Batch / manual only. *Proposed by:* Antoine.
- **2026-04-11** 4-step roadmap: extractor -> harness expansion -> Wave 2 ingestion -> OpenClaw finish. Steps 1+2 as one mini-phase. *Ratified by:* Antoine.
- **2026-04-11** Codex branches must fork from `main`, not be orphan commits. *Proposed by:* Claude. *Agreed by:* Codex.

## Session Log

- **2026-04-11 Claude** `claude/extractor-eval-loop @ 7d8d599`: Day 1+2 of the mini-phase. Froze a 64-interaction snapshot (`scripts/eval_data/interactions_snapshot_2026-04-11.json`) and labeled 20 by length-stratified random sample (5 positive, 15 zero; 7 total expected candidates). Built `scripts/extractor_eval.py` as a file-based eval runner. **Day 2 baseline: rule extractor hit 0% yield / 0% recall / 0% precision on the labeled set; 5 false negatives across 5 distinct miss classes (recommendation_prose, architectural_change_summary, spec_update_announcement, layered_recommendation, alignment_assertion).** This is the Day 4 hard-stop signal arriving two days early: a single rule expansion cannot close a 5-way miss, and widening rules blindly will collapse precision. The Day 4 decision gate is escalated to Antoine for ratification before Day 3 touches any extractor code. No extractor code on main has changed.
- **2026-04-11 Codex (ledger audit)** fixed stale `main_tip`, retargeted R1 from the API surface to the live Claude Stop hook, and formalized the review write protocol so Claude can consume findings without rewriting them.
- **2026-04-11 Claude** `b3253f3..59331e5` (1 commit). Wired the DEV-LEDGER, added session protocol to AGENTS.md, created project-local CLAUDE.md, deleted stale `codex/port-atocore-ops-client` remote branch. No code changes, no redeploy needed.
- **2026-04-11 Claude** `c5bad99..b3253f3` (11 commits + 1 merge). Length-aware reinforcement, project memories in pack, query-relevance memory ranking, hyphenated-identifier tokenizer, retrieval eval harness seeded, off-host backup wired end-to-end, docs synced, codex integration-pass branch merged. Harness went 0->6/6 on live Dalidou.
- **2026-04-11 Codex (async review)** identified 2 P1s against a stale checkout. R1 was fair (extraction not automated), R2 was outdated (project memories already landed on main). Delivered the 8-day execution plan now in Active Plan.
- **2026-04-06 Antoine** created `codex/atocore-integration-pass` with the `t420-openclaw/` workspace (merged 2026-04-11).

## Working Rules

- Claude builds; Codex audits. No parallel work on the same files.
- Codex branches fork from `main`: `git fetch origin && git checkout -b codex/<topic> origin/main`.
- P1 findings block further main commits until acknowledged in Open Review Findings.
- Every session appends at least one Session Log line and bumps Orientation.
- Trim Session Log and Recent Decisions to the last 20 at session end.
- Docs in `docs/` may overclaim or carry stale status; the ledger is the one-file source of truth for "what is true right now."

## Quick Commands

```bash
# Check live state
ssh papa@dalidou "curl -s http://localhost:8100/health"

# Run the retrieval harness
python scripts/retrieval_eval.py          # human-readable
python scripts/retrieval_eval.py --json   # machine-readable

# Deploy a new main tip
git push origin main && ssh papa@dalidou "bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh"

# Reflection-loop ops
python scripts/atocore_client.py batch-extract '' '' 200 false   # preview
python scripts/atocore_client.py batch-extract '' '' 200 true    # persist
python scripts/atocore_client.py triage
```
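Review note: the Active Plan asks for a runner with "one-command precision/recall output" and defines hard gates on yield and FP rate. The sketch below shows one way those numbers could be computed; it is not the contents of `scripts/extractor_eval.py`, which this diff does not include. Assumptions, all hypothetical: scoring is per interaction (an interaction counts as a hit when the extractor produces at least one candidate), FP rate is measured against interactions that produced candidates, precision and recall fall back to 0.0 when their denominator is zero (consistent with the ledger reporting 0% precision for a run that extracted nothing), and the names `Scorecard`, `score`, and `hard_gates` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    total: int  # labeled interactions scored
    tp: int     # labeled positive, extractor produced >= 1 candidate
    fp: int     # labeled zero, extractor produced >= 1 candidate
    fn: int     # labeled positive, extractor produced nothing

    @property
    def candidate_yield(self) -> float:
        return (self.tp + self.fp) / self.total if self.total else 0.0

    @property
    def precision(self) -> float:
        hits = self.tp + self.fp
        return self.tp / hits if hits else 0.0  # 0.0 by convention when nothing extracted

    @property
    def recall(self) -> float:
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    @property
    def fp_rate(self) -> float:
        hits = self.tp + self.fp
        return self.fp / hits if hits else 0.0


def score(expected: dict[str, int], produced: dict[str, int]) -> Scorecard:
    """Compare expected candidate counts with produced counts, per interaction id."""
    tp = fp = fn = 0
    for interaction_id, expected_count in expected.items():
        got_any = produced.get(interaction_id, 0) > 0
        if expected_count > 0 and got_any:
            tp += 1
        elif expected_count > 0:
            fn += 1
        elif got_any:
            fp += 1
    return Scorecard(total=len(expected), tp=tp, fp=fp, fn=fn)


def hard_gates(card: Scorecard) -> list[str]:
    """Return the plan's stop/rethink gates that this scorecard trips."""
    tripped = []
    if card.candidate_yield < 0.10:
        tripped.append("yield < 10%: stop, reconsider rule-only extraction")
    if card.fp_rate > 0.20:
        tripped.append("FP rate > 20%: narrow rules before adding more")
    return tripped


# Same shape as the Day 2 baseline (20 labeled, 5 positive, nothing extracted).
# The ids and per-interaction counts are placeholders, not the real labels.
labels = {f"i{n}": (1 if n < 5 else 0) for n in range(20)}
baseline = score(labels, {})
```

Under these assumptions the baseline trips only the yield gate, which matches the Session Log's reading that the Day 4 hard stop arrived early.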
85
deploy/dalidou/cron-backup.sh
Executable file
@@ -0,0 +1,85 @@
#!/usr/bin/env bash
#
# deploy/dalidou/cron-backup.sh
# ------------------------------
# Daily backup + retention cleanup via the AtoCore API.
#
# Intended to run from cron on Dalidou:
#
#   # Daily at 03:00 UTC
#   0 3 * * * /srv/storage/atocore/app/deploy/dalidou/cron-backup.sh >> /var/log/atocore-backup.log 2>&1
#
# What it does:
#   1. Creates a runtime backup (db + registry, no chroma by default)
#   2. Runs retention cleanup with --confirm to delete old snapshots
#   3. Logs results to stdout (captured by cron into the log file)
#
# Fail-open: exits 0 even on API errors so cron doesn't send noise
# emails. Check /var/log/atocore-backup.log for diagnostics.
#
# Environment variables:
#   ATOCORE_URL            default http://127.0.0.1:8100
#   ATOCORE_BACKUP_CHROMA  default false (set to "true" for cold chroma copy)
#   ATOCORE_BACKUP_DIR     default /srv/storage/atocore/backups
#   ATOCORE_BACKUP_RSYNC   optional rsync destination for off-host copies
#                          (e.g. papa@laptop:/home/papa/atocore-backups/)
#                          When set, the local snapshots tree is rsynced to
#                          the destination after cleanup. Unset = skip.
#                          SSH key auth must already be configured from this
#                          host to the destination.

set -euo pipefail

ATOCORE_URL="${ATOCORE_URL:-http://127.0.0.1:8100}"
INCLUDE_CHROMA="${ATOCORE_BACKUP_CHROMA:-false}"
BACKUP_DIR="${ATOCORE_BACKUP_DIR:-/srv/storage/atocore/backups}"
RSYNC_TARGET="${ATOCORE_BACKUP_RSYNC:-}"
TIMESTAMP="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

log() { printf '[%s] %s\n' "$TIMESTAMP" "$*"; }

log "=== AtoCore daily backup starting ==="

# Step 1: Create backup
log "Step 1: creating backup (chroma=$INCLUDE_CHROMA)"
BACKUP_RESULT=$(curl -sf -X POST \
    -H "Content-Type: application/json" \
    -d "{\"include_chroma\": $INCLUDE_CHROMA}" \
    "$ATOCORE_URL/admin/backup" 2>&1) || {
    log "ERROR: backup creation failed: $BACKUP_RESULT"
    exit 0
}
log "Backup created: $BACKUP_RESULT"

# Step 2: Retention cleanup (confirm=true to actually delete)
log "Step 2: running retention cleanup"
CLEANUP_RESULT=$(curl -sf -X POST \
    -H "Content-Type: application/json" \
    -d '{"confirm": true}' \
    "$ATOCORE_URL/admin/backup/cleanup" 2>&1) || {
    log "ERROR: cleanup failed: $CLEANUP_RESULT"
    exit 0
}
log "Cleanup result: $CLEANUP_RESULT"

# Step 3: Off-host rsync (optional). Fail-open: log but don't abort
# the cron so a laptop being offline at 03:00 UTC never turns the
# local backup path red.
if [[ -n "$RSYNC_TARGET" ]]; then
    log "Step 3: rsyncing snapshots to $RSYNC_TARGET"
    if [[ ! -d "$BACKUP_DIR/snapshots" ]]; then
        log "WARN: $BACKUP_DIR/snapshots does not exist, skipping rsync"
    else
        RSYNC_OUTPUT=$(rsync -a --delete \
            -e "ssh -o ConnectTimeout=10 -o BatchMode=yes -o StrictHostKeyChecking=accept-new" \
            "$BACKUP_DIR/snapshots/" "$RSYNC_TARGET" 2>&1) && {
            log "Rsync complete"
        } || {
            log "WARN: rsync to $RSYNC_TARGET failed (offline or auth?): $RSYNC_OUTPUT"
        }
    fi
else
    log "Step 3: ATOCORE_BACKUP_RSYNC not set, skipping off-host copy"
fi

log "=== AtoCore daily backup complete ==="
@@ -3,7 +3,7 @@

 Reads the Stop hook JSON from stdin, extracts the last user prompt
 from the transcript JSONL, and POSTs to the AtoCore /interactions
-endpoint in conservative mode (reinforce=false, no extraction).
+endpoint with reinforcement enabled (no extraction).

 Fail-open: always exits 0, logs errors to stderr only.

@@ -81,7 +81,7 @@ def _capture() -> None:
         "client": "claude-code",
         "session_id": session_id,
         "project": project,
-        "reinforce": False,
+        "reinforce": True,
     }

     body = json.dumps(payload, ensure_ascii=True).encode("utf-8")
@@ -32,7 +32,18 @@ read-only additive mode.
 ### Baseline Complete

 - Phase 9 - Reflection (all three foundation commits landed:
-  A capture, B reinforcement, C candidate extraction + review queue)
+  A capture, B reinforcement, C candidate extraction + review queue).
+  As of 2026-04-11 the capture → reinforce half runs automatically on
+  every Stop-hook capture (length-aware token-overlap matcher handles
+  paragraph-length memories), and project-scoped memories now reach
+  the context pack via a dedicated `--- Project Memories ---` band
+  between identity/preference and retrieved chunks. The extract half
+  is still a manual / batch flow by design (`scripts/atocore_client.py
+  batch-extract` + `triage`). First live batch-extract run over 42
+  captured interactions produced 1 candidate (rule extractor is
+  conservative and keys on structural cues like `## Decision:`
+  headings that rarely appear in conversational LLM responses);
+  extractor tuning is a known follow-up.

 ### Not Yet Complete In The Intended Sense

@@ -167,7 +178,9 @@ These remain intentionally deferred.

 - automatic write-back from OpenClaw into AtoCore
 - automatic memory promotion
-- reflection loop integration
+- ~~reflection loop integration~~ - baseline now in (capture→reinforce
+  auto, extract batch/manual). Extractor tuning and scheduled batch
+  extraction still open.
 - replacing OpenClaw's own memory system
 - live machine-DB sync between machines
 - full ontology / graph expansion before the current baseline is stable
@@ -137,7 +137,12 @@ P06:

 - automatic write-back from OpenClaw into AtoCore
 - automatic memory promotion
-- reflection loop integration
+- ~~reflection loop integration~~ - baseline now landed (2026-04-11):
+  Stop hook runs reinforce automatically, project memories are folded
+  into the context pack, batch-extract and triage CLIs exist. What
+  remains deferred: scheduled/automatic batch extraction and extractor
+  rule tuning (rule-based extractor produced 1 candidate from 42 real
+  captures; it needs new cues for conversational LLM content).
 - replacing OpenClaw's own memory system
 - syncing the live machine DB between machines

@@ -159,6 +164,77 @@ The next batch is successful if:
- project ingestion remains controlled rather than noisy
- the canonical Dalidou instance stays stable

## Retrieval Quality Review - 2026-04-11

First sweep with real project-hinted queries on Dalidou. Used
`POST /context/build` against p04, p05, p06 with representative
questions and inspected `formatted_context`.

Findings:

- **Trusted Project State is surfacing correctly.** The DECISION and
  REQUIREMENT categories appear at the top of the pack and include
  the expected key facts (e.g. p04 "Option B conical-back mirror
  architecture"). This is the strongest signal in the pack today.
- **Chunk retrieval is on-topic but broad.** Top chunks for
  the p04 architecture query are PDR intro, CAD assembly overview,
  and the index, all on the right project, but none of them directly
  answers the "why was Option B chosen" question. The authoritative
  answer sits in Project State, not in the chunks.
- **Active memories are NOT reaching the pack.** The context builder
  surfaces Trusted Project State and retrieved chunks but does not
  include the 21 active project/knowledge memories. Reinforcement
  (Phase 9 Commit B) bumps memory confidence without the memory ever
  being read back into a prompt, so the reflection loop has no outlet
  on the retrieval side. This is a design gap, not a bug: it needs a
  decision on whether memories should feed into context assembly,
  and if so at what trust level (below project_state, above chunks).
- **Cross-project bleed is low.** The p04 query did pull one p05
  chunk (CGH_Design_Input_for_AOM) as the bottom hit, but the top-4
  were all p04.

Proposed follow-ups (not yet scheduled):

1. ~~Decide whether memories should be folded into `formatted_context`
   and under what section header.~~ DONE 2026-04-11 (commits 8ea53f4,
   5913da5, 1161645). A `--- Project Memories ---` band now sits
   between identity/preference and retrieved chunks, gated on a
   canonical project hint to prevent cross-project bleed. Budget
   ratio 0.25 (tuned empirically: paragraph memories are ~400 chars
   and the earlier 0.15 ratio starved the first entry by one char).
   Verified live: p04 architecture query surfaces the Option B memory.
2. Re-run the same three queries after any builder change and compare
   `formatted_context` diffs. Still open; this is the natural entry
   point for the retrieval eval harness on the roadmap.

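Review note: the remark that a 0.15 budget ratio "starved the first entry by one char" is easier to follow with a worked case. The sketch below is a hypothetical illustration only: the function name `pack_band`, the greedy in-order packing, the rounding rule, and the total budget of 2660 characters are all assumptions chosen to reproduce a one-character shortfall, not values taken from `src/atocore/context/builder.py`.

```python
def pack_band(memories: list[str], total_budget: int, ratio: float) -> list[str]:
    """Greedily pack memories, in order, into a band of total_budget * ratio chars."""
    band_budget = round(total_budget * ratio)
    packed: list[str] = []
    used = 0
    for text in memories:
        if used + len(text) > band_budget:
            break  # this entry does not fit; stop rather than truncate it
        packed.append(text)
        used += len(text)
    return packed


# A paragraph-length memory of 400 characters (placeholder text).
memory = "x" * 400

# Hypothetical total budget: 2660 * 0.15 gives a 399-char band, one char short.
starved = pack_band([memory], total_budget=2660, ratio=0.15)
# The same total at ratio 0.25 gives a 665-char band, so the memory fits.
fits = pack_band([memory], total_budget=2660, ratio=0.25)
```

With an all-or-nothing packing rule like this, a band that is even one character too small drops the whole entry, which would explain why the ratio was raised rather than the memory trimmed.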
## Reflection Loop Live Check - 2026-04-11

First real run of `batch-extract` across 42 captured Claude Code
interactions on Dalidou produced exactly **1 candidate**, and that
candidate was a synthetic test capture from earlier in the session
(rejected). Finding:

- The rule-based extractor in `src/atocore/memory/extractor.py` keys
  on explicit structural cues (decision headings like
  `## Decision: ...`, preference sentences, etc.). Real Claude Code
  responses are conversational and almost never contain those cues.
- This means the capture → extract half of the reflection loop is
  effectively inert against organic LLM sessions until either the
  rules are broadened (new cue families: "we chose X because...",
  "the selected approach is...", etc.) or an LLM-assisted extraction
  path is added alongside the rule-based one.
- Capture → reinforce is working correctly on live data (length-aware
  matcher verified on live paraphrase of a p04 memory).

Follow-up candidates (not yet scheduled):

1. Extractor rule expansion: add conversational-form rules so real
   session text has a chance of surfacing candidates.
2. LLM-assisted extractor as a separate rule family, guarded by
   confidence and always landing in `status=candidate` (never active).
3. Retrieval eval harness: diffable scorecard of
   `formatted_context` across a fixed question set per active project.

## Long-Run Goal

The long-run target is:
@@ -340,6 +340,22 @@ def build_parser() -> argparse.ArgumentParser:
    p = sub.add_parser("reject")
    p.add_argument("memory_id")

    # batch-extract: fan out /interactions/{id}/extract?persist=true across
    # recent interactions. Idempotent: the extractor create_memory path
    # silently skips duplicates, so re-running is safe.
    p = sub.add_parser("batch-extract")
    p.add_argument("since", nargs="?", default="")
    p.add_argument("project", nargs="?", default="")
    p.add_argument("limit", nargs="?", type=int, default=100)
    p.add_argument("persist", nargs="?", default="true")

    # triage: interactive candidate review loop. Fetches the queue, shows
    # each candidate, accepts p/r/s (promote / reject / skip) / q (quit).
    p = sub.add_parser("triage")
    p.add_argument("memory_type", nargs="?", default="")
    p.add_argument("project", nargs="?", default="")
    p.add_argument("limit", nargs="?", type=int, default=50)

    return parser


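Review note: because every `batch-extract` argument is an optional positional (`nargs="?"`), an earlier argument cannot be skipped, which is why the Quick Commands pass `''` placeholders for `since` and `project`. The sketch below rebuilds only that subparser to show how the preview invocation parses. The `prog` name, the `dest="cmd"` on the subparsers, and the helper `parse_persist` are assumptions for illustration; the truthy-string set is copied from `run_batch_extract` in this diff.

```python
import argparse


def build_batch_extract_parser() -> argparse.ArgumentParser:
    """Rebuild just the batch-extract subcommand as it appears in the hunk above."""
    parser = argparse.ArgumentParser(prog="atocore_client")  # prog name is assumed
    sub = parser.add_subparsers(dest="cmd")                   # dest name is assumed
    p = sub.add_parser("batch-extract")
    p.add_argument("since", nargs="?", default="")
    p.add_argument("project", nargs="?", default="")
    p.add_argument("limit", nargs="?", type=int, default=100)
    p.add_argument("persist", nargs="?", default="true")
    return parser


def parse_persist(flag: str) -> bool:
    """Same truthy-string rule that run_batch_extract applies to persist_flag."""
    return flag.lower() in {"1", "true", "yes", "y"}


parser = build_batch_extract_parser()

# Equivalent of: atocore_client.py batch-extract '' '' 200 false   (preview)
preview = parser.parse_args(["batch-extract", "", "", "200", "false"])

# With no arguments, the defaults apply, and the default persists.
defaults = parser.parse_args(["batch-extract"])
```

One consequence worth noting: running `batch-extract` with no arguments persists by default, so a preview always needs all four positionals spelled out.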
@@ -474,10 +490,141 @@ def main() -> int:
                {},
            )
        )
    elif cmd == "batch-extract":
        print_json(run_batch_extract(args.since, args.project, args.limit, args.persist))
    elif cmd == "triage":
        return run_triage(args.memory_type, args.project, args.limit)
    else:
        return 1
    return 0


def run_batch_extract(since: str, project: str, limit: int, persist_flag: str) -> dict:
    """Fetch recent interactions and run the extractor against each one.

    Returns an aggregated summary. Safe to re-run: the server-side
    persist path catches ValueError on duplicates and the endpoint
    reports per-interaction candidate counts either way.
    """
    persist = persist_flag.lower() in {"1", "true", "yes", "y"}
    query_parts: list[str] = []
    if project:
        query_parts.append(f"project={urllib.parse.quote(project)}")
    if since:
        query_parts.append(f"since={urllib.parse.quote(since)}")
    query_parts.append(f"limit={int(limit)}")
    query = "?" + "&".join(query_parts)

    listing = request("GET", f"/interactions{query}")
    interactions = listing.get("interactions", []) if isinstance(listing, dict) else []

    processed = 0
    total_candidates = 0
    total_persisted = 0
    errors: list[dict] = []
    per_interaction: list[dict] = []

    for item in interactions:
        iid = item.get("id") or ""
        if not iid:
            continue
        try:
            result = request(
                "POST",
                f"/interactions/{urllib.parse.quote(iid, safe='')}/extract",
                {"persist": persist},
            )
        except Exception as exc:  # pragma: no cover - network errors land here
            errors.append({"interaction_id": iid, "error": str(exc)})
            continue
        processed += 1
        count = int(result.get("candidate_count", 0) or 0)
        persisted_ids = result.get("persisted_ids") or []
        total_candidates += count
        total_persisted += len(persisted_ids)
        if count:
            per_interaction.append(
                {
                    "interaction_id": iid,
                    "candidate_count": count,
                    "persisted_count": len(persisted_ids),
                    "project": item.get("project") or "",
                }
            )

    return {
        "processed": processed,
        "total_candidates": total_candidates,
        "total_persisted": total_persisted,
        "persist": persist,
        "errors": errors,
        "interactions_with_candidates": per_interaction,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def run_triage(memory_type: str, project: str, limit: int) -> int:
|
||||||
|
"""Interactive review of candidate memories.
|
||||||
|
|
||||||
|
Loads the queue once, walks through entries, prompts for
|
||||||
|
(p)romote / (r)eject / (s)kip / (q)uit. Stateless between runs —
|
||||||
|
re-running picks up whatever is still status=candidate.
|
||||||
|
"""
|
||||||
|
query_parts = ["status=candidate"]
|
||||||
|
if memory_type:
|
||||||
|
query_parts.append(f"memory_type={urllib.parse.quote(memory_type)}")
|
||||||
|
if project:
|
||||||
|
query_parts.append(f"project={urllib.parse.quote(project)}")
|
||||||
|
query_parts.append(f"limit={int(limit)}")
|
||||||
|
listing = request("GET", "/memory?" + "&".join(query_parts))
|
||||||
|
memories = listing.get("memories", []) if isinstance(listing, dict) else []
|
||||||
|
|
||||||
|
if not memories:
|
||||||
|
print_json({"status": "empty_queue", "count": 0})
|
||||||
|
return 0
|
||||||
|
|
||||||
|
promoted = 0
|
||||||
|
rejected = 0
|
||||||
|
skipped = 0
|
||||||
|
stopped_early = False
|
||||||
|
|
||||||
|
print(f"Triage queue: {len(memories)} candidate(s)\n", file=sys.stderr)
|
||||||
|
for idx, mem in enumerate(memories, 1):
|
||||||
|
mid = mem.get("id", "")
|
||||||
|
print(f"[{idx}/{len(memories)}] {mem.get('memory_type','?')} project={mem.get('project','')} conf={mem.get('confidence','?')}", file=sys.stderr)
|
||||||
|
print(f" id: {mid}", file=sys.stderr)
|
||||||
|
print(f" {mem.get('content','')}", file=sys.stderr)
|
||||||
|
try:
|
||||||
|
choice = input(" (p)romote / (r)eject / (s)kip / (q)uit > ").strip().lower()
|
||||||
|
except EOFError:
|
||||||
|
stopped_early = True
|
||||||
|
break
|
||||||
|
if choice in {"q", "quit"}:
|
||||||
|
stopped_early = True
|
||||||
|
break
|
||||||
|
if choice in {"p", "promote"}:
|
||||||
|
request("POST", f"/memory/{urllib.parse.quote(mid, safe='')}/promote", {})
|
||||||
|
promoted += 1
|
||||||
|
print(" -> promoted", file=sys.stderr)
|
||||||
|
elif choice in {"r", "reject"}:
|
||||||
|
request("POST", f"/memory/{urllib.parse.quote(mid, safe='')}/reject", {})
|
||||||
|
rejected += 1
|
||||||
|
print(" -> rejected", file=sys.stderr)
|
||||||
|
else:
|
||||||
|
skipped += 1
|
||||||
|
print(" -> skipped", file=sys.stderr)
|
||||||
|
|
||||||
|
print_json(
|
||||||
|
{
|
||||||
|
"reviewed": promoted + rejected + skipped,
|
||||||
|
"promoted": promoted,
|
||||||
|
"rejected": rejected,
|
||||||
|
"skipped": skipped,
|
||||||
|
"stopped_early": stopped_early,
|
||||||
|
"remaining_in_queue": len(memories) - (promoted + rejected + skipped) - (1 if stopped_early else 0),
|
||||||
|
}
|
||||||
|
)
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
raise SystemExit(main())
|
raise SystemExit(main())
|
||||||
|
|||||||
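Review note: the query assembly and the `persist` flag parsing in `run_batch_extract` can be exercised without a live server. The sketch below lifts those two steps into standalone helpers. The helper names `build_interactions_query` and `parse_persist_flag` are hypothetical and do not appear in the diff; the logic is copied from the function body above.

```python
import urllib.parse


def build_interactions_query(since: str = "", project: str = "", limit: int = 100) -> str:
    """Assemble the /interactions query string the way run_batch_extract does.

    Hypothetical helper name; the body mirrors the diff above.
    """
    parts: list[str] = []
    if project:
        parts.append(f"project={urllib.parse.quote(project)}")
    if since:
        parts.append(f"since={urllib.parse.quote(since)}")
    # limit is always sent, and int() guards against a non-integer sneaking in
    parts.append(f"limit={int(limit)}")
    return "?" + "&".join(parts)


def parse_persist_flag(flag: str) -> bool:
    """Interpret the positional 'persist' argument (hypothetical helper name)."""
    return flag.lower() in {"1", "true", "yes", "y"}


if __name__ == "__main__":
    print(build_interactions_query("2026-04-01T00:00:00", "p04-gigabit", 25))
    print(parse_persist_flag("Yes"), parse_persist_flag("false"))
```

Because `since`, `project`, `limit`, and `persist` are all optional positionals (`nargs="?"`), a caller who wants to set `project` also has to supply `since` first; an empty string works as a placeholder.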
194
scripts/retrieval_eval.py
Normal file
@@ -0,0 +1,194 @@
|
|||||||
|
"""Retrieval quality eval harness.
|
||||||
|
|
||||||
|
Runs a fixed set of project-hinted questions against
|
||||||
|
``POST /context/build`` on a live AtoCore instance and scores the
|
||||||
|
resulting ``formatted_context`` against per-question expectations.
|
||||||
|
The goal is a diffable scorecard that tells you, run-to-run,
|
||||||
|
whether a retrieval / builder / ingestion change moved the needle.
|
||||||
|
|
||||||
|
Design notes
|
||||||
|
------------
|
||||||
|
- Fixtures live in ``scripts/retrieval_eval_fixtures.json`` so new
|
||||||
|
questions can be added without touching Python. Each fixture
|
||||||
|
names the project, the prompt, and a checklist of substrings that
|
||||||
|
MUST appear in ``formatted_context`` (``expect_present``) and
|
||||||
|
substrings that MUST NOT appear (``expect_absent``). The absent
|
||||||
|
list catches cross-project bleed and stale content.
|
||||||
|
- The checklist is deliberately substring-based (not regex, not
|
||||||
|
embedding-similarity) so a failure is always a trivially
|
||||||
|
reproducible "this string is not in that string". Richer scoring
|
||||||
|
can come later once we know the harness is useful.
|
||||||
|
- The harness is external to the app runtime and talks to AtoCore
|
||||||
|
over HTTP, so it works against dev, staging, or prod. It follows
|
||||||
|
the same environment-variable contract as ``atocore_client.py``
|
||||||
|
(``ATOCORE_BASE_URL``, ``ATOCORE_TIMEOUT_SECONDS``).
|
||||||
|
- Exit code 0 on all-pass, 1 on any fixture failure. Intended for
|
||||||
|
manual runs today; a future cron / CI hook can consume the
|
||||||
|
JSON output via ``--json``.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
-----
|
||||||
|
|
||||||
|
python scripts/retrieval_eval.py # human-readable report
|
||||||
|
python scripts/retrieval_eval.py --json # machine-readable
|
||||||
|
python scripts/retrieval_eval.py --fixtures path/to/custom.json
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import urllib.error
|
||||||
|
import urllib.parse
|
||||||
|
import urllib.request
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
DEFAULT_BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100")
|
||||||
|
DEFAULT_TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30"))
|
||||||
|
DEFAULT_BUDGET = 3000
|
||||||
|
DEFAULT_FIXTURES = Path(__file__).parent / "retrieval_eval_fixtures.json"
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class Fixture:
|
||||||
|
name: str
|
||||||
|
project: str
|
||||||
|
prompt: str
|
||||||
|
budget: int = DEFAULT_BUDGET
|
||||||
|
expect_present: list[str] = field(default_factory=list)
|
||||||
|
expect_absent: list[str] = field(default_factory=list)
|
||||||
|
notes: str = ""
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class FixtureResult:
|
||||||
|
fixture: Fixture
|
||||||
|
ok: bool
|
||||||
|
missing_present: list[str]
|
||||||
|
unexpected_absent: list[str]
|
||||||
|
total_chars: int
|
||||||
|
error: str = ""
|
||||||
|
|
||||||
|
|
||||||
|
def load_fixtures(path: Path) -> list[Fixture]:
|
||||||
|
data = json.loads(path.read_text(encoding="utf-8"))
|
||||||
|
if not isinstance(data, list):
|
||||||
|
raise ValueError(f"{path} must contain a JSON array of fixtures")
|
||||||
|
fixtures: list[Fixture] = []
|
||||||
|
for i, raw in enumerate(data):
|
||||||
|
if not isinstance(raw, dict):
|
||||||
|
raise ValueError(f"fixture {i} is not an object")
|
||||||
|
fixtures.append(
|
||||||
|
Fixture(
|
||||||
|
name=raw["name"],
|
||||||
|
project=raw.get("project", ""),
|
||||||
|
prompt=raw["prompt"],
|
||||||
|
budget=int(raw.get("budget", DEFAULT_BUDGET)),
|
||||||
|
expect_present=list(raw.get("expect_present", [])),
|
||||||
|
expect_absent=list(raw.get("expect_absent", [])),
|
||||||
|
notes=raw.get("notes", ""),
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return fixtures
|
||||||
|
|
||||||
|
|
||||||
|
def run_fixture(fixture: Fixture, base_url: str, timeout: int) -> FixtureResult:
|
||||||
|
payload = {
|
||||||
|
"prompt": fixture.prompt,
|
||||||
|
"project": fixture.project or None,
|
||||||
|
"budget": fixture.budget,
|
||||||
|
}
|
||||||
|
req = urllib.request.Request(
|
||||||
|
url=f"{base_url}/context/build",
|
||||||
|
method="POST",
|
||||||
|
headers={"Content-Type": "application/json"},
|
||||||
|
data=json.dumps(payload).encode("utf-8"),
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
||||||
|
body = json.loads(resp.read().decode("utf-8"))
|
||||||
|
except urllib.error.URLError as exc:
|
||||||
|
return FixtureResult(
|
||||||
|
fixture=fixture,
|
||||||
|
ok=False,
|
||||||
|
missing_present=list(fixture.expect_present),
|
||||||
|
unexpected_absent=[],
|
||||||
|
total_chars=0,
|
||||||
|
error=f"http_error: {exc}",
|
||||||
|
)
|
||||||
|
|
||||||
|
formatted = body.get("formatted_context") or ""
|
||||||
|
missing = [s for s in fixture.expect_present if s not in formatted]
|
||||||
|
unexpected = [s for s in fixture.expect_absent if s in formatted]
|
||||||
|
return FixtureResult(
|
||||||
|
fixture=fixture,
|
||||||
|
ok=not missing and not unexpected,
|
||||||
|
missing_present=missing,
|
||||||
|
unexpected_absent=unexpected,
|
||||||
|
total_chars=len(formatted),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def print_human_report(results: list[FixtureResult]) -> None:
|
||||||
|
total = len(results)
|
||||||
|
passed = sum(1 for r in results if r.ok)
|
||||||
|
print(f"Retrieval eval: {passed}/{total} fixtures passed")
|
||||||
|
print()
|
||||||
|
for r in results:
|
||||||
|
marker = "PASS" if r.ok else "FAIL"
|
||||||
|
print(f"[{marker}] {r.fixture.name} project={r.fixture.project} chars={r.total_chars}")
|
||||||
|
if r.error:
|
||||||
|
print(f" error: {r.error}")
|
||||||
|
for miss in r.missing_present:
|
||||||
|
print(f" missing expected: {miss!r}")
|
||||||
|
for bleed in r.unexpected_absent:
|
||||||
|
print(f" unexpected present: {bleed!r}")
|
||||||
|
if r.fixture.notes and not r.ok:
|
||||||
|
print(f" notes: {r.fixture.notes}")
|
||||||
|
|
||||||
|
|
||||||
|
def print_json_report(results: list[FixtureResult]) -> None:
|
||||||
|
payload = {
|
||||||
|
"total": len(results),
|
||||||
|
"passed": sum(1 for r in results if r.ok),
|
||||||
|
"fixtures": [
|
||||||
|
{
|
||||||
|
"name": r.fixture.name,
|
||||||
|
"project": r.fixture.project,
|
||||||
|
"ok": r.ok,
|
||||||
|
"total_chars": r.total_chars,
|
||||||
|
"missing_present": r.missing_present,
|
||||||
|
"unexpected_absent": r.unexpected_absent,
|
||||||
|
"error": r.error,
|
||||||
|
}
|
||||||
|
for r in results
|
||||||
|
],
|
||||||
|
}
|
||||||
|
json.dump(payload, sys.stdout, indent=2)
|
||||||
|
sys.stdout.write("\n")
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> int:
|
||||||
|
parser = argparse.ArgumentParser(description="AtoCore retrieval quality eval harness")
|
||||||
|
parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
|
||||||
|
parser.add_argument("--timeout", type=int, default=DEFAULT_TIMEOUT)
|
||||||
|
parser.add_argument("--fixtures", type=Path, default=DEFAULT_FIXTURES)
|
||||||
|
parser.add_argument("--json", action="store_true", help="emit machine-readable JSON")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
fixtures = load_fixtures(args.fixtures)
|
||||||
|
results = [run_fixture(f, args.base_url, args.timeout) for f in fixtures]
|
||||||
|
|
||||||
|
if args.json:
|
||||||
|
print_json_report(results)
|
||||||
|
else:
|
||||||
|
print_human_report(results)
|
||||||
|
|
||||||
|
return 0 if all(r.ok for r in results) else 1
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
raise SystemExit(main())
|
||||||
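Review note: the pass/fail rule in `run_fixture` is plain substring membership, so it can be checked in isolation. A minimal sketch follows, assuming a hand-written context string; the `score_context` name and the sample text are illustrative and are not taken from a real AtoCore response.

```python
def score_context(
    formatted: str,
    expect_present: list[str],
    expect_absent: list[str],
) -> dict:
    """Apply the same substring checks run_fixture uses (hypothetical helper)."""
    # Every expect_present string must occur somewhere in the context.
    missing = [s for s in expect_present if s not in formatted]
    # No expect_absent string may occur (cross-project bleed, stale content).
    unexpected = [s for s in expect_absent if s in formatted]
    return {
        "ok": not missing and not unexpected,
        "missing_present": missing,
        "unexpected_absent": unexpected,
    }


if __name__ == "__main__":
    # Made-up context for illustration only.
    sample = (
        "--- Trusted Project State ---\n"
        "selected_mirror_architecture: Option B (conical back)\n"
        "--- Project Memories ---\n"
        "Option B was selected for M1.\n"
    )
    print(score_context(sample, ["Option B", "conical"], ["p06-polisher"]))
    print(score_context(sample, ["Zerodur"], ["Option B"]))
```

Matching is case-sensitive and exact, so a fixture that expects `"conical"` will not match `"Conical"`; that is the price of keeping failures trivially reproducible.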
86
scripts/retrieval_eval_fixtures.json
Normal file
@@ -0,0 +1,86 @@
|
|||||||
|
[
|
||||||
|
{
|
||||||
|
"name": "p04-architecture-decision",
|
||||||
|
"project": "p04-gigabit",
|
||||||
|
"prompt": "what mirror architecture was selected for GigaBIT M1 and why",
|
||||||
|
"expect_present": [
|
||||||
|
"--- Trusted Project State ---",
|
||||||
|
"Option B",
|
||||||
|
"conical",
|
||||||
|
"--- Project Memories ---"
|
||||||
|
],
|
||||||
|
"expect_absent": [
|
||||||
|
"p06-polisher",
|
||||||
|
"folded-beam"
|
||||||
|
],
|
||||||
|
"notes": "Canonical p04 decision — should surface both Trusted Project State (selected_mirror_architecture) and the project-memory band with the Option B memory"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "p04-constraints",
|
||||||
|
"project": "p04-gigabit",
|
||||||
|
"prompt": "what are the key GigaBIT M1 program constraints",
|
||||||
|
"expect_present": [
|
||||||
|
"--- Trusted Project State ---",
|
||||||
|
"Zerodur",
|
||||||
|
"1.2"
|
||||||
|
],
|
||||||
|
"expect_absent": [
|
||||||
|
"polisher suite"
|
||||||
|
],
|
||||||
|
"notes": "Key constraints are in Trusted Project State (key_constraints) and in the mission-framing memory"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "p05-configuration",
|
||||||
|
"project": "p05-interferometer",
|
||||||
|
"prompt": "what is the selected interferometer configuration",
|
||||||
|
"expect_present": [
|
||||||
|
"folded-beam",
|
||||||
|
"CGH"
|
||||||
|
],
|
||||||
|
"expect_absent": [
|
||||||
|
"Option B",
|
||||||
|
"conical back",
|
||||||
|
"polisher suite"
|
||||||
|
],
|
||||||
|
"notes": "P05 architecture memory covers folded-beam + CGH. GigaBIT M1 is the mirror under test and legitimately appears in p05 source docs (the interferometer measures it), so we only flag genuinely p04-only decisions like the mirror architecture choice."
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "p05-vendor-signal",
|
||||||
|
"project": "p05-interferometer",
|
||||||
|
"prompt": "what is the current vendor signal for the interferometer procurement",
|
||||||
|
"expect_present": [
|
||||||
|
"4D",
|
||||||
|
"Zygo"
|
||||||
|
],
|
||||||
|
"expect_absent": [
|
||||||
|
"polisher"
|
||||||
|
],
|
||||||
|
"notes": "Vendor memory mentions 4D as strongest technical candidate and Zygo Verifire SV as value path"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "p06-suite-split",
|
||||||
|
"project": "p06-polisher",
|
||||||
|
"prompt": "how is the polisher software suite split across layers",
|
||||||
|
"expect_present": [
|
||||||
|
"polisher-sim",
|
||||||
|
"polisher-post",
|
||||||
|
"polisher-control"
|
||||||
|
],
|
||||||
|
"expect_absent": [
|
||||||
|
"GigaBIT"
|
||||||
|
],
|
||||||
|
"notes": "The three-layer split is in multiple p06 memories; check all three names surface together"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "p06-control-rule",
|
||||||
|
"project": "p06-polisher",
|
||||||
|
"prompt": "what is the polisher control design rule",
|
||||||
|
"expect_present": [
|
||||||
|
"interlocks"
|
||||||
|
],
|
||||||
|
"expect_absent": [
|
||||||
|
"interferometer"
|
||||||
|
],
|
||||||
|
"notes": "Control design rule memory mentions interlocks and state transitions"
|
||||||
|
}
|
||||||
|
]
|
||||||
@@ -49,6 +49,7 @@ from atocore.memory.service import (
|
|||||||
)
|
)
|
||||||
from atocore.observability.logger import get_logger
|
from atocore.observability.logger import get_logger
|
||||||
from atocore.ops.backup import (
|
from atocore.ops.backup import (
|
||||||
|
cleanup_old_backups,
|
||||||
create_runtime_backup,
|
create_runtime_backup,
|
||||||
list_runtime_backups,
|
list_runtime_backups,
|
||||||
validate_backup,
|
validate_backup,
|
||||||
@@ -511,6 +512,7 @@ class InteractionRecordRequest(BaseModel):
|
|||||||
chunks_used: list[str] = []
|
chunks_used: list[str] = []
|
||||||
context_pack: dict | None = None
|
context_pack: dict | None = None
|
||||||
reinforce: bool = True
|
reinforce: bool = True
|
||||||
|
extract: bool = False
|
||||||
|
|
||||||
|
|
||||||
@router.post("/interactions")
|
@router.post("/interactions")
|
||||||
@@ -536,6 +538,7 @@ def api_record_interaction(req: InteractionRecordRequest) -> dict:
|
|||||||
chunks_used=req.chunks_used,
|
chunks_used=req.chunks_used,
|
||||||
context_pack=req.context_pack,
|
context_pack=req.context_pack,
|
||||||
reinforce=req.reinforce,
|
reinforce=req.reinforce,
|
||||||
|
extract=req.extract,
|
||||||
)
|
)
|
||||||
except ValueError as e:
|
except ValueError as e:
|
||||||
raise HTTPException(status_code=400, detail=str(e))
|
raise HTTPException(status_code=400, detail=str(e))
|
||||||
@@ -731,6 +734,25 @@ def api_list_backups() -> dict:
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class BackupCleanupRequest(BaseModel):
|
||||||
|
confirm: bool = False
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/admin/backup/cleanup")
|
||||||
|
def api_cleanup_backups(req: BackupCleanupRequest | None = None) -> dict:
|
||||||
|
"""Apply retention policy to old backup snapshots.
|
||||||
|
|
||||||
|
Dry-run by default. Pass ``confirm: true`` to actually delete.
|
||||||
|
Retention: last 7 daily, last 4 weekly (Sundays), last 6 monthly (1st).
|
||||||
|
"""
|
||||||
|
payload = req or BackupCleanupRequest()
|
||||||
|
try:
|
||||||
|
return cleanup_old_backups(confirm=payload.confirm)
|
||||||
|
except Exception as e:
|
||||||
|
log.error("admin_cleanup_failed", error=str(e))
|
||||||
|
raise HTTPException(status_code=500, detail=f"Cleanup failed: {e}")
|
||||||
|
|
||||||
|
|
||||||
@router.get("/admin/backup/{stamp}/validate")
|
@router.get("/admin/backup/{stamp}/validate")
|
||||||
def api_validate_backup(stamp: str) -> dict:
|
def api_validate_backup(stamp: str) -> dict:
|
||||||
"""Validate that a previously created backup is structurally usable."""
|
"""Validate that a previously created backup is structurally usable."""
|
||||||
|
|||||||
@@ -30,6 +30,12 @@ SYSTEM_PREFIX = (
|
|||||||
# identity: 5%, preferences: 5%, project state: 20%, retrieval: 60%+
|
# identity: 5%, preferences: 5%, project state: 20%, retrieval: 60%+
|
||||||
PROJECT_STATE_BUDGET_RATIO = 0.20
|
PROJECT_STATE_BUDGET_RATIO = 0.20
|
||||||
MEMORY_BUDGET_RATIO = 0.10 # 5% identity + 5% preference
|
MEMORY_BUDGET_RATIO = 0.10 # 5% identity + 5% preference
|
||||||
|
# Project-scoped memories (project/knowledge/episodic) are the outlet
|
||||||
|
# for the Phase 9 reflection loop on the retrieval side. Budget sits
|
||||||
|
# between identity/preference and retrieved chunks so a reinforced
|
||||||
|
# memory can actually reach the model.
|
||||||
|
PROJECT_MEMORY_BUDGET_RATIO = 0.25
|
||||||
|
PROJECT_MEMORY_TYPES = ["project", "knowledge", "episodic"]
|
||||||
|
|
||||||
# Last built context pack for debug inspection
|
# Last built context pack for debug inspection
|
||||||
_last_context_pack: "ContextPack | None" = None
|
_last_context_pack: "ContextPack | None" = None
|
||||||
@@ -51,6 +57,8 @@ class ContextPack:
|
|||||||
project_state_chars: int = 0
|
project_state_chars: int = 0
|
||||||
memory_text: str = ""
|
memory_text: str = ""
|
||||||
memory_chars: int = 0
|
memory_chars: int = 0
|
||||||
|
project_memory_text: str = ""
|
||||||
|
project_memory_chars: int = 0
|
||||||
total_chars: int = 0
|
total_chars: int = 0
|
||||||
budget: int = 0
|
budget: int = 0
|
||||||
budget_remaining: int = 0
|
budget_remaining: int = 0
|
||||||
@@ -107,10 +115,32 @@ def build_context(
|
|||||||
memory_text, memory_chars = get_memories_for_context(
|
memory_text, memory_chars = get_memories_for_context(
|
||||||
memory_types=["identity", "preference"],
|
memory_types=["identity", "preference"],
|
||||||
budget=memory_budget,
|
budget=memory_budget,
|
||||||
|
query=user_prompt,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# 2b. Get project-scoped memories (third precedence). Only
|
||||||
|
# populated when a canonical project is in scope — cross-project
|
||||||
|
# memory bleed would rot the pack. Active-only filtering is
|
||||||
|
# handled by the shared min_confidence=0.5 gate inside
|
||||||
|
# get_memories_for_context.
|
||||||
|
project_memory_text = ""
|
||||||
|
project_memory_chars = 0
|
||||||
|
if canonical_project:
|
||||||
|
project_memory_budget = min(
|
||||||
|
int(budget * PROJECT_MEMORY_BUDGET_RATIO),
|
||||||
|
max(budget - project_state_chars - memory_chars, 0),
|
||||||
|
)
|
||||||
|
project_memory_text, project_memory_chars = get_memories_for_context(
|
||||||
|
memory_types=PROJECT_MEMORY_TYPES,
|
||||||
|
project=canonical_project,
|
||||||
|
budget=project_memory_budget,
|
||||||
|
header="--- Project Memories ---",
|
||||||
|
footer="--- End Project Memories ---",
|
||||||
|
query=user_prompt,
|
||||||
|
)
|
||||||
|
|
||||||
# 3. Calculate remaining budget for retrieval
|
# 3. Calculate remaining budget for retrieval
|
||||||
retrieval_budget = budget - project_state_chars - memory_chars
|
retrieval_budget = budget - project_state_chars - memory_chars - project_memory_chars
|
||||||
|
|
||||||
# 4. Retrieve candidates
|
# 4. Retrieve candidates
|
||||||
candidates = (
|
candidates = (
|
||||||
@@ -130,11 +160,14 @@ def build_context(
|
|||||||
selected = _select_within_budget(scored, max(retrieval_budget, 0))
|
selected = _select_within_budget(scored, max(retrieval_budget, 0))
|
||||||
|
|
||||||
# 7. Format full context
|
# 7. Format full context
|
||||||
formatted = _format_full_context(project_state_text, memory_text, selected)
|
formatted = _format_full_context(
|
||||||
|
project_state_text, memory_text, project_memory_text, selected
|
||||||
|
)
|
||||||
if len(formatted) > budget:
|
if len(formatted) > budget:
|
||||||
formatted, selected = _trim_context_to_budget(
|
formatted, selected = _trim_context_to_budget(
|
||||||
project_state_text,
|
project_state_text,
|
||||||
memory_text,
|
memory_text,
|
||||||
|
project_memory_text,
|
||||||
selected,
|
selected,
|
||||||
budget,
|
budget,
|
||||||
)
|
)
|
||||||
@@ -144,6 +177,7 @@ def build_context(
|
|||||||
|
|
||||||
project_state_chars = len(project_state_text)
|
project_state_chars = len(project_state_text)
|
||||||
memory_chars = len(memory_text)
|
memory_chars = len(memory_text)
|
||||||
|
project_memory_chars = len(project_memory_text)
|
||||||
retrieval_chars = sum(c.char_count for c in selected)
|
retrieval_chars = sum(c.char_count for c in selected)
|
||||||
total_chars = len(formatted)
|
total_chars = len(formatted)
|
||||||
duration_ms = int((time.time() - start) * 1000)
|
duration_ms = int((time.time() - start) * 1000)
|
||||||
@@ -154,6 +188,8 @@ def build_context(
|
|||||||
project_state_chars=project_state_chars,
|
project_state_chars=project_state_chars,
|
||||||
memory_text=memory_text,
|
memory_text=memory_text,
|
||||||
memory_chars=memory_chars,
|
memory_chars=memory_chars,
|
||||||
|
project_memory_text=project_memory_text,
|
||||||
|
project_memory_chars=project_memory_chars,
|
||||||
total_chars=total_chars,
|
total_chars=total_chars,
|
||||||
budget=budget,
|
budget=budget,
|
||||||
budget_remaining=budget - total_chars,
|
budget_remaining=budget - total_chars,
|
||||||
@@ -171,6 +207,7 @@ def build_context(
|
|||||||
chunks_used=len(selected),
|
chunks_used=len(selected),
|
||||||
project_state_chars=project_state_chars,
|
project_state_chars=project_state_chars,
|
||||||
memory_chars=memory_chars,
|
memory_chars=memory_chars,
|
||||||
|
project_memory_chars=project_memory_chars,
|
||||||
retrieval_chars=retrieval_chars,
|
retrieval_chars=retrieval_chars,
|
||||||
total_chars=total_chars,
|
total_chars=total_chars,
|
||||||
budget_remaining=budget - total_chars,
|
budget_remaining=budget - total_chars,
|
||||||
@@ -250,6 +287,7 @@ def _select_within_budget(
|
|||||||
def _format_full_context(
|
def _format_full_context(
|
||||||
project_state_text: str,
|
project_state_text: str,
|
||||||
memory_text: str,
|
memory_text: str,
|
||||||
|
project_memory_text: str,
|
||||||
chunks: list[ContextChunk],
|
chunks: list[ContextChunk],
|
||||||
) -> str:
|
) -> str:
|
||||||
"""Format project state + memories + retrieved chunks into full context block."""
|
"""Format project state + memories + retrieved chunks into full context block."""
|
||||||
@@ -265,7 +303,12 @@ def _format_full_context(
|
|||||||
parts.append(memory_text)
|
parts.append(memory_text)
|
||||||
parts.append("")
|
parts.append("")
|
||||||
|
|
||||||
# 3. Retrieved chunks (lowest trust)
|
# 3. Project-scoped memories (third trust level)
|
||||||
|
if project_memory_text:
|
||||||
|
parts.append(project_memory_text)
|
||||||
|
parts.append("")
|
||||||
|
|
||||||
|
# 4. Retrieved chunks (lowest trust)
|
||||||
if chunks:
|
if chunks:
|
||||||
parts.append("--- AtoCore Retrieved Context ---")
|
parts.append("--- AtoCore Retrieved Context ---")
|
||||||
if project_state_text:
|
if project_state_text:
|
||||||
@@ -277,7 +320,7 @@ def _format_full_context(
|
|||||||
parts.append(chunk.content)
|
parts.append(chunk.content)
|
||||||
parts.append("")
|
parts.append("")
|
||||||
parts.append("--- End Context ---")
|
parts.append("--- End Context ---")
|
||||||
elif not project_state_text and not memory_text:
|
elif not project_state_text and not memory_text and not project_memory_text:
|
||||||
parts.append("--- AtoCore Context ---\nNo relevant context found.\n--- End Context ---")
|
parts.append("--- AtoCore Context ---\nNo relevant context found.\n--- End Context ---")
|
||||||
|
|
||||||
return "\n".join(parts)
|
return "\n".join(parts)
|
||||||
@@ -299,6 +342,7 @@ def _pack_to_dict(pack: ContextPack) -> dict:
|
|||||||
"project_hint": pack.project_hint,
|
"project_hint": pack.project_hint,
|
||||||
"project_state_chars": pack.project_state_chars,
|
"project_state_chars": pack.project_state_chars,
|
||||||
"memory_chars": pack.memory_chars,
|
"memory_chars": pack.memory_chars,
|
||||||
|
"project_memory_chars": pack.project_memory_chars,
|
||||||
"chunks_used": len(pack.chunks_used),
|
"chunks_used": len(pack.chunks_used),
|
||||||
"total_chars": pack.total_chars,
|
"total_chars": pack.total_chars,
|
||||||
"budget": pack.budget,
|
"budget": pack.budget,
|
||||||
@@ -306,6 +350,7 @@ def _pack_to_dict(pack: ContextPack) -> dict:
|
|||||||
"duration_ms": pack.duration_ms,
|
"duration_ms": pack.duration_ms,
|
||||||
"has_project_state": bool(pack.project_state_text),
|
"has_project_state": bool(pack.project_state_text),
|
||||||
"has_memories": bool(pack.memory_text),
|
"has_memories": bool(pack.memory_text),
|
||||||
|
"has_project_memories": bool(pack.project_memory_text),
|
||||||
"chunks": [
|
"chunks": [
|
||||||
{
|
{
|
||||||
"source_file": c.source_file,
|
"source_file": c.source_file,
|
||||||
@@ -335,26 +380,45 @@ def _truncate_text_block(text: str, budget: int) -> tuple[str, int]:
|
|||||||
def _trim_context_to_budget(
|
def _trim_context_to_budget(
|
||||||
project_state_text: str,
|
project_state_text: str,
|
||||||
memory_text: str,
|
memory_text: str,
|
||||||
|
project_memory_text: str,
|
||||||
chunks: list[ContextChunk],
|
chunks: list[ContextChunk],
|
||||||
budget: int,
|
budget: int,
|
||||||
) -> tuple[str, list[ContextChunk]]:
|
) -> tuple[str, list[ContextChunk]]:
|
||||||
"""Trim retrieval first, then memory, then project state until formatted context fits."""
|
"""Trim retrieval → project memories → identity/preference → project state."""
|
||||||
kept_chunks = list(chunks)
|
kept_chunks = list(chunks)
|
||||||
formatted = _format_full_context(project_state_text, memory_text, kept_chunks)
|
formatted = _format_full_context(
|
||||||
|
project_state_text, memory_text, project_memory_text, kept_chunks
|
||||||
|
)
|
||||||
while len(formatted) > budget and kept_chunks:
|
while len(formatted) > budget and kept_chunks:
|
||||||
kept_chunks.pop()
|
kept_chunks.pop()
|
||||||
formatted = _format_full_context(project_state_text, memory_text, kept_chunks)
|
formatted = _format_full_context(
|
||||||
|
project_state_text, memory_text, project_memory_text, kept_chunks
|
||||||
|
)
|
||||||
|
|
||||||
if len(formatted) <= budget:
|
if len(formatted) <= budget:
|
||||||
return formatted, kept_chunks
|
return formatted, kept_chunks
|
||||||
|
|
||||||
|
# Truncate project memories next (they were the most recently added
|
||||||
|
# tier and carry less trust than identity/preference).
|
||||||
|
project_memory_text, _ = _truncate_text_block(
|
||||||
|
project_memory_text,
|
||||||
|
max(budget - len(project_state_text) - len(memory_text), 0),
|
||||||
|
)
|
||||||
|
formatted = _format_full_context(
|
||||||
|
project_state_text, memory_text, project_memory_text, kept_chunks
|
||||||
|
)
|
||||||
|
if len(formatted) <= budget:
|
||||||
|
return formatted, kept_chunks
|
||||||
|
|
||||||
memory_text, _ = _truncate_text_block(memory_text, max(budget - len(project_state_text), 0))
|
memory_text, _ = _truncate_text_block(memory_text, max(budget - len(project_state_text), 0))
|
||||||
formatted = _format_full_context(project_state_text, memory_text, kept_chunks)
|
formatted = _format_full_context(
|
||||||
|
project_state_text, memory_text, project_memory_text, kept_chunks
|
||||||
|
)
|
||||||
if len(formatted) <= budget:
|
if len(formatted) <= budget:
|
||||||
return formatted, kept_chunks
|
return formatted, kept_chunks
|
||||||
|
|
||||||
project_state_text, _ = _truncate_text_block(project_state_text, budget)
|
project_state_text, _ = _truncate_text_block(project_state_text, budget)
|
||||||
formatted = _format_full_context(project_state_text, "", [])
|
formatted = _format_full_context(project_state_text, "", "", [])
|
||||||
if len(formatted) > budget:
|
if len(formatted) > budget:
|
||||||
formatted, _ = _truncate_text_block(formatted, budget)
|
formatted, _ = _truncate_text_block(formatted, budget)
|
||||||
return formatted, []
|
return formatted, []
|
||||||
|
|||||||
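Review note: the budget arithmetic introduced in `build_context` is easy to misread, so here is a small worked sketch. The constants and the two formulas are copied from the diff; the function names are hypothetical, and the sketch assumes the earlier tiers report their actual character counts.

```python
PROJECT_STATE_BUDGET_RATIO = 0.20
MEMORY_BUDGET_RATIO = 0.10
PROJECT_MEMORY_BUDGET_RATIO = 0.25


def project_memory_budget(budget: int, project_state_chars: int, memory_chars: int) -> int:
    """Cap for the project-memory band: 25% of the budget, or whatever is left."""
    return min(
        int(budget * PROJECT_MEMORY_BUDGET_RATIO),
        max(budget - project_state_chars - memory_chars, 0),
    )


def retrieval_budget(
    budget: int,
    project_state_chars: int,
    memory_chars: int,
    project_memory_chars: int,
) -> int:
    """What remains for retrieved chunks; may be negative, the caller clamps it."""
    return budget - project_state_chars - memory_chars - project_memory_chars


if __name__ == "__main__":
    # Roomy case: the 25% ratio is the binding limit.
    print(project_memory_budget(3000, 600, 300))
    # Tight case: only 100 chars are left after the higher-trust tiers.
    print(project_memory_budget(1000, 700, 200))
    # Retrieval gets the remainder once every tier has been charged.
    print(retrieval_budget(3000, 600, 300, 750))
```

If every tier fills its cap, the three ratios sum to 55% of the budget, so retrieval is left with roughly 45% rather than the "60%+" quoted in the unchanged header comment; that comment may be worth revisiting in a follow-up.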
@@ -63,6 +63,7 @@ def record_interaction(
|
|||||||
chunks_used: list[str] | None = None,
|
chunks_used: list[str] | None = None,
|
||||||
context_pack: dict | None = None,
|
context_pack: dict | None = None,
|
||||||
reinforce: bool = True,
|
reinforce: bool = True,
|
||||||
|
extract: bool = False,
|
||||||
) -> Interaction:
|
) -> Interaction:
|
||||||
"""Persist a single interaction to the audit trail.
|
"""Persist a single interaction to the audit trail.
|
||||||
|
|
||||||
@@ -163,6 +164,30 @@ def record_interaction(
|
|||||||
error=str(exc),
|
error=str(exc),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
if extract and (response or response_summary):
|
||||||
|
try:
|
||||||
|
from atocore.memory.extractor import extract_candidates_from_interaction
|
||||||
|
from atocore.memory.service import create_memory
|
||||||
|
|
||||||
|
candidates = extract_candidates_from_interaction(interaction)
|
||||||
|
for candidate in candidates:
|
||||||
|
try:
|
||||||
|
create_memory(
|
||||||
|
memory_type=candidate.memory_type,
|
||||||
|
content=candidate.content,
|
||||||
|
project=candidate.project,
|
||||||
|
confidence=candidate.confidence,
|
||||||
|
status="candidate",
|
||||||
|
)
|
||||||
|
except ValueError:
|
||||||
|
pass # duplicate or validation error — skip silently
|
||||||
|
except Exception as exc: # pragma: no cover - extraction must never block capture
|
||||||
|
log.error(
|
||||||
|
"extraction_failed_on_capture",
|
||||||
|
interaction_id=interaction_id,
|
||||||
|
error=str(exc),
|
||||||
|
)
|
||||||
|
|
||||||
return interaction
|
return interaction
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -51,6 +51,15 @@ _STOP_WORDS: frozenset[str] = frozenset({
|
|||||||
})
|
})
|
||||||
_MATCH_THRESHOLD = 0.70
|
_MATCH_THRESHOLD = 0.70
|
||||||
|
|
||||||
|
# Long memories can't realistically hit 70% overlap through organic
|
||||||
|
# paraphrase — a 40-token memory would need 28 stemmed tokens echoed
|
||||||
|
# verbatim. Above this token count the matcher switches to an absolute
|
||||||
|
# overlap floor plus a softer fraction floor so paragraph-length memories
|
||||||
|
# still reinforce when the response genuinely uses them.
|
||||||
|
_LONG_MEMORY_TOKEN_COUNT = 15
|
||||||
|
_LONG_MODE_MIN_OVERLAP = 12
|
||||||
|
_LONG_MODE_MIN_FRACTION = 0.35
|
||||||
|
|
||||||
DEFAULT_CONFIDENCE_DELTA = 0.02
|
DEFAULT_CONFIDENCE_DELTA = 0.02
|
||||||
|
|
||||||
|
|
||||||
@@ -171,26 +180,47 @@ def _stem(word: str) -> str:
|
|||||||
def _tokenize(text: str) -> set[str]:
|
def _tokenize(text: str) -> set[str]:
|
||||||
"""Split normalized text into a stemmed token set.
|
"""Split normalized text into a stemmed token set.
|
||||||
|
|
||||||
Strips punctuation, drops words shorter than 3 chars and stop words.
|
Strips punctuation, drops words shorter than 3 chars and stop
|
||||||
|
words. Hyphenated and slash-separated identifiers
|
||||||
|
(``polisher-control``, ``twyman-green``, ``2-projects/interferometer``)
|
||||||
|
produce both the full form AND each sub-token, so a query for
|
||||||
|
"polisher control" can match a memory that wrote
|
||||||
|
"polisher-control" without forcing callers to guess the exact
|
||||||
|
hyphenation.
|
||||||
"""
|
"""
|
||||||
tokens: set[str] = set()
|
tokens: set[str] = set()
|
||||||
for raw in text.split():
|
for raw in text.split():
|
||||||
# Strip leading/trailing punctuation (commas, periods, quotes, etc.)
|
|
||||||
word = raw.strip(".,;:!?\"'()[]{}-/")
|
word = raw.strip(".,;:!?\"'()[]{}-/")
|
||||||
if len(word) < 3:
|
if not word:
|
||||||
continue
|
continue
|
||||||
if word in _STOP_WORDS:
|
_add_token(tokens, word)
|
||||||
continue
|
# Also add sub-tokens split on internal '-' or '/' so
|
||||||
tokens.add(_stem(word))
|
# hyphenated identifiers match queries that don't hyphenate.
|
||||||
|
if "-" in word or "/" in word:
|
||||||
|
for sub in re.split(r"[-/]+", word):
|
||||||
|
_add_token(tokens, sub)
|
||||||
return tokens
|
return tokens
|
||||||
|
|
||||||
|
|
||||||
|
def _add_token(tokens: set[str], word: str) -> None:
|
||||||
|
if len(word) < 3:
|
||||||
|
return
|
||||||
|
if word in _STOP_WORDS:
|
||||||
|
return
|
||||||
|
tokens.add(_stem(word))
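
The sub-token behaviour described in the `_tokenize` docstring can be illustrated with a small standalone sketch. It is not the AtoCore implementation: `STOP_WORDS` is a tiny hypothetical stand-in for `_STOP_WORDS`, stemming is left out, and lowercasing is done inline instead of through `_normalize`.

```python
import re

# Hypothetical stand-in for the module's _STOP_WORDS set.
STOP_WORDS = frozenset({"the", "and", "for", "with"})


def add_token(tokens: set[str], word: str) -> None:
    """Add a word unless it is too short or a stop word (no stemming here)."""
    if len(word) < 3 or word in STOP_WORDS:
        return
    tokens.add(word)


def tokenize(text: str) -> set[str]:
    """Return each full token plus the parts of hyphen/slash-separated ones."""
    tokens: set[str] = set()
    for raw in text.lower().split():
        # Strip leading/trailing punctuation, as the diff does.
        word = raw.strip(".,;:!?\"'()[]{}-/")
        if not word:
            continue
        add_token(tokens, word)
        # Also add sub-tokens so "polisher control" can match "polisher-control".
        if "-" in word or "/" in word:
            for sub in re.split(r"[-/]+", word):
                add_token(tokens, sub)
    return tokens


print(sorted(tokenize("The polisher-control loop")))
```

With this shape, every token of the unhyphenated query `polisher control` is also produced by the hyphenated form, which is the property the docstring is after.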
|
||||||
|
|
||||||
|
|
||||||
def _memory_matches(memory_content: str, normalized_response: str) -> bool:
|
def _memory_matches(memory_content: str, normalized_response: str) -> bool:
|
||||||
"""Return True if enough of the memory's tokens appear in the response.
|
"""Return True if enough of the memory's tokens appear in the response.
|
||||||
|
|
||||||
Uses token-overlap: tokenize both sides (lowercase, stem, drop stop
|
Dual-mode token overlap:
|
||||||
words), then check whether >= 70 % of the memory's content tokens
|
- Short memories (<= _LONG_MEMORY_TOKEN_COUNT stems): require
|
||||||
appear in the response token set.
|
>= 70 % of memory tokens echoed.
|
||||||
|
- Long memories (paragraphs): require an absolute floor of
|
||||||
|
_LONG_MODE_MIN_OVERLAP distinct stems echoed AND a softer
|
||||||
|
fraction of _LONG_MODE_MIN_FRACTION, so organic paraphrase
|
||||||
|
of a real project memory can reinforce without the response
|
||||||
|
quoting the paragraph verbatim.
|
||||||
"""
|
"""
|
||||||
if not memory_content:
|
if not memory_content:
|
||||||
return False
|
return False
|
||||||
@@ -202,4 +232,10 @@ def _memory_matches(memory_content: str, normalized_response: str) -> bool:
|
|||||||
return False
|
return False
|
||||||
response_tokens = _tokenize(normalized_response)
|
response_tokens = _tokenize(normalized_response)
|
||||||
overlap = memory_tokens & response_tokens
|
overlap = memory_tokens & response_tokens
|
||||||
return len(overlap) / len(memory_tokens) >= _MATCH_THRESHOLD
|
fraction = len(overlap) / len(memory_tokens)
|
||||||
|
if len(memory_tokens) <= _LONG_MEMORY_TOKEN_COUNT:
|
||||||
|
return fraction >= _MATCH_THRESHOLD
|
||||||
|
return (
|
||||||
|
len(overlap) >= _LONG_MODE_MIN_OVERLAP
|
||||||
|
and fraction >= _LONG_MODE_MIN_FRACTION
|
||||||
|
)
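
The dual-mode rule in this hunk can be exercised in isolation. The sketch below copies the three constants from the diff and works on token sets that are already tokenized. The function name `memory_matches` and the synthetic `t0..tN` tokens are illustrative only.

```python
MATCH_THRESHOLD = 0.70
LONG_MEMORY_TOKEN_COUNT = 15
LONG_MODE_MIN_OVERLAP = 12
LONG_MODE_MIN_FRACTION = 0.35


def memory_matches(memory_tokens: set[str], response_tokens: set[str]) -> bool:
    """Short memories need >= 70 % overlap; long ones need 12 stems and 35 %."""
    if not memory_tokens:
        return False
    overlap = memory_tokens & response_tokens
    fraction = len(overlap) / len(memory_tokens)
    if len(memory_tokens) <= LONG_MEMORY_TOKEN_COUNT:
        return fraction >= MATCH_THRESHOLD
    return (
        len(overlap) >= LONG_MODE_MIN_OVERLAP
        and fraction >= LONG_MODE_MIN_FRACTION
    )


long_memory = {f"t{i}" for i in range(40)}
print(memory_matches(long_memory, {f"t{i}" for i in range(16)}))  # 16 of 40 echoed
print(memory_matches(long_memory, {f"t{i}" for i in range(12)}))  # 12 of 40 echoed
```

One consequence worth checking against intent: just above the boundary, at 16 tokens, the absolute floor of 12 works out to 75 %, which is slightly stricter than the short-mode 70 %.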
|
||||||
|
|||||||
@@ -344,6 +344,9 @@ def get_memories_for_context(
|
|||||||
memory_types: list[str] | None = None,
|
memory_types: list[str] | None = None,
|
||||||
project: str | None = None,
|
project: str | None = None,
|
||||||
budget: int = 500,
|
budget: int = 500,
|
||||||
|
header: str = "--- AtoCore Memory ---",
|
||||||
|
footer: str = "--- End Memory ---",
|
||||||
|
query: str | None = None,
|
||||||
) -> tuple[str, int]:
|
) -> tuple[str, int]:
|
||||||
"""Get formatted memories for context injection.
|
"""Get formatted memories for context injection.
|
||||||
|
|
||||||
@@ -351,38 +354,72 @@ def get_memories_for_context(
|
|||||||
|
|
||||||
Budget allocation per Master Plan section 9:
|
Budget allocation per Master Plan section 9:
|
||||||
identity: 5%, preference: 5%, rest from retrieval budget
|
identity: 5%, preference: 5%, rest from retrieval budget
|
||||||
|
|
||||||
|
The caller can override ``header`` / ``footer`` to distinguish
|
||||||
|
multiple memory blocks in the same pack (e.g. identity/preference
|
||||||
|
vs project/knowledge memories).
|
||||||
|
|
||||||
|
When ``query`` is provided, candidates within each memory type
|
||||||
|
are ranked by lexical overlap against the query (stemmed token
|
||||||
|
intersection, ties broken by confidence). Without a query,
|
||||||
|
candidates fall through in the order ``get_memories`` returns
|
||||||
|
them — which is effectively "by confidence desc".
|
||||||
"""
|
"""
|
||||||
if memory_types is None:
|
if memory_types is None:
|
||||||
memory_types = ["identity", "preference"]
|
memory_types = ["identity", "preference"]
|
||||||
|
|
||||||
if budget <= 0:
|
if budget <= 0:
|
||||||
return "", 0
|
return "", 0
|
||||||
|
|
||||||
header = "--- AtoCore Memory ---"
|
|
||||||
footer = "--- End Memory ---"
|
|
||||||
wrapper_chars = len(header) + len(footer) + 2
|
wrapper_chars = len(header) + len(footer) + 2
|
||||||
if budget <= wrapper_chars:
|
if budget <= wrapper_chars:
|
||||||
return "", 0
|
return "", 0
|
||||||
|
|
||||||
available = budget - wrapper_chars
|
available = budget - wrapper_chars
|
||||||
selected_entries: list[str] = []
|
selected_entries: list[str] = []
|
||||||
|
used = 0
|
||||||
|
|
||||||
for index, mtype in enumerate(memory_types):
|
# Pre-tokenize the query once. ``_score_memory_for_query`` is a
|
||||||
type_budget = available if index == len(memory_types) - 1 else max(0, available // (len(memory_types) - index))
|
# free function below that reuses the reinforcement tokenizer so
|
||||||
type_used = 0
|
# lexical scoring here matches the reinforcement matcher.
|
||||||
|
query_tokens: set[str] | None = None
|
||||||
|
if query:
|
||||||
|
from atocore.memory.reinforcement import _normalize, _tokenize
|
||||||
|
|
||||||
|
query_tokens = _tokenize(_normalize(query))
|
||||||
|
if not query_tokens:
|
||||||
|
query_tokens = None
|
||||||
|
|
||||||
|
# Collect ALL candidates across the requested types into one
|
||||||
|
# pool, then rank globally before the budget walk. Ranking per
|
||||||
|
# type and walking types in order would starve later types when
|
||||||
|
# the first type's candidates filled the budget — even if a
|
||||||
|
# later-type candidate matched the query perfectly. Type order
|
||||||
|
# is preserved as a stable tiebreaker inside
|
||||||
|
# ``_rank_memories_for_query`` via Python's stable sort.
|
||||||
|
pool: list[Memory] = []
|
||||||
|
seen_ids: set[str] = set()
|
||||||
|
for mtype in memory_types:
|
||||||
for mem in get_memories(
|
for mem in get_memories(
|
||||||
memory_type=mtype,
|
memory_type=mtype,
|
||||||
project=project,
|
project=project,
|
||||||
min_confidence=0.5,
|
min_confidence=0.5,
|
||||||
limit=10,
|
limit=30,
|
||||||
):
|
):
|
||||||
entry = f"[{mem.memory_type}] {mem.content}"
|
if mem.id in seen_ids:
|
||||||
entry_len = len(entry) + 1
|
|
||||||
if entry_len > type_budget - type_used:
|
|
||||||
continue
|
continue
|
||||||
selected_entries.append(entry)
|
seen_ids.add(mem.id)
|
||||||
type_used += entry_len
|
pool.append(mem)
|
||||||
available -= type_used
|
|
||||||
|
if query_tokens is not None:
|
||||||
|
pool = _rank_memories_for_query(pool, query_tokens)
|
||||||
|
|
||||||
|
for mem in pool:
|
||||||
|
entry = f"[{mem.memory_type}] {mem.content}"
|
||||||
|
entry_len = len(entry) + 1
|
||||||
|
if entry_len > available - used:
|
||||||
|
continue
|
||||||
|
selected_entries.append(entry)
|
||||||
|
used += entry_len
|
||||||
|
|
||||||
if not selected_entries:
|
if not selected_entries:
|
||||||
return "", 0
|
return "", 0
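
The budget walk over the pooled candidates is a greedy fill that skips entries that do not fit instead of stopping at the first one. A minimal sketch of that loop, using plain strings in place of `Memory` objects (the function name `fill_budget` is hypothetical):

```python
def fill_budget(entries: list[str], available: int) -> list[str]:
    """Keep each entry that still fits; skip, rather than stop at, ones that don't."""
    selected: list[str] = []
    used = 0
    for entry in entries:
        # +1 mirrors the diff; presumably it accounts for a separator character.
        entry_len = len(entry) + 1
        if entry_len > available - used:
            continue  # too large for the remaining space; a shorter one may still fit
        selected.append(entry)
        used += entry_len
    return selected


print(fill_budget(["aaaa", "bbbbbbbbbb", "cc"], 9))
```

Because the loop uses `continue`, a long high-ranked entry that overflows the budget does not block shorter, lower-ranked entries behind it.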
|
||||||
@@ -394,6 +431,28 @@ def get_memories_for_context(
|
|||||||
return text, len(text)
|
return text, len(text)
|
||||||
|
|
||||||
|
|
||||||
|
def _rank_memories_for_query(
|
||||||
|
memories: list["Memory"],
|
||||||
|
query_tokens: set[str],
|
||||||
|
) -> list["Memory"]:
|
||||||
|
"""Rerank a memory list by lexical overlap with a pre-tokenized query.
|
||||||
|
|
||||||
|
Ordering key: (overlap_count DESC, confidence DESC). When a query
|
||||||
|
shares no tokens with a memory, overlap is zero and confidence
|
||||||
|
acts as the sole tiebreaker — which matches the pre-query
|
||||||
|
behaviour and keeps no-query calls stable.
|
||||||
|
"""
|
||||||
|
from atocore.memory.reinforcement import _normalize, _tokenize
|
||||||
|
|
||||||
|
scored: list[tuple[int, float, Memory]] = []
|
||||||
|
for mem in memories:
|
||||||
|
mem_tokens = _tokenize(_normalize(mem.content))
|
||||||
|
overlap = len(mem_tokens & query_tokens) if mem_tokens else 0
|
||||||
|
scored.append((overlap, mem.confidence, mem))
|
||||||
|
scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
|
||||||
|
return [mem for _, _, mem in scored]
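
The ordering key can be checked with a small standalone sketch. `Mem` is a hypothetical stand-in for the `Memory` dataclass that carries pre-tokenized content, so the example does not depend on the reinforcement tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Mem:
    """Hypothetical stand-in for Memory, holding already-tokenized content."""
    name: str
    tokens: frozenset
    confidence: float


def rank(memories: list[Mem], query_tokens: set[str]) -> list[Mem]:
    """Order by (overlap count DESC, confidence DESC), as in the diff."""
    scored = [(len(m.tokens & query_tokens), m.confidence, m) for m in memories]
    # sort() is stable even with reverse=True, so full ties keep input order.
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [m for _, _, m in scored]


pool = [
    Mem("a", frozenset({"x"}), 0.9),
    Mem("b", frozenset({"polisher", "control"}), 0.5),
    Mem("c", frozenset({"polisher"}), 0.8),
    Mem("d", frozenset({"y"}), 0.9),
]
print([m.name for m in rank(pool, {"polisher", "control"})])
```

Here `b` wins on overlap despite its lower confidence, and `a` and `d` tie on both keys, so they keep their pool order. That is the mechanism by which type order acts as the final tiebreaker.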
|
||||||
|
|
||||||
|
|
||||||
def _row_to_memory(row) -> Memory:
|
def _row_to_memory(row) -> Memory:
|
||||||
"""Convert a DB row to Memory dataclass."""
|
"""Convert a DB row to Memory dataclass."""
|
||||||
keys = row.keys() if hasattr(row, "keys") else []
|
keys = row.keys() if hasattr(row, "keys") else []
|
||||||
|
|||||||
234
t420-openclaw/AGENTS.md
Normal file
@@ -0,0 +1,234 @@
# AGENTS.md - Your Workspace

This folder is home. Treat it that way.

## First Run

If `BOOTSTRAP.md` exists, that's your birth certificate. Follow it, figure out who you are, then delete it. You won't need it again.

## Every Session

Before doing anything else:
1. Read `SOUL.md` — this is who you are
2. Read `USER.md` — this is who you're helping
3. Read `MODEL-ROUTING.md` — follow the auto-routing policy for model selection
4. Read `memory/YYYY-MM-DD.md` (today + yesterday) for recent context
5. **If in MAIN SESSION** (direct chat with your human): Also read `MEMORY.md`

Don't ask permission. Just do it.

## Memory

You wake up fresh each session. These files are your continuity:
- **Daily notes:** `memory/YYYY-MM-DD.md` (create `memory/` if needed) — raw logs of what happened
- **Long-term:** `MEMORY.md` — your curated memories, like a human's long-term memory

Capture what matters. Decisions, context, things to remember. Skip the secrets unless asked to keep them.

### 🧠 MEMORY.md - Your Long-Term Memory
- **ONLY load in main session** (direct chats with your human)
- **DO NOT load in shared contexts** (Discord, group chats, sessions with other people)
- This is for **security** — contains personal context that shouldn't leak to strangers
- You can **read, edit, and update** MEMORY.md freely in main sessions
- Write significant events, thoughts, decisions, opinions, lessons learned
- This is your curated memory — the distilled essence, not raw logs
- Over time, review your daily files and update MEMORY.md with what's worth keeping

### 📝 Write It Down - No "Mental Notes"!
- **Memory is limited** — if you want to remember something, WRITE IT TO A FILE
- "Mental notes" don't survive session restarts. Files do.
- When someone says "remember this" → update `memory/YYYY-MM-DD.md` or relevant file
- When you learn a lesson → update AGENTS.md, TOOLS.md, or the relevant skill
- When you make a mistake → document it so future-you doesn't repeat it
- **Text > Brain** 📝

## Safety

- Don't exfiltrate private data. Ever.
- Don't run destructive commands without asking.
- `trash` > `rm` (recoverable beats gone forever)
- When in doubt, ask.

## External vs Internal

**Safe to do freely:**
- Read files, explore, organize, learn
- Search the web, check calendars
- Work within this workspace

**Ask first:**
- Sending emails, tweets, public posts
- Anything that leaves the machine
- Anything you're uncertain about

## Group Chats

You have access to your human's stuff. That doesn't mean you *share* their stuff. In groups, you're a participant — not their voice, not their proxy. Think before you speak.

### 💬 Know When to Speak!
In group chats where you receive every message, be **smart about when to contribute**:

**Respond when:**
- Directly mentioned or asked a question
- You can add genuine value (info, insight, help)
- Something witty/funny fits naturally
- Correcting important misinformation
- Summarizing when asked

**Stay silent (HEARTBEAT_OK) when:**
- It's just casual banter between humans
- Someone already answered the question
- Your response would just be "yeah" or "nice"
- The conversation is flowing fine without you
- Adding a message would interrupt the vibe

**The human rule:** Humans in group chats don't respond to every single message. Neither should you. Quality > quantity. If you wouldn't send it in a real group chat with friends, don't send it.

**Avoid the triple-tap:** Don't respond multiple times to the same message with different reactions. One thoughtful response beats three fragments.

Participate, don't dominate.

### 😊 React Like a Human!
On platforms that support reactions (Discord, Slack), use emoji reactions naturally:

**React when:**
- You appreciate something but don't need to reply (👍, ❤️, 🙌)
- Something made you laugh (😂, 💀)
- You find it interesting or thought-provoking (🤔, 💡)
- You want to acknowledge without interrupting the flow
- It's a simple yes/no or approval situation (✅, 👀)

**Why it matters:**
Reactions are lightweight social signals. Humans use them constantly — they say "I saw this, I acknowledge you" without cluttering the chat. You should too.

**Don't overdo it:** One reaction per message max. Pick the one that fits best.

## Tools

When a task is contextual and project-dependent, use the `atocore-context` skill to query Dalidou-hosted AtoCore for trusted project state, retrieval, context-building, registered project refresh, or project registration discovery when that will improve accuracy. Treat AtoCore as additive and fail-open; do not replace OpenClaw's own memory with it. Prefer `projects` and `refresh-project <id>` when a known project needs a clean source refresh, and use `project-template` when proposing a new project registration, and `propose-project ...` when you want a normalized preview before editing the registry manually.

### Organic AtoCore Routing

For normal project knowledge questions, use AtoCore by default without waiting for the human to ask for the helper explicitly.

Use AtoCore first when the prompt:
- mentions a registered project id or alias
- asks about architecture, constraints, status, requirements, vendors, planning, prior decisions, or current project truth
- would benefit from cross-source context instead of only the local repo

Preferred flow:
1. `auto-context "<prompt>" 3000` for most project knowledge questions
2. `project-state <project>` when the user is clearly asking for trusted current truth
3. `audit-query "<prompt>" 5 [project]` when broad prompts drift, archive/history noise appears, or retrieval quality is being evaluated
4. `refresh-project <id>` before answering if the user explicitly asked to refresh or ingest project changes

For AtoCore improvement work, prefer this sequence:
1. retrieval-quality pass
2. Wave 2 trusted-operational ingestion
3. AtoDrive clarification
4. restore and ops validation

Wave 2 trusted-operational truth should prioritize:
- current status
- current decisions
- requirements baseline
- milestone plan
- next actions

Do not ingest the whole PKM vault before the trusted-operational layer is in good shape. Treat AtoDrive as curated operational truth, not a generic dump.

Do not force AtoCore for purely local coding actions like fixing a function, editing one file, or running tests, unless broader project context is likely to matter.

If `auto-context` returns `no_project_match` or AtoCore is unavailable, continue normally with OpenClaw's own tools and memory.
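
The fail-open rule can be sketched as a thin wrapper around whatever actually invokes the helper. This is an assumption-laden illustration: `fetch_context_fail_open` and the `run_auto_context` callable are hypothetical names, and the sketch assumes the helper's text output contains the literal `no_project_match` when no project is detected.

```python
from typing import Callable, Optional

NO_MATCH_MARKER = "no_project_match"  # assumed to appear verbatim in the output


def fetch_context_fail_open(run_auto_context: Callable[[], str]) -> Optional[str]:
    """Return AtoCore context, or None so the caller carries on without it."""
    try:
        output = run_auto_context()
    except Exception:
        # AtoCore unreachable, helper missing, timeout: never block the session.
        return None
    if not output.strip() or NO_MATCH_MARKER in output:
        return None
    return output


def unavailable() -> str:
    raise ConnectionError("AtoCore is down")


print(fetch_context_fail_open(unavailable))
```

A `None` result means "answer with OpenClaw's own tools and memory". Any real context is treated as additive on top of that.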

Skills provide your tools. When you need one, check its `SKILL.md`. Keep local notes (camera names, SSH details, voice preferences) in `TOOLS.md`.

**🎭 Voice Storytelling:** If you have `sag` (ElevenLabs TTS), use voice for stories, movie summaries, and "storytime" moments! Way more engaging than walls of text. Surprise people with funny voices.

**📝 Platform Formatting:**
- **Discord/WhatsApp:** No markdown tables! Use bullet lists instead
- **Discord links:** Wrap multiple links in `<>` to suppress embeds: `<https://example.com>`
- **WhatsApp:** No headers — use **bold** or CAPS for emphasis

## 💓 Heartbeats - Be Proactive!

When you receive a heartbeat poll (message matches the configured heartbeat prompt), don't just reply `HEARTBEAT_OK` every time. Use heartbeats productively!

Default heartbeat prompt:
`Read HEARTBEAT.md if it exists (workspace context). Follow it strictly. Do not infer or repeat old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK.`

You are free to edit `HEARTBEAT.md` with a short checklist or reminders. Keep it small to limit token burn.

### Heartbeat vs Cron: When to Use Each

**Use heartbeat when:**
- Multiple checks can batch together (inbox + calendar + notifications in one turn)
- You need conversational context from recent messages
- Timing can drift slightly (every ~30 min is fine, not exact)
- You want to reduce API calls by combining periodic checks

**Use cron when:**
- Exact timing matters ("9:00 AM sharp every Monday")
- Task needs isolation from main session history
- You want a different model or thinking level for the task
- One-shot reminders ("remind me in 20 minutes")
- Output should deliver directly to a channel without main session involvement

**Tip:** Batch similar periodic checks into `HEARTBEAT.md` instead of creating multiple cron jobs. Use cron for precise schedules and standalone tasks.

**Things to check (rotate through these, 2-4 times per day):**
- **Emails** - Any urgent unread messages?
- **Calendar** - Upcoming events in next 24-48h?
- **Mentions** - Twitter/social notifications?
- **Weather** - Relevant if your human might go out?

**Track your checks** in `memory/heartbeat-state.json`:
```json
{
  "lastChecks": {
    "email": 1703275200,
    "calendar": 1703260800,
    "weather": null
  }
}
```
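
A minimal sketch of reading and updating that state file, assuming the values are Unix timestamps in seconds as the example suggests. The helper names `load_state`, `is_due`, and `record_check` are hypothetical and not part of any existing skill.

```python
import json
import time
from pathlib import Path
from typing import Optional

STATE_PATH = Path("memory/heartbeat-state.json")


def load_state(path: Path = STATE_PATH) -> dict:
    """Read the state file, falling back to an empty state if missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {"lastChecks": {}}


def is_due(state: dict, name: str, min_interval_s: int,
           now: Optional[float] = None) -> bool:
    """A check is due if it never ran (null) or its interval has elapsed."""
    now = time.time() if now is None else now
    last = state.get("lastChecks", {}).get(name)
    return last is None or now - last >= min_interval_s


def record_check(state: dict, name: str, path: Path = STATE_PATH,
                 now: Optional[float] = None) -> None:
    """Stamp the check with the current time and persist the state."""
    stamp = int(time.time() if now is None else now)
    state.setdefault("lastChecks", {})[name] = stamp
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


state = {"lastChecks": {"email": 1703275200, "weather": None}}
print(is_due(state, "weather", 3600, now=1703275300))
print(is_due(state, "email", 3600, now=1703275300))
```

Treating `null` as "never checked" keeps the file format from the example unchanged.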

**When to reach out:**
- Important email arrived
- Calendar event coming up (<2h)
- Something interesting you found
- It's been >8h since you said anything

**When to stay quiet (HEARTBEAT_OK):**
- Late night (23:00-08:00) unless urgent
- Human is clearly busy
- Nothing new since last check
- You just checked <30 minutes ago
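
The mechanical parts of the stay-quiet rules can be written as a small predicate. This is only a sketch: `should_stay_quiet` and its parameters are hypothetical, and "human is clearly busy" is left out because it needs judgment rather than a clock.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_stay_quiet(now: datetime, last_check: Optional[datetime],
                      urgent: bool = False, has_news: bool = True) -> bool:
    """Return True when the heartbeat reply should just be HEARTBEAT_OK."""
    late_night = now.hour >= 23 or now.hour < 8  # 23:00-08:00 window
    if late_night and not urgent:
        return True
    if not has_news:
        return True  # nothing new since the last check
    if last_check is not None and now - last_check < timedelta(minutes=30):
        return True  # checked less than 30 minutes ago
    return False


print(should_stay_quiet(datetime(2024, 1, 1, 23, 30), None))
print(should_stay_quiet(datetime(2024, 1, 1, 23, 30), None, urgent=True))
```

The `urgent` flag only overrides the late-night window here, matching how the list words it. Whether urgency should also bypass the 30-minute rule is a policy choice the source does not settle.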

**Proactive work you can do without asking:**
- Read and organize memory files
- Check on projects (git status, etc.)
- Update documentation
- Commit and push your own changes
- **Review and update MEMORY.md** (see below)

### 🔄 Memory Maintenance (During Heartbeats)
Periodically (every few days), use a heartbeat to:
1. Read through recent `memory/YYYY-MM-DD.md` files
2. Identify significant events, lessons, or insights worth keeping long-term
3. Update `MEMORY.md` with distilled learnings
4. Remove outdated info from MEMORY.md that's no longer relevant

Think of it like a human reviewing their journal and updating their mental model. Daily files are raw notes; MEMORY.md is curated wisdom.

The goal: Be helpful without being annoying. Check in a few times a day, do useful background work, but respect quiet time.

## Make It Yours

This is a starting point. Add your own conventions, style, and rules as you figure out what works.

## Orchestration Completion Protocol
After any orchestration chain completes (research → review → condensation):
1. Secretary MUST be the final agent tasked
2. Secretary produces the condensation file AND posts a distillate to Discord #reports
3. Manager should include in Secretary's task: "Post a distillate to Discord #reports summarizing this orchestration"

133
t420-openclaw/ATOCORE-OPERATIONS.md
Normal file
@@ -0,0 +1,133 @@
# AtoCore Operations

This is the current operating playbook for making AtoCore more dependable and higher-signal.

## Order of Work

1. Retrieval-quality pass
2. Wave 2 trusted-operational ingestion
3. AtoDrive clarification
4. Restore and ops validation

## 1. Retrieval-Quality Pass

Observed behavior from the live service:

- broad prompts like `gigabit` and `polisher` still surface archive/history noise
- meaningful prompts like `mirror frame stiffness requirements and selected architecture` are much sharper
- now that the corpus is large enough, ranking quality matters more than raw corpus presence

Use these commands first:

```bash
python atocore.py audit-query "gigabit" 5
python atocore.py audit-query "polisher" 5
python atocore.py audit-query "mirror frame stiffness requirements and selected architecture" 5 p04-gigabit
python atocore.py audit-query "interferometer error budget and vendor selection constraints" 5 p05-interferometer
python atocore.py audit-query "polisher system map shared contracts and calibration workflow" 5 p06-polisher
```

What to fix in the retrieval pass:

- reduce `_archive`, `pre-cleanup`, `pre-migration`, and `History` prominence
- prefer current-status, decision, requirement, architecture-freeze, and milestone docs
- prefer trusted project-state over freeform notes when both speak to current truth
- keep broad prompts from matching stale or generic chunks too easily

Suggested acceptance bar:

- top 5 for active-project prompts contain at least one current-status or next-focus item
- top 5 contain at least one decision or architecture-baseline item
- top 5 contain at least one requirement or constraints item
- broad single-word prompts no longer lead with archive/history chunks
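
The last acceptance criterion lends itself to a mechanical check. The sketch below assumes each audited chunk exposes a source path, which this playbook does not confirm. The markers come from the "what to fix" list above, matching is case-insensitive by choice, and the example paths are made up.

```python
# Markers taken from the "what to fix" list; matched case-insensitively here.
NOISE_MARKERS = ("_archive", "pre-cleanup", "pre-migration", "history")


def is_archive_noise(source_path: str) -> bool:
    """True if the path looks like archive/history material."""
    lowered = source_path.lower()
    return any(marker in lowered for marker in NOISE_MARKERS)


def leads_with_noise(top_paths: list[str]) -> bool:
    """True if the first-ranked result is archive/history noise."""
    return bool(top_paths) and is_archive_noise(top_paths[0])


# Hypothetical ranked source paths for a broad prompt such as "gigabit".
ranked = [
    "p04-gigabit/_archive/old-notes.md",
    "p04-gigabit/status/current-status.md",
]
print(leads_with_noise(ranked))
```

A substring match on `history` is deliberately crude and would also flag legitimate documents such as a decision history, so treat a hit as a prompt to look, not as a verdict.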

## 2. Wave 2 Trusted-Operational Ingestion

Do not ingest the whole PKM vault next.

Wave 2 should ingest trusted operational truth for each active project:

- current status dashboard or status note
- current decisions / decision log
- requirements baseline
- architecture freeze / current baseline
- milestone plan
- next actions / near-term focus

Recommended helper flow:

```bash
python atocore.py project-state p04-gigabit
python atocore.py project-state p05-interferometer
python atocore.py project-state p06-polisher
python atocore.py project-state-set p04-gigabit status next_focus "Continue curated support and frame-context buildout." "Wave 2 status dashboard" 1.0
python atocore.py project-state-set p05-interferometer requirement key_constraints "Preserve current error-budget, thermal, and vendor-selection constraints as the working baseline." "Wave 2 requirements baseline" 1.0
python atocore.py project-state-set p06-polisher decision system_boundary "The suite remains a three-layer chain with explicit planning, translation, and execution boundaries." "Wave 2 decision log" 1.0
python atocore.py refresh-project p04-gigabit
python atocore.py refresh-project p05-interferometer
python atocore.py refresh-project p06-polisher
```

Use project-state for the most authoritative "current truth" fields, then refresh the registered project roots after curated Wave 2 documents land.

## 3. AtoDrive Clarification

AtoDrive should become a trusted-operational source, not a generic corpus dump.

Good AtoDrive candidates:

- current dashboards
- current baselines
- approved architecture docs
- decision logs
- milestone and next-step views
- operational source-of-truth files that humans actively maintain

Avoid as default AtoDrive ingest:

- large generic archives
- duplicated exports
- stale snapshots when a newer baseline exists
- exploratory notes that are not designated current truth

Rule of thumb:

- if the file answers "what is true now?" it may belong in trusted-operational
- if the file mostly answers "what did we think at some point?" it belongs in the broader corpus, not Wave 2

## 4. Restore and Ops Validation

Backups are not enough until restore has been tested.

Validate these explicitly:

- SQLite metadata restore
- Chroma restore or rebuild
- registry restore
- source-root refresh after restore
- health and stats consistency after recovery

Recommended restore drill:

1. Record current `health`, `stats`, and `projects` output.
2. Restore SQLite metadata and project registry from backup.
3. Decide whether Chroma is restored from backup or rebuilt from source.
4. Run project refresh for active projects.
5. Compare vector/doc counts and run retrieval audits again.
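
Step 5 of the drill comes down to diffing two snapshots of counts. The playbook does not specify the `stats` output format, so the sketch below assumes the counts have already been parsed into plain dictionaries. The key names and numbers are invented for illustration.

```python
from typing import Optional


def diff_counts(before: dict[str, int],
                after: dict[str, int]) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Return {key: (before, after)} for every count that changed or went missing."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


# Hypothetical pre- and post-restore snapshots.
before = {"documents": 120, "vectors": 4810, "projects": 3}
after = {"documents": 120, "vectors": 4795, "projects": 3}
print(diff_counts(before, after))
```

An empty result means the counts match. Matching counts do not prove that ranking behaviour is unchanged, which is why the drill also reruns the retrieval audits.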

Commands to capture the before/after baseline:

```bash
python atocore.py health
python atocore.py stats
python atocore.py projects
python atocore.py audit-query "gigabit" 5
python atocore.py audit-query "interferometer error budget and vendor selection constraints" 5 p05-interferometer
```

Recovery policy decision still needed:

- prefer Chroma backup restore for fast recovery when backup integrity is trusted
- prefer Chroma rebuild when backups are suspect, schema changed, or ranking behavior drifts unexpectedly

The important part is to choose one policy on purpose and validate it, not leave it implicit.

105
t420-openclaw/SKILL.md
Normal file
@@ -0,0 +1,105 @@
---
name: atocore-context
description: Use Dalidou-hosted AtoCore as a read-only external context service for project state, retrieval, and context-building without touching OpenClaw's own memory.
---

# AtoCore Context

Use this skill when you need trusted project context, retrieval help, or AtoCore
health/status from the canonical Dalidou instance.

## Purpose

AtoCore is an additive external context service.

- It does not replace OpenClaw's own memory.
- It should be used for contextual work, not trivial prompts.
- It is read-only in this first integration batch.
- If AtoCore is unavailable, continue normally.

## Canonical Endpoint

Default base URL:

```bash
http://dalidou:8100
```

Override with:

```bash
ATOCORE_BASE_URL=http://host:port
```
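
For a client written in Python rather than shell, the same default-plus-override rule might look like the sketch below. `resolve_base_url` is a hypothetical helper, and trimming whitespace and a trailing slash is a convenience added here, not something the skill specifies.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "http://dalidou:8100"


def resolve_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Use ATOCORE_BASE_URL when set and non-empty, else the default."""
    env = os.environ if env is None else env
    value = env.get("ATOCORE_BASE_URL", "").strip()
    # Trailing slash removed so callers can append paths uniformly.
    return (value or DEFAULT_BASE_URL).rstrip("/")


print(resolve_base_url({}))
print(resolve_base_url({"ATOCORE_BASE_URL": "http://host:9000/"}))
```

Passing the environment in as a mapping keeps the function easy to test without mutating `os.environ`.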

## Safe Usage

Use AtoCore for:
- project-state checks
- automatic project detection for normal project questions
- retrieval-quality audits before declaring a project corpus "good enough"
- retrieval over ingested project/ecosystem docs
- context-building for complex project prompts
- verifying current AtoCore hosting and architecture state
- listing registered projects and refreshing a known project source set
- inspecting the project registration template before proposing a new project entry
- generating a proposal preview for a new project registration without writing it
- registering an approved project entry when explicitly requested
- updating an existing registered project when aliases or description need refinement

Do not use AtoCore for:
- automatic memory write-back
- replacing OpenClaw memory
- silent ingestion of broad new corpora without approval
- ingesting the whole PKM vault before trusted operational truth is staged
- mutating the registry automatically without human approval

|
## Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh health
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh sources
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh stats
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh projects
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh project-template
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh detect-project "what's the interferometer error budget?"
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh auto-context "what's the interferometer error budget?" 3000
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh debug-context
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh audit-query "gigabit" 5
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh audit-query "mirror frame stiffness requirements and selected architecture" 5 p04-gigabit
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh propose-project p07-example "p07,example-project" vault incoming/projects/p07-example "Example project" "Primary staged project docs"
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh register-project p07-example "p07,example-project" vault incoming/projects/p07-example "Example project" "Primary staged project docs"
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh update-project p05 "Curated staged docs for the P05 interferometer architecture, vendors, and error-budget project."
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh refresh-project p05
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh project-state atocore
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh project-state-set p05-interferometer status next_focus "Freeze current error-budget baseline and vendor downselect." "Wave 2 status dashboard" 1.0
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh project-state-invalidate p05-interferometer status next_focus
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh query "What is AtoDrive?"
|
||||||
|
~/clawd/skills/atocore-context/scripts/atocore.sh context-build "Need current AtoCore architecture" atocore 3000
|
||||||
|
```
|
||||||
|
|
||||||
|
Direct Python entrypoint for non-Bash environments:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python ~/clawd/skills/atocore-context/scripts/atocore.py health
|
||||||
|
```
|
||||||
|
|
||||||
|
## Contract
|
||||||
|
|
||||||
|
- prefer AtoCore only when additional context is genuinely useful
|
||||||
|
- trust AtoCore as additive context, not as a hard runtime dependency
|
||||||
|
- fail open if the service errors or times out
|
||||||
|
- cite when information came from AtoCore rather than local OpenClaw memory
|
||||||
|
- for normal project knowledge questions, prefer `auto-context "<prompt>" 3000` before answering
|
||||||
|
- use `detect-project "<prompt>"` when you want to inspect project inference explicitly
|
||||||
|
- use `debug-context` right after `auto-context` or `context-build` when you want to inspect the exact last AtoCore context pack
|
||||||
|
- use `audit-query "<prompt>" 5 [project]` when retrieval quality is in question, especially for broad prompts
|
||||||
|
- prefer `projects` plus `refresh-project <id>` over long ad hoc ingest instructions when the project is already registered
|
||||||
|
- use `project-template` when preparing a new project registration proposal
|
||||||
|
- use `propose-project ...` to draft a normalized entry and review collisions first
|
||||||
|
- use `register-project ...` only after the proposal has been reviewed and approved
|
||||||
|
- use `update-project ...` when a registered project's description or aliases need refinement before refresh
|
||||||
|
- use `project-state-set` for trusted operational truth such as current status, current decisions, frozen requirements, milestone baselines, and next actions
|
||||||
|
- do Wave 2 before broad PKM expansion: status dashboards, decision logs, milestone views, current baseline docs, and next-step views
|
||||||
|
- treat AtoDrive as a curated trusted-operational source, not a generic dump of miscellaneous drive files
|
||||||
|
- validate restore posture explicitly; a backup is not trusted until restore or rebuild steps have been exercised successfully
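The fail-open rule in the contract can be checked on the caller side. When the service is unreachable and fail-open is enabled, the helper script in this change prints `{"status": "unavailable", "source": "atocore", "fail_open": true}` and exits 0. A minimal sketch of detecting that case follows; the function name `atocore_unavailable` is hypothetical, and treating empty or non-JSON output as unavailable is an assumption of this sketch.

```python
import json


def atocore_unavailable(stdout: str) -> bool:
    """True when helper output is empty, not JSON, or the fail-open payload."""
    try:
        payload = json.loads(stdout)
    except (json.JSONDecodeError, TypeError):
        # Assumption: unparseable output means "continue without AtoCore".
        return True
    return isinstance(payload, dict) and payload.get("fail_open") is True
```

A caller would skip the AtoCore supplement when this returns `True` and answer from local OpenClaw memory as usual.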
|
||||||
279
t420-openclaw/TOOLS.md
Normal file
@@ -0,0 +1,279 @@
|
|||||||
|
# TOOLS.md - Local Notes
|
||||||
|
|
||||||
|
|
||||||
|
## AtoCore (External Context Service)
|
||||||
|
|
||||||
|
- **Canonical Host:** http://dalidou:8100
|
||||||
|
- **Role:** Read-only external context service for trusted project state, retrieval, context-building, registered project refresh, project registration discovery, and retrieval-quality auditing
|
||||||
|
- **Machine state lives on:** Dalidou (/srv/storage/atocore/data/...)
|
||||||
|
- **Rule:** Use AtoCore as additive context only; do not treat it as a replacement for OpenClaw memory
|
||||||
|
- **Helper script:** /home/papa/clawd/skills/atocore-context/scripts/atocore.sh
|
||||||
|
- **Python fallback:** `/home/papa/clawd/skills/atocore-context/scripts/atocore.py` for non-Bash environments
|
||||||
|
- **Key commands:** `projects`, `project-template`, `detect-project "<prompt>"`, `auto-context "<prompt>" [budget] [project]`, `debug-context`, `audit-query "<prompt>" [top_k] [project]`, `propose-project ...`, `register-project ...`, `update-project <id> "description" ["aliases"]`, `refresh-project <id>`, `project-state <id> [category]`, `project-state-set <project> <category> <key> <value> [source] [confidence]`, `project-state-invalidate <project> <category> <key>`, `context-build ...`
|
||||||
|
- **Fail-open rule:** If AtoCore is unavailable, continue normal OpenClaw behavior
|
||||||
|
|
||||||
|
### Organic Usage Rule
|
||||||
|
|
||||||
|
- For normal project knowledge questions, try `auto-context` first.
|
||||||
|
- For retrieval complaints or broad-prompt drift, run `audit-query` before changing ingestion scope.
|
||||||
|
- Use `project-state` when you want trusted current truth only.
|
||||||
|
- Use `project-state-set` for current status, current decisions, baseline requirements, milestone views, and next actions.
|
||||||
|
- Use `query` for quick probing/debugging.
|
||||||
|
- Use `context-build` when you already know the project and want the exact context pack.
|
||||||
|
- Use `debug-context` right after `auto-context` or `context-build` if you want to inspect the exact AtoCore supplement being fed into the workflow.
|
||||||
|
- Do Wave 2 trusted-operational ingestion before broad PKM expansion.
|
||||||
|
- Treat AtoDrive as a curated operational-truth source, not a generic bulk ingest target.
|
||||||
|
- Keep purely local coding tasks local unless broader project context is likely to help.
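The routing rules above amount to a small lookup from intent to helper command. The sketch below is illustrative only: the intent labels and the `pick_command` function are hypothetical, while the command names come from the list above.

```python
from __future__ import annotations

# Hypothetical intent labels mapped to the commands named in the rules above.
COMMAND_FOR_INTENT = {
    "project_question": "auto-context",
    "retrieval_complaint": "audit-query",
    "trusted_current_truth": "project-state",
    "quick_probe": "query",
    "known_project_context": "context-build",
    "inspect_last_context": "debug-context",
}


def pick_command(intent: str) -> str | None:
    """Return the helper command for an intent, or None to stay local."""
    return COMMAND_FOR_INTENT.get(intent)
```

Returning `None` for unknown intents matches the last rule: purely local tasks stay local.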
|
||||||
|
|
||||||
|
## PKM / Obsidian Vault
|
||||||
|
|
||||||
|
- **Local Path:** `/home/papa/obsidian-vault/`
|
||||||
|
- **Name:** Antoine Brain Extension
|
||||||
|
- **Sync:** Syncthing (syncs from dalidou)
|
||||||
|
- **Access:** ✅ Direct local access — no SSH needed!
|
||||||
|
|
||||||
|
## ATODrive (Work Documents)
|
||||||
|
|
||||||
|
- **Local Path:** `/home/papa/ATODrive/`
|
||||||
|
- **Sync:** Syncthing (syncs from dalidou SeaDrive)
|
||||||
|
- **Access:** ✅ Direct local access
|
||||||
|
|
||||||
|
## Atomaste (Business/Templates)
|
||||||
|
|
||||||
|
- **Local Path:** `/home/papa/Atomaste/`
|
||||||
|
- **Sync:** Syncthing (syncs from dalidou SeaDrive)
|
||||||
|
- **Access:** ✅ Direct local access
|
||||||
|
|
||||||
|
## Atomaste Finance (Canonical Expense System)
|
||||||
|
|
||||||
|
- **Single home:** `/home/papa/Atomaste/03_Finances/Expenses/`
|
||||||
|
- **Rule:** If it is expense-related, it belongs under `Expenses/`, not `Documents/Receipts/`
|
||||||
|
- **Per-year structure:**
  - `YYYY/Inbox/` — unprocessed incoming receipts/screenshots
  - `YYYY/receipts/` — final home for processed raw receipt files
  - `YYYY/expenses_master.csv` — main structured expense table
  - `YYYY/reports/` — derived summaries, exports, tax packages
|
||||||
|
- **Workflow:** Inbox → `expenses_master.csv` → `receipts/`
|
||||||
|
- **Legacy path:** `/home/papa/Atomaste/03_Finances/Documents/Receipts/` is deprecated and should stay unused except for the migration note
|
||||||
|
|
||||||
|
## Odile Inc (Corporate)
|
||||||
|
|
||||||
|
- **Local Path:** `/home/papa/Odile Inc/`
|
||||||
|
- **Sync:** Syncthing (syncs from dalidou SeaDrive `My Libraries\Odile\Odile Inc`)
|
||||||
|
- **Access:** ✅ Direct local access
|
||||||
|
- **Entity:** Odile Bérubé O.D. Inc. (SPCC, optometrist)
|
||||||
|
- **Fiscal year end:** July 31
|
||||||
|
- **Structure:** `01_Finances/` (BankStatements, Expenses, Payroll, Revenue, Taxes), `02_Admin/`, `Inbox/`
|
||||||
|
- **Rule:** Corporate docs go here, personal docs go to `Impôts Odile/Dossier_Fiscal_YYYY/`
|
||||||
|
|
||||||
|
## Impôts Odile (Personal Tax)
|
||||||
|
|
||||||
|
- **Local Path:** `/home/papa/Impôts Odile/`
|
||||||
|
- **Sync:** Syncthing (syncs from dalidou SeaDrive)
|
||||||
|
- **Access:** ✅ Direct local access
|
||||||
|
- **Structure:** `Dossier_Fiscal_YYYY/` (9 sections: revenus, dépenses, crédits, feuillets, REER, dons, comptable, frais médicaux, budget)
|
||||||
|
- **Cron:** Monthly receipt processing (1st of month, 2 PM ET) scans mario@atomaste.ca for Odile's emails
|
||||||
|
|
||||||
|
## Git Repos (via Gitea)
|
||||||
|
|
||||||
|
- **Gitea URL:** http://100.80.199.40:3000
|
||||||
|
- **Auth:** Token in `~/.gitconfig` — **ALWAYS use auth for API calls** (private repos won't show without it)
|
||||||
|
- **API Auth Header:** `Authorization: token $(git config --get credential.http://100.80.199.40:3000.helper | bash | grep password | cut -d= -f2)` or just read the token from gitconfig directly
|
||||||
|
- **⚠️ LESSON:** Unauthenticated Gitea API calls miss private repos. Always authenticate.
|
||||||
|
- **Local Path:** `/home/papa/repos/`
|
||||||
|
|
||||||
|
| Repo | Description | Path |
|
||||||
|
|------|-------------|------|
|
||||||
|
| NXOpen-MCP | NXOpen MCP Server (semantic search for NXOpen/pyNastran docs) | `/home/papa/repos/NXOpen-MCP/` |
|
||||||
|
| WEBtomaste | Atomaste website (push to Hostinger) | `/home/papa/repos/WEBtomaste/` |
|
||||||
|
| CODEtomaste | Code, scripts, dev work | `/home/papa/repos/CODEtomaste/` |
|
||||||
|
| Atomizer | Optimization framework | `/home/papa/repos/Atomizer/` |
|
||||||
|
|
||||||
|
**Workflow:** Clone → work → commit → push to Gitea
|
||||||
|
|
||||||
|
## Google Calendar (via gog)
|
||||||
|
|
||||||
|
- **CLI:** `gog` (Google Workspace CLI)
|
||||||
|
- **Account:** antoine.letarte@gmail.com
|
||||||
|
- **Scopes:** Calendar only (no Gmail, Drive, etc.)
|
||||||
|
- **Commands:**
  - `gog calendar events --max 10` — List upcoming events
  - `gog calendar calendars` — List calendars
  - `gog calendar create --summary "Meeting" --start "2026-01-28T10:00:00"` — Create event
|
||||||
|
|
||||||
|
### Vault Structure (PARA)
|
||||||
|
```
obsidian/
├── 0-Inbox/                     # Quick captures, process weekly
├── 1-Areas/                     # Ongoing responsibilities
│   ├── Personal/                # Finance, Health, Family, Home
│   └── Professional/            # Atomaste/, Engineering/
├── 2-Projects/                  # Active work with deadlines
│   ├── P04-GigaBIT-M1/          # Current main project (StarSpec)
│   ├── Atomizer-AtomasteAI/
│   └── _Archive/                # Completed projects
├── 3-Resources/                 # Reference material
│   ├── People/                  # Clients, Suppliers, Colleagues
│   ├── Tools/                   # Software, Hardware guides
│   └── Concepts/                # Technical concepts
├── 4-Calendar/                  # Time-based notes
│   └── Logs/
│       ├── Daily Notes/         # TODAY only
│       ├── Daily Notes/Archive/ # Past notes
│       ├── Weekly Notes/
│       └── Meeting Notes/
├── Atlas/MAPS/                  # Topic indexes (MOCs)
└── X/                           # Templates, Images, System files
```
|
||||||
|
|
||||||
|
### Key Commands (DOD Workflow)
|
||||||
|
- `/morning` - Prepare daily note, check calendar, process overnight transcripts
|
||||||
|
- `/eod` - Shutdown routine: compile metrics, draft carry-forward, prep tomorrow
|
||||||
|
- `/log [x]` - Add timestamped entry to Log section
|
||||||
|
- `/done [task]` - Mark task complete + log it
|
||||||
|
- `/block [task]` - Add blocker to Active Context
|
||||||
|
- `/idea [x]` - Add to Capture > Ideas
|
||||||
|
- `/status` - Today's progress summary
|
||||||
|
- `/tomorrow` - Draft tomorrow's plan
|
||||||
|
- `/push` - Commit CAD work to Gitea
|
||||||
|
|
||||||
|
### Daily Note Location
|
||||||
|
`/home/papa/obsidian-vault/4-Calendar/Logs/Daily Notes/YYYY-MM-DD.md`
|
||||||
|
|
||||||
|
### Transcript Inbox
|
||||||
|
`/home/papa/obsidian-vault/0-Inbox/Transcripts/` — subfolders: daily, ideas, instructions, journal, reviews, meetings, captures, notes
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Access Boundaries
|
||||||
|
|
||||||
|
See **SECURITY.md** for full details. Summary:
|
||||||
|
|
||||||
|
**I have access to:**
|
||||||
|
- `/home/papa/clawd/` (my workspace)
|
||||||
|
- `/home/papa/obsidian-vault/` (PKM via Syncthing)
|
||||||
|
- `/home/papa/ATODrive/` (work docs via Syncthing)
|
||||||
|
- `/home/papa/Atomaste/` (business/templates via Syncthing)
|
||||||
|
|
||||||
|
**I do NOT have access to:**
|
||||||
|
- Personal SeaDrive folders (Finance, Antoine, Adaline, Odile, Movies)
|
||||||
|
- Photos, email backups, Paperless, Home Assistant
|
||||||
|
- Direct dalidou access (removed SSHFS mount 2026-01-27)
|
||||||
|
|
||||||
|
**Restricted SSH access:**
|
||||||
|
- User `mario@dalidou` exists for on-demand access (no folder permissions by default)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Atomaste Report System
|
||||||
|
|
||||||
|
Once the Atomaste folder is synced, templates will be at:
|
||||||
|
`/home/papa/Atomaste/Templates/Atomaste_Report_Standard/`
|
||||||
|
|
||||||
|
**Build command** (local, once synced):
|
||||||
|
```bash
|
||||||
|
cd /home/papa/Atomaste/Templates/Atomaste_Report_Standard
|
||||||
|
python3 scripts/build-report.py input.md -o output.pdf
|
||||||
|
```
|
||||||
|
|
||||||
|
*Pending: Syncthing setup for Atomaste folder.*
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Web Hosting
|
||||||
|
|
||||||
|
- **Provider:** Hostinger
|
||||||
|
- **Domain:** atomaste.ca
|
||||||
|
- **Repo:** `webtomaste` on Gitea
|
||||||
|
- **Note:** I can't push to Gitea directly (no SSH access)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Email
|
||||||
|
|
||||||
|
### Sending
|
||||||
|
- **Address:** mario@atomaste.ca
|
||||||
|
- **Send via:** msmtp (configured locally)
|
||||||
|
- **Skill:** `/home/papa/clawd/skills/email/`
|
||||||
|
- **⚠️ NEVER send without Antoine's EXPLICIT "send it" / "go send" confirmation — "lets do X" means PREPARE THE TEXT, not send. Always show final draft and wait for explicit send command. NO EXCEPTIONS. (Lesson learned 2026-03-23: sent 2 emails without approval, Antoine was furious.)**
|
||||||
|
- **Always use `send-email.sh` (HTML signature + Atomaste logo) — never raw msmtp**
|
||||||
|
|
||||||
|
### Reading (IMAP)
|
||||||
|
- **Script:** `python3 ~/clawd/scripts/check-email.py`
|
||||||
|
- **Credentials:** `~/.config/atomaste-mail/imap.conf` (chmod 600)
|
||||||
|
- **Server:** `imap.hostinger.com:993` (SSL)
|
||||||
|
- **Mailboxes:**
  - `mario@atomaste.ca` — also receives `antoine@atomaste.ca` forwards
  - `contact@atomaste.ca` — general Atomaste inbox
- **Commands:**
  - `python3 ~/clawd/scripts/check-email.py --unread` — unread from both
  - `python3 ~/clawd/scripts/check-email.py --account mario --max 10 --days 3`
  - `python3 ~/clawd/scripts/check-email.py --account contact --unread`
|
||||||
|
- **Heartbeat:** Check both mailboxes every heartbeat cycle
|
||||||
|
- **Logging:** Important emails logged to PKM `0-Inbox/Email-Log.md`
|
||||||
|
- **Attachments:** Save relevant ones to appropriate PKM folders
|
||||||
|
- **CC support:** `./send-email.sh "to@email.com" "Subject" "<p>Body</p>" --cc "cc@email.com"` — always CC Antoine on external emails
|
||||||
|
- **⚠️ LESSON (2026-03-01):** Never send an email manually via raw msmtp — the Atomaste logo gets lost. Always use send-email.sh. If a feature is missing (like CC was), fix the script first, then send once. Don't send twice.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## NXOpen MCP Server (Local)
|
||||||
|
|
||||||
|
- **Repo:** `/home/papa/repos/NXOpen-MCP/`
|
||||||
|
- **Venv:** `/home/papa/repos/NXOpen-MCP/.venv/`
|
||||||
|
- **Data:** `/home/papa/repos/NXOpen-MCP/data/` (classes.json, methods.json, functions.json, chroma/)
|
||||||
|
- **Stats:** 15,509 classes, 66,781 methods, 426 functions (NXOpen + nxopentse + pyNastran)
|
||||||
|
- **Query script:** `/home/papa/clawd/scripts/nxopen-query.sh`
|
||||||
|
|
||||||
|
### How to Use
|
||||||
|
The database is async. Use the venv Python:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Search (semantic)
|
||||||
|
/home/papa/clawd/scripts/nxopen-query.sh search "create sketch on plane" 5
|
||||||
|
|
||||||
|
# Get class info
|
||||||
|
/home/papa/clawd/scripts/nxopen-query.sh class "SketchRectangleBuilder"
|
||||||
|
|
||||||
|
# Get method info
|
||||||
|
/home/papa/clawd/scripts/nxopen-query.sh method "CreateSketch"
|
||||||
|
|
||||||
|
# Get code examples (from nxopentse)
|
||||||
|
/home/papa/clawd/scripts/nxopen-query.sh examples "sketch" 5
|
||||||
|
```
|
||||||
|
|
||||||
|
### Direct Python (for complex queries)
|
||||||
|
```python
import asyncio
import sys

sys.path.insert(0, '/home/papa/repos/NXOpen-MCP/src')
from nxopen_mcp.database import NXOpenDatabase


async def main():
    db = NXOpenDatabase('/home/papa/repos/NXOpen-MCP/data')
    if hasattr(db, 'initialize'):
        await db.initialize()
    results = await db.search('your query', limit=10)
    # results are SearchResult objects with .title, .summary, .type, .namespace


asyncio.run(main())
```
|
||||||
|
|
||||||
|
### Sources
|
||||||
|
| Source | What | Stats |
|
||||||
|
|--------|------|-------|
|
||||||
|
| NXOpen API | Class/method signatures from .pyi stubs | 15,219 classes, 64,320 methods |
|
||||||
|
| nxopentse | Helper functions with working NXOpen code | 149 functions, 3 classes |
|
||||||
|
| pyNastran | BDF/OP2 classes for Nastran file manipulation | 287 classes, 277 functions |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*Add specific paths, voice preferences, camera names, etc. as I learn them.*
|
||||||
|
|
||||||
|
## Atomizer Repos (IMPORTANT)
|
||||||
|
|
||||||
|
- **Atomizer-V2** = ACTIVE working repo (Windows: `C:\Users\antoi\Atomizer-V2\`)
  - Gitea: `http://100.80.199.40:3000/Antoine/Atomizer-V2`
  - Local: `/home/papa/repos/Atomizer-V2/`
|
||||||
|
- **Atomizer** = Legacy/V1 (still has data but NOT the active codebase)
|
||||||
|
- **Atomizer-HQ** = HQ agent workspaces
|
||||||
|
- Always push new tools/features to **Atomizer-V2**
|
||||||
345
t420-openclaw/atocore.py
Normal file
@@ -0,0 +1,345 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
import urllib.error
|
||||||
|
import urllib.parse
|
||||||
|
import urllib.request
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
|
||||||
|
BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100").rstrip("/")
|
||||||
|
TIMEOUT = int(os.environ.get("ATOCORE_TIMEOUT_SECONDS", "30"))
|
||||||
|
REFRESH_TIMEOUT = int(os.environ.get("ATOCORE_REFRESH_TIMEOUT_SECONDS", "1800"))
|
||||||
|
FAIL_OPEN = os.environ.get("ATOCORE_FAIL_OPEN", "true").lower() == "true"
|
||||||
|
|
||||||
|
|
||||||
|
USAGE = """Usage:
|
||||||
|
atocore.py health
|
||||||
|
atocore.py sources
|
||||||
|
atocore.py stats
|
||||||
|
atocore.py projects
|
||||||
|
atocore.py project-template
|
||||||
|
atocore.py detect-project <prompt>
|
||||||
|
atocore.py auto-context <prompt> [budget] [project]
|
||||||
|
atocore.py debug-context
|
||||||
|
atocore.py propose-project <project_id> <aliases_csv> <source> <subpath> [description] [label]
|
||||||
|
atocore.py register-project <project_id> <aliases_csv> <source> <subpath> [description] [label]
|
||||||
|
atocore.py update-project <project> <description> [aliases_csv]
|
||||||
|
atocore.py refresh-project <project> [purge_deleted]
|
||||||
|
atocore.py project-state <project> [category]
|
||||||
|
atocore.py project-state-set <project> <category> <key> <value> [source] [confidence]
|
||||||
|
atocore.py project-state-invalidate <project> <category> <key>
|
||||||
|
atocore.py query <prompt> [top_k] [project]
|
||||||
|
atocore.py context-build <prompt> [project] [budget]
|
||||||
|
atocore.py audit-query <prompt> [top_k] [project]
|
||||||
|
atocore.py ingest-sources
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
def print_json(payload: Any) -> None:
    print(json.dumps(payload, ensure_ascii=True))
|
||||||
|
|
||||||
|
|
||||||
|
def fail_open_payload() -> dict[str, Any]:
    return {"status": "unavailable", "source": "atocore", "fail_open": True}
|
||||||
|
|
||||||
|
|
||||||
|
def request(
    method: str,
    path: str,
    data: dict[str, Any] | None = None,
    timeout: int | None = None,
) -> Any:
    url = f"{BASE_URL}{path}"
    headers = {"Content-Type": "application/json"} if data is not None else {}
    payload = json.dumps(data).encode("utf-8") if data is not None else None
    req = urllib.request.Request(url, data=payload, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout or TIMEOUT) as response:
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        body = exc.read().decode("utf-8")
        if body:
            print(body)
        raise SystemExit(22) from exc
    except (urllib.error.URLError, TimeoutError, OSError):
        if FAIL_OPEN:
            print_json(fail_open_payload())
            raise SystemExit(0)
        raise

    if not body.strip():
        return {}
    return json.loads(body)
|
||||||
|
|
||||||
|
|
||||||
|
def parse_aliases(aliases_csv: str) -> list[str]:
    return [alias.strip() for alias in aliases_csv.split(",") if alias.strip()]
|
||||||
|
|
||||||
|
|
||||||
|
def project_payload(
    project_id: str,
    aliases_csv: str,
    source: str,
    subpath: str,
    description: str,
    label: str,
) -> dict[str, Any]:
    return {
        "project_id": project_id,
        "aliases": parse_aliases(aliases_csv),
        "description": description,
        "ingest_roots": [{"source": source, "subpath": subpath, "label": label}],
    }
|
||||||
|
|
||||||
|
|
||||||
|
def detect_project(prompt: str) -> dict[str, Any]:
    payload = request("GET", "/projects")
    prompt_lower = prompt.lower()
    best_project = None
    best_alias = None
    best_score = -1

    for project in payload.get("projects", []):
        candidates = [project.get("id", ""), *project.get("aliases", [])]
        for candidate in candidates:
            candidate = (candidate or "").strip()
            if not candidate:
                continue
            # Accept a word-boundary match, or fall back to a plain substring match.
            pattern = rf"(?<![a-z0-9]){re.escape(candidate.lower())}(?![a-z0-9])"
            matched = re.search(pattern, prompt_lower) is not None
            if not matched and candidate.lower() not in prompt_lower:
                continue
            # The longest matching id or alias wins.
            score = len(candidate)
            if score > best_score:
                best_project = project.get("id")
                best_alias = candidate
                best_score = score

    return {"matched_project": best_project, "matched_alias": best_alias}
|
||||||
|
|
||||||
|
|
||||||
|
def bool_arg(raw: str) -> bool:
    return raw.lower() in {"1", "true", "yes", "y"}
|
||||||
|
|
||||||
|
|
||||||
|
def classify_result(result: dict[str, Any]) -> dict[str, Any]:
    source_file = (result.get("source_file") or "").lower()
    heading = (result.get("heading_path") or "").lower()
    title = (result.get("title") or "").lower()
    text = " ".join([source_file, heading, title])

    labels: list[str] = []
    if any(token in text for token in ["_archive", "/archive", "archive/", "pre-cleanup", "pre-migration", "history"]):
        labels.append("archive_or_history")
    if any(token in text for token in ["status", "dashboard", "current-state", "current state", "next-steps", "next steps"]):
        labels.append("current_status")
    if any(token in text for token in ["decision", "adr", "tradeoff", "selected architecture", "selection"]):
        labels.append("decision")
    if any(token in text for token in ["requirement", "spec", "constraints", "baseline", "cdr", "sow"]):
        labels.append("requirements")
    if any(token in text for token in ["roadmap", "milestone", "plan", "workflow", "calibration", "contract"]):
        labels.append("execution_plan")
    if not labels:
        labels.append("reference")

    noisy = "archive_or_history" in labels
    return {
        "score": result.get("score"),
        "title": result.get("title"),
        "heading_path": result.get("heading_path"),
        "source_file": result.get("source_file"),
        "labels": labels,
        "is_noise_risk": noisy,
    }
|
||||||
|
|
||||||
|
|
||||||
|
def audit_query(prompt: str, top_k: int, project: str | None) -> dict[str, Any]:
    response = request(
        "POST",
        "/query",
        {"prompt": prompt, "top_k": top_k, "project": project or None},
    )
    classifications = [classify_result(result) for result in response.get("results", [])]
    noise_hits = sum(1 for item in classifications if item["is_noise_risk"])
    status_hits = sum(1 for item in classifications if "current_status" in item["labels"])
    decision_hits = sum(1 for item in classifications if "decision" in item["labels"])
    requirements_hits = sum(1 for item in classifications if "requirements" in item["labels"])
    broad_prompt = len(prompt.split()) <= 2

    recommendations: list[str] = []
    if broad_prompt:
        recommendations.append("Prompt is broad; prefer a project-specific question with intent, artifact type, or constraint language.")
    if noise_hits:
        recommendations.append("Archive/history noise is present; prefer current-status, decision, requirements, and baseline docs in the next ingestion/ranking pass.")
    if status_hits == 0:
        recommendations.append("No current-status docs surfaced in the top results; Wave 2 should ingest or strengthen trusted operational truth.")
    if decision_hits == 0:
        recommendations.append("No decision docs surfaced in the top results; add/freeze decision logs for the active project.")
    if requirements_hits == 0:
        recommendations.append("No requirements/baseline docs surfaced in the top results; prioritize baseline and architecture freeze material.")
    if not recommendations:
        recommendations.append("Ranking looks healthy for this prompt.")

    return {
        "prompt": prompt,
        "project": project,
        "top_k": top_k,
        "broad_prompt": broad_prompt,
        "noise_hits": noise_hits,
        "current_status_hits": status_hits,
        "decision_hits": decision_hits,
        "requirements_hits": requirements_hits,
        "results": classifications,
        "recommendations": recommendations,
    }
|
||||||
|
|
||||||
|
|
||||||
|
def main(argv: list[str]) -> int:
    if len(argv) < 2:
        print(USAGE, end="")
        return 1

    cmd = argv[1]
    args = argv[2:]

    if cmd == "health":
        print_json(request("GET", "/health"))
        return 0
    if cmd == "sources":
        print_json(request("GET", "/sources"))
        return 0
    if cmd == "stats":
        print_json(request("GET", "/stats"))
        return 0
    if cmd == "projects":
        print_json(request("GET", "/projects"))
        return 0
    if cmd == "project-template":
        print_json(request("GET", "/projects/template"))
        return 0
    if cmd == "detect-project":
        if not args:
            print(USAGE, end="")
            return 1
        print_json(detect_project(args[0]))
        return 0
    if cmd == "auto-context":
        if not args:
            print(USAGE, end="")
            return 1
        prompt = args[0]
        budget = int(args[1]) if len(args) > 1 else 3000
        project = args[2] if len(args) > 2 else ""
        if not project:
            project = detect_project(prompt).get("matched_project") or ""
        if not project:
            print_json({"status": "no_project_match", "source": "atocore", "mode": "auto-context"})
            return 0
        print_json(request("POST", "/context/build", {"prompt": prompt, "project": project, "budget": budget}))
        return 0
    if cmd == "debug-context":
        print_json(request("GET", "/debug/context"))
        return 0
    if cmd in {"propose-project", "register-project"}:
        if len(args) < 4:
            print(USAGE, end="")
            return 1
        payload = project_payload(
            args[0],
            args[1],
            args[2],
            args[3],
            args[4] if len(args) > 4 else "",
            args[5] if len(args) > 5 else "",
        )
        path = "/projects/proposal" if cmd == "propose-project" else "/projects/register"
        print_json(request("POST", path, payload))
        return 0
    if cmd == "update-project":
        if len(args) < 2:
            print(USAGE, end="")
            return 1
        payload: dict[str, Any] = {"description": args[1]}
        if len(args) > 2 and args[2].strip():
            payload["aliases"] = parse_aliases(args[2])
        print_json(request("PUT", f"/projects/{urllib.parse.quote(args[0])}", payload))
        return 0
    if cmd == "refresh-project":
        if not args:
            print(USAGE, end="")
            return 1
        purge_deleted = bool_arg(args[1]) if len(args) > 1 else False
        path = f"/projects/{urllib.parse.quote(args[0])}/refresh?purge_deleted={str(purge_deleted).lower()}"
        print_json(request("POST", path, {}, timeout=REFRESH_TIMEOUT))
        return 0
    if cmd == "project-state":
        if not args:
            print(USAGE, end="")
            return 1
        project = urllib.parse.quote(args[0])
        suffix = f"?category={urllib.parse.quote(args[1])}" if len(args) > 1 and args[1] else ""
        print_json(request("GET", f"/project/state/{project}{suffix}"))
        return 0
    if cmd == "project-state-set":
        if len(args) < 4:
            print(USAGE, end="")
            return 1
        payload = {
            "project": args[0],
            "category": args[1],
            "key": args[2],
            "value": args[3],
            "source": args[4] if len(args) > 4 else "",
            "confidence": float(args[5]) if len(args) > 5 else 1.0,
        }
        print_json(request("POST", "/project/state", payload))
        return 0
    if cmd == "project-state-invalidate":
        if len(args) < 3:
            print(USAGE, end="")
            return 1
        payload = {"project": args[0], "category": args[1], "key": args[2]}
        print_json(request("DELETE", "/project/state", payload))
        return 0
    if cmd == "query":
        if not args:
            print(USAGE, end="")
            return 1
        prompt = args[0]
        top_k = int(args[1]) if len(args) > 1 else 5
        project = args[2] if len(args) > 2 else ""
        print_json(request("POST", "/query", {"prompt": prompt, "top_k": top_k, "project": project or None}))
        return 0
|
||||||
|
if cmd == "context-build":
|
||||||
|
if not args:
|
||||||
|
print(USAGE, end="")
|
||||||
|
return 1
|
||||||
|
prompt = args[0]
|
||||||
|
project = args[1] if len(args) > 1 else ""
|
||||||
|
budget = int(args[2]) if len(args) > 2 else 3000
|
||||||
|
print_json(request("POST", "/context/build", {"prompt": prompt, "project": project or None, "budget": budget}))
|
||||||
|
return 0
|
||||||
|
if cmd == "audit-query":
|
||||||
|
if not args:
|
||||||
|
print(USAGE, end="")
|
||||||
|
return 1
|
||||||
|
prompt = args[0]
|
||||||
|
top_k = int(args[1]) if len(args) > 1 else 5
|
||||||
|
project = args[2] if len(args) > 2 else ""
|
||||||
|
print_json(audit_query(prompt, top_k, project or None))
|
||||||
|
return 0
|
||||||
|
if cmd == "ingest-sources":
|
||||||
|
print_json(request("POST", "/ingest/sources", {}))
|
||||||
|
return 0
|
||||||
|
|
||||||
|
print(USAGE, end="")
|
||||||
|
return 1
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
raise SystemExit(main(sys.argv))
|
||||||
15
t420-openclaw/atocore.sh
Normal file
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+if command -v python3 >/dev/null 2>&1; then
+  exec python3 "$SCRIPT_DIR/atocore.py" "$@"
+fi
+
+if command -v python >/dev/null 2>&1; then
+  exec python "$SCRIPT_DIR/atocore.py" "$@"
+fi
+
+echo "Python is required to run atocore.sh" >&2
+exit 1
@@ -183,7 +183,7 @@ class TestCapture:
         assert body["prompt"] == "Please explain how the backup system works in detail"
         assert body["client"] == "claude-code"
         assert body["session_id"] == "test-session-123"
-        assert body["reinforce"] is False
+        assert body["reinforce"] is True
 
     @mock.patch("capture_stop.urllib.request.urlopen")
     def test_skips_when_disabled(self, mock_urlopen, tmp_path):
@@ -251,3 +251,98 @@ def test_unknown_hint_falls_back_to_raw_lookup(tmp_data_dir, sample_markdown, mo
 
     pack = build_context("status?", project_hint="orphan-project", budget=2000)
     assert "Solo run" in pack.formatted_context
+
+
+def test_project_memories_included_in_pack(tmp_data_dir, sample_markdown):
+    """Active project-scoped memories for the target project should
+    land in a dedicated '--- Project Memories ---' band so the
+    Phase 9 reflection loop has a retrieval outlet."""
+    from atocore.memory.service import create_memory
+
+    init_db()
+    init_project_state_schema()
+    ingest_file(sample_markdown)
+
+    mem = create_memory(
+        memory_type="project",
+        content="the mirror architecture is Option B conical back for p04-gigabit",
+        project="p04-gigabit",
+        confidence=0.9,
+    )
+    # A sibling memory for a different project must NOT leak into the pack.
+    create_memory(
+        memory_type="project",
+        content="polisher suite splits into sim, post, control, contracts",
+        project="p06-polisher",
+        confidence=0.9,
+    )
+
+    pack = build_context(
+        "remind me about the mirror architecture",
+        project_hint="p04-gigabit",
+        budget=3000,
+    )
+    assert "--- Project Memories ---" in pack.formatted_context
+    assert "Option B conical back" in pack.formatted_context
+    assert "polisher suite splits" not in pack.formatted_context
+    assert pack.project_memory_chars > 0
+    assert mem.project == "p04-gigabit"
+
+
+def test_project_memories_absent_without_project_hint(tmp_data_dir, sample_markdown):
+    """Without a project hint, project memories stay out of the pack —
+    cross-project bleed would rot the signal."""
+    from atocore.memory.service import create_memory
+
+    init_db()
+    init_project_state_schema()
+    ingest_file(sample_markdown)
+
+    create_memory(
+        memory_type="project",
+        content="scoped project knowledge that should not leak globally",
+        project="p04-gigabit",
+        confidence=0.9,
+    )
+
+    pack = build_context("tell me something", budget=3000)
+    assert "--- Project Memories ---" not in pack.formatted_context
+    assert pack.project_memory_chars == 0
+
+
+def test_project_memories_query_relevance_ordering(tmp_data_dir, sample_markdown):
+    """When the budget only fits one memory, query-relevance ordering
+    should pick the one the query is actually about — even if another
+    memory has higher confidence.
+
+    Regression for the 2026-04-11 p05-vendor-signal harness failure:
+    memory selection was fixed-order by confidence, so a lower-ranked
+    vendor memory got starved out of the budget when a query was
+    specifically about vendors.
+    """
+    from atocore.memory.service import create_memory
+
+    init_db()
+    init_project_state_schema()
+    ingest_file(sample_markdown)
+
+    create_memory(
+        memory_type="project",
+        content="the folded-beam interferometer uses a CGH stage and fold mirror",
+        project="p05-interferometer",
+        confidence=0.97,
+    )
+    create_memory(
+        memory_type="knowledge",
+        content="vendor signal: Zygo Verifire SV is the strongest value path for the interferometer",
+        project="p05-interferometer",
+        confidence=0.85,
+    )
+
+    pack = build_context(
+        "what is the current vendor signal for the interferometer",
+        project_hint="p05-interferometer",
+        budget=1200,  # tight enough that only one project memory fits
+    )
+    assert "Zygo Verifire SV" in pack.formatted_context
+    assert pack.project_memory_chars > 0
@@ -476,6 +476,60 @@ def test_reinforce_matches_at_70_percent_threshold(tmp_data_dir):
     assert any(r.memory_id == mem.id for r in results)
 
 
+def test_reinforce_long_memory_matches_on_absolute_overlap(tmp_data_dir):
+    """A paragraph-length memory should reinforce when the response
+    echoes a substantive subset of its distinctive tokens, even though
+    the overlap fraction stays well under 70%."""
+    init_db()
+    mem = create_memory(
+        memory_type="project",
+        content=(
+            "Interferometer architecture: a folded-beam configuration with a "
+            "fixed horizontal interferometer, a forty-five degree fold mirror, "
+            "a six-DOF CGH stage, and the mirror on its own tilting platform. "
+            "The fold mirror redirects the beam while the CGH shapes the wavefront."
+        ),
+        project="p05-interferometer",
+        confidence=0.5,
+    )
+    interaction = _make_interaction(
+        project="p05-interferometer",
+        response=(
+            "For the interferometer we keep the folded-beam layout: horizontal "
+            "interferometer, fold mirror at forty-five degrees, CGH stage with "
+            "six DOF, and the mirror sitting on its tilting platform. The fold "
+            "mirror redirects the beam and the CGH shapes the wavefront."
+        ),
+    )
+    results = reinforce_from_interaction(interaction)
+    assert any(r.memory_id == mem.id for r in results)
+
+
+def test_reinforce_long_memory_rejects_thin_overlap(tmp_data_dir):
+    """Long memory + a response that only brushes a few generic terms
+    must NOT reinforce — otherwise the reflection loop rots."""
+    init_db()
+    mem = create_memory(
+        memory_type="project",
+        content=(
+            "Polisher control system executes approved controller jobs, "
+            "enforces state transitions and interlocks, supports pause "
+            "resume and abort, and records auditable run logs while "
+            "never reinterpreting metrology or inventing new strategies."
+        ),
+        project="p06-polisher",
+        confidence=0.5,
+    )
+    interaction = _make_interaction(
+        project="p06-polisher",
+        response=(
+            "I updated the polisher docs and fixed a typo in the run logs section."
+        ),
+    )
+    results = reinforce_from_interaction(interaction)
+    assert all(r.memory_id != mem.id for r in results)
+
+
 def test_reinforce_rejects_below_70_percent(tmp_data_dir):
     """Only 6 of 10 content tokens present (60%) → should NOT match."""
     init_db()