diff --git a/docs/phase9-first-real-use.md b/docs/phase9-first-real-use.md new file mode 100644 index 0000000..7c65f81 --- /dev/null +++ b/docs/phase9-first-real-use.md @@ -0,0 +1,321 @@ +# Phase 9 First Real Use Report + +## What this is + +The first empirical exercise of the Phase 9 reflection loop after +Commits A, B, and C all landed. The goal is to find out where the +extractor and the reinforcement matcher actually behave well versus +where their behaviour drifts from the design intent. + +The validation is reproducible. To re-run: + +```bash +python scripts/phase9_first_real_use.py +``` + +This writes an isolated SQLite + Chroma store under +`data/validation/phase9-first-use/` (gitignored), seeds three active +memories, then runs eight sample interactions through the full +capture → reinforce → extract pipeline. + +## What we ran + +Eight synthetic interactions, each paraphrased from a real working +session about AtoCore itself or the active engineering projects: + +| # | Label | Project | Expected | +|---|--------------------------------------|----------------------|---------------------------| +| 1 | exdev-mount-merge-decision | atocore | 1 decision_heading | +| 2 | ownership-was-the-real-fix | atocore | 1 fact_heading | +| 3 | memory-vs-entity-canonical-home | atocore | 1 decision_heading (long) | +| 4 | auto-promotion-deferred | atocore | 1 decision_heading | +| 5 | preference-rebase-workflow | atocore | 1 preference_sentence | +| 6 | constraint-from-doc-cite | p05-interferometer | 1 constraint_heading | +| 7 | prose-only-no-cues | atocore | 0 candidates | +| 8 | multiple-cues-in-one-interaction | p06-polisher | 3 distinct rules | + +Plus 3 seed memories were inserted before the run: + +- `pref_rebase`: "prefers rebase-based workflows because history stays linear" (preference, 0.6) +- `pref_concise`: "writes commit messages focused on the why, not the what" (preference, 0.6) +- `identity_runs_atocore`: "mechanical engineer who runs AtoCore for context 
engineering" (identity, 0.9) + +## What happened — extraction (the good news) + +**Every extraction expectation was met exactly.** All eight samples +produced the predicted candidate count and the predicted rule +classifications: + +| Sample | Expected | Got | Pass | +|---------------------------------------|----------|-----|------| +| exdev-mount-merge-decision | 1 | 1 | ✅ | +| ownership-was-the-real-fix | 1 | 1 | ✅ | +| memory-vs-entity-canonical-home | 1 | 1 | ✅ | +| auto-promotion-deferred | 1 | 1 | ✅ | +| preference-rebase-workflow | 1 | 1 | ✅ | +| constraint-from-doc-cite | 1 | 1 | ✅ | +| prose-only-no-cues | **0** | **0** | ✅ | +| multiple-cues-in-one-interaction | 3 | 3 | ✅ | + +**Total: 9 candidates from 8 interactions, 0 false positives, 0 misses +on heading patterns or sentence patterns.** + +The extractor's strictness is well-tuned for the kinds of structural +cues we actually use. Things worth noting: + +- **Sample 7 (`prose-only-no-cues`) produced zero candidates as + designed.** This is the most important sanity check — it confirms + the extractor won't fill the review queue with general prose when + there's no structural intent. +- **Sample 3's long content was preserved without truncation.** The + 280-char max wasn't hit, and the content kept its full meaning. +- **Sample 8 produced three distinct rules in one interaction** + (decision_heading, constraint_heading, requirement_heading) without + the dedup key collapsing them. The dedup key is + `(memory_type, normalized_content, rule)` and the three are all + different on at least one axis, so they coexist as expected. +- **The prose around each heading was correctly ignored.** Sample 6 + has a second sentence ("the error budget allocates 6 nm to the + laser source...") that does NOT have a structural cue, and the + extractor correctly didn't fire on it. 
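The dedup key called out above is easy to sanity-check in isolation. A minimal sketch of the rule as described — illustrative helper names, not the real extractor API:

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace, per the dedup-key description."""
    return " ".join(text.lower().split())


def dedup(candidates: list[dict]) -> list[dict]:
    """Keep the first candidate per (memory_type, normalized_content, rule)."""
    seen: set[tuple[str, str, str]] = set()
    kept = []
    for c in candidates:
        key = (c["memory_type"], normalize(c["content"]), c["rule"])
        if key not in seen:
            seen.add(key)
            kept.append(c)
    return kept


# Sample 8's three cues differ on at least one key axis, so all survive.
sample8 = [
    {"memory_type": "adaptation", "rule": "decision_heading",
     "content": "defer the laser interlock redesign to after the July milestone"},
    {"memory_type": "project", "rule": "constraint_heading",
     "content": "the calibration routine must complete in under 90 seconds"},
    {"memory_type": "project", "rule": "requirement_heading",
     "content": "the polisher must hold position to within 0.5 micron"},
]
assert len(dedup(sample8)) == 3
```

A true duplicate (same tuple on all three axes) collapses to a single entry, while a bundle like Sample 8's passes through intact.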
+ +## What happened — reinforcement (the empirical finding) + +**Reinforcement matched zero seeded memories across all 8 samples, +even when the response clearly echoed the seed.** + +Sample 5's response was: + +> *"I prefer rebase-based workflows because the history stays linear +> and reviewers have an easier time."* + +The seeded `pref_rebase` memory was: + +> *"prefers rebase-based workflows because history stays linear"* + +A human reading both says these are the same fact. The reinforcement +matcher disagrees. After all 8 interactions: + +``` +pref_rebase: confidence=0.6000 refs=0 last=- +pref_concise: confidence=0.6000 refs=0 last=- +identity_runs_atocore: confidence=0.9000 refs=0 last=- +``` + +**Nothing moved.** This is the most important finding from this +validation pass. + +### Why the matcher missed it + +The current `_memory_matches` rule (in +`src/atocore/memory/reinforcement.py`) does a normalized substring +match: it lowercases both sides, collapses whitespace, then asks +"does the leading 80-char window of the memory content appear as a +substring in the response?" + +For the rebase example: + +- needle (normalized): `prefers rebase-based workflows because history stays linear` +- haystack (normalized): `i prefer rebase-based workflows because the history stays linear and reviewers have an easier time.` + +The needle starts with `prefers` (with the trailing `s`), and the +haystack has `prefer` (without the `s`, because of the first-person +voice). And the needle has `because history stays linear`, while the +haystack has `because the history stays linear`. **Two small natural +paraphrases, and the substring fails.** + +This isn't a bug in the matcher's implementation — it's doing +exactly what it was specified to do. It's a design limitation: the +substring rule is too brittle for real prose, where the same fact +gets re-stated with different verb forms, articles, and word order. 
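The brittleness is easy to reproduce outside the pipeline. A minimal sketch of the normalized-substring rule as described above — not the real `_memory_matches` implementation:

```python
def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace, as the matcher does."""
    return " ".join(text.lower().split())


def substring_match(memory_content: str, response: str, window: int = 80) -> bool:
    """Leading-window substring rule described above (illustrative only)."""
    needle = _normalize(memory_content)[:window]
    return needle in _normalize(response)


memory = "prefers rebase-based workflows because history stays linear"
response = ("I prefer rebase-based workflows because the history stays "
            "linear and reviewers have an easier time.")

# "prefers" vs "prefer" and "because history" vs "because the history"
# both break the exact substring, so the matcher never fires.
assert substring_match(memory, response) is False
```

A verbatim echo would still match — `substring_match(memory, "she prefers rebase-based workflows because history stays linear")` returns `True` — which is why the unit tests pass while natural paraphrases slip through.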
+ +### Severity + +**Medium-high.** Reinforcement is the entire point of Commit B. +A reinforcement matcher that never fires on natural paraphrases +will leave seeded memories with stale confidence forever. The +reflection loop runs but it doesn't actually reinforce anything. +That hollows out the value of having reinforcement at all. + +It is not a critical bug because: +- Nothing breaks. The pipeline still runs cleanly. +- Reinforcement is supposed to be a *signal*, not the only path to + high confidence — humans can still curate confidence directly. +- The candidate-extraction path (Commit C) is unaffected and works + perfectly. + +But it does need to be addressed before Phase 9 can be considered +operationally complete. + +## Recommended fix (deferred to a follow-up commit) + +Replace the substring matcher with a token-overlap matcher. The +specification: + +1. Tokenize both memory content and response into lowercase words + of length >= 3, dropping a small stop list (`the`, `a`, `an`, + `and`, `or`, `of`, `to`, `is`, `was`, `that`, `this`, `with`, + `for`, `from`, `into`). +2. Stem aggressively (or at minimum, fold trailing `s` and `ed` + so `prefers`/`prefer`/`preferred` collapse to one token). +3. A match exists if **at least 70% of the memory's content + tokens** appear in the response token set. +4. Memory content must still be at least `_MIN_MEMORY_CONTENT_LENGTH` + characters to be considered. + +This is more permissive than the substring rule but still tight +enough to avoid spurious matches on generic words. It would have +caught the rebase example because: + +- memory tokens (after stop-list and stemming): + `{prefer, rebase-bas, workflow, because, history, stay, linear}` +- response tokens: + `{prefer, rebase-bas, workflow, because, history, stay, linear, + reviewer, easi, time}` +- overlap: 7 / 7 memory tokens = 100% > 70% threshold → match + +### Why not fix it in this report + +Three reasons: + +1. 
The validation report is supposed to be evidence, not a fix + spec. A separate commit will introduce the new matcher with + its own tests. +2. The token-overlap matcher needs its own design review for edge + cases (very long memories, very short responses, technical + abbreviations, code snippets in responses). +3. Mixing the report and the fix into one commit would muddle the + audit trail. The report is the empirical evidence; the fix is + the response. + +The fix is queued as the next Phase 9 maintenance commit and is +flagged in the next-steps section below. + +## Other observations + +### Extraction is conservative on purpose, and that's working + +Sample 7 is the most important data point in the whole run. +A natural prose response with no structural cues produced zero +candidates. **This is exactly the design intent** — the extractor +should be loud about explicit decisions/constraints/requirements +and quiet about everything else. If the extractor were too loose +the review queue would fill up with low-value items and the human +would stop reviewing. + +After this run I have measurably more confidence that the V0 rule +set is the right starting point. Future rules can be added one at +a time as we see specific patterns the extractor misses, instead of +guessing at what might be useful. + +### Confidence on candidates + +All extracted candidates landed at the default `confidence=0.5`, +which is what the extractor is currently hardcoded to do. The +`promotion-rules.md` doc proposes a per-rule prior with a +structural-signal multiplier and freshness bonus. None of that is +implemented yet. The validation didn't reveal any urgency around +this — humans review the candidates either way — but it confirms +that the priors-and-multipliers refinement is a reasonable next +step rather than a critical one. + +### Multiple cues in one interaction + +Sample 8 confirmed an important property: **three structural +cues in the same response do not collide in dedup**. 
The dedup +key is `(memory_type, normalized_content, rule)`, and since each +cue produced a distinct (type, content, rule) tuple, all three +landed cleanly. + +This matters because real working sessions naturally bundle +multiple decisions/constraints/requirements into one summary. +The extractor handles those bundles correctly. + +### Project scoping + +Each candidate carries the `project` from the source interaction +into its own `project` field. Sample 6 (p05) and sample 8 (p06) +both produced candidates with the right project. This is +non-obvious because the extractor module never explicitly looks +at project — it inherits from the interaction it's scanning. Worth +keeping in mind when the entity extractor is built: same pattern +should apply. + +## What this validates and what it doesn't + +### Validates + +- The Phase 9 Commit C extractor's rule set is well-tuned for + hand-written structural cues +- The dedup logic does the right thing across multiple cues +- The "drop candidates that match an existing active memory" filter + works (would have been visible if any seeded memory had matched + one of the heading texts — none did, but the code path is the + same one that's covered in `tests/test_extractor.py`) +- The `prose-only-no-cues` no-fire case is solid +- Long content is preserved without truncation +- Project scoping flows through the pipeline + +### Does NOT validate + +- The reinforcement matcher (clearly, since it caught nothing) +- The behaviour against very long documents (each sample was + under 700 chars; real interaction responses can be 10× that) +- The behaviour against responses that contain code blocks (the + extractor's regex rules don't handle code-block fenced sections + specially) +- Cross-interaction promotion-to-active flow (no candidate was + promoted in this run; the lifecycle is covered by the unit tests + but not by this empirical exercise) +- The behaviour at scale: 8 interactions is a one-shot. 
We need + to see the queue after 50+ before judging reviewer ergonomics. + +### Recommended next empirical exercises + +1. **Real conversation capture**, using a slash command from a + real Claude Code session against either a local or Dalidou + AtoCore instance. The synthetic responses in this script are + honest paraphrases but they're still hand-curated. +2. **Bulk capture from existing PKM**, ingesting a few real + project notes through the extractor as if they were + interactions. This stresses the rules against documents that + weren't written with the extractor in mind. +3. **Reinforcement matcher rerun** after the token-overlap + matcher lands. + +## Action items from this report + +- [ ] **Fix reinforcement matcher** with token-overlap rule + described in the "Recommended fix" section above. Owner: + next session. Severity: medium-high. +- [x] **Document the extractor's V0 strictness** as a working + property, not a limitation. Sample 7 makes the case. +- [ ] **Build the slash command** so the next validation run + can use real (not synthetic) interactions. Tracked in + Session 2 of the current planning sprint. +- [ ] **Run a 50+ interaction batch** to evaluate reviewer + ergonomics. Deferred until the slash command exists. + +## Reproducibility + +The script is deterministic. 
Re-running it will produce
+identical results (apart from generated ids) because:
+
+- the data dir is wiped on every run
+- the sample interactions are constants
+- the memory UUIDs differ between runs, but every field the report
+  depends on (content, type, count, rule) is deterministic
+- the `data/validation/phase9-first-use/` directory is gitignored,
+  so no state leaks across runs
+
+To reproduce this exact report:
+
+```bash
+python scripts/phase9_first_real_use.py
+```
+
+To get JSON output for downstream tooling:
+
+```bash
+python scripts/phase9_first_real_use.py --json
+```
diff --git a/scripts/phase9_first_real_use.py b/scripts/phase9_first_real_use.py
new file mode 100644
index 0000000..56d4648
--- /dev/null
+++ b/scripts/phase9_first_real_use.py
@@ -0,0 +1,393 @@
+"""Phase 9 first-real-use validation script.
+
+Captures a small set of representative interactions drawn from a real
+working session, runs the full Phase 9 loop (capture -> reinforce ->
+extract) over them, and prints what each step produced. The intent is
+to generate empirical evidence about the extractor's behaviour against
+prose that wasn't written to make the test pass.
+
+Usage:
+    python scripts/phase9_first_real_use.py [--data-dir PATH]
+
+The script writes a fresh isolated SQLite + Chroma store under the
+given data dir (default: ./data/validation/phase9-first-use). The
+data dir is gitignored so the script can be re-run cleanly.
+
+Each interaction is printed with:
+  - the captured interaction id
+  - the reinforcement results (which seeded memories were echoed)
+  - the extraction results (which candidates were proposed and why)
+  - notes on what the extractor MISSED (manually annotated below)
+
+The output is intentionally human-readable so the run can be saved as
+the body of docs/phase9-first-real-use.md.
+""" + +from __future__ import annotations + +import argparse +import json +import os +import shutil +import sys +from dataclasses import dataclass +from pathlib import Path + +# Make the package importable when run directly from the repo root. +_REPO_ROOT = Path(__file__).resolve().parent.parent +sys.path.insert(0, str(_REPO_ROOT / "src")) + + +@dataclass +class SampleInteraction: + label: str + prompt: str + response: str + project: str + notes_for_human: str + + +# --------------------------------------------------------------------------- +# Sample interactions, paraphrased from real working sessions on AtoCore +# itself. Each one is a real-feeling exchange with at least one structural +# cue the extractor should catch, plus prose the extractor should NOT +# accidentally catch. +# --------------------------------------------------------------------------- + +SAMPLES: list[SampleInteraction] = [ + SampleInteraction( + label="exdev-mount-merge-decision", + project="atocore", + prompt="The TMPDIR fix didn't work, the LOCAL_COPY_PATH change didn't work, what now?", + response=( + "We've been chasing this through three layers without success. " + "Time to stop guessing and apply the simple fix Dalidou Claude " + "proposed in the very first diagnosis.\n\n" + "## Decision: merge the two bind mounts into one\n\n" + "Both /srv/storage/repos and /srv/storage/gitea/data live on " + "/dev/sdb1, so the move is a same-filesystem rename — instant, " + "atomic, no data copy. After the move, the EXDEV class of bug " + "becomes structurally impossible because there is only one " + "mount inside the container." + ), + notes_for_human=( + "Should fire decision_heading rule. Should NOT fire requirement " + "or constraint rules. The 'becomes structurally impossible' line " + "is prose, not a candidate." + ), + ), + SampleInteraction( + label="ownership-was-the-real-fix", + project="atocore", + prompt="Still failing after the mount merge. 
What's actually wrong?", + response=( + "Same rejection even after the mounts were merged, which kills " + "the EXDEV theory. The error 'unable to migrate objects to " + "permanent storage' is git's catch-all for any failure in " + "tmp_objdir_migrate, and the real errno is being thrown away.\n\n" + "## Fact: the actual root cause was ownership, not cross-device link\n\n" + "After mv on the host, the moved directory had perms git could " + "navigate for some subdirs but not others. Recursive chown " + "git:git plus chmod u+rwX,g+rwX fixed it on the first try." + ), + notes_for_human=( + "Should fire fact_heading rule, target type knowledge. The " + "'unable to migrate' string is inside prose and should NOT " + "trigger any rule." + ), + ), + SampleInteraction( + label="memory-vs-entity-canonical-home", + project="atocore", + prompt="Does a Decision live in memories or entities?", + response=( + "Both layers exist for different reasons. After thinking through " + "the ontology question, I'm landing on a clean rule: one " + "canonical home per concept, with explicit graduation when a " + "memory eventually deserves structure.\n\n" + "## Decision: project, knowledge, and adaptation memory types graduate to entities\n\n" + "The identity, preference, and episodic types stay in the memory " + "layer because they're personal and unstructured. The other " + "three become entities once the engineering layer V1 ships. The " + "original memory rows are NEVER deleted — they become frozen " + "historical pointers to whichever entity they graduated into." + ), + notes_for_human=( + "Should fire decision_heading rule. Long content (over 100 " + "chars) — verifies the max-length cap of 280 doesn't truncate." + ), + ), + SampleInteraction( + label="auto-promotion-deferred", + project="atocore", + prompt="Should the engineering layer auto-promote candidates?", + response=( + "The trust hierarchy is the whole point of AtoCore. 
If we let " + "the system auto-promote, we trade away the 'bad memory is " + "worse than no memory' invariant for a marginal speed gain.\n\n" + "## Decision: no auto-promotion in V1, ever\n\n" + "The schema and API are designed so policies can be added " + "later without migration, but the V1 stance is that every " + "promotion needs a human signoff. The hard line: nothing ever " + "moves into trusted project state automatically, even with the " + "engineering layer running." + ), + notes_for_human=( + "Should fire decision_heading rule. Has 'invariant' jargon — " + "verify the extractor doesn't get distracted." + ), + ), + SampleInteraction( + label="preference-rebase-workflow", + project="atocore", + prompt="Should we use rebase or merge commits for this branch?", + response=( + "I prefer rebase-based workflows because the history stays " + "linear and reviewers have an easier time. For the dalidou " + "storage-foundation branch we'll rebase before merging into " + "main so the timeline reads as one coherent narrative." + ), + notes_for_human=( + "Should fire preference_sentence rule. Note: the rule looks " + "for 'I prefer' which is exactly the wording used." + ), + ), + SampleInteraction( + label="constraint-from-doc-cite", + project="p05-interferometer", + prompt="What's the budget situation for p05 right now?", + response=( + "Looking at the latest design notes, the constraint is fairly " + "tight on the optical path side.\n\n" + "### Constraint: total optical path stability must stay within 10 nm RMS at 1 kHz\n\n" + "The error budget allocates 6 nm to the laser source and 4 nm " + "to the mechanical mounts and frame combined." + ), + notes_for_human=( + "Should fire constraint_heading rule with type=project. The " + "second sentence ('error budget allocates 6 nm') is descriptive " + "and should NOT fire any rule because it doesn't have a " + "structural cue." 
+ ), + ), + SampleInteraction( + label="prose-only-no-cues", + project="atocore", + prompt="What should I work on next?", + response=( + "Looking at the current state of AtoCore, the next natural " + "step is to validate Phase 9 against real interactions before " + "starting the engineering layer implementation. Empirical " + "evidence is missing and the planning docs would benefit from " + "real signal." + ), + notes_for_human=( + "Should produce ZERO candidates. Pure recommendation prose, no " + "structural cues. If this fires anything the extractor is too " + "loose." + ), + ), + SampleInteraction( + label="multiple-cues-in-one-interaction", + project="p06-polisher", + prompt="Summarize today's polisher session", + response=( + "We worked through three things in the polisher session today.\n\n" + "## Decision: defer the laser interlock redesign to after the July milestone\n\n" + "## Constraint: the calibration routine must complete in under 90 seconds for production use\n\n" + "## Requirement: the polisher must hold position to within 0.5 micron at 1 g loading\n\n" + "Action items captured for the next sync." + ), + notes_for_human=( + "Three rules should fire on the same interaction: " + "decision_heading -> adaptation, constraint_heading -> project, " + "requirement_heading -> project. Verify dedup doesn't merge them." 
+ ), + ), +] + + +def setup_environment(data_dir: Path) -> None: + """Configure AtoCore to use an isolated data directory for this run.""" + if data_dir.exists(): + shutil.rmtree(data_dir) + data_dir.mkdir(parents=True, exist_ok=True) + os.environ["ATOCORE_DATA_DIR"] = str(data_dir) + os.environ.setdefault("ATOCORE_DEBUG", "true") + # Reset cached settings so the new env vars take effect + import atocore.config as config + + config.settings = config.Settings() + import atocore.retrieval.vector_store as vs + + vs._store = None + + +def seed_memories() -> dict[str, str]: + """Insert a small set of seed active memories so reinforcement has + something to match against.""" + from atocore.memory.service import create_memory + + seeded: dict[str, str] = {} + seeded["pref_rebase"] = create_memory( + memory_type="preference", + content="prefers rebase-based workflows because history stays linear", + confidence=0.6, + ).id + seeded["pref_concise"] = create_memory( + memory_type="preference", + content="writes commit messages focused on the why, not the what", + confidence=0.6, + ).id + seeded["identity_runs_atocore"] = create_memory( + memory_type="identity", + content="mechanical engineer who runs AtoCore for context engineering", + confidence=0.9, + ).id + return seeded + + +def run_sample(sample: SampleInteraction) -> dict: + """Capture one sample, run extraction, return a result dict.""" + from atocore.interactions.service import record_interaction + from atocore.memory.extractor import extract_candidates_from_interaction + + interaction = record_interaction( + prompt=sample.prompt, + response=sample.response, + project=sample.project, + client="phase9-first-real-use", + session_id="first-real-use", + reinforce=True, + ) + candidates = extract_candidates_from_interaction(interaction) + + return { + "label": sample.label, + "project": sample.project, + "interaction_id": interaction.id, + "expected_notes": sample.notes_for_human, + "candidate_count": len(candidates), + 
"candidates": [ + { + "memory_type": c.memory_type, + "rule": c.rule, + "content": c.content, + "source_span": c.source_span[:120], + } + for c in candidates + ], + } + + +def report_seed_memory_state(seeded_ids: dict[str, str]) -> dict: + from atocore.memory.service import get_memories + + state = {} + for label, mid in seeded_ids.items(): + rows = [m for m in get_memories(limit=200) if m.id == mid] + if not rows: + state[label] = None + continue + m = rows[0] + state[label] = { + "id": m.id, + "memory_type": m.memory_type, + "content_preview": m.content[:80], + "confidence": round(m.confidence, 4), + "reference_count": m.reference_count, + "last_referenced_at": m.last_referenced_at, + } + return state + + +def main() -> int: + parser = argparse.ArgumentParser() + parser.add_argument( + "--data-dir", + default=str(_REPO_ROOT / "data" / "validation" / "phase9-first-use"), + help="Isolated data directory to use for this validation run", + ) + parser.add_argument( + "--json", + action="store_true", + help="Emit machine-readable JSON instead of human prose", + ) + args = parser.parse_args() + + data_dir = Path(args.data_dir).resolve() + setup_environment(data_dir) + + from atocore.models.database import init_db + from atocore.context.project_state import init_project_state_schema + + init_db() + init_project_state_schema() + + seeded = seed_memories() + sample_results = [run_sample(s) for s in SAMPLES] + final_seed_state = report_seed_memory_state(seeded) + + if args.json: + json.dump( + { + "data_dir": str(data_dir), + "seeded_memories_initial": list(seeded.keys()), + "samples": sample_results, + "seed_memory_state_after_run": final_seed_state, + }, + sys.stdout, + indent=2, + default=str, + ) + return 0 + + print("=" * 78) + print("Phase 9 first-real-use validation run") + print("=" * 78) + print(f"Isolated data dir: {data_dir}") + print() + print("Seeded the memory store with 3 active memories:") + for label, mid in seeded.items(): + print(f" - {label} 
({mid[:8]})") + print() + print("-" * 78) + print(f"Running {len(SAMPLES)} sample interactions ...") + print("-" * 78) + + for result in sample_results: + print() + print(f"## {result['label']} [project={result['project']}]") + print(f" interaction_id={result['interaction_id'][:8]}") + print(f" expected: {result['expected_notes']}") + print(f" candidates produced: {result['candidate_count']}") + for i, cand in enumerate(result["candidates"], 1): + print( + f" [{i}] type={cand['memory_type']:11s} " + f"rule={cand['rule']:21s} " + f"content={cand['content']!r}" + ) + + print() + print("-" * 78) + print("Reinforcement state on seeded memories AFTER all interactions:") + print("-" * 78) + for label, state in final_seed_state.items(): + if state is None: + print(f" {label}: ") + continue + print( + f" {label}: confidence={state['confidence']:.4f} " + f"refs={state['reference_count']} " + f"last={state['last_referenced_at'] or '-'}" + ) + + print() + print("=" * 78) + print("Run complete. Data written to:", data_dir) + print("=" * 78) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/src/atocore/api/routes.py b/src/atocore/api/routes.py index 1acb1fa..85b4ad3 100644 --- a/src/atocore/api/routes.py +++ b/src/atocore/api/routes.py @@ -31,6 +31,7 @@ from atocore.interactions.service import ( record_interaction, ) from atocore.memory.extractor import ( + EXTRACTOR_VERSION, MemoryCandidate, extract_candidates_from_interaction, ) @@ -622,6 +623,7 @@ def api_extract_from_interaction( "candidate_count": len(candidates), "persisted": payload.persist, "persisted_ids": persisted_ids, + "extractor_version": EXTRACTOR_VERSION, "candidates": [ { "memory_type": c.memory_type, @@ -630,6 +632,7 @@ def api_extract_from_interaction( "confidence": c.confidence, "rule": c.rule, "source_span": c.source_span, + "extractor_version": c.extractor_version, } for c in candidates ], diff --git a/src/atocore/main.py b/src/atocore/main.py index 0314b0c..eba3434 100644 
--- a/src/atocore/main.py +++ b/src/atocore/main.py @@ -1,5 +1,7 @@ """AtoCore — FastAPI application entry point.""" +from contextlib import asynccontextmanager + from fastapi import FastAPI from atocore.api.routes import router @@ -9,18 +11,19 @@ from atocore.ingestion.pipeline import get_source_status from atocore.models.database import init_db from atocore.observability.logger import get_logger, setup_logging -app = FastAPI( - title="AtoCore", - description="Personal Context Engine for LLM interactions", - version="0.1.0", -) -app.include_router(router) log = get_logger("main") -@app.on_event("startup") -def startup(): +@asynccontextmanager +async def lifespan(app: FastAPI): + """Run setup before the first request and teardown after shutdown. + + Replaces the deprecated ``@app.on_event("startup")`` hook with the + modern ``lifespan`` context manager. Setup runs synchronously (the + underlying calls are blocking I/O) so no await is needed; the + function still must be async per the FastAPI contract. + """ setup_logging() _config.ensure_runtime_dirs() init_db() @@ -32,6 +35,19 @@ def startup(): chroma_path=str(_config.settings.chroma_path), source_status=get_source_status(), ) + yield + # No teardown work needed today; SQLite connections are short-lived + # and the Chroma client cleans itself up on process exit. + + +app = FastAPI( + title="AtoCore", + description="Personal Context Engine for LLM interactions", + version="0.1.0", + lifespan=lifespan, +) + +app.include_router(router) if __name__ == "__main__": diff --git a/src/atocore/memory/extractor.py b/src/atocore/memory/extractor.py index f7513b7..92b7f8b 100644 --- a/src/atocore/memory/extractor.py +++ b/src/atocore/memory/extractor.py @@ -46,6 +46,18 @@ from atocore.observability.logger import get_logger log = get_logger("extractor") + +# Bumped whenever the rule set, regex shapes, or post-processing +# semantics change in a way that could affect candidate output. 
The +# promotion-rules doc requires every candidate to record the version +# of the extractor that produced it so old candidates can be re-evaluated +# (or kept as-is) when the rules evolve. +# +# History: +# 0.1.0 - initial Phase 9 Commit C rule set (Apr 6, 2026) +EXTRACTOR_VERSION = "0.1.0" + + # Every candidate is attributed to the rule that fired so reviewers can # audit why it was proposed. @dataclass @@ -57,6 +69,7 @@ class MemoryCandidate: project: str = "" confidence: float = 0.5 # default review-queue confidence source_interaction_id: str = "" + extractor_version: str = EXTRACTOR_VERSION # ---------------------------------------------------------------------------