phase9 first-real-use validation + small hygiene wins

Session 1 of the four-session plan. Empirically exercises the Phase 9
loop (capture -> reinforce -> extract) for the first time and lands
three small hygiene fixes.

Validation script + report
--------------------------
scripts/phase9_first_real_use.py — reproducible script that:
  - sets up an isolated SQLite + Chroma store under
    data/validation/phase9-first-use (gitignored)
  - seeds 3 active memories
  - runs 8 sample interactions through capture + reinforce + extract
  - prints what each step produced and reinforcement state at the end
  - supports --json output for downstream tooling

docs/phase9-first-real-use.md — narrative report of the run with:
  - extraction results table (8/8 expectations met exactly)
  - the empirical finding that REINFORCEMENT MATCHED ZERO seeds
    despite sample 5 clearly echoing the rebase preference memory
  - root cause analysis: the substring matcher is too brittle for
    natural paraphrases (e.g. "prefers" vs "I prefer", "history"
    vs "the history")
  - recommended fix: replace substring matcher with a token-overlap
    matcher (>=70% of memory tokens present in response, with
    light stemming and a small stop list)
  - explicit note that the fix is queued as a follow-up commit, not
    bundled into the report — keeps the audit trail clean

Key extraction results from the run:
  - all 7 heading/sentence rules fired correctly
  - 0 false positives on the prose-only sample (the most important
    sanity check)
  - long content preserved without truncation
  - dedup correctly kept three distinct cues from one interaction
  - project scoping flowed cleanly through the pipeline

Hygiene 1: FastAPI lifespan migration (src/atocore/main.py)
- Replaced @app.on_event("startup") with the modern @asynccontextmanager
  lifespan handler
- Same setup work (setup_logging, ensure_runtime_dirs, init_db,
  init_project_state_schema, startup_ready log)
- Removes the two on_event deprecation warnings from every test run
- Test suite now shows 1 warning instead of 3

Hygiene 2: EXTRACTOR_VERSION constant (src/atocore/memory/extractor.py)
- Added EXTRACTOR_VERSION = "0.1.0" with a versioned change log comment
- MemoryCandidate dataclass carries extractor_version on every candidate
- POST /interactions/{id}/extract response now includes extractor_version
  on both the top level (current run) and on each candidate
- Implements the versioning requirement called out in
  docs/architecture/promotion-rules.md so old candidates can be
  identified and re-evaluated when the rule set evolves

Hygiene 3: ~/.git-credentials cleanup (out-of-tree, not committed)
- Removed the dead OAUTH_USER:<jwt> line for dalidou:3000 that was
  being silently rewritten by the system credential manager on every
  push attempt
- Configured credential.http://dalidou:3000.helper with the empty-string
  sentinel pattern so the URL-specific helper chain is exactly
  ["", store] instead of inheriting the system-level "manager" helper
  that ships with Git for Windows
- Same fix for the 100.80.199.40 (Tailscale) entry
- Verified end to end: a fresh push using only the cleaned credentials
  file (no embedded URL) authenticates as Antoine and lands cleanly
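For reference, the sentinel configuration can be reproduced with plain `git config` (scope assumed `--global`; the host name is the one from this report). Setting the URL-specific helper to an empty string resets the inherited helper list before `store` is added, which is standard gitcredentials behaviour:

```shell
# An empty helper value clears the cumulative helper list for this URL,
# so the system-level "manager" helper is no longer consulted for it.
git config --global credential.http://dalidou:3000.helper ""
git config --global --add credential.http://dalidou:3000.helper store
# The Tailscale entry gets the same two-step pattern with its own URL.
```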

Full suite: 160 passing (no change from previous), 1 warning
(was 3) thanks to the lifespan migration.
2026-04-07 06:16:35 -04:00
parent bd291ff874
commit b9da5b6d84
5 changed files with 754 additions and 8 deletions


@@ -0,0 +1,321 @@
# Phase 9 First Real Use Report
## What this is
The first empirical exercise of the Phase 9 reflection loop after
Commits A, B, and C all landed. The goal is to find out where the
extractor and the reinforcement matcher actually behave well versus
where their behaviour drifts from the design intent.
The validation is reproducible. To re-run:
```bash
python scripts/phase9_first_real_use.py
```
This writes an isolated SQLite + Chroma store under
`data/validation/phase9-first-use/` (gitignored), seeds three active
memories, then runs eight sample interactions through the full
capture → reinforce → extract pipeline.
## What we ran
Eight synthetic interactions, each paraphrased from a real working
session about AtoCore itself or the active engineering projects:
| # | Label | Project | Expected |
|---|--------------------------------------|----------------------|---------------------------|
| 1 | exdev-mount-merge-decision | atocore | 1 decision_heading |
| 2 | ownership-was-the-real-fix | atocore | 1 fact_heading |
| 3 | memory-vs-entity-canonical-home | atocore | 1 decision_heading (long) |
| 4 | auto-promotion-deferred | atocore | 1 decision_heading |
| 5 | preference-rebase-workflow | atocore | 1 preference_sentence |
| 6 | constraint-from-doc-cite | p05-interferometer | 1 constraint_heading |
| 7 | prose-only-no-cues | atocore | 0 candidates |
| 8 | multiple-cues-in-one-interaction | p06-polisher | 3 distinct rules |
Three seed memories were inserted before the run:
- `pref_rebase`: "prefers rebase-based workflows because history stays linear" (preference, 0.6)
- `pref_concise`: "writes commit messages focused on the why, not the what" (preference, 0.6)
- `identity_runs_atocore`: "mechanical engineer who runs AtoCore for context engineering" (identity, 0.9)
## What happened — extraction (the good news)
**Every extraction expectation was met exactly.** All eight samples
produced the predicted candidate count and the predicted rule
classifications:
| Sample | Expected | Got | Pass |
|---------------------------------------|----------|-----|------|
| exdev-mount-merge-decision | 1 | 1 | ✅ |
| ownership-was-the-real-fix | 1 | 1 | ✅ |
| memory-vs-entity-canonical-home | 1 | 1 | ✅ |
| auto-promotion-deferred | 1 | 1 | ✅ |
| preference-rebase-workflow | 1 | 1 | ✅ |
| constraint-from-doc-cite | 1 | 1 | ✅ |
| prose-only-no-cues | **0** | **0** | ✅ |
| multiple-cues-in-one-interaction | 3 | 3 | ✅ |
**Total: 9 candidates from 8 interactions, 0 false positives, 0 misses
on heading patterns or sentence patterns.**
The extractor's strictness is well-tuned for the kinds of structural
cues we actually use. Things worth noting:
- **Sample 7 (`prose-only-no-cues`) produced zero candidates as
designed.** This is the most important sanity check — it confirms
the extractor won't fill the review queue with general prose when
there's no structural intent.
- **Sample 3's long content was preserved without truncation.** The
280-char max wasn't hit, and the content kept its full meaning.
- **Sample 8 produced three distinct rules in one interaction**
(decision_heading, constraint_heading, requirement_heading) without
the dedup key collapsing them. The dedup key is
`(memory_type, normalized_content, rule)` and the three are all
different on at least one axis, so they coexist as expected.
- **The prose around each heading was correctly ignored.** Sample 6
has a second sentence ("the error budget allocates 6 nm to the
laser source...") that does NOT have a structural cue, and the
extractor correctly didn't fire on it.
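The dedup behaviour can be sketched as a set keyed on that tuple (a simplified illustration using sample 8's three cues, not the extractor's actual code; "normalized" here is just lowercase-and-strip):

```python
def dedup(candidates: list[dict]) -> list[dict]:
    """Keep the first candidate for each (type, normalized content, rule) key."""
    seen: set[tuple[str, str, str]] = set()
    kept = []
    for c in candidates:
        key = (c["memory_type"], c["content"].lower().strip(), c["rule"])
        if key in seen:
            continue  # exact repeat on all three axes collapses
        seen.add(key)
        kept.append(c)
    return kept

cands = [
    {"memory_type": "adaptation", "content": "defer the laser interlock redesign", "rule": "decision_heading"},
    {"memory_type": "project", "content": "calibration must complete in under 90 seconds", "rule": "constraint_heading"},
    {"memory_type": "project", "content": "hold position to within 0.5 micron", "rule": "requirement_heading"},
    # an exact repeat of the first cue is the only thing that collapses:
    {"memory_type": "adaptation", "content": "defer the laser interlock redesign", "rule": "decision_heading"},
]
print(len(dedup(cands)))  # 3
```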
## What happened — reinforcement (the empirical finding)
**Reinforcement matched zero seeded memories across all 8 samples,
even when the response clearly echoed the seed.**
Sample 5's response was:
> *"I prefer rebase-based workflows because the history stays linear
> and reviewers have an easier time."*
The seeded `pref_rebase` memory was:
> *"prefers rebase-based workflows because history stays linear"*
A human reading both says these are the same fact. The reinforcement
matcher disagrees. After all 8 interactions:
```
pref_rebase: confidence=0.6000 refs=0 last=-
pref_concise: confidence=0.6000 refs=0 last=-
identity_runs_atocore: confidence=0.9000 refs=0 last=-
```
**Nothing moved.** This is the most important finding from this
validation pass.
### Why the matcher missed it
The current `_memory_matches` rule (in
`src/atocore/memory/reinforcement.py`) does a normalized substring
match: it lowercases both sides, collapses whitespace, then asks
"does the leading 80-char window of the memory content appear as a
substring in the response?"
For the rebase example:
- needle (normalized): `prefers rebase-based workflows because history stays linear`
- haystack (normalized): `i prefer rebase-based workflows because the history stays linear and reviewers have an easier time.`
The needle starts with `prefers` (with the trailing `s`), and the
haystack has `prefer` (without the `s`, because of the first-person
voice). And the needle has `because history stays linear`, while the
haystack has `because the history stays linear`. **Two small natural
paraphrases, and the substring fails.**
This isn't a bug in the matcher's implementation — it's doing
exactly what it was specified to do. It's a design limitation: the
substring rule is too brittle for real prose, where the same fact
gets re-stated with different verb forms, articles, and word order.
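A minimal sketch of that rule (function names hypothetical; the real implementation lives in `src/atocore/memory/reinforcement.py`) shows the miss directly:

```python
import re

def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace, as the matcher does.
    return re.sub(r"\s+", " ", text.lower()).strip()

def memory_matches_substring(memory_content: str, response: str) -> bool:
    # The leading 80-char window of the memory must appear verbatim.
    needle = _normalize(memory_content)[:80]
    return needle in _normalize(response)

memory = "prefers rebase-based workflows because history stays linear"
response = ("I prefer rebase-based workflows because the history stays "
            "linear and reviewers have an easier time.")
print(memory_matches_substring(memory, response))  # False: 'prefers' never appears
```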
### Severity
**Medium-high.** Reinforcement is the entire point of Commit B.
A reinforcement matcher that never fires on natural paraphrases
will leave seeded memories with stale confidence forever. The
reflection loop runs but it doesn't actually reinforce anything.
That hollows out the value of having reinforcement at all.
It is not a critical bug because:
- Nothing breaks. The pipeline still runs cleanly.
- Reinforcement is supposed to be a *signal*, not the only path to
high confidence — humans can still curate confidence directly.
- The candidate-extraction path (Commit C) is unaffected and works
perfectly.
But it does need to be addressed before Phase 9 can be considered
operationally complete.
## Recommended fix (deferred to a follow-up commit)
Replace the substring matcher with a token-overlap matcher. The
specification:
1. Tokenize both memory content and response into lowercase words
of length >= 3, dropping a small stop list (`the`, `a`, `an`,
`and`, `or`, `of`, `to`, `is`, `was`, `that`, `this`, `with`,
`for`, `from`, `into`).
2. Stem aggressively (or at minimum, fold trailing `s` and `ed`
so `prefers`/`prefer`/`preferred` collapse to one token).
3. A match exists if **at least 70% of the memory's content
tokens** appear in the response token set.
4. Memory content must still be at least `_MIN_MEMORY_CONTENT_LENGTH`
characters to be considered.
This is more permissive than the substring rule but still tight
enough to avoid spurious matches on generic words. It would have
caught the rebase example because:
- memory tokens (after stop-list and stemming):
`{prefer, rebase-bas, workflow, because, history, stay, linear}`
- response tokens:
`{prefer, rebase-bas, workflow, because, history, stay, linear,
reviewer, easi, time}`
- overlap: 7 / 7 memory tokens = 100% > 70% threshold → match
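A sketch of the proposed matcher, following the four steps above (helper names hypothetical; this is the spec made concrete, not the queued implementation, which will have its own tests):

```python
import re

_STOP = {"the", "a", "an", "and", "or", "of", "to", "is", "was",
         "that", "this", "with", "for", "from", "into"}

def _tokens(text: str) -> set[str]:
    """Lowercase words of length >= 3, stop-listed, with minimal stemming."""
    out = set()
    for w in re.findall(r"[a-z0-9-]+", text.lower()):
        if len(w) < 3 or w in _STOP:
            continue
        # Fold trailing 'ed' then 's' so prefers/prefer/preferred collapse.
        if w.endswith("ed"):
            w = w[:-2]
        elif w.endswith("s"):
            w = w[:-1]
        out.add(w)
    return out

def memory_matches_overlap(memory_content: str, response: str,
                           threshold: float = 0.7) -> bool:
    mem = _tokens(memory_content)
    if not mem:
        return False
    return len(mem & _tokens(response)) / len(mem) >= threshold

memory = "prefers rebase-based workflows because history stays linear"
response = ("I prefer rebase-based workflows because the history stays "
            "linear and reviewers have an easier time.")
print(memory_matches_overlap(memory, response))  # True: all 7 memory tokens overlap
```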
### Why not fix it in this report
Three reasons:
1. The validation report is supposed to be evidence, not a fix
spec. A separate commit will introduce the new matcher with
its own tests.
2. The token-overlap matcher needs its own design review for edge
cases (very long memories, very short responses, technical
abbreviations, code snippets in responses).
3. Mixing the report and the fix into one commit would muddle the
audit trail. The report is the empirical evidence; the fix is
the response.
The fix is queued as the next Phase 9 maintenance commit and is
flagged in the next-steps section below.
## Other observations
### Extraction is conservative on purpose, and that's working
Sample 7 is the most important data point in the whole run.
A natural prose response with no structural cues produced zero
candidates. **This is exactly the design intent** — the extractor
should be loud about explicit decisions/constraints/requirements
and quiet about everything else. If the extractor were too loose
the review queue would fill up with low-value items and the human
would stop reviewing.
After this run I have measurably more confidence that the V0 rule
set is the right starting point. Future rules can be added one at
a time as we see specific patterns the extractor misses, instead of
guessing at what might be useful.
### Confidence on candidates
All extracted candidates landed at the default `confidence=0.5`,
which is what the extractor is currently hardcoded to do. The
`promotion-rules.md` doc proposes a per-rule prior with a
structural-signal multiplier and freshness bonus. None of that is
implemented yet. The validation didn't reveal any urgency around
this — humans review the candidates either way — but it confirms
that the priors-and-multipliers refinement is a reasonable next
step rather than a critical one.
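For illustration only, the prior-plus-multiplier shape proposed in `promotion-rules.md` might look like this (every name and number here is hypothetical; none of it is implemented):

```python
# Hypothetical sketch of a per-rule prior with a structural-signal
# multiplier and a freshness bonus. Numbers are illustrative only.
RULE_PRIORS = {
    "decision_heading": 0.6,
    "constraint_heading": 0.55,
    "preference_sentence": 0.5,
}

def candidate_confidence(rule: str, has_structural_signal: bool,
                         age_days: float) -> float:
    prior = RULE_PRIORS.get(rule, 0.5)          # per-rule prior, default 0.5
    if has_structural_signal:
        prior *= 1.1                            # structural-signal multiplier
    prior += max(0.0, 0.05 - 0.01 * age_days)   # freshness bonus, decays over days
    return min(prior, 1.0)
```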
### Multiple cues in one interaction
Sample 8 confirmed an important property: **three structural
cues in the same response do not collide in dedup**. The dedup
key is `(memory_type, normalized_content, rule)`, and since each
cue produced a distinct (type, content, rule) tuple, all three
landed cleanly.
This matters because real working sessions naturally bundle
multiple decisions/constraints/requirements into one summary.
The extractor handles those bundles correctly.
### Project scoping
Each candidate carries the `project` from the source interaction
into its own `project` field. Sample 6 (p05) and sample 8 (p06)
both produced candidates with the right project. This is
non-obvious because the extractor module never explicitly looks
at project — it inherits from the interaction it's scanning. Worth
keeping in mind when the entity extractor is built: same pattern
should apply.
## What this validates and what it doesn't
### Validates
- The Phase 9 Commit C extractor's rule set is well-tuned for
hand-written structural cues
- The dedup logic does the right thing across multiple cues
- The "drop candidates that match an existing active memory" filter
works (would have been visible if any seeded memory had matched
one of the heading texts — none did, but the code path is the
same one that's covered in `tests/test_extractor.py`)
- The `prose-only-no-cues` no-fire case is solid
- Long content is preserved without truncation
- Project scoping flows through the pipeline
### Does NOT validate
- The reinforcement matcher (clearly, since it caught nothing)
- The behaviour against very long documents (each sample was
under 700 chars; real interaction responses can be 10× that)
- The behaviour against responses that contain code blocks (the
extractor's regex rules don't handle code-block fenced sections
specially)
- Cross-interaction promotion-to-active flow (no candidate was
promoted in this run; the lifecycle is covered by the unit tests
but not by this empirical exercise)
- The behaviour at scale: 8 interactions is a one-shot. We need
to see the queue after 50+ before judging reviewer ergonomics.
### Recommended next empirical exercises
1. **Real conversation capture**, using a slash command from a
real Claude Code session against either a local or Dalidou
AtoCore instance. The synthetic responses in this script are
honest paraphrases but they're still hand-curated.
2. **Bulk capture from existing PKM**, ingesting a few real
project notes through the extractor as if they were
interactions. This stresses the rules against documents that
weren't written with the extractor in mind.
3. **Reinforcement matcher rerun** after the token-overlap
matcher lands.
## Action items from this report
- [ ] **Fix reinforcement matcher** with token-overlap rule
described in the "Recommended fix" section above. Owner:
next session. Severity: medium-high.
- [x] **Document the extractor's V0 strictness** as a working
property, not a limitation. Sample 7 makes the case.
- [ ] **Build the slash command** so the next validation run
can use real (not synthetic) interactions. Tracked in
Session 2 of the current planning sprint.
- [ ] **Run a 50+ interaction batch** to evaluate reviewer
ergonomics. Deferred until the slash command exists.
## Reproducibility
The script is deterministic in every field the report depends on.
Re-running it reproduces these results because:
- the data dir is wiped on every run
- the sample interactions are constants
- memory uuids are regenerated each run, but the fields that matter
  (content, type, count, rule) are deterministic
- the `data/validation/phase9-first-use/` directory is gitignored,
  so no state leaks across runs
To reproduce this exact report:
```bash
python scripts/phase9_first_real_use.py
```
To get JSON output for downstream tooling:
```bash
python scripts/phase9_first_real_use.py --json
```


@@ -0,0 +1,393 @@
"""Phase 9 first-real-use validation script.
Captures a small set of representative interactions drawn from a real
working session, runs the full Phase 9 loop (capture -> reinforce ->
extract) over them, and prints what each step produced. The intent is
to generate empirical evidence about the extractor's behaviour against
prose that wasn't written to make the test pass.
Usage:
python scripts/phase9_first_real_use.py [--data-dir PATH]
The script writes a fresh isolated SQLite + Chroma store under the
given data dir (default: ./data/validation/phase9-first-use). The
data dir is gitignored so the script can be re-run cleanly.
Each interaction is printed with:
- the captured interaction id
- the reinforcement results (which seeded memories were echoed)
- the extraction results (which candidates were proposed and why)
- notes on what the extractor MISSED (manually annotated below)
The output is intentionally human-readable so the run can be saved as
the body of docs/phase9-first-real-use.md.
"""
from __future__ import annotations
import argparse
import json
import os
import shutil
import sys
from dataclasses import dataclass
from pathlib import Path
# Make the package importable when run directly from the repo root.
_REPO_ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(_REPO_ROOT / "src"))
@dataclass
class SampleInteraction:
    label: str
    prompt: str
    response: str
    project: str
    notes_for_human: str
# ---------------------------------------------------------------------------
# Sample interactions, paraphrased from real working sessions on AtoCore
# itself. Each one is a real-feeling exchange with at least one structural
# cue the extractor should catch, plus prose the extractor should NOT
# accidentally catch.
# ---------------------------------------------------------------------------
SAMPLES: list[SampleInteraction] = [
    SampleInteraction(
        label="exdev-mount-merge-decision",
        project="atocore",
        prompt="The TMPDIR fix didn't work, the LOCAL_COPY_PATH change didn't work, what now?",
        response=(
            "We've been chasing this through three layers without success. "
            "Time to stop guessing and apply the simple fix Dalidou Claude "
            "proposed in the very first diagnosis.\n\n"
            "## Decision: merge the two bind mounts into one\n\n"
            "Both /srv/storage/repos and /srv/storage/gitea/data live on "
            "/dev/sdb1, so the move is a same-filesystem rename — instant, "
            "atomic, no data copy. After the move, the EXDEV class of bug "
            "becomes structurally impossible because there is only one "
            "mount inside the container."
        ),
        notes_for_human=(
            "Should fire decision_heading rule. Should NOT fire requirement "
            "or constraint rules. The 'becomes structurally impossible' line "
            "is prose, not a candidate."
        ),
    ),
    SampleInteraction(
        label="ownership-was-the-real-fix",
        project="atocore",
        prompt="Still failing after the mount merge. What's actually wrong?",
        response=(
            "Same rejection even after the mounts were merged, which kills "
            "the EXDEV theory. The error 'unable to migrate objects to "
            "permanent storage' is git's catch-all for any failure in "
            "tmp_objdir_migrate, and the real errno is being thrown away.\n\n"
            "## Fact: the actual root cause was ownership, not cross-device link\n\n"
            "After mv on the host, the moved directory had perms git could "
            "navigate for some subdirs but not others. Recursive chown "
            "git:git plus chmod u+rwX,g+rwX fixed it on the first try."
        ),
        notes_for_human=(
            "Should fire fact_heading rule, target type knowledge. The "
            "'unable to migrate' string is inside prose and should NOT "
            "trigger any rule."
        ),
    ),
    SampleInteraction(
        label="memory-vs-entity-canonical-home",
        project="atocore",
        prompt="Does a Decision live in memories or entities?",
        response=(
            "Both layers exist for different reasons. After thinking through "
            "the ontology question, I'm landing on a clean rule: one "
            "canonical home per concept, with explicit graduation when a "
            "memory eventually deserves structure.\n\n"
            "## Decision: project, knowledge, and adaptation memory types graduate to entities\n\n"
            "The identity, preference, and episodic types stay in the memory "
            "layer because they're personal and unstructured. The other "
            "three become entities once the engineering layer V1 ships. The "
            "original memory rows are NEVER deleted — they become frozen "
            "historical pointers to whichever entity they graduated into."
        ),
        notes_for_human=(
            "Should fire decision_heading rule. Long content (over 100 "
            "chars) — verifies the max-length cap of 280 doesn't truncate."
        ),
    ),
    SampleInteraction(
        label="auto-promotion-deferred",
        project="atocore",
        prompt="Should the engineering layer auto-promote candidates?",
        response=(
            "The trust hierarchy is the whole point of AtoCore. If we let "
            "the system auto-promote, we trade away the 'bad memory is "
            "worse than no memory' invariant for a marginal speed gain.\n\n"
            "## Decision: no auto-promotion in V1, ever\n\n"
            "The schema and API are designed so policies can be added "
            "later without migration, but the V1 stance is that every "
            "promotion needs a human signoff. The hard line: nothing ever "
            "moves into trusted project state automatically, even with the "
            "engineering layer running."
        ),
        notes_for_human=(
            "Should fire decision_heading rule. Has 'invariant' jargon — "
            "verify the extractor doesn't get distracted."
        ),
    ),
    SampleInteraction(
        label="preference-rebase-workflow",
        project="atocore",
        prompt="Should we use rebase or merge commits for this branch?",
        response=(
            "I prefer rebase-based workflows because the history stays "
            "linear and reviewers have an easier time. For the dalidou "
            "storage-foundation branch we'll rebase before merging into "
            "main so the timeline reads as one coherent narrative."
        ),
        notes_for_human=(
            "Should fire preference_sentence rule. Note: the rule looks "
            "for 'I prefer' which is exactly the wording used."
        ),
    ),
    SampleInteraction(
        label="constraint-from-doc-cite",
        project="p05-interferometer",
        prompt="What's the budget situation for p05 right now?",
        response=(
            "Looking at the latest design notes, the constraint is fairly "
            "tight on the optical path side.\n\n"
            "### Constraint: total optical path stability must stay within 10 nm RMS at 1 kHz\n\n"
            "The error budget allocates 6 nm to the laser source and 4 nm "
            "to the mechanical mounts and frame combined."
        ),
        notes_for_human=(
            "Should fire constraint_heading rule with type=project. The "
            "second sentence ('error budget allocates 6 nm') is descriptive "
            "and should NOT fire any rule because it doesn't have a "
            "structural cue."
        ),
    ),
    SampleInteraction(
        label="prose-only-no-cues",
        project="atocore",
        prompt="What should I work on next?",
        response=(
            "Looking at the current state of AtoCore, the next natural "
            "step is to validate Phase 9 against real interactions before "
            "starting the engineering layer implementation. Empirical "
            "evidence is missing and the planning docs would benefit from "
            "real signal."
        ),
        notes_for_human=(
            "Should produce ZERO candidates. Pure recommendation prose, no "
            "structural cues. If this fires anything the extractor is too "
            "loose."
        ),
    ),
    SampleInteraction(
        label="multiple-cues-in-one-interaction",
        project="p06-polisher",
        prompt="Summarize today's polisher session",
        response=(
            "We worked through three things in the polisher session today.\n\n"
            "## Decision: defer the laser interlock redesign to after the July milestone\n\n"
            "## Constraint: the calibration routine must complete in under 90 seconds for production use\n\n"
            "## Requirement: the polisher must hold position to within 0.5 micron at 1 g loading\n\n"
            "Action items captured for the next sync."
        ),
        notes_for_human=(
            "Three rules should fire on the same interaction: "
            "decision_heading -> adaptation, constraint_heading -> project, "
            "requirement_heading -> project. Verify dedup doesn't merge them."
        ),
    ),
]
def setup_environment(data_dir: Path) -> None:
    """Configure AtoCore to use an isolated data directory for this run."""
    if data_dir.exists():
        shutil.rmtree(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    os.environ["ATOCORE_DATA_DIR"] = str(data_dir)
    os.environ.setdefault("ATOCORE_DEBUG", "true")
    # Reset cached settings so the new env vars take effect
    import atocore.config as config
    config.settings = config.Settings()
    import atocore.retrieval.vector_store as vs
    vs._store = None
def seed_memories() -> dict[str, str]:
    """Insert a small set of seed active memories so reinforcement has
    something to match against."""
    from atocore.memory.service import create_memory
    seeded: dict[str, str] = {}
    seeded["pref_rebase"] = create_memory(
        memory_type="preference",
        content="prefers rebase-based workflows because history stays linear",
        confidence=0.6,
    ).id
    seeded["pref_concise"] = create_memory(
        memory_type="preference",
        content="writes commit messages focused on the why, not the what",
        confidence=0.6,
    ).id
    seeded["identity_runs_atocore"] = create_memory(
        memory_type="identity",
        content="mechanical engineer who runs AtoCore for context engineering",
        confidence=0.9,
    ).id
    return seeded
def run_sample(sample: SampleInteraction) -> dict:
    """Capture one sample, run extraction, return a result dict."""
    from atocore.interactions.service import record_interaction
    from atocore.memory.extractor import extract_candidates_from_interaction
    interaction = record_interaction(
        prompt=sample.prompt,
        response=sample.response,
        project=sample.project,
        client="phase9-first-real-use",
        session_id="first-real-use",
        reinforce=True,
    )
    candidates = extract_candidates_from_interaction(interaction)
    return {
        "label": sample.label,
        "project": sample.project,
        "interaction_id": interaction.id,
        "expected_notes": sample.notes_for_human,
        "candidate_count": len(candidates),
        "candidates": [
            {
                "memory_type": c.memory_type,
                "rule": c.rule,
                "content": c.content,
                "source_span": c.source_span[:120],
            }
            for c in candidates
        ],
    }
def report_seed_memory_state(seeded_ids: dict[str, str]) -> dict:
    from atocore.memory.service import get_memories
    state = {}
    for label, mid in seeded_ids.items():
        rows = [m for m in get_memories(limit=200) if m.id == mid]
        if not rows:
            state[label] = None
            continue
        m = rows[0]
        state[label] = {
            "id": m.id,
            "memory_type": m.memory_type,
            "content_preview": m.content[:80],
            "confidence": round(m.confidence, 4),
            "reference_count": m.reference_count,
            "last_referenced_at": m.last_referenced_at,
        }
    return state
def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data-dir",
        default=str(_REPO_ROOT / "data" / "validation" / "phase9-first-use"),
        help="Isolated data directory to use for this validation run",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        help="Emit machine-readable JSON instead of human prose",
    )
    args = parser.parse_args()
    data_dir = Path(args.data_dir).resolve()
    setup_environment(data_dir)
    from atocore.models.database import init_db
    from atocore.context.project_state import init_project_state_schema
    init_db()
    init_project_state_schema()
    seeded = seed_memories()
    sample_results = [run_sample(s) for s in SAMPLES]
    final_seed_state = report_seed_memory_state(seeded)
    if args.json:
        json.dump(
            {
                "data_dir": str(data_dir),
                "seeded_memories_initial": list(seeded.keys()),
                "samples": sample_results,
                "seed_memory_state_after_run": final_seed_state,
            },
            sys.stdout,
            indent=2,
            default=str,
        )
        return 0
    print("=" * 78)
    print("Phase 9 first-real-use validation run")
    print("=" * 78)
    print(f"Isolated data dir: {data_dir}")
    print()
    print("Seeded the memory store with 3 active memories:")
    for label, mid in seeded.items():
        print(f" - {label} ({mid[:8]})")
    print()
    print("-" * 78)
    print(f"Running {len(SAMPLES)} sample interactions ...")
    print("-" * 78)
    for result in sample_results:
        print()
        print(f"## {result['label']} [project={result['project']}]")
        print(f" interaction_id={result['interaction_id'][:8]}")
        print(f" expected: {result['expected_notes']}")
        print(f" candidates produced: {result['candidate_count']}")
        for i, cand in enumerate(result["candidates"], 1):
            print(
                f" [{i}] type={cand['memory_type']:11s} "
                f"rule={cand['rule']:21s} "
                f"content={cand['content']!r}"
            )
    print()
    print("-" * 78)
    print("Reinforcement state on seeded memories AFTER all interactions:")
    print("-" * 78)
    for label, state in final_seed_state.items():
        if state is None:
            print(f" {label}: <missing>")
            continue
        print(
            f" {label}: confidence={state['confidence']:.4f} "
            f"refs={state['reference_count']} "
            f"last={state['last_referenced_at'] or '-'}"
        )
    print()
    print("=" * 78)
    print("Run complete. Data written to:", data_dir)
    print("=" * 78)
    return 0
if __name__ == "__main__":
    raise SystemExit(main())


@@ -31,6 +31,7 @@ from atocore.interactions.service import (
    record_interaction,
)
from atocore.memory.extractor import (
    EXTRACTOR_VERSION,
    MemoryCandidate,
    extract_candidates_from_interaction,
)
@@ -622,6 +623,7 @@ def api_extract_from_interaction(
"candidate_count": len(candidates),
"persisted": payload.persist,
"persisted_ids": persisted_ids,
"extractor_version": EXTRACTOR_VERSION,
"candidates": [
{
"memory_type": c.memory_type,
@@ -630,6 +632,7 @@ def api_extract_from_interaction(
"confidence": c.confidence,
"rule": c.rule,
"source_span": c.source_span,
"extractor_version": c.extractor_version,
}
for c in candidates
],


@@ -1,5 +1,7 @@
"""AtoCore — FastAPI application entry point."""
from contextlib import asynccontextmanager
from fastapi import FastAPI
from atocore.api.routes import router
@@ -9,18 +11,19 @@ from atocore.ingestion.pipeline import get_source_status
from atocore.models.database import init_db
from atocore.observability.logger import get_logger, setup_logging
app = FastAPI(
    title="AtoCore",
    description="Personal Context Engine for LLM interactions",
    version="0.1.0",
)
app.include_router(router)
log = get_logger("main")
@app.on_event("startup")
def startup():
@asynccontextmanager
async def lifespan(app: FastAPI):
    """Run setup before the first request and teardown after shutdown.
    Replaces the deprecated ``@app.on_event("startup")`` hook with the
    modern ``lifespan`` context manager. Setup runs synchronously (the
    underlying calls are blocking I/O) so no await is needed; the
    function still must be async per the FastAPI contract.
    """
    setup_logging()
    _config.ensure_runtime_dirs()
    init_db()
@@ -32,6 +35,19 @@ def startup():
        chroma_path=str(_config.settings.chroma_path),
        source_status=get_source_status(),
    )
    yield
    # No teardown work needed today; SQLite connections are short-lived
    # and the Chroma client cleans itself up on process exit.
app = FastAPI(
    title="AtoCore",
    description="Personal Context Engine for LLM interactions",
    version="0.1.0",
    lifespan=lifespan,
)
app.include_router(router)
if __name__ == "__main__":


@@ -46,6 +46,18 @@ from atocore.observability.logger import get_logger
log = get_logger("extractor")
# Bumped whenever the rule set, regex shapes, or post-processing
# semantics change in a way that could affect candidate output. The
# promotion-rules doc requires every candidate to record the version
# of the extractor that produced it so old candidates can be re-evaluated
# (or kept as-is) when the rules evolve.
#
# History:
# 0.1.0 - initial Phase 9 Commit C rule set (Apr 6, 2026)
EXTRACTOR_VERSION = "0.1.0"
# Every candidate is attributed to the rule that fired so reviewers can
# audit why it was proposed.
@dataclass
@@ -57,6 +69,7 @@ class MemoryCandidate:
project: str = ""
confidence: float = 0.5 # default review-queue confidence
source_interaction_id: str = ""
extractor_version: str = EXTRACTOR_VERSION
# ---------------------------------------------------------------------------