phase9 first-real-use validation + small hygiene wins
Session 1 of the four-session plan. Empirically exercises the Phase 9
loop (capture -> reinforce -> extract) for the first time and lands
three small hygiene fixes.
Validation script + report
--------------------------
scripts/phase9_first_real_use.py — reproducible script that:
- sets up an isolated SQLite + Chroma store under
data/validation/phase9-first-use (gitignored)
- seeds 3 active memories
- runs 8 sample interactions through capture + reinforce + extract
- prints what each step produced and reinforcement state at the end
- supports --json output for downstream tooling
docs/phase9-first-real-use.md — narrative report of the run with:
- extraction results table (8/8 expectations met exactly)
- the empirical finding that REINFORCEMENT MATCHED ZERO seeds
despite sample 5 clearly echoing the rebase preference memory
- root cause analysis: the substring matcher is too brittle for
natural paraphrases (e.g. "prefers" vs "I prefer", "history"
vs "the history")
- recommended fix: replace substring matcher with a token-overlap
matcher (>=70% of memory tokens present in response, with
light stemming and a small stop list)
- explicit note that the fix is queued as a follow-up commit, not
bundled into the report — keeps the audit trail clean
Key extraction results from the run:
- all 7 heading/sentence rules fired correctly
- 0 false positives on the prose-only sample (the most important
sanity check)
- long content preserved without truncation
- dedup correctly kept three distinct cues from one interaction
- project scoping flowed cleanly through the pipeline
Hygiene 1: FastAPI lifespan migration (src/atocore/main.py)
- Replaced @app.on_event("startup") with the modern @asynccontextmanager
lifespan handler
- Same setup work (setup_logging, ensure_runtime_dirs, init_db,
init_project_state_schema, startup_ready log)
- Removes the two on_event deprecation warnings from every test run
- Test suite now shows 1 warning instead of 3
Hygiene 2: EXTRACTOR_VERSION constant (src/atocore/memory/extractor.py)
- Added EXTRACTOR_VERSION = "0.1.0" with a versioned change log comment
- MemoryCandidate dataclass carries extractor_version on every candidate
- POST /interactions/{id}/extract response now includes extractor_version
on both the top level (current run) and on each candidate
- Implements the versioning requirement called out in
docs/architecture/promotion-rules.md so old candidates can be
identified and re-evaluated when the rule set evolves
Hygiene 3: ~/.git-credentials cleanup (out-of-tree, not committed)
- Removed the dead OAUTH_USER:<jwt> line for dalidou:3000 that was
being silently rewritten by the system credential manager on every
push attempt
- Configured credential.http://dalidou:3000.helper with the empty-string
sentinel pattern so the URL-specific helper chain is exactly
["", store] instead of inheriting the system-level "manager" helper
that ships with Git for Windows
- Same fix for the 100.80.199.40 (Tailscale) entry
- Verified end to end: a fresh push using only the cleaned credentials
file (no embedded URL) authenticates as Antoine and lands cleanly
Full suite: 160 passing (no change from previous), 1 warning
(was 3) thanks to the lifespan migration.
This commit is contained in:
@@ -31,6 +31,7 @@ from atocore.interactions.service import (
|
||||
record_interaction,
|
||||
)
|
||||
from atocore.memory.extractor import (
|
||||
EXTRACTOR_VERSION,
|
||||
MemoryCandidate,
|
||||
extract_candidates_from_interaction,
|
||||
)
|
||||
@@ -622,6 +623,7 @@ def api_extract_from_interaction(
|
||||
"candidate_count": len(candidates),
|
||||
"persisted": payload.persist,
|
||||
"persisted_ids": persisted_ids,
|
||||
"extractor_version": EXTRACTOR_VERSION,
|
||||
"candidates": [
|
||||
{
|
||||
"memory_type": c.memory_type,
|
||||
@@ -630,6 +632,7 @@ def api_extract_from_interaction(
|
||||
"confidence": c.confidence,
|
||||
"rule": c.rule,
|
||||
"source_span": c.source_span,
|
||||
"extractor_version": c.extractor_version,
|
||||
}
|
||||
for c in candidates
|
||||
],
|
||||
|
||||
@@ -1,5 +1,7 @@
|
||||
"""AtoCore — FastAPI application entry point."""
|
||||
|
||||
from contextlib import asynccontextmanager
|
||||
|
||||
from fastapi import FastAPI
|
||||
|
||||
from atocore.api.routes import router
|
||||
@@ -9,18 +11,19 @@ from atocore.ingestion.pipeline import get_source_status
|
||||
from atocore.models.database import init_db
|
||||
from atocore.observability.logger import get_logger, setup_logging
|
||||
|
||||
app = FastAPI(
|
||||
title="AtoCore",
|
||||
description="Personal Context Engine for LLM interactions",
|
||||
version="0.1.0",
|
||||
)
|
||||
|
||||
app.include_router(router)
|
||||
log = get_logger("main")
|
||||
|
||||
|
||||
@app.on_event("startup")
|
||||
def startup():
|
||||
@asynccontextmanager
|
||||
async def lifespan(app: FastAPI):
|
||||
"""Run setup before the first request and teardown after shutdown.
|
||||
|
||||
Replaces the deprecated ``@app.on_event("startup")`` hook with the
|
||||
modern ``lifespan`` context manager. Setup runs synchronously (the
|
||||
underlying calls are blocking I/O) so no await is needed; the
|
||||
function still must be async per the FastAPI contract.
|
||||
"""
|
||||
setup_logging()
|
||||
_config.ensure_runtime_dirs()
|
||||
init_db()
|
||||
@@ -32,6 +35,19 @@ def startup():
|
||||
chroma_path=str(_config.settings.chroma_path),
|
||||
source_status=get_source_status(),
|
||||
)
|
||||
yield
|
||||
# No teardown work needed today; SQLite connections are short-lived
|
||||
# and the Chroma client cleans itself up on process exit.
|
||||
|
||||
|
||||
app = FastAPI(
|
||||
title="AtoCore",
|
||||
description="Personal Context Engine for LLM interactions",
|
||||
version="0.1.0",
|
||||
lifespan=lifespan,
|
||||
)
|
||||
|
||||
app.include_router(router)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
@@ -46,6 +46,18 @@ from atocore.observability.logger import get_logger
|
||||
|
||||
log = get_logger("extractor")
|
||||
|
||||
|
||||
# Bumped whenever the rule set, regex shapes, or post-processing
|
||||
# semantics change in a way that could affect candidate output. The
|
||||
# promotion-rules doc requires every candidate to record the version
|
||||
# of the extractor that produced it so old candidates can be re-evaluated
|
||||
# (or kept as-is) when the rules evolve.
|
||||
#
|
||||
# History:
|
||||
# 0.1.0 - initial Phase 9 Commit C rule set (Apr 6, 2026)
|
||||
EXTRACTOR_VERSION = "0.1.0"
|
||||
|
||||
|
||||
# Every candidate is attributed to the rule that fired so reviewers can
|
||||
# audit why it was proposed.
|
||||
@dataclass
|
||||
@@ -57,6 +69,7 @@ class MemoryCandidate:
|
||||
project: str = ""
|
||||
confidence: float = 0.5 # default review-queue confidence
|
||||
source_interaction_id: str = ""
|
||||
extractor_version: str = EXTRACTOR_VERSION
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
Reference in New Issue
Block a user