Codex caught a real documentation accuracy bug in the previous
canonicalization doc commit (f521aab). The doc claimed that rows
written under aliases before fb6298a "still work via the
unregistered-name fallback path" — that is wrong for REGISTERED
aliases, which is exactly the case that matters.
The unregistered-name fallback only saves you when the project was
never in the registry: a row stored under "orphan-project" is read
back via "orphan-project", both pass through resolve_project_name
unchanged, and the strings line up. For a registered alias like
"p05", the helper rewrites the read key to "p05-interferometer"
but does NOT rewrite the storage key, so the legacy row becomes
silently invisible.
This commit corrects the doc and locks the gap behavior in with
a regression test, so the issue cannot be lost again.
docs/architecture/project-identity-canonicalization.md
------------------------------------------------------
- Removed the misleading claim from the "What this rule does NOT
cover" section. Replaced with a pointer to the new gap section
and an explicit statement that the migration is required before
engineering V1 ships.
- New "Compatibility gap: legacy alias-keyed rows" section between
"Why this is the trust hierarchy in action" and "The rule for
new entry points". This is the natural insertion point because
the gap is exactly the trust hierarchy failing for legacy data.
The section covers:
* a worked T0/T1 timeline showing the exact failure mode
* what is at risk on the live Dalidou DB, ranked by trust tier:
projects table (shadow rows), project_state (highest risk
because Layer 3 is most-authoritative), memories, interactions
* inspection SQL queries for measuring the actual blast radius
on the live DB before running any migration
* the spec for the migration script: walk projects, find shadow
rows, merge dependent state via the conflict model when there
are collisions, dry-run mode, idempotent
* explicit statement that this is required pre-V1 because V1
will add new project-keyed tables and the killer correctness
queries from engineering-query-catalog.md would report wrong
results against any project that has shadow rows
- "Open follow-ups" item 1 promoted from "tracked optional" to
"REQUIRED before engineering V1 ships, NOT optional" with a
more honest cost estimate (~150 LOC migration + ~50 LOC tests
+ supervised live run, not the previous optimistic ~30 LOC)
- TL;DR rewritten to mention the gap explicitly and re-order
the open follow-ups so the migration is the top priority
tests/test_project_state.py
---------------------------
- New test_legacy_alias_keyed_state_is_invisible_until_migrated
- Inserts a "p05" project row + a project_state row pointing at
it via raw SQL (bypassing set_state which now canonicalizes),
simulating a pre-fix legacy row
- Verifies the canonicalized get_state path can NOT see the row
via either the alias or the canonical id — this is the bug
- Verifies the row is still in the database (just unreachable),
so the migration script has something to find
- The docstring explicitly says: "When the legacy alias migration
script lands, this test must be inverted." Future readers will
know exactly when and how to update it.
Full suite: 175 passing (was 174), 1 warning. The +1 is the new
gap regression test.
What this commit does NOT do
----------------------------
- The migration script itself is NOT in this commit. Codex's
finding was a doc accuracy issue, and the right scope is fix
the doc + lock the gap behavior in. Writing the migration is
the next concrete step but is bigger (~200 LOC + dry-run mode
+ collision handling via the conflict model + supervised run
on the live Dalidou DB), warrants its own commit, and probably
warrants a "draft + review the dry-run output before applying"
workflow rather than a single shot.
- Existing tests are unchanged. The new test stands alone as a
documented gap; the 12 canonicalization tests from fb6298a
still pass without modification.
Three findings from codex's review of the previous P1+P2 fix. The
earlier commit (f2372ef) only fixed alias resolution at the context
builder. Codex correctly pointed out that the same fragmentation
applies at every other place a project name crosses a boundary —
project_state writes/reads, interaction capture/listing/filtering,
memory create/queries, and reinforcement's downstream queries. Plus
a real bug in the interaction `since` filter where the storage
format and the documented ISO format don't compare cleanly.
The fix is one helper used at every boundary instead of duplicating
the resolution inline.
New helper: src/atocore/projects/registry.py::resolve_project_name
---------------------------------------------------------------
- Single canonicalization boundary for project names
- Returns the canonical project_id when the input matches any
registered id or alias
- Returns the input unchanged for empty/None and for unregistered
names (preserves backwards compat with hand-curated state that
predates the registry)
- Documented as the contract that every read/write at the trust
boundary should pass through
P1 — Trusted Project State endpoints
------------------------------------
src/atocore/context/project_state.py: set_state, get_state, and
invalidate_state now all canonicalize project_name through
resolve_project_name BEFORE looking up or creating the project row.
Before this fix:
- POST /project/state with project="p05" called ensure_project("p05")
which created a separate row in the projects table
- The state row was attached to that alias project_id
- Later context builds canonicalized "p05" -> "p05-interferometer"
via the builder fix from f2372ef and never found the state
- Result: trusted state silently fragmented across alias rows
After this fix:
- The alias is resolved to the canonical id at every entry point
- Two captures (one via "p05", one via "p05-interferometer") write
to the same row
- get_state via either alias or the canonical id finds the same row
Fixes the highest-priority gap codex flagged because Trusted Project
State is supposed to be the most dependable layer in the AtoCore
trust hierarchy.
P2.a — Interaction capture project canonicalization
----------------------------------------------------
src/atocore/interactions/service.py: record_interaction now
canonicalizes project before storing, so interaction.project is
always the canonical id regardless of what the client passed.
Downstream effects:
- reinforce_from_interaction queries memories by interaction.project
-> previously missed memories stored under canonical id
-> now consistent because interaction.project IS the canonical id
- the extractor stamps candidates with interaction.project
-> previously created candidates in alias buckets
-> now creates candidates in the canonical bucket
- list_interactions(project=alias) was already broken, now fixed by
canonicalizing the filter input on the read side too
Memory service applied the same fix:
- src/atocore/memory/service.py: create_memory and get_memories
both canonicalize project through resolve_project_name
- This keeps stored memory.project consistent with the
reinforcement query path
P2.b — Interaction `since` filter format normalization
------------------------------------------------------
src/atocore/interactions/service.py: new _normalize_since helper.
The bug:
- created_at is stored as 'YYYY-MM-DD HH:MM:SS' (no timezone, UTC by
convention) so it sorts lexically and compares cleanly with the
SQLite CURRENT_TIMESTAMP default
- The `since` parameter was documented as ISO 8601 but compared as
a raw string against the storage format
- The lexically-greater 'T' separator means an ISO timestamp like
'2026-04-07T12:00:00Z' is GREATER than the storage form
'2026-04-07 12:00:00' for the same instant
- Result: a client passing ISO `since` got an empty result for any
row from the same day, even though those rows existed and were
technically "after" the cutoff in real-world time
The fix:
- _normalize_since accepts ISO 8601 with T, optional Z suffix,
optional fractional seconds, optional +HH:MM offsets
- Uses datetime.fromisoformat for parsing (Python 3.11+)
- Converts to UTC and reformats as the storage format before the
SQL comparison
- The bare storage format still works (backwards compat path is a
regex match that returns the input unchanged)
- Unparseable input is returned as-is so the comparison degrades
gracefully (rows just don't match) instead of raising and
breaking the listing endpoint
builder.py refactor
-------------------
The previous P1 fix had inline canonicalization. Now it uses the
shared helper for consistency:
- import changed from get_registered_project to resolve_project_name
- the inline lookup is replaced with a single helper call
- the comment block now points at representation-authority.md for
the canonicalization contract
New shared test fixture: tests/conftest.py::project_registry
------------------------------------------------------------
- Standardizes the registry-setup pattern that was duplicated
across test_context_builder.py, test_project_state.py,
test_interactions.py, and test_reinforcement.py
- Returns a callable that takes (project_id, [aliases]) tuples
and writes them into a temp registry file with the env var
pointed at it and config.settings reloaded
- Used by all 12 new regression tests in this commit
Tests (12 new, all green on first run)
--------------------------------------
test_project_state.py:
- test_set_state_canonicalizes_alias: write via alias, read via
every alias and the canonical id, verify same row id
- test_get_state_canonicalizes_alias_after_canonical_write
- test_invalidate_state_canonicalizes_alias
- test_unregistered_project_state_still_works (backwards compat)
test_interactions.py:
- test_record_interaction_canonicalizes_project
- test_list_interactions_canonicalizes_project_filter
- test_list_interactions_since_accepts_iso_with_t_separator
- test_list_interactions_since_accepts_z_suffix
- test_list_interactions_since_accepts_offset
- test_list_interactions_since_storage_format_still_works
test_reinforcement.py:
- test_reinforcement_works_when_capture_uses_alias (end-to-end:
capture under alias, seed memory under canonical, verify
reinforcement matches)
- test_get_memories_filter_by_alias
Full suite: 174 passing (was 162), 1 warning. The +12 is the
new regression tests, no existing tests regressed.
What's still NOT canonicalized (and why)
----------------------------------------
- _rank_chunks's secondary substring boost in builder.py — the
retriever already does the right thing via its own
_project_match_boost which calls get_registered_project. The
redundant secondary boost still uses the raw hint but it's a
multiplicative factor on top of correct retrieval, not a
filter, so it can't drop relevant chunks. Tracked as a future
cleanup but not a P1.
- update_memory's project field (you can't change a memory's
project after creation in the API anyway).
- The retriever's project_hint parameter on direct /query calls
— same reasoning as the builder boost, plus the retriever's
own get_registered_project call already handles aliases there.
Two regression fixes from codex's review of the slash command
refactor commit (78d4e97). Both findings are real and now have
covered tests.
P1 — server-side alias resolution for project_state lookup
----------------------------------------------------------
The bug:
- /context/build forwarded the caller's project hint verbatim to
get_state(project_hint), which does an exact-name lookup against
the projects table (case-insensitive but no alias resolution)
- the project registry's alias matching was only used by the
client's auto-context path and the retriever's project-match
boost, never by the server's project_state lookup
- consequence: /atocore-context "... p05" would silently miss
trusted project state stored under the canonical id
"p05-interferometer", weakening project-hinted retrieval to
the point that an explicit alias hint was *worse* than no hint
The fix in src/atocore/context/builder.py:
- import get_registered_project from the projects registry
- before calling get_state(project_hint), resolve the hint
through get_registered_project; if a registry record exists,
use the canonical project_id for the state lookup
- if no registry record exists, fall back to the raw hint so a
hand-curated project_state entry that predates the registry
still works (backwards compat with pre-registry deployments)
The retriever already does its own alias expansion via
get_registered_project for the project-match boost, so the
retriever side was never broken — only the project_state lookup
in the builder. The fix is scoped to that one call site.
Tests added in tests/test_context_builder.py:
- test_alias_hint_resolves_through_registry: stands up a fresh
registry, sets state under "p05-interferometer", then verifies
build_context with project_hint="p05" finds the state, AND
with project_hint="interferometer" (the second alias) finds it
too, AND with the canonical id finds it. Covers all three
resolution paths.
- test_unknown_hint_falls_back_to_raw_lookup: empty registry,
set state under an unregistered project name, verify the
build_context call with that name as the hint still finds the
state. Locks in the backwards-compat behavior.
P2 — slash command no-hint fallback to corpus-wide context build
----------------------------------------------------------------
The bug:
- the slash command's no-hint path called auto-context, which
returns {"status": "no_project_match"} when project detection
fails and does NOT fall back to a plain context-build
- the slash command's own help text told the user "call without
a hint to use the corpus-wide context build" — which was a lie
because the wrapper no longer did that
- consequence: generic prompts like "what changed in AtoCore
backup policy?" or any cross-project question got a useless
no_project_match envelope instead of a context pack
The fix in .claude/commands/atocore-context.md:
- the no-hint path now does the 2-step fallback dance:
1. try `auto-context "<prompt>"` for project detection
2. if the response contains "no_project_match", fall back to
`context-build "<prompt>"` (no project arg)
- both branches return a real context pack, fail-open envelope
is preserved for genuine network errors
- the underlying client surface is unchanged (no new flags, no
new subcommands) — the fallback is per-frontend logic in the
slash command, leaving auto-context's existing semantics
intact for OpenClaw and any other caller that depends on the
no_project_match envelope as a "do nothing" signal
While I was here, also tightened the slash command's argument
parsing to delegate alias-knowledge to the registry instead of
embedding a hardcoded list:
- old version had a literal list of "atocore", "p04", "p05",
"p06" and their aliases that needed manual maintenance every
time a project was added
- new version takes the last token of $ARGUMENTS and asks the
client's `detect-project` subcommand whether it's a known
alias; if matched, it's the explicit hint, if not it's part
of the prompt
- this delegates registry knowledge to the registry, where it
belongs
Unrelated improvement noted but NOT fixed in this commit:
- _rank_chunks in builder.py also has a naive substring boost
that uses the original hint without alias expansion. The
retriever already does the right thing, so this secondary
boost is redundant. Tracked as a future cleanup but not in
scope for the P1/P2 fix; codex's findings are about
project_state lookup, not about the secondary chunk boost.
Full suite: 162 passing (was 160), 1 warning. The +2 is the two
new P1 regression tests.
Phase 9 Commit C. Closes the capture loop: Commit A records what
AtoCore fed the LLM and what came back, Commit B bumps confidence on
active memories the response actually references, and this commit
turns structured cues in the response into candidate memories for a
human review queue.
Nothing extracted here is ever automatically promoted into trusted
state. Every candidate sits at status="candidate" until a human (or
later, a confident automatic policy) calls /memory/{id}/promote or
/memory/{id}/reject. This keeps the "bad memory is worse than no
memory" invariant from the operating model intact.
New module: src/atocore/memory/extractor.py
- MemoryCandidate dataclass (type, content, rule, source_span,
project, confidence, source_interaction_id)
- extract_candidates_from_interaction(interaction): runs a fixed set
of regex rules over the response + response_summary and returns
a list of candidates
V0 rule set (deliberately narrow to keep false positives low):
- decision_heading ## Decision: / ## Decision - / ## Decision —
-> adaptation candidate
- constraint_heading ## Constraint: ... -> project candidate
- requirement_heading ## Requirement: ... -> project candidate
- fact_heading ## Fact: ... -> knowledge candidate
- preference_sentence "I prefer X" / "the user prefers X"
-> preference candidate
- decided_to_sentence "decided to X" -> adaptation candidate
- requirement_sentence "the requirement is X" -> project candidate
Extractor post-processing:
- clean_value: collapse whitespace, strip trailing punctuation
- min content length 8 chars, max 280 (keeps candidates reviewable)
- dedupe by (memory_type, normalized value, rule)
- drop candidates whose content already matches an active memory of
the same type+project so the queue doesn't ask humans to re-curate
things they already promoted
Memory service (extends Commit B candidate-status foundation):
- promote_memory(id): candidate -> active (404 if not a candidate)
- reject_candidate_memory(id): candidate -> invalid
- both are no-ops if the target isn't currently a candidate so the
API can surface 404 without the caller needing to pre-check
API endpoints (new):
- POST /interactions/{id}/extract run extractor, preview-only
body: {"persist": false} (default) returns candidates
{"persist": true} creates candidate memories
- POST /memory/{id}/promote candidate -> active
- POST /memory/{id}/reject candidate -> invalid
- GET /memory?status=candidate list review queue explicitly
(existing endpoint now accepts status= override)
- GET /memory now also returns reference_count and last_referenced_at
per memory so the Commit B reinforcement signal is visible to clients
Trust model unchanged:
- candidates NEVER appear in context packs (get_memories_for_context
still filters to active via the active_only default)
- candidates NEVER get reinforced by the Commit B loop (reinforcement
refuses non-active memories)
- trusted project state is untouched end-to-end
Tests (25 new, all green):
- heading pattern: decision, constraint, requirement, fact
- separator variants :, -, em-dash
- sentence patterns: preference, decided_to, requirement
- rejects too-short matches
- dedupes identical matches
- strips trailing punctuation
- carries project and source_interaction_id onto candidates
- drops candidates that duplicate an existing active memory
- returns empty for prose without structural cues
- candidate and active coexist in the memory table
- promote_memory moves candidate -> active
- promote on non-candidate returns False
- reject_candidate_memory moves candidate -> invalid
- reject on non-candidate returns False
- get_memories(status="candidate") returns just the queue
- POST /interactions/{id}/extract preview-only path
- POST /interactions/{id}/extract persist=true path
- POST /interactions/{id}/extract 404 for missing interaction
- POST /memory/{id}/promote success + 404 on non-candidate
- POST /memory/{id}/reject 404 on missing
- GET /memory?status=candidate surfaces the queue
- GET /memory?status=<invalid> returns 400
Full suite: 160 passing (was 135).
What Phase 9 looks like end to end after this commit
----------------------------------------------------
prompt
-> context pack assembled
-> LLM response
-> POST /interactions (capture)
-> automatic Commit B reinforcement (active memories only)
-> [optional] POST /interactions/{id}/extract
-> Commit C extractor proposes candidates
-> human reviews via GET /memory?status=candidate
-> POST /memory/{id}/promote (candidate -> active)
OR POST /memory/{id}/reject (candidate -> invalid)
Not in this commit (deferred on purpose):
- Decay of unused memories (we keep reference_count and
last_referenced_at so a later decay job has the signal it needs)
- LLM-based extractor as an alternative to the regex rules
- Automatic promotion of high-confidence candidates
- Candidate-to-entity upgrade path (needs the engineering layer
memory-vs-entities decision, planned in a coming architecture doc)
Phase 9 Commit B from the agreed plan. With Commit A capturing what
AtoCore fed to the LLM and what came back, this commit closes the
weakest part of the loop: when a memory is actually referenced in a
response, its confidence should drift up, and stale memories that
nobody ever mentions should stay where they are.
This is reinforcement only — nothing is promoted into trusted state
and no candidates are created. Extraction is Commit C.
Schema (additive migration):
- memories.last_referenced_at DATETIME (null by default)
- memories.reference_count INTEGER DEFAULT 0
- idx_memories_last_referenced on last_referenced_at
- memories.status now accepts the new "candidate" value so Commit C
has the status slot to land on. Existing active/superseded/invalid
rows are untouched.
New module: src/atocore/memory/reinforcement.py
- reinforce_from_interaction(interaction): scans the interaction's
response + response_summary for echoes of active memories and
bumps confidence / reference_count for each match
- matching is intentionally simple and explainable:
* normalize both sides (lowercase, collapse whitespace)
* require >= 12 chars of memory content to match
* compare the leading 80-char window of each memory
- the candidate pool is project-scoped memories for the interaction's
project + global identity + preference memories, deduplicated
- candidates and invalidated memories are NEVER reinforced; only
active memories move
Memory service changes:
- MEMORY_STATUSES = ["candidate", "active", "superseded", "invalid"]
- create_memory(status="candidate"|"active"|...) with per-status
duplicate scoping so a candidate and an active with identical text
can legitimately coexist during review
- get_memories(status=...) explicit override of the legacy active_only
flag; callers can now list the review queue cleanly
- update_memory accepts any valid status including "candidate"
- reinforce_memory(id, delta): low-level primitive that bumps
confidence (capped at 1.0), increments reference_count, and sets
last_referenced_at. Only active memories; returns (applied, old, new)
- promote_memory / reject_candidate_memory helpers prepping Commit C
Interactions service:
- record_interaction(reinforce=True) runs reinforce_from_interaction
automatically when the interaction has response content. reinforcement
errors are logged but never raised back to the caller so capture
itself is never blocked by a flaky downstream.
- circular import between interactions service and memory.reinforcement
avoided by lazy import inside the function
API:
- POST /interactions now accepts a reinforce bool field (default true)
- POST /interactions/{id}/reinforce runs reinforcement on an existing
captured interaction — useful for backfilling or for retrying after
a transient error in the automatic pass
- response lists which memory ids were reinforced with
old / new confidence for audit
Tests (17 new, all green):
- reinforce_memory bumps, caps at 1.0, accumulates reference_count
- reinforce_memory rejects candidates and missing ids
- reinforce_memory rejects negative delta
- reinforce_from_interaction matches active memory
- reinforce_from_interaction ignores candidates and inactive
- reinforce_from_interaction requires minimum content length
- reinforce_from_interaction handles empty response cleanly
- reinforce_from_interaction normalizes casing and whitespace
- reinforce_from_interaction deduplicates across memory buckets
- record_interaction auto-reinforces by default
- record_interaction reinforce=False skips the pass
- record_interaction handles empty response
- POST /interactions/{id}/reinforce runs against stored interaction
- POST /interactions/{id}/reinforce returns 404 for missing id
- POST /interactions accepts reinforce=false
Full suite: 135 passing (was 118).
Trust model unchanged:
- reinforcement only moves confidence within the existing active set
- the candidate lifecycle is declared but only Commit C will actually
create candidate memories
- trusted project state is never touched by reinforcement
Next: Commit C adds the rule-based extractor that produces candidate
memories from captured interactions plus the promote/reject review
queue endpoints.
Phase 9 Commit A from the agreed plan: turn AtoCore from a stateless
context enhancer into a system that records what it actually fed to an
LLM and what came back. This is the audit trail Reflection (Commit B)
and Extraction (Commit C) will be layered on top of.
The interactions table existed in the schema since the original PoC
but nothing wrote to it. This change makes it real:
Schema migration (additive only):
- response full LLM response (caller decides how much)
- memories_used JSON list of memory ids in the context pack
- chunks_used JSON list of chunk ids in the context pack
- client identifier of the calling system
(openclaw, claude-code, manual, ...)
- session_id groups multi-turn conversations
- project project name (mirrors the memory module pattern,
no FK so capture stays cheap)
- indexes on session_id, project, created_at
The created_at column is now written explicitly with a SQLite-compatible
'YYYY-MM-DD HH:MM:SS' format so the same string lives in the DB and the
returned dataclass. Without this the `since` filter on list_interactions
would silently fail because CURRENT_TIMESTAMP and isoformat use different
shapes that do not compare cleanly as strings.
New module src/atocore/interactions/:
- Interaction dataclass
- record_interaction() persists one round-trip (prompt required;
everything else optional). Refuses empty prompts.
- list_interactions() filters by project / session_id / client / since,
newest-first, hard-capped at 500
- get_interaction() fetch by id, full response + context pack
API endpoints:
- POST /interactions capture one interaction
- GET /interactions list with summaries (no full response)
- GET /interactions/{id} full record incl. response + pack
Trust model:
- Capture is read-only with respect to memories, project state, and
source chunks. Nothing here promotes anything into trusted state.
- The audit trail becomes the dataset Commit B (reinforcement) and
Commit C (extraction + review queue) will operate on.
Tests (13 new, all green):
- service: persist + roundtrip every field
- service: minimum-fields path (prompt only)
- service: empty / whitespace prompt rejected
- service: get by id returns None for missing
- service: filter by project, session, client
- service: ordering newest-first with limit
- service: since filter inclusive on cutoff (the bug the timestamp
fix above caught)
- service: limit=0 returns empty
- API: POST records and round-trips through GET /interactions/{id}
- API: empty prompt returns 400
- API: missing id returns 404
- API: list filter returns summaries (not full response bodies)
Full suite: 118 passing (was 105).
master-plan-status.md updated to move Phase 9 from "not started" to
"started" with the explicit note that Commit A is in and Commits B/C
remain.
Three small improvements that move the operational baseline forward
without changing the existing trust model.
1. Tunable retrieval ranking weights
- rank_project_match_boost, rank_query_token_step,
rank_query_token_cap, rank_path_high_signal_boost,
rank_path_low_signal_penalty are now Settings fields
- all overridable via ATOCORE_* env vars
- retriever no longer hard-codes 2.0 / 1.18 / 0.72 / 0.08 / 1.32
- lets ranking be tuned per environment as Wave 1 is exercised
without code changes
2. /projects/{name}/refresh status
- refresh_registered_project now returns an overall status field
("ingested", "partial", "nothing_to_ingest") plus roots_ingested
and roots_skipped counters
- ProjectRefreshResponse advertises the new fields so callers can
rely on them
- covers the case where every configured root is missing on disk
3. Chroma cold snapshot + admin backup endpoints
- create_runtime_backup now accepts include_chroma and writes a
cold directory copy of the chroma persistence path
- new list_runtime_backups() and validate_backup() helpers
- new endpoints:
- POST /admin/backup create snapshot (optional chroma)
- GET /admin/backup list snapshots
- GET /admin/backup/{stamp}/validate structural validation
- chroma snapshots are taken under exclusive_ingestion() so a refresh
or ingest cannot race with the cold copy
- backup metadata records what was actually included and how big
Tests:
- 8 new tests covering tunable weights, refresh status branches
(ingested / partial / nothing_to_ingest), chroma snapshot, list,
validate, and the API endpoints (including the lock-acquisition path)
- existing fake refresh stubs in test_api_storage.py updated for the
expanded ProjectRefreshResponse model
- full suite: 105 passing (was 97)
next-steps doc updated to reflect that the chroma snapshot + restore
validation gap from current-state.md is now closed in code; only the
operational retention policy remains.
Two changes that belong together:
1. builder.build_context() now passes project_hint into retrieve(),
so the project-aware boost actually fires for the retrieval pipeline
driven by /context/build. Before this, only direct /query callers
benefited from the registered-project boost.
2. retriever now applies two more ranking signals on every chunk:
- _query_match_boost: boosts chunks whose source/title/heading
echo high-signal query tokens (stop list filters out generic
words like "the", "project", "system")
- _path_signal_boost: down-weights archival noise (_archive,
_history, pre-cleanup, reviews) by 0.72 and up-weights current
high-signal docs (status, decision, requirements, charter,
system-map, error-budget, ...) by 1.18
Tests:
- test_context_builder_passes_project_hint_to_retrieval verifies
the wiring fix
- test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths
verifies the new ranking helpers prefer current docs over archive
This addresses the cross-project competition and archive bleed
called out in current-state.md after the Wave 1 ingestion.