fix: pass project_hint into retrieve and add path-signal ranking

Two changes that belong together:

1. builder.build_context() now passes project_hint into retrieve(),
   so the project-aware boost actually fires for the retrieval pipeline
   driven by /context/build. Before this, only direct /query callers
   benefited from the registered-project boost.

2. retriever now applies two more ranking signals on every chunk:
   - _query_match_boost: boosts chunks whose source/title/heading
     echo high-signal query tokens (stop list filters out generic
     words like "the", "project", "system")
   - _path_signal_boost: down-weights archival noise (_archive,
     _history, pre-cleanup, reviews) by 0.72 and up-weights current
     high-signal docs (status, decision, requirements, charter,
     system-map, error-budget, ...) by 1.18

Tests:
- test_context_builder_passes_project_hint_to_retrieval verifies
  the wiring fix
- test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths
  verifies the new ranking helpers prefer current docs over archive

This addresses the cross-project competition and archive bleed
called out in current-state.md after the Wave 1 ingestion.
This commit is contained in:
2026-04-06 18:37:07 -04:00
parent bdb42dba05
commit 14ab7c8e9f
4 changed files with 175 additions and 8 deletions

View File

@@ -104,7 +104,15 @@ def build_context(
retrieval_budget = budget - project_state_chars - memory_chars
# 4. Retrieve candidates
candidates = retrieve(user_prompt, top_k=_config.settings.context_top_k) if retrieval_budget > 0 else []
candidates = (
retrieve(
user_prompt,
top_k=_config.settings.context_top_k,
project_hint=project_hint,
)
if retrieval_budget > 0
else []
)
# 5. Score and rank
scored = _rank_chunks(candidates, project_hint)