feat: tunable ranking, refresh status, chroma backup + admin endpoints

Three small improvements that move the operational baseline forward
without changing the existing trust model.

1. Tunable retrieval ranking weights
   - rank_project_match_boost, rank_query_token_step,
     rank_query_token_cap, rank_path_high_signal_boost,
     rank_path_low_signal_penalty are now Settings fields
   - all overridable via ATOCORE_* env vars
   - retriever no longer hard-codes 2.0 / 1.18 / 0.72 / 0.08 / 1.32
   - lets ranking be tuned per environment as Wave 1 is exercised
     without code changes

2. /projects/{name}/refresh status
   - refresh_registered_project now returns an overall status field
     ("ingested", "partial", "nothing_to_ingest") plus roots_ingested
     and roots_skipped counters
   - ProjectRefreshResponse advertises the new fields so callers can
     rely on them
   - covers the case where every configured root is missing on disk

3. Chroma cold snapshot + admin backup endpoints
   - create_runtime_backup now accepts include_chroma and writes a
     cold directory copy of the chroma persistence path
   - new list_runtime_backups() and validate_backup() helpers
   - new endpoints:
     - POST /admin/backup                   create snapshot (optional chroma)
     - GET  /admin/backup                   list snapshots
     - GET  /admin/backup/{stamp}/validate  structural validation
   - chroma snapshots are taken under exclusive_ingestion() so a
     refresh or ingest cannot race with the cold copy
   - backup metadata records what was actually included and how big

Tests:
- 8 new tests covering tunable weights, refresh status branches
  (ingested / partial / nothing_to_ingest), chroma snapshot, list,
  validate, and the API endpoints (including the lock-acquisition path)
- existing fake refresh stubs in test_api_storage.py updated for the
  expanded ProjectRefreshResponse model
- full suite: 105 passing (was 97)

next-steps doc updated to reflect that the chroma snapshot + restore
validation gap from current-state.md is now closed in code; only the
operational retention policy remains.
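
For readers who want a feel for the refresh status branches in item 2, here is a rough sketch. It is not the code from this commit: the helper name summarize_refresh, the RefreshSummary type, and the exact rule that maps the counters to a status are assumptions made for illustration. Only the three status strings and the roots_ingested / roots_skipped counter names come from the message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshSummary:
    """Hypothetical stand-in for the fields the refresh response reports."""

    status: str
    roots_ingested: int
    roots_skipped: int


def summarize_refresh(roots_ingested: int, roots_skipped: int) -> RefreshSummary:
    """Derive an overall status from per-root counters (assumed rule)."""
    if roots_ingested == 0:
        # Includes the case where every configured root is missing on disk.
        status = "nothing_to_ingest"
    elif roots_skipped > 0:
        # Some roots were ingested, but at least one was skipped.
        status = "partial"
    else:
        status = "ingested"
    return RefreshSummary(status, roots_ingested, roots_skipped)
```

Under this assumed rule, a caller could branch on the status string alone and use the counters only for reporting.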
2026-04-06 18:42:19 -04:00
parent 14ab7c8e9f
commit c9b9eede25
10 changed files with 615 additions and 13 deletions
--- a/src/atocore/retrieval/retriever.py
+++ b/src/atocore/retrieval/retriever.py
@@ -173,7 +173,7 @@ def _project_match_boost(project_hint: str, metadata: dict) -> float:

    for candidate in candidate_names:
        if candidate and candidate in searchable:
-            return 2.0
+            return _config.settings.rank_project_match_boost

    return 1.0

@@ -198,7 +198,10 @@ def _query_match_boost(query: str, metadata: dict) -> float:
    matches = sum(1 for token in set(tokens) if token in searchable)
    if matches <= 0:
        return 1.0
-    return min(1.0 + matches * 0.08, 1.32)
+    return min(
+        1.0 + matches * _config.settings.rank_query_token_step,
+        _config.settings.rank_query_token_cap,
+    )


 def _path_signal_boost(metadata: dict) -> float:
@@ -213,9 +216,9 @@ def _path_signal_boost(metadata: dict) -> float:

    multiplier = 1.0
    if any(hint in searchable for hint in _LOW_SIGNAL_HINTS):
-        multiplier *= 0.72
+        multiplier *= _config.settings.rank_path_low_signal_penalty
    if any(hint in searchable for hint in _HIGH_SIGNAL_HINTS):
-        multiplier *= 1.18
+        multiplier *= _config.settings.rank_path_high_signal_boost
    return multiplier
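
The effect of the tunable weights in the hunks above can be tried in isolation with a standalone sketch. It assumes a plain dataclass in place of the project's Settings object, and the environment variable names (ATOCORE_ followed by the upper-cased field name) are a guess at the ATOCORE_* convention, not something this diff confirms. The RankSettings class and the from_env and query_match_boost helpers are hypothetical. The defaults mirror the previously hard-coded values.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RankSettings:
    """Hypothetical stand-in for the ranking fields on Settings."""

    rank_project_match_boost: float = 2.0
    rank_query_token_step: float = 0.08
    rank_query_token_cap: float = 1.32
    rank_path_high_signal_boost: float = 1.18
    rank_path_low_signal_penalty: float = 0.72

    @classmethod
    def from_env(cls, prefix: str = "ATOCORE_") -> "RankSettings":
        """Build settings, letting env vars override defaults (assumed naming)."""
        values = {}
        for field in fields(cls):
            raw = os.environ.get(prefix + field.name.upper())
            if raw is not None:
                values[field.name] = float(raw)
        return cls(**values)


def query_match_boost(matches: int, settings: RankSettings) -> float:
    """Same shape as the patched return: a per-token step with a cap."""
    if matches <= 0:
        return 1.0
    return min(
        1.0 + matches * settings.rank_query_token_step,
        settings.rank_query_token_cap,
    )
```

With the defaults, the cap is reached at four matching tokens; raising rank_query_token_cap through the environment lets additional matches keep contributing.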