fix(R7/R9): overlap-density ranking + project trust-preservation

R7: ranking scorer now uses overlap-density (overlap_count / memory_token_count) as primary key instead of raw overlap count. A 5-token memory with 3 overlapping tokens (density 0.6) now beats a 40-token overview memory with 3 overlapping tokens (density 0.075) at the same absolute count. Secondary: absolute overlap. Tertiary: confidence. Targeting p06-firmware-interface harness fixture.

R9: when the LLM extractor returns a project that differs from the interaction's known project, it now checks the project registry. If the model's project is a registered canonical ID, trust it. If not (hallucinated name), fall back to the interaction's project. Uses load_project_registry() for the check. The host-side script mirrors this via an API call to GET /projects at startup.

Two new tests: test_parser_keeps_registered_model_project and test_parser_rejects_hallucinated_project. Test count: 280 -> 281.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:34:33 -04:00
parent 1a2ee5e07f
commit 8951c624fe
4 changed files with 78 additions and 11 deletions
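
The R7 ordering described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the atocore implementation: `Mem`, `tokenize`, `score`, and `rank` are hypothetical stand-ins, and the whitespace tokenizer is an assumption (the real `_normalize` / `_tokenize` helpers may behave differently). It builds the 5-token and 40-token memories from the commit message and checks that density decides the order even though the overview has higher confidence.

```python
from dataclasses import dataclass


@dataclass
class Mem:
    # Hypothetical stand-in for atocore's Memory; only the fields used here.
    content: str
    confidence: float


def tokenize(text: str) -> set[str]:
    # Assumption: lowercase whitespace split into a set of unique tokens.
    return set(text.lower().split())


def score(mem: Mem, query_tokens: set[str]) -> tuple[float, int, float]:
    # Sort key: (overlap density, absolute overlap, confidence).
    tokens = tokenize(mem.content)
    overlap = len(tokens & query_tokens)
    density = overlap / len(tokens) if tokens else 0.0
    return (density, overlap, mem.confidence)


def rank(memories: list[Mem], query_tokens: set[str]) -> list[Mem]:
    # Highest density first; ties fall through to overlap, then confidence.
    return sorted(memories, key=lambda m: score(m, query_tokens), reverse=True)


query = {"firmware", "interface", "protocol"}

# 5 unique tokens, 3 of them in the query: density 3/5 = 0.6.
short = Mem("firmware interface protocol uses spi", confidence=0.5)

# 40 unique tokens, 3 of them in the query: density 3/40 = 0.075.
filler = " ".join(f"w{i}" for i in range(37))
overview = Mem(f"firmware interface protocol {filler}", confidence=0.9)

ranked = rank([overview, short], query)
print([round(score(m, query)[0], 3) for m in ranked])
```

Under the previous key (overlap count, then confidence) both memories tie at 3 and the overview wins on its 0.9 confidence; with density as the primary key the short memory comes first.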
--- a/src/atocore/memory/extractor_llm.py
+++ b/src/atocore/memory/extractor_llm.py
@@ -257,6 +257,27 @@ def _parse_candidates(raw_output: str, interaction: Interaction) -> list[MemoryC
        project = str(item.get("project") or "").strip()
        if not project and interaction.project:
            project = interaction.project
+        elif project and interaction.project and project != interaction.project:
+            # R9: model returned a different project than the interaction's
+            # known scope. Trust the model's project only if it resolves
+            # to a known registered project (the registry normalizes
+            # aliases and returns the canonical id). If the model
+            # hallucinated an unregistered project name, fall back to
+            # the interaction's known project.
+            try:
+                from atocore.projects.registry import (
+                    load_project_registry,
+                    resolve_project_name,
+                )
+
+                registered_ids = {p.project_id for p in load_project_registry()}
+                resolved = resolve_project_name(project)
+                if resolved not in registered_ids:
+                    project = interaction.project
+                else:
+                    project = resolved
+            except Exception:
+                project = interaction.project
        confidence_raw = item.get("confidence", 0.5)
        if mem_type not in MEMORY_TYPES:
            continue
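
The R9 decision in the hunk above can be exercised without the real registry. This is a sketch under stated assumptions: `choose_project`, the alias table, and the project IDs `alpha` and `beta` are all hypothetical, and `resolve` stands in for `resolve_project_name`, which is assumed here to map an alias to its canonical ID and return unknown names unchanged.

```python
from typing import Callable, Iterable


def choose_project(
    model_project: str,
    interaction_project: str,
    registered_ids: Iterable[str],
    resolve: Callable[[str], str],
) -> str:
    """Pick the project for a candidate, mirroring the R9 rule."""
    model_project = (model_project or "").strip()
    if not model_project:
        # Model gave no project: inherit the interaction's scope.
        return interaction_project
    if not interaction_project or model_project == interaction_project:
        # Nothing to reconcile.
        return model_project
    try:
        resolved = resolve(model_project)
    except Exception:
        # Any resolution failure falls back to the known scope.
        return interaction_project
    # Trust the model only if the name resolves to a registered ID.
    return resolved if resolved in set(registered_ids) else interaction_project


# Hypothetical registry contents and alias table, for illustration only.
registered = {"alpha", "beta"}
aliases = {"beta-alias": "beta"}


def resolve(name: str) -> str:
    return aliases.get(name, name)


kept = choose_project("beta-alias", "alpha", registered, resolve)
rejected = choose_project("made-up-project", "alpha", registered, resolve)
inherited = choose_project("", "alpha", registered, resolve)
print(kept, rejected, inherited)
```

The first call corresponds to test_parser_keeps_registered_model_project (the alias resolves to a registered ID and is kept in canonical form), the second to test_parser_rejects_hallucinated_project (the unregistered name is replaced by the interaction's project).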
--- a/src/atocore/memory/service.py
+++ b/src/atocore/memory/service.py
@@ -446,20 +446,27 @@ def _rank_memories_for_query(
 ) -> list["Memory"]:
    """Rerank a memory list by lexical overlap with a pre-tokenized query.

-    Ordering key: (overlap_count DESC, confidence DESC). When a query
-    shares no tokens with a memory, overlap is zero and confidence
-    acts as the sole tiebreaker — which matches the pre-query
-    behaviour and keeps no-query calls stable.
+    Primary key: overlap_density (overlap_count / memory_token_count),
+    which rewards short focused memories that match the query precisely
+    over long overview memories that incidentally share a few tokens.
+    Secondary: absolute overlap count. Tertiary: confidence.
+
+    R7 fix: previously overlap_count alone was the primary key, so a
+    40-token overview memory with 3 overlapping tokens tied a 5-token
+    memory with 3 overlapping tokens, and the overview won on
+    confidence. Now the short memory's density (0.6) beats the
+    overview's density (0.075).
    """
    from atocore.memory.reinforcement import _normalize, _tokenize

-    scored: list[tuple[int, float, Memory]] = []
+    scored: list[tuple[float, int, float, Memory]] = []
    for mem in memories:
        mem_tokens = _tokenize(_normalize(mem.content))
        overlap = len(mem_tokens & query_tokens) if mem_tokens else 0
-        scored.append((overlap, mem.confidence, mem))
-    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
-    return [mem for _, _, mem in scored]
+        density = overlap / len(mem_tokens) if mem_tokens else 0.0
+        scored.append((density, overlap, mem.confidence, mem))
+    scored.sort(key=lambda t: (t[0], t[1], t[2]), reverse=True)
+    return [mem for _, _, _, mem in scored]


 def _row_to_memory(row) -> Memory: