feat: Phase 3 V1 — Auto-Organization (domain_tags + valid_until)

Adds structural metadata that the LLM triage was already implicitly
reasoning about ("stale snapshot" → reject). Phase 3 captures that
reasoning as fields so it can DRIVE retrieval, not just rejection.

Schema (src/atocore/models/database.py):
- domain_tags TEXT DEFAULT '[]'   JSON array of lowercase topic keywords
- valid_until DATETIME            ISO date; null = permanent
- idx_memories_valid_until        index for efficient expiry queries

Memory service (src/atocore/memory/service.py):
- Memory dataclass gains domain_tags + valid_until
- create_memory, update_memory accept/persist both
- _row_to_memory safely reads both (JSON-decode + null handling)
- _normalize_tags helper: lowercase, dedup, strip (the cap at 10 is
  applied in normalize_candidate_item and parse_verdict)
- get_memories_for_context filters expired (valid_until < today UTC)
- _rank_memories_for_query adds tag-boost: memories whose domain_tags
  appear as substrings in query text rank higher (tertiary key after
  content-overlap density + absolute overlap, before confidence)

LLM extractor (_llm_prompt.py → llm-0.5.0):
- SYSTEM_PROMPT documents domain_tags (2-5 keywords) + valid_until
  (time-bounded facts get expiry dates; durable facts stay null)
- normalize_candidate_item parses both fields from model output with
  graceful fallback for string/null/missing

LLM triage (scripts/auto_triage.py):
- TRIAGE_SYSTEM_PROMPT documents same two fields
- parse_verdict extracts them from verdict JSON
- On promote: PUT /memory/{id} with tags + valid_until BEFORE
  POST /memory/{id}/promote, so active memories carry them

API (src/atocore/api/routes.py):
- MemoryCreateRequest: adds domain_tags, valid_until
- MemoryUpdateRequest: adds domain_tags, valid_until, memory_type
- GET /memory response exposes domain_tags + valid_until + created_at

Triage UI (src/atocore/engineering/triage_ui.py):
- Renders existing tags as colored badges
- Adds inline text field for tags (comma-separated) + date picker for
  valid_until on every candidate card
- Save&Promote button persists edits via PUT then promotes
- Plain Promote (and Y shortcut) also saves tags/expiry if edited

Wiki (src/atocore/engineering/wiki.py):
- Search now matches memory content OR domain_tags
- Search results render tags as clickable badges linking to
  /wiki/search?q=<tag> for cross-project navigation
- valid_until shown as amber "valid until YYYY-MM-DD" hint

Tests: 303 → 308 (5 new for Phase 3 behavior):
- test_create_memory_with_tags_and_valid_until
- test_create_memory_normalizes_tags
- test_update_memory_sets_tags_and_valid_until
- test_get_memories_for_context_excludes_expired
- test_context_builder_tag_boost_orders_results

Deferred (explicitly): temporal_scope enum, source_refs memory graph,
HDBSCAN clustering, memory detail wiki page, backfill of existing
actives. See docs/MASTER-BRAIN-PLAN.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 21:37:01 -04:00
parent 271ee25d99
commit bfa7dba4de
9 changed files with 444 additions and 36 deletions
--- a/scripts/auto_triage.py
+++ b/scripts/auto_triage.py
@@ -48,7 +48,25 @@ You will receive:
 For each candidate, output exactly one JSON object:
-{"verdict": "promote|reject|needs_human|contradicts", "confidence": 0.0-1.0, "reason": "one sentence", "conflicts_with": "id of existing memory if contradicts"}
+{"verdict": "promote|reject|needs_human|contradicts", "confidence": 0.0-1.0, "reason": "one sentence", "conflicts_with": "id of existing memory if contradicts", "domain_tags": ["tag1","tag2"], "valid_until": null}
 DOMAIN TAGS (Phase 3): A lowercase list of 2-5 topical keywords describing
 the SUBJECT matter (not the project). This enables cross-project retrieval:
 a query about "optics" can pull matches from p04 + p05 + p06.
 Good tags are single lowercase words or hyphenated terms. Mix:
 - domain keywords (optics, thermal, firmware, materials, controls)
 - project tokens when clearly scoped (p04, p05, p06, abb)
 - lifecycle/activity words (procurement, design, validation, vendor)
 Always emit domain_tags on a promote. For reject, empty list is fine.
 VALID_UNTIL (Phase 3): ISO date "YYYY-MM-DD" OR null (permanent).
 Set to a near-future date when the candidate is time-bounded:
 - Status snapshots ("current blocker is X") → ~2 weeks out
 - Scheduled events ("meeting Friday") → event date
 - Quotes with expiry → quote expiry date
 Leave null for durable decisions, engineering insights, ratified requirements.
 Rules:
@@ -196,11 +214,36 @@ def parse_verdict(raw):
    reason = str(parsed.get("reason", "")).strip()[:200]
    conflicts_with = str(parsed.get("conflicts_with", "")).strip()
    # Phase 3: domain tags + expiry
    raw_tags = parsed.get("domain_tags") or []
    if isinstance(raw_tags, str):
        raw_tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
    if not isinstance(raw_tags, list):
        raw_tags = []
    domain_tags = []
    for t in raw_tags[:10]:
        if not isinstance(t, str):
            continue
        tag = t.strip().lower()
        if tag and tag not in domain_tags:
            domain_tags.append(tag)
    valid_until = parsed.get("valid_until")
    if valid_until is None:
        valid_until = ""
    else:
        valid_until = str(valid_until).strip()
        if valid_until.lower() in ("", "null", "none", "permanent"):
            valid_until = ""
    return {
        "verdict": verdict,
        "confidence": confidence,
        "reason": reason,
        "conflicts_with": conflicts_with,
        "domain_tags": domain_tags,
        "valid_until": valid_until,
    }
@@ -259,6 +302,24 @@ def main():
                if args.dry_run:
                    print(f"  WOULD PROMOTE  {label}  conf={conf:.2f}  {reason}")
                else:
                    # Phase 3: update tags + valid_until on the candidate
                    # before promoting, so the active memory carries them.
                    tags = verdict_obj.get("domain_tags") or []
                    valid_until = verdict_obj.get("valid_until") or ""
                    if tags or valid_until:
                        try:
                            import urllib.request as _ur
                            body = json.dumps({
                                "domain_tags": tags,
                                "valid_until": valid_until,
                            }).encode("utf-8")
                            req = _ur.Request(
                                f"{args.base_url}/memory/{mid}", method="PUT",
                                headers={"Content-Type": "application/json"}, data=body,
                            )
                            _ur.urlopen(req, timeout=10).read()
                        except Exception:
                            pass  # non-fatal; promote anyway
                    try:
                        api_post(args.base_url, f"/memory/{mid}/promote")
                        print(f"  PROMOTED       {label}  conf={conf:.2f}  {reason}")
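The tag and expiry handling that parse_verdict gains in this file can be tried in isolation. The sketch below copies that logic into a standalone function; the name normalize_verdict_meta and the sample verdict string are illustrative assumptions, not part of the commit.

```python
import json


def normalize_verdict_meta(parsed: dict) -> dict:
    """Standalone copy of the tag/expiry handling shown in parse_verdict.

    The function name is hypothetical; only the logic mirrors the diff.
    """
    raw_tags = parsed.get("domain_tags") or []
    if isinstance(raw_tags, str):
        # Tolerate a comma-separated string instead of a JSON list.
        raw_tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
    if not isinstance(raw_tags, list):
        raw_tags = []

    domain_tags: list[str] = []
    for t in raw_tags[:10]:  # only the first 10 raw entries are considered
        if not isinstance(t, str):
            continue
        tag = t.strip().lower()
        if tag and tag not in domain_tags:
            domain_tags.append(tag)

    valid_until = parsed.get("valid_until")
    if valid_until is None:
        valid_until = ""
    else:
        valid_until = str(valid_until).strip()
        # Treat textual "no expiry" spellings as permanent.
        if valid_until.lower() in ("", "null", "none", "permanent"):
            valid_until = ""
    return {"domain_tags": domain_tags, "valid_until": valid_until}


# Sample model output (made up for illustration).
raw = '{"verdict": "promote", "domain_tags": "Optics, thermal, optics", "valid_until": "null"}'
meta = normalize_verdict_meta(json.loads(raw))
print(meta)
```

Note that the format of valid_until is not validated here, which matches the diff: any non-empty string other than the "no expiry" spellings is passed through unchanged.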
--- a/scripts/batch_llm_extract_live.py
+++ b/scripts/batch_llm_extract_live.py
@@ -176,6 +176,8 @@ def parse_candidates(raw, interaction_project):
            "content": normalized["content"],
            "project": project,
            "confidence": normalized["confidence"],
            "domain_tags": normalized.get("domain_tags") or [],
            "valid_until": normalized.get("valid_until") or "",
        })
    return results
@@ -250,6 +252,8 @@ def main():
                    "project": c["project"],
                    "confidence": c["confidence"],
                    "status": "candidate",
                    "domain_tags": c.get("domain_tags") or [],
                    "valid_until": c.get("valid_until") or "",
                })
                total_persisted += 1
            except urllib.error.HTTPError as exc:
--- a/src/atocore/api/routes.py
+++ b/src/atocore/api/routes.py
@@ -254,12 +254,17 @@ class MemoryCreateRequest(BaseModel):
    project: str = ""
    confidence: float = 1.0
    status: str = "active"
    domain_tags: list[str] | None = None
    valid_until: str = ""
 class MemoryUpdateRequest(BaseModel):
    content: str | None = None
    confidence: float | None = None
    status: str | None = None
    memory_type: str | None = None
    domain_tags: list[str] | None = None
    valid_until: str | None = None
 class ProjectStateSetRequest(BaseModel):
@@ -458,6 +463,8 @@ def api_create_memory(req: MemoryCreateRequest) -> dict:
            project=req.project,
            confidence=req.confidence,
            status=req.status,
            domain_tags=req.domain_tags or [],
            valid_until=req.valid_until or "",
        )
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
@@ -502,6 +509,9 @@ def api_get_memories(
                "reference_count": m.reference_count,
                "last_referenced_at": m.last_referenced_at,
                "updated_at": m.updated_at,
                "created_at": m.created_at,
                "domain_tags": m.domain_tags or [],
                "valid_until": m.valid_until or "",
            }
            for m in memories
        ],
@@ -519,6 +529,9 @@ def api_update_memory(memory_id: str, req: MemoryUpdateRequest) -> dict:
            content=req.content,
            confidence=req.confidence,
            status=req.status,
            memory_type=req.memory_type,
            domain_tags=req.domain_tags,
            valid_until=req.valid_until,
        )
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
--- a/src/atocore/engineering/triage_ui.py
+++ b/src/atocore/engineering/triage_ui.py
@@ -35,12 +35,24 @@ def _render_candidate_card(cand) -> str:
    confidence = f"{cand.confidence:.2f}"
    refs = cand.reference_count or 0
    created = _escape(str(cand.created_at or ""))
    tags = cand.domain_tags or []
    tags_str = _escape(", ".join(tags))
    valid_until = _escape(cand.valid_until or "")
    # Strip time portion for HTML date input
    valid_until_date = valid_until[:10] if valid_until else ""
    type_options = "".join(
        f'<option value="{t}"{" selected" if t == cand.memory_type else ""}>{t}</option>'
        for t in VALID_TYPES
    )
    # Tag badges rendered from current tags
    badges_html = ""
    if tags:
        badges_html = '<div class="cand-tags-display">' + "".join(
            f'<span class="tag-badge">{_escape(t)}</span>' for t in tags
        ) + '</div>'
    return f"""
 <div class="cand" id="cand-{mid}" data-id="{mid}">
  <div class="cand-head">
@@ -51,10 +63,21 @@ def _render_candidate_card(cand) -> str:
  <div class="cand-body">
    <textarea class="cand-content" id="content-{mid}">{content}</textarea>
  </div>
  {badges_html}
  <div class="cand-meta-row">
    <label class="cand-field-label">Tags:
      <input type="text" class="cand-tags-input" id="tags-{mid}"
             value="{tags_str}" placeholder="optics, thermal, p04" />
    </label>
    <label class="cand-field-label">Valid until:
      <input type="date" class="cand-valid-until" id="valid-until-{mid}"
             value="{valid_until_date}" />
    </label>
  </div>
  <div class="cand-actions">
    <button class="btn-promote" data-id="{mid}" title="Promote (Y)">✅ Promote</button>
    <button class="btn-reject" data-id="{mid}" title="Reject (N)">❌ Reject</button>
-    <button class="btn-save-promote" data-id="{mid}" title="Edit + Promote (E)">✏️ Save&Promote</button>
+    <button class="btn-save-promote" data-id="{mid}" title="Save edits + promote (E)">✏️ Save&Promote</button>
    <label class="cand-type-label">Type:
      <select class="cand-type-select" id="type-{mid}">{type_options}</select>
    </label>
@@ -162,24 +185,47 @@ async function reject(id) {
  else setStatus(id, '❌ Failed: ' + r.status, false);
 }
 function parseTags(str) {
  return (str || '').split(/[,;]/).map(s => s.trim().toLowerCase()).filter(Boolean);
 }
 async function savePromote(id) {
  const content = document.getElementById('content-' + id).value.trim();
  const mtype = document.getElementById('type-' + id).value;
  const tagsStr = document.getElementById('tags-' + id)?.value || '';
  const validUntil = document.getElementById('valid-until-' + id)?.value || '';
  if (!content) { setStatus(id, 'Content is empty', false); return; }
  setStatus(id, 'Saving…', true);
-  const r1 = await apiCall('/memory/' + encodeURIComponent(id), 'PUT', {
-    content: content, memory_type: mtype
-  });
+  const body = {
+    content: content,
+    memory_type: mtype,
+    domain_tags: parseTags(tagsStr),
+    valid_until: validUntil,
+  };
+  const r1 = await apiCall('/memory/' + encodeURIComponent(id), 'PUT', body);
  if (!r1.ok) { setStatus(id, '❌ Save failed: ' + r1.status, false); return; }
  const r2 = await apiCall('/memory/' + encodeURIComponent(id) + '/promote', 'POST');
  if (r2.ok) { setStatus(id, '✅ Saved & Promoted', true); removeCard(id); }
  else setStatus(id, '❌ Promote failed: ' + r2.status, false);
 }
 // Also save tag/expiry edits when plain "Promote" is clicked if fields changed
 async function promoteWithMeta(id) {
  const tagsStr = document.getElementById('tags-' + id)?.value || '';
  const validUntil = document.getElementById('valid-until-' + id)?.value || '';
  if (tagsStr.trim() || validUntil) {
    await apiCall('/memory/' + encodeURIComponent(id), 'PUT', {
      domain_tags: parseTags(tagsStr),
      valid_until: validUntil,
    });
  }
  return promote(id);
 }
 document.addEventListener('click', (e) => {
  const id = e.target.dataset?.id;
  if (!id) return;
-  if (e.target.classList.contains('btn-promote')) promote(id);
+  if (e.target.classList.contains('btn-promote')) promoteWithMeta(id);
  else if (e.target.classList.contains('btn-reject')) reject(id);
  else if (e.target.classList.contains('btn-save-promote')) savePromote(id);
 });
@@ -192,7 +238,7 @@ document.addEventListener('keydown', (e) => {
  const first = document.querySelector('.cand');
  if (!first) return;
  const id = first.dataset.id;
-  if (e.key === 'y' || e.key === 'Y') { e.preventDefault(); promote(id); }
+  if (e.key === 'y' || e.key === 'Y') { e.preventDefault(); promoteWithMeta(id); }
  else if (e.key === 'n' || e.key === 'N') { e.preventDefault(); reject(id); }
  else if (e.key === 'e' || e.key === 'E') {
    e.preventDefault();
@@ -236,6 +282,13 @@ _TRIAGE_CSS = """
  .auto-triage-msg { flex:1; min-width:200px; font-size:0.85rem; opacity:0.75; }
  .auto-triage-msg.ok { color:var(--accent); opacity:1; font-weight:500; }
  .auto-triage-msg.err { color:#dc2626; opacity:1; font-weight:500; }
  .cand-tags-display { margin-top:0.5rem; display:flex; gap:0.35rem; flex-wrap:wrap; }
  .tag-badge { background:var(--accent); color:white; padding:0.15rem 0.55rem; border-radius:10px; font-size:0.72rem; font-family:monospace; font-weight:500; }
  .cand-meta-row { display:flex; gap:0.8rem; margin-top:0.6rem; align-items:center; flex-wrap:wrap; }
  .cand-field-label { display:flex; gap:0.3rem; align-items:center; font-size:0.85rem; opacity:0.75; }
  .cand-tags-input { flex:1; min-width:200px; padding:0.3rem 0.5rem; background:var(--bg); color:var(--text); border:1px solid var(--border); border-radius:3px; font-family:monospace; font-size:0.85rem; }
  .cand-tags-input:focus { outline:none; border-color:var(--accent); }
  .cand-valid-until { padding:0.3rem; background:var(--bg); color:var(--text); border:1px solid var(--border); border-radius:3px; font-family:inherit; font-size:0.85rem; }
 </style>
 """
--- a/src/atocore/engineering/wiki.py
+++ b/src/atocore/engineering/wiki.py
@@ -211,15 +211,32 @@ def render_search(query: str) -> str:
            )
        lines.append('</ul>')
-    # Search memories
+    # Search memories — match on content OR domain_tags (Phase 3)
    all_memories = get_memories(active_only=True, limit=200)
    query_lower = query.lower()
-    matching_mems = [m for m in all_memories if query_lower in m.content.lower()][:10]
+    matching_mems = [
        m for m in all_memories
        if query_lower in m.content.lower()
        or any(query_lower in (t or "").lower() for t in (m.domain_tags or []))
    ][:20]
    if matching_mems:
        lines.append(f'<h2>Memories ({len(matching_mems)})</h2><ul>')
        for m in matching_mems:
            proj = f' <span class="tag">{m.project}</span>' if m.project else ''
-            lines.append(f'<li>[{m.memory_type}]{proj} {m.content[:200]}</li>')
+            tags_html = ""
            if m.domain_tags:
                tag_links = " ".join(
                    f'<a href="/wiki/search?q={t}" class="tag-badge">{t}</a>'
                    for t in m.domain_tags[:5]
                )
                tags_html = f' <span class="mem-tags">{tag_links}</span>'
            expiry_html = ""
            if m.valid_until:
                expiry_html = f' <span class="mem-expiry">valid until {m.valid_until[:10]}</span>'
            lines.append(
                f'<li>[{m.memory_type}]{proj}{tags_html}{expiry_html} '
                f'{m.content[:200]}</li>'
            )
        lines.append('</ul>')
    if not entities and not matching_mems:
@@ -302,6 +319,11 @@ _TEMPLATE = """<!DOCTYPE html>
  .triage-notice { background: var(--card); border-left: 4px solid var(--accent); padding: 0.6rem 1rem; border-radius: 4px; margin: 0.8rem 0; }
  .triage-warning { background: #fef3c7; color: #78350f; border-left: 4px solid #d97706; padding: 0.6rem 1rem; border-radius: 4px; margin: 0.8rem 0; }
  @media (prefers-color-scheme: dark) { .triage-warning { background: #451a03; color: #fde68a; } }
  .mem-tags { display: inline-flex; gap: 0.25rem; flex-wrap: wrap; vertical-align: middle; }
  .tag-badge { background: var(--accent); color: white; padding: 0.1rem 0.5rem; border-radius: 10px; font-size: 0.7rem; font-family: monospace; text-decoration: none; font-weight: 500; }
  .tag-badge:hover { opacity: 0.85; text-decoration: none; }
  .mem-expiry { font-size: 0.75rem; color: #d97706; font-style: italic; margin-left: 0.4rem; }
  @media (prefers-color-scheme: dark) { .mem-expiry { color: #fbbf24; } }
 </style>
 </head>
 <body>
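The search-result badges in this file interpolate each tag directly into both the link target and the link text. Tags come from model output or a free-text field, so a reader adapting this code may want to escape them. A minimal sketch of one way to do that, assuming a hypothetical helper named render_tag_badges that is not part of the commit:

```python
from html import escape
from urllib.parse import quote


def render_tag_badges(tags: list[str], limit: int = 5) -> str:
    """Render tags as search links, escaping both the URL and the label.

    Hypothetical helper: the commit builds the same markup inline
    without escaping.
    """
    links = []
    for tag in tags[:limit]:
        # Percent-encode the tag for the query string, then HTML-escape
        # the attribute value and the visible label separately.
        href = "/wiki/search?q=" + quote(tag, safe="")
        links.append(
            f'<a href="{escape(href)}" class="tag-badge">{escape(tag)}</a>'
        )
    return " ".join(links)


print(render_tag_badges(["optics", "a&b"]))
```

For ordinary lowercase tags the output is identical to what the diff produces; the difference only shows up for tags containing characters such as `&`, `<`, or `"`.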
--- a/src/atocore/memory/_llm_prompt.py
+++ b/src/atocore/memory/_llm_prompt.py
@@ -21,7 +21,7 @@ from __future__ import annotations
 import json
 from typing import Any
-LLM_EXTRACTOR_VERSION = "llm-0.4.0"
+LLM_EXTRACTOR_VERSION = "llm-0.5.0"
 MAX_RESPONSE_CHARS = 8000
 MAX_PROMPT_CHARS = 2000
 MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}
@@ -84,6 +84,36 @@ DOMAINS for knowledge candidates (required when type=knowledge and project is em
 physics, materials, optics, mechanics, manufacturing, metrology,
 controls, software, math, finance, business
 DOMAIN TAGS (Phase 3):
 Every candidate gets domain_tags — a lowercase list of topical keywords
 that describe the SUBJECT matter regardless of project. This is how
 cross-project retrieval works: a query about "optics" surfaces matches
 from p04 + p05 + p06 without naming each project.
 Good tags: single lowercase words or hyphenated terms.
 Examples:
  - "ABB quote received for P04" → ["abb", "p04", "procurement", "optics"]
  - "USB SSD mandatory on polisher" → ["p06", "firmware", "storage"]
  - "CTE dominates WFE at F/1.2" → ["optics", "materials", "thermal"]
  - "Antoine prefers OAuth over API keys" → ["security", "auth", "preference"]
 Tag 2-5 items. Use domain keywords (optics, thermal, firmware), project
 tokens when relevant (p04, abb), and lifecycle words (procurement, design,
 validation) as appropriate.
 VALID_UNTIL (Phase 3):
 A memory can have an expiry date if it describes time-bounded truth.
 Use valid_until for:
  - Status snapshots: "current blocker is X" → valid_until = ~2 weeks out
  - Scheduled events: "meeting with vendor Friday" → valid_until = meeting date
  - Quotes with expiry: "quote valid until May 31"
  - Interim decisions pending ratification
 Leave empty (null) for:
  - Durable design decisions ("Option B selected")
  - Engineering insights ("CTE dominates at F/1.2")
  - Ratified requirements, architectural commitments
 Default = null (permanent). Format: ISO date "YYYY-MM-DD" or empty.
 TRUST HIERARCHY:
 - project-specific: set project to the project id, leave domain empty
@@ -99,7 +129,7 @@ OUTPUT RULES:
 - Empty array [] is fine when the conversation has no durable signal
 Each element:
-{"type": "project|knowledge|preference|adaptation|episodic", "content": "...", "project": "...", "domain": "", "confidence": 0.5}"""
+{"type": "project|knowledge|preference|adaptation|episodic", "content": "...", "project": "...", "domain": "", "confidence": 0.5, "domain_tags": ["tag1","tag2"], "valid_until": null}"""
 def build_user_message(prompt: str, response: str, project_hint: str) -> str:
@@ -174,10 +204,36 @@ def normalize_candidate_item(item: dict[str, Any]) -> dict[str, Any] | None:
    if domain and not model_project:
        content = f"[{domain}] {content}"
    # Phase 3: domain_tags + valid_until
    raw_tags = item.get("domain_tags") or []
    if isinstance(raw_tags, str):
        # Tolerate comma-separated string fallback
        raw_tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
    if not isinstance(raw_tags, list):
        raw_tags = []
    domain_tags = []
    for t in raw_tags[:10]:  # cap at 10
        if not isinstance(t, str):
            continue
        tag = t.strip().lower()
        if tag and tag not in domain_tags:
            domain_tags.append(tag)
    valid_until = item.get("valid_until")
    if valid_until is not None:
        valid_until = str(valid_until).strip()
        # Accept ISO date "YYYY-MM-DD" or full timestamp; empty/"null" → none
        if valid_until.lower() in ("", "null", "none", "permanent"):
            valid_until = ""
    else:
        valid_until = ""
    return {
        "type": mem_type,
        "content": content[:1000],
        "project": model_project,
        "domain": domain,
        "confidence": confidence,
        "domain_tags": domain_tags,
        "valid_until": valid_until,
    }
--- a/src/atocore/memory/service.py
+++ b/src/atocore/memory/service.py
@@ -63,6 +63,30 @@ class Memory:
    updated_at: str
    last_referenced_at: str = ""
    reference_count: int = 0
    domain_tags: list[str] | None = None
    valid_until: str = ""  # ISO UTC; empty = permanent
 def _normalize_tags(tags) -> list[str]:
    """Coerce a tags value (list, JSON string, None) to a clean lowercase list."""
    import json as _json
    if tags is None:
        return []
    if isinstance(tags, str):
        try:
            tags = _json.loads(tags) if tags.strip().startswith("[") else []
        except Exception:
            tags = []
    if not isinstance(tags, list):
        return []
    out = []
    for t in tags:
        if not isinstance(t, str):
            continue
        t = t.strip().lower()
        if t and t not in out:
            out.append(t)
    return out
 def create_memory(
@@ -72,34 +96,36 @@ def create_memory(
    source_chunk_id: str = "",
    confidence: float = 1.0,
    status: str = "active",
    domain_tags: list[str] | None = None,
    valid_until: str = "",
 ) -> Memory:
    """Create a new memory entry.
    ``status`` defaults to ``active`` for backward compatibility. Pass
    ``candidate`` when the memory is being proposed by the Phase 9 Commit C
    extractor and still needs human review before it can influence context.
    Phase 3: ``domain_tags`` is a list of lowercase domain strings
    (optics, mechanics, firmware, ...) for cross-project retrieval.
    ``valid_until`` is an ISO UTC timestamp; memories with valid_until
    in the past are excluded from context packs (but remain queryable).
    """
    import json as _json
    if memory_type not in MEMORY_TYPES:
        raise ValueError(f"Invalid memory type '{memory_type}'. Must be one of: {MEMORY_TYPES}")
    if status not in MEMORY_STATUSES:
        raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
    _validate_confidence(confidence)
    # Canonicalize the project through the registry so an alias and
    # the canonical id store under the same bucket. This keeps
    # reinforcement queries (which use the interaction's project) and
    # context retrieval (which uses the registry-canonicalized hint)
    # consistent with how memories are created.
    project = resolve_project_name(project)
    tags = _normalize_tags(domain_tags)
    tags_json = _json.dumps(tags)
    valid_until = (valid_until or "").strip() or None
    memory_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()
    # Check for duplicate content within the same type+project at the same status.
    # Scoping by status keeps active curation separate from the candidate
    # review queue: a candidate and an active memory with identical text can
    # legitimately coexist if the candidate is a fresh extraction of something
    # already curated.
    with get_connection() as conn:
        existing = conn.execute(
            "SELECT id FROM memories "
@@ -118,9 +144,11 @@ def create_memory(
            )
        conn.execute(
-            "INSERT INTO memories (id, memory_type, content, project, source_chunk_id, confidence, status) "
-            "VALUES (?, ?, ?, ?, ?, ?, ?)",
-            (memory_id, memory_type, content, project, source_chunk_id or None, confidence, status),
+            "INSERT INTO memories (id, memory_type, content, project, source_chunk_id, "
+            "confidence, status, domain_tags, valid_until) "
+            "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
+            (memory_id, memory_type, content, project, source_chunk_id or None,
+             confidence, status, tags_json, valid_until),
        )
    log.info(
@@ -128,6 +156,8 @@ def create_memory(
        memory_type=memory_type,
        status=status,
        content_preview=content[:80],
        tags=tags,
        valid_until=valid_until or "",
    )
    return Memory(
@@ -142,6 +172,8 @@ def create_memory(
        updated_at=now,
        last_referenced_at="",
        reference_count=0,
        domain_tags=tags,
        valid_until=valid_until or "",
    )
@@ -200,8 +232,13 @@ def update_memory(
    content: str | None = None,
    confidence: float | None = None,
    status: str | None = None,
    memory_type: str | None = None,
    domain_tags: list[str] | None = None,
    valid_until: str | None = None,
 ) -> bool:
    """Update an existing memory."""
    import json as _json
    with get_connection() as conn:
        existing = conn.execute("SELECT * FROM memories WHERE id = ?", (memory_id,)).fetchone()
        if existing is None:
@@ -235,6 +272,17 @@ def update_memory(
                raise ValueError(f"Invalid status '{status}'. Must be one of: {MEMORY_STATUSES}")
            updates.append("status = ?")
            params.append(status)
        if memory_type is not None:
            if memory_type not in MEMORY_TYPES:
                raise ValueError(f"Invalid memory type '{memory_type}'. Must be one of: {MEMORY_TYPES}")
            updates.append("memory_type = ?")
            params.append(memory_type)
        if domain_tags is not None:
            updates.append("domain_tags = ?")
            params.append(_json.dumps(_normalize_tags(domain_tags)))
        if valid_until is not None:
            updates.append("valid_until = ?")
            params.append(valid_until.strip() or None)
        if not updates:
            return False
@@ -488,8 +536,14 @@ def get_memories_for_context(
            seen_ids.add(mem.id)
            pool.append(mem)
    # Phase 3: filter out expired memories (valid_until in the past).
    # Raw API queries still return them (for audit/history) but context
    # packs must not surface stale facts.
    if pool:
        pool = _filter_expired(pool)
    if query_tokens is not None:
-        pool = _rank_memories_for_query(pool, query_tokens)
+        pool = _rank_memories_for_query(pool, query_tokens, query=query)
    # Per-entry cap prevents a single long memory from monopolizing
    # the band. With 16 p06 memories competing for ~700 chars, an
@@ -518,40 +572,78 @@ def get_memories_for_context(
    return text, len(text)
 def _filter_expired(memories: list["Memory"]) -> list["Memory"]:
    """Drop memories whose valid_until is in the past (UTC comparison)."""
    now_iso = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    out = []
    for m in memories:
        vu = (m.valid_until or "").strip()
        if not vu:
            out.append(m)
            continue
        # Compare as string (ISO dates/timestamps sort correctly lexicographically
        # when they have the same format; date-only vs full ts both start YYYY-MM-DD).
        if vu[:10] >= now_iso:
            out.append(m)
        # else: expired, drop silently
    return out
 def _rank_memories_for_query(
    memories: list["Memory"],
    query_tokens: set[str],
    query: str | None = None,
 ) -> list["Memory"]:
    """Rerank a memory list by lexical overlap with a pre-tokenized query.
    Primary key: overlap_density (overlap_count / memory_token_count),
    which rewards short focused memories that match the query precisely
    over long overview memories that incidentally share a few tokens.
-    Secondary: absolute overlap count. Tertiary: confidence.
-    R7 fix: previously overlap_count alone was the primary key, so a
-    40-token overview memory with 3 overlapping tokens tied a 5-token
-    memory with 3 overlapping tokens, and the overview won on
-    confidence. Now the short memory's density (0.6) beats the
-    overview's density (0.075).
+    Secondary: absolute overlap count. Tertiary: domain-tag match.
+    Quaternary: confidence.
+    Phase 3: domain_tags contribute a boost when they appear in the
+    query text. A memory tagged [optics, thermal] for a query about
+    "optics coating" gets promoted above a memory without those tags.
+    Tag boost fires AFTER content-overlap density so it only breaks
+    ties among content-similar candidates.
    """
    from atocore.memory.reinforcement import _normalize, _tokenize
-    scored: list[tuple[float, int, float, Memory]] = []
+    query_lower = (query or "").lower()
+    scored: list[tuple[float, int, int, float, Memory]] = []
    for mem in memories:
        mem_tokens = _tokenize(_normalize(mem.content))
        overlap = len(mem_tokens & query_tokens) if mem_tokens else 0
        density = overlap / len(mem_tokens) if mem_tokens else 0.0
-        scored.append((density, overlap, mem.confidence, mem))
-    scored.sort(key=lambda t: (t[0], t[1], t[2]), reverse=True)
-    return [mem for _, _, _, mem in scored]
+
+        # Tag boost: count how many of the memory's domain_tags appear
+        # as substrings in the raw query. Strong signal for topical match.
+        tag_hits = 0
+        for tag in (mem.domain_tags or []):
+            if tag and tag in query_lower:
+                tag_hits += 1
+        scored.append((density, overlap, tag_hits, mem.confidence, mem))
+    scored.sort(key=lambda t: (t[0], t[1], t[2], t[3]), reverse=True)
+    return [mem for _, _, _, _, mem in scored]
 def _row_to_memory(row) -> Memory:
    """Convert a DB row to Memory dataclass."""
    import json as _json
    keys = row.keys() if hasattr(row, "keys") else []
    last_ref = row["last_referenced_at"] if "last_referenced_at" in keys else None
    ref_count = row["reference_count"] if "reference_count" in keys else 0
    tags_raw = row["domain_tags"] if "domain_tags" in keys else None
    try:
        tags = _json.loads(tags_raw) if tags_raw else []
        if not isinstance(tags, list):
            tags = []
    except Exception:
        tags = []
    valid_until = row["valid_until"] if "valid_until" in keys else None
    return Memory(
        id=row["id"],
        memory_type=row["memory_type"],
@@ -564,6 +656,8 @@ def _row_to_memory(row) -> Memory:
        updated_at=row["updated_at"],
        last_referenced_at=last_ref or "",
        reference_count=int(ref_count or 0),
        domain_tags=tags,
        valid_until=valid_until or "",
    )
--- a/src/atocore/models/database.py
+++ b/src/atocore/models/database.py
@@ -119,6 +119,23 @@ def _apply_migrations(conn: sqlite3.Connection) -> None:
        "CREATE INDEX IF NOT EXISTS idx_memories_last_referenced ON memories(last_referenced_at)"
    )
    # Phase 3 (Auto-Organization V1): domain tags + expiry.
    # domain_tags is a JSON array of lowercase strings (optics, mechanics,
    # firmware, business, etc.) inferred by the LLM during triage. Used for
    # cross-project retrieval: a query about "optics" can surface matches from
    # p04 + p05 + p06 without knowing all the project names.
    # valid_until is an ISO 8601 UTC date (e.g. 2026-12-31) after which the
    # memory is considered stale. get_memories_for_context drops expired
    # memories from context packs automatically, so ephemeral facts (status
    # snapshots, weekly counts) don't pollute grounding once they've aged out.
    if not _column_exists(conn, "memories", "domain_tags"):
        conn.execute("ALTER TABLE memories ADD COLUMN domain_tags TEXT DEFAULT '[]'")
    if not _column_exists(conn, "memories", "valid_until"):
        conn.execute("ALTER TABLE memories ADD COLUMN valid_until DATETIME")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_memories_valid_until ON memories(valid_until)"
    )
    # Phase 9 Commit A: capture loop columns on the interactions table.
    # The original schema only carried prompt + project_id + a context_pack
    # JSON blob. To make interactions a real audit trail of what AtoCore fed
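Review note: the Phase 3 migration above is guarded so that it can run more than once, and the new `valid_until` column is what the expiry filter keys on. The sketch below reproduces both against an in-memory SQLite database. It is illustrative only: `column_exists` is a hypothetical stand-in for `_column_exists` (its body is not in this diff), and the `WHERE` clause is an assumed form of the filter in `get_memories_for_context`, which is also not shown here.

```python
import sqlite3
from datetime import datetime, timezone


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # Hypothetical stand-in for _column_exists: PRAGMA table_info returns
    # one row per column, with the column name at index 1.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def apply_phase3_migration(conn: sqlite3.Connection) -> None:
    # Same statements as the diff; the guards make a second run harmless.
    if not column_exists(conn, "memories", "domain_tags"):
        conn.execute("ALTER TABLE memories ADD COLUMN domain_tags TEXT DEFAULT '[]'")
    if not column_exists(conn, "memories", "valid_until"):
        conn.execute("ALTER TABLE memories ADD COLUMN valid_until DATETIME")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_memories_valid_until ON memories(valid_until)"
    )


def unexpired_contents(conn: sqlite3.Connection, today: str) -> list[str]:
    # Assumed filter: NULL or empty means permanent; otherwise the ISO date
    # string must not be earlier than today (ISO dates sort lexically).
    rows = conn.execute(
        "SELECT content FROM memories "
        "WHERE valid_until IS NULL OR valid_until = '' OR valid_until >= ? "
        "ORDER BY content",
        (today,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")
apply_phase3_migration(conn)
apply_phase3_migration(conn)  # second call must not raise

conn.execute(
    "INSERT INTO memories (content, valid_until) VALUES ('stale snapshot', '2020-01-01')"
)
conn.execute("INSERT INTO memories (content) VALUES ('durable insight')")

today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
visible = unexpired_contents(conn, today)
default_tags = conn.execute(
    "SELECT domain_tags FROM memories WHERE content = 'durable insight'"
).fetchone()[0]
```

The comparison relies on `valid_until` being stored as a zero-padded ISO string, so mixed formats in that column would break the ordering.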
--- a/tests/test_memory.py
+++ b/tests/test_memory.py
@@ -264,6 +264,94 @@ def test_expire_stale_candidates(isolated_db):
    assert mem["status"] == "invalid"
 # --- Phase 3: domain_tags + valid_until ---
 def test_create_memory_with_tags_and_valid_until(isolated_db):
    from atocore.memory.service import create_memory
    mem = create_memory(
        "knowledge",
        "CTE gradient dominates WFE at F/1.2",
        domain_tags=["optics", "thermal", "materials"],
        valid_until="2027-01-01",
    )
    assert mem.domain_tags == ["optics", "thermal", "materials"]
    assert mem.valid_until == "2027-01-01"
 def test_create_memory_normalizes_tags(isolated_db):
    from atocore.memory.service import create_memory
    mem = create_memory(
        "knowledge",
        "some content here",
        domain_tags=["  Optics  ", "OPTICS", "Thermal", ""],
    )
    # Duplicates and empty removed; lowercased; stripped
    assert mem.domain_tags == ["optics", "thermal"]
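Review note: the normalization this test pins down (strip, lowercase, drop empties, dedup, cap at 10) could look roughly like the sketch below. `normalize_tags` is a hypothetical re-implementation written for illustration; the real `_normalize_tags` helper is not part of this excerpt, and skipping non-string entries is an assumption.

```python
def normalize_tags(tags, max_tags: int = 10) -> list[str]:
    """Hypothetical sketch of tag normalization; order of first use is kept."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in tags or []:
        if not isinstance(raw, str):
            continue  # assumption: non-string entries are silently skipped
        tag = raw.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            result.append(tag)
        if len(result) >= max_tags:
            break  # cap reached; remaining tags are dropped
    return result


normalized = normalize_tags(["  Optics  ", "OPTICS", "Thermal", ""])
capped = normalize_tags([f"tag{i}" for i in range(20)])
```

Deduplicating after lowercasing is what makes `"  Optics  "` and `"OPTICS"` collapse into a single `optics` entry.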
 def test_update_memory_sets_tags_and_valid_until(isolated_db):
    from atocore.memory.service import create_memory, update_memory
    from atocore.models.database import get_connection
    mem = create_memory("knowledge", "some content for update test")
    assert update_memory(
        mem.id,
        domain_tags=["controls", "firmware"],
        valid_until="2026-12-31",
    )
    with get_connection() as conn:
        row = conn.execute("SELECT domain_tags, valid_until FROM memories WHERE id = ?", (mem.id,)).fetchone()
    import json as _json
    assert _json.loads(row["domain_tags"]) == ["controls", "firmware"]
    assert row["valid_until"] == "2026-12-31"
 def test_get_memories_for_context_excludes_expired(isolated_db):
    """Expired active memories must not land in context packs."""
    from atocore.memory.service import create_memory, get_memories_for_context
    # Active but expired
    create_memory(
        "knowledge",
        "stale snapshot from long ago period",
        valid_until="2020-01-01",
        confidence=1.0,
    )
    # Active and valid
    create_memory(
        "knowledge",
        "durable engineering insight stays valid forever",
        confidence=1.0,
    )
    text, _ = get_memories_for_context(memory_types=["knowledge"], budget=600)
    assert "durable engineering" in text
    assert "stale snapshot" not in text
 def test_context_builder_tag_boost_orders_results(isolated_db):
    """Memories with tags matching query should rank higher."""
    from atocore.memory.service import create_memory, get_memories_for_context
    create_memory("knowledge", "generic content has no obvious overlap with topic", confidence=0.8, domain_tags=[])
    create_memory("knowledge", "generic content has no obvious overlap topic here", confidence=0.8, domain_tags=["optics"])
    text, _ = get_memories_for_context(
        memory_types=["knowledge"],
        budget=2000,
        query="tell me about optics",
    )
    # Tagged memory should appear before the untagged one
    idx_tagged = text.find("overlap topic here")
    idx_untagged = text.find("overlap with topic")
    assert idx_tagged != -1
    assert idx_untagged != -1
    assert idx_tagged < idx_untagged
 def test_expire_stale_candidates_keeps_reinforced(isolated_db):
    from atocore.memory.service import create_memory, expire_stale_candidates
    from atocore.models.database import get_connection