feat: extractor llm-0.6.0 — bolder unknown-project tagging

User observation: APM work was captured and extracted, but the candidates
were tagged project=atocore or left blank instead of project=apm. Reason:
the prompt said 'Unknown project names — still tag them' but was too
terse, so Sonnet hedged toward registered matches rather than proposing
new slugs.

Fix: explicit guidance in the system prompt for when to propose an
unregistered project name vs when to stick with a registered one.

New instructions:
- When a memory is clearly ABOUT a named tool/product/project/system
  not in the known list, use a slugified version as the project tag
  ('apm' for 'Atomaste Part Manager'). The Living Taxonomy detector
  (Phase 6 C.1) scans for these tags and surfaces each one for one-click
  registration once ≥3 memories carry it.
- Exception: if a memory merely USES an unknown tool but is about a
  registered project ('p04 parts missing materials in APM'), tag with
  the registered project and mention the tool in content.
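The tagging rule in the bullets above can be restated as a deterministic helper. This is a minimal sketch, not the extractor's code: in llm-0.6.0 the model makes this judgment inside the prompt. `slugify`, `choose_project_tag`, and the hyphen-prefix alias rule (so "p04" resolves to "p04-gigabit") are hypothetical, and the sketch assumes the memory's primary subject has already been identified.

```python
import re

# Registered projects, copied from the system prompt's known list.
KNOWN_PROJECTS = frozenset({
    "p04-gigabit", "p05-interferometer", "p06-polisher",
    "atomizer-v2", "atocore", "abb-space",
})


def slugify(name: str) -> str:
    """Naive slug (hypothetical): lowercase, collapse non-alphanumeric runs to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def choose_project_tag(primary_subject: str,
                       known: frozenset[str] = KNOWN_PROJECTS) -> str:
    """Pick a project tag from what the memory is primarily ABOUT (hypothetical helper)."""
    slug = slugify(primary_subject)
    for project in sorted(known):
        # Accept an exact slug or a short alias such as "p04" for "p04-gigabit".
        if slug == project or project.startswith(slug + "-"):
            return project
    # Not registered: propose a new slug rather than the nearest registered match.
    return slug


# A memory about p04 that merely uses APM keeps the registered tag.
print(choose_project_tag("p04"))  # p04-gigabit
# A memory whose primary subject is the unknown tool itself gets a new slug.
print(choose_project_tag("APM"))  # apm
```

The naive slug is mechanical, so "Atomaste Part Manager" becomes "atomaste-part-manager" and "Foo Bar System" becomes "foo-bar-system". The shorter "apm" and "foo-bar" in the prompt's examples are abbreviations the model chooses. Matching is deliberately limited to an exact slug or a hyphen-prefix alias with no fuzzy nearest-match fallback, because that fallback is the misattribution the new prompt warns against.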

This closes the loop on the Phase 6 detector: the extractor now produces
tagged data for it, the detector surfaces the new tags, and the user
registers them with one click.
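The detector is not part of this diff. The loop described above can be sketched as a threshold count over project tags, assuming each memory carries at most one tag (or none) and that registered projects are a plain set. `unregistered_tag_candidates`, `register`, and `REGISTRATION_THRESHOLD` are hypothetical names. Only the ≥3 threshold comes from this commit.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable

REGISTRATION_THRESHOLD = 3  # "once ≥3 memories accumulate with that tag"


def unregistered_tag_candidates(
    memory_tags: Iterable[str | None],
    registered: set[str],
    threshold: int = REGISTRATION_THRESHOLD,
) -> list[tuple[str, int]]:
    """Return (tag, count) for unregistered tags seen on >= threshold memories."""
    # Skip untagged memories and tags that are already registered.
    counts = Counter(tag for tag in memory_tags if tag and tag not in registered)
    # Most frequent first, ties broken alphabetically.
    return sorted(
        ((tag, n) for tag, n in counts.items() if n >= threshold),
        key=lambda item: (-item[1], item[0]),
    )


def register(tag: str, registered: set[str]) -> None:
    """One-click registration: later scans stop surfacing this tag."""
    registered.add(tag)


registered = {"p04-gigabit", "atocore"}
tags = ["apm", "apm", "p04-gigabit", None, "apm", "foo-bar", "foo-bar"]
print(unregistered_tag_candidates(tags, registered))  # [('apm', 3)]
register("apm", registered)
print(unregistered_tag_candidates(tags, registered))  # []
```

"foo-bar" stays hidden until a third memory carries it, which is why the extractor has to emit unregistered slugs consistently rather than falling back to a registered project.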

Version bump: llm-0.5.0 → llm-0.6.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:31:09 -04:00
parent 7863ab3825
commit 9f262a21b0


@@ -21,7 +21,7 @@ from __future__ import annotations
import json
from typing import Any
-LLM_EXTRACTOR_VERSION = "llm-0.5.0"
+LLM_EXTRACTOR_VERSION = "llm-0.6.0" # bolder unknown-project tagging
MAX_RESPONSE_CHARS = 8000
MAX_PROMPT_CHARS = 2000
MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}
@@ -30,7 +30,24 @@ SYSTEM_PROMPT = """You extract memory candidates from LLM conversation turns for
AtoCore is the brain for Atomaste's engineering work. Known projects:
p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore,
-abb-space. Unknown project names — still tag them, the system auto-detects.
+abb-space.
+UNKNOWN PROJECT/TOOL DETECTION (important): when a memory is clearly
+about a named tool, product, project, or system that is NOT in the
+known list above, use a slugified version of that name as the project
+tag (e.g., "apm" for "Atomaste Part Manager", "foo-bar" for "Foo Bar
+System"). DO NOT default to a nearest registered match just because
+APM isn't listed — that's misattribution. The system's Living
+Taxonomy detector scans for these unregistered tags and surfaces them
+for one-click registration once they appear in ≥3 memories. Your job
+is to be honest about scope, not to squeeze everything into existing
+buckets.
+Exception: if the memory is about a registered project that merely
+uses or integrates with an unknown tool (e.g., "p04 parts are missing
+materials in APM"), tag with the registered project (p04-gigabit) and
+mention the tool in content. Only use an unknown tool as the project
+tag when the tool itself is the primary subject.
Your job is to emit SIGNALS that matter for future context. Be aggressive:
err on the side of capturing useful signal. Triage filters noise downstream.