From 9f262a21b0eb97282cc8485165391517cdb63259 Mon Sep 17 00:00:00 2001
From: Anto01
Date: Sat, 18 Apr 2026 08:31:09 -0400
Subject: [PATCH] =?UTF-8?q?feat:=20extractor=20llm-0.6.0=20=E2=80=94=20bol?=
 =?UTF-8?q?der=20unknown-project=20tagging?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

User observation: APM work was captured + extracted, but candidates
got tagged project=atocore or left blank instead of project=apm.
Reason: the prompt said 'Unknown project names — still tag them'
but was too terse; sonnet hedged toward registered matches rather
than proposing new slugs.

Fix: explicit guidance in the system prompt for when to propose an
unregistered project name vs when to stick with a registered one.

New instructions:
- When a memory is clearly ABOUT a named tool/product/project/system
  not in the known list, use a slugified version as the project tag
  ('apm' for 'Atomaste Part Manager'). The Living Taxonomy detector
  (Phase 6 C.1) scans these and surfaces for one-click registration
  once ≥3 memories accumulate with that tag.
- Exception: if a memory merely USES an unknown tool but is about a
  registered project ('p04 parts missing materials in APM'), tag
  with the registered project and mention the tool in content.

This closes the loop on the Phase 6 detector: extractor now produces
taggable data for it, detector surfaces, user registers with one
click.

Version bump: llm-0.5.0 → llm-0.6.0.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 src/atocore/memory/_llm_prompt.py | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/atocore/memory/_llm_prompt.py b/src/atocore/memory/_llm_prompt.py
index d144838..2ccb0e2 100644
--- a/src/atocore/memory/_llm_prompt.py
+++ b/src/atocore/memory/_llm_prompt.py
@@ -21,7 +21,7 @@ from __future__ import annotations
 import json
 from typing import Any
 
-LLM_EXTRACTOR_VERSION = "llm-0.5.0"
+LLM_EXTRACTOR_VERSION = "llm-0.6.0" # bolder unknown-project tagging
 MAX_RESPONSE_CHARS = 8000
 MAX_PROMPT_CHARS = 2000
 MEMORY_TYPES = {"identity", "preference", "project", "episodic", "knowledge", "adaptation"}
@@ -30,7 +30,24 @@ SYSTEM_PROMPT = """You extract memory candidates from LLM conversation turns for
 
 AtoCore is the brain for Atomaste's engineering work. Known projects:
 p04-gigabit, p05-interferometer, p06-polisher, atomizer-v2, atocore,
-abb-space. Unknown project names — still tag them, the system auto-detects.
+abb-space.
+
+UNKNOWN PROJECT/TOOL DETECTION (important): when a memory is clearly
+about a named tool, product, project, or system that is NOT in the
+known list above, use a slugified version of that name as the project
+tag (e.g., "apm" for "Atomaste Part Manager", "foo-bar" for "Foo Bar
+System"). DO NOT default to a nearest registered match just because
+APM isn't listed — that's misattribution. The system's Living
+Taxonomy detector scans for these unregistered tags and surfaces them
+for one-click registration once they appear in ≥3 memories. Your job
+is to be honest about scope, not to squeeze everything into existing
+buckets.
+
+Exception: if the memory is about a registered project that merely
+uses or integrates with an unknown tool (e.g., "p04 parts are missing
+materials in APM"), tag with the registered project (p04-gigabit) and
+mention the tool in content. Only use an unknown tool as the project
+tag when the tool itself is the primary subject.
 
Your job is to emit SIGNALS that matter for future context. Be aggressive: err on the side of capturing useful signal. Triage filters noise downstream.
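
Illustration only, not code from the repository: the commit message says the Living Taxonomy detector surfaces an unregistered project tag once ≥3 memories carry it. A minimal sketch of that counting rule, under stated assumptions, could look like the following. The names `slugify`, `registration_candidates`, and `REGISTRATION_THRESHOLD` are hypothetical, and the real detector (Phase 6 C.1) may work differently. The known-project list is copied from the prompt in this patch. Note that the prompt lets the model choose a short name ("apm", not "atomaste-part-manager"), so `slugify` here only normalizes whatever name was chosen.

```python
from __future__ import annotations

import re
from collections import Counter

# Known projects, copied from the system prompt in this patch.
KNOWN_PROJECTS = {
    "p04-gigabit", "p05-interferometer", "p06-polisher",
    "atomizer-v2", "atocore", "abb-space",
}

# Assumed from the commit message: surface a tag once >= 3 memories use it.
REGISTRATION_THRESHOLD = 3


def slugify(name: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def registration_candidates(project_tags: list[str]) -> list[str]:
    """Return unregistered tags carried by at least REGISTRATION_THRESHOLD memories.

    `project_tags` holds one project tag per memory; blank tags are ignored.
    """
    counts = Counter(
        tag for tag in project_tags if tag and tag not in KNOWN_PROJECTS
    )
    return sorted(tag for tag, n in counts.items() if n >= REGISTRATION_THRESHOLD)
```

With three memories tagged `apm`, one tagged `foo-bar`, and one tagged `atocore`, `registration_candidates(["apm", "apm", "apm", "foo-bar", "atocore"])` returns `["apm"]`: `foo-bar` is below the threshold and `atocore` is already registered.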