feat: Phase 7C — tag canonicalization (autonomous, weekly)
LLM proposes alias→canonical mappings for domain_tags. Proposals with
confidence >= 0.8 are auto-applied; anything lower goes to human triage.
Project identifiers (p04, p05, p06, atocore, apm, etc.) are protected
from ever being canonicalized, since they are their own namespace, not
concepts.
Problem solved: tag drift fragments retrieval. "fw" vs "firmware" vs
"firmware-control" all mean the same thing, but cross-cutting queries
that filter by tag only hit one variant. Weekly canonicalization pass
keeps the tag graph clean.
- Schema: tag_aliases table (pending | approved | rejected)
- atocore.memory._tag_canon_prompt (stdlib-only, protected project tokens)
- service: get_tag_distribution, apply_tag_alias (atomic per-memory,
  dedupes if both alias + canonical present), create / approve / reject
  proposal lifecycle, per-memory audit rows with action="tag_canonicalized"
- scripts/canonicalize_tags.py: host-side detector, autonomous by default,
  --no-auto-approve kill switch
- 6 API endpoints under /admin/tags/* (distribution, list, propose,
  apply, approve/{id}, reject/{id})
- Step B4 in batch-extract.sh (Sundays only; weekly cadence)
- 26 new tests (prompt parser, normalizer protections, distribution
  counting, rewrite atomicity, dedup, audit, lifecycle). Test count: 414 → 440.
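To illustrate the distribution counting, dedup, and audit behaviour listed above, here is a minimal sketch. It is not the service code from this commit: `tag_distribution`, `rewrite_tags`, `canonicalize_memory`, and the audit row fields other than `action` are hypothetical names, and memories are modelled as a plain dict of tag lists.

```python
from collections import Counter


def tag_distribution(memories):
    """Count how many memories carry each tag (each tag counted once per memory)."""
    counts = Counter()
    for tags in memories.values():
        counts.update(set(tags))
    return counts


def rewrite_tags(tags, alias, canonical):
    """Replace alias with canonical, keeping order and dropping duplicates.

    If a memory already has both alias and canonical, only one canonical
    entry survives.
    """
    rewritten = []
    for tag in tags:
        new_tag = canonical if tag == alias else tag
        if new_tag not in rewritten:
            rewritten.append(new_tag)
    return rewritten


def canonicalize_memory(memory_id, tags, alias, canonical):
    """Return (new_tags, audit_row); audit_row is None when nothing changed."""
    new_tags = rewrite_tags(tags, alias, canonical)
    if new_tags == list(tags):
        return list(tags), None
    audit_row = {
        "memory_id": memory_id,          # hypothetical field name
        "action": "tag_canonicalized",   # action value taken from the commit message
        "before": list(tags),            # hypothetical field name
        "after": new_tags,               # hypothetical field name
    }
    return new_tags, audit_row


if __name__ == "__main__":
    memories = {"m1": ["fw", "p04"], "m2": ["firmware", "fw"]}
    print(tag_distribution(memories))
    print(canonicalize_memory("m2", memories["m2"], "fw", "firmware"))
```

Returning the rewritten tags and the audit row together lets a caller persist both in one transaction, which is one way to get the per-memory atomicity the commit describes.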
Design: aggressive protection of project tokens because a false
canonicalization (p04 → p04-gigabit, or vice versa) would scramble
cross-project filtering. Err toward preservation: an alias is applied
automatically only if the model's confidence clears the 0.8 threshold AND
both strings appear in the current tag distribution.
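The gating rules described in this message can be sketched as a single routing function. This is an assumption-laden illustration, not the code in scripts/canonicalize_tags.py: `route_proposal` and its return values are hypothetical, the protected set contains only the tokens named above (the real list is longer, per the "etc."), matching is assumed to be exact after lowercasing, and proposals whose strings are missing from the distribution are assumed to be dropped rather than triaged.

```python
AUTO_APPLY_THRESHOLD = 0.8  # threshold stated in the commit message

# Only the tokens named in the commit message; the real list is assumed longer.
PROTECTED_TOKENS = frozenset({"p04", "p05", "p06", "atocore", "apm"})


def route_proposal(alias, canonical, confidence, distribution):
    """Decide what happens to one alias→canonical proposal.

    Returns "skip", "auto_apply", or "triage" (hypothetical labels).
    """
    alias = alias.strip().lower()
    canonical = canonical.strip().lower()

    if alias == canonical:
        return "skip"
    # Project tokens are protected on both sides of the mapping.
    if alias in PROTECTED_TOKENS or canonical in PROTECTED_TOKENS:
        return "skip"
    # Both strings must exist in the current tag distribution.
    if alias not in distribution or canonical not in distribution:
        return "skip"
    if confidence >= AUTO_APPLY_THRESHOLD:
        return "auto_apply"
    return "triage"


if __name__ == "__main__":
    dist = {"fw": 3, "firmware": 10, "p04": 5, "p04-gigabit": 2}
    print(route_proposal("fw", "firmware", 0.93, dist))        # auto_apply
    print(route_proposal("fw", "firmware", 0.60, dist))        # triage
    print(route_proposal("p04", "p04-gigabit", 0.99, dist))    # skip
```

Checking protection before confidence means a highly confident but wrong proposal involving a project token can never reach the apply path.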
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -197,6 +197,19 @@ python3 "$APP_DIR/scripts/memory_dedup.py" \
     log "WARN: memory dedup failed (non-blocking)"
 }
 
+# Step B4: Tag canonicalization (Phase 7C, weekly Sundays)
+# Autonomous: LLM proposes alias→canonical maps, auto-applies confidence >= 0.8.
+# Project tokens are protected (skipped on both sides). Borderline proposals
+# land in /admin/tags/aliases for human review.
+if [[ "$(date -u +%u)" == "7" ]]; then
+    log "Step B4: tag canonicalization (Sunday)"
+    python3 "$APP_DIR/scripts/canonicalize_tags.py" \
+        --base-url "$ATOCORE_URL" \
+        2>&1 || {
+        log "WARN: tag canonicalization failed (non-blocking)"
+    }
+fi
+
 # Step G: Integrity check (Phase 4 V1)
 log "Step G: integrity check"
 python3 "$APP_DIR/scripts/integrity_check.py" \
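The Sunday gate in the hunk above relies on `date -u +%u`, which prints the ISO weekday (1 = Monday through 7 = Sunday) in UTC. A Python equivalent might look like the sketch below; `is_weekly_run_day` is a hypothetical helper, not part of this commit, and it expects a timezone-aware datetime when one is passed in.

```python
from datetime import datetime, timedelta, timezone


def is_weekly_run_day(now=None):
    """Return True when the given moment falls on a Sunday in UTC.

    Mirrors the shell test `[[ "$(date -u +%u)" == "7" ]]`.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to UTC first so a local-time Sunday evening that is already
    # Monday in UTC does not trigger the run.
    return now.astimezone(timezone.utc).isoweekday() == 7


if __name__ == "__main__":
    sunday_noon_utc = datetime(2024, 1, 7, 12, 0, tzinfo=timezone.utc)
    print(is_weekly_run_day(sunday_noon_utc))  # True

    # 23:30 Sunday at UTC-5 is 04:30 Monday in UTC.
    late_sunday_local = datetime(2024, 1, 7, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
    print(is_weekly_run_day(late_sunday_local))  # False
```

Pinning the check to UTC keeps the weekly cadence stable regardless of the host's local timezone.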