Codex's audit of e8ac8bb returned GO with two cheap improvements worth
folding in:
P2: the "short token does not match" test was trivially true because
apm and drill share no tokens at all. Replaced it with a 4-label setup
that exercises both directions: apm + apm-fpga must NOT cluster (their
only shared token is the 3-char 'apm'); foo-fpga + bar-fpga + apm-fpga
MUST cluster via the 4-char 'fpga'. A regression that lets tokens
shorter than 4 chars through now fails this test.
P3: token comparison was case-sensitive. Lowercased before the length
check so 'HydroTech-Mining' clusters with 'hydrotech-split-tank' the
same way the all-lowercase variants do. Added a regression test.
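
The P2 and P3 rules can be sketched as follows. This is a minimal
illustration, not the code in the repo: the function names, the
split-on-non-alphanumerics tokenizer, and the MIN_TOKEN_LEN constant are
assumptions; only the lowercase-then-length-check order and the >=4-char
threshold come from this message.

```python
import re

MIN_TOKEN_LEN = 4  # assumption: mirrors the ">=4-char token" rule


def significant_tokens(label: str) -> set[str]:
    """Lowercase first, then split and keep only tokens of 4+ chars."""
    parts = re.split(r"[^a-z0-9]+", label.lower())
    return {p for p in parts if len(p) >= MIN_TOKEN_LEN}


def suggested_aliases(label: str, all_labels: list[str]) -> list[str]:
    """Sibling labels that share at least one significant token with label."""
    own = significant_tokens(label)
    return sorted(
        other
        for other in all_labels
        if other != label and own & significant_tokens(other)
    )


labels = ["apm", "apm-fpga", "foo-fpga", "bar-fpga"]
# 'apm' is only 3 chars, so it never links apm to apm-fpga.
print(suggested_aliases("apm", labels))
# 'fpga' is 4 chars, so the three *-fpga labels cluster.
print(suggested_aliases("apm-fpga", labels))
```

Lowercasing before the length check keeps the two steps independent: the
threshold is applied to the normalized token, so mixed-case and
lowercase labels take the same path.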
Also added the registered-token-leak test Codex specifically called
out: p04-gigabit is registered, gigabit-other is not; gigabit-other
must NOT surface p04-gigabit as a suggested alias (the registered-name
filter runs before clustering).
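
A sketch of why the filter order matters, under stated assumptions:
propose_aliases and tokens are hypothetical names, and registered names
are matched exactly here, which may differ from the real registry
lookup. The point shown is only that registered labels are dropped
before any token comparison, so they can never appear as a suggestion.

```python
import re


def tokens(label: str) -> set[str]:
    """Lowercased tokens of 4+ chars (same rule as the clustering step)."""
    return {t for t in re.split(r"[^a-z0-9]+", label.lower()) if len(t) >= 4}


def propose_aliases(captured: list[str], registered_names: list[str]) -> dict[str, list[str]]:
    """Map each unregistered label to its unregistered siblings."""
    registered = set(registered_names)
    # Filter first: registered labels never enter the clustering pool.
    pool = [label for label in captured if label not in registered]
    return {
        label: sorted(o for o in pool if o != label and tokens(label) & tokens(o))
        for label in pool
    }


result = propose_aliases(["p04-gigabit", "gigabit-other"], ["p04-gigabit"])
print(result)  # gigabit-other has no suggestions, despite sharing 'gigabit'
```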
Test count: 594 -> 596.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds GET /admin/projects/proposals?min_active=N, an on-demand companion
to the nightly scripts/detect_emerging.py cache. It queries SQL and the
registry directly, so the result is always current.
Each proposal is operator-ready:
- project_id (the literal label as captured)
- active_count / candidate_count from current SQL
- sample_memories: 3 most recent active memories with content preview
- suggested_aliases: sibling labels sharing a >=4-char token
  (e.g. lead-space + lead-space-exploration-ltd + space-exploration-ltd
  cluster naturally; apm and drill stay independent)
- guessed_ingest_root: vault:incoming/projects/<id>/
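
The fields above could be assembled roughly like this. It is a sketch
under assumptions, not the endpoint's actual code: the Proposal
dataclass, build_proposal, the memory dict keys (status, content,
created_at), and the 80-char preview length are all hypothetical. The
field names, the 3-most-recent rule, and the ingest root pattern come
from the list above.

```python
from dataclasses import dataclass

PREVIEW_CHARS = 80  # hypothetical preview length


@dataclass
class Proposal:
    project_id: str
    active_count: int
    candidate_count: int
    sample_memories: list
    suggested_aliases: list
    guessed_ingest_root: str


def build_proposal(project_id: str, memories: list[dict], aliases: list[str]) -> Proposal:
    """Build one operator-ready proposal from rows for a single label."""
    active = [m for m in memories if m["status"] == "active"]
    candidates = [m for m in memories if m["status"] == "candidate"]
    # ISO-8601 timestamps sort correctly as strings; newest first, keep 3.
    recent = sorted(active, key=lambda m: m["created_at"], reverse=True)[:3]
    return Proposal(
        project_id=project_id,  # the literal label as captured
        active_count=len(active),
        candidate_count=len(candidates),
        sample_memories=[
            {"created_at": m["created_at"], "preview": m["content"][:PREVIEW_CHARS]}
            for m in recent
        ],
        suggested_aliases=sorted(aliases),
        guessed_ingest_root=f"vault:incoming/projects/{project_id}/",
    )


rows = [
    {"status": "active", "content": f"note {d}", "created_at": f"2024-01-0{d}"}
    for d in range(1, 5)
] + [{"status": "candidate", "content": "pending", "created_at": "2024-01-05"}]
proposal = build_proposal("apm", rows, ["apm-fpga"])
print(proposal.guessed_ingest_root)
```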
Workflow: operator hits /admin/projects/proposals to see the live "what
should I register?" view, picks aliases from the suggestions, then POSTs
to the existing /admin/projects/register-emerging.
Closes Codex's Wave 1.5 ask: "promote-to-registered-project proposal
with suggested aliases, sample memories, and guessed ingest root;
require one click." With apm at 165 active memories on prod, this is
overdue.
8 regression tests covering: registered-name (canonical + alias)
exclusion, threshold filtering, sibling clustering, short-token negative,
sample/root shape, candidate counting, param validation, sort order.
Test count: 586 -> 594.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>