Codex's audit of e8ac8bb returned GO with two cheap improvements worth
folding in:
P2: the "short token does not match" test was trivially true because
apm and drill share no tokens at all. Replaced with a 4-label setup
that exercises both directions: apm + apm-fpga must NOT cluster (only
shared token is the 3-char 'apm'); foo-fpga + bar-fpga + apm-fpga MUST
cluster via the 4-char 'fpga'. Now a regression that lets <4-char
tokens through would fail.
P3: token comparison was case-sensitive. Lowercased before the length
check so 'HydroTech-Mining' clusters with 'hydrotech-split-tank' the
same way the all-lowercase variants do. Added a regression test.
Also added the registered-token-leak test Codex specifically called
out: p04-gigabit registered, gigabit-other unregistered — gigabit-other
must NOT surface p04-gigabit as a suggested alias (filter happens
before clustering).
Test count: 594 -> 596.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>