ATOCore

Author SHA1 Message Date


Anto01

f70fa6bb9a

fix(projects): close Codex Wave 1.5 P2/P3 — stronger negatives + lowercase tokens

Codex's audit of e8ac8bb returned GO with two cheap improvements worth
folding in:

P2: the "short token does not match" test was trivially true because
apm and drill share no tokens at all. Replaced with a 4-label setup
that exercises both directions: apm + apm-fpga must NOT cluster (only
shared token is the 3-char 'apm'); foo-fpga + bar-fpga + apm-fpga MUST
cluster via the 4-char 'fpga'. Now a regression that lets <4-char
tokens through would fail.

P3: token comparison was case-sensitive. Lowercased before the length
check so 'HydroTech-Mining' clusters with 'hydrotech-split-tank' the
same way the all-lowercase variants do. Added a regression test.
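
A minimal sketch of the sibling check that P2 and P3 describe, not the
actual implementation. The function names and the choice to split labels
on non-alphanumeric characters are assumptions; only the >=4-char rule
and the lowercasing come from this commit.

```python
import re

# From the commit message: only tokens of at least 4 characters count.
MIN_TOKEN_LEN = 4


def label_tokens(label: str) -> set[str]:
    """Lowercase first, then split and drop tokens shorter than 4 chars.

    Splitting on non-alphanumerics is an assumption for illustration.
    """
    parts = re.split(r"[^a-z0-9]+", label.lower())
    return {p for p in parts if len(p) >= MIN_TOKEN_LEN}


def are_siblings(a: str, b: str) -> bool:
    """Two labels cluster when they share at least one qualifying token."""
    return bool(label_tokens(a) & label_tokens(b))
```

With this sketch, apm and apm-fpga stay apart because 'apm' is dropped
before the comparison, while foo-fpga, bar-fpga and apm-fpga meet on
'fpga'.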

Also added the registered-token-leak test Codex specifically called
out: p04-gigabit registered, gigabit-other unregistered — gigabit-other
must NOT surface p04-gigabit as a suggested alias (filter happens
before clustering).
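
The ordering matters here: registered names are removed before any token
comparison runs. A hedged sketch of that ordering follows; suggest_aliases
and its parameters are hypothetical names, and the tokenizer is the same
assumed one (split on non-alphanumerics, keep >=4-char tokens).

```python
import re


def _tokens(label: str) -> set[str]:
    # Assumed tokenizer: lowercase, split on non-alphanumerics, keep >= 4 chars.
    return {t for t in re.split(r"[^a-z0-9]+", label.lower()) if len(t) >= 4}


def suggest_aliases(label: str, all_labels: list[str], registered: set[str]) -> list[str]:
    """Return unregistered sibling labels for `label`.

    Registered names (canonical or alias) are filtered out first, so they
    can never surface as a suggestion even when they share a long token.
    """
    registered_lc = {name.lower() for name in registered}
    unregistered = [l for l in all_labels if l.lower() not in registered_lc]
    own = _tokens(label)
    return sorted(l for l in unregistered if l != label and own & _tokens(l))
```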

Test count: 594 -> 596.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-28 22:16:00 -04:00

Anto01

e8ac8bb536

feat(projects): live emerging-project registration proposals

Adds GET /admin/projects/proposals?min_active=N — an on-demand companion
to the nightly scripts/detect_emerging.py cache. Reads SQL + the registry
directly so the result is always current.

Each proposal is operator-ready:
  - project_id (the literal label as captured)
  - active_count / candidate_count from current SQL
  - sample_memories: 3 most recent active memories with content preview
  - suggested_aliases: sibling labels sharing a >=4-char token
    (e.g. lead-space + lead-space-exploration-ltd + space-exploration-ltd
    cluster naturally; apm and drill stay independent)
  - guessed_ingest_root: vault:incoming/projects/<id>/
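
For illustration, one proposal could be assembled roughly as below. The
top-level keys are the ones listed above; build_proposal, the memory
fields (created_at, content), the preview key and the preview length are
assumptions, not the real schema.

```python
def build_proposal(project_id, active_memories, candidate_count,
                   suggested_aliases, preview_chars=80):
    """Assemble one operator-ready proposal dict.

    `active_memories` is assumed to be a list of dicts with hypothetical
    keys "created_at" (ISO 8601 string) and "content".
    """
    # Newest first, then keep the 3 most recent as samples.
    newest = sorted(active_memories, key=lambda m: m["created_at"], reverse=True)[:3]
    return {
        "project_id": project_id,  # the literal label as captured
        "active_count": len(active_memories),
        "candidate_count": candidate_count,
        "sample_memories": [
            {"created_at": m["created_at"], "preview": m["content"][:preview_chars]}
            for m in newest
        ],
        "suggested_aliases": sorted(suggested_aliases),
        "guessed_ingest_root": f"vault:incoming/projects/{project_id}/",
    }
```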

Workflow: operator hits /admin/projects/proposals to see the live "what
should I register?" view, picks aliases from the suggestions, then POSTs
to the existing /admin/projects/register-emerging.
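
The two-step workflow can be sketched with the standard library. The
endpoint paths and the min_active parameter are from this commit; the
base URL and the JSON body of the register call (project_id, aliases)
are assumptions, since the payload of /admin/projects/register-emerging
is not described here. The sketch only builds the requests and does not
send them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def proposals_request(base_url: str, min_active: int) -> Request:
    """Build the GET for the live proposals view."""
    query = urlencode({"min_active": min_active})
    return Request(f"{base_url}/admin/projects/proposals?{query}", method="GET")


def register_request(base_url: str, project_id: str, aliases: list[str]) -> Request:
    """Build the POST that registers a project with the picked aliases.

    The body shape is hypothetical; check the real endpoint before use.
    """
    body = json.dumps({"project_id": project_id, "aliases": aliases}).encode("utf-8")
    return Request(
        f"{base_url}/admin/projects/register-emerging",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to urllib.request.urlopen would perform the call.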

Closes Codex's Wave 1.5 ask: "promote-to-registered-project proposal
with suggested aliases, sample memories, and guessed ingest root;
require one click." For apm at 165 active memories on prod, this is
overdue.

8 regression tests covering: registered-name (canonical + alias)
exclusion, threshold filtering, sibling clustering, short-token negative,
sample/root shape, candidate counting, param validation, sort order.

Test count: 586 -> 594.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-28 22:10:37 -04:00

2 Commits