
V2 Migration Master Plan — Audit Report

Auditor: Auditor Agent 🔍
Date: 2026-02-22
Document Reviewed: ATOMIZER-V2-MIGRATION-MASTERPLAN.md
Verdict: 🟡 MAJOR issues found. The plan is strong but has significant gaps that will cause problems during execution.


1. Completeness — 🔴 CRITICAL GAPS

1.1 Missing V1 Modules (Not Accounted For)

The migration plan lists modules to port but does not account for the 13 V1 subpackages below, at least 8 of which are significant:

| V1 Module | Files | Purpose | Impact if Missed |
|---|---|---|---|
| optimization_engine/context/ | 7 files | Session state, compaction, feedback loop, playbook, reflector | 🔴 Core runtime functionality: sessions won't persist state |
| optimization_engine/study/ | 8 files | Study creator, wizard, continuation, reset, benchmarking, state, history | 🔴 Can't create or manage studies without this |
| optimization_engine/utils/ | 12 files | Logger, dashboard_db, trial_manager, NX file discovery, study archiver, realtime tracking | 🔴 Infrastructure that everything depends on |
| optimization_engine/plugins/ | 4 files | hook_manager, hooks, validators (DIFFERENT from hooks/) | 🟡 Plugin system won't work |
| optimization_engine/intake/ | 3 files | Config intake, context intake, processor | 🟡 Study intake pipeline broken |
| optimization_engine/validation/ | 3 files | checker.py, gate.py (DIFFERENT from validators/) | 🟡 Validation gates lost |
| optimization_engine/model_discovery/ | 2 files | NX model introspection | 🟡 Model discovery capability lost |
| optimization_engine/devloop/ | 7 files | Analyzer, orchestrator, planning, test_runner, browser scenarios | 🟢 The plan only covers tools/devloop_cli.py, but the full subpackage has 7 files |
| optimization_engine/processors/ | 2 files | adaptive_characterization.py | 🟡 V2 already has its own processors/ concept (atomizer/processors/); decide whether to merge or rename |
| optimization_engine/future/ | 11 files | Research agents, LLM workflow analyzer, step classifier | 🟢 May be intentionally excluded, but not listed in "DO NOT MIGRATE" |
| optimization_engine/custom_functions/ | 2 files | NX material generator | 🟢 Utility; should be documented |
| optimization_engine/templates/ | 3 files | run_optimization_template, run_nn_optimization_template | 🟡 Template system for studies |
| optimization_engine/surrogates/ | 1 file | `__init__.py` (separate from gnn/) | 🟢 Minor |

1.2 Missing V1 Core Files

| V1 File | Role | Plan Status |
|---|---|---|
| optimization_engine/core/base_runner.py | Base class for runners | Not mentioned (plan only lists runner.py) |
| optimization_engine/core/gradient_optimizer.py | Gradient-based optimization | Not mentioned |
| optimization_engine/core/runner_with_neural.py | Neural-accelerated runner | Not mentioned |
| optimization_engine/core/strategy_portfolio.py | Strategy portfolio management | Not mentioned |
| optimization_engine/core/strategy_selector.py | Strategy selection (different from method_selector) | Not mentioned |
| optimization_engine/schemas/ | Schema files | Mentioned, but directory contents not inventoried |

1.3 Missing V1 Root-Level Files

| File | Status |
|---|---|
| atomizer.py (25KB monolith) | Listed in "DO NOT MIGRATE", but its functionality needs a replacement |
| launch_dashboard.py | Not mentioned. How does V2 launch the dashboard? |
| requirements.txt | Replaced by pyproject.toml |
| install.bat | Not mentioned (Windows install script) |

1.4 V1 Tools Directory

The plan only mentions tools/devloop_cli.py. V1 tools/ has 25+ scripts including:

  • analyze_study.py, find_best_iteration.py, archive_study.py
  • create_pareto_graphs.py, generate_psd_figures.py
  • Zernike-specific tools (HTML generator, WFE PSD, optical report)
  • Study migration tools

Recommendation: Create an inventory of tools/ and decide per-file: migrate, archive, or replace.


2. Risk Assessment — 🟡 MAJOR

2.1 Identified Risks (Plan Section 11)

The plan's risk table is reasonable but underestimates these risks:

| Risk | Plan's Mitigation | My Assessment |
|---|---|---|
| Import breakage | Find-replace optimization_engine. → atomizer. | 🟡 Insufficient. Many V1 modules use relative and cross-module imports, and optimization_engine. paths are nested (e.g., from optimization_engine.core.runner import Runner, where runner.py itself imports from optimization_engine.extractors). A mechanical find-replace will miss circular dependencies and runtime-only imports. This needs a test suite, not just sed. |
| NX integration breaks | Test on dalidou before archiving V1 | Adequate |
| .gitignore too aggressive | Test essential files | 🟡 See Section 5 (Data Safety) |
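Alongside the find-replace, an import smoke test catches modules that no longer load at all. A minimal sketch, assuming V2 is importable as the atomizer package. find_broken_imports is a hypothetical helper name; pkgutil.walk_packages and importlib.import_module are standard library:

```python
import importlib
import pkgutil
import traceback


def find_broken_imports(package_name: str) -> dict[str, str]:
    """Import every submodule of a package and map failing names to their error.

    Only module-level code runs, so imports inside functions are not exercised;
    this complements a real test suite rather than replacing it.
    """
    failures: dict[str, str] = {}
    package = importlib.import_module(package_name)
    # onerror swallows walk_packages' own re-import of broken subpackages;
    # those failures are already recorded by the loop below.
    for info in pkgutil.walk_packages(
        package.__path__, prefix=package_name + ".", onerror=lambda name: None
    ):
        try:
            importlib.import_module(info.name)
        except Exception:  # keep going so every broken module is reported
            failures[info.name] = traceback.format_exc(limit=1)
    return failures
```

A pytest case as small as assert not find_broken_imports("atomizer") can run after every batch of import fixes, so breakage shows up in the phase that caused it instead of in Phase 3.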

2.2 Unidentified Risks

| Risk | Severity | Mitigation Needed |
|---|---|---|
| V1 utils/ dependency web: logger, trial_manager, and dashboard_db are imported EVERYWHERE in V1. Where do they go in V2? | 🔴 HIGH | Create atomizer/utils/ or distribute into appropriate modules. Map ALL import dependencies before porting. |
| context/ module loss: session state, compaction, feedback loops. If not ported, studies can't resume and context is lost between runs | 🔴 HIGH | Add to migration table; decide V2 location |
| study/ module loss: study creation wizard, continuation, reset. Without this, studies can't be created from V2 | 🔴 HIGH | Add to migration table as P0 |
| Optuna DB path changes: V1 studies store Optuna databases at specific paths, and the V2 restructure may break study continuation | 🟡 MED | Test study continuation with path remapping |
| NX journal path references: NX journals may hardcode V1 paths | 🟡 MED | Audit all journal files for hardcoded paths |
| Knowledge base .jsonl files: are they tracked in git or gitignored? They're small (212KB) but grow over time | 🟡 MED | Clarify: track in git, or gitignore with a backup strategy |
| Python version compatibility: pyproject.toml requires >=3.10, but V1 may depend on behavior or package pins tied to 3.8/3.9 | 🟢 LOW | Test on the target Python version |
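The "map ALL import dependencies" mitigation can be scripted with the standard-library ast module, which also sees imports nested inside functions. A minimal sketch: map_imports is a hypothetical helper, and it records only imports whose target lies under the given package, resolving relative imports to absolute names:

```python
import ast
from collections import defaultdict
from pathlib import Path


def _module_name(path: Path, src_root: Path) -> str:
    """Dotted name for a file, e.g. pkg/core/runner.py -> pkg.core.runner."""
    parts = list(path.relative_to(src_root).with_suffix("").parts)
    if parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


def map_imports(src_root: Path, package: str) -> dict[str, set[str]]:
    """Map each module under src_root/package to the in-package modules it imports."""
    graph: dict[str, set[str]] = defaultdict(set)
    for path in sorted((src_root / package).rglob("*.py")):
        module = _module_name(path, src_root)
        # Relative imports resolve against the package itself for __init__.py,
        # otherwise against the module's parent package.
        parts = module.split(".")
        base = parts if path.name == "__init__.py" else parts[:-1]
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):  # walk() includes function-local imports
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                if node.level:  # each extra leading dot climbs one package up
                    anchor = ".".join(base[: len(base) - (node.level - 1)])
                    if node.module:
                        targets = [f"{anchor}.{node.module}"]
                    else:
                        targets = [f"{anchor}.{alias.name}" for alias in node.names]
                else:
                    targets = [node.module or ""]
            else:
                continue
            for target in targets:
                if target == package or target.startswith(package + "."):
                    graph[module].add(target)
    return dict(graph)
```

Run from the V1 repo root, map_imports(Path("."), "optimization_engine") shows which modules depend on utils/, context/, and study/ before any of them move. Imports built from strings (importlib calls, hook paths in specs) will not appear and still need a manual pass.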

3. Feasibility — 🟡 8-Day Timeline is Aggressive

3.1 Phase-by-Phase Assessment

| Phase | Planned | Realistic | Issue |
|---|---|---|---|
| Phase 0: Bootstrap + AOM | 1 day | 1.5 days | AOM link conversion for 48 docs is tedious even with a script and needs manual QA. |
| Phase 1: Core Engine | 2 days | 3-4 days | Plan lists 13 steps but misses ~30 additional files from core/, context/, study/, and utils/ (5 + 7 + 8 + 12 by Section 1's counts). Refactoring runner→engine while keeping every runner variant (base_runner, runner_with_neural) working is non-trivial. |
| Phase 2: Supporting | 2 days | 2 days | Reasonable if scope is truly "direct port" |
| Phase 3: Integration | 2 days | 3 days | Import fixes across 100+ files. This is where the missing modules will surface. |
| Phase 4: Syncthing | 1 day | 1 day | Reasonable |
| Phase 5: GitHub + CI | 1 day | 0.5 days | Straightforward |
| Phase 6: Archive V1 | 1 day | 0.5 days | Straightforward |
| Total | 8 days (plan's stated total; the per-phase figures above sum to 10) | 11-13 days | |

3.2 Key Bottleneck

Phase 1 is underscoped. The migration table shows 13 clean steps, but V1's optimization_engine/ has ~150 Python files across 20 subpackages. The plan only explicitly accounts for ~60 of these. The remaining ~90 files will surface during Phase 3 integration testing, causing scope creep and rework.

Recommendation: Before starting, create a complete file-level inventory mapping every V1 .py file to its V2 destination (or explicit "skip" decision). This takes ~2 hours but saves days of surprises.
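A minimal sketch of that inventory step, assuming a CSV is an acceptable stand-in for the spreadsheet. write_inventory and the column names are hypothetical:

```python
import csv
from pathlib import Path


def write_inventory(package_dir: Path, out_csv: Path) -> int:
    """Write one row per .py file with empty V2 destination/decision columns.

    Returns the number of files listed.
    """
    files = sorted(p for p in package_dir.rglob("*.py") if "__pycache__" not in p.parts)
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["v1_path", "subpackage", "v2_destination", "decision"])
        for path in files:
            rel = path.relative_to(package_dir)
            # Files directly in the package root are grouped as "(root)".
            subpackage = rel.parts[0] if len(rel.parts) > 1 else "(root)"
            writer.writerow([rel.as_posix(), subpackage, "", ""])
    return len(files)
```

write_inventory(Path("optimization_engine"), Path("v1-inventory.csv")) yields one row per file. Any row whose v2_destination or decision (migrate, archive, skip) is still blank when review ends is an unresolved gap.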


4. Architecture Alignment — STRONG

4.1 AOM Component Map Match

The V2 structure maps well to the AOM's four pillars:

| AOM Component | V2 Location | Match |
|---|---|---|
| Pillar 1 (Philosophy) | docs/AOM/01-Philosophy/ | |
| Pillar 2 (Operations) | docs/AOM/02-Operations/ | |
| Pillar 3 (Developer) | docs/AOM/03-Developer/ | |
| Pillar 4 (Knowledge) | docs/AOM/04-Knowledge/ | |
| Contracts | atomizer/contracts/ | Matches AOM 03-Developer/08-Data-Contracts |
| Processors | atomizer/processors/ | Matches AOM 03-Developer/09-Processor-Development |
| Orchestrator | atomizer/orchestrator/ | Matches AOM 01-Philosophy/08-Tool-Agnostic |
| Extractors | atomizer/extractors/ | Matches AOM 02-Operations/04-Extractor-Library |
| Protocols | docs/protocols/ | Matches AOM 02-Operations/02-Protocol-Reference |

4.2 Minor Misalignments

| Issue | Severity |
|---|---|
| AOM has an Audit/ folder (2 docs); the plan places it under docs/AOM/Audit/ | None |
| AOM Phase 4/5 docs (CLAUDE-v2, Living-Document-Protocol) need explicit V2 homes; the plan addresses this in Section 4.4 | None |
| MCP servers are in the V2 repo as mcp_servers/, but AOM 03-Developer/10 suggests they could be separate repos | 🟢 Minor; decide later |

5. Data Safety — 🟡 NEEDS ATTENTION

5.1 .gitignore Assessment

Good coverage for:

  • NX/solver binary files (.sim, .prt, .fem, .bdf, .op2, .f06, .frd)
  • Python artifacts
  • IDE files
  • Study data directory

Missing patterns:

| Pattern | Risk | Recommendation |
|---|---|---|
| `*.backup` / `*.bak` | Backup files could leak | Add `*.bak` and `*.backup` |
| `*.csv` | Large result CSVs from studies | Add, or rely on studies/ containment |
| `*.png` / `*.jpg` in study dirs | Iteration screenshots, contour plots | Covered by the studies/ gitignore entry |
| `*.sqlite` / `*.sqlite3` | Optuna databases | Add explicitly (`*.db` covers some but not all) |
| research_sessions/ | Knowledge base research data | Clarify whether tracked |
| `*.jsonl` | Session insights grow unbounded | Clarify: should `knowledge/session_insights/*.jsonl` be tracked? |
| `*.whl` | Wheel files | Add |
| `*.tar.gz` / `*.zip` | Archives in tools/ | Not currently present, but add preventively |

5.2 Large File Risk

The plan correctly excludes projects/ (99GB), atomizer_field_training_data/ (68MB), and tools/ (462MB, which is surprisingly large for a scripts directory).

Action item: Investigate what's in V1 tools/ that's 462MB. The plan lists it as "Large tool archives" — these could contaminate V2 if tools/ is ported carelessly.
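Finding the culprit takes one pass over the directory. A minimal sketch; largest_files is a hypothetical helper, and on Linux du -ah tools | sort -h | tail gives a similar answer:

```python
import os
from pathlib import Path


def largest_files(root: Path, top: int = 20) -> list[tuple[int, Path]]:
    """Return the `top` largest regular files under root as (size_bytes, path)."""
    sizes: list[tuple[int, Path]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so linked files are not counted twice.
            if path.is_file() and not path.is_symlink():
                sizes.append((path.stat().st_size, path))
    sizes.sort(key=lambda item: item[0], reverse=True)
    return sizes[:top]
```

For example: for size, path in largest_files(Path("tools")): print(f"{size / 1e6:8.1f} MB  {path}").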

5.3 Success Criterion #9

"No file larger than 1MB in git history (excluding initial dashboard assets)"

This is good but needs enforcement. Recommendation: Add a pre-commit hook or CI check that rejects files >1MB.
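If the project adopts the pre-commit framework, its pre-commit-hooks repository already ships a check-added-large-files hook with a --maxkb option. Without it, a plain Git hook does the same job. A minimal sketch, assuming "1MB" means 1,000,000 bytes; oversized, staged_files, and main are hypothetical names, and the git flags used are standard:

```python
import subprocess
import sys
from pathlib import Path

MAX_BYTES = 1_000_000  # assumption: "1MB" read as 10**6 bytes


def oversized(paths: list[Path], limit: int = MAX_BYTES) -> list[tuple[Path, int]]:
    """Return (path, size) for each existing file larger than limit bytes."""
    result = []
    for path in paths:
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                result.append((path, size))
    return result


def staged_files() -> list[Path]:
    """Files added, copied, or modified in the index (requires git on PATH).

    Sizes are later read from the working tree, which can differ from the
    staged blob if a file was edited after `git add`.
    """
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [Path(name) for name in out.split("\0") if name]


def main() -> int:
    offenders = oversized(staged_files())
    for path, size in offenders:
        print(f"rejected: {path} is {size:,} bytes (limit {MAX_BYTES:,})", file=sys.stderr)
    return 1 if offenders else 0
```

Saved as an executable .git/hooks/pre-commit that ends with raise SystemExit(main()), a non-zero exit blocks the commit. Running the same oversized check over the full diff in CI also catches anyone who skipped the hook.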


6. Backward Compatibility — 🟡 RISKS EXIST

6.1 AtomizerSpec v2→v3 Migration

The plan mentions atomizer/spec/migrator.py for v2.0→v3.0 migration. This is critical.

Key question: What happens when a V1 atomizer_spec.json is loaded?

  • V1 specs have no toolchain section → must default to NX/NX mesher/Nastran
  • V1 specs use optimization_engine.* import paths in custom hooks → must still work
  • V1 specs may reference absolute paths on dalidou → need path translation
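A minimal sketch of the v2.0 → v3.0 defaults and path translation described above. The key names (version, toolchain, cad, mesher, solver), the rule that a missing version means 2.0, and the path_map argument are all assumptions, not the actual AtomizerSpec schema:

```python
from __future__ import annotations

import copy
from typing import Any

# Hypothetical key names; the real AtomizerSpec v3 schema may differ.
DEFAULT_TOOLCHAIN = {"cad": "NX", "mesher": "NX mesher", "solver": "Nastran"}


def _translate(value: str, path_map: dict[str, str]) -> str:
    """Rewrite a string that starts with a mapped path prefix; leave others alone."""
    norm = value.replace("\\", "/")
    for old, new in path_map.items():
        old = old.replace("\\", "/").rstrip("/")
        # Require a separator after the prefix so C:/Atomizer does not match C:/Atomizer2.
        if norm == old or norm.startswith(old + "/"):
            return new.rstrip("/") + norm[len(old):]
    return value


def _walk(node: Any, path_map: dict[str, str]) -> Any:
    """Rebuild dicts/lists (never mutating the input) and translate string leaves."""
    if isinstance(node, dict):
        return {key: _walk(val, path_map) for key, val in node.items()}
    if isinstance(node, list):
        return [_walk(item, path_map) for item in node]
    if isinstance(node, str):
        return _translate(node, path_map)
    return node


def migrate_spec(spec: dict, path_map: dict[str, str] | None = None) -> dict:
    """Upgrade a v2.x spec dict to v3.0 and return a new dict."""
    if str(spec.get("version", "2.0")).startswith("3."):
        return copy.deepcopy(spec)
    migrated = _walk(spec, path_map or {})
    # V1 specs predate the toolchain section, so assume the NX/Nastran chain.
    migrated.setdefault("toolchain", dict(DEFAULT_TOOLCHAIN))
    migrated["version"] = "3.0"
    return migrated
```

Keeping the migrator a pure function (input untouched, new dict returned) makes it easy to dry-run over every V1 atomizer_spec.json and diff the results before anything is written back. Hook import paths are left alone here; they are better handled by an import shim (Section 6.3).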

6.2 V1 Study Continuation

Can a V2 installation continue an in-progress V1 study?

  • Optuna DB: needs same database path or migration
  • Study state: optimization_engine/study/state.py tracks progress — needs porting
  • Iteration results: stored in studies/*/ — path-dependent

The plan doesn't address mid-study migration. This may be acceptable if all V1 studies are completed before migration, but this should be an explicit decision.

6.3 Import Path Compatibility

The plan says "find-replace optimization_engine. → atomizer.", but:

  • V1 custom hooks may import from optimization_engine.*
  • User-created study scripts import V1 paths
  • NX journals may import from V1 paths

Recommendation: Consider a compatibility shim:

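```python
# optimization_engine/__init__.py (temporary)
import warnings

# stacklevel=2 attributes the warning to the importing module, not this shim
warnings.warn("optimization_engine is deprecated, use atomizer", DeprecationWarning, stacklevel=2)
# Re-exports top-level public names only; optimization_engine.core.* imports still fail
from atomizer import *  # noqa: E402,F403
```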
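A star re-export does not help an old hook that runs from optimization_engine.core.runner import Runner. One way to cover dotted imports is a meta path finder that aliases each optimization_engine.X to atomizer.X. A minimal sketch, assuming every old module path has a same-named counterpart under atomizer. The runner→engine refactor already breaks that assumption, so renamed modules would need explicit mapping. install_alias, _AliasFinder, and _AliasLoader are hypothetical names; the hooks they implement (importlib.abc.MetaPathFinder, importlib.abc.Loader, importlib.util.spec_from_loader) are standard library:

```python
import importlib
import importlib.abc
import importlib.util
import sys
import warnings


class _AliasLoader(importlib.abc.Loader):
    """Hands back an already-importable module under an old name."""

    def __init__(self, target: str) -> None:
        self.target = target
        self._real_spec = None

    def create_module(self, spec):
        warnings.warn(f"{spec.name} is deprecated, import {self.target} instead",
                      DeprecationWarning)
        module = importlib.import_module(self.target)
        self._real_spec = module.__spec__
        return module

    def exec_module(self, module) -> None:
        # The import system replaces __spec__ with the alias spec after
        # create_module; restore the real one so the module keeps its identity.
        module.__spec__ = self._real_spec


class _AliasFinder(importlib.abc.MetaPathFinder):
    """Redirects imports of `old` and `old.*` to `new` and `new.*`."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def find_spec(self, fullname, path, target=None):
        if fullname != self.old and not fullname.startswith(self.old + "."):
            return None
        new_name = self.new + fullname[len(self.old):]
        return importlib.util.spec_from_loader(fullname, _AliasLoader(new_name))


def install_alias(old: str, new: str) -> None:
    """Put the alias finder first on sys.meta_path (idempotent)."""
    for finder in sys.meta_path:
        if isinstance(finder, _AliasFinder) and finder.old == old:
            return
    sys.meta_path.insert(0, _AliasFinder(old, new))
```

Calling install_alias("optimization_engine", "atomizer") early, for example from atomizer/__init__.py, makes the old and new names resolve to the same module objects, so isinstance checks and module-level state stay consistent. DeprecationWarning is hidden by default outside __main__, so running transition tests with python -W default::DeprecationWarning shows which call sites still use old paths.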

7. Gaps — What Hasn't Been Considered

7.1 🔴 No Rollback Plan

If V2 migration fails at Phase 3, what's the recovery? V1 is still there (not archived until Phase 6), but there's no documented rollback procedure.

7.2 🟡 No Migration Verification Checklist

The "Success Criteria" (Section 13) are end-state checks. There's no per-phase verification that catches issues early. Each phase needs explicit "done when" criteria with test commands.
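A minimal sketch of per-phase "done when" checks kept as a runnable table. The phase name and the commands in PHASE_CHECKS are placeholders, and verify_phase is a hypothetical helper:

```python
from __future__ import annotations

import subprocess
import sys

# Placeholder "done when" checks; real commands depend on the V2 layout.
PHASE_CHECKS: dict[str, list[list[str]]] = {
    "phase-1-core": [
        [sys.executable, "-c", "import atomizer"],
        [sys.executable, "-m", "pytest", "tests", "-q"],
    ],
}


def verify_phase(phase: str, checks: dict[str, list[list[str]]] | None = None) -> list[str]:
    """Run every check for a phase and return the commands that failed."""
    table = PHASE_CHECKS if checks is None else checks
    failed: list[str] = []
    for command in table[phase]:
        try:
            result = subprocess.run(command, capture_output=True, text=True)
        except OSError:  # e.g. the executable is not installed
            failed.append(" ".join(command))
            continue
        if result.returncode != 0:
            failed.append(" ".join(command))
    return failed
```

An empty list from verify_phase("phase-1-core") then becomes the exit criterion for that phase, and the same table can run unchanged in CI.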

7.3 🟡 Environment/Dependencies

  • V1 uses requirements.txt + conda (atomizer env). V2 uses pyproject.toml.
  • How are V1 dependencies captured? Is there a pip freeze of the working V1 environment?
  • PyTorch + torch-geometric (for GNN) are notoriously version-sensitive. Pin versions.

7.4 🟡 Windows Path Handling

V1 was developed on Windows (NX is Windows-only). V2 development is on Linux. Cross-platform path handling (pathlib.Path vs string paths) needs systematic review, not just "update Windows paths in NX processor (if needed)."
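A first pass at that review can be automated by flagging string literals that look like Windows absolute paths. A minimal sketch using ast and a regular expression; hardcoded_windows_paths is a hypothetical helper. It reads only Python files, so NX journals written in other languages need a plain text search, and files that fail to parse raise SyntaxError:

```python
import ast
import re
from pathlib import Path

# Drive-letter paths such as C:\ or D:/ and UNC paths such as \\server\share.
WINDOWS_PATH = re.compile(r"^(?:[A-Za-z]:[\\/]|\\\\[^\\]+\\)")


def hardcoded_windows_paths(root: Path) -> list[tuple[Path, int, str]]:
    """List (file, line, literal) for string literals that look like Windows paths."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            # Also matches the literal parts of f-strings, which are Constant nodes.
            if isinstance(node, ast.Constant) and isinstance(node.value, str):
                if WINDOWS_PATH.match(node.value):
                    hits.append((path, node.lineno, node.value))
    return sorted(hits, key=lambda hit: (str(hit[0]), hit[1]))
```

Each hit is a candidate for a pathlib.Path built from configuration instead of a literal. If the NX journals are Python, the same scan covers the journal audit listed in Section 2.2.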

7.5 🟢 Documentation for config/ Migration

V1 has config/nx_config.json.template and config/optimization_config_template.json. These aren't mentioned in the migration plan. They should either map to V2's atomizer/spec/ or .env.example.

7.6 🟢 optimization_engine/schemas/ Contents

The plan says "Port schemas" but doesn't inventory what's in this directory. Should be checked.

7.7 🟢 Feature Registry

V1 has optimization_engine/feature_registry.json. Not mentioned in migration plan.


Summary Scorecard

| Criteria | Grade | Notes |
|---|---|---|
| Completeness | 🟡 C+ | ~60 of ~150 V1 files (~40%) explicitly mapped; 8+ subpackages missing |
| Risk Assessment | 🟡 B- | Good risks identified, but the utils/, context/, and study/ omissions are high-risk |
| Feasibility | 🟡 B- | 8 days → realistically 11-13 days |
| Architecture Alignment | A | Excellent match to AOM Component Map |
| Data Safety | 🟡 B | Solid .gitignore but missing some patterns; needs pre-commit hook |
| Backward Compatibility | 🟡 B- | Spec migration planned, but mid-study migration and import shims not addressed |
| Overall | 🟡 B- | Strong vision, solid architecture, but the execution plan has dangerous gaps in file inventory |

Recommendations (Priority Ordered)

  1. 🔴 IMMEDIATE: Create complete file inventory — Map every V1 .py file to V2 destination or explicit skip. ~2 hours, saves days. (find optimization_engine -name "*.py" | sort → spreadsheet with V2 destination column)

  2. 🔴 Add missing modules to migration table:

    • context/ → atomizer/context/ or merge into optimization/
    • study/ → atomizer/study/ (this is P0, not optional)
    • utils/ → atomizer/utils/ (infrastructure everything depends on)
    • plugins/ → merge with hooks/ or keep separate
    • validation/ → merge with spec/validator.py
    • intake/ → atomizer/intake/ or merge into interview/
  3. 🟡 Extend timeline to 12 days or explicitly reduce scope (e.g., "Phase 1 ports only the minimum for NX workflow; remaining modules in Phase 2")

  4. 🟡 Add per-phase verification commands (not just end-state criteria)

  5. 🟡 Add rollback procedure to Section 11

  6. 🟡 Pin dependency versions in pyproject.toml (especially PyTorch, torch-geometric)

  7. 🟡 Add pre-commit hook for file size enforcement (>1MB rejection)

  8. 🟢 Consider import compatibility shim for transition period

  9. 🟢 Investigate V1 tools/ size (462MB — what's in there?)

  10. 🟢 Decide on .jsonl tracking — knowledge base files should probably be tracked, session data should not


This is a strong plan with the right vision and principles. The architecture alignment is excellent. The gaps are execution-level — they're fixable before work begins. Fixing them now prevents the "oh wait, where does this module go?" problem that derails migrations mid-stream.

— Auditor 🔍, 2026-02-22