Atomizer/hq/workspaces/shared/reviews/v2-migration-audit.md

# V2 Migration Master Plan — Audit Report

**Auditor:** Auditor Agent 🔍
**Date:** 2026-02-22
**Document Reviewed:** `ATOMIZER-V2-MIGRATION-MASTERPLAN.md`
**Verdict:** 🟡 MAJOR issues found — plan is strong but has significant gaps that will cause problems during execution

---

## 1. Completeness — 🔴 CRITICAL GAPS

### 1.1 Missing V1 Modules (Not Accounted For)

The migration plan lists modules to port but **misses at least 8 significant V1 subpackages**. The table below lists all 13 unaccounted-for subpackages, including the lower-impact ones:

| V1 Module | Files | Purpose | Impact if Missed |
|-----------|-------|---------|-----------------|
| `optimization_engine/context/` | 7 files | Session state, compaction, feedback loop, playbook, reflector | 🔴 Core runtime functionality — sessions won't persist state |
| `optimization_engine/study/` | 8 files | Study creator, wizard, continuation, reset, benchmarking, state, history | 🔴 Can't create or manage studies without this |
| `optimization_engine/utils/` | 12 files | Logger, dashboard_db, trial_manager, NX file discovery, study archiver, realtime tracking | 🔴 Infrastructure that everything depends on |
| `optimization_engine/plugins/` | 4 files | hook_manager, hooks, validators (DIFFERENT from `hooks/`) | 🟡 Plugin system won't work |
| `optimization_engine/intake/` | 3 files | Config intake, context intake, processor | 🟡 Study intake pipeline broken |
| `optimization_engine/validation/` | 3 files | checker.py, gate.py (DIFFERENT from `validators/`) | 🟡 Validation gates lost |
| `optimization_engine/model_discovery/` | 2 files | NX model introspection | 🟡 Model discovery capability lost |
| `optimization_engine/devloop/` | 7 files | Analyzer, orchestrator, planning, test_runner, browser scenarios | 🟢 DevLoop was planned for `tools/devloop_cli.py` but the full subpackage has 7 files |
| `optimization_engine/processors/` | 2 files | adaptive_characterization.py | 🟡 V1 already has a `processors/` concept |
| `optimization_engine/future/` | 11 files | Research agents, LLM workflow analyzer, step classifier | 🟢 May be intentionally excluded, but not listed in "DO NOT MIGRATE" |
| `optimization_engine/custom_functions/` | 2 files | NX material generator | 🟢 Utility, should be documented |
| `optimization_engine/templates/` | 3 files | run_optimization_template, run_nn_optimization_template | 🟡 Template system for studies |
| `optimization_engine/surrogates/` | 1 file | `__init__.py` (separate from `gnn/`) | 🟢 Minor |

### 1.2 Missing V1 Core Files

| V1 File | Role | Plan Status |
|---------|------|-------------|
| `optimization_engine/core/base_runner.py` | Base class for runners | ❌ Not mentioned (plan only lists runner.py) |
| `optimization_engine/core/gradient_optimizer.py` | Gradient-based optimization | ❌ Not mentioned |
| `optimization_engine/core/runner_with_neural.py` | Neural-accelerated runner | ❌ Not mentioned |
| `optimization_engine/core/strategy_portfolio.py` | Strategy portfolio management | ❌ Not mentioned |
| `optimization_engine/core/strategy_selector.py` | Strategy selection (different from method_selector) | ❌ Not mentioned |
| `optimization_engine/schemas/` | Schema files | ✅ Mentioned but directory contents not inventoried |

### 1.3 Missing V1 Root-Level Files

| File | Status |
|------|--------|
| `atomizer.py` (25KB monolith) | Listed in "DO NOT MIGRATE" ✅ but its functionality needs a replacement |
| `launch_dashboard.py` | ❌ Not mentioned — how does V2 launch the dashboard? |
| `requirements.txt` | Replaced by pyproject.toml ✅ |
| `install.bat` | ❌ Not mentioned — Windows install script |

### 1.4 V1 Tools Directory

The plan only mentions `tools/devloop_cli.py`. V1 `tools/` has **25+ scripts** including:
- `analyze_study.py`, `find_best_iteration.py`, `archive_study.py`
- `create_pareto_graphs.py`, `generate_psd_figures.py`
- Zernike-specific tools (HTML generator, WFE PSD, optical report)
- Study migration tools

**Recommendation:** Create an inventory of tools/ and decide per-file: migrate, archive, or replace.

---

## 2. Risk Assessment — 🟡 MAJOR

### 2.1 Identified Risks (Plan Section 11)

The plan's risk table is reasonable but **underestimates these risks:**

| Risk | Plan's Mitigation | My Assessment |
|------|-------------------|---------------|
| Import breakage | Find-replace `optimization_engine.` → `atomizer.` | 🟡 **Insufficient.** V1 modules mix relative imports with absolute `optimization_engine.` imports that chain across packages (e.g., `from optimization_engine.core.runner import Runner`, where `runner.py` itself imports from `optimization_engine.extractors`). A mechanical find-replace rewrites the strings but cannot detect circular dependencies or imports that only execute at runtime. Need a test suite, not just sed. |
| NX integration breaks | Test on dalidou before archiving V1 | ✅ Adequate |
| `.gitignore` too aggressive | Test essential files | 🟡 See Data Safety section below |
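
As a complement to the find-replace, the import surface can be mapped before any file is edited. The following is a minimal sketch using the standard-library `ast` module; the function names `find_v1_imports` and `scan_tree` are illustrative and not part of the plan. It reports every absolute import of `optimization_engine` and flags those that are not at module top level (inside functions, conditionals, or `try` blocks), since those are the ones most likely to fail only at runtime. It does not see relative imports or string-based imports such as `importlib.import_module(...)`.

```python
import ast
from pathlib import Path

OLD_PACKAGE = "optimization_engine"  # V1 package name as used in the plan


def find_v1_imports(source: str, filename: str = "<string>") -> list[tuple[int, str, bool]]:
    """Return sorted (line, module, is_top_level) for each absolute import of OLD_PACKAGE."""
    tree = ast.parse(source, filename=filename)
    top_level = {id(node) for node in tree.body}
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative import
        for name in names:
            if name == OLD_PACKAGE or name.startswith(OLD_PACKAGE + "."):
                hits.append((node.lineno, name, id(node) in top_level))
    return sorted(hits)


def scan_tree(root: Path) -> None:
    """Print every V1 import found under root, one per line."""
    for path in sorted(root.rglob("*.py")):
        try:
            hits = find_v1_imports(path.read_text(encoding="utf-8"), str(path))
        except (SyntaxError, UnicodeDecodeError) as exc:
            print(f"{path}: skipped ({exc})")
            continue
        for line, module, is_top in hits:
            kind = "top-level" if is_top else "deferred"
            print(f"{path}:{line}: {module} ({kind})")
```

Running `scan_tree(Path("optimization_engine"))` from the V1 repository root would list each import with its location; the "deferred" entries are candidates for explicit test coverage.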

### 2.2 Unidentified Risks

| Risk | Severity | Mitigation Needed |
|------|----------|-------------------|
| **V1 `utils/` dependency web** — logger, trial_manager, dashboard_db are imported EVERYWHERE in V1. Where do they go in V2? | 🔴 HIGH | Create `atomizer/utils/` or distribute into appropriate modules. Map ALL import dependencies before porting. |
| **`context/` module loss** — session state, compaction, feedback loops. If not ported, studies can't resume, context is lost between runs | 🔴 HIGH | Add to migration table, decide V2 location |
| **`study/` module loss** — study creation wizard, continuation, reset. Without this, can't create studies from V2 | 🔴 HIGH | Add to migration table as P0 |
| **Optuna DB path changes** — V1 studies store Optuna databases at specific paths. V2 restructure may break study continuation | 🟡 MED | Test study continuation with path remapping |
| **NX journal path references** — NX journals may hardcode V1 paths | 🟡 MED | Audit all journal files for hardcoded paths |
| **Knowledge base `.jsonl` files** — are these tracked in git or gitignored? They're small (212KB) but grow over time | 🟡 MED | Clarify: track in git or gitignore with backup strategy |
| **Python version compatibility** — pyproject.toml says `>=3.10` but V1 may use patterns from 3.8/3.9 | 🟢 LOW | Test on target Python version |

---

## 3. Feasibility — 🟡 8-Day Timeline is Aggressive

### 3.1 Phase-by-Phase Assessment

| Phase | Planned | Realistic | Issue |
|-------|---------|-----------|-------|
| Phase 0: Bootstrap + AOM | 1 day | 1.5 days | AOM link conversion for 48 docs is tedious even with a script. Needs manual QA. |
| Phase 1: Core Engine | 2 days | 3-4 days | **Plan lists 13 steps but misses ~25 additional files** from `core/`, `context/`, `study/`, `utils/`. Refactoring runner→engine while maintaining all runner variants (base_runner, runner_with_neural) is non-trivial. |
| Phase 2: Supporting | 2 days | 2 days | Reasonable if scope is truly "direct port" |
| Phase 3: Integration | 2 days | 3 days | Import fixes across 100+ files. This is where the missing modules will surface. |
| Phase 4: Syncthing | 1 day | 1 day | Reasonable |
| Phase 5: GitHub + CI | 1 day | 0.5 days | Straightforward |
| Phase 6: Archive V1 | 1 day | 0.5 days | Straightforward |
| **Total** | **8 days** (plan's figure; the phase rows above sum to 10) | **11-13 days** | |

### 3.2 Key Bottleneck

**Phase 1 is underscoped.** The migration table shows 13 clean steps, but V1's `optimization_engine/` has **~150 Python files across 20 subpackages**. The plan only explicitly accounts for ~60 of these. The remaining ~90 files will surface during Phase 3 integration testing, causing scope creep and rework.

**Recommendation:** Before starting, create a complete file-level inventory mapping every V1 `.py` file to its V2 destination (or explicit "skip" decision). This takes ~2 hours but saves days of surprises.
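
One way to bootstrap that inventory is sketched below. It walks the V1 package, emits one row per `.py` file, and leaves the destination and decision columns blank for a human to fill in. The column names and the helper names (`build_inventory`, `write_inventory`, `unmapped`) are assumptions for illustration.

```python
import csv
from pathlib import Path

FIELDS = ["v1_path", "subpackage", "v2_destination", "decision"]


def build_inventory(v1_root: Path) -> list[dict[str, str]]:
    """One row per .py file under v1_root, with empty mapping columns."""
    rows = []
    for path in sorted(v1_root.rglob("*.py")):
        rel = path.relative_to(v1_root)
        rows.append({
            "v1_path": rel.as_posix(),
            "subpackage": rel.parts[0] if len(rel.parts) > 1 else "(root)",
            "v2_destination": "",
            "decision": "",  # to be filled in: migrate / archive / skip
        })
    return rows


def write_inventory(rows: list[dict[str, str]], out_file: Path) -> None:
    """Write the inventory as CSV so it can be opened as a spreadsheet."""
    with out_file.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def unmapped(rows: list[dict[str, str]]) -> list[str]:
    """Paths that still have no explicit decision."""
    return [row["v1_path"] for row in rows if not row["decision"].strip()]
```

A reasonable gate for starting Phase 1 would be that `unmapped(rows)` returns an empty list once the spreadsheet has been filled in and reloaded.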

---

## 4. Architecture Alignment — ✅ STRONG

### 4.1 AOM Component Map Match

The V2 structure maps well to the AOM's four pillars:

| AOM Component | V2 Location | Match |
|--------------|-------------|-------|
| Pillar 1 (Philosophy) | `docs/AOM/01-Philosophy/` | ✅ |
| Pillar 2 (Operations) | `docs/AOM/02-Operations/` | ✅ |
| Pillar 3 (Developer) | `docs/AOM/03-Developer/` | ✅ |
| Pillar 4 (Knowledge) | `docs/AOM/04-Knowledge/` | ✅ |
| Contracts | `atomizer/contracts/` | ✅ Matches AOM 03-Developer/08-Data-Contracts |
| Processors | `atomizer/processors/` | ✅ Matches AOM 03-Developer/09-Processor-Development |
| Orchestrator | `atomizer/orchestrator/` | ✅ Matches AOM 01-Philosophy/08-Tool-Agnostic |
| Extractors | `atomizer/extractors/` | ✅ Matches AOM 02-Operations/04-Extractor-Library |
| Protocols | `docs/protocols/` | ✅ Matches AOM 02-Operations/02-Protocol-Reference |

### 4.2 Minor Misalignments

| Issue | Severity |
|-------|----------|
| AOM has `Audit/` folder (2 docs) — plan places it under `docs/AOM/Audit/` ✅ | None |
| AOM Phase 4/5 docs (CLAUDE-v2, Living-Document-Protocol) need explicit V2 homes — plan addresses this in Section 4.4 ✅ | None |
| MCP servers are in V2 repo as `mcp_servers/` but AOM 03-Developer/10 suggests they could be separate repos | 🟢 Minor — decide later |

---

## 5. Data Safety — 🟡 NEEDS ATTENTION

### 5.1 .gitignore Assessment

**Good coverage for:**
- NX/solver binary files (`.sim`, `.prt`, `.fem`, `.bdf`, `.op2`, `.f06`, `.frd`)
- Python artifacts
- IDE files
- Study data directory

**Missing patterns:**

| Pattern | Risk | Recommendation |
|---------|------|---------------|
| `*.backup` / `*.bak` | Backup files could leak | Add `*.bak` and `*.backup` |
| `*.csv` | Large result CSVs from studies | Add or use `studies/` containment |
| `*.png` / `*.jpg` in study dirs | Iteration screenshots, contour plots | Covered by `studies/` gitignore ✅ |
| `*.sqlite` / `*.sqlite3` | Optuna databases | Add explicitly (`.db` covers some but not all) |
| `research_sessions/` | Knowledge base research data | Clarify if tracked |
| `*.jsonl` | Session insights grow unbounded | Clarify: should `knowledge/session_insights/*.jsonl` be tracked? |
| `*.whl` | Wheel files | Add |
| `*.tar.gz` / `*.zip` | Archives in tools/ | Not currently present but preventive |

### 5.2 Large File Risk

The plan correctly excludes `projects/` (99GB), `atomizer_field_training_data/` (68MB), and `tools/` (462MB, which is unexpectedly large for a scripts directory).

**Action item:** Investigate what's in V1 `tools/` that's 462MB. The plan lists it as "Large tool archives" — these could contaminate V2 if `tools/` is ported carelessly.

### 5.3 Success Criterion #9

> "No file larger than 1MB in git history (excluding initial dashboard assets)"

This is good but needs enforcement. **Recommendation:** Add a pre-commit hook or CI check that rejects files >1MB.
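
A minimal sketch of such a check follows. It assumes "1MB" means 1,000,000 bytes and that the script is invoked from a git pre-commit hook; both are assumptions, and the function names are illustrative. It inspects the working-tree size of staged files, so it guards new commits only and does not audit existing history.

```python
import subprocess
import sys
from pathlib import Path

MAX_BYTES = 1_000_000  # assumed reading of the "1MB" limit in success criterion #9


def oversized(paths, limit=MAX_BYTES):
    """Return (path, size) pairs for existing files larger than limit bytes."""
    result = []
    for path in map(Path, paths):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                result.append((str(path), size))
    return result


def staged_files():
    """Names of files added, copied, or modified in the git index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [name for name in out.split("\0") if name]


def main() -> int:
    """Return 1 (reject the commit) if any staged file exceeds the limit."""
    offenders = oversized(staged_files())
    for path, size in offenders:
        print(f"{path}: {size} bytes exceeds {MAX_BYTES}", file=sys.stderr)
    return 1 if offenders else 0
```

A hook script would end with `sys.exit(main())`. The same `oversized` function could be reused in CI against the full file list, with an allowlist for the dashboard assets the criterion exempts.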

---

## 6. Backward Compatibility — 🟡 RISKS EXIST

### 6.1 AtomizerSpec v2→v3 Migration

The plan mentions `atomizer/spec/migrator.py` for v2.0→v3.0 migration. This is critical.

**Key question:** What happens when a V1 `atomizer_spec.json` is loaded?
- V1 specs have no `toolchain` section → must default to `NX/NX mesher/Nastran`
- V1 specs use `optimization_engine.*` import paths in custom hooks → must still work
- V1 specs may reference absolute paths on dalidou → need path translation
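
The first two points could be handled along the following lines. This is a sketch only: the structure of the `toolchain` section and its key names (`cad`, `mesher`, `solver`) are assumptions, as are the function names; the real `atomizer/spec/migrator.py` may look quite different.

```python
import copy

# Default named above: NX / NX mesher / Nastran. The key names are hypothetical.
DEFAULT_TOOLCHAIN = {"cad": "NX", "mesher": "NX mesher", "solver": "Nastran"}


def migrate_spec(spec: dict) -> dict:
    """Return a copy of a V1 spec with a default toolchain section if none exists."""
    migrated = copy.deepcopy(spec)
    migrated.setdefault("toolchain", dict(DEFAULT_TOOLCHAIN))
    return migrated


def legacy_import_refs(value, path=""):
    """Yield (location, string) for every string that mentions the V1 package."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from legacy_import_refs(item, f"{path}.{key}" if path else str(key))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from legacy_import_refs(item, f"{path}[{index}]")
    elif isinstance(value, str) and "optimization_engine." in value:
        yield path, value
```

`migrate_spec` leaves an existing `toolchain` section untouched, and `legacy_import_refs` only reports where V1 import paths occur; whether to rewrite them or rely on a compatibility shim remains a design decision.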

### 6.2 V1 Study Continuation

Can a V2 installation continue an in-progress V1 study?
- Optuna DB: needs same database path or migration
- Study state: `optimization_engine/study/state.py` tracks progress — needs porting
- Iteration results: stored in `studies/*/` — path-dependent

**The plan doesn't address mid-study migration.** This may be acceptable if all V1 studies are completed before migration, but this should be an explicit decision.

### 6.3 Import Path Compatibility

The plan says "find-replace `optimization_engine.` → `atomizer.`" but:
- V1 custom hooks may import from `optimization_engine.*`
- User-created study scripts import V1 paths
- NX journals may import from V1 paths

**Recommendation:** Consider a compatibility shim:
```python
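# optimization_engine/__init__.py (temporary)
import warnings
# stacklevel=2 points the warning at the importing module rather than this file.
warnings.warn("optimization_engine is deprecated, use atomizer", DeprecationWarning, stacklevel=2)
# Re-exports top-level names only; `optimization_engine.<submodule>` imports need separate aliases.
from atomizer import *  # noqa: F401,F403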
```
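
A star import re-exports only the names defined at the top level of `atomizer`, so statements like `from optimization_engine.core.runner import Runner` would still fail. One possible extension, sketched below, registers every submodule of the new package under the old name in `sys.modules`. It assumes the V2 layout mirrors the V1 module paths (which the plan's restructuring may not preserve), and `alias_package` is a hypothetical helper. It imports every submodule eagerly, which costs startup time and runs all import-time side effects.

```python
import importlib
import pkgutil
import sys


def alias_package(old: str, new: str) -> list[str]:
    """Register each module of package `new` under the matching `old` name too."""
    package = importlib.import_module(new)
    sys.modules[old] = package
    aliased = [old]
    for info in pkgutil.walk_packages(package.__path__, prefix=new + "."):
        module = importlib.import_module(info.name)
        alias = old + info.name[len(new):]
        sys.modules[alias] = module  # same module object, second name
        aliased.append(alias)
    return aliased
```

Called once at process start, for example `alias_package("optimization_engine", "atomizer")`, this makes old-style imports resolve to the same module objects as the new paths, so class identity is preserved. Modules that moved or were renamed in V2 would still need explicit per-module entries.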

---

## 7. Gaps — What Hasn't Been Considered

### 7.1 🔴 No Rollback Plan
If V2 migration fails at Phase 3, what's the recovery? V1 is still there (not archived until Phase 6), but there's no documented rollback procedure.

### 7.2 🟡 No Migration Verification Checklist
The "Success Criteria" (Section 13) are end-state checks. There's no per-phase verification that catches issues early. Each phase needs explicit "done when" criteria with test commands.
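
As one possible shape for a "done when" check, the sketch below verifies that the modules a phase is supposed to deliver can at least be located by the import system. The phase-to-module mapping is hypothetical (the package names come from the V2 locations cited in this audit, but their assignment to phases is an assumption), and locating a module is a weaker guarantee than running its tests.

```python
import importlib.util

# Hypothetical mapping; the real list belongs in the plan's per-phase criteria.
PHASE_MODULES = {
    "phase1": ["atomizer.contracts", "atomizer.processors", "atomizer.orchestrator"],
    "phase2": ["atomizer.extractors", "atomizer.spec"],
}


def missing_modules(names):
    """Return the module names that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no spec.
            found = False
        if not found:
            missing.append(name)
    return missing
```

A phase would count as verifiable when `missing_modules(PHASE_MODULES["phase1"])` is empty and the corresponding test suite passes.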

### 7.3 🟡 Environment/Dependencies
- V1 uses `requirements.txt` + conda (`atomizer` env). V2 uses `pyproject.toml`.
- How are V1 dependencies captured? Is there a `pip freeze` of the working V1 environment?
- PyTorch + torch-geometric (for GNN) are notoriously version-sensitive. Pin versions.
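
To capture what the working V1 environment actually has installed, a snapshot along these lines could be run inside the `atomizer` conda env before migration. It is a sketch using the standard-library `importlib.metadata`; the helper names are illustrative, and the distribution names passed in would need to match how the packages are published.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_pins(versions):
    """Format found versions as exact `name==version` requirement strings."""
    return [
        f"{name}=={version}"
        for name, version in sorted(versions.items())
        if version is not None
    ]
```

For example, `as_pins(installed_versions(["torch", "torch-geometric", "optuna"]))` would produce lines suitable for review before they are copied into `pyproject.toml`.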

### 7.4 🟡 Windows Path Handling
V1 was developed on Windows (NX is Windows-only). V2 development is on Linux. Cross-platform path handling (`pathlib.Path` vs string paths) needs systematic review, not just "update Windows paths in NX processor (if needed)."
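
Part of that review is deciding how stored Windows paths are translated when read on Linux. A minimal sketch with `pathlib` follows; `translate_path` and both root arguments are hypothetical, and the actual mapping between machines is not specified in the plan. `PureWindowsPath` is used because it parses Windows paths correctly on any platform.

```python
from pathlib import PurePosixPath, PureWindowsPath


def translate_path(raw: str, windows_root: str, posix_root: str) -> str:
    """Re-root a Windows path under a POSIX directory; return other paths unchanged."""
    try:
        relative = PureWindowsPath(raw).relative_to(PureWindowsPath(windows_root))
    except ValueError:
        return raw  # not located under windows_root
    return str(PurePosixPath(posix_root).joinpath(*relative.parts))
```

Because `PureWindowsPath` comparisons are case-insensitive, `c:\atomizer\...` and `C:\Atomizer\...` are treated as the same root, which string-based prefix checks would miss.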

### 7.5 🟢 Documentation for `config/` Migration
V1 has `config/nx_config.json.template` and `config/optimization_config_template.json`. These aren't mentioned in the migration plan. They should either map to V2's `atomizer/spec/` or `.env.example`.

### 7.6 🟢 `optimization_engine/schemas/` Contents
The plan says "Port schemas" but doesn't inventory what's in this directory. Should be checked.

### 7.7 🟢 Feature Registry
V1 has `optimization_engine/feature_registry.json`. Not mentioned in migration plan.

---

## Summary Scorecard

| Criteria | Grade | Notes |
|----------|-------|-------|
| **Completeness** | 🟡 C+ | Only ~60 of ~150 V1 files (about 40%) explicitly mapped. 8+ subpackages missing. |
| **Risk Assessment** | 🟡 B- | Good risks identified, but `utils/`, `context/`, `study/` omissions are high-risk |
| **Feasibility** | 🟡 B- | 8 days → realistically 11-13 days |
| **Architecture Alignment** | ✅ A | Excellent match to AOM Component Map |
| **Data Safety** | 🟡 B | Solid .gitignore but missing some patterns; needs pre-commit hook |
| **Backward Compatibility** | 🟡 B- | Spec migration planned but mid-study and import shims not addressed |
| **Overall** | 🟡 B- | Strong vision, solid architecture, but execution plan has dangerous gaps in file inventory |

---

## Recommendations (Priority Ordered)

1. **🔴 IMMEDIATE: Create complete file inventory** — Map every V1 `.py` file to V2 destination or explicit skip. ~2 hours, saves days. (`find optimization_engine -name "*.py" | sort` → spreadsheet with V2 destination column)

2. **🔴 Add missing modules to migration table:**
   - `context/` → `atomizer/context/` or merge into `optimization/`
   - `study/` → `atomizer/study/` (this is P0, not optional)
   - `utils/` → `atomizer/utils/` (infrastructure everything depends on)
   - `plugins/` → merge with `hooks/` or separate
   - `validation/` → merge with `spec/validator.py`
   - `intake/` → `atomizer/intake/` or merge into `interview/`

3. **🟡 Extend timeline to 12 days** or explicitly reduce scope (e.g., "Phase 1 ports only the minimum for NX workflow; remaining modules in Phase 2")

4. **🟡 Add per-phase verification commands** (not just end-state criteria)

5. **🟡 Add rollback procedure** to Section 11

6. **🟡 Pin dependency versions** in pyproject.toml (especially PyTorch, torch-geometric)

7. **🟡 Add pre-commit hook** for file size enforcement (>1MB rejection)

8. **🟢 Consider import compatibility shim** for transition period

9. **🟢 Investigate V1 `tools/` size** (462MB — what's in there?)

10. **🟢 Decide on `.jsonl` tracking** — knowledge base files should probably be tracked, session data should not

---

*This is a strong plan with the right vision and principles. The architecture alignment is excellent. The gaps are execution-level — they're fixable before work begins. Fixing them now prevents the "oh wait, where does this module go?" problem that derails migrations mid-stream.*

*— Auditor 🔍, 2026-02-22*