Atomizer/knowledge_base/lac/session_insights/failure.jsonl

{"timestamp":"2025-12-17T20:30:00","category":"failure","context":"Killed NX process (ugraf.exe PID 111040) without permission while trying to extract expressions","insight":"CRITICAL RULE VIOLATION: Never kill NX (ugraf.exe) or any user process directly. The NXSessionManager exists specifically to track which NX sessions Atomizer started vs user sessions. Only use manager.close_nx_if_allowed() which checks can_close_nx() before terminating. Direct Stop-Process or taskkill on ugraf.exe is FORBIDDEN unless the session manager confirms we started that PID.","confidence":1.0,"tags":["nx","process-management","safety","critical","session-manager"],"severity":"critical","rule":"NEVER use Stop-Process, taskkill, or any direct process termination on ugraf.exe. Always use NXSessionManager.close_nx_if_allowed() which only closes sessions we started."}
{"timestamp":"2025-12-17T20:40:00","category":"failure","context":"Created m1_mirror_cost_reduction_V2 study without README.md despite OP_01 protocol clearly requiring it","insight":"EXECUTION FAILURE: The protocol OP_01_CREATE_STUDY.md already listed README.md as a required output, but I failed to follow my own documentation. This is a process discipline issue, not a knowledge gap. The fix is NOT to add more documentation (it was already there), but to use TodoWrite to track ALL required outputs during study creation and verify completion before declaring done. When creating a study, the todo list MUST include: (1) optimization_config.json, (2) run_optimization.py, (3) README.md, (4) STUDY_REPORT.md - and mark study creation complete ONLY after all 4 are done.","confidence":1.0,"tags":["study-creation","documentation","readme","process-discipline","todowrite"],"severity":"high","rule":"When creating a study, add ALL required files to TodoWrite checklist and verify each is created before marking task complete. The protocol exists - FOLLOW IT."}
{"timestamp":"2025-12-19T10:00:00","category":"workaround","context":"NX journal execution via cmd /c with environment variables fails silently or produces garbled output. Multiple attempts with cmd /c SET and && chaining failed to capture run_journal.exe output.","insight":"CRITICAL WORKAROUND: When executing NX journals from Claude Code on Windows, use PowerShell with [Environment]::SetEnvironmentVariable() method instead of cmd /c or $env: syntax. The correct pattern is: powershell -Command \"[Environment]::SetEnvironmentVariable('SPLM_LICENSE_SERVER', '28000@dalidou;28000@100.80.199.40', 'Process'); & 'C:\\Program Files\\Siemens\\DesigncenterNX2512\\NXBIN\\run_journal.exe' 'journal.py' -args 'arg1' 'arg2' 2>&1\". The $env: syntax gets corrupted when passed through bash (colon gets interpreted). The cmd /c SET syntax often fails to capture output. This PowerShell pattern reliably sets license server and captures all output.","confidence":1.0,"tags":["nx","powershell","run_journal","license-server","windows","cmd-workaround"],"severity":"high","rule":"ALWAYS use PowerShell with [Environment]::SetEnvironmentVariable() for NX journal execution. NEVER use cmd /c SET or $env: syntax for setting SPLM_LICENSE_SERVER."}
{"timestamp":"2025-12-19T15:30:00","category":"failure","context":"CMA-ES optimization V7 started with random sample instead of baseline. First trial had whiffle_min=45.73 instead of baseline 62.75, resulting in WS=329 instead of expected ~281.","insight":"CMA-ES with Optuna CmaEsSampler does NOT evaluate x0 (baseline) first - it samples AROUND x0 with sigma0 step size. The x0 parameter only sets the CENTER of the initial sampling distribution, not the first trial. To ensure baseline is evaluated first, use study.enqueue_trial(x0) after creating the study. This is critical for refinement studies where you need to compare against a known-good baseline. Pattern: if len(study.trials) == 0: study.enqueue_trial(x0)","confidence":1.0,"tags":["cma-es","optuna","baseline","x0","enqueue","optimization"],"severity":"high","rule":"When using CmaEsSampler with a known baseline, ALWAYS enqueue the baseline as trial 0 using study.enqueue_trial(x0). The x0 parameter alone does NOT guarantee baseline evaluation."}
{"timestamp":"2025-12-22T14:00:00","category":"failure","context":"V10 mirror optimization reported impossibly good relative WFE values (40-20=1.99nm instead of ~6nm, 60-20=6.82nm instead of ~13nm). User noticed results were 'too good to be true'.","insight":"CRITICAL BUG IN RELATIVE WFE CALCULATION: The V10 run_optimization.py computed relative WFE as abs(RMS_target - RMS_ref) instead of RMS(WFE_target - WFE_ref). This is mathematically WRONG because |RMS(A) - RMS(B)| ≠ RMS(A - B). The correct approach is to compute the node-by-node WFE difference FIRST, then fit Zernike to the difference field, then compute RMS. The bug gave values 3-4x lower than correct values because the 20° reference had HIGHER absolute WFE than 40°/60°, so the subtraction gave negative values, and abs() hid the problem. The fix is to use extractor.extract_relative() which correctly computes node-by-node differences. Both ZernikeExtractor and ZernikeOPDExtractor now have extract_relative() methods.","confidence":1.0,"tags":["zernike","wfe","relative-wfe","extract_relative","critical-bug","v10"],"severity":"critical","rule":"NEVER compute relative WFE as abs(RMS_target - RMS_ref). ALWAYS use extract_relative() which computes RMS(WFE_target - WFE_ref) by doing node-by-node subtraction first, then Zernike fitting, then RMS."}
{"timestamp":"2025-12-28T17:30:00","category":"failure","context":"V5 turbo optimization created from scratch instead of copying V4. Multiple critical components were missing or wrong: no license server, wrong extraction keys (filtered_rms_nm vs relative_filtered_rms_nm), wrong mfg_90 key, missing figure_path parameter, incomplete version regex.","insight":"STUDY DERIVATION FAILURE: When creating a new study version (V5 from V4), NEVER rewrite the run_optimization.py from scratch. ALWAYS copy the working version first, then add/modify only the new feature (e.g., L-BFGS polish). Rewriting caused 5 independent bugs: (1) missing LICENSE_SERVER setup, (2) wrong extraction key filtered_rms_nm instead of relative_filtered_rms_nm, (3) wrong mfg_90 key, (4) missing figure_path=None in extractor call, (5) incomplete version regex missing DesigncenterNX pattern. The FEA/extraction pipeline is PROVEN CODE - never rewrite it. Only add new optimization strategies as modules on top.","confidence":1.0,"tags":["study-creation","copy-dont-rewrite","extraction","license-server","v5","critical"],"severity":"critical","rule":"When deriving a new study version, COPY the entire working run_optimization.py first. Add new features as ADDITIONS, not rewrites. The FEA pipeline (license, NXSolver setup, extraction) is proven - never rewrite it."}
{"timestamp":"2025-12-28T21:30:00","category":"failure","context":"V5 flat back turbo optimization with MLP surrogate + L-BFGS polish. Surrogate predicted WS~280 but actual FEA gave WS~365-377. Error of 85-96 (30%+ relative error). All L-BFGS solutions converged to same fake optimum that didn't exist in reality.","insight":"SURROGATE + L-BFGS FAILURE MODE: Gradient-based optimization on MLP surrogates finds 'fake optima' that don't exist in real FEA. The surrogate has smooth gradients everywhere, but L-BFGS descends to regions OUTSIDE the training distribution where predictions are wildly wrong. V5 results: (1) Best TPE trial: WS=290.18, (2) Best L-BFGS trial: WS=325.27, (3) Worst L-BFGS trials: WS=376.52. The fancy L-BFGS polish made results WORSE than random TPE. Key issues: (a) No uncertainty quantification - can't detect out-of-distribution, (b) No mass constraint in surrogate - L-BFGS finds infeasible designs (122-124kg vs 120kg limit), (c) L-BFGS converges to same bad point from multiple starting locations (trials 31-44 all gave WS=376.52).","confidence":1.0,"tags":["surrogate","mlp","lbfgs","gradient-descent","fake-optima","out-of-distribution","v5","turbo"],"severity":"critical","rule":"NEVER trust gradient descent on surrogates without: (1) Uncertainty quantification to reject OOD predictions, (2) Mass/constraint prediction to enforce feasibility, (3) Trust-region to stay within training distribution. Pure TPE with real FEA often beats surrogate+gradient methods."}
{"timestamp": "2025-12-29T15:29:55.869508", "category": "failure", "context": "Trial 5 solver error", "insight": "convergence_failure: Convergence failure at iteration 100", "confidence": 0.7, "tags": ["solver", "convergence_failure", "automatic"]}