Atomizer/knowledge_base/lac/session_insights/failure.jsonl

{"timestamp":"2025-12-17T20:30:00","category":"failure","context":"Killed NX process (ugraf.exe PID 111040) without permission while trying to extract expressions","insight":"CRITICAL RULE VIOLATION: Never kill NX (ugraf.exe) or any user process directly. The NXSessionManager exists specifically to track which NX sessions Atomizer started vs user sessions. Only use manager.close_nx_if_allowed() which checks can_close_nx() before terminating. Direct Stop-Process or taskkill on ugraf.exe is FORBIDDEN unless the session manager confirms we started that PID.","confidence":1.0,"tags":["nx","process-management","safety","critical","session-manager"],"severity":"critical","rule":"NEVER use Stop-Process, taskkill, or any direct process termination on ugraf.exe. Always use NXSessionManager.close_nx_if_allowed() which only closes sessions we started."}
{"timestamp":"2025-12-17T20:40:00","category":"failure","context":"Created m1_mirror_cost_reduction_V2 study without README.md despite OP_01 protocol clearly requiring it","insight":"EXECUTION FAILURE: The protocol OP_01_CREATE_STUDY.md already listed README.md as a required output, but I failed to follow my own documentation. This is a process discipline issue, not a knowledge gap. The fix is NOT to add more documentation (it was already there), but to use TodoWrite to track ALL required outputs during study creation and verify completion before declaring done. When creating a study, the todo list MUST include: (1) optimization_config.json, (2) run_optimization.py, (3) README.md, (4) STUDY_REPORT.md - and mark study creation complete ONLY after all 4 are done.","confidence":1.0,"tags":["study-creation","documentation","readme","process-discipline","todowrite"],"severity":"high","rule":"When creating a study, add ALL required files to TodoWrite checklist and verify each is created before marking task complete. The protocol exists - FOLLOW IT."}
{"timestamp":"2025-12-19T10:00:00","category":"workaround","context":"NX journal execution via cmd /c with environment variables fails silently or produces garbled output. Multiple attempts with cmd /c SET and && chaining failed to capture run_journal.exe output.","insight":"CRITICAL WORKAROUND: When executing NX journals from Claude Code on Windows, use PowerShell with the [Environment]::SetEnvironmentVariable() method instead of cmd /c or $env: syntax. The correct pattern is: powershell -Command \"[Environment]::SetEnvironmentVariable('SPLM_LICENSE_SERVER', '28000@dalidou;28000@100.80.199.40', 'Process'); & 'C:\\Program Files\\Siemens\\DesigncenterNX2512\\NXBIN\\run_journal.exe' 'journal.py' -args 'arg1' 'arg2' 2>&1\". The $env: syntax gets corrupted when passed through bash (bash expands $env as a shell variable before PowerShell receives the command). The cmd /c SET syntax often fails to capture output. This PowerShell pattern reliably sets the license server and captures all output.","confidence":1.0,"tags":["nx","powershell","run_journal","license-server","windows","cmd-workaround"],"severity":"high","rule":"ALWAYS use PowerShell with [Environment]::SetEnvironmentVariable() for NX journal execution. NEVER use cmd /c SET or $env: syntax for setting SPLM_LICENSE_SERVER."}
{"timestamp":"2025-12-19T15:30:00","category":"failure","context":"CMA-ES optimization V7 started with random sample instead of baseline. First trial had whiffle_min=45.73 instead of baseline 62.75, resulting in WS=329 instead of expected ~281.","insight":"CMA-ES with Optuna CmaEsSampler does NOT evaluate x0 (baseline) first - it samples AROUND x0 with sigma0 step size. The x0 parameter only sets the CENTER of the initial sampling distribution, not the first trial. To ensure baseline is evaluated first, use study.enqueue_trial(x0) after creating the study. This is critical for refinement studies where you need to compare against a known-good baseline. Pattern: if len(study.trials) == 0: study.enqueue_trial(x0)","confidence":1.0,"tags":["cma-es","optuna","baseline","x0","enqueue","optimization"],"severity":"high","rule":"When using CmaEsSampler with a known baseline, ALWAYS enqueue the baseline as trial 0 using study.enqueue_trial(x0). The x0 parameter alone does NOT guarantee baseline evaluation."}
{"timestamp":"2025-12-22T14:00:00","category":"failure","context":"V10 mirror optimization reported impossibly good relative WFE values (40-20=1.99nm instead of ~6nm, 60-20=6.82nm instead of ~13nm). User noticed results were 'too good to be true'.","insight":"CRITICAL BUG IN RELATIVE WFE CALCULATION: The V10 run_optimization.py computed relative WFE as abs(RMS_target - RMS_ref) instead of RMS(WFE_target - WFE_ref). This is mathematically WRONG because |RMS(A) - RMS(B)| ≠ RMS(A - B). The correct approach is to compute the node-by-node WFE difference FIRST, then fit Zernike to the difference field, then compute RMS. The bug gave values roughly 2-3x lower than correct values because the 20° reference had HIGHER absolute WFE than 40°/60°, so the subtraction gave negative values, and abs() hid the problem. The fix is to use extractor.extract_relative() which correctly computes node-by-node differences. Both ZernikeExtractor and ZernikeOPDExtractor now have extract_relative() methods.","confidence":1.0,"tags":["zernike","wfe","relative-wfe","extract_relative","critical-bug","v10"],"severity":"critical","rule":"NEVER compute relative WFE as abs(RMS_target - RMS_ref). ALWAYS use extract_relative() which computes RMS(WFE_target - WFE_ref) by doing node-by-node subtraction first, then Zernike fitting, then RMS."}