# NX Session Management
**Status**: Implemented
**Version**: 1.0
**Date**: 2025-11-20
## Problem
When running multiple optimizations concurrently or when a user has NX open for manual work, conflicts can occur:
1. **Multiple Optimizations**: Two optimization studies trying to modify the same model simultaneously
2. **User's Interactive NX**: Batch optimization interfering with user's manual work
3. **File Corruption**: Concurrent writes to .prt/.sim files causing corruption
4. **License Conflicts**: Multiple NX instances competing for licenses
5. **Journal Failures**: Journals trying to run on wrong NX session
## Solution: NX Session Manager
The `NXSessionManager` class prevents these conflicts by detecting running NX sessions, locking model files, and limiting the number of concurrent NX batch sessions.
### Key Features
1. **Session Detection**
   - Detects all running NX processes (interactive + batch)
   - Identifies interactive vs batch sessions
   - Warns if user has NX open
2. **File Locking**
   - Exclusive locks on model files (.prt)
   - Prevents two optimizations from modifying same model
   - Queues trials if model is locked
3. **Process Queuing**
   - Limits concurrent NX batch sessions (default: 1)
   - Waits if max sessions reached
   - Automatic timeout and error handling
4. **Stale Lock Cleanup**
   - Detects crashed processes
   - Removes orphaned lock files
   - Prevents permanent deadlocks
## Architecture
### Session Manager Components
```python
from pathlib import Path

from optimization_engine.nx_session_manager import NXSessionManager

# Initialize
session_mgr = NXSessionManager(
    lock_dir=Path.home() / ".atomizer" / "locks",
    max_concurrent_sessions=1,  # Max parallel NX instances
    wait_timeout=300,           # Max wait time in seconds (5 min)
    verbose=True,
)
```
### Two-Level Locking
**Level 1: Model File Lock** (most important)
```python
# Ensures exclusive access to a specific model
with session_mgr.acquire_model_lock(prt_file, study_name):
    # Update CAD model
    updater.update_expressions(params)
    # Run simulation
    result = solver.run_simulation(sim_file)
```
**Level 2: NX Session Lock** (optional)
```python
# Limits total concurrent NX batch instances
with session_mgr.acquire_nx_session(study_name):
    # Run NX batch operation
    pass
```
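
To make the locking idea concrete, here is a minimal sketch of how a model-lock context manager with a timeout could work. It is not the `NXSessionManager` implementation: it swaps OS-level file locks for atomic lock-file creation (`os.O_CREAT | os.O_EXCL`), and the names `model_lock` and `poll_interval` are hypothetical.

```python
import contextlib
import os
import time
from pathlib import Path


@contextlib.contextmanager
def model_lock(lock_dir, model_path, study_name, wait_timeout=300.0, poll_interval=0.5):
    """Hold an exclusive lock file for model_path (illustrative sketch only)."""
    lock_dir = Path(lock_dir)
    lock_dir.mkdir(parents=True, exist_ok=True)
    # Keyed by file name only; a real implementation might hash the full path.
    lock_path = lock_dir / (Path(model_path).name + ".lock")
    deadline = time.monotonic() + wait_timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{lock_path.name} still held after {wait_timeout} s")
            time.sleep(poll_interval)
    try:
        # Record the owner so a leftover lock can be traced to a process.
        with os.fdopen(fd, "w") as handle:
            handle.write(f"{os.getpid()} {study_name}\n")
        yield lock_path
    finally:
        lock_path.unlink(missing_ok=True)
```

A second caller asking for the same model polls until the first caller leaves its `with` block, or raises `TimeoutError` once `wait_timeout` has elapsed.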
## Usage Examples
### Example 1: Single Optimization (Recommended)
```python
from optimization_engine.nx_solver import NXSolver
from optimization_engine.nx_updater import NXParameterUpdater
from optimization_engine.nx_session_manager import NXSessionManager

# Initialize components
session_mgr = NXSessionManager(verbose=True)
updater = NXParameterUpdater("model.prt")
solver = NXSolver()

# Check for interactive NX sessions
if session_mgr.is_nx_interactive_session_running():
    print("WARNING: NX is open! Close it before running optimization.")
    # You can choose to abort or continue

# Run trials with session management
for trial in trials:
    with session_mgr.acquire_model_lock(prt_file, "my_study"):
        # Exclusive access to model - safe to modify
        updater.update_expressions(params)
        result = solver.run_simulation(sim_file)
```
### Example 2: Multiple Concurrent Optimizations
```python
# Study A (in one terminal)
session_mgr_A = NXSessionManager()
with session_mgr_A.acquire_model_lock(model_A_prt, "study_A"):
    # Works on model A
    updater_A.update_expressions(params_A)
    solver_A.run_simulation(sim_A)

# Study B (in another terminal, simultaneously)
session_mgr_B = NXSessionManager()
with session_mgr_B.acquire_model_lock(model_B_prt, "study_B"):
    # Works on model B (different model - no conflict)
    updater_B.update_expressions(params_B)
    solver_B.run_simulation(sim_B)

# If they try to use SAME model:
with session_mgr_A.acquire_model_lock(model_SAME, "study_A"):
    pass  # Acquires lock

with session_mgr_B.acquire_model_lock(model_SAME, "study_B"):
    # Waits here until study_A releases lock
    # Then proceeds safely
    pass
```
### Example 3: Protection Against User's Interactive NX
```python
session_mgr = NXSessionManager(verbose=True)

# Detect if user has NX open
nx_sessions = session_mgr.get_running_nx_sessions()
for session in nx_sessions:
    print(f"Detected: {session.name} (PID {session.pid})")

if session_mgr.is_nx_interactive_session_running():
    print("Interactive NX session detected!")
    print("Recommend closing NX before running optimization.")
    # Option 1: Abort
    raise RuntimeError("Close NX and try again")
    # Option 2: Continue with a warning (use this instead of the raise above)
    # print("Continuing anyway... (may cause conflicts)")
```
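
As a rough illustration of how interactive and batch sessions might be told apart, the sketch below classifies processes by executable name. The mapping is an assumption based on the sample status report in this document (`ugraf.exe` treated as the interactive GUI, `run_journal.exe` as a batch journal runner); `classify_nx_processes` and `snapshot_processes` are hypothetical helpers, not part of `NXSessionManager`.

```python
from typing import Iterable, List, NamedTuple, Tuple

# Assumed executable names, taken from the sample status report in this document.
INTERACTIVE_NAMES = {"ugraf.exe"}
BATCH_NAMES = {"run_journal.exe"}


class NXProcess(NamedTuple):
    pid: int
    name: str
    interactive: bool


def classify_nx_processes(processes: Iterable[Tuple[int, str]]) -> List[NXProcess]:
    """Keep only NX processes and flag each one as interactive or batch."""
    found = []
    for pid, name in processes:
        lowered = (name or "").lower()
        if lowered in INTERACTIVE_NAMES:
            found.append(NXProcess(pid, name, True))
        elif lowered in BATCH_NAMES:
            found.append(NXProcess(pid, name, False))
    return found


def snapshot_processes() -> List[Tuple[int, str]]:
    """Return (pid, name) pairs for all running processes; requires psutil."""
    import psutil  # imported lazily so the classifier works without it

    return [(p.info["pid"], p.info["name"]) for p in psutil.process_iter(["pid", "name"])]
```

Calling `classify_nx_processes(snapshot_processes())` and checking `any(p.interactive for p in result)` gives a yes/no answer comparable to `is_nx_interactive_session_running()`.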
## Configuration
### Lock Directory
Default: `~/.atomizer/locks/`
Custom:
```python
session_mgr = NXSessionManager(
    lock_dir=Path("/custom/lock/dir")
)
```
### Concurrent Session Limit
Default: 1 (safest)
Allow multiple:
```python
session_mgr = NXSessionManager(
    max_concurrent_sessions=2  # Allow 2 parallel NX batches
)
```
**Warning**: Multiple concurrent NX sessions require multiple licenses!
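
One way to picture the session limit is as a fixed number of numbered slots, each represented by a lock file. The sketch below assumes that model; the helper names and the `nx_session_<n>.lock` naming are hypothetical, and the real manager may count sessions differently.

```python
import os
from pathlib import Path
from typing import Optional


def try_acquire_session_slot(lock_dir, max_concurrent_sessions=1) -> Optional[Path]:
    """Claim the first free session slot, or return None if all are taken."""
    lock_dir = Path(lock_dir)
    lock_dir.mkdir(parents=True, exist_ok=True)
    for slot in range(max_concurrent_sessions):
        slot_path = lock_dir / f"nx_session_{slot}.lock"
        try:
            # Atomic create: fails if another process already owns this slot.
            fd = os.open(slot_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # slot taken, try the next one
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        return slot_path
    return None


def release_session_slot(slot_path: Path) -> None:
    """Free a slot previously returned by try_acquire_session_slot."""
    slot_path.unlink(missing_ok=True)
```

A caller that receives `None` would wait and retry until `wait_timeout` expires, which is the queuing behavior described under Key Features.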
### Wait Timeout
Default: 300 seconds (5 minutes)
Custom:
```python
session_mgr = NXSessionManager(
    wait_timeout=600  # Wait up to 10 minutes
)
```
## Integration with NXSolver
The `NXSolver` constructor accepts session-management options:
```python
from optimization_engine.nx_solver import NXSolver

solver = NXSolver(
    enable_session_management=True,  # Default
    study_name="my_study",
)
# Intended behavior: the solver acquires and releases locks itself
result = solver.run_simulation(sim_file)
```
**Note**: Full automatic integration is planned but not yet implemented. Until it is, wrap calls in `acquire_model_lock()` manually, as shown in the usage examples.
## Status Monitoring
### Get Current Status
```python
report = session_mgr.get_status_report()
print(report)
```
Output:
```
======================================================================
NX SESSION MANAGER STATUS
======================================================================
Running NX Processes: 2
PID 12345: ugraf.exe
Working dir: C:/Users/username/project
PID 12346: run_journal.exe
WARNING: Interactive NX session detected!
Batch operations may conflict with user's work.
Active Optimization Sessions: 1/1
my_study (PID 12347)
Active Lock Files: 1
======================================================================
```
### Cleanup Stale Locks
```python
# Run at startup
session_mgr.cleanup_stale_locks()
```
Removes lock files from crashed processes.
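
The cleanup step can be sketched as follows, assuming each lock file begins with the PID of the process that created it. That file format, the `*.lock` pattern, and the helper name `remove_stale_locks` are assumptions for illustration; the liveness check is passed in as a callable so the sketch does not depend on a particular process library.

```python
from pathlib import Path
from typing import Callable, List


def remove_stale_locks(lock_dir, pid_alive: Callable[[int], bool]) -> List[Path]:
    """Delete lock files whose recorded owner PID is no longer running."""
    removed = []
    for lock_path in sorted(Path(lock_dir).glob("*.lock")):
        try:
            pid = int(lock_path.read_text().split()[0])
        except (OSError, ValueError, IndexError):
            continue  # unreadable or malformed: leave it for manual inspection
        if not pid_alive(pid):
            lock_path.unlink(missing_ok=True)
            removed.append(lock_path)
    return removed
```

With `psutil` installed, `remove_stale_locks(lock_dir, psutil.pid_exists)` is one possible way to call it. Note that PIDs can be reused by the operating system, so a check like this can occasionally mistake an unrelated process for the lock owner.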
## Error Handling
### Lock Timeout
```python
try:
    with session_mgr.acquire_model_lock(prt_file, study_name):
        # ... modify model ...
        pass
except TimeoutError as e:
    print(f"Could not acquire model lock: {e}")
    print("Another optimization may be using this model.")
    # Handle error (skip trial, abort, etc.)
```
### NX Session Timeout
```python
try:
    with session_mgr.acquire_nx_session(study_name):
        # ... run NX batch ...
        pass
except TimeoutError as e:
    print(f"Could not acquire NX session: {e}")
    print(f"Max concurrent sessions ({session_mgr.max_concurrent}) reached.")
    # Handle error
```
## Platform Support
- **Windows**: Full support (uses `msvcrt` for file locking)
- **Linux/Mac**: Full support (uses `fcntl` for file locking)
- **Cross-Platform**: Lock files work across different OS instances
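
The platform split above can be sketched with a thin wrapper that picks `msvcrt.locking` on Windows and `fcntl.flock` elsewhere. This is an illustrative sketch rather than the manager's actual code; `try_lock` and `unlock` are hypothetical names, and locking a single byte on Windows is a choice made here for simplicity.

```python
import sys

if sys.platform == "win32":
    import msvcrt

    def _lock(handle):
        # Lock the first byte without blocking; raises OSError if already locked.
        handle.seek(0)
        msvcrt.locking(handle.fileno(), msvcrt.LK_NBLCK, 1)

    def _unlock(handle):
        handle.seek(0)
        msvcrt.locking(handle.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def _lock(handle):
        # Exclusive, non-blocking advisory lock on the whole file.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)

    def _unlock(handle):
        fcntl.flock(handle.fileno(), fcntl.LOCK_UN)


def try_lock(handle) -> bool:
    """Return True if the exclusive lock was acquired, False if it is held elsewhere."""
    try:
        _lock(handle)
        return True
    except OSError:
        return False


def unlock(handle) -> None:
    """Release a lock taken with try_lock."""
    _unlock(handle)
```

Because the operating system releases these locks when the owning process exits, they do not go stale the way plain marker files can; `fcntl.flock` locks are advisory, so they only protect against other programs that also take the lock.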
## Limitations
1. **Same Machine Only**: Session manager only prevents conflicts on the same machine
   - Networked optimizations need a distributed lock manager
2. **File System Required**: Requires writable lock directory
   - May not work on read-only filesystems
3. **Process Detection**: Relies on `psutil` for process detection
   - May miss processes in some edge cases
4. **Not Real-Time**: Lock checking has small latency
   - Not suitable for microsecond-level synchronization
## Best Practices
### 1. Always Use Model Locks
```python
# GOOD: Protected
with session_mgr.acquire_model_lock(prt_file, study_name):
    updater.update_expressions(params)

# BAD: Unprotected (race condition!)
updater.update_expressions(params)
```
### 2. Check for Interactive NX
```python
# Before starting optimization
if session_mgr.is_nx_interactive_session_running():
    print("WARNING: Close NX before running optimization!")
    # Decide: abort or continue with warning
```
### 3. Cleanup on Startup
```python
# At optimization start
session_mgr = NXSessionManager()
session_mgr.cleanup_stale_locks() # Remove crashed process locks
```
### 4. Use Unique Study Names
```python
# GOOD: Unique names
solver_A = NXSolver(study_name="beam_optimization_trial_42")
solver_B = NXSolver(study_name="plate_optimization_trial_15")
# BAD: Same name (confusing logs)
solver_A = NXSolver(study_name="default_study")
solver_B = NXSolver(study_name="default_study")
```
### 5. Handle Timeouts Gracefully
```python
import optuna

try:
    with session_mgr.acquire_model_lock(prt_file, study_name):
        result = solver.run_simulation(sim_file)
except TimeoutError:
    # Don't crash entire optimization!
    print("Lock timeout - skipping this trial")
    raise optuna.TrialPruned()  # Optuna will continue
```
## Troubleshooting
### "Lock timeout" errors
**Cause**: Another process has held the lock for longer than `wait_timeout`
**Solutions**:
1. Check if another optimization is running
2. Increase timeout: `wait_timeout=600`
3. Check for stale locks: `cleanup_stale_locks()`
### "Interactive NX session detected" warnings
**Cause**: User has NX open in GUI mode
**Solutions**:
1. Close interactive NX before optimization
2. Use different model files
3. Continue with warning (risky!)
### Stale lock files
**Cause**: Optimization crashed without releasing locks
**Solution**:
```python
session_mgr.cleanup_stale_locks()
```
### Multiple optimizations on different models still conflict
**Cause**: NX session limit reached
**Solution**:
```python
session_mgr = NXSessionManager(
    max_concurrent_sessions=2  # Allow 2 parallel NX instances
)
```
**Warning**: Requires 2 NX licenses!
## Future Enhancements
- [ ] Distributed lock manager (for cluster computing)
- [ ] Automatic NX session affinity (assign trials to specific NX instances)
- [ ] License pool management
- [ ] Network file lock support (for shared drives)
- [ ] Real-time session monitoring dashboard
- [ ] Automatic crash recovery
## Version History
### Version 1.0 (2025-11-20)
- Initial implementation
- Model file locking
- NX session detection
- Concurrent session limiting
- Stale lock cleanup
- Status reporting
---
**Implementation Status**: ✅ Core functionality complete
**Testing Status**: ⚠️ Needs production testing
**Documentation Status**: ✅ Complete