# NX Session Management

**Status**: Implemented
**Version**: 1.0
**Date**: 2025-11-20

## Problem

When multiple optimizations run concurrently, or when a user has NX open for manual work, conflicts can occur:

1. **Multiple Optimizations**: Two optimization studies trying to modify the same model simultaneously
2. **User's Interactive NX**: Batch optimization interfering with the user's manual work
3. **File Corruption**: Concurrent writes to .prt/.sim files causing corruption
4. **License Conflicts**: Multiple NX instances competing for licenses
5. **Journal Failures**: Journals trying to run on the wrong NX session

## Solution: NX Session Manager

The `NXSessionManager` class prevents these conflicts through session detection, file locking, process queuing, and stale lock cleanup.

### Key Features

1. **Session Detection**
   - Detects all running NX processes (interactive + batch)
   - Identifies interactive vs batch sessions
   - Warns if the user has NX open

2. **File Locking**
   - Exclusive locks on model files (.prt)
   - Prevents two optimizations from modifying the same model
   - Queues trials if the model is locked

3. **Process Queuing**
   - Limits concurrent NX batch sessions (default: 1)
   - Waits if the maximum number of sessions is reached
   - Automatic timeout and error handling

4. **Stale Lock Cleanup**
   - Detects crashed processes
   - Removes orphaned lock files
   - Prevents permanent deadlocks

## Architecture

### Session Manager Components

```python
from pathlib import Path

from optimization_engine.nx_session_manager import NXSessionManager

# Initialize
session_mgr = NXSessionManager(
    lock_dir=Path.home() / ".atomizer" / "locks",
    max_concurrent_sessions=1,  # Max parallel NX instances
    wait_timeout=300,           # Max wait time in seconds (5 min)
    verbose=True
)
```

### Two-Level Locking

**Level 1: Model File Lock** (most important)
```python
# Ensures exclusive access to a specific model
with session_mgr.acquire_model_lock(prt_file, study_name):
    # Update CAD model
    updater.update_expressions(params)

    # Run simulation
    result = solver.run_simulation(sim_file)
```

**Level 2: NX Session Lock** (optional)
```python
# Limits total concurrent NX batch instances
with session_mgr.acquire_nx_session(study_name):
    # Run NX batch operation
    pass
```

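
The internals of `acquire_model_lock` are not shown in this document. As a rough illustration of the pattern (a context manager that waits for exclusive ownership of a lock file and raises `TimeoutError` when the wait runs out), here is a minimal sketch. It uses exclusive file creation (`O_EXCL`) rather than the `msvcrt`/`fcntl` locking named under Platform Support, and the `exclusive_lock` name and its `poll` parameter are hypothetical.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def exclusive_lock(lock_path: Path, timeout: float = 300.0, poll: float = 0.5):
    """Hold lock_path exclusively for the duration of the with-block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"Timed out waiting for {lock_path}")
            time.sleep(poll)
    try:
        # Record the owner's PID so a cleanup pass can spot stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

A second caller using the same `lock_path` blocks inside `exclusive_lock` until the first caller leaves its `with` block, which is the behavior the model file lock above relies on.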
## Usage Examples

### Example 1: Single Optimization (Recommended)

```python
from optimization_engine.nx_solver import NXSolver
from optimization_engine.nx_updater import NXParameterUpdater
from optimization_engine.nx_session_manager import NXSessionManager

# Initialize components
session_mgr = NXSessionManager(verbose=True)
updater = NXParameterUpdater("model.prt")
solver = NXSolver()

# Check for interactive NX sessions
if session_mgr.is_nx_interactive_session_running():
    print("WARNING: NX is open! Close it before running optimization.")
    # You can choose to abort or continue

# Run trials with session management
# (trials, prt_file, sim_file, and params are supplied by the calling study)
for trial in trials:
    with session_mgr.acquire_model_lock(prt_file, "my_study"):
        # Exclusive access to model - safe to modify
        updater.update_expressions(params)
        result = solver.run_simulation(sim_file)
```

### Example 2: Multiple Concurrent Optimizations

```python
# Study A (in one terminal)
session_mgr_A = NXSessionManager()

with session_mgr_A.acquire_model_lock(model_A_prt, "study_A"):
    # Works on model A
    updater_A.update_expressions(params_A)
    solver_A.run_simulation(sim_A)

# Study B (in another terminal, simultaneously)
session_mgr_B = NXSessionManager()

with session_mgr_B.acquire_model_lock(model_B_prt, "study_B"):
    # Works on model B (different model - no conflict)
    updater_B.update_expressions(params_B)
    solver_B.run_simulation(sim_B)

# If they try to use the SAME model:
with session_mgr_A.acquire_model_lock(model_SAME, "study_A"):
    pass  # Acquires lock

with session_mgr_B.acquire_model_lock(model_SAME, "study_B"):
    # Waits here until study_A releases lock
    # Then proceeds safely
    pass
```

### Example 3: Protection Against User's Interactive NX

```python
session_mgr = NXSessionManager(verbose=True)

# Detect if user has NX open
nx_sessions = session_mgr.get_running_nx_sessions()

for session in nx_sessions:
    print(f"Detected: {session.name} (PID {session.pid})")

if session_mgr.is_nx_interactive_session_running():
    print("Interactive NX session detected!")
    print("Recommend closing NX before running optimization.")

    # Option 1: Abort
    raise RuntimeError("Close NX and try again")

    # Option 2 (use instead of Option 1): Continue with warning
    # print("Continuing anyway... (may cause conflicts)")
```

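
How `NXSessionManager` tells interactive sessions from batch sessions is not spelled out here. One plausible approach is to classify processes by executable name. The sketch below assumes the two names that appear in the sample status report in this document (`ugraf.exe` for the GUI, `run_journal.exe` for batch journals); the `NXProcess` type and both helper functions are hypothetical, and a real installation may use other process names.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: names taken from the sample status report in this document.
INTERACTIVE_NAMES = {"ugraf.exe"}
BATCH_NAMES = {"run_journal.exe"}


@dataclass(frozen=True)
class NXProcess:
    pid: int
    name: str


def classify(proc: NXProcess) -> str:
    """Return 'interactive', 'batch', or 'other' based on the executable name."""
    name = proc.name.lower()  # Windows executable names are case-insensitive
    if name in INTERACTIVE_NAMES:
        return "interactive"
    if name in BATCH_NAMES:
        return "batch"
    return "other"


def has_interactive_session(procs: Iterable[NXProcess]) -> bool:
    """True if any process in procs looks like an interactive NX session."""
    return any(classify(p) == "interactive" for p in procs)


# Example input mirroring the sample status report
procs = [NXProcess(12345, "ugraf.exe"), NXProcess(12346, "run_journal.exe")]
interactive_found = has_interactive_session(procs)
```

In practice the process list would come from a process enumeration library such as `psutil`, which this document names as the detection mechanism.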
## Configuration

### Lock Directory

Default: `~/.atomizer/locks/`

Custom:
```python
from pathlib import Path

session_mgr = NXSessionManager(
    lock_dir=Path("/custom/lock/dir")
)
```

### Concurrent Session Limit

Default: 1 (safest)

Allow multiple:
```python
session_mgr = NXSessionManager(
    max_concurrent_sessions=2  # Allow 2 parallel NX batches
)
```

**Warning**: Multiple concurrent NX sessions require multiple licenses!

### Wait Timeout

Default: 300 seconds (5 minutes)

Custom:
```python
session_mgr = NXSessionManager(
    wait_timeout=600  # Wait up to 10 minutes
)
```

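
To make the interplay between `max_concurrent_sessions` and `wait_timeout` concrete, the sketch below shows one way a session limit could be enforced with numbered slot files: a caller claims the first free slot and otherwise polls until the timeout expires. This is an illustration only, not the `NXSessionManager` implementation; the function names, the `nx_session_<i>.lock` file naming, and the `poll` parameter are all hypothetical.

```python
import os
import time
from pathlib import Path


def acquire_session_slot(lock_dir: Path, max_sessions: int = 1,
                         wait_timeout: float = 300.0, poll: float = 1.0) -> Path:
    """Claim one of max_sessions slot files, waiting up to wait_timeout seconds."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    deadline = time.monotonic() + wait_timeout
    while True:
        for i in range(max_sessions):
            slot = lock_dir / f"nx_session_{i}.lock"  # hypothetical file name
            try:
                # O_EXCL: creation fails if another process already owns the slot
                fd = os.open(slot, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            except FileExistsError:
                continue
            os.write(fd, str(os.getpid()).encode())
            os.close(fd)
            return slot
        if time.monotonic() >= deadline:
            raise TimeoutError(f"All {max_sessions} NX session slots are busy")
        time.sleep(poll)


def release_session_slot(slot: Path) -> None:
    """Free a slot so that a waiting process can claim it."""
    slot.unlink()
```

With `max_sessions=1`, a second study simply waits until the first releases its slot, which matches the default behavior described above.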
## Integration with NXSolver

The `NXSolver` class accepts session management options:

```python
from optimization_engine.nx_solver import NXSolver

solver = NXSolver(
    enable_session_management=True,  # Default
    study_name="my_study"
)

# Intended behavior: session management is applied automatically
result = solver.run_simulation(sim_file)
```

**Note**: Full automatic integration is planned but not yet implemented. For now, wrap solver calls manually in `acquire_model_lock`, as in the usage examples above.

## Status Monitoring

### Get Current Status

```python
report = session_mgr.get_status_report()
print(report)
```

Output:
```
======================================================================
  NX SESSION MANAGER STATUS
======================================================================

  Running NX Processes: 2
    PID 12345: ugraf.exe
      Working dir: C:/Users/username/project
    PID 12346: run_journal.exe

  WARNING: Interactive NX session detected!
  Batch operations may conflict with user's work.

  Active Optimization Sessions: 1/1
    my_study (PID 12347)

  Active Lock Files: 1
======================================================================
```

### Cleanup Stale Locks

Run this at startup to remove lock files left behind by crashed processes:

```python
# Run at startup
session_mgr.cleanup_stale_locks()
```

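
What makes a lock "stale" is that the process that created it no longer exists. A minimal sketch of that check follows, assuming each lock file contains the owner's PID as plain text (an assumption; the actual lock file format is not documented here). The `pid_is_alive` probe shown is POSIX-only: on Windows, where `os.kill(pid, 0)` does not act as a harmless existence check, a call such as `psutil.pid_exists` would be the safer choice. Both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Callable, List


def pid_is_alive(pid: int) -> bool:
    """POSIX-only probe: signal 0 tests for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_locks(lock_dir: Path,
                       is_alive: Callable[[int], bool] = pid_is_alive) -> List[Path]:
    """Delete *.lock files whose recorded owner PID is no longer running."""
    removed = []
    for lock in sorted(lock_dir.glob("*.lock")):
        try:
            pid = int(lock.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed: leave it for manual inspection
        if not is_alive(pid):
            lock.unlink()
            removed.append(lock)
    return removed
```

Passing the liveness check in as a parameter keeps the cleanup logic testable and lets the platform-specific probe be swapped without touching the loop.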
## Error Handling

### Lock Timeout

```python
try:
    with session_mgr.acquire_model_lock(prt_file, study_name):
        # ... modify model ...
        pass
except TimeoutError as e:
    print(f"Could not acquire model lock: {e}")
    print("Another optimization may be using this model.")
    # Handle error (skip trial, abort, etc.)
```

### NX Session Timeout

```python
try:
    with session_mgr.acquire_nx_session(study_name):
        # ... run NX batch ...
        pass
except TimeoutError as e:
    print(f"Could not acquire NX session: {e}")
    print(f"Max concurrent sessions ({session_mgr.max_concurrent}) reached.")
    # Handle error
```

## Platform Support

- ✅ **Windows**: Full support (uses `msvcrt` for file locking)
- ✅ **Linux/Mac**: Full support (uses `fcntl` for file locking)
- ✅ **Cross-Platform**: Lock files work across different OS instances

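
The platform notes above name `msvcrt` and `fcntl` as the locking back ends. A minimal sketch of how the two can be hidden behind one pair of functions is shown below. The `lock_file` and `unlock_file` names are hypothetical, and the choice of non-blocking calls is an assumption made so that a caller can implement its own wait-and-timeout loop.

```python
import os
import sys

if sys.platform == "win32":
    import msvcrt

    def lock_file(fd: int) -> None:
        """Lock the first byte of fd without blocking; raises OSError if held."""
        os.lseek(fd, 0, os.SEEK_SET)  # msvcrt locks from the current position
        msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)

    def unlock_file(fd: int) -> None:
        os.lseek(fd, 0, os.SEEK_SET)
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def lock_file(fd: int) -> None:
        """Take an exclusive, non-blocking lock; raises BlockingIOError if held."""
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)

    def unlock_file(fd: int) -> None:
        fcntl.flock(fd, fcntl.LOCK_UN)
```

Both failure modes surface as `OSError` subclasses, so calling code can catch `OSError`, sleep briefly, and retry until its timeout is reached.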
## Limitations

1. **Same Machine Only**: The session manager only prevents conflicts on the same machine
   - Networked optimizations need a distributed lock manager

2. **File System Required**: Requires a writable lock directory
   - May not work on read-only filesystems

3. **Process Detection**: Relies on `psutil` for process detection
   - May miss processes in some edge cases

4. **Not Real-Time**: Lock checking has a small latency
   - Not suitable for microsecond-level synchronization

## Best Practices

### 1. Always Use Model Locks

```python
# GOOD: Protected
with session_mgr.acquire_model_lock(prt_file, study_name):
    updater.update_expressions(params)

# BAD: Unprotected (race condition!)
updater.update_expressions(params)
```

### 2. Check for Interactive NX

```python
# Before starting optimization
if session_mgr.is_nx_interactive_session_running():
    print("WARNING: Close NX before running optimization!")
    # Decide: abort or continue with warning
```

### 3. Cleanup on Startup

```python
# At optimization start
session_mgr = NXSessionManager()
session_mgr.cleanup_stale_locks()  # Remove crashed process locks
```

### 4. Use Unique Study Names

```python
# GOOD: Unique names
solver_A = NXSolver(study_name="beam_optimization_trial_42")
solver_B = NXSolver(study_name="plate_optimization_trial_15")

# BAD: Same name (confusing logs)
solver_A = NXSolver(study_name="default_study")
solver_B = NXSolver(study_name="default_study")
```

### 5. Handle Timeouts Gracefully

```python
import optuna

try:
    with session_mgr.acquire_model_lock(prt_file, study_name):
        result = solver.run_simulation(sim_file)
except TimeoutError:
    # Don't crash entire optimization!
    print("Lock timeout - skipping this trial")
    raise optuna.TrialPruned()  # Optuna will continue
```

## Troubleshooting

### "Lock timeout" errors

**Cause**: Another process holds the lock longer than the timeout

**Solutions**:
1. Check if another optimization is running
2. Increase timeout: `wait_timeout=600`
3. Check for stale locks: `cleanup_stale_locks()`

### "Interactive NX session detected" warnings

**Cause**: User has NX open in GUI mode

**Solutions**:
1. Close interactive NX before optimization
2. Use different model files
3. Continue with warning (risky!)

### Stale lock files

**Cause**: Optimization crashed without releasing locks

**Solution**:
```python
session_mgr.cleanup_stale_locks()
```

### Multiple optimizations on different models still conflict

**Cause**: NX session limit reached

**Solution**:
```python
session_mgr = NXSessionManager(
    max_concurrent_sessions=2  # Allow 2 parallel NX instances
)
```

**Warning**: Requires 2 NX licenses!

## Future Enhancements

- [ ] Distributed lock manager (for cluster computing)
- [ ] Automatic NX session affinity (assign trials to specific NX instances)
- [ ] License pool management
- [ ] Network file lock support (for shared drives)
- [ ] Real-time session monitoring dashboard
- [ ] Automatic crash recovery

## Version History

### Version 1.0 (2025-11-20)
- Initial implementation
- Model file locking
- NX session detection
- Concurrent session limiting
- Stale lock cleanup
- Status reporting

---

**Implementation Status**: ✅ Core functionality complete
**Testing Status**: ⚠️ Needs production testing
**Documentation Status**: ✅ Complete