# SYS_19 — Job Queue Protocol

## Purpose

Defines how agents submit and monitor optimization jobs that execute on Windows (NX/Simcenter).

## Architecture

```
Linux (Agents)                  Windows (NX/Simcenter)
/job-queue/                     C:\Atomizer\job-queue\
├── inbox/    ← results         ├── inbox/
├── outbox/   → jobs            ├── outbox/
└── archive/  (processed)       └── archive/
```

Syncthing keeps these directories in sync, with a delay of roughly 5-30 seconds.

## Submitting a Job

### Study Builder creates the job directory:

```
outbox/job-YYYYMMDD-HHMMSS-<name>/
├── job.json              # Job manifest (REQUIRED)
├── run_optimization.py   # The script to execute
├── atomizer_spec.json    # Study configuration (if applicable)
├── README.md             # Human-readable description
└── 1_setup/              # Model files
    ├── *.prt             # NX parts
    ├── *_i.prt           # Idealized parts
    ├── *.fem             # FEM files
    └── *.sim             # Simulation files
```
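The layout above could be scaffolded with a small helper. This is a minimal sketch, not the Study Builder's actual code: `make_job_dir` and its parameters are hypothetical, and it assumes the timestamp in the directory name is UTC, which the protocol does not state.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def make_job_dir(outbox: Path, name: str, now: Optional[datetime] = None) -> Path:
    """Create outbox/job-YYYYMMDD-HHMMSS-<name>/ with an empty 1_setup/ folder."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC timestamps
    job_dir = outbox / f"job-{now:%Y%m%d-%H%M%S}-{name}"
    # exist_ok=False: fail loudly rather than reuse an existing job directory
    (job_dir / "1_setup").mkdir(parents=True, exist_ok=False)
    return job_dir
```

The caller would then copy `run_optimization.py`, `README.md`, and the model files into the returned directory, and write `job.json` last so a partially populated job is less likely to be picked up.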

### job.json Format

```json
{
  "job_id": "job-20260210-143022-wfe",
  "created_at": "2026-02-10T14:30:22Z",
  "created_by": "study-builder",
  "project": "starspec-m1-wfe",
  "channel": "#starspec-m1-wfe",
  "type": "optimization",
  "script": "run_optimization.py",
  "args": ["--start"],
  "status": "submitted",
  "notify": {
    "on_complete": true,
    "on_fail": true
  }
}
```
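A manifest like the one above can be generated and sanity-checked before submission. The sketch below is illustrative only: `build_manifest` and `missing_fields` are hypothetical helpers, treating every field in the example as required is an assumption, and so is deriving `channel` from `project` (it merely matches the example).

```python
import json
from datetime import datetime, timezone

# Field names come from the example manifest. Treating all of them as
# required is an assumption; the protocol only marks job.json itself REQUIRED.
EXPECTED_FIELDS = ("job_id", "created_at", "created_by", "project",
                   "channel", "type", "script", "args", "status", "notify")


def build_manifest(job_id: str, project: str, created_at: datetime) -> dict:
    """Build a job.json payload; created_at is expected to be in UTC."""
    return {
        "job_id": job_id,
        "created_at": created_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "created_by": "study-builder",
        "project": project,
        "channel": f"#{project}",  # assumption: channel mirrors the project name
        "type": "optimization",
        "script": "run_optimization.py",
        "args": ["--start"],
        "status": "submitted",
        "notify": {"on_complete": True, "on_fail": True},
    }


def missing_fields(manifest: dict) -> list:
    """Return the expected field names that are absent from the manifest."""
    return [name for name in EXPECTED_FIELDS if name not in manifest]
```

Writing the result with `json.dumps(manifest, indent=2)` produces a file in the same shape as the example.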

## Monitoring a Job

Agents check job status by reading the `status` field of each job's `job.json`:

- A job that is still only in `outbox/` has been submitted and is waiting for sync
- After Antoine runs the script, the job and its results appear in `inbox/`

### Status Values

| Status | Meaning |
|--------|---------|
| `submitted` | Agent placed job in outbox |
| `running` | Antoine started execution |
| `completed` | Finished successfully |
| `failed` | Execution failed |
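
A status check along these lines could look as follows. This is a sketch under assumptions: `read_status` is a hypothetical helper, the job directory is assumed to be named after `job_id`, and an unreadable manifest is treated as "not known yet" because the file may still be syncing.

```python
import json
from pathlib import Path
from typing import Optional

KNOWN_STATUSES = {"submitted", "running", "completed", "failed"}


def read_status(queue_root: Path, job_id: str) -> Optional[str]:
    """Return the status recorded in job.json, preferring inbox/ over outbox/."""
    for box in ("inbox", "outbox"):
        manifest = queue_root / box / job_id / "job.json"
        if not manifest.is_file():
            continue
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # The file may be partially synced; report unknown and retry later.
            return None
        status = data.get("status") if isinstance(data, dict) else None
        return status if status in KNOWN_STATUSES else None
    return None  # job not found in either directory
```

Checking `inbox/` first means a returned result takes precedence over the stale copy that may still sit in `outbox/`.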

## Receiving Results

Results arrive in `inbox/` with an updated `job.json` and the result files:
```
inbox/job-YYYYMMDD-HHMMSS-<name>/
├── job.json          # Updated status
├── 3_results/        # Output data
│   ├── study.db      # Optuna study database
│   ├── *.csv         # Result tables
│   └── *.png         # Generated plots
└── stdout.log        # Execution log
```

## Post-Processing

1. Manager's heartbeat detects new results in `inbox/`
2. Manager notifies Post-Processor
3. Post-Processor analyzes the results
4. The processed job is moved to `archive/` with a timestamp
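
The final archiving step might be sketched as below. The protocol does not say how the timestamp is attached, so the `_YYYYMMDD-HHMMSS` suffix on the directory name is an assumption, as are the helper name `archive_job` and the use of UTC.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def archive_job(queue_root: Path, job_id: str, now: Optional[datetime] = None) -> Path:
    """Move inbox/<job_id>/ to archive/<job_id>_<timestamp>/ and return the new path."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC timestamps
    src = queue_root / "inbox" / job_id
    # Hypothetical naming scheme: append the archive time to the job directory name.
    dest = queue_root / "archive" / f"{job_id}_{now:%Y%m%d-%H%M%S}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest
```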

## Rules

1. **Never modify files in `inbox/` directly.** Copy first, then process.
2. **Always include `job.json`.** It is the job's identity.
3. **Use descriptive names.** Prefer `job-20260210-143022-starspec-wfe` over `job-1`.
4. **Include `README.md`.** It lets Antoine see what the job does at a glance.
5. **Use relative paths only.** Scripts must not contain absolute Windows or Linux paths.
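
The relative-path rule can be partly checked before submission with a text scan of the script. The sketch below is a heuristic, not part of the protocol: `find_absolute_paths` is a hypothetical helper, and the list of Linux root directories it looks for is an assumption that will miss some cases.

```python
import re

# Heuristic patterns (assumptions): a single-letter Windows drive such as
# C:\ or C:/, and a handful of common absolute Linux roots.
_ABSOLUTE = re.compile(
    r"(?<![A-Za-z])[A-Za-z]:[\\/]"
    r"|(?<![\w.])/(?:home|mnt|opt|tmp|usr|var|job-queue)/"
)


def find_absolute_paths(source: str) -> list:
    """Return (line_number, line) pairs that appear to contain absolute paths."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), 1)
        if _ABSOLUTE.search(line)
    ]
```

Running this over `run_optimization.py` and refusing to submit when the result is non-empty would catch the most common violations; URLs such as `https://...` are deliberately not flagged.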