deploy: self-update re-exec guard in deploy.sh
When deploy.sh itself changes in the commit being pulled, the bash process is still running the OLD script from memory; git reset --hard updated the file on disk but the in-memory instructions are stale. This bit the 2026-04-09 Dalidou deploy: the old pre-build-sha Step 2 ran against fresh source, so the container started with ATOCORE_BUILD_SHA="unknown" instead of the real commit. Manual re-run fixed it, but the class of bug will re-emerge every time deploy.sh itself changes.

Fix (Step 1.5):
- After git reset --hard, sha1 the running script ($0) and the on-disk copy at $APP_DIR/deploy/dalidou/deploy.sh
- If they differ, export ATOCORE_DEPLOY_REEXECED=1 and exec into the fresh copy so Step 2 onward runs under the new script
- The sentinel env var prevents recursion
- Skipped in dry-run mode, when $0 isn't readable, or when the on-disk script doesn't exist yet

Docs (docs/dalidou-deployment.md):
- New "The deploy.sh self-update race" troubleshooting section explaining the root cause, the Step 1.5 mechanism, what the log output looks like, and how to opt out

Verified syntax and dry-run. 219/219 tests still passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -140,6 +140,36 @@ If you see `{"status": "unavailable", "fail_open": true}` from the
client, the first thing to check is whether the base URL resolves
from where you're running the client.

### The deploy.sh self-update race

When `deploy.sh` itself changes in the commit being pulled, the
first run after the update is still executing the *old* script from
the bash process's in-memory copy. `git reset --hard` updates the
file on disk, but the running bash has already loaded the
instructions. On 2026-04-09 this silently shipped an "unknown"
`build_sha` because the old Step 2 (which predated env-var export)
ran against fresh source.

`deploy.sh` now detects this: Step 1.5 compares the sha1 of `$0`
(the running script) against the sha1 of
`$APP_DIR/deploy/dalidou/deploy.sh` (the on-disk copy) after the
git reset. If they differ, it sets `ATOCORE_DEPLOY_REEXECED=1` and
`exec`s the fresh copy so the rest of the deploy runs under the new
script. The sentinel env var prevents infinite recursion.

You'll see this in the logs as:

```text
==> Step 1.5: deploy.sh changed in the pulled commit; re-exec'ing
==> running script hash: <old>
==> on-disk script hash: <new>
==> re-exec -> /srv/storage/atocore/app/deploy/dalidou/deploy.sh
```

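The Step 1.5 mechanism described above can be sketched roughly as follows. This is not the actual `deploy.sh` code: the function names `script_sha1` and `should_reexec` and the `DRY_RUN` variable are assumptions made for illustration, and the sketch assumes coreutils `sha1sum` is available on the host.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the Step 1.5 guard; not the real deploy.sh.
# Assumed names: script_sha1, should_reexec, DRY_RUN.

# Print the sha1 hex digest of a file (assumes coreutils sha1sum).
script_sha1() {
    sha1sum "$1" | cut -d' ' -f1
}

# Return 0 only when the running script and the on-disk copy differ
# and none of the skip conditions apply.
should_reexec() {
    local running="$1" on_disk="$2"
    # Sentinel already set: we are the re-exec'd copy, or the operator opted out.
    if [ "${ATOCORE_DEPLOY_REEXECED:-}" = "1" ]; then return 1; fi
    # Skip in dry-run mode (DRY_RUN is an assumed variable name).
    if [ "${DRY_RUN:-0}" = "1" ]; then return 1; fi
    # Skip when $0 is not readable or the on-disk script is missing/unreadable.
    if [ ! -r "$running" ]; then return 1; fi
    if [ ! -f "$on_disk" ] || [ ! -r "$on_disk" ]; then return 1; fi
    # Compare content hashes; differing hashes mean deploy.sh changed.
    [ "$(script_sha1 "$running")" != "$(script_sha1 "$on_disk")" ]
}

# Intended call site, right after `git reset --hard`:
#   fresh="$APP_DIR/deploy/dalidou/deploy.sh"
#   if should_reexec "$0" "$fresh"; then
#       export ATOCORE_DEPLOY_REEXECED=1   # sentinel: prevents recursion
#       exec bash "$fresh" "$@"            # replaces this process
#   fi
```
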
To opt out (debugging, for example), pre-set
`ATOCORE_DEPLOY_REEXECED=1` before invoking `deploy.sh` and the
self-update guard will be skipped.

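For a one-off debugging run, it may be preferable to scope the sentinel to a single invocation rather than exporting it into the shell session. A minimal sketch, assuming the script lives at the path shown in the sample log output; `run_without_guard` and `DEPLOY_SCRIPT` are hypothetical names, not part of `deploy.sh`:

```shell
#!/usr/bin/env bash
# Sketch only: run_without_guard and DEPLOY_SCRIPT are illustrative names.
# The default path is the one shown in the sample log output.
DEPLOY_SCRIPT="${DEPLOY_SCRIPT:-/srv/storage/atocore/app/deploy/dalidou/deploy.sh}"

run_without_guard() {
    # `env` sets the sentinel for this one child process only, so later
    # deploys started from the same shell keep the guard enabled.
    env ATOCORE_DEPLOY_REEXECED=1 bash "$DEPLOY_SCRIPT" "$@"
}
```
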
### Deployment drift detection

`/health` reports drift signals at three increasing levels of