deploy: self-update re-exec guard in deploy.sh

When deploy.sh itself changes in the commit being pulled, the bash
process is still running the OLD script from memory — git reset --hard
updated the file on disk but the in-memory instructions are stale.
This bit the 2026-04-09 Dalidou deploy: the old pre-build-sha Step 2
ran against fresh source, so the container started with
ATOCORE_BUILD_SHA="unknown" instead of the real commit. Manual
re-run fixed it, but the class of bug will re-emerge every time
deploy.sh itself changes.

Fix (Step 1.5):
- After git reset --hard, sha1 the running script ($0) and the
  on-disk copy at $APP_DIR/deploy/dalidou/deploy.sh
- If they differ, export ATOCORE_DEPLOY_REEXECED=1 and exec into
  the fresh copy so Step 2 onward runs under the new script
- The sentinel env var prevents recursion
- Skipped in dry-run mode, when $0 isn't readable, or when the
  on-disk script doesn't exist yet

Docs (docs/dalidou-deployment.md):
- New "The deploy.sh self-update race" troubleshooting section
  explaining the root cause, the Step 1.5 mechanism, what the log
  output looks like, and how to opt out

Verified syntax and dry-run. 219/219 tests still passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-08 21:08:41 -04:00
parent be4099486c
commit 03822389a1
2 changed files with 69 additions and 0 deletions

View File

@@ -160,6 +160,45 @@ else
run "git clone --branch '$BRANCH' '$GIT_REMOTE' '$APP_DIR'"
fi
# ---------------------------------------------------------------------
# Step 1.5: self-update re-exec guard
# ---------------------------------------------------------------------
#
# When deploy.sh itself changes in the commit we just pulled, the bash
# process running this script is still executing the OLD deploy.sh
# from memory — git reset --hard updated the file on disk but our
# in-memory instructions are stale. That's exactly how the first
# 2026-04-09 Dalidou deploy silently wrote "unknown" build_sha: old
# Step 2 logic ran against fresh source. Detect the mismatch and
# re-exec into the fresh copy so every post-update run exercises the
# new script.
#
# Guard rails:
# - Only runs when $APP_DIR exists, holds a git checkout, and a
# deploy.sh exists there (i.e. after Step 1 succeeded).
# - Uses a sentinel env var ATOCORE_DEPLOY_REEXECED=1 to make sure
# we only re-exec once, never recurse.
# - Skipped in dry-run mode (no mutation).
# - Skipped if $0 isn't a readable file (bash -c pipe inputs, etc.).
if [ "$DRY_RUN" != "1" ] \
&& [ -z "${ATOCORE_DEPLOY_REEXECED:-}" ] \
&& [ -r "$0" ] \
&& [ -f "$APP_DIR/deploy/dalidou/deploy.sh" ]; then
ON_DISK_HASH="$(sha1sum "$APP_DIR/deploy/dalidou/deploy.sh" 2>/dev/null | awk '{print $1}')"
RUNNING_HASH="$(sha1sum "$0" 2>/dev/null | awk '{print $1}')"
if [ -n "$ON_DISK_HASH" ] \
&& [ -n "$RUNNING_HASH" ] \
&& [ "$ON_DISK_HASH" != "$RUNNING_HASH" ]; then
log "Step 1.5: deploy.sh changed in the pulled commit; re-exec'ing"
log " running script hash: $RUNNING_HASH"
log " on-disk script hash: $ON_DISK_HASH"
log " re-exec -> $APP_DIR/deploy/dalidou/deploy.sh"
export ATOCORE_DEPLOY_REEXECED=1
exec bash "$APP_DIR/deploy/dalidou/deploy.sh" "$@"
fi
fi
# ---------------------------------------------------------------------
# Step 2: capture build provenance to pass to the container
# ---------------------------------------------------------------------