deploy: add build_sha visibility for precise drift detection

Make /health report the precise git SHA the container was built from,
so 'is the live service current?' can be answered without ambiguity.
0.2.0 was too coarse to trust as a 'live is current' signal — many
commits share the same __version__.

Three layers:

1. /health endpoint (src/atocore/api/routes.py)
   - Reads ATOCORE_BUILD_SHA, ATOCORE_BUILD_TIME, ATOCORE_BUILD_BRANCH
     from environment, defaults to 'unknown'
   - Reports them alongside existing code_version field

2. docker-compose.yml
   - Forwards the three env vars from the host into the container
   - Defaults to 'unknown' so direct `docker compose up` runs (without
     deploy.sh) cleanly signal missing build provenance

3. deploy.sh
   - Step 2 captures git SHA + UTC timestamp + branch and exports them
     as env vars before `docker compose up -d --build`
   - Step 6 reads /health post-deploy and compares the reported
     build_sha against the freshly-built one. Mismatch exits non-zero
     (exit code 6) with a remediation hint covering cached image,
     env propagation, and concurrent restart cases

Tests (tests/test_api_storage.py):
- test_health_endpoint_reports_code_version_from_module
- test_health_endpoint_reports_build_metadata_from_env
- test_health_endpoint_reports_unknown_when_build_env_unset

Docs (docs/dalidou-deployment.md):
- Three-level drift detection table (code_version coarse,
  build_sha precise, build_time/branch forensic)
- Canonical drift check script using LIVE_SHA vs EXPECTED_SHA
- Note that running deploy.sh is itself the simplest drift check

219/219 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-08 20:25:32 -04:00
parent 2c0b214137
commit be4099486c
5 changed files with 207 additions and 28 deletions

View File

@@ -161,18 +161,33 @@ else
fi
# ---------------------------------------------------------------------
# Step 2: show what we're deploying
# Step 2: capture build provenance to pass to the container
# ---------------------------------------------------------------------
#
# We compute the full SHA, the short SHA, the UTC build timestamp,
# and the source branch. These get exported as env vars before
# `docker compose up -d --build` so the running container can read
# them at startup and report them via /health. The post-deploy
# verification step (Step 6) reads /health and compares the
# reported SHA against this value to detect any silent drift.
log "Step 2: deployable commit"
log "Step 2: capturing build provenance"
if [ "$DRY_RUN" != "1" ] && [ -d "$APP_DIR/.git" ]; then
( cd "$APP_DIR" && git log --oneline -1 )
( cd "$APP_DIR" && git rev-parse HEAD > /tmp/atocore-deploying-sha.txt )
DEPLOYING_SHA="$(cat /tmp/atocore-deploying-sha.txt | cut -c1-7)"
log " commit: $DEPLOYING_SHA"
DEPLOYING_SHA_FULL="$(cd "$APP_DIR" && git rev-parse HEAD)"
DEPLOYING_SHA="$(echo "$DEPLOYING_SHA_FULL" | cut -c1-7)"
DEPLOYING_TIME="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
DEPLOYING_BRANCH="$BRANCH"
log " commit: $DEPLOYING_SHA ($DEPLOYING_SHA_FULL)"
log " built at: $DEPLOYING_TIME"
log " branch: $DEPLOYING_BRANCH"
( cd "$APP_DIR" && git log --oneline -1 ) | sed 's/^/ /'
export ATOCORE_BUILD_SHA="$DEPLOYING_SHA_FULL"
export ATOCORE_BUILD_TIME="$DEPLOYING_TIME"
export ATOCORE_BUILD_BRANCH="$DEPLOYING_BRANCH"
else
log " [dry-run] would read git log from $APP_DIR"
DEPLOYING_SHA="dry-run"
DEPLOYING_SHA_FULL="dry-run"
fi
# ---------------------------------------------------------------------
@@ -222,14 +237,26 @@ if ! curl -fsS "$HEALTH_URL" > /tmp/atocore-health.json 2>/dev/null; then
fi
# ---------------------------------------------------------------------
# Step 6: verify the deployed version matches expectations
# Step 6: verify the deployed build matches what we just shipped
# ---------------------------------------------------------------------
#
# Two layers of comparison:
#
# - code_version: matches src/atocore/__init__.py::__version__.
# Coarse: any commit between version bumps reports the same value.
# - build_sha: full git SHA the container was built from. Set as
# an env var by Step 2 above and read by /health from
# ATOCORE_BUILD_SHA. This is the precise drift signal — if the
# live build_sha doesn't match $DEPLOYING_SHA_FULL, the build
# didn't pick up the new source.
log "Step 6: verifying deployed version"
log "Step 6: verifying deployed build"
log " /health response:"
if command -v jq >/dev/null 2>&1; then
jq . < /tmp/atocore-health.json | sed 's/^/ /'
REPORTED_VERSION="$(jq -r '.code_version // .version' < /tmp/atocore-health.json)"
REPORTED_SHA="$(jq -r '.build_sha // "unknown"' < /tmp/atocore-health.json)"
REPORTED_BUILD_TIME="$(jq -r '.build_time // "unknown"' < /tmp/atocore-health.json)"
else
cat /tmp/atocore-health.json | sed 's/^/ /'
echo
@@ -237,21 +264,47 @@ else
if [ -z "$REPORTED_VERSION" ]; then
REPORTED_VERSION="$(grep -o '"version":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
fi
REPORTED_SHA="$(grep -o '"build_sha":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
REPORTED_SHA="${REPORTED_SHA:-unknown}"
REPORTED_BUILD_TIME="$(grep -o '"build_time":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
REPORTED_BUILD_TIME="${REPORTED_BUILD_TIME:-unknown}"
fi
EXPECTED_VERSION="$(grep -oE "__version__ = \"[^\"]+\"" "$APP_DIR/src/atocore/__init__.py" | head -1 | cut -d'"' -f2)"
log " expected code_version: $EXPECTED_VERSION (from $APP_DIR/src/atocore/__init__.py)"
log " reported code_version: $REPORTED_VERSION (from live /health)"
log " Layer 1 — coarse version:"
log " expected code_version: $EXPECTED_VERSION (from src/atocore/__init__.py)"
log " reported code_version: $REPORTED_VERSION (from live /health)"
if [ "$REPORTED_VERSION" != "$EXPECTED_VERSION" ]; then
log "WARNING: deployed version mismatch"
log " the container may not have picked up the new image"
log " try: docker compose down && docker compose up -d --build"
log "FATAL: code_version mismatch"
log " the container may not have picked up the new image"
log " try: docker compose down && docker compose up -d --build"
exit 4
fi
log " Layer 2 — precise build SHA:"
log " expected build_sha: $DEPLOYING_SHA_FULL (from this deploy.sh run)"
log " reported build_sha: $REPORTED_SHA (from live /health)"
log " reported build_time: $REPORTED_BUILD_TIME"
if [ "$REPORTED_SHA" != "$DEPLOYING_SHA_FULL" ]; then
log "FATAL: build_sha mismatch"
log " the live container is reporting a different commit than"
log " the one this deploy.sh run just shipped. Possible causes:"
log " - the container is using a cached image instead of the"
log " freshly-built one (try: docker compose build --no-cache)"
log " - the env vars didn't propagate (check that"
log " deploy/dalidou/docker-compose.yml has the environment"
log " section with ATOCORE_BUILD_SHA)"
log " - another process restarted the container between the"
log " build and the health check"
exit 6
fi
log "Deploy complete."
log " commit: $DEPLOYING_SHA"
log " commit: $DEPLOYING_SHA ($DEPLOYING_SHA_FULL)"
log " code_version: $REPORTED_VERSION"
log " build_sha: $REPORTED_SHA"
log " build_time: $REPORTED_BUILD_TIME"
log " health: ok"

View File

@@ -9,6 +9,15 @@ services:
- "${ATOCORE_PORT:-8100}:8100"
env_file:
- .env
environment:
# Build provenance — set by deploy/dalidou/deploy.sh on each
# rebuild so /health can report exactly which commit is live.
# Defaults to 'unknown' for direct `docker compose up` runs that
# bypass deploy.sh; in that case the operator should run
# deploy.sh instead so the deployed SHA is recorded.
ATOCORE_BUILD_SHA: "${ATOCORE_BUILD_SHA:-unknown}"
ATOCORE_BUILD_TIME: "${ATOCORE_BUILD_TIME:-unknown}"
ATOCORE_BUILD_BRANCH: "${ATOCORE_BUILD_BRANCH:-unknown}"
volumes:
- ${ATOCORE_DB_DIR}:${ATOCORE_DB_DIR}
- ${ATOCORE_CHROMA_DIR}:${ATOCORE_CHROMA_DIR}