# Dalidou Deployment
## Purpose

Deploy AtoCore on Dalidou as the canonical runtime and machine-memory host.
## Model

- Dalidou hosts the canonical AtoCore service.
- OpenClaw on the T420 consumes AtoCore over the network/Tailscale API.
- `sources/vault` and `sources/drive` are read-only inputs by convention.
- SQLite/Chroma machine state stays on Dalidou and is not treated as a sync peer.
- The app and machine-storage host can be live before the long-term content corpus is fully populated.
## Directory layout

```
/srv/storage/atocore/
  app/        # deployed repo checkout
  data/
    db/
    chroma/
    cache/
    tmp/
  sources/
    vault/
    drive/
  logs/
  backups/
  run/
```
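The first-time deployment steps below include creating this tree by hand. A minimal sketch of doing that in one go; the `make_layout` helper is illustrative, not part of deploy.sh, and running it against `/srv/storage/atocore` assumes write access to that path:

```shell
# Hypothetical helper (not part of deploy.sh): create the canonical
# AtoCore directory tree under a given root.
make_layout() {
  local root="$1" d
  for d in app data/db data/chroma data/cache data/tmp \
           sources/vault sources/drive logs backups run; do
    mkdir -p "$root/$d"
  done
}

# On Dalidou (assumes write access to /srv/storage):
# make_layout /srv/storage/atocore
```

`mkdir -p` makes this safe to re-run: directories that already exist are left untouched.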
## Compose workflow

The compose definition lives in `deploy/dalidou/docker-compose.yml`.

The Dalidou environment file should be copied to `deploy/dalidou/.env`, starting from `deploy/dalidou/.env.example`.
## First-time deployment steps

1. Place the repository under `/srv/storage/atocore/app`, ideally as a proper git clone so future updates can be pulled, not as a static snapshot:

   ```shell
   sudo git clone http://dalidou:3000/Antoine/ATOCore.git \
     /srv/storage/atocore/app
   ```

2. Create the canonical directories listed above.

3. Copy `deploy/dalidou/.env.example` to `deploy/dalidou/.env`.

4. Adjust the source paths if your AtoVault/AtoDrive mirrors live elsewhere.

5. Run:

   ```shell
   cd /srv/storage/atocore/app/deploy/dalidou
   docker compose up -d --build
   ```

6. Validate:

   ```shell
   curl http://127.0.0.1:8100/health
   curl http://127.0.0.1:8100/sources
   ```
## Updating a running deployment

Use `deploy/dalidou/deploy.sh` for every code update. It is the one-shot sync script that:

- fetches latest `main` from Gitea into `/srv/storage/atocore/app`
- (if the app dir is not a git checkout) backs it up as `<dir>.pre-git-<timestamp>` and re-clones
- rebuilds the container image
- restarts the container
- waits for `/health` to respond
- compares the reported `code_version` against the `__version__` in the freshly pulled source, and exits non-zero if they don't match (deployment drift detection)
```shell
# Normal update from main
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Deploy a specific branch or tag
ATOCORE_BRANCH=codex/some-feature \
  bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Dry-run: show what would happen without touching anything
ATOCORE_DEPLOY_DRY_RUN=1 \
  bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Deploy from a remote host (e.g. the laptop) using the Tailscale
# or LAN address instead of loopback
ATOCORE_GIT_REMOTE=http://192.168.86.50:3000/Antoine/ATOCore.git \
  bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
```
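The version comparison that deploy.sh performs at the end can be sketched like this; `json_field` is a hypothetical grep-based stand-in, not the parsing the script actually uses:

```shell
# Hypothetical helper: pull a string field out of a /health JSON payload.
# A grep-based sketch; real JSON parsing (e.g. jq) would be more robust.
json_field() {  # usage: json_field <field> <json-string>
  printf '%s' "$2" | grep -o "\"$1\"[: ]*\"[^\"]*\"" | head -n1 | cut -d'"' -f4
}

# Sketch of the comparison, given an expected version string:
# live=$(json_field code_version "$(curl -fsS http://127.0.0.1:8100/health)")
# [ "$live" = "$expected" ] || { echo "drift: $live vs $expected"; exit 1; }
```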
The script is idempotent and safe to re-run. It never touches the database directly: schema migrations are applied automatically at service startup by the lifespan handler in `src/atocore/main.py`, which calls `init_db()` (which in turn runs the `ALTER TABLE` statements in `_apply_migrations`).
## Troubleshooting hostname resolution

`deploy.sh` defaults `ATOCORE_GIT_REMOTE` to `http://127.0.0.1:3000/Antoine/ATOCore.git` (loopback) because the hostname "dalidou" doesn't reliably resolve on the host itself; the first real Dalidou deploy hit exactly this on 2026-04-08. If you need to override (e.g. running `deploy.sh` from a laptop against the Dalidou LAN), set `ATOCORE_GIT_REMOTE` explicitly.

The same applies to `scripts/atocore_client.py`: its default `ATOCORE_BASE_URL` is `http://dalidou:8100` for remote callers, but when running the client on Dalidou itself (or inside the container via `docker exec`), override to loopback:

```shell
ATOCORE_BASE_URL=http://127.0.0.1:8100 \
  python scripts/atocore_client.py health
```

If you see `{"status": "unavailable", "fail_open": true}` from the client, the first thing to check is whether the base URL resolves from where you're running the client.
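A quick way to answer the "does the base URL even resolve from here" question; the `check_host` helper is illustrative, not part of the client, and assumes a glibc-style `getent`:

```shell
# Hypothetical helper: check whether the host part of a base URL
# resolves from the current machine before blaming the service.
check_host() {  # usage: check_host <url>
  local host="${1#*://}"   # strip scheme
  host="${host%%[:/]*}"    # strip port and path
  if getent hosts "$host" >/dev/null 2>&1; then
    echo "resolves: $host"
  else
    echo "no-dns: $host"
  fi
}

# e.g. check_host "http://dalidou:8100"
# "no-dns: dalidou" means set ATOCORE_BASE_URL to an IP instead.
```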
## The deploy.sh self-update race

When `deploy.sh` itself changes in the commit being pulled, the first run after the update is still executing the old script: `git reset --hard` updates the file on disk, but the running bash process has already loaded the old instructions into memory. On 2026-04-09 this silently shipped an "unknown" `build_sha`, because the old Step 2 (which predated the env-var export) ran against fresh source.

`deploy.sh` now detects this: Step 1.5 compares the sha1 of `$0` (the running script) against the sha1 of `$APP_DIR/deploy/dalidou/deploy.sh` (the on-disk copy) after the `git reset`. If they differ, it sets `ATOCORE_DEPLOY_REEXECED=1` and execs the fresh copy so the rest of the deploy runs under the new script. The sentinel env var prevents infinite recursion.

You'll see this in the logs as:

```
==> Step 1.5: deploy.sh changed in the pulled commit; re-exec'ing
==> running script hash: <old>
==> on-disk script hash: <new>
==> re-exec -> /srv/storage/atocore/app/deploy/dalidou/deploy.sh
```

To opt out (for debugging, say), pre-set `ATOCORE_DEPLOY_REEXECED=1` before invoking `deploy.sh` and the self-update guard will be skipped.
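The guard described above can be sketched as follows. This is a simplified illustration: the function form and names are not deploy.sh's exact code, which inlines the check after the `git reset --hard`:

```shell
# Simplified sketch of the Step 1.5 guard; function form and names
# are illustrative, not deploy.sh's actual code.
needs_reexec() {  # usage: needs_reexec <running-script> <on-disk-script>
  # Sentinel: we already re-exec'ed once, don't recurse.
  if [ "${ATOCORE_DEPLOY_REEXECED:-0}" = "1" ]; then return 1; fi
  # Skip when either copy is unreadable or missing.
  if [ ! -r "$1" ] || [ ! -r "$2" ]; then return 1; fi
  # Re-exec only when the hashes differ.
  [ "$(sha1sum "$1" | cut -d' ' -f1)" != "$(sha1sum "$2" | cut -d' ' -f1)" ]
}

# In the deploy script, roughly:
# if needs_reexec "$0" "$APP_DIR/deploy/dalidou/deploy.sh"; then
#   export ATOCORE_DEPLOY_REEXECED=1
#   exec bash "$APP_DIR/deploy/dalidou/deploy.sh" "$@"
# fi
```

`exec` replaces the running process, so nothing after it in the old script can run; the exported sentinel survives into the new process and makes the guard a no-op there.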
## Deployment drift detection

`/health` reports drift signals at three increasing levels of precision:

| Field | Source | Precision | When to use |
|---|---|---|---|
| `version` / `code_version` | `atocore.__version__` (manual bump) | coarse: same value across many commits | quick smoke check that the right release is running |
| `build_sha` | `ATOCORE_BUILD_SHA` env var, set by `deploy.sh` per build | precise: changes per commit | the canonical drift signal |
| `build_time` / `build_branch` | same env var path | per-build | forensics when multiple branches are in flight |
The precise check (run on the laptop or any host that can curl the live service and has the source repo at hand):

```shell
# What's actually running on Dalidou
LIVE_SHA=$(curl -fsS http://dalidou:8100/health | grep -o '"build_sha":"[^"]*"' | cut -d'"' -f4)

# What the deployed branch tip should be
EXPECTED_SHA=$(cd /srv/storage/atocore/app && git rev-parse HEAD)

# Compare
if [ "$LIVE_SHA" = "$EXPECTED_SHA" ]; then
  echo "live is current at $LIVE_SHA"
else
  echo "DRIFT: live $LIVE_SHA vs expected $EXPECTED_SHA"
  echo "run deploy.sh to sync"
fi
```
`deploy.sh` does exactly this comparison automatically in its post-deploy verification step (Step 6) and exits non-zero on mismatch. So the simplest drift check is just to run `deploy.sh`: if there's nothing to deploy, it succeeds quickly; if the live service is stale, it deploys and verifies.

If `/health` reports `build_sha: "unknown"`, the running container was started without `deploy.sh` (probably via `docker compose up` directly), and the build provenance was never recorded. Re-run via `deploy.sh` to fix.
The coarse `code_version` check is still useful as a quick visual sanity check: bumping `__version__` from 0.2.0 to 0.3.0 signals a meaningful release boundary, even if the precise `build_sha` is what tools should compare against:

```shell
# Quick sanity check (coarse)
curl -s http://127.0.0.1:8100/health | grep -o '"code_version":"[^"]*"'
grep '__version__' /srv/storage/atocore/app/src/atocore/__init__.py
```
## Schema migrations on redeploy

When updating from an older `__version__`, the first startup after the redeploy runs the idempotent `ALTER TABLE` migrations in `_apply_migrations`. For a pre-0.2.0 → 0.2.0 upgrade the migrations add these columns to existing tables (all with safe defaults, so no data is touched):

- `memories.project TEXT DEFAULT ''`
- `memories.last_referenced_at DATETIME`
- `memories.reference_count INTEGER DEFAULT 0`
- `interactions.response TEXT DEFAULT ''`
- `interactions.memories_used TEXT DEFAULT '[]'`
- `interactions.chunks_used TEXT DEFAULT '[]'`
- `interactions.client TEXT DEFAULT ''`
- `interactions.session_id TEXT DEFAULT ''`
- `interactions.project TEXT DEFAULT ''`

Plus new indexes on the new columns. No row data is modified. The migration is safe to run against a database that already has the columns: the `_column_exists` check makes each `ALTER` a no-op in that case.
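To spot-check the migration from the host after a redeploy, something in this spirit works, assuming the `sqlite3` CLI is installed on Dalidou; the `column_exists` helper mirrors the idea of `_column_exists` but is not the service's actual code, and the database filename is deployment-specific:

```shell
# Hypothetical post-migration spot check, assuming the sqlite3 CLI.
# PRAGMA table_info lists one row per column; field 2 is the name.
column_exists() {  # usage: column_exists <db-file> <table> <column>
  sqlite3 "$1" "PRAGMA table_info($2);" | cut -d'|' -f2 | grep -qx "$3"
}

# e.g. (db filename under data/db/ is deployment-specific):
# column_exists /srv/storage/atocore/data/db/<db-file> memories project \
#   && echo "0.2.0 migration applied"
```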
Back up the database before any redeploy (via `POST /admin/backup`) if you want a pre-upgrade snapshot. The migration is additive, and reversible by restoring the snapshot.
## Deferred

- backup automation
- restore/snapshot tooling
- reverse proxy / TLS exposure
- automated source ingestion job
- OpenClaw client wiring
## Current Reality Check

When this deployment is first brought up, the service may be healthy before the real corpus has been ingested. That means:

- AtoCore the system can already be hosted on Dalidou
- the canonical machine-data location can already be on Dalidou
- but the live knowledge/content corpus may still be empty or only partially loaded until source ingestion is run