Dalidou Deployment

Purpose

Deploy AtoCore on Dalidou as the canonical runtime and machine-memory host.

Model

  • Dalidou hosts the canonical AtoCore service.
  • OpenClaw on the T420 consumes AtoCore over network/Tailscale API.
  • sources/vault and sources/drive are read-only inputs by convention.
  • SQLite/Chroma machine state stays on Dalidou and is not treated as a sync peer.
  • The app and machine-storage host can be live before the long-term content corpus is fully populated.

Directory layout

/srv/storage/atocore/
  app/         # deployed repo checkout
  data/
    db/
    chroma/
    cache/
    tmp/
  sources/
    vault/
    drive/
  logs/
  backups/
  run/
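This tree can be created in one pass. A minimal sketch, assuming the canonical /srv/storage/atocore root (the ATOCORE_ROOT variable is illustrative, not something deploy.sh reads):

```shell
# Create the canonical AtoCore directory tree under ATOCORE_ROOT.
# ATOCORE_ROOT defaults to the canonical Dalidou path; override for testing.
ATOCORE_ROOT="${ATOCORE_ROOT:-/srv/storage/atocore}"
for d in app data/db data/chroma data/cache data/tmp \
         sources/vault sources/drive logs backups run; do
    mkdir -p "$ATOCORE_ROOT/$d"
done
echo "layout ready under $ATOCORE_ROOT"
```

mkdir -p is idempotent, so re-running against an existing tree is harmless.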

Compose workflow

The compose definition lives in:

deploy/dalidou/docker-compose.yml

The Dalidou environment file should be copied to:

deploy/dalidou/.env

starting from:

deploy/dalidou/.env.example

First-time deployment steps

  1. Place the repository under /srv/storage/atocore/app — ideally as a proper git clone so future updates can be pulled, not as a static snapshot:

    sudo git clone http://dalidou:3000/Antoine/ATOCore.git \
        /srv/storage/atocore/app
    
  2. Create the canonical directories listed above.

  3. Copy deploy/dalidou/.env.example to deploy/dalidou/.env.

  4. Adjust the source paths if your AtoVault/AtoDrive mirrors live elsewhere.

  5. Run:

    cd /srv/storage/atocore/app/deploy/dalidou
    docker compose up -d --build
    
  6. Validate:

    curl http://127.0.0.1:8100/health
    curl http://127.0.0.1:8100/sources
    
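The validation step can be wrapped in a small reusable probe, e.g. for cron or other scripts. A sketch; the function name and the 5-second timeout are illustrative choices, not part of the repo:

```shell
# Probe an AtoCore base URL and report one word: healthy or unreachable.
check_health() {
    if curl -fsS -m 5 "$1/health" >/dev/null 2>&1; then
        echo "healthy"
    else
        echo "unreachable"
    fi
}

check_health http://127.0.0.1:8100
```

curl -f makes non-2xx responses count as failures, so a service that is up but erroring reports "unreachable" rather than "healthy".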

Updating a running deployment

Use deploy/dalidou/deploy.sh for every code update. It is the one-shot sync script that:

  • fetches latest main from Gitea into /srv/storage/atocore/app
  • (if the app dir is not a git checkout) backs it up as <dir>.pre-git-<timestamp> and re-clones
  • rebuilds the container image
  • restarts the container
  • waits for /health to respond
  • compares the reported code_version against the __version__ in the freshly-pulled source, and exits non-zero if they don't match (deployment drift detection)

# Normal update from main
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Deploy a specific branch or tag
ATOCORE_BRANCH=codex/some-feature \
    bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Dry-run: show what would happen without touching anything
ATOCORE_DEPLOY_DRY_RUN=1 \
    bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Deploy from a remote host (e.g. the laptop) using the Tailscale
# or LAN address instead of loopback
ATOCORE_GIT_REMOTE=http://192.168.86.50:3000/Antoine/ATOCore.git \
    bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

The script is idempotent and safe to re-run. It never touches the database directly: schema migrations are applied automatically at service startup by the lifespan handler in src/atocore/main.py, which calls init_db(), which in turn runs the ALTER TABLE statements in _apply_migrations.

Troubleshooting hostname resolution

deploy.sh defaults ATOCORE_GIT_REMOTE to http://127.0.0.1:3000/Antoine/ATOCore.git (loopback) because the hostname "dalidou" doesn't reliably resolve on the host itself — the first real Dalidou deploy hit exactly this on 2026-04-08. If you need to override (e.g. running deploy.sh from a laptop against the Dalidou LAN), set ATOCORE_GIT_REMOTE explicitly.
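A quick way to confirm whether the hostname resolves from wherever you are invoking deploy.sh (getent is standard on Linux hosts):

```shell
# Check whether "dalidou" resolves from this host before relying on it.
if getent hosts dalidou >/dev/null 2>&1; then
    echo "dalidou resolves"
else
    echo "dalidou does not resolve here; set ATOCORE_GIT_REMOTE explicitly"
fi
```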

The same applies to scripts/atocore_client.py: its default ATOCORE_BASE_URL is http://dalidou:8100 for remote callers, but when running the client on Dalidou itself (or inside the container via docker exec), override to loopback:

ATOCORE_BASE_URL=http://127.0.0.1:8100 \
    python scripts/atocore_client.py health

If you see {"status": "unavailable", "fail_open": true} from the client, the first thing to check is whether the base URL resolves from where you're running the client.

The deploy.sh self-update race

When deploy.sh itself changes in the commit being pulled, the first run after the update is still executing the old script from the bash process's in-memory copy. git reset --hard updates the file on disk, but the running bash has already loaded the instructions. On 2026-04-09 this silently shipped an "unknown" build_sha because the old Step 2 (which predated env-var export) ran against fresh source.

deploy.sh now detects this: Step 1.5 compares the sha1 of $0 (the running script) against the sha1 of $APP_DIR/deploy/dalidou/deploy.sh (the on-disk copy) after the git reset. If they differ, it sets ATOCORE_DEPLOY_REEXECED=1 and execs the fresh copy so the rest of the deploy runs under the new script. The sentinel env var prevents infinite recursion.

You'll see this in the logs as:

==> Step 1.5: deploy.sh changed in the pulled commit; re-exec'ing
==>   running script hash: <old>
==>   on-disk script hash: <new>
==>   re-exec -> /srv/storage/atocore/app/deploy/dalidou/deploy.sh

To opt out (e.g. while debugging), pre-set ATOCORE_DEPLOY_REEXECED=1 before invoking deploy.sh; the self-update guard will then be skipped.
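The guard can be sketched roughly as follows. This is an illustrative reconstruction from the description above, not the verbatim deploy.sh code; everything except ATOCORE_DEPLOY_REEXECED (variable names, the APP_DIR default) is an assumption:

```shell
# Step 1.5 (sketch): re-exec if the on-disk deploy.sh no longer matches
# the copy this bash process is executing.
FRESH="${APP_DIR:-/srv/storage/atocore/app}/deploy/dalidou/deploy.sh"
if [ -z "${ATOCORE_DEPLOY_REEXECED:-}" ] && [ -r "$0" ] && [ -f "$FRESH" ]; then
    running=$(sha1sum "$0" | cut -d' ' -f1)
    ondisk=$(sha1sum "$FRESH" | cut -d' ' -f1)
    if [ "$running" != "$ondisk" ]; then
        export ATOCORE_DEPLOY_REEXECED=1   # sentinel: prevents recursion
        exec bash "$FRESH" "$@"            # never returns; Step 2+ runs fresh
    fi
fi
```

exec replaces the current process, so nothing after the old script's Step 1.5 ever runs from the stale in-memory copy.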

Deployment drift detection

/health reports drift signals at three increasing levels of precision:

  • version / code_version: from atocore.__version__ (manual bump). Coarse (same value across many commits); use it as a quick smoke check that the right release is running.
  • build_sha: from the ATOCORE_BUILD_SHA env var, set by deploy.sh per build. Precise (changes per commit); the canonical drift signal.
  • build_time / build_branch: same env var path, recorded per build; use them for forensics when multiple branches are in flight.

The precise check (run on the laptop or any host that can curl the live service AND has the source repo at hand):

# What's actually running on Dalidou
LIVE_SHA=$(curl -fsS http://dalidou:8100/health | grep -o '"build_sha":"[^"]*"' | cut -d'"' -f4)

# What the deployed branch tip should be
EXPECTED_SHA=$(cd /srv/storage/atocore/app && git rev-parse HEAD)

# Compare
if [ "$LIVE_SHA" = "$EXPECTED_SHA" ]; then
    echo "live is current at $LIVE_SHA"
else
    echo "DRIFT: live $LIVE_SHA vs expected $EXPECTED_SHA"
    echo "run deploy.sh to sync"
fi

The deploy.sh script does exactly this comparison automatically in its post-deploy verification step (Step 6) and exits non-zero on mismatch. So the simplest drift check is just to run deploy.sh — if there's nothing to deploy, it succeeds quickly; if the live service is stale, it deploys and verifies.

If /health reports build_sha: "unknown", the running container was started without deploy.sh (probably via docker compose up directly), and the build provenance was never recorded. Re-run via deploy.sh to fix.

The coarse code_version check is still useful as a quick visual sanity check — bumping __version__ from 0.2.0 to 0.3.0 signals a meaningful release boundary even if the precise build_sha is what tools should compare against:

# Quick sanity check (coarse)
curl -s http://127.0.0.1:8100/health | grep -o '"code_version":"[^"]*"'
grep '__version__' /srv/storage/atocore/app/src/atocore/__init__.py

Schema migrations on redeploy

When updating from an older __version__, the first startup after the redeploy runs the idempotent ALTER TABLE migrations in _apply_migrations. For a pre-0.2.0 → 0.2.0 upgrade the migrations add these columns to existing tables (all with safe defaults so no data is touched):

  • memories.project TEXT DEFAULT ''
  • memories.last_referenced_at DATETIME
  • memories.reference_count INTEGER DEFAULT 0
  • interactions.response TEXT DEFAULT ''
  • interactions.memories_used TEXT DEFAULT '[]'
  • interactions.chunks_used TEXT DEFAULT '[]'
  • interactions.client TEXT DEFAULT ''
  • interactions.session_id TEXT DEFAULT ''
  • interactions.project TEXT DEFAULT ''

Plus new indexes on the new columns. No row data is modified. The migration is safe to run against a database that already has the columns — the _column_exists check makes each ALTER a no-op in that case.

Backup the database before any redeploy (via POST /admin/backup) if you want a pre-upgrade snapshot. The migration is additive and reversible by restoring the snapshot.
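To spot-check that the new columns actually landed after a redeploy, the sqlite3 CLI can inspect the live schema. A sketch, assuming sqlite3 is installed on the host; the database filename below is an assumption, so adjust it to your data/db/ layout:

```shell
# Verify the 0.2.0 columns exist on the memories table.
# PRAGMA table_info output is cid|name|type|notnull|dflt_value|pk.
DB="${ATOCORE_DB:-/srv/storage/atocore/data/db/atocore.db}"
for col in project last_referenced_at reference_count; do
    if sqlite3 "$DB" "PRAGMA table_info(memories);" | grep -q "|$col|"; then
        echo "memories.$col: present"
    else
        echo "memories.$col: MISSING"
    fi
done
```

The same pattern works for the interactions columns; only the table name and column list change.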

Deferred

  • backup automation
  • restore/snapshot tooling
  • reverse proxy / TLS exposure
  • automated source ingestion job
  • OpenClaw client wiring

Current Reality Check

When this deployment is first brought up, the service may be healthy before the real corpus has been ingested.

That means:

  • AtoCore the system can already be hosted on Dalidou
  • the canonical machine-data location can already be on Dalidou
  • but the live knowledge/content corpus may still be empty or only partially loaded until source ingestion is run