deploy: version-visible /health + deploy.sh + update runbook
Dalidou Claude's validation run against the live service exposed a
structural gap: the deployment at /srv/storage/atocore/app has no
git connection, the running container was built from pre-Phase-9
source, and /health hardcoded 'version: 0.1.0' so drift is
invisible. Weeks of work have been shipping to Gitea but never
reaching the live service.
This commit fixes both the drift-invisibility problem and the
absence of an update workflow, so the next deploy to Dalidou can
go live cleanly and future drifts surface immediately.
Layer 1: deployment drift is now visible via /health
----------------------------------------------------
- src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0
and documented as the source of truth for the deployed code
version, with a history block explaining when each bump happens
(API surface change, schema change, user-visible behavior change)
- src/atocore/main.py: FastAPI constructor now uses __version__
instead of the hardcoded '0.1.0' string, so the OpenAPI docs
reflect the actual code version
- src/atocore/api/routes.py: /health now reads from __version__
dynamically. Both the existing 'version' field and a new
'code_version' field report the same value for backwards compat.
A new docstring explains that comparing this to the main
branch's __version__ is the fastest way to detect drift.
- pyproject.toml: version bumped to 0.2.0 to stay in sync
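For illustration, the routes.py change amounts to something like the
following (a sketch only: the handler shape is assumed, and the module's
existing APIRouter is assumed to be named router; the point is just that
the values come from __version__ instead of a literal):
    # sketch of src/atocore/api/routes.py after the change
    from atocore import __version__

    @router.get("/health")
    async def health() -> dict:
        return {
            "status": "ok",                # assumed existing field
            "version": __version__,        # existing field, kept for compat
            "code_version": __version__,   # new field, same value
        }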
The comparison is now:
curl /health -> "code_version": "0.2.0"
grep __version__ src/atocore/__init__.py -> "0.2.0"
If those differ, the deployment is stale. Concrete, unambiguous.
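Scripted, the same check is a few lines of shell (a sketch; it assumes
jq is available on the host and reuses the grep that deploy.sh itself
uses to read __version__):
    live="$(curl -fsS http://127.0.0.1:8100/health | jq -r '.code_version')"
    repo="$(grep -oE '__version__ = "[^"]+"' src/atocore/__init__.py | cut -d'"' -f2)"
    [ "$live" = "$repo" ] && echo "in sync ($live)" || echo "DRIFT: live=$live repo=$repo"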
Layer 2: deploy.sh as the canonical update path
-----------------------------------------------
New file: deploy/dalidou/deploy.sh
One-shot bash script that handles both the first-time deploy
(where /srv/storage/atocore/app may not be a git repo yet) and
the ongoing update case. Steps:
1. If app dir is not a git checkout, back it up as
<dir>.pre-git-<utc-stamp> and re-clone from Gitea.
If it IS a checkout, fetch + reset --hard origin/<branch>.
2. Report the deployable commit SHA
3. Check that deploy/dalidou/.env exists (hard fail if missing
with a clear message pointing at .env.example)
4. docker compose up -d --build — rebuilds the image from
current source, restarts the container
5. Poll /health for up to 30 seconds; on failure, print the
last 50 lines of container logs and exit non-zero
6. Parse /health.code_version and compare to the __version__
in the freshly-pulled source. If they differ, exit non-zero
with a message suggesting docker compose down && up
7. On success, report commit + code_version + "health: ok"
Configurable via env vars:
- ATOCORE_APP_DIR (default /srv/storage/atocore/app)
- ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git)
- ATOCORE_BRANCH (default main)
- ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health)
- ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode
Explicit non-goals documented in the script header:
- does not manage secrets (.env is the caller's responsibility)
- does not take a pre-deploy backup (call /admin/backup first
if you want one)
- does not roll back on failure (redeploy a known-good commit
to recover)
- does not touch the DB directly — schema migrations run at
service startup via the lifespan handler, and all existing
_apply_migrations ALTERs are idempotent ADD COLUMN operations
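As a sketch of that startup path (the names and import paths here are
assumptions; the real wiring lives in src/atocore/main.py and
src/atocore/models/database.py), showing why deploy.sh never needs to
touch the DB itself:
    # sketch: migrations run when the service starts, not from deploy.sh
    from contextlib import asynccontextmanager
    from fastapi import FastAPI
    from atocore import __version__
    from atocore.models.database import init_db   # assumed import path

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        init_db()   # CREATE TABLE IF NOT EXISTS + _apply_migrations()
        yield

    app = FastAPI(title="AtoCore", version=__version__, lifespan=lifespan)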
Layer 3: updated docs/dalidou-deployment.md
-------------------------------------------
- First-time deployment steps now explicitly say "git clone", not
"place the repository", so future first-time deploys don't end
up as static snapshots again
- New "Updating a running deployment" section covering deploy.sh
usage with all three modes (normal / branch override / dry-run)
- New "Deployment drift detection" section with the one-liner
comparison between /health code_version and the repo's
__version__
- New "Schema migrations on redeploy" section enumerating the
exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0
upgrade, confirming they are additive-only and safe, and
recommending a backup via /admin/backup before any redeploy
Full suite: 215 passing, 1 warning. No test was hardcoded to the
old version string, so the version bump was safe without test
changes.
What this commit does NOT do
----------------------------
- Does NOT execute the deploy on the live Dalidou instance. That
requires Dalidou access and is the next step. A ready-to-paste
prompt for Dalidou Claude will be provided separately.
- Does NOT add CI/CD, webhook-based auto-deploy, or reverse
proxy. Those remain in the 'deferred' section of the
deployment doc.
- Does NOT change the Dockerfile. The existing 'COPY source at
build time' pattern is what deploy.sh relies on — rebuilding
the image picks up new code.
- Does NOT modify the database schema. The Phase 9 migrations
that Dalidou's DB needs will be applied automatically on next
service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
fix: schema init ordering, deploy.sh default, client BASE_URL docs
Three issues Dalidou Claude surfaced during the first real deploy
of commit e877e5b to the live service (report from 2026-04-08).
Bug 1 was the critical one — a schema init ordering bug that would
have bitten every future upgrade from a pre-Phase-9 schema — and
the other two were usability traps around hostname resolution.
Bug 1 (CRITICAL): schema init ordering
--------------------------------------
src/atocore/models/database.py
SCHEMA_SQL contained CREATE INDEX statements that referenced
columns added later by _apply_migrations():
CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project);
CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project);
CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id);
On a FRESH install, CREATE TABLE IF NOT EXISTS creates the tables
with the Phase 9 shape (columns present), so the CREATE INDEX runs
cleanly and _apply_migrations is effectively a no-op.
On an UPGRADE from a pre-Phase-9 schema, CREATE TABLE IF NOT EXISTS
is a no-op (the tables already exist in the old shape), the columns
are NOT added yet, and the CREATE INDEX fails with
"OperationalError: no such column: project" before
_apply_migrations gets a chance to add the columns.
Dalidou Claude hit this exactly when redeploying from 0.1.0 to
0.2.0 — had to manually ALTER TABLE to add the Phase 9 columns
before the container could start.
The fix is to remove the Phase 9-column indexes from SCHEMA_SQL.
They already exist in _apply_migrations() AFTER the corresponding
ALTER TABLE, so they still get created on both fresh and upgrade
paths — just after the columns exist, not before.
Indexes still in SCHEMA_SQL (all safe — reference columns that
have existed since the first release):
- idx_chunks_document on source_chunks(document_id)
- idx_memories_type on memories(memory_type)
- idx_memories_status on memories(status)
- idx_interactions_project on interactions(project_id)
Indexes that now live only in _apply_migrations (they were already
defined there; only the SCHEMA_SQL duplicates are removed):
- idx_memories_project on memories(project)
- idx_interactions_project_name on interactions(project)
- idx_interactions_session on interactions(session_id)
- idx_interactions_created_at on interactions(created_at)
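Sketched in code, the corrected ordering inside _apply_migrations is
(the PRAGMA-based guard here is illustrative, not necessarily the exact
guard the function uses; the point is column first, index second):
    def _apply_migrations(conn):
        # add the Phase 9 column first ...
        cols = {row[1] for row in conn.execute("PRAGMA table_info(memories)")}
        if "project" not in cols:
            conn.execute("ALTER TABLE memories ADD COLUMN project TEXT")
        # ... then create the index that references it
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project)"
        )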
Regression test: tests/test_database.py
---------------------------------------
New test_init_db_upgrades_pre_phase9_schema_without_failing:
- Seeds the DB with the exact pre-Phase-9 shape (no project /
last_referenced_at / reference_count on memories; no project /
client / session_id / response / memories_used / chunks_used on
interactions)
- Calls init_db() — which used to raise OperationalError before
the fix
- Verifies all Phase 9 columns are present after the call
- Verifies the migration indexes exist
Before the fix this test would have failed with
"OperationalError: no such column: project" on the init_db call.
After the fix it passes. This locks the invariant "init_db is
safe on any legacy schema shape" so the bug can't silently come
back.
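A compressed sketch of the test's shape (assumptions: pytest's tmp_path
fixture, an init_db that accepts a database path, and a single legacy
table standing in for the full pre-Phase-9 shape described above):
    import sqlite3
    from atocore.models.database import init_db   # assumed import path

    def test_init_db_upgrades_pre_phase9_schema_without_failing(tmp_path):
        db_path = tmp_path / "atocore.db"
        conn = sqlite3.connect(db_path)
        # legacy shape: no Phase 9 columns yet
        conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, memory_type TEXT, status TEXT)")
        conn.commit()
        conn.close()

        init_db(db_path)   # used to raise OperationalError before the fix

        cols = {row[1] for row in sqlite3.connect(db_path).execute("PRAGMA table_info(memories)")}
        assert "project" in cols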
Full suite: 216 passing (was 215), 1 warning. The +1 is the new
regression test.
Bug 3 (usability): deploy.sh DNS default
----------------------------------------
deploy/dalidou/deploy.sh
ATOCORE_GIT_REMOTE defaulted to http://dalidou:3000/Antoine/ATOCore.git
which requires the "dalidou" hostname to resolve. On the Dalidou
host itself it didn't (there is no /etc/hosts entry aliasing the
name to loopback), so deploy.sh had to be run with the IP as a
manual workaround.
Fix: default ATOCORE_GIT_REMOTE to http://127.0.0.1:3000/Antoine/ATOCore.git.
Loopback always works on the host running the script. Callers
from a remote host (e.g. running deploy.sh from a laptop against
the Dalidou LAN) set ATOCORE_GIT_REMOTE explicitly. The script
header's Environment Variables section documents this with an
explicit reference to the 2026-04-08 Dalidou deploy report so the
rationale isn't lost.
docs/dalidou-deployment.md gets a new "Troubleshooting hostname
resolution" subsection and a new example invocation showing how
to deploy from a remote host with an explicit ATOCORE_GIT_REMOTE
override.
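For example, from a remote host the override looks like this (the
hostname is whatever resolves to Dalidou's Gitea from the caller;
"dalidou" shown here as the typical case):
    ATOCORE_GIT_REMOTE=http://dalidou:3000/Antoine/ATOCore.git \
        bash deploy/dalidou/deploy.sh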
Bug 2 (usability): atocore_client.py ATOCORE_BASE_URL documentation
-------------------------------------------------------------------
scripts/atocore_client.py
Same class of issue as bug 3. BASE_URL defaults to
http://dalidou:8100 which resolves fine from a remote caller
(laptop, T420/OpenClaw over Tailscale) but NOT from the Dalidou
host itself or from inside the atocore container. Dalidou Claude
saw the CLI return
{"status": "unavailable", "fail_open": true}
while direct curl to http://127.0.0.1:8100 worked.
The fix here is NOT to change the default (remote callers are
the common case and would break) but to DOCUMENT the override
clearly so the next operator knows what's happening:
- The script module docstring grew a new "Environment variables"
section covering ATOCORE_BASE_URL, ATOCORE_TIMEOUT_SECONDS,
ATOCORE_REFRESH_TIMEOUT_SECONDS, and ATOCORE_FAIL_OPEN, with
the explicit override example for on-host/in-container use
- It calls out the exact symptom (fail-open envelope when the
base URL doesn't resolve) so the diagnosis is obvious from
the error alone
- docs/dalidou-deployment.md troubleshooting section mirrors
this guidance so there's one place to look regardless of
whether the operator starts with the client help or the
deploy doc
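The documented override, for on-host or in-container use, is a single
environment variable (the trailing arguments depend on whichever client
subcommand is being run and are elided here):
    ATOCORE_BASE_URL=http://127.0.0.1:8100 python3 scripts/atocore_client.py ...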
What this commit does NOT do
----------------------------
- Does NOT change the default ATOCORE_BASE_URL. Doing that would
break the T420 OpenClaw helper and every remote caller who
currently relies on the hostname. Documentation is the right
fix for this case.
- Does NOT fix /etc/hosts on Dalidou. That's a host-level
configuration issue that the user can fix if they prefer
having the hostname resolve; the deploy.sh fix makes it
unnecessary regardless.
- Does NOT re-run the validation on Dalidou. The next step is
for the live service to pull this commit via deploy.sh (which
should now work without the IP workaround) and re-run the
Phase 9 loop test to confirm nothing regressed.
2026-04-08 19:02:57 -04:00

#!/usr/bin/env bash
#
# deploy/dalidou/deploy.sh
# -------------------------
# One-shot deploy script for updating the running AtoCore container
# on Dalidou from the current Gitea main branch.
#
# The script is idempotent and safe to re-run. It handles both the
# first-time deploy (where /srv/storage/atocore/app may not yet be
# a git checkout) and the ongoing update case (where it is).
#
# Usage
# -----
#
#   # Normal update from main (most common)
#   bash deploy/dalidou/deploy.sh
#
#   # Deploy a specific branch or tag
#   ATOCORE_BRANCH=codex/some-feature bash deploy/dalidou/deploy.sh
#
#   # Dry-run: show what would happen without touching anything
#   ATOCORE_DEPLOY_DRY_RUN=1 bash deploy/dalidou/deploy.sh
#
# Environment variables
# ---------------------
#
#   ATOCORE_APP_DIR          default /srv/storage/atocore/app
#   ATOCORE_GIT_REMOTE       default http://127.0.0.1:3000/Antoine/ATOCore.git
#                            This is the local Dalidou gitea, reached
#                            via loopback. Override only when running
#                            the deploy from a remote host. The default
#                            is loopback (not the hostname "dalidou")
#                            because the hostname doesn't reliably
#                            resolve on the host itself — Dalidou
#                            Claude's first deploy had to work around
#                            exactly this.
#   ATOCORE_BRANCH           default main
#   ATOCORE_DEPLOY_DRY_RUN   if set to 1, report only, no mutations
#   ATOCORE_HEALTH_URL       default http://127.0.0.1:8100/health
#
# Safety rails
# ------------
#
# - If the app dir exists but is NOT a git repo, the script renames
#   it to <dir>.pre-git-<timestamp> before re-cloning, so you never
#   lose the pre-existing snapshot to a git clobber.
# - If the health check fails after restart, the script exits
#   non-zero and prints the container logs tail for diagnosis.
# - Dry-run mode is the default recommendation for the first deploy
#   on a new environment: it shows the planned git operations and
#   the compose command without actually running them.
#
# What this script does NOT do
# ----------------------------
#
# - Does not manage secrets / .env files. The caller is responsible
#   for placing deploy/dalidou/.env before running.
# - Does not run a backup before deploying. Run the backup endpoint
#   first if you want a pre-deploy snapshot.
# - Does not roll back on health-check failure. If deploy fails,
#   the previous container is already stopped; you need to redeploy
#   a known-good commit to recover.
# - Does not touch the database. The Phase 9 schema migrations in
#   src/atocore/models/database.py::_apply_migrations are idempotent
#   ALTER TABLE ADD COLUMN calls that run at service startup via the
#   lifespan handler. Stale pre-Phase-9 schema is upgraded in place.

set -euo pipefail

APP_DIR="${ATOCORE_APP_DIR:-/srv/storage/atocore/app}"
GIT_REMOTE="${ATOCORE_GIT_REMOTE:-http://127.0.0.1:3000/Antoine/ATOCore.git}"
BRANCH="${ATOCORE_BRANCH:-main}"
HEALTH_URL="${ATOCORE_HEALTH_URL:-http://127.0.0.1:8100/health}"
DRY_RUN="${ATOCORE_DEPLOY_DRY_RUN:-0}"
COMPOSE_DIR="$APP_DIR/deploy/dalidou"

log() { printf '==> %s\n' "$*"; }

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf ' [dry-run] %s\n' "$*"
    else
        eval "$@"
    fi
}

log "AtoCore deploy starting"
log " app dir: $APP_DIR"
log " git remote: $GIT_REMOTE"
log " branch: $BRANCH"
log " health url: $HEALTH_URL"
log " dry run: $DRY_RUN"
# ---------------------------------------------------------------------
# Step 1: make sure $APP_DIR is a proper git checkout of the branch
# ---------------------------------------------------------------------

if [ -d "$APP_DIR/.git" ]; then
    log "Step 1: app dir is already a git checkout; fetching latest"
    run "cd '$APP_DIR' && git fetch origin '$BRANCH'"
    run "cd '$APP_DIR' && git reset --hard 'origin/$BRANCH'"
else
    log "Step 1: app dir is NOT a git checkout; converting"
    if [ -d "$APP_DIR" ]; then
        BACKUP="${APP_DIR}.pre-git-$(date -u +%Y%m%dT%H%M%SZ)"
        log " backing up existing snapshot to $BACKUP"
        run "mv '$APP_DIR' '$BACKUP'"
    fi
    log " cloning $GIT_REMOTE -> $APP_DIR (branch: $BRANCH)"
    run "git clone --branch '$BRANCH' '$GIT_REMOTE' '$APP_DIR'"
fi
# ---------------------------------------------------------------------
# Step 2: show what we're deploying
# ---------------------------------------------------------------------

log "Step 2: deployable commit"
if [ "$DRY_RUN" != "1" ] && [ -d "$APP_DIR/.git" ]; then
    ( cd "$APP_DIR" && git log --oneline -1 )
    ( cd "$APP_DIR" && git rev-parse HEAD > /tmp/atocore-deploying-sha.txt )
    DEPLOYING_SHA="$(cut -c1-7 < /tmp/atocore-deploying-sha.txt)"
    log " commit: $DEPLOYING_SHA"
else
    log " [dry-run] would read git log from $APP_DIR"
    DEPLOYING_SHA="dry-run"
fi
# ---------------------------------------------------------------------
# Step 3: check the .env file is in place (it's not tracked in git)
# ---------------------------------------------------------------------

ENV_FILE="$COMPOSE_DIR/.env"
if [ "$DRY_RUN" != "1" ] && [ ! -f "$ENV_FILE" ]; then
    log "Step 3: WARNING — $ENV_FILE does not exist"
    log " the compose workflow needs this file to map mount points"
    log " copy deploy/dalidou/.env.example to $ENV_FILE and edit it"
    log " before re-running this script"
    exit 2
fi
# ---------------------------------------------------------------------
# Step 4: rebuild and restart the container
# ---------------------------------------------------------------------

log "Step 4: rebuilding and restarting the atocore container"
run "cd '$COMPOSE_DIR' && docker compose up -d --build"

if [ "$DRY_RUN" = "1" ]; then
    log "dry-run complete — no mutations performed"
    exit 0
fi
# ---------------------------------------------------------------------
# Step 5: wait for the service to come up and pass the health check
# ---------------------------------------------------------------------

log "Step 5: waiting for /health to respond"
for i in 1 2 3 4 5 6 7 8 9 10; do
    if curl -fsS "$HEALTH_URL" > /tmp/atocore-health.json 2>/dev/null; then
        log " service is responding"
        break
    fi
    log " not ready yet ($i/10); waiting 3s"
    sleep 3
done

if ! curl -fsS "$HEALTH_URL" > /tmp/atocore-health.json 2>/dev/null; then
    log "FATAL: service did not come up within 30 seconds"
    log " container logs (last 50 lines):"
    cd "$COMPOSE_DIR" && docker compose logs --tail=50 atocore || true
    exit 3
fi
# ---------------------------------------------------------------------
# Step 6: verify the deployed version matches expectations
# ---------------------------------------------------------------------

log "Step 6: verifying deployed version"
log " /health response:"
if command -v jq >/dev/null 2>&1; then
    jq . < /tmp/atocore-health.json | sed 's/^/ /'
    REPORTED_VERSION="$(jq -r '.code_version // .version' < /tmp/atocore-health.json)"
else
    cat /tmp/atocore-health.json | sed 's/^/ /'
    echo
    REPORTED_VERSION="$(grep -o '"code_version":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
    if [ -z "$REPORTED_VERSION" ]; then
        REPORTED_VERSION="$(grep -o '"version":"[^"]*"' /tmp/atocore-health.json | head -1 | cut -d'"' -f4)"
    fi
fi

EXPECTED_VERSION="$(grep -oE "__version__ = \"[^\"]+\"" "$APP_DIR/src/atocore/__init__.py" | head -1 | cut -d'"' -f2)"

log " expected code_version: $EXPECTED_VERSION (from $APP_DIR/src/atocore/__init__.py)"
log " reported code_version: $REPORTED_VERSION (from live /health)"

if [ "$REPORTED_VERSION" != "$EXPECTED_VERSION" ]; then
    log "WARNING: deployed version mismatch"
    log " the container may not have picked up the new image"
    log " try: docker compose down && docker compose up -d --build"
    exit 4
fi

log "Deploy complete."
log " commit: $DEPLOYING_SHA"
log " code_version: $REPORTED_VERSION"
log " health: ok"