Files
ATOCore/docs/dalidou-deployment.md

241 lines
7.6 KiB
Markdown
Raw Normal View History

# Dalidou Deployment
## Purpose
Deploy AtoCore on Dalidou as the canonical runtime and machine-memory host.
## Model
- Dalidou hosts the canonical AtoCore service.
- OpenClaw on the T420 consumes AtoCore over network/Tailscale API.
- `sources/vault` and `sources/drive` are read-only inputs by convention.
- SQLite/Chroma machine state stays on Dalidou and is not treated as a sync peer.
- The app and machine-storage host can be live before the long-term content
corpus is fully populated.
## Directory layout
```text
/srv/storage/atocore/
app/ # deployed repo checkout
data/
db/
chroma/
cache/
tmp/
sources/
vault/
drive/
logs/
backups/
run/
```
## Compose workflow
The compose definition lives in:
```text
deploy/dalidou/docker-compose.yml
```
The Dalidou environment file should be copied to:
```text
deploy/dalidou/.env
```
starting from:
```text
deploy/dalidou/.env.example
```
deploy: version-visible /health + deploy.sh + update runbook Dalidou Claude's validation run against the live service exposed a structural gap: the deployment at /srv/storage/atocore/app has no git connection, the running container was built from pre-Phase-9 source, and /health hardcoded 'version: 0.1.0' so drift is invisible. Weeks of work have been shipping to Gitea but never reaching the live service. This commit fixes both the drift-invisibility problem and the absence of an update workflow, so the next deploy to Dalidou can go live cleanly and future drifts surface immediately. Layer 1: deployment drift is now visible via /health ---------------------------------------------------- - src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0 and documented as the source of truth for the deployed code version, with a history block explaining when each bump happens (API surface change, schema change, user-visible behavior change) - src/atocore/main.py: FastAPI constructor now uses __version__ instead of the hardcoded '0.1.0' string, so the OpenAPI docs reflect the actual code version - src/atocore/api/routes.py: /health now reads from __version__ dynamically. Both the existing 'version' field and a new 'code_version' field report the same value for backwards compat. A new docstring explains that comparing this to the main branch's __version__ is the fastest way to detect drift. - pyproject.toml: version bumped to 0.2.0 to stay in sync The comparison is now: curl /health -> "code_version": "0.2.0" grep __version__ src/atocore/__init__.py -> "0.2.0" If those differ, the deployment is stale. Concrete, unambiguous. Layer 2: deploy.sh as the canonical update path ----------------------------------------------- New file: deploy/dalidou/deploy.sh One-shot bash script that handles both the first-time deploy (where /srv/storage/atocore/app may not be a git repo yet) and the ongoing update case. Steps: 1. If app dir is not a git checkout, back it up as <dir>.pre-git-<utc-stamp> and re-clone from Gitea. If it IS a checkout, fetch + reset --hard origin/<branch>. 2. Report the deployable commit SHA 3. Check that deploy/dalidou/.env exists (hard fail if missing with a clear message pointing at .env.example) 4. docker compose up -d --build — rebuilds the image from current source, restarts the container 5. Poll /health for up to 30 seconds; on failure, print the last 50 lines of container logs and exit non-zero 6. Parse /health.code_version and compare to the __version__ in the freshly-pulled source. If they differ, exit non-zero with a message suggesting docker compose down && up 7. On success, report commit + code_version + "health: ok" Configurable via env vars: - ATOCORE_APP_DIR (default /srv/storage/atocore/app) - ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git) - ATOCORE_BRANCH (default main) - ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health) - ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode Explicit non-goals documented in the script header: - does not manage secrets (.env is the caller's responsibility) - does not take a pre-deploy backup (call /admin/backup first if you want one) - does not roll back on failure (redeploy a known-good commit to recover) - does not touch the DB directly — schema migrations run at service startup via the lifespan handler, and all existing _apply_migrations ALTERs are idempotent ADD COLUMN operations Layer 3: updated docs/dalidou-deployment.md ------------------------------------------- - First-time deployment steps now explicitly say "git clone", not "place the repository", so future first-time deploys don't end up as static snapshots again - New "Updating a running deployment" section covering deploy.sh usage with all three modes (normal / branch override / dry-run) - New "Deployment drift detection" section with the one-liner comparison between /health code_version and the repo's __version__ - New "Schema migrations on redeploy" section enumerating the exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0 upgrade, confirming they are additive-only and safe, and recommending a backup via /admin/backup before any redeploy Full suite: 215 passing, 1 warning. No test was hardcoded to the old version string, so the version bump was safe without test changes. What this commit does NOT do ---------------------------- - Does NOT execute the deploy on the live Dalidou instance. That requires Dalidou access and is the next step. A ready-to-paste prompt for Dalidou Claude will be provided separately. - Does NOT add CI/CD, webhook-based auto-deploy, or reverse proxy. Those remain in the 'deferred' section of the deployment doc. - Does NOT change the Dockerfile. The existing 'COPY source at build time' pattern is what deploy.sh relies on — rebuilding the image picks up new code. - Does NOT modify the database schema. The Phase 9 migrations that Dalidou's DB needs will be applied automatically on next service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
## First-time deployment steps
1. Place the repository under `/srv/storage/atocore/app` — ideally as a
proper git clone so future updates can be pulled, not as a static
snapshot:
```bash
sudo git clone http://dalidou:3000/Antoine/ATOCore.git \
/srv/storage/atocore/app
```
2. Create the canonical directories listed above.
3. Copy `deploy/dalidou/.env.example` to `deploy/dalidou/.env`.
4. Adjust the source paths if your AtoVault/AtoDrive mirrors live elsewhere.
5. Run:
deploy: version-visible /health + deploy.sh + update runbook Dalidou Claude's validation run against the live service exposed a structural gap: the deployment at /srv/storage/atocore/app has no git connection, the running container was built from pre-Phase-9 source, and /health hardcoded 'version: 0.1.0' so drift is invisible. Weeks of work have been shipping to Gitea but never reaching the live service. This commit fixes both the drift-invisibility problem and the absence of an update workflow, so the next deploy to Dalidou can go live cleanly and future drifts surface immediately. Layer 1: deployment drift is now visible via /health ---------------------------------------------------- - src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0 and documented as the source of truth for the deployed code version, with a history block explaining when each bump happens (API surface change, schema change, user-visible behavior change) - src/atocore/main.py: FastAPI constructor now uses __version__ instead of the hardcoded '0.1.0' string, so the OpenAPI docs reflect the actual code version - src/atocore/api/routes.py: /health now reads from __version__ dynamically. Both the existing 'version' field and a new 'code_version' field report the same value for backwards compat. A new docstring explains that comparing this to the main branch's __version__ is the fastest way to detect drift. - pyproject.toml: version bumped to 0.2.0 to stay in sync The comparison is now: curl /health -> "code_version": "0.2.0" grep __version__ src/atocore/__init__.py -> "0.2.0" If those differ, the deployment is stale. Concrete, unambiguous. Layer 2: deploy.sh as the canonical update path ----------------------------------------------- New file: deploy/dalidou/deploy.sh One-shot bash script that handles both the first-time deploy (where /srv/storage/atocore/app may not be a git repo yet) and the ongoing update case. Steps: 1. If app dir is not a git checkout, back it up as <dir>.pre-git-<utc-stamp> and re-clone from Gitea. If it IS a checkout, fetch + reset --hard origin/<branch>. 2. Report the deployable commit SHA 3. Check that deploy/dalidou/.env exists (hard fail if missing with a clear message pointing at .env.example) 4. docker compose up -d --build — rebuilds the image from current source, restarts the container 5. Poll /health for up to 30 seconds; on failure, print the last 50 lines of container logs and exit non-zero 6. Parse /health.code_version and compare to the __version__ in the freshly-pulled source. If they differ, exit non-zero with a message suggesting docker compose down && up 7. On success, report commit + code_version + "health: ok" Configurable via env vars: - ATOCORE_APP_DIR (default /srv/storage/atocore/app) - ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git) - ATOCORE_BRANCH (default main) - ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health) - ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode Explicit non-goals documented in the script header: - does not manage secrets (.env is the caller's responsibility) - does not take a pre-deploy backup (call /admin/backup first if you want one) - does not roll back on failure (redeploy a known-good commit to recover) - does not touch the DB directly — schema migrations run at service startup via the lifespan handler, and all existing _apply_migrations ALTERs are idempotent ADD COLUMN operations Layer 3: updated docs/dalidou-deployment.md ------------------------------------------- - First-time deployment steps now explicitly say "git clone", not "place the repository", so future first-time deploys don't end up as static snapshots again - New "Updating a running deployment" section covering deploy.sh usage with all three modes (normal / branch override / dry-run) - New "Deployment drift detection" section with the one-liner comparison between /health code_version and the repo's __version__ - New "Schema migrations on redeploy" section enumerating the exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0 upgrade, confirming they are additive-only and safe, and recommending a backup via /admin/backup before any redeploy Full suite: 215 passing, 1 warning. No test was hardcoded to the old version string, so the version bump was safe without test changes. What this commit does NOT do ---------------------------- - Does NOT execute the deploy on the live Dalidou instance. That requires Dalidou access and is the next step. A ready-to-paste prompt for Dalidou Claude will be provided separately. - Does NOT add CI/CD, webhook-based auto-deploy, or reverse proxy. Those remain in the 'deferred' section of the deployment doc. - Does NOT change the Dockerfile. The existing 'COPY source at build time' pattern is what deploy.sh relies on — rebuilding the image picks up new code. - Does NOT modify the database schema. The Phase 9 migrations that Dalidou's DB needs will be applied automatically on next service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
```bash
cd /srv/storage/atocore/app/deploy/dalidou
docker compose up -d --build
```
6. Validate:
```bash
curl http://127.0.0.1:8100/health
curl http://127.0.0.1:8100/sources
```
## Updating a running deployment
**Use `deploy/dalidou/deploy.sh` for every code update.** It is the
one-shot sync script that:
- fetches latest main from Gitea into `/srv/storage/atocore/app`
- (if the app dir is not a git checkout) backs it up as
`<dir>.pre-git-<timestamp>` and re-clones
- rebuilds the container image
- restarts the container
- waits for `/health` to respond
- compares the reported `code_version` against the
`__version__` in the freshly-pulled source, and exits non-zero
if they don't match (deployment drift detection)
```bash
deploy: version-visible /health + deploy.sh + update runbook Dalidou Claude's validation run against the live service exposed a structural gap: the deployment at /srv/storage/atocore/app has no git connection, the running container was built from pre-Phase-9 source, and /health hardcoded 'version: 0.1.0' so drift is invisible. Weeks of work have been shipping to Gitea but never reaching the live service. This commit fixes both the drift-invisibility problem and the absence of an update workflow, so the next deploy to Dalidou can go live cleanly and future drifts surface immediately. Layer 1: deployment drift is now visible via /health ---------------------------------------------------- - src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0 and documented as the source of truth for the deployed code version, with a history block explaining when each bump happens (API surface change, schema change, user-visible behavior change) - src/atocore/main.py: FastAPI constructor now uses __version__ instead of the hardcoded '0.1.0' string, so the OpenAPI docs reflect the actual code version - src/atocore/api/routes.py: /health now reads from __version__ dynamically. Both the existing 'version' field and a new 'code_version' field report the same value for backwards compat. A new docstring explains that comparing this to the main branch's __version__ is the fastest way to detect drift. - pyproject.toml: version bumped to 0.2.0 to stay in sync The comparison is now: curl /health -> "code_version": "0.2.0" grep __version__ src/atocore/__init__.py -> "0.2.0" If those differ, the deployment is stale. Concrete, unambiguous. Layer 2: deploy.sh as the canonical update path ----------------------------------------------- New file: deploy/dalidou/deploy.sh One-shot bash script that handles both the first-time deploy (where /srv/storage/atocore/app may not be a git repo yet) and the ongoing update case. Steps: 1. If app dir is not a git checkout, back it up as <dir>.pre-git-<utc-stamp> and re-clone from Gitea. If it IS a checkout, fetch + reset --hard origin/<branch>. 2. Report the deployable commit SHA 3. Check that deploy/dalidou/.env exists (hard fail if missing with a clear message pointing at .env.example) 4. docker compose up -d --build — rebuilds the image from current source, restarts the container 5. Poll /health for up to 30 seconds; on failure, print the last 50 lines of container logs and exit non-zero 6. Parse /health.code_version and compare to the __version__ in the freshly-pulled source. If they differ, exit non-zero with a message suggesting docker compose down && up 7. On success, report commit + code_version + "health: ok" Configurable via env vars: - ATOCORE_APP_DIR (default /srv/storage/atocore/app) - ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git) - ATOCORE_BRANCH (default main) - ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health) - ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode Explicit non-goals documented in the script header: - does not manage secrets (.env is the caller's responsibility) - does not take a pre-deploy backup (call /admin/backup first if you want one) - does not roll back on failure (redeploy a known-good commit to recover) - does not touch the DB directly — schema migrations run at service startup via the lifespan handler, and all existing _apply_migrations ALTERs are idempotent ADD COLUMN operations Layer 3: updated docs/dalidou-deployment.md ------------------------------------------- - First-time deployment steps now explicitly say "git clone", not "place the repository", so future first-time deploys don't end up as static snapshots again - New "Updating a running deployment" section covering deploy.sh usage with all three modes (normal / branch override / dry-run) - New "Deployment drift detection" section with the one-liner comparison between /health code_version and the repo's __version__ - New "Schema migrations on redeploy" section enumerating the exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0 upgrade, confirming they are additive-only and safe, and recommending a backup via /admin/backup before any redeploy Full suite: 215 passing, 1 warning. No test was hardcoded to the old version string, so the version bump was safe without test changes. What this commit does NOT do ---------------------------- - Does NOT execute the deploy on the live Dalidou instance. That requires Dalidou access and is the next step. A ready-to-paste prompt for Dalidou Claude will be provided separately. - Does NOT add CI/CD, webhook-based auto-deploy, or reverse proxy. Those remain in the 'deferred' section of the deployment doc. - Does NOT change the Dockerfile. The existing 'COPY source at build time' pattern is what deploy.sh relies on — rebuilding the image picks up new code. - Does NOT modify the database schema. The Phase 9 migrations that Dalidou's DB needs will be applied automatically on next service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
# Normal update from main
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
# Deploy a specific branch or tag
ATOCORE_BRANCH=codex/some-feature \
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
# Dry-run: show what would happen without touching anything
ATOCORE_DEPLOY_DRY_RUN=1 \
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
fix: schema init ordering, deploy.sh default, client BASE_URL docs Three issues Dalidou Claude surfaced during the first real deploy of commit e877e5b to the live service (report from 2026-04-08). Bug 1 was the critical one — a schema init ordering bug that would have bitten every future upgrade from a pre-Phase-9 schema — and the other two were usability traps around hostname resolution. Bug 1 (CRITICAL): schema init ordering -------------------------------------- src/atocore/models/database.py SCHEMA_SQL contained CREATE INDEX statements that referenced columns added later by _apply_migrations(): CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project); CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project); CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id); On a FRESH install, CREATE TABLE IF NOT EXISTS creates the tables with the Phase 9 shape (columns present), so the CREATE INDEX runs cleanly and _apply_migrations is effectively a no-op. On an UPGRADE from a pre-Phase-9 schema, CREATE TABLE IF NOT EXISTS is a no-op (the tables already exist in the old shape), the columns are NOT added yet, and the CREATE INDEX fails with "OperationalError: no such column: project" before _apply_migrations gets a chance to add the columns. Dalidou Claude hit this exactly when redeploying from 0.1.0 to 0.2.0 — had to manually ALTER TABLE to add the Phase 9 columns before the container could start. The fix is to remove the Phase 9-column indexes from SCHEMA_SQL. They already exist in _apply_migrations() AFTER the corresponding ALTER TABLE, so they still get created on both fresh and upgrade paths — just after the columns exist, not before. Indexes still in SCHEMA_SQL (all safe — reference columns that have existed since the first release): - idx_chunks_document on source_chunks(document_id) - idx_memories_type on memories(memory_type) - idx_memories_status on memories(status) - idx_interactions_project on interactions(project_id) Indexes moved to _apply_migrations (already there — just no longer duplicated in SCHEMA_SQL): - idx_memories_project on memories(project) - idx_interactions_project_name on interactions(project) - idx_interactions_session on interactions(session_id) - idx_interactions_created_at on interactions(created_at) Regression test: tests/test_database.py --------------------------------------- New test_init_db_upgrades_pre_phase9_schema_without_failing: - Seeds the DB with the exact pre-Phase-9 shape (no project / last_referenced_at / reference_count on memories; no project / client / session_id / response / memories_used / chunks_used on interactions) - Calls init_db() — which used to raise OperationalError before the fix - Verifies all Phase 9 columns are present after the call - Verifies the migration indexes exist Before the fix this test would have failed with "OperationalError: no such column: project" on the init_db call. After the fix it passes. This locks the invariant "init_db is safe on any legacy schema shape" so the bug can't silently come back. Full suite: 216 passing (was 215), 1 warning. The +1 is the new regression test. Bug 3 (usability): deploy.sh DNS default ---------------------------------------- deploy/dalidou/deploy.sh ATOCORE_GIT_REMOTE defaulted to http://dalidou:3000/Antoine/ATOCore.git which requires the "dalidou" hostname to resolve. On the Dalidou host itself it didn't (no /etc/hosts entry for localhost alias), so deploy.sh had to be run with the IP as a manual workaround. Fix: default ATOCORE_GIT_REMOTE to http://127.0.0.1:3000/Antoine/ATOCore.git. Loopback always works on the host running the script. Callers from a remote host (e.g. running deploy.sh from a laptop against the Dalidou LAN) set ATOCORE_GIT_REMOTE explicitly. The script header's Environment Variables section documents this with an explicit reference to the 2026-04-08 Dalidou deploy report so the rationale isn't lost. docs/dalidou-deployment.md gets a new "Troubleshooting hostname resolution" subsection and a new example invocation showing how to deploy from a remote host with an explicit ATOCORE_GIT_REMOTE override. Bug 2 (usability): atocore_client.py ATOCORE_BASE_URL documentation ------------------------------------------------------------------- scripts/atocore_client.py Same class of issue as bug 3. BASE_URL defaults to http://dalidou:8100 which resolves fine from a remote caller (laptop, T420/OpenClaw over Tailscale) but NOT from the Dalidou host itself or from inside the atocore container. Dalidou Claude saw the CLI return {"status": "unavailable", "fail_open": true} while direct curl to http://127.0.0.1:8100 worked. The fix here is NOT to change the default (remote callers are the common case and would break) but to DOCUMENT the override clearly so the next operator knows what's happening: - The script module docstring grew a new "Environment variables" section covering ATOCORE_BASE_URL, ATOCORE_TIMEOUT_SECONDS, ATOCORE_REFRESH_TIMEOUT_SECONDS, and ATOCORE_FAIL_OPEN, with the explicit override example for on-host/in-container use - It calls out the exact symptom (fail-open envelope when the base URL doesn't resolve) so the diagnosis is obvious from the error alone - docs/dalidou-deployment.md troubleshooting section mirrors this guidance so there's one place to look regardless of whether the operator starts with the client help or the deploy doc What this commit does NOT do ---------------------------- - Does NOT change the default ATOCORE_BASE_URL. Doing that would break the T420 OpenClaw helper and every remote caller who currently relies on the hostname. Documentation is the right fix for this case. - Does NOT fix /etc/hosts on Dalidou. That's a host-level configuration issue that the user can fix if they prefer having the hostname resolve; the deploy.sh fix makes it unnecessary regardless. - Does NOT re-run the validation on Dalidou. The next step is for the live service to pull this commit via deploy.sh (which should now work without the IP workaround) and re-run the Phase 9 loop test to confirm nothing regressed.
2026-04-08 19:02:57 -04:00
# Deploy from a remote host (e.g. the laptop) using the Tailscale
# or LAN address instead of loopback
ATOCORE_GIT_REMOTE=http://192.168.86.50:3000/Antoine/ATOCore.git \
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
```
deploy: version-visible /health + deploy.sh + update runbook Dalidou Claude's validation run against the live service exposed a structural gap: the deployment at /srv/storage/atocore/app has no git connection, the running container was built from pre-Phase-9 source, and /health hardcoded 'version: 0.1.0' so drift is invisible. Weeks of work have been shipping to Gitea but never reaching the live service. This commit fixes both the drift-invisibility problem and the absence of an update workflow, so the next deploy to Dalidou can go live cleanly and future drifts surface immediately. Layer 1: deployment drift is now visible via /health ---------------------------------------------------- - src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0 and documented as the source of truth for the deployed code version, with a history block explaining when each bump happens (API surface change, schema change, user-visible behavior change) - src/atocore/main.py: FastAPI constructor now uses __version__ instead of the hardcoded '0.1.0' string, so the OpenAPI docs reflect the actual code version - src/atocore/api/routes.py: /health now reads from __version__ dynamically. Both the existing 'version' field and a new 'code_version' field report the same value for backwards compat. A new docstring explains that comparing this to the main branch's __version__ is the fastest way to detect drift. - pyproject.toml: version bumped to 0.2.0 to stay in sync The comparison is now: curl /health -> "code_version": "0.2.0" grep __version__ src/atocore/__init__.py -> "0.2.0" If those differ, the deployment is stale. Concrete, unambiguous. Layer 2: deploy.sh as the canonical update path ----------------------------------------------- New file: deploy/dalidou/deploy.sh One-shot bash script that handles both the first-time deploy (where /srv/storage/atocore/app may not be a git repo yet) and the ongoing update case. Steps: 1. If app dir is not a git checkout, back it up as <dir>.pre-git-<utc-stamp> and re-clone from Gitea. If it IS a checkout, fetch + reset --hard origin/<branch>. 2. Report the deployable commit SHA 3. Check that deploy/dalidou/.env exists (hard fail if missing with a clear message pointing at .env.example) 4. docker compose up -d --build — rebuilds the image from current source, restarts the container 5. Poll /health for up to 30 seconds; on failure, print the last 50 lines of container logs and exit non-zero 6. Parse /health.code_version and compare to the __version__ in the freshly-pulled source. If they differ, exit non-zero with a message suggesting docker compose down && up 7. On success, report commit + code_version + "health: ok" Configurable via env vars: - ATOCORE_APP_DIR (default /srv/storage/atocore/app) - ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git) - ATOCORE_BRANCH (default main) - ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health) - ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode Explicit non-goals documented in the script header: - does not manage secrets (.env is the caller's responsibility) - does not take a pre-deploy backup (call /admin/backup first if you want one) - does not roll back on failure (redeploy a known-good commit to recover) - does not touch the DB directly — schema migrations run at service startup via the lifespan handler, and all existing _apply_migrations ALTERs are idempotent ADD COLUMN operations Layer 3: updated docs/dalidou-deployment.md ------------------------------------------- - First-time deployment steps now explicitly say "git clone", not "place the repository", so future first-time deploys don't end up as static snapshots again - New "Updating a running deployment" section covering deploy.sh usage with all three modes (normal / branch override / dry-run) - New "Deployment drift detection" section with the one-liner comparison between /health code_version and the repo's __version__ - New "Schema migrations on redeploy" section enumerating the exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0 upgrade, confirming they are additive-only and safe, and recommending a backup via /admin/backup before any redeploy Full suite: 215 passing, 1 warning. No test was hardcoded to the old version string, so the version bump was safe without test changes. What this commit does NOT do ---------------------------- - Does NOT execute the deploy on the live Dalidou instance. That requires Dalidou access and is the next step. A ready-to-paste prompt for Dalidou Claude will be provided separately. - Does NOT add CI/CD, webhook-based auto-deploy, or reverse proxy. Those remain in the 'deferred' section of the deployment doc. - Does NOT change the Dockerfile. The existing 'COPY source at build time' pattern is what deploy.sh relies on — rebuilding the image picks up new code. - Does NOT modify the database schema. The Phase 9 migrations that Dalidou's DB needs will be applied automatically on next service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
The script is idempotent and safe to re-run. It never touches the
database directly — schema migrations are applied automatically at
service startup by the lifespan handler in `src/atocore/main.py`
which calls `init_db()` (which in turn runs the ALTER TABLE
statements in `_apply_migrations`).
fix: schema init ordering, deploy.sh default, client BASE_URL docs Three issues Dalidou Claude surfaced during the first real deploy of commit e877e5b to the live service (report from 2026-04-08). Bug 1 was the critical one — a schema init ordering bug that would have bitten every future upgrade from a pre-Phase-9 schema — and the other two were usability traps around hostname resolution. Bug 1 (CRITICAL): schema init ordering -------------------------------------- src/atocore/models/database.py SCHEMA_SQL contained CREATE INDEX statements that referenced columns added later by _apply_migrations(): CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project); CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project); CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id); On a FRESH install, CREATE TABLE IF NOT EXISTS creates the tables with the Phase 9 shape (columns present), so the CREATE INDEX runs cleanly and _apply_migrations is effectively a no-op. On an UPGRADE from a pre-Phase-9 schema, CREATE TABLE IF NOT EXISTS is a no-op (the tables already exist in the old shape), the columns are NOT added yet, and the CREATE INDEX fails with "OperationalError: no such column: project" before _apply_migrations gets a chance to add the columns. Dalidou Claude hit this exactly when redeploying from 0.1.0 to 0.2.0 — had to manually ALTER TABLE to add the Phase 9 columns before the container could start. The fix is to remove the Phase 9-column indexes from SCHEMA_SQL. They already exist in _apply_migrations() AFTER the corresponding ALTER TABLE, so they still get created on both fresh and upgrade paths — just after the columns exist, not before. Indexes still in SCHEMA_SQL (all safe — reference columns that have existed since the first release): - idx_chunks_document on source_chunks(document_id) - idx_memories_type on memories(memory_type) - idx_memories_status on memories(status) - idx_interactions_project on interactions(project_id) Indexes moved to _apply_migrations (already there — just no longer duplicated in SCHEMA_SQL): - idx_memories_project on memories(project) - idx_interactions_project_name on interactions(project) - idx_interactions_session on interactions(session_id) - idx_interactions_created_at on interactions(created_at) Regression test: tests/test_database.py --------------------------------------- New test_init_db_upgrades_pre_phase9_schema_without_failing: - Seeds the DB with the exact pre-Phase-9 shape (no project / last_referenced_at / reference_count on memories; no project / client / session_id / response / memories_used / chunks_used on interactions) - Calls init_db() — which used to raise OperationalError before the fix - Verifies all Phase 9 columns are present after the call - Verifies the migration indexes exist Before the fix this test would have failed with "OperationalError: no such column: project" on the init_db call. After the fix it passes. This locks the invariant "init_db is safe on any legacy schema shape" so the bug can't silently come back. Full suite: 216 passing (was 215), 1 warning. The +1 is the new regression test. Bug 3 (usability): deploy.sh DNS default ---------------------------------------- deploy/dalidou/deploy.sh ATOCORE_GIT_REMOTE defaulted to http://dalidou:3000/Antoine/ATOCore.git which requires the "dalidou" hostname to resolve. On the Dalidou host itself it didn't (no /etc/hosts entry for localhost alias), so deploy.sh had to be run with the IP as a manual workaround. Fix: default ATOCORE_GIT_REMOTE to http://127.0.0.1:3000/Antoine/ATOCore.git. Loopback always works on the host running the script. Callers from a remote host (e.g. running deploy.sh from a laptop against the Dalidou LAN) set ATOCORE_GIT_REMOTE explicitly. The script header's Environment Variables section documents this with an explicit reference to the 2026-04-08 Dalidou deploy report so the rationale isn't lost. docs/dalidou-deployment.md gets a new "Troubleshooting hostname resolution" subsection and a new example invocation showing how to deploy from a remote host with an explicit ATOCORE_GIT_REMOTE override. Bug 2 (usability): atocore_client.py ATOCORE_BASE_URL documentation ------------------------------------------------------------------- scripts/atocore_client.py Same class of issue as bug 3. BASE_URL defaults to http://dalidou:8100 which resolves fine from a remote caller (laptop, T420/OpenClaw over Tailscale) but NOT from the Dalidou host itself or from inside the atocore container. Dalidou Claude saw the CLI return {"status": "unavailable", "fail_open": true} while direct curl to http://127.0.0.1:8100 worked. The fix here is NOT to change the default (remote callers are the common case and would break) but to DOCUMENT the override clearly so the next operator knows what's happening: - The script module docstring grew a new "Environment variables" section covering ATOCORE_BASE_URL, ATOCORE_TIMEOUT_SECONDS, ATOCORE_REFRESH_TIMEOUT_SECONDS, and ATOCORE_FAIL_OPEN, with the explicit override example for on-host/in-container use - It calls out the exact symptom (fail-open envelope when the base URL doesn't resolve) so the diagnosis is obvious from the error alone - docs/dalidou-deployment.md troubleshooting section mirrors this guidance so there's one place to look regardless of whether the operator starts with the client help or the deploy doc What this commit does NOT do ---------------------------- - Does NOT change the default ATOCORE_BASE_URL. Doing that would break the T420 OpenClaw helper and every remote caller who currently relies on the hostname. Documentation is the right fix for this case. - Does NOT fix /etc/hosts on Dalidou. That's a host-level configuration issue that the user can fix if they prefer having the hostname resolve; the deploy.sh fix makes it unnecessary regardless. - Does NOT re-run the validation on Dalidou. The next step is for the live service to pull this commit via deploy.sh (which should now work without the IP workaround) and re-run the Phase 9 loop test to confirm nothing regressed.
2026-04-08 19:02:57 -04:00
### Troubleshooting hostname resolution
`deploy.sh` defaults `ATOCORE_GIT_REMOTE` to
`http://127.0.0.1:3000/Antoine/ATOCore.git` (loopback) because the
hostname "dalidou" doesn't reliably resolve on the host itself —
the first real Dalidou deploy hit exactly this on 2026-04-08. If
you need to override (e.g. running deploy.sh from a laptop against
the Dalidou LAN), set `ATOCORE_GIT_REMOTE` explicitly.
The same applies to `scripts/atocore_client.py`: its default
`ATOCORE_BASE_URL` is `http://dalidou:8100` for remote callers, but
when running the client on Dalidou itself (or inside the container
via `docker exec`), override to loopback:
```bash
ATOCORE_BASE_URL=http://127.0.0.1:8100 \
python scripts/atocore_client.py health
```
If you see `{"status": "unavailable", "fail_open": true}` from the
client, the first thing to check is whether the base URL resolves
from where you're running the client.
deploy: version-visible /health + deploy.sh + update runbook Dalidou Claude's validation run against the live service exposed a structural gap: the deployment at /srv/storage/atocore/app has no git connection, the running container was built from pre-Phase-9 source, and /health hardcoded 'version: 0.1.0' so drift is invisible. Weeks of work have been shipping to Gitea but never reaching the live service. This commit fixes both the drift-invisibility problem and the absence of an update workflow, so the next deploy to Dalidou can go live cleanly and future drifts surface immediately. Layer 1: deployment drift is now visible via /health ---------------------------------------------------- - src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0 and documented as the source of truth for the deployed code version, with a history block explaining when each bump happens (API surface change, schema change, user-visible behavior change) - src/atocore/main.py: FastAPI constructor now uses __version__ instead of the hardcoded '0.1.0' string, so the OpenAPI docs reflect the actual code version - src/atocore/api/routes.py: /health now reads from __version__ dynamically. Both the existing 'version' field and a new 'code_version' field report the same value for backwards compat. A new docstring explains that comparing this to the main branch's __version__ is the fastest way to detect drift. - pyproject.toml: version bumped to 0.2.0 to stay in sync The comparison is now: curl /health -> "code_version": "0.2.0" grep __version__ src/atocore/__init__.py -> "0.2.0" If those differ, the deployment is stale. Concrete, unambiguous. Layer 2: deploy.sh as the canonical update path ----------------------------------------------- New file: deploy/dalidou/deploy.sh One-shot bash script that handles both the first-time deploy (where /srv/storage/atocore/app may not be a git repo yet) and the ongoing update case. Steps: 1. If app dir is not a git checkout, back it up as <dir>.pre-git-<utc-stamp> and re-clone from Gitea. If it IS a checkout, fetch + reset --hard origin/<branch>. 2. Report the deployable commit SHA 3. Check that deploy/dalidou/.env exists (hard fail if missing with a clear message pointing at .env.example) 4. docker compose up -d --build — rebuilds the image from current source, restarts the container 5. Poll /health for up to 30 seconds; on failure, print the last 50 lines of container logs and exit non-zero 6. Parse /health.code_version and compare to the __version__ in the freshly-pulled source. If they differ, exit non-zero with a message suggesting docker compose down && up 7. On success, report commit + code_version + "health: ok" Configurable via env vars: - ATOCORE_APP_DIR (default /srv/storage/atocore/app) - ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git) - ATOCORE_BRANCH (default main) - ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health) - ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode Explicit non-goals documented in the script header: - does not manage secrets (.env is the caller's responsibility) - does not take a pre-deploy backup (call /admin/backup first if you want one) - does not roll back on failure (redeploy a known-good commit to recover) - does not touch the DB directly — schema migrations run at service startup via the lifespan handler, and all existing _apply_migrations ALTERs are idempotent ADD COLUMN operations Layer 3: updated docs/dalidou-deployment.md ------------------------------------------- - First-time deployment steps now explicitly say "git clone", not "place the repository", so future first-time deploys don't end up as static snapshots again - New "Updating a running deployment" section covering deploy.sh usage with all three modes (normal / branch override / dry-run) - New "Deployment drift detection" section with the one-liner comparison between /health code_version and the repo's __version__ - New "Schema migrations on redeploy" section enumerating the exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0 upgrade, confirming they are additive-only and safe, and recommending a backup via /admin/backup before any redeploy Full suite: 215 passing, 1 warning. No test was hardcoded to the old version string, so the version bump was safe without test changes. What this commit does NOT do ---------------------------- - Does NOT execute the deploy on the live Dalidou instance. That requires Dalidou access and is the next step. A ready-to-paste prompt for Dalidou Claude will be provided separately. - Does NOT add CI/CD, webhook-based auto-deploy, or reverse proxy. Those remain in the 'deferred' section of the deployment doc. - Does NOT change the Dockerfile. The existing 'COPY source at build time' pattern is what deploy.sh relies on — rebuilding the image picks up new code. - Does NOT modify the database schema. The Phase 9 migrations that Dalidou's DB needs will be applied automatically on next service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
### Deployment drift detection
deploy: add build_sha visibility for precise drift detection Make /health report the precise git SHA the container was built from, so 'is the live service current?' can be answered without ambiguity. 0.2.0 was too coarse to trust as a 'live is current' signal — many commits share the same __version__. Three layers: 1. /health endpoint (src/atocore/api/routes.py) - Reads ATOCORE_BUILD_SHA, ATOCORE_BUILD_TIME, ATOCORE_BUILD_BRANCH from environment, defaults to 'unknown' - Reports them alongside existing code_version field 2. docker-compose.yml - Forwards the three env vars from the host into the container - Defaults to 'unknown' so direct `docker compose up` runs (without deploy.sh) cleanly signal missing build provenance 3. deploy.sh - Step 2 captures git SHA + UTC timestamp + branch and exports them as env vars before `docker compose up -d --build` - Step 6 reads /health post-deploy and compares the reported build_sha against the freshly-built one. Mismatch exits non-zero (exit code 6) with a remediation hint covering cached image, env propagation, and concurrent restart cases Tests (tests/test_api_storage.py): - test_health_endpoint_reports_code_version_from_module - test_health_endpoint_reports_build_metadata_from_env - test_health_endpoint_reports_unknown_when_build_env_unset Docs (docs/dalidou-deployment.md): - Three-level drift detection table (code_version coarse, build_sha precise, build_time/branch forensic) - Canonical drift check script using LIVE_SHA vs EXPECTED_SHA - Note that running deploy.sh is itself the simplest drift check 219/219 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:25:32 -04:00
`/health` reports drift signals at three increasing levels of
precision:
| Field | Source | Precision | When to use |
|---|---|---|---|
| `version` / `code_version` | `atocore.__version__` (manual bump) | coarse — same value across many commits | quick smoke check that the right *release* is running |
| `build_sha` | `ATOCORE_BUILD_SHA` env var, set by `deploy.sh` per build | precise — changes per commit | the canonical drift signal |
| `build_time` / `build_branch` | same env var path | per-build | forensics when multiple branches in flight |
The **precise** check (run on the laptop or any host that can curl
the live service AND has the source repo at hand):
```bash
deploy: add build_sha visibility for precise drift detection Make /health report the precise git SHA the container was built from, so 'is the live service current?' can be answered without ambiguity. 0.2.0 was too coarse to trust as a 'live is current' signal — many commits share the same __version__. Three layers: 1. /health endpoint (src/atocore/api/routes.py) - Reads ATOCORE_BUILD_SHA, ATOCORE_BUILD_TIME, ATOCORE_BUILD_BRANCH from environment, defaults to 'unknown' - Reports them alongside existing code_version field 2. docker-compose.yml - Forwards the three env vars from the host into the container - Defaults to 'unknown' so direct `docker compose up` runs (without deploy.sh) cleanly signal missing build provenance 3. deploy.sh - Step 2 captures git SHA + UTC timestamp + branch and exports them as env vars before `docker compose up -d --build` - Step 6 reads /health post-deploy and compares the reported build_sha against the freshly-built one. Mismatch exits non-zero (exit code 6) with a remediation hint covering cached image, env propagation, and concurrent restart cases Tests (tests/test_api_storage.py): - test_health_endpoint_reports_code_version_from_module - test_health_endpoint_reports_build_metadata_from_env - test_health_endpoint_reports_unknown_when_build_env_unset Docs (docs/dalidou-deployment.md): - Three-level drift detection table (code_version coarse, build_sha precise, build_time/branch forensic) - Canonical drift check script using LIVE_SHA vs EXPECTED_SHA - Note that running deploy.sh is itself the simplest drift check 219/219 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:25:32 -04:00
# What's actually running on Dalidou
LIVE_SHA=$(curl -fsS http://dalidou:8100/health | grep -o '"build_sha":"[^"]*"' | cut -d'"' -f4)
# What the deployed branch tip should be
EXPECTED_SHA=$(cd /srv/storage/atocore/app && git rev-parse HEAD)
# Compare
if [ "$LIVE_SHA" = "$EXPECTED_SHA" ]; then
echo "live is current at $LIVE_SHA"
else
echo "DRIFT: live $LIVE_SHA vs expected $EXPECTED_SHA"
echo "run deploy.sh to sync"
fi
```
The `deploy.sh` script does exactly this comparison automatically
in its post-deploy verification step (Step 6) and exits non-zero
on mismatch. So the **simplest drift check** is just to run
`deploy.sh` — if there's nothing to deploy, it succeeds quickly;
if the live service is stale, it deploys and verifies.
deploy: version-visible /health + deploy.sh + update runbook Dalidou Claude's validation run against the live service exposed a structural gap: the deployment at /srv/storage/atocore/app has no git connection, the running container was built from pre-Phase-9 source, and /health hardcoded 'version: 0.1.0' so drift is invisible. Weeks of work have been shipping to Gitea but never reaching the live service. This commit fixes both the drift-invisibility problem and the absence of an update workflow, so the next deploy to Dalidou can go live cleanly and future drifts surface immediately. Layer 1: deployment drift is now visible via /health ---------------------------------------------------- - src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0 and documented as the source of truth for the deployed code version, with a history block explaining when each bump happens (API surface change, schema change, user-visible behavior change) - src/atocore/main.py: FastAPI constructor now uses __version__ instead of the hardcoded '0.1.0' string, so the OpenAPI docs reflect the actual code version - src/atocore/api/routes.py: /health now reads from __version__ dynamically. Both the existing 'version' field and a new 'code_version' field report the same value for backwards compat. A new docstring explains that comparing this to the main branch's __version__ is the fastest way to detect drift. - pyproject.toml: version bumped to 0.2.0 to stay in sync The comparison is now: curl /health -> "code_version": "0.2.0" grep __version__ src/atocore/__init__.py -> "0.2.0" If those differ, the deployment is stale. Concrete, unambiguous. Layer 2: deploy.sh as the canonical update path ----------------------------------------------- New file: deploy/dalidou/deploy.sh One-shot bash script that handles both the first-time deploy (where /srv/storage/atocore/app may not be a git repo yet) and the ongoing update case. Steps: 1. If app dir is not a git checkout, back it up as <dir>.pre-git-<utc-stamp> and re-clone from Gitea. If it IS a checkout, fetch + reset --hard origin/<branch>. 2. Report the deployable commit SHA 3. Check that deploy/dalidou/.env exists (hard fail if missing with a clear message pointing at .env.example) 4. docker compose up -d --build — rebuilds the image from current source, restarts the container 5. Poll /health for up to 30 seconds; on failure, print the last 50 lines of container logs and exit non-zero 6. Parse /health.code_version and compare to the __version__ in the freshly-pulled source. If they differ, exit non-zero with a message suggesting docker compose down && up 7. On success, report commit + code_version + "health: ok" Configurable via env vars: - ATOCORE_APP_DIR (default /srv/storage/atocore/app) - ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git) - ATOCORE_BRANCH (default main) - ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health) - ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode Explicit non-goals documented in the script header: - does not manage secrets (.env is the caller's responsibility) - does not take a pre-deploy backup (call /admin/backup first if you want one) - does not roll back on failure (redeploy a known-good commit to recover) - does not touch the DB directly — schema migrations run at service startup via the lifespan handler, and all existing _apply_migrations ALTERs are idempotent ADD COLUMN operations Layer 3: updated docs/dalidou-deployment.md ------------------------------------------- - First-time deployment steps now explicitly say "git clone", not "place the repository", so future first-time deploys don't end up as static snapshots again - New "Updating a running deployment" section covering deploy.sh usage with all three modes (normal / branch override / dry-run) - New "Deployment drift detection" section with the one-liner comparison between /health code_version and the repo's __version__ - New "Schema migrations on redeploy" section enumerating the exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0 upgrade, confirming they are additive-only and safe, and recommending a backup via /admin/backup before any redeploy Full suite: 215 passing, 1 warning. No test was hardcoded to the old version string, so the version bump was safe without test changes. What this commit does NOT do ---------------------------- - Does NOT execute the deploy on the live Dalidou instance. That requires Dalidou access and is the next step. A ready-to-paste prompt for Dalidou Claude will be provided separately. - Does NOT add CI/CD, webhook-based auto-deploy, or reverse proxy. Those remain in the 'deferred' section of the deployment doc. - Does NOT change the Dockerfile. The existing 'COPY source at build time' pattern is what deploy.sh relies on — rebuilding the image picks up new code. - Does NOT modify the database schema. The Phase 9 migrations that Dalidou's DB needs will be applied automatically on next service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
deploy: add build_sha visibility for precise drift detection Make /health report the precise git SHA the container was built from, so 'is the live service current?' can be answered without ambiguity. 0.2.0 was too coarse to trust as a 'live is current' signal — many commits share the same __version__. Three layers: 1. /health endpoint (src/atocore/api/routes.py) - Reads ATOCORE_BUILD_SHA, ATOCORE_BUILD_TIME, ATOCORE_BUILD_BRANCH from environment, defaults to 'unknown' - Reports them alongside existing code_version field 2. docker-compose.yml - Forwards the three env vars from the host into the container - Defaults to 'unknown' so direct `docker compose up` runs (without deploy.sh) cleanly signal missing build provenance 3. deploy.sh - Step 2 captures git SHA + UTC timestamp + branch and exports them as env vars before `docker compose up -d --build` - Step 6 reads /health post-deploy and compares the reported build_sha against the freshly-built one. Mismatch exits non-zero (exit code 6) with a remediation hint covering cached image, env propagation, and concurrent restart cases Tests (tests/test_api_storage.py): - test_health_endpoint_reports_code_version_from_module - test_health_endpoint_reports_build_metadata_from_env - test_health_endpoint_reports_unknown_when_build_env_unset Docs (docs/dalidou-deployment.md): - Three-level drift detection table (code_version coarse, build_sha precise, build_time/branch forensic) - Canonical drift check script using LIVE_SHA vs EXPECTED_SHA - Note that running deploy.sh is itself the simplest drift check 219/219 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:25:32 -04:00
If `/health` reports `build_sha: "unknown"`, the running container
was started without `deploy.sh` (probably via `docker compose up`
directly), and the build provenance was never recorded. Re-run
via `deploy.sh` to fix.
The coarse `code_version` check is still useful as a quick visual
sanity check — bumping `__version__` from `0.2.0` to `0.3.0`
signals a meaningful release boundary even if the precise
`build_sha` is what tools should compare against:
```bash
# Quick sanity check (coarse)
curl -s http://127.0.0.1:8100/health | grep -o '"code_version":"[^"]*"'
deploy: version-visible /health + deploy.sh + update runbook Dalidou Claude's validation run against the live service exposed a structural gap: the deployment at /srv/storage/atocore/app has no git connection, the running container was built from pre-Phase-9 source, and /health hardcoded 'version: 0.1.0' so drift is invisible. Weeks of work have been shipping to Gitea but never reaching the live service. This commit fixes both the drift-invisibility problem and the absence of an update workflow, so the next deploy to Dalidou can go live cleanly and future drifts surface immediately. Layer 1: deployment drift is now visible via /health ---------------------------------------------------- - src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0 and documented as the source of truth for the deployed code version, with a history block explaining when each bump happens (API surface change, schema change, user-visible behavior change) - src/atocore/main.py: FastAPI constructor now uses __version__ instead of the hardcoded '0.1.0' string, so the OpenAPI docs reflect the actual code version - src/atocore/api/routes.py: /health now reads from __version__ dynamically. Both the existing 'version' field and a new 'code_version' field report the same value for backwards compat. A new docstring explains that comparing this to the main branch's __version__ is the fastest way to detect drift. - pyproject.toml: version bumped to 0.2.0 to stay in sync The comparison is now: curl /health -> "code_version": "0.2.0" grep __version__ src/atocore/__init__.py -> "0.2.0" If those differ, the deployment is stale. Concrete, unambiguous. Layer 2: deploy.sh as the canonical update path ----------------------------------------------- New file: deploy/dalidou/deploy.sh One-shot bash script that handles both the first-time deploy (where /srv/storage/atocore/app may not be a git repo yet) and the ongoing update case. Steps: 1. If app dir is not a git checkout, back it up as <dir>.pre-git-<utc-stamp> and re-clone from Gitea. If it IS a checkout, fetch + reset --hard origin/<branch>. 2. Report the deployable commit SHA 3. Check that deploy/dalidou/.env exists (hard fail if missing with a clear message pointing at .env.example) 4. docker compose up -d --build — rebuilds the image from current source, restarts the container 5. Poll /health for up to 30 seconds; on failure, print the last 50 lines of container logs and exit non-zero 6. Parse /health.code_version and compare to the __version__ in the freshly-pulled source. If they differ, exit non-zero with a message suggesting docker compose down && up 7. On success, report commit + code_version + "health: ok" Configurable via env vars: - ATOCORE_APP_DIR (default /srv/storage/atocore/app) - ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git) - ATOCORE_BRANCH (default main) - ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health) - ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode Explicit non-goals documented in the script header: - does not manage secrets (.env is the caller's responsibility) - does not take a pre-deploy backup (call /admin/backup first if you want one) - does not roll back on failure (redeploy a known-good commit to recover) - does not touch the DB directly — schema migrations run at service startup via the lifespan handler, and all existing _apply_migrations ALTERs are idempotent ADD COLUMN operations Layer 3: updated docs/dalidou-deployment.md ------------------------------------------- - First-time deployment steps now explicitly say "git clone", not "place the repository", so future first-time deploys don't end up as static snapshots again - New "Updating a running deployment" section covering deploy.sh usage with all three modes (normal / branch override / dry-run) - New "Deployment drift detection" section with the one-liner comparison between /health code_version and the repo's __version__ - New "Schema migrations on redeploy" section enumerating the exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0 upgrade, confirming they are additive-only and safe, and recommending a backup via /admin/backup before any redeploy Full suite: 215 passing, 1 warning. No test was hardcoded to the old version string, so the version bump was safe without test changes. What this commit does NOT do ---------------------------- - Does NOT execute the deploy on the live Dalidou instance. That requires Dalidou access and is the next step. A ready-to-paste prompt for Dalidou Claude will be provided separately. - Does NOT add CI/CD, webhook-based auto-deploy, or reverse proxy. Those remain in the 'deferred' section of the deployment doc. - Does NOT change the Dockerfile. The existing 'COPY source at build time' pattern is what deploy.sh relies on — rebuilding the image picks up new code. - Does NOT modify the database schema. The Phase 9 migrations that Dalidou's DB needs will be applied automatically on next service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
grep '__version__' /srv/storage/atocore/app/src/atocore/__init__.py
```
deploy: version-visible /health + deploy.sh + update runbook Dalidou Claude's validation run against the live service exposed a structural gap: the deployment at /srv/storage/atocore/app has no git connection, the running container was built from pre-Phase-9 source, and /health hardcoded 'version: 0.1.0' so drift is invisible. Weeks of work have been shipping to Gitea but never reaching the live service. This commit fixes both the drift-invisibility problem and the absence of an update workflow, so the next deploy to Dalidou can go live cleanly and future drifts surface immediately. Layer 1: deployment drift is now visible via /health ---------------------------------------------------- - src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0 and documented as the source of truth for the deployed code version, with a history block explaining when each bump happens (API surface change, schema change, user-visible behavior change) - src/atocore/main.py: FastAPI constructor now uses __version__ instead of the hardcoded '0.1.0' string, so the OpenAPI docs reflect the actual code version - src/atocore/api/routes.py: /health now reads from __version__ dynamically. Both the existing 'version' field and a new 'code_version' field report the same value for backwards compat. A new docstring explains that comparing this to the main branch's __version__ is the fastest way to detect drift. - pyproject.toml: version bumped to 0.2.0 to stay in sync The comparison is now: curl /health -> "code_version": "0.2.0" grep __version__ src/atocore/__init__.py -> "0.2.0" If those differ, the deployment is stale. Concrete, unambiguous. Layer 2: deploy.sh as the canonical update path ----------------------------------------------- New file: deploy/dalidou/deploy.sh One-shot bash script that handles both the first-time deploy (where /srv/storage/atocore/app may not be a git repo yet) and the ongoing update case. Steps: 1. If app dir is not a git checkout, back it up as <dir>.pre-git-<utc-stamp> and re-clone from Gitea. If it IS a checkout, fetch + reset --hard origin/<branch>. 2. Report the deployable commit SHA 3. Check that deploy/dalidou/.env exists (hard fail if missing with a clear message pointing at .env.example) 4. docker compose up -d --build — rebuilds the image from current source, restarts the container 5. Poll /health for up to 30 seconds; on failure, print the last 50 lines of container logs and exit non-zero 6. Parse /health.code_version and compare to the __version__ in the freshly-pulled source. If they differ, exit non-zero with a message suggesting docker compose down && up 7. On success, report commit + code_version + "health: ok" Configurable via env vars: - ATOCORE_APP_DIR (default /srv/storage/atocore/app) - ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git) - ATOCORE_BRANCH (default main) - ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health) - ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode Explicit non-goals documented in the script header: - does not manage secrets (.env is the caller's responsibility) - does not take a pre-deploy backup (call /admin/backup first if you want one) - does not roll back on failure (redeploy a known-good commit to recover) - does not touch the DB directly — schema migrations run at service startup via the lifespan handler, and all existing _apply_migrations ALTERs are idempotent ADD COLUMN operations Layer 3: updated docs/dalidou-deployment.md ------------------------------------------- - First-time deployment steps now explicitly say "git clone", not "place the repository", so future first-time deploys don't end up as static snapshots again - New "Updating a running deployment" section covering deploy.sh usage with all three modes (normal / branch override / dry-run) - New "Deployment drift detection" section with the one-liner comparison between /health code_version and the repo's __version__ - New "Schema migrations on redeploy" section enumerating the exact ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0 upgrade, confirming they are additive-only and safe, and recommending a backup via /admin/backup before any redeploy Full suite: 215 passing, 1 warning. No test was hardcoded to the old version string, so the version bump was safe without test changes. What this commit does NOT do ---------------------------- - Does NOT execute the deploy on the live Dalidou instance. That requires Dalidou access and is the next step. A ready-to-paste prompt for Dalidou Claude will be provided separately. - Does NOT add CI/CD, webhook-based auto-deploy, or reverse proxy. Those remain in the 'deferred' section of the deployment doc. - Does NOT change the Dockerfile. The existing 'COPY source at build time' pattern is what deploy.sh relies on — rebuilding the image picks up new code. - Does NOT modify the database schema. The Phase 9 migrations that Dalidou's DB needs will be applied automatically on next service startup via the existing _apply_migrations path.
2026-04-08 18:08:49 -04:00
### Schema migrations on redeploy
When updating from an older `__version__`, the first startup after
the redeploy runs the idempotent ALTER TABLE migrations in
`_apply_migrations`. For a pre-0.2.0 → 0.2.0 upgrade the migrations
add these columns to existing tables (all with safe defaults so no
data is touched):
- `memories.project TEXT DEFAULT ''`
- `memories.last_referenced_at DATETIME`
- `memories.reference_count INTEGER DEFAULT 0`
- `interactions.response TEXT DEFAULT ''`
- `interactions.memories_used TEXT DEFAULT '[]'`
- `interactions.chunks_used TEXT DEFAULT '[]'`
- `interactions.client TEXT DEFAULT ''`
- `interactions.session_id TEXT DEFAULT ''`
- `interactions.project TEXT DEFAULT ''`
Plus new indexes on the new columns. No row data is modified. The
migration is safe to run against a database that already has the
columns — the `_column_exists` check makes each ALTER a no-op in
that case.
Backup the database before any redeploy (via `POST /admin/backup`)
if you want a pre-upgrade snapshot. The migration is additive and
reversible by restoring the snapshot.
## Deferred
- backup automation
- restore/snapshot tooling
- reverse proxy / TLS exposure
- automated source ingestion job
- OpenClaw client wiring
## Current Reality Check
When this deployment is first brought up, the service may be healthy before the
real corpus has been ingested.
That means:
- AtoCore the system can already be hosted on Dalidou
- the canonical machine-data location can already be on Dalidou
- but the live knowledge/content corpus may still be empty or only partially
loaded until source ingestion is run