Dalidou Claude's validation run against the live service exposed a
structural gap: the deployment at /srv/storage/atocore/app has no git
connection, the running container was built from pre-Phase-9 source,
and /health hardcoded 'version: 0.1.0' so drift is invisible. Weeks of
work have been shipping to Gitea but never reaching the live service.

This commit fixes both the drift-invisibility problem and the absence
of an update workflow, so the next deploy to Dalidou can go live
cleanly and future drifts surface immediately.

Layer 1: deployment drift is now visible via /health
----------------------------------------------------

- src/atocore/__init__.py: __version__ bumped from 0.1.0 to 0.2.0 and
  documented as the source of truth for the deployed code version,
  with a history block explaining when each bump happens (API surface
  change, schema change, user-visible behavior change)
- src/atocore/main.py: FastAPI constructor now uses __version__
  instead of the hardcoded '0.1.0' string, so the OpenAPI docs reflect
  the actual code version
- src/atocore/api/routes.py: /health now reads from __version__
  dynamically. Both the existing 'version' field and a new
  'code_version' field report the same value for backwards compat. A
  new docstring explains that comparing this to the main branch's
  __version__ is the fastest way to detect drift.
- pyproject.toml: version bumped to 0.2.0 to stay in sync

The comparison is now:

    curl /health                              -> "code_version": "0.2.0"
    grep __version__ src/atocore/__init__.py  -> "0.2.0"

If those differ, the deployment is stale. Concrete, unambiguous.

Layer 2: deploy.sh as the canonical update path
-----------------------------------------------

New file: deploy/dalidou/deploy.sh

One-shot bash script that handles both the first-time deploy (where
/srv/storage/atocore/app may not be a git repo yet) and the ongoing
update case. Steps:

1. If app dir is not a git checkout, back it up as
   <dir>.pre-git-<utc-stamp> and re-clone from Gitea. If it IS a
   checkout, fetch + reset --hard origin/<branch>.
2. Report the deployable commit SHA
3. Check that deploy/dalidou/.env exists (hard fail if missing with a
   clear message pointing at .env.example)
4. docker compose up -d --build: rebuilds the image from current
   source, restarts the container
5. Poll /health for up to 30 seconds; on failure, print the last 50
   lines of container logs and exit non-zero
6. Parse /health.code_version and compare to the __version__ in the
   freshly-pulled source. If they differ, exit non-zero with a message
   suggesting docker compose down && up
7. On success, report commit + code_version + "health: ok"

Configurable via env vars:

- ATOCORE_APP_DIR (default /srv/storage/atocore/app)
- ATOCORE_GIT_REMOTE (default http://dalidou:3000/Antoine/ATOCore.git)
- ATOCORE_BRANCH (default main)
- ATOCORE_HEALTH_URL (default http://127.0.0.1:8100/health)
- ATOCORE_DEPLOY_DRY_RUN=1 for preview-only mode

Explicit non-goals documented in the script header:

- does not manage secrets (.env is the caller's responsibility)
- does not take a pre-deploy backup (call /admin/backup first if you
  want one)
- does not roll back on failure (redeploy a known-good commit to
  recover)
- does not touch the DB directly: schema migrations run at service
  startup via the lifespan handler, and all existing _apply_migrations
  ALTERs are idempotent ADD COLUMN operations

Layer 3: updated docs/dalidou-deployment.md
-------------------------------------------

- First-time deployment steps now explicitly say "git clone", not
  "place the repository", so future first-time deploys don't end up as
  static snapshots again
- New "Updating a running deployment" section covering deploy.sh usage
  with all three modes (normal / branch override / dry-run)
- New "Deployment drift detection" section with the one-liner
  comparison between /health code_version and the repo's __version__
- New "Schema migrations on redeploy" section enumerating the exact
  ALTER TABLE statements that run on a pre-0.2.0 -> 0.2.0 upgrade,
  confirming they are additive-only and safe, and recommending a
  backup via /admin/backup before any redeploy

Full suite: 215 passing, 1 warning. No test was hardcoded to the old
version string, so the version bump was safe without test changes.

What this commit does NOT do
----------------------------

- Does NOT execute the deploy on the live Dalidou instance. That
  requires Dalidou access and is the next step. A ready-to-paste
  prompt for Dalidou Claude will be provided separately.
- Does NOT add CI/CD, webhook-based auto-deploy, or reverse proxy.
  Those remain in the 'deferred' section of the deployment doc.
- Does NOT change the Dockerfile. The existing 'COPY source at build
  time' pattern is what deploy.sh relies on: rebuilding the image
  picks up new code.
- Does NOT modify the database schema. The Phase 9 migrations that
  Dalidou's DB needs will be applied automatically on next service
  startup via the existing _apply_migrations path.

# Dalidou Deployment

## Purpose

Deploy AtoCore on Dalidou as the canonical runtime and machine-memory host.

## Model

- Dalidou hosts the canonical AtoCore service.
- OpenClaw on the T420 consumes AtoCore over network/Tailscale API.
- `sources/vault` and `sources/drive` are read-only inputs by convention.
- SQLite/Chroma machine state stays on Dalidou and is not treated as a sync peer.
- The app and machine-storage host can be live before the long-term content
  corpus is fully populated.

## Directory layout

```text
/srv/storage/atocore/
  app/                      # deployed repo checkout
  data/
    db/
    chroma/
    cache/
    tmp/
  sources/
    vault/
    drive/
  logs/
  backups/
  run/
```

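The layout above can be created in one pass. This is a minimal sketch, not part of the repository: the function name `create_atocore_dirs` is hypothetical, and it assumes that `db/`, `chroma/`, `cache/` and `tmp/` sit under `data/` and that `vault/` and `drive/` sit under `sources/`. It leaves out `app/` on purpose, because `git clone` creates that directory itself.

```bash
# Hypothetical helper: create the canonical AtoCore directory tree.
# $1 is the storage root; it defaults to the path used in this document.
create_atocore_dirs() {
  local root="${1:-/srv/storage/atocore}"
  local sub
  for sub in data/db data/chroma data/cache data/tmp \
             sources/vault sources/drive \
             logs backups run; do
    # mkdir -p is a no-op for directories that already exist,
    # so the function is safe to re-run.
    mkdir -p "$root/$sub" || return 1
  done
}
```

Calling `create_atocore_dirs` with no argument targets `/srv/storage/atocore`, which will usually need a user with write access to `/srv/storage`. Passing a scratch path first is a cheap way to inspect the result.
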
## Compose workflow

The compose definition lives in:

```text
deploy/dalidou/docker-compose.yml
```

The Dalidou environment file should be copied to:

```text
deploy/dalidou/.env
```

starting from:

```text
deploy/dalidou/.env.example
```

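The copy step is easy to get wrong on a host that already has a tuned `.env`. As a hedged sketch (the function name `init_env_file` is hypothetical and not part of the repository), the copy can be made non-destructive so that an existing file is never overwritten:

```bash
# Hypothetical helper: seed .env from .env.example without clobbering
# an existing file. $1 is the directory holding both files.
init_env_file() {
  local deploy_dir="$1"
  if [ -f "$deploy_dir/.env" ]; then
    echo "keeping existing $deploy_dir/.env"
  elif [ -f "$deploy_dir/.env.example" ]; then
    cp "$deploy_dir/.env.example" "$deploy_dir/.env"
    echo "created $deploy_dir/.env; review it before starting the service"
  else
    echo "no .env.example found in $deploy_dir" >&2
    return 1
  fi
}
```

On Dalidou this would be called as `init_env_file /srv/storage/atocore/app/deploy/dalidou`, assuming the checkout lives at the path used throughout this document.
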
## First-time deployment steps

1. Place the repository under `/srv/storage/atocore/app`, ideally as a
   proper git clone so future updates can be pulled, not as a static
   snapshot:

   ```bash
   sudo git clone http://dalidou:3000/Antoine/ATOCore.git \
       /srv/storage/atocore/app
   ```

2. Create the canonical directories listed above.
3. Copy `deploy/dalidou/.env.example` to `deploy/dalidou/.env`.
4. Adjust the source paths if your AtoVault/AtoDrive mirrors live elsewhere.
5. Run:

   ```bash
   cd /srv/storage/atocore/app/deploy/dalidou
   docker compose up -d --build
   ```

6. Validate:

   ```bash
   curl http://127.0.0.1:8100/health
   curl http://127.0.0.1:8100/sources
   ```

## Updating a running deployment

**Use `deploy/dalidou/deploy.sh` for every code update.** It is the
one-shot sync script that:

- fetches the latest `main` (or the branch named in `ATOCORE_BRANCH`)
  from Gitea into `/srv/storage/atocore/app`
- (if the app dir is not a git checkout) backs it up as
  `<dir>.pre-git-<timestamp>` and re-clones
- rebuilds the container image
- restarts the container
- waits for `/health` to respond
- compares the reported `code_version` against the
  `__version__` in the freshly-pulled source, and exits non-zero
  if they don't match (deployment drift detection)

```bash
# Normal update from main
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Deploy a specific branch or tag
ATOCORE_BRANCH=codex/some-feature \
    bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh

# Dry-run: show what would happen without touching anything
ATOCORE_DEPLOY_DRY_RUN=1 \
    bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
```

The script is idempotent and safe to re-run. It never touches the
database directly: schema migrations are applied automatically at
service startup by the lifespan handler in `src/atocore/main.py`,
which calls `init_db()`, which in turn runs the ALTER TABLE
statements in `_apply_migrations`.

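The "waits for `/health` to respond" step is a bounded retry loop. The sketch below shows one way to write such a loop; it is not taken from `deploy.sh`, and the helper name `wait_until` is hypothetical. It retries an arbitrary command once per second until it succeeds or a deadline passes.

```bash
# Hypothetical helper: retry a command until it succeeds or the
# timeout (in seconds) expires.
# Usage: wait_until <timeout-seconds> <command> [args...]
wait_until() {
  local timeout="$1"
  shift
  local deadline=$(( $(date +%s) + timeout ))
  # Re-run the command, discarding its output, until it exits 0.
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1   # deadline reached without a successful run
    fi
    sleep 1
  done
  return 0
}
```

With the health URL used in this document, a 30-second wait would read `wait_until 30 curl -fsS http://127.0.0.1:8100/health`. A non-zero return is the point at which a deploy script would print recent container logs, for example with `docker compose logs --tail 50`, and stop.
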
### Deployment drift detection

`/health` reports both `version` and `code_version` fields, both set
from `atocore.__version__` at import time. To check whether the
running container matches the source checked out on the host
(normally the repo's `main` branch):

```bash
# What's running
curl -s http://127.0.0.1:8100/health | grep -o '"code_version": *"[^"]*"'

# What's in the checked-out source
grep '__version__' /srv/storage/atocore/app/src/atocore/__init__.py
```

If these differ, the deployment is stale. Run `deploy.sh` to sync.

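The comparison above can also be scripted so that it yields an exit code instead of two strings to compare by eye. This is a sketch under stated assumptions, not the logic in `deploy.sh`: the three function names are hypothetical, `/health` is assumed to return a JSON object with a string `code_version` field, and `__init__.py` is assumed to contain a line of the form `__version__ = "0.2.0"`.

```bash
# Hypothetical helpers for a scripted drift check.

# Read a /health JSON body on stdin and print the code_version value.
extract_code_version() {
  sed -n 's/.*"code_version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print the version string assigned to __version__ in the file $1.
extract_source_version() {
  sed -n "s/^__version__[[:space:]]*=[[:space:]]*[\"']\([^\"']*\)[\"'].*/\1/p" "$1" \
    | head -n 1
}

# Succeed only if the running version is non-empty and equals the
# expected one; an empty value means extraction failed.
versions_match() {
  local running="$1" expected="$2"
  [ -n "$running" ] && [ "$running" = "$expected" ]
}

# Example wiring (paths and URL as used in this document):
#   running="$(curl -fsS http://127.0.0.1:8100/health | extract_code_version)"
#   expected="$(extract_source_version /srv/storage/atocore/app/src/atocore/__init__.py)"
#   versions_match "$running" "$expected" || echo "deployment is stale" >&2
```

Treating an empty value as a mismatch is a deliberate choice: if the service is down or the JSON shape changes, the check fails loudly instead of reporting that two empty strings agree.
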
### Schema migrations on redeploy

When updating from an older `__version__`, the first startup after
the redeploy runs the idempotent ALTER TABLE migrations in
`_apply_migrations`. For a pre-0.2.0 → 0.2.0 upgrade the migrations
add these columns to existing tables (all with safe defaults so no
data is touched):

- `memories.project TEXT DEFAULT ''`
- `memories.last_referenced_at DATETIME`
- `memories.reference_count INTEGER DEFAULT 0`
- `interactions.response TEXT DEFAULT ''`
- `interactions.memories_used TEXT DEFAULT '[]'`
- `interactions.chunks_used TEXT DEFAULT '[]'`
- `interactions.client TEXT DEFAULT ''`
- `interactions.session_id TEXT DEFAULT ''`
- `interactions.project TEXT DEFAULT ''`

The migrations also create new indexes on the new columns. No row
data is modified. The migration is safe to run against a database
that already has the columns: the `_column_exists` check makes each
ALTER a no-op in that case.

Back up the database before any redeploy (via `POST /admin/backup`)
if you want a pre-upgrade snapshot. The migration is additive and
reversible by restoring the snapshot.

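The check-then-ALTER guard described above can be tried outside the service. The sketch below is not the project's `_apply_migrations` code, which runs inside the Python service; it reproduces the same idea with the `sqlite3` command-line tool, assuming that tool is installed and that SQLite is recent enough to provide `pragma_table_info` (3.16 or newer). The helper names are hypothetical.

```bash
# Hypothetical helpers; an illustration of the guard pattern only.

# Succeed if table $2 in database $1 already has a column named $3.
column_exists() {
  sqlite3 "$1" "SELECT name FROM pragma_table_info('$2');" | grep -qx "$3"
}

# Add column $3 (declared as $4) to table $2 only when it is missing,
# so running the same call twice leaves the schema unchanged.
add_column_if_missing() {
  local db="$1" table="$2" column="$3" decl="$4"
  if column_exists "$db" "$table" "$column"; then
    echo "skip: $table.$column already present"
  else
    sqlite3 "$db" "ALTER TABLE $table ADD COLUMN $column $decl;" || return 1
    echo "added: $table.$column"
  fi
}
```

Run against a scratch copy of the database rather than the live file, a call such as `add_column_if_missing /tmp/copy.db memories reference_count "INTEGER DEFAULT 0"` prints `added:` on the first run and `skip:` on every later run, which is the behavior that makes a redeploy safe to repeat.
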
## Deferred

- backup automation
- restore/snapshot tooling
- reverse proxy / TLS exposure
- automated source ingestion job
- OpenClaw client wiring

## Current Reality Check

When this deployment is first brought up, the service may be healthy before the
real corpus has been ingested.

That means:

- AtoCore the system can already be hosted on Dalidou
- the canonical machine-data location can already be on Dalidou
- but the live knowledge/content corpus may still be empty or only partially
  loaded until source ingestion is run