Three issues Dalidou Claude surfaced during the first real deploy
of commit e877e5b to the live service (report from 2026-04-08).
Bug 1 was the critical one — a schema init ordering bug that would
have bitten every future upgrade from a pre-Phase-9 schema — and
the other two were usability traps around hostname resolution.
Bug 1 (CRITICAL): schema init ordering
--------------------------------------
src/atocore/models/database.py
SCHEMA_SQL contained CREATE INDEX statements that referenced
columns added later by _apply_migrations():
CREATE INDEX IF NOT EXISTS idx_memories_project ON memories(project);
CREATE INDEX IF NOT EXISTS idx_interactions_project_name ON interactions(project);
CREATE INDEX IF NOT EXISTS idx_interactions_session ON interactions(session_id);
On a FRESH install, CREATE TABLE IF NOT EXISTS creates the tables
with the Phase 9 shape (columns present), so the CREATE INDEX runs
cleanly and _apply_migrations is effectively a no-op.
On an UPGRADE from a pre-Phase-9 schema, CREATE TABLE IF NOT EXISTS
is a no-op (the tables already exist in the old shape), the columns
are NOT added yet, and the CREATE INDEX fails with
"OperationalError: no such column: project" before
_apply_migrations gets a chance to add the columns.
Dalidou Claude hit this exactly when redeploying from 0.1.0 to
0.2.0 — had to manually ALTER TABLE to add the Phase 9 columns
before the container could start.
The fix is to remove the Phase 9-column indexes from SCHEMA_SQL.
They already exist in _apply_migrations() AFTER the corresponding
ALTER TABLE, so they still get created on both fresh and upgrade
paths — just after the columns exist, not before.
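For reference, a minimal sketch of the safe ordering, using the helper
names from this report (the real _apply_migrations covers every Phase 9
column, not just this one):

    import sqlite3

    def _column_exists(con: sqlite3.Connection, table: str, column: str) -> bool:
        # PRAGMA table_info returns one row per column; the name is field 1.
        return any(row[1] == column
                   for row in con.execute(f"PRAGMA table_info({table})"))

    def _apply_migrations(con: sqlite3.Connection) -> None:
        # Add the column first (no-op if it already exists) ...
        if not _column_exists(con, "memories", "project"):
            con.execute("ALTER TABLE memories ADD COLUMN project TEXT DEFAULT ''")
        # ... and only THEN create the index that references it, so the
        # upgrade path sees the column before the index needs it.
        con.execute("CREATE INDEX IF NOT EXISTS idx_memories_project "
                    "ON memories(project)")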
Indexes still in SCHEMA_SQL (all safe — reference columns that
have existed since the first release):
- idx_chunks_document on source_chunks(document_id)
- idx_memories_type on memories(memory_type)
- idx_memories_status on memories(status)
- idx_interactions_project on interactions(project_id)
Indexes moved to _apply_migrations (already there — just no longer
duplicated in SCHEMA_SQL):
- idx_memories_project on memories(project)
- idx_interactions_project_name on interactions(project)
- idx_interactions_session on interactions(session_id)
- idx_interactions_created_at on interactions(created_at)
Regression test: tests/test_database.py
---------------------------------------
New test_init_db_upgrades_pre_phase9_schema_without_failing:
- Seeds the DB with the exact pre-Phase-9 shape (no project /
last_referenced_at / reference_count on memories; no project /
client / session_id / response / memories_used / chunks_used on
interactions)
- Calls init_db() — which used to raise OperationalError before
the fix
- Verifies all Phase 9 columns are present after the call
- Verifies the migration indexes exist
Before the fix this test would have failed with
"OperationalError: no such column: project" on the init_db call.
After the fix it passes. This locks the invariant "init_db is
safe on any legacy schema shape" so the bug can't silently come
back.
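A condensed sketch of the test's shape (the real test seeds both
tables' full pre-Phase-9 shapes and checks every Phase 9 column; the
ATOCORE_DB_PATH hook used here is an assumption):

    import sqlite3

    def test_init_db_upgrades_pre_phase9_schema_without_failing(tmp_path, monkeypatch):
        db_path = tmp_path / "atocore.db"
        # Seed the old shape: memories exists but has no `project` column.
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE memories "
                    "(id INTEGER PRIMARY KEY, memory_type TEXT, status TEXT)")
        con.commit()
        con.close()

        # Point init_db at the seeded file (env hook is an assumption) and
        # run it; before the fix this raised
        # OperationalError: no such column: project.
        monkeypatch.setenv("ATOCORE_DB_PATH", str(db_path))
        from atocore.models.database import init_db
        init_db()

        con = sqlite3.connect(db_path)
        cols = {row[1] for row in con.execute("PRAGMA table_info(memories)")}
        assert "project" in cols  # added by _apply_migrations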
Full suite: 216 passing (was 215), 1 warning. The +1 is the new
regression test.
Bug 3 (usability): deploy.sh DNS default
----------------------------------------
deploy/dalidou/deploy.sh
ATOCORE_GIT_REMOTE defaulted to http://dalidou:3000/Antoine/ATOCore.git
which requires the "dalidou" hostname to resolve. On the Dalidou
host itself it didn't resolve (there was no /etc/hosts entry aliasing
the hostname to localhost), so deploy.sh had to be run with the raw
IP as a manual workaround.
Fix: default ATOCORE_GIT_REMOTE to http://127.0.0.1:3000/Antoine/ATOCore.git.
Loopback always works on the host running the script. Callers
from a remote host (e.g. running deploy.sh from a laptop against
the Dalidou LAN) set ATOCORE_GIT_REMOTE explicitly. The script
header's Environment Variables section documents this with an
explicit reference to the 2026-04-08 Dalidou deploy report so the
rationale isn't lost.
docs/dalidou-deployment.md gets a new "Troubleshooting hostname
resolution" subsection and a new example invocation showing how
to deploy from a remote host with an explicit ATOCORE_GIT_REMOTE
override.
Bug 2 (usability): atocore_client.py ATOCORE_BASE_URL documentation
-------------------------------------------------------------------
scripts/atocore_client.py
Same class of issue as bug 3. BASE_URL defaults to
http://dalidou:8100 which resolves fine from a remote caller
(laptop, T420/OpenClaw over Tailscale) but NOT from the Dalidou
host itself or from inside the atocore container. Dalidou Claude
saw the CLI return
{"status": "unavailable", "fail_open": true}
while direct curl to http://127.0.0.1:8100 worked.
The fix here is NOT to change the default (remote callers are
the common case and would break) but to DOCUMENT the override
clearly so the next operator knows what's happening (see the
sketch after the list below):
- The script module docstring grew a new "Environment variables"
section covering ATOCORE_BASE_URL, ATOCORE_TIMEOUT_SECONDS,
ATOCORE_REFRESH_TIMEOUT_SECONDS, and ATOCORE_FAIL_OPEN, with
the explicit override example for on-host/in-container use
- It calls out the exact symptom (fail-open envelope when the
base URL doesn't resolve) so the diagnosis is obvious from
the error alone
- docs/dalidou-deployment.md troubleshooting section mirrors
this guidance so there's one place to look regardless of
whether the operator starts with the client help or the
deploy doc
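A minimal sketch of the behavior being documented, assuming the client
resolves its base URL and fail-open setting roughly like this (the env
parsing details and the health() helper are assumptions):

    import json
    import os
    import urllib.error
    import urllib.request

    # Default favors remote callers (laptop, T420/OpenClaw over
    # Tailscale); on the Dalidou host or inside the container, set
    # ATOCORE_BASE_URL=http://127.0.0.1:8100 instead.
    BASE_URL = os.environ.get("ATOCORE_BASE_URL", "http://dalidou:8100")
    FAIL_OPEN = os.environ.get("ATOCORE_FAIL_OPEN", "1") != "0"  # parsing is assumed

    def health() -> dict:
        try:
            with urllib.request.urlopen(f"{BASE_URL}/health", timeout=5) as resp:
                return json.load(resp)
        except (urllib.error.URLError, TimeoutError):
            if FAIL_OPEN:
                # The envelope Dalidou Claude saw when "dalidou" didn't
                # resolve from the host itself.
                return {"status": "unavailable", "fail_open": True}
            raise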
What this commit does NOT do
----------------------------
- Does NOT change the default ATOCORE_BASE_URL. Doing that would
break the T420 OpenClaw helper and every remote caller who
currently relies on the hostname. Documentation is the right
fix for this case.
- Does NOT fix /etc/hosts on Dalidou. That's a host-level
configuration issue that the user can fix if they prefer
having the hostname resolve; the deploy.sh fix makes it
unnecessary regardless.
- Does NOT re-run the validation on Dalidou. The next step is
for the live service to pull this commit via deploy.sh (which
should now work without the IP workaround) and re-run the
Phase 9 loop test to confirm nothing regressed.
Dalidou Deployment
Purpose
Deploy AtoCore on Dalidou as the canonical runtime and machine-memory host.
Model
- Dalidou hosts the canonical AtoCore service.
- OpenClaw on the T420 consumes AtoCore over network/Tailscale API.
- sources/vault and sources/drive are read-only inputs by convention.
- SQLite/Chroma machine state stays on Dalidou and is not treated as a sync peer.
- The app and machine-storage host can be live before the long-term content corpus is fully populated.
Directory layout
/srv/storage/atocore/
app/ # deployed repo checkout
data/
db/
chroma/
cache/
tmp/
sources/
vault/
drive/
logs/
backups/
run/
Compose workflow
The compose definition lives in:
deploy/dalidou/docker-compose.yml
The Dalidou environment file should be copied to:
deploy/dalidou/.env
starting from:
deploy/dalidou/.env.example
First-time deployment steps
1. Place the repository under /srv/storage/atocore/app, ideally as a
   proper git clone so future updates can be pulled, not as a static
   snapshot:

       sudo git clone http://dalidou:3000/Antoine/ATOCore.git \
           /srv/storage/atocore/app

2. Create the canonical directories listed above.

3. Copy deploy/dalidou/.env.example to deploy/dalidou/.env.

4. Adjust the source paths if your AtoVault/AtoDrive mirrors live
   elsewhere.

5. Run:

       cd /srv/storage/atocore/app/deploy/dalidou
       docker compose up -d --build

6. Validate:

       curl http://127.0.0.1:8100/health
       curl http://127.0.0.1:8100/sources
Updating a running deployment
Use deploy/dalidou/deploy.sh for every code update. It is the
one-shot sync script that:
- fetches latest main from Gitea into /srv/storage/atocore/app
- (if the app dir is not a git checkout) backs it up as
  <dir>.pre-git-<timestamp> and re-clones
- rebuilds the container image
- restarts the container
- waits for /health to respond
- compares the reported code_version against the __version__ in the
  freshly-pulled source, and exits non-zero if they don't match
  (deployment drift detection)
# Normal update from main
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
# Deploy a specific branch or tag
ATOCORE_BRANCH=codex/some-feature \
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
# Dry-run: show what would happen without touching anything
ATOCORE_DEPLOY_DRY_RUN=1 \
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
# Deploy from a remote host (e.g. the laptop) using the Tailscale
# or LAN address instead of loopback
ATOCORE_GIT_REMOTE=http://192.168.86.50:3000/Antoine/ATOCore.git \
bash /srv/storage/atocore/app/deploy/dalidou/deploy.sh
The script is idempotent and safe to re-run. It never touches the
database directly — schema migrations are applied automatically at
service startup by the lifespan handler in src/atocore/main.py
which calls init_db() (which in turn runs the ALTER TABLE
statements in _apply_migrations).
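A sketch of that startup hook, assuming a standard FastAPI lifespan
handler (the actual handler in src/atocore/main.py may differ in
detail):

    from contextlib import asynccontextmanager

    from fastapi import FastAPI

    from atocore.models.database import init_db

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Runs once at startup: creates missing tables from SCHEMA_SQL,
        # then applies the idempotent ALTER TABLE migrations.
        init_db()
        yield

    app = FastAPI(lifespan=lifespan)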
Troubleshooting hostname resolution
deploy.sh defaults ATOCORE_GIT_REMOTE to
http://127.0.0.1:3000/Antoine/ATOCore.git (loopback) because the
hostname "dalidou" doesn't reliably resolve on the host itself —
the first real Dalidou deploy hit exactly this on 2026-04-08. If
you need to override (e.g. running deploy.sh from a laptop against
the Dalidou LAN), set ATOCORE_GIT_REMOTE explicitly.
The same applies to scripts/atocore_client.py: its default
ATOCORE_BASE_URL is http://dalidou:8100 for remote callers, but
when running the client on Dalidou itself (or inside the container
via docker exec), override to loopback:
ATOCORE_BASE_URL=http://127.0.0.1:8100 \
python scripts/atocore_client.py health
If you see {"status": "unavailable", "fail_open": true} from the
client, the first thing to check is whether the base URL resolves
from where you're running the client.
Deployment drift detection
/health reports both version and code_version fields, both set
from atocore.__version__ at import time. To check whether the
deployed code matches the repo's main branch:
# What's running
curl -s http://127.0.0.1:8100/health | grep -o '"code_version":"[^"]*"'
# What's in the repo's main branch
grep '__version__' /srv/storage/atocore/app/src/atocore/__init__.py
If these differ, the deployment is stale. Run deploy.sh to sync.
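The same check can be scripted; a sketch (the paths and the
code_version field name are taken from this doc):

    # Programmatic drift check: compare the running service's
    # code_version against __version__ in the checked-out source.
    import json
    import re
    import urllib.request

    with urllib.request.urlopen("http://127.0.0.1:8100/health", timeout=10) as resp:
        running = json.load(resp)["code_version"]

    with open("/srv/storage/atocore/app/src/atocore/__init__.py") as f:
        repo = re.search(r"__version__\s*=\s*['\"]([^'\"]+)['\"]", f.read()).group(1)

    print("stale" if running != repo else "in sync", running, repo)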
Schema migrations on redeploy
When updating from an older __version__, the first startup after
the redeploy runs the idempotent ALTER TABLE migrations in
_apply_migrations. For a pre-0.2.0 → 0.2.0 upgrade the migrations
add these columns to existing tables (all with safe defaults so no
data is touched):
- memories.project TEXT DEFAULT ''
- memories.last_referenced_at DATETIME
- memories.reference_count INTEGER DEFAULT 0
- interactions.response TEXT DEFAULT ''
- interactions.memories_used TEXT DEFAULT '[]'
- interactions.chunks_used TEXT DEFAULT '[]'
- interactions.client TEXT DEFAULT ''
- interactions.session_id TEXT DEFAULT ''
- interactions.project TEXT DEFAULT ''
Plus new indexes on the new columns. No row data is modified. The
migration is safe to run against a database that already has the
columns — the _column_exists check makes each ALTER a no-op in
that case.
Backup the database before any redeploy (via POST /admin/backup)
if you want a pre-upgrade snapshot. The migration itself is additive;
the only way to reverse it is to restore the snapshot.
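For example, a snapshot call could look like this (the endpoint path
is from this doc; the response handling is an assumption):

    # Take a pre-upgrade snapshot via the admin endpoint. The response
    # body shape is assumed; adjust to whatever the service returns.
    import urllib.request

    req = urllib.request.Request("http://127.0.0.1:8100/admin/backup", method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        print(resp.status, resp.read().decode())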
Deferred
- backup automation
- restore/snapshot tooling
- reverse proxy / TLS exposure
- automated source ingestion job
- OpenClaw client wiring
Current Reality Check
When this deployment is first brought up, the service may be healthy before the real corpus has been ingested.
That means:
- AtoCore the system can already be hosted on Dalidou
- the canonical machine-data location can already be on Dalidou
- but the live knowledge/content corpus may still be empty or only partially loaded until source ingestion is run