Harden runtime and add backup foundation

This commit is contained in:
2026-04-06 10:15:00 -04:00
parent 9715fe3143
commit c9757e313a
11 changed files with 331 additions and 10 deletions

View File

@@ -17,6 +17,7 @@ ATOCORE_PROJECT_REGISTRY_DIR=./config
ATOCORE_PROJECT_REGISTRY_PATH=./config/project-registry.json
ATOCORE_HOST=127.0.0.1
ATOCORE_PORT=8100
ATOCORE_DB_BUSY_TIMEOUT_MS=5000
ATOCORE_EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
ATOCORE_CHUNK_MAX_SIZE=800
ATOCORE_CHUNK_OVERLAP=100

80
docs/backup-strategy.md Normal file
View File

@@ -0,0 +1,80 @@
# AtoCore Backup Strategy
## Purpose
This document describes the current backup baseline for the Dalidou-hosted
AtoCore machine store.
The immediate goal is not full disaster-proof automation yet. The goal is to
have one safe, repeatable way to snapshot the most important writable state.
## Current Backup Baseline
Today, the safest hot-backup target is:
- SQLite machine database
- project registry JSON
- backup metadata describing what was captured
This is now supported by:
- `python -m atocore.ops.backup`
## What The Script Captures
The backup command creates a timestamped snapshot under:
- `ATOCORE_BACKUP_DIR/snapshots/<timestamp>/`
It currently writes:
- `db/atocore.db`
- created with SQLite's backup API
- `config/project-registry.json`
- copied if it exists
- `backup-metadata.json`
- timestamp, paths, and backup notes
## What It Does Not Yet Capture
The current script does not hot-backup Chroma.
That is intentional.
For now, Chroma should be treated as one of:
- rebuildable derived state
- or something that needs a deliberate cold snapshot/export workflow
Until that workflow exists, do not rely on ad hoc live file copies of the
vector store while the service is actively writing.
## Dalidou Use
On Dalidou, the canonical machine paths are:
- DB:
- `/srv/storage/atocore/data/db/atocore.db`
- registry:
- `/srv/storage/atocore/config/project-registry.json`
- backups:
- `/srv/storage/atocore/backups`
So a normal backup run should happen on Dalidou itself, not from another
machine.
## Next Backup Improvements
1. decide Chroma policy clearly
- rebuild vs cold snapshot vs export
2. add a simple scheduled backup routine on Dalidou
3. add retention policy for old snapshots
4. optionally add a restore validation check
## Healthy Rule
Do not design around syncing the live machine DB/vector store between machines.
Back up the canonical Dalidou state.
Restore from Dalidou state.
Keep OpenClaw as a client of AtoCore, not a storage peer.

View File

@@ -39,6 +39,11 @@ now includes a first curated ingestion batch for the active projects.
- context builder
- API routes for query, context, health, and source status
- project registry and per-project refresh foundation
- project registration lifecycle:
- template
- proposal preview
- approved registration
- refresh
- env-driven storage and deployment paths
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
@@ -64,6 +69,11 @@ The service and storage foundation are live on Dalidou.
The machine-data host is real and canonical.
The project registry is now also persisted in a canonical mounted config path on
Dalidou:
- `/srv/storage/atocore/config/project-registry.json`
The content corpus is partially populated now.
The Dalidou instance already contains:
@@ -88,9 +98,9 @@ The Dalidou instance already contains:
Current live stats after the latest documentation sync and active-project ingest
passes:
- `source_documents`: 34
- `source_chunks`: 551
- `vectors`: 551
- `source_documents`: 35
- `source_chunks`: 560
- `vectors`: 560
The broader long-term corpus is still not fully populated yet. Wider project and
vault ingestion remains a deliberate next step rather than something already
@@ -149,8 +159,28 @@ The source refresh model now has a concrete foundation in code:
- a project registry file defines known project ids, aliases, and ingest roots
- the API can list registered projects
- the API can return a registration template
- the API can preview a registration without mutating state
- the API can persist an approved registration
- the API can refresh one registered project at a time
This lifecycle is now coherent end to end for normal use.
## Reliability Baseline
The runtime has now been hardened in a few practical ways:
- SQLite connections use a configurable busy timeout
- SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
- project registry writes are atomic file replacements rather than in-place rewrites
- a first runtime backup path now exists for:
- SQLite
- project registry
- backup metadata
This does not eliminate every concurrency edge, but it materially improves the
current operational baseline.
In `Trusted Project State`:
- each active seeded project now has a conservative trusted-state set
@@ -167,7 +197,7 @@ This separation is healthy:
## Immediate Next Focus
1. Use the new T420-side AtoCore skill in real OpenClaw workflows
1. Use the new T420-side AtoCore skill and registration flow in real OpenClaw workflows
2. Tighten retrieval quality for the newly seeded active projects
3. Define the first broader AtoVault/AtoDrive ingestion batches
4. Add backup/export strategy for Dalidou machine state

View File

@@ -31,10 +31,12 @@ AtoCore now has:
explicit
- move toward a project source registry and refresh workflow
- foundation now exists via project registry + per-project refresh API
- registration policy + template are now the next normal path for new projects
- registration policy + template + proposal + approved registration are now
the normal path for new projects
5. Define backup and export procedures for Dalidou
- SQLite snapshot/backup strategy
- exercise the new SQLite + registry snapshot path on Dalidou
- Chroma backup or rebuild policy
- retention and restore validation
6. Keep deeper automatic runtime integration deferred until the read-only model
has proven value
@@ -101,6 +103,7 @@ P06:
The next batch is successful if:
- OpenClaw can use AtoCore naturally when context is needed
- OpenClaw can also register a new project cleanly before refreshing it
- AtoCore answers correctly for the active project set
- retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
- trusted project state remains concise and high confidence

View File

@@ -24,6 +24,7 @@ class Settings(BaseSettings):
project_registry_path: Path = Path("./config/project-registry.json")
host: str = "127.0.0.1"
port: int = 8100
db_busy_timeout_ms: int = 5000
# Embedding
embedding_model: str = (

View File

@@ -100,9 +100,15 @@ def _column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
def get_connection() -> Generator[sqlite3.Connection, None, None]:
"""Get a database connection with row factory."""
_ensure_data_dir()
conn = sqlite3.connect(str(_config.settings.db_path))
conn = sqlite3.connect(
str(_config.settings.db_path),
timeout=_config.settings.db_busy_timeout_ms / 1000,
)
conn.row_factory = sqlite3.Row
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(f"PRAGMA busy_timeout = {_config.settings.db_busy_timeout_ms}")
conn.execute("PRAGMA journal_mode = WAL")
conn.execute("PRAGMA synchronous = NORMAL")
try:
yield conn
conn.commit()

View File

@@ -0,0 +1 @@
"""Operational utilities for running AtoCore safely."""

70
src/atocore/ops/backup.py Normal file
View File

@@ -0,0 +1,70 @@
"""Create safe runtime backups for the AtoCore machine store."""
from __future__ import annotations
import json
import sqlite3
from datetime import datetime, UTC
from pathlib import Path
import atocore.config as _config
from atocore.models.database import init_db
from atocore.observability.logger import get_logger
log = get_logger("backup")
def create_runtime_backup(timestamp: datetime | None = None) -> dict:
"""Create a hot backup of the SQLite DB plus registry/config metadata."""
init_db()
now = timestamp or datetime.now(UTC)
stamp = now.strftime("%Y%m%dT%H%M%SZ")
backup_root = _config.settings.resolved_backup_dir / "snapshots" / stamp
db_backup_dir = backup_root / "db"
config_backup_dir = backup_root / "config"
metadata_path = backup_root / "backup-metadata.json"
db_backup_dir.mkdir(parents=True, exist_ok=True)
config_backup_dir.mkdir(parents=True, exist_ok=True)
db_snapshot_path = db_backup_dir / _config.settings.db_path.name
_backup_sqlite_db(_config.settings.db_path, db_snapshot_path)
registry_snapshot = None
registry_path = _config.settings.resolved_project_registry_path
if registry_path.exists():
registry_snapshot = config_backup_dir / registry_path.name
registry_snapshot.write_text(registry_path.read_text(encoding="utf-8"), encoding="utf-8")
metadata = {
"created_at": now.isoformat(),
"backup_root": str(backup_root),
"db_snapshot_path": str(db_snapshot_path),
"db_size_bytes": db_snapshot_path.stat().st_size,
"registry_snapshot_path": str(registry_snapshot) if registry_snapshot else "",
"vector_store_note": "Chroma hot backup is not included in this script; use a cold snapshot or rebuild/export workflow.",
}
metadata_path.write_text(json.dumps(metadata, indent=2, ensure_ascii=True) + "\n", encoding="utf-8")
log.info("runtime_backup_created", backup_root=str(backup_root), db_snapshot=str(db_snapshot_path))
return metadata
def _backup_sqlite_db(source_path: Path, dest_path: Path) -> None:
source_conn = sqlite3.connect(str(source_path))
dest_conn = sqlite3.connect(str(dest_path))
try:
source_conn.backup(dest_conn)
finally:
dest_conn.close()
source_conn.close()
def main() -> None:
result = create_runtime_backup()
print(json.dumps(result, indent=2, ensure_ascii=True))
if __name__ == "__main__":
main()

View File

@@ -3,6 +3,7 @@
from __future__ import annotations
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
@@ -320,7 +321,15 @@ def _load_registry_payload(registry_path: Path) -> dict:
def _write_registry_payload(registry_path: Path, payload: dict) -> None:
registry_path.parent.mkdir(parents=True, exist_ok=True)
registry_path.write_text(
json.dumps(payload, indent=2, ensure_ascii=True) + "\n",
rendered = json.dumps(payload, indent=2, ensure_ascii=True) + "\n"
with tempfile.NamedTemporaryFile(
mode="w",
encoding="utf-8",
)
dir=registry_path.parent,
prefix=f"{registry_path.stem}.",
suffix=".tmp",
delete=False,
) as tmp_file:
tmp_file.write(rendered)
temp_path = Path(tmp_file.name)
temp_path.replace(registry_path)

71
tests/test_backup.py Normal file
View File

@@ -0,0 +1,71 @@
"""Tests for runtime backup creation."""
import json
import sqlite3
from datetime import UTC, datetime
import atocore.config as config
from atocore.models.database import init_db
from atocore.ops.backup import create_runtime_backup
def test_create_runtime_backup_copies_db_and_registry(tmp_path, monkeypatch):
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
registry_path = tmp_path / "config" / "project-registry.json"
registry_path.parent.mkdir(parents=True)
registry_path.write_text('{"projects":[{"id":"p01-example","aliases":[],"ingest_roots":[{"source":"vault","subpath":"incoming/projects/p01-example"}]}]}\n', encoding="utf-8")
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
with sqlite3.connect(str(config.settings.db_path)) as conn:
conn.execute("INSERT INTO projects (id, name) VALUES (?, ?)", ("p01", "P01 Example"))
conn.commit()
result = create_runtime_backup(datetime(2026, 4, 6, 18, 0, 0, tzinfo=UTC))
finally:
config.settings = original_settings
db_snapshot = tmp_path / "backups" / "snapshots" / "20260406T180000Z" / "db" / "atocore.db"
registry_snapshot = (
tmp_path / "backups" / "snapshots" / "20260406T180000Z" / "config" / "project-registry.json"
)
metadata_path = (
tmp_path / "backups" / "snapshots" / "20260406T180000Z" / "backup-metadata.json"
)
assert result["db_snapshot_path"] == str(db_snapshot)
assert db_snapshot.exists()
assert registry_snapshot.exists()
assert metadata_path.exists()
with sqlite3.connect(str(db_snapshot)) as conn:
row = conn.execute("SELECT name FROM projects WHERE id = ?", ("p01",)).fetchone()
assert row[0] == "P01 Example"
metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
assert metadata["registry_snapshot_path"] == str(registry_snapshot)
def test_create_runtime_backup_handles_missing_registry(tmp_path, monkeypatch):
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_BACKUP_DIR", str(tmp_path / "backups"))
monkeypatch.setenv(
"ATOCORE_PROJECT_REGISTRY_PATH", str(tmp_path / "config" / "project-registry.json")
)
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
result = create_runtime_backup(datetime(2026, 4, 6, 19, 0, 0, tzinfo=UTC))
finally:
config.settings = original_settings
assert result["registry_snapshot_path"] == ""

49
tests/test_database.py Normal file
View File

@@ -0,0 +1,49 @@
"""Tests for SQLite connection pragmas and runtime behavior."""
import sqlite3
import atocore.config as config
from atocore.models.database import get_connection, init_db
def test_get_connection_applies_busy_timeout_and_wal(tmp_path, monkeypatch):
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_DB_BUSY_TIMEOUT_MS", "7000")
original_settings = config.settings
try:
config.settings = config.Settings()
init_db()
with get_connection() as conn:
busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
foreign_keys = conn.execute("PRAGMA foreign_keys").fetchone()[0]
finally:
config.settings = original_settings
assert busy_timeout == 7000
assert str(journal_mode).lower() == "wal"
assert foreign_keys == 1
def test_get_connection_uses_configured_timeout_value(tmp_path, monkeypatch):
monkeypatch.setenv("ATOCORE_DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("ATOCORE_DB_BUSY_TIMEOUT_MS", "2500")
original_settings = config.settings
original_connect = sqlite3.connect
calls = []
def fake_connect(*args, **kwargs):
calls.append(kwargs.get("timeout"))
return original_connect(*args, **kwargs)
try:
config.settings = config.Settings()
monkeypatch.setattr("atocore.models.database.sqlite3.connect", fake_connect)
init_db()
finally:
config.settings = original_settings
assert calls
assert calls[0] == 2.5