Harden runtime and add backup foundation

2026-04-06 10:15:00 -04:00
parent 9715fe3143
commit c9757e313a
11 changed files with 331 additions and 10 deletions

docs/backup-strategy.md (new file, +80)

@@ -0,0 +1,80 @@
# AtoCore Backup Strategy
## Purpose
This document describes the current backup baseline for the Dalidou-hosted
AtoCore machine store.
The immediate goal is not full disaster-proof automation yet. The goal is to
have one safe, repeatable way to snapshot the most important writable state.
## Current Backup Baseline
Today, the safest hot-backup targets are:
- SQLite machine database
- project registry JSON
- backup metadata describing what was captured
All three are now captured by one command:
- `python -m atocore.ops.backup`
## What The Script Captures
The backup command creates a timestamped snapshot under:
- `ATOCORE_BACKUP_DIR/snapshots/<timestamp>/`
It currently writes:
- `db/atocore.db`
  - created with SQLite's backup API
- `config/project-registry.json`
  - copied if it exists
- `backup-metadata.json`
  - timestamp, paths, and backup notes
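The capture steps above can be sketched with the Python standard library roughly as follows. This is a minimal illustration, not the contents of `atocore.ops.backup`: the function name `create_snapshot`, the timestamp format, and the metadata field names are assumptions. The DB copy uses `sqlite3.Connection.backup`, Python's wrapper around SQLite's online backup API.

```python
import json
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_snapshot(db_path: Path, registry_path: Path, backup_dir: Path) -> Path:
    """Write one snapshot under backup_dir/snapshots/<timestamp>/ (hypothetical helper)."""
    # Assumed timestamp format: UTC, sortable, microsecond resolution.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    snapshot = backup_dir / "snapshots" / stamp
    (snapshot / "db").mkdir(parents=True)
    (snapshot / "config").mkdir()

    # SQLite's backup API produces a consistent copy of the live database
    # instead of a raw file copy that might catch a write in progress.
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(snapshot / "db" / "atocore.db")
    try:
        with dst:
            src.backup(dst)
    finally:
        dst.close()
        src.close()

    # The registry is optional: copy it only if it exists.
    registry_copied = registry_path.exists()
    if registry_copied:
        shutil.copy2(registry_path, snapshot / "config" / "project-registry.json")

    # Field names here are illustrative, not the real metadata schema.
    metadata = {
        "created_at": stamp,
        "db_source": str(db_path),
        "registry_source": str(registry_path),
        "registry_copied": registry_copied,
        "notes": "Chroma vector store not included",
    }
    (snapshot / "backup-metadata.json").write_text(json.dumps(metadata, indent=2))
    return snapshot
```

Using the backup API avoids stopping the service and yields a point-in-time copy rather than a file that may be mid-write.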
## What It Does Not Yet Capture
The current script does not hot-backup Chroma.
That is intentional.
For now, Chroma should be treated as either:
- rebuildable derived state, or
- state that needs a deliberate cold snapshot/export workflow
Until that workflow exists, do not rely on ad hoc live file copies of the
vector store while the service is actively writing; such copies can capture
files mid-write and restore to an inconsistent index.
## Dalidou Use
On Dalidou, the canonical machine paths are:
- DB: `/srv/storage/atocore/data/db/atocore.db`
- registry: `/srv/storage/atocore/config/project-registry.json`
- backups: `/srv/storage/atocore/backups`
Because these are local paths on Dalidou, a normal backup run should happen on
Dalidou itself, not from another machine.
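Because AtoCore uses env-driven storage paths, the Dalidou locations above can act as defaults that environment variables override. In this sketch only `ATOCORE_BACKUP_DIR` comes from this document; `ATOCORE_DB_PATH`, `ATOCORE_REGISTRY_PATH`, `resolve_paths`, and `check_local` are hypothetical names. The file-existence check is one simple way to fail fast when the backup is not run on Dalidou itself.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Canonical Dalidou root, taken from the paths listed above.
DALIDOU_ROOT = Path("/srv/storage/atocore")


@dataclass(frozen=True)
class BackupPaths:
    db: Path
    registry: Path
    backup_dir: Path


def resolve_paths(env: Optional[Mapping[str, str]] = None) -> BackupPaths:
    """Resolve backup inputs from env vars, defaulting to the Dalidou paths.

    ATOCORE_DB_PATH and ATOCORE_REGISTRY_PATH are hypothetical variable names.
    """
    env = os.environ if env is None else env
    return BackupPaths(
        db=Path(env.get("ATOCORE_DB_PATH", DALIDOU_ROOT / "data/db/atocore.db")),
        registry=Path(env.get("ATOCORE_REGISTRY_PATH", DALIDOU_ROOT / "config/project-registry.json")),
        backup_dir=Path(env.get("ATOCORE_BACKUP_DIR", DALIDOU_ROOT / "backups")),
    )


def check_local(paths: BackupPaths) -> None:
    """Fail early if the DB file is absent, a sign this is not Dalidou."""
    if not paths.db.is_file():
        raise FileNotFoundError(f"{paths.db} not found; run the backup on Dalidou itself")
```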
## Next Backup Improvements
1. decide the Chroma policy clearly
   - rebuild vs cold snapshot vs export
2. add a simple scheduled backup routine on Dalidou
3. add a retention policy for old snapshots
4. optionally add a restore validation check
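For item 3, a count-based retention rule is one simple option. The sketch below keeps the newest `keep` snapshot directories and deletes the rest. The function name, the default of seven, and the assumption that snapshot directory names are timestamps that sort chronologically are illustrative, not decided policy.

```python
import shutil
from pathlib import Path


def prune_snapshots(backup_dir: Path, keep: int = 7) -> list[Path]:
    """Delete all but the `keep` newest snapshot dirs and return the removed paths.

    Assumes snapshot names are timestamps that sort lexically in
    chronological order (e.g. 20260406T101500Z).
    """
    if keep < 1:
        # keep=0 would make snapshots[:-0] empty and silently delete nothing.
        raise ValueError("keep must be at least 1")
    root = backup_dir / "snapshots"
    if not root.is_dir():
        return []
    snapshots = sorted(p for p in root.iterdir() if p.is_dir())
    doomed = snapshots[:-keep]  # oldest first; empty if there are <= keep snapshots
    for path in doomed:
        shutil.rmtree(path)
    return doomed
```

A count-based rule never deletes the last good snapshot just because the schedule stopped running; an age-based rule would need an explicit guard for that case.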
## Healthy Rule
Do not design around syncing the live machine DB/vector store between machines.
Back up the canonical Dalidou state.
Restore from Dalidou state.
Keep OpenClaw as a client of AtoCore, not a storage peer.


@@ -39,6 +39,11 @@ now includes a first curated ingestion batch for the active projects.
- context builder
- API routes for query, context, health, and source status
- project registry and per-project refresh foundation
- project registration lifecycle:
  - template
  - proposal preview
  - approved registration
  - refresh
- env-driven storage and deployment paths
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
@@ -64,6 +69,11 @@ The service and storage foundation are live on Dalidou.
The machine-data host is real and canonical.
The project registry is now also persisted in a canonical mounted config path on
Dalidou:
- `/srv/storage/atocore/config/project-registry.json`
The content corpus is partially populated now.
The Dalidou instance already contains:
@@ -88,9 +98,9 @@ The Dalidou instance already contains:
Current live stats after the latest documentation sync and active-project ingest
passes:
-- `source_documents`: 34
-- `source_chunks`: 551
-- `vectors`: 551
+- `source_documents`: 35
+- `source_chunks`: 560
+- `vectors`: 560
The broader long-term corpus is not fully populated yet. Wider project and
vault ingestion remains a deliberate next step rather than something already
@@ -149,8 +159,28 @@ The source refresh model now has a concrete foundation in code:
- a project registry file defines known project ids, aliases, and ingest roots
- the API can list registered projects
- the API can return a registration template
- the API can preview a registration without mutating state
- the API can persist an approved registration
- the API can refresh one registered project at a time
This lifecycle is now coherent end to end for normal use.
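The key property of this lifecycle is that preview never mutates state and only an approved proposal is persisted. The in-memory model below illustrates that contract only: `Registry`, its method names, and the `id`/`aliases`/`ingest_roots` field names are hypothetical stand-ins, not the actual AtoCore API routes or registry schema, and `refresh` merely returns the ingest roots where real ingestion would run.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for the project registry file."""

    projects: dict = field(default_factory=dict)

    def template(self) -> dict:
        # Step 1: hand out a blank entry for the caller to fill in.
        return {"id": "", "aliases": [], "ingest_roots": []}

    def preview(self, proposal: dict) -> dict:
        # Step 2: validate and show the resulting registry without mutating it.
        problems = []
        if not proposal.get("id"):
            problems.append("missing id")
        elif proposal["id"] in self.projects:
            problems.append(f"id {proposal['id']!r} already registered")
        if not proposal.get("ingest_roots"):
            problems.append("no ingest roots")
        known = {a for p in self.projects.values() for a in p["aliases"]}
        problems += [f"alias {a!r} already in use" for a in proposal.get("aliases", []) if a in known]
        after = copy.deepcopy(self.projects)
        if not problems:
            after[proposal["id"]] = copy.deepcopy(proposal)
        return {"ok": not problems, "problems": problems, "projects_after": after}

    def register(self, proposal: dict) -> None:
        # Step 3: persist only a proposal that previews cleanly.
        result = self.preview(proposal)
        if not result["ok"]:
            raise ValueError("; ".join(result["problems"]))
        self.projects = result["projects_after"]

    def refresh(self, project_id: str) -> list:
        # Step 4: refresh one registered project at a time (ingestion omitted).
        if project_id not in self.projects:
            raise KeyError(project_id)
        return list(self.projects[project_id]["ingest_roots"])
```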
## Reliability Baseline
The runtime has now been hardened in a few practical ways:
- SQLite connections use a configurable busy timeout
- SQLite uses WAL mode, in which readers and the writer do not block each other,
  to reduce transient lock contention under normal concurrent use
- project registry writes are atomic file replacements rather than in-place rewrites
- a first runtime backup path now exists for:
  - SQLite
  - project registry
  - backup metadata
This does not eliminate every concurrency edge case, but it materially improves the
current operational baseline.
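The three hardening measures above map onto standard-library Python roughly as in this sketch. The helper names, the 5-second default timeout, and the temp-file naming are assumptions; the AtoCore code may differ. `sqlite3.connect(..., timeout=...)` sets how long a connection waits on a lock, `PRAGMA journal_mode=WAL` switches the journal mode, and `os.replace` performs the swap.

```python
import json
import os
import sqlite3
import tempfile
from pathlib import Path


def open_db(path: Path, busy_timeout_s: float = 5.0) -> sqlite3.Connection:
    """Open SQLite with a busy timeout and WAL journaling (hypothetical helper).

    `timeout` makes the connection wait up to busy_timeout_s for a lock
    instead of failing immediately with "database is locked".
    """
    conn = sqlite3.connect(path, timeout=busy_timeout_s)
    # WAL mode persists in the database file once set.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def write_json_atomic(path: Path, data: dict) -> None:
    """Replace `path` by writing a temp file in the same dir, then os.replace."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            # Flush data to disk before the rename so a crash cannot leave
            # the new name pointing at an empty file.
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

On POSIX filesystems a rename within one directory is atomic, so a reader of the registry sees either the old file or the new one, never a partial write.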
In `Trusted Project State`:
- each active seeded project now has a conservative trusted-state set
@@ -167,7 +197,7 @@ This separation is healthy:
## Immediate Next Focus
-1. Use the new T420-side AtoCore skill in real OpenClaw workflows
+1. Use the new T420-side AtoCore skill and registration flow in real OpenClaw workflows
2. Tighten retrieval quality for the newly seeded active projects
3. Define the first broader AtoVault/AtoDrive ingestion batches
4. Add backup/export strategy for Dalidou machine state


@@ -31,10 +31,12 @@ AtoCore now has:
explicit
- move toward a project source registry and refresh workflow
- foundation now exists via project registry + per-project refresh API
-- registration policy + template are now the next normal path for new projects
+- registration policy + template + proposal + approved registration are now
+  the normal path for new projects
5. Define backup and export procedures for Dalidou
- SQLite snapshot/backup strategy
- exercise the new SQLite + registry snapshot path on Dalidou
- Chroma backup or rebuild policy
- retention and restore validation
6. Keep deeper automatic runtime integration deferred until the read-only model
has proven value
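For the restore-validation item, one lightweight check is to restore a snapshot DB into a scratch directory and run SQLite's `PRAGMA integrity_check` on the restored copy. This sketch is an assumption about how such a check could look: `validate_restore` is a hypothetical name, and which tables to count is left to the caller (whether stats names such as `source_documents` are actual table names is not confirmed here).

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def validate_restore(snapshot_db: Path, tables: tuple = ()) -> dict:
    """Restore a snapshot DB into a scratch dir and sanity-check the copy.

    The snapshot itself is never opened, so the check cannot alter it.
    """
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "atocore.db"
        shutil.copy2(snapshot_db, restored)
        conn = sqlite3.connect(restored)
        try:
            # integrity_check returns the single row ("ok",) when no problems are found.
            status = conn.execute("PRAGMA integrity_check").fetchone()[0]
            counts = {}
            for name in tables:
                # Identifiers cannot be bound as parameters, so quote them safely.
                quoted = '"' + name.replace('"', '""') + '"'
                counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        finally:
            conn.close()
    return {"ok": status == "ok", "integrity": status, "row_counts": counts}
```

Comparing the returned row counts against the live stats after a restore gives a quick signal that the snapshot captured the expected corpus.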
@@ -101,6 +103,7 @@ P06:
The next batch is successful if:
- OpenClaw can use AtoCore naturally when context is needed
- OpenClaw can also register a new project cleanly before refreshing it
- AtoCore answers correctly for the active project set
- retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
- trusted project state remains concise and high confidence