
Anto01 c9b9eede25 feat: tunable ranking, refresh status, chroma backup + admin endpoints

Three small improvements that move the operational baseline forward
without changing the existing trust model.

1. Tunable retrieval ranking weights
   - rank_project_match_boost, rank_query_token_step,
     rank_query_token_cap, rank_path_high_signal_boost,
     rank_path_low_signal_penalty are now Settings fields
   - all overridable via ATOCORE_* env vars
   - retriever no longer hard-codes 2.0 / 1.18 / 0.72 / 0.08 / 1.32
   - lets ranking be tuned per environment as Wave 1 is exercised
     without code changes
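
A minimal sketch of how the five weights could be read from the environment, assuming a plain dataclass in place of AtoCore's actual Settings class. The `ATOCORE_` plus upper-cased field name convention is an assumption, and so is the pairing of each default with a field: the commit lists the five names and the five previously hard-coded values but does not say which value belongs to which weight.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RankingWeights:
    # Defaults reuse the values the commit says were hard-coded. Which value
    # belonged to which weight is an assumption made for illustration.
    rank_project_match_boost: float = 2.0
    rank_query_token_step: float = 0.08
    rank_query_token_cap: float = 1.32
    rank_path_high_signal_boost: float = 1.18
    rank_path_low_signal_penalty: float = 0.72

    @classmethod
    def from_env(cls, env=None, prefix="ATOCORE_"):
        """Build weights, letting PREFIX + FIELD_NAME variables override defaults."""
        env = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            raw = env.get(prefix + f.name.upper())
            if raw:  # unset or empty means "keep the default"
                overrides[f.name] = float(raw)
        return cls(**overrides)


# A mapping is passed here instead of os.environ to keep the example hermetic.
weights = RankingWeights.from_env({"ATOCORE_RANK_PROJECT_MATCH_BOOST": "2.5"})
print(weights.rank_project_match_boost)  # 2.5
```

Passing the environment as a mapping also makes it easy to compare two candidate weight sets in a test without touching the process environment.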

2. /projects/{name}/refresh status
   - refresh_registered_project now returns an overall status field
     ("ingested", "partial", "nothing_to_ingest") plus roots_ingested
     and roots_skipped counters
   - ProjectRefreshResponse advertises the new fields so callers can
     rely on them
   - covers the case where every configured root is missing on disk
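
One way the overall status could be derived from the per-root outcome is sketched below. The three status strings and the two counter names come from the commit. The rule that maps counters to a status, the `RefreshSummary` type, and the `exists` parameter are assumptions, not the actual `refresh_registered_project` logic.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshSummary:
    status: str
    roots_ingested: int
    roots_skipped: int


def summarize_refresh(roots, exists=os.path.isdir):
    """Classify a refresh; a root counts as skipped when it is missing on disk."""
    ingested = sum(1 for root in roots if exists(root))
    skipped = len(roots) - ingested
    if ingested == 0:
        status = "nothing_to_ingest"  # includes "every configured root is missing"
    elif skipped == 0:
        status = "ingested"
    else:
        status = "partial"
    return RefreshSummary(status, ingested, skipped)


# Hypothetical roots; the predicate stands in for a real filesystem check.
summary = summarize_refresh(["/src/a", "/src/b"], exists=lambda p: p == "/src/a")
print(summary.status, summary.roots_ingested, summary.roots_skipped)  # partial 1 1
```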

3. Chroma cold snapshot + admin backup endpoints
   - create_runtime_backup now accepts include_chroma and writes a
     cold directory copy of the chroma persistence path
   - new list_runtime_backups() and validate_backup() helpers
   - new endpoints:
     - POST /admin/backup            create snapshot (optional chroma)
     - GET  /admin/backup            list snapshots
     - GET  /admin/backup/{stamp}/validate  structural validation
   - chroma snapshots are taken under exclusive_ingestion() so a refresh
     or ingest cannot race with the cold copy
   - backup metadata records what was actually included and how big

Tests:
- 8 new tests covering tunable weights, refresh status branches
  (ingested / partial / nothing_to_ingest), chroma snapshot, list,
  validate, and the API endpoints (including the lock-acquisition path)
- existing fake refresh stubs in test_api_storage.py updated for the
  expanded ProjectRefreshResponse model
- full suite: 105 passing (was 97)

next-steps doc updated to reflect that the chroma snapshot + restore
validation gap from current-state.md is now closed in code; only the
operational retention policy remains.

2026-04-06 18:42:19 -04:00


AtoCore Next Steps

Current Position

AtoCore now has:

- canonical runtime and machine storage on Dalidou
- separated source and machine-data boundaries
- initial self-knowledge ingested into the live instance
- trusted project-state entries for AtoCore itself
- a first read-only OpenClaw integration path on the T420
- a first real active-project corpus batch for:
  - p04-gigabit
  - p05-interferometer
  - p06-polisher

This working list should be read alongside:

master-plan-status.md

Immediate Next Steps

1. Use the T420 atocore-context skill and the new organic routing layer in real OpenClaw workflows
   - confirm auto-context feels natural
   - confirm project inference is good enough in practice
   - confirm the fail-open behavior remains acceptable in practice
2. Review retrieval quality after the first real project ingestion batch
   - check whether the top hits are useful
   - check whether trusted project state remains dominant
   - reduce cross-project competition and prompt ambiguity where needed
   - use debug-context to inspect the exact last AtoCore supplement
3. Treat the active-project full markdown/text wave as complete
   - p04-gigabit
   - p05-interferometer
   - p06-polisher
4. Define a cleaner source refresh model
   - make the difference between source truth, staged inputs, and machine store explicit
   - move toward a project source registry and refresh workflow
   - foundation now exists via project registry + per-project refresh API
   - registration policy + template + proposal + approved registration are now the normal path for new projects
5. Move to Wave 2 trusted-operational ingestion
   - curated dashboards
   - decision logs
   - milestone/current-status views
   - operational truth, not just raw project notes
6. Integrate the new engineering architecture docs into active planning, not immediate schema code
   - keep docs/architecture/engineering-knowledge-hybrid-architecture.md as the target layer model
   - keep docs/architecture/engineering-ontology-v1.md as the V1 structured-domain target
   - do not start entity/relationship persistence until the ingestion, retrieval, registry, and backup baseline feels boring and stable
7. Define backup and export procedures for Dalidou
   - exercise the new SQLite + registry snapshot path on Dalidou
   - Chroma backup or rebuild policy
   - retention and restore validation
   - admin backup endpoint now supports include_chroma cold snapshot under the ingestion lock and validate confirms each snapshot is openable; remaining work is the operational retention policy
8. Keep deeper automatic runtime integration modest until the organic read-only model has proven value
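
The fail-open behavior named in the first step can be illustrated with a small wrapper: if the context lookup fails for any reason, the caller proceeds with the bare prompt instead of erroring out. This is a sketch of the general pattern, not the atocore-context skill itself; `fetch_context_fail_open`, `build_prompt`, and the injected `fetch` callable are hypothetical names.

```python
def fetch_context_fail_open(fetch, prompt, fallback=""):
    """Return fetch(prompt), or the fallback if the context service fails."""
    try:
        return fetch(prompt)
    except Exception:
        # Fail open: a broken context service must not block the main workflow.
        return fallback


def build_prompt(prompt, fetch):
    """Prepend retrieved context when there is any; otherwise pass the prompt through."""
    context = fetch_context_fail_open(fetch, prompt)
    return f"{context}\n\n{prompt}" if context else prompt


def unreachable(prompt):
    """Simulates the context service being down."""
    raise ConnectionError("context service unreachable")


print(build_prompt("status of p05?", unreachable))  # status of p05?
```

The trade-off to watch in practice is that a silent fallback hides outages, so a real wrapper would likely log the failure as well.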

Trusted State Status

The first conservative trusted-state promotion pass is now complete for:

- p04-gigabit
- p05-interferometer
- p06-polisher

Each project now has a small set of stable entries covering:

- summary
- architecture or boundary decision
- key constraints
- current next focus

This materially improves context/build quality for project-hinted prompts.
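
As an illustration of the four-entry shape described above, the sketch below checks which kinds of entry a project still lacks. The kind labels, the `TrustedEntry` type, and `missing_kinds` are hypothetical and do not reflect AtoCore's stored schema.

```python
from dataclasses import dataclass

# Hypothetical labels for the four kinds of entry listed above.
ENTRY_KINDS = ("summary", "decision", "constraints", "next_focus")


@dataclass(frozen=True)
class TrustedEntry:
    project: str
    kind: str  # expected to be one of ENTRY_KINDS
    text: str


def missing_kinds(entries, project):
    """Return the entry kinds a project does not yet cover, in ENTRY_KINDS order."""
    present = {e.kind for e in entries if e.project == project}
    return [kind for kind in ENTRY_KINDS if kind not in present]


entries = [
    TrustedEntry("p04-gigabit", "summary", "placeholder text"),
    TrustedEntry("p04-gigabit", "constraints", "placeholder text"),
]
print(missing_kinds(entries, "p04-gigabit"))  # ['decision', 'next_focus']
```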

Recommended Near-Term Project Work

The active-project full markdown/text wave is now in.

The near-term work is now:

- strengthen retrieval quality
- promote or refine trusted operational truth where the broad corpus is now too noisy
- keep trusted project state concise and high-confidence
- widen only through named ingestion waves

Recommended Next Wave Inputs

Wave 2 should emphasize trusted operational truth, not bulk historical notes.

P04:

- current status dashboard
- current selected design path
- current frame interface truth
- current next-step milestone view

P05:

- selected vendor path
- current error-budget baseline
- current architecture freeze or open decisions
- current procurement / next-action view

P06:

- current system map
- current shared contracts baseline
- current calibration procedure truth
- current July / proving roadmap view

Deferred On Purpose

- automatic write-back from OpenClaw into AtoCore
- automatic memory promotion
- reflection loop integration
- replacing OpenClaw's own memory system
- syncing the live machine DB between machines

Success Criteria For The Next Batch

The next batch is successful if:

- OpenClaw can use AtoCore naturally when context is needed
- OpenClaw can infer registered projects and call AtoCore organically for project-knowledge questions
- the active-project full corpus wave can be inspected and used concretely through auto-context, context-build, and debug-context
- OpenClaw can also register a new project cleanly before refreshing it
- existing project registrations can be refined safely before refresh when the staged source set evolves
- AtoCore answers correctly for the active project set
- retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
- trusted project state remains concise and high confidence
- project ingestion remains controlled rather than noisy
- the canonical Dalidou instance stays stable

Long-Run Goal

The long-run target is:

- continue working normally inside PKM project stacks and Gitea repos
- let OpenClaw keep its own memory and runtime behavior
- let AtoCore supplement LLM work with stronger trusted context, retrieval, and context assembly

That means AtoCore should behave like a durable external context engine and machine-memory layer, not a replacement for normal repo work or OpenClaw memory.
