ATOCore/docs/next-steps.md
Anto01 c9b9eede25 feat: tunable ranking, refresh status, chroma backup + admin endpoints
Three small improvements that move the operational baseline forward
without changing the existing trust model.

1. Tunable retrieval ranking weights
   - rank_project_match_boost, rank_query_token_step,
     rank_query_token_cap, rank_path_high_signal_boost,
     rank_path_low_signal_penalty are now Settings fields
   - all overridable via ATOCORE_* env vars
   - retriever no longer hard-codes 2.0 / 1.18 / 0.72 / 0.08 / 1.32
   - lets ranking be tuned per environment without code changes while
     Wave 1 is exercised

2. /projects/{name}/refresh status
   - refresh_registered_project now returns an overall status field
     ("ingested", "partial", "nothing_to_ingest") plus roots_ingested
     and roots_skipped counters
   - ProjectRefreshResponse advertises the new fields so callers can
     rely on them
   - covers the case where every configured root is missing on disk

3. Chroma cold snapshot + admin backup endpoints
   - create_runtime_backup now accepts include_chroma and writes a
     cold directory copy of the chroma persistence path
   - new list_runtime_backups() and validate_backup() helpers
   - new endpoints:
     - POST /admin/backup            create snapshot (optional chroma)
     - GET  /admin/backup            list snapshots
     - GET  /admin/backup/{stamp}/validate  structural validation
   - chroma snapshots are taken under exclusive_ingestion() so a refresh
     or ingest cannot race with the cold copy
   - backup metadata records what was actually included and how large
     each part is

Tests:
- 8 new tests covering tunable weights, refresh status branches
  (ingested / partial / nothing_to_ingest), chroma snapshot, list,
  validate, and the API endpoints (including the lock-acquisition path)
- existing fake refresh stubs in test_api_storage.py updated for the
  expanded ProjectRefreshResponse model
- full suite: 105 passing (was 97)

next-steps doc updated to reflect that the chroma snapshot + restore
validation gap from current-state.md is now closed in code; only the
operational retention policy remains.
2026-04-06 18:42:19 -04:00

AtoCore Next Steps

Current Position

AtoCore now has:

  • canonical runtime and machine storage on Dalidou
  • separated source and machine-data boundaries
  • initial self-knowledge ingested into the live instance
  • trusted project-state entries for AtoCore itself
  • a first read-only OpenClaw integration path on the T420
  • a first real active-project corpus batch for:
    • p04-gigabit
    • p05-interferometer
    • p06-polisher

This working list should be read alongside:

Immediate Next Steps

  1. Use the T420 atocore-context skill and the new organic routing layer in real OpenClaw workflows
    • confirm auto-context feels natural
    • confirm project inference is good enough in practice
    • confirm the fail-open behavior remains acceptable
  2. Review retrieval quality after the first real project ingestion batch
    • check whether the top hits are useful
    • check whether trusted project state remains dominant
    • reduce cross-project competition and prompt ambiguity where needed
    • use debug-context to inspect the exact last AtoCore supplement
  3. Treat the active-project full markdown/text wave as complete
    • p04-gigabit
    • p05-interferometer
    • p06-polisher
  4. Define a cleaner source refresh model
    • make the difference between source truth, staged inputs, and machine store explicit
    • move toward a project source registry and refresh workflow
    • foundation now exists via project registry + per-project refresh API
    • registration policy + template + proposal + approved registration are now the normal path for new projects
  5. Move to Wave 2 trusted-operational ingestion
    • curated dashboards
    • decision logs
    • milestone/current-status views
    • operational truth, not just raw project notes
  6. Integrate the new engineering architecture docs into active planning, not immediate schema code
    • keep docs/architecture/engineering-knowledge-hybrid-architecture.md as the target layer model
    • keep docs/architecture/engineering-ontology-v1.md as the V1 structured-domain target
    • do not start entity/relationship persistence until the ingestion, retrieval, registry, and backup baseline feels boring and stable
  7. Define backup and export procedures for Dalidou
    • exercise the new SQLite + registry snapshot path on Dalidou
    • Chroma backup or rebuild policy
    • retention and restore validation
    • the admin backup endpoint now supports an include_chroma cold snapshot taken under the ingestion lock, and validate confirms each snapshot is openable; the remaining work is the operational retention policy
  8. Keep deeper automatic runtime integration modest until the organic read-only model has proven value
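The per-project refresh API from step 4 now reports an overall status alongside roots_ingested and roots_skipped. A minimal sketch of how that status could be derived, assuming that "nothing_to_ingest" means no root was ingested (including when every configured root is missing on disk) and "partial" means some roots were ingested and some skipped; the RefreshSummary and summarize_refresh names are hypothetical, not AtoCore's refresh_registered_project.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class RefreshSummary:
    """Hypothetical shape mirroring the status / roots_ingested / roots_skipped fields."""
    project: str
    status: str
    roots_ingested: int
    roots_skipped: int
    skipped_roots: list[str] = field(default_factory=list)


def overall_status(roots_ingested: int, roots_skipped: int) -> str:
    # Assumed rules: nothing ingested (all roots missing, or none configured)
    # -> nothing_to_ingest; some roots skipped -> partial; otherwise ingested.
    if roots_ingested == 0:
        return "nothing_to_ingest"
    if roots_skipped > 0:
        return "partial"
    return "ingested"


def summarize_refresh(project: str, roots: list[Path]) -> RefreshSummary:
    ingested: list[Path] = []
    skipped: list[Path] = []
    for root in roots:
        # A root missing on disk is skipped instead of failing the whole refresh.
        if root.is_dir():
            ingested.append(root)  # real ingestion of this root would happen here
        else:
            skipped.append(root)
    return RefreshSummary(
        project=project,
        status=overall_status(len(ingested), len(skipped)),
        roots_ingested=len(ingested),
        roots_skipped=len(skipped),
        skipped_roots=[str(p) for p in skipped],
    )
```

Keeping the status rule in one small pure function makes each of the three branches easy to test on its own.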

Trusted State Status

The first conservative trusted-state promotion pass is now complete for:

  • p04-gigabit
  • p05-interferometer
  • p06-polisher

Each project now has a small set of stable entries covering:

  • summary
  • architecture or boundary decision
  • key constraints
  • current next focus

This materially improves context/build quality for project-hinted prompts.

The active-project full markdown/text wave has now been ingested.

The near-term work is now:

  1. strengthen retrieval quality
  2. promote or refine trusted operational truth where the broad corpus is now too noisy
  3. keep trusted project state concise and high-confidence
  4. widen only through named ingestion waves
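Strengthening retrieval quality can now go through the rank_* Settings fields instead of code edits. A minimal sketch under stated assumptions: a plain dataclass stands in for the real Settings class, each field is read from an environment variable named ATOCORE_ plus the uppercased field name, the mapping of the former constants (2.0 / 1.18 / 0.72 / 0.08 / 1.32) onto the fields is a guess, and the rank_score formula is hypothetical, not the retriever's actual scoring.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RankingWeights:
    # ASSUMED mapping of the former hard-coded constants onto the new fields.
    rank_project_match_boost: float = 2.0
    rank_path_high_signal_boost: float = 1.18
    rank_path_low_signal_penalty: float = 0.72
    rank_query_token_step: float = 0.08
    rank_query_token_cap: float = 1.32

    @classmethod
    def from_env(cls, env=None) -> RankingWeights:
        # Assumed naming: ATOCORE_RANK_PROJECT_MATCH_BOOST, and so on.
        env = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            key = "ATOCORE_" + f.name.upper()
            if key in env:
                overrides[f.name] = float(env[key])
        return cls(**overrides)


def rank_score(base: float, *, project_match: bool, query_token_hits: int,
               path_signal: str, w: RankingWeights) -> float:
    # Hypothetical scoring: multiplicative boosts and penalties on a base similarity.
    score = base
    if project_match:
        score *= w.rank_project_match_boost
    # Each matching query token adds one step; the cap stops long queries dominating.
    score *= min(1.0 + w.rank_query_token_step * query_token_hits,
                 w.rank_query_token_cap)
    if path_signal == "high":
        score *= w.rank_path_high_signal_boost
    elif path_signal == "low":
        score *= w.rank_path_low_signal_penalty
    return score
```

For example, setting ATOCORE_RANK_PROJECT_MATCH_BOOST=3.0 on one host would strengthen project matching there while every other weight keeps its default.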

Wave 2 should emphasize trusted operational truth, not bulk historical notes.

P04:

  • current status dashboard
  • current selected design path
  • current frame interface truth
  • current next-step milestone view

P05:

  • selected vendor path
  • current error-budget baseline
  • current architecture freeze or open decisions
  • current procurement / next-action view

P06:

  • current system map
  • current shared contracts baseline
  • current calibration procedure truth
  • current July / proving roadmap view

Deferred On Purpose

  • automatic write-back from OpenClaw into AtoCore
  • automatic memory promotion
  • reflection loop integration
  • replacing OpenClaw's own memory system
  • syncing the live machine DB between machines

Success Criteria For The Next Batch

The next batch is successful if:

  • OpenClaw can use AtoCore naturally when context is needed
  • OpenClaw can infer registered projects and call AtoCore organically for project-knowledge questions
  • the active-project full corpus wave can be inspected and used concretely through auto-context, context-build, and debug-context
  • OpenClaw can also register a new project cleanly before refreshing it
  • existing project registrations can be refined safely before refresh when the staged source set evolves
  • AtoCore answers correctly for the active project set
  • retrieval surfaces the seeded project docs instead of mostly AtoCore meta-docs
  • trusted project state remains concise and high confidence
  • project ingestion remains controlled rather than noisy
  • the canonical Dalidou instance stays stable

Long-Run Goal

The long-run target is:

  • continue working normally inside PKM project stacks and Gitea repos
  • let OpenClaw keep its own memory and runtime behavior
  • let AtoCore supplement LLM work with stronger trusted context, retrieval, and context assembly

That means AtoCore should behave like a durable external context engine and machine-memory layer, not a replacement for normal repo work or OpenClaw memory.