Anto01 c9b9eede25 feat: tunable ranking, refresh status, chroma backup + admin endpoints
Three small improvements that move the operational baseline forward
without changing the existing trust model.

1. Tunable retrieval ranking weights
   - rank_project_match_boost, rank_query_token_step,
     rank_query_token_cap, rank_path_high_signal_boost,
     rank_path_low_signal_penalty are now Settings fields
   - all overridable via ATOCORE_* env vars
   - retriever no longer hard-codes 2.0 / 1.18 / 0.72 / 0.08 / 1.32
   - lets ranking be tuned per environment as Wave 1 is exercised
     without code changes

2. /projects/{name}/refresh status
   - refresh_registered_project now returns an overall status field
     ("ingested", "partial", "nothing_to_ingest") plus roots_ingested
     and roots_skipped counters
   - ProjectRefreshResponse advertises the new fields so callers can
     rely on them
   - covers the case where every configured root is missing on disk

3. Chroma cold snapshot + admin backup endpoints
   - create_runtime_backup now accepts include_chroma and writes a
     cold directory copy of the chroma persistence path
   - new list_runtime_backups() and validate_backup() helpers
   - new endpoints:
     - POST /admin/backup            create snapshot (optional chroma)
     - GET  /admin/backup            list snapshots
     - GET  /admin/backup/{stamp}/validate  structural validation
   - chroma snapshots are taken under exclusive_ingestion() so a refresh
     or ingest cannot race with the cold copy
   - backup metadata records what was actually included and how big

Tests:
- 8 new tests covering tunable weights, refresh status branches
  (ingested / partial / nothing_to_ingest), chroma snapshot, list,
  validate, and the API endpoints (including the lock-acquisition path)
- existing fake refresh stubs in test_api_storage.py updated for the
  expanded ProjectRefreshResponse model
- full suite: 105 passing (was 97)

next-steps doc updated to reflect that the chroma snapshot + restore
validation gap from current-state.md is now closed in code; only the
operational retention policy remains.
2026-04-06 18:42:19 -04:00
2026-04-06 12:45:28 -04:00

AtoCore

Personal context engine that enriches LLM interactions with durable memory, structured context, and project knowledge.

Quick Start

pip install -e .
uvicorn src.atocore.main:app --port 8100

Usage

# Ingest markdown files
curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/notes"}'

# Build enriched context for a prompt
curl -X POST http://localhost:8100/context/build \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the project status?", "project": "myproject"}'

# CLI ingestion
python scripts/ingest_folder.py --path /path/to/notes

API Endpoints

Method Path Description
POST /ingest Ingest markdown file or folder
POST /query Retrieve relevant chunks
POST /context/build Build full context pack
GET /health Health check
GET /debug/context Inspect last context pack

Architecture

FastAPI (port 8100)
  |- Ingestion: markdown -> parse -> chunk -> embed -> store
  |- Retrieval: query -> embed -> vector search -> rank
  |- Context Builder: retrieve -> boost -> budget -> format
  |- SQLite (documents, chunks, memories, projects, interactions)
  '- ChromaDB (vector embeddings)

Configuration

Set via environment variables (prefix ATOCORE_):

Variable Default Description
ATOCORE_DEBUG false Enable debug logging
ATOCORE_PORT 8100 Server port
ATOCORE_CHUNK_MAX_SIZE 800 Max chunk size (chars)
ATOCORE_CONTEXT_BUDGET 3000 Context pack budget (chars)
ATOCORE_EMBEDDING_MODEL paraphrase-multilingual-MiniLM-L12-v2 Embedding model

Testing

pip install -e ".[dev]"
pytest

Architecture Notes

Implementation-facing architecture notes live under docs/architecture/.

Current additions:

  • docs/architecture/engineering-knowledge-hybrid-architecture.md
  • docs/architecture/engineering-ontology-v1.md
Description
ATODrive project repository
Readme 1.8 MiB
Languages
Python 96.2%
Shell 3.3%
JavaScript 0.4%