Commit Graph

80 Commits

Author SHA1 Message Date
f49637b5cc Add AtoCore integration tooling and operations guide 2026-04-06 19:28:09 -04:00
c9b9eede25 feat: tunable ranking, refresh status, chroma backup + admin endpoints
Three small improvements that move the operational baseline forward
without changing the existing trust model.

1. Tunable retrieval ranking weights
   - rank_project_match_boost, rank_query_token_step,
     rank_query_token_cap, rank_path_high_signal_boost,
     rank_path_low_signal_penalty are now Settings fields
   - all overridable via ATOCORE_* env vars
   - retriever no longer hard-codes 2.0 / 1.18 / 0.72 / 0.08 / 1.32
   - lets ranking be tuned per environment as Wave 1 is exercised
     without code changes

2. /projects/{name}/refresh status
   - refresh_registered_project now returns an overall status field
     ("ingested", "partial", "nothing_to_ingest") plus roots_ingested
     and roots_skipped counters
   - ProjectRefreshResponse advertises the new fields so callers can
     rely on them
   - covers the case where every configured root is missing on disk

3. Chroma cold snapshot + admin backup endpoints
   - create_runtime_backup now accepts include_chroma and writes a
     cold directory copy of the chroma persistence path
   - new list_runtime_backups() and validate_backup() helpers
   - new endpoints:
     - POST /admin/backup            create snapshot (optional chroma)
     - GET  /admin/backup            list snapshots
     - GET  /admin/backup/{stamp}/validate  structural validation
   - chroma snapshots are taken under exclusive_ingestion() so a refresh
     or ingest cannot race with the cold copy
   - backup metadata records what was actually included and how big

Tests:
- 8 new tests covering tunable weights, refresh status branches
  (ingested / partial / nothing_to_ingest), chroma snapshot, list,
  validate, and the API endpoints (including the lock-acquisition path)
- existing fake refresh stubs in test_api_storage.py updated for the
  expanded ProjectRefreshResponse model
- full suite: 105 passing (was 97)

next-steps doc updated to reflect that the chroma snapshot + restore
validation gap from current-state.md is now closed in code; only the
operational retention policy remains.
2026-04-06 18:42:19 -04:00
14ab7c8e9f fix: pass project_hint into retrieve and add path-signal ranking
Two changes that belong together:

1. builder.build_context() now passes project_hint into retrieve(),
   so the project-aware boost actually fires for the retrieval pipeline
   driven by /context/build. Before this, only direct /query callers
   benefited from the registered-project boost.

2. retriever now applies two more ranking signals on every chunk:
   - _query_match_boost: boosts chunks whose source/title/heading
     echo high-signal query tokens (stop list filters out generic
     words like "the", "project", "system")
   - _path_signal_boost: down-weights archival noise (_archive,
     _history, pre-cleanup, reviews) by 0.72 and up-weights current
     high-signal docs (status, decision, requirements, charter,
     system-map, error-budget, ...) by 1.18

Tests:
- test_context_builder_passes_project_hint_to_retrieval verifies
  the wiring fix
- test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths
  verifies the new ranking helpers prefer current docs over archive

This addresses the cross-project competition and archive bleed
called out in current-state.md after the Wave 1 ingestion.
2026-04-06 18:37:07 -04:00
bdb42dba05 Expand active project wave and serialize refreshes 2026-04-06 14:58:14 -04:00
46a5d5887a Update plan status for organic routing 2026-04-06 14:06:54 -04:00
9943338846 Document organic OpenClaw routing layer 2026-04-06 14:04:49 -04:00
26bfa94c65 Add project-aware boost to raw query 2026-04-06 13:32:33 -04:00
4aa2b696a9 Document next-phase execution plan 2026-04-06 13:10:11 -04:00
af01dd3e70 Add engineering architecture docs 2026-04-06 12:45:28 -04:00
8f74cab0e6 Sync live project registry descriptions 2026-04-06 12:36:15 -04:00
06aa931273 Add project registry update flow 2026-04-06 12:31:24 -04:00
c9757e313a Harden runtime and add backup foundation 2026-04-06 10:15:00 -04:00
9715fe3143 Add project registration endpoint 2026-04-06 09:52:19 -04:00
1f1e6b5749 Add project registration proposal preview 2026-04-06 09:11:11 -04:00
827dcf2cd1 Add project registration policy and template 2026-04-06 08:46:37 -04:00
d8028f406e Sync live corpus counts in current state doc 2026-04-06 08:19:42 -04:00
3b8d717bdf Ship project registry config in image 2026-04-06 08:10:05 -04:00
8293099025 Add project registry refresh foundation 2026-04-06 08:02:13 -04:00
0f95415530 Clarify source staging and refresh model 2026-04-06 07:53:18 -04:00
82c7535d15 Refresh current state and next steps docs 2026-04-06 07:36:33 -04:00
8a94da4bf4 Clarify operating model and project corpus state 2026-04-06 07:25:33 -04:00
5069d5b1b6 Update current state and next steps docs 2026-04-05 19:12:45 -04:00
440fc1d9ba Document ecosystem state and integration contract 2026-04-05 18:47:40 -04:00
6bfa1fcc37 Add Dalidou storage foundation and deployment prep 2026-04-05 18:33:52 -04:00
b0889b3925 Stabilize core correctness and sync project plan state 2026-04-05 17:53:23 -04:00
b48f0c95ab feat: Phase 2 Memory Core — structured memory with context integration
Memory Core implementation:
- Memory service with 6 types: identity, preference, project, episodic, knowledge, adaptation
- CRUD operations: create (with dedup), get (filtered), update, invalidate, supersede
- Confidence scoring (0.0-1.0) and lifecycle management (active/superseded/invalid)
- Memory API endpoints: POST/GET/PUT/DELETE /memory

Context builder integration (trust precedence per Master Plan):
  1. Trusted Project State (highest trust, 20% budget)
  2. Identity + Preference memories (10% budget)
  3. Retrieved chunks (remaining budget)

Also fixed database.py to use dynamic settings reference for test isolation.
45/45 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:54:52 -04:00
531c560db7 feat: Phase 1 ingestion hardening + Phase 5 Trusted Project State
Phase 1 - Ingestion hardening:
- Encoding fallback (UTF-8/UTF-8-sig/Latin-1/CP1252)
- Delete detection: purge DB/vector entries for removed files
- Ingestion stats endpoint (GET /stats)

Phase 5 - Trusted Project State:
- project_state table with categories (status, decision, requirement, contact, milestone, fact, config)
- CRUD API: POST/GET/DELETE /project/state
- Upsert semantics, invalidation (supersede) support
- Context builder integrates project state at highest trust precedence
- Project state gets 20% budget allocation, appears first in context
- Trust precedence: Project State > Retrieved Chunks (per Master Plan)

33/33 tests passing. Validated end-to-end with GigaBIT M1 project data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:41:59 -04:00
6081462058 fix: critical bugs and hardening from validation audit
- Fix infinite loop in chunker _hard_split when overlap >= max_size
- Fix tag filter false positives by quoting tag values in ChromaDB query
- Fix score boost semantics (additive → multiplicative) to stay within 0-1 range
- Add error handling and type hints to all API routes
- Update README with proper project documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:35:37 -04:00
b4afbbb53a feat: implement AtoCore Phase 0 + Phase 0.5 (foundation + PoC)
Complete implementation of the personal context engine foundation:
- FastAPI server with 5 endpoints (ingest, query, context/build, health, debug)
- SQLite database with 5 tables (documents, chunks, memories, projects, interactions)
- Heading-aware markdown chunker (800 char max, recursive splitting)
- Multilingual embeddings via sentence-transformers (EN/FR)
- ChromaDB vector store with cosine similarity retrieval
- Context builder with project boosting, dedup, and budget enforcement
- CLI scripts for batch ingestion and test prompt evaluation
- 19 unit tests passing, 79% coverage
- Validated on 482 real project files (8383 chunks, 0 errors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:21:27 -04:00
32ce409a7b Initial commit 2026-04-05 12:28:07 +00:00