feat: implement AtoCore Phase 0 + Phase 0.5 (foundation + PoC)

Complete implementation of the personal context engine foundation:
- FastAPI server with 5 endpoints (ingest, query, context/build, health, debug)
- SQLite database with 5 tables (documents, chunks, memories, projects, interactions)
- Heading-aware markdown chunker (800 char max, recursive splitting)
- Multilingual embeddings via sentence-transformers (EN/FR)
- ChromaDB vector store with cosine similarity retrieval
- Context builder with project boosting, dedup, and budget enforcement
- CLI scripts for batch ingestion and test prompt evaluation
- 19 unit tests passing, 79% coverage
- Validated on 482 real project files (8383 chunks, 0 errors)
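
The heading-aware, recursive splitting described in the list above might look roughly like the sketch below. It is not the chunker from this commit: the function names (`split_by_headings`, `split_recursive`, `chunk_markdown`) and the separator order are assumptions made for illustration, and overlap and minimum chunk size are left out.

```python
import re

MAX_CHARS = 800  # the 800-char limit named in the commit message


def split_by_headings(text: str) -> list[str]:
    """Split markdown into sections, each starting at a heading line."""
    # Zero-width split: cut right before any line that starts with 1-6 '#' and a space.
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p.strip() for p in parts if p.strip()]


def split_recursive(
    text: str,
    max_chars: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),
) -> list[str]:
    """Split on progressively finer separators until every piece fits."""
    if len(text) <= max_chars:
        return [text]
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # A single piece is still too large: retry with the finer separators.
            chunks.extend(split_recursive(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


def chunk_markdown(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Chunk each heading section independently so chunks never span headings."""
    chunks: list[str] = []
    for section in split_by_headings(text):
        chunks.extend(split_recursive(section, max_chars))
    return chunks
```

One limitation of this sketch: when a section is too long, its heading can end up in a chunk of its own, separated from the body text.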

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:21:27 -04:00
parent 32ce409a7b
commit b4afbbb53a
34 changed files with 1756 additions and 0 deletions

src/atocore/config.py (new file, 39 lines)

@@ -0,0 +1,39 @@
"""AtoCore configuration via environment variables."""

from pathlib import Path

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    debug: bool = False
    data_dir: Path = Path("./data")
    host: str = "127.0.0.1"
    port: int = 8100

    # Embedding
    embedding_model: str = (
        "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    )

    # Chunking
    chunk_max_size: int = 800
    chunk_overlap: int = 100
    chunk_min_size: int = 50

    # Context
    context_budget: int = 3000
    context_top_k: int = 15

    model_config = {"env_prefix": "ATOCORE_"}

    @property
    def db_path(self) -> Path:
        return self.data_dir / "atocore.db"

    @property
    def chroma_path(self) -> Path:
        return self.data_dir / "chroma"


settings = Settings()