feat: implement AtoCore Phase 0 + Phase 0.5 (foundation + PoC)

Complete implementation of the personal context engine foundation:
- FastAPI server with 5 endpoints (ingest, query, context/build, health, debug)
- SQLite database with 5 tables (documents, chunks, memories, projects, interactions)
- Heading-aware markdown chunker (800 char max, recursive splitting)
- Multilingual embeddings via sentence-transformers (EN/FR)
- ChromaDB vector store with cosine similarity retrieval
- Context builder with project boosting, dedup, and budget enforcement
- CLI scripts for batch ingestion and test prompt evaluation
- 19 unit tests passing, 79% coverage
- Validated on 482 real project files (8383 chunks, 0 errors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-05 09:21:27 -04:00
parent 32ce409a7b
commit b4afbbb53a
34 changed files with 1756 additions and 0 deletions

114
tests/conftest.py Normal file
View File

@@ -0,0 +1,114 @@
"""pytest configuration and shared fixtures."""
import os
import tempfile
from pathlib import Path
import pytest
# Force test data directory
os.environ["ATOCORE_DATA_DIR"] = tempfile.mkdtemp(prefix="atocore_test_")
os.environ["ATOCORE_DEBUG"] = "true"
@pytest.fixture
def tmp_data_dir(tmp_path):
"""Provide a temporary data directory for tests."""
os.environ["ATOCORE_DATA_DIR"] = str(tmp_path)
# Reset singletons
from atocore import config
config.settings = config.Settings()
import atocore.retrieval.vector_store as vs
vs._store = None
return tmp_path
@pytest.fixture
def sample_markdown(tmp_path) -> Path:
"""Create a sample markdown file for testing."""
md_file = tmp_path / "test_note.md"
md_file.write_text(
"""---
tags:
- atocore
- architecture
date: 2026-04-05
---
# AtoCore Architecture
## Overview
AtoCore is a personal context engine that enriches LLM interactions
with durable memory, structured context, and project knowledge.
## Layers
The system has these layers:
1. Main PKM (human, messy, exploratory)
2. AtoVault (system mirror)
3. AtoDrive (trusted project truth)
4. Structured Memory (DB)
5. Semantic Retrieval (vector DB)
## Memory Types
AtoCore supports these memory types:
- Identity
- Preferences
- Project Memory
- Episodic Memory
- Knowledge Objects
- Adaptation Memory
- Trusted Project State
## Trust Precedence
When sources conflict:
1. Trusted Project State wins
2. AtoDrive overrides PKM
3. Most recent confirmed wins
4. Higher confidence wins
5. Equal → flag conflict
No silent merging.
""",
encoding="utf-8",
)
return md_file
@pytest.fixture
def sample_folder(tmp_path, sample_markdown) -> Path:
"""Create a folder with multiple markdown files."""
# Already has test_note.md from sample_markdown
second = tmp_path / "second_note.md"
second.write_text(
"""---
tags:
- chunking
---
# Chunking Strategy
## Approach
Heading-aware recursive splitting:
1. Split on H2 boundaries first
2. If section > 800 chars, split on H3
3. If still > 800 chars, split on paragraphs
4. Hard split at 800 chars with 100 char overlap
## Parameters
- max_chunk_size: 800 characters
- overlap: 100 characters
- min_chunk_size: 50 characters
""",
encoding="utf-8",
)
return tmp_path