14ab7c8e9ffb24ab46e84cc55c1587b3adb5c319
Two changes that belong together:
1. builder.build_context() now passes project_hint into retrieve(),
so the project-aware boost actually fires for the retrieval pipeline
driven by /context/build. Before this, only direct /query callers
benefited from the registered-project boost.
2. retriever now applies two more ranking signals on every chunk:
- _query_match_boost: boosts chunks whose source/title/heading
echo high-signal query tokens (stop list filters out generic
words like "the", "project", "system")
- _path_signal_boost: down-weights archival noise (_archive,
_history, pre-cleanup, reviews) by 0.72 and up-weights current
high-signal docs (status, decision, requirements, charter,
system-map, error-budget, ...) by 1.18
Tests:
- test_context_builder_passes_project_hint_to_retrieval verifies
the wiring fix
- test_retrieve_downranks_archive_noise_and_prefers_high_signal_paths
verifies the new ranking helpers prefer current docs over archive
This addresses the cross-project competition and archive bleed
called out in current-state.md after the Wave 1 ingestion.
AtoCore
Personal context engine that enriches LLM interactions with durable memory, structured context, and project knowledge.
Quick Start
pip install -e .
uvicorn src.atocore.main:app --port 8100
Usage
# Ingest markdown files
curl -X POST http://localhost:8100/ingest \
-H "Content-Type: application/json" \
-d '{"path": "/path/to/notes"}'
# Build enriched context for a prompt
curl -X POST http://localhost:8100/context/build \
-H "Content-Type: application/json" \
-d '{"prompt": "What is the project status?", "project": "myproject"}'
# CLI ingestion
python scripts/ingest_folder.py --path /path/to/notes
API Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /ingest | Ingest markdown file or folder |
| POST | /query | Retrieve relevant chunks |
| POST | /context/build | Build full context pack |
| GET | /health | Health check |
| GET | /debug/context | Inspect last context pack |
Architecture
FastAPI (port 8100)
|- Ingestion: markdown -> parse -> chunk -> embed -> store
|- Retrieval: query -> embed -> vector search -> rank
|- Context Builder: retrieve -> boost -> budget -> format
|- SQLite (documents, chunks, memories, projects, interactions)
'- ChromaDB (vector embeddings)
Configuration
Set via environment variables (prefix ATOCORE_):
| Variable | Default | Description |
|---|---|---|
| ATOCORE_DEBUG | false | Enable debug logging |
| ATOCORE_PORT | 8100 | Server port |
| ATOCORE_CHUNK_MAX_SIZE | 800 | Max chunk size (chars) |
| ATOCORE_CONTEXT_BUDGET | 3000 | Context pack budget (chars) |
| ATOCORE_EMBEDDING_MODEL | paraphrase-multilingual-MiniLM-L12-v2 | Embedding model |
Testing
pip install -e ".[dev]"
pytest
Architecture Notes
Implementation-facing architecture notes live under docs/architecture/.
Current additions:
docs/architecture/engineering-knowledge-hybrid-architecture.mddocs/architecture/engineering-ontology-v1.md
Description
Languages
Python
96.2%
Shell
3.3%
JavaScript
0.4%