
Anto01 f521aab97b docs(arch): project-identity-canonicalization contract

Codifies the helper-at-every-service-boundary rule that fb6298a
implemented across the eight current callsites. The contract is
intentionally simple but easy to forget, so it lives in its own
doc that the engineering layer V1 implementation sprint can read
before adding new project-keyed entity surfaces.

docs/architecture/project-identity-canonicalization.md
------------------------------------------------------
- The contract: every read/write that takes a project name MUST
  call resolve_project_name() before the value crosses a service
  boundary; canonicalization happens once, at the first statement
  after input validation, never later
- The helper API: resolve_project_name(name) returns the canonical
  project_id for registered names, the input unchanged for empty
  or unregistered names (the second case is the backwards-compat
  path for hand-curated state predating the registry)
- Full table of the 8 current callsites: builder.build_context,
  project_state.set_state/get_state/invalidate_state,
  interactions.record_interaction/list_interactions,
  memory.create_memory/get_memories
- Where the helper is intentionally NOT called and why: legacy
  ensure_project lookup, retriever's own _project_match_boost
  (which already calls get_registered_project), _rank_chunks
  secondary substring boost (multiplicative not filter, can't
  drop relevant chunks), update_memory (no project field update),
  unregistered names (no registry record to resolve against, so the
  input passes through unchanged)
- Why this is the trust hierarchy in action: Layer 3 trusted
  state has to be findable to win the trust battle; an
  un-canonicalized lookup silently makes Layer 3 invisible and
  the system falls through to lower-trust retrieved chunks with
  no signal to the human
- The 4-step rule for new entry points: identify project-keyed
  reads/writes, place the call as the first statement after
  validation, add a regression test using the project_registry
  fixture, verify None/empty paths
- How the project_registry fixture works with a copy-pasteable
  example
- What the rule does NOT cover: alias creation (registry's own
  write path), registry hot-reloading (no in-process cache by
  design), cross-project dedup (collision detection at
  registration), time-bounded canonicalization (canonical id is
  stable forever), legacy data migration (open follow-up)
- Engineering layer V1 implications: every new service entry
  point in the entities/relationships/conflicts/mirror modules
  must apply the helper at the first statement after validation;
  treated as code review failure if missing
- Open follow-ups: legacy data migration script (~30 LOC),
  registry file caching when projects scale beyond ~50, case
  sensitivity audit when entity-side storage lands, _rank_chunks
  cleanup, documentation discoverability (intentional redundancy
  between this doc, the helper docstring, and per-callsite comments)
- Quick reference card: copy-pasteable template for new service
  functions
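
The helper contract listed above can be sketched roughly as follows. This is an illustration only, not the code from fb6298a: the `_REGISTRY` dict, its alias layout, the example project names, and the simplified `get_state` signature are all hypothetical. Only the name `resolve_project_name` and its documented behavior (canonical id for registered names, input unchanged for empty or unregistered names) come from the commit message. Matching is exact and case-sensitive here, which is also an assumption.

```python
from typing import Optional

# Hypothetical in-memory registry: canonical project_id -> known aliases.
# The real registry layout is not shown in this commit; this is an assumption.
_REGISTRY: dict[str, list[str]] = {
    "p01-example": ["example", "ex"],
}


def resolve_project_name(name: Optional[str]) -> Optional[str]:
    """Return the canonical project_id for a registered name or alias.

    Empty and unregistered names are returned unchanged, which is the
    backwards-compat path for state that predates the registry.
    """
    if not name:
        return name
    for project_id, aliases in _REGISTRY.items():
        if name == project_id or name in aliases:
            return project_id
    return name


def get_state(project: Optional[str], store: dict[str, dict]) -> dict:
    """Hypothetical service entry point that follows the contract."""
    # Input validation would go here.
    project = resolve_project_name(project)  # first statement after validation
    return store.get(project or "", {})
```

With this shape, a lookup by alias and a lookup by canonical id land on the same stored state, which is the property the contract is meant to guarantee.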

master-plan-status.md updated
-----------------------------
- New doc added to the engineering-layer planning sprint listing
- Marked as required reading before V1 implementation begins
- Note that V1 must apply the contract at every new service-layer
  entry point

Pure doc work, no code changes. Full suite stays at 174 passing
because no source changed.

2026-04-07 19:32:31 -04:00

.claude/commands

fix(P1+P2): alias-aware project state lookup + slash command corpus fallback

2026-04-07 07:47:03 -04:00

config

Expand active project wave and serialize refreshes

2026-04-06 14:58:14 -04:00

deploy/dalidou

Add project registration endpoint

2026-04-06 09:52:19 -04:00

docs

docs(arch): project-identity-canonicalization contract

2026-04-07 19:32:31 -04:00

scripts

Merge origin/main into codex/dalidou-storage-foundation

2026-04-07 06:20:19 -04:00

src/atocore

fix(P1+P2): canonicalize project names at every trust boundary

2026-04-07 08:29:33 -04:00

tests

fix(P1+P2): canonicalize project names at every trust boundary

2026-04-07 08:29:33 -04:00

.dockerignore

Add Dalidou storage foundation and deployment prep

2026-04-05 18:33:52 -04:00

.env.example

Harden runtime and add backup foundation

2026-04-06 10:15:00 -04:00

.gitignore

slash command for daily AtoCore use + backup-restore procedure

2026-04-07 06:46:50 -04:00

AGENTS.md

Add Dalidou storage foundation and deployment prep

2026-04-05 18:33:52 -04:00

Dockerfile

Ship project registry config in image

2026-04-06 08:10:05 -04:00

pyproject.toml

Stabilize core correctness and sync project plan state

2026-04-05 17:53:23 -04:00

README.md

Merge origin/main into codex/dalidou-storage-foundation

2026-04-07 06:20:19 -04:00

requirements-dev.txt

feat: implement AtoCore Phase 0 + Phase 0.5 (foundation + PoC)

2026-04-05 09:21:27 -04:00

requirements.txt

feat: implement AtoCore Phase 0 + Phase 0.5 (foundation + PoC)

2026-04-05 09:21:27 -04:00

README.md

AtoCore

Personal context engine that enriches LLM interactions with durable memory, structured context, and project knowledge.

Quick Start

pip install -e .
uvicorn src.atocore.main:app --port 8100

Usage

# Ingest markdown files
curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/notes"}'

# Build enriched context for a prompt
curl -X POST http://localhost:8100/context/build \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the project status?", "project": "myproject"}'

# CLI ingestion
python scripts/ingest_folder.py --path /path/to/notes

# Live operator client
python scripts/atocore_client.py health
python scripts/atocore_client.py audit-query "gigabit" 5
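
The `/context/build` call above can also be made from Python using only the standard library. This is a minimal sketch, not part of the shipped client: the function names are hypothetical, and it assumes the endpoint returns a JSON body, which the README does not state explicitly.

```python
import json
import urllib.request


def build_context_request(prompt, project, base_url="http://localhost:8100"):
    """Build (but do not send) the POST request for /context/build."""
    payload = json.dumps({"prompt": prompt, "project": project}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/context/build",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_context(prompt, project, base_url="http://localhost:8100", timeout=10.0):
    """Send the request and decode the response, assuming a JSON body."""
    request = build_context_request(prompt, project, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `build_context("What is the project status?", "myproject")` against a running server mirrors the second curl example.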

API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | /ingest | Ingest markdown file or folder |
| POST | /query | Retrieve relevant chunks |
| POST | /context/build | Build full context pack |
| GET | /health | Health check |
| GET | /debug/context | Inspect last context pack |

Architecture

FastAPI (port 8100)
  |- Ingestion: markdown -> parse -> chunk -> embed -> store
  |- Retrieval: query -> embed -> vector search -> rank
  |- Context Builder: retrieve -> boost -> budget -> format
  |- SQLite (documents, chunks, memories, projects, interactions)
  '- ChromaDB (vector embeddings)
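
To make the "budget" stage of the Context Builder concrete, here is one way a character budget could be applied to already-ranked chunks. This is a sketch under assumptions, not AtoCore's actual builder: the function name `pack_chunks` and the greedy skip-and-continue policy are hypothetical, and separator characters added during formatting are not counted.

```python
def pack_chunks(ranked_chunks, budget=3000):
    """Keep chunks in rank order while their total length fits the budget.

    A chunk that would overflow the budget is skipped rather than ending
    the loop, so a shorter, lower-ranked chunk can still be included.
    """
    selected = []
    used = 0
    for chunk in ranked_chunks:
        if used + len(chunk) > budget:
            continue
        selected.append(chunk)
        used += len(chunk)
    return selected
```

The default of 3000 matches the documented `ATOCORE_CONTEXT_BUDGET` default; whether the real builder skips or truncates oversized chunks is not stated in this README.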

Configuration

Set via environment variables (prefix ATOCORE_):

| Variable | Default | Description |
|----------|---------|-------------|
| ATOCORE_DEBUG | false | Enable debug logging |
| ATOCORE_PORT | 8100 | Server port |
| ATOCORE_CHUNK_MAX_SIZE | 800 | Max chunk size (chars) |
| ATOCORE_CONTEXT_BUDGET | 3000 | Context pack budget (chars) |
| ATOCORE_EMBEDDING_MODEL | paraphrase-multilingual-MiniLM-L12-v2 | Embedding model |
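
As an illustration of how these variables and defaults fit together, the sketch below reads them from the environment with the standard library. It is not AtoCore's actual config loader: the `Settings` class, the `load_settings` function, and the set of strings treated as true for `ATOCORE_DEBUG` are assumptions. Only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    debug: bool = False
    port: int = 8100
    chunk_max_size: int = 800   # characters
    context_budget: int = 3000  # characters
    embedding_model: str = "paraphrase-multilingual-MiniLM-L12-v2"


def load_settings(env=None):
    """Read ATOCORE_* variables from a mapping, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    # Assumption: these strings count as "true"; anything else is false.
    debug_raw = env.get("ATOCORE_DEBUG", "false").strip().lower()
    return Settings(
        debug=debug_raw in {"1", "true", "yes"},
        port=int(env.get("ATOCORE_PORT", defaults.port)),
        chunk_max_size=int(env.get("ATOCORE_CHUNK_MAX_SIZE", defaults.chunk_max_size)),
        context_budget=int(env.get("ATOCORE_CONTEXT_BUDGET", defaults.context_budget)),
        embedding_model=env.get("ATOCORE_EMBEDDING_MODEL", defaults.embedding_model),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests, for example `load_settings({"ATOCORE_PORT": "9000"})`.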

Testing

pip install -e ".[dev]"
pytest

Operations

scripts/atocore_client.py provides a live API client for project refresh, project-state inspection, and retrieval-quality audits.
docs/operations.md captures the current operational priority order: retrieval quality, Wave 2 trusted-operational ingestion, AtoDrive scoping, and restore validation.

Architecture Notes

Implementation-facing architecture notes live under docs/architecture/.

Current additions:

docs/architecture/engineering-knowledge-hybrid-architecture.md — 5-layer hybrid model
docs/architecture/engineering-ontology-v1.md — V1 object and relationship inventory
docs/architecture/engineering-query-catalog.md — 20 v1-required queries
docs/architecture/memory-vs-entities.md — canonical home split
docs/architecture/promotion-rules.md — Layer 0 to Layer 2 pipeline
docs/architecture/conflict-model.md — contradictory facts detection and resolution