Anto01 877b97ec78 feat: Phase 7C — tag canonicalization (autonomous, weekly)

LLM proposes alias→canonical mappings for domain_tags; confidence >= 0.8
auto-apply, below goes to human triage. Protects project identifiers
(p04, p05, p06, atocore, apm, etc.) from ever being canonicalized
since they're their own namespace, not concepts.

Problem solved: tag drift fragments retrieval. "fw" vs "firmware" vs
"firmware-control" all mean the same thing, but cross-cutting queries
that filter by tag only hit one variant. Weekly canonicalization pass
keeps the tag graph clean.

- Schema: tag_aliases table (pending | approved | rejected)
- atocore.memory._tag_canon_prompt (stdlib-only, protected project tokens)
- service: get_tag_distribution, apply_tag_alias (atomic per-memory,
  dedupes if both alias + canonical present), create / approve / reject
  proposal lifecycle, per-memory audit rows with action="tag_canonicalized"
- scripts/canonicalize_tags.py: host-side detector, autonomous by default,
  --no-auto-approve kill switch
- 6 API endpoints under /admin/tags/* (distribution, list, propose,
  apply, approve/{id}, reject/{id})
- Step B4 in batch-extract.sh (Sundays only — weekly cadence)
- 26 new tests (prompt parser, normalizer protections, distribution
  counting, rewrite atomicity, dedup, audit, lifecycle). 414 → 440.
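
The dedup behavior listed for apply_tag_alias above can be sketched on a single memory's tag list. This is a minimal illustration, not the project's implementation: `rewrite_tags` is a hypothetical helper, and it covers only the in-memory rewrite, not the per-memory transaction or the audit row.

```python
def rewrite_tags(tags, alias, canonical):
    """Replace `alias` with `canonical` in a tag list, dropping duplicates.

    Hypothetical helper for illustration only. Order of first appearance
    is preserved, so a memory tagged with both the alias and the canonical
    form ends up with a single canonical tag.
    """
    out = []
    seen = set()
    for tag in tags:
        new_tag = canonical if tag == alias else tag
        if new_tag not in seen:
            seen.add(new_tag)
            out.append(new_tag)
    return out


if __name__ == "__main__":
    # Both variants present: the result keeps one canonical tag.
    print(rewrite_tags(["fw", "firmware", "p04"], "fw", "firmware"))
```

Tags that do not match the alias, such as the project identifier p04 here, pass through unchanged.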

Design: aggressive protection of project tokens because a false
canonicalization (p04 → p04-gigabit, or vice versa) would scramble
cross-project filtering. Err toward preservation; the alias only
applies if the model is very confident AND both strings appear in
the current distribution.
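
The gating rule described above (protected project tokens, a 0.8 confidence threshold, and both strings present in the current distribution) might look roughly like the sketch below. The function name, the return labels, the exact-match protection check, and the choice to skip proposals whose strings are missing from the distribution are all assumptions made for illustration; the protected set lists only the tokens named in this message, which ends its own list with "etc."

```python
# Tokens named in the commit message; the real list is longer ("etc.").
PROTECTED_TOKENS = frozenset({"p04", "p05", "p06", "atocore", "apm"})
AUTO_APPLY_THRESHOLD = 0.8


def triage_proposal(alias, canonical, confidence, distribution,
                    protected=PROTECTED_TOKENS):
    """Classify one alias -> canonical proposal (hypothetical sketch).

    `distribution` maps each tag to the number of memories using it.
    Returns "skip", "auto_apply", or "triage".
    """
    # Project identifiers are never rewritten, in either direction.
    if alias in protected or canonical in protected:
        return "skip"
    # A mapping onto itself is a no-op.
    if alias == canonical:
        return "skip"
    # Both strings must exist in the current tag distribution.
    if alias not in distribution or canonical not in distribution:
        return "skip"
    # Confident proposals apply automatically; the rest go to a human.
    if confidence >= AUTO_APPLY_THRESHOLD:
        return "auto_apply"
    return "triage"


if __name__ == "__main__":
    dist = {"fw": 3, "firmware": 12, "p04": 40, "p04-gigabit": 2}
    print(triage_proposal("fw", "firmware", 0.93, dist))
    print(triage_proposal("p04", "p04-gigabit", 0.99, dist))
```

Checking protection before confidence means a highly confident proposal that touches p04 is still rejected, which matches the "err toward preservation" intent.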

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-19 09:41:02 -04:00

Path	Last commit	Date
.claude/commands	fix(P1+P2): alias-aware project state lookup + slash command corpus fallback	2026-04-07 07:47:03 -04:00
config	feat(capture): wire project inference from cwd	2026-04-16 09:01:38 -04:00
deploy	feat: Phase 7C — tag canonicalization (autonomous, weekly)	2026-04-19 09:41:02 -04:00
docs	feat: Phase 7A — semantic memory dedup ("sleep cycle" V1)	2026-04-18 10:30:49 -04:00
openclaw-plugins/atocore-capture	fix: OpenClaw plugin filters cron-initiated agent runs	2026-04-17 11:09:44 -04:00
scripts	feat: Phase 7C — tag canonicalization (autonomous, weekly)	2026-04-19 09:41:02 -04:00
src/atocore	feat: Phase 7C — tag canonicalization (autonomous, weekly)	2026-04-19 09:41:02 -04:00
t420-openclaw	Add AtoCore integration tooling and operations guide	2026-04-06 19:28:09 -04:00
tests	feat: Phase 7C — tag canonicalization (autonomous, weekly)	2026-04-19 09:41:02 -04:00
.dockerignore	Add Dalidou storage foundation and deployment prep	2026-04-05 18:33:52 -04:00
.env.example	Harden runtime and add backup foundation	2026-04-06 10:15:00 -04:00
.gitignore	feat: universal LLM consumption (Phase 1 complete)	2026-04-16 20:14:25 -04:00
AGENTS.md	feat: DEV-LEDGER.md as shared operating memory + session protocol	2026-04-11 14:46:21 -04:00
CLAUDE.md	feat: DEV-LEDGER.md as shared operating memory + session protocol	2026-04-11 14:46:21 -04:00
DEV-LEDGER.md	feat: Phase 7A — semantic memory dedup ("sleep cycle" V1)	2026-04-18 10:30:49 -04:00
Dockerfile	Ship project registry config in image	2026-04-06 08:10:05 -04:00
pyproject.toml	fix: add markdown to pyproject.toml (container pip install reads this, not requirements.txt)	2026-04-13 15:37:22 -04:00
README.md	Merge origin/main into codex/dalidou-storage-foundation	2026-04-07 06:20:19 -04:00
requirements-dev.txt	feat: implement AtoCore Phase 0 + Phase 0.5 (foundation + PoC)	2026-04-05 09:21:27 -04:00
requirements.txt	feat: HTML mirror pages — readable project dashboards in browser	2026-04-13 15:31:03 -04:00

README.md

AtoCore

Personal context engine that enriches LLM interactions with durable memory, structured context, and project knowledge.

Quick Start

pip install -e .
uvicorn src.atocore.main:app --port 8100

Usage

# Ingest markdown files
curl -X POST http://localhost:8100/ingest \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/notes"}'

# Build enriched context for a prompt
curl -X POST http://localhost:8100/context/build \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the project status?", "project": "myproject"}'

# CLI ingestion
python scripts/ingest_folder.py --path /path/to/notes

# Live operator client
python scripts/atocore_client.py health
python scripts/atocore_client.py audit-query "gigabit" 5
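
The same /context/build call can be made from Python with only the standard library. This is a sketch under assumptions: the request body mirrors the curl example above, but the helper names are hypothetical and the response is assumed to be JSON, which this README does not state.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8100"  # port from the Quick Start command


def make_context_request(prompt, project, base_url=BASE_URL):
    """Build (but do not send) a POST request for /context/build."""
    body = json.dumps({"prompt": prompt, "project": project}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/context/build",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_context(prompt, project, base_url=BASE_URL, timeout=30):
    """Send the request and decode the reply.

    Assumes the server answers with a JSON document; adjust the decoding
    if the actual response format differs.
    """
    req = make_context_request(prompt, project, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running server, as started in Quick Start.
    print(build_context("What is the project status?", "myproject"))
```

Splitting request construction from sending keeps the payload easy to inspect without a live server.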

API Endpoints

Method	Path	Description
POST	/ingest	Ingest markdown file or folder
POST	/query	Retrieve relevant chunks
POST	/context/build	Build full context pack
GET	/health	Health check
GET	/debug/context	Inspect last context pack

Architecture

FastAPI (port 8100)
  |- Ingestion: markdown -> parse -> chunk -> embed -> store
  |- Retrieval: query -> embed -> vector search -> rank
  |- Context Builder: retrieve -> boost -> budget -> format
  |- SQLite (documents, chunks, memories, projects, interactions)
  '- ChromaDB (vector embeddings)
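
To make the Context Builder stage concrete, here is a toy version of boost -> budget -> format operating on already-retrieved chunks. Everything in it is an assumption for illustration: the `Chunk` fields, the project-match boost rule and its factor, the greedy budgeting, and the output layout are not taken from the AtoCore source. Only the character-based budget reflects the documented ATOCORE_CONTEXT_BUDGET setting.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float   # similarity score from the retrieval stage
    project: str


def build_context_pack(chunks, project, budget_chars=3000, boost=1.5):
    """Toy boost -> budget -> format pipeline (hypothetical rules)."""
    # Boost: up-weight chunks that belong to the active project.
    ranked = sorted(
        chunks,
        key=lambda c: c.score * (boost if c.project == project else 1.0),
        reverse=True,
    )
    # Budget: greedily keep entries while the pack stays within the limit.
    selected = []
    used = 0
    for chunk in ranked:
        entry = f"[{chunk.project}] {chunk.text}"
        cost = len(entry) + (1 if selected else 0)  # +1 for the newline
        if used + cost > budget_chars:
            continue
        selected.append(entry)
        used += cost
    # Format: one entry per line, highest-ranked first.
    return "\n".join(selected)


if __name__ == "__main__":
    chunks = [
        Chunk("Link trained at 1 Gb/s on bench unit.", 0.9, "other"),
        Chunk("Status: firmware bring-up in progress.", 0.7, "myproject"),
    ]
    print(build_context_pack(chunks, "myproject"))
```

Because skipped entries do not stop the loop, a smaller low-ranked chunk can still fill space that a larger one could not.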

Configuration

Settings are read from environment variables prefixed with ATOCORE_:

Variable	Default	Description
ATOCORE_DEBUG	false	Enable debug logging
ATOCORE_PORT	8100	Server port
ATOCORE_CHUNK_MAX_SIZE	800	Max chunk size (chars)
ATOCORE_CONTEXT_BUDGET	3000	Context pack budget (chars)
ATOCORE_EMBEDDING_MODEL	paraphrase-multilingual-MiniLM-L12-v2	Embedding model
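
One way to read these variables with their documented defaults is sketched below. The `Settings` class and `load_settings` function are hypothetical, not the project's actual config module, and the set of strings accepted as "true" for ATOCORE_DEBUG is an assumption.

```python
import os
from dataclasses import dataclass

PREFIX = "ATOCORE_"
TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings, not documented


@dataclass(frozen=True)
class Settings:
    debug: bool = False
    port: int = 8100
    chunk_max_size: int = 800
    context_budget: int = 3000
    embedding_model: str = "paraphrase-multilingual-MiniLM-L12-v2"


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    Defaults come from the table above; `env` defaults to os.environ and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    defaults = Settings()

    def get(name, default):
        return env.get(PREFIX + name, default)

    return Settings(
        debug=str(get("DEBUG", defaults.debug)).strip().lower() in TRUTHY,
        port=int(get("PORT", defaults.port)),
        chunk_max_size=int(get("CHUNK_MAX_SIZE", defaults.chunk_max_size)),
        context_budget=int(get("CONTEXT_BUDGET", defaults.context_budget)),
        embedding_model=str(get("EMBEDDING_MODEL", defaults.embedding_model)),
    )


if __name__ == "__main__":
    print(load_settings({"ATOCORE_PORT": "9000", "ATOCORE_DEBUG": "true"}))
```

Passing the environment as a mapping keeps the loader free of global state, so overrides can be checked without touching the real process environment.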

Testing

pip install -e ".[dev]"
pytest

Operations

scripts/atocore_client.py provides a live API client for project refresh, project-state inspection, and retrieval-quality audits.
docs/operations.md captures the current operational priority order: retrieval quality, Wave 2 trusted-operational ingestion, AtoDrive scoping, and restore validation.

Architecture Notes

Implementation-facing architecture notes live under docs/architecture/.

Current additions:

docs/architecture/engineering-knowledge-hybrid-architecture.md — 5-layer hybrid model
docs/architecture/engineering-ontology-v1.md — V1 object and relationship inventory
docs/architecture/engineering-query-catalog.md — 20 v1-required queries
docs/architecture/memory-vs-entities.md — canonical home split
docs/architecture/promotion-rules.md — Layer 0 to Layer 2 pipeline
docs/architecture/conflict-model.md — contradictory facts detection and resolution