ATOCore/docs/current-state.md
Anto01 480f13a6df docs(arch): memory-vs-entities, promotion-rules, conflict-model
Three planning docs that answer the architectural questions the
engineering query catalog raised. Together with the catalog they
form roughly half of the pre-implementation planning sprint.

docs/architecture/memory-vs-entities.md
---------------------------------------
Resolves the central question blocking every other engineering
layer doc: is a Decision a memory or an entity?

Key decisions:
- memories stay the canonical home for identity, preference, and
  episodic facts
- entities become the canonical home for project, knowledge, and
  adaptation facts once the engineering layer V1 ships
- no concept lives in both layers at full fidelity; one canonical
  home per concept
- a "graduation" flow lets active memories upgrade into entities
  (memory stays as a frozen historical pointer, never deleted)
- one shared candidate review queue across both layers
- context builder budget gains a 15% slot for engineering entities,
  slotted between identity/preference memories and retrieved chunks
- the Phase 9 memory extractor's structural cues (decision heading,
  constraint heading, requirement heading) are explicitly an
  intentional temporary overlap, cleanly migrated via graduation
  when the entity extractor ships

docs/architecture/promotion-rules.md
------------------------------------
Defines the full Layer 0 → Layer 2 pipeline:

- four layers: L0 raw source, L1 memory candidate/active, L2 entity
  candidate/active, L3 trusted project state
- three extraction triggers: on interaction capture (existing),
  on ingestion wave (new, batched per wave), on explicit request
- per-rule prior confidence tuned at write time by structural
  signal (echoes the retriever's high/low signal hints) and
  freshness bonus
- batch cap of 50 candidates per pass to protect the reviewer
- full provenance requirements: every candidate carries rule id,
  source_chunk_id, source_interaction_id, and extractor_version
- reversibility matrix for every promotion step
- explicit no-auto-promotion-in-V1 stance with the schema designed
  so auto-promotion policies can be added later without migration
- the hard invariant: nothing ever moves into L3 automatically
- ingestion-wave extraction produces a report artifact under
  data/extraction-reports/<wave-id>/

docs/architecture/conflict-model.md
-----------------------------------
Defines how AtoCore handles contradictory facts without violating
the "bad memory is worse than no memory" rule.

- conflict = two or more active rows claiming the same slot with
  incompatible values
- per-type "slot key" tuples for both memory and entity types
- cross-layer conflict detection respects the trust hierarchy:
  trusted project state > active entities > active memories
- new conflicts and conflict_members tables (schema proposal)
- detection at two latencies: synchronous at write time,
  asynchronous nightly sweep
- "flag, never block" rule: writes always succeed, conflicts are
  surfaced via /conflicts, /health open_conflicts_count, per-row
  response bodies, and the Human Mirror's disputed marker
- resolution is always human: promote-winner + supersede-others,
  or dismiss-as-not-a-real-conflict, both with audit trail
- explicitly out of scope for V1: cross-project conflicts,
  temporal-overlap conflicts, tolerance-aware numeric comparisons

Also updates:
- master-plan-status.md: Phase 9 moved from "started" to "baseline
  complete" now that Commits A, B, C are all landed
- master-plan-status.md: adds an "Engineering Layer Planning Sprint"
  section listing the doc wave so far and the remaining docs
  (tool-handoff-boundaries, human-mirror-rules,
  representation-authority, engineering-v1-acceptance)
- current-state.md: Phase 9 moved from "not started" to "baseline
  complete" with the A/B/C annotation

This is pure doc work. No code changes, no schema changes, no
behavior changes. Per the working rule in master-plan-status.md:
the architecture docs shape decisions, they do not force premature
schema work.
2026-04-06 21:30:35 -04:00

AtoCore Current State

Status Summary

AtoCore is no longer just a proof of concept. The local engine exists, the correctness pass is complete, Dalidou now hosts the canonical runtime and machine-storage location, and the T420/OpenClaw side now has a safe read-only path to consume AtoCore. The live corpus is no longer just self-knowledge: it now includes a first curated ingestion batch for the active projects.

Phase Assessment

  • completed
    • Phase 0
    • Phase 0.5
    • Phase 1
  • baseline complete
    • Phase 2
    • Phase 3
    • Phase 5
    • Phase 7
    • Phase 9 (Commits A/B/C: capture, reinforcement, extractor + review queue)
  • partial
    • Phase 4
    • Phase 8
  • not started
    • Phase 6
    • Phase 10
    • Phase 11
    • Phase 12
    • Phase 13

What Exists Today

  • ingestion pipeline
  • parser and chunker
  • SQLite-backed memory and project state
  • vector retrieval
  • context builder
  • API routes for query, context, health, and source status
  • project registry and per-project refresh foundation
  • project registration lifecycle:
    • template
    • proposal preview
    • approved registration
    • safe update of existing project registrations
    • refresh
  • implementation-facing architecture notes for:
    • engineering knowledge hybrid architecture
    • engineering ontology v1
  • env-driven storage and deployment paths
  • Dalidou Docker deployment foundation
  • initial AtoCore self-knowledge corpus ingested on Dalidou
  • T420/OpenClaw read-only AtoCore helper skill
  • full active-project markdown/text corpus wave for:
    • p04-gigabit
    • p05-interferometer
    • p06-polisher

What Is True On Dalidou

  • deployed repo location:
    • /srv/storage/atocore/app
  • canonical machine DB location:
    • /srv/storage/atocore/data/db/atocore.db
  • canonical vector store location:
    • /srv/storage/atocore/data/chroma
  • source input locations:
    • /srv/storage/atocore/sources/vault
    • /srv/storage/atocore/sources/drive

The service and storage foundation are live on Dalidou.

The machine-data host is real and canonical.

The project registry is now also persisted in a canonical mounted config path on Dalidou:

  • /srv/storage/atocore/config/project-registry.json
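
The Dalidou layout above can be expressed as a set of paths derived from one base directory. The following is a minimal sketch of env-driven path resolution, not the actual AtoCore configuration code: the variable name `ATOCORE_BASE_DIR` and the `StoragePaths` class are hypothetical, and only the directory layout itself comes from this document.

```python
import os
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class StoragePaths:
    """Canonical locations, mirroring the Dalidou layout listed above."""
    app: PurePosixPath
    db: PurePosixPath
    chroma: PurePosixPath
    vault: PurePosixPath
    drive: PurePosixPath
    registry: PurePosixPath


def resolve_paths(env=None) -> StoragePaths:
    """Derive every storage path from a single base directory.

    ATOCORE_BASE_DIR is a hypothetical variable name used for illustration;
    the default matches the Dalidou deployment described in this document.
    """
    env = os.environ if env is None else env
    base = PurePosixPath(env.get("ATOCORE_BASE_DIR", "/srv/storage/atocore"))
    return StoragePaths(
        app=base / "app",
        db=base / "data" / "db" / "atocore.db",
        chroma=base / "data" / "chroma",
        vault=base / "sources" / "vault",
        drive=base / "sources" / "drive",
        registry=base / "config" / "project-registry.json",
    )
```

Passing an explicit mapping (for example `resolve_paths({"ATOCORE_BASE_DIR": "/tmp/atocore"})`) keeps the same layout under a different root, which is one way a local development checkout and the Dalidou deployment could share code.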

The content corpus is partially populated now.

The Dalidou instance already contains:

  • AtoCore ecosystem and hosting docs
  • current-state and OpenClaw integration docs
  • Master Plan V3
  • Build Spec V1
  • trusted project-state entries for atocore
  • full staged project markdown/text corpora for:
    • p04-gigabit
    • p05-interferometer
    • p06-polisher
  • curated repo-context docs for:
    • p05: Fullum-Interferometer
    • p06: polisher-sim
  • trusted project-state entries for:
    • p04-gigabit
    • p05-interferometer
    • p06-polisher

After the full active-project wave, the live stats are well beyond the initial seed stage:

  • more than 1,100 source documents
  • more than 20,000 chunks
  • matching vector count

The broader long-term corpus is not fully populated yet. Wider project and vault ingestion remains a deliberate next step, not completed work, but the corpus is now meaningfully seeded beyond AtoCore's own docs.

For human-readable quality review, the current staged project markdown corpus is primarily visible under:

  • /srv/storage/atocore/sources/vault/incoming/projects

This staged area is now useful for review because it contains the markdown/text project docs that were actually ingested for the full active-project wave.

It is important to read this staged area correctly:

  • it is a readable ingestion input layer
  • it is not the final machine-memory representation itself
  • seeing familiar PKM-style notes there is expected
  • the machine-processed intelligence lives in the DB, chunks, vectors, memory, trusted project state, and context-builder outputs

What Is True On The T420

  • SSH access is working
  • OpenClaw workspace inspected at /home/papa/clawd
  • OpenClaw's own memory system remains unchanged
  • a read-only AtoCore integration skill exists in the workspace:
    • /home/papa/clawd/skills/atocore-context/
  • the T420 can successfully reach Dalidou AtoCore over network/Tailscale
  • fail-open behavior has been verified for the helper path
  • OpenClaw can now seed AtoCore in two distinct ways:
    • project-scoped memory entries
    • staged document ingestion into the retrieval corpus
  • the helper now supports the practical registered-project lifecycle:
    • projects
    • project-template
    • propose-project
    • register-project
    • update-project
    • refresh-project
  • the helper now also supports the first organic routing layer:
    • detect-project "<prompt>"
    • auto-context "<prompt>" [budget] [project]
  • OpenClaw can now default to AtoCore for project-knowledge questions without requiring explicit helper commands from the human every time
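
The fail-open behavior noted above means that an unreachable or misbehaving AtoCore must never break an OpenClaw turn. The following is a minimal sketch of that pattern, not the helper's actual code: the request and response shapes for `context/build` (a JSON body with a `prompt` field, a JSON reply with a `context` field) are assumptions, as are the function names.

```python
import json
import urllib.request


def http_fetch(base_url: str, prompt: str, timeout: float) -> str:
    """POST the prompt to AtoCore and return the context text.

    The payload and response fields here are assumed for illustration.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/context/build",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    if not isinstance(payload, dict):
        raise ValueError("unexpected response shape")
    return str(payload.get("context", ""))


def auto_context(prompt: str, base_url: str, fetch=http_fetch,
                 timeout: float = 2.0) -> str:
    """Return AtoCore context, or an empty string on any failure.

    Network errors, timeouts, and malformed replies all degrade to
    "no extra context" so the caller proceeds with its baseline behavior.
    """
    try:
        return fetch(base_url, prompt, timeout)
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # JSON decoding errors are ValueError subclasses.
        return ""
```

The `fetch` parameter is injectable so the fail-open path can be exercised without a network, for example by passing a function that raises `ConnectionRefusedError`.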

What Exists In Memory vs Corpus

These remain separate and that is intentional.

In /memory:

  • project-scoped curated memories now exist for:
    • p04-gigabit: 5 memories
    • p05-interferometer: 6 memories
    • p06-polisher: 8 memories

These are curated summaries and extracted stable project signals.

In source_documents / retrieval corpus:

  • full project markdown/text corpora are now present for the active project set
  • retrieval is no longer limited to AtoCore self-knowledge
  • the current corpus is broad enough that ranking quality matters more than corpus presence alone
  • underspecified prompts can still pull in historical or archive material, so project-aware routing and better ranking remain important

The source refresh model now has a concrete foundation in code:

  • a project registry file defines known project ids, aliases, and ingest roots
  • the API can list registered projects
  • the API can return a registration template
  • the API can preview a registration without mutating state
  • the API can persist an approved registration
  • the API can update an existing registered project without changing its canonical id
  • the API can refresh one registered project at a time

This lifecycle is now coherent end to end for normal use.
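
The lifecycle above (preview without mutation, approved registration, update that preserves the canonical id) can be sketched in memory as follows. This is an illustration under stated assumptions, not the AtoCore implementation: the `Project` and `Registry` classes, their field names, and the collision rule are hypothetical; only the id/aliases/ingest-roots structure and the lifecycle steps come from this document.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Project:
    id: str
    aliases: tuple = ()
    ingest_roots: tuple = ()
    description: str = ""


class Registry:
    def __init__(self):
        self._projects = {}

    def project_ids(self):
        """List registered project ids."""
        return sorted(self._projects)

    def preview(self, project: Project) -> dict:
        """Report what registering would do, without mutating state."""
        taken = {name for p in self._projects.values()
                 for name in (p.id, *p.aliases)}
        collisions = sorted({project.id, *project.aliases} & taken)
        return {"project": project, "collisions": collisions,
                "valid": not collisions}

    def register(self, project: Project) -> Project:
        """Persist a registration that passed preview."""
        result = self.preview(project)
        if not result["valid"]:
            raise ValueError(f"name collision: {result['collisions']}")
        self._projects[project.id] = project
        return project

    def update(self, project_id: str, **changes) -> Project:
        """Update an existing project; the canonical id is immutable."""
        if "id" in changes:
            raise ValueError("canonical project id cannot be changed")
        updated = replace(self._projects[project_id], **changes)
        self._projects[project_id] = updated
        return updated
```

Keeping `preview` free of side effects is what makes the proposal step safe to run repeatedly before a human approves the registration.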

The first live update passes on existing registered projects have now been verified against p04-gigabit and p05-interferometer:

  • the registration description can be updated safely
  • the canonical project id remains unchanged
  • refresh still behaves cleanly after the update
  • context/build still returns useful project-specific context afterward

Reliability Baseline

The runtime has now been hardened in a few practical ways:

  • SQLite connections use a configurable busy timeout
  • SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
  • project registry writes are atomic file replacements rather than in-place rewrites
  • a first runtime backup path now exists for:
    • SQLite
    • project registry
    • backup metadata

This does not eliminate every concurrency edge, but it materially improves the current operational baseline.
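
The first three hardening measures map onto standard-library calls. The sketch below shows one way to combine them, assuming Python's `sqlite3` module and a JSON registry file; the function names and the default timeout value are illustrative, not taken from the AtoCore code.

```python
import contextlib
import json
import os
import sqlite3
import tempfile


def open_db(path: str, busy_timeout_s: float = 5.0) -> sqlite3.Connection:
    """Open SQLite with a busy timeout and WAL journaling.

    `timeout` makes a writer wait for a lock instead of failing at once;
    WAL mode lets readers proceed while a write is in progress.
    """
    conn = sqlite3.connect(path, timeout=busy_timeout_s)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def write_registry_atomic(path: str, registry: dict) -> None:
    """Replace the registry file atomically rather than rewriting in place.

    The new content goes to a temp file in the same directory, is flushed
    to disk, and is then swapped in with os.replace, so a reader sees
    either the old file or the new one, never a partial write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(registry, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

The temp file is created in the target directory on purpose: `os.replace` is only atomic when source and destination are on the same filesystem.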

In Trusted Project State:

  • each active seeded project now has a conservative trusted-state set
  • promoted facts cover:
    • summary
    • core architecture or boundary decision
    • key constraints
    • next focus

This separation is healthy:

  • memory stores distilled project facts
  • corpus stores the underlying retrievable documents

Immediate Next Focus

  1. Use the new T420-side organic routing layer in real OpenClaw workflows
  2. Tighten retrieval quality for the now fully ingested active project corpora
  3. Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
  4. Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
  5. Expand the boring operations baseline:
    • restore validation
    • Chroma rebuild / backup policy
    • retention
  6. Only later consider write-back, reflection, or deeper autonomous behaviors

Guiding Constraints

  • bad memory is worse than no memory
  • trusted project state must remain highest priority
  • human-readable sources and machine storage stay separate
  • OpenClaw integration must not degrade OpenClaw baseline behavior
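
The priority constraint can be illustrated with a budgeted context assembly in which higher-priority sections consume the budget first. This is a sketch only: the real context builder's budget policy is not described in this document, and the character-based budget, section names, and `build_context` function are assumptions made for illustration.

```python
def build_context(sections, budget_chars):
    """Fill a character budget from sections given in priority order.

    `sections` is a list of (name, text) pairs, highest priority first.
    Earlier sections are included in full before later ones get any
    space, so trusted project state is never crowded out by lower tiers.
    """
    parts = []
    remaining = budget_chars
    for name, text in sections:
        if remaining <= 0:
            break
        piece = text[:remaining]
        if piece:
            parts.append((name, piece))
            remaining -= len(piece)
    return parts
```

For example, with sections ordered as trusted project state, memories, then retrieved chunks, a tight budget truncates or drops the chunks first and only touches trusted state when the budget is smaller than that section alone.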