
Anto01 480f13a6df docs(arch): memory-vs-entities, promotion-rules, conflict-model

Three planning docs that answer the architectural questions the
engineering query catalog raised. Together with the catalog they
form roughly half of the pre-implementation planning sprint.

docs/architecture/memory-vs-entities.md
---------------------------------------
Resolves the central question blocking every other engineering
layer doc: is a Decision a memory or an entity?

Key decisions:
- memories stay the canonical home for identity, preference, and
  episodic facts
- entities become the canonical home for project, knowledge, and
  adaptation facts once the engineering layer V1 ships
- no concept lives in both layers at full fidelity; one canonical
  home per concept
- a "graduation" flow lets active memories upgrade into entities
  (memory stays as a frozen historical pointer, never deleted)
- one shared candidate review queue across both layers
- context builder budget gains a 15% slot for engineering entities,
  slotted between identity/preference memories and retrieved chunks
- the Phase 9 memory extractor's structural cues (decision heading,
  constraint heading, requirement heading) are an intentional,
  temporary overlap; they migrate cleanly via graduation once the
  entity extractor ships

docs/architecture/promotion-rules.md
------------------------------------
Defines the full Layer 0 → Layer 2 pipeline:

- four layers: L0 raw source, L1 memory candidate/active, L2 entity
  candidate/active, L3 trusted project state
- three extraction triggers: on interaction capture (existing),
  on ingestion wave (new, batched per wave), on explicit request
- per-rule prior confidence tuned at write time by structural
  signal (echoes the retriever's high/low signal hints) and
  freshness bonus
- batch cap of 50 candidates per pass to protect the reviewer
- full provenance requirements: every candidate carries rule id,
  source_chunk_id, source_interaction_id, and extractor_version
- reversibility matrix for every promotion step
- explicit no-auto-promotion-in-V1 stance with the schema designed
  so auto-promotion policies can be added later without migration
- the hard invariant: nothing ever moves into L3 automatically
- ingestion-wave extraction produces a report artifact under
  data/extraction-reports/<wave-id>/
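
The provenance and batch-cap rules above can be sketched roughly as follows. This is a minimal illustration, not the schema from promotion-rules.md: the `Candidate` class, the `confidence` and `text` fields, the `cap_batch` helper, and the choice to keep the highest-confidence candidates first are all assumptions. Only the four provenance field names and the cap of 50 come from the text.

```python
from dataclasses import dataclass
from typing import Optional

BATCH_CAP = 50  # per-pass cap stated in the rules above


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate row carrying the four required provenance fields."""
    rule_id: str
    source_chunk_id: Optional[str]        # assumed nullable when no chunk applies
    source_interaction_id: Optional[str]  # assumed nullable for ingestion-wave candidates
    extractor_version: str
    confidence: float  # assumed: per-rule prior after write-time tuning
    text: str


def cap_batch(candidates, cap=BATCH_CAP):
    """Keep at most `cap` candidates, highest confidence first (ordering is an assumption)."""
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    return ranked[:cap]
```

Sorting before truncating means the reviewer sees the strongest candidates when a pass overflows the cap; a real implementation might instead defer the overflow to the next pass.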

docs/architecture/conflict-model.md
-----------------------------------
Defines how AtoCore handles contradictory facts without violating
the "bad memory is worse than no memory" rule.

- conflict = two or more active rows claiming the same slot with
  incompatible values
- per-type "slot key" tuples for both memory and entity types
- cross-layer conflict detection respects the trust hierarchy:
  trusted project state > active entities > active memories
- new conflicts and conflict_members tables (schema proposal)
- detection at two latencies: synchronous at write time,
  asynchronous nightly sweep
- "flag, never block" rule: writes always succeed, conflicts are
  surfaced via /conflicts, /health open_conflicts_count, per-row
  response bodies, and the Human Mirror's disputed marker
- resolution is always human: promote-winner + supersede-others,
  or dismiss-as-not-a-real-conflict, both with audit trail
- explicitly out of scope for V1: cross-project conflicts,
  temporal-overlap conflicts, tolerance-aware numeric comparisons
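
A minimal sketch of the slot-key detection described above, under stated assumptions: rows are plain dicts, "incompatible" is simplified to "not equal", and the layer names in `TRUST_RANK` are illustrative labels for the trust hierarchy. The function name and row shape are hypothetical and do not reflect the proposed conflicts / conflict_members schema.

```python
from collections import defaultdict

# Trust order from the text: trusted project state > active entities > active memories.
# The layer labels themselves are assumptions for this sketch.
TRUST_RANK = {"trusted_state": 3, "entity": 2, "memory": 1}


def find_conflicts(rows, slot_key_fields):
    """Report slots where two or more active rows hold different values.

    Each row is a dict with "layer", "status", "value", plus the slot-key fields.
    Nothing is blocked or modified; the function only reports ("flag, never block").
    """
    by_slot = defaultdict(list)
    for row in rows:
        if row["status"] != "active":
            continue  # only active rows can conflict
        slot = tuple(row[name] for name in slot_key_fields)
        by_slot[slot].append(row)

    conflicts = []
    for slot, members in by_slot.items():
        if len({m["value"] for m in members}) > 1:
            # Most trusted layer first, so a human reviewer sees it at the top.
            ordered = sorted(members, key=lambda m: TRUST_RANK[m["layer"]], reverse=True)
            conflicts.append({"slot": slot, "members": ordered})
    return conflicts
```

The same function could serve both latencies: called on the affected slot at write time, and over all active rows in the nightly sweep.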

Also updates:
- master-plan-status.md: Phase 9 moved from "started" to "baseline
  complete" now that Commits A, B, C are all landed
- master-plan-status.md: adds an "Engineering Layer Planning Sprint"
  section listing the doc wave so far and the remaining docs
  (tool-handoff-boundaries, human-mirror-rules,
  representation-authority, engineering-v1-acceptance)
- current-state.md: Phase 9 moved from "not started" to "baseline
  complete" with the A/B/C annotation

This is pure doc work. No code changes, no schema changes, no
behavior changes. Per the working rule in master-plan-status.md:
the architecture docs shape decisions, they do not force premature
schema work.

2026-04-06 21:30:35 -04:00

8.1 KiB


AtoCore Current State

Status Summary

AtoCore is no longer just a proof of concept. The local engine exists, the correctness pass is complete, Dalidou now hosts the canonical runtime and machine-storage location, and the T420/OpenClaw side now has a safe read-only path to consume AtoCore. The live corpus is no longer just self-knowledge: it now includes a first curated ingestion batch for the active projects.

Phase Assessment

completed
- Phase 0
- Phase 0.5
- Phase 1
baseline complete
- Phase 2
- Phase 3
- Phase 5
- Phase 7
- Phase 9 (Commits A/B/C: capture, reinforcement, extractor + review queue)
partial
- Phase 4
- Phase 8
not started
- Phase 6
- Phase 10
- Phase 11
- Phase 12
- Phase 13

What Exists Today

ingestion pipeline
parser and chunker
SQLite-backed memory and project state
vector retrieval
context builder
API routes for query, context, health, and source status
project registry and per-project refresh foundation
project registration lifecycle:
- template
- proposal preview
- approved registration
- safe update of existing project registrations
- refresh
implementation-facing architecture notes for:
- engineering knowledge hybrid architecture
- engineering ontology v1
env-driven storage and deployment paths
Dalidou Docker deployment foundation
initial AtoCore self-knowledge corpus ingested on Dalidou
T420/OpenClaw read-only AtoCore helper skill
full active-project markdown/text corpus wave for:
- p04-gigabit
- p05-interferometer
- p06-polisher

What Is True On Dalidou

deployed repo location:
- /srv/storage/atocore/app
canonical machine DB location:
- /srv/storage/atocore/data/db/atocore.db
canonical vector store location:
- /srv/storage/atocore/data/chroma
source input locations:
- /srv/storage/atocore/sources/vault
- /srv/storage/atocore/sources/drive

The service and storage foundation are live on Dalidou.

The machine-data host is real and canonical.

The project registry is now also persisted in a canonical mounted config path on Dalidou:

/srv/storage/atocore/config/project-registry.json
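
Since the storage paths are env-driven, the layout above could be derived from a single root along these lines. This is a sketch only: the `ATOCORE_ROOT` variable name, the `StoragePaths` class, and `resolve_paths` are hypothetical; the default paths are the Dalidou locations listed above.

```python
import os
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class StoragePaths:
    db: PurePosixPath
    chroma: PurePosixPath
    registry: PurePosixPath


def resolve_paths(env=None):
    """Derive storage paths from one root directory.

    ATOCORE_ROOT is a hypothetical variable name; the default mirrors Dalidou.
    """
    env = os.environ if env is None else env
    root = PurePosixPath(env.get("ATOCORE_ROOT", "/srv/storage/atocore"))
    return StoragePaths(
        db=root / "data" / "db" / "atocore.db",
        chroma=root / "data" / "chroma",
        registry=root / "config" / "project-registry.json",
    )
```

Passing an explicit `env` mapping keeps the function testable without touching the process environment.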

The content corpus is partially populated now.

The Dalidou instance already contains:

AtoCore ecosystem and hosting docs
current-state and OpenClaw integration docs
Master Plan V3
Build Spec V1
trusted project-state entries for atocore
full staged project markdown/text corpora for:
- p04-gigabit
- p05-interferometer
- p06-polisher
curated repo-context docs for:
- p05: Fullum-Interferometer
- p06: polisher-sim
trusted project-state entries for:
- p04-gigabit
- p05-interferometer
- p06-polisher

After the full active-project wave, live stats are well beyond the initial seed stage:

more than 1,100 source documents
more than 20,000 chunks
matching vector count

The broader long-term corpus is not yet fully populated. Wider project and vault ingestion remains a deliberate next step, but the corpus is now meaningfully seeded beyond AtoCore's own docs.

For human-readable quality review, the current staged project markdown corpus is primarily visible under:

/srv/storage/atocore/sources/vault/incoming/projects

This staged area is now useful for review because it contains the markdown/text project docs that were actually ingested for the full active-project wave.

It is important to read this staged area correctly:

it is a readable ingestion input layer
it is not the final machine-memory representation itself
seeing familiar PKM-style notes there is expected
the machine-processed intelligence lives in the DB, chunks, vectors, memory, trusted project state, and context-builder outputs

What Is True On The T420

SSH access is working
OpenClaw workspace inspected at /home/papa/clawd
OpenClaw's own memory system remains unchanged
a read-only AtoCore integration skill exists in the workspace:
- /home/papa/clawd/skills/atocore-context/
the T420 can successfully reach Dalidou AtoCore over network/Tailscale
fail-open behavior has been verified for the helper path
OpenClaw can now seed AtoCore in two distinct ways:
- project-scoped memory entries
- staged document ingestion into the retrieval corpus
the helper now supports the practical registered-project lifecycle:
- projects
- project-template
- propose-project
- register-project
- update-project
- refresh-project
the helper now also supports the first organic routing layer:
- detect-project "<prompt>"
- auto-context "<prompt>" [budget] [project]
OpenClaw can now default to AtoCore for project-knowledge questions without requiring explicit helper commands from the human every time
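
The fail-open behavior of the helper path can be illustrated with a small sketch. It assumes a JSON POST to a `/context/build` route and a `context` field in the response; the payload shape, the field name, the timeout, and the `fetch_context` function are assumptions for illustration, not the actual helper skill.

```python
import json
import urllib.request


def fetch_context(prompt, base_url, timeout=2.0, opener=urllib.request.urlopen):
    """Return AtoCore context for `prompt`, or "" if anything goes wrong.

    The route, payload, and response field used here are assumed for illustration.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/context/build",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with opener(request, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
            return payload.get("context", "")
    except Exception:
        # Fail open: the caller proceeds without AtoCore context
        # instead of surfacing an error to the OpenClaw session.
        return ""
```

Swallowing every exception is deliberate here: a network failure, a timeout, or a malformed response should all leave OpenClaw's baseline behavior untouched.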

What Exists In Memory vs Corpus

These remain separate and that is intentional.

In /memory:

project-scoped curated memories now exist for:
- p04-gigabit: 5 memories
- p05-interferometer: 6 memories
- p06-polisher: 8 memories

These are curated summaries and extracted stable project signals.

In source_documents / retrieval corpus:

full project markdown/text corpora are now present for the active project set
retrieval is no longer limited to AtoCore self-knowledge only
the current corpus is broad enough that ranking quality matters more than corpus presence alone
underspecified prompts can still pull in historical or archive material, so project-aware routing and better ranking remain important

The source refresh model now has a concrete foundation in code:

a project registry file defines known project ids, aliases, and ingest roots
the API can list registered projects
the API can return a registration template
the API can preview a registration without mutating state
the API can persist an approved registration
the API can update an existing registered project without changing its canonical id
the API can refresh one registered project at a time

This lifecycle is now coherent end to end for normal use.
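
As a rough illustration of the registry model above, the sketch below resolves aliases to a canonical id and updates a registration without allowing the id to change. The `Project` class, its field names, and both helper functions are hypothetical; the real registry file format may differ.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Project:
    """Hypothetical in-memory form of one registry entry."""
    id: str
    aliases: tuple = ()
    ingest_roots: tuple = ()
    description: str = ""


def resolve(registry, name):
    """Map an id or alias (case-insensitive) to the canonical project id, else None."""
    wanted = name.strip().lower()
    for project in registry.values():
        if wanted == project.id.lower() or wanted in (a.lower() for a in project.aliases):
            return project.id
    return None


def update_project(registry, project_id, **changes):
    """Return a new registry with `project_id` updated; the canonical id cannot change."""
    if "id" in changes:
        raise ValueError("canonical project id is immutable")
    updated = dict(registry)
    updated[project_id] = replace(registry[project_id], **changes)
    return updated
```

Returning a new registry rather than mutating in place fits the preview-then-persist flow: a preview can compute the updated registry without changing state.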

The first live update passes on existing registered projects have now been verified against p04-gigabit and p05-interferometer:

the registration description can be updated safely
the canonical project id remains unchanged
refresh still behaves cleanly after the update
context/build still returns useful project-specific context afterward

Reliability Baseline

The runtime has now been hardened in a few practical ways:

SQLite connections use a configurable busy timeout
SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
project registry writes are atomic file replacements rather than in-place rewrites
a first runtime backup path now exists for:
- SQLite
- project registry
- backup metadata

This does not eliminate every concurrency edge case, but it materially improves the current operational baseline.
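
The first three hardening measures can be sketched with the standard library alone. The function names and the 5000 ms default are assumptions; the technique (a WAL journal with a busy timeout, and a write to a temporary file followed by `os.replace`) is the general pattern, not necessarily AtoCore's exact code.

```python
import json
import os
import sqlite3
import tempfile


def open_db(path, busy_timeout_ms=5000):
    """Open SQLite with WAL journaling and a busy timeout (default is an assumption)."""
    conn = sqlite3.connect(path, timeout=busy_timeout_ms / 1000)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


def write_registry_atomic(path, registry):
    """Write JSON to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(registry, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp_path, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temporary file in the target directory matters: `os.replace` is only atomic within one filesystem, so a temp file elsewhere could degrade into a copy.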

In Trusted Project State:

each active seeded project now has a conservative trusted-state set
promoted facts cover:
- summary
- core architecture or boundary decision
- key constraints
- next focus

This separation is healthy:

memory stores distilled project facts
corpus stores the underlying retrievable documents

Immediate Next Focus

Use the new T420-side organic routing layer in real OpenClaw workflows
Tighten retrieval quality for the now fully ingested active project corpora
Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
Expand the boring operations baseline:
- restore validation
- Chroma rebuild / backup policy
- retention
Only later consider write-back, reflection, or deeper autonomous behaviors

Guiding Constraints

bad memory is worse than no memory
trusted project state must remain highest priority
human-readable sources and machine storage stay separate
OpenClaw integration must not degrade OpenClaw baseline behavior
