Three planning docs that answer the architectural questions the engineering query catalog raised. Together with the catalog they form roughly half of the pre-implementation planning sprint. docs/architecture/memory-vs-entities.md --------------------------------------- Resolves the central question blocking every other engineering layer doc: is a Decision a memory or an entity? Key decisions: - memories stay the canonical home for identity, preference, and episodic facts - entities become the canonical home for project, knowledge, and adaptation facts once the engineering layer V1 ships - no concept lives in both layers at full fidelity; one canonical home per concept - a "graduation" flow lets active memories upgrade into entities (memory stays as a frozen historical pointer, never deleted) - one shared candidate review queue across both layers - context builder budget gains a 15% slot for engineering entities, slotted between identity/preference memories and retrieved chunks - the Phase 9 memory extractor's structural cues (decision heading, constraint heading, requirement heading) are explicitly an intentional temporary overlap, cleanly migrated via graduation when the entity extractor ships docs/architecture/promotion-rules.md ------------------------------------ Defines the full Layer 0 → Layer 2 pipeline: - four layers: L0 raw source, L1 memory candidate/active, L2 entity candidate/active, L3 trusted project state - three extraction triggers: on interaction capture (existing), on ingestion wave (new, batched per wave), on explicit request - per-rule prior confidence tuned at write time by structural signal (echoes the retriever's high/low signal hints) and freshness bonus - batch cap of 50 candidates per pass to protect the reviewer - full provenance requirements: every candidate carries rule id, source_chunk_id, source_interaction_id, and extractor_version - reversibility matrix for every promotion step - explicit no-auto-promotion-in-V1 stance with the schema designed so auto-promotion policies can be added later without migration - the hard invariant: nothing ever moves into L3 automatically - ingestion-wave extraction produces a report artifact under data/extraction-reports/<wave-id>/ docs/architecture/conflict-model.md ----------------------------------- Defines how AtoCore handles contradictory facts without violating the "bad memory is worse than no memory" rule. - conflict = two or more active rows claiming the same slot with incompatible values - per-type "slot key" tuples for both memory and entity types - cross-layer conflict detection respects the trust hierarchy: trusted project state > active entities > active memories - new conflicts and conflict_members tables (schema proposal) - detection at two latencies: synchronous at write time, asynchronous nightly sweep - "flag, never block" rule: writes always succeed, conflicts are surfaced via /conflicts, /health open_conflicts_count, per-row response bodies, and the Human Mirror's disputed marker - resolution is always human: promote-winner + supersede-others, or dismiss-as-not-a-real-conflict, both with audit trail - explicitly out of scope for V1: cross-project conflicts, temporal-overlap conflicts, tolerance-aware numeric comparisons Also updates: - master-plan-status.md: Phase 9 moved from "started" to "baseline complete" now that Commits A, B, C are all landed - master-plan-status.md: adds a "Engineering Layer Planning Sprint" section listing the doc wave so far and the remaining docs (tool-handoff-boundaries, human-mirror-rules, representation-authority, engineering-v1-acceptance) - current-state.md: Phase 9 moved from "not started" to "baseline complete" with the A/B/C annotation This is pure doc work. No code changes, no schema changes, no behavior changes. Per the working rule in master-plan-status.md: the architecture docs shape decisions, they do not force premature schema work.
8.1 KiB
AtoCore Current State
Status Summary
AtoCore is no longer just a proof of concept. The local engine exists, the correctness pass is complete, Dalidou now hosts the canonical runtime and machine-storage location, and the T420/OpenClaw side now has a safe read-only path to consume AtoCore. The live corpus is no longer just self-knowledge: it now includes a first curated ingestion batch for the active projects.
Phase Assessment
- completed
- Phase 0
- Phase 0.5
- Phase 1
- baseline complete
- Phase 2
- Phase 3
- Phase 5
- Phase 7
- Phase 9 (Commits A/B/C: capture, reinforcement, extractor + review queue)
- partial
- Phase 4
- Phase 8
- not started
- Phase 6
- Phase 10
- Phase 11
- Phase 12
- Phase 13
What Exists Today
- ingestion pipeline
- parser and chunker
- SQLite-backed memory and project state
- vector retrieval
- context builder
- API routes for query, context, health, and source status
- project registry and per-project refresh foundation
- project registration lifecycle:
- template
- proposal preview
- approved registration
- safe update of existing project registrations
- refresh
- implementation-facing architecture notes for:
- engineering knowledge hybrid architecture
- engineering ontology v1
- env-driven storage and deployment paths
- Dalidou Docker deployment foundation
- initial AtoCore self-knowledge corpus ingested on Dalidou
- T420/OpenClaw read-only AtoCore helper skill
- full active-project markdown/text corpus wave for:
p04-gigabitp05-interferometerp06-polisher
What Is True On Dalidou
- deployed repo location:
/srv/storage/atocore/app
- canonical machine DB location:
/srv/storage/atocore/data/db/atocore.db
- canonical vector store location:
/srv/storage/atocore/data/chroma
- source input locations:
/srv/storage/atocore/sources/vault/srv/storage/atocore/sources/drive
The service and storage foundation are live on Dalidou.
The machine-data host is real and canonical.
The project registry is now also persisted in a canonical mounted config path on Dalidou:
/srv/storage/atocore/config/project-registry.json
The content corpus is partially populated now.
The Dalidou instance already contains:
- AtoCore ecosystem and hosting docs
- current-state and OpenClaw integration docs
- Master Plan V3
- Build Spec V1
- trusted project-state entries for
atocore - full staged project markdown/text corpora for:
p04-gigabitp05-interferometerp06-polisher
- curated repo-context docs for:
p05:Fullum-Interferometerp06:polisher-sim
- trusted project-state entries for:
p04-gigabitp05-interferometerp06-polisher
Current live stats after the full active-project wave are now far beyond the initial seed stage:
- more than
1,100source documents - more than
20,000chunks - matching vector count
The broader long-term corpus is still not fully populated yet. Wider project and vault ingestion remains a deliberate next step rather than something already completed, but the corpus is now meaningfully seeded beyond AtoCore's own docs.
For human-readable quality review, the current staged project markdown corpus is primarily visible under:
/srv/storage/atocore/sources/vault/incoming/projects
This staged area is now useful for review because it contains the markdown/text project docs that were actually ingested for the full active-project wave.
It is important to read this staged area correctly:
- it is a readable ingestion input layer
- it is not the final machine-memory representation itself
- seeing familiar PKM-style notes there is expected
- the machine-processed intelligence lives in the DB, chunks, vectors, memory, trusted project state, and context-builder outputs
What Is True On The T420
- SSH access is working
- OpenClaw workspace inspected at
/home/papa/clawd - OpenClaw's own memory system remains unchanged
- a read-only AtoCore integration skill exists in the workspace:
/home/papa/clawd/skills/atocore-context/
- the T420 can successfully reach Dalidou AtoCore over network/Tailscale
- fail-open behavior has been verified for the helper path
- OpenClaw can now seed AtoCore in two distinct ways:
- project-scoped memory entries
- staged document ingestion into the retrieval corpus
- the helper now supports the practical registered-project lifecycle:
- projects
- project-template
- propose-project
- register-project
- update-project
- refresh-project
- the helper now also supports the first organic routing layer:
detect-project "<prompt>"auto-context "<prompt>" [budget] [project]
- OpenClaw can now default to AtoCore for project-knowledge questions without requiring explicit helper commands from the human every time
What Exists In Memory vs Corpus
These remain separate and that is intentional.
In /memory:
- project-scoped curated memories now exist for:
p04-gigabit: 5 memoriesp05-interferometer: 6 memoriesp06-polisher: 8 memories
These are curated summaries and extracted stable project signals.
In source_documents / retrieval corpus:
- full project markdown/text corpora are now present for the active project set
- retrieval is no longer limited to AtoCore self-knowledge only
- the current corpus is broad enough that ranking quality matters more than corpus presence alone
- underspecified prompts can still pull in historical or archive material, so project-aware routing and better ranking remain important
The source refresh model now has a concrete foundation in code:
- a project registry file defines known project ids, aliases, and ingest roots
- the API can list registered projects
- the API can return a registration template
- the API can preview a registration without mutating state
- the API can persist an approved registration
- the API can update an existing registered project without changing its canonical id
- the API can refresh one registered project at a time
This lifecycle is now coherent end to end for normal use.
The first live update passes on existing registered projects have now been
verified against p04-gigabit and p05-interferometer:
- the registration description can be updated safely
- the canonical project id remains unchanged
- refresh still behaves cleanly after the update
context/buildstill returns useful project-specific context afterward
Reliability Baseline
The runtime has now been hardened in a few practical ways:
- SQLite connections use a configurable busy timeout
- SQLite uses WAL mode to reduce transient lock pain under normal concurrent use
- project registry writes are atomic file replacements rather than in-place rewrites
- a first runtime backup path now exists for:
- SQLite
- project registry
- backup metadata
This does not eliminate every concurrency edge, but it materially improves the current operational baseline.
In Trusted Project State:
- each active seeded project now has a conservative trusted-state set
- promoted facts cover:
- summary
- core architecture or boundary decision
- key constraints
- next focus
This separation is healthy:
- memory stores distilled project facts
- corpus stores the underlying retrievable documents
Immediate Next Focus
- Use the new T420-side organic routing layer in real OpenClaw workflows
- Tighten retrieval quality for the now fully ingested active project corpora
- Move to Wave 2 trusted-operational ingestion instead of blindly widening raw corpus further
- Keep the new engineering-knowledge architecture docs as implementation guidance while avoiding premature schema work
- Expand the boring operations baseline:
- restore validation
- Chroma rebuild / backup policy
- retention
- Only later consider write-back, reflection, or deeper autonomous behaviors
See also:
Guiding Constraints
- bad memory is worse than no memory
- trusted project state must remain highest priority
- human-readable sources and machine storage stay separate
- OpenClaw integration must not degrade OpenClaw baseline behavior