ATOCore/docs/architecture/engineering-v1-acceptance.md
Anto01 d6ce6128cf docs(arch): human-mirror-rules + engineering-v1-acceptance, sprint complete
Session 4 of the four-session plan. Final two engineering planning
docs, plus master-plan-status.md updated to reflect that the
engineering layer planning sprint is now complete.

docs/architecture/human-mirror-rules.md
---------------------------------------
The Layer 3 derived markdown view spec:

- The non-negotiable rule: the Mirror is read-only from the
  human's perspective; edits go to the canonical home and the
  Mirror picks them up on regeneration
- 3 V1 template families: Project Overview, Decision Log,
  Subsystem Detail
- Explicit V1 exclusions: per-component pages, per-decision
  pages, cross-project rollups, time-series pages, diff pages,
  conflict queue render, per-memory pages
- Mirror files live in /srv/storage/atocore/data/mirror/ NOT in
  the source vault (sources stay read-only per the operating
  model)
- 3 regeneration triggers: explicit POST, debounced async on
  entity write, daily scheduled refresh
- "Do not edit" header banner with checksum so unchanged inputs
  skip work
- Conflicts and project_state overrides surface inline so the
  trust hierarchy is visible in the human reading experience
- Templates checked in under templates/mirror/, edited via PR
- Deterministic output is a V1 requirement so future Mirror
  diffing works without rework
- Open questions for V1: debounce window, scheduler integration,
  template testing approach, directory listing endpoint, empty
  state rendering

docs/architecture/engineering-v1-acceptance.md
----------------------------------------------
The measurable done definition:

- Single-sentence definition: V1 is done when every v1-required
  query in engineering-query-catalog.md returns a correct result
  for one chosen test project, the Human Mirror renders a
  coherent overview, and a real KB-CAD or KB-FEM export
  round-trips through ingest -> review queue -> active entity
  without violating any conflict or trust invariant
- 23 acceptance criteria across 4 categories:
  * Functional (8): entity store, all 20 v1-required queries,
    tool ingest endpoints, candidate review queue, conflict
    detection, Human Mirror, memory-to-entity graduation,
    complete provenance chain
  * Quality (6): existing tests pass, V1 has its own coverage,
    conflict invariants enforced, trust hierarchy enforced,
    Mirror reproducible via golden file, killer correctness
    queries pass against representative data
  * Operational (5): safe migration, backup/restore drill,
    performance bounds, no new manual ops burden, Phase 9 not
    regressed
  * Documentation (4): per-entity-type spec docs, KB schema docs,
    V1 release notes, master-plan-status updated
- Explicit negative list of things V1 does NOT need to do:
  no LLM extractor, no auto-promotion, no write-back, no
  multi-user, no real-time UI, no cross-project rollups,
  no time-travel, no nightly conflict sweep, no incremental
  Chroma, no retention cleanup, no encryption, no off-Dalidou
  backup target
- Recommended implementation order: F-1 -> F-8 in sequence,
  with the graduation flow (F-7) saved for last as the most
  cross-cutting change
- Anticipated friction points called out in advance:
  graduation cross-cuts memory module, Mirror determinism trap,
  conflict detector subtle correctness, provenance backfill
  for graduated entities

master-plan-status.md updated
-----------------------------
- Engineering Layer Planning Sprint section now marked complete
  with all 8 architecture docs listed
- Note that the next concrete step is the V1 implementation
  sprint following engineering-v1-acceptance.md as its checklist

Pure doc work. No code, no schema, no behavior changes.

After this commit, the engineering planning sprint is fully done
(8/8 docs) and Phase 9 is fully complete (Commits A/B/C all
shipped, validated, and pushed). AtoCore is ready for either
the engineering V1 implementation sprint OR a pause for
real-world Phase 9 usage, depending on which the user prefers next.
2026-04-07 06:55:43 -04:00


Engineering Layer V1 Acceptance Criteria

Why this document exists

The engineering layer planning sprint produced 7 architecture docs. None of them, on its own, says "you're done with V1, ship it". This document does. It translates the planning into measurable, falsifiable acceptance criteria so the implementation sprint knows unambiguously when V1 is complete.

The acceptance criteria are organized into four categories:

  1. Functional — what the system must be able to do
  2. Quality — how well it must do it
  3. Operational — what running it must look like
  4. Documentation — what must be written down

V1 is "done" only when every criterion in this document is met against at least one of the three active projects (p04-gigabit, p05-interferometer, p06-polisher). The choice of which project is the test bed is up to the implementer, but the same project must satisfy all functional criteria.

The single-sentence definition

AtoCore Engineering Layer V1 is done when, against one chosen active project, every v1-required query in engineering-query-catalog.md returns a correct result, the Human Mirror renders a coherent project overview, and a real KB-CAD or KB-FEM export round-trips through the ingest → review queue → active entity flow without violating any conflict or trust invariant.

Everything below is the operational form of that sentence.

Category 1 — Functional acceptance

F-1: Entity store implemented per the V1 ontology

  • The 12 V1 entity types from engineering-ontology-v1.md exist in the database with the schema described there
  • The 4 relationship families (Structural, Intent, Validation, Provenance) are implemented as edges with the relationship types listed in the catalog
  • Every entity has the shared header fields: id, type, name, project_id, status, confidence, source_refs, created_at, updated_at, extractor_version, canonical_home
  • The status lifecycle matches the memory layer: candidate → active → superseded | invalid
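The shared header and lifecycle can be sketched as follows. This is a minimal illustration, not the V1 schema: the field names come from the criterion above, but the concrete Python types and the `TRANSITIONS` table encoding are assumptions.

```python
from dataclasses import dataclass, field

# Legal status transitions per the shared lifecycle:
# candidate -> active -> superseded | invalid (candidates may also be rejected).
TRANSITIONS = {
    "candidate": {"active", "invalid"},
    "active": {"superseded", "invalid"},
    "superseded": set(),
    "invalid": set(),
}

@dataclass
class Entity:
    """Shared header fields every V1 entity carries (names from F-1;
    the Python types here are assumptions)."""
    id: str
    type: str
    name: str
    project_id: str
    status: str
    confidence: float
    source_refs: list = field(default_factory=list)
    created_at: str = ""
    updated_at: str = ""
    extractor_version: str = ""
    canonical_home: str = ""

def can_transition(current: str, new: str) -> bool:
    """True if the lifecycle permits moving from `current` to `new`."""
    return new in TRANSITIONS.get(current, set())
```

Encoding the transitions as data (rather than scattered `if` checks) makes F-4's promote/reject endpoints and the Q-4 trust tests share one source of truth.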

F-2: All v1-required queries return correct results

For the chosen test project, every query Q-001 through Q-020 in engineering-query-catalog.md must:

  • be implemented as an API endpoint with the shape specified in the catalog
  • return the expected result shape against real data
  • include the provenance chain when the catalog requires it
  • handle the empty case (no matches) gracefully — empty array, not 500

The "killer correctness queries" — Q-006 (orphan requirements), Q-009 (decisions on flagged assumptions), Q-011 (unsupported validation claims) — are non-negotiable. If any of those three returns wrong results, V1 is not done.

F-3: Tool ingest endpoints are live

Both endpoints from tool-handoff-boundaries.md are implemented:

  • POST /ingest/kb-cad/export accepts the documented JSON shape, validates it, and produces entity candidates
  • POST /ingest/kb-fem/export ditto
  • Both refuse exports with invalid schemas (4xx with a clear error)
  • Both return a summary of created/dropped/failed counts
  • Both never auto-promote anything; everything lands as status="candidate"
  • Both carry source identifiers (exporter name, exporter version, source artifact id) into the candidate's provenance fields

A real KB-CAD export — even a hand-crafted one if the actual exporter doesn't exist yet — must round-trip through the endpoint and produce reviewable candidates for the test project.
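The ingest contract above can be sketched as a plain function. The payload field names (`exporter_name`, `exporter_version`, `source_artifact_id`, `entities`) are assumptions standing in for the documented JSON shape in tool-handoff-boundaries.md.

```python
REQUIRED_FIELDS = {"exporter_name", "exporter_version", "source_artifact_id", "entities"}

def ingest_export(payload: dict) -> dict:
    """Sketch of the ingest contract: validate the shape, create candidates
    only, return created/dropped/failed counts. Field names are assumptions."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        # The real endpoint would surface this as a 4xx with a clear error.
        raise ValueError(f"invalid export schema, missing: {sorted(missing)}")

    summary = {"created": 0, "dropped": 0, "failed": 0}
    candidates = []
    for raw in payload["entities"]:
        if "name" not in raw or "type" not in raw:
            summary["failed"] += 1
            continue
        candidates.append({
            **raw,
            "status": "candidate",   # never auto-promoted
            "source_refs": [{
                "exporter": payload["exporter_name"],
                "exporter_version": payload["exporter_version"],
                "source_artifact_id": payload["source_artifact_id"],
            }],
        })
        summary["created"] += 1
    return {"summary": summary, "candidates": candidates}
```

The key invariants are visible in the shape itself: everything lands as `status="candidate"`, and every candidate carries the exporter identifiers forward into provenance.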

F-4: Candidate review queue works end to end

Per promotion-rules.md:

  • GET /entities?status=candidate lists the queue
  • POST /entities/{id}/promote moves candidate → active
  • POST /entities/{id}/reject moves candidate → invalid
  • The same shapes work for memories (already shipped in Phase 9 C)
  • The reviewer can edit a candidate's content via PUT /entities/{id} before promoting
  • Every promote/reject is logged with timestamp and reason

F-5: Conflict detection fires

Per conflict-model.md:

  • The synchronous detector runs at every active write (create, promote, project_state set, KB import)
  • A test must demonstrate that pushing a contradictory KB-CAD export creates a conflicts row with both members linked
  • The reviewer can resolve the conflict via POST /conflicts/{id}/resolve with one of the supported actions (supersede_others, no_action, dismiss)
  • Resolution updates the underlying entities according to the chosen action
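A toy version of the synchronous detector shows the shape of the required test. The slot-key derivation here (project, type, name) is an assumption; conflict-model.md defines the real one.

```python
def detect_conflicts(new_entity: dict, active_entities: list) -> list:
    """Toy synchronous detector: two active facts about the same slot with
    different values conflict. The slot-key choice is an assumption."""
    slot = (new_entity["project_id"], new_entity["type"], new_entity["name"])
    conflicts = []
    for existing in active_entities:
        if existing["status"] != "active":
            continue
        same_slot = (existing["project_id"], existing["type"],
                     existing["name"]) == slot
        if same_slot and existing["value"] != new_entity["value"]:
            conflicts.append({"members": [existing["id"], new_entity["id"]],
                              "state": "open"})
    # Flag, never block: the caller records these rows AND still performs
    # the write that triggered detection.
    return conflicts
```

The F-5 test then reduces to: push a contradictory export, assert one conflict row exists with both members linked, resolve it, and assert the entities were updated per the chosen action.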

F-6: Human Mirror renders for the test project

Per human-mirror-rules.md:

  • GET /mirror/{project}/overview returns rendered markdown
  • GET /mirror/{project}/decisions returns rendered markdown
  • GET /mirror/{project}/subsystems/{subsystem} returns rendered markdown for at least one subsystem
  • POST /mirror/{project}/regenerate triggers regeneration on demand
  • Generated files appear under /srv/storage/atocore/data/mirror/ with the "do not edit" header banner
  • Disputed markers appear inline when conflicts exist
  • Project-state overrides display with the (curated) annotation
  • Output is deterministic (the same inputs produce the same bytes, suitable for diffing)
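The "do not edit" banner and its checksum can be sketched as below. The banner wording and the choice of SHA-256 over a canonical serialization of the inputs are assumptions; the point is that regeneration can compare checksums and skip work when nothing changed.

```python
import hashlib

# Hypothetical banner format; the real wording comes from human-mirror-rules.md.
DO_NOT_EDIT = ("<!-- GENERATED by AtoCore mirror. DO NOT EDIT."
               " input-checksum: {digest} -->\n")

def render_with_banner(body: str, canonical_inputs: str) -> str:
    """Prepend the banner, stamping a checksum of the canonical input view."""
    digest = hashlib.sha256(canonical_inputs.encode()).hexdigest()
    return DO_NOT_EDIT.format(digest=digest) + body

def needs_regen(existing_file: str, canonical_inputs: str) -> bool:
    """Skip regeneration when the stamped checksum matches current inputs."""
    digest = hashlib.sha256(canonical_inputs.encode()).hexdigest()
    return f"input-checksum: {digest}" not in existing_file
```

This only works if `canonical_inputs` is itself deterministic, which is why the byte-stable output requirement and the checksum skip are really the same property.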

F-7: Memory-to-entity graduation works for at least one type

Per memory-vs-entities.md:

  • POST /memory/{id}/graduate exists
  • Graduating a memory of type adaptation produces a Decision entity candidate with the memory's content as a starting point
  • The original memory row stays at status="graduated" (a new status added by the engineering layer migration)
  • The graduated memory has a forward pointer to the entity candidate's id
  • Promoting the entity candidate does NOT delete the original memory
  • The same graduation flow works for project → Requirement and knowledge → Fact entity types (test the path; doesn't have to be exhaustive)
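The graduation flow above can be sketched as a pure function. The field names, the `ent-` id scheme, and the exact type mapping encoding are assumptions drawn from the criteria; memory-vs-entities.md is authoritative.

```python
def graduate_memory(memory: dict) -> tuple[dict, dict]:
    """Sketch of POST /memory/{id}/graduate: returns the updated memory row
    and the new entity candidate. Field names are assumptions."""
    type_map = {"adaptation": "Decision", "project": "Requirement",
                "knowledge": "Fact"}
    entity_candidate = {
        "id": f"ent-{memory['id']}",                     # hypothetical id scheme
        "type": type_map[memory["type"]],
        "name": memory["content"][:60],
        "content": memory["content"],                    # starting point for review
        "status": "candidate",
        # May be empty for old memories: the "orphan provenance" allowance.
        "source_refs": memory.get("source_refs", []),
    }
    graduated = {
        **memory,
        "status": "graduated",                           # original row is kept
        "graduated_to": entity_candidate["id"],          # forward pointer
    }
    return graduated, entity_candidate
```

Promoting the resulting candidate later is an ordinary F-4 promote; nothing in this step deletes or rewrites the original memory content.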

F-8: Provenance chain is complete

For every active entity in the test project, the following must be true:

  • It links back to at least one source via source_refs (which is one or more of: source_chunk_id, source_interaction_id, source_artifact_id from KB import)
  • The provenance chain can be walked from the entity to the underlying raw text (source_chunks) or external artifact
  • Q-017 (the evidence query) returns at least one row for every active entity

If any active entity has no provenance, it's a bug — provenance is mandatory at write time per the promotion rules.
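A provenance audit over the active set is cheap to express. The `orphan_provenance_ok` flag here is a hypothetical name for the graduated-entity allowance described under anticipated friction points.

```python
def provenance_violations(entities: list[dict]) -> list[str]:
    """Return ids of active entities with no provenance. Per F-8 these are
    bugs, except graduated entities explicitly marked with the (hypothetical)
    orphan-provenance allowance."""
    return [e["id"] for e in entities
            if e["status"] == "active"
            and not e.get("source_refs")
            and not e.get("orphan_provenance_ok")]
```

Running this audit after seeding the test project is a quick way to check F-8 before wiring up the full Q-017 evidence query.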

Category 2 — Quality acceptance

Q-1: All existing tests still pass

The full pre-V1 test suite (currently 160 tests) must still pass. The V1 implementation may add new tests but cannot regress any existing test.

Q-2: V1 has its own test coverage

For each of F-1 through F-8 above, at least one automated test exists that:

  • exercises the happy path
  • covers at least one error path
  • runs in CI in under 10 seconds (no real network, no real LLM)

The full V1 test suite should be under 30 seconds total runtime to keep the development loop fast.

Q-3: Conflict invariants are enforced by tests

Specific tests must demonstrate:

  • Two contradictory KB exports produce a conflict (not silent overwrite)
  • A reviewer can't accidentally promote both members of an open conflict to active without resolving the conflict first
  • The "flag, never block" rule holds — writes still succeed even when they create a conflict

Q-4: Trust hierarchy is enforced by tests

Specific tests must demonstrate:

  • Entity candidates can never appear in context packs
  • Reinforcement only touches active memories (already covered by Phase 9 Commit B tests, but the same property must hold for entities once they exist)
  • Nothing automatically writes to project_state ever
  • Candidates can never satisfy Q-005 (only active entities count)

Q-5: The Human Mirror is reproducible

A golden-file test exists for at least one Mirror page. Updating the golden file is a normal part of template work (single command, well-documented). The test fails if the renderer produces different bytes for the same input, catching non-determinism.

Q-6: Killer correctness queries pass against real-ish data

The test bed for Q-006, Q-009, Q-011 is not synthetic. The implementation must seed the test project with at least:

  • One Requirement that has a satisfying Component (Q-006 should not flag it)
  • One Requirement with no satisfying Component (Q-006 must flag it)
  • One Decision based on an Assumption flagged as needs_review (Q-009 must flag the Decision)
  • One ValidationClaim with at least one supporting Result (Q-011 should not flag it)
  • One ValidationClaim with no supporting Result (Q-011 must flag it)

These five seed cases run as a single integration test that exercises the killer correctness queries against actual representative data.

Category 3 — Operational acceptance

O-1: Migration is safe and reversible

The V1 schema migration (adding the entities, relationships, conflicts, conflict_members tables, plus mirror_regeneration_failures) must:

  • run cleanly against a production-shape database
  • be implemented via the same _apply_migrations pattern as Phase 9 (additive only, idempotent, safe to run twice)
  • be tested by spinning up a fresh DB AND running against a copy of the live Dalidou DB taken from a backup
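The additive, idempotent shape of the migration can be sketched as below. The column lists are placeholders (the real definitions come from the V1 ontology); the point is that every statement is `CREATE TABLE IF NOT EXISTS`, so running the migration twice is a no-op.

```python
import sqlite3

# Additive-only, idempotent migration sketch in the spirit of the
# _apply_migrations pattern. Column definitions are assumptions.
V1_MIGRATION = """
CREATE TABLE IF NOT EXISTS entities (id TEXT PRIMARY KEY, type TEXT,
    project_id TEXT, status TEXT, confidence REAL);
CREATE TABLE IF NOT EXISTS relationships (src_id TEXT, dst_id TEXT, rel_type TEXT);
CREATE TABLE IF NOT EXISTS conflicts (id TEXT PRIMARY KEY, state TEXT);
CREATE TABLE IF NOT EXISTS conflict_members (conflict_id TEXT, entity_id TEXT);
CREATE TABLE IF NOT EXISTS mirror_regeneration_failures (project TEXT,
    error TEXT, occurred_at TEXT);
"""

def apply_v1_migration(conn: sqlite3.Connection) -> None:
    conn.executescript(V1_MIGRATION)  # safe to run twice: IF NOT EXISTS only

conn = sqlite3.connect(":memory:")
apply_v1_migration(conn)
apply_v1_migration(conn)  # idempotent: second run changes nothing
```

The same script should then be exercised against a fresh DB and a restored copy of the live Dalidou DB, as the criterion requires.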

O-2: Backup and restore still work

The backup endpoint must include the new tables. A restore drill on the test project must:

  • successfully back up the V1 entity state via POST /admin/backup
  • successfully validate the snapshot
  • successfully restore from the snapshot per docs/backup-restore-procedure.md
  • pass post-restore verification including a Q-001 query against the test project

The drill must be performed once before V1 is declared done.

O-3: Performance bounds

These are starting bounds; tune later if real usage shows problems:

  • Single-entity write (POST /entities/...): under 100ms p99 on the production Dalidou hardware
  • Single Q-001 / Q-005 / Q-008 query: under 500ms p99 against a project with up to 1000 entities
  • Mirror regeneration of one project overview: under 5 seconds for a project with up to 1000 entities
  • Conflict detector at write time: adds no more than 50ms p99 to a write that doesn't actually produce a conflict

These bounds are not tested by automated benchmarks in V1 (that would be over-engineering). They are sanity-checked by the developer running the operations against the test project.

O-4: No new manual ops burden

V1 should not introduce any new "you have to remember to run X every day" requirement. Specifically:

  • Mirror regeneration is automatic (debounced async + daily refresh), no manual cron entry needed
  • Conflict detection is automatic at write time, no manual sweep needed in V1 (the nightly sweep is V2)
  • Backup retention cleanup is still an open follow-up from the operational baseline; V1 does not block on it

O-5: No regressions in Phase 9 reflection loop

The capture, reinforcement, and extraction loop from Phase 9 A/B/C must continue to work end to end with the engineering layer in place. Specifically:

  • Memories whose types are NOT in the engineering layer (identity, preference, episodic) keep working exactly as before
  • Memories whose types ARE in the engineering layer (project, knowledge, adaptation) can still be created by hand or by extraction; the deprecation rule from memory-vs-entities.md ("no new writes after V1 ships") is implemented as a configurable warning, not a hard block, so existing workflows aren't disrupted

Category 4 — Documentation acceptance

D-1: Per-entity-type spec docs

Each of the 12 V1 entity types has a short spec doc under docs/architecture/entities/ covering:

  • the entity's purpose
  • its required and optional fields
  • its lifecycle quirks (if any beyond the standard candidate/active/superseded/invalid)
  • which queries it appears in (cross-reference to the catalog)
  • which relationship types reference it

These docs can be terse — a page each, mostly bullet lists. Their purpose is to make the entity model legible to a future maintainer, not to be reference manuals.

D-2: KB-CAD and KB-FEM export schema docs

docs/architecture/kb-cad-export-schema.md and docs/architecture/kb-fem-export-schema.md are written and match the implemented validators.

D-3: V1 release notes

A docs/v1-release-notes.md summarizes:

  • What V1 added (entities, relationships, conflicts, mirror, ingest endpoints)
  • What V1 deferred (auto-promotion, BOM/cost/manufacturing entities, NX direct integration, cross-project rollups)
  • The migration story for existing memories (graduation flow)
  • Known limitations and the V2 roadmap pointers

D-4: master-plan-status.md and current-state.md updated

Both top-level status docs reflect V1's completion:

  • Phase 6 (AtoDrive) and the engineering layer are explicitly marked as separate tracks
  • The engineering planning sprint section is marked complete
  • Phase 9 stays at "baseline complete" (V1 doesn't change Phase 9)
  • The engineering layer V1 is added as its own line item

What V1 explicitly does NOT need to do

To prevent scope creep, here is the negative list. None of the following are V1 acceptance criteria:

  • No LLM extractor. The Phase 9 C rule-based extractor is the entity extractor for V1 too, just with new rules added for entity types.
  • No auto-promotion of candidates. Per promotion-rules.md.
  • No write-back to KB-CAD or KB-FEM. Per tool-handoff-boundaries.md.
  • No multi-user / per-reviewer auth. Single-user assumed.
  • No real-time UI. API + Mirror markdown is the V1 surface. A web UI is V2+.
  • No cross-project rollups. Per human-mirror-rules.md.
  • No time-travel queries (Q-015 stays v1-stretch).
  • No nightly conflict sweep. Synchronous detection only in V1.
  • No incremental Chroma snapshots. The current full-copy approach in backup-restore-procedure.md is fine for V1.
  • No retention cleanup script. Still an open follow-up.
  • No backup encryption. Still an open follow-up.
  • No off-Dalidou backup target. Still an open follow-up.

How to use this document during implementation

When the implementation sprint begins:

  1. Read this doc once, top to bottom
  2. Pick the test project (probably p05-interferometer because the optical/structural domain has the cleanest entity model)
  3. For each section, write the test or the implementation, in roughly the order: F-1 → F-2 → F-3 → F-4 → F-5 → F-6 → F-7 → F-8
  4. Each acceptance criterion's test should be written before or alongside the implementation, not after
  5. Run the full test suite at every commit
  6. When every box is checked, write D-3 (release notes), update D-4 (status docs), and call V1 done

The implementation sprint should not touch anything outside the scope listed here. If a desire arises to add something not in this doc, that's a V2 conversation, not a V1 expansion.

Anticipated friction points

These are the things I expect will be hard during implementation:

  1. The graduation flow (F-7) is the most cross-cutting change because it touches the existing memory module. Worth doing it last so the memory module is stable for all the V1 entity work first.
  2. The Mirror's deterministic-output requirement (Q-5) will bite if the renderer iterates over sets, over dicts populated in varying order, or over query results without an ORDER BY. Plan to use sorted() on every collection that feeds the output.
  3. Conflict detection (F-5) has subtle correctness traps: the slot key extraction must be stable, the dedup-of-existing-conflicts logic must be right, and the synchronous detector must not slow writes meaningfully (Q-3 / O-3 cover this, but watch).
  4. Provenance backfill for entities that come from the existing memory layer via graduation (F-7) is the trickiest part: the original memory may not have had a strict source_chunk_id, in which case the graduated entity also doesn't have one. The implementation needs an "orphan provenance" allowance for graduated entities, with a warning surfaced in the Mirror.

These aren't blockers, just the parts of the V1 spec I'd attack with extra care.
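The determinism trap in friction point 2 can be shown in a few lines: render from one canonically sorted view of the data, never from incidental iteration order. The page shape and sort key here are illustrative assumptions.

```python
import json

def render_decision_log(decisions: list[dict]) -> str:
    """Deterministic rendering sketch: sort by a stable key before emitting,
    and serialize nested detail with sorted keys so output is byte-stable."""
    lines = ["# Decision Log", ""]
    for d in sorted(decisions, key=lambda d: (d["created_at"], d["id"])):
        lines.append(f"- {d['name']} ({d['id']}): "
                     + json.dumps(d.get("detail", {}), sort_keys=True))
    return "\n".join(lines) + "\n"

a = render_decision_log([{"id": "d2", "name": "B", "created_at": "2026-01-02"},
                         {"id": "d1", "name": "A", "created_at": "2026-01-01"}])
b = render_decision_log([{"id": "d1", "name": "A", "created_at": "2026-01-01"},
                         {"id": "d2", "name": "B", "created_at": "2026-01-02"}])
assert a == b  # same inputs in any order -> same bytes
```

This is exactly the property the Q-5 golden-file test locks in: if any code path emits in arrival order instead of sorted order, the bytes drift and the test fails.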

TL;DR

  • Engineering V1 is done when every box in this doc is checked against one chosen active project
  • Functional: 8 criteria covering entities, queries, ingest, review queue, conflicts, mirror, graduation, provenance
  • Quality: 6 criteria covering tests, golden files, killer correctness, trust enforcement
  • Operational: 5 criteria covering migration safety, backup drill, performance bounds, no new manual ops, Phase 9 not regressed
  • Documentation: 4 criteria covering entity specs, KB schema docs, release notes, top-level status updates
  • Negative list: a clear set of things V1 deliberately does NOT need to do, to prevent scope creep
  • The implementation sprint follows this doc as a checklist