ATOCore/docs/architecture/engineering-v1-acceptance.md

# Engineering Layer V1 Acceptance Criteria

## Why this document exists

The engineering layer planning sprint produced 7 architecture
docs. None of them on their own says "you're done with V1, ship
it". This document does. It translates the planning into
measurable, falsifiable acceptance criteria so the implementation
sprint can know unambiguously when V1 is complete.

The acceptance criteria are organized into four categories:

1. **Functional** — what the system must be able to do
2. **Quality** — how well it must do it
3. **Operational** — what running it must look like
4. **Documentation** — what must be written down

V1 is "done" only when **every criterion in this document is met
against at least one of the three active projects** (`p04-gigabit`,
`p05-interferometer`, `p06-polisher`). The choice of which
project is the test bed is up to the implementer, but the same
project must satisfy all functional criteria.

## The single-sentence definition

> AtoCore Engineering Layer V1 is done when, against one chosen
> active project, every v1-required query in
> `engineering-query-catalog.md` returns a correct result, the
> Human Mirror renders a coherent project overview, and a real
> KB-CAD or KB-FEM export round-trips through the ingest →
> review queue → active entity flow without violating any
> conflict or trust invariant.

Everything below is the operational form of that sentence.

## Category 1 — Functional acceptance

### F-1: Entity store implemented per the V1 ontology

- The 12 V1 entity types from `engineering-ontology-v1.md` exist
  in the database with the schema described there
- The 4 relationship families (Structural, Intent, Validation,
  Provenance) are implemented as edges with the relationship
  types listed in the catalog
- Every entity has the shared header fields:
  `id, type, name, project_id, status, confidence, source_refs,
   created_at, updated_at, extractor_version, canonical_home`
- The status lifecycle matches the memory layer:
  `candidate → active → superseded | invalid`
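
As a rough illustration of the shared header and the status lifecycle, here is a minimal sketch. The field names come from the list above; the Python types, the default values, and the choice to treat `superseded` and `invalid` as terminal states are assumptions, not something the ontology doc is known to specify.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed status transitions. candidate -> invalid mirrors the reject
# action in F-4; treating superseded/invalid as terminal is an assumption.
ALLOWED_TRANSITIONS = {
    "candidate": {"active", "invalid"},
    "active": {"superseded", "invalid"},
    "superseded": set(),
    "invalid": set(),
}


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class EntityHeader:
    """Shared header fields; the types here are illustrative guesses."""
    id: str
    type: str
    name: str
    project_id: str
    status: str = "candidate"
    confidence: float = 0.0
    source_refs: list = field(default_factory=list)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)
    extractor_version: str = ""
    canonical_home: str = ""

    def transition(self, new_status: str) -> None:
        """Move to new_status, refusing anything outside the lifecycle."""
        if new_status not in ALLOWED_TRANSITIONS.get(self.status, set()):
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status
        self.updated_at = _now()
```

Keeping the transition table as data makes it easy to assert in a test that the entity lifecycle and the memory lifecycle stay identical.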

### F-2: All v1-required queries return correct results

For the chosen test project, every query Q-001 through Q-020 in
`engineering-query-catalog.md` must:

- be implemented as an API endpoint with the shape specified in
  the catalog
- return the expected result shape against real data
- include the provenance chain when the catalog requires it
- handle the empty case (no matches) gracefully — empty array,
  not 500

The "killer correctness queries" — Q-006 (orphan requirements),
Q-009 (decisions on flagged assumptions), Q-011 (unsupported
validation claims) — are non-negotiable. If any of those three
returns wrong results, V1 is not done.

### F-3: Tool ingest endpoints are live

Both endpoints from `tool-handoff-boundaries.md` are implemented:

- `POST /ingest/kb-cad/export` accepts the documented JSON
  shape, validates it, and produces entity candidates
- `POST /ingest/kb-fem/export` does the same for KB-FEM exports
- Both refuse exports with invalid schemas (4xx with a clear
  error)
- Both return a summary of created/dropped/failed counts
- Neither endpoint ever auto-promotes anything; everything lands as
  `status="candidate"`
- Both carry source identifiers (exporter name, exporter version,
  source artifact id) into the candidate's provenance fields

A real KB-CAD export — even a hand-crafted one if the actual
exporter doesn't exist yet — must round-trip through the endpoint
and produce reviewable candidates for the test project.
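
The ingest behaviour above can be sketched roughly as follows. The payload keys (`exporter_name`, `exporter_version`, `source_artifact_id`, `items`), the `InvalidExport` exception, and the meaning given to "dropped" (duplicate within the export) and "failed" (malformed item) are all hypothetical. The real shape lives in `tool-handoff-boundaries.md`.

```python
# Hypothetical top-level keys; the documented export shape may differ.
PROVENANCE_KEYS = ("exporter_name", "exporter_version", "source_artifact_id")


class InvalidExport(ValueError):
    """Schema-invalid export; an endpoint would map this to a 4xx."""


def ingest_export(payload: dict) -> dict:
    """Validate an export and turn its items into entity candidates."""
    missing = [k for k in (*PROVENANCE_KEYS, "items") if k not in payload]
    if missing:
        raise InvalidExport(f"missing fields: {', '.join(missing)}")
    if not isinstance(payload["items"], list):
        raise InvalidExport("'items' must be a list")

    provenance = {k: payload[k] for k in PROVENANCE_KEYS}
    summary = {"created": 0, "dropped": 0, "failed": 0}
    candidates, seen = [], set()
    for item in payload["items"]:
        # Assumption: an item needs at least a type and a name.
        if not isinstance(item, dict) or not item.get("type") or not item.get("name"):
            summary["failed"] += 1
            continue
        key = (item["type"], item["name"])
        if key in seen:  # assumption: duplicates within one export are dropped
            summary["dropped"] += 1
            continue
        seen.add(key)
        candidates.append({
            "type": item["type"],
            "name": item["name"],
            "status": "candidate",  # never auto-promoted
            "provenance": dict(provenance),
        })
        summary["created"] += 1
    return {"candidates": candidates, "summary": summary}
```

A hand-crafted export fed through a function like this is enough to exercise the round-trip requirement before a real exporter exists.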

### F-4: Candidate review queue works end to end

Per `promotion-rules.md`:

- `GET /entities?status=candidate` lists the queue
- `POST /entities/{id}/promote` moves candidate → active
- `POST /entities/{id}/reject` moves candidate → invalid
- The same shapes work for memories (already shipped in Phase 9 C)
- The reviewer can edit a candidate's content via
  `PUT /entities/{id}` before promoting
- Every promote/reject is logged with timestamp and reason
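
The queue semantics can be modelled in memory before any HTTP layer exists. The sketch below is illustrative only: the `ReviewQueue` class and its method names are hypothetical stand-ins for the endpoints listed above, and the audit-log record shape is an assumption.

```python
from datetime import datetime, timezone


class ReviewQueue:
    """In-memory stand-in for the candidate review endpoints."""

    def __init__(self) -> None:
        self.entities = {}
        self.audit_log = []

    def add_candidate(self, entity_id: str, content: str) -> None:
        self.entities[entity_id] = {
            "id": entity_id, "content": content, "status": "candidate",
        }

    def list_candidates(self) -> list:
        """Rough equivalent of GET /entities?status=candidate."""
        found = [e for e in self.entities.values() if e["status"] == "candidate"]
        return sorted(found, key=lambda e: e["id"])

    def edit(self, entity_id: str, content: str) -> None:
        """Rough equivalent of PUT /entities/{id} before promotion."""
        self._require_candidate(entity_id)["content"] = content

    def promote(self, entity_id: str, reason: str) -> dict:
        return self._decide(entity_id, "active", "promote", reason)

    def reject(self, entity_id: str, reason: str) -> dict:
        return self._decide(entity_id, "invalid", "reject", reason)

    def _require_candidate(self, entity_id: str) -> dict:
        entity = self.entities[entity_id]
        if entity["status"] != "candidate":
            raise ValueError(f"{entity_id} is not a candidate")
        return entity

    def _decide(self, entity_id, new_status, action, reason) -> dict:
        entity = self._require_candidate(entity_id)
        entity["status"] = new_status
        # Every decision is logged with a timestamp and a reason.
        self.audit_log.append({
            "entity_id": entity_id,
            "action": action,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return entity
```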

### F-5: Conflict detection fires

Per `conflict-model.md`:

- The synchronous detector runs at every active write
  (create, promote, project_state set, KB import)
- A test must demonstrate that pushing a contradictory KB-CAD
  export creates a `conflicts` row with both members linked
- The reviewer can resolve the conflict via
  `POST /conflicts/{id}/resolve` with one of the supported
  actions (`supersede_others`, `no_action`, `dismiss`)
- Resolution updates the underlying entities according to the
  chosen action
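
One way to picture the synchronous detector is the sketch below. It assumes a slot key built from project, type, and normalized name, and a single `value` field to compare; the real slot definition belongs to `conflict-model.md` and may differ. The write always succeeds, and an open conflict on the same slot is reused rather than duplicated.

```python
def slot_key(entity: dict) -> tuple:
    # Hypothetical slot: same project, same type, same normalized name.
    return (entity["project_id"], entity["type"], entity["name"].strip().lower())


class EntityStore:
    """Toy store demonstrating detection at active-write time."""

    def __init__(self) -> None:
        self.active = []     # active entity dicts
        self.conflicts = []  # stand-in for the conflicts table

    def write_active(self, entity: dict):
        """Write the entity, then return the conflict it joined (or None)."""
        key = slot_key(entity)
        rivals = [
            e for e in self.active
            if slot_key(e) == key and e["value"] != entity["value"]
        ]
        self.active.append(entity)  # the write is never blocked
        if not rivals:
            return None
        # Reuse an open conflict on the same slot instead of adding a new row.
        for conflict in self.conflicts:
            if conflict["slot"] == key and conflict["status"] == "open":
                conflict["members"].add(entity["id"])
                return conflict
        conflict = {
            "slot": key,
            "status": "open",
            "members": {entity["id"], *(r["id"] for r in rivals)},
        }
        self.conflicts.append(conflict)
        return conflict
```

Normalizing the name inside `slot_key` is what keeps the key stable across exports that differ only in whitespace or case.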

### F-6: Human Mirror renders for the test project

Per `human-mirror-rules.md`:

- `GET /mirror/{project}/overview` returns rendered markdown
- `GET /mirror/{project}/decisions` returns rendered markdown
- `GET /mirror/{project}/subsystems/{subsystem}` returns
  rendered markdown for at least one subsystem
- `POST /mirror/{project}/regenerate` triggers regeneration on
  demand
- Generated files appear under `/srv/storage/atocore/data/mirror/`
  with the "do not edit" header banner
- Disputed markers appear inline when conflicts exist
- Project-state overrides display with the `(curated)` annotation
- Output is deterministic (the same inputs produce the same
  bytes, suitable for diffing)
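
The determinism requirement mostly comes down to never depending on input or dict iteration order. A minimal sketch, assuming a hypothetical `render_decisions` helper, a made-up banner string, and a `(disputed)` marker whose exact wording is not taken from `human-mirror-rules.md`:

```python
# Placeholder banner; the real "do not edit" text is defined elsewhere.
BANNER = "<!-- GENERATED FILE: do not edit -->"


def render_decisions(project: str, decisions: list) -> str:
    """Render a decisions page whose bytes depend only on the content."""
    lines = [BANNER, f"# {project}: decisions", ""]
    # Sort entities by a stable key so input order never leaks into output.
    for decision in sorted(decisions, key=lambda d: (d["name"], d["id"])):
        marker = " (disputed)" if decision.get("disputed") else ""
        lines.append(f"- {decision['name']}{marker}")
        attrs = decision.get("attrs", {})
        # Sort attribute keys too; dict order reflects insertion order.
        for key in sorted(attrs):
            lines.append(f"  - {key}: {attrs[key]}")
    return "\n".join(lines) + "\n"
```

Anything time-dependent (for example a "generated at" stamp) would also have to stay out of the rendered body for the same-bytes property to hold.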

### F-7: Memory-to-entity graduation works for at least one type

Per `memory-vs-entities.md`:

- `POST /memory/{id}/graduate` exists
- Graduating a memory of type `adaptation` produces a Decision
  entity candidate with the memory's content as a starting point
- The original memory row is kept and moves to `status="graduated"` (a new
  status added by the engineering layer migration)
- The graduated memory has a forward pointer to the entity
  candidate's id
- Promoting the entity candidate does NOT delete the original
  memory
- The same graduation flow works for `project` → Requirement
  and `knowledge` → Fact entity types (test the path; doesn't
  have to be exhaustive)
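
The graduation rules above fit in a small function. In this sketch the type mapping comes from the bullets; the `graduated_to_entity_id` field name, the `memory:` source-ref prefix, and the refusal to graduate a memory twice are assumptions.

```python
# Memory type -> entity type, as listed in F-7.
GRADUATION_TARGETS = {
    "adaptation": "Decision",
    "project": "Requirement",
    "knowledge": "Fact",
}


def graduate(memory: dict, new_entity_id: str) -> dict:
    """Create an entity candidate from a memory and mark the memory graduated."""
    target_type = GRADUATION_TARGETS.get(memory["type"])
    if target_type is None:
        raise ValueError(f"memory type {memory['type']!r} cannot graduate")
    if memory["status"] == "graduated":  # assumption: graduate at most once
        raise ValueError(f"memory {memory['id']} already graduated")

    candidate = {
        "id": new_entity_id,
        "type": target_type,
        "status": "candidate",          # still goes through review
        "content": memory["content"],   # starting point for the reviewer
        "source_refs": [f"memory:{memory['id']}"],  # hypothetical ref format
    }
    # The memory row is kept, not deleted, and points forward to the entity.
    memory["status"] = "graduated"
    memory["graduated_to_entity_id"] = new_entity_id
    return candidate
```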

### F-8: Provenance chain is complete

For every active entity in the test project, the following must
be true:

- It links back to at least one source via `source_refs` (which
  is one or more of: `source_chunk_id`, `source_interaction_id`,
  `source_artifact_id` from KB import)
- The provenance chain can be walked from the entity to the
  underlying raw text (source_chunks) or external artifact
- Q-017 (the evidence query) returns at least one row for every
  active entity

If any active entity has no provenance, it's a bug — provenance
is mandatory at write time per the promotion rules.
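
A test for this criterion can be as simple as listing the active entities that have no usable source reference. The sketch below assumes `source_refs` is a list of dicts keyed by the three identifier names mentioned above; the actual storage shape may differ.

```python
SOURCE_REF_KEYS = ("source_chunk_id", "source_interaction_id", "source_artifact_id")


def entities_missing_provenance(entities: list) -> list:
    """Return ids of active entities with no non-empty source reference."""
    missing = []
    for entity in entities:
        if entity.get("status") != "active":
            continue  # only active entities are covered by F-8
        refs = entity.get("source_refs") or []
        has_ref = any(ref.get(key) for ref in refs for key in SOURCE_REF_KEYS)
        if not has_ref:
            missing.append(entity["id"])
    return sorted(missing)
```

An empty result for the test project is the pass condition; any id in the list is a bug by the rule above.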

## Category 2 — Quality acceptance

### Q-1: All existing tests still pass

The full pre-V1 test suite (currently 160 tests) must still
pass. The V1 implementation may add new tests but cannot regress
any existing test.

### Q-2: V1 has its own test coverage

For each of F-1 through F-8 above, at least one automated test
exists that:

- exercises the happy path
- covers at least one error path
- runs in CI in under 10 seconds (no real network, no real LLM)

The full V1 test suite should be under 30 seconds total runtime
to keep the development loop fast.

### Q-3: Conflict invariants are enforced by tests

Specific tests must demonstrate:

- Two contradictory KB exports produce a conflict (not silent
  overwrite)
- A reviewer can't accidentally promote both members of an open
  conflict to active without resolving the conflict first
- The "flag, never block" rule holds — writes still succeed
  even when they create a conflict

### Q-4: Trust hierarchy is enforced by tests

Specific tests must demonstrate:

- Entity candidates can never appear in context packs
- Reinforcement only touches active memories (already covered
  by Phase 9 Commit B tests, but the same property must hold
  for entities once they exist)
- Nothing ever writes to `project_state` automatically
- Candidates can never satisfy Q-005 (only active entities count)

### Q-5: The Human Mirror is reproducible

A golden-file test exists for at least one Mirror page. Updating
the golden file is a normal part of template work (single
command, well-documented). The test fails if the renderer
produces different bytes for the same input, catching
non-determinism.
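
A golden-file check needs very little machinery. The helper below is a sketch with a hypothetical name (`matches_golden`) and an `update` flag standing in for the "single command" that refreshes the golden file; how that flag is wired to a CLI or test option is left open.

```python
from pathlib import Path


def matches_golden(rendered: str, golden_path: Path, update: bool = False) -> bool:
    """Compare rendered output to the golden file byte for byte.

    With update=True the golden file is rewritten instead of compared.
    """
    data = rendered.encode("utf-8")
    if update:
        golden_path.parent.mkdir(parents=True, exist_ok=True)
        golden_path.write_bytes(data)
        return True
    # A missing golden file counts as a failure, not as an implicit update.
    return golden_path.exists() and golden_path.read_bytes() == data
```

Comparing bytes rather than decoded text means line-ending or encoding drift is caught as well.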

### Q-6: Killer correctness queries pass against real-ish data

The test bed for Q-006, Q-009, Q-011 is not synthetic. The
implementation must seed the test project with at least:

- One Requirement that has a satisfying Component (Q-006 should
  not flag it)
- One Requirement with no satisfying Component (Q-006 must flag it)
- One Decision based on an Assumption flagged as `needs_review`
  (Q-009 must flag the Decision)
- One ValidationClaim with at least one supporting Result
  (Q-011 should not flag it)
- One ValidationClaim with no supporting Result (Q-011 must flag it)

These five seed cases run as a single integration test that
exercises the killer correctness queries against actual
representative data.
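
To make the five seed cases concrete, here is a small in-memory sketch. The relationship names (`satisfies`, `based_on`, `supports`), the `(source, relation, target)` edge direction, the `flags` field, and the function names are illustrative assumptions; the catalog defines the real query shapes.

```python
def _active(entities: dict, type_name: str) -> set:
    """Ids of active entities of one type; candidates never count."""
    return {
        eid for eid, e in entities.items()
        if e["type"] == type_name and e["status"] == "active"
    }


def q006_orphan_requirements(entities: dict, edges: list) -> list:
    components = _active(entities, "Component")
    satisfied = {dst for src, rel, dst in edges
                 if rel == "satisfies" and src in components}
    return sorted(_active(entities, "Requirement") - satisfied)


def q009_decisions_on_flagged_assumptions(entities: dict, edges: list) -> list:
    flagged = {eid for eid in _active(entities, "Assumption")
               if "needs_review" in entities[eid].get("flags", ())}
    decisions = _active(entities, "Decision")
    return sorted({src for src, rel, dst in edges
                   if rel == "based_on" and src in decisions and dst in flagged})


def q011_unsupported_claims(entities: dict, edges: list) -> list:
    results = _active(entities, "Result")
    supported = {dst for src, rel, dst in edges
                 if rel == "supports" and src in results}
    return sorted(_active(entities, "ValidationClaim") - supported)


def seed():
    """The five seed cases from Q-6, with made-up ids."""
    def ent(type_name, **extra):
        return {"type": type_name, "status": "active", **extra}

    entities = {
        "req-ok": ent("Requirement"),
        "req-orphan": ent("Requirement"),
        "comp-1": ent("Component"),
        "asm-1": ent("Assumption", flags=("needs_review",)),
        "dec-1": ent("Decision"),
        "claim-ok": ent("ValidationClaim"),
        "claim-bare": ent("ValidationClaim"),
        "res-1": ent("Result"),
    }
    edges = [
        ("comp-1", "satisfies", "req-ok"),
        ("dec-1", "based_on", "asm-1"),
        ("res-1", "supports", "claim-ok"),
    ]
    return entities, edges
```

With this seed, each query should flag exactly one entity and leave its well-formed counterpart alone, which is the property the integration test needs to pin down.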

## Category 3 — Operational acceptance

### O-1: Migration is safe and reversible

The V1 schema migration (adding the `entities`, `relationships`,
`conflicts`, `conflict_members` tables, plus `mirror_regeneration_failures`)
must:

- run cleanly against a production-shape database
- be implemented via the same `_apply_migrations` pattern as
  Phase 9 (additive only, idempotent, safe to run twice)
- be tested by spinning up a fresh DB AND running against a
  copy of the live Dalidou DB taken from a backup

### O-2: Backup and restore still work

The backup endpoint must include the new tables. A restore drill
on the test project must:

- successfully back up the V1 entity state via
  `POST /admin/backup`
- successfully validate the snapshot
- successfully restore from the snapshot per
  `docs/backup-restore-procedure.md`
- pass post-restore verification including a Q-001 query against
  the test project

The drill must be performed once before V1 is declared done.

### O-3: Performance bounds

These are starting bounds; tune later if real usage shows
problems:

- Single-entity write (`POST /entities/...`): under 100ms p99
  on the production Dalidou hardware
- Single Q-001 / Q-005 / Q-008 query: under 500ms p99 against
  a project with up to 1000 entities
- Mirror regeneration of one project overview: under 5 seconds
  for a project with up to 1000 entities
- Conflict detector at write time: adds no more than 50ms p99
  to a write that doesn't actually produce a conflict

These bounds are not tested by automated benchmarks in V1 (that
would be over-engineering). They are sanity-checked by the
developer running the operations against the test project.
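
For the manual sanity check, a small timing helper is enough. This sketch uses the nearest-rank definition of a percentile; the function names and the default of 200 runs are arbitrary choices, not part of the spec.

```python
import math
import time


def nearest_rank(samples: list, percent: int) -> float:
    """Nearest-rank percentile of a non-empty list (percent in 1..100)."""
    ordered = sorted(samples)
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[rank - 1]


def p99_ms(operation, runs: int = 200) -> float:
    """Call operation() `runs` times and return the p99 latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return nearest_rank(samples, 99)
```

A developer would pass in a callable that performs one write or one query against the test project and compare the result to the bounds above.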

### O-4: No new manual ops burden

V1 should not introduce any new "you have to remember to run X
every day" requirement. Specifically:

- Mirror regeneration is automatic (debounced async + daily
  refresh), no manual cron entry needed
- Conflict detection is automatic at write time, no manual sweep
  needed in V1 (the nightly sweep is V2)
- Backup retention cleanup is **still** an open follow-up from
  the operational baseline; V1 does not block on it

### O-5: No regressions in Phase 9 reflection loop

The capture, reinforcement, and extraction loop from Phase 9
A/B/C must continue to work end to end with the engineering
layer in place. Specifically:

- Memories whose types are NOT in the engineering layer
  (identity, preference, episodic) keep working exactly as
  before
- Memories whose types ARE in the engineering layer (project,
  knowledge, adaptation) can still be created by hand or by
  extraction; the deprecation rule from `memory-vs-entities.md`
  ("no new writes after V1 ships") is implemented as a
  configurable warning, not a hard block, so existing
  workflows aren't disrupted
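
The "warning, not a hard block" rule could look roughly like this. The function name and the `warn_on_deprecated` switch are hypothetical; how the real configuration flag is named and loaded is not specified here.

```python
import warnings

# Memory types that overlap with the engineering layer, per O-5.
ENGINEERING_MEMORY_TYPES = frozenset({"project", "knowledge", "adaptation"})


def allow_memory_write(memory_type: str, warn_on_deprecated: bool = True) -> bool:
    """Always allow the write; optionally warn for engineering-layer types."""
    if warn_on_deprecated and memory_type in ENGINEERING_MEMORY_TYPES:
        warnings.warn(
            f"memory type {memory_type!r} overlaps with the engineering "
            "layer; consider creating an entity instead",
            DeprecationWarning,
            stacklevel=2,
        )
    return True  # never a hard block
```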

## Category 4 — Documentation acceptance

### D-1: Per-entity-type spec docs

Each of the 12 V1 entity types has a short spec doc under
`docs/architecture/entities/` covering:

- the entity's purpose
- its required and optional fields
- its lifecycle quirks (if any beyond the standard
  candidate/active/superseded/invalid)
- which queries it appears in (cross-reference to the catalog)
- which relationship types reference it

These docs can be terse — a page each, mostly bullet lists.
Their purpose is to make the entity model legible to a future
maintainer, not to be reference manuals.

### D-2: KB-CAD and KB-FEM export schema docs

`docs/architecture/kb-cad-export-schema.md` and
`docs/architecture/kb-fem-export-schema.md` are written and
match the implemented validators.

### D-3: V1 release notes

A `docs/v1-release-notes.md` summarizes:

- What V1 added (entities, relationships, conflicts, mirror,
  ingest endpoints)
- What V1 deferred (auto-promotion, BOM/cost/manufacturing
  entities, NX direct integration, cross-project rollups)
- The migration story for existing memories (graduation flow)
- Known limitations and the V2 roadmap pointers

### D-4: master-plan-status.md and current-state.md updated

Both top-level status docs reflect V1's completion:

- Phase 6 (AtoDrive) and the engineering layer are explicitly
  marked as separate tracks
- The engineering planning sprint section is marked complete
- Phase 9 stays at "baseline complete" (V1 doesn't change Phase 9)
- The engineering layer V1 is added as its own line item

## What V1 explicitly does NOT need to do

To prevent scope creep, here is the negative list. None of the
following are V1 acceptance criteria:

- **No LLM extractor.** The Phase 9 C rule-based extractor is
  the entity extractor for V1 too, just with new rules added for
  entity types.
- **No auto-promotion of candidates.** Per `promotion-rules.md`.
- **No write-back to KB-CAD or KB-FEM.** Per
  `tool-handoff-boundaries.md`.
- **No multi-user / per-reviewer auth.** Single-user assumed.
- **No real-time UI.** API + Mirror markdown is the V1 surface.
  A web UI is V2+.
- **No cross-project rollups.** Per `human-mirror-rules.md`.
- **No time-travel queries** (Q-015 stays v1-stretch).
- **No nightly conflict sweep.** Synchronous detection only in V1.
- **No incremental Chroma snapshots.** The current full-copy
  approach in `backup-restore-procedure.md` is fine for V1.
- **No retention cleanup script.** Still an open follow-up.
- **No backup encryption.** Still an open follow-up.
- **No off-Dalidou backup target.** Still an open follow-up.

## How to use this document during implementation

When the implementation sprint begins:

1. Read this doc once, top to bottom
2. Pick the test project (probably p05-interferometer because
   the optical/structural domain has the cleanest entity model)
3. For each section, write the test or the implementation, in
   roughly the order: F-1 → F-2 → F-3 → F-4 → F-5 → F-6 → F-7 → F-8
4. Each acceptance criterion's test should be written **before
   or alongside** the implementation, not after
5. Run the full test suite at every commit
6. When every box is checked, write D-3 (release notes), update
   D-4 (status docs), and call V1 done

The implementation sprint should not touch anything outside the
scope listed here. If a desire arises to add something not in
this doc, that's a V2 conversation, not a V1 expansion.

## Anticipated friction points

These are the things I expect will be hard during implementation:

1. **The graduation flow (F-7)** is the most cross-cutting
   change because it touches the existing memory module.
   Worth doing it late, after F-1 through F-6, so the memory
   module stays stable during the rest of the V1 entity work.
2. **The Mirror's deterministic-output requirement (Q-5)** will
   bite if the implementer iterates over Python dicts without
   sorting. Plan to use `sorted()` literally everywhere.
3. **Conflict detection (F-5)** has subtle correctness traps:
   the slot key extraction must be stable, the dedup-of-existing-conflicts
   logic must be right, and the synchronous detector must not
   slow writes meaningfully (Q-3 / O-3 cover this, but watch).
4. **Provenance backfill** for entities that come from the
   existing memory layer via graduation (F-7) is the trickiest
   part: the original memory may not have had a strict
   `source_chunk_id`, in which case the graduated entity also
   doesn't have one. The implementation needs an "orphan
   provenance" allowance for graduated entities, with a
   warning surfaced in the Mirror.

These aren't blockers, just the parts of the V1 spec I'd
attack with extra care.

## TL;DR

- Engineering V1 is done when every box in this doc is checked
  against one chosen active project
- Functional: 8 criteria covering entities, queries, ingest,
  review queue, conflicts, mirror, graduation, provenance
- Quality: 6 criteria covering tests, golden files, killer
  correctness, trust enforcement
- Operational: 5 criteria covering migration safety, backup
  drill, performance bounds, no new manual ops, Phase 9 not
  regressed
- Documentation: 4 criteria covering entity specs, KB schema
  docs, release notes, top-level status updates
- Negative list: a clear set of things V1 deliberately does
  NOT need to do, to prevent scope creep
- The implementation sprint follows this doc as a checklist