# Engineering Layer V1 Acceptance Criteria

## Why this document exists

The engineering layer planning sprint produced 7 architecture
docs. None of them on their own says "you're done with V1, ship
it". This document does. It translates the planning into
measurable, falsifiable acceptance criteria so the implementation
sprint can know unambiguously when V1 is complete.

The acceptance criteria are organized into four categories:

1. **Functional** — what the system must be able to do
2. **Quality** — how well it must do it
3. **Operational** — what running it must look like
4. **Documentation** — what must be written down

V1 is "done" only when **every criterion in this document is met
against at least one of the three active projects** (`p04-gigabit`,
`p05-interferometer`, `p06-polisher`). The choice of which
project is the test bed is up to the implementer, but the same
project must satisfy all functional criteria.

## The single-sentence definition

> AtoCore Engineering Layer V1 is done when, against one chosen
> active project, every v1-required query in
> `engineering-query-catalog.md` returns a correct result, the
> Human Mirror renders a coherent project overview, and a real
> KB-CAD or KB-FEM export round-trips through the ingest →
> review queue → active entity flow without violating any
> conflict or trust invariant.

Everything below is the operational form of that sentence.

## Category 1 — Functional acceptance

### F-1: Entity store implemented per the V1 ontology

- The 12 V1 entity types from `engineering-ontology-v1.md` exist
  in the database with the schema described there
- The 4 relationship families (Structural, Intent, Validation,
  Provenance) are implemented as edges with the relationship
  types listed in the catalog
- Every entity has the shared header fields:
  `id, type, name, project_id, status, confidence, source_refs,
  created_at, updated_at, extractor_version, canonical_home`
- The status lifecycle matches the memory layer:
  `candidate → active → superseded | invalid`

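The shared header and the status lifecycle in F-1 can be sketched as a small Python model. This is an illustrative sketch only: the field types, the default values, the `ALLOWED_TRANSITIONS` table, and the `transition` helper are assumptions made here, not names taken from `engineering-ontology-v1.md`. The `candidate → invalid` edge is inferred from the reject endpoint in F-4.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed reading of `candidate -> active -> superseded | invalid`.
# candidate -> invalid is included because F-4 rejects candidates that way.
ALLOWED_TRANSITIONS = {
    "candidate": {"active", "invalid"},
    "active": {"superseded", "invalid"},
    "superseded": set(),
    "invalid": set(),
}


def _now():
    return datetime.now(timezone.utc)


@dataclass
class Entity:
    # Shared header fields from F-1; types and defaults are hypothetical.
    id: str
    type: str
    name: str
    project_id: str
    status: str = "candidate"
    confidence: float = 0.5
    source_refs: list = field(default_factory=list)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)
    extractor_version: str = ""
    canonical_home: str = ""


def transition(entity, new_status):
    """Move an entity along the lifecycle, refusing illegal jumps."""
    if new_status not in ALLOWED_TRANSITIONS.get(entity.status, set()):
        raise ValueError(f"illegal transition {entity.status} -> {new_status}")
    entity.status = new_status
    entity.updated_at = _now()
    return entity
```

Keeping the transition table as data makes it straightforward to assert in a test that no code path can move an entity from `superseded` or `invalid` back to `active`.
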
### F-2: All v1-required queries return correct results

For the chosen test project, every v1-required query in the
Q-001 through Q-020 range of `engineering-query-catalog.md` must:

- be implemented as an API endpoint with the shape specified in
  the catalog
- return the expected result shape against real data
- include the provenance chain when the catalog requires it
- handle the empty case (no matches) gracefully — empty array,
  not 500

The "killer correctness queries" — Q-006 (orphan requirements),
Q-009 (decisions on flagged assumptions), Q-011 (unsupported
validation claims) — are non-negotiable. If any of those three
returns wrong results, V1 is not done.

### F-3: Tool ingest endpoints are live

Both endpoints from `tool-handoff-boundaries.md` are implemented:

- `POST /ingest/kb-cad/export` accepts the documented JSON
  shape, validates it, and produces entity candidates
- `POST /ingest/kb-fem/export` does the same for KB-FEM exports
- Both refuse exports with invalid schemas (4xx with a clear
  error)
- Both return a summary of created/dropped/failed counts
- Neither auto-promotes anything; everything lands as
  `status="candidate"`
- Both carry source identifiers (exporter name, exporter version,
  source artifact id) into the candidate's provenance fields

A real KB-CAD export — even a hand-crafted one if the actual
exporter doesn't exist yet — must round-trip through the endpoint
and produce reviewable candidates for the test project.

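The ingest contract in F-3 can be sketched as a pure function that a route handler would wrap. The payload keys used below (`exporter_name`, `exporter_version`, `source_artifact_id`, `entities`) are placeholders invented for this sketch; the real JSON shape is the one documented in `tool-handoff-boundaries.md`. The meanings given to "dropped" (duplicate within one export) and "failed" (item missing a required field) are likewise assumptions.

```python
# Hypothetical top-level keys; the documented shape may differ.
REQUIRED_TOP_LEVEL = (
    "exporter_name",
    "exporter_version",
    "source_artifact_id",
    "entities",
)


def ingest_export(payload):
    """Validate one export and return (http_status, response_body)."""
    if not isinstance(payload, dict):
        return 400, {"error": "payload must be a JSON object"}
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in payload]
    if missing:
        return 400, {"error": "missing fields: " + ", ".join(missing)}
    if not isinstance(payload["entities"], list):
        return 400, {"error": "'entities' must be a list"}

    provenance = {
        "exporter_name": payload["exporter_name"],
        "exporter_version": payload["exporter_version"],
        "source_artifact_id": payload["source_artifact_id"],
    }
    summary = {"created": 0, "dropped": 0, "failed": 0}
    candidates, seen = [], set()
    for item in payload["entities"]:
        if not isinstance(item, dict) or not item.get("type") or not item.get("name"):
            summary["failed"] += 1  # malformed item
            continue
        key = (item["type"], item["name"])
        if key in seen:
            summary["dropped"] += 1  # duplicate inside this export
            continue
        seen.add(key)
        candidates.append({
            "type": item["type"],
            "name": item["name"],
            "status": "candidate",  # never auto-promoted
            "source_refs": [dict(provenance)],
        })
        summary["created"] += 1
    return 200, {"summary": summary, "candidates": candidates}
```

A hand-crafted export for the test project can be passed straight to this function in a unit test, which keeps the F-3 test free of real network access.
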
### F-4: Candidate review queue works end to end

Per `promotion-rules.md`:

- `GET /entities?status=candidate` lists the queue
- `POST /entities/{id}/promote` moves candidate → active
- `POST /entities/{id}/reject` moves candidate → invalid
- The same shapes work for memories (already shipped in Phase 9 C)
- The reviewer can edit a candidate's content via
  `PUT /entities/{id}` before promoting
- Every promote/reject is logged with timestamp and reason

### F-5: Conflict detection fires

Per `conflict-model.md`:

- The synchronous detector runs at every active write
  (create, promote, project_state set, KB import)
- A test must demonstrate that pushing a contradictory KB-CAD
  export creates a `conflicts` row with both members linked
- The reviewer can resolve the conflict via
  `POST /conflicts/{id}/resolve` with one of the supported
  actions (`supersede_others`, `no_action`, `dismiss`)
- Resolution updates the underlying entities according to the
  chosen action

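A minimal in-memory sketch of the synchronous detector follows, assuming a conflict means "two active entities share a slot but disagree on a value". The `slot_key` definition, the `value` field, and the `Store` class are hypothetical stand-ins for whatever `conflict-model.md` specifies. The write always succeeds and the conflict is recorded afterwards, which is one way to honor the "flag, never block" rule.

```python
def slot_key(entity):
    # Hypothetical slot key. It is normalized so that whitespace or case
    # differences in the name do not produce two distinct slots.
    return (entity["project_id"], entity["type"], entity["name"].strip().lower())


class Store:
    def __init__(self):
        self.entities = []
        self.conflicts = []

    def write_active(self, entity):
        """Store the entity first, then flag any conflict it creates."""
        entity = dict(entity, status="active")
        self.entities.append(entity)  # the write is never blocked
        self._detect(entity)
        return entity

    def _detect(self, new):
        key = slot_key(new)
        rivals = [
            e for e in self.entities
            if e is not new
            and e["status"] == "active"
            and slot_key(e) == key
            and e["value"] != new["value"]
        ]
        if not rivals:
            return
        # Reuse an open conflict on the same slot instead of duplicating it.
        conflict = next(
            (c for c in self.conflicts if c["slot"] == key and c["status"] == "open"),
            None,
        )
        if conflict is None:
            conflict = {"slot": key, "status": "open", "members": []}
            self.conflicts.append(conflict)
        for member in rivals + [new]:
            if member["id"] not in conflict["members"]:
                conflict["members"].append(member["id"])
```

A third contradictory write on the same slot joins the existing open conflict rather than opening a second one, which is the dedup behavior a test for this criterion would want to pin down.
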
### F-6: Human Mirror renders for the test project

Per `human-mirror-rules.md`:

- `GET /mirror/{project}/overview` returns rendered markdown
- `GET /mirror/{project}/decisions` returns rendered markdown
- `GET /mirror/{project}/subsystems/{subsystem}` returns
  rendered markdown for at least one subsystem
- `POST /mirror/{project}/regenerate` triggers regeneration on
  demand
- Generated files appear under `/srv/storage/atocore/data/mirror/`
  with the "do not edit" header banner
- Disputed markers appear inline when conflicts exist
- Project-state overrides display with the `(curated)` annotation
- Output is deterministic (the same inputs produce the same
  bytes, suitable for diffing)

### F-7: Memory-to-entity graduation works for at least one type

Per `memory-vs-entities.md`:

- `POST /memory/{id}/graduate` exists
- Graduating a memory of type `adaptation` produces a Decision
  entity candidate with the memory's content as a starting point
- The original memory row stays at `status="graduated"` (a new
  status added by the engineering layer migration)
- The graduated memory has a forward pointer to the entity
  candidate's id
- Promoting the entity candidate does NOT delete the original
  memory
- The same graduation flow works for `project` → Requirement
  and `knowledge` → Fact entity types (test the path; doesn't
  have to be exhaustive)

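The graduation rules in F-7 can be sketched with plain dicts standing in for database rows. The type mapping comes from the bullets above; the id scheme, the `graduated_to_entity_id` pointer name, and the `source_memory_id` reference kind are assumptions made for this sketch.

```python
# Memory type -> entity type, as listed in F-7.
GRADUATION_TARGETS = {
    "adaptation": "Decision",
    "project": "Requirement",
    "knowledge": "Fact",
}


def graduate(memory):
    """Create an entity candidate from a memory row (a dict here).

    The memory is kept, marked "graduated", and given a forward pointer
    to the new candidate. Nothing is deleted.
    """
    target_type = GRADUATION_TARGETS.get(memory["type"])
    if target_type is None:
        raise ValueError(f"memory type {memory['type']!r} cannot graduate")
    if memory["status"] == "graduated":
        raise ValueError("memory has already graduated")

    candidate = {
        "id": f"ent-from-{memory['id']}",  # hypothetical id scheme
        "type": target_type,
        "content": memory["content"],      # starting point for the reviewer
        "project_id": memory.get("project_id"),
        "status": "candidate",
        # Hypothetical reference kind pointing back at the memory row.
        "source_refs": [{"source_memory_id": memory["id"]}],
    }
    memory["status"] = "graduated"
    memory["graduated_to_entity_id"] = candidate["id"]  # forward pointer
    return candidate
```

Because the function only marks the memory and never removes it, a later promotion of the candidate cannot affect the original row, which matches the "does NOT delete" bullet.
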
### F-8: Provenance chain is complete

For every active entity in the test project, the following must
be true:

- It links back to at least one source via `source_refs` (which
  is one or more of: `source_chunk_id`, `source_interaction_id`,
  `source_artifact_id` from KB import)
- The provenance chain can be walked from the entity to the
  underlying raw text (`source_chunks`) or external artifact
- Q-017 (the evidence query) returns at least one row for every
  active entity

If any active entity has no provenance, it's a bug — provenance
is mandatory at write time per the promotion rules.

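A provenance audit for F-8 can be as small as the sketch below. It assumes each entry in `source_refs` is a dict carrying one of the three reference kinds named above; that representation and the `missing_provenance` helper are assumptions of this sketch, not the stored schema.

```python
# Reference kinds named in F-8.
REF_KINDS = ("source_chunk_id", "source_interaction_id", "source_artifact_id")


def missing_provenance(entities):
    """Return the ids of active entities that have no usable source reference."""
    offenders = []
    for entity in entities:
        if entity["status"] != "active":
            continue  # only active entities are audited
        refs = entity.get("source_refs") or []
        has_ref = any(ref.get(kind) for ref in refs for kind in REF_KINDS)
        if not has_ref:
            offenders.append(entity["id"])
    return sorted(offenders)
```

An acceptance test would assert that this returns an empty list for the whole test project.
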
## Category 2 — Quality acceptance

### Q-1: All existing tests still pass

The full pre-V1 test suite (currently 160 tests) must still
pass. The V1 implementation may add new tests but cannot regress
any existing test.

### Q-2: V1 has its own test coverage

For each of F-1 through F-8 above, at least one automated test
exists that:

- exercises the happy path
- covers at least one error path
- runs in CI in under 10 seconds (no real network, no real LLM)

The full V1 test suite should be under 30 seconds total runtime
to keep the development loop fast.

### Q-3: Conflict invariants are enforced by tests

Specific tests must demonstrate:

- Two contradictory KB exports produce a conflict (not silent
  overwrite)
- A reviewer can't accidentally promote both members of an open
  conflict to active without resolving the conflict first
- The "flag, never block" rule holds — writes still succeed
  even when they create a conflict

### Q-4: Trust hierarchy is enforced by tests

Specific tests must demonstrate:

- Entity candidates can never appear in context packs
- Reinforcement only touches active memories (already covered
  by Phase 9 Commit B tests, but the same property must hold
  for entities once they exist)
- Nothing ever writes to `project_state` automatically
- Candidates can never satisfy Q-005 (only active entities count)

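The first Q-4 invariant can be pinned down with a test shaped like the sketch below. `build_context_pack` is a hypothetical stand-in for the real context-pack builder; the point is the shape of the assertion, not the selection logic.

```python
def build_context_pack(entities, project_id):
    """Hypothetical selector: only active entities of the project qualify."""
    return [
        e for e in entities
        if e["project_id"] == project_id and e["status"] == "active"
    ]


def test_candidates_never_reach_context_pack():
    entities = [
        {"id": "e1", "project_id": "p05-interferometer", "status": "active"},
        {"id": "e2", "project_id": "p05-interferometer", "status": "candidate"},
        {"id": "e3", "project_id": "p05-interferometer", "status": "invalid"},
        {"id": "e4", "project_id": "p04-gigabit", "status": "active"},
    ]
    pack = build_context_pack(entities, "p05-interferometer")
    assert [e["id"] for e in pack] == ["e1"]
    assert all(e["status"] == "active" for e in pack)
```
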
### Q-5: The Human Mirror is reproducible

A golden-file test exists for at least one Mirror page. Updating
the golden file is a normal part of template work (single
command, well-documented). The test fails if the renderer
produces different bytes for the same input, catching
non-determinism.

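A deterministic renderer and a golden-file check can be sketched as follows. The banner wording, the page layout, and the `render_overview` and `matches_golden` helpers are all hypothetical. What the sketch illustrates is sorting every collection before rendering and keeping timestamps out of the output so that the bytes depend only on the input.

```python
from pathlib import Path

# Hypothetical banner text; the real wording belongs to the Mirror templates.
BANNER = "<!-- GENERATED by the Human Mirror: do not edit -->"


def render_overview(project, entities):
    """Render a tiny overview page as bytes, independent of input order."""
    lines = [BANNER, f"# {project} overview", ""]
    ordered = sorted(entities, key=lambda e: (e["type"], e["name"], e["id"]))
    for ent in ordered:
        lines.append(f"- {ent['type']}: {ent['name']} [{ent['id']}]")
    return ("\n".join(lines) + "\n").encode("utf-8")


def matches_golden(golden_path, actual, update=False):
    """Compare rendered bytes to the golden file, or rewrite it on request."""
    golden_path = Path(golden_path)
    if update:
        golden_path.write_bytes(actual)
        return True
    return golden_path.read_bytes() == actual
```

The `update=True` path is the "single command" from Q-5 in miniature: a template change is followed by one deliberate regeneration of the golden file, and the resulting diff is reviewed like any other change.
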
### Q-6: Killer correctness queries pass against real-ish data

The test bed for Q-006, Q-009, Q-011 is not synthetic. The
implementation must seed the test project with at least:

- One Requirement that has a satisfying Component (Q-006 should
  not flag it)
- One Requirement with no satisfying Component (Q-006 must flag it)
- One Decision based on an Assumption flagged as `needs_review`
  (Q-009 must flag the Decision)
- One ValidationClaim with at least one supporting Result
  (Q-011 should not flag it)
- One ValidationClaim with no supporting Result (Q-011 must flag it)

These five seed cases run as a single integration test that
exercises the killer correctness queries against actual
representative data.

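The five seed cases can be expressed as a small in-memory graph, which also shows the logical shape of the three queries. This is only a sketch: the edge names (`satisfies`, `based_on`, `supports`), the `flag` field, and the helper functions are assumptions, and the real queries are the ones defined in `engineering-query-catalog.md` running against the database.

```python
def active_ids(entities):
    return {e["id"] for e in entities if e["status"] == "active"}


def unlinked(entities, edges, entity_type, edge_type):
    """Q-006 / Q-011 shape: active entities of entity_type that no active
    entity points at through an edge of edge_type."""
    live = active_ids(entities)
    covered = {dst for (src, kind, dst) in edges if kind == edge_type and src in live}
    return sorted(
        e["id"] for e in entities
        if e["type"] == entity_type and e["id"] in live and e["id"] not in covered
    )


def decisions_on_flagged_assumptions(entities, edges):
    """Q-009 shape: decisions resting on an assumption marked needs_review."""
    flagged = {
        e["id"] for e in entities
        if e["type"] == "Assumption" and e.get("flag") == "needs_review"
    }
    return sorted({src for (src, kind, dst) in edges if kind == "based_on" and dst in flagged})


def _e(id_, type_, **extra):
    return {"id": id_, "type": type_, "status": "active", **extra}


# The five seed cases from Q-6 (ids are made up for the sketch).
SEED_ENTITIES = [
    _e("req-ok", "Requirement"),
    _e("req-orphan", "Requirement"),
    _e("cmp-1", "Component"),
    _e("asm-1", "Assumption", flag="needs_review"),
    _e("dec-1", "Decision"),
    _e("claim-ok", "ValidationClaim"),
    _e("claim-bare", "ValidationClaim"),
    _e("res-1", "Result"),
]
SEED_EDGES = [
    ("cmp-1", "satisfies", "req-ok"),
    ("dec-1", "based_on", "asm-1"),
    ("res-1", "supports", "claim-ok"),
]
```

Each query is checked for both a positive and a negative case, so a query that flags everything or nothing fails the test.
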
## Category 3 — Operational acceptance

### O-1: Migration is safe and reversible

The V1 schema migration (adding the `entities`, `relationships`,
`conflicts`, `conflict_members` tables, plus `mirror_regeneration_failures`)
must:

- run cleanly against a production-shape database
- be implemented via the same `_apply_migrations` pattern as
  Phase 9 (additive only, idempotent, safe to run twice)
- be tested by spinning up a fresh DB AND running against a
  copy of the live Dalidou DB taken from a backup

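The "additive only, idempotent, safe to run twice" property can be illustrated with the standard-library `sqlite3` module. Whether AtoCore actually uses SQLite, and what the real columns are, is not stated here, so both are assumptions; `apply_v1_migration` is a hypothetical stand-in for the V1 portion of `_apply_migrations`.

```python
import sqlite3

# Table names come from O-1; the column definitions are placeholders.
V1_TABLES = {
    "entities": "id TEXT PRIMARY KEY, type TEXT NOT NULL, status TEXT NOT NULL",
    "relationships": "id TEXT PRIMARY KEY, src_id TEXT, dst_id TEXT, kind TEXT",
    "conflicts": "id TEXT PRIMARY KEY, status TEXT NOT NULL",
    "conflict_members": "conflict_id TEXT, entity_id TEXT",
    "mirror_regeneration_failures": "id INTEGER PRIMARY KEY, project_id TEXT, error TEXT",
}


def apply_v1_migration(conn):
    """Create the V1 tables if absent. Touches no existing table."""
    for name, columns in V1_TABLES.items():
        conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({columns})")
    conn.commit()


def table_names(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {row[0] for row in rows.fetchall()}
```

Running the function twice against a database that already holds data, then checking that the old rows are intact, covers both the idempotency and the additive-only requirements in one test.
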
### O-2: Backup and restore still work

The backup endpoint must include the new tables. A restore drill
on the test project must:

- successfully back up the V1 entity state via
  `POST /admin/backup`
- successfully validate the snapshot
- successfully restore from the snapshot per
  `docs/backup-restore-procedure.md`
- pass post-restore verification including a Q-001 query against
  the test project

The drill must be performed once before V1 is declared done.

### O-3: Performance bounds

These are starting bounds; tune later if real usage shows
problems:

- Single-entity write (`POST /entities/...`): under 100ms p99
  on the production Dalidou hardware
- Single Q-001 / Q-005 / Q-008 query: under 500ms p99 against
  a project with up to 1000 entities
- Mirror regeneration of one project overview: under 5 seconds
  for a project with up to 1000 entities
- Conflict detector at write time: adds no more than 50ms p99
  to a write that doesn't actually produce a conflict

These bounds are not tested by automated benchmarks in V1 (that
would be over-engineering). They are sanity-checked by the
developer running the operations against the test project.

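For the manual sanity check, a throwaway script along these lines is enough. It is a sketch, not a benchmark harness: `measure_ms` times any callable the developer supplies (for example, a function that issues one entity write), and `p99` uses the nearest-rank definition of the 99th percentile. Both helper names are made up here.

```python
import time


def p99(samples_ms):
    """Nearest-rank 99th percentile of a non-empty sequence of timings."""
    ordered = sorted(samples_ms)
    if not ordered:
        raise ValueError("no samples")
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def measure_ms(operation, runs=200):
    """Call operation() `runs` times and return per-call wall time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples
```

With a few hundred runs, `p99(measure_ms(do_write))` gives a number that can be compared directly to the 100ms write bound, where `do_write` is whatever callable the developer uses to perform one write against the test project.
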
### O-4: No new manual ops burden

V1 should not introduce any new "you have to remember to run X
every day" requirement. Specifically:

- Mirror regeneration is automatic (debounced async + daily
  refresh), no manual cron entry needed
- Conflict detection is automatic at write time, no manual sweep
  needed in V1 (the nightly sweep is V2)
- Backup retention cleanup is **still** an open follow-up from
  the operational baseline; V1 does not block on it

### O-5: No regressions in Phase 9 reflection loop

The capture, reinforcement, and extraction loop from Phase 9
A/B/C must continue to work end to end with the engineering
layer in place. Specifically:

- Memories whose types are NOT in the engineering layer
  (identity, preference, episodic) keep working exactly as
  before
- Memories whose types ARE in the engineering layer (project,
  knowledge, adaptation) can still be created by hand or by
  extraction; the deprecation rule from `memory-vs-entities.md`
  ("no new writes after V1 ships") is implemented as a
  configurable warning, not a hard block, so existing
  workflows aren't disrupted

## Category 4 — Documentation acceptance

### D-1: Per-entity-type spec docs

Each of the 12 V1 entity types has a short spec doc under
`docs/architecture/entities/` covering:

- the entity's purpose
- its required and optional fields
- its lifecycle quirks (if any beyond the standard
  candidate/active/superseded/invalid)
- which queries it appears in (cross-reference to the catalog)
- which relationship types reference it

These docs can be terse — a page each, mostly bullet lists.
Their purpose is to make the entity model legible to a future
maintainer, not to be reference manuals.

### D-2: KB-CAD and KB-FEM export schema docs

`docs/architecture/kb-cad-export-schema.md` and
`docs/architecture/kb-fem-export-schema.md` are written and
match the implemented validators.

### D-3: V1 release notes

A `docs/v1-release-notes.md` file summarizes:

- What V1 added (entities, relationships, conflicts, mirror,
  ingest endpoints)
- What V1 deferred (auto-promotion, BOM/cost/manufacturing
  entities, NX direct integration, cross-project rollups)
- The migration story for existing memories (graduation flow)
- Known limitations and the V2 roadmap pointers

### D-4: master-plan-status.md and current-state.md updated

Both top-level status docs reflect V1's completion:

- Phase 6 (AtoDrive) and the engineering layer are explicitly
  marked as separate tracks
- The engineering planning sprint section is marked complete
- Phase 9 stays at "baseline complete" (V1 doesn't change Phase 9)
- The engineering layer V1 is added as its own line item

## What V1 explicitly does NOT need to do

To prevent scope creep, here is the negative list. None of the
following are V1 acceptance criteria:

- **No LLM extractor.** The Phase 9 C rule-based extractor is
  the entity extractor for V1 too, just with new rules added for
  entity types.
- **No auto-promotion of candidates.** Per `promotion-rules.md`.
- **No write-back to KB-CAD or KB-FEM.** Per
  `tool-handoff-boundaries.md`.
- **No multi-user / per-reviewer auth.** Single-user assumed.
- **No real-time UI.** API + Mirror markdown is the V1 surface.
  A web UI is V2+.
- **No cross-project rollups.** Per `human-mirror-rules.md`.
- **No time-travel queries** (Q-015 stays v1-stretch).
- **No nightly conflict sweep.** Synchronous detection only in V1.
- **No incremental Chroma snapshots.** The current full-copy
  approach in `backup-restore-procedure.md` is fine for V1.
- **No retention cleanup script.** Still an open follow-up.
- **No backup encryption.** Still an open follow-up.
- **No off-Dalidou backup target.** Still an open follow-up.

## How to use this document during implementation

When the implementation sprint begins:

1. Read this doc once, top to bottom
2. Pick the test project (probably p05-interferometer because
   the optical/structural domain has the cleanest entity model)
3. For each section, write the test or the implementation, in
   roughly the order: F-1 → F-2 → F-3 → F-4 → F-5 → F-6 → F-7 → F-8
4. Each acceptance criterion's test should be written **before
   or alongside** the implementation, not after
5. Run the full test suite at every commit
6. When every box is checked, write D-3 (release notes), update
   D-4 (status docs), and call V1 done

The implementation sprint should not touch anything outside the
scope listed here. If a desire arises to add something not in
this doc, that's a V2 conversation, not a V1 expansion.

## Anticipated friction points

These are the things I expect will be hard during implementation:

1. **The graduation flow (F-7)** is the most cross-cutting
   change because it touches the existing memory module.
   Worth doing it last so the memory module is stable for
   all the V1 entity work first.
2. **The Mirror's deterministic-output requirement (Q-5)** will
   bite if the implementer iterates over Python dicts without
   sorting. Plan to use `sorted()` literally everywhere.
3. **Conflict detection (F-5)** has subtle correctness traps:
   the slot key extraction must be stable, the dedup-of-existing-conflicts
   logic must be right, and the synchronous detector must not
   slow writes meaningfully (Q-3 / O-3 cover this, but watch).
4. **Provenance backfill** for entities that come from the
   existing memory layer via graduation (F-7) is the trickiest
   part: the original memory may not have had a strict
   `source_chunk_id`, in which case the graduated entity also
   doesn't have one. The implementation needs an "orphan
   provenance" allowance for graduated entities, with a
   warning surfaced in the Mirror.

These aren't blockers, just the parts of the V1 spec I'd
attack with extra care.

## TL;DR

- Engineering V1 is done when every box in this doc is checked
  against one chosen active project
- Functional: 8 criteria covering entities, queries, ingest,
  review queue, conflicts, mirror, graduation, provenance
- Quality: 6 criteria covering tests, golden files, killer
  correctness, trust enforcement
- Operational: 5 criteria covering migration safety, backup
  drill, performance bounds, no new manual ops, Phase 9 not
  regressed
- Documentation: 4 criteria covering entity specs, KB schema
  docs, release notes, top-level status updates
- Negative list: a clear set of things V1 deliberately does
  NOT need to do, to prevent scope creep
- The implementation sprint follows this doc as a checklist