verify-work: add cold-start smoke test auto-injection for server/db phases #904

## Problem

`/gsd:verify-work` extracts tests from SUMMARY.md accomplishments, which describe what was **built**, not what **works after a restart**. This creates a systematic blind spot for cold-start bugs.

### Real-world example

Phase 32 of our project passed 9/9 UAT tests, but shipped a P0 bug:

- `createTableMs: 50` in dynalite caused a race condition: tables were not yet ACTIVE when the seed ran
- A silent `try/catch` in `server.ts` swallowed the error
- Server started green with **zero data**
- Every UI interaction was broken

**Why verify-work missed it:** All 9 tests ran against a warm server that had already been seeded in a prior session, and the race only manifests on a cold start. Eight of the nine tests were code-review checks (file exists, TypeScript compiles, logic branches correctly). The one runtime test reused the already-running server.

## Root Cause (3 compounding failures)

1. **Test extraction only reads SUMMARYs**: SUMMARYs describe build claims, not runtime behavior
2. **No runtime vs code-review distinction**: sub-agents default to reading code, not executing it
3. **No destructive reset test**: no test killed the server, wiped state, and restarted from scratch

## Proposed Enhancement

### 1. Cold-start test auto-injection (highest impact, lowest effort)

In the `extract_tests` step, pattern-match on modified files:

```
IF phase modifies any of: [server.ts, database/*, seed/*, index.ts, startup*, config.*]
THEN auto-add test:
  name: "Cold Start Smoke Test"
  expected: "Kill server, clear state, restart from scratch. 
             Server boots without errors, seed completes, 
             primary query returns data."
  type: runtime
  destructive: true
```
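A minimal TypeScript sketch of this rule, assuming the workflow has the list of files a phase modified and holds its UAT tests as plain objects. `UatTest`, `COLD_START_TRIGGERS`, `globToRegExp`, and `injectColdStartTest` are hypothetical names, not part of GSD. Patterns are matched at a path-segment boundary, and `*` is assumed to mean "any characters", so `database/*` also covers nested files.

```typescript
// Hypothetical UAT test shape; fields mirror the pseudocode above.
interface UatTest {
  name: string;
  expected: string;
  type: "runtime" | "code-review";
  destructive: boolean;
}

// Trigger patterns from the proposal; `*` matches any run of characters.
const COLD_START_TRIGGERS = ["server.ts", "database/*", "seed/*", "index.ts", "startup*", "config.*"];

const COLD_START_TEST: UatTest = {
  name: "Cold Start Smoke Test",
  expected:
    "Kill server, clear state, restart from scratch. " +
    "Server boots without errors, seed completes, primary query returns data.",
  type: "runtime",
  destructive: true,
};

// Turn a simple glob into a regex anchored at a path-segment boundary,
// so "server.ts" matches "src/server.ts" but not "src/myserver.ts".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`(^|/)${escaped}$`);
}

// Append the cold-start test if any modified file matches a trigger pattern
// and the suite does not already contain it.
function injectColdStartTest(modifiedFiles: string[], tests: UatTest[]): UatTest[] {
  const matchers = COLD_START_TRIGGERS.map(globToRegExp);
  const triggered = modifiedFiles
    .map((file) => file.replace(/\\/g, "/")) // normalize Windows separators
    .some((file) => matchers.some((re) => re.test(file)));
  const alreadyPresent = tests.some((t) => t.name === COLD_START_TEST.name);
  return triggered && !alreadyPresent ? [...tests, { ...COLD_START_TEST }] : tests;
}
```

The injection is idempotent: re-running extraction on the same phase does not add a second smoke test. Anchoring at `/` keeps `server.ts` from matching `myserver.ts`, though `index.ts` still matches every index file in the tree, which errs on the side of injecting the test.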

### 2. Runtime vs code-review test tagging (medium effort)

Tag each test as `type: runtime` or `type: code-review` in the UAT file, and enforce a minimum share of runtime tests (e.g., 30%). Sub-agents, which otherwise default to reading code, should be told explicitly: "run the code, don't just read it."
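One way to enforce the ratio is a check over the tagged tests before the UAT run starts. This sketch assumes each test carries the proposed `type` tag; `TaggedTest` and `checkRuntimeRatio` are hypothetical names, and the 30% default is just the example threshold above.

```typescript
// Proposed tag values for each UAT test.
type TestType = "runtime" | "code-review";

interface TaggedTest {
  name: string;
  type: TestType;
}

// Share of runtime tests in a suite and whether it meets the minimum.
// An empty suite is treated as failing, since it verifies nothing at runtime.
function checkRuntimeRatio(tests: TaggedTest[], minRatio = 0.3): { ratio: number; ok: boolean } {
  if (tests.length === 0) return { ratio: 0, ok: false };
  const runtime = tests.filter((t) => t.type === "runtime").length;
  const ratio = runtime / tests.length;
  return { ratio, ok: ratio >= minRatio };
}
```

Applied to the Phase 32 suite (one runtime test out of nine), this yields a ratio of about 0.11 and `ok: false`, so the run would be flagged before any test executes.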

### 3. Destructive reset before E2E tests (medium effort)

For tests tagged `destructive: true`, the workflow should kill running services and clear ephemeral state before executing.
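A sketch of how the workflow might order tests and reset around destructive ones. The hook names (`stopServices`, `clearState`), `planRun`, and `clearDirectory` are hypothetical; what "stop" and "clear" mean is project-specific. Running non-destructive tests first is an added design choice, so the reset cannot wipe state those tests rely on.

```typescript
import { mkdirSync, rmSync } from "fs";

interface PlannedTest {
  name: string;
  destructive: boolean;
}

// Hypothetical project-specific hooks supplied to the workflow.
interface ResetHooks {
  stopServices: () => void; // e.g. kill the dev server and any local DB process
  clearState: () => void; // e.g. delete on-disk DB data, caches, seed markers
}

// Example clearState helper for projects that persist local state in a
// directory: remove it entirely, then recreate it empty.
function clearDirectory(dir: string): void {
  rmSync(dir, { recursive: true, force: true });
  mkdirSync(dir, { recursive: true });
}

// Run non-destructive tests first against the warm environment, then stop
// services and clear state before *each* destructive test so every one of
// them starts cold. The destructive test itself restarts the server.
function planRun(tests: PlannedTest[], hooks: ResetHooks, run: (t: PlannedTest) => void): void {
  for (const warm of tests.filter((test) => !test.destructive)) run(warm);
  for (const cold of tests.filter((test) => test.destructive)) {
    hooks.stopServices();
    hooks.clearState();
    run(cold);
  }
}
```

A project keeping its local database in a directory could pass something like `clearState: () => clearDirectory("./.local-db")`, where the path is purely illustrative. For an in-memory store, stopping the process may already be enough.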

## Summary

The core insight: **verify-work trusts SUMMARYs (what was built) instead of testing reality (what works after restart)**. This is the AI-agent equivalent of "works on my machine": the agent's "machine" was a warm server with pre-seeded data.

---

*GSD v1.22.0, reported from a real Phase 32 UAT session*
