BDD test suite has ~12% failure rate in CI due to flaky lifecycle tests #309

## Summary

The BDD test suite running with `--parallel 50` in CI has become flaky, with approximately 12% of CI runs failing (4 failures out of 33 recent runs). Tests pass reliably in isolation locally but fail intermittently under high parallelism.

## Affected Scenarios

| Scenario | Failures | Failure Pattern |
|----------|----------|-----------------|
| `mcpserver-tool-call-lifecycle` | 2 | Timeout at ~31-36s |
| `mcpserver-streamable-http-tool-call-lifecycle` | 1 | Timeout at ~31s |
| `oauth-sso-state-sync-after-login` | 1 | Timeout at ~10.8s |

## Common Characteristics

1. **All failing tests use `wait_for_state: 30s`** for polling state transitions
2. **All involve service lifecycle transitions** (start/stop/reconnect)
3. **Tests pass locally in isolation** but fail under high parallelism (50 workers)
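4. **Timeout pattern**: The three `mcpserver-*` failures time out at ~31-36s, just past the 30s `wait_for_state` timeout; the `oauth-sso-state-sync-after-login` failure at ~10.8s does not fit this pattern and may have a different cause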

## Root Cause Hypotheses

1. **Resource contention under high parallelism**: 50 parallel muster instances competing for CPU/ports
2. **State transition timing**: Service state changes may be slower under CI load
3. **Mock server startup delays**: HTTP/SSE mock servers may take longer to initialize under contention
4. **Polling interval too coarse**: A 1s poll interval with a 30s timeout allows only about 30 attempts

## Evidence

Recent CI failures:
- https://github.com/giantswarm/muster/actions/runs/21341791094 - `mcpserver-streamable-http-tool-call-lifecycle`
- https://github.com/giantswarm/muster/actions/runs/21341603315 - `oauth-sso-state-sync-after-login`
- https://github.com/giantswarm/muster/actions/runs/21341509493 - `mcpserver-tool-call-lifecycle`
- https://github.com/giantswarm/muster/actions/runs/21340267184 - `mcpserver-tool-call-lifecycle`

## Proposed Investigation

### Phase 1: Reliable Reproduction
- [ ] Create stress test script to run N iterations locally with 50 parallel workers
- [ ] Establish baseline failure rate locally vs CI
- [ ] Identify if failures are random or specific to certain scenarios

### Phase 2: Instrumentation
- [ ] Add timing diagnostics to `muster_manager.go` (port allocation, process startup, mock server init)
- [ ] Add timing diagnostics to `test_runner.go` (`wait_for_state` polling, state transitions)
- [ ] Collect detailed logs from failed CI runs

### Phase 3: Fixes to Consider
- [ ] Increase `wait_for_state` timeout from 30s to 60s for lifecycle tests
- [ ] Implement exponential backoff in state polling (start at 500ms, backoff to 5s)
- [ ] Add startup jitter to prevent thundering herd in parallel execution
- [ ] Consider reducing CI parallelism from 50 to 25

## Acceptance Criteria

- [ ] Test suite failure rate in CI < 1% over 20+ consecutive runs
- [ ] No changes to test behavior or coverage
- [ ] Root cause documented for future reference

## Related

- Test framework: `internal/testing/test_runner.go`
- Instance manager: `internal/testing/muster_manager.go`
- CI workflow: `.github/workflows/ci.yaml`
