Improve test coverage from 55.7% to 75%+ through strategic testing improvements #238

@drernie

Description

Problem

Current test coverage is 55.7% combined (37.2% unit, 29.0% integration, 32.0% e2e), despite having extensive tests (~16k lines of test code for ~27k lines of source). Analysis reveals three key issues preventing higher coverage:

Root Causes

  1. Entire modules untested: ~1,400 lines (31% of gap)

    • visualization/ module: 1,296 lines with 0% coverage
    • tools/stack_buckets.py: 95 lines with 0% coverage (may be dead code)
  2. Over-mocking in unit tests: ~1,500 lines (36% of gap)

    • Unit tests mock external dependencies (quilt3, boto3, etc.) which is correct
    • BUT: many tests never execute real code paths; they only verify that mocks are called correctly
    • Example: test_quilt_service.py has 109 mock statements for a 775-line module
    • Result: Unit tests show 82.6% coverage but mostly test mock interactions, not actual logic
  3. Siloed test coverage: Only 18.5% overlap between test suites

    • Tests are highly specialized by category (unit/integration/e2e)
    • Different test types hit completely different code paths
    • Examples:
      • quilt_service.py: 82.6% unit, 21.5% integration → only 83.3% combined
      • buckets.py: 7.9% unit, 87.4% integration → relies entirely on integration tests
      • packages.py: 13.9% unit, 41.0% integration, 47.7% e2e → 81.4% combined
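Combined coverage is the union of the lines each suite executes, so specialized suites with low overlap can each look weak while still adding up, and overlapping suites add nothing. A toy illustration with hypothetical line sets for a 100-line module:

```python
# Sketch: combined coverage is the UNION of lines hit by each suite.
# These line sets are illustrative, not measured from this repo.
unit_lines = set(range(1, 60))          # unit suite hits lines 1-59
integration_lines = set(range(40, 75))  # integration suite hits lines 40-74

combined = unit_lines | integration_lines   # lines hit by either suite
overlap = unit_lines & integration_lines    # lines hit by both

total = 100
print(f"unit: {len(unit_lines) / total:.0%}")                # 59%
print(f"integration: {len(integration_lines) / total:.0%}")  # 35%
print(f"combined: {len(combined) / total:.0%}")              # 74%
print(f"overlap: {len(overlap) / total:.0%}")                # 20%
```

This is why the issue treats combined coverage as the primary metric: two suites that retread the same lines inflate per-suite numbers without moving the combined figure.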

Goals

Achieve 75%+ combined coverage through strategic improvements, focusing on:

  • ✅ Testing behavior, not mocking
  • ✅ Combined coverage as primary metric (not individual test categories)
  • ✅ Removing or testing dead code

Proposed Changes

Phase 1: Low-Hanging Fruit (Expected: +10-15% coverage)

1.1 Test or Remove Visualization Module

  • Audit src/quilt_mcp/visualization/ (1,296 lines, 0% coverage)
  • Determine if this module is actively used
  • If used: Add test suite for critical paths
  • If unused: Remove or mark as experimental/untested

1.2 Test or Remove stack_buckets.py

  • Audit src/quilt_mcp/tools/stack_buckets.py (95 lines, 0% coverage)
  • Check if this is dead code or just untested
  • Either add tests or remove if obsolete

Phase 2: Reduce Over-Mocking (Expected: +5-10% coverage)

2.1 Refactor High-Mock Test Files

Focus on these files with excessive mocking:

  Test File              Mock Count  Source Coverage   Issue
  test_quilt_service.py  109         82.6% unit only   Mocks bypass real code
  test_utils.py          48          53.6% combined    Over-mocked
  test_tabulator.py      31          37.7% combined    Over-mocked
  test_selector_fn.py    23          Unknown           Over-mocked

Strategy:

  • Keep unit tests for pure logic (validation, parsing, formatting)
  • Move integration-heavy tests to integration suite
  • Use real implementations with fake data instead of mocking everything

Example Refactor:

# BEFORE: Unit test that only verifies mock wiring, not real code
from unittest.mock import Mock

def test_get_catalog_config():
    mock_session = Mock()
    mock_response = Mock()
    mock_response.json.return_value = fake_config
    mock_session.get.return_value = mock_response
    # ... rest of test asserts on mock calls; real parsing never runs

# AFTER: Integration test with a real HTTP client (httpx) and
# mocked transport (respx)
import httpx
import respx

def test_get_catalog_config_integration():
    # Real HTTP client; only the network responses are stubbed
    with respx.mock:
        respx.get("https://catalog.com/config.json").mock(
            return_value=httpx.Response(200, json=fake_config)
        )
        result = service.get_catalog_config("https://catalog.com")
        # Exercises ACTUAL parsing/validation logic, not just mock calls
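For the pure-logic cases the strategy keeps in the unit suite, no mocks are needed at all. A hypothetical sketch (`normalize_catalog_url` is an illustrative helper, not an actual API in this repo):

```python
# Hypothetical pure-logic code plus a mock-free unit test for it.
def normalize_catalog_url(url: str) -> str:
    """Require an https scheme and strip trailing slashes."""
    if not url.startswith("https://"):
        raise ValueError(f"catalog URL must use https: {url!r}")
    return url.rstrip("/")

def test_normalize_catalog_url():
    # Exercises the real function directly; nothing to mock
    assert normalize_catalog_url("https://catalog.com/") == "https://catalog.com"

def test_normalize_catalog_url_rejects_http():
    import pytest
    with pytest.raises(ValueError):
        normalize_catalog_url("http://catalog.com")
```

Validation, parsing, and formatting code like this is exactly where unit tests earn real combined coverage per line of test code.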

Phase 3: Add Strategic Integration Tests (Expected: +5-10% coverage)

Focus on modules with high unit-only coverage but low integration coverage:

  Module                  Unit    Integration   Gap
  error_recovery.py       59.9%   0.0%          127 lines unit-only
  workflow_service.py     66.5%   18.1%         91 lines unit-only
  governance_service.py   59.4%   12.9%         102 lines unit-only
  data_visualization.py   55.6%   13.1%         130 lines unit-only

Strategy:

  • Add integration tests that exercise real workflows
  • Focus on error handling paths
  • Test integration points between services
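The shape of such a test: drive a real error-handling path end to end and assert on recovery behavior, not on mock calls. A sketch, where `retry_with_backoff` is an illustrative stand-in for recovery logic; the actual API in error_recovery.py may differ:

```python
# Hypothetical Phase 3 test: exercise a real retry path, no mocks.
def retry_with_backoff(fn, attempts=3):
    """Call fn, retrying on ConnectionError up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError as err:
            last_error = err
    raise last_error

def test_retry_recovers_from_transient_failures():
    calls = {"n": 0}
    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient")  # fails twice, then succeeds
        return "ok"
    assert retry_with_backoff(flaky) == "ok"
    assert calls["n"] == 3  # the real retry loop actually executed
```

Because the failure is injected as behavior (a flaky callable) rather than as a mock of the module under test, the retry logic itself is what gets covered.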

Success Metrics

  • Combined coverage reaches 75%+
  • No entire modules with 0% coverage (except marked as exempt)
  • Reduced mocking ratio: <0.5 mocks per test function (currently 1.4)
  • Individual test suite thresholds remain low (current: unit 30%, integration 25%, e2e 28%)

Non-Goals

  • ❌ Achieving 100% coverage
  • ❌ High individual test category coverage (unit/integration/e2e)
  • ❌ Testing every edge case
  • ❌ Refactoring entire test suite

Implementation Plan

  1. Phase 1 (1-2 days): Quick wins by addressing untested modules
  2. Phase 2 (3-5 days): Refactor high-mock test files incrementally
  3. Phase 3 (2-3 days): Add strategic integration tests for key modules
  4. Continuous: Monitor coverage in CI, prevent regression

References

  • Coverage analysis: build/test-results/coverage-analysis.csv
  • Current thresholds: scripts/tests/coverage_required.yaml
  • Coverage philosophy documented in threshold file
