Improve test coverage from 55.7% to 75%+ through strategic testing improvements #238

@drernie

Description

Problem

Current test coverage is 55.7% combined (37.2% unit, 29.0% integration, 32.0% e2e), despite having extensive tests (~16k lines of test code for ~27k lines of source). Analysis reveals three key issues preventing higher coverage:

Root Causes

  1. Entire modules untested: ~1,400 lines (31% of gap)

    • visualization/ module: 1,296 lines with 0% coverage
    • tools/stack_buckets.py: 95 lines with 0% coverage (may be dead code)
  2. Over-mocking in unit tests: ~1,500 lines (36% of gap)

    • Unit tests mock external dependencies (quilt3, boto3, etc.) which is correct
    • BUT: many tests never execute real code paths; they only verify that mocks are called correctly
    • Example: test_quilt_service.py has 109 mock statements for a 775-line module
    • Result: Unit tests show 82.6% coverage but mostly test mock interactions, not actual logic
  3. Siloed test coverage: Only 18.5% overlap between test suites

    • Tests are highly specialized by category (unit/integration/e2e)
    • Different test types hit completely different code paths
    • Examples:
      • quilt_service.py: 82.6% unit, 21.5% integration → only 83.3% combined
      • buckets.py: 7.9% unit, 87.4% integration → relies entirely on integration tests
      • packages.py: 13.9% unit, 41.0% integration, 47.7% e2e → 81.4% combined
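Combined coverage is the union of the lines each suite executes, so specialized suites with low overlap can each look weak while still adding up, and overlapping suites add nothing. A toy illustration with hypothetical line sets for a 100-line module:

```python
# Sketch: combined coverage is the UNION of lines hit by each suite.
# These line sets are illustrative, not measured from this repo.
unit_lines = set(range(1, 60))          # unit suite hits lines 1-59
integration_lines = set(range(40, 75))  # integration suite hits lines 40-74

combined = unit_lines | integration_lines   # lines hit by either suite
overlap = unit_lines & integration_lines    # lines hit by both

total = 100
print(f"unit: {len(unit_lines) / total:.0%}")                # 59%
print(f"integration: {len(integration_lines) / total:.0%}")  # 35%
print(f"combined: {len(combined) / total:.0%}")              # 74%
print(f"overlap: {len(overlap) / total:.0%}")                # 20%
```

This is why the issue treats combined coverage as the primary metric: two suites that retread the same lines inflate per-suite numbers without moving the combined figure.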

Goals

Achieve 75%+ combined coverage through strategic improvements, focusing on:

  • ✅ Testing behavior, not mocking
  • ✅ Combined coverage as primary metric (not individual test categories)
  • ✅ Removing or testing dead code

Proposed Changes

Phase 1: Low-Hanging Fruit (Expected: +10-15% coverage)

1.1 Test or Remove Visualization Module

  • Audit src/quilt_mcp/visualization/ (1,296 lines, 0% coverage)
  • Determine if this module is actively used
  • If used: Add test suite for critical paths
  • If unused: Remove or mark as experimental/untested

1.2 Test or Remove stack_buckets.py

  • Audit src/quilt_mcp/tools/stack_buckets.py (95 lines, 0% coverage)
  • Check if this is dead code or just untested
  • Either add tests or remove if obsolete

Phase 2: Reduce Over-Mocking (Expected: +5-10% coverage)

2.1 Refactor High-Mock Test Files

Focus on these files with excessive mocking:

  Test File              Mock Count  Source Coverage   Issue
  test_quilt_service.py  109         82.6% unit only   Mocks bypass real code
  test_utils.py          48          53.6% combined    Over-mocked
  test_tabulator.py      31          37.7% combined    Over-mocked
  test_selector_fn.py    23          Unknown           Over-mocked

Strategy:

  • Keep unit tests for pure logic (validation, parsing, formatting)
  • Move integration-heavy tests to integration suite
  • Use real implementations with fake data instead of mocking everything

Example Refactor:

# BEFORE: Unit test that only verifies mock wiring, not real code
from unittest.mock import Mock

def test_get_catalog_config():
    mock_session = Mock()
    mock_response = Mock()
    mock_response.json.return_value = fake_config
    mock_session.get.return_value = mock_response
    # ... rest of test asserts on mock calls; real parsing never runs

# AFTER: Integration test with a real HTTP client (httpx) and
# mocked transport (respx)
import httpx
import respx

def test_get_catalog_config_integration():
    # Real HTTP client; only the network responses are stubbed
    with respx.mock:
        respx.get("https://catalog.com/config.json").mock(
            return_value=httpx.Response(200, json=fake_config)
        )
        result = service.get_catalog_config("https://catalog.com")
        # Exercises ACTUAL parsing/validation logic, not just mock calls
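For the pure-logic cases the strategy keeps in the unit suite, no mocks are needed at all. A hypothetical sketch (`normalize_catalog_url` is an illustrative helper, not an actual API in this repo):

```python
# Hypothetical pure-logic code plus a mock-free unit test for it.
def normalize_catalog_url(url: str) -> str:
    """Require an https scheme and strip trailing slashes."""
    if not url.startswith("https://"):
        raise ValueError(f"catalog URL must use https: {url!r}")
    return url.rstrip("/")

def test_normalize_catalog_url():
    # Exercises the real function directly; nothing to mock
    assert normalize_catalog_url("https://catalog.com/") == "https://catalog.com"

def test_normalize_catalog_url_rejects_http():
    import pytest
    with pytest.raises(ValueError):
        normalize_catalog_url("http://catalog.com")
```

Validation, parsing, and formatting code like this is exactly where unit tests earn real combined coverage per line of test code.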

Phase 3: Add Strategic Integration Tests (Expected: +5-10% coverage)

Focus on modules with high unit-only coverage but low integration coverage:

  Module                  Unit    Integration   Gap
  error_recovery.py       59.9%   0.0%          127 lines unit-only
  workflow_service.py     66.5%   18.1%         91 lines unit-only
  governance_service.py   59.4%   12.9%         102 lines unit-only
  data_visualization.py   55.6%   13.1%         130 lines unit-only

Strategy:

  • Add integration tests that exercise real workflows
  • Focus on error handling paths
  • Test integration points between services
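The shape of such a test: drive a real error-handling path end to end and assert on recovery behavior, not on mock calls. A sketch, where `retry_with_backoff` is an illustrative stand-in for recovery logic; the actual API in error_recovery.py may differ:

```python
# Hypothetical Phase 3 test: exercise a real retry path, no mocks.
def retry_with_backoff(fn, attempts=3):
    """Call fn, retrying on ConnectionError up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError as err:
            last_error = err
    raise last_error

def test_retry_recovers_from_transient_failures():
    calls = {"n": 0}
    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient")  # fails twice, then succeeds
        return "ok"
    assert retry_with_backoff(flaky) == "ok"
    assert calls["n"] == 3  # the real retry loop actually executed
```

Because the failure is injected as behavior (a flaky callable) rather than as a mock of the module under test, the retry logic itself is what gets covered.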

Success Metrics

  • Combined coverage reaches 75%+
  • No entire modules with 0% coverage (except marked as exempt)
  • Reduced mocking ratio: <0.5 mocks per test function (currently 1.4)
  • Individual test suite thresholds remain low (current: unit 30%, integration 25%, e2e 28%)

Non-Goals

  • ❌ Achieving 100% coverage
  • ❌ High individual test category coverage (unit/integration/e2e)
  • ❌ Testing every edge case
  • ❌ Refactoring entire test suite

Implementation Plan

  1. Phase 1 (1-2 days): Quick wins by addressing untested modules
  2. Phase 2 (3-5 days): Refactor high-mock test files incrementally
  3. Phase 3 (2-3 days): Add strategic integration tests for key modules
  4. Continuous: Monitor coverage in CI, prevent regression

References

  • Coverage analysis: build/test-results/coverage-analysis.csv
  • Current thresholds: scripts/tests/coverage_required.yaml
  • Coverage philosophy documented in threshold file
