Problem
Current test coverage is 55.7% combined (37.2% unit, 29.0% integration, 32.0% e2e), despite having extensive tests (~16k lines of test code for ~27k lines of source). Analysis reveals three key issues preventing higher coverage:
Root Causes
- Entire modules untested: ~1,400 lines (31% of gap)
  - visualization/ module: 1,296 lines with 0% coverage
  - tools/stack_buckets.py: 95 lines with 0% coverage (may be dead code)
- Over-mocking in unit tests: ~1,500 lines (36% of gap)
  - Unit tests mock external dependencies (quilt3, boto3, etc.), which is correct
  - BUT: tests don't execute real code paths; they only verify that mocks are called correctly
  - Example: test_quilt_service.py has 109 mock statements for a 775-line module
  - Result: unit tests show 82.6% coverage but mostly test mock interactions, not actual logic
- Siloed test coverage: only 18.5% overlap between test suites
  - Tests are highly specialized by category (unit/integration/e2e)
  - Different test types hit completely different code paths
  - Examples:
    - quilt_service.py: 82.6% unit, 21.5% integration → only 83.3% combined
    - buckets.py: 7.9% unit, 87.4% integration → relies entirely on integration tests
    - packages.py: 13.9% unit, 41.0% integration, 47.7% e2e → 81.4% combined
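To make the "combined" and "overlap" figures concrete, here is a minimal sketch of how they can be derived from per-suite sets of executed lines. The function name and the toy line sets are illustrative assumptions, not the project's analysis script. This issue does not state how the 18.5% figure was computed, so the sketch assumes overlap means lines hit by every suite divided by lines hit by any suite.

```python
def coverage_summary(total_lines, suites):
    """Summarize per-suite, combined, and overlapping line coverage.

    total_lines: number of executable lines in the module.
    suites: non-empty mapping of suite name -> set of executed line numbers.
    """
    hit_by_any = set().union(*suites.values())
    hit_by_all = set.intersection(*suites.values())
    # Per-suite percentage of executable lines that the suite executed.
    summary = {
        name: 100.0 * len(lines) / total_lines for name, lines in suites.items()
    }
    # Combined coverage: a line counts if at least one suite executed it.
    summary["combined"] = 100.0 * len(hit_by_any) / total_lines
    # Overlap (assumed definition): share of covered lines that all suites hit.
    summary["overlap"] = (
        100.0 * len(hit_by_all) / len(hit_by_any) if hit_by_any else 0.0
    )
    return summary


# Toy data: three suites over a hypothetical 10-line module.
toy = coverage_summary(
    total_lines=10,
    suites={"unit": {1, 2, 3, 4}, "integration": {4, 5, 6}, "e2e": {4, 6, 7, 8}},
)
print(toy)
```

In the toy data each suite covers 30-40% of the module on its own, yet the combined figure is 80% because the suites mostly hit different lines. The per-module numbers above show the same pattern.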
Goals
Achieve 75%+ combined coverage through strategic improvements, focusing on:
- ✅ Testing behavior, not mocking
- ✅ Combined coverage as primary metric (not individual test categories)
- ✅ Removing or testing dead code
Proposed Changes
Phase 1: Low-Hanging Fruit (Expected: +10-15% coverage)
1.1 Test or Remove Visualization Module
- Audit src/quilt_mcp/visualization/ (1,296 lines, 0% coverage)
- Determine if this module is actively used
- If used: Add test suite for critical paths
- If unused: Remove or mark as experimental/untested
1.2 Test or Remove stack_buckets.py
- Audit src/quilt_mcp/tools/stack_buckets.py (95 lines, 0% coverage)
- Check if this is dead code or just untested
- Either add tests or remove if obsolete
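One possible first step for both audits is a static scan for imports of the module in question. The sketch below is an assumption about how such a check could look, not an existing project script, and `find_importers` is a hypothetical helper name. It only detects absolute `import` / `from ... import` statements. Relative imports and dynamic imports (for example via `importlib`) are not detected, so an empty result suggests dead code but does not prove it.

```python
import ast
from pathlib import Path


def find_importers(src_root, target_module):
    """Return sorted paths of .py files under src_root that import target_module."""
    importers = []
    for path in Path(src_root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # import a.b.c  /  import a.b.c as alias
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # from a.b import c  -> candidates "a.b" and "a.b.c"
                names = [node.module] + [
                    f"{node.module}.{alias.name}" for alias in node.names
                ]
            else:
                continue
            if any(
                name == target_module or name.startswith(target_module + ".")
                for name in names
            ):
                importers.append(str(path))
                break  # one match per file is enough
    return sorted(importers)
```

For example, `find_importers("src", "quilt_mcp.tools.stack_buckets")` lists the files that reference the module. Files inside the audited package itself may also appear in the result and should be discounted.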
Phase 2: Reduce Over-Mocking (Expected: +5-10% coverage)
2.1 Refactor High-Mock Test Files
Focus on these files with excessive mocking:
| Test File | Mock Count | Source Coverage | Issue |
|---|---|---|---|
| test_quilt_service.py | 109 | 82.6% unit only | Mocks bypass real code |
| test_utils.py | 48 | 53.6% combined | Over-mocked |
| test_tabulator.py | 31 | 37.7% combined | Over-mocked |
| test_selector_fn.py | 23 | Unknown | Over-mocked |
Strategy:
- Keep unit tests for pure logic (validation, parsing, formatting)
- Move integration-heavy tests to integration suite
- Use real implementations with fake data instead of mocking everything
Example Refactor:
```python
from unittest.mock import Mock

import httpx
import respx

# NOTE: `fake_config` and `service` are assumed to be defined elsewhere
# in the test module (for example as fixtures or module-level objects).


# BEFORE: Unit test that just tests mock interactions
def test_get_catalog_config():
    mock_session = Mock()
    mock_response = Mock()
    mock_response.json.return_value = fake_config
    mock_session.get.return_value = mock_response
    # ... rest of test mocks behavior, doesn't test real code


# AFTER: Integration test with real HTTP (or at least httpx with respx)
def test_get_catalog_config_integration():
    # Use real HTTP client with mocked responses
    with respx.mock:
        respx.get("https://catalog.com/config.json").mock(
            return_value=httpx.Response(200, json=fake_config)
        )
        result = service.get_catalog_config("https://catalog.com")
        # Tests ACTUAL parsing/validation logic, not just mock calls
```

Phase 3: Add Strategic Integration Tests (Expected: +5-10% coverage)
Focus on modules with high unit-only coverage but low integration coverage:
| Module | Unit | Integration | Gap |
|---|---|---|---|
| error_recovery.py | 59.9% | 0.0% | 127 lines unit-only |
| workflow_service.py | 66.5% | 18.1% | 91 lines unit-only |
| governance_service.py | 59.4% | 12.9% | 102 lines unit-only |
| data_visualization.py | 55.6% | 13.1% | 130 lines unit-only |
Strategy:
- Add integration tests that exercise real workflows
- Focus on error handling paths
- Test integration points between services
Success Metrics
- Combined coverage reaches 75%+
- No entire modules with 0% coverage (except marked as exempt)
- Reduced mocking ratio: <0.5 mocks per test function (currently 1.4)
- Individual test suite thresholds remain low (current: unit 30%, integration 25%, e2e 28%)
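The mocking-ratio metric can be tracked with a small static check. The sketch below is one way to approximate it, assuming "mocks" means calls to `Mock`, `MagicMock`, `AsyncMock`, or `patch` (including `@patch(...)` decorators) and "test functions" means functions named `test_*`. The counting rule behind the 1.4 figure is not stated in this issue, so treat the rule and the `mock_ratio` helper name as assumptions. Forms such as `patch.object(...)` are not counted by this simple version.

```python
import ast

MOCK_NAMES = {"Mock", "MagicMock", "AsyncMock", "patch"}


def mock_ratio(source):
    """Return (mock_calls, test_functions, ratio) for one test file's source."""
    tree = ast.parse(source)
    tests = 0
    mocks = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name.startswith("test_"):
                tests += 1
        elif isinstance(node, ast.Call):
            func = node.func
            # Matches bare names (Mock()) and attributes (mock.patch(...)).
            if isinstance(func, ast.Name):
                name = func.id
            elif isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = None
            if name in MOCK_NAMES:
                mocks += 1
    ratio = mocks / tests if tests else 0.0
    return mocks, tests, ratio


sample = '''
from unittest.mock import Mock, patch

@patch("pkg.client")
def test_a(client):
    session = Mock()
    response = Mock()

def test_b():
    assert 1 + 1 == 2
'''
print(mock_ratio(sample))
```

Running this over each file in the test tree and summing the first two values gives a suite-wide ratio to compare against the <0.5 target.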
Non-Goals
- ❌ Achieving 100% coverage
- ❌ High individual test category coverage (unit/integration/e2e)
- ❌ Testing every edge case
- ❌ Refactoring entire test suite
Implementation Plan
- Phase 1 (1-2 days): Quick wins by addressing untested modules
- Phase 2 (3-5 days): Refactor high-mock test files incrementally
- Phase 3 (2-3 days): Add strategic integration tests for key modules
- Continuous: Monitor coverage in CI, prevent regression
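For the continuous step, a CI gate can compare measured percentages against required minimums and fail the build on any shortfall. This is a minimal sketch: the layout of scripts/tests/coverage_required.yaml is not shown in this issue, so the thresholds are written as a plain dictionary, and `check_thresholds` is a hypothetical helper name. The numbers are the current thresholds, the 75% combined target, and the measured coverage quoted above.

```python
def check_thresholds(measured, required):
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    for suite, minimum in required.items():
        actual = measured.get(suite)
        if actual is None:
            failures.append(f"{suite}: no coverage reported (required {minimum:.1f}%)")
        elif actual < minimum:
            failures.append(f"{suite}: {actual:.1f}% < required {minimum:.1f}%")
    return failures


# Thresholds and measurements taken from the figures in this issue.
required = {"unit": 30.0, "integration": 25.0, "e2e": 28.0, "combined": 75.0}
measured = {"unit": 37.2, "integration": 29.0, "e2e": 32.0, "combined": 55.7}

failures = check_thresholds(measured, required)
print(failures)
```

With today's numbers only the combined target fails, which matches the goal of treating combined coverage as the primary metric while keeping per-suite thresholds low. In CI the script would exit non-zero when `failures` is non-empty.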
References
- Coverage analysis: build/test-results/coverage-analysis.csv
- Current thresholds: scripts/tests/coverage_required.yaml
- Coverage philosophy documented in threshold file