Stabilize test suite: fix flaky timing tests, add rerun support, and handle Windows resource leaks#11992
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #11992 +/- ##
==========================================
- Coverage 98.76% 98.75% -0.01%
==========================================
Files 127 127
Lines 44655 44674 +19
Branches 2367 2365 -2
==========================================
+ Hits 44102 44117 +15
- Misses 393 396 +3
- Partials 160 161 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
CodSpeed Performance ReportMerging this PR will not alter performanceComparing Summary
|
|
I think the expectation was that import time would improve again in later versions. I think flaky failures today are mostly caused by regressions (e.g. idna: kjd/idna#188). |
2a4a9d7 to
cf37399
Compare
for more information, see https://pre-commit.ci
6fae85d to
2cda92f
Compare
|
The CI failure on I've updated it. Let me know if this looks good |
5febb04 to
cd1e76d
Compare
be1fa7f to
553f63e
Compare
webknjaz
left a comment
There was a problem hiding this comment.
Can we try using https://pypi.org/p/pytest-rerunfailures? It should be more generic.
| @@ -1333,11 +1333,23 @@ async def handler(request: Request) -> Response: | |||
|
|
|||
| def test_regex_performance() -> None: | |||
There was a problem hiding this comment.
I wonder if this belongs in tests executed by codspeed 🤔
cc @bdraco
There was a problem hiding this comment.
We could do, but the reason I put it here is that the exact performance is not important, it's designed to trip if there's a ReDoS issue, which is an order of magnitude problem. I just misjudged the initial value.
There was a problem hiding this comment.
For reference, on my machine this now takes ~4ms, on the old regex it took closer to 60s. So, increasing the test threshold to 50ms is absolutely fine here.
We could also consider just skipping it on Mac OS, given that it seems to be 5-10x slower than Linux...
tests/test_imports.py
Outdated
There was a problem hiding this comment.
I think bdraco mentioned that he might look at this one at some point. I'm not sure it's a trivial one to run in codspeed.
There was a problem hiding this comment.
I imagine we could try their new CLI they've announced the other day (I think I got an email yesterday). But I agree this is not in the scope of this PR.
I think we can just double that to 20ms. It's meant to highlight an issue as an order of magnitude, so I clearly misjudged the performance of Mac OS. |
|
Thanks @webknjaz for the suggesting pytest-rerunfailures. |
- Converted get_flaky_threshold function to a pytest fixture using indirect parametrization - Renamed to rerun_adjusted_threshold for clarity - Updated RST docstring with usage examples and rerun count logic - Added proper type annotations (tuple[float, float]) for mypy compliance - Updated test_imports.py and test_client_middleware_digest_auth.py to use new fixture - Improved code readability by splitting long assertion lines - Restored macOS timing observation comment (40-50ms)
The 0.1s delay was insufficient for proxy.py worker threads to release their sockets. Increased to 0.5s and added 3 gc.collect() passes to ensure all cyclic references are broken before pytest cleanup.
Instead of waiting a fixed 5+ seconds, poll for proxy threads to finish with gc.collect() calls. This is faster on typical runs (exits as soon as threads are gone) while still having a 5s timeout for robustness.
for more information, see https://pre-commit.ci
Instead of matching thread names (which was unreliable), capture the baseline set of threads before starting proxy.py and wait until all extra threads have finished.
…ad tracking and adding post-loop garbage collection.
webknjaz
left a comment
There was a problem hiding this comment.
I think it's good but let me know if you'd like to drop that helper function before merging.
Urgh.. looks like my browser cached the previous diff and I commented on the wrong thing.
1138854 to
ad45b76
Compare
50102d7 to
ac1cb5b
Compare
for more information, see https://pre-commit.ci
|
Addressed intermittent CI failures on Windows with Python 3.10 and 3.11 in |
…est thresholds Replaces tuple-based threshold parameters with a self-documenting RerunThresholdParams NamedTuple containing 'base' and 'increment_per_rerun' fields for improved readability and maintainability.
Description
What do these changes do?
Timing-based performance tests:
test_import_timeto usesys.version_info >= (3, 12)for relaxed thresholds, making it automatically apply to Python 3.12+ (including 3.14) without manual version additions. This fixes intermittent failures in CI.test_regex_performanceflakiness with adjusted thresholds and refactoring.RerunThresholdParamsNamedTuple andrerun_adjusted_thresholdfixture for cleaner threshold handling.Flaky test handling:
pytest-rerunfailuresplugin to CI requirements and configure it to automatically rerun flaky tests, reducing spurious failures.Resource leak and cleanup fixes (primarily Windows):
Are there changes in behavior for the user?
No user-facing changes. This only affects CI test behavior.
Is it a substantial burden for the maintainers to support this?
No — this actually reduces maintenance burden by making the test future-proof.
Related issue number
Fixes flaky test failures in PR #11990.
Checklist
CONTRIBUTORS.txtCHANGES/folder