Skip to content

Stabilize test suite: fix flaky timing tests, add rerun support, and handle Windows resource leaks#11992

Open
rodrigobnogueira wants to merge 27 commits intoaio-libs:masterfrom
rodrigobnogueira:fix/import-time-test-python-3.14
Open

Stabilize test suite: fix flaky timing tests, add rerun support, and handle Windows resource leaks#11992
rodrigobnogueira wants to merge 27 commits intoaio-libs:masterfrom
rodrigobnogueira:fix/import-time-test-python-3.14

Conversation

@rodrigobnogueira
Copy link
Member

@rodrigobnogueira rodrigobnogueira commented Jan 24, 2026

Description

What do these changes do?

  • Timing-based performance tests:

    • Refactor test_import_time to use sys.version_info >= (3, 12) for relaxed thresholds, making it automatically apply to Python 3.12+ (including 3.14) without manual version additions. This fixes intermittent failures in CI.
    • Similarly stabilize test_regex_performance flakiness with adjusted thresholds and refactoring.
    • Introduce a RerunThresholdParams NamedTuple and rerun_adjusted_threshold fixture for cleaner threshold handling.
  • Flaky test handling:

    • Add pytest-rerunfailures plugin to CI requirements and configure it to automatically rerun flaky tests, reducing spurious failures.
  • Resource leak and cleanup fixes (primarily Windows):

Are there changes in behavior for the user?

No user-facing changes. This only affects CI test behavior.

Is it a substantial burden for the maintainers to support this?

No — this actually reduces maintenance burden by making the test future-proof.

Related issue number

Fixes flaky test failures in PR #11990.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
  • Add a new news fragment into the CHANGES/ folder

@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Jan 24, 2026
@codecov
Copy link

codecov bot commented Jan 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.75%. Comparing base (31fce7d) to head (0f3fb76).
⚠️ Report is 1 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11992      +/-   ##
==========================================
- Coverage   98.76%   98.75%   -0.01%     
==========================================
  Files         127      127              
  Lines       44655    44674      +19     
  Branches     2367     2365       -2     
==========================================
+ Hits        44102    44117      +15     
- Misses        393      396       +3     
- Partials      160      161       +1     
Flag Coverage Δ
CI-GHA 98.60% <98.03%> (-0.02%) ⬇️
OS-Linux 98.34% <96.07%> (-0.02%) ⬇️
OS-Windows 96.71% <90.19%> (+0.01%) ⬆️
OS-macOS 97.60% <88.23%> (+0.01%) ⬆️
Py-3.10.11 97.14% <90.19%> (+0.01%) ⬆️
Py-3.10.19 97.62% <96.07%> (-0.02%) ⬇️
Py-3.11.14 97.82% <96.07%> (-0.02%) ⬇️
Py-3.11.9 97.35% <90.19%> (+<0.01%) ⬆️
Py-3.12.10 97.44% <88.23%> (+0.01%) ⬆️
Py-3.12.12 97.92% <96.07%> (-0.01%) ⬇️
Py-3.13.11 98.16% <96.07%> (-0.01%) ⬇️
Py-3.14.2 98.14% <95.91%> (-0.01%) ⬇️
Py-3.14.2t 97.23% <95.91%> (-0.02%) ⬇️
Py-pypy3.11.13-7.3.20 97.38% <88.23%> (+<0.01%) ⬆️
VM-macos 97.60% <88.23%> (+0.01%) ⬆️
VM-ubuntu 98.34% <96.07%> (-0.02%) ⬇️
VM-windows 96.71% <90.19%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@codspeed-hq
Copy link

codspeed-hq bot commented Jan 24, 2026

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing rodrigobnogueira:fix/import-time-test-python-3.14 (0f3fb76) with master (0cba798)

Summary

✅ 59 untouched benchmarks

@Dreamsorcerer
Copy link
Member

I think the expectation was that import time would improve again in later versions. I think flaky failures today are mostly caused by regressions (e.g. idna: kjd/idna#188).

@rodrigobnogueira rodrigobnogueira force-pushed the fix/import-time-test-python-3.14 branch from 2a4a9d7 to cf37399 Compare January 24, 2026 19:19
@rodrigobnogueira rodrigobnogueira force-pushed the fix/import-time-test-python-3.14 branch from 6fae85d to 2cda92f Compare January 24, 2026 20:14
@rodrigobnogueira
Copy link
Member Author

rodrigobnogueira commented Jan 24, 2026

The CI failure on test_regex_performance popped up again in the macOS 3.12 job (as seen before—it's another timing-based test with a tight <10ms assertion).

I've updated it. Let me know if this looks good

@rodrigobnogueira rodrigobnogueira force-pushed the fix/import-time-test-python-3.14 branch 2 times, most recently from 5febb04 to cd1e76d Compare January 24, 2026 20:29
@rodrigobnogueira rodrigobnogueira force-pushed the fix/import-time-test-python-3.14 branch from be1fa7f to 553f63e Compare January 24, 2026 20:42
Copy link
Member

@webknjaz webknjaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try using https://pypi.org/p/pytest-rerunfailures? It should be more generic.

@@ -1333,11 +1333,23 @@ async def handler(request: Request) -> Response:

def test_regex_performance() -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this belongs in tests executed by codspeed 🤔

cc @bdraco

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do, but the reason I put it here is that the exact performance is not important, it's designed to trip if there's a ReDoS issue, which is an order of magnitude problem. I just misjudged the initial value.

Copy link
Member

@Dreamsorcerer Dreamsorcerer Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, on my machine this now takes ~4ms, on the old regex it took closer to 60s. So, increasing the test threshold to 50ms is absolutely fine here.

We could also consider just skipping it on Mac OS, given that it seems to be 5-10x slower than Linux...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another codspeed candidate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think bdraco mentioned that he might look at this one at some point. I'm not sure it's a trivial one to run in codspeed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine we could try their new CLI they've announced the other day (I think I got an email yesterday). But I agree this is not in the scope of this PR.

@webknjaz webknjaz requested a review from bdraco January 25, 2026 10:00
@Dreamsorcerer
Copy link
Member

The CI failure on test_regex_performance popped up again in the macOS 3.12 job (as seen before—it's another timing-based test with a tight <10ms assertion).

I think we can just double that to 20ms. It's meant to highlight an issue as an order of magnitude, so I clearly misjudged the performance of Mac OS.

@rodrigobnogueira
Copy link
Member Author

Thanks @webknjaz for the suggesting pytest-rerunfailures.
A helper function was introduced for these time-based tests.
There are other tests that might be modified if you want: test_forwarded_re_performance and test_cookie_pattern_performance, both have a hardcoded 10ms threshold

Copy link
Member

@webknjaz webknjaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdraco any ideas on designing this better?

rodrigo.nogueira and others added 5 commits January 26, 2026 13:25
- Converted get_flaky_threshold function to a pytest fixture using indirect parametrization
- Renamed to rerun_adjusted_threshold for clarity
- Updated RST docstring with usage examples and rerun count logic
- Added proper type annotations (tuple[float, float]) for mypy compliance
- Updated test_imports.py and test_client_middleware_digest_auth.py to use new fixture
- Improved code readability by splitting long assertion lines
- Restored macOS timing observation comment (40-50ms)
rodrigo.nogueira and others added 6 commits January 28, 2026 19:15
The 0.1s delay was insufficient for proxy.py worker threads to release
their sockets. Increased to 0.5s and added 3 gc.collect() passes to
ensure all cyclic references are broken before pytest cleanup.
Instead of waiting a fixed 5+ seconds, poll for proxy threads to finish
with gc.collect() calls. This is faster on typical runs (exits as soon
as threads are gone) while still having a 5s timeout for robustness.
Instead of matching thread names (which was unreliable), capture the
baseline set of threads before starting proxy.py and wait until all
extra threads have finished.
…ad tracking and adding post-loop garbage collection.
webknjaz
webknjaz previously approved these changes Jan 29, 2026
Copy link
Member

@webknjaz webknjaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good but let me know if you'd like to drop that helper function before merging.

@webknjaz webknjaz dismissed their stale review January 29, 2026 14:52

Urgh.. looks like my browser cached the previous diff and I commented on the wrong thing.

@rodrigobnogueira rodrigobnogueira force-pushed the fix/import-time-test-python-3.14 branch from 1138854 to ad45b76 Compare January 30, 2026 00:12
@rodrigobnogueira rodrigobnogueira force-pushed the fix/import-time-test-python-3.14 branch from 50102d7 to ac1cb5b Compare January 30, 2026 00:15
@rodrigobnogueira
Copy link
Member Author

rodrigobnogueira commented Jan 30, 2026

Addressed intermittent CI failures on Windows with Python 3.10 and 3.11 in test_proxy_functional.py, caused by PytestUnraisableExceptionWarning (wrapping ResourceWarning: unclosed socket) from proxy.py's threaded mode during connection failures (related to known Windows quirks; see abhinavsingh/proxy.py#492 for context, though it primarily discusses threadless mode). This is not ideal since it's a global filter, but it stabilizes the CI without masking unrelated issues and allows the tests to pass consistently.
All tests passed with the last commit.

rodrigo.nogueira added 2 commits January 30, 2026 10:42
…est thresholds

Replaces tuple-based threshold parameters with a self-documenting
RerunThresholdParams NamedTuple containing 'base' and 'increment_per_rerun'
fields for improved readability and maintainability.
@rodrigobnogueira rodrigobnogueira changed the title Fix flaky import time test for Python 3.12+ Stabilize test suite: fix flaky timing tests, add rerun support, and handle Windows resource leaks Feb 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bot:chronographer:provided There is a change note present in this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants