
Allow decompression to continue after exceeding max_length #11966

Open
Dreamsorcerer wants to merge 138 commits into master from Dreamsorcerer-patch-5


@Dreamsorcerer
Member

@Dreamsorcerer Dreamsorcerer commented Jan 15, 2026

Architecture summary:

  • Compression utils now have a .data_available attribute; when it is True, the decompress() call can be repeated with b"" to get more data. The maximum decompression output has been reduced to 256 KiB, matching the socket read limit.
  • DeflateBuffer.feed_data() and StreamReader.feed_data() now return True if more data is available, which can be retrieved by calling the method again with b"".
  • (PY) PayloadParser.feed_data() now returns an enum indicating when the payload is complete or whether more input is needed to continue producing output.
  • (PY) HttpParser.feed_data() now returns early when the payload parser is paused and has a new attribute to track if more data is available from the payload parser.
  • (C) HttpParser.cb_on_body() now processes the decompressed data in chunks and pauses llhttp if the parser is asked to pause.
  • (C) HttpParser.feed_data() now calls cb_on_body() again when more data is available.
  • Both parsers now have a .pause_reading() method to stop the parser from producing more output.
  • BaseProtocol.pause_reading() now calls the parser's .pause_reading() method.
  • BaseProtocol.resume_reading() now resumes parsing and then checks if it's been paused again before telling the transport to resume.
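
The "repeat decompress() with b"" while .data_available is True" pattern from the first bullet can be sketched with the standard library's zlib module. This is only an illustration under stated assumptions: LimitedDecompressor and drain are hypothetical names invented here, not aiohttp's classes, and the 256 KiB cap is taken from the summary above.

```python
import zlib
from typing import List

MAX_CHUNK = 256 * 1024  # 256 KiB output cap per call, as described above


class LimitedDecompressor:
    """Hypothetical wrapper around zlib; not aiohttp's implementation."""

    def __init__(self, max_length: int = MAX_CHUNK) -> None:
        self._obj = zlib.decompressobj()
        self._max_length = max_length
        # True when another decompress(b"") call may yield more output.
        self.data_available = False

    def decompress(self, data: bytes) -> bytes:
        # Input that did not fit under the cap last time is kept by zlib
        # in unconsumed_tail, so feed it back in before the new data.
        buf = self._obj.unconsumed_tail + data
        out = self._obj.decompress(buf, self._max_length)
        # More output may be pending if input is left over, or if the cap
        # was hit before the end of the compressed stream was reached.
        self.data_available = bool(self._obj.unconsumed_tail) or (
            len(out) == self._max_length and not self._obj.eof
        )
        return out


def drain(decompressor: LimitedDecompressor, data: bytes) -> List[bytes]:
    """Feed data once, then keep calling with b"" until nothing is left."""
    chunks = [decompressor.decompress(data)]
    while decompressor.data_available:
        chunks.append(decompressor.decompress(b""))
    return chunks


if __name__ == "__main__":
    payload = b"A" * 1_000_000
    pieces = drain(LimitedDecompressor(), zlib.compress(payload))
    assert b"".join(pieces) == payload
```

In a real parser the loop in drain would not run to completion in one go; the caller would stop between chunks whenever it has been asked to pause, and resume later with another b"" call.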

@Dreamsorcerer added the backport-3.13 label (Trigger automatic backporting to the 3.13 release branch by Patchback robot) Jan 15, 2026
@Dreamsorcerer added the backport-3.14 (Trigger automatic backporting to the 3.14 release branch by Patchback robot) and bot:chronographer:skip (This PR does not need to include a change note) labels Jan 15, 2026
@codecov

codecov bot commented Jan 15, 2026

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 4346 | 1 | 4345 | 42 |
View the full list of 1 ❄️ flaky test(s)
tests.test_proxy_functional::test_uvloop_secure_https_proxy

Flake rate in main: 7.41% (Passed 25 times, Failed 2 times)

Stack Traces | 1.04s run time
client_ssl_ctx = <ssl.SSLContext object at 0x7f7f2c7d7610>
secure_proxy_url = URL('https://127.0.0.1:33899')
uvloop_loop = <uvloop.Loop running=False closed=False debug=False>

    @pytest.mark.skipif(
        platform.system() == "Windows" or sys.implementation.name != "cpython",
        reason="uvloop is not supported on Windows and non-CPython implementations",
    )
    @pytest.mark.filterwarnings(r"ignore:.*ssl.OP_NO_SSL*")
    # Filter out the warning from
    # https://github.com/abhinavsingh/proxy.py.../proxy/common/utils.py#L226
    # otherwise this test will fail because the proxy will die with an error.
    async def test_uvloop_secure_https_proxy(
        client_ssl_ctx: ssl.SSLContext,
        secure_proxy_url: URL,
        uvloop_loop: asyncio.AbstractEventLoop,
    ) -> None:
        """Ensure HTTPS sites are accessible through a secure proxy without warning when using uvloop."""
        conn = aiohttp.TCPConnector(force_close=True)
        sess = aiohttp.ClientSession(connector=conn)
        try:
            url = URL("https://example.com")

            async with sess.get(
                url, proxy=secure_proxy_url, ssl=client_ssl_ctx
            ) as response:
>               assert response.status == 200
E               assert 403 == 200
E                +  where 403 = <ClientResponse(https://example.com) [403 Forbidden]>\n<CIMultiDictProxy('Date': 'Thu, 05 Feb 2026 17:09:35 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'close', 'accept-ch': 'Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA', 'cf-mitigated': 'challenge', 'critical-ch': 'Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA', 'cross-origin-embedder-policy': 'require-corp', 'cross-origin-opener-policy': 'same-origin', 'cross-origin-resource-policy': 'same-origin', 'origin-agent-cluster': '?1', 'permissions-policy': 'accelerometer=(),browsing-topics=(),camera=(),clipboard-read=(),clipboard-write=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()', 'referrer-policy': 'same-origin', 'server-timing': 'chlray;desc="9c9404323d51f074"', 'x-content-type-options': 'nosniff', 'x-frame-options': 'SAMEORIGIN', 'Cache-Control': 'private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Expires': 'Thu, 01 Jan 1970 00:00:01 GMT', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '9c9404323d51f074-DFW', 'Content-Encoding': 'gzip')>\n.status

client_ssl_ctx = <ssl.SSLContext object at 0x7f7f2c7d7610>
conn       = <aiohttp.connector.TCPConnector object at 0x7f7f33029bd0>
response   = <ClientResponse(https://example.com) [403 Forbidden]>
<CIMultiDictProxy('Date': 'Thu, 05 Feb 2026 17:09:35 GMT', 'Cont...MT', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '9c9404323d51f074-DFW', 'Content-Encoding': 'gzip')>

secure_proxy_url = URL('https://127.0.0.1:33899')
sess       = <aiohttp.client.ClientSession object at 0x7f7f3306b780>
url        = URL('https://example.com')
uvloop_loop = <uvloop.Loop running=False closed=False debug=False>

tests/test_proxy_functional.py:259: AssertionError


@codspeed-hq

codspeed-hq bot commented Jan 15, 2026

CodSpeed Performance Report

Merging this PR will degrade performance by 42.58%

Comparing Dreamsorcerer-patch-5 (389b0e6) with master (4bb9e6e)

Summary

⚡ 1 improved benchmark
❌ 5 regressed benchmarks
✅ 53 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| test_get_request_with_251308_compressed_chunked_payload[isal.isal_zlib-pyloop] | 64.4 ms | 112.1 ms | -42.58% |
| test_read_large_binary_websocket_messages[pyloop] | 8,943 µs | 49.1 µs | ×180 |
| test_get_request_with_251308_compressed_chunked_payload[zlib_ng.zlib_ng-pyloop] | 209 ms | 243.7 ms | -14.22% |
| test_ten_streamed_responses_iter_chunks[pyloop] | 16.2 ms | 18.3 ms | -11.43% |
| test_ten_streamed_responses_iter_chunked_65536[pyloop] | 23.1 ms | 25.2 ms | -8.41% |
| test_ten_streamed_responses_iter_chunked_4096[pyloop] | 28.2 ms | 33.1 ms | -14.71% |

@Dreamsorcerer changed the title from "Test chunk splits after pause" to "Allow decompression to continue after exceeding max_length" Jan 15, 2026
payload_type,
)
from .streams import StreamReader
from .web_exceptions import HttpRequestEntityTooLarge
Member Author

Maybe we need to pass this class into MultipartReader or something? We shouldn't really be importing any web modules here.
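
One way to read "pass this class in" is plain dependency injection: the reader accepts the exception type from its caller instead of importing it from a web module. A minimal sketch, assuming hypothetical names throughout (SizeLimitedReader and PayloadTooLarge are stand-ins invented here, not aiohttp's MultipartReader or HttpRequestEntityTooLarge):

```python
from typing import Type


class PayloadTooLarge(Exception):
    """Hypothetical stand-in for a web-layer error class."""


class SizeLimitedReader:
    """Hypothetical reader that stays unaware of the web layer."""

    def __init__(
        self, max_size: int, too_large_exc: Type[Exception] = ValueError
    ) -> None:
        self._max_size = max_size
        # The caller decides which exception signals an oversized payload.
        self._too_large_exc = too_large_exc
        self._seen = 0

    def feed(self, chunk: bytes) -> None:
        # Track the running total and raise the injected exception type
        # once the configured limit is exceeded.
        self._seen += len(chunk)
        if self._seen > self._max_size:
            raise self._too_large_exc(
                f"payload exceeds {self._max_size} bytes"
            )


if __name__ == "__main__":
    reader = SizeLimitedReader(4, too_large_exc=PayloadTooLarge)
    reader.feed(b"abcd")  # exactly at the limit, accepted
    try:
        reader.feed(b"e")
    except PayloadTooLarge as exc:
        print(f"rejected: {exc}")
```

With this shape, only the web-facing caller needs to know about the web exception; the reader module keeps a generic default.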

@Dreamsorcerer
Member Author

I'm pulling the .read_chunk(decode=True) changes out of this PR. If someone wants to work on implementing that in a separate PR, the test is:

    @pytest.mark.skipif(sys.version_info < (3, 11), reason="wbits not available")
    async def test_read_chunk_with_content_encoding_deflate(self) -> None:
        content = b"A" * 1_000_000  # Large enough to exceed max_length.
        compressed = ZLibBackend.compress(content, wbits=-ZLibBackend.MAX_WBITS)

        h = CIMultiDictProxy(CIMultiDict({CONTENT_ENCODING: "deflate"}))
        with Stream(compressed + b"\r\n--:--") as stream:
            obj = aiohttp.BodyPartReader(BOUNDARY, h, stream)
            result = b""
            while chunk := await obj.read_chunk(decode=True):
                result += chunk
        assert result == content


Labels

backport-3.13: Trigger automatic backporting to the 3.13 release branch by Patchback robot
backport-3.14: Trigger automatic backporting to the 3.14 release branch by Patchback robot
bot:chronographer:skip: This PR does not need to include a change note


2 participants