Skip to content

gh-144370: Disallow usage of control characters in status, headers and values for security#144371

Draft
benediktjohannes wants to merge 20 commits intopython:mainfrom
benediktjohannes:patch-4
Draft

gh-144370: Disallow usage of control characters in status, headers and values for security#144371
benediktjohannes wants to merge 20 commits intopython:mainfrom
benediktjohannes:patch-4

Conversation

@benediktjohannes
Copy link
Contributor

@benediktjohannes benediktjohannes commented Jan 31, 2026

This PR is opened as public as the primary described CVE is already publicated and this only contains one certain point that has to be changed in code in Python and refers to issue #144370 (gh-144370).

In Lib/wsgiref/handlers.py at 260 (def _convert_string_type(self, value, title):) it is necessary to add the same fix as in #143916 requested as well (credits to @sethmlarson 👍). This references #143917 and #144118 and already includes the latest fix for the inclusion of HTAB, additions to this in form of tests will automatically be added by merging #144118.

There are some differences to wsgiref.headers.Headers that I made because I chose the default status of "name" to be "True" instead of false as "True" includes one more disallowed character for safety in future use cases if not defined otherwise, but I also added "name=True" even if not necessary in use cases just for safety for another case (if the default case was changed in order to produce similarity to wsgiref.headers.Headers while I'd recommend to change wsgiref.headers.Headers). I've also made an addition to the raised "ValueError" including "values" (without a further description of the very exact in- and exclusions like HTAB because this doesn't seem to be user-sided relevant to me) and I've used a slightly shorter internal form for the variables (which can be changed if wanted, but it's just style).

Important note: I was / am not quite sure about these changes as they were not tested, but I'm highly confident that this should be correct as it (mostly) bases on changes which were already made in other PRs (or at least in one other PR because this is the only merged one (but this also refers to a "non-merged" PR, so has to be concretized in that way)). And another important aspect is that I was not quite sure about the "status" (whether my classification is correct here as a "non-name value" when considering the handling of the characters, so please correct me if I'm mistaken (and I wasn't quite sure as well at the question whether potentially there should be added some different disallowed characters for this or some disallowed be removed, so please correct me if I'm mistaken)). Please also correct me if I'm mistaken in general anywhere in this PR (maybe this isn't even necessary for this case and I'm completely mistaken because this wasn't tested, but I'm pretty confident just from looking at the code).

@benediktjohannes
Copy link
Contributor Author

Important note: This PR needs backport to 3.10, 3.11, 3.12, 3.13 and 3.14 please, thanks! 👍 (I'll probably add these tomorrow if there are no concerns.)

@benediktjohannes
Copy link
Contributor Author

@gpshead or @StanFromIreland / @ZeroIntensity please review, thanks!

@picnixz
Copy link
Member

picnixz commented Jan 31, 2026

What is different from #144118?

@picnixz picnixz closed this Jan 31, 2026
@picnixz
Copy link
Member

picnixz commented Jan 31, 2026

I would suggest that you directly comment on Seth's PR instead.

@benediktjohannes
Copy link
Contributor Author

@picnixz I guess that this was a misunderstanding, this PR is about a completely different thing that is changes. This is not a duplicate of the other PR because the other one is about adding HTAB again to Lib/wsgiref/headers.py, but this one is about Lib/wsgiref/handlers.py, not headers.py. This is something completely different because the other one only adds one character to be allowed, but this one reguards disallowing several characters in handlers.py (and not headers.py) which wasn‘t mentioned at all in the other PR because this concerns completely different aspects like status and some changes in debug mode in handlers.py and not headers.py. Please reopen. Thanks!

@benediktjohannes
Copy link
Contributor Author

@picnixz I guess, I know why this seemed to be a duplicate as I described the „differences“ to the other PR and then you thought that this was about some changes that I suggested for the other PR, but this is not what this PR is about because this is about something completely different in another location, so please reopen. Thanks!

@benediktjohannes
Copy link
Contributor Author

benediktjohannes commented Feb 1, 2026

And I think that is would not be a very good idea if we combined these changes in another PR because the first one was already merged two weeks ago and the other one is only about adding allowed phrases again while this one is about the opposit (disallowing phrases) with the only thing in common that this one does not disallow one thing which will be added in the new PR again, but both other PRs concern another location which makes this not even a duplication in that way, so please reopen. Thanks!

cc @gpshead @sethmlarson @StanFromIreland @ZeroIntensity

@gpshead gpshead reopened this Feb 1, 2026
@benediktjohannes
Copy link
Contributor Author

Thank you very much!

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should write tests for theses changes.

"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

_name_disallowed = re.compile(r'[\x00-\x1F\x7F]')
_value_disallowed = re.compile(r'[\x00-\x08\x0A-\x1F\x7F]')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to wait until PR gh-144118 is merged, and then get these regexs from wsgiref.headers (line 4).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea! One other idea that came to my mind when reading this is that we could theoretically use the same converter for wsgiref.headers and wsgiref.handlers (if we already use the same regexes here), but maybe something will be changed in one and then it's unclear that it's used there too, so probably a good idea to only use these twice because they should (probably) always be updated in the same way and it should be more secure to only have to update them once so that they aren't forgotten in the other place (like they were before I found them in this place), so that's a nice idea for future security, very good engineering!

Unforetunately, I can't add this before the other PR is merged because the tests won't work, so for now I have to let this be the same or theoretically I could also let this the same here and add a suggestion to the other PR so that it includes the regexes from this one because it seems to be urgent to me to merge this as it's a security update while the other PR only adds one character back (which is not that urgent) and then I could just add this as a suggestion to the other one.

What's your opinion on this @vstinner ? Thanks!

Co-authored-by: Victor Stinner <vstinner@python.org>
@benediktjohannes
Copy link
Contributor Author

Hi @vstinner thank you very much for your review! I really like your suggestions and I'll make some commits in order to get them in the PR soon, thank you very much!

benediktjohannes and others added 2 commits February 4, 2026 16:47
Co-authored-by: Victor Stinner <vstinner@python.org>
@benediktjohannes
Copy link
Contributor Author

(Added the "re" again because it's included in vstinner's regex and also in sethmlarson's PR so if this will be used in both cases then it's consistent in that way and it would take too much time to change this there as well and the suggestions by seth and vstinner were both with the "re", so I've added it)

@benediktjohannes
Copy link
Contributor Author

I'm not quite sure about the status whether it should be handled as a value or a name, so please correct me if I'm mistaken inthis case (or even none of both? but I don't think that)

@benediktjohannes
Copy link
Contributor Author

You should write tests for theses changes.

Tests are basically already included by seth's PR, but I can of course add similar ones to handlers as well.

@vstinner
Copy link
Member

vstinner commented Feb 4, 2026

Tests are basically already included by seth's PR, but I can of course add similar ones to handlers as well.

Please add new tests on handlers.

@benediktjohannes
Copy link
Contributor Author

@vstinner please re-review, thanks! (I'm still not quite sure about it because I haven't tested it (also the tests are not tested) or is this automatically done by the lint check?)

@benediktjohannes
Copy link
Contributor Author

(but even if this is done by the lint check automatically, I'm not quite sure about the handling of the status)

@benediktjohannes
Copy link
Contributor Author

and thanks for the review by the way 👍

@benediktjohannes
Copy link
Contributor Author

Ahh, this seems actually to be checked automatically by the Tests based on what I've seen! Nice! But I'm not quite sure about the correct classification of the status nevertheless, so please have a look there, thanks!

@benediktjohannes
Copy link
Contributor Author

And also I'm not quite sure about the test logic because I'm not an expert for wsgiref, so please check on that too, thank you!

I'm not adding c0 at add_header as this is already caught by the other PR of seth and if there are other "versions" of inputting headers it should be the same as in seth's PR (and it's only for debug mode)
@benediktjohannes benediktjohannes marked this pull request as draft February 4, 2026 19:12
@sethmlarson
Copy link
Contributor

Thanks for this. There's no definite answer here, but my vibe is that a response status is far less likely to be controllable/injectable than a header value. I think this is still a fine improvement, but not sure if we'd handle it the same as #143916 with a CVE?

@benediktjohannes
Copy link
Contributor Author

By the way I want to add the comment of this PR to the chat (I'm not adding c0 at add_header as this is already caught by the other PR of seth and if there are other "versions" of inputting headers it should be the same as in seth's PR (and it's only for debug mode))

@benediktjohannes
Copy link
Contributor Author

Thanks for this. There's no definite answer here, but my vibe is that a response status is far less likely to be controllable/injectable than a header value. I think this is still a fine improvement, but not sure if we'd handle it the same as #143916 with a CVE?

Hi, thank you very much for your comment! I'm still having some problems with the implementation (but I hope that I'll have that fixed soon). 👍 You're right that a response status is probably less likely to be injectable (and debug mode is not that relevant), but it seems to me that theoretically there could be (and statistically by the usage number of Python, there probably are many) some status responses based upon user entrys (or other ways to influence the status) and for this reason I think that (based on the statistics of the usage numbers of Python) there are at least some / many relevant systems in danger through this, so I think handling it via a CVE would be the safest way, but I'm not quite sure, so please correct me if I'm mistaken, thanks! 👍

@sethmlarson
Copy link
Contributor

Alarm fatigue is a real phenomenon, so we have to reserve publishing CVEs only for when there is actual danger. My read is that very few deployments use wsgiref in production (since it's a reference implementation, the header CVE was only because it wasn't documented as such) and very few deployments allow user-controlled status codes.

The previous CVE and advisory noted that wsgiref should not be deployed to production, so anyone who received that advisory is on the hook if they are deploying wsgiref like this.

@benediktjohannes
Copy link
Contributor Author

Alarm fatigue is a real phenomenon, so we have to reserve publishing CVEs only for when there is actual danger. My read is that very few deployments use wsgiref in production (since it's a reference implementation, the header CVE was only because it wasn't documented as such) and very few deployments allow user-controlled status codes.

The previous CVE and advisory noted that wsgiref should not be deployed to production, so anyone who received that advisory is on the hook if they are deploying wsgiref like this.

Thanks for the answer! I agree that the warning should already have informed everyone (who reads these warnings) about the fact that this is potentially a risk (but potentially they may think that this was now reported and fixed and nevertheless use (although it's against the formal recommendation (because many people do so many not recommended things (like there is nearly nobody (I guess) who is reading the whole user license agreements))) wsgiref for production), so maybe it's enough to patch this for all supported (for security updates) Python versions so that CVEs are "reserved" for bigger dangers because it's already indirectly included in the other CVE, I think that this is a possible way as well while I'm just always saying "better a little bit safer than risking something", but maybe @vstinner has a opinion on that as well. But I think that I'm fine with sethmlarson's solution. 👍

@benediktjohannes
Copy link
Contributor Author

Addition: Maybe I was mistaken with my claim concerning the debug mode because I think that there are possibly many people running their real servers in debug mode (I'm not quite sure whether this is the case for wsgiref as well, but I think so) because they want additional output and information (and in this case, it's the same with name and value) (this concerns the message about the potential risks and not the one about the tests because this seems irrelevant to me because it's basically (if I'm not mistaken anywhere) the same in seth's PR).

@benediktjohannes
Copy link
Contributor Author

And unforetunately the validate_status is triggered after the convertion with the potential injection (see status = self._convert_string_type(status, "Status", name=False)
self._validate_status(status)) which means that there is no security through this check

@benediktjohannes
Copy link
Contributor Author

But I see your point that the status is very unlikely to be manipulated because anyone who can actually change the status is basically in power to change very important things, but potentially there are some edge cases with excluded statuses like 404, etc. and then they nevertheless allow changing the status somehow or some other edge cases (which is probably a mistake by the user (like we already said), but potentially thinkable and statistically maybe relevant (while in this case it's probably a fact that the person using wsgiref that way won't read CVEs, but theoretically this is an argument while I definitely see your point with the "over alarming", but maybe we should consider the debug mode risks (see above))), thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants