fix(gmail): prevent URL corruption in quoted-printable decoding #186

Open
100menotu001 wants to merge 2 commits into steipete:main from 100menotu001:fix/quoted-printable-url-corruption

Conversation

@100menotu001

Summary

Fixes #159: URLs containing = were corrupted (characters replaced by U+FFFD) when an email declared Content-Transfer-Encoding: quoted-printable but the Gmail API had already decoded the body.

Root Cause

With format=full, the Gmail API may return body.data already decoded from its original transfer encoding while the Content-Transfer-Encoding header still says quoted-printable. The existing code then QP-decoded the body a second time, which caused:

  • Raw = characters (from URLs like ?foo=bar) to be read as the start of a QP escape, so =ba is decoded as the single byte 0xBA
  • The resulting bytes to be invalid UTF-8, which shows up as U+FFFD in the output
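The double decoding can be reproduced in isolation. The sketch below is not code from this PR: decodeQP and the example.com URL are hypothetical names chosen for illustration. It feeds an already-decoded URL through Go's mime/quotedprintable reader, which accepts lowercase hex digits and so consumes =ba as one byte.

```go
package main

import (
	"fmt"
	"io"
	"mime/quotedprintable"
	"strings"
	"unicode/utf8"
)

// decodeQP (hypothetical helper) runs s through the standard library's
// quoted-printable reader and returns the decoded bytes as a string.
func decodeQP(s string) (string, error) {
	b, err := io.ReadAll(quotedprintable.NewReader(strings.NewReader(s)))
	return string(b), err
}

func main() {
	// Assumed example: a body the API has already decoded, still labeled QP.
	body := "https://example.com/?foo=bar"

	out, err := decodeQP(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}

	// "=ba" is consumed as the byte 0xBA, which is not valid UTF-8 by itself.
	fmt.Printf("decoded: %q\n", out)
	fmt.Println("valid UTF-8:", utf8.ValidString(out))
}
```

A terminal or JSON encoder that expects UTF-8 will typically render the stray byte as U+FFFD, which matches the corruption reported in #159.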

Solution

Added looksLikeQuotedPrintable() check (similar to existing looksLikeBase64()) that detects actual QP markers before decoding:

  • Soft line breaks (=\r\n or =\n)
  • Uppercase hex sequences (=XX where X is 0-9 or A-F)

Using uppercase-only hex detection avoids false positives from URLs containing lowercase letters after = (e.g., ?foo=bar).
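A minimal sketch of the detection rule described above, under the assumption that it scans for either marker and returns on the first hit. The function name comes from this PR, but the body and the isUpperHex helper are illustrative guesses, not the merged implementation.

```go
package main

import "fmt"

// isUpperHex reports whether c is one of 0-9 or A-F (hypothetical helper).
func isUpperHex(c byte) bool {
	return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F')
}

// looksLikeQuotedPrintable reports whether data contains at least one
// quoted-printable marker: a soft line break ("=\n" or "=\r\n") or an
// "=XX" escape whose two digits are uppercase hex. Sketch only; the
// real function in the PR may apply additional checks.
func looksLikeQuotedPrintable(data []byte) bool {
	for i := 0; i < len(data); i++ {
		if data[i] != '=' {
			continue
		}
		rest := data[i+1:] // empty slice when '=' is the last byte
		if len(rest) >= 1 && rest[0] == '\n' {
			return true // soft break, LF only
		}
		if len(rest) >= 2 && rest[0] == '\r' && rest[1] == '\n' {
			return true // soft break, CRLF
		}
		if len(rest) >= 2 && isUpperHex(rest[0]) && isUpperHex(rest[1]) {
			return true // escaped byte such as =C3
		}
	}
	return false
}

func main() {
	samples := []string{
		"caf=C3=A9",                    // genuine QP escapes
		"wrapped line=\r\ncontinues",   // soft line break
		"https://example.com/?foo=bar", // already-decoded URL
		"=",                            // too short to be a marker
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, looksLikeQuotedPrintable([]byte(s)))
	}
}
```

As written, this sketch still treats digits as hex, so a parameter such as ?id=42 would match; whether the PR guards against that case is not stated here.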

Testing

Added unit tests covering:

  • Original issue (URL preservation when already decoded)
  • Various QP patterns (uppercase hex, soft breaks)
  • False positive prevention (lowercase URL params)
  • Edge cases (short input, mixed case)

Contributed via OpenClaw agent — active gog CLI users contributing back.

Agent added 2 commits February 4, 2026 17:39
When Gmail API returns already-decoded content (format=full), the
Content-Transfer-Encoding header may still say 'quoted-printable'.
Previously, we would attempt to QP-decode again, causing raw '='
characters in URLs to be replaced with U+FFFD (replacement char).

This adds looksLikeQuotedPrintable() to detect actual QP sequences
(=XX hex or =CRLF soft breaks) and skip decoding when content appears
pre-decoded.

Fixes steipete#159

Matt's security review caught that URLs like '?foo=bar' would incorrectly
trigger QP detection because '=ba' matches as hex digits.

Changed to only match UPPERCASE hex (0-9, A-F) since:
- RFC 2045 requires uppercase hex digits in QP encoding (decoders may tolerate lowercase)
- Most encoders use uppercase in practice
- This avoids false positives from lowercase URL params

Development

Successfully merging this pull request may close these issues.

Bug: gmail get corrupts URLs in email body (quoted-printable decoding issue)