fix(smithy-json): escape control characters in write_string per RFC 8259 §7 #647

Open
jason-weddington wants to merge 2 commits into smithy-lang:develop from jason-weddington:fix/json-control-char-escaping

@jason-weddington
Title

fix(smithy-json): escape control characters in write_string per RFC 8259 §7

Body

Summary

StreamingJSONEncoder.write_string() only escapes \ and ". Control characters U+0000–U+001F (\n, \t, \r, etc.) are written as raw bytes.

The output is therefore invalid JSON per RFC 8259 §7, which causes a SerializationException (HTTP 400) on any API call where a string field contains these characters.

Root Cause

serializers.py, write_string():

value.replace("\\", "\\\\").replace('"', '\\"').encode("utf-8")

The chain has no escapes for \n, \r, \t, \b, \f, or the other U+0000–U+001F characters, all of which RFC 8259 §7 requires to be escaped.
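
To make the failure concrete, the following sketch reproduces the replace chain in isolation and checks its output with the standard library parser. The helper names old_escape and is_valid_json_string are hypothetical and exist only for this illustration; they are not part of smithy-json.

```python
import json


def old_escape(value: str) -> bytes:
    # Mirrors the replace chain quoted above: only backslash and
    # double quote are escaped, everything else passes through.
    return value.replace("\\", "\\\\").replace('"', '\\"').encode("utf-8")


def is_valid_json_string(body: bytes) -> bool:
    # Wrap the escaped body in quotes and let json.loads decide.
    # The default strict mode rejects raw control characters in strings.
    try:
        json.loads(b'"' + body + b'"')
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json_string(old_escape('say "hi"')))        # True
print(is_valid_json_string(old_escape("line 1\nline 2")))  # False
```

Quotes and backslashes survive, but any string with a raw newline is rejected by a conforming parser, which matches the behavior described in the summary.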

Fix

Replace the inline .replace() chain with a regex-based _escape_string() that handles:

  • Named escapes: \n, \r, \t, \b, \f (plus existing \\ and \")
  • Generic escapes: \uXXXX for remaining U+0000–U+001F
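
A minimal sketch of what such a regex-based escaper could look like is shown below. The PR's actual _escape_string is not reproduced in this description, so the names escape_string, _NAMED_ESCAPES, and _NEEDS_ESCAPE are assumptions made for illustration, and the real implementation may differ in detail.

```python
import re

# Hypothetical names; the PR's own helper may be structured differently.
_NAMED_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}

# Matches backslash, double quote, and every character in U+0000-U+001F.
_NEEDS_ESCAPE = re.compile(r'[\\"\x00-\x1f]')


def _replace(match: re.Match) -> str:
    char = match.group(0)
    named = _NAMED_ESCAPES.get(char)
    if named is not None:
        return named
    # Remaining control characters use the generic \uXXXX form.
    return f"\\u{ord(char):04x}"


def escape_string(value: str) -> str:
    # A single pass over the input, so already-escaped output is never
    # rescanned (unlike a chain of str.replace calls).
    return _NEEDS_ESCAPE.sub(_replace, value)


print(escape_string("line 1\nline 2"))  # line 1\nline 2
print(escape_string("\x00"))            # \u0000
```

One design point worth noting: because the substitution runs in one pass, the order of the escape rules does not matter, whereas a replace chain must handle the backslash first to avoid double-escaping.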

Tests

  • 5 new serde round-trip cases (newline, tab, CR, backslash, quote) added to JSON_SERDE_CASES
  • Dedicated TestStringControlCharEscaping class with:
    • Parametrized named control char tests
    • Exhaustive U+0000–U+001F round-trip via json.loads validation
    • Null byte, mixed escapes, and realistic multi-line prompt cases
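
The exhaustive round-trip idea can be sketched as follows. The PR's test class is not shown here, so this uses a stand-in escaper built on str.translate (not the PR's code) purely so the check runs on its own; round_trips and check_all_control_chars are hypothetical names.

```python
import json

# Stand-in escaper for illustration only: default every control
# character to \uXXXX, then override the ones with named escapes.
_TABLE = {cp: f"\\u{cp:04x}" for cp in range(0x20)}
_TABLE.update({
    ord("\\"): "\\\\",
    ord('"'): '\\"',
    ord("\n"): "\\n",
    ord("\r"): "\\r",
    ord("\t"): "\\t",
    ord("\b"): "\\b",
    ord("\f"): "\\f",
})


def escape_string(value: str) -> str:
    return value.translate(_TABLE)


def round_trips(value: str) -> bool:
    # Encode by hand, then decode with the standard library parser.
    encoded = '"' + escape_string(value) + '"'
    return json.loads(encoded) == value


def check_all_control_chars() -> list[int]:
    """Return the code points that fail to round-trip (empty if all pass)."""
    return [cp for cp in range(0x20) if not round_trips(f"a{chr(cp)}b")]


print(check_all_control_chars())                       # []
print(round_trips('System:\n\t"quoted" \\ path\x00'))  # True
```

Validating against json.loads rather than comparing to hand-written expected bytes means the test checks the property that matters, namely that a conforming parser recovers the original string.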

All 126 tests pass (106 existing + 20 new).

Impact

Affects all services using smithy-json for request serialization. Any string field containing control characters produces invalid JSON. For services where string fields commonly contain newlines — such as the Bedrock Runtime Converse API — this is the primary failure path.

Prior Art

The Rust implementation in smithy-rs already handles this correctly in escape.rs:

fn escape_string_inner(start: &[u8], rest: &[u8]) -> String {
    let mut escaped = Vec::with_capacity(start.len() + rest.len() + 1);
    escaped.extend(start);
    for byte in rest {
        match byte {
            b'"' => escaped.extend(b"\\\""),
            b'\\' => escaped.extend(b"\\\\"),
            0x08 => escaped.extend(b"\\b"),
            0x0C => escaped.extend(b"\\f"),
            b'\n' => escaped.extend(b"\\n"),
            b'\r' => escaped.extend(b"\\r"),
            b'\t' => escaped.extend(b"\\t"),
            0..=0x1F => escaped.extend(format!("\\u{byte:04x}").bytes()),
            _ => escaped.push(*byte),
        }
    }
    // ...
}

This PR brings the Python implementation in line with the Rust behavior.

Reproduction

from io import BytesIO
from smithy_json import JSONCodec
from smithy_core.prelude import STRING

sink = BytesIO()
s = JSONCodec().create_serializer(sink)
s.write_string(STRING, "line 1\nline 2")
s.flush()
print(sink.getvalue())
# Before fix: b'"line 1\nline 2"'   (invalid JSON — raw 0x0A byte)
# After fix:  b'"line 1\\nline 2"'  (valid JSON)

…259 §7

StreamingJSONEncoder.write_string() only escaped backslash and double
quote. Control characters U+0000–U+001F (newline, tab, CR, etc.) were
written as raw bytes, producing invalid JSON that causes
SerializationException on API calls with multi-line string fields.

Use a regex to escape all control characters: named escapes for common
ones (\n, \r, \t, \b, \f) and \uXXXX for the rest.
@jason-weddington jason-weddington requested a review from a team as a code owner February 27, 2026 16:36

@jonathan343 jonathan343 left a comment

Thanks @jason-weddington!

This looks good to me. Can you add a changelog entry for us so this gets included when we do our next release? You can achieve this by running a command similar to the following:

./scripts/changelog/new-entry.py -t enhancement -p smithy-json -d "Fixed string serialization to escape all control characters (U+0000-U+001F) per [RFC 8259](https://www.rfc-editor.org/rfc/rfc8259#section-7), preventing invalid JSON output for multiline and other control-character-containing strings. ([#647](https://github.com/smithy-lang/smithy-python/pull/647))"

@jason-weddington

done!
