RFC: Implementing `serde::Serialize`

Status: RFC

Applies to: client, server

For a summarized list of proposed changes, see the Changes Checklist section.

This RFC defines how smithy-rs will enable customers to use the serde library with generated clients & servers. This is a common request for myriad reasons, but as we have written about before this is a challenging design area. This RFC proposes a new approach: Rather than implement Serialize directly, add a method to types that returns a type that implements Serialize. This solves a number of issues:

It is minimally impactful: It doesn't lock us into one Serialize implementation. It contains only one public trait, SerializeConfigured. This trait will initially be defined on a per-crate basis to avoid running into the orphan rule. It also doesn't have any impact on shared runtime crates (since no types actually need to implement Serialize).
It allows customers to configure serde to their use case. For example, for testing/replay you probably don't want to redact sensitive fields but for logging or other forms of data storage, you may want to redact those fields.
The entire implementation is isolated to a single module, making it trivial to feature-gate out.

Terminology

serde: A specific Rust library that is commonly used for serialization
Serializer: The serde design decouples the serialization format (e.g. JSON) from the serialization structure of a particular piece of data. This allows the same Rust code to be serialized to CBOR, JSON, etc. The serialization protocol, e.g. serde_json, is referred to as the Serializer.
Decorator: The interface by which code separated from the core code generator can customize codegen behavior.

The user experience if this RFC is implemented

Currently, there is no practical way for customers to link Smithy-generated types with Serialize. Customers will bring the SerdeDecorator into scope by including it on their classpath when generating clients and servers.

Customers may add a serde trait to shapes in their model:

use smithy.rust#serde;
@serde
structure SomeStructure {
    field: String
}

The serde trait can be added to all shapes, including operations and services. When it is applied to a service, all shapes in the service closure will support serialization.

Note: this RFC only describes Serialize. A follow-up RFC and implementation will handle Deserialize.

Generated crates that include at least one serde-tagged shape will include a serde feature. This feature gates the module containing the serialization logic. The module provides implementations of SerializeConfigured, which has two methods:

my_thing.serialize_ref(&settings); // Returns `impl Serialize + 'a`
my_thing.serialize_owned(settings); // Returns `impl Serialize`

Once a customer has an object that implements Serialize they can then use it with any Serializer supported by serde.

use generated_crate::serde::SerializeConfigured;
let my_shape = Shape::builder().field(5).build();
let settings = SerializationSettings::redact_sensitive_fields();
let as_json = serde_json::to_string(&my_shape.serialize_ref(&settings));

Customer Use Cases

I want to embed a structure into my own types that implement `Serialize`

The generated code includes two functions: serialize_redacted and serialize_unredacted.

Note: Nothing in these implementations relies on implementation details. Customers can implement these functions (or variants of them) themselves.

These have the correct signatures to be used with serialize_with:

use generated_crate::serde::serialize_redacted;

#[derive(Serialize)]
struct MyStruct {
    #[serde(serialize_with = "serialize_redacted")]
    inner: SayHelloInput,
}

I want to serialize data for testing

This will be supported in the future. Reading serialized data back requires Deserialize, whose behavior is not covered by this RFC. Customers can take the same serialization settings they used.

I want to dump a structured form of data into a database/logs/etc.

This is possible using the base APIs. If customers want to delegate the actual serialization to another thread or piece of code, they can use .serialize_owned(..) along with erased-serde.

How to actually implement this RFC

In order to provide configurable serialization, this defines the crate-local public trait SerializeConfigured:

/// Trait that allows configuring serialization
/// **This trait should not be implemented directly!** Instead, `impl Serialize for ConfigurableSerdeRef<T>`
pub trait SerializeConfigured {
    /// Return a `Serialize` implementation for this object that owns the object.
    ///
    /// Use this if you need to create `Arc<dyn Serialize>` or similar.
    fn serialize_owned(self, settings: SerializationSettings) -> impl Serialize;

    /// Return a `Serialize` implementation for this object that borrows from the given object
    fn serialize_ref<'a>(&'a self, settings: &'a SerializationSettings) -> impl Serialize + 'a;
}

We also need to define SerializationSettings. The only setting currently exposed is redact_sensitive_fields:

#[non_exhaustive]
#[derive(Copy, Clone, Debug, Default)]
pub struct SerializationSettings {
    /// Replace all sensitive fields with `<redacted>` during serialization
    pub redact_sensitive_fields: bool,
}
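Because the struct is marked #[non_exhaustive], code outside the generated crate cannot build it with a struct literal, so some constructor is needed. The following is a minimal sketch of what that could look like. `redact_sensitive_fields()` matches the call used in the usage example earlier in this RFC; `leak_sensitive_fields()` is a hypothetical name introduced here only for illustration.

```rust
/// Mirrors the struct defined above.
#[non_exhaustive]
#[derive(Copy, Clone, Debug, Default)]
pub struct SerializationSettings {
    /// Replace all sensitive fields with `<redacted>` during serialization
    pub redact_sensitive_fields: bool,
}

impl SerializationSettings {
    /// Settings that redact members marked `@sensitive`.
    pub const fn redact_sensitive_fields() -> Self {
        Self { redact_sensitive_fields: true }
    }

    /// Hypothetical counterpart (name is an assumption): settings that
    /// serialize sensitive members verbatim, e.g. for testing or replay.
    pub const fn leak_sensitive_fields() -> Self {
        Self { redact_sensitive_fields: false }
    }
}
```

Since the struct is Copy, callers can pass it by value to serialize_owned and by reference to serialize_ref without extra cloning.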

We MAY add additional configuration options in the future, but will keep the default behavior matching current behavior. Future options include:

Serialize null when a field is unset (the current default is to skip serializing that field)
Serialize blobs via a list of numbers instead of via base64 encoding
Change the default format for datetimes (currently HttpDate)

No generated type implements SerializeConfigured directly. Instead, the crate defines two crate-private structs:

pub(crate) struct ConfigurableSerde<T> {
    pub(crate) value: T,
    pub(crate) settings: SerializationSettings
}

pub(crate) struct ConfigurableSerdeRef<'a, T> {
    pub(crate) value: &'a T,
    pub(crate) settings: &'a SerializationSettings
}

Why two structs?

We need to support two use cases: one where the customer wants to maintain ownership of their data, and another where the customer wants to create a Box<dyn Serialize> or other fat pointer (in practice through erased-serde, since serde's Serialize trait cannot be used as a trait object directly). Serialize is implemented for ConfigurableSerde by a blanket impl that delegates to ConfigurableSerdeRef.
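The relationship between the two wrappers can be sketched as follows. To keep the example compilable with the standard library alone, a small local trait named `Render` stands in for serde's Serialize; `Render`, `ApiKey`, and `render_on_worker` are hypothetical names that do not appear in the generated code. The owned wrapper borrows from itself and defers to the borrowed wrapper, so per-type logic is written only once.

```rust
/// Stand-in for `serde::Serialize` (assumption: keeps the sketch std-only).
trait Render {
    fn render(&self) -> String;
}

#[derive(Copy, Clone, Default)]
struct SerializationSettings {
    redact_sensitive_fields: bool,
}

struct ConfigurableSerde<T> {
    value: T,
    settings: SerializationSettings,
}

struct ConfigurableSerdeRef<'a, T> {
    value: &'a T,
    settings: &'a SerializationSettings,
}

/// Blanket impl: the owned wrapper delegates to the borrowed wrapper.
impl<T> Render for ConfigurableSerde<T>
where
    for<'a> ConfigurableSerdeRef<'a, T>: Render,
{
    fn render(&self) -> String {
        ConfigurableSerdeRef { value: &self.value, settings: &self.settings }.render()
    }
}

/// Hypothetical shape whose single member is sensitive.
struct ApiKey(String);

/// The only per-type impl that has to be written (or generated).
impl<'a> Render for ConfigurableSerdeRef<'a, ApiKey> {
    fn render(&self) -> String {
        if self.settings.redact_sensitive_fields {
            "\"<redacted>\"".to_string()
        } else {
            // `{:?}` adds quotes; full JSON escaping is out of scope here.
            format!("{:?}", self.value.0)
        }
    }
}

/// The owned wrapper holds no borrows, so it can move to another thread.
fn render_on_worker(key: ApiKey, settings: SerializationSettings) -> String {
    let owned = ConfigurableSerde { value: key, settings };
    std::thread::spawn(move || owned.render())
        .join()
        .expect("worker thread panicked")
}
```

A borrowed wrapper could not be handed to `std::thread::spawn` this way, because it carries the lifetime of the data it points at. That is the case the owned variant exists for.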

SerializeConfigured is then provided by a blanket impl for every T whose ConfigurableSerdeRef<'a, T> implements Serialize:

/// Blanket implementation for all `T` such that `ConfigurableSerdeRef<'a, T>` implements `Serialize`.
impl<T> SerializeConfigured for T
    where for<'a> ConfigurableSerdeRef<'a, T>: Serialize {
    fn serialize_owned(
        self,
        settings: SerializationSettings,
    ) -> impl Serialize {
        ConfigurableSerde {
            value: self,
            settings,
        }
    }

    fn serialize_ref<'a>(
        &'a self,
        settings: &'a SerializationSettings,
    ) -> impl Serialize + 'a {
        ConfigurableSerdeRef {
            value: self,
            settings,
        }
    }
}

The job of the code generator is then to implement Serialize for ConfigurableSerdeRef<'a, T> for each specific T that we wish to serialize.
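As a rough illustration of what such generated impls might look like, the sketch below handles a structure with optional members and a nested structure. It uses a local `Render` trait as a stand-in for serde's Serialize so that it compiles without external crates; `Render`, `Person`, and `Address` are hypothetical. Unset members are skipped, matching the default described above, and nested shapes are wrapped with the caller's settings.

```rust
/// Stand-in for `serde::Serialize` (assumption: keeps the sketch std-only).
trait Render {
    fn render(&self) -> String;
}

struct SerializationSettings {
    // No member in this sketch is sensitive, so the flag is never read.
    #[allow(dead_code)]
    redact_sensitive_fields: bool,
}

struct ConfigurableSerdeRef<'a, T> {
    value: &'a T,
    settings: &'a SerializationSettings,
}

/// Hypothetical model shapes.
struct Address {
    city: Option<String>,
}

struct Person {
    name: Option<String>,
    address: Option<Address>,
}

impl<'a> Render for ConfigurableSerdeRef<'a, Address> {
    fn render(&self) -> String {
        let mut fields = Vec::new();
        if let Some(city) = &self.value.city {
            fields.push(format!("\"city\":{:?}", city));
        }
        format!("{{{}}}", fields.join(","))
    }
}

impl<'a> Render for ConfigurableSerdeRef<'a, Person> {
    fn render(&self) -> String {
        let mut fields = Vec::new();
        // Unset members are skipped rather than written as null.
        if let Some(name) = &self.value.name {
            fields.push(format!("\"name\":{:?}", name));
        }
        if let Some(address) = &self.value.address {
            // Nested shapes are wrapped with the same settings.
            let nested = ConfigurableSerdeRef { value: address, settings: self.settings };
            fields.push(format!("\"address\":{}", nested.render()));
        }
        format!("{{{}}}", fields.join(","))
    }
}
```

Passing the settings down through every nested wrapper is what lets a single flag on the outermost call affect members arbitrarily deep in the structure.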

Supporting Sensitive Shapes

Handling @sensitive is done by wrapping members in Sensitive<'a, T>(&'a T) during serialization. The Serialize implementation consults the settings to determine if redaction is required.

if let Some(member_1) = &inner.foo {
    s.serialize_field("foo",
        &Sensitive(&member_1.serialize_ref(&self.settings)).serialize_ref(&self.settings),
    )?;
}

Note that the exact mechanism for supporting sensitive shapes is crate-private and can be changed in the future.
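One way the redaction check could work is sketched below, under the assumption that the wrapper's impl simply branches on the settings. A local `Render` trait stands in for serde's Serialize so the example needs only the standard library; `Render` and `render_password` are hypothetical helpers, and the real crate-private mechanism may differ.

```rust
/// Stand-in for `serde::Serialize` (assumption: keeps the sketch std-only).
trait Render {
    fn render(&self) -> String;
}

struct SerializationSettings {
    redact_sensitive_fields: bool,
}

struct ConfigurableSerdeRef<'a, T> {
    value: &'a T,
    settings: &'a SerializationSettings,
}

/// Marks a borrowed value as targeting a `@sensitive` shape.
struct Sensitive<'a, T>(&'a T);

/// Leaf impl so that strings can be rendered.
impl Render for String {
    fn render(&self) -> String {
        // `{:?}` adds quotes; full JSON escaping is out of scope here.
        format!("{:?}", self)
    }
}

/// Any sensitive value is either replaced by a placeholder or passed through,
/// depending on the settings carried by the wrapper.
impl<'a, 'b, T: Render> Render for ConfigurableSerdeRef<'a, Sensitive<'b, T>> {
    fn render(&self) -> String {
        if self.settings.redact_sensitive_fields {
            "\"<redacted>\"".to_string()
        } else {
            self.value.0.render()
        }
    }
}

/// Hypothetical helper showing the call shape used for a sensitive member.
fn render_password(password: &String, settings: &SerializationSettings) -> String {
    ConfigurableSerdeRef { value: &Sensitive(password), settings }.render()
}
```

Keeping the decision inside the wrapper's impl means the per-structure code does not need to know whether redaction is enabled; it only needs to know which members are sensitive.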

Supporting Maps and Lists

For Maps and Lists, we need to be able to handle the case where two different Vec<String> may be serialized differently. For example, one may target a Sensitive string and the other may target a non-sensitive string.

To handle this case, we generate a wrapper struct for collections:

struct SomeStructWrapper<'a>(&'a Vec<SomeStruct>);

We then implement Serialize for this wrapper which allows us to control behavior on a collection-by-collection basis without running into conflicts.

Note: This is a potential area where future optimizations could reduce the amount of generated code if we were able to detect that collection serialization implementations were identical and deduplicate them.
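The conflict that the wrappers avoid can be illustrated with two lists that share the same Rust type but need different treatment. This is a sketch only: a local `Render` trait stands in for serde's Serialize, and `TagList` and `SecretList` are hypothetical wrapper names, not names the generator is known to emit.

```rust
/// Stand-in for `serde::Serialize` (assumption: keeps the sketch std-only).
trait Render {
    fn render(&self) -> String;
}

struct SerializationSettings {
    redact_sensitive_fields: bool,
}

struct ConfigurableSerdeRef<'a, T> {
    value: &'a T,
    settings: &'a SerializationSettings,
}

/// Wrapper for a list whose members are ordinary strings.
struct TagList<'a>(&'a Vec<String>);

/// Wrapper for a list whose members target a sensitive string.
struct SecretList<'a>(&'a Vec<String>);

impl<'a, 'b> Render for ConfigurableSerdeRef<'a, TagList<'b>> {
    fn render(&self) -> String {
        let items: Vec<String> = self.value.0.iter().map(|s| format!("{:?}", s)).collect();
        format!("[{}]", items.join(","))
    }
}

impl<'a, 'b> Render for ConfigurableSerdeRef<'a, SecretList<'b>> {
    fn render(&self) -> String {
        let redact = self.settings.redact_sensitive_fields;
        let items: Vec<String> = self
            .value
            .0
            .iter()
            .map(|s| if redact { "\"<redacted>\"".to_string() } else { format!("{:?}", s) })
            .collect();
        format!("[{}]", items.join(","))
    }
}
```

Without the wrappers, both impls would target ConfigurableSerdeRef<'a, Vec<String>> and the compiler would reject them as overlapping.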

Supporting `DateTime`, `Blob`, `Document`, etc.

For custom types that do not implement Serialize, we generate crate-private implementations, only when actually needed:

impl<'a> Serialize for ConfigurableSerdeRef<'a, DateTime> {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error> where S: Serializer {
        serializer.serialize_str(&self.value.to_string())
    }
}

Changes checklist

Define SerializeConfigured
Define ConfigurableSerde/ConfigurableSerdeRef
Generate implementations for all types in the service closure
Handle sensitive shapes
Implement Deserialize
