WIP: Functional Cached Schema Visitors by Milyardo · Pull Request #1871 · disneystreaming/smithy4s

Milyardo · 2026-01-14T16:34:22Z

This is a WIP PR to share an approach to traversing schemas in a more principled way. The current status quo with schema visitors have a few problems that we look to address:

1.) Address bugs around the caching of targets of compilation via side effects
2.) Improve stack safety by not allowing visitors to recursively call themselves

To these ends this PR implements traversal in 2 passes. The first pass is a corecursive process that unfolds a schema into a rose tree of schema nodes, terminating recursion in the schemas by ending traversal when a node is already been seen in current path from the root node. The second pass traverses the finite tree of schemas nodes in post order, providing lazy references to compilation targets to the visitor.

The first pass is implemented in TreeSchemaVisitor.unfoldToTree. This method recurses from the root node down into child nodes as long a child node has not been seen before. The following recursive tree defined in smithy:

union Tree {
  tree: TreeNode
  leaf: LeafNode
}

structure TreeNode {
  @required
  left: Tree
  @required
  right: Tree
}

structure LeafNode {
  @required
  value: Integer
}

will result in a tree that looks like this:

────Tree: LazySchema
    └── Tree: UnionSchema
        ├── TreeNode: BijectionSchema
        │   └── TreeNode: LazySchema
        │       └── TreeNode: StructSchema
        │           ├── Tree: LazySchema
        │           │   └── Tree: UnionSchema
        │           │       ├── TreeNode: BijectionSchema
        │           │       └── LeafNode: BijectionSchema
        │           │           └── LeafNode: StructSchema
        │           │               └── Integer: PrimitiveSchema
        │           └── Tree: LazySchema
        │               └── Tree: UnionSchema
        │                   ├── TreeNode: BijectionSchema
        │                   └── LeafNode: BijectionSchema
        │                       └── LeafNode: StructSchema
        │                           └── Integer: PrimitiveSchema
        └── LeafNode: BijectionSchema
            └── LeafNode: StructSchema
                └── Integer: PrimitiveSchema

The second pass is implemented in TreeSchemaVisitor.cachedPostOrder where during this post order traversal, an monadic cache effect is provided to the compilation function to fetch references to compilation targets the current target may depend on. Users are meant to implement the VisitorF trait instead of interacting which the cache directly, where VisitorF.apply finds any dependencies for end users and fetches references in the cache for them.

VisitorF is a an effectful visitor with a similar interface to existing visitors, however instead of being given a Schema for any child nodes, the visitor is given lazy references to targets instead. This makes it so schemas don't have to recursively call themselves in order to obtain dependencies. Example Hash and Show visitors have been implemented in TreeVisitorHash and TreeVisitorShow.

Below is a list of work to still be done in this PR:

Implement methods to compose VisitorFs
Implement more Visitors
Figure out what to do with older visitor implementations

…into a eagerly evaluated tree.

CLAassistant · 2026-01-14T16:34:30Z

All committers have signed the CLA.

Baccata · 2026-01-16T08:36:15Z

modules/core/src/smithy4s/schema/VisitorF.scala

+                           shapeId: ShapeId,
+                           hints: Hints,
+                           tag: CollectionTag[C],
+                           member: Lazy[G[A]]


In a lot of cases, schemas visitors delegate to other schema visitors. We want to give implementors the freedom to do that, without having to bake the complexity into the G data structure.

I think removing the def apply(schema: Schema[A) : G[A] to prevent implementors from making an uncontrolled recursive call is smart, but at the same time the interface you propose constrains the implementation patterns too much.

To get a sense of the problem : you should try and implement Document encoders and/or decoders against your POC. The implementation of the map component needs to delegate to a key encoder/decoder, and therefore is a good example that should help validate the ergonomics of your proposed solution.

I think my preference would be to have the Cache type not carry the "codec" type parameter, and expose it to the implementers. It'd look like this :

trait Cache[F[_]]{ def getOrCompute[Codec[_], A](visitor: Visitor[F, Codec], schema: Schema[A]) : F[Codec[A]] def getOrComputeRecursive[Codec[_], A](visitor: Visitor[F, Codec], schema: Lazy[Schema[A]], makeCodec: Lazy[Codec[A]] => Codec[A]) : F[Codec[A]] }

And then the Visitor[F] would look like :

// NB, Zipper is Smithy4s' equivalent to cat's Applicative abstract class Visitor[F[_] : Cache : Zipper, Codec[_]]{ def list[A](shapeId: ShapeId, hints: Hints, member: Schema[A]) : F[Codec[List[A]]] // other methods mirroring the Schema ADT }

NOT EXPOSING A recursive apply like the current SchemaVisitor. This would nearly prohibit the implementors calling un-managed recursion, by really encouraging them to use the provided implicit Cache, passing this: Visitor to its methods (or another visitor should delegation be needed). The implementation entries stored in the Cache's internal map would be keyed by (Visitor, Schema), thus giving the flexibility to have visitors that would be able to delegate to other ones whilst making sure that in the same F program, we'd get a single computation for every codec needed within a service (or even across services)

Baccata · 2026-01-16T08:39:36Z

modules/core/src/smithy4s/data/Tree.scala

+ * A Scala encoding of a rose tree, a tree data structure with a variable and unbounded number of branches per node.
+ * @see https://en.wikipedia.org/wiki/Rose_tree
+ */
+sealed trait Tree[+A] {


~~Is this used in anywhere outside of tests ? I'd rather not expand the API of core unless absolutely necessary~~

I don't know this is necessarily a good fit for the problem, conceptually speaking, in that it seems you'd be recomputing a tree for each new root. Like, if you have the following smithy service :

trait Foo { def op1(a: A) def op2(b: B) def op3 (c: C) }

where

case class B(a: A) case class C(b: B)

don't you end up recomputing trees for A a few times ? When computing codecs, datatypes shouldn't be considered in isolation, but rather as part of a larger whole. A type may be the root of a tree in the context of an operation, and a sub node in another.

The rose trees are intermediate data structures that define how we recurse through a Schema they are constructed and immediately discarded after compilation, and some kind of data structure is necessary by the nature of the problem. If we were to solve this problem using recursion schemes Tree would be a our fix point.

Originally I had taken steps to avoid the construction of a intermediate data structure and basically implemented our own version of hylo. I found in trying to do so, I was basically defining a new library for recursion schemes, was implementing a bunch of machinery to be to avoid a problem that I wasn't sure existed in the first place.

So, I avoided the problem by making the recursive and corecursive traversal explicit, and made things easier for people follow along who weren't well versed in recursion schemes. In practice the allocations don't seem to make much a difference in compile times, but I plan on testing it against the entire schema registry at Disney sometime soon.

and some kind of data structure is necessary by the nature of the problem

That's not necessarily true. I get how you'd reach this conclusion from the point of view that schema compilation is always akin to an hylomorphism, but modelling it this way has negative implications over the ergonomics available to implementors when delegating to other compilation is necessary. Ironically the initial design of smithy4s schema visitors was similar to what you're proposing here (granted, we weren't trying to think really hard about caching back then), but it ended up being much more annoying than the current status quo.

So the middle ground I'm after would be one that would allow some level of control from the user to do something with a member schema received as an input to the visitor method, without giving them control over the general recursion, and what I'm suggesting here (or better, a free-applicative version of that) would allow, I think, to get us that sweet spot.

Baccata · 2026-01-16T08:40:28Z

modules/core/src/smithy4s/schema/Schema.scala

      case BijectionSchema(s, bijection) =>
        underlying(BijectionSchema(this(s), bijection))
      case LazySchema(suspend) =>
-        underlying(LazySchema(suspend.map(this.apply)))


👍 very good catch !

…ne KleisliK, as tool to compose visitors.

a-morales and others added 2 commits January 14, 2026 04:39

write tests for Lazy

3b22bf2

Implement a schema visitor that visits schemas by first turning them …

de12e51

…into a eagerly evaluated tree.

Baccata reviewed Jan 16, 2026

View reviewed changes

Define a DocumentEncoder visitor using tree based traversals and defi…

27ef1c5

…ne KleisliK, as tool to compose visitors.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Functional Cached Schema Visitors#1871

WIP: Functional Cached Schema Visitors#1871
Milyardo wants to merge 3 commits intodisneystreaming:series/0.19from
Milyardo:zpowers/pureCaching

Milyardo commented Jan 14, 2026 •

edited

Loading

Uh oh!

CLAassistant commented Jan 14, 2026 •

edited

Loading

Uh oh!

Baccata Jan 16, 2026

Uh oh!

Baccata Jan 16, 2026

Uh oh!

Baccata Jan 16, 2026 •

edited

Loading

Uh oh!

Milyardo Jan 20, 2026

Uh oh!

Baccata Jan 21, 2026 •

edited

Loading

Uh oh!

Baccata Jan 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Milyardo commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CLAassistant commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Baccata Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Baccata Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Baccata Jan 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Milyardo Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

Baccata Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Baccata Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Milyardo commented Jan 14, 2026 •

edited

Loading

CLAassistant commented Jan 14, 2026 •

edited

Loading

Baccata Jan 16, 2026 •

edited

Loading

Baccata Jan 21, 2026 •

edited

Loading