WIP: Functional Cached Schema Visitors#1871
Milyardo wants to merge 3 commits into disneystreaming:series/0.19 from
Conversation
…into an eagerly evaluated tree.
```scala
shapeId: ShapeId,
hints: Hints,
tag: CollectionTag[C],
member: Lazy[G[A]]
```
In a lot of cases, schema visitors delegate to other schema visitors. We want to give implementors the freedom to do that, without having to bake the complexity into the G data structure.
I think removing the `def apply(schema: Schema[A]): G[A]` to prevent implementors from making an uncontrolled recursive call is smart, but at the same time the interface you propose constrains the implementation patterns too much.
To get a sense of the problem: you should try to implement Document encoders and/or decoders against your POC. The implementation of the map component needs to delegate to a key encoder/decoder, and is therefore a good example that should help validate the ergonomics of your proposed solution.
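To make the delegation concern concrete, here is a hedged sketch (not smithy4s' actual API; `Encoder` and `mapEncoder` are made-up names for illustration) of a map codec that must delegate to codecs for its key and value schemas:

```scala
// Illustrative only: a Document-style encoder for a map cannot be built in
// isolation; it has to delegate to the encoders of its key and value schemas.
trait Encoder[A] { def encode(a: A): String }

def mapEncoder[K, V](key: Encoder[K], value: Encoder[V]): Encoder[Map[K, V]] =
  new Encoder[Map[K, V]] {
    def encode(m: Map[K, V]): String =
      m.map { case (k, v) => key.encode(k) + ":" + value.encode(v) }.mkString("{", ",", "}")
  }
```

An interface that only hands the implementor opaque pre-compiled results makes this kind of delegation awkward, which is the ergonomics concern raised here.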
I think my preference would be to have the Cache type not carry the "codec" type parameter, and expose it to the implementors. It'd look like this:

```scala
trait Cache[F[_]] {
  def getOrCompute[Codec[_], A](visitor: Visitor[F, Codec], schema: Schema[A]): F[Codec[A]]
  def getOrComputeRecursive[Codec[_], A](visitor: Visitor[F, Codec], schema: Lazy[Schema[A]], makeCodec: Lazy[Codec[A]] => Codec[A]): F[Codec[A]]
}
```

And then the Visitor[F] would look like:

```scala
// NB: Zipper is Smithy4s' equivalent to cats' Applicative
abstract class Visitor[F[_]: Cache: Zipper, Codec[_]] {
  def list[A](shapeId: ShapeId, hints: Hints, member: Schema[A]): F[Codec[List[A]]]
  // other methods mirroring the Schema ADT
}
```

Note that this does NOT expose a recursive apply like the current SchemaVisitor. It would all but prevent implementors from performing unmanaged recursion, by strongly encouraging them to use the provided implicit Cache, passing `this: Visitor` to its methods (or another visitor, should delegation be needed). The entries stored in the Cache's internal map would be keyed by (Visitor, Schema), giving visitors the flexibility to delegate to other visitors while ensuring that, within the same F program, we get a single computation for every codec needed within a service (or even across services).
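The (Visitor, Schema) keying suggested above could be sketched as follows. This is an illustrative, untyped stand-in (`MemoCache` is a made-up name, and the real design would thread an effect `F` rather than mutate directly):

```scala
import scala.collection.mutable

// Illustrative memo cache keyed by (visitor, schema): repeated requests for
// the same pair compute once, while a delegating visitor gets its own entry.
class MemoCache {
  private val entries = mutable.Map.empty[(AnyRef, AnyRef), Any]
  var computations = 0
  def getOrCompute[A](visitor: AnyRef, schema: AnyRef)(compute: => A): A =
    entries.getOrElseUpdate((visitor, schema), { computations += 1; compute })
      .asInstanceOf[A]
}
```

The cast is the usual price of a heterogeneous cache; the typed `Cache[F]` interface above hides it from implementors.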
```scala
/**
 * A Scala encoding of a rose tree, a tree data structure with a variable and unbounded number of branches per node.
 * @see https://en.wikipedia.org/wiki/Rose_tree
 */
sealed trait Tree[+A] {
```
Is this used anywhere outside of tests? I'd rather not expand the API of core unless absolutely necessary.
I don't know that this is necessarily a good fit for the problem, conceptually speaking, in that it seems you'd be recomputing a tree for each new root. Like, if you have the following Smithy service:
```scala
trait Foo {
  def op1(a: A)
  def op2(b: B)
  def op3(c: C)
}
```

where

```scala
case class B(a: A)
case class C(b: B)
```
don't you end up recomputing trees for A a few times? When computing codecs, datatypes shouldn't be considered in isolation, but rather as part of a larger whole. A type may be the root of a tree in the context of one operation, and a sub-node in another.
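The recomputation concern can be made concrete with a toy model (all names here are made up): treating each operation's input as a root and rebuilding its dependency tree from scratch visits shared nodes once per root that reaches them.

```scala
// Toy model: with roots A, B, C and dependencies C -> B -> A, a per-root
// rebuild computes A three times (once per root that reaches it).
def countComputations(roots: List[String], deps: Map[String, List[String]]): Int = {
  var computations = 0
  def go(label: String): Unit = {
    computations += 1
    deps.getOrElse(label, Nil).foreach(go)
  }
  roots.foreach(go)
  computations
}
```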
The rose trees are intermediate data structures that define how we recurse through a Schema; they are constructed and immediately discarded after compilation, and some kind of data structure is necessary by the nature of the problem. If we were to solve this problem using recursion schemes, Tree would be our fixed point.
Originally I had taken steps to avoid the construction of an intermediate data structure and basically implemented our own version of hylo. In trying to do so, I found I was basically defining a new library for recursion schemes, implementing a bunch of machinery to avoid a problem that I wasn't sure existed in the first place.
So I avoided the problem by making the recursive and corecursive traversals explicit, which also makes things easier to follow for people who aren't well versed in recursion schemes. In practice the allocations don't seem to make much of a difference in compile times, but I plan on testing this against the entire schema registry at Disney sometime soon.
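For reference, the hylomorphism alluded to above (an unfold fused with a fold, so no intermediate tree is ever materialised) can be sketched as follows; `TreeF` and `hylo` are illustrative names, not code from this PR:

```scala
// One layer of a rose tree: a label and the not-yet-unfolded children.
case class TreeF[A, B](label: A, children: List[B])

// hylo unfolds a seed one layer at a time and folds each layer immediately,
// so the full tree never exists in memory at once.
def hylo[A, S, B](coalg: S => TreeF[A, S])(alg: TreeF[A, B] => B)(seed: S): B = {
  val layer = coalg(seed)
  alg(TreeF(layer.label, layer.children.map(hylo(coalg)(alg))))
}
```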
> and some kind of data structure is necessary by the nature of the problem
That's not necessarily true. I get how you'd reach this conclusion from the point of view that schema compilation is always akin to a hylomorphism, but modelling it this way has negative implications for the ergonomics available to implementors when delegating to other compilations is necessary. Ironically, the initial design of smithy4s schema visitors was similar to what you're proposing here (granted, we weren't trying to think really hard about caching back then), but it ended up being much more annoying than the current status quo.
So the middle ground I'm after is one that allows the user some level of control to do something with a member schema received as an input to a visitor method, without giving them control over the general recursion. What I'm suggesting here (or better, a free-applicative version of it) would, I think, get us to that sweet spot.
```scala
case BijectionSchema(s, bijection) =>
  underlying(BijectionSchema(this(s), bijection))
case LazySchema(suspend) =>
  underlying(LazySchema(suspend.map(this.apply)))
```
…ne KleisliK, as a tool to compose visitors.
This is a WIP PR to share an approach to traversing schemas in a more principled way. The current status quo with schema visitors has a few problems that we look to address:
1.) Address bugs around the caching of compilation targets via side effects
2.) Improve stack safety by not allowing visitors to recursively call themselves
To these ends, this PR implements traversal in 2 passes. The first pass is a corecursive process that unfolds a schema into a rose tree of schema nodes, terminating recursion by ending traversal when a node has already been seen in the current path from the root node. The second pass traverses the finite tree of schema nodes in post order, providing lazy references to compilation targets to the visitor.
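A hedged sketch of the first, corecursive pass, with illustrative names (the PR's actual `Tree` shape and `unfoldToTree` signature may differ): recursion is cut by replacing a node that already appears on the path from the root with a back-reference.

```scala
sealed trait Tree[+A]
case class Node[A](label: A, children: List[Tree[A]]) extends Tree[A]
case class Ref[A](label: A) extends Tree[A] // back-reference to an ancestor

// Unfold a possibly-cyclic graph into a finite tree: a node already seen on
// the current root-to-node path becomes a Ref instead of recursing forever.
def unfold[A](root: A, childrenOf: A => List[A], path: Set[A] = Set.empty): Tree[A] =
  if (path.contains(root)) Ref(root)
  else Node(root, childrenOf(root).map(unfold(_, childrenOf, path + root)))
```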
The first pass is implemented in `TreeSchemaVisitor.unfoldToTree`. This method recurses from the root node down into child nodes as long as a child node has not been seen before. The following recursive tree defined in Smithy:
will result in a tree that looks like this:
The second pass is implemented in `TreeSchemaVisitor.cachedPostOrder`, where during this post-order traversal a monadic cache effect is provided to the compilation function to fetch references to compilation targets the current target may depend on. Users are meant to implement the `VisitorF` trait instead of interacting with the cache directly: `VisitorF.apply` finds any dependencies for end users and fetches references from the cache for them. `VisitorF` is an effectful visitor with a similar interface to existing visitors; however, instead of being given a `Schema` for any child nodes, the visitor is given lazy references to targets instead. This makes it so schemas don't have to recursively call themselves in order to obtain dependencies. Example Hash and Show visitors have been implemented in `TreeVisitorHash` and `TreeVisitorShow`. Below is a list of work still to be done in this PR:
VisitorFs
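A hedged sketch of the second pass described above, with made-up names (the real `cachedPostOrder` works in a monadic effect rather than with mutation): children are compiled before their parent, and back-references resolve through thunks so that a recursive schema gets a lazy handle to a codec that is still under construction.

```scala
import scala.collection.mutable

// Repeating the illustrative tree shape so this sketch is self-contained.
sealed trait Tree[+A]
case class Node[A](label: A, children: List[Tree[A]]) extends Tree[A]
case class Ref[A](label: A) extends Tree[A] // back-reference to an ancestor

// Post-order compilation: compile children before the parent. A Ref looks up
// a thunk registered by an ancestor; that thunk must only be forced lazily
// (e.g. inside the produced codec), never while the ancestor's own result is
// still being computed.
def postOrder[A, B](tree: Tree[A],
                    cache: mutable.Map[A, () => B],
                    compile: (A, List[() => B]) => B): B = tree match {
  case Ref(label) =>
    cache(label)()
  case Node(label, children) =>
    lazy val result: B = compile(label, children.map(c => () => postOrder(c, cache, compile)))
    cache(label) = () => result
    result
}
```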