
Noob question: performance tradeoffs vs. the MLStyle approach to sum types #29

@lukemerrick


Hi there! I'm really impressed with the technical depth of this package, and I'm keen to deepen my understanding of how it's designed and why.

My question

Assuming my understanding of the SumTypes and MLStyle approaches to implementing sum types is correct (see below), my general question is "what are the performance tradeoffs between these approaches?"

Sub questions:

  • Are the structs defined by SumTypes fixed size no matter which conceptual type they represent, or does their generic typing afford flexibility there?
    • If so, does that trade off potentially higher memory usage?
    • If so, does it enable performance wins for things like an Array, since indexing is simpler?
    • If not, does that trade off performance for things like Arrays?
  • In what kinds of use cases would one of the two approaches offer substantially better performance than the other, and in what kinds would the performance difference be small?
  • In general (beyond the world of sum types), how bad is it to use an abstract type vs. a union of concrete types vs. a single concrete type for things like Arrays? (There's a toy example of what I mean right after this list.)
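
To make that last sub-question concrete, here's a toy example of the three cases I have in mind (Number, Int, and Float64 are just stand-ins here, not the sum types discussed below):

# Three ways to type an array holding "mixed" values:
abstract_vec = Number[1, 2.0, 3]               # abstract element type
union_vec    = Union{Int, Float64}[1, 2.0, 3]  # small Union of concrete types
concrete_vec = Float64[1.0, 2.0, 3.0]          # single concrete type

# All three support the same operations; my question is how differently they perform.
sum(abstract_vec), sum(union_vec), sum(concrete_vec)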

My (possibly faulty) understanding of how we can implement sum types in Julia

As far as I can tell, there are two main options for offering an algebraic datatype experience in Julia.

First, there's SumTypes, which (like Unityper) defines a single struct that can store and access any one of several different "conceptual" types.

E.g.

# Let's look under the hood of SumTypes.@sum_type.
macroexpand(
    @__MODULE__,
    :(
        SumTypes.@sum_type Example begin
            Stringy(value::String)
            Symbolic(name::Symbol)
        end
    )
)

# We get this.
($(Expr(:toplevel, quote
    #= /home/luke/.julia/packages/SumTypes/qpNRC/src/sum_type.jl:260 =#
    struct Example{var"#N#", var"#M#", var"#FT#"}
        bits::(Tuple{Vararg{T, N}} where {N, T}){var"#N#", UInt8}
        ptrs::(Tuple{Vararg{T, N}} where {N, T}){var"#M#", Any}
        var"#tag#"::var"#FT#"
        1 + 1
    end
...  # <truncated>

The rest of the generated code is pretty involved, so I don't follow it completely, but my understanding is that it does some pretty heavy lifting to essentially reimplement structs at a fairly low level (including unsafe memory viewing?). My guess is that this is done for performance reasons, but I don't really understand where the performance gains come from.
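
If I squint, the generated code looks morally like a hand-rolled tagged union. Here's my own (very simplified) approximation of the idea, just to check my understanding; the real package seems to pack non-pointer fields into raw bits rather than keeping one field per variant like I do here:

# My simplified approximation of the "one struct + tag" idea (not what SumTypes generates).
struct ExampleLike
    tag::UInt8        # 0x01 => Stringy, 0x02 => Symbolic
    value::String     # only meaningful when tag == 0x01
    name::Symbol      # only meaningful when tag == 0x02
end

stringy_like(s::String)  = ExampleLike(0x01, s, Symbol(""))
symbolic_like(n::Symbol) = ExampleLike(0x02, "", n)

# Every variant shares one concrete type, so containers of them stay concretely typed:
xs = [stringy_like("hi"), symbolic_like(:foo)]
typeof(xs)   # Vector{ExampleLike}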

On the other hand, MLStyle seems to just provide syntactic sugar around what I assume will eventually boil down to isa checks on multiple concrete implementations of an abstract type. In other words, it still uses Julia's type system to define an abstract type and several concrete subtypes (one for each possible variant of the sum type).

# Let's look under the hood of MLStyle.@data.
macroexpand(
    @__MODULE__,
    :(
        MLStyle.@data Example begin 
            Stringy(value::String)
            Symbolic(name::Symbol)
        end
    )
)

quote
    abstract type Example end
    struct Stringy{} <: Example
        #= REPL[7]:5 =#
        value::String
    end
    nothing
    begin
        #= /home/luke/.julia/packages/MLStyle/SLOsr/src/Record.jl:95 =#
        #= REPL[7]:5 =#
        #= /home/luke/.julia/packages/MLStyle/SLOsr/src/Record.jl:96 =#
        function (MLStyle).pattern_uncall(t::Type{<:Stringy}, self::Function, type_params::Any, type_args::Any, args::Any)
            #= /home/luke/.julia/packages/MLStyle/SLOsr/src/Record.jl:96 =#
            #= /home/luke/.julia/packages/MLStyle/SLOsr/src/Record.jl:103 =#
            #= REPL[7]:5 =#
            #= /home/luke/.julia/packages/MLStyle/SLOsr/src/Record.jl:104 =#
            (MLStyle.Record._compile_record_pattern)(t, self, type_params, type_args, args)
        end
    end
    struct Symbolic{} <: Example
        #= REPL[7]:6 =#
        name::Symbol
    end
    nothing
    begin
        #= /home/luke/.julia/packages/MLStyle/SLOsr/src/Record.jl:95 =#
        #= REPL[7]:6 =#
        #= /home/luke/.julia/packages/MLStyle/SLOsr/src/Record.jl:96 =#
        function (MLStyle).pattern_uncall(t::Type{<:Symbolic}, self::Function, type_params::Any, type_args::Any, args::Any)
            #= /home/luke/.julia/packages/MLStyle/SLOsr/src/Record.jl:96 =#
            #= /home/luke/.julia/packages/MLStyle/SLOsr/src/Record.jl:103 =#
            #= REPL[7]:6 =#
            #= /home/luke/.julia/packages/MLStyle/SLOsr/src/Record.jl:104 =#
            (MLStyle.Record._compile_record_pattern)(t, self, type_params, type_args, args)
        end
    end
end

This seems like it offers fewer "sharp edges" with respect to matching the standard mental model of types in Julia, but I'm guessing it trades off performance somewhere, and I'm hoping to better understand where!
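
For what it's worth, my current hunch is that the biggest difference shows up in containers: with the abstract-type-plus-subtypes encoding, a vector of mixed variants has an abstract element type, whereas a single tagged struct keeps the element type concrete. Here's a rough, hand-rolled sketch of how I'd poke at that (toy types again, not the real macros):

# "MLStyle-style" encoding: abstract supertype with one concrete subtype per variant.
abstract type AbsExample end
struct AbsStringy  <: AbsExample; value::String; end
struct AbsSymbolic <: AbsExample; name::Symbol;  end

# "SumTypes-style" encoding, heavily simplified: one concrete struct with a tag.
struct TaggedExample
    tag::UInt8
    value::String
    name::Symbol
end

abs_vec    = AbsExample[AbsStringy("hi"), AbsSymbolic(:foo)]
tagged_vec = [TaggedExample(0x01, "hi", Symbol("")), TaggedExample(0x02, "", :foo)]

isconcretetype(eltype(abs_vec))     # false: the compiler only knows "some AbsExample"
isconcretetype(eltype(tagged_vec))  # true: every element has the same known layout

Is that roughly the right way to think about where the performance difference comes from?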
