[Feauture] Confidence interval bands + errorbars by icweaver · Pull Request #710 · MakieOrg/AlgebraOfGraphics.jl

icweaver · 2025-12-07T20:28:06Z

Howdy, GLM.jl has this in the works. After some discussion on slack, it sounded like this would be a good thing to fold into AoG ~~if it's ever merged. Will keep this as a draft for now.~~ It's now in!

Depends on JuliaStats/GLM.jl#631

If weights are passed, uses GLM.glm instead of GLM.lm to provide confidence interval support. JuliaStats/GLM.jl#495. The following keywords are now provided to linear:

weighttype = :fweights: Thin wrapper around StatsBase.AbstractWeights, e.g., weighttype = :aweights will treat weights as analytic weights, :pweights as probability weights, etc. Design note: AoG strips array wrappers at the layer processing level, which is why this keyword harness is used further down the pipeline instead of attempting to wrap mapping(; weights) directly at the user API level
distr = GLM.Normal(): Distribution passed to GLM.glm.
link = GLM.canonicallink(distr): Link function passed to GLM.glm. Defaults to GLM.IdentityLink().

Below are some brief usage examples:

julia> using AlgebraOfGraphics, CairoMakie, Random

julia> colors = Makie.wong_colors();

julia> df = let
           x = 1:5
           y_err = randn(Xoshiro(111), length(x))
           y = x .+ y_err
           y_unc = abs.(y_err)
           (; x, y, y_unc)
       end

julia> data(df) * mapping(:x, :y) * (
           (visual(Scatter) + mapping(:y_unc) * visual(Errorbars)) * visual(; label = "data")
           + mapping(; weights = :y_unc) * linear() * visual(; color = colors[1], label = "model")
       ) |> draw

Before

After

This uses fweights by default to match current behavior, but different weights can be passed via the new weighttype keyword (this is just a thin wrapper around StatsBase.jl > Weight Vectors):

julia> data(df) * mapping(:x, :y) * (
           (visual(Scatter) + mapping(:y_unc) * visual(Errorbars)) * visual(; label = "data")
           + mapping(; weights = :y_unc) * linear(; weighttype = :pweights) * visual(; color = colors[3], label = "pweights")
       ) |> draw

(other examples with more weighting schemes stacked on top):

julia> specs = data(df) * mapping(:x, :y) * (
           (visual(Scatter) + mapping(:y_unc) * visual(Errorbars)) * visual(; label = "data")
           + mapping(; weights = :y_unc) * (
               mapping(; weights = :y_unc => (w -> inv(w^2))) * linear(; weighttype = :aweights) * visual(; color = colors[1], label = "aweights")
               + linear(; weighttype = :fweights) * visual(; color = colors[2], label = "fweights")
               + linear(; weighttype = :pweights) * visual(; color = colors[3], label = "pweights")
           )
       ) |> draw

If on an older version, this will throw an error if anything besides fweights is passed to respect the deprecation notice in GLM v1.9.1

Still playing around with the design. Happy to make any changes if this is useful

Todo

GLM v1/v2 compat layer?
- wts --> weights once Rename wts to weights and allow passing variable name to it JuliaStats/GLM.jl#631 is in
- open weighttype to aweights and pweights
tests
docs

icweaver · 2025-12-23T19:35:33Z

Whoo, JuliaStats/GLM.jl#487 is in now! I've updated the PR here to match current functionality. Once the design looks ok to y'all, could I have some help with adding any additional docs + desired tests?

icweaver · 2026-01-09T07:14:15Z

Ok, I think things should be ready now for a first review

nalimilan · 2026-01-11T16:48:16Z

docs/src/reference/analyses.md

+    mapping(; weights = :y_unc) * (
+        linear(; weighttype = :fweights, weighttransform = x -> inv.(x .^ 2)) * visual(; color = colors[1], label = "fweights")


Is weighttype really needed? The whole idea behind these AbstractWeights types in StatsBase is that you only specify once the kind of weights you have when building the vector, and then all functions automatically interpret it correctly. We don't have any API currently that takes the kind of weight separately.

Thanks for taking a look, Milan! I hear ya, and I as on the fence on if it would be better to have the user explicitly construct the weight vector themselves before passing it to AoG, or to just have AoG handle it internally. I decided to go with the latter for ease of use.

Currently, this just wraps the specified field from the table-like data passed to AoG in fweights (or to potentially more supported weight kinds after GLM v2 is released). In either case, the weights are still only specified once, no?

Maybe calling it weighttype is misleading since it's really just wrapping our array in a convenience function call. I think I called it weightkind in an earlier iteration, and could switch back to that if this design makes sense? Sorry if I am misunderstanding what you are asking.

My point is that imagine I'm loading a dataset in df. I immediately do something like df.y_unc = aweights(df.y_unc). Then I can create one or more graphs, estimate models, etc., without ever saying again that these are aweights, and everything works automatically. And this is necessary to use weights with StatsBase or GLM as they require an AbstractWeights type. Introducing a new keyword argument which differs from the method used in other packages actually makes things more complex IMO.

As @jkrumbiegel said (https://github.com/MakieOrg/AlgebraOfGraphics.jl/pull/710/changes#r2685551356) it seems better to adopt the API of underlying packages as much as possible.
without introducing new API in AoG.

Right, so your strategy is totally what I would do if I was using vanilla Makie.

In contrast, it's my understanding that the current usage with AoG makes all calls to StatsBase and GLM internally so that the user only needs to pass the raw data without any pre-processing or explicit loading of additional packages needed. It seems this would need to change with your proposed usage, so I guess my question is if this is the direction that we would like to go in this PR.

At any rate, happy to go with either route, was just asking for clarity

Taking another look at this, it seems that AoG actually strips wrapper types in its layer processing step (I assume for ease of processing data in other parts of the code base), so doing something like df.y_unc = aweights(df.y_unc) ahead of time would actually be undone by the time it gets passed to GLM.glm:

julia> using AlgebraOfGraphics: data, mapping, process julia> using AlgebraOfGraphics.StatsBase: aweights julia> using Random: Xoshiro julia> df = let x = 1:5 y_err = randn(Xoshiro(111), length(x)) y = x .+ y_err y_unc = aweights(abs.(y_err)) (; x, y, y_unc) end (x = 1:5, y = [-0.21415701487924022, 2.8728141561713048, 2.5526774519547093, 4.1453218819239845, 4.163024675330345], y_unc = [1.2141570148792402, 0.8728141561713048, 0.4473225480452905, 0.14532188192398446, 0.8369753246696554]) julia> layer = data(df) * mapping(:x, :y; weights = :y_unc) Layer transformation: identity data: AlgebraOfGraphics.Columns{@NamedTuple{x::UnitRange{Int64}, y::Vector{Float64}, y_unc::StatsBase.AnalyticWeights{Float64, Float64, Vector{Float64}}}} positional: 1: x 2: y named: weights: y_unc julia> processed = process(layer) AlgebraOfGraphics.ProcessedLayers(AlgebraOfGraphics.ProcessedLayer[AlgebraOfGraphics.ProcessedLayer(Makie.Scatter, {}, Any[[[1, 2, 3, 4, 5]], [[-0.21415701487924022, 2.8728141561713048, 2.5526774519547093, 4.1453218819239845, 4.163024675330345]]], {:weights = [[1.2141570148792402, 0.8728141561713048, 0.4473225480452905, 0.14532188192398446, 0.8369753246696554]]}, {1 = "x", 2 = "y", :weights = "y_unc"}, {}, {}, nothing)]) julia> processed.layers[1].named[:weights] 1-element Vector{Vector{Float64}}: [1.2141570148792402, 0.8728141561713048, 0.4473225480452905, 0.14532188192398446, 0.8369753246696554]

I guess this could be selectively patched so that calls to f(identity, vs) could be strategically avoided, but that seems like it might be a bit too invasive. This makes me more inclined to just stick with adding back the desired wrapper after the fact I think, similarly to how it is currently done

Does that seem reasonable to folks?

I'm not familiar enough with AoG to give an informed opinion on the implementation, but from an external API point of view it would be too bad IMO that users would need to repeat the weights type as the weighttype argument given that it's already carried by the vector type.

I guess this could be selectively patched so that calls to f(identity, vs) could be strategically avoided

I don't know, you could try to pass through the original data if the transformation function is identity. I can't say without trying if that would have weird side effects. But in general, AoG really only cares about the eltypes because it assumes that the vector types are side effects of whatever table format is being used. Like why should it make a difference if a vector, a vector view, a lazy mapped array, etc. is used if they all contain floats? In that sense, AnalyticWeights is the odd one here as it doesn't contain Weight types but normal numbers, and therefore disintegrates into normal Vector{T} under transformations. I'm not sure making AoG sensitive to wrapper types is a good behavior change.

The map(identity, x) behavior and changing of array types through transformations is of course a bit of a Julia footgun sometimes, and we also have special cases for that in Makie. You have a transformation function with a default of "no transformation" (== identity) but don't want to pay for an actual copy so you special case that map(identity, x) stays just x.

nalimilan · 2026-01-11T16:51:11Z

src/transformations/linear.jl


 """
-    linear(; interval=automatic, level=0.95, dropcollinear=false, npoints=200)
+    linear(; interval=automatic, level=0.95, dropcollinear=false, npoints=200, weighttype=:fweights, weighttransform=identity, distr=GLM.Normal())


If you allow passing distr, it would make sense to allow passing link too. Most distributions other than Normal require a link other than identity.

Oh cool, didn't know about link, thanks! I'm wondering now if it might make more sense to start storing these GLM.glm-specific kwargs in their own struct that's then passed to AoG.linear instead of duplicating individual args in AoG.LinearAnalysis internally as I am currently doing. I guess it depends on how much additional API surface we want to introduce. Happy to go with either route depending on what Julius prefers

I didn't closely examine this but in general, I like it better if options for a different package are pass-through (like kwargs...), so AoG doesn't really add API on its own. But if it's just really cumbersome to use that way, that's an argument for making special AoG API. After all, linear() is still supposed to be rather simple to use.

That makes sense. One point to note though is that distribution and link are positional arguments in GLM, so either they would have to be passed as positional arguments to linear, or they would need special-casing.

Hi all, I just added a simple pass-through kwarg like Jules recommended. Re-tooling linear to accept positional args sounds like it could be a breaking change that might be better for a separate PR? Anyway, I've updated the examples above for your review using this simple setup now.

icweaver added 7 commits December 6, 2025 15:37

started

8f65ac2

byo distr

234355b

temporarily use GLM PR branch

8f7c2d2

weightfunc --> weightkind

494442e

docs started

9cbad45

added weighttransform

a77a0b5

update docs

2cf6b06

icweaver mentioned this pull request Dec 7, 2025

Confidence Interval Band #366

Open

icweaver changed the title ~~Confidence interval bands + GLM.jl~~ [Feauture] Confidence interval bands + errorbars Dec 7, 2025

icweaver added 2 commits December 22, 2025 23:57

Merge branch 'master' into glm

5387feb

weights no longer exported by GLM, use symbols instead

ac450f4

icweaver marked this pull request as ready for review December 23, 2025 20:33

sync doc examples

fe85b63

icweaver marked this pull request as draft January 8, 2026 09:40

icweaver added 6 commits January 8, 2026 03:33

Merge branch 'master' into glm

97b8c63

adjust compat

3118fc9

update docs [skip ci]

88ff001

docstring cleanup [skip ci]

3bb3f5e

cleanup

13f9dff

cleanup

c3bbc26

icweaver marked this pull request as ready for review January 9, 2026 07:14

nalimilan reviewed Jan 11, 2026

View reviewed changes

icweaver added 2 commits January 20, 2026 10:58

Merge branch 'master' into glm

7b5a11b

cleanup

bacf7bc

		mapping(; weights = :y_unc) * (
		linear(; weighttype = :fweights, weighttransform = x -> inv.(x .^ 2)) * visual(; color = colors[1], label = "fweights")

Conversation

icweaver commented Dec 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Before

After

Related

Todo

Uh oh!

icweaver commented Dec 23, 2025

Uh oh!

icweaver commented Jan 9, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

icweaver Jan 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

icweaver Jan 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jkrumbiegel Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nalimilan Jan 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

icweaver commented Dec 7, 2025 •

edited

Loading

icweaver Jan 11, 2026 •

edited

Loading

icweaver Jan 16, 2026 •

edited

Loading

jkrumbiegel Jan 28, 2026 •

edited

Loading

nalimilan Jan 13, 2026 •

edited

Loading