Description
If combine_chains() or draw_from_chain_and_iteration_() is called on a dataset in which some chains have more iterations than others, the resulting integer .draw ids are non-consecutive:
Reprex
my_draws <- tidyr::crossing(
.chain = 1:2,
.iteration = 1:3) |>
dplyr::mutate(x = rnorm(6))
my_draws
#> # A tibble: 6 × 3
#> .chain .iteration x
#> <int> <int> <dbl>
#> 1 1 1 1.20
#> 2 1 2 0.213
#> 3 1 3 0.0492
#> 4 2 1 -0.419
#> 5 2 2 1.58
#> 6 2 3 -2.37
my_tidy_draws <- tidybayes::combine_chains(my_draws)
# as expected
my_tidy_draws
#> # A tibble: 6 × 4
#> .chain .iteration x .draw
#> <int> <int> <dbl> <int>
#> 1 1 1 1.20 1
#> 2 1 2 0.213 2
#> 3 1 3 0.0492 3
#> 4 2 1 -0.419 4
#> 5 2 2 1.58 5
#> 6 2 3 -2.37 6
# make chain 1 shorter than chain 2
my_tidy_draws_missing <- my_draws |>
dplyr::filter(.data$.chain != 1 | .data$.iteration != 3) |>
tidybayes::combine_chains()
# goes from .draw 2 to .draw 4, no .draw 3
my_tidy_draws_missing
#> # A tibble: 5 × 4
#> .chain .iteration x .draw
#> <int> <int> <dbl> <int>
#> 1 1 1 1.20 1
#> 2 1 2 0.213 2
#> 3 2 1 -0.419 4
#> 4 2 2 1.58 5
#> 5 2 3 -2.37 6
# draw_from_chain_and_iteration_() behaves identically
manual_missing <- my_draws |>
dplyr::filter(.data$.chain != 1 | .data$.iteration != 3) |>
dplyr::mutate(.draw = tidybayes:::draw_from_chain_and_iteration_(.data$.chain, .data$.iteration))
manual_missing
#> # A tibble: 5 × 4
#> .chain .iteration x .draw
#> <int> <int> <dbl> <int>
#> 1 1 1 1.20 1
#> 2 1 2 0.213 2
#> 3 2 1 -0.419 4
#> 4 2 2 1.58 5
#> 5 2 3 -2.37 6
# We also get non-consecutive .draw numbers
# when .iteration values within a .chain
# are non-consecutive or .chain values are non-consecutive,
# but that's somewhat less likely to occur.
my_tidy_draws_missing_2 <- my_draws |>
dplyr::filter(.data$.chain != 1 | .data$.iteration != 2) |>
tidybayes::combine_chains()
# no .draw 2
my_tidy_draws_missing_2
#> # A tibble: 5 × 4
#> .chain .iteration x .draw
#> <int> <int> <dbl> <int>
#> 1 1 1 1.20 1
#> 2 1 3 0.0492 3
#> 3 2 1 -0.419 4
#> 4 2 2 1.58 5
#> 5 2 3 -2.37 6
Created on 2025-07-15 with reprex v2.1.1
Thoughts
I don't think this is a bug per se. AFAICT there is no explicit guarantee in the docs that .draw values will be consecutive or that max(.draw) is necessarily the total number of draws. But in typical use (spread_draws() on a set of equal-length chains), both of those things hold, and a search of GitHub reveals plenty of code that assumes max(.draw) is the total number of draws.
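To make the failure mode concrete, here is a minimal base R sketch using the .draw ids from the reprex above (chain 1 missing iteration 3). The vector `draw_ids` is just an illustrative name; it shows that max(.draw) over-counts, while counting distinct ids does not.

```r
# .draw ids from the reprex above, where chain 1 lacks iteration 3
draw_ids <- c(1L, 2L, 4L, 5L, 6L)

# Over-counts: reports the largest id, not the number of draws present
n_by_max <- max(draw_ids)
n_by_max
#> [1] 6

# Counts the draws that are actually present
n_by_distinct <- length(unique(draw_ids))
n_by_distinct
#> [1] 5
```

Downstream code that needs the number of draws is probably safer using a distinct count (for example dplyr::n_distinct(.draw)) than max(.draw).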
Given that it would be relatively straightforward to modify combine_chains() and draw_from_chain_and_iteration_() so that the resulting .draw values are always guaranteed to be consecutive, it might be worth doing. I could make a PR if it would be of interest.
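As a rough sketch of one way this could work (not tidybayes code, and not necessarily how a PR would do it): number the distinct (.chain, .iteration) pairs in sorted order, so ids are consecutive regardless of gaps or unequal chain lengths. The helper name `consecutive_draw_ids()` is hypothetical, and this version does not treat NA chains specially.

```r
# Hypothetical helper: assign consecutive .draw ids by ranking the
# distinct (chain, iteration) pairs, ordered by chain then iteration.
consecutive_draw_ids <- function(chain, iteration) {
  pairs <- unique(data.frame(chain = chain, iteration = iteration))
  pairs <- pairs[order(pairs$chain, pairs$iteration), ]
  # Map each input row to the position of its pair in the sorted table;
  # repeated pairs (long-format data) share the same id.
  match(
    paste(chain, iteration),
    paste(pairs$chain, pairs$iteration)
  )
}

# Chain 1 is missing iteration 3 (as in the first failing case above)
ids_short_chain <- consecutive_draw_ids(
  chain = c(1, 1, 2, 2, 2),
  iteration = c(1, 2, 1, 2, 3)
)
ids_short_chain
#> [1] 1 2 3 4 5

# Chain 1 is missing iteration 2 (as in the second failing case above)
ids_gap <- consecutive_draw_ids(
  chain = c(1, 1, 2, 2, 2),
  iteration = c(1, 3, 1, 2, 3)
)
ids_gap
#> [1] 1 2 3 4 5
```

One trade-off with this approach: the id of a given (.chain, .iteration) pair then depends on which other draws are present, so ids are no longer comparable across differently filtered subsets of the same fit.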