
combine_chains() and draw_from_chain_and_iteration_() can produce non-consecutive .draw ids #338

@dylanhmorris

Description


If combine_chains() or draw_from_chain_and_iteration_() is called on a dataset in which some chains have more iterations than others, the resulting integer .draw ids are not consecutive:

Reprex

my_draws <- tidyr::crossing(
    .chain = 1:2,
    .iteration = 1:3) |>
    dplyr::mutate(x = rnorm(6))

my_draws
#> # A tibble: 6 × 3
#>   .chain .iteration       x
#>    <int>      <int>   <dbl>
#> 1      1          1  1.20  
#> 2      1          2  0.213 
#> 3      1          3  0.0492
#> 4      2          1 -0.419 
#> 5      2          2  1.58  
#> 6      2          3 -2.37

my_tidy_draws <- tidybayes::combine_chains(my_draws)

# as expected 
my_tidy_draws
#> # A tibble: 6 × 4
#>   .chain .iteration       x .draw
#>    <int>      <int>   <dbl> <int>
#> 1      1          1  1.20       1
#> 2      1          2  0.213      2
#> 3      1          3  0.0492     3
#> 4      2          1 -0.419      4
#> 5      2          2  1.58       5
#> 6      2          3 -2.37       6

# make chain 1 shorter than chain 2
my_tidy_draws_missing <- my_draws |>
    dplyr::filter(.data$.chain != 1 | .data$.iteration != 3) |>
    tidybayes::combine_chains()

# goes from .draw 2 to .draw 4, no .draw 3
my_tidy_draws_missing
#> # A tibble: 5 × 4
#>   .chain .iteration      x .draw
#>    <int>      <int>  <dbl> <int>
#> 1      1          1  1.20      1
#> 2      1          2  0.213     2
#> 3      2          1 -0.419     4
#> 4      2          2  1.58      5
#> 5      2          3 -2.37      6

# draw_from_chain_and_iteration_() behaves identically
manual_missing <- my_draws |>     
    dplyr::filter(.data$.chain != 1 | .data$.iteration != 3) |>
    dplyr::mutate(.draw = tidybayes:::draw_from_chain_and_iteration_(.data$.chain, .data$.iteration))

manual_missing
#> # A tibble: 5 × 4
#>   .chain .iteration      x .draw
#>    <int>      <int>  <dbl> <int>
#> 1      1          1  1.20      1
#> 2      1          2  0.213     2
#> 3      2          1 -0.419     4
#> 4      2          2  1.58      5
#> 5      2          3 -2.37      6

# We also get non-consecutive .draw numbers
# when .iteration values within a .chain 
# are non-consecutive or .chain values are non-consecutive,
# but that's somewhat less likely to occur.
my_tidy_draws_missing_2 <- my_draws |>
    dplyr::filter(.data$.chain != 1 | .data$.iteration != 2) |>
    tidybayes::combine_chains()

# no .draw 2
my_tidy_draws_missing_2
#> # A tibble: 5 × 4
#>   .chain .iteration       x .draw
#>    <int>      <int>   <dbl> <int>
#> 1      1          1  1.20       1
#> 2      1          3  0.0492     3
#> 3      2          1 -0.419      4
#> 4      2          2  1.58       5
#> 5      2          3 -2.37       6

Created on 2025-07-15 with reprex v2.1.1

Thoughts

I don't think this is a bug per se. AFAICT the docs make no explicit guarantee that .draw values will be consecutive, or that max(.draw) equals the total number of draws. But in typical use (spread_draws() on a set of equal-length chains) both hold, and a search of GitHub turns up plenty of code that assumes max(.draw) is the total number of draws.
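Every .draw value printed in the reprex is consistent with the id being computed as `(.chain - 1) * max(.iteration) + .iteration`. That formula is inferred from the output above, not taken from the tidybayes source. Under that assumption, any chain that is shorter than the longest one leaves a gap, and `max(.draw)` overcounts the draws. The helper `inferred_draw_id()` below is hypothetical:

```r
# Hypothetical helper, not part of tidybayes. It reproduces the .draw values
# printed in the reprex, assuming .draw = (.chain - 1) * max(.iteration) + .iteration.
inferred_draw_id <- function(chain, iteration) {
  as.integer((chain - 1L) * max(iteration) + iteration)
}

# Chain 1 is missing iteration 3, as in my_tidy_draws_missing
chain <- c(1L, 1L, 2L, 2L, 2L)
iteration <- c(1L, 2L, 1L, 2L, 3L)
draw <- inferred_draw_id(chain, iteration)
draw                       # 1 2 4 5 6
max(draw) == length(draw)  # FALSE: max(.draw) is 6 but there are only 5 draws
```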

Since it would be relatively straightforward to modify combine_chains() and draw_from_chain_and_iteration_() so that the resulting .draw values are always consecutive, it might be worth doing. I could make a PR if that would be of interest.
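One possible approach, sketched under the same inferred formula and assuming `.chain` has no missing values and `.iteration` values are positive integers: compute the existing id, then dense-rank it so the ids run from 1 to the number of distinct (chain, iteration) pairs, in chain-then-iteration order. `consecutive_draw_ids()` is a hypothetical name, not part of tidybayes:

```r
# Hypothetical helper, not part of tidybayes. Assumes .chain has no NAs and
# .iteration values are positive integers.
consecutive_draw_ids <- function(chain, iteration) {
  # Inferred existing id: unique per (chain, iteration) pair and increasing
  # in chain, then iteration.
  raw_id <- (chain - 1) * max(iteration) + iteration
  # Dense rank: each distinct raw id maps to its position among the sorted
  # distinct ids, so the result is 1, 2, ..., number of distinct pairs.
  match(raw_id, sort(unique(raw_id)))
}

consecutive_draw_ids(c(1, 1, 2, 2, 2), c(1, 2, 1, 2, 3))  # 1 2 3 4 5
consecutive_draw_ids(c(1, 1, 2, 2, 2), c(1, 3, 1, 2, 3))  # 1 2 3 4 5
```

Rows that share a (chain, iteration) pair, as in long-format data with several rows per draw, receive the same id, and row order in the input does not affect which id each pair gets.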

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
