SMC Multiprocessing and Progress Bar Refactor by jessegrabowski · Pull Request #8047 · pymc-devs/pymc

jessegrabowski · 2026-01-10T22:57:04Z

Description

I always like SMC as a gradient-free option for my big silly models with few parameters, but it always gave me trouble because of the API break between it and pm.sample. This PR aims to harmonize the two by bringing over a bunch of functionality from pm.sample to pm.sample_smc.

This PR is intended to be reviewed commit by commit. I verified that the test suite runs in all intermediate forms. Here is a summary of each commit:

Use multiprocessing for SMC sampling: The multiprocessing library is now used to handle parallel SMC sampling. This commit was heavily Claude-assisted, so it should receive special scrutiny. The objective was to make SMC multiprocessing look exactly like MCMC multiprocessing. It also exposes an mp_ctx argument to pm.sample_smc, which can allow compiling with e.g. JAX (using mp_ctx='forkserver').
Sample SMC sequentially when cores=1: Adds separate logic for sequential sampling on one core. Again, this copies the relevant MCMC functions.
Initialize SMC Kernels on main process: A major performance change, intended to address e.g. BUG: sample_smc stalls on final stage #8030. PyTensor compilation is not thread-safe, so we shouldn't be doing it on the workers. In this PR, the kernel is compiled once on the main process, then serialized and sent to the workers. This matches what we do with step functions in MCMC. Importantly, this commit eliminates the need for serialization of many auxiliary objects, including the pymc model itself, and some special logic for custom distributions. To do this, a couple of ancillary changes had to be made -- for example, transformation of the chains from numpy to NDArray objects happens on the main process now, after all sampling is done.
Add blas_cores argument to sample_smc: Again, this copies over multiprocessing machinery from pm.sample to pm.sample_smc by adding a blas_cores argument to pm.sample_smc, for the same reasons it exists over there.
Add custom progress bar for SMC: Adds a progress bar style to sample_smc that matches that of pm.sample. The bars fill from 0 to 1 following the value of beta, and we provide an estimated time to completion by measuring the speed per step. It looks like this:

 Progress                                   Stage   Beta     Stage Speed    Elapsed   Remaining 
────────────────────────────────────────────────────────────────────────────────────────────────
 ━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   3       0.0620   4.66 s/stage   0:00:17   0:03:46   
 ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   3       0.0634   5.42 s/stage   0:00:17   0:04:18   
 ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   2       0.0269   6.15 s/stage   0:00:17   0:09:29   
 ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   3       0.0646   5.38 s/stage   0:00:17   0:04:10   
 ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   3       0.0658   5.16 s/stage   0:00:17   0:03:55   
 ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   3       0.0658   5.55 s/stage   0:00:17   0:04:12   
 ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   3       0.0639   4.83 s/stage   0:00:17   0:03:46
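
For readers unfamiliar with a beta-driven bar, here is a minimal sketch of how beta could be mapped to a bar fill and a rough time estimate. The function names and the linear extrapolation are assumptions made for illustration; the PR derives its estimate from the measured stage speed, so the numbers in the table above will not match this formula.

```python
from typing import Optional


def bar_fraction(beta: float) -> float:
    """Clamp beta to [0, 1] so it can be used directly as the bar fill."""
    return min(max(beta, 0.0), 1.0)


def estimate_remaining(elapsed: float, beta: float) -> Optional[float]:
    """Linearly extrapolate the time left from the progress made so far.

    Assumption: beta advances at a roughly constant rate. SMC does not
    guarantee this, so treat the result as a crude guide only.
    """
    frac = bar_fraction(beta)
    if frac == 0.0:
        return None  # no progress yet, nothing to extrapolate from
    return elapsed * (1.0 - frac) / frac


def render_bar(beta: float, width: int = 40) -> str:
    """Draw a fixed-width text bar whose filled part tracks beta."""
    filled = int(round(bar_fraction(beta) * width))
    return "━" * filled + " " * (width - filled)
```

Because beta is already bounded by 1, the bar needs no draw count, which is why the SMC bar cannot simply reuse the draw-based MCMC total.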

I observed big speed gains using sample_smc after this PR. I timed this simple hierarchical model:

import pymc as pm

with pm.Model() as m:
    idxs = pm.draw(pm.Categorical.dist(p=[0.1] * 10, shape=(100,)))
    effect_loc = pm.Normal('mu_loc', 0, 1)
    effect_scale = pm.HalfNormal('mu_scale', 1)
    effect = pm.Normal('mu', mu=effect_loc, sigma=effect_scale, shape=(10,))
    
    X = pm.draw(pm.Normal.dist(0, 1, shape=(100, 5)))
    beta = pm.Normal('beta', 0, 1, shape=(5,))
    
    mu = effect[idxs] + X @ beta
    sigma = pm.Exponential('sigma', 1)
    
    obs = pm.Normal('obs', mu=mu, sigma=sigma)
    prior = pm.sample_prior_predictive()

draw = prior.prior.obs.sel(chain=0, draw=123).values
m2 = pm.observe(m, {obs:draw})

%%time 
with m2:
    idata = pm.sample_smc()

Timings went from 6.1 s to 1.44 s using the C backend, and 1.46 s to 1.09 s using Numba mode (with cache). Running test_smc.py locally goes from 1m4s to 6.264 seconds.

I could run more formal benchmarks if someone asks, but I don't really want to.

Checklist

Checked that the pre-commit linting/style checks pass
Included tests that prove the fix is effective or that the new feature works
Added necessary documentation (docstrings and/or example notebooks)
If you are a pro: each commit corresponds to a relevant logical change

jessegrabowski · 2026-01-10T22:59:37Z

@tvwenger it would be nice if you could try your SMC models that have been giving you trouble on this PR branch and report back, since you've been the one doing the heavy lifting bug-hunting on SMC lately.

Copilot

Pull request overview

This PR refactors SMC (Sequential Monte Carlo) sampling to harmonize its API with pm.sample, bringing multiprocessing capabilities, progress bars, and performance improvements to pm.sample_smc. The refactor moves PyTensor compilation to the main process before distributing work to child processes, addresses thread safety concerns, and adds comprehensive progress reporting similar to MCMC sampling.

Changes:

Implemented multiprocessing support for SMC using a pattern similar to MCMC parallel sampling
Added custom SMC progress bars that track beta (inverse temperature) progression from 0 to 1
Moved kernel compilation to main process to avoid thread-safety issues with PyTensor compilation in worker processes
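
The "compile once, serialize, send to workers" pattern described above can be sketched with the standard library alone. Everything here is a stand-in: compile_kernel and run_chain are hypothetical names, the "kernel" is a trivial picklable callable rather than a compiled PyTensor function, and the workers are called in-process so the snippet stays runnable anywhere.

```python
import operator
import pickle
from functools import partial


def compile_kernel(scale: float):
    """Stand-in for the expensive compilation step (hypothetical helper)."""
    return partial(operator.mul, scale)


def run_chain(payload: bytes, chain: int, start: float) -> tuple:
    """What a worker would do: rebuild the kernel from bytes, then use it."""
    kernel = pickle.loads(payload)  # no recompilation in the worker
    return chain, kernel(start)


# Main process: compile once, serialize once.
payload = pickle.dumps(compile_kernel(scale=2.0))

# In a real setup each call below would run in its own process and the
# payload would travel over a pipe; here it is called in-process for brevity.
results = [run_chain(payload, chain, start=1.5) for chain in range(4)]
```

The point of the design is that only the serialized kernel crosses the process boundary, so the model and other auxiliary objects never need to be pickled.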

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

File	Description
pymc/smc/parallel.py	New file implementing parallel SMC sampling infrastructure with process management, message passing, and result collection
pymc/smc/sampling.py	Major refactor of main SMC sampling function to support both parallel and sequential execution with shared kernel compilation
pymc/smc/kernels.py	Moved kernel compilation to init and added progress bar configuration methods
pymc/progress_bar.py	Added SMCProgressBarManager class for beta-based progress tracking and modified table styling
tests/smc/test_smc.py	Added test for sequential sampling with cores=1

codecov · 2026-01-10T23:07:58Z

Codecov Report

❌ Patch coverage is 78.71094% with 109 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.49%. Comparing base (11d0f1b) to head (cd2a687).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
pymc/smc/parallel.py	66.66%	68 Missing ⚠️
pymc/progress_bar/marimo_progress.py	3.33%	29 Missing ⚠️
pymc/smc/sampling.py	95.23%	4 Missing ⚠️
pymc/progress_bar/progress.py	95.16%	3 Missing ⚠️
pymc/sampling/mcmc.py	90.90%	2 Missing ⚠️
pymc/smc/kernels.py	97.46%	2 Missing ⚠️
pymc/sampling/parallel.py	92.30%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8047      +/-   ##
==========================================
+ Coverage   90.89%   91.49%   +0.59%     
==========================================
  Files         123      123              
  Lines       19501    19821     +320     
==========================================
+ Hits        17726    18135     +409     
+ Misses       1775     1686      -89

Files with missing lines	Coverage Δ
pymc/progress_bar/__init__.py	`100.00% <ø> (ø)`
pymc/progress_bar/rich_progress.py	`98.86% <100.00%> (+0.16%)`	⬆️
pymc/sampling/parallel.py	`63.54% <92.30%> (+0.59%)`	⬆️
pymc/sampling/mcmc.py	`91.23% <90.90%> (-0.04%)`	⬇️
pymc/smc/kernels.py	`97.26% <97.46%> (+50.16%)`	⬆️
pymc/progress_bar/progress.py	`97.33% <95.16%> (-0.71%)`	⬇️
pymc/smc/sampling.py	`96.59% <95.23%> (+19.51%)`	⬆️
pymc/progress_bar/marimo_progress.py	`17.39% <3.33%> (-1.47%)`	⬇️
pymc/smc/parallel.py	`66.66% <66.66%> (ø)`

... and 2 files with indirect coverage changes

ricardoV94 · 2026-01-10T23:51:05Z

Check #8044

jessegrabowski · 2026-01-11T00:21:38Z

Check #8044

I addressed this in the Extract and share Parallel setup code between MCMC and SMC commit, but now the two PRs are overlapping.

ricardoV94 · 2026-01-12T11:22:20Z

So there's an edge case with the pickle function -> send to process approach. If the pickled functions have random number generators, these need to be changed so as to have independent streams.

Usually this isn't a problem in MCMC because we never wanted to use functions with randomness in our step samplers, but this is not the case for SMC, and especially SMC-ABC with Simulator, which is definitely supposed to be random.

When you call model.logp() in a model with a Simulator, the logp has RandomGenerator variables in it. These are unique every time you call model.logp(), so the compilation in each process didn't have the duplication issue (entropy properties may not have been great, but that's a different matter). Here they'll be exactly the same.

The approach may require something like the set_rng code added for the conjugate samplers in this PR: https://github.com/ricardoV94/pymc-extras/blob/35530daf55a8ae7f5cbcf5ae00760da32abf560a/pymc_extras/sampling/optimizations/conjugate_sampler.py#L84-L98, and make sure it's called at the beginning of sampling in each thread
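
The duplication issue can be reproduced with a small stdlib sketch. It uses random.Random as a stand-in for the shared NumPy Generator inside a compiled function, and a plain dict as a stand-in for the pickled function; draws_from and its seed argument are hypothetical and are not the set_rng code linked above.

```python
import pickle
import random

# A "compiled function" that carries its own RNG state, reduced to a dict.
fn_state = {"rng": random.Random(123)}
payload = pickle.dumps(fn_state)


def draws_from(payload: bytes, n: int = 3, seed=None) -> list:
    """Unpickle the function state and draw n numbers from its RNG."""
    state = pickle.loads(payload)
    if seed is not None:
        state["rng"].seed(seed)  # give this worker its own stream
    return [state["rng"].random() for _ in range(n)]


# Without reseeding, every "worker" reproduces exactly the same numbers.
naive = [draws_from(payload) for _ in range(3)]

# Deriving one seed per chain from a parent RNG breaks the duplication.
parent = random.Random(2026)
seeds = [parent.getrandbits(64) for _ in range(3)]
reseeded = [draws_from(payload, seed=s) for s in seeds]
```

With NumPy generators the per-chain seeds would more naturally come from np.random.SeedSequence(...).spawn(n), which is designed to produce independent streams.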

jessegrabowski · 2026-01-12T14:30:29Z

The approach may require something like the set_rng code added for the conjugate samplers in this PR: https://github.com/ricardoV94/pymc-extras/blob/35530daf55a8ae7f5cbcf5ae00760da32abf560a/pymc_extras/sampling/optimizations/conjugate_sampler.py#L84-L98, and make sure it's called at the beginning of sampling in each thread

Could we just make the rng an explicit input to the function we pickle up and send out, to avoid the copy?

ricardoV94 · 2026-01-12T15:15:55Z

Could we just make the rng an explicit input to the function we pickle up and send out, to avoid the copy?

Not without many more changes in the codebase

Extract mp_ctx initialization function
Extract blas_core setup function
Don't use threadpool limits when mp_ctx is ForkContext
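
As a rough idea of what an extracted mp_ctx helper might look like, the sketch below resolves a start-method name into a multiprocessing context and encodes the fork exception from the last commit message. The names resolve_mp_ctx and should_limit_threads are hypothetical, and the real helpers in the PR may handle more cases (for example platform-specific defaults).

```python
import multiprocessing
from multiprocessing.context import BaseContext
from typing import Union


def resolve_mp_ctx(mp_ctx: Union[str, BaseContext, None] = None) -> BaseContext:
    """Turn a start-method name (or None) into a multiprocessing context."""
    if mp_ctx is None or isinstance(mp_ctx, str):
        # get_context(None) returns the platform's default context.
        return multiprocessing.get_context(mp_ctx)
    return mp_ctx  # already a context object, pass it through


def should_limit_threads(ctx: BaseContext) -> bool:
    """Skip thread-pool limits under fork, as the commit message suggests."""
    return ctx.get_start_method() != "fork"
```

A caller could then write ctx = resolve_mp_ctx("spawn") and create workers with ctx.Process(...), keeping the MCMC and SMC code paths on the same setup logic.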

jessegrabowski · 2026-02-03T05:34:02Z

There were some major conflicts with the recent round of progress bar PRs, so I went ahead and did a bit of refactoring:

ProgressBarManager is now an ABC that holds common progress bar logic
MCMCProgressManager handles MCMC
SMCProgressManager handles SMC

I renamed everything in the rich/marimo progress bar backends to have more agnostic argument names (chains -> n_bars, total_draws -> total, draw -> completed, etc).
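
A minimal sketch of that class layout follows. The three class names come from the comment above; the constructor arguments, progress_value, and update are assumptions chosen to show why backend-agnostic names like n_bars, total, and completed make the split possible.

```python
from abc import ABC, abstractmethod


class ProgressBarManager(ABC):
    """Shared bookkeeping; subclasses decide what 'completed' means."""

    def __init__(self, n_bars: int, total: float):
        self.n_bars = n_bars
        self.total = total
        self.completed = [0.0] * n_bars

    @abstractmethod
    def progress_value(self, bar: int, **state) -> float:
        """Map sampler state to a position between 0 and total."""

    def update(self, bar: int, **state) -> float:
        """Store the new position for one bar and return its fill fraction."""
        self.completed[bar] = min(self.progress_value(bar, **state), self.total)
        return self.completed[bar] / self.total


class MCMCProgressManager(ProgressBarManager):
    def progress_value(self, bar: int, *, draw: int, **_) -> float:
        return float(draw)  # progress is the number of draws taken


class SMCProgressManager(ProgressBarManager):
    def __init__(self, n_bars: int):
        super().__init__(n_bars, total=1.0)  # beta always ends at 1

    def progress_value(self, bar: int, *, beta: float, **_) -> float:
        return float(beta)  # progress is the inverse temperature
```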

The marimo backend won't work with SMC because it has a ton of hard-coded logic for MCMC and I don't care enough to handle it. If a future dev cares both about SMC and Marimo, he can handle it in the future.

ricardoV94 · 2026-02-05T12:27:15Z

@jessegrabowski is this ready for final review?

jessegrabowski · 2026-02-06T02:56:34Z

yep

ricardoV94 · 2026-02-06T12:03:39Z

pymc/progress_bar/marimo_progress.py

                    "stats": {},
                }
            ]
            self._start_times = [perf_counter()]


Perhaps initialize only when the task / bar advances for the first time. So that sequential sampling is not measuring speed relative to the start of the first chain (otherwise it seems each chain is slower than the previous one...). Does not need to be done in this PR
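
One way to implement that suggestion is to leave each bar's start time unset until its first advance. The class below is a hypothetical sketch, not code from the PR; the injectable clock exists only to make the behaviour easy to check.

```python
from time import perf_counter
from typing import Callable, List, Optional


class LazyBarTimer:
    """Start the clock the first time a bar advances, not at creation."""

    def __init__(self, n_bars: int, clock: Callable[[], float] = perf_counter):
        self._clock = clock
        self._start_times: List[Optional[float]] = [None] * n_bars

    def advance(self, bar: int) -> float:
        """Record progress on `bar` and return seconds since its own start."""
        now = self._clock()
        if self._start_times[bar] is None:
            self._start_times[bar] = now  # first update for this bar
        return now - self._start_times[bar]
```

With sequential sampling, the second chain's elapsed time then starts at zero when that chain begins, so its reported speed is no longer diluted by the time spent on the first chain.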

ricardoV94 · 2026-02-06T12:12:32Z

pymc/smc/kernels.py

+            shared_rngs = [
+                var for var in fn.get_shared() if isinstance(var.type, RandomGeneratorType)
+            ]
+            n_shared_rngs = len(shared_rngs)


Raise NotImplementedError if there are shared_rngs and isinstance(fn.maker.linker, JAXLinker), as the RNGs have a different format after compilation and are no longer retrieved by fn.get_shared() either.

ricardoV94 · 2026-02-06T12:14:51Z

pymc/smc/kernels.py

+        )
+        self.likelihood_logp_func = self.likelihood_logp_func.copy(
+            swap=make_rng_swaps(self.likelihood_logp_func, rng)
+        )


For safety, even if not used, do self.rng = rng

ricardoV94 · 2026-02-06T12:20:23Z

pymc/smc/sampling.py


-    return custom_methods

+def _sample_smc(


inline this in _sample_smc_many?

ricardoV94 · 2026-02-06T12:22:32Z

tests/smc/test_smc.py

            pm.sample_smc(draws=6, cores=2)

+    @pytest.mark.parametrize("chains", [1, 2], ids=["1_chain", "2_chains"])
+    def test_sequential(self, chains, caplog):


Parametrize the first sample_test to cover both sequential and parallel.

ricardoV94 · 2026-02-06T12:23:00Z

tests/smc/test_smc.py

+                    chains=chains,
+                    cores=1,
+                    return_inferencedata=False,
+                    progressbar=not _IS_WINDOWS,


We shouldn't need this not _IS_WINDOWS logic anywhere anymore with the new progressbar (it existed in previous tests)

ricardoV94

My last requests (97% confidence)

jessegrabowski added enhancements bug maintenance SMC Sequential Monte Carlo labels Jan 10, 2026

jessegrabowski requested review from Armavica and ricardoV94 January 10, 2026 22:57

jessegrabowski requested a review from Copilot January 10, 2026 22:59

Copilot started reviewing on behalf of jessegrabowski January 10, 2026 23:00

Copilot AI reviewed Jan 10, 2026

jessegrabowski requested a review from aseyboldt January 10, 2026 23:05

jessegrabowski force-pushed the smc-refactor branch from bf91046 to 7567f0e January 10, 2026 23:49

github-actions bot added feature request pytensor labels Jan 10, 2026

jessegrabowski force-pushed the smc-refactor branch from 7567f0e to 8f6f2ad January 11, 2026 00:20

jessegrabowski force-pushed the smc-refactor branch 3 times, most recently from 15c4e5f to ae2d582 January 11, 2026 01:01

jessegrabowski requested a review from zaxtax January 12, 2026 00:41