
MKL spmm and spgemm integration #147

Open
mmelnich wants to merge 5 commits into main from spmm_investigation

Conversation

Contributor

@mmelnich mmelnich commented Feb 4, 2026

This PR adds Intel MKL sparse BLAS support to RandBLAS, which gives a significant speedup for sparse-dense matrix multiplication when the sparse matrix is stored in the right format (CSR for left_spmm, CSC for right_spmm, because of the internal transpose).
The new pieces are mkl_spmm_impl.hh, with RAII wrappers for MKL handles; dispatch logic in spmm_dispatch.hh that tries MKL first and falls back to the hand-rolled kernels when MKL cannot handle a given case; and a new spgemm function for sparse×sparse=dense multiplication.
There is also a benchmark (spmm_mkl_comparison.cc) that demonstrates the MKL vs. hand-rolled performance gap and explains when the MKL path is taken.
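For concreteness, here is a rough caller-side sketch of the accelerated path (illustrative only: the left_spmm argument order mirrors the dispatch hunk reviewed below, but the exact public signature and the CSRMatrix details should be treated as assumptions rather than quotes from this PR):

#include <RandBLAS.hh>

// Hypothetical helper: C = 1.0*A*B + 0.0*C with sparse A on the left.
// Storing A in CSR is what lets the MKL path engage once RandBLAS_HAS_MKL is
// defined and A's index type matches MKL_INT; otherwise the dispatch layer
// quietly falls back to the hand-rolled kernel with identical results.
template <typename T>
void left_multiply_example(
    int64_t d, int64_t m, int64_t n,
    RandBLAS::sparse_data::CSRMatrix<T> &A,  // d-by-m, CSR for left_spmm
    const T *B, int64_t ldb,                 // m-by-n dense, column-major
    T *C, int64_t ldc                        // d-by-n dense, column-major
) {
    RandBLAS::sparse_data::left_spmm(
        blas::Layout::ColMajor, blas::Op::NoTrans, blas::Op::NoTrans,
        d, n, m, (T) 1.0, A, 0, 0, B, ldb, (T) 0.0, C, ldc
    );
}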

This PR also fixes a pre-existing latent bug in the public RandBLAS::spmm dense×sparse API wrapper: it had an extra argument and would have caused a compile error for anyone who tried to use it. Tests for that API have been added to prevent regressions.

@mmelnich mmelnich changed the title from Spmm investigation to MKL spmm and spgemm integration on Feb 4, 2026
Contributor Author

mmelnich commented Feb 6, 2026

@rileyjmurray lmk your thoughts

@mmelnich mmelnich requested a review from rileyjmurray February 6, 2026 15:52
Contributor

@rileyjmurray rileyjmurray left a comment


Looks pretty good!

See comments for some requested changes.

I'm also going to ask that you update the web documentation to include spgemm on this page and a discussion of MKL on this page. You'll also need to update the Limitations page, here, to mention that spgemm requires a third-party library (currently only MKL). As appropriate, make sure to mention that only single and double precision are supported, in contrast to RandBLAS' own kernels, which work with any scalar type.

^ All the web docs are defined here.

Contributor

sparse-data-matrices is just supposed to hold matrix files. Please leave it in the git-ignore. (I'm not sure why sparse-low-rank-approx subfolders would ever be created. Maybe if you tried to build from that directory?)

Comment on lines +56 to +66
#cmakedefine RandBLAS_HAS_MKL
// ^ CMake determines whether or not to #define RandBLAS_HAS_MKL
//
// This is set when BLAS++ was built with Intel MKL and the MKL sparse
// BLAS header (mkl_spblas.h) is found. When defined, RandBLAS uses
// MKL's Inspector-Executor sparse BLAS for accelerated sparse matrix
// operations (sparse x dense and sparse x sparse multiplication).
//
// If you don't want to use CMake, define this only if you are linking
// to Intel MKL and have mkl_spblas.h available.
//
Contributor

Glad you caught this!
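For readers not using CMake, the quoted guidance boils down to something like the following (a sketch; it assumes you really are compiling and linking against Intel MKL, with mkl_spblas.h on the include path, and that the macro is defined consistently across translation units):

// Either pass -DRandBLAS_HAS_MKL on the compile command line, or define the
// macro before pulling in RandBLAS. Leave it undefined when MKL is not linked,
// so the hand-rolled sparse kernels are used instead.
#define RandBLAS_HAS_MKL
#include <RandBLAS.hh>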

#include <algorithm>
#include <numeric>

// Include internal headers for direct kernel access
Contributor

It seems this file invokes spmm_left and spmm_right, not direct kernel calls.

Contributor

I want to make sure this executes properly even if MKL is not available. Can you add a continuous integration build that runs this example with small dimension sizes? Or maybe just update our CI builds so that this example runs on all platforms.

Contributor

sparse-data-matrices is just supposed to hold matrix files themselves, not any executables. Please create a new folder called something like "simple-kernel-benchmarks". It's okay if that folder only contains this one file.

    }
    MKLSparseHandle& operator=(MKLSparseHandle&& other) noexcept {
        if (this != &other) {
            if (handle) mkl_sparse_destroy(handle);
Contributor

Does mkl_sparse_destroy try to free attached memory?

            colptr, colptr + 1, rowidxs, A.vals
        );
    } else {
        static_assert(sizeof(T) == 0, "MKL sparse BLAS only supports float and double.");
Contributor

Same comment here as with the static_assert in make_mkl_handle_csr.

            (MKL_INT)A.nnz, rows, cols, A.vals
        );
    } else {
        static_assert(sizeof(T) == 0, "MKL sparse BLAS only supports float and double.");
Contributor

Same comment as make_mkl_handle_cs[r/c].

    } else if constexpr (is_coo) {
        return make_mkl_handle_coo(A);
    } else {
        static_assert(sizeof(SpMat) == 0, "Unsupported sparse matrix format for MKL backend.");
Contributor

Same comment as other static_asserts.

Comment on lines +127 to +141
    // Try MKL-accelerated path if available.
#if defined(RandBLAS_HAS_MKL)
    if constexpr (sizeof(typename SpMat::index_t) == sizeof(MKL_INT)) {
        // mkl_left_spmm returns false if it can't handle this case
        // (e.g., COO with submatrix offsets, CSC format).
        // Beta is already applied to C above, so pass beta=1 to MKL
        // so it adds alpha*A*B to the existing (pre-scaled) C.
        bool handled = RandBLAS::sparse_data::mkl::mkl_left_spmm(
            layout, Op::NoTrans, opB, d, n, m, alpha,
            A, ro_a, co_a, B, ldb, (T)1, C, ldc
        );
        if (handled)
            return;
    }
#endif
Contributor

I'm okay with this implementation for now. Things will get messy if we add support for other sparse matrix libraries (e.g., from Arm or AMD). In the PR description please outline a plan for adding that support in a way that doesn't significantly complicate the spmm_dispatch kernels. (Or explain why you think we should accept the possibility of such complication.)
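One possible shape for such a plan, purely as an illustration and not something this PR defines (try_accelerated_left_spmm is a hypothetical name): keep a single vendor-neutral shim in front of the kernels, so spmm_dispatch makes exactly one call regardless of how many backends are compiled in.

// Hypothetical sketch: each backend exposes a *_left_spmm that returns false
// when it cannot handle the request, and spmm_dispatch only ever calls this shim.
template <typename T, typename SpMat>
bool try_accelerated_left_spmm(
    blas::Layout layout, blas::Op opA, blas::Op opB,
    int64_t d, int64_t n, int64_t m, T alpha,
    SpMat &A, int64_t ro_a, int64_t co_a,
    const T *B, int64_t ldb, T beta, T *C, int64_t ldc
) {
#if defined(RandBLAS_HAS_MKL)
    if constexpr (sizeof(typename SpMat::index_t) == sizeof(MKL_INT)) {
        if (RandBLAS::sparse_data::mkl::mkl_left_spmm(
                layout, opA, opB, d, n, m, alpha, A, ro_a, co_a, B, ldb, beta, C, ldc))
            return true;
    }
#endif
    // Additional vendors (Arm, AMD, ...) would add their own guarded blocks here,
    // without touching the calling kernels.
    return false;  // caller falls back to the hand-rolled kernels
}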
