Conversation
|
@rileyjmurray lmk your thoughts |
rileyjmurray
left a comment
There was a problem hiding this comment.
Looks pretty good!
See comments for some requested changes.
I'm also going to ask that you update the web documentation to include spgemm on this page and discussion of MKL on this page. You'll also need to update the Limitations page, here, to mention that spgemm requires a third-party library (currently only MKL). As appropriate, make sure to mention that only single and double precision are supported, in contrast to RandBLAS kernels that use any scalar type.
^ All the web docs are defined here.
There was a problem hiding this comment.
sparse-data-matrices is just supposed to hold matrix files. Please leave it in the git-ignore. (I'm not sure why sparse-low-rank-approx subfolders would ever be created. Maybe if you tried to build from that directory?)
| #cmakedefine RandBLAS_HAS_MKL | ||
| // ^ CMake determines whether or not to #define RandBLAS_HAS_MKL | ||
| // | ||
| // This is set when BLAS++ was built with Intel MKL and the MKL sparse | ||
| // BLAS header (mkl_spblas.h) is found. When defined, RandBLAS uses | ||
| // MKL's Inspector-Executor sparse BLAS for accelerated sparse matrix | ||
| // operations (sparse x dense and sparse x sparse multiplication). | ||
| // | ||
| // If you don't want to use CMake, define this only if you are linking | ||
| // to Intel MKL and have mkl_spblas.h available. | ||
| // |
| #include <algorithm> | ||
| #include <numeric> | ||
|
|
||
| // Include internal headers for direct kernel access |
There was a problem hiding this comment.
It seems this file invokes spmm_left and spmm_right, not direct kernel calls.
There was a problem hiding this comment.
I want to make sure this executes properly even if MKL is not available. Can you add a continuous integration build that runs this example with small dimension sizes? Or maybe just update our CI builds so that this example runs on all platforms.
There was a problem hiding this comment.
sparse-data-matrices is just supposed to hold matrix files themselves, not any executables. Please create a new folder called something like "simple-kernel-benchmarks". It's okay if that folder only contains this one file.
| } | ||
| MKLSparseHandle& operator=(MKLSparseHandle&& other) noexcept { | ||
| if (this != &other) { | ||
| if (handle) mkl_sparse_destroy(handle); |
There was a problem hiding this comment.
Does mkl_sparse_destroy try to tree attached memory?
| colptr, colptr + 1, rowidxs, A.vals | ||
| ); | ||
| } else { | ||
| static_assert(sizeof(T) == 0, "MKL sparse BLAS only supports float and double."); |
There was a problem hiding this comment.
Same comment here as with the static_assert in make_mkl_handle_csr.
| (MKL_INT)A.nnz, rows, cols, A.vals | ||
| ); | ||
| } else { | ||
| static_assert(sizeof(T) == 0, "MKL sparse BLAS only supports float and double."); |
There was a problem hiding this comment.
Same comment as make_mkl_handle_cs[r/c].
| } else if constexpr (is_coo) { | ||
| return make_mkl_handle_coo(A); | ||
| } else { | ||
| static_assert(sizeof(SpMat) == 0, "Unsupported sparse matrix format for MKL backend."); |
There was a problem hiding this comment.
Same comment as other static_asserts.
| // Try MKL-accelerated path if available. | ||
| #if defined(RandBLAS_HAS_MKL) | ||
| if constexpr (sizeof(typename SpMat::index_t) == sizeof(MKL_INT)) { | ||
| // mkl_left_spmm returns false if it can't handle this case | ||
| // (e.g., COO with submatrix offsets, CSC format). | ||
| // Beta is already applied to C above, so pass beta=1 to MKL | ||
| // so it adds alpha*A*B to the existing (pre-scaled) C. | ||
| bool handled = RandBLAS::sparse_data::mkl::mkl_left_spmm( | ||
| layout, Op::NoTrans, opB, d, n, m, alpha, | ||
| A, ro_a, co_a, B, ldb, (T)1, C, ldc | ||
| ); | ||
| if (handled) | ||
| return; | ||
| } | ||
| #endif |
There was a problem hiding this comment.
I'm okay with this implementation for now. Things will get messy if we add support for other sparse matrix libraries (e.g., from Arm or AMD). In the PR description please outline a plan for adding that support in a way that doesn't significantly complicate the spmm_dispatch kernels. (Or explain why you think we should accept the possibility of such complication.)
This PR adds Intel MKL sparse BLAS support to RandBLAS, which gives a significant speedup for sparse-dense matrix multiplication when you store your sparse matrix in the right format (CSR for left_spmm, CSC for right_spmm due to internal transpose).
Added the mkl_spmm_impl.hh with RAII wrappers for MKL handles, dispatch logic in spmm_dispatch.hh that tries MKL first and falls back to hand-rolled kernels when MKL can't handle the format, and a new spgemm function for sparse×sparse=dense multiplication.
Also, created a benchmark (spmm_mkl_comparison.cc) that demonstrates the MKL vs hand-rolled performance difference and explains when MKL kicks in.
Also, discovered and fixed a pre-existing latent bug in the public RandBLAS::spmm dense×sparse API wrapper that had an extra argument and would have caused a compile error if anyone tried to use it, and we added tests for that API to prevent regression.