absl/crc: Add RISC-V hardware acceleration for CRC32C#1986
Open
PeterPtroc wants to merge 1 commit into abseil:master
This change introduces a hardware-accelerated implementation of CRC32C for RISC-V processors that support the Zbc (carry-less multiplication) or Zbkc extensions.

Key changes:
- Implemented CRC32AcceleratedRISCV using clmul and clmulh instructions via inline assembly.
- Added runtime CPU feature detection for RISC-V using riscv_hwprobe on Linux to safely enable the accelerated path.
- Updated CRCImpl::NewInternal to instantiate the RISC-V implementation when supported hardware is detected.
- Updated CMakeLists.txt to detect compiler support for -march=rv64gc_zbc or -march=rv64gc_zbkc and apply it to the specific translation unit.
- Updated BUILD.bazel to apply -march=rv64gc_zbc for riscv64 builds using GCC/Clang, following Abseil's existing patterns for architecture-specific flags.

This implementation significantly improves CRC32C throughput on supported RISC-V hardware by utilizing carry-less multiplication instructions instead of the table-based software fallback.

Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
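For readers unfamiliar with the two instructions named above, the following is a minimal software model of what clmul and clmulh compute on RV64, as described in the RISC-V bit-manipulation specification. It is an illustrative sketch only, not the inline assembly from this change; the names ClmulModel and ClmulhModel are hypothetical.

```cpp
#include <cstdint>

// Hypothetical software model of RV64 `clmul`: the low 64 bits of the
// carry-less product of a and b (partial products are combined with XOR
// instead of addition, i.e. polynomial multiplication over GF(2)).
constexpr uint64_t ClmulModel(uint64_t a, uint64_t b) {
  uint64_t result = 0;
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) result ^= a << i;
  }
  return result;
}

// Hypothetical software model of RV64 `clmulh`: the high 64 bits of the
// same 128-bit carry-less product. The loop starts at 1 because bit 0 of b
// contributes nothing above bit 63.
constexpr uint64_t ClmulhModel(uint64_t a, uint64_t b) {
  uint64_t result = 0;
  for (int i = 1; i < 64; ++i) {
    if ((b >> i) & 1) result ^= a >> (64 - i);
  }
  return result;
}

// (x + 1) * (x + 1) = x^2 + 1 over GF(2), so 3 clmul 3 is 5, not 9.
static_assert(ClmulModel(3, 3) == 5, "carry-less product drops the carry");
```

A folding CRC combines both halves of such products to reduce a long message to a short remainder; the model above only pins down the arithmetic each instruction performs.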
Member
Thank you for this pull request. Don't worry about the CI failures; I fixed them when I converted this pull request to an internal change to the Google codebase. Your change is currently under review and testing (I'm trying to find someone with hardware to verify this on). I will report back if I need anything from you.
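For anyone checking whether a given Linux RISC-V machine is suitable for verifying this change, a probe along the following lines may help. It is a sketch under stated assumptions, not the detection code from this pull request: it assumes kernel headers new enough to define RISCV_HWPROBE_EXT_ZBC and RISCV_HWPROBE_EXT_ZBKC in asm/hwprobe.h, and the function name CpuReportsCarrylessMultiply is hypothetical. On other platforms or with older headers it simply returns false.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__riscv) && defined(__linux__) && defined(__has_include)
#if __has_include(<asm/hwprobe.h>)
#include <asm/hwprobe.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif
#endif

// Hypothetical helper: returns true if the kernel reports the Zbc or Zbkc
// extension on all online CPUs. Returns false when the probe is unavailable.
bool CpuReportsCarrylessMultiply() {
#if defined(__NR_riscv_hwprobe) && defined(RISCV_HWPROBE_KEY_IMA_EXT_0) && \
    defined(RISCV_HWPROBE_EXT_ZBC) && defined(RISCV_HWPROBE_EXT_ZBKC)
  struct riscv_hwprobe pair = {};
  pair.key = RISCV_HWPROBE_KEY_IMA_EXT_0;
  // Arguments: pairs, pair_count, cpusetsize, cpus, flags.
  // A zero cpusetsize with a null cpus pointer asks about all online CPUs.
  long rc = syscall(__NR_riscv_hwprobe, &pair, static_cast<size_t>(1),
                    static_cast<size_t>(0), static_cast<void*>(nullptr), 0u);
  if (rc != 0) return false;
  // The kernel rewrites an unrecognized key to -1.
  if (pair.key != RISCV_HWPROBE_KEY_IMA_EXT_0) return false;
  const uint64_t wanted = RISCV_HWPROBE_EXT_ZBC | RISCV_HWPROBE_EXT_ZBKC;
  return (pair.value & wanted) != 0;
#else
  return false;  // Not RISC-V Linux, or headers predate these definitions.
#endif
}
```

Treating every failure mode as "not supported" keeps the table-based fallback as the default, which matches the stated goal of enabling the accelerated path only when the hardware is known to support it.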
Description
This PR introduces a hardware-accelerated implementation of CRC32C for RISC-V processors that support the Zbc or Zbkc extensions.
Motivation
CRC32C is a performance-critical operation in many applications. The current software fallback on RISC-V is slower than what can be achieved using the carry-less multiplication instructions available in the Zbc extension. This change leverages these instructions to improve throughput.
Changes
- absl/crc/internal/crc_riscv.cc: Implemented AbslCrc32cClmulRiscv using clmul and clmulh instructions via inline assembly. The implementation uses a folding approach similar to the x86/ARM combined implementation.
- absl/crc/internal/cpu_detect.cc: Added runtime CPU feature detection for RISC-V using riscv_hwprobe on Linux to safely enable the accelerated path only when the hardware supports it.
- absl/crc/internal/crc.cc: Updated CRCImpl::NewInternal to instantiate the RISC-V implementation when supported hardware is detected.
- CMakeLists.txt: Updated to detect compiler support for -march=rv64gc_zbc or -march=rv64gc_zbkc and apply it to the specific translation unit.
- BUILD.bazel: Updated to apply -march=rv64gc_zbc for riscv64 builds using GCC/Clang.
Performance
Benchmarks were run on a RISC-V 64-bit system (64 cores @ 2.6GHz).
Benchmark: //absl/crc:crc32c_benchmark
[Benchmark results, throughput in MiB/s, values not preserved. Benchmarks listed: BM_Calculate/500000, BM_Extend/500000, BM_Extend/100000000, BM_ExtendCacheMiss/100, BM_ExtendCacheMiss/1000, BM_ExtendCacheMiss/100000]
Testing
Ran //absl/crc:all tests on the target hardware. All 231 tests in the project passed.
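As a hedged illustration of the kind of known-answer check that can back up an accelerated CRC32C path, the sketch below gives a bit-at-a-time reference implementation and the standard CRC-32C check value for the ASCII string "123456789". It is not taken from the Abseil test suite; the name Crc32cBitwise is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reference CRC32C (Castagnoli), processed one bit at a time.
// Uses the reflected polynomial 0x82F63B78, an initial value of all ones,
// and a final XOR with all ones.
constexpr uint32_t Crc32cBitwise(const char* data, size_t length) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < length; ++i) {
    crc ^= static_cast<uint8_t>(data[i]);
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ ((crc & 1u) ? 0x82F63B78u : 0u);
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Standard check value for CRC-32C over the nine ASCII digits.
static_assert(Crc32cBitwise("123456789", 9) == 0xE3069283u,
              "reference CRC32C must match the published check value");
```

Any accelerated implementation should return the same values as this slow reference for identical inputs, so comparing the two across random buffers and lengths is a simple way to exercise a new hardware path.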
Raw Benchmark Data (Origin)
Raw Benchmark Data (Patch)