Conversation
Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter is adapted for this. cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

Some changes occurred to the platform-builtins intrinsics. Make sure the LLVM backend as well as portable-simd gets adapted for the changes. cc @antoyo, @GuillaumeGomez, @bjorn3, @calebzulawski, @programmerjake
```rust
use crate::num::Wrapping;
```

```rust
// i.e. 0b100010001...0001 in binary.
const MASK: u128 = 0x1111_1111_1111_1111_1111_1111_1111_1111;
```
```diff
- const MASK: u128 = 0x1111_1111_1111_1111_1111_1111_1111_1111;
+ const MASK: u128 = !0 / 0xF;
```
seems easier to read
Is it? When I look at that expression I'd have to trust the comment that it does what it says. The 0x1111 literal is long but to me it's clearer.
Force-pushed from f564229 to 47da3e1 (compare)
The job failed. Click to see the possible cause of the failure (guessed by this bot).
My understanding was that the tuple-returning method is a possible future expansion, not intended to be currently implemented. I don't know the best way to implement the widening operation, but we can get the high-order bits using clmul as a primitive. In LLVM's arbitrary precision integer library:

```
clmulr(a, b) = bitreverse(clmul(bitreverse(a), bitreverse(b)))
clmulh(a, b) = clmulr(a, b) >> 1
```

https://github.com/llvm/llvm-project/blob/66ba9c9475c7628893f387f1b9d9d3056b05d995/llvm/include/llvm/ADT/APInt.h#L2457-L2476
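Those APInt identities can be sanity-checked at a small width against a naive carryless multiply; this is a hedged sketch, and all helper names below are made up for illustration:

```rust
// Naive widening carryless multiply at 8 bits: XOR together shifted copies
// of `a`, one for each set bit of `b` (addition without carries).
fn clmul_wide_u8(a: u8, b: u8) -> u16 {
    let mut r = 0u16;
    for i in 0..8 {
        if (b >> i) & 1 != 0 {
            r ^= (a as u16) << i;
        }
    }
    r
}

// Low and high halves of the 16-bit carryless product.
fn clmul_u8(a: u8, b: u8) -> u8 {
    clmul_wide_u8(a, b) as u8
}
fn clmulh_u8(a: u8, b: u8) -> u8 {
    (clmul_wide_u8(a, b) >> 8) as u8
}

// clmulr per the APInt definition: bit-reverse inputs, multiply, reverse back.
fn clmulr_u8(a: u8, b: u8) -> u8 {
    clmul_u8(a.reverse_bits(), b.reverse_bits()).reverse_bits()
}
```

Exhaustively checking all 8-bit pairs confirms `clmulh(a, b) == clmulr(a, b) >> 1` at this width.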
Right, it's just the first widening function that would have that signature. If that is the plan then it's pretty simple: you just widen first and then do the non-widening operation.

This is running into CI sometimes using earlier LLVM releases without the new intrinsic. I asked in #t-compiler/llvm > Adding an LLVM 22 intrinsic how/if we can work around that.
actually I expect you can probably just use ; cleaned up slightly:

```llvm
define void @u128_carrying_mul(ptr sret([32 x i8]) align 16 %_0, i128 noundef %a, i128 noundef %b, i128 noundef %c) unnamed_addr #0 {
start:
  %0 = zext i128 %a to i256
  %1 = zext i128 %b to i256
  %2 = zext i128 %c to i256
  %3 = mul nuw i256 %1, %0
  %4 = add nuw i256 %3, %2
  %5 = trunc i256 %4 to i128
  %6 = lshr i256 %4, 128
  %7 = trunc nuw i256 %6 to i128
  store i128 %5, ptr %_0, align 16
  %_0.repack1 = getelementptr inbounds nuw i8, ptr %_0, i64 16
  store i128 %7, ptr %_0.repack1, align 16
  ret void
}
```
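For intuition, here is the same widening trick one size down, as a hedged sketch: `u64`'s carry-multiply modeled with `u128` arithmetic, mirroring what the `i256` IR above does for `u128`. The helper name is hypothetical, not the std API:

```rust
// Widen, multiply, add the carry, then split. a * b + c never overflows
// 128 bits because (2^64 - 1)^2 + (2^64 - 1) = 2^128 - 2^64 < 2^128.
fn carrying_mul_u64(a: u64, b: u64, carry: u64) -> (u64, u64) {
    let wide = (a as u128) * (b as u128) + (carry as u128);
    (wide as u64, (wide >> 64) as u64)
}
```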
```rust
/// This approach uses a bitmask of the form `0b100010001...0001` to avoid carry spilling.
/// When carries do occur, they wind up in a "hole" of zeros and are subsequently masked
/// out of the result.
```
This approach with 4-bit digits works up to integers with 4 * 15 = 60 bits. Past that, one digit can overflow into the next.

For `u64`, it does actually work for this "non-widening" operation, since the top digit may be computed as 16, but there is no next digit that would be affected. The wide result would be erroneous, however, e.g. `x.carryless_mul(x)` with `x = MASK as u64 as u128`.

The impl for `u128::carryless_mul` is currently incorrect for that reason. You could probably extend the approach to use 5-bit digits, but it's likely better to just implement it in terms of `u64::carryless_mul`.
some tests against a naive impl: playground
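As a sketch of that suggestion (with a naive stand-in for the widening `u64` primitive, and illustrative names): the non-widening `u128` carryless multiply only needs the low product and the low 64 bits of the two cross terms, since everything else lands at bit 128 or above:

```rust
// Naive widening carryless multiply of two u64s into a u128.
fn clmul_wide_u64(a: u64, b: u64) -> u128 {
    let mut r = 0u128;
    for i in 0..64 {
        if (b >> i) & 1 != 0 {
            r ^= (a as u128) << i;
        }
    }
    r
}

// Non-widening u128 carryless multiply built from the u64 primitive.
// The a_hi x b_hi term contributes only to bits >= 128 and drops out;
// `mid << 64` likewise keeps only the cross-term bits below bit 128.
fn clmul_u128_via_u64(a: u128, b: u128) -> u128 {
    let (a_lo, a_hi) = (a as u64, (a >> 64) as u64);
    let (b_lo, b_hi) = (b as u64, (b >> 64) as u64);
    let lo = clmul_wide_u64(a_lo, b_lo);
    let mid = clmul_wide_u64(a_lo, b_hi) ^ clmul_wide_u64(a_hi, b_lo);
    lo ^ (mid << 64)
}
```

This composition agrees with a bit-by-bit reference implementation, including on the `MASK` input that trips up the 4-bit-digit trick.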
```rust
    unsafe { a.unchecked_funnel_shr(b, shift) }
}
```

```rust
/// Carryless multiply.
```
Is this meant to be a self-contained description? To me these words mean nothing.^^
```rust
sym::bitreverse => {
    self.call_intrinsic("llvm.bitreverse", &[llty], &[args[0].immediate()])
}
sym::carryless_mul if crate::llvm_util::get_version() >= (22, 0, 0) => {
```
Since the fallback body is non-trivial, maybe this should be added to the list from #150605 ?
```rust
#[doc = concat!("let a = ", $clmul_lhs, stringify!($SelfT), ";")]
#[doc = concat!("let b = ", $clmul_rhs, stringify!($SelfT), ";")]
///
#[doc = concat!("assert_eq!(a.carryless_mul(b), ", $clmul_result, ");")]
```
The only example here uses huge numbers which means the example is entirely useless if one doesn't already know what this operation does. I for one still have no clue after reading all this.^^
Which carry is discarded? All the ones that show up when adding up the results of the first steps of long multiplication? Or also all the carries that occur within the long multiplication steps? This needs way more detail. Doesn't that mean the result depends on the base? No base is mentioned in the first two paragraphs, and even in the third paragraph it's only mentioned in an obscure way.
maybe describe it as equivalent to:
```rust
impl uN {
    pub fn carryless_mul(self, other: Self) -> Self {
        let mut retval = 0;
        for i in 0..Self::BITS {
            if (other >> i) & 1 != 0 {
                retval ^= self << i;
            }
        }
        retval
    }
}
```
That is a good specification. I haven't really found/been able to come up with a good intuitive explanation, and even if you understand the mechanics, how/when to apply this function is still hard to see.
> The only example here uses huge numbers
So in part that is to test that the implementation doesn't just mask off the upper bits. Because of how these functions are generated with a macro, getting examples to work with type inference and without overflowing literals is a bit finicky.
I'll try again to come up with something more insightful.
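For what it's worth, a small-operand instance of the spec upthread makes the discarded carries visible: `3.carryless_mul(3)` is `0b11 ^ 0b110 = 0b101 = 5`, whereas `3 * 3 = 9` because the carry propagates. A hedged sketch at `u8` (function name illustrative):

```rust
// Direct instantiation of the proposed spec at 8 bits: XOR (addition
// without carries) of shifted copies of `a`, one per set bit of `b`.
fn carryless_mul_u8(a: u8, b: u8) -> u8 {
    let mut retval = 0u8;
    for i in 0..8 {
        if (b >> i) & 1 != 0 {
            retval ^= a << i;
        }
    }
    retval
}
```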
maybe look at the latest C++ proposal for a bunch of good examples: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3642r3.html
tracking issue: #152080
ACP: rust-lang/libs-team#738
This defers to LLVM's `llvm.clmul` when available, and otherwise falls back to a method from the `polyval` crate (link). Some things are missing, which I think we can defer:

- `const` (I think I ran into a bootstrapping issue). That is fine for now, I think.
- In `stdarch` we can't really use this intrinsic at the moment; we'd only want the scalar version to replace some riscv intrinsics.