Skip to content

Use memcmp for field element comparisons#629

Open
jtraglia wants to merge 6 commits intoethereum:mainfrom
jtraglia:fr-comparisions-optimization
Open

Use memcmp for field element comparisons#629
jtraglia wants to merge 6 commits intoethereum:mainfrom
jtraglia:fr-comparisions-optimization

Conversation

@jtraglia
Copy link
Member

In compute_cells_and_kzg_proofs, comparing field element limbs directly is 2.87% faster with precompute=8.

Note: I benchmarked verify_cell_kzg_proof_batch too but there wasn't any difference.

Before:

$ go clean -cache && GOMAXPROCS=1 go test -test.bench=Benchmark/ComputeCellsAndKZGProofs -test.run=^$ ./bindings/go 
goos: darwin
goarch: arm64
pkg: github.com/ethereum/c-kzg-4844/v2/bindings/go
cpu: Apple M1
Benchmark/ComputeCellsAndKZGProofs(precompute=0)                       4         263398458 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=1)                       2         908959958 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=2)                       3         488505694 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=3)                       3         350817750 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=4)                       4         277039625 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=5)                       5         240792967 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=6)                       5         213914367 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=7)                       6         198406333 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=8)                       6         184162486 ns/op
Benchmark/ComputeCellsAndKZGProofsParallel(count=8)                    4         289445594 ns/op
PASS
ok      github.com/ethereum/c-kzg-4844/v2/bindings/go   96.002s

After:

$ go clean -cache && GOMAXPROCS=1 go test -test.bench=Benchmark/ComputeCellsAndKZGProofs -test.run=^$ ./bindings/go
goos: darwin
goarch: arm64
pkg: github.com/ethereum/c-kzg-4844/v2/bindings/go
cpu: Apple M1
Benchmark/ComputeCellsAndKZGProofs(precompute=0)                       4         263577667 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=1)                       2         904966354 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=2)                       3         482739042 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=3)                       3         345744278 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=4)                       4         276764583 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=5)                       5         238037950 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=6)                       5         211482933 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=7)                       6         195122521 ns/op
Benchmark/ComputeCellsAndKZGProofs(precompute=8)                       6         178871958 ns/op
Benchmark/ComputeCellsAndKZGProofsParallel(count=8)                    4         290647542 ns/op
PASS
ok      github.com/ethereum/c-kzg-4844/v2/bindings/go   95.532s

Copy link
Contributor

@b-wagn b-wagn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

src/common/fr.c Outdated
blst_uint64_from_fr(a, p);
return a[0] == 1 && a[1] == 0 && a[2] == 0 && a[3] == 0;
bool fr_is_one(const fr_t *fr) {
return memcmp(fr, &FR_ONE, sizeof(fr_t)) == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switch to memcmp is not truly straightforward because the old approach (using blst_uint64_from_fr()) was doing a modular reduction on the input field element. Whereas now we don't, and this could be a problem if this function ever receives an unreduced field element.

I don't think this is a problem, because we should not be handling unreduced field elements, but it's a new assumption that should be taken into account every time we parse or introduce field elements in any way.

src/common/fr.c Outdated
bool fr_is_null(const fr_t *p) {
return fr_equal(p, &FR_NULL);
bool fr_is_null(const fr_t *fr) {
return memcmp(fr, &FR_NULL, sizeof(fr_t)) == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, FR_NULL is only used during recovery and nowhere else. Perhaps we can just kill FR_NULL completely, and initialize recovered_cells_fr with FR_ZERO. After all, the missing cells will be set to zero by multiplication with vanishing_poly_eval. This is what's done in the spec right now (see extended_evaluation_rbo).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're right. I like this suggestion. Let me do this.

src/common/fr.h Outdated
bool fr_is_one(const fr_t *p);
bool fr_is_null(const fr_t *p);
bool fr_is_zero(const fr_t *fr);
bool fr_is_one(const fr_t *fr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops good catch, removed.

@@ -30,36 +31,7 @@
* @retval false The two elements are not equal.
*/
Copy link
Contributor

@kevaundray kevaundray Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should write something like "field elements are assumed to be reduced" ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants