feat(gpu): implement 4-bit windowed scalar multiplication for generator #8

Merged: arkadianet merged 1 commit into main from feat/windowed-scalar-mul on Jan 4, 2026

@arkadianet (Owner) commented Jan 4, 2026

Replace naive bit-by-bit double-and-add with MSB-first fixed-window method using precomputed generator multiples.

Changes:

  • Add G_TABLE[16][24] with precomputed 0·G through 15·G in Jacobian coords
  • Replace pt_mul_generator() with windowed implementation
  • Add gen_g_table binary to generate/verify table constants

Performance:

  • Overall throughput: 276k → 311k addr/s (+12.6%)

All 30 tests pass including CPU/GPU consistency check.
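For readers unfamiliar with the method, here is a minimal Rust sketch of MSB-first 4-bit fixed-window multiplication. It is not the kernel code: the curve group is replaced by a toy group (integers modulo a prime under addition) so the result can be checked directly, and `M`, `G`, `build_table`, `mul_generator_windowed`, and `mul_generator_naive` are hypothetical names. The structure is what matters: 64 nibbles, each costing 4 doublings and 1 table addition, versus 256 doublings and up to 256 conditional additions bit by bit.

```rust
// Toy stand-in for the curve group (an assumption for illustration only):
// integers modulo a prime under addition, so k*G is just (k * G) mod M.
const M: u64 = 1_000_000_007;
const G: u64 = 123_456_789; // hypothetical "generator" of the toy group

fn add(p: u64, q: u64) -> u64 {
    (p + q) % M
}

fn double(p: u64) -> u64 {
    add(p, p)
}

/// Precompute i*G for i = 0..15, the role G_TABLE plays in the kernel.
fn build_table() -> [u64; 16] {
    let mut table = [0u64; 16]; // table[0] is the identity, 0*G
    for i in 1..16 {
        table[i] = add(table[i - 1], G);
    }
    table
}

/// One window step: four doublings, then one table addition.
fn window_step(acc: u64, nibble: u8, table: &[u64; 16]) -> u64 {
    let mut acc = acc;
    for _ in 0..4 {
        acc = double(acc);
    }
    add(acc, table[nibble as usize])
}

/// MSB-first fixed-window multiplication over a 32-byte big-endian scalar.
fn mul_generator_windowed(scalar_be: &[u8; 32], table: &[u64; 16]) -> u64 {
    let mut acc = 0u64; // identity element
    for &byte in scalar_be.iter() {
        acc = window_step(acc, byte >> 4, table); // high nibble first
        acc = window_step(acc, byte & 0x0f, table); // then low nibble
    }
    acc
}

/// Bit-by-bit double-and-add, kept only as a reference for comparison.
fn mul_generator_naive(scalar_be: &[u8; 32]) -> u64 {
    let mut acc = 0u64;
    for &byte in scalar_be.iter() {
        for bit in (0..8).rev() {
            acc = double(acc);
            if (byte >> bit) & 1 == 1 {
                acc = add(acc, G);
            }
        }
    }
    acc
}
```

Both functions should agree for every scalar; only the number of additions differs. On a real curve the table lookup would return a Jacobian point rather than an integer.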

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Faster GPU secp256k1 scalar multiplication using precomputed lookup tables for generator operations.
    • Added a utility to generate and verify the precomputed table used by GPU kernels.
  • Documentation

    • Added explanatory metadata and comments describing the table layout and generation/verification process.

@coderabbitai (bot) commented Jan 4, 2026

Walkthrough

Adds a precomputed G_TABLE constant storing 16 windowed multiples of the secp256k1 generator in Jacobian coordinates to the GPU kernel, and introduces a new Rust binary tool that generates this table by computing i·G for i = 0..15 and formatting the output for kernel integration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GPU Kernel: Generator Windowing (crates/erg-vanity-gpu/kernels/secp256k1_point.cl) | Added public constant G_TABLE[16][24] storing 16 precomputed multiples of generator G in Jacobian form (X, Y, Z each 8 limbs). Replaced the pt_mul_generator implementation with a 4-bit windowed, nibble-based algorithm: it converts the scalar to bytes, processes the 32 bytes MSB-first one nibble at a time, doubles the accumulator four times per nibble, and adds the selected precomputed point using ping-pong buffers. |
| Table Generation Tool (crates/erg-vanity-gpu/src/bin/gen_g_table.rs) | New Rust binary that generates G_TABLE by computing i·G (i = 0..15), converting coordinates to the kernel's 8-limb little-endian representation, and printing a C-style static initializer. Includes a bytes_to_limbs helper and validation against the kernel GX/GY constants. |
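The limb conversion described for the table generation tool can be sketched as follows. This is an illustration, not the contents of gen_g_table.rs: the body of `bytes_to_limbs` and the `format_row` helper are assumptions, and the exact initializer format the tool prints is not shown in this PR. `GX_BYTES` holds the standard secp256k1 generator x-coordinate in big-endian order.

```rust
/// Convert a 32-byte big-endian value into eight 32-bit limbs,
/// least-significant limb first (limb 0 comes from bytes 28..32).
fn bytes_to_limbs(be: &[u8; 32]) -> [u32; 8] {
    let mut limbs = [0u32; 8];
    for i in 0..8 {
        let start = 28 - 4 * i;
        limbs[i] = u32::from_be_bytes([
            be[start],
            be[start + 1],
            be[start + 2],
            be[start + 3],
        ]);
    }
    limbs
}

/// Format limbs as one row of a C-style initializer (hypothetical layout).
fn format_row(limbs: &[u32]) -> String {
    let cells: Vec<String> = limbs.iter().map(|l| format!("0x{:08x}", l)).collect();
    format!("{{ {} }},", cells.join(", "))
}

/// secp256k1 generator x-coordinate, big-endian.
const GX_BYTES: [u8; 32] = [
    0x79, 0xBE, 0x66, 0x7E, 0xF9, 0xDC, 0xBB, 0xAC,
    0x55, 0xA0, 0x62, 0x95, 0xCE, 0x87, 0x0B, 0x07,
    0x02, 0x9B, 0xFC, 0xDB, 0x2D, 0xCE, 0x28, 0xD9,
    0x59, 0xF2, 0x81, 0x5B, 0x16, 0xF8, 0x17, 0x98,
];
```

With this layout, `bytes_to_limbs(&GX_BYTES)` puts `0x16f81798` in limb 0 and `0x79be667e` in limb 7, which is the kind of check the tool's validation against GX/GY would perform.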

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through bits and nibbles bright,

Sixteen Gs I carried into night,
Limbs aligned, in Jacobian song,
Ping-pong adds make scalars strong,
A rabbit cheers — the table's right!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: implementing a 4-bit windowed scalar multiplication approach for the generator point, which is the core objective of the PR. |

Replace naive bit-by-bit double-and-add with MSB-first fixed-window
method using precomputed generator multiples.

Changes:
- Add G_TABLE[16][24] with precomputed 0*G through 15*G in Jacobian coords
- Replace pt_mul_generator() with windowed implementation
- Add gen_g_table binary to generate/verify table constants

Performance:
- Overall throughput: 276k → 311k addr/s (+12.6%)

All 30 tests pass including CPU/GPU consistency check.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@arkadianet force-pushed the feat/windowed-scalar-mul branch from 19ac415 to a419894 on January 4, 2026 23:41

@coderabbitai (bot) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/erg-vanity-gpu/kernels/secp256k1_point.cl (1)

317-340: Optional: Consider adding a comment to clarify the ping-pong buffer swap pattern.

The manual pointer swaps achieve zero-copy ping-ponging, which is excellent for performance. However, the swap logic (especially the difference between the loop swaps and the single braced swaps) could benefit from a brief comment explaining that after processing both nibbles, acc returns to its original buffer.

This is a minor readability suggestion—the implementation is correct and efficiently avoids unnecessary copying.

Example clarifying comment
     // Process 32 bytes MSB-first, high nibble then low nibble per byte
+    // Ping-pong buffers: after each nibble (4 doubles + 1 add), pointers swap
+    // After both nibbles per byte, acc returns to original buffer (2 total swaps)
     for (int byte_idx = 0; byte_idx < 32; byte_idx++) {
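The swap pattern can also be illustrated outside the kernel. The Rust sketch below assumes one swap after each of the five operations per nibble (4 doubles + 1 add), which is an assumption about the kernel rather than something shown in this PR; `fake_double`, `fake_add`, and `mul_toy` are hypothetical stand-ins that operate on plain integers instead of curve points. With five swaps per nibble the buffers exchange roles after one nibble and return to their original roles after two.

```rust
const LIMBS: usize = 24; // one Jacobian point: X, Y, Z of 8 limbs each

// Hypothetical stand-in for point doubling: read `src`, write `dst`.
fn fake_double(src: &[u32; LIMBS], dst: &mut [u32; LIMBS]) {
    for i in 0..LIMBS {
        dst[i] = src[i].wrapping_mul(2);
    }
}

// Hypothetical stand-in for adding a table entry: read `src`, write `dst`.
fn fake_add(src: &[u32; LIMBS], addend: u32, dst: &mut [u32; LIMBS]) {
    for i in 0..LIMBS {
        dst[i] = src[i].wrapping_add(addend);
    }
}

/// Returns limb 0 of the result and whether `acc` ended up in its
/// original buffer after processing whole bytes.
fn mul_toy(scalar: &[u8]) -> (u32, bool) {
    let mut buf_a = [0u32; LIMBS];
    let mut buf_b = [0u32; LIMBS];
    let a_ptr: *const [u32; LIMBS] = &buf_a;
    let mut acc = &mut buf_a; // current accumulator
    let mut tmp = &mut buf_b; // scratch output
    for &byte in scalar {
        for nibble in [byte >> 4, byte & 0x0f] {
            for _ in 0..4 {
                fake_double(&*acc, &mut *tmp);
                std::mem::swap(&mut acc, &mut tmp); // result is the new acc
            }
            fake_add(&*acc, nibble as u32, &mut *tmp);
            std::mem::swap(&mut acc, &mut tmp);
        }
    }
    (acc[0], std::ptr::eq(&*acc, a_ptr))
}
```

Only the references are swapped, never the 24-limb contents, which is the zero-copy property the nitpick refers to. In the toy arithmetic each nibble step computes `acc * 16 + nibble`, so the result equals the scalar itself modulo 2^32.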
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 19ac415 and a419894.

📒 Files selected for processing (2)
  • crates/erg-vanity-gpu/kernels/secp256k1_point.cl
  • crates/erg-vanity-gpu/src/bin/gen_g_table.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/erg-vanity-gpu/src/bin/gen_g_table.rs
🔇 Additional comments (2)
crates/erg-vanity-gpu/kernels/secp256k1_point.cl (2)

25-110: G_TABLE[1] (generator point) correctly matches GX_BYTES and GY_BYTES constants.

Verified that the precomputed entry for 1*G matches the generator coordinates when converted from big-endian bytes to little-endian 32-bit limbs, confirming correct encoding in the table. The Z-coordinate is correctly set to 1 (affine point in Jacobian representation). The gen_g_table.rs tool exists and was used to generate the table as documented.


308-344: The byte ordering assumption is correct. The sc_to_bytes function in secp256k1_scalar.cl (lines 159-167) explicitly produces big-endian bytes by mapping limb 0 (LSB) to bytes[28-31] and limb 7 (MSB) to bytes[0-3]. This means pt_mul_generator correctly processes the scalar MSB-first by iterating from byte index 0 to 31, and the windowed multiplication algorithm computes the intended scalar value.
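The byte ordering described above can be sketched in Rust for illustration. The function below mirrors the stated mapping (limb 0 to bytes[28-31], limb 7 to bytes[0-3]) but is not the OpenCL source, and `nibbles_msb_first` is a hypothetical helper showing the order in which a windowed loop would consume the scalar.

```rust
/// Serialize eight 32-bit limbs (limb 0 least significant) as 32 big-endian
/// bytes: limb 0 lands in bytes[28..32], limb 7 in bytes[0..4].
fn sc_to_bytes(limbs: &[u32; 8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..8 {
        let start = 28 - 4 * i;
        out[start..start + 4].copy_from_slice(&limbs[i].to_be_bytes());
    }
    out
}

/// Split the bytes into 64 nibbles, most significant first
/// (high nibble of byte 0, low nibble of byte 0, and so on).
fn nibbles_msb_first(bytes: &[u8; 32]) -> [u8; 64] {
    let mut out = [0u8; 64];
    for (i, &b) in bytes.iter().enumerate() {
        out[2 * i] = b >> 4;
        out[2 * i + 1] = b & 0x0f;
    }
    out
}
```

Folding the nibbles back with `acc * 16 + nibble` reproduces the scalar, which is why iterating byte index 0 to 31 gives an MSB-first traversal.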

@arkadianet arkadianet merged commit 1d9896c into main Jan 4, 2026
5 checks passed
@arkadianet arkadianet deleted the feat/windowed-scalar-mul branch January 4, 2026 23:48