
feat(gpu): add NVIDIA OpenCL compile diagnostics via ERG_CL_VERBOSE #9

Merged
arkadianet merged 1 commit into main from feature/cl-verbose-diagnostics on Jan 5, 2026

Conversation

@arkadianet (Owner) commented Jan 5, 2026

Set ERG_CL_VERBOSE=1 to enable the -cl-nv-verbose flag for NVIDIA devices. The kernel build then prints ptxas info, including register usage, spills, and occupancy hints.

Useful for profiling kernel register pressure before optimization.
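A minimal sketch of the opt-in check, assuming the variable counts as enabled only when it is exactly `1` (the value used in this description). The helper names `verbose_requested` and `verbose_from_env` are hypothetical; the actual parsing in `kernel.rs` is not shown in this thread.

```rust
use std::env;

/// Hypothetical helper: decides whether verbose diagnostics were requested.
/// Assumption: only the literal value "1" enables the mode; unset or any
/// other value is treated as "off".
fn verbose_requested(value: Option<&str>) -> bool {
    matches!(value, Some("1"))
}

/// Reads ERG_CL_VERBOSE from the process environment. A missing or
/// non-Unicode value falls back to `false`.
fn verbose_from_env() -> bool {
    let value = env::var("ERG_CL_VERBOSE").ok();
    verbose_requested(value.as_deref())
}

fn main() {
    println!("ERG_CL_VERBOSE enabled: {}", verbose_from_env());
}
```

Keeping the decision in a pure function lets it be tested without mutating the process environment.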

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added conditional verbose diagnostics for NVIDIA GPU kernels, activated by setting the ERG_CL_VERBOSE environment variable (ERG_CL_VERBOSE=1).
    • Enhanced kernel compilation debugging: when verbose mode is enabled on NVIDIA hardware, the OpenCL build log is printed, providing diagnostic information for troubleshooting compilation issues.


Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

Walkthrough

Adds conditional OpenCL verbose diagnostics for NVIDIA GPUs controlled by the ERG_CL_VERBOSE environment variable. The kernel builder now dynamically constructs compiler options and emits build logs when verbose mode is enabled.

Changes

| Cohort / File(s) | Summary |
|---|---|
| NVIDIA Verbose Diagnostics: crates/erg-vanity-gpu/src/kernel.rs | Dynamic compiler option construction based on NVIDIA vendor detection and the ERG_CL_VERBOSE flag; post-build logging of OpenCL diagnostics when verbose mode is active |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A GPU kernel now speaks with care,
When NVIDIA drivers heed the air—
With verbose logs and flags so fine,
The diagnostics dance and align!
Debugging dreams, now crystal-clear,
For every compute we hold dear. 🌟

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding NVIDIA OpenCL compile diagnostics controlled by the ERG_CL_VERBOSE environment variable. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f1c9ce and a24dfc5.

📒 Files selected for processing (1)
  • crates/erg-vanity-gpu/src/kernel.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (3)
crates/erg-vanity-gpu/src/kernel.rs (3)

53-55: Excellent documentation for the diagnostic feature.

The doc comments clearly explain how to enable NVIDIA verbose mode and what diagnostic information it provides. This helps users understand the feature without needing to read the implementation.


74-84: Robust build log retrieval and output.

The build log handling is well-implemented:

  • Safe error handling with if let Ok pattern
  • Checks for non-empty content before printing to avoid noise
  • Appropriate use of stderr for diagnostic messages
  • Clear formatting with the [diag] prefix
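The log-handling pattern listed above can be sketched in plain Rust. This is an illustration, not the code from `kernel.rs`: `format_build_log` and `emit_build_log` are hypothetical names, the `Result<String, E>` argument stands in for whatever the OpenCL binding's build-log query returns, and prefixing every line with `[diag]` is an assumption.

```rust
/// Hypothetical helper: turns a raw OpenCL build log into prefixed
/// diagnostic text, or `None` if the log is empty or whitespace-only.
/// Assumption: every line gets the `[diag]` prefix.
fn format_build_log(log: &str) -> Option<String> {
    let trimmed = log.trim();
    if trimmed.is_empty() {
        return None; // nothing useful to print, so stay quiet
    }
    let lines: Vec<String> = trimmed
        .lines()
        .map(|line| format!("[diag] {line}"))
        .collect();
    Some(lines.join("\n"))
}

/// Prints only a successful log query; errors are silently ignored,
/// and output goes to stderr so it does not mix with normal results.
fn emit_build_log<E>(log: Result<String, E>) {
    if let Ok(text) = log {
        if let Some(formatted) = format_build_log(&text) {
            eprintln!("{formatted}");
        }
    }
}

fn main() {
    // Stand-in for a real build-log query; the text is illustrative only.
    let fake_log: Result<String, ()> = Ok("example build log line\n".to_string());
    emit_build_log(fake_log);
}
```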

The ptxas information (register usage, spills, occupancy hints) will be valuable for kernel optimization as intended.
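To act on that output, a small scanner can pull the worst-case register count and spill-store size out of the log. This sketch assumes ptxas-style lines such as `Used 40 registers, ...` and `... 8 bytes spill stores, ...`; the exact wording may differ between driver versions, and `scan_ptxas` and `RegisterReport` are hypothetical names.

```rust
/// Worst-case register pressure found in a build log.
#[derive(Debug, Default, PartialEq)]
struct RegisterReport {
    max_registers: u32,
    max_spill_store_bytes: u32,
}

/// Hypothetical scanner for ptxas-style lines in an NVIDIA build log.
/// Assumed formats: "... Used <N> registers, ..." and
/// "... <N> bytes spill stores, ...".
fn scan_ptxas(log: &str) -> RegisterReport {
    let mut report = RegisterReport::default();
    for line in log.lines() {
        // Tokenize on whitespace and drop trailing commas ("registers," -> "registers").
        let tokens: Vec<&str> = line
            .split_whitespace()
            .map(|t| t.trim_end_matches(','))
            .collect();
        for i in 1..tokens.len() {
            // "<N> registers": keep the largest count seen across kernels.
            if tokens[i] == "registers" {
                if let Ok(n) = tokens[i - 1].parse::<u32>() {
                    report.max_registers = report.max_registers.max(n);
                }
            }
            // "<N> bytes spill stores": a non-zero value means registers spilled.
            if i >= 2
                && tokens[i] == "spill"
                && tokens[i - 1] == "bytes"
                && tokens.get(i + 1).copied() == Some("stores")
            {
                if let Ok(n) = tokens[i - 2].parse::<u32>() {
                    report.max_spill_store_bytes = report.max_spill_store_bytes.max(n);
                }
            }
        }
    }
    report
}

fn main() {
    // Illustrative input written in the assumed format, not captured driver output.
    let log = "ptxas info : Used 40 registers, 348 bytes cmem[0]\n\
               0 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads";
    println!("{:?}", scan_ptxas(log));
}
```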


57-66: Implementation correctly handles NVIDIA-specific compiler options.

The code properly:

  • Detects NVIDIA devices using case-insensitive vendor string matching
  • Parses the ERG_CL_VERBOSE environment variable with a safe default (unwrap_or(false))
  • Constructs compiler options dynamically while preserving the base -cl-std=CL1.2 flag
  • Gates the NVIDIA-specific -cl-nv-verbose flag appropriately with both vendor and verbose checks
  • Includes clear documentation in the function's doc comment explaining the feature and flag behavior

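The -cl-nv-verbose flag is a valid NVIDIA vendor compiler option (part of the cl_nv_compiler_options extension) that enables verbose compiler output (register usage, optimization details, occupancy hints). That output is reported through the program's build log, which is why the log is printed after the build.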
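The option-building logic described above can be sketched as two small functions. This is a hedged illustration: `is_nvidia` and `compiler_options` are hypothetical names, and the case-insensitive substring match on the device vendor string is assumed from the review's description rather than copied from `kernel.rs`.

```rust
/// Hypothetical vendor check: case-insensitive substring match on the
/// vendor string reported by the device (e.g. "NVIDIA Corporation").
fn is_nvidia(vendor: &str) -> bool {
    vendor.to_ascii_lowercase().contains("nvidia")
}

/// Always keeps the base `-cl-std=CL1.2` flag and appends `-cl-nv-verbose`
/// only when the device is NVIDIA *and* verbose mode was requested.
fn compiler_options(vendor: &str, verbose: bool) -> String {
    let mut opts = String::from("-cl-std=CL1.2");
    if verbose && is_nvidia(vendor) {
        opts.push_str(" -cl-nv-verbose");
    }
    opts
}

fn main() {
    for vendor in ["NVIDIA Corporation", "Advanced Micro Devices, Inc."] {
        println!("{vendor}: {}", compiler_options(vendor, true));
    }
}
```

Gating on the vendor as well as the variable keeps non-NVIDIA drivers from ever receiving `-cl-nv-verbose`; an implementation that does not recognise a build option may fail the build with `CL_INVALID_BUILD_OPTIONS`.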



arkadianet merged commit 26ff8d2 into main on Jan 5, 2026
5 checks passed
arkadianet deleted the feature/cl-verbose-diagnostics branch on January 5, 2026 at 02:25

Labels

None yet

Projects

None yet


1 participant