cudev: fix CUDA texture pitch alignment for createContinuous GpuMat #4068
base: 4.x
Pull request overview
This PR fixes a CUDA texture creation failure when using cv::cuda::createContinuous() with texture-backed operations. The issue occurs because createContinuous() can produce a GpuMat with a pitch that is not aligned to cudaDeviceProp::texturePitchAlignment, causing texture creation to fail.
Changes:
- Extended the fallback logic in texture creation to check pitch alignment in addition to single row/column cases
- When pitch is misaligned, data is copied into an aligned pitched buffer allocated via cudaMallocPitch
- Reformatted multi-line function calls for better readability
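The extended fallback condition can be sketched as a small host-side predicate. This is only an illustration: `needsPitchedCopy` is a hypothetical name that does not appear in the PR, and the single-row/single-column check is assumed from the existing code comment rather than copied from the source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of OpenCV): decides whether the texture should
// be backed by a freshly allocated pitched copy instead of the original data.
inline bool needsPitchedCopy(int rows, int cols, std::size_t step,
                             std::size_t texturePitchAlignment)
{
    assert(texturePitchAlignment > 0);
    // Assumed pre-existing condition: single-row or single-column matrices.
    const bool singleRowOrColumn = (rows == 1 || cols == 1);
    // Condition added by this PR: the row pitch in bytes is not a multiple
    // of the device's texture pitch alignment.
    const bool misalignedPitch = (step % texturePitchAlignment) != 0;
    return singleRowOrColumn || misalignedPitch;
}
```

In the real code path the alignment would come from cudaDeviceProp::texturePitchAlignment for the current device; the value is passed in here so the check can be exercised without a GPU.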
    cudaDeviceProp prop;
    CV_CUDEV_SAFE_CALL(cudaGetDeviceProperties(&prop, currentDevice));
Copilot AI, Jan 23, 2026
Performance consideration: cudaGetDevice and cudaGetDeviceProperties are called every time this function is invoked. For frequently created textures, this could add overhead. Consider caching the device properties or checking if there's a pattern in the codebase for reusing device properties. However, this is a minor concern since texture creation is typically not done in a tight loop, and the fix is necessary for correctness.
Suggested change:
    - cudaDeviceProp prop;
    - CV_CUDEV_SAFE_CALL(cudaGetDeviceProperties(&prop, currentDevice));
    + // Cache device properties per device to avoid repeated queries.
    + static int cachedDevice = -1;
    + static cudaDeviceProp cachedProp;
    + static bool propInitialized = false;
    + if (!propInitialized || currentDevice != cachedDevice)
    + {
    +     CV_CUDEV_SAFE_CALL(cudaGetDeviceProperties(&cachedProp, currentDevice));
    +     cachedDevice = currentDevice;
    +     propInitialized = true;
    + }
    + const cudaDeviceProp& prop = cachedProp;
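One caveat with the suggestion above: function-local statics hold a single device at a time and are updated without synchronization, which matters if textures can be created from several threads or for alternating devices. A minimal sketch of a per-device cache guarded by a mutex follows. `DeviceProps`, `DevicePropsCache`, and the injected query callback are hypothetical stand-ins so the example compiles without CUDA; in real code the callback would wrap cudaGetDeviceProperties.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>

// Stand-in for the one cudaDeviceProp field this code path needs (assumption).
struct DeviceProps { std::size_t texturePitchAlignment; };

// Hypothetical per-device cache: queries each device at most once.
class DevicePropsCache
{
public:
    using Query = std::function<DeviceProps(int)>;

    explicit DevicePropsCache(Query query) : query_(std::move(query))
    {
        assert(query_ && "query callback must be callable");
    }

    DeviceProps get(int device)
    {
        // The lock protects both the lookup and the insertion.
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(device);
        if (it == cache_.end())
            it = cache_.emplace(device, query_(device)).first;
        return it->second;
    }

private:
    Query query_;
    std::mutex mutex_;
    std::unordered_map<int, DeviceProps> cache_;
};
```

Whether the extra machinery is worth it depends on how often textures are created; as the review comment notes, the uncached query may be acceptable outside tight loops.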
    @@ -176,17 +176,28 @@ namespace cv { namespace cudev {
        texRes.res.pitch2D.height = rows;
        texRes.res.pitch2D.width = cols;
        // temporary fix for single row/columns until TexturePtr is reworked
Copilot AI, Jan 23, 2026
The comment "temporary fix for single row/columns until TexturePtr is reworked" is now outdated. The fix now also handles pitch alignment issues, not just single row/column cases. Consider updating the comment to reflect the expanded scope, such as: "temporary fix for single row/columns and misaligned pitch until TexturePtr is reworked".
Suggested change:
    - // temporary fix for single row/columns until TexturePtr is reworked
    + // temporary fix for single row/columns and misaligned pitch until TexturePtr is reworked
    }
    else {
        texRes.res.pitch2D.devPtr = data;
        texRes.res.pitch2D.pitchInBytes = step;
    }
Copilot AI, Jan 23, 2026
Inconsistent indentation: the closing brace and else statement use incorrect indentation. They should align with the opening brace on line 187, using 12 spaces instead of 11.
Suggested change (indentation only):
    - }
    - else {
    - texRes.res.pitch2D.devPtr = data;
    - texRes.res.pitch2D.pitchInBytes = step;
    - }
    + }
    + else {
    + texRes.res.pitch2D.devPtr = data;
    + texRes.res.pitch2D.pitchInBytes = step;
    + }
Summary
This PR fixes a CUDA texture creation failure when using cv::cuda::createContinuous(). createContinuous() may produce a GpuMat with a pitch that is not aligned to cudaDeviceProp::texturePitchAlignment. When such a matrix is used in CUDA texture-backed operations (e.g. cv::cuda::resize), createTextureObject() fails with an invalid argument error.

The fix extends the existing fallback logic to also handle misaligned pitch values by copying the data into an aligned pitched buffer when required.
Root Cause
createContinuous() allocates memory via cudaMalloc and reshapes it, which can result in an unaligned step. CUDA Pitch2D textures require the pitch to be aligned to texturePitchAlignment, but the previous logic only handled single-row or single-column cases.
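To make the root cause concrete: a continuous matrix has a step of cols × element size, so 100 columns of 3-byte pixels give a step of 300 bytes, which is not a multiple of 32. The sketch below works through that arithmetic and mimics the repacking into a padded buffer on the host. The alignment of 32 is an illustrative assumption (the real value is device-specific), `alignUp` and `repackRows` are hypothetical helpers, and on the device the pitch is chosen by cudaMallocPitch itself rather than computed by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Round a row width in bytes up to the next multiple of `alignment`.
inline std::size_t alignUp(std::size_t widthBytes, std::size_t alignment)
{
    assert(alignment > 0);
    return (widthBytes + alignment - 1) / alignment * alignment;
}

// Host-side analogue of a 2D pitched copy: copies `rows` rows of `rowBytes`
// bytes from a buffer with stride `srcStep` into a new zero-filled buffer
// with stride `dstStep`.
inline std::vector<unsigned char> repackRows(const unsigned char* src,
                                             std::size_t srcStep,
                                             std::size_t rowBytes,
                                             std::size_t rows,
                                             std::size_t dstStep)
{
    assert(srcStep >= rowBytes && dstStep >= rowBytes);
    std::vector<unsigned char> dst(dstStep * rows, 0);
    for (std::size_t r = 0; r < rows; ++r)
        std::memcpy(dst.data() + r * dstStep, src + r * srcStep, rowBytes);
    return dst;
}
```

With these helpers, `alignUp(300, 32)` yields 320, so each repacked row carries 20 bytes of padding that the texture never reads as pixel data.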
Changes
... GpuMat instances
Testing
... cv::cuda::resize ... GpuMat allocations