Fix color artifacts in PointCloud projection due to CUDA race condition #7424

Open
ManuCorrea wants to merge 5 commits into isl-org:main from ManuCorrea:fix-cuda-race-condition

@ManuCorrea ManuCorrea commented Jan 31, 2026

Type

  • Bug fix (non-breaking change which fixes an issue): Fixes #
  • New feature (non-breaking change which adds functionality). Resolves #
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) Resolves #

Motivation and Context

This PR fixes the color artifacts described in #6612, which are caused by a race condition in the CUDA code.

The existing ProjectCUDA implementation has a race condition in its color assignment pass. Multiple threads whose depths are similar (within the precision_bound) can pass the depth test at the same time and all write to the RGB color buffer for the same pixel, which produces the artifacts shown in the issue.
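The failure mode can be illustrated with a small host-side C++ sketch. It is not the Open3D kernel: the exact form of the depth test (`PassesDepthTest`) and the helper `CountColorWriters` are hypothetical names chosen for illustration, and the comparison against `precision_bound` is an assumption based on the description above. The sketch only counts how many candidate points for one pixel would be allowed to write a color.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical depth test (assumption, not the Open3D code): a point may write
// its color if its depth is within precision_bound of the pixel's minimum depth.
inline bool PassesDepthTest(float depth, float min_depth, float precision_bound) {
    return std::fabs(depth - min_depth) < precision_bound;
}

// Counts how many candidate points for a single pixel pass the depth test.
// Any count above 1 means several threads would write the same pixel, so the
// final color depends on which write lands last.
inline std::size_t CountColorWriters(const std::vector<float>& depths,
                                     float min_depth,
                                     float precision_bound) {
    assert(precision_bound > 0.0f);
    std::size_t writers = 0;
    for (float d : depths) {
        if (PassesDepthTest(d, min_depth, precision_bound)) {
            ++writers;
        }
    }
    return writers;
}
```

With depths `{1.0f, 1.0004f, 2.0f}`, a minimum depth of `1.0f`, and a bound of `0.001f`, two points pass the test, so two threads would compete for the same pixel.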

Checklist:

  • I have run python util/check_style.py --apply to apply Open3D code style
    to my code.
  • This PR changes Open3D behavior or adds new functionality.
    • Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is
      updated accordingly.
    • I have added or updated C++ and / or Python unit tests OR included test
      results
      (e.g. screenshots or numbers) here.
  • I will follow up and update the code if CI fails.
  • For fork PRs, I have selected Allow edits from maintainers.

Description

The fix adds an index buffer: pass 1 records which worker (thread) owns each pixel, and pass 2 lets only that worker write to the color tensor.

Pass 1 (Depth & Index Acquisition):

  • Packs the depth (the raw bits of the 32-bit float) into the high 32 bits and the point index (32-bit workload_idx) into the low 32 bits of a 64-bit value.
  • Uses atomicMin on the 64-bit value. This selects a unique worker for every pixel: the one with the minimum depth, with ties broken by the lower index.
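The packing step can be sketched on the host in C++. This is an illustration under stated assumptions, not the code from this PR: `PackDepthIndex` and `AtomicMin` are hypothetical names, and `AtomicMin` is a `std::atomic` compare-and-swap loop standing in for CUDA's 64-bit `atomicMin`. It also assumes depths are non-negative, since only then does the unsigned ordering of IEEE 754 float bits match the numeric ordering of the depths.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Packs the depth bits into the high 32 bits and the point index into the
// low 32 bits. Assumes a non-negative depth so that comparing packed keys as
// unsigned integers orders them by depth first, then by index.
inline std::uint64_t PackDepthIndex(float depth, std::uint32_t workload_idx) {
    assert(depth >= 0.0f && !std::signbit(depth));
    std::uint32_t depth_bits = 0;
    std::memcpy(&depth_bits, &depth, sizeof(depth_bits));  // bit-exact copy
    return (static_cast<std::uint64_t>(depth_bits) << 32) | workload_idx;
}

// Host-side stand-in for a 64-bit atomic minimum: keeps the smaller of the
// stored value and the candidate, retrying if another writer got in between.
inline void AtomicMin(std::atomic<std::uint64_t>& slot, std::uint64_t value) {
    std::uint64_t current = slot.load();
    while (value < current && !slot.compare_exchange_weak(current, value)) {
        // compare_exchange_weak reloads `current` on failure; loop re-checks.
    }
}
```

Initializing each pixel's slot to the maximum 64-bit value and calling `AtomicMin` once per candidate point leaves exactly one packed key per pixel, regardless of the order in which the candidates arrive.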

Pass 2 (Deterministic Color Assignment):

  • Each thread reads the 64-bit winner from the index buffer.
  • Applying a 32-bit mask (& 0xFFFFFFFF) to that value extracts the winning point index, which serves as the unique ID.
  • Only the thread whose workload_idx matches the ID writes to the color tensor.
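The second pass can be sketched the same way. The code below is a sequential host-side stand-in for one pixel, not the kernel from this PR: `Rgb`, `WinnerIndex`, and `AssignColor` are hypothetical names, and the loop over `candidate_ids` plays the role of the threads that map to that pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

// Extracts the winning point index from the low 32 bits of the packed value.
inline std::uint32_t WinnerIndex(std::uint64_t packed) {
    return static_cast<std::uint32_t>(packed & 0xFFFFFFFFull);
}

// Every candidate compares its own workload_idx with the winner stored for
// the pixel; only the matching candidate writes. Returns the number of writes
// performed, which is at most one when the candidate indices are unique.
inline std::size_t AssignColor(std::uint64_t packed_winner,
                               const std::vector<std::uint32_t>& candidate_ids,
                               const std::vector<Rgb>& point_colors,
                               Rgb& pixel) {
    std::size_t writes = 0;
    for (std::uint32_t workload_idx : candidate_ids) {
        if (workload_idx != WinnerIndex(packed_winner)) {
            continue;  // lost the depth/index comparison in pass 1
        }
        assert(workload_idx < point_colors.size());
        pixel = point_colors[workload_idx];
        ++writes;
    }
    return writes;
}
```

Because the comparison is an exact match on the index rather than a tolerance test on depth, the outcome no longer depends on the order in which candidates are processed.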

Old graph from the issue:
[image: 297903988-9b289072-1d87-40d0-807e-858a10e6702e]

My results:
[image: CPUvsCUDA_correct]

update-docs bot commented Jan 31, 2026

Thanks for submitting this pull request! The maintainers of this repository would appreciate it if you could update the CHANGELOG.md based on your changes.

@ManuCorrea (Author)

I will add tests and update CHANGELOG.md in the coming days.
