Add CUDA neighbor loop support by AhmedSalih3d · Pull Request #135 · AhmedSalih3d/SPHExample

AhmedSalih3d · 2026-01-14T23:45:28Z

Enable the neighbor traversal hot-loop to run on CUDA GPUs for simulations with GPU-backed particle arrays.
Provide GPU-friendly neighbor data structures by flattening per-cell neighbor lists and recording per-particle cell indices to permit contiguous GPU traversal.
Make the GPU path optional and selectable via metadata so that CPU behaviour is unchanged by default.
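
To illustrate the flattening idea, here is a minimal sketch of turning per-cell neighbor lists into one contiguous index array plus an offsets array (a CSR-style layout), with a per-particle cell index used for lookup. The names `flatten_neighbor_lists`, `neighbors_of`, and `particle_cell` are hypothetical and chosen for illustration; the actual layout produced by FlattenNeighborCellLists! may differ.

```julia
# Hypothetical helper (not the PR's FlattenNeighborCellLists!): flatten a
# vector of per-cell neighbor lists into CSR-style offsets and indices.
function flatten_neighbor_lists(lists::Vector{Vector{Int}})
    # offsets[c] is the position in `indices` where cell c's neighbors start;
    # offsets[end] is one past the last stored neighbor.
    offsets = Vector{Int}(undef, length(lists) + 1)
    offsets[1] = 1
    for (c, neighbors) in enumerate(lists)
        offsets[c + 1] = offsets[c] + length(neighbors)
    end

    # Copy every per-cell list into one contiguous array.
    indices = Vector{Int}(undef, offsets[end] - 1)
    for (c, neighbors) in enumerate(lists)
        for (k, n) in enumerate(neighbors)
            indices[offsets[c] + k - 1] = n
        end
    end
    return offsets, indices
end

# Neighbors of cell c as a contiguous view into the flat array.
neighbors_of(offsets, indices, c) = view(indices, offsets[c]:(offsets[c + 1] - 1))

# Three cells; cell 2 has no neighbors.
lists = [[1, 2], Int[], [2, 3, 1]]
offsets, indices = flatten_neighbor_lists(lists)

# Assumed per-particle cell index: particle p lives in cell particle_cell[p].
particle_cell = [3, 1, 1]
for p in eachindex(particle_cell)
    c = particle_cell[p]
    println("particle ", p, " -> neighbor cells ", collect(neighbors_of(offsets, indices, c)))
end
```

Two flat integer arrays can be copied to the device in one transfer each, and each GPU thread only needs its particle's cell index and two offsets to find its neighbor range.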

Added FlattenNeighborCellLists! and updated exports in src/SPHNeighborList.jl to produce flattened neighbor offsets/indices for GPU traversal.
Extended UpdateNeighbors! to optionally fill a CellListIndices array (per-particle cell index) and updated BuildNeighborCellLists! usage in src/SPHCellList.jl to call FlattenNeighborCellLists!.
Added UseCUDA::Bool to SimulationMetaData in src/SimulationMetaDataConfiguration.jl to toggle the CUDA path.
Introduced runtime gating (ShouldUseCUDANeighborLoop) together with CUDA kernel entry points and device kernels in src/SPHCellList.jl: several NeighborLoopPerParticleCUDA! overloads and NeighborLoopKernel* functions cover the different metadata, kernel, output, and shifting combinations. The original threaded CPU loops remain and are used when CUDA is unavailable or disabled.
Refactored per-interaction call sites to pass a small ParticleFields view and a SimMetaDataType type, giving the uniform kernel and call signatures that the CUDA wrappers require.
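
The runtime gating can be pictured with the sketch below. It is an assumption about the shape of the check, not the body of ShouldUseCUDANeighborLoop: `MetaSketch`, `should_use_cuda`, and `select_neighbor_loop` are hypothetical names, and the two keyword flags stand in for runtime checks (for example `CUDA.functional()` from CUDA.jl, and a test that the particle arrays are GPU-backed) so that the example runs without a GPU.

```julia
# Hypothetical stand-in for SimulationMetaData; only the UseCUDA flag
# described in this PR is modelled.
struct MetaSketch
    UseCUDA::Bool
end

# Assumed gating rule: take the CUDA path only when the user opted in,
# a working CUDA runtime is present, and the particle arrays live on the GPU.
should_use_cuda(meta::MetaSketch; cuda_functional::Bool, arrays_on_gpu::Bool) =
    meta.UseCUDA && cuda_functional && arrays_on_gpu

# Pick a code path; the symbols stand in for the CUDA kernel launch and
# the original threaded CPU loop.
function select_neighbor_loop(meta::MetaSketch; cuda_functional::Bool, arrays_on_gpu::Bool)
    if should_use_cuda(meta; cuda_functional = cuda_functional, arrays_on_gpu = arrays_on_gpu)
        return :cuda
    else
        return :cpu
    end
end

# With UseCUDA = false the CPU path is always taken, whatever the hardware.
println(select_neighbor_loop(MetaSketch(false); cuda_functional = true, arrays_on_gpu = true))
println(select_neighbor_loop(MetaSketch(true); cuda_functional = true, arrays_on_gpu = true))
```

Keeping the decision in one predicate means every call site falls back to the CPU loop in the same way when any condition fails.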

No automated tests were executed for this change (no test run requested).
Commands executed while preparing this PR included repository inspection and edits such as ls, rg, sed, multiple apply_patch actions, and git commit -m "Add CUDA neighbor loop support".

Add CUDA neighbor loop support

66478f3

AhmedSalih3d added the codex label Jan 14, 2026 — with ChatGPT Codex Connector
