Add CUDA neighbor loop support #135

Open
AhmedSalih3d wants to merge 1 commit into codex/optimize-neighborlist-code-performance from codex/run-neighbor-loop-on-cuda

@AhmedSalih3d
Owner

Motivation

  • Enable the neighbor traversal hot-loop to run on CUDA GPUs for simulations with GPU-backed particle arrays.
  • Provide GPU-friendly neighbor data structures by flattening per-cell neighbor lists and recording per-particle cell indices to permit contiguous GPU traversal.
  • Make the GPU path optional and selectable via metadata, so that CPU behaviour is unchanged by default.
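
To illustrate the flattening idea, here is a minimal CPU-only sketch of how per-cell neighbor lists could be packed into contiguous offset/index arrays and then traversed per particle. It is not the code in this PR: `flatten_neighbor_lists`, `count_candidates`, and the toy data are hypothetical names introduced for illustration, and the offsets/indices layout is an assumption based on the description above.

```julia
# Hypothetical sketch, NOT the PR's FlattenNeighborCellLists!.
# Packs a vector of per-cell neighbor lists into two flat arrays:
#   offsets[c] .. offsets[c+1]-1  is the slice of `indices` owned by cell c.
function flatten_neighbor_lists(neighbors::Vector{Vector{Int}})
    ncells = length(neighbors)
    offsets = Vector{Int}(undef, ncells + 1)
    offsets[1] = 1
    for c in 1:ncells
        offsets[c + 1] = offsets[c] + length(neighbors[c])
    end
    indices = Vector{Int}(undef, offsets[end] - 1)
    pos = 1
    for c in 1:ncells, n in neighbors[c]
        indices[pos] = n
        pos += 1
    end
    return offsets, indices
end

# Hypothetical per-particle traversal: each iteration of the outer loop is the
# work one GPU thread would do. `cell_of[i]` is the cell index of particle i
# (the role the description assigns to the per-particle cell index array).
function count_candidates(cell_of, offsets, indices, cell_counts)
    out = zeros(Int, length(cell_of))
    for i in eachindex(cell_of)
        c = cell_of[i]
        for k in offsets[c]:(offsets[c + 1] - 1)
            out[i] += cell_counts[indices[k]]
        end
    end
    return out
end

# Toy data: 3 cells in a row, each listing itself and its adjacent cells.
per_cell    = [[1, 2], [1, 2, 3], [2, 3]]
cell_of     = [1, 1, 3]   # particles 1 and 2 sit in cell 1, particle 3 in cell 3
cell_counts = [2, 0, 1]   # number of particles stored in each cell

offsets, indices = flatten_neighbor_lists(per_cell)
candidates = count_candidates(cell_of, offsets, indices, cell_counts)
```

Because the flat arrays contain no nested containers, they can be copied to a device array as-is, which is presumably what makes this layout attractive for a GPU kernel.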

Description

  • Added FlattenNeighborCellLists! and updated exports in src/SPHNeighborList.jl to produce flattened neighbor offsets/indices for GPU traversal.
  • Extended UpdateNeighbors! to optionally fill a CellListIndices array (per-particle cell index) and updated BuildNeighborCellLists! usage in src/SPHCellList.jl to call FlattenNeighborCellLists!.
  • Added UseCUDA::Bool to SimulationMetaData in src/SimulationMetaDataConfiguration.jl to toggle the CUDA path.
  • Introduced runtime gating (ShouldUseCUDANeighborLoop) together with CUDA kernel entry points and device kernels in src/SPHCellList.jl (several NeighborLoopPerParticleCUDA! overloads and NeighborLoopKernel* functions) to cover the different metadata/kernel/output/shifting combinations. The original threaded CPU loops remain and are used when CUDA is unavailable or disabled.
  • Refactored per-interaction callsites to pass a small ParticleFields view and a SimMetaDataType type for more uniform kernel/call signatures required by the CUDA wrappers.
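
The opt-in gating described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's ShouldUseCUDANeighborLoop: `MetaSketch`, `should_use_cuda`, and `select_neighbor_path` are hypothetical names, and only the `UseCUDA::Bool` field comes from the description. The two boolean arguments stand in for whatever runtime checks the real code performs (for example, a query such as `CUDA.functional()` and a check that the particle arrays live on the device).

```julia
# Hypothetical stand-in for SimulationMetaData; only UseCUDA is taken from the PR text.
Base.@kwdef struct MetaSketch
    UseCUDA::Bool = false   # default keeps the CPU path, matching the stated motivation
end

# Illustrative gate: the GPU path is taken only when the user opted in,
# a working CUDA backend is reported, and the data is GPU-backed.
should_use_cuda(meta::MetaSketch, backend_ok::Bool, arrays_on_device::Bool) =
    meta.UseCUDA && backend_ok && arrays_on_device

# Illustrative dispatcher: returns a symbol naming the path that would run.
function select_neighbor_path(meta::MetaSketch; backend_ok::Bool, arrays_on_device::Bool)
    return should_use_cuda(meta, backend_ok, arrays_on_device) ? :cuda : :cpu
end

default_path = select_neighbor_path(MetaSketch(); backend_ok = true, arrays_on_device = true)
opted_in     = select_neighbor_path(MetaSketch(UseCUDA = true); backend_ok = true, arrays_on_device = true)
no_backend   = select_neighbor_path(MetaSketch(UseCUDA = true); backend_ok = false, arrays_on_device = true)
```

Keeping the flag in the metadata and the availability checks at the call boundary means existing CPU-only configurations need no changes.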

Testing

  • No automated tests were executed for this change (no test run requested).
  • Commands executed while preparing this PR included repository inspection and edits such as ls, rg, sed, multiple apply_patch actions, and git commit -m "Add CUDA neighbor loop support".
