Feature request
Feature description
Currently, nav2_mppi relies heavily on the CPU for trajectory sampling and evaluation. While this is efficient on desktop-class CPUs, it becomes a significant bottleneck on edge computing platforms such as the NVIDIA Jetson Orin, especially in high-density obstacle environments or when the number of sampled trajectories is large.
I propose adding an optional CUDA-accelerated backend that offloads these computations to the GPU, significantly improving real-time performance on ARM-based SoC platforms.
Implementation considerations
I suggest implementing this as an optional, plugin-based optimization. Key technical points include:

- Parallel Computing: Use cuRAND for parallel noise generation and custom CUDA kernels for trajectory rollouts and cost scoring (see the sketch after this list).
- Memory Optimization: Leverage Unified Memory (managed memory) to minimize host-to-device data transfer overhead, specifically targeting the shared-memory architecture of Jetson devices.
- Build System: Gate the CUDA backend behind a CMake flag (e.g., `-DENABLE_CUDA=ON`), ensuring full backward compatibility and no additional dependencies for non-NVIDIA users (see the CMake sketch below).
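As a rough sketch of what the sampling and rollout stage could look like: the example below draws all control perturbations in one batched cuRAND call, stores them in Unified Memory, and evaluates one trajectory per thread. The constants (`K`, `T`), the toy unicycle model, and the placeholder cost are all illustrative assumptions, not existing nav2_mppi interfaces.

```cuda
#include <cuda_runtime.h>
#include <curand.h>
#include <cstdio>

constexpr int K = 2048;  // number of sampled trajectories (illustrative)
constexpr int T = 56;    // timesteps per trajectory (illustrative)
constexpr int U = 2;     // control dims: linear and angular velocity

// One thread per trajectory: integrate a toy unicycle model over the noisy
// control sequence and accumulate a placeholder cost. A real backend would
// evaluate the controller's critics here instead.
__global__ void rollout_kernel(const float* noise, float* costs) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= K) return;
    float x = 0.f, y = 0.f, yaw = 0.f, cost = 0.f;
    const float dt = 0.05f;
    for (int t = 0; t < T; ++t) {
        float v = 0.5f + noise[(k * T + t) * U + 0];  // nominal + perturbation
        float w = 0.0f + noise[(k * T + t) * U + 1];
        x += v * cosf(yaw) * dt;
        y += v * sinf(yaw) * dt;
        yaw += w * dt;
        cost += x * x + y * y;  // stand-in for obstacle/goal scoring
    }
    costs[k] = cost;
}

int main() {
    float *noise, *costs;
    // Unified Memory: on Jetson the same physical DRAM backs CPU and GPU,
    // so this avoids explicit host<->device copies of the sampled batch.
    cudaMallocManaged(&noise, K * T * U * sizeof(float));
    cudaMallocManaged(&costs, K * sizeof(float));

    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_PHILOX4_32_10);
    curandSetPseudoRandomGeneratorSeed(gen, 42ULL);
    // Draw all K*T*U control perturbations in one batched call.
    curandGenerateNormal(gen, noise, K * T * U, 0.0f /*mean*/, 0.2f /*stddev*/);

    rollout_kernel<<<(K + 255) / 256, 256>>>(noise, costs);
    cudaDeviceSynchronize();  // required before the CPU reads managed memory
    printf("cost[0] = %f\n", costs[0]);

    curandDestroyGenerator(gen);
    cudaFree(noise);
    cudaFree(costs);
    return 0;
}
```

Generating the whole noise batch in a single cuRAND call keeps the GPU saturated and avoids per-trajectory launch overhead, which is where the CPU implementation spends most of its time at high sample counts.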
Pros:
- Much higher sampling density (e.g., $K > 2000$).
- Significantly lower latency and higher control frequency.
- Reduced CPU overhead, freeing cycles for other critical tasks such as perception or localization.

Cons:
- An additional build-time dependency on the CUDA Toolkit, but only for developers who explicitly enable this feature.
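To make the build-system gating concrete, here is a minimal CMake sketch; the `ENABLE_CUDA` option, the `mppi_cuda_backend` and `mppi_controller` target names, and the source path are hypothetical placeholders, not the actual nav2_mppi build layout.

```cmake
# Hypothetical sketch: gate the CUDA backend behind an opt-in flag (OFF by
# default), so non-NVIDIA users build exactly what they build today.
option(ENABLE_CUDA "Build the optional CUDA backend for the MPPI controller" OFF)

if(ENABLE_CUDA)
  enable_language(CUDA)
  find_package(CUDAToolkit REQUIRED)

  add_library(mppi_cuda_backend SHARED src/cuda/trajectory_sampler.cu)
  target_link_libraries(mppi_cuda_backend PRIVATE CUDA::curand CUDA::cudart)

  # The existing CPU controller only sees the backend when the flag is on.
  target_compile_definitions(mppi_controller PRIVATE MPPI_HAS_CUDA)
  target_link_libraries(mppi_controller PRIVATE mppi_cuda_backend)
endif()
```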
Recent research, such as "MPPI-Generic: A CUDA Library for Stochastic Trajectory Optimization" (arXiv:2409.07563), has already demonstrated the feasibility and performance gains of such an approach. I am a robotics algorithm engineer and would be happy to contribute the implementation and open a PR for this feature.