
Add support for multi-arch and multi-platform cuda toolchains #422

Open
charleysaintNV wants to merge 11 commits into bazel-contrib:main from charleysaintNV:csaint/multi-arch



@charleysaintNV charleysaintNV commented Dec 4, 2025

This PR adds support for multiple architectures and multiple versions of a toolchain. Using redist_json, you can create multiple versions of a toolchain:

cuda.redist_json(
    name = "cuda_13_0_2",
    version = "13.0.2",
    platforms = [
        "linux-x86_64",
        "linux-sbsa",
    ],
)
cuda.redist_json(
    name = "cuda_13_0_0",
    version = "13.0.0",
    platforms = [
        "linux-x86_64",
        "linux-sbsa",
    ],
)
cuda.redist_json(
    name = "cuda_12_8_0",
    version = "12.8.0",
    platforms = [
        "linux-x86_64",
        "linux-sbsa",
    ],
)

cuda.toolkit(name = "cuda")
use_repo(cuda, "cuda")

Then, using a set of flags in .bazelrc, you can control which version of the toolchain is used in a given config:

build --@rules_cuda//cuda:exec_platform="linux-x86_64"
build --@rules_cuda//cuda:target_platform="linux-x86_64"
build --@rules_cuda//cuda:version="13.0.0"
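
To make the selection concrete, here is a minimal sketch in plain Python of how the three flag values could narrow the registered toolkits down to a single repository. The `REGISTERED` table and the `resolve_toolkit` function are hypothetical names used only for illustration. In the PR itself the resolution is done by Bazel flags and `select()`, not by Python, so treat this as a model of the behaviour rather than the implementation.

```python
# Hypothetical model of the three cuda.redist_json registrations shown above.
REGISTERED = {
    "13.0.2": {"name": "cuda_13_0_2", "platforms": ["linux-x86_64", "linux-sbsa"]},
    "13.0.0": {"name": "cuda_13_0_0", "platforms": ["linux-x86_64", "linux-sbsa"]},
    "12.8.0": {"name": "cuda_12_8_0", "platforms": ["linux-x86_64", "linux-sbsa"]},
}


def resolve_toolkit(version, exec_platform, target_platform):
    """Return the repository name matching the requested flag values.

    Raises ValueError when the version or either platform was not registered.
    """
    if version not in REGISTERED:
        raise ValueError("CUDA version %r is not registered" % version)
    entry = REGISTERED[version]
    for role, platform in (("exec", exec_platform), ("target", target_platform)):
        if platform not in entry["platforms"]:
            raise ValueError(
                "%s platform %r is not registered for CUDA %s" % (role, platform, version)
            )
    return entry["name"]


# Mirrors the .bazelrc flags above: version 13.0.0, x86_64 for both platforms.
selected = resolve_toolkit("13.0.0", "linux-x86_64", "linux-x86_64")
```

With the flag values from the .bazelrc snippet, `selected` is `"cuda_13_0_0"`. Asking for a version or platform that was never registered fails early instead of silently falling back to another toolkit.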


@steple steple left a comment

Thanks for adding this, I think this will be very useful.

@charleysaintNV
Author

Pushed a new commit that moves the version handling around so it is at least all in one place. I also renamed nvcc_platform and runtime_platform to exec_platform and target_platform.

Comment on lines 91 to 102
"linux_x86_64_repo": attr.string(
    mandatory = True,
    doc = "Name of the repository to use for x86_64 platform",
),
"linux_aarch64_repo": attr.string(
    mandatory = True,
    doc = "Name of the repository to use for ARM64/Jetpack platform",
),
"linux_sbsa_repo": attr.string(
    mandatory = True,
    doc = "Name of the repository to use for SBSA platform",
),
Collaborator

Would a platform_mapping attribute using attr.string_dict be better here? Validating the keys in the rule implementation seems more future-proof.
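
A minimal sketch of the key validation this suggestion implies, written in plain Python so it can run outside Bazel. The `SUPPORTED_PLATFORMS` tuple and `validate_platform_mapping` function are hypothetical names, and the repository names in the usage line are made up. In an actual rule the mapping would be declared with `attr.string_dict` and the error would be reported with Starlark's `fail()` rather than an exception.

```python
# Assumed set of platform keys, taken from the platform names used in this PR.
SUPPORTED_PLATFORMS = ("linux-x86_64", "linux-aarch64", "linux-sbsa")


def validate_platform_mapping(platform_mapping):
    """Check a {platform: repo_name} dict and return it unchanged if valid.

    Unknown platform keys are rejected, so adding a platform later only
    requires extending SUPPORTED_PLATFORMS, not adding a new attribute.
    """
    unknown = sorted(set(platform_mapping) - set(SUPPORTED_PLATFORMS))
    if unknown:
        raise ValueError("unsupported platform keys: " + ", ".join(unknown))
    return platform_mapping


# Hypothetical repository names, for illustration only.
mapping = validate_platform_mapping({
    "linux-x86_64": "cuda_x86_64",
    "linux-sbsa": "cuda_sbsa",
})
```

Compared with one mandatory string attribute per platform, a single dict also lets callers omit platforms they do not build for.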

Contributor

Fixed.

for _, toolkit in registrations.items():
    if components_mapping != None:
        cuda_toolkit(name = toolkit.name, components_mapping = components_mapping, version = redist_version)
    # Always use the maximum version so the toolkit includes all components.
Collaborator

This won't hold if the CTK drops a component in a future release. I think a union across all CTK versions would be a bit more robust.
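
The difference between the two strategies can be sketched in a few lines of plain Python. The component lists below are invented for illustration (only the fact that cuda_crt is CUDA 13+ comes from this PR), and `removed_in_13` is a hypothetical component standing in for anything a future CTK might drop. The function names are likewise assumptions, not code from rules_cuda.

```python
def version_key(version):
    """Turn '13.0.2' into (13, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def components_from_max_version(components_by_version):
    """Current approach: take the component list of the newest version only."""
    newest = max(components_by_version, key=version_key)
    return set(components_by_version[newest])


def components_union(components_by_version):
    """Suggested approach: union of the components of every registered version."""
    result = set()
    for components in components_by_version.values():
        result |= set(components)
    return result


# Illustrative data; 'removed_in_13' is hypothetical.
COMPONENTS_BY_VERSION = {
    "12.8.0": ["cuda_cudart", "cuda_nvcc", "removed_in_13"],
    "13.0.0": ["cuda_cudart", "cuda_nvcc", "cuda_crt"],
}

from_max = components_from_max_version(COMPONENTS_BY_VERSION)
from_union = components_union(COMPONENTS_BY_VERSION)
missing_with_max = from_union - from_max
```

Here `missing_with_max` contains only `removed_in_13`: the maximum-version approach loses any component that exists in an older registered toolkit but not in the newest one, while the union keeps it.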

Author

This could take some work. Right now the version in cuda_toolkit isn't necessarily correct, since it points to @cuda, which can point to any number of versioned CUDA repos. I don't know whether that version gets used anywhere in the rules, so I'll try removing it and see what falls out...

Author

The logic is pretty deeply embedded in the repository rules, where I can't use the value of a flag. I might need to go back and add the ability to register multiple toolkits to get everything working as expected...

Collaborator

Let's leave it for a future improvement; I just wanted to point it out :)

When a CUDA component (like cuda_crt) doesn't exist for a platform
(like linux-aarch64), builds would fail because the select() had no
matching condition for that platform.

Now platform aliases are generated for ALL platforms, with dummy
targets used for platforms where the component doesn't exist. This
ensures builds on any platform have matching select conditions.

Also consolidates platform definitions into a single source of truth
in cuda/private/platforms.bzl and removes unused backward-compatibility
code.

Fixes JP6 (linux-aarch64, CUDA 12) build failure where cuda_crt
(a CUDA 13+ only component) caused select() to fail.
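
The fallback described in this commit message can be modelled with a short Python sketch. The label strings, `DUMMY_LABEL`, and the `platform_aliases` function are hypothetical and only illustrate the idea. The real change generates Bazel alias targets from Starlark repository rules.

```python
# Assumed full platform list; the PR keeps the real one in cuda/private/platforms.bzl.
ALL_PLATFORMS = ("linux-x86_64", "linux-aarch64", "linux-sbsa")

# Hypothetical label of an empty placeholder target.
DUMMY_LABEL = "//cuda/dummy:empty"


def platform_aliases(component, available_platforms):
    """Map every known platform to a label for the given component.

    Platforms that do not ship the component get the dummy label, so a
    select() built from this dict always has a matching branch.
    """
    aliases = {}
    for platform in ALL_PLATFORMS:
        if platform in available_platforms:
            # Hypothetical naming scheme for the real per-platform target.
            aliases[platform] = "//%s:%s" % (component, platform)
        else:
            aliases[platform] = DUMMY_LABEL
    return aliases


# cuda_crt is missing on linux-aarch64 in the scenario described above.
crt_aliases = platform_aliases("cuda_crt", ["linux-x86_64", "linux-sbsa"])
```

Every platform appears as a key in `crt_aliases`, and `linux-aarch64` points at the dummy label. Before this change the equivalent dict would simply have had no `linux-aarch64` entry, which is what made the select() fail.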
@lgulich
Contributor

lgulich commented Jan 27, 2026

Hey @cloudhan, I pushed a new commit addressing your comments.

I also added some additional changes:

  • Added more dummy fallbacks for some targets that don't exist on aarch64/cuda12 (e.g. culibos)
  • Created cuda/private/platforms.bzl as the single source of truth for
    supported platforms

@cloudhan
Collaborator

Good. We should have some tests for this new feature. I'd like to write them myself and open a PR against your branch before proceeding.

@lgulich
Contributor

lgulich commented Jan 29, 2026

Sure, I've asked @charleysaintNV to give you access to the repo so you can push to the branch.

@lgulich
Contributor

lgulich commented Feb 2, 2026

> Sure, I've asked @charleysaintNV to give you access to the repo to push to the branch

You should now have access.
