Add support for multi-arch and multi-platform cuda toolchains#422
Add support for multi-arch and multi-platform cuda toolchains#422charleysaintNV wants to merge 11 commits intobazel-contrib:mainfrom
Conversation
…-arch' Enable multi-version builds See merge request csaint/rules_cuda!1
steple
left a comment
There was a problem hiding this comment.
Thanks for adding this, I think this will be very useful.
|
pushed a new commit with the versions stuff moved around and all in one place at least and I changed the name of the nvcc_platform and runtime_platform to exec and target |
cuda/extensions.bzl
Outdated
| "linux_x86_64_repo": attr.string( | ||
| mandatory = True, | ||
| doc = "Name of the repository to use for x86_64 platform", | ||
| ), | ||
| "linux_aarch64_repo": attr.string( | ||
| mandatory = True, | ||
| doc = "Name of the repository to use for ARM64/Jetpack platform", | ||
| ), | ||
| "linux_sbsa_repo": attr.string( | ||
| mandatory = True, | ||
| doc = "Name of the repository to use for SBSA platform", | ||
| ), |
There was a problem hiding this comment.
would a platform_mapping with attr.string_dict be better here? Validating the key in the rule impl seem to be future proof.
| for _, toolkit in registrations.items(): | ||
| if components_mapping != None: | ||
| cuda_toolkit(name = toolkit.name, components_mapping = components_mapping, version = redist_version) | ||
| # Always use the maximum version so the toolkit includes all components. |
There was a problem hiding this comment.
This is not quite true if CTK delete some component in the future. I think a union across all CTK versions will be a little bit more robust.
There was a problem hiding this comment.
This could take some work, right now the version in cuda_toolkit isn't going to necessarily be correct since it's pointing to @cuda which can point to any number of versioned cuda repos, but I don't know if that gets used anywhere in the rules so I'll try removing it and see what falls out...
There was a problem hiding this comment.
The logic is pretty deeply embedded in the repository rules where I can't use the value of a flag. I might need to go back and add the ability to register multiple toolkits to get everything to work as expected...
There was a problem hiding this comment.
Lets leave it for future improvement, just point it out :)
When a CUDA component (like cuda_crt) doesn't exist for a platform (like linux-aarch64), builds would fail because the select() had no matching condition for that platform. Now platform aliases are generated for ALL platforms, with dummy targets used for platforms where the component doesn't exist. This ensures builds on any platform have matching select conditions. Also consolidates platform definitions into a single source of truth in cuda/private/platforms.bzl and removes unused backward-compatibility code. Fixes JP6 (linux-aarch64, CUDA 12) build failure where cuda_crt (a CUDA 13+ only component) caused select() to fail.
db0a73a to
7eb64fe
Compare
|
Hey @cloudhan , I pushed a new commit addressing your comments. I also added some additional changes:
|
|
Good, we should have some tests against this new feature. I'd like to do it myself and make a PR to your branch before I can proceed. |
|
Sure, I've asked @charleysaintNV to give you access to the repo to push to the branch |
You should now have access |
This MR adds support for multiple architectures and multiple versions of a toolchain. Using the redis_json you can create multiple versions of a toolchain:
then using a set of flags in .bazelrc you can control which version of the toolchain to use in a given config: