
add init command that pulls template repo#247

Open
drbh wants to merge 9 commits into main from add-template-based-init

Conversation

@drbh
Collaborator

@drbh drbh commented Jan 28, 2026

this PR adds a new init subcommand that clones a Hugging Face template repo to scaffold a new kernel project

kernels init drbh/my-kernel

outputs

Downloading template from drbh/template...
Initialized kernel project: /Users/drbh/Projects/kernels/my-kernel
├── my_kernel_cpu
│   └── my_kernel_cpu.cpp
├── my_kernel_cuda
│   └── my_kernel.cu
├── my_kernel_metal
│   ├── my_kernel.metal
│   └── my_kernel.mm
├── my_kernel_xpu
│   └── my_kernel.cpp
├── tests
│   ├── __init__.py
│   └── test_my_kernel.py
├── torch-ext
│   ├── my_kernel
│   │   └── __init__.py
│   ├── torch_binding.cpp
│   └── torch_binding.h
├── .gitattributes
├── .gitignore
├── README.md
├── build.toml
├── example.py
└── flake.nix

Next steps:
  cd my-kernel
  nix run .#build-and-copy -L
  uv run example.py

now build

cd my-kernel
nix run .#build-and-copy -L

# tree build -L 1
# build
# ├── torch210-cpu-aarch64-darwin
# ├── torch210-metal-aarch64-darwin
# ├── torch29-cpu-aarch64-darwin
# └── torch29-metal-aarch64-darwin

# 5 directories, 0 files

then run

uv run example.py

outputs

Using device: mps
Input:  tensor([1., 2., 3.], device='mps:0')
Output: tensor([2., 3., 4.], device='mps:0')
Success!

and tested on a cuda device

# ... same commands as above

# tree build -L 1
# build
# ├── torch210-cxx11-cpu-x86_64-linux
# ├── torch210-cxx11-cu126-x86_64-linux
# ├── torch210-cxx11-cu128-x86_64-linux
# ├── torch210-cxx11-cu130-x86_64-linux
# ├── torch210-cxx11-rocm70-x86_64-linux
# ├── torch210-cxx11-rocm71-x86_64-linux
# ├── torch210-cxx11-xpu20253-x86_64-linux
# ├── torch29-cxx11-cpu-x86_64-linux
# ├── torch29-cxx11-cu126-x86_64-linux
# ├── torch29-cxx11-cu128-x86_64-linux
# ├── torch29-cxx11-cu130-x86_64-linux
# ├── torch29-cxx11-rocm63-x86_64-linux
# ├── torch29-cxx11-rocm64-x86_64-linux
# └── torch29-cxx11-xpu20252-x86_64-linux

# 14 directories, 0 files

outputs

Using device: cuda
Input:  tensor([1., 2., 3.], device='cuda:0')
Output: tensor([2., 3., 4.], device='cuda:0')
Success!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

danieldk
danieldk previously approved these changes Jan 29, 2026
Member

@danieldk danieldk left a comment

Nice!!!

@drbh
Collaborator Author

drbh commented Jan 29, 2026

updates

uname -a
# Darwin Mac 25.2.0 Darwin Kernel Version 25.2.0: Tue Nov 18 21:09:41 PST 2025; root:xnu-12377.61.12~1/RELEASE_ARM64_T6031 arm64
kernels init drbh/my-kernel

kernels init now clones the template repo, removes the backend directories that do not apply to the platform, and updates build.toml so it contains only the detected or specified backend.

output

Downloading template from drbh/template...
Initialized kernel project: /Users/drbh/Projects/kernel-init-tmp/my-kernel
├── my_kernel_metal
│   ├── my_kernel.metal
│   └── my_kernel.mm
├── tests
│   ├── __init__.py
│   └── test_my_kernel.py
├── torch-ext
│   ├── my_kernel
│   │   └── __init__.py
│   ├── torch_binding.cpp
│   └── torch_binding.h
├── .gitattributes
├── .gitignore
├── README.md
├── build.toml
├── example.py
└── flake.nix

Next steps:
  cd my-kernel
  cachix use huggingface
  nix run -L --max-jobs 1 --cores 8 .#build-and-copy
  uv run example.py
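The directory pruning described in this update could be sketched roughly as below. This is a minimal illustration, not the code from the PR: the function name prune_backend_dirs, the ASSUMED_BACKENDS tuple, and the `<module>_<backend>` directory naming are assumptions inferred from the tree listings above.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed backend names, taken from the --backends help text in this PR.
ASSUMED_BACKENDS = ("cpu", "cuda", "metal", "rocm", "xpu")


def prune_backend_dirs(project_dir: Path, module_name: str, keep: set[str]) -> list[str]:
    """Delete <module_name>_<backend> directories for backends not in `keep`.

    Returns the names of the removed directories, sorted.
    """
    removed = []
    for backend in ASSUMED_BACKENDS:
        candidate = project_dir / f"{module_name}_{backend}"
        if backend not in keep and candidate.is_dir():
            shutil.rmtree(candidate)
            removed.append(candidate.name)
    return sorted(removed)


def demo() -> tuple[list[str], list[str]]:
    """Build a throwaway tree shaped like the template and prune it for metal."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("my_kernel_cpu", "my_kernel_cuda", "my_kernel_metal", "my_kernel_xpu", "tests"):
            (root / name).mkdir()
        removed = prune_backend_dirs(root, "my_kernel", {"metal"})
        remaining = sorted(p.name for p in root.iterdir())
    return removed, remaining
```

Calling demo() leaves only my_kernel_metal and tests behind, which mirrors the metal-only tree shown in the output above.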

resulting build.toml file

[general]
backends = ["metal"]
name = "my-kernel"
version = 1

[torch]
src = [
  "torch-ext/torch_binding.cpp",
  "torch-ext/torch_binding.h",
]

[kernel.my_kernel_metal]
backend = "metal"
depends = ["torch"]
src = [
  "my_kernel_metal/my_kernel.mm",
  "my_kernel_metal/my_kernel.metal",
]

help message shows

kernels init --help
usage: kernel init [-h] [--template-repo TEMPLATE_REPO]
                   [--backends BACKENDS [BACKENDS ...]]
                   kernel_name

positional arguments:
  kernel_name           Name of the kernel repo (e.g., drbh/my-
                        kernel)

options:
  -h, --help            show this help message and exit
  --template-repo TEMPLATE_REPO
                        HuggingFace repo ID for the template
  --backends BACKENDS [BACKENDS ...]
                        Backends to include ('all' or list like: cpu
                        cuda metal rocm xpu). Defaults: cuda on
                        Linux/Windows, metal on macOS.

        backends = ["metal"] if sys.platform == "darwin" else ["cuda"]
    else:
        backends = [
            v.strip().lower()
Member

Is stripping and lowercasing needed? I think all the whitespace should be consumed by the argument parser. I think it's better if people just have to give the arguments in lowercase.

Collaborator Author

@drbh drbh Feb 3, 2026

great suggestion, the code is much cleaner in the latest commit, thanks!

init_parser.add_argument(
    "--backends",
    nargs="+",
    default=None,
Member

I think we can use action="extend" and then the user can use something like --backends cuda rocm.

Collaborator Author

ended up using "+" instead of extend so if a user specifies a backend it overrides the default, otherwise we default to metal on darwin and cuda in all other cases
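The override-versus-append distinction mentioned here can be illustrated with a small standalone argparse sketch. The helper parse_backends is hypothetical and only exists for this comparison: with action="extend" and a non-empty default, argparse appends the user's values to the default list, while a plain store with nargs="+" and default=None lets the caller fill in the platform default afterwards.

```python
import argparse


def parse_backends(action: str, default, argv: list[str]):
    """Parse --backends with the given action and default; return the result."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--backends", action=action, nargs="+", default=default)
    return parser.parse_args(argv).backends


# action="extend" keeps the default and appends what the user passed.
extended = parse_backends("extend", ["cuda"], ["--backends", "rocm"])

# The default "store" action replaces the value outright.
stored = parse_backends("store", None, ["--backends", "rocm"])

# Without the flag the value stays None, so the caller can pick a default.
unset = parse_backends("store", None, [])
```

Here extended ends up containing both cuda and rocm, stored contains only rocm, and unset is None.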

backends = [
    v.strip().lower()
    for item in args.backends
    for v in item.split(",")
Member

See above, I don't think we need extra parsing, we can let argparse handle it.

Collaborator Author

same as above, updated in latest commit

    "--backends",
    nargs="+",
    default=None,
    help="Backends to include ('all' or list like: cpu cuda metal rocm xpu). Defaults: cuda on Linux/Windows, metal on macOS.",
Member

Since this is a finite list, it would be nice to use choices=.... We can also set the default to cuda or metal here. I think argparse will then even show it in the usage.

Collaborator Author

same as above, updated in latest commit
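For reference, the choices= suggestion might look something like the sketch below. It is an assumption about the shape of the final code, not a copy of it: build_parser and rejects are hypothetical helpers, the backend set is taken from the KNOWN_BACKENDS line quoted later in this review, and accepting "all" follows the help text.

```python
import argparse
import sys

# Backend names as quoted in this review; "all" comes from the help text.
KNOWN_BACKENDS = {"cpu", "cuda", "metal", "rocm", "xpu", "npu"}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that validates --backends through argparse itself."""
    parser = argparse.ArgumentParser(prog="kernels init")
    parser.add_argument("kernel_name")
    parser.add_argument(
        "--backends",
        nargs="+",
        # Sorted so the usage string lists choices in a stable order.
        choices=sorted(KNOWN_BACKENDS | {"all"}),
        default=["metal"] if sys.platform == "darwin" else ["cuda"],
    )
    return parser


def rejects(argv: list[str]) -> bool:
    """Return True if argparse exits with an error for these arguments."""
    try:
        build_parser().parse_args(argv)
    except SystemExit:
        return True
    return False
```

With this shape, whitespace handling and validation are left to argparse, and a value such as CUDA is rejected because choices are matched case-sensitively.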

def run_init(args: Namespace) -> None:
    kernel_name = args.kernel_name
    if args.backends is None:
        backends = ["metal"] if sys.platform == "darwin" else ["cuda"]
Member

See above.

Collaborator Author

same as above, updated in latest commit

sys.exit(1)
backends = []
else:
valid = set(KNOWN_BACKENDS)
Member

See above, best to let argparse do the work.

Collaborator Author

fully agree, same as above, updated in latest commit

Comment on lines 183 to 203
    text = build_toml_path.read_text()
    with open(build_toml_path, "rb") as f:
        data = tomllib.load(f)
    if "general" not in data:
        return
    kernel_table = data.get("kernel", {})
    if not isinstance(kernel_table, dict):
        kernel_table = {}
    remove_kernels = {
        name
        for name, cfg in kernel_table.items()
        if isinstance(cfg, dict) and cfg.get("backend") not in set(backends)
    }
    backends_list = ", ".join(f'"{b}"' for b in backends)
    new_line = f"backends = [{backends_list}]"
    pattern = r"(\[general\][\s\S]*?)^\s*backends\s*=\s*\[[^\]]*\]"
    new_text, count = re.subn(pattern, r"\1" + new_line, text, count=1, flags=re.M)
    if remove_kernels:
        new_text = _remove_kernel_sections(new_text, remove_kernels)
    if count or remove_kernels:
        build_toml_path.write_text(new_text)
Member

I think it's cleaner to parse the TOML, modify the Python data structures and then serialize as TOML again. regexps are fragile.

Collaborator Author

totally agree, the main reason I opted for regex over toml is that I was concerned that writing toml back into the file would lose any comments/formatting added to the template.

I think it's a reasonable trade-off (more robust toml parsing instead of preserved comments), however my initial approach attempted to be comment preserving.

happy to update with the toml route, just wanted to flag the reasoning/tradeoff

Member

Maybe we should consider using tomlkit, since it can preserve comments?

Collaborator Author

great suggestion, I've added tomlkit as a dep in the latest commits. I'm still open to using the builtins if we don't see a benefit from preserving comments/whitespace, but the current changes are working well

from kernels.metadata import Metadata

ENV_VARS_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}
KNOWN_BACKENDS = ("cpu", "cuda", "metal", "rocm", "xpu", "npu")
Member

Better to make it a set?

Collaborator Author

agreed, updated in latest commit

@drbh drbh force-pushed the add-template-based-init branch from 60bc1f0 to 8796712 Compare February 3, 2026 18:32