the missing link to Apple GPUs
sueszli/llvm-to-air
// Reverse-engineered compiler stack for Apple Silicon GPUs
//
// Coming soon to xDSL:
// https://github.com/xdslproject/xdsl/blob/main/xdsl/backend/mps/__init__.py

MLIR is the right abstraction for portable performance across heterogeneous hardware, but it has no Apple Silicon GPU backend [^1].

This is a problem because (1) Apple Silicon market share is growing fast, (2) most of these machines have powerful GPUs sitting idle, and (3) the world needs more compute for everything from protein folding to ML training.

Mojo proved that targeting Apple GPUs via MLIR->LLVM->AIR->MetalLib works, but their implementation is closed source [^2].

This project reverse engineers that missing piece and provides an open source implementation of the LLVM IR to AIR lowering pass.

    +----------------+      +----------------+      +----------------+
    |    Frontend    |----->|  MLIR Dialect  |----->|  LLVM Bitcode  |
    |                |      |                |      | (Open Source)  |
    +----------------+      +----------------+      +----------------+
                                                            |
                                                  [ src/llvm_to_air.py ]
                                                            |
    +----------------+      +----------------+      +--------v-------+
    |   Apple GPU    |<-----|    Metallib    |<-----|  AIR Bitcode   |
    |   (M-Series)   |      |    (Binary)    |      | (Proprietary)  |
    +----------------+      +----------------+      +----------------+

The core contribution is `src/llvm_to_air.py`, which takes LLVM IR and lowers it to Apple's Intermediate Representation AIR. This enables a full compilation pipeline from high-level MLIR dialects down to executable code on Apple Silicon GPUs.

I used xDSL to write the entire compiler stack in Python, making it accessible and hackable.

Fair warning: this is experimental and brittle. AIR is closed source and undocumented, so everything here is reverse engineered. But it works, and to my knowledge this is the first open source end-to-end stack for Apple Silicon.
--------------------------------------------------------------------------------
Performance
--------------------------------------------------------------------------------

The mandelbrot benchmark shows a 1150x speedup 🔥 over the vanilla Python impl.

    $ uv run demo_mandelbrot.py
    mandelbrot benchmark (1,048,576 pixels)

    results (avg latency ms):
      gpu         :    2.47 ms
      numba       :  188.56 ms
      numpy       : 1519.57 ms
      numpy+numba : 1820.99 ms
      plain       : 2840.38 ms

    relative to vanilla python:
      gpu         : 1150.23x faster
      numba       :   15.06x faster
      numpy       :    1.87x faster
      numpy+numba :    1.56x faster

--------------------------------------------------------------------------------
Lisp Frontend
--------------------------------------------------------------------------------

There's also a tiny Common Lisp subset as a frontend.

    $ uv run demo_linalg.py
    (print
      (add
        (matmul
          (tensor (2 3) (-1.0 2.0 -3.0 4.0 -5.0 6.0))
          (tensor (3 2) (7.0 8.0 9.0 10.0 11.0 12.0))
        )
        (tensor (2 2) (100.0 100.0 100.0 100.0))
      )
    )
    Tensor(2 x 2):
    78.000000 76.000000
    149.000000 154.000000

--------------------------------------------------------------------------------
References
--------------------------------------------------------------------------------

[^1] MLIR: https://discourse.llvm.org/t/rfc-mps-dialect-in-mlir/77102
[^2] Mojo: https://forum.modular.com/t/apple-silicon-gpu-support-in-mojo/2295
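
The tensor expression in the Lisp frontend demo above can be checked by hand without a GPU. The sketch below recomputes it in plain Python. It is not part of this project's code: the helper names `tensor`, `matmul` and `add` are hypothetical stand-ins that only mirror the Lisp operators, and the row-major layout of the tensor values is an assumption (it is consistent with the printed result).

```python
# CPU-only cross-check of the demo expression. None of these helpers come from
# the project; they are illustrative stand-ins for the Lisp operators.

def tensor(shape, values):
    """Build a rows x cols matrix (list of lists), assuming row-major values."""
    rows, cols = shape
    assert len(values) == rows * cols, "value count must match shape"
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


def matmul(a, b):
    """Naive matrix product of two list-of-lists matrices."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a, b):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


lhs = tensor((2, 3), [-1.0, 2.0, -3.0, 4.0, -5.0, 6.0])
rhs = tensor((3, 2), [7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
bias = tensor((2, 2), [100.0, 100.0, 100.0, 100.0])

result = add(matmul(lhs, rhs), bias)

for row in result:
    print(" ".join(f"{v:.6f}" for v in row))
# 78.000000 76.000000
# 149.000000 154.000000
```

The printed rows match the `Tensor(2 x 2)` output of `demo_linalg.py` shown in the demo.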