the missing link to Apple GPUs

sueszli/llvm-to-air


▌  ▌ ▌ ▌▙▗▌  ▐       ▞▀▖▜▘▛▀▖
▌  ▌ ▚▗▘▌▘▌  ▜▀ ▞▀▖  ▙▄▌▐ ▙▄▘
▌  ▌ ▝▞ ▌ ▌  ▐ ▖▌ ▌  ▌ ▌▐ ▌▚
▀▀▘▀▀▘▘ ▘ ▘   ▀ ▝▀   ▘ ▘▀▘▘ ▘

// Reverse-engineered compiler stack for Apple Silicon GPUs
//
// Coming soon to xDSL:
// https://github.com/xdslproject/xdsl/blob/main/xdsl/backend/mps/__init__.py

MLIR is the right abstraction for portable performance across heterogeneous
hardware, but it has no Apple Silicon GPU backend [^1]. This is a problem
because (1) Apple Silicon market share is growing fast, (2) most of these
machines have powerful GPUs sitting idle and (3) the world needs more compute
for everything from protein folding to ML training.

Mojo proved that targeting Apple GPUs via MLIR->LLVM->AIR->MetalLib works, but
their implementation is closed source [^2].

This project reverse-engineers that missing piece and provides an open-source
implementation of the LLVM IR to AIR lowering pass.

  +----------------+      +----------------+      +----------------+
  |    Frontend    |----->|  MLIR Dialect  |----->|  LLVM Bitcode  |
  |                |      |                |      | (Open Source)  |
  +----------------+      +----------------+      +----------------+
                                                           |
                                                 [ src/llvm_to_air.py ]
                                                           |
  +----------------+      +----------------+      +--------v-------+
  |   Apple GPU    |<-----|    Metallib    |<-----|   AIR Bitcode  |
  |   (M-Series)   |      |    (Binary)    |      |  (Proprietary) |
  +----------------+      +----------------+      +----------------+

The core contribution is `src/llvm_to_air.py`, which takes LLVM IR and lowers it
to Apple's Intermediate Representation (AIR). This enables a full compilation
pipeline from high-level MLIR dialects down to executable code on Apple Silicon
GPUs. I used xDSL to write the entire compiler stack in Python, making it
accessible and hackable.

Fair warning: this is experimental and brittle. AIR is closed source and
undocumented, so everything here is reverse-engineered. But it works, and to
my knowledge this is the first open-source end-to-end stack for Apple Silicon
GPUs.

--------------------------------------------------------------------------------
Performance
--------------------------------------------------------------------------------

The Mandelbrot benchmark shows a 1150x speedup 🔥 over the vanilla Python
implementation.

$ uv run demo_mandelbrot.py

    mandelbrot benchmark (1,048,576 pixels)
                                                                                                        
    results (avg latency ms):
    gpu            : 2.47 ms
    numba          : 188.56 ms
    numpy          : 1519.57 ms
    numpy+numba    : 1820.99 ms
    plain          : 2840.38 ms

    relative to vanilla python:
    gpu            : 1150.23x faster
    numba          : 15.06x faster
    numpy          : 1.87x faster
    numpy+numba    : 1.56x faster
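
As a sanity check, the relative numbers can be recomputed from the averaged
latencies printed above. This is a minimal sketch, not part of the repository;
the dictionary is copied by hand from the table. Because the printed latencies
are rounded to two decimals, the recomputed GPU ratio lands slightly below the
reported 1150.23x, which presumably was derived from unrounded timings (an
assumption on my part).

```python
# Average latencies in milliseconds, copied by hand from the benchmark output.
latency_ms = {
    "gpu": 2.47,
    "numba": 188.56,
    "numpy": 1519.57,
    "numpy+numba": 1820.99,
    "plain": 2840.38,
}

# Speedup of each variant relative to the plain Python baseline.
baseline = latency_ms["plain"]
speedup = {
    name: baseline / ms
    for name, ms in latency_ms.items()
    if name != "plain"
}

for name, factor in speedup.items():
    print(f"{name:<15}: {factor:.2f}x faster")
```

The other three ratios match the table to two decimals, so the "relative to
vanilla python" block is simply plain latency divided by variant latency.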

--------------------------------------------------------------------------------
Lisp Frontend
--------------------------------------------------------------------------------

There's also a tiny Common Lisp subset as a frontend.

$ uv run demo_linalg.py

    (print
        (add
            (matmul
                (tensor (2 3) (-1.0 2.0 -3.0 4.0 -5.0 6.0))
                (tensor (3 2) (7.0 8.0 9.0 10.0 11.0 12.0))
            )
            (tensor (2 2) (100.0 100.0 100.0 100.0))
        )
    )


    Tensor(2 x 2):
            78.000000 76.000000
            149.000000 154.000000
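
The expected result can be checked without a GPU. The following is a minimal
pure-Python sketch of a reader and reference evaluator for the four forms used
above (tensor, matmul, add, print). It is not the project's frontend, which
lowers through MLIR; the names tokenize, parse, and evaluate are hypothetical,
and the row-major tensor layout is an assumption that happens to agree with
the printed output.

```python
SOURCE = """
(print
    (add
        (matmul
            (tensor (2 3) (-1.0 2.0 -3.0 4.0 -5.0 6.0))
            (tensor (3 2) (7.0 8.0 9.0 10.0 11.0 12.0)))
        (tensor (2 2) (100.0 100.0 100.0 100.0))))
"""


def tokenize(src):
    # Pad parentheses with spaces so str.split() yields one token each.
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    # Recursive-descent reader: nested lists of floats and symbol strings.
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    try:
        return float(tok)
    except ValueError:
        return tok  # a symbol such as "matmul"


def evaluate(expr):
    op, *args = expr
    if op == "tensor":
        # (tensor (rows cols) (flat values)), assumed row-major.
        (rows, cols), flat = args
        rows, cols = int(rows), int(cols)
        assert len(flat) == rows * cols, "shape does not match data"
        return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
    vals = [evaluate(a) for a in args]
    if op == "matmul":
        a, b = vals
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]
    if op == "add":
        a, b = vals
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    if op == "print":
        (t,) = vals
        print(f"Tensor({len(t)} x {len(t[0])}):")
        for row in t:
            print("    " + " ".join(f"{v:.6f}" for v in row))
        return t
    raise ValueError(f"unknown form: {op!r}")


result = evaluate(parse(tokenize(SOURCE)))
```

Working it by hand gives the same numbers: the first row of the product is
(-7 + 18 - 33, -8 + 20 - 36) = (-22, -24) and the second is
(28 - 45 + 66, 32 - 50 + 72) = (49, 54); adding 100 to each entry yields the
tensor shown above.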

--------------------------------------------------------------------------------
References
--------------------------------------------------------------------------------

[^1] MLIR: https://discourse.llvm.org/t/rfc-mps-dialect-in-mlir/77102
[^2] Mojo: https://forum.modular.com/t/apple-silicon-gpu-support-in-mojo/2295
