README.md (26 additions, 16 deletions)
Beyond traditional high-performance computing, ParallelStencil supports automatic differentiation of architecture-agnostic parallel kernels relying on [Enzyme.jl], enabling both high-level and generic syntax for maximal flexibility.
## Contents
* [Parallelization and optimization with one macro call](#parallelization-with-one-macro-call)
* [Stencil computations with math-close notation](#stencil-computations-with-math-close-notation)
* [50-lines example deployable on GPU and CPU](#50-lines-example-deployable-on-GPU-and-CPU)
* [Seamless interoperability with communication packages and hiding communication](#seamless-interoperability-with-communication-packages-and-hiding-communication)
* [Support for architecture-agnostic low level kernel programming](#support-for-architecture-agnostic-low-level-kernel-programming)
* [Support for logical arrays of small arrays / structs](#support-for-logical-arrays-of-small-arrays--structs)
* [Support for automatic differentiation of architecture-agnostic parallel kernels](#support-for-automatic-differentiation-of-architecture-agnostic-parallel-kernels)
* [Module documentation callable from the Julia REPL / IJulia](#module-documentation-callable-from-the-julia-repl--ijulia)
* [Questions, comments and discussions](#questions-comments-and-discussions)
* [Your contributions](#your-contributions)
* [References](#references)

## Parallelization and optimization with one macro call
A simple call to `@parallel` is enough to parallelize and optimize a function and to launch it. The package used underneath for parallelization is defined in a call to `@init_parallel_stencil` beforehand. Supported are [CUDA.jl], [AMDGPU.jl] and [Metal.jl] for running on GPU and [Base.Threads] for CPU. The following example outlines how to run parallel computations on a GPU using the native kernel programming capabilities of [CUDA.jl] underneath (omitted lines are represented with `#(...)`, omitted arguments with `...`):
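A minimal sketch of such a setup (not the README's own example: the kernel name `diffusion3D_step!`, the grid sizes and the physical values below are illustrative assumptions), showing a 3-D heat diffusion step parallelized, optimized and launched with single `@parallel` calls:

```julia
# Sketch: single-macro parallelization of a 3-D heat diffusion step.
# The kernel name and all numeric values are illustrative.
const USE_GPU = true
using ParallelStencil
using ParallelStencil.FiniteDifferences3D
@static if USE_GPU
    using CUDA                                   # load the GPU backend package
    @init_parallel_stencil(CUDA, Float64, 3)     # parallelize with CUDA.jl underneath
else
    @init_parallel_stencil(Threads, Float64, 3)  # parallelize with Base.Threads
end

@parallel function diffusion3D_step!(T2, T, Ci, lam, dt, dx, dy, dz)
    @inn(T2) = @inn(T) + dt*(lam*@inn(Ci)*(@d2_xi(T)/dx^2 + @d2_yi(T)/dy^2 + @d2_zi(T)/dz^2))
    return
end

function diffusion3D()
    nx, ny, nz          = 128, 128, 128                              # grid size (illustrative)
    lam, dt, dx, dy, dz = 1.0, 1.0e-4, 0.1, 0.1, 0.1                 # physics/numerics (illustrative)
    T  = @rand(nx, ny, nz)                                           # allocated on the selected architecture
    T2 = copy(T)
    Ci = @ones(nx, ny, nz)
    for it = 1:100
        @parallel diffusion3D_step!(T2, T, Ci, lam, dt, dx, dy, dz)  # parallelize, optimize and launch
        T, T2 = T2, T
    end
    return T
end

diffusion3D()
```

Setting `USE_GPU = false` deploys the identical code on the CPU via [Base.Threads].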
```julia
import ParallelStencil.AD
using Enzyme
#(...)
@parallel f!(A, B, a) # normal call of f!
@parallel configcall=f!(A, B, a) AD.autodiff_deferred!(Enzyme.Reverse, f!, DuplicatedNoNeed(A, Ā), DuplicatedNoNeed(B, B̄), a) # call to the gradient of f!, differentiated with respect to A and B
```
The submodule `ParallelStencil.AD` contains GPU-compatible wrappers of Enzyme functions (always returning `nothing`, as required by the backend packages [CUDA.jl] and [AMDGPU.jl]); the wrapper `AD.autodiff_deferred!`, for example, maps to `Enzyme.autodiff_deferred`. The keyword argument `configcall` makes it trivial to call these generic functions for automatic differentiation with the right launch parameters.
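As a point of reference, here is a self-contained sketch of how such a gradient call fits together; the body of `f!`, the shadow arrays `Ā` and `B̄`, the array size and the value of `a` are illustrative assumptions (their actual definitions are elided with `#(...)` above), and the CPU backend is used only for brevity:

```julia
# Illustrative sketch (assumed kernel body and data; see the call pattern above).
using ParallelStencil
@init_parallel_stencil(Threads, Float64, 1)    # CPU backend for brevity; GPU backends work the same way
import ParallelStencil.AD
using Enzyme

@parallel_indices (ix) function f!(A, B, a)    # architecture-agnostic kernel (assumed body)
    A[ix] += a * B[ix]
    return
end

N = 16
A = @rand(N);  Ā = @ones(N)                    # Ā: adjoint (shadow) of A, seeded with ones
B = @rand(N);  B̄ = @zeros(N)                   # B̄: adjoint (shadow) of B, accumulates the gradient
a = 6.5

@parallel f!(A, B, a)                          # normal call of f!
# configcall lets @parallel derive the launch parameters from f!'s call signature,
# while the statement itself calls the AD wrapper; a needs no explicit annotation (as above).
@parallel configcall=f!(A, B, a) AD.autodiff_deferred!(Enzyme.Reverse, f!, DuplicatedNoNeed(A, Ā), DuplicatedNoNeed(B, B̄), a)
```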