Commit 39ae275: update AD doc (1 parent: 1eb199e)

File tree: 1 file changed, +26 −16 lines

README.md
Lines changed: 26 additions & 16 deletions
@@ -16,21 +16,31 @@ A particularity of ParallelStencil is that it enables writing a single high-leve
 Beyond traditional high-performance computing, ParallelStencil supports automatic differentiation of architecture-agnostic parallel kernels relying on [Enzyme.jl], enabling both high-level and generic syntax for maximal flexibility.
 
 ## Contents
-* [Parallelization and optimization with one macro call](#parallelization-with-one-macro-call)
-* [Stencil computations with math-close notation](#stencil-computations-with-math-close-notation)
-* [50-lines example deployable on GPU and CPU](#50-lines-example-deployable-on-GPU-and-CPU)
-* [50-lines multi-xPU example](#50-lines-multi-xpu-example)
-* [Seamless interoperability with communication packages and hiding communication](#seamless-interoperability-with-communication-packages-and-hiding-communication)
-* [Support for architecture-agnostic low level kernel programming](#support-for-architecture-agnostic-low-level-kernel-programming)
-* [Support for logical arrays of small arrays / structs](#support-for-logical-arrays-of-small-arrays--structs)
-* [Support for automatic differentiation of architecture-agnostic parallel kernels](#support-for-automatic-differentiation-of-architecture-agnostic-parallel-kernels)
-* [Module documentation callable from the Julia REPL / IJulia](#module-documentation-callable-from-the-julia-repl--ijulia)
-* [Concise single/multi-xPU miniapps](#concise-singlemulti-xpu-miniapps)
-* [Dependencies](#dependencies)
-* [Installation](#installation)
-* [Questions, comments and discussions](#questions-comments-and-discussions)
-* [Your contributions](#your-contributions)
-* [References](#references)
+- [Contents](#contents)
+- [Parallelization and optimization with one macro call](#parallelization-and-optimization-with-one-macro-call)
+- [Stencil computations with math-close notation](#stencil-computations-with-math-close-notation)
+- [50-lines example deployable on GPU and CPU](#50-lines-example-deployable-on-gpu-and-cpu)
+- [50-lines multi-xPU example](#50-lines-multi-xpu-example)
+- [Seamless interoperability with communication packages and hiding communication](#seamless-interoperability-with-communication-packages-and-hiding-communication)
+- [Support for architecture-agnostic low level kernel programming](#support-for-architecture-agnostic-low-level-kernel-programming)
+- [Support for logical arrays of small arrays / structs](#support-for-logical-arrays-of-small-arrays--structs)
+- [Support for automatic differentiation of architecture-agnostic parallel kernels](#support-for-automatic-differentiation-of-architecture-agnostic-parallel-kernels)
+- [Module documentation callable from the Julia REPL / IJulia](#module-documentation-callable-from-the-julia-repl--ijulia)
+- [Concise single/multi-xPU miniapps](#concise-singlemulti-xpu-miniapps)
+  - [Performance metric](#performance-metric)
+  - [Miniapp content](#miniapp-content)
+    - [Thermo-mechanical convection 2-D app](#thermo-mechanical-convection-2-d-app)
+    - [Viscous Stokes 2-D app](#viscous-stokes-2-d-app)
+    - [Viscous Stokes 3-D app](#viscous-stokes-3-d-app)
+    - [Acoustic wave 2-D app](#acoustic-wave-2-d-app)
+    - [Acoustic wave 3-D app](#acoustic-wave-3-d-app)
+    - [Scalar porosity waves 2-D app](#scalar-porosity-waves-2-d-app)
+    - [Hydro-mechanical porosity waves 2-D app](#hydro-mechanical-porosity-waves-2-d-app)
+- [Dependencies](#dependencies)
+- [Installation](#installation)
+- [Questions, comments and discussions](#questions-comments-and-discussions)
+- [Your contributions](#your-contributions)
+- [References](#references)
 
 ## Parallelization and optimization with one macro call
 A simple call to `@parallel` is enough to parallelize and optimize a function and to launch it. The package used underneath for parallelization is defined in a call to `@init_parallel_stencil` beforehand. Supported are [CUDA.jl], [AMDGPU.jl] and [Metal.jl] for running on GPU and [Base.Threads] for CPU. The following example outlines how to run parallel computations on a GPU using the native kernel programming capabilities of [CUDA.jl] underneath (omitted lines are represented with `#(...)`, omitted arguments with `...`):
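For readers without the full README at hand, the pattern that this context paragraph describes can be sketched as follows. This is a minimal, illustrative example assembled from the macros named above (`@init_parallel_stencil`, `@parallel`); the kernel name `step!`, the grid size and the coefficients are hypothetical, not part of this commit:

```julia
# Minimal sketch: backend selection plus a parallelized stencil kernel.
using ParallelStencil
using ParallelStencil.FiniteDifferences3D
@init_parallel_stencil(CUDA, Float64, 3)   # use CUDA.jl underneath; Threads, AMDGPU or Metal work likewise

# `@parallel` turns this into an optimized GPU/CPU kernel (illustrative diffusion step).
@parallel function step!(T2, T, Ci, lam, dt, dx, dy, dz)
    @inn(T2) = @inn(T) + dt*(lam*@inn(Ci)*(@d2_xi(T)/dx^2 + @d2_yi(T)/dy^2 + @d2_zi(T)/dz^2))
    return
end

T  = @rand(64, 64, 64)    # allocation macros place arrays on the selected device
T2 = @zeros(64, 64, 64)
Ci = @ones(64, 64, 64)
@parallel step!(T2, T, Ci, 1.0, 0.1, 1.0, 1.0, 1.0)   # parallelize, optimize and launch
```

The same source runs unchanged on CPU by initializing with `@init_parallel_stencil(Threads, Float64, 3)` instead.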
@@ -318,7 +328,7 @@ import ParallelStencil.AD
 using Enzyme
 #(...)
 @parallel f!(A, B, a) # normal call of f!
-@parallel configcall=f!(A, B, a) AD.autodiff_deferred!(Enzyme.Reverse, f!, DuplicatedNoNeed(A, Ā), DuplicatedNoNeed(B, B̄), Const(a)) # call to the gradient of f!, differentiated with respect to A and B
+@parallel configcall=f!(A, B, a) AD.autodiff_deferred!(Enzyme.Reverse, f!, DuplicatedNoNeed(A, Ā), DuplicatedNoNeed(B, B̄), a) # call to the gradient of f!, differentiated with respect to A and B
 ```
 The submodule `ParallelStencil.AD` contains GPU-compatible wrappers of Enzyme functions (always returning `nothing`, as required by the backend packages [CUDA.jl] and [AMDGPU.jl]); the wrapper `AD.autodiff_deferred!` maps to, e.g., `Enzyme.autodiff_deferred`. The keyword argument `configcall` makes it trivial to call these generic functions for automatic differentiation with the right launch parameters.
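To make the one-line change above self-contained, the surrounding pieces elided with `#(...)` in the diff can be sketched as below. This is a hypothetical completion, assuming a trivial kernel `f!` and CPU execution; only the final `@parallel configcall=...` line mirrors the documented call, and the chosen kernel body and array sizes are illustrative:

```julia
# Hypothetical context for the AD call documented in this commit.
using ParallelStencil
import ParallelStencil.AD
using Enzyme
@init_parallel_stencil(Threads, Float64, 1)   # CPU backend chosen for illustration

@parallel_indices (i) function f!(A, B, a)
    A[i] = a * B[i]^2   # illustrative kernel body
    return
end

a = 2.0
A = @zeros(16); Ā = @ones(16)    # seed the adjoint of A
B = @rand(16);  B̄ = @zeros(16)   # adjoint of B, accumulated by the reverse pass
@parallel f!(A, B, a)                          # normal call of f!
@parallel configcall=f!(A, B, a) AD.autodiff_deferred!(Enzyme.Reverse, f!, DuplicatedNoNeed(A, Ā), DuplicatedNoNeed(B, B̄), a)
```

Note that `configcall=f!(A, B, a)` reuses the launch configuration of the normal `f!` call for the gradient call, as the README text explains.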
