|
1 | 1 | # Changelog |
2 | 2 |
|
3 | | -## [Unreleased] - 2025-?? |
| 3 | +## [1.10.0] - 2025-11-02 |
4 | 4 |
|
5 | 5 | ### Added |
6 | 6 |
|
|
11 | 11 | ([GH-886](https://github.com/NVIDIA/warp/issues/886)). |
12 | 12 | - Add support for negative indexing and improve slicing for the `wp.array()` type |
13 | 13 | ([GH-504](https://github.com/NVIDIA/warp/issues/504)). |
14 | | -- Add support for composite type tile indexed assignment and extraction ([GH-941](https://github.com/NVIDIA/warp/issues/941)). |
15 | | -- Add `warp/examples/tile/example_tile_mcgp.py`, demonstrating how to implement a Monte Carlo Laplace solver. |
16 | | -- Add `wp.tile_full()` builtin, which fills a tile with a constant value. |
| 14 | +- Add `wp.cast()` to reinterpret a value as a different type while preserving its bit pattern |
| 15 | + ([GH-789](https://github.com/NVIDIA/warp/issues/789)). |
| 16 | +- Add support for error functions: `wp.erf()`, `wp.erfc()`, `wp.erfinv()`, and `wp.erfcinv()` |
| 17 | + ([GH-910](https://github.com/NVIDIA/warp/issues/910)). |
| 18 | +- Add `wp.tile_full()`, which fills a tile with a constant value ([GH-973](https://github.com/NVIDIA/warp/issues/973)). |
| 19 | +- Add axis-reduction overloads for `wp.tile_reduce()` and `wp.tile_sum()` |
| 20 | + ([GH-835](https://github.com/NVIDIA/warp/issues/835)). |
| 21 | +- Add support for component-level indexing and assignment on tiles of composite types (e.g. `tile[i][1]` for |
| 22 | + extracting vector components, `tile[i][1, 1]` for matrix elements) |
| 23 | + ([GH-941](https://github.com/NVIDIA/warp/issues/941)). |
| 24 | +- Add `warp/examples/tile/example_tile_mcgp.py`, demonstrating how to implement a Monte Carlo Laplace solver. |
17 | 25 | - Add support for recording and waiting for external events in CUDA graphs |
18 | 26 | ([GH-983](https://github.com/NVIDIA/warp/issues/983)). |
19 | | -- Add kernel-level functions `bsr_row_index()` and `bsr_block_index()` to `warp.sparse` |
20 | | - ([GH-895](https://github.com/NVIDIA/warp/issues/895)). |
21 | 27 | - Add support for querying CPU memory information (requires `psutil` package) |
22 | 28 | ([GH-985](https://github.com/NVIDIA/warp/issues/985)). |
23 | | -- Add support for limiting the graph cache size of JAX callables ([GH-989](https://github.com/NVIDIA/warp/issues/989)). |
24 | | -- Add support for JAX pmap ([GH-976](https://github.com/NVIDIA/warp/pull/976)). |
25 | | -- Add support for `wp.erf()`, `wp.erfc()`, `wp.erfinv()`, and `wp.erfcinv()` ([GH-910](https://github.com/NVIDIA/warp/issues/910)). |
26 | | -- Add axis reduction overloads for `wp.tile_reduce()` and `wp.tile_sum()` |
27 | | - ([GH-835](https://github.com/NVIDIA/warp/issues/835)). |
28 | | -- Add adjoint for `wp.transform()` when constructing with individual scalars ([GH-1011](https://github.com/NVIDIA/warp/issues/1011)). |
29 | | -- Add a double precision overload for `wp.intersect_tri_tri` ([GH-1015](https://github.com/NVIDIA/warp/issues/1015)). |
30 | 29 | - Add `wp.get_cuda_supported_archs()` to query supported CUDA compute architectures for compilation targets |
31 | 30 | ([GH-964](https://github.com/NVIDIA/warp/issues/964)). |
32 | | -- Add `wp.cast()` to reinterpret a value as a different type while preserving its bit pattern |
33 | | - ([GH-789](https://github.com/NVIDIA/warp/issues/789)). |
34 | 31 | - Add runtime version verification to detect native library mismatches. |
35 | 32 | Version mismatches trigger warnings but allow execution to continue |
36 | 33 | ([GH-1018](https://github.com/NVIDIA/warp/issues/1018)). |
| 34 | +- Add kernel-level functions `bsr_row_index()` and `bsr_block_index()` to `warp.sparse` |
| 35 | + ([GH-895](https://github.com/NVIDIA/warp/issues/895)). |
| 36 | +- Add adjoint for `wp.transform()` when constructing with individual scalars |
| 37 | + ([GH-1011](https://github.com/NVIDIA/warp/issues/1011)). |
| 38 | +- Add a double-precision overload for `wp.intersect_tri_tri()` ([GH-1015](https://github.com/NVIDIA/warp/issues/1015)). |
| 39 | +- Add support for `jax.pmap()` ([GH-976](https://github.com/NVIDIA/warp/pull/976)). |
37 | 40 | - Add automatic differentiation support with `jax_kernel(enable_backward=True)` |
38 | 41 | ([GH-912](https://github.com/NVIDIA/warp/pull/912), [GH-515](https://github.com/NVIDIA/warp/issues/515)). |
| 42 | +- Add support for limiting the graph cache size of JAX callables ([GH-989](https://github.com/NVIDIA/warp/issues/989)). |
| 43 | +- Add PyTorch-Warp interop deferred gradient allocation case study to documentation |
| 44 | + ([GH-1046](https://github.com/NVIDIA/warp/issues/1046)). |
39 | 45 |
|
40 | 46 | ### Removed |
41 | 47 |
|
42 | 48 | - Remove `warp.sim` module and related examples. This module has been superseded by the Newton library, a separate |
43 | 49 | package with a new API. For migration guidance, see the |
44 | 50 | [Newton migration guide](https://newton-physics.github.io/newton/migration.html) and the original GitHub announcement |
45 | 51 | ([GH-735](https://github.com/NVIDIA/warp/discussions/735)). |
46 | | -- Remove support for passing lists, tuples, and other non-Warp array arguments when calling built-ins at the Python scope |
47 | | - (e.g: `wp.normalize([1.0, 2.0, 3.0])` should be written as `wp.normalize(wp.vec3(1.0, 2.0, 3.0))`). |
48 | | -- Remove support for Intel-based macOS (x86_64). Apple Silicon-based Macs (ARM64) remain fully supported. |
49 | | - Users attempting to run Warp on Intel Macs will receive a `RuntimeError` directing them to use Warp 1.9.x or earlier |
50 | | - ([GH-1016](https://github.com/NVIDIA/warp/issues/1016)) |
| 52 | +- Remove support for passing lists, tuples, and other non-Warp array arguments when calling built-ins at the Python |
| 53 | + scope (deprecated since v0.11.0). Use explicit type constructors instead (e.g., `wp.normalize([1.0, 2.0, 3.0])` |
| 54 | + should be `wp.normalize(wp.vec3(1.0, 2.0, 3.0))`). |
| 55 | +- Remove support for Intel-based macOS (x86_64). Apple Silicon-based Macs (ARM64) continue to be supported with the CPU |
| 56 | + backend. Users on Intel Macs will receive a `RuntimeError` directing them to use Warp 1.9.x or earlier |
| 57 | + ([GH-1016](https://github.com/NVIDIA/warp/issues/1016)). |
51 | 58 | - Remove `wp.select()` (deprecated since 1.7). Use `wp.where(cond, value_if_true, value_if_false)` instead. |
52 | 59 | - Remove the `wp.matrix(pos, quat, scale)` built-in function. Use `wp.transform_compose()` instead |
53 | 60 | ([GH-980](https://github.com/NVIDIA/warp/issues/980)). |
54 | 61 |
|
55 | 62 | ### Deprecated |
56 | 63 |
|
57 | | -- Deprecate constructing a matrix from vectors at the Python scope (e.g.: `wp.mat22(wp.vec2(1, 2), wp.vec2(3, 4))` should become `wp.matrix_from_rows(wp.vec2(1, 2), wp.vec2(3, 4))`) ([GH-981](https://github.com/NVIDIA/warp/issues/981)). |
| 64 | +- Deprecate constructing a matrix from vectors at the Python scope (e.g. `wp.mat22(wp.vec2(1, 2), wp.vec2(3, 4))` |
| 65 | + should become `wp.matrix_from_rows(wp.vec2(1, 2), wp.vec2(3, 4))`) |
| 66 | + ([GH-981](https://github.com/NVIDIA/warp/issues/981)). |
58 | 67 |
|
59 | 68 | ### Changed |
60 | 69 |
|
61 | | -- Improve efficiency for `wp.bvh_query_aabb()`, `wp.mesh_query_aabb()` and `wp.bvh_query_ray()`. |
62 | | - This fixes a performance regression introduced in Warp 1.6.0 ([GH-758](https://github.com/NVIDIA/warp/issues/758)). |
| 70 | +- **Breaking:** Change the default implementation of `jax_kernel()` to be `wp.jax_experimental.ffi.jax_kernel()`. |
| 71 | + The previous version is still available as `wp.jax_experimental.custom_call.jax_kernel()`, but it is not supported |
| 72 | + with JAX v0.8 and newer ([GH-974](https://github.com/NVIDIA/warp/issues/974)). |
| 73 | +- **Breaking:** Raise `RuntimeError` from `wp.load_module()` when attempting to load a module that does not contain |
| 74 | + any Warp kernels, functions, or structs ([GH-920](https://github.com/NVIDIA/warp/issues/920)). |
| 75 | +- Improve performance when calling built-in functions from the Python scope |
| 76 | + ([GH-801](https://github.com/NVIDIA/warp/issues/801)). |
63 | 77 | - Improve efficiency of struct instance creation and attribute access ([GH-968](https://github.com/NVIDIA/warp/issues/968)). |
| 78 | +- Add `leaf_size` parameter to `wp.Bvh` and `bvh_leaf_size` to `wp.Mesh` to control the number of primitives per leaf |
| 79 | + for performance tuning. The default is now 1 for `wp.Bvh` and 4 for `wp.Mesh`, changed from a hardcoded value of |
| 80 | + 4 ([GH-994](https://github.com/NVIDIA/warp/issues/994)). |
64 | 81 | - Make `warp.sparse` operations with `masked=True` consistent with `bsr_mm()` by preserving result matrix topology, |
65 | 82 | enabling CUDA subgraph capture for `bsr_axpy()`, `bsr_assign()` and `bsr_set_transpose()` |
66 | 83 | ([GH-987](https://github.com/NVIDIA/warp/issues/987)). |
67 | | -- Add `max_new_nnz` argument to `wp.sparse.bsr_mm` providing a synchronization-free path without further assumptions about non-zero topology. |
68 | | -- Improve performance when calling built-in functions from the Python scope |
69 | | - ([GH-801](https://github.com/NVIDIA/warp/issues/801)). |
70 | | -- Building `warp.fem` geometry and function space partitions is now possible in CUDA graphs by passing an explicit upper-bound for the number of cells and nodes to `ExplicitGeometryPartition` and `make_space_partition`. Additionally, building fields and field restrictions is now synchronization-free by default ([GH-1021](https://github.com/NVIDIA/warp/issues/1021)). |
71 | | -- Raise `RuntimeError` from `wp.load_module()` when attempting to load a module that does not contain any Warp kernels, |
72 | | - functions, or structs ([GH-920](https://github.com/NVIDIA/warp/issues/920)). |
| 84 | +- Add `max_new_nnz` argument to `wp.sparse.bsr_mm()` providing a synchronization-free path without further assumptions |
| 85 | + about non-zero topology. |
| 86 | +- Building `warp.fem` geometry and function space partitions is now possible in CUDA graphs by passing an explicit |
| 87 | + upper-bound for the number of cells and nodes to `ExplicitGeometryPartition` and `make_space_partition`. |
| 88 | + Building fields and field restrictions is now synchronization-free by default |
| 89 | + ([GH-1021](https://github.com/NVIDIA/warp/issues/1021)). |
73 | 90 | - Default the `q` argument in `wp.transform()` to the identity quaternion at the kernel scope |
74 | 91 | ([GH-923](https://github.com/NVIDIA/warp/issues/923)). |
75 | | -- Add `leaf_size` parameter to `wp.Bvh` and `bvh_leaf_size` to `wp.Mesh` to control the number of primitives per leaf |
76 | | - for performance tuning. The default is now 1 for `wp.Bvh` and 4 for `wp.Mesh`, changed from a hardcoded value of |
77 | | - 4 ([GH-994](https://github.com/NVIDIA/warp/issues/994)). |
78 | | -- **Breaking:** Change the default implementation of `jax_kernel()` to be `wp.jax_experimental.ffi.jax_kernel()`. |
79 | | - The previous version is still available as `wp.jax_experimental.custom_call.jax_kernel()`, but it is not supported with JAX v0.8 and newer |
80 | | - ([GH-974](https://github.com/NVIDIA/warp/issues/974)). |
| 92 | +- Improve efficiency for `wp.bvh_query_aabb()`, `wp.mesh_query_aabb()` and `wp.bvh_query_ray()`. |
| 93 | + This fixes a performance regression introduced in Warp 1.6.0 ([GH-758](https://github.com/NVIDIA/warp/issues/758)). |
81 | 94 |
|
82 | 95 | ### Fixed |
83 | 96 |
|
| 97 | +- Fix segmentation faults on AArch64 CPUs when using tiles. The fix uses stack memory for tile storage |
| 98 | + and is controlled by `wp.config.enable_tiles_in_stack_memory` (enabled by default) |
| 99 | + ([GH-957](https://github.com/NVIDIA/warp/issues/957)). |
84 | 100 | - Fix copying and filling arrays with large strides ([GH-929](https://github.com/NVIDIA/warp/issues/929)). |
85 | | -- Fix graph deletion during capture ([GH-992](https://github.com/NVIDIA/warp/issues/992)). |
86 | | -- Fix return type annotations for `struct()` and `overload()` decorators ([GH-971](https://github.com/NVIDIA/warp/pull/971)) |
87 | | -- Fix segmentation faults on AArch64 CPUs caused by referencing static memory. The LLVM JIT generates ADRP instructions |
88 | | - to address memory up to 4 GiB from the program counter, but the section for static memory may be further apart than |
89 | | - that. Work around it by reserving stack memory on kernel entry, tracked through the x28 register which is prevented |
90 | | - from being used as a scratch register. `wp.config.enable_tiles_in_stack_memory` can be used to enable (default) |
91 | | - or disable this new method ([GH-957](https://github.com/NVIDIA/warp/issues/957)). |
92 | | -- Fix arithmetic operators not working when a scalar is on the lhs and an array on the rhs |
93 | | - ([GH-892](https://github.com/NVIDIA/warp/issues/892)). |
94 | | -- Fix invalid keyword arguments not being detected in the `wp.transform()` constructor at Python scope |
| 101 | +- Fix incorrect results when filling arrays in CUDA graphs ([GH-1040](https://github.com/NVIDIA/warp/issues/1040)). |
| 102 | +- Defer CUDA graph deletion when graph captures are in progress ([GH-992](https://github.com/NVIDIA/warp/issues/992)). |
| 103 | +- Fix race conditions in CUDA graph destruction callbacks ([GH-1063](https://github.com/NVIDIA/warp/issues/1063)). |
| 104 | +- Fix arithmetic operators with scalars and arrays at the Python scope. Operations like `scalar * array` |
| 105 | + now work correctly (previously only `array * scalar` worked) ([GH-892](https://github.com/NVIDIA/warp/issues/892)). |
| 106 | +- Fix `wp.atomic_add()` failing to accumulate `wp.int64` values ([GH-977](https://github.com/NVIDIA/warp/issues/977)). |
| 107 | +- Fix handling of multi-line lambda expressions and lambda expressions involving parentheses in `wp.map()` |
| 108 | + ([GH-984](https://github.com/NVIDIA/warp/issues/984)). |
| 109 | +- Fix invalid keyword arguments not being detected in the `wp.transform()` constructor at the Python scope |
95 | 110 | ([GH-975](https://github.com/NVIDIA/warp/issues/975)). |
| 111 | +- Fix return type annotations for `struct()` and `overload()` decorators |
| 112 | + ([GH-971](https://github.com/NVIDIA/warp/pull/971)). |
| 113 | +- Suppress `TypeError` and `AttributeError` exceptions during Python interpreter shutdown when Warp objects are being |
| 114 | + cleaned up, as these can be safely ignored during process termination |
| 115 | + ([GH-1048](https://github.com/NVIDIA/warp/issues/1048)). |
96 | 116 |
|
97 | 117 | ## [1.9.1] - 2025-10-01 |
98 | 118 |
|
|
123 | 143 | - Fix handling of generic kernels with `wp.jax_experimental.ffi.jax_kernel()`. |
124 | 144 | - Update built-in documentation to accurately reflect their differentiability status |
125 | 145 | ([GH-970](https://github.com/NVIDIA/warp/issues/970)). |
126 | | -- Fix handling of multi-line lambda expressions and lambda expressions involving parentheses in `wp.map()` ([GH-984](https://github.com/NVIDIA/warp/issues/984)). |
127 | | -- Fix `wp.atomic_add()` for int64 type ([GH-977](https://github.com/NVIDIA/warp/issues/977)) |
128 | 146 |
|
129 | 147 | ## [1.9.0] - 2025-09-04 |
130 | 148 |
|
|
217 | 235 | - Fix adding superfluous inactive nodes to tetrahedron polynomial function spaces in `warp.fem`. |
218 | 236 | - Fix `#line` directives for Python↔CUDA source correlation not being emitted by default when a module is compiled in |
219 | 237 | debug mode ([GH-901](https://github.com/NVIDIA/warp/issues/901)). |
220 | | -- Fix 2D shared tile allocation/de-allocation bug inside Warp functions ([GH-877](https://github.com/NVIDIA/warp/issues/877)). |
221 | | -- Fix loading "unique" modules using `wp.load_module()`. |
222 | 238 |
|
223 | 239 | ## [1.8.1] - 2025-08-01 |
224 | 240 |
|
|
1923 | 1939 |
|
1924 | 1940 | - Initial publish for alpha testing |
1925 | 1941 |
|
1926 | | -[Unreleased]: https://github.com/NVIDIA/warp/compare/v1.9.0...HEAD |
| 1942 | +[1.10.0]: https://github.com/NVIDIA/warp/releases/tag/v1.10.0 |
| 1943 | +[1.9.1]: https://github.com/NVIDIA/warp/releases/tag/v1.9.1 |
1927 | 1944 | [1.9.0]: https://github.com/NVIDIA/warp/releases/tag/v1.9.0 |
1928 | 1945 | [1.8.1]: https://github.com/NVIDIA/warp/releases/tag/v1.8.1 |
1929 | 1946 | [1.8.0]: https://github.com/NVIDIA/warp/releases/tag/v1.8.0 |
|