I updated a local copy of GPUArrays.jl from 97bdfdb to f516fc2, and the total test time increased from 4m31s to a whopping 7m32s.
Before:
[ Info: Running 5 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable.
| | ---------------- CPU ---------------- |
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
JLArray/alloc cache (2) | 1.00 | 0.01 | 0.8 | 192.21 | 557.28 |
JLArray/indexing scalar (3) | 5.81 | 0.12 | 2.0 | 1279.38 | 602.16 |
JLArray/indexing find (3) | 7.51 | 0.41 | 5.5 | 2602.77 | 765.70 |
JLArray/reductions/reducedim! (4) | 28.30 | 0.86 | 3.0 | 6716.30 | 662.75 |
JLArray/math/power (6) | 29.88 | 1.32 | 4.4 | 7353.23 | 788.39 |
JLArray/reductions/any all count (4) | 4.22 | 0.10 | 2.4 | 1054.88 | 732.34 |
JLArray/uniformscaling (4) | 4.40 | 0.08 | 1.9 | 685.22 | 768.08 |
JLArray/indexing multidimensional (3) | 24.12 | 0.90 | 3.7 | 4996.91 | 866.39 |
JLArray/linalg/mul!/vector-matrix (2) | 48.91 | 2.53 | 5.2 | 12632.54 | 814.38 |
JLArray/math/intrinsics (2) | 1.55 | 0.00 | 0.0 | 261.91 | 816.22 |
JLArray/reductions/mapreducedim!_large (4) | 16.79 | 6.39 | 38.0 | 33495.93 | 1238.14 |
JLArray/linalg/NaN_false (2) | 9.20 | 0.57 | 6.2 | 2102.50 | 901.80 |
JLArray/statistics (2) | 35.87 | 2.51 | 7.0 | 7449.54 | 988.30 |
JLArray/reductions/minimum maximum extrema (6) | 70.57 | 4.66 | 6.6 | 16305.19 | 992.94 |
JLArray/linalg/mul!/matrix-matrix (3) | 63.02 | 3.71 | 5.9 | 8032.98 | 896.27 |
JLArray/ext/jld2 (3) | 12.90 | 0.66 | 5.1 | 1569.69 | 973.78 |
JLArray/vectors (3) | 0.14 | 0.00 | 0.0 | 20.11 | 1019.12 |
JLArray/constructors (6) | 14.77 | 0.70 | 4.7 | 2026.22 | 1067.75 |
JLArray/linalg/norm (4) | 70.23 | 6.18 | 8.8 | 19389.82 | 1238.14 |
JLArray/random (3) | 11.97 | 0.82 | 6.8 | 1525.67 | 1058.08 |
JLArray/reductions/== isequal (4) | 16.24 | 1.34 | 8.3 | 1812.29 | 1243.11 |
JLArray/base (6) | 34.23 | 6.37 | 18.6 | 10291.93 | 1337.23 |
JLArray/reductions/mapreduce (2) | 54.59 | 4.71 | 8.6 | 8460.67 | 1112.97 |
JLArray/reductions/mapreducedim! (4) | 16.72 | 1.65 | 9.8 | 2492.19 | 1286.08 |
Array/alloc cache (4) | 0.35 | 0.00 | 0.0 | 82.41 | 1317.64 |
Array/indexing scalar (4) | 1.68 | 0.00 | 0.0 | 133.01 | 1317.64 |
Array/reductions/reducedim! (4) | 0.01 | 0.00 | 0.0 | 2.43 | 1317.64 |
JLArray/reductions/sum prod (2) | 37.45 | 2.84 | 7.6 | 4407.01 | 1213.83 |
Array/linalg (4) | 21.69 | 3.05 | 14.0 | 3531.55 | 1424.41 |
Array/math/power (2) | 0.02 | 0.00 | 0.0 | 1.38 | 1245.14 |
JLArray/reductions/reduce (6) | 40.04 | 4.01 | 10.0 | 4899.03 | 1405.31 |
Array/linalg/mul!/vector-matrix (4) | 1.59 | 0.00 | 0.0 | 256.59 | 1424.41 |
Array/reductions/any all count (4) | 0.10 | 0.00 | 0.0 | 16.69 | 1424.41 |
Array/indexing find (2) | 2.44 | 0.00 | 0.0 | 409.17 | 1245.14 |
Array/reductions/minimum maximum extrema (4) | 0.61 | 0.00 | 0.0 | 61.04 | 1424.41 |
Array/reductions/mapreducedim!_large (4) | 0.51 | 0.06 | 12.0 | 825.78 | 1642.48 |
Array/uniformscaling (2) | 3.52 | 0.00 | 0.0 | 314.28 | 1245.14 |
Array/linalg/mul!/matrix-matrix (4) | 1.44 | 0.00 | 0.0 | 157.98 | 1642.48 |
Array/math/intrinsics (2) | 0.00 | 0.00 | 0.0 | 0.03 | 1245.14 |
Array/indexing multidimensional (6) | 8.25 | 0.00 | 0.0 | 1004.00 | 1405.31 |
Array/linalg/NaN_false (4) | 0.63 | 0.00 | 0.0 | 115.89 | 1642.48 |
Array/reductions/mapreduce (4) | 0.17 | 0.00 | 0.0 | 25.50 | 1642.48 |
Array/statistics (6) | 1.19 | 0.00 | 0.0 | 304.55 | 1406.94 |
Array/linalg/norm (2) | 8.27 | 0.00 | 0.0 | 859.41 | 1245.14 |
Array/ext/jld2 (6) | 4.69 | 0.00 | 0.0 | 370.13 | 1443.97 |
Array/vectors (2) | 0.20 | 0.00 | 0.0 | 20.38 | 1279.97 |
Array/constructors (4) | 9.13 | 0.00 | 0.0 | 1047.21 | 1642.48 |
Array/reductions/== isequal (4) | 0.01 | 0.00 | 0.0 | 1.20 | 1642.48 |
Array/random (6) | 4.56 | 0.00 | 0.0 | 352.74 | 1443.97 |
Array/reductions/mapreducedim! (6) | 0.57 | 0.00 | 0.0 | 38.10 | 1443.97 |
Array/reductions/reduce (6) | 0.01 | 0.00 | 0.0 | 2.18 | 1443.97 |
Array/reductions/sum prod (6) | 0.40 | 0.00 | 0.0 | 30.74 | 1443.97 |
JLArray/linalg (5) | 237.95 | 64.24 | 27.0 | 310193.91 | 1913.72 |
Array/broadcasting (4) | 16.89 | 0.59 | 3.5 | 2163.93 | 1642.48 |
Array/base (2) | 22.55 | 7.00 | 31.0 | 8369.43 | 1479.42 |
JLArray/broadcasting (3) | 138.79 | 13.37 | 9.6 | 29606.22 | 1559.17 |
Testing finished in 4 minutes, 31 seconds, 745 milliseconds
Test Summary: | Pass Total Time
Overall | 19257 19257
SUCCESS
Testing GPUArrays tests passed
After:
Running 5 tests in parallel. If this is too many, specify the `--jobs=N` argument to the tests, or set the `JULIA_CPU_THREADS` environment variable.
│ │ ──────────────── CPU ──────────────── │
Test (Worker) │ Time (s) │ GC (s) │ GC % │ Alloc (MB) │ RSS (MB) │
JLArray/linalg/norm (5) │ 59.42 │ 3.09 │ 5.2 │ 21069.36 │ 1069.19 │
JLArray/linalg/core (4) │ 65.88 │ 2.93 │ 4.4 │ 28870.92 │ 1160.56 │
JLArray/broadcasting (3) │ 102.76 │ 4.14 │ 4.0 │ 31210.01 │ 985.23 │
JLArray/reductions/sum prod (4) │ 41.08 │ 1.10 │ 2.7 │ 9050.56 │ 1160.56 │
Array/sparse (5) │ 56.39 │ 2.09 │ 3.7 │ 5549.36 │ 1149.64 │
JLArray/linalg/kron (2) │ 126.50 │ 35.69 │ 28.2 │ 281439.52 │ 1312.09 │
JLArray/base (2) │ 32.07 │ 2.93 │ 9.1 │ 10910.85 │ 1312.09 │
JLArray/linalg/mul!/matrix-matrix (4) │ 70.91 │ 4.56 │ 6.4 │ 7932.93 │ 1160.56 │
JLArray/reductions/minimum maximum extrema (3) │ 83.81 │ 6.40 │ 7.6 │ 15226.11 │ 1190.20 │
JLArray/linalg/mul!/vector-matrix (5) │ 70.96 │ 6.03 │ 8.5 │ 11755.09 │ 1201.41 │
JLArray/reductions/reducedim! (2) │ 34.96 │ 2.91 │ 8.3 │ 5368.52 │ 1312.09 │
JLArray/statistics (4) │ 31.62 │ 2.31 │ 7.3 │ 5396.39 │ 1223.30 │
JLArray/indexing multidimensional (3) │ 36.53 │ 2.25 │ 6.1 │ 4658.91 │ 1359.70 │
JLArray/math/power (2) │ 38.51 │ 3.32 │ 8.6 │ 5912.38 │ 1312.09 │
JLArray/sparse (1) │ 251.77 │ 12.85 │ 5.1 │ 39745.49 │ 1238.30 │
JLArray/reductions/reduce (5) │ 56.04 │ 5.21 │ 9.3 │ 7216.90 │ 1264.22 │
JLArray/random (3) │ 28.96 │ 10.40 │ 35.9 │ 24492.82 │ 1749.39 │
Array/broadcasting (2) │ 20.02 │ 1.07 │ 5.3 │ 2150.61 │ 1312.09 │
JLArray/reductions/mapreduce (4) │ 43.36 │ 3.43 │ 7.9 │ 4818.84 │ 1314.19 │
JLArray/reductions/mapreducedim! (5) │ 19.92 │ 1.26 │ 6.3 │ 2680.92 │ 1366.38 │
JLArray/constructors (1) │ 22.82 │ 2.31 │ 10.1 │ 2523.19 │ 1324.73 │
Array/linalg/core (2) │ 22.04 │ 2.40 │ 10.9 │ 3272.20 │ 1386.05 │
JLArray/reductions/== isequal (4) │ 24.54 │ 2.49 │ 10.1 │ 1876.42 │ 1385.86 │
JLArray/linalg/diagonal (3) │ 28.02 │ 4.08 │ 14.6 │ 5015.29 │ 1749.39 │
JLArray/ext/jld2 (1) │ 27.92 │ 2.34 │ 8.4 │ 1575.89 │ 1326.95 │
JLArray/reductions/mapreducedim!_large (5) │ 34.05 │ 19.74 │ 58.0 │ 33419.02 │ 1731.38 │
Array/indexing multidimensional (4) │ 11.80 │ 0.00 │ 0.0 │ 995.70 │ 1385.86 │
JLArray/indexing find (2) │ 18.85 │ 2.10 │ 11.1 │ 2445.32 │ 1430.20 │
Array/constructors (3) │ 13.17 │ 0.00 │ 0.0 │ 1105.23 │ 1749.39 │
Array/linalg/norm (1) │ 10.90 │ 0.00 │ 0.0 │ 904.70 │ 1437.20 │
Array/indexing find (2) │ 2.05 │ 0.00 │ 0.0 │ 167.49 │ 1430.20 │
Array/ext/jld2 (4) │ 7.48 │ 0.00 │ 0.0 │ 372.85 │ 1490.95 │
JLArray/linalg/NaN_false (5) │ 14.86 │ 1.49 │ 10.1 │ 1952.90 │ 1731.38 │
JLArray/indexing scalar (3) │ 10.09 │ 0.00 │ 0.0 │ 797.48 │ 1749.39 │
Array/linalg/diagonal (2) │ 10.64 │ 0.00 │ 0.0 │ 742.81 │ 1430.20 │
Array/uniformscaling (4) │ 6.27 │ 0.00 │ 0.0 │ 265.56 │ 1490.95 │
Array/linalg/mul!/vector-matrix (3) │ 2.91 │ 0.00 │ 0.0 │ 259.91 │ 1749.39 │
JLArray/reductions/any all count (5) │ 11.56 │ 0.00 │ 0.0 │ 867.47 │ 1731.38 │
JLArray/alloc cache (4) │ 1.41 │ 0.00 │ 0.0 │ 138.24 │ 1490.95 │
JLArray/uniformscaling (2) │ 7.45 │ 0.00 │ 0.0 │ 469.79 │ 1430.20 │
Array/reductions/== isequal (3) │ 0.98 │ 0.00 │ 0.0 │ 33.26 │ 1749.39 │
Array/linalg/mul!/matrix-matrix (5) │ 3.70 │ 0.00 │ 0.0 │ 277.96 │ 1731.38 │
JLArray/math/intrinsics (2) │ 2.17 │ 0.00 │ 0.0 │ 251.88 │ 1430.20 │
Array/random (4) │ 3.31 │ 0.88 │ 26.6 │ 1061.72 │ 1541.16 │
Array/base (1) │ 44.72 │ 19.57 │ 43.8 │ 7169.32 │ 1627.19 │
Array/linalg/NaN_false (3) │ 1.14 │ 0.00 │ 0.0 │ 171.83 │ 1749.39 │
Array/reductions/minimum maximum extrema (5) │ 0.81 │ 0.00 │ 0.0 │ 54.87 │ 1731.38 │
Array/reductions/reduce (2) │ 0.28 │ 0.00 │ 0.0 │ 35.12 │ 1430.20 │
Array/indexing scalar (4) │ 2.30 │ 0.00 │ 0.0 │ 114.52 │ 1541.16 │
Array/vectors (1) │ 0.25 │ 0.00 │ 0.0 │ 18.85 │ 1627.19 │
Array/linalg/kron (5) │ 0.43 │ 0.00 │ 0.0 │ 33.18 │ 1731.38 │
Array/statistics (3) │ 2.94 │ 0.00 │ 0.0 │ 337.54 │ 1749.39 │
Array/reductions/mapreducedim!_large (2) │ 1.58 │ 0.68 │ 43.2 │ 833.03 │ 1430.20 │
Array/reductions/reducedim! (4) │ 0.45 │ 0.00 │ 0.0 │ 30.59 │ 1541.16 │
JLArray/vectors (2) │ 0.37 │ 0.00 │ 0.0 │ 19.74 │ 1430.20 │
Array/alloc cache (5) │ 1.36 │ 0.00 │ 0.0 │ 79.14 │ 1731.38 │
Array/reductions/sum prod (1) │ 1.31 │ 0.00 │ 0.0 │ 71.08 │ 1627.19 │
Array/reductions/mapreducedim! (3) │ 1.37 │ 0.00 │ 0.0 │ 38.47 │ 1749.39 │
Array/reductions/any all count (4) │ 0.84 │ 0.00 │ 0.0 │ 116.10 │ 1541.16 │
Array/math/intrinsics (2) │ 0.10 │ 0.00 │ 0.0 │ 8.21 │ 1430.20 │
Array/reductions/mapreduce (5) │ 0.18 │ 0.00 │ 0.0 │ 19.96 │ 1731.38 │
Array/math/power (1) │ 0.01 │ 0.00 │ 0.0 │ 1.70 │ 1627.19 │
Test Summary: | Pass Total Time
Overall | 24397 24397 7m32.8s
SUCCESS
Testing GPUArrays tests passed
The offenders look to be:
JLArray/linalg/kron (2) │ 126.50 │ 35.69 │ 28.2 │ 281439.52 │ 1312.09 │
JLArray/sparse (1) │ 251.77 │ 12.85 │ 5.1 │ 39745.49 │ 1238.30 │
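To make the comparison less eyeball-driven, here is a rough sketch that parses rows pasted from both runs and ranks tests by how much time they add. The helper names `parse_rows` and `rank_by_added_time` are made up for illustration and are not part of GPUArrays.jl or its test runner; the sample rows are copied from the logs above. Two assumptions are worth stating: a test that is absent from the old run is counted with its full time, and per-test times are summed per worker, so they only loosely track wall-clock time with 5 parallel jobs.

```python
import re

# A few rows copied from the two logs above (old runner uses '|', new one '│').
BEFORE = """
JLArray/linalg (5) | 237.95 | 64.24 | 27.0 | 310193.91 | 1913.72 |
JLArray/broadcasting (3) | 138.79 | 13.37 | 9.6 | 29606.22 | 1559.17 |
JLArray/linalg/norm (4) | 70.23 | 6.18 | 8.8 | 19389.82 | 1238.14 |
"""

AFTER = """
JLArray/linalg/kron (2) │ 126.50 │ 35.69 │ 28.2 │ 281439.52 │ 1312.09 │
JLArray/sparse (1) │ 251.77 │ 12.85 │ 5.1 │ 39745.49 │ 1238.30 │
JLArray/broadcasting (3) │ 102.76 │ 4.14 │ 4.0 │ 31210.01 │ 985.23 │
JLArray/linalg/norm (5) │ 59.42 │ 3.09 │ 5.2 │ 21069.36 │ 1069.19 │
"""


def parse_rows(text):
    """Map test name -> (time in s, CPU alloc in MB) for each table row."""
    rows = {}
    for line in text.splitlines():
        cells = [c.strip() for c in re.split(r"[|│]", line)]
        if len(cells) < 6:
            continue  # blank line or the short header line
        # Strip the trailing "(worker id)"; the "Test (Worker)" header does not match.
        m = re.match(r"(.+?)\s+\(\d+\)$", cells[0])
        if m:
            rows[m.group(1)] = (float(cells[1]), float(cells[4]))
    return rows


def rank_by_added_time(before, after):
    """Return (name, seconds added) pairs, largest increase first.

    Assumption: a test missing from `before` is charged its full time.
    """
    deltas = []
    for name, (time_s, _alloc_mb) in after.items():
        old_time = before.get(name, (0.0, 0.0))[0]
        deltas.append((name, round(time_s - old_time, 2)))
    return sorted(deltas, key=lambda item: item[1], reverse=True)


ranked = rank_by_added_time(parse_rows(BEFORE), parse_rows(AFTER))
for name, delta in ranked:
    print(f"{delta:+9.2f} s  {name}")
```

One caveat with matching by name: the old run has a single `JLArray/linalg` entry, while the new run reports `linalg/core`, `linalg/kron` and `linalg/diagonal` separately, so a test that was split or renamed will also show up as "new" here.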
On CUDA.jl I also noticed that the kron test's memory usage is excessive (about 5.4 GB allocated on the GPU and 32.5 GB on the CPU):
| | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
gpuarrays/linalg/kron (24) | 122.93 | 0.03 | 0.0 | 5381.25 | 406.00 | 12.56 | 10.2 | 32525.51 | 2849.95 |