Test times increased significantly #683

@maleadt

I updated a local copy of GPUArrays.jl from 97bdfdb to f516fc2, and the total test time increased from 4m31s to a whopping 7m32s.

Before:

[ Info: Running 5 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable.
                                                |          | ---------------- CPU ---------------- |
Test                                   (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
JLArray/alloc cache                         (2) |     1.00 |   0.01 |  0.8 |     192.21 |   557.28 |
JLArray/indexing scalar                     (3) |     5.81 |   0.12 |  2.0 |    1279.38 |   602.16 |
JLArray/indexing find                       (3) |     7.51 |   0.41 |  5.5 |    2602.77 |   765.70 |
JLArray/reductions/reducedim!               (4) |    28.30 |   0.86 |  3.0 |    6716.30 |   662.75 |
JLArray/math/power                          (6) |    29.88 |   1.32 |  4.4 |    7353.23 |   788.39 |
JLArray/reductions/any all count            (4) |     4.22 |   0.10 |  2.4 |    1054.88 |   732.34 |
JLArray/uniformscaling                      (4) |     4.40 |   0.08 |  1.9 |     685.22 |   768.08 |
JLArray/indexing multidimensional           (3) |    24.12 |   0.90 |  3.7 |    4996.91 |   866.39 |
JLArray/linalg/mul!/vector-matrix           (2) |    48.91 |   2.53 |  5.2 |   12632.54 |   814.38 |
JLArray/math/intrinsics                     (2) |     1.55 |   0.00 |  0.0 |     261.91 |   816.22 |
JLArray/reductions/mapreducedim!_large      (4) |    16.79 |   6.39 | 38.0 |   33495.93 |  1238.14 |
JLArray/linalg/NaN_false                    (2) |     9.20 |   0.57 |  6.2 |    2102.50 |   901.80 |
JLArray/statistics                          (2) |    35.87 |   2.51 |  7.0 |    7449.54 |   988.30 |
JLArray/reductions/minimum maximum extrema  (6) |    70.57 |   4.66 |  6.6 |   16305.19 |   992.94 |
JLArray/linalg/mul!/matrix-matrix           (3) |    63.02 |   3.71 |  5.9 |    8032.98 |   896.27 |
JLArray/ext/jld2                            (3) |    12.90 |   0.66 |  5.1 |    1569.69 |   973.78 |
JLArray/vectors                             (3) |     0.14 |   0.00 |  0.0 |      20.11 |  1019.12 |
JLArray/constructors                        (6) |    14.77 |   0.70 |  4.7 |    2026.22 |  1067.75 |
JLArray/linalg/norm                         (4) |    70.23 |   6.18 |  8.8 |   19389.82 |  1238.14 |
JLArray/random                              (3) |    11.97 |   0.82 |  6.8 |    1525.67 |  1058.08 |
JLArray/reductions/== isequal               (4) |    16.24 |   1.34 |  8.3 |    1812.29 |  1243.11 |
JLArray/base                                (6) |    34.23 |   6.37 | 18.6 |   10291.93 |  1337.23 |
JLArray/reductions/mapreduce                (2) |    54.59 |   4.71 |  8.6 |    8460.67 |  1112.97 |
JLArray/reductions/mapreducedim!            (4) |    16.72 |   1.65 |  9.8 |    2492.19 |  1286.08 |
Array/alloc cache                           (4) |     0.35 |   0.00 |  0.0 |      82.41 |  1317.64 |
Array/indexing scalar                       (4) |     1.68 |   0.00 |  0.0 |     133.01 |  1317.64 |
Array/reductions/reducedim!                 (4) |     0.01 |   0.00 |  0.0 |       2.43 |  1317.64 |
JLArray/reductions/sum prod                 (2) |    37.45 |   2.84 |  7.6 |    4407.01 |  1213.83 |
Array/linalg                                (4) |    21.69 |   3.05 | 14.0 |    3531.55 |  1424.41 |
Array/math/power                            (2) |     0.02 |   0.00 |  0.0 |       1.38 |  1245.14 |
JLArray/reductions/reduce                   (6) |    40.04 |   4.01 | 10.0 |    4899.03 |  1405.31 |
Array/linalg/mul!/vector-matrix             (4) |     1.59 |   0.00 |  0.0 |     256.59 |  1424.41 |
Array/reductions/any all count              (4) |     0.10 |   0.00 |  0.0 |      16.69 |  1424.41 |
Array/indexing find                         (2) |     2.44 |   0.00 |  0.0 |     409.17 |  1245.14 |
Array/reductions/minimum maximum extrema    (4) |     0.61 |   0.00 |  0.0 |      61.04 |  1424.41 |
Array/reductions/mapreducedim!_large        (4) |     0.51 |   0.06 | 12.0 |     825.78 |  1642.48 |
Array/uniformscaling                        (2) |     3.52 |   0.00 |  0.0 |     314.28 |  1245.14 |
Array/linalg/mul!/matrix-matrix             (4) |     1.44 |   0.00 |  0.0 |     157.98 |  1642.48 |
Array/math/intrinsics                       (2) |     0.00 |   0.00 |  0.0 |       0.03 |  1245.14 |
Array/indexing multidimensional             (6) |     8.25 |   0.00 |  0.0 |    1004.00 |  1405.31 |
Array/linalg/NaN_false                      (4) |     0.63 |   0.00 |  0.0 |     115.89 |  1642.48 |
Array/reductions/mapreduce                  (4) |     0.17 |   0.00 |  0.0 |      25.50 |  1642.48 |
Array/statistics                            (6) |     1.19 |   0.00 |  0.0 |     304.55 |  1406.94 |
Array/linalg/norm                           (2) |     8.27 |   0.00 |  0.0 |     859.41 |  1245.14 |
Array/ext/jld2                              (6) |     4.69 |   0.00 |  0.0 |     370.13 |  1443.97 |
Array/vectors                               (2) |     0.20 |   0.00 |  0.0 |      20.38 |  1279.97 |
Array/constructors                          (4) |     9.13 |   0.00 |  0.0 |    1047.21 |  1642.48 |
Array/reductions/== isequal                 (4) |     0.01 |   0.00 |  0.0 |       1.20 |  1642.48 |
Array/random                                (6) |     4.56 |   0.00 |  0.0 |     352.74 |  1443.97 |
Array/reductions/mapreducedim!              (6) |     0.57 |   0.00 |  0.0 |      38.10 |  1443.97 |
Array/reductions/reduce                     (6) |     0.01 |   0.00 |  0.0 |       2.18 |  1443.97 |
Array/reductions/sum prod                   (6) |     0.40 |   0.00 |  0.0 |      30.74 |  1443.97 |
JLArray/linalg                              (5) |   237.95 |  64.24 | 27.0 |  310193.91 |  1913.72 |
Array/broadcasting                          (4) |    16.89 |   0.59 |  3.5 |    2163.93 |  1642.48 |
Array/base                                  (2) |    22.55 |   7.00 | 31.0 |    8369.43 |  1479.42 |
JLArray/broadcasting                        (3) |   138.79 |  13.37 |  9.6 |   29606.22 |  1559.17 |
Testing finished in 4 minutes, 31 seconds, 745 milliseconds

Test Summary: |  Pass  Total  Time
  Overall     | 19257  19257
    SUCCESS
     Testing GPUArrays tests passed

After:

Running 5 tests in parallel. If this is too many, specify the `--jobs=N` argument to the tests, or set the `JULIA_CPU_THREADS` environment variable.
                                                 │          │ ──────────────── CPU ──────────────── │
Test                                    (Worker) │ Time (s) │ GC (s) │ GC % │ Alloc (MB) │ RSS (MB) │
JLArray/linalg/norm                          (5) │    59.42 │   3.09 │  5.2 │   21069.36 │  1069.19 │
JLArray/linalg/core                          (4) │    65.88 │   2.93 │  4.4 │   28870.92 │  1160.56 │
JLArray/broadcasting                         (3) │   102.76 │   4.14 │  4.0 │   31210.01 │   985.23 │
JLArray/reductions/sum prod                  (4) │    41.08 │   1.10 │  2.7 │    9050.56 │  1160.56 │
Array/sparse                                 (5) │    56.39 │   2.09 │  3.7 │    5549.36 │  1149.64 │
JLArray/linalg/kron                          (2) │   126.50 │  35.69 │ 28.2 │  281439.52 │  1312.09 │
JLArray/base                                 (2) │    32.07 │   2.93 │  9.1 │   10910.85 │  1312.09 │
JLArray/linalg/mul!/matrix-matrix            (4) │    70.91 │   4.56 │  6.4 │    7932.93 │  1160.56 │
JLArray/reductions/minimum maximum extrema   (3) │    83.81 │   6.40 │  7.6 │   15226.11 │  1190.20 │
JLArray/linalg/mul!/vector-matrix            (5) │    70.96 │   6.03 │  8.5 │   11755.09 │  1201.41 │
JLArray/reductions/reducedim!                (2) │    34.96 │   2.91 │  8.3 │    5368.52 │  1312.09 │
JLArray/statistics                           (4) │    31.62 │   2.31 │  7.3 │    5396.39 │  1223.30 │
JLArray/indexing multidimensional            (3) │    36.53 │   2.25 │  6.1 │    4658.91 │  1359.70 │
JLArray/math/power                           (2) │    38.51 │   3.32 │  8.6 │    5912.38 │  1312.09 │
JLArray/sparse                               (1) │   251.77 │  12.85 │  5.1 │   39745.49 │  1238.30 │
JLArray/reductions/reduce                    (5) │    56.04 │   5.21 │  9.3 │    7216.90 │  1264.22 │
JLArray/random                               (3) │    28.96 │  10.40 │ 35.9 │   24492.82 │  1749.39 │
Array/broadcasting                           (2) │    20.02 │   1.07 │  5.3 │    2150.61 │  1312.09 │
JLArray/reductions/mapreduce                 (4) │    43.36 │   3.43 │  7.9 │    4818.84 │  1314.19 │
JLArray/reductions/mapreducedim!             (5) │    19.92 │   1.26 │  6.3 │    2680.92 │  1366.38 │
JLArray/constructors                         (1) │    22.82 │   2.31 │ 10.1 │    2523.19 │  1324.73 │
Array/linalg/core                            (2) │    22.04 │   2.40 │ 10.9 │    3272.20 │  1386.05 │
JLArray/reductions/== isequal                (4) │    24.54 │   2.49 │ 10.1 │    1876.42 │  1385.86 │
JLArray/linalg/diagonal                      (3) │    28.02 │   4.08 │ 14.6 │    5015.29 │  1749.39 │
JLArray/ext/jld2                             (1) │    27.92 │   2.34 │  8.4 │    1575.89 │  1326.95 │
JLArray/reductions/mapreducedim!_large       (5) │    34.05 │  19.74 │ 58.0 │   33419.02 │  1731.38 │
Array/indexing multidimensional              (4) │    11.80 │   0.00 │  0.0 │     995.70 │  1385.86 │
JLArray/indexing find                        (2) │    18.85 │   2.10 │ 11.1 │    2445.32 │  1430.20 │
Array/constructors                           (3) │    13.17 │   0.00 │  0.0 │    1105.23 │  1749.39 │
Array/linalg/norm                            (1) │    10.90 │   0.00 │  0.0 │     904.70 │  1437.20 │
Array/indexing find                          (2) │     2.05 │   0.00 │  0.0 │     167.49 │  1430.20 │
Array/ext/jld2                               (4) │     7.48 │   0.00 │  0.0 │     372.85 │  1490.95 │
JLArray/linalg/NaN_false                     (5) │    14.86 │   1.49 │ 10.1 │    1952.90 │  1731.38 │
JLArray/indexing scalar                      (3) │    10.09 │   0.00 │  0.0 │     797.48 │  1749.39 │
Array/linalg/diagonal                        (2) │    10.64 │   0.00 │  0.0 │     742.81 │  1430.20 │
Array/uniformscaling                         (4) │     6.27 │   0.00 │  0.0 │     265.56 │  1490.95 │
Array/linalg/mul!/vector-matrix              (3) │     2.91 │   0.00 │  0.0 │     259.91 │  1749.39 │
JLArray/reductions/any all count             (5) │    11.56 │   0.00 │  0.0 │     867.47 │  1731.38 │
JLArray/alloc cache                          (4) │     1.41 │   0.00 │  0.0 │     138.24 │  1490.95 │
JLArray/uniformscaling                       (2) │     7.45 │   0.00 │  0.0 │     469.79 │  1430.20 │
Array/reductions/== isequal                  (3) │     0.98 │   0.00 │  0.0 │      33.26 │  1749.39 │
Array/linalg/mul!/matrix-matrix              (5) │     3.70 │   0.00 │  0.0 │     277.96 │  1731.38 │
JLArray/math/intrinsics                      (2) │     2.17 │   0.00 │  0.0 │     251.88 │  1430.20 │
Array/random                                 (4) │     3.31 │   0.88 │ 26.6 │    1061.72 │  1541.16 │
Array/base                                   (1) │    44.72 │  19.57 │ 43.8 │    7169.32 │  1627.19 │
Array/linalg/NaN_false                       (3) │     1.14 │   0.00 │  0.0 │     171.83 │  1749.39 │
Array/reductions/minimum maximum extrema     (5) │     0.81 │   0.00 │  0.0 │      54.87 │  1731.38 │
Array/reductions/reduce                      (2) │     0.28 │   0.00 │  0.0 │      35.12 │  1430.20 │
Array/indexing scalar                        (4) │     2.30 │   0.00 │  0.0 │     114.52 │  1541.16 │
Array/vectors                                (1) │     0.25 │   0.00 │  0.0 │      18.85 │  1627.19 │
Array/linalg/kron                            (5) │     0.43 │   0.00 │  0.0 │      33.18 │  1731.38 │
Array/statistics                             (3) │     2.94 │   0.00 │  0.0 │     337.54 │  1749.39 │
Array/reductions/mapreducedim!_large         (2) │     1.58 │   0.68 │ 43.2 │     833.03 │  1430.20 │
Array/reductions/reducedim!                  (4) │     0.45 │   0.00 │  0.0 │      30.59 │  1541.16 │
JLArray/vectors                              (2) │     0.37 │   0.00 │  0.0 │      19.74 │  1430.20 │
Array/alloc cache                            (5) │     1.36 │   0.00 │  0.0 │      79.14 │  1731.38 │
Array/reductions/sum prod                    (1) │     1.31 │   0.00 │  0.0 │      71.08 │  1627.19 │
Array/reductions/mapreducedim!               (3) │     1.37 │   0.00 │  0.0 │      38.47 │  1749.39 │
Array/reductions/any all count               (4) │     0.84 │   0.00 │  0.0 │     116.10 │  1541.16 │
Array/math/intrinsics                        (2) │     0.10 │   0.00 │  0.0 │       8.21 │  1430.20 │
Array/reductions/mapreduce                   (5) │     0.18 │   0.00 │  0.0 │      19.96 │  1731.38 │
Array/math/power                             (1) │     0.01 │   0.00 │  0.0 │       1.70 │  1627.19 │

Test Summary: |  Pass  Total     Time
  Overall     | 24397  24397  7m32.8s
    SUCCESS
     Testing GPUArrays tests passed

The offenders look to be:

JLArray/linalg/kron                          (2) │   126.50 │  35.69 │ 28.2 │  281439.52 │  1312.09 │
JLArray/sparse                               (1) │   251.77 │  12.85 │  5.1 │   39745.49 │  1238.30 │
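
For anyone who wants to reproduce this comparison on their own logs, here is a minimal sketch that parses the two runner tables and ranks tests by how much wall time they gained. It assumes the tables are available as plain text in the layout shown above; `parse_times` and `slowest_changes` are hypothetical helper names and not part of GPUArrays.jl or its test runner. The sample rows are copied from the two runs above. Note that a test with no counterpart in the "before" run is counted with its full time, so a test set that was renamed or split (as `JLArray/linalg` seems to have been) shows up as new rather than as a change.

```python
import re

# One table row: "<name>  (<worker>) | <time> | ...", with either ASCII "|"
# or box-drawing "│" separators. The header row has "(Worker)" instead of a
# worker number, so it does not match.
ROW = re.compile(
    r"^(?P<name>\S.*?)\s+\((?P<worker>\d+)\)\s*[|│]\s*(?P<time>\d+(?:\.\d+)?)\s*[|│]"
)

BEFORE = """
Test                                   (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
JLArray/reductions/minimum maximum extrema  (6) |    70.57 |   4.66 |  6.6 |   16305.19 |   992.94 |
JLArray/random                              (3) |    11.97 |   0.82 |  6.8 |    1525.67 |  1058.08 |
JLArray/linalg                              (5) |   237.95 |  64.24 | 27.0 |  310193.91 |  1913.72 |
JLArray/broadcasting                        (3) |   138.79 |  13.37 |  9.6 |   29606.22 |  1559.17 |
"""

AFTER = """
Test                                    (Worker) │ Time (s) │ GC (s) │ GC % │ Alloc (MB) │ RSS (MB) │
JLArray/broadcasting                         (3) │   102.76 │   4.14 │  4.0 │   31210.01 │   985.23 │
JLArray/linalg/kron                          (2) │   126.50 │  35.69 │ 28.2 │  281439.52 │  1312.09 │
JLArray/reductions/minimum maximum extrema   (3) │    83.81 │   6.40 │  7.6 │   15226.11 │  1190.20 │
JLArray/sparse                               (1) │   251.77 │  12.85 │  5.1 │   39745.49 │  1238.30 │
JLArray/random                               (3) │    28.96 │  10.40 │ 35.9 │   24492.82 │  1749.39 │
"""


def parse_times(log):
    """Map each test name to its 'Time (s)' column; non-row lines are skipped."""
    times = {}
    for line in log.splitlines():
        match = ROW.match(line.strip())
        if match:
            times[match.group("name")] = float(match.group("time"))
    return times


def slowest_changes(before, after, top=5):
    """Return the `top` tests with the largest time increase, largest first.

    Assumption: a test missing from `before` contributes its full time.
    """
    deltas = {name: t - before.get(name, 0.0) for name, t in after.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    ranked = slowest_changes(parse_times(BEFORE), parse_times(AFTER))
    for name, delta in ranked:
        print(f"{delta:+9.2f} s  {name}")
```

Feeding it the full tables instead of the excerpts only requires replacing the two sample strings with the complete log text.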

On CUDA.jl, I also noticed that the memory usage of the kron tests is excessive:

                                                   |          | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test                                      (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
gpuarrays/linalg/kron                         (24) |   122.93 |   0.03 |  0.0 |    5381.25 |   406.00 |  12.56 | 10.2 |   32525.51 |  2849.95 |

cc @kshyatt @albertomercurio
