Commit 970a16f
tests: verify SKU customisation scripts
Add a script to enable both manual and automated testing of the
Azure SKU customisation scripts.

When run manually, the script exercises all of the supported SKU
types via mocking and checks that the appropriate links are
installed. It does not check that the customisation service is
active and running, because manual mode is expected to be used on
dev machines that are not supported SKU types.

Manual testing like this may throw some warnings or errors because
the hardware is not directly supported. For example, testing on a VM
type that does not have GPUs supported by the fabric manager will
result in warnings that the service failed to start:

    $ sudo /opt/hpc/azure/tests/test-sku-setup.sh --manual
    Testing standard_nc96ads_a100_v4
    Test Passed: standard_nc96ads_a100_v4
    Testing standard_nd40rs_v2
    Test Passed: standard_nd40rs_v2
    Testing standard_nd96asr_v4
    Job for nvidia-fabricmanager.service failed because the control process exited with error code.
    See "systemctl status nvidia-fabricmanager.service" and "journalctl -xeu nvidia-fabricmanager.service" for details.
    NVIDIA Fabric Manager Inactive!
    Test Passed: standard_nd96asr_v4
    Testing standard_hb176rs_v4
    Test Passed: standard_hb176rs_v4
    Testing standard_nc80adis_h100_v5
    Check NVLink status after reloading NVIDIA kernel modules...
    NVLink is Active.
    Test Passed: standard_nc80adis_h100_v5
    Testing standard_nd96isr_h200_v5
    Job for nvidia-fabricmanager.service failed because the control process exited with error code.
    See "systemctl status nvidia-fabricmanager.service" and "journalctl -xeu nvidia-fabricmanager.service" for details.
    NVIDIA Fabric Manager Inactive!
    Test Passed: standard_nd96isr_h200_v5
    $

Such warnings are expected and do not cause the test to fail.
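
The manual-mode loop described above can be sketched roughly as
follows. This is not the code from this commit: the function names,
the directory layout under a scratch root, and the idea that
/etc/nccl.conf is a symlink to a per-SKU file are all assumptions made
for illustration. Only the SKU names and the "Testing"/"Test Passed"
messages come from the output above.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; all helper names and paths are hypothetical.

# SKU names as they appear in the manual test output above.
SKUS=(
    standard_nc96ads_a100_v4
    standard_nd40rs_v2
    standard_nd96asr_v4
    standard_hb176rs_v4
    standard_nc80adis_h100_v5
    standard_nd96isr_h200_v5
)

# Stand-in for the real customisation script: create a per-SKU file
# under a scratch root and link it into place (assumed layout).
mock_setup() {
    local sku=$1 root=$2
    mkdir -p "$root/customisations" "$root/etc"
    : > "$root/customisations/$sku.conf"
    ln -sf "$root/customisations/$sku.conf" "$root/etc/nccl.conf"
}

# Succeed only if the link exists and points at this SKU's file.
check_links() {
    local sku=$1 root=$2
    [ -L "$root/etc/nccl.conf" ] &&
        [ "$(readlink "$root/etc/nccl.conf")" = "$root/customisations/$sku.conf" ]
}

# Mock every SKU in turn; return non-zero if any link check fails.
run_manual() {
    local root=$1 sku rc=0
    for sku in "${SKUS[@]}"; do
        echo "Testing $sku"
        mock_setup "$sku" "$root"
        if check_links "$sku" "$root"; then
            echo "Test Passed: $sku"
        else
            echo "Failed: $sku"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `run_manual "$(mktemp -d)"` keeps the sketch away from the
real /etc, so it can be tried on any machine without root.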
When not in manual mode, the test expects to be running on a
supported SKU VM (e.g. in the CI system) and queries the current
SKU type.

If the SKU is unsupported, it checks that no files are currently
installed. It fails if stale config files are found:

    $ sudo /opt/hpc/azure/tests/test-sku-setup.sh
    Unknown SKU
    Failed: Standard_NC8as_T4_v3: /etc/nccl.conf not empty
    $

If the SKU is supported, it will check that appropriate files are
installed and the service is running.
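
A minimal sketch of the non-manual decision flow, under stated
assumptions: the SKU and the service state are passed in as arguments
rather than queried from the VM, SKU names are compared
case-insensitively, and /etc/nccl.conf stands in for the full set of
installed files. The helper names and the two "missing" and "not
active" messages are hypothetical. Only the "Unknown SKU" and "not
empty" messages are taken from the output above.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the logic from this commit.

SUPPORTED_SKUS=(
    standard_nc96ads_a100_v4
    standard_nd40rs_v2
    standard_nd96asr_v4
    standard_hb176rs_v4
    standard_nc80adis_h100_v5
    standard_nd96isr_h200_v5
)

# Case-insensitive membership test (the case folding is an assumption).
is_supported() {
    local sku s
    sku=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    for s in "${SUPPORTED_SKUS[@]}"; do
        [ "$s" = "$sku" ] && return 0
    done
    return 1
}

# check_sku SKU ROOT SERVICE_STATE
# ROOT is a scratch directory standing in for "/". SERVICE_STATE is the
# kind of string "systemctl is-active <unit>" prints, e.g. "active".
check_sku() {
    local sku=$1 root=$2 service_state=$3
    local conf="$root/etc/nccl.conf"

    if ! is_supported "$sku"; then
        echo "Unknown SKU"
        # Unsupported SKU: a non-empty config file is stale.
        if [ -s "$conf" ]; then
            echo "Failed: $sku: /etc/nccl.conf not empty"
            return 1
        fi
        return 0
    fi

    # Supported SKU: files must be installed and the service running.
    if [ ! -e "$conf" ]; then
        echo "Failed: $sku: /etc/nccl.conf missing"
        return 1
    fi
    if [ "$service_state" != "active" ]; then
        echo "Failed: $sku: service not active"
        return 1
    fi
    echo "Test Passed: $sku"
}
```

Taking the root directory and service state as parameters keeps the
sketch runnable without root privileges or a real Azure VM.
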
Signed-off-by: Dave Chinner <dchinner@redhat.com>
1 parent bc97a03 commit 970a16f
3 files changed, 163 insertions(+), 1 deletion(-)