Conversation
Consider test.c:
...
#include <stdio.h>

int main (int argc, char **argv) {
  printf ("argc: %d\n", argc);
  return 0;
}
...
such that we have:
...
$ nvptx-none-run a.out
argc: 1
$ nvptx-none-run a.out bla
argc: 2
...
Given that the usage indicates that the program separates the nvptx-none-run
options from the program arguments:
...
$ nvptx-none-run --help
Usage: nvptx-none-run [option...] program [argument...]
...
I'd expect:
...
$ nvptx-none-run a.out bla -V
argc: 3
...
but instead we get:
...
$ nvptx-none-run a.out bla -V
nvptx-none-run (nvptx-tools) 1.0
<COPYRIGHT>
$
...
Fix this by calling getopt_long with optstring starting with '+'.
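A minimal sketch of the fix, assuming glibc getopt_long and an illustrative option table (not the actual nvptx-run source): the leading '+' in the optstring disables argument permutation, so parsing stops at the first non-option argument and everything from the program name onward is passed through untouched.
...
#include <getopt.h>
#include <stdio.h>

int
main (int argc, char **argv)
{
  static const struct option long_options[] = {
    { "version", no_argument, 0, 'V' },
    { 0, 0, 0, 0 }
  };
  int o;
  /* The leading '+' stops option processing at the first non-option
     argument instead of permuting argv.  */
  while ((o = getopt_long (argc, argv, "+V", long_options, NULL)) != -1)
    if (o == 'V')
      {
        printf ("nvptx-none-run (nvptx-tools) 1.0\n");
        return 0;
      }
  /* argv[optind] is the program; argv[optind + 1] onward are its arguments.  */
  if (optind < argc)
    printf ("program: %s, argc for it: %d\n", argv[optind], argc - optind);
  return 0;
}
...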
Add a --verbose flag to nvptx-run, such that we have:
...
$ gcc ~/hello.c
$ nvptx-none-run -v ./a.out
Total device memory: 4242604032 (3.95 GiB)
Initial free device memory: 4222156800 (3.93 GiB)
Program args reservation (effective): 1048576 (1.00 MiB)
Set stack size limit: 131072 (128.00 KiB)
Stack size limit reservation (estimated): 1342177280 (1.25 GiB)
Stack size limit reservation (effective): 1423966208 (1.32 GiB)
Free device memory: 2797142016 (2.60 GiB)
Set heap size limit: 268435456 (256.00 MiB)
hello
...
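The report_val calls quoted in the review comments below take a stream, a label, and a byte count, and the output above shows each value both raw and scaled to a human-readable unit. A hypothetical sketch of such a helper, assuming that format (the actual implementation in the patch may differ):
...
#include <stdio.h>

static void
report_val (FILE *f, const char *msg, size_t val)
{
  static const char *const units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
  double scaled = val;
  size_t u = 0;
  /* Divide by 1024 until the value fits the largest sensible unit.  */
  while (scaled >= 1024.0 && u + 1 < sizeof units / sizeof units[0])
    {
      scaled /= 1024.0;
      u++;
    }
  fprintf (f, "%s: %zu (%.2f %s)\n", msg, val, scaled, units[u]);
}
...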
Note: contains "[nvptx-run] Fix greedy option parsing" to avoid a merge conflict.
...
size_t free_mem;
size_t dummy;
...
Should dummy move inside the if (verbose)?
...
r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 0);
fatal_unless_success (r, "could not set stack limit");
...
r = cuMemGetInfo (&free_mem, &dummy);
...
Actually, doesn't dummy here (when given a better name) make obsolete the earlier cuDeviceTotalMem call?
cuMemGetInfo: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g808f555540d0143a331cc42aa98835c0
cuDeviceTotalMem: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1gc6a0d6551335a3780f9f3c967a0fde5d
Or is the distinction between the total memory available for allocation by the CUDA context and the total memory available on the device intentional?
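For reference, a sketch of the consolidation this question points at, assuming the two quantities are indeed meant to be the same and reusing the patch's own helpers (fatal_unless_success, report_val); cuMemGetInfo returns both the free and the total device memory in bytes:
...
size_t free_mem, total_mem;
r = cuMemGetInfo (&free_mem, &total_mem);
fatal_unless_success (r, "could not get free memory");
report_val (stderr, "Total device memory", total_mem);
report_val (stderr, "Initial free device memory", free_mem);
...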
...
/* Set stack size limit to 0 to get more accurate free_mem. */
r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 0);
...
From the cuCtxSetLimit documentation, https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g0651954dfb9788173e60a9af7201e65a, I can't easily tell the rationale here.
So, should we add more commentary for this, or point to an external URL if that makes sense?
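If more commentary is wanted, one possible wording, under the assumption that the rationale is the per-thread stack reservation (the driver sets aside device memory for local/stack storage in proportion to CU_LIMIT_STACK_SIZE, which the verbose output above reports as the "Stack size limit reservation" lines):
...
/* The driver reserves device memory for per-thread stack (local memory)
   proportional to CU_LIMIT_STACK_SIZE.  Drop the limit to 0 first so that
   this reservation is released and the following cuMemGetInfo reflects
   the memory actually available to the program.  */
r = cuCtxSetLimit (CU_LIMIT_STACK_SIZE, 0);
fatal_unless_success (r, "could not set stack limit");
...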
...
size_t free_mem_update;
r = cuMemGetInfo (&free_mem_update, &dummy);
fatal_unless_success (r, "could not get free memory");
report_val (stderr, "Program args reservation (effective)",
            free_mem - free_mem_update);
...
Doesn't this difference computation implicitly assume that nothing else is using the GPU concurrently? (Which would be a wrong assumption?) Or does every process/CUDA context always have all of the GPU memory available to it? I don't remember the details, and have not yet looked that up.
...
size_t free_mem_update;
r = cuMemGetInfo (&free_mem_update, &dummy);
fatal_unless_success (r, "could not get free memory");
report_val (stderr, "Stack size limit reservation (effective)",
            free_mem - free_mem_update);
...