My CUDA program crashed during execution, before memory was flushed. As a result, device memory remained occupied.

I'm running on a GTX 580, for which nvidia-smi --gpu-reset is not supported. Placing cudaDeviceReset() at the beginning of the program only affects the context created by the calling process and doesn't flush the memory allocated before it. I'm accessing a Fedora server with that GPU remotely, so a physical reset is quite complicated.

So, the question is: is there any way to flush the device memory in this situation?
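To illustrate the point about cudaDeviceReset(): the call destroys only the calling process's primary context, so even when it succeeds it cannot release memory held by another (crashed) process. A minimal sketch of that call from Python via ctypes; the unversioned library name is an assumption:

import ctypes

# Assumption: libcudart.so is loadable; you may need a versioned name
# such as libcudart.so.12 depending on the install.
cudart = ctypes.CDLL("libcudart.so")

# Destroys only the *calling* process's primary context; memory held by
# other processes is untouched.
err = cudart.cudaDeviceReset()  # returns cudaError_t; 0 means cudaSuccess
print("cudaDeviceReset returned", err)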
Tags: cuda, gpgpu, remote-access
Check what is using your GPU memory with
sudo fuser -v /dev/nvidia*
Your output will look something like this:

                     USER        PID ACCESS COMMAND
/dev/nvidia0:        root       1256 F...m  Xorg
                     username   2057 F...m  compiz
                     username   2759 F...m  chrome
                     username   2777 F...m  chrome
                     username  20450 F...m  python
                     username  20699 F...m  python
Then kill the PIDs that you no longer need, either in htop or with sudo kill -9 PID. In the example above, PyCharm was eating a lot of memory, so I killed 20450 and 20699.
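If this comes up often, the lookup and kill can be scripted. A rough Python sketch, assuming fuser (from psmisc) is installed and the script runs with root privileges; plain fuser prints the matching PIDs to stdout and everything else to stderr:

import glob
import os
import signal
import subprocess

# Find every PID holding one of the NVIDIA device nodes.
devices = glob.glob("/dev/nvidia*")
result = subprocess.run(["fuser"] + devices, capture_output=True, text=True)
pids = sorted({int(p) for p in result.stdout.split()})
print("Processes holding the GPU:", pids)

# Fill in the PIDs you identified as your stale/crashed job before running.
stale_pids = []
for pid in stale_pids:
    os.kill(pid, signal.SIGKILL)  # equivalent to: sudo kill -9 PID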
On macOS (/ OS X), the same advice applies if the OS appears to be leaking memory.

For those using Python:
import torch, gc

gc.collect()              # drop unreachable Python objects holding tensors
torch.cuda.empty_cache()  # release cached GPU memory back to the driver
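Note that empty_cache() only returns memory that PyTorch's caching allocator is holding but no longer using; tensors that are still referenced stay allocated. A small sketch to check the effect, assuming a CUDA-enabled PyTorch build and a visible GPU:

import gc
import torch

x = torch.empty(1024, 1024, device="cuda")  # allocate ~4 MB on the GPU
del x                                       # drop the last reference
gc.collect()                                # collect lingering Python refs
print("allocated:", torch.cuda.memory_allocated())  # bytes in live tensors
print("reserved:", torch.cuda.memory_reserved())    # bytes cached by PyTorch

torch.cuda.empty_cache()  # hand cached blocks back to the driver
print("reserved after empty_cache:", torch.cuda.memory_reserved())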
One can also use nvtop, which gives an interface very similar to htop but shows your GPU(s) usage instead, with a nice graph. You can also kill processes directly from there. Here is a link to its GitHub: https://github.com/Syllo/nvtop
I also had the same problem, and I saw a good solution on Quora using sudo kill -9 PID. See https://www.quora.com/How-do-I-kill-all-the-computer-processes-shown-in-nvidia-smi
First type nvidia-smi to list the processes, then select the PID that you want to kill and run
sudo kill -9 PID
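Those two steps can also be scripted. A sketch, assuming a driver whose nvidia-smi supports --query-compute-apps (very old GPUs, such as the Fermi-era GTX 580 from the question, may not report per-process usage at all):

import os
import signal
import subprocess

# List compute processes as CSV: pid, process name, used memory.
out = subprocess.run(
    ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout

for line in out.strip().splitlines():
    pid, name, used = [field.strip() for field in line.split(",")]
    print(pid, name, used)
    # Uncomment once you are sure the process is stale:
    # os.kill(int(pid), signal.SIGKILL)  # same as: sudo kill -9 PID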