Set the following two environment variables:
NVIDIA_VISIBLE_DEVICES=$gpu_id
CUDA_VISIBLE_DEVICES=0
where gpu_id
is the ID of your selected GPU, as seen in the host system's nvidia-smi
(a 0-based integer) that will be made available to the guest system (e.g. to the Docker container environment).
You can verify that a different card is selected for each value of gpu_id by inspecting Bus-Id
parameter in nvidia-smi
run in a terminal in the guest system).
This method based on NVIDIA_VISIBLE_DEVICES
exposes only a single card to the system (with local ID zero), hence we also hard-code the other variable, CUDA_VISIBLE_DEVICES
to 0 (mainly to prevent it from defaulting to an empty string that would indicate no GPU).
Note that the environmental variable should be set before the guest system is started (so no chances of doing it in your Jupyter Notebook's terminal), for instance using docker run -e NVIDIA_VISIBLE_DEVICES=0
or env
in Kubernetes or Openshift.
If you want GPU load-balancing, make gpu_id
random at each guest system start.
If setting this with python, make sure you are using strings for all environment variables, including numerical ones.
You can verify that a different card is selected for each value of gpu_id
by inspecting nvidia-smi
's Bus-Id parameter (in a terminal run in the guest system).
The accepted solution based on CUDA_VISIBLE_DEVICES
alone does not hide other cards (different from the pinned one), and thus causes access errors if you try to use them in your GPU-enabled python packages. With this solution, other cards are not visible to the guest system, but other users still can access them and share their computing power on an equal basis, just like with CPU's (verified).
This is also preferable to solutions using Kubernetes / Openshift controlers (resources.limits.nvidia.com/gpu
), that would impose a lock on the allocated card, removing it from the pool of available resources (so the number of containers with GPU access could not exceed the number of physical cards).
This has been tested under CUDA 8.0, 9.0 and 10.1 in docker containers running Ubuntu 18.04 orchestrated by Openshift 3.11.