[cuda] NVIDIA NVML Driver/library version mismatch

When I run nvidia-smi I get the following message:

Failed to initialize NVML: Driver/library version mismatch

An hour ago I saw the same message, so I uninstalled my CUDA library; after that I was able to run nvidia-smi and got the following result:

(screenshot of the nvidia-smi output)

After this I downloaded cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb from the official NVIDIA page and then simply:

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

Now I have cuda installed, but I get the mentioned mismatch error.


Some potentially useful information:

Running cat /proc/driver/nvidia/version I get:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  378.13  Tue Feb  7 20:10:06 PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)

I'm running Ubuntu 16.04.2 LTS.

Kernel release is: 4.4.0-66-generic.

Thanks!

This question is related to: cuda, driver, gpu, nvidia

The answers are below.


I had to restart my machine and remove all the packages that I had installed previously (during the first installation). Please make sure to delete all of the packages: even after removing packages with the command below,

sudo apt-get --purge remove "*nvidia*"

packages like "libtinfo6:i386" do not get removed.

I'm using Ubuntu 20.04 and nvidia-driver-440; for that you have to remove all the packages shown in the image below.

List of all the packages that need to be removed:

(screenshot listing the NVIDIA packages to remove)

As shown in the image, make sure that the package you're installing is the correct size, that is 207 MB for nvidia-driver-440; if it's less, it means you haven't removed all the packages.
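Since the screenshot is not reproduced here, a rough way to list which NVIDIA packages are still installed, together with their sizes (dpkg reports Installed-Size in KiB; the grep pattern is just a broad match):

# NVIDIA-related packages still installed, with installed size in KiB
dpkg-query -W -f='${Package}\t${Installed-Size}\n' | grep -i nvidia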


Reboot. If the problem still exists:

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
nvidia-smi

For CentOS/RHEL:

cd /boot
# back up the current initramfs for the running kernel
mv initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
# rebuild it so it picks up the current NVIDIA modules
dracut -vf initramfs-$(uname -r).img $(uname -r)

then

reboot

For Debian/Ubuntu:

update-initramfs -u

If the problem persists:

apt install -y dkms && dkms install -m nvidia -v 440.82

Change 440.82 to your actual version.

Tip: to get the Nvidia driver version:

ls /usr/src

You will find the Nvidia driver directory, such as nvidia-440.82.
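If the driver is managed through DKMS, dkms itself can also report the registered version, and the loaded kernel module exposes its own version; a small sketch (output formats vary slightly between versions):

# DKMS-registered NVIDIA modules and their build status
dkms status | grep -i nvidia
# version of the kernel module, if one is currently loaded
cat /proc/driver/nvidia/version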


You can also remove all the Nvidia packages and reinstall the driver:

apt purge nvidia*
apt purge *cuda*

# check that nothing is left
apt list --installed | grep nvidia
apt list --installed | grep cuda

As @etal said, rebooting can solve this problem, but I think a procedure without rebooting will help.


The error message

NVML: Driver/library version mismatch

tells us that the Nvidia driver kernel module (kmod) has the wrong version, so we should unload this driver and then load the correct version of the kmod.
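Before unloading anything, it can help to confirm the mismatch by comparing the version the kernel has loaded with the version of the userspace NVML library; the library path below is the usual location on Ubuntu and may differ on other distros:

# version of the kernel module that is currently loaded
cat /proc/driver/nvidia/version
# version of the userspace NVML library that nvidia-smi uses (the filename carries the version)
ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*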

How do we unload and reload it?

First, we should know which drivers are loaded.

lsmod | grep nvidia

you may get

nvidia_uvm            634880  8
nvidia_drm             53248  0
nvidia_modeset        790528  1 nvidia_drm
nvidia              12312576  86 nvidia_modeset,nvidia_uvm

Our final goal is to unload the nvidia module, so we should first unload the modules that depend on nvidia:

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm

Then unload nvidia itself:

sudo rmmod nvidia

Troubleshooting

If you get an error like rmmod: ERROR: Module nvidia is in use, it means the kernel module is still in use, and you should kill the processes that are using the kmod:

sudo lsof /dev/nvidia*

and then kill those processes, then continue to unload the kmods.
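For the kill step, a minimal sketch (1234 is a placeholder PID taken from the lsof output above; fuser comes from the psmisc package):

# kill a specific offender by PID
sudo kill 1234
# or, more bluntly, kill everything that still has the NVIDIA device nodes open
sudo fuser -k /dev/nvidia*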

Test

Confirm that you have successfully unloaded those kmods:

lsmod | grep nvidia

You should get nothing. Then confirm that you can load the correct driver:

nvidia-smi

you should get the correct output


I got the error Failed to initialize NVML: Driver/Library version mismatch from my nvidia-gpu-temperature-indicator, and nvidia-smi failed to print any info. I tried to find out whether other versions of the Nvidia driver were installed on my Ubuntu machine, but I only found nvidia-driver-390. In the end, a reboot solved the problem.


I committed the container into a Docker image. Then I recreated another container using this Docker image, and the problem was gone.
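A rough sketch of that workaround (the container and image names are placeholders; the --gpus flag assumes Docker 19.03+ with the NVIDIA container toolkit, older setups used nvidia-docker instead):

# save the current container as an image
docker commit my_container my_image:saved
# start a fresh container from that image with GPU access and verify
docker run --rm --gpus all my_image:saved nvidia-smi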


I had the issue too (I'm running Ubuntu 18.04).

What I did:

dpkg -l | grep -i nvidia

Then sudo apt-get remove --purge nvidia-381 (and every duplicate version; in my case I had 381, 384 and 387).

Then sudo ubuntu-drivers devices to list what's available

And I chose sudo apt install nvidia-driver-430.

After that, nvidia-smi gave the correct output (no need to reboot). But I suppose you can reboot when in doubt.

I also followed an installation guide to reinstall CUDA and cuDNN.
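Put together, the sequence looks roughly like this (nvidia-381/384/387 and nvidia-driver-430 are simply the versions from this answer; substitute whatever dpkg and ubuntu-drivers report on your machine):

# see which NVIDIA packages (and versions) are installed
dpkg -l | grep -i nvidia
# purge every leftover driver version found above
sudo apt-get remove --purge nvidia-381 nvidia-384 nvidia-387
# list the drivers Ubuntu recommends for this GPU
sudo ubuntu-drivers devices
# install the chosen one and verify without rebooting
sudo apt install nvidia-driver-430
nvidia-smi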


This also happened to me on Ubuntu 16.04 using the nvidia-384 package (the latest nvidia version on Ubuntu 16.04).

However I could resolve the problem by installing nvidia-390 through the Proprietary GPU Drivers PPA.

So a solution to the described problem on Ubuntu 16.04 is doing this:

  • sudo add-apt-repository ppa:graphics-drivers/ppa
  • sudo apt-get update
  • sudo apt-get install nvidia-390

Note: This guide assumes a clean Ubuntu install. If you have previous drivers installed, a reboot might be needed to reload all the kernel modules.


sudo reboot

Rebooting solved it for me.


The top two answers couldn't solve my problem. I found a solution on the official Nvidia forum that did. The error below may be caused by installing two different versions of the driver through different approaches, for example installing the Nvidia driver with apt and with the official installer.

Failed to initialize NVML: Driver/library version mismatch

To solve this problem, you only need to execute one of the following two commands.

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
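If you are not sure which of the two removal commands applies to your machine, a quick hedged check (the paths are the installer defaults):

# present only if the driver was installed with the official .run installer
[ -x /usr/bin/nvidia-uninstall ] && echo "runfile install detected"
# non-empty output means driver packages were installed through apt
dpkg -l | grep -i '^ii.*nvidia'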

First I installed the Nvidia driver.

Next I installed cuda.

After that I got the "Driver/library version mismatch" error, but I could still see the CUDA version, so I purged the Nvidia driver and reinstalled it.

Then it worked correctly.


In most cases a reboot will fix the issue on Ubuntu 18.04.

The "Failed to initialize NVML: Driver/library version mismatch" error generally means the CUDA driver is still running an older release that is incompatible with the CUDA toolkit version currently in use. Rebooting the compute nodes will generally resolve this issue.
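A quick way to compare the two versions that are actually in use (assumes nvcc is on your PATH; it may live under /usr/local/cuda/bin instead):

# CUDA toolkit (userspace) version
nvcc --version | grep release
# kernel driver version currently loaded
cat /proc/driver/nvidia/version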


If you've recently updated, a reboot might solve this problem.


I reinstalled the Nvidia driver by running these commands as root:

  1. systemctl isolate multi-user.target

  2. modprobe -r nvidia-drm

  3. Reinstall the Nvidia driver: chmod +x NVIDIA-Linux-x86_64-410.57.run, then ./NVIDIA-Linux-x86_64-410.57.run

  4. systemctl start graphical.target

and finally check nvidia-smi
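Put together as one hedged sketch (410.57 is simply the installer version used in this answer; adjust the filename to your download):

# drop to a non-graphical target so the display manager releases the GPU
sudo systemctl isolate multi-user.target
# unload the DRM module
sudo modprobe -r nvidia-drm
# make the installer executable and run it
chmod +x NVIDIA-Linux-x86_64-410.57.run
sudo ./NVIDIA-Linux-x86_64-410.57.run
# return to the graphical target and verify
sudo systemctl start graphical.target
nvidia-smi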

Thanks to: How To Install Nvidia Drivers and CUDA-10.0 for RTX 2080 Ti GPU on Ubuntu-16.04/18.04

How to unload kernel module 'nvidia-drm'?


These answers did not work for me:

https://stackoverflow.com/a/43023000/1179925

https://stackoverflow.com/a/45319156/1179925

https://stackoverflow.com/a/54349675/1179925

dmesg

NVRM: API mismatch: the client has the version 418.67, but
NVRM: this kernel module has the version 430.26.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.

Uninstall old driver 418.67 and install new driver 430.26 (download NVIDIA-Linux-x86_64-430.26.run):

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
chmod +x NVIDIA-Linux-x86_64-430.26.run
sudo ./NVIDIA-Linux-x86_64-430.26.run
[ignore abort]

cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module  430.26  Tue Jun  4 17:40:52 CDT 2019
GCC version:  gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)

So I was having this problem, and none of the other remedies worked. The error message was opaque, but checking dmesg was key:

[   10.118255] NVRM: API mismatch: the client has the version 410.79, but
           NVRM: this kernel module has the version 384.130.  Please
           NVRM: make sure that this kernel module and all NVIDIA driver
           NVRM: components have the same version.

However, I had completely removed the 384 version and removed any remaining kernel drivers (nvidia-384*). But even after a reboot, I was still getting this. Seeing this meant that the initramfs was still built to reference 384 while only 410 was installed. So I rebuilt the initramfs for my kernel:

# uname -a # find the kernel it's using
Linux blah 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# update-initramfs -c -k 4.13.0-43-generic # rebuild its initramfs
# reboot

And then it worked.

After removing 384, I still had 384 files in:

/var/lib/dkms/nvidia-XXX/XXX.YY/4.13.0-43-generic/x86_64/module
/lib/modules/4.13.0-43-generic/kernel/drivers

I recommend using the locate command (not installed by default) rather than searching the filesystem every time.
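To check for stale module files before (or instead of) searching the whole filesystem, a sketch for Debian/Ubuntu (lsinitramfs ships with initramfs-tools, and the initrd filename below is the usual Ubuntu naming):

# leftover NVIDIA kernel module files for the running kernel
find /lib/modules/$(uname -r) -name 'nvidia*.ko*'
# NVIDIA modules packed into the current initramfs
lsinitramfs /boot/initrd.img-$(uname -r) | grep -i nvidia
# regenerate the initramfs for the running kernel
sudo update-initramfs -u -k $(uname -r)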


I experienced this problem after a normal kernel update on a CentOS machine. Since all CUDA and nVidia drivers and libraries have been installed via YUM repositories, I managed to solve the issues using the following steps:

sudo yum remove nvidia-driver-*
sudo reboot
sudo yum install nvidia-driver-cuda nvidia-modprobe
sudo modprobe nvidia # or just reboot

It made sure my kernel and my nVidia driver are consistent. I reckon that just rebooting may result in the wrong version of the kernel module being loaded.
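To double-check afterwards that the loaded module matches the installed packages (a standard rpm query, nothing beyond yum/rpm is assumed):

# version of the kernel module now loaded
cat /proc/driver/nvidia/version
# versions of the NVIDIA driver packages installed from the repositories
rpm -qa | grep -i nvidia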


For completeness, I ran into this issue as well. In my case it turned out that because I had set Clang as my default compiler (using update-alternatives), nvidia-driver-440 failed to compile (check /var/crash/) even though apt didn't post any warnings. For me, the solution was to apt purge nvidia-*, set cc back to use gcc, reboot, and reinstall nvidia-driver-440.
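A sketch of the compiler check and reset described above (it assumes cc is managed by update-alternatives, as in this report, and nvidia-driver-440 is the version from this answer):

# see which compiler 'cc' resolves to right now
cc --version
# purge the driver packages (review apt's proposed removals before confirming)
sudo apt purge 'nvidia-*'
# point the cc alternative back at gcc
sudo update-alternatives --config cc
sudo reboot
# after the reboot, reinstall the driver
sudo apt install nvidia-driver-440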

