Installing GPU drivers


After you create an instance with one or more GPUs, your system requires NVIDIA device drivers so that your applications can access the device. Make sure that your virtual machine (VM) instances have enough free disk space: choose a boot disk of at least 30 GB when you create the VM.

This document explains how to install NVIDIA proprietary drivers on VMs that were created with public images or custom images.

To install GRID drivers for virtual workstations, see Installing GRID drivers for virtual workstations.

Before you begin

NVIDIA driver, CUDA toolkit, and CUDA runtime versions

Your environment might need several separately versioned driver and runtime components, including the following:

  • NVIDIA driver
  • CUDA toolkit
  • CUDA runtime

When you install these components, you can configure your environment to suit your needs. For example, if you have an earlier version of TensorFlow that works best with an earlier version of the CUDA toolkit, but the GPU that you want to use requires a later version of the NVIDIA driver, you can install the earlier CUDA toolkit alongside the later NVIDIA driver.

However, you must make sure that your NVIDIA driver and CUDA toolkit versions are compatible. For CUDA toolkit and NVIDIA driver compatibility, see the NVIDIA documentation about CUDA compatibility.

Required NVIDIA driver versions

NVIDIA GPUs running on Compute Engine must use the following NVIDIA driver versions:

  • For A100 GPUs:
    • Linux: 450.80.02 or later
    • Windows: 452.77 or later
  • For all other GPU types:
    • Linux: 410.79 or later
    • Windows: 426.00 or later
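To check an existing Linux installation against these minimums, you can compare version strings field by field. The following is a minimal sketch, assuming GNU coreutils (for `sort -V`); `min_linux_driver` and `driver_meets_minimum` are hypothetical helper names, not part of any NVIDIA or Google tool, and the minimums are copied from the list above.

```shell
#!/usr/bin/env bash
# Sketch: compare an installed Linux driver version with the minimum
# required on Compute Engine. Helper names are hypothetical.

# min_linux_driver GPU_TYPE: print the minimum Linux driver for a GPU type.
min_linux_driver() {
  case "$1" in
    A100) echo "450.80.02" ;;
    *)    echo "410.79" ;;
  esac
}

# driver_meets_minimum INSTALLED MINIMUM: succeed when INSTALLED >= MINIMUM.
# sort -V orders version strings numerically field by field, so the minimum
# is printed first whenever the installed version is at least as new.
driver_meets_minimum() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

On a VM where the driver is already installed, you could pass the version that `nvidia-smi --query-gpu=driver_version --format=csv,noheader` reports as the first argument.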

Installing GPU drivers on VMs

One way to install the NVIDIA driver on most VMs is to install the NVIDIA CUDA Toolkit.

To install the CUDA toolkit, complete the following steps:

  1. Select a CUDA toolkit that supports the minimum driver that you need.

  2. Connect to the VM where you want to install the driver.

  3. On your VM, download and install the CUDA toolkit. The following table lists the minimum recommended toolkit version and the installation guide for each GPU type. Before you install the toolkit, make sure that you complete the pre-installation steps in the installation guide.

    GPU type        Minimum recommended CUDA toolkit version    Installation instructions
    NVIDIA A100
    NVIDIA T4
    NVIDIA V100
    NVIDIA P100
    NVIDIA P4
    NVIDIA K80

Examples

The following steps show examples of how to install CUDA 11 and the associated drivers for NVIDIA® GPUs on a few operating systems.

CentOS/RHEL

  1. Connect to the VM where you want to install the driver.

  2. Install the latest kernel package. If needed, this command also reboots the system.

    sudo yum clean all
    sudo yum install -y kernel | grep -q 'already installed' || sudo reboot
    
  3. If the system rebooted in the previous step, reconnect to the VM.

  4. Install kernel headers and development packages.

    sudo yum install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
    
  5. Install the epel-release repository. This repository includes the DKMS packages, which are required to install NVIDIA drivers on CentOS.

    • CentOS 7/8 and RHEL 7

      sudo yum install epel-release
      
    • RHEL 8 only

      sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
      
  6. Install yum-utils.

    sudo yum install yum-utils
    
  7. Select a driver repository for the CUDA Toolkit and add it to your VM.

    • CentOS/RHEL 8

      sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
      
    • CentOS/RHEL 7

      sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
      
  8. Clean the Yum cache:

    sudo yum clean all
    
  9. Install CUDA.

    • CentOS/RHEL 8

      sudo dnf -y module install nvidia-driver:latest-dkms
      sudo dnf -y install cuda
      
    • CentOS/RHEL 7

      sudo yum -y install nvidia-driver-latest-dkms cuda
      
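  10. Install the NVIDIA driver. The cuda-drivers package installs the latest driver from the CUDA repository that you added; it doesn't install the CUDA toolkit itself.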

    sudo yum -y install cuda-drivers
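Step 2 reboots only when the `yum` output lacks the text `already installed`. An alternative check, shown here as a minimal sketch and not the documented method, compares the running kernel from `uname -r` with the newest installed `kernel` package from `rpm -q kernel`. The helper name `kernel_reboot_needed` is hypothetical, and the sketch assumes GNU `sort -V`.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a reboot is needed to run the newest installed kernel.
# The helper name is hypothetical; this is an alternative to the grep check.

# kernel_reboot_needed RUNNING INSTALLED...: RUNNING is a `uname -r` string and
# INSTALLED are package names as printed by `rpm -q kernel`
# (for example kernel-3.10.0-1160.el7.x86_64). Succeeds when the newest
# installed kernel is not the one that is currently running.
kernel_reboot_needed() {
  local running="$1"; shift
  local newest
  newest=$(printf '%s\n' "$@" | sort -V | tail -n 1)
  [ "$newest" != "kernel-${running}" ]
}

# Example on a CentOS/RHEL VM:
#   if kernel_reboot_needed "$(uname -r)" $(rpm -q kernel); then sudo reboot; fi
```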
    

SLES

  1. Connect to the VM where you want to install the driver.

  2. Install the latest kernel package. If needed, this command also reboots the system.

    sudo zypper refresh
    sudo zypper up -y kernel-default | grep -q 'already installed' || sudo reboot
    
  3. If the system rebooted in the previous step, reconnect to the VM.

  4. Select a driver repository for the CUDA Toolkit and add it to your VM.

    sudo rpm --import https://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/7fa2af80.pub
    sudo zypper install https://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/cuda-11.0.3-1.x86_64.rpm
    
  5. Refresh Zypper.

    sudo zypper refresh
    
  6. Install CUDA, which includes the NVIDIA driver.

    sudo zypper install cuda
    

Ubuntu

  1. Connect to the VM where you want to install the driver.

  2. Install the kernel headers for the running kernel.

    sudo apt install linux-headers-$(uname -r)

  3. Select a driver repository for the CUDA Toolkit and install it on your VM. Follow the steps for your Ubuntu version.

    Ubuntu 20.04

    1. Download the repository pin file for Ubuntu 20.04.

      curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
      
    2. Move the pin file into the APT preferences directory.

      sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
      
    3. Fetch the repository signing key.

      sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
      
    4. Add the repository.

      sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
      
    5. Update the package lists.

      sudo apt update
      
    6. Install CUDA, which includes the NVIDIA driver.

      sudo apt -y install cuda
      

    Ubuntu 18.04

    1. Download the repository pin file for Ubuntu 18.04.

      curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
      
    2. Move the pin file into the APT preferences directory.

      sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
      
    3. Fetch the repository signing key.

      sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
      
    4. Add the repository.

      sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
      
    5. Update the package lists.

      sudo apt update
      
    6. Install CUDA, which includes the NVIDIA driver.

      sudo apt -y install cuda
      

    Ubuntu 16.04

    1. Download the repository pin file for Ubuntu 16.04.

      curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
      
    2. Move the pin file into the APT preferences directory.

      sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
      
    3. Fetch the repository signing key.

      sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
      
    4. Add the repository.

      sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/ /"
      
    5. Update the package lists.

      sudo apt update
      
    6. Install CUDA, which includes the NVIDIA driver.

      sudo apt -y install cuda
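The three Ubuntu procedures differ only in the repository path segment (`ubuntu2004`, `ubuntu1804`, `ubuntu1604`). The following minimal sketch derives that segment from a release number such as the `VERSION_ID` field of `/etc/os-release` and prints, rather than runs, the matching commands from the steps above. The function names are hypothetical, and the sketch assumes bash.

```shell
#!/usr/bin/env bash
# Sketch: print the CUDA repository setup commands for an Ubuntu release.
# Function names are hypothetical; nothing is executed, only printed.

CUDA_REPO_BASE="https://developer.download.nvidia.com/compute/cuda/repos"

# ubuntu_repo_id VERSION_ID: "20.04" -> "ubuntu2004".
ubuntu_repo_id() {
  echo "ubuntu${1//./}"
}

# print_cuda_repo_setup VERSION_ID: echo the documented steps for that release.
print_cuda_repo_setup() {
  local id url
  id=$(ubuntu_repo_id "$1")
  url="${CUDA_REPO_BASE}/${id}/x86_64"
  echo "curl -O ${url}/cuda-${id}.pin"
  echo "sudo mv cuda-${id}.pin /etc/apt/preferences.d/cuda-repository-pin-600"
  echo "sudo apt-key adv --fetch-keys ${url}/7fa2af80.pub"
  echo "sudo add-apt-repository \"deb ${url}/ /\""
  echo "sudo apt update"
  echo "sudo apt -y install cuda"
}
```

For example, `print_cuda_repo_setup "$(. /etc/os-release && echo "$VERSION_ID")"` prints the commands for the running release so that you can review them before you run them.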
      

Windows Server

  1. Connect to the instance where you want to install the driver.

  2. Download an .exe installer file to your instance that includes the R452 driver branch (NVIDIA 452.77 or later).

    For example, on Windows Server 2019 you can open a PowerShell terminal as an administrator and use the Invoke-WebRequest cmdlet to download the installer that you need. Invoke-WebRequest is available in PowerShell 3.0 or later.

    Invoke-WebRequest https://developer.download.nvidia.com/compute/cuda/11.2.0/network_installers/cuda_11.2.0_win10_network.exe -OutFile cuda_11.2.0_win10_network.exe

  3. Run the .exe installer. For example, you can open a PowerShell terminal as an administrator and run the following command:

    PS C:\> .\cuda_11.2.0_win10_network.exe
    

Installing GPU drivers on VMs that use Secure Boot

VMs with Secure Boot enabled require all kernel modules to be signed by a key that the system trusts. Currently, the NVIDIA driver can be installed on Secure Boot VMs only for Ubuntu 18.04 and 20.04 with the default Secure Boot settings. Support for more operating systems is in progress.
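Before you choose a procedure, you can check whether Secure Boot is enabled on the VM. The following is a minimal sketch, assuming that the `mokutil` utility is installed and that `mokutil --sb-state` reports a line such as `SecureBoot enabled`; the helper name `secure_boot_enabled` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect whether Secure Boot is enabled from `mokutil --sb-state` output.
# Assumes mokutil is installed; the helper name is hypothetical.

# secure_boot_enabled [STATE_TEXT]: succeed when the text reports
# "SecureBoot enabled". Without an argument, query mokutil directly.
secure_boot_enabled() {
  local state="${1-$(mokutil --sb-state 2>/dev/null)}"
  grep -q '^SecureBoot enabled' <<<"$state"
}

# Example:
#   if secure_boot_enabled; then echo "Use the Secure Boot procedure"; fi
```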

Ubuntu 18.04 and 20.04

  1. Connect to the VM where you want to install the driver.

  2. Update the package lists.

    sudo apt-get update
    
  3. Find the most recent NVIDIA kernel module package, or the version that you want. These packages contain NVIDIA kernel modules signed by the Ubuntu key. Run the following command to select the latest version:

    NVIDIA_DRIVER_VERSION=$(sudo apt-cache search 'linux-modules-nvidia-[0-9]+-gcp$' | awk '{print $1}' | sort | tail -n 1 | head -n 1 | awk -F"-" '{print $4}')
    

    To select an earlier version, increase the number passed to tail -n. For example, use 2 to get the next earlier version:

    NVIDIA_DRIVER_VERSION=$(sudo apt-cache search 'linux-modules-nvidia-[0-9]+-gcp$' | awk '{print $1}' | sort | tail -n 2 | head -n 1 | awk -F"-" '{print $4}')
    

    You can check the picked driver version by running echo $NVIDIA_DRIVER_VERSION. The output is a version string like 455.

  4. Install the kernel module package and corresponding NVIDIA driver:

    sudo apt install linux-modules-nvidia-${NVIDIA_DRIVER_VERSION}-gcp nvidia-driver-${NVIDIA_DRIVER_VERSION}
    

    If the command fails with a package-not-found error, the latest NVIDIA driver might be missing from the repository. Return to the previous step and select an earlier driver version.

  5. Verify that the NVIDIA driver is installed. You might need to reboot the VM.

  6. Configure APT to use the NVIDIA package repository.

    1. To help APT pick the correct dependency, pin the repositories as follows:

      sudo tee /etc/apt/preferences.d/cuda-repository-pin-600 > /dev/null <<EOL
      Package: nsight-compute
      Pin: origin *ubuntu.com*
      Pin-Priority: -1

      Package: nsight-systems
      Pin: origin *ubuntu.com*
      Pin-Priority: -1

      Package: nvidia-modprobe
      Pin: release l=NVIDIA CUDA
      Pin-Priority: 600

      Package: nvidia-settings
      Pin: release l=NVIDIA CUDA
      Pin-Priority: 600

      Package: *
      Pin: release l=NVIDIA CUDA
      Pin-Priority: 100
      EOL

    2. Install software-properties-common, which provides the add-apt-repository command. This step is required if you are using Ubuntu minimal images.

      sudo apt install software-properties-common
      

    3. Add the NVIDIA repository:

      • Ubuntu 18.04

        sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
        sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
        
      • Ubuntu 20.04

        sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
        sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
        
  7. Find the compatible CUDA driver version.

    The following script determines the latest CUDA driver version that is compatible with the NVIDIA driver that you just installed:

    CUDA_DRIVER_VERSION=$(apt-cache madison cuda-drivers | awk '{print $3}' | sort -r | while read line; do
       if dpkg --compare-versions $(dpkg-query -f='${Version}\n' -W nvidia-driver-${NVIDIA_DRIVER_VERSION}) ge $line ; then
           echo "$line"
           break
       fi
    done)
    

    You can check the CUDA driver version by running echo $CUDA_DRIVER_VERSION. The output is a version string like 455.32.00-1.

  8. Install CUDA drivers with the version identified from the previous step.

    sudo apt install cuda-drivers-${NVIDIA_DRIVER_VERSION}=${CUDA_DRIVER_VERSION} cuda-drivers=${CUDA_DRIVER_VERSION}
    

  9. Optional: Hold back dkms packages.

    When Secure Boot is enabled, all kernel modules must be signed before they can be loaded. Kernel modules built by dkms don't load on the VM because they aren't properly signed by default. Holding the dkms package helps prevent you from accidentally installing other dkms packages in the future.

    To hold dkms packages, run the following command:

    sudo apt-get remove dkms && sudo apt-mark hold dkms
    
  10. Install the CUDA toolkit and runtime.

    Pick a suitable CUDA version. The following script determines the latest CUDA version that is compatible with the CUDA driver that you just installed:

    CUDA_VERSION=$(apt-cache showpkg cuda-drivers | grep -o 'cuda-runtime-[0-9][0-9]-[0-9],cuda-drivers [0-9\.]*' | while read line; do
       if dpkg --compare-versions ${CUDA_DRIVER_VERSION} ge $(echo $line | grep -Eo '[[:digit:]]+\.[[:digit:]]+') ; then
           echo $(echo $line | grep -Eo '[[:digit:]]+-[[:digit:]]')
           break
       fi
    done)
    

    You can check the CUDA version by running echo $CUDA_VERSION. The output is a version string like 11-1.

    Install the CUDA package:

    sudo apt install cuda-${CUDA_VERSION}
    

  11. Verify the CUDA installation:

    sudo nvidia-smi
    /usr/local/cuda/bin/nvcc --version
    
    The first command prints the GPU information. The second command prints the installed CUDA compiler version.
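Step 3 takes the last entry from a sorted list of `linux-modules-nvidia-<version>-gcp` package names and extracts the fourth dash-separated field. The following sketch reproduces that selection as a function that reads package names on standard input, so that you can try it without a GPU VM. It swaps in GNU `sort -V` instead of the plain lexical `sort` used in step 3, so that a version with a different number of digits would still sort correctly; the name `pick_driver_version` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the step 3 selection logic. Reads package names, one per line,
# such as "linux-modules-nvidia-455-gcp", and prints the Nth-newest version.

# pick_driver_version N: N=1 is the newest, N=2 the next earlier, and so on.
pick_driver_version() {
  local n="${1:-1}"
  grep -E '^linux-modules-nvidia-[0-9]+-gcp$' \
    | sort -V \
    | tail -n "$n" | head -n 1 \
    | awk -F'-' '{print $4}'
}

# Example on the VM (same search as step 3):
#   apt-cache search 'linux-modules-nvidia-[0-9]+-gcp$' | awk '{print $1}' \
#     | pick_driver_version 1
```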

Verifying the GPU driver install

After completing the driver installation steps, verify that the driver installed and initialized properly.

Linux

Connect to the Linux instance and use the nvidia-smi command to verify that the driver is running properly.

sudo nvidia-smi

The output is similar to the following:

Wed Oct 28 21:34:28 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P0    52W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
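For scripted checks, nvidia-smi can also print selected fields as CSV with `--query-gpu` and `--format=csv,noheader`. The following is a minimal sketch that summarizes such output, assuming the query fields `index`, `name`, and `driver_version` and the comma-plus-space separator that nvidia-smi uses in CSV mode; the function name `report_gpus` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize `nvidia-smi --query-gpu=index,name,driver_version
# --format=csv,noheader` output read from stdin. The function name is hypothetical.

# report_gpus: print "GPU <index>: <name> (driver <version>)" for each line
# and fail if no GPU lines were read.
report_gpus() {
  local count=0 index name version
  while IFS=',' read -r index name version; do
    # Trim the single space that follows each comma.
    name="${name# }"
    version="${version# }"
    echo "GPU ${index}: ${name} (driver ${version})"
    count=$((count + 1))
  done
  [ "$count" -gt 0 ]
}

# Example on the VM:
#   nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader | report_gpus
```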

If this command fails, review the following:

  • Check that a GPU is attached to the VM.

    Use the following command to check for NVIDIA PCI devices:

    sudo lspci | grep -i "nvidia"

  • Check that the driver kernel version and the VM kernel version are the same.

    • To check the VM kernel version, run uname -r.
    • To check the driver kernel version on Ubuntu, run sudo apt-cache show linux-modules-nvidia-${NVIDIA_DRIVER_VERSION}-gcp, where NVIDIA_DRIVER_VERSION is the driver version that you installed.

    If the versions don't match, reboot the VM so that it runs the new kernel.
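The first troubleshooting check can be scripted. The following is a minimal sketch that takes `lspci` output as an argument so that the logic can be exercised anywhere; it assumes `lspci` (from pciutils) is available on the VM, and the function name `has_nvidia_device` is hypothetical. The `modinfo` line in the comments is a suggested extra check, not part of the documented procedure.

```shell
#!/usr/bin/env bash
# Sketch of the first troubleshooting check. The function name is hypothetical.

# has_nvidia_device LSPCI_OUTPUT: succeed if any PCI device line names NVIDIA.
has_nvidia_device() {
  grep -qi 'nvidia' <<<"$1"
}

# Example on the VM:
#   has_nvidia_device "$(lspci)" || echo "No NVIDIA GPU is attached to this VM"
# Extra check (an alternative to comparing package versions): modinfo looks up
# modules for the running kernel, so this fails if no nvidia module matches it.
#   modinfo -F version nvidia || echo "No nvidia module for kernel $(uname -r)"
```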

Windows Server

Connect to the Windows Server instance and open a PowerShell terminal as an administrator, then run the following command to verify that the driver is running properly.

&"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"

The output is similar to the following:

Thu Feb  4 21:21:39 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.89       Driver Version: 460.89       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            TCC  | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P8     7W /  75W |      8MiB /  7611MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

What's next?