Installing GPU drivers

After you create an instance with one or more GPUs, your system requires NVIDIA device drivers so that your applications can access the devices. Make sure your virtual machine (VM) instances have enough free disk space; choose at least 30 GB for the boot disk when creating the new VM.
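
For example, a suitable VM can be created with the gcloud CLI; the name, zone, machine type, GPU model, and image below are example values:

# Sketch: one NVIDIA T4 GPU and a 30 GB boot disk. GPU VMs use
# --maintenance-policy=TERMINATE because they don't support live migration.
gcloud compute instances create example-gpu-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --boot-disk-size=30GB \
    --image-family=ubuntu-2004-lts \
    --image-project=ubuntu-os-cloud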

This document explains how to install NVIDIA proprietary drivers on VMs that were created with public images or custom images.

To install GRID drivers for virtual workstations, see Installing GRID drivers for virtual workstations.

Before you begin

NVIDIA driver, CUDA toolkit, and CUDA runtime versions

Your environment might require several independently versioned driver and runtime components, including the following:

  • NVIDIA driver
  • CUDA toolkit
  • CUDA runtime

When installing these components, you can configure your environment to suit your needs. For example, if you have an earlier version of TensorFlow that works best with an earlier version of the CUDA toolkit, but the GPU that you want to use requires a later version of the NVIDIA driver, then you can install an earlier version of the CUDA toolkit along with a later version of the NVIDIA driver.

However, you must make sure that your NVIDIA driver and CUDA toolkit versions are compatible. For CUDA toolkit and NVIDIA driver compatibility, see the NVIDIA documentation about CUDA compatibility.
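
To see which versions a VM currently has, the following commands print the installed driver and toolkit versions (a minimal sketch, assuming the driver is loaded and the toolkit is in its default /usr/local/cuda location):

# Driver version reported by the running NVIDIA driver:
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# CUDA toolkit (compiler) version:
/usr/local/cuda/bin/nvcc --version | grep release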

Required NVIDIA driver versions

NVIDIA GPUs running on Compute Engine must use the following NVIDIA driver versions:

  • For A100 GPUs:
    • Linux : 450.80.02 or later
    • Windows: 452.77 or later
  • For all other GPU types:
    • Linux : NVIDIA 410.79 driver or later
    • Windows : 426.00 driver or later
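
As an illustration, you can compare an installed Linux driver against one of these minimums with a version-aware sort (a sketch; 450.80.02 is the A100 Linux minimum from the list above):

# Succeeds when the installed driver is at least the required minimum.
MINIMUM=450.80.02
INSTALLED=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
if [ "$(printf '%s\n' "$MINIMUM" "$INSTALLED" | sort -V | head -n 1)" = "$MINIMUM" ]; then
    echo "Driver ${INSTALLED} meets the minimum (${MINIMUM})."
else
    echo "Driver ${INSTALLED} is older than the minimum (${MINIMUM})."
fi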

Installing GPU drivers on VMs

One way to install the NVIDIA driver on most VMs is to install the NVIDIA CUDA Toolkit.

To install the CUDA toolkit, complete the following steps:

  1. Select a CUDA toolkit that supports the minimum driver that you need.

  2. Connect to the VM where you want to install the driver.

  3. On your VM, download and install the CUDA toolkit. The installation guide for each recommended toolkit is found in the following table. Before you install the toolkit, make sure you complete the pre-installation steps found in the installation guide.

    GPU type | Minimum recommended CUDA toolkit version | Installation instructions
    • NVIDIA A100
    • NVIDIA T4
    • NVIDIA V100
    • NVIDIA P100
    • NVIDIA P4
    • NVIDIA K80
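
    As an illustration only, a typical toolkit installation on Linux uses NVIDIA's runfile installer. The URL and file name below are hypothetical placeholders; take the real ones from the installation guide for your chosen toolkit version:

    # Hypothetical placeholders: substitute the runfile for your chosen
    # toolkit version from its installation guide.
    curl -O https://developer.download.nvidia.com/compute/cuda/CUDA_VERSION/local_installers/cuda_CUDA_VERSION_linux.run
    sudo sh cuda_CUDA_VERSION_linux.run --silent --driver --toolkit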

Installation scripts

You can use the following scripts to automate the installation process. To review these scripts, see the GitHub repository.

Linux

Supported operating systems

The Linux installation script was tested on the following operating systems:

  • CentOS 7 and 8
  • Debian 10 and 11
  • Red Hat Enterprise Linux (RHEL) 7 and 8
  • SUSE Linux Enterprise Server (SUSE) 15
  • Ubuntu 18 and 20

The installation of GPU drivers on other operating systems using this script might fail.

  1. Ensure that Python 3 is installed on your operating system.
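
    For example, on Debian-based images you can check for Python 3 and install it if it's missing (a sketch; use your distribution's package manager on RHEL, CentOS, or SUSE systems):

    python3 --version || sudo apt-get install -y python3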

  2. Download the installation script.

    curl https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py --output install_gpu_driver.py
  3. Run the installation script.

    sudo python3 install_gpu_driver.py

    The script takes some time to run. It might restart your VM. If the VM restarts, run the script again to continue the installation.

  4. Verify the installation. See Verifying the GPU driver install.

Windows

This installation script can be used on VMs that have Secure Boot enabled.

Open a PowerShell terminal as an administrator, then complete the following steps:

  1. Download the script.

    Invoke-WebRequest https://github.com/GoogleCloudPlatform/compute-gpu-installation/raw/main/windows/install_gpu_driver.ps1 -OutFile C:\install_gpu_driver.ps1
  2. Run the script.

    C:\install_gpu_driver.ps1

    The script takes some time to run. It doesn't display any prompts during the installation. After the script exits, the driver is installed.

    This script installs the drivers in the following default location on your VM: "C:\Program Files\NVIDIA Corporation".

  3. Verify the installation. See Verifying the GPU driver install.

Installing GPU drivers on VMs that use Secure Boot

VMs with Secure Boot enabled require all kernel modules to be signed by a key that is trusted by the system.
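
To confirm the VM's Secure Boot state before you start, you can run mokutil (installed by default on recent Ubuntu images):

mokutil --sb-state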

OS support

  • For installation of NVIDIA drivers on Windows operating systems that use Secure Boot, see the general Installing GPU drivers on VMs section.
  • For Linux operating systems, support is only available for Ubuntu 18.04 and 20.04 operating systems. Support for more operating systems is in progress.

Ubuntu 18.04 and 20.04

  1. Connect to the VM where you want to install the driver.

  2. Update the repository.

    sudo apt-get update
    
  3. Find the most recent NVIDIA kernel module package, or the version that you want. This package contains NVIDIA kernel modules signed by the Ubuntu key. The following command selects the latest version:

    NVIDIA_DRIVER_VERSION=$(sudo apt-cache search 'linux-modules-nvidia-[0-9]+-gcp$' | awk '{print $1}' | sort | tail -n 1 | head -n 1 | awk -F"-" '{print $4}')
    

    To pick an earlier version, change the number passed to tail. For example, specify tail -n 2 to get the next earlier version:

    NVIDIA_DRIVER_VERSION=$(sudo apt-cache search 'linux-modules-nvidia-[0-9]+-gcp$' | awk '{print $1}' | sort | tail -n 2 | head -n 1 | awk -F"-" '{print $4}')
    

    You can check the selected driver version by running echo $NVIDIA_DRIVER_VERSION. The output is a version string such as 455.

  4. Install the kernel module package and corresponding NVIDIA driver:

    sudo apt install linux-modules-nvidia-${NVIDIA_DRIVER_VERSION}-gcp nvidia-driver-${NVIDIA_DRIVER_VERSION}
    

    If the command fails with a package-not-found error, the latest NVIDIA driver might be missing from the repository. Return to the previous step to select an earlier driver version.

  5. Verify that the NVIDIA driver is installed. You might need to reboot the VM.
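
    For a quick check (nvidia-smi is installed as part of the driver packages; reboot first if the command fails or reports a version mismatch):

    sudo nvidia-smi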

  6. Configure APT to use the NVIDIA package repository.

    1. To help APT pick the correct dependency, pin the repositories as follows:

      sudo tee /etc/apt/preferences.d/cuda-repository-pin-600 > /dev/null <<EOL
      Package: nsight-compute
      Pin: origin *ubuntu.com*
      Pin-Priority: -1

      Package: nsight-systems
      Pin: origin *ubuntu.com*
      Pin-Priority: -1

      Package: nvidia-modprobe
      Pin: release l=NVIDIA CUDA
      Pin-Priority: 600

      Package: nvidia-settings
      Pin: release l=NVIDIA CUDA
      Pin-Priority: 600

      Package: *
      Pin: release l=NVIDIA CUDA
      Pin-Priority: 100
      EOL

    2. Install software-properties-common. This is required if you are using Ubuntu minimal images.

      sudo apt install software-properties-common
      

    3. Add the NVIDIA repository:

      • Ubuntu 18.04

        sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
        sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
        
      • Ubuntu 20.04

        sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
        sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
        
  7. Find the compatible CUDA driver version.

    The following script determines the latest CUDA driver version that is compatible with the NVIDIA driver that you installed earlier:

    CUDA_DRIVER_VERSION=$(apt-cache madison cuda-drivers | awk '{print $3}' | sort -r | while read line; do
       if dpkg --compare-versions $(dpkg-query -f='${Version}\n' -W nvidia-driver-${NVIDIA_DRIVER_VERSION}) ge $line ; then
           echo "$line"
           break
       fi
    done)
    

    You can check the CUDA driver version by running echo $CUDA_DRIVER_VERSION. The output is a version string like 455.32.00-1.

  8. Install CUDA drivers with the version identified from the previous step.

    sudo apt install cuda-drivers-${NVIDIA_DRIVER_VERSION}=${CUDA_DRIVER_VERSION} cuda-drivers=${CUDA_DRIVER_VERSION}
    

  9. Optional: Hold back dkms packages.

    After enabling Secure Boot, all kernel modules must be signed to be loaded. Kernel modules built by dkms don't work on the VM because they aren't properly signed by default. This is an optional step, but it can help prevent you from accidentally installing other dkms packages in the future.

    To hold dkms packages, run the following command:

    sudo apt-get remove dkms && sudo apt-mark hold dkms
    
  10. Install CUDA toolkit and runtime.

    Pick a suitable CUDA version. The following script determines the latest CUDA version that is compatible with the CUDA driver that you installed in the previous step:

    CUDA_VERSION=$(apt-cache showpkg cuda-drivers | grep -o 'cuda-runtime-[0-9][0-9]-[0-9],cuda-drivers [0-9\.]*' | while read line; do
       if dpkg --compare-versions ${CUDA_DRIVER_VERSION} ge $(echo $line | grep -Eo '[[:digit:]]+\.[[:digit:]]+') ; then
           echo $(echo $line | grep -Eo '[[:digit:]]+-[[:digit:]]')
           break
       fi
    done)
    

    You can check the CUDA version by running echo $CUDA_VERSION. The output is a version string like 11-1.

    Install the CUDA package:

    sudo apt install cuda-${CUDA_VERSION}
    

  11. Verify the CUDA installation:

    sudo nvidia-smi
    /usr/local/cuda/bin/nvcc --version
    
    The first command prints the GPU information. The second command prints the installed CUDA compiler version.
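
Optionally, add the toolkit's bin directory to your PATH so that nvcc resolves without its full path (a sketch that assumes the default /usr/local/cuda symlink created by the toolkit packages):

# Make CUDA tools available in future shell sessions:
echo 'export PATH=/usr/local/cuda/bin:${PATH}' >> ~/.bashrc
source ~/.bashrc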

Verifying the GPU driver install

After completing the driver installation steps, verify that the driver installed and initialized properly.

Linux

Connect to the Linux instance and use the nvidia-smi command to verify that the driver is running properly.

sudo nvidia-smi

The output is similar to the following:

Mon Oct 11 12:51:37 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P0    50W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

If this command fails, review the following:

  • Check if there is any GPU attached to the VM.

    Use the following command to check for any NVIDIA PCI devices:

    sudo lspci | grep -i "nvidia"

  • Check that the driver kernel version and the VM kernel version are the same.

    • To check the VM kernel version, run uname -r.
    • To check the driver kernel version, run sudo apt-cache show linux-modules-nvidia-NVIDIA_DRIVER_VERSION-gcp.

    If the versions don't match, reboot the VM to the new kernel version.
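
    As a combined sketch (NVIDIA_DRIVER_VERSION is the driver version you installed, for example 455):

    # Running kernel version:
    uname -r
    # Kernel that the signed NVIDIA modules were built against, from the
    # package metadata:
    sudo apt-cache show "linux-modules-nvidia-${NVIDIA_DRIVER_VERSION}-gcp" | grep -i depends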

Windows Server

Connect to the Windows Server instance and open a PowerShell terminal as an administrator, then run the following command to verify that the driver is running properly.

&"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"

The output is similar to the following:

Mon Oct 11 12:13:10 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 462.31       Driver Version: 462.31       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4           WDDM  | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P8    18W /  70W |    570MiB / 15360MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       408    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      3120    C+G   ...w5n1h2txyewy\SearchUI.exe    N/A      |
|    0   N/A  N/A      4056    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      4176    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A      5276    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A      5540    C+G   ...in7x64\steamwebhelper.exe    N/A      |
|    0   N/A  N/A      6296    C+G   ...y\GalaxyClient Helper.exe    N/A      |
+-----------------------------------------------------------------------------+

What's next?