Adding or Removing GPUs

Google Compute Engine provides graphics processing units (GPUs) that you can add to your virtual machine instances. You can use these GPUs to accelerate specific workloads on your instances such as machine learning and data processing.

For more information about what you can do with GPUs and what types of GPU hardware are available, read GPUs on Compute Engine.

Before you begin

Creating an instance with a GPU

Before you create an instance with a GPU, select which boot disk image you want to use for the instance, and ensure that the appropriate GPU driver is installed.

If you are using GPUs for machine learning, you can use a Deep Learning VM image for your instance. The Deep Learning VM images have GPU drivers pre-installed, and include packages such as TensorFlow and PyTorch. You can also use the Deep Learning VM images for general GPU workloads. For information on the images available, and the packages installed on the images, see the Deep Learning VM documentation.
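
For example, here is a minimal sketch of creating a GPU instance from a Deep Learning VM image family. The tf-latest-gpu image family and the deeplearning-platform-release image project shown here are assumptions; check the Deep Learning VM documentation for the current image names before you use them.

gcloud compute instances create dlvm-gpu-instance \
    --machine-type n1-standard-2 --zone us-east1-d \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --image-family tf-latest-gpu --image-project deeplearning-platform-release \
    --maintenance-policy TERMINATE --restart-on-failure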

You can also use any public image or custom image, but some images might require a unique driver or install process that is not covered in this guide. You must identify what drivers are appropriate for your images.

For steps to install drivers, see installing GPU drivers.

When you create an instance with one or more GPUs, you must set the instance to terminate on host maintenance. Instances with GPUs cannot live migrate because they are assigned to specific hardware devices. See GPU restrictions for details.

Create an instance with one or more GPUs using the Google Cloud Platform Console, the gcloud command-line tool, or the API.

Console

  1. Go to the VM instances page.

  2. Click Create instance.
  3. Select a zone where GPUs are available. See the list of available zones with GPUs.
  4. In the Machine type section, select the machine type that you want to use for this instance. Alternatively, you can specify custom machine type settings later.
  5. In the Machine type section, click Customize to see advanced machine type options and available GPUs.
  6. Click GPUs to see the list of available GPUs.
  7. Specify the GPU type and the number of GPUs that you need.
  8. If necessary, adjust the machine type to accommodate your desired GPU settings. If you leave these settings as they are, the instance uses the predefined machine type that you specified before opening the machine type customization screen.
  9. To configure your boot disk, in the Boot disk section, click Change.
  10. In the OS images tab, choose an image.
  11. Click Select to confirm your boot disk options.
  12. Optionally, you can include a startup script to install the GPU driver while the instance starts up. In the Automation section, include the contents of your startup script under Startup script. See installing GPU drivers for example scripts.
  13. Configure any other instance settings that you require. For example, you can change the Preemptibility settings to configure your instance as a preemptible instance. This reduces the cost of your instance and the attached GPUs. Read GPUs on preemptible instances to learn more.
  14. At the bottom of the page, click Create to create the instance.

gcloud

Use the regions describe command to ensure that you have sufficient GPU quota in the region where you want to create instances with GPUs.

gcloud compute regions describe [REGION]

where [REGION] is the region where you want to check for GPU quota.
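
For example, the following sketch checks for GPU quota in the us-east1 region. The region name is only an example, and the grep filter relies on GPU quota metrics containing "NVIDIA" in their names:

gcloud compute regions describe us-east1 | grep -i nvidia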

Start an instance with the latest image from an image family:

gcloud compute instances create [INSTANCE_NAME] \
    --machine-type [MACHINE_TYPE] --zone [ZONE] \
    --accelerator type=[ACCELERATOR_TYPE],count=[ACCELERATOR_COUNT] \
    --image-family [IMAGE_FAMILY] --image-project [IMAGE_PROJECT] \
    --maintenance-policy TERMINATE --restart-on-failure \
    --metadata startup-script='[STARTUP_SCRIPT]' \
    [--preemptible]

where:

  • [INSTANCE_NAME] is the name for the new instance.
  • [MACHINE_TYPE] is the machine type that you selected for the instance. See GPUs on Compute Engine to see what machine types are available based on your desired GPU count.
  • [ZONE] is the zone for this instance.
  • [IMAGE_FAMILY] is one of the available image families.
  • [ACCELERATOR_COUNT] is the number of GPUs that you want to add to your instance. See GPUs on Compute Engine for a list of GPU limits based on the machine type of your instance.
  • [ACCELERATOR_TYPE] is the GPU model that you want to use. Use one of the following values:

    * NVIDIA® Tesla® P4: `nvidia-tesla-p4`
    * NVIDIA® Tesla® P4 Virtual Workstation with NVIDIA®
      GRID®: `nvidia-tesla-p4-vws`
    * NVIDIA® Tesla® P100: `nvidia-tesla-p100`
    * NVIDIA® Tesla® P100 Virtual Workstation with NVIDIA®
      GRID®: `nvidia-tesla-p100-vws`
    * NVIDIA® Tesla® V100: `nvidia-tesla-v100`
    * NVIDIA® Tesla® K80: `nvidia-tesla-k80`
    

    See GPUs on Compute Engine for a list of available GPU models.

  • [IMAGE_PROJECT] is the image project that the image family belongs to.
  • [STARTUP_SCRIPT] is an optional startup script that you can use to install the GPU driver while the instance is starting up. See installing GPU drivers for examples.
  • --preemptible is an optional flag that configures your instance as a preemptible instance. This reduces the cost of your instance and the attached GPUs. Read GPUs on preemptible instances to learn more.

For example, you can use the following gcloud command to start an Ubuntu 16.04 instance with one NVIDIA® Tesla® K80 GPU and 2 vCPUs in the us-east1-d zone. The startup-script metadata instructs the instance to install the CUDA Toolkit with its recommended driver version.

gcloud compute instances create gpu-instance-1 \
    --machine-type n1-standard-2 --zone us-east1-d \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --image-family ubuntu-1604-lts --image-project ubuntu-os-cloud \
    --maintenance-policy TERMINATE --restart-on-failure \
    --metadata startup-script='#!/bin/bash
    echo "Checking for CUDA and installing."
    # Check for CUDA and try to install.
    if ! dpkg-query -W cuda-9-0; then
      curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
      dpkg -i ./cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
      apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
      apt-get update
      apt-get install cuda-9-0 -y
    fi'

This example command starts the instance, but CUDA and the driver will take several minutes to finish installing.
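
To follow the installation progress, you can read the instance's serial port output, where startup script messages typically appear. This sketch uses the example instance name and zone from the command above:

gcloud compute instances get-serial-port-output gpu-instance-1 --zone us-east1-d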

API

Identify the GPU type that you want to add to your instance. Submit a GET request to list the GPU types that are available to your project in a specific zone.

GET https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/acceleratorTypes

where:

  • [PROJECT_ID] is your project ID.
  • [ZONE] is the zone where you want to list the available GPU types.
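
For example, you can send this request with curl, using an access token from the gcloud tool. The project ID my-project and the zone us-east1-d are placeholders:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-east1-d/acceleratorTypes"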

In the API, send a POST request to create a new instance. Include the acceleratorType parameter to specify which GPU type you want to use, and include the acceleratorCount parameter to specify how many GPUs you want to add. Also set the onHostMaintenance parameter to TERMINATE.

POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances?key={YOUR_API_KEY}
{
  "machineType": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/machineTypes/n1-highmem-2",
  "disks":
  [
    {
      "type": "PERSISTENT",
      "initializeParams":
      {
        "diskSizeGb": "[DISK_SIZE]",
        "sourceImage": "https://www.googleapis.com/compute/v1/projects/[IMAGE_PROJECT]/global/images/family/[IMAGE_FAMILY]"
      },
      "boot": true
    }
  ],
  "name": "[INSTANCE_NAME]",
  "networkInterfaces":
  [
    {
      "network": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/global/networks/[NETWORK]"
    }
  ],
  "guestAccelerators":
  [
    {
      "acceleratorCount": [ACCELERATOR_COUNT],
      "acceleratorType": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/acceleratorTypes/[ACCELERATOR_TYPE]"
    }
  ],
  "scheduling":
  {
    "onHostMaintenance": "terminate",
    "automaticRestart": true,
    ["preemptible": true]
  },
  "metadata":
  {
    "items":
    [
      {
        "key": "startup-script",
        "value": "[STARTUP_SCRIPT]"
      }
    ]
  }
}

where:

  • [INSTANCE_NAME] is the name of the instance.
  • [PROJECT_ID] is your project ID.
  • [ZONE] is the zone for this instance.
  • [MACHINE_TYPE] is the machine type that you selected for the instance. See GPUs on Compute Engine to see what machine types are available based on your desired GPU count.
  • [IMAGE_PROJECT] is the image project that the image belongs to.
  • [IMAGE_FAMILY] is a boot disk image for your instance. Specify an image family from the list of available public images.
  • [DISK_SIZE] is the size of your boot disk in GB.
  • [NETWORK] is the VPC network that you want to use for this instance. Specify default to use your default network.
  • [ACCELERATOR_COUNT] is the number of GPUs that you want to add to your instance. See GPUs on Compute Engine for a list of GPU limits based on the machine type of your instance.
  • [ACCELERATOR_TYPE] is the GPU model that you want to use. See GPUs on Compute Engine for a list of available GPU models.
  • [STARTUP_SCRIPT] is an optional startup script that you can use to install the GPU driver while the instance is starting up. See installing GPU drivers for examples.
  • "preemptible": true is an optional parameter that configures your instance as a preemptible instance. This reduces the cost of your instance and the attached GPUs. Read GPUs on preemptible instances to learn more.

If you used a startup script to automatically install the GPU device driver, verify that the GPU driver installed correctly.

If you did not use a startup script to install the GPU driver during instance creation, manually install the GPU driver on your instance so that your system can use the device.

Adding or removing GPUs on existing instances

You can add or detach GPUs on your existing instances, but you must first stop the instance and change its host maintenance setting so that it terminates rather than live-migrating. Instances with GPUs cannot live migrate because they are assigned to specific hardware devices. See GPU restrictions for details.

Also be aware that you must install GPU drivers on the instance after you add a GPU. The boot disk image that you used to create the instance determines which drivers you need. You must identify the drivers that are appropriate for the operating system on your instance's boot disk image. Read installing GPU drivers for details.

You can add or remove GPUs from an instance using the Google Cloud Platform Console or the API.

Console

You can add or remove GPUs from your instance by stopping the instance and editing your instance's configuration.

  1. Verify that all of your critical applications are stopped on the instance. You must stop the instance before you can add a GPU.

  2. Go to the VM instances page to see your list of instances.

  3. On the list of instances, click the name of the instance where you want to add GPUs. The instance details page opens.

  4. At the top of the instance details page, click Stop to stop the instance.

  5. After the instance stops running, click Edit to change the instance properties.

  6. If the instance has a shared-core machine type, you must change the machine type to have one or more vCPUs. You cannot add accelerators to instances with shared-core machine types.

  7. In the Machine type settings, click GPUs to expand the GPU selection list.

  8. Select the number of GPUs and the GPU model that you want to add to your instance. Alternatively, you can set the number of GPUs to None to remove existing GPUs from the instance.

  9. If you added GPUs to an instance, set the host maintenance setting to Terminate. If you removed GPUs from the instance, you can optionally set the host maintenance setting back to Migrate VM instance.

  10. At the bottom of the instance details page, click Save to apply your changes.

  11. After the instance settings are saved, click Start at the top of the instance details page to start the instance again.

API

You can add or remove GPUs from your instance by stopping the instance and changing your instance's configuration through the API.

  1. Verify that all of your critical applications are stopped on the instance, and then send a POST request to stop the instance so that it can move to a host system where GPUs are available.

    POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances/[INSTANCE_NAME]/stop
    

    where:

    • [INSTANCE_NAME] is the name of the instance where you want to add or remove GPUs.
    • [PROJECT_ID] is your project ID.
    • [ZONE] is the zone where the instance is located.
  2. Identify the GPU type that you want to add to your instance. Submit a GET request to list the GPU types that are available to your project in a specific zone.

    GET https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/acceleratorTypes
    

    where:

    • [PROJECT_ID] is your project ID.
    • [ZONE] is the zone where you want to list the available GPU types.
  3. If the instance has a shared-core machine type, you must change the machine type to have one or more vCPUs. You cannot add accelerators to instances with shared-core machine types.

  4. After the instance stops, send a POST request to add or remove GPUs on the instance.

    POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances/[INSTANCE_NAME]/setMachineResources
    
    {
     "guestAccelerators": [
      {
        "acceleratorCount": [ACCELERATOR_COUNT],
        "acceleratorType": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/acceleratorTypes/[ACCELERATOR_TYPE]"
      }
     ]
    }
    

    where:

    • [INSTANCE_NAME] is the name of the instance.
    • [PROJECT_ID] is your project ID.
    • [ZONE] is the zone for this instance.
    • [ACCELERATOR_COUNT] is the number of GPUs that you want on your instance. See GPUs on Compute Engine for a list of GPU limits based on the machine type of your instance.
    • [ACCELERATOR_TYPE] is the GPU model that you want to use. See GPUs on Compute Engine for a list of available GPU models.
  5. Send a POST request to set the scheduling options for the instance. If you are adding GPUs to an instance, you must specify "onHostMaintenance": "TERMINATE". Optionally, if you are removing GPUs from an instance, you can specify "onHostMaintenance": "MIGRATE".

    POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances/[INSTANCE_NAME]/setScheduling
    
    {
     "onHostMaintenance": "[MAINTENANCE_TYPE]",
     "automaticRestart": true
    }
    

    where:

    • [INSTANCE_NAME] is the name of the instance where you want to add GPUs.
    • [PROJECT_ID] is your project ID.
    • [ZONE] is the zone where the instance is located.
    • [MAINTENANCE_TYPE] is the action that you want your instance to take when host maintenance is necessary. Specify TERMINATE if you are adding GPUs to your instance. Alternatively, specify MIGRATE if you have removed all of the GPUs from your instance and want the instance to resume live migration during host maintenance events.
  6. Start the instance.

    POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances/[INSTANCE_NAME]/start
    

    where:

    • [INSTANCE_NAME] is the name of the instance.
    • [PROJECT_ID] is your project ID.
    • [ZONE] is the zone where the instance is located.
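
For example, the following sketch runs the full sequence with curl to add one nvidia-tesla-k80 GPU to an instance named gpu-instance-1 in the us-east1-d zone. The project ID my-project is a placeholder:

# Stop the instance.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-east1-d/instances/gpu-instance-1/stop"

# Add one GPU to the stopped instance.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-east1-d/instances/gpu-instance-1/setMachineResources" \
    -d '{"guestAccelerators": [{"acceleratorCount": 1, "acceleratorType": "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-east1-d/acceleratorTypes/nvidia-tesla-k80"}]}'

# Set the host maintenance behavior to TERMINATE.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-east1-d/instances/gpu-instance-1/setScheduling" \
    -d '{"onHostMaintenance": "TERMINATE", "automaticRestart": true}'

# Start the instance again.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-east1-d/instances/gpu-instance-1/start"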

Next, install the GPU driver on your instance so that your system can use the device.

Creating groups of GPU instances using instance templates

You can use instance templates to create managed instance groups with GPUs added to each instance. Managed instance groups use the template to create multiple identical instances. You can scale the number of instances in the group to match your workload.

For steps to create an instance template, see Creating instance templates.

If you create the instance template using the Console, customize the machine type and select the type and number of GPUs that you want to add to the template.

If you are using the gcloud command-line tool, include the --accelerators and --maintenance-policy TERMINATE flags. Optionally, include the --metadata startup-script flag and specify a startup script to install the GPU driver while the instance starts up. For sample scripts that work on GPU instances, see installing GPU drivers.

The following example creates an instance template with 2 vCPUs, a 250GB boot disk with Ubuntu 16.04, an NVIDIA® Tesla® K80 GPU, and a startup script. The startup script installs the CUDA Toolkit with its recommended driver version.

gcloud beta compute instance-templates create gpu-template \
    --machine-type n1-standard-2 \
    --boot-disk-size 250GB \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --image-family ubuntu-1604-lts --image-project ubuntu-os-cloud \
    --maintenance-policy TERMINATE --restart-on-failure \
    --metadata startup-script='#!/bin/bash
    echo "Checking for CUDA and installing."
    # Check for CUDA and try to install.
    if ! dpkg-query -W cuda-9-0; then
      curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
      dpkg -i ./cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
      apt-get update
      apt-get install cuda-9-0 -y
    fi'

After you create the template, use the template to create an instance group. Every time you add an instance to the group, it starts that instance using the settings in the instance template.
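
Before you pick zones for the group, you can confirm which zones offer a given GPU model by listing accelerator types. This is a sketch, and the filter expression is an assumption:

gcloud compute accelerator-types list --filter="name:nvidia-tesla-k80"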

If you are creating a regional managed instance group, be sure to select zones that specifically support the GPU model that you want. For a list of GPU models and available zones, see GPUs on Compute Engine. The following example creates a regional managed instance group across two zones that support the nvidia-tesla-k80 model.

gcloud beta compute instance-groups managed create example-rmig \
    --template gpu-template --base-instance-name example-instances \
    --size 30 --zones us-east1-c,us-east1-d

Note: If you are choosing specific zones, use the gcloud beta component because the zone selection feature is currently in Beta.
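
You can also attach an autoscaler to the managed instance group so that its size tracks your workload. The following is a sketch that assumes CPU-based autoscaling suits your GPU workload; adjust the target and limits for your own application:

gcloud beta compute instance-groups managed set-autoscaling example-rmig \
    --region us-east1 \
    --max-num-replicas 30 \
    --target-cpu-utilization 0.8 \
    --cool-down-period 300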

To learn more about managing and scaling groups of instances, read Creating Groups of Managed Instances.

Installing GPU drivers

After you create an instance with one or more GPUs, your system requires device drivers so that your applications can access the device. This guide shows the ways to install NVIDIA proprietary drivers on instances with public images.

You can install GPU drivers either by using the installation scripts in the following section or by installing the drivers manually.

Installing GPU drivers using scripts

NVIDIA GPUs running on Google Compute Engine must use the following driver versions:

  • Linux instances:
    • R384 branch: NVIDIA 384.111 driver or greater
    • R390 branch: Not yet available
  • Windows Server instances:
    • R384 branch: NVIDIA 386.07 driver or greater
    • R390 branch: Not yet available

For most driver installs, you can obtain these drivers by installing the NVIDIA CUDA Toolkit.

On some images, you can use scripts to simplify the driver install process. You can either specify these scripts as startup scripts on your instances or copy these scripts to your instances and run them through the terminal as a user with sudo privileges.
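
For example, here is a sketch of both approaches for an existing instance; the instance name, zone, and script filename are placeholders:

# Option 1: set the script as startup-script metadata so it runs on the next boot.
gcloud compute instances add-metadata gpu-instance-1 --zone us-east1-d \
    --metadata-from-file startup-script=install-gpu-driver.sh

# Option 2: copy the script to the instance and run it with sudo.
gcloud compute scp install-gpu-driver.sh gpu-instance-1: --zone us-east1-d
gcloud compute ssh gpu-instance-1 --zone us-east1-d \
    --command "sudo bash install-gpu-driver.sh"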

You must prepare the script so that it works with the boot disk image that you selected. If you imported a custom boot disk image for your instances, you might need to customize the startup script to work correctly with that custom image.

For Windows Server instances and SLES 12 instances where you cannot automate the driver installation process, install the driver manually.

The following samples are startup scripts that install CUDA and the associated drivers for NVIDIA® GPUs on public images. If the software you are using requires a specific version of CUDA, modify the script to download the version of CUDA that you need.

For information on support for CUDA, and for steps to modify your CUDA installation, see the CUDA Toolkit Documentation.

CentOS

This sample script checks for an existing CUDA install and then installs the full CUDA 9 package and its associated proprietary driver.

CentOS 7 - CUDA 9:

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-9-0; then
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-9.0.176-1.x86_64.rpm
  rpm -i --force ./cuda-repo-rhel7-9.0.176-1.x86_64.rpm
  yum clean all
  # Install Extra Packages for Enterprise Linux (EPEL) for dependencies
  yum install epel-release -y
  yum update -y
  yum install cuda-9-0 -y
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-9-0; then
  yum install cuda-9-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1

On instances with CentOS 7 images, you might need to reboot the instance after the script finishes installing the drivers and the CUDA packages. Reboot the instance if the script has finished but the nvidia-smi command returns the following error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA
driver. Make sure that the latest NVIDIA driver is installed and
running.

CentOS 6 - CUDA 9:

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-9-0; then
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/cuda-repo-rhel6-9.0.176-1.x86_64.rpm
  rpm -i --force ./cuda-repo-rhel6-9.0.176-1.x86_64.rpm
  yum clean all
  # Install Extra Packages for Enterprise Linux (EPEL) for dependencies
  yum install epel-release -y
  yum update -y
  yum install cuda-9-0 -y
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-9-0; then
  yum install cuda-9-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1

RHEL

This script checks for an existing CUDA install and then installs the full CUDA 9 package and its associated proprietary driver.

RHEL 7 - CUDA 9:

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-9-0; then
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-9.0.176-1.x86_64.rpm
  rpm -i --force ./cuda-repo-rhel7-9.0.176-1.x86_64.rpm
  curl -O https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
  rpm -i --force ./epel-release-latest-7.noarch.rpm
  yum clean all
  yum update -y
  yum install cuda-9-0 -y
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-9-0; then
  yum install cuda-9-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1

On instances with RHEL 7 images, you might need to reboot the instance after the script finishes installing the drivers and the CUDA packages. Reboot the instance if the script has finished but the nvidia-smi command returns the following error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA
driver. Make sure that the latest NVIDIA driver is installed and
running.

RHEL 6 - CUDA 9:

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-9-0; then
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/cuda-repo-rhel6-9.0.176-1.x86_64.rpm
  rpm -i --force ./cuda-repo-rhel6-9.0.176-1.x86_64.rpm
  curl -O https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
  rpm -i --force ./epel-release-latest-6.noarch.rpm
  yum clean all
  yum update -y
  yum install cuda-9-0 -y
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-9-0; then
  yum install cuda-9-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1

SLES

This sample script checks for an existing CUDA install and then installs the full CUDA 9.1 package and its associated proprietary driver.

SLES 12 Service Pack 3 - CUDA 9.1:

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-9-1; then
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/sles123/x86_64/cuda-repo-sles123-9.1.85-1.x86_64.rpm
  rpm -i --force ./cuda-repo-sles123-9.1.85-1.x86_64.rpm
  zypper --gpg-auto-import-keys refresh
  zypper install -ny cuda-9-1
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-9-1; then
  zypper install -ny cuda-9-1
fi
# Enable persistence mode
nvidia-smi -pm 1

On SLES 12 instances, install the driver manually.

Ubuntu

This sample script checks for an existing CUDA install and then installs the full CUDA 9 package and its associated proprietary driver.

Ubuntu 17.04 and 17.10 - CUDA 9:

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-9-0; then
  # The 17.04 installer works with 17.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
  apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/7fa2af80.pub
  apt-get update
  apt-get install cuda-9-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1

Ubuntu 16.04 LTS - CUDA 9:

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-9-0; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
  apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
  apt-get update
  apt-get install cuda-9-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1

Windows Server

On Windows Server instances, you must install the driver manually.

After your script finishes running, you can verify that the GPU driver installed correctly.

Manually installing GPU drivers

If you cannot use a script to install the driver for your GPUs, you can manually install the driver yourself. You are responsible for selecting the installer and driver version that works best for your applications. Use this install method if you require a specific driver or you need to install the driver on a custom image or a public image that does not work with one of the install scripts.

You can use this process to manually install drivers on instances with most public images. For custom images, you might need to modify the process to function in your unique environment.

CentOS

  1. Connect to the instance where you want to install the driver.

  2. Select a driver repository and add it to your instance. For example, use curl to download the CUDA Toolkit and use the rpm command to add the repository to your system:

    • CentOS 7

      $ curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-9.0.176-1.x86_64.rpm
      
      $ sudo rpm -i cuda-repo-rhel7-9.0.176-1.x86_64.rpm
      

    • CentOS 6

      $ curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/cuda-repo-rhel6-9.0.176-1.x86_64.rpm
      
      $ sudo rpm -i cuda-repo-rhel6-9.0.176-1.x86_64.rpm
      

  3. Install the epel-release repository. This repository includes the DKMS packages, which are required to install NVIDIA drivers on CentOS.

    $ sudo yum install epel-release
    

  4. Clean the Yum cache:

    $ sudo yum clean all
    

  5. Install CUDA 9, which includes the NVIDIA driver.

    $ sudo yum install cuda-9-0
    

  6. Enable persistence mode.

    $ sudo nvidia-smi -pm 1
    Enabled persistence mode for GPU 00000000:00:04.0.
    Enabled persistence mode for GPU 00000000:00:05.0.
    All done.
    

RHEL

  1. Connect to the instance where you want to install the driver.

  2. Select a driver repository and add it to your instance. For example, use curl to download the CUDA Toolkit and use the rpm command to add the repository to your system:

    • RHEL 7

      $ curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-9.0.176-1.x86_64.rpm
      
      $ sudo rpm -i cuda-repo-rhel7-9.0.176-1.x86_64.rpm
      

    • RHEL 6

      $ curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/cuda-repo-rhel6-9.0.176-1.x86_64.rpm
      
      $ sudo rpm -i cuda-repo-rhel6-9.0.176-1.x86_64.rpm
      

  3. Install the epel-release repository. This repository includes the DKMS packages, which are required to install NVIDIA drivers. On RHEL, you must download the .rpm for this repository from fedoraproject.org and add it to your system.

    • RHEL 7

      $ curl -O https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
      
      $ sudo rpm -i epel-release-latest-7.noarch.rpm
      

    • RHEL 6

      $ curl -O https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
      
      $ sudo rpm -i epel-release-latest-6.noarch.rpm
      

  4. Clean the Yum cache:

    $ sudo yum clean all
    

  5. Install CUDA 9, which includes the NVIDIA driver.

    $ sudo yum install cuda-9-0
    

  6. Enable persistence mode.

    $ sudo nvidia-smi -pm 1
    Enabled persistence mode for GPU 00000000:00:04.0.
    Enabled persistence mode for GPU 00000000:00:05.0.
    All done.
    

SLES

  1. Connect to the instance where you want to install the driver.

  2. Select a driver repository and add it to your instance. For example, use curl to download the CUDA Toolkit and use the rpm command to add the repository to your system:

    • SLES 12 with Service Pack 3

      $ curl -O https://developer.download.nvidia.com/compute/cuda/repos/sles123/x86_64/cuda-repo-sles123-9.1.85-1.x86_64.rpm
      
      $ sudo rpm -i cuda-repo-sles123-9.1.85-1.x86_64.rpm
      

  3. Refresh Zypper:

    $ sudo zypper refresh
    

  4. Install CUDA, which includes the NVIDIA driver.

    $ sudo zypper install cuda-9-1
    

  5. Enable persistence mode.

    $ sudo nvidia-smi -pm 1
    Enabled persistence mode for GPU 00000000:00:04.0.
    Enabled persistence mode for GPU 00000000:00:05.0.
    All done.
    

Ubuntu

  1. Connect to the instance where you want to install the driver.

  2. Select a driver repository and add it to your instance. For example, use curl to download the CUDA Toolkit and use the dpkg command to add the repository to your system. Then, use the apt-key command to authenticate the download:

    • Ubuntu 17.04 and 17.10

      $ curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
      
      $ sudo dpkg -i cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
      $ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/7fa2af80.pub
      

    • Ubuntu 16.04 LTS

      $ curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
      
      $ sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
      $ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
      

  3. Update the package lists:

    $ sudo apt-get update
    

  4. Install CUDA, which includes the NVIDIA driver.

    $ sudo apt-get install cuda-9-0
    

  5. Enable persistence mode.

    $ sudo nvidia-smi -pm 1
    Enabled persistence mode for GPU 00000000:00:04.0.
    Enabled persistence mode for GPU 00000000:00:05.0.
    All done.
    

Windows Server

  1. Connect to the instance where you want to install the driver.

  2. Download an .exe installer file that includes an R384 branch driver (NVIDIA 386.07 or greater) to your instance.

    For example, in Windows Server 2016, you can open a PowerShell terminal as an administrator and use the wget command to download the driver installer that you need.

    PS C:> wget https://developer.nvidia.com/compute/cuda/9.0/Prod/network_installers/cuda_9.0.176_win10_network-exe -o cuda_9.0.176_win10_network
    

  3. Run the .exe installer. For example, you can open a PowerShell terminal as an administrator and run the following command.

    PS C:> .\cuda_9.0.176_win10_network
    

After your installer finishes running, verify that the GPU driver installed correctly.

Verifying the GPU driver install

After the driver finishes installing, verify that the driver installed and initialized properly.

Linux

Connect to the Linux instance and use the nvidia-smi command to verify that the driver is running properly.

$ nvidia-smi

Mon Jan 26 10:23:26 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:00:04.0     Off |                    0 |
| N/A   43C    P0    72W / 149W |      0MiB / 11439MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

Windows Server

Connect to the Windows Server instance and use the nvidia-smi.exe tool to verify that the driver is running properly.

PS C:> & 'C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe'

Mon Jan 27 13:06:50 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 369.30                 Driver Version: 369.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           TCC  | 0000:00:04.0     Off |                    0 |
| N/A   52C    P8    30W / 149W |      0MiB / 11423MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

If the driver is not functioning and you used a script to install the driver, check the startup script logs to ensure that the script finished and did not fail during the install process.
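
For example, on many Linux images you can check the startup script output in the system log, or read the instance's serial port output from your workstation. Log locations vary by operating system, so treat this as a sketch:

# On the instance (Debian- and Ubuntu-style syslog location):
sudo grep startup-script /var/log/syslog

# From your workstation:
gcloud compute instances get-serial-port-output [INSTANCE_NAME] --zone [ZONE]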

Installing GRID® drivers for virtual workstations

For a full list of NVIDIA drivers that you can use on Compute Engine, see the contents of the NVIDIA drivers Cloud Storage bucket.

Linux

  1. Download the GRID driver, using the following command:

    curl -O https://storage.googleapis.com/nvidia-drivers-us-public/GRID/NVIDIA-Linux-x86_64-384.111-grid.run
    
  2. Use the following command to start the installer:

    sudo bash NVIDIA-Linux-x86_64-384.111-grid.run
    
  3. During the installation, choose the following options:

    • If you are prompted to install 32-bit binaries, select Yes.
    • If you are prompted to modify the x.org file, select No.

Windows Server

  1. Depending on your version of Windows Server, download the appropriate NVIDIA GRID driver.

  2. Run the installer, and choose the Express installation.

  3. After the installation is complete, restart the VM. When you restart, you are disconnected from your session.

  4. Reconnect to your instance using RDP or a PCoIP client.

Verifying that the GRID driver has been installed

Linux

Run the following commands:

sudo nvidia-smi --persistence-mode=1
nvidia-smi

The output of the command looks similar to the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    26W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Windows Server

  1. Connect to your Windows instance using RDP or a PCoIP client.

  2. Right-click the desktop, and select NVIDIA Control Panel.

  3. In the NVIDIA Control Panel, from the Help menu, select System Information. The information shows the GPU that the VM is using, and the driver version.

Optimizing GPU performance

In general, you can optimize the performance of your GPU devices on Linux instances using the following settings:

  • Enable persistence mode. This setting applies to all of the GPUs on your instance.

    $ sudo nvidia-smi -pm 1
    Enabled persistence mode for GPU 00000000:00:04.0.
    Enabled persistence mode for GPU 00000000:00:05.0.
    All done.
    

  • On instances with NVIDIA® Tesla® K80 GPUs, disable autoboost:

    $ sudo nvidia-smi --auto-boost-default=DISABLED
    All done.
    

Handling host maintenance events

GPU instances must terminate for host maintenance events, but can automatically restart. These maintenance events typically occur once per week, but can occur more frequently when necessary.

You can deal with maintenance events using the following processes:

  • Avoid these disruptions by regularly restarting your instances on a schedule that is more convenient for your applications.
  • Identify when your instance is scheduled for host maintenance and prepare your workload to transition through the system restart.

To receive advance notice of host maintenance events, monitor the /computeMetadata/v1/instance/maintenance-event metadata value. If the request to the metadata server returns NONE, the instance is not scheduled to terminate. For example, run the following command from within an instance:

$ curl http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event -H "Metadata-Flavor: Google"

NONE

If the metadata server returns a timestamp, the timestamp indicates when your instance will be forcefully terminated. Compute Engine gives GPU instances a one hour termination notice, while normal instances receive only a 60 second notice. Configure your application to transition through the maintenance event. For example, you might use one of the following techniques:

  • Configure your application to temporarily move work in progress to a Google Cloud Storage bucket, then retrieve that data after the instance restarts.

  • Write data to a secondary persistent disk. When the instance automatically restarts, the persistent disk can be reattached and your application can resume work.

You can also receive notification of changes in this metadata value without polling. For examples of how to receive advance notice of host maintenance events without polling, read getting live migration notices.
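
As a minimal sketch of the non-polling approach, the following script blocks on the metadata server's wait_for_change feature and runs a checkpoint command when a maintenance event is scheduled. The checkpoint.sh script is a placeholder for whatever mechanism your application uses to save its state:

#!/bin/bash
# Block until the maintenance-event value changes, then react.
while true; do
  EVENT=$(curl -s "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event?wait_for_change=true" \
      -H "Metadata-Flavor: Google")
  if [ "$EVENT" != "NONE" ]; then
    echo "Maintenance event scheduled: $EVENT"
    # Placeholder: save work in progress before the instance terminates.
    ./checkpoint.sh
  fi
done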
