Adding or removing GPUs


Compute Engine provides graphics processing units (GPUs) that you can add to your virtual machine instances (VMs). You can use these GPUs to accelerate specific workloads on your VMs such as machine learning and data processing.

If you did not attach GPUs during VM creation, you can add GPUs to your existing VMs to suit your application needs as they arise.

If you attached GPUs during or after VM creation, you can detach those GPUs when you no longer need them.

Overview

In summary, the process to add or remove a GPU from an existing VM is as follows:

  1. Prepare your VM for the modification.
  2. Stop the VM.
  3. Add or remove the GPU.
  4. If you added a GPU, install the GPU driver on the VM so that your system can use the device.

Before you begin

Checking GPU quota

To protect Compute Engine systems and users, new projects have a global GPU quota, which limits the total number of GPUs you can create in any supported zone.

Use the regions describe command to ensure that you have sufficient GPU quota in the region where you want to create VMs with GPUs.

gcloud compute regions describe REGION

Replace REGION with the region that you want to check for GPU quota.
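
For example, to see only the GPU-related quota entries for a region, you can filter the command's YAML output. The region name below is only an illustration, and the exact metric names (such as NVIDIA_T4_GPUS) vary by GPU model; each matched entry typically shows the quota limit on the line above the metric name and the current usage on the line below.

gcloud compute regions describe us-central1 | grep -B 1 -A 1 GPUS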

If you need additional GPU quota, request a quota increase. When you request a GPU quota, you must request a quota for the GPU types that you want to create in each region and an additional global quota for the total number of GPUs of all types in all zones.

If your project has an established billing history, it will receive quota automatically after you submit the request.

Preparing your VM

When a GPU is added to a VM, the order of the network interfaces can change.

Most public images on Compute Engine do not have persistent network interface names and adjust to the new order.

However, if you are using either SLES or a custom image, you must prevent the network interface names from persisting. To do so, run the following command on your VM:

 rm /etc/udev/rules.d/70-persistent-net.rules 
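
If you script this step, a guarded form avoids an error when the rules file does not exist. This is only a convenience sketch; on a VM where the file is present, it is equivalent to the single command above:

if [ -f /etc/udev/rules.d/70-persistent-net.rules ]; then
  sudo rm /etc/udev/rules.d/70-persistent-net.rules
fi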

Adding GPUs to existing VMs

You can add GPUs to an existing VM by using the Google Cloud Console or the API.

Adding GPUs to existing VMs (A100 GPUs)

This section covers how to add NVIDIA® A100 GPUs to existing VMs.

Console

You can add GPUs to your VM by stopping the VM and editing the VM configuration. An equivalent gcloud command sequence follows the steps below.

  1. Verify that all of your critical applications are stopped on the VM.

  2. In the Google Cloud Console, go to the VM instances page to see your list of VMs.

    Go to VM instances

  3. Click the name of the VM where you want to add GPUs. The VM instance details page opens.

  4. On the VM instance details page, complete the following steps:

    1. Click Stop to stop the VM. You can check the notification panel to see when the instance is stopped.
    2. On the stopped VM, click Edit to change the VM properties.
    3. From Machine configuration, complete the following steps.

      1. Under Machine family, click GPU.
      2. Under Series, select A2.
      3. Under Machine type, select the A2 machine type that you want.

      4. Expand the CPU platform and GPU section.

      5. Under CPU platform and GPU, review the GPU type and Number of GPUs.

    4. Scroll to the On host maintenance section. When you add GPUs to a VM, the host maintenance setting is automatically set to Terminate VM instance. See Handling GPU host maintenance events.

    5. Click Save to apply your changes.

    6. Click Start/Resume to restart the VM.
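
If you prefer the command line to the Console steps above, the same stop, change machine type, set scheduling, and restart sequence can be sketched with the gcloud CLI. The VM name, zone, and a2-highgpu-1g machine type below are placeholders for illustration, not values taken from this guide:

# Stop the VM before changing its machine type.
gcloud compute instances stop VM_NAME --zone=ZONE

# Switch to an A2 machine type; A100 GPUs are attached as part of A2 machine types.
gcloud compute instances set-machine-type VM_NAME --zone=ZONE \
    --machine-type=a2-highgpu-1g

# GPUs require the VM to terminate on host maintenance.
gcloud compute instances set-scheduling VM_NAME --zone=ZONE \
    --maintenance-policy=TERMINATE --restart-on-failure

# Restart the VM.
gcloud compute instances start VM_NAME --zone=ZONE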

API

You can add GPUs to your VM by stopping the VM and changing the VM's configuration. A curl sketch of the full request sequence follows the numbered steps below.

  1. Verify that all of your critical applications are stopped on the VM, and then make a POST request to stop the VM so that it can move to a host system where GPUs are available.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/stop
    

    Replace the following:

    • PROJECT_ID: project ID.
    • VM_NAME: the name of the VM to stop. This is the VM that you want to attach GPUs to.
    • ZONE: the zone where the VM is located. This zone must support A100 GPUs.
  2. After the VM stops, create a POST request to change the machine type.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/setMachineType
    
    {
       "machineType": "zones/ZONE/machineTypes/MACHINE_TYPE"
    }
    
    

    Replace the following:

    • PROJECT_ID: your project ID.
    • ZONE: the zone for the VM.
    • VM_NAME: the name of the VM.
    • MACHINE_TYPE: an A2 machine type.
  3. Make a POST request to set the scheduling options for the VM.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/setScheduling
    
    {
      "onHostMaintenance": "TERMINATE",
      "automaticRestart": true
    }
    

    Replace the following:

    • PROJECT_ID: project ID.
    • VM_NAME: the name of the VM where you want to add GPUs.
    • ZONE: the zone where the VM is located.
  4. Start the VM.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/start
    

    Replace the following:

    • PROJECT_ID: project ID.
    • VM_NAME: the name of the VM that you want to add GPUs to.
    • ZONE: the zone where the VM is located.
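
Strung together with curl, the four requests above look roughly like the following sketch. It assumes the gcloud CLI is installed and authenticated so that an access token is available; PROJECT_ID, ZONE, and VM_NAME are placeholders, and a2-highgpu-1g is only an example machine type. Each call returns an operation, so wait for the stop operation to finish (the VM status reaches TERMINATED) before changing the machine type:

# Obtain an OAuth access token from the active gcloud account (assumes prior authentication).
TOKEN=$(gcloud auth print-access-token)
BASE="https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME"

# 1. Stop the VM.
curl -X POST -H "Authorization: Bearer ${TOKEN}" "${BASE}/stop"

# 2. Change to an A2 machine type (example: a2-highgpu-1g).
curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
    -d '{"machineType": "zones/ZONE/machineTypes/a2-highgpu-1g"}' "${BASE}/setMachineType"

# 3. Terminate on host maintenance and restart automatically.
curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
    -d '{"onHostMaintenance": "TERMINATE", "automaticRestart": true}' "${BASE}/setScheduling"

# 4. Start the VM.
curl -X POST -H "Authorization: Bearer ${TOKEN}" "${BASE}/start"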

Next: Install the GPU driver on your VM so that your system can use the device.
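
After the driver is installed and the VM is running, a quick way to confirm that the guest OS sees the new GPU is the NVIDIA utility that ships with the driver (the exact output depends on the driver version and GPU model):

nvidia-smi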

Adding GPUs to existing VMs (other GPU types)

This section covers how to add the following GPU types to existing VMs:

  • NVIDIA® T4: nvidia-tesla-t4
  • NVIDIA® T4 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-t4-vws
  • NVIDIA® V100: nvidia-tesla-v100
  • NVIDIA® P100: nvidia-tesla-p100
  • NVIDIA® P100 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-p100-vws
  • NVIDIA® P4: nvidia-tesla-p4
  • NVIDIA® P4 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-p4-vws
  • NVIDIA® K80: nvidia-tesla-k80

Console

You can add or remove GPUs from your VM by stopping the VM and editing the VM configuration.

  1. Verify that all of your critical applications are stopped on the VM.

  2. In the Google Cloud Console, go to the VM instances page to see your list of VMs.

    Go to VM instances

  3. Click the name of the VM where you want to add GPUs. The VM instance details page opens.

  4. Complete the following steps from the VM instance details page.

    1. Click Stop to stop the VM. You can check the notification panel to see when the instance is stopped.

    2. On the stopped VM, click Edit and complete the following steps:

    3. From the Machine configuration section, complete the following steps.

      1. Under Series, select N1.
      2. Under Machine type, select the N1 machine type that you want.
      3. Expand the CPU platform and GPU section.
      4. Click Add GPU.

      5. Specify the GPU type and Number of GPUs.
      6. If your GPU model supports virtual workstations, and you plan on running graphics-intensive workloads on this VM, select Enable Virtual Workstation (NVIDIA GRID).

        For information about NVIDIA® GRID virtual workstations, see NVIDIA® GRID® GPUs for graphics workloads.

    4. Scroll to the On host maintenance section. When you add GPUs to a VM, the host maintenance setting is automatically set to Terminate VM instance. See Handling GPU host maintenance events.

    5. Click Save to apply your changes.

    6. Click Start/Resume to restart the VM.

API

You can add GPUs to your VM by stopping the VM and changing your VM's configuration through the API. A curl sketch of the request sequence follows the numbered steps below.

  1. Verify that all of your critical applications are stopped on the VM, and then make a POST request to stop the VM so that it can move to a host system where GPUs are available.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/stop
    

    Replace the following:

    • PROJECT_ID: project ID.
    • VM_NAME: the name of the VM to stop. This is the VM that you want to attach GPUs to.
    • ZONE: the zone where the VM is located.
  2. Identify the GPU type that you want to add to your VM. Submit a GET request to list the GPU types that are available to your project in a specific zone.

    GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/acceleratorTypes
    

    Replace the following:

    • PROJECT_ID: project ID.
    • ZONE: the zone where you want to list the available GPU types.
  3. If the VM has a shared-core machine type, you must change the machine type to have one or more vCPUs. You cannot add accelerators to VMs with shared-core machine types.

  4. After the VM stops, create a POST request to add one or more GPUs to your VM.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/setMachineResources
    
        {
         "guestAccelerators": [
          {
            "acceleratorCount": ACCELERATOR_COUNT,
            "acceleratorType": "https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/acceleratorTypes/ACCELERATOR_TYPE"
          }
         ]
        }
    

    Replace the following:

    • VM_NAME: the name of the VM.
    • PROJECT_ID: your project ID.
    • ZONE: the zone for the VM.
    • ACCELERATOR_COUNT: the number of GPUs that you want to attach to your VM. For a list of GPU limits based on the machine type of your VM, see GPUs on Compute Engine.
    • ACCELERATOR_TYPE: the GPU model that you want to use. If you plan on running graphics-intensive workloads on this VM, use one of the virtual workstation models.

      Choose one of the following values:

      • NVIDIA® T4: nvidia-tesla-t4
      • NVIDIA® T4 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-t4-vws
      • NVIDIA® P4: nvidia-tesla-p4
      • NVIDIA® P4 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-p4-vws
      • NVIDIA® P100: nvidia-tesla-p100
      • NVIDIA® P100 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-p100-vws
      • NVIDIA® V100: nvidia-tesla-v100
      • NVIDIA® K80: nvidia-tesla-k80
  5. Make a POST request to set the scheduling options for the VM.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/setScheduling
    
    {
      "onHostMaintenance": "TERMINATE",
      "automaticRestart": true
    }
    

    Replace the following:

    • PROJECT_ID: project ID.
    • VM_NAME: the name of the VM where you want to add GPUs.
    • ZONE: the zone where the VM is located.
  6. Start the VM.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/start
    

    Replace the following:

    • PROJECT_ID: project ID.
    • VM_NAME: the name of the VM that you want to add GPUs to.
    • ZONE: the zone where the VM is located.
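
As with the A100 flow, the requests above can be issued with curl. The sketch below assumes the gcloud CLI is authenticated; PROJECT_ID, ZONE, and VM_NAME are placeholders, and nvidia-tesla-t4 with a count of 1 is only an example accelerator. Wait for the stop operation to complete before calling setMachineResources:

TOKEN=$(gcloud auth print-access-token)
BASE="https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME"

# List the accelerator types available in the zone (step 2).
curl -H "Authorization: Bearer ${TOKEN}" \
    "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/acceleratorTypes"

# Stop the VM (step 1), then attach the GPU (step 4).
curl -X POST -H "Authorization: Bearer ${TOKEN}" "${BASE}/stop"
curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
    -d '{"guestAccelerators": [{"acceleratorCount": 1, "acceleratorType": "https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/acceleratorTypes/nvidia-tesla-t4"}]}' \
    "${BASE}/setMachineResources"

# Set scheduling (step 5) and restart the VM (step 6).
curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
    -d '{"onHostMaintenance": "TERMINATE", "automaticRestart": true}' "${BASE}/setScheduling"
curl -X POST -H "Authorization: Bearer ${TOKEN}" "${BASE}/start"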

Next: Install the GPU driver on your VM so that your system can use the GPUs.

Removing or modifying GPUs

You can use the Google Cloud Console to remove GPUs from an existing VM, or modify the number or type of GPUs that are attached. To remove or modify GPUs, complete the following steps. An API sketch for removing all GPUs follows these steps.

  1. Verify that all of your critical applications are stopped on the VM.

  2. In the Google Cloud Console, go to the VM instances page to see your list of VMs.

    Go to VM instances

  3. Click the name of the VM where you want to remove or modify GPUs. The VM instance details page opens.

  4. Complete the following steps from the VM instance details page.

    1. Click Stop to stop the VM. You can check the notification panel to see when the instance is stopped.
    2. On the stopped VM, click Edit.
    3. Under Machine configuration, expand the CPU platform and GPU section.
    4. Remove or modify the GPUs as follows:
      • To modify the GPUs, adjust the Number of GPUs or the GPU Type as needed.
      • To remove all GPUs, click the X located beside the attached GPUs.
    5. Scroll to the On host maintenance section and review the setting. When GPUs are attached to a VM, the host maintenance setting is set to Terminate VM instance. See Handling GPU host maintenance events.
    6. Click Save to apply your changes.
    7. Click Start/Resume to restart the VM.
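
Although this section walks through the Console, the same change can be made through the API with the setMachineResources method shown earlier: while the VM is stopped, a request with an empty guestAccelerators list replaces the attached accelerators with none. A minimal curl sketch, with the same placeholder names and authentication assumptions as above (verify the result on the VM instance details page before restarting):

# Assumes the VM is already stopped and gcloud is authenticated.
TOKEN=$(gcloud auth print-access-token)
curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
    -d '{"guestAccelerators": []}' \
    "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/setMachineResources"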

What's next?