About GPUs


To accelerate specific workloads on Compute Engine, you can either deploy an accelerator-optimized VM that has attached GPUs or attach GPUs to an N1 general-purpose VM.

This document describes the features and limitations of GPUs running on Compute Engine.

GPUs and machine series

GPUs are supported for the N1 general-purpose machine series and the accelerator-optimized (A3, A2, and G2) machine series. For VMs that use N1 machine types, you attach the GPU to the VM during or after VM creation. For VMs that use A3, A2, or G2 machine types, the GPUs are automatically attached when you create the VM. GPUs can't be used with other machine series.
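
For example, the following minimal sketch uses the google-cloud-compute Python client to create an N1 VM with a single NVIDIA T4 attached at creation time; the project, zone, machine type, and image values are placeholder choices, not recommendations. An A3, A2, or G2 VM would skip the guest_accelerators block entirely, because its GPUs come with the machine type.

# Minimal sketch: create an N1 VM with one NVIDIA T4 GPU attached.
# The project, zone, machine type, and image below are placeholders.
from google.cloud import compute_v1

project = "my-project"   # placeholder project ID
zone = "us-central1-a"   # pick a zone that offers the GPU model you want

instance = compute_v1.Instance(
    name="n1-gpu-example",
    machine_type=f"zones/{zone}/machineTypes/n1-standard-8",
    # For N1, GPUs are requested explicitly as guest accelerators.
    # A3, A2, and G2 machine types omit this: their GPUs are attached automatically.
    guest_accelerators=[
        compute_v1.AcceleratorConfig(
            accelerator_type=f"zones/{zone}/acceleratorTypes/nvidia-tesla-t4",
            accelerator_count=1,
        )
    ],
    # VMs with attached GPUs must terminate (not live-migrate) during host maintenance.
    scheduling=compute_v1.Scheduling(on_host_maintenance="TERMINATE"),
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-12",
                disk_size_gb=50,
            ),
        )
    ],
    network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
)

operation = compute_v1.InstancesClient().insert(
    project=project, zone=zone, instance_resource=instance
)
operation.result()  # wait for the create operation to finish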

Accelerator-optimized machine series

Each accelerator-optimized machine type has a specific model of NVIDIA GPU attached.

For more information, see Accelerator-optimized machine series.

N1 general-purpose machine series

For all other GPU types, you can use most N1 machine types except the N1 shared-core machine types (f1-micro and g1-small).

For this machine series, you can use either predefined or custom machine types.

GPUs on Spot VMs

You can add GPUs to your Spot VMs at lower spot prices for the GPUs. GPUs attached to Spot VMs work like normal GPUs but persist only for the life of the VM. Spot VMs with GPUs follow the same preemption process as all Spot VMs.

Consider requesting dedicated Preemptible GPU quota to use for GPUs on Spot VMs. For more information, see Quotas for Spot VMs.

During maintenance events, Spot VMs with GPUs are preempted by default and cannot be automatically restarted. If you want to recreate your VMs after they have been preempted, use a managed instance group. Managed instance groups recreate your VM instances if the vCPU, memory, and GPU resources are available.
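
As a rough sketch of that setup, the following snippet (google-cloud-compute Python client; the group name, instance template, project, and zone are all hypothetical) creates a zonal managed instance group that recreates its Spot GPU VMs from an instance template whenever capacity allows.

# Sketch: a zonal MIG that recreates preempted Spot GPU VMs (placeholder names).
from google.cloud import compute_v1

mig = compute_v1.InstanceGroupManager(
    name="spot-gpu-mig",                  # hypothetical group name
    base_instance_name="spot-gpu",        # prefix for recreated instances
    instance_template="global/instanceTemplates/spot-gpu-template",  # hypothetical template
    target_size=2,  # the MIG keeps recreating VMs as resources become available
)

operation = compute_v1.InstanceGroupManagersClient().insert(
    project="my-project",     # placeholder
    zone="us-central1-a",     # placeholder
    instance_group_manager_resource=mig,
)
operation.result()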

If you want a warning before your VMs are preempted, or want to configure your VMs to automatically restart after a maintenance event, use standard VMs with a GPU. For standard VMs with GPUs, Compute Engine provides one hour advance notice before preemption.

Compute Engine does not charge you for GPUs if the VM they are attached to is preempted within the first minute after it starts running.

To learn how to create Spot VMs with GPUs attached, read Create a VM with attached GPUs and Creating Spot VMs.
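
For reference, a minimal sketch of the Spot-specific part is shown below, assuming you build the rest of the instance definition with the google-cloud-compute Python client as in the earlier example; only the scheduling settings change.

# Sketch: Spot scheduling settings for a GPU VM (assumes the instance is built as usual).
from google.cloud import compute_v1

def make_spot(instance: compute_v1.Instance) -> compute_v1.Instance:
    """Mark an instance definition as a Spot VM before calling insert()."""
    instance.scheduling = compute_v1.Scheduling(
        provisioning_model="SPOT",           # bill at Spot prices, allow preemption
        instance_termination_action="STOP",  # what happens to the VM on preemption
        on_host_maintenance="TERMINATE",     # required for VMs with attached GPUs
        automatic_restart=False,             # Spot VMs are not automatically restarted
    )
    return instance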

GPUs on VMs with predefined run times

Resources for VMs that use the default standard provisioning model (standard VMs) typically cannot use preemptible allocation quotas, which are intended for temporary workloads and are usually more available. If your project does not have preemptible quota, and you have never requested preemptible quota, all VMs in that project consume standard allocation quotas.

However, after you request preemptible allocation quota, standard VMs with predefined run times can consume only preemptible allocation quota.

By consuming preemptible allocation quota for such workloads, you gain both the uninterrupted run time of standard VMs and the improved obtainability of preemptible allocation quota.

Regardless of the quota used, standard VMs don't qualify for Spot VM pricing and are not subject to preemption.

For more information, see Preemptible quotas.

GPUs and Confidential VM

You can't attach GPUs to Confidential VM instances. For more information about Confidential VM, see Confidential VM overview.

GPUs and block storage

When you create a VM on a GPU platform, you can add persistent or temporary block storage to the VM. To store non-transient data, use persistent block storage like Hyperdisk ML or Persistent Disk because the disks are independent of the VM's lifecycle. Data on persistent storage can be retained even after you delete the VM.

For temporary scratch storage or caches, use temporary block storage by adding Local SSD disks when you create the VM.
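
As an illustration, the disk list below is a sketch built with the google-cloud-compute Python client (the zone and image are placeholders) that pairs a persistent boot disk with one Local SSD scratch disk in a VM definition.

# Sketch: a boot Persistent Disk plus one Local SSD scratch disk (placeholder zone and image).
from google.cloud import compute_v1

zone = "us-central1-a"  # placeholder

disks = [
    compute_v1.AttachedDisk(  # persistent boot disk, independent of the VM's lifecycle
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-12",
        ),
    ),
    compute_v1.AttachedDisk(  # Local SSD: fast, temporary scratch space
        type_="SCRATCH",
        auto_delete=True,
        interface="NVME",
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            disk_type=f"zones/{zone}/diskTypes/local-ssd",
        ),
    ),
]
# Assign this list to the Instance's `disks` field before calling insert().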

Persistent block storage with Persistent Disk and Hyperdisk volumes

You can attach Persistent Disk and Hyperdisk ML volumes to GPU-enabled VMs.

For machine-learning training and serving workloads, Google recommends using Hyperdisk ML volumes, which offer high throughput and shorter data load times. This makes Hyperdisk ML a more cost-effective option for ML workloads because it reduces GPU idle time.

Hyperdisk ML volumes provide read-only multi-attach support, so you can attach the same disk to multiple VMs, giving each VM access to the same data.
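
For example, the following sketch (google-cloud-compute Python client; the project, zone, disk name, and VM names are hypothetical) attaches an existing Hyperdisk ML volume to two running VMs in read-only mode so that both read the same data.

# Sketch: attach an existing Hyperdisk ML volume read-only to several VMs (placeholder names).
from google.cloud import compute_v1

project = "my-project"
zone = "us-central1-a"

attached_disk = compute_v1.AttachedDisk(
    source=f"projects/{project}/zones/{zone}/disks/training-data-hdml",  # hypothetical disk
    mode="READ_ONLY",   # read-only multi-attach lets many VMs share the same volume
    auto_delete=False,  # keep the shared volume when any one VM is deleted
)

client = compute_v1.InstancesClient()
for vm_name in ["trainer-0", "trainer-1"]:  # hypothetical VM names
    client.attach_disk(
        project=project,
        zone=zone,
        instance=vm_name,
        attached_disk_resource=attached_disk,
    ).result()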

For more information about the supported disk types for machine series that support GPUs, see the N1 and accelerator-optimized machine series pages.

Local SSD disks

Local SSD disks provide fast, temporary storage for caching, data processing, or other transient data. Local SSD disks are fast storage because they are physically attached to the server hosting your VM. They are temporary because the data is lost if the VM restarts.

You shouldn't store data with strong persistence requirements on Local SSD disks. To store non-transient data, use persistent storage instead.

If you manually stop a VM with a GPU, you can preserve the Local SSD data, with certain restrictions. See the Local SSD documentation for more details.

For regional support for Local SSD with GPU types, see Local SSD availability by GPU regions and zones.

GPUs and host maintenance

VMs with attached GPUs are always stopped when Compute Engine performs maintenance events on the VMs. If the VM has attached Local SSD disks, the Local SSD data is lost after the VM stops.

For information on handling maintenance events, see Handling GPU host maintenance events.
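
As a concrete sketch (google-cloud-compute Python client; the project, zone, and instance names are placeholders), the following call sets the scheduling options that govern this behavior: host maintenance terminates the VM, and automatic restart controls whether Compute Engine brings it back up afterwards.

# Sketch: update the maintenance behavior of an existing GPU VM (placeholder names).
from google.cloud import compute_v1

client = compute_v1.InstancesClient()
client.set_scheduling(
    project="my-project",        # placeholder
    zone="us-central1-a",        # placeholder
    instance="n1-gpu-example",   # placeholder
    scheduling_resource=compute_v1.Scheduling(
        on_host_maintenance="TERMINATE",  # GPU VMs can't live-migrate; they stop for maintenance
        automatic_restart=True,           # restart the VM after the maintenance event ends
    ),
).result()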

GPU pricing

Most VMs with an attached GPU receive sustained use discounts similar to vCPUs. When you select a GPU for a virtual workstation, an NVIDIA RTX Virtual Workstation license is added to your VM.

For hourly and monthly pricing for GPUs, see the GPU pricing page.

Reserving GPUs with committed use discounts

To reserve GPU resources in a specific zone, see Reservations of Compute Engine zonal resources.

To receive committed use discounts for GPUs in a specific zone, you must purchase resource-based commitments for the GPUs and also attach reservations that specify matching GPUs to your commitments. For more information, see Attach reservations to resource-based commitments.

GPU restrictions and limitations

For VMs with attached GPUs, the following restrictions and limitations apply:

  • GPUs are supported only with general-purpose N1 or accelerator-optimized (A3, A2, and G2) machine types.

  • To protect Compute Engine systems and users, new projects have a global GPU quota, which limits the total number of GPUs you can create in any supported zone. When you request a GPU quota, you must request a quota for the GPU models that you want to create in each region, and an additional global quota for the total number of GPUs of all types in all zones.

  • VMs with one or more GPUs have a maximum number of vCPUs for each GPU that you add to the VM. To see the available vCPU and memory ranges for different GPU configurations, see the GPUs list.

  • GPUs require device drivers in order to function properly. NVIDIA GPUs running on Compute Engine must use a minimum driver version. For more information about driver versions, see Required NVIDIA driver versions.

  • VMs with a specific attached GPU model are covered by the Compute Engine SLA only if that attached GPU model is generally available and is supported in more than one zone in the same region. The Compute Engine SLA doesn't cover GPU models in the following zones:

    • NVIDIA H100 80GB:
      • asia-south1-c
      • australia-southeast1-c
      • europe-west2-b
      • europe-west1-b
      • europe-west3-a
      • europe-west4-b
      • europe-west8-c
      • europe-west9-c
      • europe-west12-b
      • us-east5-a
      • us-west4-a
    • NVIDIA L4:
      • asia-northeast1-b
      • northamerica-northeast2-a
    • NVIDIA A100 80GB:
      • asia-southeast1-c
      • us-east4-c
      • us-east5-b
    • NVIDIA A100 40GB:
      • us-east1-b
      • us-west1-b
      • us-west3-b
      • us-west4-b
    • NVIDIA T4:
      • europe-west3-b
      • southamerica-east1-c
      • us-west3-b
    • NVIDIA V100:
      • asia-east1-c
      • us-east1-c
    • NVIDIA P100:
      • australia-southeast1-c
      • europe-west4-a
  • Compute Engine supports one concurrent user per GPU.

What's next?