About GPUs

To accelerate specific workloads on Compute Engine, you can either deploy an accelerator-optimized VM that has attached GPUs, or attach GPUs to an N1 general-purpose VM.

This document describes the features and limitations of GPUs running on Compute Engine.

You can also use some GPU machine types on AI Hypercomputer. AI Hypercomputer is a supercomputing system that is optimized to support your artificial intelligence (AI) and machine learning (ML) workloads. This option is recommended for creating a densely allocated, performance-optimized infrastructure that has integrations for Google Kubernetes Engine (GKE) and Slurm schedulers.

GPUs and machine series

GPUs are supported for the accelerator-optimized (A4X, A4, A3, A2, and G2) machine series and the N1 general-purpose machine series. For VMs that use accelerator-optimized machine types, the GPUs are automatically attached when you create the VM. For VMs that use N1 machine types, you attach the GPU to the VM during or after VM creation. GPUs can't be used with other machine series.
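As a rough illustration of the rule above, the following Python sketch maps a machine series name to the way GPUs are attached. The enum and the function name are hypothetical helpers invented for this example; only the series names come from this document.

```python
from enum import Enum


class GpuAttachment(Enum):
    AUTOMATIC = "attached automatically when the VM is created"
    MANUAL = "attached by you during or after VM creation"
    UNSUPPORTED = "GPUs can't be used"


# Accelerator-optimized series names as listed in this document.
ACCELERATOR_OPTIMIZED_SERIES = {"A4X", "A4", "A3", "A2", "G2"}


def gpu_attachment_for_series(series: str) -> GpuAttachment:
    """Return how GPUs are attached for a machine series (hypothetical helper)."""
    name = series.strip().upper()
    if name in ACCELERATOR_OPTIMIZED_SERIES:
        return GpuAttachment.AUTOMATIC
    if name == "N1":
        return GpuAttachment.MANUAL
    # Every other machine series can't use GPUs.
    return GpuAttachment.UNSUPPORTED
```

For example, `gpu_attachment_for_series("a3")` yields `GpuAttachment.AUTOMATIC`, while a series that isn't listed here yields `GpuAttachment.UNSUPPORTED`.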

Accelerator-optimized machine series

Each accelerator-optimized machine type has a specific model of NVIDIA GPUs attached.

For A4X machine types, NVIDIA GB200 superchips are attached.
For A4 machine types, NVIDIA B200 GPUs are attached.
For A3 machine types, NVIDIA H100 80GB or NVIDIA H200 141GB GPUs are attached. These are available in the following options:
- A3 Ultra: these machine types have H200 141GB GPUs attached
- A3 Mega: these machine types have H100 80GB GPUs attached
- A3 High: these machine types have H100 80GB GPUs attached
- A3 Edge: these machine types have H100 80GB GPUs attached
For A2 machine types, NVIDIA A100 GPUs are attached. These are available in the following options:
- A2 Ultra: these machine types have A100 80GB GPUs attached
- A2 Standard: these machine types have A100 40GB GPUs attached
For G2 machine types, NVIDIA L4 GPUs are attached.

For more information, see Accelerator-optimized machine series.
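The list above can be captured as a small lookup table. This is only a sketch: the dictionary and the `gpu_model_for` function are hypothetical names, and the entries are transcribed from this section rather than read from any API.

```python
from typing import Optional

# (series, variant) -> attached GPU model, transcribed from the list above.
# A variant of None means the series is listed with a single GPU option.
GPU_MODEL_BY_MACHINE = {
    ("A4X", None): "NVIDIA GB200",
    ("A4", None): "NVIDIA B200",
    ("A3", "Ultra"): "NVIDIA H200 141GB",
    ("A3", "Mega"): "NVIDIA H100 80GB",
    ("A3", "High"): "NVIDIA H100 80GB",
    ("A3", "Edge"): "NVIDIA H100 80GB",
    ("A2", "Ultra"): "NVIDIA A100 80GB",
    ("A2", "Standard"): "NVIDIA A100 40GB",
    ("G2", None): "NVIDIA L4",
}


def gpu_model_for(series: str, variant: Optional[str] = None) -> str:
    """Look up the GPU model for a series and optional variant (hypothetical helper)."""
    key = (series.strip().upper(), variant.strip().capitalize() if variant else None)
    try:
        return GPU_MODEL_BY_MACHINE[key]
    except KeyError:
        raise ValueError(f"No GPU model listed for {key!r}") from None
```

A3 and A2 require a variant because their options carry different GPU models or memory sizes; calling `gpu_model_for("A3")` without one raises `ValueError`.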

N1 general-purpose machine series

For all other GPU types, you can use most N1 machine types except the N1 shared-core machine types (f1-micro and g1-small).

For this machine series, you can use either predefined or custom machine types.
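A minimal sketch of the shared-core exclusion follows. It assumes that predefined N1 machine type names start with `n1-` and that custom N1 machine type names start with `custom-`; those prefixes and the function name are assumptions for illustration, and the check encodes only the shared-core exclusion stated above, not any other restriction.

```python
# Shared-core N1 machine types that can't have GPUs attached.
N1_SHARED_CORE = {"f1-micro", "g1-small"}


def n1_type_supports_gpus(machine_type: str) -> bool:
    """Return True if the name looks like a GPU-capable N1 type (hypothetical helper)."""
    name = machine_type.strip().lower()
    if name in N1_SHARED_CORE:
        return False
    # Assumption: predefined N1 types are named "n1-..." and
    # custom N1 types are named "custom-...".
    return name.startswith(("n1-", "custom-"))
```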

GPUs on Spot VMs

You can add GPUs to your Spot VMs and pay lower spot prices for the GPUs. GPUs attached to Spot VMs work like normal GPUs but persist only for the life of the VM. Spot VMs with GPUs follow the same preemption process as all Spot VMs.

Consider requesting dedicated Preemptible GPU quota to use for GPUs on Spot VMs. For more information, see Quotas for Spot VMs.

During maintenance events, Spot VMs with GPUs are preempted by default and cannot be automatically restarted. If you want to recreate your VMs after they have been preempted, use a managed instance group. Managed instance groups recreate your VM instances if the vCPU, memory, and GPU resources are available.

If you want a warning before your VMs are preempted, or want to configure your VMs to automatically restart after a maintenance event, use standard VMs with a GPU. For standard VMs with GPUs, Compute Engine provides one hour advance notice before preemption.

Compute Engine does not charge you for GPUs if their VMs are preempted in the first minute after they start running.
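The first-minute rule can be expressed as a small predicate. This sketch assumes that "the first minute" means strictly fewer than 60 seconds of run time; the exact boundary and the function name are assumptions, and the function says nothing about how charges are calculated otherwise.

```python
# Assumption: "the first minute" means fewer than 60 seconds of run time.
FREE_PREEMPTION_WINDOW_SECONDS = 60


def gpu_charge_applies(seconds_running: float, preempted: bool) -> bool:
    """Return False only when a VM is preempted within its first minute."""
    if seconds_running < 0:
        raise ValueError("seconds_running must be non-negative")
    if preempted and seconds_running < FREE_PREEMPTION_WINDOW_SECONDS:
        return False
    return True
```

Note that a VM you stop yourself within the first minute isn't covered by this rule, because the waiver applies only to preemption.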

To learn how to create Spot VMs with GPUs attached, read Create a VM with attached GPUs and Creating Spot VMs. For example, see Create an A3 Ultra or A4 instance using Spot VMs.

GPUs on VMs with predefined run times

VMs that use the standard provisioning model typically can't use preemptible allocation quotas. Preemptible quotas are for temporary workloads and are usually more available. If your project doesn't have preemptible quota, and you have never requested it, then all VMs in your project consume standard allocation quotas.

If you request preemptible allocation quota, then VMs that use the standard provisioning model must meet all of the following criteria to consume preemptible allocation quota:

The VMs have GPUs attached.
The VMs are configured to be automatically deleted after a predefined run time through the maxRunDuration or terminationTime field. For more information, see the following:
- Limit the run time of a VM
- Limit the run time of VMs in a MIG
The VMs aren't allowed to consume reservations. For more information, see Prevent compute instances from consuming reservations.

When you consume preemptible allocation for time-bound GPU workloads, you can benefit from both uninterrupted run time and the high obtainability of preemptible allocation quota. For more information, see Preemptible quotas.
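The criteria above can be sketched as a single check for a VM that uses the standard provisioning model. The `VmConfig` class and its attribute names are hypothetical; only `maxRunDuration` and `terminationTime` are field names mentioned in this document, and they are mirrored here as plain Python attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmConfig:
    """Hypothetical summary of a standard-provisioning VM's settings."""
    gpu_count: int
    max_run_duration_seconds: Optional[int] = None  # mirrors maxRunDuration
    termination_time: Optional[str] = None          # mirrors terminationTime
    delete_when_run_time_ends: bool = False
    can_consume_reservations: bool = True


def consumes_preemptible_quota(vm: VmConfig, project_has_preemptible_quota: bool) -> bool:
    """Return True only if every criterion listed above is met."""
    if not project_has_preemptible_quota:
        # Without preemptible quota, all VMs consume standard allocation quota.
        return False
    has_gpus = vm.gpu_count > 0
    has_run_time_limit = (
        vm.max_run_duration_seconds is not None or vm.termination_time is not None
    )
    return (
        has_gpus
        and has_run_time_limit
        and vm.delete_when_run_time_ends
        and not vm.can_consume_reservations
    )
```

If any single criterion fails, for example because the VM is still allowed to consume reservations, the VM falls back to standard allocation quota.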

GPUs and Confidential VM

You can use a GPU with a Confidential VM instance that uses Intel TDX on the A3 machine series. For more information, see Confidential VM supported configurations. To learn how to create a Confidential VM instance with GPUs, see Create a Confidential VM instance with GPU.

GPUs and block storage

When you create a VM on a GPU platform, you can add persistent or temporary block storage to the VM. To store non-transient data, use persistent block storage like Hyperdisk or Persistent Disk because the disks are independent of the VM's lifecycle. Data on persistent storage can be retained even after you delete the VM.

For temporary scratch storage or caches, use temporary block storage by adding Local SSD disks when you create the VM.

Persistent block storage with Persistent Disk and Hyperdisk volumes

You can attach Persistent Disk and select Hyperdisk volumes to GPU-enabled VMs.

For machine-learning training and serving workloads, Google recommends using Hyperdisk ML volumes, which offer high throughput and shorter data load times. This makes Hyperdisk ML a more cost-effective option for ML workloads because it reduces GPU idle time.

Hyperdisk ML volumes provide read-only multi-attach support, so you can attach the same disk to multiple VMs, giving each VM access to the same data.

For more information about the supported disk types for machine series that support GPUs, see the N1 and accelerator-optimized machine series pages.

Local SSD disks

Local SSD disks provide fast, temporary storage for caching, data processing, or other transient data. Local SSD disks are fast storage because they are physically attached to the server hosting your VM. They are temporary because the data is lost if the VM restarts.

You shouldn't store data with strong persistency requirements on Local SSD disks. To store non-transient data, use persistent storage instead.

If you manually stop a VM with a GPU, you can preserve the Local SSD data, with certain restrictions. See the Local SSD documentation for more details.

For regional support for Local SSD with GPU types, see Local SSD availability by GPU regions and zones.
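The storage guidance in this section reduces to a short decision rule, sketched below. The function name and its parameters are hypothetical, and the sketch ignores which disk types a given machine series actually supports.

```python
def recommend_block_storage(data_is_transient: bool,
                            ml_training_or_serving: bool = False) -> str:
    """Suggest a block storage option based on the guidance in this section."""
    if data_is_transient:
        # Scratch space and caches: fast, but the data is temporary.
        return "Local SSD"
    if ml_training_or_serving:
        # High throughput and read-only multi-attach for shared datasets.
        return "Hyperdisk ML"
    # Non-transient data that must outlive the VM.
    return "Hyperdisk or Persistent Disk"
```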

GPUs and host maintenance

VMs with attached GPUs are always stopped when Compute Engine performs maintenance events on the VMs. If the VM has attached Local SSD disks, the Local SSD data is lost after the VM stops.

For information on handling maintenance events, see Handling GPU host maintenance events.
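Because the VM is stopped and Local SSD data is lost, a workload may want to checkpoint to persistent storage when maintenance is announced. The following sketch polls the metadata server's `maintenance-event` key from inside the VM. It assumes that the key returns `NONE` when no event is pending; the function names and the `checkpoint` callback are hypothetical, and the linked guide remains the authoritative procedure.

```python
import urllib.request
from typing import Callable

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/maintenance-event"
)


def fetch_maintenance_event(timeout: float = 2.0) -> str:
    """Read the current maintenance event value (works only inside a VM)."""
    request = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def handle_poll(fetch: Callable[[], str] = fetch_maintenance_event,
                checkpoint: Callable[[], None] = lambda: None) -> bool:
    """Run the checkpoint callback if a maintenance event is pending."""
    event = fetch()
    # Assumption: any value other than "NONE" means maintenance is pending.
    if event != "NONE":
        checkpoint()
        return True
    return False
```

Passing `fetch` as a parameter keeps the logic testable outside Compute Engine: a stub such as `lambda: "NONE"` stands in for the metadata server.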

GPU pricing

For VMs that have GPUs attached, you incur costs as follows:

If you request Compute Engine to provision GPUs using the spot, flex-start, or reservation-bound provisioning model, then you get a discounted price, depending on the GPU type.
Most VMs that have GPUs attached receive sustained use discounts (SUDs), similar to vCPUs. When you select a GPU for a virtual workstation, Compute Engine automatically adds an NVIDIA RTX Virtual Workstation license to your VM.

For hourly and monthly pricing for GPUs, see the GPU pricing page.

Reserving GPUs with committed use discounts

To reserve GPU resources in a specific zone, see Choose a reservation type.

To receive committed use discounts for GPUs in a specific zone, you must purchase resource-based commitments for the GPUs and also attach reservations that specify matching GPUs to your commitments. For more information, see Attach reservations to resource-based commitments.

GPU restrictions and limitations

For VMs with attached GPUs, the following restrictions and limitations apply:

GPUs are supported with only accelerator-optimized (A4X, A4, A3, A2, and G2) or general-purpose N1 machine types.
To protect Compute Engine systems and users, new projects have a global GPU quota, which limits the total number of GPUs you can create in any supported zone. When you request a GPU quota, you must request a quota for the GPU models that you want to create in each region, and an additional global quota for the total number of GPUs of all types in all zones.
VMs with one or more GPUs have a maximum number of vCPUs for each GPU that you add to the VM. To see the available vCPU and memory ranges for different GPU configurations, see the GPUs list.
GPUs require device drivers in order to function properly. NVIDIA GPUs running on Compute Engine must use a minimum driver version. For more information about driver versions, see Required NVIDIA driver versions.
VMs with an attached GPU model are covered by the Compute Engine SLA only if that attached GPU model is generally available. For regions that have multiple zones, the Compute Engine SLA covers the VM only if the GPU model is available in more than one zone within that region. For GPU models by region, see GPU regions and zones.
Compute Engine supports one concurrent user per GPU.
Also see the limitations for each machine type with attached GPUs.
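Two of the restrictions above, the two-level GPU quota and the SLA coverage condition, lend themselves to a short sketch. The function names and parameters are hypothetical; real quota limits and usage figures would come from your project, not from this code.

```python
def gpu_request_fits_quota(requested: int,
                           regional_model_limit: int,
                           regional_model_usage: int,
                           global_limit: int,
                           global_usage: int) -> bool:
    """Check a GPU request against both the regional per-model quota
    and the global quota for all GPU types in all zones."""
    fits_region = regional_model_usage + requested <= regional_model_limit
    fits_global = global_usage + requested <= global_limit
    return fits_region and fits_global


def sla_covers_gpu_vm(model_is_generally_available: bool,
                      zones_in_region: int,
                      zones_with_model: int) -> bool:
    """Apply the SLA coverage conditions stated in the list above."""
    if not model_is_generally_available:
        return False
    if zones_in_region > 1:
        # Multi-zone regions need the GPU model in more than one zone.
        return zones_with_model > 1
    return True
```

For example, a request for 4 GPUs fails when the regional quota still has room but the global quota has only 2 GPUs left, because both limits must be satisfied.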

What's next?

Learn how to create VMs with attached GPUs.
Learn how to add or remove GPUs.
Learn how to create a Confidential VM instance with an attached GPU.