Compute Engine provides graphics processing units (GPUs) that you can add to your virtual machines (VMs). You can use these GPUs to accelerate specific workloads on your VMs such as machine learning and data processing.
This page explains how to create a VM with attached GPUs. If you want to add GPUs to existing VMs, see Adding or removing GPUs.
For more information about what you can do with GPUs and what types of GPU hardware are available, read GPUs on Compute Engine.
Before you begin
- If you want to use the command-line examples in this guide:
- Install or update to the latest version of the gcloud command-line tool.
- Set a default region and zone.
- If you want to use the API examples in this guide, set up API access.
- Read about GPU pricing on Compute Engine to understand the cost to use GPUs on your VMs.
- Read about restrictions for VMs with GPUs.
- Check your GPU quota.
- Choose an operating system image:
- If you are using GPUs for machine learning, you can use a Deep Learning VM image for your VM. The Deep Learning VM images have GPU drivers pre-installed and include packages, such as TensorFlow and PyTorch. You can also use the Deep Learning VM images for general GPU workloads. For information about the images available and the packages installed on the images, see Choosing an image.
- You can also use any public image or custom image, but some images might require a unique driver or install process that is not covered in this document. You must identify which drivers are appropriate for your images. For steps to install drivers, see installing GPU drivers.
Checking GPU quota
To protect Compute Engine systems and users, new projects have a global GPU quota, which limits the total number of GPUs you can create in any supported zone.
Use the regions describe command to ensure that you have sufficient GPU quota in the region where you want to create VMs with GPUs.

gcloud compute regions describe REGION

Replace REGION with the region that you want to check for GPU quota.
If you need additional GPU quota, request a quota increase. When you request a GPU quota increase, you must request quota for the GPU types that you want to create in each region, plus an additional global quota for the total number of GPUs of all types in all zones.
If your project has an established billing history, it will receive quota automatically after you submit the request.
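If you script this check, you can filter the GPU-related quota metrics out of the JSON that `gcloud compute regions describe REGION --format=json` prints. The following sketch assumes you have saved that output to a string; the helper name and the sample data are illustrative, not real project values:

```python
import json

def gpu_quotas(region_json: str) -> dict:
    """Return {metric: remaining} for GPU-related quota entries."""
    region = json.loads(region_json)
    return {
        q["metric"]: q["limit"] - q["usage"]
        for q in region.get("quotas", [])
        if "GPU" in q["metric"]
    }

# Illustrative sample, shaped like a regions.describe response:
sample = json.dumps({
    "name": "us-central1",
    "quotas": [
        {"metric": "NVIDIA_A100_GPUS", "limit": 8, "usage": 2},
        {"metric": "CPUS", "limit": 96, "usage": 10},
    ],
})
print(gpu_quotas(sample))  # {'NVIDIA_A100_GPUS': 6}
```

Check the actual describe output for the exact quota metric names in your project.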
Overview
To create a VM with attached GPUs, complete the following steps:
Create the VM. The method used to create a VM depends on the GPU model.
- To create a VM with A100 GPUs, see Creating VMs with attached GPUs (A100 GPUs).
- To create a VM with any other available model, see Creating VMs with attached GPUs (other GPU types).
Install the GPU driver on your VM so that your system can use the device.
Creating VMs with attached GPUs (A100 GPUs)
This section covers how to create VMs with attached NVIDIA® A100 GPUs. For other GPU types, see Creating VMs with attached GPUs (other GPU types).
Console
In the Google Cloud Console, go to the VM instances page.
Click Create instance.
Specify a Name for your instance. See Resource naming convention.
Select a region and zone where GPUs are available. See the list of available A100 GPU zones.
In the Machine configuration section, complete the following steps:
- Under Machine family, click GPU.
- Under Series, select A2.
- Under Machine type, select the A2 machine type that you want.
- Expand the CPU platform and GPU section.
Under CPU platform and GPU, review the GPU type and Number of GPUs.
Optional: Turn on display device.
To configure your boot disk, in the Boot disk section, click Change. This opens the Boot disk configuration page.
From the Boot disk configuration page, complete the following steps:
- In the Public images tab, choose a supported operating system and version.
- Click Save to confirm your boot disk options.
Configure any other VM settings that you require. For example, you can change the Preemptibility settings to configure your VM as a preemptible instance. This reduces the cost of your VM and the attached GPUs. Read GPUs on preemptible instances to learn more.
Click the Create button to create and start the VM.
gcloud
To create and start a VM, use the gcloud compute instances create command with the following flags. Because VMs with GPUs cannot live migrate, make sure that you set the --maintenance-policy TERMINATE flag.
The --preemptible flag is optional; it configures your VM as a preemptible instance, which reduces the cost of your VM and the attached GPUs. For more information, see GPUs on preemptible instances.
gcloud compute instances create VM_NAME \
    --machine-type MACHINE_TYPE \
    --zone ZONE \
    [--image IMAGE | --image-family IMAGE_FAMILY] \
    --image-project IMAGE_PROJECT \
    --maintenance-policy TERMINATE --restart-on-failure \
    [--preemptible]
Replace the following:
- VM_NAME: the name for the new VM.
- MACHINE_TYPE: the A2 machine type that you selected for the VM.
- ZONE: the zone for the VM. This zone must support A100 GPUs.
- IMAGE or IMAGE_FAMILY that supports GPUs. Specify one of the following:
  - IMAGE: the required version of a public image. For example, --image debian-10-buster-v20200309.
  - IMAGE_FAMILY: an image family. This creates the VM from the most recent, non-deprecated OS image. For example, if you specify --image-family debian-10, Compute Engine creates a VM from the latest version of the OS image in the Debian 10 image family.
  You can also specify a custom image or Deep Learning VM image.
- IMAGE_PROJECT: the Compute Engine image project that the image family belongs to. If you use a custom image or Deep Learning VM image, specify the project that those images belong to.
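If you script this step, the same flags can be assembled programmatically, for example for use with `subprocess.run`. The following Python sketch builds the argv for the command above; the helper itself is hypothetical, not part of any Google SDK:

```python
def a100_create_args(vm_name, machine_type, zone, image_family, image_project,
                     preemptible=False):
    """Assemble argv for `gcloud compute instances create` with A100 GPUs."""
    args = [
        "gcloud", "compute", "instances", "create", vm_name,
        "--machine-type", machine_type,
        "--zone", zone,
        "--image-family", image_family,
        "--image-project", image_project,
        # VMs with GPUs cannot live migrate, so host maintenance must terminate.
        "--maintenance-policy", "TERMINATE",
        "--restart-on-failure",
    ]
    if preemptible:
        args.append("--preemptible")
    return args

args = a100_create_args("my-a100-vm", "a2-highgpu-1g", "us-central1-a",
                        "debian-10", "debian-cloud", preemptible=True)
print(" ".join(args))
```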
API
In the API, create a POST request to the instances.insert method. Because VMs with GPUs cannot live migrate, make sure that you set the onHostMaintenance parameter to TERMINATE.
The "preemptible": true parameter is optional; it configures your VM as a preemptible instance, which reduces the cost of your VM and the attached GPUs. For more information, see GPUs on preemptible instances.
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances

{
  "machineType": "projects/PROJECT_ID/zones/ZONE/machineTypes/MACHINE_TYPE",
  "disks": [
    {
      "type": "PERSISTENT",
      "initializeParams": {
        "diskSizeGb": "DISK_SIZE",
        "sourceImage": "projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
      },
      "boot": true
    }
  ],
  "name": "VM_NAME",
  "networkInterfaces": [
    {
      "network": "projects/PROJECT_ID/global/networks/NETWORK"
    }
  ],
  "scheduling": {
    "onHostMaintenance": "terminate",
    "automaticRestart": true,
    ["preemptible": true]
  }
}
Replace the following:
- VM_NAME: the name of the VM.
- PROJECT_ID: your project ID.
- ZONE: the zone for the VM. This zone must support A100 GPUs.
- MACHINE_TYPE: the A2 machine type that you selected for the VM.
- IMAGE or IMAGE_FAMILY: specify one of the following:
  - IMAGE: the required version of a public image. For example, "sourceImage": "projects/debian-cloud/global/images/debian-10-buster-v20200309".
  - IMAGE_FAMILY: an image family. This creates the VM from the most recent, non-deprecated OS image. For example, if you specify "sourceImage": "projects/debian-cloud/global/images/family/debian-10", Compute Engine creates a VM from the latest version of the OS image in the Debian 10 image family.
  You can also specify a custom image or Deep Learning VM image.
- IMAGE_PROJECT: the Compute Engine image project that the image family belongs to. If you use a custom image or Deep Learning VM image, specify the project that those images belong to.
- DISK_SIZE: the size of your boot disk in GB.
- NETWORK: the VPC network that you want to use for the VM. You can specify default to use your default network.
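For programmatic callers, the request body above can be assembled as a plain dictionary before it is sent (for example, with a Google API client). This sketch mirrors the JSON fields shown; the helper itself is hypothetical:

```python
def insert_body(project, zone, vm_name, machine_type, image_project,
                image_family, disk_size_gb, network="default",
                preemptible=False):
    """Build an instances.insert request body for a VM with A100 GPUs."""
    body = {
        "name": vm_name,
        "machineType": f"projects/{project}/zones/{zone}/machineTypes/{machine_type}",
        "disks": [{
            "type": "PERSISTENT",
            "boot": True,
            "initializeParams": {
                "diskSizeGb": str(disk_size_gb),
                "sourceImage": f"projects/{image_project}/global/images/family/{image_family}",
            },
        }],
        "networkInterfaces": [
            {"network": f"projects/{project}/global/networks/{network}"}
        ],
        # GPUs block live migration, so host maintenance must terminate the VM.
        "scheduling": {"onHostMaintenance": "terminate", "automaticRestart": True},
    }
    if preemptible:
        body["scheduling"]["preemptible"] = True
    return body

body = insert_body("my-project", "us-central1-a", "a100-vm", "a2-highgpu-1g",
                   "debian-cloud", "debian-10", 200)
```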
Next: Install the GPU driver on your VM so that your system can use the device. For A100 GPUs, CUDA 11 is required.
Examples (A100 GPUs)
In these examples, VMs with attached NVIDIA® A100 GPUs are created by using the gcloud command-line tool. However, you can also use either the Google Cloud Console or the Compute Engine API to create these VMs.
The following examples show how to create VMs by using these images:
DLVM image
Using Deep Learning VM images is the easiest way to get started because these images already have the NVIDIA drivers and CUDA libraries pre-installed.
These images also provide performance optimizations.
The following VM images are supported for NVIDIA® A100:

- common-cu110: NVIDIA driver and CUDA pre-installed
- tf-ent-1-15-cu110: NVIDIA driver, CUDA, TensorFlow Enterprise 1.15.3 pre-installed
- tf2-ent-2-1-cu110: NVIDIA driver, CUDA, TensorFlow Enterprise 2.1.1 pre-installed
- tf2-ent-2-3-cu110: NVIDIA driver, CUDA, TensorFlow Enterprise 2.3.1 pre-installed
- pytorch-1-6-cu110: NVIDIA driver, CUDA, PyTorch 1.6 pre-installed
For more information about the VM images that are available, and the packages installed on the images, see the Deep Learning VM documentation.
Create a VM by using the tf2-ent-2-3-cu110 image and the a2-highgpu-1g machine type. In this example, optional flags such as boot disk size and scope are specified.

gcloud compute instances create VM_NAME \
    --project PROJECT_ID \
    --zone us-central1-c \
    --machine-type a2-highgpu-1g \
    --maintenance-policy TERMINATE --restart-on-failure \
    --image-family tf2-ent-2-3-cu110 \
    --image-project deeplearning-platform-release \
    --boot-disk-size 200GB \
    --metadata "install-nvidia-driver=True,proxy-mode=project_editors" \
    --scopes https://www.googleapis.com/auth/cloud-platform
Replace the following:
- VM_NAME: the name of your VM.
- PROJECT_ID: your project ID.
The preceding example command also generates an AI Platform Notebook for the VM. To access the notebook, in the Google Cloud Console, go to the AI Platform page.
Public or custom image
You can create VMs with attached GPUs that use either a public image that is available on Compute Engine or a custom image.
To create a VM that uses the most recent, non-deprecated image from the CentOS 7 image family and the a2-highgpu-1g machine type, complete the following steps:

Create the VM. In this example, optional flags such as boot disk type and size are also specified.

gcloud compute instances create VM_NAME \
    --project PROJECT_ID \
    --zone us-central1-c \
    --machine-type a2-highgpu-1g \
    --maintenance-policy TERMINATE --restart-on-failure \
    --image-family centos-7 \
    --image-project centos-cloud \
    --boot-disk-size 200GB \
    --boot-disk-type pd-ssd
Replace the following:
- VM_NAME: the name of your VM.
- PROJECT_ID: your project ID.
Install the NVIDIA driver and CUDA. For NVIDIA® A100 GPUs, CUDA version 11 or higher is required.
COS
You can create VMs with attached GPUs that use Container-Optimized OS (COS) images.
To create a VM that uses the cos-85-lts image and the a2-highgpu-1g machine type, complete the following steps from your local client:
Create a configuration file.
touch /tmp/cloud-init.yaml
Add the configuration information to your /tmp/cloud-init.yaml file. This information is required to set up your Container-Optimized OS VM, and it also installs the NVIDIA driver and CUDA when the VM boots up.
cat <<'EOF' > /tmp/cloud-init.yaml
#cloud-config

write_files:
  - path: /etc/systemd/system/cos-gpu-installer.service
    permissions: 0755
    owner: root
    content: |
      [Unit]
      Description=Run the GPU driver installer container
      Requires=network-online.target gcr-online.target
      After=network-online.target gcr-online.target

      [Service]
      User=root
      Type=oneshot
      RemainAfterExit=true
      Environment=INSTALL_DIR=/var/lib/nvidia
      ExecStartPre=/bin/mkdir -p ${INSTALL_DIR}
      ExecStartPre=/bin/mount --bind ${INSTALL_DIR} ${INSTALL_DIR}
      ExecStartPre=/bin/mount -o remount,exec ${INSTALL_DIR}
      ExecStart=/usr/bin/docker run --privileged \
          --net=host \
          --pid=host \
          --volume ${INSTALL_DIR}:/usr/local/nvidia \
          --volume /dev:/dev \
          --volume /:/root \
          --env NVIDIA_DRIVER_VERSION=450.80.02 \
          gcr.io/cos-cloud/cos-gpu-installer:v20200701
      StandardOutput=journal+console
      StandardError=journal+console

runcmd:
  - systemctl daemon-reload
  - systemctl enable cos-gpu-installer.service
  - systemctl start cos-gpu-installer.service
EOF
Create a Container-Optimized OS VM by using the cos-85-lts image family and the a2-highgpu-1g machine type. You need to provide the configuration file by using the --metadata-from-file user-data flag. In this example, the optional boot disk size flag is also specified.
gcloud compute instances create VM_NAME \
    --project PROJECT_ID \
    --zone us-central1-a \
    --machine-type a2-highgpu-1g \
    --maintenance-policy TERMINATE --restart-on-failure \
    --image-family cos-85-lts \
    --image-project cos-cloud \
    --boot-disk-size 200GB \
    --metadata-from-file user-data=/tmp/cloud-init.yaml
Replace the following:
- VM_NAME: the name of your VM.
- PROJECT_ID: your project ID.
After the VM is created, log in to the VM and run the following command to verify that the NVIDIA driver is installed.
/var/lib/nvidia/bin/nvidia-smi
It takes approximately 5 minutes for the driver to be installed.
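Because the driver install runs asynchronously at boot, scripts usually poll rather than sleep for a fixed time. The following is a minimal, generic sketch of that wait-and-verify step; the check function is injected so the pattern is self-contained, whereas on the VM it would shell out to /var/lib/nvidia/bin/nvidia-smi:

```python
import time

def wait_until(check, timeout_s=600, interval_s=10,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if check():
            return True
        sleep(interval_s)
    return False

# Simulated driver install that succeeds on the third poll:
attempts = iter([False, False, True])
assert wait_until(lambda: next(attempts), timeout_s=60, interval_s=0) is True
```

On a real VM, the injected check might run the nvidia-smi command via `subprocess.run` and test its return code.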
Multi-Instance GPUs (A100)
A Multi-Instance GPU partitions a single NVIDIA A100 GPU within the same VM into as many as seven independent GPU instances. These instances run simultaneously, each with its own memory, cache, and streaming multiprocessors. This setup enables the A100 GPU to deliver guaranteed quality of service (QoS) at up to 7x higher utilization compared to earlier GPU models.
For more information about using Multi-Instance GPUs, see NVIDIA Multi-Instance GPU User Guide.
To create Multi-Instance GPUs, complete the following steps:
Create a VM with attached A100 GPUs.
Enable NVIDIA GPU drivers.
Enable Multi-Instance GPUs and reboot the VM.
sudo nvidia-smi -mig 1
sudo reboot
Review the Multi-Instance GPU shapes that are available.
sudo nvidia-smi mig --list-gpu-instance-profiles
The output is similar to the following:
+--------------------------------------------------------------------------+
| GPU instance profiles:                                                   |
| GPU   Name          ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                           Free/Total   GiB              CE    JPEG  OFA  |
|==========================================================================|
|   0  MIG 1g.5gb     19     7/7        4.75       No     14     0     0   |
|                                                          1     0     0   |
+--------------------------------------------------------------------------+
|   0  MIG 2g.10gb    14     3/3        9.75       No     28     1     0   |
|                                                          2     0     0   |
+--------------------------------------------------------------------------+
|   0  MIG 3g.20gb     9     2/2       19.62       No     42     2     0   |
|                                                          3     0     0   |
+--------------------------------------------------------------------------+
|   0  MIG 4g.20gb     5     1/1       19.62       No     56     2     0   |
|                                                          4     0     0   |
+--------------------------------------------------------------------------+
|   0  MIG 7g.40gb     0     1/1       39.50       No     98     5     0   |
|                                                          7     1     1   |
+--------------------------------------------------------------------------+
Create the Multi-Instance GPUs that you want. The following example creates two MIG 3g.20gb GPU instances by using the profile ID for this shape, which is 9.

sudo nvidia-smi mig -i 0 --create-gpu-instance 9,9
Check that the two Multi-Instance GPUs are created and get the IDs:
sudo nvidia-smi mig -lgi
The output is similar to the following:
+----------------------------------------------------+
| GPU instances:                                     |
| GPU   Name          Profile  Instance   Placement  |
|                     ID       ID         Start:Size |
|====================================================|
|   0  MIG 3g.20gb    9        1          0:4        |
+----------------------------------------------------+
|   0  MIG 3g.20gb    9        2          4:4        |
+----------------------------------------------------+
Using the Instance IDs, enable the Multi-Instance GPUs for compute workloads:

sudo nvidia-smi mig -i 0 --create-compute-instance -gi 1,2
Check that the GPU devices are ready for the compute workloads.
sudo nvidia-smi
The output is similar to the following:
Fri Oct 30 20:23:30 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      Off  | 00000000:00:04.0 Off |                   On |
| N/A   39C    P0    53W / 400W |     22MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |                      | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    1   0   0  |     11MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
+------------------+----------------------+-----------+-----------------------+
|  0    2   0   1  |     11MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
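As a back-of-the-envelope aid, the profile table shown earlier can be read as a slice budget: an A100 exposes at most seven instances, each profile consumes a fixed number of compute slices, and each profile also has its own instance limit (the Free/Total column). The following sketch encodes that simplified model; actual nvidia-smi placement rules may be stricter:

```python
from collections import Counter

# Derived from the profile table: slices consumed and max instances per profile.
SLICES = {"1g.5gb": 1, "2g.10gb": 2, "3g.20gb": 3, "4g.20gb": 4, "7g.40gb": 7}
MAX_INSTANCES = {"1g.5gb": 7, "2g.10gb": 3, "3g.20gb": 2, "4g.20gb": 1, "7g.40gb": 1}

def fits_on_one_gpu(profiles):
    """Check whether a mix of MIG profiles fits on a single A100 (simplified)."""
    counts = Counter(profiles)
    if any(n > MAX_INSTANCES[p] for p, n in counts.items()):
        return False
    return sum(SLICES[p] * n for p, n in counts.items()) <= 7

print(fits_on_one_gpu(["3g.20gb", "3g.20gb"]))  # True: 6 of 7 slices used
print(fits_on_one_gpu(["4g.20gb", "4g.20gb"]))  # False: only one 4g.20gb allowed
```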
Creating VMs with attached GPUs (other GPU types)
You can create a VM with one or more GPUs by using the Google Cloud Console, the gcloud command-line tool, or the Compute Engine API.
This section describes how to create VMs using the following GPU types:
- NVIDIA® T4: nvidia-tesla-t4
- NVIDIA® V100: nvidia-tesla-v100
- NVIDIA® P100: nvidia-tesla-p100. If you are using P100 GPUs with more than 64 vCPUs or more than 208 GB of memory, you must use the gcloud command-line tool or the Compute Engine API to create your VM.
- NVIDIA® P4: nvidia-tesla-p4
- NVIDIA® K80: nvidia-tesla-k80
Console
In the Google Cloud Console, go to the VM instances page.
Click Create instance.
Specify a Name for your instance. See Resource naming convention.
Select a region and zone. See the list of available zones with GPUs.
In the Machine configuration section, complete the following steps:
- Under Machine family, click GPU.
- Under Series, select N1.
- Under Machine type, select the N1 machine type that you want. Alternatively, you can specify custom machine type settings.
- Expand the CPU platform and GPU section.
- Under CPU platform and GPU, specify the GPU type and Number of GPUs.
- Optional: If your GPU type supports virtual workstations, select Enable Virtual Workstation (NVIDIA GRID).
- Optional: Turn on display device.
To select your operating system, in the Boot disk section, click Change. This opens the Boot disk configuration page.
From the Boot disk configuration page, complete the following steps:
- In the Public images tab, choose a supported operating system and version.
- Click Save to confirm your boot disk options.
Configure any other instance settings that you require. For example, you can change the Preemptibility settings to configure your instance as a preemptible instance. This reduces the cost of your instance and the attached GPUs. Read GPUs on preemptible instances to learn more.
Click the Create button to create and start the VM.
gcloud
To create and start a VM, use the gcloud compute instances create command with the following flags.
The --preemptible flag is optional; it configures your VM as a preemptible instance, which reduces the cost of your VM and the attached GPUs. For more information, see GPUs on preemptible instances.
gcloud compute instances create VM_NAME \
    --machine-type MACHINE_TYPE \
    --zone ZONE \
    --accelerator type=ACCELERATOR_TYPE,count=ACCELERATOR_COUNT \
    [--image IMAGE | --image-family IMAGE_FAMILY] \
    --image-project IMAGE_PROJECT \
    --maintenance-policy TERMINATE --restart-on-failure \
    [--preemptible]
Replace the following:
- VM_NAME: the name for the new VM.
- MACHINE_TYPE: the machine type that you selected for your VM.
- ZONE: the zone for the VM. This zone must support the GPU type.
- IMAGE or IMAGE_FAMILY that supports GPUs. Specify one of the following:
  - IMAGE: the required version of a public image. For example, --image debian-10-buster-v20200309.
  - IMAGE_FAMILY: an image family. This creates the VM from the most recent, non-deprecated OS image. For example, if you specify --image-family debian-10, Compute Engine creates a VM from the latest version of the OS image in the Debian 10 image family.
  You can also specify a custom image or Deep Learning VM image.
- IMAGE_PROJECT: the Compute Engine image project that the image family belongs to. If you use a custom image or Deep Learning VM image, specify the project that those images belong to.
- ACCELERATOR_COUNT: the number of GPUs that you want to add to your VM. See GPUs on Compute Engine for a list of GPU limits based on the machine type of your VM.
- ACCELERATOR_TYPE: the GPU model that you want to use. Use one of the following values:
  - NVIDIA® T4: nvidia-tesla-t4
  - NVIDIA® T4 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-t4-vws
  - NVIDIA® P4: nvidia-tesla-p4
  - NVIDIA® P4 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-p4-vws
  - NVIDIA® P100: nvidia-tesla-p100
  - NVIDIA® P100 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-p100-vws
  - NVIDIA® V100: nvidia-tesla-v100
  - NVIDIA® K80: nvidia-tesla-k80
  See GPUs on Compute Engine for a list of available GPU types.
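If you select the model programmatically, the mapping above is just a lookup table. The following sketch (a hedged convenience based on the list in this section, not an exhaustive or current inventory of Compute Engine accelerator types) builds the value for the --accelerator flag:

```python
# Marketing name -> Compute Engine accelerator type, per the list above.
ACCELERATOR_TYPES = {
    "T4": "nvidia-tesla-t4",
    "T4 Virtual Workstation": "nvidia-tesla-t4-vws",
    "P4": "nvidia-tesla-p4",
    "P4 Virtual Workstation": "nvidia-tesla-p4-vws",
    "P100": "nvidia-tesla-p100",
    "P100 Virtual Workstation": "nvidia-tesla-p100-vws",
    "V100": "nvidia-tesla-v100",
    "K80": "nvidia-tesla-k80",
}

def accelerator_flag(model, count=1):
    """Build the value for gcloud's --accelerator flag."""
    return f"type={ACCELERATOR_TYPES[model]},count={count}"

print(accelerator_flag("K80"))     # type=nvidia-tesla-k80,count=1
print(accelerator_flag("T4", 4))   # type=nvidia-tesla-t4,count=4
```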
Example
For example, you can use the following gcloud command to start an Ubuntu 16.04 VM with one NVIDIA K80 GPU and two vCPUs in the us-east1-d zone.
gcloud compute instances create gpu-instance-1 \
    --machine-type n1-standard-2 \
    --zone us-east1-d \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --image-family ubuntu-1604-lts \
    --image-project ubuntu-os-cloud \
    --maintenance-policy TERMINATE --restart-on-failure
This example command starts the VM. To complete the setup, you need to install the GPU drivers.
API
Identify the GPU type that you want to add to your VM. Submit a GET request to list the GPU types that are available to your project in a specific zone.
The "preemptible": true parameter is optional; it configures your VM as a preemptible instance, which reduces the cost of your VM and the attached GPUs. For more information, see GPUs on preemptible instances.
GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/acceleratorTypes
Replace the following:
- PROJECT_ID: your project ID.
- ZONE: the zone from which you want to list the available GPU types.
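The GET response contains an "items" array of accelerator-type descriptions. The following is a small sketch of extracting the type names from such a response; the sample response here is illustrative only:

```python
def available_gpu_types(response: dict) -> list:
    """List accelerator type names from an acceleratorTypes.list response."""
    return sorted(item["name"] for item in response.get("items", []))

# Illustrative sample, shaped like an acceleratorTypes.list response:
sample_response = {
    "items": [
        {"name": "nvidia-tesla-t4", "maximumCardsPerInstance": 4},
        {"name": "nvidia-tesla-v100", "maximumCardsPerInstance": 8},
    ]
}
print(available_gpu_types(sample_response))
```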
In the API, create a POST request to the instances.insert method. Include the acceleratorType parameter to specify which GPU type you want to use, and include the acceleratorCount parameter to specify how many GPUs you want to add. Also, set the onHostMaintenance parameter to TERMINATE.
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances

{
  "machineType": "projects/PROJECT_ID/zones/ZONE/machineTypes/MACHINE_TYPE",
  "disks": [
    {
      "type": "PERSISTENT",
      "initializeParams": {
        "diskSizeGb": "DISK_SIZE",
        "sourceImage": "projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
      },
      "boot": true
    }
  ],
  "name": "VM_NAME",
  "networkInterfaces": [
    {
      "network": "projects/PROJECT_ID/global/networks/NETWORK"
    }
  ],
  "guestAccelerators": [
    {
      "acceleratorCount": ACCELERATOR_COUNT,
      "acceleratorType": "projects/PROJECT_ID/zones/ZONE/acceleratorTypes/ACCELERATOR_TYPE"
    }
  ],
  "scheduling": {
    "onHostMaintenance": "terminate",
    "automaticRestart": true,
    ["preemptible": true]
  }
}
Replace the following:
- VM_NAME: the name of the VM.
- PROJECT_ID: your project ID.
- ZONE: the zone for the VM. This zone must support the GPU type.
- MACHINE_TYPE: the machine type that you selected for the VM. See GPUs on Compute Engine to see what machine types are available based on your desired GPU count.
- IMAGE or IMAGE_FAMILY: specify one of the following:
  - IMAGE: the required version of a public image. For example, "sourceImage": "projects/debian-cloud/global/images/debian-10-buster-v20200309".
  - IMAGE_FAMILY: an image family. This creates the VM from the most recent, non-deprecated OS image. For example, if you specify "sourceImage": "projects/debian-cloud/global/images/family/debian-10", Compute Engine creates a VM from the latest version of the OS image in the Debian 10 image family.
  You can also specify a custom image or Deep Learning VM image.
- IMAGE_PROJECT: the Compute Engine image project that the image family belongs to. If you use a custom image or Deep Learning VM image, specify the project that those images belong to.
- DISK_SIZE: the size of your boot disk in GB.
- NETWORK: the VPC network that you want to use for the VM. You can specify default to use your default network.
- ACCELERATOR_COUNT: the number of GPUs that you want to add to your VM. See GPUs on Compute Engine for a list of GPU limits based on the machine type of your VM.
- ACCELERATOR_TYPE: the GPU model that you want to use. See GPUs on Compute Engine for a list of available GPU types.
Next: Install the GPU driver on your VM so that your system can use the device.
What's next?
- Learn more about GPUs on Compute Engine.
- Add Local SSDs to your instances. Local SSD devices pair well with GPUs when your apps require high-performance storage.
- Create groups of GPU instances using instance templates.
- To monitor GPU performance, see Monitoring GPU performance.
- To optimize GPU performance, see Optimizing GPU performance.
- To handle GPU host maintenance, see Handling GPU host maintenance events.
- Try the Running TensorFlow Inference Workloads at Scale with TensorRT5 and NVIDIA T4 GPU tutorial.