GPU models


You can use GPUs on Compute Engine to accelerate specific workloads on your VMs such as machine learning (ML) and data processing. To use GPUs, you can either deploy an accelerator-optimized VM that has attached GPUs, or attach GPUs to an N1 general-purpose VM.

Compute Engine provides GPUs for your VMs in passthrough mode so that your VMs have direct control over the GPUs and their associated memory.
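
Because the GPUs are passed through directly, you can inspect them from inside the guest OS with the standard NVIDIA tools. The following is a minimal sketch, assuming a hypothetical VM named `my-gpu-vm` in zone `us-central1-a` that already has the NVIDIA drivers installed:

```bash
# SSH into the VM and query the attached GPUs; nvidia-smi talks to the GPUs
# directly because they are exposed to the guest in passthrough mode.
gcloud compute ssh my-gpu-vm \
    --zone=us-central1-a \
    --command="nvidia-smi"
```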

For more information about GPUs on Compute Engine, see About GPUs.

If you have graphics-intensive workloads, such as 3D visualization, 3D rendering, or virtual applications, you can use NVIDIA RTX Virtual Workstations (formerly known as NVIDIA GRID).

This document provides an overview of the different GPU VMs that are available on Compute Engine.

To view available regions and zones for GPUs on Compute Engine, see GPU regions and zones availability.
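
You can also query zone availability from the command line. The following sketch lists the accelerator types offered in one example zone; substitute your own zone:

```bash
# List the GPU accelerator types (for example, nvidia-l4 or nvidia-tesla-t4)
# that are available in a given zone.
gcloud compute accelerator-types list --filter="zone:us-central1-a"
```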

GPUs for compute workloads

For compute workloads, GPU models are available in the following stages:

  • A3 machine series
    • A3 Mega: NVIDIA H100 80GB Mega: nvidia-h100-mega-80gb: Generally Available
    • A3 Standard: NVIDIA H100 80GB: nvidia-h100-80gb: Generally Available
  • G2 machine series
    • NVIDIA L4: nvidia-l4: Generally Available
  • A2 machine series
    • A2 Ultra: NVIDIA A100 80GB: nvidia-a100-80gb: Generally Available
    • A2 Standard: NVIDIA A100 40GB: nvidia-tesla-a100: Generally Available
  • N1 machine series
    • NVIDIA T4: nvidia-tesla-t4: Generally Available
    • NVIDIA V100: nvidia-tesla-v100: Generally Available
    • NVIDIA P100: nvidia-tesla-p100: Generally Available
    • NVIDIA P4: nvidia-tesla-p4: Generally Available

A3 machine series

To run NVIDIA H100 80GB GPUs, you must use an A3 accelerator-optimized machine. Each A3 machine type has a fixed GPU count, vCPU count, and memory size.

The A3 machine series is available in two types:

  • A3 Mega: these machine types have H100 80GB Mega GPUs and Local SSD attached, and a maximum network bandwidth of 1,800 Gbps.
  • A3 Standard: these machine types have H100 80GB GPUs and Local SSD attached, and a maximum network bandwidth of 1,000 Gbps.

| Accelerator type | Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count | VM memory (GB) | Attached Local SSD (GiB) | Maximum VM network bandwidth (Gbps) | Maximum GPU cluster network bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|---|
| nvidia-h100-mega-80gb | a3-megagpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 200 | 1,600 |
| nvidia-h100-80gb | a3-highgpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 200 | 800 |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
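
Because the GPU count is fixed by the machine type, you don't pass an --accelerator flag when creating an A3 VM. The following is a minimal sketch; the VM name, zone, and boot image are placeholder assumptions, and production A3 deployments typically need additional network configuration to reach the full GPU cluster bandwidth:

```bash
# Create an A3 Standard VM; the eight H100 80GB GPUs are part of the machine type.
# TERMINATE is set because VMs with attached GPUs are stopped, rather than
# live-migrated, during host maintenance.
gcloud compute instances create my-a3-vm \
    --zone=us-central1-a \
    --machine-type=a3-highgpu-8g \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```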

G2 machine series

To use NVIDIA L4 GPUs, you must deploy a G2 accelerator-optimized machine.

Each G2 machine type has a fixed number of NVIDIA L4 GPUs and vCPUs attached. Each G2 machine type also has a default memory size and a custom memory range. The custom memory range defines the amount of memory that you can allocate to your VM for each machine type. You can specify your custom memory during VM creation.

| Accelerator type | Machine type | GPU count | GPU memory* (GB GDDR6) | vCPU count | Default VM memory (GB) | Custom VM memory range (GB) | Max Local SSD supported (GiB) |
|---|---|---|---|---|---|---|---|
| nvidia-l4 or nvidia-l4-vws | g2-standard-4 | 1 | 24 | 4 | 16 | 16 to 32 | 375 |
| | g2-standard-8 | 1 | 24 | 8 | 32 | 32 to 54 | 375 |
| | g2-standard-12 | 1 | 24 | 12 | 48 | 48 to 54 | 375 |
| | g2-standard-16 | 1 | 24 | 16 | 64 | 54 to 64 | 375 |
| | g2-standard-24 | 2 | 48 | 24 | 96 | 96 to 108 | 750 |
| | g2-standard-32 | 1 | 24 | 32 | 128 | 96 to 128 | 375 |
| | g2-standard-48 | 4 | 96 | 48 | 192 | 192 to 216 | 1,500 |
| | g2-standard-96 | 8 | 192 | 96 | 384 | 384 to 432 | 3,000 |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
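
As with A3, the GPU count is fixed by the G2 machine type, so no --accelerator flag is needed. A minimal creation sketch, with placeholder name, zone, and image:

```bash
# Create a G2 VM with one attached L4 GPU (included in the machine type).
gcloud compute instances create my-g2-vm \
    --zone=us-central1-a \
    --machine-type=g2-standard-8 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```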

A2 machine series

To use NVIDIA A100 GPUs on Google Cloud, you must deploy an A2 accelerator-optimized machine. Each A2 machine type has a fixed GPU count, vCPU count, and memory size.

The A2 machine series is available in two types:

  • A2 Ultra: these machine types have A100 80GB GPUs and Local SSD attached.
  • A2 Standard: these machine types have A100 40GB GPUs attached.

A2 Ultra

| Accelerator type | Machine type | GPU count | GPU memory* (GB HBM2e) | vCPU count | VM memory (GB) | Attached Local SSD (GiB) |
|---|---|---|---|---|---|---|
| nvidia-a100-80gb | a2-ultragpu-1g | 1 | 80 | 12 | 170 | 375 |
| | a2-ultragpu-2g | 2 | 160 | 24 | 340 | 750 |
| | a2-ultragpu-4g | 4 | 320 | 48 | 680 | 1,500 |
| | a2-ultragpu-8g | 8 | 640 | 96 | 1,360 | 3,000 |

A2 Standard

| Accelerator type | Machine type | GPU count | GPU memory* (GB HBM2) | vCPU count | VM memory (GB) | Local SSD supported |
|---|---|---|---|---|---|---|
| nvidia-tesla-a100 | a2-highgpu-1g | 1 | 40 | 12 | 85 | Yes |
| | a2-highgpu-2g | 2 | 80 | 24 | 170 | Yes |
| | a2-highgpu-4g | 4 | 160 | 48 | 340 | Yes |
| | a2-highgpu-8g | 8 | 320 | 96 | 680 | Yes |
| | a2-megagpu-16g | 16 | 640 | 96 | 1,360 | Yes |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
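
A minimal sketch for creating an A2 Standard VM; the name, zone, and image are placeholder assumptions:

```bash
# Create an A2 Standard VM with one A100 40GB GPU (fixed by the machine type).
gcloud compute instances create my-a2-vm \
    --zone=us-central1-a \
    --machine-type=a2-highgpu-1g \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```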

N1 machine series

You can attach the following GPU models to most N1 machine types, with the exception of N1 shared-core machine types.

N1 VMs with fewer attached GPUs are limited to a lower maximum vCPU count. In general, attaching more GPUs lets you create VM instances with more vCPUs and memory.

N1+T4 GPUs

You can attach NVIDIA T4 GPUs to N1 general-purpose VMs with the following VM configurations.

| Accelerator type | GPU count | GPU memory* (GB GDDR6) | vCPU count | VM memory (GB) | Local SSD supported |
|---|---|---|---|---|---|
| nvidia-tesla-t4 or nvidia-tesla-t4-vws | 1 | 16 | 1 to 48 | 1 to 312 | Yes |
| | 2 | 32 | 1 to 48 | 1 to 312 | Yes |
| | 4 | 64 | 1 to 96 | 1 to 624 | Yes |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
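
Unlike the accelerator-optimized series, N1 machine types don't include GPUs, so you specify the model and count explicitly with the --accelerator flag. A minimal sketch with placeholder name, zone, and image; the configuration must stay within the vCPU and memory limits in the table above:

```bash
# Attach one T4 GPU to an N1 VM; the accelerator type and count are explicit.
gcloud compute instances create my-n1-t4-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```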

N1+P4 GPUs

You can attach NVIDIA P4 GPUs to N1 general-purpose VMs with the following VM configurations.

| Accelerator type | GPU count | GPU memory* (GB GDDR5) | vCPU count | VM memory (GB) | Local SSD supported |
|---|---|---|---|---|---|
| nvidia-tesla-p4 or nvidia-tesla-p4-vws | 1 | 8 | 1 to 24 | 1 to 156 | Yes |
| | 2 | 16 | 1 to 48 | 1 to 312 | Yes |
| | 4 | 32 | 1 to 96 | 1 to 624 | Yes |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

For VMs with attached NVIDIA P4 GPUs, Local SSD disks are only supported in zones us-central1-c and northamerica-northeast1-b.

N1+V100 GPUs

You can attach NVIDIA V100 GPUs to N1 general-purpose VMs with the following VM configurations.

| Accelerator type | GPU count | GPU memory* (GB HBM2) | vCPU count | VM memory (GB) | Local SSD supported |
|---|---|---|---|---|---|
| nvidia-tesla-v100 | 1 | 16 | 1 to 12 | 1 to 78 | Yes |
| | 2 | 32 | 1 to 24 | 1 to 156 | Yes |
| | 4 | 64 | 1 to 48 | 1 to 312 | Yes |
| | 8 | 128 | 1 to 96 | 1 to 624 | Yes |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

For VMs with attached NVIDIA V100 GPUs, Local SSD disks aren't supported in us-east1-c.

N1+P100 GPUs

You can attach NVIDIA P100 GPUs to N1 general-purpose VMs with the following VM configurations.

For some NVIDIA P100 configurations, the maximum vCPU count and memory available depend on the zone in which the GPU resource runs.

| Accelerator type | GPU count | GPU memory* (GB HBM2) | vCPU count | VM memory (GB) | Local SSD supported |
|---|---|---|---|---|---|
| nvidia-tesla-p100 or nvidia-tesla-p100-vws | 1 | 16 | 1 to 16 | 1 to 104 | Yes |
| | 2 | 32 | 1 to 32 | 1 to 208 | Yes |
| | 4 | 64 | 1 to 64 (us-east1-c, europe-west1-d, europe-west1-b); 1 to 96 (all other P100 zones) | 1 to 208 (us-east1-c, europe-west1-d, europe-west1-b); 1 to 624 (all other P100 zones) | Yes |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

NVIDIA RTX Virtual Workstations (vWS) for graphics workloads

If you have graphics-intensive workloads, such as 3D visualization, you can create virtual workstations that use NVIDIA RTX Virtual Workstations (vWS) (formerly known as NVIDIA GRID). When you create a virtual workstation, an NVIDIA RTX Virtual Workstation (vWS) license is automatically added to your VM.

For information about pricing for virtual workstations, see the GPU pricing page.

For graphics workloads, the following NVIDIA RTX Virtual Workstation (vWS) models are available (see the example command after this list):

  • G2 machine series: for G2 machine types you can enable NVIDIA L4 Virtual Workstations (vWS): nvidia-l4-vws

  • N1 machine series: for N1 machine types, you can enable the following virtual workstations:

    • NVIDIA T4 Virtual Workstations: nvidia-tesla-t4-vws
    • NVIDIA P100 Virtual Workstations: nvidia-tesla-p100-vws
    • NVIDIA P4 Virtual Workstations: nvidia-tesla-p4-vws
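
To request a vWS model, you use its -vws accelerator type when creating the VM; the license is then added automatically. A minimal sketch for an N1 virtual workstation with a T4 vWS GPU; the VM name, zone, and image are placeholder assumptions:

```bash
# Create an N1 virtual workstation with one T4 vWS GPU; the NVIDIA RTX
# Virtual Workstation license is attached automatically.
gcloud compute instances create my-vws-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4-vws,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```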

General comparison chart

The following table describes the GPU memory size, feature availability, and ideal workload types of different GPU models that are available on Compute Engine.

| GPU model | GPU memory | Interconnect | NVIDIA RTX Virtual Workstation (vWS) support | Best used for |
|---|---|---|---|---|
| H100 80GB | 80 GB HBM3 @ 3.35 TBps | NVLink Full Mesh @ 900 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM |
| A100 80GB | 80 GB HBM2e @ 1.9 TBps | NVLink Full Mesh @ 600 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM |
| A100 40GB | 40 GB HBM2 @ 1.6 TBps | NVLink Full Mesh @ 600 GBps | No | ML Training, Inference, HPC |
| L4 | 24 GB GDDR6 @ 300 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding, HPC |
| T4 | 16 GB GDDR6 @ 320 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding |
| V100 | 16 GB HBM2 @ 900 GBps | NVLink Ring @ 300 GBps | No | ML Training, Inference, HPC |
| P4 | 8 GB GDDR5 @ 192 GBps | N/A | Yes | Remote Visualization Workstations, ML Inference, and Video Transcoding |
| P100 | 16 GB HBM2 @ 732 GBps | N/A | Yes | ML Training, Inference, HPC, Remote Visualization Workstations |

To compare GPU pricing for the different GPU models and regions that are available on Compute Engine, see GPU pricing.

Performance comparison chart

The following table describes the performance specifications of different GPU models that are available on Compute Engine.

Compute performance

| GPU model | FP64 | FP32 | FP16 | INT8 |
|---|---|---|---|---|
| H100 80GB | 34 TFLOPS | 67 TFLOPS | | |
| A100 80GB | 9.7 TFLOPS | 19.5 TFLOPS | | |
| A100 40GB | 9.7 TFLOPS | 19.5 TFLOPS | | |
| L4 | 0.5 TFLOPS* | 30.3 TFLOPS | | |
| T4 | 0.25 TFLOPS* | 8.1 TFLOPS | | |
| V100 | 7.8 TFLOPS | 15.7 TFLOPS | | |
| P4 | 0.2 TFLOPS* | 5.5 TFLOPS | | 22 TOPS† |
| P100 | 4.7 TFLOPS | 9.3 TFLOPS | 18.7 TFLOPS | |

*To allow FP64 code to work correctly, a small number of FP64 hardware units are included in the T4, L4, and P4 GPU architectures.

†TOPS: TeraOperations per second.

Tensor core performance

| GPU model | FP64 | TF32 | Mixed-precision FP16/FP32 | INT8 | INT4 | FP8 |
|---|---|---|---|---|---|---|
| H100 80GB | 67 TFLOPS | 989 TFLOPS | 1,979 TFLOPS*† | 3,958 TOPS | | 3,958 TFLOPS |
| A100 80GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS* | 624 TOPS | 1,248 TOPS | |
| A100 40GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS* | 624 TOPS | 1,248 TOPS | |
| L4 | | 120 TFLOPS | 242 TFLOPS*† | 485 TOPS | | 485 TFLOPS |
| T4 | | | 65 TFLOPS | 130 TOPS | 260 TOPS | |
| V100 | | | 125 TFLOPS | | | |
| P4 | | | | | | |
| P100 | | | | | | |

*For mixed-precision training, NVIDIA H100, A100, and L4 GPUs also support the bfloat16 data type.

†For H100 and L4 GPUs, structural sparsity is supported, which you can use to double the performance value. The values shown include sparsity; without sparsity, the figures are half the listed values. For example, the H100 mixed-precision figure of 1,979 TFLOPS with sparsity corresponds to roughly 990 TFLOPS without.

What's next?