GPU models


You can use GPUs on Compute Engine to accelerate specific workloads on your VMs such as machine learning (ML) and data processing. To use GPUs, you can either deploy an accelerator-optimized VM that has attached GPUs, or attach GPUs to an N1 general-purpose VM.

Compute Engine provides GPUs for your VMs in passthrough mode so that your VMs have direct control over the GPUs and their associated memory.
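
Because the GPUs are passed through directly, you can inspect them from inside the guest OS with the standard NVIDIA tools. The following is a minimal sketch, assuming a hypothetical VM named `my-gpu-vm` in zone `us-central1-a` that already has the NVIDIA drivers installed:

```bash
# SSH into the VM and query the attached GPUs; nvidia-smi talks to the GPUs
# directly because they are exposed to the guest in passthrough mode.
gcloud compute ssh my-gpu-vm \
    --zone=us-central1-a \
    --command="nvidia-smi"
```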

For more information about GPUs on Compute Engine, see About GPUs.

If you have graphics-intensive workloads, such as 3D visualization, 3D rendering, or virtual applications, you can use NVIDIA RTX Virtual Workstations (formerly known as NVIDIA GRID).

This document provides an overview of the different GPU VMs that are available on Compute Engine.

To view available regions and zones for GPUs on Compute Engine, see GPU regions and zones availability.
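
You can also query zone availability from the command line. The following sketch lists the accelerator types offered in one example zone; substitute your own zone:

```bash
# List the GPU accelerator types (for example, nvidia-l4 or nvidia-tesla-t4)
# that are available in a given zone.
gcloud compute accelerator-types list --filter="zone:us-central1-a"
```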

GPUs for compute workloads

For compute workloads, GPU models are available in the following stages:

  • A3 machine series
    • A3 Mega: NVIDIA H100 80GB Mega: nvidia-h100-mega-80gb: Generally Available
    • A3 Standard: NVIDIA H100 80GB: nvidia-h100-80gb: Generally Available
  • G2 machine series
    • NVIDIA L4: nvidia-l4: Generally Available
  • A2 machine series
    • A2 Ultra: NVIDIA A100 80GB: nvidia-a100-80gb: Generally Available
    • A2 Standard: NVIDIA A100 40GB: nvidia-tesla-a100: Generally Available
  • N1 machine series
    • NVIDIA T4: nvidia-tesla-t4: Generally Available
    • NVIDIA V100: nvidia-tesla-v100: Generally Available
    • NVIDIA P100: nvidia-tesla-p100: Generally Available
    • NVIDIA P4: nvidia-tesla-p4: Generally Available

A3 machine series

To run NVIDIA H100 80GB GPUs, you must use an A3 accelerator-optimized machine. Each A3 machine type has a fixed GPU count, vCPU count, and memory size.

The A3 machine series is available in two types:

  • A3 Mega: these machine types have H100 80GB Mega GPUs and Local SSD attached, and a maximum network bandwidth of 1,800 Gbps.
  • A3 Standard: these machine types have H100 80GB GPUs and Local SSD attached, and a maximum network bandwidth of 1,000 Gbps.

| Accelerator type | Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count | VM memory (GB) | Attached Local SSD (GiB) | Maximum VM network bandwidth (Gbps) | Maximum GPU cluster network bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|---|
| nvidia-h100-mega-80gb | a3-megagpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 200 | 1,600 |
| nvidia-h100-80gb | a3-highgpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 200 | 800 |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
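
Because the GPU count is fixed by the machine type, you don't pass an --accelerator flag when creating an A3 VM. The following is a minimal sketch; the VM name, zone, and boot image are placeholder assumptions, and production A3 deployments typically need additional network configuration to reach the full GPU cluster bandwidth:

```bash
# Create an A3 Standard VM; the eight H100 80GB GPUs are part of the machine type.
# TERMINATE is set because VMs with attached GPUs are stopped, rather than
# live-migrated, during host maintenance.
gcloud compute instances create my-a3-vm \
    --zone=us-central1-a \
    --machine-type=a3-highgpu-8g \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```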

G2 machine series

To use NVIDIA L4 GPUs, you must deploy a G2 accelerator-optimized machine.

Each G2 machine type has a fixed number of NVIDIA L4 GPUs and vCPUs attached. Each G2 machine type also has a default memory size and a custom memory range. The custom memory range defines the amount of memory that you can allocate to your VM for each machine type. You can specify your custom memory during VM creation.

| Accelerator type | Machine type | GPU count | GPU memory* (GB GDDR6) | vCPU count | Default VM memory (GB) | Custom VM memory range (GB) | Max Local SSD supported (GiB) |
|---|---|---|---|---|---|---|---|
| nvidia-l4 or nvidia-l4-vws | g2-standard-4 | 1 | 24 | 4 | 16 | 16 to 32 | 375 |
| | g2-standard-8 | 1 | 24 | 8 | 32 | 32 to 54 | 375 |
| | g2-standard-12 | 1 | 24 | 12 | 48 | 48 to 54 | 375 |
| | g2-standard-16 | 1 | 24 | 16 | 64 | 54 to 64 | 375 |
| | g2-standard-24 | 2 | 48 | 24 | 96 | 96 to 108 | 750 |
| | g2-standard-32 | 1 | 24 | 32 | 128 | 96 to 128 | 375 |
| | g2-standard-48 | 4 | 96 | 48 | 192 | 192 to 216 | 1,500 |
| | g2-standard-96 | 8 | 192 | 96 | 384 | 384 to 432 | 3,000 |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
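
As with A3, the GPU count is fixed by the G2 machine type, so no --accelerator flag is needed. A minimal creation sketch, with placeholder name, zone, and image:

```bash
# Create a G2 VM with one attached L4 GPU (included in the machine type).
gcloud compute instances create my-g2-vm \
    --zone=us-central1-a \
    --machine-type=g2-standard-8 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```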

A2 machine series

To use NVIDIA A100 GPUs on Google Cloud, you must deploy an A2 accelerator-optimized machine. Each A2 machine type has a fixed GPU count, vCPU count, and memory size.

The A2 machine series is available in two types:

  • A2 Ultra: these machine types have A100 80GB GPUs and Local SSD attached.
  • A2 Standard: these machine types have A100 40GB GPUs attached.

A2 Ultra

| Accelerator type | Machine type | GPU count | GPU memory* (GB HBM2e) | vCPU count | VM memory (GB) | Attached Local SSD (GiB) |
|---|---|---|---|---|---|---|
| nvidia-a100-80gb | a2-ultragpu-1g | 1 | 80 | 12 | 170 | 375 |
| | a2-ultragpu-2g | 2 | 160 | 24 | 340 | 750 |
| | a2-ultragpu-4g | 4 | 320 | 48 | 680 | 1,500 |
| | a2-ultragpu-8g | 8 | 640 | 96 | 1,360 | 3,000 |

A2 Standard

| Accelerator type | Machine type | GPU count | GPU memory* (GB HBM2) | vCPU count | VM memory (GB) | Local SSD supported |
|---|---|---|---|---|---|---|
| nvidia-tesla-a100 | a2-highgpu-1g | 1 | 40 | 12 | 85 | Yes |
| | a2-highgpu-2g | 2 | 80 | 24 | 170 | Yes |
| | a2-highgpu-4g | 4 | 160 | 48 | 340 | Yes |
| | a2-highgpu-8g | 8 | 320 | 96 | 680 | Yes |
| | a2-megagpu-16g | 16 | 640 | 96 | 1,360 | Yes |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
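
A minimal sketch for creating an A2 Standard VM; the name, zone, and image are placeholder assumptions:

```bash
# Create an A2 Standard VM with one A100 40GB GPU (fixed by the machine type).
gcloud compute instances create my-a2-vm \
    --zone=us-central1-a \
    --machine-type=a2-highgpu-1g \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```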

N1 machine series

You can attach the following GPU models to most N1 machine types, with the exception of N1 shared-core machine types.

N1 VMs with fewer attached GPUs are limited to a lower maximum vCPU count. In general, attaching more GPUs lets you create VM instances with more vCPUs and memory.

N1+T4 GPUs

You can attach NVIDIA T4 GPUs to N1 general-purpose VMs with the following VM configurations.

| Accelerator type | GPU count | GPU memory* (GB GDDR6) | vCPU count | VM memory (GB) | Local SSD supported |
|---|---|---|---|---|---|
| nvidia-tesla-t4 or nvidia-tesla-t4-vws | 1 | 16 | 1 to 48 | 1 to 312 | Yes |
| | 2 | 32 | 1 to 48 | 1 to 312 | Yes |
| | 4 | 64 | 1 to 96 | 1 to 624 | Yes |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
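
Unlike the accelerator-optimized series, N1 machine types don't include GPUs, so you specify the model and count explicitly with the --accelerator flag. A minimal sketch with placeholder name, zone, and image; the configuration must stay within the vCPU and memory limits in the table above:

```bash
# Attach one T4 GPU to an N1 VM; the accelerator type and count are explicit.
gcloud compute instances create my-n1-t4-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```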

N1+P4 GPUs

You can attach NVIDIA P4 GPUs to N1 general-purpose VMs with the following VM configurations.

| Accelerator type | GPU count | GPU memory* (GB GDDR5) | vCPU count | VM memory (GB) | Local SSD supported |
|---|---|---|---|---|---|
| nvidia-tesla-p4 or nvidia-tesla-p4-vws | 1 | 8 | 1 to 24 | 1 to 156 | Yes |
| | 2 | 16 | 1 to 48 | 1 to 312 | Yes |
| | 4 | 32 | 1 to 96 | 1 to 624 | Yes |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

For VMs with attached NVIDIA P4 GPUs, Local SSD disks are only supported in zones us-central1-c and northamerica-northeast1-b.

N1+V100 GPUs

You can attach NVIDIA V100 GPUs to N1 general-purpose VMs with the following VM configurations.

| Accelerator type | GPU count | GPU memory* (GB HBM2) | vCPU count | VM memory (GB) | Local SSD supported |
|---|---|---|---|---|---|
| nvidia-tesla-v100 | 1 | 16 | 1 to 12 | 1 to 78 | Yes |
| | 2 | 32 | 1 to 24 | 1 to 156 | Yes |
| | 4 | 64 | 1 to 48 | 1 to 312 | Yes |
| | 8 | 128 | 1 to 96 | 1 to 624 | Yes |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

For VMs with attached NVIDIA V100 GPUs, Local SSD disks aren't supported in us-east1-c.

N1+P100 GPUs

You can attach NVIDIA P100 GPUs to N1 general-purpose VMs with the following VM configurations.

For some NVIDIA P100 configurations, the maximum vCPU count and memory available depend on the zone in which the GPU resource runs.

| Accelerator type | GPU count | GPU memory* (GB HBM2) | vCPU count | VM memory (GB) | Local SSD supported |
|---|---|---|---|---|---|
| nvidia-tesla-p100 or nvidia-tesla-p100-vws | 1 | 16 | 1 to 16 | 1 to 104 | Yes |
| | 2 | 32 | 1 to 32 | 1 to 208 | Yes |
| | 4 | 64 | 1 to 64 (us-east1-c, europe-west1-d, europe-west1-b); 1 to 96 (all other P100 zones) | 1 to 208 (us-east1-c, europe-west1-d, europe-west1-b); 1 to 624 (all other P100 zones) | Yes |

*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

NVIDIA RTX Virtual Workstations (vWS) for graphics workloads

If you have graphics-intensive workloads, such as 3D visualization, you can create virtual workstations that use NVIDIA RTX Virtual Workstations (vWS) (formerly known as NVIDIA GRID). When you create a virtual workstation, an NVIDIA RTX Virtual Workstation (vWS) license is automatically added to your VM.

For information about pricing for virtual workstations, see the GPU pricing page.

For graphics workloads, the following NVIDIA RTX Virtual Workstation (vWS) models are available (see the example command after this list):

  • G2 machine series: for G2 machine types you can enable NVIDIA L4 Virtual Workstations (vWS): nvidia-l4-vws

  • N1 machine series: for N1 machine types, you can enable the following virtual workstations:

    • NVIDIA T4 Virtual Workstations: nvidia-tesla-t4-vws
    • NVIDIA P100 Virtual Workstations: nvidia-tesla-p100-vws
    • NVIDIA P4 Virtual Workstations: nvidia-tesla-p4-vws
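
To request a vWS model, you use its -vws accelerator type when creating the VM; the license is then added automatically. A minimal sketch for an N1 virtual workstation with a T4 vWS GPU; the VM name, zone, and image are placeholder assumptions:

```bash
# Create an N1 virtual workstation with one T4 vWS GPU; the NVIDIA RTX
# Virtual Workstation license is attached automatically.
gcloud compute instances create my-vws-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4-vws,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```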

General comparison chart

The following table describes the GPU memory size, feature availability, and ideal workload types of different GPU models that are available on Compute Engine.

| GPU model | GPU memory | Interconnect | NVIDIA RTX Virtual Workstation (vWS) support | Best used for |
|---|---|---|---|---|
| H100 80GB | 80 GB HBM3 @ 3.35 TBps | NVLink Full Mesh @ 900 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM |
| A100 80GB | 80 GB HBM2e @ 1.9 TBps | NVLink Full Mesh @ 600 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM |
| A100 40GB | 40 GB HBM2 @ 1.6 TBps | NVLink Full Mesh @ 600 GBps | No | ML Training, Inference, HPC |
| L4 | 24 GB GDDR6 @ 300 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding, HPC |
| T4 | 16 GB GDDR6 @ 320 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding |
| V100 | 16 GB HBM2 @ 900 GBps | NVLink Ring @ 300 GBps | No | ML Training, Inference, HPC |
| P4 | 8 GB GDDR5 @ 192 GBps | N/A | Yes | Remote Visualization Workstations, ML Inference, and Video Transcoding |
| P100 | 16 GB HBM2 @ 732 GBps | N/A | Yes | ML Training, Inference, HPC, Remote Visualization Workstations |

To compare GPU pricing for the different GPU models and regions that are available on Compute Engine, see GPU pricing.

Performance comparison chart

The following table describes the performance specifications of different GPU models that are available on Compute Engine.

Compute performance

| GPU model | FP64 | FP32 | FP16 | INT8 |
|---|---|---|---|---|
| H100 80GB | 34 TFLOPS | 67 TFLOPS | | |
| A100 80GB | 9.7 TFLOPS | 19.5 TFLOPS | | |
| A100 40GB | 9.7 TFLOPS | 19.5 TFLOPS | | |
| L4 | 0.5 TFLOPS* | 30.3 TFLOPS | | |
| T4 | 0.25 TFLOPS* | 8.1 TFLOPS | | |
| V100 | 7.8 TFLOPS | 15.7 TFLOPS | | |
| P4 | 0.2 TFLOPS* | 5.5 TFLOPS | | 22 TOPS† |
| P100 | 4.7 TFLOPS | 9.3 TFLOPS | 18.7 TFLOPS | |

*To allow FP64 code to work correctly, a small number of FP64 hardware units are included in the T4, L4, and P4 GPU architectures.

†TOPS: TeraOperations per second.

Tensor core performance

| GPU model | FP64 | TF32 | Mixed-precision FP16/FP32 | INT8 | INT4 | FP8 |
|---|---|---|---|---|---|---|
| H100 80GB | 67 TFLOPS | 989 TFLOPS | 1,979 TFLOPS*† | 3,958 TOPS | | 3,958 TFLOPS |
| A100 80GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS* | 624 TOPS | 1,248 TOPS | |
| A100 40GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS* | 624 TOPS | 1,248 TOPS | |
| L4 | | 120 TFLOPS | 242 TFLOPS*† | 485 TOPS | | 485 TFLOPS |
| T4 | | | 65 TFLOPS | 130 TOPS | 260 TOPS | |
| V100 | | | 125 TFLOPS | | | |
| P4 | | | | | | |
| P100 | | | | | | |

*For mixed-precision training, NVIDIA H100, A100, and L4 GPUs also support the bfloat16 data type.

†For H100 and L4 GPUs, structural sparsity is supported, which you can use to double the performance value. The values shown include sparsity; without sparsity, the figures are half the listed values. For example, the H100 mixed-precision figure of 1,979 TFLOPS with sparsity corresponds to roughly 990 TFLOPS without.

What's next?