GPUs on Compute Engine

Compute Engine provides graphics processing units (GPUs) that you can add to your virtual machine (VM) instances. You can use these GPUs to accelerate specific workloads on your instances such as machine learning and data processing.

If you have graphics-intensive workloads, such as 3D visualization, 3D rendering, or virtual applications, you can create virtual workstations that use NVIDIA® GRID® technology. For information on GPUs for graphics-intensive applications, see GPUs for graphics workloads.

This document provides an overview of GPUs on Compute Engine. For more information about working with GPUs, review the resources referenced throughout this document.


Introduction

Compute Engine provides NVIDIA® GPUs for your instances in passthrough mode, so that your VMs have direct control over the GPUs and their associated memory.
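
For example, here is a minimal gcloud sketch of creating an N1 VM with a single T4 GPU attached. The instance name, zone, and image are placeholders, and the zone must offer the GPU model you request:

    # Create an N1 VM with one NVIDIA T4 GPU in passthrough mode.
    # GPU VMs must terminate (not live-migrate) for host maintenance.
    gcloud compute instances create example-gpu-vm \
        --zone=us-central1-a \
        --machine-type=n1-standard-8 \
        --accelerator=type=nvidia-tesla-t4,count=1 \
        --maintenance-policy=TERMINATE \
        --restart-on-failure \
        --image-family=debian-10 \
        --image-project=debian-cloud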

For compute workloads, GPU models are available in the following stages:

  • NVIDIA® A100: nvidia-tesla-a100: Generally Available
  • NVIDIA® T4: nvidia-tesla-t4: Generally Available
  • NVIDIA® V100: nvidia-tesla-v100: Generally Available
  • NVIDIA® P100: nvidia-tesla-p100: Generally Available
  • NVIDIA® P4: nvidia-tesla-p4: Generally Available
  • NVIDIA® K80: nvidia-tesla-k80: Generally Available

For graphics workloads, GPU models are available in the following stages:

  • NVIDIA® T4 Virtual Workstations: nvidia-tesla-t4-vws: Generally Available
  • NVIDIA® P100 Virtual Workstations: nvidia-tesla-p100-vws: Generally Available
  • NVIDIA® P4 Virtual Workstations: nvidia-tesla-p4-vws: Generally Available

For information on GPUs for virtual workstations, see GPUs for graphics workloads.

You can attach GPUs only to instances with predefined or custom machine types. GPUs are not supported on shared-core or memory-optimized machine types.

You can also add Local SSDs to VMs that have attached GPUs. For a list of Local SSD support by GPU type and region, see Local SSD availability by GPU regions and zones.
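
As a sketch, Local SSDs are requested with the --local-ssd flag at creation time; the zone must support both the GPU model and Local SSDs, and all names here are placeholders:

    # Attach one NVMe Local SSD partition to a GPU VM at creation time.
    gcloud compute instances create example-gpu-ssd-vm \
        --zone=us-central1-a \
        --machine-type=n1-standard-8 \
        --accelerator=type=nvidia-tesla-t4,count=1 \
        --local-ssd=interface=NVME \
        --maintenance-policy=TERMINATE \
        --image-family=debian-10 \
        --image-project=debian-cloud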

Pricing

Most GPU devices receive sustained use discounts, similar to vCPUs. For hourly and monthly pricing for GPU devices, see the GPU pricing page.

GPU models

NVIDIA® A100 GPUs

To run NVIDIA® A100 GPUs, you must use the accelerator-optimized (A2) machine type.

Each A2 machine type has a fixed GPU count, vCPU count, and memory size.

GPU model    | Machine type   | GPUs    | GPU memory  | Available vCPUs | Available memory
------------ | -------------- | ------- | ----------- | --------------- | ----------------
NVIDIA® A100 | a2-highgpu-1g  | 1 GPU   | 40 GB HBM2  | 12 vCPUs        | 85 GB
NVIDIA® A100 | a2-highgpu-2g  | 2 GPUs  | 80 GB HBM2  | 24 vCPUs        | 170 GB
NVIDIA® A100 | a2-highgpu-4g  | 4 GPUs  | 160 GB HBM2 | 48 vCPUs        | 340 GB
NVIDIA® A100 | a2-highgpu-8g  | 8 GPUs  | 320 GB HBM2 | 96 vCPUs        | 680 GB
NVIDIA® A100 | a2-megagpu-16g | 16 GPUs | 640 GB HBM2 | 96 vCPUs        | 1360 GB
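
Because each A2 machine type has a fixed GPU count, you select A100 GPUs by choosing the machine type rather than by passing an --accelerator flag. A minimal sketch, with a placeholder name and zone:

    # Create an A2 VM; one A100 GPU is attached automatically by the machine type.
    gcloud compute instances create example-a100-vm \
        --zone=us-central1-a \
        --machine-type=a2-highgpu-1g \
        --maintenance-policy=TERMINATE \
        --image-family=debian-10 \
        --image-project=debian-cloud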

Other available NVIDIA® GPU models

For these GPU models, VMs with fewer GPUs are limited to a lower maximum vCPU count. In general, attaching more GPUs lets you create instances with more vCPUs and memory.

GPU model    | GPUs   | GPU memory  | Available vCPUs | Available memory
------------ | ------ | ----------- | --------------- | ----------------
NVIDIA® T4   | 1 GPU  | 16 GB GDDR6 | 1 - 24 vCPUs    | 1 - 156 GB
NVIDIA® T4   | 2 GPUs | 32 GB GDDR6 | 1 - 48 vCPUs    | 1 - 312 GB
NVIDIA® T4   | 4 GPUs | 64 GB GDDR6 | 1 - 96 vCPUs    | 1 - 624 GB
NVIDIA® P4   | 1 GPU  | 8 GB GDDR5  | 1 - 24 vCPUs    | 1 - 156 GB
NVIDIA® P4   | 2 GPUs | 16 GB GDDR5 | 1 - 48 vCPUs    | 1 - 312 GB
NVIDIA® P4   | 4 GPUs | 32 GB GDDR5 | 1 - 96 vCPUs    | 1 - 624 GB
NVIDIA® V100 | 1 GPU  | 16 GB HBM2  | 1 - 12 vCPUs    | 1 - 78 GB
NVIDIA® V100 | 2 GPUs | 32 GB HBM2  | 1 - 24 vCPUs    | 1 - 156 GB
NVIDIA® V100 | 4 GPUs | 64 GB HBM2  | 1 - 48 vCPUs    | 1 - 312 GB
NVIDIA® V100 | 8 GPUs | 128 GB HBM2 | 1 - 96 vCPUs    | 1 - 624 GB
NVIDIA® P100 | 1 GPU  | 16 GB HBM2  | 1 - 16 vCPUs    | 1 - 104 GB
NVIDIA® P100 | 2 GPUs | 32 GB HBM2  | 1 - 32 vCPUs    | 1 - 208 GB
NVIDIA® P100 | 4 GPUs | 64 GB HBM2  | 1 - 64 vCPUs (us-east1-c, europe-west1-d, europe-west1-b); 1 - 96 vCPUs (all P100 zones) | 1 - 208 GB (us-east1-c, europe-west1-d, europe-west1-b); 1 - 624 GB (all P100 zones)
NVIDIA® K80  | 1 GPU  | 12 GB GDDR5 | 1 - 8 vCPUs     | 1 - 52 GB
NVIDIA® K80  | 2 GPUs | 24 GB GDDR5 | 1 - 16 vCPUs    | 1 - 104 GB
NVIDIA® K80  | 4 GPUs | 48 GB GDDR5 | 1 - 32 vCPUs    | 1 - 208 GB
NVIDIA® K80  | 8 GPUs | 96 GB GDDR5 | 1 - 64 vCPUs    | 1 - 416 GB (asia-east1-a and us-east1-d); 1 - 208 GB (all K80 zones)

Note:
  • For a more detailed description of zones, see Regions and zones.
  • NVIDIA® K80 boards contain two GPUs each. K80 GPUs are priced per individual GPU, not per board.

NVIDIA® GRID® GPUs for graphics workloads

If you have graphics-intensive workloads, such as 3D visualization, you can create virtual workstations that use the NVIDIA® GRID® platform. For background information about NVIDIA® GRID®, see the GRID overview.

When you select a GPU for a virtual workstation, an NVIDIA® GRID® license is added to your VM. For more information about pricing, see the GPU pricing page.

To set up an NVIDIA® GRID® virtual workstation, you need to create a VM with Virtual Workstation enabled and install a GRID driver.
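
As a sketch, selecting one of the -vws accelerator types at creation time adds the GRID license to the VM. The instance name, zone, and Windows image below are placeholders:

    # Create a T4 virtual workstation; the -vws accelerator type carries the GRID license.
    gcloud compute instances create example-grid-workstation \
        --zone=us-central1-a \
        --machine-type=n1-standard-8 \
        --accelerator=type=nvidia-tesla-t4-vws,count=1 \
        --maintenance-policy=TERMINATE \
        --image-family=windows-2019 \
        --image-project=windows-cloud \
        --boot-disk-size=100GB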

After you create your virtual workstation, you can connect to it using a remote desktop protocol such as Teradici® PCoIP or VMware® Horizon View.

GPU model                        | GPUs   | GPU memory  | Available vCPUs | Available memory
-------------------------------- | ------ | ----------- | --------------- | ----------------
NVIDIA® T4 Virtual Workstation   | 1 GPU  | 16 GB GDDR6 | 1 - 24 vCPUs    | 1 - 156 GB
NVIDIA® T4 Virtual Workstation   | 2 GPUs | 32 GB GDDR6 | 1 - 48 vCPUs    | 1 - 312 GB
NVIDIA® T4 Virtual Workstation   | 4 GPUs | 64 GB GDDR6 | 1 - 96 vCPUs    | 1 - 624 GB
NVIDIA® P4 Virtual Workstation   | 1 GPU  | 8 GB GDDR5  | 1 - 16 vCPUs    | 1 - 156 GB
NVIDIA® P4 Virtual Workstation   | 2 GPUs | 16 GB GDDR5 | 1 - 48 vCPUs    | 1 - 312 GB
NVIDIA® P4 Virtual Workstation   | 4 GPUs | 32 GB GDDR5 | 1 - 96 vCPUs    | 1 - 624 GB
NVIDIA® P100 Virtual Workstation | 1 GPU  | 16 GB HBM2  | 1 - 16 vCPUs    | 1 - 104 GB
NVIDIA® P100 Virtual Workstation | 2 GPUs | 32 GB HBM2  | 1 - 32 vCPUs    | 1 - 208 GB
NVIDIA® P100 Virtual Workstation | 4 GPUs | 64 GB HBM2  | 1 - 64 vCPUs (us-east1-c, europe-west1-d, europe-west1-b); 1 - 96 vCPUs (all P100 zones) | 1 - 208 GB (us-east1-c, europe-west1-d, europe-west1-b); 1 - 624 GB (all P100 zones)

Network bandwidths and GPUs

Using higher network bandwidths can improve the performance of distributed workloads. For more information, see Network bandwidths and GPUs.

GPUs on preemptible instances

You can add GPUs to your preemptible VM instances at lower preemptible prices for the GPUs. GPUs attached to preemptible instances work like normal GPUs but persist only for the life of the instance. Preemptible instances with GPUs follow the same preemption process as all preemptible instances.
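
As a sketch, the only change from a standard GPU VM is the --preemptible flag; names and zone are placeholders:

    # Create a preemptible VM with one T4 GPU at the lower preemptible GPU price.
    gcloud compute instances create example-preemptible-gpu-vm \
        --preemptible \
        --zone=us-central1-a \
        --machine-type=n1-standard-4 \
        --accelerator=type=nvidia-tesla-t4,count=1 \
        --maintenance-policy=TERMINATE \
        --image-family=debian-10 \
        --image-project=debian-cloud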

During maintenance events, preemptible instances with GPUs are preempted by default and cannot be automatically restarted. If you want to recreate your instances after they have been preempted, use a managed instance group. Managed instance groups recreate your instances if the vCPU, memory, and GPU resources are available.

If you want a warning before your instance is interrupted, or want to configure your instance to automatically restart after a maintenance event, use a non-preemptible instance with a GPU. For non-preemptible instances with GPUs, Google provides one hour of advance notice before a host maintenance event.

Compute Engine does not charge you for GPUs if the instance is preempted within the first minute after it starts running.

For steps to automatically restart a non-preemptible instance, see Updating options for an instance.
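
As a sketch, both availability options can be set on an existing non-preemptible instance with gcloud; the instance name and zone are placeholders:

    # GPU VMs must terminate for maintenance; --restart-on-failure brings the VM
    # back up automatically after the maintenance event completes.
    gcloud compute instances set-scheduling example-gpu-vm \
        --zone=us-central1-a \
        --maintenance-policy=TERMINATE \
        --restart-on-failure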

To learn how to create preemptible instances with GPUs attached, read Creating VMs with attached GPUs.

Reserving GPUs with committed use discounts

To reserve GPU resources in a specific zone, see Reserving zonal resources. Reservations are required for committed use discounted pricing for GPUs.
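
As a sketch, assuming your project has quota for the model in the target zone, a zonal reservation for GPU VMs might look like the following; all names and counts are placeholders:

    # Reserve capacity for two N1 VMs, each with one T4 GPU, in a single zone.
    gcloud compute reservations create example-t4-reservation \
        --zone=us-central1-a \
        --vm-count=2 \
        --machine-type=n1-standard-8 \
        --accelerator=count=1,type=nvidia-tesla-t4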

GPU comparison chart

Review this section to compare the performance specifications, feature availability, and ideal workload types of the different GPU models that are available on Compute Engine.

The maximum CPU and memory available for any GPU model depends on the zone in which the GPU resource runs. For more information about memory, CPU resources, and available regions and zones, see the GPU list.

General comparison

Metric                          | A100                        | T4                          | V100                   | P4                          | P100                        | K80
------------------------------- | --------------------------- | --------------------------- | ---------------------- | --------------------------- | --------------------------- | ---
Memory                          | 40 GB HBM2 @ 1.6 TB/s       | 16 GB GDDR6 @ 320 GB/s      | 16 GB HBM2 @ 900 GB/s  | 8 GB GDDR5 @ 192 GB/s       | 16 GB HBM2 @ 732 GB/s       | 12 GB GDDR5 @ 240 GB/s
Interconnect                    | NVLink Full Mesh @ 600 GB/s | N/A                         | NVLink Ring @ 300 GB/s | N/A                         | N/A                         | N/A
GRID remote workstation support | No                          | Yes                         | No                     | Yes                         | Yes                         | No
Best used for                   | ML Training, Inference, HPC | ML Inference, Training, Remote Visualization Workstations, Video Transcoding | ML Training, Inference, HPC | Remote Visualization Workstations, ML Inference, and Video Transcoding | ML Training, Inference, HPC, Remote Visualization Workstations | ML Inference, Training, HPC

To compare GPU pricing for the different GPU models and regions that are available on Compute Engine, see GPU pricing.

Performance comparison

A dash indicates that no value is listed for that model.

Compute performance

Metric | A100        | T4           | V100        | P4          | P100        | K80
------ | ----------- | ------------ | ----------- | ----------- | ----------- | -----------
FP64   | 9.7 TFLOPS  | 0.25 TFLOPS¹ | 7.8 TFLOPS  | 0.2 TFLOPS¹ | 4.7 TFLOPS  | 1.46 TFLOPS
FP32   | 19.5 TFLOPS | 8.1 TFLOPS   | 15.7 TFLOPS | 5.5 TFLOPS  | 9.3 TFLOPS  | 4.37 TFLOPS
FP16   | -           | -            | -           | -           | 18.7 TFLOPS | -
INT8   | -           | -            | -           | 22 TOPS²    | -           | -

Tensor core performance

Metric                    | A100        | T4        | V100       | P4 | P100 | K80
------------------------- | ----------- | --------- | ---------- | -- | ---- | ---
FP64                      | 19.5 TFLOPS | -         | -          | -  | -    | -
TF32                      | 156 TFLOPS  | -         | -          | -  | -    | -
Mixed-precision FP16/FP32 | 312 TFLOPS³ | 65 TFLOPS | 125 TFLOPS | -  | -    | -
INT8                      | 624 TOPS²   | 180 TOPS² | -          | -  | -    | -
INT4                      | 1248 TOPS²  | 260 TOPS² | -          | -  | -    | -

¹ To allow FP64 code to work correctly, a small number of FP64 hardware units are included in the T4 and P4 GPU architectures.

² TeraOperations per second.

³ For mixed-precision training, the NVIDIA A100 also supports the bfloat16 data type.

Restrictions

For VMs with attached GPUs, the following restrictions apply:

  • If you want to use NVIDIA® K80 GPUs with your VMs, the VMs cannot use the Intel Skylake or later CPU platforms.

  • GPUs are currently only supported with general-purpose N1 or accelerator-optimized A2 machine types.

  • You cannot attach GPUs to VMs with shared-core machine types.

  • VMs with attached GPUs must stop for host maintenance events, but can automatically restart. On Compute Engine, host maintenance events occur about once every two weeks but might occasionally run more frequently. You must configure your workloads to handle these maintenance events cleanly. Specifically, long-running workloads like machine learning and high-performance computing (HPC) must handle the interruption of host maintenance events. For more information, see Handling GPU host maintenance events.

  • To protect Compute Engine systems and users, new projects have a global GPU quota, which limits the total number of GPUs you can create in any supported zone. When you request GPU quota, you must request quota for the GPU models that you want to create in each region, and an additional global quota for the total number of GPUs of all types in all zones. A gcloud sketch for checking your current GPU quota appears after this list.

  • VMs with one or more GPUs have a maximum number of vCPUs for each GPU that you add to the instance. For example, each NVIDIA® K80 GPU lets you have up to eight vCPUs and up to 52 GB of memory in your instance machine type. To see the available vCPU and memory ranges for different GPU configurations, see the GPUs list.

  • GPUs require device drivers in order to function properly. NVIDIA GPUs running on Compute Engine must use a minimum driver version. For more information about driver versions, see Required NVIDIA driver versions.

  • VMs with a specific attached GPU model are covered by the Compute Engine SLA only if that GPU model is generally available and is supported in more than one zone in the same region. The Compute Engine SLA does not cover GPU models in the following zones:

    • NVIDIA® A100:
      • asia-southeast1-c
    • NVIDIA® T4:
      • australia-southeast1-a
      • europe-west3-b
      • southamerica-east1-c
    • NVIDIA® V100:
      • asia-east1-c
      • us-east1-c
    • NVIDIA® P100:
      • australia-southeast1-c
      • europe-west4-a
    • NVIDIA® K80:
      • us-west1-b
  • Compute Engine supports a maximum of one concurrent user per GPU.
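
As a sketch of the quota check mentioned above, you can list a region's quotas with gcloud and filter for GPU metrics; the region is a placeholder, and GPU quota metrics follow the pattern NVIDIA_<MODEL>_GPUS:

    # Show current GPU quota limits and usage for one region.
    gcloud compute regions describe us-central1 \
        --format="yaml(quotas)" | grep -B1 -A1 GPUS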

What's next?