About GPUs on Google Cloud


Google Cloud is focused on delivering world-class artificial intelligence (AI) infrastructure to power your most demanding GPU-accelerated workloads across a wide range of segments. You can use GPUs on Google Cloud to run AI, machine learning (ML), scientific, analytics, engineering, consumer, and enterprise applications.

Through our partnership with NVIDIA, Google Cloud delivers the latest GPUs while optimizing the software stack with a wide array of storage and networking options. For a full list of GPUs available, see GPU platforms.

The following sections outline the benefits of GPUs on Google Cloud.

GPU-accelerated VMs

On Google Cloud, you can access and provision GPUs in the way that best suits your needs. The specialized accelerator-optimized machine family comes with pre-attached GPUs and networking capabilities designed to maximize performance. This family includes the A3, A2, and G2 machine series.

Multiple provisioning options

You can provision clusters by using the accelerator-optimized machine family with any of the following open-source or Google Cloud products.

Vertex AI

Vertex AI is a fully managed machine learning (ML) platform that you can use to train and deploy ML models and AI applications. In Vertex AI applications, you can use GPU-accelerated VMs to improve performance.

GKE and Slurm

Large-scale orchestration platforms, such as GKE, are ideal for provisioning large clusters that can be used for training and fine-tuning large-scale ML models. Large-scale ML models are those that use vast quantities of data.

The following orchestration platforms are available on Google Cloud.

  • Google Kubernetes Engine (GKE): a service that you can use to deploy and operate containerized applications at scale using Google's infrastructure.

  • Slurm: an open-source cluster management and job scheduling tool. On Google Cloud, you can deploy Slurm clusters by using Cluster Toolkit.

Run large-scale model training and fine-tuning

For training or fine-tuning large-scale models, we recommend using a cluster of A3 Mega (a3-megagpu-8g) machines and deploying with a scheduler such as GKE or Slurm.

Deployment option | Deployment guides
Slurm | Deploy an A3 Mega Slurm cluster
GKE | Deploy an A3 Mega cluster with GKE
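
As a rough sizing aid for the recommendation above, the following sketch estimates how many machines a cluster needs to reach a target GPU count. It assumes 8 GPUs per a3-megagpu-8g machine, as the `8g` suffix suggests. The name `nodes_needed` is hypothetical and is not part of any Google Cloud tool.

```python
# Assumption: each a3-megagpu-8g machine provides 8 attached GPUs.
GPUS_PER_A3_MEGA = 8


def nodes_needed(total_gpus: int, gpus_per_node: int = GPUS_PER_A3_MEGA) -> int:
    """Return the smallest machine count that provides at least total_gpus GPUs."""
    if total_gpus <= 0 or gpus_per_node <= 0:
        raise ValueError("GPU counts must be positive")
    # Integer ceiling division: round up when the target is not a multiple of 8.
    return (total_gpus + gpus_per_node - 1) // gpus_per_node


if __name__ == "__main__":
    for target in (8, 64, 100):
        print(f"{target} GPUs -> {nodes_needed(target)} machines")
```

A target that is not a multiple of 8 rounds up to the next whole machine, so some GPUs in the last machine may go unused.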

Run mainstream model training and fine-tuning

For training and fine-tuning mainstream models, we recommend using the A3 High machine type with 8 GPUs (a3-highgpu-8g) and deploying with a scheduler such as GKE or Slurm. You can also use an A2 or G2 machine type.

Deployment option | Deployment guides | Workloads
GKE | Deploy autopilot or standard node pools | Inference: Serve models on GKE; Training: Train a model on GKE
Slurm | Run Llama-2 fine tuning on a G2 Slurm cluster |
Single VMs | Create A3 High VMs (with GPUDirect-TCPX enabled) |
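
The two training recommendations above can be captured in a small lookup, which may be handy in provisioning scripts. This is a minimal sketch. The dictionary keys, field names, and the `recommend` function are hypothetical, and the values are transcribed from the text above rather than retrieved from any Google Cloud API.

```python
# Recommendations transcribed from this page; the structure is illustrative only.
TRAINING_RECOMMENDATIONS = {
    "large-scale": {
        "machine_types": ["a3-megagpu-8g"],
        "alternative_series": [],
        "schedulers": ["GKE", "Slurm"],
    },
    "mainstream": {
        "machine_types": ["a3-highgpu-8g"],
        "alternative_series": ["A2", "G2"],
        "schedulers": ["GKE", "Slurm"],
    },
}


def recommend(model_scale: str) -> dict:
    """Look up the recommended machine types and schedulers for a model scale."""
    key = model_scale.strip().lower()
    if key not in TRAINING_RECOMMENDATIONS:
        known = ", ".join(sorted(TRAINING_RECOMMENDATIONS))
        raise KeyError(f"unknown model scale {model_scale!r}; expected one of: {known}")
    return TRAINING_RECOMMENDATIONS[key]


if __name__ == "__main__":
    for scale in ("large-scale", "mainstream"):
        rec = recommend(scale)
        print(scale, "->", rec["machine_types"], "with", " or ".join(rec["schedulers"]))
```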

Compute Engine

You can also create and manage single VMs or smaller clusters of VMs with attached GPUs on Compute Engine. This method is mostly used for running graphics-intensive workloads, simulation workloads, or small-scale training. For these workloads, we recommend G2, small A3 High (those with 1, 2, or 4 GPUs attached), and N1 machine types with T4, P4, P100, and V100 GPUs.

Deployment option | Deployment guides
Create a VM for serving and single node workloads | Create an A3 Edge or A3 High VM
Create managed instance groups (MIGs). This option uses the Dynamic Workload Scheduler (DWS) to provision VMs. | Create a MIG with GPU VMs
Create VMs in bulk | Create a group of GPU VMs in bulk
Create a single VM | Create a single GPU VM (Standard or Spot VMs)
Create virtual workstations | Create a virtual GPU-accelerated workstation
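
To illustrate the single-VM option, the following sketch assembles a `gcloud compute instances create` command for a G2 VM, optionally as a Spot VM. It only builds and prints the command and does not run it. The helper name `build_create_command`, the VM name, and the zone are placeholders. The default machine type `g2-standard-4` and the flags shown are assumptions about a typical setup, so check them against the linked deployment guides before use.

```python
import shlex


def build_create_command(name: str, zone: str,
                         machine_type: str = "g2-standard-4",
                         spot: bool = False) -> list[str]:
    """Build (but do not run) a gcloud command that creates one GPU VM."""
    cmd = [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        # GPU VMs cannot live migrate, so they stop for host maintenance.
        "--maintenance-policy=TERMINATE",
    ]
    if spot:
        # Request a Spot VM instead of a Standard VM.
        cmd.append("--provisioning-model=SPOT")
    return cmd


if __name__ == "__main__":
    # Placeholder name and zone; replace them with your own values.
    print(shlex.join(build_create_command("my-gpu-vm", "us-central1-a")))
    print(shlex.join(build_create_command("my-gpu-vm", "us-central1-a", spot=True)))
```

Returning the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues if you later decide to execute it.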

Cloud Run

You can configure GPUs for your Cloud Run service. GPUs are ideal for running AI inference workloads using large language models on Cloud Run.

To run AI workloads on GPUs on Cloud Run, consult these resources: