About GPUs on Google Cloud

Google Cloud is focused on delivering world-class artificial intelligence (AI) infrastructure to power your most demanding GPU-accelerated workloads across a wide range of segments. You can use GPUs on Google Cloud to run AI, machine learning (ML), scientific, analytics, engineering, consumer, and enterprise applications.

Through our partnership with NVIDIA, Google Cloud delivers the latest GPUs while optimizing the software stack with a wide array of storage and networking options. For a full list of GPUs available, see GPU platforms.

The following sections outline the benefits of GPUs on Google Cloud.

GPU-accelerated VMs

On Google Cloud, you can access and provision GPUs in the way that best suits your needs. A specialized accelerator-optimized machine family is available, with pre-attached GPUs and networking capabilities that are ideal for maximizing performance. This machine family includes the A3, A2, and G2 machine series.

Multiple provisioning options

You can provision clusters by using the accelerator-optimized machine family with any of the following open-source or Google Cloud products.

Vertex AI

Vertex AI is a fully managed machine learning (ML) platform that you can use to train and deploy ML models and AI applications. In Vertex AI applications, you can use GPU-accelerated VMs to improve performance in the following ways:

Use GPU-enabled VMs in custom training GKE worker pools.
Use open source large language models (LLMs) from the Vertex AI Model Garden.
Reduce prediction latency.
Improve performance of Vertex AI Workbench notebook code.
Improve performance of a Colab Enterprise runtime.

Hypercompute Cluster

Hypercompute Cluster is an infrastructure building block that lets you create a cluster of GPU-accelerated VMs that are deployed and maintained as a single, homogeneous unit. This option is ideal for provisioning a densely allocated, performance-optimized infrastructure that has integrations for Google Kubernetes Engine (GKE) and Slurm schedulers. Hypercompute Cluster provides infrastructure that is specifically designed for running AI, ML, and HPC workloads. For more information, see Hypercompute Cluster.

To get started with Hypercompute Cluster, see Choose a deployment strategy.

Compute Engine

You can also create and manage individual VMs or small clusters of VMs with attached GPUs on Compute Engine. This method is mostly used for running graphics-intensive workloads, simulation workloads, or small-scale ML model training.

Deployment option	Deployment guides
Create a VM for serving and single-node workloads	Create an A3 Edge or A3 High VM
Create managed instance groups (MIGs). This option uses the Dynamic Workload Scheduler (DWS) to provision VMs.	Create a MIG with GPU VMs
Create VMs in bulk	Create a group of GPU VMs in bulk
Create a single VM	Create a single GPU VM (Standard or Spot VMs)
Create virtual workstations	Create a virtual GPU-accelerated workstation
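
As a rough illustration of the "Create a single VM" option above, the following Python sketch assembles a gcloud command line for one accelerator-optimized VM. It is not taken from the deployment guides: the helper name build_gpu_vm_command, the VM name, the zone, and the default machine type a2-highgpu-1g are assumptions chosen for the example. The sketch only builds and prints the command; it does not call gcloud. Check the linked guides for the values that apply to your project.

```python
import shlex


def build_gpu_vm_command(name, zone, machine_type="a2-highgpu-1g", spot=False):
    """Return the argument list for a gcloud call that creates one GPU VM.

    Hypothetical helper for illustration. The machine type must belong to an
    accelerator-optimized series (A3, A2, or G2) so that GPUs come pre-attached.
    """
    args = [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        # GPU VMs cannot live migrate, so they stop during host maintenance.
        "--maintenance-policy=TERMINATE",
    ]
    if spot:
        # Request a Spot VM instead of a Standard VM.
        args.append("--provisioning-model=SPOT")
    return args


if __name__ == "__main__":
    # Placeholder name and zone; replace them with your own values.
    command = build_gpu_vm_command("example-gpu-vm", "us-central1-a", spot=True)
    print(shlex.join(command))
```

Returning a list instead of a single string keeps each argument separate, so the same list can be passed to subprocess.run without shell quoting concerns once you are ready to run it.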

Cloud Run

You can configure GPUs for your Cloud Run service. GPUs on Cloud Run are well suited to AI inference workloads, such as serving large language models.
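
To make the GPU configuration above more concrete, the following Python sketch assembles a gcloud run deploy command that requests one GPU for a service. Treat it as an assumption-laden illustration, not the documented procedure: the helper name build_cloud_run_gpu_command, the service name, the image path, the region, and the default GPU type nvidia-l4 are placeholders, and the exact set of required flags may differ for your gcloud version. The sketch only builds and prints the command.

```python
import shlex


def build_cloud_run_gpu_command(service, image, region, gpu_type="nvidia-l4"):
    """Return the argument list for deploying a Cloud Run service with one GPU.

    Hypothetical helper for illustration; verify the flags against the
    Cloud Run GPU documentation before use.
    """
    return [
        "gcloud", "run", "deploy", service,
        f"--image={image}",
        f"--region={region}",
        "--gpu=1",                  # number of GPUs per instance
        f"--gpu-type={gpu_type}",   # assumed GPU type for this example
        "--no-cpu-throttling",      # keep CPU allocated outside of requests
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own service, image, and region.
    command = build_cloud_run_gpu_command(
        "example-inference-service",
        "us-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG",
        "us-central1",
    )
    print(shlex.join(command))
```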

To run AI workloads on GPUs on Cloud Run, consult the following resources: