Deployment options overview

To run artificial intelligence (AI), machine learning (ML), or high-performance computing (HPC) workloads, you can deploy AI-optimized VMs and clusters of A4X, A4, and A3 Ultra machines. For more information about the features of these machines that let you run large-scale AI/ML clusters, see Cluster management overview.

You can create A4X, A4, and A3 Ultra VMs directly in Compute Engine, or by using tools and services that provision Compute Engine instances for you, such as Cluster Toolkit or Google Kubernetes Engine (GKE).

To create your VMs or clusters, choose the option that best fits your use case:

Cluster Toolkit
Use case: You want open-source software that simplifies deploying both Slurm and GKE clusters. Cluster Toolkit is designed to be highly customizable and extensible. To learn more, see the following:

GKE
Use case: You want maximum flexibility in configuring your GKE cluster based on the needs of your workload. To learn more, see Create a custom AI-optimized Google Kubernetes Engine cluster.

Compute Engine
Use case: You want full control of the infrastructure layer so that you can set up your own orchestrator. To learn more, see the following: