This document outlines the factors to consider when creating A3 Ultra virtual machine (VM) instances and clusters that are deployed on Hypercompute Cluster. For more information about Hypercompute Cluster, see Hypercompute Cluster.
Overview
The following factors must be considered when working with Hypercompute Cluster.
- The deployment type for your reservation block. See Deployment types.
- The maintenance scheduling type. See Maintenance scheduling types.
- The deployment tool. See Cluster deployment tool.
- The provisioning model. See Provisioning models.
- The operating system (OS). OS images are available that are specifically designed to support your accelerated workloads. For more information about these OS images, see Operating systems.
Reservation block deployment types
Hypercompute Cluster provisions blocks of densely allocated hosts. With dense deployments, you get the following benefits:
- Hosts are allocated physically close to each other to minimize network hops, and are optimized for the lowest latency.
- Non-blocking networking provides consistent high-bandwidth, low-latency VM connectivity over Google's dynamic ML network fabric.
- Access to network topology provides a hierarchical view of the relative proximity between VMs. This is useful for advanced job scheduling use cases.
- Fine-grained, topology-aware placement when using orchestrators.
- Fine-grained user control over maintenance schedules to maximize job scheduling and uptime while minimizing overall downtime.
To request these blocks of densely allocated resources, see Request capacity.
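As a rough illustration of how a job scheduler might use a hierarchical topology view, the following Python sketch ranks pairs of VMs by how many levels of the hierarchy they share. The three-level labels (block, sub-block, host), the VM names, and the helper functions are hypothetical placeholders chosen for this example; they are not the format returned by any Compute Engine API.

```python
from itertools import combinations

# Hypothetical topology labels per VM: (block, sub-block, host).
# These names and this layout are assumptions for illustration only.
TOPOLOGY = {
    "vm-a": ("block-1", "sub-1", "host-1"),
    "vm-b": ("block-1", "sub-1", "host-2"),
    "vm-c": ("block-1", "sub-2", "host-3"),
    "vm-d": ("block-2", "sub-3", "host-4"),
}


def shared_levels(a, b):
    """Count how many leading hierarchy levels two topology paths share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def closest_pairs(topology):
    """Return VM pairs sorted from most to fewest shared hierarchy levels."""
    pairs = combinations(sorted(topology), 2)
    return sorted(
        pairs,
        key=lambda p: shared_levels(topology[p[0]], topology[p[1]]),
        reverse=True,
    )
```

A scheduler could place the most communication-heavy ranks of a job on the pairs at the front of this list, since VMs that share more levels are closer together in the hierarchy.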
Maintenance scheduling types
Maintenance scheduling determines how Hypercompute Cluster schedules host maintenance for VMs running in your cluster. VMs can either be grouped and have synchronized scheduling or can be loosely coupled and have individual scheduling.
Grouped maintenance scheduling
This maintenance scheduling type ensures that all VMs running the same workload have the same planned maintenance frequency, regardless of whether the VMs are provisioned at the same time or individually. This tightly coupled maintenance is particularly useful for environments that use a job scheduler such as Slurm or Google Kubernetes Engine (GKE).
This maintenance scheduling type is ideal if you are running training or other highly parallelized-computing workloads. It lets you optimize your jobs by giving you complete control over your used and unused capacity.
Independent maintenance scheduling
This maintenance scheduling type lets VMs have different maintenance schedules. It is ideal if you are running inference or limited-scale training, where the workloads run more efficiently when they have separate schedules. Limited-scale training workloads are small in scale, can tolerate lower SLOs, and run for short durations.
The independent maintenance scheduling type will be available in a future release.
Cluster deployment
Cluster Toolkit is open-source software offered by Google Cloud and is the recommended deployment tool for Hypercompute Cluster. Cluster Toolkit can deploy either GKE or Slurm clusters.
Alternatively, you can choose to provision your VM groups by using either the Bulk API or managed instance groups (MIGs). With these alternatives, you can incorporate your own workload scheduler as needed. You can also provision single VMs by using the instance create methods.
Provisioning models
When creating VMs, the following provisioning models are available on Compute Engine for your requested resources, depending on the consumption option:
- Standard: used for provisioning on-demand or pay-as-you-go (PAYG) VMs
- Spot: used for provisioning Spot VMs
- Reservation-bound: used for provisioning VMs that use time-limited reservations, such as the reserved blocks of capacity delivered by Hypercompute Cluster
Reservation-bound provisioning model
In the reservation-bound provisioning model, a VM's lifecycle is tied to the specific reservation that it's consuming. The VM exists from the time you create it to consume the reservation, and Compute Engine deletes the VM when one of the following happens:
- You delete the VM.
- The reservation reaches its end time. If the VM is still running at that time, then Compute Engine forcefully stops it before deleting it.
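The two deletion conditions above amount to "whichever comes first." The following Python sketch expresses that rule; the function name and its parameters are hypothetical and exist only for illustration, not as part of any Compute Engine client library.

```python
from datetime import datetime, timezone


def vm_deletion_time(reservation_end, user_delete_time=None):
    """Return when a reservation-bound VM is deleted.

    The VM is deleted at whichever comes first: the moment you delete it
    or the end time of the reservation that it consumes.
    """
    if user_delete_time is not None and user_delete_time < reservation_end:
        return user_delete_time
    return reservation_end


# Example values (placeholders, not real reservation data).
reservation_end = datetime(2025, 1, 31, tzinfo=timezone.utc)
early_delete = datetime(2025, 1, 10, tzinfo=timezone.utc)
```

Calling `vm_deletion_time(reservation_end)` with no user deletion returns the reservation's end time, while passing `early_delete` returns the earlier timestamp.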
You incur charges for the VM at the same rate as on-demand reservations, regardless of whether the VM is running. For more information, see the pricing for on-demand reservations in the Compute Engine documentation.
To set the reservation-bound provisioning model for your VM, specify the option for your tool when you create the VM:
- --provisioning-model=RESERVATION_BOUND in the gcloud CLI
- "provisioningModel": "RESERVATION_BOUND" in the Compute Engine API
For detailed instructions about specifying these parameters when creating VMs or MIGs, see Create instance templates. If you are using Cluster Toolkit to deploy your clusters, the cluster blueprint configures this for you.
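To show where the API option might sit in a request, the following Python sketch builds a partial request body for creating a single VM. It assumes that `provisioningModel` belongs under the `scheduling` object and that the VM targets a specific reservation through `reservationAffinity`. The function name, VM name, zone, machine type, and reservation name are placeholders, and required fields such as disks and network interfaces are omitted, so treat this as a sketch rather than a complete request.

```python
import json


def reservation_bound_instance_body(name, machine_type_url, reservation_name):
    """Build a partial instance-creation request body (illustrative only)."""
    return {
        "name": name,
        "machineType": machine_type_url,
        # Assumption: the provisioning model is set in the scheduling object.
        "scheduling": {
            "provisioningModel": "RESERVATION_BOUND",
        },
        # Assumption: the VM consumes one specific, named reservation.
        "reservationAffinity": {
            "consumeReservationType": "SPECIFIC_RESERVATION",
            "key": "compute.googleapis.com/reservation-name",
            "values": [reservation_name],
        },
        # Disks and network interfaces are intentionally left out here.
    }


# Placeholder values; replace them with your own project's resources.
body = reservation_bound_instance_body(
    name="example-vm",
    machine_type_url="zones/example-zone/machineTypes/a3-ultragpu-8g",
    reservation_name="example-reservation",
)
body_json = json.dumps(body, indent=2)
```

Printing `body_json` gives a fragment that you can compare against the instance template instructions referenced above before you add the remaining required fields.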
What's next?
- Request capacity.
- Learn how to create VMs and MIGs.
- Learn how to create a GKE Hypercompute Cluster with default configuration.
- Learn how to create custom GKE Hypercompute Clusters.
- Learn how to create Slurm clusters.