Best practices for running HPC workloads

This document provides best practices for tuning Google Cloud resources for optimal performance of high performance computing (HPC) workloads.

Use the compute-optimized machine type

We recommend that you use the compute-optimized machine family: H3, C2, or C2D. Virtual machine (VM) instances created with this machine type have a fixed virtual-to-physical core mapping and expose NUMA cell architecture to the guest OS, both of which are critical for performance of tightly-coupled HPC applications.

To reduce the communication overhead between VM nodes, we recommend that you consolidate onto a smaller number of c2-standard-60 or c2d-standard-112 VMs (with the same total core count) instead of launching a larger number of smaller C2 or C2D VMs. The greatest bottleneck in MPI workloads is inter-node communication, and larger VM shapes minimize this communication.

Use compact placement policies

To reduce internode latency, VM instance placement policies enable control over the placement of VMs in Google Cloud data centers. Compact placement policies are recommended as they provide lower-latency communication within a single zone.

Use the HPC VM image

We recommend that you use the HPC VM image, which incorporates best practices for running HPC applications on Google Cloud. These images are based on Rocky Linux 8 and are available at no additional cost on Google Cloud.

Disable automatic updates

Automatic updates can be a source of significant and unpredictable performance degradation. To disable automatic updates, you can use the google_disable_automatic_updates metadata flag on VMs that use HPC VM images version v20240712 or later. Any VM image whose base image is an HPC VM image can also use this feature, such as Slurm images.

For example, this will affect dnf automatic package updates on the following image families:

HPC images, such as hpc-rocky-linux-8 (project cloud-hpc-image-public)
Slurm images, such as slurm-gcp-6-6-hpc-rocky-linux-8 (project schedmd-slurm-public)

Cluster Toolkit provides an easy-to-use setting on relevant modules to set this metadata flag for you: allow_automatic_updates: false. Here is an example using the vm-instance module:

‐ id: workstation-rocky
  source: modules/compute/vm-instance
  use: [network]
  settings:
    allow_automatic_updates: false

Here is an example for a Slurm nodeset:

‐ id: dynamic_nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use: [network]
  settings:
    node_count_static: 1
    node_count_dynamic_max: 4
    allow_automatic_updates: false

Adjust HPC VM image tunings

To get the best performance on Google Cloud, use the following image tunings.

The following sample command can be used for manually configuring a VM, to run HPC workloads. However, all of this tuning is handled automatically by using a cluster blueprint from Cluster Toolkit.

To create the VM manually, use the Google Cloud CLI and provide the following settings.

gcloud compute instances create VM_NAME \
    --image-family=hpc-rocky-linux-8  \
    --image-project=cloud-hpc-image-public \
    --machine-type=MACHINE_TYPE \
    --network-interface=nic-type=GVNIC \
    --metadata=google_mpi_tuning=--hpcthroughput \
    --threads-per-core=1

In the preceding sample command the following tunings are applied:

Set Google Virtual NIC (gVNIC) network interface to enable better communication performance and higher throughput: --network-interface=nic-type=GVNIC.
Set network HPC throughput profile: --metadata=google_mpi_tuning=--hpcthroughput.

If the VM already exists, run sudo google_mpi_tuning --hpcthroughput to update the network HPC throughput profile setting.
Disable simultaneous multithreading (SMT) in the guest OS: --threads-per-core=1.

If the VM already exists, run sudo google_mpi_tuning --nosmt to disable simultaneous multithreading.
Turn off Meltdown and Spectre mitigations. The HPC VM image enables this setting by default.

Caution: Disabling these mitigations can incur a security risk.

If the VM already exists, run sudo google_mpi_tuning --nomitigation to turn off Meltdown and Spectre mitigations.

Configure file system tunings

Following are the primary storage choices for tightly-coupled applications. Each choice has its own cost, performance profile, APIs, and consistency semantics.

NFS-based solutions such as Filestore and NetApp Cloud Volumes can be used for deploying shared storage options. Both Filestore and NetApp Cloud Volumes are fully managed on Google Cloud, and we recommend that you use them when your application does not have extreme I/O requirements to a single dataset.

For performance limits, see the Filestore and NetApp Cloud Volumes documentation.
POSIX-based parallel file systems are more commonly used by MPI applications. POSIX-based options include open-source Lustre and the fully-supported Lustre offering, DDN Storage EXAScaler Cloud.
Intel DAOS is another option supported by the Cluster Toolkit. DAOS is a performant option for an ephemeral scratch file-system. This option requires some additional setup, including creating custom compute VM images. For more information, see Intel-DAOS in the Cluster Toolkit GitHub repository.

In the tutorial guide associated to this document, the deployment implements Filestore, which does not achieve the performance of Lustre, but has the convenience of being a Google Cloud fully managed service.

Use Intel MPI

For best performance, we recommend that you use Intel MPI.

For Ansys Fluent, we recommend that you use Intel MPI 2018.4.274. You can also set the version of Intel MPI in Ansys Fluent by using the following command. Replace MPI_DIRECTORY with the path to the directory that contains your Intel MPI library.
```
export INTELMPI_ROOT="MPI_DIRECTORY/compilers_and_libraries_2018.5.274/linux/mpi/intel64/"
```
Intel MPI collective algorithms can be tuned for optimal performance. The recommended collective algorithms for Ansys Fluent is -genv I_MPI_ADJUST_BCAST 8 -genv I_MPI_ADJUST_ALLREDUCE 10.
For Simcenter STAR-CCM+, we also recommend you use the TCP fabric providers by specifying the following environment variables: I_MPI_FABRICS shm:ofi and FI_PROVIDER tcp.

Summary of best practices

The following is a summary of the recommended best practices for running HPC workloads on Google Cloud.

Resource	Recommendation
Machine family	Use compute-optimized machine family (H3, C2 or C2D)
OS image	Use the HPC VM image Apply the HPC VM image best practices
File system	Use one of the following: A Google Cloud managed service such as Filestore A posix-based parallel file system such as DDN Lustre Intel-DAOS
MPI	Use Intel MPI

What's next

Run Ansys Fluent on Google Cloud.
Run Simcenter STAR-CCM+ on Google Cloud.