Best practices for running HPC workloads

This document provides best practices for tuning Google Cloud resources for optimal performance of high performance computing (HPC) workloads.

Use the compute-optimized machine type

Use the compute-optimized machine family: H3, C2, or C2D. Virtual machine (VM) instances created with these machine types have a fixed virtual-to-physical core mapping and expose the NUMA cell architecture to the guest OS. Both features are critical to the performance of tightly coupled HPC applications.

To reduce communication overhead between VM nodes, consolidate onto a smaller number of c2-standard-60 or c2d-standard-112 VMs (with the same total core count) instead of launching a larger number of smaller C2 or C2D VMs. Inter-node communication is often the greatest bottleneck in MPI workloads, and larger VM shapes keep more of that communication within a single VM.
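
For example, the following Google Cloud CLI sketch uses a bulk create request to launch four c2-standard-60 VMs from the HPC VM image. The name pattern, zone, and VM count are placeholder values for illustration.

# Creates four c2-standard-60 VMs (for example, hpc-node-1 through hpc-node-4);
# the name pattern, count, and zone are placeholders.
gcloud compute instances bulk create \
    --name-pattern="hpc-node-#" \
    --count=4 \
    --zone=ZONE \
    --machine-type=c2-standard-60 \
    --image-family=hpc-rocky-linux-8 \
    --image-project=cloud-hpc-image-public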

Use compact placement policies

Instance placement policies let you control where your VMs are placed within Google Cloud data centers. To reduce inter-node latency, we recommend compact placement policies, which place VMs close together within a single zone and therefore provide lower-latency communication.
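
For example, the following sketch creates a compact placement policy and applies it to a new VM. POLICY_NAME, REGION, ZONE, and VM_NAME are placeholders, and the host maintenance setting is an assumption that depends on the machine type you choose.

# Create a compact placement policy in the region that contains your VMs.
gcloud compute resource-policies create group-placement POLICY_NAME \
    --collocation=collocated \
    --region=REGION

# Apply the policy when creating a VM; compact placement can require
# host maintenance to be set to TERMINATE, depending on the machine type.
gcloud compute instances create VM_NAME \
    --zone=ZONE \
    --machine-type=c2-standard-60 \
    --resource-policies=POLICY_NAME \
    --maintenance-policy=TERMINATE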

Use the HPC VM image

Use the HPC VM image, which incorporates best practices for running HPC applications on Google Cloud. The image is based on Rocky Linux 8 and is available at no additional cost.

Disable automatic updates

Automatic updates can significantly and unpredictably degrade performance. To disable automatic updates, use the google_disable_automatic_updates metadata flag on VMs that use HPC VM image version v20240712 or later. Any VM image that is based on an HPC VM image, such as the Slurm images, can also use this feature.

For example, this setting affects dnf automatic package updates on the following image families:

  • HPC images, such as hpc-rocky-linux-8 (project cloud-hpc-image-public)
  • Slurm images, such as slurm-gcp-6-6-hpc-rocky-linux-8 (project schedmd-slurm-public)
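
For a VM that you create directly with the Google Cloud CLI, you can set the flag as instance metadata. The following is a minimal sketch; the metadata value TRUE is an assumption, and VM_NAME and MACHINE_TYPE are placeholders.

# Sets the google_disable_automatic_updates metadata flag at creation time;
# the value TRUE is assumed here.
gcloud compute instances create VM_NAME \
    --machine-type=MACHINE_TYPE \
    --image-family=hpc-rocky-linux-8 \
    --image-project=cloud-hpc-image-public \
    --metadata=google_disable_automatic_updates=TRUE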

Cluster Toolkit provides a convenient setting on relevant modules to set this metadata flag for you: allow_automatic_updates: false. Here is an example using the vm-instance module:

- id: workstation-rocky
  source: modules/compute/vm-instance
  use: [network]
  settings:
    allow_automatic_updates: false

Here is an example for a Slurm nodeset:

- id: dynamic_nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use: [network]
  settings:
    node_count_static: 1
    node_count_dynamic_max: 4
    allow_automatic_updates: false

Adjust HPC VM image tunings

To get the best performance on Google Cloud, use the following image tunings.

You can use the following sample command to manually configure a VM for HPC workloads. Note that Cluster Toolkit handles all of this tuning automatically when you use a cluster blueprint.

To create the VM manually, use the Google Cloud CLI and provide the following settings.

gcloud compute instances create VM_NAME \
    --image-family=hpc-rocky-linux-8 \
    --image-project=cloud-hpc-image-public \
    --machine-type=MACHINE_TYPE \
    --network-interface=nic-type=GVNIC \
    --metadata=google_mpi_tuning=--hpcthroughput \
    --threads-per-core=1

The preceding sample command applies the following tunings (a sketch after this list shows how to verify some of them from the guest OS):

  • Sets the Google Virtual NIC (gVNIC) network interface, which enables better communication performance and higher throughput: --network-interface=nic-type=GVNIC.

  • Sets network HPC throughput profile: --metadata=google_mpi_tuning=--hpcthroughput.

    If the VM already exists, run sudo google_mpi_tuning --hpcthroughput to update the network HPC throughput profile setting.

  • Disables simultaneous multithreading (SMT) in the guest OS: --threads-per-core=1.

    If the VM already exists, run sudo google_mpi_tuning --nosmt to disable simultaneous multithreading.

  • Turns off Meltdown and Spectre mitigations. The HPC VM image applies this setting by default.

    If the VM already exists, run sudo google_mpi_tuning --nomitigation to turn off Meltdown and Spectre mitigations.
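
After the VM starts, you can spot-check several of these tunings from the guest OS. The following commands are a minimal sketch that uses standard Linux tools; the exact output depends on the machine type and image version.

# Expect "Thread(s) per core: 1" when SMT is disabled.
lscpu | grep "Thread(s) per core"

# Show the NUMA topology that the VM exposes to the guest OS.
lscpu | grep "NUMA node"

# Inspect the kernel command line for mitigation-related settings.
cat /proc/cmdline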

Configure file system tuning

Each storage choice for tightly coupled applications has its own cost, performance profile, APIs, and consistency semantics. The primary choices include the following:

  • Network File System (NFS) solutions, such as Filestore and Google Cloud NetApp Volumes. Both are fully managed by Google Cloud and let you deploy shared storage. Use them when your application does not have extreme I/O requirements to a single dataset. For performance limits, see the Filestore and NetApp Volumes documentation. A sample mount command follows this list.

  • Google Cloud Managed Lustre is a fully managed POSIX-based parallel file system. This solution is commonly used by MPI applications.
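
For example, the following sketch mounts a Filestore share over NFS from a Rocky Linux 8 based VM. FILESTORE_IP and SHARE_NAME are placeholders, and the mount options that perform best depend on your workload.

# Install the NFS client (Rocky Linux 8 uses dnf).
sudo dnf install -y nfs-utils

# Create a mount point and mount the Filestore share.
sudo mkdir -p /mnt/shared
sudo mount -t nfs FILESTORE_IP:/SHARE_NAME /mnt/shared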

Use Intel MPI

For best performance, use Intel MPI.

  • For Ansys Fluent, use Intel MPI 2018.4.274. Set the version of Intel MPI in Ansys Fluent by using the following command. Replace MPI_DIRECTORY with the path to the directory that contains your Intel MPI library.

    export INTELMPI_ROOT="MPI_DIRECTORY/compilers_and_libraries_2018.5.274/linux/mpi/intel64/"

    Intel MPI collective algorithms can be tuned for optimal performance. The recommended collective algorithm settings for Ansys Fluent are -genv I_MPI_ADJUST_BCAST 8 -genv I_MPI_ADJUST_ALLREDUCE 10.

  • For Simcenter STAR-CCM+, we also recommend that you use the TCP fabric provider by setting the following environment variables: I_MPI_FABRICS=shm:ofi and FI_PROVIDER=tcp. The sketch after this list shows these settings.
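
The following sketch collects these settings as shell commands. Apply the Ansys Fluent lines and the Simcenter STAR-CCM+ lines separately, depending on the application you run; MPI_DIRECTORY is a placeholder for your Intel MPI installation path.

# Ansys Fluent: point to the Intel MPI installation (MPI_DIRECTORY is a placeholder).
export INTELMPI_ROOT="MPI_DIRECTORY/compilers_and_libraries_2018.5.274/linux/mpi/intel64/"

# Ansys Fluent: recommended collective algorithm tunings, passed as mpirun options:
#   -genv I_MPI_ADJUST_BCAST 8 -genv I_MPI_ADJUST_ALLREDUCE 10

# Simcenter STAR-CCM+: use the TCP fabric provider.
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=tcp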

Summary of best practices

The following is a summary of the recommended best practices for running HPC workloads on Google Cloud.


Machine family
  • Use the compute-optimized machine family (H3, C2, or C2D)

OS image
  • Use the HPC VM image
  • Apply the HPC VM image best practices

File system
  • Use one of the following:
      • A Google Cloud managed NFS service, such as Filestore or NetApp Volumes
      • A Google Cloud managed POSIX-based parallel file system, such as Managed Lustre

MPI
  • Use Intel MPI

What's next