Introducing the latest Slurm on GCP scripts

March 19, 2021
Annie Ma-Weaver

Group Product Manager, Google Cloud HPC

Andrew Stein

Product Manager

Do you use the Slurm job scheduler to manage your high performance computing (HPC) workloads? Today, alongside SchedMD, we’re announcing the newest set of features for Slurm running on Google Cloud, including support for Terraform, the HPC VM image, placement policies, the Bulk API, and instance templates, as well as a Google Cloud Marketplace listing. Note that Slurm’s support for the Bulk API is in Beta at the time of this release.

Slurm is one of the leading open-source HPC workload managers used in TOP500 supercomputers around the world. Over the past four years, we’ve worked with SchedMD, the company behind Slurm, to release ever-improving versions of Slurm on Google Cloud. 

Here’s more information about these new features:

Support for Terraform
With this release, Terraform support is generally available. The latest scripts automatically deploy a SchedMD-provided virtual machine (VM) image based on the Google Cloud HPC VM image, a CentOS 7-based VM image optimized for HPC workloads that we announced in February. This new image-based deployment reduces the time to deploy a Slurm cluster to just a few minutes.

Placement policies
You can now create a set of nodes on demand, per job, in a placement policy. With the previous version of our Slurm on GCP scripts, you could only enable placement policies at the cluster level. Now you can configure placement policies per partition, enabling you to achieve significant improvements in latency and performance for your tightly coupled workloads.

Bulk API
Slurm can now use the Bulk API to create instances. This allows for faster and more efficient creation of VM instances than ever before by batching up to 1,000 instances into a single API call. The Bulk API also supports “regional capacity finding,” and can create instances in whichever zone within a region has the necessary capacity, improving the speed and likelihood of getting the resources requested.
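
To make the batching idea concrete, here is a minimal Python sketch. It is not the code from the Slurm on GCP scripts and it makes no Google Cloud API calls. The function names, the node names, and the per-zone free-slot numbers are all hypothetical, chosen only for illustration. The first function groups node names so that each group fits the 1,000-instance limit mentioned above; the second is a toy model of the idea behind regional capacity finding, which in the real service is handled by Google Cloud rather than by the caller.

```python
from typing import Dict, Iterator, List, Optional

# Per-call limit quoted in the announcement above.
MAX_PER_CALL = 1000


def batch_nodes(node_names: List[str], limit: int = MAX_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `limit` node names.

    Each group stands for one bulk request (hypothetical helper).
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(node_names), limit):
        yield node_names[start:start + limit]


def first_zone_with_capacity(free_slots: Dict[str, int], needed: int) -> Optional[str]:
    """Toy model: return the first zone whose free slots cover `needed`.

    `free_slots` is made-up illustrative data, not something a client
    can query this way; returns None when no zone is large enough.
    """
    for zone, free in free_slots.items():
        if free >= needed:
            return zone
    return None


if __name__ == "__main__":
    # 2,500 hypothetical node names need three requests instead of 2,500.
    nodes = [f"compute-{i}" for i in range(2500)]
    sizes = [len(batch) for batch in batch_nodes(nodes)]
    print(len(sizes), sizes)

    # Illustrative zone names and capacities.
    print(first_zone_with_capacity({"zone-a": 100, "zone-b": 800}, 500))
```

With 2,500 nodes, the sketch produces three groups of 1,000, 1,000, and 500, which is where the reduction in API calls comes from.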

Instance templates
You can now specify instance templates as the definitions for creating Slurm instances.

Cloud Marketplace listing
Last but not least, we’re excited to share that the Slurm on Google Cloud scripts are now available through our Cloud Marketplace. From the Google Cloud Console, you can locate and launch the latest version of Slurm on Google Cloud in just a few clicks. The Cloud Marketplace listing also provides more information about how to access additional managed services from SchedMD, helping you expand and deepen your HPC workloads on Google Cloud using Slurm. 

Research organizations are taking advantage of Google Cloud’s capacity with Slurm scripts to meet increased demand for their HPC compute clusters. 

"When it comes to supporting cutting-edge research requiring advanced computing, there are never enough resources on-prem. Driven by the application of Artificial Intelligence in a wide spectrum of research areas, the undertaking of urgent COVID-19 research, and the increasing popularity of AI, ML and Data Science academic courses, the job wait times on our HPC cluster have been increasing. 

To address the increasing job wait times, and to allow researchers to evaluate the latest CPUs and GPUs, the HPC team had been evaluating the viability of bursting jobs to Google Cloud. 

With additional features from the Slurm on Google Cloud, and offerings such as preemptible virtual machines, we decided to burst jobs that have been submitted to our on-prem cluster to GCP, enabling us to reduce job wait times and produce research results faster." - Stratos Efstathiadis, Director, Research Technology Services at NYU

Getting started
This new release was built by the Slurm experts at SchedMD. You can download this release from SchedMD’s GitHub repository. For more information, check out the included README. If you need help getting started with Slurm, check out the quick start guide. For help with the Slurm features for Google Cloud, check out the Slurm Auto-Scaling Cluster codelab, as well as the Deploying a Slurm cluster on Google Compute Engine and Installing apps in a Slurm cluster on Compute Engine solution guides. If you have further questions, you can post on the Slurm on GCP Google discussion group, or contact SchedMD directly.