Improving the Slurm on Google Cloud Experience
Wyatt Gorman
Solutions Manager, HPC & AI Infrastructure, Google Cloud
Nick Ihli
Director of Cloud and Solutions Engineering, SchedMD
If you use the Slurm workload manager to run your HPC workloads on Google Cloud, new enhancements are on the way. We’re pleased to announce a set of improvements to the Slurm on Google Cloud open-source code, developed by SchedMD. These include support for more resource types and operating systems, simpler deployment, improved error reporting and transparency, and enhanced support for security features. Read on to understand how Slurm could be an important part of your Google Cloud HPC environment.
Slurm on Google Cloud
Slurm on Google Cloud is an open-source scheduling solution available for deployment through the Cloud HPC Toolkit, directly through Terraform, or from the Google Cloud Marketplace. It is the result of an ongoing partnership between Google and SchedMD, and we’re excited to announce the latest updates.
Slurm on Google Cloud has a number of key benefits:
Flexibility: Slurm supports a wide variety of Google HPC infrastructure, permitting multiple configurations to meet your complex workload needs.
Scalability: Slurm on Google Cloud scales to meet the performance requirements of large clusters and exascale computers. Slurm spans HPC, HTC, and AI workloads with proven reliability.
Cost-effectiveness: Slurm on Google Cloud is an economical way to run HPC workloads.
To enhance the experience of using Slurm on Google Cloud, we’re releasing the latest iteration of the solution.
Improvements for Slurm on Google Cloud
The latest release is now available. New developments and improvements include:
Expanded resource support: Slurm on Google Cloud now supports ARM CPUs and NVIDIA Multi-Instance GPUs (MIG). This allows you to run your cluster and workloads on a wider range of hardware, giving you more flexibility and choice.
Rocky Linux support with a Slurm-ready image: The Slurm-ready image for Rocky Linux 8, based on the new HPC VM Image for Rocky Linux, makes it easy to get started with Slurm on Rocky Linux.
Streamlined deployment updates: We’ve made it easier to update your Slurm cluster. You can now change your cluster’s partitions quickly, and Slurm is reconfigured automatically to match.
Improved logging and reporting: We’ve improved the logging and reporting capabilities of Slurm on Google Cloud, making it easier to track the performance of your cluster and identify any potential problems.
Hybrid Slurm advancements: We’ve made several advancements to hybrid Slurm, which allows you to run your workloads on both Google Cloud and on-premises resources. These improvements simplify the management of your hybrid cluster and improve the performance of your workloads.
Improved integrations with the Cloud HPC Toolkit: Tighter integration between Slurm on Google Cloud and the Cloud HPC Toolkit makes it simpler to deploy your Slurm cluster with the Toolkit.
Support for Shielded VMs: We now support Shielded VMs for Slurm on Google Cloud. Shielded VMs provide additional security features for your workloads, such as verified boot and integrity monitoring.
Improved instance tagging: We’ve improved the instance tagging capabilities of Slurm on Google Cloud. You can now tag your instances with custom labels, which can be used to control access to your cluster and to track the usage of your resources.
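If you plan to attach custom labels to your instances, it can help to check them before you deploy. The snippet below is a minimal sketch in Python, not part of Slurm on Google Cloud itself. It assumes a simplified, ASCII-only reading of Google Cloud's label rules (lowercase letters, digits, underscores, and dashes; keys start with a lowercase letter; at most 63 characters each; at most 64 labels per resource). The function name and the example labels are hypothetical, and how labels are passed to your cluster depends on your deployment method.

```python
import re

# Simplified, ASCII-only approximation of Google Cloud label rules.
# Assumption: international characters, which Google Cloud also permits,
# are deliberately not handled here.
_KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")   # 1 to 63 chars, starts with a letter
_VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")      # 0 to 63 chars, may be empty
MAX_LABELS = 64                                  # assumed per-resource limit


def invalid_labels(labels):
    """Return a list of problems found; an empty list means the labels pass."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        if not _KEY_RE.fullmatch(key):
            problems.append(f"bad key: {key!r}")
        if not _VALUE_RE.fullmatch(value):
            problems.append(f"bad value for {key!r}: {value!r}")
    return problems


# Hypothetical labels for a Slurm compute partition.
example = {"cluster": "hpc-demo", "team": "genomics", "cost_center": "cc-1234"}
print(invalid_labels(example))
```

Running a check like this ahead of deployment surfaces naming mistakes, such as uppercase characters, before they reach the cloud API.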
We believe these improvements to the Slurm on Google Cloud open-source code will make it an even more powerful and scalable HPC solution.
If you are interested in trying these improvements out yourself, we suggest using our Cloud HPC Toolkit tutorial for launching a stand-alone Slurm cluster on Google Cloud. If you want to learn more, please visit the SchedMD website or contact Google Cloud.