Cloud HPC Toolkit

Cloud HPC Toolkit is open-source software offered by Google Cloud that makes it easy to deploy high performance computing (HPC) environments. It is designed to be highly customizable and extensible, and aims to address the HPC deployment needs of a broad range of use cases.

Benefits

Cloud HPC Toolkit provides you with the following benefits:

  • Fast creation and deployment of turnkey HPC environments that follow Google Cloud best practices
  • An open-source solution that is configurable and extensible
  • Seamless integration with various partners such as Intel DAOS, DDN EXAscaler, and Slurm
  • Monitoring and performance visibility through integration with Cloud Monitoring

Components

Cloud HPC Toolkit has the following main components:

  • HPC blueprint: a YAML file that defines which HPC modules to use and how to customize them.
  • HPC modules: the building blocks of a deployment folder. Modules are composed of Terraform or Packer configuration files.
  • ghpc engine: a Google open-source tool that uses the information in the HPC blueprint to combine different HPC modules and produce a deployment folder.
  • HPC deployment folder: a self-contained folder that can be used to deploy a cluster onto Google Cloud. With Cloud HPC Toolkit, you have the added flexibility to configure a cluster to your specifications by editing the deployment folder before you deploy.
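
To make the relationship between an HPC blueprint and its HPC modules more concrete, the following Python sketch renders a minimal blueprint as YAML text. It is an illustration under stated assumptions, not a reference blueprint: the field names (blueprint_name, vars, deployment_groups, modules) and the module path modules/network/vpc are recalled from the example blueprints in the Cloud HPC Toolkit GitHub repository and may differ between releases, and the function name, project ID, region, and zone are hypothetical placeholders.

```python
from textwrap import dedent


def render_blueprint(project_id: str, deployment_name: str = "hpc-demo") -> str:
    """Return a minimal HPC blueprint as YAML text.

    Assumption: the schema follows the example blueprints shipped with
    Cloud HPC Toolkit; verify it against the ghpc version you build.
    """
    return dedent(f"""\
        blueprint_name: {deployment_name}

        vars:
          project_id: {project_id}
          deployment_name: {deployment_name}
          region: us-central1
          zone: us-central1-a

        deployment_groups:
        - group: primary
          modules:
          # Each entry names one HPC module and the folder it comes from.
          - id: network1
            source: modules/network/vpc
        """)


if __name__ == "__main__":
    # "my-project-id" is a placeholder, not a real project.
    print(render_blueprint("my-project-id"))
```

Saving the returned text to a file gives a starting point that the ghpc engine could take as input; real blueprints typically list several modules and customize each one.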

How it works

Figure 1. Cloud HPC Toolkit architecture overview

You can use Cloud HPC Toolkit to deploy an HPC cluster on Google Cloud as follows:

  1. Set up your working environment. Your working environment is the command line from which you run your commands. This can be a Linux or macOS command line, or Cloud Shell. If you use a Linux or macOS command line, you need to install a few dependencies.
  2. From the command line, complete the following:

    1. Clone the Cloud HPC Toolkit GitHub repository. This repository contains the ghpc binary, modules, HPC blueprint examples, and other resources needed for the configuration of an HPC cluster.
    2. Build the ghpc binary.

    For detailed instructions, see Configure your environment.

  3. Use an editor to create your HPC blueprint file. Example blueprints are also available in the Cloud HPC Toolkit GitHub repository. You can use these blueprints either directly or as a starting point for your custom HPC blueprint.

  4. From the command line, complete the following:

    1. Run the ghpc create command and specify your HPC blueprint. When you run this command, the ghpc engine completes the following steps:
      1. Builds a deployment folder based on the specified HPC blueprint. This deployment folder contains all the specifications and resources needed to deploy the cluster.
      2. Prints instructions to the command line that list the commands you must run to deploy the cluster. These are either Terraform or Packer commands.
    2. Run the commands provided by the ghpc engine. When you run these commands, Terraform or Packer then deploys the cluster on Google Cloud.

    For detailed instructions, see Deploy a cluster.

  5. After your cluster is deployed, you can submit jobs to your HPC cluster. You can also use Cloud Monitoring to analyze and monitor Google Cloud resources that are used by your cluster.
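
The command-line portion of the steps above can be sketched as a small Python script that assembles the commands and, by default, only prints them. Several details are assumptions that this page does not state: the repository URL, building with make, the --vars flag, and the Terraform commands (in practice, run whatever commands ghpc prints). The blueprint file name and the deployment group folder passed in are hypothetical.

```python
import subprocess

# Assumption: public location of the Cloud HPC Toolkit repository.
REPO_URL = "https://github.com/GoogleCloudPlatform/hpc-toolkit.git"


def setup_commands() -> list[list[str]]:
    """Step 2: clone the repository and build the ghpc binary."""
    return [
        ["git", "clone", REPO_URL],
        ["make", "-C", "hpc-toolkit"],  # assumed to produce hpc-toolkit/ghpc
    ]


def deploy_commands(blueprint: str, project_id: str, group_dir: str) -> list[list[str]]:
    """Step 4: create the deployment folder, then deploy it.

    The Terraform commands stand in for the instructions that ghpc
    prints; group_dir is the hypothetical folder named in that output.
    """
    return [
        ["hpc-toolkit/ghpc", "create", blueprint, "--vars", f"project_id={project_id}"],
        ["terraform", f"-chdir={group_dir}", "init"],
        ["terraform", f"-chdir={group_dir}", "apply"],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(setup_commands())
    run_all(deploy_commands("my-blueprint.yaml", "my-project-id", "hpc-demo/primary"))
```

Keeping dry_run=True makes it possible to review the sequence before anything is cloned, built, or deployed.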

Limitations

Cloud HPC Toolkit supports only creating and deleting clusters. To modify the hardware or software configuration of an active cluster, Google recommends the following steps:

  1. Delete the cluster
  2. Update the HPC blueprint
  3. Create the HPC deployment folder
  4. Deploy the cluster
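
The ordering of these steps can be sketched as follows. This is an illustration only: the terraform destroy command and the --overwrite-deployment flag of ghpc create are assumptions to confirm with ghpc create --help and with the instructions that ghpc prints, and the function name and both arguments are hypothetical.

```python
def recreate_commands(blueprint: str, group_dir: str) -> list[list[str]]:
    """Return the commands for a delete, update, create, deploy cycle.

    Step 2 (updating the HPC blueprint) is a manual edit of the
    blueprint file, so it appears here only as a comment.
    """
    return [
        # 1. Delete the cluster (assumed Terraform teardown command).
        ["terraform", f"-chdir={group_dir}", "destroy"],
        # 2. Edit the blueprint file by hand before continuing.
        # 3. Re-create the deployment folder from the updated blueprint.
        ["./ghpc", "create", blueprint, "--overwrite-deployment"],
        # 4. Deploy the cluster again.
        ["terraform", f"-chdir={group_dir}", "init"],
        ["terraform", f"-chdir={group_dir}", "apply"],
    ]


if __name__ == "__main__":
    for argv in recreate_commands("my-blueprint.yaml", "hpc-demo/primary"):
        print(" ".join(argv))
```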

What's next