Installing apps in a Slurm cluster on Compute Engine


This tutorial shows how to install an app in a Slurm cluster on Google Cloud so that the app is available across the cluster even when the cluster is autoscaled. An app can be packaged as a tar file or as an RPM. The tutorial shows how to install apps using each file type. This tutorial also shows how to use a script to install different versions of Python.

This document is for administrators of Slurm clusters and assumes a basic knowledge of the following:

  • Linux system administration
  • Command-line usage
  • Slurm

In this tutorial, you install multiple versions of the same app and you create modulefiles to use the different versions of the app for different jobs. There are many use cases for installing multiple versions:

  • Compare results that were obtained with different versions of an app.
  • Use code that requires a specific version of an app.

This document also shows how to use environment modules so that any jobs running on the cluster can access the app. You use a Slurm cluster to run batch jobs or parallel compute jobs.

The following diagram illustrates the structure of a Slurm cluster deployed on Google Cloud.

Architectural diagram showing a Slurm cluster installed on Compute Engine.

To schedule jobs on Compute Engine compute nodes, you log in to the login node, also known as the head node. Scheduled jobs run on one or more compute nodes. Compute nodes can be either static nodes that are always online, or ephemeral nodes that are created in response to scheduled jobs and that are later destroyed.

When installing apps on clusters, your installation needs to meet the following requirements:

  • Software installations are available on both types of nodes.
  • Multiple versions of the same tool or library are available simultaneously.
  • Different versions of the software are available in the cluster.

To meet these requirements, install software packages on the cluster's NFS server.

Objectives

  • Understand the NFS-mounted apps and home directories in a Slurm cluster on Google Cloud.
  • Install software in the NFS-mounted apps directory.
  • Set up an environment modulefile for recently installed software packages.
  • Install different versions of Python.

Costs

In this document, you use billable components of Google Cloud, including Compute Engine.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.

  4. Enable the Compute Engine API.

    Enable the API

  5. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  6. This tutorial assumes that you have created, or have access to, a Slurm cluster on Google Cloud. Learn how to deploy a Slurm cluster to Google Cloud.

Installing apps

Every Slurm cluster on Google Cloud includes an NFS server that exports mounts for the /apps and /home directories. Where the NFS server runs depends on the deployment, but the /apps and /home mounts are available on every node in the cluster regardless of where the NFS server is running.

After you deploy the cluster, the /apps directory in a Slurm cluster on Google Cloud contains the following two directories:

  • /apps/modulefiles: contains the files to alter or set shell environment variables.
  • /apps/slurm: contains all files and directories associated with the Slurm workload management application.
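One way to confirm that both mounts are present on a node is a quick check like the following sketch. The report_mount helper is a hypothetical name introduced for this illustration; it relies only on the standard df command.

```shell
# report_mount DIR: show the file system that backs DIR, or report that DIR is missing.
# Hypothetical helper for illustration; it is not part of the Slurm deployment.
report_mount() {
  local dir=$1
  if [ -d "$dir" ]; then
    df -h "$dir"
  else
    echo "$dir: not present on this host"
  fi
}

report_mount /apps
report_mount /home
```

On a cluster node, you would expect the Filesystem column for /apps and /home to name the NFS export rather than a local disk.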

Install tar files in the apps directory

The following steps show how to install an app that is packaged as a tar file. For this example, you install the Julia programming language compiler and runtime.

  1. In Cloud Shell, log in to your cluster's login node using SSH. Replace cluster-name with the name of the cluster.

    gcloud compute ssh cluster-name-login0
    sudo -i
    
  2. Create a julia directory in the /apps directory:

    mkdir /apps/julia
    
  3. Install Julia 1.3.1, the version that this tutorial uses as the latest release, in the /apps/julia directory:

    wget https://julialang-s3.julialang.org/bin/linux/x64/1.3/julia-1.3.1-linux-x86_64.tar.gz
    mv julia-1.3.1-linux-x86_64.tar.gz /apps/julia
    cd /apps/julia
    tar zxf julia-1.3.1-linux-x86_64.tar.gz
    rm julia-1.3.1-linux-x86_64.tar.gz
    mv julia-1.3.1 1.3.1
    
  4. Install the long-term support (LTS) version of Julia in the /apps/julia directory:

    cd
    wget https://julialang-s3.julialang.org/bin/linux/x64/1.0/julia-1.0.5-linux-x86_64.tar.gz
    mv julia-1.0.5-linux-x86_64.tar.gz /apps/julia
    cd /apps/julia
    tar zxf julia-1.0.5-linux-x86_64.tar.gz
    rm julia-1.0.5-linux-x86_64.tar.gz
    mv julia-1.0.5 1.0.5
    

    There are now two versions of Julia installed.
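The two installs above follow the same pattern: unpack the archive and rename the result to the version number. As a sketch only, that pattern could be wrapped in a helper such as the following. The install_tarball name and its argument order are assumptions made for this example, and the helper expects the archive to be downloaded already.

```shell
# install_tarball TARBALL APP VERSION [APPS_ROOT]
# Unpacks a .tar.gz archive into APPS_ROOT/APP/VERSION (APPS_ROOT defaults to /apps).
# Hypothetical helper for illustration.
install_tarball() {
  local tarball=$1 app=$2 version=$3 apps_root=${4:-/apps}
  local dest="$apps_root/$app/$version"

  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing $dest" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # --strip-components=1 drops the archive's single top-level directory
  # (for example julia-1.3.1/), so files land directly in the version directory.
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}
```

For example, `install_tarball julia-1.3.1-linux-x86_64.tar.gz julia 1.3.1` would produce the same /apps/julia/1.3.1 layout as the manual steps, assuming that the archive contains a single top-level directory.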

Install RPMs in the apps directory

The following steps show how to install an app that is packaged as an RPM. For this example, you install the Singularity container runtime.

  1. In Cloud Shell, log in to your cluster's login node using SSH:

    gcloud compute ssh cluster-name-login0
    sudo -i
    
  2. Create a singularity directory in the /apps directory:

    mkdir /apps/singularity
    
  3. Create a directory for the 3.5.3-1.1 version of Singularity:

    mkdir /apps/singularity/3.5.3-1.1
    
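  4. Change to the version directory, and then get the Singularity 3.5.3-1.1 RPM and unpack it there. The cpio command extracts files relative to the current directory:

    cd /apps/singularity/3.5.3-1.1
    wget http://rpmfind.net/linux/opensuse/tumbleweed/repo/oss/x86_64/singularity-3.5.3-1.1.x86_64.rpm
    rpm2cpio singularity-3.5.3-1.1.x86_64.rpm | cpio -idmv
    rm singularity-3.5.3-1.1.x86_64.rpm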
    

Using Environment Modules

The Environment Modules package simplifies the initialization and management of the shell environment when you execute a Slurm job. You create modulefiles that alter or set shell environment variables, for example, PATH or MANPATH. All popular shells are supported as well as some scripting languages, including Perl, Python, and Ruby.

You use the module command to work with the Environment Modules package. The MODULEPATH environment variable tells the module command where to search for modulefiles in your system. In a Slurm cluster on Google Cloud, the MODULEPATH environment variable specifies the following three directories:

  • /usr/share/Modules/modulefiles contains the modulefiles that are part of the Environment Modules package distribution.
  • /etc/modulefiles contains the modulefile openmpi-x86_64.
  • /apps/modulefiles contains the modulefiles for any apps that you install in the cluster.
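To see which directories the module command searches, you can split the colon-separated MODULEPATH value into one directory per line. The following is a minimal sketch; list_module_dirs is a hypothetical helper name, and the example call passes the three directories listed above instead of reading a live environment.

```shell
# list_module_dirs [PATHLIST]: print each directory of a colon-separated list
# on its own line. With no argument, the current MODULEPATH value is used.
# Hypothetical helper for illustration.
list_module_dirs() {
  local pathlist=${1:-${MODULEPATH:-}}
  printf '%s\n' "$pathlist" | tr ':' '\n'
}

list_module_dirs "/usr/share/Modules/modulefiles:/etc/modulefiles:/apps/modulefiles"
```

On the login node, calling list_module_dirs with no argument prints the directories from the live MODULEPATH value, and the module avail command lists the modulefiles found in them.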

How you organize the modulefiles for apps that you install in the /apps/modulefiles directory is up to you. We recommend that you create a directory for each app in the /apps/modulefiles directory, and then create one modulefile for every version of the app in that app-specific directory.

Create a simple modulefile

Modulefiles are written in the Tool Command Language (Tcl) and interpreted by the modulecmd.tcl Tcl program through the module user interface. By using Tcl, you let modulefiles handle complex configurations, but most modulefiles are relatively simple.

  1. In Cloud Shell, log in to your cluster's login node using SSH:

    gcloud compute ssh cluster-name-login0
    sudo -i
    
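  2. Create the /apps/modulefiles/julia directory, and then create a file named 1.0.5 in that directory. The full path of the file is /apps/modulefiles/julia/1.0.5.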

  3. In a text editor, open the 1.0.5 file and paste the following:

    #%Module1.0#####################################################################
    ##
    ## modules julia/1.0.5.
    ##
    ## modulefiles/julia/1.0.5.
    ##
    proc ModulesHelp { } {
            global version modroot
            puts stderr "julia/1.0.5 - sets the environment for Julia 1.0.5"
    }
    module-whatis   "Sets the environment for using Julia 1.0.5"
    # for Tcl script use only
    set     topdir          /apps/julia/1.0.5
    set     version         1.0.5
    set     sys             linux86
    prepend-path    PATH            $topdir/bin
    prepend-path    MANPATH         $topdir/share/man
    

    Consider the following in the code:

    • The first line is a magic cookie that indicates the minimum version of the modulecmd.tcl Tcl program that is required to interpret the modulefile.
    • The ModulesHelp procedure is called by the module help command to provide additional information about a modulefile.
    • The module-whatis command defines the string that is displayed in response to the module whatis command.
    • Each set command assigns the value in its second argument to the Tcl variable named in its first argument.
    • The prepend-path command modifies the environment variable named in its first argument by prepending the value specified in its second argument.

    For more information, see the modulefile documentation.
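Because every version of an app needs its own modulefile, it can help to generate the files from a template. The following sketch writes a minimal modulefile for any app and version, following the one-directory-per-app layout recommended earlier. The write_modulefile name and its arguments are assumptions made for this example, and the generated file sets only PATH.

```shell
# write_modulefile APP VERSION [APPS_ROOT]
# Creates APPS_ROOT/modulefiles/APP/VERSION with a minimal modulefile that
# prepends APPS_ROOT/APP/VERSION/bin to PATH (APPS_ROOT defaults to /apps).
# Hypothetical helper for illustration.
write_modulefile() {
  local app=$1 version=$2 apps_root=${3:-/apps}
  local moddir="$apps_root/modulefiles/$app"

  mkdir -p "$moddir" || return 1
  # The here-document is unquoted so that the shell fills in $app and $version.
  # \$topdir is escaped because it is a Tcl variable, not a shell variable.
  cat > "$moddir/$version" << EOF
#%Module1.0
proc ModulesHelp { } {
        puts stderr "$app/$version - sets the environment for $app $version"
}
module-whatis   "Sets the environment for using $app $version"
set     topdir          $apps_root/$app/$version
prepend-path    PATH            \$topdir/bin
EOF
}
```

For example, `write_modulefile julia 1.3.1` would create /apps/modulefiles/julia/1.3.1, after which `module load julia/1.3.1` selects that version and `module load julia/1.0.5` selects the LTS version.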

Installing versions of Python

This step is optional. Follow these steps if the apps running on your cluster require custom Python versions or complex library builds.

The following steps install a Python environment by using the virtualenv tool and create a module entry called python2. You can modify the commands to meet the custom needs of your configuration. When you use virtualenv, the Python executable file and all libraries are installed in the virtualenv directory. This configuration lets you freeze a distinct binary and collection of libraries for a particular app.

Instead of creating the modulefile in a text editor, you generate it from the shell by using a here-document.

  1. In Cloud Shell, log in to your cluster's login node using SSH:

    gcloud compute ssh cluster-name-login0
    
  2. Install pip:

    curl https://bootstrap.pypa.io/get-pip.py | python -
    
  3. Install the virtualenv, absl-py, and google-cloud-storage packages:

    pip install virtualenv
    cd /apps
    virtualenv python2
    source ./python2/bin/activate
    pip install absl-py
    pip install google-cloud-storage
    
  4. Create the /apps/modulefiles/python2 modulefile:

    cat > /apps/modulefiles/python2 << "PYTHONEND"
    #%Module1.0#####################################################################
    ##
    ## python2 for Google modulefile
    ##
    proc ModulesHelp { } {
           global version modroot
    
           puts stderr "\n\tThis adds $modroot/* to several of the"
           puts stderr "\tenvironment variables."
           puts stderr "\n\tVersion $version\n"
    }
    
    module-whatis    "Runs Virtualenv for python. Includes Google Cloud Storage.";
    
    # for Tcl script use only
    set    version        000
    set    modroot        /apps/python2
    
    if {[module-info mode] == "load"} {
       puts stdout "source /apps/python2/bin/activate;"
    } elseif {[module-info mode] == "remove"} {
       puts stdout "deactivate;"
    }
    
    PYTHONEND
    
  5. Add the python2 module to your environment:

    module load python2
    

    Jobs that load the python2 module can now use the Python interpreter and libraries in the /apps/python2 virtualenv.
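To use the module from a batch job, the job script can load it before it runs Python. The following sketch writes such a script to a file without submitting it. The file name python2_job.sh, the job name, and the output file pattern are placeholders chosen for this example, and the script assumes that the module command is initialized in batch shells on your cluster.

```shell
# Write a small Slurm batch script. The file name is a placeholder and can be
# overridden by setting JOB_SCRIPT before running this snippet.
job_script=${JOB_SCRIPT:-python2_job.sh}

# The quoted delimiter keeps the script text exactly as written.
cat > "$job_script" << 'EOF'
#!/bin/bash
#SBATCH --job-name=python2-demo
#SBATCH --nodes=1
#SBATCH --output=python2-demo-%j.out

# Load the python2 module so that the virtualenv is activated on the compute node.
module load python2

# Print the interpreter path and version to confirm that the virtualenv is in use.
which python
python --version
EOF

echo "Submit the job with: sbatch $job_script"
```

After you submit the script with sbatch, the output file (named with the job ID in place of %j) should show an interpreter path under /apps/python2/bin if the module loaded as expected.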

Clean up

The easiest way to eliminate billing is to delete the Google Cloud project you created for the tutorial. Alternatively, you can delete the individual resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

  • Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.