This tutorial shows how to install an app in a Slurm cluster on Google Cloud so that the app is available across the cluster even when the cluster is autoscaled. An app can be packaged as a tar file or as an RPM. The tutorial shows how to install apps using each file type. This tutorial also shows how to use a script to install different versions of Python.
This document is for administrators of Slurm clusters and assumes a basic knowledge of the following:
- Linux system administration
- Command-line usage
- Slurm
In this tutorial, you install multiple versions of the same app and you create modulefiles to use the different versions of the app for different jobs. There are many use cases for installing multiple versions:
- Compare test results that were obtained with different versions.
- Use code that requires a specific version of an app.
This document also shows how to use environment modules so that any jobs running on the cluster can access the app. You use a Slurm cluster to run batch jobs or parallel compute jobs.
The following diagram illustrates the structure of a Slurm cluster deployed on Google Cloud.
To schedule jobs on Compute Engine compute nodes, you log in to the login node, also known as the head node. Scheduled jobs run on one or more compute nodes. Compute nodes can be either static nodes that are always online, or ephemeral nodes that are created in response to scheduled jobs and that are later destroyed.
When installing apps on clusters, your installation needs to meet the following requirements:
- Software installations are available on both types of nodes.
- Multiple versions of the same tool or library are available simultaneously.
- Different versions of the software are available in the cluster.
To meet these requirements, install software packages on the cluster's NFS server.
Objectives
- Understand the NFS-mounted apps and home directories in a Slurm cluster on Google Cloud.
- Install software in the NFS-mounted apps directory.
- Set up an environment modulefile for recently installed software packages.
- Install different versions of Python.
Costs
In this document, you use the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use the pricing calculator.
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.
- Enable the Compute Engine API.
- In the Google Cloud console, activate Cloud Shell.
  At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
- This tutorial assumes that you have created, or have access to, a Slurm cluster on Google Cloud. Learn how to deploy a Slurm cluster to Google Cloud.
Installing apps
Every Slurm cluster on Google Cloud includes an NFS server that exports mounts for the /apps and /home directories. Depending on the deployment, the NFS server can run on the following:
- The cluster's controller node
- An instance of Filestore
- A Google Cloud Elastifile cluster
- Your own on-premises NFS server
The /apps and /home mounts are available on every node in the cluster regardless of where the NFS server is running.
After you deploy the cluster, the /apps directory in a Slurm cluster on Google Cloud contains the following two directories:
- /apps/modulefiles: contains the files to alter or set shell environment variables.
- /apps/slurm: contains all files and directories associated with the Slurm workload management application.
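To confirm on a given node that /apps and /home are backed by the NFS server, you can check the filesystem type behind each path. The following is a minimal sketch, not part of the cluster tooling: the fs_type_of function name is hypothetical, and the sketch assumes GNU coreutils stat, which reports nfs for NFS mounts.

```shell
#!/usr/bin/env bash
# fs_type_of PATH...
# Prints each path followed by the type of the filesystem that backs it.
# Assumption: GNU coreutils `stat` (the usual case on Linux images).
fs_type_of() {
  local path
  for path in "$@"; do
    if [ -e "$path" ]; then
      printf '%s\t%s\n' "$path" "$(stat -f -c %T "$path")"
    else
      printf '%s\t%s\n' "$path" "missing"
    fi
  done
}

# Example (on a cluster node): fs_type_of /apps /home
```

Running the check on both a login node and a compute node is a quick way to see that the same mounts are present on each.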
Install tar files in the apps directory
The following steps show how to install an app that is packaged as a tar file. For this example, you install the Julia programming language compiler and runtime.
- In Cloud Shell, log in to your cluster's login node using SSH, and then switch to the root user. Replace cluster-name with the name of the cluster.
    gcloud compute ssh cluster-name-login0
    sudo -i
- Create a julia directory in the /apps directory:
    mkdir /apps/julia
- Install the latest version of Julia (1.3.1 in this example):
    wget https://julialang-s3.julialang.org/bin/linux/x64/1.3/julia-1.3.1-linux-x86_64.tar.gz
    mv julia-1.3.1-linux-x86_64.tar.gz /apps/julia
    cd /apps/julia
    tar zxf julia-1.3.1-linux-x86_64.tar.gz
    rm julia-1.3.1-linux-x86_64.tar.gz
    mv julia-1.3.1 1.3.1
- Install the long-term support (LTS) version of Julia (1.0.5 in this example) in the /apps/julia directory:
    cd
    wget https://julialang-s3.julialang.org/bin/linux/x64/1.0/julia-1.0.5-linux-x86_64.tar.gz
    mv julia-1.0.5-linux-x86_64.tar.gz /apps/julia
    cd /apps/julia
    tar zxf julia-1.0.5-linux-x86_64.tar.gz
    rm julia-1.0.5-linux-x86_64.tar.gz
    mv julia-1.0.5 1.0.5
There are now two versions of Julia installed.
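The download, unpack, and rename steps follow the same pattern for every version, so they can be wrapped in a small helper. The following is a sketch under stated assumptions: install_tarball is a hypothetical function name, the archive is assumed to contain a single top-level directory (as the Julia tarballs do), and tar is assumed to support --strip-components (GNU tar and bsdtar both do).

```shell
#!/usr/bin/env bash
# install_tarball TARBALL APP VERSION [APPS_ROOT]
# Unpacks TARBALL into APPS_ROOT/APP/VERSION and drops the archive's
# single top-level directory (for example, julia-1.3.1/).
# APPS_ROOT defaults to /apps.
install_tarball() {
  local tarball=$1 app=$2 version=$3 apps_root=${4:-/apps}
  local dest="$apps_root/$app/$version"

  # Never overwrite a version that is already installed.
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing $dest" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  tar -xzf "$tarball" -C "$dest" --strip-components=1 || return 1
  echo "$dest"
}

# Example (as root on the login node, after downloading the tarball):
#   install_tarball julia-1.3.1-linux-x86_64.tar.gz julia 1.3.1
```

Refusing to overwrite an existing directory keeps each installed version immutable, which matters when jobs that depend on it might already be running.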
Install RPMs in the apps directory
The following steps show how to install an app that is packaged as an RPM. For this example, you install the Singularity container runtime.
- In Cloud Shell, log in to your cluster's login node using SSH, and then switch to the root user:
    gcloud compute ssh cluster-name-login0
    sudo -i
- Create a singularity directory in the /apps directory:
    mkdir /apps/singularity
- Create a directory for the 3.5.3-1.1 version of Singularity:
    mkdir /apps/singularity/3.5.3-1.1
- Change to that directory, and then get the Singularity 3.5.3-1.1 RPM and unpack it:
    cd /apps/singularity/3.5.3-1.1
    wget http://rpmfind.net/linux/opensuse/tumbleweed/repo/oss/x86_64/singularity-3.5.3-1.1.x86_64.rpm
    rpm2cpio singularity-3.5.3-1.1.x86_64.rpm | cpio -idmv
    rm singularity-3.5.3-1.1.x86_64.rpm
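Unpacking an RPM with rpm2cpio and cpio recreates the package's own directory tree relative to the current directory, so executables typically end up under a path such as usr/bin instead of bin. Before you write a modulefile for an unpacked RPM, it helps to find out where the executables are. The following sketch uses a hypothetical list_bin_dirs helper to do that.

```shell
#!/usr/bin/env bash
# list_bin_dirs PREFIX
# Prints every bin/ or sbin/ directory below PREFIX, one per line, sorted.
# These are the candidate directories to prepend to PATH in a modulefile.
list_bin_dirs() {
  local prefix=$1
  find "$prefix" -type d \( -name bin -o -name sbin \) | sort
}

# Example (on the login node):
#   list_bin_dirs /apps/singularity/3.5.3-1.1
```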
Using Environment Modules
The Environment Modules package simplifies the initialization and management of the shell environment when you execute a Slurm job. You create modulefiles that alter or set shell environment variables, for example, PATH or MANPATH. All popular shells are supported as well as some scripting languages, including Perl, Python, and Ruby.
You use the module command to work with the Environment Modules package. The MODULEPATH environment variable tells the module command where to search for modulefiles in your system. In a Slurm cluster on Google Cloud, the MODULEPATH environment variable specifies the following three directories:
- /usr/share/Modules/modulefiles contains the modulefiles that are part of the Environment Modules package distribution.
- /etc/modulefiles contains the modulefile openmpi-x86_64.
- /apps/modulefiles contains the modulefiles for any apps that you install in the cluster.
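MODULEPATH is a colon-separated list, like PATH. To see which directories the module command searches on a node, and whether each one exists, you can split the variable and test every entry. The following is a minimal sketch; list_modulepath is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_modulepath [VALUE]
# Splits VALUE (default: the current $MODULEPATH) on ":" and prints one
# line per directory, prefixed with "ok" or "missing".
list_modulepath() {
  local value=${1:-${MODULEPATH:-}}
  local dir
  local IFS=:
  for dir in $value; do
    [ -n "$dir" ] || continue
    if [ -d "$dir" ]; then
      printf 'ok\t%s\n' "$dir"
    else
      printf 'missing\t%s\n' "$dir"
    fi
  done
}

# Example (on a cluster node): list_modulepath
```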
How you organize the modulefiles for apps that you install in the /apps/modulefiles directory is up to you. We recommend that you create a directory in /apps/modulefiles for each app that you install, and then create a modulefile for every version of the app in the app-specific directory.
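With the recommended layout, the path of a modulefile relative to /apps/modulefiles is also the name that you pass to module load, for example julia/1.0.5. The following sketch lists the names that the layout provides for one app. The list_module_versions helper is hypothetical, and it only inspects the directory tree; it doesn't call the module command.

```shell
#!/usr/bin/env bash
# list_module_versions APP [MODULE_ROOT]
# Prints APP/VERSION for every regular file in MODULE_ROOT/APP.
# MODULE_ROOT defaults to /apps/modulefiles.
list_module_versions() {
  local app=$1 root=${2:-/apps/modulefiles}
  local f
  for f in "$root/$app"/*; do
    # Skip subdirectories and the unexpanded pattern of an empty directory.
    [ -f "$f" ] && printf '%s/%s\n' "$app" "${f##*/}"
  done
  return 0
}

# Example (on a cluster node): list_module_versions julia
```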
Create a simple modulefile
Modulefiles are written in the Tool Command Language (Tcl) and interpreted by the modulecmd.tcl Tcl program through the module user interface. By using Tcl, you let modulefiles handle complex configurations, but most modulefiles are relatively simple.
- In Cloud Shell, log in to your cluster's login node using SSH, and then switch to the root user:
    gcloud compute ssh cluster-name-login0
    sudo -i
- Create the /apps/modulefiles/julia directory if it doesn't exist, and then create the file /apps/modulefiles/julia/1.0.5.
- In a text editor, open the 1.0.5 file and paste the following:
    #%Module1.0#####################################################################
    ##
    ## modules julia/1.0.5.
    ##
    ## modulefiles/julia/1.0.5.
    ##
    proc ModulesHelp { } {
        global version modroot
        puts stderr "julia/1.0.5 - sets the environment for Julia 1.0.5"
    }
    module-whatis "Sets the environment for using Julia 1.0.5"
    # for Tcl script use only
    set topdir /apps/julia/1.0.5
    set version 1.0.5
    set sys linux86
    prepend-path PATH $topdir/bin
    prepend-path MANPATH $topdir/man
Consider the following in the code:
- The first line is a magic cookie that indicates the minimum version of the modulecmd.tcl Tcl program that is required to interpret the modulefile.
- The ModulesHelp procedure is called by the module help command to provide additional information about a modulefile.
- The module-whatis command defines a string that is displayed in response to the module whatis command.
- Each set command assigns the value in its second argument to the Tcl variable named in its first argument.
- The prepend-path command modifies the environment variable referenced in the first argument by prepending the value specified in the second argument.
For more information, see the modulefile documentation.
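Because a simple modulefile differs between versions only in the version string and the installation directory, you can generate one per version instead of editing each file by hand. The following is a sketch, not an official tool: write_modulefile is a hypothetical function, it assumes that the app was installed in APPS_ROOT/APP/VERSION with its executables in a bin subdirectory, and it emits a reduced form of the modulefile shown earlier.

```shell
#!/usr/bin/env bash
# write_modulefile APP VERSION [APPS_ROOT] [MODULE_ROOT]
# Writes MODULE_ROOT/APP/VERSION, a modulefile that prepends
# APPS_ROOT/APP/VERSION/bin to PATH, and prints the path of the new file.
# Defaults: APPS_ROOT=/apps, MODULE_ROOT=/apps/modulefiles.
write_modulefile() {
  local app=$1 version=$2
  local apps_root=${3:-/apps} module_root=${4:-/apps/modulefiles}
  local topdir="$apps_root/$app/$version"
  local target="$module_root/$app/$version"

  mkdir -p "$module_root/$app" || return 1
  # Unquoted heredoc: bash expands $app, $version, and $topdir now.
  # The escaped \$topdir is written literally, for Tcl to expand when
  # the module is loaded.
  cat > "$target" <<EOF
#%Module1.0
##
## modulefiles/$app/$version
##
proc ModulesHelp { } {
    puts stderr "$app/$version - sets the environment for $app $version"
}

module-whatis "Sets the environment for using $app $version"

set topdir $topdir

prepend-path PATH \$topdir/bin
EOF
  echo "$target"
}

# Example (as root on the login node):
#   write_modulefile julia 1.3.1
```

After the file exists, module load julia/1.3.1 selects that version for the current shell or job.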
Installing versions of Python
This step is optional. Follow these steps if the apps running on your cluster require custom Python versions or complex library builds.
Run the following script to install a Python version using the virtualenv tool.
This script creates a module entry called python2. You can modify the script to meet the custom needs of your configuration. When you use virtualenv, the Python executable file and all libraries are installed in the virtualenv directory. This configuration lets you freeze a distinct binary and collection of libraries for a particular app.
Instead of directly editing a module, create the following bash script.
- In Cloud Shell, log in to your cluster's login node using SSH:
    gcloud compute ssh cluster-name-login0
- Install pip:
    curl https://bootstrap.pypa.io/get-pip.py | python -
- Install the virtualenv, absl-py, and google-cloud-storage packages:
    pip install virtualenv
    cd /apps
    virtualenv python2
    source ./python2/bin/activate
    pip install absl-py
    pip install google-cloud-storage
- Create the /apps/modulefiles/python2 modulefile:
    cat > /apps/modulefiles/python2 << "PYTHONEND"
    #%Module1.0#####################################################################
    ##
    ## python2 for Google modulefile
    ##
    proc ModulesHelp { } {
        global version modroot
        puts stderr "\n\tThis adds $modroot/* to several of the"
        puts stderr "\tenvironment variables."
        puts stderr "\n\tVersion $version\n"
    }
    module-whatis "Runs Virtualenv for python. Included Google Cloud Storage.";
    # for Tcl script use only
    set version 000
    set modroot /apps/python2
    if {[module-info mode] == "load"} {
        puts stdout "source /apps/python2/bin/activate;"
    } elseif {[module-info mode] == "remove"} {
        puts stdout "deactivate;"
    }
    PYTHONEND
- Add the python2 module to your environment:
    module load python2
The Slurm job can now use Python 2.x.
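To use the module in a batch job, load it inside the job script before the first Python command. The following sketch writes such a script to a file that you can then submit with sbatch. The write_python2_job function, the job name, and the output file name are hypothetical, and the sketch assumes that the module command is initialized in batch shells on the compute nodes.

```shell
#!/usr/bin/env bash
# write_python2_job FILE
# Writes a small Slurm batch script to FILE and prints the file name.
write_python2_job() {
  local file=$1
  # Quoted heredoc: the script body is written exactly as shown.
  cat > "$file" <<'EOF'
#!/bin/bash
#SBATCH --job-name=python2-check
#SBATCH --nodes=1
#SBATCH --output=python2-check-%j.out

# Assumption: the module command is available in batch shells on the
# compute nodes. If it is not, source the Modules init script first.
module load python2
python --version
python -c "import google.cloud.storage; print('google-cloud-storage is importable')"
EOF
  echo "$file"
}

# Example (on the login node):
#   sbatch "$(write_python2_job "$HOME/python2-check.sh")"
```

Because /apps is mounted on every node, the job finds the same virtualenv whether it runs on a static node or on an ephemeral node that was created for the job.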
Clean up
The easiest way to eliminate billing is to delete the Google Cloud project you created for the tutorial. Alternatively, you can delete the individual resources.
Delete the project
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.