After completing this tutorial, you'll know how to:
- Enable Elasticluster to access your GCP project
- Use Elasticluster to create a cluster of Compute Engine VMs that run Grid Engine
- Copy files to the cluster and connect to cluster instances
This tutorial uses billable components of GCP, including:
- Compute Engine
- Cloud Storage
Before you begin
- Sign in to your Google Account. If you don't already have one, sign up for a new account.
- Select or create a Google Cloud Platform project.
- Make sure that billing is enabled for your Google Cloud Platform project.
- Enable the Compute Engine API.
- Install and initialize the Cloud SDK.
Set up your local environment
If you haven't already installed Elasticluster on your local machine, follow the steps below.
Python version 2.7 is required to run Elasticluster. For more information on setting up your Python development environment, such as installing pip on your system, see the Python Development Environment Setup Guide.
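Because the requirement above is tied to one specific Python version, it can help to check the interpreter before installing anything. The following is a minimal sketch; the function name `is_supported_python` is hypothetical and is not part of Elasticluster.

```python
import sys


def is_supported_python(version_info):
    """Return True when the interpreter is Python 2.7, as this tutorial requires."""
    # Compare only the major and minor components, for example (2, 7).
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    if is_supported_python(sys.version_info):
        print("Python 2.7 detected.")
    else:
        print("Python %d.%d detected; this tutorial expects 2.7." % sys.version_info[:2])
```

Run the script with the same interpreter you plan to use for the virtualenv, since that is the one Elasticluster will run under.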
If you don't have virtualenv, install it using pip:
pip install virtualenv
It is highly recommended that you install Elasticluster into a Python virtualenv. The virtualenv keeps Elasticluster and its Python dependencies separate from the rest of your Python environment so updates to your Python environment won't break Elasticluster.
Run virtualenv to create an isolated Python environment called elasticluster. Then activate the environment and change to its directory:
virtualenv elasticluster
source elasticluster/bin/activate
cd elasticluster
In the elasticluster virtualenv directory created in the previous step, clone the Elasticluster GitHub repository and install dependencies:
git clone git://github.com/gc3-uzh-ch/elasticluster.git src
cd src
pip install -e .
pip install oauth2client
Check that the Elasticluster installation was successful. Running the verification command generates a ~/.elasticluster/config file and may print a series of warnings and errors, which can be safely ignored.
Allow Elasticluster to access your GCP project
Follow the steps below to enable Elasticluster to access resources in your GCP project and create clusters of Compute Engine VMs. Complete these steps on your local machine.
Obtain your client ID and client secret
- Go to the Credentials page.
- Select your GCP project.
- Click Create credentials, then select OAuth Client ID.
- Under Application type, select Other, add a Name, then click Create.
- On the OAuth client window that appears, note the client ID and client secret. You'll need to use these in the Elasticluster configuration file.
On the Credentials window, your new Other credentials appear along with the primary client ID that's used to access your application.
Generate an SSH key pair
Elasticluster needs an SSH key pair to connect to GCP and start Compute Engine VMs. If you haven't already connected to a Compute Engine instance using the gcloud compute ssh command, enter the following command:
gcloud compute config-ssh
After the process completes, you'll see the following new key pair on your machine:
- ~/.ssh/google_compute_engine (private key)
- ~/.ssh/google_compute_engine.pub (public key)
Select a disk image for the cluster's VMs
To start Compute Engine VMs, your Elasticluster configuration needs to specify one of the Debian images available on Compute Engine. Run the following command and note the name of the disk image returned:
gcloud compute images list | grep debian | cut -f 1 -d " "
If multiple disk images are returned, choose the first image in the list.
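The selection rule for the image name can also be sketched in Python. This is only an illustration: the function name `first_debian_image` and the sample listing are hypothetical, and real image names and columns will differ from the made-up ones shown here.

```python
def first_debian_image(listing):
    """Mimic `grep debian | cut -f 1 -d " "` and return the first match."""
    for line in listing.splitlines():
        if "debian" in line:
            # cut -f 1 -d " " keeps everything before the first space.
            return line.split(" ", 1)[0]
    # No line mentions debian.
    return None


# Hypothetical listing; real image names and columns will differ.
SAMPLE_LISTING = """\
NAME              PROJECT       FAMILY
centos-example    centos-cloud  centos
debian-example-b  debian-cloud  debian-b
debian-example-a  debian-cloud  debian-a
"""

if __name__ == "__main__":
    print(first_debian_image(SAMPLE_LISTING))
```

To use the function on real data, pass it the text printed by `gcloud compute images list`, for example after capturing that output with the `subprocess` module.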
Open the ~/.elasticluster/config file and delete all of its contents. Then copy the following text, substituting the relevant variables, and save the file.
# Grid Engine software to be configured by Ansible
[setup/gridengine]
provider=ansible
frontend_groups=gridengine_master
compute_groups=gridengine_worker
# Create a cloud provider called "google-cloud"
[cloud/google-cloud]
provider=google
gce_project_id=PROJECT_ID
gce_client_id=CLIENT_ID
gce_client_secret=SECRET_KEY
# Create a login called "google-login"
# GOOGLE_USER_ID is just the user ID, not the full email address
[login/google-login]
image_user=GOOGLE_USER_ID
image_user_sudo=root
image_sudo=True
user_key_name=elasticluster
user_key_private=~/.ssh/google_compute_engine
user_key_public=~/.ssh/google_compute_engine.pub
# Bring all of the elements together to define a cluster called "gridengine"
[cluster/gridengine]
cloud=google-cloud
login=google-login
setup=gridengine
security_group=default
image_id=IMAGE
flavor=n1-standard-1
frontend_nodes=1
compute_nodes=3
image_userdata=
ssh_to=frontend
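If you prefer to generate this file rather than edit it by hand, the substitution can be sketched with Python's standard `configparser` module. This is a minimal sketch under some assumptions: the helper names `build_config` and `render_config` are hypothetical, the script targets a Python 3 interpreter (so it runs outside the Python 2.7 virtualenv), and the generated file omits the explanatory comments.

```python
import configparser
import io


def build_config(project_id, client_id, client_secret, user_id, image_id):
    """Return a ConfigParser holding the four sections used in this tutorial."""
    # interpolation=None keeps any "%" characters in secrets literal.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["setup/gridengine"] = {
        "provider": "ansible",
        "frontend_groups": "gridengine_master",
        "compute_groups": "gridengine_worker",
    }
    cfg["cloud/google-cloud"] = {
        "provider": "google",
        "gce_project_id": project_id,
        "gce_client_id": client_id,
        "gce_client_secret": client_secret,
    }
    cfg["login/google-login"] = {
        "image_user": user_id,
        "image_user_sudo": "root",
        "image_sudo": "True",
        "user_key_name": "elasticluster",
        "user_key_private": "~/.ssh/google_compute_engine",
        "user_key_public": "~/.ssh/google_compute_engine.pub",
    }
    cfg["cluster/gridengine"] = {
        "cloud": "google-cloud",
        "login": "google-login",
        "setup": "gridengine",
        "security_group": "default",
        "image_id": image_id,
        "flavor": "n1-standard-1",
        "frontend_nodes": "1",
        "compute_nodes": "3",
        "image_userdata": "",
        "ssh_to": "frontend",
    }
    return cfg


def render_config(cfg):
    """Serialize the config as key=value lines, matching the layout above."""
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    # The arguments are the same placeholders used in the text; replace them.
    print(render_config(build_config(
        "PROJECT_ID", "CLIENT_ID", "SECRET_KEY", "GOOGLE_USER_ID", "IMAGE")))
```

Redirecting the printed text into ~/.elasticluster/config would replace the file's contents, so review the output first.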
For more information on the Elasticluster configuration file, see the Elasticluster documentation.
Run a cluster of Compute Engine VMs
The following steps show you how to start a cluster, interact with the cluster, and stop the cluster.
Complete these steps on your local machine. The first time you start a cluster, you'll need to authorize Elasticluster to issue Compute Engine API requests on your behalf. The authorization flow launches a web browser on the same machine from which you started the cluster.
If you need to run Elasticluster on a remote machine or in some other environment that isn't able to open a web browser, edit ~/.elasticluster/config before you start the cluster and add noauth_local_webserver=true to the [cloud/google-cloud] section:
# Create a cloud provider
[cloud/google-cloud]
provider=google
noauth_local_webserver=true
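The same change can be scripted. The sketch below assumes a Python 3 interpreter and uses the standard `configparser` module; the function name `enable_noauth` is hypothetical. Note that `configparser` drops comments when it rewrites a file, so hand-editing is the better choice if you want to keep them.

```python
import configparser
import io


def enable_noauth(config_text, section="cloud/google-cloud"):
    """Return config_text with noauth_local_webserver=true set in the section."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(config_text)
    if not cfg.has_section(section):
        raise KeyError("section not found: %s" % section)
    cfg[section]["noauth_local_webserver"] = "true"
    # Write key=value without surrounding spaces, like the tutorial's file.
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    sample = "[cloud/google-cloud]\nprovider=google\n"
    print(enable_noauth(sample))
```

To apply it for real, read the text of ~/.elasticluster/config, pass it through the function, and write the result back after making a backup copy.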
Start a cluster
Run the following command to start the cluster:
elasticluster start gridengine
The setup process might take several minutes. After the cluster starts, the following message prints to the console:
Your cluster `gridengine` is ready!
Cluster name:     gridengine
Cluster template: gridengine
Default ssh to node: frontend001
- frontend nodes: 1
- compute nodes: 3
To login on the frontend node, run the command:
    elasticluster ssh gridengine
To upload or download files to the cluster, use the command:
    elasticluster sftp gridengine
To get verbose output, use the -v flag:
elasticluster start gridengine -v
List cluster instances
To list the instances in your cluster, run the following command:
elasticluster list-nodes gridengine
A message similar to the following appears:
Cluster name:     gridengine
Cluster template: gridengine
Default ssh to node: frontend001
- frontend nodes: 1
- compute nodes: 3
To login on the frontend node, run the command:
    elasticluster ssh gridengine
To upload or download files to the cluster, use the command:
    elasticluster sftp gridengine
frontend nodes:
  - frontend001
    connection IP: 203.0.113.1
    IPs: 203.0.113.1
    instance id: gridengine-frontend001
    instance flavor: n1-standard-1
compute nodes:
  - compute001
    connection IP: 198.51.100.1
    IPs: 198.51.100.1
    instance id: gridengine-compute001
    instance flavor: n1-standard-1
  ...
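If you want to use the node addresses in a script, the listing can be reduced to a name-to-IP mapping. The following is a rough sketch that assumes the output keeps the "- NODE ... connection IP: ADDRESS" shape shown above; the names `NODE_PATTERN` and `node_ips` are hypothetical, and the pattern may need adjusting for other Elasticluster versions.

```python
import re

# A dash, the node name, then that node's "connection IP:" field.
# \s+ also matches line breaks, so the layout of the listing is flexible.
NODE_PATTERN = re.compile(r"-\s+(\S+)\s+connection IP:\s+(\S+)")


def node_ips(list_nodes_output):
    """Map each node name to its connection IP."""
    return dict(NODE_PATTERN.findall(list_nodes_output))


if __name__ == "__main__":
    sample = (
        "frontend nodes:\n"
        "  - frontend001\n"
        "    connection IP: 203.0.113.1\n"
        "compute nodes:\n"
        "  - compute001\n"
        "    connection IP: 198.51.100.1\n"
    )
    for name, ip in sorted(node_ips(sample).items()):
        print(name, ip)
```

The summary lines such as "- frontend nodes: 1" are skipped because they are not followed by a connection IP field.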
Copy files to cluster instances
You can use Elasticluster's sftp command to open an SFTP session to the cluster's frontend node. This allows you to upload or download files to and from the cluster. For more information about using SFTP with Elasticluster, view the Elasticluster documentation.
To open an SFTP session, run the following command:
elasticluster sftp gridengine
You can use a here document to send a list of commands over SFTP:
elasticluster sftp gridengine << 'EOF'
put *.sh
EOF
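The here-document approach can also be driven from Python by feeding the same commands to the process on standard input. This is a minimal sketch assuming a Python 3 interpreter; `sftp_batch` and `upload` are hypothetical helper names, and paths that contain spaces would need quoting that this sketch does not handle.

```python
import subprocess


def sftp_batch(paths):
    """Build one SFTP `put` command per local path, newline-terminated."""
    return "".join("put %s\n" % path for path in paths)


def upload(cluster, paths):
    """Feed the batch to `elasticluster sftp` on stdin, like the here document."""
    return subprocess.run(
        ["elasticluster", "sftp", cluster],
        input=sftp_batch(paths),
        universal_newlines=True,  # send the batch as text rather than bytes
        check=True,               # raise if the command exits with an error
    )
```

For example, `upload("gridengine", ["job1.sh", "job2.sh"])` would send two put commands to the frontend node, provided the virtualenv that contains Elasticluster is active.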
For more information on SFTP, view the SFTP man page.
Connect to cluster instances
With Elasticluster, you can use SSH to connect to any of your cluster nodes. If you run elasticluster ssh gridengine without specifying a node, Elasticluster automatically connects to the frontend node:
elasticluster ssh gridengine
To connect to other nodes in the cluster, add the -n flag and specify the name of the node:
elasticluster ssh gridengine -n NODE_NAME
For example, to connect to the compute001 node from the output in List cluster instances, run the following command:
elasticluster ssh gridengine -n compute001
Exit the virtualenv
To exit the virtualenv, run the deactivate command.
To use Elasticluster commands again, reactivate the virtualenv by running the source elasticluster/bin/activate command from the directory where you created the virtualenv.
After you finish this tutorial, you can clean up the resources you created on Google Cloud Platform so you won't be billed for them in the future. The following sections describe how to delete or turn off these resources.
Destroy the cluster
To stop the cluster and turn off all of the cluster instances, run the following command:
elasticluster stop gridengine
This command returns a prompt asking if you really want to stop the cluster.
To stop the cluster without any prompt, add the --yes flag to the command:
elasticluster stop --yes gridengine
Delete the project
The easiest way to eliminate billing is to delete the project you used for the tutorial.
To delete the project:
- In the GCP Console, go to the Projects page.
- In the project list, select the project you want to delete and click Delete project.
- In the dialog, type the project ID, and then click Shut down to delete the project.
- See the Cloud Genomics Elasticluster fork. This fork provides bug fixes and enhancements that are relevant to Google Cloud Platform use cases.