Creating a Grid Engine Cluster with Elasticluster

If you have existing tools that use Grid Engine to run tasks on compute clusters, follow this tutorial to use Elasticluster to create a similar environment for running those tasks on Google Cloud Platform (GCP).

Objectives

After completing this tutorial, you'll know how to:

  • Enable Elasticluster to access your GCP project
  • Use Elasticluster to create a cluster of Compute Engine VMs that run Grid Engine
  • Copy files to the cluster and connect to cluster instances

Costs

This tutorial uses billable components of GCP, including:

  • Compute Engine
  • Cloud Storage

Use the Pricing Calculator to generate a cost estimate based on your projected usage. New Cloud Platform users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Google Cloud Platform project on the Manage resources page of the GCP Console.

  3. Make sure that billing is enabled for your Google Cloud Platform project.

  4. Enable the Compute Engine API.

  5. Install and initialize the Cloud SDK.

Set up your local environment

If you haven't already installed Elasticluster on your local machine, follow the steps below.

Install prerequisites

  1. Install Python 2.7.

    Python version 2.7 is required to run Elasticluster. For more information on setting up your Python development environment, such as installing pip on your system, see the Python Development Environment Setup Guide.

  2. If you don't have virtualenv, install it using pip:

    pip install virtualenv
    

    It is highly recommended that you install Elasticluster into a Python virtualenv. The virtualenv keeps Elasticluster and its Python dependencies separate from the rest of your Python environment so updates to your Python environment won't break Elasticluster.
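Before you continue, you can confirm that the tools this tutorial calls are on your PATH. The following is a minimal Bash sketch; the function name missing_commands is hypothetical, and the command list in the usage line is an assumption based on the steps in this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each command that is not found on PATH.
# Returns 0 if every command was found, 1 otherwise.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      status=1
    fi
  done
  return "$status"
}

# Example: check the tools used in this tutorial.
# missing_commands python2.7 pip virtualenv git gcloud
```

If the function prints nothing and returns 0, every listed tool is available.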

Install Elasticluster

  1. Run virtualenv to create an isolated Python environment called elasticluster. Then activate the environment and change into its directory:

    virtualenv elasticluster
    source elasticluster/bin/activate
    cd elasticluster
    
  2. In the elasticluster directory that virtualenv created in the previous step, clone the Elasticluster GitHub repository, then install Elasticluster and its dependencies:

    git clone https://github.com/gc3-uzh-ch/elasticluster.git src
    cd src
    pip install -e .
    pip install oauth2client
    
  3. Check that Elasticluster installed successfully. The following command generates a ~/.elasticluster/config file. It may also print a series of warnings and errors, which you can safely ignore.

    elasticluster list-templates
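To check the result of these steps in one go, a sketch like the following confirms that a virtualenv is active, that the elasticluster command is on your PATH, and that the configuration file exists. The function name check_elasticluster_install and the ELASTICLUSTER override variable are hypothetical, added only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the Elasticluster installation.
# ELASTICLUSTER may name a different command to look for (defaults to elasticluster).
check_elasticluster_install() {
  local cfg="${1:-$HOME/.elasticluster/config}"
  local status=0
  # virtualenv's activate script sets VIRTUAL_ENV.
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "warning: no virtualenv is active" >&2
    status=1
  fi
  if ! command -v "${ELASTICLUSTER:-elasticluster}" >/dev/null 2>&1; then
    echo "error: elasticluster is not on PATH" >&2
    status=1
  fi
  # list-templates should have created this file.
  if [ ! -f "$cfg" ]; then
    echo "error: $cfg was not generated" >&2
    status=1
  fi
  return "$status"
}

# Usage:
# check_elasticluster_install && echo "Elasticluster looks ready."
```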
    

Allow Elasticluster to access your GCP project

Follow the steps below to enable Elasticluster to access resources in your GCP project and create clusters of Compute Engine VMs. Complete these steps on your local machine.

Obtain your client ID and client secret

  1. Go to the Credentials page in the GCP Console.

  2. Select your GCP project.

  3. Click Create credentials, then select OAuth Client ID.

  4. Under Application type, select Other, enter a name, then click Create.

  5. In the OAuth client window that appears, note the client ID and client secret. You'll need them for the Elasticluster configuration file.

  6. On the Credentials page, your new Other credentials appear alongside the primary client ID that's used to access your application.

Generate an SSH key pair

Elasticluster needs an SSH key pair to connect to the Compute Engine VMs it starts. If you haven't already connected to a Compute Engine instance using the gcloud compute ssh command, run the following command:

gcloud compute config-ssh

After the command completes, the following key pair files exist on your machine:

~/.ssh/google_compute_engine
~/.ssh/google_compute_engine.pub

Select a disk image for the cluster's VMs

To start Compute Engine VMs, your Elasticluster configuration needs to specify one of the Debian images available on Compute Engine. Run the following command and note the name of the disk image returned:

gcloud compute images list | grep debian | cut -f 1 -d " "

If multiple disk images are returned, choose the first image in the list. In the following example output, you would select debian-8-jessie-v20180611:

debian-8-jessie-v20180611
debian-9-stretch-v20180611
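If you want to script this choice, the following sketch reads the output of gcloud compute images list on standard input and prints the first image name that begins with debian-, following the rule above. The function name first_debian_image is hypothetical, and the sketch assumes the image name is the first column of each line, as the cut command above does. gcloud's global --filter and --format flags (for example, --format="value(name)") are an alternative way to narrow the listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first image name (first column) that starts
# with "debian-" from a `gcloud compute images list` listing on stdin.
first_debian_image() {
  awk '$1 ~ /^debian-/ { print $1; exit }'
}

# Usage:
# IMAGE=$(gcloud compute images list | first_debian_image)
# echo "$IMAGE"
```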

Configure Elasticluster

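Open the ~/.elasticluster/config file and delete all of its contents. Then copy the following text into the file, replacing PROJECT_ID with your project ID, CLIENT_ID and SECRET_KEY with the client ID and client secret you noted earlier, GOOGLE_USER_ID with your Google user ID, and IMAGE with the disk image name you selected. Save the file.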

# Grid Engine software to be configured by Ansible
[setup/gridengine]
provider=ansible
frontend_groups=gridengine_master
compute_groups=gridengine_worker

# Create a cloud provider called "google-cloud"
[cloud/google-cloud]
provider=google
gce_project_id=PROJECT_ID
gce_client_id=CLIENT_ID
gce_client_secret=SECRET_KEY

# Create a login called "google-login"
[login/google-login]
# Use just your user ID, not the full email address
image_user=GOOGLE_USER_ID
image_user_sudo=root
image_sudo=True
user_key_name=elasticluster
user_key_private=~/.ssh/google_compute_engine
user_key_public=~/.ssh/google_compute_engine.pub

# Bring all of the elements together to define a cluster called "gridengine"
[cluster/gridengine]
cloud=google-cloud
login=google-login
setup=gridengine
security_group=default
image_id=IMAGE
flavor=n1-standard-1
frontend_nodes=1
compute_nodes=3
image_userdata=
ssh_to=frontend
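If you prefer not to edit the file by hand, the following Bash sketch writes the same configuration, filling the placeholders from shell variables. The function name write_gridengine_config is hypothetical; it assumes you set PROJECT_ID, CLIENT_ID, SECRET_KEY, GOOGLE_USER_ID, and IMAGE first, and it overwrites the target file (by default ~/.elasticluster/config), just as the manual step replaces the file's contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the "gridengine" configuration shown above,
# substituting values from shell variables. Overwrites the target file.
write_gridengine_config() {
  local out="${1:-$HOME/.elasticluster/config}" var
  # Refuse to write a file with empty placeholders.
  for var in PROJECT_ID CLIENT_ID SECRET_KEY GOOGLE_USER_ID IMAGE; do
    if [ -z "${!var:-}" ]; then
      echo "error: $var is not set" >&2
      return 1
    fi
  done
  mkdir -p "$(dirname "$out")"
  # Unquoted EOF: ${...} is expanded; "~" is written literally, as in the original file.
  cat > "$out" <<EOF
# Grid Engine software to be configured by Ansible
[setup/gridengine]
provider=ansible
frontend_groups=gridengine_master
compute_groups=gridengine_worker

# Create a cloud provider called "google-cloud"
[cloud/google-cloud]
provider=google
gce_project_id=${PROJECT_ID}
gce_client_id=${CLIENT_ID}
gce_client_secret=${SECRET_KEY}

# Create a login called "google-login"
[login/google-login]
# Use just your user ID, not the full email address
image_user=${GOOGLE_USER_ID}
image_user_sudo=root
image_sudo=True
user_key_name=elasticluster
user_key_private=~/.ssh/google_compute_engine
user_key_public=~/.ssh/google_compute_engine.pub

# Bring all of the elements together to define a cluster called "gridengine"
[cluster/gridengine]
cloud=google-cloud
login=google-login
setup=gridengine
security_group=default
image_id=${IMAGE}
flavor=n1-standard-1
frontend_nodes=1
compute_nodes=3
image_userdata=
ssh_to=frontend
EOF
}

# Usage (values are examples):
# PROJECT_ID=my-project CLIENT_ID=... SECRET_KEY=... GOOGLE_USER_ID=alice \
#   IMAGE=debian-9-stretch-v20180611 write_gridengine_config
```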

For more information on the Elasticluster configuration file, see the Elasticluster documentation.

Run a cluster of Compute Engine VMs

The following steps show you how to start a cluster, interact with the cluster, and stop the cluster.

Complete these steps on your local machine. The first time you start a cluster, you'll need to authorize Elasticluster to issue Compute Engine API requests on your behalf. The authorization flow launches a web browser on the same machine from which you started the cluster.

If you run Elasticluster on a remote machine, or in any other environment that can't open a web browser, edit ~/.elasticluster/config before you start the cluster and add noauth_local_webserver=true to the cloud/google-cloud section, leaving the section's other settings in place:

# Create a cloud provider called "google-cloud"
[cloud/google-cloud]
provider=google
noauth_local_webserver=true
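To make that edit from a script instead, the following Bash sketch inserts the line directly below the [cloud/google-cloud] header and leaves the file unchanged if the setting is already present. The function name enable_noauth is hypothetical, and it assumes the section header appears exactly as [cloud/google-cloud] on its own line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add noauth_local_webserver=true to [cloud/google-cloud].
enable_noauth() {
  local cfg="${1:-$HOME/.elasticluster/config}"
  if ! grep -qxF '[cloud/google-cloud]' "$cfg"; then
    echo "error: no [cloud/google-cloud] section in $cfg" >&2
    return 1
  fi
  # Already configured: nothing to do.
  if grep -q '^noauth_local_webserver=' "$cfg"; then
    return 0
  fi
  # Print every line, and emit the new setting right after the section header.
  awk '{ print } $0 == "[cloud/google-cloud]" { print "noauth_local_webserver=true" }' \
    "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}

# Usage:
# enable_noauth ~/.elasticluster/config
```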

Start a cluster

Run the following command to start the cluster:

elasticluster start gridengine

The setup process might take several minutes. After the cluster starts, the following message prints to the console:

Your cluster `gridengine` is ready!

Cluster name:     gridengine
Cluster template: gridengine
Default ssh to node: frontend001
- frontend nodes: 1
- compute nodes: 3

To login on the frontend node, run the command:

    elasticluster ssh gridengine

To upload or download files to the cluster, use the command:

    elasticluster sftp gridengine

To get verbose output, use the -v flag:

elasticluster start gridengine -v

List cluster instances

To list the instances in your cluster, run the following command:

elasticluster list-nodes gridengine

A message similar to the following appears:

Cluster name:     gridengine
Cluster template: gridengine
Default ssh to node: frontend001
- frontend nodes: 1
- compute nodes: 3

To login on the frontend node, run the command:

    elasticluster ssh gridengine

To upload or download files to the cluster, use the command:

    elasticluster sftp gridengine

frontend nodes:

  - frontend001
    connection IP: 203.0.113.1
    IPs:    203.0.113.1
    instance id:   gridengine-frontend001
    instance flavor: n1-standard-1

compute nodes:

  - compute001
    connection IP: 198.51.100.1
    IPs:    198.51.100.1
    instance id:   gridengine-compute001
    instance flavor: n1-standard-1

...
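To use the node names in scripts, you can extract them from this listing. The following sketch assumes the format shown above, where each node appears on its own line as a dash followed by the node name; the function name node_names is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `elasticluster list-nodes` output on stdin and
# print one node name per line. Node lines have exactly two fields: "-" NAME.
# Summary lines such as "- compute nodes: 3" have more fields and are skipped.
node_names() {
  awk '$1 == "-" && NF == 2 { print $2 }'
}

# Usage:
# elasticluster list-nodes gridengine | node_names
```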

Copy files to cluster instances

Elasticluster's sftp command opens an SFTP session to the cluster's frontend node, which you can use to upload files to the cluster and download files from it. For more information about using SFTP with Elasticluster, see the Elasticluster documentation. To open an SFTP session, run the following command:

elasticluster sftp gridengine

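To send a list of commands over SFTP without an interactive session, use a here document. For example, the following uploads every .sh file in your current local directory to the frontend node: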

elasticluster sftp gridengine << 'EOF'
put *.sh
EOF
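When the files to upload aren't all matched by one pattern, you can generate the command list instead of typing it. The following sketch prints one put command per file name; the function name sftp_put_commands is hypothetical. It wraps each name in double quotes so that sftp treats a name containing spaces as a single argument, and it assumes no name itself contains a double quote.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one sftp "put" command per file argument.
sftp_put_commands() {
  local f
  for f in "$@"; do
    # Double quotes keep names with spaces together for sftp.
    printf 'put "%s"\n' "$f"
  done
}

# Usage: pipe the commands into the session, as with the here document above.
# sftp_put_commands job1.sh "my job.sh" | elasticluster sftp gridengine
```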

For more information on SFTP, view the SFTP man page.

Connect to cluster instances

With Elasticluster, you can use SSH to connect to any of your cluster nodes. If you run elasticluster ssh gridengine without specifying a node, Elasticluster automatically connects to the frontend node:

elasticluster ssh gridengine

To connect to other nodes in the cluster, add the -n flag and specify the name of the node:

elasticluster ssh gridengine -n NODE_NAME

For example, to connect to the compute001 node from the output in List cluster instances, run the following command:

elasticluster ssh gridengine -n compute001
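To run the same command on several nodes, you can loop over their names. The following Bash sketch assumes that your version of Elasticluster passes any arguments after the node name to ssh as a command to run remotely instead of opening an interactive shell; check elasticluster ssh --help before relying on it. The function name run_on_nodes and the ELASTICLUSTER override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one command on each named node of a cluster.
# Assumes `elasticluster ssh CLUSTER -n NODE -- COMMAND` runs COMMAND remotely.
# Set ELASTICLUSTER=echo for a dry run that only prints the commands.
run_on_nodes() {
  local cluster="$1" cmd="$2" node status=0
  shift 2
  for node in "$@"; do
    echo "== $node"
    "${ELASTICLUSTER:-elasticluster}" ssh "$cluster" -n "$node" -- "$cmd" || status=1
  done
  # Non-zero if the command failed on any node.
  return "$status"
}

# Usage:
# run_on_nodes gridengine hostname compute001 compute002 compute003
```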

Exit the virtualenv

To exit the virtualenv, run the deactivate command from the command line:

deactivate

To use Elasticluster commands again, reactivate the virtualenv by running source elasticluster/bin/activate from the directory where you created it.

Clean up

After you finish this tutorial, you can clean up the resources you created on Google Cloud Platform so you won't be billed for them in the future. The following sections describe how to delete or turn off these resources.

Destroy the cluster

To stop the cluster and turn off all of the cluster instances, run the following command:

elasticluster stop gridengine

This command prompts you to confirm that you want to stop the cluster. To stop the cluster without the prompt, add the --yes flag to the command:

elasticluster stop --yes gridengine
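After the stop command finishes, you can double-check that no cluster VMs remain. The following sketch lists Compute Engine instances whose names start with the cluster name followed by a hyphen, matching the instance IDs shown in List cluster instances (for example, gridengine-frontend001). It uses gcloud's --filter and --format flags; the function name and the GCLOUD override variable are hypothetical, and the filter would also match any unrelated instance whose name happens to start with the same prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of Compute Engine instances that still
# belong to a cluster, assuming they are named "CLUSTER-<node>".
# GCLOUD may name another command (for example, a stub) to use instead of gcloud.
cluster_instances_remaining() {
  local cluster="$1"
  if [ -z "$cluster" ]; then
    echo "usage: cluster_instances_remaining CLUSTER" >&2
    return 2
  fi
  "${GCLOUD:-gcloud}" compute instances list \
    --filter="name~^${cluster}-" --format="value(name)"
}

# Usage:
# remaining=$(cluster_instances_remaining gridengine)
# if [ -z "$remaining" ]; then echo "No gridengine instances remain."; else echo "$remaining"; fi
```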

Delete the project

The easiest way to eliminate billing is to delete the project you used for the tutorial.

To delete the project:

  1. In the GCP Console, go to the Projects page.

  2. In the project list, select the checkbox next to the project you want to delete, and then click Delete project.

  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

  • See the Cloud Genomics Elasticluster fork. This fork provides bug fixes and enhancements that are relevant to Google Cloud Platform use cases.