Running Nextflow

This page explains how to run a pipeline on Google Cloud using Nextflow.

The pipeline used in this tutorial is a proof-of-concept RNA-Seq pipeline intended to show how to use Nextflow on Google Cloud.


After completing this tutorial, you'll know how to:

  • Install Nextflow in Cloud Shell
  • Configure a Nextflow pipeline
  • Run a pipeline using Nextflow on Google Cloud


This tutorial uses billable components of Google Cloud, including:

  • Compute Engine
  • Cloud Storage

Use the Pricing Calculator to generate a cost estimate based on your projected usage. New Cloud Platform users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  4. Enable the Google Genomics, Compute Engine, and Cloud Storage APIs.

    Enable the APIs

Create a Cloud Storage bucket

Following the guidance outlined in the bucket and object naming guidelines, create a uniquely named bucket to store temporary work and output files throughout this tutorial.


Console

  1. In the Cloud Console, open the Cloud Storage browser:

    Go to the Cloud Storage browser

  2. Click Create bucket.

  3. In the Bucket name text box, enter the name you selected for BUCKET, and then click Create.


Cloud Shell

  1. Open Cloud Shell:

    Go to Cloud Shell

  2. Create a bucket using the following command, replacing BUCKET with the name you selected:

    gsutil mb gs://BUCKET
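As a sketch of the naming guidelines referenced above, the following checks a candidate name locally before creating the bucket. The BUCKET value here is a hypothetical example, not a name from this tutorial, and the regex covers only the basic rules (lowercase letters, digits, and dashes; 3-63 characters; starts and ends with a letter or digit).

```shell
# Validate a candidate bucket name against the basic naming rules
# before attempting to create the bucket with gsutil.
BUCKET="my-nextflow-tutorial-bucket"   # hypothetical name; pick your own unique one
if [[ "$BUCKET" =~ ^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$ ]]; then
  echo "valid bucket name: $BUCKET"
  # gsutil mb gs://${BUCKET}   # uncomment to create the bucket
else
  echo "invalid bucket name: $BUCKET" >&2
  exit 1
fi
```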

Install and configure Nextflow in Cloud Shell

To avoid having to install any software on your machine, run all the terminal commands in this tutorial from Cloud Shell.

  1. Open Cloud Shell.


  2. Install Nextflow in Cloud Shell.

    export NXF_VER=19.01.0
    export NXF_MODE=google
    curl https://get.nextflow.io | bash
  3. Clone the sample pipeline repository, which includes both the pipeline to run and the sample data it uses.

    git clone https://github.com/nextflow-io/rnaseq-nf
  4. Configure Nextflow:

    1. Change to the rnaseq-nf folder.

      cd rnaseq-nf

    2. Copy the following text and paste it at the end of the file named nextflow.config. Replace the PROJECT_ID and REGION variables with your own values; REGION is the region in which to run, such as us-central1. If any of these sections already exist in the config file, replace them with the values shown below.

      process {
          executor = 'google-pipelines'
      }

      cloud {
          instanceType = 'n1-standard-1'
      }

      google {
          project = 'PROJECT_ID'
          region = 'REGION'
      }
    3. Change back to the previous folder:

      cd ..

Run the Pipeline with Nextflow

  1. Run the pipeline, replacing BUCKET with your bucket and WORK_DIR with a folder in the bucket for intermediate files:

    ./nextflow run rnaseq-nf/ -w gs://BUCKET/WORK_DIR
  2. Nextflow continues to run in Cloud Shell while the pipeline executes; keep the session open until it finishes.
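The -w flag above takes a Cloud Storage URI for the work directory. A minimal sketch of assembling that URI from shell variables, where both the BUCKET and WORK_DIR values are hypothetical placeholders:

```shell
# Build the gs:// work-directory URI that the -w flag expects.
BUCKET="my-nextflow-tutorial-bucket"   # hypothetical; use your bucket name
WORK_DIR="workdir"                     # hypothetical; any path inside the bucket
WORK_URI="gs://${BUCKET}/${WORK_DIR}"
echo "$WORK_URI"
# The pipeline would then be launched as:
# ./nextflow run rnaseq-nf/ -w "$WORK_URI"
```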

Viewing output of the Nextflow pipeline

After the pipeline finishes, you can check the output as well as any logs, errors, commands run, and temporary files.

The final output file is saved in the Cloud Storage bucket as results/qc_report.html.

To check individual output files from each task as well as intermediate files:


Console

  1. In the Cloud Console, open the Cloud Storage browser:

    Go to the Cloud Storage browser

  2. Go to the BUCKET and WORK_DIR you specified when running the pipeline.

  3. There is a folder for each task that was run in the pipeline.

  4. Each folder contains the commands that were run, the output files, and the temporary files used during the workflow.


Cloud Shell

  1. Open Cloud Shell:

    Go to Cloud Shell

  2. Run the following command to list the outputs in your Cloud Storage bucket.

    gsutil ls gs://BUCKET/WORK_DIR
  3. The output shows a folder for each task that ran. Continue listing the contents of the subdirectories to see all the files created by the pipeline.
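Nextflow lays out the work directory as hash-prefixed task folders, each holding hidden .command.* files like those mentioned below. A local sketch of that layout, with made-up hash names, shows what the gsutil listing corresponds to:

```shell
# Recreate the two-level, hash-prefixed task-folder layout Nextflow uses
# (the "a1/b2c3d4" names here are invented for illustration).
mkdir -p work/a1/b2c3d4
touch work/a1/b2c3d4/.command.sh \
      work/a1/b2c3d4/.command.log \
      work/a1/b2c3d4/.command.err
find work -type f | sort
```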


You can either view the intermediate files created by the pipeline and choose which ones you want to keep, or remove them to reduce costs associated with Cloud Storage. To remove the files, see Deleting intermediate files in your Cloud Storage bucket.


Troubleshooting

  • If you encounter problems when running the pipeline, see Cloud Life Sciences API troubleshooting.

  • If your pipeline fails, you can check the logs for each task by looking in its folder in Cloud Storage at files such as .command.err, .command.log, and .command.out.

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

After you've finished this tutorial, you can clean up the resources that you created on Google Cloud so they won't take up quota and you won't be billed for them in the future. The following sections describe how to delete or turn off these resources.

Deleting intermediate files in your Cloud Storage bucket

When you run the pipeline, it stores intermediate files in gs://BUCKET/WORK_DIR. You can remove the files after the workflow completes to reduce Cloud Storage charges.

To view the amount of space used in the directory:

gsutil du -sh gs://BUCKET/WORK_DIR

To remove files from the work directory:


Console

  1. In the Cloud Console, open the Cloud Storage browser:

    Go to the Cloud Storage browser

  2. Go to the BUCKET and WORK_DIR you specified when running the pipeline.

  3. Browse through the subfolders and delete any unwanted files or directories. To delete all files, delete the entire WORK_DIR.


Cloud Shell

  1. Open Cloud Shell:

    Go to Cloud Shell

  2. To remove all of the intermediate files in the WORK_DIR directory:

    gsutil -m rm gs://BUCKET/WORK_DIR/**

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.
