Quickstart

This page shows you how to run a pipeline that uses the Cloud Genomics API to create an index file (BAI file) from a large binary file containing DNA sequences (BAM file).

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the GCP Console, go to the Manage resources page and select an existing project or create a new one.

  3. Make sure that billing is enabled for your project.

  4. Enable the Cloud Genomics, Compute Engine, and Cloud Storage JSON APIs.

Set up your local environment and install prerequisites

  1. Install and initialize the Cloud SDK. Alternatively, you can use Google Cloud Shell, which comes with the Cloud SDK already installed.

  2. Make sure your GOPATH environment variable is set by running the following command:

    echo $GOPATH
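
If the command prints an empty line, GOPATH is not set in the current shell. The following is a minimal sketch of one way to handle that, not a required step. It assumes the conventional workspace location $HOME/go, which this quickstart does not specify, and it adds $GOPATH/bin to PATH so that binaries installed by go get can be found.

```shell
# Keep an existing GOPATH; otherwise fall back to $HOME/go
# (an assumed default; adjust to your own workspace location).
export GOPATH="${GOPATH:-$HOME/go}"

# Add the Go binary directory to PATH only if it is not already there.
case ":${PATH}:" in
  *":${GOPATH}/bin:"*) ;;                        # already on PATH
  *) export PATH="${PATH}:${GOPATH}/bin" ;;
esac

echo "GOPATH=${GOPATH}"
```

To keep the setting across sessions, the same lines can go in your shell startup file.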
    

Run the pipeline

  1. Download and install the pipelines-tools sample using the go get command:

    go get github.com/googlegenomics/pipelines-tools/...
    

  2. Create a BUCKET environment variable that points to a Cloud Storage bucket named after your project ID with -genomics appended. In the following command, replace PROJECT_ID with your project ID.

    export BUCKET=gs://PROJECT_ID-genomics
    

  3. Create the bucket using the gsutil mb command:

    gsutil mb ${BUCKET}
    

  4. Create a script called index.script by running the following command. The script reads a BAM file (${INPUT0}) and writes the generated BAI file to ${OUTPUT0}.

    echo 'samtools index ${INPUT0} ${OUTPUT0} # image=gcr.io/genomics-tools/samtools' > index.script
    

  5. Run the pipeline with index.script, passing the BAM file as the input and the BAI file as the output. The pipeline invokes the Cloud Genomics API, creates a Compute Engine VM instance, and then runs the pipeline process on the instance. After the process finishes, the instance is automatically shut down and the BAI file is copied to your Cloud Storage bucket.

    pipelines run \
        --inputs=gs://genomics-public-data/NA12878.chr20.sample.bam \
        --outputs=${BUCKET}/NA12878.chr20.sample.bam.bai \
        index.script
    

    If successful, the command returns the following:

    ...
    Pipeline execution completed
    

  6. Verify that the BAI file was generated:

    gsutil ls ${BUCKET}
    

    The command should return the following, where PROJECT_ID is your project ID:

    gs://PROJECT_ID-genomics/NA12878.chr20.sample.bam.bai
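
The check in step 6 can also be scripted instead of read by eye. The sketch below is illustrative only: has_object is a hypothetical helper, not part of gsutil, and the sample listing stands in for real gsutil ls output so that the example runs without access to a bucket.

```shell
# has_object EXPECTED: succeed if any line on stdin equals EXPECTED exactly.
# -q: no output, -x: match the whole line, -F: treat EXPECTED as a fixed string.
has_object() {
  grep -qxF -- "$1"
}

# Stand-in for `gsutil ls` output; the bucket name is an illustrative value.
sample_listing='gs://my-project-genomics/NA12878.chr20.sample.bam.bai'

if printf '%s\n' "${sample_listing}" |
    has_object 'gs://my-project-genomics/NA12878.chr20.sample.bam.bai'; then
  echo "BAI file found"
else
  echo "BAI file missing" >&2
fi
```

Against a real bucket, the same helper could be fed directly: gsutil ls "${BUCKET}" | has_object "${BUCKET}/NA12878.chr20.sample.bam.bai"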
    

You've just run a pipeline using the Cloud Genomics API to create a BAI file from a BAM file.
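
One detail of step 4 is easy to miss: the single quotes around the echo argument stop your local shell from expanding ${INPUT0} and ${OUTPUT0}, so the placeholders are written to index.script as literal text for the pipeline to resolve. The sketch below writes the same script into a temporary directory (an assumption made here so that no existing index.script is overwritten) and checks that the placeholders survived.

```shell
# Work in a throwaway directory so a real index.script is left untouched.
workdir="$(mktemp -d)"
script="${workdir}/index.script"

# Single quotes: the text is written exactly as typed, placeholders included.
echo 'samtools index ${INPUT0} ${OUTPUT0} # image=gcr.io/genomics-tools/samtools' > "${script}"

# grep -F matches fixed strings, so this succeeds only if the literal
# text ${INPUT0} and ${OUTPUT0} is present in the file.
if grep -qF '${INPUT0}' "${script}" && grep -qF '${OUTPUT0}' "${script}"; then
  echo "placeholders preserved"
else
  echo "placeholders were expanded by the local shell" >&2
fi
```

With double quotes instead, the local shell would substitute its own values for the two variables, typically empty strings, before the file is written.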

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:

  1. Use the gsutil rm command to delete the BAI file:

    gsutil rm ${BUCKET}/NA12878.chr20.sample.bam.bai
    

  2. If you created the bucket specifically for this quickstart and no longer need it, delete it using the gsutil rb command:

    gsutil rb ${BUCKET}
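
If you run the cleanup commands in a new shell session, the BUCKET variable from the earlier steps may no longer be set. As a hedged sketch, a small guard can confirm the variable before anything is deleted. require_bucket is a hypothetical helper name introduced here for illustration.

```shell
# require_bucket: succeed only if BUCKET is set to a gs:// URL with a
# non-empty bucket name; otherwise print a message and fail.
require_bucket() {
  case "${BUCKET:-}" in
    gs://?*) return 0 ;;
    *)
      echo "BUCKET is not set to a gs:// URL; export it before cleaning up" >&2
      return 1
      ;;
  esac
}
```

The guard can then be chained in front of a cleanup command, for example: require_bucket && gsutil rb "${BUCKET}"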
    
