Cloud Life Sciences is deprecated and will no longer be available on Google Cloud after July 8, 2025. Use cases for Cloud Life Sciences are now supported by Batch. To learn how to migrate your workload, see Migrate to Batch.

Process genomic data by using Cloud Life Sciences

This page explains how to run a genomics pipeline that uses the Cloud Life Sciences API to create an index file (BAI file) from a binary file containing DNA sequences (BAM file).

BAM files are typically large and can take a long time to read using a genome viewer. You use a BAI file to locate the portions of the BAM file that contain the genome position you are interested in.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Make sure that billing is enabled for your Google Cloud project.

Enable the Cloud Life Sciences, Compute Engine, and Cloud Storage JSON APIs.

Install the Google Cloud CLI.

To initialize the gcloud CLI, run the following command:

gcloud init

Alternatively, you can use Cloud Shell, which comes with the gcloud CLI already installed.

Install Python 3.8.
If you installed the Google Cloud CLI on Windows and left the option to install bundled Python selected, Python was installed automatically.

Run the pipeline

To run the pipeline, complete the following steps:

Create a bucket where you store the BAI file. Buckets are the basic containers that hold your data in Cloud Storage. To create a bucket named PROJECT_ID-life-sciences, run the gsutil mb command:
```
gsutil mb gs://PROJECT_ID-life-sciences
```
Replace PROJECT_ID with your Google Cloud project ID. You must use a globally unique bucket name.
Bucket names must meet the following requirements:
- Bucket names can only contain lowercase letters, numeric characters, dashes (-), underscores (_), and dots (.). Spaces are not allowed.
- Bucket names must start and end with a number or letter.
- Bucket names must contain 3-63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters.
- Bucket names cannot be represented as an IP address in dotted-decimal notation (for example, 192.168.5.4).
- Bucket names cannot begin with the "goog" prefix.
- Bucket names cannot contain "google" or close misspellings, such as "g00gle".
Caution: Do not include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
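
The naming rules above can be checked locally before you call gsutil mb. The following is a minimal sketch in POSIX shell, not an official validator: the function name is_valid_bucket_name is hypothetical, and the "close misspellings" rule is only approximated by rejecting the two literal strings "google" and "g00gle".

```sh
# Hypothetical helper (not part of gsutil): returns 0 if the name passes
# the rules listed above, 1 otherwise. Uses global variables because
# POSIX sh has no "local".
is_valid_bucket_name() {
    name=$1

    # Only lowercase letters, digits, dashes, underscores, and dots.
    case $name in
        ''|*[!abcdefghijklmnopqrstuvwxyz0123456789._-]*) return 1 ;;
    esac

    # Must start and end with a letter or digit.
    case $name in
        [._-]*|*[._-]) return 1 ;;
    esac

    # 3-63 characters, or up to 222 if the name contains dots.
    case $name in
        *.*) max=222 ;;
        *)   max=63 ;;
    esac
    [ "${#name}" -ge 3 ] && [ "${#name}" -le "$max" ] || return 1

    # Each dot-separated component can be at most 63 characters.
    rest=$name
    while :; do
        part=${rest%%.*}
        [ "${#part}" -le 63 ] || return 1
        case $rest in
            *.*) rest=${rest#*.} ;;
            *)   break ;;
        esac
    done

    # Must not look like an IP address in dotted-decimal notation.
    if printf '%s\n' "$name" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
        return 1
    fi

    # Must not begin with "goog" or contain "google" / "g00gle".
    # Assumption: other misspellings are not detected by this sketch.
    case $name in
        goog*|*google*|*g00gle*) return 1 ;;
    esac

    return 0
}
```

For example, `is_valid_bucket_name "PROJECT_ID-life-sciences"` fails because of the uppercase placeholder, which is a quick reminder to substitute your real project ID first. The check does not tell you whether the name is already taken, since global uniqueness can only be confirmed by Cloud Storage itself.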

If successful, the command returns the following:
```
Creating gs://PROJECT_ID-life-sciences
```

To start the pipeline, run the gcloud beta lifesciences pipelines run command:

gcloud beta lifesciences pipelines run \
    --regions us-east1 \
    --command-line 'samtools index ${BAM} ${BAI}' \
    --docker-image "gcr.io/cloud-lifesciences/samtools" \
    --inputs BAM=gs://genomics-public-data/NA12878.chr20.sample.bam \
    --outputs BAI=gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai

If successful, the command returns the following:

Running [projects/PROJECT_ID/operations/OPERATION_ID]

Note the OPERATION_ID, which you use in the next step.
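
If you script these steps, you can extract the operation ID from the output line instead of copying it by hand. The sketch below assumes the output has the form shown above, with the ID as the last path segment inside the square brackets. The function name extract_operation_id is hypothetical.

```sh
# Hypothetical helper: prints the text between the last "/operations/"
# and the closing "]" of a line such as
#   Running [projects/PROJECT_ID/operations/OPERATION_ID]
# Prints nothing if the line does not match.
extract_operation_id() {
    printf '%s\n' "$1" | sed -n 's|.*/operations/\([^]/]*\)].*|\1|p'
}

# Example usage (commented out so that the block has no side effects).
# Assumption: gcloud may write the "Running [...]" line to stderr,
# so 2>&1 is used to capture both streams.
#   output=$(gcloud beta lifesciences pipelines run ... 2>&1)
#   OPERATION_ID=$(extract_operation_id "$output")
```

Matching on the last path segment rather than a fixed prefix keeps the helper working if the operation name contains additional path components.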

To track the pipeline's status, run the gcloud beta lifesciences operations wait command. Replace OPERATION_ID with the value printed in the previous step. The pipeline takes a few minutes to finish.
```
gcloud beta lifesciences operations wait OPERATION_ID
```
After the operation finishes, it returns the following message:
```
Waiting for [projects/PROJECT_ID/operations/OPERATION_ID]...done.
```
To verify that the BAI file was generated, run the gsutil ls command:
```
gsutil ls gs://PROJECT_ID-life-sciences
```
If successful, the command returns the following:
```
gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai
```
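
In a script, the same verification can be done without reading the listing by eye. The following sketch uses two hypothetical helpers: bai_uri_for builds the expected output URI from a bucket name and the input BAM URI, and listing_contains checks whether a listing read from standard input has a line that equals that URI exactly.

```sh
# Hypothetical helper: prints gs://BUCKET/<BAM file name>.bai
# $1 = bucket name (without gs://), $2 = URI of the input BAM file.
bai_uri_for() {
    printf 'gs://%s/%s.bai\n' "$1" "${2##*/}"
}

# Hypothetical helper: reads a listing on stdin and succeeds only if
# one whole line is exactly equal to $1 (fixed string, no regex).
listing_contains() {
    grep -Fxq -- "$1"
}

# Example usage (commented out so that the block has no side effects):
#   expected=$(bai_uri_for PROJECT_ID-life-sciences \
#       gs://genomics-public-data/NA12878.chr20.sample.bam)
#   gsutil ls gs://PROJECT_ID-life-sciences | listing_contains "$expected" \
#       && echo "BAI file found"
```

The exact whole-line match avoids false positives from objects whose names merely start with the expected file name.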

You've run a pipeline using the Cloud Life Sciences API to create a BAI file from a BAM file. Use a genome viewer to examine the NA12878.chr20.sample.bam BAM file using the NA12878.chr20.sample.bam.bai index file.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Delete the BAI file

To delete the generated BAI file but keep the project and bucket you created, run the gsutil rm command:

gsutil rm gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai

Delete the bucket

If you created the bucket specifically for this quickstart and no longer need it, but want to keep your project, delete the bucket using the gsutil rb command. The gsutil rb command removes only empty buckets, so delete the generated BAI file first, as described in the previous section.

gsutil rb gs://PROJECT_ID-life-sciences

Delete the project

If you created the project specifically for this quickstart and no longer need it, you can delete the project. Deleting the project also deletes the BAI file and the Cloud Storage bucket.

In the Google Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

Learn more about Cloud Life Sciences API public datasets.
Learn how to load variant data into Cloud Storage or BigQuery.
Learn how to analyze variants with BigQuery.