Quickstart

This page shows you how to run a genomics pipeline that uses the Cloud Life Sciences API to create an index file (BAI file) from a large binary file containing DNA sequences (BAM file).

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the GCP Console, on the project selector page, select or create a GCP project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Google Cloud Platform project. Learn how to enable billing.

  4. Enable the Cloud Life Sciences, Compute Engine, and Cloud Storage JSON APIs.

    Enable the APIs

  5. Install and initialize the Cloud SDK. Alternatively, you can use Cloud Shell, which comes with the Cloud SDK already installed.
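
Before continuing, it can help to confirm that the command-line tools used on this page are available in your shell. The following is a minimal sketch rather than part of the official setup. `missing_tools` and `enable_quickstart_apis` are hypothetical helper names, and the three service names passed to `gcloud services enable` are assumed identifiers for the APIs listed in step 4; verify them in the API Library before relying on them.

```shell
# Print the name of each listed command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of the Cloud SDK.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Enable the APIs from step 4 from the command line instead of the console.
# The service names below are assumptions; verify them before use.
enable_quickstart_apis() {
  gcloud services enable \
    lifesciences.googleapis.com \
    compute.googleapis.com \
    storage-api.googleapis.com
}
```

For example, `missing_tools gcloud gsutil` prints nothing when both tools are installed. If it prints a name, install the Cloud SDK or switch to Cloud Shell.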

Run the pipeline

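You can run the pipeline from a Bash shell (Linux, macOS, or Cloud Shell) or from Windows PowerShell. Both sets of steps use the gcloud and gsutil command-line tools.

Bash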

  1. Create a BUCKET environment variable. The variable points to a Cloud Storage bucket named after your project ID with -life-sciences appended. Replace PROJECT_ID with your project ID.

    export BUCKET=gs://PROJECT_ID-life-sciences
    
  2. Create the bucket using the gsutil mb command:

    gsutil mb ${BUCKET}
    
  3. Run a pipeline using the gcloud command-line tool, specifying a BAM file name for the input and a BAI file name for the output. The pipeline invokes the Cloud Life Sciences API, creates a Compute Engine VM instance, and then runs the pipeline process on the instance. After the process finishes, the instance is automatically shut down and the BAI file is copied to your Cloud Storage bucket.

    gcloud beta lifesciences workflows run \
        --regions us-east1 \
        --command-line 'samtools index ${BAM} ${BAI}' \
        --docker-image "gcr.io/genomics-tools/samtools" \
        --inputs BAM=gs://genomics-public-data/NA12878.chr20.sample.bam \
        --outputs BAI=${BUCKET}/NA12878.chr20.sample.bam.bai
    

    If successful, the command returns the following:

    Running [projects/PROJECT_ID/operations/OPERATION_ID]
    
  4. The pipeline takes a few minutes to finish. Run the following command to wait for the operation to complete. Replace OPERATION_ID with the value printed in the previous step.

    gcloud beta lifesciences operations wait OPERATION_ID
    

    After the operation finishes, it returns the following message:

    Waiting for [projects/PROJECT_ID/operations/OPERATION_ID]...done.
    
  5. Verify that the BAI file was generated:

    gsutil ls ${BUCKET}
    

    The command should return the following:

    gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai
    

You've just run a pipeline using the Cloud Life Sciences API to create a BAI file from a BAM file.
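
The steps above can be combined into one script. The following is a sketch under stated assumptions, not an official tool: all function names are hypothetical, the `gcloud` command is reused exactly as shown in step 3, and the operation ID is assumed to be the last path segment inside the square brackets of the `Running [...]` line. Because that line may be written to standard error rather than standard output, the sketch captures both streams.

```shell
# Hypothetical helpers for scripting the steps above; names are illustrative.

# Build the bucket URL used in step 1 from a project ID.
bucket_for_project() {
  printf 'gs://%s-life-sciences\n' "$1"
}

# Build the output BAI URL from a bucket URL and the input BAM URL.
bai_url_for() {
  bucket=$1
  bam_url=$2
  # ${bam_url##*/} keeps only the file name after the last "/".
  printf '%s/%s.bai\n' "$bucket" "${bam_url##*/}"
}

# Pull OPERATION_ID out of output containing a line such as
#   Running [projects/PROJECT_ID/operations/OPERATION_ID]
extract_operation_id() {
  path=$(printf '%s\n' "$1" | sed -n 's/^Running \[\(.*\)].*$/\1/p' | head -n 1)
  printf '%s\n' "${path##*/}"
}

# Run steps 1 to 5 for one BAM file. Requires gcloud and gsutil.
run_quickstart() {
  project_id=$1
  bam_url=$2
  bucket=$(bucket_for_project "$project_id")
  bai_url=$(bai_url_for "$bucket" "$bam_url")

  gsutil mb "$bucket" || return 1

  # The single quotes are intentional: ${BAM} and ${BAI} must reach the
  # pipeline unexpanded, as in step 3.
  run_output=$(gcloud beta lifesciences workflows run \
    --regions us-east1 \
    --command-line 'samtools index ${BAM} ${BAI}' \
    --docker-image "gcr.io/genomics-tools/samtools" \
    --inputs "BAM=$bam_url" \
    --outputs "BAI=$bai_url" 2>&1) || return 1

  operation_id=$(extract_operation_id "$run_output")
  gcloud beta lifesciences operations wait "$operation_id" || return 1
  gsutil ls "$bucket"
}
```

For example, `run_quickstart PROJECT_ID gs://genomics-public-data/NA12878.chr20.sample.bam` would repeat the steps on this page for the sample BAM file.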

PowerShell

  1. Create a BUCKET variable. The variable points to a Cloud Storage bucket named after your project ID with -life-sciences appended. Replace PROJECT_ID with your project ID.

    $BUCKET = "gs://PROJECT_ID-life-sciences"
    
  2. Create the bucket using the gsutil mb command:

    gsutil mb ${BUCKET}
    
  3. Run a pipeline using the gcloud command-line tool, specifying a BAM file name for the input and a BAI file name for the output. The pipeline invokes the Cloud Life Sciences API, creates a Compute Engine VM instance, and then runs the pipeline process on the instance. After the process finishes, the instance is automatically shut down and the BAI file is copied to your Cloud Storage bucket.

    gcloud beta lifesciences workflows run `
        --regions us-east1 `
        --command-line 'samtools index ${BAM} ${BAI}' `
        --docker-image "gcr.io/genomics-tools/samtools" `
        --inputs BAM=gs://genomics-public-data/NA12878.chr20.sample.bam `
        --outputs BAI=${BUCKET}/NA12878.chr20.sample.bam.bai
    

    If successful, the command returns the following:

    Running [projects/PROJECT_ID/operations/OPERATION_ID]
    
  4. The pipeline takes a few minutes to finish. Run the following command to wait for the operation to complete. Replace OPERATION_ID with the value printed in the previous step.

    gcloud beta lifesciences operations wait OPERATION_ID
    

    After the operation finishes, it returns the following message:

    Waiting for [projects/PROJECT_ID/operations/OPERATION_ID]...done.
    
  5. Verify that the BAI file was generated:

    gsutil ls ${BUCKET}
    

    The command should return the following:

    gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai
    

You've just run a pipeline using the Cloud Life Sciences API to create a BAI file from a BAM file.

Clean up

To avoid incurring charges to your GCP account for the resources used in this tutorial, you can clean up the resources you created on GCP. The following sections describe how to delete or turn off these resources.

Delete the project

If you created the project specifically for this quickstart and no longer need it, you can delete the project. Deleting the project also deletes the Cloud Storage bucket and the BAI file.

  1. In the GCP Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the BAI file

To delete the generated BAI file but keep the project and bucket you created, run the gsutil rm command:

gsutil rm ${BUCKET}/NA12878.chr20.sample.bam.bai
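
If you script this cleanup, you might check that the file exists before deleting it, so that a repeated run does not fail. The following is a minimal sketch: `delete_if_present` is a hypothetical helper, and it relies on `gsutil -q stat` exiting with a nonzero status when the object is missing.

```shell
# Delete a Cloud Storage object only if it exists.
# delete_if_present is a hypothetical helper, not part of gsutil.
delete_if_present() {
  object_url=$1
  if gsutil -q stat "$object_url"; then
    gsutil rm "$object_url"
  else
    printf 'Nothing to delete at %s\n' "$object_url"
  fi
}
```

For example, `delete_if_present "${BUCKET}/NA12878.chr20.sample.bam.bai"` removes the BAI file on the first run and only prints a message on later runs.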

Delete the bucket

If you created the bucket specifically for this quickstart and no longer need it, but want to keep your project, delete the bucket and its contents using the gsutil rm -r command. The gsutil rb command removes only empty buckets, so it fails while the generated BAI file is still in the bucket.

gsutil rm -r ${BUCKET}
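
Because removing a bucket cannot be undone, a cleanup script might first confirm that the variable really holds a bucket created for this quickstart, for example to guard against an empty or mistyped BUCKET value. The following is a sketch with hypothetical helper names. It accepts only URLs of the form gs://PROJECT_ID-life-sciences and uses `gsutil rm -r`, which removes the objects in a bucket and then the bucket itself.

```shell
# Return success only for bucket URLs that follow this page's naming scheme,
# gs://PROJECT_ID-life-sciences, with no object path after the bucket name.
# is_quickstart_bucket is a hypothetical helper.
is_quickstart_bucket() {
  case "$1" in
    gs://*/*) return 1 ;;               # has an object path; not a bare bucket
    gs://?*-life-sciences) return 0 ;;
    *) return 1 ;;
  esac
}

# Remove the bucket and everything in it, but only if the name matches.
remove_quickstart_bucket() {
  if is_quickstart_bucket "$1"; then
    gsutil rm -r "$1"
  else
    printf 'Refusing to delete %s\n' "$1" >&2
    return 1
  fi
}
```

For example, `remove_quickstart_bucket "${BUCKET}"` deletes the bucket when BUCKET is set as in the steps above, and refuses with an error when BUCKET is empty or points somewhere else.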
