gcloud Examples

This page provides examples of using the gcloud command-line tool to perform common Pipelines API operations. Full details are available in the gcloud alpha genomics reference.

As an alternative to the gcloud tool, consider using a workflow engine like Cromwell or a job submission tool like dsub. These tools add powerful user interfaces and workflow definition features that supplement the Pipelines API.

Outputting a command to a Cloud Storage bucket

When running the Pipelines API using gcloud, you can specify single commands using the --command-line flag:

gcloud alpha genomics pipelines run \
    --regions us-east1 \
    --logging gs://BUCKET/my-path/example.log \
    --command-line 'echo "Hello, world!"'

This command creates a Compute Engine VM in your Google Cloud Platform project in the us-east1 region. After the VM starts, it pulls the Cloud SDK Docker image and runs the command: echo "Hello, world!". The output of the command is written to a log file in a Cloud Storage bucket specified with the --logging flag.
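To view the output after the pipeline finishes, you can read the log file back with gsutil (using the same BUCKET and path placeholders as above):

gsutil cat gs://BUCKET/my-path/example.log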

To use an image other than Cloud SDK, use the --docker-image flag. For example, the Quickstart uses the gcr.io/genomics-tools/samtools image.
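For example, a run that uses the samtools image might look like the following (a sketch; samtools --version is only an illustrative command, not part of the Quickstart):

gcloud alpha genomics pipelines run \
    --regions us-east1 \
    --logging gs://BUCKET/my-path/example.log \
    --docker-image gcr.io/genomics-tools/samtools \
    --command-line 'samtools --version'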

Checking the status of a pipeline

After you run a pipeline using gcloud, the command returns the following:

Running [projects/PROJECT_ID/operations/OPERATION_ID]

You can use the OPERATION_ID value to check the status of the pipeline by running the following command:

gcloud alpha genomics operations describe OPERATION_ID

Running the operations describe command provides the following details about the pipeline:

  • Whether it has started
  • Whether it is in progress
  • Whether it finished successfully or encountered errors

To see only whether the operation has completed, use the --format flag:

gcloud --format="value(done)" alpha genomics operations describe OPERATION_ID

The gcloud tool provides other features for filtering operations and formatting the displayed values. Read the documentation for the --filter and --format flags for more information.
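For example, to print only the operation's event history, you can combine describe with a format projection (a sketch; the metadata.events field assumes the operation metadata layout described in the reference):

gcloud --format="yaml(metadata.events)" alpha genomics operations describe OPERATION_ID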

Cancelling a pipeline run

To cancel a running pipeline:

gcloud alpha genomics operations cancel OPERATION_ID

Note that the operation is not marked as done immediately. The Compute Engine VM must be deleted before the operation completes, which can take a few minutes.
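If a script needs to block until the operation actually completes, one approach is to poll the done field, as in this bash sketch (the 30-second interval is an arbitrary choice):

while [ "$(gcloud --format='value(done)' alpha genomics operations describe OPERATION_ID)" != "True" ]; do
    sleep 30
done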

Passing input parameters

Use the --inputs flag to pass input parameters to the Docker image:

gcloud alpha genomics pipelines run \
    --regions us-east1 \
    --logging gs://BUCKET/my-path/example.log \
    --command-line 'echo "${MESSAGE}"' \
    --inputs MESSAGE='Hello, world!'

The parameters are passed by name as environment variables to the command running inside the Docker container.
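You can pass several parameters in one flag as a comma-separated list of KEY=VALUE pairs. For example (a sketch; GREETING and NAME are arbitrary parameter names):

gcloud alpha genomics pipelines run \
    --regions us-east1 \
    --logging gs://BUCKET/my-path/example.log \
    --command-line 'echo "${GREETING}, ${NAME}!"' \
    --inputs GREETING=Hello,NAME=world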

Specifying input and output files

Use the --inputs flag to specify input files:

gcloud alpha genomics pipelines run \
    --regions us-east1 \
    --logging gs://BUCKET/my-path/example.log \
    --command-line 'cat "${INPUT_FILE}" | md5sum' \
    --inputs INPUT_FILE=gs://BUCKET/INPUT_FILE

The gs://BUCKET/my-path/example.log log file contains the resulting md5sum.

Use the --outputs flag to write the results of the command to a file in Cloud Storage:

gcloud alpha genomics pipelines run \
    --regions us-east1 \
    --logging gs://BUCKET/my-path/example.log \
    --command-line 'cat "${INPUT_FILE}" | md5sum > "${OUTPUT_FILE}"' \
    --inputs INPUT_FILE=gs://BUCKET/INPUT_FILE \
    --outputs OUTPUT_FILE=gs://BUCKET/OUTPUT_FILE.md5

Input and output files can contain wildcards, so you can pass multiple files at once. Recursive copy is not available.
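As a hedged sketch of a wildcard input (the exact local path that the pattern maps to inside the container is an assumption; check the reference documentation), the variable is left unquoted so that the shell expands the localized pattern:

gcloud alpha genomics pipelines run \
    --regions us-east1 \
    --logging gs://BUCKET/my-path/example.log \
    --command-line 'md5sum ${INPUT_FILES}' \
    --inputs INPUT_FILES=gs://BUCKET/my-path/*.txt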

Using preemptible VMs

You can use a preemptible VM, which can be up to 80% cheaper than a regular VM. However, Compute Engine might terminate (preempt) the VM if it needs the VM's resources for other tasks. If your VM is preempted, you must restart your pipeline.

To use a preemptible VM, use the --preemptible flag when running your pipeline:

gcloud alpha genomics pipelines run \
    --regions us-east1 \
    --logging gs://BUCKET/my-path/example.log \
    --command-line 'echo "Hello, world!"' \
    --preemptible

Setting VM instance types

By default, the Compute Engine VM that runs the pipeline is an n1-standard-1. You can specify the amount of memory, in GB, with the --memory flag and the number of CPU cores with the --cpus flag:

gcloud alpha genomics pipelines run \
    --regions us-east1 \
    --logging gs://BUCKET/my-path/example.log \
    --command-line 'echo "Hello, world!"' \
    --cpus MIN_CPUS \
    --memory MIN_RAM_GB

If you are specifying your pipeline using a YAML or JSON file, you can specify any supported Compute Engine machine type by entering it in the machineType field.
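For example, a pipeline file that requests a specific machine type might look like this (a sketch assuming the Pipeline message layout, with machineType under resources.virtualMachine):

actions:
- commands: [ '-c', 'echo "Hello, world!"' ]
  imageUri: bash
resources:
  virtualMachine:
    machineType: n1-standard-4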

Writing complex pipeline definitions

For more complex pipelines, define the pipeline in a YAML or JSON file and pass the file to gcloud with the --pipeline-file flag. The file must contain a single Pipeline message, as described in the reference documentation. Defining your pipeline this way lets you specify advanced features including multiple disks and background containers.

To convert the command from the Hello, world! example to a pipeline file, copy the following text and save it to a file named hello.yaml:

actions:
- commands: [ '-c', 'echo "Hello, world!"' ]
  imageUri: bash

Run the hello.yaml file by passing the --pipeline-file flag to gcloud:

gcloud alpha genomics pipelines run \
    --regions us-east1 \
    --logging gs://BUCKET/my-path/example.log \
    --pipeline-file hello.yaml

Using multiple Docker containers

The preceding examples focus on running a single command in a single Docker container. If the pipeline requires running multiple commands in separate containers, you must define the pipeline in a YAML or JSON file.

You can add more steps to a YAML or JSON configuration file by entering them in the actions list. To run a pipeline using two Docker containers, add the following to a YAML configuration file:

actions:
- commands: [ 'echo', 'Hello from bash!' ]
  imageUri: bash
- commands: [ 'echo', 'Hello from ubuntu!' ]
  imageUri: ubuntu

When you run the pipeline, the first command will run in the bash Docker image, and the second command will run in the ubuntu Docker image.
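Pipeline files also support the background containers mentioned earlier. As a hedged sketch (assuming the Action message's flags field and its RUN_IN_BACKGROUND value), an action can run alongside the rest of the pipeline like this:

actions:
- commands: [ '-c', 'while true; do echo heartbeat; sleep 10; done' ]
  imageUri: bash
  flags: [ 'RUN_IN_BACKGROUND' ]
- commands: [ 'echo', 'Hello from ubuntu!' ]
  imageUri: ubuntu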
