Pipelines API Troubleshooting

The Google Genomics Pipelines API makes it easier to run batch computing tasks on Google Compute Engine virtual machines (VMs). This document describes where to look for troubleshooting information if a pipeline fails, including the pipeline operation resource, the operation's events, the operation log files, and the pipeline's Compute Engine VM.

Pipeline operation resource

When the pipelines.run() API is called, an operation resource is created to maintain important state about the pipeline. The operation name is returned so that you can track its progress.

The key fields in the operation for tracking progress are:

done

When the operation is created, the done value is set to false. When the operation completes (with success or failure), done is set to true.
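For example, you can check the done flag from the command line (a sketch, assuming the gcloud alpha genomics component is installed and `<operation-id>` is replaced with your operation's name):

```shell
gcloud alpha genomics operations describe <operation-id> \
  --format='value(done)'
```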

error

If the operation has failed, the done property will be set to true and an error code and message will be provided. Here are some examples:

Operation Canceled

Error text

error:
  code: 1
  message: Operation canceled at 2017-02-01T13:46:07-08:00

Description

The operation was canceled by the user through the operations.cancel API.
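A cancellation can be issued from the command line as well; for example (a sketch, assuming the gcloud alpha genomics component is installed):

```shell
gcloud alpha genomics operations cancel <operation-id>
```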

File not found

Input file

Error text

error:
  code: 5
  message: '9: Failed to localize files: failed to copy the following files: "gs://my-bucket/my-path/input.txt
    -> /mnt/data/input.txt (cp failed: gsutil -q -m cp gs://my-bucket/my-path/input.txt
    /mnt/data/input.txt, command failed: CommandException: No URLs matched: gs://my-bucket/my-path/input.txt\nCommandException:
    1 file/object could not be transferred.\n)"'

Description

The pipeline had an input parameter with:

localCopy: /mnt/data/input.txt
value: gs://my-bucket/my-path/input.txt

No object was found at the Cloud Storage path, gs://my-bucket/my-path/input.txt.
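Before re-running the pipeline, you can confirm whether the object exists; for example:

```shell
gsutil ls gs://my-bucket/my-path/input.txt
```

If no object matches, gsutil prints a CommandException and exits with a non-zero status.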

Output file

Error text

error:
  code: 5
  message: '10: Failed to delocalize files: failed to copy the following files: "/mnt/data/output.txt
    -> gs://my-bucket/my-path/
    (cp failed: gsutil -q -m cp -L /var/log/google-genomics/out.log /mnt/data/output.txt
    gs://my-bucket/my-path/,
    command failed: CommandException: No URLs matched: /mnt/data/output.txt\nCommandException:
    1 file/object could not be transferred.\n)"'

Description

The pipeline had an output parameter with:

localCopy: /mnt/data/output.txt
value: gs://my-bucket/my-path

When the docker run completed, no file was found at /mnt/data/output.txt on the operation's Compute Engine VM.

Docker image

Error text

error:
  code: 5
  message: |
    8: Failed to pull image gcr.io/my-project-id/my-non-existent-image: "gcloud docker -- pull gcr.io/my-project-id/my-non-existent-image" failed: exit status 1: Using default tag: latest
    Pulling repository gcr.io/my-project-id/my-non-existent-image
    Tag latest not found in repository gcr.io/my-project-id/my-non-existent-image

Description

The pipeline had a docker image specified as gcr.io/my-project-id/my-non-existent-image. The image could not be downloaded to the Compute Engine VM because the image did not exist.

Help with uploading Docker images can be found at Pushing to Container Registry.

VM shutdown

Error text

error:
  code: 10
  message: '13: VM ggp-3407728597191315463 shut down unexpectedly.'

error:
  code: 10
  message: '14: VM ggp-3630732807397428672 stopped unexpectedly.'

Description

The operation's Compute Engine VM shut down unexpectedly. This is typically due to:

  • A preemptible VM being preempted. To determine if a VM was preempted, follow these instructions.
  • The user explicitly stopping or deleting the VM instance.

Failed docker command

Error text

error:
  code: 10
  message: |-
    11: Docker run failed: command failed: [STDERR]
    . See logs at gs://my-bucket/my-path/logging

Description

The docker command executed but exited with a non-zero exit code.

Sufficient information may be found in the [STDERR] block included in the operation.

More detailed information can be found in the operation log files.
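For example, if the pipeline's logging path was gs://my-bucket/my-path/logging, the docker command's stderr can be fetched with gsutil (a sketch; the exact file name depends on the logging path form, described under "Pipeline operation log files" below):

```shell
gsutil cat gs://my-bucket/my-path/logging/<operation-id>-stderr.log
```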

metadata.events

As the pipeline executes, events are posted, along with timestamps, to the operation's metadata.events array. A typical pipeline's events look like:

metadata:
  events:
  - description: start
    startTime: '2016-06-21T22:11:19.033492348Z'
  - description: pulling-image
    startTime: '2016-06-21T22:11:19.033538217Z'
  - description: localizing-files
    startTime: '2016-06-21T22:11:41.986968591Z'
  - description: running-docker
    startTime: '2016-06-21T22:11:44.414132377Z'
  - description: delocalizing-files
    startTime: '2016-06-21T22:11:44.867186386Z'
  - description: ok
    startTime: '2016-06-21T22:11:48.363039120Z'

or if there is a failure running the pipeline's docker command:

metadata:
  events:
  - description: start
    startTime: '2016-06-22T17:26:19.774419593Z'
  - description: pulling-image
    startTime: '2016-06-22T17:26:19.774797917Z'
  - description: localizing-files
    startTime: '2016-06-22T17:27:51.700833219Z'
  - description: running-docker
    startTime: '2016-06-22T17:27:51.700872247Z'
  - description: fail
    startTime: '2016-06-22T17:27:54.305925814Z'

If your operation appears to be hanging, it may be that there is not enough quota to start your VM. Quota warnings are added to the events list while the operation is in progress:

metadata:
  events:
  - description: 'Warning: Quota ''CPUS'' exceeded. Limit: 24.0. Region: us-east1, will try again'
    startTime: '2017-02-11T07:51:45.984778139Z'
  - description: 'Warning: Creating VM and disk(s) would exceed "CPUS" in region us-east1, will try again'
    startTime: '2017-02-11T09:43:35.768859933Z'
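To watch for these warnings on a running operation, you can poll the events list; for example:

```shell
gcloud alpha genomics operations describe <operation-id> \
  --format='yaml(metadata.events)'
```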

Pipeline operation log files

A pipeline operation includes a Cloud Storage path where logs are written. Three log files are written from the pipeline's VM. Since the files are written from the VM, there will be no log files if the VM is never launched.

The three files written are:

  • Pipeline log: the log written by the pipeline's software as it pulls the docker image, localizes files, runs the docker command, and delocalizes files
  • stderr log: the stderr log from the docker command
  • stdout log: the stdout log from the docker command

To retrieve the logging path from an existing operation, run on the command line:

gcloud alpha genomics operations describe <operation-id> \
  --format='value(metadata.request.pipelineArgs.logging)'

The Cloud Storage path may be either:

  • A simple Cloud Storage path
  • A Cloud Storage path ending in ".log"

A simple Cloud Storage path

When a simple Cloud Storage path is provided, it is treated like a file system folder. For example, if you specify gs://my-bucket/my-path/my-pipeline, the generated log file names will be:

  • gs://my-bucket/my-path/my-pipeline/<operation-id>.log
  • gs://my-bucket/my-path/my-pipeline/<operation-id>-stderr.log
  • gs://my-bucket/my-path/my-pipeline/<operation-id>-stdout.log

A Cloud Storage path ending in ".log"

You can specify a Cloud Storage path that ends in ".log". For example, if you specify gs://my-bucket/my-path/my-pipeline.log, the generated log file names will be:

  • gs://my-bucket/my-path/my-pipeline.log
  • gs://my-bucket/my-path/my-pipeline-stderr.log
  • gs://my-bucket/my-path/my-pipeline-stdout.log
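The two naming rules above can be sketched as a small shell function (log_paths is a hypothetical helper for illustration, not part of any CLI):

```shell
# log_paths LOGGING_PATH OPERATION_ID
# Print the three log file names generated for the given logging path,
# following the naming rules described above.
log_paths() {
  local logging="$1" op="$2"
  if [ "${logging%.log}" != "${logging}" ]; then
    # The path ends in ".log": it is used directly as the pipeline log name.
    local base="${logging%.log}"
    echo "${base}.log"
    echo "${base}-stderr.log"
    echo "${base}-stdout.log"
  else
    # A simple path: treated like a file system folder.
    echo "${logging%/}/${op}.log"
    echo "${logging%/}/${op}-stderr.log"
    echo "${logging%/}/${op}-stdout.log"
  fi
}

log_paths gs://my-bucket/my-path/my-pipeline OPERATION-ID
log_paths gs://my-bucket/my-path/my-pipeline.log OPERATION-ID
```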

Pipeline VM

The name and zone of the Compute Engine VM that runs a pipeline can be found in the operation under the metadata.runtimeMetadata.computeEngine element. While the instance is running, it can be inspected.

To retrieve the zone/instanceName from an existing operation, run on the command line:

gcloud alpha genomics operations describe <operation-id> \
   --format='value[separator=/]
                 (metadata.runtimeMetadata.computeEngine.zone,
                  metadata.runtimeMetadata.computeEngine.instanceName)'

VM logging

VM logging includes:

  • Linux system logging.
  • File localization and delocalization logging.
  • Pipeline docker logging.

There are several ways to view VM logs:

  • In the Google Cloud Console Logs Viewer:

    • Select GCE VM Instance from the first drop-down menu.
    • Enter the name of the VM in the search box.
    • In the All logs drop-down menu, select syslog.
  • In the Google Cloud Console instance detail page, click on the View serial port button.

  • Run gcloud compute instances get-serial-port-output zone/instanceName

All of the above methods work if the VM is up. If the VM has terminated, logs are only accessible using the Google Cloud Console Logs Viewer.

SSH to a pipeline's VM

You may want to open an SSH session to a pipeline's VM in order to examine the file system (available disk space) or even connect to the running docker container.

The zone/instanceName from the gcloud command above can be passed directly to the gcloud compute ssh command:

gcloud compute ssh zone/instanceName

If a failure occurs while executing the docker command or delocalizing output files, the VM is automatically deleted. If you need to keep the VM running longer so that you can SSH into it, you can use the keepVmAliveOnFailureDuration field.

Similarly if you need to give yourself time to SSH to the VM before the docker command runs, you can add a sleep to the start of your docker command. For example:

sleep $((5*60)); mycommand.sh

would give you 5 extra minutes.

If the boot volume is out of space, you may not be able to SSH to the instance. This can be a challenging situation to debug. You may find clues to this problem in the VM logs. You may need to cancel your existing pipeline operation, recreate it, and SSH to the instance early in processing in order to debug.

Inspect the file systems' available disk space

A simple cause of pipeline failure can be running out of disk space. To inspect available disk space:

df -h

Check the Mounted on column for your volume(s) of interest. The boot volume will be mounted on /.
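For example, to print just the available space on the boot volume:

```shell
# Print the "Avail" and "Mounted on" columns for the boot volume.
# -P keeps each entry on a single line so the columns are stable.
df -hP / | awk 'NR == 2 { print $4, "available on", $6 }'
```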

Any additional disk resources added for your pipeline will be "mounted on" the mountPoint value specified in the pipeline definition.

Connect to the running docker container

You can connect to a running docker container with the docker exec command. You must first get the container ID or name.

The docker ps command will display a list of running containers. On a pipelines VM instance, there will be just one. For example:

$ sudo docker ps
CONTAINER ID        IMAGE                COMMAND                CREATED             STATUS              PORTS               NAMES
24e6a7c1e573        java:openjdk-8-jre   "/tmp/ggp-532475336"   10 minutes ago      Up 10 minutes                           clever_chandrasekhar

To get just the container ID, use the --format flag:

$ sudo docker ps --format '{{.ID}}'
24e6a7c1e573

To connect to a running container to run a bash shell:

sudo docker exec -t -i <container-id> /bin/bash

For example:

$ sudo docker exec -t -i $(sudo docker ps --format '{{.ID}}') /bin/bash
root@24e6a7c1e573:/#

VM monitoring

All pipelines VMs have the Stackdriver monitoring agent installed, so you can set up Stackdriver monitoring and alerting.

My pipelines won't run or they won't stop running

You are unable to start running a pipeline or you cannot cancel a pipeline that is already running.

On the IAM page in the Google Cloud Platform Console, verify that the role Genomics Service Agent appears in the Members list for the relevant project service account. (Look for the project service account that ends in @genomics-api.google.com.iam.gserviceaccount.com).

If the Genomics Service Agent role does not appear in the Members list, use gcloud to add the genomics.serviceAgent role to the relevant project service account. This role includes permission to stop and start Compute Engine instances inside your project.

To find the PROJECT_ID and PROJECT_NUMBER, refer to Identifying projects.

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:service-PROJECT_NUMBER@genomics-api.google.com.iam.gserviceaccount.com \
    --role=roles/genomics.serviceAgent