Cloud Genomics v1alpha2 Migration Guide

The v1alpha2 API is deprecated and will be turned down later in 2018. This guide describes how to migrate v1alpha2 pipeline definitions to the v2alpha1 API. The v2alpha1 API offers significant performance improvements, but this guide covers only the minimum steps required to translate v1alpha2 requests.

In addition to this document, an open source tool that automatically migrates v1alpha2 requests is available. This tool reads a v1alpha2 request from standard input, applies the transformations described below, and writes a v2alpha1 request to standard output.

Key differences in v2alpha1

There are a few key differences in the v2alpha1 API that are important to understand:

  1. Multiple containers can be executed in a single pipeline request. Either the same image or multiple images can be used. Each container execution is defined by an Action message.

  2. Localization and delocalization (copying files to and from the VM, respectively) are no longer a fixed part of the API. Instead, actions that perform these functions must be added to the pipeline.

  3. Logging to Cloud Storage is no longer provided as a fixed part of the API, though more information is provided during execution than in the v1alpha2 API. If additional logs are required, an action must be added to the pipeline that copies logs off of the VM.

  4. The split between a pipeline definition and input/output parameters has been removed, greatly simplifying the request.

  5. The following scopes are no longer automatically enabled for the service account used to access data and services. They must be enabled manually if the pipeline requires them:

    • https://www.googleapis.com/auth/compute
    • https://www.googleapis.com/auth/devstorage.full_control
    • https://www.googleapis.com/auth/logging.write
    • https://www.googleapis.com/auth/monitoring.write
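
As a minimal sketch of re-enabling these scopes, the helper below adds them to a v2alpha1 pipeline dictionary. It assumes that scopes are listed under resources.virtualMachine.serviceAccount.scopes; this field path and the helper name add_scopes are assumptions not stated in this guide, so check them against the v2alpha1 reference before relying on them.

```python
# Scopes that v1alpha2 enabled automatically (taken from the list above).
V1ALPHA2_DEFAULT_SCOPES = [
    'https://www.googleapis.com/auth/compute',
    'https://www.googleapis.com/auth/devstorage.full_control',
    'https://www.googleapis.com/auth/logging.write',
    'https://www.googleapis.com/auth/monitoring.write',
]


def add_scopes(pipeline, scopes):
    """Add scopes to the pipeline's service account without duplicating any.

    Assumption: scopes live under resources.virtualMachine.serviceAccount.
    """
    vm = pipeline.setdefault('resources', {}).setdefault('virtualMachine', {})
    account = vm.setdefault('serviceAccount', {})
    existing = account.setdefault('scopes', [])
    for scope in scopes:
        if scope not in existing:
            existing.append(scope)
    return pipeline


pipeline = add_scopes({'actions': []}, V1ALPHA2_DEFAULT_SCOPES)
print(pipeline['resources']['virtualMachine']['serviceAccount']['scopes'])
```

Only the scopes a pipeline actually needs should be passed in; the full list is used here purely for illustration.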

Migrating requests

The process of migrating v1alpha2 requests to the new API primarily consists of wrapping the single user-specified Docker command with a series of actions to perform localization, delocalization, and logging. This process is described in detail in the sections below.

Resources

Resources are specified in two parts in the v1alpha2 API: EphemeralPipeline and PipelineArgs. These two specifications must be merged together before being translated into a v2alpha1 Resources object. The simplest strategy is to override any value specified in the ephemeral pipeline with the value specified in the pipeline arguments.

After the resources are merged together, most fields map directly onto their counterparts in the Resources object.

One significant exception is the machine type string, which must be generated based on the minimumCpuCores and minimumRamGb fields in the v1alpha2 request.

Standard machine type names (for example, n1-standard-1) or custom types (for example, custom-1-4096) can be used. If a custom type is specified that maps directly onto a cheaper standard type, Compute Engine will use the standard type automatically.

The simplest way to migrate existing values to machine types is to take the minimum number of CPU cores and RAM specified and generate a custom machine type (custom-N-M), where N is the number of cores, and M is the number of megabytes of RAM. This will typically allocate less expensive machines, though some pipelines may fail if they were not requesting sufficient minimums.

Example: Mapping minimum CPU and RAM values to a custom machine type

v1alpha2:

{
  'ephemeralPipeline': {
    'resources': {
      'minimumCpuCores': 1,
      'minimumRamGb': 4
    }
  },
  'pipelineArgs': {
    'resources': {
      'minimumRamGb': 8,
      'preemptible': true
    }
  }
}

v2alpha1:

{
  'pipeline': {
    'resources': {
      'virtualMachine': {
        'machineType': 'custom-1-8192',
        'preemptible': true
      }
    }
  }
}
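
The merge-and-convert step can be sketched in Python as follows. The function names are hypothetical, and the sketch does not check the result against Compute Engine's rules for custom machine types.

```python
import math


def merge_resources(ephemeral, args):
    """Overlay pipelineArgs resources on the ephemeral pipeline's resources."""
    merged = dict(ephemeral)
    merged.update(args)
    return merged


def to_virtual_machine(resources):
    """Build a v2alpha1 virtualMachine dict from merged v1alpha2 resources."""
    cores = int(resources['minimumCpuCores'])
    # custom-N-M expects megabytes; round fractional gigabytes up.
    ram_mb = math.ceil(resources['minimumRamGb'] * 1024)
    vm = {'machineType': 'custom-%d-%d' % (cores, ram_mb)}
    if 'preemptible' in resources:
        vm['preemptible'] = resources['preemptible']
    return vm


request = {
    'ephemeralPipeline': {
        'resources': {'minimumCpuCores': 1, 'minimumRamGb': 4},
    },
    'pipelineArgs': {
        'resources': {'minimumRamGb': 8, 'preemptible': True},
    },
}
merged = merge_resources(request['ephemeralPipeline']['resources'],
                         request['pipelineArgs']['resources'])
print(to_virtual_machine(merged))
```

The pipeline arguments win wherever both parts specify a value, which is why the 8 GB request replaces the 4 GB default here.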

Localization

To perform localization in the same way as the v1alpha2 API, each entry in the inputParameters field must be translated. Note that the inputParameters and pipelineArgs sections in the v1alpha2 request must be merged together: a value supplied in pipelineArgs overrides the parameter's defaultValue.

Example: Simple input parameter

The example below shows how to convert a simple input parameter (no file is copied). In this case, the value simply needs to be added to the pipeline environment map so that it is exposed as an environment variable to the container.

v1alpha2:

{
  'ephemeralPipeline': {
    'inputParameters': [
      {
        'name': 'SHARDS',
        'defaultValue': '1'
      }
    ]
  },
  'pipelineArgs': {
    'SHARDS': '2'
  }
}

v2alpha1:

{
  'pipeline': {
    'environment': {
      'SHARDS': '2'
    }
  }
}
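
A minimal sketch of this translation is shown below. The function name build_environment is hypothetical, and the supplied argument values are passed in as a plain name-to-value dictionary rather than in the exact shape of a v1alpha2 request.

```python
def build_environment(input_parameters, supplied):
    """Map simple v1alpha2 input parameters to a v2alpha1 environment dict.

    Parameters that use localCopy are skipped here because they need a
    localization action as well as an environment variable.
    """
    environment = {}
    for param in input_parameters:
        if 'localCopy' in param:
            continue
        name = param['name']
        # A supplied value overrides the parameter's defaultValue.
        value = supplied.get(name, param.get('defaultValue'))
        if value is not None:
            environment[name] = value
    return environment


params = [{'name': 'SHARDS', 'defaultValue': '1'}]
print(build_environment(params, {'SHARDS': '2'}))
```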

Example: Copying an input file

The example below shows an input parameter that uses localCopy. In this case, a unique local filename should be generated and assigned to the appropriate environment variable, and an action should be added which invokes the gsutil command to copy the file to the VM.

Note that the attached data disk must be mounted into the action that copies the file as well as any action that expects to use the file.

v1alpha2:

{
  'ephemeralPipeline': {
    'inputParameters': [
      {
        'name': 'INPUT1',
        'defaultValue': 'gs://DIRECTORY/FILE',
        'localCopy': {
          'path': 'test',
          'disk': 'data'
        }
      }
    ],
    'resources': {
      'disks': [
        {
          'name': 'data',
          'mountPoint': '/data'
        }
      ]
    }
  }
}

v2alpha1:

{
  'pipeline': {
    'environment': {
      'INPUT1': '/data/input1'
    },
    'actions': [
      {
        'imageUri': 'google/cloud-sdk',
        'commands': [
          'sh', '-c', 'gsutil cp gs://DIRECTORY/FILE $INPUT1'
        ],
        'mounts': [
          {
            'disk': 'data',
            'path': '/data'
          }
        ]
      }
    ],
    'resources': {
      'virtualMachine': {
        'disks': [
          {
            'name': 'data'
          }
        ]
      }
    }
  }
}
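
The sketch below generates both the environment entry and the copy action for one localCopy input parameter. The function name and the file naming scheme (the lower-cased parameter name) are assumptions made for illustration; any scheme that yields unique filenames would do.

```python
import shlex


def localization_action(param, mount_points):
    """Return (local_path, action) for an input parameter with localCopy.

    mount_points maps disk names to mount paths, e.g. {'data': '/data'}.
    """
    name = param['name']
    disk = param['localCopy']['disk']
    mount = mount_points[disk]
    # Hypothetical naming scheme: parameter names are unique, so the
    # lower-cased name gives a unique file name on the disk.
    local_path = '%s/%s' % (mount, name.lower())
    command = 'gsutil cp %s "$%s"' % (shlex.quote(param['defaultValue']), name)
    action = {
        'imageUri': 'google/cloud-sdk',
        'commands': ['sh', '-c', command],
        # The disk must be mounted so the copied file lands on it.
        'mounts': [{'disk': disk, 'path': mount}],
    }
    return local_path, action


param = {
    'name': 'INPUT1',
    'defaultValue': 'gs://DIRECTORY/FILE',
    'localCopy': {'path': 'test', 'disk': 'data'},
}
path, action = localization_action(param, {'data': '/data'})
environment = {param['name']: path}
print(environment)
print(action['commands'])
```

The returned path goes into the pipeline environment map, so the same variable name resolves to the local file in every action that mounts the disk.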

Running the user command

After any required localization actions have been added, a single action should be generated from a user-specified v1alpha2 executor. This involves translating the DockerExecutor parameters into an action.

Note that the v1alpha2 API executed commands using bash. This is not required in v2alpha1, but it is advisable when migrating pipelines to avoid unexpected results. In particular, environment variables in the command are expanded only if the command runs under a shell such as bash.

Example: Running the user-specified command using a shell

v1alpha2:

{
  'ephemeralPipeline': {
    'executor': {
      'imageName': 'ubuntu',
      'cmd': 'echo hello world'
    }
  }
}

v2alpha1:

{
  'pipeline': {
    'actions': [
      {
        'imageUri': 'ubuntu',
        'commands': [
          'bash', '-c', 'echo hello world'
        ]
      }
    ]
  }
}
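
A sketch of the executor translation follows. The function name is hypothetical, and the optional mounts argument is an addition for pipelines whose user command reads localized files.

```python
def executor_to_action(executor, mounts=None):
    """Translate a v1alpha2 executor dict into a v2alpha1 action dict."""
    action = {
        'imageUri': executor['imageName'],
        # Wrap the command in bash so environment variables are expanded,
        # matching the v1alpha2 behavior.
        'commands': ['bash', '-c', executor['cmd']],
    }
    if mounts:
        action['mounts'] = list(mounts)
    return action


print(executor_to_action({'imageName': 'ubuntu', 'cmd': 'echo hello world'}))
```

Passing the whole command as a single string after -c avoids having to split it into arguments by hand.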

Delocalization

After adding the action that runs the user command, additional actions must be added for any outputParameters. These actions will copy data off of the VM into Cloud Storage.

These actions should have the ALWAYS_RUN flag specified to ensure that they run even if the user command fails. Normally, once an action fails the pipeline stops executing. Since a partial output file may be useful, delocalization actions should always run. Consult the Action reference documentation for the complete set of available flags.

Example: Copying an output file

v1alpha2:

{
  'ephemeralPipeline': {
    'outputParameters': [
      {
        'name': 'OUTPUT1',
        'defaultValue': 'gs://DIRECTORY/FILE',
        'localCopy': {
          'path': 'test',
          'disk': 'data'
        }
      }
    ],
    'resources': {
      'disks': [
        {
          'name': 'data',
          'mountPoint': '/data'
        }
      ]
    }
  }
}

v2alpha1:

{
  'pipeline': {
    'environment': {
      'OUTPUT1': '/data/output1'
    },
    'actions': [
      {
        'imageUri': 'google/cloud-sdk',
        'commands': [
          'sh', '-c', 'gsutil cp $OUTPUT1 gs://DIRECTORY/FILE'
        ],
        'flags': [
          'ALWAYS_RUN'
        ],
        'mounts': [
          {
            'disk': 'data',
            'path': '/data'
          }
        ]
      }
    ],
    'resources': {
      'virtualMachine': {
        'disks': [
          {
            'name': 'data'
          }
        ]
      }
    }
  }
}
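
The sketch below builds a delocalization action and appends it after the user command. The function name is hypothetical, and the sketch assumes the pipeline environment already maps the parameter name to its local path.

```python
import shlex


def delocalization_action(param, mount_points):
    """Build an action that copies one output file to Cloud Storage.

    mount_points maps disk names to mount paths, e.g. {'data': '/data'}.
    """
    name = param['name']
    disk = param['localCopy']['disk']
    command = 'gsutil cp "$%s" %s' % (name, shlex.quote(param['defaultValue']))
    return {
        'imageUri': 'google/cloud-sdk',
        'commands': ['sh', '-c', command],
        # Run even if an earlier action failed, so partial output is kept.
        'flags': ['ALWAYS_RUN'],
        'mounts': [{'disk': disk, 'path': mount_points[disk]}],
    }


param = {
    'name': 'OUTPUT1',
    'defaultValue': 'gs://DIRECTORY/FILE',
    'localCopy': {'path': 'test', 'disk': 'data'},
}
user_action = {'imageUri': 'ubuntu', 'commands': ['bash', '-c', 'echo hello world']}
actions = [user_action, delocalization_action(param, {'data': '/data'})]
print(actions[-1]['commands'])
```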

Logging

Finally, after any delocalization actions, logs should be copied off of the VM into Cloud Storage. For full compatibility with the v1alpha2 API, this should be performed every few minutes in the background. However, most users only need to consult the logs after the pipeline completes, in which case a single action that copies them at the end of the pipeline will suffice.

Logs are stored under the special (and always mounted read-only) /google directory. Consult the Action reference documentation for a detailed description of this directory.

Example: Copying all logs to Cloud Storage at the end of the pipeline

In this example, logs are copied off of the VM once as a final action that is marked as ALWAYS_RUN (because logs may be particularly interesting for failed pipelines).

v1alpha2:

{
  'ephemeralPipeline': {
    'logging': {
      'gcsPath': 'gs://DIRECTORY/FILE'
    }
  }
}

v2alpha1:

{
  'pipeline': {
    'actions': [
      {
        'imageUri': 'google/cloud-sdk',
        'commands': [
          'sh', '-c', 'gsutil cp /google/logs/output gs://DIRECTORY/FILE'
        ],
        'flags': [
          'ALWAYS_RUN'
        ]
      }
    ]
  }
}

Example: Periodically copying logs to Cloud Storage

In this example, logs are copied off of the VM every minute as a background action (a new feature in the v2alpha1 API).

v1alpha2:

{
  'ephemeralPipeline': {
    'logging': {
      'gcsPath': 'gs://DIRECTORY/FILE'
    }
  }
}

v2alpha1:

{
  'pipeline': {
    'actions': [
      {
        'imageUri': 'google/cloud-sdk',
        'commands': [
          'sh', '-c', 'while true; do sleep 1m; gsutil cp /google/logs/output gs://DIRECTORY/FILE; done'
        ],
        'flags': [
          'RUN_IN_BACKGROUND'
        ]
      }
    ]
  }
}
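
Both logging variants can be generated by one helper, sketched below. The function name and its interval parameter are hypothetical; the interval string is passed straight to the sleep command.

```python
import shlex


def logging_action(gcs_path, interval=None):
    """Build an action that copies /google/logs/output to Cloud Storage.

    With interval=None the copy runs once and is flagged ALWAYS_RUN.
    Otherwise it loops in the background, sleeping for the given interval
    (for example '1m') between copies.
    """
    copy = 'gsutil cp /google/logs/output %s' % shlex.quote(gcs_path)
    if interval is None:
        command, flag = copy, 'ALWAYS_RUN'
    else:
        command = 'while true; do sleep %s; %s; done' % (interval, copy)
        flag = 'RUN_IN_BACKGROUND'
    return {
        'imageUri': 'google/cloud-sdk',
        'commands': ['sh', '-c', command],
        'flags': [flag],
    }


print(logging_action('gs://DIRECTORY/FILE')['commands'][2])
print(logging_action('gs://DIRECTORY/FILE', interval='1m')['commands'][2])
```

A one-shot copy belongs at the end of the action list, while a background copy presumably needs to be placed before the actions whose logs it should capture.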

Reading status information

In general, operation status is exposed in a similar fashion via the standard Long Running Operations API. Each running pipeline has a done field that indicates whether it has completed. When this field indicates the operation is done, either the error field or the response field is populated.
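
As a sketch, the helper below classifies an operation that has already been fetched and decoded into a dictionary. The function name and the three state labels are hypothetical; only the done, error, and response fields come from the description above.

```python
def operation_state(operation):
    """Classify a Long Running Operations resource given as a dict."""
    if not operation.get('done'):
        return 'running'
    # A finished operation carries either an error or a response.
    if operation.get('error'):
        return 'failed'
    return 'succeeded'


print(operation_state({'name': 'operations/EXAMPLE'}))
print(operation_state({'done': True, 'error': {'code': 9, 'message': 'failed'}}))
print(operation_state({'done': True, 'response': {}}))
```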

Events

The v2alpha1 API exposes a machine-readable event stream. The set of events differs from the v1alpha2 API. The table below describes how to map v1alpha2 event descriptions to information exposed by the v2alpha1 API (where possible).

v1alpha2 event                        v2alpha1 equivalent
start                                 The first PullStartedEvent.
pulling-image                         PullStartedEvent is emitted when each pull starts. PullStoppedEvent is emitted when each pull stops.
running-docker                        For each action, a ContainerStartedEvent is generated when the container starts and a ContainerStoppedEvent is generated when it exits.
localizing-files, delocalizing-files  Because localization is simply another container invocation, ContainerStartedEvent and ContainerStoppedEvent can be used in conjunction with per-action labels.
fail                                  A FailedEvent is generated.
ok                                    There is no corresponding event, but the operation's done field is set to true with an empty error.
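
For code that previously matched on v1alpha2 event descriptions, the mapping can be kept as a lookup table, sketched below. The dictionary and function names are hypothetical; the event names are the ones listed in this section.

```python
# v1alpha2 event description -> v2alpha1 event types to watch for instead.
V2_EQUIVALENTS = {
    'start': ['PullStartedEvent'],
    'pulling-image': ['PullStartedEvent', 'PullStoppedEvent'],
    'running-docker': ['ContainerStartedEvent', 'ContainerStoppedEvent'],
    'localizing-files': ['ContainerStartedEvent', 'ContainerStoppedEvent'],
    'delocalizing-files': ['ContainerStartedEvent', 'ContainerStoppedEvent'],
    'fail': ['FailedEvent'],
    # No event: check the operation's done field and error instead.
    'ok': [],
}


def v2_equivalents(description):
    """Return the v2alpha1 event types for a v1alpha2 event description."""
    return V2_EQUIVALENTS.get(description, [])


print(v2_equivalents('pulling-image'))
```

Telling localization apart from the user command still depends on per-action labels, since all three produce the same container events.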

Accessing the containers

In the v2alpha1 API, you cannot directly SSH to the running containers. However, you can access the containers by running a separate SSH server as a background action. The server runs in the same network as the other containers and can communicate with them. To run the separate server, start it as a background action (using the RUN_IN_BACKGROUND flag) before running any other actions.

Example: Starting an SSH container as a background action

v2alpha1:

{
  'pipeline': {
    'actions': [
      {
        'imageUri': 'gcr.io/cloud-genomics-pipelines/tools',
        'entrypoint': 'ssh-server',
        'flags': [
          'RUN_IN_BACKGROUND'
        ],
        'portMappings': {
          '22': 22
        }
      }
    ]
  }
}
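
The sketch below prepends the SSH server action to an existing pipeline so that it starts before every other action. The function name with_ssh_server is hypothetical; the action itself mirrors the example above.

```python
def with_ssh_server(pipeline):
    """Return a copy of the pipeline with an SSH server as the first action."""
    ssh_action = {
        'imageUri': 'gcr.io/cloud-genomics-pipelines/tools',
        'entrypoint': 'ssh-server',
        'flags': ['RUN_IN_BACKGROUND'],
        'portMappings': {'22': 22},
    }
    result = dict(pipeline)
    # The background server must come before any other action.
    result['actions'] = [ssh_action] + list(pipeline.get('actions', []))
    return result


pipeline = {
    'actions': [
        {'imageUri': 'ubuntu', 'commands': ['bash', '-c', 'echo hello world']},
    ],
}
debuggable = with_ssh_server(pipeline)
print([action['imageUri'] for action in debuggable['actions']])
```

The original pipeline dictionary is left unmodified, which makes it easy to enable the server only for debugging runs.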