Migrate to Batch from Cloud Life Sciences

This page describes how to migrate to Batch from Cloud Life Sciences.

On July 17, 2023, Google Cloud announced that Cloud Life Sciences, which had been in beta, is deprecated. The service will no longer be available on Google Cloud after July 8, 2025. However, Batch is generally available and is a comprehensive successor that supports all use cases for Cloud Life Sciences.

Learn more about Batch, Cloud Life Sciences, and product launch stages.

Cloud Life Sciences versus Batch

Migrating from Cloud Life Sciences to Batch primarily involves understanding how you can use Batch for the workloads that you currently execute by running Cloud Life Sciences pipelines.

To understand how you can execute your Cloud Life Sciences workloads on Batch, see all of the following sections:

Overview

A Cloud Life Sciences pipeline describes a sequence of actions (containers) to execute and the environment to execute the containers in.

A Batch job describes an array of one or more tasks and the environment to execute those tasks in. You define the workload for a job as one sequence of one or more runnables (containers and/or scripts) to be executed. Each task for a job represents one execution of its sequence of runnables.

Cloud Life Sciences pipelines can be expressed as single-task Batch jobs.

For example, the following samples describe a simple Cloud Life Sciences pipeline and its equivalent Batch job:

Cloud Life Sciences pipeline:

  {
    "actions": [
      {
        "imageUri": "bash",
        "commands": [
          "-c",
          "echo Hello, world!"
        ]
      }
    ]
  }

Batch job:

  {
    "taskGroups" : [{
      "taskSpec" : {
        "runnables" : [{
          "container":{
            "imageUri": "bash",
            "commands": [
              "-c",
              "echo Hello, world!"
            ]
          }
        }]
      }
    }]
  }

Multiple-task Batch jobs are similar to running multiple copies of a Cloud Life Sciences pipeline.

Unlike Cloud Life Sciences, Batch lets you automatically schedule multiple executions of your workload. You indicate the number of times that you want to execute a job's sequence of runnables by defining the number of tasks. When a job has multiple tasks, you specify how you want each execution to vary by referencing the task's index in your runnables. Additionally, you can configure the relative schedule of a job's tasks, for example, whether to allow multiple tasks to run in parallel or to require tasks to run one at a time in sequential order. Batch manages the scheduling of the job's tasks: when a task finishes, the job automatically starts the next task, if any.

For example, see the following Batch job. This example job has 100 tasks that execute on 10 Compute Engine virtual machine (VM) instances, so there are approximately 10 tasks running in parallel at any given time. Each task in this example job only executes one runnable: a script that prints a message and the task's index, which is defined by the BATCH_TASK_INDEX predefined environment variable.

{
  "taskGroups" : [{
    "taskSpec" : {
      "runnables" : [{
        "script":{
          "text": "echo Hello world! This is task ${BATCH_TASK_INDEX}."
        }
      }]
    },
    "taskCount": 100,
    "parallelism": 10
  }]
}
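For contrast, the following sketch shows how a job might require its tasks to run one at a time in index order instead of in parallel. It assumes the schedulingPolicy field of a task group with the IN_ORDER value, combined with a parallelism of 1; the task count of 3 and the message text are arbitrary choices for illustration.

```json
{
  "taskGroups" : [{
    "taskSpec" : {
      "runnables" : [{
        "script":{
          "text": "echo Running task ${BATCH_TASK_INDEX} of 3, one at a time."
        }
      }]
    },
    "taskCount": 3,
    "parallelism": 1,
    "schedulingPolicy": "IN_ORDER"
  }]
}
```

You could save a sample like this to a file and pass it as the JSON configuration file for the gcloud batch jobs submit command.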

Workflows that involve the creation and monitoring of multiple similar Cloud Life Sciences pipelines can sometimes be simplified by taking advantage of Batch's built-in scheduling.

Basic operations

This section describes basic operations in Cloud Life Sciences versus Batch.

The following list summarizes the basic operation options for Cloud Life Sciences and Batch.

Execute a workload:
  • Cloud Life Sciences options:
    • Run a pipeline.
  • Batch options:
    • Create and run a job.
View all of your workloads:
  • Cloud Life Sciences options:
    • List long-running operations.
  • Batch options:
    • View a list of your jobs.
View the details and status for a workload:
  • Cloud Life Sciences options:
    • Get details for a long-running operation.
    • Poll a long-running operation.
  • Batch options:
    • View the details of a job.
    • View a list of a job's tasks.
    • View the details of a task.
Stop and remove a workload:
  • Cloud Life Sciences options:
    • Cancel a long-running operation.
  • Batch options:
    • Delete (and cancel) a job.
    • Check the status of a job deletion request.

The basic operations for Cloud Life Sciences and Batch have a few key differences.

First, long-running operation resources do not play the same role in Batch that they do in Cloud Life Sciences. In Cloud Life Sciences, long-running operation resources (LROs) are the primary resources used to list and view your pipelines. In Batch and other Google Cloud APIs, however, LROs are only used to monitor the status of a request that takes a long time to complete. Specifically, in Batch, the only request that returns an LRO is a request to delete a job. For more information about LROs for Batch, see the Batch API reference documentation for the projects.locations.operations REST resource. Instead of using LROs, you view and delete job resources for your Batch workloads.

Second, viewing the details of a workload in Batch involves different operations than in Cloud Life Sciences. You can view a job to see both its details and its status. However, each of a job's tasks also has its own details and status, which you can see by viewing a list of a job's tasks and by viewing the details of a task.

To help you further understand the basic operations for Cloud Life Sciences versus Batch, the following sections provide examples of Google Cloud CLI commands and API request paths for some of these basic operations.

Example gcloud CLI commands

For gcloud CLI, Cloud Life Sciences commands begin with gcloud beta lifesciences and Batch commands begin with gcloud batch. For example, see the following gcloud CLI commands.

  • Cloud Life Sciences example gcloud CLI commands:

    • Run a pipeline:

      gcloud beta lifesciences pipelines run \
        --project=PROJECT_ID \
        --regions=LOCATION \
        --pipeline-file=JSON_CONFIGURATION_FILE
      
    • Get details for a long-running operation:

      gcloud beta lifesciences operations describe OPERATION_ID
      

    Replace the following:

    • PROJECT_ID: the project ID of your project.
    • LOCATION: the location for the pipeline.
    • JSON_CONFIGURATION_FILE: the JSON configuration file for the pipeline.
    • OPERATION_ID: the identifier for the long-running operation, which was returned by the request to run the pipeline.
  • Batch example gcloud CLI commands:

    • Create and run a job:

      gcloud batch jobs submit JOB_NAME \
        --project=PROJECT_ID \
        --location=LOCATION \
        --config=JSON_CONFIGURATION_FILE
      
    • View the details of a job:

      gcloud batch jobs describe JOB_NAME \
        --project=PROJECT_ID \
        --location=LOCATION
      
    • View a job's list of tasks:

      gcloud batch tasks list \
        --project=PROJECT_ID \
        --location=LOCATION \
        --job=JOB_NAME
      
    • View the details of a task:

      gcloud batch tasks describe TASK_INDEX \
        --project=PROJECT_ID \
        --location=LOCATION \
        --job=JOB_NAME \
        --task_group=TASK_GROUP_NAME
      
    • Delete (and cancel) a job:

      gcloud batch jobs delete JOB_NAME \
        --project=PROJECT_ID \
        --location=LOCATION
      

    Replace the following:

    • JOB_NAME: the name of the job.
    • PROJECT_ID: the project ID of your project.
    • LOCATION: the location of the job.
    • JSON_CONFIGURATION_FILE: the path for a JSON file with the job's configuration details.
    • TASK_INDEX: the index of the task that you want to view the details of. In a task group, the task index starts at 0 for the first task and increases by 1 with each additional task. For example, a task group that contains four tasks has the indexes 0, 1, 2, and 3.
    • TASK_GROUP_NAME: the name of the task group that contains the task. The value must be set to group0.

Example API request paths

For APIs, Cloud Life Sciences uses lifesciences.googleapis.com request paths and Batch uses batch.googleapis.com request paths. For example, see the following API request paths. Unlike Cloud Life Sciences, Batch does not have an RPC API; it only has a REST API.

  • Cloud Life Sciences example API request paths:

    • Run a pipeline:

      POST https://lifesciences.googleapis.com/v2beta/projects/PROJECT_ID/locations/LOCATION/pipelines:run
      
    • Get details for a long-running operation:

      GET https://lifesciences.googleapis.com/v2beta/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID
      

    Replace the following:

    • PROJECT_ID: the project ID of your project.
    • LOCATION: the location for the pipeline.
    • OPERATION_ID: the identifier for the long-running operation, which was returned by the request to run the pipeline.
  • Batch example API request paths:

    • Create and run a job:

      POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME
      
    • View the details of a job:

      GET https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs/JOB_NAME
      
    • View a job's list of tasks:

      GET https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs/JOB_NAME/taskGroups/TASK_GROUP_NAME/tasks
      
    • Delete (and cancel) a job:

      DELETE https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs/JOB_NAME
      
    • Check the status of a job deletion request:

      GET https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID
      

    Replace the following:

    • PROJECT_ID: the project ID of your project.
    • LOCATION: the location of the job.
    • JOB_NAME: the name of the job.
    • TASK_GROUP_NAME: the name of the task group that contains the tasks that you want to list. The value must be set to group0.
    • OPERATION_ID: the identifier for the long-running operation, which was returned by the request to delete the job.

IAM roles and permissions

This section summarizes the differences in Identity and Access Management roles and permissions for Cloud Life Sciences and Batch. For more information about any roles and their permissions, see the IAM basic and predefined roles reference.

The following table describes the predefined roles and their permissions that are required for users of Cloud Life Sciences.

Cloud Life Sciences roles and their permissions:

Any of the following roles on the project: Cloud Life Sciences Admin (roles/lifesciences.admin), Cloud Life Sciences Editor (roles/lifesciences.editor), or Cloud Life Sciences Workflows Runner (roles/lifesciences.workflowsRunner)
  • lifesciences.workflows.run
  • lifesciences.operations.cancel
  • lifesciences.operations.get
  • lifesciences.operations.list
Cloud Life Sciences Viewer (roles/lifesciences.viewer) on the project
  • lifesciences.operations.get
  • lifesciences.operations.list
  • resourcemanager.projects.get
  • resourcemanager.projects.list

The following table describes some of the predefined roles and their permissions for Batch. Unlike Cloud Life Sciences, Batch requires you to grant permissions to users and the service account for a job. For more information about the IAM requirements, see Prerequisites for Batch.

Batch roles for users and their permissions:
Batch Job Editor (roles/batch.jobsEditor) on the project
  • batch.jobs.create
  • batch.jobs.delete
  • batch.jobs.get
  • batch.jobs.list
  • batch.locations.get
  • batch.locations.list
  • batch.operations.get
  • batch.operations.list
  • batch.tasks.get
  • batch.tasks.list
  • resourcemanager.projects.get
  • resourcemanager.projects.list
Batch Job Viewer (roles/batch.jobsViewer) on the project
  • batch.jobs.get
  • batch.jobs.list
  • batch.locations.get
  • batch.locations.list
  • batch.operations.get
  • batch.operations.list
  • batch.tasks.get
  • batch.tasks.list
  • resourcemanager.projects.get
  • resourcemanager.projects.list
Service Account User (roles/iam.serviceAccountUser) on the job's service account
  • iam.serviceAccounts.actAs
  • iam.serviceAccounts.get
  • iam.serviceAccounts.list
  • resourcemanager.projects.get
  • resourcemanager.projects.list
Batch roles for service accounts and their permissions:
Batch Agent Reporter (roles/batch.agentReporter) on the project
  • batch.states.report
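As an illustration of the two kinds of project-level grants, the following sketch shows the relevant bindings in a project's IAM policy, using the standard bindings format with role and members fields. USER_EMAIL and SERVICE_ACCOUNT_EMAIL are placeholders. The Service Account User role is not included because it is granted on the job's service account rather than on the project.

```json
{
  "bindings": [
    {
      "role": "roles/batch.jobsEditor",
      "members": ["user:USER_EMAIL"]
    },
    {
      "role": "roles/batch.agentReporter",
      "members": ["serviceAccount:SERVICE_ACCOUNT_EMAIL"]
    }
  ]
}
```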

Corresponding features

The following table describes the features for Cloud Life Sciences, the equivalent features for Batch, and details about the differences between them.

Each feature is represented by a description and its JSON syntax. You can use JSON syntax when accessing Batch through the API or when specifying a JSON configuration file through the Google Cloud CLI. However, note that you can also use Batch features through other methods, such as Google Cloud console fields, gcloud CLI flags, and client libraries, which are described in the Batch documentation.

For more information about each feature and its JSON syntax, see the following:

Cloud Life Sciences features Batch features Details
Cloud Life Sciences: pipeline (pipeline)
Batch: job (job) and its tasks (taskGroups[])

A Batch job consists of an array of one or more tasks that each execute the same runnables. A Cloud Life Sciences pipeline is similar to a Batch job with one task. However, Cloud Life Sciences has no equivalent concept for a job with multiple tasks, where each task is somewhat like a repetition of a pipeline.

For more information about jobs and tasks, see Overview for Batch.

Cloud Life Sciences: actions (actions[]) for a pipeline
Batch: runnables (runnables[]) for a job's tasks

A Cloud Life Sciences action describes a container, but a Batch runnable can contain either a container or a script.
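For example, the following sketch defines one task whose runnables are a script followed by a container, reusing the bash image from the earlier sample. The runnables execute in the order listed; the echo messages are arbitrary.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [
        {
          "script": {
            "text": "echo Preparing inputs for task ${BATCH_TASK_INDEX}."
          }
        },
        {
          "container": {
            "imageUri": "bash",
            "commands": ["-c", "echo Hello from a container!"]
          }
        }
      ]
    }
  }]
}
```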

credentials (credentials) for an action

for a container runnable:

In Cloud Life Sciences, an action's credentials must be a Cloud Key Management Service encrypted dictionary with username and password key-value pairs.

In Batch, the username and password for a container runnable are in separate fields. Either field may be specified with plain text or with the name of a Secret Manager secret.
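For example, the following sketch specifies the username for a private image as plain text and the password as a Secret Manager secret. It assumes that a secret is referenced by its resource name in the form projects/PROJECT_ID/secrets/SECRET_NAME/versions/VERSION; PRIVATE_IMAGE_URI, USERNAME, and PASSWORD_SECRET_NAME are placeholders. The job's service account also needs permission to access the secret, for example through the Secret Manager Secret Accessor role.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "container": {
          "imageUri": "PRIVATE_IMAGE_URI",
          "username": "USERNAME",
          "password": "projects/PROJECT_ID/secrets/PASSWORD_SECRET_NAME/versions/VERSION"
        }
      }]
    }
  }]
}
```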

for an action:

for an environment:

possible environments:

Cloud Life Sciences lets you specify an action's environment variables as plain text or as an encrypted dictionary. In Batch, this is similar to having the environment for a runnable (environment field in runnables[]) include variables that are formatted as plain text (variables) or as an encrypted dictionary (encryptedVariables).

However, Batch also has more options for specifying environment variables:

  • As an alternative to specifying variables as plain text or an encrypted dictionary, you can specify variables using Secret Manager secrets by using a secret variable (secretVariables).
  • As an alternative to specifying an environment variable for a runnable, you can specify an environment variable for all runnables by using the environment field in taskSpec.
  • As an alternative to specifying an environment variable that has the same value for each task, you can specify an environment variable that has a different value for each task by using the taskEnvironments[] field in taskGroups[].

For more information, see Use environment variables.
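The following sketch combines the three Batch options described above: a variable for all runnables (environment in taskSpec), a secret variable for one runnable (secretVariables), and a different SAMPLE_ID value for each task (taskEnvironments[]). The variable names, values, and secret are hypothetical. It also assumes that the number of tasks is taken from the number of entries in taskEnvironments[], so taskCount is omitted; verify this behavior against the Batch API reference.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "environment": {
        "variables": {
          "OUTPUT_DIR": "/tmp/output"
        }
      },
      "runnables": [{
        "script": {
          "text": "echo Processing ${SAMPLE_ID} into ${OUTPUT_DIR}."
        },
        "environment": {
          "secretVariables": {
            "API_TOKEN": "projects/PROJECT_ID/secrets/SECRET_NAME/versions/VERSION"
          }
        }
      }]
    },
    "taskEnvironments": [
      { "variables": { "SAMPLE_ID": "sample-0" } },
      { "variables": { "SAMPLE_ID": "sample-1" } },
      { "variables": { "SAMPLE_ID": "sample-2" } }
    ]
  }]
}
```

This pattern can replace a workflow that submits one Cloud Life Sciences pipeline per sample.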

Cloud Life Sciences: labels for a request to run a pipeline (labels in the request body)
Batch: labels for a job (labels in the job resource)

Unlike Cloud Life Sciences, Batch does not include a labels field in the request to create a new job. The closest option for Batch is to use labels that are only associated with the job.

Batch has multiple types of labels (labels fields) that you can use when creating a job. For more information, see Organize resources using labels.
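As a sketch, the following job sets a hypothetical pipeline label both on the job itself (labels in the job resource) and on the VMs that Batch creates for the job (labels in allocationPolicy). PIPELINE_NAME is a placeholder and must follow the usual label-value rules.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "script": { "text": "echo Hello, world!" }
      }]
    }
  }],
  "labels": {
    "pipeline": "PIPELINE_NAME"
  },
  "allocationPolicy": {
    "labels": {
      "pipeline": "PIPELINE_NAME"
    }
  }
}
```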

Cloud Life Sciences: regions (regions[]) and zones (zones[]) for a pipeline's resources (resources)
Batch: allowed locations (allowedLocations) for a job's resource location policy (locationPolicy)

In Cloud Life Sciences, a pipeline executes on a single VM, for which you can specify the desired regions, zones, or both.

In Batch, the equivalent option is the allowed locations for a job, which specify where the job's VMs can be created and which you can define as one or more regions or zones. All the VMs for a single Batch job belong to a single managed instance group (MIG), which exists in a particular region; however, individual VMs might be in different zones of that region.

Notably, specifying the allowed locations for a job is optional because they are separate from the job's location. Unlike the job's location, the allowed locations do not affect where Batch creates the job and stores its metadata. For more information, see Batch locations.
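For example, the following sketch restricts a job's VMs to two zones. In the JSON syntax, the location policy is the location field of allocationPolicy, and each allowed location is written as regions/REGION or zones/ZONE. The zones shown are only examples; they belong to the same region, consistent with all of a job's VMs sharing one regional MIG.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "script": { "text": "echo Hello, world!" }
      }]
    }
  }],
  "allocationPolicy": {
    "location": {
      "allowedLocations": [
        "zones/us-central1-a",
        "zones/us-central1-b"
      ]
    }
  }
}
```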

for a pipeline's resources (resources):

for a job's resource policy (allocationPolicy):

In Cloud Life Sciences, you can configure the (one) VM that a pipeline runs on.

In Batch, the same options for VMs are available in the fields of a job's resource allocation policy (allocationPolicy):

  • The service account, labels, and network configuration for the VMs are defined in their dedicated fields.
  • The VM field (instances), which you can define either directly or using an instance template, includes the configuration options for the machine type, minimum allowed CPU platform, boot disk and any other attached disks, and any GPUs and GPU drivers.
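The following sketch shows how these fields might fit together in one allocation policy. SERVICE_ACCOUNT_EMAIL, NETWORK_NAME, REGION, SUBNET_NAME, and MIN_CPU_PLATFORM are placeholders, and the machine type, boot disk size, and GPU type are arbitrary examples rather than recommendations. Instead of the policy field, an entry in instances could name an instance template (instanceTemplate).

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "script": { "text": "echo Hello, world!" }
      }]
    }
  }],
  "allocationPolicy": {
    "serviceAccount": {
      "email": "SERVICE_ACCOUNT_EMAIL"
    },
    "network": {
      "networkInterfaces": [{
        "network": "projects/PROJECT_ID/global/networks/NETWORK_NAME",
        "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME"
      }]
    },
    "instances": [{
      "installGpuDrivers": true,
      "policy": {
        "machineType": "n1-standard-8",
        "minCpuPlatform": "MIN_CPU_PLATFORM",
        "bootDisk": {
          "sizeGb": 100
        },
        "accelerators": [{
          "type": "nvidia-tesla-t4",
          "count": 1
        }]
      }
    }]
  }
}
```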

for an action:

for a runnable:

These various convenience flags from Cloud Life Sciences are equivalent in Batch, except that they are specified for each runnable (which can contain a script or container) instead of each action (container).
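As a sketch of the Batch runnable flags (background, ignoreExitStatus, and alwaysRun), the following task starts a background runnable that keeps running alongside later runnables, runs a script whose nonzero exit status does not fail the task, and runs a cleanup runnable even if an earlier runnable fails. The script contents are hypothetical.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [
        {
          "script": { "text": "while true; do date; sleep 60; done" },
          "background": true
        },
        {
          "script": { "text": "exit 1" },
          "ignoreExitStatus": true
        },
        {
          "script": { "text": "echo Cleaning up." },
          "alwaysRun": true
        }
      ]
    }
  }]
}
```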

for an action:

options (options) for a container runnable

These Cloud Life Sciences options (and others) are supported in Batch through the options field (options) for a container runnable. Set the options field to any flags that you want Batch to append to the docker run command, for example, -P --pid mynamespace -p 22:22.
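For example, the following sketch passes port-publishing flags to docker run through the options field, as a single string. IMAGE_URI is a placeholder, and the flags are only an example of Docker options.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "container": {
          "imageUri": "IMAGE_URI",
          "options": "-P -p 22:22"
        }
      }]
    }
  }]
}
```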

for an action:

no equivalent

Batch prefetches images and processes the outputs of all runnables identically in accordance with the job's logs policy (logsPolicy).
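For example, the following sketch sends the output of every runnable to Cloud Logging. The logs policy is set once for the job rather than for each runnable. The logs policy also supports a PATH destination with a logsPath field, which this sketch does not use.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "script": { "text": "echo This output goes to Cloud Logging." }
      }]
    }
  }],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
```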

Cloud Life Sciences: option to block external networks (blockExternalNetwork) for an action
Batch: option to block external networks (blockExternalNetwork) for a container runnable

The Cloud Life Sciences option to block external networks for an action is similar to the Batch option to block external networks for a container.

Batch also has many other networking options, such as blocking external access for all of a job's VMs. For more information, see Batch networking overview.
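For example, the following sketch blocks external network access for a single container runnable, reusing the bash image from the earlier sample; the echo message is arbitrary.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "container": {
          "imageUri": "bash",
          "commands": ["-c", "echo No external network access here."],
          "blockExternalNetwork": true
        }
      }]
    }
  }]
}
```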

Cloud Life Sciences: mounts (mounts[]) for an action
Batch: volumes for all runnables (volumes[] in taskSpec) and volume options for a container (volumes[] in container)

In Batch, you can use the volumes[] field in taskSpec to define a job's volumes and their mount paths. Batch mounts these storage volumes to the job's VMs, where they are accessible to all of the job's runnables (scripts or containers). This mounting is done before the VM executes any tasks or runnables.

Additionally, Batch supports explicit volume options on container runnables by using the volumes[] field in container. These mount options are passed to the container as options for the --volume flag of the docker run command—for example, the [ "/etc:/etc", "/foo:/bar" ] value is translated to the docker run --volume /etc:/etc --volume /foo:/bar command on the container.

For more information about using storage volumes with Batch, see Create and run a job that uses storage volumes.
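The following sketch mounts a Cloud Storage bucket for all runnables and then gives one container an explicit volume option. BUCKET_NAME is a placeholder for an existing bucket, and the container path /data is an arbitrary choice; the container's volumes[] entry would be passed to docker run as --volume /mnt/disks/share:/data.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "volumes": [{
        "gcs": {
          "remotePath": "BUCKET_NAME"
        },
        "mountPath": "/mnt/disks/share"
      }],
      "runnables": [
        {
          "script": {
            "text": "ls /mnt/disks/share"
          }
        },
        {
          "container": {
            "imageUri": "bash",
            "commands": ["-c", "ls /data"],
            "volumes": ["/mnt/disks/share:/data"]
          }
        }
      ]
    }
  }]
}
```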

Cloud Life Sciences: option to enable Cloud Storage FUSE (enableFuse) for an action
Batch: no equivalent

Batch handles mounting any storage volumes, such as a Cloud Storage bucket, that you specify for a job. As a result, you don't enable any mounting tools like Cloud Storage FUSE for Batch; however, you can optionally specify mount options for your storage volumes by using the mountOptions[] field.

For more information about using Cloud Storage buckets with Batch, see Create and run a job that uses storage volumes.

Pub/Sub topic (pubSubTopic) for a request to run a pipeline

for a job's notification configurations (notifications[]):

Batch allows greater customization of status updates than Cloud Life Sciences. For example, Batch users can be notified on a Pub/Sub topic whenever an individual task changes state or only when the overall job changes state.
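For example, the following sketch defines two notifications for the same topic: one for job state changes and one only for tasks that change to the FAILED state. TOPIC_ID is a placeholder, and the sketch assumes that leaving newJobState unset sends a notification for every job state change.

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "script": { "text": "echo Hello, world!" }
      }]
    }
  }],
  "notifications": [
    {
      "pubsubTopic": "projects/PROJECT_ID/topics/TOPIC_ID",
      "message": {
        "type": "JOB_STATE_CHANGED"
      }
    },
    {
      "pubsubTopic": "projects/PROJECT_ID/topics/TOPIC_ID",
      "message": {
        "type": "TASK_STATE_CHANGED",
        "newTaskState": "FAILED"
      }
    }
  ]
}
```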

Workflow services

If you use a workflow service with Cloud Life Sciences, then your migration process also involves configuring a workflow service to work with Batch. This section summarizes the workflow services that you can use with Batch.

Batch supports Workflows, which is a workflow service from Google Cloud. If you want to use Workflows with Batch, see Run a Batch job using Workflows. Otherwise, the following table describes other workflow services that you might use with Cloud Life Sciences and can also use with Batch. For each workflow service, the table lists the key differences for using it with Batch instead of Cloud Life Sciences and where to learn more about using it with Batch.

Workflow Service Key Differences Details
Cromwell

To use a Cromwell configuration file for the v2beta Cloud Life Sciences API with the Batch API instead, make the following changes:

  1. For the actor-factory field, replace cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory with cromwell.backend.google.batch.GcpBatchLifecycleActorFactory.
  2. Remove the genomics.endpoint-url field.
  3. Generate a new configuration file.
To learn more about how to use Batch with Cromwell, see the Cromwell documentation for Batch and Cromwell tutorial for Batch.
dsub

To run your dsub pipeline for Cloud Life Sciences with Batch instead, make the following change:

  • For the provider field, replace google-cls-v2 with google-batch.
To learn more about how to use Batch with dsub, see the dsub documentation for Batch.
Nextflow

To use a Nextflow configuration file for Cloud Life Sciences with Batch instead, make the following changes:

  1. For the executor field, replace google-lifesciences with google-batch.
  2. For any config prefixes, replace google.lifeScience with google.batch.
To learn more about how to use Batch with Nextflow, see a Batch tutorial or Nextflow tutorial. For more information about configuration options, see the Nextflow documentation.
Snakemake

To use a Snakemake pipeline for the v2beta Cloud Life Sciences API with the Batch API instead, make the following changes:

  1. Make sure you are using Snakemake version 8 or newer. For more information, see Migration between Snakemake versions.
  2. Make the following changes to the snakemake command:

    • Replace the --google-lifesciences flag with the --executor googlebatch flag.
    • Replace any additional flags that have the --google-lifesciences- prefix to use the --googlebatch- prefix instead.
To learn more about how to use Batch with Snakemake, see the Snakemake documentation for Batch.

What's next

  • To configure Batch for new users and projects, see Get started.
  • To learn how to execute workloads using Batch, see Create a job.