This page describes how to migrate to Batch from Cloud Life Sciences.
On July 17, 2023, Google Cloud announced that Cloud Life Sciences, which had been in beta, is deprecated. The service will no longer be available on Google Cloud after July 8, 2025. However, Batch is generally available and is a comprehensive successor that supports all use cases for Cloud Life Sciences.
Learn more about Batch, Cloud Life Sciences, and product launch stages.
Cloud Life Sciences versus Batch
Migrating from Cloud Life Sciences to Batch primarily involves understanding how you can use Batch for the workloads that you currently execute by running Cloud Life Sciences pipelines.
To understand how you can execute your Cloud Life Sciences workloads on Batch, see the following sections:
Overview
A Cloud Life Sciences pipeline describes a sequence of actions (containers) to execute and the environment to execute the containers in.
A Batch job describes an array of one or more tasks and the environment to execute those tasks in. You define the workload for a job as one sequence of one or more runnables (containers and/or scripts) to be executed. Each task for a job represents one execution of its sequence of runnables.
Cloud Life Sciences pipelines can be expressed as single-task Batch jobs.
For example, the following samples describe a simple Cloud Life Sciences pipeline and its equivalent Batch job:
Cloud Life Sciences pipeline | Batch job |
---|---|
{ "actions": [ { "imageUri": "bash", "commands": [ "-c", "echo Hello, world!" ] } ] } |
{ "taskGroups" : [{ "taskSpec" : { "runnables" : [{ "container":{ "imageUri": "bash", "commands": [ "-c", "echo Hello, world!" ] } }] } }] } |
A Batch job with multiple tasks is similar to running multiple copies of a Cloud Life Sciences pipeline.
Unlike Cloud Life Sciences, Batch lets you automatically schedule multiple executions of your workload. You indicate the number of times that you want to execute a job's sequence of runnables by defining the number of tasks. When a job has multiple tasks, you specify how each execution varies by referencing the task's index in your runnables. Additionally, you can configure the relative schedules for a job's tasks, such as whether to allow multiple tasks to run in parallel or to require tasks to run in sequential order, one at a time. Batch manages the scheduling of the job's tasks: when a task finishes, the job automatically starts the next task, if any.
For example, see the following Batch job. This example
job has 100 tasks that execute on 10 Compute Engine virtual
machine (VM) instances, so there are approximately 10 tasks running in parallel
at any given time. Each task in this example job only executes one runnable:
a script that prints a message and the task's index, which is defined by the
BATCH_TASK_INDEX
predefined environment variable.
{
"taskGroups" : [{
"taskSpec" : {
"runnables" : [{
"script":{
"text": "echo Hello world! This is task ${BATCH_TASK_INDEX}."
}
}]
},
"taskCount": 100,
"parallelism": 10
}]
}
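If you instead need a job's tasks to run one at a time and in order, you can limit the job's parallelism and request in-order scheduling. The following is a minimal sketch of the previous example modified to run sequentially; it assumes the taskGroups[] schedulingPolicy field with the IN_ORDER value, as described in the Batch API reference:

```json
{
  "taskGroups" : [{
    "taskSpec" : {
      "runnables" : [{
        "script":{
          "text": "echo Hello world! This is task ${BATCH_TASK_INDEX}."
        }
      }]
    },
    "taskCount": 100,
    "parallelism": 1,
    "schedulingPolicy": "IN_ORDER"
  }]
}
```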
Workflows that involve the creation and monitoring of multiple similar Cloud Life Sciences pipelines can sometimes be simplified by taking advantage of Batch's built-in scheduling.
Basic operations
This section describes basic operations in Cloud Life Sciences versus Batch.
The following table summarizes the basic operations options for Cloud Life Sciences and Batch.
Basic operation | Cloud Life Sciences options | Batch options |
---|---|---|
Execute a workload. |
|
|
View all of your workloads. |
|
|
View the details and status for a workload. |
|
|
Stop and remove a workload. |
|
|
The basic operations for Cloud Life Sciences and Batch have a few key differences.
Firstly, long-running operation resources do not play the same role in
Batch that they do in Cloud Life Sciences.
Long-running operation resources (LROs) in Cloud Life Sciences
are the primary resource used to list and view your pipelines. But,
long-running operation resources in Batch and other Google Cloud APIs
are only used to monitor the status of a request that takes a long time to
complete. Specifically, in Batch, the only request that
returns a long-running operation resource is deleting a job.
For more information about long-running operation resources for
Batch, see the
Batch API reference documentation for the projects.locations.operations
REST resource.
Instead of using long-running operation resources, Batch has
job resources that you view and delete for your workloads.
Secondly, viewing the details of a workload in Batch involves different operations than Cloud Life Sciences. You can view a job to see both its details and status. But, each of a job's tasks also has its own details and status that you can see by viewing a list of a job's tasks and viewing the details of a task.
To help you further understand the basic operations for Cloud Life Sciences versus Batch, the following sections provide examples of Google Cloud CLI commands and API request paths for some of these basic operations.
Example gcloud CLI commands
For gcloud CLI, Cloud Life Sciences commands begin with gcloud beta lifesciences and Batch commands begin with gcloud batch.
For example, see the following gcloud CLI commands.
Cloud Life Sciences example gcloud CLI commands:
Run a pipeline:
gcloud beta lifesciences pipelines run \
    --project=PROJECT_ID \
    --regions=LOCATION \
    --pipeline-file=JSON_CONFIGURATION_FILE
Get details for a long-running operation:
gcloud beta lifesciences operations describe OPERATION_ID
Replace the following:
- PROJECT_ID: the project ID of your project.
- LOCATION: the location for the pipeline.
- JSON_CONFIGURATION_FILE: the JSON configuration file for the pipeline.
- OPERATION_ID: the identifier for the long-running operation, which was returned by the request to run the pipeline.
Batch example gcloud CLI commands:
Create and run a job:
gcloud batch jobs submit JOB_NAME \
    --project=PROJECT_ID \
    --location=LOCATION \
    --config=JSON_CONFIGURATION_FILE
View the details of a job:
gcloud batch jobs describe JOB_NAME \
    --project=PROJECT_ID \
    --location=LOCATION
View a job's list of tasks:
gcloud batch tasks list \
    --project=PROJECT_ID \
    --location=LOCATION \
    --job=JOB_NAME
View the details of a task:
gcloud batch tasks describe TASK_INDEX \
    --project=PROJECT_ID \
    --location=LOCATION \
    --job=JOB_NAME \
    --task_group=TASK_GROUP_NAME
Delete (and cancel) a job:
gcloud batch jobs delete JOB_NAME \
    --project=PROJECT_ID \
    --location=LOCATION
Replace the following:
- JOB_NAME: the name of the job.
- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JSON_CONFIGURATION_FILE: the path of a JSON file with the job's configuration details.
- TASK_INDEX: the index of the task that you want to view the details of. In a task group, the task index starts at 0 for the first task and increases by 1 with each additional task. For example, a task group that contains four tasks has the indexes 0, 1, 2, and 3.
- TASK_GROUP_NAME: the name of the task group that you want to view the details of. The value must be set to group0.
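For reference, the JSON configuration file that you pass to the --config flag describes the job using the same JSON syntax as the Batch API. A minimal configuration, matching the single-task container job shown in the Overview section, might look like the following sketch:

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "container": {
          "imageUri": "bash",
          "commands": ["-c", "echo Hello, world!"]
        }
      }]
    }
  }]
}
```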
Example API request paths
For APIs, Cloud Life Sciences uses
lifesciences.googleapis.com
request paths and Batch uses
batch.googleapis.com
request paths.
For example, see the following API request paths. Unlike
Cloud Life Sciences, Batch does not have an RPC API;
it only has a REST API.
Cloud Life Sciences example API request paths:
Run a pipeline:
POST https://lifesciences.googleapis.com/v2beta/projects/PROJECT_ID/locations/LOCATION/pipelines:run
Get details for a long-running operation:
GET https://lifesciences.googleapis.com/v2beta/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID
Replace the following:
- PROJECT_ID: the project ID of your project.
- LOCATION: the location for the pipeline.
- OPERATION_ID: the identifier for the long-running operation, which was returned by the request to run the pipeline.
Batch example API request paths:
Create and run a job:
POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME
View the details of a job:
GET https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs/JOB_NAME
View a job's list of tasks:
GET https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs/JOB_NAME/taskGroups/TASK_GROUP_NAME/tasks
Delete a job:
DELETE https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs/JOB_NAME
Check the status of a job deletion request:
GET https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID
Replace the following:
- PROJECT_ID: the project ID of your project.
- LOCATION: the location of the job.
- JOB_NAME: the name of the job.
- TASK_GROUP_NAME: the name of the task group that you want to view the details of. The value must be set to group0.
- OPERATION_ID: the identifier for the long-running operation, which was returned by the request to delete the job.
IAM roles and permissions
This section summarizes the differences in Identity and Access Management roles and permissions for Cloud Life Sciences and Batch. For more information about any roles and their permissions, see the IAM basic and predefined roles reference.
The following table describes the predefined roles and their permissions that are required for users of Cloud Life Sciences.
Cloud Life Sciences roles | Permissions |
---|---|
Any of the following:
|
|
Cloud Life Sciences Viewer (roles/lifesciences.viewer ) on the project |
|
The following table describes some of the predefined roles and their permissions for Batch. Unlike Cloud Life Sciences, Batch requires you to grant permissions to users and the service account for a job. For more information about the IAM requirements, see Prerequisites for Batch.
Batch roles for users | Permissions |
---|---|
Batch Job Editor (roles/batch.jobsEditor ) on the project |
|
Batch Job Viewer (roles/batch.jobsViewer ) on the project |
|
Service Account User (roles/iam.serviceAccountUser ) on the job's service account |
|
Batch roles for service accounts | Permissions |
Batch Agent Reporter (roles/batch.agentReporter ) on the project |
|
Corresponding features
The following table describes the features for Cloud Life Sciences, the equivalent features for Batch, and details about the differences between them.
Each feature is represented by a description and its JSON syntax. You can use JSON syntax when accessing Batch through the API or when specifying a JSON configuration file through the Google Cloud CLI. However, note that you can also use Batch features through other methods, such as Google Cloud console fields, gcloud CLI flags, and client libraries, which are described in the Batch documentation.
For more information about each feature and its JSON syntax, see the following:
- For Cloud Life Sciences, see the Cloud Life Sciences API reference documentation for the projects.locations.pipelines REST resource.
- For Batch, see the Batch API reference documentation for the projects.locations.jobs REST resource.
Cloud Life Sciences features | Batch features | Details |
---|---|---|
pipeline (pipeline ) |
job (job ) and its tasks (taskGroups[] ) |
A Batch job consists of an array of one or more tasks that each execute all of the same runnables. A Cloud Life Sciences pipeline is similar to a Batch job with one task. However, Cloud Life Sciences does not have an equivalent concept for a job's tasks, which are somewhat like repetitions of a pipeline. For more information about jobs and tasks, see Overview for Batch. |
actions (actions[] ) for a pipeline |
runnables (runnables[] ) for a job's tasks |
A Cloud Life Sciences action describes a container, but a Batch runnable can contain either a container or script. |
credentials (credentials ) for an action |
for a container runnable: |
In Cloud Life Sciences an action's credentials must be a Cloud Key Management Service encrypted dictionary with username and password key-value pairs. In Batch, the username and password for a container runnable are in separate fields. Either field may be specified with plain text or with the name of a Secret Manager secret. |
for an action:
|
for an environment:
possible environments:
|
Cloud Life Sciences lets you specify the environment variables
for an action that are formatted as plain text or as an encrypted dictionary.
In Batch, this is similar to defining the environment for a runnable. But, Batch also has more options for specifying environment variables. For more information, see Use environment variables. |
labels for a request to run a pipeline (labels in the request body) |
labels for a job (labels in the job resource) |
Unlike Cloud Life Sciences, Batch does not include a labels field in the request to create a new job. The closest option for Batch is to use labels that are only associated with the job. Batch supports multiple types of labels. |
regions (regions[] ) and zones (zones[] ) for a pipeline's resources (resources ) |
allowed locations (allowedLocations ) for a job's resource location policy (locationPolicy ) |
In Cloud Life Sciences, a pipeline executes on a single VM, which you can specify the desired regions and/or zones for. In Batch, the equivalent option is the allowed locations for a job, which you can define as one or more regions or zones and specifies where the VMs for a job can be created. All the VMs for a single Batch job belong to a single managed instance group (MIG), which exists in a particular region; however, individual VMs might be in different zones of that region. Notably, specifying the allowed locations field for a job is optional because it is separate from the job's location. Unlike the job's location, the allowed location does not affect the location that is used for creating a Batch job and storing job metadata. For more information, see Batch locations. |
VM options for a pipeline's resources (resources ) |
VM options for a job's allocation policy |
In Cloud Life Sciences, you can configure the (one) VM that a pipeline runs on. In Batch, the same options for VMs are available in the fields of a job's allocation policy. |
for an action:
|
for a runnable:
|
These various convenience flags from Cloud Life Sciences are equivalent in Batch except they are specified for each runnable (which can contain a script or container) instead of each action (container). |
for an action:
|
options (options ) for a container runnable |
These Cloud Life Sciences options (and others) are supported in Batch through the options field for a container runnable. |
for an action:
|
no equivalent |
Batch prefetches images and processes the outputs of all runnables identically in accordance with the job's logs policy. |
option to block external networks (blockExternalNetwork ) for an action |
option to block external networks (blockExternalNetwork ) for a container runnable |
The Cloud Life Sciences option to block external networks for an action is similar to the Batch option to block external networks for a container. Batch also has many other networking options, such as to block external networks for all of a job's VMs. For more information, see Batch networking overview. |
mounts (mounts[] ) for an action |
volumes for all runnables (volumes[] in taskSpec ) and volume options for a container (volumes[] in container ) |
In Batch, you can use the volumes[] field in taskSpec to make storage volumes available to all of a job's runnables. Additionally, Batch supports explicit volume options for container runnables by using the volumes[] field in container. For more information about using storage volumes with Batch, see Create and run a job that uses storage volumes. |
option to enable Cloud Storage FUSE (enableFuse ) for an action |
no equivalent |
Batch handles mounting any storage volumes,
such as a Cloud Storage bucket, that you specify for a job.
As a result, you don't enable any mounting tools like Cloud Storage FUSE
for Batch; however, you can optionally specify
mount options for your storage volumes. For more information about using Cloud Storage buckets with Batch, see Create and run a job that uses storage volumes. |
Pub/Sub topic (pubSubTopic ) for a request to run a pipeline |
Pub/Sub topics for a job's notification configurations |
Batch allows greater customization of status updates than Cloud Life Sciences. For example, Batch users can be notified on a Pub/Sub topic when either individual tasks change state or only when the overall job changes state. |
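To tie several of these features together, the following sketch outlines a single Batch job that combines some of the fields discussed in the table above: environment variables for a runnable, a blocked external network for a container, a Cloud Storage bucket mounted as a volume, an allowed location, and a Pub/Sub notification. The field names follow the Batch API's job resource, but treat this as an illustrative sketch rather than a complete, validated configuration:

```json
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [
        {
          "script": { "text": "echo Processing files in ${INPUT_DIR}" },
          "environment": {
            "variables": { "INPUT_DIR": "/mnt/disks/share" }
          }
        },
        {
          "container": {
            "imageUri": "bash",
            "commands": ["-c", "echo offline step"],
            "blockExternalNetwork": true
          }
        }
      ],
      "volumes": [{
        "gcs": { "remotePath": "BUCKET_NAME" },
        "mountPath": "/mnt/disks/share"
      }]
    },
    "taskCount": 1
  }],
  "allocationPolicy": {
    "location": {
      "allowedLocations": ["regions/us-central1"]
    }
  },
  "notifications": [{
    "pubsubTopic": "projects/PROJECT_ID/topics/TOPIC_NAME",
    "message": { "type": "JOB_STATE_CHANGED" }
  }]
}
```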
Workflow services
If you use a workflow service with Cloud Life Sciences, then your migration process also involves configuring a workflow service to work with Batch. This section summarizes the workflow services that you can use with Batch.
Batch supports Workflows, which is a workflow service from Google Cloud. If you want to use Workflows with Batch, see Run a Batch job using Workflows. Otherwise, the following table describes other workflow services that you might use for Cloud Life Sciences and that you can also use with Batch. This table lists the key differences for using each workflow service with Batch instead of Cloud Life Sciences and details about where to learn more about using each service with Batch.
Workflow Service | Key Differences | Details |
---|---|---|
Cromwell |
To use a Cromwell configuration file for the v2beta Cloud Life Sciences API with the Batch API instead, make the following changes:
|
To learn more about how to use Batch with Cromwell, see the Cromwell documentation for Batch and Cromwell tutorial for Batch. |
dsub |
To run your dsub pipeline for Cloud Life Sciences with Batch instead, make the following changes:
|
To learn more about how to use Batch with dsub, see the dsub documentation for Batch. |
Nextflow |
To use a Nextflow configuration file for Cloud Life Sciences with Batch instead, make the following changes:
|
To learn more about how to use Batch with Nextflow, see a Batch tutorial or a Nextflow tutorial. For more information about configuration options, see the Nextflow documentation. |
Snakemake |
To use a Snakemake pipeline for the v2beta Cloud Life Sciences API with the Batch API instead, make the following changes:
|
To learn more about how to use Batch with Snakemake, see the Snakemake documentation for Batch. |
What's next
- To configure Batch for new users and projects, see Get started.
- To learn how to execute workloads using Batch, see Create a job.