Get started with Batch

This page describes how to get started with Batch for Google Cloud.

Overview

Batch is a fully managed service that lets you schedule, queue, and execute batch processing workloads on Compute Engine virtual machine (VM) instances. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale.

Using Batch, you don't need to configure and manage third-party job schedulers, provision and deprovision resources, or request resources one zone at a time. To run a job, you specify parameters for the resources required for your workload, then Batch obtains resources and queues the job for execution. Batch provides native integration with other Google Cloud services to aid in the scheduling, execution, storage, and analysis of batch jobs, so you can focus on submitting a job and consuming the results.

Batch consists of the following components:

  • Job: A scheduled program that runs a set of tasks to completion without any user interaction, typically for computational workloads. For example, a job might be a single shell script or a complex, multipart computation.

    A job is executed through one or more specific actions called tasks. Each Batch job consists of an array of one or more tasks that all run the same executable. A job's tasks can run in parallel or sequentially on the job's resources.

  • Tasks: Programmatic actions that are defined as part of a job and executed when the job runs. Each task is part of a job's task group.

  • Resources: The infrastructure needed to run a job. Each Batch job runs on a regional managed instance group (MIG) of Compute Engine VMs based on the job's specified requirements and location. If specified, a job might also use additional compute resources, like GPUs, or additional read/write storage resources, like local SSDs or a Cloud Storage bucket. Some of the factors that determine the number of VMs provisioned for a job include the compute resources required for each task and the job's parallelism: whether you want tasks to run sequentially on one VM or simultaneously on multiple VMs.

In summary, Batch lets you create and run jobs, each of which automatically provisions and uses the resources required to execute its tasks.
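
To make the relationship between a job, its tasks, and its resources more concrete, the following is a minimal sketch that writes a job definition to a JSON file. The field names are intended to match the Batch API job resource and should be verified against the API reference. The file name, script text, task count, parallelism, and resource values are illustrative assumptions and are not taken from this page.

```shell
# Hypothetical file name for the job definition; override with CONFIG_FILE.
CONFIG_FILE="${CONFIG_FILE:-./hello-batch-job.json}"

# One task group whose 4 tasks all run the same script (the "array of tasks").
# "parallelism": 2 lets up to 2 tasks run at the same time.
# "computeResource" states what each task needs (2 vCPUs, 16 MiB of memory).
# The heredoc delimiter is quoted so ${BATCH_TASK_INDEX} is written literally.
cat > "$CONFIG_FILE" <<'EOF'
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello from task ${BATCH_TASK_INDEX}"
            }
          }
        ],
        "computeResource": {
          "cpuMilli": 2000,
          "memoryMib": 16
        }
      },
      "taskCount": 4,
      "parallelism": 2
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
EOF

echo "Wrote job definition to $CONFIG_FILE"

# To submit the job (requires the gcloud CLI and a project set up for Batch):
#   gcloud batch jobs submit JOB_NAME --location LOCATION --config "$CONFIG_FILE"
```

JOB_NAME and LOCATION are placeholders. Setting parallelism to 1 would instead run the tasks one at a time.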

Pricing

There is no additional cost for using Batch. You are charged only for the underlying resources required to execute your jobs.

Restrictions

Batch has the following restrictions:

Prerequisites

To start using Batch, complete the following prerequisites:

  1. If your project has not used Batch before, enable Batch for your project.
  2. Set up Batch for each new user.

Enable Batch for a project

To start using Batch with a project, do the following:

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  2. Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.

  3. Make sure Batch is enabled for your project:

    1. Enable the APIs for Batch using the Google Cloud console or the Google Cloud CLI.

      Console

      Enable the Batch, Compute Engine, and Logging APIs.

      gcloud

      Enable the Batch, Compute Engine, and Logging APIs:

      gcloud services enable batch.googleapis.com compute.googleapis.com logging.googleapis.com

    2. To ensure that the Compute Engine default service account has the necessary permissions to allow the Batch service agent to create and access resources for jobs, ask your administrator to grant the Compute Engine default service account the following IAM roles:

      • Batch Agent Reporter (roles/batch.agentReporter) on the project
      • To let jobs access a Cloud Storage bucket: Storage Admin (roles/storage.admin) on the bucket
      • To let jobs write logs to Cloud Logging: Logs Writer (roles/logging.logWriter) on the project

      For more information about granting roles to default service accounts, see Restricting service accounts and Manage access to service accounts.
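
For administrators who prefer the command line, the role grants listed above could be sketched as follows. This is a sketch under assumptions, not a prescribed procedure: the `run` and `grant_default_sa_roles` helpers, the `DRY_RUN` switch, and the sample project ID, project number, and bucket name are all hypothetical. The member address assumes the usual PROJECT_NUMBER-compute@developer.gserviceaccount.com format of the Compute Engine default service account. By default the script only prints the commands so that they can be reviewed first.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to run them (requires the gcloud CLI and permission
# to change IAM policy on the project and bucket).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper.
# Usage: grant_default_sa_roles PROJECT_ID PROJECT_NUMBER [BUCKET_NAME]
grant_default_sa_roles() {
  project_id="$1"
  project_number="$2"
  bucket="${3:-}"
  # Assumed address format of the Compute Engine default service account.
  member="serviceAccount:${project_number}-compute@developer.gserviceaccount.com"

  # Project-level roles: report agent status and write logs.
  for role in roles/batch.agentReporter roles/logging.logWriter; do
    run gcloud projects add-iam-policy-binding "$project_id" \
      --member="$member" --role="$role"
  done

  # Bucket-level role, only when a bucket name is given.
  if [ -n "$bucket" ]; then
    run gcloud storage buckets add-iam-policy-binding "gs://${bucket}" \
      --member="$member" --role=roles/storage.admin
  fi
}

# Sample values only; replace them with your own before setting DRY_RUN=0.
grant_default_sa_roles my-project-id 123456789012 my-bucket
```

Omitting the third argument skips the Cloud Storage grant, which matches the case where jobs do not need a bucket.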

Set up Batch for a new user

To start using Batch as a user, do the following:

  1. To get the permissions that you need to use Batch, ask your administrator to grant you the required IAM roles on the project. Refer to the documentation for each task to see its requirements.

    For example, if you want to start learning how to use Batch by creating a basic job, consider requesting roles for the following tasks:

    • To create and delete jobs: Batch Job Administrator (roles/batch.jobsAdmin)
    • To list and describe jobs: Batch Job Administrator (roles/batch.jobsAdmin) or Batch Job Viewer (roles/batch.jobsViewer)
    • To view logs for jobs: Logs Viewer (roles/logging.viewer)

    For more information about granting roles, see Manage access.

  2. If you want to use the command-line examples for Batch, do the following:

    1. Install and initialize the Google Cloud CLI.

    2. Recommended: Set a default project using the gcloud config set project command:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with the project ID of your project.

  3. If you want to use the API examples for Batch, see Authenticate to Batch.
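
As a quick sanity check of the command-line setup described above, a small script along these lines could confirm that the gcloud CLI is installed and that a default project is set. The function name `check_cli_setup` and its messages are hypothetical; the sketch assumes that `gcloud config get-value project` prints nothing (or `(unset)`) when no default project is configured.

```shell
# Hypothetical helper: reports whether the gcloud CLI is on PATH and whether
# a default project is configured. Returns non-zero when either is missing.
check_cli_setup() {
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud CLI not found; install and initialize it first"
    return 1
  fi

  # Read the configured default project, ignoring informational stderr output.
  project="$(gcloud config get-value project 2>/dev/null)"
  if [ -z "$project" ] || [ "$project" = "(unset)" ]; then
    echo "no default project; run: gcloud config set project PROJECT_ID"
    return 1
  fi

  echo "default project: $project"
}

# Report the result without aborting the surrounding shell session.
check_cli_setup || true
```

The check is read-only: it does not change any gcloud configuration.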

What's next