This document explains the execution process and creation options for jobs. Batch jobs let you run batch-processing workloads on Google Cloud. To learn more about jobs, see Get started with Batch.
How job creation and execution works
To use Batch to run a workload, you create a job that specifies your workload and its requirements. When you finish creating the job, the job is automatically queued, scheduled, and executed on the specified resources.
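As a rough illustration of a job that specifies a workload and its requirements, the following Python sketch builds a minimal job configuration and prints it as JSON. The field names follow the Batch JSON job format as understood here. The script text, task count, and resource values are illustrative assumptions, not recommendations.

```python
import json


def build_basic_job(task_count: int = 4, parallelism: int = 2) -> dict:
    """Return a minimal Batch job configuration as a plain dictionary.

    Every value below is an illustrative assumption.
    """
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    # The workload: one script step that every task runs.
                    "runnables": [
                        {"script": {"text": "echo Hello from task ${BATCH_TASK_INDEX}"}}
                    ],
                    # The requirements for each task.
                    "computeResource": {"cpuMilli": 2000, "memoryMib": 16},
                    "maxRetryCount": 2,
                    "maxRunDuration": "3600s",
                },
                "taskCount": task_count,
                "parallelism": parallelism,
            }
        ],
        # Send task output to Cloud Logging.
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


if __name__ == "__main__":
    print(json.dumps(build_basic_job(), indent=2))
```

Saved to a file such as job.json (a hypothetical name), a configuration like this could be submitted with `gcloud batch jobs submit JOB_NAME --location LOCATION --config job.json`, where JOB_NAME and LOCATION are placeholders.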
Batch automatically provisions and deprovisions the resources required to run a job: a regional managed instance group (MIG) of Compute Engine virtual machine (VM) instances, plus any additional resources that you specify. The time a job takes to finish queueing and running varies between jobs and over time, depending on resource availability. Generally, jobs are more likely to run and finish sooner if they are smaller and require only a few common resources. The example jobs in the Batch documentation typically use minimal resources, so you might see them finish running in as little as a few minutes.
After you create a job, you can check its status by describing the job. After a job's state indicates that the job has started running, you can also monitor the job by using Cloud Logging to view and manage logs. A job's other information remains available in Batch until you or Google Cloud deletes the job. Google Cloud automatically deletes a job 60 days after it succeeds or fails. Before then, you can delete the job yourself or, if you need to retain the information, export the job before it is deleted.
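The 60-day retention window can be sketched as simple date arithmetic, for example to decide how long you have left to export a finished job. The state names SUCCEEDED and FAILED and both helper functions are assumptions made for illustration. The finish time would come from describing the job, for example with `gcloud batch jobs describe JOB_NAME --location LOCATION`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed state names for finished jobs; the text only says "succeeds or fails".
FINISHED_STATES = {"SUCCEEDED", "FAILED"}

# Retention period stated in the text above.
RETENTION = timedelta(days=60)


def automatic_deletion_time(state: str, finished_at: datetime) -> Optional[datetime]:
    """Return when a finished job is due for automatic deletion.

    Returns None for jobs that have not finished yet.
    """
    if state not in FINISHED_STATES:
        return None
    return finished_at + RETENTION


def days_left_to_export(state: str, finished_at: datetime, now: datetime) -> Optional[int]:
    """Return the whole days remaining before automatic deletion, never below 0."""
    deadline = automatic_deletion_time(state, finished_at)
    if deadline is None:
        return None
    return max((deadline - now).days, 0)


if __name__ == "__main__":
    finished = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(automatic_deletion_time("SUCCEEDED", finished))
```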
Job creation options
Learn the fundamentals of job creation:
Create and run a basic job explains the fundamentals, including how to define a job's tasks by using either a script or a container image, and how to use predefined and custom environment variables.
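The two ways of defining a task step, a script or a container image, can be sketched side by side in Python. The field names follow the Batch JSON job format as understood here. The helper functions, the GREETING variable, and the busybox image URI are hypothetical examples, while BATCH_TASK_INDEX stands for one of the predefined environment variables.

```python
import json
from typing import Dict, List, Optional


def _with_environment(runnable: dict, variables: Optional[Dict[str, str]]) -> dict:
    """Attach custom environment variables to a runnable when any are given."""
    if variables:
        runnable["environment"] = {"variables": dict(variables)}
    return runnable


def script_runnable(text: str, variables: Optional[Dict[str, str]] = None) -> dict:
    """Define a task step that runs a shell script."""
    return _with_environment({"script": {"text": text}}, variables)


def container_runnable(
    image_uri: str,
    entrypoint: str,
    commands: List[str],
    variables: Optional[Dict[str, str]] = None,
) -> dict:
    """Define a task step that runs a container image."""
    container = {
        "imageUri": image_uri,
        "entrypoint": entrypoint,
        "commands": list(commands),
    }
    return _with_environment({"container": container}, variables)


if __name__ == "__main__":
    # GREETING is a custom variable; BATCH_TASK_INDEX is predefined.
    message = "echo ${GREETING} from task ${BATCH_TASK_INDEX}"
    custom = {"GREETING": "Hello"}
    runnables = [
        script_runnable(message, custom),
        container_runnable(
            "gcr.io/google-containers/busybox", "/bin/sh", ["-c", message], custom
        ),
    ]
    print(json.dumps(runnables, indent=2))
```

Either dictionary can take the place of an entry in a task specification's list of runnables.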
After you understand the fundamentals of job creation, consider using one or more of the following options:
Control access for a job:
Control access for a job using a custom service account explains how to specify a job's service account, which influences the resources and applications that a job's VMs can access. If you do not specify a custom service account, jobs default to using the Compute Engine default service account.
Networking overview provides an overview of when and how you can customize the networking configuration for a job, including specifying the job's network, blocking external connections, and protecting data and resources by using VPC Service Controls.
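The access-related options above both end up in the part of a job configuration that describes where the job runs. The following Python sketch builds that fragment. The field names follow the Batch JSON job format as understood here; the function name and every placeholder value (PROJECT_ID, REGION, and so on) are assumptions for illustration only.

```python
import json
from typing import Optional


def access_policy(
    service_account_email: Optional[str] = None,
    network: Optional[str] = None,
    subnetwork: Optional[str] = None,
    block_external_ips: bool = False,
) -> dict:
    """Build the access-related part of a job's allocation policy.

    Leaving out the service account leaves the default in place, which the
    text above describes as the Compute Engine default service account.
    """
    policy: dict = {}
    if service_account_email:
        policy["serviceAccount"] = {"email": service_account_email}
    if network and subnetwork:
        policy["network"] = {
            "networkInterfaces": [
                {
                    "network": network,
                    "subnetwork": subnetwork,
                    # True keeps the job's VMs off external IP addresses.
                    "noExternalIpAddress": block_external_ips,
                }
            ]
        }
    return policy


if __name__ == "__main__":
    fragment = access_policy(
        service_account_email="SERVICE_ACCOUNT_EMAIL",
        network="projects/PROJECT_ID/global/networks/NETWORK",
        subnetwork="projects/PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK",
        block_external_ips=True,
    )
    print(json.dumps({"allocationPolicy": fragment}, indent=2))
```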
Advanced job creation options:
Define job resources using a VM instance template explains how to specify a Compute Engine VM template to define a job's resources when you create a job.
Configure task communication using an MPI library explains how to configure a job with tightly coupled tasks that communicate with each other across different VMs by using a Message Passing Interface (MPI) library. A common use case for MPI is tightly coupled high-performance computing (HPC) workloads.
Use GPUs for a job explains how to define a job that uses one or more graphics processing units (GPUs). Common use cases for jobs that use GPUs include intensive data processing or machine learning (ML) workloads.
Use storage volumes for a job explains how to define a job that can access one or more external storage volumes. Storage options include new or existing persistent disks, new local SSDs, existing Cloud Storage buckets, and an existing network file system (NFS) such as a Filestore file share.
VM OS environment overview provides an overview of when and how you can customize the VM operating system (OS) environment for a job, including the job's VM OS image and boot disks.
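Several of the advanced options above amount to small additions to a job configuration. The following Python sketch shows how GPU, VM instance template, and storage volume settings might be layered onto an existing configuration. The field names follow the Batch JSON job format as understood here. All helper functions are hypothetical, and the machine type, GPU type, bucket name, and mount path are placeholder assumptions.

```python
import copy
from typing import List


def gpu_instances(machine_type: str, gpu_type: str, gpu_count: int) -> List[dict]:
    """Describe VMs that have GPUs attached and GPU drivers installed."""
    return [
        {
            "installGpuDrivers": True,
            "policy": {
                "machineType": machine_type,
                "accelerators": [{"type": gpu_type, "count": gpu_count}],
            },
        }
    ]


def template_instances(template_name: str) -> List[dict]:
    """Describe VMs by pointing at an existing VM instance template."""
    return [{"instanceTemplate": template_name}]


def bucket_volume(bucket: str, mount_path: str) -> dict:
    """Mount an existing Cloud Storage bucket at mount_path."""
    return {"gcs": {"remotePath": bucket}, "mountPath": mount_path}


def nfs_volume(server: str, remote_path: str, mount_path: str) -> dict:
    """Mount an existing NFS share, such as a Filestore file share."""
    return {"nfs": {"server": server, "remotePath": remote_path}, "mountPath": mount_path}


def apply_advanced_options(job: dict, instances: List[dict], volumes: List[dict]) -> dict:
    """Return a copy of job with instance and volume settings added."""
    updated = copy.deepcopy(job)
    updated.setdefault("allocationPolicy", {})["instances"] = instances
    for group in updated.get("taskGroups", []):
        group.setdefault("taskSpec", {})["volumes"] = list(volumes)
    return updated


if __name__ == "__main__":
    base = {"taskGroups": [{"taskSpec": {"runnables": []}}]}
    job = apply_advanced_options(
        base,
        gpu_instances("n1-standard-8", "nvidia-tesla-t4", 1),
        [bucket_volume("BUCKET_NAME", "/mnt/disks/share")],
    )
    print(job)
```

Returning a modified copy keeps the base configuration reusable, so the same basic job can be combined with different resource choices.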
If you want to learn how to use additional services to create and run jobs, follow a tutorial:
Create and run Batch jobs using Terraform and Cloud Scheduler explains how to incorporate Batch jobs into Terraform.