This page explains how to run Jobs in Google Kubernetes Engine.
Overview
In GKE, a Job is a controller object that represents a finite task. Jobs differ from other controller objects in that Jobs manage the task as it runs to completion, rather than managing an ongoing desired state (such as the total number of running Pods).
Jobs are useful for large computation and batch-oriented tasks. Jobs can be used to support parallel execution of Pods. You can use a Job to run independent but related work items in parallel: sending emails, rendering frames, transcoding files, scanning database keys, etc. However, Jobs are not designed for closely-communicating parallel processes such as continuous streams of background processes.
In GKE, there are two types of Jobs:
- Non-parallel Job: A Job which creates only one Pod (which is re-created if the Pod terminates unsuccessfully), and which is completed when the Pod terminates successfully.
- Parallel Job with a completion count: A Job that is completed when a certain number of Pods terminate successfully. You specify the desired number of completions using the `completions` field.
Jobs are represented by Kubernetes Job objects. When a Job is created, the Job controller creates one or more Pods and ensures that its Pods terminate successfully. As its Pods terminate, a Job tracks how many Pods completed their tasks successfully. Once the desired number of successful completions is reached, the Job is complete.
Similar to other controllers, a Job controller creates a new Pod if one of its Pods fails or is deleted.
Before you begin
Before you start, make sure you have performed the following tasks:
- Ensure that you have enabled the Google Kubernetes Engine API.
- Ensure that you have installed the Cloud SDK.

Set up default `gcloud` settings using one of the following methods:

- Using `gcloud init`, if you want to be walked through setting defaults.
- Using `gcloud config`, to individually set your project ID, zone, and region.
Using gcloud init
If you receive the error `One of [--zone, --region] must be supplied: Please specify location`, complete this section.

- Run `gcloud init` and follow the directions:

  ```shell
  gcloud init
  ```

  If you are using SSH on a remote server, use the `--console-only` flag to prevent the command from launching a browser:

  ```shell
  gcloud init --console-only
  ```

- Follow the instructions to authorize `gcloud` to use your Google Cloud account.
- Create a new configuration or select an existing one.
- Choose a Google Cloud project.
- Choose a default Compute Engine zone.
Using gcloud config
- Set your default project ID:

  ```shell
  gcloud config set project PROJECT_ID
  ```

- If you are working with zonal clusters, set your default compute zone:

  ```shell
  gcloud config set compute/zone COMPUTE_ZONE
  ```

- If you are working with regional clusters, set your default compute region:

  ```shell
  gcloud config set compute/region COMPUTE_REGION
  ```

- Update `gcloud` to the latest version:

  ```shell
  gcloud components update
  ```
Creating a Job
You can create a Job using `kubectl apply` with a manifest file.
The following example shows a Job manifest:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  # Unique key of the Job instance
  name: example-job
spec:
  template:
    metadata:
      name: example-job
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl"]
        args: ["-Mbignum=bpi", "-wle", "print bpi(2000)"]
      # Do not restart containers after they exit
      restartPolicy: Never
  # Number of retries before marking the Job as failed
  backoffLimit: 4
```
Copy the manifest to a file named `config.yaml`, and create the Job:

```shell
kubectl apply -f config.yaml
```
This Job computes pi to 2,000 places and then prints the result.
The only mandatory field in a Job object is the Pod `template`.
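Once the Job finishes, you can read the computed digits from the Pod's logs. The following is a sketch, not part of the original example; it relies on the `job-name` label that the Job controller adds to each Pod it manages:

```shell
# Find the Pod(s) created by the Job via the job-name label.
pods=$(kubectl get pods --selector=job-name=example-job \
    --output=jsonpath='{.items[*].metadata.name}')

# Print the output of the pi computation from the Pod's logs.
kubectl logs $pods
```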
Job completion count
A Job is completed when a specific number of Pods terminate successfully. By default, a non-parallel Job with a single Pod completes as soon as the Pod terminates successfully.
If you have a parallel Job, you can set a completion count using the optional `completions` field. This field specifies how many Pods must terminate successfully before the Job is complete. The `completions` field accepts a non-zero, positive value.

Omitting `completions` or specifying a zero value causes the success of any Pod to signal the success of all Pods.
Copy `config.yaml` from the preceding example to a file named `config-2.yaml`. In `config-2.yaml`, change `name` to `example-job-2`, and add `completions: 8` to the Job's `spec` field. This specifies that there must be eight successful completions:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job-2
spec:
  completions: 8
  template:
    metadata:
      name: example-job-2
    spec:
      ...
```
Create the Job:
```shell
kubectl apply -f config-2.yaml
```
The default value of `completions` is `1`. When `completions` is set, the `parallelism` field defaults to `1` unless set otherwise. If neither field is set, both default to `1`.
Managing parallelism
By default, Job Pods do not run in parallel. The optional `parallelism` field specifies the maximum number of Pods a Job should run concurrently at any given time.

The actual number of Pods running in a steady state might be less than the `parallelism` value if the remaining work is less than the `parallelism` value. If you have also set `completions`, the actual number of Pods running in parallel does not exceed the number of remaining completions. A Job may throttle Pod creation in response to excessive Pod creation failures.
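The steady-state rule above can be sketched as a small calculation. This is an illustrative simplification, not the actual Job controller logic:

```python
def pods_to_run(parallelism: int, completions: int, succeeded: int) -> int:
    """Upper bound on concurrently running Pods for a Job (simplified).

    The controller never runs more Pods than `parallelism` and, when
    `completions` is set, never more than the remaining completions.
    """
    remaining = completions - succeeded
    return max(0, min(parallelism, remaining))

# With parallelism: 5 and completions: 8, after 6 successes only 2
# completions remain, so at most 2 Pods run concurrently.
print(pods_to_run(5, 8, 6))  # → 2
```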
Copy `config.yaml` from the preceding example to a file named `config-3.yaml`. In `config-3.yaml`, change `name` to `example-job-3`, and add `parallelism: 5` to the Job's `spec` field. This specifies that there should be five concurrent Pods running:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job-3
spec:
  parallelism: 5
  template:
    metadata:
      name: example-job-3
    spec:
      ...
```
Create the Job:
```shell
kubectl apply -f config-3.yaml
```
The default value of `parallelism` is `1` if the field is omitted. If the value is set to `0`, the Job is paused until the value is increased.
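Because a `parallelism` of `0` pauses a Job, you can pause and resume a running Job by patching the field. This is a sketch using the `example-job-3` name from above; verify the behavior on your own cluster:

```shell
# Pause the Job by setting parallelism to 0.
kubectl patch job example-job-3 --patch '{"spec":{"parallelism":0}}'

# Resume it later by restoring the desired parallelism.
kubectl patch job example-job-3 --patch '{"spec":{"parallelism":5}}'
```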
Specifying retries
By default, a Job runs uninterrupted unless there is a failure, at which point the Job defers to the `backoffLimit` field. This field specifies the number of retries before marking the Job as failed; the default value is 6. The number of retries applies per Pod, not globally, so if multiple Pods fail (when `parallelism` is greater than 1), the Job continues to run until a single Pod fails `backoffLimit` times. When the `backoffLimit` is reached, the Job is marked as failed and any running Pods are terminated.
For example, the earlier manifest sets the number of retries to `4`:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  template:
    metadata:
      name: example-job
    spec:
      containers:
      ...
  backoffLimit: 4
```
Pod replacement
A Job recreates Pods, honoring the `backoffLimit`, when the current Pod is considered failed in scenarios such as:

- The Pod's container exits with a non-zero error code.
- When a node is rebooted, the kubelet may mark the Pod as `Failed` after the reboot.

Under certain scenarios, a Job that has not completed replaces the Pod without counting against the `backoffLimit`, such as:

- Manually killing a Pod does not set the Pod phase to `Failed`. The replacement Pod may be created even before the current Pod's termination grace period has elapsed.
- When a node is drained (manually or during auto-upgrade), the Pod is terminated honoring the drain grace period and is replaced.
- When a node is deleted, the Pod is garbage collected (marked as deleted) and is replaced.
Specifying a deadline
By default, a Job creates new Pods forever if its Pods fail continuously. If you prefer not to have a Job retry forever, you can specify a deadline value using the optional `activeDeadlineSeconds` field.
A deadline grants a Job a specific amount of time, in seconds, to complete its tasks successfully before terminating.
To specify a deadline, add the `activeDeadlineSeconds` value to the Job's `spec` field in the manifest file. For example, the following configuration specifies that the Job has 100 seconds to complete successfully:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  activeDeadlineSeconds: 100
  template:
    metadata:
      name: example-job
    spec:
      ...
```
If a Job does not complete successfully before the deadline, the Job ends with the status `DeadlineExceeded`. This stops the creation of new Pods and causes existing Pods to be deleted.
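A deadline can be combined with a retry limit. In the following sketch (an assumed manifest, not one of the page's examples), the Job stops after 100 seconds even if the `backoffLimit` has not been exhausted, because the deadline takes precedence over the retry limit:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job-deadline  # hypothetical name
spec:
  activeDeadlineSeconds: 100  # hard cap on total Job runtime
  backoffLimit: 4             # retry limit; the deadline still applies
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
```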
Specifying a Pod selector
Manually specifying a selector is useful if you want to update a Job's Pod template, but you want the Job's current Pods to run under the updated Job.
A Job is instantiated with a `selector` field. The `selector` generates a unique identifier for the Job's Pods. The generated ID does not overlap with any other Jobs. Generally, you do not set this field yourself: setting a `selector` value which overlaps with another Job can cause issues with Pods in the other Job. To set the field yourself, you must specify `manualSelector: true` in the Job's `spec` field.
For example, you can run `kubectl get job my-job --output=yaml` to see the Job's specification, which contains the selector generated for its Pods:

```yaml
kind: Job
metadata:
  name: my-job
  ...
spec:
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: a8f3d00d-c6d2-11e5-9f87-42010af00002
  ...
```
When you create a new Job, you can set `manualSelector` to `true`, then set the `selector` field's `controller-uid` value like the following:

```yaml
kind: Job
metadata:
  name: my-new-job
  ...
spec:
  manualSelector: true
  selector:
    matchLabels:
      controller-uid: a8f3d00d-c6d2-11e5-9f87-42010af00002
  ...
```
Pods created by `my-new-job` use the previous Pod UID.
Inspecting a Job
kubectl
To check a Job's status, run the following command:

```shell
kubectl describe job my-job
```
To view all Pod resources in your cluster, including completed Pods created by the Job, run:

```shell
kubectl get pods -a
```

The `-a` flag specifies that all resources of the specified type (in this case, Pods) should be shown.
Console
After creating a Job using `kubectl`, you can inspect it by performing the following steps:
Visit the Google Kubernetes Engine Workloads menu in Cloud Console.
Select the desired workload from the menu.
You can inspect the Job in the following ways:
- To see the Job's live configuration, click YAML.
- To see all events related to the Job, click Events.
- To see the Job's revision history, click Revision history.
Deleting a Job
When a Job completes, the Job stops creating Pods. The Job API object is not removed when it completes, which allows you to view its status. Pods created by the Job are not deleted, but they are terminated. Retention of the Pods allows you to view their logs and to interact with them.
kubectl
To delete a Job, run the following command:

```shell
kubectl delete job my-job
```
When you delete a Job, all of its Pods are also deleted.
To delete a Job but retain its Pods, specify the `--cascade false` flag:

```shell
kubectl delete jobs my-job --cascade false
```
Console
To delete a Job, perform the following steps:
Visit the Google Kubernetes Engine Workloads menu in Cloud Console.
From the menu, select the desired workload.
Click Delete.
From the confirmation dialog menu, click Delete.