This page shows you how to run batch jobs with Batch on GKE (Batch).
There are two ways to submit jobs in Batch: ksub and kubectl. The ksub command can submit shell scripts as jobs, and kubectl can submit jobs using YAML files.
Configuring ksub
Ksub is a command-line tool for performing job-related actions on your Batch system. You can use keywords prefixed with #KB to specify job properties.
To configure ksub, perform the following steps:
Enable ksub to use your own user credentials for API access:
gcloud auth application-default login
Change to the kbatch directory, then choose the ksub version for your operating system from the ksub folder. For example, Linux users run:
cd ksub/linux/amd64
Set up a default configuration file:
./ksub --config --create-default
This creates a configuration file at ~/.ksubrc.
Add the values for project-id, cluster-name, and, if you are not operating in the default namespace, namespace-name using a ksub command like this:
./ksub --config --set-project-id project-id \
    --set-clustername cluster-name --set-namespace namespace-name
Set up mount points (optional if you are not using a private PersistentVolumeClaim created by your cluster admin):
./ksub --config --add-volume fs-volume --volume-source PersistentVolumeClaim --params claimName:[PVC_NAME] --params readOnly:false
where [PVC_NAME] is the PersistentVolumeClaim name created by your admin for you to save your private input/output files. If using a Filestore instance, this is the Filestore instance's name.
Add the install directory of ksub to $PATH:
export PATH=$PATH:/path/to/kbatch/
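Taken together, a complete ksub configuration session might look like the following sketch; the project ID my-project, cluster name my-cluster, and namespace team-a are placeholder values, not values used elsewhere in this guide:
# Authenticate and create the default configuration file (~/.ksubrc)
gcloud auth application-default login
./ksub --config --create-default
# Point ksub at your project, cluster, and namespace
./ksub --config --set-project-id my-project \
    --set-clustername my-cluster --set-namespace team-a
# Make ksub available from any directory
export PATH=$PATH:/path/to/kbatch/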
Configuring kubectl
The default tool for Kubernetes is kubectl, which is already included in the Cloud SDK.
To ensure you have the current version of kubectl, run the following command:
gcloud components update
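kubectl uses your local Kubernetes configuration to reach the cluster. If kubectl is not yet pointed at your Batch-enabled cluster, fetching credentials with gcloud is a common way to set this up; the cluster name and zone below are placeholders:
# Fetch cluster credentials so kubectl can reach your Batch-enabled cluster
gcloud container clusters get-credentials [CLUSTER_NAME] --zone [ZONE]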
Running sample jobs
Batch on GKE includes several sample jobs.
Running a single-task job
Get the samples:
git clone https://github.com/GoogleCloudPlatform/Kbatch.git
Create the default Batch admin resources in the "default" K8s namespace:
./samples/defaultresources/create.sh
You can submit the job with ksub or kubectl.
ksub
Run the ComputePi Job under /samples/computepi:
ksub run_pi_with_ksub.sh
This command outputs the job name.
Wait for the job to complete:
ksub -Gw [JOB_NAME]
Get the task name:
ksub -Ga [JOB_NAME]
This command outputs the task name.
View the logs:
ksub -L [TASK_NAME]
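Because ksub prints the job name when you submit, you can capture it in a shell variable and chain the follow-up commands, for example:
# Submit the job and record its name
JOB_NAME=`ksub run_pi_with_ksub.sh`
# Wait for completion, then list the job's tasks
ksub -Gw ${JOB_NAME}
ksub -Ga ${JOB_NAME}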
kubectl
Run the ComputePi Job in /samples/computepi:
kubectl create -f pi-job.yaml
The output is:
batchjob.kbatch.k8s.io/[JOB_NAME] created
Identify the Pod associated with the job:
kubectl get pods | grep [JOB_NAME]
The output is:
[POD_NAME] 0/1 Completed 0 1m
View the logs:
kubectl logs pod/[POD_NAME]
Running a job that uses Preemptible VMs
Get the samples. Skip this step if you have done it for other sample jobs.
git clone https://github.com/GoogleCloudPlatform/Kbatch.git
Create the default Batch admin resources in the "default" K8s namespace. Skip this step if you have done it for other sample jobs.
./samples/defaultresources/create.sh
You can submit the job with ksub or kubectl.
ksub
Run the ComputePi Preemptible Job under /samples/computepi:
ksub run_pi_preemptible_with_ksub.sh
This command outputs the job name.
Wait for the job to complete:
ksub -Gw [JOB_NAME]
Get the task name:
ksub -Ga [JOB_NAME]
This command outputs the task name.
View the logs:
ksub -L [TASK_NAME]
kubectl
Run the ComputePi Preemptible Job in /samples/computepi:
kubectl create -f pi-job-preemptible.yaml
The output is:
batchjob.kbatch.k8s.io/[JOB_NAME] created
Identify the Pod associated with the job:
kubectl get pods | grep [JOB_NAME]
The output is:
[POD_NAME] 0/1 Completed 0 1m
If you instead see "No resources found in default namespace.", wait a few seconds and retry the command; it can take upwards of a minute for the Pod to be created. After the job finishes, the Pod disappears about 90 seconds later and the command returns the same "No resources..." message again. A small polling loop, sketched after these steps, can save you from retrying by hand.
View the logs:
kubectl logs pod/[POD_NAME]
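Because the Pod appears only after the job is scheduled and is cleaned up shortly after the job finishes, a short polling loop around the same command can save you from retrying by hand. A minimal sketch, where [JOB_NAME] is the name returned when you created the job:
# Poll every 10 seconds until a Pod for the job is listed
while ! kubectl get pods | grep [JOB_NAME]; do
    echo "Pod not created yet, retrying in 10 seconds"
    sleep 10
done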
Running jobs with dependencies
With dependencies, you can run some jobs only when specific conditions related to previous jobs have occurred. The Beta version supports three dependency types:
- Success: A job runs only if all the jobs it depends on have succeeded.
- Failed: A job runs only if all the jobs it depends on have failed.
- Finished: A job runs only once all the jobs it depends on have completed.
If the system decides not to run a job because a dependency cannot be met, Batch marks the job as Failed. For example, if job1 depends on job2 with the dependency type Success and job2 fails, then job1 never runs and is considered to have failed. Otherwise, job failure and success are determined by the success or failure of the Pod associated with the job, as defined by the Kubernetes Pod lifecycle.
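The ksub --dependency flag shown later in this section accepts any of these types. As an illustration, assuming two hypothetical scripts prepare.sh and cleanup.sh (they are not part of the samples), you could run a cleanup job after the first job regardless of its outcome:
# Submit the first job and capture its name
JOB1=`ksub prepare.sh`
# cleanup.sh runs once prepare.sh has completed, whether it succeeded or failed
ksub --dependency Finished:${JOB1} -- cleanup.sh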
Before running this sample job, you must set up a Google Cloud Filestore instance, in the same zone as your GKE cluster's node location, for inputs and outputs.
Get and extract the Batch samples, admintools and usertools. Skip this step if you have done it for other sample jobs.
git clone https://github.com/GoogleCloudPlatform/Kbatch.git
tar -xzvf kbatch-github.tar.gz
Create the default Batch admin resources in the "default" K8s namespace. Skip this step if you have done it for other sample jobs.
./samples/defaultresources/create.sh
Change to the imageprocess folder.
cd ../imageprocess
Run apply-extra-config.sh to create the PersistentVolume resources and permissions. Type 'y' when asked if you can "run as root in BatchTasks and access storage."
pushd ../userstorage
./apply-extra-config.sh
popd
Update the ksub config to use the PersistentVolumeClaim created in the previous step:
./ksub --config --add-volume fs-volume --volume-source PersistentVolumeClaim \
    --params claimName:[PVC_NAME] --params readOnly:false
where [PVC_NAME] is the name of the PVC created in step 4. In this example, a PersistentVolumeClaim named pvc was created in step 4, so replace [PVC_NAME] with pvc.
Run copy-input.sh to copy the input image to Filestore.
You can submit the job with ksub or kubectl.
ksub
Submitting a job with dependencies using ksub
There are two ways to submit a job with dependencies: specify the dependency with a single command, or manually edit the #KB Dependency Success: field in the shell script.
Specify dependencies with ksub
Run the following command from samples/imageprocess to run both jobs with a dependency:
ksub --dependency Success:job_name -- run_grey_with_ksub.sh
You can also use shell variables to connect the jobs.
Submit job1:
job1=`ksub run_checkerboard_with_ksub.sh`
Submit job2:
ksub --dependency Success:${job1} -- ./run_grey_with_ksub.sh
Get the task name:
ksub -Ga [JOB_NAME]
This command outputs the task name.
View the logs:
ksub -L [TASK_NAME]
Run copy-output.sh to copy the processed image to your local machine.
Use your first job's name to create a dependency.
Submit the first ImageProcess Job:
ksub run_checkerboard_with_ksub.sh
This outputs the [JOB_NAME], for example:
checkerboard-64t5n
The following run_grey_with_ksub.sh describes a sample script for job2 with a dependency on job1:
#!/bin/sh
#KB Jobname grey-
#KB Namespace default
#KB Image gcr.io/kbatch-images/greyimage/greyimage:latest
#KB Queuename default
#KB MaxWallTime 5m
#KB MinCpu 1.0
#KB MinMemory 2Gi
#KB Mount fs-volume /mnt/pv
#KB Dependency Success:[JOB_NAME]
echo "Starting job grey"
# greyimage is in /app directory.
cd /app
./greyimage -in=/mnt/pv/checker.png -out=/mnt/pv/checkergrey.png
echo "Completed job grey"
Open run_grey_with_ksub.sh with the editor of your choice and replace [JOB_NAME] with your job name.
Submit the second job:
ksub run_grey_with_ksub.sh
This outputs the job name.
Get the task name:
ksub -Ga [JOB_NAME]
This command outputs the task name.
View the logs:
ksub -L [TASK_NAME]
Run copy-output.sh to copy the processed image to your local machine.
kubectl
Submitting a job with dependencies using kubectl
Submit ImageProcess Jobs:
kubectl create -f imageprocess-job.yaml
The output is similar to this:
batchjob.kbatch.k8s.io/checkerboard created batchjob.kbatch.k8s.io/grey created
Examine the first job:
kubectl describe batchjob/checkerboard
Examine the second job:
kubectl describe batchjob/grey
Run copy-output.sh to copy the processed image to your local machine.
Running a job that uses GPUs
Get and extract the Batch samples, admintools and usertools. Skip this step if you have done it for other sample jobs.
git clone https://github.com/GoogleCloudPlatform/Kbatch.git
tar -xzvf kbatch-github.tar.gz
Create the default Batch admin resources in the "default" K8s namespace. Skip this step if you have done it for other sample jobs.
./samples/defaultresources/create.sh
You can submit the job with ksub or kubectl.
ksub
Verify that samples/GPUjob/run_gpu_with_ksub.sh indicates a GPU type that is available in your cluster.
Submit the job:
ksub samples/GPUjob/run_gpu_with_ksub.sh
This outputs the job name.
Wait for the job to complete:
ksub -Gw [JOB_NAME]
Get the task name:
ksub -Ga [JOB_NAME]
This command outputs the task name.
View the logs:
ksub -L [TASK_NAME]
kubectl
Verify the GPU shown in the gpu-job.yaml file matches a GPU type that is available in your autoscaler zone.
Submit the job:
kubectl create -f samples/GPUjob/gpu-job.yaml
The output is similar to:
batchjob.kbatch.k8s.io/[JOB_NAME] created
Examine the job:
kubectl describe batchjob/[JOB_NAME]
Running array jobs
An array job is a group of tasks that share the same container image and are differentiated by the array index. By specifying the "taskCount" in IndexSpec, a job can generate up to one thousand tasks.
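For example, using the #KB keywords described under "Specifying ksub Keywords" below, a minimal array job script might look like the following sketch. The image and command are placeholders, and how each task reads its own array index is not covered here:
#!/bin/sh
#KB Jobname array-demo-
#KB Namespace default
#KB Queuename default
#KB Image ubuntu
#KB MaxWallTime 5m
#KB MinCpu 1.0
#KB MinMemory 2Gi
#KB TaskCount 10
# The system creates 10 tasks that all run this same script.
echo "Array task starting"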
To run the array job example, you must set up a Google Cloud Filestore instance in the same zone as your GKE cluster's node location for inputs and outputs. Once your Filestore instance is created, perform the following steps to run the array job example:
Get and extract the Batch samples, admintools and usertools. Skip this step if you have done it for other sample jobs.
git clone https://github.com/GoogleCloudPlatform/Kbatch.git
tar -xzvf kbatch-github.tar.gz
Create the default Batch admin resources in the "default" K8s namespace. Skip this step if you have done it for other sample jobs.
./samples/defaultresources/create.sh
Change to the arrayjob folder.
cd ../arrayjob
Run setup.sh to create the PersistentVolume resources and permissions. You need to input the Filestore instance IP, zone, and volume name.
./setup.sh
Type 'y' when asked if you can "run as root in BatchTasks and access storage."
Update the ksub config to use the PersistentVolumeClaim created in the previous step:
./ksub --config --add-volume fs-volume --volume-source PersistentVolumeClaim \
    --params claimName:[PVC_NAME] --params readOnly:false
where [PVC_NAME] is the name of the PVC created in step 4. In this example, a PersistentVolumeClaim named pvc was created in step 4, so replace [PVC_NAME] with pvc.
Run copy-array-input.sh to copy the input images to Filestore:
./copy-array-input.sh
The input files are copied into the array-image-data directory under your PVC.
Use ksub to submit your array job:
ksub ./run_array_image_ksub.sh
Alternatively, you can also submit an array job by yaml file:
kubectl create -f array-image.yaml
Check the array job status by the job name generated in the previous step:
ksub -Ga [JOB_NAME] --output=wide
or
ksub -G [JOB_NAME] --output=describe
Once the job has finished successfully, the output files are saved into the array-image-data directory under your PVC.
To copy the output data into your local machine, run the following command:
./copy-array-output.sh
To validate, go to the array-image-data directory. Each input file should have a corresponding output file.
Submitting jobs
You can submit jobs with ksub or kubectl.
Using ksub
Ksub allows for submission of scripts as jobs. You can use keywords prefixed with #KB to specify job properties.
The following run_pi_with_ksub.sh describes a sample ksub job:
#!/bin/sh
# Keywords to specify job parameters
#KB Jobname pi-
#KB Namespace default
#KB Image gcr.io/kbatch-images/generate-pi/generate-pi:latest
#KB Queuename default
#KB MaxWallTime 5m
#KB MinCpu 1.0
#KB MinMemory 2Gi
echo "Starting job pi"
# pi is in /app directory.
cd /app
./pi
echo "Completed job pi"
To submit the script, run the following command:
ksub run_pi_with_ksub.sh
Specifying ksub Keywords
Specify your job's parameters with keywords. These keywords are prefixed with #KB. Ksub expects keywords in a block of lines without blank lines or spaces between them, and stops parsing #KB keywords after the first line that doesn't start with #KB.
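For example, the keyword block in the following sketch (placeholder job name and image) is parsed because the #KB lines are contiguous; the first line that does not start with #KB ends the block, and everything after it is treated as the job's shell script:
#!/bin/sh
#KB Jobname demo-
#KB Namespace default
#KB Queuename default
#KB Image ubuntu
#KB MaxWallTime 5m
# Keyword parsing stops at this comment; the rest is the job's script.
echo "Hello from a Batch job"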
Ksub supports the following keywords:
Keyword | Comment | Example |
---|---|---|
Jobname | Jobname prefix used to generate the job name. | #KB Jobname pi- |
Namespace | Namespace the job operates in. | #KB Namespace default |
Queuename | Queue the job is submitted to. | #KB Queuename default |
Image | Image that runs the job container. | #KB Image ubuntu |
Mount | PVC to mount and location where it should be mounted. | #KB Mount fs-volume /tmp |
MinCpu | Number of CPUs the job requires. | #KB MinCpu 1.0 |
MinMemory | Amount of memory required by the container. | #KB MinMemory 2Gi |
Gpu | Number and type of GPUs required for the job. In the example on the right, nvidia-tesla-k80 is the type of GPU to be used, and 2 is the number of GPUs to be used. | #KB GPU nvidia-tesla-k80 2 |
Dependency | Dependencies of the job. | #KB Dependency Success:job-name1 |
MaxWallTime | Maximum run time of the job. | #KB MaxWallTime 5m |
TaskCount | Task count for an array job (maximum: 1000). | #KB TaskCount 10 |
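As an illustration of combining these keywords, a GPU job script might look like the following sketch. This is not the content of the provided run_gpu_with_ksub.sh sample; [GPU_IMAGE] is a placeholder for a container image with the GPU tooling your workload needs, and the GPU type must be one that is available in your cluster:
#!/bin/sh
#KB Jobname gpu-demo-
#KB Namespace default
#KB Queuename default
#KB Image [GPU_IMAGE]
#KB MaxWallTime 10m
#KB MinCpu 1.0
#KB MinMemory 2Gi
#KB GPU nvidia-tesla-k80 2
echo "Starting GPU job"
# nvidia-smi lists the GPUs attached to the task (assumes the image ships the NVIDIA tools).
nvidia-smi
echo "Completed GPU job"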
Using kubectl
Kubectl connects to the Batch system using the Kubernetes configuration for the cluster.
The following pi-job.yaml describes a sample YAML job:
apiVersion: kbatch.k8s.io/v1beta1
kind: BatchJob
metadata:
  # generateName allows the system to generate a random name, using this prefix, for the BatchJob upon creation.
  generateName: pi-
  namespace: default
spec:
  batchQueueName: default
  taskGroups:
  - name: main
    maxWallTime: 5m
    template:
      spec:
        containers:
        - name: pi
          # This image has been made public so it can be pulled from any project.
          image: gcr.io/kbatch-images/generate-pi/generate-pi:latest
          resources:
            requests:
              cpu: 1.0
              memory: 2Gi
            limits:
              cpu: 1.0
              memory: 2Gi
          imagePullPolicy: IfNotPresent
        restartPolicy: Never
To submit the job, run the following command:
kubectl create -f pi-job.yaml
Managing data
Get the user tools:
git clone https://github.com/GoogleCloudPlatform/Kbatch.git
Go to the filestore tools directory:
cd usertools/filestore
Batch on GKE provides a utility for you to copy files into/from a Kubernetes PersistentVolume.
The basic usage of the script is:
./datacopy.sh [-d|-u] -l [LOCAL_FILE] -r [REMOTE_FILE_PATH] -p [PVC_NAME]
Where:
-u copies data from your workstation to the Cloud.
-d copies data from the Cloud to your workstation.
-h prints a helpful usage message.
If you want to run a job that uses the input file input.dat in the current directory on your local machine, you can run the following command to copy the input file to your personal Batch directory:
./datacopy.sh -u -l input.dat -r problem-1-input.data -p [NAME]-team1
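To copy results back after a job completes, use the -d flag instead; the file names below are placeholders for whatever output your job wrote:
# Copy an output file from the PersistentVolume back to the current directory
./datacopy.sh -d -l output.dat -r problem-1-output.data -p [NAME]-team1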
Viewing jobs
You can view jobs by using ksub or kubectl.
ksub
Viewing jobs by user in a queue
View the jobs by user by running the following command:
ksub -Q -n [NAMESPACE] [QUEUE_NAME]
Where [NAMESPACE] is your namespace and [QUEUE_NAME] is your Queue name.
The output is similar to:
Name: pi-s4dwl, Status: Succeeded
Viewing jobs in a queue
View the jobs in a queue by running the following command:
ksub -Qa -n [NAMESPACE] [QUEUE_NAME]
Where [NAMESPACE] is your namespace and [QUEUE_NAME] is your Queue name.
The output is similar to:
Name: pi-s4dwl, Creation Time Stamp: 2019-09-12 13:03:42 -0700 PDT, Status: Succeeded
kubectl
Viewing jobs in a queue
View the jobs in a queue by running the following command:
kubectl get batchjobs --selector=batchQueue=[QUEUE_NAME] --namespace [NAMESPACE]
Where [QUEUE_NAME] is your Queue name and [NAMESPACE] is your namespace.
The output is similar to:
NAME AGE pi-6rc7s 2m
Viewing jobs by user
View jobs by user by performing the following instructions:
Retrieve your username:
gcloud config get-value account
The output is similar to:
user@company.com
Run the following command to view a list of jobs by a user in a Namespace:
kubectl get batchjobs --selector=submittedBy=[userATexample.com] --namespace [NAMESPACE]
Where [userATexample.com] is your email address written with AT in place of the @ sign, and [NAMESPACE] is your namespace.
The output is similar to:
NAME AGE pi-6rc7s 36m
Run the following command to view a list of jobs by a user in a Queue:
kubectl get batchjobs --selector=batchQueue=[QUEUE_NAME],submittedBy=[userATexample.com] --namespace [NAMESPACE]
Where [QUEUE_NAME] is your Queue name, [userATexample.com] is your email address written with AT in place of the @ sign, and [NAMESPACE] is your namespace.
The output is similar to:
NAME AGE pi-6rc7s 36m
Stopping jobs
You can stop a running job by using ksub or kubectl. The job is marked as "Failed" with the condition "JobTerminationByUser", while the historical data associated with the job is preserved.
ksub
Run the following command to terminate the job:
ksub -T [JOB_NAME] -n [NAMESPACE]
where [JOB_NAME] is your Job name and [NAMESPACE] is your namespace.
The output is similar to this:
Termination request for job [JOB_NAME] is sent to the server, please check the job status
You can also run the following command to watch the job until it completes:
ksub -Gw [JOB_NAME] -n [NAMESPACE]
kubectl
To terminate a running job, run the following command:
kubectl patch batchjob [JOB_NAME] --namespace [NAMESPACE] --type merge --patch '{"spec": {"userCommand": "Terminate"}}'
where [JOB_NAME] is your Job name and [NAMESPACE] is your namespace.
The output is similar to this:
batchjob.kbatch.k8s.io/[JOB_NAME] patched
Viewing logs for jobs
Job logs can be viewed only after the job has started executing, that is, after the job has moved from the queued state to a running state. You must wait for the job to start before viewing its logs.
To get a job's log, run the following command:
ksub -L -n [NAMESPACE] [JOB_NAME]
Where [NAMESPACE] is your namespace, and [JOB_NAME] is your job name.
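If you collect logs from a script, you can combine the commands on this page so that the log request only runs after the job has finished, for example:
# Wait for the job to complete, then fetch its logs
ksub -Gw [JOB_NAME] -n [NAMESPACE] && ksub -L -n [NAMESPACE] [JOB_NAME]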
Troubleshooting
If a BatchTask stays in the "Ready" phase for a long time, you or your cluster admin can go to the "Workloads" tab in the Cloud Console to check the corresponding Pod.
There is a known issue in GKE 1.14 where a Pod can get stuck at the "ContainerCreating" stage due to network setup issues. In this case, you can terminate the job and resubmit it. This issue will be fixed in the GKE 1.15 release.
Due to new restrictions around labels in GKE, please use v0.9.1 or newer of Batch on GKE to ensure jobs run properly.