Running jobs with Batch on GKE

This page shows you how to run batch jobs with Batch on GKE (Batch). There are two ways to submit jobs in Batch: ksub and kubectl. The ksub command submits shell scripts as jobs, and kubectl submits jobs defined in YAML files.

Configuring ksub

Ksub is a command-line tool for performing job-related actions on your Batch system. You can use keywords prefixed with #KB to specify job properties.

To configure Ksub, perform the following steps:

  1. Enable ksub to use your own user credentials for API access:

    gcloud auth application-default login
    
  2. Change to the kbatch directory:

    cd kbatch
    
  3. Set up a default configuration file:

    ./ksub --config --create-default
    

    This creates a configuration file at ~/.ksubrc.

  4. Add the default values for projectID and clusterName to ~/.ksubrc.

  5. If you are not operating in the default namespace, set up a new default namespace by editing the namespace field:

    vi ~/.ksubrc
    
  6. Set up mount points (optional if you are not using a private PersistentVolumeClaim created by your cluster admin):

    ./ksub --config --add-volume fs-volume --volume-source PersistentVolumeClaim \
     --params claimName:[PVC_NAME] --params readOnly:false
    

    where [PVC_NAME] is the name of the PersistentVolumeClaim your admin created for you to save your private input/output files.

  7. Add the install directory of ksub to $PATH:

    export PATH=$PATH:/path/to/kbatch/
    
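Once configured, ~/.ksubrc holds your default job settings. The following is a minimal sketch of its contents, assuming the key names match the fields mentioned above; verify against the file that --create-default actually generates:

```yaml
# Hypothetical ~/.ksubrc contents; key names are assumptions based on
# the fields this page tells you to set.
projectID: my-project      # your Google Cloud project ID
clusterName: my-cluster    # your GKE cluster name
namespace: default         # default namespace for submitted jobs
```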

Configuring kubectl

The default tool for Kubernetes is kubectl, which is already included in the Cloud SDK.

To ensure you have the current version of kubectl, run the following command:

gcloud components update

Running sample jobs

Batch on GKE includes several sample jobs.

ComputePi

  1. Get the samples:

    git clone https://github.com/GoogleCloudPlatform/Kbatch.git
    
  2. Create the default Batch admin resources in the "default" K8s namespace:

    ./samples/defaultresources/create.sh
    

You can submit the job with ksub or kubectl.

ksub

  1. Run the ComputePi Job under /samples/computepi:

    ksub run_pi_with_ksub.sh
    

    This command outputs the job name.

  2. Wait for the job to complete:

    ksub -Gw [JOB_NAME]
    
  3. Get the task name:

    ksub -Ga [JOB_NAME]
    

    This command outputs the task name.

  4. View the logs:

    ksub -L [TASK_NAME]
    

kubectl

  1. Run the ComputePi Job in /samples/computepi:

    kubectl create -f pi-job.yaml
    

    The output is:

    batchjob.kbatch.k8s.io/[JOB_NAME] created
    
  2. Identify the Pod associated with the job:

    kubectl get pods | grep [JOB_NAME]
    

    The output is:

    [POD_NAME]   0/1     Completed   0          1m
    
  3. View the logs:

    kubectl logs pod/[POD_NAME]
    

Running jobs with dependencies

With dependencies, you can run some jobs only when specific conditions related to previous jobs have been met. The Beta version supports three dependency types:

Success
A job will run only if all the jobs it depends on have succeeded.
Failed
A job will run only if all the jobs it depends on have failed.
Finished
A job will run only once all the jobs it depends on have completed.

If the system decides not to run a job because a dependency cannot be met, Batch marks the job as Failed. For example, if job1 depends on job2 with the dependency type Success and job2 fails, then job1 never runs and is considered to have failed. Otherwise, job failure and success are determined by the success or failure of the Pod associated with the job, as defined by the Kubernetes Pod lifecycle.

Before running this sample job, you must set up a Google Cloud Filestore instance, in the same zone as your GKE cluster's node location, for inputs and outputs.

  1. Get and extract the Batch samples, admintools and usertools. Skip this step if you have done it for other sample jobs.

    git clone https://github.com/GoogleCloudPlatform/Kbatch.git
    
    tar -xzvf kbatch-github.tar.gz
    
  2. Create the default Batch admin resources in the "default" K8s namespace. Skip this step if you have done it for other sample jobs.

    ./samples/defaultresources/create.sh
    
  3. Change to the imageprocess folder.

    cd ../imageprocess
    
  4. Run apply-extra-config.sh to create the PersistentVolume resources and permissions. Type 'y' when asked to confirm that you can run as root in BatchTasks and access storage.

  5. Update the ksub config to use the PersistentVolumeClaim created in the previous step:

    ./ksub --config --add-volume fs-volume --volume-source PersistentVolumeClaim \
     --params claimName:[PVC_NAME] --params readOnly:false
    

    where [PVC_NAME] is the name of the PVC created in step 4. In this example, a PersistentVolumeClaim named pvc was created in step 4, so replace [PVC_NAME] with pvc.

  6. Run copy-input.sh to copy the input image to Filestore.

You can submit the job with ksub or kubectl.

ksub

Submitting a job with dependencies using ksub

There are two ways to submit a job with dependencies: specify the dependency with a single command, or manually edit the #KB Dependency keyword in the shell script.

Specify dependencies with ksub

  1. From samples/imageprocess, run the following command to submit the second job with a dependency on the first:

    ksub --dependency Success:[JOB_NAME] -- run_grey_with_ksub.sh
    

    You can also use shell variables to connect the jobs.

    Create job1:

    job1=`ksub run_checkerboard_with_ksub.sh`
    

    Submit job2:

    ksub --dependency Success:${job1} -- ./run_grey_with_ksub.sh
    
  2. Get the task name:

    ksub -Ga [JOB_NAME]
    

    This command outputs the task name.

  3. View the logs:

    ksub -L [TASK_NAME]
    
  4. Run copy-output.sh to copy the processed image to your local machine.

Alternatively, use your first job's name to create the dependency by editing the second job's script.

  1. Submit ImageProcess Jobs:

    ksub run_checkerboard_with_ksub.sh
    

    This outputs the [JOB_NAME], for example:

    checkerboard-64t5n
    

    The following run_grey_with_ksub.sh describes a sample script for job2 with a dependency on job1:

    #!/bin/sh
    
    #KB Jobname grey-
    #KB Namespace default
    #KB Image gcr.io/kbatch-images/greyimage/greyimage:latest
    #KB Queuename default
    #KB MaxWallTime 5m
    #KB MinCpu 1.0
    #KB MinMemory 2Gi
    #KB Mount fs-volume /mnt/pv
    #KB Dependency Success:[JOB_NAME]
    
    echo "Starting job grey"
    # greyimage is in /app directory.
    cd /app
    ./greyimage -in=/mnt/pv/checker.png -out=/mnt/pv/checkergrey.png
    echo "Completed job grey"
    
  2. Open run_grey_with_ksub.sh with the editor of your choice and replace [JOB_NAME] with your job name.

  3. Submit the second job:

    ksub run_grey_with_ksub.sh
    

    This outputs the job name.

  4. Get the task name:

    ksub -Ga [JOB_NAME]
    

    This command outputs the task name.

  5. View the logs:

    ksub -L [TASK_NAME]
    
  6. Run copy-output.sh to copy the processed image to your local machine.

kubectl

Submitting a job with dependencies using kubectl

  1. Submit ImageProcess Jobs:

    kubectl create -f imageprocess-job.yaml
    

    The output is similar to this:

    batchjob.kbatch.k8s.io/checkerboard created
    batchjob.kbatch.k8s.io/grey created
    
  2. Examine the first job:

    kubectl describe batchjob/checkerboard
    
  3. Examine the second job:

    kubectl describe batchjob/grey
    
  4. Run copy-output.sh to copy the processed image to your local machine.
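The dependency between the two jobs lives in imageprocess-job.yaml itself. The fragment below is only a sketch of what that might look like: the dependency field name and layout here are assumptions, so consult the sample file for the real BatchJob schema:

```yaml
# Sketch only -- "dependsOn" is an assumed field name; check
# samples/imageprocess/imageprocess-job.yaml for the actual schema.
apiVersion: kbatch.k8s.io/v1beta1
kind: BatchJob
metadata:
  name: grey
  namespace: default
spec:
  batchQueueName: default
  dependsOn:              # assumed field name
  - name: checkerboard    # the first job
    type: Success         # run only if checkerboard succeeded
```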

Running a job that uses GPUs

  1. Get and extract the Batch samples, admintools and usertools. Skip this step if you have done it for other sample jobs.

    git clone https://github.com/GoogleCloudPlatform/Kbatch.git
    
    tar -xzvf kbatch-github.tar.gz
    
  2. Create the default Batch admin resources in the "default" K8s namespace. Skip this step if you have done it for other sample jobs.

    ./samples/defaultresources/create.sh
    

You can submit the job with ksub or kubectl.

ksub

  1. Verify that samples/GPUjob/run_gpu_with_ksub.sh indicates a GPU type that is available in your cluster.

  2. Submit the job:

    ksub samples/GPUjob/run_gpu_with_ksub.sh
    

    This outputs the job name.

  3. Wait for the job to complete:

    ksub -Gw [JOB_NAME]
    
  4. Get the task name:

    ksub -Ga [JOB_NAME]
    

    This command outputs the task name.

  5. View the logs:

    ksub -L [TASK_NAME]
    
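For reference, a GPU job script follows the same #KB keyword pattern as the other samples. The following is a sketch only; the image, GPU type, and GPU count are assumptions, so verify them against the real run_gpu_with_ksub.sh and the accelerators available in your cluster:

```shell
#!/bin/sh
# Sketch of a GPU job script; keyword values are assumptions, not the
# contents of the real run_gpu_with_ksub.sh.

#KB Jobname gpu-
#KB Namespace default
#KB Image ubuntu
#KB Queuename default
#KB MaxWallTime 5m
#KB Gpu nvidia-tesla-k80 2

echo "Starting job gpu"
# Inside the container, the allocated GPUs are visible to nvidia-smi.
if command -v nvidia-smi >/dev/null 2>&1; then nvidia-smi; fi
echo "Completed job gpu"
```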

kubectl

  1. Verify that the GPU shown in the gpu-job.yaml file matches a GPU type that is available in your autoscaler zone.

  2. Submit the job:

    kubectl create -f samples/GPUjob/gpu-job.yaml
    

    The output is similar to:

    batchjob.kbatch.k8s.io/[JOB_NAME] created
    
  3. View the job status:

    kubectl describe batchjob/[JOB_NAME]
    
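For comparison with the pi-job.yaml sample shown later on this page, the GPU request in gpu-job.yaml is expressed in the task template. The fragment below is a sketch based on standard GKE GPU scheduling; the accelerator label and resource name are assumptions, so verify them against the sample file:

```yaml
# Sketch: GPU fields inside a BatchJob task template (assumed layout).
template:
  spec:
    nodeSelector:
      # GPU type; must be available in your autoscaler zone.
      cloud.google.com/gke-accelerator: nvidia-tesla-k80
    containers:
    - name: gpu-task
      image: gcr.io/[PROJECT]/[IMAGE]   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 2             # number of GPUs for the task
```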

Submitting jobs

You can submit jobs with ksub or kubectl.

Using ksub

Ksub allows you to submit scripts as jobs. You can use keywords prefixed with #KB to specify job properties.

The following run_pi_with_ksub.sh describes a sample ksub job:

#!/bin/sh

# Keywords to specify job parameters

#KB Jobname pi-
#KB Namespace default
#KB Image gcr.io/kbatch-images/generate-pi/generate-pi:latest
#KB Queuename default
#KB MaxWallTime 5m
#KB MinCpu 1.0
#KB MinMemory 2Gi

echo "Starting job pi"
# pi is in /app directory.
cd /app
./pi
echo "Completed job pi"

To submit the script, run the following command:

ksub run_pi_with_ksub.sh

Specifying ksub Keywords

Specify your job's parameters with keywords prefixed with #KB. Ksub expects the keywords in a contiguous block of lines, without blank lines or spaces between them, and stops parsing #KB keywords at the first line that doesn't start with #KB.
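The parsing rule can be illustrated with a quick sketch (this is not ksub's actual implementation, only a demonstration of the described behavior on a sample keyword block):

```shell
#!/bin/sh
# Illustrate the "#KB" parsing rule: keyword collection stops at the
# first line that does not start with "#KB".
script='#KB Jobname pi-
#KB Namespace default
echo "job body starts here"
#KB Queuename ignored-because-it-comes-after-the-body'

# awk prints lines while they start with "#KB ", then stops for good.
keywords=$(printf '%s\n' "$script" | awk '/^#KB /{print; next}{exit}')
printf '%s\n' "$keywords"
```

Only the first two keywords are collected; the Queuename line after the job body is ignored.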

Ksub supports the following keywords:

Keyword      Comment                                                    Example
Jobname      Job name prefix used to generate the job name.             #KB Jobname pi-
Namespace    Namespace the job operates in.                             #KB Namespace default
Queuename    Queue the job is submitted to.                             #KB Queuename default
Image        Image that runs the job container.                         #KB Image ubuntu
Mount        PVC to mount and the location where it should be mounted.  #KB Mount fs-volume /tmp
MinCpu       Number of CPUs the job requires.                           #KB MinCpu 1.0
MinMemory    Amount of memory required by the container.                #KB MinMemory 2Gi
Gpu          Type and number of GPUs required for the job; in the       #KB Gpu nvidia-tesla-k80 2
             example, two GPUs of type nvidia-tesla-k80.
Dependency   Dependencies of the job.                                   #KB Dependency Success:job-name1
MaxWallTime  Maximum run time of the job.                               #KB MaxWallTime 5m

Using kubectl

Kubectl connects to the Batch system using the Kubernetes configuration for the cluster.

The following pi-job.yaml describes a sample YAML job:

apiVersion: kbatch.k8s.io/v1beta1
kind: BatchJob
metadata:
  generateName: pi-  # generateName allows the system to generate a random name, using this prefix, for the BatchJob upon creation.
  namespace: default
spec:
  batchQueueName: default
  taskGroups:
  - name: main
    maxWallTime: 5m
    template:
      spec:
        containers:
        - name: pi
          # This image has been made public so it can be pulled from any project.
          image: gcr.io/kbatch-images/generate-pi/generate-pi:latest
          resources:
            requests:
              cpu: 1.0
              memory: 2Gi
            limits:
              cpu: 1.0
              memory: 2Gi
          imagePullPolicy: IfNotPresent
        restartPolicy: Never

To submit the job, run the following command:

kubectl create -f pi-job.yaml

Managing data

  1. Get the user tools:

    git clone https://github.com/GoogleCloudPlatform/Kbatch.git
    
  2. Go to the filestore directory:

    cd usertools/filestore
    

Batch on GKE provides a utility for copying files to and from a Kubernetes PersistentVolume.

The basic usage of the script is:

./datacopy.sh [-d|-u] -l [LOCAL_FILE] -r [REMOTE_FILE_PATH] -p [PVC_NAME]

Where:

  • -u copies data from your workstation to the Cloud.
  • -d copies data from the Cloud to your workstation.
  • -h prints a usage message.

If you want to run a job that uses the input file input.dat from the current directory on your local machine, run the following command to copy the input file to your personal Batch directory:

./datacopy.sh -u -l input.dat -r problem-1-input.data -p [NAME]-team1

Viewing jobs

You can view jobs by using ksub or kubectl.

ksub

Viewing jobs by user in a queue

View the jobs by user by running the following command:

ksub -Q -n [NAMESPACE] [QUEUE_NAME]

Where [NAMESPACE] is your namespace and [QUEUE_NAME] is your queue name.

The output is similar to:

Name: pi-s4dwl, Status: Succeeded

Viewing jobs in a queue

View the jobs in a queue by running the following command:

ksub -Qa -n [NAMESPACE] [QUEUE_NAME]

Where [NAMESPACE] is your namespace and [QUEUE_NAME] is your queue name.

The output is similar to:

Name: pi-s4dwl, Creation Time Stamp: 2019-09-12 13:03:42 -0700 PDT, Status: Succeeded

kubectl

Viewing jobs in a queue

View the jobs in a queue by running the following command:

kubectl get batchjobs --selector=batchQueue=[QUEUE_NAME] --namespace [NAMESPACE]

Where [QUEUE_NAME] is your queue name and [NAMESPACE] is your namespace.

The output is similar to:

NAME       AGE
pi-6rc7s   2m

Viewing jobs by user

View jobs by user by performing the following instructions:

  1. Retrieve your username:

    gcloud config get-value account
    

    The output is similar to:

    user@company.com
    
  2. Run the following command to view a list of jobs by a user in a Namespace:

    kubectl get batchjobs --selector=submittedBy=[userATexample.com] --namespace [NAMESPACE]
    

    Where [userATexample.com] is your email address with the @ replaced by AT (Kubernetes label values cannot contain @), and [NAMESPACE] is your namespace.

    The output is similar to:

    NAME       AGE
    pi-6rc7s   36m
    
  3. Run the following command to view a list of jobs by a user in a Queue:

    kubectl get batchjobs --selector=batchQueue=[QUEUE_NAME],submittedBy=[userATexample.com] --namespace [NAMESPACE]
    

    Where [QUEUE_NAME] is your queue name, [userATexample.com] is your email address with the @ replaced by AT, and [NAMESPACE] is your namespace.

    The output is similar to:

    NAME       AGE
    pi-6rc7s   36m
    
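A note on the submittedBy examples above: Kubernetes label values cannot contain the @ character, which is why the address appears with @ spelled as AT. Assuming that is the encoding scheme (as the selector examples suggest), it can be sketched as:

```shell
#!/bin/sh
# Encode an email address the way the submittedBy label examples suggest:
# "@" cannot appear in a Kubernetes label value, so it becomes "AT".
email="user@example.com"
label=$(printf '%s' "$email" | sed 's/@/AT/')
printf '%s\n' "$label"   # userATexample.com
```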

Stopping jobs

You can stop a running job by using ksub or kubectl. The job is marked as "Failed" with the condition "JobTerminationByUser", while the historical data associated with the job is preserved.

ksub

Run the following command to terminate the job:

ksub -T [JOB_NAME] -n [NAMESPACE]

where [JOB_NAME] is your Job name and [NAMESPACE] is your namespace.

The output is similar to this:

Termination request for job [JOB_NAME] is sent to the server, please check the job status

You can also run the following command to watch the job until it completes:

ksub -Gw [JOB_NAME] -n [NAMESPACE]

kubectl

To terminate a running job, run the following command:

kubectl patch batchjob [JOB_NAME] --namespace [NAMESPACE] --type merge --patch '{"spec": {"userCommand": "Terminate"}}'

where [JOB_NAME] is your Job name and [NAMESPACE] is your namespace.

The output is similar to this:

batchjob.kbatch.k8s.io/[JOB_NAME] patched

Viewing logs for jobs

Job logs can be viewed only after the job has started executing, that is, after the job has moved from the queued state to a running state. You must wait for the job to start before viewing its logs.

To get a job's log, run the following command:

ksub -L -n [NAMESPACE] [JOB_NAME]

Where [NAMESPACE] is your namespace, and [JOB_NAME] is your job name.
