Limit run times for tasks and runnables using timeouts

This document describes how to limit the run times of tasks and runnables by setting timeouts.

A timeout specifies the amount of time that a task or runnable is permitted to run. Batch doesn't allow jobs to run for longer than 14 days and doesn't set default timeouts for individual tasks and runnables. Consequently, an individual task or runnable can run for as long as 14 days before it automatically fails. If your tasks and runnables aren't intended to run for that long, the absence of a timeout might cause unexpected costs and delays. To prevent excessive run times, you can set timeouts for tasks and runnables.

Before you begin

Set timeouts

You can set timeouts for runnables, tasks, or both. The timeout for a runnable specifies the maximum run time for that runnable. The timeout for a task specifies the maximum run time for that task, which is the sum of all the individual run times of its runnables. For example, if a task has 3 runnables that all run at the same time for 1 minute, then the task's run time is 3 minutes, not 1 minute.
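The summing rule above can be restated as a minimal Python sketch. The function name `task_run_time` is hypothetical and only mirrors the arithmetic described in this section; it is not part of any Batch API.

```python
def task_run_time(runnable_run_times_s):
    """Return a task's run time as the sum of its runnables' run times (seconds)."""
    return sum(runnable_run_times_s)


# Three runnables that each run for 1 minute (60 seconds).
total_s = task_run_time([60, 60, 60])
print(f"{total_s} seconds = {total_s // 60} minutes")
```

When you pick a task timeout, compare it against this summed value rather than against the longest single runnable.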

If you set overlapping timeouts (such as a timeout for both a runnable and the runnable's task), then only one timeout needs to be exceeded to trigger automatic failure. For example, suppose you set a task's timeout to 60 seconds and the timeout of each of that task's runnables to 120 seconds. In this case, the task and all of its runnables fail as soon as the sum of the run times of the runnables exceeds 60 seconds, so the 120-second timeouts can never be triggered.
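To reason about which of two overlapping timeouts is reached first, a simplified model such as the following may help. It is only a sketch under stated assumptions: runnables are treated as running one after another, every runnable shares the same timeout, and the function name `first_exceeded_timeout` is hypothetical. It does not describe how Batch is implemented.

```python
def first_exceeded_timeout(run_times_s, runnable_timeout_s, task_timeout_s):
    """Return which timeout is exceeded first, or None if none is exceeded.

    Simplified model (an assumption, not Batch's implementation): runnables
    run sequentially and the task's run time is the sum of their run times.
    """
    elapsed_s = 0.0
    for index, run_time_s in enumerate(run_times_s):
        remaining_task_s = task_timeout_s - elapsed_s
        # The runnable timeout can only fire if it is reached before the
        # task's remaining time budget runs out.
        if run_time_s > runnable_timeout_s and runnable_timeout_s < remaining_task_s:
            return f"runnable {index} timeout"
        if run_time_s > remaining_task_s:
            return "task timeout"
        elapsed_s += run_time_s
    return None


# Task timeout of 60 seconds, runnable timeouts of 120 seconds.
print(first_exceeded_timeout([45, 45], runnable_timeout_s=120, task_timeout_s=60))
```

In this model, a runnable timeout that is longer than the task timeout never fires, which matches the 60-second and 120-second example above.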

To choose appropriate timeouts for your job's tasks and runnables, analyze the logs of similar jobs that you have previously run to determine the typical run times of tasks and runnables for comparable workloads.
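One possible way to turn observed run times into a timeout value is sketched below. The heuristic (longest observed run time multiplied by a safety margin) and the name `suggest_timeout` are assumptions made for illustration; Batch doesn't prescribe a method, and the sample durations are invented.

```python
import math


def suggest_timeout(observed_run_times_s, margin=1.5):
    """Suggest a timeout string from previously observed run times (seconds).

    Heuristic (an assumption): the longest observed run time multiplied by a
    safety margin, rounded up to a whole number of seconds.
    """
    if not observed_run_times_s:
        raise ValueError("at least one observed run time is required")
    return f"{math.ceil(max(observed_run_times_s) * margin)}s"


# Hypothetical run times collected from the logs of earlier, similar jobs.
print(suggest_timeout([140.2, 155.0, 170.0]))
```

A larger margin reduces the risk of failing healthy tasks, at the cost of letting stuck tasks run longer before they fail.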

Set timeout for a task

Use the Google Cloud CLI or REST API to create a job that includes the maxRunDuration field in the taskSpec object of the JSON file:

{
    "taskGroups": [
      {
        "taskSpec": {
          ...
          "maxRunDuration": "TIMEOUT"
        }
      }
    ]
}

Replace TIMEOUT with the maximum number of seconds or fractional seconds that you want to permit the task to run for. For example, 255s.
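If you generate timeout values programmatically, a small helper like the following can produce strings in the form shown above. This is a sketch: the name `format_duration` is hypothetical, and it assumes the field accepts a decimal number of seconds with up to nine fractional digits followed by `s`.

```python
def format_duration(seconds):
    """Format a number of seconds as a duration string such as "255s" or "3.5s"."""
    if seconds < 0:
        raise ValueError("timeout must not be negative")
    # Keep at most nine fractional digits (an assumption about the accepted
    # format), then drop trailing zeros and a dangling decimal point.
    text = f"{seconds:.9f}".rstrip("0").rstrip(".")
    return f"{text}s"


print(format_duration(255))
print(format_duration(3.5))
```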

A job that sets a 255-second timeout for a task would have a JSON configuration file similar to the following:

{
    "taskGroups": [
      {
        "taskSpec": {
          "runnables": [
            {
              "script": {
                "text": "echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks."
              }
            }
          ],
          "maxRunDuration": "255s"
        },
        "taskCount": 3
      }
    ],
    "logsPolicy": {
      "destination": "CLOUD_LOGGING"
    }
}
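
If you prefer to generate the configuration file rather than write it by hand, the same structure can be built in Python and serialized to JSON. This is a minimal sketch: `build_job_config` and its parameters are hypothetical names, and only the fields shown in the example above are included.

```python
import json


def build_job_config(script_text, task_timeout, task_count=1):
    """Build a job configuration dict that sets a timeout for each task."""
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [{"script": {"text": script_text}}],
                    # Task-level timeout, for example "255s".
                    "maxRunDuration": task_timeout,
                },
                "taskCount": task_count,
            }
        ],
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


config = build_job_config("echo Hello world!", "255s", task_count=3)
print(json.dumps(config, indent=2))
```

You can save the printed JSON to a file and use it as the job's configuration file.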

If the timeout for a task is exceeded, the task automatically fails and the exceeded timeout is indicated by exit code 50005 in the job's status events and logs. For more information about exceeded timeouts, see the troubleshooting documentation for exit code 50005.

Set timeout for a runnable

Use the Google Cloud CLI or REST API to create a job that includes the timeout field in the runnable object of the JSON file:

{
    "taskGroups": [
      {
        "taskSpec": {
          "runnables": [
            {
              ...
              "timeout": "TIMEOUT"
            }
          ]
        }
      }
    ]
}

Replace TIMEOUT with the maximum number of seconds or fractional seconds that you want to permit the runnable to run for. For example, 3.5s.

A job that sets a 3.5-second timeout for a runnable would have a JSON configuration file similar to the following:

{
    "taskGroups": [
      {
        "taskSpec": {
          "runnables": [
            {
              "script": {
                "text": "echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks."
              },
              "timeout": "3.5s"
            }
          ]
        },
        "taskCount": 3
      }
    ],
    "logsPolicy": {
      "destination": "CLOUD_LOGGING"
    }
}
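
To add runnable timeouts to a configuration that you already have, one option is to load it as a dictionary and set the `timeout` field on each runnable. The following sketch assumes that you want the same timeout on every runnable; the helper name `with_runnable_timeout` is hypothetical.

```python
import copy


def with_runnable_timeout(job_config, timeout):
    """Return a copy of job_config with the timeout set on every runnable."""
    updated = copy.deepcopy(job_config)  # Leave the caller's dict untouched.
    for group in updated.get("taskGroups", []):
        for runnable in group.get("taskSpec", {}).get("runnables", []):
            # Runnable-level timeout, for example "3.5s".
            runnable["timeout"] = timeout
    return updated


job = {
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [{"script": {"text": "echo Hello world!"}}]
            },
            "taskCount": 3,
        }
    ]
}
updated = with_runnable_timeout(job, "3.5s")
print(updated["taskGroups"][0]["taskSpec"]["runnables"][0]["timeout"])
```

If your runnables need different limits, set the field on each runnable individually instead.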

If the timeout for a runnable is exceeded, the runnable automatically fails and the exceeded timeout is indicated by exit code 50005 in the job's status events and logs. For more information about exceeded timeouts, see the troubleshooting documentation for exit code 50005.

What's next