Troubleshoot job delays

This page lists common causes of Dataproc job scheduling delays, with information that can help you avoid them.

Overview

The following are common reasons why a Dataproc job can be delayed (throttled):

  • Too many running jobs
  • High system memory usage
  • Not enough free memory
  • Rate limit exceeded

Typically, a job delay message is issued in the following format:

Awaiting execution [SCHEDULER_MESSAGE]

The following sections provide possible causes and solutions for specific job delay scenarios.

Too many running jobs

Scheduler message:

Throttling job ### (and maybe others): Too many running jobs (current=xx max=xx)

Causes:

The job exceeded the maximum number of concurrent jobs, which is based on master VM memory (the job driver runs on the Dataproc cluster master VM). By default, Dataproc reserves 3.5GB of memory for overhead, and allows 1 concurrent job per remaining GB.

Example: The n1-standard-4 machine type has 15GB memory. With 3.5GB reserved for overhead, 11.5GB remains. Rounding down to an integer, 11GB is available for up to 11 concurrent jobs.
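The calculation in the example above can be sketched as a quick shell computation (the machine type, reserved overhead, and per-job driver size are the defaults described above):

```shell
# Estimate the maximum number of concurrent jobs on a Dataproc master VM.
total_mb=$((15 * 1024))   # n1-standard-4: 15GB of memory
reserved_mb=3584          # ~3.5GB reserved by Dataproc for overhead
per_job_mb=1024           # default driver size: 1GB per job
max_jobs=$(( (total_mb - reserved_mb) / per_job_mb ))  # integer division rounds down
echo "max concurrent jobs: ${max_jobs}"   # prints 11
```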

Solutions:

  1. Monitor log metrics, such as CPU usage and memory, to estimate job requirements.

  2. When you create a job cluster:

    1. Use a machine type with more memory for the cluster master VM.

    2. If 1GB per job is more than you need, set the dataproc:dataproc.scheduler.driver-size-mb cluster property to less than 1024.

    3. Set the dataproc:dataproc.scheduler.max-concurrent-jobs cluster property to a value suited to your job requirements.
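For example, the cluster properties above can be set at cluster creation with the gcloud CLI (the cluster name, region, machine type, and property values below are illustrative placeholders, not recommendations):

```shell
# Create a cluster with a smaller per-job driver size and an explicit
# concurrent-job limit. Adjust the values to your own job requirements.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --master-machine-type=n1-highmem-8 \
    --properties=dataproc:dataproc.scheduler.driver-size-mb=512,dataproc:dataproc.scheduler.max-concurrent-jobs=20
```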

High system memory or not enough free memory

Scheduler message:

Throttling job xxx_____JOBID_____xxx (and maybe others): High system memory usage (current=xx%)

Throttling job xxx_____JOBID_____xxx (and maybe others): Not enough free memory (current=xx min=xx)

Causes:

By default, the Dataproc agent throttles job submission when master VM memory use reaches 90% (0.9). When this limit is reached, new jobs cannot be scheduled.

The amount of free memory on the master VM is not sufficient to schedule another job.

Solution:

  1. When you create a cluster:

    1. Increase the value of the dataproc:dataproc.scheduler.max-memory-used cluster property. For example, set it above the 0.90 default to 0.95.
    2. Increase the value of the dataproc:dataproc.scheduler.min-free-memory.mb cluster property. The default value is 256 MB.
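As an illustration, both memory-throttling properties can be set when the cluster is created (the cluster name, region, and property values are placeholders):

```shell
# Raise the memory-use threshold and the free-memory floor used by the
# Dataproc agent when deciding whether to schedule another job.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --properties=dataproc:dataproc.scheduler.max-memory-used=0.95,dataproc:dataproc.scheduler.min-free-memory.mb=1024
```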

Job rate limit exceeded

Scheduler message:

Throttling job xxx__JOBID___xxx (and maybe others): Rate limit

Causes:

The Dataproc agent reached the job submission rate limit.

Solutions:

  1. By default, the Dataproc agent limits job submission to 1.0 QPS. You can set a different rate when you create a cluster by using the dataproc:dataproc.scheduler.job-submission-rate cluster property.
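For example, to raise the submission rate at cluster creation (the cluster name, region, and the 5.0 QPS value are illustrative):

```shell
# Allow up to 5 job submissions per second instead of the 1.0 QPS default.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --properties=dataproc:dataproc.scheduler.job-submission-rate=5.0
```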

View job status

To view job status and details, see Job monitoring and debugging.