This page lists common causes of Dataproc job scheduling delays, with information that can help you avoid them.
Overview
The following are common reasons why a Dataproc job can be delayed (throttled):
- Too many running jobs
- High system memory usage
- Not enough free memory
- Rate limit exceeded
Typically, the job delay message is issued in the following format:
Awaiting execution [SCHEDULER_MESSAGE]
The following sections provide possible causes and solutions for specific job delay scenarios.
Too many running jobs
Scheduler message:
Throttling job ### (and maybe others): Too many running jobs (current=xx max=xx)
Causes:
The maximum number of concurrent jobs, based on master VM memory, is exceeded (the job driver runs on the Dataproc cluster master VM). By default, Dataproc reserves 3.5GB of memory for overhead, and allows 1 job per GB of remaining memory.
Example: The n1-standard-4 machine type has 15GB memory. With 3.5GB reserved for overhead, 11.5GB remains. Rounding down to an integer, 11GB is available for up to 11 concurrent jobs.
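The calculation above can be sketched as a short script (the variable names are ours for illustration, not part of any Dataproc API):

```shell
# Estimate the default concurrent-job limit for a master VM:
# subtract the 3.5GB overhead, then allow 1 job per full GB remaining.
# MEMORY_GB is the master machine type's memory (15 for n1-standard-4).
MEMORY_GB=15
MAX_JOBS=$(awk -v mem="$MEMORY_GB" 'BEGIN { n = int(mem - 3.5); print (n > 0 ? n : 0) }')
echo "$MAX_JOBS"   # 11
```

For an n1-highmem-8 master (52GB), the same arithmetic yields 48 concurrent jobs.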
Solutions:
Monitor log metrics, such as CPU and memory usage, to estimate job requirements.
When you create a job cluster:
- Use a machine type with more memory for the cluster master VM.
- If 1GB per job is more than you need, set the dataproc:dataproc.scheduler.driver-size-mb cluster property to less than 1024.
- Set the dataproc:dataproc.scheduler.max-concurrent-jobs cluster property to a value suited to your job requirements.
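As a sketch, these properties can be set at cluster creation with gcloud; the cluster name, region, and property values below are placeholders:

```shell
# Create a cluster whose master allows at most 30 concurrent jobs,
# reserving 512MB (rather than the default 1GB) per job driver.
# example-cluster, the region, and the values are illustrative only.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --master-machine-type=n1-highmem-8 \
    --properties=dataproc:dataproc.scheduler.driver-size-mb=512,dataproc:dataproc.scheduler.max-concurrent-jobs=30
```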
High system memory or not enough free memory
Scheduler message:
Throttling job ### (and maybe others): High system memory usage (current=xx%)
Throttling job ### (and maybe others): Not enough free memory (current=xx min=xx)
Causes:
- High system memory usage: By default, the Dataproc agent throttles job submission when memory use reaches 90% (0.9). When this limit is reached, new jobs cannot be scheduled.
- Not enough free memory: The amount of free memory needed to schedule another job on the cluster is not sufficient.
Solutions:
When you create a cluster:
- Increase the value of the dataproc:dataproc.scheduler.max-memory-used cluster property. For example, set it above the 0.90 default to 0.95.
- Decrease the value of the dataproc:dataproc.scheduler.min-free-memory.mb cluster property. The default value is 256 MB.
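As a sketch, both thresholds can be set at cluster creation with gcloud; the cluster name, region, and values below are placeholders:

```shell
# Raise the memory-use throttling threshold from the 0.90 default to 0.95
# and lower the free-memory floor from the 256MB default to 128MB.
# example-cluster, the region, and the values are illustrative only.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --properties=dataproc:dataproc.scheduler.max-memory-used=0.95,dataproc:dataproc.scheduler.min-free-memory.mb=128
```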
Job rate limit exceeded
Scheduler message:
Throttling job ### (and maybe others): Rate limit
Causes:
The Dataproc agent reached the job submission rate limit.
Solutions:
- By default, the Dataproc agent limits job submission to 1.0 QPS. You can set a different value when you create a cluster with the dataproc:dataproc.scheduler.job-submission-rate cluster property.
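As a sketch, the submission rate can be raised at cluster creation with gcloud; the cluster name, region, and rate below are placeholders:

```shell
# Allow up to 2 job submissions per second (the default is 1.0 QPS).
# example-cluster, the region, and the value are illustrative only.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --properties=dataproc:dataproc.scheduler.job-submission-rate=2.0
```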
View job status
To view job status and details, see Job monitoring and debugging.