Problem
You notice that Dataflow pipelines will not run or will not scale up due to the following error:
Startup of the worker pool in <ZONE> failed to bring up any of the desired <NUMBER_OF_WORKERS> workers. ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS: The zone '<ZONE>' does not have enough resources available to fulfill the request. '(resource type:compute)'.
Environment
- Any Cloud Dataflow environment.
- To find the corresponding error in the logs, you can use the following Cloud Monitoring log filter:
resource.type="dataflow_step" severity=ERROR resource.labels.job_id="<JOB_ID>"
- Where <JOB_ID> is the ID of your failed Dataflow job (for example, 2021-05-12_00_31_34-301052801769647219). To find out more about filtering Cloud Monitoring logs, see Advanced log queries.
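- As an optional sketch, you can also retrieve these entries programmatically, for example with the google-cloud-logging Python client library (one possible approach; any tool that accepts the filter above works just as well). The job ID below is a placeholder.

from google.cloud import logging

JOB_ID = "<JOB_ID>"  # for example: 2021-05-12_00_31_34-301052801769647219

client = logging.Client()  # uses the project from your application default credentials
log_filter = (
    'resource.type="dataflow_step" '
    'severity=ERROR '
    f'resource.labels.job_id="{JOB_ID}"'
)

# Print the timestamp and payload of each matching error entry.
for entry in client.list_entries(filter_=log_filter, page_size=10):
    print(entry.timestamp, entry.payload)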
Solution
Resource unavailability is rare and often lasts only a short period of time. Consider implementing one of the following solutions:
- Short term solutions
- Move jobs to a different zone and/or region (see the pipeline options sketch after this list).
- Wait 5 or more minutes and retry the job.
- Verify which machine types are supported in the zone, then reconfigure the jobs with smaller machine types.
- Long term solutions
- Use Compute Engine reservations. Reservations provide a very high level of assurance in obtaining capacity on Google Cloud. For more details on how to use this feature, see Consuming and managing reservations (a sketch follows this list).
- Break up or otherwise optimize your jobs so that they use fewer resources, and reconfigure them with smaller machine types. Smaller VM shapes have a lower probability of resource exhaustion.
- If you have a dedicated account team, contact your Technical Account Manager (TAM) so that the capacity team can review the resource reservations in the region.
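As an illustration of the short-term options above (moving a job to a different region or zone and using a smaller machine type), the following sketch sets the corresponding Apache Beam Python pipeline options when launching a Dataflow job. The project, bucket, region, zone, and machine type values are placeholders, not recommendations.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                 # placeholder project ID
    temp_location="gs://my-bucket/temp",  # placeholder staging bucket
    region="us-east1",                    # launch in an alternative region
    worker_zone="us-east1-b",             # optionally pin an alternative zone
    machine_type="n1-standard-2",         # smaller worker machine type
)

# A trivial pipeline body; replace with your own transforms.
with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Create" >> beam.Create([1, 2, 3])
        | "Print" >> beam.Map(print)
    )

If you omit worker_zone, the Dataflow service picks a zone within the specified region based on available capacity, which by itself can help avoid a single exhausted zone.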
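For the reservation-based long-term option, the sketch below creates a reservation with the google-cloud-compute Python client library; the resource names and placeholder values are for illustration only, so refer to Consuming and managing reservations for the authoritative steps and for how reservations are consumed.

from google.cloud import compute_v1

PROJECT = "my-project"  # placeholder project ID
ZONE = "us-east1-b"     # zone in which to hold capacity

reservation = compute_v1.Reservation(
    name="dataflow-worker-capacity",
    specific_reservation=compute_v1.AllocationSpecificSKUReservation(
        count=10,  # number of VMs to hold in reserve
        instance_properties=compute_v1.AllocationSpecificSKUAllocationReservedInstanceProperties(
            machine_type="n1-standard-2",
        ),
    ),
    # False lets any matching VM in the project consume the reservation.
    specific_reservation_required=False,
)

client = compute_v1.ReservationsClient()
operation = client.insert(project=PROJECT, zone=ZONE, reservation_resource=reservation)
operation.result()  # wait for the reservation to be created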
Cause
This error occurs when the selected zone temporarily does not have enough Compute Engine resources available to create the requested Dataflow worker VMs. Please note, this issue is not related to user quota. For guidance on designing around zonal resource shortages, review our documentation which outlines how to build resilient and scalable architectures on Google Cloud Platform.