The Cloud Dataflow managed service has the following quota limits:
- Each user may make up to 5000 requests per second.
- Each Dataflow job may use a maximum of 1000 Google Compute Engine instances.
- Each Cloud Platform project may run up to 25 concurrent Dataflow jobs.
- Each user may make up to 250 monitoring requests per second.
In addition, the Cloud Dataflow Service exercises various components of the Google Cloud Platform, such as BigQuery, Cloud Storage, Cloud Pub/Sub, and Compute Engine. These (and other Cloud Platform services) employ quotas to cap the maximum number of resources you may use within a project. When you use Cloud Dataflow, you may need to adjust your quota settings for these services.
Google Compute Engine
When you run your pipeline on the Cloud Dataflow Service, Cloud Dataflow spins up Compute Engine instances to run your pipeline code.
- CPUs: The default machine types for Dataflow are n1-standard-1 for batch and n1-standard-4 for streaming. Compute Engine calculates the number of CPUs by summing each instance's total CPU count. For example, running 10 n1-standard-4 instances counts as 40 CPUs. See Compute Engine machine types for a mapping of machine types to CPU count.
- In-Use IP Addresses: The number of in-use IP addresses in your project must be sufficient to accommodate the desired number of instances. To use 10 Compute Engine instances, you'll need 10 in-use IP addresses.
- Persistent Disk: Cloud Dataflow attaches Persistent Disks to each instance. The default disk size is 250GB for batch and 420GB for streaming. For 10 instances, by default you'll need 2,500 GB of Persistent Disk for a batch job.
- Managed Instance Groups: Cloud Dataflow deploys your Compute Engine instances as a Managed Instance Group. You'll need to ensure you have the following related quota available:
  - One Instance Group per Dataflow job
  - One Managed Instance Group per Dataflow job
  - One Instance Template per Dataflow job
If your project is subject to a trial quota limit, set a maxNumWorkers that fits within your trial limit.
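The per-worker figures above can be combined into a simple quota estimate before you launch a job. The following is a minimal sketch; the function name and the hardcoded CPU counts for the two default machine types are illustrative, not part of any Dataflow API.

```python
# Rough estimate of the Compute Engine quota a Dataflow job will consume,
# based on the defaults described above (disk sizes and machine types).
# CPU counts per machine type are illustrative values for the n1-standard
# family; check Compute Engine machine types for the full mapping.

DEFAULT_DISK_GB = {"batch": 250, "streaming": 420}
CPUS_PER_MACHINE = {"n1-standard-1": 1, "n1-standard-4": 4}

def estimate_quota(num_workers, machine_type, job_type="batch"):
    """Return the CPUs, in-use IPs, and Persistent Disk GB a job needs."""
    return {
        "cpus": num_workers * CPUS_PER_MACHINE[machine_type],
        "in_use_ips": num_workers,  # one in-use IP address per instance
        "persistent_disk_gb": num_workers * DEFAULT_DISK_GB[job_type],
    }

# Example: 10 workers on the streaming default machine type.
print(estimate_quota(10, "n1-standard-4", "streaming"))
# {'cpus': 40, 'in_use_ips': 10, 'persistent_disk_gb': 4200}
```

Compare each number against your project's current Compute Engine quota before setting maxNumWorkers.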
Depending on which sources and sinks you are using, you may also need additional quota.
- Pub/Sub: If you are using Pub/Sub, you may need additional quota. When planning for quota, note that processing 1 message from Cloud Pub/Sub involves 3 operations. If you use custom timestamps, you should double your expected number of operations, since Cloud Dataflow will create a separate subscription to track custom timestamps.
- BigQuery: If you are using the streaming API for BigQuery, quota limits and other restrictions apply.
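The Pub/Sub rule of thumb above (3 operations per message, doubled when custom timestamps add a tracking subscription) can be sketched as a quick back-of-the-envelope check. The function name is hypothetical, chosen only for this example.

```python
# Estimate the Cloud Pub/Sub operations/sec a Dataflow pipeline generates,
# using the planning rule described above: processing one message involves
# 3 operations, and custom timestamps double the operation count because
# Cloud Dataflow creates a separate subscription to track them.

def pubsub_ops_per_second(messages_per_second, custom_timestamps=False):
    """Estimate Pub/Sub operations/sec to compare against your quota."""
    ops = messages_per_second * 3
    if custom_timestamps:
        ops *= 2  # separate timestamp-tracking subscription doubles ops
    return ops

print(pubsub_ops_per_second(1000))                          # 3000
print(pubsub_ops_per_second(1000, custom_timestamps=True))  # 6000
```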