Cloud Dataflow supports regional endpoints based on Compute Engine regions and associated zones. A regional endpoint stores and handles metadata about your Cloud Dataflow job, and deploys and controls your Cloud Dataflow workers.
Currently, Cloud Dataflow provides regional endpoints for the following regions:
Why specify a regional endpoint?
There are situations where specifying a regional endpoint for your Cloud Dataflow job may be useful.
Security and compliance
You may need to constrain Cloud Dataflow job processing to a specific geographic region in support of your project’s security and compliance needs. For example, you may need to comply with the European Union Data Protection Directive.
You can minimize network latency and network transport costs by running a Cloud Dataflow job from the same region as its sources and/or sinks.
Notes about common Cloud Dataflow job sources:
- Cloud Storage buckets can be regional or multi-regional resources. Whether you use a regional or a multi-regional bucket as a source, we recommend that you perform read operations in the same region.
- Cloud Pub/Sub topics are global resources and do not have regional considerations.
Resilience and geographic separation
You may want to isolate your normal Cloud Dataflow operations from outages that could occur in other geographic regions. Or, you may need to plan alternate sites for business continuity in the event of a region-wide disaster.
Regional endpoint semantics
Regional endpoint names follow a standard convention based on Compute Engine region names. For example, the name for the Central US region is us-central1.
By default, when you specify a regional endpoint, the workers assigned to your job will be in a zone within the specified region.
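The relationship between a worker zone and its region can be sketched in a few lines of Python. This is a minimal illustration, not part of Cloud Dataflow: it assumes the Compute Engine convention that a zone name is the region name followed by a hyphen and a single letter (for example, us-central1-a), and the helper name `region_of_zone` and the pattern `REGION_RE` are hypothetical.

```python
import re

# Assumed shape of a region name: <geography>-<area><number>, e.g. "us-central1".
REGION_RE = re.compile(r"^[a-z]+-[a-z]+\d+$")


def region_of_zone(zone: str) -> str:
    """Return the region that a zone name belongs to.

    Assumes zone names look like "<region>-<letter>", e.g. "us-central1-a".
    Raises ValueError if the name does not fit that assumed shape.
    """
    region, sep, suffix = zone.rpartition("-")
    if not sep or not REGION_RE.match(region) or not re.fullmatch(r"[a-z]", suffix):
        raise ValueError(f"not a recognizable zone name: {zone!r}")
    return region
```

For instance, `region_of_zone("us-central1-a")` returns `"us-central1"`, while passing a bare region name such as `"us-central1"` raises `ValueError`.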
Using regional endpoints
To specify a regional endpoint for your job, set the --region option to one of the supported regional endpoints. If you do not specify a regional endpoint, Cloud Dataflow uses us-central1 as the default.
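The defaulting behavior described above can be modeled with a small helper when assembling job arguments in a launcher script. This is only a sketch: the function name `region_flag` is hypothetical, the region `europe-west1` in the usage note is illustrative and not necessarily a supported endpoint, and the only facts taken from the text are the --region option and the us-central1 default.

```python
from typing import List, Optional, Tuple

# Default regional endpoint used when --region is omitted (as stated in the text).
DEFAULT_REGION = "us-central1"


def region_flag(region: Optional[str] = None) -> Tuple[List[str], str]:
    """Return (command-line flags, effective region) for a job.

    If no region is given, no flag is emitted and the effective region
    is the documented default.
    """
    if region is None:
        return [], DEFAULT_REGION
    return [f"--region={region}"], region
```

Calling `region_flag()` yields `([], "us-central1")`, and `region_flag("europe-west1")` yields `(["--region=europe-west1"], "europe-west1")`. The returned flag list can be appended to whatever argument list your launcher already builds.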
The Cloud Dataflow Command-line Interface also supports the --region option to specify regional endpoints.
Overriding the zone
By default, when you submit a job with the --region parameter, the regional endpoint automatically assigns workers to a zone within the region. However, you may want to specify both a region and a zone (using --zone) in the following cases:
- Your workers are in a zone that does not have a regional endpoint, and you want to use a regional endpoint that is closer to that zone.
- You want to ensure that data processing for your Cloud Dataflow job occurs strictly in a specific zone.
For all other cases, we do not recommend overriding the zone. The common scenarios table contains usage recommendations for these situations.
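A launcher script might combine the two options as in the following sketch. The helper names `location_flags` and `zone_in_region` are hypothetical, and the region and zone values in the usage note are illustrative. The check assumes Compute Engine zone names of the form "<region>-<letter>". Because the text allows a zone outside the endpoint's region, the check is informational only and never rejects the combination.

```python
from typing import List, Optional


def location_flags(region: str, zone: Optional[str] = None) -> List[str]:
    """Build the location flags for a job.

    --region is always emitted; --zone is added only when the caller
    explicitly overrides the zone.
    """
    flags = [f"--region={region}"]
    if zone:
        flags.append(f"--zone={zone}")
    return flags


def zone_in_region(zone: str, region: str) -> bool:
    """Report whether the zone lies inside the region.

    Assumes zone names have the form "<region>-<letter>".
    """
    return zone.rpartition("-")[0] == region
```

With no zone preference, `location_flags("us-central1")` returns only the --region flag. With an override, `location_flags("us-central1", "us-central1-f")` returns both flags, and `zone_in_region("us-central1-f", "us-central1")` is `True`. For a zone outside the endpoint's region, `zone_in_region` returns `False`, which a script could log as a reminder that the override is intentional.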
You can run the gcloud compute regions list command to see a listing of regions that have available zones for worker deployment.
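If you want to consume that listing from a script, one possible approach is to call the command through `subprocess` and parse its output. The sketch below assumes that the gcloud CLI is installed and authenticated, and it uses the general `--format=value(name)` output option so that each line contains just a region name. The function names are hypothetical.

```python
import subprocess
from typing import List


def parse_region_names(output: str) -> List[str]:
    """Turn one-name-per-line command output into a list of region names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_regions() -> List[str]:
    """Run `gcloud compute regions list` and return the region names.

    Requires the gcloud CLI on PATH; raises CalledProcessError if the
    command fails.
    """
    result = subprocess.run(
        ["gcloud", "compute", "regions", "list", "--format=value(name)"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_region_names(result.stdout)
```

Keeping the parsing separate from the command invocation makes it possible to test the parser without access to a Google Cloud project.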
The following table contains usage recommendations for common scenarios.
|I want to use a supported regional endpoint and have no zone preference within the region.||Use
|I need worker processing to occur in a specific zone of a region that has a regional endpoint.||Specify both
|I need worker processing to occur in a specific region that does not have a regional endpoint.||Specify both
|I need to use Cloud Dataflow Shuffle (Beta).||Use