Regional endpoints

Cloud Dataflow supports regional endpoints based on Compute Engine regions and associated zones. A regional endpoint stores and handles metadata about your Cloud Dataflow job, and deploys and controls your Cloud Dataflow workers.

Regional endpoint names follow a standard convention based on Compute Engine region names. For example, the name for the Central US region is us-central1. Currently, Cloud Dataflow provides regional endpoints for the following regions:

  • us-central1
  • us-east1
  • us-west1
  • europe-west1
  • asia-east1
  • asia-northeast1

Why specify a regional endpoint?

Specifying a regional endpoint for your Cloud Dataflow job can be useful in the following situations.

Security and compliance

You may need to constrain Cloud Dataflow job processing to a specific geographic region to support your project’s security and compliance needs, such as compliance with the European Union Data Protection Directive.

Data locality

You can minimize network latency and network transport costs by running a Cloud Dataflow job from the same region as its sources and/or sinks.

Notes about common Cloud Dataflow job sources:

  • Cloud Storage buckets can be regional or multi-regional resources: When using a Cloud Storage regional bucket as a source, Google recommends that you perform read operations in the same region. When using a Cloud Storage multi-regional bucket as a source, we recommend that you perform read operations in a region contained within the bucket’s multi-regional location.
  • Cloud Pub/Sub topics are global resources and do not have regional considerations.

Resilience and geographic separation

You may want to isolate your normal Cloud Dataflow operations from outages that could occur in other geographic regions. Or, you may need to plan alternate sites for business continuity in the event of a region-wide disaster.

Auto Zone placement

By default, a regional endpoint automatically selects the best zone within the region based on the available zone capacity at the time of the job creation request. Automatic zone selection helps ensure that job workers run in the best zone for your job.

Using regional endpoints

To specify a regional endpoint for your job, set the --region option to one of the supported regional endpoints. If you do not specify a regional endpoint, Cloud Dataflow uses us-central1 as the default region, and job workers will start in zones within us-central1.
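For example, a pipeline launched with the Apache Beam Python SDK might set the regional endpoint like this (the project ID, bucket, and script name here are illustrative placeholders, not values from this page):

```shell
# Illustrative launch of a Python pipeline against the europe-west1
# regional endpoint. Project, bucket, and script names are placeholders.
python wordcount.py \
  --runner DataflowRunner \
  --project my-project \
  --temp_location gs://my-bucket/tmp/ \
  --region europe-west1
```

If the --region flag were omitted, the same job would be managed by the default us-central1 endpoint.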

The Cloud Dataflow Command-line Interface also supports the --region option to specify regional endpoints.
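For instance, you can pass --region to gcloud when listing jobs, so that the command queries the endpoint that is actually managing them (assuming the Cloud Dataflow commands are available in your gcloud installation):

```shell
# List Cloud Dataflow jobs managed by the us-west1 regional endpoint.
gcloud dataflow jobs list --region=us-west1
```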

Overriding the zone

By default, when you submit a job with the --region parameter, the regional endpoint automatically assigns workers to the best zone within the region. However, you may want to specify both a region and a zone (using --zone) in the following cases.

  • Your workers are in a zone that does not have a regional endpoint, and you want to use a regional endpoint that is closer to that zone.

  • You want to ensure that data processing for your Cloud Dataflow job occurs strictly in a specific zone.
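In those cases, a launch command combines both flags. As an illustrative sketch (project, bucket, script, and zone names are placeholders):

```shell
# Illustrative: workers pinned to a specific zone, with the job
# managed by that region's endpoint.
python wordcount.py \
  --runner DataflowRunner \
  --project my-project \
  --temp_location gs://my-bucket/tmp/ \
  --region asia-northeast1 \
  --zone asia-northeast1-b
```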

For all other cases, we do not recommend overriding the zone. The common scenarios table contains usage recommendations for these situations.

You can run the gcloud compute regions list command to see a listing of regions that have available zones for worker deployment.
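For example:

```shell
# Show Compute Engine regions and their zone availability.
gcloud compute regions list
```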

Common scenarios

The following table contains usage recommendations for common scenarios.

  • Scenario: I want to use a supported regional endpoint and have no zone preference within the region.
    Recommendation: Use --region to specify a regional endpoint. The regional endpoint automatically selects the best zone based on available capacity, and Cloud Dataflow manages your job and processes data within the specified region.

  • Scenario: I need worker processing to occur in a specific zone of a region that has a regional endpoint.
    Recommendation: Specify both --region and --zone. Use --region to specify the regional endpoint, and --zone to specify the zone within that region.

  • Scenario: I need worker processing to occur in a specific region that does not have a regional endpoint.
    Recommendation: Specify both --region and --zone. Use --region to specify the supported regional endpoint that is closest to the zone where the worker processing must occur, and --zone to specify a zone within the desired region.

  • Scenario: I need to use Cloud Dataflow Shuffle (Beta).
    Recommendation: Use --region to specify a regional endpoint that supports Cloud Dataflow Shuffle. Some regional endpoints do not support this feature; see the feature documentation for the list of supported regions.
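As a sketch of the Shuffle case, enabling the Beta typically combines a supported --region with a service experiment flag (the experiment name shown here is an assumption and may vary by SDK release; other names are placeholders):

```shell
# Illustrative: run a job with Cloud Dataflow Shuffle in a supported
# region. The shuffle_mode experiment name is assumed, not confirmed
# by this page.
python wordcount.py \
  --runner DataflowRunner \
  --project my-project \
  --temp_location gs://my-bucket/tmp/ \
  --region us-central1 \
  --experiments shuffle_mode=service
```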