Regional endpoints

A Dataflow regional endpoint stores and handles metadata about your Dataflow job, and deploys and controls your Dataflow workers.

Regional endpoint names follow a standard convention based on Compute Engine region names. For example, the name for the Central US region is us-central1. Currently, Dataflow provides regional endpoints for the following regions:

  • us-west1
  • us-central1
  • us-east1
  • us-east4
  • northamerica-northeast1
  • europe-west2
  • europe-west1
  • europe-west4
  • europe-west3
  • asia-southeast1
  • asia-east1
  • asia-northeast1
  • australia-southeast1
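
As a minimal sketch, the list above can be captured in a small Python helper that checks a region name before a job is submitted. The names SUPPORTED_ENDPOINTS and is_supported_endpoint are hypothetical, and the set is only a snapshot of the list on this page, which may change over time.

```python
# Snapshot of the regional endpoints listed above; not a live query.
SUPPORTED_ENDPOINTS = frozenset({
    "us-west1", "us-central1", "us-east1", "us-east4",
    "northamerica-northeast1", "europe-west2", "europe-west1",
    "europe-west4", "europe-west3", "asia-southeast1", "asia-east1",
    "asia-northeast1", "australia-southeast1",
})


def is_supported_endpoint(region: str) -> bool:
    """Return True if `region` appears in the snapshot list above."""
    return region.strip().lower() in SUPPORTED_ENDPOINTS
```

For example, is_supported_endpoint("us-central1") is true, while a Compute Engine region that is absent from the list, such as "us-west2", is rejected.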

Why specify a regional endpoint?

Specifying a regional endpoint for your Dataflow job is useful in the following situations.

Security and compliance

You might need to constrain Dataflow job processing to a specific geographic region in support of your project’s security and compliance needs.

Data locality

You can minimize network latency and network transport costs by running a Dataflow job from the same region as its sources, sinks, and staging/temporary file locations. Note that if you use sources, sinks, or staging/temporary file locations that are located outside of your job's region, your data might be sent across regions.

Notes about common Dataflow job sources:

  • Cloud Storage buckets can be regional or multi-regional resources: When using a Cloud Storage regional or a multi-regional bucket as a source, we recommend that you perform read operations in the same region.
  • Pub/Sub topics are global resources and do not have regional considerations.
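
The locality check described above could be sketched as follows. This is an illustration only: cross_region_resources is a hypothetical helper, and it assumes you already know each resource's location (for example, from your own configuration) rather than looking it up through any Google Cloud API.

```python
from typing import Dict, List


def cross_region_resources(job_region: str,
                           resource_locations: Dict[str, str]) -> List[str]:
    """Return the names of resources whose location differs from the job region.

    Locations are compared as plain strings, ignoring case. A multi-regional
    location such as "US" is therefore reported as a mismatch, even though it
    may include the job's region. Global resources (for example, Pub/Sub
    topics) should simply be left out of `resource_locations`.
    """
    job = job_region.lower()
    return sorted(
        name for name, location in resource_locations.items()
        if location.lower() != job
    )
```

Any name this function returns identifies a source, sink, or staging location that might cause data to be sent across regions.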

Resilience and geographic separation

You might want to isolate your normal Dataflow operations from outages that could occur in other geographic regions. Or, you may need to plan alternate sites for business continuity in the event of a region-wide disaster.

Auto zone placement

By default, a regional endpoint automatically selects the best zone within the region based on the available zone capacity at the time of the job creation request. Automatic zone selection helps ensure that job workers run in the best zone for your job.

Using regional endpoints

Note: Regional endpoint configuration requires Apache Beam SDK version 2.0.0 or higher.

To specify a regional endpoint for your job, set the --region option to one of the supported regional endpoints. If you do not specify a regional endpoint, Dataflow uses us-central1 as the default region, and job workers start in zones within us-central1. If the regional endpoint differs from the default region, you must specify the region in every Dataflow command for this job to avoid errors.

The Cloud Dataflow Command-line Interface also supports the --region option to specify regional endpoints.
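
The fallback to the default region can be modeled with Python's standard argparse module. This sketch only illustrates the behavior described above; it is not the SDK's own option parser, and parse_region and DEFAULT_REGION are hypothetical names.

```python
import argparse

DEFAULT_REGION = "us-central1"  # default region stated in the text above


def parse_region(argv):
    """Extract --region from argv, falling back to the documented default.

    Returns a (region, remaining_args) tuple so that other pipeline
    options pass through untouched.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--region", default=DEFAULT_REGION)
    known, remaining = parser.parse_known_args(argv)
    return known.region, remaining
```

Calling parse_region with no arguments yields the default region, while passing "--region=europe-west1" selects that endpoint and leaves any other options in the remaining list.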

Overriding the worker region or zone

By default, when you submit a job with the --region parameter, the regional endpoint automatically assigns workers to the best zone within the region. However, you may want to specify either a region or a specific zone (using --worker_region or --worker_zone, respectively) for your worker instances.

You might want to override the worker location in the following cases:

  • Your workers are in a region or zone that does not have a regional endpoint, and you want to use a regional endpoint that is closer to that region or zone.

  • You want to ensure that data processing for your Dataflow job occurs strictly in a specific region or zone.

For all other cases, we do not recommend overriding the worker location. The common scenarios table contains usage recommendations for these situations.

You can run the gcloud compute regions list command to see a listing of the regions that are available for worker deployment, and the gcloud compute zones list command to see the zones within them.
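
When you override the worker zone, it can help to confirm which region a zone belongs to. The sketch below assumes the Compute Engine naming convention in which a zone name is its region name followed by a hyphen and a single letter (for example, us-central1-a). Both function names are hypothetical.

```python
def region_of_zone(zone: str) -> str:
    """Derive the region from a zone name by dropping the final "-<letter>".

    Assumes zone names have the form "<region>-<letter>".
    """
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or len(suffix) != 1 or not suffix.isalpha():
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def zone_in_region(zone: str, region: str) -> bool:
    """Return True if `zone` belongs to `region` under the naming convention."""
    return region_of_zone(zone) == region
```

A check such as zone_in_region("europe-west1-b", "europe-west1") can guard against pairing a regional endpoint with a zone from a different region by mistake.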

Common scenarios

The following table contains usage recommendations for common scenarios.

| Scenario | Recommendation |
| --- | --- |
| I want to use a supported regional endpoint and have no zone preference within the region. In this case, the regional endpoint automatically selects the best zone based on available capacity. | Use --region to specify a regional endpoint. This ensures that Dataflow manages your job and processes data within the specified region. |
| I need worker processing to occur in a specific zone of a region that has a regional endpoint. | Specify both --region and --worker_zone. Use --region to specify the regional endpoint. Use --worker_zone to specify the specific zone within that region. |
| I need worker processing to occur in a specific region that does not have a regional endpoint. | Specify both --region and --worker_region. Use --region to specify the supported regional endpoint that is closest to the region where the worker processing must occur. Use --worker_region to specify a region where worker processing must occur. |
| I need to use Dataflow Shuffle. | Use --region to specify a regional endpoint that supports Dataflow Shuffle. Some regional endpoints may not support this feature; see the feature documentation for a list of supported regions. |
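
The first three scenarios above can be summarized as a small flag builder. This is a sketch under stated assumptions: location_flags is a hypothetical helper, only the flag names --region, --worker_zone, and --worker_region come from this page, and the mutual-exclusion check reflects the "either a region or a specific zone" wording earlier on this page rather than any confirmed SDK behavior.

```python
from typing import List, Optional


def location_flags(region: str,
                   worker_zone: Optional[str] = None,
                   worker_region: Optional[str] = None) -> List[str]:
    """Build the location-related flags for a Dataflow job.

    `region` is the regional endpoint. At most one of `worker_zone` and
    `worker_region` may be given (an assumption; see the text above).
    """
    if worker_zone and worker_region:
        raise ValueError("specify --worker_zone or --worker_region, not both")
    flags = [f"--region={region}"]
    if worker_zone:
        flags.append(f"--worker_zone={worker_zone}")
    if worker_region:
        flags.append(f"--worker_region={worker_region}")
    return flags
```

For instance, location_flags("us-central1", worker_zone="us-central1-a") covers the second scenario, and location_flags("us-west1", worker_region="us-west2") covers the third. The region values are illustrative only and do not imply that us-west1 is the closest endpoint for us-west2.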