Data Analytics

Introducing regional placement in Dataflow

March 10, 2023

Yuta Labur

Software Engineer

Efesa Origbo

Product Manager, Google


We’re excited to announce that Dataflow now supports regional placement of workers.

Building upon Auto Zone placement

Dataflow deploys its workers as Compute Engine resources, which are hosted in multiple locations worldwide. Since 2018, Dataflow has supported the Auto Zone feature, which uses available zone capacity to automatically select the best single zone within a region to run Dataflow workers. While Auto Zone let customers defer zone selection to the Dataflow service, it had two key limitations:

When a zone runs out of compute capacity, jobs created with Auto Zone are susceptible to resource availability errors.
In the event of a zonal failure, jobs created with Auto Zone fail because they are confined to a single zone.

Regional worker placement closes the gaps mentioned above by enrolling the Dataflow job in all available zones within the relevant region. If a subset of zones runs out of compute capacity, the Dataflow job continues to provision workers from other zones that still have capacity. This helps improve the scalability and reliability of your Dataflow jobs.
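To make the difference concrete, here is a toy model of the two placement strategies. It is only an illustrative sketch: the function names, the zone names, and the capacity numbers are hypothetical, and the greedy "fullest zone first" ordering is an assumption made for the example, not a description of how the Dataflow service actually schedules workers.

```python
from typing import Dict


def place_single_zone(capacity: Dict[str, int], zone: str, workers: int) -> Dict[str, int]:
    """Auto Zone style: every worker must fit in the one selected zone."""
    if capacity.get(zone, 0) < workers:
        raise RuntimeError(f"resource availability error in {zone}")
    return {zone: workers}


def place_regional(capacity: Dict[str, int], workers: int) -> Dict[str, int]:
    """Regional style: draw workers from any zone that still has capacity."""
    placement: Dict[str, int] = {}
    remaining = workers
    # Hypothetical policy: visit zones from most to least free capacity.
    for zone in sorted(capacity, key=capacity.get, reverse=True):
        if remaining == 0:
            break
        take = min(capacity[zone], remaining)
        if take > 0:
            placement[zone] = take
            remaining -= take
    if remaining > 0:
        raise RuntimeError("no zone in the region has enough capacity")
    return placement


# Made-up capacities: zone "a" is exhausted, "b" and "c" still have room.
capacity = {"us-central1-a": 0, "us-central1-b": 3, "us-central1-c": 4}

regional = place_regional(capacity, workers=5)

try:
    place_single_zone(capacity, "us-central1-a", workers=5)
    single_zone_failed = False
except RuntimeError:
    single_zone_failed = True
```

In this model, a job pinned to the exhausted zone fails outright, while the regional job spreads its five workers across the two zones that still have room.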

https://storage.googleapis.com/gweb-cloudblog-publish/images/regional_placement_in_Dataflow.max-2000x2000.jpg

Getting started

Streaming Engine and Shuffle Service jobs are automatically enabled for regional worker placement. To learn more, see the documentation page.
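As a rough illustration of what a job submission might look like, the sketch below assembles pipeline arguments for the Apache Beam Python SDK. The helper `dataflow_args` and the project and bucket names are hypothetical placeholders. The sketch also assumes that the job specifies only a region and leaves `--worker_zone` unset so that the service is free to choose zones; check the documentation for the exact conditions that apply to your job.

```python
from typing import List, Optional


def dataflow_args(
    project: str,
    region: str,
    temp_location: str,
    streaming: bool,
    worker_zone: Optional[str] = None,
) -> List[str]:
    """Build Beam Python pipeline arguments for a Dataflow job (hypothetical helper)."""
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]
    if streaming:
        # Run the streaming job on Streaming Engine.
        args += ["--streaming", "--enable_streaming_engine"]
    if worker_zone is not None:
        # Assumption: pinning a zone confines workers to that single zone.
        args.append(f"--worker_zone={worker_zone}")
    return args


# Placeholder project and bucket; only the region is specified, no zone.
args = dataflow_args(
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    streaming=True,
)
```

The resulting list can be passed to `apache_beam.options.pipeline_options.PipelineOptions(args)` when constructing the pipeline.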
