Data Analytics

The next generation of Dataflow: Dataflow Prime, Dataflow Go, and Dataflow ML

July 20, 2022

Sachin Agarwal

Group Product Manager, Google Cloud

Frank Guan

Product Marketing Lead, Google Cloud

By the end of 2024, 75% of enterprises will shift from piloting to operationalizing artificial intelligence, according to IDC, yet the growing complexity of data types, heterogeneous data stacks, and programming languages makes this a challenge for all data engineers. In the current economic climate, doing more at lower cost and with higher efficiency has also become a key consideration for many organizations.

Today, we are pleased to announce three major releases that bring the power of Google Cloud’s Dataflow to more developers, for more use cases and larger data processing workloads, while keeping costs low. They are part of our goal to democratize the power of big data, real-time streaming, and ML/AI for all developers, everywhere.

The three big Dataflow releases we’re thrilled to announce as generally available are:

Dataflow Prime - Dataflow Prime takes the serverless, no-operations benefits of Dataflow to a new level. It automatically applies both horizontal autoscaling (more machines) and vertical autoscaling (larger machines with more memory) to your streaming data processing workloads, with batch support coming in the near future. With Dataflow Prime, pipelines are more efficient, enabling you to apply insights in real time.
Dataflow Go - Dataflow Go provides native support for Go, a rapidly growing programming language thanks to its flexibility, ease of use and differentiated concepts, for both batch and streaming data processing workloads. With Apache Beam’s unique multi-language model, Dataflow Go pipelines can use the widely adopted, high-performance Java I/O connectors, with ML transforms and I/O connectors from Python coming soon.
Dataflow ML - Speaking of ML transforms, Dataflow has now added out-of-the-box support for running PyTorch and scikit-learn models directly within the pipeline. The new RunInference transform lets models be used in production pipelines with very little code. These features are in addition to Dataflow's existing ML capabilities, such as GPU support and the pre- and post-processing system for ML training, either directly or via frameworks such as TensorFlow Extended (TFX).
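
As a rough illustration of how a pipeline might opt in to Dataflow Prime, the sketch below assembles command-line style pipeline options in Python. It assumes Prime is enabled through the `enable_prime` Dataflow service option. The helper name `dataflow_prime_args` and the project, region, and bucket values are hypothetical placeholders, not part of the announcement, so check the Dataflow Prime documentation for the options that apply to your SDK version.

```python
from typing import List, Optional


def dataflow_prime_args(
    project: str,
    region: str,
    temp_location: str,
    streaming: bool = True,
    extra_service_options: Optional[List[str]] = None,
) -> List[str]:
    """Build command-line style options for a Dataflow Prime job.

    Hypothetical helper for illustration only. It assumes Prime is
    switched on with the `enable_prime` Dataflow service option.
    """
    # Each service option is passed as its own repeated flag.
    service_options = ["enable_prime"] + list(extra_service_options or [])
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]
    args += [f"--dataflow_service_options={opt}" for opt in service_options]
    if streaming:
        # The announcement covers streaming workloads first.
        args.append("--streaming")
    return args


if __name__ == "__main__":
    # Placeholder values; replace with your own project settings.
    for arg in dataflow_prime_args("my-project", "us-central1", "gs://my-bucket/tmp"):
        print(arg)
```

With the Apache Beam Python SDK installed, a list like this could be handed to `PipelineOptions` from `apache_beam.options.pipeline_options` when constructing the pipeline. The pipeline code itself would not need to change.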

We’re so excited to make Dataflow even better. With the world’s only truly unified batch and streaming data processing model provided by Apache Beam, the wide support for ML frameworks, and the unique cross-language capabilities of the Beam model, Dataflow is becoming ever easier, faster, and more accessible for all data processing needs.

Getting started

To get started with Dataflow Go easily, see the Quickstart and download the Go SDK.
To learn more about Dataflow Prime, see the documentation.
To learn more about Dataflow ML and RunInference, read about the new RunInference Beam transform on the Apache Beam website.

Interested in running a proof of concept using your own data? Talk to your Google Cloud sales contact for hands-on workshop opportunities or sign up here.
