
Dataflow

Unified stream and batch data processing that's serverless, fast, and cost-effective.

New customers get $300 in free credits to spend on Dataflow.

  • Real-time insights and activation with data streaming and machine learning

  • Fully managed data processing service

  • Automated provisioning and management of processing resources

  • Horizontal and vertical autoscaling of worker resources to maximize resource utilization

  • OSS community-driven innovation with Apache Beam SDK

Benefits

Streaming data analytics with speed

Dataflow enables fast, simplified streaming data pipeline development with lower data latency.

Simplify operations and management

Allow teams to focus on programming instead of managing server clusters as Dataflow’s serverless approach removes operational overhead from data engineering workloads.

Reduce total cost of ownership

Resource autoscaling paired with cost-optimized batch processing capabilities means Dataflow offers virtually limitless capacity to manage your seasonal and spiky workloads without overspending.

Key features


Ready-to-use real-time AI

Dataflow’s real-time AI capabilities, enabled through out-of-the-box ML features such as NVIDIA GPU support and ready-to-use patterns, allow near real-time reactions to large torrents of events.

Customers can build intelligent solutions ranging from predictive analytics and anomaly detection to real-time personalization and other advanced analytics use cases.

Train, deploy, and manage complete machine learning (ML) pipelines, including local and remote inference with batch and streaming pipelines. 

Autoscaling of resources and dynamic work rebalancing

Minimize pipeline latency, maximize resource utilization, and reduce processing cost per data record with data-aware resource autoscaling. Data inputs are partitioned automatically and constantly rebalanced to even out worker resource utilization and reduce the effect of “hot keys” on pipeline performance.

Monitoring and observability

Observe the data at each step of a Dataflow pipeline. Diagnose problems and troubleshoot effectively with samples of actual data. Compare different runs of the job to identify problems easily.


Documentation


Tutorial

Serverless Data Processing with Dataflow: Foundations

Foundational training on everything you need to know about Dataflow.
Tutorial

Dataflow quickstart using Python

Set up your Google Cloud project and Python development environment, install the Apache Beam Python SDK, and run and modify the WordCount example on the Dataflow service.
Tutorial

Using Dataflow SQL

Create a SQL query and deploy a Dataflow job to run your query from the Dataflow SQL UI.
Tutorial

Installing the Apache Beam SDK

Install the Apache Beam SDK so that you can run your pipelines on the Dataflow service.
Tutorial

Machine learning with Apache Beam and TensorFlow

Preprocess, train, and make predictions on a molecular energy machine learning model, using Apache Beam, Dataflow, and TensorFlow.
Tutorial

Dataflow word count tutorial using Java

Learn the basics of the Dataflow service by running a simple example pipeline using the Apache Beam Java SDK.
Tutorial

Hands-on labs: Processing Data with Google Cloud Dataflow

Learn how to process a real-time, text-based dataset using Python and Dataflow, then store it in BigQuery.