Dataflow release notes

This page documents production updates to the Dataflow service. You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/dataflow-release-notes.xml

March 15, 2024

You can now use worker utilization hints to tune horizontal autoscaling for streaming pipelines.
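
For example, a minimal sketch with the Apache Beam Python SDK, assuming the worker_utilization_hint service option and a hypothetical target of 70% worker CPU utilization:

    # Sketch: pass the utilization hint as a Dataflow service option when
    # building pipeline options (project and region values are placeholders).
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        streaming=True,
        dataflow_service_options=["worker_utilization_hint=0.7"],
    )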

Added new autoscaling metrics:

  • Autoscaling rationale chart: explains the factors driving autoscaling decisions
  • Worker CPU utilization chart: shows the current user worker CPU utilization and the customer-configured autoscaling hint value
  • Timer backlog per stage: shows an estimate of time needed to materialize the output for windows whose timer has expired
  • Parallel processing: the number of keys available for parallel processing

March 11, 2024

You can now use committed use discounts (CUDs) with Dataflow streaming jobs. Committed use discounts provide discounted prices in exchange for your commitment to continuously use a certain amount of Dataflow compute resources for a year or longer.

March 08, 2024

Streaming jobs created after March 7, 2024 automatically encrypt all user data with customer-managed encryption keys (CMEK). To enable this encryption for jobs created before March 7, 2024, drain or cancel the job, and then restart it.

February 27, 2024

Dataflow now supports at-least-once streaming mode. You can use this mode to achieve lower latency and reduced costs for workloads that can tolerate duplicate records. This feature is generally available (GA). For more information, see Set the pipeline streaming mode.
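
For example, a minimal sketch with the Apache Beam Python SDK, assuming the streaming_mode_at_least_once service option:

    # Sketch: equivalent to passing the flags on the command line; the
    # option name is assumed from the streaming-mode documentation.
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--streaming",
        "--dataflow_service_options=streaming_mode_at_least_once",
    ])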

February 21, 2024

You can now use Gemma models in your Apache Beam inference pipelines. For more information, see Use Gemma open models with Dataflow.

February 15, 2024

You can now use a turnkey transform to enrich streaming data in your Dataflow pipeline. When you enrich data, you augment the raw data from one source by adding related data from a second source. For more information, see Enrich streaming data.
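
A minimal sketch of the idea with the Apache Beam Python SDK, assuming the Enrichment transform and its Bigtable handler available in recent Beam releases; the project, instance, table, and field names are hypothetical:

    import apache_beam as beam
    from apache_beam.transforms.enrichment import Enrichment
    from apache_beam.transforms.enrichment_handlers.bigtable import (
        BigTableEnrichmentHandler,
    )

    # Sketch: look up each element's product_id in a Bigtable table and
    # append the matching row's fields to the element.
    handler = BigTableEnrichmentHandler(
        project_id="my-project",
        instance_id="my-instance",
        table_id="product-catalog",
        row_key="product_id",
    )

    with beam.Pipeline() as p:
        (
            p
            | beam.Create([beam.Row(product_id="sku-1", quantity=2)])
            | Enrichment(handler)
            | beam.Map(print)
        )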

February 12, 2024

Dataflow Streaming Engine now supports resource-based billing. When you enable resource-based billing with Streaming Engine, you're billed for the total resources consumed by your job.

February 05, 2024

A weekly digest of client library updates from across the Cloud SDK.

Go

Changes for dataflow/apiv1beta3

0.9.5 (2024-01-30)

Bug Fixes
  • dataflow: Enable universe domain resolution options (fd1d569)

January 31, 2024

Dataflow is available in Johannesburg, South Africa (africa-south1).

December 18, 2023

Dataflow now supports data sampling for pipeline exceptions. With this feature, you can see samples of the data being processed when an unhandled exception occurs. Use exception sampling to help troubleshoot pipeline errors. For more information, see Use exception sampling.

December 12, 2023

You can now run a job graph validation check to verify whether a replacement job is valid before you launch the new job. For more information, see Validate a replacement job.

December 06, 2023

You can now archive completed Dataflow jobs. When you archive a Dataflow job, the job is moved from the Dataflow Jobs page in the console to the Archived jobs page. For more information, see Archive Dataflow jobs.

December 05, 2023

The Dataflow web-based monitoring interface now includes a dashboard that monitors your Dataflow jobs at the project level. For more information, see Dataflow project monitoring dashboard.

November 17, 2023

Dataflow supports NVIDIA® L4 and NVIDIA® A100 80 GB GPU types. For more information, see Dataflow support for GPUs.

November 13, 2023

Dataflow jobs now scale to 4,000 worker VMs.

November 06, 2023

A weekly digest of client library updates from across the Cloud SDK.

Go

Changes for dataflow/apiv1beta3

0.9.4 (2023-11-01)

Bug Fixes
  • dataflow: Bump google.golang.org/api to v0.149.0 (8d2ab9f)

October 30, 2023

A weekly digest of client library updates from across the Cloud SDK.

Go

Changes for dataflow/apiv1beta3

0.9.3 (2023-10-26)

Bug Fixes
  • dataflow: Update grpc-go to v1.59.0 (81a97b0)

October 23, 2023

The Cloud Spanner to BigQuery template for batch pipelines is available in preview.

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.8.5 (2023-10-09)

Documentation

October 16, 2023

A weekly digest of client library updates from across the Cloud SDK.

Go

Changes for dataflow/apiv1beta3

0.9.2 (2023-10-12)

Bug Fixes
  • dataflow: Update golang.org/x/net to v0.17.0 (174da47)

September 19, 2023

Dataflow is now available in Dammam, Saudi Arabia (me-central2).

September 14, 2023

Dataflow now supports the Tau T2A Arm machine series as a worker machine type. This feature is generally available (GA). For more information, see Use Arm VMs on Dataflow.

September 06, 2023

The following Dataflow templates are generally available (GA):

August 22, 2023

Dataflow is available in Berlin (europe-west10).

August 15, 2023

You can now update streaming job options without stopping your job. For more information, see In-flight job option update.

Dataflow cost monitoring is generally available (GA).

July 27, 2023

The following Dataflow templates are generally available (GA):

July 26, 2023

Dynamic thread scaling is generally available (GA). Dynamic thread scaling is a part of Dataflow's suite of vertical scaling features.

July 25, 2023

When you run multiple SDK processes on a shared Dataflow GPU, you can improve GPU efficiency and utilization by enabling the NVIDIA Multi-Process Service (MPS).

July 24, 2023

You can now view streaming stragglers in the Google Cloud console. For more information, see Troubleshoot stragglers in streaming jobs.

July 10, 2023

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.8.4 (2023-07-04)

Bug Fixes
  • Add async context manager return types (#184) (355b8b4)

June 26, 2023

A weekly digest of client library updates from across the Cloud SDK.

Go

Changes for dataflow/apiv1beta3

0.9.1 (2023-06-20)

Bug Fixes
  • dataflow: REST query UpdateMask bug (df52820)

June 13, 2023

Dataflow now supports Confidential VMs for Dataflow worker VMs. For more information, see Dataflow service options.
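
A minimal sketch with the Apache Beam Python SDK, assuming the enable_confidential_compute service option and an AMD-based (N2D) worker machine type:

    # Sketch: request Confidential VM workers; the option name and machine
    # type are assumptions based on the Dataflow service options documentation.
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--machine_type=n2d-standard-2",
        "--dataflow_service_options=enable_confidential_compute",
    ])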

June 05, 2023

A weekly digest of client library updates from across the Cloud SDK.

Go

Changes for dataflow/apiv1beta3

0.9.0 (2023-05-30)

Features
  • dataflow: Update all direct dependencies (b340d03)

May 31, 2023

Data sampling is now generally available (GA). Data sampling lets you observe the data at each step of a pipeline. For more information, see Use data sampling to observe pipeline data.

May 15, 2023

A weekly digest of client library updates from across the Cloud SDK.

Go

Changes for dataflow/apiv1beta3

0.8.1 (2023-05-08)

Bug Fixes
  • dataflow: Update grpc to v1.55.0 (1147ce0)

April 21, 2023

Dataflow ML now supports the Automatic Model Refresh feature, which lets you update your machine learning model without stopping your Apache Beam pipeline.

April 19, 2023

You can now manage Dataflow jobs by using Eventarc. For more information, see Use Eventarc to manage Dataflow jobs.

April 10, 2023

Dataflow cost monitoring is now available in preview.

April 03, 2023

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.8.3 (2023-03-23)

Documentation
  • Fix formatting of request arg in docstring (#177) (22668f6)

March 30, 2023

Dataflow is now available in Doha (me-central1).

March 29, 2023

The Dataflow VM image has been updated to include mitigations for multiple vulnerabilities by upgrading to cos-97-16919-235-30. For the full list of mitigations, see the Container-Optimized OS release notes.

Dataflow jobs started on or after March 29, 2023 will run VM instances that use this image.

March 28, 2023

Vertical Autoscaling now supports batch jobs.

March 23, 2023

Dataflow is now available in Turin (europe-west12).

February 13, 2023

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.8.2 (2023-02-07)

Bug Fixes
  • Raise not implemented error when REST transport is not supported (#170) (44651ca)

January 30, 2023

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.8.1 (2023-01-20)

Bug Fixes
  • Add context manager return types (63d369a)
Documentation
  • Add documentation for enums (63d369a)

January 16, 2023

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.8.0 (2023-01-10)

Features

January 09, 2023

A weekly digest of client library updates from across the Cloud SDK.

Go

Changes for dataflow/apiv1beta3

0.8.0 (2023-01-04)

Features
  • dataflow: Add REST client (06a54a1)

January 03, 2023

Starting in version 2023-01-03-00_RC00, the Google-provided Dataflow templates support ES6 syntax for JavaScript user-defined functions (UDFs). This change is backwards-compatible. ES5 syntax and existing user-defined functions are still supported.

When you run Google-provided templates using the latest version, your jobs are upgraded automatically on restart. To keep running an earlier version of a template, specify version 2022-12-15-00_RC00 or earlier when you run the template.
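
For example, a hedged sketch using the google-cloud-dataflow-client Python library to launch a classic Google-provided template from its versioned Cloud Storage path; the template name and parameters shown are hypothetical:

    from google.cloud import dataflow_v1beta3

    # Sketch: pin the job to the 2022-12-15-00_RC00 template release by
    # pointing gcs_path at the versioned template location.
    client = dataflow_v1beta3.TemplatesServiceClient()
    response = client.launch_template(
        request=dataflow_v1beta3.LaunchTemplateRequest(
            project_id="my-project",
            location="us-central1",
            gcs_path="gs://dataflow-templates/2022-12-15-00_RC00/PubSub_to_BigQuery",
            launch_parameters=dataflow_v1beta3.LaunchTemplateParameters(
                job_name="pinned-template-version-job",
                parameters={
                    "inputTopic": "projects/my-project/topics/events",
                    "outputTableSpec": "my-project:my_dataset.events",
                },
            ),
        )
    )
    print(response.job.id)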

December 27, 2022

Starting with Beam SDK version 2.44.0, Dataflow will not support running Dataflow jobs with workers in a region that is different from the Dataflow regional endpoint.

December 19, 2022

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.7.0 (2022-12-15)

Features
  • Add support for google.cloud.dataflow.__version__ (5f36251)
  • Add typing to proto.Message based class attributes (5f36251)
Bug Fixes
  • Add dict typing for client_options (5f36251)
  • deps: Require google-api-core >=1.34.0, >=2.11.0 (9b9083c)
  • Drop usage of pkg_resources (9b9083c)
  • Fix timeout default values (9b9083c)
Documentation
  • samples: Snippetgen handling of repeated enum field (5f36251)
  • samples: Snippetgen should call await on the operation coroutine before calling result (9b9083c)

December 16, 2022

Dataflow now supports regional placement for workers.

December 15, 2022

The Dataflow VM image has been updated to include mitigations for OpenSSL CVE-2022-3602 by upgrading to cos-97-16919-189-12. For jobs that use GPUs, the NVIDIA drivers have also been updated to mitigate the vulnerability. Dataflow jobs started on or after December 14, 2022 will run VM instances that use this image.

November 14, 2022

A weekly digest of client library updates from across the Cloud SDK.

Node.js

Changes for @google-cloud/dataflow

2.0.1 (2022-11-11)

Bug Fixes
  • Allow passing gax instance to client constructor (#80) (9054e83)
  • Better support for fallback mode (#76) (7b4c304)
  • Change import long to require (#77) (531996b)
  • deps: Use google-gax v3.5.2 (#87) (9f856a5)
  • Do not import the whole google-gax from proto JS (#79) (a0924da)
  • docs: Document fallback rest option (#72) (bb637f7)
  • Preserve default values in x-goog-request-params header (#81) (18e64cc)
  • Regenerated protos JS and TS definitions (#90) (920d3fe)
  • Remove pip install statements (#78) (884ea27)
  • use google-gax v3.3.0 (a0924da)

October 17, 2022

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.6.2 (2022-10-10)

Bug Fixes
  • deps: Allow protobuf 3.19.5 (#150) (216c6e2)
  • deps: require google-api-core>=1.33.2 (216c6e2)

October 10, 2022

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.6.1 (2022-10-03)

Bug Fixes

Java

Changes for google-cloud-dataflow

0.7.6 (2022-10-05)

Bug Fixes
  • update protobuf to v3.21.7 (63bfc0e)

0.7.5 (2022-10-03)

Dependencies
  • Update dependency cachetools to v5 (#243) (b55c975)
  • Update dependency certifi to v2022.9.24 (#222) (7482df0)
  • Update dependency charset-normalizer to v2.1.1 (#226) (2ea7474)
  • Update dependency click to v8.1.3 (#227) (20f0fdc)
  • Update dependency com.google.cloud:google-cloud-shared-dependencies to v3.0.4 (#247) (7010c38)
  • Update dependency gcp-releasetool to v1.8.8 (#223) (3c11024)
  • Update dependency google-api-core to v2.10.1 (#228) (cd149f3)
  • Update dependency google-auth to v2.12.0 (#229) (808298e)
  • Update dependency google-cloud-core to v2.3.2 (#224) (e9c50a8)
  • Update dependency google-cloud-storage to v2.5.0 (#230) (55a25e9)
  • Update dependency googleapis-common-protos to v1.56.4 (#225) (2ccbec5)
  • Update dependency markupsafe to v2.1.1 (#231) (4c6e0a6)
  • Update dependency protobuf to v3.20.2 (#232) (75a739c)
  • Update dependency protobuf to v4 (#244) (b38c19f)
  • Update dependency pyjwt to v2.5.0 (#233) (7f4064b)
  • Update dependency requests to v2.28.1 (#234) (41938f3)
  • Update dependency typing-extensions to v4.3.0 (#235) (8c42354)
  • Update dependency zipp to v3.8.1 (#242) (4b2ebd4)

October 04, 2022

Dataflow is now available in Tel Aviv (me-west1).

The Dataflow VM image has been updated to include several mitigations for a recently disclosed hardware speculative execution vulnerability named Retbleed. Dataflow jobs started on or after September 21, 2022 will run VM instances that use this image.

September 26, 2022

A weekly digest of client library updates from across the Cloud SDK.

Go

Changes for dataflow/apiv1beta3

0.7.0 (2022-09-21)

Features
  • dataflow: rewrite signatures in terms of new types for betas (9f303f9)

0.6.0 (2022-09-19)

Features
  • dataflow: start generating proto message types (563f546)

September 19, 2022

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.6.0 (2022-09-13)

Features

Java

Changes for google-cloud-dataflow

0.7.4 (2022-09-15)

Dependencies
  • Update dependency com.google.cloud:google-cloud-shared-dependencies to v3.0.3 (#216) (54999e7)

September 12, 2022

A weekly digest of client library updates from across the Cloud SDK.

Java

Changes for google-cloud-dataflow

0.7.3 (2022-09-09)

Dependencies
  • Update dependency com.google.cloud:google-cloud-shared-dependencies to v3.0.2 (#211) (20b1918)

August 25, 2022

Dataflow now uses Regional Managed Instance Groups (MIGs). Previously, Dataflow used zonal MIGs.

If this change causes you to exceed your quota, set your Regional managed instance groups quota to the same limit assigned to your Managed instance groups quota. For more information, see Working with quotas.

August 22, 2022

A weekly digest of client library updates from across the Cloud SDK.

Python

Changes for google-cloud-dataflow-client

0.5.5 (2022-08-11)

Bug Fixes
  • deps: allow protobuf < 5.0.0 (#126) (16b89c0)
  • deps: require proto-plus >= 1.22.0 (16b89c0)

August 15, 2022

A weekly digest of client library updates from across the Cloud SDK.

Java

Changes for google-cloud-dataflow

0.7.2 (2022-08-09)

Dependencies
  • update dependency com.google.cloud:google-cloud-shared-dependencies to v3 (#202) (161c011)

July 20, 2022

Dataflow Prime is now in General Availability.

July 11, 2022

You can use the Apache Beam SDK for Go to create batch and streaming Dataflow pipelines. This feature is now in General Availability.

June 07, 2022

Dataflow is now available in Dallas, Texas (us-south1).

May 24, 2022

Dataflow is now available in Columbus (us-east5).

May 13, 2022

Dataflow now supports Flex Template images from private registries. To learn more, see Use an image from a private registry.

May 10, 2022

Dataflow is now available in Madrid (europe-southwest1).

May 03, 2022

Dataflow is now available in Paris (europe-west9).

April 20, 2022

Dataflow is now available in Milan (europe-west8).

April 06, 2022

Dataflow now supports Runner v2 in GA for all languages.

March 17, 2022

March 04, 2022

You can now use the Apache Beam SDK for Go to create batch Dataflow pipelines. This feature is in Preview.

February 16, 2022

Profiling Dataflow pipelines with Cloud Profiler is generally available (GA). Use Dataflow integration with Cloud Profiler to monitor pipeline performance.

January 04, 2022

Dataflow now fully supports using Identity and Access Management (IAM) custom roles. You can create a custom IAM role and assign it to a user-managed service account used in Dataflow instead of assigning the Dataflow Worker role.

November 16, 2021

Dataflow is now available in Santiago (southamerica-west1).

September 21, 2021

Dataflow now uses Zonal DNS for worker resources. This enables Dataflow to offer higher reliability guarantees around Internal DNS registration.

September 07, 2021

Dataflow now supports Shielded VM workers.

August 31, 2021

Dataflow Prime is now available in Preview.

August 03, 2021

Dataflow is now able to use workers, Dataflow Shuffle, Streaming Engine, FlexRS, and regional endpoints in zones in Toronto (northamerica-northeast2).

July 31, 2021

Dataflow now supports storing Flex Template images in Artifact Registry.

July 22, 2021

Dataflow now supports custom containers in GA.

June 30, 2021

GPU support on Dataflow is now in General Availability.

June 29, 2021

Dataflow is now able to use workers, Dataflow Shuffle, Streaming Engine, FlexRS, and regional endpoints in zones in Delhi (asia-south2).

June 28, 2021

Dataflow snapshots are now available in GA.

June 22, 2021

Dataflow is now able to use workers, Dataflow Shuffle, Streaming Engine, FlexRS, and regional endpoints in zones in Melbourne (australia-southeast2).

June 14, 2021

In addition to scalar functions, Dataflow SQL now supports aggregate user-defined functions (UDFs) for Java. For more information, see Dataflow SQL user-defined functions. This feature is in Preview.

June 09, 2021

Dataflow SQL now supports user-defined functions (UDFs) written using Java. For more information, see Dataflow SQL user-defined functions. This feature is in Preview.

May 14, 2021

You can now enable logging of human-readable hot keys. For more information, see the hot key entry in Pipeline options.

May 11, 2021

Dataflow Shuffle is now the default mode for all batch pipelines.

March 24, 2021

Dataflow is now able to use workers, Dataflow Shuffle, Streaming Engine, FlexRS, and regional endpoints in zones in Warsaw (europe-central2).

March 22, 2021

Dataflow SQL now supports user-defined functions (UDFs) written using SQL. For more information, see Dataflow SQL user-defined functions. This feature is in Preview.

March 19, 2021

Execution details are now available in Preview.

February 03, 2021

Dataflow now supports Dataflow Shuffle, Streaming Engine, FlexRS, and the following regional endpoints in GA:

  • asia-east2 (Hong Kong)
  • asia-northeast2 (Osaka, Japan)
  • asia-northeast3 (Seoul)
  • asia-southeast2 (Jakarta)
  • europe-north1 (Finland)
  • us-west3 (Salt Lake City)
  • us-west4 (Las Vegas)

January 29, 2021

Flex templates now support updating streaming jobs and Flexible Resource Scheduling (FlexRS).

Dataflow snapshots are now available in Preview.

January 25, 2021

GPU support on Dataflow is currently available in Preview. To enroll in this Preview offering, contact Support or Sales.

December 11, 2020

Workers now use the Java 11 runtime.

December 10, 2020

Dataflow now supports custom containers as a Preview offering.

November 11, 2020

Dataflow now supports Interactive Notebooks in GA.

November 05, 2020

Dataflow now supports Dataflow Shuffle, Streaming Engine, FlexRS, and the following regional endpoints in GA:

  • us-west2 (Los Angeles)
  • southamerica-east1 (São Paulo)
  • europe-west6 (Zurich)
  • asia-south1 (Mumbai)

Pub/Sub I/O metrics in the Dataflow and Cloud Monitoring UIs may be unavailable for Dataflow jobs using Streaming Engine.

October 30, 2020

Dataflow Runner v2 is now the default runner for Python streaming pipelines using SDK 2.21.0 and above.

September 30, 2020

Dataflow now supports Flex Templates in GA.

September 29, 2020

You can now use a network tags parameter to add network tags to all worker VMs that execute a particular Dataflow job.

July 27, 2020

Dataflow now supports Dataflow Shuffle, Streaming Engine, FlexRS, and the following regional endpoints in GA:

  • northamerica-northeast1 (Montréal)
  • asia-southeast1 (Singapore)
  • australia-southeast1 (Sydney)

June 08, 2020

Dataflow is now able to use workers in zones in the asia-southeast2 region (Jakarta).

April 20, 2020

Dataflow is now able to use workers in zones in the us-west4 region (Las Vegas).

April 15, 2020

Cloud Dataflow SQL is now generally available. You can now run parameterized queries from the Dataflow SQL UI.

April 09, 2020

Dataflow now provides beta support for Flex Templates.

Dataflow now provides beta support for Interactive Notebooks.

April 07, 2020

Dataflow now supports Dataflow Shuffle, Streaming Engine, FlexRS, and the following regional endpoints in GA:

  • us-east4 (Northern Virginia)
  • europe-west2 (London)
  • europe-west3 (Frankfurt)

March 03, 2020

Cloud Dataflow SQL is now available in beta. You can now do the following in Cloud Dataflow SQL:

  • Write data to two destinations, including Cloud Pub/Sub
  • Specify how to load data into a BigQuery table
  • Set pipeline options in the Cloud Dataflow SQL UI

February 24, 2020

Using Cloud Dataflow with Cloud Key Management Service to create a customer-managed encryption key (CMEK) is generally available.

Cloud Dataflow is now able to use workers in zones in the us-west3 region (Salt Lake City).

February 04, 2020

The Cloud Dataflow monitoring UI now has enhanced observability features to help with troubleshooting batch and streaming pipelines.

January 24, 2020

Cloud Dataflow is now able to use workers in zones in the asia-northeast3 region (Seoul).

November 18, 2019

Flexible Resource Scheduling (FlexRS) in Cloud Dataflow is generally available. The service is available in five additional regions:

  • us-east1 (South Carolina)
  • us-west1 (Oregon)
  • asia-east1 (Taiwan)
  • asia-northeast1 (Tokyo)
  • europe-west4 (Netherlands)

You can now do the following in Cloud Dataflow SQL:

  • Use Cloud Storage filesets as a data source
  • Assign schemas to data sources in the Cloud Dataflow SQL UI
  • Preview the content of Cloud Pub/Sub messages from the Cloud Dataflow SQL UI

October 31, 2019

Cloud Dataflow Shuffle and Streaming Engine are now available in us-east1 (South Carolina).

October 25, 2019

You can now see audit logs of Cloud KMS key operations and protect Cloud Dataflow Shuffle state using a customer-managed encryption key.

October 08, 2019

Python streaming for Apache Beam SDK 2.16 or higher is generally available.

Python 3 support for Apache Beam SDK 2.16.0 or higher is now generally available. This feature provides support for using Python 3.5, 3.6, and 3.7. You can run any existing Python 2.7 batch and streaming pipelines that use DirectRunner or DataflowRunner. However, you might need to make changes to ensure that your pipeline code is compatible with Python 3. Keyword-only arguments (a syntactic construct introduced in Python 3) are not yet supported by the Apache Beam SDK. For the current status and a summary of recent Python 3-specific improvements, follow updates on the Apache Beam issue tracker.

October 07, 2019

Cloud Dataflow Shuffle and Streaming Engine are now available in two additional regions:

  • us-west1 (Oregon)
  • asia-east1 (Taiwan)

September 03, 2019

Automatic hot key detection is now enabled in batch pipelines for Apache Beam SDK 2.15.0 or higher.

August 09, 2019

Cloud Dataflow integration with VPC Service Controls is generally available.

August 02, 2019

Using Cloud Dataflow with Cloud Key Management Service is now available in beta. Customer-managed encryption keys (CMEK) allow for encryption of your pipeline state. This feature is limited to Persistent Disks attached to Cloud Dataflow workers and used for Persistent Disk-based shuffle and streaming state storage.

August 01, 2019

Python 3 support for Apache Beam SDK 2.14.0 or higher is now in beta. This feature provides support for using Python 3.5, 3.6, and 3.7. You can run any existing Python 2.7 batch and streaming pipelines that use DirectRunner or DataflowRunner. However, you might need to make changes to ensure that your pipeline code is compatible with Python 3. Some syntactic constructs introduced in Python 3 are not yet fully supported by the Apache Beam SDK. For details and current status, follow updates on the Apache Beam issue tracker.

May 16, 2019

Cloud Dataflow SQL is now publicly available in alpha. Cloud Dataflow SQL lets you use SQL queries to develop and run Cloud Dataflow jobs from the BigQuery web UI.

April 18, 2019

Cloud Dataflow is now able to use workers in zones in the asia-northeast2 region (Osaka, Japan).

April 10, 2019

Cloud Dataflow Streaming Engine is generally available. The service is available in two additional regions:

  • asia-northeast1 (Tokyo)
  • europe-west4 (Netherlands)

Note that Streaming Engine requires the Apache Beam SDK for Java, versions 2.10.0 or higher.

Cloud Dataflow Shuffle is now available in two additional regions:

  • asia-northeast1 (Tokyo)
  • europe-west4 (Netherlands)

Cloud Dataflow provides beta support for Flexible Resource Scheduling (FlexRS) in the us-central1 and europe-west1 regions.

Streaming autoscaling is generally available for pipelines that use Streaming Engine.

April 08, 2019

Apache Beam SDK for Python can only use BigQuery resources in the following regions:

  • Regional locations: us-west2, us-east4, europe-north1, europe-west2, europe-west6.
  • Multi-regional locations: EU and US.

Cloud Dataflow provides beta support for Flexible Resource Scheduling (FlexRS) in the us-central1 and europe-west1 regions.

April 01, 2019

Cloud Dataflow provides beta support for VPC Service Controls.

March 24, 2019

The following SDK versions will be decommissioned later in 2019 due to the discontinuation of support for JSON-RPC and Global HTTP Batch Endpoints. Note that this change overrides the December 17, 2018 release note, which stated that decommissioning was expected to happen in March 2019.

  • Apache Beam SDK for Java, versions 2.0.0 to 2.4.0 (inclusive)
  • Apache Beam SDK for Python, versions 2.0.0 to 2.4.0 (inclusive)
  • Cloud Dataflow SDK for Java, versions 2.0.0 to 2.4.0 (inclusive)
  • Cloud Dataflow SDK for Python, 2.0.0 to 2.4.0 (inclusive)

See the SDK version support status page for detailed SDK support status.

March 20, 2019

Apache Beam SDK 2.4.0 and Cloud Dataflow SDK 2.4.0 are now deprecated. For detailed support status information, see the SDK version support status table.

March 11, 2019

Cloud Dataflow is now able to use workers in zones in the europe-west6 region (Zürich, Switzerland).

March 06, 2019

Apache Beam SDK 2.10.0 depends on gcsio client library version 1.9.13, which has known issues.

To work around these issues, either upgrade to Apache Beam SDK 2.11.0, or override the gcsio client library version to 1.9.16 or later.

February 25, 2019

You can now view system latency and data freshness metrics for your pipeline in the Cloud Dataflow monitoring interface.

February 20, 2019

Apache Beam SDK 2.10.0 contains fixes for the known issues disclosed on December 20, 2018, and February 4, 2019.

February 04, 2019

In a specific case, users of Apache Beam Java SDKs (2.9.0 and earlier) and Cloud Dataflow Java SDKs (2.5.0 and earlier) might experience data duplication when reading files from Cloud Storage. Duplication might occur when all of the following conditions are true:

  • You are reading files with the content-encoding set to gzip, and the files are dynamically decompressed by Cloud Storage (decompressive transcoding).

  • The file size (decompressed) is larger than 2.14 GB.

  • The input stream runs into an error (and is recreated) after 2.14 GB is read.

As a workaround, do not set the content-encoding header, and store compressed files in Cloud Storage with the proper extension (for example, .gz for gzip). For existing files, you can update the content-encoding header and the file name with the gsutil tool.
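
An equivalent sketch of the workaround with the google-cloud-storage Python client (bucket and object names are hypothetical):

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-bucket")
    blob = bucket.get_blob("data/events.json")

    # Clear the content-encoding metadata so Cloud Storage no longer applies
    # decompressive transcoding, then rename the object so the .gz extension
    # conveys the compression.
    blob.content_encoding = None
    blob.patch()
    bucket.rename_blob(blob, "data/events.json.gz")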

December 20, 2018

Streaming Engine users should not upgrade to SDK 2.9.0 due to a known issue. If you choose to use SDK 2.9.0, you must also set the enable_conscrypt_security_provider experimental flag to enable conscrypt, which has known stability issues.

December 17, 2018

The following decommission notice has been changed. For more information, see the release note for March 24, 2019.

The following SDK versions will be decommissioned on March 25, 2019, due to the discontinuation of support for JSON-RPC and Global HTTP Batch Endpoints. Shortly after this date, you will no longer be able to submit new Cloud Dataflow jobs or update running Cloud Dataflow jobs that use the decommissioned SDKs. In addition, existing streaming jobs that use these SDK versions might fail.

  • Apache Beam SDK for Java, versions 2.0.0 to 2.4.0 (inclusive)
  • Apache Beam SDK for Python, versions 2.0.0 to 2.4.0 (inclusive)
  • Cloud Dataflow SDK for Java, versions 2.0.0 to 2.4.0 (inclusive)
  • Cloud Dataflow SDK for Python, versions 2.0.0 to 2.4.0 (inclusive)

See the SDK version support status page for detailed SDK support status.

October 22, 2018

Cloud Dataflow is now able to use workers in zones in the asia-east2 region (Hong Kong).

October 16, 2018

Cloud Dataflow SDK 1.x for Java is unsupported as of October 16, 2018. In the near future, the Cloud Dataflow service will reject new Cloud Dataflow jobs that are based on Cloud Dataflow SDK 1.x for Java. See Migrating from Cloud Dataflow SDK 1.x for Java for migration guidance.

October 03, 2018

Cloud Dataflow now has a Public IP parameter that allows you to turn off public IP addresses for your worker nodes.

July 16, 2018

Cloud Dataflow Shuffle is now generally available.

July 10, 2018

Cloud Dataflow is now able to use workers in zones in the us-west2 region (Los Angeles).

June 14, 2018

Streaming Engine is now publicly available in beta. Streaming Engine moves streaming pipeline execution out of the worker VMs and into the Cloud Dataflow service backend.

June 11, 2018

You can now specify a user-managed controller service account when you run your pipeline job.
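
For example, a minimal sketch with the Apache Beam Python SDK (the Java SDK uses the --serviceAccount option); the service account shown is hypothetical:

    # Sketch: run worker VMs as a user-managed controller service account.
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--project=my-project",
        "--service_account_email=dataflow-controller@my-project.iam.gserviceaccount.com",
    ])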

Cloud Dataflow is now able to use workers in zones in the europe-north1 region (Finland).

April 26, 2018

You can now view side input metrics for your pipeline from the Cloud Dataflow monitoring interface.

February 21, 2018

Cloud Dataflow now supports the following regional endpoints in GA: us-central1, us-east1, europe-west1, asia-east1, and asia-northeast1.

January 10, 2018

Cloud Dataflow is now able to use workers in zones in the northamerica-northeast1 region (Montréal).

Cloud Dataflow is now able to use workers in zones in the europe-west4 region (Netherlands).

October 31, 2017

Cloud Dataflow is now able to use workers in zones in the asia-south1 region (Mumbai).

October 30, 2017

Cloud Dataflow Shuffle is now available in the europe-west1 region.

Cloud Dataflow Shuffle is now available for pipelines using the Apache Beam SDK for Python version 2.1 or later.

October 25, 2017

October 12, 2017

Fixed the known issue disclosed on October 2, 2017.

October 02, 2017

In Cloud Dataflow 2.x pipelines, when the output of a PTransform is consumed by a Flatten and at least one other PTransform, the result is a malformed graph that leaves the other PTransforms without input.

September 20, 2017

Cloud Dataflow provides beta support for regional endpoints us-central1 and europe-west1.

September 05, 2017

Cloud Dataflow is now able to use workers in zones in the southamerica-east1 region (São Paulo).

August 01, 2017

Cloud Dataflow is now able to use workers in zones in the europe-west3 region (Frankfurt).

July 20, 2017

You can now access the Stackdriver error report for your pipeline directly from the Dataflow monitoring interface.

June 20, 2017

Cloud Dataflow is now able to use workers in zones in the australia-southeast1 region (Sydney).

June 06, 2017

Cloud Dataflow is now able to use workers in zones in the europe-west2 region (London).

April 25, 2017

Per-step worker logs are now accessible directly in the Cloud Dataflow UI. Consult the documentation for more information.

April 11, 2017

The Cloud Dataflow service will now automatically shut down a streaming job if all steps have reached the maximum watermark. This only affects pipelines in which every source produces only bounded input; for example, streaming pipelines that read from Cloud Pub/Sub are not affected.

April 03, 2017

Improved graph layout in the Cloud Dataflow UI.

September 29, 2016

Autoscaling for streaming pipelines is now publicly available in beta for use with select sources and sinks. See the autoscaling documentation for more details.

September 15, 2016

The default autoscaling ceiling for batch pipelines using the Cloud Dataflow SDK for Java 1.6 or newer has been raised to 10 worker VMs. You can specify an alternate ceiling using the --maxNumWorkers pipeline option. See the autoscaling documentation for more details.

August 18, 2016

Autoscaling for batch pipelines using the Cloud Dataflow SDK for Java 1.6 or higher is now enabled by default. This change will be rolled out to projects over the next several days. By default, the Cloud Dataflow service will cap the dynamic number of workers to a ceiling of 5 worker VMs. The default autoscaling ceiling may be raised in future service releases. You can specify an alternate ceiling using the --maxNumWorkers pipeline option. See the autoscaling documentation for more details.

July 27, 2016

Announced beta support for the 0.4.0 release of the Cloud Dataflow SDK for Python. Get started and run your pipeline remotely on the service.

Default disk size for pipelines in streaming mode is now 420 GB. This change will be rolled out to projects over the next several days.

March 14, 2016

Scalability and performance improvements available when using Cloud Dataflow SDK for Java version 1.5.0:

  • The service now scales to tens of thousands of initial splits when reading from a BoundedSource. This includes TextIO.Read, AvroIO.Read, and BigtableIO.Read, among others.
  • The service will now use Avro instead of JSON as a BigQuery export format for BigQueryIO.Read. This change greatly increases the efficiency and performance when reading from BigQuery.

January 29, 2016

Changes to the runtime environment for streaming jobs:

  • Files uploaded with --filesToStage were previously downloaded to /dataflow/packages on the workers. With the latest service release, files are now located in /var/opt/google/dataflow. This change was a cleanup intended to better follow standard Linux path conventions.

January 19, 2016

Changes to the runtime environment for batch jobs:

  • Files uploaded with --filesToStage were previously downloaded to /dataflow/packages on the workers. With the latest service release, files are now located in /var/opt/google/dataflow. This change was a cleanup intended to better follow standard Linux path conventions.

November 13, 2015

Usability improvements in the Monitoring UI:

  • The Job Log tab has been renamed Logs.
  • The View Log button has moved into the Logs tab and has been renamed Worker Logs.

Performance and stability improvements for Streaming pipelines:

  • Addressed a condition that caused a slowly-growing memory usage in streaming workers.
  • Large Window buffers no longer need to fit entirely in memory at once.
  • Improved disk assignment to avoid data locality hotspots.
  • Worker logging is now optimized to avoid filling up the local disk.

August 12, 2015

The Cloud Dataflow Service is now generally available.

August 06, 2015

Monitoring changes:

  • Added JOB_STATE_CANCELLED as a possible state value for Cloud Dataflow jobs in the Monitoring UI and command-line interface. This state appears when the user cancels a job.
  • Temporarily, as part of the above job state introduction, jobs may show different job states in the list view relative to the single-job view.
  • Added a Compute Engine core-hour count field to the monitoring UI and enabled core-hour counting for bounded jobs (the field is populated with "-" for unbounded jobs).

Performance improvements to the unbounded runner.

July 28, 2015

Added a check during job creation to ensure that active job names are unique within each project. You can no longer create a new job with the same name as an active job. Active jobs that already share a name are not affected by this change.

April 23, 2015

Improvements to the monitoring UI. Clicking View Log for a stage now defaults to display the logs generated by user code on the worker machines.

April 16, 2015

The Cloud Dataflow Service is now in beta.

Improvements to the monitoring UI: The job details page now provides more job information, including job duration and job type. For streaming pipelines, it also provides the data watermark.

April 13, 2015

A command-line interface for Cloud Dataflow is now available in gcloud alpha.

The default disk size for batch jobs is 250 GB.

April 09, 2015

Improvements to the monitoring UI: Improved organization of pipeline visualization.

The default VM type for batch jobs is now n1-standard-1.

Improved resource teardown operations on job completion and cancellations.

Performance improvements for the service.

April 03, 2015

Improvements to the monitoring UI: The list of jobs now includes name, type, start time, and job ID.

March 27, 2015

Improved mechanisms for elastic scaling of compute resources. Batch pipelines can now grow and shrink the worker pool size at different stages of execution.

March 20, 2015

Monitoring changes:

  • Jobs summary page now shows the status of the current job.
  • Performance improvements to the UI.

March 06, 2015

Workers now use the Java 8 runtime.

March 01, 2015

Dynamic work rebalancing is now available.

Streaming support is now enabled for all projects participating in the alpha.