To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/cloud-dataflow-release-notes.xml
November 18, 2019
Flexible Resource Scheduling (FlexRS) in Cloud Dataflow is generally available. The service is available in five additional regions:
- us-east1 (South Carolina)
- us-west1 (Oregon)
- asia-east1 (Taiwan)
- asia-northeast1 (Tokyo)
- europe-west4 (Netherlands)
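A minimal sketch of opting a batch job into FlexRS from the Apache Beam SDK for Python follows; the flexrs_goal pipeline option and the project, bucket, and file paths are assumptions, not part of this note:

```python
# Sketch only: submit a batch job with FlexRS cost-optimized scheduling.
# The flexrs_goal option and all resource names below are assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                  # hypothetical project ID
    region="us-east1",                     # one of the FlexRS regions above
    temp_location="gs://my-bucket/temp",   # hypothetical bucket
    flexrs_goal="COST_OPTIMIZED",          # request delayed, cost-optimized scheduling
)

with beam.Pipeline(options=options) as p:
    (p
     | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")
     | "Count" >> beam.combiners.Count.Globally()
     | "Write" >> beam.io.WriteToText("gs://my-bucket/output/count"))
```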
You can now do the following in Cloud Dataflow SQL:
- Use Cloud Storage filesets as a data source
- Assign schemas to data sources in the Cloud Dataflow SQL UI
- Preview the content of Cloud Pub/Sub messages from the Cloud Dataflow SQL UI
October 31, 2019
Cloud Dataflow Shuffle and Streaming Engine are now available in us-east1 (South Carolina).
October 25, 2019
You can now see audit logs of Cloud KMS key operations and protect Cloud Dataflow Shuffle state using a customer-managed encryption key.
October 08, 2019
Python streaming for Apache Beam SDK 2.16 or higher is generally available. You can now do the following in Python:
- Update and Drain streaming pipelines.
- Enable streaming autoscaling.
- Use Streaming Engine.
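A minimal sketch of a Python streaming pipeline that uses streaming autoscaling and Streaming Engine follows; the enable_streaming_engine, autoscaling_algorithm, and max_num_workers options and the project, topic, and bucket names are assumptions:

```python
# Sketch only: a Python streaming pipeline with streaming autoscaling and
# Streaming Engine enabled. Option values and resource names are assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    streaming=True,
    enable_streaming_engine=True,              # move shuffle/state into the service backend
    autoscaling_algorithm="THROUGHPUT_BASED",  # enable streaming autoscaling
    max_num_workers=10,                        # autoscaling ceiling
)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/my-input")
     | "Uppercase" >> beam.Map(lambda msg: msg.decode("utf-8").upper().encode("utf-8"))
     | "WritePubSub" >> beam.io.WriteToPubSub(topic="projects/my-project/topics/my-output"))
```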
Python 3 support for Apache Beam SDK 2.16.0 or higher is now generally available. This feature provides support for using Python 3.5, 3.6, and 3.7. You can run any existing Python 2.7 batch and streaming pipelines that use DirectRunner or DataflowRunner. However, you might need to make changes to ensure that your pipeline code is compatible with Python 3. Keyword-only arguments (a syntactic construct introduced in Python 3) are not yet supported by the Apache Beam SDK. For the current status and a summary of recent Python 3-specific improvements, follow updates on the Apache Beam issue tracker.
October 07, 2019
Cloud Dataflow Shuffle and Streaming Engine are now available in two additional regions:
- us-west1 (Oregon)
- asia-east1 (Taiwan)
September 03, 2019
Automatic hot key detection is now enabled in batch pipelines for Apache Beam SDK 2.15.0 or higher.
August 09, 2019
Cloud Dataflow integration with VPC Service Controls is generally available.
August 02, 2019
Using Cloud Dataflow with Cloud Key Management Service is now available in beta. Customer-managed encryption keys (CMEK) allow for encryption of your pipeline state. This feature is limited to Persistent Disks attached to Cloud Dataflow workers and used for Persistent Disk-based shuffle and streaming state storage.
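A minimal sketch of supplying a customer-managed key to a pipeline follows, assuming the dataflow_kms_key pipeline option in the Apache Beam SDK for Python; the project, bucket, and key names are hypothetical:

```python
# Sketch only: protect pipeline state with a customer-managed encryption key.
# The dataflow_kms_key option and all resource names are assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    dataflow_kms_key=(
        "projects/my-project/locations/us-central1/"
        "keyRings/my-key-ring/cryptoKeys/my-key"   # hypothetical Cloud KMS key
    ),
)

with beam.Pipeline(options=options) as p:
    _ = (p
         | beam.Create(["a", "b", "c"])
         | beam.combiners.Count.Globally()
         | beam.io.WriteToText("gs://my-bucket/output/count"))
```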
August 01, 2019
Python 3 support for Apache Beam SDK 2.14.0 or higher is now in beta. This feature provides support for using Python 3.5, 3.6, and 3.7. You can run any existing Python 2.7 batch and streaming pipelines that use DirectRunner or DataflowRunner. However, you might need to make changes to ensure that your pipeline code is compatible with Python 3. Some syntactic constructs introduced in Python 3 are not yet fully supported by the Apache Beam SDK. For details and current status, follow updates on the Apache Beam issue tracker.
May 16, 2019
Cloud Dataflow SQL is now publicly available in alpha. Cloud Dataflow SQL lets you use SQL queries to develop and run Cloud Dataflow jobs from the BigQuery web UI.
April 18, 2019
Cloud Dataflow is now able to use workers in zones in the asia-northeast2 region (Osaka, Japan).
April 10, 2019
Cloud Dataflow Streaming Engine is generally available. The service is available in two additional regions:
- asia-northeast1 (Tokyo)
- europe-west4 (Netherlands)
Note that Streaming Engine requires the Apache Beam SDK for Java, versions 2.10.0 or higher.
Cloud Dataflow Shuffle is now available in two additional regions:
- asia-northeast1 (Tokyo)
- europe-west4 (Netherlands)
Cloud Dataflow provides beta support for Flexible Resource Scheduling (FlexRS) in the us-central1 and europe-west1 regions.
Streaming autoscaling is generally available for pipelines that use Streaming Engine.
April 08, 2019
Apache Beam SDK for Python can only use BigQuery resources in the following regions:
- Regional locations: us-west2, us-east4, europe-north1, europe-west2, europe-west6
- Multi-regional locations: EU and US
Cloud Dataflow provides beta support for Flexible Resource Scheduling (FlexRS) in the us-central1 and europe-west1 regions.
April 01, 2019
Cloud Dataflow provides beta support for VPC Service Controls.
March 24, 2019
The following SDK versions will be decommissioned later in 2019 due to the discontinuation of support for JSON-RPC and Global HTTP Batch Endpoints. Note that this change overrides the release note from December 17, which stated that decommissioning was expected to happen in March 2019.
- Apache Beam SDK for Java, versions 2.0.0 to 2.4.0 (inclusive)
- Apache Beam SDK for Python, versions 2.0.0 to 2.4.0 (inclusive)
- Cloud Dataflow SDK for Java, versions 2.0.0 to 2.4.0 (inclusive)
- Cloud Dataflow SDK for Python, versions 2.0.0 to 2.4.0 (inclusive)
See the SDK version support status page for detailed SDK support status.
March 20, 2019
Apache Beam SDK 2.4.0 and Cloud Dataflow SDK 2.4.0 are now deprecated. For detailed support status information, see the SDK version support status table.
March 11, 2019
Cloud Dataflow is now able to use workers in zones in the europe-west6 region (Zürich, Switzerland).
March 06, 2019
Apache Beam SDK 2.10.0 depends on gcsio client library version 1.9.13, which has known issues:
- Reading side inputs can result in sending a large number of requests to Cloud Storage. As a result, Cloud Dataflow jobs can fail with HTTP 429 errors from Cloud Storage.
- Apache Beam ParquetIO fails when reading files from Cloud Storage.
To work around these issues, either upgrade to Apache Beam SDK 2.11.0, or override the gcsio client library version to 1.9.16 or later.
February 25, 2019
You can now view system latency and data freshness metrics for your pipeline in the Cloud Dataflow monitoring interface.
February 20, 2019
Apache Beam SDK 2.10.0 contains fixes for the known issues disclosed on December 20, 2018 and February 4, 2019.
February 04, 2019
In a specific case, users of Apache Beam Java SDKs (2.9.0 and earlier) and Cloud Dataflow Java SDKs (2.5.0 and earlier) might experience data duplication when reading files from Cloud Storage. Duplication might occur when all of the following conditions are true:
- You are reading files with the content-encoding header set to gzip, and the files are dynamically decompressed by Cloud Storage (decompressive transcoding).
- The decompressed file size is larger than 2.14 GB.
- The input stream runs into an error (and is recreated) after 2.14 GB is read.
As a workaround, do not set the content-encoding header, and store compressed files in Cloud Storage with the proper extension (for example, .gz for gzip). For existing files, you can update the content-encoding header and file name with the gsutil tool.
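A minimal sketch of that workaround using the google-cloud-storage Python client rather than gsutil follows; the bucket and object names are hypothetical:

```python
# Sketch only: clear the content-encoding metadata so Cloud Storage no longer
# performs decompressive transcoding, and rename the object with a .gz extension
# so the SDK decompresses it itself. Bucket and object names are hypothetical.
from google.cloud import storage

client = storage.Client(project="my-project")
bucket = client.bucket("my-bucket")

blob = bucket.blob("data/large-file")
blob.reload()                  # fetch current metadata
blob.content_encoding = None   # remove the gzip content-encoding header
blob.patch()                   # persist the metadata change

bucket.rename_blob(blob, "data/large-file.gz")  # keep the compression extension
```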
December 20, 2018
Streaming Engine users should not upgrade to SDK 2.9.0 due to a known issue. If you choose to use SDK 2.9.0, you must also set the enable_conscrypt_security_provider experimental flag to enable Conscrypt, which has known stability issues.
December 17, 2018
The following decommission notice has been changed. For more information, see the release note for March 24, 2019.
The following SDK versions will be decommissioned on March 25, 2019 due to the discontinuation of support for JSON-RPC and Global HTTP Batch Endpoints. Shortly after this date, you will no longer be able to submit new Cloud Dataflow jobs or update running Cloud Dataflow jobs that use the decommissioned SDKs. In addition, existing streaming jobs that use these SDK versions might fail.
- Apache Beam SDK for Java, versions 2.0.0 to 2.4.0 (inclusive)
- Apache Beam SDK for Python, versions 2.0.0 to 2.4.0 (inclusive)
- Cloud Dataflow SDK for Java, versions 2.0.0 to 2.4.0 (inclusive)
- Cloud Dataflow SDK for Python, versions 2.0.0 to 2.4.0 (inclusive)
See the SDK version support status page for detailed SDK support status.
October 22, 2018
Cloud Dataflow is now able to use workers in zones in the asia-east2 region (Hong Kong).
October 16, 2018
Cloud Dataflow SDK 1.x for Java is unsupported as of October 16, 2018. In the near future, the Cloud Dataflow service will reject new Cloud Dataflow jobs that are based on Cloud Dataflow SDK 1.x for Java. See Migrating from Cloud Dataflow SDK 1.x for Java for migration guidance.
October 03, 2018
Cloud Dataflow now has a Public IP parameter that allows you to turn off public IP addresses for your worker nodes.
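A minimal sketch of turning off public worker IP addresses follows, assuming the use_public_ips option (--no_use_public_ips on the command line) in the Apache Beam SDK for Python; the project, subnetwork, and bucket names are hypothetical:

```python
# Sketch only: run workers without public IP addresses. The use_public_ips and
# subnetwork options and all resource names are assumptions; the subnetwork must
# allow Private Google Access so workers can reach Google APIs.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    subnetwork="regions/us-central1/subnetworks/my-subnet",
    use_public_ips=False,   # workers get internal IP addresses only
)
```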
July 16, 2018
Cloud Dataflow Shuffle is now generally available.
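A minimal sketch of opting a batch job into Cloud Dataflow Shuffle follows, assuming the shuffle_mode=service experiment flag; the project and bucket names are hypothetical:

```python
# Sketch only: opt a batch job into the Cloud Dataflow Shuffle service.
# The shuffle_mode=service experiment and resource names are assumptions.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    flags=["--experiments=shuffle_mode=service"],  # run shuffle in the service backend
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
)
```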
July 10, 2018
Cloud Dataflow is now able to use workers in zones in the us-west2 region (Los Angeles).
June 14, 2018
Streaming Engine is now publicly available in beta. Streaming Engine moves streaming pipeline execution out of the worker VMs and into the Cloud Dataflow service backend.
June 11, 2018
You can now specify a user-managed controller service account when you run your pipeline job.
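A minimal sketch of specifying a user-managed controller service account follows, assuming the service_account_email pipeline option in the Apache Beam SDK for Python; the account, project, and bucket names are hypothetical:

```python
# Sketch only: run worker VMs as a user-managed controller service account.
# The service_account_email option and all names are assumptions.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="europe-north1",
    temp_location="gs://my-bucket/temp",
    service_account_email="my-dataflow-sa@my-project.iam.gserviceaccount.com",
)
```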
Cloud Dataflow is now able to use workers in zones in the europe-north1 region (Finland).
April 26, 2018
You can now view side input metrics for your pipeline from the Cloud Dataflow monitoring interface.
February 21, 2018
Cloud Dataflow now supports the following regional endpoints in GA: us-central1, us-east1, europe-west1, asia-east1, and asia-northeast1.
January 10, 2018
Cloud Dataflow is now able to use workers in zones in the northamerica-northeast1 region (Montréal).
Cloud Dataflow is now able to use workers in zones in the europe-west4 region (Netherlands).
October 31, 2017
Cloud Dataflow is now able to use workers in zones in the asia-south1 region (Mumbai).
October 30, 2017
Cloud Dataflow Shuffle is now available in the europe-west1 region.
Cloud Dataflow Shuffle is now available for pipelines using the Apache Beam SDK for Python version 2.1 or later.
October 25, 2017
Cloud Dataflow provides beta support for additional Google-provided templates. To get started with templates, follow the quickstart.
October 12, 2017
Fixed the known issue disclosed on October 2, 2017.
October 02, 2017
Known issue: Cloud Dataflow 2.x pipelines in which the output of a PTransform is consumed by a Flatten and at least one other PTransform produce a malformed graph, leaving the other PTransforms without input.
September 20, 2017
Cloud Dataflow provides beta support for the regional endpoints us-central1 and europe-west1.
September 05, 2017
Cloud Dataflow is now able to use workers in zones in the southamerica-east1 region (São Paulo).
August 01, 2017
Cloud Dataflow is now able to use workers in zones in the europe-west3 region (Frankfurt).
July 20, 2017
You can now access the Stackdriver error report for your pipeline directly from the Dataflow monitoring interface.
June 20, 2017
Cloud Dataflow is now able to use workers in zones in the australia-southeast1 region (Sydney).
June 06, 2017
Cloud Dataflow is now able to use workers in zones in the europe-west2 region (London).
April 25, 2017
Per-step worker logs are now accessible directly in the Cloud Dataflow UI. Consult the documentation for more information.
April 11, 2017
The Cloud Dataflow service will now automatically shut down a streaming job if all steps have reached the maximum watermark. This only affects pipelines in which every source produces only bounded input; for example, streaming pipelines reading from Cloud Pub/Sub are not affected.
April 03, 2017
Improved graph layout in the Cloud Dataflow UI.
September 29, 2016
Autoscaling for streaming pipelines is now publicly available in beta for use with select sources and sinks. See the autoscaling documentation for more details.
September 15, 2016
The default autoscaling ceiling for batch pipelines using the Cloud Dataflow SDK for Java 1.6 or newer has been raised to 10 worker VMs. You can specify an alternate ceiling using the --maxNumWorkers pipeline option. See the autoscaling documentation for more details.
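A minimal sketch of raising the ceiling follows; the note above uses the Java SDK's --maxNumWorkers option, and this shows the Python SDK's equivalent max_num_workers option with hypothetical project and bucket names:

```python
# Sketch only: raise the autoscaling ceiling. The note above is about the Java
# SDK's --maxNumWorkers option; max_num_workers is the Python SDK's equivalent,
# and the resource names are assumptions.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    temp_location="gs://my-bucket/temp",
    autoscaling_algorithm="THROUGHPUT_BASED",  # enable autoscaling
    max_num_workers=50,                        # allow up to 50 worker VMs
)
```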
August 18, 2016
Autoscaling for batch pipelines using the Cloud Dataflow SDK for Java 1.6 or higher is now enabled by default. This change will be rolled out to projects over the next several days. By default, the Cloud Dataflow service caps the dynamic number of workers at a ceiling of 5 worker VMs. The default autoscaling ceiling may be raised in future service releases. You can specify an alternate ceiling using the --maxNumWorkers pipeline option. See the autoscaling documentation for more details.
July 27, 2016
Announced beta support for the 0.4.0 release of the Cloud Dataflow SDK for Python. Get started and run your pipeline remotely on the service.
Default disk size for pipelines in streaming mode is now 420 GB. This change will be rolled out to projects over the next several days.
March 14, 2016
Scalability and performance improvements available when using Cloud Dataflow SDK for Java version 1.5.0:
- The service now scales to tens of thousands of initial splits when reading from a BoundedSource. This includes TextIO.Read, AvroIO.Read, and BigtableIO.Read, among others.
- The service will now use Avro instead of JSON as a BigQuery export format for BigQueryIO.Read. This change greatly increases the efficiency and performance when reading from BigQuery.
January 29, 2016
Changes to the runtime environment for streaming jobs:
- Files uploaded with --filesToStage were previously downloaded to /dataflow/packages on the workers. With the latest service release, files will now be in the location /var/opt/google/dataflow. This change was a cleanup intended to better follow standard Linux path conventions.
January 19, 2016
Changes to the runtime environment for batch jobs:
- Files uploaded with --filesToStage were previously downloaded to /dataflow/packages on the workers. With the latest service release, files will now be in the location /var/opt/google/dataflow. This change was a cleanup intended to better follow standard Linux path conventions.
November 13, 2015
Usability improvements in the Monitoring UI:
- The Job Log tab has been renamed Logs.
- The View Log button has moved into the Logs tab and has been renamed Worker Logs.
Performance and stability improvements for Streaming pipelines:
- Addressed a condition that caused slowly growing memory usage in streaming workers.
- Large Window buffers no longer need to fit entirely in memory at once.
- Improved disk assignment to avoid data locality hotspots.
- Worker logging is now optimized to avoid filling up the local disk.
August 12, 2015
The Cloud Dataflow Service is now generally available.
August 06, 2015
Monitoring changes:
- Added JOB_STATE_CANCELLED as a possible state value for Cloud Dataflow jobs in the Monitoring UI and command-line interface. This state appears when the user cancels a job.
- Temporarily, as part of the above job state introduction, jobs may show different job states in list view relative to the single job view.
- Added Compute Engine core-hour count field to the monitoring UI and enabled core-hour counting for bounded jobs (field is populated with "-" for unbounded jobs).
Performance improvements to the unbounded runner.
July 28, 2015
Added a check during job creation to ensure active job names are unique within each project. You may no longer create a new job with the same name as an active job. If there are already active jobs with the same name running in the system, they will not be impacted by this change.
April 23, 2015
Improvements to the monitoring UI. Clicking View Log for a stage now defaults to displaying the logs generated by user code on the worker machines.
April 16, 2015
The Cloud Dataflow Service is now in beta.
Improvements to the monitoring UI: The job details page now provides more job information, including job duration and job type. For streaming pipelines, it additionally provides the data watermark.
April 13, 2015
Command-line interface now available for Cloud Dataflow in gcloud alpha.
Default disk size in batch is 250 GB.
April 09, 2015
Improvements to the monitoring UI: Improved organization of pipeline visualization.
Default VM for batch jobs is now n1-standard-1.
Improved resource teardown operations on job completion and cancellations.
Performance improvements for the service.
April 03, 2015
Improvements to the monitoring UI: The list of jobs now includes name, type, start time, and job ID.
March 27, 2015
Improved mechanisms for elastic scaling of compute resources. Batch pipelines can now grow and shrink the worker pool size at different stages of execution.
March 20, 2015
Monitoring changes:
- Jobs summary page now shows the status of the current job.
- Performance improvements to the UI.
March 06, 2015
Workers now use the Java 8 runtime.
March 01, 2015
Dynamic work rebalancing is now available.
Streaming support enabled for all projects participating in alpha.