Release Notes: Cloud Dataflow Service

This page documents production updates to the Cloud Dataflow Service. You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

July 10, 2018

Cloud Dataflow is now able to use workers in zones in the us-west2 region (Los Angeles).

June 14, 2018

Streaming Engine is now publicly available in Beta. Streaming Engine moves streaming pipeline execution out of the worker VMs and into the Cloud Dataflow service backend.
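
Streaming Engine remains opt-in per job. The following is a minimal sketch of opting in at launch, assuming the Apache Beam SDK for Java with the Dataflow runner; the enable_streaming_engine experiment and the project values are placeholders based on common usage rather than taken from this note.

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class StreamingEngineOptIn {
      public static void main(String[] args) {
        // Typical launch arguments (placeholders), e.g.:
        //   --runner=DataflowRunner --project=my-project --streaming=true \
        //   --experiments=enable_streaming_engine
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setStreaming(true);

        Pipeline pipeline = Pipeline.create(options);
        // ... apply unbounded sources and transforms here ...
        pipeline.run();
      }
    }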

June 11, 2018

You can now specify a user-managed controller service account when you run your pipeline job.
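
For Java pipelines, the controller service account is supplied as a pipeline option when the job is launched. Below is a brief, hedged sketch assuming the Apache Beam SDK for Java; the project and service account email are placeholders, and the account must already exist with the roles your workers need.

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class ControllerServiceAccountExample {
      public static void main(String[] args) {
        // Placeholder values; the service account must already exist and hold
        // the roles required by your job's workers.
        String[] launchArgs = {
          "--runner=DataflowRunner",
          "--project=my-project",
          "--serviceAccount=my-worker-sa@my-project.iam.gserviceaccount.com"
        };
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(launchArgs).as(DataflowPipelineOptions.class);
        // ... create and run the pipeline with these options ...
      }
    }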

Cloud Dataflow is now able to use workers in zones in the europe-north1 region (Finland).

April 26, 2018

You can now view side input metrics for your pipeline from the Cloud Dataflow monitoring interface.

February 21, 2018

Cloud Dataflow now supports the following regional endpoints in GA: us-central1, us-east1, europe-west1, asia-east1, and asia-northeast1.
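
A job can be sent to one of these regional endpoints with the --region pipeline option. The sketch below assumes the Apache Beam SDK for Java; the chosen region is illustrative.

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class RegionalEndpointExample {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        // Route the job through one of the GA regional endpoints listed above.
        options.setRegion("europe-west1");
        // ... create and run the pipeline with these options ...
      }
    }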

January 10, 2018

Cloud Dataflow is now able to use workers in zones in the northamerica-northeast1 region (Montréal).

Cloud Dataflow is now able to use workers in zones in the europe-west4 region (Netherlands).

October 31, 2017

Cloud Dataflow is now able to use workers in zones in the asia-south1 region (Mumbai).

October 30, 2017

Cloud Dataflow Shuffle is now available in the europe-west1 region.

Cloud Dataflow Shuffle is now available for pipelines using the Apache Beam SDK for Python version 2.1 or later.
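
Cloud Dataflow Shuffle is opt-in per job. A minimal sketch of enabling it for a batch job follows, shown with the Apache Beam SDK for Java for consistency with the other examples; the shuffle_mode=service experiment and the region are assumptions based on the Shuffle documentation of that time, and the project value is a placeholder.

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class ShuffleServiceExample {
      public static void main(String[] args) {
        // Placeholder launch arguments; shuffle_mode=service moves the batch
        // shuffle from worker VMs into the Cloud Dataflow service backend.
        String[] launchArgs = {
          "--runner=DataflowRunner",
          "--project=my-project",
          "--region=europe-west1",
          "--experiments=shuffle_mode=service"
        };
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(launchArgs).as(DataflowPipelineOptions.class);
        // ... build and run the batch pipeline with these options ...
      }
    }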

October 12, 2017

Fixed the known issue disclosed on October 2, 2017.

October 2, 2017

Known issue: In Cloud Dataflow 2.x pipelines, when the output of a PTransform is consumed by both a Flatten and at least one other PTransform, the service produces a malformed graph that leaves those other PTransforms without input.
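
The affected shape looks roughly like the following Beam 2.x Java sketch, in which the output of one transform feeds both a Flatten and a second transform (names and data are illustrative):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.Flatten;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionList;

    public class FlattenShapeExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        PCollection<String> a = p.apply("A", Create.of("x", "y"));
        PCollection<String> b = p.apply("B", Create.of("z"));

        // The output of A is consumed by a Flatten...
        PCollection<String> merged =
            PCollectionList.of(a).and(b).apply(Flatten.pCollections());

        // ...and by at least one other PTransform, which is the combination
        // affected by this known issue.
        PCollection<Long> countOfA = a.apply(Count.globally());

        p.run().waitUntilFinish();
      }
    }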

September 20, 2017

Cloud Dataflow provides beta support for regional endpoints us-central1 and europe-west1.

September 5, 2017

Cloud Dataflow is now able to use workers in zones in the southamerica-east1 region (São Paulo).

August 1, 2017

Cloud Dataflow is now able to use workers in zones in the europe-west3 region (Frankfurt).

July 20, 2017

You can now access the Stackdriver error report for your pipeline directly from the Dataflow monitoring interface.

June 20, 2017

Cloud Dataflow is now able to use workers in zones in the australia-southeast1 region (Sydney).

June 6, 2017

Cloud Dataflow is now able to use workers in zones in the europe-west2 region (London).

April 25, 2017

Per-step worker logs are now accessible directly in the Cloud Dataflow UI. Consult the documentation for more information.

April 11, 2017

The Cloud Dataflow service will now automatically shut down a streaming job once all steps have reached the maximum watermark. This only affects pipelines in which every source produces bounded input; for example, streaming pipelines reading from Cloud Pub/Sub are not affected.
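
As an illustration, a job launched in streaming mode whose only source is bounded, roughly like the hedged sketch below using current Apache Beam Java names (the input path is a placeholder), now shuts down on its own once every step reaches the maximum watermark:

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class BoundedStreamingExample {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setStreaming(true); // run the job in streaming mode

        Pipeline p = Pipeline.create(options);
        // A bounded source: once it is exhausted, every step reaches the
        // maximum watermark and the service stops the job automatically.
        p.apply(TextIO.read().from("gs://my-bucket/input.txt"));
        p.run();
      }
    }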

April 3, 2017

Improved graph layout in the Cloud Dataflow UI.

September 29, 2016

Autoscaling for streaming pipelines is now publicly available in Beta for use with select sources and sinks. See the autoscaling documentation for more details.

September 15, 2016

The default autoscaling ceiling for batch pipelines using the Cloud Dataflow SDK for Java 1.6 or newer has been raised to 10 worker VMs. You can specify an alternate ceiling using the --maxNumWorkers pipeline option. See the autoscaling documentation for more details.

August 18, 2016

Autoscaling for batch pipelines using the Cloud Dataflow SDK for Java 1.6 or higher is now enabled by default. This change will be rolled out to projects over the next several days. By default, the Cloud Dataflow service caps the dynamic number of workers at a ceiling of 5 worker VMs; this default ceiling may be raised in future service releases. You can specify an alternate ceiling using the --maxNumWorkers pipeline option. See the autoscaling documentation for more details.
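
As a minimal sketch, shown with current Apache Beam Java package names for consistency with the other examples (the project name and ceiling are placeholders), the --maxNumWorkers option mentioned above can be passed at launch:

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class AutoscalingCeilingExample {
      public static void main(String[] args) {
        // Placeholder launch arguments raising the autoscaling ceiling to 20 workers.
        String[] launchArgs = {
          "--runner=DataflowRunner",
          "--project=my-project",
          "--maxNumWorkers=20"
        };
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(launchArgs).as(DataflowPipelineOptions.class);
        // ... build and run the batch pipeline with these options ...
      }
    }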

July 27, 2016

Announced beta support for the 0.4.0 release of the Cloud Dataflow SDK for Python. Get started and run your pipeline remotely on the service.

Default disk size for pipelines in streaming mode is now 420 GB. This change will be rolled out to projects over the next several days.

March 14, 2016

Scalability and performance improvements available when using Cloud Dataflow SDK for Java version 1.5.0:

  • The service now scales to tens of thousands of initial splits when reading from a BoundedSource. This includes TextIO.Read, AvroIO.Read, and BigtableIO.Read, among others.
  • The service will now use Avro instead of JSON as the BigQuery export format for BigQueryIO.Read. This change greatly improves efficiency and performance when reading from BigQuery.

January 29, 2016

Changes to the runtime environment for streaming jobs:

  • Files uploaded with --filesToStage were previously downloaded to /dataflow/packages on the workers. With the latest service release, these files are now located in /var/opt/google/dataflow. This cleanup better follows standard Linux path conventions.

January 19, 2016

Changes to the runtime environment for batch jobs:

  • Files uploaded with --filesToStage were previously downloaded to /dataflow/packages on the workers. With the latest service release, these files are now located in /var/opt/google/dataflow. This cleanup better follows standard Linux path conventions; a brief sketch of staging files is shown below.
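
The sketch below uses current Apache Beam Java package names and placeholder file paths to show how files are staged with that option; with this release the workers receive them under /var/opt/google/dataflow.

    import java.util.Arrays;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class FilesToStageExample {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        // Placeholder paths; note that overriding filesToStage replaces the
        // automatically detected classpath, so the pipeline's own jars must be
        // listed as well. With this release, staged files appear under
        // /var/opt/google/dataflow on the workers.
        options.setFilesToStage(Arrays.asList(
            "/local/path/my-pipeline.jar",
            "/local/path/extra-config.properties"));
      }
    }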

November 13, 2015

Usability improvements in the Monitoring UI:

  • The Job Log tab has been renamed Logs.
  • The View Log button has moved into the Logs tab and has been renamed Worker Logs.

Performance and stability improvements for Streaming pipelines:

  • Addressed a condition that caused slowly growing memory usage in streaming workers.
  • Large Window buffers no longer need to fit entirely in memory at once.
  • Improved disk assignment to avoid data locality hotspots.
  • Worker logging is now optimized to avoid filling up the local disk.

August 12, 2015

The Cloud Dataflow Service is now Generally Available.

August 6, 2015

Monitoring changes:

  • Added JOB_STATE_CANCELLED as a possible state value for Cloud Dataflow jobs in the Monitoring UI and command-line interface. This state appears when the user cancels a job.
  • Temporarily, as part of this job state introduction, jobs may show different states in the list view than in the single-job view.
  • Added a Google Compute Engine core-hour count field to the monitoring UI and enabled core-hour counting for bounded jobs (the field shows "-" for unbounded jobs).

In the Service: Performance improvements to the unbounded runner.

July 28, 2015

Added a check during job creation to ensure that active job names are unique within each project. You can no longer create a new job with the same name as an active job. Active jobs that already share a name are not affected by this change.

April 23, 2015

Improvements to the monitoring UI. Clicking View Log for a stage now defaults to displaying the logs generated by user code on the worker machines.

April 16, 2015

  • Improvements to the monitoring UI: The job details page now provides more job information, including job duration and job type. For streaming pipelines, it also shows the data watermark.
  • The Cloud Dataflow Service is now in Beta.

April 13, 2015

  • A command-line interface for Cloud Dataflow is now available in gcloud alpha.
  • The default disk size for batch jobs is now 250 GB.

April 9, 2015

  • Improvements to the monitoring UI: Improved organization of pipeline visualization.
  • The default VM type for batch jobs is now n1-standard-1.
  • Improved resource teardown operations on job completion and cancellations.
  • Performance improvements for the service.

April 3, 2015

Improvements to the monitoring UI: The list of jobs now includes name, type, start time, and job ID.

March 27, 2015

Improved mechanisms for elastic scaling of compute resources. Batch pipelines can now grow and shrink the worker pool size at different stages of execution.

March 20, 2015

Monitoring changes:

  • Jobs summary page now shows the status of the current job.
  • Performance improvements to the UI.

March 6, 2015

Workers now use the Java 8 runtime.

March 1, 2015

  • Dynamic work rebalancing.
  • Streaming support enabled for all projects participating in Alpha.
