Release notes

This page documents production updates to Cloud Data Fusion. Check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

You can see the latest product updates for all of Google Cloud on the Google Cloud release notes page.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/clouddatafusion-release-notes.xml

August 24, 2020

Highlights

Cloud Data Fusion 6.1.4 provides performance and scalability improvements that increase developer productivity and optimize pipeline runtime performance. The release includes scaled-up previews that support up to 50 concurrent runs, capabilities to handle large and complex schemas in Pipeline Studio, an enhanced log viewer, and other critical improvements and fixes.

This release is in parallel with the CDAP 6.1.4 release.

You can now create autoscaling Dataproc clusters.

You can now use the schema support feature in the UI to edit the precision and scale fields.

Cloud Data Fusion now has improved memory performance in pipelines by utilizing a disk-only auto-caching strategy.

Cloud Data Fusion previews now support up to 50 concurrent runs.

Cloud Data Fusion now supports large and deeply nested schemas (>5K fields with 20+ levels of nesting).

Fixed a bug where the setting for the number of executors in streaming pipelines was ignored.

Fixed a race condition where runtime monitoring failed when programs launched concurrently.

Fixed the preview page in table mode so that it shows multiple inputs and outputs with tabs.

Fixed the stability of state transitions for starting pipelines in AppFabric when AppFabric restarts.

Fixed a bug where a metric incorrectly counted the number of the records written in the Google Cloud Storage sink.

July 16, 2020

Cloud Data Fusion version 6.1.3 is now available. This version includes performance improvements and minor bug fixes.

  • Improved performance of Joiner plugins, aggregators, program startup, and previews.
  • Added support for custom images. You can select a custom Dataproc image by specifying the image URI.
  • Added support for rendering large schemas (>1000 fields) in the pipelines UI.
  • Added payload compression support to the messaging service.

April 22, 2020

Cloud Data Fusion version 6.1.2 is now available. This version includes several stability and performance improvements and new features.

  • Added support for Field Level Lineage for Spark plugins and Streaming pipelines
  • Added support for Spark 2.4
  • Added an option to skip header in the files in delimited, CSV, TSV, and text formats
  • Added an option for database source to replace the characters in the field names

Reduced preview startup by 60%. Also added limit to max concurrent preview runs (10 by default).

Fixed a bug that caused errors when Wrangler's parse-as-csv with header was used when reading multiple small files.

Fixed a bug that caused zombie processes when using the Remote Hadoop Provisioner.

Fixed a bug that caused DBSource plugin to fail in preview mode.

Fixed a race condition that caused a failure when running a Spark program.

January 10, 2020

Cloud Data Fusion version 6.1.1 is now available. This version includes several stability and performance improvements, as well as these new features:

  • Azure Data Lake storage support in Wrangler
  • Enabled Field Level Lineage (Beta)
  • Data Loss Prevention plugin to identify, tokenize, or encrypt sensitive data at scale (Beta)

December 10, 2019

Cloud Data Fusion version 6.1.0.5 is now available. This version includes several stability and performance improvements.

November 21, 2019

Cloud Data Fusion is now generally available.

Added support for creating Cloud Data Fusion instances that use private IP addresses.

Added support for creating private Cloud Data Fusion instances and executing data pipelines in a VPC-SC environment.

Added support to encrypt resources created in Cloud Storage, BigQuery, and Pub/Sub using Cloud Data Fusion with Customer Managed Encryption Keys.

Added reference documentation for creating and managing pipelines and datasets.

The Cloud Data Fusion UI is now available at a different URL in the format: <instance-name>-<project-id>-dot-<region identifier>.datafusion.googleusercontent.com.

May 31, 2019

Renamed "Cloud Dataprep service" to "Wrangler service" in the System Admin page of the Cloud Data Fusion UI.

Added a version number field to the Cloud Data Fusion Instance details page in the GCP Console.

Fixed a bug that caused Cloud Data Fusion to launch Cloud Dataproc clusters in an incorrect project.

Added support for specifying a subnet for the Cloud Dataproc provisioner.

Fixed the Cloud Dataproc provisioner to handle networks that do not use automatic subnet creation.

April 10, 2019

Cloud Data Fusion is now publicly available in beta.