Release Notes

These release notes apply to Cloud Dataprep by Trifacta. Check this page periodically for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

Subscribe to the Cloud Dataprep release notes.

November 19, 2018

This Cloud Dataprep release includes the following features, changes, issues, and fixes:

    • Variable overrides:
      • For flow executions, you can apply override values to multiple variables. See Flow View Page.
      • Apply variable overrides to scheduled job executions. See Add Schedule Dialog.
      • Variable overrides can now be applied to samples taken from your datasets with parameters. See Samples Panel.
    • New transformations:
      • Bin column: Place numeric values into bins of equal or custom size for the range of values.
      • Scale column: Scale a column's values to a fixed range or to zero mean, unit variance distributions.
      • One-hot encoding: Encode values from one column into separate columns containing 0 or 1, depending on the absence or presence of the value in the corresponding row.
      • Group By: Generate new columns or replacement tables from aggregate functions applied to grouped values from one or more columns.
    • CSV publishing options: Add quotes as CSV file publishing options. See Run Job Page.
    • Review and select patterns: In the context panel, you can review and select patterns, which prompt transformation suggestions. See Pattern Details Panel.
    • Swap in dynamic datasets: Swap a static imported dataset with a dataset with parameters in Flow View. See Flow View Page.
    • Named samples: Generated samples can be named. See Samples Panel.
    • Join Panel: The Join page has been replaced by the new Join Panel in the context panel. See Join Panel.
    • Nested expressions: Expressions can be nested within expressions in Wrangle. See Wrangle Language.
    • TD-34840: Platform fails to provide suggestions for transformations when selecting keys from an object with many of them.
    • TD-34822: Case-sensitive variations in date range values are not matched when creating a dataset with parameters.
      • NOTE: Date range parameters are now case-insensitive.
    • DP-98: BigQuery does not support reading from tables stored in regions other than US or EU.
    • TD-34574: BigQuery tables and views with NUMERIC data type cannot be imported.
    • TD-33428: Job execution fails on a recipe with a high limit in a split transformation, due to a Java null pointer error during profiling.
      • NOTE: Avoid creating datasets that are wider than 2500 columns. Performance can degrade significantly on very wide datasets.
    • TD-30857: Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.
      • NOTE: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.
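The four new transformations above (Bin column, Scale column, One-hot encoding, Group By) have familiar analogues outside the product. A minimal sketch of the same operations, using Python with pandas as an assumed stand-in (this is not Cloud Dataprep's Wrangle syntax):

```python
import pandas as pd

df = pd.DataFrame({
    "score": [12, 47, 88, 63],
    "color": ["red", "blue", "red", "green"],
})

# Bin column: place numeric values into equal-width bins over the value range
df["score_bin"] = pd.cut(df["score"], bins=3, labels=["low", "mid", "high"])

# Scale column: min-max scaling to a fixed 0..1 range
rng = df["score"].max() - df["score"].min()
df["score_scaled"] = (df["score"] - df["score"].min()) / rng

# Group By: aggregate grouped values into a new (replacement) table
totals = df.groupby("color", as_index=False)["score"].sum()

# One-hot encoding: one 0/1 column per distinct value in the source column
df = pd.get_dummies(df, columns=["color"], dtype=int)
```

The column names and data here are illustrative only; in the product, each operation is a recipe step built through the Transform Builder.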

September 21, 2018

  • Announcing the General Availability (GA) release of Cloud Dataprep. The following is a list of features, changes, deprecations, issues, and fixes in this release:

    • Share flows within the same project: Collaborate with other users through shared flows within the same GCP project. Or send them a copy for their own use. For more information, see Overview of Sharing.
    • NOTE: If you try to share a flow with a known user of Cloud Dataprep by TRIFACTA and receive a "That is not a valid email" error, please ask that user to log in again to Cloud Dataprep in the same GCP project.
    • TD-34574: BigQuery tables and views with NUMERIC data type cannot be imported.
      • Workaround: Cast the NUMERIC type to FLOAT, and the import should succeed.
      • NOTE: Support for NUMERIC data type for BigQuery began on August 20, 2018. For details, see BigQuery Release Notes.
      • Support for the NUMERIC data type is planned for a future release.
    • TD-34061: Running jobs on datasets sourced from more than 6000 files may fail.

      NOTE: Due to a limitation in Cloud Dataflow, when you run a job on a parameterized dataset containing more than 1000 files, the input path data must be compressed, which results in unreadable location values in the Cloud Dataflow job details.

      Workaround: For this and other performance reasons, try to limit your parameterized datasets to no more than 5000 source files.

    • TD-33428: Job execution fails on a recipe with a high limit in a split transformation, due to a Java null pointer error during profiling.

      NOTE: Avoid creating datasets that are wider than 2500 columns. Performance can degrade significantly on very wide datasets.

    • TD-33901: Cannot sort flows by name in Flows page.
    • TD-33900: When headers use protected names, the columns may be renamed.
    • TD-33888: "Unable to load wrangled Dataset: Script is malformed (Cannot read property 'push' of undefined)" error when opening a recipe with Case transformations.
    • TD-33798: "Could not create dataset" error when importing Avro dataset from Cloud Storage.
    • TD-33797: Status icon for the active job in Jobs page flickers as you move the mouse.
    • TD-33108: Textbox for name of reference object in Flow View appears stretched.
    • TD-32123: Window transformation doesn't handle order parameter in descending order.

July 18, 2018

This Cloud Dataprep release includes the following features, changes, deprecations, issues, and fixes:

  • New home page and left nav bar: The new Home page and left nav bar allow more streamlined access to recent flows and jobs, as well as learning resources. See Home Page.
  • Updated onboarding tutorial: Expanded onboarding tutorial that extends existing workflow to include import and job result guides.
  • New Library page: Manage your datasets and references from the new Library page. See Library Page.
  • Redesigned Jobs page: In the new Jobs page, you can more easily locate and review all jobs to which you have access. See Jobs Page.
  • Introducing pre-defined transformations for common tasks: Through the context panel, you can search across dozens of pre-defined transformations. Select one, and the Transform Builder is pre-populated based on the current context in the data grid or column browser.
  • New Transformer toolbar: New toolbar provides faster access to common transformations and operations. See Transformer Toolbar.
  • Match your recipe to the target: Assign a new target to your recipes to provide matching guidance during wrangling. See Overview of Target Matching.
    • Targets assigned to a recipe appear in a column header overlay to assist you in matching your dataset's schema to the target schema. See Data Grid Panel.
  • Cancel sampling jobs: Cancel in-progress sampling jobs. See Samples Panel.
  • Improved column matching: Better intelligence for column matching during union operations. See Union Page.
  • Improved Join page: Numerous functional improvements to the Join page. See Join Page.
  • More flexible column names: Support for a broader range of characters in column names. See Rename Columns.
  • Share flows: Collaborate with other users through shared flows within the same GCP project. Or send them a copy for their own use.
    NOTE: This feature may not be immediately available in your user account or in your collaborators' accounts. Please check again in a few days. For more information, see Overview of Sharing.
  • Import/Export Flows: Export flows created in Cloud Dataprep by TRIFACTA® and import them into a GCP project.
    • See Export Flow.
    • See Import Flow.
    • You can also export the dependencies of an executed job as a separate flow. See Flow View Page.
    • You can only import flows that are exported from Cloud Dataprep by TRIFACTA® of the same version.
  • Introducing dynamic datasets with parameters: Use parameterized rules in imported dynamic datasets to allow scheduled jobs to automatically pick up the right input data. See Overview of Parameterization.
  • Datasets page: The Datasets page has been replaced by the new Library page. See Library Page.
  • Aggregate transform: The aggregate transform has been removed from the platform.
    • Aggregate functionality has been integrated into pivot, so you can accomplish the same tasks.

      NOTE: All prior functionality for the Aggregate transform is supported in the new release using the Pivot transform.

    • In the Search panel, enter pivot. See Search Panel.
  • TD-31305: Copying a flow invalidates the samples in the new copy. Copying or moving a node within a flow invalidates the node's samples.
    • This issue also applies to flows that were upgraded from a previous release.
    • Workaround: Recreate the samples after the move or copy.
  • TD-31252: Assigning a target schema through the Column Browser does not refresh the page.
    • Workaround: To update the page, reload the page through the browser.
  • TD-31165: Job results are incorrect when a sample is collected and then the last transform step is undone.
    • Workaround: Recollect a sample after undoing the transform step.
  • TD-30857: Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.
    • Workaround: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.
  • TD-28807: You may receive a Nothing Found message when navigating to a BigQuery project that contains data. With your BigQuery administrator, please verify that the service account in use has been properly set up and has the appropriate permissions so that you can use the project.
  • TD-31339: Writing to a single file in the top-level directory fails if the temporary output generates more than 32 files.
  • TD-29149: Columns containing String values with leading spaces are incorrectly type cast to Integer data type.
  • TD-28930: The Delete other columns operation causes column lineage to be lost and reorders columns.
  • TD-26069: Photon evaluates date(yr, month, 0) as first date of the previous month. It should return a null value.
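The workaround for TD-30857 above recommends specific, narrow wildcard patterns over broad ones. Ordinary filesystem globbing shows why: a broad recursive wildcard must walk every directory, while a specific pattern prunes the traversal. A small Python illustration with hypothetical file names (Cloud Dataprep's pattern matcher is not Python's glob, so this is only an analogy):

```python
import os
import tempfile
from glob import glob

# Hypothetical layout: one dated directory containing one matching
# and one non-matching file
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "2018-11"))
for name in ("sales_01.csv", "notes.txt"):
    open(os.path.join(root, "2018-11", name), "w").close()

# A broad recursive wildcard visits every directory and file
broad = glob(os.path.join(root, "**", "*"), recursive=True)

# A specific pattern restricts traversal to matching subtrees and names
specific = glob(os.path.join(root, "2018-*", "sales_*.csv"))
```

The more constrained the pattern, the less of the directory tree must be scanned, which is the same principle behind the workaround.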

May 23, 2018

This latest release includes the following changes and issues:

  • Product Name Change: As of this release, the product is now known as Cloud Dataprep by TRIFACTA®.
  • GDPR: The product is now compliant with GDPR regulations in the European Union. This regulation provides enhanced data privacy requirements for users. For more information, see https://www.eugdpr.org/.

    As part of this compliance, Cloud Dataprep by TRIFACTA has updated its Terms of Service and Privacy Policy for all users, effective immediately.

  • TD-28807: You may receive a Nothing Found message when navigating to a BigQuery project that contains data. With your BigQuery administrator, please verify that the service account in use has been properly set up and has the appropriate permissions so that you can use the project.

April 25, 2018

When a user disables Cloud Dataprep, all metadata associated with Cloud Dataprep will be deleted. This operation is not reversible (see Effect of disabling Cloud Dataprep).

January 23, 2018

Announcing Cloud Dataprep Beta 5 release. The following is a list of release features, changes, deprecations, issues, and fixes:

  • New Flow View page: New objects in Flow View and better organization of them. See Flow View Page.
  • BigQuery read/write access across projects:
    • Read from BigQuery tables associated with GCP projects other than the current one where Cloud Dataprep was launched.
    • Write results into BigQuery tables associated with other projects.
    • You must configure the Cloud Dataprep and Cloud Dataflow service accounts to have read or write access to BigQuery datasets and tables outside of the current GCP project.
  • Re-run job on Cloud Dataflow:
    • After you run a job in Cloud Dataprep, you can re-run the job directly from the Cloud Dataflow interface.
    • Inputs and outputs are parameters that you can modify.
    • Operationalize the job with a third-party scheduling tool.
    • See Run Job on Dataflow.
  • Cross joins: Perform cross joins between datasets. See Join Page.
  • Enable or disable type inference on files and tables: Enable (default) or disable initial type inference for BigQuery tables or Avro files used as sources for individual datasets. See Import Data Page.
  • Batch column rename: Rename multiple columns in a single transformation step. See Rename Columns.
  • Reuse your common patterns: Browse and select patterns for re-use from your recent history. See Pattern History Panel.
  • Convert phone and date patterns:
    • In Column Details, you can select a phone number or date pattern to generate suggestions for standardizing the values in the column to a single format.
    • See Column Details Panel.
  • New string comparison functions.
  • New SUBSTITUTE function: Replace string literals or patterns with a new literal or column value. See SUBSTITUTE Function.
  • New Flow Objects: The objects in your flow have been modified and expanded to provide greater flexibility in flow definition and re-use:
    • References: Create references to the outputs of your recipes and use them as inputs to other recipes.
    • Output object: Specify individual publishing outputs in a separate object associated with a recipe. Publishing options include format, location, and data type.
    • For more information, see Object Overview.
  • Wrangled Datasets: Wrangled datasets are no longer objects in Cloud Dataprep. Their functionality has been moved to other and new objects. For more information, see Object Overview.
  • TD-28155: Sampling from an Avro file on Cloud Dataflow always scans the entire file. As a result, additional processing costs may be incurred.
  • TD-26069: Photon evaluates date(yr, month, 0) as first date of the previous month. It should return a null value.
  • TD-27568: Cannot select BigQuery publishing destinations that are empty databases.
  • TD-25733: Attempting a union of 12 datasets crashes UI.
  • TD-24793: BigQueryNotFoundException errors were incorrectly reported for output tables that had been moved or deleted by the user.
  • TD-24130: Cannot read recursive directory structures with files at different levels of folder depth in Cloud Dataflow.
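Regarding TD-26069 above: standard calendar libraries treat day 0 as invalid rather than rolling back to the previous month, which is the behavior the issue says Photon should have (returning a null value). A minimal Python illustration of that convention (an analogy, not Photon's implementation):

```python
from datetime import date

def safe_date(year, month, day):
    # Day 0 is not a valid calendar day; return None (a null) instead of
    # silently producing the last day of the previous month
    try:
        return date(year, month, day)
    except ValueError:
        return None
```

For example, `safe_date(2018, 5, 0)` yields `None`, whereas the buggy behavior described in TD-26069 would yield April 30, 2018.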

November 2, 2017

Announcing a Cloud Dataprep release, which highlights a revamped UI, scheduling, improved sampling, and several other minor features. The following is a list of release features, changes, deprecations, issues, and fixes:

  • Interactive Getting Started Tutorial for New Users: New users to Cloud Dataprep can review the "Getting Started 101" tutorial with pre-loaded data through the product.
  • Scheduling: Schedule execution of one or more wrangled datasets within a flow. Scheduled jobs must be configured from Flow View. See Flow View Page.
  • New Transformer page: New navigation and layout for the Transformer page simplifies working with data and increases the area of the data grid. See Transformer Page.
  • Transformation suggestions are now displayed in a right-side panel, instead of on the bottom of the page. A preview for a transformation suggestion is displayed only when you hover over the suggestion.
  • Improved sampling: Enhanced sampling methods provide access to customizable, task-oriented subsets of your data. See Samples Panel.
  • Improved Transformer loading due to persistence of initial sample. For more information on the new sampling methods, see Overview of Sampling.
  • Improved Flow View: Improved user experience with flows. See Flow View Page.
  • Disable steps: Disable individual steps in your recipes. See Recipe Panel.
  • Set encoding settings during import: You can define per-file import settings including file encoding type and automated structure detection. See Import Dataset Page.
  • Snappy compression: Read/write support for Snappy compression. See Supported File Formats.
  • Column lineage: Highlight the recipe steps where a specific column is referenced. See Column Menus.
  • Search for columns: Search for columns by name. See Data Grid Panel.
  • CASE Function: Build multi-conditional expressions with a single CASE statement. See CASE Function.
  • Support for BigQuery Datetime: Publish Cloud Dataprep Datetime values to BigQuery as Datetime or Timestamp values, depending on the data. See BigQuery Data Type Conversions.
  • Supported browser version required: You cannot log in to the application using an unsupported version of Google Chrome.
  • Supported encoding types: The list of supported encoding types has changed.
  • Dependencies Browser: The Dependencies browser has been replaced by the Dataset Navigator.
  • Transform Editor: The Transform Editor for entering raw text Wrangle steps has been removed. Please use the Transform Builder for creating transformation steps.
  • TD-27568: Cannot select BigQuery publishing destinations that are empty databases.
  • TD-24312: Improved Error Messages for Google users to identify pre-job run failures. If an error is encountered during the launch of a job but before job execution, you can now view a detailed error message as to the cause in the failed job card. Common errors that occur during the launch of a job include:
    • Cloud Dataflow staging location is not writeable
    • Cloud Dataflow cannot read from and write to different regions
    • Insufficient workers for Cloud Dataflow, please check your quota
  • TD-24273: Circular reference in schema of Avro file causes job in Cloud Dataflow to fail.
  • TD-23635: Read-only BigQuery databases are listed as publishing destinations. Publish fails.
  • TD-26177: Cloud Dataflow job fails for large Avro files. Avro datasets that were imported before this release may still fail during job execution on Cloud Dataflow. To fix these failures, you must re-import the dataset.
  • TD-25438: Deleting an upstream reference node does not propagate results correctly to the Transformer page.
  • TD-25419: When a pivot transform is applied, some column histograms may not be updated.
  • TD-23787: When publishing location is unavailable, spinning wheel hangs indefinitely without any error message.
  • TD-22467: Last active sample is not displayed during preview of multi-dataset operations.
  • TD-22128: Cannot read multi-file Avro stream if data is greater than 500 KB.
  • TD-19865: You cannot configure a publishing location to be a directory that does not already exist. See Run Job Page.
  • TD-17657: The splitrows transform allows splitting even if the required on parameter is set to an empty value.
  • TD-24464: "Python Error" when opening a recipe with a large number of columns and a nest transform.
  • TD-24322: Nest transform creates a map with duplication keys.
  • TD-23920: Support for the equals sign (=) in the output path.
  • TD-23646: Adding a specific comment appears to invalidate an earlier edit.
  • TD-23111: Long latency when loading complex flow views.
  • TD-23099: The View Results button is missing on job cards even with profiling enabled.
  • TD-22889: Extremely slow UI performance for some actions.
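The CASE Function noted above builds a multi-conditional expression: a list of condition/value pairs evaluated in order, with a default fallback. The equivalent branching logic in plain Python (an analogy with invented names, not Wrangle's CASE syntax) looks like:

```python
def shipping_tier(weight_kg):
    # Conditions are checked top to bottom; the first match wins,
    # and the final return acts as the CASE default value
    if weight_kg <= 1:
        return "letter"
    elif weight_kg <= 20:
        return "parcel"
    elif weight_kg <= 100:
        return "freight"
    return "pallet"
```

In the product, the same chain of conditions is expressed in a single CASE statement within one recipe step.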

September 21, 2017

Announcing Cloud Dataprep public beta release. See the Cloud Dataprep Documentation.

May 17, 2017

  • The Cloud Dataprep application currently is compatible only with Chrome browsers. More specifically, it is dependent on the PNaCl plugin. Users can confirm that their Chrome environment supports PNaCl by accessing PNaCl demos. If the demos do not work, users may need to adjust their Chrome environment.
  • Cloud Dataprep jobs on Cloud Dataflow can only be started from the Cloud Dataprep UI. Programmatic execution is expected to be supported in a future release.
  • Cloud Dataprep jobs on Cloud Dataflow can only access data within the project.
  • A user may see sources that they have access to but that are not within the selected project. Cloud Dataflow jobs attempted with these sources may fail without warning.
  • Cloud Dataprep flows/datasets are only visible per user, per project. Sharing of flows/datasets is expected in a future release.
  • There is limited mapping for data types when publishing to BigQuery. For example, date/time and array types are written as strings. This will be fixed in a future release.
