Release Notes

These release notes apply to Dataprep by Trifacta. You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/cloud-dataprep-release-notes.xml

September 16, 2019

  • Introducing APIs: Manage job execution via API:
  • Cloud Dataprep by TRIFACTA INC. now supports API endpoints for programmatic execution and monitoring of Cloud Dataprep jobs. Beginning in this release, you can use token-based security to manage the launching and execution of Cloud Dataprep by TRIFACTA INC. jobs. For more information, see API Overview.
  • This API should be used as a replacement for Cloud Dataflow templates for programmatic invocation of Cloud Dataprep jobs. In addition, this feature includes support for dynamic functions and input and output destinations.
  • NOTE: Cloud Dataflow templates generated by Cloud Dataprep by TRIFACTA INC. are still supported but are no longer recommended for use.
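The token-based API flow described above can be sketched as follows. The endpoint path (`/v4/jobGroups`), host, and payload shape here are illustrative assumptions, not the documented contract; consult the API Overview for the actual endpoints and request bodies.

```python
import json
import urllib.request

def build_run_job_request(base_url, access_token, recipe_id):
    """Build (but do not send) a request that launches a Dataprep job.

    The endpoint path and payload shape are assumptions for illustration;
    see the API Overview for the exact contract.
    """
    payload = json.dumps({"wrangledDataset": {"id": recipe_id}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v4/jobGroups",  # assumed endpoint path
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",  # token-based security
            "Content-Type": "application/json",
        },
    )

req = build_run_job_request("https://api.example.com", "MY_TOKEN", 42)
# To actually launch the job, you would send it:
# urllib.request.urlopen(req)
```

The same authenticated pattern applies to the monitoring endpoints: poll the job's status with a GET request carrying the same bearer token.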

Re-run jobs using Cloud Dataflow templates:

  • In prior releases, you could re-run a Cloud Dataprep by TRIFACTA INC. job by configuring the Cloud Dataflow template with input and output parameters for the job.
  • As of this release, Cloud Dataprep by TRIFACTA INC. will continue to generate Cloud Dataflow templates, but they are no longer recommended for use in programmatic execution of Cloud Dataflow jobs.
  • Instead, you can now run jobs and monitor them through exposed API endpoints. For more information, see API Overview.
  • Support for Cloud Dataflow templates will be decommissioned in December 2019.

TD-43284: When running a job via API, you cannot apply setting overrides, parameter values, or other execution settings as part of the job definition.

September 11, 2019

  • Introducing recipe macros: User-defined macros enable saving and reusing sequences of steps. For more information, see Overview of Macros.
  • Introducing Transformation by Example: Transform by example output values for a column of values. See Transformation by Example Page.

  • Redesigned Recipe Panel: Multi-step operations and more robust copy and paste actions are now supported. See Recipe Panel.

  • Browse flow for joins: Browse your current flow for datasets or recipes to join into the current recipe. See Join Panel.

  • Replace cell values: Build transformations to replace specific cell values. See Replace Cell Values.

  • Parameter overrides to destinations: Parameterize output paths and table and file names for dynamic destinations. See Run Job Page.

  • Specify VPC networks and sub-nets: You can specify your own Google VPC network and the sub-net IP address range to use for individual job execution or for your project. For more information, see Project Settings Page.

  • New functions:

  • Broader support for metadata references: For Excel files, $filepath references now return the location of the source Excel file, with the sheet name appended to the end of the reference. See Source Metadata References.

  • PNaCl browser extension no longer supported: Please verify that all users of Cloud Dataprep by TRIFACTA INC. are using a supported version of Google Chrome, which automatically enables use of WebAssembly. For more information, see Desktop Requirements.
  • Documentation errata: In prior releases, the documentation listed UTF32-BE and UTF32-LE as supported file formats. These formats are not supported. Documentation has been updated to correct this error. See Supported File Encoding Types.
  • TD-40424: UTF-32BE and UTF-32LE were listed as supported file encoding options, but they do not work.

    • NOTE: Although these options appeared in the application, they have never been supported in the underlying platform. They have been removed from the interface.
  • TD-39296: Cannot run Cloud Dataflow jobs on datasets with parameters sourced from one or more Parquet files.
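The parameter overrides to destinations announced above resolve variables into concrete output paths at run time. A minimal, stdlib-only sketch of that idea, using plain Python str.format rather than the product's own parameter syntax; the bucket, variable names, and template are made up for illustration:

```python
from datetime import datetime, timezone

def resolve_output_path(template, variables):
    """Substitute parameter values into an output path template.

    Mirrors (in spirit) parameterized destinations; the template syntax
    here is plain str.format, not Cloud Dataprep's own.
    """
    return template.format(**variables)

template = "gs://my-bucket/results/{region}/{run_date}/output.csv"
run_date = datetime(2019, 9, 11, tzinfo=timezone.utc).strftime("%Y-%m-%d")
path = resolve_output_path(template, {"region": "us", "run_date": run_date})
# path == "gs://my-bucket/results/us/2019-09-11/output.csv"
```

Because the variables are resolved per execution, a scheduled job can write each run's output to a distinct, dated location without editing the flow.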

May 16, 2019

  • Cloud Dataprep by TRIFACTA INC. now supports WebAssembly: The product now uses the WebAssembly browser client, which is the default in-browser web client for Google Chrome.
    • WebAssembly is available by default in Google Chrome version 68+. Please upgrade to a supported version of Google Chrome. No further installation or configuration is required. For more information, see Desktop Requirements.
    • Previously, the product supported the PNaCl browser client. This client is still available for use.
  • Cloud Dataflow: Cloud Dataflow SDK has been updated to version 2.11.
  • Cloud Dataflow templates support: Future versions of Cloud Dataprep by TRIFACTA INC. will contain a new method to execute jobs in a programmatic (API) manner. At that time, support for Cloud Dataflow templates will be revisited.

TD-39386: Some users may not be able to edit datasets with parameters, receiving an HTTP 403 error (permission denied) on sources that should be accessible.

March 20, 2019

  • TD-39411: Cannot import BigQuery table or view when source is originally from Google Suite.

    • Cloud Dataprep by TRIFACTA only supports native BigQuery tables and views. Cloud Dataprep by TRIFACTA does not support BigQuery sources that reference data stored in Google Suite, such as Google Sheets.
    • Workaround: Create a copy of the BigQuery table linked to the Google Suite source within BigQuery. Then, import the native BigQuery table as a dataset in Cloud Dataprep by TRIFACTA using the Import Dataset page.
  • TD-39386: Some users may not be able to edit datasets with parameters, receiving an HTTP 403 error (permission denied) on sources that should be accessible.

    • Workaround: Create a replacement dataset with parameters from scratch and swap out the old dataset with the new dataset with parameters.
  • TD-39296: Cannot run Cloud Dataflow jobs on datasets with parameters sourced from one or more Parquet files.

    • Workaround: Generate the source using another supported file format, or union all Parquet-sourced datasets as the first step.
  • TD-39295: Parquet jobs fail on Cloud Dataflow when dataset contains columns of INT96 data type.

    • Workaround: Data type INT96 has been deprecated from the library used to convert Parquet data. Please change the source to another data type and re-import. For more information, see PARQUET-1480 on GitHub.
  • TD-39173: Cannot preview imported datasets when source is Avro file.

    • Workaround: File can still be imported and wrangled.
  • TD-38869: Upload of Parquet files does not support nested values, which appear as null values in the Transformer page.

    • Workaround: Unnest the values before importing into the platform.
  • TD-37688: Documentation for new Selection Details Panel was not updated.

    • The Selection Details panel replaces and extends the Suggestion Cards Panel. The feature is present, but its documentation is outdated.
    • Updated documentation will be available in the next release.
    • Workaround: Documentation for the new Selection Details panel is available here: Selection Details Panel.
  • TD-37683: Send a copy does not create independent sets of recipes and datasets in new flow. If imported datasets are removed in the source flow, they disappear from the sent version.

    • Workaround: Create new versions of the imported datasets in the sent flow.
  • TD-36332: Data grid can display wrong results if a sample is collected and dataset is unioned.
  • TD-36192: Canceling a step in recipe panel can result in column menus disappearing in the data grid.
  • TD-31252: Assigning a target schema through the Column Browser does not refresh the page.
  • DP-98: BigQuery does not support reading from tables stored in regions other than US or EU.

November 19, 2018

  • Variable overrides:
    • For flow executions, you can apply override values to multiple variables. See Flow View Page.
    • Apply variable overrides to scheduled job executions. See Add Schedule Dialog.
    • Variable overrides can now be applied to samples taken from your datasets with parameters. See Samples Panel.
  • New transformations:
    • Bin column: Place numeric values into bins of equal or custom size for the range of values
    • Scale column: Scale a column's values to a fixed range or to zero mean, unit variance distributions.
    • One-hot encoding: Encode values from one column into separate columns containing 0 or 1, depending on the absence or presence of the value in the corresponding row.
    • Group By: Generate new columns or replacement tables from aggregate functions applied to grouped values from one or more columns.
  • CSV publishing options: Add quotes as CSV file publishing options. See Run Job Page.
  • Review and select patterns: Patterns are available for review and selection, prompting suggestions, in the context panel. See Pattern Details Panel.
  • Swap in dynamic datasets: Swap a static imported dataset with a dataset with parameters in Flow View. See Flow View Page.
  • Named samples: Generated samples can be named. See Samples Panel.
  • Join Panel: The Join page has been replaced by the new Join Panel in the context panel. See Join Panel.
  • Nested expressions: Expressions can be nested within expressions in Wrangle. See Wrangle Language.
  • TD-34840: Platform fails to provide suggestions for transformations when selecting keys from an object with many of them.
  • TD-34822: Case-sensitive variations in date range values are not matched when creating a dataset with parameters.
    • NOTE: Date range parameters are now case-insensitive.
  • DP-98: BigQuery does not support reading from tables stored in regions other than US or EU.
  • TD-34574: BigQuery tables and views with NUMERIC data type cannot be imported.
  • TD-33428: Job execution fails on recipes containing a split transformation with a high limit, due to a Java Null Pointer Error during profiling.
    • NOTE: Avoid creating datasets that are wider than 2500 columns. Performance can degrade significantly on very wide datasets.
  • TD-30857: Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.
    • NOTE: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.
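The new transformations announced above can be made concrete with a stdlib-only Python sketch of one-hot encoding and equal-width binning. The function names and data are illustrative, not the product's own API:

```python
def one_hot(values):
    """One-hot encode a column: one output column per distinct value,
    holding 1 where the row matches that value and 0 otherwise."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

def bin_column(values, width):
    """Place numeric values into bins of equal width, returning the
    bin index for each row."""
    lo = min(values)
    return [int((v - lo) // width) for v in values]

colors = ["red", "blue", "red"]
encoded = one_hot(colors)            # {"blue": [0, 1, 0], "red": [1, 0, 1]}
bins = bin_column([1, 5, 9, 14], 5)  # [0, 0, 1, 2]
```

Scaling works analogously: map each value v to (v - min) / (max - min) for a fixed 0..1 range, or subtract the mean and divide by the standard deviation for a zero-mean, unit-variance distribution.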

September 21, 2018

  • Announcing the General Availability (GA) release of Cloud Dataprep.
  • Share flows within the same project: Collaborate with other users through shared flows within the same GCP project. Or send them a copy for their own use. For more information, see Overview of Sharing.
  • NOTE: If you try to share a flow with a known user of Cloud Dataprep by TRIFACTA and receive a That is not a valid email error, please ask that user to log in to Cloud Dataprep again in the same GCP project.
  • TD-34574: BigQuery tables and views with NUMERIC data type cannot be imported.
    • Workaround: Cast the NUMERIC type to FLOAT, and the import should succeed.
    • NOTE: Support for NUMERIC data type for BigQuery began on August 20, 2018. For details, see BigQuery Release Notes.
    • Support for the NUMERIC data type is planned for a future release.
  • TD-34061: Running jobs on datasets sourced from more than 6000 files may fail.

    NOTE: Due to a limitation in Cloud Dataflow, when you run a job on a parameterized dataset containing more than 1000 files, the input paths data must be compressed, which results in non-readable location values in the Cloud Dataflow job details.

    Workaround: For this and other performance reasons, try to limit your parameterized datasets to no more than 5000 source files.

  • TD-33428: Job execution fails on recipes containing a split transformation with a high limit, due to a Java Null Pointer Error during profiling.

    NOTE: Avoid creating datasets that are wider than 2500 columns. Performance can degrade significantly on very wide datasets.

  • TD-33901: Cannot sort flows by name in Flows page.
  • TD-33900: When headers use protected names, the columns may be renamed.
  • TD-33888: "Unable to load wrangled dataset: Script is malformed (Cannot read property 'push' of undefined)" error when opening recipe with Case transformations.
  • TD-33798: "Could not create dataset" error when importing Avro dataset from Cloud Storage.
  • TD-33797: Status icon for the active job in Jobs page flickers as you move the mouse.
  • TD-33108: Textbox for name of reference object in Flow View appears stretched.
  • TD-32123: Window transformation doesn't handle order parameter in descending order.
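For TD-34574 above, the cast-to-FLOAT workaround can be expressed as a BigQuery query that casts the affected columns before import. The helper below only builds the SQL string; the project, dataset, table, and column names are placeholders, and running the query would require the BigQuery client of your choice:

```python
def numeric_to_float_sql(project, dataset, table, numeric_cols):
    """Build a query that casts NUMERIC columns to FLOAT64 so the
    result can be imported (workaround for TD-34574). Column handling
    is illustrative; adapt it to your schema."""
    excepted = ", ".join(numeric_cols)
    casts = ", ".join(f"CAST({c} AS FLOAT64) AS {c}" for c in numeric_cols)
    return f"SELECT * EXCEPT ({excepted}), {casts} FROM `{project}.{dataset}.{table}`"

sql = numeric_to_float_sql("my-project", "sales", "orders", ["amount"])
# SELECT * EXCEPT (amount), CAST(amount AS FLOAT64) AS amount
#   FROM `my-project.sales.orders`
```

Materializing that query as a table (or view) gives you a fully FLOAT64 source that imports cleanly.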

July 18, 2018

  • New home page and left nav bar: The new Home page and left nav bar allow for more streamlined access to recent flows and jobs, as well as learning resources. See Home Page.
  • Updated onboarding tutorial: Expanded onboarding tutorial that extends existing workflow to include import and job result guides.
  • New Library page: Manage your datasets and references from the new Library page. See Library Page.
  • Redesigned Jobs page: In the new Jobs page, you can more easily locate and review all jobs to which you have access. See Jobs Page.
  • Introducing pre-defined transformations for common tasks: Through the context panel, you can search across dozens of pre-defined transformations. Select one, and the Transform Builder is pre-populated based on the current context in the data grid or column browser.
  • New Transformer toolbar: New toolbar provides faster access to common transformations and operations. See Transformer Toolbar.
  • Match your recipe to the target: Assign a new target to your recipes to provide matching guidance during wrangling. See Overview of Target Matching.
    • Targets assigned to a recipe appear in a column header overlay to assist you in aligning your dataset's schema to the target schema. See Data Grid Panel.
  • Cancel sampling jobs: Cancel in-progress sampling jobs. See Samples Panel.
  • Improved column matching: Better intelligence for column matching during union operations. See Union Page.
  • Improved Join page: Numerous functional improvements to the Join page. See Join Page.
  • More flexible column names: Support for a broader range of characters in column names. See Rename Columns.
  • Share flows: Collaborate with other users through shared flows within the same GCP project. Or send them a copy for their own use.
    NOTE: This feature may not be immediately available in your user account or in your collaborators' accounts. Please check again in a few days. For more information, see Overview of Sharing.
  • Import/Export Flows: Export flows and import them into a GCP project for flows created in Cloud Dataprep by TRIFACTA®.
    • See Export Flow.
    • See Import Flow.
    • You can also export the dependencies of an executed job as a separate flow. See Flow View Page.
    • You can only import flows that are exported from Cloud Dataprep by TRIFACTA® of the same version.
  • Introducing dynamic datasets with parameters: Use parameterized rules in imported dynamic datasets to allow scheduled jobs to automatically pick up the right input data. See Overview of Parameterization.

Datasets page: The Datasets page has been replaced by the new Library page. See Library Page.

Aggregate transform: The aggregate transform has been removed from the platform.

  • Aggregate functionality has been integrated into pivot, so you can accomplish the same tasks.

    NOTE: All prior functionality for the Aggregate transform is supported in the new release using the Pivot transform.

  • In the Search panel, enter pivot. See Search Panel.

  • TD-31305: Copying a flow invalidates the samples in the new copy. Copying or moving a node within a flow invalidates the node's samples.
    • This issue also applies to flows that were upgraded from a previous release.
    • Workaround: Recreate the samples after the move or copy.
  • TD-31252: Assigning a target schema through the Column Browser does not refresh the page.
    • Workaround: To update the page, reload the page through the browser.
  • TD-31165: Job results are incorrect when a sample is collected and then the last transform step is undone.
    • Workaround: Recollect a sample after undoing the transform step.
  • TD-30857: Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.
    • Workaround: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.
  • TD-28807: You may receive a Nothing Found message when navigating to a BigQuery project that contains data. With your BigQuery administrator, please verify that the service account in use has been properly set up and has the appropriate permissions so that you can use the project.
  • TD-31339: Writing to a single file in the top-level directory fails if the temporary output generates more than 32 files.
  • TD-29149: Columns containing String values with leading spaces are incorrectly type cast to Integer data type.
  • TD-28930: Delete other columns causes column lineage to be lost and reorders columns.
  • TD-26069: Photon evaluates date(yr, month, 0) as first date of the previous month. It should return a null value.
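The guidance for TD-30857 above (prefer literal top-level directories over wildcards) can be illustrated with Python's fnmatch. This is a sketch of the matching semantics only; the real performance benefit comes from the platform being able to skip entire directory subtrees when the top level is literal. The paths are made up for illustration:

```python
from fnmatch import fnmatch

paths = [
    "logs/2018/07/18/events.csv",
    "logs/2018/07/18/errors.csv",
    "archive/2017/old.csv",
]

# Broad pattern: a wildcard in the top-level directory means every
# path in the store has to be tested against the pattern.
broad = [p for p in paths if fnmatch(p, "*/2018/07/*/events.csv")]

# Specific pattern: a literal top-level directory ("logs/") lets
# matching prune everything outside that directory early.
specific = [p for p in paths if fnmatch(p, "logs/2018/07/*/events.csv")]

# Both patterns select the same file here, but the specific one
# narrows the search space.
```

In a dataset with parameters, the same principle applies to each pattern you define: the fewer paths a wildcard can possibly touch, the faster matching completes.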

May 23, 2018

  • Product Name Change: As of this release, the product is now known as Cloud Dataprep by TRIFACTA®.
  • GDPR: The product is now compliant with GDPR regulations in the European Union. This regulation provides enhanced data privacy requirements for users. For more information, see https://www.eugdpr.org/.

    As part of this compliance, Cloud Dataprep by TRIFACTA has updated Terms of Service and Privacy Policy for all users, effective immediately:

TD-28807: You may receive a Nothing Found message when navigating to a BigQuery project that contains data. With your BigQuery administrator, please verify that the service account in use has been properly set up and has the appropriate permissions so that you can use the project.

April 25, 2018

When a user disables Cloud Dataprep, all metadata associated with Cloud Dataprep will be deleted. This operation is not reversible (see Effect of disabling Cloud Dataprep by Trifacta).

January 23, 2018

  • Announcing Cloud Dataprep Beta 5 release. The following is a list of release features, changes, deprecations, issues, and fixes:
  • New Flow View page: New objects in Flow View and better organization of them. See Flow View Page.
  • BigQuery read/write access across projects:
    • Read from BigQuery tables associated with GCP projects other than the current one where Cloud Dataprep was launched.
    • Write results into BigQuery tables associated with other projects.
    • You must configure the Cloud Dataprep and Cloud Dataflow service accounts to have read or write access to BigQuery datasets and tables outside of the current GCP project.
  • Re-run job on Cloud Dataflow:
    • After you run a job in Cloud Dataprep, you can re-run the job directly from the Cloud Dataflow interface.
    • Inputs and outputs are parameters that you can modify.
    • Operationalize the job with a third-party scheduling tool.
    • See Run Job on Dataflow.
  • Cross joins: Perform cross joins between datasets. See Join Page.
  • Enable or disable type inference on files and tables: Enable (default) or disable initial type inference for BigQuery tables or Avro files used as sources for individual datasets. See Import Data Page.
  • Batch column rename: Rename multiple columns in a single transformation step. See Rename Columns.
  • Reuse your common patterns: Browse and select patterns for re-use from your recent history. See Pattern History Panel.
  • Convert phone and date patterns:
    • In Column Details, you can select a phone number or date pattern to generate suggestions for standardizing the values in the column to a single format.
    • See Column Details Panel.
  • New string comparison functions:
  • New SUBSTITUTE function: Replace string literals or patterns with a new literal or column value. See SUBSTITUTE Function.
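A rough Python equivalent of the SUBSTITUTE behavior described above, applied to a whole column of values. The Wrangle function itself operates on columns within a recipe; this stdlib sketch is only an illustration of the replace-literal-or-pattern idea:

```python
import re

def substitute(values, pattern, replacement):
    """Replace a string literal or regex pattern in each cell of a
    column, loosely mirroring the SUBSTITUTE function."""
    return [re.sub(pattern, replacement, v) for v in values]

phones = ["555.1234", "555.9876"]
substitute(phones, r"\.", "-")  # ["555-1234", "555-9876"]
```

Plain string literals work the same way once regex metacharacters are escaped (re.escape), which is the usual distinction between literal and pattern modes.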

New Flow Objects: The objects in your flow have been modified and expanded to provide greater flexibility in flow definition and re-use:

  • References: Create references to the outputs of your recipes and use them as inputs to other recipes.
  • Output object: Specify individual publishing outputs in a separate object associated with a recipe. Publishing options include format, location, and data type.
  • For more information, see Object Overview.

Wrangled Datasets: Wrangled datasets are no longer objects in Cloud Dataprep. Their functionality has been moved to other and new objects. For more information, see Object Overview

  • TD-28155: Sampling from an Avro file on Cloud Dataflow always scans the entire file. As a result, additional processing costs may be incurred.
  • TD-26069: Photon evaluates date(yr, month, 0) as first date of the previous month. It should return a null value.
  • TD-27568: Cannot select BigQuery publishing destinations that are empty databases.
  • TD-25733: Attempting a union of 12 datasets crashes UI.
  • TD-24793: BigQueryNotFoundException errors were incorrectly reported for output tables that had been moved or deleted by the user.
  • TD-24130: Cannot read recursive directory structures with files at different levels of folder depth in Cloud Dataflow.
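TD-26069 above concerns DATE(yr, month, 0): Photon returns the first date of the previous month, while the expected behavior is a null value, since day 0 is not a valid date component. A stdlib Python sketch of the expected (null-returning) behavior; the function name is illustrative:

```python
from datetime import date

def safe_date(year, month, day):
    """Return a date, or None when the components are invalid,
    which is the behavior TD-26069 says DATE(yr, month, 0) should have."""
    try:
        return date(year, month, day)
    except ValueError:
        return None

safe_date(2018, 1, 0)   # None (day 0 is invalid)
safe_date(2018, 1, 15)  # date(2018, 1, 15)
```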

November 02, 2017

  • Announcing a Cloud Dataprep release, which highlights a revamped UI, scheduling, improved sampling, and several other minor features. The following is a list of release features, changes, deprecations, issues, and fixes:
  • Interactive Getting Started Tutorial for New Users: New users to Cloud Dataprep can review the "Getting Started 101" tutorial with pre-loaded data through the product.
  • Scheduling: Schedule execution of one or more wrangled datasets within a flow. Scheduled jobs must be configured from Flow View. See Flow View Page.
  • New Transformer page: New navigation and layout for the Transformer page simplifies working with data and increases the area of the data grid. See Transformer Page.
  • Transformation suggestions are now displayed in a right-side panel, instead of on the bottom of the page. A preview for a transformation suggestion is displayed only when you hover over the suggestion.
  • Improved sampling: Enhanced sampling methods provide access to customizable, task-oriented subsets of your data. See Samples Panel.
  • Improved Transformer loading due to persistence of initial sample. For more information on the new sampling methods, see Overview of Sampling.
  • Improved Flow View: Improved user experience with flows. See Flow View Page.
  • Disable steps: Disable individual steps in your recipes. See Recipe Panel.
  • Set encoding settings during import: You can define per-file import settings including file encoding type and automated structure detection. See Import Dataset Page.
  • Snappy compression: Read/write support for Snappy compression. See Supported File Formats.
  • Column lineage: Highlight the recipe steps where a specific column is referenced. See Column Menus.
  • Search for columns: Search for columns by name. See Data Grid Panel.
  • CASE Function: Build multi-conditional expressions with a single CASE statement. See CASE Function.
  • Support for BigQuery Datetime: Publish Cloud Dataprep Datetime values to BigQuery as Datetime or Timestamp values, depending on the data. See BigQuery Data Type Conversions.
  • Supported browser version required: You cannot log in to the application using an unsupported version of Google Chrome.
  • Supported encoding types: The list of supported encoding types has changed.
  • Dependencies Browser: The Dependencies browser has been replaced by the Dataset Navigator.
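The CASE function listed above builds multi-conditional expressions in a single statement. A stdlib Python sketch of the evaluation order (conditions are checked in sequence and the first match wins); the helper name and data are illustrative:

```python
def case(conditions, default=None):
    """Evaluate (predicate, value) pairs in order and return the value
    of the first predicate that holds, mirroring the multi-conditional
    logic of a CASE statement."""
    for predicate, value in conditions:
        if predicate:
            return value
    return default

score = 72
grade = case(
    [(score >= 90, "A"), (score >= 80, "B"), (score >= 70, "C")],
    default="F",
)
# grade == "C"
```

Ordering matters: because 72 also fails the first two predicates, only the first true branch ("C") is returned, exactly as a CASE statement short-circuits.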

Transform Editor: The Transform Editor for entering raw text Wrangle steps has been removed. Please use the Transform Builder for creating transformation steps.

  • TD-27568: Cannot select BigQuery publishing destinations that are empty databases.
  • TD-24312: Improved Error Messages for Google users to identify pre-job run failures. If an error is encountered during the launch of a job but before job execution, you can now view a detailed error message as to the cause in the failed job card. Common errors that occur during the launch of a job include:
    • Cloud Dataflow staging location is not writeable
    • Cloud Dataflow cannot read from and write to different regions
    • Insufficient workers for Cloud Dataflow, please check your quota
  • TD-24273: Circular reference in schema of Avro file causes job in Cloud Dataflow to fail.
  • TD-23635: Read-only BigQuery databases are listed as publishing destinations. Publish fails.
  • TD-26177: Dataflow job fails for large Avro files. Avro datasets that were imported before this release may still have failures during job execution on Dataflow. To fix these failures, you must re-import the dataset.
  • TD-25438: Deleting an upstream reference node does not propagate results correctly to the Transformer page.
  • TD-25419: When a pivot transform is applied, some column histograms may not be updated.
  • TD-23787: When publishing location is unavailable, spinning wheel hangs indefinitely without any error message.
  • TD-22467: Last active sample is not displayed during preview of multi-dataset operations.
  • TD-22128: Cannot read multi-file Avro stream if data is greater than 500 KB.
  • TD-19865: You cannot configure a publishing location to be a directory that does not already exist. See Run Job Page.
  • TD-17657: splitrows transform allows splitting even if required parameter on is set to an empty value.
  • TD-24464: 'Python Error' when opening recipe with large number of columns and a nest transform.
  • TD-24322: Nest transform creates a map with duplicate keys.
  • TD-23920: Support for equals sign (=) in output path.
  • TD-23646: Adding a specific comment appears to invalidate earlier edit.
  • TD-23111: Long latency when loading complex flow views.
  • TD-23099: View Results button is missing on Job Cards even with profiling enabled.
  • TD-22889: Extremely slow UI performance for some actions.

September 21, 2017

Announcing Cloud Dataprep public beta release. See the Cloud Dataprep Documentation.

May 17, 2017

  • The Cloud Dataprep application currently is compatible only with Chrome browsers. More specifically, it is dependent on the PNaCl plugin. Users can confirm that their Chrome environment supports PNaCl by accessing PNaCl demos. If the demos do not work, users may need to adjust their Chrome environment.
  • Cloud Dataprep jobs on Cloud Dataflow can only be started from the Cloud Dataprep UI. Programmatic execution is expected to be supported in a future release.
  • Cloud Dataprep jobs on Cloud Dataflow can only access data within the project.
  • A user may see sources that they have access to but that are not within the selected project. Cloud Dataflow jobs attempted with these sources may fail without warning.
  • Cloud Dataprep flows/datasets are only visible per user, per project. Sharing of flows/datasets is expected in a future release.
  • There is limited mapping for data types when publishing to BigQuery. For example, date/time and array types are written as strings. This will be fixed in a future release.