
New flexibility: Run your Dataprep jobs with BigQuery or Dataflow

April 7, 2021
Sudhir Hasbe

Sr. Director of Product Management, Google Cloud

Cloud Dataprep by Trifacta is Google Cloud’s intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analytics and machine learning. Because it is serverless, Dataprep requires no infrastructure to deploy or manage and is fully scalable.

Designed and built through a partnership between Google and Trifacta, the data engineering platform, Dataprep allows users of all skill levels to prepare data. Dataprep has a graphical user interface, so there is no need to write code to create a pipeline. Additionally, with each UI input, Dataprep automatically predicts and suggests the next data transformation step, making it easy and flexible for the end user. After making all desired transformations, Dataprep users can build repeatable data pipelines that can be executed at scale using the processing power of Google Cloud.

Introducing Dataprep BigQuery pushdown

Speaking of processing power, today we’re excited to announce a new addition to Dataprep: BigQuery pushdown. BigQuery pushdown gives you the flexibility to run jobs using either BigQuery or Dataflow. If you select BigQuery, Dataprep can automatically determine whether data pipelines can be partially or fully translated into a BigQuery SQL statement. Any portions of the pipeline that cannot run in BigQuery are executed in Dataflow. Using BigQuery results in highly efficient data transformations, especially for manipulations such as filters, joins, unions, and aggregations. Dataprep BigQuery pushdown leads to better performance, optimized costs, and increased security with IAM and OAuth support.
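To picture the split between BigQuery and Dataflow, here is a minimal Python sketch of a toy planner. It is not Dataprep's actual logic: the `Step` type, the `PUSHDOWN_OPS` set, the `plan` function, and the sample pipeline are all hypothetical. It also assumes one simple rule, that the longest leading run of SQL-translatable steps is pushed down and everything after it runs in Dataflow.

```python
from dataclasses import dataclass

# Hypothetical set of operations assumed to translate to SQL. The post names
# filters, joins, unions, and aggregations as good fits for BigQuery.
PUSHDOWN_OPS = {"filter", "join", "union", "aggregate"}


@dataclass(frozen=True)
class Step:
    op: str      # kind of transformation, e.g. "filter"
    detail: str  # free-text description, for illustration only


def plan(steps):
    """Split a pipeline into (bigquery_steps, dataflow_steps).

    Assumed rule: the longest leading run of translatable steps is pushed
    down; everything from the first untranslatable step onward stays in
    Dataflow.
    """
    split = 0
    for step in steps:
        if step.op not in PUSHDOWN_OPS:
            break
        split += 1
    return steps[:split], steps[split:]


# Hypothetical pipeline: two SQL-friendly steps, then a step assumed to have
# no SQL translation, then another SQL-friendly step.
pipeline = [
    Step("filter", "status = 'active'"),
    Step("join", "orders on customer_id"),
    Step("custom_udf", "parse free-text address"),
    Step("aggregate", "sum(amount) by region"),
]

bigquery_steps, dataflow_steps = plan(pipeline)
print([s.op for s in bigquery_steps])  # ['filter', 'join']
print([s.op for s in dataflow_steps])  # ['custom_udf', 'aggregate']
```

In this sketch, a pipeline made only of supported operations is pushed down in full, which corresponds to the "fully translated" case, while the sample pipeline above is the "partially translated" case.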

Figure: Dataprep leverages BigQuery or Dataflow to wrangle data

In today’s data-focused world, the modern equivalent of ETL (Extract, Transform, Load) is ELT (Extract, Load, Transform). The ELT framework enables data-savvy business users to transform data themselves, rather than relying on IT teams to both transform and load it. With ELT, technical teams can handle the logistics of moving and managing data in BigQuery, while business users can leverage the power of SQL for intermediate transformations. This reduces the number of cycles between the technical and business teams for finalizing data transformation requirements, and reduces the burden on the technical teams. Dataprep BigQuery pushdown complements the ELT framework, helping both data-savvy business users and technically focused IT teams with data transformation and data preparation tasks.
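As a rough illustration of the "T" in ELT, the Python sketch below builds the kind of SQL statement that transforms data already loaded into the warehouse. It only builds the string and does not connect to BigQuery. The `build_transform_sql` helper and all table and column names are hypothetical, and the statement is not the SQL that Dataprep generates.

```python
def build_transform_sql(source, target, predicate, group_col, sum_col):
    """Return a SQL statement that filters and aggregates `source` into `target`.

    Names are interpolated directly into the string, so this sketch is only
    suitable for trusted, hard-coded identifiers.
    """
    return (
        f"CREATE OR REPLACE TABLE `{target}` AS\n"
        f"SELECT {group_col}, SUM({sum_col}) AS total_{sum_col}\n"
        f"FROM `{source}`\n"
        f"WHERE {predicate}\n"
        f"GROUP BY {group_col}"
    )


# Hypothetical tables: raw data was loaded first, and the transform runs in
# the warehouse afterwards.
sql = build_transform_sql(
    source="my_project.raw.orders",
    target="my_project.analytics.orders_by_region",
    predicate="status = 'active'",
    group_col="region",
    sum_col="amount",
)
print(sql)
```

The point of the sketch is that the filter and aggregation run where the data already lives, so nothing has to be extracted and reloaded for this step.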

In summary, the new BigQuery pushdown on Dataprep enables faster data transformations, optimized costs, and increased flexibility. You can learn more about Dataprep BigQuery pushdown and get started today by trying Dataprep on the Google Cloud Marketplace.
