Dataprep by Trifacta

An intelligent cloud data service to visually explore, clean, and prepare data for analysis and machine learning.

Intelligent data preparation

Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning. Because Dataprep is serverless and works at any scale, there is no infrastructure to deploy or manage. With each UI interaction, Dataprep suggests and predicts your next ideal data transformation, so you don’t have to write code.

Serverless simplicity

Dataprep is an integrated partner service operated by Trifacta and based on their industry-leading data preparation solution. Google works closely with Trifacta to provide a seamless user experience that removes the need for up-front software installation, separate licensing costs, or ongoing operational overhead. Dataprep is fully managed and scales on demand to meet your growing data preparation needs so you can stay focused on analysis.

Fast exploration and anomaly detection

Understand and explore data instantly with visual data distributions. Dataprep automatically detects schemas, data types, possible joins, and anomalies such as missing values, outliers, and duplicates, so you can skip the time-consuming work of assessing data quality and move straight to exploration and analysis.
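To make those anomaly types concrete, here is a minimal plain-Python sketch that flags missing values, duplicate values, and outliers in a single numeric column. It uses the common interquartile-range (IQR) rule for outliers; that rule, the 1.5 multiplier, and the helper name `find_anomalies` are illustrative assumptions, not a description of how Dataprep detects anomalies internally.

```python
import statistics
from collections import Counter
from typing import Optional


def find_anomalies(values: list[Optional[float]], k: float = 1.5) -> dict:
    """Flag missing values, duplicates, and IQR outliers in one column.

    Hypothetical helper for illustration only; Dataprep's own detection
    logic is not public and may differ.
    """
    # Missing values: positions that hold None.
    missing = [i for i, v in enumerate(values) if v is None]
    present = [v for v in values if v is not None]

    # Duplicates: values that occur more than once.
    counts = Counter(present)
    duplicates = sorted(v for v, n in counts.items() if n > 1)

    # Outliers: values outside [Q1 - k*IQR, Q3 + k*IQR].
    # statistics.quantiles needs at least two data points.
    outliers = []
    if len(present) >= 2:
        q1, _, q3 = statistics.quantiles(present, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        outliers = [v for v in present if v < low or v > high]

    return {"missing": missing, "duplicates": duplicates, "outliers": outliers}


print(find_anomalies([10.0, 12.0, None, 11.0, 12.0, 95.0, 13.0]))
```

In the sample column, the `None` at index 2 is reported as missing, `12.0` as a duplicate, and `95.0` as an outlier.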

Easy and powerful data preparation

With each gesture in the UI, Dataprep automatically suggests and predicts your next ideal data transformation. Once you’ve defined your sequence of transformations, Dataprep uses Dataflow or BigQuery under the hood, enabling you to process structured or unstructured datasets of any size with the ease of clicks, not code.

Dataprep features

Dataprep is available in Starter, Professional, and Enterprise editions.

Predictive transformation

Dataprep uses a proprietary inference algorithm to interpret the transformation intent behind a user’s data selection. It then automatically generates a ranked set of suggested transformations and patterns that match the selection.

Rich transformations

Leverage hundreds of transformation functions to turn your data into the asset you want. With a click of a mouse, apply aggregation, pivot, unpivot, joins, union, extraction, calculation, comparison, condition, merge, regular expressions, and more.
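Unpivot is one of the less familiar operations in that list. As a rough illustration of what it does (not Dataprep's implementation or recipe syntax), this plain-Python sketch turns a wide table with one column per month into a long table with one row per month; the helper name `unpivot` and the sample columns are hypothetical.

```python
def unpivot(rows: list[dict], id_cols: list[str],
            var_name: str = "key", value_name: str = "value") -> list[dict]:
    """Reshape wide rows into long rows (hypothetical helper).

    Every column not listed in id_cols becomes its own output row,
    carrying the id columns along.
    """
    long_rows = []
    for row in rows:
        ids = {c: row[c] for c in id_cols}
        for col, val in row.items():
            if col not in id_cols:
                long_rows.append({**ids, var_name: col, value_name: val})
    return long_rows


wide = [
    {"store": "A", "jan": 10, "feb": 12},
    {"store": "B", "jan": 7, "feb": 9},
]
for r in unpivot(wide, id_cols=["store"], var_name="month", value_name="sales"):
    print(r)
```

Pivot is the inverse: it folds the `month`/`sales` pairs back into one column per month.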

Optimized processing throughput

Dataprep automatically selects the best underlying Google Cloud processing engine to transform the data as fast as possible. Depending on where the data lives and how much of it there is, Dataprep runs the job in BigQuery (in-place ELT transforms), in Dataflow, or, for small volumes, in its own in-memory engine.

Active profiling

Explore your data through interactive visual distributions that assist in discovery, cleansing, and transformation. Visual representations help you interpret large volumes of data, and Dataprep’s profiling techniques present key statistical information in a dynamic, easy-to-consume format.

Data quality rules

Data quality rules suggest indicators for monitoring and remediating the accuracy, completeness, consistency, validity, and uniqueness of your data, giving you a comprehensive view of its cleanliness.
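A data quality indicator is essentially a ratio computed over a column. The sketch below, assuming rules expressed as plain Python predicates (a hypothetical representation, not Dataprep's rule syntax), computes three of the indicators named above: completeness (share of rows with a value), uniqueness (share of distinct values among those present), and validity (share of present values that pass a check). The e-mail pattern is a deliberately simplified assumption.

```python
import re
from typing import Callable, Optional


def quality_report(values: list[Optional[str]],
                   is_valid: Callable[[str], bool]) -> dict:
    """Return completeness, uniqueness, and validity ratios (0 to 1) for one column.

    Hypothetical helper for illustration; None and "" count as missing.
    """
    total = len(values)
    present = [v for v in values if v not in (None, "")]
    if not present:
        # Nothing to measure: report zero for every indicator.
        return {"completeness": 0.0, "uniqueness": 0.0, "validity": 0.0}
    return {
        # Share of rows that have a value.
        "completeness": len(present) / total,
        # Share of present values that are distinct.
        "uniqueness": len(set(present)) / len(present),
        # Share of present values that pass the rule.
        "validity": sum(1 for v in present if is_valid(v)) / len(present),
    }


def looks_like_email(s: str) -> bool:
    """Simplified, assumed e-mail check used as an example validity rule."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]+", s) is not None


emails = ["a@x.com", "b@x.com", None, "a@x.com", "not-an-email"]
print(quality_report(emails, looks_like_email))
```

Monitoring then amounts to recomputing these ratios on each run and alerting when one drops below a threshold you choose.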

Collaboration

In team environments, it helps to let multiple users work on the same assets, or to copy good-quality work so it can serve as a template for others. Dataprep enables users to collaborate on the same flow objects in real time or to create copies that others can use for independent work.

Comprehensive connectivity

In addition to standard connectivity for BigQuery, Cloud Storage, Microsoft Excel, and Google Sheets, enrich your self-service analytics with hundreds of data sources such as Salesforce, Oracle, Microsoft SQL Server, MySQL, PostgreSQL, and many more.

Data pipeline orchestration

Schedule and automate your data preparation jobs by chaining them together in sequential and conditional order. Notify users of success or failure, and trigger external tasks (such as Cloud Functions). Use comprehensive APIs to integrate Dataprep into an enterprise’s end-to-end solution.
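As an example of API integration, the sketch below builds (but does not send) an HTTP request that asks Dataprep to run a job for one recipe. The endpoint `https://api.clouddataprep.com/v4/jobGroups`, the body shape `{"wrangledDataset": {"id": ...}}`, and bearer-token authentication reflect our understanding of the Dataprep v4 API and are assumptions to verify against the current API reference; the recipe ID and token are placeholders.

```python
import json
import urllib.request

# Assumed base URL for the Dataprep v4 API; verify against the API reference.
API_BASE = "https://api.clouddataprep.com/v4"


def build_run_job_request(recipe_id: int, access_token: str) -> urllib.request.Request:
    """Build a POST request that would start a job for one recipe.

    The request is only constructed here; pass it to
    urllib.request.urlopen() to actually send it.
    """
    # Assumed request body: the recipe is identified as a "wrangledDataset".
    body = json.dumps({"wrangledDataset": {"id": recipe_id}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/jobGroups",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: substitute a real recipe ID and access token.
req = build_run_job_request(recipe_id=123, access_token="YOUR_TOKEN")
print(req.get_method(), req.full_url)
```

A scheduler or Cloud Function could send one such request per job, checking each response before starting the next job in the chain.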

Enterprise-scale operationalization

Adopt a continuous deployment practice with recipe import/export across editions and versions, flow parameters, custom configuration for Dataflow or BigQuery, performance tuning, and advanced APIs to automate software development life cycles and monitoring.

Common data types

Transform structured or unstructured datasets stored in CSV, JSON, relational table formats, or SaaS application data of any size—megabytes to petabytes—with equal ease and simplicity.

Pattern matching

Utilize columnar pattern matching to identify data patterns of interest to you and to surface them in the interface for use in building your recipes. Additionally, in your recipe steps, you can apply regular expressions or Dataprep patterns to locate patterns and transform the matching data in your datasets.
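Applying a regular expression in a recipe step amounts to locating matching values and rewriting them. A minimal plain-Python equivalent using the standard `re` module, which reformats US-style phone numbers into one consistent shape, might look like this. The pattern and target format are illustrative assumptions, and Dataprep patterns are a separate syntax not shown here.

```python
import re

# Illustrative pattern: a 10-digit US phone number with optional punctuation,
# e.g. "(555) 123-4567", "555.123.4567", or "5551234567".
PHONE = re.compile(r"^\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})$")


def standardize_phone(value: str) -> str:
    """Rewrite matching values as NNN-NNN-NNNN; leave non-matches unchanged."""
    return PHONE.sub(r"\1-\2-\3", value)


for raw in ["(555) 123-4567", "555.123.4567", "5551234567", "n/a"]:
    print(raw, "->", standardize_phone(raw))
```

Values that do not match, such as `n/a`, pass through untouched, which is usually what you want before deciding separately how to handle them.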

Standardization

Group values by similarities based on spelling or language-independent pronunciation and create standardized clusters of consistent values.
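The idea behind standardization is clustering: values spelled almost alike are grouped and then replaced with one canonical value. The sketch below clusters on spelling only, using a normalized key plus `difflib.SequenceMatcher` similarity with an assumed threshold of 0.85. It does not model pronunciation and is not the algorithm Dataprep uses; `cluster_values` and `normalize` are hypothetical names.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and drop non-alphanumeric characters."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def cluster_values(values: list[str], threshold: float = 0.85) -> dict[str, str]:
    """Map each value to a canonical value chosen from its cluster.

    Greedy single pass: a value joins the first cluster whose seed is similar
    enough, otherwise it starts a new cluster. The most frequent original
    spelling in each cluster becomes the canonical value.
    """
    clusters: list[list[str]] = []
    for v in values:
        key = normalize(v)
        for cluster in clusters:
            if SequenceMatcher(None, key, normalize(cluster[0])).ratio() >= threshold:
                cluster.append(v)
                break
        else:
            clusters.append([v])

    mapping = {}
    for cluster in clusters:
        canonical = Counter(cluster).most_common(1)[0][0]
        for v in cluster:
            mapping[v] = canonical
    return mapping


cities = ["San Francisco", "san francisco", "San Fransisco",
          "Oakland", "San Francisco", "oakland "]
print(cluster_values(cities))
```

The greedy pass is simple but order-dependent; production tools typically use more robust key-collision or nearest-neighbor clustering.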

Sampling

For performance, Dataprep automatically generates one or more samples of the data for display and manipulation in the client application. You can change the size of a sample, its scope, and the method by which it is created.
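One common way to draw a fixed-size random sample in a single pass over data too large to hold in memory is reservoir sampling. The sketch below shows that textbook technique (Algorithm R) as an illustration of what a random sampling method involves; it is not a description of how Dataprep builds its samples, and the helper name is hypothetical.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Return k rows chosen uniformly at random from a stream (Algorithm R).

    Reads the input once and keeps at most k rows in memory. If the stream
    has fewer than k rows, all of them are returned in order.
    """
    rng = random.Random(seed)
    reservoir: list[T] = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (i + 1),
            # evicting a uniformly chosen existing row.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


# Sample 5 rows from a 100,000-row stream without materializing it.
print(reservoir_sample(range(100_000), k=5, seed=42))
```

Passing a fixed `seed` makes the sample reproducible, which is useful when you want every collaborator to see the same rows.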

Advanced security

Extend your existing security standards with per-user data access control, determined by a combination of Google IAM roles and the access rights on BigQuery, Cloud Storage, and Google Sheets.

Dataprep ELT pipeline architecture

The architecture diagram flows from left to right. The Ingestion column holds raw data in BigQuery, Cloud Storage, Google Sheets, Microsoft Excel, databases, applications, and file uploads. The Preparation & Storage column feeds that data into Cloud Dataprep and Dataflow, and the refined data lands in BigQuery and Cloud Storage; beneath it, Governance & Automation covers Data Catalog, Cloud Functions, and Cloud Composer. The Analysis & ML column on the right consumes the results with BigQuery/BigQuery ML, Looker, Google Data Studio, partner BI services (such as Qlik), and Cloud AI Platform.

Dataprep allows us to quickly explore new datasets, and its flexibility supports all our data transformation needs. Data preparation work at Merkle is now completed in minutes, not hours or days, accelerating our data preparation time by 90%.

Henry Culver, IT Architect, Merkle

Pricing

Dataprep is an interactive web application in which users define data preparation rules by interacting with a sample of their data. To run the flow over the complete dataset, you execute it as a Dataprep job (using Dataflow). Pricing has two components: design and execution. Design is priced per project, for an unlimited number of users. The execution price is the Dataflow usage of the jobs you run in Dataprep. Learn more and view complete details on our pricing page in Google Cloud Marketplace.

Take the next step

Start building on Google Cloud with $300 in free credits and 20+ always free products.

Need help getting started?
Work with a trusted partner