Intelligent data preparation
Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning. Because Cloud Dataprep is serverless and works at any scale, there is no infrastructure to deploy or manage. With each UI input, Cloud Dataprep suggests and predicts your next ideal data transformation, so you don't have to write code.
Cloud Dataprep is an integrated partner service operated by Trifacta and based on their industry-leading data preparation solution. Google works closely with Trifacta to provide a seamless user experience that removes the need for up-front software installation, separate licensing costs, or ongoing operational overhead. Cloud Dataprep is fully managed and scales on demand to meet your growing data preparation needs so you can stay focused on analysis.
Fast exploration and anomaly detection
Understand and explore data instantly with visual data distributions. Cloud Dataprep automatically detects schemas, data types, possible joins, and anomalies such as missing values, outliers, and duplicates so you get to skip the time-consuming work of assessing your data quality and go right to the exploration and analysis.
Easy and powerful data preparation
With each gesture in the UI, Cloud Dataprep automatically suggests and predicts your next ideal data transformation. Once you’ve defined your sequence of transformations, Cloud Dataprep uses Cloud Dataflow under the hood, enabling you to process structured or unstructured datasets of any size with the ease of clicks, not code.
Cloud Dataprep Features
Standard and Premium editions
Cloud Dataprep uses a proprietary inference algorithm to interpret the data transformation intent of a user's data selection. It automatically generates a ranked set of suggestions and matching patterns for the selection.
Leverage hundreds of transformation functions to turn your data into the asset you want. With a click of a mouse, apply aggregations, pivots, unpivots, joins, unions, extractions, calculations, comparisons, conditions, merges, regular expressions, and more.
Execute a recipe across multiple instances of identical datasets by parameterizing a variable to replace the parts of the file path that change with each refresh. This variable can be modified as needed at job runtime.
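The idea behind path parameterization can be illustrated with a short Python sketch. The bucket name, variable names, and path layout here are hypothetical, not Dataprep's actual parameter syntax:

```python
from string import Template

# Hypothetical parameterized path: ${region} and ${date} change per run,
# while the rest of the path stays fixed across identical datasets.
PATH_TEMPLATE = Template("gs://my-bucket/sales/${region}/orders-${date}.csv")

def resolve_path(region: str, date: str) -> str:
    """Substitute runtime values into the parameterized file path."""
    return PATH_TEMPLATE.substitute(region=region, date=date)

print(resolve_path("emea", "2023-09-01"))
```

At job runtime, the same recipe can then be pointed at a different file simply by supplying different variable values.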
In team environments, it is often useful for multiple users to work on the same assets, or to copy high-quality work as templates for others. Cloud Dataprep enables users to collaborate on the same flow objects in real time or to create copies for others to use for independent work.
Utilize columnar pattern matching to identify data patterns of interest to you and to surface them in the interface for use in building your recipes. Additionally, in your recipe steps, you can apply regular expressions or Cloud Dataprep patterns to locate patterns and transform the matching data in your datasets.
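As a rough illustration of pattern-based transformation, here is a standalone Python example that uses a regular expression to find and standardize phone numbers in a column of values. The format chosen is illustrative and not specific to Dataprep:

```python
import re

# Match US-style phone numbers in a few common layouts.
PHONE = re.compile(r"\(?(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})")

def standardize_phone(value: str) -> str:
    """Rewrite any matched phone number into a single consistent format."""
    m = PHONE.search(value)
    return f"({m.group(1)}) {m.group(2)}-{m.group(3)}" if m else value

rows = ["555.123.4567", "(555) 765-4321", "no phone"]
print([standardize_phone(r) for r in rows])
```

In Dataprep, the equivalent step would be expressed as a recipe step with a regular expression or Dataprep pattern, applied across the selected column.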
Group values by similarities based on spelling or language-independent pronunciation and create standardized clusters of consistent values.
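One classic pronunciation-based clustering key is Soundex; the sketch below groups similar-sounding values under a shared code. This is an illustrative technique, not a claim about the specific algorithm Dataprep uses:

```python
from collections import defaultdict

def soundex(word: str) -> str:
    """Compute a basic Soundex code, a pronunciation-based grouping key."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    result, last = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != last:
            result += code
        if ch not in "hw":          # h and w do not reset the last code
            last = code
    return (result + "000")[:4]     # pad/truncate to 4 characters

def cluster(values):
    """Group values whose Soundex codes match."""
    groups = defaultdict(list)
    for v in values:
        groups[soundex(v)].append(v)
    return dict(groups)

print(cluster(["Smith", "Smyth", "Jones", "Jonas"]))
```

Values that land in the same cluster can then be standardized to a single consistent spelling.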
See and explore your data through interactive visual distributions of your data to assist in discovery, cleansing, and transformation. Visual representations help interpret large volumes of data, and Cloud Dataprep’s innovative profiling techniques visualize key statistical information in a dynamic, easy-to-consume format.
For performance optimization, Cloud Dataprep automatically generates one or more samples of the data for display and manipulation in the client application. However, you can easily change the size of samples, the scope of the sample, and the method by which the sample is created.
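A common technique for drawing a fixed-size uniform sample from a dataset too large to hold in memory is reservoir sampling. The sketch below shows the general idea; it is not a description of Dataprep's internal sampling methods:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Return k items sampled uniformly from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)           # fill the reservoir first
        else:
            j = rng.randint(0, i)         # replace with decreasing probability
            if j < k:
                sample[j] = item
    return sample

print(reservoir_sample(range(1_000_000), 5))
```

The same loop works whether the stream has a thousand rows or a billion, which is what makes sampling practical at scale.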
Schedule the execution of recipes in your flows on a recurring or as-needed basis. When the scheduled job successfully executes, you can collect the wrangled output in the specified output location, where it is available in the published form you specify.
Define target schemas, through imported or created datasets, and assign to an existing recipe to systematize and speed up your wrangling efforts. Targets appear in the Transformer page and can be applied against the entire dataset or selected columns of the dataset you need to wrangle.
Common data types
Transform structured or unstructured datasets, stored in CSV, JSON, or relational table formats, of any size — megabytes to petabytes — with equal ease and simplicity.
Integrated with Google Cloud Platform
Process data stored in Cloud Storage, BigQuery, or from your desktop, then export refined data to BigQuery or Cloud Storage for storage, analysis, visualization, or machine learning. User access and data security are seamlessly managed with Cloud Identity and Access Management.
In addition to BigQuery, Cloud Storage, Microsoft Excel, and Google Sheets standard connectivity, enrich your self-service analytics with Salesforce, Oracle, Microsoft SQL Server, MySQL, and PostgreSQL data sources.
Data pipeline orchestration
Raise your automation capabilities by chaining data preparation jobs together in sequential and conditional order. Alert users of success or failure, and trigger external tasks (such as Cloud Functions). Leverage comprehensive APIs to integrate Cloud Dataprep as part of an enterprise’s end-to-end solution.
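The chaining pattern described above can be sketched in a few lines of Python. The job functions and callback here are hypothetical placeholders standing in for Dataprep jobs and a Cloud Function trigger, not actual Dataprep API calls:

```python
from typing import Callable, Iterable

def run_pipeline(jobs: Iterable[Callable[[], bool]],
                 on_done: Callable[[str, bool], None]) -> bool:
    """Run jobs in sequence; notify after each one and halt on failure."""
    for job in jobs:
        ok = job()
        on_done(job.__name__, ok)   # e.g., alert users or trigger a function
        if not ok:                  # conditional chaining: stop the sequence
            return False
    return True

# Placeholder jobs for illustration.
def ingest(): return True
def clean(): return True
def publish(): return True

events = []
run_pipeline([ingest, clean, publish],
             lambda name, ok: events.append((name, ok)))
print(events)
```

In a real deployment, the `on_done` hook is where success/failure alerts or an external task such as a Cloud Function invocation would go.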
Enterprise scale operationalization
Adopt a continuous deployment practice with recipe import/export across editions and versions, flow parameters, custom configuration for Google Dataflow performance tuning, and advanced APIs to automate software development life cycles and monitoring.
Data quality rules
Data quality rules suggest data quality indicators to monitor and remediate the accuracy, completeness, consistency, validity, and uniqueness of the data, ensuring that you have a comprehensive view of the cleanliness of your data.
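Two of the indicators mentioned above, completeness and uniqueness, reduce to simple ratios. The functions below are an illustrative sketch of the concepts, not Dataprep's actual rule engine:

```python
def completeness(values):
    """Share of values that are present (non-null)."""
    return sum(v is not None for v in values) / len(values)

def uniqueness(values):
    """Share of present values that are distinct."""
    present = [v for v in values if v is not None]
    return len(set(present)) / len(present)

col = ["a", "b", "b", None]
print(completeness(col), uniqueness(col))
```

A rule might then flag the column whenever either ratio drops below a chosen threshold.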
Expand on current security standards with individual data access control, combining Google IAM roles with BigQuery, Cloud Storage, and Google Sheets access rights.
Cloud Dataprep is an interactive web application in which users define data preparation rules by interacting with a sample of their data. To execute the flow over the complete dataset, the flow can be run as a Cloud Dataprep job (using Google Cloud Dataflow). Pricing is split across two components: design and execution. Design is priced on a per-project basis for an unlimited number of users. The execution price consists of the Dataflow usage for running jobs in Dataprep. Learn more and view complete details on our pricing page in the Google Cloud Marketplace.