Dataprep is an interactive web application in which users define the data preparation rules by interacting with a sample of their data. Use of the application is free. Once a data preparation flow has been defined, the sample can be exported for free or the flow can be executed as a Dataprep job (using Google Dataflow) over the original dataset.
Pricing before March 15, 2019
Each Dataprep job is billed in Cloud Dataprep Units, as a multiple of the execution cost of the Dataflow job that performs the data transformation.
| Dataprep flow execution price |
| --- |
| 1.16 × (cost of the Dataflow job that executed the Dataprep flow)¹ |
¹ Dataprep jobs can execute with different resource configurations to optimize performance and efficiency, but the default Dataflow job configuration is typical.
To monitor or calculate the cost of a Dataprep job, navigate to the Dataflow monitoring page for your Dataprep job and note the resource consumption metrics (vCPU, memory, storage, and so on). Calculate the equivalent Dataflow cost, then multiply that cost by 1.16.
Dataprep jobs are charged in execution units, which are composed of vCPU, memory, storage, and other resources.
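The calculation above can be sketched as a small helper. The Dataflow resource rates below are illustrative placeholders, not official prices; substitute the current rates from the Dataflow pricing page for your region:

```python
# Estimate a Dataprep job charge from the Dataflow resource consumption
# metrics shown on the Dataflow monitoring page.
DATAPREP_MULTIPLIER = 1.16  # Dataprep is billed at 1.16x the Dataflow job cost

def dataprep_job_cost(vcpu_hours, memory_gb_hours, storage_gb_hours,
                      vcpu_rate=0.056, memory_rate=0.003557,
                      storage_rate=0.0000054):
    """Return the estimated Dataprep charge in USD.

    The per-hour rates are HYPOTHETICAL examples; read the real values
    from the Dataflow pricing documentation.
    """
    dataflow_cost = (vcpu_hours * vcpu_rate
                     + memory_gb_hours * memory_rate
                     + storage_gb_hours * storage_rate)
    return DATAPREP_MULTIPLIER * dataflow_cost
```

The 1.16 multiplier is the only number fixed by the pricing model; everything else depends on the resources the underlying Dataflow job actually consumed.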
In addition to the Dataprep charges, a job may consume other Google Cloud Platform resources, each billed at its own pricing.
See the Google Cloud Platform Pricing Calculator to estimate Google Cloud Platform resource pricing.
Pricing on and after March 15, 2019
When you submit a job to Dataprep, it is executed by Dataflow workers. Starting on March 15, 2019, Dataprep will be billed according to the number of Dataflow worker virtual CPUs (vCPUs) that are needed to process a job and the time that the vCPUs are used.
As an example, consider a Dataprep job that runs for 1 hour and requires 5 Dataflow virtual CPUs. At the rate of $0.60 per vCPU per hour, the price of this job is:
Dataprep job cost = 1 hour * $0.60 * 5 vCPUs
Dataprep job cost = $3.00
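The worked example above reduces to a single multiplication; a minimal sketch, assuming the $0.60 per vCPU-hour rate used in the example:

```python
VCPU_HOUR_RATE = 0.60  # USD per vCPU per hour, as in the example above

def dataprep_job_cost(hours, vcpus, rate=VCPU_HOUR_RATE):
    """Estimate a Dataprep job cost under vCPU-hour pricing."""
    return hours * rate * vcpus

# The 1-hour, 5-vCPU job from the example:
print(dataprep_job_cost(1, 5))  # $3.00
```

In practice `hours` and `vcpus` come from how Dataflow scales the job, so treat this as an estimate rather than a guaranteed charge.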
The number of workers used and the length of time they are used will depend on how Dataflow is configured to run the job. You can refer to the Dataflow Documentation and the Google Cloud Platform Pricing Calculator to estimate costs for Dataprep jobs.