Dataproc pricing is based on the size of Dataproc clusters and the duration of time that they run. The size of a cluster is based on the aggregate number of virtual CPUs (vCPUs) across the entire cluster, including the master and worker nodes. The duration of a cluster is the length of time between cluster creation and cluster deletion.
Although the pricing tables on this page reflect hourly rates, Dataproc is billed by the second. All Dataproc clusters are billed in one-second clock-time increments, subject to a 1-minute minimum billing. Usage is stated in fractional hours (for example, 30 minutes is expressed as 0.5 hours) in order to apply hourly pricing to second-by-second use.
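These billing rules can be sketched as a small helper that clamps the run time to the 1-minute minimum and converts seconds to fractional hours. This is an illustrative sketch, not part of any Dataproc API; the function name and sample durations are assumptions.

```python
def billable_hours(runtime_seconds: float) -> float:
    """Convert cluster run time to billable fractional hours.

    Billing is per-second with a 1-minute minimum, so any run
    shorter than 60 seconds is still charged as 60 seconds.
    """
    MINIMUM_SECONDS = 60
    return max(runtime_seconds, MINIMUM_SECONDS) / 3600

# 30 minutes of use is expressed as 0.5 hours
print(billable_hours(30 * 60))  # 0.5

# a 10-second cluster is still billed for the 1-minute minimum
print(billable_hours(10))
```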
Dataproc pricing is in addition to the Compute Engine per-instance price for each virtual machine, as described below. Compute Engine resources in a Dataproc cluster are also billed in per-second increments.
Dataproc supports the following Compute Engine instance types in clusters:
- Standard machine types
- High-memory machine types
- High-CPU machine types
- Memory-optimized machine types
- Custom machine types
Standard machine types
Standard machine types have 3.75 GB of RAM per virtual core. Standard machine types are suitable for tasks that have a balanced need for CPU and memory.
High-memory machine types
High-memory machine types have 6.50 GB of RAM per virtual core. High-memory instances are ideal for tasks that require more memory relative to virtual CPUs.
High-CPU machine types
High-CPU machine types have one virtual core for every 0.90 GB of RAM. High-CPU machine types are ideal for tasks that require more virtual CPUs relative to memory.
Memory-optimized machine types
Memory-optimized machine types are ideal for memory-intensive tasks, offering a higher memory-to-vCPU ratio than high-memory machine types: 15 GB of RAM per virtual CPU. See Regions and Zones for a listing of locations where memory-optimized machine types are available.
Custom machine types
Create a custom machine type with a specific number of vCPUs and amount of memory if predefined machine types are not optimal for your workloads. Custom machine types also save you the cost of running on a larger, more expensive machine type if you do not need to use all of the resources of that machine type.
The Dataproc charge for custom machine types depends on the total number of vCPUs for each node.
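As a sketch of that rule, only the vCPU count of a custom node enters the Dataproc charge; memory does not appear in the formula. The node shape below is hypothetical, and the $0.01 per-vCPU rate is the example US rate used later on this page.

```python
# Hypothetical custom machine node: 6 vCPUs, 24 GB RAM.
# Only the vCPU count matters for the Dataproc charge; RAM is
# billed separately as part of the Compute Engine instance price.
DATAPROC_PRICE = 0.01  # $ per vCPU per hour (example US rate)

vcpus = 6
hours = 1
charge = vcpus * hours * DATAPROC_PRICE
print(f"${charge:.2f}")  # $0.06
```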
Use of other Google Cloud resources
As a managed and integrated solution, Dataproc is built on top of other Google Cloud technologies. Dataproc clusters consume the following resources, each billed at its own pricing:
- Compute Engine—All Compute Engine instances in a Dataproc cluster have a 1-minute clock-time minimum and are billed in per-second increments, subject to sustained use pricing.
- Standard Persistent Disk provisioned space
- Cloud Monitoring—see Google Cloud's operations suite Pricing
Dataproc clusters can also optionally use other Google Cloud resources, such as Cloud Storage and BigQuery, each billed at its own pricing.
As an example, consider a cluster (with master and worker nodes) that has the following configuration running in a US zone where the Dataproc price is $0.01 per virtual CPU.
| Item | Machine Type | Virtual CPUs | Attached persistent disk | Number in cluster |
|---|---|---|---|---|
| Master Node | n1-standard-4 | 4 | 500 GB | 1 |
| Worker Nodes | n1-standard-4 | 4 | 500 GB | 5 |
This Dataproc cluster has 24 virtual CPUs, 4 for the master and 20 spread across the workers. For Dataproc billing purposes, the pricing for this cluster would be based on those 24 virtual CPUs and the length of time the cluster ran. If the cluster runs for 2 hours, for example, the Dataproc pricing would use the following formula:
Dataproc charge = # of vCPUs * hours * Dataproc price = 24 * 2 * $0.01 = $0.48
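The same formula can be written out as a short sketch that sums vCPUs across the master and worker nodes. The node shapes mirror the table above; the variable names are illustrative.

```python
# Cluster from the example: one n1-standard-4 master, five n1-standard-4 workers.
nodes = [
    {"role": "master", "vcpus": 4, "count": 1},
    {"role": "worker", "vcpus": 4, "count": 5},
]

DATAPROC_PRICE = 0.01  # $ per vCPU per hour (example US rate)
hours = 2

total_vcpus = sum(n["vcpus"] * n["count"] for n in nodes)  # 4 + 20 = 24
charge = total_vcpus * hours * DATAPROC_PRICE
print(total_vcpus, f"${charge:.2f}")  # 24 $0.48
```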
In this example, the cluster uses other Google Cloud products, which are billed in addition to the Dataproc charge. Specifically, this cluster would incur charges for Compute Engine and Standard Persistent Disk provisioned space in addition to the Dataproc charge. You can use the billing calculator to determine those separate costs based on current rates.