Dataproc pricing
Dataproc on Compute Engine pricing
Dataproc on Compute Engine pricing is based on the size of Dataproc clusters and the duration of time that they run. The size of a cluster is based on the aggregate number of virtual CPUs (vCPUs) across the entire cluster, including the master and worker nodes. The duration of a cluster is the length of time between cluster creation and cluster stopping or deletion.
The Dataproc pricing formula is: $0.010 * # of vCPUs * hourly duration.
Although the pricing formula is expressed as an hourly rate, Dataproc is billed by the second. All Dataproc clusters are billed in one-second clock-time increments, subject to a 1-minute minimum. Usage is expressed in fractional hours (for example, 30 minutes is 0.5 hours) so that the hourly rate can be applied to per-second usage.
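The billing rules above can be sketched in Python. The function below is a minimal sketch that assumes the $0.010 per vCPU-hour list price from the formula and a runtime already measured in whole seconds. The name `dataproc_charge` is hypothetical and is not part of any Google Cloud API. It uses `Decimal` to avoid floating-point rounding on small currency amounts.

```python
from decimal import Decimal

# Assumed list price taken from the formula above; check current pricing for your region.
DATAPROC_PRICE_PER_VCPU_HOUR = Decimal("0.010")
MIN_BILLED_SECONDS = 60  # 1-minute minimum billing


def dataproc_charge(vcpus: int, runtime_seconds: int) -> Decimal:
    """Hypothetical helper: estimate the Dataproc charge for a cluster.

    vcpus: aggregate vCPUs across master and worker nodes.
    runtime_seconds: whole seconds from cluster creation to stop or deletion.
    """
    if vcpus < 0 or runtime_seconds < 0:
        raise ValueError("vcpus and runtime_seconds must be non-negative")
    # Billed in one-second increments, but never less than one minute.
    billed_seconds = max(runtime_seconds, MIN_BILLED_SECONDS)
    # Convert to fractional hours so the hourly rate applies per second.
    hours = Decimal(billed_seconds) / Decimal(3600)
    return DATAPROC_PRICE_PER_VCPU_HOUR * vcpus * hours
```

For example, a 24-vCPU cluster that runs for 30 minutes is billed as 0.5 hours, so `dataproc_charge(24, 1800)` evaluates to $0.12. A run of 10 seconds is billed the same as a run of 60 seconds. This sketch covers only the Dataproc charge and not the underlying Compute Engine or disk charges.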
Dataproc pricing is in addition to the Compute Engine per-instance price for each virtual machine (see Use of other Google Cloud resources).
Pricing example
As an example, consider a cluster (with master and worker nodes) that has the following configuration:
Item | Machine Type | Virtual CPUs | Attached persistent disk | Number in cluster |
---|---|---|---|---|
Master Node | n1-standard-4 | 4 | 500 GB | 1 |
Worker Nodes | n1-standard-4 | 4 | 500 GB | 5 |
This Dataproc cluster has 24 virtual CPUs, 4 for the master and 20 spread across the workers. For Dataproc billing purposes, the pricing for this cluster would be based on those 24 virtual CPUs and the length of time the cluster ran (assuming no nodes are scaled down or preempted). If the cluster runs for 2 hours, the Dataproc pricing would use the following formula:
Dataproc charge = # of vCPUs * hours * Dataproc price = 24 * 2 * $0.01 = $0.48
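The arithmetic in this example can be reproduced from the cluster configuration table. The `NodeGroup` type and the helper names below are hypothetical and exist only for illustration. The "Attached persistent disk" column is left out because disk is billed separately from the Dataproc charge. The sketch also assumes, as the example does, that no nodes are scaled down or preempted.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Sequence

# List price from the pricing formula (assumed current for this sketch).
DATAPROC_PRICE_PER_VCPU_HOUR = Decimal("0.010")


@dataclass(frozen=True)
class NodeGroup:
    """Hypothetical record for one row of the cluster configuration table."""
    role: str
    machine_type: str
    vcpus_per_node: int
    count: int


# The configuration from the pricing example table.
CLUSTER = [
    NodeGroup("master", "n1-standard-4", vcpus_per_node=4, count=1),
    NodeGroup("worker", "n1-standard-4", vcpus_per_node=4, count=5),
]


def total_vcpus(groups: Sequence[NodeGroup]) -> int:
    # Dataproc bills on the aggregate vCPUs across master and worker nodes.
    return sum(g.vcpus_per_node * g.count for g in groups)


def cluster_dataproc_charge(groups: Sequence[NodeGroup], hours: Decimal) -> Decimal:
    # Dataproc charge = # of vCPUs * hours * Dataproc price.
    return total_vcpus(groups) * hours * DATAPROC_PRICE_PER_VCPU_HOUR


vcpus = total_vcpus(CLUSTER)
charge = cluster_dataproc_charge(CLUSTER, Decimal(2))
print(f"{vcpus} vCPUs, Dataproc charge: ${charge:.2f}")
```

Running it prints `24 vCPUs, Dataproc charge: $0.48`, which matches the calculation above.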
In this example, the cluster would also incur Compute Engine and Standard Persistent Disk provisioned space charges in addition to the Dataproc charge (see Use of other Google Cloud resources). You can use the pricing calculator to estimate these separate Google Cloud resource costs.
Use of other Google Cloud resources
As a managed and integrated solution, Dataproc is built on top of other Google Cloud technologies. Dataproc clusters consume the following resources, each billed at its own pricing:
- Compute Engine: all Compute Engine instances in a Dataproc cluster are subject to a 1-minute clock-time minimum and are billed in per-second increments under Compute Engine sustained use pricing rules.
- Standard Persistent Disk provisioned space
- Cloud Monitoring: see Google Cloud Observability Pricing
Dataproc clusters can optionally utilize the following resources, each billed at its own pricing, including but not limited to:
Dataproc on GKE pricing
This section explains the charges that apply only to the virtual Dataproc cluster that runs on a user-managed GKE. See GKE pricing to learn about the added charges that apply to the user-managed GKE cluster.
The Dataproc on GKE pricing formula, $0.010 * # of vCPUs * hourly duration, is the same as the Dataproc on Compute Engine pricing formula. It is applied to the aggregate number of virtual CPUs running in VM instances in Dataproc-created node pools in the cluster. The duration of a virtual machine instance is the length of time from its creation to its deletion. As with Dataproc on Compute Engine, Dataproc on GKE is billed by the second, subject to a 1-minute minimum per virtual machine instance. Other Google Cloud charges apply in addition to Dataproc charges.
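Because the 1-minute minimum applies per VM instance rather than per cluster, the GKE charge is a sum over individual VMs. The sketch below rests on several assumptions. It assumes you already have each VM's vCPU count and its creation and deletion times. The `VmInstance` record and the helper functions are hypothetical. Partial seconds are truncated, which is also an assumption. Only the Dataproc portion of the bill is modeled here, not GKE or Compute Engine charges.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Iterable, Optional

# List price from the pricing formula (assumed current for this sketch).
DATAPROC_PRICE_PER_VCPU_HOUR = Decimal("0.010")
MIN_BILLED_SECONDS = 60  # 1-minute minimum per VM instance


@dataclass(frozen=True)
class VmInstance:
    """Hypothetical record of one VM in a Dataproc-created node pool."""
    vcpus: int
    created: datetime
    deleted: Optional[datetime]  # None while the VM is still running


def vm_charge(vm: VmInstance, now: datetime) -> Decimal:
    # Duration runs from VM creation to deletion (or to `now` if still running).
    end = vm.deleted if vm.deleted is not None else now
    seconds = int((end - vm.created).total_seconds())  # truncate partial seconds
    billed = max(seconds, MIN_BILLED_SECONDS)
    return DATAPROC_PRICE_PER_VCPU_HOUR * vm.vcpus * Decimal(billed) / Decimal(3600)


def node_pool_charge(vms: Iterable[VmInstance], now: datetime) -> Decimal:
    # Sum the per-VM charges; each VM carries its own 1-minute minimum.
    return sum((vm_charge(vm, now) for vm in vms), Decimal(0))


t0 = datetime(2024, 1, 1)
pool = [
    VmInstance(vcpus=4, created=t0, deleted=t0 + timedelta(hours=1)),
    VmInstance(vcpus=4, created=t0 + timedelta(minutes=30), deleted=None),
]
print(node_pool_charge(pool, now=t0 + timedelta(hours=1)))
```

In this example, the first VM contributes one hour of 4 vCPUs and the second contributes half an hour of 4 vCPUs, for a total of $0.06. Summing per VM, rather than multiplying one vCPU total by one duration, reflects that VMs in a node pool can be created and deleted at different times.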
Dataproc-created node pools continue to exist after the Dataproc cluster is deleted, because they may be shared by multiple clusters. If you delete the node pools or scale them down to zero instances, you stop incurring Dataproc charges. Any remaining node pool VMs continue to incur charges until you delete them.
Dataproc Serverless pricing
See Dataproc Serverless pricing.
What's next
- Read the Dataproc documentation.
- Get started with Dataproc.
- Try the Pricing calculator.
- Learn about Dataproc solutions and use cases.