Dataproc Serverless pricing
Dataproc Serverless for Spark pricing is based on the number of Data Compute Units (DCUs), the number of accelerators used, and the amount of shuffle storage used. DCUs, accelerators, and shuffle storage are billed per second, with a 1-minute minimum charge for DCUs and shuffle storage, and a 5-minute minimum charge for accelerators.
Each Dataproc vCPU counts as 0.6 DCU. RAM is charged at different rates below and above 8GB per vCPU: each gigabyte of RAM up to 8GB per vCPU counts as 0.1 DCU, and each gigabyte of RAM above 8GB per vCPU counts as 0.2 DCU. Memory used by Spark drivers and executors and system memory usage are counted towards DCU usage.
By default, each Dataproc Serverless for Spark batch and interactive workload consumes a minimum of 12 DCUs for the duration of the workload: the driver uses 4 vCPUs and 16GB of RAM and consumes 4 DCUs, and each of the 2 executors uses 4 vCPUs and 16GB of RAM and consumes 4 DCUs. You can customize the number of vCPUs and the amount of memory per vCPU by setting Spark properties. No additional Compute Engine VM or Persistent Disk charges apply.
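For illustration, the DCU accounting above can be written as a short calculation. The helper below is only a sketch, not part of any Dataproc API; it assumes the 0.6 DCU per vCPU and tiered per-GB RAM rates described in this section and reproduces the 12-DCU default workload.

def dcus_for_node(vcpus: int, ram_gb: float) -> float:
    """Approximate DCUs for one driver or executor (hypothetical helper).

    Each vCPU counts as 0.6 DCU; RAM up to 8GB per vCPU counts as
    0.1 DCU per GB, and RAM above 8GB per vCPU counts as 0.2 DCU per GB.
    """
    ram_per_vcpu = ram_gb / vcpus
    low_gb = min(ram_per_vcpu, 8) * vcpus       # GB charged at 0.1 DCU/GB
    high_gb = max(ram_per_vcpu - 8, 0) * vcpus  # GB charged at 0.2 DCU/GB
    return vcpus * 0.6 + low_gb * 0.1 + high_gb * 0.2

# Default workload shape: one 4-vCPU/16GB driver and two 4-vCPU/16GB executors.
driver = dcus_for_node(vcpus=4, ram_gb=16)          # 2.4 + 1.6 = 4.0 DCUs
executors = 2 * dcus_for_node(vcpus=4, ram_gb=16)   # 8.0 DCUs
print(driver + executors)                           # 12.0 DCUs total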
Data Compute Unit (DCU) pricing
The DCU rate shown below is an hourly rate. It is prorated and billed per
second, with a 1-minute minimum charge.
Dataproc Serverless for Spark interactive workloads are charged at the Premium rate.
Shuffle storage pricing
The shuffle storage rate shown below is a monthly rate. It is prorated and billed per second, with a 1-minute minimum charge for standard shuffle storage and a 5-minute minimum charge for Premium shuffle storage. Premium shuffle storage can only be used with the Premium compute tier.
Accelerator pricing
The accelerator rate shown below is an hourly rate. It is prorated and billed per
second, with a 5-minute minimum charge.
Pricing example
If the Dataproc Serverless for Spark batch workload runs with 12 DCUs (spark.driver.cores=4, spark.executor.cores=4, spark.executor.instances=2) for 24 hours in the us-central1 region and consumes 25GB of shuffle storage, the price calculation is as follows.
Total compute cost = 12 * 24 * $0.060000 = $17.28
Total storage cost = 25 * ($0.040/30) = $0.03
------------------------------------------------
Total cost = $17.28 + $0.03 = $17.31
Notes:
- The example assumes a 30-day month. Since the batch workload duration is one day, the monthly shuffle storage rate is divided by 30.
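As a cross-check on this arithmetic, the following sketch recomputes the example with a hypothetical helper. It assumes the rates quoted above ($0.060000 per DCU-hour and $0.040 per GB per month for standard shuffle storage in us-central1) and a 30-day month.

def batch_cost(dcus, hours, dcu_hourly_rate, shuffle_gb,
               shuffle_monthly_rate, days_in_month=30):
    """Approximate batch cost: DCU-hours plus a prorated monthly shuffle rate."""
    compute = dcus * hours * dcu_hourly_rate
    storage = shuffle_gb * shuffle_monthly_rate * hours / (24 * days_in_month)
    return compute + storage

# 12 DCUs for 24 hours with 25GB of standard shuffle storage in us-central1.
print(round(batch_cost(dcus=12, hours=24, dcu_hourly_rate=0.060,
                       shuffle_gb=25, shuffle_monthly_rate=0.040), 2))  # ~17.31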
If the Dataproc Serverless for Spark batch workload runs with 12 DCUs and 2 L4 GPUs (spark.driver.cores=4, spark.executor.cores=4, spark.executor.instances=2, spark.dataproc.driver.compute.tier=premium, spark.dataproc.executor.compute.tier=premium, spark.dataproc.executor.disk.tier=premium, spark.dataproc.executor.resource.accelerator.type=l4) for 24 hours in the us-central1 region and consumes 25GB of shuffle storage, the price calculation is as follows.
Total compute cost = 12 * 24 * $0.089000 = $25.632
Total storage cost = 25 * ($0.1/30) = $0.083
Total accelerator cost = 2 * 24 * $0.6720 = $32.26
------------------------------------------------
Total cost = $25.632 + $0.083 + $32.26 = $57.975
Notes:
- The example assumes a 30-day month. Since the batch workload duration is one day, the monthly shuffle storage rate is divided by 30.
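The same arithmetic extends to the accelerator example. The short recomputation below is illustrative only; it assumes the premium DCU, premium shuffle storage, and L4 accelerator rates quoted in the example above.

# Premium-tier batch with 2 L4 GPUs running for 24 hours and consuming
# 25GB of premium shuffle storage, using the example us-central1 rates.
compute = 12 * 24 * 0.089       # ~$25.632 in premium DCU charges
storage = 25 * 0.1 / 30         # ~$0.083 in premium shuffle storage, prorated to one day
accelerators = 2 * 24 * 0.672   # ~$32.26 for the two L4 GPUs
print(round(compute + storage + accelerators, 2))  # ~57.97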
If the Dataproc Serverless for Spark interactive workload runs with 12 DCUs (spark.driver.cores=4, spark.executor.cores=4, spark.executor.instances=2) for 24 hours in the us-central1 region and consumes 25GB of shuffle storage, the price calculation is as follows:
Total compute cost = 12 * 24 * $0.089000 = $25.632
Total storage cost = 25 * ($0.040/30) = $0.03
------------------------------------------------
Total cost = $25.632 + $0.03 = $25.662
Notes:
- The example assumes a 30-day month. Since the batch workload duration is one day, the monthly shuffle storage rate is divided by 30.
Pricing estimation example
When a batch workload completes, Dataproc Serverless for Spark calculates
UsageMetrics,
which contain an approximation of the total DCU, accelerator, and shuffle
storage resources consumed by the completed workload. After running a workload,
you can run the gcloud dataproc batches describe BATCH_ID
command to view workload usage metrics to help you estimate the cost of running
the workload.
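If you prefer to retrieve these metrics programmatically instead of through gcloud, a sketch along the following lines is possible with the google-cloud-dataproc Python client library. The project, region, and batch ID values are placeholders, and the field names are assumed to mirror the REST UsageMetrics fields shown in the example below, so verify them against the client library version you use.

from google.cloud import dataproc_v1

# Hypothetical values; replace with your project, region, and batch ID.
project, region, batch_id = "my-project", "us-central1", "my-batch"

# Dataproc uses regional endpoints for batch operations.
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)
batch = client.get_batch(
    name=f"projects/{project}/locations/{region}/batches/{batch_id}"
)

# Assumed field names, mirroring runtimeInfo.approximateUsage in the REST API.
usage = batch.runtime_info.approximate_usage
print(usage.milli_dcu_seconds, usage.shuffle_storage_gb_seconds)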
Example:
Dataproc Serverless for Spark runs a workload on an ephemeral cluster with one master and two workers. Each node consumes 4 DCUs (the default is 4 DCUs per node; see spark.driver.cores) and 400 GB of shuffle storage (the default is 100GB per core; see spark.dataproc.driver.disk.size).
Workload run time is 60 seconds. Also, each worker has 1 GPU for a total
of 2 across the cluster.
The user runs gcloud dataproc batches describe BATCH_ID --region REGION to obtain usage metrics. The command output includes the following snippet (milliDcuSeconds: 4 DCUs x 3 VMs x 60 seconds x 1000 = 720000, milliAcceleratorSeconds: 1 GPU x 2 VMs x 60 seconds x 1000 = 120000, and shuffleStorageGbSeconds: 400GB x 3 VMs x 60 seconds = 72000):
runtimeInfo:
  approximateUsage:
    milliDcuSeconds: '720000'
    shuffleStorageGbSeconds: '72000'
    milliAcceleratorSeconds: '120000'
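To turn these usage metrics back into a rough cost figure, convert them to DCU-hours, GB-months, and accelerator-hours and multiply by the applicable rates. The sketch below is illustrative only: the helper name is not part of any Dataproc API, and it assumes the premium us-central1 rates quoted in the pricing examples above, since this workload uses GPUs; actual billing may differ from this approximation.

def estimate_cost(milli_dcu_seconds, shuffle_gb_seconds, milli_accelerator_seconds,
                  dcu_hourly_rate, shuffle_monthly_rate, accelerator_hourly_rate,
                  days_in_month=30):
    """Rough cost estimate from a batch's approximateUsage metrics."""
    dcu_hours = milli_dcu_seconds / 1000 / 3600
    accelerator_hours = milli_accelerator_seconds / 1000 / 3600
    shuffle_gb_months = shuffle_gb_seconds / (days_in_month * 24 * 3600)
    return (dcu_hours * dcu_hourly_rate
            + shuffle_gb_months * shuffle_monthly_rate
            + accelerator_hours * accelerator_hourly_rate)

# Metrics from the snippet above, priced with the premium us-central1 example rates.
print(estimate_cost(milli_dcu_seconds=720000, shuffle_gb_seconds=72000,
                    milli_accelerator_seconds=120000, dcu_hourly_rate=0.089,
                    shuffle_monthly_rate=0.1, accelerator_hourly_rate=0.672))  # ~$0.04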
Use of other Google Cloud resources
Your Dataproc Serverless for Spark workload can optionally use other Google Cloud resources, each billed at its own pricing, including but not limited to Cloud Storage, BigQuery, and Cloud Logging.
What's next
- Read the Dataproc Serverless documentation.
- Get started with Dataproc Serverless.
- Try the Pricing calculator.