Scaling clusters

After creating a Cloud Dataproc cluster, you can adjust ("scale") the cluster by increasing or decreasing the number of worker nodes in the cluster. You can scale a Cloud Dataproc cluster at any time, even when jobs are running on the cluster.

Why scale a Cloud Dataproc cluster?

  1. to increase the number of workers to make a job run faster
  2. to decrease the number of workers to save money (see Graceful Decommissioning for a way to downsize a cluster without losing work in progress)
  3. to increase the number of nodes to expand available Hadoop Distributed Filesystem (HDFS) storage

Because clusters can be scaled more than once, you can increase the cluster size at one time and decrease it later, or the reverse.

Using Scaling

There are three ways you can scale your Cloud Dataproc cluster:

  1. Use the gcloud command-line tool in the Google Cloud SDK.
  2. Edit the cluster configuration in the Google Cloud Platform Console.
  3. Use the REST API.

New workers added to a cluster will use the same machine type as existing workers. For example, if a cluster is created with workers that use the n1-standard-8 machine type, new workers will also use the n1-standard-8 machine type.

gcloud

To scale a cluster with gcloud dataproc clusters update, run the following command.
gcloud dataproc clusters update cluster-name --num-workers new-number-of-workers
where cluster-name is the name of the cluster to update, and new-number-of-workers is the updated number of worker nodes. For example, to scale a cluster named "dataproc-1" to use five worker nodes, run the following command.
gcloud dataproc clusters update dataproc-1 --num-workers 5
Waiting on operation [operations/projects/project-id/operations/...].
Waiting for cluster update operation...done.
Updated [https://dataproc.googleapis.com/...].
clusterName: dataproc-1
...
  masterDiskConfiguration:
    bootDiskSizeGb: 500
  masterName: dataproc-1-m
  numWorkers: 5
  ...
  workers:
  - dataproc-1-w-0
  - dataproc-1-w-1
  - dataproc-1-w-2
  - dataproc-1-w-3
  - dataproc-1-w-4
...
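
The command above can be wrapped in a small helper that checks the worker count before calling gcloud. The following is a minimal sketch, not part of the Cloud SDK: the function name scale_cluster and the DRY_RUN variable are hypothetical, and the only check it performs is that the count is a non-negative integer.

```shell
# Sketch only: scale_cluster and DRY_RUN are hypothetical names, not part of gcloud.
scale_cluster() {
  cluster=$1
  workers=$2
  # Reject anything that is not a plain non-negative integer.
  case "$workers" in
    '' | *[!0-9]*)
      echo "worker count must be a non-negative integer: '$workers'" >&2
      return 1
      ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "gcloud dataproc clusters update $cluster --num-workers $workers"
  else
    gcloud dataproc clusters update "$cluster" --num-workers "$workers"
  fi
}
```

With DRY_RUN=1 set, scale_cluster dataproc-1 5 prints the command from the example above without contacting the service, which makes it easy to review the command before running it for real.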

Console

After a cluster is created, you can scale a cluster by clicking the Edit button on the Configuration tab on the cluster detail page.
Enter a new value for the number of Worker nodes (for example, "5").
Click Save to update the cluster.

REST API

See clusters.patch.
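
As a rough illustration of a clusters.patch request that changes the worker count, the sketch below builds the request URL and JSON body and shows a curl call in comments. The helper names patch_url and patch_body, and the sample project, region, and cluster values, are assumptions made for this example. The update mask path config.worker_config.num_instances and the workerConfig.numInstances body field follow the clusters.patch reference; check them against the current API documentation before relying on them.

```shell
# Sketch only: patch_url, patch_body, and the sample argument values are hypothetical.
patch_url() {
  # $1 = project ID, $2 = region, $3 = cluster name
  echo "https://dataproc.googleapis.com/v1/projects/$1/regions/$2/clusters/$3?updateMask=config.worker_config.num_instances"
}

patch_body() {
  # $1 = new number of worker nodes
  printf '{"config":{"workerConfig":{"numInstances":%d}}}\n' "$1"
}

# Example invocation (commented out because it needs credentials and a real cluster):
# curl -X PATCH \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "$(patch_body 5)" \
#   "$(patch_url my-project us-central1 dataproc-1)"
```

Keeping the URL and body in separate functions lets you inspect exactly what will be sent before making the request.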

Graceful Decommissioning

When you update a cluster that uses Cloud Dataproc version 1.2 or later, you can use Graceful Decommissioning, which incorporates graceful YARN decommissioning to finish work in progress on a worker before it is removed from the Cloud Dataproc cluster.

Using Graceful Decommissioning

Graceful decommissioning is disabled by default. To enable it, set a timeout value when you update your cluster to remove one or more workers from the cluster.

gcloud

When you update a cluster to remove one or more workers, use the gcloud dataproc clusters update command with the --graceful-decommission-timeout flag. The timeout is a string value: either "0s" (the default, which means forceful rather than graceful decommissioning) or a positive duration relative to the current time (for example, "3s"). The maximum duration is 1 day.
gcloud dataproc clusters update \
  --graceful-decommission-timeout="timeout-value" \
  other args ...
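
The timeout limits described above (a seconds value between "0s" and 1 day) can be checked locally before you call gcloud. The following is a minimal sketch under stated assumptions: valid_timeout is a hypothetical helper, and it accepts only the whole-seconds form used in the examples on this page, so any other duration format is rejected by this sketch even if gcloud would accept it.

```shell
# Sketch only: valid_timeout is a hypothetical helper, not a gcloud feature.
# It accepts only the whole-seconds form used in the text ("0s", "3s", ...).
valid_timeout() {
  case "$1" in
    *s) secs=${1%s} ;;   # strip the trailing "s"
    *) return 1 ;;       # no "s" suffix
  esac
  case "$secs" in
    '' | *[!0-9]*) return 1 ;;   # empty, negative, or non-numeric
  esac
  # 1 day = 24 * 60 * 60 = 86400 seconds
  [ "$secs" -le 86400 ]
}
```

For example, valid_timeout "3600s" succeeds and valid_timeout "86401s" fails. Note that "0s" passes the check but, as described above, results in forceful rather than graceful decommissioning.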

REST API

See clusters.patch.gracefulDecommissionTimeout. The timeout is a string value: either "0s" (the default, which means forceful rather than graceful decommissioning) or a duration in seconds (for example, "3s"). The maximum duration is 1 day.
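
As an illustration, gracefulDecommissionTimeout is passed as a query parameter on the clusters.patch request, alongside the update mask for the worker count. The sketch below only builds the URL. The helper name graceful_patch_url and the sample project, region, cluster, and timeout values are assumptions for this example, and the request body is assumed to carry the reduced worker count (for example, {"config":{"workerConfig":{"numInstances":2}}}).

```shell
# Sketch only: graceful_patch_url and its sample argument values are hypothetical.
graceful_patch_url() {
  # $1 = project ID, $2 = region, $3 = cluster name, $4 = timeout such as "3600s"
  echo "https://dataproc.googleapis.com/v1/projects/$1/regions/$2/clusters/$3?gracefulDecommissionTimeout=$4&updateMask=config.worker_config.num_instances"
}

# Example invocation (commented out because it needs credentials and a real cluster):
# curl -X PATCH \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d '{"config":{"workerConfig":{"numInstances":2}}}' \
#   "$(graceful_patch_url my-project us-central1 dataproc-1 3600s)"
```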

Console

Support for Graceful Decommissioning in the Cloud Platform Console will be added in a future Cloud Dataproc release.
