After creating a Cloud Dataproc cluster, you can adjust ("scale") the cluster by increasing or decreasing the number of worker nodes in the cluster. You can scale a Cloud Dataproc cluster at any time, even when jobs are running on the cluster.
Why scale a Cloud Dataproc cluster?
- to increase the number of workers to make a job run faster
- to decrease the number of workers to save money
- to increase the number of nodes to expand available Hadoop Distributed Filesystem (HDFS) storage
Because a cluster can be scaled more than once, you can increase its size at one time and decrease it later, or the reverse.
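As a rough illustration of scaling more than once, the following sketch scales a cluster up, runs a workload command, and then scales the cluster back down. It relies on the gcloud dataproc clusters update command described later on this page. The function name with_temporary_workers, the helper run_or_print, and the DRY_RUN variable are hypothetical names introduced for this example; they are not part of the gcloud tool.

```shell
# Sketch only: with_temporary_workers, run_or_print, and DRY_RUN are
# hypothetical names for this example, not gcloud features.

# Print the command when DRY_RUN=1, otherwise run it.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: with_temporary_workers <cluster> <scaled-workers> <original-workers> <command...>
with_temporary_workers() {
  if [ "$#" -lt 4 ]; then
    echo "usage: with_temporary_workers <cluster> <scaled> <original> <command...>" >&2
    return 2
  fi
  local cluster="$1"
  local scaled="$2"
  local original="$3"
  shift 3

  # Scale to the temporary size; stop if the update fails.
  run_or_print gcloud dataproc clusters update "$cluster" --num-workers "$scaled" || return 1

  # Run the workload and remember its exit status.
  "$@"
  local status=$?

  # Restore the original size even if the workload failed.
  run_or_print gcloud dataproc clusters update "$cluster" --num-workers "$original" || return 1
  return "$status"
}
```

For example, `DRY_RUN=1 with_temporary_workers dataproc-1 10 2 echo job` prints the two update commands around the workload's output without contacting any cluster.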
There are two ways you can scale your Cloud Dataproc cluster:
- Use the gcloud command-line tool in the Google Cloud SDK.
- Edit the cluster configuration in the Google Cloud Platform Console.
New workers added to a cluster use the same machine type as existing workers. For example, if a cluster is created with workers that use the n1-standard-8 machine type, new workers also use the n1-standard-8 machine type.
Scaling with the gcloud command-line tool
To scale a cluster with gcloud dataproc clusters update, run the following command.
gcloud dataproc clusters update <cluster-name> --num-workers <new-number-of-workers>
Here, cluster-name is the name of the cluster to update, and new-number-of-workers is the updated number of worker nodes.
For example, to scale a cluster named "dataproc-1" to use five worker nodes, run the following command.
gcloud dataproc clusters update dataproc-1 --num-workers 5
Waiting on operation [operations/projects/project-id/operations/...].
Waiting for cluster update operation...done.
Updated [https://dataproc.googleapis.com/...].
clusterName: my-test-cluster
...
masterDiskConfiguration:
  bootDiskSizeGb: 500
masterName: dataproc-1-m
numWorkers: 5
...
workers:
- my-test-cluster-w-0
- my-test-cluster-w-1
- my-test-cluster-w-2
- my-test-cluster-w-3
- my-test-cluster-w-4
...
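The update command above can be wrapped in a small shell function that checks the worker count before calling gcloud. This is a minimal sketch, not an official helper: scale_cluster and the DRY_RUN variable are hypothetical names chosen for this example, and the function only passes the --num-workers flag shown above. Depending on your SDK version and configuration, additional flags may be needed.

```shell
# Sketch only: scale_cluster and DRY_RUN are hypothetical names for this
# example, not part of the gcloud tool.

# Usage: scale_cluster <cluster-name> <new-number-of-workers>
scale_cluster() {
  local cluster="${1:-}"
  local workers="${2:-}"

  if [ -z "$cluster" ]; then
    echo "error: cluster name is required" >&2
    return 1
  fi

  # Reject anything that is not a plain non-negative integer.
  case "$workers" in
    ''|*[!0-9]*)
      echo "error: worker count must be a non-negative integer: '$workers'" >&2
      return 1
      ;;
  esac

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "gcloud dataproc clusters update $cluster --num-workers $workers"
  else
    gcloud dataproc clusters update "$cluster" --num-workers "$workers"
  fi
}
```

Running `DRY_RUN=1 scale_cluster dataproc-1 5` prints the command without executing it, which is a convenient way to review the call before changing a live cluster.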
Scaling with the Cloud Platform Console
After a cluster is created, you can scale it by clicking the Edit button on the Configuration tab of the cluster detail page.
Enter a new value for the number of Worker nodes (updated to "5" in the following screenshot).
Click Save to update the cluster.