This page shows you how to use the Google Cloud SDK gcloud command-line tool to create a Google Cloud Dataproc cluster, run a simple Apache Spark job in the cluster, then modify the number of workers in the cluster.
Create a cluster
Run the following command to create a cluster named example-cluster with default Cloud Dataproc settings:
gcloud dataproc clusters create example-cluster
Waiting for cluster creation operation...done.
Created [... example-cluster]
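To confirm that the cluster was created successfully, you can list the clusters in your project (a quick check, assuming your default gcloud project is already configured):

```shell
# List Cloud Dataproc clusters in the current project;
# example-cluster should appear with a RUNNING status.
gcloud dataproc clusters list
```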
The default value of the --region flag is global. This is a special multi-region endpoint that can deploy instances into any user-specified Compute Engine zone. You can also specify a distinct region, such as europe-west1, to isolate the resources (including VM instances and Cloud Storage) and the metadata storage locations used by Cloud Dataproc within that region. See Regional endpoints to learn more about the difference between global and regional endpoints.
See Available regions & zones
for information on selecting a region. You can also run the
gcloud compute regions list command to see a listing of available regions.
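If you want to use a regional endpoint, you can pass the --region flag explicitly when creating the cluster. The following is a sketch using europe-west1 (the region and zone shown here are examples; substitute whichever region and zone suit you):

```shell
# Create the cluster against the europe-west1 regional endpoint,
# placing its VM instances in the europe-west1-b zone.
gcloud dataproc clusters create example-cluster \
    --region europe-west1 \
    --zone europe-west1-b
```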
Submit a job
To submit a sample Spark job that calculates a rough value for pi, run the following command:
gcloud dataproc jobs submit spark --cluster example-cluster \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000
This command specifies:
- That you want to run a spark job on the example-cluster cluster
- The class containing the main method for the job's pi-calculating application
- The location of the jar file containing your job's code
- Any parameters you want to pass to the job; in this case, the number of tasks, which is 1000
The job's running and final output is displayed in the terminal window:
Waiting for job output...
...
Pi is roughly 3.14118528
...
Job finished successfully.
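If your terminal session is interrupted, you can still check on the job afterwards; for example:

```shell
# List jobs submitted to example-cluster along with their states
# (e.g. PENDING, RUNNING, DONE).
gcloud dataproc jobs list --cluster example-cluster
```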
Update a cluster
To change the number of workers in the cluster to five, run the following command:
gcloud dataproc clusters update example-cluster --num-workers 5
Your cluster's updated details are displayed in the command's output:
workerConfig:
...
  instanceNames:
  - example-cluster-w-0
  - example-cluster-w-1
  - example-cluster-w-2
  - example-cluster-w-3
  - example-cluster-w-4
  numInstances: 5
statusHistory:
...
- detail: Add 3 workers.
You can use the same command to decrease the number of worker nodes to the original value:
gcloud dataproc clusters update example-cluster --num-workers 2
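You can verify the current worker count at any time by describing the cluster (a quick check; the numInstances field under workerConfig should match the value you set):

```shell
# Show the cluster's full configuration; numInstances under
# workerConfig reflects the current number of workers.
gcloud dataproc clusters describe example-cluster
```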
Clean up
To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:
- Run clusters delete to delete your example cluster:
gcloud dataproc clusters delete example-cluster
You are prompted to confirm that you want to delete the cluster. Type y to complete the deletion.
- You should also remove any Cloud Storage buckets that were created by the
cluster by running the following command:
gsutil rm gs://bucket/subdir/**
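If you are not sure which bucket the cluster used, you can first list the buckets in your project. Staging buckets created by Cloud Dataproc typically include dataproc in their names, but this is an assumption you should verify before deleting anything:

```shell
# List all Cloud Storage buckets in the project to identify
# the staging bucket before removing its contents.
gsutil ls
```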