This page shows you how to use the Google Cloud SDK gcloud command-line tool to create a Google Cloud Dataproc cluster, run a simple Apache Spark job in the cluster, then modify the number of workers in the cluster.
You can find out how to do the same or similar tasks in Quickstarts Using the API Explorer, Quickstart Using the Console, and Quickstart Using Google Cloud Client Libraries.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the Dataproc API.
Create a cluster
Run the following command to create a cluster called example-cluster:
See Available regions & zones for information on selecting a region (you can also run the gcloud compute regions list command to see a listing of available regions). Also see Regional endpoints to learn about the difference between global and regional endpoints.
gcloud dataproc clusters create example-cluster --region=region
Cluster creation is confirmed in the command output:
... Waiting for cluster creation operation...done. Created [... example-cluster]
Submit a job
To submit a sample Spark job that calculates a rough value for pi, run the following command:
gcloud dataproc jobs submit spark --cluster example-cluster \
    --region=region \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000
This command specifies:
- That you want to run a spark job on the example-cluster cluster in the specified region
- The class containing the main method for the job's pi-calculating application
- The location of the jar file containing your job's code
- Any parameters you want to pass to the job: in this case, the number of tasks, which is 1000
The job's running and final output is displayed in the terminal window:
Waiting for job output... ... Pi is roughly 3.14118528 ... Job finished successfully.
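The SparkPi job estimates pi with a Monte Carlo method: it samples random points in a square and counts how many fall inside the inscribed circle, distributing the sampling across the 1000 tasks. The sketch below illustrates the same calculation in plain Python (function and parameter names are illustrative, not part of the Spark example):

```python
import random

def estimate_pi(num_tasks: int, samples_per_task: int = 10_000) -> float:
    """Monte Carlo estimate of pi, mimicking SparkPi's sampling scheme.

    Each "task" samples random points in the square [-1, 1] x [-1, 1]
    and counts those landing inside the unit circle; pi is approximated
    as 4 * (points inside) / (total points).
    """
    inside = 0
    total = num_tasks * samples_per_task
    for _ in range(total):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            inside += 1
    return 4 * inside / total

print(estimate_pi(10))  # close to 3.14; more tasks give a tighter estimate
```

The real job parallelizes the per-task sampling across the cluster's workers, which is why the parameter 1000 controls how finely the work is split.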
Update a cluster
To change the number of workers in the cluster to five, run the following command:
gcloud dataproc clusters update example-cluster \
    --region=region \
    --num-workers 5
Your cluster's updated details are displayed in the command's output:
workerConfig:
...
  instanceNames:
  - example-cluster-w-0
  - example-cluster-w-1
  - example-cluster-w-2
  - example-cluster-w-3
  - example-cluster-w-4
  numInstances: 5
statusHistory:
...
- detail: Add 3 workers.
You can use the same command to decrease the number of worker nodes to the original value:
gcloud dataproc clusters update example-cluster \
    --region=region \
    --num-workers 2
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this quickstart, follow these steps.
Run clusters delete to delete your example cluster:
gcloud dataproc clusters delete example-cluster \
    --region=region

You are prompted to confirm that you want to delete the cluster. Type y to complete the deletion.
What's next
- Learn how to write and run a Scala job.