Quickstart using the gcloud command-line tool

This page shows you how to use the Google Cloud SDK gcloud command-line tool to create a Google Cloud Dataproc cluster, run a simple Apache Spark job in the cluster, then modify the number of workers in the cluster.

To do the same tasks another way, see Quickstart Using the API Explorer or, for the Google Cloud Platform Console, Quickstart Using the Console.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

  3. Make sure that billing is enabled for your Google Cloud Platform project. Learn how to enable billing.

  4. Enable the Cloud Dataproc API.

Create a cluster

Run the following command to create a cluster called example-cluster, replacing region with the name of the region you choose. See Available regions & zones for information on selecting a region (you can also run the gcloud compute regions list command to list the available regions). Also see Regional endpoints to learn about the difference between global and regional endpoints.

gcloud dataproc clusters create example-cluster --region=region
Waiting for cluster creation operation...done.
Created [... example-cluster]
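
The region value in the command above is a placeholder. As a minimal sketch, the placeholder can be supplied from a shell argument so that the same value is easy to reuse in later commands. The function name create_cluster is hypothetical and is not part of the gcloud tool.

```shell
# Hypothetical helper (not part of gcloud): creates a cluster in an
# explicitly named region and refuses to run if either value is missing.
create_cluster() {
  cluster_name="$1"
  region="$2"
  if [ -z "$cluster_name" ] || [ -z "$region" ]; then
    echo "usage: create_cluster CLUSTER_NAME REGION" >&2
    return 2
  fi
  gcloud dataproc clusters create "$cluster_name" --region="$region"
}
```

For example, calling create_cluster example-cluster us-central1 would run the command shown above with us-central1 as the region; us-central1 is used here only as an illustration.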

Submit a job

To submit a sample Spark job that calculates a rough value for pi, run the following command:

gcloud dataproc jobs submit spark --cluster example-cluster \
  --region=region \
  --class org.apache.spark.examples.SparkPi \
  --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000

This command specifies:

  • That you want to run a Spark job on the example-cluster cluster
  • The class containing the main method for the job's pi-calculating application
  • The location of the jar file containing your job's code
  • Any parameters you want to pass to the job; in this case, the number of tasks, which is 1000
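
The parts listed above can be made explicit by naming each one in a small wrapper. This is only a sketch: the function name submit_spark_pi and its argument order are hypothetical, and it passes the region explicitly so that it matches the region used when the cluster was created.

```shell
# Hypothetical helper (not part of gcloud): submits the SparkPi example,
# naming each part of the command described in the list above.
submit_spark_pi() {
  cluster_name="$1"   # cluster that runs the job
  region="$2"         # region the cluster was created in
  num_tasks="$3"      # parameter passed through to SparkPi
  main_class="org.apache.spark.examples.SparkPi"
  jar_path="file:///usr/lib/spark/examples/jars/spark-examples.jar"
  # Everything after the bare "--" is handed to the job, not to gcloud.
  gcloud dataproc jobs submit spark --cluster "$cluster_name" \
    --region="$region" \
    --class "$main_class" \
    --jars "$jar_path" -- "$num_tasks"
}
```

Calling submit_spark_pi example-cluster us-central1 1000 would submit the same job with 1000 tasks; us-central1 is an illustrative region only.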

The job's progress and final output are displayed in the terminal window:

Waiting for job output...
Pi is roughly 3.14118528
Job finished successfully.
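
If only the estimate is needed, for example in a script, the relevant line can be filtered out of the job output. The sketch below assumes the output contains a line in exactly the form shown above; the function name extract_pi is hypothetical.

```shell
# Hypothetical helper: reads job output on standard input and prints the
# number from the first line that starts with "Pi is roughly".
extract_pi() {
  awk '/^Pi is roughly / { print $4; exit }'
}
```

One way to use it is to pipe the combined output of the submit command into the function, as in gcloud dataproc jobs submit spark ... 2>&1 | extract_pi, where the ellipsis stands for the arguments shown earlier.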

Update a cluster

To change the number of workers in the cluster to five, run the following command:

gcloud dataproc clusters update example-cluster --region=region --num-workers 5

Your cluster's updated details are displayed in the command's output:

  - example-cluster-w-0
  - example-cluster-w-1
  - example-cluster-w-2
  - example-cluster-w-3
  - example-cluster-w-4
  numInstances: 5
- detail: Add 3 workers.

You can use the same command to decrease the number of worker nodes to the original value:

gcloud dataproc clusters update example-cluster --region=region --num-workers 2
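
To confirm the worker count after an update without reading the full output, the cluster can be described with a format expression that prints a single field. This is a sketch under two assumptions: the function name worker_count is hypothetical, and the field path config.workerConfig.numInstances is taken from the layout of the Cluster resource, where it corresponds to the numInstances value shown in the update output.

```shell
# Hypothetical helper (not part of gcloud): prints the number of primary
# workers currently configured for a cluster.
worker_count() {
  cluster_name="$1"
  region="$2"
  # value(...) limits the output to the named field of the cluster resource.
  gcloud dataproc clusters describe "$cluster_name" --region="$region" \
    --format='value(config.workerConfig.numInstances)'
}
```

Calling worker_count example-cluster us-central1 should print the current number of workers; us-central1 is an illustrative region only.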

Clean up

To avoid incurring charges to your GCP account for the resources used in this quickstart:

  1. Run clusters delete to delete your example cluster.
    gcloud dataproc clusters delete example-cluster --region=region
    You are prompted to confirm that you want to delete the cluster. Type y to complete the deletion.
  2. You should also remove any Cloud Storage data that the cluster created. The following command deletes every object under the given path; replace bucket and subdir with your bucket name and directory:
    gsutil rm gs://bucket/subdir/**
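
When cleanup runs from a script, the confirmation prompt in step 1 can be skipped with the gcloud-wide --quiet flag, which accepts the default answer to prompts. The sketch below assumes a hypothetical function name, delete_cluster_quietly, and deletes the cluster without asking, so use it with care.

```shell
# Hypothetical helper (not part of gcloud): deletes a cluster without the
# interactive confirmation prompt.
delete_cluster_quietly() {
  cluster_name="$1"
  region="$2"
  if [ -z "$cluster_name" ] || [ -z "$region" ]; then
    echo "usage: delete_cluster_quietly CLUSTER_NAME REGION" >&2
    return 2
  fi
  # --quiet disables interactive prompts and proceeds with the default answer.
  gcloud dataproc clusters delete "$cluster_name" --region="$region" --quiet
}
```

Calling delete_cluster_quietly example-cluster us-central1 would delete the example cluster immediately; us-central1 is an illustrative region only.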
