Quickstart using the console

This page shows you how to use the Google Cloud Platform Console to create a Google Cloud Dataproc cluster, run a basic Apache Spark job on the cluster, and then modify the number of workers in the cluster.

You can find out how to do the same tasks in the Quickstart using the API Explorer and the Quickstart using the gcloud command-line tool.

Before you begin

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. Enable the Cloud Dataproc API.

    Enable the API
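
    If you prefer to work from a terminal, the API can also be enabled with the gcloud command-line tool. This is a sketch that assumes the Cloud SDK is installed and authenticated, and that your project is set as the gcloud default:

      gcloud services enable dataproc.googleapis.com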

Create a cluster

  1. Go to the Cloud Platform Console Cloud Dataproc Clusters page.
  2. Click Create cluster.
  3. Enter example-cluster in the Name field.
  4. Select a region and zone for the cluster from the Region and Zone drop-down menus. The global region is the default; it is a special multi-region namespace that can deploy instances into all Google Compute Engine zones globally. You can also specify a distinct region, such as us-east1 or europe-west1, to isolate resources (including VM instances and Google Cloud Storage) and metadata storage locations used by Cloud Dataproc within that region. See Available regions & zones for information on selecting a region. You can also run the gcloud compute regions list command to see a listing of available regions.
  5. Use the provided defaults for all the other options.

  6. Click Create to create the cluster.

Your new cluster should appear in the Clusters list. Cluster status is listed as "Provisioning" until the cluster is ready to use, then changes to "Running."
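
For reference, a roughly equivalent cluster can be created with the gcloud command-line tool. The sketch below mirrors the console choices above (the default global region and the us-central1-a zone); depending on your gcloud version, a --region flag may also be required:

    # List available regions, as noted in step 4 above.
    gcloud compute regions list

    # Create the cluster in the default global region, using the us-central1-a zone.
    gcloud dataproc clusters create example-cluster --zone=us-central1-a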

Submit a job

To run a sample Spark job:

  1. Select Jobs in the left navigation pane to switch to the Cloud Dataproc jobs view.
  2. Click Submit job.
  3. Select your new cluster example-cluster from the Cluster drop-down menu.
  4. Select Spark from the Job type drop-down menu.
  5. Enter file:///usr/lib/spark/examples/jars/spark-examples.jar in the Jar file field.
  6. Enter org.apache.spark.examples.SparkPi in the Main class or jar field.
  7. Enter 1000 in the Arguments field to set the number of tasks.
  8. Click Submit.

Your job should appear in the Jobs list, which shows your project's jobs with their cluster, type, and current status. Job status displays as "Running," and then "Succeeded" after it completes. To see your completed job's output:

  1. Click the job ID in the Jobs list.
  2. Select Line Wrapping to avoid scrolling.

You should see that your job has successfully calculated a rough value for pi!
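
The same SparkPi job can also be submitted from the command line. The sketch below reuses the values entered in the console (cluster name, jar path, main class, and the 1000 task argument); everything after the bare -- is passed to the job as arguments:

    gcloud dataproc jobs submit spark --cluster=example-cluster \
        --class=org.apache.spark.examples.SparkPi \
        --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000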

Update a cluster

To change the number of worker instances in your cluster:

  1. Select Clusters in the left navigation pane to return to the Cloud Dataproc Clusters view.
  2. Click example-cluster in the Clusters list. By default, the page displays an overview of your cluster's CPU usage.
  3. Click Configuration to display your cluster's current settings.
  4. Click Edit. The number of worker nodes is now editable.
  5. Enter 5 in the Worker nodes field.
  6. Click Save.

Your cluster is now updated. You can follow the same procedure to decrease the number of worker nodes to the original value.
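
The same resize can be done from the command line with the clusters update command; --num-workers sets the total number of worker nodes. As a sketch (assuming the console default of 2 workers, so the second command returns the cluster to its original size):

    # Scale the cluster up to 5 workers.
    gcloud dataproc clusters update example-cluster --num-workers=5

    # Scale back down to the original default of 2 workers.
    gcloud dataproc clusters update example-cluster --num-workers=2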

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:

  1. On the example-cluster Cluster page, click Delete to delete the cluster. You are prompted to confirm that you want to delete the cluster. Click OK.
  2. You should also remove any Cloud Storage buckets that were created by the cluster by running the following command, where bucket and subdir are placeholders for your bucket name and path:
    gsutil rm gs://bucket/subdir/**
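
The cluster itself can also be deleted from the command line; the following sketch assumes the same example-cluster name used throughout this quickstart:

    gcloud dataproc clusters delete example-cluster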
    

What's next

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.
