Submit a Spark job by using a template

This page shows you how to use a Google APIs Explorer template to run a simple Spark job on an existing Dataproc cluster.

For other ways to submit a job to a Dataproc cluster, see:

Before you begin

Before you can run a Dataproc job, you must create a cluster of one or more virtual machines (VMs) to run it on. You can use the APIs Explorer, the Google Cloud console, the gcloud CLI, or the Quickstarts using Cloud Client Libraries to create a cluster.

Submit a job

To submit a sample Apache Spark job that calculates a rough value for pi, fill in and execute the Google APIs Explorer Try this API template.

  1. Request parameters:

    1. Insert your projectId.
    2. Specify the region where your cluster is located (confirm or replace "us-central1"). Your cluster's region is listed on the Dataproc Clusters page in the Google Cloud console.
  2. Request body:

    1. job.placement.clusterName: The name of the cluster where the job will run (confirm or replace "example-cluster").
    2. job.sparkJob.args: "1000", the number of job tasks.
    3. job.sparkJob.jarFileUris: "file:///usr/lib/spark/examples/jars/spark-examples.jar". This is the local file path on the Dataproc cluster's master node where the jar that contains the Spark Scala job code is installed.
    4. job.sparkJob.mainClass: "org.apache.spark.examples.SparkPi". This is the class that contains the main method of the job's pi-calculating Scala application.
  3. Click EXECUTE. The first time you run the API template, you may be asked to choose and sign in to your Google account, then authorize the Google APIs Explorer to access your account. If the request is successful, the JSON response shows that the job submission request is pending.

  4. To view job output, open the Dataproc Jobs page in the Google Cloud console, then click the top (most recent) Job ID. Set "LINE WRAP" to ON to bring lines that exceed the right margin into view.

    ...
    Pi is roughly 3.141804711418047
    ...
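
For reference, the values entered in the template correspond to a single REST request. The following is a minimal sketch, assuming the template targets the Dataproc `projects.regions.jobs.submit` method at the `v1` endpoint. The helper name `build_submit_request` and the project ID `my-project` are hypothetical placeholders. The code only builds and prints the request and does not send it; a real call also needs an OAuth 2.0 access token.

```python
import json

# Assumed base URL of the Dataproc v1 REST API.
DATAPROC_ENDPOINT = "https://dataproc.googleapis.com/v1"


def build_submit_request(project_id, region, cluster_name, tasks=1000):
    """Return (url, body) for a jobs.submit call that runs the SparkPi sample.

    Hypothetical helper for illustration; it mirrors the template fields
    described in the steps above.
    """
    url = f"{DATAPROC_ENDPOINT}/projects/{project_id}/regions/{region}/jobs:submit"
    body = {
        "job": {
            # Cluster on which the job runs.
            "placement": {"clusterName": cluster_name},
            "sparkJob": {
                # Job arguments are passed as strings.
                "args": [str(tasks)],
                # Path to the examples jar on the cluster's master node.
                "jarFileUris": [
                    "file:///usr/lib/spark/examples/jars/spark-examples.jar"
                ],
                "mainClass": "org.apache.spark.examples.SparkPi",
            },
        }
    }
    return url, body


# "my-project" is a placeholder project ID.
url, body = build_submit_request("my-project", "us-central1", "example-cluster")
print("POST", url)
print(json.dumps(body, indent=2))
```

Printing the body makes it easy to compare against the request body shown in the APIs Explorer panel before you click EXECUTE.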
    

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  1. If you don't need the cluster to explore the other quickstarts or to run other jobs, use the APIs Explorer, the Google Cloud console, the gcloud CLI, or the Cloud Client Libraries to delete the cluster.
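
If you delete the cluster through the API, the request is a `DELETE` on the cluster resource. The following is a minimal sketch, assuming the Dataproc `v1` REST endpoint and the `projects.regions.clusters.delete` method. The helper name `build_delete_cluster_request` and the project ID `my-project` are hypothetical. The request object is constructed but not sent, and a real call also needs an OAuth 2.0 access token.

```python
from urllib.request import Request

# Assumed base URL of the Dataproc v1 REST API.
DATAPROC_ENDPOINT = "https://dataproc.googleapis.com/v1"


def build_delete_cluster_request(project_id, region, cluster_name):
    """Return an unsent DELETE request for the given cluster (illustrative)."""
    cluster_url = (
        f"{DATAPROC_ENDPOINT}/projects/{project_id}"
        f"/regions/{region}/clusters/{cluster_name}"
    )
    return Request(cluster_url, method="DELETE")


# "my-project" is a placeholder project ID.
delete_request = build_delete_cluster_request(
    "my-project", "us-central1", "example-cluster"
)
print(delete_request.get_method(), delete_request.full_url)
```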

What's next