Submit a Spark job by using a template
This page shows you how to use a Google APIs Explorer template to run a simple Spark job on an existing Dataproc cluster.
For other ways to submit a job to a Dataproc cluster, see:
Before you begin
Before you can run a Dataproc job, you must create a cluster of one or more virtual machines (VMs) to run it on. You can use the APIs Explorer, the Google Cloud console, the gcloud CLI, or the Quickstarts using Cloud Client Libraries to create a cluster.
Submit a job
To submit a sample Apache Spark job that calculates a rough value for pi, fill in and execute the Google APIs Explorer Try this API template.
Request parameters:
Request body:
- job.placement.clusterName: The name of the cluster where the job will run (confirm or replace "example-cluster").
- job.sparkJob.args: "1000", the number of job tasks.
- job.sparkJob.jarFileUris: "file:///usr/lib/spark/examples/jars/spark-examples.jar". This is the local file path on the Dataproc cluster's master node where the jar that contains the Spark Scala job code is installed.
- job.sparkJob.mainClass: "org.apache.spark.examples.SparkPi". This is the class that contains the main method of the job's pi calculation Scala application.
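The template fills in these fields for you. If you prefer to call the REST API directly, the following Python sketch builds an equivalent `jobs.submit` request URL and body using only the standard library. It assumes the Dataproc v1 REST endpoint `https://dataproc.googleapis.com/v1/projects/{projectId}/regions/{region}/jobs:submit`; the function name `build_submit_request`, the project ID `my-project`, and the region `us-central1` are hypothetical placeholders, not values from this page. The sketch only builds the request; it does not send it.

```python
import json
from urllib.parse import quote

# Assumed Dataproc v1 REST base URL; verify against the API reference.
DATAPROC_API = "https://dataproc.googleapis.com/v1"


def build_submit_request(project_id: str, region: str, cluster_name: str,
                         num_tasks: int = 1000) -> tuple:
    """Return (url, body) for a hypothetical jobs.submit call that runs SparkPi."""
    url = (f"{DATAPROC_API}/projects/{quote(project_id)}"
           f"/regions/{quote(region)}/jobs:submit")
    body = {
        "job": {
            # Cluster where the job runs.
            "placement": {"clusterName": cluster_name},
            "sparkJob": {
                # Job arguments are passed as strings.
                "args": [str(num_tasks)],
                # JAR installed on the cluster's master node.
                "jarFileUris": [
                    "file:///usr/lib/spark/examples/jars/spark-examples.jar"
                ],
                "mainClass": "org.apache.spark.examples.SparkPi",
            },
        }
    }
    return url, body


if __name__ == "__main__":
    # Placeholder project and region; replace with your own.
    url, body = build_submit_request("my-project", "us-central1", "example-cluster")
    print("POST", url)
    print(json.dumps(body, indent=2))
```

Sending this request also requires an OAuth 2.0 access token in an `Authorization: Bearer` header, which the APIs Explorer handles for you and which this sketch omits.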
Click EXECUTE. The first time you run the API template, you may be asked to choose and sign in to your Google account, then authorize the Google APIs Explorer to access your account. If the request is successful, the JSON response shows that the job submission request is pending.
To view job output, open the Dataproc Jobs page in the Google Cloud console, then click the top (most recent) Job ID. Set LINE WRAP to ON to bring lines that exceed the right margin into view.
... Pi is roughly 3.141804711418047 ...
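The printed value differs slightly from run to run because the job estimates pi by random sampling. As a rough, single-process illustration (not the SparkPi source), the following Python sketch applies the Monte Carlo idea: sample points uniformly in the square [-1, 1] x [-1, 1], count those that fall inside the unit circle, and multiply that fraction by 4. The loop over `num_tasks` chunks only loosely mirrors how the job argument sets the number of tasks; the function name `estimate_pi` and the default of 100,000 samples per chunk are assumptions for illustration.

```python
import random
from typing import Optional


def estimate_pi(num_tasks: int, samples_per_task: int = 100_000,
                seed: Optional[int] = None) -> float:
    """Monte Carlo estimate of pi, computed over num_tasks sequential chunks."""
    rng = random.Random(seed)
    inside = 0
    total = num_tasks * samples_per_task
    for _ in range(num_tasks):
        # Each chunk stands in for one Spark task (run sequentially here).
        for _ in range(samples_per_task):
            x = rng.random() * 2 - 1
            y = rng.random() * 2 - 1
            if x * x + y * y <= 1:
                inside += 1
    # Circle area / square area = pi / 4, so scale the hit fraction by 4.
    return 4.0 * inside / total


if __name__ == "__main__":
    print(f"Pi is roughly {estimate_pi(num_tasks=10, seed=42)}")
```

More samples narrow the spread of the estimate, which is why raising the task count generally brings the result closer to pi at the cost of more work.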
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
- If you don't need the cluster to explore the other quickstarts or to run other jobs, use the APIs Explorer, the Google Cloud console, the gcloud CLI, or the Cloud Client Libraries to delete the cluster.
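If you call the REST API directly instead of using the APIs Explorer, deleting the cluster is a single HTTP `DELETE` on the cluster resource. The following Python sketch only builds that URL. It assumes the Dataproc v1 endpoint `https://dataproc.googleapis.com/v1/projects/{projectId}/regions/{region}/clusters/{clusterName}`; the function name `build_delete_cluster_url` is hypothetical, and the project and region values are placeholders.

```python
from urllib.parse import quote

# Assumed Dataproc v1 REST base URL; verify against the API reference.
DATAPROC_API = "https://dataproc.googleapis.com/v1"


def build_delete_cluster_url(project_id: str, region: str,
                             cluster_name: str) -> str:
    """Return the URL for a hypothetical clusters.delete request (HTTP DELETE)."""
    return (f"{DATAPROC_API}/projects/{quote(project_id)}"
            f"/regions/{quote(region)}/clusters/{quote(cluster_name)}")


if __name__ == "__main__":
    # Placeholder values; replace with your own project, region, and cluster.
    print("DELETE", build_delete_cluster_url("my-project", "us-central1",
                                              "example-cluster"))
```

The delete request returns a long-running operation; the cluster is removed when that operation completes.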
What's next
- Learn how to update a Dataproc cluster by using a template.