Submit a Spark job by using a template
This page shows you how to use a Google APIs Explorer template to run a simple Spark job on an existing Dataproc cluster.
For other ways to submit a job to a Dataproc cluster, see:
Before you begin
Before you can run a Dataproc job, you must create a cluster of one or more virtual machines (VMs) to run it on. You can use the APIs Explorer, the Google Cloud console, the gcloud CLI, or the Quickstarts using Cloud Client Libraries to create a cluster.
Submit a job
- job.placement.clusterName: The name of the cluster where the job will run (confirm or replace "example-cluster").
- job.sparkJob.args: "1000", the number of job tasks.
- job.sparkJob.jarFileUris: "file:///usr/lib/spark/examples/jars/spark-examples.jar". This is the local file path on the Dataproc cluster's master node where the jar that contains the Spark Scala job code is installed.
- job.sparkJob.mainClass: "org.apache.spark.examples.SparkPi". This is the class that contains the main method of the job's pi-calculating Scala application.
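The parameters listed above map onto the JSON body of a job submission request. The following Python sketch only assembles and prints that request; it does not send it. It assumes the Dataproc v1 REST endpoint `projects.regions.jobs.submit`, and the project ID `my-project` and region `us-central1` are hypothetical placeholders that do not come from this page.

```python
import json

# Hypothetical placeholders: replace with your own project ID and region.
PROJECT_ID = "my-project"
REGION = "us-central1"


def build_submit_request(project_id, region, cluster_name="example-cluster"):
    """Return the (url, body) pair for a Dataproc jobs.submit REST call."""
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/jobs:submit"
    )
    body = {
        "job": {
            # Cluster where the job will run.
            "placement": {"clusterName": cluster_name},
            "sparkJob": {
                # Number of job tasks, passed as a string argument.
                "args": ["1000"],
                # Jar installed on the cluster's master node.
                "jarFileUris": [
                    "file:///usr/lib/spark/examples/jars/spark-examples.jar"
                ],
                "mainClass": "org.apache.spark.examples.SparkPi",
            },
        }
    }
    return url, body


url, body = build_submit_request(PROJECT_ID, REGION)
print("POST", url)
print(json.dumps(body, indent=2))
```

Sending this request outside the APIs Explorer would also require an authorized OAuth 2.0 access token, which the sketch leaves out.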
Click EXECUTE. The first time you run the API template, you may be asked to choose and sign in to your Google account, then authorize the Google APIs Explorer to access your account. If the request is successful, the JSON response shows that the job submission request is pending.
To view job output, open the Dataproc Jobs page in the console, then click the top (most recent) Job ID. Set "LINE WRAP" to ON to bring lines that exceed the right margin into view.
... Pi is roughly 3.141804711418047 ...
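The output above is an estimate, not an exact value, because the SparkPi example approximates pi by random sampling. The following single-process Python sketch illustrates the same idea. It is an analogue for illustration, not the Scala code that runs on the cluster; the function name `estimate_pi` and the figure of 100,000 samples per slice are assumptions made for this example.

```python
import random


def estimate_pi(slices, samples_per_slice=100_000, seed=None):
    """Estimate pi by sampling random points in the square [-1, 1] x [-1, 1].

    The fraction of points that land inside the unit circle approaches
    pi / 4, so multiplying that fraction by 4 approximates pi.
    """
    rng = random.Random(seed)
    n = slices * samples_per_slice
    inside = 0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


# Two slices keep the local run short; the job on this page passes "1000".
print("Pi is roughly", estimate_pi(slices=2, seed=42))
```

More samples narrow the estimate, which is why the result changes slightly from run to run and with the number of tasks.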
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
- Learn how to update a Dataproc cluster by using a template.