This page shows you how to use an inline Google APIs Explorer template to run a simple Spark job in an existing Cloud Dataproc cluster. You can learn how to do the same task using the Google Cloud Platform Console in Quickstart Using the Console or using the command line in Quickstart using the gcloud command-line tool.
Before you begin
Before you can run a Cloud Dataproc job, you need to create a cluster of virtual machines (VMs) to run it on. You can use the APIs Explorer, the Google Cloud Platform Console, or the Cloud SDK gcloud command-line tool to create a cluster.
Submit a job
To submit a sample Apache Spark job that calculates a rough value for pi, fill in and execute the APIs Explorer template, below, as follows:
- Enter your project ID (project name) in the
- The following fields are filled in for you:
region= "global".
global is the default region when a Cloud Dataproc cluster is created. It is a special multi-region namespace that can deploy instances into all Compute Engine zones globally. If you created your cluster (see APIs Explorer—Create a cluster) in a different region, replace "global" with the name of your cluster's region.
- Request body
job.placement.clusterName= "example-cluster". This is the name of the Cloud Dataproc cluster (created in the previous quickstarts—see APIs Explorer—Create a cluster) where the job will be run. Replace this name with the name of your cluster if it is different.
- Request body
args= "1000". The number of tasks.
jarFileUris= "file:///usr/lib/spark/examples/jars/spark-examples.jar". The location of the pre-installed jar file on the master VM instance in your cluster that contains the Spark Scala job code.
mainClass= "org.apache.spark.examples.SparkPi". The class that contains the main method of the job's pi-calculating Scala application.
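The field values listed above can also be assembled into the equivalent REST request. The following is a minimal sketch, not the code the APIs Explorer runs: it only builds the URL and JSON body for the Dataproc v1 jobs.submit method and prints them. The helper name `build_submit_request` and the project ID `my-project-id` are hypothetical placeholders, and sending the request would additionally require an OAuth access token, which is omitted here.

```python
import json

# Dataproc v1 REST endpoint for jobs.submit.
SUBMIT_URL = (
    "https://dataproc.googleapis.com/v1/"
    "projects/{project_id}/regions/{region}/jobs:submit"
)


def build_submit_request(project_id, cluster_name="example-cluster",
                         region="global", tasks=1000):
    """Return (url, body) for a jobs.submit call that runs SparkPi.

    Hypothetical helper for illustration; defaults mirror the template values.
    """
    url = SUBMIT_URL.format(project_id=project_id, region=region)
    body = {
        "job": {
            # Cluster on which the job will run.
            "placement": {"clusterName": cluster_name},
            "sparkJob": {
                # Arguments are passed to the job as strings.
                "args": [str(tasks)],
                "jarFileUris": [
                    "file:///usr/lib/spark/examples/jars/spark-examples.jar"
                ],
                "mainClass": "org.apache.spark.examples.SparkPi",
            },
        }
    }
    return url, body


if __name__ == "__main__":
    # "my-project-id" is a placeholder; substitute your own project ID.
    url, body = build_submit_request("my-project-id")
    print("POST", url)
    print(json.dumps(body, indent=2))
```

If your cluster is in a different region or has a different name, pass `region=` and `cluster_name=` accordingly, just as you would edit the template fields.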
Click EXECUTE. A dialog will ask you to confirm the default https://www.googleapis.com/auth/cloud-platform scope. Click ALLOW in the dialog to send the request to the service. After less than one second (typically), the JSON response showing that the job submitted to example-cluster is pending appears below the template.
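If you capture the JSON response, the job ID and state can be read from the returned Job resource's `reference.jobId` and `status.state` fields. The sketch below assumes that shape; `summarize_job` is a hypothetical helper, and `sample_response` is a hand-written fragment for illustration, not real service output.

```python
def summarize_job(response):
    """Return (job_id, state) from a jobs.submit response dict.

    Missing fields yield None rather than raising an error.
    """
    job_id = response.get("reference", {}).get("jobId")
    state = response.get("status", {}).get("state")
    return job_id, state


# Hypothetical, hand-written response fragment (NOT real service output).
sample_response = {
    "reference": {"projectId": "my-project-id", "jobId": "example-job-id"},
    "placement": {"clusterName": "example-cluster"},
    "status": {"state": "PENDING"},
}

job_id, state = summarize_job(sample_response)
print(f"job {job_id} is {state}")
```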
You can inspect the job output by going to GCP Console—Clusters, then clicking on the Job ID link (select the "Line wrapping" box to bring lines that exceed the right margin into view).
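To get a feel for what the job computes, here is a plain-Python sketch of a Monte Carlo estimate of pi: sample random points in a square and count how many fall inside the inscribed circle. This is an illustration of the general technique only, not the Scala code shipped in spark-examples.jar, and it runs on a single machine rather than as distributed tasks. The function name `estimate_pi` and the sample count are assumptions chosen for the example.

```python
import random


def estimate_pi(samples, seed=0):
    """Estimate pi by sampling points uniformly in the square [-1, 1] x [-1, 1].

    The fraction of points inside the unit circle approximates pi / 4.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    print("Pi is roughly", estimate_pi(100_000))
```

More samples give a tighter estimate, which is why the result is described as a rough value.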
Congratulations! You've used the Google APIs Explorer to submit an Apache Spark job to a Cloud Dataproc cluster.