Google Cloud Serverless for Apache Spark lets you run Spark workloads without provisioning and managing your own Dataproc cluster. There are two ways to run Serverless for Apache Spark workloads: batch workloads and interactive sessions.
Batch workloads
Submit a batch workload to the Serverless for Apache Spark service using the Google Cloud console, Google Cloud CLI, or Dataproc API. The service runs the workload on managed compute infrastructure, autoscaling resources as needed. Serverless for Apache Spark charges apply only to the time when the workload is executing.
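For example, you can submit a batch workload programmatically with the Dataproc Python client library. The following is a minimal sketch, assuming the google-cloud-dataproc package is installed; the project, region, bucket, and script URI are placeholders to replace with your own values:

```python
from google.cloud import dataproc_v1

# The Batch Controller client must target the regional Dataproc endpoint.
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

# Describe a PySpark batch workload; the driver script lives in Cloud Storage.
batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/my_job.py"
    )
)

# create_batch returns a long-running operation; result() blocks until
# the batch workload completes.
operation = client.create_batch(
    parent="projects/my-project/locations/us-central1",
    batch=batch,
    batch_id="example-batch-0001",
)
print(operation.result().state)
```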
Batch workload capabilities
You can run the following Serverless for Apache Spark batch workload types:
- PySpark
- Spark SQL
- SparkR
- Spark (Java or Scala)
You can specify Spark properties when you submit a Serverless for Apache Spark batch workload.
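With the gcloud CLI, you pass properties through the `--properties` flag; with the client library, you set them on the batch's runtime configuration. A minimal sketch extending the earlier example (the property names and values shown are illustrative):

```python
# Spark properties are set in the batch's runtime configuration.
batch.runtime_config = dataproc_v1.RuntimeConfig(
    properties={
        "spark.executor.cores": "8",
        "spark.dynamicAllocation.initialExecutors": "4",
    }
)
```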
Schedule batch workloads
You can schedule a Spark batch workload as part of an Airflow or Cloud Composer workflow using an Airflow batch operator. For more information, see Run Serverless for Apache Spark workloads with Cloud Composer.
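For illustration, a Cloud Composer DAG can submit the batch with the DataprocCreateBatchOperator from the Google provider package. The following is a minimal sketch, assuming apache-airflow-providers-google is installed; the project, region, and URI values are placeholders, and exact DAG arguments can vary by Airflow version:

```python
from datetime import datetime

from airflow import models
from airflow.providers.google.cloud.operators.dataproc import DataprocCreateBatchOperator

with models.DAG(
    dag_id="serverless_spark_batch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_batch = DataprocCreateBatchOperator(
        task_id="run_spark_batch",
        project_id="my-project",
        region="us-central1",
        batch={
            "pyspark_batch": {
                "main_python_file_uri": "gs://my-bucket/my_job.py",
            },
        },
        # batch_id is templated, so each scheduled run gets a unique ID.
        batch_id="nightly-batch-{{ ds_nodash }}",
    )
```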
Get started
To get started, see Run an Apache Spark batch workload.
Interactive sessions
Write and run code in Jupyter notebooks during a Serverless for Apache Spark interactive session. You can create a notebook session in the following ways:
- Run PySpark code in BigQuery Studio notebooks. Open a BigQuery Python notebook to create a Spark Connect-based Serverless for Apache Spark interactive session. Each BigQuery notebook can have only one active Serverless for Apache Spark session associated with it.
- Use the Dataproc JupyterLab plugin to create multiple Jupyter notebook sessions from templates that you create and manage. When you install the plugin on a local machine or Compute Engine VM, different cards that correspond to different Spark kernel configurations appear on the JupyterLab launcher page. Click a card to create a Serverless for Apache Spark notebook session, then start writing and testing your code in the notebook.
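In either kind of notebook session, you write standard PySpark code against the session's `spark` object. The following is a minimal sketch, assuming the notebook has already created or exposed a SparkSession named `spark` and that the Spark BigQuery connector is available in the runtime; the public table and columns are only an example:

```python
# Read a BigQuery public table through the Spark BigQuery connector.
df = (
    spark.read.format("bigquery")
    .option("table", "bigquery-public-data.samples.shakespeare")
    .load()
)

# Aggregate and preview the results directly in the notebook.
top_words = (
    df.groupBy("word")
    .sum("word_count")
    .orderBy("sum(word_count)", ascending=False)
    .limit(10)
)
top_words.show()
```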
The Dataproc JupyterLab plugin also lets you use the JupyterLab launcher page to take the following actions:
- Create Dataproc on Compute Engine clusters.
- Submit jobs to Dataproc on Compute Engine clusters.
- View Google Cloud and Spark logs.
Security compliance
Serverless for Apache Spark meets the same data residency, CMEK, VPC Service Controls (VPC-SC), and other security requirements that Dataproc meets.