Watch the Spark on Google Cloud session at Next 2021 here.

Spark on Google Cloud

The industry’s first autoscaling serverless Spark, integrated with the best of Google-native and open source tools. Develop and run Spark where you need it across all use cases, including ETL, data science, and exploration.

Benefits

Increase developer productivity and get faster data insights

Operational simplicity through serverless Spark

Write Spark applications and pipelines that autoscale without any manual infrastructure provisioning or tuning. 

Seamless Spark for all data users

Spark is integrated with BigQuery, Vertex AI, and Dataplex, so you can write and run it from these interfaces in two clicks, without custom integrations, for ETL, data exploration, analysis, and ML. 

Flexibility of consumption

One size does not fit all. You can choose between serverless, Kubernetes clusters, and compute clusters for your Spark applications.

Key features

Run Spark jobs that autoscale, from the interface of your choice, in two clicks

Serverless Spark (GA coming soon)

Developers can spend all their time on code and logic, and use their chosen interface to submit Spark jobs which auto-provision and auto-scale.
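As a rough illustration of submitting a job with no cluster to provision, the sketch below assembles a `gcloud dataproc batches submit pyspark` command line in Python. It assumes serverless Spark jobs are submitted as Dataproc batches through the gcloud CLI; the bucket, file path, region, and job arguments are hypothetical placeholders, and the helper name `build_batch_submit_command` is introduced here for illustration only.

```python
import shlex


def build_batch_submit_command(main_file, region, deps_bucket=None, job_args=()):
    """Return the argv list for a serverless PySpark batch submission.

    Assumes the gcloud CLI is installed and authenticated; this function
    only builds the command and does not run it.
    """
    cmd = [
        "gcloud", "dataproc", "batches", "submit", "pyspark",
        main_file,
        f"--region={region}",
    ]
    if deps_bucket:
        # Bucket used to stage local dependencies (hypothetical name below).
        cmd.append(f"--deps-bucket={deps_bucket}")
    if job_args:
        # Everything after "--" is passed through to the Spark job itself.
        cmd.append("--")
        cmd.extend(job_args)
    return cmd


# Hypothetical job file and arguments, shown only to illustrate the shape.
command = build_batch_submit_command(
    "gs://example-bucket/jobs/etl.py",
    region="us-central1",
    deps_bucket="example-bucket",
    job_args=["--run-date", "2021-10-12"],
)
print(shlex.join(command))
```

The command contains no worker count or machine type: in this model, sizing is left to the service rather than to the submitter. To actually run it, the list could be passed to `subprocess.run(command, check=True)`.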

Spark through BigQuery (Private Preview)

Unified SQL and Spark experience: data warehouse users can write and run Spark on BigQuery data without exporting it. No infrastructure management required.
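One way to read BigQuery data from Spark without an export step is the open source spark-bigquery connector, sketched below in PySpark. This is an assumption for illustration: the source does not say how the Preview integration reads data, and the connector must be available on the Spark classpath. The helper names `table_id` and `top_words` are hypothetical; the table used in the usage comment is the public `bigquery-public-data.samples.shakespeare` sample.

```python
def table_id(project, dataset, table):
    """Build the "project.dataset.table" string the connector expects."""
    return f"{project}.{dataset}.{table}"


def top_words(spark, table, limit=10):
    """Return the most frequent words in a BigQuery table of word counts.

    `spark` is an existing SparkSession. The table is assumed to have
    `word` and `word_count` columns, as the public Shakespeare sample does.
    """
    # Read directly from BigQuery through the connector; no export to files.
    df = spark.read.format("bigquery").option("table", table).load()
    return (
        df.groupBy("word")
        .sum("word_count")
        .withColumnRenamed("sum(word_count)", "total")
        .orderBy("total", ascending=False)
        .limit(limit)
    )


# Usage, assuming pyspark and the connector are available:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("bq-example").getOrCreate()
#   top_words(spark, table_id("bigquery-public-data", "samples", "shakespeare")).show()
```

The aggregation runs in Spark, while the data stays in BigQuery until it is read, which is the workflow the description above points at.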

Spark through Vertex AI (Private Preview)

Spark for data science in one click: Data scientists can use Spark for development from Vertex AI Workbench seamlessly, with built-in security. Spark is integrated with Vertex AI's MLOps features, where users can execute Spark code through notebook executors that are integrated with Vertex AI Pipelines.

Spark through Dataplex (Private Preview)

Run auto-scaling Spark on data across Google Cloud from a single interface that has one-click access to SparkSQL, Notebooks, or PySpark. It also offers easy collaboration, with the ability to save, share, and search notebooks and scripts alongside data, and built-in governance across data lakes.

Flexible consumption options

In addition to serverless Spark for no-ops deployment, customers standardizing on Kubernetes for infrastructure management can run Spark on Google Kubernetes Engine (Private Preview) to improve resource utilization and simplify infrastructure management. Customers looking for Hadoop-style infrastructure management can run Spark on Compute Engine (GA).


Register interest here to request early access to the new solutions for Spark on Google Cloud