Spark and Hadoop on Google Cloud Platform

You can run cost-effective Apache Spark and Apache Hadoop clusters on Google Cloud Platform. Doing so combines these open source ecosystems with Google's reliable, highly scalable infrastructure, Google's pricing philosophy, and a portfolio of other cloud technologies.

There are several ways to run Spark and Hadoop on Google Cloud Platform, including:

  1. Google Cloud Dataproc — A managed Spark and Hadoop service that allows anyone to create and use fast, easy, and cost-effective clusters
  2. Command line tools (bdutil) — A collection of shell scripts to manually create and manage Spark and Hadoop clusters
  3. Third-party Hadoop distributions:

Getting started

There are a few ways to quickly get started with Spark and Hadoop on Google Cloud Platform, as described below.

Cloud Dataproc

As a managed service, Cloud Dataproc is the easiest way to get started with Spark and Hadoop on Google Cloud Platform. See the Dataproc getting started guide for step-by-step instructions, and the Dataproc product page and the Dataproc documentation for more detail. You can read about our beta release of Cloud Dataproc on the Google Cloud Platform Blog.
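
As a rough illustration of what creating a cluster involves, the sketch below assembles a `gcloud dataproc clusters create` invocation in Python without running it. The cluster name, region, and worker count are hypothetical placeholders, and the helper `build_create_cluster_command` is made up for this example and is not part of any Google library. Because this page describes a beta release, the exact command group may differ (for example, it may sit under `gcloud beta`), so check the Dataproc documentation before relying on it.

```python
import shlex


def build_create_cluster_command(cluster_name, region, num_workers=2):
    """Return the argument list for creating a Dataproc cluster with gcloud.

    Nothing is executed here; the caller decides whether to run the command.
    """
    # Build the arguments as a list so no shell quoting is needed if the
    # command is later handed to subprocess.run.
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        f"--num-workers={num_workers}",
    ]


# Hypothetical cluster name and region, chosen only for illustration.
command = build_create_cluster_command("example-cluster", "us-central1")

# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

To actually create the cluster, the list could be passed to `subprocess.run(command, check=True)` on a machine where the Cloud SDK is installed and authenticated.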

Command line deployment (bdutil)

To manually deploy a cluster, see Command-Line Deployment. For more detailed information, see the bdutil reference and the bdutil GitHub repository.

Get involved