This tutorial shows you how to run RStudio Server on a Dataproc cluster and access the RStudio web user interface (UI) from your local machine.
This tutorial assumes that you are familiar with the R language and the RStudio web UI, and that you have some basic understanding of using Secure Shell (SSH) tunnels, Apache Spark, and Apache Hadoop running on Dataproc.
Objectives
This tutorial walks you through the following procedures:
- Connect R through Apache Spark to Apache Hadoop YARN running on a Dataproc cluster.
- Connect your browser through an SSH tunnel to access the RStudio, Spark, and YARN UIs.
- Run an example query on Dataproc using RStudio.
Costs
This tutorial uses the following billable components of Google Cloud:
- Dataproc
- Cloud Storage
To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.
Before you begin
- Sign in to your Google Account. If you don't already have one, sign up for a new account.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the Dataproc and Cloud Storage APIs.
- Install and initialize the Cloud SDK.
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. See Cleaning up for more information.
Creating a Dataproc cluster
In the Cloud Console, go to the Dataproc Clusters page.
Click Create Cluster.
Name your cluster, and click Create.
For this tutorial, the default cluster sizes are adequate. Note the zone that you created the cluster in, because you will need that information in later steps.
Installing RStudio Server and its dependencies on the controller (master) node
Linux or macOS
On your local machine, connect through SSH to the controller (master) node of your Dataproc cluster:
gcloud compute ssh \
    --zone=CLUSTER_ZONE \
    --project=PROJECT_ID \
    CLUSTER_NAME-m
Where:
- CLUSTER_ZONE is the zone where your cluster was created.
- PROJECT_ID is the ID of your project.
- CLUSTER_NAME is the name of your cluster.
- CLUSTER_NAME-m is the controller node name of the cluster.
On the controller node, install the required packages and dependencies:
sudo apt-get update
sudo apt-get install -y \
    r-base r-base-dev \
    libcurl4-openssl-dev libssl-dev libxml2-dev
Follow the instructions on the RStudio website to download and install the latest RStudio Server version for 64-bit Debian Linux.
Windows
On your local machine, connect through SSH to the controller node of your Dataproc cluster:
gcloud compute ssh ^
    --zone=CLUSTER_ZONE ^
    --project=PROJECT_ID ^
    CLUSTER_NAME-m
Where:
- CLUSTER_ZONE is the zone where your cluster was created.
- PROJECT_ID is the ID of your project.
- CLUSTER_NAME is the name of your cluster.
- CLUSTER_NAME-m is the controller node name of the cluster.
On the controller node, install the required packages and dependencies:
sudo apt-get update
sudo apt-get install -y \
    r-base r-base-dev \
    libcurl4-openssl-dev libssl-dev libxml2-dev
Follow the instructions on the RStudio website to download and install the latest RStudio Server version for 64-bit Debian Linux.
Creating a user account on the controller node
To create a user account to log in to the RStudio UI, follow these steps.
Create a new user account, replacing USER_NAME with the new username:

sudo adduser USER_NAME
When you are prompted, enter a password for the new user.
You can create multiple user accounts on the controller node to give users their own RStudio environment. For each user that you create, follow the sparklyr and Spark installation steps.
Connecting to the RStudio web UI
RStudio Server runs on the Dataproc controller node and is accessible only from the Google Cloud internal network. To access the server, you need a network path between your local machine and the controller node on the Google Cloud internal network.
You can connect by port forwarding through an SSH tunnel, which is more secure than opening a firewall port to the controller node. Using an SSH tunnel encrypts your connection to the web UI, even though the server uses simple HTTP.
There are two options for port forwarding: dynamic port forwarding using SOCKS, or TCP port forwarding.
Using SOCKS, you can view all internal web interfaces that are running on the Dataproc controller node; however, you need to use a custom browser configuration to redirect all browser traffic over the SOCKS proxy.
TCP port forwarding does not require a custom browser configuration, but you can only view the RStudio web interface.
Connect through an SSH SOCKS tunnel
To create an SSH SOCKS tunnel and connect by using a specially configured browser profile, follow the steps in Connecting to the web interfaces.
After you connect, use the following URLs to access the web interfaces.
- To load the RStudio web UI, connect your specially configured browser to http://CLUSTER_NAME-m:8787. Then log in by using the username and password that you created.
- To load the YARN resource manager web UI, connect your specially configured browser to http://CLUSTER_NAME-m:8088.
- To load the HDFS NameNode web UI, connect your specially configured browser to http://CLUSTER_NAME-m:9870.
Connect through SSH port forwarding
Linux or macOS
On your local machine, connect to the Dataproc controller node:
gcloud compute ssh \
    --zone=CLUSTER_ZONE \
    --project=PROJECT_ID \
    CLUSTER_NAME-m -- \
    -L 8787:localhost:8787

The -- parameter separates arguments for the gcloud command from arguments that are sent to the ssh command. The -L option sets up TCP port forwarding from port 8787 on the local machine to port 8787 on the cluster controller node, where RStudio Server is listening.

To load the RStudio web UI, connect your browser to http://localhost:8787.

Log in by using the username and password that you created.
Windows
On your local machine, connect to the Dataproc controller node:
gcloud compute ssh ^
    --zone=CLUSTER_ZONE ^
    --project=PROJECT_ID ^
    CLUSTER_NAME-m -- ^
    -L 8787:localhost:8787

The -- parameter separates arguments for the gcloud command from arguments that are sent to the ssh command. The -L option sets up TCP port forwarding from port 8787 on the local machine to port 8787 on the cluster controller node, where RStudio Server is listening.

To load the RStudio web UI, connect your browser to http://localhost:8787.

Log in by using the username and password that you created.
Installing the sparklyr package and Spark
To install the sparklyr package and Spark, in the RStudio R console, run the following commands:
install.packages("sparklyr") sparklyr::spark_install()
These commands download, compile, and install the required R packages and a compatible Spark instance. Each command takes several minutes to complete.
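By default, spark_install() installs a recent Spark version that sparklyr supports. If you want to match a specific Spark version, for example the one running on your cluster, you can list the installable versions first and then request one. The following is a minimal sketch; the version string shown is only an example, not a recommendation:

# List the Spark versions that sparklyr can download and install.
sparklyr::spark_available_versions()

# Install a specific version from that list (example value only).
sparklyr::spark_install(version = "2.4")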
Connecting R to Spark on YARN
Each time you restart an R session, follow these steps:
Load the libraries and set up the necessary environment variables:
library(sparklyr)
library(dplyr)
spark_home_set()
Sys.setenv(HADOOP_CONF_DIR = '/etc/hadoop/conf')
Sys.setenv(YARN_CONF_DIR = '/etc/hadoop/conf')
Connect to Spark on YARN, using the default settings:
sc <- spark_connect(master = "yarn-client")
The sc object references your Spark connection, which you can use to manage data and execute queries in R. If the command succeeds, skip to Checking the status of the Spark connection.
If the command fails with an error message starting with:
Error in force(code) : Failed during initialize_connection: java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
Then there is an incompatibility between the version of YARN and the version of Spark that RStudio uses. You can avoid this incompatibility by disabling the YARN timeline service.
In the menu of the RStudio web UI, navigate to Tools > Shell to open a new Terminal tab.
In the Terminal tab, enter the following command to disable the service causing the incompatibility.
echo "spark.hadoop.yarn.timeline-service.enabled false" \ >> $SPARK_HOME/conf/spark-defaults.conf
Close the Terminal tab, and in the menu navigate to Session > Restart R.
Repeat steps 1 and 2. This time, the connection to Spark should succeed.
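Alternatively, instead of editing spark-defaults.conf on disk, you can disable the timeline service for a single session through the sparklyr connection configuration. This is a minimal sketch that uses the standard spark_config() mechanism:

library(sparklyr)

# Build a connection configuration and turn off the YARN timeline service
# client for this session only, without changing spark-defaults.conf.
config <- spark_config()
config[["spark.hadoop.yarn.timeline-service.enabled"]] <- FALSE

# Connect by using the custom configuration.
sc <- spark_connect(master = "yarn-client", config = config)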
Checking the status of the Spark connection
The sc object created above is the reference to your Spark connection. To confirm that the R session is connected, execute the following command:
spark_connection_is_open(sc)
If your connection is established, the command returns the following:
[1] TRUE
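Conversely, when you eventually finish working with Spark (for example, at the end of this tutorial), you can close the connection explicitly; the same check then reports the closed state:

# Close the Spark connection when you are done with it.
spark_disconnect(sc)

# The status check now returns FALSE. Reconnect with spark_connect()
# if you want to continue working.
spark_connection_is_open(sc)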
You can tune the connection parameters by using a configuration object that is passed to spark_connect().
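For example, the following sketch overrides a few common YARN resource settings; the values shown are placeholders that you would adjust for your cluster size:

library(sparklyr)

# Create a configuration object and override selected Spark settings.
config <- spark_config()
config$spark.executor.memory    <- "2g"  # memory per executor
config$spark.executor.cores     <- 2     # cores per executor
config$spark.executor.instances <- 4     # number of executors

# Pass the configuration object when connecting.
sc <- spark_connect(master = "yarn-client", config = config)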
For more details on sparklyr connection parameters and on tuning Spark on YARN, see the sparklyr documentation and the Apache Spark documentation for running on YARN.
Optional: Verifying your installation
To verify that everything is working, you can load a table onto the Dataproc cluster and perform a query.
In the R console, install the example dataset, a list of all New York City flights in 2013, and copy it into Spark:
install.packages("nycflights13") flights_tbl <- copy_to(sc, nycflights13::flights, "flights")
If you are not using SOCKS port forwarding, skip to step 3. Otherwise, use the Spark UI to verify that the table was created.
In the browser that you configured, load the YARN resource manager:
http://CLUSTER_NAME-m:8088
In the application list, a row for the sparklyr app will appear in the table.
In the Tracking UI column, on the right side of the table, click the ApplicationMaster link to access the Spark UI.
In the Jobs tab of the Spark UI, you will see entries for the jobs that copied the data to Spark. In the Storage tab, you will see an entry for In-memory table 'flights'.
In the R console, run the following query:
flights_tbl %>% select(carrier, dep_delay) %>% group_by(carrier) %>% summarize(count = n(), mean_dep_delay = mean(dep_delay)) %>% arrange(desc(mean_dep_delay))
This query computes the number of flights and the average departure delay for each airline, sorted in descending order of average delay, and produces the following result:
# Source: lazy query [?? x 3]
# Database: spark_connection
# Ordered by: desc(mean_dep_delay)
  carrier  count mean_dep_delay
  <chr>    <dbl>          <dbl>
1 F9        685.           20.2
2 EV      54173.           20.0
3 YV        601.           19.0
4 FL       3260.           18.7
5 WN      12275.           17.7
...
If you go back to the Jobs tab in the Spark UI, you can see the jobs that are used to execute this query. For longer-running jobs, you can use this tab to monitor progress.
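The result above is a lazy Spark query, so the rows still live on the cluster. If you want to analyze or plot the summary with ordinary R tools, you can pull it into a local data frame; a minimal sketch:

library(dplyr)

# Run the aggregation on Spark and copy only the small summary table
# (not the full flights data) into local R memory.
delay_summary <- flights_tbl %>%
  select(carrier, dep_delay) %>%
  group_by(carrier) %>%
  summarize(count = n(), mean_dep_delay = mean(dep_delay)) %>%
  arrange(desc(mean_dep_delay)) %>%
  collect()

# delay_summary is now an ordinary tibble in the local R session.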
Acknowledgments
Thanks to Mango Solutions for their assistance in preparing certain technical content for this article.
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
- Delete the Dataproc cluster.
- If you have no other Dataproc clusters in the same region, you also need to delete the Cloud Storage bucket that was automatically created for the region.
Delete the cluster
In the Cloud Console, go to the Dataproc Clusters page:
In the cluster list, find the row for the Dataproc cluster that you created, and in the Cloud Storage staging bucket column, make a note of the bucket name, which begins with the word dataproc.
Select the checkbox next to your cluster's name, and click Delete.
When you are prompted to delete the cluster, confirm the deletion.
Delete the bucket
To delete the Cloud Storage bucket, go to the Cloud Storage browser:
Find the bucket that is associated with the Dataproc cluster that you just deleted.
Select the checkbox next to the bucket name, and click Delete.
When you are prompted to delete the storage bucket, confirm the deletion.
What's next
- For other ways of interacting with Dataproc, see Samples and Tutorials.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.