Dataproc is a managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. With less time and money spent on administration, you can focus on your jobs and your data.

Training and tutorials

Try Dataproc tutorials, courses, and self-paced training from Google Cloud Skills Boost.

Use cases

Explore use cases, reference architectures, whitepapers, best practices, and industry solutions.

Code samples

Dive into coding with examples that demonstrate how to use and connect Google Cloud services.

Videos