Core and common tasks
-
Authenticate to Dataproc
Learn how to authenticate to Dataproc.
-
Create a cluster
Create a cluster using the Google Cloud console or the Google Cloud CLI.
-
Create a partial cluster
Create a partial cluster that has a minimum number of primary workers.
-
Create a custom image
Learn how to create a custom image and install it on a Dataproc cluster.
-
Create and manage labels
Create and manage Dataproc user labels.
-
Manage Java and Scala dependencies for Spark
Learn how to manage Java and Scala dependencies and resolve conflicts for Apache Spark applications.
-
Manage a cluster
Update or shut down a cluster.
-
Run Vertex AI Workbench notebooks on Dataproc clusters
Run the notebook file of a managed instance on a Dataproc cluster.
-
Set up a project
Set up a new project to use Dataproc.
-
Starting and stopping clusters
Start and stop a Dataproc cluster.
-
Submit a job
Submit different job types by using the Google Cloud console or the Google Cloud CLI, or by connecting to a cluster instance over SSH.
Dataproc on GKE
-
Dataproc on GKE overview
An overview of Dataproc on GKE.
-
Quickstart: Run a Spark job on Dataproc on GKE
Create a Dataproc on GKE virtual cluster, then run a Spark job on the virtual cluster.
-
Recreate and update a Dataproc on GKE virtual cluster
Recreate and update a Dataproc on GKE virtual cluster.
-
Delete a Dataproc on GKE virtual cluster
Delete a Dataproc on GKE virtual cluster.
-
Custom Dataproc on GKE container images
How to create Dataproc on GKE container images.
-
Diagnose a Dataproc on GKE cluster
How to diagnose a Dataproc on GKE cluster.
-
Dataproc on GKE IAM roles and identity
Dataproc on GKE IAM permissions.
-
Dataproc on GKE logging
View Dataproc on GKE logs.
-
Dataproc on GKE node pools
Manage Dataproc on GKE node pools.
-
Dataproc on GKE release versions
Dataproc on GKE release version information.
-
Scale a Dataproc on GKE cluster
How to scale a Dataproc on GKE cluster.
Dataproc Hub
Dataproc node groups
Dataproc Templates
-
Dataproc templates
Use Dataproc templates to set up and run Dataproc workloads and jobs.
Logging and monitoring
-
Dataproc logs
Use Cloud Logging to view Dataproc cluster and job logs.
-
Dataproc job output and logs
Configure and view Dataproc job output.
-
View Dataproc audit logs
How to view Dataproc audit logs.
-
Cloud Monitoring
Use Cloud Monitoring to view Dataproc cluster metrics.
-
Create Dataproc metric alerts
Create Dataproc cluster and job metric alerts.
-
Cloud Profiler
Use Cloud Profiler to profile Spark and Hadoop job CPU usage and memory allocation.
Migrating to Dataproc
Dataproc performance enhancements