Core and common tasks
-
Create a cluster
Create a cluster using the Google Cloud console or the Google Cloud CLI.
-
Create a custom image
Learn how to create a custom image and install it on a Dataproc cluster.
-
Create and manage labels
Create and manage Dataproc user labels.
-
Manage Java and Scala dependencies for Spark
Learn how to manage Java and Scala dependencies and resolve conflicts for Apache Spark applications.
-
Manage a cluster
Update or shut down a cluster.
-
Run Vertex AI Workbench notebooks on Dataproc clusters
Run the notebook file of a managed instance on a Dataproc cluster.
-
Set up a project
Set up a new project to use Dataproc.
-
Starting and stopping clusters
Start and stop a Dataproc cluster.
-
Submit a job
Submit different job types using the Google Cloud console, the Google Cloud CLI, or an SSH connection to a cluster instance.
Dataproc on GKE
-
Dataproc on GKE overview
An overview of Dataproc on GKE.
-
Quickstart: Run a Spark job on Dataproc on GKE
Create a Dataproc on GKE virtual cluster, then run a Spark job on the virtual cluster.
-
Recreate and update a Dataproc on GKE virtual cluster
Recreate and update a Dataproc on GKE virtual cluster.
-
Delete a Dataproc on GKE virtual cluster
Delete a Dataproc on GKE virtual cluster.
-
Custom Dataproc on GKE container images
How to create Dataproc on GKE container images.
-
Diagnose a Dataproc on GKE cluster
How to diagnose a Dataproc on GKE cluster.
-
Dataproc on GKE IAM roles and identity
Dataproc on GKE IAM permissions.
-
Dataproc on GKE logging
View Dataproc on GKE logs.
-
Dataproc on GKE node pools
Manage Dataproc on GKE node pools.
-
Dataproc on GKE release versions
Dataproc on GKE release version information.
-
Scale a Dataproc on GKE cluster
How to scale a Dataproc on GKE cluster.
Dataproc Hub
Logging and monitoring
-
Dataproc logs
Use Cloud Logging to view Dataproc cluster and job logs.
-
Dataproc job output and logs
Configure and view Dataproc job output.
-
View Dataproc audit logs
How to view Dataproc audit logs.
-
Cloud Monitoring
Use Cloud Monitoring to view Dataproc cluster metrics.
-
Cloud Profiler
Use Cloud Profiler to profile the CPU usage and memory allocation of Spark and Hadoop jobs.