Core and common tasks
-
Create a cluster
Create a cluster using the Google Cloud Console or the gcloud command-line tool.
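For example, a minimal sketch with the gcloud tool (the cluster name and region are placeholders):

    gcloud dataproc clusters create example-cluster --region=us-central1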
-
Create a custom image
Learn how to create a custom image and install it on a Dataproc cluster.
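A sketch of using a custom image at cluster creation, assuming an image named my-custom-image has already been built in project my-project:

    gcloud dataproc clusters create example-cluster \
        --region=us-central1 \
        --image=projects/my-project/global/images/my-custom-image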
-
Create and manage labels
Create and manage Dataproc user labels.
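For illustration, attaching a label at creation time and then filtering on it (the label key and value are placeholders):

    gcloud dataproc clusters create example-cluster --region=us-central1 --labels=env=test
    gcloud dataproc clusters list --region=us-central1 --filter='labels.env=test'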
-
Manage Java and Scala dependencies for Spark
Learn how to manage Java and Scala dependencies and resolve conflicts for Apache Spark applications.
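One common approach is pulling dependencies at submit time through the spark.jars.packages Spark property; a sketch with placeholder class, jar, and Maven coordinates:

    gcloud dataproc jobs submit spark \
        --cluster=example-cluster \
        --region=us-central1 \
        --class=org.example.MyApp \
        --jars=gs://my-bucket/my-app.jar \
        --properties=spark.jars.packages=com.example:my-dep:1.0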
-
Manage a cluster
Update or shut down a cluster.
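For example, scaling the worker count and then deleting a cluster (names and values are placeholders):

    gcloud dataproc clusters update example-cluster --region=us-central1 --num-workers=5
    gcloud dataproc clusters delete example-cluster --region=us-central1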
-
Run jobs on Google Kubernetes Engine (Beta)
Run Dataproc jobs on a GKE cluster.
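A sketch of creating a GKE-backed Dataproc cluster; the --gke-cluster flag and image version reflect the Beta gcloud surface and may change, and all names are placeholders:

    gcloud beta dataproc clusters create example-cluster \
        --region=us-central1 \
        --gke-cluster=example-gke-cluster \
        --image-version=1.4-beta

Once the cluster exists, jobs are submitted with the same gcloud dataproc jobs submit commands used for Compute Engine-based clusters.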
-
Set up a project
Set up a new project to use Dataproc.
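A minimal sketch of project setup with gcloud (the project ID is a placeholder):

    gcloud projects create example-project-id
    gcloud config set project example-project-id
    gcloud services enable dataproc.googleapis.com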
-
Start and stop clusters
Start and stop a Dataproc cluster.
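For example (cluster name and region are placeholders):

    gcloud dataproc clusters stop example-cluster --region=us-central1
    gcloud dataproc clusters start example-cluster --region=us-central1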
-
Submit a job
Submit different job types using the Google Cloud Console or the gcloud command-line tool, or by connecting to a cluster instance over SSH.
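A sketch of submitting the SparkPi example that ships on Dataproc cluster images (cluster name and region are placeholders):

    gcloud dataproc jobs submit spark \
        --cluster=example-cluster \
        --region=us-central1 \
        --class=org.apache.spark.examples.SparkPi \
        --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
        -- 1000

The trailing 1000 is passed through as an argument to the job.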
Dataproc Hub
Logging and monitoring
-
Job driver output
Use the console, gcloud command-line tool, or Cloud Storage to view Dataproc job driver output.
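For example, gcloud can stream a job's driver output to the terminal (the job ID is a placeholder):

    gcloud dataproc jobs wait example-job-id --region=us-central1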
-
Cloud Logging
Use Cloud Logging to view Dataproc cluster and job logs.
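A sketch of reading cluster logs from the command line; the filter assumes the cloud_dataproc_cluster resource type and a placeholder cluster name:

    gcloud logging read \
        'resource.type=cloud_dataproc_cluster AND resource.labels.cluster_name=example-cluster' \
        --limit=10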
-
Cloud Monitoring
Use Cloud Monitoring to view Dataproc cluster metrics.
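For illustration, a Metrics Explorer filter for one Dataproc cluster metric; the metric shown is an example, and the full list is in the Monitoring documentation:

    resource.type="cloud_dataproc_cluster"
    metric.type="dataproc.googleapis.com/cluster/hdfs/storage_utilization"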
-
Cloud Profiler
Use Cloud Profiler to profile Spark and Hadoop job CPU usage and memory allocation.
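A hedged sketch of the prerequisites: enable the Profiler API and create a cluster whose service account can write profiles; see the Profiler guide for the job-level settings that attach the profiling agent:

    gcloud services enable cloudprofiler.googleapis.com
    gcloud dataproc clusters create example-cluster --region=us-central1 --scopes=cloud-platform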