Dataproc monitoring and troubleshooting tools

Introduction

Dataproc is a fully managed and highly scalable service for running open-source distributed processing platforms such as Apache Hadoop, Apache Spark, Apache Flink, and Trino. You can use the files and tools discussed in the following sections to troubleshoot and monitor your Dataproc clusters and jobs.

Open source web interfaces

Many open source components on Dataproc clusters, such as Apache Hadoop and Apache Spark, provide web interfaces that you can use to monitor cluster resources and job performance. For example, you can use the YARN Resource Manager UI to view YARN application resource allocation on a Dataproc cluster.
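
As a minimal sketch, assuming the google-cloud-dataproc Python client library: the following creates a cluster with Component Gateway enabled so that the cluster's open source web UIs, such as the YARN Resource Manager UI, are reachable from the Google Cloud console. The project, region, and cluster name are placeholders.

```python
# Minimal sketch: create a Dataproc cluster with Component Gateway enabled so
# that open source web UIs (for example, the YARN Resource Manager UI) are
# reachable from the Google Cloud console.
from google.cloud import dataproc_v1

project, region = "my-project", "us-central1"  # Placeholders.

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = dataproc_v1.Cluster(
    project_id=project,
    cluster_name="example-cluster",
    config=dataproc_v1.ClusterConfig(
        # Component Gateway proxies the cluster's web interfaces.
        endpoint_config=dataproc_v1.EndpointConfig(enable_http_port_access=True)
    ),
)

client.create_cluster(
    request={"project_id": project, "region": region, "cluster": cluster}
).result()  # Block until cluster creation finishes.
```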

Persistent History Server

Open source web interfaces running on a cluster are available while the cluster is running, but they become unavailable when you delete the cluster. To view cluster and job data after a cluster is deleted, you can create a Persistent History Server (PHS).

Example: You encounter a job error or slowdown that you want to analyze. You stop or delete the job cluster, then view and analyze job history data using your PHS.

After you create a PHS, you enable it on a Dataproc cluster or Dataproc Serverless batch workload when you create the cluster or submit the batch workload. A PHS can access history data for jobs run on multiple clusters, letting you monitor jobs across a project instead of switching between separate UIs on different clusters.
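
As an illustrative sketch, not the complete PHS setup, assuming the google-cloud-dataproc Python client library: a job cluster can be configured to write Spark event logs to the Cloud Storage location that an existing PHS cluster reads from. The bucket path is a placeholder; see the PHS documentation for the full set of properties (for example, MapReduce and YARN log directories).

```python
# Illustrative sketch: configure a job cluster to write Spark event logs to
# the Cloud Storage location that an existing PHS cluster is set up to read.
from google.cloud import dataproc_v1

project, region = "my-project", "us-central1"  # Placeholders.
history_dir = "gs://my-phs-bucket/job-cluster/spark-job-history"  # Placeholder.

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job_cluster = dataproc_v1.Cluster(
    project_id=project,
    cluster_name="job-cluster",
    config=dataproc_v1.ClusterConfig(
        software_config=dataproc_v1.SoftwareConfig(
            properties={
                # Spark writes event logs here; the PHS Spark History Server
                # reads the matching location after this cluster is deleted.
                "spark:spark.eventLog.dir": history_dir,
                "spark:spark.history.fs.logDirectory": history_dir,
            }
        )
    ),
)

client.create_cluster(
    request={"project_id": project, "region": region, "cluster": job_cluster}
).result()
```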

Dataproc logs

Dataproc collects the logs generated by Apache Hadoop, Spark, Hive, ZooKeeper, and other open source systems running on your clusters, and sends them to Logging. Logs are grouped by source, which lets you select and view the logs that interest you: for example, YARN NodeManager and Spark executor logs generated on a cluster are labeled separately. See Dataproc logs for more information on Dataproc log contents and options.

Cloud Logging

Logging is a fully managed, real-time log management system. It provides storage for logs ingested from Google Cloud services, and tools to search, filter, and analyze logs at scale. Dataproc clusters generate multiple logs, including Dataproc service agent logs, cluster startup logs, and OSS component logs such as YARN NodeManager logs.

Logging is enabled by default on Dataproc clusters and Dataproc Serverless batch workloads. Logs are periodically exported to Logging, where they persist after the cluster is deleted or the workload completes.
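
As a minimal sketch, assuming the google-cloud-logging Python client library: you might list recent log entries for one cluster, narrowed to a single log stream. The cluster name is a placeholder, and the logName substring match is illustrative; actual log names vary by component.

```python
# Minimal sketch: list recent Dataproc cluster log entries from Cloud Logging,
# filtered by cluster name and (illustratively) by log stream name.
from google.cloud import logging

client = logging.Client(project="my-project")  # Placeholder project.

log_filter = (
    'resource.type="cloud_dataproc_cluster" '
    'resource.labels.cluster_name="example-cluster" '
    'logName:"yarn"'  # Select YARN-related log streams (illustrative).
)

for entry in client.list_entries(filter_=log_filter, max_results=20):
    print(entry.timestamp, entry.payload)
```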

Dataproc metrics

Dataproc cluster and job metrics, prefixed with dataproc.googleapis.com/, consist of time-series data that provide insights into the performance of a cluster, such as CPU utilization or job status. Dataproc custom metrics, prefixed with custom.googleapis.com/, include metrics emitted by open source systems running on the cluster, such as the YARN running applications metric. Gaining insight into Dataproc metrics can help you configure your clusters efficiently. Setting up metric-based alerts can help you recognize and respond to problems quickly.

Dataproc cluster and job metrics are collected by default at no charge. Collection of custom metrics is billable; you can enable it when you create a cluster. Collection of Dataproc Serverless Spark metrics is enabled by default on Spark batch workloads.
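
A hedged sketch of enabling custom metric collection at cluster creation, assuming the DataprocMetricConfig field of the google-cloud-dataproc Python client library; the Spark and YARN metric sources shown are examples, and collecting them is billable.

```python
# Hedged sketch: enable collection of Spark and YARN OSS (custom) metrics
# when creating a cluster. The metric sources shown are examples.
from google.cloud import dataproc_v1

project, region = "my-project", "us-central1"  # Placeholders.

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

metric_config = dataproc_v1.DataprocMetricConfig(
    metrics=[
        dataproc_v1.DataprocMetricConfig.Metric(
            metric_source=dataproc_v1.DataprocMetricConfig.MetricSource.SPARK
        ),
        dataproc_v1.DataprocMetricConfig.Metric(
            metric_source=dataproc_v1.DataprocMetricConfig.MetricSource.YARN
        ),
    ]
)

cluster = dataproc_v1.Cluster(
    project_id=project,
    cluster_name="metrics-cluster",
    config=dataproc_v1.ClusterConfig(dataproc_metric_config=metric_config),
)

client.create_cluster(
    request={"project_id": project, "region": region, "cluster": cluster}
).result()
```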

Cloud Monitoring

Monitoring uses cluster metadata and metrics, including HDFS, YARN, job, and operation metrics, to provide visibility into the health, performance, and availability of Dataproc clusters and jobs. You can use Monitoring to explore metrics, add charts, build dashboards, and create alerts.

Metrics Explorer

You can use the Metrics Explorer to view Dataproc metrics. Dataproc cluster, job, and serverless batch metrics are listed under the Cloud Dataproc Cluster, Cloud Dataproc Job, and Cloud Dataproc Batch resources. Dataproc custom metrics are listed under the VM Instances resource, Custom category.
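
Metrics Explorer is a console UI; as a programmatic counterpart, the following sketch, assuming the google-cloud-monitoring Python client library, reads one documented Dataproc cluster metric as a time series over the last hour. The project name is a placeholder.

```python
# Illustrative sketch: read a Dataproc cluster metric as a time series via the
# Cloud Monitoring API, the programmatic counterpart to Metrics Explorer.
import time

from google.cloud import monitoring_v3

project = "my-project"  # Placeholder.
client = monitoring_v3.MetricServiceClient()

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": f"projects/{project}",
        "filter": 'metric.type = "dataproc.googleapis.com/cluster/hdfs/storage_utilization"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    cluster_name = series.resource.labels.get("cluster_name", "")
    for point in series.points:
        print(cluster_name, point.value.double_value)
```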

Charts

You can use Metrics Explorer to create charts that visualize Dataproc metrics.

Example: You create a chart to see the number of active YARN applications running on your clusters, and then add a filter to select visualized metrics by cluster name or region.

Dashboards

You can build dashboards that monitor Dataproc clusters and jobs using metrics from multiple projects and different Google Cloud products. You can build a dashboard in the Google Cloud console from the Dashboards Overview page by creating a chart in Metrics Explorer and then saving it to the dashboard.
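
As a rough sketch, assuming the google-cloud-monitoring-dashboards Python client library: a dashboard with a single Dataproc metric chart might be created programmatically as follows. The display names and metric choice are illustrative.

```python
# Rough sketch: create a dashboard containing one chart of a documented
# Dataproc cluster metric. Names and the metric choice are illustrative.
from google.cloud import monitoring_dashboard_v1

project = "my-project"  # Placeholder.
client = monitoring_dashboard_v1.DashboardsServiceClient()

chart = monitoring_dashboard_v1.XyChart(
    data_sets=[
        monitoring_dashboard_v1.XyChart.DataSet(
            time_series_query=monitoring_dashboard_v1.TimeSeriesQuery(
                time_series_filter=monitoring_dashboard_v1.TimeSeriesFilter(
                    filter=(
                        'metric.type = '
                        '"dataproc.googleapis.com/cluster/yarn/allocated_memory_percentage"'
                    )
                )
            )
        )
    ]
)

dashboard = monitoring_dashboard_v1.Dashboard(
    display_name="Dataproc overview (example)",
    grid_layout=monitoring_dashboard_v1.GridLayout(
        widgets=[
            monitoring_dashboard_v1.Widget(
                title="YARN allocated memory %", xy_chart=chart
            )
        ]
    ),
)

client.create_dashboard(
    request={"parent": f"projects/{project}", "dashboard": dashboard}
)
```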

Alerts

You can create Dataproc metric alerts to receive timely notice of cluster or job issues.
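
As a hedged sketch, assuming the google-cloud-monitoring Python client library: an alert policy on a Dataproc cluster metric might look like the following. The metric, threshold, and duration are illustrative, and notification channels are omitted for brevity.

```python
# Hedged sketch: create a metric-threshold alert policy on a Dataproc cluster
# metric. Threshold and duration are illustrative; notification channels are
# omitted for brevity.
from google.cloud import monitoring_v3

project = "my-project"  # Placeholder.
client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="Dataproc HDFS storage high (example)",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="HDFS storage utilization above threshold",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'metric.type = '
                    '"dataproc.googleapis.com/cluster/hdfs/storage_utilization" '
                    'AND resource.type = "cloud_dataproc_cluster"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=0.8,  # Illustrative threshold.
                duration={"seconds": 300},  # Sustained for 5 minutes.
            ),
        )
    ],
)

client.create_alert_policy(name=f"projects/{project}", alert_policy=policy)
```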

For more information

For additional guidance, see