Stackdriver Monitoring provides visibility into the performance, uptime, and overall health of cloud-powered applications. Stackdriver collects and ingests metrics, events, and metadata from Cloud Dataproc clusters to generate insights via dashboards and charts.
Use Stackdriver cluster metrics to monitor the performance and health of Cloud Dataproc clusters.
See Stackdriver Pricing to understand your costs.
See Monitoring Quotas and limits for information on metric data retention.
Stackdriver cluster metrics
Cloud Dataproc cluster resource metrics are automatically enabled on Cloud Dataproc clusters. Use Monitoring to view these metrics.
After creating a cluster, go to the Monitoring console to view cluster monitoring data.
When you first access Monitoring, it creates a Workspace and associates your GCP project with that Workspace. If you've never used Monitoring, this process is automatic. If you have used Monitoring, then the Add your project to a Workspace dialog is displayed. To create a new Workspace, from the New Workspace list, select your GCP project, and then click Add.
After setting up the Workspace, the Monitoring console appears. At this point, you can install the Monitoring agent on VMs in your project as an additional set-up step. You don't need to install the agent on VMs in Cloud Dataproc clusters because this step is performed for you when you create a Cloud Dataproc cluster.
Select Resources→Metrics Explorer, then click in the "Find resource type and metric" input box to display the resource drop-down list. Select the "Cloud Dataproc Cluster" resource (or type "cloud_dataproc_cluster" in the box).
Click again in the input box, and then select a metric from the drop-down list, for example "YARN memory size". Hovering over a metric name displays information about the metric.
You can select filters, group by metric labels, perform aggregations, and select chart viewing options (see the Monitoring documentation).
You can use the Monitoring API to capture and list metrics that match a filter. Use the Try this API template on the API page to send an API request and display the response.
Example: a templated request with the following Monitoring API parameters (the response is returned as JSON):
- name: projects/example-project-id
- filter: metric.type="dataproc.googleapis.com/cluster/hdfs/storage_capacity"
- interval.endTime: 2018-02-27T11:54:00.000-08:00
- interval.startTime: 2018-02-20T00:00:00.000-08:00
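The same request can be sketched outside the Try this API template. The following minimal Python example only builds the request URL from the parameters listed above; it assumes the request targets the Monitoring API v3 `projects.timeSeries.list` REST method, and the helper name `build_time_series_list_url` is hypothetical. It does not send the request.

```python
from urllib.parse import urlencode

# Root of the Monitoring API v3 REST surface (assumed target of the example).
API_ROOT = "https://monitoring.googleapis.com/v3"


def build_time_series_list_url(name, metric_filter, start_time, end_time):
    """Return the GET URL for a timeSeries.list call (hypothetical helper)."""
    query = urlencode({
        "filter": metric_filter,
        "interval.startTime": start_time,
        "interval.endTime": end_time,
    })
    return f"{API_ROOT}/{name}/timeSeries?{query}"


# Parameter values copied from the example above.
url = build_time_series_list_url(
    name="projects/example-project-id",
    metric_filter='metric.type="dataproc.googleapis.com/cluster/hdfs/storage_capacity"',
    start_time="2018-02-20T00:00:00.000-08:00",
    end_time="2018-02-27T11:54:00.000-08:00",
)
print(url)
```

To actually send the request, the URL would need an `Authorization` header carrying a valid OAuth 2.0 access token for the project, which is omitted here.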
Building a custom Monitoring dashboard
You can build a custom Monitoring dashboard that displays charts of selected Cloud Dataproc cluster metrics.
Select Dashboards→Create Dashboard from the Monitoring console.
An "Untitled Dashboard" opens. Click Add Chart. In the Add Chart window, select "Cloud Dataproc Cluster" as the resource type. Select one or more metrics, then set the metric and chart properties. Confirm the chart title or type a new one, then Save the chart.
You can add additional charts to your dashboard. After you Save the dashboard, its title appears in the Monitoring Dashboards menu.
Dashboard charts can be viewed, updated, and deleted from the dashboard display page.
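A dashboard like the one built in the console steps above can also be described as data. The sketch below assembles a dashboard definition in the JSON shape used by the Monitoring Dashboards API (`displayName`, `gridLayout`, `xyChart` widgets). This is an illustration under assumptions, not the console's own output: the helper `dataproc_chart`, the dashboard title, and the aligner settings are hypothetical choices, and the metric type is the one used in the API example earlier on this page.

```python
import json


def dataproc_chart(title, metric_type):
    """Return one line-chart widget for a Cloud Dataproc cluster metric."""
    return {
        "title": title,
        "xyChart": {
            "dataSets": [{
                "timeSeriesQuery": {
                    "timeSeriesFilter": {
                        # Restrict the chart to the Cloud Dataproc Cluster resource.
                        "filter": (
                            f'metric.type="{metric_type}" AND '
                            'resource.type="cloud_dataproc_cluster"'
                        ),
                        # Assumed alignment: one averaged point per minute.
                        "aggregation": {
                            "alignmentPeriod": "60s",
                            "perSeriesAligner": "ALIGN_MEAN",
                        },
                    }
                }
            }]
        },
    }


# Hypothetical dashboard with a single chart.
dashboard = {
    "displayName": "Cloud Dataproc cluster metrics",
    "gridLayout": {
        "columns": "2",
        "widgets": [
            dataproc_chart(
                "HDFS storage capacity",
                "dataproc.googleapis.com/cluster/hdfs/storage_capacity",
            ),
        ],
    },
}

print(json.dumps(dashboard, indent=2))
```

Keeping the definition as data makes it easy to add more charts by appending further `dataproc_chart(...)` entries to the `widgets` list.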
Using Monitoring alerts
You can create a Monitoring alert that notifies you when a Cloud Dataproc cluster or job metric crosses a specified threshold (for example, when HDFS free capacity is low).
Creating an alert
Select "Alerting→Create a Policy" from the Monitoring console.
From the Create a new alerting policy page, define an alert by adding alert conditions, notification channels, and documentation.
Select "Conditions→+ Add Condition", then from the Select condition type page, select "Metric Threshold/Rate Change/Absence".
In the Add monitoring.v3 Condition page, select the "Cloud Dataproc Cluster" metric and the alert trigger condition, then click "Save Condition".
After setting the alert condition, complete the alert policy by setting notification channels, documentation, and the name for the new alert policy from the Create a new alerting policy page.
When an alert is triggered by a metric threshold condition, Monitoring creates an incident (and a corresponding event). You can review incidents from the Monitoring Alerting→Incidents page. If you defined a notification mechanism in the alert policy, such as an email or SMS notification, Monitoring also sends a notification of the incident.
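The pieces of an alerting policy described above (a condition, notification channels, documentation, and a name) map onto the fields of a Monitoring API v3 `AlertPolicy` resource. The following sketch builds such a policy as a Python dictionary for a low HDFS capacity alert. It is an assumption-laden illustration: the helper name `hdfs_capacity_alert`, the display names, the five-minute duration, and the threshold value are hypothetical, and the threshold's unit should be checked against the metric's description before use.

```python
import json

# Metric type taken from the API example earlier on this page.
HDFS_CAPACITY_METRIC = "dataproc.googleapis.com/cluster/hdfs/storage_capacity"


def hdfs_capacity_alert(threshold, notification_channels=()):
    """Return an AlertPolicy body that fires when the metric stays below threshold."""
    return {
        "displayName": "Dataproc HDFS capacity low",  # hypothetical policy name
        "combiner": "OR",
        "conditions": [{
            "displayName": "HDFS storage capacity below threshold",
            "conditionThreshold": {
                "filter": (
                    f'metric.type="{HDFS_CAPACITY_METRIC}" AND '
                    'resource.type="cloud_dataproc_cluster"'
                ),
                "comparison": "COMPARISON_LT",
                "thresholdValue": threshold,  # placeholder; verify the metric's unit
                "duration": "300s",           # assumed: must hold for five minutes
                "aggregations": [{
                    "alignmentPeriod": "60s",
                    "perSeriesAligner": "ALIGN_MEAN",
                }],
            },
        }],
        # Resource names of existing notification channels, if any.
        "notificationChannels": list(notification_channels),
        "documentation": {
            "content": "HDFS capacity on a Cloud Dataproc cluster is low.",
            "mimeType": "text/markdown",
        },
    }


policy = hdfs_capacity_alert(threshold=100)
print(json.dumps(policy, indent=2))
```

The resulting dictionary only describes the policy; creating it would still require an authenticated call to the Monitoring API or the console steps above.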