Types of alerting policies


This document describes the different types of metric-based alerting policies and provides JSON examples for them. Alerting policies define conditions that watch for three things: a metric, behaving in a particular way, for a particular period of time. For example, an alerting policy might trigger when the value of a metric rises above a threshold, or when the value changes too quickly.

This content does not apply to log-based alerting policies. For information about log-based alerting policies, which notify you when a particular message appears in your logs, see Monitoring your logs.

For information about creating alerting policies, see the following documents:

Metric-absence condition

A metric-absence condition triggers when a monitored time series has no data for a specific duration window.

Metric-absence conditions require at least one successful measurement (one that retrieves data) within the maximum duration window after the policy was installed or modified. The maximum configurable duration window is 24 hours if you use the Google Cloud console and 24.5 hours if you use the Cloud Monitoring API.

For example, suppose you set the duration window in a metric-absence policy to 30 minutes. The condition won't trigger when the subsystem that writes metric data has never written a data point. The subsystem needs to output at least one data point and then fail to output additional data points for 30 minutes.
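
The 30-minute example can be sketched in Python. This is a simplified illustration of the rule described above, not the service's implementation: the function name `absence_condition_met` and its parameters are hypothetical, and the sketch ignores the requirement that the first measurement arrive within the maximum duration window after the policy is installed or modified.

```python
from datetime import datetime, timedelta
from typing import Sequence


def absence_condition_met(
    point_times: Sequence[datetime],
    now: datetime,
    duration: timedelta,
) -> bool:
    """Return True when data was seen at some point, but not within `duration`.

    Hypothetical helper: models only the "at least one data point, then
    silence for the whole duration window" rule.
    """
    if not point_times:
        # The subsystem never wrote a data point, so the condition can't trigger.
        return False
    # Time elapsed since the most recent data point.
    silence = now - max(point_times)
    return silence >= duration


now = datetime(2024, 1, 1, 12, 0)
window = timedelta(minutes=30)

print(absence_condition_met([], now, window))                             # False
print(absence_condition_met([now - timedelta(minutes=45)], now, window))  # True
print(absence_condition_met([now - timedelta(minutes=10)], now, window))  # False
```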

Metric-threshold condition

A metric-threshold condition triggers when the values of a metric are more than, or less than, the threshold for a specific duration window. For example, a metric-threshold condition might trigger when the CPU utilization is higher than 80% for at least 5 minutes.
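
As a rough illustration, the CPU-utilization example might be expressed as a policy along the following lines. The sketch builds the policy as a Python dictionary and serializes it to JSON. The field names reflect the `AlertPolicy` resource of the Cloud Monitoring API as best understood here, and the display names, alignment settings, and filter are assumptions chosen for illustration; check them against the API reference before use.

```python
import json

# Illustrative policy: CPU utilization above 80% for at least 5 minutes.
# Display names and aggregation settings are hypothetical choices.
policy = {
    "displayName": "High CPU utilization (example)",
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "CPU utilization above 80% for 5 minutes",
            "conditionThreshold": {
                # Selects the time series to watch.
                "filter": (
                    'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                    'AND resource.type="gce_instance"'
                ),
                "aggregations": [
                    {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                ],
                # Trigger when the value is greater than the threshold...
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0.8,
                # ...for the whole duration window (5 minutes).
                "duration": "300s",
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Here the utilization is expressed as a fraction, so 80% corresponds to a threshold value of 0.8.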

Within the class of metric-threshold conditions, there are patterns that fall into general sub-categories:

  • Rate-of-change conditions trigger when the values in a time series increase or decrease by a specific percent or more during a duration window.

    When you create this type of condition, a percent-of-change computation is applied to the time series before comparison to the threshold.

    The condition averages the values of the metric from the past 10 minutes, then compares the result with the 10-minute average that was measured just before the duration window. The 10-minute lookback window used by a rate-of-change condition is fixed; you can't change it. You do, however, specify the duration window when you create the condition.

  • Group-aggregate conditions trigger when a metric measured across a resource group crosses a threshold for a duration window.

  • Uptime-check conditions trigger when an uptime check fails to successfully respond to a request sent from at least two geographic locations.

  • Process-health conditions trigger when the number of processes running on a VM instance is more than, or less than, a threshold. You can also configure these conditions to monitor a group of instances that match a naming convention.

    This condition type requires the Ops Agent or the Monitoring agent to be running on the monitored resources. For more information about the agents, see Google Cloud Operations suite agents.

  • Metric-ratio conditions trigger when the ratio of two metrics exceeds a threshold for a duration window. These conditions compute the ratio of two metrics, for example, the ratio of HTTP error responses to all HTTP responses.

    For more information about ratio-based policies, see Conditions for alerting on ratios.
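
Two of the computations described in the list above, the percent-of-change comparison and the metric ratio, can be sketched as follows. This is one interpretation of the description, not the service's actual algorithm: the function names, the sample values, and the handling of a zero denominator are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence


def percent_change(previous_window: Sequence[float],
                   current_window: Sequence[float]) -> float:
    """Percent change between the averages of two lookback windows.

    `previous_window` stands for the 10-minute window measured just before
    the duration window; `current_window` stands for the past 10 minutes.
    """
    previous_avg = mean(previous_window)
    current_avg = mean(current_window)
    if previous_avg == 0:
        # Assumption: a percent change from a zero baseline is undefined.
        raise ValueError("previous average is zero; percent change is undefined")
    return (current_avg - previous_avg) / previous_avg * 100.0


def metric_ratio(numerator: float, denominator: float) -> float:
    """Ratio of two metrics, for example HTTP errors to all HTTP responses."""
    if denominator == 0:
        # Assumption: treat "no responses at all" as a ratio of zero.
        return 0.0
    return numerator / denominator


# Rate of change: the average rises from 100 to 150, a 50% increase.
change = percent_change([100, 100, 100], [150, 150, 150])
print(change)           # 50.0
print(change >= 30.0)   # True: exceeds a hypothetical 30% threshold

# Metric ratio: 5 error responses out of 200 total responses.
ratio = metric_ratio(5, 200)
print(ratio > 0.01)     # True: exceeds a hypothetical 1% threshold
```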

Examples

Examples of each of these policy types are available:

Condition type      JSON example    Google Cloud console
Metric threshold    View            Instructions
Rate of change      View            Instructions
Group aggregate     View            Instructions
Uptime check        View            Instructions
Process health      View            Instructions
Metric ratio        View            Instructions

What's next