Create metric-absence alerting policies

This document describes how to use the Google Cloud console to create an alerting policy that sends notifications and creates an alert, or equivalently an incident, when a monitored time series has no data for a specific duration window.

Metric-absence conditions require at least one successful measurement — one that retrieves data — within the maximum duration window after the policy was installed or modified. The maximum configurable duration window is 24 hours if you use the Google Cloud console and 24.5 hours if you use the Cloud Monitoring API.

For example, suppose you set the duration window in a metric-absence policy to 30 minutes. The condition won't trigger when the subsystem that writes metric data has never written a data point. The subsystem needs to output at least one data point and then fail to output additional data points for 30 minutes.
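
The rule in the preceding example can be sketched as a small model. This is a simplified illustration, not how Monitoring evaluates conditions: the function name `absence_condition_met` and its parameters are hypothetical, and the sketch ignores alignment, the maximum duration window, and evaluation delays.

```python
from datetime import datetime, timedelta


def absence_condition_met(point_times, now, duration):
    """Simplified model of a metric-absence check (hypothetical helper).

    Returns True only if at least one data point was ever written and
    the most recent point is at least `duration` old.
    """
    if not point_times:
        # Data was never written, so the condition can't trigger.
        return False
    return now - max(point_times) >= duration


now = datetime(2024, 1, 1, 12, 0)
window = timedelta(minutes=30)

never_written = []
recently_written = [now - timedelta(minutes=10)]
stopped_writing = [now - timedelta(minutes=50), now - timedelta(minutes=45)]

print(absence_condition_met(never_written, now, window))     # False
print(absence_condition_met(recently_written, now, window))  # False
print(absence_condition_met(stopped_writing, now, window))   # True
```

Only the third case, where data was written and then stopped for longer than the 30-minute window, satisfies the condition in this model.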

This content does not apply to log-based alerting policies. For information about log-based alerting policies, which notify you when a particular message appears in your logs, see Monitoring your logs.

Before you begin

  1. To get the permissions that you need to create and modify alerting policies by using the Google Cloud console, ask your administrator to grant you the Monitoring Editor (roles/monitoring.editor) IAM role on your project. For more information about granting roles, see Manage access.

    You might also be able to get the required permissions through custom roles or other predefined roles.

    For more information about Cloud Monitoring roles, see Control access with Identity and Access Management.

  2. Ensure that you're familiar with the general concepts of alerting policies. For information about these topics, see Alerting overview.

  3. Configure the notification channels that you want to use to receive any alerts. For redundancy purposes, we recommend that you create multiple types of notification channels. For more information, see Create and manage notification channels.

Create alerting policy

To create an alerting policy that sends notifications when a monitored time series has no data for a specific duration window, do the following:

  1. In the navigation panel of the Google Cloud console, select Monitoring, and then select Alerting:

    Go to Alerting

  2. Select Create policy.
  3. Select the time series to be monitored:

    1. Click Select a metric, navigate through the menus to select a resource type and metric type, and then click Apply.

      The Select a metric menu contains features that help you find the metric types available:

      • To find a specific metric type, use the Filter bar. For example, if you enter util, then the menu shows only entries that include util. Entries are shown when they pass a case-insensitive "contains" test.

      You can monitor any built-in metric or any user-defined metric.

    2. Optional: To monitor a subset of the time series that match the metric and resource types you selected in the previous step, click Add filter. In the filter dialog, select the label by which to filter, a comparator, and then the filter value. For example, the filter zone =~ ^us.*.a$ uses a regular expression to match all time-series data whose zone name starts with us and ends with a. For more information, see Filter the selected time series.

    3. Optional: To change how the points in a time series are aligned, in the Transform data section, set the Rolling window and Rolling window function fields.

      These fields specify how the points that are recorded in a window are combined. For example, assume that the window is 15 minutes and the window function is max. The aligned point is the maximum value of all points in the most recent 15 minutes. For more information, see Alignment: within-series regularization.

    4. Optional: Combine time series when you want to reduce the number of time series monitored by a policy, or when you want to monitor only a collection of time series. For example, instead of monitoring the CPU utilization of each VM instance, you might want to compute the average of the CPU utilization for all VMs in a zone, and then monitor that average. By default, time series aren't combined. For general information, see Reduction: combining time series.

      To combine all time series, do the following:

      1. In the Across time series section, click Expand.
      2. Set the Time series aggregation field to a value other than none. For example, to display the average value of the time series, select mean.
      3. Ensure that the Time series group by field is empty.

      To combine, or group, time series by label values, do the following:

      1. In the Across time series section, click Expand.
      2. Set the Time series aggregation field to a value other than none.
      3. In the Time series group by field, select the labels by which to group.

      For example, if you group by the zone label and then set the aggregation field to a value of mean, then the chart displays one time series for each zone for which there is data. The time series shown for a specific zone is the average of all time series with that zone.

    5. Click Next.

  4. Configure the condition trigger:

    1. Select Metric absence for the type of condition.
    2. Select a value for the Alert trigger menu. This menu lets you specify the subset of time series that must not have data before the condition is triggered.
    3. Use the Trigger absence time field to specify how long metric data must be absent before alerting notifies you.
    4. Click Next.
  5. Optional: Create an alerting policy with multiple conditions.

    Most policies monitor a single metric type. For example, a policy might monitor the number of bytes written to a VM instance. When you want to monitor multiple metric types, create a policy with multiple conditions. Each condition monitors one metric type. After you create the conditions, you specify how the conditions are combined. For information, see Policies with multiple conditions.

    To create an alerting policy with multiple conditions, do the following:

    1. For each additional condition, click Add alert condition and then configure that condition.
    2. Click Next and configure how conditions are combined.
    3. Click Next to advance to the notifications and documentation setup.
  6. Configure the notifications:

    1. Expand the Notifications and name menu and select your notification channels. For redundancy purposes, we recommend that you add multiple types of notification channels to an alerting policy. For more information, see Manage notification channels.

    2. Optional: To be notified when an incident is closed, select Notify on incident closure. By default, when you create an alerting policy with the Google Cloud console, a notification is sent only when an incident is created.

    3. Optional: To change how long Monitoring waits before closing an incident after data stops arriving, select an option from the Incident autoclose duration menu. By default, when data stops arriving, Monitoring waits seven days before closing an open incident.

    4. Select an option from the Policy severity level menu. Incidents and notifications display the severity level.

    5. Optional: To add custom labels to the alerting policy, in the Policy user labels section, do the following:

      1. Click Add label, and in the Key field enter a name for the label. Label names must start with a lowercase letter, and they can contain lowercase letters, numerals, underscores, and dashes. For example, enter severity.
      2. Click Value and enter a value for your label. Label values can contain lowercase letters, numerals, underscores, and dashes. For example, enter critical.

      For information about how you can use policy labels to help you manage your alerts, see Annotate alerts with labels.

  7. Optional: In the Documentation section, enter any content that you want included with the notification.

    To format your documentation, you can use plain text, Markdown, and variables. You can also include links to help users debug the incident, such as links to internal playbooks, Google Cloud dashboards, and external pages. For example, the following documentation template describes a CPU utilization incident for a gce_instance resource and includes several variables to reference the alerting policy and condition REST resources. The documentation template then directs readers to external pages to help with debugging.

    When notifications are created, Monitoring replaces the documentation variables with their values. The values replace the variables only in notifications. The preview pane and other places in the Google Cloud console show only the Markdown formatting.

    ## CPU utilization exceeded
    
    ### Summary
    
    The ${metric.display_name} of the ${resource.type}
    ${resource.label.instance_id} in the project ${resource.project} has
    exceeded 90% for over 15 minutes.
    
    ### Additional resource information
    
    Condition resource name: ${condition.name}  
    Alerting policy resource name: ${policy.name}  
    
    ### Troubleshooting and Debug References
    
    Repository with debug scripts: example.com  
    Internal troubleshooting guide: example.com  
    ${resource.type} dashboard: example.com
    

    For more information, see Annotate alerts with user-defined documentation and Using channel controls.

  8. Click Alert name and enter a name for the alerting policy.

  9. Click Create policy.
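
For orientation, the console steps above map roughly onto the fields of an alerting policy as the Cloud Monitoring API represents it. The following is a minimal sketch under stated assumptions, not a definitive request body: the field names reflect the AlertPolicy REST resource as best understood here and should be checked against the API reference, and every value (project ID, channel ID, metric type, durations, display names) is a placeholder chosen for illustration.

```python
import json

# Placeholder parent resource; replace PROJECT_ID with a real project ID.
PROJECT = "projects/PROJECT_ID"

policy = {
    # Step 8: the alert name.
    "displayName": "Example metric-absence policy",
    # Step 5: how multiple conditions are combined (only one condition here).
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "CPU utilization data absent",
            # Step 4: a metric-absence condition.
            "conditionAbsent": {
                # Step 3: metric type, resource type, and optional label filters.
                "filter": (
                    'metric.type = "compute.googleapis.com/instance/cpu/utilization" '
                    'AND resource.type = "gce_instance"'
                ),
                # Step 3: rolling window (alignment) and across-series grouping.
                "aggregations": [
                    {
                        "alignmentPeriod": "900s",
                        "perSeriesAligner": "ALIGN_MAX",
                        "crossSeriesReducer": "REDUCE_MEAN",
                        "groupByFields": ["resource.label.zone"],
                    }
                ],
                # Step 4: trigger absence time of 30 minutes.
                "duration": "1800s",
                # Step 4: how many time series must lack data.
                "trigger": {"count": 1},
            },
        }
    ],
    # Step 6: notification channels, autoclose, severity, and user labels.
    "notificationChannels": [f"{PROJECT}/notificationChannels/CHANNEL_ID"],
    "alertStrategy": {"autoClose": "604800s"},  # seven days
    "severity": "CRITICAL",
    "userLabels": {"severity": "critical"},
    # Step 7: documentation included with notifications.
    "documentation": {
        "content": "## Metric data absent\n\nNo data from ${resource.type} for 30 minutes.",
        "mimeType": "text/markdown",
    },
}

print(json.dumps(policy, indent=2))
```

The sketch only builds and prints a JSON document. It doesn't call any API, so it can be run safely to inspect the structure.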

Filter the selected time series

Filters ensure that only time series that meet some set of criteria are monitored. When you apply filters, there are fewer time series to evaluate, which can improve the performance of the alert. You can also reduce the amount of data being monitored by applying aggregation.

A filter is composed of a label, a comparator, and a value. For example, to match all time series whose zone label starts with "us-central1", you could use the filter zone=~"us-central1.*", which uses a regular expression to perform the comparison.

When you filter by the project ID or the resource container, you must use the equals operator (=). When you filter by other labels, you can use any supported comparator. Typically, you can filter by metric and resource labels, and by resource group.

When you supply multiple filtering criteria, only the time series that meet all criteria are monitored.
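
If you express the same criteria as a Monitoring filter string, the "meet all criteria" rule corresponds to joining the individual expressions with AND. The following sketch assumes that form. The helper `build_filter` is hypothetical, and the metric type and zone are placeholder values.

```python
def build_filter(metric_type, criteria):
    """Join a metric type and label criteria into one filter string.

    Hypothetical helper: every criterion must hold, so the parts are
    combined with AND.
    """
    parts = [f'metric.type = "{metric_type}"']
    parts.extend(criteria)
    return " AND ".join(parts)


monitoring_filter = build_filter(
    "compute.googleapis.com/instance/cpu/utilization",
    [
        'resource.type = "gce_instance"',
        'resource.labels.zone = "us-central1-a"',
    ],
)
print(monitoring_filter)
```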

To add a filter, click Add filter, complete the dialog, and then click Done. In the dialog, you use the Filter field to select the criterion by which to filter, select the comparison operator, and then select the value. Each row in the following table lists a comparison operator, its meaning, and an example:

Operator        Meaning                        Example
=               Equality                       resource.labels.zone = "us-central1-a"
!=              Inequality                     resource.labels.zone != "us-central1-a"
=~              Regular expression equality    monitoring.regex.full_match("^us.*")
!=~             Regular expression inequality  monitoring.regex.full_match("^us.*")
starts_with     Value starts with              resource.labels.zone = starts_with("us")
ends_with       Value ends with                resource.labels.zone = ends_with("b")
has_substring   Value contains                 resource.labels.zone = has_substring("east")
one_of          One of                         resource.labels.zone = one_of("asia-east1-b", "europe-north1-a")
!starts_with    Value doesn't start with       resource.labels.zone != starts_with("us")
!ends_with      Value doesn't end with         resource.labels.zone != ends_with("b")
!has_substring  Value doesn't contain          resource.labels.zone != has_substring("east")
!one_of         Value isn't one of             resource.labels.zone != one_of("asia-east1-b", "europe-north1-a")
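
To make the operator meanings concrete, the following sketch emulates them locally on a handful of zone names. It is an approximation for illustration only: the `OPERATORS` table and the `select` helper are hypothetical, and the regular-expression rows use Python's `re.fullmatch`, whose syntax may differ from the engine that Monitoring uses.

```python
import re


def negate(predicate):
    """Return a predicate that inverts the result of `predicate`."""
    return lambda value, arg: not predicate(value, arg)


# Hypothetical local emulation of the comparison operators in the table.
OPERATORS = {
    "=": lambda value, arg: value == arg,
    "=~": lambda value, arg: re.fullmatch(arg, value) is not None,
    "starts_with": lambda value, arg: value.startswith(arg),
    "ends_with": lambda value, arg: value.endswith(arg),
    "has_substring": lambda value, arg: arg in value,
    "one_of": lambda value, arg: value in arg,
}
# Each negated operator is the logical inverse of its positive form.
OPERATORS["!="] = negate(OPERATORS["="])
OPERATORS["!=~"] = negate(OPERATORS["=~"])
for name in ("starts_with", "ends_with", "has_substring", "one_of"):
    OPERATORS["!" + name] = negate(OPERATORS[name])


def select(zones, operator, arg):
    """Return the zones that pass the given operator and argument."""
    return [zone for zone in zones if OPERATORS[operator](zone, arg)]


zones = ["us-central1-a", "us-east1-b", "europe-north1-a", "asia-east1-b"]

print(select(zones, "starts_with", "us"))      # ['us-central1-a', 'us-east1-b']
print(select(zones, "has_substring", "east"))  # ['us-east1-b', 'asia-east1-b']
print(select(zones, "!ends_with", "b"))        # ['us-central1-a', 'europe-north1-a']
print(select(zones, "=~", "^us.*"))            # ['us-central1-a', 'us-east1-b']
```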