Managing metric-based alerting policies

This document illustrates how to use the Google Cloud Console to create and manage an alerting policy based on metrics. This content does not apply to log-based alerting policies. For information about log-based alerting policies, which notify you when a particular message appears in your logs, see Monitoring your logs.

An alerting policy describes a set of conditions that you want to monitor. These conditions might relate to the health of an application, the value of a system metric, or resource consumption. For example, you might want a policy that monitors an uptime check or one that monitors your Cloud Monitoring API usage. An alerting policy also lets you specify how you want to be notified when the conditions of the policy are met and what documentation to include in that notification.

You can also use the Cloud Monitoring API to create and manage alerting policies. For more information about this approach, see Managing alerting policies by API. To see policies represented in JSON, see Sample policies.

Before you begin

Before creating alerting policies, you should be familiar with the general concepts and terminology in alerting policies. For information about the components of a policy, the concept of an incident, and pricing and limitations, see Introduction to alerting.

Create an alerting policy

Cloud Monitoring is refreshing the interface that you use to create an alerting policy. This document provides information on both the Legacy interface and the Preview interface. If you try the Preview interface and then want to go back to the Legacy interface, click Return to Legacy UI.

Legacy interface

To create an alerting policy, do the following:

  1. In the Cloud Console, select Monitoring:

    Go to Monitoring

  2. Select Alerting.

  3. Click Create Policy to see the Create alerting policy page:

    1. Click Add condition and complete the dialog. For information on the fields in a condition, see Specifying conditions.

      A condition describes a monitored resource, a metric for that resource, and when the condition is met. An alerting policy must have at least 1 condition and can contain up to 6 conditions. If an alerting policy has exactly 1 condition and that condition is met, then an incident is created. If an alerting policy has multiple conditions, then you specify how these conditions are combined. For more information, see Policies with multiple conditions.

    2. Click Next to advance to the notifications section.

    3. To be informed when an incident is created, add a notification channel to your alerting policy. You can add multiple notification channels. For details about your choices of notification channels, see Notification options.

      To add a notification channel, click Notification channels. In the dialog, select one or more notification channels from the menu and then click OK.

      If a notification channel that you want to add isn't listed, then click Manage notification channels. The Notification channels page opens in a new browser tab. From this page, you can update the configured notification channels. When you have completed your updates, return to the original tab, click Refresh, and then select the notification channels to add to the alerting policy. For more information, see Creating a channel on demand.

    4. (Optional) If you want to be notified when an incident is opened and closed, then select Notify on incident closure. By default, notifications are sent only when an incident is opened.

    5. (Optional) If the field Incident autoclose duration is shown, then set the value of this field to be the duration that Monitoring should wait before closing incidents when observations stop arriving. The default value of this field is seven days.

      For example, assume that you have an alert with a metric threshold condition that monitors a virtual machine (VM). If you turn down the VM while an incident is open, then by default Monitoring waits for seven days before it closes the incident.

    6. Click Next to advance to the documentation section.

    7. Click Name and enter a policy name. This name is included in notifications and is displayed on the Policies page.

    8. (Optional) Specify the documentation to be included in notifications. To format your documentation, you can use Markdown. To pull information out of the policy itself to tailor the content of your documentation, you can use variables. For more information about how you can format and tailor the content of this field, see Using Markdown and variables in documentation templates.

      For example, documentation might include a title such as Addressing High CPU Usage and details that identify the project:

      ## Addressing High CPU Usage
      
      This note contains information about high CPU Usage.
      
      You can include variables in the documentation. For example:
      
      This alert originated from the project ${project}, using
      the variable $${project}.
      

      The value replaces the variable only in notifications. The Preview Markdown pane, and the other places in the Cloud Console that show the documentation, reflect only the Markdown formatting:

      Example writing a documentation note using markdown.

      You can also include channel-specific tagging to control notifications. For more information, see Using channel controls.

  4. Click Save.
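The choices made in the preceding steps (conditions, notification channels, and documentation) map onto the fields of a policy's JSON representation. The following Python sketch assembles such a structure as a plain dictionary and enforces the limit of 1 to 6 conditions stated above. It is an illustration only: the field names are assumed to follow the AlertPolicy resource of the Cloud Monitoring API, so check them against Sample policies, and the helper build_policy, the project ID, and the channel ID are hypothetical.

```python
import json

MAX_CONDITIONS = 6  # a policy needs at least 1 and at most 6 conditions


def build_policy(display_name, conditions, channels, doc_markdown, combiner="OR"):
    """Hypothetical helper: assemble a policy dictionary and check the condition count."""
    if not 1 <= len(conditions) <= MAX_CONDITIONS:
        raise ValueError(
            f"a policy needs 1 to {MAX_CONDITIONS} conditions, got {len(conditions)}"
        )
    return {
        "displayName": display_name,
        "combiner": combiner,  # how multiple conditions are combined (assumed field)
        "conditions": conditions,
        "notificationChannels": channels,
        "documentation": {"content": doc_markdown, "mimeType": "text/markdown"},
    }


# One threshold condition on VM CPU utilization (values are illustrative).
cpu_condition = {
    "displayName": "VM CPU utilization above 80%",
    "conditionThreshold": {
        "filter": 'resource.type = "gce_instance" AND '
                  'metric.type = "compute.googleapis.com/instance/cpu/utilization"',
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.8,
        "duration": "300s",
    },
}

policy = build_policy(
    display_name="High CPU usage",
    conditions=[cpu_condition],
    # Hypothetical project and channel IDs.
    channels=["projects/my-project/notificationChannels/1234567890"],
    doc_markdown="## Addressing High CPU Usage\n\nThis note contains information about high CPU usage.",
)

print(json.dumps(policy, indent=2))
```

The sketch only builds and prints the structure; it doesn't call the Cloud Monitoring API.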

Preview interface

To create an alerting policy, do the following:

  1. In the Cloud Console, select Monitoring:

    Go to Monitoring

  2. In the Cloud Monitoring navigation pane, select Alerting and then click Create Policy.

  3. In the New condition step on the Create alerting policy page, configure the alerting policy condition.

  4. (Optional) To create a multi-condition alerting policy, do the following:

    1. For each additional condition, click Add condition and then configure that condition.
    2. After you have added all conditions, select how these conditions are combined in the Multi-condition trigger step. For information, see Policies with multiple conditions.
  5. (Optional) To be notified when the condition of an alerting policy is met, ensure Use notification channel is enabled in the Notifications and name step, and then do the following:

    1. Click the text Notification channels to activate a menu. Select one or more notification channels from this menu.

      If a notification channel that you want to add isn't listed, see Creating a channel on demand.

      For details about your choices of notification channels, see Notification options.

    2. To be notified when an incident is resolved, select Notify on incident closure.

      By default, if you create an alerting policy with the Google Cloud Console, a notification is sent only when an incident is created.

    You can change the notification channels for an alerting policy by editing that policy.

  6. (Optional) If the field Incident autoclose duration is shown, then set the value of this field to be the duration that Monitoring should wait before closing incidents when observations stop arriving. The default value of this field is seven days.

    For example, assume that you have an alert with a metric threshold condition that monitors a virtual machine (VM). If you turn down the VM while an incident is open, then by default Monitoring waits for seven days before it closes the incident.

  7. (Optional) To include custom documentation with a notification, add that content to the Documentation section of the Notifications and name step.

    To format your documentation, you can use Markdown. To pull information out of the policy itself to tailor the content of your documentation, you can use variables. For example, documentation might include a title such as Addressing High CPU Usage and details that identify the project:

    ## Addressing High CPU Usage
    
    This note contains information about high CPU Usage.
    
    You can include variables in the documentation. For example:
    
    This alert originated from the project ${project}, using
    the variable $${project}.
    

    When notifications are created, Monitoring replaces the variables with their values. This substitution happens only in notifications. The preview pane and the other places in the Cloud Console that show the documentation reflect only the Markdown formatting:

    Example writing a documentation note using markdown.

    For information about Markdown and variables, see Using Markdown and variables in documentation templates.

    For information about how to include channel-specific tagging to control notifications, see Using channel controls.

    You can change the documentation for an alerting policy by editing that policy.

  8. (Optional) To change the name of the alerting policy from New alert to something more meaningful, go to the Notifications and name step and update the policy name.

  9. Click Create.
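The documentation example in the preceding steps uses ${project} as a variable and $${project} to refer to the variable by name. The following Python sketch imitates that behavior to show what a rendered notification might contain. It assumes that $${name} is an escape that renders as the literal text ${name}. The function render_documentation is hypothetical and is not the code that Monitoring uses.

```python
import re

# Matches ${name} and the escaped form $${name}; dotted names are allowed.
VARIABLE = re.compile(r"\$(\$?)\{(\w+(?:\.\w+)*)\}")


def render_documentation(template, values):
    """Hypothetical renderer: substitute ${name}, keep $${name} as literal ${name}."""
    def substitute(match):
        escaped, name = match.group(1), match.group(2)
        if escaped:
            # $${name} is assumed to render as the literal text ${name}.
            return "${" + name + "}"
        # Unknown variables are left untouched in this sketch.
        return str(values.get(name, match.group(0)))

    return VARIABLE.sub(substitute, template)


template = (
    "This alert originated from the project ${project}, using\n"
    "the variable $${project}."
)
rendered = render_documentation(template, {"project": "my-project"})
print(rendered)
```

Here my-project is a placeholder project ID. The values available for substitution in a real notification are described in Using Markdown and variables in documentation templates.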

Configure a condition

This section describes how you can configure a condition by using the Preview interface. If you are using the Legacy interface, see Specifying conditions.

To configure a condition, do the following:

  1. Select how you want to specify the time series to be monitored:

    • Basic mode

      Use basic mode when you want to configure a condition that monitors a metric for a specific resource and you don't want to use MQL.

      If you select basic mode, then you can later convert your selections into an MQL query or into a filter for direct filter mode.

    • MQL mode

      Use MQL mode when you want to use MQL to describe the condition or if you want to monitor a ratio of metrics.

      If you use MQL mode, then your query can't be converted to basic mode or direct filter mode.

    • Direct filter mode

      Use direct filter mode when you are interested in monitoring any of the following:

      • A service level objective (SLO).
      • The count of processes running on virtual machines (VMs).
      • A custom metric for which you don't yet have data.

      If you use direct filter mode, then your query can't be converted to basic mode or to an MQL query.

  2. (Optional) If you use basic mode or direct filter mode, then specify how the selected time series are processed and combined by using the Transform data fields.

  3. Configure the Condition trigger.

Basic mode

To select the time series to be monitored by using basic mode, do the following:

  1. Click Select a metric in the Create alerting policy page, scroll to the resource, and then navigate through the menus. After you select a metric, click Apply.

    To limit the menu to those metrics that contain a specific string, enter that string on the filter bar. For example, if you enter CPU, then only metrics whose name contains CPU are displayed. A case-insensitive test is performed to determine whether a metric is listed in the menu.

    To change the selected metric or resource, expand the Select a metric menu, and then click Reset or navigate through the menus to make a different selection.

  2. (Optional) To monitor only some of the time series displayed in the chart, add a filter. In the filter dialog, you use the Filter field to select the label by which to filter. For example, you can filter by resource group, by name, by resource label, by zone, and by metric label.

    For example, the filter zone =~ ^us.*.a$ uses a regular expression to match all time-series data whose zone name starts with us and ends with a. For more information, see Filter the selected data.
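To get a feel for which zones the regular expression in the previous example selects, you can try the pattern locally. The following Python sketch applies the same pattern to a few zone names. It assumes that Python's re module treats this particular pattern the same way as the filter dialog does, and the zone list is only sample data.

```python
import re

# The pattern from the example filter: zone =~ ^us.*.a$
ZONE_FILTER = re.compile(r"^us.*.a$")

# Sample zone names used only for illustration.
zones = ["us-central1-a", "us-central1-b", "us-west1-a", "europe-west1-a"]

matching = [zone for zone in zones if ZONE_FILTER.search(zone)]
print(matching)  # ['us-central1-a', 'us-west1-a']
```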

If you click MQL in the toolbar after making selections, then the Query Editor is opened and it displays a query that includes your selections. If you modify the existing query, and then return to basic mode, your modifications are discarded.

MQL mode

To configure a condition by using MQL, do the following:

  1. Click MQL in the toolbar of the Create custom alert page and then enter the query for the condition.

    Your query must end with one of the following operations:

    For more information, see Alerting policies with MQL.

  2. Click Next, and then configure the condition trigger.

To return to basic mode, click Basic query in the toolbar of the Create custom alert page.

If you create a query or modify the existing query and then return to basic mode, your modifications are discarded.

Direct filter mode

To select the time series to be monitored by using direct filter mode, do the following:

  1. Click ? on the Select metric section header and then select Direct filter mode in the tooltip.

  2. Enter a Monitoring filter.

    For example, to count the number of processes that are running on Compute Engine VM instances whose name includes nginx, enter the following:

    select_process_count("monitoring.regex.full_match(\".*nginx.*\")")
    resource.type="gce_instance"
    

    For syntax information see the following resources:

To return to basic mode, click ? on the Select metric section header and then select Basic mode.

If you make selections by using basic mode and then enter direct filter mode, then you can view the monitoring filter. You can return to basic mode without losing anything as long as you haven't modified the monitoring filter. If you return to basic mode after you create or modify a monitoring filter, then your changes might not be preserved.

If you use direct filter mode to select the time series to be monitored and then switch to MQL mode, your changes aren't preserved.

Transform data

This section applies only to basic mode and direct filter mode. It doesn't apply to MQL mode.

To configure how each time series is aligned and how time series are combined, do the following:

  1. (Optional) To change how the points in a time series are aligned to fixed time intervals, set the Rolling window and the Rolling window function in the Transform data section.

    These fields specify how the points that are recorded in a window are combined. For example, if the window is 15 minutes and the window function is max, then the aligned point is the maximum value of all samples recorded in the most recent 15 minutes.

    For more information, see Align time series.

  2. (Optional) To combine time series together, in the Across time series section, click Show more, and then complete the dialog. By default, time series aren't combined.

    To create a single time series, do the following:

    1. Set the Time series aggregation field to a value other than none. For example, if you select mean, then each point in the displayed time series is the average of points from the individual time series.

    2. Ensure that the Time series group by field is empty.

    To group time series, do the following:

    1. Set the Time series aggregation field to a value other than none.

      For example, if you group the time series by zone and then set the aggregation field to mean, then there is one time series for each zone.

    2. In the Time series group by field, select one or more labels by which to group.

    If multiple time series are still displayed after you complete the previous steps, then you can combine them into a single time series by using the Secondary data transform fields.

    For more information, see Combine time series.
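The two transformations described above can be sketched in a few lines of Python: first each time series is aligned by applying a window function (max in this sketch) to the points in the rolling window, and then the aligned values are combined across time series (mean, grouped by zone). The function names, the sample data, and the choice of which window boundary is inclusive are assumptions made for the illustration; they don't describe how Monitoring is implemented.

```python
from collections import defaultdict
from statistics import mean


def align_max(points, window_end, window_seconds):
    """Return the maximum value recorded in the window that ends at window_end.

    points is a list of (timestamp_seconds, value) pairs.
    """
    in_window = [
        value for ts, value in points
        if window_end - window_seconds < ts <= window_end
    ]
    return max(in_window) if in_window else None


def aggregate_mean_by(aligned, label):
    """Group aligned values by a label and average each group.

    aligned is a list of (labels, value) pairs, where labels is a dict.
    """
    groups = defaultdict(list)
    for labels, value in aligned:
        groups[labels[label]].append(value)
    return {key: mean(values) for key, values in groups.items()}


# One time series, sampled every 5 minutes; a 15-minute window with max.
points = [(0, 0.2), (300, 0.5), (600, 0.4), (900, 0.3)]
aligned_point = align_max(points, window_end=900, window_seconds=900)

# Aligned values from three time series, grouped by zone and averaged.
aligned = [
    ({"zone": "us-central1-a"}, 0.5),
    ({"zone": "us-central1-a"}, 0.25),
    ({"zone": "us-east1-b"}, 0.75),
]
by_zone = aggregate_mean_by(aligned, "zone")
print(aligned_point, by_zone)
```

Grouping by zone leaves one time series per zone, which matches the grouping example above. Leaving the group-by field empty corresponds to averaging all values into a single series.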

Condition trigger

To configure when a condition is met, go to the Configure alert trigger page and then do the following:

  1. If the Condition type field is shown, then select the type of condition.

    • To be notified when metric data stops arriving, select Metric absence.

    • To be notified based on the value of a metric, select Threshold.

  2. To specify how individual time series contribute to when a condition is met, select a value from the Alert trigger menu. This menu lets you specify the subset of time series that must satisfy the trigger for the condition to be met.

  3. Complete the condition-specific fields:

    • Metric absence condition

      Use the Trigger when data is absent for this amount of time field to specify how long metric data must be missing before the alerting policy notifies you.

    • Threshold condition

      Use the Threshold position, Threshold value, and Time above threshold fields to specify when the value of the metric meets the condition criteria. For example, if you set these values to Above threshold, 0.3, and 5 minutes, then a time series meets the condition if every sample in a 5-minute interval is greater than 0.3.

      Use the Retest window field to specify how long measurements must meet the condition criteria before an incident is generated. If you select most recent value, then a single measurement can result in a notification. For conceptual information and an example, see The alignment period and the duration.
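The two condition types can be illustrated with a short Python sketch. The threshold check mirrors the example above (every sample in a 5-minute interval is greater than 0.3), and the absence check reports when no data arrived during the configured period. The function names and the sample data are hypothetical, and the sketch ignores alignment and the Alert trigger setting.

```python
def threshold_met(samples, now, threshold, duration_seconds):
    """True if every sample in the last duration_seconds is above the threshold.

    samples is a list of (timestamp_seconds, value) pairs.
    """
    window = [
        value for ts, value in samples
        if now - duration_seconds < ts <= now
    ]
    # An empty window never meets a threshold condition in this sketch.
    return bool(window) and all(value > threshold for value in window)


def absence_met(samples, now, absent_seconds):
    """True if no sample arrived during the last absent_seconds."""
    return not any(now - absent_seconds < ts <= now for ts, _ in samples)


# One sample per minute for 5 minutes, all above 0.3.
samples = [(60, 0.35), (120, 0.4), (180, 0.31), (240, 0.5), (300, 0.45)]

print(threshold_met(samples, now=300, threshold=0.3, duration_seconds=300))
print(absence_met(samples, now=1000, absent_seconds=600))
```

With this data, both calls print True: all five samples exceed 0.3, and no sample arrived in the 600 seconds before time 1000.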

JSON for an alerting policy

To view or download the JSON representation of an alerting policy, do the following:

  1. In the Cloud Console, select Monitoring:

    Go to Monitoring

  2. Select Alerting, find the policy that you want to view, and then open the Policy details page.

  3. Do one of the following:

    • To download the JSON to your local system, click JSON.

    • To view the JSON, ensure you are using the Preview interface, click Edit, and then click View code.

      If View code isn't shown, then either you are using the Legacy interface or the option is hidden under More options.
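After you download the JSON, you can inspect it with any JSON tool. The following Python sketch parses a small, hand-written policy and prints a one-line summary for the policy and for each condition. The embedded JSON is sample data, not the output of a real policy, and its field names are assumed to follow the AlertPolicy resource of the Cloud Monitoring API.

```python
import json

# Hand-written sample that stands in for a downloaded policy file.
policy_json = """
{
  "displayName": "Test staging",
  "combiner": "OR",
  "enabled": true,
  "conditions": [
    {
      "displayName": "VM CPU utilization above 50%",
      "conditionThreshold": {
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.5,
        "duration": "300s"
      }
    }
  ]
}
"""


def summarize(policy):
    """Return summary lines: one for the policy, one per condition."""
    lines = [
        f"{policy['displayName']} "
        f"(enabled={policy.get('enabled', True)}, combiner={policy.get('combiner')})"
    ]
    for condition in policy.get("conditions", []):
        # The condition type is assumed to be the key that starts with "condition".
        kinds = [key for key in condition if key.startswith("condition")]
        lines.append(f"- {condition['displayName']}: {', '.join(kinds)}")
    return lines


summary = summarize(json.loads(policy_json))
print("\n".join(summary))
```

To summarize a real download, replace the embedded string with the contents of the file that you saved.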

Add an alerting policy to a dashboard

When an alerting policy contains one condition, you can display a summary of that alerting policy on a custom dashboard. The summary includes the time series that the alerting policy monitors, the threshold, and the number of open incidents.

To display a summary of an alerting policy on a custom dashboard, do the following:

  1. In the Cloud Console, select Monitoring:

    Go to Monitoring

  2. Select Dashboards and open the custom dashboard that you want to modify.

  3. If Editing isn't shown, then click Viewing and select Switch to Editing mode.

  4. Select Alert chart from the widget library, or click Add chart and then select Alert chart from the menu.

  5. In the configuration pane of the Alert chart, use the Alert policy menu to select an alerting policy. Only single-condition alerting policies can be selected from the Alert policy menu.

The following screenshot illustrates an alert chart:

Example of an alert chart.

In this example, the alerting policy is monitoring the CPU usage of two different virtual machines. The dashed red line shows the condition threshold, which is set to 50%. The green chip with the label No incidents indicates that there are no open incidents for the alerting policy. If you place your pointer on the chip that shows the number of open incidents, then a dialog opens that links to the underlying alerting policy.

For more information, see Using dashboards and charts.

Manage policies

To list all alerting policies, do the following:

  1. In the Cloud Console, select Monitoring:

    Go to Monitoring

  2. Select Alerting.

  3. A partial list of policies is shown in the Policies pane. To see all policies and to enable filtering, click See all policies.

To view the details of an alerting policy, click its name.

To restrict the alerting policies that are listed, add filters. Each filter is composed of a name and a value. You can set the value to be an exact match for a policy name, or a partial match. Filter comparisons are case-insensitive. If you have multiple filters, then a logical AND joins the filters unless you insert an OR filter. The following screenshot lists all alerting policies that match test or My Uptime Check Alert Policy:

Sample alerts overview with filters.
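The matching rules described above (partial, case-insensitive matches that are joined by AND or by OR) can be sketched in Python as follows. The function names and the policy names are hypothetical, and the sketch filters only on the policy name.

```python
def matches(policy_name, value):
    """Case-insensitive partial match of a filter value against a policy name."""
    return value.lower() in policy_name.lower()


def filter_any(policy_names, values):
    """Keep policies that match at least one value (filters joined by OR)."""
    return [name for name in policy_names if any(matches(name, v) for v in values)]


def filter_all(policy_names, values):
    """Keep policies that match every value (filters joined by AND)."""
    return [name for name in policy_names if all(matches(name, v) for v in values)]


# Sample policy names used only for illustration.
policy_names = ["Test staging", "My Uptime Check Alert Policy", "Prod CPU utilization"]

either = filter_any(policy_names, ["test", "My Uptime Check Alert Policy"])
both = filter_all(policy_names, ["uptime", "ALERT"])
print(either)
print(both)
```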

From the Policies page you can edit, delete, copy, enable, or disable an alerting policy:

  • To edit or copy a policy, click More options, and select the corresponding option. Editing or copying a policy is similar to creating one, as described in Create an alerting policy. You can change, and sometimes delete, the values in the fields. When done, click Save.

  • To delete a policy, click More options and select Delete. In the confirmation dialog, select Delete.

  • To change the enabled status of the alerting policy, click the toggle located under the heading Enabled.

If you select an alerting policy, then the Policy details are displayed. For example, the following screenshot illustrates the details for the alerting policy named Test staging:

Sample alert-policy details page.

The Notifications section shows whether you've configured the policy to notify you when incidents are closed. You are always notified when an incident is opened. To change the notification behavior, edit the alerting policy.

You can edit, delete, copy, enable, or disable an alerting policy from the Policy details page. For example, to disable an alerting policy that is enabled, click Enabled and select Turn off. Similarly, to enable a policy that is disabled, click Disabled.