Managing metric-based alerting policies

This document illustrates how to use the Google Cloud console to create and manage an alerting policy based on metrics. This content does not apply to log-based alerting policies. For information about log-based alerting policies, which notify you when a particular message appears in your logs, see Monitoring your logs.

An alerting policy describes a set of conditions that you want to monitor. These conditions might relate to the health of an application, the value of a system metric, or to resource consumption. For example, you might want a policy that monitors an uptime check or one that monitors your Cloud Monitoring API usage. An alerting policy also lets you specify how you want to be notified when the conditions of the policy are met and what documentation to include in that notification.

You can also use the Cloud Monitoring API to create and manage alerting policies. For more information about this approach, see Managing alerting policies by API. To see policies represented in JSON, see Sample policies.

Before you begin

Before creating alerting policies, you should be familiar with the general concepts and terminology in alerting policies. For information about the components of a policy, the concept of an incident, and pricing and limitations, see Introduction to alerting.

Create an alerting policy

Cloud Monitoring is refreshing the interface that you use to create an alerting policy. This document provides information on the Legacy interface and the Preview interface. To return to the Legacy interface when using the Preview interface, click Return to Legacy UI.

Legacy interface

To create an alerting policy, do the following:

  1. In the Google Cloud console, select Monitoring or click the following button:
    Go to Monitoring

  2. Select Alerting.

  3. Click Create Policy to see the Create alerting policy page:

    Create an alerting policy dialog is displayed.

    1. Click Add condition and complete the dialog. For information on the fields in a condition, see Specifying conditions.

      A condition describes a monitored resource, a metric for that resource, and when the condition is met. An alerting policy must have at least one condition and can contain up to six conditions. When an alerting policy has exactly one condition, an incident is created when that condition is met. When an alerting policy has multiple conditions, you specify how these conditions are combined. For more information, see Policies with multiple conditions.

    2. Click Next to advance to the notifications section.

    3. To be informed when an incident is created, add a notification channel to your alerting policy. You can add multiple notification channels. For details about your choices of notification channels, see Notification options.

      To add a notification channel, click Notification channels. In the dialog, select one or more notification channels from the menu and then click OK.

      To add a notification channel to the list of channels, click Manage notification channels. You are taken to the Notification channels page in a new browser tab. From this page, you can update the configured notification channels. When you have completed your updates, return to the original tab, click Refresh, and then select the notification channels to add to the alerting policy. For more information, see Creating a channel on demand.

    4. (Optional) To be notified when an incident is closed as well as when it is opened, select Notify on incident closure. By default, notifications are sent only when an incident is opened.

    5. (Optional) If the field Incident autoclose duration is shown, then set the value of this field to be the duration that Monitoring should wait before closing incidents when observations stop arriving. The default value of this field is seven days. The minimum auto-close duration is 30 minutes.

      For example, assume that you have an alert with a metric threshold condition that monitors a virtual machine (VM). If you turn down the VM while an incident is open, then by default Monitoring waits for seven days before it closes the incident.

    6. Click Next to advance to the documentation section.

    7. Click Name and enter a policy name. This name is included in notifications and is displayed on the Policies page.

    8. (Optional) Specify the documentation to be included in notifications. To format your documentation, you can use Markdown. To pull information out of the policy itself to tailor the content of your documentation, you can use variables. For more information about how you can format and tailor the content of this field, see Using Markdown and variables in documentation templates.

      For example, documentation might include a title such as Addressing High CPU Usage and details that identify the project:

      ## Addressing High CPU Usage
      
      This note contains information about high CPU Usage.
      
      You can include variables in the documentation. For example:
      
      This alert originated from the project ${project}, using
      the variable $${project}.
      

      The value replaces the variable only in notifications. The Preview Markdown pane, and the other places in the Google Cloud console that show the documentation, reflect only the Markdown formatting:

      Example writing a documentation note using markdown.

      You can also include channel-specific tagging to control notifications. For more information, see Using channel controls.

  4. Click Save.
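The documentation step above distinguishes a variable, ${project}, from its escaped form, $${project}. The following Python sketch models that distinction; it is not Monitoring's implementation. The render function and the choice to leave unknown variables unchanged are assumptions made for illustration.

```python
import re

# Matches "${name}" and the escaped form "$${name}".
# Group 1 captures the optional second "$"; group 2 captures the variable name.
_VARIABLE = re.compile(r"\$(\$?)\{([A-Za-z_.]+)\}")


def render(template, values):
    """Replace ${name} with its value and turn $${name} into a literal ${name}.

    Assumption for this sketch: variables without a value are left unchanged.
    """
    def substitute(match):
        escaped, name = match.group(1), match.group(2)
        if escaped:
            return "${" + name + "}"
        return str(values.get(name, match.group(0)))

    return _VARIABLE.sub(substitute, template)


template = (
    "This alert originated from the project ${project}, using\n"
    "the variable $${project}."
)
notification_text = render(template, {"project": "my-project"})
print(notification_text)
```

With the hypothetical project ID my-project, the first occurrence is replaced by the project ID and the second is shown literally as ${project}, which mirrors what a notification would contain.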

Preview interface

To create an alerting policy, do the following:

  1. In the Google Cloud console, select Monitoring or click the following button:
    Go to Monitoring

  2. In the Cloud Monitoring navigation pane, select Alerting and then click Create Policy.

  3. In the New condition step on the Create alerting policy page, configure the alerting policy condition.

  4. (Optional) To create a multi-condition alerting policy, do the following:

    1. For each additional condition, click Add condition and then configure that condition.
    2. After you have added all conditions, select how these conditions are combined in the Multi-condition trigger step. For information, see Policies with multiple conditions.
  5. (Optional) To be notified when the condition of an alerting policy is met, do the following:

    1. Ensure Use notification channel is enabled in the Notifications and name step.
    2. Click the text Notification channels and select one or more notification channels from the menu.

      For information about how to add a notification channel to the list of configured channels, see Creating a channel on demand.

      For details about your choices of notification channels, see Notification options.

    3. To be notified when an incident is closed, select Notify on incident closure.

      By default, when you create an alerting policy with the Google Cloud console, a notification is sent only when an incident is created.

    You can change the notification channels for an alerting policy by editing that policy.

  6. (Optional) If the field Incident autoclose duration is shown, then update the value of this field to be the duration that Monitoring should wait before closing incidents when observations stop arriving. The default value of this field is seven days.

    For example, consider an alerting policy with a metric threshold condition that monitors a virtual machine (VM). If you turn down the VM while an incident is open, then by default Monitoring waits for seven days before it closes the incident.

  7. (Optional) To add custom labels to the alerting policy, in the Policy user labels section, do the following:

    1. Click Add label, and in the Key field enter a name for the label. Label names must start with a lowercase letter, and they can contain lowercase letters, numerals, underscores, and dashes. For example, enter severity.
    2. Click Value and enter a value for your label. Label values can contain lowercase letters, numerals, underscores, and dashes. For example, enter critical.

    You can add multiple labels.

    For information about how you can use policy labels to help you manage your alerts, see Add severity levels to an alerting policy.

  8. (Optional) To include custom documentation with a notification, add that content to the Documentation section of the Notifications and name step.

    To format your documentation, you can use Markdown. To pull information out of the policy itself to tailor the content of your documentation, you can use variables. For example, documentation might include a title such as Addressing High CPU Usage and details that identify the project:

    ## Addressing High CPU Usage
    
    This note contains information about high CPU Usage.
    
    You can include variables in the documentation. For example:
    
    This alert originated from the project ${project}, using
    the variable $${project}.
    

    When notifications are created, Monitoring replaces the variables with their values. This replacement happens only in notifications. The preview pane and other places in the Google Cloud console show only the Markdown formatting:

    Example writing a documentation note using markdown.

    For information about Markdown and variables, see Using Markdown and variables in documentation templates.

    For information about how to include channel-specific tagging to control notifications, see Using channel controls.

    You can change the documentation for an alerting policy by editing that policy.

  9. (Optional) To change the name of the alerting policy from New alert to something more meaningful, go to the Notifications and name step and update the policy name.

  10. Click Create.
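The naming rules for policy user labels in the procedure above can be checked before you type the labels into the console. The following Python sketch encodes only the rules stated in this document; the invalid_labels helper is hypothetical, and requiring a non-empty value is an assumption.

```python
import re

# Key: starts with a lowercase letter, then lowercase letters, numerals,
# underscores, or dashes.
LABEL_KEY = re.compile(r"[a-z][a-z0-9_-]*")
# Value: lowercase letters, numerals, underscores, or dashes.
# Assumption: at least one character is required.
LABEL_VALUE = re.compile(r"[a-z0-9_-]+")


def invalid_labels(labels):
    """Return a list of human-readable problems; an empty list means all labels pass."""
    problems = []
    for key, value in labels.items():
        if not LABEL_KEY.fullmatch(key):
            problems.append(f"bad key: {key!r}")
        if not LABEL_VALUE.fullmatch(value):
            problems.append(f"bad value for {key!r}: {value!r}")
    return problems


print(invalid_labels({"severity": "critical"}))  # passes both rules
print(invalid_labels({"Severity": "critical"}))  # key starts with an uppercase letter
```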

Configure a condition

This section describes how you can configure a condition by using the Preview interface. For information about how to configure a condition with the Legacy interface, see Specifying conditions.

To configure a condition, do the following:

  1. Select how you want to specify the time series to be monitored:

    • Basic mode

      Use basic mode when you want to configure a condition that monitors a metric for a specific resource and you don't want to use MQL.

      You can convert basic mode selections into the format used by MQL or by direct filter mode.

      Basic mode is the default configuration method.

    • MQL mode

      Use MQL mode when you want to use MQL to describe the condition or when you want to monitor a ratio of metrics.

      You can't convert the MQL query to the format used by basic mode or by direct filter mode.

    • Direct filter mode

      Use direct filter mode when you are interested in monitoring any of the following:

      • A service level objective (SLO).
      • The count of processes running on virtual machines (VMs).
      • A custom metric for which you don't yet have data.

      You can't convert a direct filter mode query to the format used by basic mode or by MQL.

  2. (Optional) When using basic mode or direct filter mode, you can specify how the selected time series are processed and combined by using the Transform data fields. Default settings are selected for the data transformation.

  3. Configure the Condition trigger.

Basic mode

To select the time series to be monitored by using basic mode, do the following:

  1. Click Select a metric in the Create alerting policy page, scroll to the resource, and then navigate through the menus. After you select a metric, click Apply.

    To limit the menu to those metrics that contain a specific string, enter that string on the filter bar. For example, to restrict the menu to list only metrics whose name contains CPU, in the filter bar, enter CPU. A case-insensitive test is performed to determine whether a metric is listed in the menu.

    To change the selected metric or resource, expand the Select a metric menu, and then click Reset or navigate through the menus to make a different selection.

  2. (Optional) To monitor only some of the time series displayed in the chart, add a filter. In the filter dialog, you use the Filter field to select the label by which to filter. For example, you can filter by resource group, by name, by resource label, by zone, and by metric label.

    For example, the filter zone =~ ^us.*.a$ uses a regular expression to match all time-series data whose zone name starts with us and ends with a. For more information, see Filter the selected data.
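To see which zone names the regular expression in the preceding example selects, you can try the pattern locally. The following sketch uses Python's re module as a stand-in for the matcher that Monitoring uses; for a pattern this simple the two are assumed to behave the same. The zone names are illustrative.

```python
import re

# The pattern from the filter example: starts with "us", ends with "a".
ZONE_PATTERN = re.compile(r"^us.*.a$")

zones = ["us-central1-a", "us-east1-a", "us-central1-b", "europe-west1-a"]
matching = [zone for zone in zones if ZONE_PATTERN.search(zone)]
print(matching)
```

Only the zones that both start with us and end with a remain; us-central1-b fails the suffix test and europe-west1-a fails the prefix test.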

To open the Query Editor and have it pre-populated with your selections, click MQL. Any modifications that you make are discarded when you return to basic mode.

Next step: Specify how the selected time series are processed and combined by using the Transform data fields.

MQL mode

To configure a condition by using MQL, do the following:

  1. Click MQL in the toolbar of the Create custom alert page and then enter the query for the condition.

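    Your query must end with one of the following operations:

    • condition: creates a condition that is met based on a threshold.
    • absent_for: creates a condition that is met when data is absent.

    For more information, see Alerting policies with MQL.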

  2. Click Next and configure the condition trigger.

To return to basic mode, click Basic query in the toolbar of the Create custom alert page. Any modifications that you make are discarded when you return to basic mode.

Next step: Configure the Condition trigger.

Direct filter mode

To select the time series to be monitored by using direct filter mode, do the following:

  1. Click ? on the Select metric section header and then select Direct filter mode in the tooltip.

  2. Enter a Monitoring filter.

    For example, to count the number of processes that are running on Compute Engine VM instances whose name includes nginx, enter the following:

    select_process_count("monitoring.regex.full_match(\".*nginx.*\")")
    resource.type="gce_instance"
    

    For syntax information see the following resources:

To return to basic mode, click ? on the Select metric section header and then select Basic mode.

Changes that you make in direct filter mode are discarded when you switch to MQL mode. If you return to basic mode after you create or modify a monitoring filter, then your changes might not be preserved.

Next step: Specify how the selected time series are processed and combined by using the Transform data fields.
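The process-count filter shown in this section wraps nginx in .* on both sides because a full match must cover the entire string. The following Python sketch illustrates that point and assembles the same filter text. It uses re.fullmatch as a stand-in for monitoring.regex.full_match, and the process_count_filter helper is hypothetical; it handles only regular expressions that contain no quotes or backslashes.

```python
import re


def process_count_filter(name_regex, resource_type="gce_instance"):
    """Build the two-line filter from this section for a given regular expression."""
    inner = f'monitoring.regex.full_match("{name_regex}")'
    # The inner expression is itself quoted, so its quotes are escaped.
    escaped = inner.replace('"', '\\"')
    return f'select_process_count("{escaped}")\nresource.type="{resource_type}"'


print(process_count_filter(".*nginx.*"))

# A full match must consume the whole string, so "nginx" alone does not
# match a longer name, whereas ".*nginx.*" matches any name containing it.
print(re.fullmatch(r"nginx", "nginx-worker") is not None)
print(re.fullmatch(r".*nginx.*", "nginx-worker") is not None)
```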

Transform data

This section applies only to basic mode and direct filter mode. It doesn't apply to MQL mode.

To configure how each time series is aligned and how time series are combined, do the following:

  1. (Optional) To change how the points in a time series are aligned to fixed time intervals, set the Rolling window and the Rolling window function in the Transform data section.

    These fields specify how the points that are recorded in a window are combined. For example, when the window is 15 minutes and the window function is max, the aligned point is the maximum value of all samples recorded in the most recent 15 minutes.

    For more information, see Align time series.

  2. (Optional) To combine time series together, in the Across time series section, click Show more, and then complete the dialog. By default, time series aren't combined.

    To create a single time series, do the following:

    1. Set the Time series aggregation field to a value other than none. For example, when you select mean, each point in the displayed time series is the average of points from the individual time series.

    2. Ensure that the Time series group by field is empty.

    To group time series, do the following:

    1. Set the Time series aggregation field to a value other than none.

      For example, if you group the time series by zone and then set the aggregation field to mean, then there is one time series for each zone.

    2. In the Time series group by field, select one or more labels by which to group.

    When you want a single time series to be shown and multiple time series are displayed after completing the previous steps, combine these time series by using the Secondary data transform fields.

    For more information, see Combine time series.
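The two steps above, aligning each time series and then combining series, can be sketched in a few lines of Python. This is a simplified model, not how Monitoring computes its results. The sample data, the series names, and the rolling_max and combine_mean helpers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rolling_max(points, window_s, now_s):
    """Align one time series: the maximum sample in the most recent window.

    points is a list of (timestamp_seconds, value) pairs.
    Returns None when the window holds no samples.
    """
    in_window = [value for ts, value in points if now_s - window_s < ts <= now_s]
    return max(in_window) if in_window else None


def combine_mean(aligned, labels, group_by=()):
    """Combine aligned points across series by taking their mean.

    An empty group_by yields one combined value under the key ();
    otherwise there is one value per distinct combination of label values.
    """
    groups = defaultdict(list)
    for series_id, value in aligned.items():
        if value is None:
            continue
        key = tuple(labels[series_id][name] for name in group_by)
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical CPU samples (percent) for three VMs; timestamps are in seconds.
SERIES = {
    "vm-1": [(0, 20), (600, 50), (900, 40)],
    "vm-2": [(300, 10), (900, 30)],
    "vm-3": [(900, 90)],
}
LABELS = {
    "vm-1": {"zone": "us-central1-a"},
    "vm-2": {"zone": "us-central1-a"},
    "vm-3": {"zone": "us-central1-b"},
}

# Rolling window of 15 minutes with the max function.
aligned = {sid: rolling_max(pts, 15 * 60, now_s=900) for sid, pts in SERIES.items()}
single = combine_mean(aligned, LABELS)                      # one combined series
by_zone = combine_mean(aligned, LABELS, group_by=("zone",))  # one series per zone
print(aligned, single, by_zone)
```

Leaving group_by empty corresponds to an empty Time series group by field, and passing a label name corresponds to grouping by that label.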

Next step: Configure the Condition trigger.

Condition trigger

To configure when a condition is met, go to the Configure alert trigger page and then do the following:

  1. If the Condition type field is shown, then select the type of condition.

    • To be notified when metric data stops arriving, select Metric absence.

    • To be notified based on the value of a metric, select Threshold.

  2. To specify how individual time series contribute to when a condition is met, select a value from the Alert trigger menu. This menu lets you specify the subset of time series that must satisfy the trigger for the condition to be met.

  3. Complete the condition-specific fields:

    • Metric absence condition:

      Specify how long the alerting policy waits to notify you that no metric data is being received by using the Trigger absence time field.

    • Threshold condition:

      • Enter how the value of the metric meets the condition criteria by using the Threshold position, Threshold value, and Time above threshold fields. For example, if you set these values to Above threshold, 0.3, and 5 minutes, then a time series meets the condition when every sample in a 5-minute interval is greater than 0.3.

      • (Optional) To select how long measurements must meet the condition criteria before alerting generates an incident, click Advanced options and then make a selection from the Retest window menu.

        A single measurement can result in a notification when you select No retest. For conceptual information and an example, see The alignment period and the duration.

      • (Optional) To select how Monitoring evaluates the condition when time-series data stops arriving, click Advanced options and then make a selection from the Evaluation missing data menu.

        The following list describes each value of the Evaluation missing data field in the Google Cloud console:

        • Missing data empty

          Summary: Open incidents stay open. New incidents aren't opened.

          Details: For conditions that are met, the condition continues to be met when data stops arriving. If an incident is open for this condition, then the incident stays open. When an incident is open and no data arrives, the auto-close timer starts after a delay of at least 15 minutes. If the timer expires, then the incident is closed.

          For conditions that aren't met, the condition continues to not be met when data stops arriving.

        • Missing data points treated as values that violate the policy condition

          Summary: Open incidents stay open. New incidents can be opened.

          Details: For conditions that are met, the condition continues to be met when data stops arriving. If an incident is open for this condition, then the incident stays open. When an incident is open and no data arrives for the auto-close duration plus 24 hours, the incident is closed.

          For conditions that aren't met, this setting causes the metric-threshold condition to behave like a metric-absence condition. If data doesn't arrive in the time specified by the retest window, then the condition is evaluated as met. For an alerting policy with one condition, the condition being met results in an incident being opened.

        • Missing data points treated as values that don't violate the policy condition

          Summary: Open incidents are closed. New incidents aren't opened.

          Details: For conditions that are met, the condition stops being met when data stops arriving. If an incident is open for this condition, then the incident is closed.

          For conditions that aren't met, the condition continues to not be met when data stops arriving.

  4. (Optional) Update the condition name.
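The trigger settings above can be modeled with a few small functions. The following Python sketch is a simplification for building intuition and is not Monitoring's evaluator: it ignores alignment and retest windows, and the function names, the min_count parameter, and the treatment keywords are hypothetical.

```python
def samples_in_window(points, window_s, now_s):
    """Values of the (timestamp_seconds, value) pairs in the most recent window."""
    return [value for ts, value in points if now_s - window_s < ts <= now_s]


def threshold_met(points, threshold, window_s, now_s):
    """Above threshold: every sample in the window is greater than the threshold."""
    window = samples_in_window(points, window_s, now_s)
    return bool(window) and all(value > threshold for value in window)


def absence_met(points, absence_s, now_s):
    """Metric absence: no sample arrived during the absence window."""
    return not samples_in_window(points, absence_s, now_s)


def condition_met(series_results, min_count=1):
    """Alert trigger: at least min_count time series satisfy the trigger."""
    return sum(series_results) >= min_count


def met_when_data_missing(was_met, treatment):
    """State of a threshold condition after data stops arriving."""
    if treatment == "empty":             # keep the previous state
        return was_met
    if treatment == "violates":          # missing points count as violations
        return True
    if treatment == "does_not_violate":  # missing points count as healthy
        return False
    raise ValueError(f"unknown treatment: {treatment}")


# Hypothetical utilization samples, one per minute, for two time series.
busy = [(60, 0.35), (120, 0.40), (180, 0.50), (240, 0.45), (300, 0.31)]
spiky = [(60, 0.35), (120, 0.20), (300, 0.90)]

# Above threshold, 0.3, and 5 minutes, as in the example in the procedure.
results = [threshold_met(s, 0.3, 5 * 60, now_s=300) for s in (busy, spiky)]
print(results, condition_met(results), condition_met(results, min_count=2))
```

The spiky series fails because one sample in the 5-minute interval is at or below 0.3, which matches the rule that every sample in the interval must exceed the threshold.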

View alerting policy

  1. In the Google Cloud console, select Monitoring or click the following button:
    Go to Monitoring

  2. In the navigation pane, select Alerting.

  3. To see all policies and to enable filtering, click See all policies in the Policies pane.

  4. Find the policy that you want to view, and then select it.

For example, the following screenshot illustrates the details for the alerting policy named Test staging:

Sample alert-policy details page.

As shown in the previous image, the details page provides information about the alerting policy:

  • To view incidents created by the policy, see the Incidents section.

  • To view the configured notification channels, see the Notification Channels section.

  • To view the additional information you specified to be included with a notification, see the Documentation section.

  • To view the user-defined labels, see the Labels section. For examples that illustrate how you can use labels to manage your alerts, see Add severity levels to an alerting policy.

  • To edit, copy, delete, download the JSON representation, or change the enabled state of the policy, use the toolbar. For example, to disable an alerting policy that is enabled, click Enabled and select Disable.

View JSON for an alerting policy

The JSON representation of an alerting policy shows how the policy settings are associated with fields in the Cloud Monitoring API. To view or download the JSON representation of an alerting policy, do the following:

  1. In the Google Cloud console, select Monitoring or click the following button:
    Go to Monitoring
  2. In the navigation pane, select Alerting.
  3. Find the policy that you want to view, and then click the policy name to open the Policy details page.

  4. Do one of the following:

    • To download the JSON to your local system, click JSON.

    • To view the JSON, ensure you are using the Preview interface, click Edit, and then click View code.

      If View code isn't shown, then either you are using the Legacy interface or the option is hidden under More options.
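After you download the JSON, a short script can show how the console settings map to fields. The following Python sketch parses a hypothetical policy and summarizes it. The field names reflect the AlertPolicy resource of the Cloud Monitoring API as far as I know them; treat them as assumptions and compare them with your own download or with Sample policies. The policy content, the project ID, and the channel ID are invented for illustration.

```python
import json

# Hypothetical policy; json.dumps stands in for the downloaded file's text.
POLICY = {
    "displayName": "High CPU usage",
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "CPU utilization above 0.3 for 5 minutes",
            "conditionThreshold": {
                "filter": 'resource.type = "gce_instance" AND '
                          'metric.type = "compute.googleapis.com/instance/cpu/utilization"',
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0.3,
                "duration": "300s",
            },
        }
    ],
    "notificationChannels": [
        "projects/example-project/notificationChannels/1234567890"
    ],
    "documentation": {
        "content": "## Addressing High CPU Usage",
        "mimeType": "text/markdown",
    },
    "userLabels": {"severity": "critical"},
    "alertStrategy": {"autoClose": "604800s"},
}
downloaded_text = json.dumps(POLICY, indent=2)


def condition_type(condition):
    """Classify a condition by the field that holds its settings."""
    if "conditionThreshold" in condition:
        return "threshold"
    if "conditionAbsent" in condition:
        return "absence"
    return "other"


def summarize(policy):
    """Collect the settings that the console procedures in this document configure."""
    auto_close = policy.get("alertStrategy", {}).get("autoClose")
    return {
        "name": policy["displayName"],
        "condition_types": [condition_type(c) for c in policy.get("conditions", [])],
        "channel_count": len(policy.get("notificationChannels", [])),
        "labels": policy.get("userLabels", {}),
        # Durations are strings such as "604800s"; convert to days when present.
        "auto_close_days": float(auto_close.rstrip("s")) / 86400 if auto_close else None,
    }


summary = summarize(json.loads(downloaded_text))
print(summary)
```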

Add an alerting policy to a dashboard

When an alerting policy contains one condition, you can display a summary of that alerting policy on a custom dashboard. The summary includes the time series that the alerting policy monitors, the threshold, and the number of open incidents.

To display a summary of an alerting policy on a custom dashboard, do the following:

  1. In the Google Cloud console, select Monitoring or click the following button:
    Go to Monitoring
  2. In the navigation pane, select Dashboards, then select the dashboard that you want to view or edit.
  3. If the Edit dashboard button is shown, then click it.

  4. Select Alert chart from the widget library, or click Add chart and then select Alert chart from the menu.

  5. In the configuration pane of the Alert chart, use the Alert policy menu to select an alerting policy. Only single-condition alerting policies can be selected from the Alert policy menu.

The following screenshot illustrates an alert chart:

Example of an alert chart.

In this example, the alerting policy is monitoring the CPU usage of two different virtual machines. The dashed line shows the condition threshold, which is set to 50%. The green chip with the label No incidents indicates that there are no open incidents for the alerting policy. If you place your pointer on the chip that shows the number of open incidents, then a dialog opens that links to the underlying alerting policy.

For more information, see Using dashboards and charts.

Manage policies

To list all alerting policies, do the following:

  1. In the Google Cloud console, select Monitoring or click the following button:
    Go to Monitoring

  2. In the navigation pane, select Alerting.

    The Alerting page displays panes that list summary information, incidents, and alerting policies.

  3. Click See all policies in the Policies pane.

    The Policies page lists all policies, includes a filter bar, and provides, for each policy, options to edit, copy, or delete the policy:

    • To restrict the alerting policies that are listed, do one of the following:

      • Enter a name on the filter bar. For example, enter Example to display policies with the string Example in their name. A case-insensitive comparison determines whether a policy is listed.
      • Click Filter policies, select the filter property, and then either enter a value or select a value from the menu.

      When you have multiple filters, a logical AND joins the filters unless you insert an OR filter. The following screenshot lists all alerting policies that match test or My Uptime Check Alert Policy:

      Sample alerts overview with filters.

    • To edit or copy a policy, click More options, and then select the corresponding option. Editing or copying a policy is similar to creating one, as described in Create an alerting policy. You can change, and sometimes delete, the values in the fields. When done, click Save policy.

    • To delete a policy, click More options and select Delete. In the confirmation dialog, select Delete.
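The filtering behavior described for the Policies page, case-insensitive name matching with filters joined by AND unless an OR is inserted, can be sketched as follows. This Python model assumes that OR separates groups of AND-joined filters and that every filter applies to the policy name; both are assumptions, and the helper names and the policy names other than those mentioned in this document are hypothetical.

```python
def name_matches(policy_name, text):
    """Case-insensitive substring test, as used by the filter bar."""
    return text.lower() in policy_name.lower()


def filter_policies(names, or_groups):
    """Keep names that satisfy every filter in at least one AND-group.

    or_groups is a list of groups; filters inside a group are joined by AND,
    and the groups themselves are joined by OR.
    """
    return [
        name
        for name in names
        if any(all(name_matches(name, text) for text in group) for group in or_groups)
    ]


names = ["Test staging", "My Uptime Check Alert Policy", "Example CPU", "test prod"]

# Equivalent to the filters: test OR "My Uptime Check Alert Policy"
print(filter_policies(names, [["test"], ["My Uptime Check Alert Policy"]]))

# Two filters without OR are joined by AND.
print(filter_policies(names, [["test", "staging"]]))
```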