Managing alerting policies

An alerting policy describes a set of conditions that you want to monitor. These conditions might relate to the state of an unhealthy system or to resource consumption. For example, you might want to create a policy to monitor an uptime check, or a policy that monitors your Cloud Monitoring API usage. In addition to conditions, an alerting policy lets you specify how you want to be notified and what documentation is included in that notification.

This page illustrates how to use Monitoring in the Google Cloud Console to create and manage an alerting policy. You can also use the Cloud Monitoring API to perform these tasks. For more information on this approach, see Managing alerting policies by API. To see policies represented in JSON, see Sample policies.

Before you begin

Before creating alerting policies, you should be familiar with the general concepts and terminology in alerting policies. This includes understanding the components of a policy, the concept of an incident, and pricing and limitations. See Introduction to alerting for more information.

Notification channels

Notification channels let you specify how you want to be informed of alerts. If a type of notification channel is configured, you have the option to select it when setting up notifications for your alerting policy.

For a list of notification channel types and channel configuration information, see Notification options.

Alerting policies on uptime checks

Create the alerting policy for an uptime check from the Uptime checks window. When you create the policy this way, most fields in the alerting policy are preconfigured.

Creating an alerting policy

  1. In the Cloud Console, select Monitoring.

    If you have never used Monitoring, then a Workspace is automatically created and your project is associated with that Workspace. Otherwise, if your project isn't associated with a Workspace, then a dialog appears and you can either create a new Workspace or add this project to an existing Workspace. After you make your selection, click Add. After the Workspace is created, you are automatically transitioned to Alerting.

  2. Select Alerting.

  3. Click Create Policy.

    The Create New Alerting Policy window is displayed.

  4. After you specify the name, conditions, notification channels, and documentation, click Save.
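The fields you complete in this dialog correspond roughly to fields of the AlertPolicy resource in the Cloud Monitoring API. The following Python sketch only builds a JSON request body; it doesn't call the API. The project ID (my-project), the notification channel ID, and the choice of the compute.googleapis.com/instance/cpu/utilization metric are hypothetical values for illustration, so check the AlertPolicy reference before relying on any field.

```python
import json

# Hypothetical values for illustration only: replace with your own project
# ID and an existing notification channel ID.
PROJECT_ID = "my-project"
CHANNEL = f"projects/{PROJECT_ID}/notificationChannels/1234567890"

policy = {
    # Naming: shown in notifications and in the Policies window.
    "displayName": "High CPU utilization",
    # How multiple conditions combine (Policy triggers in the Cloud Console).
    "combiner": "OR",
    # Conditions: one threshold condition on VM CPU utilization.
    "conditions": [
        {
            "displayName": "Excessive utilization",
            "conditionThreshold": {
                "filter": (
                    'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                    'AND resource.type="gce_instance"'
                ),
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0.6,  # utilization is a fraction: 0.6 means 60%
                "duration": "60s",      # must stay above the threshold for 1 minute
            },
        }
    ],
    # Notification channels are referenced by resource name.
    "notificationChannels": [CHANNEL],
    # Documentation that is included in notifications.
    "documentation": {
        "content": "## Addressing High CPU Usage\n\n"
                   "This note contains information about high CPU Usage.",
        "mimeType": "text/markdown",
    },
    "enabled": True,
}

# Serialize to the JSON body that you would send to the API.
body = json.dumps(policy, indent=2)
print(body)
```

Building the body as a plain dictionary keeps the sketch free of client-library dependencies; the same structure appears in the JSON policies in Sample policies.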

Naming

In the Create New Alerting Policy window, enter a policy name. The policy name is included in notifications, and it is displayed in the Policies window.

Conditions

Each condition in an alerting policy describes a resource that is being monitored and the circumstances in which that resource isn't meeting a performance measure. If an alerting policy has one condition, then an incident is created when the condition's configuration evaluates to true. For example, if the configuration is Any time series is above 10 for 5 minutes, then when this statement evaluates to true, the condition is met and an incident is created. See Policies with multiple conditions for information on combining multiple conditions.
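To make a configuration such as Any time series is above 10 for 5 minutes concrete, here is a simplified Python model of how it might be evaluated. It assumes each time series is a list of (timestamp in seconds, value) samples and ignores the alignment and aggregation that Cloud Monitoring performs, so treat it as an illustration of the idea, not the service's algorithm. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def series_above_for(samples: Sequence[Sample], threshold: float, duration_s: float) -> bool:
    """True if one time series has stayed above `threshold` continuously
    for at least `duration_s` seconds, up to its most recent sample."""
    run_start = None  # timestamp where the current above-threshold run began
    for ts, value in sorted(samples):
        if value > threshold:
            if run_start is None:
                run_start = ts
        else:
            run_start = None  # a sample at or below the threshold breaks the run
    if run_start is None:
        return False
    last_ts = max(ts for ts, _ in samples)
    return last_ts - run_start >= duration_s


def any_time_series_above(series: List[Sequence[Sample]], threshold: float, duration_s: float) -> bool:
    """Models 'Any time series is above <threshold> for <duration>'."""
    return any(series_above_for(s, threshold, duration_s) for s in series)


# Above 10 from t=0 s through t=300 s: met for 5 minutes.
steady = [(t, 12.0) for t in range(0, 301, 60)]
# Dips to 8 at t=60 s, so the current run has lasted only 60 s.
dipping = [(0, 12.0), (60, 8.0), (120, 12.0), (180, 12.0)]
print(any_time_series_above([dipping, steady], threshold=10, duration_s=300))
```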

To add an alerting condition to a policy, in the Create New Alerting Policy window, click Add Condition. For information on defining conditions, see Specifying conditions.

Policies with multiple conditions

An alerting policy can contain up to 6 conditions.

If you are using the Cloud Monitoring API or if your alerting policy has multiple conditions, then you must specify when violations of the individual conditions result in an incident being opened:

  • If you are using the Google Cloud Console, you use the Policy triggers field.
  • If you are using the Cloud Monitoring API, then you use the combiner field.

This table lists the settings in the Cloud Console, the equivalent value in the Cloud Monitoring API, and a description of each setting:

| Cloud Console Policy triggers value | Cloud Monitoring API combiner value | Meaning |
|---|---|---|
| Any condition is met (default value) | OR | An incident is opened if any resource violates any of the conditions. |
| All conditions are met | AND | An incident is opened if each condition is violated by at least one resource, even if a different resource violates each condition. |
| All conditions are met on matching resources | AND_WITH_MATCHING_RESOURCE | An incident is opened if each condition is violated by the same resource. This setting is the most stringent combining choice. |

In this context, the term met means that the condition's configuration evaluates to true. For example, if the configuration is Any time series is above 10 for 5 minutes, then when this statement evaluates to true, the condition is met.
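The three combiner values in the table can be sketched as a small Python function. It assumes you already know, for each condition, the set of resources that currently violate it; this is a simplified model of the table's rules, not Cloud Monitoring's implementation, and the function name is hypothetical.

```python
from typing import Dict, Set


def policy_triggers(violations: Dict[str, Set[str]], combiner: str) -> bool:
    """Decide whether a policy opens an incident.

    `violations` maps each condition name to the set of resources that
    currently violate (meet) it; an empty set means the condition isn't met.
    """
    resource_sets = list(violations.values())
    if combiner == "OR":
        # Any resource violating any condition is enough.
        return any(resource_sets)
    if combiner == "AND":
        # Every condition needs at least one violating resource,
        # but the resources may differ from condition to condition.
        return bool(resource_sets) and all(resource_sets)
    if combiner == "AND_WITH_MATCHING_RESOURCE":
        # A single resource must violate every condition.
        if not resource_sets:
            return False
        return bool(set.intersection(*resource_sets))
    raise ValueError(f"unknown combiner: {combiner}")


state = {"CPU usage is too high": {"vm1"}, "Excessive utilization": {"vm2"}}
for combiner in ("OR", "AND", "AND_WITH_MATCHING_RESOURCE"):
    print(combiner, policy_triggers(state, combiner))
```

The intersection in the last case is what makes AND_WITH_MATCHING_RESOURCE the most stringent choice: it succeeds only if some resource appears in every condition's set.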

Example

Consider a Google Cloud project that contains two VM instances, vm1 and vm2. Also, assume that you create an alerting policy with 2 conditions:

  • The condition named CPU usage is too high monitors the CPU usage of the instances. This condition is met when the CPU usage of any instance is above 100ms/s for 1 minute.
  • The condition named Excessive utilization monitors the CPU utilization of the instances. This condition is met when the CPU utilization of any instance is above 60% for 1 minute.

Initially, assume that both conditions evaluate to false.

Next, assume that the CPU usage of vm1 exceeds 100ms/s for 1 minute. This causes CPU usage is too high to evaluate to true. If the conditions are combined with Any condition is met, then an incident is created because a condition is met. If the conditions are combined with All conditions are met or All conditions are met on matching resources, then an incident isn't created. These combiner choices require that both conditions evaluate to true.

Now assume that the CPU usage of vm1 continues to be above 100ms/s and that the CPU utilization of vm2 exceeds 60% for 1 minute. The result is that both conditions evaluate to true. The following describes what occurs based on how the conditions are combined:

  • Any condition is met: A second incident is created because vm2 is causing Excessive utilization to evaluate to true.

    When a condition's configuration evaluates to true, the alerting policy keeps a record of the monitored resource and the condition. An incident is created based on the pairing of the resource and the condition. Therefore, vm1 causing CPU usage is too high to be true and vm2 causing Excessive utilization to be true are distinct events. An incident is created for each event.

  • All conditions are met: An incident is created because both conditions evaluate to true.

    In this example, vm1 causes CPU usage is too high to be true while vm2 is causing Excessive utilization to evaluate to true. Consequently, an incident is created.

  • All conditions are met on matching resources: An incident isn't created in this case because neither vm1 nor vm2 caused both conditions to evaluate to true. For an incident to be created for this combiner choice, the same VM instance must cause both conditions to evaluate to true.
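The Any condition is met case of this example can be sketched in Python by tracking incidents by (resource, condition) pair, as the text describes. The bookkeeping here, where a pair that already has an open incident doesn't open a second one, is an assumption made for illustration, and the function name is hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (resource, condition)


def new_incidents(open_incidents: Set[Pair], violations: Dict[str, Set[str]]) -> Set[Pair]:
    """Return the (resource, condition) pairs that open a new incident
    under 'Any condition is met'."""
    current = {
        (resource, condition)
        for condition, resources in violations.items()
        for resource in resources
    }
    # Pairs that already have an open incident don't open another one.
    return current - open_incidents


# Step 1: the CPU usage of vm1 exceeds 100ms/s for 1 minute.
step1 = {"CPU usage is too high": {"vm1"}, "Excessive utilization": set()}
opened = new_incidents(set(), step1)
print(sorted(opened))  # [('vm1', 'CPU usage is too high')]

# Step 2: vm1 stays high and the CPU utilization of vm2 exceeds 60%.
step2 = {"CPU usage is too high": {"vm1"}, "Excessive utilization": {"vm2"}}
print(sorted(new_incidents(opened, step2)))  # [('vm2', 'Excessive utilization')]
```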

Notifications

Notification channels let you specify how you want to be informed of alerts. If you don't add at least one notification channel, then you aren't notified when an incident occurs. You can add multiple notification channels. For details on your choices of notification channels, see Notification options.

To add a notification channel, do the following:

  1. Click Add notification channel.
  2. Select the Notification Channel Type.
  3. You might need to complete additional fields. For example, if you select Email, then you are prompted for your email address.
  4. Click Add.

To add another notification channel to your policy, repeat the previous steps. Configuring notification channels of at least two different types increases reliability, because you can still be notified if one channel fails.

Documentation

Documentation is included in notifications to help you manage the failure condition. You can use Markdown to format your documentation, or use plain text.

In addition to Markdown, you can use variables to pull information out of the policy itself to tailor the content of your documentation. For more information, see Using variables.

For example, documentation might include a title such as Addressing High CPU Usage and details that identify the project:

```
## Addressing High CPU Usage

This note contains information about high CPU Usage.
```

You can include variables in the documentation. For example:

```
This alert originated from the project ${project}, using
the variable $${project}.
```

The variables are replaced by their values only in notifications. The Preview Markdown pane, and the other places in the Cloud Console that show the documentation, reflect only the Markdown formatting:

[Figure: Example of writing a documentation note using Markdown.]
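Python's string.Template happens to use the same ${name} and $$ syntax as the documentation example, so it can approximate what a notification shows after variable substitution: ${project} is replaced, while $${project} becomes the literal text ${project}. This is only an analogy, not how Cloud Monitoring performs substitution, and the project ID my-project is hypothetical.

```python
from string import Template

# Documentation text from the example; ${project} is a variable and
# $$ escapes a literal "$" (same rules as string.Template).
doc = Template(
    "This alert originated from the project ${project}, using\n"
    "the variable $${project}."
)

# Hypothetical project ID used only for illustration.
rendered = doc.substitute(project="my-project")
print(rendered)
# This alert originated from the project my-project, using
# the variable ${project}.
```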

You can also include channel-specific tagging to control notifications. For more information, see Using channel controls.

Managing policies

To list all alerting policies, do the following:

  1. In the Cloud Console, select Monitoring.

  2. Select Alerting.

  3. A partial list of policies is shown in the Policies pane. To see all policies and to enable filtering, click See all policies.

To view the details of an alerting policy, click its name.

To restrict the alerting policies that are listed, add filters. Each filter is composed of a name and a value. You can set the value to be an exact match for a policy name, or a partial match. Matches are not case sensitive. If you have multiple filters, then the filters are automatically joined by a logical AND unless you insert an OR filter. The following screenshot lists all alerting policies that match test or My Uptime Check Alert Policy:

[Screenshot: Sample alerting policies overview with filters.]
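The filtering rules described above can be modeled in Python: each filter is a (name, value) pair matched as a case-insensitive substring (so an exact name also matches), consecutive filters are ANDed, and an "OR" token separates alternatives. The assumption that AND binds more tightly than OR, the filter name Name, and the sample policy names are hypothetical; the console's actual parser may differ.

```python
from typing import Dict, List, Tuple, Union

Term = Tuple[str, str]          # (filter name, value), for example ("Name", "test")
FilterToken = Union[Term, str]  # a term, or the literal "OR"


def matches(policy: Dict[str, str], tokens: List[FilterToken]) -> bool:
    """Evaluate console-style filters against one policy."""
    groups: List[List[Term]] = [[]]
    for token in tokens:
        if isinstance(token, str):
            if token != "OR":
                raise ValueError(f"unexpected token: {token}")
            groups.append([])  # start a new AND group
        else:
            groups[-1].append(token)
    # A policy matches if every term in at least one group matches.
    return any(
        all(value.lower() in policy.get(name, "").lower() for name, value in group)
        for group in groups
    )


policies = [
    {"Name": "Test CPU policy"},
    {"Name": "My Uptime Check Alert Policy"},
    {"Name": "Disk usage"},
]
tokens = [("Name", "test"), "OR", ("Name", "My Uptime Check Alert Policy")]
print([p["Name"] for p in policies if matches(p, tokens)])
```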

From the Policies window you can edit, delete, copy, enable, or disable an alerting policy:

  • To edit or copy a policy, click More options and select the corresponding option. Editing and copying a policy is similar to Creating an alerting policy. You can change, and in some cases delete, the values in the fields. When done, click Save.

  • To delete a policy, click More options and select Delete. In the confirmation window, select Delete.

  • To change the enabled status of the alerting policy, click the toggle located under the heading Enabled.

If you select an alerting policy, then the Policy details are displayed. For example, the following screenshot illustrates the details for the alerting policy named My Uptime Check Alert Policy:

[Screenshot: Sample alerting policy details page.]

You can edit, delete, copy, enable, or disable an alerting policy from the Policy details window. For example, to disable an alerting policy that is currently enabled, click Enabled and select Turn off. Similarly, to enable a policy that is currently disabled, click Disabled.