This document describes how to use the Google Cloud console to create a metric-based alerting policy that sends notifications and creates an alert, also called an incident, when the values of a metric are above, or below, a threshold for a specific duration window. For example, a policy might trigger when the CPU utilization is higher than 80% for at least five minutes.
This content does not apply to log-based alerting policies. For information about log-based alerting policies, which notify you when a particular message appears in your logs, see Monitoring your logs.
This document doesn't describe the following:
- How to be notified when data stops arriving. For information about this topic, see Create metric-absence alerting policies.
- How to be notified based on the predicted value of a metric. For information about this topic, see Create forecasted metric-value alerting policies.
- How to create an alerting policy by using the Cloud Monitoring API. For information about this topic, see Create alerting policies by using the API.
- How to create an alerting policy whose condition includes a Monitoring Query Language (MQL) query. These policies can use a static or dynamic threshold. For more information, see the following documents:
Before you begin
- To get the permissions that you need to create and modify alerting policies by using the Google Cloud console, ask your administrator to grant you the Monitoring Editor (`roles/monitoring.editor`) IAM role on your project. For more information about granting roles, see Manage access. You might also be able to get the required permissions through custom roles or other predefined roles.

  For more information about Cloud Monitoring roles, see Control access with Identity and Access Management.
- Ensure that you're familiar with the general concepts of alerting policies. For information about these topics, see Alerting overview.
- Configure the notification channels that you want to use to receive any alerts. For redundancy purposes, we recommend that you create multiple types of notification channels. For more information, see Create and manage notification channels.
Create alerting policy
To create an alerting policy that compares the value of a metric to a static threshold, do the following:
In the Google Cloud console, select Monitoring, or click the following button:

Go to Monitoring

In the navigation pane, select notifications Alerting, and then click Create policy.
Select the time series to be monitored:
Click Select a metric, navigate through the menus to select a resource type and metric type, and then click Apply.
To reduce the options in the menus, enter the name of the metric type or resource type of interest into the filter bar. For example, if you enter "VM instance" in the filter bar, then only metric types for VM instances are listed. If you enter "CPU", then the menus display only metric types that contain "CPU" in their name.
You can monitor any built-in metric or any user-defined metric.
For information about how to monitor a metric that isn't listed in the menus, see Metric not listed in menu.
Optional: To monitor a subset of the time series that match the metric and resource types you selected in the previous step, click Add filter. In the filter dialog, select the label by which to filter, a comparator, and then the filter value. For example, the filter `zone =~ ^us.*.a$` uses a regular expression to match all time-series data whose zone name starts with `us` and ends with `a`. For more information, see Filter the selected time series.

Optional: To change how the points in a time series are aligned, in the Transform data section, set the Rolling window and Rolling window function fields.
These fields specify how the points that are recorded in a window are combined. For example, assume that the window is 15 minutes and the window function is `max`. The aligned point is the maximum value of all points in the most recent 15 minutes. For more information, see Alignment: within-series regularization. You can also monitor the rate at which a metric value changes by setting the Rolling window function field to percent change. For more information, see Monitor a rate of change.
Optional: Combine time series when you want to reduce the number of time series monitored by a policy, or when you want to monitor only a collection of time series. For example, instead of monitoring the CPU utilization of each VM instance, you might want to compute the average of the CPU utilization for all VMs in a zone, and then monitor that average. By default, time series aren't combined. For general information, see Reduction: combining time series.
To combine all time series, do the following:
- In the Across time series section, click expand_more Expand.
- Set the Time series aggregation field to a value other than `none`. For example, to display the average value of the time series, select `mean`.
- Ensure that the Time series group by field is empty.
To combine, or group, time series by label values, do the following:
- In the Across time series section, click expand_more Expand.
- Set the Time series aggregation field to a value other than `none`.
- In the Time series group by field, select the labels by which to group.

For example, if you group by the `zone` label and then set the aggregation field to a value of `mean`, then the chart displays one time series for each zone for which there is data. The time series shown for a specific zone is the average of all time series with that zone. A sketch that illustrates these selection, transformation, and aggregation settings follows this step.

Click Next.
Configure the condition trigger:
Leave the Condition type field at the default value of Threshold.
Select a value for the Alert trigger menu. This menu lets you specify the subset of time series that must violate the threshold before the condition is triggered.
Enter when the value of a metric violates the threshold by using the Threshold position and Threshold value fields. For example, if you set these values to Above threshold and `0.3`, then any measurement higher than `0.3` violates the threshold.

Optional: To select how long measurements must violate the threshold before alerting generates an incident, expand Advanced options and then use the Retest window menu. The default value is No retest. With this setting, a single measurement can result in a notification. For more information and an example, see Alignment period and duration settings.
Optional: To specify how Monitoring evaluates the condition when data stops arriving, expand Advanced options, and then use the Evaluation missing data menu. The Evaluation missing data menu is disabled when the value of the Retest window field is No retest. The following table describes each option; a sketch that illustrates these threshold settings follows this step.
| "Evaluation of missing data" field | Summary | Details |
|---|---|---|
| Missing data empty | Open incidents stay open. New incidents aren't opened. | For conditions that are met, the condition continues to be met when data stops arriving. If an incident is open for this condition, then the incident stays open. When an incident is open and no data arrives, the auto-close timer starts after a delay of at least 15 minutes. If the timer expires, then the incident is closed. For conditions that aren't met, the condition continues to not be met when data stops arriving. |
| Missing data points treated as values that violate the policy condition | Open incidents stay open. New incidents can be opened. | For conditions that are met, the condition continues to be met when data stops arriving. If an incident is open for this condition, then the incident stays open. When an incident is open and no data arrives for the auto-close duration plus 24 hours, the incident is closed. For conditions that aren't met, this setting causes the metric-threshold condition to behave like a metric-absence condition. If data doesn't arrive in the time specified by the retest window, then the condition is evaluated as met. For an alerting policy with one condition, the condition being met results in an incident being opened. |
| Missing data points treated as values that don't violate the policy condition | Open incidents are closed. New incidents aren't opened. | For conditions that are met, the condition stops being met when data stops arriving. If an incident is open for this condition, then the incident is closed. For conditions that aren't met, the condition continues to not be met when data stops arriving. |
Click Next.
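For illustration only, the following sketch continues the earlier example and shows roughly how the Threshold, Retest window, and Evaluation missing data settings map onto a threshold condition. The display name and values are placeholders, and the `evaluation_missing_data` field name is an assumption about the API client library rather than something this document prescribes.

```python
# Illustrative sketch only; reuses example_filter and aggregation from the
# earlier sketch. The display name and values are placeholders.
condition = monitoring_v3.AlertPolicy.Condition(
    display_name="CPU utilization above 80%",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        filter=example_filter,
        aggregations=[aggregation],
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,  # Threshold position: Above threshold
        threshold_value=0.8,  # Threshold value
        duration={"seconds": 300},  # Retest window: 5 minutes
        trigger=monitoring_v3.AlertPolicy.Condition.Trigger(count=1),  # Alert trigger
        # Assumed enum; treats missing data as values that don't violate the condition.
        evaluation_missing_data=(
            monitoring_v3.AlertPolicy.Condition.EvaluationMissingData.EVALUATION_MISSING_DATA_INACTIVE
        ),
    ),
)
```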
Optional: Create an alerting policy with multiple conditions.
Most policies monitor a single metric type, for example, a policy might monitor the number of bytes written to a VM instance. When you want to monitor multiple metric types, create a policy with multiple conditions. Each condition monitors one metric type. After you create the conditions, you specify how the conditions are combined. For information, see Policies with multiple conditions.
To create an alerting policy with multiple conditions, do the following:
- For each additional condition, click Add alert condition and then configure that condition.
- Click Next and configure how conditions are combined.
- Click Next to advance to the notifications and documentation setup.
Configure the notifications and documentation:
Expand the Notification channels menu and select your notification channels. For redundancy purposes, we recommend that you add multiple types of notification channels to an alerting policy. For more information, see Manage notification channels.
Optional: To be notified when an incident is closed, select Notify on incident closure. By default, when you create an alerting policy with the Google Cloud console, a notification is sent only when an incident is created.
Optional: To change how long Monitoring waits before closing an incident after data stops arriving, select an option from the Incident autoclose duration menu. By default, when data stops arriving, Monitoring waits seven days before closing an open incident.
Optional: To add custom labels to the alerting policy, in the Policy user labels section, do the following:
- Click Add label, and in the Key field enter a name for the label. Label names must start with a lowercase letter, and they can contain lowercase letters, numerals, underscores, and dashes. For example, enter `severity`.
- Click Value and enter a value for your label. Label values can contain lowercase letters, numerals, underscores, and dashes. For example, enter `critical`.
For information about how you can use policy labels to help you manage your alerts, see Annotate alerts with labels.
Optional: To include custom documentation with a notification, enter that content in the Documentation section. To format your documentation, you can use Markdown. To pull information out of the policy itself to tailor the content of your documentation, you can use variables. For example, documentation might include a title such as `Addressing High CPU Usage` and details that identify the project:

    ## Addressing High CPU Usage

    This note contains information about high CPU Usage.
    You can include variables in the documentation. For example:

    This alert originated from the project ${project},
    using the variable $${project}.

When notifications are created, Monitoring replaces the variables with their values. The values replace the variables only in notifications. The preview pane and other places in the Google Cloud console show only the Markdown formatting.
For more information, see Annotate alerts with user-defined documentation and Using channel controls.
Click Alert name and enter a name for the alerting policy.
Click Create policy.
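Clicking Create policy assembles everything you configured into a single alerting policy. For illustration only, the following sketch shows one hedged way the same pieces fit together when a policy is created programmatically; the project ID and notification channel ID are placeholders, and the condition comes from the earlier sketches.

```python
# Illustrative sketch only; PROJECT_ID and CHANNEL_ID are placeholders.
client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="High CPU utilization",  # Alert name
    conditions=[condition],  # From the earlier sketch
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND,
    notification_channels=[
        "projects/PROJECT_ID/notificationChannels/CHANNEL_ID",
    ],
    documentation=monitoring_v3.AlertPolicy.Documentation(
        content=(
            "## Addressing High CPU Usage\n"
            "This alert originated from the project ${project}."
        ),
        mime_type="text/markdown",
    ),
    user_labels={"severity": "critical"},  # Policy user labels
    alert_strategy=monitoring_v3.AlertPolicy.AlertStrategy(
        auto_close={"seconds": 7 * 24 * 3600},  # Incident autoclose duration: 7 days
    ),
)

created_policy = client.create_alert_policy(
    name="projects/PROJECT_ID", alert_policy=policy
)
```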
Filter the selected time series
You can reduce the amount of data being monitored by specifying filter criteria or by applying aggregation. Filters ensure that only time series that meet some set of criteria are used. When you apply filters, there are fewer time series to evaluate, which can improve the performance of the alert.
When you supply multiple filtering criteria, only the time series that meet all criteria are monitored.
To add a filter, click Add filter, complete the dialog, and then click Done. In the dialog, you use the Filter field to select the criterion by which to filter. For example, you can filter by resource group, by name, by resource label, by zone, and by metric label. After you select the filter criterion, complete the filter by selecting the comparison operator and the value. Each row in the following table lists a comparison operator, its meaning, and an example. A sketch of a combined filter follows the table.
| Operator | Meaning | Example |
|---|---|---|
| `=` | Equality | `resource.labels.zone = "us-central1-a"` |
| `!=` | Inequality | `resource.labels.zone != "us-central1-a"` |
| `=~` | Regular expression equality | `monitoring.regex.full_match("^us.*")` |
| `!=~` | Regular expression inequality | `monitoring.regex.full_match("^us.*")` |
| `starts_with` | Value starts with | `resource.labels.zone = starts_with("us")` |
| `ends_with` | Value ends with | `resource.labels.zone = ends_with("b")` |
| `has_substring` | Value contains | `resource.labels.zone = has_substring("east")` |
| `one_of` | One of | `resource.labels.zone = one_of("asia-east1-b", "europe-north1-a")` |
| `!starts_with` | Value doesn't start with | `resource.labels.zone != starts_with("us")` |
| `!ends_with` | Value doesn't end with | `resource.labels.zone != ends_with("b")` |
| `!has_substring` | Value doesn't contain | `resource.labels.zone != has_substring("east")` |
| `!one_of` | Value isn't one of | `resource.labels.zone != one_of("asia-east1-b", "europe-north1-a")` |
Troubleshoot
This section contains troubleshooting tips.
Metric not listed in menu of available metrics
To monitor a metric that isn't listed in the Select a metric menu, do one of the following:
To create an alerting policy that monitors a Google Cloud metric, expand the Select a metric menu and then disable the Show only active resources & metrics toggle. When this toggle is disabled, the menus list all metrics for Google Cloud services and all metric types with data.
To configure an alert for a custom metric type before that metric type generates data, you must specify the metric type by using a Monitoring filter:
- Select ? on the Select a metric section header and then select Direct filter mode in the tooltip.
- Enter a monitoring filter or a time series selector. For information about syntax, see the following documents:
Monitor a rate of change
To monitor the rate of change of a metric value, set the Rolling window function field to percent change. Monitoring then compares the rate of change of the metric to the threshold. When you select the percent change function, Monitoring does the following:

- If the time series has a `DELTA` or `CUMULATIVE` metric kind, it converts the time series to one that has a `GAUGE` metric kind. For information about the conversion, see Kinds, types, and conversions.
- It computes the percent change by comparing the average value in the most recent 10-minute window to the average value from the 10-minute window before the retest window.

The 10-minute lookback window is a fixed value; you can't change it. However, you do specify the retest window when you create a condition. A worked example of this calculation follows.
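To make the percent-change comparison concrete, the following sketch walks through the arithmetic with invented sample values: the average of the most recent 10-minute window is compared with the average of the 10-minute window that precedes the retest window, and the resulting percentage is what's checked against the threshold.

```python
# Illustrative arithmetic only; the values are invented for this example and
# this is not the service's implementation.
recent_window_avg = 0.72   # average value over the most recent 10 minutes
earlier_window_avg = 0.60  # average over the 10 minutes before the retest window

percent_change = 100.0 * (recent_window_avg - earlier_window_avg) / earlier_window_avg
print(f"{percent_change:.1f}% change")  # prints "20.0% change"

threshold_percent = 15.0
condition_met = percent_change > threshold_percent  # True: the threshold is violated
```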
What's next
- To create a policy that compares the value of a time series to a dynamic threshold, you must use MQL. For more information, see Create dynamic severity levels using MQL.
- The instructions on this page apply to any alerting policy. The following documents provide guidance for specific configurations: