Alerting gives timely awareness to problems in your cloud applications so you can resolve the problems quickly. In Cloud Monitoring, an alerting policy describes the circumstances under which you want to be alerted and how you want to be notified. This page provides an overview of alerting policies.
Alerting policies that are used to track metric data collected by Cloud Monitoring are called metric-based alerting policies. Most of the Cloud Monitoring documentation about alerting policies assumes that you are using metric-based alerting policies. To learn how to set up a metric-based alerting policy, try the Quickstart for Compute Engine.
You can also create log-based alerting policies, which notify you when a particular message appears in your logs. These policies are not based on metrics. This content does not apply to log-based alerting policies. For information about log-based alerting policies, see Monitoring your logs.
How alerting works
Each alerting policy specifies the following:
Conditions that describe when a resource, or a group of resources, is in a state that requires you to respond. For example, you might configure a condition as follows:
The HTTP response latency is higher than two seconds for at least five minutes.
In this example, the condition monitors the HTTP response latency metric, and it triggers when all latency measurements in a five-minute period are higher than two seconds.
There are three types of conditions:
- Metric-threshold conditions trigger when the values of a metric are more than, or less than, a threshold for a specific duration window.
- Metric-absence conditions trigger when there is an absence of measurements for a duration window.
- Forecast conditions predict the future behavior of the measurements by using previous data. These conditions trigger when there is a prediction that a time series will violate the threshold within a forecast window.
An alerting policy must have at least one condition; however, you can configure a policy to contain multiple conditions.
Notification channels that describe who is to be notified when action is required. You can include multiple notification channels in an alerting policy. Cloud Monitoring supports Cloud Mobile App and Pub/Sub in addition to common notification channels. For a complete list of supported channels and information about how to configure these channels, see Create and manage notification channels.
For example, you can configure an alerting policy to email my-support-team@example.com and to post a Slack message to the channel #my-support-team.
Documentation that you want included in a notification. The documentation field supports plain text, Markdown, and variables.
For example, you could include in your alerting policy the following documentation:
## HTTP latency responses This alert originated from the project ${project}, using the variable $${project}.
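As an illustration of how these three pieces fit together, the following sketch uses the Cloud Monitoring API Python client library (google-cloud-monitoring) to define a policy with one metric-threshold condition, one notification channel, and Markdown documentation. The project ID, notification channel ID, latency metric, and threshold are placeholder values assumed for this sketch; replace them with your own.

```python
from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

# Placeholder identifiers; replace with your own values.
project_id = "my-project"
channel_name = f"projects/{project_id}/notificationChannels/1234567890"

client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="HTTP response latency",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="Latency above two seconds for five minutes",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                # Placeholder filter: selects the latency metric to monitor.
                filter=(
                    'metric.type = "loadbalancing.googleapis.com/https/total_latencies" '
                    'AND resource.type = "https_lb_rule"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=2000,  # latency threshold in milliseconds
                duration=duration_pb2.Duration(seconds=300),  # five-minute duration window
                aggregations=[
                    monitoring_v3.Aggregation(
                        alignment_period=duration_pb2.Duration(seconds=60),
                        per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_PERCENTILE_99,
                    )
                ],
            ),
        )
    ],
    notification_channels=[channel_name],
    documentation=monitoring_v3.AlertPolicy.Documentation(
        content=(
            "## HTTP latency responses\n"
            "This alert originated from the project ${project}."
        ),
        mime_type="text/markdown",
    ),
)

created_policy = client.create_alert_policy(
    name=f"projects/{project_id}", alert_policy=policy
)
print(f"Created {created_policy.name}")
```

The filter, aggregation, and threshold values here are only illustrative; an equivalent policy can also be created in the Google Cloud console without any code.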
After a metric-based alerting policy is configured, Monitoring continuously monitors the conditions of that policy. You can't configure the conditions to be monitored only for certain time periods.
When the conditions of an alerting policy trigger, Monitoring creates an incident and sends a notification about the incident creation. This notification includes summary information about the incident, a link to the Policy details page so that you can investigate the incident, and any documentation that you specified.
If an incident is open and Monitoring determines that the conditions of the metric-based policy are no longer met, then Monitoring automatically closes the incident and sends a notification about the closure.
Example
You deploy a web application onto a Compute Engine virtual machine (VM) instance. While you expect the HTTP response latency to fluctuate, you want your support team to respond when the application has high latency for a significant period of time.
To ensure that your support team is notified when your application experiences high latencies, you create the following alerting policy:
If the HTTP response latency is higher than two seconds for at least five minutes, then open an incident and send an email to your support team.
In this alerting policy, the metric-threshold condition is monitoring the HTTP response latency. If this latency is higher than two seconds continuously for five minutes, then the condition triggers and an incident is created. A transient spike in latency doesn't cause the condition to trigger or an incident to be created.
Your web application turns out to be popular, and the response latency grows beyond two seconds. Here's how your alerting policy responds:
Monitoring starts a five-minute timer when it receives an HTTP latency measurement higher than two seconds.
If each latency measurement received during the next five minutes is higher than two seconds, then the timer expires. When the timer expires, the condition triggers, and Monitoring opens an incident and sends an email to your support team.
Your support team receives the email, signs into the Google Cloud console, and acknowledges receipt of the notification.
Following the documentation in the notification email, your support team is able to address the cause of the latency. Within a few minutes, the HTTP response latency drops to less than two seconds.
When Monitoring receives an HTTP latency measurement less than two seconds, it closes the incident and sends a notification to your support team that the incident is closed.
If the latency rises higher than two seconds and stays higher than that threshold for five minutes, then a new incident is opened and a notification is sent.
How to add an alerting policy
You can add a metric-based alerting policy to your Google Cloud project by using the Google Cloud console, the Cloud Monitoring API, or the Google Cloud CLI:
When you use the Google Cloud console, you can enable a recommended alert or you can create an alert by starting from the Alerts page of Cloud Monitoring.
Recommended alerts are available for some Google Cloud products. These alerts require minimal configuration, such as adding notification channels. For example, the Pub/Sub Lite Topics page links to alerts that are configured to notify you when you're reaching a quota limit. Similarly, the VM Instances page from within Monitoring links to alerting policies that are configured to monitor the memory utilization and network latency of those instances.
For information about how to create an alerting policy, see the following documents:
- Create metric-threshold alerting policies
- Create metric-absence alerting policies
- Create forecasted metric-value alerting policies
You can view and modify any policy that you create by using the Google Cloud console, either with the Google Cloud console or with the Cloud Monitoring API. The Cloud Monitoring API also lets you create alerting policies that monitor ratios of metrics. When those ratio policies use Monitoring filters, you can't view or modify them by using the Google Cloud console.
When you use the Cloud Monitoring API directly or when you use the Google Cloud CLI, you can create, view, and modify alerting policies.
For more information, see Create alerting policies by using the Cloud Monitoring API or Google Cloud CLI.
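For example, a minimal sketch of viewing and modifying existing policies with the Python client library for the Cloud Monitoring API might look like the following; the project and policy identifiers are placeholders.

```python
from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()
project_name = "projects/my-project"  # placeholder project ID

# View: list every alerting policy in the project.
for policy in client.list_alert_policies(name=project_name):
    print(policy.name, policy.display_name)

# Modify: disable one policy (placeholder policy ID), then save the change.
policy = client.get_alert_policy(name=f"{project_name}/alertPolicies/1234567890")
policy.enabled = False
client.update_alert_policy(alert_policy=policy)
```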
You can create conditions that monitor a single metric, multiple metrics, or a ratio of metrics. When you use the Cloud Monitoring API, you can specify the ratio by using Monitoring Query Language (MQL) or by using Monitoring filters. For an example of a policy that uses Monitoring filters, see Metric ratio.
Cloud Monitoring supports Monitoring Query Language (MQL), an expressive, text-based language that you can use with the Google Cloud console and with the Cloud Monitoring API. For information about using this language with alerting, see Creating alerting policies by using Monitoring Query Language (MQL).
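As a sketch of the Monitoring-filter form of a ratio condition, the fragment below builds a metric-threshold condition whose value is the ratio of two filtered time series. Both metric types are assumed placeholder names, and the threshold and durations are illustrative only.

```python
from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

# Ratio condition sketch: triggers when (numerator / denominator) stays above
# 5% for ten minutes. The two metric types below are placeholders.
ratio_condition = monitoring_v3.AlertPolicy.Condition(
    display_name="Error ratio above 5% for 10 minutes",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        # Numerator time series.
        filter=(
            'metric.type = "custom.googleapis.com/http/error_count" '
            'AND resource.type = "gce_instance"'
        ),
        # Denominator time series.
        denominator_filter=(
            'metric.type = "custom.googleapis.com/http/request_count" '
            'AND resource.type = "gce_instance"'
        ),
        aggregations=[
            monitoring_v3.Aggregation(
                alignment_period=duration_pb2.Duration(seconds=60),
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_RATE,
            )
        ],
        denominator_aggregations=[
            monitoring_v3.Aggregation(
                alignment_period=duration_pb2.Duration(seconds=60),
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_RATE,
            )
        ],
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
        threshold_value=0.05,
        duration=duration_pb2.Duration(seconds=600),
    ),
)
```

A condition like this one can be attached to an alerting policy in the same way as in the earlier sketch; because it uses Monitoring filters, the resulting policy can't be viewed or modified in the Google Cloud console.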
You can add a log-based alerting policy to your Google Cloud project by using the Logs Explorer in Cloud Logging or by using the Monitoring API. This content does not apply to log-based alerting policies. For information about log-based alerting policies, see Monitoring your logs.
Costs associated with alerting policies
There are no costs associated with using alerting policies. For information about the pricing of uptime checks, see Cloud Monitoring pricing summary.
The following limits apply to your use of alerting policies and uptime checks:
Category | Value | Policy type¹
---|---|---
Alerting policies (sum of metric and log) per metrics scope² | 500 | Metric, Log
Conditions per alerting policy | 6 | Metric
Maximum time period that a metric-absence condition evaluates³ | 1 day | Metric
Maximum time period that a metric-threshold condition evaluates³ | 23 hours 30 minutes | Metric
Maximum length of the filter used in a metric-threshold condition | 2,048 Unicode characters | Metric
Maximum number of time series monitored by a forecast condition | 64 | Metric
Minimum forecast window | 1 hour (3,600 seconds) | Metric
Maximum forecast window | 7 days (604,800 seconds) | Metric
Notification channels per alerting policy | 16 | Metric, Log
Maximum rate of notifications | 1 notification every 5 minutes for each log-based alert | Log
Maximum number of notifications | 20 notifications a day for each log-based alert | Log
Maximum number of simultaneously open incidents per alerting policy | 1,000 | Metric
Period after which an incident with no new data is automatically closed | 7 days | Metric
Maximum duration of an incident if not manually closed | 7 days | Log
Retention of closed incidents | 13 months | Not applicable
Retention of open incidents | Indefinite | Not applicable
Notification channels per metrics scope | 4,000 | Not applicable
Maximum number of alerting policies per snooze | 16 | Metric, Log
Retention of a snooze | 13 months | Not applicable
Uptime checks per metrics scope⁴ | 100 | Not applicable
Maximum number of ICMP pings per public uptime check | 3 | Not applicable
²Apigee and Apigee hybrid are deeply integrated with Cloud Monitoring. The alerting limit for all Apigee subscription levels (Standard, Enterprise, and Enterprise Plus) is the same as for Cloud Monitoring: 500 per metrics scope.
³The maximum time period that a condition evaluates is the sum of the alignment period and the duration window values. For example, if the alignment period is set to 15 hours and the duration window is set to 15 hours, then 30 hours of data is required to evaluate the condition.
⁴This limit applies to the number of uptime-check configurations. Each uptime-check configuration includes the time interval between testing the status of the specified resource. For more information, see Managing uptime checks.
For full pricing information, see Pricing for Google Cloud's operations suite.
What's next
For information about notification latency and how the choices for the parameters of an alerting policy affect when notifications are sent, see Behavior of metric-based alerting policies.
For a list of metric-based policy examples, see Summary of example alerting policies.
For information about how to monitor the number of trace spans or logs that are ingested, or how to be notified when specific content is included in a log entry, see the following: