Alerting overview

Alerting gives you timely awareness of problems in your cloud applications so that you can resolve them quickly. In Cloud Monitoring, an alerting policy describes the circumstances under which you want to be alerted and how you want to be notified. This page provides an overview of alerting policies.

Alerting policies that are used to track metric data collected by Cloud Monitoring are called metric-based alerting policies. Most of the Cloud Monitoring documentation about alerting policies assumes that you are using metric-based alerting policies. To learn how to set up a metric-based alerting policy, try the Quickstart for Compute Engine.

You can also create log-based alerting policies, which notify you when a particular message appears in your logs. These policies are not based on metrics. This content does not apply to log-based alerting policies. For information about log-based alerting policies, see Monitoring your logs.

How alerting works

Each alerting policy specifies the following; a sketch that combines these pieces by using the Cloud Monitoring API follows the list:

  • Conditions that describe when a resource, or a group of resources, is in a state that requires you to respond. For example, you might configure a condition as follows:

    The HTTP response latency is higher than two seconds for at least five minutes.
    

    In this example, the condition monitors the HTTP response latency metric, and it triggers when all latency measurements in a five-minute period are higher than two seconds.

    There are three types of conditions:

    • Metric-threshold conditions trigger when the values of a metric are more than, or less than, a threshold for a specific duration window.
    • Metric-absence conditions trigger when there is an absence of measurements for a duration window.
    • Forecast conditions predict the future behavior of the measurements by using previous data. These conditions trigger when there is a prediction that a time series will violate the threshold within a forecast window.

    An alerting policy must have at least one condition; however, you can configure a policy to contain multiple conditions.

  • Notification channels that describe who is to be notified when action is required. You can include multiple notification channels in an alerting policy. Cloud Monitoring supports Cloud Mobile App and Pub/Sub in addition to common notification channels. For a complete list of supported channels and information about how to configure these channels, see Create and manage notification channels.

    For example, you can configure an alerting policy to email my-support-team@example.com and to post a Slack message to the channel #my-support-team.

  • Documentation that you want included in a notification. The documentation field supports plain text, Markdown, and variables.

    For example, you could include in your alerting policy the following documentation:

    ## HTTP latency responses
    
    This alert originated from the project ${project}, using
    the variable $${project}.
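
For illustration, the following sketch shows roughly how these three pieces (a condition, notification channels, and documentation) might be combined when you create a policy through the Cloud Monitoring API with its Python client library. The project ID, the custom latency metric, and the notification channel resource name are placeholders, not values defined on this page.

    # Minimal sketch, assuming the google-cloud-monitoring Python client
    # library. The project, metric, and channel names are hypothetical.
    from google.cloud import monitoring_v3

    client = monitoring_v3.AlertPolicyServiceClient()
    project_name = "projects/my-project-id"  # placeholder project

    policy = monitoring_v3.AlertPolicy(
        display_name="HTTP response latency",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        # Condition: trigger when latency stays above the threshold for the
        # five-minute duration window.
        conditions=[
            monitoring_v3.AlertPolicy.Condition(
                display_name="HTTP response latency > 2 s for 5 min",
                condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                    # Hypothetical custom metric that reports latency in seconds.
                    filter=(
                        'metric.type = "custom.googleapis.com/http/response_latency" '
                        'AND resource.type = "gce_instance"'
                    ),
                    comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                    threshold_value=2.0,
                    duration={"seconds": 300},
                ),
            )
        ],
        # Notification channels: resource names of channels you created earlier.
        notification_channels=[
            "projects/my-project-id/notificationChannels/1234567890"  # placeholder
        ],
        # Documentation included in each notification; Markdown and variables
        # such as ${project} are supported.
        documentation=monitoring_v3.AlertPolicy.Documentation(
            content="## HTTP latency responses\n\nThis alert originated from ${project}.",
            mime_type="text/markdown",
        ),
    )

    created = client.create_alert_policy(name=project_name, alert_policy=policy)
    print(f"Created policy: {created.name}")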
    

After a metric-based alerting policy is configured, Monitoring continuously monitors the conditions of that policy. You can't configure the conditions to be monitored only for certain time periods.

When the conditions of an alerting policy trigger, Monitoring creates an incident and sends a notification about the incident creation. This notification includes summary information about the incident, a link to the Policy details page so that you can investigate the incident, and any documentation that you specified.

If an incident is open and Monitoring determines that the conditions of the metric-based policy are no longer met, then Monitoring automatically closes the incident and sends a notification about the closure.

Example

You deploy a web application onto a Compute Engine virtual machine (VM) instance. While you expect the HTTP response latency to fluctuate, you want your support team to respond when the application has high latency for a significant time period.

To ensure that your support team is notified when your application experiences high latencies, you create the following alerting policy:

  If the HTTP response latency is higher than two seconds for at least five
  minutes, then open an incident and send an email to your support team.

In this alerting policy, the metric-threshold condition is monitoring the HTTP response latency. If this latency is higher than two seconds continuously for five minutes, then the condition triggers and an incident is created. A transient spike in latency doesn't cause the condition to trigger or an incident to be created.
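
In API terms, "higher than two seconds for at least five minutes" corresponds to a comparison, a threshold value, and a five-minute duration window on a metric-threshold condition, and an aggregation controls how the raw measurements are aligned before they're compared. The following Python fragment is a rough sketch of that condition alone, reusing the hypothetical custom latency metric from the earlier sketch:

    # Sketch of the condition only, not a complete policy. The custom metric
    # is a hypothetical stand-in that is assumed to report latency in seconds.
    from google.cloud import monitoring_v3

    condition = monitoring_v3.AlertPolicy.Condition(
        display_name="HTTP response latency > 2 s for 5 min",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            filter=(
                'metric.type = "custom.googleapis.com/http/response_latency" '
                'AND resource.type = "gce_instance"'
            ),
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=2.0,          # two seconds
            duration={"seconds": 300},    # values must stay above 2 s for 5 minutes
            aggregations=[
                monitoring_v3.Aggregation(
                    # Align raw measurements into one value per minute.
                    alignment_period={"seconds": 60},
                    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
                )
            ],
        ),
    )

Because the duration window is 300 seconds, a single aligned value above two seconds isn't enough; every aligned value in the window must exceed the threshold before the condition triggers.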

Your web application turns out to be popular, and the response latency grows beyond two seconds. Here's how your alerting policy responds:

  1. Monitoring starts a five-minute timer when it receives an HTTP latency measurement higher than two seconds.

  2. If every latency measurement received during the next five minutes is higher than two seconds, then the timer expires without being reset. When the timer expires, the condition triggers, and Monitoring opens an incident and sends an email to your support team.

  3. Your support team receives the email, signs in to the Google Cloud console, and acknowledges receipt of the notification.

  4. Following the documentation in the notification email, your support team is able to address the cause of the latency. Within a few minutes, the HTTP response latency drops to less than two seconds.

  5. When Monitoring receives an HTTP latency measurement less than two seconds, it closes the incident and sends a notification to your support team that the incident is closed.

If the latency later rises above two seconds and stays above that threshold for five minutes, then Monitoring opens a new incident and sends a notification.

How to add an alerting policy

You can add a metric-based alerting policy to your Google Cloud project by using the Google Cloud console, the Cloud Monitoring API, or the Google Cloud CLI:

  • When you use the Google Cloud console, you can enable a recommended alert or you can create an alert by starting from the Alerts page of Cloud Monitoring.

    Recommended alerts are available for some Google Cloud products. These alerts require minimal configuration, such as adding notification channels. For example, the Pub/Sub Lite Topics page links to alerts that are configured to notify you when you're reaching a quota limit. Similarly, the VM Instances page from within Monitoring links to alerting policies that are configured to monitor the memory utilization and network latency of those instances.

    For information about how to create an alerting policy, see the following documents:

    Any policy that you create by using the Google Cloud console can also be viewed and modified by using either the Google Cloud console or the Cloud Monitoring API. The Cloud Monitoring API additionally lets you create alerting policies that monitor ratios of metrics; when these policies use Monitoring filters, you can't view or modify them by using the Google Cloud console.

  • When you use the Cloud Monitoring API directly or when you use the Google Cloud CLI, you can create, view, and modify alerting policies.

    For more information, see Create alerting policies by using the Cloud Monitoring API or Google Cloud CLI.

    You can create conditions that monitor a single metric, multiple metrics, or a ratio of metrics. When you use the Cloud Monitoring API, you can specify the ratio by using Monitoring Query Language (MQL) or by using Monitoring filters. For an example of a policy that uses Monitoring filters, see Metric ratio; a rough sketch of a ratio condition also follows this list.
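
As a rough illustration of the ratio case, the metric-threshold condition in the API also accepts a denominator filter and denominator aggregations; the condition then compares the ratio of the aggregated numerator series to the aggregated denominator series against the threshold. In the following Python sketch, both metric types are placeholders:

    # Hedged sketch of a ratio condition built with Monitoring filters. The
    # error-count and request-count metrics below are hypothetical.
    from google.cloud import monitoring_v3

    ratio_condition = monitoring_v3.AlertPolicy.Condition(
        display_name="HTTP error ratio above 5%",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            # Numerator: hypothetical error-count metric.
            filter=(
                'metric.type = "custom.googleapis.com/http/error_count" '
                'AND resource.type = "gce_instance"'
            ),
            aggregations=[
                monitoring_v3.Aggregation(
                    alignment_period={"seconds": 300},
                    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_RATE,
                    cross_series_reducer=monitoring_v3.Aggregation.Reducer.REDUCE_SUM,
                )
            ],
            # Denominator: hypothetical total-request-count metric.
            denominator_filter=(
                'metric.type = "custom.googleapis.com/http/request_count" '
                'AND resource.type = "gce_instance"'
            ),
            denominator_aggregations=[
                monitoring_v3.Aggregation(
                    alignment_period={"seconds": 300},
                    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_RATE,
                    cross_series_reducer=monitoring_v3.Aggregation.Reducer.REDUCE_SUM,
                )
            ],
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=0.05,       # ratio of errors to requests
            duration={"seconds": 300},
        ),
    )

As noted earlier, a policy that uses a Monitoring-filter ratio like this one can't be viewed or modified in the Google Cloud console.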

Cloud Monitoring supports Monitoring Query Language (MQL), an expressive, text-based language that you can use with the Google Cloud console and with the Cloud Monitoring API. For information about using this language with alerting, see Creating alerting policies by using Monitoring Query Language (MQL).
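
To give a sense of how an MQL-defined condition fits into the same policy objects, the following Python sketch uses the condition_monitoring_query_language field of a condition. The custom latency metric is again hypothetical, and its unit is assumed to be seconds.

    # Hedged sketch of a condition defined with Monitoring Query Language (MQL).
    # The metric is a hypothetical custom latency metric measured in seconds.
    from google.cloud import monitoring_v3

    mql_condition = monitoring_v3.AlertPolicy.Condition(
        display_name="HTTP response latency > 2 s (MQL)",
        condition_monitoring_query_language=(
            monitoring_v3.AlertPolicy.Condition.MonitoringQueryLanguageCondition(
                query="""
                    fetch gce_instance
                    | metric 'custom.googleapis.com/http/response_latency'
                    | group_by 1m, [latency_mean: mean(val())]
                    | every 1m
                    | condition val() > 2 's'
                """,
                duration={"seconds": 300},  # five-minute duration window
            )
        ),
    )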

You can add a log-based alerting policy to your Google Cloud project by using the Logs Explorer in Cloud Logging or by using the Monitoring API. This content does not apply to log-based alerting policies. For information about log-based alerting policies, see Monitoring your logs.

Costs associated with alerting policies

There are no costs associated with using alerting policies. However, uptime checks, which are included in the following limits, are priced separately; for information about the pricing of uptime checks, see the Cloud Monitoring pricing summary.

The following limits apply to your use of alerting policies and uptime checks:

Category | Value | Policy type¹
--- | --- | ---
Alerting policies (sum of metric and log) per metrics scope² | 500 | Metric, Log
Conditions per alerting policy | 6 | Metric
Maximum time period that a metric-absence condition evaluates³ | 1 day | Metric
Maximum time period that a metric-threshold condition evaluates³ | 23 hours 30 minutes | Metric
Maximum length of the filter used in a metric-threshold condition | 2,048 Unicode characters | Metric
Maximum number of time series monitored by a forecast condition | 64 | Metric
Minimum forecast window | 1 hour (3,600 seconds) | Metric
Maximum forecast window | 7 days (604,800 seconds) | Metric
Notification channels per alerting policy | 16 | Metric, Log
Maximum rate of notifications | 1 notification every 5 minutes for each log-based alert | Log
Maximum number of notifications | 20 notifications a day for each log-based alert | Log
Maximum number of simultaneously open incidents per alerting policy | 1,000 | Metric
Period after which an incident with no new data is automatically closed | 7 days | Metric
Maximum duration of an incident if not manually closed | 7 days | Log
Retention of closed incidents | 13 months | Not applicable
Retention of open incidents | Indefinite | Not applicable
Notification channels per metrics scope | 4,000 | Not applicable
Maximum number of alerting policies per snooze | 16 | Metric, Log
Retention of a snooze | 13 months | Not applicable
Uptime checks per metrics scope⁴ | 100 | Not applicable
Maximum number of ICMP pings per public uptime check | 3 | Not applicable
¹ Metric: an alerting policy based on metric data; Log: an alerting policy based on log messages (log-based alerts).
² Apigee and Apigee hybrid are deeply integrated with Cloud Monitoring. The alerting limit for all Apigee subscription levels (Standard, Enterprise, and Enterprise Plus) is the same as for Cloud Monitoring: 500 per metrics scope.
³ The maximum time period that a condition evaluates is the sum of the alignment period and the duration window. For example, if the alignment period is set to 15 hours and the duration window is set to 15 hours, then 30 hours of data is required to evaluate the condition.
⁴ This limit applies to the number of uptime-check configurations. Each uptime-check configuration includes the time interval between tests of the status of the specified resource. For more information, see Managing uptime checks.

For full pricing information, see Pricing for Google Cloud's operations suite.

What's next