Introduction to alerting

Alerting gives you timely awareness of problems in your cloud applications so that you can resolve them quickly.

In Cloud Monitoring, an alerting policy describes the circumstances under which you want to be alerted and how you want to be notified. This page provides an overview of alerting policies.

Alerting policies that are used to track metric data collected by Cloud Monitoring are called metric-based alerting policies. Most of the Cloud Monitoring documentation about alerting policies assumes that you are using metric-based alerting policies. To learn how to set up a metric-based alerting policy, try the Quickstart for Compute Engine.

You can also create log-based alerting policies, which notify you when a particular message appears in your logs. These policies are not based on metrics. This content does not apply to log-based alerting policies. For information about log-based alerting policies, see Monitoring your logs.

How alerting works

Each alerting policy specifies the following:

  • Conditions that describe when a resource, or a group of resources, is in a state that requires you to take action. An alerting policy must contain at least one condition, and it can contain multiple conditions.

    For example, you might configure a condition as follows:

    The HTTP response latency is higher than two seconds for at least five minutes.
    

    In this example, the condition monitors the metric HTTP response latency and specifies when the values of the metric require you to take action.

  • Notification channels that describe who is to be notified when action is required. You can include multiple notification channels in an alerting policy. Cloud Monitoring supports common notification channels as well as Cloud Mobile App and Pub/Sub. For a complete list of supported channels and information about how to configure these channels, see Notification options.

    For example, you can configure an alerting policy to email my-support-team@example.com and to post a Slack message to the channel #my-support-team.

  • Documentation that you want included in a notification. The documentation field supports plain text, markdown, and variables.

    For example, you could include in your alerting policy the following documentation:

    ## HTTP latency responses
    
    This alert originated from the project ${project}, using
    the variable $${project}.
    

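Taken together, the three parts described above map onto fields of the AlertPolicy resource in the Cloud Monitoring API. The following Python sketch shows what such a policy might look like as a JSON body; the metric filter and the notification-channel name are illustrative placeholders, not values from this page:

```python
import json

# Sketch of an AlertPolicy body combining the three parts described above:
# a condition, notification channels, and documentation.
# The metric filter and channel resource name below are placeholders.
alert_policy = {
    "displayName": "HTTP response latency",
    "combiner": "OR",  # how multiple conditions are combined
    "conditions": [
        {
            "displayName": "Latency above two seconds for five minutes",
            "conditionThreshold": {
                # Placeholder filter; a real policy names a specific metric type.
                "filter": 'metric.type = "appengine.googleapis.com/http/server/response_latencies"',
                "comparison": "COMPARISON_GT",
                "thresholdValue": 2000,   # milliseconds; two seconds
                "duration": "300s",       # five minutes
            },
        }
    ],
    # Placeholder; real values are notification-channel resource names.
    "notificationChannels": [
        "projects/my-project/notificationChannels/CHANNEL_ID"
    ],
    "documentation": {
        "content": "## HTTP latency responses\n\n"
                   "This alert originated from the project ${project}.",
        "mimeType": "text/markdown",
    },
}

print(json.dumps(alert_policy, indent=2))
```

This mirrors the REST shape of the AlertPolicy resource, but it is a sketch for orientation rather than a policy taken from this page.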
After a metric-based alerting policy is configured, Monitoring continuously monitors the conditions of that policy. You can't configure the conditions to be monitored only for certain time periods. When the conditions of that policy are met, that is, when the state of resources requires that you take action, Monitoring creates an incident and sends a notification about the incident creation. This notification includes summary information about the incident, a link to the Policy details page so that you can investigate the incident, and any documentation that you specified.

If an incident is open and Monitoring determines that the conditions of the metric-based policy are no longer met, then Monitoring automatically closes the incident and sends a notification about the closure.

Example

You deploy a web application onto a Compute Engine virtual machine (VM) instance. You know that the HTTP response latency might fluctuate as normal demand rises and falls, but if your users start to experience high latency for a significant period of time, you want to be notified so that your support team can take action.

To be notified when your users experience high latency, you create the following alerting policy:

  If the HTTP response latency is higher than two seconds for at least five minutes,
  then open an incident and send an email to your support team.

In this alerting policy, the condition is monitoring the HTTP response latency. If this latency is higher than two seconds continuously for five minutes, then the condition is met and an incident is created. A transient spike in latency doesn't cause the condition to be met or an incident to be created.

Your web app turns out to be very popular, and the response latency grows beyond two seconds. Here's how your alerting policy responds:

  1. Monitoring starts a five-minute timer when it receives an HTTP latency measurement higher than two seconds.

  2. If each latency measurement received during the next five minutes is higher than two seconds, then the timer expires. When the timer expires, Monitoring marks the condition as met, opens an incident, and sends an email to your support team.

  3. Your support team receives the email, signs into the Cloud Console, and acknowledges receipt of the notification.

  4. Following the documentation in the notification email, your support team is able to address the cause of the latency. Within a few minutes, the HTTP response latency drops to below two seconds.

  5. When Monitoring receives an HTTP latency measurement below two seconds, it closes the incident and sends a notification to your support team that the incident is closed.

After the incident is closed, if the HTTP response latency rises higher than two seconds and stays higher than that threshold continuously for five minutes, then Monitoring opens a new incident and sends a notification email.
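The timer behavior in the steps above can be sketched as a small state machine. The following is an illustration of the logic only, not Monitoring's implementation; timestamps are in seconds, and the threshold and duration match the example:

```python
# Toy state machine illustrating the example above: an incident opens when
# every measurement in a continuous five-minute run exceeds the threshold,
# and closes on the first measurement back below it.
THRESHOLD_MS = 2000      # two seconds
DURATION_S = 300         # five minutes

def run_policy(measurements):
    """measurements: list of (timestamp_seconds, latency_ms).
    Returns events such as ('open', t) and ('close', t)."""
    events = []
    breach_started = None    # when the current run of high latencies began
    incident_open = False
    for t, latency in measurements:
        if latency > THRESHOLD_MS:
            if breach_started is None:
                breach_started = t           # start the five-minute timer
            if not incident_open and t - breach_started >= DURATION_S:
                incident_open = True         # timer expired: open an incident
                events.append(("open", t))
        else:
            breach_started = None            # a low reading resets the timer
            if incident_open:
                incident_open = False        # latency recovered: close it
                events.append(("close", t))
    return events

# A transient spike (one high reading) causes no incident; a sustained
# breach opens one, and the first low reading afterwards closes it.
spike = [(0, 2500), (60, 1200)]
sustained = [(t, 2500) for t in range(0, 360, 60)] + [(360, 1500)]
print(run_policy(spike))      # → []
print(run_policy(sustained))  # → [('open', 300), ('close', 360)]
```

Note how the transient spike in the first series resets the timer without ever opening an incident, matching the behavior described for the example policy.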

How to add an alerting policy

You can add a metric-based alerting policy to your Google Cloud project by using the Google Cloud Console, the Cloud Monitoring API, or the Cloud SDK:

  • If you use the Cloud Console, then you can enable a recommended alert or you can create an alert by starting from the Alerts page of Cloud Monitoring.

    Recommended alerts are available for some Google Cloud products. These alerts require minimal configuration, such as adding notification channels. For example, if you are viewing the Pub/Sub Lite Topics page, then you can enable an alert to notify you if you're reaching a quota limit. Similarly, if you're viewing the VM Instances page from within Monitoring, then you can enable recommended alerting policies that monitor the memory utilization and network latency of those instances.

    For information about how to create an alerting policy when starting on the Alerts page of Cloud Monitoring, see Creating alerting policies by using the Cloud Console.

  • If you use the Cloud Monitoring API directly or if you use the Cloud SDK, then you can create, view, and modify alerting policies. If you want the condition of an alerting policy to compute the ratio of two metrics and then to compare that ratio to a threshold, then you must create that policy by using the Cloud Monitoring API or Cloud SDK. For an example of this type of policy, see Metric ratio.

    For more information about using the Cloud Monitoring API and the Cloud SDK, see Creating alerting policies by using the Cloud Monitoring API or Cloud SDK.

Cloud Monitoring supports an expressive, text-based language that can be used with the Google Cloud Console and with the Cloud Monitoring API. For information about using this language with alerting, see Creating alerting policies by using Monitoring Query Language (MQL).
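As a sketch of how an MQL-based condition might be expressed through the API, the following Python writes a policy file whose condition uses a conditionMonitoringQueryLanguage block. The MQL query, the metric, the file name, and the gcloud invocation are illustrative assumptions, not values from this page:

```python
import json

# Hypothetical alerting policy whose condition is written in MQL.
# The query below is an illustrative sketch, not a query from this page.
policy = {
    "displayName": "CPU utilization (MQL)",
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "CPU above 90% for ten minutes",
            "conditionMonitoringQueryLanguage": {
                "query": (
                    "fetch gce_instance\n"
                    "| metric 'compute.googleapis.com/instance/cpu/utilization'\n"
                    "| group_by 5m, [value_utilization_mean: mean(value.utilization)]\n"
                    "| every 1m\n"
                    "| condition val() > 0.9\n"
                ),
                "duration": "600s",
            },
        }
    ],
}

# Written to a file, this could be passed to a command such as:
#   gcloud alpha monitoring policies create --policy-from-file=policy.json
with open("policy.json", "w") as f:
    json.dump(policy, f, indent=2)
```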

You can add a log-based alerting policy to your Google Cloud project by using the Logs Explorer in Cloud Logging or by using the Monitoring API. This content does not apply to log-based alerting policies. For information about log-based alerting policies, see Monitoring your logs.

How to manage alerting policies

For information about how to view a list of your project's metric-based alerting policies, and how to modify those policies, see the following:

For information about managing log-based alerting policies, see Using log-based alerts.

Authorization required to create alerting policies

This section describes the roles or permissions needed to create an alerting policy. For detailed information about Identity and Access Management (IAM) for Cloud Monitoring, see Access control.

Each IAM role has an ID and a name. Role IDs have the form roles/monitoring.editor and are passed as arguments to the gcloud command-line tool when configuring access control. For more information, see Granting, changing, and revoking access. Role names, such as Monitoring Editor, are displayed by the Cloud Console.

Required Cloud Console roles

To create an alerting policy, your IAM role name for the Google Cloud project must be one of the following:

  • Monitoring Editor
  • Monitoring Admin
  • Project Owner

To view a list of roles and their associated permissions, see Roles.

Required API permissions

To use the Cloud Monitoring API to create an alerting policy, your IAM role ID for the Google Cloud project must be one of the following:

  • roles/monitoring.alertPolicyEditor: This role ID grants the minimal permissions that are needed to create an alerting policy. For more details on this role, see Predefined alerting roles.
  • roles/monitoring.editor
  • roles/monitoring.admin
  • roles/owner

To identify the permission required for a specific Cloud Monitoring API method, see Cloud Monitoring API permissions. To view a list of roles and their associated permissions, see Roles.

Determining your role

To determine your role for a project by using the Cloud Console, do the following:

  1. Open the Cloud Console and select the Google Cloud project:

    Go to Cloud Console

  2. To view your role, click IAM & admin. Your role is on the same line as your username.

To determine your organization-level permissions, contact your organization's administrator.

Costs associated with alerting policies

There are no costs associated with using alerting policies or uptime checks, but the following limits apply:

| Category | Value | Policy type¹ |
|----------|-------|--------------|
| Alerting policies (sum of metric and log) per metrics scope² | 500 | Metric, Log |
| Conditions per alerting policy | 6 | Metric |
| Maximum time period that a metric-absence condition evaluates³ | 1 day | Metric |
| Maximum time period that a metric-threshold condition evaluates³ | 23 hours 30 minutes | Metric |
| Notification channels per alerting policy | 16 | Metric, Log |
| Maximum rate of notifications | 1 notification every 5 minutes for each log-based alert | Log |
| Maximum number of notifications | 20 notifications a day for each log-based alert | Log |
| Maximum number of simultaneously open incidents per alerting policy | 5,000 | Metric |
| Period after which an incident with no new data is automatically closed | 7 days | Metric |
| Maximum duration of an incident if not manually closed | 7 days | Log |
| Retention of closed incidents | 90 days | Not applicable |
| Retention of open incidents | Indefinite | Not applicable |
| Notification channels per metrics scope | 4,000 | Not applicable |
| Uptime checks per metrics scope⁴ | 100 | Not applicable |
¹ Metric: an alerting policy based on metric data; Log: an alerting policy based on log messages (log-based alerts).
² Apigee and Apigee hybrid are deeply integrated with Cloud Monitoring. The alerting limit for all Apigee subscription levels—Standard, Enterprise, and Enterprise Plus—is the same as for Cloud Monitoring: 500 per metrics scope.
³ The maximum time period that a condition evaluates is the sum of the alignment period and the duration window. For example, if the alignment period is set to 15 hours and the duration window is set to 15 hours, then 30 hours of data is required to evaluate the condition.
⁴ This limit applies to the number of uptime-check configurations. Each uptime-check configuration includes the time interval between tests of the status of the specified resource. For more information, see Managing uptime checks.
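The evaluation-window rule in footnote 3 can be expressed directly as a sum of the two configured values. A minimal sketch, using the footnote's own numbers:

```python
# Footnote 3: the time period that a condition evaluates is the sum of the
# alignment period and the duration window.
def evaluation_window_hours(alignment_hours, duration_hours):
    return alignment_hours + duration_hours

# The example from footnote 3: 15 hours + 15 hours = 30 hours of data.
print(evaluation_window_hours(15, 15))  # → 30
```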

For full pricing information, see Pricing for Google Cloud's operations suite.

What's next