Introduction to alerting

Alerting gives you timely awareness of problems in your cloud applications so that you can resolve them quickly.

To create an alerting policy, you must describe the circumstances under which you want to be alerted and how you want to be notified. This page provides an overview of alerting policies and the concepts behind them.

For a practical introduction to alerting, try one of these quickstarts:

For an alerting policy that monitors usage and alerts you when you approach the threshold for billing, see Alerting on monthly log ingestion and Alerting on monthly trace span ingestion.

How does alerting work?

You can create and manage alerting policies with the Google Cloud Console, the Cloud Monitoring API, or the Cloud SDK.

Each alerting policy specifies the following:

  • Conditions that identify when a resource or a group of resources is in a state that requires you to take action. The conditions for an alerting policy are continuously monitored. You cannot configure the conditions to be monitored only for certain time periods.

  • Notifications that are sent through email, SMS, or other channels to let your support team know when the conditions have been met. Configuring notifications is optional. For information on the available notification channels, see Notification options.

  • Documentation that can be included in some types of notifications to help your support team resolve the issue. Configuring documentation is optional.

When the conditions of an alerting policy are met, Cloud Monitoring creates and displays an incident in the Google Cloud Console. If you set up notifications, Cloud Monitoring also sends notifications to people or third-party notification services. Responders can acknowledge receipt of the notification, but the incident remains open until the conditions that triggered the incident are no longer true.

For information on viewing and managing incidents by using the Google Cloud Console, see Incidents and events.

Example

You deploy a web application onto a Compute Engine VM instance that's running a LAMP stack. While you know that HTTP response latency might fluctuate as normal demand rises and falls, if your users start to experience high latency for a significant period of time, you want to take action.

To be notified when your users experience high latency, you create the following alerting policy:

If HTTP response latency is higher than two seconds,
and if this condition lasts longer than five minutes,
open an incident and send email to your support team.
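As a rough illustration, the policy above could be written as the kind of JSON body that the Cloud Monitoring API accepts for an alerting policy. The sketch below builds that body as a Python dictionary with one condition, one notification channel, and a documentation note. The project ID, the metric type custom.googleapis.com/http/response_latency, the notification channel ID, and the documentation text are hypothetical placeholders, not values from this page; substitute the metric your application actually reports.

```python
import json

# Hypothetical placeholders: replace with your own project, metric, and channel.
PROJECT_ID = "my-project"
LATENCY_METRIC = "custom.googleapis.com/http/response_latency"  # assumed to be in seconds
EMAIL_CHANNEL = f"projects/{PROJECT_ID}/notificationChannels/1234567890"

latency_policy = {
    "displayName": "High HTTP response latency",
    # With a single condition the combiner has no practical effect.
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "Latency above 2 s for 5 min",
            "conditionThreshold": {
                "filter": (
                    f'metric.type="{LATENCY_METRIC}" '
                    'AND resource.type="gce_instance"'
                ),
                "comparison": "COMPARISON_GT",  # "is higher than"
                "thresholdValue": 2.0,           # two seconds
                "duration": "300s",              # five minutes
            },
        }
    ],
    # Optional: where notifications are sent.
    "notificationChannels": [EMAIL_CHANNEL],
    # Optional: text included in some types of notifications (placeholder content).
    "documentation": {
        "content": "Check recent deployments and database load.",
        "mimeType": "text/markdown",
    },
}

print(json.dumps(latency_policy, indent=2))
```

The dictionary mirrors the three parts of an alerting policy described earlier: conditions, notifications, and documentation.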

Your web app turns out to be more popular than you expected and the response latency grows beyond two seconds. Here's how your alerting policy responds:

  1. Cloud Monitoring opens an incident and sends email after five consecutive minutes of HTTP latency higher than two seconds.

  2. The support team receives the email, signs into the Google Cloud Console, and acknowledges receipt of the notification.

  3. Following the documentation in the notification email, the team is able to address the cause of the latency. Within a few minutes HTTP responses drop back below two seconds.

  4. As soon as Cloud Monitoring measures HTTP latency below two seconds, the policy's condition is no longer true (even a single measurement of lower latency breaks the "consecutive five minutes" requirement).

    Cloud Monitoring closes the incident and resets the five-minute timer. If latency rises above two seconds again and stays there for five consecutive minutes, the policy opens a new incident.
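The sequence above can be sketched as a small simulation. It assumes one latency measurement per minute, which simplifies how Cloud Monitoring actually samples and aligns time series; the function name and the sample data are illustrative only.

```python
from typing import List, Tuple


def simulate_incidents(
    latencies_s: List[float],
    threshold_s: float = 2.0,
    duration_min: int = 5,
) -> List[Tuple[int, str]]:
    """Return (minute_index, event) pairs, where event is "open" or "close".

    An incident opens once latency has exceeded the threshold for
    `duration_min` consecutive measurements, and closes on the first
    measurement that is not above the threshold.
    """
    events = []
    consecutive = 0        # consecutive minutes above the threshold
    incident_open = False
    for minute, latency in enumerate(latencies_s):
        if latency > threshold_s:
            consecutive += 1
            if not incident_open and consecutive >= duration_min:
                incident_open = True
                events.append((minute, "open"))
        else:
            # A single lower measurement resets the five-minute timer.
            consecutive = 0
            if incident_open:
                incident_open = False
                events.append((minute, "close"))
    return events


# Four high minutes, a dip, then six high minutes and a recovery.
samples = [1.0, 2.5, 2.5, 2.5, 2.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 1.0]
print(simulate_incidents(samples))  # [(10, 'open'), (12, 'close')]
```

The first run of four high measurements never opens an incident because the dip at minute 5 resets the counter; the incident opens only at minute 10, after five consecutive high measurements.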

Types of alerting policies

Cloud Monitoring enables you to create different types of policies. For example, you can create an alerting policy that triggers if a metric is absent or if the value of a metric exceeds a threshold.

The Google Cloud Console lists all alerting policies associated with a Google Cloud project, even those created by using the API or the Cloud SDK. However, you must use the Cloud Monitoring API or the Cloud SDK to create, view, or modify a ratio-based alerting policy.

For information about different types of policies and example policies, see Types of alerting policies.

For information about variables that can impact alerting, see Alerting behavior.

Authorization

This section describes the roles or permissions needed to create an alerting policy. For detailed information about Identity and Access Management (IAM) for Cloud Monitoring, see Access control.

Each IAM role has an ID and a name. Role IDs have the form roles/monitoring.editor and are passed as arguments to the gcloud command-line tool when configuring access control. For more information, see Granting, changing, and revoking access. Role names, such as Monitoring Editor, are displayed by the Cloud Console.

Required Cloud Console roles

To create an alerting policy, your IAM role name for the Google Cloud project must be one of the following:

  • Monitoring Editor
  • Monitoring Admin
  • Project Owner

To view a list of roles and their associated permissions, see Roles.

Required API permissions

To use the Cloud Monitoring API to create an alerting policy, your IAM role ID for the Google Cloud project must be one of the following:

  • roles/monitoring.alertPolicyEditor: This role ID grants the minimal permissions that are needed to create an alerting policy. For more details on this role, see Predefined alerting roles.
  • roles/monitoring.editor
  • roles/monitoring.admin
  • roles/owner

To identify the permission required for a specific Cloud Monitoring API method, see Cloud Monitoring API permissions. To view a list of roles and their associated permissions, see Roles.

Determining your role

To determine your role for a project by using the Cloud Console, do the following:

  1. Open the Cloud Console and select the Google Cloud project:

    Go to Cloud Console

  2. To view your role, click IAM & admin. Your role is on the same line as your username.
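If you prefer to check roles programmatically, one option is to inspect the project's IAM policy in JSON form, for example the output of gcloud projects get-iam-policy PROJECT_ID --format=json. The following is a minimal sketch under that assumption: the helper names, the member addresses, and the sample policy are hypothetical. It looks only at direct project-level bindings and ignores custom roles, group membership, inherited bindings, and IAM conditions.

```python
from typing import Set

# Role IDs that are sufficient to create an alerting policy.
SUFFICIENT_ROLES = {
    "roles/monitoring.alertPolicyEditor",
    "roles/monitoring.editor",
    "roles/monitoring.admin",
    "roles/owner",
}


def roles_for_member(iam_policy: dict, member: str) -> Set[str]:
    """Collect the roles bound directly to `member` in an IAM policy dict."""
    return {
        binding["role"]
        for binding in iam_policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def can_create_alerting_policy(iam_policy: dict, member: str) -> bool:
    """True if the member holds at least one of the sufficient roles."""
    return bool(roles_for_member(iam_policy, member) & SUFFICIENT_ROLES)


# Hypothetical policy, shaped like the JSON that describes IAM bindings.
sample_policy = {
    "bindings": [
        {"role": "roles/monitoring.viewer", "members": ["user:ana@example.com"]},
        {"role": "roles/monitoring.editor", "members": ["user:raj@example.com"]},
    ]
}

print(can_create_alerting_policy(sample_policy, "user:ana@example.com"))  # False
print(can_create_alerting_policy(sample_policy, "user:raj@example.com"))  # True
```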

To determine your organization-level permissions, contact your organization's administrator.

Pricing and limits

There are no costs associated with using alerting policies or uptime checks, but the following limits apply:

Category                                             Value
Uptime checks per Workspace                          100*
Alerting policies per Workspace                      500
Conditions per alerting policy                       6
Notification channels per alerting policy            16
Notification channels per Workspace                  4000
Simultaneously open incidents per alerting policy    5000
Maximum duration for a metric-absence condition      1 day
Maximum duration for a metric-threshold condition    23 hours 30 minutes
*This limit applies to the number of uptime-check configurations. Each uptime-check configuration includes the time interval between testing the status of the specified resource. See Managing uptime checks for more information.
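The per-policy limits can be checked before a policy is submitted. The sketch below copies the relevant values from the table above and applies them to a policy dictionary shaped like an API request body. The function names and sample policies are hypothetical, and the check assumes durations are written as a number of seconds followed by "s" (for example, "300s").

```python
from typing import List

# Per-policy limits, copied from the table on this page.
MAX_CONDITIONS = 6
MAX_CHANNELS = 16
MAX_ABSENCE_S = 24 * 3600               # 1 day
MAX_THRESHOLD_S = 23 * 3600 + 30 * 60   # 23 hours 30 minutes


def _seconds(duration: str) -> float:
    """Parse a duration string such as "300s" into seconds."""
    return float(duration.rstrip("s"))


def check_policy_limits(policy: dict) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    conditions = policy.get("conditions", [])
    channels = policy.get("notificationChannels", [])
    if len(conditions) > MAX_CONDITIONS:
        problems.append(f"{len(conditions)} conditions (limit {MAX_CONDITIONS})")
    if len(channels) > MAX_CHANNELS:
        problems.append(f"{len(channels)} notification channels (limit {MAX_CHANNELS})")
    for cond in conditions:
        name = cond.get("displayName", "<unnamed>")
        # Each condition type has its own maximum duration.
        for key, limit in (
            ("conditionThreshold", MAX_THRESHOLD_S),
            ("conditionAbsent", MAX_ABSENCE_S),
        ):
            if key in cond:
                duration = _seconds(cond[key].get("duration", "0s"))
                if duration > limit:
                    problems.append(f"{name}: duration {duration:.0f}s exceeds {limit}s")
    return problems


ok_policy = {
    "conditions": [
        {"displayName": "Latency above 2 s", "conditionThreshold": {"duration": "300s"}}
    ],
    "notificationChannels": ["projects/my-project/notificationChannels/1"],
}
too_long = {
    "conditions": [
        {"displayName": "Metric missing", "conditionAbsent": {"duration": "172800s"}}
    ]
}

print(check_policy_limits(ok_policy))  # []
print(check_policy_limits(too_long))
```

The Workspace-wide limits, such as the total number of policies or notification channels, depend on what already exists in the Workspace, so this local check does not cover them.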

For full pricing information, see Pricing for Google Cloud's operations suite.

What's next