An alerting policy is represented in the Cloud Monitoring API
by an AlertPolicy
object,
which describes a set of conditions indicating a potentially
unhealthy status in your system.
This document describes the following:
- How the Monitoring API represents alerting policies.
- The types of conditions the Monitoring API provides for alerting policies.
- How to create an alerting policy by using the Google Cloud CLI or client libraries.
Structure of an alerting policy
The AlertPolicy
structure defines the components of an
alerting policy. When you create a policy, either by using the Google Cloud console
or the Monitoring API, you specify values for the following
AlertPolicy
fields:
- displayName: A descriptive label for the policy.
- documentation: Any information provided to help responders. This field is optional. The documentation object includes the following:
  - content: User-defined text that appears in the body of the notification.
  - subject: The subject line of the notification. Subject lines are limited to 255 characters.
- userLabels: Any user-defined labels attached to the policy. For information about using labels with alerting, see Annotate alerts with labels.
- conditions[]: An array of Condition structures.
- combiner: A logical operator that determines how to handle multiple conditions.
- notificationChannels[]: An array of resource names, each identifying a NotificationChannel.
- alertStrategy: Specifies how quickly Monitoring closes incidents when data stops arriving. This object also specifies whether repeated notifications are enabled for metric-based alerts, and the interval between those notifications. For more information, see Send repeated notifications.
There are other fields you might use, depending on the conditions you create.
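As a rough illustration of how these fields fit together, the following Python sketch builds the JSON form of a policy as a dictionary. The display names, filter, threshold, label, and notification-channel ID are assumptions chosen for the example, not values from this document.

```python
import json

# Illustrative values only: the filter, threshold, label, and CHANNEL_ID
# placeholder are assumptions made for this sketch.
policy = {
    "displayName": "High CPU utilization (example)",
    "documentation": {
        "content": "CPU utilization exceeded the threshold. Check recent deployments.",
        "mimeType": "text/markdown",
        "subject": "High CPU on a VM instance",  # limited to 255 characters
    },
    "userLabels": {"team": "platform"},
    "conditions": [
        {
            "displayName": "CPU utilization above 80%",
            "conditionThreshold": {
                "filter": (
                    'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                    'AND resource.type="gce_instance"'
                ),
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0.8,
                "duration": "300s",
                "aggregations": [
                    {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                ],
            },
        }
    ],
    # How to combine multiple conditions; with one condition the choice is moot.
    "combiner": "OR",
    "notificationChannels": [
        "projects/a-gcp-project/notificationChannels/CHANNEL_ID"
    ],
    "alertStrategy": {"autoClose": "1800s"},
}

print(json.dumps(policy, indent=2))
```

A dictionary like this can be written to a JSON file and used as input when creating a policy.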
By default, alerting policies created by using the Monitoring API send notifications when a condition for triggering the policy is met and when the condition stops being met. You can't change this behavior by using the Monitoring API, but you can turn off notifications about incident closure by editing the policy in the Google Cloud console. To turn off incident-closure notifications, clear the Notify on incident closure option in the notifications section and save the edited policy.
When you create or modify the alerting policy, Monitoring sets
other fields as well, including the name
field. The value of the name
field is the resource name for the alerting policy, which identifies the
policy. The resource name has the following form:
projects/PROJECT_ID/alertPolicies/POLICY_ID
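If your code needs to build or take apart these resource names, a small helper along the following lines might be enough. The function names are hypothetical, and the sketch only assumes the form shown above.

```python
import re

# Matches projects/PROJECT_ID/alertPolicies/POLICY_ID and nothing else.
_POLICY_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/alertPolicies/(?P<policy>[^/]+)$"
)


def policy_name(project_id: str, policy_id: str) -> str:
    """Build the resource name of an alerting policy."""
    return f"projects/{project_id}/alertPolicies/{policy_id}"


def parse_policy_name(name: str) -> tuple[str, str]:
    """Split a resource name into (project_id, policy_id)."""
    match = _POLICY_NAME.match(name)
    if match is None:
        raise ValueError(f"not an alerting policy resource name: {name!r}")
    return match["project"], match["policy"]


name = policy_name("a-gcp-project", "12669073143329903307")
print(name)
print(parse_policy_name(name))
```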
Conditions are the most variable part of an alerting policy.
Types of conditions in the API
The Cloud Monitoring API supports a variety of condition types in the
Condition
structure. There are multiple condition
types for metric-based alerting policies, and one for log-based alerting
policies. The following sections describe the available condition types.
Conditions for metric-based alerting policies
To create an alerting policy that monitors metric data, including log-based metrics, you can use the following condition types:
Filter-based metric conditions
The MetricAbsence
and MetricThreshold
conditions use
Monitoring filters to select the time-series data
to monitor. Other fields in the condition structure specify how to filter,
group, and aggregate the data. For more information on these concepts, see
Filtering and aggregation: manipulating time series.
If you use the MetricAbsence condition type, then you can create a condition
that triggers only when all of the time series are absent. To do so, use the
aggregations field to combine the time series into a single time series; see
the MetricAbsence reference in the API documentation.
A metric-absence alerting policy requires that some data has been written previously; for more information, see Create metric-absence alerting policies.
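The following sketch shows one way a metric-absence condition that reduces all matching time series to a single series might look in JSON form, written as a Python dictionary. The filter, duration, and alignment period are assumptions for illustration.

```python
import json

# Illustrative metric-absence condition. Because crossSeriesReducer is set and
# no groupByFields are listed, all matching time series are combined into one,
# so the condition is met only when every time series stops reporting.
absence_condition = {
    "displayName": "No CPU data from any instance (example)",
    "conditionAbsent": {
        "filter": (
            'metric.type="compute.googleapis.com/instance/cpu/utilization" '
            'AND resource.type="gce_instance"'
        ),
        "duration": "300s",
        "aggregations": [
            {
                "alignmentPeriod": "60s",
                "perSeriesAligner": "ALIGN_MEAN",
                "crossSeriesReducer": "REDUCE_SUM",
            }
        ],
    },
}

print(json.dumps(absence_condition, indent=2))
```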
If you want to create an alert based on a forecast, then use the
MetricThreshold
condition type and set the forecastOptions
field. When
this field is omitted, the measured data is compared to the threshold.
When this field is set, the predicted data is compared to the threshold instead.
For more information, see
Create forecasted metric-value alerting policies.
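To make the difference concrete, the following sketch builds the same threshold condition with and without forecastOptions. The helper name, the disk-usage filter, the threshold, and the one-day horizon are assumptions for the example.

```python
import json
from typing import Optional


def threshold_condition(forecast_horizon_s: Optional[int] = None) -> dict:
    """Return an illustrative MetricThreshold condition in JSON form.

    Without a horizon, measured values are compared to the threshold.
    With a horizon, forecastOptions is set and predicted values are compared.
    """
    condition = {
        "displayName": "Disk usage above 90% (example)",
        "conditionThreshold": {
            "filter": (
                'metric.type="agent.googleapis.com/disk/percent_used" '
                'AND resource.type="gce_instance"'
            ),
            "comparison": "COMPARISON_GT",
            "thresholdValue": 90,
            "duration": "0s",
            "aggregations": [
                {"alignmentPeriod": "300s", "perSeriesAligner": "ALIGN_MEAN"}
            ],
        },
    }
    if forecast_horizon_s is not None:
        condition["conditionThreshold"]["forecastOptions"] = {
            "forecastHorizon": f"{forecast_horizon_s}s"
        }
    return condition


print(json.dumps(threshold_condition(forecast_horizon_s=86400), indent=2))
```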
MQL-based metric conditions
The MonitoringQueryLanguageCondition
condition uses Monitoring Query Language (MQL) to
select and manipulate the time-series data to monitor. You can create alerting
policies that compare values against a threshold or test for the absence
of values with this condition type.
If you use a MonitoringQueryLanguageCondition
condition, it must be the only
condition in your alerting policy. For more information, see
Alerting policies with MQL.
PromQL-based metric conditions
The PrometheusQueryLanguageCondition
condition uses Prometheus Query Language (PromQL)
queries to select and manipulate time-series data to monitor. You can create
a simple or complex query and use querying structures such as dynamic
thresholds, ratios, metric comparisons, and more.
If you use a PrometheusQueryLanguageCondition
condition, it must be the only
condition in your alerting policy. For more information, see
Alerting policies with PromQL.
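As a minimal sketch, a policy with a single PromQL condition might look like the following in JSON form. The query, the metric name in it, and the timing values are assumptions for illustration; check the PromQL documentation for the metric names that apply to your data.

```python
import json

# Illustrative policy with exactly one PromQL-based condition.
promql_policy = {
    "displayName": "High CPU (PromQL example)",
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "5-minute average CPU above 80%",
            "conditionPrometheusQueryLanguage": {
                # Assumed query; adjust the metric name and threshold as needed.
                "query": (
                    "avg_over_time("
                    "compute_googleapis_com:instance_cpu_utilization[5m]) > 0.8"
                ),
                "duration": "120s",
                "evaluationInterval": "60s",
            },
        }
    ],
}

print(json.dumps(promql_policy, indent=2))
```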
Conditions for alerting on ratios
You can create metric-threshold alerting policies to monitor the
ratio of two metrics. You can create these policies by using either
the MetricThreshold
or MonitoringQueryLanguageCondition
condition type.
You can also use MQL directly in the Google Cloud console. You can't create
or manage ratio-based conditions by using the graphical interface for creating
threshold conditions.
We recommend using MQL to create ratio-based alerting policies.
MQL lets you build more powerful and flexible queries than you can
build by using the MetricThreshold
condition type and
Monitoring filters.
For example, with a MonitoringQueryLanguageCondition
condition, you can
compute the ratio of a gauge metric to a delta metric. For examples, see
MQL alerting-policy examples.
If you use the MetricThreshold
condition, the numerator and denominator
of the ratio must have the same MetricKind.
For a list of metrics and their properties, see Metric lists.
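For orientation, a ratio condition built with MetricThreshold sets a numerator filter and a denominator filter, each with its own aggregations. The sketch below uses a hypothetical custom metric, custom.googleapis.com/rpc/count, with a hypothetical status label; because both filters select the same metric type, the numerator and denominator have the same MetricKind.

```python
import json

# Hypothetical custom metric and label, used only for illustration.
METRIC = 'metric.type="custom.googleapis.com/rpc/count" AND resource.type="global"'

# Same aggregation for both sides so the alignment intervals line up.
aggregation = {
    "alignmentPeriod": "300s",
    "perSeriesAligner": "ALIGN_DELTA",
    "crossSeriesReducer": "REDUCE_SUM",
}

ratio_condition = {
    "displayName": "RPC error ratio above 5% (example)",
    "conditionThreshold": {
        # Numerator: RPCs whose status label is not "OK".
        "filter": METRIC + ' AND metric.labels.status!="OK"',
        "aggregations": [aggregation],
        # Denominator: all RPCs.
        "denominatorFilter": METRIC,
        "denominatorAggregations": [aggregation],
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.05,
        "duration": "0s",
    },
}

print(json.dumps(ratio_condition, indent=2))
```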
In general, it is best to compute ratios based on time series collected for a single metric type, by using label values. A ratio computed over two different metric types is subject to anomalies due to different sampling periods and alignment windows.
For example, suppose that you have two different metric types, an RPC total count and an RPC error count, and you want to compute the ratio of error-count RPCs over total RPCs. The unsuccessful RPCs are counted in the time series of both metric types. Therefore, there is a chance that, when you align the time series, an unsuccessful RPC doesn't appear in the same alignment interval for both time series. This difference can happen for several reasons, including the following:
- Because there are two different time series recording the same event, there are two underlying counter values implementing the collection, and they aren't updated atomically.
- The sampling rates might differ. When the time series are aligned to a common period, the counts for a single event might appear in adjacent alignment intervals in the time series for the different metrics.
The difference in the number of values in corresponding alignment intervals can
lead to nonsensical error/total
ratio values like 1/0 or 2/1.
Ratios of larger numbers are less likely to result in nonsensical values. You can get larger numbers by aggregation, either by using an alignment window that is longer than the sampling period, or by grouping data for certain labels. These techniques minimize the effect of small differences in the number of points in a given interval. That is, a two-point disparity is more significant when the expected number of points in an interval is 3 than when the expected number is 300.
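The arithmetic behind that last comparison can be checked with a few lines of Python. The function name is made up for this sketch.

```python
def relative_disparity(expected_points: int, disparity: int = 2) -> float:
    """Fraction of the expected count that a fixed disparity represents."""
    return disparity / expected_points


for expected in (3, 300):
    share = relative_disparity(expected)
    print(f"expected {expected} points: a 2-point disparity is {share:.1%}")
```

With 3 expected points, the disparity is about two thirds of the count; with 300, it is under one percent.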
If you are using built-in metric types, then you might have no choice but to compute ratios across metric types to get the value you need.
If you are designing custom metrics that might count the same thing—like RPCs returning error status—in two different metrics, consider instead a single metric, which includes each count only once. For example, suppose that you are counting RPCs and you want to track the ratio of unsuccessful RPCs to all RPCs. To solve this problem, create a single metric type to count RPCs, and use a label to record the status of the invocation, including the "OK" status. Then each status value, error or "OK", is recorded by updating a single counter for that case.
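The following in-memory sketch illustrates the idea of a single counter keyed by a status label. It does not write to Cloud Monitoring; the names record_rpc and error_ratio are hypothetical.

```python
from collections import Counter
from typing import Optional

# One logical metric: RPC count, broken down by a "status" label value.
rpc_count: Counter = Counter()


def record_rpc(status: str) -> None:
    """Count each RPC exactly once, under its status label."""
    rpc_count[status] += 1


def error_ratio(counts: Counter) -> Optional[float]:
    """Ratio of non-OK RPCs to all RPCs, or None if nothing was counted."""
    total = sum(counts.values())
    if total == 0:
        return None
    errors = total - counts.get("OK", 0)
    return errors / total


for _ in range(97):
    record_rpc("OK")
for _ in range(3):
    record_rpc("UNAVAILABLE")

print(error_ratio(rpc_count))
```

Because every RPC updates exactly one counter, the numerator can never exceed the denominator.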
Condition for log-based alerting policies
To create a log-based alerting policy, which notifies you when a message
matching your filter appears in your log entries, use the
LogMatch
condition type. If you use a LogMatch
condition, it must be the only condition in your alerting policy.
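Because MQL, PromQL, and log-match conditions must each be the only condition in a policy, a local check before calling the API can catch mistakes early. The following sketch assumes the policy is held as a dictionary in its JSON form; the function name is hypothetical.

```python
# JSON field names of the condition types that must stand alone in a policy.
EXCLUSIVE_CONDITION_KEYS = (
    "conditionMonitoringQueryLanguage",
    "conditionPrometheusQueryLanguage",
    "conditionMatchedLog",
)


def check_exclusive_conditions(policy: dict) -> None:
    """Raise ValueError if a stand-alone condition type is mixed with others."""
    conditions = policy.get("conditions", [])
    if len(conditions) <= 1:
        return
    for condition in conditions:
        for key in EXCLUSIVE_CONDITION_KEYS:
            if key in condition:
                raise ValueError(
                    f"{key} must be the only condition in the policy, "
                    f"but the policy has {len(conditions)} conditions"
                )


log_policy = {
    "displayName": "Log match (example)",
    "conditions": [{"conditionMatchedLog": {"filter": "severity>=ERROR"}}],
}
check_exclusive_conditions(log_policy)  # passes silently
```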
Don't try to use the LogMatch
condition type in conjunction with log-based
metrics. Alerting policies that monitor log-based metrics are metric-based
policies. For more information about choosing between alerting policies that
monitor log-based metrics or log entries, see
Monitoring your logs.
The alerting policies used in the examples in the Managing alerting policies document are metric-based alerting policies, although the principles are the same for log-based alerting policies. For information specific to log-based alerting policies, see Create a log-based alert (Monitoring API) in the Cloud Logging documentation.
Before you begin
Before writing code against the API, you should:
- Be familiar with the general concepts and terminology used with alerting policies; see Alerting overview for more information.
- Ensure that the Cloud Monitoring API is enabled for use; see Enabling the API for more information.
- If you plan to use client libraries, then install the libraries for the languages that you want to use; see Client Libraries for details. Currently, API support for alerting is available only for C#, Go, Java, Node.js, and Python.
- If you plan to use the Google Cloud CLI, then install it. However, if you use Cloud Shell, then the Google Cloud CLI is already installed.
  Examples that use the gcloud interface are also provided here. The gcloud examples assume that the current project is already set as the target (gcloud config set project [PROJECT_ID]), so invocations omit the explicit --project flag. The ID of the current project in the examples is a-gcp-project.
- To get the permissions that you need to create and modify alerting policies by using the Cloud Monitoring API, ask your administrator to grant you the Monitoring AlertPolicy Editor (roles/monitoring.alertPolicyEditor) IAM role on your project. For more information about granting roles, see Manage access.
  You might also be able to get the required permissions through custom roles or other predefined roles.
  For detailed information about IAM roles for Monitoring, see Control access with Identity and Access Management.
Design your application to single-thread Cloud Monitoring API calls that modify the state of an alerting policy in a Google Cloud project. For example, single-thread API calls that create, update, or delete an alerting policy.
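One simple way to approximate this within a single process is to route every state-changing call through a shared lock, as in the sketch below. The helper name is hypothetical, and a process-local lock doesn't coordinate separate processes or machines, so treat this as a starting point rather than a complete solution.

```python
import threading
from typing import Any, Callable

# One lock shared by every call that creates, updates, or deletes a policy.
_policy_write_lock = threading.Lock()


def serialized_policy_write(call: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run a state-changing API call while holding the shared lock.

    `call` stands in for any function that modifies alerting policies,
    such as a client library's create, update, or delete method.
    """
    with _policy_write_lock:
        return call(*args, **kwargs)


def fake_update(policy_id: int) -> str:
    # Stand-in for a real API call, used only to demonstrate the wrapper.
    return f"updated policy {policy_id}"


print(serialized_policy_write(fake_update, 1))
```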
Create an alerting policy
To create an alerting policy in a project, use the
alertPolicies.create
method. For information about how to invoke this
method, its parameters, and the response data, see the
alertPolicies.create reference page.
You can create policies from JSON or YAML files.
The Google Cloud CLI accepts these files as arguments, and
you can programmatically read JSON files, convert them to AlertPolicy
objects, and create policies from them
by using the alertPolicies.create
method. If you
have a Prometheus JSON or YAML configuration file with an alerting rule, then
the gcloud CLI can migrate it to a Cloud Monitoring alerting
policy with a PromQL condition. For more information, see
Migrate alerting rules and receivers from Prometheus.
Each alerting policy belongs to a scoping project of a metrics scope. Each
project can contain up to 500 policies.
For API calls, you must provide a “project ID”; use the
ID of the scoping project of a metrics scope as the value. In these examples,
the ID of the scoping project of a metrics scope is a-gcp-project.
The following samples illustrate the creation of alerting policies, but they don't describe how to create a JSON or YAML file that describes an alerting policy. Instead, the samples assume that a JSON-formatted file exists and they illustrate how to issue the API call. For example JSON files, see Sample policies. For general information about monitoring ratios of metrics, see Ratios of metrics.
gcloud
To create an alerting policy in a project, use the gcloud alpha monitoring
policies create
command. The following example creates an alerting policy in
a-gcp-project
from the rising-cpu-usage.json
file:
gcloud alpha monitoring policies create --policy-from-file="rising-cpu-usage.json"
If successful, this command returns the name of the new policy, for example:
Created alert policy [projects/a-gcp-project/alertPolicies/12669073143329903307].
The rising-cpu-usage.json
file contains the JSON for a policy with
the display name “High CPU rate of change”. For details about this policy, see
Rate-of-change policy.
See the
gcloud alpha monitoring policies create
reference for more information.
C#, Go, Java, Node.js, PHP, and Python
To authenticate to Monitoring, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
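For Python, a minimal sketch of creating a policy from a JSON file might look like the following. It assumes that the google-cloud-monitoring package is installed and that Application Default Credentials are configured. The helper names load_policy_json and create_policy_from_file are hypothetical, and this is not necessarily the sample that accompanies this page.

```python
import json


def load_policy_json(path: str) -> dict:
    """Read a JSON policy description from a file."""
    with open(path, encoding="utf-8") as policy_file:
        return json.load(policy_file)


def create_policy_from_file(project_id: str, path: str) -> str:
    """Create an alerting policy from a JSON file and return its resource name.

    Assumes the google-cloud-monitoring package is installed; the import is
    local so that the file-loading helper works without it.
    """
    from google.cloud import monitoring_v3

    client = monitoring_v3.AlertPolicyServiceClient()
    policy = monitoring_v3.AlertPolicy.from_json(
        json.dumps(load_policy_json(path))
    )
    created = client.create_alert_policy(
        name=f"projects/{project_id}", alert_policy=policy
    )
    return created.name


# Example call (requires credentials and an existing JSON file):
# print(create_policy_from_file("a-gcp-project", "rising-cpu-usage.json"))
```

The returned value is the resource name that Monitoring assigns to the new policy.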
The created AlertPolicy object has additional fields.
The policy itself has name, creationRecord, and mutationRecord
fields. Additionally, each condition in the policy is given a name.
These fields can't be modified externally, so there is no need to set them
when creating a policy. None of the JSON examples used for creating
policies include them, but if you retrieve the policies after creation,
the fields are present.
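If you want to reuse a retrieved policy as a starting point for a new one, you might strip these server-assigned fields first. The following sketch assumes the policy is held as a dictionary in its JSON form; the function name is hypothetical.

```python
import copy

# Policy-level fields that Monitoring sets when a policy is created or modified.
SERVER_ASSIGNED_POLICY_FIELDS = ("name", "creationRecord", "mutationRecord")


def as_creation_template(retrieved: dict) -> dict:
    """Return a copy of a retrieved policy without server-assigned fields."""
    template = copy.deepcopy(retrieved)
    for field in SERVER_ASSIGNED_POLICY_FIELDS:
        template.pop(field, None)
    # Each condition is also given a name by Monitoring.
    for condition in template.get("conditions", []):
        condition.pop("name", None)
    return template


retrieved_policy = {
    "name": "projects/a-gcp-project/alertPolicies/12669073143329903307",
    "displayName": "High CPU rate of change",
    "creationRecord": {"mutatedBy": "user@example.com"},
    "mutationRecord": {"mutatedBy": "user@example.com"},
    "conditions": [
        {
            "name": "projects/a-gcp-project/alertPolicies/12669073143329903307/conditions/1",
            "displayName": "Example condition",
        }
    ],
}

print(as_creation_template(retrieved_policy))
```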