You can create Monitoring alerting policies whose condition includes an MQL query. MQL queries for alert conditions are like other MQL queries, except that they also include an MQL alerting operation.
This page introduces the MQL alerting operations and describes how to create an alerting policy that uses them. For general information on Monitoring alerting policies, see Behavior of metric-based alerting policies.
MQL alerting operations
You can create both threshold and absence alerting policies with MQL.
You create an MQL-based alerting policy by using one of the following MQL alerting operations in your query:
- The condition operation, for threshold alerts.
- The absent_for operation, for absence alerts.
Your query must end with one of these operations. For detailed information, see the Alerting section in the MQL reference.
Your query must not include an explicit time-range specification, that is, a within operation.
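The two rules above (the query ends with an alerting operation and contains no within operation) can be checked mechanically before you submit a policy. The following Python sketch is only an illustration: the helper names split_pipeline and check_alert_query are hypothetical, and the parsing is deliberately naive. It assumes operations are separated by "|", that each stage starts with its operation name, and that string literals contain no escaped quotes.

```python
ALERTING_OPERATIONS = ("condition", "absent_for")


def split_pipeline(query: str) -> list[str]:
    """Split an MQL query on '|' while ignoring pipes inside quoted strings."""
    stages, current, quote = [], [], None
    for ch in query:
        if quote:
            # Inside a string literal: only the matching quote closes it.
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "|":
            stages.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    stages.append("".join(current).strip())
    return [s for s in stages if s]


def check_alert_query(query: str) -> list[str]:
    """Return a list of problems; an empty list means both rules hold."""
    stages = split_pipeline(query)
    problems = []
    if not stages or stages[-1].split()[0] not in ALERTING_OPERATIONS:
        problems.append("query must end with condition or absent_for")
    if any(stage.split()[0] == "within" for stage in stages):
        problems.append("query must not include a within operation")
    return problems
```

A check like this catches only the two structural rules; it does not validate the rest of the query.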
When using MQL to create an alerting policy, you build an MQL query with fetch, filter, group_by, and so on, to identify the target time series. This part of the query is the same as a query used for retrieving time series data for a chart. For example, the following query fetches the CPU utilization for all Compute Engine VM instances in any US central region:
fetch gce_instance::compute.googleapis.com/instance/cpu/utilization | filter zone =~ 'us-central.*'
This query generates an output table. To create an alert, you pipe the output table into an alerting operation. The alerting operation computes boolean values for the data values in the output table generated by the query that precedes it.
The alerting operation specifies an expression for evaluating the data in the input table. For a threshold condition, the expression tests each point against a threshold like "is the value less than 0.5?"
The Monitoring alerting facility uses the results of the alerting operation to determine if and when the alerting policy is triggered. Alert configuration describes how the decision is made.
Threshold alerts
For threshold alerts, use the condition operation. The condition operation takes an expression that evaluates a value against a threshold, like "the value is greater than 15 percent", and returns a boolean.
The condition operation requires that the input table be aligned with an explicit alignment window. To align the input table with an explicit window, either specify an alignment window to an align operation (for example, align delta_gauge(5m)) or use a temporal group_by with a sliding time window. The following example illustrates using group_by with a sliding operation:
fetch gce_instance::compute.googleapis.com/instance/cpu/utilization | filter zone =~ 'us-central.*' | group_by sliding(5m), mean(val()) | condition val() > .15 '10^2.%'
Because the default group_by window setting is sliding, the group_by expression in the previous query is equivalent to group_by 5m, mean(val()).
The condition tests each data point in the aligned input table to determine whether the utilization value exceeds the threshold value of 15%. The table resulting from the condition operation has two value columns: a boolean column recording the result of the threshold evaluation, and a second column containing a copy of the utilization value column from the input table.
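To make the shape of the condition output concrete, the following Python sketch models a sliding five-minute mean followed by a threshold test. It is a toy model, not how Monitoring evaluates MQL: the function names are hypothetical, the one-minute output period is an assumption, and the sample data is invented for illustration.

```python
from datetime import datetime, timedelta


def sliding_mean(points, window, period):
    """Emit one aligned point per output period; each value is the mean of the
    input points in the half-open window (t - window, t]."""
    if not points:
        return []
    points = sorted(points)
    end = points[-1][0]
    aligned = []
    t = points[0][0]
    while t <= end:
        in_window = [v for ts, v in points if t - window < ts <= t]
        if in_window:
            aligned.append((t, sum(in_window) / len(in_window)))
        t += period
    return aligned


def condition(aligned, threshold):
    """Return rows of (timestamp, boolean result, copy of the input value)."""
    return [(ts, value > threshold, value) for ts, value in aligned]


# Invented samples: utilization is 0.10 for five minutes, then 0.30.
base = datetime(2024, 1, 1)
samples = [(base + timedelta(minutes=i), 0.10 if i < 5 else 0.30) for i in range(10)]
rows = condition(
    sliding_mean(samples, timedelta(minutes=5), timedelta(minutes=1)), 0.15
)
```

Each row carries both the boolean and the aligned value, mirroring the two value columns described above. The boolean becomes true only when the windowed mean, rather than an individual raw sample, exceeds 0.15.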
MQL lets you create user-defined labels and attach them to incidents. For examples, see Add severity levels to an alerting policy.
The CPU-utilization value is stored as fractional utilization; the values range from 0.0 to 1.0. The metric descriptor specifies the unit for these values as 10^2.%, which the chart displays as a percentage. The units for the threshold have to be compatible, so we express the threshold as .15 '10^2.%'.
Units for metric types are listed in the relevant table of metric types; for the metric type compute.googleapis.com/instance/cpu/utilization, see the compute table.
For more information on units in MQL, see Units of measure.
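The relationship between the stored fraction and the displayed percentage is a scaling by 10^2. The following Python sketch shows that arithmetic and formats a threshold clause in the style used on this page. The helper names are hypothetical, and the sketch assumes a metric whose unit is 10^2.%, like the CPU-utilization metric.

```python
def to_percent(fraction: float) -> float:
    """Scale a fractional utilization (0.0 to 1.0) by 10^2 to get a percentage."""
    return fraction * 100


def threshold_clause(percent: float, unit: str = "10^2.%") -> str:
    """Format a condition clause whose threshold is a fraction tagged with the unit."""
    fraction = percent / 100
    return f"condition val() > {fraction:g} '{unit}'"
```

For example, threshold_clause(15) yields a clause with the threshold 0.15 '10^2.%', which is the same value as the .15 '10^2.%' literal used earlier.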
Absence alerts
For absence alerts, use the absent_for operation, which takes a duration for which data must be missing. For example, the following query tests whether data has been missing from the US central zones for eight hours:
fetch gce_instance::compute.googleapis.com/instance/cpu/utilization | filter zone =~ 'us-central.*' | absent_for 8h
The absent_for operation takes only a duration argument, which indicates for how long data must be absent to satisfy the condition.
Data is considered absent if it appeared at some point in the last 24 hours but not within the duration (in this example, the most recent eight hours).
An absent_for query creates an output table with aligned values, using either the default alignment or an every operation following the absent_for operation.
The output table has two columns:
- The first is the active column, which records the boolean results for data absence. A true value means there was an input point within the last 24 hours and none within the duration period.
- The second is the signal column. If the input table has value columns, then the signal column contains the value from the first value column of the most recent input point. If the input table has no value columns, then the signal column contains the number of minutes since the last input point was recorded.
You can force the second case, as shown in the following example:
fetch gce_instance::compute.googleapis.com/instance/cpu/utilization | filter zone =~ 'us-central.*' | value [] | absent_for 8h
In the preceding example, the value [] operation removes the value columns from its input table, so the signal column in the table created by the absent_for operation contains the number of minutes since the last input point was recorded.
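The absence rule and the two output columns can be modeled in a few lines of Python. This sketch covers only the case where the input table has no value columns, so the signal is the number of minutes since the last point. The function name is hypothetical, and the handling of exact boundaries (a point exactly 8 hours or exactly 24 hours old) is an assumption, because this page does not specify it.

```python
from datetime import datetime, timedelta

LOOKBACK = timedelta(hours=24)  # data must have appeared within this period


def absent_for(point_times, now, duration):
    """Return (active, signal_minutes) for one time series at time `now`.

    active: True if a point arrived in the last 24 hours but none arrived
        within `duration`.
    signal_minutes: whole minutes since the most recent point, or None if
        the series has no points at or before `now`.
    """
    seen = [t for t in point_times if t <= now]
    if not seen:
        return False, None
    age = now - max(seen)
    # Assumed boundaries: strictly older than duration, at most 24 hours old.
    active = duration < age <= LOOKBACK
    return active, int(age.total_seconds() // 60)
```

With a duration of eight hours, a series whose last point is nine hours old is active, while a series whose last point is 30 hours old is not, because no data appeared in the last 24 hours.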
Alert configuration
In addition to the MQL query, an alerting-policy condition includes two other values:
- The number of input time series that must satisfy the condition.
The value can be any of the following:
- A single time series.
- A specific number of time series.
- A percentage of time series.
- All time series.
- Duration of the alert state, that is, how long the alert condition must continuously evaluate to true.
When the alerting query continuously evaluates to true for the specified duration for a particular time series, then that time series is considered active. When the specified number of time series are active, then the alerting policy is triggered and an alert is generated for each active time series. For more information about how alerting policies are evaluated, see Behavior of metric-based alerting policies.
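The interaction between the duration and the number of time series can be sketched in Python. This is a simplified model under stated assumptions: the function names are hypothetical, the duration is expressed as a number of consecutive evaluation points rather than as wall-clock time, and each series is represented by the list of booleans that its alerting operation produced, oldest first.

```python
def is_active(results, duration_points):
    """True if the most recent `duration_points` evaluations are all True."""
    n = max(1, duration_points)
    if len(results) < n:
        return False
    return all(results[-n:])


def policy_triggers(series_results, duration_points, min_count=None, min_percent=None):
    """Return (triggered, active_series) for boolean outputs keyed by series.

    Pass min_count for "a specific number of time series" (1 for a single
    series) or min_percent for "a percentage" (100 for all time series).
    """
    active = [
        name
        for name, results in series_results.items()
        if is_active(results, duration_points)
    ]
    if min_percent is not None:
        total = len(series_results)
        triggered = total > 0 and 100 * len(active) / total >= min_percent
    else:
        triggered = len(active) >= (min_count if min_count is not None else 1)
    return triggered, active
```

In this model, a series that evaluated to false at any point in the most recent window is not active, which matches the requirement that the condition evaluate to true continuously for the duration.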
If you use MQL in a condition, that condition must be the only condition in the policy. You can't use multiple conditions in MQL-based alerting policies.
Creating MQL alerting policies (console)
To create an MQL-based alerting policy from the Google Cloud console, do the following:
In the Google Cloud console, select Monitoring.
In the navigation pane, select Alerting.
To add notification channels or to update notification channels, click Edit notification channels. Add your notification channels and then return to the Alerting page.
For details about your choices of notification channels, see Create and manage notification channels.
On the Alerting page, click Create Policy.
On the toolbar, select MQL.
The Query Editor opens.
Enter the query that selects the data you want to monitor in the Query Editor. The following query fetches the time series and aligns them over a five-minute window:
fetch gce_instance | metric 'compute.googleapis.com/instance/cpu/utilization' | group_by 5m, mean(val())
If you click Run Query at this point, then you see a chart. For one project, this query produced the following result:
Add an alert clause to the query by using one of the following operations:
- The condition operation, for a threshold alert.
- The absent_for operation, for an absence alert.
For more information about these alerting operations, see Alerting in the MQL reference.
The following example uses the condition operation to specify a threshold:
fetch gce_instance | metric 'compute.googleapis.com/instance/cpu/utilization' | group_by 5m, mean(val()) | condition val() > .05
If you click Run Query at this point, then the chart adds a threshold line for the condition, as shown in the following screenshot:
If you haven't run your query yet, then click Run Query.
Click Next and configure the alert trigger:
The alert trigger lets you specify how many time series returned by the query must satisfy the alerting operation before the alerting policy can be triggered. You can select from the following criteria:
- A single time series.
- A specific number of time series.
- A percentage of the time series.
- All of the time series.
Optional: Expand the Advanced options menu and select the Retest window. This field defines how long the condition must be satisfied before the alerting policy is triggered. The Retest window isn't the same as the alignment window used in the MQL query. For more information on the relationship between these values, see The alignment period and the duration.
Enter a name for the condition and click Next.
Optional: Configure notifications, add policy labels, and add documentation.
Click Alert name and enter a name for the alerting policy.
Click Create policy.
Queries for conditions in alerting policies aren't converted to strict form.
For complete steps, see Managing alerting policies.
Creating MQL alerting policies (API)
If you're using the API, create a condition of the type MonitoringQueryLanguageCondition when you set up the policy. For more information, see Creating conditions for alerting policies.
Then pass the policy to alertPolicies.create as usual.
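As a rough illustration of what such a policy might look like, the following Python sketch builds a request body with a single MQL condition and serializes it to JSON. The helper name, display names, and the 600-second duration are hypothetical. The sketch uses the conditionMonitoringQueryLanguage field of a condition with its query, duration, and trigger fields. Check the field names against the API reference before relying on them. Sending the request to alertPolicies.create is not shown.

```python
import json


def build_mql_alert_policy(display_name, query, duration_seconds, trigger_count=1):
    """Build an alerting-policy body with exactly one MQL condition."""
    return {
        "displayName": display_name,
        "combiner": "OR",
        # An MQL condition must be the only condition in the policy.
        "conditions": [
            {
                "displayName": f"{display_name} condition",
                "conditionMonitoringQueryLanguage": {
                    "query": query,
                    # How long the condition must continuously hold.
                    "duration": f"{duration_seconds}s",
                    # Number of time series that must satisfy the condition.
                    "trigger": {"count": trigger_count},
                },
            }
        ],
    }


query = (
    "fetch gce_instance::compute.googleapis.com/instance/cpu/utilization"
    " | filter zone =~ 'us-central.*'"
    " | group_by sliding(5m), mean(val())"
    " | condition val() > .15 '10^2.%'"
)
policy = build_mql_alert_policy("High CPU in us-central", query, 600)
body = json.dumps(policy, indent=2)
```

The duration in the body is separate from the five-minute alignment window inside the query, so the two can be tuned independently.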