Alerting policies with MQL

You can create Monitoring alerting policies whose condition includes an MQL query. MQL queries for alert conditions are like other MQL queries, except that they also include an MQL alerting operation.

This page introduces the MQL alerting operations and describes how to create an alerting policy that uses them. For general information on Monitoring alerting policies, see Alerting behavior.

MQL alerting operations

You can create both threshold and absence alerting policies with MQL.

You create an MQL-based alerting policy by using one of the following MQL alerting operations in your query:

  • condition, for threshold alerts.
  • absent_for, for absence alerts.

Your query must end with one of these operations. For detailed information, see the Alerting section in the MQL reference.

Your query must not include an explicit time-range specification; that is, it can't use a within operation.

When using MQL to create an alerting policy, you build an MQL query with fetch, filter, group_by, and so on, to identify the target time series. This part of the query is the same as a query used for retrieving time series data for a chart. For example, the following query fetches the CPU utilization for all Compute Engine VM instances in any US central region:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| filter zone =~ 'us-central.*'

This query generates an output table. To create an alert, you pipe the output table into an alerting operation. The alerting operation computes boolean values for the data values in the output table generated by the query preceding the alerting operation.

The alerting operation specifies an expression for evaluating the data in the input table. For a threshold condition, the expression tests each point against a threshold like "is the value less than 0.5?"
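As a rough mental model only, and not how Monitoring implements it, the fetch-and-filter step and a per-point threshold test can be sketched in Python. The `TimeSeries` type, the helper names, and the sample data are hypothetical. The sketch also assumes that `=~` requires the whole label value to match the regular expression.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """Hypothetical stand-in for one time series: its labels plus (minute, value) points."""
    labels: dict
    points: list = field(default_factory=list)


def filter_regex(series, label, pattern):
    """Keep the series whose label value fully matches `pattern` (assumed `=~` semantics)."""
    regex = re.compile(pattern)
    return [ts for ts in series if regex.fullmatch(ts.labels.get(label, ""))]


def threshold_test(ts, predicate):
    """Evaluate a boolean expression, such as 'is the value less than 0.5?', on each point."""
    return [(minute, predicate(value)) for minute, value in ts.points]


series = [
    TimeSeries({"zone": "us-central1-a"}, [(0, 0.42), (1, 0.61)]),
    TimeSeries({"zone": "europe-west1-b"}, [(0, 0.10)]),
]
selected = filter_regex(series, "zone", r"us-central.*")
print([ts.labels["zone"] for ts in selected])           # ['us-central1-a']
print(threshold_test(selected[0], lambda v: v < 0.5))   # [(0, True), (1, False)]
```

Each point yields its own boolean; deciding what those booleans mean for the policy is left to the alerting facility.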

The Monitoring alerting facility uses the results of the alerting operation to determine whether and when the alerting policy is triggered. The Alert configuration section describes how that decision is made.

Threshold alerts

For threshold alerts, use the condition operation. The condition operation takes an expression that evaluates a value against a threshold, like "the value is greater than 15 percent", and returns a boolean.

The condition operation requires that the input table be aligned with an explicit alignment window. You can do this by specifying an alignment window to an align operation, for example align delta(5m), or by using a window operation, as shown in the following example:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| filter zone =~ 'us-central.*'
| window 5m
| condition val() > .15 '10^2.%'

The condition tests each data point in the aligned input table to determine whether the utilization value exceeds the threshold value of 15%. The table resulting from the condition operation has two value columns: a boolean column recording the result of the threshold evaluation, and a second column containing a copy of the utilization value column from the input table.
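The following Python sketch models the window 5m | condition val() > .15 pipeline on made-up per-minute samples for a single time series. It assumes, for illustration only, that each aligned point is the mean of the raw samples in the five minutes ending at that point's timestamp, and that one aligned point is emitted every five minutes. The function names are hypothetical and do not correspond to any Monitoring API.

```python
from statistics import mean


def align_window(points, window, period):
    """Emit one aligned point every `period` minutes; each point summarizes the raw
    samples in the `window` minutes ending at its timestamp (mean assumed)."""
    if not points:
        return []
    last = max(minute for minute, _ in points)
    aligned = []
    for end in range(period, last + 1, period):
        in_window = [value for minute, value in points if end - window < minute <= end]
        if in_window:
            aligned.append((end, mean(in_window)))
    return aligned


def condition(aligned, predicate):
    """Return (time, boolean result, input value) rows, mirroring the two value
    columns of the table that the condition operation produces."""
    return [(end, predicate(value), value) for end, value in aligned]


# One made-up sample per minute: about 11% for minutes 1-5, about 20% for minutes 6-10.
raw = [(1, 0.10), (2, 0.12), (3, 0.11), (4, 0.09), (5, 0.13),
       (6, 0.20), (7, 0.18), (8, 0.22), (9, 0.19), (10, 0.21)]
rows = condition(align_window(raw, window=5, period=5), lambda v: v > 0.15)
print([(end, fired) for end, fired, _ in rows])  # [(5, False), (10, True)]
```

Keeping the input value next to the boolean is what lets a chart or notification show the utilization that caused the condition to fire.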

The CPU-utilization value is stored as fractional utilization; the values range from 0.0 to 1.0. The metric descriptor specifies the unit for these values as 10^2.%, which the chart displays as a percentage. The units for the threshold have to be compatible, so we express the threshold as .15 '10^2.%'.
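The arithmetic behind the threshold literal can be checked with a small Python sketch. It assumes that '10^2.%' reads as "10^2 percent", so one unit equals 100% and a fractional reading is multiplied by 100 to get a percentage. Exact fractions keep floating-point noise out of the output; the helper name is hypothetical.

```python
from fractions import Fraction

# '10^2.%' reads as "10^2 percent", so one unit of the metric equals 100%.
PERCENT_PER_UNIT = Fraction(10) ** 2


def as_percent(value):
    """Convert a value expressed in '10^2.%' units into plain percent."""
    return Fraction(value) * PERCENT_PER_UNIT


threshold = Fraction("0.15")   # the .15 '10^2.%' literal from the query
sample = Fraction("0.42")      # a made-up utilization reading of 0.42
print(as_percent(threshold))                       # 15
print(as_percent(sample) > as_percent(threshold))  # True
```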

Units for metric types are listed in the relevant table of metric types; for the metric type compute.googleapis.com/instance/cpu/utilization, see the compute table.

For more information on units in MQL, see Units of measure.

Absence alerts

For absence alerts, use the absent_for operation, which takes a duration for which data must be missing. For example, the following query tests whether data has been missing from the US central zones for eight hours:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| filter zone =~ 'us-central.*'
| absent_for 8h

The absent_for operation takes only a duration argument, which indicates how long data must be absent to satisfy the condition.

Data is considered absent if data has appeared within the last 24 hours but not within the duration; in this example, that is the most recent eight hours.
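The absence rule can be sketched in Python as a check over point timestamps, measured in minutes. The function name and the sample timestamps are hypothetical, and the exact boundary handling (whether a point exactly eight hours old counts as recent) is an assumption, not documented behavior.

```python
DAY_MINUTES = 24 * 60


def is_absent(point_minutes, now, duration, lookback=DAY_MINUTES):
    """True if some point arrived in the last `lookback` minutes but none
    arrived in the last `duration` minutes (all times in minutes)."""
    recent = [t for t in point_minutes if now - lookback < t <= now]
    if not recent:
        return False  # nothing in the last 24 hours, so the series is not considered absent
    return max(recent) <= now - duration


EIGHT_HOURS = 8 * 60
print(is_absent([100, 600], now=1440, duration=EIGHT_HOURS))   # True: last point 14 hours ago
print(is_absent([100, 1000], now=1440, duration=EIGHT_HOURS))  # False: point 7h20m ago
print(is_absent([100], now=3000, duration=EIGHT_HOURS))        # False: nothing in the last day
```

The third case shows why the 24-hour lookback matters: a series that stopped reporting long ago no longer counts as absent under this rule.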

An absent_for query creates an output table with aligned values, using either the default alignment or the alignment specified by an every operation following the absent_for operation.

The output table has two columns.

  • The first is the active column, which records the boolean results for data absence. A true value means there was an input point within the last 24 hours and none within the duration period.

  • The second column is the signal column. If the input table has value columns, then the signal column contains the value from the first value column of the most recent input point. If the input table has no value columns, then the signal column contains the number of minutes since the last input point was recorded. You can easily force this case, as shown in the following example:

    fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
    | filter zone =~ 'us-central.*'
    | value []
    | absent_for 8h
    

    In the preceding example, the value [] operation removes the value columns from its input table, so the signal column in the table created by the absent_for operation contains the number of minutes since the last input point was recorded.
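How the signal column is chosen can be sketched in Python, following the two cases described above. Times are in minutes, the function name and sample values are hypothetical, and the sketch says nothing about how Monitoring actually stores the table.

```python
def signal_value(last_minute, last_values, now):
    """Signal column of an absent_for row, given the most recent input point.

    last_values holds the point's value columns; it is empty after `value []`.
    """
    if last_values:
        return last_values[0]   # first value column of the most recent point
    return now - last_minute    # minutes since the last point was recorded


print(signal_value(600, (0.37,), now=1440))  # 0.37
print(signal_value(600, (), now=1440))       # 840
```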

Alert configuration

In addition to the MQL query, an alerting-policy condition includes two other values:

  • The number of input time series that must satisfy the condition. The value can be any of the following:
    • A single time series.
    • A specific number of time series.
    • A percentage of time series.
    • All time series.
  • Duration of the alert state, that is, how long the alert condition must continuously evaluate to true.

When the alerting query continuously evaluates to true for the specified duration for a particular time series, then that time series is considered active. When the specified number of time series are active, then the alerting policy is triggered and an alert is generated for each active time series. For more information about how alerting policies are evaluated, see Alerting behavior.
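The trigger logic described above can be sketched in Python. This is a simplified model under stated assumptions: the condition is evaluated once per minute, the duration is counted in evaluations, and "the specified number of time series" is treated as "at least that many". The function names, criterion labels, and history data are hypothetical.

```python
def series_active(evaluations, duration):
    """True if the last `duration` evaluations (oldest first) are all True."""
    if duration < 1:
        raise ValueError("duration must cover at least one evaluation")
    return len(evaluations) >= duration and all(evaluations[-duration:])


def policy_triggered(active, kind, amount=None):
    """Check the 'how many time series' criterion against a {series: is_active} map."""
    n_active = sum(active.values())
    total = len(active)
    if kind == "single":
        return n_active >= 1
    if kind == "count":
        return n_active >= amount
    if kind == "percent":
        return total > 0 and 100 * n_active / total >= amount
    if kind == "all":
        return total > 0 and n_active == total
    raise ValueError(f"unknown criterion: {kind}")


# Made-up condition results, one evaluation per minute, for three VMs.
history = {
    "vm-a": [False, True, True, True, True, True],
    "vm-b": [True, True, False, True, True, True],
    "vm-c": [False] * 6,
}
active = {name: series_active(evals, duration=5) for name, evals in history.items()}
print(policy_triggered(active, "single"))        # True: vm-a has been true for 5 minutes
print(policy_triggered(active, "count", 2))      # False: only one series is active
print(policy_triggered(active, "percent", 30))   # True: 1 of 3 is about 33%
print(policy_triggered(active, "all"))           # False
if policy_triggered(active, "single"):
    # One alert per active time series.
    print([name for name, is_active in active.items() if is_active])  # ['vm-a']
```

Note that vm-b is true for most of the window but not continuously, so it never becomes active.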

If you use MQL in a condition, that condition must be the only condition in the policy. You can't use multiple conditions in MQL-based alerting policies.

Creating MQL alerting policies (console)

To create an MQL-based alerting policy from the Google Cloud Console, follow the usual steps for creating the policy, described in Managing alerting policies. When you create the condition for the alerting policy, use the Query Editor instead of the form-based metric selector.

The condition editor for MQL-based alerting policies.

Complete the condition as follows:

  1. Optionally, name your condition by entering a value in the Untitled Condition field. When saved, the condition is assigned a numeric identifier; a display name can provide a more meaningful description.

  2. To start your alerting condition, enter the query that selects the data you want to monitor in the Query Editor. The following query fetches the time series and aligns them over a five-minute window:

    fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
    | window 5m
    

    If you click Run Query at this point, then you see a chart. For one project, this query produced the following result:

    Chart from an alerting condition before specifying the alert.

  3. Add an alert clause to the query by using one of the following operations:

    • The condition operator, for a threshold alert.
    • The absent_for operator, for an absence alert.

    For more information about these alerting operations, see Alerting in the MQL reference.

    The following example uses the condition operation to specify a threshold:

    fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
    | window 5m
    | condition val() > .15
    

    If you click Run Query at this point, then the chart adds a threshold line for the condition, as shown in the following screenshot:

    Chart from an alerting condition after specifying the alert.

  4. If you haven't run your query yet, then click Run Query.

  5. In the Configuration pane, specify when the alerting policy should be triggered. There are two values to specify:

    • Condition triggers if lets you specify how many time series returned by the query must satisfy the alerting operation before the alerting policy can be triggered. You can specify the following criteria:

      • A single time series.
      • A specific number of time series.
      • A percentage of the time series.
      • All of the time series.
    • For lets you specify how long the condition must be satisfied before the alerting policy can be triggered. This is not the same as the alignment window used in the MQL query. For more information on the relationship between these values, see The alignment period and the duration.

  6. Click Add to save the condition. A dialog box notifies you that the alerting condition is converted to a strict form when saved. Click Save to save the query, or click Cancel to continue editing.

  7. Proceed with the rest of the alerting-policy configuration.
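The two Configuration values correspond to the trigger and duration of the condition when it is expressed through the API. The following Python sketch shows the mapping as we understand it; in particular, representing "All of the time series" as a percent of 100 is an assumption, and the function names are hypothetical. Durations are rendered in the JSON string form for protobuf durations, such as "300s".

```python
def console_trigger(choice, amount=None):
    """Map a 'Condition triggers if' choice to a Trigger object (assumed mapping)."""
    if choice == "single":
        return {"count": 1}
    if choice == "count":
        return {"count": int(amount)}
    if choice == "percent":
        return {"percent": float(amount)}
    if choice == "all":
        return {"percent": 100.0}  # assumption: "all" expressed as 100 percent
    raise ValueError(f"unknown choice: {choice}")


def console_duration(minutes):
    """Render the For value as a JSON Duration string, such as '300s'."""
    return f"{int(minutes * 60)}s"


print(console_trigger("percent", 25))  # {'percent': 25.0}
print(console_duration(5))             # 300s
```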

Creating MQL alerting policies (API)

If you're using the API, create a condition of the type MonitoringQueryLanguageCondition when you set up the policy. Then pass the policy to alertPolicies.create as usual. For more information, see Conditions in alerting policies.
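A minimal sketch of an alertPolicies.create request body follows, built as a Python dictionary and printed as JSON. It assumes the REST field names conditionMonitoringQueryLanguage, query, duration, and trigger from the AlertPolicy resource; check them against the API reference before relying on them. The display names are made up, and the helper function is hypothetical. The body holds exactly one condition, because MQL-based policies can't combine conditions, so it omits the combiner field. You would send it as a POST to https://monitoring.googleapis.com/v3/projects/PROJECT_ID/alertPolicies, where PROJECT_ID is a placeholder.

```python
import json


def mql_alert_policy(display_name, query, duration_seconds, trigger):
    """Build a request body for a policy with a single MQL condition (sketch)."""
    return {
        "displayName": display_name,
        "conditions": [
            {
                "displayName": f"{display_name} condition",
                "conditionMonitoringQueryLanguage": {
                    "query": query,
                    "duration": f"{duration_seconds}s",  # JSON form of a protobuf Duration
                    "trigger": trigger,
                },
            }
        ],
    }


query = "\n".join([
    "fetch gce_instance::compute.googleapis.com/instance/cpu/utilization",
    "| filter zone =~ 'us-central.*'",
    "| window 5m",
    "| condition val() > .15 '10^2.%'",
])
policy = mql_alert_policy("High CPU in us-central", query, 300, {"count": 1})
print(json.dumps(policy, indent=2))
```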