Use cases for MQL alerts

Alerting policies with a Monitoring Query Language (MQL)-based condition let you configure your alerting environment for many possible use cases. Certain configurations are available only through the use of MQL queries.

This document describes several use cases and sample queries for deploying alerting policies with an MQL-based condition in a production environment.

Alert on dynamic thresholds

You can use an MQL query to configure an alerting policy that triggers alerts based on a threshold that varies over time, such as by day of the week. This configuration isn't supported in alerting policy conditions without MQL queries.

For example, you have an MQL query that sends an alert if the CPU utilization of a Compute Engine instance exceeds 95%:

fetch gce_instance :: compute.googleapis.com/instance/cpu/utilization
| align
| every 30s
| condition utilization > 95'%'

However, you want to set a lower utilization threshold, such as 85%, for weekends, to account for longer response times from your support team. In this case, you could configure your query with a value column that contains the alerting threshold:

fetch gce_instance :: compute.googleapis.com/instance/cpu/utilization
| align
| every 30s
| value add [day_of_week: end().timestamp_to_string('%w').string_to_int64]
| value [utilization, is_weekend: day_of_week = 0 || day_of_week = 6]
| value [utilization, max_allowed_utilization: if(is_weekend, 85'%', 95'%')]
| condition utilization > scale(max_allowed_utilization)

The value operations do the following:

  • value add [day_of_week: end().timestamp_to_string('%w').string_to_int64] adds a value column whose value is a number between 0 and 6, where 0 is Sunday and 6 is Saturday.
  • value [utilization, is_weekend: day_of_week = 0 || day_of_week = 6] replaces your day number with a boolean that indicates whether the data point was on a weekend or a weekday.
  • value [utilization, max_allowed_utilization: if(is_weekend, 85'%', 95'%')] replaces the boolean with a threshold that varies depending on the value of is_weekend.

The condition, condition utilization > scale(max_allowed_utilization), compares the two value columns.
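
The threshold selection that the value operations perform can be modeled outside of MQL. The following is a minimal Python sketch of that logic, not part of the alerting policy itself. It assumes that utilization is expressed as a fraction between 0.0 and 1.0 and that timestamps are in UTC; the function names `max_allowed_utilization` and `should_alert` are hypothetical.

```python
from datetime import datetime, timezone

WEEKDAY_THRESHOLD = 0.95  # 95%, expressed as a fraction (assumption)
WEEKEND_THRESHOLD = 0.85  # 85%, expressed as a fraction (assumption)


def max_allowed_utilization(end_time: datetime) -> float:
    """Pick the threshold for a point whose interval ends at end_time."""
    # strftime('%w') yields '0' (Sunday) through '6' (Saturday),
    # the same day numbering that the query's '%w' format produces.
    day_of_week = int(end_time.strftime("%w"))
    is_weekend = day_of_week == 0 or day_of_week == 6
    return WEEKEND_THRESHOLD if is_weekend else WEEKDAY_THRESHOLD


def should_alert(utilization: float, end_time: datetime) -> bool:
    """Return True when utilization exceeds the threshold for that day."""
    return utilization > max_allowed_utilization(end_time)


saturday = datetime(2024, 1, 6, 12, 0, tzinfo=timezone.utc)
wednesday = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
print(should_alert(0.90, saturday))   # True: 90% exceeds the weekend threshold
print(should_alert(0.90, wednesday))  # False: 90% is under the weekday threshold
```

The same 90% utilization produces an alert on Saturday but not on Wednesday, which is the behavior the dynamic threshold is meant to provide.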

For an example of an alerting policy with an MQL-based condition that configures incident severity levels based on dynamic criteria, see Create dynamic severity levels using MQL.

Alert on thresholds based on rate of change

You can configure alerting policy MQL queries to evaluate thresholds based on the rate of change for a metric. For example, you want to evaluate the rate of 5xx errors per instance of resource.method in your API requests, where your rate is equivalent to requests per second. If the rate is greater than 5 error responses per second, then Cloud Monitoring sends an alert:

fetch consumed_api
| metric 'serviceruntime.googleapis.com/api/request_count'
| filter (metric.response_code_class == '5xx')
| align rate(10m)
| every 30s
| group_by [resource.method],
    [value_request_count_mean: mean(value.request_count)]
| condition val() > 5'1/s'
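
To make the steps in this query concrete, the following Python sketch models them on plain data: convert each time series' error count over the alignment window into a per-second rate, average those rates per method, and compare the result to a threshold. The input shape, the method names, and the function names are hypothetical, and the threshold is passed as a parameter instead of being fixed.

```python
from collections import defaultdict
from statistics import mean


def rates_per_method(samples, window_seconds):
    """Average the per-second error rate of each method across its series.

    samples: iterable of (method, error_count_in_window) pairs, one pair
    per time series (a hypothetical input shape for this sketch).
    """
    by_method = defaultdict(list)
    for method, error_count in samples:
        # A count over the window divided by the window length is a rate.
        by_method[method].append(error_count / window_seconds)
    return {method: mean(rates) for method, rates in by_method.items()}


def methods_over_threshold(samples, window_seconds, threshold_per_second):
    """Return the methods whose mean error rate exceeds the threshold."""
    rates = rates_per_method(samples, window_seconds)
    return sorted(m for m, r in rates.items() if r > threshold_per_second)


# Error counts over a 10-minute (600 s) window, one entry per time series.
samples = [("GetItem", 4200), ("GetItem", 3000), ("ListItems", 600)]
print(methods_over_threshold(samples, 600, 5.0))  # ['GetItem']
```

In this sample data, the two `GetItem` series have rates of 7 and 5 errors per second, so their mean of 6 exceeds a threshold of 5, while `ListItems` at 1 error per second does not.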

You can also create rate-of-change alerting policies without using MQL.

Alert on ratio-based thresholds

Your alerting policy can use an MQL query to evaluate ratios derived by joining two metrics and then dividing the value columns. For example, you want to query the ratio of read bytes to write bytes for each of your Compute Engine instances. If the ratio is greater than 3/5, or 60%, then Cloud Monitoring sends an alert:

{
  fetch gce_instance :: compute.googleapis.com/instance/disk/read_bytes_count;
  fetch gce_instance :: compute.googleapis.com/instance/disk/write_bytes_count
}
| every 30s
| join
| value val(0) / val(1)
| condition val() > 0.6
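
As an illustration of the join-then-divide pattern, the following Python sketch pairs read and write byte counts by instance and divides them. It is a model of the idea, not of MQL's exact semantics. The instance names and the `read_write_ratios` function are hypothetical, and skipping instances with zero writes is an assumption made here to avoid division by zero.

```python
def read_write_ratios(read_bytes, write_bytes):
    """Join two per-instance maps on instance ID and divide read by write."""
    ratios = {}
    # Only instances present in both inputs are paired, like an inner join.
    for instance_id in read_bytes.keys() & write_bytes.keys():
        writes = write_bytes[instance_id]
        if writes == 0:
            continue  # assumption: skip instances with no writes
        ratios[instance_id] = read_bytes[instance_id] / writes
    return ratios


read_bytes = {"vm-a": 900, "vm-b": 200, "vm-c": 50}
write_bytes = {"vm-a": 1000, "vm-b": 1000}
ratios = read_write_ratios(read_bytes, write_bytes)
alerting = sorted(i for i, r in ratios.items() if r > 0.6)
print(alerting)  # ['vm-a']
```

Here `vm-c` has no matching write series, so it drops out of the join, and only `vm-a` (ratio 0.9) exceeds the 0.6 threshold.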

You can also query the ratio of aggregated values. For example, you want to compute the average CPU usage time per core across your Compute Engine instances. If the ratio is greater than 3/5, or 60%, then Cloud Monitoring sends an alert:

{
  fetch gce_instance :: compute.googleapis.com/instance/cpu/usage_time
  | group_by [], .sum;
  fetch gce_instance :: compute.googleapis.com/instance/cpu/reserved_cores
  | group_by [], .sum
}
| every 30s
| ratio
| condition val() > 0.6
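
The aggregated form differs from the per-instance ratio in that each metric is summed across all instances first and the two sums are divided afterward. The following Python sketch shows that order of operations. The sample values and the `usage_per_core` function are hypothetical, and the sketch assumes that usage is already expressed as CPU-seconds consumed per second.

```python
def usage_per_core(usage_rates, reserved_cores):
    """Divide total CPU usage by total reserved cores across all instances."""
    total_cores = sum(reserved_cores)
    if total_cores == 0:
        return None  # no cores reserved, so the ratio is undefined
    return sum(usage_rates) / total_cores


usage_rates = [1.5, 0.5, 3.0]  # CPU-seconds per second, one value per instance
reserved_cores = [2, 2, 4]     # reserved cores, one value per instance
ratio = usage_per_core(usage_rates, reserved_cores)
print(ratio)        # 0.625
print(ratio > 0.6)  # True
```

Because the sums are taken before dividing, the result is a ratio of totals (5.0 / 8), which generally differs from the mean of the per-instance ratios.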

You can create ratio-based alerting policies without using MQL:

  • For an example that uses the Google Cloud console, see Compute ratios.
  • For an example that uses the Cloud Monitoring API, see Metric ratio.