Select and configure metrics

This document describes the fields that you set when you are configuring the condition of an alerting policy. Typically, you create an alerting policy when you want to be notified when time-series data, such as the CPU usage of a virtual machine, satisfies certain conditions. This content does not apply to log-based alerting policies. For information about log-based alerting policies, which notify you when a particular message appears in your logs, see Monitoring your logs.

This document uses the terminology used by the menu-driven interface of the Google Cloud console. However, the conceptual information is applicable to all approaches that you can use to create an alerting policy. For information about using the Cloud Monitoring API, see Alerting policies in the Cloud Monitoring API.

Select the data to display

To configure the condition of an alerting policy, you can use the Cloud Monitoring API or the Google Cloud console. The default dialog for creating an alerting policy in the Google Cloud console is menu-driven: you select the data to monitor by specifying values for a metric and a resource type. However, if you prefer to use Monitoring Query Language (MQL), you can configure the Google Cloud console to activate an MQL editor.

Not all time series can be represented with a metric and resource model. For example, you can't specify a metric and resource when you want to monitor the number of processes running on a virtual machine (VM). For situations like these, configure the Google Cloud console to open Direct filter mode.

Default mode

Use the default mode when you want to configure a condition that monitors a metric type for a specific resource type and you don't want to use Monitoring Query Language (MQL). By default, the menus list only metrics for which data has been received; a toggle lets you list all Google Cloud metrics. In this mode, you set the following fields:

  • The metric field identifies the measurements to be collected from a monitored resource. It includes a description of what is being measured and how the measurements are interpreted. Metric is a short form of metric type. For conceptual information, see Metric types.

  • The resource type field specifies from which resource the metric data is captured. The resource type is sometimes called the monitored resource type or the resource. For conceptual information, see Monitored resources.

To use default mode, do the following:

  1. In the Google Cloud console, select Monitoring or click the following button:
    Go to Monitoring
  2. In the navigation pane, select Alerting.
  3. Click Create Policy.
  4. Select a resource and a metric by using the Select a metric menu.
  5. Add filters, specify the data transformation, and complete the alerting policy dialog. For more information, see Create metric-based alerting policy.
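
The menu selections in the previous steps map to fields on a metric-threshold condition in the Cloud Monitoring API. The following Python sketch is illustrative only: it assumes the google-cloud-monitoring client library, and the project ID, display names, metric, and threshold are placeholders, not values from this document:

from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

client = monitoring_v3.AlertPolicyServiceClient()

# The filter pairs a metric type with a resource type, which is what the
# "Select a metric" menu builds for you.
policy = monitoring_v3.AlertPolicy(
    display_name="CPU utilization alert",  # placeholder
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="CPU above 90%",  # placeholder
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                    'AND resource.type="gce_instance"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=0.9,
                duration=duration_pb2.Duration(seconds=300),
                aggregations=[
                    monitoring_v3.Aggregation(
                        alignment_period=duration_pb2.Duration(seconds=300),
                        per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
                    )
                ],
            ),
        )
    ],
)

created = client.create_alert_policy(
    name="projects/PROJECT_ID",  # placeholder project
    alert_policy=policy,
)
print(created.name)

The duration field in this sketch corresponds to the retest window in the console.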

MQL mode

To describe the condition with Monitoring Query Language (MQL), or to monitor a ratio of metrics, use MQL mode.

To use MQL, do the following:

  1. In the Google Cloud console, select Monitoring or click the following button:
    Go to Monitoring
  2. In the navigation pane, select Alerting.
  3. Click Create Policy.
  4. In the toolbar, select MQL.
  5. Enter an MQL expression. For information about how to use MQL, see Using the Monitoring Query Language.
  6. Complete the alerting policy dialog. For more information, see Managing alerting policies with MQL.
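
The console's MQL editor corresponds to the conditionMonitoringQueryLanguage condition type in the Cloud Monitoring API. The following Python fragment is a minimal sketch that assumes the google-cloud-monitoring client library; the query and display name are placeholders for illustration:

from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

# An illustrative MQL query that is met when mean CPU utilization
# exceeds 90%; adapt the fetch and condition to your own metric.
mql_query = """
fetch gce_instance
| metric 'compute.googleapis.com/instance/cpu/utilization'
| group_by 5m, [value_utilization_mean: mean(value.utilization)]
| every 1m
| condition val() > 0.9 '10^2.%'
"""

condition = monitoring_v3.AlertPolicy.Condition(
    display_name="CPU above 90% (MQL)",  # placeholder
    condition_monitoring_query_language=(
        monitoring_v3.AlertPolicy.Condition.MonitoringQueryLanguageCondition(
            query=mql_query,
            duration=duration_pb2.Duration(seconds=300),
        )
    ),
)

You can attach this condition to an AlertPolicy in the same way as in the sketch shown for default mode.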

Direct filter mode

When you want to do any of the following, use Direct filter mode:

  • Monitor service-level objectives (SLOs).
  • Configure an alert for custom metrics for which you don't yet have data.
  • Monitor the count of processes running on VMs.
  • Verify the syntax of a filter statement that you intend to include in an API command.

When you use Direct filter mode, you select the time series by entering a Monitoring filter. For example, the following Monitoring filter results in a chart that displays a count of processes whose names include nginx:

select_process_count("monitoring.regex.full_match(\".*nginx.*\")")
resource.type="gce_instance"

The next filter selects the Disk write bytes time series for Compute Engine VMs that are located in the us-central1-a zone:

metric.type="compute.googleapis.com/instance/disk/write_bytes_count"
resource.type="gce_instance"
resource.label."zone"="us-central1-a"

To enter a monitoring filter or a time-series selector, do the following:

  1. In the Google Cloud console, select Monitoring or click the following button:
    Go to Monitoring
  2. In the navigation pane, select Alerting.
  3. Click Create Policy.
  4. Click the help icon ? on the Select a metric section header, and then select Direct filter mode in the tooltip.
  5. Enter a monitoring filter or a time series selector. For information about syntax, see the following documents:

  6. Specify the data transformation and complete the alerting policy dialog. For more information, see Create process-health alerting policy.

Filter the selected data

You can reduce the amount of data being monitored by specifying filter criteria or by applying aggregation. Filters ensure that only time series that meet some set of criteria are used. When you apply filters, there are fewer time series to evaluate, which can improve the performance of the alert.

When you supply multiple filtering criteria, the corresponding chart shows only the time series that meet all criteria, a logical AND.

To add a filter, click Add filter, complete the dialog, and then click Done. In the dialog, use the Filter field to select the criterion by which to filter. For example, you can filter by resource group, by name, by resource label, by zone, or by metric label. After you select the filter criterion, complete the filter by selecting a comparison operator and a value. Each row in the following table lists a comparison operator, its meaning, and an example:

Operator         Meaning                          Example
=                Equality                         resource.labels.zone = "us-central1-a"
!=               Inequality                       resource.labels.zone != "us-central1-a"
=~               Regular-expression equality      monitoring.regex.full_match("^us.*")
!=~              Regular-expression inequality    monitoring.regex.full_match("^us.*")
starts_with      Value starts with                resource.labels.zone = starts_with("us")
ends_with        Value ends with                  resource.labels.zone = ends_with("b")
has_substring    Value contains                   resource.labels.zone = has_substring("east")
one_of           One of                           resource.labels.zone = one_of("asia-east1-b", "europe-north1-a")
!starts_with     Value doesn't start with         resource.labels.zone != starts_with("us")
!ends_with       Value doesn't end with           resource.labels.zone != ends_with("b")
!has_substring   Value doesn't contain            resource.labels.zone != has_substring("east")
!one_of          Value isn't one of               resource.labels.zone != one_of("asia-east1-b", "europe-north1-a")

Transform data

After the time series are selected, the next steps are to specify how each time series is processed, also known as alignment, and how the aligned time series are combined.

The remainder of this page briefly describes these options. For a detailed explanation, see Manipulating time series.

Align time series

Alignment is the process of converting a time series received by Monitoring into a new time series with data points spaced by a fixed length of time. The process of alignment consists of the following steps:

  1. Dividing a time series into a set of fixed-length intervals.
  2. Collecting all data points received in each interval and applying a function to combine them. For example, you might select a function that computes the average of all samples in the interval.
  3. Associating a timestamp with the value computed in the previous step, and then adding the pair to the aligned time series.
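
To make these three steps concrete, here is a toy Python sketch of alignment. This is not how Monitoring implements it; it only illustrates the bucketing-and-combining idea with a mean as the combining function:

from collections import defaultdict
from statistics import mean

def align(points, period_seconds, combine=mean):
    """Convert (timestamp, value) samples into a regularly spaced series."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Step 1: assign each sample to a fixed-length interval.
        bucket_end = ((ts // period_seconds) + 1) * period_seconds
        buckets[bucket_end].append(value)
    # Steps 2 and 3: combine the samples in each interval and pair the
    # result with the interval's timestamp.
    return sorted((ts, combine(values)) for ts, values in buckets.items())

samples = [(0, 1.0), (30, 3.0), (70, 5.0)]
print(align(samples, 60))  # [(60, 2.0), (120, 5.0)]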

For a general discussion of alignment, see Alignment: within-series regularization.

When you create a condition on an alerting policy, you must specify the alignment parameters. If you use the Google Cloud console, then default values for these parameters are provided:

  • Rolling window: This field is a look-back interval from a particular point in time. For example, if this value is five minutes, then at 1:00 PM, the samples received between 12:55 PM and 1:00 PM are to be aligned. At 1:01 PM, the samples received between 12:56 PM and 1:01 PM are to be aligned. In the context of alerting policies, the alignment period can be viewed as a sliding window that looks to the past. For a more involved discussion about this field, see The alignment period and the duration.

  • Rolling window function: This field specifies the function used to combine all the data points in the look-back interval. In the Cloud Monitoring API, this field is called an aligner. For more information about the available functions, see Aligner in the API reference. Some of the aligner functions both align the data and convert it from one metric kind or type to another. For a detailed explanation, see Kinds, types, and conversions. A sketch of how these parameters appear in the API follows this list.
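
In the Cloud Monitoring API, both alignment parameters live on the Aggregation message. The following Python fragment is a minimal sketch, assuming the google-cloud-monitoring client library:

from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

# A five-minute rolling window whose samples are combined with the mean.
aggregation = monitoring_v3.Aggregation(
    alignment_period=duration_pb2.Duration(seconds=300),  # rolling window
    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,  # window function
)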

Combine time series

You can reduce the amount of data returned for a metric by combining different time series. To combine multiple time series, you typically specify a grouping and a function. Grouping is done by label values. The function defines how all time-series data within a group are combined into a new time series.

To access the options to combine time series, click Show more in the Across time series section.

To combine time series by label value, click the text Time series group by and make a selection from the menu. The menu is constructed dynamically based on the time series you selected.

When you add the first label, the following occurs:

  • An error is displayed because the Time series aggregation field is set to none. To resolve the error, select a function that is used to combine the time series with the same label value.

  • The chart displays one time series for each value of the label listed in the Time series group by field.

If you don't specify a grouping option and do specify an aggregation function, then that function is applied to the selected time series and results in a single time series.

You can group by multiple labels. When you have multiple grouping options, the aggregator is applied to the set of time series that have the same values for the selected labels.

The resulting chart displays one time series for each combination of label values. The order in which you specify the labels doesn't matter.

For example, the following screenshot illustrates grouping by user_labels.version and system_labels.machine_image:

[Screenshot: time series grouped by version and machine image.]

As illustrated, if you group by both labels, you get one time series for each pair of label values. Because you get a time series for each combination of label values, this technique can easily create more data than you can usefully put on a single chart.

When you specify a grouping or select an aggregator, the charted time series contains only required labels, such as the project identifier, and the labels specified by the grouping.
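
In API terms, the grouping labels and the aggregation function correspond to the group_by_fields and cross_series_reducer fields of the Aggregation message. A minimal Python sketch, with the label and functions chosen only for illustration:

from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

# Align each series over five minutes, then combine series that share a zone.
aggregation = monitoring_v3.Aggregation(
    alignment_period=duration_pb2.Duration(seconds=300),
    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
    cross_series_reducer=monitoring_v3.Aggregation.Reducer.REDUCE_MEAN,
    group_by_fields=["resource.labels.zone"],  # one output series per zone
)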

To remove a group-by condition, do the following:

  1. Delete the group-by labels.
  2. Set the aggregator to none.

Secondary aggregation

If you have multiple time series displayed after the Primary data transform and if you want the alerting policy to monitor a single time series, then use the Secondary data transform fields.
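
In the Cloud Monitoring API, the MetricThreshold structure takes a repeated aggregations field, and the console's secondary transform appears to map to an additional Aggregation entry applied after the primary one. The following Python fragment is a hedged sketch of that arrangement, not a confirmed mapping:

from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

# Primary transform: align, then group by zone (may leave several series).
primary = monitoring_v3.Aggregation(
    alignment_period=duration_pb2.Duration(seconds=300),
    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
    cross_series_reducer=monitoring_v3.Aggregation.Reducer.REDUCE_MEAN,
    group_by_fields=["resource.labels.zone"],
)

# Secondary transform: collapse the remaining series into a single one.
secondary = monitoring_v3.Aggregation(
    alignment_period=duration_pb2.Duration(seconds=300),
    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
    cross_series_reducer=monitoring_v3.Aggregation.Reducer.REDUCE_MAX,
)

# Both entries would go into
# MetricThreshold(aggregations=[primary, secondary], ...).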

Behavior when data is missing

Google Cloud console

You can configure how Monitoring evaluates a metric-threshold condition when data stops arriving. For example, when an incident is open and an expected measurement doesn't arrive, do you want Monitoring to leave the incident open or to close it immediately? Similarly, when data stops arriving and no incident is open, do you want an incident to be opened? Lastly, how long should an incident stay open after data stops arriving?

There are two configurable fields that specify how Monitoring evaluates metric-threshold conditions when data stops arriving:

  • To configure how Monitoring determines the replacement value for missing data, use the Evaluation missing data field, which you set in the Condition trigger step. This field is disabled when the retest window is set to No retest.

  • To configure how long Monitoring waits before closing an open incident after data stops arriving, use the Incident autoclose duration field. You set the auto-close duration in the Notification step. The default auto-close duration is seven days.

The following describes the options for the Evaluation missing data field:

  • Missing data empty

    Summary: Open incidents stay open; new incidents aren't opened.

    Details: For conditions that are met, the condition continues to be met when data stops arriving. If an incident is open for this condition, then the incident stays open. When an incident is open and no data arrives, the auto-close timer starts after a delay of at least 15 minutes. If the timer expires, then the incident is closed. For conditions that aren't met, the condition continues to not be met when data stops arriving.

  • Missing data points treated as values that violate the policy condition

    Summary: Open incidents stay open; new incidents can be opened.

    Details: For conditions that are met, the condition continues to be met when data stops arriving. If an incident is open for this condition, then the incident stays open. When an incident is open and no data arrives for the auto-close duration plus 24 hours, the incident is closed. For conditions that aren't met, this setting causes the metric-threshold condition to behave like a metric-absence condition: if data doesn't arrive in the time specified by the retest window, then the condition is evaluated as met. For an alerting policy with one condition, the condition being met results in an incident being opened.

  • Missing data points treated as values that don't violate the policy condition

    Summary: Open incidents are closed; new incidents aren't opened.

    Details: For conditions that are met, the condition stops being met when data stops arriving. If an incident is open for this condition, then the incident is closed. For conditions that aren't met, the condition continues to not be met when data stops arriving.

API

The considerations described in the previous section also apply when you use the Cloud Monitoring API. In the API, two configurable fields specify how Monitoring evaluates metric-threshold conditions when data stops arriving:

  • To configure how Monitoring determines the replacement value for missing data, use the evaluationMissingData field of the MetricThreshold structure. This field is ignored when the duration field is zero.

  • To configure how long Monitoring waits before closing an open incident after data stops arriving, use the autoClose field in the AlertStrategy structure.
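
Putting the two fields together, the following Python sketch configures a condition that closes incidents when data stops arriving and sets a 30-minute auto-close duration. It assumes a recent version of the google-cloud-monitoring client library; the filter, names, and values are placeholders:

from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

policy = monitoring_v3.AlertPolicy(
    display_name="CPU utilization alert",  # placeholder
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="CPU above 90%",  # placeholder
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                    'AND resource.type="gce_instance"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=0.9,
                duration=duration_pb2.Duration(seconds=300),
                # Treat missing data as values that don't violate the condition.
                evaluation_missing_data=(
                    monitoring_v3.AlertPolicy.Condition.EvaluationMissingData.EVALUATION_MISSING_DATA_INACTIVE
                ),
            ),
        )
    ],
    alert_strategy=monitoring_v3.AlertPolicy.AlertStrategy(
        auto_close=duration_pb2.Duration(seconds=30 * 60),  # 30 minutes
    ),
)

created = monitoring_v3.AlertPolicyServiceClient().create_alert_policy(
    name="projects/PROJECT_ID",  # placeholder project
    alert_policy=policy,
)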

The following describes the options for the evaluationMissingData field:

  • EVALUATION_MISSING_DATA_UNSPECIFIED

    Summary: Open incidents stay open; new incidents aren't opened.

    Details: For conditions that are met, the condition continues to be met when data stops arriving. If an incident is open for this condition, then the incident stays open. When an incident is open and no data arrives, the auto-close timer starts after a delay of at least 15 minutes. If the timer expires, then the incident is closed. For conditions that aren't met, the condition continues to not be met when data stops arriving.

  • EVALUATION_MISSING_DATA_ACTIVE

    Summary: Open incidents stay open; new incidents can be opened.

    Details: For conditions that are met, the condition continues to be met when data stops arriving. If an incident is open for this condition, then the incident stays open. When an incident is open and no data arrives for the auto-close duration plus 24 hours, the incident is closed. For conditions that aren't met, this setting causes the metric-threshold condition to behave like a metric-absence condition: if data doesn't arrive in the time specified by the duration field, then the condition is evaluated as met. For an alerting policy with one condition, the condition being met results in an incident being opened.

  • EVALUATION_MISSING_DATA_INACTIVE

    Summary: Open incidents are closed; new incidents aren't opened.

    Details: For conditions that are met, the condition stops being met when data stops arriving. If an incident is open for this condition, then the incident is closed. For conditions that aren't met, the condition continues to not be met when data stops arriving.